About The Position

About Clockwork Systems

Clockwork.io: software-driven fabrics to increase GPU cluster utilization

Clockwork Systems was founded by Stanford researchers and veteran systems engineers who share a vision for redefining the foundations of distributed computing. As AI workloads grow increasingly complex, traditional infrastructure struggles to meet the demands of performance, reliability, and precise coordination.

Clockwork is pioneering a software-driven approach to AI fabrics by delivering cross-stack observability to catch and quickly resolve problems, workload fault tolerance to keep jobs running through failures, and performance acceleration that dynamically routes and paces traffic to avoid congestion. To learn more, visit www.clockwork.io.

Requirements

  • At least 5 years of experience with C/C++ systems programming
  • Deep knowledge of Linux internals (e.g., system calls, memory management, kernel modules)
  • Strong foundation in concurrent programming and synchronization techniques
  • Strong understanding of the TCP/IP stack, socket programming, and low-latency networking (e.g., RDMA, DPDK, XDP)
  • Strong understanding of memory hierarchy, CPU caches, multi-core architectures, and GPUs
  • Strong skills in systems design, performance analysis, and low-level debugging

Nice To Haves

  • Contributions to open-source HPC libraries or Linux kernel subsystems
  • Experience with performance tuning of large-scale HPC clusters
  • Experience with MPI, RPC frameworks, or distributed runtimes
  • Experience with NCCL, CUDA, and GPU kernels
  • Knowledge of RDMA APIs (e.g., libibverbs) and transport semantics
  • Experience with NIC drivers or NIC architecture

Responsibilities

  • Use your knowledge and experience to lead or contribute directly to the design and build of high-performance, reliable, and scalable systems.

Benefits

  • Challenging projects.
  • Competitive compensation.
  • A great benefits package.
  • A friendly and inclusive workplace culture.
  • Catered lunches.
  • Working in the heart of Silicon Valley.