Clockwork.io – Software Driven Fabrics to increase GPU cluster utilization

Clockwork Systems was founded by Stanford researchers and veteran systems engineers who share a vision for redefining the foundations of distributed computing. As AI workloads grow increasingly complex, traditional infrastructure struggles to meet the demands of performance, reliability, and precise coordination. Clockwork is pioneering a software-driven approach to AI fabrics by delivering cross-stack observability to catch and quickly resolve problems, workload fault tolerance to keep jobs running through failures, and performance acceleration that dynamically routes and paces traffic to avoid congestion. To learn more, visit www.clockwork.io.

We're building infrastructure for fault-tolerant, high-performance distributed GPU training. You'll work at the intersection of GPU systems, high-speed networking, and distributed coordination, designing and implementing systems that run at scale.

This is a systems-building role. You'll dig into internals, understand why things break under pressure, and design solutions that handle the messy reality of distributed systems.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 11-50 employees