As a Senior/Staff Engineer on the Foundation Model Compute Infrastructure team, you will lead the design and development of scheduling and orchestration systems for large-scale TPU workloads across multi-region clusters. You will work on distributed systems that manage thousands of accelerators and enable reliable, efficient execution of training and inference jobs. The role spans scheduling algorithms, cluster lifecycle management, workload orchestration, reliability engineering, and performance optimization.
Job Type: Full-time
Career Level: Senior