Member of Technical Staff - GPU Infrastructure

Reflection

About The Position

Design, build, and operate Reflection’s large-scale GPU infrastructure powering pre-training, post-training, and inference. Develop reliable, high-performance systems for scheduling, orchestration, and observability across thousands of GPUs. Optimize cluster utilization, throughput, and cost efficiency while maintaining reliability at scale. Build tools and automation for distributed training, inference, monitoring, and experiment management. Collaborate closely with research, training, and platform teams to accelerate development and enable large-scale training and inference. Push the limits of hardware, networking, and software to accelerate the path from idea to model.

Requirements

Deep systems or infrastructure engineering experience in high-performance or distributed computing environments.
Strong understanding of GPUs, CUDA, NCCL, and large-scale training frameworks (PyTorch, DeepSpeed, JAX, etc.).
Hands-on experience with containerization, orchestration, and cluster management (Kubernetes, Slurm, etc.).
Familiarity with modern observability stacks and performance profiling tools.
Ability to thrive in a fast-paced, high-ownership startup environment.

Nice To Haves

Excited to build frontier-scale training/RL infrastructure from zero to one.
Motivated by enabling researchers and engineers to build open-weight AI systems.

Responsibilities

Design, build, and operate large-scale GPU infrastructure.
Develop systems for scheduling, orchestration, and observability across GPUs.
Optimize cluster utilization, throughput, and cost efficiency.
Build tools and automation for distributed training and inference.
Collaborate with research, training, and platform teams.
Push the limits of hardware, networking, and software.

Benefits

Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.
Comprehensive medical, dental, vision, life, and disability insurance.
Fully paid parental leave for all new parents, including adoptive and surrogate journeys.
Financial support for family planning.
Paid time off when needed, wellness and time-saver stipend, commute benefits, education stipend, and relocation support.
Lunch and dinner provided daily, plus regular off-sites and team celebrations.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level
