Member of Technical Staff - Mid-Training Infra

Reflection

23d

About The Position

Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads. Develop systems that power synthetic data generation and reinforcement learning pipelines at scale. Build high-performance inference platforms capable of serving and evaluating models across thousands of GPUs. Optimize throughput, latency, and GPU utilization for large language model inference and rollout workloads. Build infrastructure that supports reinforcement learning pipelines, including large-scale rollout generation, evaluation, and policy improvement loops. Work closely with research teams to support distributed RL workloads and large-scale model evaluation infrastructure. Improve performance of model execution through kernel-level optimization, model parallelism strategies, and GPU runtime improvements. Develop distributed systems that enable large-scale synthetic data generation and RL-driven training workflows. Diagnose and resolve performance bottlenecks across inference runtimes, GPU kernels, networking, and distributed compute systems.

Requirements

Experience deploying and operating large-scale GPU systems for inference or model serving.
Several years of hands-on experience building and running production infrastructure.
Strong understanding of GPU performance characteristics and optimization techniques.
Experience working with modern inference frameworks such as SGLang, Megatron, or similar high-performance LLM runtimes.
Familiarity with distributed reinforcement learning infrastructure or rollout generation systems.
Experience optimizing throughput for large-scale model execution workloads.
Experience working with GPU kernels or low-level performance optimization.
Familiarity with infrastructure used for synthetic data pipelines or RL training workflows.
Experience debugging performance issues across GPU, networking, and distributed execution layers.

Responsibilities

Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads.
Develop systems that power synthetic data generation and reinforcement learning pipelines at scale.
Build high-performance inference platforms capable of serving and evaluating models across thousands of GPUs.
Optimize throughput, latency, and GPU utilization for large language model inference and rollout workloads.
Build infrastructure that supports reinforcement learning pipelines, including large-scale rollout generation, evaluation, and policy improvement loops.
Work closely with research teams to support distributed RL workloads and large-scale model evaluation infrastructure.
Improve performance of model execution through kernel-level optimization, model parallelism strategies, and GPU runtime improvements.
Develop distributed systems that enable large-scale synthetic data generation and RL-driven training workflows.
Diagnose and resolve performance bottlenecks across inference runtimes, GPU kernels, networking, and distributed compute systems.

Benefits

Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.
Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance.
Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning.
Benefits & balance: paid time off when you need it, relocation support, and more perks that optimize your time.
Opportunities to connect with teammates: lunch and dinner are provided daily. We have regular off-sites and team celebrations.

Stand Out From the Crowd

Upload your resume and get instant feedback on how well it matches this job.

Upload and Match Resume