Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads.
Develop systems that power synthetic data generation and reinforcement learning pipelines at scale.
Build high-performance inference platforms capable of serving and evaluating models across thousands of GPUs.
Optimize throughput, latency, and GPU utilization for large language model inference and rollout workloads.
Build infrastructure that supports reinforcement learning pipelines, including large-scale rollout generation, evaluation, and policy improvement loops.
Work closely with research teams to support distributed RL workloads and large-scale model evaluation infrastructure.
Improve performance of model execution through kernel-level optimization, model parallelism strategies, and GPU runtime improvements.
Develop distributed systems that enable large-scale synthetic data generation and RL-driven training workflows.
Diagnose and resolve performance bottlenecks across inference runtimes, GPU kernels, networking, and distributed compute systems.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 1-10 employees