ML Systems Engineer

Periodic Labs
Menlo Park, CA (Hybrid)

About The Position

Periodic Labs is an AI and physical sciences company building state-of-the-art models to accelerate breakthroughs across materials, energy, and beyond. The ML Systems Engineer will own the systems layer that makes frontier model training and inference fast, efficient, and tightly coupled to the RL feedback loop that drives scientific discovery. The role sits at the intersection of infrastructure and research: it requires a deep understanding of scheduling, kernels, RDMA, weight synchronization, and communication primitives, alongside close collaboration with researchers to co-design algorithms and infrastructure. The RL loop, in which models propose experiments, experiments generate data, and data feeds back into training, is a direct multiplier on the pace of scientific discovery, and this role owns the infrastructure that keeps it fast and reliable.
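To give candidates a concrete feel for the RL loop described above, here is a minimal sketch of its shape. Every function name and data field below is a hypothetical stand-in for illustration, not Periodic Labs' actual pipeline:

```python
import random

# Hypothetical stand-ins for the real components; names are illustrative only.
def propose_experiments(model_state, n=4):
    """Model proposes candidate experiments (here: random parameters)."""
    return [{"temperature_K": random.uniform(300, 1500)} for _ in range(n)]

def run_experiment(experiment):
    """Stand-in for a physical experiment returning a measurement."""
    return {"experiment": experiment, "yield": random.random()}

def update_model(model_state, results):
    """Fold experimental results back into training state (here: a counter)."""
    model_state["steps"] += 1
    model_state["data"].extend(results)
    return model_state

model_state = {"steps": 0, "data": []}
for _ in range(3):  # three turns of the loop
    experiments = propose_experiments(model_state)   # models propose experiments
    results = [run_experiment(e) for e in experiments]  # experiments generate data
    model_state = update_model(model_state, results)    # data feeds back into training
```

Each stage in the real system is itself a distributed service; this role owns making the end-to-end cycle fast.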

Requirements

  • Large-scale inference infrastructure: load balancing, traffic shifting, scheduling, and serving architecture at production scale
  • Low-level systems programming: RDMA, NVLink, kernel-level work, and network stack optimization
  • GPU cluster scheduling and orchestration across Ray, Slurm, or Kubernetes, with awareness of rack topology and hardware locality
  • Writing and optimizing CUDA kernels, communication primitives, or distributed training collective operations
  • Profiling and benchmarking distributed ML systems to identify and eliminate bottlenecks across compute, memory, and network
  • Checkpoint management and streaming at scale, including direct cloud storage integration
  • Building or contributing to open source ML infrastructure projects (e.g., SGLang, Megatron-LM, vLLM, Ray)
  • Working directly with ML researchers on algorithm-infrastructure co-design — you understand the research well enough to make systems decisions that serve it
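As a flavor of the profiling and benchmarking work listed above, the sketch below times named stages of a toy training step and picks out the dominant one. Stage names and durations are invented for illustration:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def span(name):
    """Accumulate wall-clock time spent inside a named stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

# Toy stage breakdown of one training step (sleeps stand in for real work).
with span("data_loading"):
    time.sleep(0.01)
with span("forward_backward"):
    time.sleep(0.03)
with span("gradient_allreduce"):
    time.sleep(0.02)

# The stage with the largest accumulated time is the bottleneck candidate.
bottleneck = max(timings, key=timings.get)
```

Production profilers hook GPU events and network counters rather than wall-clock sleeps, but the workflow is the same: attribute time to stages, then attack the largest one.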

Responsibilities

  • Build rack- and topology-aware scheduling for GB-series GPUs across Ray, Slurm, and Kubernetes, minimizing latency and maximizing utilization across heterogeneous cluster configurations.
  • Build online and offline profilers that surface bottlenecks across the training and inference stack and translate findings into actionable optimizations.
  • Implement direct S3 checkpoint streaming to eliminate I/O bottlenecks in large-scale training runs.
  • Run methodical benchmarking to identify optimal RL training configurations across model sizes, batch strategies, and hardware topologies.
  • Write and optimize communication and GPU kernels to extract maximum throughput from the hardware.
  • Design and implement zero-copy RDMA weight synchronization between training and inference to keep the RL loop tight and low-latency.
  • Build fast sandbox execution environments that allow rapid rollout of model-generated actions and return of rewards without blocking the training pipeline.
  • Engage directly with the SGLang, Megatron, and Ray communities — contributing upstream, influencing roadmaps, and pulling in improvements that benefit Periodic Labs’ workloads.
  • Work in close collaboration with RL and pretraining researchers to co-design algorithms and infrastructure — you will shape what is possible at the research level by knowing what is achievable at the systems level, and vice versa.
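The topology-aware placement responsibility above can be illustrated with a toy scheduler that prefers packing a job into a single rack to avoid cross-rack traffic. The cluster inventory and placement policy are invented for illustration:

```python
from collections import defaultdict

# Hypothetical inventory: gpu_id -> rack label (illustrative data only).
GPU_RACKS = {
    "gpu0": "rack-a", "gpu1": "rack-a", "gpu2": "rack-a", "gpu3": "rack-a",
    "gpu4": "rack-b", "gpu5": "rack-b", "gpu6": "rack-b", "gpu7": "rack-b",
}

def place_job(num_gpus, free_gpus, racks=GPU_RACKS):
    """Prefer placing all of a job's GPUs in one rack (tightest fit);
    fall back to spanning racks only when no single rack can hold it."""
    by_rack = defaultdict(list)
    for g in free_gpus:
        by_rack[racks[g]].append(g)
    # Tightest-fit rack that still holds the whole job.
    candidates = [gs for gs in by_rack.values() if len(gs) >= num_gpus]
    if candidates:
        return min(candidates, key=len)[:num_gpus]
    # Spanning fallback: fill the racks with the most free GPUs first.
    spanning = []
    for gs in sorted(by_rack.values(), key=len, reverse=True):
        spanning.extend(gs)
    return spanning[:num_gpus] if len(spanning) >= num_gpus else None

placement = place_job(2, ["gpu1", "gpu2", "gpu5"])
```

A real scheduler would also weigh NVLink domains, NIC locality, and job priorities; the point of the sketch is only the pack-before-span heuristic.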

Benefits

  • Visa sponsorship