Senior AI Systems Performance Engineer

SambaNova Systems • Palo Alto, CA

About The Position

We are seeking a talented and driven ML performance engineer to optimize and scale state-of-the-art foundation models on SambaNova's reconfigurable dataflow platform. You'll work hands-on with some of the most advanced models in the world — such as DeepSeek R1, GPT OSS, and other frontier architectures — to push the limits of throughput, latency, and efficiency. In this role, you'll bridge the gap between deep learning and systems performance, collaborating across compiler, runtime, and hardware layers to deliver world-record performance for large-scale AI inference.

Requirements

Bachelor's or higher degree in computer science, electrical engineering, or a related field (e.g., applied mathematics, physics, or statistics).
3+ years of experience in one or more of the following areas: deep learning model development and performance optimization; compiler, runtime, or kernel-level optimization; software–hardware co-design or systems performance tuning.
Proficiency in Python or C++, with strong foundations in algorithms, data structures, and numerical computing.
Experience with at least one major ML framework — PyTorch, TensorFlow, or JAX.
Demonstrated ability to analyze and optimize performance in real-world ML pipelines.

Nice To Haves

Hands-on experience with LLM or multimodal model training and inference.
Background in large-scale distributed training, continuous batching, and high-throughput inference systems.
Familiarity with quantization, graph optimization, kernel fusion, and model partitioning.
Experience with frameworks such as DeepSpeed, Megatron, vLLM, or TensorRT.
Strong GPU programming skills (CUDA, Triton, or OpenCL); experience with cuDNN, cuBLAS, or similar libraries is a plus.
Knowledge of memory hierarchy optimization, caching, and scheduling for large-scale model execution.
A publication record or open-source contributions in ML systems or performance optimization are a plus.

Responsibilities

Bring up and optimize cutting-edge foundation models (e.g., DeepSeek, Llama, Qwen, and others) on the SambaNova platform through the SambaNova software stack.
Profile and enhance model performance across compiler, runtime, and hardware layers to achieve SOTA throughput and latency.
Collaborate with machine learning, compiler, runtime, and hardware teams to deliver co-designed, high-performance AI applications.
Integrate the latest advances in model architecture, quantization, scheduling, and memory optimization from both academia and industry.
Develop robust, scalable, and efficient end-to-end inference solutions aligned with customer needs.
Identify performance bottlenecks and propose dataflow or scheduling optimizations for both single-node and distributed systems.

Benefits

SambaNova offers a competitive total rewards package, including base salary, equity, and benefits.
We cover 95% of the premium for employee medical insurance and 77% of the premium for dependents, and we offer a Health Savings Account (HSA) with employer contribution.
We also offer Dental, Vision, Short- and Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans, in addition to Flexible Spending Account (FSA) options such as Health Care, Limited Purpose, and Dependent Care.
Our library of well-being benefits available to you and your dependents includes a full subscription to Headspace, Gympass+ membership with access to physical gyms, One Medical membership, counseling services with an Employee Assistance Program, and much more.
