Staff ML Engineer - Infrastructure

ChipStack • San Jose, CA

About The Position

This role offers a unique opportunity to be part of the founding team at ChipStack, where we are reinventing how modern silicon chips are designed. You will work alongside highly experienced chip designers who have built complex chips, ML scientists who have trained LLMs at scale, and top-notch infrastructure and software engineers. You will get to leverage your experience building ML and data infrastructure and apply it to some of the hardest problems in chip design.

Requirements

5+ years of experience in ML infrastructure or adjacent roles
Deep expertise in Python and experience with training frameworks like PyTorch or TensorFlow
Strong systems engineering skills and experience with distributed training, data pipelines, and performance optimization
Experience deploying ML models to production (REST APIs, batch jobs, streaming pipelines)
Proficiency with cloud platforms (e.g., GCP, AWS) and containerized systems (Docker, Kubernetes)
Experience managing GPU/TPU workloads efficiently
Good communication skills and the ability to work directly with engineers and customers
Prior experience training or fine-tuning LLMs

Nice To Haves

Exposure to chip design fundamentals (via coursework or elsewhere)
Experience at an early-stage startup
Experience setting up observability, monitoring, and evaluation pipelines for ML models

Responsibilities

Build the core infrastructure that enables training, fine-tuning, evaluation, and deployment of LLMs across cloud and on-premises environments.
Your work will directly impact product capabilities and speed of iteration.
