Senior AI Systems Engineer

Archer•San Jose, CA

Onsite

About The Position

As a Senior AI Systems Engineer, you will architect, deploy, and manage the critical infrastructure services required for large-scale AI model training and inference. You will ensure our machine learning platforms are robust and efficient, bridging the gap between raw data and high-performance AI models.

Requirements

BS, MS, or PhD in Computer Science, Software Engineering, or a related field.
3+ years of professional software engineering experience with a dedicated focus on AI/ML systems, high-performance computing (HPC), or ML infrastructure.
Familiarity with hyperscaler infrastructure (AWS) as well as specialized AI-centric bare-metal and GPU clouds (Nebius AI Cloud).
Hands-on experience with containerization (Docker) and production-grade orchestration (Kubernetes), along with cloud-agnostic cluster management layers such as SkyPilot for managing GPU availability across regions.
Deep architectural understanding of large language models and of the system infrastructure required to serve them at scale with frameworks such as vLLM and SGLang.
Experience building high-throughput data pipelines to support large-scale training, including proficiency in SQL, NoSQL, and columnar storage formats optimized for ML (e.g., Parquet).

Nice To Haves

Familiarity with audio processing, speech-to-text frameworks, or Automatic Speech Recognition (ASR) pipelines.
Prior experience or a deep technical interest in aerospace, aviation, or autonomous systems (e.g., safety-critical software, edge-AI deployments).

Responsibilities

Deploy, scale, and manage resilient infrastructure services tailored for distributed AI model training and low-latency inference.
Use and maintain end-to-end tooling, including MLflow for experiment tracking and model registry, to streamline and optimize the AI development lifecycle.
Maximize hardware utilization by managing multi-cloud compute scheduling (e.g., SkyPilot) and operating advanced LLM serving engines (e.g., vLLM, SGLang).
Partner closely with AI researchers and software engineers to productionize cutting-edge models, establish monitoring systems, and debug complex performance bottlenecks at the hardware-software interface.