Deployment Engineer, AI Inference

Cerebras SystemsSunnyvale, CA
35d

About The Position

We are seeking a highly skilled Deployment Engineer to build and operate our cutting-edge inference clusters. These clusters would provide the candidate an opportunity to work with the world's largest computer chip, the Wafer-Scale Engine (WSE), and the systems that harness its unparalleled power. You will play a critical role in ensuring reliable, efficient, and scalable deployment of AI inference workloads across our global infrastructure. On the operational side, you'll own the rollout of the new software versions and AI replica updates, along the capacity reallocations across our custom-built, high-capacity datacenters. Beyond operations, you'll drive improvements to our telemetry, observability and the fully automated pipeline. This role involves working with advanced allocation strategies to maximize utilization of large-scale computer fleets. The ideal candidate combines hands-on operation rigor with strong systems engineering skills and thrives on building resilient pipelines that keep pace with cutting-edge AI models. This role does not require 24/7 hour on-call rotations.

Requirements

  • 2-5 years of experience in operating on-prem compute infrastructure (ideally in Machine Learning or High-Performance Compute) or id developing and managing complex AWS plane infrastructure for hybrid deployments
  • Strong proficiency in Python for automation, orchestration, and deployment tooling
  • Solid understanding of Linux-based systems and command-line tools
  • Extensive knowledge of Docker containers and container orchestration platforms like K8S
  • Familiarity with spine-leaf (Clos) networking architecture
  • Proficiency with telemetry and observability stacks such as Prometheus, InfluxDB and Grafana
  • Strong ownership mindset and accountability for complex deployments
  • Ability to work effectively in a fast-paced environment.

Responsibilities

  • Deploy AI inference replicas and cluster software across multiple datacenters
  • Operate across heterogeneous datacenter environments undergoing rapid 10x growth
  • Maximize capacity allocation and optimize replica placement using constraint-solver algorithms
  • Operate bare-metal inference infrastructure while supporting transition to K8S-based platform
  • Develop and extend telemetry, observability and alerting solutions to ensure deployment reliability at scale
  • Develop and extend a fully automated deployment pipeline to support fast software updates and capacity reallocation at scale
  • Translate technical and customer needs into actionable requirements for the Dev Infra, Cluster, Platform and Core teams
  • Stay up to date with the latest advancements in AI compute infrastructure and related technologies.

Stand Out From the Crowd

Upload your resume and get instant feedback on how well it matches this job.

Upload and Match Resume

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Computer and Electronic Product Manufacturing

Education Level

No Education Listed

© 2024 Teal Labs, Inc
Privacy PolicyTerms of Service