About The Position

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Our platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices, automate conversations and inefficient processes, and empower every team member to work smarter and faster. Cresta was born in the prestigious Stanford AI Lab; its co-founder and chairman is Sebastian Thrun, the genius behind Google X, Waymo, Udacity, and more. Our leadership also includes CEO Ping Wu, co-founder of Google Contact Center AI and the Vertex AI platform, and co-founder Tim Shi, an early member of OpenAI. Join us on this thrilling journey to revolutionize the workforce with AI. The future of work is here, and it's at Cresta.

Requirements

  • 5+ years writing production software; 2+ years focused on ML platform or infra.
  • Expert Python (async, typing, packaging, performance).
  • Working knowledge of Go for systems components.
  • Proven experience with one or more serving frameworks (e.g., vLLM, Triton, TorchServe).
  • Kubernetes and cloud-native ops.
  • Solid grasp of distributed systems, networking, and container security.
  • Commitment to rigorous testing, code review, and continuous delivery.

Nice To Haves

  • Hands-on with large language models or real-time streaming inference.
  • Terraform, Helm, or similar IaC tooling.
  • Experience in speech or conversational AI domains.

Responsibilities

  • Own model serving: Design, build, and maintain low-latency, highly available stacks that serve in-house ML models and integrate with LLM serving partners.
  • Automate training pipelines: Orchestrate data prep, training, evaluation, and model-registry workflows on Kubernetes with solid MLOps practices.
  • Optimize at scale: Profile and tune throughput, memory, and cost; introduce caching, sharding, batching, and GPU/CPU autoscaling where it pays off.
  • Build platform primitives: Create reusable SDKs, templates, and CLI tools that let research and product teams ship models independently and safely.
  • Raise the bar: Instrument deep observability (tracing, metrics, alerts), drive blameless post-mortems, and mentor engineers on production ML best practices.

Benefits

  • Comprehensive medical, dental, and vision coverage with plans to fit you and your family
  • Flexible PTO to take the time you need, when you need it
  • Paid parental leave for all parents welcoming a new child
  • Retirement savings plan to help you plan for the future
  • Remote work setup budget to help you create a productive home office
  • Monthly wellness and communication stipend to keep you connected and balanced
  • In-office meal program and commuter benefits provided for onsite employees


What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No education requirement listed
  • Number of Employees: 251-500 employees
