About The Position

Play a part in shaping the future of human-computer interaction. As an MLOps Engineer, you will be the backbone of the machine learning infrastructure that powers our speech, audio, and conversational AI teams, ensuring their models are trained on the best possible data. You will bridge research, data science, and engineering, owning the full ML lifecycle from large-scale data pipelines and distributed GPU training through to low-latency, high-fidelity inference and optimization. You will partner closely with Audio ML Engineers, Speech ML Engineers, and ML Data Scientists to remove friction across their workflows and accelerate the path from research to product.

In this role, you will drive end-to-end quality and operational excellence across data ingestion, model training, deployment pipelines, and MLOps tooling for our speech and audio ML platforms. You will build, deploy, and optimize production-grade systems with a strong emphasis on scalable, GPU-accelerated infrastructure. You will own the training infrastructure that powers distributed and self-supervised model training on HPC and Slurm-managed clusters, as well as the inference pipelines that bring low-latency, high-fidelity audio and speech models to production. You will also establish standard practices for model integration, deployment, monitoring, and reproducibility using CI/CD principles.

Requirements

  • 3 years of software engineering experience, with a demonstrated record of designing and implementing large-scale software systems
  • Bachelor's Degree in Software Engineering, Computer Science, Electrical Engineering, Statistics, Machine Learning, Operations Research, or a related field
  • Proven track record of shipping and maintaining production-grade ML systems end-to-end
  • Hands-on experience with GPU-based model training and inference, including distributed/multi-node training
  • Experience operating workloads on HPC environments and job schedulers such as Slurm
  • Proficiency in Python and familiarity with deep learning frameworks such as PyTorch, TensorFlow, or JAX
  • Experience supporting speech and audio ML pipelines (e.g., ASR, TTS, speaker recognition, voice isolation, generative speech) and large-scale audio data processing
  • Experience with infrastructure for self-supervised and large-model training
  • Deep familiarity with GPU performance tuning, mixed-precision training, and distributed training frameworks
  • Familiarity with data quality frameworks, model monitoring, drift detection, and observability practices in production
  • Experience optimizing models for on-device or Apple silicon inference

Responsibilities

  • Design, build, and operate large-scale data pipelines for proprietary audio and speech datasets - supporting curation, quality monitoring, and validation at scale alongside our ML Data Science team.
  • Partner closely with Audio ML Engineers, Speech ML Engineers, ML Data Scientists, and product teams to define metrics, gather requirements, and bring new capabilities to life.
  • Build and operate distributed GPU training workflows, including job scheduling and resource management on Slurm-managed HPC clusters, for both supervised and self-supervised methods.
  • Optimize model inference for low latency and high-fidelity streaming across serving environments, including optimization for Apple silicon.
  • Design and maintain automated pipelines for model training, evaluation, versioning, and deployment, with special attention to speech, audio, and signal-processing workflows.
  • Identify and resolve bottlenecks in ML and data workflows, improving system reliability, latency, and throughput at scale.
© 2026 Teal Labs, Inc