About The Position

We’re looking for a Senior Machine Learning Engineer to join our Applied Science Data Frameworks team. In this role, you’ll build the infrastructure that powers large-scale, multimodal AI training and inference. You’ll work across machine learning, distributed systems, and data engineering to develop tools and platforms that help teams train and deploy models at scale. Your work will support systems that process billions of data points across large GPU environments. If you’re motivated by solving complex problems and building systems that enable others to do their best work, we’d love to connect.

Requirements

  • 8+ years of experience building and operating distributed systems or ML infrastructure in production
  • Experience working with large-scale data pipelines or inference systems
  • Strong Python programming skills and a solid foundation in software engineering principles
  • Experience with ML frameworks such as PyTorch or TensorFlow
  • Familiarity with distributed computing frameworks such as Ray, Spark, or Dask
  • Experience working with cloud platforms such as AWS or Azure
  • Understanding of MLOps practices, including CI/CD and deployment workflows
  • Ability to communicate clearly and collaborate with cross-functional teams

Nice To Haves

  • Experience working with multimodal data (images, video, text)
  • Familiarity with vector databases or semantic search systems

Responsibilities

  • Build distributed data loaders to support large-scale training workflows
  • Develop data pipelines for ingesting, transforming, and preparing multimodal datasets
  • Design batch inference systems for high-volume data processing across GPU environments
  • Improve system performance, scalability, and reliability using distributed computing and data processing tools (e.g., Ray, Spark, DuckDB)
  • Implement search and retrieval systems using vector databases and embedding-based approaches
  • Develop and maintain CI/CD workflows, including testing, deployment, and containerization
  • Partner with researchers and engineers to turn model requirements into scalable systems
  • Create reusable tools, libraries, and documentation to support teams across the organization
  • Monitor and improve system health, including throughput, latency, and resource utilization
  • Contribute to a collaborative team culture through code reviews and knowledge sharing
© 2024 Teal Labs, Inc