About The Position

Our Field Foundation Model (FFM) powers a global fleet of autonomous robots that capture massive streams of multimodal data across diverse, dynamic environments every day. As part of the Insight Team, you will help transform this raw data into actionable insights that empower our customers and engineers to deliver value. The Field-insight Foundation Model (FiFM) is at the core of that work. As a Senior Machine Learning Platform Engineer, you will own the infrastructure that powers FiFM, from model hosting and distributed training pipelines to data systems, observability, and security.

This is a role at the intersection of systems engineering and machine learning. You’ll design and operate large-scale ML platforms, ensure FiFM transitions smoothly from research into production, and optimize for both performance and cost across cloud and edge. In addition to building core infrastructure, you’ll play a leadership role by mentoring junior engineers, setting technical direction, and raising the engineering bar across the team.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • 4+ years of industry experience in ML infrastructure or platform engineering.
  • Strong coding skills in Python/TypeScript and a solid foundation in software engineering best practices.
  • Proven experience with distributed systems, cloud platforms (AWS preferred), containerization and orchestration (Docker, Kubernetes/EKS, Ray), and serverless architectures.
  • Hands-on experience building ML pipelines for distributed training and large-scale inference.
  • Strong knowledge of data management at scale, including preprocessing and retrieval of video/image datasets.
  • Proficiency with CI/CD pipelines, infrastructure-as-code (Terraform, CloudFormation), and automation.
  • Familiarity with MLOps tools (MLflow, Kubeflow, Airflow).
  • Experience with system monitoring and observability in production.

Nice To Haves

  • Experience with vector databases (OpenSearch, Pinecone, Weaviate) for indexing and retrieval.
  • Familiarity with distributed training frameworks (Horovod, DDP/FSDP, DeepSpeed, Ray).
  • Hands-on experience with GPU orchestration and auto-scaling (Karpenter, SageMaker, EKS).
  • Experience with agentic AI deployment workflows, orchestration frameworks, and retrieval-augmented generation.
  • Strong knowledge of security and compliance in ML and cloud environments.

Responsibilities

  • Design and manage scalable ML infrastructure with IaC tools (Terraform, CloudFormation).
  • Develop and optimize cloud-based pipelines for training, evaluation, and inference on multimodal datasets.
  • Build and operate data systems for large-scale video ingestion, indexing, and storage.
  • Maintain MLOps workflows for versioning, experiment tracking, reproducibility, and CI/CD.
  • Ensure reliability and observability with monitoring, logging, and alerting.
  • Collaborate with AI/ML Engineers to productionize workflows.
  • Optimize infrastructure for performance and cost across cloud and edge.
  • Enforce best practices in security, compliance, and maintainability.
  • Mentor and manage junior engineers, providing technical guidance and career development.
© 2024 Teal Labs, Inc