Senior Machine Learning Platform Engineer

FieldAI • Irvine, CA

About The Position

Our Field Foundation Model (FFM) powers a global fleet of autonomous robots that capture massive streams of multimodal data across diverse, dynamic environments every day. As part of the Insight Team, our mission is to transform this raw data into actionable insights that help our customers and engineers deliver value. The Field-insight Foundation Model (FiFM) is at the core of that work. As a Senior Machine Learning Platform Engineer, you will own the infrastructure that powers FiFM, from model hosting and distributed training pipelines to data systems, observability, and security.

This role sits at the intersection of systems engineering and machine learning. You'll design and operate large-scale ML platforms, ensure FiFM transitions smoothly from research into production, and optimize for both performance and cost across cloud and edge. In addition to building core infrastructure, you'll play a leadership role by mentoring junior engineers, setting technical direction, and raising the engineering bar across the team.

Requirements

Bachelor's/Master's in Computer Science, Engineering, or related field (or equivalent experience).
4+ years of industry experience in ML infrastructure or platform engineering.
Strong coding skills in Python/TypeScript and a strong foundation in software engineering best practices.
Proven experience with distributed systems, cloud platforms (AWS preferred), containerization and orchestration (Docker, Kubernetes/EKS, Ray), and serverless architectures.
Hands-on experience building ML pipelines for distributed training and large-scale inference.
Strong knowledge of data management at scale, including preprocessing and retrieval of video/image datasets.
Proficiency with CI/CD pipelines, infrastructure-as-code (Terraform, CloudFormation), and automation.
Familiarity with MLOps tools (MLflow, Kubeflow, Airflow).
Experience with system monitoring and observability in production.

Nice To Haves

Experience with vector databases (OpenSearch, Pinecone, Weaviate) for indexing and retrieval.
Familiarity with distributed training frameworks (Horovod, DDP/FSDP, DeepSpeed, Ray).
Hands-on experience with GPU orchestration and auto-scaling (Karpenter, SageMaker, EKS).
Experience with agentic AI deployment workflows, orchestration frameworks, and retrieval-augmented generation.
Strong knowledge of security and compliance in ML and cloud environments.

Responsibilities

Design and manage scalable ML infrastructure with IaC tools (Terraform, CloudFormation).
Develop and optimize cloud-based pipelines for training, evaluation, and inference on multimodal datasets.
Build and operate data systems for large-scale video ingestion, indexing, and storage.
Maintain MLOps workflows for versioning, experiment tracking, reproducibility, and CI/CD.
Ensure reliability and observability with monitoring, logging, and alerting.
Collaborate with AI/ML Engineers to productionize workflows.
Optimize infrastructure for performance and cost across cloud and edge.
Enforce best practices in security, compliance, and maintainability.
Mentor and manage junior engineers, providing technical guidance and career development.
