Senior ML Ops Engineer (Machine Learning Infrastructure)

Parallel Systems
Los Angeles, CA
Hybrid

About The Position

Parallel Systems is pioneering autonomous battery-electric rail vehicles designed to transform freight transportation by shifting portions of the $900 billion U.S. trucking industry onto rail. Our innovative technology offers cleaner, safer, and more efficient logistics solutions. Join our dynamic team and help shape a smarter, greener future for global freight.

Parallel Systems is seeking an experienced MLOps/ML Infrastructure Engineer to lead the design and development of the scalable systems that power our autonomy and perception pipelines. As we build the first fully autonomous, battery-electric rail vehicles, you will play a critical role in enabling the ML teams to develop, train, and deploy models efficiently and reliably in both R&D and real-world environments.

This is an opportunity to take full ownership of the ML infrastructure stack, from distributed training environments and experiment tracking to deployment and monitoring at scale. You’ll collaborate closely with world-class engineers in autonomy, robotics, and software, helping shape the core systems that make real-time, safety-critical ML possible. If you’re driven by building robust platforms that unlock innovation in AI and robotics, we’d love to work with you.

This role requires at least one week per month in our LA office.

Requirements

  • Bachelor’s or higher degree in Computer Science, Machine Learning, or a relevant engineering discipline.
  • 5+ years of experience building large-scale, reliable systems; 2+ years focused on ML infrastructure or MLOps.
  • Proven experience architecting and deploying production-grade ML pipelines and platforms.
  • Strong knowledge of the ML lifecycle: data ingestion, model training, evaluation, packaging, and deployment.
  • Hands-on experience with MLOps tools (e.g., MLflow, Kubeflow, SageMaker, Airflow, Metaflow, or similar).
  • Deep understanding of CI/CD practices applied to ML workflows.
  • Proficiency in Python, Git, and system design with solid software engineering fundamentals.
  • Experience with cloud platforms (AWS, GCP, or Azure) and designing ML architectures in those environments.

Nice To Haves

  • Experience with deep learning architectures (CNNs, RNNs, Transformers) or computer vision.
  • Hands-on experience with distributed training tools (e.g., PyTorch DDP, Horovod, Ray).
  • Background in real-time ML systems and batch inference, including CPU/GPU-aware orchestration.
  • Previous work in autonomous vehicles, robotics, or other real-time ML-driven systems.

Responsibilities

  • Design and implement robust MLOps solutions, including automated pipelines for data management, model training, deployment, and monitoring.
  • Architect, deploy, and manage scalable ML infrastructure for distributed training and inference.
  • Collaborate with ML engineers to gather requirements and develop strategies for data management, model development, and deployment.
  • Build and operate cloud-based systems (e.g., AWS, GCP) optimized for ML workloads in R&D and production environments.
  • Build scalable ML infrastructure to support continuous integration/deployment, experiment management, and governance of models and datasets.
  • Support the automation of model evaluation, selection, and deployment workflows.