About The Position

We are seeking an experienced, technically strong, impact-driven expert in ML training infrastructure with a demonstrated ability to lead through hands-on technical work. In this role, you will define the technical direction and drive the design and development of scalable, reliable, high-performance AI/ML platform infrastructure that supports advanced AI research and model development at scale. As a Staff ML Engineer, you will operate as a technical leader across initiatives, partnering closely with machine learning engineers, research scientists, and platform teams to shape architecture, drive major technical decisions, and deliver state-of-the-art AI infrastructure that enables the future of intelligent driving technologies across General Motors vehicles.

Requirements

  • Bachelor's degree or higher in Computer Science or a related field, or equivalent practical experience.
  • 7+ years of professional software engineering experience.
  • 5+ years of specialized experience in AI/ML infrastructure, such as enabling distributed training for large-scale ML models.
  • Strong programming skills in Python, with deep proficiency in frameworks such as PyTorch (preferred), TensorFlow, or similar ML systems.
  • Proven experience designing and operating distributed systems for ML training, including distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure).
  • Demonstrated track record of leading technically ambiguous, cross-team infrastructure initiatives and driving them to measurable impact.
  • Strong architectural judgment and ability to make sound technical tradeoffs across performance, reliability, usability, and cost.
  • Willingness to travel to Sunnyvale, CA as needed.
  • Comfortable operating in highly ambiguous and dynamic environments.

Nice To Haves

  • Deep expertise in PyTorch 2.x or later and distributed training frameworks.
  • Experience designing and developing training platforms that support FSDP, pipeline parallelism, and other scalable approaches to training large foundation models.
  • Experience profiling, analyzing, debugging, and optimizing training and data loading performance at scale.
  • Strong record of technical leadership through architecture reviews, roadmap influence, and cross-team execution.
  • Excellent communication skills, with the ability to build consensus, navigate controversial decisions, communicate risks clearly, and provide constructive technical feedback.
  • Self-directed, execution-oriented, and motivated by delivering broad organizational impact.

Responsibilities

  • Define and drive the architecture, design, and development of scalable, reliable, and high-performance ML frameworks and platform capabilities to support model training at scale.
  • Lead model training performance analysis and optimization efforts across distributed training workflows, improving scalability, efficiency, and cost across heterogeneous hardware environments.
  • Raise the bar on system observability, debuggability, operational excellence, and developer experience across the ML training stack.
  • Own large, ambiguous, cross-functional technical initiatives from strategy through execution, including technical roadmap definition, tradeoff analysis, and delivery.
  • Influence platform direction by identifying long-term infrastructure investments, setting engineering standards, and driving adoption of best practices across teams.
  • Collaborate across organizational boundaries to align requirements, resolve technical disagreements, and integrate new capabilities into the platform ecosystem.
  • Mentor engineers through design reviews, technical guidance, and hands-on partnership, while elevating engineering quality across the team.

Benefits

  • Medical
  • Dental
  • Vision
  • Health Savings Account
  • Flexible Spending Accounts
  • Retirement savings plan
  • Sickness and accident benefits
  • Life insurance
  • Paid vacation and holidays
  • Tuition assistance programs
  • Employee assistance program
  • GM vehicle discounts
  • Relocation benefits
  • Company vehicle evaluation program