Senior Engineering Manager, ML Platform

Boston Dynamics, Waltham, MA
$198,000 - $300,000

About The Position

We're looking for a Senior Engineering Manager to lead our ML Platform Team - a growing team responsible for the foundational infrastructure that powers our machine learning work. This is a player-coach role: you'll set technical direction and contribute hands-on while building out the team and establishing the processes that will scale with it. The platform is in its early stages, with some foundations in place. You'll be joining at a pivotal moment - making architectural decisions that will shape how the team and the platform grow from 4 engineers today to a team of 10–12.

Requirements

  • 7–12 years of engineering experience, with at least 2–3 years in a formal management or tech lead capacity
  • Demonstrated experience building or scaling a platform, infrastructure, or ML systems team from the ground up
  • Technical credibility in one or more of: GPU/distributed compute infrastructure, large-scale data storage and retrieval, or data pipeline frameworks
  • Experience making foundational architectural decisions in an early-stage or greenfield environment
  • Strong cross-functional communication skills - able to translate between ML researchers, engineers, and senior leadership
  • Comfortable with ambiguity; able to define the roadmap rather than just execute against one
  • A hands-on mindset - willing and able to write code, review designs, and debug production issues alongside your team

Nice To Haves

  • Familiarity with compute orchestration frameworks such as Kubernetes, Slurm, or Ray
  • Experience with ML training workflows, dataset generation pipelines, or feature stores
  • Prior experience growing a team through a hiring ramp (e.g. doubling or tripling headcount)

Responsibilities

  • Own the strategy, roadmap, and execution for GPU compute infrastructure, ensuring it scales to meet growing model training and fine-tuning demands
  • Contribute directly to infrastructure design and implementation, particularly in the near term as the team grows
  • Drive reliability, performance, and cost efficiency across distributed training clusters
  • Optimize existing and new training workloads to achieve scale
  • Evaluate and adopt new hardware (GPUs, TPUs, custom accelerators) and cloud/on-prem infrastructure as the team's needs evolve
  • Oversee the design and operation of data storage, indexing, and retrieval systems that support large-scale dataset generation
  • Ensure data pipelines are performant, fault-tolerant, and meet the quality and freshness requirements of ML teams
  • Establish early-stage standards for data access, lineage, and governance - pragmatic and scalable, not over-engineered
  • Lead the development and maintenance of shared libraries and frameworks for data transformation pipelines
  • Partner with ML researchers and engineers to understand their workflows and translate them into reliable, reusable platform capabilities
  • Champion developer productivity - reduce friction for teams consuming platform services
  • Lay the architectural foundations of the platform, making decisions that are pragmatic today but designed to scale to a 10–12 person team and beyond
  • Make key architectural decisions around compute orchestration (e.g. Kubernetes, Slurm, Ray), storage systems, and pipeline frameworks
  • Balance short-term delivery with long-term platform health - knowing when to build, buy, or borrow
  • Act as a technical partner to ML research, data engineering, and product teams - translating needs into platform priorities
  • Communicate roadmap, incidents, and technical tradeoffs clearly to both engineers and senior leadership
  • Help ML teams become self-sufficient on the platform, reducing bottlenecks on the platform team itself
  • Actively participate in hiring to grow the team from 4 to ~10–12 engineers, including defining roles and leveling
  • Mentor and develop engineers, establishing a team culture early that will hold as headcount scales
  • Define lightweight but durable team processes - on-call rotations, incident response, and engineering standards that won't need to be rebuilt at scale
  • Be comfortable doing IC work yourself while simultaneously building the team's capacity to take it on

Benefits

  • medical
  • dental
  • vision
  • 401(k)
  • paid time off
  • annual bonus structure