Compute Platform Lead

Merlin Labs•Boston, MA

1d•Onsite

About The Position

Merlin is a venture-backed aerospace startup building a non-human pilot to enable both reduced crew and uncrewed flight. Backed by some of the world's leading investors, Merlin is scaling alongside our customers to begin leveraging autonomy today to solve some of aviation's biggest challenges.

You are a senior platform engineer who understands that autonomous flight demands uncompromising reliability, real-time performance, and scalable infrastructure. You have built and owned compute platforms that power complex, mission-critical systems, spanning cloud ML training, high-fidelity simulation, and edge or embedded environments. You think holistically about architecture, balancing determinism, latency, scalability, safety, and cost. You are not looking to maintain infrastructure; you want to define it. You operate as a technical authority, partnering closely with autonomy, perception, controls, and flight software leaders to ensure the compute foundation enables rapid development and safe, scalable deployment. You thrive in environments where the platform you design directly impacts real-world performance. You are comfortable setting long-term technical direction while remaining hands-on in the most complex systems challenges.

Requirements

10+ years of experience building large-scale distributed or high-performance compute systems.
Proven ownership of production infrastructure supporting autonomous systems, robotics, aerospace, or ML-heavy platforms.
Deep expertise in distributed systems design, networking, and systems performance optimization.
Experience architecting infrastructure for GPU-based training, simulation, or real-time compute workloads.
Strong programming background in C++, Go, or Python, with comfort operating close to systems layers.
Experience bridging cloud infrastructure and edge/embedded compute environments.
Track record of leading complex cross-functional technical initiatives.
Ability to operate with autonomy in a fast-scaling, high-ambiguity startup environment.

Responsibilities

End-to-end compute architecture spanning cloud ML training, simulation clusters, data pipelines, and edge/onboard systems.
Infrastructure that supports large-scale distributed training, high-fidelity simulation (SIL/HIL), and autonomy validation workflows.
GPU/accelerator orchestration, workload scheduling, and performance optimization for compute-intensive autonomy systems.
Design decisions that balance determinism, latency, safety, scalability, and cost.
Platform reliability standards for systems that ultimately support flight-critical software.
Long-term roadmap for scaling compute infrastructure as fleet size, simulation fidelity, and ML workloads grow.
Technical mentorship and bar-raising across platform and infrastructure engineering.