Roblox • Posted 2 days ago
Full-time • Senior
San Mateo, CA
1,001-5,000 employees

ML Platform powers hundreds of use cases and billions of inferences per day across Discovery, Safety, Economy, and Creation. We build the primitives that let teams train, evaluate, deploy, and operate models quickly and safely, so a new ML idea can reach production in weeks or less. We’re looking for a Principal Platform Engineer who treats platform as a product: someone who can turn complex ML/AI infrastructure into clear, durable APIs and easy-to-use CLIs/UIs that our internal developers love. This role blends product thinking, developer experience, backend engineering, and infrastructure at scale. Hands‑on ML experience is a plus; a track record of building internal platforms that developers love is a must.

  • Own platform as a product and set direction end to end: Define requirements, write RFDs, and ship APIs, SDKs, CLIs, and UIs that make ML@Roblox easy to adopt.
  • Bootstrap and maintain core ML Platform components: Serving Layer, Model Registry, Pipeline Orchestrator, and Training/Inference control planes.
  • Set technical strategy and oversee development of high‑scale, reliable infrastructure systems, with clear SLOs for latency, availability, and cost.
  • Design great developer experiences with paved‑road templates, golden paths, opinionated defaults, and clear docs to reduce time‑to‑first‑production.
  • Instrument the platform to measure adoption, friction, reliability, and cost; use data to prioritize roadmap and validate outcomes.
  • Partner across organizations (ML Engineering, Data Science, Infra/SRE, Security, Finance) to optimize performance, safety, and spend, especially for GPU‑intensive training and high‑QPS inference.
  • Propose and implement new platform tooling to improve time to production for MLEs across the full ML lifecycle.
  • Stay abreast of industry trends in machine learning and infrastructure to ensure the adoption of leading‑edge technologies and practices.
  • Mentor junior and senior engineers, lead design reviews, and drive cross‑team architectural decisions that last.
  • 5+ years of professional experience, with a wealth of system design experience to draw on in building a scalable, reliable ML platform for all of Roblox.
  • Proficiency in API design and developer experience—gRPC/REST APIs, SDKs, CLIs, and simple UIs that developers love to use.
  • Experience with the end‑to‑end ML model lifecycle, including model serving, training, model CI/CD, and GPU resource management, and a history of building ML platform features that are delightful to use.
  • Bachelor's degree in Computer Science, Computer Engineering, Data Science, or a similar technical field.
  • A Code Machine; you love not only to design and communicate ideas but also to actually ship product.
  • You obsess over user feedback and constantly drive toward getting platform features into customers' hands.
  • Passionate about supporting ML engineers, understanding their needs, and translating those needs into clean, durable platform abstractions.
  • You're passionate about infrastructure‑as‑code and automating painful manual processes.
  • You push for platform solutions that don’t leave on-call teams carrying the burden of design choices.
  • A clear communicator who excels at both written and verbal communication across differing levels of technical detail.