About The Position

As a Senior / Principal Inference Engineer on ML Platform, you will build the next generation of ML ecosystem tooling, specifically around model inference. ML Platform today supports billions of requests per day across our homepage, marketplace, economy, and more. We are looking for accomplished engineers to help build ML Platform tooling for high-scale inference in a rapidly evolving space.

Requirements

  • 4+ years of professional experience, with a deep toolkit of system design experience to draw on when building scalable, reliable platforms for all of Roblox.
  • Experience building complex distributed systems that scale to real-time ML inference serving, ideally for real-time recommendation systems serving millions of QPS.
  • Experience debugging complicated infrastructure-level performance issues to enable low-latency, high-throughput inference.
  • Bachelor's degree or higher in Computer Science, Computer Engineering, Data Science, or a similar technical field.

Nice To Haves

  • Passionate about supporting and working cross-functionally with internal partners (Data Scientists and ML Engineers) to understand and meet their needs.
  • A reliability nut: you love digging into tricky postmortems and identifying and fixing weaknesses in complicated systems.
  • Familiarity with ML model inference frameworks such as Triton Inference Server, TensorRT, or KServe.

Responsibilities

  • Set technical strategy and oversee development of reliable infrastructure systems for large-scale inference, especially as we scale up both inference QPS and model size.
  • Dig into performance bottlenecks all along the inference stack, spanning from model optimizations to infrastructure optimizations.
  • Stay abreast of industry trends in machine learning and infrastructure to ensure the adoption of leading-edge technologies and practices.
  • Bootstrap and maintain infrastructure for ML Platform components: Serving Layer, Metadata Store, Model Registry, and Pipeline Orchestrator.
  • Partner across organizations to build tooling, interfaces, and visualizations that make ML@Roblox a delight to use.

What This Job Offers

Job Type: Full-time

Career Level: Senior

Industry: Administrative and Support Services

Number of Employees: 1,001-5,000 employees
