Research Engineer - AI/RL Infrastructure

Applied Intuition - Sunnyvale, CA
10d - Onsite

About The Position

We are looking for a passionate Research Engineer (AI/RL Infrastructure) to join the Research Group at Applied Intuition. This role is ideal for engineers who design, build, and operate state-of-the-art, large-scale ML systems and enjoy working closely with researchers to develop and accelerate the core platform powering next-generation physical AI systems.

The mission of the Research Group is to create cutting-edge technology enabling next-generation physical AI, with emphasis on the two most challenging applications reshaping everyday life: end-to-end autonomous driving and generalist robots. The group is composed of leading experts from top institutions and companies, recognized for their exceptional academic and industry contributions, including eight Best Paper awards at premier conferences and journals such as CVPR and ICRA. Learn more at appliedintuition.com/research.

Supported by industry-leading tools and infrastructure, researchers can access millions of miles of data from large fleets and deploy the methods they develop into a variety of autonomous and robotic systems, including self-driving cars and trucks, autonomous mining and construction machines, humanoid robots, and dexterous hands. In addition to your research contributions, you will contribute to and learn from best practices in the autonomy and robotics industries within our fast-paced, customer-focused culture. Improvements deployed to our system immediately help our customers with their programs and deliver value to our business.

We are open to all levels of experience as long as the necessary requirements are met, including candidates with Tech Lead or Manager potential; Senior/Staff-level experience is strongly preferred for this role.

Requirements

  • Experience building and operating production-grade software systems across the full machine learning lifecycle, including training, evaluation, data, and deployment
  • Opinions about building a company-wide platform for ML training, evaluation, and deployment
  • Experience with performance engineering and compute acceleration for large-scale ML training, including profiling, bottleneck analysis, and optimization
  • Strong systems-level debugging skills to diagnose and resolve issues in large-scale distributed training, spanning model code, data pipelines, runtimes, and cluster infrastructure
  • Deep familiarity with the open-source ML and systems ecosystem, with judgment on when to adopt open source versus build in-house
  • Hands-on technical experience with PyTorch, CUDA, Ray, Flyte, and Kubernetes (K8s)

Nice To Haves

  • Industry experience in relevant areas (self-driving applications preferred)

Responsibilities

  • Design and build training and evaluation infrastructure to support our current AI research directions, orchestrating massive GPU clusters to process petabytes of multimodal sensor data
  • Build robust benchmarking, continuous evaluation, and regression tracking systems to measure model performance across diverse, long-tail real-world driving distributions
  • Develop large-scale data sampling, dataset generation, and advanced data curation pipelines, leveraging state-of-the-art AI models to power a closed-loop data flywheel
  • Enable high-throughput distributed training across heterogeneous cloud environments, focusing on reliability, efficiency, and cost-aware scaling
  • Collaborate closely with AI research, autonomy, and platform teams to translate cutting-edge research into production-ready systems

Benefits

  • Equity in the form of options and/or restricted stock units
  • Comprehensive health, dental, vision, life, and disability insurance coverage
  • 401(k) retirement benefits with employer match
  • Learning and wellness stipends
  • Paid time off