AI Infrastructure Engineer, Distributed Training, Optimus

Tesla
Palo Alto, CA
Posted 1d ago
$124,000 - $420,000

About The Position

As a Software Engineer for the Optimus team, you will build the tools and infrastructure to make and measure improvements to neural network architectures, visualize data, assist with exporting and deploying neural networks to the bot, and evaluate experimental results. You will help us automate the entire workflow of training, validation, and production for Optimus. Most importantly, you will see your work repeatedly shipped to and used by thousands of humanoid robots in real-world applications.

Requirements

  • Practical experience programming in Python and/or C++
  • Proficiency in system-level software, particularly hardware-software interactions and resource utilization
  • Understanding of modern machine learning concepts and state-of-the-art deep learning
  • Experience working with training frameworks, ideally PyTorch
  • Demonstrated experience scaling neural network training jobs across clusters of GPUs

Nice To Haves

  • Previous experience in deep learning deployment
  • Experience profiling and optimizing CPU-GPU interactions (pipelining compute and transfers, etc.)

Responsibilities

  • Build and improve our Python training infrastructure for stable and faster training
  • Build the tooling and infrastructure for reporting and visualizing model metrics and performance
  • Build the pipelines to run and validate our PyTorch models
  • Manage, analyze, and visualize our training and test datasets
  • Coordinate with the team managing the hardware cluster to maintain high availability and job throughput for machine learning workloads
  • Build and improve tooling to deploy trained neural nets to Tesla hardware

Benefits

  • Medical plans, including plan options with a $0 payroll deduction
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental (including orthodontic coverage) and vision plans, both with options that have a $0 paycheck contribution
  • Company-paid Health Savings Account (HSA) contribution when enrolled in the High-Deductible medical plan with HSA
  • Healthcare and Dependent Care Flexible Spending Accounts (FSA)
  • 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
  • Company-paid Basic Life and AD&D insurance
  • Short-term and long-term disability insurance (90 day waiting period)
  • Employee Assistance Program
  • Sick and vacation time (flex time for salaried positions, accrued hours for hourly positions), and paid holidays
  • Back-up childcare and parenting support resources
  • Voluntary benefits, including critical illness, hospital indemnity, accident insurance, theft & legal services, and pet insurance
  • Weight Loss and Tobacco Cessation Programs
  • Tesla Babies program
  • Commuter benefits
  • Employee discounts and perks program

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
