Senior Research Scientist - Machine Learning System

Bytedance
San Jose, CA
3d
$136,800 - $359,720

About The Position

The Machine Learning (ML) System sub-team combines systems engineering and the art of machine learning to develop and maintain massively distributed ML training and inference systems and services around the world, providing high-performance, highly reliable, scalable systems for LLM/AIGC/AGI. On our team, you'll have the opportunity to build large-scale heterogeneous systems that integrate GPU/NPU/RDMA/storage and keep them running stably and reliably, deepen your expertise in coding, performance analysis, and distributed systems, and take part in the decision-making process. You'll also be part of a global team with members from the United States, China, and Singapore working collaboratively toward a unified project direction.

Requirements

  • Bachelor's degree or above in computer science, electronics, automation, software engineering, or a related field.
  • Proficient in C/C++, algorithms, and data structures; familiar with Python.
  • Understanding of the basic principles of deep learning algorithms, familiarity with the basic architecture of neural networks, and understanding of deep learning training frameworks such as PyTorch.

Nice To Haves

  • Proficient in GPU high-performance computing optimization with CUDA; in-depth understanding of computer architecture; familiar with parallel computing optimization, memory access optimization, low-bit computing, etc.
  • Familiar with TensorRT-LLM, ORCA, vLLM, etc.
  • Knowledge of LLMs; experience in accelerating and optimizing LLMs is preferred.

Responsibilities

  • Responsible for developing and optimizing the LLM inference framework.
  • Responsible for GPU and CUDA performance optimization to create an industry-leading, high-performance LLM inference engine.

Benefits

  • Employees have day-one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, and wellbeing benefits, among others.
  • Employees also receive 10 paid holidays per year, 10 paid sick days per year and 17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure).