About The Position

This role leads core technology R&D for the post-training stage of large language models (LLMs), spanning reward modeling, reinforcement learning, post-training data synthesis, and model evaluation. Detailed responsibilities, requirements, and benefits are listed in the sections below.

Location

US-New York State-New York

Compensation

The expected base pay range for this position in the location(s) listed above is $182,500.00 to $343,200.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience.

Who We Are

Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life for people around the world.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

Requirements

  • Master’s degree or above in Computer Science, Software Engineering, Artificial Intelligence, or a related field.
  • Strong understanding of Transformer architecture and the training principles of large language models.
  • Deep research and hands-on experience in at least one of the following areas: LLM alignment, RLHF, Reward Modeling, or other post-training techniques.
  • Solid foundation in algorithms and strong engineering skills.
  • Proficient in Python and familiar with deep learning frameworks such as PyTorch or TensorFlow.
  • Hands-on experience with distributed training.

Nice To Haves

  • Familiarity with large-scale training and inference frameworks such as Megatron-LM, DeepSpeed, and vLLM is preferred.
  • Experience training or fine-tuning models with tens or hundreds of billions of parameters is a strong plus.
  • Strong research capability. Candidates with publications at top-tier conferences such as NeurIPS, ICLR, ICML, ACL, or EMNLP, or with high-impact contributions to open-source communities such as Hugging Face, will be preferred.
  • Strong passion for technology and self-motivation, with the ability to analyze and solve complex problems, as well as excellent teamwork and communication skills.

Responsibilities

  • Lead core technology R&D for the post-training stage of large language models (LLMs), including the design and optimization of high-quality reward systems.
  • Continuously improve the model’s capabilities in complex instruction following, logical reasoning, and value alignment through Reward Modeling (RM) and Reinforcement Learning (RL) algorithms (an illustrative sketch follows this list).
  • Conduct in-depth research on and optimize post-training algorithms such as RLHF to improve training stability and overall model performance.
  • Take ownership of data synthesis and management in the post-training stage.
  • Design efficient data flywheel mechanisms, leverage techniques such as Supervised Fine-Tuning (SFT) and Self-Instruct to generate high-quality training data, and build a closed-loop signal modeling system that translates user feedback into model iteration.
  • Be responsible for comprehensive evaluation and analysis of post-trained models, establish scientific evaluation metrics, stay up to date with cutting-edge research, and rapidly translate the latest advances into business value.

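As context for the Reward Modeling (RM) work listed above: reward models are commonly trained with a pairwise preference objective that scores a human-preferred response above a rejected one. The snippet below is a minimal, illustrative PyTorch sketch of that Bradley-Terry-style loss; the function and tensor names are hypothetical stand-ins, not this team's actual code.

import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_scores: torch.Tensor,
                     rejected_scores: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: push the reward of the preferred (chosen)
    # response above the reward of the rejected one.
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Dummy scores standing in for reward_model(prompt, response) outputs
# over a batch of 4 preference pairs.
chosen = torch.randn(4, requires_grad=True)
rejected = torch.randn(4, requires_grad=True)
loss = pairwise_rm_loss(chosen, rejected)
loss.backward()  # in practice, gradients flow back into the reward model
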
Benefits

  • Employees hired for this position may be eligible for a sign-on payment, relocation package, and restricted stock units, which will be evaluated on a case-by-case basis.
  • Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life and disability benefits, and participation in the Company’s 401(k) plan.
  • Employees are also eligible for 15 to 25 days of vacation per year (depending on tenure), up to 13 paid holidays throughout the calendar year, and up to 10 days of paid sick leave per year.
  • Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level.
  • Benefits may also be pro-rated for those who start working during the calendar year.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Number of Employees: 5,001-10,000
