About The Position

Meta is seeking Research Engineers to join the Post-Training team within Meta Superintelligence Labs. High-quality data is the engine of AI progress at MSL, determining the capabilities we can unlock and how fast our models improve. As a Research Engineer on this team, you will build the pipelines to collect, generate, and refine the post-training data for our most advanced AI models. You'll work alongside world-class researchers and engineers to develop scalable systems for both human-in-the-loop data collection and automated synthetic data generation.

This is a highly technical role requiring practical research engineering skills and the ability to work independently on a variety of open-ended machine learning challenges with high reliability. The data pipelines you build will directly impact the major model lines within MSL, making engineering reliability, rigor, and scalability paramount. You will excel by maintaining high velocity while adapting to rapidly shifting priorities. You'll tackle a wide variety of problems, from sourcing high-value expert data (STEM, finance, legal, health) to building custom environments that capture multi-step agentic trajectories (search, coding, computer use agents, shopping agents).

If you are passionate about building the data engine that drives AI progress and thrive in fast-paced, high-impact research environments, we encourage you to apply for this exciting opportunity at the core of MSL.

Requirements

  • Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or has equivalent practical experience. The degree must be completed prior to joining Meta
  • Bachelor's or Master's degree in Computer Science, Machine Learning, or a related technical field
  • 1+ years of experience in machine learning engineering, machine learning research, or a related technical role
  • Proficiency in Python and experience with ML frameworks such as PyTorch
  • Experience identifying, designing, and completing medium to large technical features independently, without guidance
  • Demonstrated experience with software engineering practices, including version control, testing, and code review
  • Ability to work independently and adapt to rapidly changing priorities

Nice To Haves

  • Publications at peer-reviewed venues (NeurIPS, ICML, ICLR, ACL, EMNLP, or similar) related to deep learning, language models, or data-centric AI
  • Hands-on experience with language model post-training systems, synthetic data generation, or building RLHF pipelines
  • Experience implementing or developing environments for agentic workflows (e.g., tool use, web browsing environments, coding sandboxes)
  • Experience working with large-scale distributed systems and high-throughput data pipelines
  • Familiarity with data quality filtering, deduplication, and contamination checking for LLMs
  • Track record of open-source contributions to ML infrastructure or datasets

Responsibilities

  • Design, build, and scale full-stack data collection pipelines for post-training (SFT, RLHF) across text, vision, and action modalities
  • Develop and implement environments to capture complex agentic trajectories, including computer use agents, deep research workflows, UI generation, and shopping agents
  • Collaborate with external data vendors and domain experts to source, securely ingest, and prepare high-quality datasets in fields like STEM, finance, legal, and health
  • Execute on the technical vision of research scientists to generate and filter high-quality synthetic data at scale
  • Build robust, reusable data processing pipelines that scale across multiple model lines and product areas
  • Contribute to tooling that measures and ensures the quality, diversity, and safety of post-training datasets

Benefits

  • bonus
  • equity
  • benefits