Scale AI • Posted 2 months ago
$176,000 - $255,000/Yr
Full-time • Entry Level
Seattle, WA

About the position

Scale works with the industry’s leading AI model labs to provide high-quality data and accelerate progress in GenAI research. We are dedicated to advancing the science of data for generative AI. We develop innovative techniques for hybrid data generation and data quality assessment, ensuring high-quality and diverse datasets to drive the next generation of AI capabilities.

We are looking for Research Scientists and Research Engineers to advance the science of data and tackle challenges in data generation, quality assessment, and data selection for large-scale AI models. In this role, you will research and develop methodologies for synthetic and hybrid data generation, data quality and diversity analysis, and annotator behavior modeling. You will collaborate with researchers and engineers to define best practices in data-driven AI development. You will also partner with top foundation model labs to provide both technical and strategic input on the development of the next generation of generative AI models.

Responsibilities

  • Develop and refine synthetic and hybrid (with human-in-the-loop) data generation methods to enhance model training.
  • Design and implement data quality frameworks, including data diversity analysis, data selection strategies, and detection of reward hacking.
  • Collaborate with internal teams and external partners to establish best practices for high-quality AI datasets.
  • Publish research findings in top-tier AI conferences and contribute to open-source data quality initiatives.

Requirements

  • Ph.D., Master's degree, or equivalent experience in Computer Science, Machine Learning, AI, or a related field.
  • Strong background in deep learning, LLMs, and data-centric AI methodologies.
  • Experience in synthetic data generation, data selection, reward hacking detection, human-in-the-loop data orc, and annotator behavior research.
  • Proficiency in Python and ML frameworks such as PyTorch or TensorFlow.
  • Excellent written and verbal communication skills.
  • Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals.
  • Previous experience in a customer-facing role.

Benefits

  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • Learning and development stipend
  • Generous PTO
  • Commuter stipend (may be eligible)