About The Position

This role offers the opportunity to influence the quality and effectiveness of AI models by leading the evaluation, curation, and optimization of large-scale datasets. You will design and implement statistical and machine learning methods to ensure datasets are diverse, representative, and high-impact. Collaborating closely with research and engineering teams, you will define data quality standards, develop assessment frameworks, and create tools to automate preprocessing and validation. This position combines hands-on technical work with strategic guidance, allowing you to shape the future of AI model performance. Ideal candidates are independent problem-solvers with strong analytical skills, experience with unstructured datasets, and a passion for data-centric AI research. This is a fully remote role in a fast-paced, collaborative, and high-trust environment.

Requirements

  • PhD, or a Master’s degree with 4+ years of industry experience, in machine learning, computer science, statistics, mathematics, engineering, or a related quantitative field.
  • Strong understanding of AI model training pipelines, including data preprocessing and evaluation.
  • Experience with large, unstructured datasets, especially text-based data.
  • Background in statistical analysis, bias detection, and data validation.
  • Proven ability to identify high-impact problems and independently implement solutions.
  • Excellent collaboration and communication skills for cross-functional teamwork.

Nice To Haves

  • Experience with synthetic data generation or augmentation.
  • Open-source contributions.
  • Publications in data-centric AI.
  • Experience developing evaluation frameworks or performance metrics.

Responsibilities

  • Lead the evaluation, curation, and optimization of large-scale datasets used for AI model training.
  • Design and apply statistical and machine learning techniques to filter, enrich, and assess data quality.
  • Develop frameworks to measure data diversity, duplication, and informativeness, reducing risks in training datasets.
  • Collaborate with model training teams to identify bottlenecks and optimize dataset performance.
  • Provide leadership on data quality strategy and define internal best practices.
  • Evaluate external datasets for integration, focusing on scalability, quality, and relevance, and develop data scorecards.
  • Contribute to the research and development of tools for automated data preprocessing, validation, and enrichment.

Benefits

  • Competitive compensation package with performance-based incentives.
  • Fully remote and flexible working arrangements.
  • Opportunity to work on high-impact AI projects with cutting-edge datasets.
  • Collaborative, high-trust culture emphasizing velocity, impact, and experimentation.
  • Direct influence on AI model performance and data quality strategy.
  • Exposure to advanced research and applied machine learning initiatives.