About The Position

This role offers a unique opportunity to influence the foundation of AI model performance by focusing on the quality and structure of training data. You will lead efforts to curate, assess, and optimize large-scale, unstructured datasets that power state-of-the-art AI systems. Working closely with research and engineering teams, you will apply statistical, computational, and ML-driven techniques to improve dataset diversity, representativeness, and overall impact. The position requires a highly analytical and independent thinker who can design frameworks to evaluate and de-risk datasets while contributing to the development of automated data preprocessing and validation tools. This is a highly collaborative role in a fast-moving, high-trust environment that encourages ownership, experimentation, and measurable impact on real-world AI outcomes.

Requirements

  • PhD, or a Master’s degree with 4+ years of industry experience, in machine learning, computer science, statistics, mathematics, engineering, or a related quantitative field.
  • Strong understanding of AI training pipelines, including preprocessing, evaluation, and optimization of datasets.
  • Experience handling large, unstructured datasets, particularly in text-based domains.
  • Background in statistical analysis, bias detection, and data validation methodologies.
  • Ability to identify high-impact problems and independently develop solutions.
  • Excellent collaboration and communication skills, with experience working across technical teams.

Nice To Haves

  • Experience with synthetic data generation, dataset augmentation, or development of evaluation frameworks.
  • Publications or open-source contributions in data-centric AI.

Responsibilities

  • Lead the evaluation, curation, and optimization of large-scale unstructured datasets for AI model training.
  • Design and implement statistical and machine learning methods to assess data quality, diversity, and informativeness.
  • Collaborate with model training teams to identify data bottlenecks and optimize dataset performance.
  • Provide leadership on data quality strategy and establish best practices for dataset assessment.
  • Evaluate external datasets for scalability, relevance, and integration, creating data scorecards as needed.
  • Contribute to R&D of tools that automate data preprocessing, validation, and enhancement processes.
  • Communicate findings and improvements to research, engineering, and cross-functional teams.

Benefits

  • Competitive compensation and performance-based incentives.
  • Fully remote, flexible working environment.
  • Opportunity to work on high-impact AI projects with access to large-scale datasets.
  • Collaborative and high-trust culture emphasizing ownership and velocity.
  • Direct influence on AI model performance and data strategy.
  • Exposure to cutting-edge AI research and applied machine learning projects.
© 2024 Teal Labs, Inc