AI Data Ops Lead

Sanas
Palo Alto, CA

About The Position

We're looking for a hands-on AI Data Ops Lead to own the datasets that power our speech and language models, along with the analytics built on them. You'll design and maintain data pipelines, labeling workflows, and dashboards that transform raw multimodal data into actionable insights. This role blends data engineering with analytical depth and is ideal for someone who can write production-grade Python, evaluate dataset quality, and surface trends that shape model development. You'll collaborate with and support Scientists, Data Collection teams, Executives, and external vendors to bring new data sources online, run data collection and labeling, automate data ingestion, and deliver transparent reporting across the AI data lifecycle.

Requirements

  • 3–6 years of experience in data science, data operations, or ML data workflows.
  • Strong programming skills in Python (pandas, NumPy, FastAPI, or similar) and SQL.
  • Proven experience building and maintaining data dashboards (Gradio, Streamlit, Plotly, Dash, Power BI, or similar).
  • Strong data analysis and visualization skills; comfort working with large, complex datasets.
  • Familiarity with databases and cloud data infrastructure (SQL, DynamoDB, AWS Glue, S3, BigQuery, etc.).
  • Excellent communication and documentation skills; ability to thrive in a fast-moving AI environment.

Nice To Haves

  • Experience with speech or audio datasets (e.g., ASR, TTS, voice embeddings, or diarization).
  • Familiarity with data labeling workflows for audio or text.
  • Knowledge of signal processing, spectrogram analysis, or acoustic feature extraction.
  • Experience with data orchestration tools (Dagster, Airflow, etc.).
  • Experience building custom tooling on an as-needed basis (Retool, Replit, etc.).
  • Exposure to dataset versioning, evaluation pipelines, and MLOps principles.
  • Interest in advancing the data foundations of AI research.

Responsibilities

  • Build and maintain internal tools for data collection, labeling, and ingestion.
  • Discover new data sources and prepare them as unified data frames for downstream consumption.
  • Coordinate with multiple stakeholders to ensure timely delivery of high-quality data.
  • Design and operate ETL data pipelines for large-scale audio, text, and metadata.
  • Own data quality: build QA tooling that covers every quality dimension, discover and fix inaccuracies, and feed those findings back into improving the QA tooling.
  • Analyze dataset coverage, diversity, and quality; monitor bias and data drift.
  • Create dashboards and visual reports tracking data distribution, collection throughput, and collection quality.
  • Work cross-functionally to ensure that the data being made available meets our continuously evolving needs.
  • Run a monthly newsletter reporting changes to the data and newly available data sources.
  • Design validation experiments for labeled datasets.
  • Implement automated checks for consistency, completeness, and noise reduction.
  • Support research teams with well-documented, high-integrity datasets.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Publishing Industries

Education Level

No Education Listed

Number of Employees

101-250 employees
