About The Position

As a Machine Learning Engineer specializing in Data Synthesis, you will architect privacy-preserving data generation pipelines that reduce dependency on external data procurement, accelerate model development, and set a new standard for responsible ML at scale. You'll work at the intersection of cutting-edge generative AI research and production ML systems, collaborating closely with Engineering, Product, Privacy, and Legal teams. In this role, you will shape data strategy for features used by millions while pioneering privacy-first ML practices.

Requirements

  • BS/Master's degree in Computer Science, Engineering, Statistics, or a related quantitative field; equivalent industry experience may be considered.
  • 5+ years of experience driving the design and development of machine learning pipelines as an ML Engineer.
  • Hands-on experience building synthetic data generation systems using modern generative techniques (GANs, VAEs, diffusion models, or LLM-based approaches), with measurable impact on model performance or data cost reduction.
  • Hands-on experience synthesizing time series data at scale.
  • Proficiency in Python and relevant ML frameworks (PyTorch, TensorFlow).
  • Proficiency in Spark, Ray, or other distributed computing technologies for developing pipelines at scale.
  • Proficiency in using industry-standard tools and techniques for statistical testing and data experimentation.
  • Experience with data augmentation across multiple data types (structured, unstructured, and semi-structured).
  • Strong data exploration and analytical skills, with the ability to assess and characterize diverse data assets.
  • Proven ability to collaborate across functions (R&D, Privacy, Legal, Infrastructure) and drive cross-team alignment.

Nice To Haves

  • PhD in Computer Science, Data Science, Statistics, AI/ML, or a related field.
  • Experience with Bayesian or causal graph-based approaches to data generation.
  • Experience identifying low-quality, erroneous, or fraudulent data at scale.
  • Deep familiarity with generative architectures including transformers, diffusion models, and multi-modal systems.
  • Track record of influencing cross-team roadmaps and driving adoption of new tools or infrastructure across organizations.
© 2024 Teal Labs, Inc