Member of Technical Staff, Synthetic Data

Recruiting From Scratch
San Francisco, NY
2d
$150,000 - $350,000
Onsite

About The Position

Our client is a fast-growing, venture-backed AI startup building next-generation infrastructure to enable more capable and reliable AI agents in enterprise environments. Backed by top-tier investors and operating at the frontier of AI, the team focuses on creating highly realistic synthetic environments and data that allow agents to train, evaluate, and operate in complex, real-world scenarios. This is a small, deeply technical team with a strong culture of ownership, speed, and precision. They are scaling quickly and tackling problems that sit at the intersection of data engineering, simulation, and applied AI research.

As a Member of Technical Staff focused on Synthetic Data, you will play a critical role in building and owning the company’s synthetic data and simulation pipelines. This role is data engineering–first, with meaningful exposure to applied research.

Requirements

  • 2+ years of experience in data engineering, or a recent graduate with substantial internship or research experience
  • Strong programming skills in Python and SQL
  • Experience building or maintaining data pipelines in production environments
  • Comfort working with ambiguity and rapidly evolving requirements
  • A strong sense of ownership and the ability to move quickly without sacrificing quality

Nice To Haves

  • Experience with synthetic data, simulations, or related research
  • Open-source contributions or published research
  • Experience working at early-stage or fast-growing startups
  • Familiarity with AI systems, agent workflows, or applied ML infrastructure
  • Experience working with large-scale datasets and distributed systems

Responsibilities

  • Design, build, and maintain end-to-end data engineering pipelines, including ingestion, transformation, storage, and schema evolution
  • Develop large-scale synthetic data pipelines that simulate realistic enterprise environments, including logs, usage data, and operational metrics
  • Work with massive datasets (tens of millions of rows) and optimize systems for scalability, performance, and cost
  • Create realistic, noisy, and imperfect data that mirrors real enterprise complexity and inconsistency
  • Ensure cross-system consistency so AI agents can navigate multi-application workflows
  • Read, interpret, and apply research related to simulation and synthetic data generation
  • Collaborate closely with a highly technical team to iterate quickly and ship impactful infrastructure

Benefits

  • Competitive equity package