Member of Technical Staff, Synthetic Data

Recruiting From Scratch•San Francisco, NY

2d•$150,000 - $350,000•Onsite

About The Position

Our client is a fast-growing, venture-backed AI startup building next-generation infrastructure to enable more capable and reliable AI agents in enterprise environments. Backed by top-tier investors and operating at the frontier of AI, the team focuses on creating highly realistic synthetic environments and data that allow agents to train, evaluate, and operate in complex, real-world scenarios. This is a small, deeply technical team with a strong culture of ownership, speed, and precision. They are scaling quickly and tackling problems that sit at the intersection of data engineering, simulation, and applied AI research.

As a Member of Technical Staff focused on Synthetic Data, you will play a critical role in building and owning the company’s synthetic data and simulation pipelines. This role is data engineering–first, with meaningful exposure to applied research.

Requirements

2+ years of experience in data engineering, or a recent degree combined with substantial internship or research experience
Strong programming skills in Python and SQL
Experience building or maintaining data pipelines in production environments
Comfort working with ambiguity and rapidly evolving requirements
A strong sense of ownership and the ability to move quickly without sacrificing quality

Nice To Haves

Experience with synthetic data, simulations, or related research
Open-source contributions or published research
Experience working at early-stage or fast-growing startups
Familiarity with AI systems, agent workflows, or applied ML infrastructure
Experience working with large-scale datasets and distributed systems

Responsibilities

Design, build, and maintain end-to-end data engineering pipelines, including ingestion, transformation, storage, and schema evolution
Develop large-scale synthetic data pipelines that simulate realistic enterprise environments, including logs, usage data, and operational metrics
Work with massive datasets (tens of millions of rows) and optimize systems for scalability, performance, and cost
Create realistic, noisy, and imperfect data that mirrors real enterprise complexity and inconsistency
Ensure cross-system consistency so AI agents can navigate multi-application workflows
Read, interpret, and apply research related to simulation and synthetic data generation
Collaborate closely with a highly technical team to iterate quickly and ship impactful infrastructure