Imbue-posted 6 months ago
$170,000 - $350,000/Yr
Mid Level
San Francisco, CA
11-50 employees
Professional, Scientific, and Technical Services

We're a small, cross-functional team building AI systems that reason and code. We care deeply about understanding how people interact with these systems and how we can use data to make them safer, smarter, and more useful. We're looking for a Data Engineer to build and own the pipelines and data infrastructure that power our product and research efforts. Your work will directly support model training, evaluation, product analytics, and safety systems. You'll partner closely with the team members building our coding agents to make sure we're capturing the right signals and using them well. If you're excited about turning messy product data into actionable insights and building systems that scale with our research, we'd love to connect!

What you'll do:
  • Combine synthetic data generation with human annotation platforms to produce high-quality data that advances our product and research roadmap.
  • Design and build resilient, scalable pipelines (ETL and ELT) for batch and streaming data.
  • Develop and maintain infrastructure to support self-serve analytics, experimentation, and dataset generation.
  • Prototype, evaluate, and make "build vs. buy" decisions.
  • Help define and improve data modeling practices across the company, including instrumentation standards, dimensional modeling for analytics, and feature stores for machine learning (ML).
  • Build integrations with ML infrastructure to support training pipelines, inference logging, and model monitoring (MLOps).
  • Debug pipeline failures, automate deployment processes, and improve data quality and reusability.
About you:
  • A strong software engineer with 5+ years of experience, ideally with large-scale data systems.
  • Experienced in designing and maintaining data pipelines and infrastructure, especially for analytics, experimentation, and ML.
  • Comfortable with tools for data orchestration (Airflow, Prefect), batch or streaming processing (Spark, Ray, Flink), and event tracking and analytics (Amplitude, PostHog).
  • Experienced with cloud-based infrastructure and data storage (e.g., S3, GCP, Snowflake, or Redshift), and thoughtful about cost–performance tradeoffs.
  • Exposure to MLOps, model serving infrastructure, or ML workflows.
  • Pragmatic and principled! You know when to optimize and when to ship.
What we offer:
  • Work directly on creating software with human-like intelligence.
  • Generous compensation, equity, and benefits.
  • Budget for self-improvement: coaching, courses, conferences, etc.
  • Actively co-create and participate in a positive, intentional team culture.
  • Spend time learning, reading papers, and deeply understanding prior work.
  • Frequent team events, dinners, off-sites, and hanging out.
  • Compensation packages are highly variable based on a variety of factors.