Data Engineering Internship

Auditoria.AI
Santa Clara, CA

About The Position

We're scaling an AI-native enterprise SaaS platform that powers agentic automation for corporate finance teams at Fortune 500 companies. As a Data Engineering Intern, you'll build the data infrastructure that makes our agents work: clean, well-modeled, LLM-ready data flowing from customer ERPs into Snowflake, through our semantic layer, and into the retrieval pipelines that ground every decision our agents make. You'll work across the modern data stack and implement medallion architecture patterns that serve both operational systems and AI/ML workloads.

Requirements

  • Pursuing (or having recently completed) a Bachelor's or Master's degree in Computer Science, Data Engineering, Statistics, or a related field
  • Solid SQL skills: joins, window functions, and a basic grasp of how to read a query plan (a short window-function sketch follows this list)
  • Hands-on experience with at least one relational database (MySQL, Postgres, or similar) through coursework, projects, or prior internships
  • Comfortable writing Python for data processing and scripting
  • Genuine interest in LLMs and AI systems: you've played with OpenAI/Anthropic APIs, built a RAG project, or thought seriously about how data shape affects model behavior
  • Excellent communication: you can explain what you built and why
  • Authorized to work in the United States without the need for future sponsorship
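
For a sense of the SQL fluency we mean, here is a minimal window-function sketch using Python's built-in sqlite3 module; the invoices table and its columns are hypothetical, not our schema:

```python
import sqlite3

# In-memory database with a hypothetical invoices table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (vendor TEXT, invoice_date TEXT, amount REAL);
    INSERT INTO invoices VALUES
        ('Acme',   '2024-01-05', 1200.00),
        ('Acme',   '2024-02-10',  800.00),
        ('Globex', '2024-01-20', 2500.00),
        ('Globex', '2024-03-02',  400.00);
""")

# Window function: running total of spend per vendor, ordered by invoice date.
rows = conn.execute("""
    SELECT vendor,
           invoice_date,
           amount,
           SUM(amount) OVER (
               PARTITION BY vendor
               ORDER BY invoice_date
           ) AS running_spend
    FROM invoices
    ORDER BY vendor, invoice_date;
""").fetchall()

for row in rows:
    print(row)
```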

Nice To Haves

  • Exposure to Snowflake, BigQuery, or Databricks
  • Experience with dbt, Airflow, or another orchestration/transformation tool
  • Experience with vector databases (Pinecone, Weaviate, pgvector, Snowflake Cortex Search) or embedding workflows
  • Understanding of dimensional modeling (star/snowflake schemas); a small illustration follows this list
  • Any prior internship or substantive personal project in data engineering
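
As a rough illustration of dimensional modeling, here is a minimal star-schema sketch, again in Python/sqlite3; the fact and dimension tables and their names are hypothetical:

```python
import sqlite3

# Hypothetical star schema (names are illustrative): one fact table keyed to
# two dimension tables, queried with joins and a GROUP BY roll-up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_vendor (vendor_key INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE dim_date   (date_key   INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_invoice (
        vendor_key INTEGER REFERENCES dim_vendor(vendor_key),
        date_key   INTEGER REFERENCES dim_date(date_key),
        amount     REAL
    );

    INSERT INTO dim_vendor VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO dim_date   VALUES (202401, '2024-01'), (202402, '2024-02');
    INSERT INTO fact_invoice VALUES (1, 202401, 1200.0), (1, 202402, 800.0),
                                    (2, 202401, 2500.0);
""")

# Typical star-schema query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT d.month, v.vendor_name, SUM(f.amount) AS total_spend
    FROM fact_invoice f
    JOIN dim_vendor v ON v.vendor_key = f.vendor_key
    JOIN dim_date   d ON d.date_key   = f.date_key
    GROUP BY d.month, v.vendor_name
    ORDER BY d.month, v.vendor_name;
""").fetchall()

for row in rows:
    print(row)
```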

Responsibilities

  • Building ingestion pipelines from customer ERPs and finance systems into our data warehouse
  • Writing transformations in our Bronze/Silver/Gold medallion architecture, with an eye toward making data LLM-ready: well-named, well-typed, well-documented, and semantically meaningful (a rough sketch follows this list)
  • Extending the semantic layer that powers natural-language analytics; this is what lets non-technical finance users ask questions and get grounded answers
  • Preparing and structuring data for retrieval, embeddings, vector search, and context assembly for RAG pipelines that feed our agents
  • Implementing data quality checks, lineage, and monitoring so agents never act on bad data
  • Tuning queries and warehouse usage for both cost and latency
  • Contributing to technical documentation and participating in code reviews
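
To make the transformation and data-quality bullets concrete, here is a minimal Bronze-to-Silver sketch in plain Python; every field name and validation rule is an illustrative assumption, not our actual pipeline:

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable

# Hypothetical bronze -> silver step: raw ERP rows arrive as loosely typed
# dicts; the silver layer gets well-named, well-typed records, and rows that
# fail basic quality checks are quarantined instead of flowing downstream.

@dataclass
class SilverInvoice:
    invoice_id: str
    vendor_name: str
    invoice_date: date
    amount_usd: float

def to_silver(bronze_rows: Iterable[dict]) -> tuple[list[SilverInvoice], list[dict]]:
    accepted: list[SilverInvoice] = []
    rejected: list[dict] = []
    for row in bronze_rows:
        try:
            record = SilverInvoice(
                invoice_id=str(row["inv_no"]).strip(),
                vendor_name=str(row["vendor"]).strip(),
                invoice_date=date.fromisoformat(row["inv_dt"]),
                amount_usd=float(row["amt"]),
            )
            # Basic quality checks: required identifier present, positive amount.
            if not record.invoice_id or record.amount_usd <= 0:
                raise ValueError("failed quality check")
            accepted.append(record)
        except (KeyError, ValueError) as exc:
            # Keep the bad row and the reason so it can be monitored and fixed.
            rejected.append({"row": row, "error": str(exc)})
    return accepted, rejected

good, bad = to_silver([
    {"inv_no": "INV-001", "vendor": "Acme ", "inv_dt": "2024-01-05", "amt": "1200.00"},
    {"inv_no": "", "vendor": "Globex", "inv_dt": "2024-13-99", "amt": "-5"},
])
print(len(good), "accepted;", len(bad), "rejected")
```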