Data Engineering Internship

Auditoria.AI
Santa Clara, CA

About The Position

We're scaling an AI-native enterprise SaaS platform that powers agentic automation for corporate finance teams at Fortune 500 companies. As a Data Engineering Intern, you'll build the data infrastructure that makes our agents work: clean, well-modeled, LLM-ready data flowing from customer ERPs into Snowflake, through our semantic layer, and into the retrieval pipelines that ground every decision our agents make. You'll work across the modern data stack and implement medallion architecture patterns that serve both operational systems and AI/ML workloads.

Requirements

  • Pursuing (or having recently completed) a Bachelor's or Master's degree in Computer Science, Data Engineering, Statistics, or a related field
  • Solid SQL skills: joins, window functions, and a basic grasp of how to read a query plan (a short window-function sketch follows this list)
  • Hands-on experience with at least one relational database (MySQL, Postgres, or similar) through coursework, projects, or prior internships
  • Comfortable writing Python for data processing and scripting
  • Genuine interest in LLMs and AI systems: you've played with OpenAI/Anthropic APIs, built a RAG project, or thought seriously about how data shape affects model behavior
  • Excellent communication: you can explain what you built and why
  • Authorized to work in the United States without the need for future sponsorship
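
For a sense of the SQL fluency we mean, here is a minimal window-function sketch using Python's built-in sqlite3 module; the invoices table and its columns are hypothetical, not our schema:

```python
import sqlite3

# In-memory database with a hypothetical invoices table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (vendor TEXT, invoice_date TEXT, amount REAL);
    INSERT INTO invoices VALUES
        ('Acme',   '2024-01-05', 1200.00),
        ('Acme',   '2024-02-10',  800.00),
        ('Globex', '2024-01-20', 2500.00),
        ('Globex', '2024-03-02',  400.00);
""")

# Window function: running total of spend per vendor, ordered by invoice date.
rows = conn.execute("""
    SELECT vendor,
           invoice_date,
           amount,
           SUM(amount) OVER (
               PARTITION BY vendor
               ORDER BY invoice_date
           ) AS running_spend
    FROM invoices
    ORDER BY vendor, invoice_date;
""").fetchall()

for row in rows:
    print(row)
```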

Nice To Haves

  • Exposure to Snowflake, BigQuery, or Databricks
  • Experience with dbt, Airflow, or another orchestration/transformation tool
  • Experience with vector databases (Pinecone, Weaviate, pgvector, Snowflake Cortex Search) or embedding workflows
  • Understanding of dimensional modeling (star/snowflake schemas); a small illustration follows this list
  • Any prior internship or substantive personal project in data engineering
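
As a rough illustration of dimensional modeling, here is a minimal star-schema sketch, again in Python/sqlite3; the fact and dimension tables and their names are hypothetical:

```python
import sqlite3

# Hypothetical star schema (names are illustrative): one fact table keyed to
# two dimension tables, queried with joins and a GROUP BY roll-up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_vendor (vendor_key INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE dim_date   (date_key   INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_invoice (
        vendor_key INTEGER REFERENCES dim_vendor(vendor_key),
        date_key   INTEGER REFERENCES dim_date(date_key),
        amount     REAL
    );

    INSERT INTO dim_vendor VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO dim_date   VALUES (202401, '2024-01'), (202402, '2024-02');
    INSERT INTO fact_invoice VALUES (1, 202401, 1200.0), (1, 202402, 800.0),
                                    (2, 202401, 2500.0);
""")

# Typical star-schema query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT d.month, v.vendor_name, SUM(f.amount) AS total_spend
    FROM fact_invoice f
    JOIN dim_vendor v ON v.vendor_key = f.vendor_key
    JOIN dim_date   d ON d.date_key   = f.date_key
    GROUP BY d.month, v.vendor_name
    ORDER BY d.month, v.vendor_name;
""").fetchall()

for row in rows:
    print(row)
```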

Responsibilities

  • Building ingestion pipelines from customer ERPs and finance systems into our data warehouse
  • Writing transformations in our Bronze/Silver/Gold medallion architecture, with an eye toward making data LLM-ready: well-named, well-typed, well-documented, and semantically meaningful (a rough sketch follows this list)
  • Extending the semantic layer that powers natural-language analytics; this is what lets non-technical finance users ask questions and get grounded answers
  • Preparing and structuring data for retrieval, embeddings, vector search, and context assembly for RAG pipelines that feed our agents
  • Implementing data quality checks, lineage, and monitoring so agents never act on bad data
  • Tuning queries and warehouse usage for both cost and latency
  • Contributing to technical documentation and participating in code reviews
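
To make the transformation and data-quality bullets concrete, here is a minimal Bronze-to-Silver sketch in plain Python; every field name and validation rule is an illustrative assumption, not our actual pipeline:

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable

# Hypothetical bronze -> silver step: raw ERP rows arrive as loosely typed
# dicts; the silver layer gets well-named, well-typed records, and rows that
# fail basic quality checks are quarantined instead of flowing downstream.

@dataclass
class SilverInvoice:
    invoice_id: str
    vendor_name: str
    invoice_date: date
    amount_usd: float

def to_silver(bronze_rows: Iterable[dict]) -> tuple[list[SilverInvoice], list[dict]]:
    accepted: list[SilverInvoice] = []
    rejected: list[dict] = []
    for row in bronze_rows:
        try:
            record = SilverInvoice(
                invoice_id=str(row["inv_no"]).strip(),
                vendor_name=str(row["vendor"]).strip(),
                invoice_date=date.fromisoformat(row["inv_dt"]),
                amount_usd=float(row["amt"]),
            )
            # Basic quality checks: required identifier present, positive amount.
            if not record.invoice_id or record.amount_usd <= 0:
                raise ValueError("failed quality check")
            accepted.append(record)
        except (KeyError, ValueError) as exc:
            # Keep the bad row and the reason so it can be monitored and fixed.
            rejected.append({"row": row, "error": str(exc)})
    return accepted, rejected

good, bad = to_silver([
    {"inv_no": "INV-001", "vendor": "Acme ", "inv_dt": "2024-01-05", "amt": "1200.00"},
    {"inv_no": "", "vendor": "Globex", "inv_dt": "2024-13-99", "amt": "-5"},
])
print(len(good), "accepted;", len(bad), "rejected")
```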