About The Position

We are seeking a curious, motivated Data Engineer Intern to support the design, development, and optimization of modern data platforms that power analytics, experimentation, and emerging AI-driven workflows. This internship provides hands-on exposure to cloud data ecosystems such as Databricks, Snowflake, AWS, Azure, and GCP, with opportunities to contribute to real-world ETL/ELT pipelines, data transformations, and AI agent–enabled use cases. As an intern, you will work closely with experienced data engineers, analytics teams, and AI practitioners to learn how reliable data foundations enable intelligent agents, automation, and data-driven decision-making, while gaining practical experience in scalable, privacy-aware data engineering.

Internship Details

  • The internship will begin June 15th, 2026.
  • Interns are expected to work on-site in Conway, Arkansas.
  • Expected commitment is 20-25 hours/week during the semester and up to 40 hours/week during breaks and summer.
  • Anticipated graduation date between December 2026 and May 2027.

Requirements

  • Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, Information Systems, or a related field.
  • Basic proficiency in SQL, including simple joins, aggregations, and filtering.
  • Familiarity with Python for scripting, data manipulation, or coursework projects.
  • Introductory understanding of data engineering concepts, such as ETL/ELT, data lakes, and data warehouses.
  • Exposure to at least one cloud platform (AWS, Azure, or GCP) through coursework, labs, or personal projects.
  • Interest in AI, machine learning, or intelligent systems, especially how they depend on high-quality data.
  • Strong willingness to learn, ask questions, and collaborate in a team environment.
  • Clear written and verbal communication skills with attention to detail.

Nice To Haves

  • Academic or personal project experience with Databricks, Snowflake, or BigQuery.
  • Exposure to Apache Spark, dbt, or workflow orchestration tools.
  • Familiarity with common data formats such as Parquet, JSON, Avro, or Delta Lake.
  • Basic understanding of streaming vs. batch processing concepts.
  • Coursework or projects involving AI agents, LLMs, or ML pipelines, such as:
      • Using agents to query data or generate insights
      • Automating data-related tasks with AI-assisted workflows
  • Awareness of data privacy concepts (e.g., PII, GDPR, CCPA), even at a conceptual level.
  • Experience working in GitHub or similar version control systems.

Responsibilities

  • Assist in building and maintaining batch and streaming data pipelines using tools such as Spark, Databricks, Snowflake, and cloud-native services.
  • Support the development of ETL/ELT workflows using orchestration tools like Apache Airflow, dbt, or managed cloud schedulers.
  • Help ingest structured and semi-structured data from sources such as S3, ADLS, GCS, APIs, or Kafka into raw and curated data layers.
  • Write and maintain SQL and Python-based transformations for cleaning, joining, and aggregating datasets.
  • Participate in implementing data quality checks, validation rules, and basic monitoring to ensure data accuracy and reliability.
  • Collaborate with data engineers, analysts, and data scientists to understand how datasets are consumed by analytics models and AI agents.
  • Assist in preparing datasets and feature tables that can be used by AI/ML pipelines or autonomous agents for decision-making and automation.
  • Explore how AI agents can interact with data platforms (e.g., querying data, triggering pipelines, summarizing results) under guidance from senior team members.
  • Contribute to documentation of data flows, schemas, and pipeline logic to support team knowledge sharing.
  • Learn and follow data modeling, governance, and privacy best practices, especially in regulated or privacy-conscious environments.
  • Support version control and deployment processes using Git and basic CI/CD workflows.