Intern

Koantek
Chesterfield, MO

About The Position

This internship is designed for aspiring data professionals who want to build high-performance data pipelines within the Databricks Lakehouse ecosystem. You will move beyond theoretical knowledge to help build, scale, and optimize real-world ETL/ELT processes using Spark.

Requirements

  • Proficiency in Python (specifically for data manipulation) and strong SQL skills.
  • Familiarity with the basics of Apache Spark (DataFrames, RDDs) and distributed computing concepts (a short PySpark sketch follows this list).
  • Basic understanding of cloud platforms like Azure, AWS, or GCP (where Databricks typically resides).
  • Experience using Git for collaborative code management.
  • A "builder" mindset with a strong desire to automate manual processes and debug complex data flows.
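
To make the Spark items above concrete, here is a minimal PySpark sketch of the kind of DataFrame work this role assumes day to day; the file path and column names (orders.csv, status, created_at, amount) are hypothetical examples, not a Koantek dataset.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("intern-warmup").getOrCreate()

    # Read raw CSV into a DataFrame (schema inference kept for brevity;
    # production pipelines would declare an explicit schema).
    orders = spark.read.csv("/data/raw/orders.csv", header=True, inferSchema=True)

    # Typical manipulation: filter, derive a column, aggregate.
    daily_revenue = (
        orders
        .filter(F.col("status") == "completed")
        .withColumn("order_date", F.to_date("created_at"))
        .groupBy("order_date")
        .agg(F.sum("amount").alias("revenue"))
    )

    daily_revenue.show()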

Responsibilities

  • Assist in building and maintaining automated data pipelines using Delta Live Tables (DLT) and Databricks Workflows.
  • Write clean, efficient code in PySpark or Spark SQL to cleanse and structure raw data for downstream analytics.
  • Support the implementation of the Medallion Architecture (Bronze, Silver, and Gold layers) to ensure data quality and reliability; a sketch of such a pipeline follows this list.
  • Help identify bottlenecks in Spark jobs and assist in optimizing cluster configurations and partitioning strategies (see the second sketch below).
  • Work closely with Senior Data Engineers and Data Scientists to provide "query-ready" datasets for business intelligence.
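
The DLT and Medallion bullets above describe one coherent pattern, sketched below under stated assumptions: the Auto Loader source path, table names, and the single expectation rule are illustrative, not the team's actual pipeline.

    import dlt
    from pyspark.sql import functions as F

    # 'spark' is the ambient session inside a Databricks DLT pipeline.

    @dlt.table(comment="Bronze: raw events ingested as-is via Auto Loader.")
    def events_bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/data/raw/events/")  # hypothetical landing path
        )

    @dlt.table(comment="Silver: validated, deduplicated, typed events.")
    @dlt.expect_or_drop("valid_user", "user_id IS NOT NULL")
    def events_silver():
        return (
            dlt.read_stream("events_bronze")
            .withColumn("event_ts", F.to_timestamp("event_time"))
            .dropDuplicates(["event_id"])
        )

    @dlt.table(comment="Gold: query-ready daily aggregates for BI.")
    def events_gold():
        return (
            dlt.read("events_silver")
            .groupBy(F.to_date("event_ts").alias("event_date"))
            .agg(F.count("*").alias("events"))
        )

Each layer reads only from the layer below it, which is what makes Bronze/Silver/Gold pipelines straightforward to validate, monitor, and replay.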
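
The optimization bullet is harder to show in miniature, but one recurring move is repartitioning on a hot key before writing; the partition count, key, and table names below are assumptions chosen purely for illustration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # ambient session on Databricks

    silver = spark.read.table("events_silver")  # hypothetical table name

    # Check current parallelism before tuning.
    print(silver.rdd.getNumPartitions())

    # Repartition on the join key so downstream joins shuffle evenly,
    # then write partitioned by date so readers can prune files.
    (
        silver.repartition(200, "user_id")
        .write.format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .saveAsTable("events_silver_optimized")
    )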