Intern

Koantek
Chesterfield, MO

About The Position

This internship is designed for aspiring data professionals who want to build high-performance data pipelines within the Databricks Lakehouse ecosystem. You will move beyond theoretical knowledge to help build, scale, and optimize real-world ETL/ELT processes using Spark.

Requirements

  • Proficiency in Python (specifically for data manipulation) and strong SQL skills.
  • Familiarity with the basics of Apache Spark (DataFrames, RDDs) and distributed computing concepts (a short PySpark sketch follows this list).
  • Basic understanding of cloud platforms like Azure, AWS, or GCP (where Databricks typically resides).
  • Experience using Git for collaborative code management.
  • A "builder" mindset with a strong desire to automate manual processes and debug complex data flows.
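
To make the Spark items above concrete, here is a minimal PySpark sketch of the kind of DataFrame work this role assumes day to day; the file path and column names (orders.csv, status, created_at, amount) are hypothetical examples, not a Koantek dataset.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("intern-warmup").getOrCreate()

    # Read raw CSV into a DataFrame (schema inference kept for brevity;
    # production pipelines would declare an explicit schema).
    orders = spark.read.csv("/data/raw/orders.csv", header=True, inferSchema=True)

    # Typical manipulation: filter, derive a column, aggregate.
    daily_revenue = (
        orders
        .filter(F.col("status") == "completed")
        .withColumn("order_date", F.to_date("created_at"))
        .groupBy("order_date")
        .agg(F.sum("amount").alias("revenue"))
    )

    daily_revenue.show()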

Responsibilities

  • Assist in building and maintaining automated data pipelines using Delta Live Tables (DLT) and Databricks Workflows.
  • Write clean, efficient code in PySpark or Spark SQL to cleanse and structure raw data for downstream analytics.
  • Support the implementation of the Medallion Architecture (Bronze, Silver, and Gold layers) to ensure data quality and reliability; a sketch of such a pipeline follows this list.
  • Help identify bottlenecks in Spark jobs and assist in optimizing cluster configurations and partitioning strategies (see the second sketch below).
  • Work closely with Senior Data Engineers and Data Scientists to provide "query-ready" datasets for business intelligence.
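
The DLT and Medallion bullets above describe one coherent pattern, sketched below under stated assumptions: the Auto Loader source path, table names, and the single expectation rule are illustrative, not the team's actual pipeline.

    import dlt
    from pyspark.sql import functions as F

    # 'spark' is the ambient session inside a Databricks DLT pipeline.

    @dlt.table(comment="Bronze: raw events ingested as-is via Auto Loader.")
    def events_bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/data/raw/events/")  # hypothetical landing path
        )

    @dlt.table(comment="Silver: validated, deduplicated, typed events.")
    @dlt.expect_or_drop("valid_user", "user_id IS NOT NULL")
    def events_silver():
        return (
            dlt.read_stream("events_bronze")
            .withColumn("event_ts", F.to_timestamp("event_time"))
            .dropDuplicates(["event_id"])
        )

    @dlt.table(comment="Gold: query-ready daily aggregates for BI.")
    def events_gold():
        return (
            dlt.read("events_silver")
            .groupBy(F.to_date("event_ts").alias("event_date"))
            .agg(F.count("*").alias("events"))
        )

Each layer reads only from the layer below it, which is what makes Bronze/Silver/Gold pipelines straightforward to validate, monitor, and replay.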
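
The optimization bullet is harder to show in miniature, but one recurring move is repartitioning on a hot key before writing; the partition count, key, and table names below are assumptions chosen purely for illustration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # ambient session on Databricks

    silver = spark.read.table("events_silver")  # hypothetical table name

    # Check current parallelism before tuning.
    print(silver.rdd.getNumPartitions())

    # Repartition on the join key so downstream joins shuffle evenly,
    # then write partitioned by date so readers can prune files.
    (
        silver.repartition(200, "user_id")
        .write.format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .saveAsTable("events_silver_optimized")
    )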