PySpark Databricks Engineer

CrackaJack Digital Solutions LLC · Houston, TX
Posted 1d ago · Hybrid

About The Position

Job Title: PySpark and Databricks Developer
Location: Houston, TX (Hybrid)

The key responsibilities and required qualifications for this role are listed below.

Requirements

  • 5+ years of experience in software development with a focus on Python and PySpark.
  • Hands-on expertise in Databricks platform — including cluster management, notebooks, and job orchestration.
  • Strong programming fundamentals (data structures, algorithms, debugging, version control).
  • Experience with Delta Lake, Spark SQL, and data lake architectures.
  • Solid understanding of distributed computing, data partitioning, and Spark performance tuning.
  • Familiarity with cloud platforms such as Azure, AWS, or GCP.
  • Excellent communication and problem-solving skills — able to explain complex technical concepts clearly.

Responsibilities

  • Design, develop, and optimize data pipelines and transformations using PySpark and Databricks (a minimal sketch of this kind of pipeline follows this list).
  • Collaborate with data architects and analysts to define and implement scalable data models and frameworks.
  • Build and maintain complex data ingestion and processing workflows for large, distributed datasets.
  • Develop reusable and efficient code following best practices in coding, testing, and deployment.
  • Optimize Spark jobs for performance, scalability, and reliability in production environments.
  • Work closely with cross-functional teams to ensure data quality, consistency, and integrity.
  • Contribute to continuous improvement of the data engineering ecosystem and CI/CD processes.
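For illustration only, here is a minimal sketch of the kind of pipeline work this role involves: ingesting raw JSON, deriving a partition column, and writing a partitioned Delta table on Databricks. The paths and column names (raw_path, curated_path, event_id, event_ts, event_date) are hypothetical placeholders, not details from the posting.

```python
# Minimal illustrative PySpark pipeline; paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession is already provided; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

raw_path = "/mnt/raw/events"          # hypothetical raw landing zone
curated_path = "/mnt/curated/events"  # hypothetical curated Delta location

# Ingest raw JSON, derive a date partition column, and de-duplicate events.
events = (
    spark.read.json(raw_path)
    .withColumn("event_date", F.to_date("event_ts"))
    .dropDuplicates(["event_id"])
)

# Write a Delta table partitioned by date so downstream queries can
# prune partitions instead of scanning the full dataset.
(
    events.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save(curated_path)
)
```

Partitioning by a date column is a common Spark tuning choice in this setting: it enables partition pruning, which directly supports the performance and scalability responsibilities described above.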