Databricks Data Engineer

Guidehouse, Arlington, VA

About The Position

Guidehouse is seeking a Databricks Data Engineer to build and maintain cloud-based data pipelines. The role focuses on developing ETL pipelines with PySpark and Databricks, implementing CI/CD for Databricks notebooks and jobs, using Delta Lake for ACID transactions and data reliability, and optimizing ingestion from API, streaming, and batch sources, all while meeting data governance, security, and quality assurance standards. The engineer will collaborate with data engineers and data scientists to support data pipelines and ML workflows, lead and document team meetings, and present findings and recommendations to the team. Detailed responsibilities are listed below.

Requirements

  • Bachelor’s degree is required.
  • Minimum SEVEN (7) years of total experience in cloud-based data platforms.
  • Minimum FIVE (5) years of experience with Databricks.
  • Strong scripting skills (Python, Bash).
  • Experience with Delta Lake and Unity Catalog.
  • Strong knowledge of Spark architecture and distributed computing.
  • Hands-on experience with Terraform or other IaC tools.
  • Experience with data modeling and performance tuning.
  • Experience with streaming technologies (Kafka, Event Hub).
  • Experience using CI/CD for data pipelines.
  • Familiarity with Kubernetes and container orchestration.
  • Excellent problem-solving skills and attention to detail.
  • Strong communication and collaboration skills, with the ability to work effectively in a team environment.

Nice To Haves

  • Databricks Certified Data Engineer Associate or Professional.
  • Azure Data Engineer Associate or AWS Big Data Specialty.

Responsibilities

  • Develop and implement CI/CD pipelines for Databricks notebooks and jobs.
  • Develop ETL pipelines using PySpark and Databricks.
  • Implement Delta Lake for ACID transactions and data reliability.
  • Optimize ingestion from APIs, streaming, and batch sources.
  • Ensure compliance with data governance and security standards.
  • Collaborate with data engineers and scientists to support data pipelines and ML workflows.
  • Conduct ETL and data quality analysis using various technologies (e.g., Python, Databricks).
  • Ensure data governance and quality assurance standards are met.
  • Organize and lead meetings, including scheduling meetings; drafting and delivering agendas and meeting minutes; providing and archiving required documentation; and documenting, tracking, and following up on action items.
  • Summarize and present information and reports to the team and make recommendations (both oral and written).

Benefits

  • Medical, Rx, Dental & Vision Insurance
  • Personal and Family Sick Time & Company Paid Holidays
  • Position may be eligible for a discretionary variable incentive bonus
  • Parental Leave and Adoption Assistance
  • 401(k) Retirement Plan
  • Basic Life & Supplemental Life
  • Health Savings Account, Dental/Vision & Dependent Care Flexible Spending Accounts
  • Short-Term & Long-Term Disability
  • Student Loan PayDown
  • Tuition Reimbursement, Personal Development & Learning Opportunities
  • Skills Development & Certifications
  • Employee Referral Program
  • Corporate Sponsored Events & Community Outreach
  • Emergency Back-Up Childcare Program
  • Mobility Stipend