Databricks Data Engineer

Guidehouse, Arlington, VA

About The Position

Guidehouse is seeking a Databricks Data Engineer to build and maintain cloud-based data pipelines. The role focuses on developing ETL pipelines with PySpark and Databricks, implementing CI/CD for Databricks notebooks and jobs, using Delta Lake for ACID transactions and data reliability, and optimizing ingestion from API, streaming, and batch sources, all while meeting data governance, security, and quality assurance standards. The engineer will collaborate with data engineers and data scientists to support data pipelines and ML workflows, lead and document team meetings, and present findings and recommendations to the team. Detailed responsibilities are listed below.

Requirements

  • Bachelor’s degree is required.
  • Minimum SEVEN (7) years of total experience in cloud-based data platforms.
  • Minimum FIVE (5) years of experience with Databricks.
  • Strong scripting skills (Python, Bash).
  • Experience with Delta Lake and Unity Catalog.
  • Strong knowledge of Spark architecture and distributed computing.
  • Hands-on experience with Terraform or other IaC tools.
  • Experience with data modeling and performance tuning.
  • Experience with streaming technologies (Kafka, Event Hub).
  • Experience using CI/CD for data pipelines.
  • Familiarity with Kubernetes and container orchestration.
  • Excellent problem-solving skills and attention to detail.
  • Strong communication and collaboration skills, with the ability to work effectively in a team environment.

Nice To Haves

  • Databricks Certified Data Engineer Associate or Professional.
  • Azure Data Engineer Associate or AWS Big Data Specialty.

Responsibilities

  • Develop and implement CI/CD pipelines for Databricks notebooks and jobs.
  • Develop ETL pipelines using PySpark and Databricks.
  • Implement Delta Lake for ACID transactions and data reliability.
  • Optimize ingestion from APIs, streaming, and batch sources.
  • Ensure compliance with data governance and security standards.
  • Collaborate with data engineers and scientists to support data pipelines and ML workflows.
  • Conduct ETL and data quality analysis using various technologies (e.g., Python, Databricks).
  • Ensure data governance and quality assurance standards are met.
  • Organize and lead meetings, including scheduling meetings; drafting and delivering agendas and meeting minutes; providing and archiving required documentation; and documenting, tracking, and following up on action items.
  • Summarize and present information and reports to the team and make recommendations (both oral and written).

Benefits

  • Medical, Rx, Dental & Vision Insurance
  • Personal and Family Sick Time & Company Paid Holidays
  • Position may be eligible for a discretionary variable incentive bonus
  • Parental Leave and Adoption Assistance
  • 401(k) Retirement Plan
  • Basic Life & Supplemental Life
  • Health Savings Account, Dental/Vision & Dependent Care Flexible Spending Accounts
  • Short-Term & Long-Term Disability
  • Student Loan PayDown
  • Tuition Reimbursement, Personal Development & Learning Opportunities
  • Skills Development & Certifications
  • Employee Referral Program
  • Corporate Sponsored Events & Community Outreach
  • Emergency Back-Up Childcare Program
  • Mobility Stipend