Python Developer (Databricks, Medallion Architecture)

CACI International • Ashburn, VA

$103,800 - $218,100 • Hybrid

About The Position

CACI is seeking a highly skilled and motivated Python Developer with extensive experience in Databricks and a strong understanding of medallion architecture principles to join our BEAGLE (Border Enforcement Applications for Government Leading-Edge Information Technology) Agile Solution Factory (ASF) Team supporting the Customs and Border Protection (CBP) client in Northern Virginia! In this hands-on role, you will be instrumental in building, optimizing, and maintaining our modern data platform. You will use Databricks to construct scalable, high-performance data pipelines that power critical business insights and support data-informed decision-making across the organization. Join a passionate team of industry-leading professionals who champion agile software development best practices for the Department of Homeland Security (DHS). You will support the men and women charged with safeguarding the American people and enhancing the nation’s safety and security.

Requirements

Candidate must be available to work a hybrid schedule in Ashburn, VA.
Must be a U.S. citizen able to pass a CBP background investigation. Criteria include, but are not limited to: a 3-year check for felony convictions; a 1-year check for illegal drug use; and a 1-year check for misconduct such as theft or fraud.
College degree (B.S.) in Computer Science, Software Engineering, Information Management Systems or a related discipline. Equivalent professional experience will be considered in lieu of degree.
At least seven (7) years of related technical experience.
Extensive hands-on experience with Databricks and its core components (Spark, Delta Lake).
Proven understanding and practical application of the Medallion Architecture (Bronze, Silver, Gold layers) and its benefits for data management.
Proficiency in Python for data manipulation, processing, and ETL development (e.g., using Pandas, PySpark).
Extensive experience with Spark SQL and PySpark for distributed data processing.
Deep understanding of Delta Lake features, including ACID transactions, schema evolution, time travel, and performance optimizations.
Experience with data warehousing concepts and best practices.
Familiarity with SQL for querying and data manipulation.
Experience with source code control systems and concurrent development workflows (Git preferred).
Strong analytical and problem-solving skills with the ability to troubleshoot complex data issues.
Excellent communication and interpersonal skills, with the ability to explain technical concepts clearly.
Strong ability to analyze complex project-related problems and create innovative solutions.

Nice To Haves

Experience with Databricks Unity Catalog for data governance, security, and discovery.
Familiarity with cloud platforms such as AWS, Azure, or GCP.
Experience with orchestration tools like Apache Airflow or Databricks Workflows.
Knowledge of CI/CD practices and tools (e.g., Jenkins, GitLab CI, GitHub Actions) for automated build and deployment processes.
Experience with data modeling tools and techniques.
Familiarity with BI tools (e.g., Tableau, Power BI, Looker) for data consumption.
Understanding of data security principles and best practices.
Experience with other data processing frameworks or technologies.
Ability to apply estimation techniques to software development efforts.
Strong collaboration skills and a desire to work within a team.
A highly responsible, team-oriented self-starter with a very strong work ethic.

Responsibilities

Design, develop, and implement robust, scalable, and performant ETL/ELT pipelines within the Databricks environment using Python and PySpark.
Build and manage data layers (Bronze, Silver, Gold) adhering to best practices of the medallion architecture, ensuring data quality, reliability, and discoverability.
Leverage Databricks features extensively, including Spark, Delta Lake, Databricks SQL (formerly SQL Analytics), and Unity Catalog, to construct efficient and maintainable data solutions.
Collaborate closely with data scientists, analysts, and business stakeholders to understand data requirements and translate them into actionable data engineering solutions.
Implement comprehensive data quality checks, validation rules, and lineage tracking mechanisms within the Databricks ecosystem.
Optimize data pipelines and Spark jobs for performance, cost-efficiency, and scalability, utilizing Databricks and Delta Lake best practices.
Write clean, well-documented, and testable Python code, adhering to coding standards and promoting code quality through rigorous code reviews.
Troubleshoot and resolve complex issues related to data pipelines, Databricks jobs, and data integrity across development, staging, and production environments.
Contribute to the design and implementation of efficient data models optimized for query performance and data governance.
Stay abreast of emerging technologies and trends in data engineering and Databricks, and champion their adoption where appropriate to enhance our data platform.
Collaborate within an agile development framework, actively participating in team ceremonies and contributing to a culture of continuous improvement.