Data Engineer

CDC Foundation
Remote

About The Position

The Data Engineer will play a crucial role in advancing the CDC Foundation's mission by designing, building, and maintaining data infrastructure for a public health organization. This role is aligned to the Workforce Acceleration Initiative (WAI). WAI is a federally funded CDC Foundation program with the goal of helping the nation’s public health agencies by providing them with the technology and data experts they need to accelerate their information system improvements.

Working within Prince George’s County Health Department’s Information Technology, this role supports the development, implementation, and ongoing optimization of a cloud-based Azure Synapse Analytics data warehousing solution built on Medallion Architecture (Bronze, Silver, Gold). The Data Engineer collaborates with business and technical stakeholders to support requirements clarification, implement and manage data pipelines, ensure data sets are integrated and analytics-ready, and assist with testing and updating system functions and capabilities.

The Data Engineer will be hired by the CDC Foundation and assigned to the Prince George’s County Health Department’s Information Technology. This position is eligible for a fully remote work arrangement for U.S.-based candidates.
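
For a sense of the technical environment, the sketch below shows what a single Bronze-to-Silver step in a Medallion Architecture might look like from a Synapse Spark notebook. It is a minimal, hedged illustration only: the storage account, container, dataset, and column names are hypothetical placeholders, not details of the actual Prince George's County environment.

    # Illustrative only: a minimal Bronze -> Silver hop for a Synapse Spark pool.
    # All account, container, dataset, and column names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    # A Synapse notebook already provides a session; getOrCreate() returns it
    # (or creates one when run locally).
    spark = SparkSession.builder.getOrCreate()

    # Hypothetical ADLS Gen2 paths for the Medallion layers
    bronze_path = "abfss://bronze@examplehealthlake.dfs.core.windows.net/surveillance/cases/"
    silver_path = "abfss://silver@examplehealthlake.dfs.core.windows.net/surveillance/cases/"

    # Bronze: raw files landed as-is from source systems
    raw_df = spark.read.option("header", "true").csv(bronze_path)

    # Silver: typed, standardized, de-duplicated records ready for modeling
    silver_df = (
        raw_df
        .withColumn("report_date", F.to_date("report_date", "yyyy-MM-dd"))
        .withColumn("county", F.upper(F.trim(F.col("county"))))
        .dropDuplicates(["case_id"])
    )

    silver_df.write.mode("overwrite").parquet(silver_path)

In this layering, Bronze holds raw source extracts as landed, Silver holds cleansed and conformed records, and Gold holds the aggregated, analytics-ready models used for reporting.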

Requirements

  • Bachelor's degree in Computer Science, Information Technology, Data Science, or a related field.
  • Minimum 5 years of professional experience in data engineering, analytics engineering, or data warehousing roles.
  • Hands-on experience supporting cloud-based data platforms, with a preference for Microsoft Azure data services such as Azure Synapse Analytics and ADLS Gen2.
  • Minimum 1 year of professional experience in systems analysis, requirements elicitation and management, systems design and engineering, and stakeholder engagement.
  • Proficiency in SQL and at least one programming language such as Python, Scala, or Java.
  • Experience developing and supporting Spark-based data transformations and scalable ETL/ELT pipelines.
  • Understanding of data warehousing concepts, including dimensional modeling and analytics-oriented data design.
  • Experience applying engineering best practices, including source control, CI/CD pipelines, automated testing, and peer review.
  • Familiarity with agile development methodologies and modern software design patterns.
  • Strong analytical, troubleshooting, and problem-solving skills related to data pipelines and data quality.
  • Excellent written and verbal communication skills, with the ability to explain technical concepts to non-technical stakeholders.
  • Experience collaborating with distributed and remote teams.

Nice To Haves

  • Experience working with public health, healthcare, or government data environments.
  • Familiarity with Microsoft Purview for data governance and lineage.
  • Experience supporting Power BI or other analytics and visualization tools.
  • Knowledge of HIPAA, public health data standards, and regulatory compliance.
  • Prior experience designing enterprise-scale Azure data platforms using Medallion or Lakehouse architectures.
  • Understanding of Medallion Architecture (Bronze, Silver, Gold) layers within Azure Data Lake Storage Gen2 (ADLS).
  • Experience monitoring, troubleshooting, and tuning Synapse workloads for performance, scalability, and cost efficiency.
  • Experience implementing data governance, metadata management, and lineage using Microsoft Purview.
  • Ability to promote standardization and reuse of patterns across Synapse artifacts.
  • Ability to provide technical knowledge transfer to internal staff and partners as needed.
  • Commitment to staying current with industry trends and Azure data platform advancements, incorporating innovations where appropriate.

Responsibilities

  • Support the design, implementation, and maintenance of Azure Synapse Analytics solutions using Spark pools and dedicated and serverless SQL pools.
  • Support the development of reusable, parameter-driven data pipelines leveraging Synapse pipelines and Azure Data Factory–style orchestration (see the illustrative sketch following this list).
  • Ingest and integrate data from diverse internal and external public health sources (clinical, operational, surveillance, census, and partner data).
  • Build Spark-based transformations and SQL-based data models to cleanse, standardize, and enrich data.
  • Design and maintain dimensional and analytical data models optimized for reporting, dashboards, and advanced analytics.
  • Identify and resolve data pipeline failures, data quality issues, and processing bottlenecks.
  • Implement logging, monitoring, and alerting for production-grade data pipelines.
  • Ensure compliance with public health data security, privacy, and regulatory requirements (HIPAA, CDC guidance, and county policies).
  • Apply role-based access control (RBAC) and data protection best practices across Azure resources.
  • Support the gathering, validation, and documentation of data-related business and technical requirements in collaboration with program stakeholders and IT partners.
  • Partner with public health program leaders, analysts, epidemiologists, and informatics teams to clarify data needs and translate defined requirements into technical data solutions.
  • Communicate technical concepts, implementation status, and considerations clearly to both technical and non-technical stakeholders.
  • Ensure delivered data products are analytics-ready and aligned with reporting, performance management, and decision-support needs.
  • Manage testing activities related to data flows, pipelines, and data processes.
  • Apply data engineering best practices including source control, CI/CD, automated testing, documentation, and code reviews.
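
As a hedged illustration of the reusable, parameter-driven pipelines referenced above, the sketch below shows a parameterized Silver-to-Gold aggregation job that a Synapse pipeline or Azure Data Factory-style orchestrator could invoke with different arguments per dataset and schedule. The paths, parameter names, and columns are hypothetical placeholders rather than the department's actual design.

    # Illustrative only: a parameter-driven Silver -> Gold aggregation job.
    # Paths, parameters, and column names are hypothetical placeholders.
    import argparse
    from pyspark.sql import SparkSession, functions as F

    def build_gold_case_counts(spark, silver_path, gold_path, run_date):
        """Aggregate cleansed case records into a reporting-ready daily count table."""
        cases = spark.read.parquet(silver_path).filter(
            F.col("report_date") <= F.to_date(F.lit(run_date))
        )
        daily_counts = (
            cases.groupBy("report_date", "county")
                 .agg(F.countDistinct("case_id").alias("case_count"))
        )
        daily_counts.write.mode("overwrite").parquet(gold_path)

    if __name__ == "__main__":
        # The orchestrating pipeline passes these values per run, so the same
        # job definition is reused across datasets and schedules.
        parser = argparse.ArgumentParser()
        parser.add_argument("--silver-path", required=True)
        parser.add_argument("--gold-path", required=True)
        parser.add_argument("--run-date", required=True)  # e.g. 2024-01-31
        args = parser.parse_args()

        spark = SparkSession.builder.getOrCreate()
        build_gold_case_counts(spark, args.silver_path, args.gold_path, args.run_date)

Because the job takes its inputs as arguments, the same code can serve multiple datasets; only the pipeline parameters change between runs.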