Developer/Data Engineer

CMT Services Inc, Baltimore, MD
Hybrid

About The Position

The Maryland Department of Health is seeking a hands-on Data Engineer to design, develop, and optimize large-scale data pipelines in support of our Enterprise Data Warehouse (EDW) and Data Lake solutions. This role requires deep technical expertise in coding, pipeline orchestration, and cloud-native data engineering on AWS. The Data Engineer will be directly responsible for implementing ingestion, transformation, and integration workflows, ensuring data is high quality, compliant, and analytics-ready. This role may support other projects or teams within MDH as needed.

Requirements

  • The proposed candidate must have a minimum of three (3) years of experience as a data engineer.
  • A bachelor's or master's degree from an accredited college or university with a major in computer science, statistics, mathematics, economics, or a related field. Three (3) years of equivalent experience in a related field may be substituted for the bachelor's degree.
  • The candidate should have experience as a data engineer or in a similar role, with a strong understanding of data architecture and ETL processes.
  • The candidate should be proficient in programming languages for data processing and knowledgeable about distributed computing and parallel processing.
  • 3+ years of hands-on experience building, deploying, and maintaining data pipelines on AWS or equivalent cloud platforms.
  • Strong coding skills in Python and SQL (Scala or Java a plus).
  • Proven experience with Apache Spark (PySpark) for large-scale processing (a minimal sketch of this kind of work follows this list).
  • Hands-on experience with AWS Glue, S3, Redshift, Athena, EMR, Lake Formation.
  • Strong debugging and performance optimization skills in distributed systems.
  • Hands-on experience with Iceberg, Delta Lake, or other open table formats (OTFs).
  • Experience with Airflow or other pipeline orchestration frameworks.
  • Practical experience in CI/CD and Infrastructure-as-Code (Terraform, CloudFormation).
  • Practical experience with EDI X12, HL7, or FHIR data formats.
  • Strong understanding of Medallion Architecture for data lakehouses.
  • Hands-on experience building dimensional models and data warehouses.
  • Working knowledge of HIPAA and CMS interoperability requirements.
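
For illustration, here is a minimal PySpark sketch of the kind of ingestion work named above: reading raw JSON from a bronze S3 prefix and writing analytics-ready Parquet to a silver layer. The bucket names, paths, and columns are hypothetical, and a real Glue or EMR job would add schema enforcement and fuller error handling.

    # Hypothetical sketch: bucket names, paths, and columns are illustrative only.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claims-ingest").getOrCreate()

    # Read semi-structured JSON from the raw (bronze) layer.
    raw = spark.read.json("s3://example-bucket/bronze/claims/")

    # Light cleansing: drop exact duplicates and stamp rows with processing time.
    cleaned = raw.dropDuplicates().withColumn("ingested_at", F.current_timestamp())

    # Write analytics-ready Parquet to the silver layer, partitioned for query speed.
    (cleaned.write
        .mode("append")
        .partitionBy("service_date")
        .parquet("s3://example-bucket/silver/claims/"))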

Responsibilities

  • Design, develop, and maintain data pipelines and extract, transform, load (ETL) processes to collect, process, and store structured and unstructured data
  • Build data architecture and storage solutions, including data lakehouses, data lakes, data warehouses, and data marts, to support analytics and reporting
  • Develop data reliability, efficiency, and quality checks and processes
  • Prepare data for data modeling
  • Monitor and optimize data architecture and data processing systems
  • Collaborate with multiple teams to understand requirements and objectives
  • Conduct testing and troubleshooting related to performance, reliability, and scalability
  • Create and update documentation
  • Design, code, and deploy ETL/ELT pipelines across bronze, silver, and gold layers of the Data Lakehouse.
  • Build ingestion pipelines for structured (SQL), semi-structured (JSON, XML), and unstructured data using PySpark/Python on AWS Glue or EMR.
  • Implement incremental loads, deduplication, error handling, and data validation (a minimal incremental-merge sketch follows this list).
  • Actively troubleshoot, debug, and optimize pipelines for scalability and cost efficiency.
  • Develop dimensional data models (Star Schema, Snowflake Schema) for analytics and reporting.
  • Build and maintain tables in Iceberg, Delta Lake, or equivalent open table formats.
  • Optimize partitioning, indexing, and metadata for fast query performance.
  • Build ingestion and transformation pipelines for EDI X12 transactions (837, 835, 278, etc.).
  • Implement mapping and transformation of EDI data to FHIR and HL7 frameworks.
  • Work hands-on with AWS HealthLake (or equivalent) to store and query healthcare data.
  • Develop automated validation scripts to enforce data quality and integrity (see the validation sketch after this list).
  • Implement IAM roles, encryption, and auditing to meet HIPAA and CMS compliance standards.
  • Maintain lineage and governance documentation for all pipelines.
  • Work closely with the Lead Data Engineer, analysts, and data scientists to deliver pipelines that support enterprise-wide analytics.
  • Actively contribute to CI/CD pipelines, Infrastructure-as-Code (IaC), and automation.
  • Continuously improve pipelines and adopt new technologies where appropriate.
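
As referenced in the incremental-load bullet above, one common pattern is an idempotent MERGE into an open-table-format table. Below is a minimal sketch, assuming a Spark session already configured with Iceberg extensions and a catalog named "lakehouse"; the table, path, and key names are hypothetical.

    # Hypothetical sketch: assumes Iceberg Spark extensions and a catalog named
    # "lakehouse" are configured; table and key names are illustrative only.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("claims-incremental-merge").getOrCreate()

    # Stage the latest extract as a temporary view for the MERGE below.
    updates = spark.read.parquet("s3://example-bucket/silver/claims_delta/")
    updates.createOrReplaceTempView("claim_updates")

    # MERGE keeps the load idempotent: existing claims are updated, new claims
    # inserted, and re-running the job does not create duplicates.
    spark.sql("""
        MERGE INTO lakehouse.gold.claims t
        USING claim_updates s
        ON t.claim_id = s.claim_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)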
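
And for the automated-validation bullet, here is a bare-bones, hand-rolled quality gate; production pipelines might instead use a framework such as Great Expectations or Deequ. Column names are hypothetical.

    # Hypothetical sketch: a minimal quality gate that fails the job fast.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claims-validation").getOrCreate()
    df = spark.read.parquet("s3://example-bucket/silver/claims/")

    # Basic integrity checks: required keys present, amounts non-negative.
    null_keys = df.filter(F.col("claim_id").isNull()).count()
    bad_amounts = df.filter(F.col("billed_amount") < 0).count()

    if null_keys or bad_amounts:
        raise ValueError(
            f"validation failed: {null_keys} null claim_ids, "
            f"{bad_amounts} negative billed_amounts"
        )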