Lead Data Platform Engineer (Airflow / Astro & AWS)

Computer Task Group, Inc.
Atlanta, GA

About The Position

CTG is seeking to fill a Lead Data Platform Engineer (Airflow / Astro & AWS) position for our client.

Location: Atlanta, GA
Duration: 9 months

The full list of duties for this role appears under Responsibilities below.

Requirements

  • Strong expertise in Astronomer (Astro) and Apache Airflow
  • Advanced Python programming and automation skills
  • Deep knowledge of ETL/ELT processes and modern data architectures
  • Experience with AWS services including EKS, S3, RDS, IAM, and CloudWatch
  • Hands-on Kubernetes experience (EKS preferred)
  • Experience building observability and monitoring solutions (Prometheus, Grafana, CloudWatch)
  • Strong understanding of data lakes, data warehousing, and distributed data processing
  • Familiarity with Infrastructure-as-Code (Terraform, CloudFormation)
  • Knowledge of CI/CD pipelines and DevOps practices
  • Strong analytical, troubleshooting, and problem-solving skills
  • Excellent leadership, communication, and stakeholder management abilities
  • 7+ years of experience in Data Platform, Data Engineering, or Application Support environments
  • 2+ years of hands-on experience with Astronomer (Astro) Airflow
  • Proven experience with dynamic DAG development (Factory-based frameworks)
  • Experience supporting enterprise-scale data orchestration platforms
  • Background in cloud-native data platforms and distributed systems
  • Bachelor’s degree in Computer Science, Data Engineering, Information Systems, or related field
  • Excellent verbal and written English communication skills and the ability to interact professionally with a diverse group are required.

Nice To Haves

  • Experience in regulated industries (e.g., Financial Services) preferred
  • Exposure to MLOps orchestration workflows is a plus

Responsibilities

  • Lead L2/L3 application support for enterprise Astronomer (Astro)-managed Apache Airflow environments
  • Own Incident, Problem, and Change Management processes for the data orchestration platform
  • Perform advanced root cause analysis (RCA) for pipeline failures, scheduler issues, and infrastructure bottlenecks
  • Improve DAG reliability and SLA adherence, and reduce MTTR across data pipelines
  • Establish operational playbooks, runbooks, and support standards
  • Design and maintain dynamic DAG frameworks (Factory-based) to enable scalable pipeline onboarding
  • Support complex ETL/ELT workflows across data lakes (S3), data warehouses (Snowflake, Redshift), and streaming/batch pipelines
  • Ensure data quality validation, reconciliation, and SLA tracking across workloads
  • Collaborate with Data Engineering teams to optimize performance and cost efficiency
  • Support metadata management, data lineage, and governance integrations
  • Develop production-grade Python DAGs, operators, and plugins
  • Build reusable, configuration-driven orchestration frameworks
  • Automate deployment, provisioning, and pipeline lifecycle management
  • Enforce coding standards, version control, and CI/CD best practices
  • Manage Astro deployments in AWS (EKS, S3, RDS, IAM, CloudWatch, VPC)
  • Troubleshoot and optimize Kubernetes-based Airflow clusters
  • Perform capacity planning and resource tuning for schedulers and workers
  • Implement Infrastructure-as-Code (Terraform or CloudFormation)
  • Ensure high availability, disaster recovery, and cloud cost optimization
  • Design and implement observability dashboards for DAG health, SLA compliance, task failures, and resource utilization
  • Integrate monitoring tools such as CloudWatch, Prometheus, and Grafana
  • Implement proactive alerting and automated remediation workflows
  • Ensure compliance with data governance, audit, and regulatory requirements
  • Implement access controls, encryption standards, and secure data handling practices
  • Partner with Data Governance and Security teams
  • Lead and mentor a team of Data Platform and Application Support engineers
  • Collaborate with cross-functional teams including Data Engineering, Analytics, and Cloud Infrastructure
  • Provide executive-level reporting on platform KPIs (uptime, SLA adherence, incident trends)
  • Drive continuous improvement and platform modernization initiatives
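Several of the responsibilities above center on factory-based, configuration-driven DAG frameworks, where new pipelines are onboarded by adding a config entry rather than writing new orchestration code. The sketch below illustrates that pattern in plain Python; the config entries, names, and the dict-based pipeline definition are illustrative stand-ins (a real Astro/Airflow deployment would emit `airflow.DAG` objects from such a factory instead).

```python
# Illustrative sketch of a configuration-driven "DAG factory".
# Each entry in PIPELINE_CONFIG yields one pipeline definition; in a real
# Airflow deployment the factory would construct airflow.DAG objects, but a
# plain dict stands in here so the pattern is visible without Airflow installed.
# All pipeline names, schedules, and table names below are hypothetical.

PIPELINE_CONFIG = {
    "sales_daily":    {"schedule": "@daily",  "tables": ["orders", "refunds"]},
    "finance_hourly": {"schedule": "@hourly", "tables": ["ledger"]},
}

def build_pipeline(name, cfg):
    """Build one pipeline definition: an extract task per source table,
    followed by a single warehouse-load task."""
    tasks = [f"extract_{table}" for table in cfg["tables"]] + ["load_warehouse"]
    return {"dag_id": name, "schedule": cfg["schedule"], "tasks": tasks}

def build_all(config):
    # Onboarding a new pipeline means adding a config entry, not new code --
    # the property the job description refers to as "scalable pipeline onboarding".
    return {name: build_pipeline(name, cfg) for name, cfg in config.items()}

dags = build_all(PIPELINE_CONFIG)
```

In Airflow itself, the same loop would run at the top level of a DAG file so the scheduler discovers one DAG per config entry on each parse.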