Lead Data Engineer

AECOM · Dallas, TX
Hybrid

About The Position

We’re hiring a hands-on Lead Data Engineer to help modernize an enterprise data platform from legacy, on-prem systems to a cloud-native AWS lakehouse. This is a lead individual-contributor role with high ownership and a strong hands-on focus (approximately 70%), balancing deep technical delivery with design guidance, code reviews, and mentorship. While we leverage AWS S3 Tables (managed Apache Iceberg), success in this role requires a solid understanding of how modern table formats operate under the hood, beyond reliance on fully managed tooling. This position offers a hybrid work schedule combining in-office presence with remote work, based in either Houston or Dallas, TX.
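
For context on what "under the hood" means here, the following is a minimal PySpark sketch of core Iceberg mechanics: creating a partitioned table, evolving its schema as a metadata-only operation, and inspecting commit snapshots. The catalog name, warehouse path, and table names are illustrative placeholders (S3 Tables wires up its catalog differently, but the concepts carry over), and the Iceberg Spark runtime is assumed to be on the classpath.

    from pyspark.sql import SparkSession

    # Illustrative Iceberg catalog setup; the catalog name and warehouse
    # path are placeholders, not this platform's actual configuration.
    spark = (
        SparkSession.builder
        .appName("iceberg-sketch")
        .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.demo.type", "hadoop")
        .config("spark.sql.catalog.demo.warehouse", "s3://example-bucket/warehouse")
        .getOrCreate()
    )

    # Hidden partitioning: the table is laid out by day without exposing a
    # separate partition column to writers or readers.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS demo.db.events (
            event_id BIGINT,
            event_ts TIMESTAMP,
            payload  STRING
        )
        USING iceberg
        PARTITIONED BY (days(event_ts))
    """)

    # Schema evolution is a metadata-only commit; no data files are rewritten.
    spark.sql("ALTER TABLE demo.db.events ADD COLUMN source STRING")

    # Every commit produces a snapshot, queryable through metadata tables.
    spark.sql(
        "SELECT snapshot_id, committed_at, operation FROM demo.db.events.snapshots"
    ).show()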

Requirements

  • BA/BS in Computer Science, Engineering, or a related field, plus at least 8 years of hands-on data engineering experience, or an equivalent combination of education and experience
  • Strong, current hands-on experience with AWS data services, including S3 and Spark-based processing
  • Hands-on experience with an open table format (such as Apache Iceberg, Delta Lake, or Hudi), with a clear understanding of table metadata, schema evolution, partitioning, and performance tradeoffs
  • Proficiency in Python and PySpark for production data pipelines
  • Experience designing, building, and operating data pipelines end-to-end in AWS
  • Experience orchestrating data pipelines using Airflow or AWS Step Functions (see the sketch after this list)
  • Strong communication skills and ability to operate as a hands-on technical lead
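
As a point of reference for the orchestration requirement above, here is a hedged sketch of an Airflow DAG chaining two Glue jobs. The DAG id, task ids, Glue job names, and region are hypothetical, and it assumes Airflow 2.4+ with the Amazon provider package installed.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

    # Hypothetical daily refresh DAG; job names refer to pre-existing Glue jobs.
    with DAG(
        dag_id="daily_lakehouse_refresh",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # the "schedule" argument requires Airflow 2.4+
        catchup=False,
    ) as dag:
        bronze_to_silver = GlueJobOperator(
            task_id="bronze_to_silver",
            job_name="bronze-to-silver",  # placeholder Glue job name
            region_name="us-east-1",
        )
        silver_to_gold = GlueJobOperator(
            task_id="silver_to_gold",
            job_name="silver-to-gold",
            region_name="us-east-1",
        )

        # Build the Silver layer only after Bronze ingestion succeeds.
        bronze_to_silver >> silver_to_gold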

Nice To Haves

  • Master's degree in a relevant field
  • Experience working with AWS S3 Tables or native Apache Iceberg in AWS environments
  • Experience modernizing on-prem or legacy data warehouse platforms
  • Familiarity with lakehouse performance tuning, schema evolution, and partitioning
  • Experience with data pipeline orchestration frameworks
  • Exposure to AI/ML data readiness or downstream analytics use cases
  • Experience working in large, complex enterprise environments

Responsibilities

  • Design, build, and operate end-to-end, production-grade data pipelines in AWS
  • Re-engineer legacy ETL into a lakehouse architecture (Bronze/Silver/Gold) on S3, as sketched after this list
  • Work hands-on with open table formats in AWS (S3 Tables / Apache Iceberg) and understand their metadata, snapshots, schema evolution, and performance characteristics
  • Develop pipelines using Python, PySpark, and Spark on AWS Glue and/or EMR
  • Orchestrate workloads using Airflow or AWS Step Functions
  • Partner with data architects to translate business requirements into data products
  • Perform code reviews, mentor engineers, and contribute hands-on in production
  • Support agile development practices, including planning, demos, and reviews
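
To make the Bronze-to-Silver responsibility concrete, a minimal PySpark sketch follows: raw JSON lands in a Bronze prefix, is typed and deduplicated, and is appended to a Silver Iceberg table. The bucket, catalog, and table names are placeholders, and the Iceberg catalog is assumed to be configured as in the earlier sketch.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

    # Bronze: raw JSON as landed on S3 (placeholder bucket/prefix).
    bronze = spark.read.json("s3://example-bucket/bronze/orders/")

    # Silver: typed, deduplicated, validated records.
    silver = (
        bronze
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .dropDuplicates(["order_id"])
        .filter(F.col("order_id").isNotNull())
    )

    # Append into the Silver Iceberg table via the DataFrameWriterV2 API.
    silver.writeTo("demo.db.orders_silver").append()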

Benefits

  • Medical
  • Dental
  • Vision
  • Life
  • AD&D
  • Disability benefits
  • Paid time off
  • Leaves of absence
  • Voluntary benefits
  • Perks
  • Flexible work options
  • Well-being resources
  • Employee assistance program
  • Business travel insurance
  • Service recognition awards
  • Retirement savings plan
  • Employee stock purchase plan