Lead AWS Data Engineer

Fannie Mae · Plano, TX
1d · $141,000 - $184,000 · Onsite

About The Position

We are seeking an experienced Lead AWS Data Engineer to lead the design, development, and optimization of large-scale AWS-based data lake and data pipeline solutions. The ideal candidate will have deep expertise in AWS data services (EMR, Glue, S3, Athena, Lambda) and PySpark-based data processing, along with strong experience in data modeling, performance optimization, and data pipeline orchestration. This role requires strong technical leadership and stakeholder collaboration, including the ability to analyze complex datasets, perform data mapping across systems, and translate business requirements into scalable data engineering solutions.

The Impact You Will Make

The Lead AWS Data Engineer role offers you the flexibility to make each day your own, while working alongside people who care, so that you can deliver on the responsibilities listed below.

Requirements

  • 4+ years of experience in Data Engineering / Big Data development.
  • 4+ years of hands-on experience with AWS data services.
  • Strong experience with AWS EMR, AWS Glue, S3 Data Lakes, Athena / Redshift / Lakehouse architectures.
  • Expertise in PySpark and Spark-based distributed processing.
  • Basic understanding or exposure to Generative AI concepts and AWS Bedrock services.
  • Strong experience building large-scale data pipelines.
  • Proven experience with EMR performance tuning and debugging.
  • Experience with data mapping and integration across heterogeneous datasets.
  • Strong SQL and data modeling skills.
  • Excellent communication and stakeholder management skills.

Nice To Haves

  • Bachelor's degree or equivalent
  • 10+ years of experience in Data Engineering / Big Data development.
  • 5+ years of hands-on experience with AWS data services.
  • Hands-on experience with AWS Bedrock, including working with foundation models and building GenAI-powered data solutions.
  • Experience integrating AI/ML or Generative AI capabilities into data pipelines or analytics platforms.
  • AWS Certification (e.g., AWS Certified Data Analytics, Machine Learning Specialty, or AI/ML-related certifications) preferred.
  • Experience with AWS Step Functions.
  • Experience with Data Lake governance tools (Lake Formation, Glue Catalog).
  • Knowledge of data security and compliance frameworks.
  • Experience implementing CI/CD pipelines for data platforms.

Responsibilities

Data Engineering & Architecture

  • Design, build, and maintain scalable AWS Data Lake architectures using services such as S3, EMR, Glue, Athena, and Lambda.
  • Develop and optimize data pipelines and ETL/ELT workflows using PySpark, AWS Glue, and EMR.
  • Implement high-performance distributed data processing solutions for large-scale datasets.
  • Develop frameworks for data ingestion, transformation, validation, and publishing within the data lake ecosystem.
Performance Optimization

  • Diagnose and resolve EMR cluster performance issues, including memory management, Spark job optimization, partitioning strategies, and resource allocation.
  • Optimize Spark/PySpark workloads for cost and performance.
  • Implement monitoring and performance tuning strategies for data processing pipelines.
Data Analysis & Integration

  • Analyze complex datasets across multiple systems to support data mapping, transformation, and integration.
  • Define and implement data quality checks and validation frameworks.
  • Collaborate with data architects and analysts to develop efficient data models and data flows.
Leadership & Collaboration

  • Act as a technical lead for data engineering initiatives and mentor junior engineers.
  • Work closely with business stakeholders, product owners, and data consumers to gather requirements and translate them into technical solutions.
  • Provide guidance on data architecture best practices and standards.
Workflow & Automation

  • Build and maintain workflow orchestration solutions using tools such as Airflow, Step Functions, or Glue Workflows.
  • Automate deployment and management of data pipelines using CI/CD practices and infrastructure-as-code.
© 2024 Teal Labs, Inc