Apache Iceberg Data Engineer

Booz Allen Hamilton, McLean, VA
$62,000 - $141,000

About The Position

The Opportunity: Ever-expanding technology like IoT, machine learning, and artificial intelligence means that there’s more structured and unstructured data available today than ever before. As a data engineer, you know that organizing data can yield pivotal insights when it’s gathered from disparate sources. We need a data professional like you to help our clients find answers in their data to impact important missions, from fraud detection to cancer research to national intelligence.

As a data engineer at Booz Allen, you’ll use your skills and experience to help build advanced technology solutions and implement data engineering activities on some of the most mission-driven projects in the industry. You’ll develop and deploy the pipelines and platforms that organize disparate data and make it meaningful. Here, you’ll work with a multidisciplinary team of analysts, data engineers, developers, and data consumers in a fast-paced, agile environment. You’ll sharpen your skills in analytical exploration and data examination while you support the assessment, design, development, and maintenance of scalable platforms for your clients.

Due to the nature of work performed within this facility, U.S. citizenship is required.

Work with us to use data for good. Join us. The world can’t wait.

Requirements

  • 2+ years of experience developing and maintaining data pipelines and workflows for large-scale datasets, ensuring efficiency and reliability
  • Experience working with Apache Iceberg or table formats such as Delta Lake or Hudi, including data lake transactions, schema evolution, data version control, and partition optimization
  • Experience working with distributed file systems, such as S3, HDFS, or GCS, and implementing scalable, high-performance data lake infrastructure
  • Experience with query engines such as Presto, Trino, Spark, or Hive, integrating them with Iceberg-backed tables for efficient querying of large datasets
  • Experience in Python and programming languages such as Java or Scala, including implementing scalable ETL/ELT processes to populate and maintain Iceberg tables
  • Experience working with data lifecycle management, including time-travel queries and optimizing data for both historical and real-time use cases
  • Knowledge of data lake and warehouse architecture principles and platforms, including best practices for storage optimization and modern lakehouse paradigms
  • Ability to debug and troubleshoot data lake environments, addressing issues related to data consistency, governance, and performance bottlenecks, and to design and document reusable, modular solutions for managing and interacting with Iceberg-backed datasets in complex ecosystems
  • Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements
  • Bachelor’s degree in Data Engineering or Computer Science
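
The data-version-control and time-travel capabilities called out above can be illustrated with a toy, standard-library-only sketch (hypothetical names, not the real Apache Iceberg API): every write commits an immutable snapshot of the table, and reads can target either the latest snapshot or any earlier one.

```python
from copy import deepcopy

class ToyTable:
    """Toy illustration of snapshot-based time travel; not the real Iceberg API."""

    def __init__(self):
        self._snapshots = []  # immutable history of committed table states

    def commit(self, rows):
        # Each commit records a full copy of the table state, analogous to
        # an Iceberg snapshot, and returns its snapshot id.
        self._snapshots.append(deepcopy(rows))
        return len(self._snapshots) - 1

    def read(self, snapshot_id=None):
        # Default read targets the latest snapshot; passing an older id
        # is the time-travel case.
        if not self._snapshots:
            return []
        if snapshot_id is None:
            snapshot_id = len(self._snapshots) - 1
        return self._snapshots[snapshot_id]

table = ToyTable()
s0 = table.commit([{"id": 1, "name": "a"}])
s1 = table.commit([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])

assert table.read(s0) == [{"id": 1, "name": "a"}]  # time-travel read
assert len(table.read()) == 2                      # current state
```

In real Iceberg deployments the snapshot metadata lives in the table's metadata files rather than in memory, but the reader-facing contract is the same: historical reads never see the effects of later commits.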

Nice To Haves

  • Experience integrating Apache Iceberg with orchestration tools like Apache Airflow to automate workflows involving complex data lake operations
  • Experience with containerized environments such as Docker, and orchestration platforms such as Kubernetes, ensuring scalability for Iceberg-backed systems
  • Experience working with AWS Glue Catalog, Hive Metastore, or other metadata or catalog systems to efficiently manage Iceberg schema and table metadata
  • Experience adapting Iceberg implementations for the cloud
  • Experience implementing data governance principles, including role-based access and compliance policies, into Iceberg workflows
  • Knowledge of cloud-native object storage systems, such as AWS S3, Azure Data Lake, or Google Cloud Storage
  • Knowledge of distributed computing systems, such as Spark or Flink, for both batch and real-time data processing involving Iceberg datasets
  • Knowledge of partitioning strategies and optimization techniques for performance tuning of Iceberg analytics
  • Knowledge of real-time data streaming and integrating tools such as Kafka with Iceberg for near-real-time ingestion and analytics
  • Knowledge of Agile engineering practices
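
The partitioning strategies mentioned above rest on Iceberg-style hidden partitioning, where partition values are derived from column data by a transform (for example, a `days` transform on a timestamp) rather than stored as an explicit partition column. A minimal stdlib-only sketch of that idea (illustrative names, not the real Iceberg API):

```python
from collections import defaultdict
from datetime import datetime

def days_transform(ts: datetime) -> str:
    # Iceberg-style "days" transform: the partition value is derived
    # from the timestamp, so writers never manage a partition column.
    return ts.strftime("%Y-%m-%d")

def partition(rows, key="ts"):
    # Group rows into partitions by the derived value; a query engine
    # filtering on `ts` can then skip whole partitions (pruning).
    parts = defaultdict(list)
    for row in rows:
        parts[days_transform(row[key])].append(row)
    return dict(parts)

events = [
    {"ts": datetime(2024, 5, 1, 9), "value": 1},
    {"ts": datetime(2024, 5, 1, 17), "value": 2},
    {"ts": datetime(2024, 5, 2, 8), "value": 3},
]
by_day = partition(events)
assert sorted(by_day) == ["2024-05-01", "2024-05-02"]
```

Because the transform is recorded in table metadata, queries that filter on the source column benefit from partition pruning without the user spelling out the partition scheme, which is the tuning lever behind much of the performance work these roles describe.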

Benefits

  • health
  • life
  • disability
  • financial
  • retirement benefits
  • paid leave
  • professional development
  • tuition assistance
  • work-life programs
  • dependent care
  • recognition awards program