Apache Iceberg Data Engineer

Booz Allen Hamilton, McLean, VA
$62,000 - $141,000

About The Position

The Opportunity: Ever-expanding technology like IoT, machine learning, and artificial intelligence means that there’s more structured and unstructured data available today than ever before. As a data engineer, you know that organizing data can yield pivotal insights when it’s gathered from disparate sources. We need a data professional like you to help our clients find answers in their data to impact important missions, from fraud detection to cancer research to national intelligence.

As a data engineer at Booz Allen, you’ll use your skills and experience to help build advanced technology solutions and implement data engineering activities on some of the most mission-driven projects in the industry. You’ll develop and deploy the pipelines and platforms that organize disparate data and make it meaningful. Here, you’ll work with a multidisciplinary team of analysts, data engineers, developers, and data consumers in a fast-paced, agile environment. You’ll sharpen your skills in analytical exploration and data examination while you support the assessment, design, development, and maintenance of scalable platforms for your clients.

Due to the nature of work performed within this facility, U.S. citizenship is required.

Work with us to use data for good. Join us. The world can’t wait.

Requirements

  • 2+ years of experience developing and maintaining data pipelines and workflows for large-scale datasets, ensuring efficiency and reliability
  • Experience working with Apache Iceberg or table formats such as Delta Lake or Hudi, including data lake transactions, schema evolution, data version control, and partition optimization
  • Experience working with distributed file systems, such as S3, HDFS, or GCS, and implementing scalable, high-performance data lake infrastructure
  • Experience with query engines such as Presto, Trino, Spark, or Hive, integrating them with Iceberg-backed tables for efficient querying of large datasets
  • Experience in Python and programming languages such as Java or Scala, including implementing scalable ETL/ELT processes to populate and maintain Iceberg tables
  • Experience working with data lifecycle management, including time-travel queries and optimizing data for both historical and real-time use cases
  • Knowledge of data lake and warehouse architecture principles and platforms, including best practices for storage optimization and modern lakehouse paradigms
  • Ability to debug and troubleshoot data lake environments, addressing issues related to data consistency, governance, and performance bottlenecks, and to design and document reusable, modular solutions for managing and interacting with Iceberg-backed datasets in complex ecosystems
  • Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements
  • Bachelor’s degree in Data Engineering or Computer Science
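
The data-version-control and time-travel capabilities called out above can be illustrated with a toy, standard-library-only sketch (hypothetical names, not the real Apache Iceberg API): every write commits an immutable snapshot of the table, and reads can target either the latest snapshot or any earlier one.

```python
from copy import deepcopy

class ToyTable:
    """Toy illustration of snapshot-based time travel; not the real Iceberg API."""

    def __init__(self):
        self._snapshots = []  # immutable history of committed table states

    def commit(self, rows):
        # Each commit records a full copy of the table state, analogous to
        # an Iceberg snapshot, and returns its snapshot id.
        self._snapshots.append(deepcopy(rows))
        return len(self._snapshots) - 1

    def read(self, snapshot_id=None):
        # Default read targets the latest snapshot; passing an older id
        # is the time-travel case.
        if not self._snapshots:
            return []
        if snapshot_id is None:
            snapshot_id = len(self._snapshots) - 1
        return self._snapshots[snapshot_id]

table = ToyTable()
s0 = table.commit([{"id": 1, "name": "a"}])
s1 = table.commit([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])

assert table.read(s0) == [{"id": 1, "name": "a"}]  # time-travel read
assert len(table.read()) == 2                      # current state
```

In real Iceberg deployments the snapshot metadata lives in the table's metadata files rather than in memory, but the reader-facing contract is the same: historical reads never see the effects of later commits.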

Nice To Haves

  • Experience integrating Apache Iceberg with orchestration tools like Apache Airflow to automate workflows involving complex data lake operations
  • Experience with containerized environments such as Docker, and orchestration platforms such as Kubernetes, ensuring scalability for Iceberg-backed systems
  • Experience working with AWS Glue Catalog, Hive Metastore, or other metadata or catalog systems to efficiently manage Iceberg schema and table metadata
  • Experience adapting Iceberg implementations for the cloud
  • Experience implementing data governance principles, including role-based access and compliance policies, into Iceberg workflows
  • Knowledge of cloud-native object storage systems, such as AWS S3, Azure Data Lake, or Google Cloud Storage
  • Knowledge of distributed computing systems, such as Spark or Flink, for both batch and real-time data processing involving Iceberg datasets
  • Knowledge of partitioning strategies and optimization techniques for performance tuning of Iceberg analytics
  • Knowledge of real-time data streaming and integrating tools such as Kafka with Iceberg for near-real-time ingestion and analytics
  • Knowledge of Agile engineering practices
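
The partitioning strategies mentioned above rest on Iceberg-style hidden partitioning, where partition values are derived from column data by a transform (for example, a `days` transform on a timestamp) rather than stored as an explicit partition column. A minimal stdlib-only sketch of that idea (illustrative names, not the real Iceberg API):

```python
from collections import defaultdict
from datetime import datetime

def days_transform(ts: datetime) -> str:
    # Iceberg-style "days" transform: the partition value is derived
    # from the timestamp, so writers never manage a partition column.
    return ts.strftime("%Y-%m-%d")

def partition(rows, key="ts"):
    # Group rows into partitions by the derived value; a query engine
    # filtering on `ts` can then skip whole partitions (pruning).
    parts = defaultdict(list)
    for row in rows:
        parts[days_transform(row[key])].append(row)
    return dict(parts)

events = [
    {"ts": datetime(2024, 5, 1, 9), "value": 1},
    {"ts": datetime(2024, 5, 1, 17), "value": 2},
    {"ts": datetime(2024, 5, 2, 8), "value": 3},
]
by_day = partition(events)
assert sorted(by_day) == ["2024-05-01", "2024-05-02"]
```

Because the transform is recorded in table metadata, queries that filter on the source column benefit from partition pruning without the user spelling out the partition scheme, which is the tuning lever behind much of the performance work these roles describe.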

Benefits

  • health
  • life
  • disability
  • financial
  • retirement benefits
  • paid leave
  • professional development
  • tuition assistance
  • work-life programs
  • dependent care
  • recognition awards program