Meteorological Data Management Engineer

Booz Allen Hamilton, McLean, VA
Remote

About The Position

As a data engineer, you know that organizing data gathered from disparate sources can yield pivotal insights. We need a data professional like you to help our clients find answers in their data and impact important missions, from fraud detection to cancer research to national intelligence. As a data engineer at Booz Allen, you’ll use your skills and experience to build advanced technology solutions and implement data engineering activities on some of the most mission-driven projects in the industry. You’ll develop and deploy the pipelines and platforms that organize disparate data and make it meaningful.

Here, you’ll work with a multi-disciplinary team of analysts, data engineers, developers, and data consumers in a fast-paced, agile environment. You’ll sharpen your skills in analytical exploration and data examination while you support the assessment, design, development, and maintenance of scalable platforms for your clients. Due to the nature of work performed within this facility, U.S. citizenship is required.

Work with us to use data for good. Join us. The world can’t wait.

Requirements

  • 5+ years of experience with meteorological data
  • 3+ years of experience managing large-scale data ecosystems
  • Experience with metadata management tools such as OpenMetadata, including setup, configuration, and integration into data pipelines to ensure discoverability, lineage tracking, and governance
  • Knowledge of data lifecycle management strategies, including tiering data across hot, warm, and cold storage layers, retention policies, and archival workflows, to support petabyte-scale and continuously ingested datasets
  • Knowledge of cloud-based storage systems and intelligent tiering features, such as AWS S3 Intelligent-Tiering or the equivalents in Azure or GCP, including their APIs and configuration
  • Knowledge of current data observability practices and tooling to monitor, assess, and optimize the performance of pipelines and data storage systems in distributed cloud environments
  • Ability to work with large datasets, using programming languages such as Python to develop and optimize data organization, storage, and transformation workflows
  • Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements
  • Bachelor’s degree
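As one concrete illustration of the tiering and retention items above, an S3 lifecycle configuration along these lines transitions objects from hot to warm to cold storage and eventually expires them; the bucket prefix, day counts, and storage classes here are assumptions for the sketch, not client policy:

```json
{
  "Rules": [
    {
      "ID": "tier-weather-observations",
      "Filter": { "Prefix": "observations/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 3650 }
    }
  ]
}
```

A rule like this would typically be applied with the S3 `PutBucketLifecycleConfiguration` API or the equivalent console setting.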

Nice To Haves

  • Experience configuring cloud-native intelligent tiering policies, such as automating data movement between storage tiers with AWS Lambda, Step Functions, or equivalent workflows
  • Experience integrating OpenMetadata with existing tools such as Apache Airflow, Kubernetes, and large-scale orchestration systems so that metadata catalogs automatically synchronize with pipeline operations
  • Experience working with containerized environments such as Docker or Kubernetes, and modern orchestration tools such as Airflow or Prefect, to optimize both metadata workflows and storage pipelines
  • Experience implementing data governance frameworks, such as access controls and lineage policies, that integrate with OpenMetadata or equivalent metadata tools
  • Knowledge of geospatial datasets or scientific data formats commonly used in weather or satellite data systems, such as NetCDF, GRIB, or HDF5, and their implications for storage architecture
  • Knowledge of distributed query engines such as Presto, Trino, Hive, or Spark, tuned for performance on a lakehouse or intelligent tiering-enabled data lake architecture
  • Knowledge of real-time data streaming tools and integrations such as Kafka or AWS Kinesis, ensuring metadata tracks changes and tiering strategies accommodate time-sensitive ingestion workflows
  • Knowledge of Agile engineering practices, including CI/CD pipelines and collaboration with data engineers, AI engineers, and product teams to deliver optimized data ecosystems

Responsibilities

  • Develop and deploy the pipelines and platforms that organize and make disparate data meaningful.
  • Work with a multi-disciplinary team of analysts, data engineers, developers, and data consumers in a fast-paced, agile environment.
  • Support the assessment, design, development, and maintenance of scalable platforms for clients.
  • Use data for good.
  • Design storage strategies such as intelligent tiering for cost optimization and performance.
  • Integrate storage and metadata strategies into Agile, cross-functional development teams to ensure alignment with real-time and batch processing pipelines.
  • Manage data lifecycle, including tiering data across hot, warm, and cold storage layers, retention policies, and archival workflows, to support petabyte-scale and continuously ingested datasets.
  • Monitor, assess, and optimize the performance of pipelines and data storage systems in distributed cloud environments.
  • Develop and optimize data organization, storage, and transformation workflows using programming languages such as Python.
  • Analyze usage patterns and recommend optimizations for performance, cost, and data accessibility at scale.

Benefits

  • Health
  • Life
  • Disability
  • Financial
  • Retirement benefits
  • Paid leave
  • Professional development
  • Tuition assistance
  • Work-life programs
  • Dependent care
  • Recognition awards program