Data Engineer (Bid Role starting in 10 weeks) - TS/SCI w/ Poly

Leading Path Consulting - McLean, VA
Onsite

About The Position

The Sponsor requires Data Engineering support to evaluate, optimize, and implement robust data infrastructure that enables reliable, accessible, and scalable data delivery across the organization. The Contractor will work collaboratively with data consumers, technical teams, leadership, and stakeholders to assess current data pipelines, identify gaps in data accessibility and reliability, and architect solutions that establish trusted data foundations. Work involves applying engineering best practices to implement proper data modeling and integration patterns, ensuring data quality and observability throughout pipelines, and creating maintainable infrastructure that supports analytics, reporting, and operational use cases. The Sponsor's data landscape includes enterprise operational systems such as ServiceNow, network management platforms (NetIM), and network modeling tools (Forward Networks). The Contractor must be adept at extracting data from these systems via APIs (Application Programming Interfaces), exports, and vendor-specific interfaces, often with limited documentation or non-standard data structures, and at transforming this operational data into accessible, integrated datasets.
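
For illustration, below is a minimal Python sketch of the kind of extraction work described above, assuming a ServiceNow-style paginated REST endpoint. The instance URL, table name, credentials, and pagination parameters are placeholders, not Sponsor specifics.

    # Sketch: paginated extraction from a ServiceNow-style REST API.
    # Instance URL, table, and credentials are placeholders.
    import time
    import requests

    BASE_URL = "https://example.service-now.com/api/now/table/incident"
    PAGE_SIZE = 500

    def fetch_all(session: requests.Session) -> list[dict]:
        """Page through the table API, retrying transient failures."""
        records, offset = [], 0
        while True:
            for attempt in range(3):                      # bounded retry
                resp = session.get(
                    BASE_URL,
                    params={"sysparm_limit": PAGE_SIZE,
                            "sysparm_offset": offset},
                    timeout=30,
                )
                if resp.status_code in (429, 500, 502, 503):
                    time.sleep(2 ** attempt)              # exponential backoff
                    continue
                resp.raise_for_status()
                break
            else:
                resp.raise_for_status()                   # out of retries
            page = resp.json().get("result", [])
            if not page:                                  # empty page: done
                return records
            records.extend(page)
            offset += PAGE_SIZE

    if __name__ == "__main__":
        with requests.Session() as s:
            s.auth = ("svc_account", "change-me")         # placeholder basic auth
            print(f"extracted {len(fetch_all(s))} records")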

Requirements

  • Demonstrated experience building and managing data pipelines.
  • Demonstrated experience with Python.
  • Demonstrated experience with cloud computing using AWS services.
  • Demonstrated experience processing data using Apache Spark (see the sketch after this list).
  • Demonstrated experience with an RDBMS (Relational Database Management System) such as Postgres, Oracle, or MySQL, and writing SQL queries.
  • Demonstrated experience with Linux and shell scripting.
  • Demonstrated experience analyzing data in different file formats such as CSV, XML, JSON, Avro, and Parquet.
  • Demonstrated experience writing and validating unit tests.
  • Active TS/SCI with full-scope (FS) polygraph required prior to application.
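
As a rough illustration of several requirements above (Spark, mixed file formats, unit tests), here is a minimal PySpark sketch; paths, schema, and column names are hypothetical.

    # Sketch: read mixed file formats with PySpark, apply a pure transform,
    # and unit-test that transform. Paths and columns are hypothetical.
    from pyspark.sql import DataFrame, SparkSession
    from pyspark.sql import functions as F

    def normalize(df: DataFrame) -> DataFrame:
        """Trim the key column and drop duplicates -- pure, so testable."""
        return (df.withColumn("device_id", F.trim(F.col("device_id")))
                  .dropDuplicates(["device_id"]))

    def main() -> None:
        spark = SparkSession.builder.appName("normalize-devices").getOrCreate()
        csv_df = spark.read.option("header", True).csv("s3://bucket/raw/*.csv")
        json_df = spark.read.json("s3://bucket/raw/*.json")
        merged = csv_df.unionByName(json_df, allowMissingColumns=True)
        normalize(merged).write.mode("overwrite").parquet("s3://bucket/curated/")

    def test_normalize_dedupes():                         # pytest-style unit test
        spark = SparkSession.builder.master("local[1]").getOrCreate()
        df = spark.createDataFrame([(" a1 ",), ("a1",)], ["device_id"])
        assert normalize(df).count() == 1

    if __name__ == "__main__":
        main()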

Nice To Haves

  • Demonstrated experience with NiFi, Apache Airflow, or an equivalent tool for orchestrating data pipelines (see the DAG sketch after this list).
  • Demonstrated experience with Java or Scala.
  • Demonstrated experience administering an EMR/Spark cluster.
  • Demonstrated experience conducting performance tuning of a Spark job.
  • Demonstrated experience supporting Hive, Iceberg, or another technology providing SQL access to data.
  • Demonstrated experience developing cloud-based security solutions.
  • Demonstrated experience following a configuration management process to review and deploy code as part of releases.
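
For orchestration, a minimal Apache Airflow (2.x) DAG sketch follows; the DAG id, schedule, and task callables are placeholders.

    # Sketch: a three-task daily DAG in Apache Airflow 2.x.
    # DAG id, schedule, and callables are placeholders.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(): ...     # pull from source APIs/exports
    def transform(): ...   # clean and conform the data
    def load(): ...        # publish curated datasets

    with DAG(
        dag_id="daily_device_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 2,
                      "retry_delay": timedelta(minutes=5)},
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t3 = PythonOperator(task_id="load", python_callable=load)
        t1 >> t2 >> t3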

Responsibilities

  • Conduct comprehensive assessments of existing data pipelines, infrastructure, and data flows including integrations with operational systems like ServiceNow, network management platforms, and business applications to identify technical debt, bottlenecks, and reliability issues.
  • Evaluate current data architecture against industry best practices and organizational needs; develop technical recommendations and roadmaps for data infrastructure improvements.
  • Design, build, and maintain production-grade data pipelines using orchestration tools such as Airflow or Prefect.
  • Develop robust ETL (Extract-Transform-Load) / ELT (Extract-Load-Transform) processes from diverse sources: SaaS platforms, network management systems, databases, APIs, files, and streams.
  • Build API integrations handling authentication (OAuth, API keys, and Single Sign-On (SSO)), rate limiting, pagination, retry logic, and error handling.
  • Extract data from systems not designed for export; reverse-engineer undocumented data structures and relationships.
  • Handle semi-structured data (JSON and XML) and transform it into structured datasets with consistent schemas.
  • Design dimensional models, data warehouses, and data marts following industry methodologies.
  • Create conceptual, logical, and physical data models optimized for query performance and storage efficiency.
  • Implement slowly changing dimensions and other data warehousing patterns (see the sketch after this list).
  • Establish naming conventions, data standards, and modeling best practices.
  • Implement comprehensive data quality checks, validation rules, and automated monitoring with alerting.
  • Build error handling, failure recovery, logging, and observability into all processes.
  • Optimize pipelines for performance, cost, and resource utilization.
  • Develop reusable components and frameworks; refactor legacy pipelines for reliability.
  • Build and maintain data infrastructure on cloud platforms (Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP)) using infrastructure-as-code tools such as Terraform and CloudFormation.
  • Implement CI/CD pipelines, version control (Git), and automated testing frameworks.
  • Manage database performance tuning, indexing, partitioning, and capacity planning.
  • Establish backup, recovery, security controls, access controls, and compliance measures.
  • Partner with analysts, software developers, and business stakeholders to translate requirements into technical solutions.
  • Create comprehensive documentation for systems, processes, and integrations.
  • Provide technical guidance on data availability and proper usage; enable self-service access.
  • Troubleshoot pipeline failures, performance issues, and data discrepancies; perform root cause analysis.
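
As an example of the warehousing patterns above, one common way to load a Type 2 slowly changing dimension on Postgres, driven from Python, is sketched below; all table and column names are hypothetical.

    # Sketch: Type 2 slowly changing dimension load on Postgres via psycopg2.
    # Table and column names are hypothetical.
    import psycopg2

    EXPIRE_CHANGED = """
        UPDATE dim_device d
        SET    valid_to = now(), is_current = false
        FROM   stg_device s
        WHERE  d.device_id = s.device_id
          AND  d.is_current
          AND  d.attributes_hash <> s.attributes_hash  -- attributes changed
    """

    INSERT_VERSIONS = """
        INSERT INTO dim_device
               (device_id, attributes_hash, site, valid_from, valid_to, is_current)
        SELECT s.device_id, s.attributes_hash, s.site, now(), NULL, true
        FROM   stg_device s
        LEFT JOIN dim_device d
               ON d.device_id = s.device_id AND d.is_current
        WHERE  d.device_id IS NULL  -- new keys plus freshly expired ones
    """

    def load_scd2(dsn: str) -> None:
        # psycopg2's connection context manager commits on clean exit
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute(EXPIRE_CHANGED)   # close out changed current rows
            cur.execute(INSERT_VERSIONS)  # open new current rows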

Benefits

  • Fully paid medical/dental/vision premiums
  • Generous PTO
  • 11 paid holidays
  • 6% 401(k) contribution
  • Annual training and tuition reimbursement
  • SPOT Award bonuses
  • Regular team events
  • Opportunities for professional growth and advancement