Data Infrastructure Engineer

Guidehouse
Tysons, VA
$98,000 - $163,000

About The Position

We are seeking a Data Infrastructure Engineer to build and operate the data platform that powers AI/ML analytics modules. You will design and implement scalable data ingestion pipelines, robust ETL/ELT, and a modern data lake / delta lake (lakehouse) on AWS. You’ll also establish a managed metadata repository and governance layers (catalog, lineage, quality, access controls) and deliver automated cloud provisioning plus CI/CD for data pipelines to enable reliable, repeatable deployments across environments. This role is ideal for an engineer who enjoys platform building, automation, and enabling advanced analytics through trusted, well-governed data.

Requirements

  • Must be able to obtain and maintain a Federal or DoD Public Trust; candidates must obtain approved adjudication of their Public Trust prior to onboarding with Guidehouse. Candidates with an active Public Trust or Suitability determination are preferred.
  • Bachelor’s degree in Engineering, IT, Computer Science, or related field (or equivalent experience).
  • Minimum of four (4) years of experience building production data pipelines and/or data platforms.
  • Strong experience implementing data ingestion and ETL/ELT workflows, including data modeling and transformation best practices.
  • Hands-on experience building a data lake / delta lake (lakehouse) on AWS (or equivalent cloud) using object storage and modern table formats/patterns.
  • Proficiency in SQL and one programming language commonly used for data engineering (Python preferred; Scala/Java acceptable).
  • Experience with metadata management and governance: cataloging, lineage, ownership, access controls, classification and policy enforcement.
  • Experience implementing automated AWS provisioning using IaC and operating across multiple environments.
  • Experience building or operating CI/CD pipelines for data workflows (testing, packaging, deployment automation, environment promotion); a minimal test-gate sketch follows this list.
  • Solid security fundamentals: IAM/least privilege, encryption, secrets management, secure SDLC practices.
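
For illustration, one kind of CI/CD validation gate the requirement above refers to: pytest unit tests that must pass before a pipeline is promoted between environments. The normalize_customer transformation here is hypothetical:

    # run with: pytest test_normalize_customer.py
    def normalize_customer(record: dict) -> dict:
        # hypothetical transformation under test
        return {
            "customer_id": str(record["id"]).strip(),
            "email": record.get("email", "").lower() or None,
        }

    def test_lowercases_email():
        out = normalize_customer({"id": 42, "email": "USER@EXAMPLE.COM"})
        assert out == {"customer_id": "42", "email": "user@example.com"}

    def test_handles_missing_email():
        out = normalize_customer({"id": " 7 "})
        assert out["customer_id"] == "7"
        assert out["email"] is None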

Nice To Haves

  • Hands-on experience with Databricks.
  • Hands-on experience utilizing modern DevOps practices, including tools like Git, Terraform, Jenkins, AWS CodePipeline, and Docker.
  • Experience utilizing AI-assisted coding tools (e.g., GitHub Copilot, ChatGPT, Cursor, Kiro) to safely accelerate implementation while maintaining strict code quality through testing, code reviews, and security practices.
  • Knowledge graph and Graph RAG experience (a minimal hybrid retrieval sketch follows this list), including:
      ◦ Graph modeling and ontology/taxonomy alignment
      ◦ Entity resolution and relationship extraction
      ◦ Hybrid retrieval approaches combining graph traversal with semantic/vector search to improve grounding and explainability
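
For illustration, a minimal sketch of the hybrid retrieval idea above: a vector similarity step picks a seed entity, then a graph traversal step expands it for grounding. The graph, embeddings, and entity names are toy data, and networkx/numpy are assumed dependencies:

    import networkx as nx
    import numpy as np

    # toy knowledge graph (entities and relations are hypothetical)
    graph = nx.Graph()
    graph.add_edge("ACME Corp", "Jane Doe", relation="employs")
    graph.add_edge("ACME Corp", "Widget X", relation="manufactures")

    # toy embedding index; in practice vectors come from an embedding model
    index = {
        "ACME Corp": np.array([0.9, 0.1]),
        "Widget X": np.array([0.2, 0.8]),
    }

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def hybrid_retrieve(query_vec: np.ndarray, hops: int = 1) -> set:
        # semantic step: nearest entity by cosine similarity
        seed = max(index, key=lambda name: cosine(query_vec, index[name]))
        # graph step: expand to the seed's n-hop neighborhood for grounding
        neighborhood = nx.single_source_shortest_path_length(graph, seed, cutoff=hops)
        return set(neighborhood)

    print(hybrid_retrieve(np.array([0.8, 0.2])))  # {'ACME Corp', 'Jane Doe', 'Widget X'}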

Responsibilities

  • Build & Operate Data Pipelines (Batch + Streaming)
  • Design and implement batch and streaming ingestion from APIs, relational databases, file drops, event streams, and external partners.
  • Build and optimize ETL/ELT pipelines to produce curated, analytics-ready datasets for reporting and ML consumption.
  • Implement incremental processing patterns, change data capture (CDC) approaches where appropriate, and data contract standards.
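
As one sketch of the CDC pattern named above, here is a minimal upsert/delete merge using the delta-spark Python API. The S3 paths, the customer_id key, and the op flag encoding ('I'/'U'/'D') are illustrative assumptions, not a prescribed layout:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    # hypothetical staging feed of CDC events and curated Delta target
    updates = spark.read.parquet("s3://example-bucket/staging/customers_cdc")
    target = DeltaTable.forPath(spark, "s3://example-bucket/curated/customers")

    # apply inserts, updates, and deletes idempotently, keyed on customer_id
    (target.alias("t")
        .merge(updates.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedDelete(condition="s.op = 'D'")
        .whenMatchedUpdateAll(condition="s.op = 'U'")
        .whenNotMatchedInsertAll(condition="s.op = 'I'")
        .execute())
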
Deliver a Modern Lakehouse (Data Lake / Delta Lake)

  • Build and manage a scalable lakehouse on AWS object storage (e.g., S3) using open table/file formats and delta/lakehouse concepts (e.g., ACID tables, schema evolution, time travel patterns).
  • Optimize performance and cost through partitioning, compaction, lifecycle policies, and efficient compute/storage usage (see the sketch after this list).
  • Establish environment standards for dev/test/prod and consistent promotion across stages.
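
A minimal sketch of the partitioned-write and compaction points above, again with hypothetical paths and column names; optimize()/executeCompaction() assumes delta-spark 2.0 or later:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    # write curated data partitioned by ingest date (path and column hypothetical)
    (spark.read.parquet("s3://example-bucket/staging/events")
        .write.format("delta")
        .partitionBy("ingest_date")
        .mode("append")
        .save("s3://example-bucket/curated/events"))

    # periodic small-file compaction keeps scan performance and cost predictable
    (DeltaTable.forPath(spark, "s3://example-bucket/curated/events")
        .optimize()
        .executeCompaction())
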
Metadata, Governance, Lineage & Quality (Trust Layer)

  • Implement a managed metadata repository for dataset cataloging, ownership, glossary/definitions, tagging, and discoverability.
  • Enable end-to-end lineage (source → transformations → consumption) to support auditability and impact analysis.
  • Implement governance controls including policy-based access, data classification, retention, and secure data handling.
  • Build operational data quality checks (freshness, completeness, validity, anomaly detection) and publish SLAs/SLOs (a minimal check sketch follows this list).
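
As one sketch of what operational freshness and completeness checks can look like in PySpark (the table path, column names, and the 24-hour threshold are illustrative assumptions):

    from datetime import datetime, timedelta
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.format("delta").load("s3://example-bucket/curated/events")

    # freshness: newest record must be under 24 hours old
    # (assumes ingested_at is stored as a UTC timestamp)
    latest = df.agg(F.max("ingested_at")).first()[0]
    if latest < datetime.utcnow() - timedelta(hours=24):
        raise ValueError(f"freshness SLA violated: latest record is {latest}")

    # completeness: key business columns must be fully populated
    missing = df.filter(F.col("customer_id").isNull()).count()
    if missing > 0:
        raise ValueError(f"{missing} rows are missing customer_id")
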
AWS Automation + CI/CD for Data Pipelines

  • Implement automated cloud provisioning in AWS using Infrastructure as Code (IaC) for consistent environments and secure-by-default baselines (see the IaC sketch after this list).
  • Build and enhance CI/CD for data pipelines, including automated tests, validation gates, promotion workflows, and rollback strategies.
  • Improve observability with metrics/logs/alerts, dashboards, runbooks, and incident response readiness.
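
A minimal sketch of a secure-by-default IaC baseline, written with AWS CDK in Python since the posting prefers Python (Terraform would serve equally well); the stack and bucket names are hypothetical:

    from aws_cdk import App, RemovalPolicy, Stack
    from aws_cdk import aws_s3 as s3
    from constructs import Construct

    class LakehouseStorageStack(Stack):
        """Secure-by-default object storage for the lakehouse."""

        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)
            s3.Bucket(
                self,
                "CuratedBucket",
                encryption=s3.BucketEncryption.S3_MANAGED,           # encrypted at rest
                block_public_access=s3.BlockPublicAccess.BLOCK_ALL,  # no public access
                versioned=True,                                      # supports recovery
                removal_policy=RemovalPolicy.RETAIN,                 # keep data on teardown
            )

    app = App()
    # one stack instance per environment supports consistent dev/test/prod promotion
    LakehouseStorageStack(app, "lakehouse-dev")
    app.synth()
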
Cross-Team Collaboration & Documentation

  • Work closely with engineering, security, networking, and application teams to support mission needs and delivery timelines.
  • Maintain high-quality engineering documentation including SOPs, system diagrams, and secure configuration baselines.
  • Summarize and present findings and recommendations, both written and verbal, to technical and non-technical stakeholders.

Benefits

  • Medical, Rx, Dental & Vision Insurance
  • Personal and Family Sick Time & Company Paid Holidays
  • Parental Leave
  • 401(k) Retirement Plan
  • Group Term Life and Travel Assistance
  • Voluntary Life and AD&D Insurance
  • Health Savings Account, Health Care & Dependent Care Flexible Spending Accounts
  • Transit and Parking Commuter Benefits
  • Short-Term & Long-Term Disability
  • Tuition Reimbursement, Personal Development, Certifications & Learning Opportunities
  • Employee Referral Program
  • Corporate Sponsored Events & Community Outreach
  • Care.com annual membership
  • Employee Assistance Program
  • Supplemental Benefits via Corestream (Critical Care, Hospital Indemnity, Accident Insurance, Legal Assistance and ID theft protection, etc.)
  • Position may be eligible for a discretionary variable incentive bonus