Sr. Data Architect - Aviation

SteerBridge•Vienna, VA

4d•$155,000 - $180,000

About The Position

SteerBridge Strategies is a modern technology company delivering innovative, mission-focused solutions to the U.S. Government and private sector. Leveraging deep expertise in federal acquisition, digital transformation, and emerging technologies, we deliver agile, commercial-grade capabilities that accelerate operational effectiveness and drive measurable mission success. At the core of SteerBridge are our people, especially the veterans whose leadership, problem-solving mindset, and commitment to excellence elevate every project we support. We don’t simply hire exceptional talent; we cultivate it, creating meaningful career pathways for veterans, military spouses, and professionals who share our passion for advancing technology and strengthening the missions we serve.

We are seeking a Senior Data Architect to lead the design and evolution of enterprise-level data ecosystems. You will be responsible for architecting scalable, secure, and high-performance data infrastructures that support mission-critical aviation sustainment. This is a "player-coach" role that requires high-level strategic planning alongside hands-on engineering execution.

Requirements

Must be a U.S. Citizen.
Master’s degree or higher in Systems Engineering, Computer Science, or a related field.
An active security clearance or the ability to obtain one is required.
Minimum of 6 years of experience, including:
Experience in data management using Python and advanced analytics tools and platforms.
Experience with Data Warehousing consulting/engineering or related technologies (Redshift, Databricks, BigQuery, OADW, Apache Hive, Apache Lucene).
Experience in scripting, tooling, and automating large-scale computing environments.
Extensive experience with major tools such as Python, Pandas, PySpark, NumPy, SciPy, SQL, and Git; some experience with TensorFlow, PyTorch, and Scikit-learn.
Deep understanding of data security and federal compliance requirements.

Nice To Haves

Data modeling (conceptual, logical, and physical)
Database schema design
Understanding of different database paradigms (relational, NoSQL, graph databases, etc.)
ETL (Extract, Transform, Load) processes and tools
Experience with modern data warehousing solutions (e.g., Redshift, Snowflake, BigQuery)
Understanding of dimensional modeling (star/snowflake schemas) and data vault techniques.
Experience designing for both OLTP and OLAP workloads.
Familiarity with metadata-driven design and schema evolution in data systems.
Experience defining data SLAs and lifecycle management policies.
Designing and implementing scalable data architectures that support business intelligence, analytics, and machine learning workflows.
Proficiency in tools like Apache Kafka, Airflow, Spark, Flink, or NiFi
Experience with cloud-based data services (AWS Glue, Google Cloud Dataflow, Azure Data Factory)
Real-time and batch data processing
Automation and monitoring of data pipelines
Strong understanding of incremental processing, idempotency, and backfill strategies.
Knowledge of workflow dependency management, retries, and alerting.
Experience writing modular, testable, and reusable Python-based ETL code.
Leading the development of highly available, fault-tolerant, and scalable data pipelines, integrating multiple data sources, and ensuring data quality.
Expertise in cloud environments (AWS, GCP, Azure)
Understanding of cloud-based storage (S3, Blob Storage), databases (RDS, DynamoDB), and compute resources
Implementing cloud-native data solutions (Data Lake, Data Warehouse, Data Mesh)
Experience with cost monitoring and optimization for data workloads.
Familiarity with hybrid and multi-cloud architectures.
Understanding of serverless data patterns (e.g., Lambda + S3 + Athena, Cloud Functions + BigQuery).
Migrating legacy data infrastructure to the cloud or developing new data platforms using cloud services, with a focus on cost efficiency and scalability.
Experience with big data ecosystems (Hadoop, HDFS, Hive, Spark)
Distributed computing, parallel processing, and handling petabyte-scale data
Tools for querying large datasets (Presto, Athena)
Understanding of lakehouse frameworks (Delta Lake, Iceberg, Hudi).
Familiarity with data compaction, schema evolution, and ACID guarantees in distributed storage
Building and managing big data platforms to enable large-scale analytics, often incorporating structured and unstructured data.
Expertise in database technologies (SQL, NoSQL, GraphDBs)
Query optimization, indexing, and partitioning strategies
Backup, replication, and disaster recovery planning
Understanding of query execution plans, cost-based optimization, and caching strategies.
Experience performing index and partition design based on query patterns.
Familiarity with data versioning and temporal tables.
Experience profiling and optimizing application code interacting with databases.
Performance tuning for complex queries, implementing database replication and sharding strategies to support high availability and scalability.
Data privacy, encryption, and compliance with regulations (GDPR, CCPA)
Implementing data governance frameworks (data lineage, cataloging, metadata management)
Role-based access control and user management for sensitive data
Experience with automated policy enforcement and data lineage visualization tools (e.g., DataHub, Collibra, Alation).
Knowledge of data quality frameworks integrated into CI/CD pipelines.
Familiarity with data contract testing between producer and consumer teams.
Developing and implementing data governance policies and security controls across the organization’s data assets, ensuring compliance with industry standards.
Proficiency in Python and SQL
Experience with version control (Git) and CI/CD for data engineering (GitLab, Jenkins, CircleCI)
API design and integration (Postman)
Strong understanding of object-oriented programming (OOP) principles and design patterns in Python.
Familiarity with software engineering best practices (modularity, testing, documentation, linting).
Understanding of algorithmic complexity (Big O notation) and ability to optimize code for scale.
Experience with parallel and distributed computation frameworks (Spark, Dask, Ray).
Ability to profile and debug performance bottlenecks in data workflows.
Use of type hinting, logging frameworks, and automated testing frameworks (pytest, unittest)
Experience in supporting data scientists with feature engineering, data wrangling, and model deployment
Knowledge of ML orchestration tools (MLflow, Kubeflow)
Hands-on experience with analytics tools (e.g., Tableau, Power BI)
Familiarity with feature store design and model feature lineage tracking.
Understanding of data versioning and reproducibility for ML workflows.
Experience supporting real-time model inference pipelines.
Designing architectures that support AI/ML initiatives, enabling scalable data pipelines for training models, and supporting experimentation in the production environment.
Leading data engineering teams and collaborating cross-functionally with data scientists, analysts, and business units
Project management (Agile, Scrum, Kanban) and stakeholder communication
Experience mentoring and developing junior data engineers
Experience establishing data architecture standards and best practices.
Ability to review and approve technical designs for consistency and scalability.
Proven success in mentoring engineers in code quality, modeling, and system design.
Leading the technical direction for large-scale data initiatives, such as enterprise data lake implementations or the creation of a unified data platform.

Responsibilities

Design conceptual, logical, and physical data models for complex federal environments.
Lead the transition from legacy on-premises systems to modern, cloud-native (AWS/GCP) data platforms.
Architect and oversee the build of automated ETL/ELT pipelines using Python, SQL, and PySpark to ingest and transform unstructured and structured data.
Implement and optimize enterprise data warehouses using tools such as AWS Redshift, Google BigQuery, AWS Glue, and Databricks.
Establish data governance frameworks, metadata management, and data lineage in alignment with federal standards (HIPAA, FHIR, NIST).
Perform index and partition design and query tuning, and implement sharding strategies, to ensure high availability and scalability for real-time analytics.
Design data architectures that facilitate AI/ML initiatives, including model training pipelines and real-time inference in production environments.
Mentor a team of data engineers, enforce software engineering best practices (CI/CD, unit testing, documentation), and serve as a technical bridge between stakeholders and delivery teams.