Data Engineer II

SteerBridge · San Diego, CA
Onsite

About The Position

SteerBridge seeks a highly skilled and motivated Data Engineer II to join our team supporting the F-35 AI/ML Spares Project. This role involves building and maintaining AWS-based ETL/ELT pipelines, curated analytical datasets, and reporting workflows that support operational decision-making. The Data Engineer II will design scalable data infrastructure for business intelligence, machine learning, and operational analytics, working on-site within existing systems of record that span multiple databases. A key aspect of the role is mentoring and collaborating with Marines at the squadron level, which requires a deep understanding of squadron-specific operations and a commitment to improving data entry and indexing practices. The position serves as a crucial link between existing systems and new data development, requiring close collaboration with data scientists, analysts, and operational partners.

Requirements

  • 3–5 years of professional experience in data engineering or a closely related role.
  • Bachelor's Degree in Computer Science or related field; three (3) years of additional relevant experience may substitute for education (minimum six years total experience without degree).
  • U.S. Citizenship required.
  • Active security clearance or the ability to obtain one is required.
  • Strong proficiency in Python (PySpark, pandas) and SQL for data processing and pipeline development, including fluency with CTEs, window functions, and complex joins (a brief sketch follows this list).
  • Hands-on experience with at least one cloud data warehouse (Snowflake, BigQuery, or Redshift).
  • Experience configuring or monitoring data pipelines in cloud platforms (AWS preferred; Oracle, Azure, Google also considered).
  • Familiarity with analytics deployment architectures, including Python services, Docker containers, and Kubernetes.
  • Experience with Spark/Databricks for streaming data analytics; experience with graph data, machine learning, and AI applications preferred.
  • Experience using Azure Data Factory to schedule pipelines and manage data flows.
  • Ability to connect and work with APIs (REST, SOAP, HTTP methods).
  • Experience with workflow orchestration tools such as Apache Airflow, dbt, or Prefect.
  • Solid understanding of data modeling concepts: star schema, data vault, medallion architecture, data lineage, and source-to-target mapping.
  • Familiarity with data visualization tools such as Tableau, Power BI, Elasticsearch/Kibana, R, or Alteryx.
  • Experience with version control (Git), CI/CD practices, and production deployment workflows.
  • Experience integrating data; familiarity with cleaning, merging, standardizing, documenting, and securing data.
  • Strong communication skills with the ability to translate technical concepts for non-technical and operational stakeholders.
  • Ability to develop relationships with collaborators, program providers, community partners, and military personnel.
  • Able to successfully prioritize and manage multiple critical projects simultaneously with a high degree of accuracy.
  • Experience working on applied data projects involving diverse organizations to collect, analyze, and interpret data.
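
As a concrete illustration of the SQL fluency called for above, here is a minimal, self-contained sketch that combines a CTE with a window function. It runs against an in-memory SQLite database; the table and column names (spares_demand, part_id, order_date, demand_qty) are hypothetical placeholders, not actual project schema.

```python
import sqlite3

# Build a tiny in-memory dataset so the query below actually runs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spares_demand (part_id TEXT, order_date TEXT, demand_qty INTEGER);
    INSERT INTO spares_demand VALUES
        ('P-100', '2024-01-05', 4),
        ('P-100', '2024-01-20', 2),
        ('P-100', '2024-02-11', 9),
        ('P-200', '2024-01-09', 1),
        ('P-200', '2024-02-14', 5);
""")

QUERY = """
WITH monthly_demand AS (               -- CTE: roll raw orders up by month
    SELECT
        part_id,
        substr(order_date, 1, 7) AS demand_month,
        SUM(demand_qty)          AS total_qty
    FROM spares_demand
    GROUP BY part_id, demand_month
)
SELECT
    part_id,
    demand_month,
    total_qty,
    RANK() OVER (                      -- window function: rank months per part
        PARTITION BY part_id
        ORDER BY total_qty DESC
    ) AS demand_rank
FROM monthly_demand
ORDER BY part_id, demand_rank;
"""

for row in conn.execute(QUERY):
    print(row)
```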

Nice To Haves

  • AWS Professional or Specialty Certification, or the ability to obtain one (highly preferred).
  • Experience supporting DoD and/or VA missions.
  • At least two (2) years using SQL professionally, with proficiency in R or Python.
  • Cloud project experience using AWS, Google, Oracle, and/or Azure.
  • Experience with ML/NLP/AI including Neural Networks and Supervised/Unsupervised algorithms for anomaly detection, forecasting, and modeling.
  • Experience with streaming data technologies such as Apache Kafka or AWS Kinesis.
  • Proficiency in HTTP methods, Postman-based development and testing of REST and/or SOAP APIs, and CRUD actions (see the sketch following this list).
  • Familiarity with data catalog and lineage tools such as DataHub, Alation, or Monte Carlo.
  • Knowledge of infrastructure-as-code tools such as Terraform or Pulumi.
  • Exposure to ML pipelines and feature stores.
  • Demonstrated high proficiency in statistical analysis software: Power BI, Tableau, Elasticsearch/Kibana, Alteryx, Python, or R.
  • Deep understanding of data quality issues with applied experience in quality assurance.
  • Proficiency in each phase of the software development lifecycle.
  • Master's degree in Computer Science, Engineering, Mathematics, or a related field (or equivalent experience).
  • Excellent writing, presentation, and research design skills; track record of communicating complex concepts to diverse audiences.
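
As a hedged sketch of the HTTP/CRUD proficiency mentioned above, the following uses Python's requests library against httpbin.org, a public request-echo service used here purely for illustration; real project endpoints and payload fields will differ.

```python
import requests

BASE = "https://httpbin.org"  # public echo service, illustration only

# CREATE: POST a new record (httpbin echoes the payload back).
created = requests.post(f"{BASE}/post", json={"part_id": "P-100", "qty": 4}, timeout=10)
created.raise_for_status()

# READ: GET with query parameters.
fetched = requests.get(f"{BASE}/get", params={"part_id": "P-100"}, timeout=10)

# UPDATE: PUT a revised version of the record.
updated = requests.put(f"{BASE}/put", json={"part_id": "P-100", "qty": 6}, timeout=10)

# DELETE: remove the record.
deleted = requests.delete(f"{BASE}/delete", timeout=10)

print(fetched.json()["args"])  # -> {'part_id': 'P-100'}
```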

Responsibilities

  • Lead end-to-end data pipeline operations: design, develop, and maintain robust ETL/ELT pipelines on AWS (AWS Glue, Amazon Redshift, Amazon S3) using modern orchestration tools such as Apache Airflow, dbt, or Prefect (a minimal DAG sketch follows this list).
  • Use Azure Databricks (Spark) and Azure Data Factory to manage and schedule data pipelines and workflows.
  • Build and optimize data models in cloud data warehouses (Snowflake, BigQuery, or Redshift); maintain data in S3 buckets and Blob storage.
  • Integrate data from diverse sources including REST/SOAP APIs, event streams (Kafka), relational and non-relational databases, and SaaS platforms.
  • Create, index, query, and update SQL tables/servers; run and update Python and/or JavaScript code to parse data.
  • Monitor pipeline health, enforce data quality controls (schema validation, null checks, duplicate detection), troubleshoot data issues, and implement alerting and observability best practices (see the quality-check sketch after this list).
  • Develop and implement data acquisition, quality assurance, and management protocols; document all data collection, cleaning, and analyses for internal and external users.
  • Use schedulers and APIs to obtain near real-time data; automate workflows and processes using Python or other scripting languages.
  • Partner with data scientists and analysts to deliver clean, well-documented datasets and data products; collaborate with and support the data science team to produce deliverables.
  • Contribute to data governance standards including lineage, cataloging, and access controls.
  • Mentor and collaborate with Marines at the squadron level to improve data entry and indexing practices.
  • Assist with maintenance and development of internal analytics data architecture.
  • Design, write, and disseminate innovative and visually appealing reports for diverse audiences.
  • Participate in code reviews, architecture discussions, and cross-functional planning sessions.
  • Evaluate and recommend new tools and technologies to improve the data platform.
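
To make the orchestration responsibility concrete, below is a minimal Apache Airflow DAG sketch for a daily extract/transform/load run. The DAG id, task names, and placeholder callables are illustrative assumptions; production pipelines would invoke AWS Glue, Redshift, and S3 integrations instead.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; real tasks would trigger Glue jobs,
# Redshift loads, or S3 transfers.
def extract():
    print("pull source data")

def transform():
    print("clean and model data")

def load():
    print("publish curated tables")

with DAG(
    dag_id="spares_daily_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",          # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency chain: extract -> transform -> load.
    extract_task >> transform_task >> load_task
```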
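
And as a minimal sketch of the data quality controls described above, the following pandas routine applies schema validation, null checks, and duplicate detection. The expected schema, key columns, and sample data are illustrative assumptions, not project specifications.

```python
import pandas as pd

# Expected schema is an illustrative assumption, not a project spec.
EXPECTED_SCHEMA = {"part_id": "object", "order_date": "object", "demand_qty": "int64"}
KEY_COLUMNS = ["part_id", "order_date"]

def run_quality_checks(df: pd.DataFrame) -> list:
    """Return a list of human-readable quality failures (empty list = clean)."""
    failures = []

    # Schema validation: required columns present with expected dtypes.
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            failures.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            failures.append(f"{col}: expected {dtype}, got {df[col].dtype}")

    # Null checks: key fields must be fully populated.
    for col in KEY_COLUMNS:
        if col in df.columns and df[col].isna().any():
            failures.append(f"{col}: {int(df[col].isna().sum())} null values")

    # Duplicate detection: no repeated key combinations.
    dupes = int(df.duplicated(subset=KEY_COLUMNS).sum())
    if dupes:
        failures.append(f"{dupes} duplicate {tuple(KEY_COLUMNS)} rows")

    return failures

sample = pd.DataFrame({
    "part_id": ["P-100", "P-100", None],
    "order_date": ["2024-01-05", "2024-01-05", "2024-02-11"],
    "demand_qty": [4, 4, 9],
})
print(run_quality_checks(sample))
```

In production, checks like these would typically run as a pipeline task whose failures feed the alerting and observability practices noted above.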

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance
  • Life Insurance
  • 401(k) Retirement Plan with matching
  • Paid Time Off
  • Paid Federal Holidays