Data Engineer

M9 Solutions•Springfield, VA

Onsite

About The Position

M9 Solutions is dedicated to providing IT services and solutions to the Federal Government by mobilizing the right people, skills, clearance levels, and technologies to help organizations that desire improved performance and modern, sustainable change. M9 has provided quality IT services and support to more than 30 Federal Agencies and multiple commercial customers nationwide. Our capabilities include IT Talent Solutions, Data Delivery & Analytics, Cyber Security, Cloud Migration, Applications and Infrastructure, Software Development, and Finance & Accounting.

M9 Solutions is seeking a Data Engineer to work onsite in support of a government contract for a client located in Springfield, VA. An active TS/SCI clearance is required.

Requirements

Active TS/SCI security clearance.
Bachelor's or master's degree in computer science, engineering, or a related field.
10+ years of experience in data engineering or software development roles.
Strong proficiency in Python, including experience with libraries like pandas, PySpark, FastAPI, or similar.
Solid experience with cloud services (AWS or Azure) and cloud-native data engineering tools.
Proven experience in building and maintaining data pipelines using Kafka, Airflow, and other open-source frameworks.
Strong grasp of database internals and trade-offs between different storage technologies.
Familiarity with data governance, lineage, and metadata management concepts.
Experience or strong interest in integrating LLMs and AI/ML models into production-grade data systems.

Nice To Haves

Knowledge of data cataloging tools and semantic layer design.
Experience with containerization (Docker) and orchestration (Kubernetes).
Familiarity with MLOps tools or platforms (e.g., SageMaker, MLflow).

Responsibilities

Collect and integrate data from a wide variety of structured and unstructured sources, including APIs, RDBMS, file systems, third-party services, and real-time streams.
Design and implement scalable ETL/ELT pipelines to clean, enrich, normalize, and semantically align data through ontology-driven transformations.
Build and deploy data pipelines and associated infrastructure on AWS or Azure, using managed services like Lambda, Glue, Step Functions, Azure Data Factory, etc.
Understand and optimize for different storage engines: relational (PostgreSQL, MySQL), columnar (Redshift, BigQuery), indexing engines (Elasticsearch), key-value stores (DynamoDB, Redis), object stores (S3 or similar), and caching layers.
Work with Apache Kafka (or similar platforms) to handle high-volume, low-latency data streams.
Utilize Apache Airflow (or equivalent) to schedule and monitor complex data workflows.
Collaborate with data scientists to integrate LLMs and ML models into pipelines for inference, tagging, enrichment, or intelligent routing of data.