Data Engineer

Staffed4U•Chantilly, VA

7d•Onsite

About The Position

Location: Chantilly, VA
Work Schedule: Full-Time, Onsite
Clearance Required: Active TS/SCI with Full Scope Polygraph (FSP)
Employment Type: W-2

Position Overview

We are seeking a talented and mission-focused Data Engineer to join our growing team supporting cutting-edge intelligence community initiatives in Chantilly, VA. This role offers the opportunity to work with large-scale datasets and contribute to the development of a custom enterprise platform supporting critical mission objectives. The selected candidate will play a key role in designing, building, and optimizing scalable data pipelines and architectures that support analytics, machine learning, and enterprise data integration efforts.

This position is funded for an initial 9–12 month period aligned with defined mission deliverables and system development timelines, with all development performed onsite at the customer location.

Requirements

3–5+ years of professional experience in Data Engineering or a related technical field.
Experience designing and implementing ETL/ELT pipelines.
Experience processing and managing large-scale structured and unstructured datasets.
Experience working in cloud-based data environments.
Strong proficiency with Python and SQL.
Experience with PySpark or other distributed processing frameworks (highly desired).
Experience with Elasticsearch/OpenSearch technologies.
Experience working within AWS cloud environments.
Experience supporting Linux-based systems.
Proficiency with Git for version control and collaborative development.
Understanding of machine learning workflows and MLOps concepts.
Experience integrating and consuming REST APIs.
Familiarity with Docker, Kubernetes, and CI/CD pipelines.
Active TS/SCI with Full Scope Polygraph (FSP) is required.
U.S. Citizenship required.
Strong collaboration and communication skills.
Ability to communicate complex technical concepts to non-technical audiences.
Detail-oriented with a strong commitment to data quality and integrity.
Ability to manage multiple priorities in a fast-paced mission environment.
Strong analytical and problem-solving capabilities.

Nice To Haves

Hands-on experience with graph databases.
Experience modeling, querying, and optimizing Neo4j databases.
Experience supporting advanced analytics, knowledge graphs, or entity resolution systems.
Experience working within Intelligence Community environments.

Responsibilities

Design, develop, and maintain ETL/ELT pipelines for both batch and real-time data processing using Python and SQL.
Integrate data from a variety of structured and unstructured sources, including databases, APIs, streaming platforms, PDFs, and Microsoft Office files.
Build scalable and maintainable data architectures to support analytics and machine learning workloads.
Optimize data processing workflows and queries for performance, scalability, and cost efficiency within AWS environments.
Support future pipeline scalability by working with PySpark and other distributed data processing frameworks.
Develop and maintain web scraping and data ingestion workflows to collect and process open-source data.
Transform collected information into structured datasets and visualizations for stakeholder analysis and decision-making.
Collect, clean, validate, and manage large volumes of structured and unstructured data.
Implement data quality controls, validation procedures, and version management practices.
Design and optimize data storage solutions utilizing AWS S3 for raw, intermediate, and production datasets.
Implement data governance best practices including documentation, cataloging, lineage tracking, and security controls.
Ensure compliance with customer and security requirements for data management and handling.
Partner closely with Data Scientists, Analysts, and Engineering teams to understand business and mission requirements.
Prepare clean, structured, and feature-ready datasets for analytics and machine learning applications.
Support feature engineering, aggregation, and large-scale data transformations.
Assist with deploying machine learning models into production environments while supporting monitoring, versioning, and performance optimization.
Integrate and consume REST APIs to support data acquisition and application workflows.
Utilize Docker, Kubernetes, Git, and CI/CD pipelines to support deployment and operational workflows.
Document data pipelines, architectures, schemas, and transformation processes.
Communicate technical concepts effectively to both technical and non-technical stakeholders.
Participate in code reviews and promote engineering best practices across the team.
Contribute to continuous improvement efforts related to data engineering, automation, and platform development.