Data Architect

ATTAINX INC•Herndon, VA

Remote

About The Position

The Data Architect / Data Engineering Lead provides technical leadership for data architecture, data engineering, database modernization, and AI/ML enablement across the NRCS IT ecosystem. This role is responsible for guiding the transformation of legacy data platforms (including monolithic SQL Server environments, SSIS-based ETL pipelines, and tightly coupled cross-database dependencies) into scalable, cloud-native architectures on AWS. The position works in close coordination with the Enterprise Lead Architect, Government Program Managers, and cross-functional delivery teams to execute data management, modernization, and operational sustainment activities under the OMNI contract.

Requirements

Experience: 10+ years of progressive experience in data architecture, data engineering, and database administration across enterprise environments.
Cloud Platforms: 5+ years of hands-on experience designing and deploying data solutions on AWS, including direct experience with S3, Glue, EMR/Spark, Lambda, Step Functions, DMS, RDS (PostgreSQL, Aurora), DynamoDB, OpenSearch, and Lake Formation.
Database Technologies: Deep expertise in Microsoft SQL Server (including HA/DR configurations, performance tuning, stored procedures, and large-scale database operations) and PostgreSQL/PostGIS. Experience with database decoupling and monolithic database decomposition.
Data Engineering: Proven experience building production data pipelines using Spark, PySpark, Databricks, and AWS Glue for batch, streaming, and geospatial workloads. Experience modernizing legacy ETL (SSIS) to cloud-native frameworks.
Programming: Strong proficiency in SQL/T-SQL, Python, and PySpark. Working knowledge of Bash/PowerShell for automation.
Architecture: Demonstrated ability to design and implement enterprise data architectures, including data warehouses, data lakes, lakehouses (Delta Lake), and service-layer integration patterns.
Federal IT: 3+ years of experience supporting federal IT programs, including familiarity with FISMA, NIST RMF, ATO processes, and federal change management requirements.
DevSecOps: Experience with CI/CD pipelines, Git-based version control, Terraform or CloudFormation, Liquibase, and automated quality/security gates.
Agile/SAFe: Experience working within SAFe Agile or equivalent iterative delivery frameworks, including backlog management in Jira.
Must be able to obtain and maintain a USDA public trust clearance.

Nice To Haves

Direct experience with USDA NRCS systems, including NASIS, Soil Data Warehouse, Web Soil Survey, SSURGO, or related soil/conservation data platforms.
Experience with FPAC IT governance, the Technical Guidance Framework (TGF), and FPAC CI/CD pipeline standards.
Hands-on experience with AWS Bedrock, SageMaker, and Generative AI patterns (RAG, embeddings, natural-language-to-SQL) and frameworks such as LangChain.
Experience with geospatial data engineering, including PostGIS, GeoPackage, ArcGIS WFS/WMS services, and spatial data pipelines.
Experience with AI-enabled legacy modernization platforms (e.g., Rhino.ai or equivalent).
Azure experience (Synapse, ADF, ADLS, Azure ML Studio, Databricks on Azure) as a complement to primary AWS focus.
Relevant certifications: AWS Solutions Architect, AWS Data Analytics Specialty, Azure Data Engineer Associate (DP-203), or equivalent.
Master’s degree in Computer Science, Data Science, or related field (in progress acceptable).

Responsibilities

Data Architecture and Strategy
Define and maintain data architecture standards, patterns, and governance practices across all NRCS systems, ensuring alignment with FPAC’s Technical Guidance Framework (TGF), Cloud Memo directives, and Zero Trust principles.
Lead conceptual and logical decomposition of monolithic database structures (e.g., NPAD) into domain-aligned, modular schemas that support incremental modernization and cloud migration.
Architect service-layer data access patterns to replace direct cross-database queries and business logic embedded in stored procedures, reducing architectural fragility and enabling decoupled deployments.
Design and maintain data models for enterprise soil data systems, including NASIS, Soil Data Warehouse (SDW), Soil Data Marts (SDM), and related spatial/tabular datasets.
Align supported systems with USDA’s cloud-native Lakehouse Data Strategy, including adoption of Databricks as the departmental standard data integration tool and elimination of duplicated data copies.
Register and maintain schemas, interfaces, and metadata in AWS DataZone (or Government-directed metadata tooling), ensuring synchronization across environments.
Data Engineering and Pipeline Development
Design, build, and maintain end-to-end data engineering pipelines using AWS-native services (Glue, EMR/Spark, Lambda, Step Functions, EventBridge, DMS, S3, RDS/Aurora PostgreSQL) for batch, streaming, geospatial, and near-real-time workloads.
Modernize legacy SSIS-based ETL/ELT pipelines to cloud-native equivalents (AWS Glue, Databricks, PySpark), improving scalability, maintainability, and operational efficiency.
Build and operate AWS DMS full-load and CDC pipelines to support migration of SQL Server databases to PostgreSQL/PostGIS and other target platforms.
Implement Delta Lake standards, partitioning strategies, and performance tuning across ingestion frameworks for structured, unstructured, and geospatial data.
Develop serverless orchestration workflows using Lambda, EventBridge, and Step Functions for event-driven processing and automated data operations.
Implement data quality controls (validation, reconciliation, monitoring) and maintain audit-ready evidence of data management activities.
Database Operations and Modernization
Provide senior-level DBA support for SQL Server clusters (including high-availability configurations, failover groups, and large-scale datasets exceeding 50 TB), as well as PostgreSQL/PostGIS, Aurora, and DynamoDB environments.
Lead database schema versioning, change tracking, and deployment automation using Liquibase and Government-approved CI/CD processes.
Execute database modernization activities, including re-platforming from on-premises SQL Server to AWS RDS/Aurora, decoupling monolithic database dependencies, and eliminating cross-database stored procedure calls.
Develop and maintain application-specific database recovery runbooks, including validated restore procedures, dependency mapping, and configuration baselines aligned with DR/COOP requirements.
AI/ML and Generative AI Enablement
Design and implement AI/ML and Generative AI solutions using AWS services (Bedrock, SageMaker, OpenSearch) to support natural-language-to-SQL, automated metadata generation, conversational technical assistance, and AI-powered data pipeline optimization.
Apply GenAI tooling (e.g., Bedrock, LangChain, embeddings, RAG patterns) to accelerate documentation, schema analysis, and DevOps workflows.
Support AI-assisted analysis to detect redundant data flows, schema drift, and opportunities to simplify data integrations.
Leverage AI-enabled platforms (e.g., Rhino.ai or equivalent) for legacy system discovery, business logic extraction, and modernization acceleration where authorized by the Government.
AWS Migration Support
Provide data engineering and DBA expertise in support of the urgent AWS migration from DISC data centers, including troubleshooting, testing, and implementing operational adjustments to maintain continuity of mission-critical business functions (e.g., payment processing).
Support full on-premises to AWS migration for databases and data infrastructure, including provisioning, lift-and-shift, re-architecture, data migration validation, and issue resolution.
Design and execute data migration and transformation activities, including test data management and privacy-preserving techniques for non-production environments.
Governance, Compliance, and Knowledge Transfer
Maintain audit-ready documentation for all data architecture decisions, schema changes, pipeline configurations, and modernization artifacts in Government-designated systems of record.
Enforce FPAC architectural principles, secure coding standards, and NIST SP 800-53 controls across all data engineering and database activities.
Conduct architecture reviews, design assurance gates, and code reviews for data-related deliverables, ensuring adherence to quality standards and FPAC SonarQube thresholds.
Deliver knowledge transfer sessions to Government personnel and incoming vendors during transition periods, including complete documentation handoff of data systems, pipelines, and architectural decisions.
Maintain and update troubleshooting playbooks, runbooks, and knowledge articles for data systems in Government-designated repositories.

Benefits

Competitive compensation and benefits packages, including paid vacation, medical, dental, vision, a matching 401(k) plan, tuition/training reimbursement, and Long- and Short-Term Disability.