Guidehouse
Full-time • Mid Level
Huntsville, AL
5,001-10,000 employees

Guidehouse is seeking a Software Developer to join our Technology / AI and Data team, supporting mission-critical initiatives for Defense and Security clients. In this role, you will lead the design and implementation of secure, scalable ingestion and data processing workflows that power advanced AI-driven platforms. You will architect solutions for transforming complex, high-volume data into structured outputs optimized for downstream AI/ML pipelines, while ensuring compliance with stringent federal security and regulatory standards. Collaborating with engineers, architects, and mission stakeholders, you will deliver innovative backend capabilities that enable accurate, efficient, and reliable decision-making in support of national security objectives.

What You Will Do:
  • Serve as the lead backend integration engineer responsible for architecting and implementing ingestion, preprocessing, normalization, and transformation workflows for the FBI adjudication AI platform.
  • Design ingestion frameworks supporting SF-86 forms, investigative attachments, summaries, financial/criminal records, and continuous vetting alerts using both traditional OCR and VLM/LLM-based document understanding.
  • Ensure ingestion workflows comply with FedRAMP High, RMF, CJIS, and FBI ATO requirements, including logging, auditability, encryption, and secure processing of PII and sensitive investigative information.
  • Collaborate with AI/ML engineers, backend API developers, cloud engineers, and security engineers to ensure ingestion outputs are optimized for RAG workflows, SEAD-4 scoring, anomaly detection, and adjudicator review.
  • Design ingestion pipelines supporting LLMs and VLMs for OCR, document understanding, multimodal extraction, and parsing of complex investigative materials including forms, tables, handwritten elements, and embedded imagery.
  • Build scalable ingestion and ETL workflows capable of processing hundreds of pages per case using OCR engines (Textract, Tesseract) and VLM-based parsing models such as LayoutLM, Qwen-VL, Donut, or LLaVA.
  • Implement normalization and transformation workflows including deduplication, schema harmonization, field mapping, classification labeling, chunking, segmentation, and tokenization optimized for downstream LLM/RAG operations.
  • Develop fault-tolerant ingestion systems with checkpointing, idempotency, retry frameworks, ingestion-state tracking, and structured error reporting (a minimal sketch follows below).
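
The fault-tolerance bullet above names several standard patterns; the following minimal Python sketch illustrates checkpointing, idempotent reruns, and exponential-backoff retries. The extract_text stub, the document IDs, and the SQLite state store are illustrative assumptions, not components of the actual platform.

    import sqlite3
    import time

    # Illustrative stand-in for the real parsing step (assumption).
    def extract_text(doc_id: str) -> str:
        return f"parsed:{doc_id}"

    class CheckpointedIngestor:
        """Tracks ingestion state so reruns skip completed documents (idempotency)."""

        def __init__(self, db_path: str = ":memory:"):
            self.db = sqlite3.connect(db_path)
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS ingest_state "
                "(doc_id TEXT PRIMARY KEY, status TEXT, attempts INTEGER)"
            )

        def ingest(self, doc_id: str, max_retries: int = 3) -> None:
            row = self.db.execute(
                "SELECT status FROM ingest_state WHERE doc_id = ?", (doc_id,)
            ).fetchone()
            if row and row[0] == "done":
                return  # checkpoint hit: already ingested, skip
            for attempt in range(1, max_retries + 1):
                try:
                    extract_text(doc_id)
                    self.db.execute(
                        "INSERT OR REPLACE INTO ingest_state VALUES (?, 'done', ?)",
                        (doc_id, attempt),
                    )
                    self.db.commit()
                    return
                except Exception:
                    time.sleep(2 ** attempt)  # exponential backoff before retrying
            # Structured error record instead of a silent failure.
            self.db.execute(
                "INSERT OR REPLACE INTO ingest_state VALUES (?, 'failed', ?)",
                (doc_id, max_retries),
            )
            self.db.commit()

    ingestor = CheckpointedIngestor()
    ingestor.ingest("case-0001/sf86.pdf")  # a second call would be a no-op
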
  • Build secure, compliant integrations with FBI systems, case repositories, identity/HR systems, and continuous vetting alert sources using APIs, ETL endpoints, SFTP, and message queues.
  • Develop backend microservices that assemble case packages, correlate evidence across disparate sources, and produce structured adjudication-ready datasets.
  • Integrate ingestion outputs with vector databases, embedding pipelines, and LLM inference services, ensuring data is structured, enriched, and optimized for reasoning workflows.
  • Ensure all integrations enforce strict authentication, authorization, validation, and data-handling policies (see the signing sketch below).
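
One common way to enforce message-level authentication on queue and ETL endpoints is HMAC request signing. The sketch below is a generic stdlib illustration under an assumed shared secret; it is not the platform's actual auth scheme, which would be governed by the FedRAMP/CJIS controls cited elsewhere in this posting.

    import hashlib
    import hmac

    # Hypothetical shared secret for illustration only; a deployed system would
    # rely on mTLS/OIDC plus a managed secret store.
    SECRET_KEY = b"example-shared-secret"

    def sign(payload: bytes) -> str:
        return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

    def verify(payload: bytes, signature: str) -> bool:
        # compare_digest is a constant-time comparison, avoiding timing side channels.
        return hmac.compare_digest(sign(payload), signature)

    body = b'{"case_id": "0001", "source": "cv-alerts"}'
    sig = sign(body)
    assert verify(body, sig)              # intact message accepted
    assert not verify(body + b"x", sig)   # tampered message rejected
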
  • Create ingestion workflows that prepare documents and extracted content for embeddings, retrieval indexing, semantic search, and long-context reasoning.
  • Implement chunking, segmentation, labeling, and evidence-tagging strategies designed to maximize retrieval precision and reduce hallucination risk in LLM inference (see the chunking sketch after this group).
  • Develop heuristics for filtering, prioritizing, and contextualizing extracted information to enable fact-grounded SEAD-4 scoring and memo generation.
  • Support preparation of vector representations, metadata fields, and retrieval keys for large-scale evidence collections.
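
For readers unfamiliar with chunking for retrieval, the following minimal sketch produces overlapping character-based chunks tagged with provenance metadata and retrieval keys. A production pipeline would more likely chunk on token boundaries with a real tokenizer; the function and field names here are illustrative assumptions.

    def chunk_document(text, doc_id, size=400, overlap=50):
        """Split text into overlapping chunks, each carrying provenance metadata."""
        chunks = []
        step = size - overlap
        for i, start in enumerate(range(0, max(len(text) - overlap, 1), step)):
            body = text[start:start + size]
            chunks.append({
                "chunk_id": f"{doc_id}#{i}",           # retrieval key
                "text": body,
                "source_doc": doc_id,                  # evidence tag for grounded citation
                "char_span": (start, start + len(body)),
            })
        return chunks

    parts = chunk_document("A" * 1000, "case-0001/interview-summary")
    print(len(parts), parts[0]["chunk_id"], parts[1]["char_span"])

The overlap preserves context that straddles chunk boundaries, and the char_span field lets a reviewer trace any retrieved passage back to its exact position in the source document.
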
  • Implement secure ingestion pipelines aligned with FedRAMP High, RMF, CJIS, and FBI security requirements including encryption, access control, PII-handling rules, and secure logging.
  • Apply advanced PII-safe processing techniques including automated redaction, VLM-aided sensitive field detection, classification tagging, and compliance-driven filtering (see the redaction sketch after this group).
  • Ensure ingestion systems generate detailed logs, lineage metadata, provenance trails, and audit events supporting adjudication oversight and accreditation documentation.
  • Collaborate with Security Engineers to ensure ingestion controls map to SSP requirements and POA&M items are remediated promptly.
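
As a shape for the redaction-plus-audit requirement above, here is a deliberately simplified sketch: regex detectors stand in for the VLM-aided sensitive-field detection the role calls for, and each audit event records what was redacted without ever logging the value itself. The patterns and logger names are assumptions.

    import json
    import logging
    import re

    logging.basicConfig(level=logging.INFO, format="%(message)s")
    audit_log = logging.getLogger("ingest.audit")

    # Illustrative patterns only; a real pipeline combines many detectors.
    PII_PATTERNS = {
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    }

    def redact(text: str, doc_id: str) -> str:
        for label, pattern in PII_PATTERNS.items():
            text, n = pattern.subn(f"[REDACTED:{label}]", text)
            if n:
                # Audit event names the field type and count, never the value.
                audit_log.info(json.dumps({"doc": doc_id, "field": label, "count": n}))
        return text

    print(redact("Subject SSN 123-45-6789, phone 555-867-5309.", "case-0001/financial"))
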
  • Optimize ingestion pipelines for parallelization, concurrency, batching, memory efficiency, and large-scale document processing throughput.
  • Build distributed ETL workflows on frameworks such as Step Functions, Airflow, Dagster, Glue, or Spark, depending on workload and security constraints.
  • Develop monitoring dashboards capturing ingestion throughput, VLM/LLM OCR accuracy metrics, error frequencies, latency patterns, and retry trends.
  • Implement resilience features including dead-letter queues, backoff retry mechanisms, fault isolation, and disaster-recovery patterns (see the dead-letter sketch below).
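
The resilience bullet above pairs naturally with a dead-letter pattern: failed documents are isolated and parked rather than stalling the batch. The sketch below is generic, using an in-process queue and a contrived failure mode as assumptions; a deployed system would use a managed dead-letter queue (e.g., SQS). Backoff retries were sketched earlier in this section.

    import concurrent.futures
    import queue

    # Dead-letter queue: failed documents are parked here for inspection or replay.
    dead_letters = queue.Queue()

    def process(doc_id):
        # Hypothetical failure mode for illustration: reject unreadable scans.
        if doc_id.endswith(".tiff"):
            raise ValueError("unreadable scan")
        return f"ok:{doc_id}"

    def process_with_isolation(doc_id):
        """Isolate faults per document so one bad file cannot fail the whole batch."""
        try:
            return process(doc_id)
        except Exception as exc:
            dead_letters.put((doc_id, str(exc)))
            return None

    docs = ["a.pdf", "b.tiff", "c.pdf"]
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(process_with_isolation, docs))

    print([r for r in results if r])   # ['ok:a.pdf', 'ok:c.pdf']
    print(list(dead_letters.queue))    # [('b.tiff', 'unreadable scan')]
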
  • Align ingestion outputs directly with AI/ML engineer requirements for long-context LLM inference, retrieval indexing, and SEAD-4 scoring workflows.
  • Work with backend API developers to ensure ingestion flows integrate seamlessly with scoring engines, entity explorers, memo builders, and anomaly detection pipelines.
  • Participate in sprint ceremonies, architecture reviews, backlog refinement, and cross-functional coordination with mission stakeholders.
  • Mentor mid-level engineers in ETL design, multimodal OCR techniques, distributed system patterns, and secure ingestion best practices.

What You Will Need:
  • An ACTIVE and MAINTAINED "TOP SECRET" Federal or DoD security clearance
  • Requires a university degree and a minimum of 4-6 years of prior relevant experience (relevant experience may be substituted for formal education or an advanced degree).
  • 5 years of backend/integration engineering experience, including 3 years in large-scale ETL or ingestion workflows.
  • Deep experience with Python, Java, or Scala, and with ingestion frameworks such as Airflow, Step Functions, Dagster, or Glue.
  • Experience with ETL pipelines, large-scale document ingestion, OCR/VLM document understanding, and unstructured data parsing.
  • Experience developing secure data processing/normalization workflows.
  • Experience with distributed processing frameworks.

What Would Be Nice To Have:
  • An ACTIVE and MAINTAINED "TOP SECRET" Federal or DoD security clearance.
  • Once onboarded with Guidehouse, the new hire MUST be able to OBTAIN and MAINTAIN a Federal or DoD "TOP SECRET/SCI (TS/SCI)" security clearance.
  • 8+ years of backend/integration engineering experience, including 4+ years in large-scale ETL or ingestion workflows.
  • Experience integrating FBI, DCSA, or NBIB systems or adjudication-related data sources.
  • Experience designing ingestion workflows for RAG, embeddings, vector databases, or long-context LLM pipelines.
  • Experience training or applying VLMs such as LayoutLM, Donut, Qwen-VL, or LLaVA for OCR replacement or enhancement.
  • Experience with knowledge graphs, entity resolution, and evidence-linking workflow development.
  • Familiarity with SEAD-4, continuous vetting, or investigative case analysis processes.
  • Familiarity with the trade-offs among orchestration frameworks (Airflow vs. Dagster vs. Step Functions).
  • Familiarity with OCR and document-understanding tooling (Textract, Tesseract, LayoutLM, Donut, Qwen-VL, LLaVA).
  • Familiarity with AWS-native ingestion services (Glue, Batch, S3 event notifications).

What We Offer:
  • Medical, Rx, Dental & Vision Insurance
  • Personal and Family Sick Time & Company Paid Holidays
  • Position may be eligible for a discretionary variable incentive bonus
  • Parental Leave and Adoption Assistance
  • 401(k) Retirement Plan
  • Basic Life & Supplemental Life
  • Health Savings Account, Dental/Vision & Dependent Care Flexible Spending Accounts
  • Short-Term & Long-Term Disability
  • Student Loan PayDown
  • Tuition Reimbursement, Personal Development & Learning Opportunities
  • Skills Development & Certifications
  • Employee Referral Program
  • Corporate Sponsored Events & Community Outreach
  • Emergency Back-Up Childcare Program
  • Mobility Stipend