Guidehouse
Full-time • Mid Level
Huntsville, AL
5,001-10,000 employees

Guidehouse is seeking a Software Developer to join our Technology / AI and Data team, supporting mission-critical initiatives for Defense and Security clients. In this role, you will lead the design and implementation of secure, scalable ingestion and data processing workflows that power advanced AI-driven platforms. You will architect solutions for transforming complex, high-volume data into structured outputs optimized for downstream AI/ML pipelines, while ensuring compliance with stringent federal security and regulatory standards. Collaborating with engineers, architects, and mission stakeholders, you will deliver innovative backend capabilities that enable accurate, efficient, and reliable decision-making in support of national security objectives.

What You Will Do:
  • Serve as the lead backend integration engineer responsible for architecting and implementing ingestion, preprocessing, normalization, and transformation workflows for the FBI adjudication AI platform.
  • Design ingestion frameworks supporting SF-86 forms, investigative attachments, summaries, financial/criminal records, and continuous vetting alerts using both traditional OCR and VLM/LLM-based document understanding.
  • Ensure ingestion workflows comply with FedRAMP High, RMF, CJIS, and FBI ATO requirements, including logging, auditability, encryption, and secure processing of PII and sensitive investigative information.
  • Collaborate with AI/ML engineers, backend API developers, cloud engineers, and security engineers to ensure ingestion outputs are optimized for RAG workflows, SEAD-4 scoring, anomaly detection, and adjudicator review.
  • Design ingestion pipelines supporting LLMs and VLMs for OCR, document understanding, multimodal extraction, and parsing of complex investigative materials including forms, tables, handwritten elements, and embedded imagery.
  • Build scalable ingestion and ETL workflows capable of processing hundreds of pages per case using OCR engines (Textract, Tesseract) and VLM-based parsing models such as LayoutLM, Qwen-VL, Donut, or LLaVA.
  • Implement normalization and transformation workflows including deduplication, schema harmonization, field mapping, classification labeling, chunking, segmentation, and tokenization optimized for downstream LLM/RAG operations.
  • Develop fault-tolerant ingestion systems with checkpointing, idempotency, retry frameworks, ingestion-state tracking, and structured error reporting (a minimal sketch follows below).
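
The fault-tolerance bullet above names several standard patterns; the following minimal Python sketch illustrates checkpointing, idempotent reruns, and exponential-backoff retries. The extract_text stub, the document IDs, and the SQLite state store are illustrative assumptions, not components of the actual platform.

    import sqlite3
    import time

    # Illustrative stand-in for the real parsing step (assumption).
    def extract_text(doc_id: str) -> str:
        return f"parsed:{doc_id}"

    class CheckpointedIngestor:
        """Tracks ingestion state so reruns skip completed documents (idempotency)."""

        def __init__(self, db_path: str = ":memory:"):
            self.db = sqlite3.connect(db_path)
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS ingest_state "
                "(doc_id TEXT PRIMARY KEY, status TEXT, attempts INTEGER)"
            )

        def ingest(self, doc_id: str, max_retries: int = 3) -> None:
            row = self.db.execute(
                "SELECT status FROM ingest_state WHERE doc_id = ?", (doc_id,)
            ).fetchone()
            if row and row[0] == "done":
                return  # checkpoint hit: already ingested, skip
            for attempt in range(1, max_retries + 1):
                try:
                    extract_text(doc_id)
                    self.db.execute(
                        "INSERT OR REPLACE INTO ingest_state VALUES (?, 'done', ?)",
                        (doc_id, attempt),
                    )
                    self.db.commit()
                    return
                except Exception:
                    time.sleep(2 ** attempt)  # exponential backoff before retrying
            # Structured error record instead of a silent failure.
            self.db.execute(
                "INSERT OR REPLACE INTO ingest_state VALUES (?, 'failed', ?)",
                (doc_id, max_retries),
            )
            self.db.commit()

    ingestor = CheckpointedIngestor()
    ingestor.ingest("case-0001/sf86.pdf")  # a second call would be a no-op
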
  • Build secure, compliant integrations with FBI systems, case repositories, identity/HR systems, and continuous vetting alert sources using APIs, ETL endpoints, SFTP, and message queues.
  • Develop backend microservices that assemble case packages, correlate evidence across disparate sources, and produce structured adjudication-ready datasets.
  • Integrate ingestion outputs with vector databases, embedding pipelines, and LLM inference services, ensuring data is structured, enriched, and optimized for reasoning workflows.
  • Ensure all integrations enforce strict authentication, authorization, validation, and data-handling policies (see the signing sketch below).
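
One common way to enforce message-level authentication on queue and ETL endpoints is HMAC request signing. The sketch below is a generic stdlib illustration under an assumed shared secret; it is not the platform's actual auth scheme, which would be governed by the FedRAMP/CJIS controls cited elsewhere in this posting.

    import hashlib
    import hmac

    # Hypothetical shared secret for illustration only; a deployed system would
    # rely on mTLS/OIDC plus a managed secret store.
    SECRET_KEY = b"example-shared-secret"

    def sign(payload: bytes) -> str:
        return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

    def verify(payload: bytes, signature: str) -> bool:
        # compare_digest is a constant-time comparison, avoiding timing side channels.
        return hmac.compare_digest(sign(payload), signature)

    body = b'{"case_id": "0001", "source": "cv-alerts"}'
    sig = sign(body)
    assert verify(body, sig)              # intact message accepted
    assert not verify(body + b"x", sig)   # tampered message rejected
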
  • Create ingestion workflows that prepare documents and extracted content for embeddings, retrieval indexing, semantic search, and long-context reasoning.
  • Implement chunking, segmentation, labeling, and evidence-tagging strategies designed to maximize retrieval precision and reduce hallucination risk in LLM inference (see the chunking sketch after this group).
  • Develop heuristics for filtering, prioritizing, and contextualizing extracted information to enable fact-grounded SEAD-4 scoring and memo generation.
  • Support preparation of vector representations, metadata fields, and retrieval keys for large-scale evidence collections.
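
For readers unfamiliar with chunking for retrieval, the following minimal sketch produces overlapping character-based chunks tagged with provenance metadata and retrieval keys. A production pipeline would more likely chunk on token boundaries with a real tokenizer; the function and field names here are illustrative assumptions.

    def chunk_document(text, doc_id, size=400, overlap=50):
        """Split text into overlapping chunks, each carrying provenance metadata."""
        chunks = []
        step = size - overlap
        for i, start in enumerate(range(0, max(len(text) - overlap, 1), step)):
            body = text[start:start + size]
            chunks.append({
                "chunk_id": f"{doc_id}#{i}",           # retrieval key
                "text": body,
                "source_doc": doc_id,                  # evidence tag for grounded citation
                "char_span": (start, start + len(body)),
            })
        return chunks

    parts = chunk_document("A" * 1000, "case-0001/interview-summary")
    print(len(parts), parts[0]["chunk_id"], parts[1]["char_span"])

The overlap preserves context that straddles chunk boundaries, and the char_span field lets a reviewer trace any retrieved passage back to its exact position in the source document.
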
  • Implement secure ingestion pipelines aligned with FedRAMP High, RMF, CJIS, and FBI security requirements including encryption, access control, PII-handling rules, and secure logging.
  • Apply advanced PII-safe processing techniques including automated redaction, VLM-aided sensitive field detection, classification tagging, and compliance-driven filtering (see the redaction sketch after this group).
  • Ensure ingestion systems generate detailed logs, lineage metadata, provenance trails, and audit events supporting adjudication oversight and accreditation documentation.
  • Collaborate with Security Engineers to ensure ingestion controls map to SSP requirements and POA&M items are remediated promptly.
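
As a shape for the redaction-plus-audit requirement above, here is a deliberately simplified sketch: regex detectors stand in for the VLM-aided sensitive-field detection the role calls for, and each audit event records what was redacted without ever logging the value itself. The patterns and logger names are assumptions.

    import json
    import logging
    import re

    logging.basicConfig(level=logging.INFO, format="%(message)s")
    audit_log = logging.getLogger("ingest.audit")

    # Illustrative patterns only; a real pipeline combines many detectors.
    PII_PATTERNS = {
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    }

    def redact(text: str, doc_id: str) -> str:
        for label, pattern in PII_PATTERNS.items():
            text, n = pattern.subn(f"[REDACTED:{label}]", text)
            if n:
                # Audit event names the field type and count, never the value.
                audit_log.info(json.dumps({"doc": doc_id, "field": label, "count": n}))
        return text

    print(redact("Subject SSN 123-45-6789, phone 555-867-5309.", "case-0001/financial"))
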
  • Optimize ingestion pipelines for parallelization, concurrency, batching, memory efficiency, and large-scale document processing throughput.
  • Build distributed ETL workflows on frameworks such as Step Functions, Airflow, Dagster, Glue, or Spark, depending on workload and security constraints.
  • Develop monitoring dashboards capturing ingestion throughput, VLM/LLM OCR accuracy metrics, error frequencies, latency patterns, and retry trends.
  • Implement resilience features including dead-letter queues, backoff retry mechanisms, fault isolation, and disaster-recovery patterns (see the dead-letter sketch below).
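
The resilience bullet above pairs naturally with a dead-letter pattern: failed documents are isolated and parked rather than stalling the batch. The sketch below is generic, using an in-process queue and a contrived failure mode as assumptions; a deployed system would use a managed dead-letter queue (e.g., SQS). Backoff retries were sketched earlier in this section.

    import concurrent.futures
    import queue

    # Dead-letter queue: failed documents are parked here for inspection or replay.
    dead_letters = queue.Queue()

    def process(doc_id):
        # Hypothetical failure mode for illustration: reject unreadable scans.
        if doc_id.endswith(".tiff"):
            raise ValueError("unreadable scan")
        return f"ok:{doc_id}"

    def process_with_isolation(doc_id):
        """Isolate faults per document so one bad file cannot fail the whole batch."""
        try:
            return process(doc_id)
        except Exception as exc:
            dead_letters.put((doc_id, str(exc)))
            return None

    docs = ["a.pdf", "b.tiff", "c.pdf"]
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(process_with_isolation, docs))

    print([r for r in results if r])   # ['ok:a.pdf', 'ok:c.pdf']
    print(list(dead_letters.queue))    # [('b.tiff', 'unreadable scan')]
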
  • Align ingestion outputs directly with AI/ML engineer requirements for long-context LLM inference, retrieval indexing, and SEAD-4 scoring workflows.
  • Work with backend API developers to ensure ingestion flows integrate seamlessly with scoring engines, entity explorers, memo builders, and anomaly detection pipelines.
  • Participate in sprint ceremonies, architecture reviews, backlog refinement, and cross-functional coordination with mission stakeholders.
  • Mentor mid-level engineers in ETL design, multimodal OCR techniques, distributed system patterns, and secure ingestion best practices.

What You Will Need:
  • An ACTIVE and MAINTAINED "TOP SECRET" Federal or DoD security clearance
  • Requires a university degree and a minimum of 4-6 years of prior relevant experience (relevant experience may be substituted for formal education or an advanced degree).
  • 5 years of backend/integration engineering experience, including 3 years in large-scale ETL or ingestion workflows.
  • Deep experience with Python, Java, or Scala, and with ingestion frameworks such as Airflow, Step Functions, Dagster, or Glue.
  • Experience with ETL pipelines, large-scale document ingestion, OCR/VLM document understanding, and unstructured data parsing.
  • Experience developing secure data processing/normalization workflows.
  • Experience with distributed processing frameworks.

What Would Be Nice To Have:
  • An ACTIVE and MAINTAINED "TOP SECRET" Federal or DoD security clearance.
  • Once onboarded with Guidehouse, the new hire MUST be able to OBTAIN and MAINTAIN a Federal or DoD "TOP SECRET/SCI (TS/SCI)" security clearance.
  • 8+ years of backend/integration engineering experience, including 4+ years in large-scale ETL or ingestion workflows.
  • Experience integrating FBI, DCSA, or NBIB systems or adjudication-related data sources.
  • Experience designing ingestion workflows for RAG, embeddings, vector databases, or long-context LLM pipelines.
  • Experience training or applying VLMs such as LayoutLM, Donut, Qwen-VL, or LLaVA for OCR replacement or enhancement.
  • Experience with knowledge graphs, entity resolution, and evidence-linking workflow development.
  • Familiarity with SEAD-4, continuous vetting, or investigative case analysis processes.
  • Familiarity with the trade-offs among orchestration frameworks (Airflow vs. Dagster vs. Step Functions).
  • Familiarity with OCR and document-understanding tooling (Textract, Tesseract, LayoutLM, Donut, Qwen-VL, LLaVA).
  • Familiarity with AWS-native ingestion services (Glue, Batch, S3 event notifications).

What We Offer:
  • Medical, Rx, Dental & Vision Insurance
  • Personal and Family Sick Time & Company Paid Holidays
  • Position may be eligible for a discretionary variable incentive bonus
  • Parental Leave and Adoption Assistance
  • 401(k) Retirement Plan
  • Basic Life & Supplemental Life
  • Health Savings Account, Dental/Vision & Dependent Care Flexible Spending Accounts
  • Short-Term & Long-Term Disability
  • Student Loan PayDown
  • Tuition Reimbursement, Personal Development & Learning Opportunities
  • Skills Development & Certifications
  • Employee Referral Program
  • Corporate Sponsored Events & Community Outreach
  • Emergency Back-Up Childcare Program
  • Mobility Stipend