AI/ML Infrastructure Software Development Engineer

Booz Allen Hamilton
Washington, DC
Remote

About The Position

To achieve an organization's mission, leaders need strong team members who can create and analyze processes, communicate requirements, and develop innovative solutions throughout the execution of the mission. Whether reviewing program-wide technical architecture or providing AI/ML infrastructure expertise, our clients need someone who combines deep technical understanding of software engineering with strong architectural judgment. This role is for an experienced AI/ML Software Development Engineer who can operate at a system-of-systems level to support clients in advancing AI-enabled systems within an R&D environment.

As part of the team, you'll serve as an AI/ML Infrastructure Software Development Engineer to the Advanced Research Projects Agency for Health (ARPA-H), helping conceptualize, create, and execute advanced government-funded research and development programs to accelerate better health outcomes for everyone. You will work with world-class scientists and engineers to support the development of high-impact solutions to society's most challenging health problems. You will leverage technical expertise to provide strategic assessments of new technologies in support of senior ARPA-H decision makers.

Responsibilities include producing and presenting findings and recommendations to a team of colleagues and clients on the feasibility and potential impact of future research programs, assisting with the management of current programs, and facilitating commercialization of successfully developed technologies. You'll advise program leadership and support software engineering for the client mission, ensuring that program-wide technical architecture and engineering adhere to rigorous standards for AI development, evaluation, and long-term impact. Your attention to detail, flexibility, communication skills, understanding of the client's mission, and problem-solving will enable the mission's success.

Requirements

  • 7+ years of experience with software engineering, including building and operating production systems
  • Experience being on-call, debugging incidents, and writing postmortems
  • Experience in high-velocity environments where you owned and shipped complex products end-to-end
  • Experience with at least 2 backend languages, including Python
  • Experience with Microsoft Azure, including Azure Functions, API Management, Container Apps, and Azure OpenAI Service
  • Experience with containerization, CI/CD, and Infrastructure as Code (IaC)
  • Knowledge of modern backend frameworks, async patterns, distributed systems, APIs, data pipelines, and software design patterns
  • Knowledge of authentication and identity systems, such as OAuth2, OIDC, or Microsoft Entra ID
  • Ability to own production systems
  • Bachelor's degree in Computer Science or Software Engineering

Nice To Haves

  • Experience in healthcare, life sciences, or other regulated domains
  • Experience in security-conscious engineering, including input validation, output sanitization, audit logging, and responsible AI guardrails
  • Experience in startup or early-stage environments, such as 0-to-1 product building
  • Experience implementing agent-to-agent (A2A) communication patterns and multi-agent orchestration frameworks
  • Experience building on top of LLMs in production, including tool-calling, RAG, multi-step reasoning, multi-model routing, and context window management
  • Experience managing multi-provider LLM integrations, including rate limits, fallback routing, and API versioning
  • Experience in security-conscious engineering in regulated or government environments
  • Ability to work as a self-starter in a fast-paced environment
  • Comfort with ambiguity and ability to work with a high sense of urgency
  • Master's degree in a relevant field

Responsibilities

  • Own and operate all backend and infrastructure components for an AI/ML model on Azure, including compute, APIs, identity, data layers, and IaC-driven environments
  • Build and maintain resilient CI/CD, deployment automation, secrets management, and production‑grade fundamentals, including monitoring, alerting, logging, tracing, SLOs, and incident response
  • Manage cost and token economics across all LLM providers, including budgets, guardrails, and cost-per-query optimizations
  • Lead agentic and protocol infrastructure, including Model Context Protocol (MCP) backend implementation, tool-calling systems, and reliable A2A communication patterns
  • Design and evolve LLM orchestration, multi‑model routing, and robust fallback and degradation patterns across GPT, Claude, and Gemini
  • Build and operate RAG and knowledge pipelines, including ingestion, indexing, embedding, semantic search, evaluation, and safety monitoring
  • Drive engineering excellence through coding standards, reviews, documentation, mentoring, and consistently championing user‑focused, secure, compliant system design

Benefits

  • Health benefits
  • Life benefits
  • Disability benefits
  • Financial benefits
  • Retirement benefits
  • Paid leave
  • Professional development
  • Tuition assistance
  • Work-life programs
  • Dependent care
  • Recognition awards program