AI/ML Evaluation Engineer

Booz Allen Hamilton•Atlanta, GA

$128,700 - $292,000•Onsite

About The Position

As an experienced engineer, you know that machine learning (ML) and AI evaluation are critical to understanding and operationalizing massive datasets in support of public health and safety missions. Your ability to evaluate, optimize, and deploy AI-driven systems makes you an integral part of delivering mission-focused solutions for scientists, analysts, and leadership teams. In this role, you’ll help define and implement scientific AI evaluation and enablement initiatives by translating advanced AI capabilities into practical, mission-specific workflows. You’ll collaborate with a large community of ML engineers, data scientists, architects, and product teams to design scalable ML and Generative AI solutions, including AI agents, retrieval-augmented generation (RAG) pipelines, and enterprise AI evaluation frameworks. You’ll apply technical expertise across AI evaluation, retrieval optimization, memory and state management, and AI agent architecture to support real-time insights and decision-making. Your work will also contribute to enterprise MLOps capabilities, data governance standards, and ethical AI practices within regulated public health environments.

Requirements

5+ years of experience with Generative AI, LLMs, AI agents, or RAG applications, including designing, developing, and deploying ML models and AI solutions using Python
3+ years of experience with AI agents and AI evaluation strategies in enterprise environments
2+ years of experience with Deep Research evaluation methodologies and AI evaluation workflows
Experience with ML frameworks such as TensorFlow or PyTorch for production-grade model development
Experience with data engineering using PySpark, SQL, and Palantir Foundry, including Foundry AIP
Experience with MLOps platforms such as MLflow and with cloud environments, including Azure
Knowledge of public health, healthcare, or government data systems and associated governance practices
Ability to design and optimize AI systems involving retrieval workflows, memory or state management, and real-time decision-support capabilities
Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements
Bachelor's degree in CS, Engineering, or Data Science

Nice To Haves

Experience working in healthcare, biomedical, or government public health AI/ML environments
Experience with conversational AI, chatbot systems, or full-stack AI application development
Experience with containerization, CI/CD, orchestration, and production MLOps pipelines
Experience with Agile delivery environments and tools such as Jira
Experience writing technical documentation and presenting AI/ML solutions to various audiences
Experience integrating enterprise AI tools such as Codex, Claude, or similar AI enablement platforms
Knowledge of enterprise AI governance, compliance, and ethical AI frameworks
Knowledge of AI systems for retrieval, ranking, and scientific evaluation use cases
Ability to collaborate effectively in matrixed, cross-functional organizations
Master's degree in CS, Data Science, ML, or a related field

Responsibilities

Build and maintain scalable data pipelines using PySpark and Palantir Foundry to support AI, analytics, and scientific evaluation workflows.
Design and implement ML and Generative AI workflows, including AI agents, RAG pipelines, and AI evaluation frameworks.
Integrate advanced AI technologies such as Codex and Claude to support mission-specific workflows, real-time insights, and decision-making capabilities.
Develop evaluation strategies covering model quality, retrieval optimization, benchmarking, and performance monitoring for deployed AI systems.
Design AI architectures supporting memory, state management, structured and unstructured public health data interaction, and scalable agent orchestration.
Establish data governance, privacy, anonymization, documentation, and ethical AI standards across AI/ML systems and public health data environments.