AI/ML Engineer - Agentic

Hewlett Packard Enterprise, San Jose, CA
Hybrid

About The Position

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

The AI/ML Engineer – Agentic is a senior individual contributor responsible for designing, building, and operating a production-grade agentic orchestration platform, including multi-agent workflows and MCP server–based tool infrastructure. The role focuses on enterprise-scale LLM integration, shared retrieval and memory services, and high-performance backend systems that power agent execution. This position owns reliability, observability, and cloud-native operations for non-deterministic agentic systems in production. Contributions include applying developed subject-matter expertise to solve common and sometimes complex technical problems and recommending alternatives where necessary. The engineer may act as a project lead and provide guidance to lower-level professionals, exercising independent judgment and consulting with others to determine the best method for accomplishing work and achieving objectives.

Requirements

  • Bachelor’s degree in computer science, engineering, information systems, or closely related quantitative discipline.
  • Typically 4-7 years of experience.
  • Production experience with agentic frameworks: LangGraph (preferred), Claude Agent SDK, or equivalent (not just prototypes)
  • Deep understanding of multi-agent architectures: supervisor/worker patterns, hierarchical agent graphs, ReAct loops, ReWOO
  • Hands-on with inter-agent communication protocols: MCP (Model Context Protocol), A2A, tool registry / server registry
  • LLM API integration at scale: structured outputs, streaming, function/tool calling, error handling
  • RAG pipeline design and optimization: chunking strategies, re-ranking, hybrid search; knowing which knobs to turn for which issues
  • Vector store experience: OpenSearch or equivalent
  • Applied ML intuition: fine-tuning concepts, prompt engineering, evaluations, QLoRA, PEFT
  • Backend development: FastAPI, gRPC, Kafka, Redis, message queues, async programming
  • System design: Python, API design
  • GraphQL and/or REST at enterprise scale
  • Observability and monitoring for non-deterministic systems: LangFuse, Prometheus, or equivalent
  • Kubernetes: deploying, scaling, and managing workloads (Deployments, Services, ConfigMaps, Secrets)
  • Container image management: building, tagging, versioning, and pushing images via Docker; familiarity with a container registry (ECR, GCR, Docker Hub)
  • CI/CD pipelines for automated build and deploy (GitHub Actions, Jenkins, ArgoCD, or similar)
  • Resource management: CPU/memory limits, autoscaling (HPA/VPA), health probes
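The LLM API integration bullet (structured outputs, tool calling, error handling) can be sketched in a few lines. This is a minimal, self-contained illustration using a stubbed model callable rather than any real LLM client; `call_with_retry`, `parse_structured`, and the key names are hypothetical, not an API the role prescribes:

```python
import json

class StructuredOutputError(Exception):
    """Raised when a model reply is not usable structured output."""

def parse_structured(raw, required_keys):
    """Validate that a model reply is JSON containing the expected keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise StructuredOutputError(f"not valid JSON: {exc}") from exc
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise StructuredOutputError(f"missing keys: {missing}")
    return data

def call_with_retry(model_fn, prompt, required_keys, max_attempts=3):
    """Retry a (stubbed) model call until it yields valid structured output."""
    last_err = None
    for attempt in range(max_attempts):
        raw = model_fn(prompt, attempt)
        try:
            return parse_structured(raw, required_keys)
        except StructuredOutputError as err:
            last_err = err  # retry with the next attempt
    raise last_err
```

In production this validation layer typically sits in front of a real client's function/tool-calling response, with backoff and logging around the retry loop.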
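The hybrid-search bullet can likewise be illustrated with a toy sketch, assuming a tiny in-memory corpus: keyword overlap stands in for the lexical (BM25-style) leg and cosine similarity over hand-made vectors stands in for the vector leg, blended with a weight `alpha`. None of this reflects a specific OpenSearch configuration:

```python
import math

def keyword_score(query, doc):
    """Crude lexical score: fraction of query terms present in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, query_vec, corpus, alpha=0.5, top_k=2):
    """Rank (doc, vector) pairs by a weighted blend of both scores."""
    scored = []
    for doc, vec in corpus:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, doc)
        scored.append((score, doc))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

Tuning `alpha`, the chunking granularity of `corpus`, and a re-ranking pass over the top-k results are exactly the "knobs" the requirement alludes to.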

Nice To Haves

  • Master’s degree desirable.
  • Multi-tenant architecture awareness: rate limiting, auth, tenant isolation
  • Knowledge base and cost optimization experience: AWS Bedrock, OpenSearch Serverless
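The rate-limiting and tenant-isolation item above can be pictured as a per-tenant token bucket. This is an in-memory sketch for illustration only; a real multi-tenant deployment would keep the buckets in shared state such as Redis, and the class and parameter names here are invented:

```python
import time

class TenantRateLimiter:
    """Token-bucket limiter keyed by tenant ID (in-memory sketch)."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.buckets = {}         # tenant -> (tokens, last_refill_time)

    def allow(self, tenant, now=None):
        """Return True and consume a token if the tenant is under its limit."""
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(tenant, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[tenant] = (tokens - 1.0, now)
            return True
        self.buckets[tenant] = (tokens, now)
        return False
```

Because each tenant gets its own bucket, one tenant exhausting its budget never blocks another, which is the isolation property the bullet is after.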

Responsibilities

  • Design, build, and own a production-grade agentic orchestration platform, implementing scalable multi-agent workflows using frameworks such as LangGraph or equivalent.
  • Architect, develop, and operate the MCP server infrastructure, including inter-agent communication, tool/server registries, domain isolation, versioning, and lifecycle management.
  • Integrate and operate LLM services at enterprise scale, supporting streaming, structured outputs, tool/function calling, and robust error handling across agent workflows.
  • Build and maintain retrieval and memory services for agentic systems, including RAG pipelines, OpenSearch-backed vector stores, hybrid search, and relevance optimization.
  • Develop and operate high-performance backend services (FastAPI, gRPC, async systems, messaging) that power orchestration, tool execution, and agent runtime behavior.
  • Own observability and reliability for non-deterministic systems, delivering end-to-end tracing, monitoring, and cost/performance visibility for agent executions.
  • Manage cloud-native infrastructure and deployment, including Kubernetes workloads, containerized services, CI/CD pipelines, and resource optimization (CPU/memory, autoscaling).
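The supervisor/worker orchestration described in the first responsibility can be reduced to a small loop: a routing function picks the next worker agent (or stops), and each worker's output becomes the next step's input. This is a framework-free sketch with stubbed agents, not LangGraph code; every function name here is illustrative:

```python
def research_agent(task):
    """Stub worker: pretends to gather material for a task."""
    return f"research notes on {task}"

def summarize_agent(task):
    """Stub worker: pretends to condense its input."""
    return f"summary of {task}"

WORKERS = {"research": research_agent, "summarize": summarize_agent}

def supervisor(goal, route):
    """Supervisor loop: `route` names the worker for the next step,
    or returns None when the goal is complete."""
    transcript = []
    step = goal
    while (worker_name := route(step, transcript)) is not None:
        result = WORKERS[worker_name](step)
        transcript.append((worker_name, result))
        step = result  # feed each worker's output to the next step
    return transcript
```

In a real platform the `route` decision is itself an LLM call, and the transcript becomes the shared state that graph frameworks persist, trace, and replay.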

Benefits

  • Health & Wellbeing: We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial, and emotional wellbeing.
  • Personal & Professional Development: We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have, whether you want to become a knowledge expert in your field or apply your skills to another division.
  • Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness.