Data Scientist - Agentic AI

Hewlett Packard Enterprise•San Jose, CA

12h•$155,500 - $315,000•Onsite

About The Position

The Data Scientist – Agentic AI builds and operationalizes the core agentic workflows that power Marvis, Juniper's next-generation AI assistant for network operations. Working at the intersection of data science, generative AI, and production engineering, this role is responsible for designing, implementing, and evaluating the reasoning pipelines, tool-calling patterns, skills, and MCP server integrations that enable Marvis to autonomously diagnose, troubleshoot, and resolve complex networking problems. The ideal candidate combines deep hands-on experience with LLM-based agentic frameworks (LangGraph preferred) with the software engineering rigor needed to ship reliable, observable AI systems in a cloud-native environment.

Requirements

Master's or PhD degree in computer science, data science, mathematics, statistics, or a closely related quantitative discipline.
Typically, 4–6 years of experience building production ML/AI systems, with at least 1–2 years of hands-on work with generative AI and LLM-based applications.
Production experience with agentic orchestration frameworks: LangGraph (strongly preferred), LangChain, Claude Agent SDK, or equivalent — beyond prototypes.
Solid understanding of agentic design patterns: ReAct loops, tool/function calling, dynamic tool binding, skill-based execution, multi-step planning, and self-correction.
Hands-on experience with MCP (Model Context Protocol) or equivalent tool-serving protocols: tool schema design, server implementation, registry management.
LLM API integration at scale: prompt engineering, structured outputs, streaming, error handling, and cost optimization.
RAG pipeline design: chunking strategies, re-ranking, hybrid search, vector stores (OpenSearch or equivalent), and relevance optimization.
Experience building evaluation and testing frameworks for non-deterministic AI systems (offline evals, A/B testing, LLM-as-judge).
Strong foundation in statistical and machine learning techniques — anomaly detection, time-series analysis, clustering, causal inference, or related methods.
Applied ML intuition: knowing when to use retrieval vs. fine-tuning, prompt engineering vs. structured generation, and how to debug model behavior in production.
Proficient Python developer with experience in production codebases (not just notebooks).
Kubernetes: deploying, scaling, and managing workloads (Deployments, Services, ConfigMaps, Secrets, health probes).
CI/CD pipelines for automated build, test, and deploy (Jenkins, GitHub Actions, ArgoCD, or similar).
Container image management: building, tagging, versioning via Docker; familiarity with a container registry (ECR, GCR).
Backend service development: FastAPI or equivalent; REST/GraphQL API design.
Observability for AI systems: experience with tracing, monitoring, and logging tools (Langfuse, Prometheus, or equivalent).
Great written and verbal communication skills; ability to articulate technical designs to senior leadership.

Nice To Haves

Experience with agent memory systems (e.g., LangMem, custom memory architectures).
Familiarity with sandboxed code execution environments (E2B, Firecracker, or similar).
Networking domain knowledge (wireless/wired diagnostics, network troubleshooting) is a strong plus but not required.
Experience with AWS Bedrock, OpenSearch Serverless, or similar managed AI/ML services.

Responsibilities

Design, implement, and iterate on agentic workflows using LangGraph, including ReAct orchestration loops, dynamic tool selection and binding, multi-step reasoning, and self-correction patterns.
Develop and maintain MCP (Model Context Protocol) servers and skills — defining tool schemas, implementing domain-specific tools, writing skill playbooks (SKILL.md), and managing server lifecycle (versioning, deployment, monitoring).
Integrate and optimize LLM capabilities at production scale, including structured outputs, streaming, function/tool calling, prompt engineering, and robust error handling across agent execution paths.
Build and refine retrieval and memory services for agentic systems, including RAG pipelines, vector-store-backed semantic search, hybrid retrieval, long-term agent memory (semantic, episodic, procedural), and relevance tuning.
Design and execute evaluation frameworks for non-deterministic agentic systems — defining metrics, building test harnesses, running A/B tests on skills and tool configurations, and driving continuous quality improvement.
Collaborate with domain experts (network engineers, product managers) to formalize networking problems as agentic workflows, translating troubleshooting playbooks into skills, tools, and data pipelines.
Develop data analysis and transformation logic that runs in sandboxed execution environments (Code Mode), including multi-tool orchestration scripts, data aggregation, and visualization.
Deploy and operate containerized services in Kubernetes, contributing to CI/CD pipelines, container image management, health probes, and resource optimization.
Own observability for agentic workflows — implementing tracing, logging, cost tracking, and performance monitoring to ensure reliability of non-deterministic systems in production.

Benefits

Health & Wellbeing: a comprehensive suite of benefits that supports your physical, financial, and emotional wellbeing.
Personal & Professional Development: programs tailored to help you reach your career goals.
Unconditional Inclusion
Flexibility to manage your work and personal needs.
