At Andersen Consulting, AI/LLMOps FDEs own the technical delivery of production AI systems end-to-end and sit across the table from engineering teams at global enterprises. The practice is early. The engineers who join now will define its architecture, its standards, and what world-class AI delivery looks like for clients who trust us with their most consequential systems.

Enterprise AI programs follow a predictable arc: a successful POC, a budget approval, and then months of stalled production work before the initiative quietly dies. The gap between a notebook that impresses stakeholders and a system that runs the business under real enterprise load is an engineering problem. Its name is LLMOps.

LLMs have been serious enterprise tools for roughly three years. We are not looking for someone who has mastered a stable field. We are looking for someone who has been in the room while the field was being invented: someone who built RAG pipelines that broke in production, debugged agent loops that silently degraded after staging, and designed eval suites from scratch because nothing off the shelf measured what actually mattered.

What You'll Do

Day-to-day, you embed directly with client teams and own a defined technical workstream from design through production:

- Design and deploy RAG pipelines using pgvector, Pinecone, Qdrant, or Weaviate, with deliberate chunking strategies, hybrid search, and re-ranking layers.
- Build multi-step agent workflows using LangChain, LangGraph, LlamaIndex, or the Anthropic Claude SDK, with tool use, structured outputs, and memory.
- Implement guardrails at the correct system boundary: output schema enforcement, PII detection, content filtering, and agent behavior constraints calibrated to each client's regulatory context.
- Design eval suites using off-the-shelf tools or custom Python harnesses that measure retrieval quality, answer faithfulness, hallucination rates, and latency under load.
- Instrument observability from day one: LLM call latency, retrieval quality metrics, agent decision traces, and cost per query.
- Write production Python: typed, tested, linted, and deployable by another engineer.
- Deploy and serve models on AWS (Bedrock, SageMaker, EKS), Azure (Azure OpenAI Service, AKS), or GCP (Vertex AI, GKE) under latency-sensitive enterprise conditions.
- Treat prompt and context engineering as an engineering discipline: versioned system prompts, few-shot libraries, chain-of-thought elicitation, and context window budgeting, all of it tracked, tested, and iterated rather than adjusted informally.
- Build and deploy MCP servers to expose enterprise data sources, internal APIs, and tools to LLM agents in a standardized, auditable way.
- Contribute reusable accelerators, reference architectures, and internal tooling during bench time. Every asset you build should make the next engagement start faster.

Travel is a real part of this role. Client work regularly requires on-site presence during discovery, architecture reviews, and go-live.
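To make the eval-suite responsibility concrete, here is a minimal sketch of a custom Python harness for one of the metrics named above, retrieval quality. The dataclass fields, chunk IDs, and queries are all invented for illustration; a real suite would load ground truth from a curated dataset and track far more than one metric.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    query: str
    relevant_ids: set[str]    # ground-truth chunk IDs for this query
    retrieved_ids: list[str]  # IDs the retriever actually returned, ranked

def recall_at_k(case: EvalCase, k: int) -> float:
    """Fraction of ground-truth chunks that appear in the top-k results."""
    if not case.relevant_ids:
        return 0.0
    hits = len(case.relevant_ids & set(case.retrieved_ids[:k]))
    return hits / len(case.relevant_ids)

def run_suite(cases: list[EvalCase], k: int = 5) -> float:
    """Mean recall@k across the suite: one number to track per release."""
    return sum(recall_at_k(c, k) for c in cases) / len(cases)

# Hypothetical ground truth, the kind a client team would curate by hand.
cases = [
    EvalCase("refund policy", {"c1", "c7"}, ["c1", "c3", "c7", "c9"]),
    EvalCase("sla terms", {"c2", "c5"}, ["c4", "c2", "c8"]),
]
print(run_suite(cases, k=3))  # 0.75
```

Running this against every retriever change turns "the pipeline feels worse" into a regression you can bisect.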
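Output schema enforcement, one of the guardrails listed above, often reduces to a thin validation-and-retry wrapper at the system boundary. The sketch below assumes a JSON contract; the field names, severity values, and the stub standing in for a real LLM call are all hypothetical.

```python
import json

# Hypothetical contract for a ticket-triage agent's output.
REQUIRED_FIELDS = {"ticket_id": str, "severity": str, "summary": str}
ALLOWED_SEVERITIES = {"low", "medium", "high"}

def enforce_schema(raw: str) -> dict:
    """Parse model output and reject anything outside the contract."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"missing or mistyped field: {field}")
    if data["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"severity out of range: {data['severity']}")
    return data

def call_with_guardrail(generate, max_retries: int = 2) -> dict:
    """Wrap a model call; retry on schema violations, fail closed after that."""
    for attempt in range(max_retries + 1):
        try:
            return enforce_schema(generate())
        except ValueError:
            if attempt == max_retries:
                raise
    raise AssertionError("unreachable")

# Stub in place of a real LLM: first reply is malformed, second is valid.
replies = iter(['{"ticket_id": "T-1"}',
                '{"ticket_id": "T-1", "severity": "high", "summary": "db outage"}'])
result = call_with_guardrail(lambda: next(replies))
print(result["severity"])  # high
```

Failing closed here, rather than passing unvalidated output downstream, is what makes the boundary auditable in a regulated environment.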
Job Type: Full-time
Career Level: Manager
Education Level: No Education Listed
Number of Employees: 501-1,000 employees