At Andersen Consulting, AI/LLMOps FDEs own the technical delivery of production AI systems end-to-end and sit across the table from engineering teams at global enterprises. The practice is early. The engineers who join now will define its architecture, its standards, and what world-class AI delivery looks like for clients who trust us with their most consequential systems.

Enterprise AI programs follow a predictable arc: a successful POC, a budget approval, and then months of stalled production work before the initiative quietly dies. The gap between a notebook that impresses stakeholders and a system that runs the business under real enterprise load is an engineering problem. Its name is LLMOps.

LLMs have been serious enterprise tools for roughly three years. We are not looking for someone who has mastered a stable field. We are looking for someone who has been in the room while the field was being invented: someone who built RAG pipelines that broke in production, debugged agent loops that silently degraded after staging, and designed eval suites from scratch because nothing off the shelf measured what actually mattered.

Day-to-day, you embed directly with client teams and own a defined technical workstream from design through production:

- Design and deploy RAG pipelines using pgvector, Pinecone, Qdrant, or Weaviate, with deliberate chunking strategies, hybrid search, and re-ranking layers.
- Build multi-step agent workflows using LangChain, LangGraph, LlamaIndex, or the Anthropic Claude SDK, with tool use, structured outputs, and memory.
- Implement guardrails at the correct system boundary: output schema enforcement, PII detection, content filtering, and agent behavior constraints calibrated for each client's regulatory context.
- Design eval suites using off-the-shelf tools or custom Python harnesses that measure retrieval quality, answer faithfulness, hallucination rates, and latency under load.
- Instrument observability from day one: LLM call latency, retrieval quality metrics, agent decision traces, and cost per query.
- Write production Python: typed, tested, linted, and deployable by another engineer.
- Deploy and serve models on AWS (Bedrock, SageMaker, EKS), Azure (Azure OpenAI Service, AKS), or GCP (Vertex AI, GKE) under latency-sensitive enterprise conditions.
- Treat prompt and context engineering as an engineering discipline: versioned system prompts, few-shot libraries, chain-of-thought elicitation, and context window budgeting, all of it tracked, tested, and iterated rather than adjusted informally.
- Build and deploy MCP servers to expose enterprise data sources, internal APIs, and tools to LLM agents in a standardized, auditable way.
- Contribute reusable accelerators, reference architectures, and internal tooling during bench time. Every asset you build should make the next engagement start faster.

Travel is a real part of this role. Client work regularly requires on-site presence during discovery, architecture reviews, and go-live.

On these engagements you will build:

- Production RAG pipelines with hybrid search, re-ranking, and query-time monitoring, designed to hold up under real query distributions and corpus drift, not just benchmark datasets.
- Multi-step agent orchestration systems with tool use, memory, and structured output validation, built to be reliable at enterprise load, not just in a demo environment.
- Eval frameworks designed from scratch to measure what the client's system actually needs to get right: faithfulness, groundedness, latency percentiles, and failure mode frequency.
- Guardrail infrastructure positioned correctly in the call stack: input validation, output schema enforcement, and behavioral constraints for the specific regulatory context of each engagement.
- Observability stacks instrumented at the LLM call, retrieval, and agent decision layers, giving clients operational visibility instead of log files they can't act on.
- MCP servers exposing internal enterprise systems (databases, document stores, internal APIs) to LLM agents through a standardized, auditable interface.
- CI/CD pipelines for LLM systems with automated eval runs on prompt changes, model version regression testing, and deployment gating before anything reaches production.
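To make the hybrid-search-plus-re-ranking expectation concrete, here is a minimal sketch in pure Python. It is a toy: the in-memory corpus, the lexical and bigram scorers standing in for BM25 and embeddings, and all names (`DOCS`, `hybrid_search`) are illustrative assumptions, not any client's stack. In production the rankings would come from pgvector, Qdrant, or similar, but the fusion step is the same shape.

```python
import math
from collections import Counter

# Toy corpus; in production these documents live in pgvector/Qdrant/etc.
DOCS = {
    "d1": "invoice approval workflow for enterprise finance",
    "d2": "resetting a user password in the admin console",
    "d3": "finance policy for invoice disputes and refunds",
}

def keyword_score(query: str, doc: str) -> float:
    """Crude lexical overlap, standing in for BM25."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return float(sum(min(q[t], d[t]) for t in q))

def vector_score(query: str, doc: str) -> float:
    """Character-bigram cosine similarity, standing in for embeddings."""
    def grams(s: str) -> Counter:
        s = s.lower()
        return Counter(s[i:i + 2] for i in range(len(s) - 1))
    qg, dg = grams(query), grams(doc)
    dot = sum(qg[g] * dg[g] for g in qg)
    norm = (math.sqrt(sum(v * v for v in qg.values()))
            * math.sqrt(sum(v * v for v in dg.values())))
    return dot / norm if norm else 0.0

def hybrid_search(query: str, k: int = 2) -> list[str]:
    """Fuse the lexical and vector rankings with reciprocal rank fusion."""
    def ranking(score_fn):
        return sorted(DOCS, key=lambda d: score_fn(query, DOCS[d]), reverse=True)
    fused: Counter = Counter()
    for ranked in (ranking(keyword_score), ranking(vector_score)):
        for rank, doc_id in enumerate(ranked):
            fused[doc_id] += 1.0 / (60 + rank)  # RRF constant 60 is conventional
    return [doc_id for doc_id, _ in fused.most_common(k)]
```

A cross-encoder re-ranker would then re-score the fused top-k before anything reaches the prompt; RRF keeps the fusion robust when the two retrievers disagree on score scale.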
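The multi-step agent work reduces to a loop the posting takes for granted: call the model, execute any tool it requests, feed the result back, stop on a final answer. A minimal sketch, with `fake_model`, `TOOLS`, and `run_agent` all hypothetical stand-ins for a real SDK integration:

```python
import json

# Hypothetical tool registry the agent may call.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "lookup_order", "args": {"order_id": "A-17"}}
    return {"answer": "Order A-17 has shipped."}

def run_agent(question: str, max_steps: int = 5) -> str:
    """Minimal tool-use loop with a hard step budget, so a misbehaving
    model fails loudly instead of looping silently."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        action = fake_model(messages)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent exceeded step budget")
```

The `max_steps` cap is the point: it is the cheapest defense against the silently degrading agent loops the posting describes.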
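"Guardrails at the correct system boundary" means validating model output before it crosses the trust boundary, not after. A stdlib-only sketch of output schema enforcement plus one PII rule; the field names and the SSN regex are illustrative assumptions, not a complete PII policy:

```python
import json
import re

# Hypothetical schema for an agent's structured answer.
REQUIRED_FIELDS = {"answer": str, "sources": list, "confidence": float}
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN pattern as one PII example

def enforce_output(raw: str) -> dict:
    """Parse and validate a model response at the system boundary.

    Raises ValueError on malformed output so the caller can retry or
    fall back instead of passing junk downstream.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model returned non-JSON output: {exc}") from exc
    for field, typ in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"field {field!r} missing or not {typ.__name__}")
    # Redact PII before the answer leaves the trust boundary.
    data["answer"] = SSN_RE.sub("[REDACTED]", data["answer"])
    return data
```

In practice the schema check would be Pydantic or JSON Schema and the PII layer a dedicated detector, but the placement — between model and caller, failing closed — is the part that matters.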
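A custom eval harness of the kind described can be very small. This sketch measures a crude groundedness proxy (token overlap between answer and retrieved context, standing in for an LLM-judged faithfulness score) and a latency percentile; `run_eval` and its report keys are hypothetical names:

```python
import statistics
import time

def grounded_fraction(answer: str, context: str) -> float:
    """Fraction of answer tokens that appear in the retrieved context —
    a crude stand-in for a judged faithfulness metric."""
    ans = answer.lower().split()
    ctx = set(context.lower().split())
    return sum(t in ctx for t in ans) / len(ans) if ans else 1.0

def run_eval(cases: list[tuple[str, str]], answer_fn) -> dict:
    """Run answer_fn over (question, context) cases; report quality and latency."""
    scores, latencies = [], []
    for question, context in cases:
        start = time.perf_counter()
        answer = answer_fn(question, context)
        latencies.append(time.perf_counter() - start)
        scores.append(grounded_fraction(answer, context))
    return {
        "mean_faithfulness": statistics.mean(scores),
        "p95_latency_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
    }
```

Wired into CI, a harness like this is what makes "automated eval runs on prompt changes" a deployment gate rather than a dashboard.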
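Instrumenting "latency and cost per query from day one" can start as a decorator on the LLM call site. A sketch under loud assumptions: the prices, the whitespace token proxy, and the in-memory `TRACES` list are all placeholders for real tokenizer counts, real rates, and a real tracing backend:

```python
import functools
import time

PRICE_PER_1K = {"input": 0.003, "output": 0.015}  # hypothetical per-1K-token rates
TRACES: list[dict] = []  # placeholder for a real tracing/observability backend

def traced_llm_call(fn):
    """Record latency, token counts, and estimated cost for every model call."""
    @functools.wraps(fn)
    def wrapper(prompt: str, **kwargs):
        start = time.perf_counter()
        reply = fn(prompt, **kwargs)
        in_tok, out_tok = len(prompt.split()), len(reply.split())  # crude token proxy
        TRACES.append({
            "latency_s": time.perf_counter() - start,
            "input_tokens": in_tok,
            "output_tokens": out_tok,
            "cost_usd": (in_tok / 1000 * PRICE_PER_1K["input"]
                         + out_tok / 1000 * PRICE_PER_1K["output"]),
        })
        return reply
    return wrapper
```

The same wrapper is where agent decision traces and retrieval scores attach, so every production query leaves an auditable record.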
Job Type: Full-time
Career Level: Manager
Education Level: No Education Listed
Number of Employees: 501-1,000 employees