At Andersen Consulting, AI/LLMOps FDEs own the technical delivery of production AI systems end-to-end and sit across the table from engineering teams at global enterprises. The practice is early. The engineers who join now will define its architecture, its standards, and what world-class AI delivery looks like for clients who trust us with their most consequential systems.

Enterprise AI programs follow a predictable arc: a successful POC, a budget approval, and then months of stalled production work before the initiative quietly dies. The gap between a notebook that impresses stakeholders and a system that runs the business under real enterprise load is an engineering problem. Its name is LLMOps.

LLMs have been serious enterprise tools for roughly three years. We are not looking for someone who has mastered a stable field. We are looking for someone who has been in the room while the field was being invented: someone who built RAG pipelines that broke in production, debugged agent loops that silently degraded after staging, and designed eval suites from scratch because nothing off the shelf measured what actually mattered.

What You'll Do

Day-to-day, you embed directly with client teams and own a defined technical workstream from design through production:

- Design and deploy RAG pipelines using pgvector, Pinecone, Qdrant, or Weaviate, with deliberate chunking strategies, hybrid search, and re-ranking layers.
- Build multi-step agent workflows using LangChain, LangGraph, LlamaIndex, or the Anthropic Claude SDK, with tool use, structured outputs, and memory.
- Implement guardrails at the correct system boundary: output schema enforcement, PII detection, content filtering, and agent behavior constraints calibrated to each client's regulatory context.
- Design eval suites using off-the-shelf tools or custom Python harnesses that measure retrieval quality, answer faithfulness, hallucination rates, and latency under load.
- Instrument observability from day one: LLM call latency, retrieval quality metrics, agent decision traces, and cost per query.
- Write production Python: typed, tested, linted, and deployable by another engineer.
- Deploy and serve models on AWS (Bedrock, SageMaker, EKS), Azure (Azure OpenAI Service, AKS), or GCP (Vertex AI, GKE) under latency-sensitive enterprise conditions.
- Treat prompt and context engineering as an engineering discipline: versioned system prompts, few-shot libraries, chain-of-thought elicitation, and context window budgeting, all of it tracked, tested, and iterated rather than adjusted informally.
- Build and deploy MCP servers to expose enterprise data sources, internal APIs, and tools to LLM agents in a standardized, auditable way.
- Contribute reusable accelerators, reference architectures, and internal tooling during bench time. Every asset you build should make the next engagement start faster.

Travel is a real part of this role. Client work regularly requires on-site presence during discovery, architecture reviews, and go-live.
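To make the eval-suite responsibility concrete, here is a minimal sketch of a custom Python harness for one of the metrics named above, retrieval quality. The dataclass fields, chunk IDs, and queries are all invented for illustration; a real suite would load ground truth from a curated dataset and track far more than one metric.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    query: str
    relevant_ids: set[str]    # ground-truth chunk IDs for this query
    retrieved_ids: list[str]  # IDs the retriever actually returned, ranked

def recall_at_k(case: EvalCase, k: int) -> float:
    """Fraction of ground-truth chunks that appear in the top-k results."""
    if not case.relevant_ids:
        return 0.0
    hits = len(case.relevant_ids & set(case.retrieved_ids[:k]))
    return hits / len(case.relevant_ids)

def run_suite(cases: list[EvalCase], k: int = 5) -> float:
    """Mean recall@k across the suite: one number to track per release."""
    return sum(recall_at_k(c, k) for c in cases) / len(cases)

# Hypothetical ground truth, the kind a client team would curate by hand.
cases = [
    EvalCase("refund policy", {"c1", "c7"}, ["c1", "c3", "c7", "c9"]),
    EvalCase("sla terms", {"c2", "c5"}, ["c4", "c2", "c8"]),
]
print(run_suite(cases, k=3))  # 0.75
```

Running this against every retriever change turns "the pipeline feels worse" into a regression you can bisect.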
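Output schema enforcement, one of the guardrails listed above, often reduces to a thin validation-and-retry wrapper at the system boundary. The sketch below assumes a JSON contract; the field names, severity values, and the stub standing in for a real LLM call are all hypothetical.

```python
import json

# Hypothetical contract for a ticket-triage agent's output.
REQUIRED_FIELDS = {"ticket_id": str, "severity": str, "summary": str}
ALLOWED_SEVERITIES = {"low", "medium", "high"}

def enforce_schema(raw: str) -> dict:
    """Parse model output and reject anything outside the contract."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"missing or mistyped field: {field}")
    if data["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"severity out of range: {data['severity']}")
    return data

def call_with_guardrail(generate, max_retries: int = 2) -> dict:
    """Wrap a model call; retry on schema violations, fail closed after that."""
    for attempt in range(max_retries + 1):
        try:
            return enforce_schema(generate())
        except ValueError:
            if attempt == max_retries:
                raise
    raise AssertionError("unreachable")

# Stub in place of a real LLM: first reply is malformed, second is valid.
replies = iter(['{"ticket_id": "T-1"}',
                '{"ticket_id": "T-1", "severity": "high", "summary": "db outage"}'])
result = call_with_guardrail(lambda: next(replies))
print(result["severity"])  # high
```

Failing closed here, rather than passing unvalidated output downstream, is what makes the boundary auditable in a regulated environment.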
Job Type: Full-time
Career Level: Manager
Education Level: No Education Listed
Number of Employees: 501-1,000 employees