Wells Fargo is seeking a Lead Software Engineer — LLM Inferencing & Agentic AI within Digital Technology's AI Capability Engineering organization. In this role, you will design, build, and operate the GenAI Platform's GPU infrastructure and LLM/SLM serving systems, ensuring highly performant, reliable, and secure model inferencing at scale.

You will work across the full inferencing stack, from GPU cluster configuration and Run:AI / OpenShift AI orchestration to vLLM and NVIDIA Triton runtime optimization, including performance tuning, production hardening, and multi-model deployment. Focus areas include operating H100/H200 GPU clusters, advanced GPU scheduling, disaggregated prefill/decode serving, deep observability, and productionizing endpoints behind the enterprise API Gateway.

You will also design and deliver OpenAI-compatible APIs (Responses, Interactions), support MCP server integrations, and contribute to agentic AI development, including tools, agents, workflows, and evaluations. The role additionally involves building UI surfaces that improve developer and operator productivity, enabling teams to use, monitor, and troubleshoot AI services more effectively. Strong experience with LLM/SLM behavior, inferencing optimizations, tuning techniques, and prompt engineering and evaluation is expected.
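For context on the kind of serving stack described above, the following is a minimal sketch of calling an LLM behind an OpenAI-compatible endpoint such as the one vLLM exposes; the base URL, model name, and API key here are hypothetical placeholders and do not reflect Wells Fargo's actual gateway or deployment.

    # Minimal sketch: querying an OpenAI-compatible inference endpoint (e.g., vLLM's /v1 server).
    # All endpoint details below are illustrative placeholders.
    from openai import OpenAI

    # Hypothetical internal serving URL; vLLM's OpenAI-compatible server listens under /v1.
    client = OpenAI(base_url="http://llm-serving.example.internal/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        messages=[{"role": "user", "content": "Summarize GPU utilization best practices."}],
        temperature=0.2,
        max_tokens=256,
    )
    print(response.choices[0].message.content)

In production, a client like this would typically sit behind the enterprise API Gateway rather than call the serving cluster directly.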
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed