Wells Fargo is seeking a Lead Software Engineer - LLM Inferencing & Agentic AI within Digital Technology’s AI Capability Engineering organization. In this role, you will design, build, and operate the GenAI Platform’s GPU infrastructure and LLM/SLM serving systems, ensuring highly performant, reliable, and secure model inferencing at scale. You will work across the full inferencing stack, from GPU cluster configuration and Run:AI / OpenShift AI orchestration to vLLM and NVIDIA Triton runtime optimization, including performance tuning, production hardening, and multi-model deployment (a minimal vLLM serving sketch appears after the responsibilities below). Focus areas include operating H100/H200 GPU clusters, advanced GPU scheduling, disaggregated prefill/decode serving, deep observability, and productionizing endpoints behind the enterprise API Gateway. You will also design and deliver OpenAI-compatible APIs (Responses, Interactions), support MCP server integrations, and contribute to agentic AI development, including tools, agents, workflows, and evaluations. The role additionally involves building UI surfaces that improve developer and operator productivity, enabling teams to use, monitor, and troubleshoot AI services more effectively. Strong experience with LLM/SLM behavior, inferencing optimizations, tuning techniques, and prompt engineering/evaluation is expected.

In this role, you will:
- Lead complex technology initiatives, including companywide initiatives with broad impact
- Act as a key participant in developing standards and companywide best practices for engineering complex, large-scale technology solutions across technology engineering disciplines
- Design, code, test, debug, and document for projects and programs
- Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, the enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
- Make decisions in developing standards and companywide best practices for engineering and technology solutions, requiring understanding of industry best practices and new technologies; influence and lead the technology team to meet deliverables and drive new initiatives
- Collaborate and consult with key technical experts, the senior technology team, and external industry groups to resolve complex technical issues and achieve goals
- Lead projects and teams, or serve as a peer mentor
- Engineer GPU clusters and node pools; configure NVLink/NVSwitch, NVIDIA GPU Operator, MIG profiles, container runtimes, and kernel/driver baselines for high-throughput LLM/SLM workloads
- Design and implement OpenAI-compatible APIs (Responses, Interactions) behind the AI Gateway: define OpenAPI contracts, authN/Z (OAuth2/mTLS), rate limits/quotas, SLAs, versioning/deprecation, and SDK generation (see the gateway client sketch below)
- Build and support MCP servers and tool adapters; manage agent/tool identity and capability metadata; integrate with agent registries and execution flows (see the MCP server sketch below)
- Develop agentic AI capabilities (tools, agents, workflows), including disaggregated prefill/decode patterns; contribute to runbooks, guardrails, and safe tool usage
- Build UI surfaces (developer/ops consoles) for endpoint onboarding, prompt testing, evaluations, observability dashboards, and incident response workflows
- Apply prompt engineering and evaluation best practices; create golden test suites, regression harnesses, and measurable SLO-aligned criteria for production promotion (see the evaluation harness sketch below)
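For illustration, here is a minimal sketch of LLM/SLM inference with vLLM’s Python API. The model name, tensor-parallel size, and sampling settings are placeholder assumptions, not the platform’s actual configuration.

```python
# Minimal vLLM inference sketch; model and parallelism settings are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,                    # shard across 2 GPUs (assumption)
    gpu_memory_utilization=0.90,               # leave headroom for KV cache growth
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(
    ["Summarize the key operational risks of serving LLMs at scale."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```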
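A sketch of calling an OpenAI-compatible Responses endpoint through an API gateway using the openai Python SDK. The gateway base URL, token environment variable, and model alias are hypothetical, and a recent SDK version that includes the Responses API is assumed.

```python
# Hypothetical call to an OpenAI-compatible Responses endpoint behind a gateway.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://ai-gateway.example.internal/v1",  # hypothetical gateway URL
    api_key=os.environ["AI_GATEWAY_TOKEN"],             # bearer token issued via OAuth2 (assumption)
)

response = client.responses.create(
    model="slm-chat-8b",  # placeholder model alias exposed by the gateway
    input="Draft a one-paragraph summary of this incident ticket.",
)
print(response.output_text)
```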
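A rough sketch of an MCP server exposing a single tool, using the reference MCP Python SDK’s FastMCP helper. The server name, tool, and lookup logic are illustrative placeholders.

```python
# Minimal MCP server sketch; tool name and logic are placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("endpoint-status")  # hypothetical server name

@mcp.tool()
def get_endpoint_status(endpoint_id: str) -> str:
    """Return a human-readable status string for a serving endpoint."""
    # Placeholder: a real adapter would query the platform's control plane.
    return f"endpoint {endpoint_id}: healthy"

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport
```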
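A sketch of a golden-test regression check using pytest. The golden-case file layout, the client fixture, and the substring check are illustrative assumptions, not an existing harness; production criteria would be SLO-aligned scores rather than containment checks.

```python
# Golden-test regression sketch; file layout and fixture are assumptions.
import json
import pathlib
import pytest

GOLDEN_DIR = pathlib.Path("tests/golden")  # hypothetical location of golden cases

def load_cases():
    for path in sorted(GOLDEN_DIR.glob("*.json")):
        yield json.loads(path.read_text())

@pytest.mark.parametrize("case", load_cases(), ids=lambda c: c["name"])
def test_golden_case(case, llm_client):  # llm_client: project-provided fixture (assumption)
    reply = llm_client.complete(case["prompt"])
    # Simple containment check; real promotion gates would also track
    # latency, accuracy, and safety metrics against SLOs.
    for expected in case["must_contain"]:
        assert expected.lower() in reply.lower()
```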
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
5,001-10,000 employees