Wells Fargo is seeking a Senior Software Engineer – LLM Inferencing & AI Gateway to join our Digital Technology – AI Capability Engineering team. In this role, you will design, build, and operate the GPU-based GenAI platform and the serving infrastructure for LLM/SLM workloads. Your work will span the full stack, from GPU cluster configuration and Run:AI/OpenShift AI orchestration to optimizing vLLM/Triton runtimes and hardening these systems for production use. Key focus areas include H100/H200 GPU clusters, NVLink/NVSwitch, MIG, CUDA/NVML, GPU scheduling, and disaggregated inferencing patterns (prefill/decode). You will also drive observability best practices and deliver reliable, scalable model endpoints through an API Gateway-based production architecture.

In this role, you will:

- Lead complex Generative AI initiatives and deliverables within technical domain environments
- Contribute to large-scale planning of strategies
- Design, code, test, debug, and document projects and programs associated with the technology domain, including upgrades and deployments
- Review moderately complex technical challenges that require an in-depth evaluation of technologies and procedures
- Resolve moderately complex issues and lead a team to meet existing and potential new client needs, leveraging a solid understanding of the function, policies, procedures, and compliance requirements
- Collaborate and consult with peers, colleagues, and mid-level managers to resolve technical challenges and achieve goals
- Lead projects, act as an escalation point, and provide guidance and direction to less experienced staff
- Engineer GPU clusters and node pools; configure NVLink/NVSwitch, the NVIDIA GPU Operator, MIG profiles, container runtimes, and kernel/driver baselines for high-throughput LLM/SLM workloads (see the GPU telemetry sketch below)
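To ground the CUDA/NVML and observability focus areas named above, here is a minimal, illustrative Python sketch of the kind of GPU telemetry such a serving platform typically collects. It assumes the pynvml bindings for NVML are installed and an NVIDIA driver is present; the polling loop, sample count, and printed fields are arbitrary illustrative choices, not requirements stated in this posting.

```python
# Minimal GPU telemetry poll via NVML (pynvml): the kind of utilization and
# memory signals an LLM serving platform exports to its observability stack.
import time

import pynvml


def sample_gpu_metrics():
    """Return one utilization/memory sample per visible GPU."""
    samples = []
    for index in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        name = pynvml.nvmlDeviceGetName(handle)
        # Older pynvml versions return bytes for the device name.
        if isinstance(name, bytes):
            name = name.decode()
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent over NVML's sample window
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
        samples.append({
            "index": index,
            "name": name,
            "gpu_util_pct": util.gpu,
            "mem_used_gib": mem.used / 2**30,
            "mem_total_gib": mem.total / 2**30,
        })
    return samples


if __name__ == "__main__":
    pynvml.nvmlInit()
    try:
        for _ in range(3):  # poll a few times; a real exporter would run continuously
            for s in sample_gpu_metrics():
                print(f"GPU{s['index']} {s['name']}: {s['gpu_util_pct']}% util, "
                      f"{s['mem_used_gib']:.1f}/{s['mem_total_gib']:.1f} GiB")
            time.sleep(5)
    finally:
        pynvml.nvmlShutdown()
```

In practice, metrics like these are more commonly scraped via NVIDIA's DCGM exporter into a Prometheus/Grafana stack rather than hand-rolled, but the underlying NVML signals are the same.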
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees