About this role: Wells Fargo is seeking a Principal Engineer – Gen AI Platform Inferencing Engineering to lead the development and optimization of our AI model serving and inferencing platforms within Digital Technology's AI Capability Engineering group. This is a software engineering role: you'll write code, build systems, and solve hard problems in the AI inference stack. You'll work deep inside frameworks like vLLM, SGLang, and NVIDIA Dynamo, extending and optimizing them to serve models at enterprise scale. You'll also build the automation, tooling, and deployment infrastructure that connects these runtimes to Kubernetes-native serving layers like KServe, KNative, and OpenShift AI. If you've contributed to inference frameworks, written custom serving logic, or built production ML serving pipelines in Python, we want to hear from you.

In this role, you will:
Develop, extend, and optimize inference runtime configurations and integrations across vLLM, SGLang, NVIDIA Dynamo, TensorRT-LLM, and Triton
Write Python-based tooling and automation for model onboarding, serving configuration, performance benchmarking, and deployment pipelines
Build and maintain Kubernetes-native model serving infrastructure using KServe, KNative, and OpenShift AI, including custom serving runtimes and inference graphs
Implement and tune inference performance optimizations: continuous batching, speculative decoding, prefix caching, concurrency control, autoscaling policies, and disaggregated prefill/decode pipelines
Develop Helm charts, operators, and Kustomize overlays for deploying and managing inference workloads on OpenShift/OCP
Integrate inference platforms with GPU workload orchestrators (Run:AI or similar), automating project provisioning, quota management, and workload scheduling
Build observability and testing harnesses: load testing frameworks, latency/throughput profiling scripts, and regression test suites for inference stack upgrades
Partner with AI/ML teams to productionize new models, defining serving architectures, resource requirements, and SLA targets
Job Type
Full-time
Career Level
Principal
Education Level
No Education Listed
Number of Employees
5,001-10,000 employees