Principal Engineer - GPU and LLM Infrastructure

Wells Fargo & Company, Concord, CA
$159,000 - $305,000 (Hybrid)

About The Position

About this role: Wells Fargo is seeking a Principal Engineer – GPU & LLM Infrastructure to lead the end‑to-end strategy and operations of our enterprise GPU platforms within Digital Technology’s AI Capability Engineering group. In this role, you will design and evolve GPU architecture across on‑premises and cloud environments, guide POCs through production readiness, and oversee Day‑2 operations for large‑scale, multi‑cloud deployments. You will serve as the technical authority for Nvidia/Run:AI orchestration, drive alignment with OpenShift AI, and enable high‑performance LLM/SLM inferencing using Triton and vLLM. A core part of the role is ensuring our GenAI platforms are secure, resilient, scalable, and fully observable to meet the demands of enterprise‑grade AI workloads. Specific duties are listed under Responsibilities below.

Requirements

  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education

Nice To Haves

  • 1+ years of experience with NVIDIA GPU and CUDA ecosystems, including CUDA, cuDNN, NVLink/NVSwitch, MIG, NCCL, GPU profilers, and performance tuning for H100/H200 architectures
  • 1+ years of experience with LLM/SLM runtimes, such as vLLM, TensorRT‑LLM, and Triton; hands-on work with model quantization (FP8, INT4 AWQ/GPTQ), KV‑cache optimization strategies, and disaggregated prefill/decode pipelines (a minimal vLLM example follows this list)
  • 1+ years of experience in orchestration and GPU workload management, including GPU resource managers (collections/departments/projects/workloads), OCP/GKE administration, quota management, preemption and fair‑share enforcement, GPU scheduling and timeslicing, Helm/Kustomize, upgrade validation, and admission controls
  • 1+ years of experience with API and gateway platforms, including Apigee authentication/authorization, quota and rate-limit configuration, OpenAPI specifications, SDK generation, SLA operations, and API versioning/deprecation workflows
  • 1+ years of experience in observability and evaluation tooling, including Arize‑like systems for tracing and evaluations, SLO development, alerting design, retention/export workflows, and dashboard creation
  • 1+ years of experience in performance engineering, including throughput and latency modeling (token/sec, batch shaping, cache policies) and cost/performance optimization strategies for LLM/SLM workloads
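
To make the throughput and cost modeling named in the last bullet concrete, the sketch below estimates aggregate decode throughput and cost per million output tokens from a handful of assumed inputs (per-stream decode rate, concurrency, GPU count, hourly GPU cost). The function names and every number are illustrative assumptions, not benchmarks or internal figures.

```python
# Back-of-the-envelope throughput/cost model for LLM serving.
# All inputs are illustrative assumptions, not measured benchmarks.

def cluster_tokens_per_sec(decode_tok_s_per_stream: float,
                           concurrent_streams: int,
                           num_gpus: int) -> float:
    """Aggregate decode throughput if each GPU sustains the given
    per-stream decode rate across its concurrent streams."""
    return decode_tok_s_per_stream * concurrent_streams * num_gpus

def cost_per_million_tokens(tokens_per_sec: float,
                            gpu_hourly_cost: float,
                            num_gpus: int) -> float:
    """Dollar cost to generate one million output tokens."""
    tokens_per_hour = tokens_per_sec * 3600
    return (gpu_hourly_cost * num_gpus) / tokens_per_hour * 1_000_000

if __name__ == "__main__":
    # Assumed figures: 40 tok/s per stream, 32 concurrent streams per GPU,
    # 8 GPUs, $4.50 per GPU-hour. Replace with measured numbers.
    tps = cluster_tokens_per_sec(40.0, 32, 8)
    print(f"Aggregate throughput: {tps:,.0f} tok/s")
    print(f"Cost per 1M output tokens: ${cost_per_million_tokens(tps, 4.50, 8):.2f}")
```

The same structure extends naturally to batch shaping and cache-policy comparisons: swap in measured per-stream rates for each configuration and compare the resulting cost figures.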
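
The runtimes bullet above pairs vLLM with INT4 AWQ quantization; the snippet below is a minimal offline-inference sketch of that pairing. The model identifier is a hypothetical placeholder, the sampling settings and memory fraction are arbitrary, and exact argument names can differ between vLLM releases.

```python
# Minimal vLLM offline-inference sketch with an AWQ-quantized checkpoint.
# Model name, sampling settings, and memory fraction are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="example-org/example-7b-awq",  # hypothetical AWQ checkpoint
    quantization="awq",                  # matches the INT4 AWQ item above
    gpu_memory_utilization=0.90,         # leave headroom for the KV cache
)

params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(["Summarize GPU timeslicing in one sentence."], params)

for out in outputs:
    print(out.outputs[0].text)
```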

Responsibilities

  • Act as an advisor to leadership to develop or influence GPU buildout for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term and large-scale and that require vision, creativity, innovation, and advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization's structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies, and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
  • Design and implement GPU cluster topologies (H100/H200, NVLink/NVSwitch), networking, and storage paths for high‑throughput inferencing; publish sizing and performance baselines (a sizing sketch follows this list)
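
As one way to publish sizing baselines, the sketch below works through the standard KV-cache sizing arithmetic used when capacity-planning inference GPUs: bytes per cached token, then the number of tokens that fit after model weights and overhead. The model dimensions and memory figures are placeholder assumptions standing in for whichever model and GPU SKU is actually being sized.

```python
# Sketch of KV-cache sizing arithmetic for inference capacity planning.
# Model dimensions and memory figures below are illustrative assumptions.

def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache per cached token (keys + values)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

def max_cached_tokens(gpu_mem_gb: float, weight_mem_gb: float,
                      per_token_bytes: int, overhead_frac: float = 0.10) -> int:
    """Tokens that fit in the memory left after weights and overhead."""
    usable_bytes = (gpu_mem_gb * (1 - overhead_frac) - weight_mem_gb) * 1e9
    return int(usable_bytes // per_token_bytes)

if __name__ == "__main__":
    # Assumed 70B-class model: 80 layers, 8 KV heads, head_dim 128, FP16 cache,
    # ~140 GB of weights sharded across two GPUs with ~141 GB HBM each.
    per_tok = kv_cache_bytes_per_token(80, 8, 128)   # ~320 KiB per token
    tokens = max_cached_tokens(2 * 141, 140, per_tok)
    print(f"KV cache per token: {per_tok / 1024:.0f} KiB")
    print(f"Approx. cached tokens across the pair: {tokens:,}")
```

A baseline like this, paired with measured tokens/sec under representative batch shapes, is the kind of sizing artifact the bullet above refers to.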

Benefits

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement

What This Job Offers

  • Job Type: Full-time
  • Career Level: Principal
  • Education Level: None listed
  • Number of Employees: 5,001-10,000
