Enterprise AI Platform - GPU & LLM Infrastructure Product Manager

Wells Fargo Bank, Charlotte, NC
Hybrid

About The Position

About this role: Wells Fargo is seeking an Enterprise AI Platform - GPU & LLM Infrastructure Product Manager (Lead Artificial Intelligence Solutions Consultant) as part of the Digital Capabilities team under Digital Technology & Innovation. Learn more about the career areas and lines of business at wellsfargojobs.com.

You will define and lead the product strategy for Wells Fargo’s enterprise-scale LLM/SLM inference GPU platform, partnering closely with GPU hardware and platform engineering teams to translate customer needs and business objectives into a clear, prioritized roadmap with measurable outcomes. You will own capabilities across high-performance model inferencing, GPU orchestration, and platform services, including vLLM, NVIDIA/Run:AI, and Red Hat OpenShift AI. The role also encompasses API productization, observability and evaluation, reliability and SLOs, and compliant end-to-end lifecycle management to enable secure, scalable, and enterprise-ready AI solutions. The responsibilities for this role are detailed in the Responsibilities section below.

Requirements

  • 5+ years of Artificial Intelligence Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 2+ years of hands‑on experience with cloud platforms such as GCP or Azure, and container orchestration technologies including Docker and Kubernetes/OpenShift

Nice To Haves

  • 2+ years of experience working on platform or ML/AI infrastructure products within regulated environments
  • 2+ years of proven success owning an API or platform product with accountability for SLAs/SLOs, including versioning and deprecation strategies, change management, and reliability outcomes
  • Strong communication skills, with the ability to influence senior stakeholders and clearly explain complex technical concepts to diverse audiences
  • Working knowledge of LLM/SLM inference stacks, including vLLM, Triton, and TensorRT‑LLM, as well as batching strategies, KV cache management, quantization techniques (e.g., FP8, INT4), and evaluation frameworks—sufficient to make informed product trade‑offs with engineering teams
  • Familiarity with GPU and platform fundamentals, such as modern GPU architectures (e.g., H100/H200), MIG and NCCL, GPU orchestration tools (NVIDIA/Run:AI), and Kubernetes/OpenShift AI administration and admission control patterns
  • Experience building developer‑centric platforms, including APIs, SDKs, and structured release and governance processes
  • Hands‑on experience with observability and evaluation for GenAI systems, including dashboards, tracing, alerting, and safety and quality metrics
  • Demonstrated strength in stakeholder management, partnering effectively across Risk, Security, Architecture, and line‑of‑business application teams

Responsibilities

  • Lead a team to identify, strategize, and execute highly complex Artificial Intelligence initiatives that span a line of business
  • Recommend business strategy and deliver Artificial Intelligence-enabled solutions to solve business challenges
  • Define and prioritize use cases, obtain the required resources, and ensure the solutions deliver the intended benefits
  • Leverage Artificial Intelligence expertise to evaluate technological readiness and resources required to execute the proposed solutions
  • Make decisions to drive the implementation of Artificial Intelligence initiatives and programs while serving multiple stakeholders
  • Resolve issues that may arise during development or implementation
  • Collaborate and consult with peers, colleagues and managers to resolve issues and achieve goals

Benefits

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement