Lead Machine Learning Engineer

Wells Fargo Bank•Concord, CA

$119,000 - $224,000•Hybrid

About The Position

About this role: Wells Fargo is seeking a highly skilled and experienced Lead Machine Learning Engineer to join our team. The ideal candidate will be responsible for designing, implementing, and maintaining AI platforms both on-premises and in Google Cloud Platform (GCP). This role requires an understanding of AI/ML technologies, cloud infrastructure, and on-premises systems. This Lead ML Engineer will work closely with cross-functional teams to ensure the seamless integration and operation of AI solutions, drive strategic initiatives, and own the end-to-end delivery of scalable, production-grade AI platforms.

Requirements

5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
5+ years of experience in platform engineering, with a focus on AI/ML technologies
3+ years of experience with AI/ML frameworks and tools (e.g., Spark, PyTorch, Kubernetes, Docker)

Nice To Haves

Proficiency in cloud platforms, particularly GCP
Knowledge of on-premises Kubernetes platforms (preferably RH OpenShift)
Expertise in scripting and automation (e.g., Python, Bash, Terraform)
Familiarity with CI/CD pipelines and DevOps practices
Excellent problem-solving and analytical skills
Strong communication and collaboration abilities
Ability to work independently and as part of a team
GCP Professional Cloud Architect or similar certifications

Responsibilities

Lead complex technology initiatives including those that are companywide with broad impact
Act as a key participant in developing standards and companywide best practices for engineering complex and large-scale technology solutions for technology engineering disciplines
Design, code, test, debug, and document solutions for projects and programs
Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
Make decisions in developing standards and companywide best practices for engineering and technology solutions that require an understanding of industry best practices and new technologies, influencing and leading the technology team to meet deliverables and drive new initiatives
Collaborate and consult with key technical experts, the senior technology team, and external industry groups to resolve complex technical issues and achieve goals
Lead projects, teams, or serve as a peer mentor
Architect and deploy AI platforms on-premises and in GCP, ensuring scalability, reliability, and performance
Provision and design optimized GCP cloud infrastructure using tools such as Terraform to support AI workloads, including compute, storage, and networking resources
Work with data scientists, MLOps engineers, and other stakeholders to understand requirements and deliver robust and scalable AI solutions
Ensure that AI platforms adhere to security best practices and compliance requirements
Monitor and optimize the performance of AI platforms, identifying and resolving bottlenecks
Create and maintain comprehensive documentation for AI platform architecture, along with runbook content on net-new technology for the Platform Support team