Lead Infrastructure Engineer

Wells Fargo Bank•Chandler, AZ

1d•Onsite

About The Position

Wells Fargo is seeking a Lead Infrastructure Engineer to support Mainframe Level 2 (L2) Operations within our Platform Front Line organization. This role is accountable for ensuring rapid, consistent recovery from production incidents and acting as a guardian of production stability, enforcing rigorous analysis and risk evaluation of all infrastructure changes prior to implementation.

Requirements

5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
3+ years of experience supporting enterprise mainframe environments (e.g., z/OS, JES2, CICS, DB2, IMS or similar platforms)
3+ years of experience in incident response, recovery management, and problem management disciplines
3+ years of demonstrated ability to restore services quickly and consistently during high-severity incidents
3+ years of experience evaluating and enforcing change quality, risk controls, and operational readiness criteria
3+ years of experience troubleshooting complex infrastructure issues in high-availability environments

Nice To Haves

Experience in Mainframe L2 Operations or Production Support leadership roles
Knowledge of change governance and pre-implementation risk assessment practices
Experience with automation or scripting to improve recovery time and reduce manual effort (REXX, Python or equivalent)
Familiarity with enterprise monitoring, alerting, and incident management platforms
Experience supporting large-scale, mission-critical financial services environments
Strong ability to drive root cause analysis and sustained remediation of systemic issues
Excellent communication skills, with the ability to lead during high-pressure incident situations

Responsibilities

Lead complex initiatives to develop infrastructure to provide solutions for business applications
Participate in various projects intended to continually improve or upgrade the infrastructure
Evaluate internal and external software solutions which could be leveraged to meet target state architecture goals
Review and analyze high impact outages to ensure the proper processes and procedures are in place to avoid problems in the future
Design, build, deploy and maintain infrastructure solutions through collaborative efforts with the team and third party vendors
Design, code, test, debug and document programs using Agile development practices
Make decisions on technical designs and implementation plans, and identify project risks and resource requirements
Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success
Recommend courses of action to maintain cost effectiveness and achieve results
Collaborate and consult with peers, colleagues and managers to resolve issues and achieve goals
Interact with customers and vendors
Participate in a 24x7 operational support model, including on-call rotation
Lead and support major incident response calls, ensuring timely resolution and clear communication
Enforce change discipline, validating readiness and risk mitigation before production deployment
Ensure compliance with enterprise operational, risk, and control policies
Maintain documentation and evidence required for audit and control validation
Partner across global teams (U.S. and India) to ensure consistent operational execution and coverage