Senior Infrastructure Engineer - SRE/Platform Engineering

Wells Fargo•Irving, TX

4d•Hybrid

About The Position

Wells Fargo is seeking a Senior Infrastructure Engineer to join the API SRE & Platform Operations team within CTO Platform Services. This role is focused on driving infrastructure stability, automation, and reliability across critical API and platform systems that support high-impact financial transactions. This is a highly visible role responsible for owning production reliability, improving operational efficiency, and enabling scalable platform capabilities across API Management, CI/CD, and supporting platform environments.

Requirements

4+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
4+ years of experience using observability platforms such as BigPanda, ThousandEyes, Grafana, Prometheus, ELK, Splunk Observability, and AppDynamics to enhance service reliability and performance monitoring
3+ years of experience working with Red Hat Enterprise Linux and Kubernetes, with a strong focus on Red Hat OpenShift Container Platform (OCP)
3+ years of experience with Site Reliability Engineering and supporting production grade
3+ years of experience with automation & scripting

Nice To Haves

4+ years of experience in IT Service Management (ITSM), with a strong background in incident, problem, and change management processes
Experience with API management platforms such as Apigee or API gateways
Exposure to IBM DataPower or similar enterprise integration tools
Expertise in Ansible Tower, including developing and maintaining playbooks
Experience with cloud-native architectures, high-availability systems, and cloud and container technologies such as GCP or Azure, plus familiarity with Kubernetes
Strong experience working in Agile methodologies / Scrum environments
Experience improving system reliability, scalability, and operational efficiency
Experience in project management and stakeholder engagement
Proven experience in leading cross-functional teams
Strong problem-solving and decision-making abilities
Excellent communication and collaboration skills

Responsibilities

Lead daily support operations for Apigee OPDK and Apigee Hybrid to ensure platform uptime, stability, and performance
Troubleshoot runtime, policy, routing, and security issues on DataPower appliances
Develop specifications for complex infrastructure systems, and design and test solutions
Contribute to the testing of business, application and technical infrastructure requirements
Implement reliability improvements through Infrastructure-as-Code (IaC) using Terraform, Ansible, and GitOps
Develop automated recovery scripts and tools to reduce manual operational overhead
Review and analyze solutions for cloud security, secrets management and key rotations
Design, code, test, debug and document programs using Agile development practices
Plan and execute version upgrades, patching cycles, infrastructure migrations, and configuration refactoring
Improve proactive alerting to reduce mean time to detect (MTTD) and mean time to recover (MTTR)
Own and resolve P1/P2 high-severity incidents with quick response and deep technical troubleshooting
Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success
Participate in design discussions, architectural reviews, API governance activities, and platform modernization initiatives
Work with CAB (Change Advisory Board) for change planning, approvals, and execution tracking
Contribute to runbooks, SOPs, architectural diagrams, and platform knowledge base assets

Benefits

Relocation assistance is not available for this position
May be considered for a discretionary bonus, Restricted Share Rights, or other long-term incentive awards
