About The Position

Wells Fargo is seeking a Senior Infrastructure Engineer to join the API SRE & Platform Operations team within CTO Platform Services. This role is focused on driving infrastructure stability, automation, and reliability across critical API and platform systems that support high-impact financial transactions. This is a highly visible role responsible for owning production reliability, improving operational efficiency, and enabling scalable platform capabilities across API Management, CI/CD, and supporting platform environments.

Requirements

  • 4+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 4+ years of proficiency leveraging observability platforms such as BigPanda, ThousandEyes, Grafana, Prometheus, ELK, Splunk Observability, and AppDynamics to enhance service reliability and performance monitoring
  • 3+ years of experience working with Red Hat Enterprise Linux and Kubernetes, with a strong focus on Red Hat OpenShift Container Platform (OCP)
  • 3+ years of experience with Site Reliability Engineering and supporting production-grade systems
  • 3+ years of experience with automation & scripting

Nice To Haves

  • 4+ years of experience in IT Service Management (ITSM), with a strong background in incident, problem, and change management processes
  • Experience with API management platforms such as Apigee, or with other API gateways
  • Exposure to IBM DataPower or similar enterprise integration tools
  • Expertise in Ansible Tower, including developing and maintaining playbooks
  • Experience with cloud-native architectures, high-availability systems, and cloud and container technologies such as GCP or Azure, along with familiarity with Kubernetes
  • Strong experience working in Agile/Scrum environments
  • Experience improving system reliability, scalability, and operational efficiency
  • Experience in project management and stakeholder engagement
  • Proven experience in leading cross-functional teams
  • Strong problem-solving and decision-making abilities
  • Excellent communication and collaboration skills

Responsibilities

  • Lead daily support operations for Apigee OPDK and Apigee Hybrid to ensure platform uptime, stability, and performance
  • Troubleshoot runtime, policy, routing, and security issues on DataPower appliances
  • Develop specifications for complex infrastructure systems, and design and test solutions
  • Contribute to the testing of business, application and technical infrastructure requirements
  • Implement reliability improvements through Infrastructure-as-Code (IaC) using Terraform, Ansible, and GitOps
  • Develop automated recovery scripts and tools to reduce manual operational overhead
  • Review and analyze solutions for cloud security, secrets management, and key rotation
  • Design, code, test, debug and document programs using Agile development practices
  • Plan and execute version upgrades, patching cycles, infrastructure migrations, and configuration refactoring
  • Improve proactive alerting to reduce mean time to detect (MTTD) and mean time to recover (MTTR)
  • Own and resolve high-severity (P1/P2) incidents through rapid response and deep technical troubleshooting
  • Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success
  • Participate in design discussions, architectural reviews, API governance activities, and platform modernization initiatives
  • Work with the Change Advisory Board (CAB) on change planning, approvals, and execution tracking
  • Contribute to runbooks, SOPs, architectural diagrams, and platform knowledge base assets

Benefits

  • Relocation assistance is not available for this position
  • May be considered for a discretionary bonus, Restricted Share Rights, or other long-term incentive awards
© 2026 Teal Labs, Inc