Observability and Automation Leader

Wells Fargo•Iselin, NJ

2d•$159,000 - $305,000•Hybrid

About The Position

Wells Fargo is seeking an Observability and Automation Leader to support and provide technology services for the Chief Operating Office Technology (COO Tech) organization. COO Tech powers the technology behind some of the company’s most critical enterprise functions, from operational resilience and strategic execution to data, customer experience, supply chain, and shared services. Our mission is to modernize and optimize the platforms that enable the business to operate smoothly, scale confidently, and stay future-ready.

Within COO Tech, the Platform & Application Services team is building the next generation of intelligent, resilient systems, and we’re looking for a Systems Operations Senior Manager (Observability and Automation Leader) to help lead that transformation. In this role, you’ll lead a globally distributed team of engineers and play a pivotal role in shifting the organization from reactive firefighting to proactive, predictive system operations. You’ll champion modern observability practices, introduce advanced tools and automation, and drive the cultural evolution needed to achieve deep system insights and operational excellence at scale. This is an opportunity for a hands-on, strategic leader who thrives at the intersection of technology, people, and transformation, and who wants to leave a lasting mark on how enterprise systems are built and run.

The shift to proactive, predictive system management directly addresses operational resilience and risk reduction. By standardizing telemetry (metrics, logs, traces) and correlating signals across platforms, the organization can identify issues earlier, prevent incidents, and shorten recovery times when failures occur. This supports COO priorities around business continuity, resiliency, and regulatory readiness.

Requirements

7+ years of Systems Engineering and Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
3+ years of management or leadership experience
4+ years of experience and expertise in observability and incident prevention methodologies, risk management frameworks, and operational excellence principles
4+ years of experience and expertise in cloud infrastructure, containerization platforms, application and platform monitoring tools, and security technologies
4+ years of Site Reliability Engineering experience

Nice To Haves

Experience and expertise in platform and application support within a financial services organization
Strong executive presence, facilitation skills, drive for results, attention to quality and detail, and a collaborative attitude
Ability to develop and deliver effective, well-articulated written and oral communications (e.g., emails, presentation materials), applying the appropriate level of content and detail for the audience and intent
Experience as a highly effective leader with credibility from demonstrating strong business and technology acumen
Experience motivating and influencing groups or individuals across organizational boundaries to gain trust and confidence to make timely decisions
Excellent communication, interpersonal, and presentation skills across technical and non-technical groups
Proven track record of successfully working in a high-pressure environment
Strategic thinking and problem-solving abilities with a focus on proactive solutions
Experience with observability and automation solutions

Responsibilities

Develop and implement a maturity model for observability, standardizing data collection and centralizing system telemetry to enhance root cause analysis and proactive system management.
Integrate observability tools with incident management and CI/CD pipelines, automating alert tuning and remediation using AI and machine learning for real-time anomaly detection.
Define, monitor, and align Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with business outcomes, promoting business observability and user experience insights.
Implement distributed tracing to optimize service interactions, especially within microservices architectures, and drive the adoption of observability as code (OaC) for consistency and automation.
Continuously optimize system performance and reliability with minimal manual intervention, leveraging automation and predictive management practices.
Manage and develop high-performing technical teams, foster a culture of talent development, and ensure effective communication with customers regarding incidents and system changes.
Engage and collaborate with stakeholders to engineer projects, identify and implement new solutions, and support key risk initiatives.
Oversee network assessments, security audits, and system enhancements, and ensure compliance with risk management policies and procedures.
Manage the allocation of people and financial resources for Systems Operations, and stay current on emerging technologies and best practices in observability, automation, and AIOps.