Systems Operations Manager – Data Platforms & Pipelines

Wells Fargo • Charlotte, NC

Hybrid

About The Position

Wells Fargo is seeking a leader to oversee enterprise Systems Operations supporting large‑scale data platforms and pipelines. We are looking for a mid‑level candidate who will be responsible for ensuring operational excellence, system reliability, and 24x7 global support for business‑critical data services. Visa sponsorship and visa transfers are not available for this position. This is a hybrid position that requires three days a week in the office.

Requirements

5+ years of Systems Engineering and Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, or education
2+ years of leadership experience
Experience with these key skills: Spark, OCP, DevOps CI/CD

Nice To Haves

Experience supporting enterprise ETL and data pipeline platforms
Proficiency in building, maintaining, and enhancing Spark‑based data pipeline frameworks
Demonstrated leadership in operating platforms, pipelines, and self‑service tooling in 24x7 global support models
Strong background in incident management, automation, and operational transformation
Strong knowledge of DevSecOps practices, including secure CI/CD pipelines, automated testing, and integrated security controls
Hands‑on understanding of Kubernetes operations, container orchestration concepts, and cloud‑native deployment patterns
Experience with Scala and big data technologies

Responsibilities

Lead and oversee day‑to‑day operations of enterprise systems and infrastructure supporting large‑scale data pipelines and applications
Ensure high availability, performance, and reliability of platforms, meeting established service levels for downstream applications, users, and reporting systems
Enforce Service Level Agreements (SLAs) and Operational Level Agreements (OLAs), including adherence to data protection and data loss prevention standards
Direct incident management, root cause analysis, and post‑incident remediation to continuously improve system stability and operational maturity
Make strategic decisions related to operating systems, application software, and systems management tools to meet business and technology objectives
Manage and mentor a team of mid‑ to senior‑level Systems Operations analysts and engineers
Foster a culture of ownership, accountability, and operational excellence across geographically distributed teams
Partner closely with India‑based support teams to deliver effective 24x7 global coverage
Oversee workforce planning, workload allocation, and performance management across a blended team of full‑time employees and contractors
Manage financial and people resources, including vendor and contractor oversight
Support talent development through coaching, succession planning, and hiring of high‑performing operational talent
Engage with business stakeholders, engineering, architecture, infrastructure, platform teams, and external vendors
Provide clear and transparent communication regarding system health, incidents, risks, and remediation plans
Serve as an escalation point for operational issues impacting business‑critical systems
Drive adoption of Site Reliability Engineering (SRE) principles to improve reliability, scalability, and risk management
Lead automation‑first initiatives to reduce manual effort, improve consistency, and accelerate incident response
Advance observability through effective monitoring, logging, metrics, and alerting to enable faster detection and root cause analysis
Support operational readiness for modernization, migration, and cloud transformation initiatives planned for 2026 and beyond
Develop, interpret, and enforce operational policies, procedures, and standards
Ensure compliance with enterprise security, regulatory, resilience, and risk management requirements
Maintain comprehensive runbooks, operational documentation, and knowledge repositories to support global continuity and consistency