Systems Operations Manager – Data Platforms & Pipelines

Wells Fargo, Charlotte, NC
Hybrid

About The Position

Wells Fargo is seeking a leader to oversee enterprise Systems Operations supporting large‑scale data platforms and pipelines. We are looking for a mid‑level candidate who will be responsible for ensuring operational excellence, system reliability, and 24x7 global support for business‑critical data services. Visa sponsorship and visa transfers are not available for this position. This is a hybrid position that requires three days a week in the office.

Requirements

  • 5+ years of Systems Engineering and Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, or education
  • 2+ years of leadership experience
  • Experience with these key skills: Spark, OCP, DevOps CI/CD

Nice To Haves

  • Experience supporting enterprise ETL and data pipeline platforms
  • Proficient in building, maintaining, and enhancing Spark‑based data pipeline frameworks
  • Demonstrated leadership operating platforms, pipelines, and self‑service tooling in 24x7 global support models
  • Strong background in incident management, automation, and operational transformation
  • Strong knowledge of DevSecOps practices, including secure CI/CD pipelines, automated testing, and integrated security controls
  • Hands‑on understanding of Kubernetes operations, container orchestration concepts, and cloud‑native deployment patterns
  • Experience with Scala and big data technologies

Responsibilities

  • Lead and oversee day‑to‑day operations of enterprise systems and infrastructure supporting large‑scale data pipelines and applications
  • Ensure high availability, performance, and reliability of platforms, meeting established service levels for downstream applications, users, and reporting systems
  • Enforce Service Level Agreements (SLAs) and Operational Level Agreements (OLAs), including adherence to data protection and data loss prevention standards
  • Direct incident management, root cause analysis, and post‑incident remediation to continuously improve system stability and operational maturity
  • Make strategic decisions related to operating systems, application software, and systems management tools to meet business and technology objectives
  • Manage and mentor a team of mid‑ to senior‑level Systems Operations analysts and engineers
  • Foster a culture of ownership, accountability, and operational excellence across geographically distributed teams
  • Partner closely with India‑based support teams to deliver effective 24x7 global coverage
  • Oversee workforce planning, workload allocation, and performance management across a blended team of full‑time employees and contractors
  • Manage financial and people resources, including vendor and contractor oversight
  • Support talent development through coaching, succession planning, and hiring of high‑performing operational talent
  • Engage with business stakeholders, engineering, architecture, infrastructure, platform teams, and external vendors
  • Provide clear and transparent communication regarding system health, incidents, risks, and remediation plans
  • Serve as an escalation point for operational issues impacting business‑critical systems
  • Drive adoption of Site Reliability Engineering (SRE) principles to improve reliability, scalability, and risk management
  • Lead automation‑first initiatives to reduce manual effort, improve consistency, and accelerate incident response
  • Advance observability through effective monitoring, logging, metrics, and alerting to enable faster detection and root cause analysis
  • Support operational readiness for modernization, migration, and cloud transformation initiatives planned for 2026 and beyond
  • Develop, interpret, and enforce operational policies, procedures, and standards
  • Ensure compliance with enterprise security, regulatory, resilience, and risk management requirements
  • Maintain comprehensive runbooks, operational documentation, and knowledge repositories to support global continuity and consistency

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000 employees
