Technical Program Manager

NetApp, Inc.

About The Position

NetApp Active IQ is a digital advisor that uses AIOps to simplify and automate the proactive care and optimization of customers’ infrastructure environments, improving their health and availability. Active IQ provides actionable intelligence for optimal storage health and simplified management. As a Senior SRE Technical Program Manager for Active IQ, you will lead cross-functional initiatives that drive reliability, scalability, and performance across our infrastructure and applications. You will collaborate closely with engineering, product, and SRE teams to launch, optimize, secure, and scale both existing and new services. The role balances operational excellence, project management, a deep understanding of reliability engineering, and stakeholder communication. You’ll champion best practices, foster a culture of continuous improvement, and ensure the successful execution of complex, multi-team programs.

Requirements

  • Extensive knowledge of cloud architecture, containerization (Kubernetes, Docker), distributed systems, microservices architecture, reliability engineering concepts, and automation tools.
  • Program/Project Management: Strong background in Agile, Scrum, Kanban, and program management methodologies. Experience managing complex, multi-stakeholder technical projects.
  • Incident Management: Hands-on experience with incident lifecycle, Root Cause Analysis (RCA), and blameless postmortems.
  • Monitoring & Observability: Familiarity with tools such as Dynatrace, Prometheus, Grafana, Splunk, or similar.
  • Communication: Excellent written and verbal communication, especially across technical and executive audiences.
  • Stakeholder Management: Proven ability to influence and collaborate with technical and non-technical stakeholders.
  • Problem Solving: Data-driven decision making with a bias for action and continuous improvement.
  • Mentorship: Experience mentoring junior TPMs, SREs, or engineers.
  • Education: B.S. in Computer Science or Engineering; M.S. in Computer Science is a plus.
  • Experience: 20+ years of experience in SRE, operations, and project management.
  • Certifications: PMP, CSM, SAFe, and/or SRE certifications.
  • Experience in large-scale migrations, cloud adoption programs, or high-availability web services.

Nice To Haves

  • Generative AI knowledge is a plus.

Responsibilities

  • Drive the end-to-end delivery of large-scale reliability and infrastructure projects, ensuring programs are on schedule, within scope, and aligned with business priorities.
  • Lead cross-functional teams to define and implement SRE initiatives, such as incident management, SLAs, SLOs, and error budgets.
  • Collaborate with engineering and product teams to design and introduce reliability and security improvements for existing services.
  • Champion the adoption of automation, monitoring, and resilience best practices.
  • Develop and maintain project plans, dashboards, and progress reports, ensuring transparency for stakeholders.
  • Lead post-incident reviews and identify improvements in processes and technology.
  • Coordinate and facilitate production readiness reviews, capacity planning, and disaster recovery exercises.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Number of Employees: 5,001-10,000
