Manager, Incident Ops and Observability

F5 Networks, Seattle, WA
2d

About The Position

About F5

At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation. Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.

Position Summary

We are seeking a manager to help build our new Site Reliability Engineering team to strengthen operational excellence across the Infrastructure & Security and F5 Digital organization. This role will play an important part in Digital’s incident management strategy, building out the Reliability Operations Center and the monitoring capabilities and technologies that help Digital understand problems before our users do.

The ideal candidate will bring deep expertise in incident lifecycle management, from detection and triage to resolution and post-mortem, and will collaborate cross-functionally to drive continuous improvement in our security posture. This leader will operationalize a world-class incident management program while also defining and implementing the vision for observability across F5’s hybrid infrastructure and cloud environments.

This role requires strong leadership, technical acumen, and the ability to operate under pressure while maintaining clear communication with stakeholders at all levels.

Requirements

  • 5+ years of experience running NOC/SOC/SRE teams with a focus on monitoring and observability.
  • 10+ years managing incident response, IT service management, or a related field.
  • Proven track record of managing complex security incidents in cloud and hybrid environments.
  • Experience with SIEM, SOAR, and log analysis tools (e.g., Splunk, Datadog, Panther, CrowdStrike).
  • Experience with observability tools, especially tooling focused on synthetics, metrics, and infrastructure telemetry (e.g., Grafana, ThousandEyes, LogicMonitor, Pingdom, Zabbix).
  • Excellent communication skills with the ability to convey technical information to both technical and non-technical audiences.
  • Ability to lead under pressure, prioritize effectively, and make decisions in high-stakes situations.
  • Familiarity with AWS, Google Workspace, and common SaaS platforms.
  • Bachelor’s degree in Computer Science, Cybersecurity, Information Systems, or a related field (or equivalent experience).

Nice To Haves

  • Experience working in infrastructure, IT, or security organizations.
  • Familiarity with tools such as Tableau, Power BI, or other reporting/analytics platforms.
  • Comfortable navigating ambiguity, with a proactive approach to problem-solving.
  • Strong interest in scaling operations and driving impact in security-focused initiatives.

Responsibilities

  • Lead the global Incident Response (IR) program, optimizing processes across detection, triage, containment, remediation, and post-incident analysis.
  • Hire, mentor, and train global team members on incident response best practices and observability tooling.
  • Serve as technical lead and head engineer for creation and management of monitoring tools and services to support F5 infrastructure and business systems.
  • Serve as the primary incident commander during major incidents, ensuring timely resolution, excellent communication, and stakeholder alignment.
  • Define and continuously refine incident response policies, procedures, and runbooks to ensure consistent and effective handling of incidents.
  • Drive improvements in detection, escalation, and resolution through automation, tooling, and process enhancements.
  • Define and report KPIs for service reliability, incident response, and observability maturity to senior leadership.
  • Conduct root cause analyses and lead post-incident reviews to identify lessons learned and prevent recurrence.
  • Design and lead cross-functional tabletop exercises to strengthen organizational preparedness, communication, and response coordination during major incidents.
  • Maintain detailed incident records and metrics to support auditing, compliance, and continuous improvement.
  • Collaborate with ServiceNow teams and architects to manage incidents.
  • Establish and maintain on-call rotations with teams who own critical applications across the Digital organization.

Benefits

  • You may also be offered incentive compensation, bonus, restricted stock units, and benefits.
  • More details about F5’s benefits can be found at the following link: https://www.f5.com/company/careers/benefits
  • F5 reserves the right to change or terminate any benefit plan without notice.

What This Job Offers

Job Type

Full-time

Career Level

Manager

Number of Employees

5,001-10,000 employees