Site Reliability Engineering Manager

F5, Inc. • San Jose, CA

About The Position

At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation. Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.

Requirements

10+ years of experience in engineering, including team lead or management experience.
Extensive SRE background and experience.
Excellent knowledge of Kubernetes and CI/CD.
Deep understanding of DevOps and site reliability engineering principles, including DORA, SLOs, SLIs, error budgets, and incident management.
Strong experience with both private and public cloud computing platforms (e.g., AWS, Google Cloud, Azure).
Proven experience with infrastructure as code (IaC) tools like Terraform or Ansible, and configuration management systems.
Excellent knowledge of Distributed cloud, Kubernetes, GitOps, CI/CD, and networking.
Strong experience with observability platforms.
Apache Kafka: Expertise in event streaming architecture, topic design, producer/consumer configuration, and handling high-volume, low-latency data pipelines. Experience with Kafka Connect and Schema Registry is a plus.
Vector (Datadog/Timber.io/Logs): Proficiency in configuring Vector for observability pipelines, including log transformation, enrichment, and routing to multiple sinks (e.g., Elasticsearch, S3, ClickHouse).
Experience with the Cortex suite of observability tools, including Cortex, Loki, Tempo, and Prometheus integration, for scalable, multi-tenant monitoring systems.
Architectural experience in constructing and overseeing large-scale cloud-based projects.
Demonstrable proficiency in Golang/Python.
Excellent communication and presentation skills.
Technical confidence and familiarity with DevOps tools, techniques, and, more importantly, mindset.
A problem-solving attitude.

Responsibilities

Manage the local SRE team.
Define and manage project plans and deliverables for the SRE evolution across the portfolio.
Investigate and resolve technical issues.
Prioritize and scope requirements properly and quickly.
Track issues and report status using platforms such as GitLab.
Research relevant tools and technologies and provide readouts to teams.
Guide team members in the right direction (e.g., advising on which tools to use and how to use them).
Work with the relevant personnel to create, manage, and complete the defined project plans and deliverables.
Improve our Kubernetes application delivery to production.
Design procedures for system troubleshooting and maintenance.
Perform other related duties as assigned.

Benefits

You may also be offered incentive compensation, bonus, restricted stock units, and benefits.
More details about F5's benefits can be found at the following link: https://www.f5.com/company/careers/benefits.
F5 reserves the right to change or terminate any benefit plan without notice.
