Senior Manager, Site Reliability Engineering

Ping Identity • Denver, CO

About The Position

At Ping Identity, we believe in making digital experiences both secure and seamless for all users, without compromise. We call this digital freedom. And it's not just something we provide our customers. It's something that inspires our company. People don't come here to join a culture that's built on digital freedom. They come to cultivate it.

Our intelligent cloud identity platform lets people shop, work, bank, and interact wherever and however they want. Without friction. Without fear. While protecting digital identities is at the core of our technology, protecting individual identities is at the core of our culture. We champion every identity. One of our core values, Respect Individuality, reminds us to celebrate differences so you are empowered to bring your authentic self to work.

We're headquartered in Denver, Colorado, and we have offices and employees around the globe. We serve the largest, most demanding enterprises worldwide, including more than half of the Fortune 100. At Ping Identity, we're changing the way people and businesses think about cybersecurity, digital experiences, and identity and access management.

As a Ping Identity SRE, you will be involved in every facet of our On-Demand SaaS services and will build, deploy, and maintain the infrastructure of one of the largest identity platforms in the world. We follow a DevOps model: our teams are integrated with development teams and run continuous deployments daily, and SREs are expected to provide input on the product's design, development, deployment, and operations. Working within the Cloud Operations team, you'll manage a team that builds automated infrastructure and deployments. You'll be the expert on operational excellence and on how systems can be built to be redundant, scalable, and observable.

Requirements

6+ years of experience leading a software-focused SRE team of 8-10 staff.
Experience working in organizations with a global presence.
The ability to drive build-vs.-buy decisions.
Ability to develop, maintain, and administer modern infrastructure tooling with an emphasis on Infrastructure as Code (IaC).
Experience provisioning public cloud resources using IaC tools such as CloudFormation and Terraform.
Knowledge of scripting and programming languages and their standards (e.g., Python, Ruby, Bash, Go).
Experience with Docker and container orchestration (Kubernetes).
Experience using Git in a large team environment.
Experience with security design principles.
Experience in a high-volume or critical production service environment.
Knowledge of IP networking, including how networks function and operate and how they fail.

Nice To Haves

Familiarity with observability tooling such as New Relic, Splunk, Grafana, and CloudWatch.
Familiarity with DevOps automation tools such as Jenkins, Artifactory, and Spacelift.
Solid experience with server configuration management using Puppet, Chef, or Salt.

Responsibilities

Lead and mentor a team of 8-10 SREs.
Oversee and maintain our production infrastructure hosted on AWS with a 99.99%+ uptime SLA.
Collaborate with other SRE, Security, and Development teams.
Define processes for the team to efficiently meet target dates.
Drive large projects to completion working with multiple Development teams.
Perform capacity analysis and planning.
Effectively manage and scale infrastructure through automation standards.
Analyze complex system behavior, performance and application issues.
Oversee observability and analysis across multiple datacenters.