Manager-Site Reliability

Innovaccer
Posted: August 30, 2023
Workplace Type: Onsite
Job Commitment: Full-time
Experience Level: Senior
Job Function: Dev & Engineering
Salary: N/A

This job is closed

We regret to inform you that the job you were interested in has now been closed. Although this specific position is no longer available, we encourage you to continue exploring other opportunities on our job board.

About the position

This role calls for an experienced engineer with a combination of Ops/Support and site reliability experience. You will lead a team of engineers focused on production engineering and work with other teams to drive initiatives and the adoption of best practices. Responsibilities include designing and architecting the domains of Production Engineering, leading production rollouts, establishing a solid observability stack, optimizing system utilization, and managing production and staging cloud platforms. The role also involves conducting operations reviews, ensuring platform security, and participating in incident response.

Responsibilities

  • Lead a team of engineers focused on production engineering
  • Design and architect various domains of Production Engineering
  • Collaborate with different teams and drive initiatives and best practices adoption
  • Drive change across Dev teams on architecture patterns based on production issues and behavior
  • Lead the production rollout of new releases and emergency patches using CI/CD pipelines
  • Establish a solid production promotion/change management process
  • Roll out an observability stack to proactively detect outages and service degradation
  • Apply analytical skills to understand production system metrics and optimize system utilization and cost efficiency
  • Automatically scale the platform up and down during peak-season scenarios
  • Be part of the 24x7 on-call Production Support team
  • Understand end-to-end platform architecture and perform triage/RCA using the observability toolchain
  • Work towards reducing the number of alerts/escalations to the next level team
  • Lead monthly operations review with the executive team
  • Operate and manage production and staging cloud platforms
  • Collaborate with various teams to drive RCA and product improvements
  • Ensure platform security as per guidelines established by CISO
  • Implement least-privilege RBAC for production services and the toolchain
  • Build and execute a Disaster Recovery plan
  • Participate in Incident Response as a key stakeholder

Requirements

  • Experience in Ops/Support and Site reliability
  • Strong sense of ownership and a "can-do" attitude
  • Experience with multi-cloud platforms (AWS, Azure, GCP)
  • Experience with Distributed Compute technologies (Kubernetes, Containerization)
  • Experience with Persistence stores (Postgres, MongoDB)
  • Experience with data warehousing (Snowflake, Databricks)
  • Experience with Messaging systems (Kafka)
  • Experience with CI/CD tools (Jenkins, Argo CD, GitOps)
  • Experience with Observability tools (Elasticsearch, Prometheus, Jaeger, New Relic)
  • Leadership skills to lead a team of engineers
  • Knowledge of SRE pillars - Deployment, Reliability, Scalability, Service Availability, Performance, Cost
  • Experience in production rollout of new releases and emergency patches using CI/CD pipelines
  • Strong analytical skills to optimize system utilization and drive cost efficiency
  • Ability to work in a 24x7 on-call Production Support team
  • Understanding of platform architecture and ability to perform triage/RCA
  • Ability to reduce the number of alerts/escalations to the next level team
  • Experience in operations review with executive team
  • Experience in operating and managing production and staging cloud platforms
  • Collaboration skills to work across teams and drive product improvements
  • Knowledge of security guidelines and implementation (DDoS attacks, RBAC, Disaster Recovery)
  • Ability to participate in Incident Response situations

Benefits

  • Collaboration with a spectrum of teams (Dev/DevOps/QA/Customer Success)
  • Opportunity to carry out RCA/5 Whys analysis and drive product improvements
  • Secure platform against DDoS attacks and implement required security measures
  • Lead least-privilege RBAC for various production services and the toolchain
  • Build and execute a Disaster Recovery plan
  • Key stakeholder in Incident Response
  • Opportunity to work with AWS, Azure, or GCP
  • Building reliable, scalable, and performant systems in production
  • Experience with the log/metrics/tracing toolchain
  • Hands-on experience with Kubernetes and Linux
  • Programming experience with scripting languages like Python
  • Documentation and structuring skills
  • Experience in a Production environment with process focus
  • Ticketing system and Incident management experience
  • Security background and security first approach mindset
  • Experience with CI/CD pipelines and toolchains
  • Hands-on experience with Kafka, Postgres, Snowflake, etc.
  • Opportunity to build high performing teams
  • Ability to perform under pressure without taking shortcuts
  • Strong verbal and written communication skills
  • Cross-functional collaboration skills
  • Strong problem-solving skills
  • Excellent time management and organizational skills
  • Sense of personal responsibility and accountability for delivering high quality work



Innovaccer

Supercharge your transformation with the Innovaccer Health Cloud
Location: San Francisco, CA
Company Size: 1,001-5,000
Industries: Artificial Intelligence, Health Care, Data and Analytics, Science and Engineering, Software, Health Tech
Open Roles: 13

Benefits
  • Industry-Focused Certifications
  • Rewards and Recognition
  • Health Insurance and Mental Well-being
  • Sabbatical Leave Policy
  • Open Floor Plan
  • Paternity and Maternity Leave
  • Paid Time Off