Sr. Manager, SRE

iCIMS•Holmdel, NJ

119d•$150,000 - $200,000

About The Position

We are seeking an exceptional Senior Manager of Site Reliability Engineering (SRE) to lead our global SRE organization and drive operational excellence across our multi-cloud SaaS platform. This role is critical to our mission of delivering reliable, scalable, and performant solutions to thousands of customers worldwide. The successful candidate will lead distributed teams across the US, Ireland, and India while ensuring optimal customer outcomes through proactive issue prevention and rapid incident resolution. Success metrics include customer impact, reliability, team growth, proactive prevention, and cross-functional collaboration.

Requirements

15+ years in SRE, DevOps, or Infrastructure Engineering roles with 5+ years in senior positions
Proven track record of scaling global engineering teams across multiple time zones
Experience leading teams through high-stakes incident response and customer escalations
Strong organizational skills with the ability to influence cross-functional stakeholders
Deep expertise in multi-cloud environments (AWS primary, Azure secondary, GCP preferred)
Extensive experience with containerization, orchestration, and modern deployment practices
Strong background in database technologies
Proficiency with observability tools (New Relic, Grafana, Sumo Logic, or similar)
Experience with large-scale Java applications and legacy system modernization
Demonstrated success implementing SRE principles in large-scale production environments
Experience with ITIL and with incident management frameworks and tools
Background in establishing and maintaining SLAs for enterprise SaaS products

Nice To Haves

Background with authentication systems (Auth0, Okta, SAML, OAuth)
Experience with API management platforms and integration architectures
Previous exposure to CDN optimization and global content delivery
Relevant certifications in AWS, Azure, or SRE practices

Responsibilities

Lead and scale a global SRE organization spanning multiple time zones (US, Ireland, India)
Develop and execute SRE strategy aligned with business objectives and customer success metrics
Drive cultural transformation toward reliability-first engineering practices across the organization
Partner closely with Customer Success to ensure a customer-centric approach to all SRE initiatives
Establish and maintain SLAs, SLOs, and error budgets that balance reliability with feature velocity
Lead enterprise-wide incident management, ensuring rapid detection, response, and resolution
Serve as executive point of contact during critical incidents
Drive comprehensive root cause analysis (RCA) processes with actionable prevention strategies
Establish and maintain 24/7 on-call rotation and escalation procedures across global teams
Develop and execute disaster recovery and business continuity plans
Provide technical direction for complex, multi-cloud infrastructure spanning AWS, Azure, and GCP
Oversee reliability engineering for our entire product portfolio
Lead application performance monitoring initiatives
Drive modernization efforts and ensure optimal performance across geographically distributed data centers
Drive best practices in tuning SQL and NoSQL data platforms
Ensure high availability and performance of services including AWS, authentication, integration platforms, BI, API management, and legacy systems
Manage reliability for thousands of customers in North America and EU
Establish an observability standardization strategy
Drive automation initiatives to reduce manual operational overhead
Implement chaos engineering and reliability testing practices
Lead capacity planning and performance optimization efforts
Establish a metrics-driven culture with a focus on customer impact measurements

Benefits

Competitive health and wellness benefits including medical, dental, vision
401(k) with dependent care
Short-term and long-term disability insurance
Life and AD&D insurance
Bonding and parental leave
Mindfulness resources
Open vacation policy
Sick days
Paid holidays
Quiet hours each workday
Tuition reimbursement

What This Job Offers

Job Type

Full-time

Career Level

Senior

Industry

Professional, Scientific, and Technical Services
