Senior Manager, Systems and Site Reliability Engineering

Cambria • Belle Plaine, MN

$118,275 - $156,169 • Onsite

About The Position

The Senior Manager, Systems and Site Reliability Engineering serves as the operational engine for Cambria’s infrastructure and acts as the primary bridge between IT strategy and technical execution. This role is responsible for the performance and development of Systems Engineers and Site Reliability Engineers. While driving the modernization to a container-first model, this leader ensures that services are available and perform optimally as manufacturing operations scale. As the #2 leader in IT Operations, you will translate strategic roadmaps into daily deliverables, manage high-pressure incident response, and serve as the technical face of the team to internal business partners.

Requirements

Extensive experience with container orchestration (Kubernetes, Nutanix EKS, AWS EKS) and modern CI/CD practices
Experience implementing and maintaining virtualization platforms such as VMware or Nutanix AHV
Solid foundation in Linux administration and troubleshooting
Experience writing and maintaining infrastructure as code
Experience working with Agile delivery methodologies such as Scrum and Kanban.
Familiarity and experience with ALM toolsets (Cambria uses Jira) and collaboration software (such as Slack, G-Suite, and Confluence).
Strong leadership and management skills.
Excellent communication and interpersonal skills.
Strong business acumen and the ability to articulate complex ideas clearly to all levels of leadership
The ability to pivot and drive change in an ever-changing environment
Excellent time management and organizational skills
Proven track record of delivering results
Strong motivational, influential, and organizational leadership skills to guide the team in accomplishing its goals
Strong analytical and problem-solving skills.
Ability to thrive in a fast-paced, dynamic environment.
Bachelor’s degree in Computer Science, Electrical Engineering, or related technical degree or equivalent experience.
10+ years of experience in a Systems Administrator or Systems Engineering position
Experience with some or all of the following software/tools, or close equivalents: VMware, Nutanix NX appliances, Nutanix AHV, Nutanix EKS, AWS EKS, Ansible, backup software (Cohesity), Pure Storage, Windows Server, Linux (Red Hat Enterprise Linux), Active Directory, Okta, AWS, Terraform, Git, Red Hat Satellite, and Red Hat Identity Management

Nice To Haves

A motivated self-learner who pushes technology solutions forward and anticipates problems and challenges

Responsibilities

Manage the day-to-day deliverables of the team, ensuring tasks align with the workstreams translated from IT leadership strategy.
When necessary, assist in service delivery and participate in the on-call rotation to ensure critical services are available.
Hold daily standups with the team, and plan and prioritize work across the team to maintain high velocity.
Partner with senior leadership to ensure technical work is aligned with business priorities and participate in quarterly project planning.
Lead the performance and development of systems and SRE staff, defining skill competency models and gap closure plans.
Act as the primary point of contact for internal customers, clients, and users to understand needs and provide transparent technical updates.
Lead the team in resolving critical issues, including formal escalation management and standing up "war rooms" for P1/P2 incidents.
Participate in root cause analysis after incidents to ensure permanent resolution and communicate findings to affected business stakeholders.
Build, operate, and maintain scalable, highly available, and resilient infrastructure.
Develop a vision and strategy to automate common SRE requests.
Analyze metrics to create actionable monitors or alerts to ensure critical services are available and performing well.
Lead the adoption, implementation, and lifecycle of the container platform and associated CI/CD pipelines.
Plan, implement, and manage the lifecycle of Cambria’s compute, storage, virtualization, and server operating systems.
Work with product teams to ensure SLO targets are met or exceeded and participate in the on-call rotation.