Director - Site Reliability Engineering

Ultimate Kronos Group • Lowell, MA

About The Position

UKG is seeking an innovative cloud leader to serve as the Director of Site Reliability Engineering (SRE). This role requires extensive experience in large-scale cloud environments and a proven track record of delivering world-class solutions. You will lead a global cloud team dedicated to establishing an SRE culture, aligning enterprise observability capabilities, and evaluating cloud-native technologies to enhance operational efficiency. Your leadership will be pivotal in enabling the business to deliver value rapidly while ensuring a consistent approach to delivery and security in a cloud environment.

Requirements

Bachelor's or Master's degree in Computer Science, Engineering, Software Engineering, or a related field.
10+ years of development/engineering experience, with significant experience in cloud environments.
Proven ability to manage resources across a global team and to influence stakeholders in a dynamic, matrixed environment.
Strong understanding of public cloud technologies, including experience with Linux-based and Windows-based infrastructure and Google Cloud Platform.
Familiarity with scripting languages and automation tools such as Bash, Python, Ansible, Terraform, and GitHub Actions.
Experience with automation frameworks and deployment orchestration capabilities.
Exceptional communication skills, with the ability to articulate protocols and processes at various organizational levels.
Strong analytical mindset, capable of making data-driven decisions.
Professional experience with observability platforms such as Datadog, Grafana Cloud, and Splunk.
Familiarity with Agile delivery methodologies, including Scrum and Kanban.

Responsibilities

Lead product-focused SRE teams, executing the vision for a unified enterprise Observability Platform utilized by global Engineering, Product, and Cloud organizations.
Drive organizational change towards a common methodology centered on Site Reliability Engineering and robust System Engineering practices.
Foster a high-performing SRE team by actively mentoring existing team members and strategically hiring top talent to drive organizational transformation and operational excellence.
Analyze current technologies and processes, developing strategies for improvement and expansion.
Collaborate closely with engineering professionals to streamline cloud delivery and eliminate manual interventions, focusing on velocity and efficiency.
Partner with engineering leadership to develop and implement improvement plans, monitor progress, and ensure transparency by creating and maintaining a roadmap for engineering and operational enhancements.
Provide programmatic building blocks that empower product engineering teams to deliver business value swiftly.
Mentor and train engineers across the organization, fostering a culture of continuous improvement in cloud delivery practices.
Coach the organization on SRE principles, including automation, visibility enhancements, toil reduction, self-healing, and root cause analysis.
Understand business priorities and identify technical initiatives that align with desired outcomes.
Advocate for a Continuous Delivery culture, ensuring that product engineering teams have the necessary tools and technologies to deliver a secure and exceptional customer experience.