Director - Site Reliability Engineering

Ultimate Kronos Group, Lowell, MA

About The Position

UKG is seeking an innovative cloud leader to serve as the Director of Site Reliability Engineering (SRE). This role requires extensive experience in large-scale cloud environments and a proven track record of delivering world-class solutions. You will lead a global cloud team dedicated to establishing an SRE culture, aligning enterprise observability capabilities, and evaluating cloud-native technologies to enhance operational efficiency. Your leadership will be pivotal in enabling the business to deliver value rapidly while ensuring a consistent approach to delivery and security in a cloud environment.

Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, Software Engineering, or a related field.
  • 10+ years of development/engineering experience, with significant experience in cloud environments.
  • Proven ability to manage resources across a global team and to influence stakeholders in a dynamic, matrixed environment.
  • Strong understanding of public cloud technologies, including experience with Linux-based infrastructure, Windows-based infrastructure, and Google Cloud Platform.
  • Familiarity with scripting languages and automation tooling such as Bash, Python, Ansible, Terraform, and GitHub Actions.
  • Experience with automation frameworks and deployment orchestration capabilities.
  • Exceptional communication skills, with the ability to articulate protocols and processes at all organizational levels.
  • Strong analytical mindset, capable of making data-driven decisions.
  • Professional experience with observability platforms such as Datadog, Grafana Cloud, and Splunk.
  • Familiarity with Agile delivery methodologies, including Scrum and Kanban.

Responsibilities

  • Lead product-focused SRE teams, executing the vision for a unified enterprise Observability Platform used by global Engineering, Product, and Cloud organizations.
  • Drive organizational change towards a common methodology centered on Site Reliability Engineering and robust System Engineering practices.
  • Foster a high-performing SRE team by actively mentoring existing team members and strategically hiring top talent to drive organizational transformation and operational excellence.
  • Analyze current technologies and processes, developing strategies for improvement and expansion.
  • Collaborate closely with engineers to streamline cloud delivery and eliminate manual interventions, focusing on velocity and efficiency.
  • Partner with engineering leadership to develop and implement improvement plans, monitor progress, and ensure transparency by creating and maintaining a roadmap for engineering and operational enhancements.
  • Provide programmatic building blocks that empower product engineering teams to deliver business value swiftly.
  • Mentor and train engineers across the organization, fostering a culture of continuous improvement in cloud delivery practices.
  • Coach the organization on SRE principles, including automation, visibility enhancements, toil reduction, self-healing, and root cause analysis.
  • Understand business priorities and identify technical initiatives that align with desired outcomes.
  • Advocate for a Continuous Delivery culture, ensuring that product engineering teams have the necessary tools and technologies to deliver a secure and exceptional customer experience.