Site Reliability Engineer III - Eng

UKG, Alpharetta, GA

About The Position

Site Reliability Engineers (SREs) at UKG are experienced individual contributors who apply software engineering principles to operational challenges across the full service lifecycle. In this role, you will proactively monitor system health, manage risk through SLOs and error budgets, lead incident response, and enable safe, rapid change — balancing reliability with delivery velocity. SREs at UKG are passionate about learning and evolving with modern technologies. We strive to innovate and relentlessly improve the customer experience, with an "automate everything" mindset that enables services to be delivered with speed, consistency, and high availability.

Requirements

  • 5+ years of hands-on experience in software engineering, systems engineering, or cloud-based environments, with a demonstrated ability to work independently on complex, ambiguous problems.
  • 3+ years of experience working with public cloud platforms such as GCP (preferred), AWS, or Azure.
  • 3+ years of experience configuring, operating, and maintaining applications and/or systems infrastructure in a large-scale, customer-facing environment.
  • Demonstrated understanding of observability best practices, including metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing.
  • Experience coding in one or more higher-level programming languages (e.g., Python, Java, or C++).
  • Strong working knowledge of Linux systems, including troubleshooting, performance analysis, and scripting in production environments.
  • Experience with GitHub Actions and modern CI/CD practices.
  • Hands-on experience with containerization and container orchestration (Docker, Kubernetes) in production environments.
  • Experience building operational dashboards and alerts using observability tools such as Splunk or Grafana.
  • Ability to communicate technical risk and tradeoffs clearly to non-technical stakeholders.

Nice To Haves

  • Experience with distributed system design and architecture.
  • Experience with infrastructure-as-code and configuration management tools (e.g., Terraform, Ansible).
  • Solid grounding in at least two of the following areas: Computer Science fundamentals, Cloud Architecture, Security, or Network Design.

Responsibilities

  • Engage in and improve the lifecycle of services from conception to end-of-life, including system design reviews, capacity planning, and production readiness.
  • Contribute to standards and best practices for system architecture, service delivery, reliability, and automation, including the definition and monitoring of service health indicators (latency, traffic, error rates, and resource saturation), service level objectives (SLOs), and the use of error budgets to guide operational and delivery decisions.
  • Support service, product, and engineering teams by leveraging common tooling and frameworks to increase availability and improve incident detection and response.
  • Improve system performance, availability, and efficiency through automation, process refinement, post-incident reviews, and in-depth configuration analysis.
  • Collaborate closely with engineering teams across the organization to deliver and operate reliable services.
  • Increase operational efficiency, effectiveness, and service quality by treating operational challenges as software engineering problems (reducing toil).
  • Share knowledge and contribute to a culture of Site Reliability Engineering best practices within the team.
  • Mentor and guide junior engineers on SRE principles, reliability practices, and operational standards.
  • Actively participate in incident response, including on-call rotations and post-incident reviews, collaborating with engineering teams to restore service and reduce recurrence.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000 employees
