Senior Site Reliability Engineer

Koch Industries • Atlanta, KS

About The Position

Koch Global Services is on a mission to transform how we deliver reliable, scalable, and observable services to Koch. We are building a Site Reliability Engineering (SRE) capability from the ground up by modernizing legacy monitoring tools and practices and by fostering a culture of reliability, accountability, and automation. This role is more than engineering: you will help drive that transformation and shape how the capability grows. If you are passionate about designing resilient systems, influencing strategic decisions, and mentoring the next generation of SREs, this is your opportunity to make a significant impact, and we want to hear from you!

Requirements

Expertise with modern observability platforms and standards (e.g., Prometheus, Grafana, OpenTelemetry)
Strong understanding of service reliability metrics, including SLIs, SLOs, and SLAs
Hands-on experience with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible)
Familiarity with the AWS ecosystem and cloud-native architectures
A passion for mentoring and developing engineers
Excellent communication skills, with the ability to engage technical and non-technical stakeholders
Experience leading incident response and driving post-incident analysis for continuous improvement

Nice To Haves

Hands-on experience with OpenTelemetry for distributed tracing and telemetry collection
Expertise in deploying and managing Grafana, Loki, Tempo, and Mimir
Experience migrating solutions from legacy tools (e.g., Splunk, LogicMonitor) to modern observability technologies
Experience with Kubernetes deployments and management
Knowledge of synthetic transaction monitoring to proactively detect reliability issues
Cross-domain expertise (e.g., networking, finance, leadership) that enhances your ability to drive impact
Experience with GitHub Enterprise for CI/CD and infrastructure automation
Multi-cloud experience (Azure, GCP, etc.)
Proven ability to drive organizational change and influence engineering culture

Responsibilities

Design and implement modern observability solutions to enhance service reliability and accountability
Define and measure service performance through SLIs, SLOs, and SLAs to drive intentional service reliability strategies
Partner with stakeholders to advocate for and drive reliability best practices, ensuring alignment with business objectives
Mentor and develop engineers, fostering a culture of continuous learning and growth

Benefits

Medical, dental, vision insurance
Flexible spending and health savings accounts
Life insurance, AD&D, disability insurance
Retirement plan
Paid vacation/time off
Educational assistance
Infertility assistance
Paid parental leave
Adoption assistance

What This Job Offers

Job Type

Full-time

Career Level

Senior

Industry

Plastics and Rubber Products Manufacturing