Site Reliability Engineer

Lean TECHniques • Johnston, IA

Remote

About The Position

Maybe you’re bored and need a new challenge. Or you’re sick of all the bureaucracy and just want to focus on building resilient, reliable systems. Or maybe you’re looking for a job where you can do really great work from anywhere. Whatever the reason, we want you to know that LT is different. And not just air quotes “different,” but more like “breathing easy for the first time in a long time” different. It’s a place where you can write your own story and make a difference along the way.

At LT, you’ll have the freedom and flexibility to do what you think needs to be done, and you’ll get to do it while working alongside a team of other curious individuals who love a good challenge too.

We’re currently looking to add a Site Reliability Engineer to this nerd club of ours. If you’re someone who enjoys solving complex infrastructure problems, automating everything you possibly can, and making systems more resilient, we’d love to chat.

Requirements

Experience working as a Site Reliability Engineer, DevOps Engineer, or similar role
Hands-on experience with AWS cloud infrastructure
Experience managing infrastructure using Terraform
Experience building or maintaining CI/CD pipelines with GitHub Actions
Experience running and operating applications in Kubernetes
Strong experience working in UNIX/Linux environments
Ability to write automation scripts using Bash
A mindset focused on automation, reliability, and scalability
Experience troubleshooting production systems and improving system reliability
A collaborative attitude and strong communication skills

Responsibilities

Join forces with a small, collaborative team to design, build, and maintain highly reliable infrastructure and systems that power our applications
Work closely with engineers to improve system performance, reliability, and scalability while helping automate infrastructure and deployment processes
Help build and maintain modern cloud infrastructure using AWS, leveraging tools like Terraform to ensure environments are reproducible and manageable
Design and maintain CI/CD pipelines using GitHub Actions, enabling teams to ship faster and safer
Deploy and manage containerized applications in Kubernetes, helping improve reliability, observability, and operational efficiency
Write scripts and automation using Bash and work comfortably in UNIX-based environments to streamline operations and reduce manual toil
Participate in incident response and troubleshooting when things break (because sometimes they will), and help implement long-term improvements to prevent them from happening again
Help improve monitoring, alerting, and system visibility so we can detect and resolve issues quickly
Continuously look for ways to automate processes, reduce operational overhead, and make systems more resilient
