Site Reliability Engineer

Lean TECHniques · Johnston, IA
Remote

About The Position

Maybe you’re bored and need a new challenge. Or you’re sick of all the bureaucracy and just want to focus on building resilient, reliable systems. Or maybe you’re looking for a job where you can do really great work from anywhere. Whatever the reason, we want you to know that LT is different. And not just air quotes “different,” but more like “breathing easy for the first time in a long time” different.

It’s a place where you can write your own story and make a difference along the way. At LT, you’ll have the freedom and flexibility to do what you think needs to be done, and you’ll get to do it while working alongside a team of other curious individuals who love a good challenge too.

We’re currently looking to add a Site Reliability Engineer to this nerd club of ours. If you’re someone who enjoys solving complex infrastructure problems, automating everything you possibly can, and making systems more resilient, we’d love to chat.

Here’s what you can expect as a Site Reliability Engineer at LT:

  • You’ll join forces with a small, collaborative team to design, build, and maintain highly reliable infrastructure and systems that power our applications.
  • You’ll work closely with engineers to improve system performance, reliability, and scalability while helping automate infrastructure and deployment processes.
  • You’ll help build and maintain modern cloud infrastructure using AWS, leveraging tools like Terraform to ensure environments are reproducible and manageable.
  • You’ll design and maintain CI/CD pipelines using GitHub Actions, enabling teams to ship faster and safer.
  • You’ll deploy and manage containerized applications in Kubernetes, helping improve reliability, observability, and operational efficiency.
  • You’ll write scripts and automation using Bash and work comfortably in UNIX-based environments to streamline operations and reduce manual toil.
  • You’ll participate in incident response and troubleshooting when things break (because sometimes they will), and help implement long-term improvements to prevent them from happening again.
  • You’ll help improve monitoring, alerting, and system visibility so we can detect and resolve issues quickly.
  • You’ll continuously look for ways to automate processes, reduce operational overhead, and make systems more resilient.

Requirements

  • Experience working as a Site Reliability Engineer, DevOps Engineer, or similar role
  • Hands-on experience with AWS cloud infrastructure
  • Experience managing infrastructure using Terraform
  • Experience building or maintaining CI/CD pipelines with GitHub Actions
  • Experience running and operating applications in Kubernetes
  • Strong experience working in UNIX/Linux environments
  • Ability to write automation scripts using Bash
  • A mindset focused on automation, reliability, and scalability
  • Experience troubleshooting production systems and improving system reliability
  • A collaborative attitude and strong communication skills
