Site Reliability Engineer

Jobgether
$118,000 - $158,000

About The Position

The Site Reliability Engineer will play a critical role in maintaining and scaling complex systems, ensuring the reliability, performance, and availability of infrastructure across cloud and on-premises environments. This role blends deep technical expertise in Linux systems, virtualization, container orchestration, Kubernetes, and CI/CD pipelines with proactive monitoring and operational excellence. You will collaborate closely with development and platform teams to implement best practices, automate workflows, and manage high-throughput services in large-scale datacenters. The position offers the opportunity to influence architecture, improve system resilience, and participate in incident response and root cause analysis. Ideal candidates thrive in fast-paced, distributed teams, are comfortable with both strategic planning and hands-on implementation, and are passionate about building robust and scalable systems.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field; advanced degree preferred
  • 5+ years of experience in site reliability engineering or similar roles, with a focus on production systems, containers, microservices, and service delivery
  • Strong expertise in Linux systems, virtualization, and large-scale datacenter operations
  • Hands-on experience with CI/CD pipelines, GitOps workflows, ArgoCD, Helm, and Kustomize
  • Proficiency with observability tools such as Prometheus, ELK Stack, Grafana, and log collection frameworks
  • Familiarity with networking concepts and protocols within Linux environments
  • Excellent troubleshooting, problem-solving, and cross-functional collaboration skills

Nice To Haves

  • Experience with Kubernetes and container orchestration is highly desirable

Responsibilities

  • Monitor, troubleshoot, and optimize system performance, reliability, and availability across bare metal, virtualized, and cloud environments
  • Design, implement, and maintain scalable infrastructure using containers, Kubernetes, and microservices architectures
  • Manage CI/CD pipelines and GitOps workflows, including ArgoCD, Helm charts, and Kustomize configurations for automated application deployment
  • Oversee configuration management using tools like Ansible to ensure consistent and reliable software releases across datacenter infrastructure
  • Design and operate high-throughput Kafka clusters for event streaming, including replication, consumer lag monitoring, and disaster recovery strategies
  • Collaborate with development teams to guide system design, operational policies, and performance optimization
  • Create and maintain technical documentation, runbooks, architectural diagrams, and network topology maps for operational excellence

Benefits

  • Competitive base salary range: $118,000 – $158,000 USD
  • Comprehensive medical, dental, and vision coverage, including HSA funding support
  • Employer-paid income protection (life, AD&D, short- and long-term disability)
  • 401(k) plan with employer match and Roth options, Employee Stock Purchase Plan (ESPP)
  • Paid time off, sick leave, and corporate holidays
  • Employee assistance programs and life balance benefits including travel assistance and identity theft protection
  • Additional perks: discount programs, credit union membership, Medicare assistance, and more