Site Reliability Engineer

Teraswitch
Pittsburgh, PA

About The Position

We’re looking for an experienced Site Reliability Engineer (SRE) to take ownership of our production systems’ availability, latency, performance, and capacity. In this role, you’ll apply your expertise in automation, monitoring, and resilient system design to maintain and improve our critical, large-scale infrastructure.

Requirements

  • 6+ years of relevant work experience in systems or infrastructure roles
  • Strong experience with Ansible
  • Experience with Prometheus, Grafana, and related monitoring tools
  • Solid understanding of networking and Linux-based systems
  • Hardware knowledge and experience managing physical or cloud-based fleets

Nice To Haves

  • Knowledge of Kubernetes
  • Experience with blockchain
  • Familiarity with the HashiCorp stack: Nomad, Consul, Vault
  • Experience with HAProxy or similar load balancing software
  • Programming experience in Go, Rust, or Python

Responsibilities

  • Respond to customer support requests and participate in our 24/7 support rotation
  • Maintain internal documentation and deployment playbooks
  • Modify and test server configurations, then deploy to production
  • Monitor infrastructure and respond to alerts
  • Automate tasks using tools like Ansible, Terraform, and Nomad
  • Contribute to internal tooling and platform improvements
  • Stay current with changes in the protocols and tooling we support
  • Other duties as assigned