Senior Site Reliability Engineer

Bolt On Technology | Tampa, FL
$110,000 | Hybrid

About The Position

The Senior Site Reliability Engineer is responsible for ensuring the reliability, scalability, performance, and security of our production systems. This role blends software engineering and systems engineering to build resilient infrastructure, improve automation, and proactively reduce operational risk. The Senior SRE will serve as a technical leader, driving best practices across observability, incident response, and platform stability.

Requirements

  • 6+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related roles
  • Strong experience with cloud platforms (AWS, Azure, or GCP)
  • Proficiency with infrastructure as code (Terraform, CloudFormation, Pulumi, etc.)
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Strong Linux systems administration and networking fundamentals
  • Experience building and maintaining CI/CD pipelines
  • Hands-on experience with monitoring and observability tools (Datadog, Prometheus, Grafana, New Relic, etc.)
  • Strong troubleshooting and incident management skills
  • Experience with scripting and automation (Python, Bash, Go, or similar)

Nice To Haves

  • Experience designing multi-region or highly distributed systems
  • Experience with security best practices and compliance in production environments
  • Experience supporting high-availability SaaS platforms
  • Experience in a fast-growing or PE-backed environment
  • Experience influencing reliability culture across engineering teams

Responsibilities

  • Design, build, and maintain highly available, scalable, and fault-tolerant systems
  • Lead reliability improvements across production and non-production environments
  • Own and improve monitoring, alerting, and observability platforms
  • Drive incident response, root cause analysis, and post-incident reviews
  • Implement automation to reduce manual operational work
  • Partner with Engineering, Security, and Product to support platform needs
  • Establish and track SLIs, SLOs, and error budgets
  • Lead capacity planning and performance tuning efforts
  • Improve deployment, CI/CD, and infrastructure-as-code practices
  • Identify and mitigate reliability and scalability risks before they impact customers
  • Mentor and guide junior engineers and contribute to team technical standards
  • Participate in on-call rotation and help mature on-call processes

Benefits

  • Competitive salaries
  • Medical, dental, and vision benefits
  • Company-paid life insurance
  • Flexible schedules
  • Unlimited PTO
  • Volunteer time off
  • Sick leave
  • Parental leave
  • 9 company-paid holidays