Senior Site Reliability Engineer (SRE)

Bellota Labs, Redwood City, CA
$200,000 - $250,000

About The Position

Bellota Labs is a fast-paced, hypergrowth startup poised to revolutionize the gaming world with ClubWPT Gold, a groundbreaking product from the World Poker Tour. Driven by innovation, game integrity, and exceptional customer experiences, we are on a mission to set new standards in online gaming.

We are seeking an experienced Senior Site Reliability Engineer (SRE) to design, build, and maintain highly reliable, scalable, and secure systems. You will play a critical role in ensuring system availability, performance, and operational excellence across our infrastructure and applications. As a senior member of the team, you will also mentor engineers, influence architecture decisions, and drive best practices in reliability engineering, automation, and incident management.

Requirements

  • 5+ years of experience in SRE, DevOps, or Infrastructure Engineering.
  • Strong experience with cloud platforms (AWS).
  • Deep understanding of Linux systems and networking fundamentals.
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Proficiency in scripting/programming (Python, Go, Bash, or similar).
  • Experience with monitoring and observability platforms (Datadog/Prometheus).

Nice To Haves

  • Experience operating high-scale production systems.
  • Experience with microservices architecture.
  • Background in database reliability (Postgres, MySQL, Redis, etc.).
  • Experience implementing SRE practices (error budgets, blameless postmortems).
  • Experience with AI-driven SRE.

Responsibilities

  • Design and implement highly available, scalable, and fault-tolerant systems.
  • Define and maintain SLIs, SLOs, and SLAs.
  • Lead incident response, root cause analysis (RCA), and postmortems.
  • Improve system resiliency and reduce operational toil through automation.
  • Design monitoring, alerting, and logging strategies.
  • Implement tools such as Prometheus, Grafana, Datadog, ELK, or similar.
  • Establish proactive alerting and capacity planning processes.
  • Conduct performance testing and optimization.
  • Identify bottlenecks and implement improvements.
  • Support system scaling initiatives and architecture reviews.
  • Partner with engineering teams to embed reliability into development processes.
  • Lead reliability initiatives and cross-functional projects.
  • Mentor junior engineers and promote SRE best practices.

Benefits

  • Lead High-Impact Projects – Play a key role in delivering innovative gaming experiences to a global audience
  • Collaborate Across Borders – Work with talented teams across Asia and the US
  • Fast-Paced Growth – Be part of a hypergrowth startup with ambitious goals
  • Competitive Benefits – Enjoy a top-tier compensation package in a dynamic company