Senior Site Reliability Engineer

Helion, Everett, WA
Onsite

About The Position

We are a fusion power company based in Everett, WA, with the mission to build the world's first fusion power plant, enabling a future with unlimited clean electricity. Our vision is a world with clean, reliable, and affordable energy for everyone. Since Helion's founding in 2013, we have raised over $1 billion from long-time investors such as Sam Altman, Mithril, and Capricorn Investment Group as well as new investors SoftBank and Lightspeed to propel us forward. Our last prototype, Trenta, completed 10,000 high-power pulses and reached plasma temperatures of 100 million degrees Celsius (9 keV). We are now operating Polaris, our next prototype on the path to the world's first fusion power plant.

This is a pivotal time to join Helion. You will tackle real-world challenges with a team that prizes urgency, rigor, ownership, and a commitment to delivering hard truths - values essential to achieving what no one has before. Together, we will change the future of energy, because the world can't wait.

The Senior Site Reliability Engineer is a strategic technical leader responsible for designing and maintaining resilient systems and infrastructure. This role involves proactive reliability engineering, incident response leadership, and mentoring junior SREs to uphold high operational standards across the organization. This is an onsite role that reports directly to the Director of IS&T at our Everett, WA office.

Requirements

  • 8+ years of experience in SRE, DevOps, or infrastructure engineering roles
  • Bachelor's or master's degree in computer science, engineering, or a related field
  • Technical Proficiency: Advanced knowledge of cloud platforms (AWS, GCP, Azure), container orchestration (Kubernetes), and scripting languages (Python, Go, Bash)
  • Infrastructure Expertise: Deep understanding of distributed systems, networking, and Linux internals
  • Problem Solving: Strong analytical skills for diagnosing complex system failures and performance bottlenecks
  • Collaboration: Excellent communication and cross-functional teamwork abilities

Responsibilities

  • Collaborate with engineering teams to design scalable, fault-tolerant systems that meet performance and reliability goals
  • Define and manage SLIs, SLOs, and SLAs; implement error budgets and reliability metrics
  • Lead major incident responses, conduct root cause analyses, and drive postmortem processes
  • Build and maintain automation for deployments, monitoring, and infrastructure management using tools like Terraform, Kubernetes, and CI/CD pipelines
  • Develop and maintain observability platforms to ensure real-time system health tracking and proactive alerting
  • Forecast system demands and optimize performance through load testing and tuning
  • Collaborate with security teams to ensure infrastructure meets compliance and security standards
  • Guide junior engineers, promote best practices, and contribute to a culture of reliability and continuous improvement

Benefits

  • Medical, Dental, and Vision plans for employees and their families
  • 31 days of PTO (21 vacation days and 10 sick days)
  • 10 paid holidays, plus company-wide winter break
  • Up to 5% employer 401(k) match
  • Short-term disability, long-term disability, and life insurance
  • Paid parental leave and support (up to 16 weeks)
  • Annual wellness stipend