Staff Site Reliability Engineer

Veeam Software
$201,000 - $287,100

About The Position

Veeam is launching a global Site Reliability Engineering (SRE) function to support the rollout and operation of our new SaaS offering: the Veeam Data Cloud. As a Staff Site Reliability Engineer, you will serve as a hands-on technical leader within the SRE team, guiding senior engineers, influencing product development teams, and ensuring the systems we operate are built to be reliable, scalable, and observable from the ground up. You will drive strategic initiatives, mentor others in the practice of SRE, and help define architectural best practices across our platform. This role is pivotal in aligning teams, enforcing high standards, and scaling SRE principles globally within Veeam.

Requirements

  • 8+ years of experience in a Software Engineering or SRE role, including technical leadership.
  • Demonstrated experience mentoring and guiding senior engineers.
  • Deep expertise in building distributed systems on public cloud (Azure preferred).
  • Strong programming skills (e.g., JavaScript, TypeScript, Go, Java, or C#).
  • Hands-on experience with observability tooling (e.g., Prometheus, Grafana, OpenTelemetry).
  • Mastery of infrastructure automation tools (Terraform, Pulumi) and container orchestration (Kubernetes).
  • Ability to communicate clearly across geographies and disciplines.

Nice To Haves

  • Experience leading SRE initiatives across multiple product teams.
  • Background in chaos engineering, incident learning, or performance and load testing.
  • Familiarity with global compliance standards (ISO, SOC 2, GDPR, FedRAMP, CMMC).

Responsibilities

  • Act as a technical authority in your area, mentoring senior engineers and guiding design choices that improve service reliability and resilience.
  • Lead the definition and enforcement of SLIs, SLOs, and error budgets; drive adherence across engineering teams.
  • Collaborate with Staff peers across teams to align strategy and champion shared reliability standards and goals.
  • Partner with development and product teams to proactively design for failure, build resilient architecture, and operationalize reliability from the start.
  • Drive company-wide adoption of observability best practices and tooling.
  • Ensure metrics, logs, and traces provide deep, actionable insights across systems.
  • Lead complex incident responses, postmortems, and systemic reliability improvements.
  • Promote and enforce a blameless culture of learning and continuous improvement.
  • Lead initiatives in infrastructure as code, deployment automation, and resilience testing.
  • Influence the development and adoption of chaos engineering practices and release validation frameworks.
  • Partner with platform and security teams to ensure production readiness.
  • Work closely with your peer Staff Engineers to plan, align, and deliver against reliability goals.
  • Provide architectural guidance and advocate for engineering rigor and consistency.
  • Represent the SRE team in technical leadership forums and product planning discussions.

Benefits

  • Unlimited PTO
  • 3 global VeeaMe Days per year: company-wide closures for employees to take a break, disconnect, and focus on self-care
  • Paid Holidays
  • Veeam Care Days: 24 hours of paid time for volunteering
  • Medical, dental, and vision coverage starting on day one (multiple plan options)
  • Flexible Spending Accounts (FSA) and Health Savings Account (HSA) options
  • Employer HSA contributions (for HDHP participants)
  • Life and AD&D insurance (employee, spouse/partner, and child options)
  • Company-paid short-term and long-term disability insurance
  • Supplemental individual disability insurance (IDI)
  • Family planning support: fertility, adoption, surrogacy, and parental resources
  • Paid parental leave
  • Employee Assistance Program
  • Additional voluntary benefits: accident, critical illness, hospital indemnity, legal, identity theft protection, commuter benefits, pet care
  • Mental health support
  • 401(k) plan
  • Professional training and education, on-demand learning libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and Global Day of Learning