GOV Site Reliability Engineer

Veeam Software
$109,800 - $252,500
Remote

About The Position

Veeam is building a global SRE function to support the Veeam Data Cloud, our SaaS platform. This role is part of the team supporting our Government and Sovereign Cloud environment. Success here requires a self-starter mindset — you'll need to be comfortable building your own context and tracking down information across a large, distributed engineering organization. You'll work alongside senior engineers to execute on reliability work, close observability gaps, respond to incidents, and help maintain the operational foundation the team runs on.

Requirements

  • 3+ years in Software Engineering, with at least 1 year in SRE, Platform Engineering, or DevOps working on cloud-hosted services.
  • Experience with cloud infrastructure on Azure or a comparable cloud provider.
  • Familiarity with regulated or compliance-oriented environments such as government (FedRAMP, CMMC), financial (PCI-DSS), or healthcare (HIPAA). You understand that compliance shapes what you can and can't do operationally.
  • Ability to read and understand code well enough to investigate system behavior without always having someone walk you through it.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry, ELK stack).
  • Experience with IaC tools (Terraform, Terragrunt, or Pulumi) and container orchestration (Kubernetes).
  • Experience with CI/CD tooling such as GitHub Actions, Azure DevOps, GitLab CI, or ArgoCD.
  • Strong programming skills in one or more of: TypeScript/JS, Go, Java, C#, or similar.
  • Solid understanding of distributed systems fundamentals and networking basics.
  • Clear written and verbal communication skills.

Nice To Haves

  • Experience in Government or Sovereign Cloud environments (e.g., Azure Government, AWS GovCloud).
  • Background in SaaS platforms or multi-tenant systems.
  • Familiarity with chaos engineering, resilience testing, or load testing.
  • Exposure to building or improving reliability practices on a team.
  • Familiarity with AI-first development workflows that use LLM-powered tools for automation, code generation, or documentation.

Responsibilities

  • Get up to speed on Veeam Data Cloud (VDC) workloads, dependencies, and operational workflows by reading code and docs and by working with subject-matter experts (SMEs).
  • Write and maintain runbooks, incident guides, and operational documentation.
  • Support knowledge transfer and contribute to onboarding materials for the team.
  • Participate in incident response including triage, investigation, mitigation, and postmortems.
  • Help implement and maintain SLIs, SLOs, and error budgets defined by the team.
  • Identify reliability issues during incidents or reviews and propose concrete improvements.
  • Support high availability and fault tolerance work on Azure, including Azure Government.
  • Close monitoring gaps by implementing instrumentation, alerting, and dashboards based on team standards.
  • Contribute to toil reduction through automation and tooling improvements.
  • Participate in on-call rotations.
  • Work with IaC, CI/CD pipelines, and deployment tooling in compliance-restricted environments.
  • Support testing, canary deployments, and release validation workflows.
  • Implement changes to infrastructure and configuration following established patterns and review processes.
  • Work with engineering, security, compliance, and operations teams to execute on reliability improvements.
  • Communicate clearly about system behavior, risk, and status — in writing and in meetings.
  • Raise blockers and gaps proactively; don't wait for problems to escalate.

Benefits

  • Unlimited paid time off
  • 12 paid holidays
  • 4 global VeeaMe Days for self-care
  • 24 paid volunteer hours annually through Veeam Cares
  • Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
  • Medical, dental, and vision coverage starting on your first day
  • Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
  • 401(k) retirement plan with company matching contributions
  • Fertility, adoption, and surrogacy support through Maven
  • AirVet: 24/7 virtual veterinary care at no cost
  • Legal services, identity protection, and supplemental health insurance options
  • Tax-advantaged spending accounts for healthcare, dependent care, and commuting
  • Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning