Senior Platform Engineer (Cloud Workloads)

Veeam Software • San Jose, CA

About The Position

Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world’s biggest brands.

We are looking for a Senior Platform Engineer to join the Workload team within the Veeam R&D Department. You will own critical observability infrastructure, drive incident response maturity, and help scale proactive support capabilities as operational accountability.

Requirements

5+ years of experience in cloud platform engineering, SRE, or infrastructure roles supporting commercial SaaS products
Deep hands-on experience with the Elastic Stack: building dashboards, writing KQL/Query DSL, and managing Fleet
Proven experience operating and troubleshooting distributed, multi-tenant workloads on Azure and/or AWS
Strong understanding of Azure cloud services: AKS, Entra ID, Key Vault, Service Bus, Cosmos DB, Private Endpoints, etc.
Experience with incident response in production cloud environments, including runbook development and post-incident review
Experience with IaC tools (Azure Bicep, Terraform) and CI/CD pipelines (Azure DevOps, GitHub Actions)
Strong scripting skills in Bash, Python, or PowerShell
Ability to work cross-functionally with SRE, product, and customer-facing support teams

Nice To Haves

Familiarity with Veeam Data Platform products

Responsibilities

Design, build, and maintain observability pipelines using the Elastic Stack (Elasticsearch, Kibana, Fleet) across Azure and AWS workloads
Develop and own SLO/SLI dashboards and error budget reporting for BaaS platform services
Respond to incidents and lead incident response for distributed, multi-tenant cloud workloads; own runbook creation, maintenance, and continuous improvement
Build and refine proactive support tooling, including pattern analysis, tenant correlation dashboards, and baseline deviation alerting, to reduce reactive support burden
Manage and maintain Elastic Fleet agent policies, enrollment health, and log streaming pipelines across Azure and AWS worker fleets
Partner with SRE, R&D, and Proactive Support teams to close observability gaps, including tenant identification workflows and admin portal integrations

Benefits

Unlimited paid time off, 12 paid holidays, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
Medical, dental, and vision coverage starting on your first day
Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
401(k) retirement plan with company matching contributions
Fertility, adoption, and surrogacy support through Maven
AirVet: 24/7 virtual veterinary care at no cost
Legal services, identity protection, and supplemental health insurance options
Tax-advantaged spending accounts for healthcare, dependent care, and commuting
Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning