Senior Platform Engineer (Cloud Workloads)

Veeam Software, San Jose, CA

About The Position

Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world’s biggest brands.

We are looking for a Senior Platform Engineer to join the Workload team within the Veeam R&D Department. You will own critical observability infrastructure, drive incident response maturity, and help scale proactive support capabilities as part of the team's operational accountability.

Requirements

  • 5+ years of experience in cloud platform engineering, SRE, or infrastructure roles supporting commercial SaaS products
  • Deep hands-on experience with the Elastic Stack: building dashboards, writing KQL/Query DSL, and managing Fleet
  • Proven experience operating and troubleshooting distributed, multi-tenant workloads on Azure and/or AWS
  • Strong understanding of Azure cloud services: AKS, Entra ID, Key Vault, Service Bus, Cosmos DB, Private Endpoints, etc.
  • Experience with incident response in production cloud environments, including runbook development and post-incident review
  • Experience with IaC tools (Azure Bicep, Terraform) and CI/CD pipelines (Azure DevOps, GitHub Actions)
  • Strong scripting skills in Bash, Python, or PowerShell
  • Ability to work cross-functionally with SRE, product, and customer-facing support teams

Nice To Haves

  • Familiarity with Veeam Data Platform products

Responsibilities

  • Design, build, and maintain observability pipelines using the Elastic Stack (Elasticsearch, Kibana, Fleet) across Azure and AWS workloads
  • Develop and own SLO/SLI dashboards and error budget reporting for BaaS platform services
  • Respond to incidents and lead the response for distributed, multi-tenant cloud workloads; own runbook creation, maintenance, and continuous improvement
  • Build and refine proactive support tooling, including pattern analysis, tenant correlation dashboards, and baseline deviation alerting, to reduce reactive support burden
  • Manage and maintain Elastic Fleet agent policies, enrollment health, and log streaming pipelines across Azure and AWS worker fleets
  • Partner with SRE, R&D, and Proactive Support teams to close observability gaps, including tenant identification workflows and admin portal integrations

Benefits

  • Unlimited paid time off, 12 paid holidays, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
  • Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
  • Medical, dental, and vision coverage starting on your first day
  • Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
  • 401(k) retirement plan with company matching contributions
  • Fertility, adoption, and surrogacy support through Maven, plus paid volunteer time
  • AirVet: 24/7 virtual veterinary care at no cost
  • Legal services, identity protection, and supplemental health insurance options
  • Tax-advantaged spending accounts for healthcare, dependent care, and commuting
  • Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning