Senior Production Engineer

Veeam Software
$92,100 - $235,900

About The Position

As a Production Engineer, you will play a key role in supporting reliable, scalable systems for Veeam's Data Cloud platform. You will own production efficiency, automation and documentation projects, contribute to reliability and observability improvements, and own or participate in the full incident lifecycle — from on-call response, through mitigation, to leading post-incident reviews and driving improvements across support and development teams. You will work as part of a team of skilled engineers, collaborating with support and development as a bridge and driving force for change. You will communicate with product managers and security professionals to ensure our services are production-ready, performant, and fault-tolerant, and that we rapidly incorporate user feedback into improvements.

Requirements

  • 3–5 years of experience in software engineering, site reliability, production engineering, or senior technical support roles operating distributed systems.
  • Experience with log analysis and advanced troubleshooting.
  • Basic programming experience (e.g., JS, Go, TypeScript, Java, or C#).
  • Experience deploying and troubleshooting systems on public cloud platforms (Azure preferred).
  • Familiarity with observability tooling (e.g., Elastic, Prometheus, Grafana, OpenTelemetry).
  • Understanding of distributed systems, networking, automation and CI/CD.

Nice To Haves

  • Prior on-call or incident response experience.
  • Background in automation, performance testing, or service scalability.
  • Familiarity with compliance or security best practices.

Responsibilities

  • Own complex and escalated production issues from support, and drive long-term fixes in collaboration with engineering, including code, configuration, and architecture changes.
  • Proactively identify and address risks uncovered during problem solving.
  • Lead production efficiency initiatives, develop and maintain processes and runbooks, and keep the knowledge base accurate.
  • Define, build and maintain production monitoring systems.
  • Continuously improve alerting to minimize noise, and ensure alerts are actionable and backed by well-documented runbooks.
  • Define and maintain SLIs/SLOs for key services, and use error budgets to guide operational and product decisions.
  • Turn manual processes into automation.
  • Own and drive the post-mortem review process and the actions arising from incident analysis.
  • Collaborate with the support organization as an escalation point, and feed back knowledge and improvement recommendations.
  • Collaborate with developers throughout the lifecycle of changes, from design through rollout and patch delivery, ensuring safe deployments and efficient incident mitigation.
  • Participate in design reviews to ensure services are operable with minimal manual intervention in production (automation, safe deployments, clear runbooks), and share learnings through documentation and feedback.

Benefits

  • Unlimited paid time off
  • 12 paid holidays
  • 4 extra global VeeaMe Days for self-care
  • 24 paid volunteer hours annually through Veeam Cares
  • Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
  • Medical, dental, and vision coverage starting on your first day
  • Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
  • 401(k) retirement plan with company matching contributions
  • Fertility, adoption, and surrogacy support through Maven
  • AirVet: 24/7 virtual veterinary care at no cost
  • Legal services, identity protection, and supplemental health insurance options
  • Tax-advantaged spending accounts for healthcare, dependent care, and commuting
  • Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning