SRE Manager, Azure Cloud

WEX
Boston, MA

About The Position

We are looking for a highly motivated, high-potential Site Reliability Engineering (SRE) Manager who wants to lead a team of engineers, drive impactful initiatives, and advance a career in reliability engineering. This is a transformative moment to be part of the SRE team at WEX. Our products support a wide range of customer businesses and generate complex, high-volume telemetry and operational data across systems and platforms. As WEX scales, reliability, performance, and operational excellence are more essential than ever.

As the SRE Manager, you will lead a team of engineers who treat operations as a software problem. You aren't just managing infrastructure; you are the architect of our reliability strategy. Your mission is to balance the velocity of feature delivery with the stability of our Microsoft Azure ecosystem. You'll also act as a key partner to engineering and product teams: guiding them on building with reliability in mind, embedding SRE best practices, and influencing platform architecture and operational maturity.

We operate with agile methodologies and a product-minded engineering culture, and we leverage modern technologies, including AI, to continuously evolve our reliability capabilities. You'll drive solutions to complex challenges with high business impact and collaborate with a team of leaders who will support and challenge you to grow as a technical and strategic leader. If you're passionate about reliability, eager to lead, and ready to make a big impact, this is a great opportunity for you!

Requirements

  • Experience: 5+ years in SRE, DevOps, or Cloud Engineering, with 2+ years leading technical teams in high-availability environments.
  • SRE Mindset: Deep understanding of SRE principles (Error Budgets, Eliminating Toil, Observability).
  • Technical Depth: Expertise in Terraform and GitOps workflows. Proficiency in Python or Go (for automation) and scripting (Bash/PowerShell). Strong grasp of Azure-native monitoring (KQL, Prometheus/Grafana integration).
  • Soft Skills: Ability to negotiate Error Budgets with product owners and translate technical debt into business risk. Experience mentoring and guiding engineers in areas such as on-call readiness, alert tuning, and automation best practices.
  • Education: Bachelor’s degree in CS, IT, or equivalent.

Nice To Haves

  • Certification: Azure Solutions Architect (AZ-305) or Azure DevOps Engineer (AZ-400).

Responsibilities

  • Team Leadership & SRE Advocacy
      • Mentorship: Lead weekly 1:1s focused on transitioning engineers from traditional ops mindsets to SRE/DevOps practices.
      • Blameless Culture: Drive a "blameless post-mortem" culture where incidents are viewed as opportunities to harden the system rather than find fault.
      • Toil Management: Actively identify and track "toil" (manual, repetitive work), ensuring the team maximizes their time on engineering projects that eliminate it.
  • Reliability & Operational Strategy
      • SLOs & SLIs: Define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure the true health of Azure services from the user's perspective.
      • Error Budget Oversight: Manage error budgets in collaboration with Product Engineering to balance the risk of new deployments against system stability.
      • Incident Response & Resilience: Oversee the 24/7 on-call rotation with a focus on observability (Log Analytics, Azure Monitor, KQL) to reduce Mean Time to Detect (MTTD) and Mean Time to Repair (MTTR).
  • Engineering & Automation
      • IaC & GitOps: Lead the standardization of Infrastructure as Code (Terraform/Bicep) and CI/CD pipelines (GitHub Actions) to ensure all Azure resources are version-controlled and reproducible.
      • Self-Healing Systems: Architect automated remediation workflows to handle common failure modes, reducing the need for human intervention during minor incidents.
      • FinOps & Governance: Collaborate with FinOps to automate cost-optimization and enforce Azure Policies that prevent "drift" from security and compliance baselines.

Benefits

  • health, dental, and vision insurance
  • retirement savings plan
  • paid time off
  • health savings account
  • flexible spending accounts
  • life insurance
  • disability insurance
  • tuition reimbursement