Docusign • Posted 3 months ago
$202,800 - $327,625/Yr
Full-time • Senior
5,001-10,000 employees

As a Principal Technical Program Manager for Site Reliability, you will lead complex planning, execution, and delivery of programs that enhance the reliability, scalability, and efficiency of our infrastructure and services. You will collaborate with cross-functional teams, including engineering, product, operations, and executive leadership, to define and implement SRE best practices, tools, and processes. The ideal candidate thrives in high-stakes environments, excels at coordinating deep technical work among large teams, and possesses exceptional follow-through to deliver results on time and within scope. You will be responsible for presenting progress and strategies to executive leadership, translating technical complexities into actionable insights. This position is an individual contributor role reporting to the Senior Director, Cloud and Production Engineering.

  • Lead end-to-end program management for major SRE programs, including incident management, release management, observability, automation, and capacity planning, to ensure system reliability and performance.
  • Coordinate and orchestrate work across large, distributed teams of software engineers, SREs, DevOps teams, and stakeholders to align on priorities, technical deliverables, timelines, and dependencies, resolve blockers, and drive successful outcomes.
  • Prepare and deliver executive-level presentations, dashboards, and reports that highlight project status, milestones, challenges, and outcomes, influencing strategic decisions.
  • Partner with Incident Commanders to oversee post-incident reviews, drive root cause analysis, and implement preventive measures to minimize downtime and improve system resilience.
  • Communicate program status, risks, and outcomes to senior leadership and stakeholders, translating technical details into business impact.
  • 15+ years of professional experience in the high-tech industry, including 12+ years of experience in technical program management or site reliability engineering (SRE), managing and delivering world-class SRE platforms and infrastructure for SaaS products and services.
  • Proven track record of managing large-scale, technically complex programs involving 50+ team members, with a demonstrated ability to deliver under tight deadlines.
  • Demonstrated ability to work effectively with executives, including presenting strategic plans and program updates to senior leadership.
  • Experience with SRE principles, including observability, incident response, and infrastructure automation.
  • Experience with distributed systems, cloud platforms (e.g., AWS, Azure, GCP), and container orchestration (e.g., Kubernetes).
  • Experience with CI/CD pipelines, infrastructure as code (e.g., Terraform, Ansible), and monitoring tools (e.g., Prometheus, Grafana).
  • Experience building dashboards and applying a data-driven approach to projects.
  • Experience with project management tools (e.g., Jira, Asana, Microsoft Project) and agile/scrum frameworks.
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field; advanced degree or equivalent experience.
  • Exceptional attention to detail, follow-through, and high drive to push projects forward, even in ambiguous or challenging situations.
  • Excellent communication and presentation skills, with experience briefing executives on technical and business impacts.
  • Strong problem-solving and analytical skills, with a focus on driving measurable outcomes.
  • Ability to thrive in ambiguous environments and manage competing priorities.
  • Experience in defining and implementing SLOs/SLIs for large-scale systems.
  • Understanding of setting OKRs and managing multiple deliverables.
  • Background in managing incident response processes or chaos engineering programs.
  • Familiarity with DevOps software development methodologies.
  • Paid Time Off: earned time off, as well as paid company holidays based on region.
  • Paid Parental Leave: take up to six months off with your child after birth, adoption or foster care placement.
  • Full Health Benefits Plans: options for 100% employer-paid and minimum employee contribution health plans from day one of employment.
  • Retirement Plans: select retirement and pension programs with potential for employer contributions.
  • Learning and Development: options for coaching, online courses and education reimbursements.
  • Compassionate Care Leave: paid time off following the loss of a loved one and other life-changing events.