Director, Site Reliability Engineering

Oracle
Seattle, WA
$121,500 - $306,400

About The Position

Provides leadership to one or more teams designing and architecting infrastructure and services, and provides input on best practices for reliability and functionality. Establishes direction to ensure accurate forecasting and adequate system resources. Builds collaborative relationships with the software development team to create reliable, scalable infrastructure. Ensures alignment regarding data collection and contributes to standards for optimizing operations and infrastructure reliability. Defines approaches for incident response activities to ensure service reliability. Ensures reports are in-depth. Plays a key role in developing standards for identifying and recommending automation. Anticipates and explains the impact of changes, mentoring other managers on what to communicate. Defines approaches for escalating incidents and refines methods for documentation. Encourages experimenting with new technology, executing improvements, building site reliability knowledge, and providing clear data.

Requirements

  • Leadership experience
  • Experience designing and architecting infrastructure and services
  • Knowledge of best practices for reliability and functionality
  • Forecasting and resource allocation for systems
  • Collaboration with software development teams
  • Understanding of data collection and operational standards
  • Experience defining incident response approaches
  • Experience developing standards for automation
  • Mentoring skills
  • Experience defining incident escalation methods
  • Experience with documentation refinement
  • Experience with new technology experimentation
  • Experience with site reliability knowledge building
  • Ability to provide clear data

Responsibilities

  • Provides leadership to one or more teams designing and architecting infrastructure and services
  • Provides input on best practices for reliability and functionality
  • Establishes direction to ensure accurate forecasting and adequate system resources
  • Builds collaborative relationships with the software development team to create reliable, scalable infrastructure
  • Ensures alignment regarding data collection and contributes to standards for optimizing operations and infrastructure reliability
  • Defines approaches for incident response activities to ensure service reliability
  • Ensures reports are in-depth
  • Plays a key role in developing standards for identifying and recommending automation
  • Anticipates and explains the impact of changes, mentoring other managers on what to communicate
  • Defines approaches for escalating incidents and refines methods for documentation
  • Encourages experimenting with new technology, executing improvements, building site reliability knowledge, and providing clear data

Benefits

  • Flexible medical
  • Life insurance
  • Retirement options
  • Volunteer programs