Director, Site Reliability Engineering

Optimum
Town of Oyster Bay, NY
Posted 4d ago
$155,509 - $222,156

About The Position

The Director of Site Reliability Engineering will be responsible for enhancing and maintaining our private and public cloud infrastructure and for driving the technical direction, innovation, and scalability of our Cloud Platform. The Director will lead a team that operates, architects, and builds the key infrastructure running the high-availability services that our customers and their businesses depend on.

Requirements

  • Bachelor’s degree in computer science or relevant experience.
  • Minimum of 10 years of progressive experience in Engineering or Architecture roles, with a strong focus on large-scale distributed systems.
  • Minimum of 5 years of experience leading teams and initiatives within cloud infrastructure environments (GCP and AWS experience is highly preferred).
  • Minimum of 5 years of experience with on-premises data center infrastructure management and modernization.
  • Deep expertise and practical experience with SRE practices, DevOps principles, and Agile methodologies.
  • Demonstrated experience in designing, implementing, and maintaining secure and reliable technology delivery systems, including comprehensive disaster recovery planning, storage area networks (SAN), and highly available server/network architectures.
  • Proven ability to leverage data, analytics, and key operating metrics to inform strategic decision-making and articulate complex technical concepts to diverse audiences.
  • Exceptional strategic thinking capabilities with a track record of envisioning, articulating, and planning complex deliverables that align with organizational goals and financial objectives.
  • Strong leadership, mentorship, and talent development skills, with a passion for building and motivating high-performing engineering teams.
  • Adept at developing, communicating, and embedding best practice procedures across large teams and various functional areas.
  • Outstanding written and verbal communication skills, with the ability to influence and collaborate effectively across all levels of the organization.

Nice To Haves

  • A master’s degree is a plus.

Responsibilities

  • Strategic Leadership & Vision: Drive the vision, strategy, and organizational development of the cloud engineering function, championing SRE principles and best practices across the enterprise.
  • Engineering Excellence & Standards: Define, implement, and enforce robust engineering best practices, architectural guidelines, and operational standards to ensure the delivery of secure, scalable, and highly available solutions.
  • Technical Authority & Guidance: Provide expert technical leadership and strategic direction for automation, monitoring, performance optimization, and incident management, guiding teams toward resilient and efficient solutions.
  • High-Performance Team Development: Cultivate and lead highly engaged, high-performance SRE and cloud engineering teams, fostering a collaborative, inclusive, and innovative work environment.
  • Cross-Functional Collaboration & Knowledge Sharing: Champion a culture of open collaboration, leveraging cross-functional teams, communities of practice, and guilds to share knowledge, define standards, and drive continuous improvement across engineering domains.
  • Complex Project Delivery: Spearhead the end-to-end delivery of complex, high-visibility infrastructure projects, overseeing design, budgeting, testing, implementation, and continuous performance monitoring.
  • Resilience & Recovery: Architect, implement, and continuously improve incident response, disaster recovery, and business continuity plans for both cloud and on-premises infrastructure, ensuring rapid recovery and minimal disruption.
  • Cloud Infrastructure Strategy: Develop and execute a comprehensive cloud strategy focused on maximizing uptime, enhancing security, and improving operational efficiency.
  • Operational Excellence & Metrics: Champion a data-driven culture by establishing and monitoring key metrics and analytics to ensure the uptime, security, and efficiency of our cloud infrastructure, driving continuous improvement initiatives.