Senior Director, Platform Operations, GDC

Google LLC
Sunnyvale, CA
Posted 35d ago
$349,000 - $485,000

About The Position

Google Distributed Cloud (GDC) is a cloud-centric platform that enables enterprises to run modern apps anywhere, consistently and at scale. We offer a wide spectrum of solutions, from managed software on your own hardware, to fully managed hardware and AI-led software services, to completely air-gapped sovereign offerings. GDC is a fully managed product portfolio that brings Google Cloud's infrastructure and services closer to where customer data is being generated and consumed. It empowers customers to run AI-led services in their own (or a partner's) data center, or at the edge, alongside enterprise applications to support mission-critical use cases such as computer vision and Google AI edge inferencing.

Over time, we believe this team will define new products and services as part of the GDC portfolio that enable us to build large businesses by helping governments use AI for citizen services, helping manufacturers save time and money by using video for visual inspections on factory floors, and enabling retailers to remove in-store hardware and create dynamic, modern applications.

As the Senior Director, you will establish and maintain the core reliability and production standards for GDC. In this role, you will build and operate the production environment for GDC. You will lead the dedicated Site Reliability Engineering (SRE) team within GDC, be responsible for the operational excellence and technical direction of the distributed full-stack cloud product, and lead an organization of approximately 50-100 people. The core mission is to drive the technical and strategic execution for reliable production operations, ensuring operability, performance, and security of GDC within the constraints of private, on-premises, and air-gapped environments.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

Requirements

  • Bachelor's degree in Computer Science, related technical field, or equivalent practical experience.
  • 20 years of experience in architecting system design, algorithms, data structures, analysis, and software design.
  • 15 years of experience with site reliability engineering practices, including incident response, monitoring, service-level objectives, and change management.
  • Experience with predictable software installation and upgrades in air-gapped environments.

Nice To Haves

  • Master's degree or PhD in Computer Science or related technical field.
  • Experience leading software product development organizations that have shipped successful products to enterprise customers.
  • Experience in production operations for isolated systems, and expertise in distributed cloud/on-premise.
  • Experience building or operating large-scale infrastructure platforms and distributed systems, with technical knowledge of public/private cloud, GPUs, virtualization, and containerization.
  • Ability to resolve deep systemic operational problems.

Responsibilities

  • Drive the SRE technical strategy and architectural roadmap for GDC, ensuring the vision set by the VP is translated into measurable, high-impact results.
  • Develop and deliver the operations stack that enables Google's SREs, partners, and customers to operate the distributed cloud deployment.
  • Own, innovate, and create programs, software solutions, process innovations, and analytics that drive improvements to the availability and operability of GDC's products.
  • Lead, develop, and scale the mission-critical SRE and Production Operations team, including managing multiple Directors and/or large numbers of managers within the organization. Focus on talent management and developing leadership talent within the SRE function.
  • Partner closely with the EngProd (e.g., Developer Velocity and QA) function to ensure production readiness, reliability validation, and the enforcement of test coverage strategy (e.g., unit, integration, silver, gold, systest) before code deployment.

Benefits

  • bonus
  • equity
  • benefits

What This Job Offers

Job Type

Full-time

Career Level

Director

Industry

Web Search Portals, Libraries, Archives, and Other Information Services

Number of Employees

5,001-10,000 employees
