The DCIM Lifecycle team operates one of the largest-scale monitoring systems at Google, reading telemetry from millions of devices in every Google datacenter. Our challenges include managing the rapid growth and diversification of the Google fleet and its hardware, supporting new use cases for critical monitoring of third-party facilities, and retiring technical debt. Google is bringing tape libraries back to its datacenters to support several critical requirements, including a new cold storage tier, better total cost of ownership (TCO), and a contingency for HDD/SSD shortages caused by unprecedented AI/ML capacity demand. This role will design and deliver tape health at Google scale to ensure reliability.
In this role, you will work with your teammates to design, code, and put into production very large-scale distributed monitoring systems, and you will work with your team and partner teams to enable new use cases for large-scale telemetry gathering. You will also create system monitoring dashboards, define service level objectives (SLOs), and write documentation and playbooks. You will have the opportunity to take onsite trips to one or more of Google's datacenters each year to work with new systems and datacenter technical staff in person.
The US base salary range for this full-time position is $166,000-$244,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process. Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google [https://careers.google.com/benefits/].
Job Type
Full-time
Career Level
Mid Level