The DCIM (Data Center Infrastructure Management) Cooling team's mission is to deliver reliable, efficient, and intelligent cooling solutions for Google data centers, enabling the future of technology. The team owns life-cycle management for all cooling devices deployed in Google's data centers, including telemetry collection, health monitoring, and alerting Data Center Operations teams so they can take action. The DCIM Cooling team operates one of the largest-scale monitoring systems at Google, reading telemetry from thousands of devices in every Google data center. Our challenges include handling the rapid growth and diversification of the Google fleet and hardware, supporting new use cases for critical monitoring of third-party facilities, and retiring technical debt.

In this role, you will work with your teammates to design, code, and put into production very large-scale distributed monitoring systems, and you will work with your team and partner teams to enable new use cases for large-scale telemetry gathering. You will also create system monitoring dashboards, define Service Level Objectives (SLOs), and write documentation and playbooks. You will have the opportunity to take onsite trips to one or more of Google's data centers each year to work with new systems and data center technical staff in person.
Job Type: Full-time
Career Level: Mid Level