Software Engineer, Data Center Power Modeling

Google · Sunnyvale, CA
$166,000 - $244,000

About The Position

The Power Modeling team’s mission is to build tools and services that gather and serve accurate power data about how our fleet of data centers is planned and built, from the power grid down to the chips. This data enables maximum infrastructure utilization, software reliability (e.g., for Borg and Colossus), and optimal cluster planning. To accomplish this mission, we:

  • Create highly reliable services, software tools, and UIs that model all power topologies of Google’s data centers, from the power substation to the racks and in-rack devices. This modeling is critical for optimal planning, safe and reliable operation, and optimal infrastructure utilization.
  • Compute and serve the failure domain data needed to avoid correlated failures in Borg job scheduling and Colossus data placement, which is critical for Google’s reliability.
  • Represent accurate hardware power and thermal models, which are critical for validating rack designs and rack placements and for maximizing utilization of data center power and cooling infrastructure while staying within deployment constraints.

Requirements

  • Bachelor’s degree or equivalent practical experience.
  • 5 years of experience in software development.
  • 3 years of experience with distributed systems.
  • 3 years of experience with C++ coding.

Nice To Haves

  • Master's degree or PhD in Computer Science or a related technical field.
  • Experience with web applications and front-end development, including Angular.
  • Familiarity with managing Google-specific production systems.
  • Excellent coding skills in languages and frameworks such as Java, TypeScript, C++, and Angular.

Responsibilities

  • Develop and manage the Power Topology infrastructure to construct and serve topological data for current and planned data center power graphs.
  • Utilize PowerGrab and PowerMap to accurately collect and update power topology data by scanning racks and power equipment during data center audits.
  • Maintain Failure Domain services to ensure high availability for Colossus and Borg, enabling intelligent decision-making for global data and job distribution.
  • Maintain applications used by technicians to record power connections feeding racks and in-rack devices.
  • Analyze power utilization of production and ML machines to verify and increase the accuracy of power models.
© 2024 Teal Labs, Inc