About The Position

Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. Lambda’s mission is to make compute as ubiquitous as electricity and give every person access to artificial intelligence. One person, one GPU. If you'd like to build the world's best deep learning cloud, join us.

We are seeking an accomplished Advanced Cooling Facilities Manager specializing in Direct Liquid Cooling (DLC) systems to lead the global strategy, implementation, and operational excellence of Lambda’s next-generation liquid cooling infrastructure. This role will define methodologies and standards for the deployment, optimization, and scaling of cooling systems that enable Lambda’s GPU Cloud to deliver industry-leading performance for AI and machine learning workloads.

Requirements

  • Bachelor’s degree in Mechanical, Electrical, or Thermal Engineering (Master’s preferred).
  • Professional certifications such as DCCA, CompTIA Server+, or liquid cooling manufacturer certifications are strongly preferred.
  • 10+ years of experience in data center or mission-critical facility operations.
  • 7+ years managing advanced liquid cooling systems (CDUs, L2L/L2A loops, heat exchangers).
  • 5+ years supporting GPU/AI infrastructure or high-density compute workloads (>300 W per rack).
  • 3+ years managing technical teams in distributed, multi-site environments.

Nice To Haves

  • Master’s in Mechanical or Thermal Engineering.
  • Experience designing or supporting large-scale GPU clusters and AI cooling ecosystems.
  • Background in hyperscale, HPC, or advanced colocation environments.
  • Experience with AI-driven control systems and thermal optimization algorithms.
  • Demonstrated success implementing energy-efficient and water-conservation cooling strategies.

Responsibilities

  • Define and oversee operational standards and lifecycle management for all CDU systems (L2L and L2A), including performance optimization, reliability engineering, and capacity expansion strategies.
  • Lead the design and management of multi-stage cooling loops — from facility to rack level — ensuring precise control of temperature, pressure, and flow rate across variable load conditions.
  • Coordinate and validate integration of CDUs with facility water systems (FWS), heat exchangers, and mechanical infrastructure.
  • Architect the monitoring framework for coolant system telemetry: pressure, differential pressure, temperature, flow, and conductivity.
  • Design and institutionalize maintenance methodologies, including condition-based maintenance schedules, failure-mode analysis, and reliability improvement plans for pumps, heat exchangers, and filtration systems.
  • Evaluate and forecast thermal capacity requirements for high-density GPU clusters, driving design and procurement of CDUs and loop systems.
  • Partner with data center design and mechanical engineering teams to co-develop cooling topologies, redundancy strategies, and modular infrastructure designs.
  • Act as the primary technical authority for liquid cooling vendor engagement — influencing product roadmaps, negotiating technical specifications, and qualifying emerging solutions.
  • Evaluate and pilot next-generation cooling technologies and automation platforms to reduce PUE, enhance reliability, and support sustainability objectives.
  • Establish performance metrics for cooling energy efficiency, uptime, and total cost of ownership.
  • Oversee global operation of liquid cooling infrastructure with near-zero downtime objectives.
  • Act as the senior technical lead for major cooling incidents, coordinating cross-functional response teams.
  • Establish robust documentation standards — including P&IDs, SOPs, commissioning reports, and change logs.
  • Ensure adherence to all applicable codes, environmental standards, and safety protocols.
  • Mentor and develop specialized liquid cooling technicians and engineers.
  • Lead liquid cooling deployment and operational programs across colocation and owned facilities worldwide.
  • Define and enforce standardized cooling system configurations, control sequences, and operating parameters across all sites.
  • Deploy and manage advanced remote monitoring and control systems for multi-site visibility, predictive analytics, and fault detection.
  • Architect the global cooling expansion framework to support rapid scaling of Lambda’s GPU cloud services.

Benefits

  • Health, dental, and vision coverage for you and your dependents.
  • Wellness and commuter stipends for select roles.
  • 401(k) plan with 2% company match (USA employees).
  • Flexible Paid Time Off Plan that we all actually use.