About The Position

We are looking for a hands-on and customer-focused HPC Support Engineering Manager to lead our Tier III Support Engineering team supporting customers on Lambda’s Private Cloud GPU clusters. You’ll be responsible for guiding a team of HPC Support Engineers, ensuring escalations are handled with speed and consistency, and driving a high standard of technical excellence and customer experience. This role requires both strong technical depth in HPC and the ability to lead, mentor, and collaborate across Support, Product, Engineering, and Sales. You’ll also play a critical role in shaping the supportability of Lambda’s products by representing customer experience in internal discussions. This position reports to the Manager of Support Operations and includes participation in an on-call rotation.

Requirements

  • Proven experience leading technical support or engineering teams, with a track record of building high-performing groups that deliver strong customer outcomes.
  • Skilled at managing escalations, providing clear direction under pressure, and serving as the point of leadership in critical customer situations.
  • Strong knowledge of HPC clusters, including GPU/InfiniBand systems, networking, and node-level troubleshooting.
  • Advanced Linux administration and diagnostic skills.
  • Skilled at motivating teams, setting direction, and developing engineers into strong technical contributors.
  • Strong analytical and problem-solving skills with a proactive, action-oriented mindset.
  • Accountable and able to align team priorities with company and customer goals.

Nice To Haves

  • Advanced degree in Computer Science, Engineering, or related field.
  • Certifications in HPC, networking, or related technologies.
  • Experience with Slurm, Kubernetes, InfiniBand, and other high-performance interconnects (RoCE, NVLink/NVSwitch).
  • Background supporting Private Cloud environments or other dedicated enterprise clusters.
  • Experience supporting enterprise AI workloads across startups and Fortune 500 companies.

Responsibilities

  • Lead, coach, and mentor a team of HPC Support Engineers, fostering both technical growth and customer-first execution.
  • Ensure the highest quality of support for Lambda’s customers, who depend on our products for mission-critical workloads.
  • Own customer escalations and incidents, engaging directly with enterprise customers during high-visibility situations.
  • Partner with Product and Engineering teams to influence design decisions and ensure future offerings are supportable and reliable.
  • Stay current on the latest HPC and NVIDIA technologies, applying that knowledge to improve customer outcomes.
  • Develop and refine support processes, documentation, and workflows to ensure consistency and best practices.
  • Monitor and report on team performance, driving improvements in responsiveness, resolution quality, and customer satisfaction.
  • Manage team schedules, including on-call responsibilities, to ensure 24/7 coverage for critical issues.
  • Lead by example, actively participating in troubleshooting and case resolution when needed.

Benefits

  • Health, dental, and vision coverage for you and your dependents.
  • Wellness and Commuter stipends for select roles.
  • 401(k) plan with 2% company match (USA employees).
  • Flexible Paid Time Off Plan that we all actually use.