Cerebras Systems • Posted about 1 month ago
Full-time • Principal
Hybrid • Sunnyvale, CA
Computer and Electronic Product Manufacturing

The Cerebras Inference Service team is seeking an experienced Principal Technical Program Manager (TPM) to join our organization. We are a dynamic, fast-growing team responsible for building and operating the Cerebras AI cloud, which serves cutting-edge AI models. In this role, you will manage the inference compute capacity for our team. We are looking for a TPM leader with a proven track record of managing and operating data center infrastructure. As a Principal TPM on our team, you will play a key role in planning weekly software releases and launches, coordinating the delivery of new capacity to the inference team, and monitoring the status of existing capacity. The ideal candidate is an extremely organized, proactive TPM with excellent teamwork skills who is comfortable working in a rapidly changing, cross-functional environment.

Responsibilities:
  • Develop weekly capacity plans in collaboration with engineering and product management, including timelines and resource allocation
  • Handle all capacity movement and allocation requests related to the weekly release cycle
  • Collaborate with datacenter infrastructure and operations teams to ensure new capacity is delivered and available on time
  • Track inference capacity availability, utilization, and uptime metrics, and provide weekly updates to leadership and stakeholders
  • Generate near- and mid-term capacity projections for engineering and product planning purposes
  • Proactively identify and mitigate capacity bottlenecks, risks, and dependencies
  • Contribute to continuous process improvement and to the development of internal capacity management tools
  • Manage and grow a small team

Qualifications:
  • B.S. degree in Computer Science, Engineering, or a related field; advanced degree preferred
  • 10+ years of TPM experience, preferably with a focus on datacenter infrastructure
  • Excellent verbal and written communication skills
  • Strong organizational and teamwork skills, and a can-do attitude
  • Ability to write clear, detailed documentation
  • Experience working with geographically dispersed teams across time zones
  • Proficiency with Jira or a similar tool
  • Experience with AI infrastructure and related technologies is a plus
  • Experience with data center equipment, including servers, networking, and storage is a plus