Tech Lead, Deployment & Operations — Custom Infrastructure

OpenAI
San Francisco, CA
$342,000 - $445,000

About The Position

OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.

We are seeking a Technical Lead to head deployment and operations for OpenAI’s Silicon & Systems team. This person will be the Directly Responsible Individual for bringing OpenAI’s custom silicon and associated systems into data center environments, ensuring successful deployment, bring-up, validation, operational readiness, and ongoing reliability at scale.

This role sits at the intersection of silicon, systems, infrastructure, data center operations, and software. You will lead a team focused on taking new hardware platforms from lab validation into production data center deployment. You will be responsible for building the operational processes, technical workflows, tooling, and cross-functional alignment required to deploy and operate custom AI hardware reliably in OpenAI’s supercomputing infrastructure.

The ideal candidate is both a strong leader and a deeply technical operator. You should be comfortable staying close to the technical details of hardware bring-up, fleet deployment, debugging, system validation, data center integration, and production operations. This role requires strong execution, excellent cross-functional judgment, and the ability to drive clarity in ambiguous, fast-moving environments.

Requirements

  • 8+ years of engineering experience in hardware systems, infrastructure, data center deployment, production operations, systems engineering, silicon bring-up, or related technical domains
  • Strong technical depth in one or more of: hardware deployment, data center operations, rack-scale systems, silicon bring-up, systems validation, fleet operations, reliability engineering, infrastructure automation, or hardware/software integration
  • Experience bringing complex hardware systems from development or validation into production environments
  • Experience working closely with silicon, systems, software, infrastructure, networking, or data center teams
  • Experience with deployment planning, operational readiness, incident response, debugging, and root-cause analysis for production systems
  • Experience building tooling, automation, observability, or operational processes that improve deployment quality and fleet reliability
  • Demonstrated ability to hire, develop, and lead senior technical talent
  • Ability to move fluidly between people leadership, technical strategy, and hands-on operational problem solving
  • Strong written and verbal communication skills, especially in high-urgency, cross-functional technical environments
  • Experience working in fast-moving environments

Nice To Haves

  • Enjoy mentoring and developing engineers while staying deeply engaged in technical execution
  • Are excited by the challenge of bringing new custom hardware platforms into real-world production data center environments
  • Can operate across silicon, systems, software, infrastructure, and data center operations
  • Are comfortable leading through ambiguity, especially when the hardware, tooling, and operational model are still being built
  • Have strong judgment around deployment sequencing, technical risk, operational readiness, and when to escalate
  • Communicate clearly across technical and operational teams, and can align stakeholders through complex deployment and production issues
  • Care deeply about building practical systems, tools, and processes that work reliably at scale
  • Have a bias toward ownership and are comfortable jumping into urgent technical issues when needed

Responsibilities

  • Lead a team responsible for deployment and operations of OpenAI’s custom silicon and systems in data center environments
  • Own the path from hardware bring-up and validation through production deployment, operational readiness, and sustained fleet support
  • Partner closely with silicon, systems, software, infrastructure, networking, data center, supply chain, and external partner teams to ensure successful deployment at scale
  • Define deployment processes, operational playbooks, technical readiness criteria, escalation paths, and reliability practices for new hardware platforms
  • Drive cross-functional execution across lab bring-up, rack/system integration, data center deployment, fleet monitoring, debugging, and issue resolution
  • Stay hands-on technically through architecture reviews, deployment planning, failure analysis, operational debugging, and critical system-level decision-making
  • Identify gaps in tooling, observability, automation, validation coverage, and operational processes, and build plans to close them
  • Establish clear metrics for deployment readiness, reliability, performance, maintainability, and operational health
  • Build a strong engineering culture grounded in ownership, technical rigor, operational excellence, and high-velocity execution
  • Ensure OpenAI’s custom hardware platforms can be deployed and operated reliably, repeatably, and safely at scale
  • Be a contributor and technical driver for the architecture and design of future ML systems

Benefits

  • Compensation Range: $342K - $445K USD