Principal Hardware Engineer - Platform

Microsoft
San Diego, CA
1d

About The Position

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding cloud infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for more than 200 Microsoft online businesses worldwide, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, through our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate engineers to help achieve that mission.

As Microsoft’s cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality, and at the lowest cost is of paramount importance. To achieve this goal, the Cloud Hardware Systems Engineering (CHSE) team is instrumental in defining and delivering operational measures of success for hardware manufacturing and in improving the planning process, quality, delivery, scale, and sustainability of Microsoft cloud hardware. We are looking for seasoned engineers with a passion for customer-focused solutions and the insight and industry knowledge to envision and implement future technical solutions that will manage and optimize the cloud infrastructure.

We are looking for a Principal Hardware Engineer - Platform to join the team. #SCHIE #azurehwjob

Requirements

  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 7+ years of technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years of technical engineering experience OR equivalent experience.
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position requires passing the Microsoft Cloud Background Check upon hire or transfer and every two years thereafter.

Nice To Haves

  • 10+ years of relevant experience in system (compute, storage, networking, and/or accelerator) level design and/or implementation across the hardware development lifecycle.
  • 10+ years of hands-on experience in server hardware architecture, design, and development, with a solid understanding of hardware, firmware, and operating systems (OS).
  • Proven experience delivering AI and GPU-based systems to production.
  • Proven track record leading cross-functional technical execution across hardware, firmware, software, and datacenter infrastructure.
  • Deep system expertise in power delivery, thermal and liquid cooling, signal integrity, mechanical design, and reliability.
  • Experience bringing high-volume silicon platforms (GPU, SoC, accelerator) from architecture through production ramp.
  • Hands-on experience with PCIe, DDR, Ethernet, BIOS/BMC, and Linux and Windows integration.
  • Experience with datacenter-scale AI systems, including system debug and root cause analysis.
  • Ability to evaluate AI systems using performance-per-watt and performance-per-dollar metrics.
  • Clear, concise communicator with the ability to influence technical direction across teams and at senior levels.

Responsibilities

  • Own end-to-end technical delivery of AI and GPU-based platforms from architecture through production deployment.
  • Act as the system-level technical owner, driving decisions across architecture, silicon, firmware, hardware, OS, validation, and manufacturing.
  • Define and manage technical baselines with technical program managers (TPMs), including scope, schedule, dependencies, and change control.
  • Own system design quality and completeness, ensuring alignment with AI workloads, GPU performance targets, and platform constraints.
  • Influence platform-, rack-, and datacenter-level architecture, including GPU clustering, interconnects, and power and liquid cooling solutions.
  • Evaluate and de-risk new, unique, different, or difficult (NUDD) technologies such as new accelerators, memory hierarchies, and high-speed interconnects.
  • Drive system-level tradeoffs across power, thermal, mechanical, cost, reliability, and manufacturability to optimize performance and total cost of ownership (TCO).
  • Partner with validation and manufacturing teams to resolve issues and ensure production-ready designs.
  • Collaborate with internal teams and external partners, including silicon vendors and open-source communities, to integrate new technologies.