Principal AI Network Architect

Microsoft · Redmond, WA

About The Position

Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft’s cloud growth? Are you seeking a unique career opportunity that combines technical capabilities and cross-team collaboration with business insight and strategy?

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to achieve our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Join the Systems Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. Microsoft delivers more than 200 online services to more than one billion individuals worldwide. We deliver the core infrastructure and foundational technologies for Microsoft’s cloud businesses, including Microsoft Azure, Bing, MSN, Office 365, OneDrive, Skype, Teams, and Xbox Live.

We are seeking a passionate Principal AI Network Architect to join the AI systems architecture team. The role includes network architecture evaluation, design, and optimization for next-generation AI systems. Your work will have a direct influence on Azure product roadmaps.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master’s or Doctoral degree in Electrical Engineering, Computer Engineering, or related fields and 10+ years of technical experience in the domain.
  • Deep expertise with Ethernet networking, RDMA (RoCE, InfiniBand), congestion control, and layer 2/3 switching.
  • Experience architecting scale-out/backend networks for AI GPU clusters
  • Familiarity with scale-up networks such as NVLink and UALink.
  • Experience with high-radix Ethernet switches
  • Familiarity with AI model execution pipelines, with the ability to analyze communication flows and their impact on model performance.
  • Prior contributions to standards committees and experience with hyperscale network deployments would be an added benefit
  • Skilled in partnering and influencing architects, hardware engineers, and software leads
  • Ability to manage through ambiguity, bringing clarity and results orientation to engage and energize collaborators and stakeholders
  • Collaboration skills, teamwork, and sense of presumed responsibility
  • Verbal and written communication skills, and ability to articulate and engage with both technical and non-technical stakeholders at all levels.
  • Experience leading and driving complex projects with respect and integrity, including those with multiple workstreams spanning different business and technical disciplines.
  • Intellectual curiosity and passion about learning and deploying new technologies.
  • Problem-solving skills, analytical capabilities, and attention to detail

Responsibilities

  • Leadership: Spearhead architecture definition and evaluation of AI accelerator platforms, with a focus on high-bandwidth, low-latency networks. Drive end-to-end optimization of the stack, from hardware to the software kernels.
  • Cross-functional collaboration: Partner with silicon and platform design teams to co-design infrastructure that meets performance, reliability, and deployment goals. Frame decisions in terms of TCO, performance, flexibility, and scalability.
  • Prototyping: Work with a state-of-the-art networking lab to prototype new network architectures.
  • Industry influence: Participate in industry consortiums to shape standards and influence vendor roadmaps.