Principal Systems SW Engineer

Microsoft, Redmond, WA

About The Position

Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft’s cloud growth? Are you seeking a unique career opportunity that combines technical capabilities and cross-team collaboration with business insight and strategy? If you’re interested in hardware/software co-design and in evaluating AI system architecture concepts to improve datacenter performance, efficiency, and reliability, this Principal Systems SW Engineer role could be a strong fit.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to achieve our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Join the Strategic Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. Microsoft delivers more than 200 online services to more than one billion individuals worldwide. We deliver the core infrastructure and foundational technologies for Microsoft's cloud businesses, including Microsoft Azure, Bing, MSN, Office 365, OneDrive, Skype, Teams, and Xbox Live.
As part of the Systems Planning and Architecture (SPARC) group, you will help with pathfinding and architecture for future AI systems and related technologies that create advantages for Azure and Microsoft. You will collaborate across the Azure organization to evaluate next-generation datacenter technologies and influence Azure product roadmaps for both Microsoft and third-party silicon and systems.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Deep expertise in AI scale-up and scale-out networking/interconnect architectures, along with a good understanding of memory/storage technologies.
  • Deep understanding of AI inference systems and associated software, and emerging approaches to orchestrate tiered memory and storage capabilities for distributed serving and KV caching for agentic systems.
  • Understanding of GPU compute and systems in the cloud, including CPU, memory, networking and storage technologies.
  • Understanding of how system software, storage, and communication libraries integrate into AI frameworks.
  • Intellectual curiosity and a passion for learning and deploying new technologies. Problem-solving skills, analytical capabilities, and attention to detail.
  • Ability to manage through ambiguity, bringing clarity and results orientation to engage and energize collaborators and stakeholders.
  • Experience leading and driving complex projects with respect and integrity, including those with multiple workstreams spanning different business and technical disciplines.
  • Skilled in partnering with and influencing architects, hardware engineers, and software leads. Collaboration skills, teamwork, and a strong sense of responsibility.
  • Verbal and written communication skills, and the ability to communicate clearly and engage with both technical and non-technical stakeholders at all levels.

Responsibilities

  • Leadership: Spearhead system architecture exploration and definition for Microsoft’s custom AI systems. Identify system-level co-design opportunities by working across GPU, host, network, storage, and memory vectors.
  • Conduct comprehensive architecture analysis for next-generation ML model architectures, with a deep understanding of Azure AI use cases. Run simulations to evaluate solutions and build end-to-end hardware and software prototypes.
  • Prototype: Collaborate with cross-functional teams to develop full-stack technology across hardware and software to mature concepts from PoC to productization.
  • Serve as a hands-on engineer who understands the system software stack and its interfaces with ML frameworks, as well as ML model architecture and its mapping to AI systems.
  • Identify promising features and technologies to address these problems, prototype solutions, and de-risk productization by building PoCs.
  • Work across hardware and software boundaries to develop end-to-end solutions.
  • Work across organizational boundaries to land promising features and technologies in Microsoft’s AI systems.