About The Position

The Microsoft Azure Artificial Intelligence/High Performance Computing (AI/HPC) team is seeking software engineers to enable customers in deploying, monitoring, profiling, and debugging their applications on hyperscale cloud infrastructure. Azure supports the largest supercomputing deployments for complex computational problems in the public cloud, with its HPC products recognized on Top500, MLPerf, and Graph500 rankings. At this supercomputing scale, specialized tools and techniques are crucial for maintaining reliability, runtime performance, system health, and job execution to meet customer Service Level Agreements (SLAs). The role involves building and utilizing state-of-the-art cloud applications and services to identify operational gaps and implement features for the smooth operation and management of cloud-native supercomputers. As a Senior Supercomputing Engineer, responsibilities also include establishing best practices, driving architectural changes, and influencing the roadmap of relevant software and hardware components. This work directly impacts the business goals of a wide range of users and fosters growth and innovation in AI and HPC in the cloud. Microsoft's mission is to empower every person and organization to achieve more, fostering a culture of inclusion, respect, integrity, and accountability.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role.
  • This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • 1+ years previous experience with running and troubleshooting machine learning workloads on GPU-based HPC systems.
  • 1+ years experience with Cloud Computing, Virtualization and Container Technologies.
  • Familiarity with AI/HPC workloads, GPU-based systems, AI-assisted software development, and secure software design practices.

Responsibilities

  • Collaborate with appropriate stakeholders to determine user requirements for a scenario.
  • Drive identification of dependencies and the development of design documents for a product, application, service, or platform.
  • Independently use appropriate artificial intelligence tools and practices across the software development lifecycle to create, implement, optimize, debug, refactor, and reuse code to establish and improve performance, maintainability, effectiveness, and return on investment (ROI).
  • Leverage subject-matter expertise of product features and partner with appropriate stakeholders (e.g., project managers) to drive a workgroup's project plans, release plans, and work items.
  • Act as a Designated Responsible Individual (DRI) and guide other engineers by developing and following the playbook, working on call to monitor the system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status, and initiating actions to restore the system/product/service for simple and complex problems when appropriate.
  • Proactively seek new knowledge and adapt to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale.

Benefits

  • Certain roles may be eligible for benefits and other compensation. Additional benefits and pay information can be found at https://careers.microsoft.com/us/en/us-corporate-pay.