About The Position

In Azure Specialized, the team works collaboratively to bring the next generation of workloads to the Public Cloud platform, enabling end-to-end new scenarios for Azure customers. They imagine and build differentiating customer features and fundamental building blocks at the heart of the Azure platform, collaborating with industry partners. As a Senior Software Engineer, you will be critical in designing and delivering the next generations of High Performance Computing (HPC) to enable a wide variety of customer workloads including weather prediction, electronic design attestation, computational fluid dynamics and more. This role involves challenges across a wide spectrum of hardware architectures, network types, and processor types, focusing on delivering an end-to-end vertical view with continuous focus on customer value, quality, performance, and automation. The position entails deep technical work, defining, deploying, and sustaining hardware and software Azure infrastructure for HPC workloads with a key focus on quality, tooling, and metrics. The team focuses on hardware/software interaction, coding, and working with next-gen hardware, end-to-end systems engineering across the infrastructure - CPU differentiation, networking, switches, rack design, cluster design and more to come up with the best offering for the customer. This role offers a unique opportunity to significantly impact customers and the world, as the team expands capacity and supported scenarios for 100X growth. Microsoft's mission is to empower every person and organization to achieve more, fostering a culture of inclusion based on respect, integrity, and accountability.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, PowerShell, or Python OR equivalent experience.
  • 2+ years of experience telemetry and observability, monitoring and improving the quality of a service or cloud infrastructure, or measuring and driving improvements in a system.
  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role. These requirements include, but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Bachelor's Degree in Computer Science OR related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, PowerShell, or Python OR Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, PowerShell, or Python OR equivalent experience.
  • 2+ years of experience in high performance computing, including familiarity with accelerators and co-designing hardware and software.
  • 1+ year of experience working with distributed systems and cloud infrastructure, using profiling and performance analysis tools, and debugging low-level system code such as drivers.
  • 1+ year of experience in data science and telemetry, with exposure to machine learning middleware and performance optimization techniques.

Responsibilities

  • Dives deeply into any level or layer of a problem.
  • Learns emerging technologies, from hardware to software.
  • Evaluate and make recommendations that advance Azure infrastructure for HPC and AI-based workloads.
  • Evaluates metrics, telemetry and alerting against quality measures.
  • Leads self and others to analyze data, develop new tools, telemetry and alerting.
  • Leads by example within the team by producing extensible and maintainable code.
  • Optimizes, debugs, refactors, and reuses code to improve performance and maintainability, effectiveness, and return on investment (ROI).
  • Applies metrics to drive the quality and stability of code, as well as appropriate coding patterns and best practices.
  • Maintains communication with key partners across the Microsoft ecosystem of engineers.
  • Ensures alignment with partners' expectations.
  • Considers partner teams across organizations and their end goals for products to drive and achieve desirable user experiences and fitting dynamic needs of partners/customers through product development.
  • Drives identification of dependencies and the development of design documents for a product, application, service, or platform.
  • Creates, implements, optimizes, debugs, refactors, and reuses code to establish and improve performance and maintainability, effectiveness, and return on investment (ROI).
  • Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook, working on call to monitor system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status and initiates actions to restore system/product/service for simple and complex problems when appropriate.
  • Help ensure Azure platform is consistent on performance, can scale on-demand, and engineered to withstand the unparalleled computing demand from the customer workloads.
  • Help building a test-driven engineering culture to reduce regressions and bugs in production and will set a higher bar for infrastructure quality.
© 2024 Teal Labs, Inc
Privacy PolicyTerms of Service