Want to impact the foundation for future AI storage development in Azure, the world's computer? The Azure Managed Lustre File System (AMLFS) team leads development, deployment, and monitoring of the most popular High-Performance Computing (HPC) parallel file system in the world: Lustre, the Azure storage solution of choice for AI training and fine-tuning. The AMLFS Platform Team is responsible for end-to-end delivery of AMLFS images, cluster deployment, logs and metrics, and configuration compliance. An ideal candidate will also have opportunities to impact cluster architecture and design of Lustre in the Azure ecosystem, performance analysis and optimization of AMLFS, and customer support for the most challenging parallel file system bugs or performance anomalies that arise within our product.
As a Principal Software Engineer on the AMLFS Platform Team, you will lead design and development of key features, primarily working on reliable deployment of AMLFS in Azure, assessing and mitigating security risks, developing comprehensive unit and system-level tests, and diagnosing, mitigating, and fixing the most challenging deployment and upgrade issues that customers encounter. You will lead the design and development of logging, monitoring, and reporting capabilities for AMLFS and help define and measure key Service Level Indicators designed to make our product increasingly robust.
This opportunity will allow you to develop expertise in distributed systems and HPC/AI file system design, implementation, and debugging, grow proficient in navigating and managing Linux operating systems, and hone leadership qualities as you develop strong collaborative working relationships with the core storage, compute, and networking teams that form the foundation of Azure.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees