Research Internships at Microsoft provide a dynamic environment for research careers, with a network of world-class research labs led by globally recognized scientists and engineers who pursue innovation across a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

As an Applied Science Research Intern, you will work with a small team to investigate recent Small Language Model (SLM) architectures and techniques, such as recurrent transformers and universal transformers, as potential approaches for maximizing the throughput of Large Language Models (LLMs) with limited high-speed cache. For example, could a useful model be pinned to Very Tightly Coupled Memory (VTCM) on a Qualcomm System on Chip (SoC) for its entire lifecycle? Could the same be achieved in the fast caches of Graphics Processing Units (GPUs) or cloud Neural Processing Units (NPUs)? The choice of hardware target will be based on the candidate's passion for and experience with different platforms.

This opportunity will allow you to learn how to apply your model-training skills at scale using Azure compute. In addition, you will be mentored by a multidisciplinary team with expertise in both on-device implementation and state-of-the-art (SotA) approaches from the literature. This role will integrate you into the Applied Science Group (ASG) in Redmond and is not open to remote work; however, flexible working arrangements within Redmond and schedule flexibility are encouraged.
Career Level
Intern
Education Level
No Education Listed
Number of Employees
5,001-10,000 employees