Software Engineer III, Infrastructure, Cloud AI

Google•Sunnyvale, CA

50d•$141,000 - $202,000

About The Position

The XLA (Accelerated Linear Algebra) compiler is used across the Research to Production pipeline for both Training and Serving use cases for Tensor Processing Unit (TPU), Graphics Processing Unit (GPU) and Central Processing Unit (CPU) accelerators. You will work on projects that improve generalization across different hardware, frameworks and use cases by simplifying and standardizing compiler integration across different stacks.

The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide. We're the driving force behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.

Requirements

Bachelor’s degree or equivalent practical experience.
2 years of experience with software development in C++.
2 years of experience with developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage or hardware architecture.
2 years of experience testing, maintaining, or launching software products, and 1 year of experience with software design and architecture.

Nice To Haves

Master's degree or PhD in Computer Science or a related technical field.
2 years of experience with performance, large-scale systems data analysis, visualization tools, or debugging.
Experience in compilers or runtimes.
Proficiency in code and system health, diagnosis and resolution, and software test engineering.

Responsibilities

Write and test product or system development code.
Understand how accelerator compilers and runtimes interact at a high level.
Develop and apply metrics to understand the problem you are solving and gauge status/success as needed.
Close infrastructure (infra) gaps to help with ML stack maturation (e.g., reduce the number of ways something is done, improve reproducibility, improve tooling, improve usability).
Participate in design reviews with peers and stakeholders to decide amongst available technologies.
