About The Position

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day.

As a software engineer, you will work on a specific project critical to Google’s needs, with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities, and be enthusiastic about taking on new problems across the full stack as we continue to push technology forward.

As a Software Engineer (SWE) in Cloud Kubernetes AI Infra, you will build cutting-edge cloud technology that powers AI Infra for a number of key, innovative companies. Your role is to work with product teams to make Kubernetes the best software for managing AI/ML workloads at scale.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Requirements

  • Bachelor’s degree or equivalent practical experience.
  • 5 years of experience with distributed Machine Learning, Machine Learning infrastructure, distributed systems, and Machine Learning algorithms.
  • Experience architecting and developing distributed systems (e.g., Google Cloud).
  • Experience in AI/ML-related engineering: developing, deploying, managing, and maintaining Machine Learning infrastructure.
  • Experience with Kubernetes.

Nice To Haves

  • Master's degree or PhD in Computer Science or related technical field.
  • 5 years of experience with data structures/algorithms.
  • 1 year of experience in a technical leadership role.
  • Experience developing accessible technologies.
  • Experience with deep learning.

Responsibilities

  • Scope, design, code, test and operate Kubernetes clusters and components to manage hybrid and on-prem Tensor Processing Unit (TPU) infrastructure at scale.
  • Partner with cross-functional teams to ship new AI Infra management features that advance Kubernetes’ ability to run massive-scale GenAI workloads.
  • Maintain production systems, including feature rollouts, reliability monitoring, integration testing, and on-call support.
  • Engage closely with customers, including the frontier AI labs, to drive the AI Infra roadmap and increase TPU adoption for AI/ML workloads.
  • Help shape the team’s culture so that it executes at a high level and is fun to work in.

Benefits

  • bonus
  • equity
  • benefits