About The Position

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. SREs also keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale that are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.

SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, and we strive to create an environment that provides the support and mentorship needed to learn and grow.

In this role, you will be responsible for designing, building, and evolving Google's internet-facing edge infrastructure. We are the gateway to Google, and our mission is to deliver unparalleled reliability, scalability, and observability for the ingress/egress load balancing systems that power all of Google's services.

Requirements

  • Bachelor’s degree in Computer Science, a related technical field, or equivalent practical experience.
  • 2 years of experience with programming in one or more programming languages.
  • 2 years of experience working with system administration (e.g., filesystems, inodes, system calls) or networking (e.g., Transmission Control Protocol/Internet Protocol, routing, network topologies and hardware, Software-Defined Networking).

Nice To Haves

  • Master's degree in Computer Science or Engineering.
  • 2 years of experience designing, analyzing, and troubleshooting distributed systems.

Responsibilities

  • Be responsible for the uptime, performance, and scalability of the software systems that load balance and route traffic for all of Google's services and Google Cloud customers.
  • Design, develop, and deploy software that improves the automation, monitoring, and operational efficiency of our systems. This includes building tools to prevent problem recurrence and to manage one of the largest networks in the world.
  • Participate actively in the design and implementation of the next generation of our edge infrastructure.
  • Engage in deep technical analysis of our systems, including performance tuning, capacity planning, and troubleshooting issues in our environment.