System Development Engineer, Silicon, Google Cloud

Google•Sunnyvale, CA

17h•$138,000 - $198,000•Onsite

About The Position

Systems Development Engineering (SDE) at Google is a role where you manage services and systems at scale. SDEs creatively put their engineering discipline to use automating the mundane and reducing toil. We don’t just write code to fix bugs; we emphasize developing tools and solutions that fix classes of problems. We know it’s hard to control what you can’t measure, so we focus on observability: instrumenting first, then turning data into knowledge, and finally knowledge into action. We know that the operational efficiency of Google systems, services, virtual compute environments and the operating systems that power them impacts the environment, not just the bottom line.

We know that working together we can do more, and that community matters. Google brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow. Together we engineer and build the infrastructure, tools, access and telemetry for systems that enable orchestration of Google-scale services. Come build things that matter.

EDACloud is a large, complex service that requires high availability. In this role, you will maintain high uptime in a way that accommodates the growth we're seeing. Planning, attention to detail, and a healthy amount of paranoia are needed for success.

The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide. We're the driving team behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.

Requirements

Bachelor's degree in Computer Science or IT-related field, or equivalent practical experience.
3 years of experience with systems automation, and with systems design and implementation.
3 years of experience with technical infrastructure (e.g., deployment, maintenance, troubleshooting).
3 years of experience with troubleshooting across Linux and networking.

Nice To Haves

5 years of experience with Linux operating systems internals and administration, technical infrastructure (e.g., deployment, maintenance, troubleshooting), and with reliability of technical infrastructure.
3 years of experience with one or more programming/scripting languages (Go, Python, Bash).
3 years of experience in cloud systems design.
Experience in large scale environments, server virtualization, deployments and automation/configuration management, networking, and security fundamentals.
Excellent communication, people management, problem-solving, and presentation skills.
Passion for technology and delivering exceptional user experiences.

Responsibilities

Troubleshoot technical issues, evaluate technical data, and develop recommendations for systems and services within the domain. Respond to tickets/bugs within team-defined service-level objectives (SLOs), and contribute to systems and services in related domains through bug reports and consultation.
Improve operations work and reduce support toil by proposing and implementing automation and process improvements.
Scale systems sustainably through mechanisms like automation. Evolve systems by pushing for changes that improve reliability and velocity.
Work with customers and developers to define distributed systems requirements and testing procedures, and to propose solutions.
Participate in a limited on-call rotation and monitor production distributed computing infrastructure.