Databases SRE Manager, ASE Cassandra SRE Manager

Apple • Seattle, WA

41d

About The Position

Apple’s Services Engineering organization (ASE) is seeking experienced database systems engineering managers to join our Databases SRE organization. The databases organization is expanding its footprint into Seattle and is seeking an experienced engineering manager to join and build out this site. The Databases SRE organization covers multiple database and data service technologies, including Apache Cassandra, Apache Kafka, Apache Solr, and Redis/Valkey, among other technologies. This position will initially focus primarily on the Apache Cassandra teams and will grow into a site lead role for all database technologies based at the Seattle site.
Managers in ASE Cassandra SRE teams develop and contribute to software built to manage Apache Cassandra, an open-source distributed database powering some of Apple's most critical internet services. You will be joining a team of experts, working at the cutting edge of modern database deployment architectures and distributed systems. The team's work is deployed at massive scale, serving millions of queries per second over hundreds of petabytes of data across our data centers worldwide. It also has significant impact, forming the platform upon which iCloud and many other internet services at Apple are built. In ASE, your work will benefit hundreds of millions of users and is critical to the success of some of the most visible current and future Apple features.
The ASE Cassandra SRE team develops applications and tooling that are safe, reliable, scalable, and fast. This work requires an innovative spirit and an extraordinary degree of care and rigor in engineering. Team members contribute to all major components of Cassandra deployment infrastructure, including maintenance automation, the backup service application, monitoring and alerting tooling and dashboards, and deployment architecture, and they contribute upstream patches to the database focused on stability, performance, and scaling.
As a leader in this organization, you will manage, develop, and grow a team responsible for Cassandra’s scalability and performance across Apple. You will also be responsible for growing the larger database and service organization at the Seattle campus, leading automation and scaling initiatives across the Cassandra, Kafka, Solr, Redis, and etcd SRE teams. We are seeking a hands-on manager with domain experience who is comfortable working in the details.
Familiarity with several of the following technical areas, and a desire to gain experience in the others, is expected:
Experience developing or leading work in distributed systems engineering.
Experience leading control plane services for managing data plane infrastructure.
Experience managing or developing critical internet services / platform infrastructure.
Understanding of distributed systems and database concepts (consistency models, isolation levels, crash and recovery semantics).
Performance engineering (design concepts, profile-guided optimization).
Service management across bare-metal, virtualized (EC2), and containerized (K8s) platforms.
Fundamentals of system-level hardware and networking components (storage devices and controllers, network interfaces, CPU and memory layout in server-class systems).
Operating systems concepts (process scheduling, disk and network I/O, performance).
Datacenter architecture (networking topologies, host placement strategies, and failure modes); design of multi-datacenter systems; failure domains; and wide-area networking.
This role also requires excellent communication, the ability to partner with our Core Storage and Analytics teams, and a high degree of customer focus when engaging with internal platform customers. Because the team is distributed, the ability to work effectively with colleagues based in other locations is also essential; experience in this area is a plus.
Prior experience with development or maintenance of distributed databases / storage systems is recommended.

Requirements

BS, MS, or PhD in Computer Science / related fields or equivalent work experience
Ability to rapidly build credibility with engineers, customers, and partners and quickly learn new technical domains.
Expertise in project management, project planning, and making the unpredictable predictable.
Experience developing or leading work in distributed systems, database systems, storage engines, validation engineering, or performance engineering.
Proficiency in modern Java and, optionally, Python or Go.
Understanding of core SRE concepts: monitoring, alerting, and incident management.
Experience leading control plane services for managing data plane infrastructure.
Experience managing or developing critical internet services / platform infrastructure.
Understanding of distributed systems and database concepts (consistency models, isolation levels, crash and recovery semantics).
Performance engineering (design concepts, profile-guided optimization).
Service management across bare-metal, virtualized (EC2), and containerized (K8s) platforms.
Fundamentals of system-level hardware and networking components (storage devices and controllers, network interfaces, CPU and memory layout in server-class systems).
Operating systems concepts (process scheduling, disk and network I/O, performance).
Datacenter architecture (networking topologies, host placement strategies, and failure modes); design of multi-datacenter systems; failure domains; and wide-area networking.
Excellent communication, ability to partner with our Core Storage and Analytics teams, and a high degree of customer focus when engaging with internal platform customers.
Ability to work effectively with colleagues based in other locations, which is essential because the team is distributed.

Nice To Haves

Advanced understanding of data structures and algorithms in storage and indexing.
Experience developing critical internet services and/or platform infrastructure.
Comfort working with geographically distributed partners.
Prior experience with development or maintenance of distributed databases / storage systems is recommended.

Responsibilities

Manage, develop, and grow a team responsible for Cassandra’s scalability and performance across Apple.
Grow the larger database and service organization at the Seattle campus, leading automation and scaling initiatives across the Cassandra, Kafka, Solr, Redis, and etcd SRE teams.
