Distributed Systems Engineer, Cassandra, Storage

DoorDash USA
San Francisco, NY

About The Position

About the Team

The Storage teams build and operate online stateful systems and abstractions that are reliable, efficient, secure, and easy to use for DoorDash Engineering. The Cassandra team under Storage is looking for passionate distributed systems engineers to architect and scale the next evolution of our Cassandra ecosystem. This team plays a critical role in scaling and hardening DoorDash's foundational key-value storage systems, powering mission-critical Tier 0 workloads across the company.

This is a rare opportunity to shape the future of Cassandra at DoorDash, designing and scaling systems to meet 100x growth demands. You will help evolve our Cassandra ecosystem with a focus on end-to-end automation, including building a self-service control plane, deep observability, and advanced online-to-offline infrastructure. You will also contribute to the upstream Apache Cassandra ecosystem where applicable, working alongside the open-source community.

About the Role

The Storage team is building and operating a high-performance, scalable data abstraction layer that optimizes reliability and efficiency. You will help us bootstrap and scale our internal distributed database infrastructure centered around Cassandra, with a focus on reliability, operability, security, and efficiency. This infrastructure will be the backbone of a platform that strives to manage itself and disappear into the background, enabling engineers to focus on building product experiences our customers love.

You will report to the Engineering Manager on our Storage team within the Core Infrastructure organization. You will have the opportunity to contribute to open-source projects and improve the ecosystem for its community.

Requirements

  • You have 5+ years of experience building distributed systems and a deep understanding of Cassandra internals, including replication, compaction, GC tuning, and JVM performance.
  • You have experience managing large fleets of database clusters, including building tools, monitoring, capacity planning, backups and disaster recovery strategies.
  • You are proficient in system-level fundamentals including operating systems, hardware, and networking.
  • You understand the tradeoffs of distributed system consistency, failure modes, and partition tolerance.
  • You have contributed to or are interested in contributing to open-source distributed database projects.
  • You thrive in a fast-paced, execution-driven environment and have a proven track record of delivering impactful storage solutions.
  • You are comfortable regularly exercising discretion and independent judgment in performing job duties, including evaluating options, making informed decisions, and determining appropriate courses of action within the scope of assigned responsibilities.

Responsibilities

  • Play a foundational role in shaping the direction of the Cassandra ecosystem at DoorDash, helping define long-term strategy and platform capabilities.
  • Build the future of Cassandra at DoorDash, including self-serve provisioning, elastic scaling, multi-tenant controls, and deeply integrated observability.
  • Operate one of DoorDash’s most critical stateful systems, supporting high-throughput, low-latency production workloads at scale.
  • Partner with the Storage Orchestration team to define automation and fleet management capabilities.
  • Lead architectural improvements that enhance availability, fault tolerance, consistency, and multi-region durability.
  • Define operational standards for schema governance, disaster recovery, backup and restore, and capacity planning.
  • Have the chance to troubleshoot, debug and learn from interesting production problems at multiple layers of the infrastructure stack.

Benefits

  • a 401(k) plan with employer matching
  • 16 weeks of paid parental leave
  • wellness benefits
  • commuter benefits match
  • paid time off and paid sick leave in compliance with applicable laws (e.g. Colorado Healthy Families and Workplaces Act)
  • DoorDash also offers medical, dental, and vision benefits, 11 paid holidays, disability and basic life insurance, family-forming assistance, and a mental health program, among others.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000 employees
