Ceph Cluster Development Engineer (C++ Focus)

Fortinet Inc, Santa Clara, CA
$179,000 - $219,000

About The Position

We are seeking a highly skilled Ceph Cluster Development & Operations Engineer with strong expertise in C++ systems programming to design, extend, and maintain enterprise-scale Ceph distributed storage clusters. The role involves deep development in Ceph core subsystems (RADOS, OSD, RGW, MDS), performance optimization, and operational excellence across multi-site, multi-zone architectures. You will work closely with system architects, SREs, and cloud infrastructure teams to ensure the reliability, scalability, and security of mission-critical storage systems deployed across multiple data centers and Kubernetes environments.

Requirements

  • Strong proficiency in C++ (C++11 or later), with experience in large-scale distributed systems or kernel-adjacent development.
  • Deep understanding of Ceph architecture and its core components: MON, OSD, MGR, RGW, MDS, and CRUSH maps.
  • Proficient in Linux systems programming, debugging (gdb, perf, valgrind), and performance profiling.
  • Experience with Python or Go for tooling and automation.
  • Strong foundation in data replication, erasure coding, and consistency models in distributed storage.
  • Hands-on experience with Kubernetes, Rook-Ceph, Helm, Ansible, and related DevOps tools.
  • Familiarity with TCP/IP, HTTP/S3 APIs, block storage (RBD/iSCSI), and object storage semantics.
  • Ability to conduct root-cause analysis and lead performance investigations in production environments.

Nice To Haves

  • Contributions to the Ceph open-source project or prior experience modifying Ceph source code.
  • Experience with multi-site replication, object versioning, compliance retention, or legal hold features.
  • Background in distributed storage systems, file systems, or cloud storage platforms.
  • Familiarity with containerized environments, network virtualization, and cloud-native observability stacks.
  • Excellent technical documentation and communication skills in English.

Responsibilities

  • Design, build, and operate large-scale Ceph clusters, including RADOS, RGW, and RBD.
  • Contribute to or extend Ceph core components written in C++ (e.g., OSD, RGW, librados, BlueStore, MGR modules).
  • Profile and optimize performance across network, disk I/O, and replication layers (PG placement, CRUSH rules, BlueStore tuning).
  • Develop automation and tooling for cluster lifecycle management (deployment, upgrades, scaling, failover, and recovery).
  • Integrate Ceph with Kubernetes (via Rook-Ceph, CSI drivers) and CI/CD pipelines for continuous delivery.
  • Implement and validate multi-site replication and disaster recovery architectures for high availability.
  • Develop and maintain secure storage solutions using dm-crypt, KMS integration, and CephX authentication.
  • Build observability pipelines using Prometheus, Grafana, and custom exporters for metrics and health analytics.
  • Write and maintain SOPs, automation scripts, and system documentation to support production-grade operations.
  • Collaborate with the upstream Ceph community or maintain in-house forks for feature development and bug fixes.

Benefits

  • medical
  • dental
  • vision
  • life and disability insurance
  • 401(k)
  • 11 paid holidays
  • vacation time
  • sick time
  • a comprehensive leave program
  • equity program

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Professional, Scientific, and Technical Services
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000
