Senior Distributed Storage SRE Engineer

Tencent | Palo Alto, CA
$106,100 - $199,300 | Onsite

About The Position

We are seeking a Senior Distributed Storage SRE Engineer to be responsible for the daily operation and maintenance of distributed storage systems, including online releases, software deployment, monitoring, and inspection. The role focuses on ensuring the stability of block storage, designing and implementing disaster recovery solutions, and improving service reliability, scalability, and performance to guarantee the system SLA. You will also manage and plan resources for block storage and related systems to improve resource efficiency, and help build the operation and maintenance support platform by developing tools that improve operational efficiency. A key part of this role is responding quickly to online incidents: discovering, debugging, and resolving common faults, latent risks, and performance problems, and implementing emergency plans and fault recovery strategies.

Requirements

  • Bachelor’s degree in Computer Science or a related technical field, or equivalent practical experience.
  • Experience with Unix/Linux operating system internals (e.g., filesystems, storage devices).
  • Experience with networking (e.g., TCP/IP, routing) or cloud systems.
  • Experience with analyzing and troubleshooting storage systems.
  • Programming experience in one or more languages such as Shell, Python, or Go.

Nice To Haves

  • Experience designing or managing large-scale distributed storage systems, an understanding of distributed system principles, and familiarity with open-source distributed storage systems (e.g., NAS, HDFS, Ceph).
  • Familiarity with cloud products, practical experience with block storage, and the ability to handle common block storage problems.
  • Experience with SRE work (e.g., online releases, monitoring, daily inspections) and scripting.
  • Strong sense of responsibility and the ability to respond to and resolve problems in a timely manner.

Responsibilities

  • Responsible for the daily operation and maintenance of distributed storage systems (e.g., online releases, software deployment, monitoring, inspection).
  • Responsible for the stability of block storage and for the design and implementation of disaster recovery solutions; drive improvements in service reliability, scalability, and performance to guarantee the system SLA.
  • Responsible for resource management and planning of block storage and related systems to improve resource efficiency.
  • Participate in the construction of the operation and maintenance support platform, develop tools, and improve operational efficiency.
  • Respond quickly to online incidents; discover, debug, and resolve common faults, latent risks, and performance problems; and implement emergency plans and fault recovery strategies.

Benefits

  • Sign-on payment
  • Relocation package
  • Restricted stock units
  • Medical benefits
  • Dental benefits
  • Vision benefits
  • Life benefits
  • Disability benefits
  • 401(k) plan
  • 15 to 25 days of vacation per year
  • 13 days of holidays
  • 10 days of paid sick leave per year