Senior Site Reliability Engineer

ZetaChain•San Francisco, CA

$140,000 - $190,000•Remote

About The Position

We're building something ambitious at ZetaChain: the first universal blockchain and AI platform that connects everything (Bitcoin, Ethereum, Solana, and more) while pioneering work in the GenAI space. We're backed by top investors, live on mainnet, and shaping the future of blockchain and AI technology. If you're excited about working on big, meaningful problems with a world-class team, you're in the right place.

We are looking for a Senior Site Reliability Engineer (SRE) to ensure the reliability, scalability, and security of ZetaChain’s production infrastructure. This role is highly hands‑on and execution‑focused. You will operate critical blockchain and AI‑adjacent infrastructure, build automation to reduce operational overhead, and partner closely with protocol, platform, and AI teams to design systems that are reliable by default.

Requirements

4+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or Platform Engineering
Strong software engineering background with production experience in Go and/or Python
Deep experience operating Linux systems in production
Proven experience running Kubernetes at scale
Experience supporting high‑availability distributed systems
Comfortable working in fast‑moving startup environments
Strong security mindset, especially for infrastructure running on public or adversarial networks
Excellent collaboration and communication skills
Languages: Go, Python, Bash, Terraform, Ansible
Infrastructure: Kubernetes, Docker, Linux
Observability: Prometheus, Grafana, Datadog, Loki, incident.io
Platforms: AWS, GCP, bare metal
Blockchain Stack: Cosmos SDK, Tendermint / CometBFT, Ethereum, Bitcoin

Nice To Haves

Exposure to AI‑powered infrastructure, observability, or developer tooling
Experience operating blockchain nodes or validator infrastructure
Familiarity with Cosmos‑based chains or EVM clients
Experience with DevOps, DevSecOps, or GitOps methodologies
Contributions to open‑source software

Responsibilities

Operate and maintain production blockchain infrastructure, including validators, RPC services, indexers, and supporting services
Ensure high availability and performance for AI‑enabled developer platforms and internal tooling
Build and maintain monitoring, alerting, and dashboards for protocol, infrastructure, and application health
Write high‑quality automation and infrastructure code to reduce toil and improve reliability
Participate in on‑call rotations, incident response, and post‑incident reviews
Partner with engineering teams to embed reliability, scalability, and security best practices into system design
Improve Kubernetes reliability across cloud and bare-metal environments
Continuously refine deployment, rollback, and recovery strategies

Benefits

Make a direct impact on infrastructure powering both blockchain and AI platforms
Work on technically challenging, real‑world distributed systems
Fully remote with quarterly in-person team meetups
Strong open‑source culture and modern engineering practices
Competitive compensation and meaningful ownership
