Senior Site Reliability Engineer

Launch Legends • Cheyenne, WY

About The Position

Autheo is building a resilient DePIN platform guaranteeing 99.999% uptime across 1,000+ global nodes. The platform processes millions of TPS, 200GB/s of storage traffic, and AI inference, with unbreakable security and automated GDPR/HIPAA compliance maintained during zero-downtime upgrades. As a part-time Senior Site Reliability Engineer in an equity-based cofounder role, you’ll maintain infrastructure resilience, implement chaos engineering, and optimize for high-availability blockchain operations. This role is critical to sustaining 1B+ TPS, 200GB/s DePIN flows, and AI/ML workloads in a decentralized environment. If you’re passionate about reliability and optimization, join us to fortify the backbone of the next trillion-dollar decentralized economy.

Requirements

Bachelor’s or Master’s degree in Computer Science, or equivalent experience.
5+ years in SRE for high-availability systems (99.999% uptime).
Expertise in Kubernetes, Prometheus, Grafana, OpenTelemetry.
Proficiency in chaos engineering, Terraform/Ansible, and compliance auditing.

Nice To Haves

Background in blockchain/DePIN operations or AI infrastructure.
Experience with open-source SRE tools or multi-cloud environments.
Contributions to SRE standards or patents in reliability engineering.

Responsibilities

Infrastructure Resilience
Maintain 99.999% uptime across 1,000+ global nodes with automated failover and zero-downtime upgrades.
Implement chaos engineering to test resilience against failures in blockchain/DePIN operations.
Optimize for 1B+ TPS and 200GB/s storage with proactive capacity planning.
Monitoring & Observability
Deploy Prometheus/Grafana for real-time monitoring of blockchain anomalies and DePIN performance.
Integrate OpenTelemetry for distributed tracing, targeting an MTTR under 15 minutes.
Build ML-powered alerting for threat detection and resource imbalances.
Compliance & Security
Embed GDPR/HIPAA-compliant monitoring with automated audit logging.
Implement zero-trust security for DePIN networks and AI inference pipelines.
Design disaster recovery plans for blockchain/DeFi incidents, targeting a 95% recovery success rate.
Automation & Optimization
Automate infrastructure provisioning and scaling with Terraform/Ansible.
Optimize Kubernetes for blockchain node operations and AI workloads.
Conduct post-mortems and drive SLO/SLI improvements for continuous reliability.
Collaboration & Innovation
Collaborate with DePIN, blockchain, and AI/ML teams for integrated reliability.
Lead SRE reviews for scalability and compliance.
Mentor engineers and contribute to open-source SRE tools.
Publish at SREcon/Web3 Summit on reliability innovations.