About The Position

We are seeking a Senior Site Reliability Engineer / Cloud Engineer to join the platform and operations team at Autheo. In this senior role you will design, build, and operate reliable, secure, and scalable cloud infrastructure that powers blockchain production services and Web3-enabled applications within the Autheo ecosystem. You will bring deep expertise in AWS, blockchain node operations, automation, observability, and AIOps to keep mission-critical distributed systems available, performant, and resilient. You will also partner closely with Engineering, DevOps, Support, and Product teams to drive operational excellence and reduce toil through automation.

Requirements

  • 7+ years in Cloud, SRE, Systems, or DevOps Engineering roles, including:
      • 5+ years operating production workloads on AWS
      • 3+ years supporting blockchain infrastructure, nodes, Web3 applications, DeFi, etc.
  • Strong hands-on experience with AWS services (EC2, EKS, ECS, S3, RDS/Aurora, VPC/VPN, Route53, ALB/NLB, KMS, IAM, Secrets Manager, Lambda, EventBridge, CloudWatch, ECR)
  • Production experience with containers & Kubernetes
  • Proficiency with IaC (Terraform, Helm, AWS CDK) and automation/scripting (Python, Bash, or Go preferred)
  • Working experience with CI/CD (GitHub Actions, Jenkins, Argo, etc.)
  • Demonstrated experience with observability systems (Datadog, Prometheus, OpenTelemetry, ELK, CloudWatch, Wazuh)
  • Practical exposure to AIOps concepts (event correlation, predictive diagnostics, anomaly detection, automated response)
  • Experience supporting 24×7 on-call rotation for production services
  • Strong understanding of distributed systems, reliability patterns, and fault tolerance
  • Experience participating in major incident response and post-incident reviews
  • Experience supporting production blockchain, cryptocurrency, and Web3 systems
  • Hands-on support for crypto wallets (e.g., MetaMask, Trust Wallet, Ledger, WalletConnect) including transaction troubleshooting
  • Working knowledge of blockchain explorers (Etherscan, BscScan, PolygonScan, Arbiscan) for transaction/wallet inspection
  • Good understanding of smart contracts; working experience with Solidity and OpenZeppelin
  • Ability to collect logs, TXIDs, wallet addresses, and reproduction steps for escalation

Nice To Haves

  • AWS Certifications (Solutions Architect, DevOps Engineer, SysOps Administrator)
  • Deep experience with blockchain, Web3, or decentralized system operations
  • Proven SRE methodology experience, including automation, CI/CD, and IaC development
  • Experience in compliance-driven environments (SOC2, PCI, ISO27000)
  • Reliability-first mindset with strong ownership during high-pressure incidents
  • Excellent collaboration across Engineering, Product, Support, and Operations
  • Advocate for automation, data-driven decisions, and operational excellence
  • Capable of mentoring junior engineers and cross-functional peers

Responsibilities

  • Cloud Platform, Network & Infrastructure
      • Architect, deploy, and operate highly available AWS infrastructure optimized for blockchain workloads
      • Implement Infrastructure as Code (IaC) using Terraform for repeatable, auditable provisioning
      • Manage production container platforms (EKS, ECS, Kubernetes, Docker, ECR)
      • Operate and optimize EC2, S3, EBS/FSx, Lambda, and related services
      • Design VPCs, VPNs, subnets, security groups, routing, load balancers, and network isolation
      • Implement IAM, KMS, and Secrets Manager for identity, encryption, and key management
      • Apply scaling techniques for RPC endpoints (load balancing, caching, throttling) and manage public/private peer connectivity
      • Support and troubleshoot Amazon Linux, Oracle Linux, and Windows Server environments
  • Blockchain Infrastructure & Node Operations
      • Deploy, operate, and maintain blockchain nodes (full/archive/light clients) and RPC endpoints on EVM-compatible chains (Ethereum, Polygon, BNB Chain, etc.)
      • Optimize node performance, storage, networking, and containerization using Docker/Kubernetes
      • Monitor and troubleshoot blockchain health metrics (block height, peer count, sync status, logs, memory, throughput)
  • Blockchain Application & Service Support
      • Support on-chain/off-chain interactions, transactions, gas fees, signing, wallets, smart contract invocations, and state queries
      • Troubleshoot blockchain errors (transaction failures, RPC timeouts, indexing lag, sync divergence)
      • Work with API gateways and middleware services (Infura, Alchemy, QuickNode, or equivalents)
  • Blockchain Data & Indexing
      • Implement indexing for event logs, state, and transactions using tools like The Graph, ETL pipelines, custom services, or database-backed explorers
  • Site Reliability Engineering, Automation & DevOps
      • Implement Terraform, Helm, and GitOps workflows for infrastructure lifecycle management
      • Enforce resilient, automated, scalable design patterns and collaborate on faster, higher-quality deployments
      • Own availability, latency, performance, capacity, and SLOs/SLIs/SLAs with observability-driven insights
      • Lead on-call rotations, incident response for S1/S2 events, post-incident reviews, and preventive initiatives
      • Reduce operational toil through automation; build and own CI/CD pipelines (Jenkins, GitHub Actions), including Terraform validation, Docker builds, and Helm deployments
  • Observability & Monitoring
      • Instrument blockchain workloads for metrics, logs, traces, predictive signals, and anomaly detection using Datadog, Prometheus, Grafana, ELK, CloudWatch, OpenTelemetry, and Wazuh
      • Build automated alerting, anomaly detection, diagnostics, and end-to-end observability strategies
  • AIOps & Operational Intelligence
      • Implement AIOps for event correlation, anomaly detection, predictive diagnostics, automated remediation, and self-healing (using AWS SageMaker, Bedrock, and other AI tools)
      • Drive security threat detection/prioritization, capacity planning, forecasting, cost control, and reporting
  • Security & Compliance
      • Enforce cloud security best practices, vulnerability remediation pipelines, and compliance guardrails (SOC2, PCI, ISO27000)
      • Manage cryptographic materials, KMS/HSM, and wallet abstractions (HD, custodial/non-custodial, multisig)