Senior Site Reliability Engineer, CCIP

Chainlink Labs
Charlotte, NC
$129,000 - $244,000
Remote

About The Position

As a Senior Site Reliability Engineer on the CCIP Platform team, you will ensure the reliability, scalability, and operational excellence of the systems powering Chainlink's Cross-Chain Interoperability Protocol (CCIP). This role exists to strengthen production resilience, reduce operational toil, and enable engineering teams to ship safely while maintaining high service availability. You will influence reliability practices across the platform and help establish operational standards that scale with the business.

Requirements

  • Demonstrated experience in Site Reliability Engineering, Production Engineering, or a similar role operating large-scale distributed systems.
  • Deep expertise defining, implementing, and driving adoption of SLOs, SLIs, and error budgets across engineering organizations.
  • Experience building and operating production Kubernetes environments that support critical services.
  • Experience applying OpenTelemetry to improve observability across distributed systems.
  • Experience improving the reliability, scalability, and operability of production infrastructure.

Nice To Haves

  • Demonstrated technical leadership influencing reliability practices across engineering teams.
  • Experience performing capacity planning and performance tuning for high-throughput distributed services.
  • Previous experience working on Web3 infrastructure or within a crypto-native engineering organization.
  • Experience applying chaos engineering or fault-injection techniques to improve production resilience.
  • Experience partnering with software engineering teams to conduct production-readiness reviews before service launches.
  • Experience leading on-call operations, including defining rotations, escalation policies, and improving alert quality.

Responsibilities

  • Improve deployment safety and increase delivery velocity by advancing production engineering practices.
  • Establish distributed tracing across the platform to improve observability and accelerate incident investigation.
  • Eliminate operational toil through automation that increases engineering efficiency and platform reliability.
  • Drive adoption of meaningful SLOs, SLIs, and error budgets that guide engineering decisions and improve service health.
  • Increase platform scalability and operational readiness as CCIP continues to grow.
  • Strengthen Chainlink's reputation through highly available production systems while reducing operational overhead.