Site Reliability Engineer

Akamai | Cambridge, MA
$75,700 - $136,300 | Hybrid

About The Position

Our team designs, develops, and manages the applications and infrastructure that support Akamai Cloud's products and services. Our SRE teams solve reliability, security, and usability challenges at scale for our global fleet while keeping Akamai's mission at the forefront of what we do: make life better for billions of people, billions of times a day. In this role, you will focus on configuration management, Infrastructure as Code (IaC), and CI/CD. You will design, develop, and operate infrastructure deployment for Akamai Cloud.

Requirements

  • Relevant experience and a Bachelor's degree in Computer Engineering, Computer Science, or equivalent.
  • Demonstrate experience in a Site Reliability or Software Engineering role working with large-scale distributed systems.
  • Have experience with Terraform, including module development, state management, workspace design, policy enforcement, and enterprise-scale Infrastructure as Code implementations.
  • Have experience managing Infrastructure as Code solutions using tools such as Terraform, SaltStack, Ansible, Chef, Puppet, or similar technologies.
  • Have experience designing, developing, and deploying software and infrastructure at scale in a Linux environment.
  • Have strong communication and interpersonal skills.

Responsibilities

  • Designing, developing, testing, and operating critical services that support the reliability, scalability, and performance of our infrastructure.
  • Designing and implementing observability solutions, including monitoring, logging, alerting, and telemetry capabilities, to proactively detect and resolve issues.
  • Driving reliability improvements through automation, reducing operational toil and increasing the resilience of engineering processes.
  • Developing technical expertise in IaC systems and serving as a trusted technical resource by mentoring engineers and sharing best practices.
  • Collaborating with software engineering, infrastructure, and platform teams to investigate complex production issues, identify root causes, and implement long-term corrective actions.
  • Participating in an on-call rotation and providing leadership during incident response, driving timely service restoration, effective communication, and post-incident improvement efforts.

Benefits

  • Healthcare
  • 401(k) savings plan
  • Company holidays
  • Vacation (in the form of PTO)
  • Sick time
  • Family-friendly benefits, including parental leave
  • Employee assistance program with a focus on mental and financial wellness
  • Employee Stock Purchase Plan (ESPP)