Site Reliability Engineer II

Akamai

2d•$95,000 - $171,000•Remote

About The Position

Do you like collaborating across teams to solve complex problems? Do you have a passion for cutting-edge technologies and tackling system problems?

Join our highly skilled Site Reliability team

Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating solutions that help improve observability and enforce SLAs across all internal teams. We do all of this while maintaining Akamai's mission to make life better for billions of people, billions of times a day.

Partner with the best

As a Site Reliability Engineer II - Observability, you will collaborate across operations teams and application development teams. Together, you will be creating tooling and software that monitors and improves the reliability of our systems. You'll work with a diverse range of technologies as we release new applications and modernize existing tooling.

Requirements

Have 2 years of relevant experience and a Bachelor's degree in Computer Science or its equivalent
Have professional experience in a Site Reliability, Development, or SysAdmin role, working with large-scale distributed systems
Have in-depth experience working with modern observability tools such as OpenTelemetry, Prometheus, Grafana, Loki, or similar
Be familiar with distributed queueing technologies such as Kafka, Redpanda, NATS, or similar
Have experience with containerization technologies such as Docker or Podman and container orchestration (Kubernetes)
Have experience developing applications and scripts using languages such as Go, Python, Bash, Rust, or similar
Have familiarity with infrastructure-as-code tools such as Terraform or Pulumi
Have experience with continuous integration / continuous deployment tools such as Jenkins, GitHub Actions, or similar

Responsibilities

Deploying and maintaining our observability platform and internal tooling
Partnering across teams to ensure the reliability, scalability and usability of our products and services
Providing guidance to engineers and developers to increase confidence that their services are performing as expected
Collaborating with our support, operations, and engineering teams to investigate and troubleshoot complex problems
Improving our system monitoring and analysis platforms to ensure rapid error detection and remediation, including developing automated remediations
Participating in on-call rotations, guiding restoration and repair of service-impacting issues

Benefits

We provide benefits surrounding all aspects of your life:
Your health
Your finances
Your family
Your time at work
Your time pursuing other endeavors
Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.
