Site Reliability Engineer II

Akamai
$95,000 - $171,000, Remote

About The Position

Do you like collaborating across teams to solve complex problems? Do you have a passion for cutting-edge technologies and tackling system problems?

Join our highly skilled Site Reliability team

Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating solutions that improve observability and enforce SLAs across all internal teams. We do all of this while upholding Akamai's mission to make life better for billions of people, billions of times a day.

Partner with the best

As a Site Reliability Engineer II - Observability, you will collaborate with operations and application development teams. Together, you will create tooling and software that monitors and improves the reliability of our systems. You'll work with a diverse range of technologies as we release new applications and modernize existing tooling.

Requirements

  • Have 2 years of relevant experience and a Bachelor's degree in Computer Science or its equivalent
  • Have professional experience in a Site Reliability, Development, or SysAdmin role, working with large-scale distributed systems
  • Have in-depth experience working with modern observability tools such as OpenTelemetry, Prometheus, Grafana, Loki, or similar
  • Be familiar with distributed queueing technologies such as Kafka, RedPanda, NATS, or similar
  • Have experience with containerization technologies such as Docker or Podman and container orchestration (Kubernetes)
  • Have experience developing applications and scripts using languages such as Go, Python, Bash, Rust, or similar
  • Have familiarity with infrastructure-as-code tools such as Terraform or Pulumi
  • Have experience with continuous integration / continuous deployment tools such as Jenkins, GitHub Actions, or similar

Responsibilities

  • Deploying and maintaining our observability platform and internal tooling
  • Partnering across teams to ensure the reliability, scalability, and usability of our products and services
  • Providing guidance to engineers and developers to increase confidence that their services are performing as expected
  • Collaborating with our support, operations, and engineering teams to investigate and troubleshoot complex problems
  • Improving our system monitoring and analysis platforms to ensure rapid error detection and remediation, including developing automated remediations
  • Participating in on-call rotations, guiding restoration and repair of service-impacting issues

Benefits

  • We provide benefits covering all aspects of your life: your health, your finances, your family, your time at work, and your time pursuing other endeavors.
  • Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.
© 2024 Teal Labs, Inc