Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed systems problems? Join the Mapping SRE team!
The Mapping SRE team is responsible for overseeing and improving the availability, reliability, performance and change management procedures of Akamai's mapping system. Our system routes trillions of client requests per day, controlling tens of terabits per second of content traffic served to clients worldwide. Our team defines KPIs, advances the state of measurements, monitoring dashboards and alerts, and investigates complex production issues.
Partner with the best
In this role, you'll work closely with cross-functional teams to understand and improve the performance, availability and reliability of Akamai's Mapping Service. You'll define key performance indicators (KPIs), advance the state of monitoring, alerting and operational responses, and investigate complex performance issues.
As a Senior Site Reliability Engineer, you will be responsible for:
Monitoring, investigating, and analyzing performance and availability by (co)designing, managing, and tracking product-related SLIs/SLOs
Solving problems and avoiding recurrence by developing tools and prototypes to proactively monitor service performance and availability
Working closely with product engineers to advocate reliable and scalable system design for supportability, resilience and reliability
Leveraging skills in data analysis, network diagnostics and debugging tools to characterize performance and recommend improvements
Engaging with our support, operations and engineering teams to investigate and troubleshoot complex problems, including incident management and post-mortem reviews
Collaborating with internal teams to help troubleshoot and resolve escalations and incidents for our customers
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees