We are looking for a Site Reliability Engineer 3 to support mission-critical cloud services and production operations. The role focuses on improving service reliability, reducing operational risk, automating repetitive tasks, and driving faster detection and resolution of issues. The engineer will work closely with development, infrastructure, security, and operations teams to monitor service health, troubleshoot production issues, participate in incident response, improve observability, and implement reliability best practices. The role also includes analyzing recurring failures, building automation, supporting deployments, and contributing to capacity planning, disaster recovery, and operational readiness.

In addition, the engineer will work on rollouts and deployments across a number of regions and realms. The engineer forecasts demand and responds to capacity needs, and collaborates with software development teams to build reliable, scalable infrastructure. Other responsibilities include collecting data to maintain and optimize operations and reliability, applying operational knowledge to incident response and maintenance tasks, providing health and performance reporting, and identifying opportunities for automation. The engineer communicates about services, identifying and explaining the potential impact of changes; provides technology support and documents incidents; and experiments with new tools, assesses their potential impact, and keeps up with site reliability trends.
Job Type
Full-time
Career Level
Senior