Service Reliability - Monitor, measure, and analyze the system's performance and availability. Resolve a variety of high-impact problems and projects through in-depth evaluation of complex business processes, system processes, and industry standards. Provide expertise in the area and advanced knowledge of applications programming, and ensure application design adheres to the overall architecture blueprint. Use advanced knowledge of system flow to develop standards for coding, testing, debugging, and implementation. Develop comprehensive knowledge of how areas of the business, such as architecture and infrastructure, integrate to accomplish business goals. Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions. Serve as an advisor or coach to junior SRE engineers, allocating work as necessary.

Automation - Develop and maintain automated tools and systems to manage and monitor the infrastructure, reducing manual intervention, human error, and the time it takes to perform routine tasks.

Capacity Planning and Scalability - Periodically assess the capacity needs of services and scale them to handle increased usage. Plan resource allocation, manage load balancing, and ensure the system can handle demand fluctuations.

Incident Management - Detect, diagnose, and resolve issues quickly to minimize the impact on users and the business. Conduct post-incident reviews to learn from failures and improve the system's reliability.

Cross-Functional Collaboration - Work with development teams, product owners, and other stakeholders to ensure seamless delivery and alignment on common goals.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000 employees