The Sr. Site Reliability Engineer is responsible for driving the reliability, performance, and scalability of services with minimal instruction. This role involves tackling non-routine assignments and resolving moderately complex issues that directly impact the service’s stability and effectiveness. The Sr. Site Reliability Engineer applies a deep understanding of software and systems engineering principles to design and implement solutions that enhance service reliability. This position requires good judgment and the ability to prioritize work effectively while contributing to the overall goals of the SRE team and organization.

Note: This role may come into contact with confidential or sensitive customer information requiring special treatment in accordance with Red Hat policies and applicable privacy laws.

What you will do:
Lead the development and implementation of robust code and automation scripts to improve service reliability and scalability.
Conduct thorough code reviews and testing processes to ensure the highest quality standards in the codebase.
Work to solve moderately complex issues, making decisions that impact the service's reliability and performance.
Mentor and guide junior engineers, fostering a collaborative environment focused on continuous improvement.
Engage in a regular on-call rotation, taking responsibility for critical incidents and ensuring timely resolution.
Lead incident response and postmortem processes, implementing solutions to prevent recurrence of issues.
Collaborate with cross-functional teams to design, develop, and refine SRE tools and systems that support service objectives.
Take ownership of tasks and projects, prioritizing them according to their impact on service health and team goals.

What you bring:
A bachelor's degree in Computer Science or a related technical field involving software or systems engineering is required.
However, hands-on experience that demonstrates your ability and interest in Site Reliability Engineering is valuable to us and may be considered in lieu of degree requirements. You must have some experience programming in at least one of these languages: Python, Golang, C, C++, or another object-oriented language. You must have experience working with public clouds such as AWS, GCP, or Azure. You must also be able to troubleshoot and solve problems collaboratively in a team setting. As an SRE, you will be most successful if you have some experience troubleshooting an as-a-service offering (SaaS, PaaS, etc.) and some experience working with complex distributed systems. Direct experience with Kubernetes or OpenShift is a plus. We like to see a demonstrated ability to debug and optimize code and to automate routine tasks. We are Red Hat, so you need a basic understanding of Unix/Linux operating systems.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees