Senior Site Reliability Engineer
People.ai · Posted: August 25, 2023 · Onsite
About the position
The Senior Site Reliability Engineer at Prove is responsible for implementing a software engineering approach to operations, using software to manage systems, solve problems, and automate tasks. They will lead the efforts in creating and supporting scalable and reliable software systems, ensuring the availability and reliability of services, and managing large systems through code. The role requires knowledge of Site Reliability Engineering principles, experience in deploying and monitoring software, and a strong passion for improving reliability and scalability. The Senior Site Reliability Engineer will also mentor colleagues, lead multi-team efforts, and have a strong focus on documentation and training.
Responsibilities
- Be familiar with Site Reliability Engineering principles such as error budgets and toil.
- Have extensive knowledge of how software is deployed, configured, and monitored.
- Have experience creating observable production applications and services that make it possible to answer questions about unknowns.
- Experience working with canaries and experiments.
- Excellent communication skills for giving other engineers clear advice and feedback on reliability and scalability.
- Promote, maintain and enhance our cultural values of humility, passion, inclusion and leadership.
- Understand the application software architecture and data flow with particular interest in all aspects that can affect performance and reliability.
- Identify repetitive tasks, then propose and implement automation to remove toil.
- Exhibit a strong curiosity and passion for expanding and deepening knowledge.
- Have a culture of leading and owning software and services from start to end ("you write it, you wear it"), delivering quality for users as well as documentation and training for users and team members.
- Mentoring colleagues on different subjects related to the SRE work.
- Leading multi-team efforts and communities around the use of technologies and good practices.
- Familiarity with chaos engineering and capacity planning.
- Ability to deliver production-ready code for operations.
- Strong passion for producing documentation and training material for other teams.
- Working on-call shifts for SRE-supported environments.
Requirements
- Familiarity with Site Reliability Engineering principles like error budgets and toil
- Extensive knowledge of software deployment, configuration, and monitoring
- Experience in creating observable production software applications and services
- Experience working with canaries and experiments
- Excellent communication skills for providing advice and feedback on reliability and scalability
- Promotion and enhancement of cultural values of humility, passion, inclusion, and leadership
- Understanding of application software architecture and data flow
- Ability to identify repetitive tasks and propose automation solutions
- Strong curiosity and passion for expanding knowledge
- Ownership of software and services from start to end, including documentation and training
- Mentoring colleagues on SRE work
- Leading multi-team efforts and communities in technology and good practices
- Familiarity with chaos engineering and capacity planning
- Ability to deliver production-ready code for operations
- Strong passion for producing documentation and training material
- Willingness to work on-call shifts for SRE-supported environments
- 4 to 8 years of production engineering experience OR software engineering experience with sufficient production exposure
- Kubernetes architecture and operations experience
- Expertise in application and service telemetry using standards like OpenTelemetry
- Good coding and automation skills, preferably with experience in Golang
- Strong Linux and networking fundamentals
Benefits
- Competitive salaries, Bonus Plan (for eligible roles), and Equity Plan
- 401(k) Retirement Plan & Match
- Comprehensive medical benefits for you and your family
- Emotional & Physical Wellness – Access to wellness services (EAP, Gympass, Prove Well-Being Reimbursement)
- Unlimited Vacation and Flexible hours
- Professional Development Coaching via Bravely
- Catered healthy lunches and bottomless snacks & beverages at all office locations
- 12 paid holidays for all global employees
- A great place to work and connect with other talented Provers like yourself!