Staff Site Reliability Engineer
Valimail · Posted: August 29, 2023 · Remote
About the position
As a Staff Site Reliability Engineer at Valimail, you will play a crucial role in building a more trusted email ecosystem by working closely with industry-leading companies. Your main responsibility will be to lead highly visible initiatives that aim to secure billions of inboxes. You will evangelize standard methodologies for building and operating highly reliable systems, serve as a subject matter expert in observability and monitoring, consult in system design to meet reliability and capacity requirements, and automate infrastructure, app deployments, and configuration management.
Responsibilities
- Evangelize standard methodologies for building and operating highly reliable systems
- Serve as the subject matter expert in observability and monitoring
- Consult in system design to meet reliability and capacity requirements
- Automate infrastructure, app deployments, and configuration management
- Conduct timely post-mortems of production infrastructure incidents
- Lead the team in all aspects of operational security and compliance
- Seek out potential threats to security and reliability and advocate solutions
- Participate in an on-call rotation to receive escalations
- Work with Amazon Web Services, Kubernetes, and Terraform
Requirements
- 10+ years of SRE or DevOps experience
- Passion for reliable, scalable, observable software with a strong sense of ownership
- Experience developing and monitoring mission-critical systems
- Experience building Infrastructure as Code (preferably with Terraform)
- Substantial experience with programming languages (Ruby, Python, and Golang)
- Working knowledge of a configuration management tool
Benefits
- Competitive pay plus participation in the employee stock option plan
- Comprehensive health, dental, and vision coverage
- Eight weeks of paid maternity leave and four weeks of paid paternity leave
- Remote-first company: work from anywhere within the US
- Unlimited and flexible PTO