Staff Site Reliability Engineer
Valimail · Posted: August 31, 2023 · Remote
About the position
As a Staff Site Reliability Engineer at Valimail, you will play a crucial role in building a more trusted email ecosystem by working closely with industry-leading companies. Your main responsibilities will include evangelizing standard methodologies for building reliable systems, serving as a subject matter expert in observability and monitoring, and consulting in system design to meet reliability and capacity requirements. Additionally, you will automate infrastructure, app deployments, and configuration management, conduct post-mortems of production infrastructure incidents, and lead operational security and compliance efforts. With your expertise in SRE or DevOps, passion for scalable software, and experience with technologies like AWS, Kubernetes, and Terraform, you will contribute to securing billions of inboxes.
Responsibilities
- Evangelize standard methodologies for building and operating highly reliable systems
- Serve as the subject matter expert in observability and monitoring
- Consult in system design to meet reliability and capacity requirements
- Automate infrastructure, app deployments, and configuration management
- Conduct timely post-mortems of production infrastructure incidents
- Lead the team in all aspects of operational security and compliance
- Seek out potential threats to security and reliability and advocate solutions
- Participate in an on-call rotation to receive escalations
- Work with Amazon Web Services, Kubernetes, and Terraform
Requirements
- 10+ years of SRE or DevOps experience
- Passion for reliable, scalable, observable software with a strong sense of ownership
- Experience developing and monitoring mission-critical systems
- Experience building Infrastructure as Code (preferably with Terraform)
- Substantial experience with programming languages (Ruby, Python, and Golang)
- Working knowledge of a configuration management tool
- Expertise in observability and monitoring
- Ability to consult in system design to meet reliability and capacity requirements
- Proficiency in automating infrastructure, app deployments, and configuration management
- Ability to conduct timely post-mortems of production infrastructure incidents
- Familiarity with operational security and compliance
- Ability to identify potential threats to security and reliability and propose solutions
- Willingness to participate in an on-call rotation for escalations
- Experience with Amazon Web Services, Kubernetes, and Terraform
Benefits
- Competitive pay and participation in the employee stock option plan
- Comprehensive health, dental, and vision coverage
- Eight weeks of paid maternity leave and four weeks of paid paternity leave
- Remote-first company: you can work from anywhere within the US
- Unlimited and flexible PTO