Senior/Staff Software Engineer (SRE)

Chainguard

$144,000 - $200,000

About The Position

As a Site Reliability Engineer, you will help us design, automate, and scale secure-by-default cloud infrastructure so that uptime stays high and on-call stays uneventful. We are seeking a talented and experienced SRE to join our team to develop and maintain cloud-based infrastructure. You will be responsible for designing, building, and scaling robust infrastructure, including observability, metrics, and alerting. You will also keep our work sustainable by promoting best practices for deployment, incident management, and disaster recovery.

Requirements

Comfortable working and thriving within a Linux ecosystem
Experience supporting high-availability, distributed production systems
Experience with database administration and support
Experience treating infrastructure as code with tools such as Terraform, Ansible, Chef, Puppet, or SaltStack
Familiarity working in a public cloud platform (GCP, AWS, Azure)
Software development skills in at least one of the following languages: Python, Go, JavaScript, or Ruby
B.S. or M.S. in Computer Science or a related field, or equivalent work experience
Strong English language skills and the ability to work both independently and as an effective part of a globally distributed team
Ability to learn about the supply chain security space

Nice To Haves

Experience scaling services in a performant and cost-effective manner
Implemented incident management and disaster recovery playbooks
Knowledge of microservices architecture and containerization (Docker/OCI, Kubernetes)
Familiarity across multiple public cloud platforms (GCP, AWS, Azure)
Operated a multi-tenant-capable software-defined network (SDN)
Linux systems troubleshooting and debugging skills
Solid understanding of data structures, algorithms, API design, and software design patterns
Interest in open source software projects and communities

Responsibilities

Practice continuous improvement by iterating on how services are deployed, configured, monitored, and maintained on our platform
Lead incident response, diagnosis, and follow-up on system outages and alerts
Help develop an operational focus and act as a thought leader for the rest of engineering
Maintain and optimize infrastructure for performance, scalability, and cost
Analyze system metrics and identify opportunities for improvement in reliability and efficiency

Benefits

Flexible & Remote-First Culture: Work remotely with team meetup opportunities, bi-annual destination summits, and a monthly stipend for coworking spaces, phone and internet costs
Our Approach to Equity: Receive stock options upon hire and promotion. Plus, you can participate in secondary offerings and have 10 years to exercise your options
100% Covered Health Insurance: We cover 100% of your health, vision and dental insurance premiums for you and your dependents
∞ Flexible Time Off: Take the time you need to recharge and reset
Paid Parental Leave: 18 weeks for birthing parents and 12 weeks for non-birthing parents

What This Job Offers

Career Level

Senior

Education Level

Bachelor's degree

Number of Employees

251-500 employees