Software Engineer, Site Reliability
Redpanda Data · Posted: August 8, 2023 · Remote
About the position
We are looking for an experienced Site Reliability Engineer (SRE) to join our growing team at Redpanda. As an SRE, you will play a crucial role in building and shaping the technical culture of our company. Your main responsibilities will include building new services, automating infrastructure lifecycle, and monitoring our services to ensure reliability, scalability, and high performance. You will also be involved in designing and implementing observability-as-code, building tools and services for automated infrastructure management, and participating in on-call rotations. This is an exciting opportunity to work with a distributed engineering team and contribute to the future of real-time data.
Responsibilities
- Be a part of the SRE team, working with all of engineering on building new services, automating infrastructure lifecycle on Kubernetes, and monitoring services with the goal of offering a reliable, scalable, and high-performance SaaS.
- Build systems and services to turn toil into automation.
- Design and implement observability-as-code.
- Build tools and services to allow automated infrastructure management and self-healing.
- Participate in on-call rotations, working to keep customer workloads running and incident-free.
Requirements
- 5+ years of experience in an SRE-like role
- Comfortable working with a 100% distributed engineering team, collaborating on GitHub, in the open
- Strong understanding of Go
- Experience with the ecosystem of both commercial and open-source observability
- Strong experience with AWS and GCP
- Experience managing Kubernetes
- Experience running highly scalable production workloads on Kubernetes
- Experience managing infrastructure predictably through GitOps and IaC
- Willingness to participate in an on-call rotation
- Excellent written communication skills
- B.S. in Computer Science or equivalent experience
Benefits
- Competitive base salary range
- Remote-first company
- Opportunity to work with a team of seasoned engineers, hackers, and builders
- Funding from premier investors including GV and Lightspeed
- Building the future of streaming data
- Opportunity to shape the technical culture
- Exciting challenges and opportunities to learn
- Partnership with product, customer success, and cross-functional engineering teams
- Opportunity to work with SLIs and SLOs
- Experience operating a SaaS platform
- Fluency in TypeScript, C++, and Python (nice to have)
- Opportunity to work with streaming platforms
- Salary ranges determined by role, level, and location
- Culture based on trust, transparency, communication, and kindness