Site Reliability Engineer (ID60188)

AgileEngine•Downey, CA

1d•Hybrid

About The Position

We are looking for an SRE Operations Engineer to keep production and staging environments running reliably across a cloud-based SaaS platform. You’ll respond to live incidents, reduce operational toil through automation, and improve observability using Kubernetes, Terraform, Grafana, and AWS. This is a hands-on role with real ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.

Requirements

2+ years of experience in Site Reliability Engineering, DevOps, or Production Operations
Experience with AWS supporting production environments
Experience supporting production SaaS applications
Strong understanding of CI/CD systems such as GitHub Actions, Jenkins, or CircleCI
Experience with GitOps and strong Git fundamentals
Experience using GitHub, Jira, and Confluence in collaborative environments
Experience with Kubernetes, including clusters run on EKS or provisioned with kOps
Experience with Docker and containerization
Experience with observability and alerting tools such as Grafana, Prometheus, Loki, or PagerDuty
Experience with scripting languages such as Bash, Python, or Go
Experience with Infrastructure as Code such as Terraform or Helm
Ability to work within structured operational processes and SLAs
Strong written and verbal English communication skills
Self-driven with a growth mindset

Nice To Haves

AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator
Experience in multi-tenant SaaS environments
Experience working in globally distributed teams
Familiarity with ChatOps practices
Experience improving monitoring quality and reducing alert fatigue

Responsibilities

Monitor and support production and staging environments in real time, ensuring high availability, performance, and stability
Respond to incidents, perform triage and root cause analysis, and contribute to post-incident reviews and remediation efforts
Participate in an on-call rotation with defined SLAs
Handle ad-hoc and unplanned operational requests from Product, Support, and internal teams
Maintain and enhance monitoring, alerting, dashboards, logs, and metrics, and improve observability practices
Support CI/CD pipelines, production releases, and GitOps workflows
Contribute to automation efforts to reduce operational toil
Maintain and improve Kubernetes-based infrastructure and containerized workloads
Support Infrastructure as Code practices and ongoing environment improvements

Benefits

Professional growth: Mentorship, TechTalks, and personalized growth roadmaps.
Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
Exciting projects: Modern solutions with Fortune 500 and top product companies.
Flextime: Flexible schedule with remote and office options.
