Senior Site Reliability Engineer

Prosper•San Francisco, CA

About The Position

You will be a senior technical contributor on the SRE team, responsible for the reliability, scalability, and security of Prosper's Cloud Platform portfolio. This is as much a platform engineering role as it is an SRE role: you will maintain the applications that run on our platform, drive alignment to platform standards, and ensure services stay current with their frameworks and dependencies. We are building an agentic, AI-first operations model in which AI agents handle investigations, deployments, audits, and optimizations, and you will be at the center of designing and governing that system. You will share ownership of application-layer reliability, CI/CD pipelines, and observability while building the skills, rules, and guardrails that allow AI agents to operate safely alongside human engineers.

Requirements

7+ years in SRE, DevOps, or Platform Engineering
Deep expertise with a major cloud provider (GCP preferred) and Kubernetes
Strong infrastructure-as-code experience with multi-environment patterns
Production CI/CD pipeline design
Observability and APM platform experience
Strong written communication: your documentation will be consumed by humans and AI agents alike

Nice To Haves

Experience building or integrating AI agents into operational workflows
Hands-on with LLM-powered development tooling
Background in designing guardrails or policy engines for automated systems
Track record of building internal developer platforms or self-service infrastructure

Responsibilities

Design and author AI agent skills: structured playbooks that encode investigation, deployment, and optimization workflows
Own application-layer reliability within Kubernetes-based compute (managed by the Infrastructure Engineering team) across all environments
Maintain and upgrade platform applications: drive framework upgrades, dependency updates, and alignment to platform standards
Drive infrastructure-as-code with modular, multi-environment patterns
Participate in on-call rotation and lead incident response
Build and maintain observability across cloud monitoring and APM platforms
Own the Internal Developer Platform: CI/CD pipelines, deployment tooling, and developer self-service
Mentor junior SREs and shape team standards

Benefits

Flexible time off
Comprehensive health coverage
Competitive salary
Paid parental leave
Wellness benefits including access to mental health resources, virtual HIIT and yoga workouts
Udemy access
Childcare assistance
Pet insurance discounts
Legal assistance
