Senior Site Reliability Engineer

Prosper•San Francisco, CA

Hybrid

About The Position

You will be a senior technical contributor on the SRE team, responsible for the reliability, scalability, and security of Prosper’s Cloud Platform portfolio. This is as much a platform engineering role as it is an SRE role: you will maintain the applications that run on our platform, drive alignment to platform standards, and keep services current on their frameworks and dependencies. We are building an agentic, AI-first operations model in which AI agents handle investigations, deployments, audits, and optimizations, and you will be at the center of designing and governing that system. You will share ownership of application-layer reliability, CI/CD pipelines, and observability while also building the skills, rules, and guardrails that allow AI agents to operate safely alongside human engineers.

Requirements

7+ years in SRE, DevOps, or Platform Engineering
Bachelor's degree in a technical field, or equivalent work experience
Deep expertise with a major cloud provider (GCP preferred) and Kubernetes
Strong infrastructure-as-code experience with multi-environment patterns
Experience designing production CI/CD pipelines
Observability and APM platform experience
Strong written communication — your documentation will be consumed by humans and AI agents alike

Nice To Haves

Experience building or integrating AI agents into operational workflows
Hands-on experience with LLM-powered development tooling
Background in designing guardrails or policy engines for automated systems
Track record of building internal developer platforms or self-service infrastructure

Responsibilities

Design and author AI agent skills — structured playbooks that encode investigation, deployment, and optimization workflows
Own application-layer reliability within Kubernetes-based compute (managed by the Infrastructure Engineering team) across all environments
Maintain and upgrade platform applications — drive framework upgrades, dependency updates, and alignment to platform standards
Drive infrastructure-as-code adoption using modular, multi-environment patterns
Participate in on-call rotation and lead incident response
Build and maintain observability across cloud monitoring and APM platforms
Own the Internal Developer Platform — CI/CD pipelines, deployment tooling, and developer self-service
Mentor junior SRE engineers and shape team standards