Site Reliability Engineer Jobs

Staff Site Reliability Engineer

PlayStation Global, San Diego, CA
Onsite

About The Position

As a member of the Commerce Reliability Engineering team, you will carry the responsibility of keeping our monetization platform highly available and resilient, while continually enabling our service teams to deliver new and exciting product and technical features. Our team strives to iteratively learn, improve and automate our processes every single day, which continually improves operational excellence within our organization. You will be empowered to be a technical leader on our team, helping identify and proactively drive improvements in both process and technology.

Requirements

  • BS degree in Computer Science, Engineering, or related technical subject area
  • 7+ years of hands-on AWS experience integrating, developing and managing applications
  • 10+ years of relevant work experience in a high-volume and/or critical production software environment
  • 10+ years of hands-on software engineering or systems engineering experience (Java and/or C++ services)
  • 5+ years of experience building automation into daily operational processes using one or more programming languages (preferably Python or Go)
  • Strong experience in configuring, tuning and automating operational responsibilities for AWS managed data services including RDS, DynamoDB and ElastiCache
  • Experience with monitoring and log management tools (e.g., Datadog, CloudWatch, Splunk)
  • Experience with container technologies and orchestration (e.g., Docker, Kubernetes, EKS, Fargate)
  • Hands-on experience in triaging and tuning Java cloud applications with integration into AWS
  • Solid understanding of AWS networking systems and protocols (e.g., ALB, Route 53, API Gateway, TCP/IP, HTTP/HTTPS, DNS)
  • Experience with developing or supporting Continuous Integration and Continuous Delivery/Deployment (CI/CD) pipelines

Responsibilities

  • Hands-on application management of over 100 commerce- and payment-related services within an AWS cloud environment, ensuring availability, resiliency, scalability and performance.
  • Work side by side with our service development teams to develop, automate and ensure the production readiness of all new services and features introduced.
  • Apply, integrate and automate the configuration and ongoing operations of AWS managed services.
  • Identify areas for operational process improvement and automation. Drive the hands-on development of scripts and tools to automate these processes within our environment.
  • Increase observability on our platform by implementing robust monitoring and alerting patterns across our services. Develop rich, informative dashboards / reports on our services that provide valuable insight, and develop meaningful alerting patterns to drive down the MTTD and MTTR on platform incidents.
  • Collaborate and partner with other SRE teams that specialize in areas such as data services, data platform, and platform hosting to inspire changes and ensure optimal application performance and resiliency across all back-end services within PlayStation.
  • Iteratively lead performance and capacity validation analysis for our commerce platform services. Utilize AWS patterns and technologies such as spot instances, dynamic auto-scaling and EKS to make the most of our AWS spend.
  • Review service flows and architecture to influence resiliency, availability and scalability for all services within our platform.
  • Provide rotational on-call support where you’ll detect, respond to, triage and resolve production incidents on the commerce and payments platform.
  • Conduct root cause analyses, then document and present the findings to share incident insights with our broader engineering organization.

Benefits

  • medical
  • dental
  • vision
  • matching 401(k)
  • paid time off
  • wellness program
  • employee discounts for Sony products
  • bonus package

Frequently Asked Questions

Common questions about Site Reliability Engineer careers and jobs.

Based on current job postings on Teal, the average Site Reliability Engineer salary in the US is approximately $196,000 per year, with a typical range of $73,000 to $252,000.
© 2026 Teal Labs, Inc