Senior Service Reliability Engineer

Sony
Aliso Viejo, CA
Hybrid

About The Position

PlayStation is a global leader in entertainment, producing products and services like PlayStation®5, PlayStation®4, PlayStation®VR, and PlayStation®Plus. Sony Interactive Entertainment (SIE) strives for an inclusive environment. The Gaming, Developer and Future Technology Group (GDFT) at Sony Computer Entertainment is leading the cloud gaming revolution, putting console-quality video games on any device. The Service Reliability Engineering team plays a significant role in delivering a great cloud gaming experience by influencing design and operational decisions towards overall service stability. SREs focus on overall ownership of production, production code quality, and deployments. The successful candidate will be self-directed and able to participate in decision-making at different levels, providing critical feedback during various phases of the operational lifecycle. The team is engaged throughout the software development lifecycle, ensuring operational readiness and stability.

Requirements

  • 7+ years of working experience in a Software Development and/or Linux Systems Administration role.
  • Strong interpersonal, written and verbal communication skills.
  • Available to be scheduled in on-call rotation.
  • Proficient as a Linux Production Systems Engineer, with experience managing large scale Web Services infrastructure.
  • Development experience in one or more of the following programming languages: Python, Bash, Go, Java, C++, or Rust
  • Experience with at least 3 of the following topics:
      • Distributed data storage at scale (Hadoop, Ceph)
      • NoSQL at scale (MongoDB, Redis, Cassandra)
      • Data Aggregation technologies (Elasticsearch, Kafka)
      • Scaling and running traditional RDBMS (PostgreSQL, MySQL) with High Availability
      • Monitoring & Alerting (Prometheus, Grafana), and Incident Management toolsets
      • Kubernetes and/or AWS (deployment and management)
      • Software Distribution (Package management and distribution at scale)
      • Configuration Management (Ansible, SaltStack, Puppet, Chef)

Nice To Haves

  • Software performance analysis and load testing (QA or SDET experience)

Responsibilities

  • Take a leadership role in ongoing improvements in Reliability and Scalability
  • Work closely with SRE Management to define KPIs and processes and to drive continuous improvement
  • Influence the architecture and implementation of solutions within the division
  • Mentor more junior SRE staff and enable them for success
  • Act as a voice to represent SRE in the wider organization
  • Represent the operational scalability of solutions in the wider division
  • Lead small-scale projects from inception to implementation
  • Design platform-wide solutions and provide technical leadership during their implementation
  • Demonstrate a high level of organizational skill and initiative in the role

Benefits

  • medical
  • dental
  • vision
  • matching 401(k)
  • paid time off
  • wellness program
  • coveted employee discounts for Sony products
  • bonus package