Site Reliability Engineer

NatWest Group
Jersey, GA

About The Position

Join us as a Site Reliability Engineer. In this key role, you'll improve, drive, and embed non-functional and operational characteristics of our products and services, such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning. You'll work with support teams to investigate production incidents and optimise operational support of the eQ Payments platform. You'll enjoy significant stakeholder interaction, collaborating with engineers to ensure a principled approach to delivering change safely and securely. This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development.

Requirements

  • Strong knowledge of reliability systems thinking and experience of software engineering.
  • Experience of using a data-driven and scientific approach to fact finding.
  • Financial services knowledge and the ability to identify wider business impact, risk and opportunity.
  • Good knowledge and experience of programming languages.
  • Strong knowledge of deploy and release services, automation, and troubleshooting.
  • Experience of utilising tools and technology across the software development lifecycle.
  • Experience using mathematical and statistical models to assess trends.
  • Strong communication skills with the ability to proactively engage with a wide range of stakeholders.

Nice To Haves

  • Experience of payment platforms, in particular RBSI payment platforms including eQ.

Responsibilities

  • Work closely with feature teams and other colleagues to meet defined service level objectives and continually improve systems and environments.
  • Define error budgets that support finding the right balance between risk and reliability.
  • Provide structure and support to the release process, suggesting and making improvements where possible.
  • Scale systems sustainably through mechanisms like automation, evolving them by pushing for changes that improve reliability and velocity.
  • Coach and provide guidance to colleagues and the wider team, leading where required.
  • Proactively contribute new ideas and innovations to meet short-term and longer-term goals.
  • Continually balance and manage any potential risks.
  • Be accountable for the day-to-day health of both production and non-production environments and respond to any incidents as required.
  • Provide technical expertise and input to establish the risk tolerance of products and services.
  • Communicate incident status updates clearly and frequently to other teams, customers and stakeholders.