Reliability Engineer – OnePay

Synchrony•Alpharetta, GA

$85,000 - $140,000•Hybrid

About The Position

The Reliability Engineer - OnePay plays a pivotal technical role within Synchrony Financial, ensuring high availability of the applications behind OnePay integrations, maintaining and enhancing customer experiences, and delivering operational excellence and adherence to program SLAs. This role provides the technical expertise and rigor to identify and remediate failures or looming issues that could impact customer experiences. We are looking for a reliable and curious candidate who excels at problem analysis, troubleshooting, and situational awareness in the context of distributed systems. This is also a hands-on technologist role requiring exposure to SRE and DevOps technology stacks and a strong understanding of application support processes, including monitoring and addressing incidents and alerts across engineering applications and ensuring effective coordination and handoffs with vendors, partners, and internal Synchrony teams.

Requirements

Bachelor’s degree and a minimum of 3 years of relevant experience in application development, reliability engineering, systems engineering, and/or production application support (or equivalent practical experience); or, in lieu of a degree, a high school diploma/GED and 5+ years of relevant experience.
Good understanding of the nature of distributed systems and cloud providers.
Solid understanding of cloud concepts such as containerization, message queues, load balancing, data replication, and HA patterns.
Understanding of IT application support processes, including incident management, problem resolution, and operational/support metrics used for decision-making.
Knowledge of UNIX operating system fundamentals.
Familiar with network programming concepts and protocols.
Proficiency in DevOps concepts and Site Reliability Engineering (SRE) principles, including automation, monitoring, and reliability best practices.
Hands-on experience with scripting/automation in at least one language such as Python, Bash, JavaScript, PowerShell, Go, or similar.
Familiar with one or more configuration automation tools such as Terraform, Ansible, Puppet, or Chef.
Understanding of the infrastructure of the applications supported.
Working knowledge of SDLC and Agile methodologies such as Scrum and Kanban.
Strong communication skills (verbal and written) and ability to interact with multiple audiences including developers, managers, and senior executives.
Customer-focused mindset; self-driven and detail-oriented; strong organizational and time management skills; able to operate with limited supervision.
Well-developed analytical and problem-solving skills.
Continuously seeks opportunities to enhance products and services through process improvements.
Ability and flexibility to travel for business as required.

Nice To Haves

Strong alignment to DevOps tools and SRE best practices.
ITIL Foundation and/or SRE/DevOps certifications.
Experience with cloud providers such as AWS, Azure, GCP (including practical exposure to deployment processes such as AWS/PCF where applicable).
Knowledge of an application or systems language such as Java, Go, Rust, or C++.
Familiar with toolsets such as Jira, PagerDuty, OpsGenie, Kibana, Grafana, Splunk, and application performance monitoring tools such as New Relic.
Experience supporting or coordinating CI/CD pipelines (e.g., Jenkins, CloudBees) and release processes.
BS in Computer Science / Software Engineering, or equivalent.

Responsibilities

Drive investigations with cross-functional teams to understand failures, analyze production defects, troubleshoot systems, identify root cause, and implement fixes to prevent recurrence.
Work with peers to enhance observability, including establishing/maintaining dashboards and monitoring capabilities (e.g., Splunk/New Relic and similar tools), and improving alerting and operational readiness.
Ensure high standards of quality, availability, scalability, performance, and security of internally developed applications.
Continuously monitor the health and performance of engineering applications, production servers, and key service indicators; provide monitoring/reporting as needed.
Support release and operational processes, including troubleshooting CI/CD pipeline issues (e.g., in Jenkins pipelines) and coordinating releases with partner teams as needed.
Participate in Agile sprints with cross-functional teams (multiple technologies, personnel, and processes) and contribute to continuous delivery and reliability outcomes.
Identify opportunities to drive technology innovation, reliability improvements, simplifications, and process improvements.
Communicate status of technical stacks, incidents, and reliability initiatives to stakeholders and leadership.
Work closely with a blended team of Synchrony resources and third-party partners/contractors.
Participate in an on-call rotation to respond to critical production issues.
Perform other duties and/or special projects as assigned.