Production Support Analyst

mthree•Charlotte, NC

52d

About The Position

We are looking for local candidates. Want to work in technology in the financial industry? We are looking for a Production Support Analyst to join our client's team in Charlotte, NC. The ideal candidate is energetic, has excellent written and verbal communication skills, follows up reliably, and has a curious nature that drives them to solve problems. The role requires the ability to work in a fast-paced, high-demand environment. A financial background is desired.

About mthree: Since 2010, mthree has been helping clients solve their business and technological challenges. We are a technology and business consultancy with a global workforce delivering significant business and IT projects in some of the largest financial services organizations worldwide.

Core Services:
Consulting and Advisory
Managed Services
Alumni Graduate Program
Alumni Pro Program

We have a global presence and are experts in delivering exceptional quality to our client base, providing consulting services across Risk, Regulation & Compliance; Vendor Products; Application Support; Application Development; Cyber & Information Security; Data Science; and DevOps.

Our Expert program offers experienced professionals access to top roles in tech, finance, aviation, and insurance. Join us to work on groundbreaking technology projects, from international trading platforms to critical applications for leading airlines. We recruit professionals who are eager to fast-track their careers in technology or operations within prestigious global organizations.

The role's responsibilities span Platform & Reliability Engineering, Service Transition & Operational Readiness, Observability & Stability, and Chaos Engineering & Continuous Improvement; the individual duties are listed under Responsibilities.

Requirements

Experience levels range from junior (3-5 years) to mid-range (10+ years).
Service management experience, payments knowledge, and technical knowledge of frameworks and tools such as Spring Boot, MongoDB, Kafka, Kubernetes, and CI/CD pipelines
Hands-on experience with UNIX and SQL to assist with troubleshooting
Knowledge of automation using scripting languages such as Python, Bash, Perl, or Ruby
Excellent analytical and communication skills
Ability to prioritize and willingness to take ownership
Problem-solving mindset and the ability to act as a solution enabler
Strong troubleshooting skills

Nice To Haves

A financial background is desired.

Responsibilities

Embed SRE and production engineering principles into Payments Modernization from design through early life support
Define and validate non-functional requirements (NFRs) covering resilience, scalability, observability, recovery, and operability
Drive replay, retry, and exception-handling validation for event-driven payment flows
Lead capacity and performance testing, including volume growth and peak event scenarios (e.g. FedNow, CHIPS, SWIFT)
Own Permit-to-Operate readiness across environments (NFR Testing)
Define cutover, shadow support, and early life support models
Ensure runbooks, support procedures, on-call readiness, and escalation paths are production-grade before go-live
Partner with Change Assurance to apply risk-based release controls, canary/blue-green strategies, and rollback automation
Implement end-to-end observability across Kafka, MongoDB, API layers, and downstream payment components
Define and monitor SLOs, error budgets, and golden signals
Reduce alert noise through signal design, correlation, and automation
Analyze early defects and exception patterns (ACK/NACKs, business errors) to drive stabilization
Design and execute controlled failure testing (chaos engineering) to validate recovery patterns and blast radius
Lead blameless RCAs, ensuring corrective actions are owned and recurrence is prevented
Drive continuous service improvement (CSI) initiatives, including automation, resilience uplift, and technical debt reduction