Production Support Engineer III

Truist • Atlanta, GA

Hybrid

Requirements

Must have a Bachelor's degree in Computer Science, Computer Engineering, CIS, or a related technical field.
Must have 6 years of progressive experience in production support positions, including incident management, triage, and production support functions for both on-premises and cloud environments.
Proficiency with IT Service Management (ITSM) tools such as ServiceNow, and familiarity with incident, problem, and change management processes.
Understanding of infrastructure, application technology stacks, and the software development lifecycle.
Experience with Dynatrace, Splunk, CloudWatch, DB2, SQL Server, Oracle, Microsoft Azure, SharePoint development, AWS, OpenShift, Kubernetes, GitLab, Ansible, shell scripting, Linux and AIX, IBM PowerHA, and Docker.

Responsibilities

Ensure the operational integrity, availability, and performance of mission-critical systems.
Manage technical incidents, troubleshoot recurring issues, and implement permanent solutions to maintain system stability.
Collaborate with cross-functional teams to resolve incidents efficiently and improve system resiliency through proactive monitoring and automation.
Handle the identification, triage, and resolution of medium-to-high priority incidents with minimal supervision to ensure business operations are minimally impacted.
Collaborate with development teams, business partners, and other stakeholders to diagnose and resolve technical issues, implementing long-term fixes to prevent incident recurrence.
Use monitoring tools (e.g., Splunk, Dynatrace, CloudWatch) to detect performance issues and execute corrective actions promptly.
Enhance system observability to proactively detect issues and improve overall system performance and stability.
Develop and maintain automation scripts to streamline routine production support tasks, reducing manual interventions.
Implement automation strategies to improve production stability and minimize downtime.
Maintain clear and detailed documentation of troubleshooting procedures, contributing to the shared knowledge base.
Provide assistance in improving the incident, problem, and change management processes, following ITIL best practices.
Participate in root cause analysis and suggest process improvements to enhance system stability and performance.
Collaborate with cross-functional teams in resolving recurring production support issues and optimizing workflows.
Actively mentor junior support engineers, fostering technical growth within the team.