Truist Financial Corporation · Posted about 2 months ago
Full-time • Mid Level
Raleigh, NC
5,001-10,000 employees
Credit Intermediation and Related Activities

The Production Support Engineer III is responsible for ensuring the operational integrity, availability, and performance of mission-critical systems. This role involves managing technical incidents, troubleshooting recurring issues, and implementing permanent solutions to maintain system stability. The Engineer will collaborate with cross-functional teams to resolve incidents efficiently and improve system resiliency through proactive monitoring and automation.

  • Handle the identification, triage, and resolution of medium-to-high priority incidents with minimal supervision, keeping the impact on business operations as low as possible.
  • Collaborate with development teams, business partners, and other stakeholders to diagnose and resolve technical issues, implementing long-term fixes to prevent incident recurrence.
  • Use monitoring tools (e.g., Splunk, Dynatrace, CloudWatch) to detect performance issues and execute corrective actions promptly.
  • Enhance system observability to proactively detect issues and improve overall system performance and stability.
  • Develop and maintain automation scripts that streamline routine production support tasks and reduce manual intervention.
  • Implement automation strategies to improve production stability and minimize downtime.
  • Maintain clear and detailed documentation of troubleshooting procedures, contributing to the shared knowledge base.
  • Help improve the incident, problem, and change management processes in line with ITIL best practices.
  • Participate in root cause analysis and suggest process improvements to enhance system stability and performance.
  • Collaborate with cross-functional teams in resolving recurring production support issues and optimizing workflows.
  • Actively mentor junior support engineers, fostering technical growth within the team.
  • Escalate complex or unresolved issues to senior engineers or technical experts when necessary.
  • Bachelor's degree in Computer Science, Information Systems, Engineering, or a related discipline.
  • Six to ten years of experience in production support or related technical roles.
  • Experience managing incident response, triage, and production support functions in both on-premises and cloud environments.
  • Proficiency with IT Service Management (ITSM) tools such as ServiceNow, and familiarity with incident, problem, and change management processes.
  • Strong experience with monitoring tools such as Dynatrace, Splunk, or CloudWatch for proactive issue detection and troubleshooting.
  • Understanding of infrastructure, application technology stacks, and the software development lifecycle.
  • Strong analytical and problem-solving skills with a focus on root cause analysis.
  • Ability to work independently, handle issues of medium to high complexity, and escalate critical problems to senior staff as needed.
  • Experience supporting Agile teams and processes.
  • Financial services industry experience.
  • Familiarity with Site Reliability Engineering (SRE) practices.
  • All regular teammates (not temporary or contingent workers) working 20 hours or more per week are eligible for benefits, though eligibility for specific benefits may be determined by the division of Truist offering the position.
  • Truist offers medical, dental, vision, life insurance, disability, accidental death and dismemberment, tax-preferred savings accounts, and a 401k plan to teammates.
  • Teammates also receive no less than 10 days of vacation (prorated based on date of hire and by full-time or part-time status) during their first year of employment, along with 10 sick days (also prorated), and paid holidays.
  • Depending on the position and division, this job may also be eligible for Truist's defined benefit pension plan, restricted stock units, and/or a deferred compensation plan.