Truist Bank • posted 10 days ago
Full-time • Mid Level
Charlotte, NC
5,001-10,000 employees

The Production Support Engineer III is responsible for ensuring the operational integrity, availability, and performance of mission-critical systems. This role involves managing technical incidents, troubleshooting recurring issues, and implementing permanent solutions to maintain system stability. The Engineer will collaborate with cross-functional teams to resolve incidents efficiently and improve system resiliency through proactive monitoring and automation.

  • Handle the identification, triage, and resolution of medium-to-high priority incidents with minimal supervision to ensure business operations are minimally impacted.
  • Collaborate with development teams, business partners, and other stakeholders to diagnose and resolve technical issues, implementing long-term fixes to prevent incident recurrence.
  • Use monitoring tools (e.g., Splunk, Dynatrace, CloudWatch) to detect performance issues and execute corrective actions promptly.
  • Enhance system observability to proactively detect issues and improve overall system performance and stability.
  • Develop and maintain automation scripts to streamline routine production support tasks, reducing manual interventions.
  • Implement automation strategies to improve production stability and minimize downtime.
  • Maintain clear and detailed documentation of troubleshooting procedures, contributing to the shared knowledge base.
  • Provide assistance in improving the incident, problem, and change management processes, following ITIL best practices.
  • Participate in root cause analysis and suggest process improvements to enhance system stability and performance.
  • Collaborate with cross-functional teams in resolving recurring production support issues and optimizing workflows.
  • Actively mentor junior support engineers, fostering technical growth within the team.
  • Escalate complex or unresolved issues to senior engineers or technical experts when necessary.
  • Build and maintain the automation and streamlining of software delivery and operations for new or existing software applications through proficiency in capabilities and tools in the DevOps lifecycle including: Infrastructure as Code; Agile and DevOps Lifecycle Management; Source Code Management; Build Orchestration; Build Management; Artifact Repository Management; Behavior Driven Development; Test Driven Development; Automated Testing including Unit Testing, Integration Testing, Functional Testing, Smoke Testing, Regression Testing, Stress Testing, and Performance Testing; Static Code Analysis; Load and Performance Testing; Artifact Scanning; Database Schema Management, Orchestration and Recovery; Compliance Automation and Audit Trails; Configuration Management; Containers; Application Release Automation; Deployment Strategies and Patterns including Blue/Green Deployment, Canary Releases, and Rolling Releases; Logging and Log Analytics; and Performance Monitoring and Management.
  • Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related discipline.
  • Six to ten years of experience in production support or related technical roles.
  • Experience managing incident response, triage, and production support functions for both on-premises and cloud environments.
  • Proficiency with IT Service Management (ITSM) tools such as ServiceNow, and familiarity with incident, problem, and change management processes.
  • Strong experience with monitoring tools such as Dynatrace, Splunk, or CloudWatch for proactive issue detection and troubleshooting.
  • Understanding of infrastructure, application technology stacks, and the software development lifecycle.
  • Strong analytical and problem-solving skills with a focus on root cause analysis.
  • Ability to work independently, handle medium-to-complex issues, and escalate critical problems to senior staff as needed.
  • Experience in DevSecOps and support of CI/CD pipelines.
  • Experience supporting Agile teams and processes.
  • Financial services industry experience.
  • Familiarity with Site Reliability Engineering (SRE) practices.