Truist Bank • Posted 7 days ago
Full-time • Mid Level
Charlotte, NC
5,001-10,000 employees

The Head of Observability and Monitoring will lead the strategy, architecture, and implementation of observability, monitoring, and telemetry capabilities within a regulated banking environment. This role is critical to ensuring the resilience, performance, and security of the Bank’s technology landscape. The ideal candidate will possess deep technical expertise, a strategic mindset, and strong collaboration skills to drive best-in-class monitoring solutions that align with regulatory and business requirements.

  • Develop and execute a comprehensive observability strategy, integrating logging, metrics, and distributed tracing across the Bank’s technology stack.
  • Lead the design and deployment of monitoring platforms, ensuring real-time visibility into system performance, availability, and security threats.
  • Own the end-to-end observability architecture, including tools selection, automation, and integration with cloud, on-prem, and hybrid environments.
  • Drive the adoption of AI/ML-powered monitoring to enhance anomaly detection, predictive analytics, and automated incident response.
  • Ensure robust service level indicators (SLIs), service level objectives (SLOs), and error budgets are established and tracked for critical services.
  • Define and implement observability governance frameworks, ensuring compliance with regulatory requirements (e.g., FFIEC, OCC, Basel III, GDPR).
  • Develop strategies to support real-time monitoring, root cause analysis, and proactive remediation to minimize downtime and business impact.
  • Partner with engineering, security, business unit, risk, and compliance teams to align observability initiatives with operational stability and performance targets, as well as with business continuity and disaster recovery plans.
  • Champion operational resilience by ensuring monitoring covers end-to-end customer journeys, critical business services, and third-party dependencies.
  • Establish and maintain a centralized observability platform, standardizing logging and metrics collection across microservices, APIs, databases, and infrastructure.
  • Work closely with platform teams to embed observability best practices into CI/CD pipelines and software development lifecycles.
  • Partner with Cybersecurity to integrate security monitoring, anomaly detection, and threat intelligence into observability solutions.
  • Engage with business and operations teams to ensure monitoring capabilities support customer experience, regulatory reporting, and incident management.
  • Serve as the Bank’s SME on observability, engaging with industry forums, vendors, and regulatory bodies to stay ahead of trends and compliance needs.
  • Proven expertise in modern observability stacks, such as Splunk, Dynatrace, AppDynamics, ThousandEyes, ServiceNow AIOps, or Datadog.
  • Deep understanding of cloud-native monitoring across AWS, Azure, and Google Cloud, including serverless, Kubernetes, and container-based architectures.
  • Strong hands-on experience with log aggregation, tracing (Jaeger, Zipkin), and APM (Application Performance Monitoring).
  • Knowledge of AI-driven monitoring, automated remediation, and self-healing infrastructure.
  • Familiarity with SIEM tools and security monitoring, ensuring alignment with SOC and threat detection capabilities.
  • Experience in API monitoring, network telemetry, and database performance tuning.
  • 10+ years of experience in observability, monitoring, or infrastructure resilience roles within regulated financial services or banking environments.
  • Proven track record of designing and implementing enterprise-scale observability platforms in a complex, multi-cloud environment.
  • Experience leading cross-functional teams to drive cultural adoption of observability and monitoring best practices.
  • Strong knowledge of regulatory and compliance requirements related to operational resilience, incident management, and monitoring.
  • Ability to translate complex technical monitoring data into actionable insights for senior executives and non-technical stakeholders.
  • Strong problem-solving skills with a proactive and forward-thinking approach to technology and resilience.
  • Excellent communication and leadership abilities, fostering collaboration across engineering, risk, and business teams.
  • In-depth understanding of compliance in regulated industries (e.g., financial services, healthcare).
  • Experience working with audit and risk management processes.
  • Facilitate collaboration between application, infrastructure, and business teams to drive efficiency and innovation.
  • Demonstrated ability to partner with line-of-business leaders, security teams, and developers to drive collaborative outcomes.
  • Excellent communication and influence skills to balance business, technology, and compliance needs.
  • Bachelor’s degree and 20 to 30 years of related experience, or an equivalent combination.
  • Managed Technology or Technology Process Teams for more than 15 years or teams of 30 or more technologists.
  • Excellent knowledge of technical management and data governance.
  • Knowledge of current trends in the IT hardware and systems software fields.
  • Database management skills with the ability to produce reports.
  • Familiarity with the support and troubleshooting of personal computers and tablet devices.
  • Training ability and experience are a plus.
  • The position requires strong problem solving and analytical skills with the ability to work independently and exercise sound judgment
  • The ability to make commitments and be held accountable for them, organizing workloads to meet deadlines
  • Exhibit adaptability to accept or bring about change when needed
  • Strong written and verbal communication skills
  • The ability to excel in a team environment and advance overall team objectives
  • The ability to ensure customer satisfaction by delivering excellence in products and service
  • Ability to work and communicate with peers, vendors, and internal staff, including software program leadership and others
  • Consistently demonstrate a professional, positive, and approachable attitude and demeanor, and exercise discretion
  • Demonstrate sensitivity in handling confidential information
  • Formulate and clearly communicate ideas to others