Head of Observability and Monitoring

Truist Bank, Dallas, TX

About The Position

The Head of Observability and Monitoring will lead the strategy, architecture, and implementation of observability, monitoring, and telemetry capabilities within a regulated banking environment. This role is critical to ensuring the resilience, performance, and security of the Bank’s technology landscape. The ideal candidate will possess deep technical expertise, a strategic mindset, and strong collaboration skills to drive best-in-class monitoring solutions that align with regulatory and business requirements.

Requirements

  • Proven expertise in modern observability stacks, such as Splunk, Dynatrace, AppDynamics, ThousandEyes, ServiceNow AIOps, or Datadog.
  • Deep understanding of cloud-native monitoring across AWS, Azure, and Google Cloud, including serverless, Kubernetes, and container-based architectures.
  • Strong hands-on experience with log aggregation, tracing (Jaeger, Zipkin), and APM (Application Performance Monitoring).
  • Knowledge of AI-driven monitoring, automated remediation, and self-healing infrastructure.
  • Familiarity with SIEM tools and security monitoring, ensuring alignment with SOC and threat detection capabilities.
  • Experience in API monitoring, network telemetry, and database performance tuning.
  • 10+ years of experience in observability, monitoring, or infrastructure resilience roles within regulated financial services or banking environments.
  • Proven track record of designing and implementing enterprise-scale observability platforms in a complex, multi-cloud environment.
  • Experience leading cross-functional teams to drive cultural adoption of observability and monitoring best practices.
  • Strong knowledge of regulatory and compliance requirements related to operational resilience, incident management, and monitoring.
  • Ability to translate complex technical monitoring data into actionable insights for senior executives and non-technical stakeholders.
  • Strong problem-solving skills with a proactive and forward-thinking approach to technology and resilience.
  • Excellent communication and leadership abilities, fostering collaboration across engineering, risk, and business teams.
  • In-depth understanding of compliance in regulated industries (e.g., financial services, healthcare).
  • Experience working with audit and risk management processes.
  • Ability to facilitate collaboration between application, infrastructure, and business teams to drive efficiency and innovation.
  • Demonstrated ability to partner with line-of-business leaders, security teams, and developers to drive collaborative outcomes.
  • Excellent communication and influence skills to balance business, technology, and compliance needs.
  • Bachelor’s degree and 20 to 30 years of related experience, or an equivalent combination.
  • More than 15 years managing technology or technology process teams, or experience managing teams of 30 or more technologists.
  • Excellent knowledge of technical management and data governance.
  • Knowledge of current trends in the IT hardware and systems software fields.
  • Database management skills with the ability to produce reports.
  • Familiarity with the support and troubleshooting of personal computers and tablet devices.
  • Training ability and experience are a plus.
  • Strong problem-solving and analytical skills, with the ability to work independently and exercise sound judgment.
  • Ability to make commitments and be held accountable for them, organizing workloads to meet deadlines.
  • Adaptability to accept or bring about change when needed.
  • Strong written and verbal communication skills.
  • Ability to excel in a team environment and advance overall team objectives.
  • Ability to ensure customer satisfaction by delivering excellence in products and service.
  • Ability to work and communicate with peers, vendors, and internal staff, including software program leadership.
  • Consistently professional, positive, and approachable attitude, demeanor, and discretion.
  • Sensitivity in handling confidential information.
  • Ability to formulate and clearly communicate ideas to others.

Responsibilities

  • Develop and execute a comprehensive observability strategy, integrating logging, metrics, and distributed tracing across the Bank’s technology stack.
  • Lead the design and deployment of monitoring platforms, ensuring real-time visibility into system performance, availability, and security threats.
  • Own the end-to-end observability architecture, including tool selection, automation, and integration with cloud, on-prem, and hybrid environments.
  • Drive the adoption of AI/ML-powered monitoring to enhance anomaly detection, predictive analytics, and automated incident response.
  • Ensure robust service level indicators (SLIs), service level objectives (SLOs), and error budgets are established and tracked for critical services.
  • Define and implement observability governance frameworks, ensuring compliance with regulatory requirements (e.g., FFIEC, OCC, Basel III, GDPR).
  • Develop strategies to support real-time monitoring, root cause analysis, and proactive remediation to minimize downtime and business impact.
  • Partner with engineering, security, business unit, risk, and compliance teams to align observability initiatives with operational stability and performance targets, as well as with continuity and disaster recovery plans.
  • Champion operational resilience by ensuring monitoring covers end-to-end customer journeys, critical business services, and third-party dependencies.
  • Establish and maintain a centralized observability platform, standardizing logging and metrics collection across microservices, APIs, databases, and infrastructure.
  • Work closely with platform teams to embed observability best practices into CI/CD pipelines and software development lifecycles.
  • Partner with Cybersecurity to integrate security monitoring, anomaly detection, and threat intelligence into observability solutions.
  • Engage with business and operations teams to ensure monitoring capabilities support customer experience, regulatory reporting, and incident management.
  • Serve as the Bank’s SME on observability, engaging with industry forums, vendors, and regulatory bodies to stay ahead of trends and compliance needs.

Benefits

  • Truist offers medical, dental, vision, life insurance, disability, accidental death and dismemberment, tax-preferred savings accounts, and a 401(k) plan to teammates.
  • Teammates also receive no less than 10 days of vacation (prorated based on date of hire and by full-time or part-time status) during their first year of employment, along with 10 sick days (also prorated), and paid holidays.
  • Depending on the position and division, this job may also be eligible for Truist’s defined benefit pension plan, restricted stock units, and/or a deferred compensation plan.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Executive
  • Number of Employees: 5,001-10,000