Lead Infrastructure Engineer

Truist Bank
Charlotte, NC

About The Position

We are seeking a highly skilled and forward-thinking lead observability engineer to architect, implement, and evolve enterprise-grade observability capabilities across the Truist technology landscape. This role will drive the design and adoption of a modern, scalable observability platform rooted in OpenTelemetry (OTel) and enriched by complementary technologies including Prometheus, Grafana, Jaeger, and commercial APM solutions. You will lead the strategy for metrics, traces, and synthetic monitoring – enabling end-to-end visibility, accelerated incident response, and a frictionless developer experience. In this role, you’ll champion a shift from reactive monitoring to proactive, intelligence-driven observability. You’ll lead efforts to standardize telemetry pipelines, embed observability into CI/CD workflows, and integrate signal-based insights into reliability, performance, and business outcomes. Success in this position means reducing mean-time-to-detect (MTTD), accelerating root cause analysis, and creating a resilient, insight-rich environment that empowers engineering teams to deliver with confidence.

Requirements

  • Bachelor's degree and five years of experience in development or application support or an equivalent combination of education and work experience.
  • In-depth knowledge of information systems and ability to identify, apply, and implement best practices.
  • Understanding of key business processes and competitive strategies related to the IT function.
  • Ability to plan and manage projects.
  • Ability to solve complex problems by applying best practices.
  • Ability to provide direction and mentor less experienced teammates.
  • Ability to interpret and convey complex, difficult, or sensitive information.

Nice To Haves

  • Bachelor's degree and six years of experience or an equivalent combination of education and work experience.
  • Expertise with OpenTelemetry (OTel), including custom instrumentation, collector configuration, and pipeline design for traces, metrics, and logs.
  • Hands-on experience with observability tooling, such as Prometheus, Grafana, Jaeger, Loki, Elastic, Splunk, and/or Dynatrace in enterprise-grade environments.
  • Strong background in distributed systems, cloud-native architectures, and K8s, with the ability to identify observability gaps across service meshes, APIs, and event-driven platforms.
  • Proficiency in scripting or development languages (e.g. Python, Go, Bash, or Java) to automate telemetry integration, create custom exporters, and contribute to platform tooling.
  • Proven track record of driving enterprise adoption of observability standards and practices, including influencing telemetry strategies across engineering, SRE, and platform teams.

Responsibilities

  • Performs problem tracking, diagnosis and root-cause analysis, replication, troubleshooting, and resolution for complex issues. In this capacity, performs programming and debugging activities.
  • Responds to issues in a timely manner by receiving and investigating incidents or service tickets.
  • Analyzes and observes trends in technical issues and develops recommendations for long-term improvements.
  • Documents all relevant end-user interactions and steps taken to resolve incidents.
  • Has occasional contact with end-users.
  • Communicates status of issue resolution to internal customers.
  • May engage and manage outside vendors.
  • Applies in-depth knowledge of application support and an understanding of best practices.
  • Typically leads moderately complex projects and participates in larger, more complex initiatives.
  • Solves complex technical and operational problems.
  • Acts as a resource for teammates with less experience.
  • May have people management responsibilities for a small team.

Benefits

  • Truist offers medical, dental, vision, life insurance, disability, accidental death and dismemberment, tax-preferred savings accounts, and a 401(k) plan to teammates.
  • Teammates also receive no less than 10 days of vacation (prorated based on date of hire and by full-time or part-time status) during their first year of employment, along with 10 sick days (also prorated), and paid holidays.
  • Depending on the position and division, this job may also be eligible for Truist’s defined benefit pension plan, restricted stock units, and/or a deferred compensation plan.