Observability Engineer

Booz Allen Hamilton, McLean, VA
$86,800 - $198,000

About The Position

The Opportunity: Something breaks at 2 AM. Today, a human gets paged. Tomorrow, an AI agent detects the anomaly, correlates the root cause, triggers the remediation, and closes the ticket, all before the first cup of coffee. You are the engineer who builds that tomorrow.

We are seeking a senior Observability Engineer with expertise in both AI technologies and enterprise performance monitoring. This role combines hands-on engineering with AIOps implementation to deliver full-stack visibility across 250+ services. You will lead efforts to implement predictive monitoring and self-healing capabilities that drive down operational costs while increasing system availability by leveraging AI to triage and resolve incidents. You will mentor and supervise engineers, own technical quality, and push the program toward AI-driven observability with opportunities to build new observability platforms from the ground up as we expand into new environments.

Due to the nature of work performed within this facility, U.S. citizenship is required.

Join us. The world can’t wait.

Requirements

  • 5+ years of experience in enterprise observability, monitoring, and site reliability engineering
  • Experience architecting and operating Dynatrace for full-stack observability, including agent deployment, distributed tracing, log management, synthetic monitoring, and digital experience monitoring
  • Experience implementing AIOps workflows, including predictive alerting, anomaly detection, automated remediation, and incident automation
  • Experience building observability integrations, custom extensions, and infrastructure-as-code using Python, JavaScript, Node.js, and Terraform
  • Experience building operational and executive dashboards and implementing SLOs and SLAs
  • Experience working in Agile environments with sprint-based delivery
  • Knowledge of network monitoring protocols, including SNMP, SNMP traps, NetFlow, and Syslog
  • Ability to mentor engineers, conduct code reviews, and take accountability for technical delivery and quality
  • Bachelor's degree in a Computer Science or Information Technology field
  • U.S. citizenship is required.

Nice To Haves

  • Experience with ServiceNow Event Management, including event rules, alert management rules, alert correlation, threshold tuning, noise reduction, CMDB integration with CI relationships and dependencies, ITOM alignment, automated incident creation, Flow Designer, IntegrationHub, JavaScript, Glide API, HTML or CSS, ServiceNow platform architecture, and developing standardized onboarding processes for integrating new monitoring tools into Event Management with governance, segregation of duties, and compliance documentation
  • Experience with advanced Dynatrace platform capabilities, including Grail, Smartscape, Davis AI, OpenPipeline, DQL, Workflow Automation, Platform API, AppSec, Session Replay, Grail-powered RUM, AI Observability, and Grail log management
  • Experience with Dynatrace Intelligence, including Dynatrace Assist, Intelligence Agents, MCP Server integration, and Dynatrace Apps development using the App Toolkit
  • Experience deploying and building observability platforms from scratch in government cloud environments such as AWS GovCloud, Azure Government, or IL4/IL5, including air-gapped, restricted network, and STIG-hardened deployments
  • Experience building self-service onboarding portals for application team observability adoption
  • Experience with open-source observability tooling, including OpenTelemetry, Prometheus, Grafana, and ELK/EFK
  • Experience with FinOps practices, containerization, and cloud platforms such as AWS, Azure, or GCP
  • Experience operating Splunk and Splunk Enterprise Security (SIEM), Cribl, and SolarWinds at enterprise scale
  • Dynatrace Professional or Master Certification or ServiceNow Certified Implementation Specialist - Event Management (CIS-EM) Certification

Responsibilities

  • Implement predictive monitoring and self-healing capabilities that drive down operational costs while increasing system availability by leveraging AI to triage and resolve incidents.
  • Mentor and supervise engineers.
  • Own technical quality.
  • Push the program toward AI-driven observability with opportunities to build new observability platforms from the ground up as we expand into new environments.

Benefits

  • health, life, disability, financial, and retirement benefits
  • paid leave
  • professional development
  • tuition assistance
  • work-life programs
  • dependent care
  • recognition awards program

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Number of Employees: 5,001-10,000 employees
