Observability Engineer

Booz Allen Hamilton, McLean, VA
Remote

About The Position

The Opportunity: Something breaks at 2 AM. Today, a human gets paged. Tomorrow, an AI agent detects the anomaly, correlates the root cause, triggers the remediation, and closes the ticket, all before the first cup of coffee. You are the engineer who builds that tomorrow.

We are seeking a senior Observability Engineer with expertise in both AI technologies and enterprise performance monitoring. This role combines hands-on engineering with AIOps implementation to deliver full-stack visibility across 250+ services. You will lead efforts to implement predictive monitoring and self-healing capabilities that drive down operational costs while increasing system availability by leveraging AI to triage and resolve incidents. You will mentor and supervise engineers, own technical quality, and push the program toward AI-driven observability with opportunities to build new observability platforms from the ground up as we expand into new environments.

Join us. The world can’t wait.

Requirements

  • 5+ years of experience in enterprise observability, monitoring, and site reliability engineering
  • Experience architecting and operating Dynatrace for full-stack observability, including agent deployment, distributed tracing, log management, synthetic monitoring, and digital experience monitoring
  • Experience implementing AIOps workflows, including predictive alerting, anomaly detection, automated remediation, and incident automation
  • Experience building observability integrations, custom extensions, and infrastructure-as-code using Python, JavaScript, Node.js, and Terraform
  • Experience building operational and executive dashboards and implementing SLOs and SLAs
  • Experience working in Agile environments with sprint-based delivery
  • Knowledge of network monitoring protocols, including SNMP, SNMP traps, NetFlow, and Syslog
  • Ability to mentor engineers, conduct code reviews, and take accountability for technical delivery and quality
  • Secret clearance
  • Bachelor's degree in Computer Science or Information Technology

Nice To Haves

  • Experience with ServiceNow Event Management, including event rules, alert management rules, alert correlation, threshold tuning, noise reduction, CMDB integration with CI relationships and dependencies, ITOM alignment, automated incident creation, Flow Designer, IntegrationHub, JavaScript, Glide API, HTML or CSS, and ServiceNow platform architecture
  • Experience developing standardized onboarding processes for integrating new monitoring tools into Event Management with governance, segregation of duties, and compliance documentation
  • Experience with advanced Dynatrace platform capabilities, including Grail, Smartscape, Davis AI, OpenPipeline, DQL, Workflow Automation, Platform API, AppSec, Session Replay, Grail-powered RUM, AI Observability, and Grail log management
  • Experience with Dynatrace Intelligence, including Dynatrace Assist, Intelligence Agents, MCP Server integration, and Dynatrace Apps development using the App Toolkit
  • Experience deploying and building observability platforms from scratch in government cloud environments such as AWS GovCloud, Azure Government, IL4, or IL5, including air-gapped, restricted network, and STIG-hardened deployments
  • Experience building self-service onboarding portals for application team observability adoption
  • Experience with open-source observability tooling, including OpenTelemetry, Prometheus, Grafana, ELK, and EFK
  • Experience with FinOps practices, containerization, and cloud platforms such as AWS, Azure, or GCP
  • Experience operating Splunk and Splunk Enterprise Security (SIEM), Cribl, and SolarWinds at enterprise scale
  • Dynatrace Professional or Master Certification or ServiceNow Certified Implementation Specialist - Event Management (CIS-EM) Certification

Responsibilities

  • Implement predictive monitoring and self-healing capabilities.
  • Leverage AI to triage and resolve incidents.
  • Mentor and supervise engineers.
  • Own technical quality.
  • Push the program toward AI-driven observability.
  • Build new observability platforms from the ground up as we expand into new environments.
  • Architect and operate Dynatrace for full-stack observability, including agent deployment, distributed tracing, log management, synthetic monitoring, and digital experience monitoring.
  • Implement AIOps workflows, including predictive alerting, anomaly detection, automated remediation, and incident automation.
  • Build observability integrations, custom extensions, and infrastructure-as-code.
  • Build operational and executive dashboards.
  • Implement SLOs and SLAs.
  • Work in Agile environments with sprint-based delivery.
  • Conduct code reviews.
  • Take accountability for technical delivery and quality.
  • Develop standardized onboarding processes for integrating new monitoring tools into Event Management with governance, segregation of duties, and compliance documentation.
  • Build self-service onboarding portals for application team observability adoption.

Benefits

  • Health, life, disability, financial, and retirement benefits
  • Paid leave
  • Professional development
  • Tuition assistance
  • Work-life programs
  • Dependent care
  • Recognition awards program