Elasticsearch / ELK Stack Engineer - R01564999

Brillio•McLean, VA

11h•$70 - $74•Hybrid

About The Position

We are seeking a highly experienced Senior Observability Engineer with deep expertise in ESS (Elastic Stack) to lead and accelerate the development of enterprise-grade observability capabilities across mission-critical applications. This role requires a hands-on subject-matter expert (SME) who can design, build, and scale observability dashboards, APM, tracing, and monitoring solutions exclusively within ESS. The candidate will play a key role in transforming the current monitoring setup into a proactive, intelligent, and scalable observability ecosystem. This is a high-impact, fast-paced engagement (target duration under 6 months) that requires ownership, technical depth, and execution excellence.

Requirements

Strong hands-on experience with ESS (Elastic Stack): Elasticsearch, Logstash, Kibana, Beats / Elastic Agent, Elastic APM.
Proven experience building enterprise-scale observability dashboards in ESS.
Deep understanding of microservices architecture and Kubernetes / OpenShift (OCP).
Experience with APM, distributed tracing, logging, and metrics correlation.
Ability to design multi-layer observability (infra → platform → app).

Nice To Haves

Experience with synthetic monitoring tools integrated with ESS, Real User Monitoring (RUM), and service maps and dependency graphs.
Knowledge of CI/CD observability integration and alerting frameworks within Elastic.
Scripting: Python / Shell / Groovy.

Responsibilities

Design and implement end-to-end observability solutions using ESS (Elastic Stack).
Build a centralized observability layer covering all MF applications.
Ensure block-level aggregation with drill-down to application-level metrics, APM traces, logs and events, and service dependencies.
Develop and scale a large backlog of ESS dashboards, including but not limited to: Cluster Health (OCP/K8s), API & APM Dashboards, Service Health & Dependency Monitoring, Pod Status / Restart / Scaling Metrics, HTTP Status Analytics (200/400/500 trends), Transaction Processing Metrics, Infra Metrics (CPU, Memory, Disk, Network), Synthetic Monitoring & Availability.
Build intuitive, drill-down dashboards from MF Block → Service → Application level.
Expand ESS-based Application Performance Monitoring (APM), distributed tracing, Real User Monitoring (RUM), and synthetic monitoring.
Enable end-to-end traceability across microservices.
Design and implement smart alerting rules.
Move from reactive to proactive detection.
Reduce alert noise and improve signal quality.
Define SLOs, SLIs, and error budgets.
Enhance anomaly detection and trend analysis.
Work closely with the EOT Observability Team, internal CDLs, and application teams.
Act as the ESS Observability SME.
Provide guidance, standards, and best practices.