Dynatrace Observability Engineer

MLabs•McLean, VA

1d•Onsite

About The Position

Our firm has been retained to find a highly skilled Dynatrace Observability Engineer to design, implement, and optimize a world-class enterprise observability ecosystem. In this role, you will be the technical lead for our client's visibility strategy, leveraging Dynatrace’s APM, Infrastructure Monitoring, RUM, and Davis AI capabilities to ensure system reliability and proactive incident detection across hybrid and multi-cloud environments. You will collaborate directly with SREs, DevOps, Cloud Engineering, and business stakeholders to build scalable solutions that empower faster troubleshooting and data-driven decision-making.

Requirements

Tenure: Minimum 10 years of hands-on experience with Dynatrace APM (SaaS or Managed environment).
Observability Core: Strong understanding of the four pillars: metrics, logs, traces, and events.
Infrastructure: Deep knowledge of cloud platforms (AWS, Azure, or GCP) and/or Kubernetes.
Configuration: Proven experience configuring dashboards, alerts, tagging rules, and management zones.
Architecture: Solid understanding of microservices, distributed systems, APIs, and application performance concepts.
Automation: Proficiency with scripting languages (Python, Bash, PowerShell) and YAML-based configuration.
DevOps: Familiarity with CI/CD tools (Jenkins, Azure DevOps, GitHub Actions, or GitLab).

Nice To Haves

Dynatrace Certifications (Associate, Professional, or Master).
Experience with Monaco (Monitoring-as-Code) or Terraform.
Familiarity with OpenTelemetry or Chaos Engineering.
Background integrating Dynatrace with tools like ServiceNow, PagerDuty, or Grafana.

Responsibilities

Platform Ownership: Administer and maintain the Dynatrace platform, including upgrades, governance, and the deployment of OneAgents/ActiveGates across cloud, on-premise, and Kubernetes/OpenShift environments.
Observability Engineering: Implement end-to-end monitoring for applications, APIs, and logs. Configure distributed tracing, service flow mapping, and synthetic monitoring.
Performance & Reliability: Utilize Davis AI insights to analyze latency and bottlenecks. Partner with app teams for root-cause remediation and define SLOs/SLIs aligned with business KPIs.
Automation & Integration: Build automation scripts (Python, Bash, PowerShell) to streamline deployments. Integrate Dynatrace with CI/CD pipelines, ITSM and incident-response tools (ServiceNow, PagerDuty), and logging platforms such as Splunk.
Incident Management: Act as the Subject Matter Expert (SME) during major incidents, providing deep-dive observability insights and post-incident reviews.

Benefits

Technology Leadership: Serve as the primary SME for observability, influencing the roadmap for high-scale cloud environments.
High-Impact Work: Directly improve customer experience and system uptime for a major enterprise platform.
Professional Environment: Join a collaborative culture that values continuous improvement and technical ownership.
