Design & Implement Solutions: Build and maintain comprehensive observability platforms that provide deep insight into complex systems by combining logs, metrics, and traces.
System Instrumentation: Instrument applications, infrastructure, and services to collect telemetry data using frameworks such as OpenTelemetry.
Data Analysis & Visualization: Develop dashboards, reports, and alerts with tools such as Prometheus, Grafana, and Splunk to visualize system performance and detect issues.
Collaboration: Work with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals.
Automation: Write scripts and use Infrastructure as Code (IaC) tools such as Ansible and Terraform to automate monitoring configuration and telemetry collection.
Implement and manage full-stack observability with Datadog, ensuring seamless monitoring across infrastructure, applications, and services.
Deploy and configure monitoring agents across on-premises, cloud, and hybrid environments to enable comprehensive coverage.
Design and deploy monitoring for key services, including dashboards, monitors, SLA/SLO definitions, and anomaly detection with alert notifications.
Integrate Datadog with third-party services such as ServiceNow and other ITSM tools, and enable SSO.
Career Level: Mid Level
Education Level: No Education Listed