Sr. Observability Engineer

Simple Solutions
University, FL
Onsite

About The Position

The Sr. Observability Engineer is responsible for designing, deploying, and optimizing the client's enterprise observability ecosystem. This role delivers hands-on implementation and consulting expertise, focusing on LogicMonitor and modern observability practices to drive actionable insights, predictive analytics, and operational excellence across infrastructure, network, and application layers.

Requirements

  • LogicMonitor platform deployment, configuration, and optimization.
  • Deep understanding of observability frameworks, best practices, and enterprise monitoring.
  • Python scripting for data analytics, automation, and advanced reporting.
  • Kafka-based data streaming and integration.
  • Grafana dashboarding and visualization.
  • Experience with AI platforms and emerging observability technologies, including Amazon Bedrock.
  • Familiarity with ITSM and event management systems (ServiceNow, PagerDuty, BigPanda).
  • Telemetry protocols (SNMP, syslog, NetFlow, APIs) and data flow architecture.
  • Strong understanding of alerting, correlation logic, and performance baselining.
  • Excellent analytical, documentation, and communication skills; ability to work across engineering and operations teams.
  • 5–10 years of experience in network monitoring, observability, or infrastructure engineering roles.
  • Hands-on experience with LogicMonitor, Splunk, ThousandEyes, Datadog, Dynatrace, Cisco DNAC, and related platforms.
  • Scripting and automation experience (Python, PowerShell, REST APIs).
  • Expert-level knowledge of the LogicMonitor platform.
  • Ability to support the customer team as it decommissions four platforms and consolidates them into LogicMonitor.

Responsibilities

  • Deploy, configure, and optimize LogicMonitor for enterprise-scale observability.
  • Design and build custom dashboards for actionable insights and performance monitoring.
  • Implement and manage data analytics workflows, including advanced scripting in Python for automation and reporting.
  • Integrate and manage data pipelines leveraging Kafka and related streaming technologies.
  • Ensure seamless data flow into Grafana for visualization and monitoring.
  • Develop and maintain integrations between observability, ITSM (ServiceNow), and event management tools (PagerDuty, Slack, BigPanda).
  • Standardize alert thresholds, escalation paths, and telemetry mappings across global regions.
  • Define and maintain event, alert, and rule logic to ensure accurate correlation and minimal noise.
  • Manage data ingestion pipelines from SNMP, syslog, APIs, and third-party sources into LogicMonitor and downstream analytics systems.
  • Advise on and implement AI-driven observability tools, including Amazon Bedrock, to enhance predictive analytics and anomaly detection.
  • Partner with network, server, and application teams to validate data flows, performance metrics, and dependency mapping.
  • Automate configuration and onboarding processes via API and scripting (Python, PowerShell, REST).
  • Support incident and problem management teams by correlating events across multiple tools to accelerate root cause analysis.
  • Document integrations, processes, and governance models for sustained operational excellence.
  • Serve as technical SME supporting observability tool upgrades, testing, and cross-platform enhancements.
  • Collaborate with stakeholders to align monitoring strategies with business objectives.