Senior Observability Engineer

Trident Consulting Inc · Holmdel Township, NJ
Hybrid

About The Position

We are seeking a Senior Observability Engineer with strong expertise in Splunk administration and monitoring platforms to join the Enterprise Observability Engineering team. The ideal candidate will configure, administer, and maintain enterprise observability tools, including Splunk (primary), AppDynamics, OpenTelemetry, and Zenoss, to ensure the reliability, visibility, and performance of enterprise IT systems. The role involves working closely with DevOps, infrastructure, and application teams to implement monitoring strategies and improve system observability.

Requirements

  • Bachelor's degree in Computer Science, Information Technology, or related field.
  • 5–7+ years of experience in Observability, Monitoring, or Site Reliability Engineering.
  • Strong hands-on experience with Splunk (administration, configuration, and implementation), AppDynamics, and Zenoss
  • Strong understanding of the MELT framework: Metrics, Events, Logs, and Traces
  • Experience with OpenTelemetry, including instrumentation patterns, context propagation, collectors, and sampling
  • Experience with Kubernetes observability
  • Strong knowledge of IT infrastructure, applications, and networking
  • Experience with scripting or automation (Python, Bash)
  • Experience with cloud platforms such as AWS or Azure

Nice To Haves

  • Experience with monitoring tools such as Prometheus and Grafana
  • Knowledge of DevOps practices and CI/CD pipelines
  • Experience with Infrastructure as Code (Terraform or Ansible)
  • Familiarity with Git-based workflows

Responsibilities

  • Administer and configure Splunk, AppDynamics, OpenTelemetry (OTEL), and Zenoss platforms.
  • Implement monitoring solutions aligned with enterprise observability standards.
  • Perform upgrades, patching, and security hardening of observability tools.
  • Monitor the performance and health of observability platforms.
  • Ensure high availability and data integrity of monitoring systems.
  • Troubleshoot and resolve monitoring platform issues.
  • Design and maintain monitoring dashboards, alerts, and reports.
  • Collaborate with stakeholders to define monitoring requirements.
  • Implement alerting mechanisms for proactive issue detection.
  • Manage data ingestion and onboarding into monitoring systems.
  • Optimize platform performance through configuration tuning.
  • Manage resource utilization and storage within observability platforms.
  • Support incident investigation and root cause analysis.
  • Leverage observability data (metrics, logs, events, traces) to resolve issues.
  • Collaborate with IT and DevOps teams during incident response.
  • Maintain documentation for monitoring configurations and procedures.
  • Establish observability standards and best practices across the organization.
  • Provide technical support to internal teams using monitoring platforms.
  • Conduct training sessions to improve observability tool adoption.