Observability Engineer

TensorWave, Las Vegas, NV

About The Position

We are looking for an Observability Engineer who is deeply obsessed with Grafana, Prometheus, and modern observability practices. This role exists to ensure our systems are measurable, understandable, and debuggable at all times. You will own the observability stack end-to-end — from instrumentation standards to dashboards, alerts, and signal quality — and work closely with infrastructure, platform, and application teams to make sure nothing important fails silently. If you think about metrics before features, believe bad alerts are worse than no alerts, and treat Grafana dashboards as first-class products, this role is for you.

Requirements

  • Strong hands-on experience with Grafana and Prometheus
  • Deep understanding of metrics-based observability
  • Experience designing monitoring and alerting systems at scale
  • Strong knowledge of alerting best practices (burn rates, SLO-based alerts, noise reduction)
  • Experience working with distributed systems and cloud or Kubernetes environments
  • Ability to reason about system behavior using telemetry
  • Comfortable working across teams to improve instrumentation and visibility

Nice To Haves

  • Experience with OpenTelemetry
  • Familiarity with logs and traces (Loki, Tempo, Jaeger, etc.)
  • Kubernetes observability experience
  • Experience operating observability systems in high-scale or production-critical environments
  • Infrastructure-as-Code experience (Terraform, Helm, etc.)

Responsibilities

  • Own and evolve our observability and monitoring platform, with Grafana and Prometheus at its core
  • Design, build, and maintain high-quality metrics pipelines using Prometheus and related tooling
  • Create clear, actionable Grafana dashboards that tell a story — not just charts
  • Define and maintain alerts that are meaningful, actionable, and low-noise
  • Establish and enforce observability standards across services (metrics, logs, traces)
  • Partner with engineering teams to instrument applications correctly
  • Lead improvements to alerting strategies, SLOs, and SLIs
  • Support incident response by helping teams quickly understand what broke and why
  • Continuously evaluate and improve signal quality, cardinality, and cost
  • Identify observability gaps and eliminate blind spots before they become outages

Benefits

  • Competitive Salary
  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance
  • Life and Voluntary Supplemental Insurance
  • Short Term Disability Insurance
  • Flexible Spending Account
  • 401(k)
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Mental Health Benefits through Spring Health

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Not listed
  • Number of Employees: 51-100
