DevOps/Observability Engineer

Quantiphi
Remote

About The Position

We are seeking a Senior DevOps/Observability Engineer with over 8 years of experience to lead the design and implementation of our next-generation, unified observability platform. The role focuses on architecting an observability pipeline from the ground up on a modern, open-source-centric stack on Amazon Web Services (AWS). The ideal candidate will have deep expertise in designing and deploying observability solutions, with a strong emphasis on OpenTelemetry (OTel) and Kubernetes observability. You will be responsible for deploying, configuring, and integrating a suite of tools including Prometheus, Grafana, and Splunk to provide comprehensive insights into our complex, distributed systems. This is a hands-on role for a technical leader who is passionate about building scalable, reliable, and efficient monitoring and logging systems.

Requirements

  • Over 8 years of experience as a DevOps/Observability Engineer.
  • Proven ability to design and implement end-to-end observability pipelines using OpenTelemetry, Prometheus, and Grafana on centralized infrastructure.
  • Deep expertise in centralizing AWS telemetry, including multi-account CloudTrail organization trails, cross-account CloudWatch metrics/logs, and VPC Flow Logs.
  • Strong experience designing log aggregation strategies, implementing noise reduction/filtering at the collector level, and configuring Splunk HTTP Event Collector (HEC) integrations.
  • Hands-on experience building comprehensive alerting frameworks using Alertmanager and CloudWatch Alarms.
  • Hands-on experience with advanced dashboard engineering in Grafana (using PromQL).
  • Advanced proficiency in writing Terraform modules specifically for deploying and managing observability stacks and EC2 infrastructure.
  • Demonstrated experience managing, routing, and optimizing log pipelines at massive scale (TB/day).
  • Experience deploying Prometheus and OTel within Kubernetes (EKS) or containerized (ECS) environments.
  • Proven track record of reducing observability spend through strategic metric dropping, log filtering, and efficient storage tiering.

Responsibilities

  • Design and implement end-to-end observability pipelines using OpenTelemetry, Prometheus, and Grafana on centralized infrastructure.
  • Centralize AWS telemetry, including multi-account CloudTrail organization trails, cross-account CloudWatch metrics/logs, and VPC Flow Logs.
  • Design log aggregation strategies, implement noise reduction/filtering at the collector level, and configure Splunk HTTP Event Collector (HEC) integrations.
  • Build comprehensive alerting frameworks using Alertmanager and CloudWatch Alarms.
  • Engineer advanced dashboards in Grafana using PromQL.
  • Write Terraform modules specifically for deploying and managing observability stacks and EC2 infrastructure.
  • Manage, route, and optimize log pipelines at massive scale (TB/day).
  • Deploy Prometheus and OTel within Kubernetes (EKS) or containerized (ECS) environments.
  • Reduce observability spend through strategic metric dropping, log filtering, and efficient storage tiering.

Benefits

  • Opportunity to join one of the world’s fastest-growing AI-first digital engineering companies.
  • Make a real impact at scale.
  • Lead and collaborate with a high-energy team of talented, driven individuals solving complex, meaningful challenges.
  • Work with Fortune 500 companies and disruptive innovators in a research-driven environment with 60+ patents.
  • Gain hands-on experience with cutting-edge AI, ML, data, and cloud technologies.
  • Continuous upskilling opportunities.
  • Fun, diverse, and hybrid work culture.
  • Ample opportunities to learn, grow, and interact with colleagues of varied experience and backgrounds from around the globe.