Requirements

  • 10+ years of experience in monitoring solutions, from architecture and design through delivery and support of complex, highly scalable, and robust systems.
  • Experience designing scalable enterprise solutions with high‑volume, high‑frequency data workloads.
  • Experience designing, building, and maintaining large‑scale hybrid observability solutions, supporting high‑volume, low‑latency data platforms.
  • 5+ years of Site Reliability Engineering (SRE) practice experience with strong knowledge of modern observability architectures and tooling.
  • 5+ years of software engineering experience with a solid understanding of system architecture and contemporary observability solutions.
  • Proven experience delivering high‑demand, customer‑facing offerings or services.
  • Strong understanding of Full Stack Observability concepts and practices.
  • Knowledge of Prometheus, OpenTelemetry, and core SRE methodologies.
  • Experience managing and developing engineering teams.
  • Exceptional collaboration, communication, and presentation skills.
  • Hands‑on experience with one or more observability platforms: Prometheus, Grafana Stack (Grafana UI, GEM/Mimir, GET/Tempo, GEL/Loki), Splunk Observability, Elasticsearch, AWS CloudWatch, Google Cloud Observability, Datadog, or Dynatrace.
  • Experience designing and deploying cloud‑based solutions on AWS, Google Cloud, Azure, or IBM Cloud using containers/Kubernetes (e.g., OpenShift, IBM Cloud Private, GKE, EKS, AKS).
  • Background working on large‑scale projects, leading agile engineering teams, and developing software in languages such as Go, Ruby, Python, Java/J2EE, C++, PHP, and .NET.
  • Experience building applications on modern microservice frameworks such as Spring Boot, Node.js/Express, MicroProfile, or Ruby on Rails.
  • Experience with telemetry pipelines such as BindPlane, Cribl, OpenTelemetry, or Vector.

Nice To Haves

  • Knowledge of the Grafana Stack is a plus.

Responsibilities

  • Define observability strategy, goals and roadmap
  • Develop and enhance full-stack observability for hybrid, cloud, and multi-cloud deployments while providing a consistent view for various personas
  • Solve complex technical problems hands-on
  • Define and maintain a healthy balance of innovation and reliability as you deliver to meet customers' needs
  • Identify risks comprehensively, maintain a risk-to-mitigation mapping matrix, and coordinate with the Security, Risk & Compliance teams
  • Ensure solution meets all requirements of quality, security, modifiability, extensibility and scalability
  • Develop high-level solution specifications with attention to integration and feasibility (technical, functional, and financial)
  • Prepare easy-to-understand reports detailing achieved milestones and short-term and long-term project goals
  • Provide technical guidance and coaching to a team of developers and engineers
  • Work with peers across all Observability pillars (Logs, Traces, Events and SRE) to achieve common observability goals
  • Build familiarity with Observability industry challenges/trends and bring that knowledge to improve your offerings

What This Job Offers

Job Type

Full-time

Career Level

Manager

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
