Senior/Staff/Principal SWE - Observability Engineering

AppGate Cybersecurity, Inc. - New York, NY

About The Position

We’re looking for an Observability Engineer (Senior/Staff/Principal level) who has shipped distributed tracing systems, designed high-cardinality pipelines, and knows OpenTelemetry inside and out. You will own the end-to-end design and implementation of the AppGate observability fabric — from telemetry SDKs in our clients and gateways, to the LogForwarder pipeline, to customer-side integrations. You’ll make the foundational technical decisions — transport protocols, sampling strategies, schema design, correlation models — that determine whether our platform scales gracefully to hundreds of millions of events per day. This is a builder’s role with a strategist’s reach.

Requirements

  • 8+ years of engineering experience with at least 4 years dedicated to observability, telemetry, or large-scale data infrastructure (Datadog, Splunk, Elastic, Honeycomb, New Relic, Grafana Labs, or equivalent).
  • Deep OpenTelemetry expertise: OTLP, the OTel Collector, semantic conventions, context propagation, and head/tail sampling — you can debate the trade-offs in your sleep.
  • Distributed tracing in production: You’ve designed or significantly contributed to a tracing system handling real customer traffic, not just a side project.
  • High-throughput pipeline experience: Hands-on with systems ingesting 100M+ events per day, including backpressure handling, batching, and storage trade-offs.
  • Strong systems programming: Production Go and/or Rust preferred. Comfort across the stack, from agent code to backend services.
  • Networking and security fluency: Comfortable with TLS, DNS, TCP, and identity protocols.
  • Mindset: Pragmatic, opinionated, and impact-driven. You know when to prototype and when to ship.

Nice To Haves

  • Prior ZTNA, SASE, or SD-WAN experience

Responsibilities

  • OpenTelemetry-Native Telemetry Fabric: Logs and distributed traces from clients, controllers, gateways, and connectors — all correlated by session, user, device, and trace ID across the full ZTNA flow.
  • High-Cardinality Data Pipeline: An OTLP-based ingestion and routing layer engineered for 100M+ events per day, with attribute filtering, redaction, and tail-sampling.
  • End-to-End Distributed Tracing: Span hierarchies decomposing login and session establishment across posture checks, policy decisions, TLS handshakes, and entitlement resolution — turning hours of triage into seconds.
  • On-Demand Packet Capture: Admin-triggered PCAP coordinated across client and gateway, with the workflow fully observable through OTel logs and traces.
  • AI-Ready Foundation: Structured, semantically rich telemetry that future LLM-based incident analysis agents can reason over. The schema you design today is the substrate for Phase 3.
  • Architect the Observability Platform: Define telemetry schema, correlation model, transport, and sampling strategies spanning client devices, controllers, and gateways.
  • Build the Telemetry SDKs and LogForwarder: Instrument AppGate components with OpenTelemetry and implement the enrichment, redaction, batching, and tail-sampling pipeline that scales horizontally under load.
  • Validate at Customer Scale: Test in lab environments matching our largest deployments — hundreds of sites, tens of thousands of concurrent sessions — and hunt down cardinality explosions and pipeline backpressure before customers see them.
  • Drive Integration Standards: Own the OTLP, Prometheus, and JSON-log compatibility surface and validate ingestion into Datadog, Splunk, Nexthink, and Elastic.
  • Raise the Engineering Bar: Establish patterns and review practices the Data + AI team builds on. Mentor engineers and grow the observability discipline inside AppGate.
  • Collaborate Cross-Functionally: Work directly with product, R&D, and marquee customers in defense and critical infrastructure to shape requirements and deliver outcomes that matter.