Senior / Staff Software Engineer (Observability / SRE)

Waabi | Pittsburgh, PA
$148,000 - $249,000

About The Position

Waabi, founded by AI visionary Raquel Urtasun, is the leader in Physical AI. With a world-class team, we're unlocking the next era of autonomous transportation with technology that's powering commercial autonomous trucks and robotaxis. Waabi is backed by and partners with world leaders in AI, automotive, logistics, and deep tech. With offices in Toronto, San Francisco, Dallas, and Pittsburgh, Waabi is growing quickly and looking for diverse, innovative, and collaborative candidates who want to impact the world in a positive way. To learn more, visit: www.waabi.ai

We are constantly expanding our compute footprint in the cloud, and we need to expand our observability and monitoring capabilities alongside it. We currently use the built-in AWS monitoring tools, but they don't cover our on-premises infrastructure and aren't user-friendly. There are a number of options we could deploy, but all of them require ongoing attention and work. Even if we go the vendor route, we still need at least one person to own this area.

Requirements

  • 5+ years software engineering or systems/performance engineering experience (BS in CS/EE or related), with demonstrated end-to-end ownership of complex projects.
  • Proficient in at least one of: Python, Rust, C/C++; strong CS fundamentals and system design skills.
  • Hands-on with Linux internals (CPU scheduling, memory, I/O, networking) and perf tooling (perf, eBPF, flamegraphs, tracing frameworks).
  • Experience with Kubernetes, microservices, and distributed systems; comfort building production services and pipelines.
  • Proven track record of clear communication, writing design docs, and leading cross-functional efforts.

Nice To Haves

  • Experience deploying and managing observability platforms (OpenTelemetry, Grafana OSS).
  • Performance tuning for databases and streaming, batch, or ML platforms; exposure to GPU/xPU or Arm performance.
  • Experience tuning stream processing, batch or ML platforms (e.g. Argo Workflows, PyTorch).
  • Familiarity with microservices debugging and distributed tracing (OpenTelemetry, Prometheus).

Responsibilities

  • Design and lead the architecture and development of Waabi’s monitoring and observability stack, used to monitor the health and performance of cloud and on-prem environments.
  • Develop and extend workloads and benchmarks (compute, storage, network, ML/AI) and integrate stress, chaos, and regression tests to validate hardware and platform choices.
  • Analyze and optimize end-to-end performance across hardware, firmware, Linux kernel, runtimes, and distributed services using advanced profiling tools (perf, eBPF, flamegraphs, tracing frameworks).
  • Build automation and observability tooling (Go/Python/Java, Kubernetes/Docker) for CI/CD-based performance regression detection, telemetry, alerting, and anomaly detection.
  • Work with client teams to support their applications’ observability requirements.
  • Influence system architecture and tooling decisions that improve how Waabi builds, monitors, and scales its infrastructure.
  • Drive execution and quality: write design docs, set milestones, mentor ICs, and communicate insights and results to stakeholders and leadership.

Benefits

  • Competitive compensation and equity awards.
  • Health and Wellness benefits encompassing Medical, Dental and Vision coverage (for full-time employees only).
  • Unlimited Vacation.
  • Flexible hours and Work from Home support.
  • Daily drinks, snacks and catered meals (when in office).
  • Regularly scheduled team-building activities and social events, held on-site, off-site, and virtually.
  • As we grow, this list continues to evolve!