AI Engineer - Site Reliability Researcher

Traversal | New York, NY
$150,000 - $300,000 | Onsite

About The Position

Site Reliability Engineering and troubleshooting are at the core of what Traversal does. That’s simple to say, hard to do, and even harder to explain. SREs analyze customer issues; SRE Researchers figure out how SREs analyze those issues, then work with engineering to teach the AI to replicate that process. Our target users are also experienced SREs (like you), so be prepared to put yourself in the end user’s mindset and help shape the product directly.

In short, Traversal wants to model your troubleshooting talent in code, which puts you at the nexus of current customers, potential customers, developers, AI engineers, UI experts, and more. We’re entering a phase of rapid growth driven by the needs of customers ranging from mid-market companies to Fortune 100 enterprises. We need people with an engineering mindset who enjoy solving puzzles and have the flexibility to do something different every day. You’ll play a key role in establishing the SRE research practices that allow us to exceed customer expectations today, tomorrow, and beyond.

Requirements

  • 5+ years of experience as an SRE, infrastructure engineer, or similar role in fast-paced environments
  • Innate ability to debug distributed systems (e.g., bare metal, VMs, Kubernetes, Docker and other containers), to understand how you did it, and to explain it to others
  • Expertise with observability and metrics tools (Datadog, Elasticsearch, Grafana, OpenTelemetry, Prometheus, ServiceNow, Splunk, etc.) and with incident response
  • Understanding of networking, including routers, switches, firewalls, VPNs, etc.
  • Hands-on experience with cloud environments (AWS, Azure, DigitalOcean, GCP) and Infrastructure as Code tools such as Helm and Terraform
  • Experience supporting cloud, on-prem, and hybrid deployments

Nice To Haves

  • Background in developer productivity tooling or internal platform teams
  • Prior experience building systems that connect infra events to developer workflows
  • Exposure to agentic systems or AI observability platforms

Responsibilities

  • Troubleshooting Disparate Systems: Our customers use a wide variety of platforms, so flexibility and curiosity are critical
  • External Interface: Gather requirements from new customers, guide them through onboarding, and maintain positive relationships to ensure their success
  • Internal Collaboration: Partner with engineering, AI, and product teams, passing along what you learn from end users as well as your own input
  • Evaluation and Analysis: Use your troubleshooting work and customer RCAs to evaluate Traversal’s performance and find ways to improve it
  • Incident Management: Lead and mature our internal on-call and incident response processes, including alerting, debugging, and postmortems

Benefits

  • We offer competitive salary and startup equity packages, health insurance, and additional benefits.
  • We’ll make sure you’re fully supported with a great tech setup, flexible time off, and plenty of in-office snacks.
  • We give thoughtful consideration to every hire on our small, high-impact team.