About The Position

The Trade Desk is a global technology company and the world’s leading independent platform for digital advertising, with nearly 4,000 employees across more than 30 offices. Their technology helps advertisers reach the right audiences across the open internet, from streaming TV and podcasts to mobile apps, news, and more. Advertising powers the content people love, and by making it more transparent, effective, and responsible, The Trade Desk helps support trusted journalism, quality entertainment, and creators worldwide. The world’s brands and agencies rely on them to reach their customers and grow their businesses responsibly. The scale of their platform brings unique technical challenges, from processing massive datasets in real time to building systems that operate reliably on a global scale. Working at The Trade Desk offers worldwide impact, diverse perspectives, and a culture of curiosity and learning.

The Service Excellence (SE) team owns the tools and infrastructure that help engineers at The Trade Desk understand and operate production systems. The Incident Response Services (IRS) taskforce, part of the SE team, focuses on the on-call experience and is responsible for making incidents easier to detect, manage, and optimize using historical data.

Requirements

  • Experience building and operating production infrastructure or internal developer tooling
  • Comfort working across the stack — this role touches distributed systems, Kubernetes, observability pipelines, and web-based tooling
  • Familiarity with observability concepts: logging, alerting, on-call workflows
  • Strong debugging instincts: you will be called on when things break
  • Clear communication: The team works closely with engineers across the company; you'll need to explain tradeoffs and advocate for solutions

Nice To Haves

  • Experience with Grafana, Prometheus, or similar observability tools
  • Familiarity with Sumo Logic or other log management platforms
  • Prior work on developer portals or service catalog tooling (Backstage, OpsLevel, etc.)
  • Experience with Kubernetes at scale

Responsibilities

  • Incident management tooling: Build and maintain automation around the incident lifecycle, including alerting, escalation, incident channels, retros, and SLA tracking
  • Logging stack evaluation and migration: Participate in the re-evaluation of our logging vendor and collection architecture
  • Backstage/Service catalog: Extend our internal developer portal with K8s integrations, maturity models, and SLO adoption tooling
  • Alert quality tooling: Build the systems that give engineers better signal and less noise, with smarter routing, better grouping, and tighter feedback loops between alerts and the teams that own them

Benefits

  • comprehensive healthcare (medical, dental, and vision) with premiums paid in full for employees and dependents
  • retirement benefits such as a 401(k) plan and company match
  • short and long-term disability coverage
  • basic life insurance
  • well-being benefits
  • reimbursement for certain tuition expenses
  • parental leave
  • sick time of 1 hour per 30 hours worked
  • vacation time for full-time employees of up to 120 hours through the first year and 160 hours thereafter
  • around 13 paid holidays per year
  • Employees can also purchase The Trade Desk stock at a discount through The Trade Desk’s Employee Stock Purchase Plan