Principal Platform Architect

Traversal, New York, NY
$250,000 - $500,000

About The Position

Traversal is the AI Site Reliability Engineer (SRE) for the enterprise, already trusted by some of the largest companies in the world to troubleshoot, remediate, and even prevent the most complex production incidents. Our mission is to free engineers from endless firefighting so they can focus on creative, high-impact work. Our roots remain deeply embedded in AI research, and we’re channeling that scientific rigor and creativity into building the premier AI agent lab for the enterprise. What we’re proudest of is assembling the most talented yet nicest group of individuals, from researchers at MIT, Harvard, and Berkeley to world-class engineers from industry (Citadel Securities, Cockroach Labs, Datadog, DE Shaw, ServiceNow, Glean, Perplexity, Pinecone, and more), to take on one of the hardest problems for AI to solve. Without the entire team, none of this would be possible.

Requirements

  • 10+ years of experience in backend, infrastructure, or platform engineering, with a strong emphasis on large-scale data systems.
  • Proven expertise in designing, scaling, and operating high-throughput distributed systems for real-time or near real-time data processing.
  • Demonstrated success owning complex infrastructure architecture end-to-end—from initial design through to deployment and long-term maintenance.
  • Deep understanding of data pipeline design patterns (streaming and batch), storage systems, and consistency/performance tradeoffs at scale.
  • Hands-on experience with technologies like Kafka, Flink, Spark, Postgres, S3, and modern observability stacks.
  • Experience architecting for multi-tenant, hybrid, or on-prem environments.
  • Strong systems thinking and debugging skills across infrastructure, networking, and data layers.
  • Excellent communication and collaboration skills with a track record of driving alignment across technical and non-technical stakeholders.
  • Comfortable working in high-velocity, ambiguous startup environments, with a bias toward action and pragmatism.

Nice To Haves

  • Experience making software systems observable using logs, metrics, and traces.
  • Familiarity with Python-based ecosystems.
  • Background in infrastructure for ML/AI or LLM-powered products.
  • Experience provisioning and managing infrastructure using IaC tools (Terraform, Pulumi).
  • Contributions to open source or infrastructure tooling.

Responsibilities

  • Lead the design of scalable, resilient infrastructure systems to power AI-driven root cause analysis and observability workflows.
  • Define and evolve the long-term architecture strategy for infrastructure and observability systems—ensuring they scale with growing AI workloads, customer complexity, and team size.
  • Act as a key partner to product and engineering leadership—aligning on priorities, shaping the roadmap, and driving clarity across teams.
  • Tackle high-leverage, unscoped problems—bringing structure, clarity, and executable plans to ambiguous technical challenges.
  • Establish and evangelize best practices across reliability, system design, code quality, and observability—raising the bar for engineering across the org.
  • Partner with leadership to improve how technical decisions are made, how teams collaborate, and how we scale culture alongside systems.
  • Uplevel Staff and Senior engineers across the company, not just on your immediate team—through pairing, feedback, and technical guidance.
  • Work with recruiting and leadership to define role expectations, calibrate interviews, and evaluate candidates for high-impact roles.
  • Serve as a sounding board for critical architectural decisions, unlock velocity by unblocking teams, and help connect long-term vision with day-to-day execution.

Benefits

  • Competitive compensation
  • Startup equity
  • Health insurance
  • Flexible time off
  • In-office snacks