About The Position

The Simulation Scalability team builds the core systems, frameworks, and workflows that ensure simulation runs reliably, efficiently, and at scale. We work across all simulation products and pipelines, addressing systemic issues, improving release quality, and strengthening the foundations that enable teams to iterate quickly and confidently. Our work spans systems architecture, software engineering, release management, and quality strategy, with a mission to increase reliability, throughput, and operational excellence across the entire simulation ecosystem.

Requirements

  • Experience improving release workflows, verification processes, or quality gates.
  • Ability to spot gaps in test coverage, validation workflows, or release signals and drive systemic improvements.
  • Ability to analyze complex simulation or compute workflows and design scalable, maintainable solutions.
  • Skill in identifying recurring failure patterns and eliminating their systemic causes.
  • Strong coding fundamentals with experience building production-quality services, frameworks, or tools.
  • Background in improving reliability, performance, or developer experience in distributed systems.

Nice To Haves

  • Interest in growing across release engineering, systems architecture, and software development.

Responsibilities

  • Lead improvements to the end‑to‑end simulation release process, ensuring it is predictable, well‑defined, and continuously improving.
  • Define and drive the simulation-wide quality and test strategy, including shift‑left testing, automated validation, and early‑signal integration.
  • Strengthen release readiness criteria, quality gates, and verification workflows across multiple simulation teams.
  • Partner with product teams to reduce regressions, shorten validation cycles, and ensure higher-confidence releases.
  • Improve the performance, stability, and reliability of simulation frameworks and core tooling used at scale.
  • Identify systemic bottlenecks in the simulation stack and deliver architectural improvements that increase throughput and reduce operational friction.
  • Enhance simulation exit codes, error attribution, and debugging workflows to improve diagnosability and reduce wasted compute.
  • Establish and enforce best practices for simulation incident management, improving resilience, recovery time, and overall operational readiness.
  • Improve signals, metrics, and monitoring patterns that help teams identify release-related issues earlier and more accurately.
  • Collaborate closely with simulation feature teams, infrastructure groups, and platform stakeholders to align on scaling needs, test coverage gaps, and systemic quality improvements.
  • Work backward from organizational-level problems to design solutions that raise simulation quality and reliability across all workflows.

Benefits

  • From day one, we're looking out for your well-being, at work and at home, so you can focus on realizing your ambitions.
© 2026 Teal Labs, Inc