Principal Software Engineer (Service Platform & Orchestration)

Zscaler
San Jose, CA
$212,000 - $265,000
Hybrid

About The Position

We are looking for a Principal Software Engineer (Service Platform & Orchestration) to join our team. This is a Hybrid (3 days in office) role, reporting to the VP, Engineering in the Zero Trust Exchange department. In this high-ownership position, you will engineer the management plane and reliability systems that govern ZIA’s fleet lifecycle at massive scale. This is a hands-on building role where you will lead the transformation of our infrastructure from legacy automation into a stateful, durable management plane (built on Temporal) to achieve deterministic "one-touch" provisioning and lifecycle operations. You will treat "infrastructure as a distributed system," developing self-healing capabilities and AI-driven SRE practices for a global fleet of 100k+ instances.

Requirements

  • Foundational understanding of AI/ML technologies and experience leveraging, securing, or positioning AI-driven solutions to optimize outcomes within your functional domain
  • Demonstrated curiosity and active exploration of AI tools, with a proven history of integrating new technologies to enhance daily workflows and augment problem-solving
  • BS/MS in Computer Science or a related technical field with 10+ years of experience in hyperscale systems, with a deep understanding of the unique failure modes and technical hurdles that only emerge at massive scale
  • Mastery of backend systems languages (Go, Java, Python, or others) with a proven ability to set the bar for code quality, maintainability, and distributed system correctness
  • Strong experience designing and operating complex distributed systems, with a focus on solving systemic challenges in concurrency, failure handling, and performance optimization
  • Proven track record of developing Platform APIs (REST/gRPC) with strong guarantees for idempotency, verification, and safe rollout patterns

Nice To Haves

  • Proficiency with AI code-assistance tools (e.g., Cursor, Windsurf) to accelerate legacy refactoring and system development
  • Proficiency in PostgreSQL or other relational stores used for high-scale, stateful management-plane services
  • Direct experience building or operating systems with Temporal.io, Cadence, or similar workflow engines

Responsibilities

  • Lead the hands-on development of, and migration to, a workflow-as-code platform (Temporal), building replay-safe, idempotent workflows that ensure deterministic operations at global scale
  • Move the organization beyond "scripted automation" toward a robust management plane that treats the entire global fleet as a single, eventually-consistent distributed system
  • Design and implement services that leverage LLMs and ML for intelligent signal correlation, automated triage, and "self-correcting" fleet operations
  • Develop framework-level services and internal APIs that ensure all new products are delivered "orchestration-ready" with reliability hooks built directly into the code
  • Build deep telemetry (metrics, traces, and events) into the management plane so that every fleet-wide action is fully explainable, auditable, and replayable

Benefits

  • Various health plans
  • Time off plans for vacation and sick time
  • Parental leave options
  • Retirement options
  • Education reimbursement
  • In-office perks