Senior Engineering Manager, Compute

Temporal Technologies

$320,000 - $335,000

About The Position

Companies at the frontier of the AI revolution run on Temporal. OpenAI runs on Temporal, handling millions of requests. Cursor runs its cloud coding agents on Temporal at over 50 million actions a day across 7M+ workflows, and more than a third of the pull requests its users merge now come from those agents. Replit, Lovable, Abridge, and Hebbia build their agents on it too. In the last year alone, AI-native companies executed 1.86 trillion actions on Temporal Cloud, and the curve is still bending upwards. Backed by a recent $300M Series D at a $5B valuation, we are building the durable execution layer the agentic era depends on.

The Compute team owns the layer all of that runs on. We are looking for a Senior Engineering Manager to lead the effort to make every aspect of Temporal's compute invisible to our customers, so they can focus on application-layer innovation while we handle the compute muck. This is a rare, build-the-foundation mandate: the compute substrate that the world's most demanding AI workloads will run on. We want a leader who has operated compute at planet scale, thinks in fleets, goodput, and cost-per-unit-of-compute, and pairs that with the operational rigor to run a service that frontier-AI companies bet production on.

Requirements

Proven experience leading software engineering teams that build and operate large-scale compute platforms or fleets, with strong operational practices.
12+ years in software and/or infrastructure engineering, including 7+ years of people management and demonstrated ownership of delivery and live-site outcomes.
Deep distributed-systems and compute-infrastructure expertise, with the hands-on judgment to guide architecture and execution up close rather than from a distance.
Experience operating multi-tenant compute that other people's production workloads depend on.
Bachelor's degree in Computer Science or related field, or equivalent practical experience; advanced degree a plus.
Excellent communication skills, with the ability to partner across engineering, product, and leadership and fold customer feedback into the roadmap.
Strong leadership, coaching, and performance management; ability to grow engineers and build a healthy, accountable, high-ownership team.
Excellence in execution: planning, prioritization, and delivering iterative milestones in an ambiguous, fast-moving environment while managing unplanned work.
Fleet thinking: utilization, goodput, capacity and supply planning, and cost discipline as first-class engineering concerns.
Live-site reliability craft: on-call, incident management & response, and postmortem-driven continuous improvement.
Strong command of the building blocks of a compute platform: multi-tenant isolation and security, scheduling, and resource management.
Ability to review and raise the bar on technical artifacts (design docs, code reviews) across a distributed-systems codebase.

Nice To Haves

MicroVMs and virtualization (Firecracker, gVisor, Edera) or managed-compute primitives (AWS Fargate, GCP Cloud Run, AWS Lambda), and/or Kubernetes internals.
Building serverless or hosted-compute products from 0 to 1, including the rapid-delivery-vs-durable-platform tradeoffs that come with it.
Multi-cloud delivery across AWS and GCP.
Cold-start, warm-pool, and scheduling/latency optimization for on-demand compute.
Agent sandboxes, secure execution of untrusted code, or other AI-agent infrastructure.
GPU / accelerated compute: fractional GPUs (MIG, MPS, time-slicing), GPU scheduling, training vs. inference fleets, and multi-tenant GPU isolation.

Responsibilities

Own the strategy and standards of excellence for the compute layer that the world's agents run on, across design, delivery, and operations. Build a culture of ownership, quality, and customer-first decision-making.
Lead, hire, and grow a high-ownership team; roll up your sleeves and go deep into the trenches, staying close to design docs and code rather than managing from a distance. Coach engineers, level them up, and clear the friction that slows them down.
Drive the arc from today's compute toward the next generation of compute platforms. Ground prioritization in customer and design-partner feedback, and turn ambiguous, fast-moving requirements into predictable, iterative delivery.
Own operations, run on-call and incident response, and drive blameless postmortems and the systemic fixes that prevent recurrence.
Guide the hard architectural decisions for large-scale, multi-tenant compute, where technical concerns cut across workload isolation and security, scheduling, fleet efficiency / utilization / goodput, and performance, while ensuring the platform is reliable and efficient for the workloads that depend on it.
Own utilization, capacity and supply planning, and the cost-per-unit-of-compute and margin profile of the fleet, across CPU compute today and accelerated compute ahead.
Partner with leadership, Product, SDK, UX/DX, Security, and design-partner customers to align priorities and unblock delivery. Communicate progress, tradeoffs, and risk clearly to technical and non-technical audiences alike.

Benefits

Unlimited PTO
12 Holidays + 2 Floating Holidays
100% Premiums Coverage for Medical, Dental, and Vision
AD&D, LT & ST Disability, and Life Insurance (Standard & Supplemental Available)
Empower 401K Plan
Learning & Development
Lifestyle Spending
In-Home Office Setup
Professional Memberships
WFH Meals
Internet Stipend
Calm App Subscription for Mental Health & Wellness