Software Engineer - Member of Technical Staff

Mithril
Palo Alto, CA
Hybrid

About The Position

You will work across Mithril's three core engineering areas: Consumption (the developer-facing product, billing, and API surface), Platform (the orchestration and marketplace engine), and Supply (integrations with cloud providers and capacity management). You will own meaningful slices of each, shipping features end-to-end against a product that handles real revenue and business-critical customer workloads. This is not a single-domain role. You'll move between backend systems, marketplace logic, and customer-facing surfaces, with direct exposure to the architectural decisions shaping how Mithril scales.

Most early-stage engineering roles at infrastructure companies are either deep in systems (little product surface) or deep in product (little systems exposure). At Mithril, you'll work across both: the orchestration engine that manages GPU capacity across providers, and the developer-facing surfaces that customers use to reserve, bid, and consume that capacity.

The systems you build will handle real money, real workloads, and real market dynamics: spot auctions, reservation pricing, and capacity allocation across a heterogeneous supply base. If you want to understand how GPU infrastructure markets actually work and build the systems that run them, this is that role.

Requirements

  • The profile we're hiring for is a generalist with strong backend instincts — someone who has shipped and maintained production systems at real scale, and is comfortable owning features across the full stack when needed.
  • Strong Python backend skills — you've built and maintained production APIs at real scale, not just prototypes.
  • Proven experience with distributed systems; comfortable with messaging systems (RabbitMQ, Kafka) and RPC tooling (gRPC, Protocol Buffers, tRPC).
  • Fluency with relational databases: schema design, query optimization, and migrations in production environments.
  • Ability to own a feature end-to-end — from design doc through backend API, database migration, feature flag rollout, and monitoring.
  • Strong debugging instincts: calm and systematic when triaging production incidents, with a bias toward durable fixes.
  • Comfortable in small teams where ownership is high and process is lightweight.

Nice To Haves

  • Experience with marketplace and billing systems — especially variable-usage, auction-based, or complex reservation models.
  • Familiarity with Kubernetes, AWS infrastructure, or cloud provider APIs (GCP, OCI, Nebius).
  • Familiarity with Linux containers and container orchestration (Docker Swarm, Nomad).
  • Comfort working across the stack — Python backend and TypeScript/React frontend.
  • Experience with developer tools or CLIs (bonus: SkyPilot or similar frameworks).
  • Proficiency with infrastructure-as-code (Terraform, Kustomize) or observability tooling (Grafana, Prometheus).
  • Background in real-time systems: SSE, WebSockets, or event-driven architectures.
  • Prior experience at a startup where you wore multiple hats and shipped across domains.

Responsibilities

  • Build and maintain backend Python APIs powering Mithril's reservation, bidding, billing, and usage surfaces — including usage visualization and the billing platform.
  • Write and maintain database migrations, design relational schemas for complex multi-entity models (orders, allocations, grants, quotas), and optimize queries for performance.
  • Own features end-to-end: from design doc through backend API, database migration, feature flag rollout, and monitoring.
  • Design and implement orchestration primitives for flexible reservations: pause/return credits, reservation extensions, capacity algorithm updates, and the auction model connecting spot and reserved capacity.
  • Contribute to operational maturity — quota systems, financial controls, fraud prevention, and inventory management automation — so Mithril can scale supply sources without scaling headcount linearly.
  • Contribute to supply-side integrations, bringing GCP, Nebius, and OCI resources under Mithril management through the Mithril API.
  • Build tooling to give operators visibility into managed capacity and surface supply constraints early.
  • Build consumption tooling to help developers make better decisions: reservation calculators, bid modeling, cost dashboards, and CLI extensions.
  • Work across the client frontend to ship customer-facing features including usage graphs, billing views, and reservation management UIs.
  • Participate in on-call rotations, triage production incidents, and contribute to operational runbooks and automation that reduce toil over time.

Benefits

  • Health, dental, and vision coverage for you and your dependents
  • 401(k) plan with 4% company match
  • 21 days of PTO and 14 company holidays, including 2 floating holidays