Staff Production Engineer, Cloud Infrastructure

Crusoe
San Francisco, CA
$209,000 - $253,000

About The Position

We’re looking for a Staff Production Engineer to lead the design and operation of critical components within the cloud platform that powers our AI-first compute environment. In this role, you will own reliability, scalability, and operational excellence across a defined infrastructure domain while partnering closely with other engineers to evolve the broader platform strategy. You’ll combine deep hands-on execution with leadership, helping raise the bar for production engineering practices across the organization.

Requirements

  • 7-10 years of experience operating large-scale production workloads on major cloud providers such as GCP or AWS
  • Deep knowledge of GCE, Kubernetes in general and GKE in particular, VPC networking, load balancers, firewall rules, interconnect, and GCS
  • Strong experience managing Kubernetes workloads and authoring Helm charts
  • Strong Terraform experience and a track record of building automated multi-environment infrastructure
  • Hands-on experience with Kubernetes internals, workload orchestration, scaling, and observability
  • Ability to debug complex distributed systems across compute, storage, and network boundaries
  • Strong cloud security fundamentals, including least privilege, secrets management, and policy enforcement
  • Proficiency with Python, Go, or Shell for automation and tooling
  • Experience influencing design decisions and partnering with cross-functional teams

Nice To Haves

  • Experience supporting high-performance or AI/ML workloads in GCP
  • Familiarity with service mesh or multi-cluster Kubernetes operations
  • Background in hybrid or multi-cloud infrastructure
  • Strong SRE fundamentals including SLOs, incident response, and postmortems
  • Experience with Spanner, BigQuery, Bigtable, or large-scale data platforms

Responsibilities

  • Design, build, and manage core cloud infrastructure across compute, networking, storage, and IAM
  • Architect, operate, and scale Kubernetes-based platforms
  • Deploy and manage Kubernetes workloads using Helm charts and continuous deployment systems
  • Help operate the observability platforms for cloud and Kubernetes workloads using tools such as VictoriaMetrics and Grafana
  • Develop and maintain Terraform modules to define automated, auditable, and secure cloud environments
  • Own VPC design, routing, load balancers, interconnects, peering, and network security boundaries
  • Implement policies and guardrails across IAM, resource hierarchy, service accounts, and VPC-SC
  • Build automation for provisioning, lifecycle management, and blue/green or canary deploy patterns
  • Partner closely with security and platform teams on monitoring, logging, compliance, and operational readiness
  • Optimize cloud costs, quotas, and capacity planning across multiple projects and regions
  • Troubleshoot complex production issues across compute, storage, and networking layers

Benefits

  • Industry-competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit of $300 per month