DevOps Engineer

Luminary Cloud, San Mateo, CA

About The Position

We’re building out a cloud platform team and are looking for a Senior DevOps Engineer to own the developer infrastructure that powers our products: how we deploy, scale, observe, and secure systems across GCP and AWS, with Kubernetes at the core. This isn’t a ticket-queue role. You’ll work directly with engineers building services in Go and TypeScript, researchers training PyTorch models, and leadership defining the roadmap. You’ll have real ownership and the latitude to build things the right way from the start.

Requirements

  • 5–8 years of experience in DevOps, SRE, or platform engineering roles
  • Production Kubernetes experience — cluster management, not just deploying workloads
  • Hands-on experience with GCP or AWS; solid conceptual understanding of both
  • Experience owning CI/CD pipelines and GitOps workflows end to end
  • Proficiency in Go or Python for writing infrastructure tooling and automation
  • Infrastructure as Code expertise with Terraform or Pulumi
  • Experience with observability stacks: Prometheus, Grafana, and a log aggregation platform
  • Strong grasp of cloud security fundamentals: IAM, secrets management, network policies

Nice To Haves

  • Experience supporting ML training infrastructure, GPU node pools, or model serving (TorchServe, Triton)
  • Familiarity with TypeScript for build tooling or internal developer platforms
  • Background in a fast-moving startup or product engineering environment
  • Contributions to open-source infrastructure tooling

Responsibilities

  • Design, build, and operate cloud infrastructure on GCP with an emphasis on reliability, security, and cost efficiency
  • Own and evolve our Kubernetes platform — cluster architecture, RBAC, networking, autoscaling, and workload scheduling
  • Build and maintain automated CI/CD pipelines using GitHub Actions and ArgoCD, supporting GitOps workflows for all services
  • Write Go and Python tooling to automate infrastructure tasks, improve developer experience, and extend internal platform capabilities
  • Establish observability practices — metrics (Prometheus/Grafana), distributed tracing (OpenTelemetry), and centralized logging
  • Define and enforce security best practices: secrets management (Vault/KMS), image scanning, IAM least-privilege, and network policies
  • Support GPU-based ML workloads, working with researchers to provision and optimize node pools for PyTorch training and inference
  • Respond to incidents and lead blameless postmortems to drive continuous improvement in system reliability
  • Write clear documentation and champion a culture of engineering excellence across the team

Benefits

  • Competitive salary, equity, and benefits

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

No Education Listed

Number of Employees

1-10 employees
