fal • Posted 4 months ago
$180,000 - $250,000/Yr
Full-time • Mid Level
San Francisco, CA
1-10 employees

We are seeking an experienced software engineer who thrives on building large-scale computation platforms. The ideal candidate will have deep expertise in backend systems that orchestrate workloads and route requests efficiently while managing capacity and resource constraints. A strong understanding of foundational cloud infrastructure and of Linux provisioning and management tools is essential. The candidate should know how to achieve reliability and scale with minimal operational overhead.

  • Develop and maintain our core Python platform, which handles request routing, orchestration of AI workloads, GPU server capacity management, observability, authentication, rate limiting, and more
  • Develop and maintain our infrastructure layer using Terraform, Ansible, and provider APIs to manage our fleet of GPU workers
  • Own K8s, FluxCD, Nomad, Prometheus, Thanos, Grafana, Loki, distributed networking storage, and other technologies that underpin our platform
  • Create the vision and lay the foundation for where our infrastructure should go over the next 1, 2, and 5 years
  • Deep experience building distributed compute platforms, preferably with Python
  • Strong foundation in managing both cloud and bare metal infrastructure
  • Solid understanding of K8s and of running CI/CD on it
  • Excellent communication skills
  • Self-starter who executes quickly, takes ownership, and constantly seeks improvement
  • Interesting and challenging work
  • Employee-friendly equity terms (early exercise, extended exercise)
  • Ample learning and growth opportunities
  • Health, dental, and vision insurance (US)
  • Regular team events and offsites