Staff Platform Engineer

Sanas, Palo Alto, CA

About The Position

We're looking for an experienced Platform Engineer to build and operate the hybrid infrastructure foundation for our advanced AI/ML research and product development. You'll architect, build, and run our platforms spanning AWS and on-premise deployments, empowering our teams to train and deploy complex models at scale. This role is focused on creating a robust, self-service environment using Kubernetes, AWS, and Infrastructure-as-Code (Terraform), and orchestrating high-demand GPU workloads.

Requirements

  • 5+ years of Software Engineering experience, preferably in Platform Engineering or Site Reliability.
  • Strong fundamentals with a focus on writing clean & maintainable code.
  • Strong proficiency in scripting (Bash), Python, or Rust.
  • Experience building large-scale distributed systems with high demands on model inference, performance, reliability, and observability.
  • Experience with high-performance compute (HPC) schedulers, capacity planning, and containerized deployments, and familiarity with managing GPU-intensive AI workloads.
  • Strong communication skills and the ability to own large-scope projects by working cross-functionally with Engineering, AI, Product, Research, and Business stakeholders.
  • Experience working with AWS (preferred), GCP, or Azure, and with EKS / Kubernetes.
  • Deep curiosity about the state of agentic coding tools and how to optimize agent-assisted workflows.

Nice To Haves

  • Familiarity with real-time streaming protocols like WebTransport and SIP/SRTP.
  • Bachelor’s Degree in Computer Science or a related field.

Responsibilities

  • Architect and maintain our core computing platform using Kubernetes on AWS and on-premise, providing a stable, scalable environment for all applications, research initiatives, and Sanas services.
  • Provision, manage, and maintain our on-premise bare metal server infrastructure for high-performance GPU computing.
  • Lead comprehensive observability across the organization (monitoring, logging, tracing) to ensure platform health, and create automation for operational tasks, incident response, and performance tuning.
  • Design and build low-latency, scalable, and reliable infrastructure that serves model inference and training for our cutting-edge speech AI models.
  • Collaborate with AI researchers and ML engineers to understand their infrastructure needs and build the tools and workflows that accelerate and support their development cycle.
  • You'll have significant autonomy to shape our product infrastructure and directly impact how cutting-edge AI is applied across various devices and applications in speech.