About The Position

ElastixAI is an early-stage software startup on a mission to reinvent AI inference infrastructure from the ground up. We're building a next-generation inference platform that delivers unprecedented efficiency by tightly integrating machine learning, the software stack, and custom hardware. Our philosophy is simple: the best performance comes from holistic co-design, where every layer, from model architecture to kernels to silicon, works in harmony. If you're excited about pushing AI performance to its physical limits and shaping the future of large-scale inference, we'd love to meet you.

Requirements

  • Minimum BS in Computer Science, Software Engineering, or a related field.
  • 3–5 years of hands-on Kubernetes experience, including EKS, GKE, and/or self-hosted clusters.
  • 2–3 years of production experience operating workloads on AWS or GCP.
  • Proven track record running ML or inference services at scale on Kubernetes in production.
  • Strong experience running accelerated workloads in Kubernetes, including scheduling, drivers, device plugins, MIG, networking, and storage considerations.
  • Solid coding skills in Python and Bash, and proficiency in Go.
  • Proficiency in configuring and operating Linux in production.
  • Experience with infrastructure-as-code (Terraform, Pulumi), OS configuration management tooling (Ansible, Puppet, Salt), and GitOps workflows (Argo CD, Flux).
  • Familiarity with AI inference and/or training workflows and the operational patterns around them.
  • Pragmatic, ownership-oriented mindset; comfortable operating in early-stage ambiguity and shipping iteratively.

Nice To Haves

  • MS/PhD in Computer Science, Software Engineering, or a related field.
  • Experience with inference servers and runtimes (e.g., Triton, vLLM, TGI) and model-serving patterns (batching, streaming, KV-cache aware routing).
  • Exposure to heterogeneous accelerators beyond GPUs (FPGAs, custom ASICs).
  • Background in observability, SRE, or performance engineering for latency-sensitive services.
  • Experience building customer-facing API platforms, including onboarding, API key and auth management, and usage metering.

Responsibilities

  • Build, operate, and evolve ElastixAI's Kubernetes infrastructure powering our Token-as-a-Service capability.
  • Run accelerated inference workloads in production at scale, with strong SLAs around latency, throughput, and availability.
  • Manage and harden our AWS, GCP, and on-prem infrastructure, including networking, storage, IAM, and observability layers tied to our services.
  • Develop tooling and automation in Python, Bash, Rust, and Go to streamline deployments, rollouts, autoscaling, and incident response.
  • Partner with the ML and runtime teams to productionize new inference capabilities, model deployments, and routing strategies.
  • Contribute to capacity planning, cost optimization, and reliability engineering across multi-cloud and self-hosted environments.
  • Help define the platform roadmap as we scale from early customers to broad production deployments.
  • Participate in the ElastixAI on-call rotation.

Benefits

  • Competitive compensation and startup equity package
  • Comprehensive medical, dental, and vision coverage (premiums 100% paid by employer)
  • Flexible Time Off (FTO)
  • Paid parental leave
  • Gym or fitness benefit
  • Commuter benefit
  • Investment in employee learning & development