AI Inference Infrastructure Software Engineer (Kubernetes / Cloud)

ElastixAI•Seattle, WA

Hybrid

About The Position

ElastixAI is an early-stage software startup on a mission to reinvent AI inference infrastructure from the ground up. We're building a next-generation inference platform that delivers unprecedented efficiency by tightly integrating machine learning, the software stack, and custom hardware. Our philosophy is simple: the best performance comes from holistic co-design, where every layer, from model architecture to kernels to silicon, works in harmony. If you're excited about pushing AI performance to physical limits and shaping the future of large-scale inference, we'd love to meet you.

Requirements

Minimum BS in Computer Science, Software Engineering, or a related field.
3–5 years of hands-on Kubernetes experience, including EKS, GKE, and/or self-hosted clusters.
2–3 years of production experience operating workloads on AWS or GCP.
Proven track record running ML or inference services at scale on Kubernetes in production.
Strong experience running accelerated workloads in Kubernetes, including scheduling, drivers, device plugins, MIG, networking, and storage considerations.
Solid coding skills in Python and Bash, and proficiency in Go.
Proficiency in configuring and operating Linux in production.
Experience with infrastructure-as-code (Terraform, Pulumi), OS configuration management tooling (Ansible, Puppet, Salt), and GitOps workflows (Argo CD, Flux).
Familiarity with AI inference and/or training workflows and the operational patterns around them.
Pragmatic, ownership-oriented mindset; comfortable operating in early-stage ambiguity and shipping iteratively.

Nice To Haves

MS/PhD in Computer Science, Software Engineering, or a related field.
Experience with inference servers and runtimes (e.g., Triton, vLLM, TGI) and model-serving patterns (batching, streaming, KV-cache aware routing).
Exposure to heterogeneous accelerators beyond GPUs (FPGAs, custom ASICs).
Background in observability, SRE, or performance engineering for latency-sensitive services.
Experience building customer-facing API platforms, including onboarding, API key and auth management, and usage metering.

Responsibilities

Build, operate, and evolve ElastixAI's Kubernetes infrastructure powering our Token-as-a-Service capability.
Run accelerated inference workloads in production at scale, with strong SLAs around latency, throughput, and availability.
Manage and harden our AWS, GCP, and on-prem infrastructure, including networking, storage, IAM, and observability layers tied to our services.
Develop tooling and automation in Python, Bash, Rust, and Go to streamline deployments, rollouts, autoscaling, and incident response.
Partner with the ML and runtime teams to productionize new inference capabilities, model deployments, and routing strategies.
Contribute to capacity planning, cost optimization, and reliability engineering across multi-cloud and self-hosted environments.
Help define the platform roadmap as we scale from early customers to broad production deployments.
Participate in the ElastixAI on-call rotation.

Benefits

Competitive compensation and startup equity package
Comprehensive medical, dental, and vision coverage (premiums 100% paid by employer)
Flexible Time Off (FTO)
Paid parental leave
Gym or fitness benefit
Commuter benefit
Investment in employee learning & development
