Senior Infrastructure Engineer

Groq, Inc. | Palo Alto, CA
65d | $132,100 - $279,800 | Remote

About The Position

Mission: At Groq, we are building a custom cloud from the ground up, one data center at a time. Our Compute Storage team owns the systems that turn racks of bare metal into production-ready Kubernetes clusters powering the next generation of AI workloads. We are looking for a Staff Infrastructure Engineer to help us scale this effort. This is a hands-on role focused on fully automating deployment and lifecycle management of the Groq Cloud server fleet. You will work closely with data center (DC), network, and platform teams to define and develop tools and automation that enable seamless deployment and management of Groq compute nodes and storage clusters. We're looking for someone passionate about infrastructure who enjoys debugging close to the metal. If you're eager to grow your skills in deploying, scaling, and optimizing bare metal to support complex distributed HPC in the expanding inference market, we would love to talk.

Requirements

  • Experience deploying and supporting Linux and Kubernetes systems at scale.
  • Familiarity with infrastructure-as-code and Git-based workflows (e.g., Terraform, Flux, Kustomize).
  • Ability to write and maintain basic tooling in common modern languages such as Go and Python.
  • Understanding of networking fundamentals (IPAM, VLANs, DHCP, DNS).
  • Working knowledge of storage concepts (block vs object, NFS, RAID, etc.).
  • Strong sense of ownership and a willingness to work through ambiguity.

Nice To Haves

  • Experience provisioning physical machines in a data center environment.
  • Exposure to Talos Linux, Kubernetes bootstrapping, or Kubernetes platform engineering.
  • Previous collaboration with facilities, hardware, or network teams in an operational role.

Responsibilities

  • Develop robust, scalable automation solutions (Go, Python, Bash) to streamline and standardize deployment workflows across global data center environments.
  • Collaborate cross-functionally with data center operations, networking, and platform teams to ensure infrastructure is fully integrated and production-ready.
  • Develop automation that brings all production machines and clusters up to health standards quickly and keeps them there consistently.
  • Define best practices and standards for infrastructure-as-code and configuration management using Git, Flux, Terraform, and related tools.
  • Set technical direction and maintain high-quality system documentation, operational runbooks, and internal tooling that improve the resilience, repeatability, and observability of the infrastructure stack.

Benefits

  • At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Professional, Scientific, and Technical Services
  • Education Level: No Education Listed
  • Number of Employees: 251-500 employees
