Infrastructure Reliability Engineer

Anduril Industries, Costa Mesa, CA

About The Position

This is a small but growing team responsible for the infrastructure and operations behind core developer tools used across the entire engineering organization. You'll own the full lifecycle — patching, upgrades, backups, scaling, and incident response — for services that every engineer depends on daily. The role blends DevOps, SRE, and software engineering, and is ideal for engineers who want high ownership and company-wide impact. You should have a mindset of continuous improvement — if something is manual and repetitive, your instinct should be to automate it away. As the company's on-prem infrastructure footprint grows, this team will expand its scope to provide SRE capabilities for on-prem systems — making this an opportunity to help shape that practice from the ground up.

Requirements

  • Experience operating production systems using Docker and Kubernetes
  • Proficiency with at least one cloud platform (AWS, GCP, or Azure)
  • Experience managing infrastructure with Infrastructure-as-Code tools (e.g., Terraform)
  • Strong problem-solving skills with a focus on automation
  • Scripting or software development experience (e.g., Python, Go, Bash)
  • Familiarity with CI/CD pipelines and developer tooling
  • Ability to own systems end-to-end, from design to incident resolution
  • Eligible to obtain and maintain an active U.S. Secret security clearance

Nice To Haves

  • Prior experience with GitHub Enterprise Server, JFrog Artifactory/Xray, or CircleCI
  • Experience maintaining highly available, scalable internal tools
  • Exposure to security best practices, compliance requirements, or auditing
  • Experience supporting large, rapidly scaling engineering organizations
  • Experience with monitoring and observability platforms (e.g., Datadog, Prometheus, Grafana)
  • Background in SRE or hybrid SWE/DevOps roles
  • Experience with on-prem infrastructure operations, reliability, or capacity planning

Responsibilities

  • Own the lifecycle of core self-hosted developer tools (e.g., GitHub Enterprise Server, CircleCI, JFrog Artifactory/Xray)
  • Design and implement automated systems for patching, backups (with validation), and upgrades
  • Scale infrastructure to support a fast-growing engineering org
  • Use Infrastructure-as-Code (Terraform) to manage environments
  • Operate and troubleshoot systems using Docker, Kubernetes, and cloud platforms (AWS, GCP, Azure)
  • Define and maintain SLOs for service availability, reliability, and performance
  • Build and maintain monitoring, alerting, and observability for developer tool services
  • Lead and participate in incident response and root cause analysis
  • Work cross-functionally with platform, security, infrastructure (on-prem and cloud), and software teams

Benefits

  • Highly competitive equity grants are included in the majority of full-time offers and are considered part of Anduril's total compensation package.
  • Comprehensive, competitive benefits package (available at little to no cost to employees) ensures you’re supported in health, recovery, and whatever comes next.