Cloud Infrastructure Engineer

Firestorm • San Diego, CA

58d•Onsite

About The Position

Firestorm is building the next generation of uncrewed aircraft and the advanced manufacturing systems that deliver them at speed. The Software Integration & Operations department owns the software layer that spans factory floor to cloud: the applications, automation, edge systems, and intelligence that make it possible to iterate product designs, automate advanced manufacturing, and scale production with uncompromising quality and rigor.

As a Cloud Infrastructure Engineer, you own the cloud runtime the manufacturing software platform operates on. Your scope is how the platform actually runs in production: the Kubernetes clusters, the service mesh, the observability stack, the deployment pipelines, and the multi-account patterns that keep everything reliable across commercial cloud, GovCloud, and on-premise edge environments. This is a hands-on individual contributor (IC) role where reliability is the product: developer velocity, incident response quality, and production stability are all downstream of how well the platform you own operates.

Requirements

Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience).
5+ years of cloud infrastructure, platform engineering, or SRE experience with production ownership of a non-trivial platform.
Deep proficiency in Kubernetes in production — not just deploying workloads, but operating clusters, managing upgrades, and diagnosing failures at the cluster level.
Strong proficiency in Terraform and infrastructure-as-code patterns at multi-account scale.
Hands-on experience with service mesh technologies (Istio, Linkerd, or equivalent) and the patterns they enable — traffic management, authorization policies, observability.
Deep experience with cloud observability across at least one mature stack: metrics, distributed tracing, log aggregation, and alerting — and a track record of using them to find problems, not just display them.
Demonstrated incident response leadership — you have been the person in the room during production incidents and the one running the post-mortem afterward.
Demonstrated history of holding yourself and your teammates to a high standard, even when it creates discomfort.
U.S. person status required due to ITAR/EAR constraints on the work.

Nice To Haves

Hands-on experience with AWS GovCloud, Azure Government, or other regulated cloud environments.
Experience deploying containerized workloads into on-premise or air-gapped environments.
Familiarity with identity providers (Keycloak, Okta, Auth0) and OIDC/OAuth2 flows at scale.
Background that includes building a platform from early-stage through production-scale, not just operating a mature one.

Responsibilities

Own the cloud runtime — Kubernetes clusters, service mesh, and multi-account patterns — that the manufacturing software platform operates on across commercial cloud, GovCloud, and on-premise edge environments.
Design and evolve the platform's infrastructure-as-code, from foundational Terraform modules through Helm charts and GitOps workflows.
Own runtime observability end-to-end: metrics, traces, logs, SLOs, alerting, and the dashboards engineers actually use during incidents.
Drive production reliability — define incident response practices for the platform, serve as the escalation point during incidents, and run blameless post-mortems.
Architect the service mesh and traffic management layer — Istio or equivalent — to support routing, resilience, and observability patterns at platform scale.
Partner with the Infrastructure & Security Engineer and the DevOps function to define security and compliance requirements and to adopt a common CI/CD infrastructure.
Raise the operational bar across the department, including deployment practices, runbook quality, and on-call rotations.

Benefits

Comprehensive medical, dental, and vision plans
401(k) Retirement Savings Plan
Equity grants for new hires
Unlimited PTO
Extremely generous company holiday calendar, including a holiday hiatus in November and December
Generous Parental Leave
Lifestyle Spending Account
FSA
DCFSA
HSA
Hospital Indemnity insurance
Critical Illness insurance
Accident insurance
Basic Life/AD&D, short-term disability, and long-term disability insurance, 100% covered by Firestorm, plus the option to purchase additional life insurance for you and your family.
Mental Health Resources: We provide free mental health resources 24/7 including therapy and more. Additional work-life services, such as free legal and financial support, are available to you as well.