Senior Software Engineer, Infrastructure

Ironflow AI
San Diego, CA

About The Position

We are building the world's first AI-native ERP for defense tech: a system that thinks, reasons, and acts alongside the people who build, protect, and defend our way of life. We're not patching old ideas with a flashy new user interface. We're starting fresh, reimagining how software supports operations at the cutting edge of the defense tech revolution. Built by veterans, engineers, and operators who've lived the pain of legacy systems, Ironflow combines natural language, intelligent agents, and real-world rigor to empower the frontline with clarity and speed. If you're bold, decisive, and obsessed with impact, join us. This is where curiosity meets purpose.

Why This Role Matters

We're looking for a Senior Software Engineer specializing in Infrastructure who is excited about building and scaling the platform that powers our product. You'll own the reliability, performance, and security of our infrastructure while working closely with product engineers to ship features faster.

This is a high-ownership role at an early-stage company. The role has broad scope by design: you'll own the full infrastructure surface, from cluster operations to developer experience. We operate on AWS GovCloud in an environment purpose-built for ITAR and CUI compliance, which means the infrastructure decisions you make carry real regulatory weight. You won't be maintaining someone else's platform: you'll be the person who built it.

If you want broad ownership, a direct line to product impact, and the technical challenge of building compliance-grade infrastructure from scratch, this role is for you.

Requirements

  • 5+ years of professional experience in infrastructure, DevOps, SRE, or platform engineering.
  • Deep hands-on experience with AWS services in production (EKS, IAM, Secrets Manager, ECR, RDS). Experience with or strong working knowledge of AWS GovCloud is a significant plus.
  • Strong Kubernetes expertise: you've operated clusters, debugged networking issues, managed Helm charts, and tuned workloads.
  • Proficiency in Python and/or TypeScript with a genuine interest in writing application code alongside infrastructure work.
  • Experience with GitOps workflows: ArgoCD, GitHub Actions, and Helm-based deployments.
  • Solid understanding of networking fundamentals (DNS, load balancing, TLS, Kubernetes Gateway API).
  • Comfort with Linux systems administration and shell scripting.
  • Familiarity with compliance-driven infrastructure: audit logging, access controls, and evidence collection for frameworks like CMMC, FedRAMP, and SOC 2.
  • A collaborative, low-ego mindset: you thrive in small, fast-moving teams.

Nice To Haves

  • Experience with Envoy Gateway or the Kubernetes Gateway API.
  • Background in PostgreSQL administration and schema-based multi-tenancy.
  • Familiarity with the Grafana observability stack (Loki, Tempo, Prometheus).
  • Experience with Karpenter for node autoscaling or cost optimization strategies for cloud spend.
  • Experience with Temporal for workflow orchestration.
  • Experience in a startup or high-growth environment where you wore many hats.

Responsibilities

  • Design, build, and maintain production infrastructure on AWS (EKS, RDS, ECR, VPC, IAM, Secrets Manager, etc.).
  • Develop and manage our Kubernetes clusters: deploy workloads, tune Karpenter node autoscaling, maintain Helm charts, and keep clusters healthy.
  • Own and extend our GitOps deployment pipeline: GitHub Actions for CI/CD, ArgoCD for continuous delivery, and Helm for packaging.
  • Manage supporting cluster operators including Envoy Gateway, External DNS, cert-manager, Fluent Bit, and the AWS Load Balancer Controller.
  • Own and improve our observability stack: Grafana for dashboards, Loki for log aggregation, Tempo for distributed tracing, and Prometheus for metrics.
  • Support multi-environment reliability across dev, stage, and production GovCloud accounts.
  • Improve system resilience through load testing (Locust), E2E testing (Playwright/Cucumber), and thoughtful capacity planning.
  • Contribute to backend services (FastAPI, SQLAlchemy) in Python and TypeScript.
  • Work alongside product engineers as a first-class contributor, making architecture decisions that balance speed, cost, and reliability.
  • Build developer experience tooling: local dev environments, CI pipeline improvements, and automated testing scaffolds that make the whole team faster.
  • Support and extend Temporal-based workflow orchestration for background processing.
  • Implement least-privilege IAM policies, IRSA (IAM Roles for Service Accounts), and network segmentation in a GovCloud environment.
  • Manage secrets through AWS Secrets Manager and the External Secrets Operator with automated rotation.
  • Maintain TLS automation via cert-manager and OIDC authentication flows.
  • Enable SOC 2, CMMC, and FedRAMP compliance activities: GRC platform integration, audit logging pipelines, FIPS-validated endpoint configuration, system boundary documentation, and evidence collection for third-party assessments.