Principal Cloud Engineer - Infrastructure (Automation & BCDR)

ForeFlight•Austin, TX

$208,000 - $244,000•Hybrid

About The Position

Jeppesen ForeFlight is redefining the future of aviation technology – equipping pilots, operators, and decision-makers with the software tools to forge bold new digital paths that push past the expected. We stand with those who dream bigger, fly farther, and seek to explore the next frontier in the cockpit and operations center. This role is preferably hybrid, based in Austin, TX; Houston, TX; or Denver, CO. We will consider a virtual arrangement for the right candidate.

Requirements

12+ years of engineering experience, with at least 7 years as the primary architect or technical owner of infrastructure automation platforms and resilience programs at scale
Deep production experience designing and operating IaC at scale: Terraform (or CDK/Pulumi equivalent), with strong opinions on module strategy, state management, policy-as-code, and guardrail enforcement across many cloud accounts and environments
Expert command of CI/CD for infrastructure: pipeline design, drift detection, plan/apply workflows, secrets handling, and self-service patterns that serve engineering teams safely at scale
Track record owning Business Continuity and Disaster Recovery strategy end-to-end: setting RTO/RPO targets, designing multi-region failover, running real DR exercises, and translating findings into durable architectural change
Hands-on experience with chaos engineering and resilience testing in production environments, including failure-injection tooling and game-day operations
Strong grounding in observability for infrastructure: SLOs, drift detection, state-of-the-fleet visibility, and instrumenting both control-plane and data-plane signals
Deep production experience in at least one major cloud (AWS preferred), with credible breadth across both AWS and Azure, or strong evidence that you can become productive in both
Cross-functional leadership: comfortable operating as a peer with senior security, compliance, finance, and product engineering leaders in business continuity and audit-readiness conversations
Comfortable with the coordination work of a recently combined company: divergent automation stacks, in-flight unification, and the political work that comes with consolidation

Nice To Haves

Experience leading a BCDR program through external audit or regulatory review (SOC 2, FedRAMP, ISO 22301, financial-services resilience frameworks, or aviation-relevant equivalents)
Experience standing up or evolving a self-service infrastructure platform (Backstage, internal developer portal, or equivalent) with golden-path provisioning patterns
Hands-on experience with infrastructure orchestration tooling beyond raw Terraform (Terragrunt, Atlantis, Spacelift, env0, Crossplane, or similar)
Experience with chaos engineering tooling (AWS FIS, Azure Chaos Studio, Gremlin, Chaos Mesh, Litmus) in production
Experience designing and operating cross-region or cross-cloud disaster recovery for stateful workloads (databases, message queues, object stores)
Background in SRE or platform reliability with strong instincts for SLO design, error budget policy, and toil reduction
Post-M&A experience integrating infrastructure automation platforms across two or more legacy stacks
Experience in aviation, regulated industries, or other domains with mission-critical workloads and strict business continuity requirements
Background contributing to or evaluating resilience standards and frameworks (ISO 22301, NIST SP 800-34, or industry equivalents)

Responsibilities

Own and evolve infrastructure automation platforms, including CI/CD pipelines for infrastructure and self-service provisioning workflows, serving engineering teams across a distributed, multi-region environment
Lead the design and continuous validation of Business Continuity and Disaster Recovery strategy, including RTO/RPO target-setting, failover design, chaos engineering, and recovery runbook ownership
Build and operate observability and resilience tooling to ensure infrastructure state is fully instrumented, drift is detected proactively, and failure scenarios are exercised before they're encountered in production
Define and govern IaC standards (Terraform, CDK, or equivalent), including module strategy, state management, and guardrail enforcement across cloud accounts and environments
Own platform reliability outcomes: establish SLOs for core infrastructure services, drive down toil through systematic automation, and maintain high standards for incident response quality
Operate effectively across a complex organizational context, translating business continuity requirements from engineering, security, and compliance stakeholders into concrete infrastructure design and validated recovery capability