Senior DevOps, Platform Engineer

Moon•Glendale, CA

About The Position

This is a dedicated platform ownership role, not a support role. You will take end-to-end responsibility for the infrastructure and DevOps layer across all products at a growing SaaS company running on multi-cloud infrastructure, partnering directly with the Lead Engineer and a distributed offshore engineering team. You will be the single point of accountability for pipeline and deployment reliability, environments, credential hygiene, observability, cloud cost governance, and compliance posture. Beyond keeping the lights on, you will use AI-driven tooling to improve DevOps workflows and continuously raise the bar on what reliable infrastructure means. You will also serve as the company’s infrastructure voice in financial and compliance conversations, owning the data that informs leadership decisions on cloud investment and risk.

Requirements

Hands-on experience with major cloud providers across multi-cloud environments — provisioning, networking, IAM, and cost management
Terraform or equivalent IaC tooling — you write it, maintain it, and own it
CI/CD pipeline design and ownership using modern CI/CD platforms — you design it, maintain it, and own it
Confident with .NET / C# application deployments
Secrets and credential management at scale using enterprise secrets management platforms
Multi-app, multi-environment deployment pipelines with consistent standards
Mobile build pipeline ownership for cross-platform mobile applications (iOS and Android), including app store deployment automation
Production observability engineering: APM tooling, log aggregation, alerting pipelines, and synthetic monitoring
SLA ownership experience — on-call processes, incident management, and runbook authorship
Demonstrated experience with FinOps practices: cloud cost dashboards, tagging, rightsizing, and budget governance
Familiarity with SOC 2 or ISO 27001 control environments — evidence collection, access reviews, and audit support
Comfortable working directly with a distributed offshore engineering team
Hands-on experience designing or operating active-active, multi-cloud production environments — this is a core platform requirement, not a stretch goal
Proficiency in Go for infrastructure tooling, automation, and custom operators/controllers
Strong Linux shell scripting (bash/sh) for automation, system administration, and pipeline scripting

Nice To Haves

Exposure to AI-assisted DevOps tooling — anomaly detection, LLM-assisted incident response, or AI-generated IaC
SaaS product background, ideally in a multi-tenant environment
Experience with FinOps tooling, such as the native cost management features of Azure and AWS
Certifications such as AWS Certified DevOps Engineer – Professional, Microsoft Certified: DevOps Engineer Expert, or Certified Kubernetes Administrator (CKA)

Responsibilities

Own all CI/CD pipelines across every application — consistent, documented, and not dependent on tribal knowledge
Manage multi-cloud resource provisioning with Terraform or equivalent IaC tooling — version-controlled and fully reproducible
Own secrets, credentials, and access management across all environments using enterprise secrets management platforms
Set up repo creation, branching standards, and tooling for the distributed engineering team
Own cross-platform mobile application build pipelines (iOS and Android) for both app stores
Support the development, deployment, and maintenance of our active-active multi-cloud platform
Own SLA commitments for all production systems — 99.99% uptime target with documented contracts and breach escalation paths
Define and maintain the on-call rotation and incident severity matrix
Maintain and evolve runbooks for all critical failure scenarios — executable by anyone on the team, not just the author
Implement and operate the full observability stack: enterprise observability tooling, centralized log aggregation, real-time alerting, and synthetic monitoring for all customer-critical paths
Own customer-facing and internal system health dashboards — surfacing uptime, error rates, latency, and throughput in real time
Build and maintain FinOps dashboards surfacing cloud spend, resource utilization, and cost-per-environment breakdowns
Implement financial controls: budget alerts, tagging enforcement, reserved instance planning, and rightsizing across major cloud providers
Own the monthly cloud cost review process — flag anomalies, model savings scenarios, and present findings to leadership
Track and report FinOps KPIs: cloud cost as a percentage of revenue, waste percentage, and savings realized
Own the infrastructure contribution to SOC 2 Type II and ISO 27001 — evidence collection, control mapping, and audit readiness at all times
Conduct regular access reviews and enforce least-privilege IAM across all cloud environments
Work directly with the compliance team to respond to auditor requests, close findings, and maintain continuous compliance posture
Implement and monitor infrastructure security controls: network segmentation, encryption at rest and in transit, vulnerability scanning, and drift detection
Identify DevOps workflows where AI tooling can drive measurable efficiency gains — pipeline generation, intelligent incident detection, and automated runbook execution
Evaluate and pilot AI-assisted tools for infrastructure: anomaly detection on metrics streams, LLM-assisted root cause analysis, and AI-generated IaC scaffolding
Document and share findings with the engineering team — serve as the infrastructure voice in the company’s AI-first engineering culture