Senior DevOps Engineer

SkyWater Technology Foundry, Inc. | Bloomington, MN
$134,080 - $201,120 | Onsite

About The Position

We build and operate the platforms that power our enterprise AI/ML, data engineering, and reporting/BI workloads. We run in a regulated Google Cloud environment (FedRAMP High), where reliability, security, and operational rigor are non-negotiable. We are hiring a seasoned DevOps engineer who can join our team and be self-sufficient from day one, owning the infrastructure, CI/CD, observability, and security guardrails that keep our AI, data, and reporting systems secure, compliant, and reliable. You will serve as a hands-on engineer and a mentor to others: you’ll standardize environments, reduce toil, harden delivery pipelines, and improve incident response, all within the constraints of a FedRAMP High environment.

Requirements

  • 10+ years in DevOps, platform engineering, or production operations for critical systems.
  • Proven experience operating in regulated cloud environments (FedRAMP High and/or similarly constrained government high-side environments).
  • Strong hands-on capability with:
      • GCP operations (projects, IAM/service accounts, networking fundamentals)
      • Kubernetes/GKE in production
      • Terraform (or equivalent IaC) at scale
      • CI/CD systems and release automation
      • Observability (logs/metrics/traces), alerting, and incident response
  • Experience working with GCP big data deployments (e.g., BigQuery, Bigtable, Cloud Build, Cloud Run, Cloud Functions, Managed Instance Groups, Airflow, gsutil, Cloud Composer, Vertex AI and ML tools)
  • Proficiency in Linux and in at least one of Python or SQL, plus shell scripting (e.g., Zsh, Bash, PowerShell).
  • Experience working with BI, ETL, and data management tools (e.g., dbt, Power BI, Tableau)
  • Demonstrated ability to work independently: take ambiguous problems, drive execution end-to-end, communicate clearly, and land durable solutions.
  • Knowledge of API development (REST API)
  • Experience creating and hardening Docker images and Compose files
  • Experience with Virtual Private Cloud (VPC) and cloud network segmentation
  • Comfortable working in an Agile/Scrum environment

Nice To Haves

  • Experience supporting AI/ML platforms (training/inference workflows, model packaging/versioning, GPU capacity planning).
  • Experience supporting data platforms (warehouse, ETL/ELT, orchestration, streaming) in regulated environments.
  • Familiarity with compliance artifacts and workflows (e.g., SSP/POA&M concepts, control narratives, evidence collection), without needing constant direction from Governance, Risk, and Compliance teams.
  • GCP security posture tooling and workflows
  • Policy-as-code / admission controls (OPA/Gatekeeper or similar patterns)
  • Supply chain security (artifact signing, SBOMs, dependency management, container scanning)

Responsibilities

  • Own production reliability for AI + data platforms
      • Operate and continuously improve platform reliability for batch + streaming pipelines, reporting SLAs, and ML workloads.
      • Define and run SLOs/SLIs, alerting standards, and incident response processes (on-call, postmortems, measurable follow-ups).
      • Build runbooks, dashboards, and automation that reduce MTTR and recurring incidents.
  • Build secure, compliant delivery “paved roads”
      • Design and maintain CI/CD for services, pipelines, infra, and (where applicable) model artifacts.
      • Implement safe deployment patterns: progressive delivery, automated rollbacks, change controls, and release governance appropriate for regulated environments.
      • Own “golden paths” and templates so engineering teams can ship reliably without reinventing the wheel.
  • Design and maintain Terraform modules and IaC standards for repeatable GCP provisioning.
  • Operate GCP org/folder/project structures, network patterns, and environment separation (dev/stage/prod) aligned to compliance requirements.
  • Establish secure baseline configurations and guardrails (policy-as-code where relevant).
  • Implement and operate security controls aligned to FedRAMP High / NIST 800-53 High baseline concepts: IAM hardening, audit logging, encryption, vulnerability management, secure configuration, incident handling, and continuous monitoring.
  • Partner with compliance/security stakeholders to support audit readiness through evidence automation, control mapping, and operational documentation.
  • GKE platform operations: cluster lifecycle, upgrades, node pools, workload identity, RBAC, network policy, resource governance.
  • Centralized logging/monitoring and audit: alert hygiene, retention, routing, and security event visibility.
  • Secrets and key management: Secret Manager, Cloud KMS, key rotation patterns, access controls.
  • Network controls for regulated environments: private connectivity patterns, service perimeters, and controlled egress.
  • Participate in an on-call rotation for platform-owned services, with strong expectations to reduce noisy alerts and recurring incidents through engineering.

Benefits

  • competitive salary
  • opportunity to participate in incentive plans
  • 401k match
  • life insurance
  • opportunities to purchase SkyWater stock at a discounted rate
  • benefit eligibility from day one
  • medical
  • dental
  • mental health benefits
  • vision
  • legal planning
  • short- and long-term disability
  • paid time off
  • paid holidays
  • on-site fitness facility
  • on-site self-serve market