DevOps Supervisor NEX

Patterson-UTI, Houston, TX

About The Position

The DevOps Supervisor will own end-to-end cloud infrastructure strategy, including networking, Kubernetes cluster management, IAM, secrets management, and cost optimization. This role involves leading Terraform IaC development, designing and operating Kubernetes workloads, and managing supporting infrastructure like databases and caching layers. The position also includes owning and maturing CI/CD pipelines using the Atlassian suite, standardizing Docker practices, and implementing GitOps workflows. Additionally, the role is responsible for the deployment architecture of edge-tier workloads, developing reliable provisioning and update workflows for edge nodes, and coordinating with product and field operations teams. A key aspect of this role is building and owning the on-call program, leading incident response, defining and tracking reliability metrics, and continuously improving observability. The DevOps Supervisor will also be responsible for hiring, mentoring, and growing a team of DevOps and Platform Engineers, partnering with backend engineering teams, championing a security-first culture, and managing vendor relationships and budget.

Requirements

  • 5+ years in DevOps, SRE, or Platform Engineering roles, with at least 1–2 years in a tech lead or supervisory capacity.
  • Deep hands-on experience with a major cloud platform (GCP preferred) including Kubernetes, IAM, networking, and managed services.
  • Strong Terraform skills — writing modules, managing remote state, and structuring multi-environment configurations.
  • Proficiency in Kubernetes and Kustomize for managing multi-environment, multi-target (cloud + edge) workloads.
  • Experience building and maintaining CI/CD pipelines in the Atlassian suite (Bitbucket, Bamboo, or Bitbucket Pipelines); comfort with pipeline-as-code patterns.
  • Solid Docker expertise including multi-stage builds, Compose stacks, and container runtime troubleshooting.
  • Hands-on experience with Prometheus, structured/JSON logging, and building actionable alerting systems.
  • Ability to lead on-call rotations and drive incident management processes end-to-end.
  • Comfortable working in a Python-centric engineering environment (Python 3.12, Poetry, FastAPI familiarity preferred).
  • Experience with edge / IoT deployment patterns — field hardware, intermittent connectivity, or OTA update strategies.
  • Demonstrates positive people management skills: communicates effectively, treats team members fairly and consistently, coaches well, and takes an interest in team members’ career development.
  • Bachelor’s Degree in Computer Science, Information Systems, or a related technical field (Required).
  • 5+ years of progressive experience in DevOps, SRE, or Platform Engineering (Required).
  • 1–2 years of experience in a team lead or supervisory capacity (Required).

Nice To Haves

  • GCP (GKE, Cloud SQL, IAM, VPC)
  • Kubernetes / Kustomize
  • Terraform
  • Docker / Docker Compose
  • Bitbucket / Bamboo / Bitbucket Pipelines
  • Prometheus / structured JSON logging
  • TimescaleDB / PostgreSQL / Redis
  • Auth0 (AuthN) / Cerbos (AuthZ)
  • Python 3.12 / FastAPI / Poetry
  • MQTT / Modbus/TCP (edge protocols)

Responsibilities

  • Own end-to-end cloud infrastructure strategy — networking, Kubernetes cluster management, IAM, secrets management, and cost optimization.
  • Lead all Terraform IaC development across environments (dev, staging, production), enforcing consistent module patterns and state management.
  • Design and operate Kubernetes workloads using Kustomize overlays for both cloud and edge deployment targets.
  • Manage supporting infrastructure: time-series and relational databases, caching layers, and cloud-managed services.
  • Own and mature CI/CD pipelines across all services using the Atlassian suite (Bitbucket, Bamboo / Bitbucket Pipelines) — building, linting, testing, publishing, and deploying Python/FastAPI microservices.
  • Standardize Docker build practices, image tagging strategies, and container registry management.
  • Implement and enforce GitOps workflows for Kubernetes deployments, ensuring audit trails and safe rollback capabilities.
  • Collaborate with development teams to reduce deployment friction and improve feedback loops.
  • Own deployment architecture for edge-tier workloads running on field hardware — Docker Compose stacks including MQTT and Modbus/TCP protocol adapters.
  • Develop reliable provisioning, update, and monitoring workflows for edge nodes in remote or low-connectivity environments.
  • Coordinate with product and field operations teams on edge deployment requirements, connectivity constraints, and rollout planning.
  • Build and own the on-call program: runbooks, alerting, escalation paths, and SLO definitions.
  • Lead incident response, ensuring fast mitigation and thorough post-mortems that prevent recurrence.
  • Define and track reliability metrics (availability, MTTR, error budgets) and report to the Director of Platform Development.
  • Continuously improve observability across cloud and edge environments through structured logging, metrics, and distributed tracing.
  • Hire, mentor, and grow a team of DevOps and Platform Engineers; define career ladders and performance expectations.
  • Partner with backend engineering teams to support the Python/FastAPI microservices platform, authentication, and authorization policy rollouts.
  • Champion a security-first culture: secrets management, least-privilege IAM, dependency scanning, and compliance automation.
  • Manage vendor relationships, cloud spend, and tooling budget with transparency to leadership.
  • Perform additional duties as required and assigned.