DevOps & Site Reliability Engineer

VoltaGrid, Houston, TX
Onsite

About The Position

We are seeking a DevOps / Site Reliability Engineer to implement and evolve the infrastructure, deployment pipelines, and reliability posture of our systems. You'll work closely with engineering teams to build scalable, observable, and resilient infrastructure while driving a culture of operational excellence.

Requirements

  • 4+ years of experience in DevOps, SRE, or infrastructure engineering roles
  • Strong experience with at least one major cloud provider (AWS, GCP, or Azure; AWS preferred)
  • Deep hands-on experience with Kubernetes and Docker in production environments
  • Proficiency with infrastructure as code tools, particularly Terraform
  • Experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar)
  • Solid understanding of monitoring and observability (metrics, logs, traces)
  • Strong scripting skills (Bash, Python, or Go)
  • Experience with incident management, SLO-based reliability practices, and capacity planning
  • Strong Linux systems administration skills (Ubuntu, RHEL/CentOS, or similar)
  • Experience with virtualization platforms, including VM provisioning, storage, networking, and cluster management
  • Solid understanding of networking, DNS, load balancing, and security fundamentals

Nice To Haves

  • Contributions to internal developer platforms or platform engineering initiatives
  • Proxmox VE experience
  • Cloud or Kubernetes certifications (AWS SA, CKA, etc.)

Responsibilities

  • Design, build, and maintain cloud infrastructure
  • Manage and optimize Kubernetes clusters and containerized workloads in production
  • Develop and maintain infrastructure as code using Terraform (or equivalent tooling)
  • Build and improve CI/CD pipelines to enable fast, safe, and reliable deployments
  • Implement and maintain monitoring, alerting, and observability systems (Prometheus, Grafana, Datadog, or similar)
  • Define and track SLIs/SLOs, and participate in incident response, root cause analysis, and blameless postmortems
  • Identify and eliminate toil through automation and self-service tooling
  • Configure and maintain on-prem bare-metal servers and Linux-based infrastructure
  • Configure, maintain, and optimize virtualized assets
  • Collaborate with development teams on system design, capacity planning, and performance optimization
  • Participate in on-call rotations and ensure production readiness of new services