DevOps/SRE Team Lead

Telestream

Remote

About The Position

Telestream is a leading provider of digital media tools and software solutions for the broadcast, streaming, and media industries. We empower content creators and distributors to produce and deliver high-quality video content while optimizing operations and maximizing revenue.

Our teams work diligently to innovate and support world-class services. We are seeking a DevOps/SRE Team Lead with proven, hands-on Kubernetes expertise to drive the reliability and scalability of our video processing infrastructure and to oversee a small team of SREs and DevOps Engineers. This is a deeply technical lead role that requires real-world experience administering production Kubernetes clusters, not theoretical familiarity. You will own CI/CD pipelines, infrastructure automation, and cloud platform operations in a fully remote environment where independent execution is essential. If you have built, broken, and fixed things in Kubernetes at scale while managing and mentoring a team, we want to hear from you.

Location: US Remote. Candidates must be legally authorized to work in the United States. This role is not eligible for employer-sponsored work authorization or visa sponsorship of any kind, now or in the future.

Our process includes a live, hands-on technical interview conducted via shared terminal and screen share. You will be asked to work through real Kubernetes and infrastructure scenarios in real time: no take-home exercises, no slides. Candidates who are comfortable with the skills listed below will do well; candidates who are not will find this stage difficult to navigate. We value people who are direct about what they know and what they’re still learning.

Requirements

Bachelor’s degree in Computer Science, Engineering, or equivalent
5-8+ years of experience in DevOps/SRE, with 2-3+ years in a leadership role.
Hands-on experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or equivalent) with direct integration into Kubernetes deployment workflows
Production-level experience with infrastructure as code (Terraform required; CloudFormation or Pulumi a plus), including managing cloud-hosted Kubernetes clusters (EKS, GKE, or AKS)
Experience with monitoring, logging, and observability tooling in Kubernetes environments (Prometheus, Grafana, Datadog, ELK/EFK stack, or equivalent); ability to build dashboards and alerts from scratch, not just consume existing ones
Demonstrated, hands-on Kubernetes experience in production environments: cluster administration, Helm chart authoring and management, RBAC configuration, persistent storage, horizontal/vertical pod autoscaling, and diagnosing and resolving real production failures (CrashLoopBackOff, OOMKilled, networking issues, etc.)
Strong troubleshooting skills with the ability to diagnose infrastructure and application issues live, under pressure, without reference materials—this is evaluated directly in our interview process
Proficiency in scripting languages (Python, Go, Bash, or PowerShell); ability to write and own automation scripts, not just modify existing ones
Strong conflict resolution skills and the ability to influence without authority
Excellent communication and collaboration skills

Responsibilities

Design, deploy, and administer production Kubernetes clusters, including workload scheduling, namespace management, RBAC, network policies, and cluster upgrades
Design and maintain continuous integration/deployment pipelines to automate testing and deployment, including Kubernetes-native delivery workflows using Helm and ArgoCD or equivalent
Track software performance, fix errors, troubleshoot systems, and implement preventive measures to ensure smooth workflows
Implement and manage cloud infrastructure, using Terraform or CloudFormation for infrastructure as code (IaC)
Optimize cloud resources by implementing cost-effective solutions
Collaborate with cross-functional teams to ensure smooth deployments
Monitor system performance and create new processes based on performance analysis
Implement security best practices, including automated compliance checks and secure code deployment
Manage the technical roadmap and architecture while mentoring SRE and DevOps Engineers (player/coach role)
Hire, coach, and manage a team of DevOps engineers and Site Reliability Engineers.
Define DevOps/Platform roadmap aligned with business goals (e.g., cloud cost optimization, automation maturity).