Site Reliability Engineer, Distribution Engineering

NBCUniversal, Stamford, CT
$110,000 - $145,000

About The Position

NBCUniversal is seeking creative and driven Site Reliability Engineers to join our Distribution Engineering team. This team supports the infrastructure and systems that power NBCU's broadcast, streaming, and monitoring platforms. Within Distribution Engineering, we're hiring SREs across three closely integrated focus areas: Video Streaming, Monitoring & Control, and Playout. As an SRE, you will be responsible for the engineering, operations, support, deployment, and maintenance of critical systems across on-premises and cloud environments. You will work in a fast-paced, agile environment where innovation and reliability are key.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
  • 3+ years of SRE experience in the technology sector, supporting and maintaining production-quality software or software-defined infrastructure in high-traffic cloud environments (AWS preferred).
  • Experience with IP video and broadcast technologies.
  • Proficiency in Linux system administration.
  • Experience with Infrastructure as Code (Terraform or CloudFormation) and configuration management technologies (Ansible).
  • Familiarity with CI/CD tools (e.g., GitHub Actions, Jenkins, ArgoCD).
  • Experience with containerization and orchestration (Docker, Kubernetes, EKS).
  • Scripting experience (Python, Bash, or similar).
  • Strong understanding of networking fundamentals and troubleshooting.
  • Experience with monitoring/logging tools (e.g., Grafana, Splunk, ELK, CloudWatch).
  • Comfortable working in agile, fast-paced environments.

Nice To Haves

  • Experience maintaining both Linux and Windows environments.
  • Familiarity with broadcast and monitoring tools such as Dataminer, TAG systems, and/or MediaProxy.
  • Strong hands-on experience debugging and troubleshooting distributed microservices in Kubernetes, including analyzing pod logs.
  • Solid understanding of networking concepts relevant to video streaming, including multicast, unicast, RTP/RTMP, and CDN workflows.
  • Ability to take ownership of problems and drive solutions through automation where applicable (automation-first mentality).

Responsibilities

  • Develop automation to deploy, maintain, and monitor infrastructure and applications.
  • Troubleshoot and resolve issues in live, on-air environments.
  • Participate in CI/CD pipelines, including code deployment, testing, and monitoring.
  • Create and maintain system metrics, dashboards, and alerting to ensure high availability.
  • Collaborate with engineering, operations, and vendor teams to support system health and performance.
  • Act as a Level 2 support resource for broadcast-related incidents, including root cause analysis and documentation.
  • Participate in an on-call rotation for 24/7 support coverage.
  • Evaluate new technologies and contribute to proof-of-concept deployments.
  • Document system configurations, incident resolutions, and operational procedures.

Benefits

  • Medical, dental, and vision insurance.
  • 401(k).
  • Paid leave.
  • Tuition reimbursement.
  • Various other discounts and perks.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Broadcasting and Content Providers
  • Education Level: Bachelor's degree
  • Number of Employees: 5,001-10,000 employees