Senior Site Reliability Engineer, Data & Analytics

Blizzard Entertainment
Irvine, CA
$101,000 - $186,754
Hybrid

About The Position

This Senior Site Reliability Engineer role sits on our Data & Analytics team and partners with data, analytics, ML, and platform engineering to improve the reliability, scalability, and performance of large-scale data platforms, analytics pipelines, ML training pipelines, and inference services. In addition to core SRE responsibilities, this role will build operational and automation tooling that reduces toil, speeds up issue resolution, and improves engineering velocity. This includes contributing to internal platform services such as shared tooling, data integrations, and access-control patterns used across Blizzard. The ideal candidate is a production-minded SRE or platform engineer who is comfortable operating critical systems, writing software, and building tools that improve engineering efficiency without compromising reliability. This role is open to candidates based in Irvine, CA or Albany, NY (hybrid or on-site), as well as fully remote candidates.

Requirements

  • Experience operating reliable, distributed systems in SRE, platform, or similar roles
  • Experience with data, analytics, ML, or large-scale distributed workloads
  • Strong knowledge of Linux, containers, Kubernetes, and cloud infrastructure
  • Experience building automation or internal tools (Python, Go, shell, etc.)
  • Experience with infrastructure-as-code (e.g., Terraform)
  • Experience with CI/CD or GitOps systems (e.g., Jenkins, GitHub Actions, ArgoCD)
  • Familiarity with observability (metrics, logs, traces, alerting, incident response)
  • Solid understanding of SRE concepts (SLIs, SLOs, error budgets, postmortems)
  • Experience using modern development and automation practices to improve reliability and efficiency
  • Experience building internal tooling, automation, or developer productivity systems
  • Strong communication skills with technical and cross-functional partners

Nice To Haves

  • Experience with data and ML systems (training pipelines, model serving, GPU workloads)
  • Experience with distributed systems and messaging (Kafka, Pub/Sub)
  • Experience working in Kubernetes-based environments
  • Familiarity with observability tools (Prometheus, Grafana)
  • Experience operating systems in cloud environments (GCP, AWS)

Responsibilities

  • Participate in an on-call rotation and drive incidents to resolution
  • Lead blameless postmortems and identify systemic reliability improvements
  • Partner with data, ML, and platform teams to improve batch, streaming, training, and inference workloads
  • Support ML training pipelines and inference services, including GPU workloads
  • Help define how data and ML services run on Kubernetes
  • Design and build automation and operational tooling (e.g., workflows, diagnostic tooling, runbooks) to reduce on-call burden
  • Build and evolve centralized platform services, including shared tooling, data integrations, and access controls
  • Diagnose and resolve reliability, performance, and cost issues across distributed systems
  • Champion automation, documentation, and practices that reduce toil
  • Maintain infrastructure using Terraform and infrastructure-as-code principles
  • Improve CI/CD and GitOps workflows (Jenkins, GitHub Actions, ArgoCD)
  • Operate and improve containerized services on Kubernetes
  • Define and measure reliability using SLIs, SLOs, and error budgets
  • Run load tests, capacity modeling, and production validation
  • Build internal tools and paved paths that help teams operate safely and efficiently

Benefits

  • Medical, dental, vision, health savings account or health reimbursement account, healthcare spending accounts, dependent care spending accounts, life and AD&D insurance, disability insurance
  • 401(k) with Company match, tuition reimbursement, charitable donation matching
  • Paid holidays and vacation, paid sick time, floating holidays, compassion and bereavement leaves, parental leave
  • Mental health & wellbeing programs, fitness programs, free and discounted games, and other voluntary benefit programs such as supplemental life & disability, legal services, ID protection, and rental insurance
  • Relocation assistance