Site Reliability Engineer II, tvScientific

Pinterest
San Francisco, CA
Onsite

About The Position

tvScientific is seeking a Site Reliability Engineer to help operate, scale, and continuously improve a cloud-native platform built on AWS, Kubernetes/EKS, and ArgoCD-driven GitOps workflows. This role will contribute to improving the reliability, scalability, automation, observability, and operational maturity of our infrastructure and delivery ecosystem. The ideal candidate is a hands-on engineer with solid production experience and a strong foundation in building and supporting resilient platforms using infrastructure as code, automation, and modern Kubernetes operational practices.

Requirements

  • 4+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure
  • Strong hands-on experience operating AWS in production environments
  • Solid expertise in Kubernetes, including cluster operations, troubleshooting, workload reliability, and platform administration
  • Experience with Kubernetes multi-tenancy, including namespaces, RBAC, quotas, policies, and tenant isolation patterns
  • Experience implementing and operating ArgoCD within a GitOps delivery model
  • Strong hands-on experience with Helm
  • Experience with Terraform/Terragrunt for infrastructure provisioning and environment management
  • Solid scripting and automation skills using Bash and/or Python
  • Experience building, maintaining, or supporting CI/CD pipelines, ideally using GitHub Actions
  • Strong troubleshooting skills across Linux, containers, IAM, networking, and distributed systems
  • Experience with monitoring, alerting, and observability in production environments
  • Demonstrated ownership mindset with experience handling incidents and resolving production issues
  • Strong collaboration and communication skills, with the ability to work effectively across engineering, security, and platform teams
  • Bachelor’s degree in computer science, engineering, or a related field, or equivalent experience
  • Demonstrated ability to use AI to improve the speed and quality of your day-to-day work
  • Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)
  • High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables

Responsibilities

  • Ensuring the reliability, availability, and performance of production infrastructure and platform services
  • Operating and scaling Kubernetes platforms, including governance and support for multi-tenant workloads
  • Managing GitOps-based deployment workflows using ArgoCD and Helm
  • Supporting infrastructure provisioning and change management through Terraform/Terragrunt
  • Building and supporting CI/CD automation and deployment workflows using GitHub Actions
  • Participating in incident response, root cause analysis, and post-incident improvement initiatives
  • Reducing operational toil through scripting, tooling, and process automation
  • Advancing observability practices across logs, metrics, traces, dashboards, and alerting
  • Supporting secure secrets integration, IAM-aware operations, and platform guardrails
  • Partnering closely with application, security, and platform teams to improve reliability and delivery outcomes