Principal Site Reliability Engineer

DigiCert, Lehi, UT
$160,000 - $190,000 (Remote)

About The Position

The Platform Ops team within CloudOps is responsible for the reliability, scalability, and modernization of DigiCert’s cloud infrastructure. As a Principal SRE, you will own the intersection of software engineering and operations: driving automation-first practices, reducing toil, and accelerating our cloud transformation across AWS, Azure, and GCP environments. You will be a technical force multiplier, raising reliability standards across the organization, defining SLOs that matter, and building the internal platforms and tooling that enable product teams to ship with confidence.

Requirements

  • 5+ years of experience in SRE, platform engineering, or infrastructure engineering roles
  • Deep proficiency in at least one major cloud provider (AWS, GCP, or Azure) with working knowledge of multi-cloud environments
  • Strong software engineering skills in Python, Go, or Bash; comfortable writing production-grade automation and tooling
  • Hands-on Kubernetes experience: cluster operations, workload management, networking (CNI/service mesh), and security (RBAC, pod security)
  • Infrastructure-as-code expertise with Terraform or equivalent; experience with GitOps workflows
  • Proven experience designing and operating observability systems and responding to production incidents at scale
  • Strong understanding of networking fundamentals: DNS, TLS/PKI, load balancing, and zero-trust networking concepts

Nice To Haves

  • Experience in PKI, certificate lifecycle management, or security-adjacent infrastructure
  • Familiarity with compliance frameworks such as SOC 2, FedRAMP, or ISO 27001 in cloud environments
  • Prior experience driving cloud migration or modernization programs at scale
  • Contributions to open-source infrastructure or platform projects
  • Professional-level cloud or Kubernetes certifications (e.g., AWS Certified Solutions Architect - Professional, CKA/CKS)

Responsibilities

  • Define, implement, and own SLIs, SLOs, and error budgets for critical platform services
  • Lead blameless post-mortems and drive systemic reliability improvements across the platform
  • Design and implement observability pipelines (metrics, logs, traces) using tools such as Splunk, Prometheus, Grafana, or OpenTelemetry
  • Participate in the on-call rotation and serve as an incident commander for P0/P1 events
  • Architect and execute migration strategies from legacy infrastructure to cloud-native patterns (containers, serverless, managed services)
  • Champion adoption of Kubernetes, service mesh, and managed cloud services (EKS, GKE, AKS)
  • Evaluate and introduce emerging cloud technologies that improve availability, cost efficiency, and developer experience
  • Partner with architecture and security teams to embed reliability and compliance into platform design
  • Build and maintain infrastructure-as-code using Terraform across multi-cloud environments
  • Develop internal tooling, self-service platforms, and golden-path templates that reduce operational burden for development teams
  • Automate operational workflows including provisioning, scaling, patching, and secret rotation
  • Contribute to and maintain CI/CD pipelines (GitHub Actions) to enable safe, frequent deployments
  • Mentor mid-level engineers on SRE principles, distributed systems, and infrastructure best practices
  • Collaborate cross-functionally with product, security, and compliance teams to deliver on platform roadmap commitments
  • Document architectural decisions, runbooks, and platform standards; raise the engineering bar through code and design reviews

Benefits

  • Competitive compensation and comprehensive health, dental, and vision coverage
  • Retirement savings programs with company matching (401(k) or RRSP)
  • Generous paid time off, including holidays and vacation
  • Paid parental leave and family support benefits
  • Life and disability coverage
  • Flexible spending and health savings options (where applicable)
  • Health and wellness support, including gym reimbursement and wellness programs
  • Employee Assistance Program with 24/7 confidential support for employees and families
  • Education assistance and professional development opportunities
  • Access to LinkedIn Learning and continuous learning resources
  • Employee referral bonus program and additional company perks and discounts
  • Internal rewards and recognition platform (Motivosity) to celebrate and acknowledge project wins, milestone achievements, and the outstanding contributions of our colleagues
  • Business travel insurance and global employee support programs