Principal Site Reliability Engineer

DigiCert•Lehi, UT

$160,000 - $190,000•Remote

About The Position

The Platform Ops team within CloudOps is responsible for the reliability, scalability, and modernization of DigiCert’s cloud infrastructure. As a Principal SRE, you will own the intersection of software engineering and operations: driving automation-first practices, reducing toil, and accelerating our cloud transformation across AWS, Azure, and GCP environments. You will be a technical force multiplier: raising reliability standards across the organization, defining SLOs that matter, and building the internal platforms and tooling that enable product teams to ship with confidence.

Requirements

5+ years of experience in SRE, platform engineering, or infrastructure engineering roles
Deep proficiency in at least one major cloud provider (AWS, GCP, or Azure) with working knowledge of multi-cloud environments
Strong software engineering skills in Python, Go, or Bash; comfortable writing production-grade automation and tooling
Hands-on Kubernetes experience: cluster operations, workload management, networking (CNI/service mesh), and security (RBAC, pod security)
Infrastructure-as-code expertise with Terraform or equivalent; experience with GitOps workflows
Proven experience designing and operating observability systems and responding to production incidents at scale
Strong understanding of networking fundamentals: DNS, TLS/PKI, load balancing, and zero-trust networking concepts

Nice To Haves

Experience in PKI, certificate lifecycle management, or security-adjacent infrastructure
Familiarity with compliance frameworks such as SOC 2, FedRAMP, or ISO 27001 in cloud environments
Prior experience driving cloud migration or modernization programs at scale
Contributions to open-source infrastructure or platform projects
Professional-level cloud or Kubernetes certifications (e.g., AWS Solutions Architect Professional, CKA/CKS)

Responsibilities

Define, implement, and own SLIs, SLOs, and error budgets for critical platform services
Lead blameless post-mortems and drive systemic reliability improvements across the platform
Design and implement observability pipelines (metrics, logs, traces) using tools such as Splunk, Prometheus, Grafana, or OpenTelemetry
Participate in the on-call rotation and serve as an incident commander for P0/P1 events
Architect and execute migration strategies from legacy infrastructure to cloud-native patterns (containers, serverless, managed services)
Champion adoption of Kubernetes, service mesh, and managed cloud services (EKS, GKE, AKS)
Evaluate and introduce emerging cloud technologies that improve availability, cost efficiency, and developer experience
Partner with architecture and security teams to embed reliability and compliance into platform design
Build and maintain infrastructure-as-code using Terraform across multi-cloud environments
Develop internal tooling, self-service platforms, and golden-path templates that reduce operational burden for development teams
Automate operational workflows including provisioning, scaling, patching, and secret rotation
Contribute to and maintain CI/CD pipelines (GitHub Actions) to enable safe, frequent deployments
Mentor mid-level engineers on SRE principles, distributed systems, and infrastructure best practices
Collaborate cross-functionally with product, security, and compliance teams to deliver on platform roadmap commitments
Document architectural decisions, runbooks, and platform standards; raise the engineering bar through code and design reviews

Benefits

Competitive compensation and comprehensive health, dental, and vision coverage
Retirement savings programs with company matching (401(k) or RRSP)
Generous paid time off, including holidays and vacation
Paid parental leave and family support benefits
Life and disability coverage
Flexible spending and health savings options (where applicable)
Health and wellness support, including gym reimbursement and wellness programs
Employee Assistance Program with 24/7 confidential support for employees and families
Education assistance and professional development opportunities
Access to LinkedIn Learning and continuous learning resources
Employee referral bonus program and additional company perks and discounts
Internal rewards and recognition platform (Motivosity) to celebrate and acknowledge project wins, milestone achievements, and the outstanding contributions of our colleagues
Business travel insurance and global employee support programs