Sr. Site Reliability Engineer (Hybrid)

Broadridge | El Dorado Hills, CA
7d | $100,000 - $110,000 | Hybrid

About The Position

At Broadridge, we've built a culture where the highest goal is to empower others to accomplish more. If you're passionate about developing your career while helping others along the way, come join the Broadridge team.

We are seeking a Senior Site Reliability Engineer (SRE) to design, build, and operate highly reliable, scalable, and secure platforms supporting business-critical applications across hybrid (on-prem and cloud) environments. This role blends software engineering, systems engineering, and operational excellence, with a strong focus on automation, resiliency, observability, and cost efficiency. The SRE will partner closely with application development, infrastructure, security, and product teams to reduce operational toil, improve system reliability, and enable faster, safer delivery of services.

Requirements

  • 3+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Systems Engineering
  • Strong programming experience in Python, Java, or similar languages
  • Deep experience with Linux/Unix systems
  • Hands-on expertise with AWS and cloud-native architectures
  • Proven experience with Terraform and Infrastructure as Code
  • Strong understanding of networking, security, and distributed systems
  • Experience operating mission-critical, high-volume platforms

Nice To Haves

  • Experience in financial services or highly regulated environments
  • Experience with EKS/Kubernetes at scale
  • Familiarity with Chaos Engineering and resilience testing
  • Experience leading cloud cost optimization (FinOps) initiatives
  • Prior experience transitioning traditional infrastructure teams into SRE practices

Responsibilities

Reliability & Resiliency Engineering
  • Design and implement high-availability, fault-tolerant architectures across on-prem and cloud platforms (AWS).
  • Lead multi-region DR planning, implementation, and testing, including RTO/RPO definition and validation.
  • Define and enforce SLOs, SLIs, and error budgets to balance reliability with delivery velocity.
  • Drive self-healing automation and proactive remediation strategies.

Automation & Infrastructure as Code
  • Build and maintain infrastructure using Terraform and configuration management tools (e.g., Chef).
  • Develop automation to eliminate manual operational tasks (toil reduction).
  • Create reusable modules, pipelines, and guardrails for standardized deployments.
  • Automate certificate lifecycle management, key rotation, and security updates.

Observability & Monitoring
  • Design and implement end-to-end observability (metrics, logs, traces, synthetic monitoring).
  • Build dashboards, alerts, and runbooks to enable fast detection and resolution of incidents.
  • Improve signal-to-noise ratio in alerting to reduce operational fatigue.
  • Perform root cause analysis (RCA) and lead post-incident reviews with actionable follow-ups.

Cloud & Platform Engineering
  • Engineer and operate platforms on AWS, including services such as EKS, EC2, RDS/Aurora, Lambda, API Gateway, CloudFront, WAF, ALB/NLB, CloudWatch, X-Ray, IAM, and Secrets Manager.
  • Lead cloud migrations and modernization initiatives, including legacy system refactoring.
  • Implement secure networking patterns (VPCs, private subnets, controlled egress).

Performance, Scalability & Cost Optimization
  • Identify and resolve performance bottlenecks through testing and analysis.
  • Drive FinOps initiatives to optimize infrastructure cost without compromising reliability.
  • Implement capacity planning and autoscaling strategies.

CI/CD & SDLC Enablement
  • Design and support CI/CD pipelines enabling safe, repeatable deployments.
  • Embed reliability practices into the SDLC (testing, rollout strategies, rollback).
  • Partner with development teams to improve operability of applications before production.

Security & Compliance
  • Partner with security and legal teams to meet regulatory and compliance requirements (e.g., data residency, GDPR-related controls).
  • Implement secure access controls, secrets management, and encryption best practices.
  • Participate in security reviews, audits, and risk assessments.

Leadership & Collaboration
  • Act as a technical leader and mentor for engineers transitioning into SRE roles.
  • Influence architecture and design decisions across multiple teams.
  • Communicate effectively with engineering leadership, product owners, and non-technical stakeholders.
  • Drive a culture of operational excellence, blameless postmortems, and continuous improvement.

Benefits

  • Bonus Eligible
  • Please visit www.broadridgebenefits.com for information on our comprehensive benefit offerings for this role.
  • All Colorado employees receive paid sick leave in compliance with the Colorado Healthy Families and Workplaces Act and other legally required benefits, as applicable.
  • Broadridge provides educational opportunities, including formal classes, training programs and events.
  • Our associates have access to 8,500+ online courses covering business, leadership, technical, and function-specific topics through our LinkedIn Learning program.