About The Position

At Broadridge, we've built a culture where the highest goal is to empower others to accomplish more. If you’re passionate about developing your career while helping others along the way, come join the Broadridge team.

Broadridge is hiring! Are you passionate about driving excellence in site reliability and spearheading transformative technology solutions? We are seeking a dynamic Director of Site Reliability Engineering to join our operations team. In this influential role, you will support our cutting-edge integrated platform solution and be responsible for operations and for communication with senior leads and clients. You will monitor, support, and resolve platform SRE issues, with a focus on providing top-level support for our clients.

Requirements

  • 8+ years of operations experience with commercial service infrastructure at both a software and infrastructure level
  • In-depth knowledge of Windows operating systems; RHEL knowledge is optional
  • Understanding of tier 3 architecture design and concepts
  • Experience automating processes using PowerShell scripting
  • Experience with change control and incident management processes
  • Experience working in both private cloud and public cloud environments (AWS/MS Azure)
  • Strong problem-resolution skills
  • Ability to work under tight deadlines
  • Bachelor’s degree in Computer Science, Computer Engineering, Computer Information Systems, Engineering (any) or in a related field plus 5+ years of experience in the job offered or in a related occupation

Nice To Haves

  • Experience in the financial services industry

Responsibilities

Platform Health & Monitoring

  • Maintain awareness of the reliability, availability, and performance standards of the integrated platform environment.
  • Develop, maintain, and continuously improve platform monitoring dashboards and alerting mechanisms using enterprise-standard tools.
  • Ensure monitoring coverage for all critical services, dependencies, and integrations.
  • Proactively identify potential issues or capacity constraints before they impact service availability.

Incident Management & Response

  • Serve as the operations lead during major incidents, coordinating technical teams and stakeholders across the enterprise.
  • Lead real-time incident response and ensure timely communication, escalation, and resolution.
  • Conduct post-incident reviews, define root causes, and follow up on corrective actions.
  • Maintain readiness for incidents by ensuring runbooks are complete, accurate, and accessible.

Troubleshooting & Diagnostics

  • Perform advanced troubleshooting using system logs, metrics, tracing, and other observability tools.
  • Collaborate with application development, infrastructure, and network teams to identify and resolve complex system or integration issues.
  • Establish repeatable diagnostic procedures and automate routine tasks wherever possible.

Operational Excellence & Continuous Improvement

  • Evaluate existing operational processes, tools, and runbooks; recommend and implement improvements.
  • Drive reliability-focused engineering initiatives such as automation of deployment, recovery, and scaling.
  • Identify opportunities for optimizing system performance, cost, and maintainability.
  • Contribute to service-level objectives (SLOs), error budgets, and capacity planning.

Collaboration & Communication

  • Act as liaison between platform operations and other technical, support, and business teams.
  • Mentor junior staff in incident response, monitoring best practices, and operational standards.
  • Provide clear, timely communication during incidents and contribute to stakeholder reporting.

Governance & Compliance

  • Ensure operational health checks and compliance with internal standards and regulatory requirements.
  • Support audit activities and perform environment validation as required.

Benefits

  • Please visit www.broadridgebenefits.com for information on our comprehensive benefit offerings
  • We are dedicated to fostering a collaborative, engaging, and inclusive environment and are committed to providing a workplace that empowers associates to be authentic and bring their best to work.
  • We believe that associates do their best when they feel safe, understood, and valued, and we work diligently and collaboratively to ensure Broadridge is a company—and ultimately a community—that recognizes and celebrates everyone’s unique perspective.
  • Broadridge provides equal employment opportunities to all associates and applicants for employment without regard to race, color, religion, sex (including sexual orientation, gender identity or expression, and pregnancy), marital status, national origin, ethnic origin, age, disability, genetic information, military or veteran status, and other characteristics protected by applicable federal, state, or local laws.
  • Broadridge is committed to creating an engaging workplace for the most talented associates in our industry.
  • We are dedicated to fostering a collaborative, inclusive, and healthy environment that promotes flexibility and accountability.
  • Encouraging professional development opportunities is a core part of our culture.
  • Broadridge provides educational opportunities, including formal classes, training programs and events.
  • To enable learning in our hybrid working model, Broadridge has redesigned all development programs for 100% virtual delivery.
  • Our associates have access to 8,500+ online courses covering business, leadership, technical, and function-specific topics through our LinkedIn Learning program.