Senior Site Reliability Engineer - Central Platforms

SS&C Technologies • New York, NY

Salary: $175,000 - $185,000

About The Position

We are seeking a Site Reliability Engineer (SRE) to join our Internal Platform Services team, which is responsible for the reliability, scalability, and performance of the core services that power our internal engineering ecosystem. You will work at the intersection of development and operations, enabling product teams to move quickly and safely by building and maintaining robust, self-service infrastructure components such as Kubernetes clusters, internal databases, CI/CD pipelines, observability tools, and cloud APIs.

Requirements

3+ years of experience as an SRE, DevOps Engineer, or Infrastructure Engineer.
Solid experience with Kubernetes administration and tooling (e.g., Helm, ArgoCD, Kustomize).
Strong expertise in at least one major cloud platform (e.g., AWS, GCP, or Azure).
Experience managing databases in production environments (e.g., backups, replication, tuning).
Proficiency in at least one programming or scripting language (e.g., Go, Python, or Bash).
Deep understanding of CI/CD pipelines and infrastructure automation.
Familiarity with monitoring/observability tools (e.g., Prometheus, Grafana).
Strong communication skills and the ability to collaborate with software engineering teams.

Nice To Haves

Experience in multi-tenant infrastructure environments.
Exposure to compliance and security best practices in infrastructure environments.

Responsibilities

Ensure reliability, scalability, and performance of services through SLIs/SLOs, capacity planning, and incident response.
Drive automation of infrastructure operations to minimize toil.
Develop and maintain monitoring, alerting, and observability systems that enable proactive issue detection.
Partner with internal engineering teams to define service-level objectives, improve deployment workflows, and align infrastructure with development needs.
Contribute to on-call rotations and incident management, helping ensure high availability of services.
Drive post-incident reviews and blameless retrospectives to improve reliability.
Stay current with emerging technologies and recommend improvements to existing systems and practices.