Lead Site Reliability Principal Architect - Hybrid

FactSet•New York, NY

3d•$220,000 - $265,000•Hybrid

About The Position

FactSet creates flexible, open data and software solutions for over 200,000 investment professionals worldwide, providing instant access to financial data and analytics that investors use to make crucial decisions.

At FactSet, our values are the foundation of everything we do. They express how we act and operate, serve as a compass in our decision-making, and play a big role in how we treat each other, our clients, and our communities. We believe that the best ideas can come from anyone, anywhere, at any time, and that curiosity is the key to anticipating our clients’ needs and exceeding their expectations.

The Role: As the Lead Site Reliability Principal Architect, you will head a small team of Infrastructure & DevOps engineers. You’ll combine your technical expertise with leadership skills to ensure our systems are robust, observable, and ready to scale. You will balance hands-on engineering with mentorship, shaping the roadmap for a modern, resilient, and efficient infrastructure.

Requirements

Proven experience leading or mentoring DevOps/SRE engineers.
Strong experience with AWS.
Deep knowledge of infrastructure-as-code (Terraform, CloudFormation, etc.), configuration management (Ansible, Chef, Puppet), and container orchestration (Kubernetes, ECS, etc.).
Hands-on experience with modern observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).
Track record of implementing high-availability and disaster recovery systems.
Effective communicator, able to work cross-functionally and inspire operational excellence.
Bachelor’s degree in computer science or a relevant field.

Responsibilities

Lead a team of SRE/Infrastructure engineers and manage the team’s roadmap.
Design and implement systems for high availability, performance, and scalability.
Own and improve the end-to-end architecture of our infrastructure stack (cloud, CI/CD, configuration management, container orchestration, etc.).
Drive best practices for monitoring, alerting, logging, and observability; enable deep system visibility and actionable insight.
Champion a culture of operational excellence, incident response, and blameless postmortems.
Collaborate with other engineers on system architecture, deployments, and operational readiness.
Identify and eliminate single points of failure; work proactively on capacity planning and disaster recovery strategies.
Drive automation efforts for deployment, scaling, and infrastructure management.
