Lead Site Reliability Principal Architect - Hybrid

FactSet
New York, NY
3d
$220,000 - $265,000
Hybrid

About The Position

FactSet creates flexible, open data and software solutions for over 200,000 investment professionals worldwide, providing instant access to financial data and analytics that investors use to make crucial decisions. At FactSet, our values are the foundation of everything we do. They express how we act and operate, serve as a compass in our decision-making, and play a big role in how we treat each other, our clients, and our communities. We believe that the best ideas can come from anyone, anywhere, at any time, and that curiosity is the key to anticipating our clients’ needs and exceeding their expectations.

The Role: As the Lead Site Reliability Principal Architect, you will head a small team of Infrastructure & DevOps engineers. You’ll combine your technical expertise with leadership skills to ensure our systems are robust, observable, and ready to scale. You will balance hands-on engineering with mentorship, shaping the roadmap for a modern, resilient, and efficient infrastructure.

Requirements

  • Proven experience leading or mentoring DevOps/SRE engineers.
  • Strong experience with AWS.
  • Deep knowledge of infrastructure-as-code (Terraform, CloudFormation, etc.), configuration management (Ansible, Chef, Puppet), and container orchestration (Kubernetes, ECS, etc.).
  • Hands-on experience with modern observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).
  • Track record of implementing high-availability and disaster recovery systems.
  • Effective communicator, able to work cross-functionally and inspire operational excellence.
  • Bachelor’s degree in computer science or a related field.

Responsibilities

  • Lead a team of SRE/Infrastructure engineers and manage the team’s roadmap.
  • Design and implement systems for high availability, performance, and scalability.
  • Own and improve the end-to-end architecture of our infrastructure stack (cloud, CI/CD, configuration management, container orchestration, etc.).
  • Drive best practices for monitoring, alerting, logging, and observability; enable deep system visibility and actionable insight.
  • Champion a culture of operational excellence, incident response, and blameless postmortems.
  • Collaborate with other engineers on system architecture, deployments, and operational readiness.
  • Identify and eliminate single points of failure; work proactively on capacity planning and disaster recovery strategies.
  • Drive automation efforts for deployment, scaling, and infrastructure management.

What This Job Offers

Job Type

Full-time

Career Level

Principal

Number of Employees

5,001-10,000 employees
