Vice President, Site Reliability Engineer

The Bank of New York Mellon • Posted 3 months ago

$83,000 - $155,000/Yr

Full-time • Mid Level

Jersey City, NJ

5,001-10,000 employees

Securities, Commodity Contracts, and Other Financial Investments and Related Activities

At BNY, our culture allows us to run our company better and enables employees' growth and success. As a leading global financial services company at the heart of the global financial system, we influence nearly 20% of the world's investible assets. Every day, our teams harness cutting-edge AI and breakthrough technologies to collaborate with clients, driving transformative solutions that redefine industries and uplift communities worldwide. Recognized as a top destination for innovators and champions of inclusion, BNY is where bold ideas meet advanced technology and exceptional talent. Together, we power the future of finance, and this is what #LifeAtBNY is all about. Join us and be part of something extraordinary.

We're seeking a Site Reliability Engineer (SRE) to join our Technology team. This role is located in Jersey City, NJ.

Responsibilities

Drive reliability and performance by defining SLOs/SLIs, improving observability, and proactively identifying and addressing system bottlenecks across cloud environments.
Automate infrastructure and operations using Terraform, Kubernetes, and CI/CD tools to eliminate toil and enable scalable, fault-tolerant deployments.
Collaborate cross-functionally with product, infrastructure, and DevOps teams to reduce incidents, build resilient services, and ensure architectural clarity.
Lead incident management by participating in on-call rotations, conducting postmortems, and implementing automated recovery to minimize downtime.
Build and maintain monitoring systems with tools like Prometheus, Grafana, AppDynamics, and Splunk to support real-time alerting and root cause analysis.
Develop platform tooling and pipelines for container orchestration, third-party integrations, and cloud-native operations to improve system efficiency and reliability.
Maintain and improve live services by measuring and monitoring latency and overall system health, working closely with tech support and operations teams.
Define and use KPIs to assess service performance and identify corrective actions.
Create, manage, and use dashboards for continuous monitoring and health checks of applications and underlying infrastructure.
Design and implement solutions that remove customer friction points, and improve the full service lifecycle from inception through sustainment.
Help build and maintain automation that improves reliability and speeds up issue resolution during routine maintenance.
Mentor engineers and champion SRE best practices, embedding a reliability-first culture and ensuring technical excellence across engineering teams.

Qualifications

Bachelor's degree in computer science or a related discipline, or equivalent work experience required; advanced degree preferred.
5-8 years of related experience; experience in the securities or financial services industry is a plus.
Strong expertise in cloud infrastructure (Azure, AWS, or GCP), containerization (Docker, Kubernetes), and Infrastructure as Code (Terraform, Helm).
Proficiency in observability and monitoring tools such as Prometheus, Grafana, AppDynamics, Datadog, and Splunk, plus experience with incident response and on-call support.
Solid programming and scripting skills in languages like Python, Go, or Java, with a focus on automation, tooling, and system integration.
Deep understanding of SRE principles, including SLAs, SLOs, error budgets, postmortems, and reliability-focused system design.
Familiarity with automated testing, DevSecOps practices, CI/CD methods, performance engineering, and security controls.
Strong collaboration and communication skills, with experience working in Agile environments and partnering with cross-functional engineering, product, and operations teams.
A proven track record in technical engineering, with coding experience beyond simple scripts.

Benefits

Highly competitive compensation, benefits, and wellbeing programs rooted in a strong culture of excellence and our pay-for-performance philosophy.
Access to flexible global resources and tools for your life's journey.
Focus on your health, foster your personal resilience, and reach your financial goals as a valued member of our team.
Generous paid leaves, including paid volunteer time, that can support you and your family through moments that matter.