Director, Splunk Platform Engineering & SRE

BNY Mellon, New York, NY

Onsite

About The Position

At BNY, our culture allows us to run our company better and enables employees’ growth and success. As a leading global financial services company at the heart of the global financial system, we influence nearly 20% of the world’s investible assets. Every day, our teams harness cutting-edge AI and breakthrough technologies to collaborate with clients, driving transformative solutions that redefine industries and uplift communities worldwide. Recognized as a top destination for innovators, BNY is where bold ideas meet advanced technology and exceptional talent. Together, we power the future of finance, and this is what #LifeAtBNY is all about. Join us and be part of something extraordinary.

We’re seeking a team member for the role of Director, Splunk Platform Engineering & SRE (Individual Contributor) to join our Cybersecurity Engineering Tools & Platforms team. This role is based in New York, NY.

This is a high-impact, deeply technical leadership role designed for a top-tier engineer, not a people manager. The Director title reflects the depth of technical expertise, ownership, and influence required, not team size. You will take ownership of a large-scale, mission-critical Splunk platform at the center of enterprise observability and cybersecurity. The role requires someone who can go deep into every layer of the stack (OS, network, ingestion pipelines, distributed systems) and resolve issues at their root, regardless of complexity. If you are the engineer others call when systems fail in unpredictable ways, and you enjoy solving those problems, this role is built for you.

This role is for someone who wants to own and evolve a complex, high-stakes platform, not just maintain it. It offers the opportunity to operate at a level where your technical decisions directly impact platform stability, security, and scale across the enterprise. If you’re motivated by depth, challenge, and solving problems others can’t, this is that role.

Requirements

Deep, hands-on expertise in Splunk platform engineering and large-scale SIEM environments
Bachelor's degree in computer science or a related discipline, or equivalent work experience, required; advanced degree preferred
12+ years of experience in information security or a related technology field required
Strong foundation in Site Reliability Engineering (SRE) and distributed systems
Proven ability to debug and resolve complex issues across the full stack, from application to OS and network layers
Expert knowledge of Linux/Unix systems, including performance tuning and low-level troubleshooting
Strong understanding of networking fundamentals (TCP/IP, packet analysis, syslog pipelines, latency debugging)
Experience building and operating high-volume data ingestion and processing systems
Proficiency in Splunk SPL and data analysis
Strong programming/scripting skills (e.g., Python, Go, Java, or similar)
Hands-on experience with DevOps and configuration management tools (Ansible, Git, etc.)
Experience with Kubernetes and containerized environments
Deep understanding of security models, RBAC, and enterprise controls
Ability to operate independently in high-pressure situations and take full ownership of outcomes
A mindset focused on automation, scalability, and eliminating operational friction
In-depth, hands-on AI literacy, as well as knowledge of MCP design

Nice To Haves

Experience in the securities or financial services industry is a plus.

Responsibilities

Own end-to-end engineering and operational accountability for the enterprise Splunk platform (SIEM), including architecture, capacity planning, ingestion, integrations, and lifecycle management
Act as the highest technical escalation point, driving resolution of critical incidents across application, platform, and infrastructure layers
Troubleshoot and resolve deep, low-level technical issues, including: Linux/Unix OS internals (CPU, memory, I/O, process behavior); network behavior, packet flow, and latency bottlenecks; and distributed system failures and data ingestion breakdowns
Drive platform reliability, capacity, observability, and performance engineering, using modern monitoring stacks (Prometheus, Moog)
Architect and scale high-throughput ingestion pipelines, integrating syslog and event ingestion frameworks, Kubernetes/containerized platforms, and cloud and enterprise systems
Own authentication, RBAC, and access control models, ensuring strong governance and compliance
Design and implement automation and configuration management frameworks (Git, Ansible) to reduce operational toil
Lead incident response, root cause analysis, and systemic fixes, embedding SRE principles (SLAs, SLOs, error budgets)
Drive platform upgrades, resilience strategies, and disaster recovery readiness
Evaluate and onboard emerging technologies, including AI/ML-driven analytics and contextual data platforms
Create bespoke solutions for unsolved problems using languages such as Python, Java, or Go
Influence engineering direction across teams through technical leadership and expertise as an individual contributor
Mentor and elevate engineers through hands-on guidance and technical depth