Senior Application Support Engineer (SRE)

DTCC•Jersey City, NJ

Hybrid

About The Position

The Information Technology group delivers secure, reliable technology solutions that enable DTCC to be the trusted infrastructure of the global capital markets. The team delivers high-quality information through activities that include building essential infrastructure capabilities to meet client needs and implementing data standards and governance.

As a Senior Application Support Engineer, you will play a critical role in supporting and improving the reliability of DTCC’s risk management applications. This role goes beyond traditional support: you will apply Site Reliability Engineering (SRE) principles to improve system stability, reduce incidents, and drive proactive improvements across a complex environment spanning mainframe, distributed systems, and cloud platforms. You will support a large-scale portfolio of over 100 applications as part of a globally distributed team, partnering across regions to ensure seamless production operations and high system availability. You will work closely with Application Development, Infrastructure and Operations teams to ensure production stability for mission-critical systems supporting financial risk and settlement processes.

Requirements

Bachelor’s degree (preferred) or equivalent practical experience.
6-8 years of experience in Application Support, Production Support, or SRE roles.
Strong understanding of application support and SRE principles including reliability engineering, observability and incident prevention.
Experience working in Linux and Windows environments including process inspection, log analysis and extensive troubleshooting.
Familiarity with monitoring and observability tools such as Splunk and Grafana and the ability to interpret system behavior and alerts.
Application support knowledge, including basic programming skills and log reading and analysis.
Distributed application troubleshooting and support skills, with strong problem-solving ability and creative thinking.
Familiarity with relational databases (DB2, Oracle, Snowflake).
Working knowledge of SQL and ability to execute queries for analysis and troubleshooting.
Experience with ITSM and operational tooling (e.g., ServiceNow) for incident, problem, and change management.
Familiarity with job scheduling, containerized platforms and modern application environments (e.g., Autosys, OpenShift).
Understanding of security fundamentals including certificate and password management.
Exposure to capital markets and the financial industry is required.
Clear written and verbal communication, with demonstrated ownership and leadership in fast‑paced production environments.
Comfortable operating with urgency and collaborating with global, distributed teams.
Proactive mindset with a focus on continuous improvement, automation and operational excellence.

Nice To Haves

Exposure to mainframe environments (job monitoring, batch processing, JCL).
Experience with AWS or other cloud-based applications, including handling issues related to AWS and associated services.
Basic scripting or programming experience (Python or similar).
Understanding of messaging systems, networking, or middleware.
Exposure to AI/automation concepts and their usage in production support.

Responsibilities

Act as a Lead Application Support Engineer with SRE responsibilities, partnering with Application Development, Infrastructure and Operations teams to improve system reliability, resilience and observability.
Lead the resolution of critical production incidents, providing clear impact analysis, root cause identification and preventative actions.
Drive incident, problem and major incident management, maintaining ownership through resolution and post‑incident review.
Proactively identify reliability risks and implement improvements to prevent recurrence and reduce operational toil.
Review and maintain runbooks, knowledge articles and operational documentation to ensure production readiness and consistency.
Execute change, release and deployment activities, including production code releases and vendor application upgrades.
Perform and support Disaster Recovery activities, including testing, execution, and audit/BCM evidence collection.
Identify and implement automation and alert rationalization opportunities to improve operational efficiency and service stability.
Embed reliability, risk and control considerations into day‑to‑day operations, escalating issues appropriately.

Benefits

Competitive compensation, including base pay and annual incentive
Comprehensive health and life insurance and well-being benefits, based on location
Pension / Retirement benefits
Paid Time Off and Personal/Family Care, and other leaves of absence when needed to support your physical, financial, and emotional well-being.