Associate Director Application Support Engineering

DTCC•Tampa, FL

Hybrid

About The Position

The Information Technology group delivers secure, reliable technology solutions that enable DTCC to be the trusted infrastructure of the global capital markets. The team delivers high-quality information through activities that include developing essential infrastructure capabilities to meet client needs and implementing data standards and governance.

The Development family is responsible for creating, designing, deploying, and supporting applications, programs, and software solutions. This work may include research, new development, prototyping, modification, reuse, re-engineering, maintenance, or any other activities related to software products used internally or externally on product platforms supported by the firm. The software development process requires in-depth subject matter expertise in existing and emerging development methodologies, tools, and programming languages. Software Developers work closely with business partners and/or external clients to define requirements and implement solutions.

The Application Support Engineering role specializes in maintaining and providing technical support for all applications that are beyond the development stage and running in the daily operations of the firm. The role works closely with development teams, infrastructure partners, and internal/external clients to escalate and resolve technical support incidents.

Requirements

Minimum of 8 years of related technical and management experience
Bachelor's degree preferred, or equivalent experience
Proven experience with SRE or DevOps practices, including CI/CD pipelines, infrastructure as code, and automation frameworks
Strong understanding of monitoring and observability platforms (e.g., Grafana) and experience designing and fine‑tuning robust monitoring systems
Programming proficiency in one or more languages such as Python, Java, Go, or similar, for automation and tooling development
Familiarity with cloud platforms, containerized environments, and/or hybrid infrastructure models
Experience in financial services, capital markets, or regulated environments
Demonstrated participation in disaster recovery, performance, and resiliency testing
Knowledge of AI concepts, data platforms, messaging systems, and large‑scale batch or real‑time processing systems
Strong collaboration skills across technology and business teams
Hands‑on experience leading and participating in incident and problem management, including root cause analysis

Nice To Haves

Cloud certifications are a plus

Responsibilities

Participate in design reviews, sprint zero, and delivery planning to champion non‑functional requirements (NFRs), including resiliency, observability, fault tolerance, holiday and special-day processing, and disaster recovery.
Collaborate with Major Release Management to ensure each Risk release meets SRE standards for observability and resiliency (SLIs/SLOs, monitoring, knowledge base articles). Ensure releases are subject to required deployment validations.
Define and evolve monitoring, alerting, SLIs, and SLOs, leveraging AI/ML‑driven analytics for anomaly detection, incident correlation, and early risk identification.
Make design recommendations that enable quick detection of outage conditions and allow the application to recover without manual intervention, and/or create knowledge-base guidance for the application support team to follow to improve application recovery times. Participate in major incident response and root cause analysis to drive continual, systemic improvements in recovery time.
Drive automation and intelligent tooling (including AI‑assisted remediation) to reduce manual toil and improve consistency and recovery times.
Attend project management meetings and present operational readiness with application support (EAS L2), raising any operational risks and concerns. Test NFRs in UAT environments to validate the effectiveness and completeness of operational capabilities. Validate operational readiness with stakeholders prior to release, partner with Embedded Risk and Security teams, and proactively surface and mitigate technology and operational risks.
Lead capacity planning and performance analysis to ensure Risk platforms scale reliably under high load.
Establish KPIs and operational metrics to demonstrate reliability improvements and operational maturity.
Build a strong SRE culture, enhanced by AI‑driven insights, across Risk Application Support and Development through mentorship and best‑practice coaching. Leverage approved AI tools to analyze code, collaborate on knowledge base articles, and accelerate improvements in observability, performance, security, and maintainability.

Benefits

Competitive compensation, including base pay and annual incentive
Comprehensive health and life insurance and well-being benefits, based on location
Pension / Retirement benefits
Paid time off, personal/family care, and other leaves of absence when needed to support your physical, financial, and emotional well-being
