Director, Software Product Management & RE

Morgan Stanley•Boston, MA

1d•Onsite

About The Position

Morgan Stanley Services Group Inc. is seeking a Director, Software Product Management & RE in Boston, MA. This role involves deploying and maintaining comprehensive monitoring, logging, and alerting systems, handling incident response, root cause analysis, and post-mortem reviews for production outages, and participating in a 24/7 on-call rotation. The position also includes managing AI-enhanced knowledge management efforts for the Production Support team, embedding content from multiple sources, maintaining a centralized knowledge repository, and ensuring regular validation. A key aspect is building and implementing self-healing capabilities to detect and remediate production issues. The role requires planning, executing, and automating high-integrity data migrations between On-Prem databases and Snowflake on Azure Cloud, as well as replicating data. It also involves handling cross-vendor incident response during outages, maintaining runbooks for recovery scenarios, and completing user change requests and enhancements in Production environments, ensuring performance, load, and compliance testing before deployment. The Director will act as the primary operational point of contact for business teams, managing business flows like RFBs and End-of-day processing, and identifying and resolving operational bottlenecks in Aladdin workflows, data feeds, and batch processing.

Requirements

Master's degree in Engineering (any), Computer Science, or a related field of study.
Three (3) years of experience in the position offered or three (3) years as an Associate, Business Data Analyst, Software Developer, or a closely related occupation.
Three (3) years of experience with Generative AI.
Three (3) years of experience with scripting and automation using Linux and Bash.
Three (3) years of experience with Snowflake Data Modelling.
Three (3) years of experience with SQL.
Three (3) years of experience with data replication from on-premises legacy databases including Oracle, SQL Server, DB2, and Sybase to cloud databases including Snowflake using the High Volume Replication (HVR) tool.
Three (3) years of experience with data analysis and data pipelines development using Python.
Three (3) years of experience with Data Pipelines Code.
Three (3) years of experience with Dataiku and Airflow.
Three (3) years of experience with scheduling, automating, and optimizing batch jobs using AutoSys.
Three (3) years of experience with User Acceptance Testing and Software Validation.
Three (3) years of experience with data management and data integrity checks.
Three (3) years of experience with working on user requests for Software and Process enhancements.
Three (3) years of experience with data visualization tools including Tableau and Power BI.
Three (3) years of experience with CI/CD processes.
Three (3) years of experience with monitoring tools including Datadog, Splunk, Prometheus, and Grafana.
Three (3) years of experience with automating incident response and recovery workflows.
Three (3) years of experience with creating tickets and reports using ServiceNow.
Three (3) years of experience with incident management using PagerDuty.
Three (3) years of experience with preparing post-mortem decks for Root Cause Analysis and Impact Mitigation.
Three (3) years of experience with Agile Methodologies.
Three (3) years of experience with Kanban.
Three (3) years of experience with Jira Boards.
Three (3) years of experience with version control including Git or Bitbucket.
Three (3) years of experience with cloud-based data warehousing including Snowflake.
Three (3) years of experience with Public Cloud platforms including Azure or AWS.
Three (3) years of experience with cloud infrastructure monitoring using Terraform.
Three (3) years of experience with business impact analysis using BigPanda.
Three (3) years of experience with synthetic monitoring and Application Programming Interface (API) performance testing using Apica.

Responsibilities

Deploy and maintain comprehensive monitoring, logging, and alerting systems.
Handle incident response, root cause analysis, and post-mortem reviews for timely resolution of production outages.
Participate in a 24/7 on-call rotation.
Manage AI-enhanced knowledge management efforts for the Production Support team.
Embed content and documentation from multiple sources, maintain a centralized knowledge repository, and ensure regular validation.
Build and implement self-healing capabilities for detecting and remediating Production issues.
Plan, execute, and automate high-integrity data migrations between On-Prem databases and Snowflake on Azure Cloud, and replicate data.
Handle cross-vendor incident response during outages and maintain runbooks for recovery scenarios.
Complete user change requests and enhancements in Production environments, and ensure changes go through Performance, Load, and Compliance testing before deployment.
Act as the primary operational point of contact for business teams and handle business flows such as RFBs and End-of-day processing.
Identify and resolve operational bottlenecks in Aladdin workflows, data feeds, and batch processing.