Morgan Stanley • Posted 13 days ago
$150,000 - $210,000/Yr
Full-time • Mid Level
New York, NY
5,001-10,000 employees

In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our communities. This is a Lead Infrastructure Production Management & Reliability Engineering position at the Vice President level, which is part of the job family responsible for maintaining the stability and reliability of the organization's infrastructure systems, ensuring optimal performance and availability to support business operations.

Role Profile: We are seeking an experienced Production Support Manager to join our Global Operations Reliability and Production Engineering team in New York City. The successful candidate will represent and manage critical incidents across a diverse portfolio of over 1,000 applications, ensuring stability, performance, and regulatory compliance for internal and external clients. This role requires both technical and business acumen, exceptional communication skills, and flexibility to operate in a dynamic, high-pressure environment, including rotational on-call and weekend coverage.

The Production Support Lead will provide strategic leadership for complex distributed and cloud platforms, champion incident and problem management, drive automation and reliability initiatives, and build strong relationships with senior stakeholders. The ideal candidate will be a proactive, hands-on leader who excels at mastering innovative technologies and processes, fostering a culture of continuous improvement, and influencing technology and business decisions across the firm.

  • Provide strategic leadership and oversight for complex distributed and cloud platforms, ensuring operational excellence and regulatory compliance.
  • Lead and manage critical incidents, ensuring timely resolution and effective communication with executive management and business stakeholders.
  • Troubleshoot and resolve issues across hardware, software, application, network, and cloud stacks.
  • Build and maintain relationships with senior stakeholders, downstream consumers, IT partners, and development teams globally.
  • Mentor and manage a high-performing production support team, promoting continuous improvement, learning, and resilience.
  • Drive automation, toil reduction, and enhancements in observability, monitoring, and reliability across platforms.
  • Own and evolve documentation, knowledge sharing, and best practices for global teams.
  • Collaborate with development and infrastructure teams to resolve support issues and implement reliability solutions.
  • Represent production support in executive forums, influencing technology and business decisions.
  • Operate in a 'follow-the-sun' support model, with rotational on-call and weekend coverage.
  • Develop and implement programs to establish and enhance reliability and production management practices in the department.
  • Function as a buffer between support and development teams, reducing escalations and resolving issues within production management.
  • Support end users and business functions in day-to-day operations.
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 8+ years of hands-on experience in Production Support, Production Management, or a similar technical leadership role.
  • Proven people management and team leadership experience.
  • Strong working knowledge of UNIX/Linux operating systems, scripting languages (e.g., Shell, Python, Perl, JavaScript), SQL, and databases (e.g., Sybase, DB2, Postgres, Snowflake, MongoDB).
  • Experience troubleshooting large-scale distributed applications and managing critical incidents.
  • Hands-on experience supporting applications deployed on cloud platforms, particularly Azure.
  • Expertise in analyzing, debugging, and troubleshooting complex applications, infrastructure, and database issues.
  • Excellent and confident communicator, able to manage high-pressure environments and executive-level communication.
  • Flexibility in working hours, including rotational on-call and weekend coverage.
  • Experience supporting financial industry systems is highly valued.
  • Knowledge of ITIL, SDLC, and Agile development practices.
  • Experience with cloud/distributed computing technologies and certifications is a plus.
  • Familiarity with modern observability and monitoring tools (e.g., Azure Monitor, Application Insights, Prometheus, Grafana, Datadog) and with container and automation tooling (e.g., Kubernetes, Docker, Ansible).
  • Experience using and configuring DevOps tooling (e.g., Terraform) and instrumenting application endpoints for logging, metrics, and events.
  • Strong documentation and knowledge-sharing skills.
  • Experience in implementing reliability engineering and production management programs.
  • We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry.
  • There’s also ample opportunity to move about the business for those who show passion and grit in their work.