About The Position

At Goldman Sachs, our Engineers don’t just make things – we make things possible. Change the world by connecting people and capital with ideas. Solve the most challenging and pressing engineering problems for our clients. Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn data into action. Create new businesses, transform finance, and explore a world of opportunity at the speed of markets.

Engineering is at the critical center of our business, and our dynamic environment requires innovative strategic thinking and immediate, real solutions. Want to push the limit of digital possibilities? Start here.

Goldman Sachs Engineers are innovators and problem-solvers, building solutions in risk management, big data, mobile and more. We look for creative collaborators who evolve, adapt to change and thrive in a fast-paced global environment.

Asset & Wealth Management Engineering

Across Wealth Management, Goldman Sachs helps empower clients and customers around the world to reach their financial goals. Our advisor-led wealth management businesses provide financial planning, investment management, banking and comprehensive advice to a wide range of clients, including ultra-high net worth and high net worth individuals, as well as family offices, foundations and endowments, and corporations and their employees. Our consumer business provides digital solutions for customers to better spend, borrow, invest, and save. Across Wealth Management, our growth is driven by a relentless focus on our people, our clients and customers, and leading-edge technology, data, and design.

Wealth Management Onboarding Engineering is the nexus of our Wealth Management business. With Engineers located in Dallas, Salt Lake City, New York City, London, Bengaluru, and Hyderabad, the Onboarding Engineering team is building the next generation of highly distributed and scalable systems to rapidly enable new clients and efficiently and effectively maintain and service existing clients by capturing, modeling, synthesizing, processing, and managing vast amounts of disparate data while upholding the highest standards of data hygiene and controls. You will be part of a smart, passionate, and fun team of Engineers reimagining and digitizing complex workflows with focuses on usability, availability, resiliency, performance, and advanced monitoring with governance.

The Production Management Lead is responsible for the stability, resilience, and operational excellence of the business unit's production environment. This role drives and oversees incident, problem, change, and release management processes while partnering closely with Engineering, Infrastructure, and Business stakeholders to ensure systems operate reliably and securely.

Requirements

  • Bachelor's degree in Computer Science or related Engineering field
  • 8+ years of experience in production support, technology operations, or SRE
  • Proven experience managing high-availability, client-facing systems
  • Strong knowledge of incident, problem, and change management frameworks
  • Experience supporting regulated financial systems
  • Familiarity with cloud platforms and hybrid infrastructure environments
  • Strong understanding of monitoring and observability tools
  • Excellent written and verbal communication skills, including experience working directly with both technical and non-technical stakeholders

Responsibilities

  • Own end-to-end production support across mission-critical systems
  • Lead major incident response and post-incident reviews (RCA), driving corrective and preventative actions
  • Establish clear SLAs, SLOs, and KPIs to measure operational performance
  • Serve as an escalation point for high-severity issues impacting clients and/or revenue
  • Ensure production processes align with regulatory and audit requirements
  • Partner with Compliance and Risk to maintain strong operational controls
  • Identify systemic risks and implement mitigation strategies
  • Maintain business continuity and disaster recovery readiness
  • Improve release processes to balance speed with stability
  • Drive automation of monitoring, alerting, and remediation
  • Analyze incident trends to improve system reliability and reduce repeat issues
  • Lead and mentor a team of production support engineers
  • Communicate clearly with senior leadership during incidents and risk events
  • Build a culture of accountability, transparency, and operational excellence