About The Position

The Manager of Production Support leads teams responsible for ensuring the stability, resilience, and operational excellence of critical technology platforms supporting core lines of business. This role owns end-to-end production support operations while driving maturity toward engineering-first, site reliability–focused practices. The Manager identifies and resolves complex technical, operational, risk, and organizational challenges, while building high-performing, accountable teams across onshore and offshore locations. This position carries full people management responsibility, including hiring, coaching, performance management, and disciplinary actions, and serves as a key partner to Technology, Risk, and Business leadership.

Requirements

  • Bachelor’s degree in Computer Science, Software Engineering, or a related technical field, or equivalent practical experience.
  • A minimum of 5 years of professional software engineering experience, including team leadership or supervisory responsibilities.

Nice To Haves

  • Understanding of multiple approaches to production support and software engineering delivery.
  • Full understanding of Agile methodology.
  • Experience leading teams in an Agile organization, particularly those practicing Site Reliability Engineering.
  • Experience using AI agents in day-to-day work, particularly to enable software delivery and production support operations.
  • Banking or financial services experience.
  • Bachelor’s degree and twelve years of experience in software development and production support, including five years of management experience.

Responsibilities

  • Own end-to-end production support operations for multiple mission-critical applications supporting key lines of business, ensuring availability, stability, and performance meet defined SLAs and SLOs.
  • Provide accountable, visible leadership for 24x7 operational support, including on-call models, escalation paths, and incident response effectiveness.
  • Act as the senior escalation point for major incidents, ensuring swift recovery, accurate root cause analysis, and durable remediation.
  • Lead cross-functional incident recovery efforts in partnership with Incident Management, engineering teams, infrastructure, and business stakeholders.
  • Ensure timely root cause analysis (RCA), post-incident reviews, and corrective actions that prevent recurrence.
  • Establish and mature a production knowledge base, documenting known issues, recovery procedures, and architectural insights.
  • Drive adoption of Site Reliability Engineering (SRE) and lean engineering principles, including reduction of toil through automation, engineering-based reliability metrics (error budgets, SLIs/SLOs), and proactive resilience and failure-prevention practices.
  • Champion automation of repetitive and manual operational tasks, including incident detection, response, validation, and recovery where feasible.
  • Promote a culture of preventative engineering, partnering with development teams to improve system reliability upstream.
  • Implement and continuously improve real-time monitoring, alerting, and observability across applications and infrastructure.
  • Measure and optimize the effectiveness of monitoring and alerting to eliminate noise and reduce mean time to detect (MTTD) and mean time to recover (MTTR).
  • Leverage AI and advanced analytics to correlate telemetry data (logs, metrics, traces) and proactively identify emerging risks and root causes.
  • Champion the safe and responsible use of AI within production operations by adhering to enterprise guardrails and protecting sensitive data and system integrity.
  • Oversee operational readiness across releases, disaster recovery and failover testing, and certificate and dependency lifecycle management.
  • Ensure production support is actively embedded in change planning, minimizing risk from releases and infrastructure changes.
  • Lead one or more Agile teams (Scrum, Kanban), including onshore and offshore engineers, fostering high performance and accountability.
  • Manage workforce vendors and partners, setting expectations, reviewing performance, and ensuring delivery quality.
  • Own the budget and staffing plan, aligned to application criticality, operational risk, and business growth objectives.
  • Act as the first line of defense in production operations by proactively identifying and mitigating technology, operational, and resiliency risks.
  • Partner effectively with second-line Risk, Audit, and Regulatory teams, ensuring findings are addressed and controls are continuously improved.
  • Ensure compliance with internal policies, regulatory requirements, and external audit expectations.
  • Own and drive remediation plans for risk, audit, and regulatory findings, ensuring timely, effective, and sustainable resolution.
  • Lead responses to audit and regulatory inquiries, including providing evidence, clarifying controls, and appropriately challenging findings based on documented compliance.
  • Serve as a trusted advisor to senior Technology and Business leaders, communicating operational health, risk posture, and improvement roadmaps.
  • Lead or contribute significantly to large-scale initiatives, platform transformations, or regulatory-driven efforts.
  • Continuously assess organizational maturity and lead initiatives to improve reliability, efficiency, and talent capability.
  • Hold full people management accountability, including hiring and succession planning, coaching and performance management, compensation input and talent development, and disciplinary action and terminations as necessary.
  • Act as an Agile and DevOps champion, embedding production support within fast-moving delivery models.
  • Balance “keep-the-lights-on” operational excellence with continuous engineering improvement.
  • Drive measurable outcomes such as improved uptime, reduced incident volume, faster recovery, and improved customer experience.

Benefits

  • medical
  • dental
  • vision
  • life insurance
  • disability
  • accidental death and dismemberment
  • tax-preferred savings accounts
  • 401(k) plan
  • vacation
  • sick days
  • paid holidays
  • defined benefit pension plan
  • restricted stock units
  • deferred compensation plan