Production Support Manager

New York Life
Overland Park, KS
$111,500 - $150,000
Hybrid

About The Position

As the Production Support Manager within IL Technology Operations at New York Life Insurance Company, you will lead a small, high-impact team responsible for keeping production systems healthy, observable, and resilient. This is a proactive, prevention-focused leadership role — your primary mission is to build monitoring frameworks, automate early warning systems, and implement preventive strategies that stop production issues before they impact the business. You will balance day-to-day operational excellence with forward-looking strategic initiatives, including platform modernization and the adoption of new technologies such as Amazon QuickSight. The ideal candidate is a hands-on technical leader with a high sense of urgency, a builder’s mindset, and a deep commitment to both their team and the reliability of the systems they own.

Requirements

  • 5+ years of experience in production support, site reliability engineering (SRE), or technology operations, with a demonstrated focus on proactive monitoring and incident prevention.
  • 2+ years of people leadership experience with a proven ability to develop team members, drive accountability, and lead through both strategic initiatives and high-urgency production situations.
  • Working knowledge of AWS cloud services (EC2, CloudWatch, Lambda, S3, RDS, etc.) and hands-on experience designing or implementing monitoring and alerting frameworks that enable proactive detection and prevention of production issues.
  • Experience with DevOps practices including CI/CD pipelines, infrastructure-as-code, and automated deployment processes that reduce manual effort and production risk.
  • Proven ability to develop and execute operational strategies, establish SLAs, and communicate meaningful metrics and system health dashboards to business stakeholders.
  • Experience supporting or decommissioning legacy systems (e.g., Microsoft Access, on-premises databases), including dependency mapping, migration planning, and risk management.
  • Ability to serve as the on-call escalation point for production support issues during the overnight cycle run, providing leadership guidance and decision-making support to the offshore team when critical incidents require management-level intervention.

Nice To Haves

  • AWS certifications (Solutions Architect, SysOps Administrator, or DevOps Engineer).
  • Hands-on experience with monitoring and observability tools such as CloudWatch, Datadog, Splunk, Grafana, or PagerDuty — with a focus on configuring proactive alerting and trend-based anomaly detection.
  • Proficiency in scripting and automation languages (Python, PowerShell, Bash) used to build preventive automation, runbooks, or self-healing workflows.
  • Experience with business intelligence or analytics platforms (e.g., Amazon QuickSight, Tableau) for operational reporting and health dashboards.
  • Background in insurance or financial services technology operations, and familiarity with driving enterprise-level technology transformation initiatives.

Responsibilities

  • Build and continuously evolve a proactive monitoring and alerting framework — designing early warning systems, automated health checks, and trend-based detection that identify and resolve potential production risks before they escalate into system downtime or customer-facing incidents.
  • Lead and develop a high-performing onshore production support team (including a Production Support Analyst and Lead), while coordinating offshore resources to ensure 24/7 coverage, consistent quality standards, and a culture rooted in prevention-first thinking.
  • Drive platform modernization initiatives, including the strategic decommissioning of legacy Microsoft Access databases and implementation of CI/CD pipelines, infrastructure-as-code, and automated deployment processes that reduce manual toil and production risk.
  • Develop and communicate operational health metrics, SLA dashboards, and incident trend reports to business stakeholders — delivering transparent, timely insights into production health, preventive actions taken, and continuous improvement outcomes.
  • Serve as the escalation point and strategic owner for critical production incidents — rapidly triaging issues, coordinating resolution across teams, and conducting post-incident reviews that drive systemic, preventive improvements.

Benefits

  • Leave programs
  • Adoption assistance
  • Student loan repayment programs