Production Support - Senior Technical Lead

OCTAGOS HEALTH, INC.
Houston, TX
Onsite

About The Position

Octagos is modernizing remote cardiac monitoring with AI-powered automation, seamless EHR integrations, and accuracy proven in high-volume, real-world clinics. Atlas AI™ triages cardiac device transmissions to filter nonactionable alerts and highlights the events that need true clinical attention. Through our Two-Brain Approach™, which combines Atlas AI™ with IBHRE-certified oversight, Octagos delivers 99%+ accuracy, sensitivity, and specificity for near-perfect clinical performance. With fast bi-directional EHR integrations and flexible, cost-effective implementation, Octagos helps clinics scale care efficiently without compromise. Recognized by TIME and Statista as one of the World’s Top HealthTech Companies 2025, Octagos is redefining how cardiac care is delivered.

We’re looking for a Sr. Tech Lead, Production Support to own the day-to-day health, stability, and incident resolution of the Octagos platform. This person will build and lead a production support team, establish processes for monitoring and triaging issues, and serve as the bridge between our clinics and engineering. You’ll be the first line of defense when something goes wrong, and the driving force behind making sure it doesn’t happen again.

This role is ideal for someone who thrives in fast-paced environments, has a strong technical foundation, and brings operational rigor with a customer-first mindset. You’ll work closely with the VP of Engineering, Infrastructure, Security and the development team across the US and India, as well as our client success and clinical operations teams. This is an in-office position based in Houston, Texas.

Requirements

  • 10+ years in production support, site reliability, DevOps, or application support roles, with at least 4 years in a people management capacity
  • Strong hands-on experience with Microsoft Azure (App Services, SQL Server, Application Insights, Log Analytics, Azure Functions)
  • Working knowledge of .NET/C# applications, SQL Server (T-SQL query analysis, execution plans, blocking/deadlock diagnosis), and web application architectures
  • Experience building incident management processes from scratch or significantly improving existing ones
  • Demonstrated ability to manage cross-functional escalations and communicate technical issues to non-technical stakeholders
  • Comfortable working across US and international (India) engineering teams with overlapping time zones
  • Strong organizational skills, with the ability to manage multiple simultaneous incidents without dropping context

Nice To Haves

  • Experience in healthcare SaaS, clinical software, or HIPAA-regulated environments
  • Familiarity with remote patient monitoring, medical device data, or cardiology workflows
  • Experience with RPA platforms, Playwright-based automation, or web scraping bots in production
  • Familiarity with EMR systems (Epic, Athena, NextGen, Cerner) and HL7/FHIR integration patterns
  • Hands-on experience with Power BI, Hangfire, Auth0, or Azure DevOps pipelines
  • Experience with observability tooling: OpenTelemetry, Grafana, PagerDuty, or equivalent
  • Background in SQL performance tuning, index optimization, and managing scheduled database jobs

Responsibilities

  • Own the production support function end-to-end: monitoring, triage, escalation, resolution, and post-incident review
  • Establish and enforce incident management processes including severity classification, SLAs, runbooks, and communication protocols
  • Manage and respond to production incidents across the full stack: Angular portal, .NET API, SQL Server, Azure services, RPA bots, and EMR integrations
  • Monitor platform health using Azure Application Insights, Log Analytics, and custom alerting; proactively identify issues before they reach clinics
  • Coordinate with engineering on hotfixes, emergency deployments, and rollback procedures
  • Build, hire, and manage a team of production support analysts and engineers
  • Define on-call rotations, escalation paths, and shift coverage to ensure adequate support during business hours and critical after-hours windows
  • Create and maintain a knowledge base of known issues, troubleshooting guides, and standard operating procedures
  • Establish KPIs and reporting for production support (MTTR, ticket volume, recurring issues, SLA adherence) and drive continuous improvement
  • Partner with engineering to identify and prioritize reliability improvements, tech debt reduction, and observability gaps
  • Own the operational readiness process for new releases: review deployments, validate release notes, and confirm monitoring coverage before go-live
  • Drive root cause analysis (RCA) for recurring production issues and ensure permanent fixes are tracked to completion
  • Manage database-level production issues including long-running queries, blocking, deadlocks, and job failures (Hangfire, Azure Elastic Jobs)
  • Serve as the primary point of contact between client success, clinical operations, and engineering for production issues
  • Triage and contextualize clinic-reported issues, distinguishing between platform bugs, configuration issues, data quality problems, and user error
  • Coordinate with the RPA team on bot failures across vendor portals (Medtronic, Boston Scientific, St. Jude, Biotronik) and EMR push failures (Epic, Athena, NextGen, Cerner, etc.)
  • Communicate outage status, ETAs, and resolutions clearly to internal and external stakeholders
  • Leverage AI-assisted alert triage and log summarization to speed up incident identification, reduce noise, and improve time-to-detection
  • Use AI to draft clear, stakeholder-ready incident updates and post-incident summaries while ensuring accuracy and appropriate clinical/operational context
  • Apply AI-assisted root cause analysis workflows (pattern detection across traces, tickets, and recent deployments) to identify recurring failure modes and prioritize permanent fixes
  • Maintain an AI-enhanced knowledge base (runbooks, known issues, troubleshooting steps) with consistent tagging, searchable summaries, and regular human review for correctness

Benefits

  • Paid Time Off
  • Health, dental, and vision insurance
  • Competitive salary commensurate with experience
  • Opportunity to shape a critical function at a growing healthcare technology company