Manager, Network Operations Center

CollectiveHealth, Inc.
Lehi, UT
45d
Hybrid

About The Position

At Collective Health, we're transforming how employers and their people engage with their health benefits by seamlessly integrating cutting-edge technology, compassionate service, and world-class user experience design. Collective Health is expanding its technology team in Utah! We believe health benefits should be simple, accessible, and technology-driven, and that technology is a cornerstone of modern healthcare.

This is a role for a hands-on, highly technical engineering manager who is passionate about building resilient, scalable, and automated observability and monitoring systems. This person will take on a "greenfield" opportunity to help architect, build, and manage our enterprise Network Operations Center (NOC). The primary objective is to move beyond traditional, reactive monitoring and support the creation of an automation-driven, proactive, and predictive technical operations function. The ideal candidate possesses deep technical expertise in modern observability and automation stacks and is driven to eliminate manual toil and improve reliability through code.

The successful candidate will be responsible for supporting the design of the operational framework, participating in the selection and implementation of monitoring tools, hiring and training a team of skilled engineers, and establishing the processes necessary to ensure 24/7/365 availability, performance, and security of our critical infrastructure. The primary goal of this role is to participate in the creation of a proactive, best-in-class NOC that minimizes downtime and service disruptions, thereby directly supporting exceptional patient care and healthcare systems operations. A deep understanding of the regulatory and operational demands of the healthcare industry, including HIPAA compliance, is essential for this position.

Requirements

  • A seasoned leader with a proven track record of building and managing high-stakes technical environments.
  • A minimum of 8 years in IT infrastructure or operations, with at least 4 years in a leadership capacity (Manager) within a NOC, SOC, and/or SRE team.
  • Verifiable, hands-on experience architecting, building, and scaling a technical operations center from the ground up in a modern cloud environment.
  • Bachelor's degree in Computer Science, Information Technology, or a related technical field, or equivalent, demonstrable professional experience. Military experience can substitute for a degree.
  • Expert-level knowledge of ITIL and SRE principles, with a proven ability to blend both frameworks to optimize operations (Incident/Problem Management, SLOs/SLIs, Error Budgets).
  • Expert-level scripting skills (Python, Go), full-stack experience with Java and JavaScript frameworks, and deep experience with Infrastructure as Code and automation frameworks (Terraform, Ansible).
  • Hands-on architectural expertise with public cloud platforms (AWS, GCP) and extensive experience with containerization and orchestration (Docker, Kubernetes).
  • Deep understanding of enterprise networking fundamentals (TCP/IP, BGP, DNS, VPNs) and expert-level systems administration skills (Linux, Windows).
  • Proven experience engineering and using modern observability stacks (e.g., OpenTelemetry, Honeycomb, Prometheus, Grafana, ELK Stack) and managing ITSM platforms (Jira Service Management, PagerDuty, etc.).

Nice To Haves

  • Prior experience in the healthcare IT industry.
  • In-depth knowledge of HIPAA/HITECH regulations and their application to IT infrastructure.
  • Relevant certifications such as ITIL Foundation/Practitioner, CCNA/CCNP, PMP, or CompTIA Network+.
  • Relevant cloud certifications such as AWS Certified Cloud Practitioner/Architect and GCP Cloud Digital Leader/Architect.

Responsibilities

  • You will be the principal architect and engineer for our new NOC, participating in the building of a highly automated, resilient, and insightful operations function from the ground up.
  • Participate in leading the end-to-end design and greenfield implementation of the NOC. Define and document all operational policies, SOPs, and data-driven performance standards (KPIs, SLOs, SLAs, and Error Budgets) to establish a foundation of excellence.
  • Deploy a sophisticated observability platform using tools like Prometheus, Grafana, Honeycomb, the ELK Stack, etc., to provide deep, actionable insights into our complex AWS and GCP cloud environments.
  • Evangelize and develop a framework for automated diagnostics and remediation. Drive the adoption of Infrastructure as Code (IaC) with Terraform and Ansible to eliminate manual toil and ensure consistent, repeatable environments.
  • Develop custom code as needed using technologies such as Python, JavaScript, Java, and databases.
  • You will help build and cultivate a world-class technical team, fostering a culture that attracts and retains top engineering talent.
  • Lead the full-cycle recruitment, hiring, and onboarding process for the founding team of NOC engineers (L1/L2), ensuring they are equipped for success from day one.
  • Actively lead, mentor, and develop your team's talent. Foster a culture of technical curiosity and accountability through continuous feedback and clear career pathing.
  • Oversee all aspects of team management, including creating fair and effective 24/7 shift schedules and on-call rotations, to guarantee complete operational coverage at all times.
  • As the operational leader, you will own the real-time stability of our platform and command the response when issues arise, with a specific focus on the unique demands of the healthcare insurance TPA space.
  • Oversee the 24/7 monitoring and health of all critical infrastructure and applications, especially claims adjudication systems, member portals, and provider data networks.
  • Act as the ultimate point of escalation for major incidents. Lead the incident response and management process with precision, ensuring rapid technical resolution, conducting blameless post-mortems, and communicating clearly to all business and technical stakeholders.
  • Champion ITIL best practices to move beyond reactive fixes. Analyze incident trends to identify root causes, drive a proactive problem management process, and prevent future occurrences.
  • You will ensure the NOC operates with the highest standards of security and compliance, providing transparent visibility into performance for leadership and stakeholders.
  • Ensure end-to-end compliance by acting as the primary owner for NOC compliance, partnering closely with Cybersecurity and Data Governance teams to ensure all tools and procedures are architected to protect PHI/PII and adhere strictly to HIPAA and HITECH regulations.
  • Deliver actionable insights by developing and maintaining a suite of real-time dashboards for technical teams and automated executive reports for leadership, providing clear visibility into system health, SLO adherence, and overall operational performance.

Benefits

  • In addition to the salary, you will be eligible for stock options and benefits like health insurance, 401k, and paid time off.
  • Learn more about our benefits at https://jobs.collectivehealth.com/benefits/.

What This Job Offers

Job Type

Full-time

Career Level

Manager

Industry

Insurance Carriers and Related Activities

Number of Employees

501-1,000 employees
