AIOps Lead, Software Engineering

Zelis, Boston, MA
$169,000 - $213,750, Hybrid

About The Position

This role will lead the next phase of Zelis’ operational transformation as we accelerate our AWS migration and expand AI-native capabilities into the operations space. You will define and drive the AIOps strategy for the organization, combining cloud operations, observability, automation, SRE practices, and AI/agentic solutions to improve reliability, incident response, operational efficiency, and platform resilience. As a senior technical leader, you will work across Engineering, Cloud, Infrastructure, Security, and Operations teams to design intelligent operational capabilities that move beyond traditional monitoring into proactive, automated, and agent-assisted operations. This role requires deep AWS expertise, hands-on experience with observability platforms such as New Relic and OpenSearch, substantial experience with AI and agents, an SRE mindset, and a proven ability to apply ChatOps and agentic or multi-agent systems to real-world operational workflows.

Requirements

  • Typically BS + 12 years or MS + 10 years (or equivalent), with a strong track record leading cloud operations, platform operations, SRE, observability, or AIOps initiatives across complex enterprise environments.
  • Strong hands-on experience designing and operating workloads on AWS, with expertise across compute, networking, storage, security, automation, and cloud operations patterns.
  • Deep experience with modern observability and monitoring platforms such as New Relic, OpenSearch, and related tools for metrics, logs, traces, dashboards, alerting, and operational analytics.
  • Proven experience applying AI to operations use cases such as event correlation, anomaly detection, alert reduction, root cause analysis, remediation support, and operational workflow automation.
  • Strong experience designing or implementing AI agents, agentic workflows, or multi-agent systems that improve operational processes and operator effectiveness.
  • Strong grounding in site reliability engineering principles, including service reliability, SLOs/SLIs, error budgets, automation, incident management, resilience, and continuous improvement.
  • Demonstrated success building or scaling ChatOps practices that improve collaboration, incident response, and operational execution through integrated messaging and workflow automation.
  • Strong knowledge of scripting, infrastructure automation, operational tooling, APIs, event-driven systems, and platform integration patterns.
  • Ability to translate operational pain points into scalable technical solutions that improve reliability, speed, and operational maturity.
  • Able to influence technical teams and senior leaders, build alignment across functions, and communicate complex operational strategies clearly.
  • Experience implementing operational AI responsibly with appropriate controls for accuracy, security, compliance, explainability, and human oversight.

Nice To Haves

  • Experience leading AIOps or intelligent operations initiatives in a cloud-first or large-scale enterprise environment.
  • Experience supporting AWS migration programs and modern cloud operating models.
  • Familiarity with incident management tooling, runbook automation, knowledge systems, and operational workflow orchestration platforms.
  • Experience integrating AI agents with observability, ticketing, collaboration, or operational systems.
  • Experience in healthcare, regulated environments, or other domains requiring strong reliability and compliance practices.
  • Exposure to platform engineering, DevOps, and developer experience practices that intersect with operational excellence.

Responsibilities

  • Lead the AIOps strategy and architecture for the Zelis Price Business Unit as we modernize operations alongside AWS migration and AI-native acceleration.
  • Define and implement intelligent operational patterns that improve incident detection, triage, remediation, root cause analysis, and operational decision-making.
  • Architect and scale observability capabilities across cloud and application environments, including metrics, logs, traces, dashboards, alerting, and service health visibility.
  • Drive operational excellence across AWS environments by establishing scalable patterns for monitoring, resilience, reliability, automation, and governance.
  • Design and implement AIOps capabilities using AI, agents, and agentic workflows to support incident response, anomaly detection, alert correlation, noise reduction, troubleshooting, and operational automation.
  • Lead development of agentic and multi-agent operational solutions that coordinate across monitoring, diagnostics, knowledge retrieval, remediation workflows, and operator assistance.
  • Build and mature ChatOps capabilities to improve collaboration, visibility, response speed, and workflow automation across engineering and operations teams.
  • Partner with SRE, Cloud Engineering, Infrastructure, Security, and Application teams to embed reliability engineering practices into operational processes and platform design.
  • Establish standards for operational telemetry, service-level objectives, alert quality, escalation workflows, incident readiness, and post-incident learning.
  • Drive adoption and optimization of observability tools such as New Relic, OpenSearch, and related monitoring, logging, and analytics platforms.
  • Identify opportunities to apply AI to reduce manual operational effort, improve mean time to detect and resolve issues, and increase platform stability and operator productivity.
  • Ensure AIOps solutions are implemented with strong governance, security, auditability, and operational trustworthiness.
  • Create playbooks, standards, reusable patterns, and operating models that scale AIOps adoption across teams.
  • Mentor engineers and operators in modern operations practices spanning observability, automation, SRE, ChatOps, and AI-assisted operations.

Benefits

  • 401(k) plan with employer match
  • Flexible paid time off
  • Holidays
  • Parental leave
  • Life and disability insurance
  • Health benefits including medical, dental, vision, and prescription drug coverage