Associate Director - IT Service Reliability & Operations

Eli Lilly and Company
Indianapolis, IN
$132,000 - $193,600

About The Position

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

This role is accountable for the end-to-end operational reliability, demand management, and capacity sustainability of Tech@Lilly’s centralized service and reliability operations, ensuring that incidents, events, requests, systemic reliability risks, and operational demand are managed through a centralized, standardized, data‑driven, and increasingly automated operating model. This role serves as the operations and demand lead for a centralized Service Reliability and Operations capability, leading the progressive shift from human-executed operations to an increasingly automated, agent-assisted operating model. The role is responsible for driving technology-enabled transformation across operations through the practical application of AI, automation, agentic solutions, demand forecasting, and capacity planning to ensure services scale predictably and reliably.

Requirements

  • Bachelor's degree in Business, Information Technology, STEM or related field
  • 12+ years of IT experience, with significant time in production operations, reliability, service management, or SRE-adjacent environments
  • 7+ years leading vendor or supplier‑supported operating models, including capacity planning, demand forecasting, and driving automation and innovation
  • 5+ years in people leadership roles within complex, global environments
  • Demonstrated experience leading high‑severity incident response and operational risk mitigation
  • Hands-on experience with AI in production operations, reliability, service management, or SRE-adjacent environments
  • Qualified applicants must be authorized to work in the United States on a full-time basis. Lilly will not provide support for or sponsor work authorization or visas for this role, including but not limited to F-1 CPT, F-1 OPT, F-1 STEM OPT, J-1, H-1B, TN, O-1, E-3, H-1B1, or L-1

Nice To Haves

  • Strong executive communication skills with the ability to translate operational and AI signals into business‑relevant insights
  • Ability to operate at both strategic design and day‑to‑day execution levels, particularly in high‑pressure production environments, while steering the organization toward modern, tech-enabled operations.
  • Proven success implementing centralized, tiered operating models that incorporate demand management and capacity planning to scale globally, and that leverage automation and AI to improve consistency and resilience.
  • Deep understanding of Incident, Event, Change, and Problem Management, including how these practices are enhanced through automation, analytics, and AI-assisted workflows in a mature ITSM environment.
  • Demonstrated ability to use operational and demand data to drive capacity decisions, reliability improvements, and executive confidence.

Responsibilities

  • Operational Reliability & Service Ownership: Lead a centralized Service Reliability & Operations function, providing standardized intake, triage, coordination, and governance across incidents, events, requests, and problems.
  • Deploy AI-assisted triage models that automatically classify, prioritize, and route incidents based on historical patterns, service risk profiles, and real-time signals
  • Establish and govern an automated remediation capability for known failure patterns, with human-in-the-loop escalation for high-risk scenarios
  • Own the operational runbook strategy, ensuring runbooks are not static documentation artifacts but active, machine-readable automation inputs that drive consistent, auditable response execution with or without human initiation
  • Own service stability and operational readiness outcomes for a heterogeneous, global application estate. Ensure Major Incident Management discipline, including command, escalation, and executive communications, is consistently executed for critical services.
  • Demand Management and Intake Governance: Own operational demand management for centralized production support, ensuring requests, enhancements, onboarding, and change-driven demand are visible, prioritized, and aligned to reliability and capacity constraints.
  • Define and govern standardized demand intake, categorization, and prioritization models, balancing business urgency, service risk, and operational sustainability.
  • Leverage AI-driven demand pattern analysis to distinguish predictable, automatable demand from work that requires human judgment, ensuring human capacity is protected for high-judgment activity.
  • Use demand trends to influence service design, onboarding decisions, and support models, preventing unmanaged growth in operational complexity.
  • Capacity Planning & Workforce Sustainability: Own capacity planning using demand, incident, and service risk data to proactively forecast workload and skill needs.
  • Translate demand signals into staffing strategies, automation investments, and capacity plans. Ensure the operating model scales sustainably by balancing workloads.
  • Reliability Engineering & Continuous Improvement: Strengthen Problem Management as a reliability and demand-reduction lever, using incident trends and recurrence signals to drive systemic risk reduction rather than reactive firefighting.
  • Own and maintain a living automation roadmap that sequences opportunities by ROI, operational risk reduction, and technical feasibility, prioritizing those that reduce MTTR, operational toil, and demand on human resources.
  • Partner with engineering, platform, and SRE teams to establish feedback loops between automated remediation outcomes and the knowledge base, ensuring every automated action either confirms or improves the underlying runbook, creating a self-reinforcing reliability system.
  • Data‑Driven Operations & Tool Enablement: Use incident, change, and risk data to prioritize staffing, automation, and reliability improvement investments across the portfolio.
  • Define and standardize operational KPIs and health indicators, including demand volume, capacity utilization, MTTR, and automation effectiveness.
  • Ensure ITSM and observability tooling supports consolidated intake, standardized workflows, measurable outcomes, real-time visibility, predictive insights, and actionable reporting.
  • Partner on AI‑enabled and automated capabilities that improve productivity and reliability across operational teams, including predictive insights, automated remediation, and agent-driven coordination across teams.
  • Leadership & Stakeholder Engagement: Lead and develop operations, reliability, and service management leaders, setting clear expectations for technical literacy, automation-first thinking, demand accountability, and outcome ownership.
  • Serve as a trusted partner to business service owners, risk, security, and technology leaders, translating operational data and AI-generated insights into actionable, business-relevant decisions.
  • Drive organizational change as services, suppliers, and operating models evolve, ensuring stability, transparency, and capacity are protected during transitions.

Benefits

  • Full-time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance).
  • In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company-sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities).

What This Job Offers

Job Type: Full-time

Career Level: Mid Level

Number of Employees: 5,001-10,000 employees
