Lead, IT Service Operations

S&P Global
Raleigh, NC
6d
$14,000
Hybrid

About The Position

The Team: Service Management is a global team that provides specialized technical support across the suite of trade processing and workflow solutions that support all participants in Markets Group. The Service Management team works collaboratively, both internally and across our customer base, operating in a sharing and learning culture with a view to building continuous improvement into our processes.

Responsibilities & Impact: We are seeking an experienced Service Management professional with more than 10 years of work experience to join the team in Dallas or Raleigh, US. The role encompasses second-line technical application support and cloud infrastructure management for the Markets Group of Enterprise Solutions. This person will report directly to the Global Manager responsible for application support and will work closely with the global team, contributing to the quality of our support.

  • Act as a strategic technology partner to Architecture, Engineering, Business Systems, and Global Service Delivery (L1/L2/L3), ensuring enterprise-grade, resilient, and scalable IT services aligned to business outcomes.
  • Establish and lead a collaborative service excellence culture, driving standardized, repeatable, and cost-efficient operational processes with a strong focus on quality, reliability, and continuous improvement.
  • Own and govern the Major Incident Management lifecycle, from fault detection and triage through resolution, executive communication, post-incident reviews, and sustainable root-cause remediation.
  • Lead service performance reviews with business and technology stakeholders, identifying systemic improvement opportunities, operational risks, and reliability enhancements.
  • Provide overall accountability for people leadership, including talent strategy, recruitment, onboarding, performance management, career development, and succession planning for Service Management and SRE teams.
  • Define and evolve enterprise-level observability and reliability frameworks, covering metrics, logs, traces, SLIs/SLOs, and error budgets across hybrid and cloud platforms.
  • Own Disaster Recovery, resiliency strategy, and operational readiness, ensuring regular testing, executive assurance, and continuous enhancement of recovery capabilities.
  • Serve as a senior technical leader and mentor, guiding SREs, DevOps, and engineering teams while driving adoption of best practices across reliability engineering and operations.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related discipline.
  • Ideally 10-12+ years of progressive experience in SRE, DevOps, Platform Engineering, or Technology Operations, including leadership responsibility.
  • Proven experience designing and operating high-availability, disaster-recovery, and incident response capabilities across AWS, Azure, or GCP.
  • Strong understanding of ITIL-aligned Service Management processes and enterprise operational governance.
  • Deep expertise with observability platforms such as Splunk, CloudWatch, Prometheus, Grafana, Datadog, or equivalent.
  • Strong database expertise (Oracle / PostgreSQL), including advanced SQL tuning, performance optimization, and operational troubleshooting.
  • Demonstrated experience leading post-incident reviews and driving preventative engineering outcomes.
  • Excellent decision-making and leadership capabilities under high-pressure, executive-visible incidents.
  • Strong knowledge of Linux and Windows operating systems, automation, and scripting (Python preferred).
  • Solid understanding of SDLC, Agile methodologies, defect triage, and engineering collaboration models.

Nice To Haves

  • Prior experience in Financial Services and/or S&P Global technology platforms is highly desirable.

Responsibilities

  • Provide end-to-end ownership of Incident, Problem, Change, and Business Continuity processes, ensuring predictable, high-quality service delivery to internal and external customers.
  • Operate as the primary escalation authority for complex, high-impact production issues, coordinating across engineering, cloud, security, and vendor teams.
  • Partner closely with Product, Architecture, and Delivery teams to ensure operational readiness for releases, embedding reliability, supportability, and resilience early in the design lifecycle.
  • Drive continuous improvement initiatives across monitoring, alerting, reporting, automation, and operational maturity.
  • Embed AI/ML-driven operations (AIOps) to enhance anomaly detection, predictive alerting, intelligent noise reduction, and proactive incident prevention.
  • Influence and support technology governance, risk management, compliance, and audit activities related to service reliability.
  • Ensure 24x7 proactive monitoring and management of business-critical platforms, restoring service rapidly and minimizing customer impact.
  • Define and enforce incident severity models, ensuring accurate impact assessment, prioritization, and stakeholder communication.
  • Maintain end-to-end ownership of incidents, including those requiring third-line engineering or formal change execution.
  • Provide clear, consistent, and executive-level communication during incidents, outages, and service degradation.
  • Oversee application support spanning infrastructure, data remediation, user queries, education, and deep-dive incident investigations.
  • Drive observability across events, alerts, batch jobs, capacity planning, and performance KPIs, translating insights into actionable change.
  • Collaborate with functional and technical teams to ensure future deliverables (functional and non-functional) are operationally viable.
  • Champion knowledge management, ensuring high-quality runbooks, SOPs, and operational documentation in Confluence.
  • Deliver against SLA, OLA, and SLO commitments, with transparent reporting and corrective actions.
  • Leverage AIOps and reliability analytics to identify trends, systemic risks, and optimization opportunities at scale.

Benefits

  • Health & Wellness: Health care coverage designed for the mind and body.
  • Flexible Downtime: Generous time off helps keep you energized for your time on.
  • Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
  • Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
  • Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
  • Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.