Sr. IT Operations Engineer

SpaceX, Hawthorne, CA
Onsite

About The Position

As a Sr. IT Operations Engineer, you will lead the design, execution, and continuous improvement of SpaceX’s corporate ITSM (Information Technology Service Management), incident, change, and monitoring practices. You’ll own high‑visibility incidents, mentor peers, and drive data‑driven improvements that raise service reliability across the entire IT organization. You will contribute to building the foundation of SpaceX’s enterprise systems strategy—supporting initiatives across finance, reliability, change management, and IT operations.

Requirements

  • Bachelor's degree in a STEM discipline and 3+ years professional experience; OR 6+ years of professional experience in an IT engineering role in lieu of a degree.
  • 10+ years in IT operations, site‑reliability, or infrastructure engineering.
  • 5+ years administering or developing ITSM workflows and metrics (Jira SM, ServiceNow, etc.).

Nice To Haves

  • Deep knowledge of ITIL frameworks, with proven ownership of incident, change, and problem management programs.
  • Demonstrated success managing high‑severity incidents in mission‑critical environments.
  • Advanced SQL and/or Python skills for building automated analytics pipelines; strong Power BI/Tableau dashboard design skills.
  • Experience integrating monitoring stacks (Grafana, Prometheus, Datadog, Splunk) with ITSM and notification systems.
  • Background in Lean/Six Sigma, value‑stream mapping, or other continuous‑improvement methodologies.
  • Excellent executive‑level communication and stakeholder‑management abilities.

Responsibilities

  • Lead incident response—coordinating multi‑disciplinary teams, ensuring rapid mitigation, root‑cause determination, and executive communications.
  • Govern the change‑management program: refine risk‑assessment criteria, approve complex changes, and enforce post‑implementation reviews.
  • Architect and optimize monitoring and observability strategy across on‑prem, cloud, and SaaS services; standardize alerting, service level objectives, and dashboards.
  • Develop, automate, and maintain trusted Power BI datasets and reports that surface operational KPIs, trends, and capacity forecasts.
  • Analyze operational data to detect systemic issues; champion corrective and preventive actions with service owners.
  • Integrate ITSM, CMDB (configuration management database), automation, and monitoring platforms to streamline workflows and eliminate manual steps.
  • Mentor junior engineers, promote best practices, and present monthly operational performance reviews to leadership.