Technical Operations Manager - Vehicle Reliability Engineering

Aurora Innovation•Dallas, TX

Onsite

About The Position

This high-leverage leadership position calls for a technical operations expert to manage the Vehicle Reliability Engineering (VRE) team. This is not a traditional reactive support role; it is a proactive Operations and Reliability leadership position. You will be the central point of contact ("nerve center") for the systemic health of our fleet. You will lead a team of operations engineers focused on automating diagnostics, driving root-cause analysis, and bridging communication among field operations, the command center, and core engineering teams to ensure our autonomous fleet is ready for commercial scale. This is an onsite role based in Dallas, Texas, working Monday through Friday.

Requirements

5+ years of experience managing Site Reliability Engineering (SRE), Technical Operations, or Sustaining Engineering teams in a high-growth, high-stakes tech environment.
Bachelor’s degree or equivalent experience in a relevant field (e.g., Information Technology, Computer Science, Engineering).
3+ years of direct people management experience (managing teams of 5+ members).
Proficiency in technical areas such as Linux environments, IT systems, hardware/software integrations, networking, and sensor suites (Lidar/Radar).
Proven experience establishing incident response frameworks, automation protocols, and performance metrics.
Excellent communication skills with the ability to translate complex technical/systems issues for non-technical operations stakeholders.
Strong bias for action and the ability to make high-pressure decisions regarding vehicle safety and operational status.

Nice To Haves

Deep technical knowledge of hardware/software integration, including experience troubleshooting sensors (Lidar/Radar) and industrial computers.
Expert-level Linux skills: 5+ years of experience in Linux administration, command-line troubleshooting, and shell scripting.
Automation Mindset: Previous experience with Python, Bash, or Go for automating operational tasks and support workflows.
Experience in the Autonomous Vehicle (AV), robotics, or aerospace industries.
Knowledge of computer networking (TCP/IP, UDP, VLANs) and data log analysis.
Experience troubleshooting and diagnosing escalated software and hardware issues involving Linux environments, Lidar, Radar, and on-vehicle compute systems.

Responsibilities

Manage, mentor, and scale a team of operations engineers and reliability specialists focused on the systemic health and operational readiness of the Aurora autonomous fleet.
Shift the team’s focus from reactive troubleshooting to proactive incident prevention by driving deep Root Cause Analysis (RCA) and implementing long-term hardware/software fixes.
Partner with engineering teams to build and deploy automated diagnostic tools, scripts, and alerting systems that reduce manual intervention and improve vehicle uptime.
Oversee the "nerve center" of fleet health, utilizing telemetry, Linux command-line tools, and data dashboards to predict and resolve sensor (Lidar/Radar) and compute failures before they impact field operations.
Act as the primary translation layer between Operations, Product, and Infrastructure engineering, reporting to senior management and keeping stakeholders aligned on reliability initiatives.
Develop and implement SRE-based support processes, incident response playbooks, and workflow improvements to plan for scalability and new technology deployments.
Conduct regular training sessions for the support team to deepen their Linux, networking, system troubleshooting, and automation skills.
Develop and track key performance indicators (e.g., MTTR, uptime percentages, SLAs/SLOs) required to support a 24/7 commercial operations base.