Operations, Reliability and IT Manager

RADAR
San Diego, CA
$200,000 - $270,000

About The Position

We are seeking an experienced and proactive Operations, Reliability, and IT Manager to lead our hybrid infrastructure, internal IT operations, and site reliability engineering initiatives. This people management position is responsible for leading a team of 5–8 engineers at launch, with a mandate to grow the team to 15 over the course of the year. The role sits at the intersection of SRE, DevOps, ML Ops, and internal IT operations. You'll oversee critical infrastructure spanning edge and cloud environments as we work to unify compute across our distributed architecture. As the organization scales, this team will evolve into two specialized units, one focused on engineering reliability and the other on internal operations. The successful candidate will play a pivotal role in shaping this evolution.

Requirements

  • You have 5+ years of leadership experience in SRE, DevOps, IT management, or related roles
  • You have a proven track record as a people manager, including hiring, developing, and mentoring technical teams
  • You have a strong technical background across SRE principles, DevOps practices, and internal IT operations
  • You have experience managing hybrid infrastructure (Windows, Linux, Mac) and cloud platforms
  • You have a demonstrated ability to implement observability, monitoring, and incident management best practices
  • You have excellent communication and stakeholder management skills, with the ability to influence both technical and non-technical audiences
  • You have experience delivering IT support and operations for distributed, fast-growing teams

Nice To Haves

  • You have experience in highly regulated or compliance-driven industries
  • You have hands-on experience with Google Cloud Platform (GCP) and Microsoft Azure
  • You have familiarity with ML Ops practices and infrastructure
  • You have experience with Fortinet or Cisco networking hardware (configuration, troubleshooting, maintenance)
  • You have a strong understanding of IAM, including user provisioning, RBAC, and audit logging
  • You have knowledge of network security best practices, including firewall management, VPNs, and security policy enforcement
  • You have experience with modern observability and endpoint management tools

Responsibilities

  • Lead, mentor, and grow a team of 5–8 engineers, with plans to expand to 15 throughout the year
  • Oversee SRE, DevOps, and ML Ops practices, ensuring operational excellence across engineering teams
  • Manage internal IT operations including help desk, endpoint management, and technical support for a hybrid fleet of Windows, Linux, and Mac machines
  • Drive the Control Plane initiative to unify compute across edge and cloud infrastructure
  • Own observability strategy, monitoring, incident response, and operational rigor across the engineering organization
  • Ensure network reliability, security posture, and compliance through well-documented processes and regular audits
  • Act as the primary escalation point for technical incidents and crisis response
  • Collaborate with engineering, security, and product teams to align operations with organizational priorities
  • Define team structure, processes, and leadership framework as the organization scales

Benefits

  • Equity
  • Comprehensive medical and dental coverage
  • Life and disability benefits
  • 401(k) plan
  • Flexible time off
  • Paid parental leave