About The Position

Join General Motors’ Vehicle Security Platforms (VSP) teams, where we build resilient, secure, and scalable platforms supporting mission-critical vehicle security communications. We seek a Staff Site Reliability Engineer (SRE) with extensive experience in scaling distributed systems and driving end-to-end reliability strategies. In this role, you will shape the reliability of GM’s next-generation vehicle security platforms, influence cross-organizational architecture decisions, and embed reliability as a first-class product concern. Your leadership will contribute directly to protecting millions of vehicles and customers globally.

Requirements

  • 7+ years of experience in Site Reliability Engineering, DevOps, or infrastructure/platform roles supporting secure, scalable systems.
  • Proven expertise in designing and scaling cloud infrastructure (Azure) and container orchestration systems (Kubernetes, Docker).
  • Demonstrated mastery of infrastructure-as-code frameworks (Terraform, Helm, CloudFormation, etc.).
  • Proficiency in Python and one JVM language (Java or Kotlin), and working knowledge of Go.
  • Deep architectural understanding of distributed systems, networking, system design, and large-scale security practices.
  • Track record of architecting and running zero-downtime systems in production.
  • Experience with modern monitoring and reliability tooling and frameworks (Prometheus, Datadog, OpenTelemetry, etc.).
  • Experience leading incident response, uptime SLO/SLA management, and operational excellence initiatives across multiple teams.
  • Capable of influencing architecture and product strategy while maintaining a hands-on approach to systems reliability.
  • Exceptional communication skills, able to present complex trade-offs and foster alignment across executive, product, and engineering stakeholders.

Nice To Haves

  • BS/MS/PhD in Computer Science, Engineering, or equivalent industry experience.
  • Deep understanding of encryption technologies, secure data handling practices, and identity management.
  • Experience designing and operating IoT or automotive-focused architectures with rigorous availability and safety requirements.
  • Direct experience in chaos engineering, game-day testing, disaster recovery orchestration, and production load testing.
  • Ability to grow and mentor engineers into leaders in their domain, building SRE teams that can operate independently at scale.
  • Demonstrated success in defining and executing reliability strategies with measurable business impact.
  • Strong product mindset with the ability to balance engineering excellence with speed and business priorities.

Responsibilities

  • Implement and evolve secure, highly available, and globally distributed systems powering GM’s vehicle security platforms.
  • Own reliability roadmaps, establishing frameworks and strategies for system hardening, high availability, disaster recovery, and operational scalability.
  • Develop automation-first solutions to eliminate operational toil, with advanced use of languages such as Python, Go, and Java.
  • Lead incident response, driving systematic elimination of failure modes through blameless postmortems, PRRs, and cross-team preventative initiatives.
  • Drive observability strategies with best-in-class practices for metrics, logging, and distributed tracing, using Prometheus, Datadog, or similar stacks.
  • Partner with engineering, platform, and security teams to design for reliability from inception, influencing architecture reviews and CI/CD best practices.
  • Lead optimization, capacity planning, and performance-tuning strategies for large-scale, security-critical platforms.
  • Introduce modern SRE practices such as chaos engineering, resilience testing, and progressive delivery to validate support teams and evolve system safety, along with SLOs, SLIs, and SLAs.
  • Mentor engineers across disciplines on SRE, platform resilience, secure operational practices, and architectural trade-offs.
  • Evaluate and adopt technologies (open-source, enterprise, homegrown) for security and reliability at scale.
  • Influence product strategy in partnership with engineering leads, ensuring operational reliability is prioritized alongside customer and business outcomes.

Benefits

  • From day one, we're looking out for your well-being–at work and at home–so you can focus on realizing your ambitions. Learn how GM supports a career that rewards you personally by visiting Total Rewards resources.