Manager, Site Reliability Engineering

General Motors, Mountain View, CA
95d, $165,000 - $298,800, Hybrid

About The Position

As an SRE Engineering Manager, you will lead your team in setting priorities and staying aligned with organizational goals, and you will also be deeply technical. We expect our managers to contribute directly by coding, reviewing code, and mentoring engineers. You are unlikely to spend most of your time coding, but the ability and willingness to dive into technical details, solve problems hands-on, and support your team's technical decisions is crucial. You'll be a mentor, guide, and partner, helping engineers grow and ensuring the reliability and efficiency of the systems they work on. We set a high bar for engineering managers, who lead by example in both technical expertise and people leadership.

Requirements

  • Bachelor's degree in computer science or a related field, or equivalent work experience.
  • 8+ years of experience on software development teams.
  • Proficiency in at least one programming language (e.g., Python, Go, Java) and familiarity with multiple language ecosystems.
  • Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures.
  • Deep understanding of how code runs on underlying hardware, including operating systems, algorithms, and data structures.
  • Experience handling production incidents, including root cause analysis, mitigation, and working through complex system failures.
  • Strong communication skills, with an ability to explain technical concepts to both engineering and business stakeholders.
  • Proven experience in automating manual processes, building deployment pipelines, or managing configuration systems.

Nice To Haves

  • Experience with cloud platforms (AWS, GCP, Azure).
  • Familiarity with container orchestration systems like Kubernetes.
  • A track record of managing or developing distributed systems.
  • Prior experience with Java in production.

Responsibilities

  • Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention.
  • Lead, implement, and improve monitoring and observability frameworks, enabling proactive detection and resolution of incidents.
  • Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime and swift resolution.
  • Work alongside developers to ensure the quality, scalability, and reliability of our services. Practice shared ownership of services in production, fostering a 'You build it, you run it' culture.
  • Own Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to manage reliability expectations effectively.
  • Conduct deep-dive analyses of incidents and collaborate on post-incident reviews to derive learnings and prevent recurrence. Champion a culture of continuous improvement.
  • Evaluate system performance and advocate for optimizations that reduce infrastructure costs while maintaining service reliability.

Benefits

  • Medical, dental, vision, Health Savings Account, Flexible Spending Accounts.
  • Retirement savings plan.
  • Sickness and accident benefits.
  • Life insurance.
  • Paid vacation & holidays.
  • Tuition assistance programs.
  • Employee assistance program.
  • GM vehicle discounts.

What This Job Offers

Job Type

Full-time

Career Level

Manager

Industry

Transportation Equipment Manufacturing

Education Level

Bachelor's degree

Number of Employees

5,001-10,000 employees
