Site Reliability Engineering (SRE) Team Lead

OneMain (Formerly Springleaf & OneMain Financials), Irving, TX

About The Position

We are looking for a highly skilled and experienced Site Reliability Engineering Team Lead to guide our SRE team, foster best practices, and ensure operational excellence across our infrastructure.

Position Overview

As the SRE Team Lead, you will be responsible for the technical leadership of a talented team of site reliability engineers dedicated to maintaining and improving the reliability, scalability, and performance of our critical systems and services. You will serve as a technical leader and mentor, driving strategic initiatives around automation, incident management, observability, and system design while collaborating closely with engineering, operations, and product teams.

Requirements

  • BA/BS in Computer Science, Engineering, or a related field, or equivalent experience.
  • 7+ years of experience in site reliability engineering, systems engineering, or related roles, with at least 2 years in a leadership position.
  • Proven experience leading and scaling high-performing engineering teams.
  • Deep expertise in cloud platforms (AWS, GCP, Azure), containerization (Docker), and container orchestration (Kubernetes).
  • Strong skills in infrastructure as code tools (Terraform, Ansible, CloudFormation) and CI/CD pipelines.
  • Proficiency with monitoring and alerting systems (Prometheus, Grafana, ELK, Datadog).
  • Solid programming and scripting skills (Python, Go, Bash, or similar).
  • Strong understanding of distributed systems, networking, security, and databases.
  • Excellent leadership, communication, and collaboration skills.
  • Experience managing incident response and on-call rotations.

Nice To Haves

  • Experience working with microservices and event-driven architectures.
  • Familiarity with compliance frameworks such as GDPR, PCI, SOX, or SOC 2.
  • Background in DevOps practices and tooling.

Responsibilities

  • Lead, mentor, and grow a team of site reliability engineers, promoting a culture of reliability, automation, and continuous improvement.
  • Drive the design, implementation, and maintenance of scalable and fault-tolerant infrastructure to support high-availability services.
  • Oversee incident management processes, including triage, root cause analysis, and postmortems to improve system reliability and prevent recurrence.
  • Collaborate cross-functionally with software engineering, product, and operations teams to integrate reliability best practices into the software development lifecycle.
  • Define and implement operational metrics, SLIs/SLOs, and dashboards to monitor system health and drive proactive improvements.
  • Assess and manage the observability of critical environments, proactively addressing gaps as they arise.
  • Oversee the release management processes, artifacts, and tools that support a repeatable software delivery lifecycle.
  • Champion automation efforts to reduce manual intervention, improve deployment pipelines, and optimize infrastructure management.
  • Lead capacity planning, disaster recovery, and performance tuning efforts.
  • Ensure security and compliance standards are upheld across infrastructure and operations.

Benefits

  • Health and wellbeing options, including medical, prescription, dental, vision, hearing, accident, hospital indemnity, and life insurance
  • 401(k) with up to 4% matching
  • Employee Stock Purchase Plan (10% share discount)
  • Tuition reimbursement
  • Paid time off (15 days' vacation per year, plus 2 personal days, prorated based on start date)
  • Paid sick leave as determined by state or local ordinance, prorated based on start date
  • Paid holidays (7 days per year, based on start date)
  • Paid volunteer time (3 days per year, prorated based on start date)

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Credit Intermediation and Related Activities

Number of Employees

5,001-10,000 employees

© 2024 Teal Labs, Inc