Sr. Manager, Site Reliability Engineering

Maggiano's
Coppell, TX
Hybrid

About The Position

We are seeking a highly skilled and motivated Sr. Manager, Site Reliability Engineering to join our team. In this role, you will play a crucial part in ensuring the reliability, performance, and scalability of our systems and services. You will build and lead a team of talented engineers and drive initiatives to improve the reliability of our technology systems, streamline operations, and minimize downtime. Your technical expertise, coupled with strong communication skills and strategic thinking, will be instrumental in fostering collaboration across teams and implementing best practices. You will work closely with our development and operations teams to build and maintain robust infrastructure, automate processes, and improve overall system reliability.

This role is based in Dallas (Coppell), TX and follows a hybrid schedule (3 days in office). We are currently focused on local candidates or those open to relocating to the area at their own expense. At this time, we are unable to provide sponsorship support.

Requirements

  • Master’s degree, or bachelor’s degree combined with equivalent experience, in Computer Science, Engineering, or a related field.
  • 5+ years as a Site Reliability Engineer or in a similar role, with a demonstrated track record of managing the reliability and scalability of large-scale systems.
  • Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes).
  • Proficiency in scripting languages and automation tools (e.g., Python, Bash, Ansible).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Demonstrated leadership experience, with a passion for mentoring and developing team members.
  • Excellent problem-solving skills and the ability to work under pressure.
  • Proven ability to solve complex issues in a timely fashion.
  • Proven ability to adapt quickly to a dynamic environment as a self-starter.
  • Strong communication and collaboration skills.
  • Strong project management skills.
  • Strong documentation skills.
  • Solid understanding of networking, security, and system administration.
  • Experience with infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation).
  • Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
  • Familiarity with database management systems (e.g., MySQL, PostgreSQL).

Responsibilities

  • Build, lead, and mentor a team of Site Reliability Engineers, providing guidance and support while implementing best practices and resolving complex technical challenges.
  • Collaborate with cross-functional teams to define reliability requirements, establish service-level objectives (SLOs), and develop a strategic vision with defined action items that keep the team accountable.
  • Monitor system performance, conduct root cause analysis of incidents, implement and document solutions to prevent recurrence, identify bottlenecks, and proactively address issues to ensure high availability and reliability.
  • Design, implement, and maintain scalable and reliable infrastructure to support our applications and services.
  • Develop and maintain automation tools to streamline deployment, monitoring, and incident response processes.
  • Collaborate across the IT department, and especially with development teams, to ensure best practices for software development, testing, and deployment.
  • Conduct root cause analysis of incidents and implement corrective actions to prevent recurrence.
  • Continuously improve system reliability, performance, and scalability through monitoring, testing, and optimization.
  • Gather and analyze metrics from operating systems, logs, and applications to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Balance feature development speed and reliability with well-defined service-level objectives.

Benefits

  • Competitive package with medical, dental, and vision coverage
  • Life insurance
  • Paid vacation and holidays
  • 401(k) with company match
  • Employee Assistance Program with counseling, financial, legal, and life resources
  • Best You EDU, offering education programs and tuition reimbursement
  • Generous dining discounts at Chili’s® Grill & Bar and Maggiano’s Little Italy®
  • Annual bonus eligibility for every RSC Team Member
  • On-site gym and fitness classes like yoga and boot camp