Site Reliability Engineering Technical Leader (Remote)

Cisco, Research Triangle Park, NC
$149,100 - $303,100, Remote

About The Position

The application window is expected to close on 06/30/2026. The job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.

We are the Data Center Network Services team within Cisco IT that supports network services for Cisco Engineering and business functions worldwide. Our mission is simple: build an adaptable, agile network of the future on Cisco’s networking solutions. Cisco IT networks are deployed, monitored, and managed with a DevOps approach to support rapid application changes. We invest in transformative technologies that enable us to deliver services quickly and reliably. The team culture is collaborative and fun, and creative thinking and tinkering with new ideas are encouraged.

You will be responsible for designing, developing, testing, and deploying advanced AI-driven software features for data center networks. You have strong interpersonal skills and are comfortable collaborating with fellow engineers, cross-functional engineering teams, and internal clients. You will create and implement innovative, high-quality capabilities to provide our clients with the best possible experience.

Requirements

  • Bachelor's degree in Engineering or Technology with 10+ years of experience designing and building scalable, reliable networking solutions for AI/ML infrastructure and high-performance computing.
  • Strong expertise in Cisco Data Center Networking technologies, ACI networks, and technologies such as Routing, Switching, Nexus, VPC, VDC, VLAN, VXLAN, and BGP.
  • Proven leadership in driving strategic automation initiatives, guiding teams in automation, and fostering continuous improvement and innovation to enhance service reliability and operational efficiency.
  • Experience managing networking for GPU cluster environments, implementing AI-based observability tools, and forecasting infrastructure needs for scaling AI workloads and managing hardware/software lifecycle.
  • Skilled in creating documentation and training materials and collaborating closely with Business Units to resolve hardware/software interoperability issues.
  • Proficiency in Terraform and Ansible for Infrastructure as Code (IaC).
  • Strong programming skills and a solid grasp of software engineering concepts, including common data structures and standard algorithms, object-oriented design, distributed computing, and cloud computing paradigms.
  • Expertise in AI Fabric and Networking with a deep understanding of high-performance networking for AI/ML workloads.
  • Ability to implement and utilize AI-based observability tools.

Nice To Haves

  • Understanding of Build & Release Operations, DevOps principles, and Agile practices with a focus on quality-driven development.
  • Familiarity with Unix/Linux environments and domain knowledge of contemporary network technologies, network management, and protocols.
  • Experience with application/platform instrumentation, measurement, log data processing, monitoring, and tools such as Jira, Git, and Jenkins.
  • Cisco certifications (CCNA or CCNP) and experience managing and troubleshooting Cisco Nexus Dashboard, APIC, Nexus Dashboard Fabric Controller, and VXLAN-based networks.

Responsibilities

  • Designing, developing, testing, and deploying advanced AI-driven software features for data center networks.
  • Creating and implementing innovative, high-quality capabilities to provide clients with the best possible experience.
  • Driving strategic automation initiatives.
  • Guiding teams in automation.
  • Fostering continuous improvement and innovation to enhance service reliability and operational efficiency.
  • Managing networking for GPU cluster environments.
  • Implementing AI-based observability tools.
  • Forecasting infrastructure needs for scaling AI workloads.
  • Managing hardware/software lifecycle.
  • Creating documentation and training materials.
  • Collaborating closely with Business Units to resolve hardware/software interoperability issues.

Benefits

  • medical, dental and vision insurance
  • a 401(k) plan with a Cisco matching contribution
  • paid parental leave
  • short- and long-term disability coverage
  • basic life insurance
  • Cisco restricted stock units
  • 10 paid holidays per full calendar year
  • 1 floating holiday for non-exempt employees
  • 1 paid day off for employee’s birthday
  • paid year-end holiday shutdown
  • 4 paid days off for personal wellness
  • 16 days of paid vacation time per full calendar year (non-exempt employees)
  • flexible vacation time off program (exempt employees)
  • 80 hours of sick time off provided on hire date and each January 1st thereafter
  • up to 80 hours of unused sick time carried forward
  • optional 10 paid days per full calendar year to volunteer
  • annual bonuses (for non-sales roles)
  • performance-based incentive pay (for sales roles)