Site Reliability Engineer

Cisco•Research Triangle Park, NC

11d•$137,000 - $277,600•Onsite

About The Position

The application window is expected to close on 06/15/2026. The job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.

Meet the Team

We are an agile team inside Cisco IT, building the next-generation NoSQL and Vector Databases on cloud platforms that will be used by all of Cisco as we move to cloud-native applications. This is a small team of highly motivated individuals using Agile Scrum methodology. Our team is responsible for building and operating Hybrid Cloud Database services in a DevSecOps model. We move at a fast pace and are passionate about cloud and automation. We have a history of building clouds at a large scale and are looking for someone who is as passionate about it as we are.

Your Impact

In this role, you will ensure production services are scalable, resilient, high-performing, and secure. You'll support uptime through an On-Call rotation, monitoring, and alerting to meet SLOs and SLAs. You'll strengthen reliability by conducting Disaster Recovery drills and managing incidents: investigating root causes, applying remediation, and driving continuous improvement. You'll define reliability and security requirements for systems and components to meet company, customer, and regulatory objectives. You'll enhance operational efficiency by automating repetitive tasks and mitigating failure points. You'll also develop tools and techniques for early detection of issues in products, packaging, processes, and product reliability.

You'll serve as an experienced professional resource, independently applying best practices and business knowledge to improve products or services while guiding and supporting less experienced colleagues. You'll understand project and/or department needs and establish relationships with appropriate cross-functional partners to gather input, collect information, and complete work steps.

You'll design and deploy small to mid-size or moderately complex solutions to optimize reliability, availability, latency, and performance. You'll build automated platforms and apply design, deployment, and coding expertise to enhance reliability, scalability, and velocity, and you'll design and test high availability and disaster recovery measures across regions and customers. You'll forecast and build reports to determine at what point resources will be at capacity. You'll design and implement tools to monitor and provide transparency into the performance and reliability of our infrastructure, collaborate with Developers and Ops to identify issues, serve as on-call SRE, and lead post mortems and root cause analyses. You'll build security controls into architectural design and ensure they are in place, collaborate with security in designing or reviewing security controls, and may actively contribute to security incident response.

Requirements

Bachelor’s degree in Computer Science or a related field
5+ years of technical expertise with cloud databases and experience with Vector databases (such as Pinecone, Weaviate, or Milvus) and/or with at least two of the following: PostgreSQL, MySQL, or MongoDB
Experience with AI frameworks (OpenAI API, LangChain, etc.)
Experience with designing, administering, and maintaining Vector DB or Cloud DB architecture, including provisioning, upgrades, operations, backups, security, and performance
Experience with CI/CD frameworks and tools such as Git/GitHub and Jenkins
Experience automating DB tasks using Python, and with Database Lifecycle Management

Nice To Haves

Experience with public clouds such as AWS, GCP, or Azure, or container technologies such as Kubernetes and OpenShift
Backend experience with Python or other programming skills
Demonstrated experience in building scalable databases on hybrid cloud infrastructure

Responsibilities

Ensure production services are scalable, resilient, high-performing, and secure.
Support uptime through an On-Call rotation, monitoring, and alerting to meet SLOs and SLAs.
Conduct Disaster Recovery drills and manage incidents—investigating root causes, applying remediation, and driving continuous improvement.
Define reliability and security requirements for systems and components to meet company, customer, and regulatory objectives.
Automate repetitive tasks and mitigate failure points.
Develop tools and techniques for early detection of issues in products, packaging, processes, and product reliability.
Serve as an experienced professional resource, independently applying best practices and business knowledge to improve products or services while guiding and supporting less experienced colleagues.
Understand project and/or department needs and establish relationships with appropriate cross-functional partners to gather input, collect information, and complete work steps.
Design and deploy small to mid-size or moderately complex solutions to optimize reliability, availability, latency, and performance.
Build automated platforms and apply design, deployment, and coding expertise to enhance reliability, scalability, and velocity; design and test high availability and disaster recovery measures across regions and customers.
Forecast and build reports to determine at what point resources will be at capacity.
Design and implement tools to monitor and provide transparency into the performance and reliability of our infrastructure; collaborate with Developers and Ops to identify issues, serve as on-call SRE, and lead post mortems and root cause analyses.
Build security controls into architectural design and ensure they are in place, collaborate with security in designing or reviewing security controls, and, as needed, actively contribute to security incident response.

Benefits

medical, dental and vision insurance
a 401(k) plan with a Cisco matching contribution
paid parental leave
short- and long-term disability coverage
basic life insurance
Cisco restricted stock units
10 paid holidays per full calendar year
1 floating holiday for non-exempt employees
1 paid day off for employee’s birthday
paid year-end holiday shutdown
4 paid days off for personal wellness
16 days of paid vacation time per full calendar year (non-exempt employees)
flexible vacation time off program (exempt employees)
80 hours of sick time off provided on hire date and each January 1st thereafter
up to 80 hours of unused sick time carried forward from one calendar year to the next
additional paid time away may be requested to deal with critical or emergency issues for family members
optional 10 paid days per full calendar year to volunteer
annual bonuses (for non-sales roles)
performance-based incentive pay (for sales roles)