Site Reliability Engineer

RingCentral • Denver, CO

About The Position

Say hello to opportunities. It’s not every day that you consider starting a new career. We’re RingCentral, and we’re happy that someone as talented as you is considering this role.

First, a little about us: we’re a $2 billion annual revenue company with double-digit Annual Recurring Revenue (ARR) and a $93 billion market opportunity in UCaaS, Contact Center, and AI-powered adjacencies. We invest more than $250 million annually to ensure our AI-enabled technology and platforms meet or exceed the needs of our customers. RingSense AI is our proprietary AI solution. It’s designed to fit the business needs of our customers, orchestrated to be accurate and precise, and built on the same open platform principles we apply to our core software solutions.

We are seeking a Site Reliability Engineer to help operate and improve the large-scale communications infrastructure that powers our SIP-based voice and media services. Our platform handles high volumes of real-time sessions globally across distributed infrastructure. You will work on an experienced team that operates across SIP signaling, media processing, networking, and infrastructure. The team deploys, monitors, and troubleshoots distributed services while contributing to the observability and automation that keep them reliable. The work is technically challenging, and contributions to tooling and process have a real impact on how the platform operates.

Requirements

Strong background administering UNIX/Linux systems and troubleshooting via the command line
Solid grasp of TCP/IP networking fundamentals including routing, NAT, load balancing, and container networking
Experience supporting SIP-based VoIP or real-time communications systems, with a strong understanding of SIP proxies, applications, session border controllers, and RTP media servers
Familiarity with Git-based version control systems (e.g. GitLab) and common repository workflows
Experience deploying services using GitOps automation (e.g. FluxCD, CI/CD)
Skilled in analyzing network traffic using Wireshark, tcpdump, or similar tools
Comfortable working with observability platforms including Prometheus, Grafana, Loki, and the ELK stack
Hands-on experience operating containerized platforms using Kubernetes, including interaction with the Kubernetes API, container registries, and Helm-based deployments
Working knowledge of persistent storage in Kubernetes (e.g. PVCs)
Experience operating infrastructure in AWS or GCP, including services such as S3, EC2, CloudFront, and certificate/secret management
Familiarity with change management processes and production change approval workflows
Experience automating operational tasks using Python or shell scripting

Nice To Haves

Knowledge of FoIP signaling flows, T.30/T.38 protocols, and related troubleshooting
Experience with message brokers, media routing, or session-state management in distributed systems
Operating large-scale distributed systems across multiple regions
Familiarity with hybrid cloud/bare-metal environments
Exposure to infrastructure automation tools such as Ansible or Terraform
Understanding of container networking and service exposure
Working knowledge of Kafka, Nginx, or similar distributed platform technologies
Experience managing systems designed to meet high availability targets (e.g. 99.999%+)
Operating within regulated or compliance-scoped environments (e.g. PCI-DSS)
Keeping current with modern AI tooling and using it safely and responsibly, without overreliance

Responsibilities

Operate and maintain Linux-based telephony and platform services in production
Troubleshoot SIP signaling and RTP media flows, including call routing, provisioning, registration, and signaling behavior
Diagnose issues across the network stack affecting real-time voice and media traffic, including analysis of packet and signaling flows
Deploy and administer Kubernetes services using Helm and GitOps workflows (e.g. CI/CD, FluxCD), including tracing and debugging configuration through layered rendering pipelines
Manage stateful and pinned workloads, including an understanding of Kubernetes scheduling primitives such as taints, tolerations, and node affinity
Monitor systems and participate in on-call incident response for production infrastructure
Implement production changes using testing, rollback planning, and risk mitigation practices
Contribute to automation, observability, and operational tooling improvements
Coordinate with infrastructure, network, storage, and platform teams to resolve cross-domain issues and maintain highly available global services

Benefits

Comprehensive medical, dental, vision, disability, and life insurance
Health Savings Account (HSA), Flexible Spending Accounts (FSAs), and commuter benefits
Voluntary supplemental health coverage and life insurance
401(k) match and ESPP
Paid time off and paid sick leave
Paid parental and pregnancy leave
Family-forming benefits (IVF, preservation, adoption, etc.)
Emergency backup care (child/adult/pet)
Employee Assistance Program (EAP) with counseling sessions available 24/7
Free legal services that provide legal advice, document creation and estate planning
Employee referral bonus program
Student loan refinancing assistance
Employee 1:1 coaching, perks and discounts program