Senior Cloud Site Reliability Engineer

Solace Corporation
Bangalore, IN
Hybrid

About The Position

Enterprise AI is moving from pilots to production, and the constraint is no longer the model; it's the data. Agents are only as good as what they can sense, trust, and act on in the moment, and real-time, event-driven data is becoming the foundation every serious AI system runs on. Solace is the leading platform for the enterprise AI era.

Established enterprises worldwide, including RBC Capital Markets, Bosch, Heineken, PSA Singapore, United Airlines, Schwarz Group, and hundreds more, have built their business around Solace to enable intelligent, real-time experiences, modernize their application and integration landscape, and create seamless digital journeys for their customers, partners, and employees. So, the next time you drive a car, order furniture online, fly in a plane, or check your bank balance on your phone, your positive experience could be a direct result of our technology and your hard work!

About the Role

This position is for a Senior Cloud Site Reliability Engineer. You will be responsible for the daily operations of Solace Cloud, our market-leading SaaS offering, across leading cloud providers and platforms such as Amazon Web Services, Microsoft Azure, Google Cloud Platform, and Kubernetes.

Requirements

  • Proven expertise with the services and features of public cloud providers (AWS, Azure, GCP)
  • Proven expertise with managed Kubernetes platforms such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), and Google Kubernetes Engine (GKE)
  • Hands-on experience with monitoring tools such as Datadog, Kibana, and Prometheus
  • Hands-on experience with infrastructure automation using tools such as Terraform and AWS CloudFormation
  • Hands-on expertise in debugging production alerts
  • Expert-level understanding of Linux Operating Systems
  • Programming experience in languages such as Groovy, Python, and Go
  • Certified Kubernetes Administrator
  • Certified Cloud Administrator (AWS, Azure, or GCP)

Nice To Haves

  • Highly technical, excited by technology, and eager to stay up to date in a rapidly evolving environment.
  • Expert-level knowledge in Cloud Networking Solutions
  • Demonstrated ability to debug at the system level and resolve incidents in complex cloud-based environments
  • Expert in site reliability engineering and incident response
  • A strong communicator who can articulate complex technical issues clearly and concisely and is comfortable getting on the phone with customers.
  • Experienced in SaaS operations and customer-facing technical support

Responsibilities

  • Ensure that Solace Cloud services are healthy and reliable, and that SLAs are being met
  • Design and implement our infrastructure tooling, observability, and automation
  • Contribute to making production operations more efficient and less error-prone
  • Handle production incidents in production-grade multi-cloud environments, applying expert-level knowledge of industry-standard incident management processes
  • Handle service requests and provisioning requests from customers.
  • Manage customer escalations and drive resolution in mission-critical, high-impact production environments
  • Work directly with customers to identify, troubleshoot, and resolve operational issues.
  • Apply expert Linux and Kubernetes debugging knowledge to detect operational issues.
  • Participate in the on-call rotation and provide 24x7 off-hours support

Benefits

  • Flexibility is built into how we work
  • Training programs are designed to help you level up, fast.