About The Position

Our Cloud Operations team is seeking a Senior DevOps & Site Reliability Engineer who will play a critical role in ensuring the reliability, performance, and scalability of our diverse SaaS applications. You are a problem-solver and an automator at heart. This role is a specialized hybrid, bridging the gap between legacy VM-based architectures and modern cloud-native standards through aggressive automation and development-focused operations. Unlike a traditional SRE role, this position is deeply integrated with the software development lifecycle and focuses on consolidating and optimizing platform operations.

You will be responsible for building the CI/CD frameworks, self-service tools, and AI-driven automation that allow our engineering teams to move faster while maintaining rock-solid stability. Your mission is to maximize the ROI of our existing infrastructure by "automating away" manual toil. On-call coverage is required on a weekly rotation.

In this role, you will be the technical anchor for a global platform footprint that includes a mix of Azure IaaS/PaaS, Google Cloud Platform (GCP), Kubernetes, and various data platforms.

Your day will consist of:

  • Intelligent Automation & DevOps: Identifying manual "toil" and replacing it with automated workflows for monitoring, change management, and routine administration of large-scale VM environments to ensure a positive ROI.
  • AI-Enhanced Operations: Leading the integration of AI tools for automated code reviews, development frameworks, and predictive log analysis to drive departmental velocity and efficiency.
  • Scalable CI/CD & Provisioning: Designing and maintaining "self-service" deployment frameworks and CI/CD pipelines (GitHub Actions, Bamboo) using Infrastructure as Code (Bicep, Terraform).
  • Strategic ROI Projects: Evaluating platform components to determine the most cost-effective path: automating the current state or migrating features to modern, shared architectures.
  • Unified Observability: Designing and maintaining a comprehensive observability stack across Azure and GCP (metrics, logs, traces) to identify performance bottlenecks and proactively address system defects.
  • Cross-Functional Collaboration: Partnering with engineering, security, and operations teams to ensure new features are "born" with reliability, security, and automated delivery in mind; ensuring adherence to security best practices and compliance standards (SOC2, HIPAA, ISO 27001) while maintaining operational excellence and cost efficiency.
  • Root Cause Analysis & Forensics: Investigating complex performance defects by following log trails across web, application, and database tiers (SQL Server, MongoDB, MySQL).
  • Governance & Security: Ensuring all platforms meet security standards (SOC2, HIPAA, ISO 27001) through automated policy enforcement across Azure and GCP.

Requirements

  • Must have a passion for life-long learning.
  • 6+ years in DevOps or SRE roles, with a proven track record of bridging development and operations in complex cloud environments.
  • Extensive experience with Microsoft Azure (IaaS, PaaS, App Services, Networking) and/or Google Cloud Platform (GCP).
  • Expert-level PowerShell and Python skills.
  • Hands-on experience with Bicep or Terraform is required.
  • Strong background in Windows/Linux Server OS, Kubernetes (AKS/GKE), Helm, and container orchestration.
  • Familiarity with various middleware and PaaS technologies (e.g., Event Hub, Service Bus, CosmosDB, RabbitMQ, MongoDB).
  • Expert-level troubleshooting and the ability to reason through complex process workflows to identify faults in large-scale platform environments.

Nice To Haves

  • Experience with Atlassian suite (Jira, Confluence, Bitbucket).
  • Experience with AI-driven log analysis or automated incident remediation.
  • Knowledge of database tuning (SQL Server, MySQL, MongoDB).
  • Familiarity with compliance standards (SOC2, HIPAA, GDPR).

Responsibilities

  • Play a critical role in ensuring the reliability, performance, and scalability of our diverse SaaS applications.
  • Bridge the gap between legacy VM-based architectures and modern cloud-native standards through aggressive automation and development-focused operations.
  • Build CI/CD frameworks, self-service tools, and AI-driven automation.
  • Maximize the ROI of our existing infrastructure by "automating away" manual toil.
  • Serve as the technical anchor for a global platform footprint including Azure IaaS/PaaS, Google Cloud Platform (GCP), Kubernetes, and various data platforms.
  • Identify manual "toil" and replace it with automated workflows for monitoring, change management, and routine administration of large-scale VM environments.
  • Lead the integration of AI tools for automated code reviews, development frameworks, and predictive log analysis.
  • Design and maintain "self-service" deployment frameworks and CI/CD pipelines (GitHub Actions, Bamboo) using Infrastructure as Code (Bicep, Terraform).
  • Evaluate platform components to determine the most cost-effective path: automating the current state or migrating features to modern, shared architectures.
  • Design and maintain a comprehensive observability stack across Azure and GCP (metrics, logs, traces).
  • Partner with engineering, security, and operations teams to ensure new features are "born" with reliability, security, and automated delivery in mind.
  • Ensure adherence to security best practices and compliance standards (SOC2, HIPAA, ISO 27001) while maintaining operational excellence and cost efficiency.
  • Investigate complex performance defects by following log trails across web, application, and database tiers (SQL Server, MongoDB, MySQL).
  • Ensure all platforms meet security standards (SOC2, HIPAA, ISO 27001) through automated policy enforcement across Azure and GCP.

Benefits

  • Competitive salaries
  • Medical, dental, and vision coverage
  • Disability coverage
  • Employer-paid life insurance
  • Mental health resources
  • 401(k) plan
  • Fully paid parental leave program
  • Generous PTO
  • Flexible work schedules
  • Remote work opportunities
  • Paid company holidays
  • Appspace Quiet Fridays (no non-essential internal meetings scheduled)
  • A casual dress work environment