Reliability Engineer Jobs

About The Position

Our Cloud Operations team is seeking a Senior DevOps & Site Reliability Engineer to play a critical role in ensuring the reliability, performance, and scalability of our diverse SaaS applications. You are a problem-solver and an automator at heart. This specialized hybrid role bridges the gap between legacy VM-based architectures and modern cloud-native standards through aggressive automation and development-focused operations. Unlike a traditional SRE position, it is deeply integrated with the software development lifecycle and focuses on consolidating and optimizing platform operations. You will build the CI/CD frameworks, self-service tools, and AI-driven automation that let our engineering teams move faster while maintaining rock-solid stability. Your mission is to maximize the ROI of our existing infrastructure by "automating away" manual toil. The role includes on-call coverage on a weekly rotation.

Requirements

  • Must have a passion for lifelong learning.
  • 6+ years in DevOps or SRE roles, with a proven track record of bridging development and operations in complex cloud environments.
  • Extensive experience with Microsoft Azure (IaaS, PaaS, App Services, Networking) and/or Google Cloud Platform (GCP).
  • Expert-level PowerShell and Python skills.
  • Hands-on experience with Bicep or Terraform is required.
  • Strong background in Windows and Linux Server operating systems, Kubernetes (AKS/GKE), Helm, and container orchestration.
  • Familiarity with middleware and PaaS technologies such as Event Hubs, Service Bus, Cosmos DB, RabbitMQ, and MongoDB.
  • Expert-level troubleshooting and the ability to reason through complex process workflows to identify faults in large-scale platform environments.

Nice To Haves

  • Experience with Atlassian suite (Jira, Confluence, Bitbucket).
  • Experience with AI-driven log analysis or automated incident remediation.
  • Knowledge of database tuning (SQL Server, MySQL, MongoDB).
  • Familiarity with compliance standards (SOC2, HIPAA, GDPR).

Responsibilities

  • Identifying manual "toil" and replacing it with automated workflows for monitoring, change management, and routine administration of large-scale VM environments to ensure a positive ROI.
  • Leading the integration of AI tools for automated code reviews, development frameworks, and predictive log analysis to drive departmental velocity and efficiency.
  • Designing and maintaining "self-service" deployment frameworks and CI/CD pipelines (GitHub Actions, Bamboo) using Infrastructure as Code (Bicep, Terraform).
  • Evaluating platform components to determine the most cost-effective path: automating the current state or migrating features to modern, shared architectures.
  • Designing and maintaining a comprehensive observability stack across Azure and GCP (metrics, logs, traces) to identify performance bottlenecks and proactively address system defects.
  • Partnering with engineering, security, and operations teams to ensure new features are "born" with reliability, security, and automated delivery in mind, and ensuring adherence to security best practices and compliance standards (SOC2, HIPAA, ISO 27001) while balancing operational excellence with cost efficiency.
  • Investigating complex performance defects by following log trails across web, application, and database tiers (SQL Server, MongoDB, MySQL).
  • Ensuring all platforms meet security standards (SOC2, HIPAA, ISO 27001) through automated policy enforcement across Azure and GCP.

Benefits

  • Competitive salaries
  • Medical, dental, and vision coverage
  • Disability coverage
  • Employer-paid life insurance
  • Mental health resources
  • 401(k) plan
  • Fully paid parental leave program
  • Generous PTO
  • Flexible work schedules
  • Remote work opportunities
  • Paid company holidays
  • Appspace Quiet Fridays (no non-essential internal meetings scheduled)
  • A casual-dress work environment

Frequently Asked Questions

Common questions about Reliability Engineer careers and jobs.

Based on current job postings on Teal, the average Reliability Engineer salary in the US is approximately $164,000 per year, with a typical range of $68,000 to $244,000.
© 2026 Teal Labs, Inc