Site Reliability Engineer

OneStream Software
$114,000 - $148,000
Remote

About The Position

As a Site Reliability Engineer, you will focus on ensuring that the platform and services customers rely on are reliable, performant, and highly available. If you enjoy staying at the forefront of technology and automating infrastructure deployments, this is the job for you. This vital role within Cloud Services requires knowledge of and experience in designing, implementing, and monitoring scalable and secure cloud services. You are expected to work well in a small team and be willing to share responsibilities with other team members as needed. You will interact with internal staff, managers, and customers to implement and maintain operations. A passion for technology and learning and the ability to help others grow are vital for success in this role.

Requirements

  • BS/BA in computer science, engineering, or a technology-related field (or equivalent work experience).
  • Proven work experience as a Site Reliability Engineer or in a similar role.
  • 6+ years of cloud infrastructure and software development experience.
  • 2+ years of hands-on experience with Azure Kubernetes Service (AKS) and container-based deployments, or with comparable platforms such as OpenShift, GKE, or EKS.
  • Advanced understanding of APM and observability tools such as Dynatrace, Application Insights, Datadog, Log Analytics, New Relic, Prometheus, and Grafana.
  • Advanced understanding of Infrastructure-as-Code (IaC) concepts and tooling (Terraform, CloudFormation templates, Bicep or ARM templates) on Microsoft Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP).
  • Deep knowledge of Configuration Management/Orchestration utilities such as Ansible, PowerShell DSC, Chef, and Puppet.
  • Advanced understanding of cloud concepts including elasticity, security, and identity management.
  • Familiarity with Agile development methodologies using Jira or Azure DevOps Boards.
  • 6+ years of hands-on experience with the following technologies, tools, and concepts:
      • Automating processes using PowerShell, Bash, CLI, REST APIs, Python, ARM templates, or other scripting languages.
      • Source control tools such as Git, Azure DevOps, or GitHub.
      • Container orchestration platforms and tooling such as Kubernetes, OpenShift, AKS, GKE, or Helm.
      • Microsoft Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP).

Nice To Haves

  • Experience working for a cloud service provider (CSP), managed service provider (MSP), or SaaS provider.
  • 6+ years of relevant Azure experience deploying and managing infrastructure using Infrastructure-as-Code (IaC) concepts.
  • Experience with the Microsoft and .NET stack (.NET, C#, SQL).
  • Experience writing efficient and reliable code in a development environment.
  • Experience with Debian, Ubuntu, Alpine, or other Linux distributions.
  • Deep knowledge and understanding of containerized applications, with special attention to reliability and monitoring of those containerized applications.

Responsibilities

  • Implement application/infrastructure observability solutions to ensure desired application availability, reliability, and performance.
  • Participate in regular on-call rotations and share details of incidents and their resolution through post-mortem reports and regular review meetings.
  • Proactively partner with Product and Engineering teams to identify, develop, deploy, and maintain reliable systems and services.
  • Influence and create new designs, architectures, standards, and methods for large-scale systems.
  • Sustain a high level of reliability for key services and automated systems.
  • Automate processes to improve reliability, performance, and availability.
  • Update technical documentation, workflows, and knowledge base articles.
  • Provide feedback in pull requests and peer coding reviews.
  • Implement codified, automated solutions that integrate Dynatrace, Azure DevOps, and Jira.
  • Maintain solid knowledge in focused areas of OneStream Software.
  • Mentor others in several technical areas.
  • Understand the practical use of SOC/FedRAMP controls in order to assist the Compliance and Security teams.

Benefits

  • Vision
  • Medical
  • Life
  • Dental
  • 401K
  • Excellent Medical Plan.
  • Short & Long Term Disability.
  • Vacation Time.
  • Paid Holidays.
  • Professional Development.
  • Retirement Plan.