About The Position

As a Senior SRE, you will be a key contributor to the design, automation, and reliability of our cloud infrastructure. You will lead efforts to build and maintain robust systems on Azure, with a strong focus on managing Azure Kubernetes Service (AKS) clusters. This role requires a deep understanding of cloud-native technologies, infrastructure as code, and CI/CD practices. You will collaborate with cross-functional teams to ensure seamless deployments, high system availability, and secure operations across our environments.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience).
  • 6–8 years of experience in DevOps, SRE, or Cloud Infrastructure roles.
  • Proven experience managing AKS clusters in production environments.
  • Hands-on experience with Azure cloud services, GitHub workflows, and infrastructure automation.
  • Strong understanding of networking, security, and monitoring in cloud-native environments.
  • Excellent communication skills and ability to work effectively in a collaborative team setting.

Nice To Haves

  • Certifications such as Azure (e.g., AZ-400, AZ-104), Terraform, or CKA.
  • Experience in high-availability and disaster recovery planning.
  • Familiarity with Agile/Scrum methodologies.
  • Experience with cost optimization in cloud environments.

Responsibilities

  • Design, implement, and manage scalable infrastructure on Microsoft Azure.
  • Manage and maintain AKS clusters, including provisioning, scaling, monitoring, and troubleshooting.
  • Automate infrastructure provisioning and deployments using Terraform, Helm, and Argo.
  • Build and maintain CI/CD pipelines using GitHub Actions integrated with Azure services.
  • Containerize applications using Docker and deploy them to AKS.
  • Monitor system performance and reliability using Grafana and other observability tools.
  • Collaborate with Information Security teams to ensure secure and compliant infrastructure.
  • Optimize networking configurations and troubleshoot connectivity issues across cloud environments.
  • Manage artifact repositories and package management using JFrog Artifactory.
  • Work with caching and data stores such as Redis to enhance application performance.
  • Collaborate with software engineers to improve deployment processes and system architecture.
  • Document infrastructure and operational procedures for knowledge sharing and compliance.