We’re seeking a Senior Azure Engineer with strong SRE and DevOps expertise to build, operate, and optimize cloud-native platforms at scale. You will own reliability, performance, security, cost, and developer productivity for critical workloads. You’ll collaborate closely with product, security, and operations teams to automate everything (provisioning, deployments, observability, incident response, and compliance) while supporting a rotational shift model to ensure 24x7 coverage.
Your Role:
Design, build, and maintain Azure landing zones and platform services (e.g., VNet, Private Endpoints, Key Vault, Azure Firewall/NSGs, Application Gateway/WAF).
Implement Infrastructure as Code (IaC) with Terraform and/or Bicep; enforce GitOps workflows (branching, PRs, policy checks).
Create reusable modules, pipelines, and golden patterns for app teams; champion automation-first approaches.
Define and measure SLIs/SLOs, error budgets, and reliability roadmaps for critical services.
Implement and tune observability (logs, metrics, traces) using Azure Monitor, Log Analytics, Application Insights, and Prometheus/Grafana where applicable.
Conduct capacity planning, resiliency testing (chaos, failover, DR), and performance tuning across services.
Build secure, robust CI/CD pipelines (GitHub Actions / Azure DevOps Pipelines) with automated testing, scans, and approvals.
Standardize deployment strategies (blue/green, canary, rolling) for containerized and PaaS workloads.
Manage container platforms (AKS: node pools, cluster autoscaling, HPA/VPA, ingress, network policies) and registries (ACR).
Implement guardrails using Azure Policy, RBAC, PIM, and Blueprints (or equivalent) to enforce least privilege and compliance (e.g., SOC 2, ISO 27001, HIPAA as relevant).
Manage secrets and certificates (Key Vault) and integrate security testing (SAST/DAST/container scanning) into pipelines.
Support vulnerability remediation and patching SLAs.
Own incident response, including rotational shifts and on-call; lead triage, root cause analysis (RCA), and post-incident reviews.
Optimize cost (FinOps) with tagging standards, budgets, and proactive spending alerts.
Maintain runbooks, knowledge base articles, and automation for routine operations.
Act as a technical mentor; review designs/PRs; contribute to architecture decisions.
Partner with app teams to onboard workloads, define nonfunctional requirements, and drive platform adoption.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed