Senior Azure Engineer (SRE/DevOps)

Capgemini•Chicago, IL

5h•Hybrid

About The Position

We’re seeking a Senior Azure Engineer with strong SRE and DevOps expertise to build, operate, and optimize cloud-native platforms on scale. You will own reliability, performance, security, cost, and developer productivity for critical workloads. You’ll collaborate closely with product, security, and operations teams to automate everything—provisioning, deployments, observability, incident response, and compliance—while supporting a rotational shift model to ensure 24x7 coverage.Your Role: Design, build, and maintain Azure landing zones and platform services (e.g., VNet, Private Endpoints, Key Vault, Azure Firewall/NSGs, Application Gateway/WAF). Implement Infrastructure as Code (IaC) with Terraform and/or Bicep; enforce GitOps workflows (branching, PRs, policy checks). Create reusable modules, pipelines, and golden patterns for app teams; champion automation-first approaches Define and measure SLIs/SLOs, error budgets, and reliability roadmaps for critical services. Implement and tune observability (logs, metrics, traces) using Azure Monitor, Log Analytics, Application Insights, and Prometheus/Grafana where applicable. Conduct capacity planning, resiliency testing (chaos, failover, DR), and performance tuning across services. Build secure, robust CI/CD pipelines (GitHub Actions / Azure DevOps Pipelines) with automated testing, scans, and approvals. Standardize deployment strategies (blue/green, canary, rolling) for containerized and PaaS workloads. Manage container platforms (AKS: node pools, cluster autoscaling, HPA/VPA, ingress, network policies) and registries (ACR). Implement guardrails using Azure Policy, RBAC, PIM, and Blueprints (or equivalent) to enforce least privilege and compliance (e.g., SOC 2, ISO 27001, HIPAA as relevant). Manage secrets and certificates (Key Vault) and integrate security testing (SAST/DAST/Container scanning) into pipelines. Support vulnerability remediation and patching SLAs. Own incident response, including rotational shifts and on-call; lead triage, root cause analysis (RCA), and post-incident reviews. Optimize cost (FinOps), tagging standards, budgets, and proactive spending alerts. Maintain runbooks, knowledge base articles, and automation for routine operations. Act as a technical mentor; review designs/PRs; contribute to architecture decisions. Partner with app teams to onboard workloads, define nonfunctional requirements, and drive platform adoption.

Requirements

7 years of hands-on experience with Azure-based infrastructure and services in production.
Deep expertise in several of: AKS, App Services, Functions, APIM, Azure SQL/MI, Cosmos DB, Storage, Event Hub/Service Bus, Redis, VNet/Peering, Private Link, Application Gateway/WAF, Front Door.
Strong IaC with Terraform (preferred) and/or Bicep; Git-based workflows; GitHub or Azure DevOps.
Proven SRE background: SLI/SLO design, error budgets, incident management, RCA, capacity and performance engineering.
CI/CD design and operations (GitHub Actions / Azure DevOps Pipelines); artifact/versioning strategies; release governance.
Observability with Azure: Monitor, Log Analytics, Application Insights, and alerting/automations (Action Groups, Logic Apps, Functions).
Solid networking fundamentals (DNS, TLS, routing, firewalls, load balancing), identity (AAD/Entra ID), and secrets management (Key Vault).
Scripting proficiency in PowerShell and/or Python; Linux fundamentals.
Understanding of security best practices: RBAC, PIM, Azure Policy, managed ide

Responsibilities

Design, build, and maintain Azure landing zones and platform services (e.g., VNet, Private Endpoints, Key Vault, Azure Firewall/NSGs, Application Gateway/WAF).
Implement Infrastructure as Code (IaC) with Terraform and/or Bicep; enforce GitOps workflows (branching, PRs, policy checks).
Create reusable modules, pipelines, and golden patterns for app teams; champion automation-first approaches
Define and measure SLIs/SLOs, error budgets, and reliability roadmaps for critical services.
Implement and tune observability (logs, metrics, traces) using Azure Monitor, Log Analytics, Application Insights, and Prometheus/Grafana where applicable.
Conduct capacity planning, resiliency testing (chaos, failover, DR), and performance tuning across services.
Build secure, robust CI/CD pipelines (GitHub Actions / Azure DevOps Pipelines) with automated testing, scans, and approvals.
Standardize deployment strategies (blue/green, canary, rolling) for containerized and PaaS workloads.
Manage container platforms (AKS: node pools, cluster autoscaling, HPA/VPA, ingress, network policies) and registries (ACR).
Implement guardrails using Azure Policy, RBAC, PIM, and Blueprints (or equivalent) to enforce least privilege and compliance (e.g., SOC 2, ISO 27001, HIPAA as relevant).
Manage secrets and certificates (Key Vault) and integrate security testing (SAST/DAST/Container scanning) into pipelines.
Support vulnerability remediation and patching SLAs.
Own incident response, including rotational shifts and on-call; lead triage, root cause analysis (RCA), and post-incident reviews.
Optimize cost (FinOps), tagging standards, budgets, and proactive spending alerts.
Maintain runbooks, knowledge base articles, and automation for routine operations.
Act as a technical mentor; review designs/PRs; contribute to architecture decisions.
Partner with app teams to onboard workloads, define nonfunctional requirements, and drive platform adoption.