Cloud Data Platform Engineer

OmegaHires
Remote

About The Position

Cloud Data Platform Engineer / SRE (Azure | Databricks | Fabric | Unity Catalog)

Role Overview: We are seeking highly skilled Cloud Data Platform Engineers (4–5 openings) with deep hands-on expertise across Azure, Databricks, Microsoft Fabric, and Unity Catalog to strengthen our cloud data operations. These are hands-on engineering roles rather than coordination roles, requiring strong capabilities in infrastructure automation, FinOps, CI/CD, platform governance, environment management, and Site Reliability Engineering (SRE) practices. The ideal candidates will have proven experience managing enterprise-scale data platforms and keeping cloud environments secure, reliable, cost-efficient, and well-governed.

Requirements

  • 6–12 years in cloud data engineering, SRE, or platform engineering roles.
  • Strong hands-on expertise with:
      • Azure Data Services (ADLS, ADF, Synapse, Key Vault, VNets)
      • Azure Databricks (clusters, jobs, Delta Lake, DLT, Unity Catalog)
      • Microsoft Fabric (Lakehouse, Pipelines, Warehouse, Dataflows)
      • Unity Catalog governance (catalogs, schemas, access policies, lineage)
  • Strong scripting and automation experience: Python, PowerShell, Bash, SQL, PySpark.
  • Experience with Terraform/Bicep for IaC.
  • Strong knowledge of Azure DevOps or GitHub Actions CI/CD pipelines.
  • Proven FinOps experience with cost governance and optimization across cloud workloads.
  • Experience in SRE practices—SLIs, SLOs, operational readiness, automated recovery.

Nice To Haves

  • Certifications such as Azure Data Engineer, Azure DevOps Engineer, Databricks Data Engineer, or FinOps Practitioner.
  • Experience in highly regulated environments (BFSI, Healthcare, Retail).
  • Understanding of zero-trust security models.

Responsibilities

Cloud Platform Operations

  • Manage and optimize Azure workloads: ADLS, VNets, Key Vault, ADF, Synapse, Fabric, Databricks.
  • Configure and maintain Databricks clusters, jobs, DLT pipelines, Delta Lake storage, and Unity Catalog policies (a cluster-provisioning sketch follows this list).
  • Operationalize Fabric Lakehouses, Pipelines, Warehouses, and Semantic Models for production workloads.
  • Ensure robust platform governance across environments (DEV–QA–UAT–PROD).
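For illustration of the cluster work above, a minimal Python sketch that provisions an autoscaling, auto-terminating Databricks cluster through the Clusters REST API. The workspace host and token environment variables, cluster name, runtime version, and VM size are placeholder assumptions, not details from this posting.

```python
# Minimal sketch: create an autoscaling, auto-terminating Databricks cluster
# via the Clusters REST API. Host/token env vars, cluster name, runtime
# version, and node type are placeholder assumptions for illustration only.
import os
import requests

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-<id>.azuredatabricks.net
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]

cluster_spec = {
    "cluster_name": "etl-shared-autoscale",           # hypothetical name
    "spark_version": "15.4.x-scala2.12",              # assumed LTS runtime
    "node_type_id": "Standard_DS3_v2",                # assumed Azure VM size
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,                    # idle clusters shut down
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=cluster_spec,
    timeout=30,
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])
```

In practice a specification like this would usually live in Terraform or a cluster policy rather than an ad-hoc script, but the knobs involved (autoscaling bounds, auto-termination, node sizing) are the same.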
Infrastructure-as-Code & CI/CD

  • Build and maintain Terraform/Bicep templates for environment provisioning and configuration.
  • Develop end-to-end CI/CD pipelines for Databricks, Fabric, and Azure components (ADO/GitHub).
  • Automate deployment of notebooks, workflows, access policies, networking components, and Fabric artifacts (a deployment sketch follows this list).
  • Enforce version control, release governance, and quality gates.
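As a hedged example of the deployment automation above, a short Python step that an Azure DevOps or GitHub Actions pipeline could run to import a notebook into a workspace via the Databricks Workspace Import API. The file paths and environment variables are hypothetical.

```python
# Minimal CI/CD sketch: push a local source notebook into a Databricks
# workspace using the Workspace Import REST API. Paths and env vars are
# hypothetical placeholders.
import base64
import os
import requests

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]

def deploy_notebook(local_path: str, workspace_path: str) -> None:
    """Base64-encode a source notebook and import it, overwriting any old copy."""
    with open(local_path, "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/workspace/import",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json={
            "path": workspace_path,
            "format": "SOURCE",
            "language": "PYTHON",
            "content": content,
            "overwrite": True,
        },
        timeout=30,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    deploy_notebook("notebooks/bronze_ingest.py", "/Shared/etl/bronze_ingest")
```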
FinOps, Cost Management & Capacity Planning

  • Implement FinOps dashboards, alerts, budgets, and spend governance practices.
  • Perform Databricks and Fabric cost optimization—cluster sizing, autoscaling, idle management, job tuning.
  • Conduct capacity planning for compute, storage, Fabric engines, and Databricks workloads.
  • Develop cost-saving recommendations and automated consumption monitoring (a monitoring sketch follows this list).
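A minimal sketch of automated consumption monitoring, assuming the Databricks Clusters REST API: it lists clusters and flags any whose auto-termination is disabled or set beyond an assumed policy threshold, a common source of idle spend. The env vars and threshold are assumptions.

```python
# Minimal FinOps sketch: flag clusters with no (or very long) auto-termination,
# a common source of idle Databricks spend. Env vars and threshold are assumed.
import os
import requests

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]
MAX_IDLE_MINUTES = 60  # assumed policy threshold

resp = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

for cluster in resp.json().get("clusters", []):
    idle = cluster.get("autotermination_minutes", 0)  # 0 means never auto-terminates
    if idle == 0 or idle > MAX_IDLE_MINUTES:
        print(f"Review cluster '{cluster['cluster_name']}': "
              f"auto-termination = {idle} minutes")
```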
Environment Management, Security & Governance

  • Provision and manage Azure data environments with consistent policies and naming standards.
  • Configure RBAC, ACLs, Unity Catalog grants, service principals, network security, and Managed Identities (a grants sketch follows this list).
  • Implement governance standards for data access, lineage, audit logging, compliance, and risk mitigation.
  • Ensure secure connectivity using Private Endpoints, VNET integration, and enterprise IAM controls.
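A minimal governance sketch for the Unity Catalog grants mentioned above, assuming it runs as a Databricks notebook or job where a Unity Catalog-enabled `spark` session is already provided; the catalog, schema, and group names are hypothetical.

```python
# Minimal governance sketch, intended to run inside a Databricks notebook or
# job where a Unity Catalog-enabled `spark` session is provided. The catalog,
# schema, and group names below are hypothetical.
grants = [
    "GRANT USE CATALOG ON CATALOG main TO `data-readers`",
    "GRANT USE SCHEMA ON SCHEMA main.analytics TO `data-readers`",
    "GRANT SELECT ON SCHEMA main.analytics TO `data-readers`",
]

for statement in grants:
    spark.sql(statement)  # grants are additive, so re-running is safe

# Surface the effective grants as audit evidence
spark.sql("SHOW GRANTS ON SCHEMA main.analytics").show(truncate=False)
```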
Monitoring, Observability & Platform Reliability (SRE)

  • Implement monitoring and alerting using Azure Monitor, Log Analytics, Databricks Metrics, and Fabric Admin APIs (a query sketch follows this list).
  • Build runbooks, dashboards, and automated remediation workflows for platform reliability.
  • Conduct performance tuning of data workloads, Fabric pipelines, Databricks jobs, and storage layers.
  • Lead incident management, root‑cause analysis, and environment stabilization efforts.
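Finally, a minimal sketch of the SLI side of this work, assuming Azure Data Factory diagnostics are routed to a Log Analytics workspace: it computes a 24-hour pipeline success rate with the azure-monitor-query SDK. The ADFPipelineRun table, workspace ID variable, and SLO target are assumptions.

```python
# Minimal SRE sketch: compute a simple pipeline-success SLI from Log Analytics
# using the azure-monitor-query SDK. Assumes ADF diagnostics are routed to the
# resource-specific ADFPipelineRun table; workspace ID and SLO are assumptions.
import os
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

WORKSPACE_ID = os.environ["LOG_ANALYTICS_WORKSPACE_ID"]
SLO_TARGET = 0.99  # assumed success-rate objective

KQL = """
ADFPipelineRun
| where Status in ("Succeeded", "Failed")
| summarize failed = countif(Status == "Failed"), total = count()
"""

client = LogsQueryClient(DefaultAzureCredential())
result = client.query_workspace(WORKSPACE_ID, KQL, timespan=timedelta(hours=24))

if result.status == LogsQueryStatus.SUCCESS and result.tables and result.tables[0].rows:
    row = result.tables[0].rows[0]
    failed, total = int(row[0]), int(row[1])
    success_rate = 1 - (failed / total) if total else 1.0
    print(f"24h pipeline success rate: {success_rate:.2%}")
    if success_rate < SLO_TARGET:
        print("SLI below SLO target; trigger the incident runbook.")
```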