Fabric Data Engineer — Workplace Engineering

Vanguard•Wayne, PA

Hybrid

About The Position

Vanguard is establishing Microsoft Fabric as the enterprise data and analytics foundation for its Workplace AI, Power BI, and cross-cloud analytics initiatives. This role is part of a strategic engagement with Microsoft to build that capability on F256 Reserved capacity, integrated with Vanguard's existing data, identity, and security infrastructure. It is a hands-on data engineering position that owns the data layer of this foundation: designing and implementing scalable data products in OneLake, including lakehouses, warehouses, pipelines, notebooks, and Delta tables ready for semantic models. The engineer is responsible for the lifecycle, governance, and operational health of the Fabric platform and collaborates closely with AI Engineers, Technical Project Managers, Cloud Domain Architects, and the Microsoft CDAO Fabric Enablement team. This is a builder role, not a support position, focused on strategic engineering and implementation within the emerging Workplace AI Fusion Team.

Requirements

8+ years of professional software/data/platform engineering experience.
5+ years building production data solutions on the Microsoft and/or Azure data stack.
Hands-on production experience with at least three of: Microsoft Fabric (Lakehouse, Warehouse, Pipelines, Notebooks, Real-Time Intelligence), Azure Synapse, Azure Data Factory, Databricks, Power BI semantic models, Azure SQL/SQL Server.
Strong skills in SQL, PySpark, and KQL.
Demonstrable experience designing and shipping CI/CD for data platforms: Git workflows, automated deployment, environment promotion, secret-less authentication, and infrastructure-as-code.
Working knowledge of Terraform (preferred) or Bicep for cloud platform automation, including provider versioning, state management, and policy-as-code patterns.
Experience implementing security and compliance controls in a regulated environment: Purview, Sentinel, Defender, Conditional Access, MIP, DLP, RBAC, RLS/CLS/OLS, dynamic data masking.
Identity fluency with Entra ID (Azure AD) and federated IdPs (Okta preferred); experience with service principals, managed identities, and Workload Identity Federation.
Experience working in financial services, healthcare, or another heavily regulated environment, or a credible plan to come up to speed quickly.
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

Nice To Haves

DP-700 (Microsoft Certified: Fabric Data Engineer Associate), either held at hire or in progress and completed within 6 months of hire; DP-600 (Fabric Analytics Engineer Associate) and AZ-305 (Azure Solutions Architect Expert) preferred.
Hands-on experience with the Microsoft fabric-cicd Python library and the microsoft/fabric Terraform provider.
Experience operating a Fabric Center of Excellence, Power BI CoE, or comparable data-platform CoE.
Experience with cross-cloud data integration patterns (OneLake ↔ AWS S3, mirroring, shortcuts) and BCDR for analytics platforms at enterprise scale.
Experience configuring Prep for AI on semantic models and partnering with AI/agent engineers on certified data-product handoffs.
Background contributing to internal communities of practice, champions networks, or developer enablement programs.
Prior experience as a hands-on engineer in a Fusion Team or Data/AI Center of Excellence model.
Additional vendor certifications welcomed but not required: AZ-204, SC-100, DP-203.

Responsibilities

Design and implement scalable data storage in OneLake using Lakehouses (Delta) and Warehouses (T-SQL), selecting appropriate items for workloads and configuring SQL analytics endpoints, shortcuts, and OneLake security.
Build and maintain Spark notebooks (PySpark), Data Factory pipelines, Dataflows Gen2, Copy Jobs, and mirroring for enterprise-scale batch and incremental data ingestion.
Develop Real-Time Intelligence solutions including Eventstreams, Eventhouses/KQL databases, Activator reflexes, and Spark Structured Streaming for low-latency workloads.
Optimize Lakehouse tables (e.g., OPTIMIZE, V-Order, Z-Order, partitioning) and the datasets that feed Direct Lake semantic models to ensure predictable performance for downstream Power BI reports and AI agents.
Implement source control, branching, and CI/CD using native Fabric Git integration (Azure DevOps and GitHub), Fabric Deployment Pipelines, and the Microsoft fabric-cicd Python library.
Automate Dev/Test/Prod promotion using the Fabric REST API, service principals, and Workload Identity Federation, codifying environment-aware bindings via Variable Libraries and parameter.yml.
Operate a Feature → Dev → UAT → Prod branching pattern with mandatory PR reviews and cherry-pick promotion, using a single repo per team to limit blast radius.
Manage the lifecycle of Fabric data components from creation to retirement, ensuring every environment can be reproduced from the GitHub pipeline.
Operate the Fabric F256 capacity, monitoring CU consumption, managing smoothing windows, diagnosing throttling, and right-sizing workloads.
Build telemetry using the Monitoring Hub, per-workspace Workspace Monitoring, Eventhouse monitoring, and the Admin Monitoring Workspace to track refresh failures, pipeline errors, and semantic-model health.
Define dashboards and alerts for ingestion, transformation, refresh, and capacity health, driving root-cause analysis for production incidents and incorporating lessons into platform standards.
Define and operate the on-call model for production data pipelines and Fabric items in partnership with Tier 3 Engineering.
Define and enforce Fabric platform standards through Infrastructure-as-Code (IaC) using Terraform and the official microsoft/fabric provider, workspace templates, naming/tagging conventions, and automated CI policy checks.
Manage tenant settings, domains, and capacity allocation in partnership with the Fabric Center of Excellence, aligning identity with Entra ID and Okta federation, rotating service principal credentials, and using PIM for elevated admin roles.
Implement RBAC patterns separating control-plane and data-plane roles, and operate RLS, CLS, OLS, dynamic data masking, and item-level sharing.
Integrate Microsoft Purview for sensitivity labels, DLP, metadata scanning, lineage, and impact analysis, managing endorsement for trusted datasets.
Build cross-cloud integration patterns using shortcuts and mirroring (e.g., Direct Lake over OneLake shortcuts to AWS S3, Mirrored Databases).
Publish governed, AI-ready data products with Prep for AI configured on semantic models for use by Fabric Data Agents, Copilot Studio, and Azure AI Foundry.
Coordinate with Data, Cloud, Identity, and Security domain teams on data-sharing patterns, private link configuration, and on-prem data gateway operations.
Serve as Tier 3 escalation for complex Fabric, OneLake, pipeline, capacity, and Direct Lake issues.
Provide technical consultation to teams onboarding workloads to Fabric.
Build reusable patterns, reference implementations, and internal playbooks for ingestion, modeling, deployment, and capacity operations.
Lead proof-of-concept work for new Fabric capabilities.
Partner with the Microsoft CDAO Fabric Enablement engagement to provide product roadmap insights.
Contribute to the Workplace AI and enterprise Data roadmap and operating model, and partner with champions to drive adoption.