Vice President, Data Engineering

Ares Management Corporation, New York, NY

About The Position

Over the last 20 years, Ares’ success has been driven by our people and our culture. Today, our team is guided by our core values – Collaborative, Responsible, Entrepreneurial, Self-Aware, Trustworthy – and our purpose to be a catalyst for shared prosperity and a better future. Through our recruitment, career development and employee-focused programming, we are committed to fostering a welcoming and inclusive work environment where high-performance talent of diverse backgrounds, experiences, and perspectives can build careers within this exciting and growing industry.

Position Overview

We seek a VP Data Engineer to own critical data pipelines and establish architectural patterns for our Databricks-on-Azure data platform. This is an opportunity to architect scalable ETL/ELT patterns, establish best practices for Databricks/Spark/Delta Lake development, and design systems that handle both structured and unstructured data at scale. You will work closely with the Principal Head of Data Engineering and VP Staff Data Engineer to build pipelines that power AI-ready infrastructure. Your code and patterns become the template for how the team builds data systems.

Requirements

  • 6-9 years of data engineering experience with 2+ years at senior level or equivalent complexity
  • Expert-level proficiency in Databricks/Delta Lake: notebook development, SQL, Spark, performance tuning
  • Advanced SQL expertise: complex joins, window functions, CTEs, query optimization
  • Strong Python proficiency: PySpark, pandas, data validation libraries
  • Proven experience building ETL/ELT pipelines at scale (100GB+ datasets, multi-source ingestion)
  • Deep understanding of Delta Lake: transactions, ACID properties, schema evolution, merge operations
  • Experience with Azure cloud services: ADLS, Azure SQL, Event Hubs, blob storage, Azure Key Vault
  • Demonstrated experience with document and unstructured data processing
  • Experience with data orchestration tools (Prefect, Airflow, Databricks Workflows) and building robust error handling
  • Ability to mentor other engineers and lead by example
  • Comfort with greenfield projects and establishing best practices from scratch
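To make the SQL expectations above (complex joins, window functions, CTEs) concrete, here is a minimal, illustrative sketch of a common interview-style pattern: deduplicating records with a CTE plus `ROW_NUMBER()`. It uses Python's built-in `sqlite3` driver only so the snippet is self-contained; the `trades` table and its columns are invented for this example.

```python
import sqlite3

# Illustrative only: keep the latest row per trade_id using a CTE
# and the ROW_NUMBER() window function (requires SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (trade_id INTEGER, amount REAL, loaded_at TEXT);
INSERT INTO trades VALUES
  (1, 100.0, '2024-01-01'),
  (1, 110.0, '2024-01-02'),  -- later version of trade 1
  (2, 250.0, '2024-01-01');
""")

rows = conn.execute("""
WITH ranked AS (
  SELECT trade_id, amount,
         ROW_NUMBER() OVER (PARTITION BY trade_id
                            ORDER BY loaded_at DESC) AS rn
  FROM trades
)
SELECT trade_id, amount FROM ranked WHERE rn = 1 ORDER BY trade_id
""").fetchall()
print(rows)  # [(1, 110.0), (2, 250.0)]
```

The same CTE-plus-window-function shape carries over directly to Spark SQL on Databricks.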

Nice To Haves

  • Production experience with Databricks Unity Catalog and governance features
  • Experience with Databricks SQL and serverless compute
  • Hands-on experience with document extraction: PDFs, forms, OCR, table extraction
  • Familiarity with Azure AI Services: Form Recognizer, Document Intelligence, Cognitive Search
  • Experience with NLP libraries (spaCy, NLTK) and text preprocessing at scale
  • Experience in financial services or PE environments
  • Familiarity with dbt for transformation orchestration
  • Databricks certifications or demonstrated expertise
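The text-preprocessing work mentioned above typically includes chunking documents before embedding or LLM ingestion. As a rough sketch of that step, here is a simple overlapping character-window chunker; `chunk_text` and its parameters are invented for illustration (production pipelines usually chunk on token or sentence boundaries instead).

```python
def chunk_text(text: str, size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size character windows with overlap.

    Overlap preserves context across chunk boundaries, a common
    preprocessing step before embedding or LLM consumption.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_text("The quick brown fox jumps over the lazy dog",
                    size=20, overlap=5)
print(len(chunks))  # 3
```

Each chunk's last five characters repeat as the next chunk's first five, so no boundary context is lost.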

Responsibilities

Databricks-on-Azure Architecture & Optimization

  • Design and build complex Spark SQL and Python-based ETL/ELT pipelines in Databricks that handle large-scale data transformations
  • Master Delta Lake architecture: table design, partitioning strategies, file organization, Z-ordering for query optimization
  • Optimize Databricks cluster configurations: choose between interactive, job, and serverless compute based on workload; tune executor memory, shuffle partitions, and parallelism
  • Implement cost-efficient patterns: predicate pushdown, broadcast joins, caching strategies; right-size clusters and use spot instances for non-critical jobs
  • Design data quality frameworks within Databricks: schema validation, null handling, duplicate detection, completeness checks

Azure Integration & Data Movement

  • Design ADLS (Azure Data Lake Storage) layouts: bronze/silver/gold medallion architecture, folder structures, retention policies
  • Optimize Azure data movement: leverage Delta Live Tables (DLT), Databricks SQL, and managed ingest patterns
  • Design Databricks workspace integration: configure Azure AD authentication, scoped API tokens, cluster policies
  • Implement data governance via Databricks Unity Catalog: manage catalogs, schemas, table ACLs, lineage tracking

Document & Unstructured Data Processing

  • Design pipelines for document processing: ingest PDFs, Word docs, and other formats from blob storage into Databricks
  • Implement text extraction pipelines: use Databricks-native libraries and Azure AI Services (Form Recognizer, Document Intelligence)
  • Build structured extraction from unstructured data: extract financial tables, key metrics, and entities from deal documents and financial statements
  • Build preprocessing pipelines for NLP and LLM consumption: tokenization, chunking, metadata extraction, quality scoring
  • Manage document versioning and lineage: track source documents, extraction versions, and quality metrics

Core Pipeline Development

  • Own end-to-end design and implementation of critical pipelines supporting investment teams, ops/finance, and client/IR teams
  • Establish patterns for error handling, logging, and monitoring within Databricks jobs
  • Implement idempotent pipeline design: support re-runs, backfills, and late-arriving data
  • Design incremental data loading: leverage Delta Lake's merge operations, CDC patterns, and change tracking
  • Partner on schema design and dimensional modeling of enterprise data sets

Mentorship & Standards

  • Mentor junior and mid-level engineers through code review, pair programming, and design guidance
  • Establish Databricks/Spark best practices: naming conventions, notebook organization, cluster policies, testing patterns
  • Create reusable libraries and utilities: custom Spark functions, data quality frameworks, common transformations
  • Own code quality; your code is the reference for how the team builds
  • Document patterns and best practices; maintain an internal Confluence/knowledge base

Troubleshooting & Optimization

  • Debug complex Spark issues: shuffle spills, out-of-memory errors, performance bottlenecks
  • Optimize Databricks query performance: analyze execution plans, identify skew, apply optimization techniques
  • Manage cluster costs and performance: monitor job execution, identify inefficiencies, recommend cluster right-sizing
  • Lead postmortems and troubleshooting sessions for production issues
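The incremental-loading and idempotency responsibilities above both revolve around upsert (merge) semantics: matched keys update, unmatched keys insert, and re-running the same batch must not duplicate rows. Delta Lake's actual `MERGE INTO` runs on Spark; the dict-based `merge_batch` below is an invented, pure-Python stand-in that illustrates only the pattern.

```python
def merge_batch(target: dict, batch: list[dict], key: str = "id") -> dict:
    """Upsert each batch record into target, keyed by `key`.

    Mirrors the shape of a Delta Lake MERGE: matched keys are updated,
    unmatched keys are inserted. The result depends only on the final
    state per key, so re-running the same batch is a no-op (idempotent).
    """
    for record in batch:
        target[record[key]] = record  # update if present, insert if not
    return target

table = {1: {"id": 1, "value": "a"}}
batch = [{"id": 1, "value": "a2"}, {"id": 2, "value": "b"}]

merge_batch(table, batch)
merge_batch(table, batch)  # re-run: same final state, no duplicates
print(sorted(table))  # [1, 2]
```

The same key-based convergence property is what makes safe backfills and late-arriving-data handling possible in the real pipelines.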

Benefits

  • Comprehensive Medical/Rx, Dental and Vision plans
  • 401(k) program with company match
  • Flexible Savings Accounts (FSA)
  • Healthcare Savings Accounts (HSA) with company contribution
  • Basic and Voluntary Life Insurance
  • Long-Term Disability (LTD) and Short-Term Disability (STD) insurance
  • Employee Assistance Program (EAP)
  • Commuter Benefits plan for parking and transit
  • Access to a world-class medical advisory team
  • A mental health app that includes coaching, therapy and psychiatry
  • A mindfulness and wellbeing app
  • Financial wellness benefit that includes access to a financial advisor
  • New parent leave
  • Reproductive and adoption assistance
  • Emergency backup care
  • Matching gift program
  • Education sponsorship program