Data Engineer (Databricks), Assistant Vice President

State Street
Boston, MA
$110,000 - $177,500

About The Position

We are seeking a Data Engineer to design, build, and support a modern Legal Data Lakehouse platform on AWS and Databricks. This role will focus on developing scalable, high‑performance data pipelines and enabling trusted, governed data capabilities supporting legal operations, compliance analytics, reporting, and AI/ML use cases. The ideal candidate brings strong hands-on experience in Databricks, AWS data platforms, and enterprise data engineering practices, with experience delivering solutions in regulated environments aligned to security, compliance, and audit requirements.

State Street's Legal, Security, and Compliance functions operate across contracts, matters, regulatory obligations, documents, and workflows distributed across multiple systems. Building robust data pipelines ensures data is accessible, reliable, and usable for analytics and decision-making. This role is critical to enabling a scalable and governed data foundation supporting legal analytics and operational reporting while strengthening data quality, auditability, and consistency.

Requirements

  • 8+ years of experience in Data Engineering or data platform development
  • Strong hands-on experience with Databricks and Apache Spark
  • Proficiency in PySpark, Python, and SQL
  • Experience with AWS data platform services, including S3, Glue, Lambda, and IAM
  • Experience working with Delta Lake and lakehouse architecture
  • Solid understanding of distributed data processing, ETL/ELT frameworks, and data modeling techniques

Nice To Haves

  • Hands-on experience with Databricks platform components (Delta Lake, Workflows, Unity Catalog)
  • Strong experience building end-to-end data pipelines (batch and streaming) using AWS and Databricks
  • Familiarity with performance optimization techniques in Spark and Delta Lake
  • Experience supporting analytics, reporting, or AI/ML use cases on a lakehouse platform
  • Understanding of data governance, metadata management, and security controls
  • Experience in Legal, Compliance, Financial Services, or regulated industries
  • Understanding of legal data constructs such as contracts, clauses, obligations, and matters
  • Exposure to unstructured data processing or document/NLP pipelines
  • Experience with Power BI, Power Apps, or Power Platform
  • Experience handling sensitive data in audit-driven environments

Responsibilities

  • Design, build, and maintain scalable data pipelines using PySpark, Python, and Spark SQL
  • Develop and optimize ETL/ELT workflows on Databricks using Delta Lake
  • Implement Lakehouse architecture (Bronze/Silver/Gold layers) for enterprise data platforms
  • Build and manage Databricks Jobs, Workflows, and Notebooks for batch and streaming workloads
  • Develop reusable frameworks for data ingestion, processing, and orchestration
  • Containerize data workloads using Docker and automate processes via scripting
  • Integrate Databricks data pipelines with Power Platform solutions (Power Apps, Power Automate)
  • Enable data exposure for business users via APIs, connectors, and curated datasets
  • Design and optimize data lakehouse architectures using Databricks
  • Integrate data from SQL Server, Oracle, and other enterprise source systems
  • Apply advanced data modeling techniques (dimensional modeling, partitioning, optimization)
  • Work with structured and semi-structured data (e.g., JSON, Parquet)
  • Tune performance using caching, indexing, and Spark optimization techniques
  • Publish curated datasets for consumption in Power BI dashboards, Power Apps, and Power Automate workflows
  • Ensure data quality using unit testing, validation frameworks, and automated checks
  • Monitor and troubleshoot distributed Spark workloads and pipelines
  • Analyze logs and resolve production issues across Databricks and cloud environments
  • Maintain data lineage, consistency, and audit readiness
  • Collaborate with Legal, Security, Compliance, and Enterprise Data teams to deliver scalable solutions
  • Translate business requirements into robust data engineering designs
  • Act as a Subject Matter Expert (SME) in Databricks and Lakehouse architecture
  • Lead initiatives with minimal supervision and take full ownership of deliverables
  • Support implementation of data governance frameworks using Databricks Unity Catalog and AWS controls (IAM, KMS)
  • Ensure adherence to data privacy and regulatory requirements (e.g., GDPR) and internal security and audit standards
  • Implement and maintain data access controls and data classification and handling standards
  • Collaborate with IAM and security teams to ensure secure data access
  • Design and maintain CI/CD pipelines using Harness, Azure DevOps, or GitHub
  • Automate deployment of Databricks assets using Databricks Repos and CLI
  • Monitor, schedule, and optimize workflows using Databricks orchestration tools
  • Maintain clear documentation including architecture, data flows, and runbooks
  • Continuously improve performance, scalability, and cost efficiency
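As a purely illustrative sketch of the kind of automated data-quality check the responsibilities above describe (validating records before promoting them from a raw layer to a curated layer), the snippet below shows a minimal validation routine. The field names (`matter_id`, `contract_id`, `effective_date`) are hypothetical and not part of this posting; a real pipeline would typically implement such checks with Spark or a validation framework.

```python
# Minimal sketch of an automated data-quality gate for a lakehouse pipeline.
# Field names below are assumed for illustration only.
from dataclasses import dataclass, field

# Hypothetical required schema for a legal-matter record.
REQUIRED_FIELDS = {"matter_id", "contract_id", "effective_date"}

@dataclass
class ValidationResult:
    valid: list = field(default_factory=list)      # records that pass all checks
    rejected: list = field(default_factory=list)   # (record, reason) pairs

def validate_records(records):
    """Split incoming records into valid and rejected sets with reasons."""
    result = ValidationResult()
    for rec in records:
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            result.rejected.append((rec, f"missing fields: {sorted(missing)}"))
        elif not str(rec["matter_id"]).strip():
            result.rejected.append((rec, "empty matter_id"))
        else:
            result.valid.append(rec)
    return result
```

In practice a check like this would run as a step in a Databricks Workflow, with rejected records routed to a quarantine table for audit review rather than silently dropped.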

Benefits

  • our retirement savings plan (401(k)) with company match
  • insurance coverage including basic life, medical, dental, vision, long-term disability, and other optional additional coverages
  • paid time off including vacation, sick leave, short-term disability, and family care responsibilities
  • access to our Employee Assistance Program
  • incentive compensation including eligibility for annual performance-based awards (excluding certain sales roles subject to sales incentive plans)
  • eligibility for certain tax advantaged savings plans
  • inclusive development opportunities
  • flexible work-life support
  • paid volunteer days
  • vibrant employee networks