Data Architect III

JPMorgan Chase & Co.
Palo Alto, CA

About The Position

As a Data Architect III at JPMorgan Chase within Consumer & Community Banking, you serve as a seasoned member of a team, incorporating leading best practices and collaborating with other architects to develop high-quality architecture solutions for various software applications and platforms. You are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm's business objectives.

Requirements

  • Formal training or certification on software engineering concepts and 3+ years applied experience
  • Strong knowledge of data architecture development, with hands-on expertise designing data lake and lakehouse solutions on Databricks or AWS Lake Formation.
  • Hands-on practical experience in system design, application development, testing, and operational stability for distributed data processing using PySpark, Spark SQL, and Scala on Databricks and AWS Glue, including monitoring with CloudWatch and platform-native metrics.
  • Proficiency in Python, Scala, and SQL, with production-grade development experience.
  • Overall knowledge of the Software Development Life Cycle with emphasis on data product lifecycle: requirement capture, model design, pipeline implementation, automated testing, deployment via CI/CD (e.g., Git-based workflows), and ongoing optimization.
  • Solid understanding of agile methodologies including Continuous Integration/Delivery, application resiliency through retry logic, idempotency, and fault tolerance patterns, and security-by-design aligned to Lake Formation policies.
  • Demonstrated knowledge across Cloud disciplines, including architecting distributed data platforms on AWS and applying governed access patterns through Unity Catalog and Lake Formation.
  • Exposure to cloud technologies with practical experience on AWS services relevant to data platforms, including S3, IAM, KMS, CloudWatch, Step Functions, and integration with Lake Formation.
  • Proven ability to implement data security and privacy controls: column/row-level security via Lake Formation, data masking/tokenization patterns, and audit trails leveraging Unity Catalog and Cloud-native logging.
  • Competence in data pipeline orchestration and workflow management, with automated dependency management and SLA tracking.
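
The resiliency patterns named in the requirements above (retry logic, idempotency, fault tolerance) can be sketched in plain Python. This is a minimal illustration, not JPMorgan Chase code; the function names and the in-memory ledger are assumptions for demonstration only:

```python
import time

def retry(fn, attempts=3, base_delay=0.01):
    """Call fn with exponential backoff; re-raise only after the final attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Idempotency ledger keyed by record id (a real pipeline would use a
# durable store such as a Delta table or DynamoDB, not process memory).
_processed = set()

def ingest(record):
    """Idempotent ingest: replaying a record id already seen is a no-op."""
    if record["id"] in _processed:
        return "skipped"
    _processed.add(record["id"])
    return "written"
```

Together these let a pipeline step be re-run safely after a transient failure: the retry wrapper absorbs the failure, and the idempotency check prevents duplicate writes on replay.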

Nice To Haves

  • Experience with open table formats such as Delta Lake or Apache Iceberg.
  • Familiarity with Great Expectations or similar data quality frameworks integrated into Databricks or AWS Glue pipelines.
  • Experience with Infrastructure as Code for data platforms, using Terraform or CloudFormation to provision Lake Formation permissions, Glue resources, and Databricks assets.
  • Experience with natural-language-to-SQL solutions such as Databricks Genie or Snowflake Cortex.
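
The expectation-style data quality checks mentioned above (in the spirit of Great Expectations) can be reduced to a simple pattern: a check runs over rows and returns a result with a success flag and failing row indices. This is a framework-free sketch; the function names are illustrative assumptions, not the Great Expectations API:

```python
def expect_not_null(rows, column):
    """Every row must have a non-null value in `column`."""
    failures = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"success": not failures, "failed_rows": failures}

def expect_values_between(rows, column, low, high):
    """Every non-null value in `column` must fall within [low, high].
    Nulls are skipped here; catching them is expect_not_null's job."""
    failures = [i for i, row in enumerate(rows)
                if row.get(column) is not None
                and not (low <= row[column] <= high)]
    return {"success": not failures, "failed_rows": failures}
```

In a Databricks or Glue pipeline, a failing result would typically quarantine the offending rows or fail the job before the data is published downstream.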

Responsibilities

  • Creates secure and high-quality production code using PySpark, Spark SQL, and Scala within AWS Glue jobs and Databricks workflows; embeds data quality checks, schema enforcement, and lineage capture; and integrates with AWS Lake Formation access controls for consistent runtime authorization.
  • Produces data architecture and design artifacts for complex applications, including canonical data models and data platform and pipeline designs for both data publishing and data consumption platforms.
  • Evaluates data architecture designs and provides feedback grounded in platform capabilities, benchmarking Spark query plans, storage formats (Parquet vs. Delta), and AWS Glue job types (Spark vs. Ray where applicable), while validating Lake Formation tag-based access control (LF-TBAC) and cross-account sharing patterns.
  • Represents the team in architectural governance bodies by standardizing data product interfaces, defining data contracts, endorsing schema evolution policies via Delta Lake (merge, optimize, vacuum), and establishing controls for PII/PCI using Lake Formation column-level permissions and row-level filters.
  • Leads the data architecture team in evaluating and adopting modern capabilities such as multi-catalog and federated data catalog architectures built on open table formats such as Iceberg.
  • Designs secure multi-tenant data lake architectures using AWS Lake Formation with tag-based policies, data masking strategies, and federated identity integration; implements cross-account and cross-region data sharing patterns aligned to enterprise security policies.
  • Works with Product and Engineering on solution design, effectively translating business requirements into technical design documents and delivery timelines.
  • Establishes cost governance and optimization by selecting appropriate storage tiers and compute clusters for data consumption, with standardized fine-grained data authorization.
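
The fine-grained controls described above (column masking and row-level filters, as enforced by Lake Formation or Unity Catalog) boil down to one logical operation at read time: drop rows the caller may not see, then mask columns they may not read. This is a pure-Python illustration of the concept, not the Lake Formation API:

```python
def apply_policy(rows, masked_columns=(), row_filter=lambda r: True):
    """Illustrative fine-grained access control: keep only rows that pass
    the row-level filter, then replace masked column values with '***'."""
    out = []
    for row in rows:
        if not row_filter(row):
            continue  # row-level security: filtered rows never reach the caller
        out.append({k: ("***" if k in masked_columns else v)
                    for k, v in row.items()})
    return out
```

In a governed lakehouse, the same policy would be expressed declaratively (e.g., LF-TBAC tags or row-filter expressions) and enforced by the platform rather than by application code.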

Benefits

  • comprehensive health care coverage
  • on-site health and wellness centers
  • a retirement savings plan
  • backup childcare
  • tuition reimbursement
  • mental health support
  • financial coaching


What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000 employees
