Lead Software Engineer - Data Engineer

JPMorgan Chase & Co.•Jersey City, NJ

$152,000 - $215,000

About The Position

As a Lead Software Engineer at JPMorganChase within the Commercial & Investment Bank (CIB) – Regulatory Reporting Team, you are an integral part of an agile team that works to enhance, build, and deliver trusted market-leading technology products in a secure, stable, and scalable way. As a core technical contributor, you are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm’s business objectives.

Requirements

Formal training or certification in software engineering concepts and 5+ years of applied experience
5+ years of applied experience building production data engineering and/or software engineering solutions (design, development, testing, operations)
Hands-on practical experience delivering system design, application development, testing, and operational stability for large-scale data pipelines
Advanced skills in one or more programming languages, with advanced proficiency in Python and strong hands-on experience with PySpark
Advanced proficiency in Spark SQL and strong SQL fundamentals (data modeling, query optimization, execution plan analysis)
Demonstrated experience leading effective use of approved AI-assisted software development tools (e.g., for coding, code review, test acceleration, troubleshooting) with the ability to set team expectations for validating AI outputs for correctness, performance, and security
Strong understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs and outputs, and adherence to resiliency and security expectations; experience coaching engineers on safe, compliant adoption within delivery practices
Experience with AWS data management patterns, including S3 and AWS Glue Data Catalog (metadata governance, table schema hygiene, discoverability); experience with other cloud-based data platforms will also be considered
Required platform experience: delivering and operating Spark workloads on EMR and/or Databricks (tuning, troubleshooting, monitoring, and cost/performance optimization)
Required lakehouse expertise: production experience with Apache Iceberg, including table design and ongoing operations such as partitioning strategy and file layout optimization, schema evolution and compatibility controls, compaction and small-file mitigation, snapshot retention management and metadata maintenance, and safe backfills, rewrites, and reprocessing patterns
Proficiency in automation and continuous delivery methods (CI/CD, automated testing, and repeatable deployments for data pipelines)

Nice To Haves

Kafka familiarity (topic design, producer/consumer patterns, schema evolution/compatibility, and operational considerations)
Experience with Delta Lake concepts and trade-offs vs. Iceberg
Experience with Spark Structured Streaming and streaming ETL patterns
Working knowledge of Java (interoperability or leveraging existing JVM-based components)
Experience using AI-assisted engineering tools and workflows (e.g., GitHub Copilot, Claude), including spec-driven development, prompt-assisted refactoring, and code review, following enterprise-safe usage patterns

Responsibilities

Executes creative software solutions, design, development, and technical troubleshooting with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems, with a focus on data engineering and Spark-based ETL/ELT
Develops secure high-quality production code in Python/PySpark and Spark SQL, and reviews and debugs code written by others (Spark jobs, SQL logic, and data issues end-to-end)
Drives team adoption of enterprise-authorized AI-assisted engineering practices within the work environment to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review/refactoring, test strategy acceleration, incident/root-cause analysis support), while establishing consistent validation standards (secure coding, peer review, automated testing) and promoting reuse of effective patterns across the team.
Applies knowledge of tools within the Software Development Life Cycle toolchain, including enterprise-authorized AI-assisted development and automation capabilities, to improve the value realized by automation.
Identifies opportunities to eliminate or automate remediation of recurring issues to improve overall operational stability of software applications and systems, including data pipeline reliability and lakehouse maintenance automation
Leads evaluation sessions with external vendors, startups, and internal teams to drive outcomes-oriented probing of architectural designs, technical credentials, and applicability for use within existing systems and information architecture (e.g., EMR/Databricks, lakehouse/table formats, catalog/governance patterns)
Leads communities of practice across Software Engineering to drive awareness and use of new and leading-edge technologies, especially around Spark performance, Iceberg best practices, and data platform operations
Adds to a team culture of diversity, opportunity, inclusion, and respect