Data Scientist Lead

JPMorgan Chase & Co.
Columbus, OH

About The Position

Join an intellectually diverse team of economists, statisticians, engineers, and other analytics professionals focused on quantitative modeling within Consumer & Community Banking (CCB) at JPMorgan Chase & Co. As a Data Scientist Lead within the Finance Decision Optimization group, you will build and deploy data-driven solutions, collaborate with stakeholders and cross-functional teams to define data and model requirements, design and build data pipelines, and develop complex predictive and optimization routines.

Requirements

  • A minimum of 5 years of relevant professional experience as a software engineer, data/ML engineer, data scientist, or AI/ML systems engineer, with a demonstrated track record of delivering complex, end-to-end technical solutions in production or near-production environments; Bachelor's degree in Computer Science, Financial Engineering, MIS, Mathematics, Statistics, or another quantitative field.
  • Practical knowledge of the banking sector, specifically in areas of retail deposits, auto, card, and mortgage lending, with an understanding of relevant compliance and regulatory contexts (e.g., Fair Lending).
  • Working knowledge of LLMs, agentic AI frameworks, and emerging AI engineering practices, including tool/function calling, RAG architectures, prompt design, and agent orchestration patterns (a minimal tool-calling sketch follows this list); eagerness to stay current with the latest advancements in agentic AI and machine learning.
  • Exceptional analytical and problem-solving abilities with a clear understanding of business requirements; capable of translating complex technical concepts to a wide range of audiences including non-technical stakeholders.
  • Highly detail-oriented with a proven track record of delivering tasks on schedule; able to manage multiple priorities efficiently in a fast-paced environment while maintaining quality and meeting critical business needs.
  • Excellent team player with strong interpersonal skills; able to work cross-functionally using a consultative approach, mentor junior staff, and contribute to a culture of shared technical ownership and continuous improvement.
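
For context on the tool-calling pattern named above: the LLM emits a structured call naming a function and its JSON arguments, and the application executes it. The following is a minimal Python sketch, assuming a hypothetical get_balance tool and a stubbed model output; no real LLM API is invoked.

    import json

    # Tool registry: name -> callable. A real agent would also publish
    # JSON-schema descriptions of each tool to the model.
    def get_balance(account_id: str) -> dict:
        # Hypothetical stand-in for a core-banking lookup.
        return {"account_id": account_id, "balance": 1042.17}

    TOOLS = {"get_balance": get_balance}

    def dispatch(model_output: str) -> dict:
        """Parse a model's tool call and execute the matching function."""
        call = json.loads(model_output)   # {"name": ..., "arguments": {...}}
        fn = TOOLS[call["name"]]          # fail loudly on unknown tools
        return fn(**call["arguments"])

    # Stubbed model turn; in production this JSON comes from the LLM.
    print(dispatch('{"name": "get_balance", "arguments": {"account_id": "A-123"}}'))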

Nice To Haves

  • Proficiency in Python programming with a strong grasp of object-oriented and functional programming concepts; experience applying Python to data processing, ML model development, and AI/LLM application development, including prompt engineering and agentic workflow orchestration; hands-on experience with LLM orchestration frameworks (e.g., LangChain, LangGraph, LlamaIndex, or similar); familiarity with embedding models, vector databases (e.g., FAISS, Pinecone, pgvector), retrieval-augmented generation (RAG) pipelines, and evaluation frameworks for agentic systems (a minimal RAG sketch follows this list).
  • Extensive knowledge of Apache Spark with experience optimizing Spark jobs for performance and scalability within Databricks; hands-on experience with cloud platforms (AWS EC2, EMR, S3/EFS or equivalent) and proficiency with Snowflake for large-scale data processing and analytics.
  • Advanced SQL skills for complex query writing, data manipulation, and analysis; strong experience in data engineering, including ETL/ELT processes, data modeling, data governance, and compliance standards relevant to handling sensitive and regulated data; proficiency with the Python data science ecosystem (Pandas, NumPy, SciPy) and practical experience implementing and validating machine learning algorithms (e.g., XGBoost, TensorFlow; see the modeling sketch after this list); ability to perform data analysis, cleansing, modeling (including time series and NLP), and visualization using tools such as Tableau or Alteryx to develop and automate actionable business insights.
  • Expertise in the Linux bash shell environment and Git for version control and collaborative development; familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes) to support scalable deployment of data and AI services; familiarity with implementing guardrails, input/output validation, human-in-the-loop checkpoints, and monitoring/observability patterns (action traces, decision logs, cost and latency tracking) for AI/agentic systems operating in regulated environments.
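
As a concrete illustration of the retrieval stack named above, here is a minimal RAG retrieval sketch using FAISS. The embed function is a hypothetical stand-in for a real embedding model, and the documents are hard-coded for illustration.

    import numpy as np
    import faiss  # pip install faiss-cpu

    DIM = 64

    def embed(texts: list[str]) -> np.ndarray:
        # Hypothetical embedding stand-in: random vectors. A real
        # pipeline would call an embedding model here.
        rng = np.random.default_rng(42)
        return rng.standard_normal((len(texts), DIM)).astype("float32")

    docs = ["Fair Lending overview", "Auto loan pricing policy", "Deposit rate notes"]
    index = faiss.IndexFlatL2(DIM)   # exact L2 search; swap for IVF/HNSW at scale
    index.add(embed(docs))           # index the corpus

    query_vec = embed(["How are auto loans priced?"])
    _, ids = index.search(query_vec, 2)   # top-2 nearest documents
    print([docs[i] for i in ids[0]])      # passages to place in the LLM prompt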
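
And to ground the modeling bullet, a minimal supervised-learning sketch using the named Python ecosystem; the synthetic data and column names are illustrative stand-ins for prepared, model-ready features.

    import numpy as np
    import pandas as pd
    from xgboost import XGBRegressor  # pip install xgboost

    # Synthetic frame standing in for prepared features (illustrative only).
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "utilization": rng.uniform(0, 1, 500),
        "tenure_months": rng.integers(1, 120, 500),
    })
    df["charge_off_rate"] = 0.02 * df["utilization"] + rng.normal(0, 0.001, 500)

    features = df[["utilization", "tenure_months"]]
    model = XGBRegressor(n_estimators=50, max_depth=3)
    model.fit(features, df["charge_off_rate"])
    print(model.predict(features.head(3)))   # sanity-check predictions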

Responsibilities

  • Build, compile, and automate scalable data pipelines, complex predictive models, and optimization routines using big data technologies (Spark, Databricks, Snowflake) on cloud platforms; transform massive volumes of data into actionable business insights and package solutions into repeatable, executable workflows for QA testing and production deployment (a minimal pipeline sketch follows this list).
  • Lead solution backtesting exercises across key stakeholder domains (e.g., Fair Lending), validate model performance against historical data, identify analytical gaps and proactively surface critical issues to business and technology partners to ensure models are robust, reliable, and decision-ready.
  • Stay ahead of industry trends in data science, ML, and cloud engineering; provide informed recommendations for adopting new and emerging technologies; actively support ongoing technology evaluation processes and contribute to early-stage proof of concept projects that test and validate innovative approaches.
  • Collaborate effectively across engineering, data science, business, and external stakeholder teams; manage project delivery within timelines; ensure solutions meet critical business needs while proactively raising risks, dependencies, and blockers to the right partners before they escalate.
  • Serve as a mentor and knowledge resource for junior staff; establish best practices in data engineering, ML modeling, and analytical automation; foster a culture of continuous learning, technical excellence, and shared ownership across the team.
  • Architect and build foundational agentic workflows from the ground up — including tool/function calling, multi-step reasoning chains, and agent orchestration patterns — while establishing early technical standards that will scale from PoC to production-ready systems.
  • Define success metrics specific to agent performance (task completion, tool-use accuracy, reasoning consistency, failure modes); build evaluation harnesses early in the PoC stage to validate agent behavior, surface edge cases, and establish quality baselines before scaling (see the evaluation sketch after this list).
  • Design and prototype retrieval layers (RAG, tool-augmented memory, knowledge base integrations) that agents rely on to take actions; ensure data quality and access controls are considered from day one of the PoC to avoid rearchitecting later.
  • Identify and mitigate risks unique to autonomous agents (unintended actions, prompt injection, cascading tool-call failures, data leakage); establish guardrails and human-in-the-loop checkpoints early in the PoC to build a safe and auditable agent framework.
  • Instrument agent workflows with observability (action traces, decision logs, cost and latency tracking) from the earliest prototype (see the instrumentation sketch after this list).
  • Synthesize PoC findings into architectural decisions, runbooks, and optimization strategies (caching, model routing, token budgets) that accelerate the path to production deployment.
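
A compressed illustration of the pipeline work in the first bullet above; the table paths and column names are hypothetical, and the snippet assumes a PySpark environment such as Databricks.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("deposit-features").getOrCreate()

    # Hypothetical source; in Databricks this might be a Delta table.
    txns = spark.read.parquet("s3://bucket/transactions/")

    # Aggregate to one row per account per month, a typical model-input shape.
    features = (
        txns
        .withColumn("month", F.date_trunc("month", F.col("txn_ts")))
        .groupBy("account_id", "month")
        .agg(
            F.sum("amount").alias("net_flow"),
            F.count("*").alias("txn_count"),
        )
    )

    # Write back as a partitioned, repeatable artifact for QA and deployment.
    features.write.mode("overwrite").partitionBy("month").parquet("s3://bucket/features/")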
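
A minimal sketch of the evaluation-harness idea from the metrics bullet; the agent and its test cases are hypothetical placeholders for real task suites.

    from dataclasses import dataclass

    @dataclass
    class Case:
        prompt: str
        expected_tool: str     # which tool a correct agent should call
        expected_answer: str

    def run_agent(prompt: str) -> tuple[str, str]:
        # Hypothetical agent stub returning (tool_called, answer).
        return ("get_balance", "1042.17")

    def evaluate(cases: list[Case]) -> dict:
        tool_hits = answer_hits = 0
        for c in cases:
            tool, answer = run_agent(c.prompt)
            tool_hits += tool == c.expected_tool        # tool-use accuracy
            answer_hits += answer == c.expected_answer  # task completion
        n = len(cases)
        return {"tool_use_accuracy": tool_hits / n, "task_completion": answer_hits / n}

    print(evaluate([Case("What is my balance?", "get_balance", "1042.17")]))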
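
And a sketch of the instrumentation bullet using only the Python standard library: every tool call is recorded as a structured action-trace entry with latency. The traced tool is hypothetical; a production version would also log token cost.

    import functools, json, logging, time

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("agent.trace")

    def traced(fn):
        # Record every tool invocation as a structured action-trace entry.
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            log.info(json.dumps({
                "action": fn.__name__,   # decision-log entry
                "args": repr((args, kwargs)),
                "latency_ms": round((time.perf_counter() - start) * 1000, 2),
            }))
            return result
        return wrapper

    @traced
    def lookup_rate(product: str) -> float:
        return 0.049   # hypothetical tool

    lookup_rate("auto_loan")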

Benefits

  • Comprehensive health care coverage
  • On-site health and wellness centers
  • A retirement savings plan
  • Backup childcare
  • Tuition reimbursement
  • Mental health support
  • Financial coaching