Senior Machine Learning Engineer, Model Risk Management

Block•Bay Area, CA, United States of America

$160,700 - $283,600•Remote

About The Position

Block lends, moves money, and screens for financial crime at enormous scale, and one bad model can mean millions in credit losses, suspicious activity that goes unreported, or a fair lending violation. Model Risk Management is the independent function that decides whether a model is sound enough to put in front of customers and regulators. The failures that matter rarely announce themselves: a model can clear every headline metric and still be broken underneath. It can pass clean at launch and then quietly drift as the population shifts, until the loss it was supposed to prevent surfaces months later. The hard part is finding what looks right and is wrong, then proving it well enough to hold up under questioning.

Much of the work arrives under-specified, so you scope it into a defensible plan, ask the questions that surface the real requirements, and defend your tradeoffs to the people who built the model you are challenging.

The same scrutiny you apply to models applies to AI. We build the tooling that lets a lean team validate at scale, so you critically evaluate what it produces and own the evaluation that confirms its output is reliable enough to act on. That work matters most for the GenAI and agentic systems most teams have not figured out how to oversee yet.

As a senior individual contributor, you lead through technical depth and cross-team scope, and you partner widely across the organization. You work with the first-line modelers you challenge, the Legal, Compliance, and fair-lending teams who rely on your analysis, and the auditors and bank partners who carry it into regulatory engagements.

This role is remote-friendly within approved US locations.

Requirements

A quantitative degree or equivalent experience, and senior-IC depth building or validating models in a high-stakes domain such as credit, fraud, or financial crime.
Command of effective-challenge methodology: reproduction, conceptual-soundness review, benchmarking, stress testing, and outcomes analysis, with an eye for how a model holds up after launch and where its assumptions break.
Deep applied ML and statistics across model families, from regression and tree ensembles to deep learning, with sound judgment about evaluation, calibration, and generalization.
Experimentation and statistical rigor: holdout and experiment design, reasoning about uncertainty, and evaluating a model beyond aggregate accuracy.
Solid software and data engineering: production-quality Python, SQL on large datasets, and reproducible, tested code.
Fluency with modern AI: building with LLMs and agentic tools, and the judgment to know when their output can be trusted.
Familiarity with model risk management frameworks and fair-lending standards, with the specifics learnable on the job.
The communication to explain and defend your conclusions to model owners and senior stakeholders, and the independence to operate under ambiguity.

Responsibilities

Independently challenge model owners across lending, fraud, and AML: reproduce their results, set and defend the acceptance thresholds, and own the call on whether a model is sound.
Hunt the silent errors that make metrics lie, and prove them out before they reach production.
Choose evaluation that holds up under real conditions: rare events, shifting populations, and drift that only shows up after launch.
Work hands-on in codebases you did not write, learning the data, configs, and conventions, and ship production code in the tooling you build to validate them.
Build the agentic validation tooling the team depends on, orchestrating agents that run in parallel.
Reason about ML systems end to end — how features, training, serving, monitoring, and scale fit together — to evaluate and challenge an owner's design.
Tie explainability and fair-lending findings on consumer credit models back to the model and product decisions that follow.
Help define how Block validates the systems at the frontier of production AI, setting standards where none exist yet.