Lead Data Scientist

Middesk•San Francisco, CA

58d•$210,000 - $250,000•Hybrid

About The Position

Middesk is seeking a hands-on applied ML expert to build the technical foundation for AI-driven applications that streamline customer workflows, with a focus on business onboarding. The role involves shipping external-facing models in the risk/fraud space and dealing with imbalanced data, scarce labels, and changing behavior. This is a highly technical role with significant influence on how ML is designed, built, and scaled at Middesk. The company follows a hybrid work model that requires 2 days per week in the SF or NYC office, so candidates need to be within commuting distance. Middesk is a Y Combinator graduate, backed by Sequoia Capital and Accel Partners, and recognized on the Forbes Fintech 50 List.

Requirements

5+ years of production ML experience in one or more of the following areas:
Building production ML for risk, fraud, credit, or trust & safety: Track record of shipping external-facing ML applications in one or more of these domains.
Knowledge graph applications: Hands-on experience building, querying, or extracting signals from knowledge graphs—ideally over business entity networks (companies, persons, addresses, relationships) to support identity verification, fraud detection, or risk decisioning.
Entity resolution for business or individual identities: Experience disambiguating and linking records across noisy, incomplete, or conflicting data sources—particularly in KYB, KYC, AML, or identity verification contexts where the same real-world entity may appear under different names, addresses, or tax IDs.
Expertise in classification under real-world ML challenges such as imbalanced labels, sparse signals, cold start, and production version management.
Hands-on ML infrastructure experience: feature stores, model management, ML training/serving pipelines.
Comfort as a senior IC: setting technical direction, mentoring peers, and establishing best practices.

Nice To Haves

B2B SaaS experience, ideally building ML products for enterprise customers.
ML pipeline and automation engineering: Experience building end-to-end training harnesses that automate feature engineering, data validation, and model training.
Experience scaling ML across multiple products or risk domains.

Responsibilities

Build risk & fraud ML applications: Deliver production ML models in fraud, trust & safety, KYB, and compliance domains, with measurable impact on customer workflows.
Tackle hard data problems: Work on classification problems with extreme class imbalance, sparse signals, and “cold start” label challenges.
Innovate in feature engineering & labeling: Use graph-based techniques, weak supervision, LLMs, and AI agents to improve signal extraction and automate the labeling process.
Establish ML infrastructure foundations: Partner with the ML infra team to design feature services, model training pipelines, model serving standards, and orchestration to scale multiple ML use cases.
Design and implement knowledge graph solutions: Leverage LLMs for graph construction, querying, and retrieval to enhance entity resolution and business identity use cases.
