Document understanding is a foundational intelligence layer that powers every major capability across our legal AI platform, from search and information extraction to agentic reasoning in products like Westlaw, PracticalLaw, and CoCounsel. You'll build state-of-the-art semantic chunking, document enrichment, and knowledge graph construction systems that serve as the cognitive foundation multiple product teams depend on. You will work across authoritative legal, tax, and accounting content as well as extraordinarily diverse customer data. This is a rare opportunity to solve publication-quality research problems with immediate production impact. Your innovations will directly shape how millions of legal professionals research, analyze, and reason over complex legal documents, and will advance the capabilities behind the next generation of intelligent legal AI agents.

As a Senior Applied Scientist, you will:

Innovate & Deliver: Design, build, test, and deploy end-to-end AI solutions for complex document understanding tasks in the legal domain. Develop advanced models for semantic chunking of lengthy, non-uniformly structured legal documents, with adjustable granularity for different use cases. Build document enrichment systems that classify documents according to legal and customer-defined taxonomies and extract rich metadata. Create LLM-based knowledge graph construction pipelines that extract and link heterogeneous legal knowledge, including citations, entities, and legal concepts, across diverse legal content. Develop scalable synthetic data generation systems that support model training, simulate complex legal research queries, and generate hallucination-free answers. Collaborate with engineering to ensure well-managed software delivery and reliability at scale.

Evaluate & Optimize: Develop comprehensive data and evaluation strategies for both component-level and end-to-end quality, leveraging expert human annotation and synthetic data generation. Apply robust training and evaluation methodologies that balance model performance with latency requirements, particularly for solutions built on small language models (SLMs). Use knowledge distillation to compress large models into efficient SLMs suitable for production deployment.

Drive Technical Decisions: Independently determine appropriate architectures for challenging document understanding problems, including semantic chunking strategies that handle diverse document formats, preserve legal document structure, and adapt to different granularity needs; document classification approaches that work across varying legal taxonomies and generalize to customer-defined schemas; LLM-based knowledge extraction methods that handle challenges such as citation recognition errors and contextual references; and multi-document reasoning architectures for generating synthetic multi-hop queries that reflect complex legal research patterns. Balance accuracy, efficiency, and scalability while handling real-world variety in document formats and content types.

Align & Communicate: Partner closely with Engineering and Product teams to translate complex legal document understanding challenges into scalable, production-ready solutions. Engage stakeholders across multiple product lines to understand use case requirements in depth, and shape objectives that align document understanding capabilities with diverse business needs, including next-generation search and deep legal research.

Advance the Field: Maintain scientific and technical expertise in one or more relevant areas, as demonstrated through product deliverables, research published at top venues (e.g., ACL, EMNLP, ICLR, NeurIPS, SIGIR, KDD), and intellectual property.
Job Type: Full-time
Career Level: Mid Level
Education Level: Ph.D. or professional degree