At PitchBook, a Morningstar company, we are always looking forward. We continue to innovate, evolve, and invest in ourselves to bring out the best in everyone. We’re deeply collaborative and thrive on the excitement, energy, and fun that reverberate throughout the company. Our extensive learning programs and mentorship opportunities help us create a culture of curiosity that pushes us to always find new solutions and better ways of doing things. The combination of a rapidly evolving industry and our high ambitions means there’s going to be some ambiguity along the way, but we excel when we challenge ourselves. We’re willing to take risks, fail fast, and do it all over again in the pursuit of excellence. If you have a good attitude and are willing to roll up your sleeves to get things done, PitchBook is the place for you.
About the Role:
The Data Collection AI/ML team builds intelligent systems that scale and improve PitchBook’s data extraction, enrichment, and validation processes. The team applies advanced ML, including classification, entity/relationship extraction, LLM-based parsing, OCR, and anomaly detection, to ensure high accuracy, coverage, and timeliness of our proprietary datasets. The Staff MLE role is a force multiplier for the team, partnering with technical leadership to set best practices and design reusable ML architectures that support rapid innovation and operational excellence.
As a Staff Machine Learning Engineer on the Data Collection AI/ML team, you will serve as the senior technical expert responsible for designing, architecting, and deploying advanced AI and machine learning systems that power PitchBook’s data collection, extraction, and enrichment workflows. You will play a pivotal role in elevating the technical bar of the organization by setting engineering standards, driving architectural decisions, and supporting teams to build scalable, production-grade ML systems.
Your work will focus on automating and enhancing PitchBook’s ingestion and data quality pipelines across a wide variety of structured and unstructured sources, drawing from domain areas such as document understanding, OCR, natural language processing, entity resolution, multimodal modeling, retrieval systems, and LLM-driven extraction. You will collaborate closely with Engineering, Product, and Data Operations partners to translate business requirements into robust, high-impact AI solutions.
This role is ideal for someone who thrives as a deeply technical IC and wants to push the boundaries of document AI and data extraction technology, shape long-term architectural direction, and materially influence the future of data automation at PitchBook.
In addition to driving product impact, this role offers an opportunity to shape PitchBook’s growing presence and technical reputation in the AI and ML space. We are looking for individuals who are active contributors to the broader AI community through peer-reviewed research, technical publications, or open-source initiatives. Candidates who have authored conference papers or patents and who are excited to explore the frontiers of generative AI, LLMs, and applied NLP will be well-positioned to help us both advance our internal capabilities and deepen trust with our customers through thought leadership.