Senior Quantitative Scientist (ML/NLP)
Verana Health · Posted: May 23, 2023 · Remote
About the position
Verana Health is seeking a Senior Quantitative Scientist with expertise in data science, machine learning, and natural language processing (NLP) to work with clinical structured and unstructured text data derived from electronic health records. This role will report to the Manager within the Quantitative Sciences Data Development team and will be responsible for algorithm development, protocol creation, code implementation, modeling, inference, and interpretation. The ideal candidate will have a Master's or doctorate in a quantitative field and will collaborate cross-functionally with teams to translate clinical investigation questions into detailed data analytics requirements for internal and external projects.
Responsibilities
- Develop and leverage state-of-the-art advances in natural language processing using pre-trained large language models (LLMs) for analyzing and reasoning over clinical notes and other unstructured data in the context of clinical problems.
- Drive cutting-edge research on language modeling with emphasis on scientific accuracy and explainability.
- Communicate analysis results via presentations to a multi-disciplinary audience using clear, intuitive visualizations.
- Establish and maintain best practices for data exploration, end-to-end model development and deployment lifecycle, and data/code/documentation management.
- Work on Qdata development and commercial projects leveraging real-world data through responsibilities such as creation of study plans, implementation of analyses, development of algorithms, and/or writing of publications.
- Collaborate cross-functionally with teams (e.g., Commercial, Product, Medical, Engineering/Technology, etc.) to translate clinical investigation questions into detailed data analytics requirements for internal and external projects.
- Provide mentorship and knowledge sharing to team members in standardizing machine learning/natural language processing best practices.
Requirements
- Master's or doctorate in a quantitative discipline (e.g., data science, computer science, machine learning, biostatistics, health economics) or equivalent practical experience.
- 5+ years of hands-on experience with messy data (e.g., electronic health records, outcomes data) and analytical methodologies.
- 3+ years of hands-on experience with machine learning model implementation and deployment.
- 3+ years of hands-on experience with state-of-the-art language models (e.g., BERT, Longformer, RoBERTa) applied to use cases such as named entity recognition (NER), text classification, and entity relation extraction.
- Strong familiarity with programming languages, especially Python, PySpark, R, and SQL.
- Strong familiarity with coding platforms, especially Databricks, Amazon SageMaker, and Visual Studio Code.
- Strong familiarity with unstructured text processing techniques.
- Familiarity with clinical datasets and coding systems such as ICD, CPT, and RxNorm.
- Ability to work effectively with cross-functional teams.
- Clear communication skills and the ability to deliver internal and external presentations.
- Ability to prioritize and manage multiple projects with high attention to detail.