Data Scientist - Clinical AI

CVS Health
New York, NY
$79,310 - $158,620
Onsite

About The Position

CVS Health's Analytics & Behavior Change (A&BC) team is seeking to grow its Clinical Data Science & AI team. This role is tasked with activating CVS Health's clinical data repository to improve outcomes across multiple lines of business and use cases. The Data Scientist - Clinical AI will serve as a bridge between clinical data assets and the analysts, data scientists, and business partners who consume them, ensuring data is accessible, well-documented, fit for purpose, and aligned with clinical and regulatory standards. The position involves extracting signal from unstructured clinical text using NLP and language models, building and fine-tuning Small Language Models (SLMs), leveraging Large Language Models (LLMs) where appropriate, developing predictive analytics solutions, conducting exploratory data analysis, communicating findings to stakeholders, collaborating across teams, and staying current with emerging techniques in AI/ML. Upholding data governance standards, including HIPAA compliance, is also a key responsibility.

Requirements

  • 2+ years of experience in data science, machine learning, or applied NLP, preferably in healthcare or a similarly regulated domain.
  • Hands-on experience with NLP techniques such as text preprocessing, tokenization, named entity recognition (NER), text classification, and topic modeling applied to real-world unstructured data.
  • Practical experience with LLMs and/or SLMs, including prompt engineering, fine-tuning, RAG architectures, evaluation frameworks, or deploying language models in production or research settings.
  • Strong foundation in traditional machine learning, including supervised and unsupervised methods, feature engineering, model selection, cross-validation, and performance evaluation.
  • Proficiency in best coding practices, including version control (Git/GitHub), writing clean and reproducible code, and understanding the importance of well-organized repositories.
  • Deep EDA skills, with the ability to systematically explore datasets, identify data quality issues, surface insights, and make informed decisions about modeling approaches.
  • Proficiency in Python (pandas, scikit-learn, PyTorch or TensorFlow, Hugging Face Transformers) and SQL for working with large-scale healthcare datasets.
  • Experience with cloud-based data and ML platforms, preferably Google Cloud Platform (GCP) — BigQuery, Vertex AI, or equivalent.
  • Excellent presentation and communication skills, with the ability to clearly explain technical concepts and their business implications.
  • Good judgment and common sense, understanding when an LLM is appropriate, meeting deadlines, asking for help when needed, and avoiding over-engineering solutions.
  • Genuine curiosity and desire to learn, including reading papers, trying new tools, asking 'why,' and being energized by new problems.

Nice To Haves

  • Experience working with clinical text data, such as clinical notes, discharge summaries, or pathology reports.
  • Knowledge of clinical coding systems and terminologies (ICD-10, SNOMED-CT, LOINC, RxNorm, CPT, NDC, UMLS) and their relevance to NLP pipelines.
  • Familiarity with clinical data standards (HL7, FHIR, CCD/C-CDA) and common data models (e.g., OMOP).
  • Experience building or contributing to clinical NLP pipelines, including entity extraction, relation extraction, negation detection, or section segmentation from clinical narratives.
  • Experience with model evaluation in clinical contexts, understanding sensitivity/specificity tradeoffs, clinical validation, and responsible AI practices in healthcare.
  • Familiarity with MLOps practices, such as model versioning, experiment tracking, CI/CD for ML, and model monitoring.
  • Experience working directly with clinical stakeholders (physicians, nurses, clinical operations teams) and tailoring presentations, findings, and recommendations to different audience levels.
  • Privacy, security, and compliance experience, including HIPAA/HITRUST, de-identification/tokenization, and PHI handling.

Responsibilities

  • Extract signal from unstructured clinical text by applying NLP and language model techniques to clinical notes, CCD documents, and other free-text clinical data to generate structured, actionable features for downstream analytics and predictive models.
  • Build and fine-tune Small Language Models (SLMs), designing, training, and evaluating domain-specific SLMs tailored to clinical use cases, balancing performance, cost, latency, and compliance requirements.
  • Utilize LLMs where applicable, leveraging them for tasks like training data creation, entity extraction, and zero-shot classification, while also knowing when traditional ML, rules-based approaches, or simpler statistical methods are more appropriate.
  • Develop predictive analytics solutions by building and validating predictive models using both classical ML (gradient boosting, logistic regression, survival analysis) and modern deep learning approaches to support clinical decision-making and population health initiatives.
  • Conduct rigorous Exploratory Data Analysis (EDA) on structured and unstructured clinical datasets to uncover patterns, assess data quality, identify feature candidates, and inform modeling strategy.
  • Communicate findings clearly by presenting methodology, results, and recommendations to technical and non-technical stakeholders through visualizations, notebooks, and presentations, translating complex AI/ML concepts into actionable language.
  • Collaborate with machine learning engineers, data engineers, clinical informaticists, and business partners to ensure clinical data pipelines support AI/ML workflows and that model outputs are integrated into products and decision-making processes.
  • Stay current and curious by continuously evaluating emerging techniques in NLP, foundation models, and clinical AI, bringing new ideas to the team, prototyping rapidly, and advocating for evidence-based approaches.
  • Uphold data governance standards, ensuring all work complies with HIPAA, data privacy regulations, and internal data stewardship policies, particularly when handling PHI and unstructured clinical text.

Benefits

  • Medical coverage
  • Dental coverage
  • Vision coverage
  • Paid time off
  • Retirement savings options
  • Wellness programs
  • CVS Health bonus, commission or short-term incentive program