Data Scientist, Advanced Analytics & Commercial Effectiveness

AstraZeneca•Mississauga, ON

CA$134,708 - CA$176,804•Hybrid

About The Position

This role applies advanced statistical methods and AI-native analytics to de-identified healthcare data in order to uncover hidden patients, elevate commercial strategy, and improve outcomes for people living with rare diseases. You will join a fast-paced analytics team that combines deep statistical expertise with modern machine learning and Snowflake Cortex AI to accelerate model development while ensuring interpretability and compliance. Responsibilities include developing models for patient identification, adherence prediction, marketing effectiveness, and causal impact analysis that leaders and teams can trust. The company emphasizes an in-person working environment, with an average of at least three days per week in the office, balanced with individual flexibility. The ideal candidate thrives at the intersection of analytical depth and real-world impact and brings publication-grade statistical rigor to this work.

Requirements

Master’s or PhD in Statistics, Biostatistics, Data Science, Econometrics, Applied Mathematics, or a related quantitative field.
4–8+ years in data science, applied statistics, or quantitative commercial analytics with a track record of deploying production-grade models in healthcare or life sciences.
Expert-level proficiency in hypothesis testing, regression analysis (linear, logistic, mixed-effects, regularized), ANOVA, survival analysis, Bayesian inference, experimental design, power analysis, significance testing, and multiple comparison corrections.
Deep understanding of when statistical methods apply, when they break down, and how to adapt for small-population rare disease contexts.
Proficiency in XGBoost, LightGBM, Random Forest, SVM, ensemble methods, neural networks, and time-series forecasting with thorough validation (cross-validation, precision-recall, ROC/AUC, calibration).
Expert-level Python (scikit-learn, XGBoost, LightGBM, statsmodels, lifelines, scipy.stats, PyMC, CausalML, DoWhy, SHAP, PyTorch) and SQL.
Proficiency with Jupyter, Git, and CI/CD integration for model deployment.
Proficiency with Snowflake (Snowpark Python, Snowpark Container Services, Cortex AI), Spark/PySpark, and MLflow or equivalent experiment tracking and model registry tools.
Hands-on experience building Bayesian marketing mix models (MMM; PyMC, LightweightMMM, Robyn) and Next-Best-Action recommendation engines for pharmaceutical promotional optimization.
Experience with AI coding agents (Cortex AI, Claude Code, Copilot) for analytical development. Ability to critically evaluate agent-generated code and identify incorrect statistical reasoning.
Solid understanding of HIPAA de-identification standards, model explainability frameworks (SHAP, LIME), bias detection, and compliance requirements in regulated healthcare data environments.
Ability to translate sophisticated statistical findings into actionable recommendations for non-technical commercial stakeholders and senior leadership.

Nice To Haves

Experience in rare disease or specialty pharma analytics: small-population modeling, patient identification, specialty pharmacy data, hub/PSP, REMS-related data, and high-value-per-patient environments.
Hands-on experience with Komodo Health (open and closed claims), IQVIA (Symphony, NPA, DDD), Veeva CRM, MMIT, Model N, specialty pharmacy dispense data, and EMR/EHR data.
Experience with NLP (topic modeling, NER, embeddings, text classification) and neural network architectures (RNNs, LSTMs, transformers) for healthcare analytics applications.
Experience with RLHF concepts, benchmark design, systematic prompt evaluation, and agent reasoning quality assessment.
Proficiency with PowerBI, Tableau, or Qlik for executive-facing dashboards and self-service reporting.

Responsibilities

Act as the go-to expert and set analytical standards across hypothesis testing, regression, inference, and experimental design; ensure outputs meet publication-grade rigor with clearly stated assumptions, diagnostics, and power calculations.
Design, validate, and deploy models using XGBoost, LightGBM, Random Forest, SVM, neural networks, and ensembles; address class imbalance with robust evaluation and calibration to drive precise commercial actions.
Build survival models (Cox, AFT, competing risks) to predict adherence, discontinuation, and patient lifetime value that inform proactive interventions.
Create and maintain ensemble time-series frameworks (ARIMA, Prophet, exponential smoothing, gradient-boosted models) to guide demand planning, revenue scenarios, and launch-readiness decisions.
Design A/B tests and apply quasi-experimental methods (DiD, PSM, synthetic control, IV, RDD) to quantify the true effect of commercial initiatives on prescribing and patient outcomes.
Develop Bayesian MMM to estimate channel-level return on investment and response curves; recommend promotional reallocations that improve impact across personal and non-personal channels.
Build and refine HCP-level recommendation systems using contextual bandits, collaborative filtering, and reinforcement learning; integrate daily actions into Veeva CRM.
Train supervised classifiers on claims, labs, and specialty pharmacy data to prioritize likely undiagnosed patients and direct field resources where they matter most.
Deploy models that detect early risk signals from dispense intervals, hub interactions, and scheduling patterns to reduce discontinuation.
Construct HCP and patient segments using clustering and NLP-enriched profiles to focus engagement and tailor messaging.
Create real-time switching surveillance models from claims and formulary data to anticipate market dynamics and inform agile responses.
Use Snowflake Cortex AI and AI coding agents to speed prototyping and feature engineering while retaining full human-led statistical validation.
Build guardrails and evaluation suites that stress-test agent outputs, catch plausible-but-wrong reasoning, and prevent flawed insights from reaching decisions.
Design domain-specific prompts, benchmarks, and feedback loops to continuously improve agent analytical performance.
Work exclusively with de-identified patient-level data; implement minimum-necessary access and maintain re-identification risk assessments.
Implement bias detection, fairness audits, SHAP/LIME explainability, drift monitoring, and validation gates; maintain end-to-end audit trails aligned to FDA, REMS, GDPR, SOC 2, 21 CFR Part 11, and enterprise AI governance requirements.
Translate sophisticated statistics into clear recommendations for Brand, Market Access, Patient Services, and Field teams; influence senior leaders with evidence that drives action.
Create meticulous documentation, code reviews, and trainings that set the standard across the analytics community and ensure sustainable, repeatable excellence.