Senior Data Scientist

Bristol Myers Squibb•Seattle, WA

Hybrid

About The Position

As a Senior Data Scientist within Bristol Myers Squibb's AI Venture Studio delivery team, you will be a hands-on senior individual contributor who helps convert ambiguous scientific and business opportunities into measurable AI product hypotheses, experiments, and working solutions. You will partner with AI Engineers, Data Engineers, App/Cloud Engineers, Frontend Engineers, product owners, and domain experts to build and evaluate AI systems across R&D, Commercialization, Manufacturing, and Enabling Functions.

The role sits at the intersection of applied data science and AI engineering: you will design evaluation datasets, build analytical features, prototype models, test agent and retrieval performance, measure product impact, support sandboxed data problem solving, and explain model behavior in ways stakeholders can trust. You will also help define the high-quality analytical context that agents need to perform reliable work, including query history, column values, explicit instructions, memory, data tools, warehouse context, and curated source meaning.

BMS builds these products in an AWS-first engineering environment, and your work will use AWS-aligned data and AI services alongside enterprise-preferred tools such as OpenSearch, Amazon S3 Vectors, Amazon Neptune, PostgreSQL/RDS, LangGraph, LangSmith, and a variety of approved frontier LLMs and APIs. This role is for someone excited to work hands-on with the latest AI tools and frontier technologies, pushing the limits of what technology can do to help BMS discover, develop, and deliver innovative medicines.

Requirements

Bachelor's or higher degree in Data Science, Statistics, Computer Science, Engineering, Bioinformatics, Computational Biology, Applied Mathematics, or a related scientific field.
5+ years of experience in data science, machine learning, applied AI, analytics, computational science, or related technology roles with increasing responsibility.
Proficiency in Python, SQL, R and/or common data science libraries such as pandas, NumPy, scikit-learn, PyTorch, TensorFlow, statsmodels, or similar tools and packages.
Experience applying machine learning, statistics, NLP, information retrieval, experimentation, or decision science to real-world products or scientific/business workflows.
Experience with LLM applications, RAG, agentic AI, prompt/evaluation design, structured outputs, context-quality evaluation, knowledge curation, and model quality assessment.
Familiarity with AWS data and AI services such as S3, Athena, RDS/PostgreSQL, OpenSearch, SageMaker, Bedrock, or equivalent cloud tools.
Experience with evaluation rubrics, hallucination risk measurement, causal inference, simulation, optimization, recommendation methods, and reusable evaluation harnesses.
Familiarity with vector databases, knowledge graphs, embeddings, metadata strategy, and data quality practices.
Familiarity with lightweight web prototyping tools such as Streamlit for sharing analyses and exploratory AI demos.
Experience communicating quantitative findings, assumptions, limitations, and recommendations to technical and non-technical audiences.
Effective use of coding agents or AI-assisted development tools such as Claude Code, Codex, Gemini CLI, GitHub Copilot, or similar tools.
Excitement for experimenting with the latest AI tools and technologies while applying scientific rigor to help discover, develop, and deliver innovative medicines.
Curious and inquisitive mindset, with comfort working in agile pods, learning new domains quickly, and adapting analysis plans as evidence emerges.

Responsibilities

Frame ambiguous business and scientific questions into measurable AI product hypotheses, success metrics, evaluation plans, and rapid experiments.
Contribute to six-sprint, 12-week AI Accelerator agile cycles by testing hypotheses, validating AI product increments, and adapting analyses during two-week sprints.
Build data science prototypes using Python, SQL, notebooks, APIs, and AWS-aligned data services.
Support sandboxed data problem solving in non-production environments, enabling agents and analysts to branch, transform, test, and audit code-plus-data experiments before promotion.
Evaluate and curate the analytical context that agents and analysts rely on, including explicit instructions, memory, data tools, and curated meaning from source materials, and recommend improvements based on measured impact on agent quality.
Develop analytical features, embeddings, classifiers, ranking/scoring methods, recommendation logic, simulation approaches, or optimization methods as needed for product outcomes.
Partner with Data Engineers to shape reliable datasets, retrieval corpora, metadata, and feature pipelines using S3, Athena, PostgreSQL/RDS, vector databases, and knowledge graphs.
Design and execute evaluations for LLM, RAG, and agentic workflows, with emphasis on context quality, knowledge curation, semantic evolution, and model quality.
Build evaluation rubrics, golden datasets, structured output validation, error taxonomies, hallucination risk measurement, and SME review loops.
Use tools such as LangGraph, LangSmith, PydanticAI, or similar frameworks to test agent behavior, retrieval quality, reasoning traces, and workflow reliability.
Evaluate whether curated enterprise context improves agent quality, reliability, traceability, and decision usefulness compared with raw document retrieval.
Assess model and agent outputs for quality, uncertainty, calibration, bias, hallucination risk, traceability, and fitness for intended use.
Explore approved proprietary and open model options through enterprise channels and recommend model/task pairings based on evidence, risk, cost, and performance.
Define KPIs and analytical measurement plans for AI products, including adoption, user behavior, workflow efficiency, scientific utility, and business value.
Use bi-weekly demos, sprint reviews, stakeholder feedback, and performance results to measure MVP progress and assess readiness for scaling or production transition.
Apply statistical modeling, experimental design, causal inference, or quasi-experimental methods where appropriate to separate signal from noise.
Create clear analyses, visualizations, and narratives that help product teams and stakeholders understand model behavior, limitations, and opportunities.
Partner with responsible AI, security, quality, and domain experts to ensure evaluations and analytics respect data privacy, scientific integrity, and enterprise governance.
Contribute reusable notebooks, context-quality evaluation harnesses, analytics templates, prompt/evaluation assets, and data science patterns that can be adopted across pods.
Participate in code reviews, analysis reviews, design discussions, and technical problem-solving with engineering and product teams.
Use coding agents and AI-assisted development tools effectively while validating outputs, documenting assumptions, and maintaining scientific rigor.
Continuously refine analytical priorities and backlogs as insights emerge, incorporating stakeholder input, performance results, and lessons learned throughout MVP development.
Coach peers on practical data science, evaluation design, measurement strategy, and evidence-based decision making in fast-moving AI delivery environments.

Benefits

Medical, pharmacy, dental, and vision care.
BMS Well-Being Account, BMS Living Life Better, and Employee Assistance Programs (EAP).
401(k) plan, short- and long-term disability, life insurance, accident insurance, supplemental health insurance, business travel protection, personal liability protection, identity theft benefit, legal support, and survivor support.
Flexible time off (unlimited, with manager approval, 11 paid national holidays)
160 hours annual paid vacation for new hires with manager approval, 11 national holidays, and 3 optional holidays
Unlimited paid sick time
Up to 2 paid volunteer days per year
Summer hours flexibility
Leaves of absence for medical, personal, parental, caregiver, bereavement, and military needs
Annual Global Shutdown between Christmas and New Year's Day.