Data Engineer II – Data & Analytics IT GDD Data Product - Safety & Regulatory

Bristol Myers Squibb • Princeton, NJ

109d

About The Position

As a Data Engineer II, you will help build and operate reliable, secure, and user-centric data and analytics capabilities for Global Drug Development (GDD) across Safety and Regulatory data products. You will design and optimize cloud-native data pipelines, dimensional models, and self-service analytics to support Pharmacovigilance (PV) scientists, Regulatory specialists, and cross-functional stakeholders. You will also leverage GenAI and semantic search techniques (e.g., RAG, vector embeddings) to improve data discovery, compliance automation, and decision support. This role is ideal for an engineer with 3–5 years of experience who thrives in regulated environments, can reverse-engineer complex systems, and enjoys turning ambiguous requirements into scalable, production-grade solutions.

Requirements

3–5 years of hands-on experience in Data Engineering/Analytics delivering production-grade data pipelines, models, and analytics in a cloud environment (AWS preferred) within regulated or life-sciences settings (highly preferred).
Bachelor’s degree in Engineering or a Scientific discipline required.
Proficiency in Python and SQL with experience in Spark (PySpark) for data processing; strong Linux/shell scripting skills.
Demonstrated experience with AWS data services and tooling, including S3 and Redshift; familiarity with data lake patterns and scheduling/orchestration (e.g., cron or similar tools).
Proven capability in dimensional modeling, data mart design, and source-to-target mapping with strong documentation and lineage practices.
Hands-on experience operationalizing dashboards and self-service analytics using Spotfire and/or Power BI; familiarity with QuickSight is a plus.
Practical experience building GenAI/NLP features: RAG pipelines, vector embeddings (e.g., FAISS), prompt engineering, and frameworks such as LangChain; familiarity with OpenAI/Anthropic/Llama.
Strong stakeholder engagement and communication skills; demonstrated ability to collaborate across onshore/offshore teams and drive clarity from ambiguous requirements.
Ability to reverse-engineer and document complex systems, and to support adoption among cross-functional teams.

Nice To Haves

Master’s degree in Analytics or related field preferred.

Responsibilities

Contribute to cross-functional data and AI initiatives across WWPS and Regulatory product lines; collaborate closely with PV scientists, Regulatory leads, data product owners, and engineering teams to deliver high-impact outcomes.
Design, build, and optimize scalable ETL/ELT pipelines and data models using AWS-native services (e.g., S3, Redshift) and Spark for large, complex life-sciences datasets; implement robust orchestration (e.g., cron or similar) and monitoring.
Develop and maintain dimensional models and data marts; define clear source-to-target mappings, data lineage, and documentation to support auditability, validation, and reuse.
Migrate and modernize data pipelines (e.g., Postgres to Redshift), reducing refresh latency and improving availability, performance, and cost efficiency through techniques like partitioning, distribution/sort keys, and caching.
Architect and deliver GenAI/NLP-powered features for data discovery and compliance automation using RAG, vector embeddings (e.g., FAISS), and frameworks like LangChain with OpenAI/Anthropic/Llama.
Build self-service analytics and interactive dashboards (Tableau, QuickSight, Power BI) to support operational and regulatory decision-making (e.g., query forecasting, submission tracking, safety signal exploration).
Ingest and harmonize data from multiple clinical programs into S3-backed data lakes; implement Spark transforms and Redshift models to expand safety and adverse event data domains.
Partner with PV and QA teams to plan and execute functional/regression/validation testing; document test evidence and support GxP-aligned processes to ensure high-quality releases.
Drive PoCs for governance and analytics, evaluate emerging patterns, and translate learnings into scalable platform capabilities.
Contribute to engineering standards, code reviews, and documentation; collaborate with onshore/offshore teams and mentor interns/junior analysts on best practices and business alignment.
Stay current on trends in GenAI, RAG, vector databases, semantic search, and cloud data engineering; propose and integrate best practices for continuous platform improvement.

Benefits

Health Coverage: Medical, pharmacy, dental, and vision care.
Wellbeing Support: Programs such as BMS Well-Being Account, BMS Living Life Better, and Employee Assistance Programs (EAP).
Financial Well-being and Protection: 401(k) plan, short- and long-term disability, life insurance, accident insurance, supplemental health insurance, business travel protection, personal liability protection, identity theft benefit, legal support, and survivor support.
Work-life benefits include:
Paid Time Off
US Exempt Employees: flexible time off (unlimited, with manager approval) and 11 paid national holidays (not applicable to employees in Phoenix, AZ, Puerto Rico or Rayzebio employees).
Phoenix, AZ, Puerto Rico and Rayzebio Exempt, Non-Exempt, Hourly Employees: 160 hours annual paid vacation for new hires with manager approval, 11 national holidays, and 3 optional holidays.
Based on eligibility, additional time off for employees may include unlimited paid sick time; up to 2 paid volunteer days per year; summer hours flexibility; leaves of absence for medical, personal, parental, caregiver, bereavement, and military needs; and an annual Global Shutdown between Christmas and New Year's Day.
All global employees, full and part-time, who are actively employed at and paid directly by BMS at the end of the calendar year are eligible to take advantage of the Global Shutdown.
