GSK · Posted 1 day ago
Full-time • Mid Level
San Francisco, CA
5,001-10,000 employees

The Onyx Research Data Tech organization is GSK’s Research data ecosystem, with the capability to bring together, analyze, and power the exploration of data at scale. We partner with scientists across GSK to define and understand their challenges and develop tailored solutions that meet their needs. The goal is to ensure scientists have the right data and insights when they need them, giving them a better starting point for medical discovery and accelerating it. Ultimately, this helps us get ahead of disease in more predictive and powerful ways.

Onyx is a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data/metadata/knowledge platforms, and AI/ML and analysis platforms, all geared toward:
  • Building a next-generation, metadata- and automation-driven data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity and reducing time spent on “data mechanics”
  • Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
  • Aggressively engineering our data at scale, as one unified asset, to unlock the value of our unique collection of data and predictions in real time

The Onyx Data Architecture team sits within the Data Engineering team, which is responsible for the design, delivery, support, and maintenance of industrialized, automated, end-to-end data services and pipelines. The team applies standardized data models and mappings to ensure data is accessible to end users in end-to-end user tools through APIs. It defines and embeds best practices, ensures compliance with Quality Management practices, and maintains alignment with automated data governance. It also acquires and processes internal and external, structured and unstructured data in line with Product requirements.

As a Data Architect II, you'll apply your expertise in big data and AI/GenAI workflows to support GSK's complex, regulated R&D environment. You'll contribute to designing Data Mesh/Data Fabric architectures while enabling modern AI and machine learning capabilities across our platform.

In this role, you will:
  • Partner with the Scientific Knowledge Engineering team to develop physical data models that underpin fit-for-purpose data products
  • Design data architecture aligned with enterprise-wide standards to promote interoperability
  • Collaborate with the platform teams and data engineers to maintain architecture principles, standards, and guidelines
  • Design data foundations that support GenAI workflows including RAG (Retrieval-Augmented Generation), vector databases, and embedding pipelines
  • Work across business areas and stakeholders to ensure consistent implementation of architecture standards
  • Lead reviews and maintain architecture documentation and best practices for Onyx and our stakeholders
  • Adopt security-first design with robust authentication and resilient connectivity
  • Provide best practices, along with leadership, subject-matter, and GSK expertise, to architecture and engineering teams composed of GSK FTEs, strategic partners, and software vendors

Basic qualifications:
  • Bachelor’s degree in Computer Science, Engineering, Data Science, or a similar discipline
  • 5+ years of experience in data architecture, data engineering, or related fields in pharma, healthcare, or life sciences R&D
  • 3+ years of experience defining architecture standards and patterns on big data platforms
  • 3+ years’ experience with data warehouse, data lake, and enterprise big data platforms
  • 3+ years’ experience with enterprise cloud data architecture (preferably Azure or GCP) and delivering solutions at scale
  • 3+ years of hands-on experience with relational, dimensional, and/or analytic data modeling (using RDBMS, dimensional, and NoSQL data platform technologies, along with ETL and data ingestion protocols)

Preferred qualifications:
  • Master's or PhD in Computer Science, Engineering, Data Science, or a similar discipline
  • Deep knowledge of and experience using at least one common programming language, e.g., Python, Scala, or Java
  • Experience with AI/ML data workflows: feature stores, vector databases, embedding pipelines, model serving architectures
  • Familiarity with GenAI/LLM data patterns: RAG architectures, prompt engineering data requirements, fine-tuning data preparation
  • Experience with the GCP data/analytics stack: Spark, Dataflow, Dataproc, GCS, BigQuery
  • Experience with enterprise data tools: Ataccama, Collibra, Acryl
  • Experience with Agile frameworks and tooling: SAFe, Jira, Confluence, Azure DevOps
  • Experience applying CI/CD principles to data solutions
  • Experience with Spark and RAG-based architectures for data science and ML use cases
  • Strong communication skills, with the ability to explain technical concepts to non-technical stakeholders
  • Pharmaceutical, healthcare, or life sciences background

Benefits include:
  • Health care and other insurance benefits (for employee and family)
  • Retirement benefits
  • Paid holidays
  • Vacation
  • Paid caregiver/parental and medical leave
  • Annual bonus
  • Eligibility to participate in our share-based long-term incentive program, which is dependent on the level of the role