The Onyx Research Data Platform organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step change in our ability to leverage data, knowledge, and prediction to find new medicines. We are a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:

- Building a unified, automated, next-generation data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity and reducing data friction
- Providing best-in-class AI/ML, GenAI, and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
- Aggressively engineering our data at scale to unlock the value of our combined data assets and predictions in real time

Data Engineering is responsible for the design, delivery, support, and maintenance of industrialised, automated, end-to-end data services and pipelines. The team applies standardised data models and mappings to ensure data is accessible to end users in end-to-end user tools through the use of APIs. They define and embed best practices, ensure compliance with Quality Management practices, and maintain alignment with automated data governance. They also acquire and process internal and external, structured and unstructured data in line with Product requirements.

As a Data Engineer II, you are a technical contributor who can take a well-defined specification for a function, pipeline, service, or other component, devise a technical solution, and deliver it to a high standard. You are aware of, and adhere to, best practices for software development in general (and data engineering in particular), including code quality, documentation, DevOps practices, and testing. You ensure the robustness of our services and serve as an escalation point in the operation of existing services, pipelines, and workflows.

You will work across structured, unstructured, and scientific data domains, applying modern engineering and automation best practices to deliver reliable, scalable, and governed data products. You will also contribute to emerging GenAI-enabled data capabilities, such as embedding pipelines, vectorized data flows, and LLM-ready data products.

You should be deeply familiar with the most common tools (languages, libraries, etc.) in the data space, such as Spark, Kafka, and Storm, and be aware of the open-source communities that revolve around these tools. You have a strong focus on the operability of your tools and services, and you develop, measure, and monitor key metrics for your work, seeking opportunities to improve them.
Job Type: Full-time
Career Level: Mid Level