Principal Data Scientist

Structure Therapeutics • South San Francisco, CA

About The Position

Structure Therapeutics develops life-changing medicines for patients using advanced structure-based and computational drug discovery technology. The company’s platform combines the latest advances in visualizing molecular interactions, computational chemistry, and data integration to design superior, orally available small-molecule medicines that overcome the current limitations of biologic and peptide drugs. We are advancing a clinical-stage pipeline of differentiated treatments focused on chronic diseases with high unmet need, including cardiovascular, metabolic, and pulmonary conditions. Structure Therapeutics is led by an experienced group of international drug innovators and financed by top-tier global life sciences investors. The company completed its initial public offering (IPO) in February 2023. With offices in California and Shanghai, Structure Therapeutics is positioned at the center of life science innovation in both the US and China and capitalizes on the strengths of each location.

We are seeking a highly motivated Principal Data Scientist with strong machine learning engineering expertise to join the Biometrics and Data Management group. In this role, you will lead efforts to advance the current platform by developing transformative, scalable, AI-powered data science tools that support evolving business needs. As a technical lead in a multidisciplinary environment, you will work closely with clinical scientists, biostatisticians, data managers, statistical programmers, and other stakeholders to design, implement, and deploy robust data-driven solutions that accelerate decision-making and improve patient outcomes. In addition to modeling and analytics work, you will lead the development of production-grade tools and frameworks that extend the capabilities of existing internal platforms.

Requirements

Bachelor’s degree in Computer Science, Data Science, Statistics, Mathematics, or a related field.
Minimum of 12 years of Data Science experience with a Bachelor’s degree, 10 years with a Master’s, or 6 years with a Ph.D., including at least 5 years of hands-on experience developing production-grade software in programming languages such as Python or R.
Demonstrated understanding of software architecture, performance optimization and validation.
Strong expertise in machine learning and deep learning techniques and frameworks (e.g., scikit-learn, TensorFlow, or PyTorch).
Demonstrated experience deploying and maintaining models in production (e.g., APIs, batch jobs, microservices).
Strong analytical thinking and problem-solving skills.
Experience with GitHub.

Nice To Haves

Master’s or Ph.D. degree in Computer Science, Machine Learning, Statistics, Mathematics, or a related field.
Demonstrated expertise in developing and optimizing scalable data architectures and analytics pipelines within cloud environments (AWS, GCP, or Azure), including experience with containerization (e.g., Docker).
Experience creating Large Language Model (LLM) and GenAI-based solutions in production environments.
Experience designing and implementing CI/CD pipelines.

Responsibilities

Collaborate with cross-functional teams to understand business needs and gather insights that inform internal data science tool development.
Develop and validate data science tools for clinical research and trial optimization.
Ensure the scalability, maintainability, and reliability of analytical solutions through software engineering best practices (e.g., testing, documentation, CI/CD pipelines).
Design, build, and implement scalable, secure, and efficient infrastructure on AWS.
Communicate technical results and insights to stakeholders through clear, structured presentations and documentation.