Data Scientist

UCSF • San Francisco, CA

About The Position

This role involves developing and using computational tools and systems to analyze and interpret biological or other research data. The Data Scientist will use and develop algorithms, computational techniques, and standard statistical methodologies. They will assist in the design of new experiments and lead the building of machine learning and statistical models. Responsibilities include implementing end-user needs in database development, maintenance, searching, and integration. The role also entails maintaining computational infrastructure and managing the flow of samples and information for large-scale studies, providing bioinformatics support and access to public and proprietary databases, and managing cloud and on-premises computational infrastructure and data.
Our research is at the intersection of cardiovascular disease and human genetics, employing new techniques like deep learning for deep phenotyping, which relies on a solid foundation of classical bioinformatics. The Bioinformatics Programmer/Data Scientist will assist in managing, cleaning, and analyzing large-scale medical data using a wide variety of analytic techniques, both in the cloud and with on-premises compute depending on data permissions. Experience with a cloud provider such as AWS, Microsoft Azure, or Google Cloud is a plus, and the ability to learn how to manage cloud-based pipelines and perform cloud data management is essential.
Core responsibilities include maintaining bioinformatic databases by obtaining and restructuring data, including both UCSF proprietary data and public data, and writing tools to streamline discovery and replication analyses using these databases. An important task will be writing and maintaining analytic pipelines in languages such as R, Python, Go, Rust, shell, SQL, WDL, and/or other appropriate languages, and using tools such as Docker. Experience with databases, or the ability to learn to work with them, is required.
Under the supervision of the PI, the Data Scientist will also be involved in data analysis and should be comfortable with bioinformatic analyses, including variant calling and annotation. There will be opportunities to employ cutting-edge methods and to develop new ones. The ability to learn and implement new techniques depending on the problem at hand will be an essential skill, and it requires a strong foundation in computer programming. This position also includes administrative duties and offers the opportunity to participate in, and to lead, authorship teams.

Requirements

Working knowledge of the job skills, policies, and procedures needed to complete substantive assignments, projects, and tasks of moderate scope and complexity.
Ability to exercise judgment within defined guidelines and practices to determine appropriate action.
Experience with a cloud provider such as AWS, Microsoft Azure, or Google Cloud is a plus.
Ability to learn how to manage cloud-based pipelines.
Ability to perform cloud data management.
Experience with databases, or the ability to learn to work with them.
Comfortable with bioinformatic analyses including variant calling and annotation.
Ability to learn and implement new techniques depending on the problem at hand.
Strong foundation in computer programming.
Experience with R, Python, Go, Rust, shell, SQL, WDL, or other appropriate languages.
Experience with tools such as Docker.

Nice To Haves

Experience with a cloud provider such as AWS, Microsoft Azure, or Google Cloud.

Responsibilities

Develop and utilize computational tools and systems to analyze and interpret biological or other research data.
Utilize and develop algorithms, computational techniques, and standard statistical methodologies.
Assist in the design of new experiments.
Lead the building of machine learning and statistical models.
Implement end-user needs in database development, maintenance, searching, and integration.
Maintain computational infrastructure.
Manage and track the flow of samples and information for large-scale studies.
Provide bioinformatics support and access to public and proprietary databases.
Manage cloud and on-premises computational infrastructure and data.
Assist in managing, cleaning, and analyzing large-scale medical data using a wide variety of analytic techniques, both in the cloud and with on-premises compute.
Learn how to manage cloud-based pipelines and perform cloud data management.
Maintain bioinformatic databases by obtaining and restructuring data, including both UCSF proprietary data and public data.
Write tools to streamline discovery and replication analyses using these databases.
Write and maintain analytic pipelines in languages such as R, Python, Go, Rust, shell, SQL, WDL, and/or other appropriate languages.
Use tools such as Docker.
Perform data analysis, including bioinformatic analyses such as variant calling and annotation.
Employ cutting-edge methods and develop new methods.
Learn and implement new techniques depending on the problem at hand.
Perform administrative duties.
Participate in and lead authorship teams.