Bioinformatics Engineer II

Children's Hospital of Philadelphia•Philadelphia, PA

Hybrid

About The Position

The Genomic Diagnostic Laboratory (GDL) of the Children's Hospital of Philadelphia is seeking a highly motivated Bioinformatics Engineer II with an interest in laboratory diagnostics to join our team. This is an exciting opportunity to work collaboratively within the bioinformatics team and with our laboratory directors, supervisors, analysts, genetic counselors, and wet bench team. Together, you will develop and improve pipelines and other tools that support accurate, timely diagnostic testing. This testing has a direct impact on establishing the diagnosis, prognosis, treatment, and management plan for our patients. The team also supports cutting-edge R&D efforts, including new clinical test development and clinical research. The ability to communicate in a highly matrixed environment will be key to the candidate's success.

Seasoned professionals as well as new graduates are welcome to apply and will be considered for the career ladder position commensurate with their experience. This position is hybrid, with weekly onsite work at our Philadelphia campus as appropriate.

The ideal candidate will have experience in the following:

Software development in Python (including working with Git and writing unit tests)
Bash scripting/Unix command line
Reading and comprehending SQL

In addition, experience in some of the following is preferred:

Developing ETL workflows between SQL databases
Deploying and troubleshooting virtual machines (specifically ones running RHEL)
Developing and/or using APIs (either RESTful or GraphQL-based)
Submitting and troubleshooting HPC jobs via a scheduler (Slurm, qsub, etc.)
Using open-source tools for processing NGS data (preferably in a reproducible manner via a workflow language such as Snakemake or WDL)
Backend programming (DevOps)
Full stack development
Ansible, deployment, and server management
Human genetics data (NGS data)

Requirements

Bachelor's Degree - Required
At least three (3) years of production, clinical, or research bioinformatics data experience - Required
Extensive knowledge of high-performance and parallel computing environments and data processing workflows.
Extensive experience with data storage frameworks.
Extensive knowledge of CPU- and I/O-intensive bioinformatics data analysis applications.
Demonstrated track record of optimizing systems to meet changing performance and load requirements.
Extensive knowledge of HPC systems and job management applications, including methods for profiling, benchmarking, and optimizing performance across multiple job types and scenarios in bioinformatics data processing.
Ability to independently plan and execute pipelines and workflows of high complexity.
Ability to independently engineer systems within the larger enterprise framework.
Strong UNIX/Linux expertise.
Expertise in support mechanisms for applications written in common bioinformatics languages such as R, Python, Perl or similar.
Expertise in support mechanisms for common bioinformatics applications, data sources, and data formats.
Knowledge of common microarray, NGS, mass spectrometry, or other high-throughput data formats.
Expertise with genomic data resources and analysis tools, such as UCSC Genome Browser, Bioconductor, ENCODE, and NCBI databases.
Demonstrated ability to develop and implement best practices for bioinformatics systems integration, testing, and deployment.
Expert knowledge of cloud computing concepts and applications.
Ability to lead discussions with various information systems and technology owners to achieve desired bioinformatics outcomes.
Software development in Python (including working with Git and writing unit tests)
Bash scripting/Unix command line
Reading and comprehending SQL

Nice To Haves

Master's Degree in a computational discipline or systems engineering - Preferred
At least four (4) years of production, clinical, or research bioinformatics data experience - Preferred
Developing ETL workflows between SQL databases
Deploying and troubleshooting virtual machines (specifically ones running RHEL)
Developing and/or using APIs (either RESTful or GraphQL-based)
Submitting and troubleshooting HPC jobs via a scheduler (Slurm, qsub, etc.)
Using open-source tools for processing NGS data (preferably in a reproducible manner via a workflow language such as Snakemake or WDL)
Backend programming (DevOps)
Full stack development
Ansible, deployment, and server management
Human genetics data (NGS data)

Responsibilities

Independently manage and evolve local large-scale bioinformatics pipelines.
Independently manage and evolve large-scale bioinformatics high performance computing (HPC) capability primarily at a local level.
Independently manage and evolve processes for effectively using enterprise-provided large-scale bioinformatics storage frameworks.
Work with Digital & Technology Services (DTS) and data center staff to ensure the appropriate installation, maintenance, and support of bioinformatics-dedicated hardware, software, and data storage.
Work with DTS staff to establish and maintain appropriate levels of availability, response time, and performance of bioinformatics software and systems.
Establish and implement integration and testing procedures following industry best practices for production bioinformatics systems integration and deployment.
Facilitate efficient transfer of bioinformatics data from data sources to data users with benchmarking and data quality checks.
Operationally manage and evolve a robust heterogeneous UNIX, Linux, and OS X environment, including integration with enterprise resources.
Contribute to structured benchmark-based evaluation of new technologies.
Ensure the appropriate installation, maintenance, and support of bioinformatics-dedicated hardware, software, and data storage by collaborating with Digital & Technology Services and data coordination staff.
Implement processing, storage, and manipulation of bioinformatics data in a primarily local environment.
Engage with and participate in discussions related to vendor-purchased systems and services, usually under supervision.
Provide continuous assessment of commercial and open-source bioinformatics data processing solutions by applying structured benchmark evaluation.
Identify and test application/pipeline defects and fixes.
Troubleshoot data discrepancies.
Serve as engineering resource on a variety of bioinformatics-focused projects.
Serve as engineering facilitator by assessing the needs of all stakeholders, including bioinformatics management, bioinformatics scientists, Digital & Technology Services staff, and principal investigators.
Mentor junior engineers and engineering groups as needed.
Advocate for developed solutions in discussions with external technology owners in order to ensure that enterprise systems allow for freedom of operation.
Under supervision, contribute to the development of a formal bioinformatics engineering plan and development roadmap.
Adopt and implement policies and standards for data quality, completeness, and reproducibility.
Adopt and implement policies and standards for performance benchmarking and system stability.
Maintain and audit all documentation required for transparency and reproducibility of operations and any relevant regulations (e.g., CAP, CLIA). Documentation may include configuration, processes, service records, asset inventories, topologies, admin manuals, job instructions, support contacts, and bug/issue tracking.
Install, maintain, and provide technical support for all software installations and associated hardware.