About The Position

The Human Genomics and Translational Data Sciences team within Cardiometabolic Research Data Science is hiring a Bioinformatics Pipeline Engineer to help build, solidify, and scale the analytical pipelines our scientists rely on every day. Our work spans multiple omics workflows, including target discovery and target due diligence, single-cell sequencing, genomics, proteomics, and, increasingly, AI-assisted workflows that pull these analyses together into faster, more reproducible products for therapeutic area partners across Lilly Research Labs.

This role sits at the intersection of two worlds. On one side, we employ classical bioinformatics and statistical genetics pipelines: robust, reproducible, well-tested workflows that turn messy public and proprietary genomics data into trustworthy answers. On the other, we work with the rapidly evolving stack of AI tooling: large language models like Claude, agentic workflows, AI-friendly connectors built on MCP (Model Context Protocol), and the code that lets scientists query complex datasets in natural language. We want someone who is genuinely curious about both, and keen to use both to increase the value we derive from our datasets for target support and novel target discovery.

You will not be expected to be a senior expert in either domain on day one. You will be expected to bring strong software engineering instincts, along with the curiosity and creativity to get more out of the tools and datasets at our disposal. You will work closely with statistical geneticists, computational biologists, and other engineers, both within our team and across Lilly, to ship tools that make the science faster and more reliable.

Requirements

  • B.S. in computer science, computational biology, bioinformatics, biological sciences, statistics, or a related field, with 10+ years relevant work experience
  • OR M.S. in computer science, computational biology, bioinformatics, biological sciences, statistics, or a related field, with 7+ years relevant work experience
  • OR Ph.D. in computer science, computational biology, bioinformatics, biological sciences, statistics, or a related field, with 1+ years relevant work experience
  • Strong programming skills in Python and/or R, including comfort with version control (Git), code review, testing, and writing maintainable code
  • Demonstrated ability to build stable, practical, reusable workflows rather than one-off analysis code, with strong implementation skills in Python and modern AI/ML tooling
  • A collaborative, low-ego mentality; you enjoy building tools that other people use and you take feedback well
  • Comfort with cloud computing environments (AWS, GCP, or Azure) and Linux/command-line work
  • Ability to work successfully in a matrixed environment

Nice To Haves

  • Demonstrated experience building data analysis pipelines, ideally using a workflow manager such as Nextflow, Snakemake, or WDL
  • Working familiarity with bioinformatics file formats (VCF, BED, GTF, BAM, etc.) and standard tools (PLINK, samtools, bcftools, or similar)
  • Familiarity with typical data types in high-throughput biology, including NGS data
  • Hands-on experience or strong demonstrated interest in modern AI tooling — using LLMs through APIs, building MCP servers/connectors, prompt engineering, or wiring up agentic workflows
  • Prior experience with statistical workflows/biomedical statistics
  • Prior exposure to statistical genetics methods (GWAS, fine-mapping, MR, colocalization, burden testing) or large-scale genomic datasets (UK Biobank, gnomAD, GTEx, Open Targets)
  • Prior experience with complex high-throughput biological data or experiments such as spatial transcriptomics, large-scale screens, or multi-omics studies
  • Familiarity with R in addition to Python, particularly for statistical genetics packages
  • Experience with relational and/or graph databases, and with biomedical ontologies
  • Contributions to open-source projects or a public portfolio (GitHub, blog posts, demos)
  • Prior experience in pharma, biotech, or academic genomics research

Responsibilities

  • Support computational biology workflows, including single-cell, spatial, and other multi-omics analysis workflows for clinical and preclinical applications
  • Use modern workflow managers (e.g. Nextflow, Snakemake, or similar) and containerization (Docker, Singularity) to make pipelines portable, testable, and reusable across projects and teams
  • Help build and maintain reproducible analytical pipelines for statistical genetics and bioinformatics workflows
  • Wrap and harden ad-hoc analytical scripts written by scientists into production-quality tools that can be re-run reliably by others
  • Write tests, documentation, and clear examples so the pipelines you build are usable by colleagues with a range of technical backgrounds
  • Prototype agentic workflows that automate established and routine analytical tasks — for example, pulling target evidence across data sources, generating standardized due-diligence reports, or letting scientists interrogate complex datasets in natural language
  • Build and maintain MCP connectors that expose internal data, public resources, and analytical pipelines to LLM-based agents and tools like Claude
  • Identify and develop use cases where LLMs and agentic AI workflows can improve the speed, quality, consistency, or accessibility of work across therapeutic areas, focusing on end-to-end capabilities rather than isolated task completion
  • Contribute to a shared library of reusable AI tooling, prompt patterns, and integration code that the team can build on. Define technical standards for evaluation, documentation, guardrails, and workflow quality so that AI-based solutions are trusted, reproducible, and suitable for repeated use across teams and projects
  • Stay current with the AI tooling landscape and bring back ideas the team can put to work; help improve AI fluency among collaborators by demonstrating practical workflows
  • Partner closely with statistical geneticists, computational biologists, and software engineers within the Cardiometabolic Data Science group and across other Lilly Research Labs teams
  • Work with therapeutic area partners to understand their analytical needs and translate them into pipeline requirements
  • Coordinate with platform and engineering groups to ensure your pipelines integrate cleanly with broader Lilly infrastructure
  • Contribute to internal knowledge sharing — code reviews, demos, documentation, and helping colleagues get unblocked
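To give a flavor of the "wrap and harden ad-hoc scripts" responsibility above, here is a minimal, purely illustrative sketch of turning a hypothetical one-off filtering snippet into a reusable, testable command-line tool. All names and thresholds are invented for illustration and do not describe any actual Lilly pipeline.

```python
# Illustrative only: a hypothetical one-off VCF QUAL filter, restructured
# into a pure function (easy to unit-test) plus a thin argparse CLI wrapper.
import argparse
import sys


def filter_variants(lines, min_qual=30.0):
    """Yield VCF lines whose QUAL field (column 6) meets the threshold.

    Header lines (starting with '#') are always passed through unchanged.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        try:
            qual = float(fields[5])
        except (IndexError, ValueError):
            continue  # skip malformed records instead of crashing mid-run
        if qual >= min_qual:
            yield line


def main(argv=None):
    parser = argparse.ArgumentParser(description="Filter VCF records by QUAL.")
    parser.add_argument("vcf", type=argparse.FileType("r"), help="input VCF")
    parser.add_argument("--min-qual", type=float, default=30.0)
    args = parser.parse_args(argv)
    for line in filter_variants(args.vcf, args.min_qual):
        sys.stdout.write(line)


if __name__ == "__main__":
    main()
```

Separating the filtering logic from the I/O and argument parsing is what makes a script like this re-runnable and reviewable by others: the core function can be unit-tested and imported into a workflow manager task, while the CLI stays a thin shell.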

Benefits

  • company bonus (depending, in part, on company and individual performance)
  • company-sponsored 401(k)
  • pension
  • vacation benefits
  • medical, dental, vision and prescription drug benefits
  • flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts)
  • life insurance and death benefits
  • certain time off and leave of absence benefits
  • well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities)