NLP LLM Specialist

Universities of Wisconsin•Madison, WI

20h•$76,289•Remote

About The Position

The Large Language Model (LLM) / Natural Language Processing (NLP) Engineer will serve as a hands-on technical contributor responsible for building, integrating, and operationalizing advanced language-model capabilities within the Wisconsin Health Data Hub (WHDH) platform. WHDH is a federally funded initiative developing a secure, cloud-native data ecosystem designed to support biomedical research, advanced analytics, and AI-driven discovery using real-world health data.

This role focuses on the practical implementation of NLP and generative AI technologies that enable scalable analysis of large volumes of unstructured healthcare data such as clinical notes, research publications, and other text-based datasets. The engineer will design and deploy production-grade AI services, integrate LLM capabilities into the WHDH platform, and support researchers and partner organizations in leveraging these tools for applied healthcare analytics. The position requires a strong engineering mindset and the ability to translate emerging AI capabilities into reliable, scalable solutions operating within a secure research data environment.

The Wisconsin Health Data Hub (WHDH) is a grant-funded initiative within the Office of Informatics and Information Technology (IIT) at the University of Wisconsin–Madison School of Medicine and Public Health. WHDH brings together a multidisciplinary team of technologists responsible for designing, implementing, and operating a secure data enclave that supports the responsible use of real-world health data for biomedical research. The WHDH team develops and manages a scalable data platform that enables researchers to efficiently access, integrate, and analyze large-scale health datasets from participating health systems.

By providing advanced data services, governance frameworks, and analytical capabilities, WHDH accelerates the research lifecycle, from project conception and data acquisition to analysis and discovery, while ensuring compliance with applicable regulatory, privacy, and security requirements.

Requirements

3 years of full-time professional experience building or deploying NLP or machine learning solutions in production environments.
Strong programming experience in Python and familiarity with modern NLP frameworks such as Hugging Face Transformers, spaCy, PyTorch, or TensorFlow.
Experience working with large-scale data processing pipelines and distributed data environments.
Experience deploying AI models using containerization technologies such as Docker and orchestration frameworks such as Kubernetes.
Ability to design and build scalable APIs and backend services supporting AI-powered applications.

Nice To Haves

5 years of full-time professional experience building or deploying NLP or machine learning solutions in production environments.
Experience working with biomedical or clinical text data.
Familiarity with healthcare data models and standards such as FHIR, OMOP, or UMLS.
Experience developing AI solutions in cloud environments such as AWS, Azure, or Google Cloud.
Experience with MLOps practices including model deployment, monitoring, and lifecycle management.
Familiarity with vector databases, embedding models, and retrieval-augmented generation (RAG) architectures.
Experience building generative AI applications using modern LLM frameworks.
PhD preferred, with a focus in Computer Science, Software Engineering, Artificial Intelligence, Data Science, or a related technical field.

Responsibilities

Design, implement, and maintain production-ready NLP pipelines for processing large volumes of unstructured healthcare and biomedical text data.
Fine-tune, deploy, and optimize large language models for domain-specific applications including clinical text analysis, semantic search, and automated summarization.
Develop services for entity extraction, concept normalization, document classification, and information retrieval from healthcare datasets.
Build reusable NLP components and APIs that can be integrated into analytics workflows across the WHDH platform.
Integrate LLM and NLP capabilities into WHDH’s cloud-based data and analytics platform.
Develop scalable APIs and microservices that enable secure access to language-model capabilities by research teams and application developers.
Implement containerized services and deployment pipelines to operationalize AI models in production environments.
Work with teams to ensure NLP pipelines operate efficiently within large-scale distributed data processing environments.
Collaborate with platform engineers and domain experts to design AI-driven solutions that address real-world healthcare data challenges.
Translate emerging LLM capabilities into practical tools for clinical text processing, data enrichment, and knowledge extraction.
Rapidly prototype and iterate AI-enabled features that improve usability and accessibility of the WHDH data platform.
Support applied analytics initiatives that leverage LLM capabilities to enhance research workflows.
Ensure all AI solutions comply with institutional data governance policies and healthcare data privacy requirements.
Implement safeguards for secure handling of sensitive healthcare text data within NLP workflows.
Support responsible use of generative AI technologies through appropriate monitoring, evaluation, and documentation practices.
Collaborate with platform security teams to ensure compliance with HIPAA-aligned infrastructure requirements.
Prepare data sets for analysis, including cleaning and quality assurance, transformations, restructuring, and integration of multiple data sources.
Serve as an institutional subject matter expert and liaison to key internal and external stakeholders on data science best practices and methodologies, and represent the interests of data science.
Compose and assemble reproducible workflows and reports that clearly articulate patterns to researchers and administrators.
Leverage modern NLP frameworks and LLMs to extract critical insights from unstructured clinical notes and reports, ensuring data quality and integrity through rigorous preprocessing.
Develop predictive models using retrospective real-world data to estimate disease risk, progression, and treatment effectiveness, while addressing bias and fairness.
Design and execute rigorous hypothesis testing on observational datasets to validate research findings.
Work closely with data governance and security teams to ensure compliance with privacy and security requirements (e.g., HIPAA, NIST) when working with healthcare data, and address bias and fairness issues in AI models that use sensitive health data.
Develop and implement informatics pipelines for the processing, integration, and harmonization of heterogeneous data sources.
Identify and implement, or guide others in implementing, appropriate data science techniques to find data patterns and answer research questions chosen by the lead researcher, including data visualization, statistical analysis, machine learning, and data mining.
Organize and automate project steps for data preparation and analysis.
Document approaches to address research questions and contribute to the establishment of reproducible research methodologies and analysis workflows.