Paper List

Biophysics

Exactly Solvable Population Model with Square-Root Growth Noise and Cell-Size Regulation

2025-12-04

This paper addresses the fundamental gap in understanding how microscopic growth fluctuations, specifically those with size-dependent (square-root) no...
Bioinformatics

Assessment of Simulation-based Inference Methods for Stochastic Compartmental Models

2025-12-02

This paper addresses the core challenge of performing accurate Bayesian parameter inference for stochastic epidemic models when the likelihood functio...
Computational Biophysics

Realistic Transition Paths for Large Biomolecular Systems: A Langevin Bridge Approach

2025-12-01

This paper addresses the core challenge of generating physically realistic and computationally efficient transition paths between distinct protein con...
Bioinformatics

MoRSAIK: Sequence Motif Reactor Simulation, Analysis and Inference Kit in Python

2025-12-01

This work addresses the computational bottleneck in simulating prebiotic RNA reactor dynamics by developing a Python package that tracks sequence moti...
Bioinformatics

The BEAT-CF Causal Model: A model for guiding the design of trials and observational analyses of cystic fibrosis exacerbations

2025-12

This paper addresses the critical gap in cystic fibrosis exacerbation management by providing a formal causal framework that integrates expert knowled...
Theoretical Biology

A Theoretical Framework for the Formation of Large Animal Groups: Topological Coordination, Subgroup Merging, and Velocity Inheritance

2025-11-28

This paper addresses the core problem of how large, coordinated animal groups form in nature, challenging the classical view of gradual aggregation by...
Bioinformatics

ANNE Apnea Paper

2025-03

This paper addresses the core challenge of achieving accurate, event-level sleep apnea detection and characterization using a non-intrusive, multimoda...
Bioinformatics

DeeDeeExperiment: Building an infrastructure for integrating and managing omics data analysis results in R/Bioconductor

2025

This paper addresses the critical bottleneck of managing and organizing the growing volume of differential expression and functional enrichment analys...

Journal: ArXiv Preprint

Published: Unknown

Bioinformatics, NLP

Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4

Computer Science, Old Dominion University | Biomedical Informatics, University of Arkansas for Medical Sciences

Ivan Makohon, Mohamad Najafi, Jian Wu, Mathias Brochhausen, Yaohang Li

The 30-Second View

IN SHORT: This paper addresses the core challenge of generating accurate and clinically relevant patient notes from sparse inputs (ICD codes and basic demographics) by augmenting Chain-of-Thought prompting with semantic search and structured medical knowledge graphs.

Innovation (TL;DR)

Methodology: Proposes a novel hybrid prompting framework that integrates traditional Chain-of-Thought reasoning with semantic search results from a clinical corpus (the CodiEsp dataset) to provide contextual examples.
Methodology: Introduces the infusion of a structured clinical ontology knowledge graph (built from SNOMED CT OWL expressions) directly into the LLM prompt to ground generation in formal medical relationships and constraints.
Methodology/Biology: Demonstrates the first systematic approach to reversing the common ICD code classification task: comprehensive clinical notes are generated with ICD codes as the primary input, and the approach is evaluated on six distinct clinical cases.
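
The semantic search step named in the first item can be pictured as a nearest-neighbour lookup over note embeddings. The sketch below is a minimal illustration, not the authors' implementation: the function names `cosine_similarity` and `top_k_notes`, the toy three-dimensional vectors, and the note identifiers are all hypothetical, and it assumes the ICD-code query and the corpus notes have already been embedded by some model that this summary does not specify.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def top_k_notes(query_vec, corpus, k=2):
    """Return the ids of the k corpus notes most similar to the query.

    corpus is a list of (note_id, embedding) pairs; both are placeholders.
    """
    ranked = sorted(
        corpus,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [note_id for note_id, _ in ranked[:k]]


# Toy data: hypothetical embeddings, not taken from CodiEsp.
corpus = [
    ("note_a", [1.0, 0.0, 0.0]),
    ("note_b", [0.0, 1.0, 0.0]),
    ("note_c", [0.9, 0.1, 0.0]),
]
query = [1.0, 0.0, 0.0]  # stands in for an embedded ICD-code query
nearest = top_k_notes(query, corpus, k=2)
```

The retrieved notes would then serve as in-context examples in the prompt; how many to retrieve is a design choice this summary does not state.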

Key conclusions

The proposed CoT prompting with semantic search (using ICD codes as query) consistently outperformed the standard one-shot baseline across six clinical cases, as evidenced by lower cosine distance scores (e.g., Case C showed a clear leftward shift in KDE peak, indicating higher semantic similarity to ground truth).
Incorporating a clinical knowledge graph (SNOMED CT OWL) into the prompt (CoT KG) provided structured medical relationships, enriching the generated notes with domain-specific terminology and logical constraints derived from formal ontologies.
The hybrid approach (CoT Semantic Search + KG) leverages both in-context examples from similar cases and formal medical knowledge, offering a robust framework for improving the factual accuracy and clinical relevance of LLM-generated notes from coded inputs.
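
The cosine distance metric used in the first conclusion can be made concrete with a short sketch. This assumes each generated note and the ground-truth note have already been mapped to embedding vectors (the embedding model is not given in this summary); the helper names `cosine_distance` and `mean_distance` and the two-dimensional toy vectors are hypothetical and do not reproduce any result from the paper.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance(generated_vecs, reference_vec):
    """Average distance of several generated-note embeddings to one reference."""
    distances = [cosine_distance(g, reference_vec) for g in generated_vecs]
    return sum(distances) / len(distances)


# Invented toy vectors, purely to exercise the functions.
reference = [1.0, 0.0]
near_vecs = [[1.0, 0.0], [1.0, 1.0]]  # closer in direction to the reference
far_vecs = [[0.0, 1.0], [1.0, 1.0]]   # further from the reference

near_score = mean_distance(near_vecs, reference)
far_score = mean_distance(far_vecs, reference)
```

A lower score means the generated notes sit closer to the ground truth in embedding space, which is the sense in which a leftward shift of the distance distribution indicates higher semantic similarity.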

Background and Gap: Current LLM applications in clinical note generation often rely on extensive patient narratives or transcripts as input. A significant bottleneck is generating coherent, detailed notes from highly structured but information-sparse inputs such as diagnosis codes alone, which lack contextual clinical reasoning and domain knowledge grounding.

Abstract: In the past decade, there has been a surge in the amount of electronic health record (EHR) data in the United States, attributed to a favorable policy environment created by the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 and the 21st Century Cures Act of 2016. Clinical notes for patients’ assessments, diagnoses, and treatments are captured in these EHRs in free-form text by physicians, who spend a considerable amount of time entering and editing them. Manually writing clinical notes takes a considerable amount of a doctor’s valuable time, increasing the patient’s waiting time and possibly delaying diagnoses. Large language models (LLMs) possess the ability to generate news articles that closely resemble human-written ones. We investigate the usage of Chain-of-Thought (CoT) prompt engineering to improve the LLM’s response in clinical note generation. In our prompts, we use as input International Classification of Diseases (ICD) codes and basic patient information. We investigate a strategy that combines the traditional CoT with semantic search results to improve the quality of generated clinical notes. Additionally, we infuse a knowledge graph (KG) built from a clinical ontology to further enrich the domain-specific knowledge of generated clinical notes. We test our prompting technique on six clinical cases from the CodiEsp test dataset using GPT-4, and our results show that the resulting notes outperformed those generated by standard one-shot prompts.
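
The prompt composition described in the abstract (ICD codes and basic patient information, retrieved example notes, knowledge-graph facts, and a step-by-step instruction) could be assembled roughly as follows. This is a sketch under stated assumptions rather than the authors' template: the `PatientInput` class, the `build_prompt` function, the section labels, the instruction wording, and the sample triple are all hypothetical, and the call to GPT-4 is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class PatientInput:
    """Sparse input: ICD-10 codes plus basic demographics (hypothetical schema)."""
    icd_codes: list
    age: int
    sex: str


def build_prompt(patient, example_notes, kg_triples):
    """Join patient data, retrieved examples, and KG facts into one prompt string."""
    lines = [
        "Task: write a clinical note for the patient described below.",
        f"Patient: {patient.age}-year-old {patient.sex}.",
        "ICD-10 codes: " + ", ".join(patient.icd_codes),
    ]
    if kg_triples:
        # Each triple is (subject, relation, object), e.g. drawn from an ontology.
        lines.append("Ontology facts:")
        lines.extend(f"- {s} | {r} | {o}" for s, r, o in kg_triples)
    for i, note in enumerate(example_notes, start=1):
        lines.append(f"Example note {i}: {note}")
    # Chain-of-Thought cue; the exact wording here is an assumption.
    lines.append("Reason step by step about the codes, then write the note.")
    return "\n".join(lines)


# Illustrative placeholder values only.
patient = PatientInput(icd_codes=["J18.9"], age=67, sex="female")
prompt = build_prompt(
    patient,
    example_notes=["<retrieved note text>"],
    kg_triples=[("Pneumonia", "finding site", "Lung structure")],
)
```

Keeping the ontology facts and the retrieved examples in separate labeled sections makes it easy to switch either one off, which mirrors the comparison of the CoT, semantic search, and KG variants.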