Paper List

Bioinformatics

Discovery of a Hematopoietic Manifold in scGPT Yields a Method for Extracting Performant Algorithms from Biological Foundation Model Internals

2026-03-10

This work addresses the core challenge of extracting reusable, interpretable, and high-performance biological algorithms from the opaque internal repr...

Bioinformatics

MS2MetGAN: Latent-space adversarial training for metabolite–spectrum matching in MS/MS database search

2026-03-07

This paper addresses the critical bottleneck in metabolite identification: the generation of high-quality negative training samples that are structura...

Neuroscience

Toward Robust, Reproducible, and Widely Accessible Intracranial Language Brain-Computer Interfaces: A Comprehensive Review of Neural Mechanisms, Hardware, Algorithms, Evaluation, Clinical Pathways and Future Directions

2026-03-03

This review addresses the core challenge of fragmented and heterogeneous evidence that hinders the clinical translation of intracranial language BCIs,...

Mathematical Biology

Less Is More in Chemotherapy of Breast Cancer

2026-03-03

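This paper addresses the oversimplification of existing tumor-immune models by incorporating cell-cycle time delays and competition terms, so that chemotherapy regimens can be compared quantitatively.

Bioinformatics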

Fold-CP: A Context Parallelism Framework for Biomolecular Modeling

2026-03

This paper addresses the critical bottleneck of GPU memory limitations that restrict AlphaFold 3-like models to processing only a few thousand residue...

Bioinformatics

Open Biomedical Knowledge Graphs at Scale: Construction, Federation, and AI Agent Access with Samyama Graph Database

2026-03

This paper addresses the core pain point of fragmented biomedical data by constructing and federating large-scale, open knowledge graphs to enable sea...

Bioinformatics

Predictive Analytics for Foot Ulcers Using Time-Series Temperature and Pressure Data

2026-02-27

This paper addresses the critical need for continuous, real-time monitoring of diabetic foot health by developing an unsupervised anomaly detection fr...

Bioinformatics

Hypothesis-Based Particle Detection for Accurate Nanoparticle Counting and Digital Diagnostics

2025-12-05

This paper addresses the core challenge of achieving accurate, interpretable, and training-free nanoparticle counting in digital diagnostic assays, wh...


Journal: ArXiv Preprint

Published: 2026-03-17

Bioinformatics, Machine Learning

Sample-Efficient Adaptation of Drug-Response Models to Patient Tumors under Strong Biological Domain Shift

Université Grenoble Alpes (UGA)

Camille Jimenez Cortes, Philippe Lalanda, German Vega

30-Second Read

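IN SHORT: By learning transferable representations from unlabeled molecular profiles, the approach predicts patient drug response effectively with minimal clinical data.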

Key Innovations

Methodology: Proposes STaR-DR, a staged transfer-learning framework that explicitly separates unsupervised representation learning, task-specific alignment, and few-shot clinical adaptation.
Methodology: Demonstrates that unsupervised pretraining yields limited gains for in vitro prediction but substantially improves few-shot adaptation to patient tumors under strong domain shift.
Biology: Links performance patterns to latent-space geometry, providing mechanistic insight into when representation learning is beneficial under biological domain shift.
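
The three stages named above can be sketched roughly as follows. This is an illustrative toy, not the STaR-DR implementation: a linear encoder fitted by PCA (the optimum of a linear autoencoder) stands in for the paper's autoencoders, a logistic-regression head stands in for the response predictor, all data are synthetic, the encoders are assumed to stay frozen after stage 1, and every name (fit_linear_encoder, train_head, featurize) is hypothetical.

```python
import numpy as np

def fit_linear_encoder(X_unlabeled, latent_dim):
    """Stage 1 stand-in: PCA, which is the optimum of a linear autoencoder."""
    mean = X_unlabeled.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X_unlabeled - mean, full_matrices=False)
    W = Vt[:latent_dim].T  # shape: (n_features, latent_dim)
    return lambda X: (X - mean) @ W

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_head(Z, y, w=None, b=0.0, lr=0.1, steps=200):
    """Logistic-regression head trained by gradient descent on fixed features."""
    if w is None:
        w = np.zeros(Z.shape[1])
    for _ in range(steps):
        grad = sigmoid(Z @ w + b) - y  # derivative of log loss w.r.t. logits
        w = w - lr * (Z.T @ grad) / len(y)
        b = b - lr * grad.mean()
    return w, b

rng = np.random.default_rng(0)
n_genes, n_drug_feats = 50, 20

# Stage 1: cell and drug encoders learned independently from unlabeled data.
encode_cell = fit_linear_encoder(rng.normal(size=(500, n_genes)), latent_dim=8)
encode_drug = fit_linear_encoder(rng.normal(size=(300, n_drug_feats)), latent_dim=4)

def featurize(cells, drugs):
    """Concatenate the frozen cell and drug embeddings for each pair."""
    return np.hstack([encode_cell(cells), encode_drug(drugs)])

# Stage 2: align the representations with response labels on cell lines.
cl_cells = rng.normal(size=(400, n_genes))
cl_drugs = rng.normal(size=(400, n_drug_feats))
cl_y = rng.integers(0, 2, size=400).astype(float)
Z_cl = featurize(cl_cells, cl_drugs)
w, b = train_head(Z_cl, cl_y)

# Stage 3: few-shot adaptation on a handful of (shifted) patient samples,
# warm-started from the cell-line head.
pt_cells = rng.normal(loc=0.5, size=(10, n_genes))
pt_drugs = rng.normal(size=(10, n_drug_feats))
pt_y = rng.integers(0, 2, size=10).astype(float)
Z_pt = featurize(pt_cells, pt_drugs)
w_pt, b_pt = train_head(Z_pt, pt_y, w=w, b=b, steps=50)
probs = sigmoid(Z_pt @ w_pt + b_pt)
```

The point of the sketch is the separation of concerns: only the small head is updated in stages 2 and 3, so the few labeled patient samples tune a handful of parameters rather than the whole model. Whether the actual framework freezes or fine-tunes its encoders is not stated in this summary.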

Main Conclusions

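Unsupervised pretraining provides limited benefit for in-domain prediction (balanced accuracy of about 0.85) and cross-dataset generalization (ROC-AUC of about 0.75), but yields clear gains when adapting to patient tumors with very limited labeled data.
The staged framework achieves faster performance gains during few-shot patient-level adaptation, reducing the number of labeled target samples needed for effective transfer by roughly 30-40% relative to single-stage baselines.
Performance degrades most under the Leave-Drug-Out protocol (AUPRC drops by about 0.15), highlighting the inherent difficulty of extrapolating drug-response prediction to previously unseen compounds.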
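
For readers unfamiliar with the Leave-Drug-Out protocol mentioned above, the following minimal sketch shows the general idea: whole drugs are held out, so every test pair involves a compound never seen in training. The record layout (cell_id, drug_id, label), the function name, and the 20% hold-out fraction are assumptions for illustration; the summary does not describe the paper's exact splitting procedure.

```python
import random

def leave_drug_out_split(records, test_fraction=0.2, seed=0):
    """Split (cell_id, drug_id, label) records so that no held-out drug
    appears anywhere in the training set."""
    drugs = sorted({drug for _, drug, _ in records})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    n_test = max(1, round(len(drugs) * test_fraction))
    held_out = set(drugs[:n_test])
    train = [r for r in records if r[1] not in held_out]
    test = [r for r in records if r[1] in held_out]
    return train, test, held_out

# Toy data: 6 cell lines crossed with 5 drugs, with arbitrary binary labels.
records = [(f"cell{c}", f"drug{d}", (c + d) % 2)
           for c in range(6) for d in range(5)]
train, test, held_out = leave_drug_out_split(records)
train_drugs = {drug for _, drug, _ in train}
test_drugs = {drug for _, drug, _ in test}
```

Splitting by drug rather than by row is what makes this setting harder than a random split: a random split lets the model see each compound paired with other cell lines, while here the test compounds are entirely new.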

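Research gap: Current drug-response prediction (DRP) models trained on cell-line data fail to generalize to patient tumors because of strong biological domain shift, and existing methods have not systematically studied when representation learning improves adaptation efficiency under such shift.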

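Abstract: Predicting patient drug response from preclinical data remains a major challenge in precision oncology because of the substantial biological gap between in vitro cell lines and patient tumors. Rather than pursuing higher absolute in vitro prediction accuracy, this study asks whether explicitly separating representation learning from task supervision enables more sample-efficient adaptation of drug-response models to patient data under strong biological domain shift. We propose a staged transfer-learning framework in which cell and drug representations are first learned independently from large collections of unlabeled pharmacogenomic data using autoencoder-based representation learning. These representations are then aligned with drug-response labels on cell-line data and finally adapted to patient tumors through few-shot supervision. Through a systematic evaluation covering in-domain, cross-dataset, and patient-level settings, we find that unsupervised pretraining provides limited benefit when the source and target domains overlap substantially, but yields clear gains when adapting to patient tumors with very limited labeled data. Specifically, the proposed framework achieves faster performance improvements during few-shot patient-level adaptation while maintaining accuracy comparable to single-stage baselines on standard cell-line benchmarks. Overall, these results show that learning structured and transferable representations from unlabeled molecular profiles can substantially reduce the amount of clinical supervision required for effective drug-response prediction, offering a practical path toward data-efficient preclinical-to-clinical translation.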