Paper List

Bioinformatics

SpikGPT: A High-Accuracy and Interpretable Spiking Attention Framework for Single-Cell Annotation

2025-12-02

This paper addresses the core challenge of robust single-cell annotation across heterogeneous datasets with batch effects and the critical need to ide...

Bioinformatics

Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time

2025-12-02

This paper addresses the core challenge of efficiently and accurately sampling the conformational landscape of biomolecules from diffusion-based struc...

Computational Neuroscience

Personalized optimization of pediatric HD-tDCS for dose consistency and target engagement

2025-12-01

This paper addresses the critical limitation of one-size-fits-all HD-tDCS protocols in pediatric populations by developing a personalized optimization...

Computational Biophysics

Realistic Transition Paths for Large Biomolecular Systems: A Langevin Bridge Approach

2025-12-01

This paper addresses the core challenge of generating physically realistic and computationally efficient transition paths between distinct protein con...

Bioinformatics

Consistent Synthetic Sequences Unlock Structural Diversity in Fully Atomistic De Novo Protein Design

2025-12-01

This paper addresses the core pain point of low sequence-structure alignment in existing synthetic datasets (e.g., AFDB), which severely limits the pe...

Bioinformatics

MoRSAIK: Sequence Motif Reactor Simulation, Analysis and Inference Kit in Python

2025-12-01

This work addresses the computational bottleneck in simulating prebiotic RNA reactor dynamics by developing a Python package that tracks sequence moti...

Bioinformatics

On the Approximation of Phylogenetic Distance Functions by Artificial Neural Networks

2025-12-01

This paper addresses the core challenge of developing computationally efficient and scalable neural network architectures that can learn accurate phyl...

Bioinformatics

EcoCast: A Spatio-Temporal Model for Continual Biodiversity and Climate Risk Forecasting

2025-12-01

This paper addresses the critical bottleneck in conservation: the lack of timely, high-resolution, near-term forecasts of species distribution shifts ...

Journal: ArXiv Preprint

Publication date: 2026-03-17

Bioinformatics, Machine Learning

Sample-Efficient Adaptation of Drug-Response Models to Patient Tumors under Strong Biological Domain Shift

Université Grenoble Alpes (UGA)

Camille Jimenez Cortes, Philippe Lalanda, German Vega

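30-Second Read

IN SHORT: Learning transferable representations from unlabeled molecular profiles enables effective prediction of patient drug response with minimal clinical data.

Key Innovations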

Methodology: Proposes STaR-DR, a staged transfer-learning framework that explicitly separates unsupervised representation learning, task-specific alignment, and few-shot clinical adaptation.
Methodology: Demonstrates that unsupervised pretraining yields limited gains for in vitro prediction but substantially improves few-shot adaptation to patient tumors under strong domain shift.
Biology: Links performance patterns to latent-space geometry, providing mechanistic insight into when representation learning is beneficial under biological domain shift.

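Main Conclusions

Unsupervised pretraining offers limited benefit for in-domain prediction (balanced accuracy of about 0.85) and cross-dataset generalization (ROC-AUC of about 0.75), but yields clear gains when adapting to patient tumors with very limited labeled data.
The staged framework improves faster during few-shot patient-level adaptation, reducing the number of labeled target samples needed for effective transfer by roughly 30-40% compared with single-phase baselines.
Performance degrades most under the Leave-Drug-Out protocol (AUPRC drops by about 0.15), highlighting the inherent difficulty of extrapolating drug-response prediction to previously unseen compounds.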
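
A Leave-Drug-Out evaluation can be sketched as a grouped split in which every held-out drug is completely absent from training. The function name `leave_drug_out_split`, the `(cell_id, drug_id, label)` record layout, and the 20% hold-out fraction below are illustrative assumptions, not details taken from the paper.

```python
import random

def leave_drug_out_split(pairs, test_fraction=0.2, seed=0):
    """Split (cell_id, drug_id, label) records so test drugs never appear in training.

    Hypothetical helper for illustration; the paper's exact protocol may differ.
    """
    # Split at the level of drugs, not individual (cell, drug) pairs.
    drugs = sorted({drug for _, drug, _ in pairs})
    rnd = random.Random(seed)
    rnd.shuffle(drugs)
    n_test = max(1, round(len(drugs) * test_fraction))
    held_out = set(drugs[:n_test])

    train = [p for p in pairs if p[1] not in held_out]
    test = [p for p in pairs if p[1] in held_out]
    return train, test, held_out

# Toy data: 4 cell lines x 5 drugs with made-up binary labels.
pairs = [(f"cell{c}", f"drug{d}", (c + d) % 2) for c in range(4) for d in range(5)]
train, test, held_out = leave_drug_out_split(pairs)
print(len(train), len(test), sorted(held_out))
```

Splitting by drug rather than by record is what makes the setting an extrapolation test: a random record-level split would let the model see every compound during training.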

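Research gap: Current drug-response prediction (DRP) models trained on cell-line data fail to generalize to patient tumors because of strong biological domain shift, and existing methods have not systematically studied when representation learning improves adaptation efficiency under such shift.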

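Abstract: Predicting drug response in patients from preclinical data remains a major challenge in precision oncology because of the substantial biological gap between in vitro cell lines and patient tumors. Rather than aiming to improve absolute in vitro prediction accuracy, this work examines whether explicitly separating representation learning from task supervision enables more sample-efficient adaptation of drug-response models to patient data under strong biological domain shift. We propose a staged transfer-learning framework in which cellular and drug representations are first learned independently from large collections of unlabeled pharmacogenomic data using autoencoder-based representation learning. These representations are then aligned with drug-response labels on cell-line data and finally adapted to patient tumors using few-shot supervision. Through a systematic evaluation spanning in-domain, cross-dataset, and patient-level settings, we find that unsupervised pretraining provides limited benefit when source and target domains overlap substantially, but yields clear gains when adapting to patient tumors with very limited labeled data. In particular, the proposed framework achieves faster performance improvements during few-shot patient-level adaptation while maintaining accuracy comparable to single-phase baselines on standard cell-line benchmarks. Overall, these results show that learning structured and transferable representations from unlabeled molecular profiles can substantially reduce the amount of clinical supervision required for effective drug-response prediction, offering a practical pathway toward data-efficient preclinical-to-clinical translation.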
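
The three stages described above (unsupervised representation learning, alignment on cell-line labels, few-shot patient adaptation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: a PCA projection stands in for the autoencoders (it spans the same subspace as an optimal linear autoencoder), a logistic-regression head stands in for the response predictor, and all data, dimensions, and function names are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Clip the logits so exp() stays numerically safe.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_linear_encoder(X, k):
    """Stage 1 stand-in: PCA projection fitted on unlabeled data (assumption)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:k].T  # top-k principal directions, shape (n_features, k)
    return lambda Z: (Z - mean) @ W

def add_bias(H):
    return np.hstack([H, np.ones((len(H), 1))])

def fit_logistic(H, y, w=None, steps=300, lr=0.1):
    """Logistic regression by gradient descent; pass w to warm-start."""
    Hb = add_bias(H)
    w = np.zeros(Hb.shape[1]) if w is None else w.copy()
    for _ in range(steps):
        w -= lr * Hb.T @ (sigmoid(Hb @ w) - y) / len(y)
    return w

n_genes, n_desc, k = 50, 20, 5  # hypothetical feature and latent sizes

# Stage 1: learn cell and drug encoders independently from unlabeled profiles.
enc_cell = fit_linear_encoder(rng.normal(size=(300, n_genes)), k)
enc_drug = fit_linear_encoder(rng.normal(size=(100, n_desc)), k)

def embed(cells, drugs):
    # Concatenate the two frozen representations for each (cell, drug) pair.
    return np.hstack([enc_cell(cells), enc_drug(drugs)])

# Stage 2: align the representations with response labels on cell-line pairs.
src_cells = rng.normal(size=(200, n_genes))
src_drugs = rng.normal(size=(200, n_desc))
src_y = rng.integers(0, 2, size=200).astype(float)  # synthetic labels
w_src = fit_logistic(embed(src_cells, src_drugs), src_y)

# Stage 3: few-shot adaptation on a handful of shifted "patient" samples,
# warm-started from the cell-line head and trained for only a few steps.
pat_cells = rng.normal(loc=0.5, size=(8, n_genes))
pat_drugs = rng.normal(size=(8, n_desc))
pat_y = rng.integers(0, 2, size=8).astype(float)
w_pat = fit_logistic(embed(pat_cells, pat_drugs), pat_y, w=w_src, steps=50)

probs = sigmoid(add_bias(embed(pat_cells, pat_drugs)) @ w_pat)
print(np.round(probs, 3))
```

Keeping the encoders frozen in stages 2 and 3 is one simple way to separate representation learning from task supervision; whether the paper fine-tunes the encoders during adaptation is not stated in this summary.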