Paper List

Bioinformatics

SpikGPT: A High-Accuracy and Interpretable Spiking Attention Framework for Single-Cell Annotation

2025-12-02

This paper addresses the core challenge of robust single-cell annotation across heterogeneous datasets with batch effects and the critical need to ide...
Bioinformatics

Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time

2025-12-02

This paper addresses the core challenge of efficiently and accurately sampling the conformational landscape of biomolecules from diffusion-based struc...
Computational Neuroscience

Personalized optimization of pediatric HD-tDCS for dose consistency and target engagement

2025-12-01

This paper addresses the critical limitation of one-size-fits-all HD-tDCS protocols in pediatric populations by developing a personalized optimization...
Computational Biophysics

Realistic Transition Paths for Large Biomolecular Systems: A Langevin Bridge Approach

2025-12-01

This paper addresses the core challenge of generating physically realistic and computationally efficient transition paths between distinct protein con...
Bioinformatics

Consistent Synthetic Sequences Unlock Structural Diversity in Fully Atomistic De Novo Protein Design

2025-12-01

This paper addresses the core pain point of low sequence-structure alignment in existing synthetic datasets (e.g., AFDB), which severely limits the pe...
Bioinformatics

MoRSAIK: Sequence Motif Reactor Simulation, Analysis and Inference Kit in Python

2025-12-01

This work addresses the computational bottleneck in simulating prebiotic RNA reactor dynamics by developing a Python package that tracks sequence moti...
Bioinformatics

On the Approximation of Phylogenetic Distance Functions by Artificial Neural Networks

2025-12-01

This paper addresses the core challenge of developing computationally efficient and scalable neural network architectures that can learn accurate phyl...
Bioinformatics

EcoCast: A Spatio-Temporal Model for Continual Biodiversity and Climate Risk Forecasting

2025-12-01

This paper addresses the critical bottleneck in conservation: the lack of timely, high-resolution, near-term forecasts of species distribution shifts ...


Journal: ArXiv Preprint

Publication date: 2026-03-18

Reinforcement Learning | Computational Neuroscience

Unified Policy-Value Decomposition for Rapid Adaptation

Computational Neuroscience Unit, Istituto Superiore di Sanità, Rome, Italy | Ospedale Santa Lucia, Rome, Italy | School of Computer Science, University of Sheffield, United Kingdom

Cristiano Capone, Luca Falorsi, Andrea Ciardiello, Luca Manneschi

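30-Second Read

IN SHORT: A bilinear decomposition shares a low-dimensional goal embedding between the policy and value functions, enabling zero-shot adaptation to novel tasks.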

Key Innovations

Methodology: Bilinear co-decomposition of actor and critic with shared multiplicative gating coefficients G_k(g)
Methodology: Zero-shot adaptation via single-forward-pass estimation of G_k(g), without gradient updates
Biology: Biologically inspired multiplicative gating mechanism analogous to gain modulation in cortical neurons
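
The shared-gating idea in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the sizes, the random stand-in weights, the tanh nonlinearities, the helper names (`gating`, `value_bases`, `critic`, `actor_mean`), and the rule for mixing the primitive policy heads are all assumptions made for the example. Only the critic form, Q(s,a,g) = sum over k of G_k(g) y_k(s,a), and the fact that actor and critic reuse the same coefficients come from the summary.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, GOAL_DIM, K = 4, 2, 2, 3


def rand_matrix(rows, cols):
    """Random stand-in for a pretrained weight matrix."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


W_GATE = rand_matrix(K, GOAL_DIM)                 # goal -> K coefficients
W_VALUE = rand_matrix(K, STATE_DIM + ACTION_DIM)  # K value bases y_k(s, a)
W_POLICY = [rand_matrix(ACTION_DIM, STATE_DIM) for _ in range(K)]  # K heads


def gating(goal):
    """Shared coefficients G_k(g); the tanh-of-linear form is an assumption."""
    return [math.tanh(z) for z in matvec(W_GATE, goal)]


def value_bases(state, action):
    """K value basis functions; one tanh layer stands in for the learned bases."""
    return [math.tanh(z) for z in matvec(W_VALUE, state + action)]


def critic(state, action, goal):
    """Q(s, a, g) = sum_k G_k(g) * y_k(s, a)."""
    return sum(g_k * y_k for g_k, y_k in zip(gating(goal), value_bases(state, action)))


def actor_mean(state, goal):
    """Action mean as a G-weighted sum of K primitive heads (assumed mixing rule)."""
    coeffs = gating(goal)
    heads = [matvec(w_k, state) for w_k in W_POLICY]
    return [sum(c * head[i] for c, head in zip(coeffs, heads)) for i in range(ACTION_DIM)]
```

In this sketch, adapting to a new goal means calling `gating` on the new goal vector; the value bases and policy heads are left untouched, which mirrors the single-forward-pass, no-gradient adaptation described above.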

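Main Conclusions

A single-layer bilinear model reaches higher reward faster than a standard two-layer MLP baseline (Figure 1B-C), demonstrating the improved learning efficiency afforded by the multiplicative structure.
A G-space shared between actor and critic performs comparably to separate gating (Figure 1D) while reducing the parameter count and providing a coherent latent control interface.
Zero-shot generalization to unseen directions shows only limited performance degradation (Figure 2E); smooth interpolation in the goal embedding space supports adaptation to novel directions.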
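
To make the interpolation claim concrete, the sketch below encodes headings as unit goal vectors and shows that, if the coefficient layer were purely linear, the coefficients for an unseen heading would be a scaled blend of the coefficients of its two neighbouring training headings. The 45-degree spacing, the weights in `W_GATE`, and the linear form of `coefficients` are assumptions for illustration; the summary does not specify them.

```python
import math


def direction_goal(angle_deg):
    """Unit goal vector for a heading given in degrees."""
    rad = math.radians(angle_deg)
    return (math.cos(rad), math.sin(rad))


# Eight training headings, 45 degrees apart (the even spacing is an assumption).
TRAIN_GOALS = [direction_goal(45 * i) for i in range(8)]

# Hypothetical frozen linear coefficient layer: 3 coefficients from a 2-D goal.
W_GATE = [(0.8, -0.1), (0.2, 0.9), (-0.5, 0.4)]


def coefficients(goal):
    """Linear map from goal vector to coefficients (assumed, no bias term)."""
    return [w[0] * goal[0] + w[1] * goal[1] for w in W_GATE]


# An unseen heading halfway between the first two training headings.
unseen = direction_goal(22.5)

# g(0) + g(45) points along 22.5 degrees with length 2*cos(22.5 deg),
# so rescaling the sum recovers the unit vector for the unseen heading.
scale = 1.0 / (2.0 * math.cos(math.radians(22.5)))
blended = [
    scale * (c_a + c_b)
    for c_a, c_b in zip(coefficients(TRAIN_GOALS[0]), coefficients(TRAIN_GOALS[1]))
]

direct = coefficients(unseen)
max_gap = max(abs(d - b) for d, b in zip(direct, blended))
```

With a nonlinear coefficient layer the match would no longer be exact, but a smooth layer would still map nearby goal vectors to nearby coefficients, which is the property the zero-shot result relies on.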

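Research gap: Current monolithic neural network architectures in reinforcement learning limit modularity, interpretability, and rapid adaptation to changing task objectives, so adapting them requires full retraining or complex meta-learning approaches.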

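Abstract: Rapid adaptation in complex control systems remains a central challenge in reinforcement learning. We introduce a framework in which the policy and value functions share a low-dimensional coefficient vector, the goal embedding, which captures task identity and enables immediate adaptation to novel tasks without retraining representations. During pretraining, we jointly learn structured value bases and compatible policy bases through a bilinear actor-critic decomposition. The critic factorizes as Q(s,a,g) = ∑_k G_k(g) y_k(s,a), where G_k(g) is a goal-conditioned coefficient vector and y_k(s,a) are learned value basis functions. This multiplicative gating, in which a context signal scales a set of state-dependent bases, is reminiscent of the gain modulation observed in layer 5 pyramidal neurons, where top-down inputs modulate the gain of sensory-driven responses without altering their tuning [1]. Building on successor features, we extend the decomposition to the actor, which is composed of a set of primitive policies weighted by the same coefficients G_k(g). At test time, the bases are frozen and G_k(g) is estimated zero-shot via a single forward pass, enabling immediate adaptation to novel tasks without any gradient updates. We train a Soft Actor-Critic agent in the MuJoCo Ant environment on a multi-directional locomotion objective that requires the agent to walk in eight directions specified as continuous goal vectors. The bilinear structure allows each policy head to specialize in a subset of directions, while the shared coefficient layer generalizes across them and adapts to novel directions by interpolating in the goal embedding space. Our results suggest that a shared low-dimensional goal embedding offers a general mechanism for rapid, structured adaptation in high-dimensional control, and they highlight a potentially biologically plausible principle for efficient transfer in complex reinforcement learning systems.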
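
For readers less familiar with successor features, the math block below places the standard successor-feature identity next to the bilinear critic from the abstract. The first two lines are the usual textbook form, written here with state-action features for brevity. The symbol K for the number of bases and the vector notation G(g), y(s,a) are introduced here for illustration and are not taken from the paper. Read this way, G(g) plays a role loosely similar to the task weights w, and the learned bases y_k play a role similar to the successor features; this is an interpretive analogy, not a claim about how the bases are trained.

```latex
\begin{align*}
  % Successor features: reward linear in features \phi with task weights w
  r(s,a) &= \phi(s,a)^{\top} w
  \;\Longrightarrow\;
  Q^{\pi}(s,a) = \psi^{\pi}(s,a)^{\top} w, \\
  \psi^{\pi}(s,a) &= \mathbb{E}^{\pi}\!\left[\, \sum_{t \ge 0} \gamma^{t}\, \phi(s_t,a_t)
      \;\middle|\; s_0 = s,\ a_0 = a \right], \\[4pt]
  % Bilinear critic as stated in the abstract
  Q(s,a,g) &= \sum_{k=1}^{K} G_k(g)\, y_k(s,a) \;=\; G(g)^{\top} y(s,a).
\end{align*}
```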