Paper List

Bioinformatics

STAR-GO: Improving Protein Function Prediction by Learning to Hierarchically Integrate Ontology-Informed Semantic Embeddings

2025-12-04

This paper addresses the core challenge of generalizing protein function prediction to unseen or newly introduced Gene Ontology (GO) terms by overcomi...

Bioinformatics

Incorporating indel channels into average-case analysis of seed-chain-extend

2025-12-04

This paper addresses the core pain point of bridging the theoretical gap for the widely used seed-chain-extend heuristic by providing the first rigoro...

Theoretical Neuroscience

Competition, stability, and functionality in excitatory-inhibitory neural circuits

2025-12-04

This paper addresses the core challenge of extending interpretable energy-based frameworks to biologically realistic asymmetric neural networks, where...

Bioinformatics

Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4

2025-12-04

This paper addresses the core challenge of generating accurate and clinically relevant patient notes from sparse inputs (ICD codes and basic demograph...

Bioinformatics

Learning From Limited Data and Feedback for Cell Culture Process Monitoring: A Comparative Study

2025-12-03

This paper addresses the core challenge of developing accurate real-time bioprocess monitoring soft sensors under severe data constraints: limited his...

Bioinformatics

Cell-cell communication inference and analysis: biological mechanisms, computational approaches, and future opportunities

2025-12-03

This review addresses the critical need for a systematic framework to navigate the rapidly expanding landscape of computational methods for inferring ...

Epidemiology

Generating a Contact Matrix for Aged Care Settings in Australia: an agent-based model study

2025-12-03

This study addresses the critical gap in understanding heterogeneous contact patterns within aged care facilities, where existing population-level con...

Computational Neuroscience

Emergent Spatiotemporal Dynamics in Large-Scale Brain Networks with Next Generation Neural Mass Models

2025-12-03

This work addresses the core challenge of understanding how complex, brain-wide spatiotemporal patterns emerge from the interaction of biophysically d...


Journal: ArXiv Preprint

Published: 2026-03-18

Reinforcement Learning, Computational Neuroscience

Unified Policy-Value Decomposition for Rapid Adaptation

Computational Neuroscience Unit, Istituto Superiore di Sanità, Rome, Italy | Ospedale Santa Lucia, Rome, Italy | School of Computer Science, University of Sheffield, United Kingdom

Cristiano Capone, Luca Falorsi, Andrea Ciardiello, Luca Manneschi

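30-Second Read

IN SHORT: Sharing a low-dimensional goal embedding between the policy and the value function through a bilinear decomposition enables zero-shot adaptation to novel tasks.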

Key Innovations

Methodology: Bilinear co-decomposition of the actor and critic with shared multiplicative gating coefficients $G_k(g)$
Methodology: Zero-shot adaptation by estimating $G_k(g)$ in a single forward pass, without gradient updates
Biology: Biologically inspired multiplicative gating mechanism, analogous to gain modulation in cortical neurons
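The shared gating described in the first two items can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the value bases, primitive policies, and the function names `bilinear_q` and `bilinear_action` are hypothetical, and combining primitives by a weighted sum of their deterministic action outputs is an assumption, since the source only says they are weighted by the same coefficients $G_k(g)$.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


def bilinear_q(
    coeffs: Sequence[float],
    value_bases: Sequence[Callable[[State, Action], float]],
    state: State,
    action: Action,
) -> float:
    """Critic: Q(s, a, g) = sum_k G_k(g) * y_k(s, a)."""
    if len(coeffs) != len(value_bases):
        raise ValueError("need one coefficient per value basis")
    return sum(g_k * y_k(state, action) for g_k, y_k in zip(coeffs, value_bases))


def bilinear_action(
    coeffs: Sequence[float],
    primitive_policies: Sequence[Callable[[State], Action]],
    state: State,
) -> Action:
    """Actor: combine primitive policy outputs using the SAME coefficients G_k(g).

    Assumption (hypothetical): primitives are deterministic and combined by a
    weighted sum; the source only says they are weighted by the same coefficients.
    """
    if len(coeffs) != len(primitive_policies):
        raise ValueError("need one coefficient per primitive policy")
    outputs = [pi_k(state) for pi_k in primitive_policies]
    return tuple(
        sum(g_k * out[i] for g_k, out in zip(coeffs, outputs))
        for i in range(len(outputs[0]))
    )


# Toy example with K = 2 hypothetical bases and primitives.
G = [0.25, 0.75]  # shared coefficients G_k(g) for one fixed goal g
y = [lambda s, a: s[0] + a[0],  # hypothetical value basis y_1
     lambda s, a: 2.0 * a[0]]   # hypothetical value basis y_2
pis = [lambda s: (1.0, 0.0),    # hypothetical primitive policy 1
       lambda s: (0.0, 1.0)]    # hypothetical primitive policy 2

q = bilinear_q(G, y, (1.0,), (2.0,))  # 0.25 * 3.0 + 0.75 * 4.0 = 3.75
act = bilinear_action(G, pis, (1.0,))  # (0.25, 0.75)
```

In this sketch, switching to a different goal changes only `G`; the bases and primitives stay fixed, which is the property the zero-shot item relies on.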

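Main Conclusions

The single-layer bilinear model reaches higher reward faster than a standard two-layer MLP baseline (Fig. 1B-C), demonstrating that the multiplicative structure improves learning efficiency.
Sharing the G space between actor and critic yields performance comparable to separate gating (Fig. 1D), while reducing the number of parameters and providing a coherent latent control interface.
Zero-shot generalization to unseen directions shows only limited performance degradation (Fig. 2E); smooth interpolation in the goal-embedding space supports adaptation to new directions.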
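The interpolation claim in the last item can be illustrated with a toy goal-to-coefficient map. In this sketch, which is an assumption rather than the paper's learned network, the eight training directions are assumed to be evenly spaced 45 degrees apart, each head gets a prototype unit vector $u_k$, and $G_k(g) = \max(0, u_k \cdot g)$, normalized to sum to one, is computed in a single forward pass. The names `goal_vector`, `coefficients`, and `TRAIN_ANGLES_DEG` are hypothetical. Because the map is continuous in $g$, an unseen 22.5 degree direction, halfway between the 0 and 45 degree training directions, receives equal weight from those two heads.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: the eight training directions are evenly spaced, 45 degrees apart.
TRAIN_ANGLES_DEG = [0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0]

Vec2 = Tuple[float, float]


def goal_vector(angle_deg: float) -> Vec2:
    """Continuous goal: unit vector pointing in the desired walking direction."""
    rad = math.radians(angle_deg)
    return (math.cos(rad), math.sin(rad))


def coefficients(goal: Vec2, prototypes: Sequence[Vec2]) -> List[float]:
    """Hypothetical one-pass map g -> G(g): ReLU(u_k . g), normalized to sum to 1."""
    raw = [max(0.0, u[0] * goal[0] + u[1] * goal[1]) for u in prototypes]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("goal has no positive overlap with any prototype")
    return [r / total for r in raw]


# One prototype direction u_k per policy head (hypothetical).
prototypes = [goal_vector(a) for a in TRAIN_ANGLES_DEG]

# Unseen direction halfway between the 0- and 45-degree training directions.
G_new = coefficients(goal_vector(22.5), prototypes)
# Heads 0 and 1 get about 0.354 each, heads 2 and 7 about 0.146, the rest 0.
```

No gradient step is involved: the new coefficients come from evaluating the map once, while the heads themselves are untouched.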

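Research Gap: Monolithic neural network architectures in current reinforcement learning limit modularity, interpretability, and rapid adaptation to changing task goals, requiring either full retraining or complex meta-learning approaches.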

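Abstract: Rapid adaptation in complex control systems remains a central challenge in reinforcement learning. We introduce a framework in which the policy and value functions share a low-dimensional coefficient vector, the goal embedding, which captures task identity and enables immediate adaptation to new tasks without retraining representations. During pretraining, we jointly learn a structured value basis and a compatible policy basis through a bilinear actor-critic decomposition. The critic factorizes as $Q(s,a,g) = \sum_k G_k(g)\, y_k(s,a)$, where $G_k(g)$ are the components of a goal-conditioned coefficient vector $G(g)$ and $y_k(s,a)$ are learned value basis functions. This multiplicative gating, in which a context signal scales a set of state-dependent bases, is reminiscent of the gain modulation observed in layer 5 pyramidal neurons, where top-down inputs modulate the gain of sensory-driven responses without altering their tuning [1]. Building on successor features, we extend the decomposition to the actor, which is composed of a set of primitive policies weighted by the same coefficients $G_k(g)$. At test time, the bases are frozen and $G_k(g)$ is estimated zero-shot in a single forward pass, enabling immediate adaptation to new tasks without any gradient updates. We train a soft actor-critic agent in the MuJoCo Ant environment on multi-directional locomotion, requiring the agent to walk in eight directions specified as continuous goal vectors. The bilinear structure allows each policy head to specialize on a subset of directions, while the shared coefficient layer generalizes across them and adapts to new directions by interpolating in the goal-embedding space. Our results indicate that a shared low-dimensional goal embedding provides a general mechanism for rapid, structured adaptation in high-dimensional control, and highlight a potentially biologically plausible principle for efficient transfer in complex reinforcement learning systems.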
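The abstract says the decomposition builds on successor features. For reference, the standard successor-feature identity is written out below: if the reward is linear in features $\phi$ with weights $w$, the action-value of a fixed policy $\pi$ is linear in the same weights. Reading $G(g)$ as the weights $w$ and $y_k$ as the successor features $\psi^{\pi}_k$ in the last line is an assumed correspondence, not a derivation taken from the paper.

```latex
\begin{aligned}
r(s,a,s') &= \phi(s,a,s')^{\top} w \\
\psi^{\pi}(s,a) &= \mathbb{E}^{\pi}\!\left[\sum_{i=0}^{\infty} \gamma^{i}\, \phi(s_{t+i}, a_{t+i}, s_{t+i+1}) \,\middle|\, s_t = s,\ a_t = a\right] \\
Q^{\pi}(s,a) &= \mathbb{E}^{\pi}\!\left[\sum_{i=0}^{\infty} \gamma^{i}\, r(s_{t+i}, a_{t+i}, s_{t+i+1}) \,\middle|\, s_t = s,\ a_t = a\right] = \psi^{\pi}(s,a)^{\top} w \\
Q(s,a,g) &= \sum_{k} G_k(g)\, y_k(s,a) \qquad \text{(assumed: } w \leftrightarrow G(g),\ \psi^{\pi}_k \leftrightarrow y_k\text{)}
\end{aligned}
```

The third line follows from the first two by linearity of expectation, which is why swapping the weight vector (here, the goal embedding) re-targets the value function without relearning the features.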