Paper List
-
STAR-GO: Improving Protein Function Prediction by Learning to Hierarchically Integrate Ontology-Informed Semantic Embeddings
This paper addresses the core challenge of generalizing protein function prediction to unseen or newly introduced Gene Ontology (GO) terms by overcomi...
-
Incorporating indel channels into average-case analysis of seed-chain-extend
This paper addresses the core pain point of bridging the theoretical gap for the widely used seed-chain-extend heuristic by providing the first rigoro...
-
Competition, stability, and functionality in excitatory-inhibitory neural circuits
This paper addresses the core challenge of extending interpretable energy-based frameworks to biologically realistic asymmetric neural networks, where...
-
Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4
This paper addresses the core challenge of generating accurate and clinically relevant patient notes from sparse inputs (ICD codes and basic demograph...
-
Learning From Limited Data and Feedback for Cell Culture Process Monitoring: A Comparative Study
This paper addresses the core challenge of developing accurate real-time bioprocess monitoring soft sensors under severe data constraints: limited his...
-
Cell-cell communication inference and analysis: biological mechanisms, computational approaches, and future opportunities
This review addresses the critical need for a systematic framework to navigate the rapidly expanding landscape of computational methods for inferring ...
-
Generating a Contact Matrix for Aged Care Settings in Australia: an agent-based model study
This study addresses the critical gap in understanding heterogeneous contact patterns within aged care facilities, where existing population-level con...
-
Emergent Spatiotemporal Dynamics in Large-Scale Brain Networks with Next Generation Neural Mass Models
This work addresses the core challenge of understanding how complex, brain-wide spatiotemporal patterns emerge from the interaction of biophysically d...
Unified Policy-Value Decomposition for Rapid Adaptation
Computational Neuroscience Unit, Istituto Superiore di Sanità, Rome, Italy | Ospedale Santa Lucia, Rome, Italy | School of Computer Science, University of Sheffield, United Kingdom
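30-Second Read
IN SHORT: A bilinear decomposition shares a low-dimensional goal embedding between the policy and the value function, enabling zero-shot adaptation to novel tasks.
Key Innovations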
- Methodology: Bilinear co-decomposition of actor and critic with shared multiplicative gating coefficients G_k(g)
- Methodology: Zero-shot adaptation by estimating G_k(g) in a single forward pass, without gradient updates
- Biology: Biologically inspired multiplicative gating mechanism, analogous to gain modulation in cortical neurons
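The shared gating described in the bullets above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the dimensions, the random weights, and the choice of purely linear maps for the coefficient layer, the value bases, and the policy primitives are all assumptions made for illustration. Only the structure comes from the summary: the same coefficients G_k(g) weight both the critic's bases and the actor's primitives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, GOAL_DIM, K = 6, 3, 2, 4

# Shared coefficient layer: maps a goal vector g to K coefficients G_k(g).
# A single linear map is an assumption, not a detail given in the summary.
W_gate = rng.normal(size=(K, GOAL_DIM))

# Value bases y_k(s, a): one linear map per basis (assumption).
W_value = rng.normal(size=(K, STATE_DIM + ACTION_DIM))

# Policy primitives pi_k(s): one linear map per primitive (assumption).
W_policy = rng.normal(size=(K, ACTION_DIM, STATE_DIM))


def gate(g):
    """Return the K shared gating coefficients G_k(g)."""
    return W_gate @ g


def critic(s, a, g):
    """Q(s, a, g) = sum_k G_k(g) * y_k(s, a)."""
    y = W_value @ np.concatenate([s, a])  # shape (K,)
    return float(gate(g) @ y)


def actor(s, g):
    """Mean action: primitives weighted by the same G_k(g) as the critic."""
    primitives = W_policy @ s             # shape (K, ACTION_DIM)
    return gate(g) @ primitives           # shape (ACTION_DIM,)


s = rng.normal(size=STATE_DIM)
g = np.array([1.0, 0.0])                  # e.g. "walk along +x"
a = actor(s, g)
print("action:", a, "Q:", critic(s, a, g))
```

Because the goal enters only through `gate`, changing the task changes K numbers and leaves every basis untouched, which is what makes a shared, low-dimensional control interface possible.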
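Main Conclusions
- The single-layer bilinear model reaches higher reward faster than a standard two-layer MLP baseline (Fig. 1B-C), indicating that the multiplicative structure improves learning efficiency.
- A G space shared between actor and critic performs comparably to separate gating (Fig. 1D), while reducing the parameter count and providing a coherent latent control interface.
- Zero-shot generalization to unseen directions shows only limited performance degradation (Fig. 2E); smooth interpolation in the goal-embedding space supports adaptation to new directions.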
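Abstract: Rapid adaptation in complex control systems remains a central challenge in reinforcement learning. We introduce a framework in which the policy and value functions share a low-dimensional coefficient vector, a goal embedding, that captures task identity and enables immediate adaptation to novel tasks without retraining representations. During pretraining, we jointly learn structured value bases and compatible policy bases through a bilinear actor-critic decomposition. The critic factorizes as Q(s, a, g) = Σ_k G_k(g) y_k(s, a), where G_k(g) is a goal-conditioned coefficient vector and y_k(s, a) are learned value basis functions. This multiplicative gating, in which a context signal scales a set of state-dependent bases, is reminiscent of the gain modulation observed in Layer 5 pyramidal neurons, where top-down inputs modulate the gain of sensory-driven responses without altering their tuning [1]. Building on successor features, we extend the decomposition to the actor, which composes a set of primitive policies weighted by the same coefficients G_k(g). At test time, the bases are frozen and G_k(g) is estimated zero-shot via a single forward pass, enabling immediate adaptation to novel tasks without any gradient update. We train a Soft Actor-Critic agent in the MuJoCo Ant environment on a multi-directional locomotion objective that requires the agent to walk in eight directions specified as continuous goal vectors. The bilinear structure allows each policy head to specialize in a subset of directions, while the shared coefficient layer generalizes across them, accommodating novel directions by interpolating in the goal-embedding space. Our results suggest that shared low-dimensional goal embeddings offer a general mechanism for rapid, structured adaptation in high-dimensional control, and they highlight a potentially biologically plausible principle for efficient transfer in complex reinforcement learning systems.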
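The test-time procedure in the abstract (frozen bases, one forward pass for G_k(g), no gradient update) can be sketched as follows. This is a hedged illustration, not the paper's code: the random weights stand in for pre-trained parameters, and the linear coefficient layer and linear policy primitives are assumptions. Only the eight training directions encoded as continuous goal vectors and the zero-shot forward pass come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes; the abstract does not specify them.
STATE_DIM, ACTION_DIM, K = 6, 3, 4


def direction(theta):
    """Continuous 2-D goal vector for walking at angle theta (radians)."""
    return np.array([np.cos(theta), np.sin(theta)])


# Eight training directions, evenly spaced on the circle.
train_angles = np.arange(8) * (2 * np.pi / 8)
train_goals = np.stack([direction(t) for t in train_angles])  # shape (8, 2)

# Stand-ins for parameters frozen after pre-training (random here).
W_gate = rng.normal(size=(K, 2))                        # coefficient layer
W_policy = rng.normal(size=(K, ACTION_DIM, STATE_DIM))  # policy primitives


def embed(g):
    """Goal embedding G(g): one forward pass, no gradient step."""
    return W_gate @ g


def act(s, g):
    """Mean action: frozen primitives weighted by the embedding of g."""
    return embed(g) @ (W_policy @ s)


# Unseen direction halfway between the first two training directions.
new_goal = direction(np.pi / 8)
G_new = embed(new_goal)

# With a *linear* coefficient layer (the assumption made above), the new
# embedding is exactly a rescaled average of its neighbours' embeddings.
G_interp = (embed(train_goals[0]) + embed(train_goals[1])) / (2 * np.cos(np.pi / 8))

print("embedding of unseen goal:", G_new)
print("matches neighbour interpolation:", np.allclose(G_new, G_interp))
print("zero-shot action:", act(rng.normal(size=STATE_DIM), new_goal))
```

With a nonlinear coefficient layer the exact identity above would no longer hold, and interpolation would be only approximate. The point of the sketch is that adaptation touches nothing except the K-dimensional embedding.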