Paper List
-
A Unified Variational Principle for Branching Transport Networks: Wave Impedance, Viscous Flow, and Tissue Metabolism
This paper solves the core problem of predicting the empirically observed branching exponent (α≈2.7) in mammalian arterial trees, which neither Murray...
-
Household Bubbling Strategies for Epidemic Control and Social Connectivity
This paper addresses the core challenge of designing household merging (social bubble) strategies that effectively control epidemic risk while maximiz...
-
Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening
This paper addresses the core challenge of bridging the gap between scalable chemical structure screening and biologically informative but resource-in...
-
A mechanical bifurcation constrains the evolution of cell sheet folding in the family Volvocaceae
This paper addresses the core problem of why there is an evolutionary gap in species with intermediate cell numbers (e.g., 256 cells) in Volvocaceae, ...
-
Bayesian Inference in Epidemic Modelling: A Beginner’s Guide Illustrated with the SIR Model
This guide addresses the core challenge of estimating uncertain epidemiological parameters (like transmission and recovery rates) from noisy, real-wor...
-
Geometric framework for biological evolution
This paper addresses the fundamental challenge of developing a coordinate-independent, geometric description of evolutionary dynamics that bridges gen...
-
A multiscale discrete-to-continuum framework for structured population models
This paper addresses the core challenge of systematically deriving uniformly valid continuum approximations from discrete structured population models...
-
Whole slide and microscopy image analysis with QuPath and OMERO
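Enables QuPath to analyze images stored on an OMERO server directly, without downloading the entire dataset, overcoming the local storage limits of large-scale studies.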
Unified Policy-Value Decomposition for Rapid Adaptation
Computational Neuroscience Unit, Istituto Superiore di Sanità, Rome, Italy | Ospedale Santa Lucia, Rome, Italy | School of Computer Science, University of Sheffield, United Kingdom
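30-Second Read
IN SHORT: A bilinear decomposition shares a low-dimensional goal embedding between the policy and the value function, enabling zero-shot adaptation to novel tasks.
Key Innovations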
- Methodology: Bilinear co-decomposition of actor and critic with shared multiplicative gating coefficients G_k(g)
- Methodology: Zero-shot adaptation via single forward-pass estimation of G_k(g), without gradient updates
- Biology: Biologically inspired multiplicative gating mechanism, analogous to gain modulation in cortical neurons
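As a rough illustration of the shared gating idea in the list above, the Python sketch below feeds one coefficient vector G_k(g) to both a critic and an actor. The linear gating map, the function names, and all numbers are assumptions chosen for illustration; the paper's actual bases and gating are learned networks that are not specified here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def gating(goal: Vector, weights: List[List[float]]) -> List[float]:
    """Hypothetical linear map from a goal vector g to K coefficients G_k(g)."""
    return [sum(w * x for w, x in zip(row, goal)) for row in weights]


def critic(G: Vector, value_bases: Vector) -> float:
    """Bilinear critic: Q(s, a, g) = sum_k G_k(g) * y_k(s, a)."""
    return sum(g_k * y_k for g_k, y_k in zip(G, value_bases))


def actor(G: Vector, primitive_actions: List[Vector]) -> List[float]:
    """Action as the G-weighted sum of K primitive policy outputs."""
    action_dim = len(primitive_actions[0])
    return [
        sum(G[k] * primitive_actions[k][d] for k in range(len(G)))
        for d in range(action_dim)
    ]


# Illustrative values only (K = 2 bases, 2-D goal, 2-D action).
gating_weights = [[1.0, 0.0], [0.0, 1.0]]
goal = [0.5, 2.0]
G = gating(goal, gating_weights)      # shared by critic and actor

y = [2.0, 1.0]                        # stand-in for y_k(s, a) at one (s, a)
q_value = critic(G, y)                # 0.5 * 2.0 + 2.0 * 1.0 = 3.0

primitives = [[1.0, 0.0], [0.0, 1.0]]  # stand-in primitive policy outputs
action = actor(G, primitives)          # [0.5, 2.0]
```

Because the same `G` enters both `critic` and `actor`, changing the goal shifts value estimates and actions together, which is the "coherent latent control interface" the summary refers to.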
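Main Conclusions
- The single-layer bilinear model reaches higher reward faster than a standard two-layer MLP baseline (Fig. 1B-C), showing that the multiplicative structure improves learning efficiency.
- A G-space shared between actor and critic performs comparably to separate gating (Fig. 1D), while reducing the parameter count and providing a coherent latent control interface.
- Zero-shot generalization to unseen directions shows only limited performance degradation (Fig. 2E); smooth interpolation in the goal embedding space supports adaptation to new directions.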
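Abstract: Rapid adaptation in complex control systems remains a central challenge in reinforcement learning. We introduce a framework in which the policy and value functions share a low-dimensional coefficient vector, the goal embedding, which captures task identity and enables immediate adaptation to novel tasks without retraining representations. During pretraining, we jointly learn structured value bases and compatible policy bases through a bilinear actor-critic decomposition. The critic factorizes as Q(s,a,g) = Σ_k G_k(g) y_k(s,a), where G_k(g) is a goal-conditioned coefficient vector and y_k(s,a) are learned value basis functions. This multiplicative gating, in which a context signal scales a set of state-dependent bases, is reminiscent of the gain modulation observed in layer 5 pyramidal neurons, where top-down inputs modulate the gain of sensory-driven responses without altering their tuning [1]. Building on successor features, we extend the decomposition to the actor, which is composed of a set of primitive policies weighted by the same coefficients G_k(g). At test time the bases are frozen and G_k(g) is estimated zero-shot in a single forward pass, enabling immediate adaptation to novel tasks without any gradient update. We train a Soft Actor-Critic agent in the MuJoCo Ant environment on a multi-directional locomotion objective that requires the agent to walk in eight directions specified as continuous goal vectors. The bilinear structure allows each policy head to specialize in a subset of directions, while the shared coefficient layer generalizes across them and accommodates novel directions by interpolating in goal embedding space. Our results suggest that shared low-dimensional goal embeddings offer a general mechanism for rapid, structured adaptation in high-dimensional control, and they highlight a potentially biologically plausible principle for efficient transfer in complex reinforcement learning systems.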
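The test-time step described in the abstract can be sketched as follows: the bases stay frozen, and the coefficients for an unseen heading come from one forward pass with no gradient update. The (cos θ, sin θ) goal encoding, the 45-degree spacing of the eight training directions, the linear coefficient map, and the weights `W` are all assumptions made for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence


def direction_goal(angle_deg: float) -> List[float]:
    """Assumed goal encoding: unit vector (cos θ, sin θ) for a heading."""
    theta = math.radians(angle_deg)
    return [math.cos(theta), math.sin(theta)]


def coefficients(goal: Sequence[float], frozen_weights: List[List[float]]) -> List[float]:
    """Single forward pass G(g) = W g; the weights are never updated here."""
    return [sum(w * x for w, x in zip(row, goal)) for row in frozen_weights]


# Eight training headings; the even 45-degree spacing is an assumption.
TRAIN_ANGLES = [45.0 * i for i in range(8)]

# Hypothetical frozen weights for K = 3 coefficients over a 2-D goal.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]

# Unseen heading halfway between the 0- and 45-degree training headings.
g_new = direction_goal(22.5)
G_new = coefficients(g_new, W)

# With a linear map, the new coefficients are a blend of the neighbours':
# g(22.5°) = (g(0°) + g(45°)) / (2 cos 22.5°), and G inherits that blend.
G_0 = coefficients(direction_goal(0.0), W)
G_45 = coefficients(direction_goal(45.0), W)
scale = 1.0 / (2.0 * math.cos(math.radians(22.5)))
G_blend = [scale * (a + b) for a, b in zip(G_0, G_45)]
```

In this toy setting `G_new` and `G_blend` agree up to floating-point error, which is one simple way to picture "interpolating in goal embedding space". A learned, nonlinear coefficient network would not satisfy this identity exactly.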