Paper List
-
Translating Measures onto Mechanisms: The Cognitive Relevance of Higher-Order Information
This review addresses the core challenge of translating abstract higher-order information theory metrics (e.g., synergy, redundancy) into defensible, ...
-
Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs
This paper addresses the critical gap in understanding whether LLMs spontaneously develop human-like Bayesian strategies for processing uncertain info...
-
Vessel Network Topology in Molecular Communication: Insights from Experiments and Theory
This work addresses the critical lack of experimentally validated channel models for molecular communication within complex vessel networks, which is ...
-
Modulation of DNA rheology by a transcription factor that forms aging microgels
This work addresses the fundamental question of how the transcription factor NANOG, essential for embryonic stem cell pluripotency, physically regulat...
-
Imperfect molecular detection renormalizes apparent kinetic rates in stochastic gene regulatory networks
This paper addresses the core challenge of distinguishing genuine stochastic dynamics of gene regulatory networks from artifacts introduced by imperfe...
-
PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer
This paper addresses the dual challenge of achieving computational efficiency without sacrificing accuracy in whole-transcriptome single-cell represen...
-
Beyond Bayesian Inference: The Correlation Integral Likelihood Framework and Gradient Flow Methods for Deterministic Sampling
This paper addresses the core challenge of calibrating complex biological models (e.g., PDEs, agent-based models) with incomplete, noisy, or heterogen...
-
Contrastive Deep Learning for Variant Detection in Wastewater Genomic Sequencing
This paper addresses the core challenge of detecting viral variants in wastewater sequencing data without reference genomes or labeled annotations, ov...
Unified Policy-Value Decomposition for Rapid Adaptation
Computational Neuroscience Unit, Istituto Superiore di Sanità, Rome, Italy | Ospedale Santa Lucia, Rome, Italy | School of Computer Science, University of Sheffield, United Kingdom
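30-Second Read
IN SHORT: Sharing a low-dimensional goal embedding between the policy and the value function through a bilinear decomposition enables zero-shot adaptation to novel tasks.
Key Innovations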
- Methodology: Bilinear co-decomposition of actor and critic with shared multiplicative gating coefficients G_k(g)
- Methodology: Zero-shot adaptation by estimating G_k(g) in a single forward pass, without gradient updates
- Biology: Biologically inspired multiplicative gating mechanism analogous to gain modulation in cortical neurons
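Main Conclusions
- The single-layer bilinear model reaches higher reward faster than a standard two-layer MLP baseline (Fig. 1B-C), indicating that the multiplicative structure improves learning efficiency.
- A G-space shared between actor and critic performs comparably to separate gating (Fig. 1D), while reducing the parameter count and providing a coherent latent control interface.
- Zero-shot generalization to unseen directions shows only limited performance degradation (Fig. 2E); smooth interpolation in the goal-embedding space supports adaptation to new directions.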
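Abstract: Rapid adaptation in complex control systems remains a central challenge in reinforcement learning. We introduce a framework in which the policy and the value function share a low-dimensional coefficient vector, a goal embedding, that captures task identity and enables immediate adaptation to novel tasks without retraining representations. During pretraining, we jointly learn structured value bases and compatible policy bases through a bilinear actor-critic decomposition. The critic factorizes as Q(s,a,g) = Σ_k G_k(g) y_k(s,a), where G_k(g) is a goal-conditioned coefficient vector and y_k(s,a) are learned value basis functions. This multiplicative gating, in which a context signal scales a set of state-dependent bases, is reminiscent of the gain modulation observed in layer 5 pyramidal neurons, where top-down inputs modulate the gain of sensory-driven responses without altering their tuning [1]. Building on successor features, we extend the decomposition to the actor, which is composed of a set of primitive policies weighted by the same coefficients G_k(g). At test time the bases are frozen and G_k(g) is estimated zero-shot in a single forward pass, enabling immediate adaptation to novel tasks without any gradient update. We train a Soft Actor-Critic agent in the MuJoCo Ant environment on a multi-directional locomotion objective that requires the agent to walk in eight directions specified as continuous goal vectors. The bilinear structure allows each policy head to specialize in a subset of directions, while the shared coefficient layer generalizes across them and adapts to novel directions by interpolating in the goal-embedding space. Our results suggest that a shared low-dimensional goal embedding offers a general mechanism for rapid, structured adaptation in high-dimensional control, and they highlight a potentially biologically plausible principle for efficient transfer in complex reinforcement learning systems.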
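
The shared-coefficient structure described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the dimensions, the random weights standing in for pretrained bases, the softmax form of G(g), the tanh nonlinearities, and all function names (`goal_coefficients`, `q_value`, `policy_mean`, `direction`) are assumptions introduced here. It shows the critic computed as Σ_k G_k(g) y_k(s,a), the actor mean as a mixture of primitive heads weighted by the same G_k(g), and a goal direction outside the eight training directions handled by a single forward pass with no gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
STATE_DIM, ACTION_DIM, GOAL_DIM, K = 27, 8, 2, 4

# Random weights stand in for parameters that would be learned in pretraining
# and then frozen at test time.
W_g = rng.normal(size=(GOAL_DIM, K))                # goal -> coefficients G(g)
W_y = rng.normal(size=(STATE_DIM + ACTION_DIM, K))  # (s, a) -> value bases y_k
W_pi = rng.normal(size=(K, STATE_DIM, ACTION_DIM))  # K primitive policy heads


def goal_coefficients(g):
    """G(g) from one forward pass. The softmax is an assumption of this sketch."""
    logits = g @ W_g
    e = np.exp(logits - logits.max())
    return e / e.sum()


def value_bases(s, a):
    """y_k(s, a): K state-action basis functions (a single tanh layer here)."""
    return np.tanh(np.concatenate([s, a]) @ W_y)


def q_value(s, a, g):
    """Critic: Q(s, a, g) = sum_k G_k(g) * y_k(s, a)."""
    return float(goal_coefficients(g) @ value_bases(s, a))


def policy_mean(s, g):
    """Actor mean: primitive heads mixed by the same coefficients G_k(g)."""
    heads = np.tanh(np.einsum("s,ksa->ka", s, W_pi))  # shape (K, ACTION_DIM)
    return goal_coefficients(g) @ heads               # shape (ACTION_DIM,)


def direction(theta_deg):
    """Unit goal vector for a heading given in degrees."""
    t = np.deg2rad(theta_deg)
    return np.array([np.cos(t), np.sin(t)])


# Eight training directions, 45 degrees apart, as continuous goal vectors.
train_goals = [direction(45.0 * i) for i in range(8)]

# An unseen direction between two training goals: only G(g) changes,
# the bases W_y and W_pi stay frozen and no gradient update is taken.
s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
g_new = direction(22.5)
G_new = goal_coefficients(g_new)
q_new = q_value(s, a, g_new)
mu_new = policy_mean(s, g_new)
```

Because both `q_value` and `policy_mean` call the same `goal_coefficients`, changing the goal vector moves the actor and the critic through one shared K-dimensional interface, which is the property the summary attributes to the shared G-space.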