Paper List

Complex Systems

Macroscopic Dominance from Microscopic Extremes: Symmetry Breaking in Spatial Competition

2026-03-11

This paper addresses the fundamental question of how microscopic stochastic advantages in spatial exploration translate into macroscopic resource domi...
Computational Neuroscience

Linear Readout of Neural Manifolds with Continuous Variables

2026-03-11

This paper addresses the core challenge of quantifying how the geometric structure of high-dimensional neural population activity (neural manifolds) d...
Biophysics

Theory of Cell Body Lensing and Phototaxis Sign Reversal in “Eyeless” Mutants of Chlamydomonas

2026-03-11

This paper solves the core puzzle of how eyeless mutants of Chlamydomonas exhibit reversed phototaxis by quantitatively modeling the competition betwe...
Bioinformatics

Cross-Species Transfer Learning for Electrophysiology-to-Transcriptomics Mapping in Cortical GABAergic Interneurons

2026-03-11

This paper addresses the challenge of predicting transcriptomic identity from electrophysiological recordings in human cortical interneurons, where li...
Computational Neuroscience

Uncovering statistical structure in large-scale neural activity with Restricted Boltzmann Machines

2026-03-11

This paper addresses the core challenge of modeling large-scale neural population activity (1500-2000 neurons) with interpretable higher-order interac...
Computational Modeling

Realizing Common Random Numbers: Event-Keyed Hashing for Causally Valid Stochastic Models

2026-03-11

This paper addresses the critical problem that standard stateful PRNG implementations in agent-based models violate causal validity by making random d...
Bioinformatics

A Standardized Framework for Evaluating Gene Expression Generative Models

2026-03-11

This paper addresses the critical lack of standardized evaluation protocols for single-cell gene expression generative models, where inconsistent metr...
Bioinformatics

Single Molecule Localization Microscopy Challenge: A Biologically Inspired Benchmark for Long-Sequence Modeling

2026-03-11

This paper addresses the core challenge of evaluating state-space models on biologically realistic, sparse, and stochastic temporal processes, which a...


Journal: arXiv Preprint

Publication date: 2026-03-18

Reinforcement Learning, Computational Neuroscience

Unified Policy-Value Decomposition for Rapid Adaptation

Computational Neuroscience Unit, Istituto Superiore di Sanità, Rome, Italy | Ospedale Santa Lucia, Rome, Italy | School of Computer Science, University of Sheffield, United Kingdom

Cristiano Capone, Luca Falorsi, Andrea Ciardiello, Luca Manneschi

30-Second Read

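IN SHORT: A bilinear decomposition shares a low-dimensional goal embedding between the policy and value functions, enabling zero-shot adaptation to novel tasks.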

Key Innovations

Methodology: Bilinear co-decomposition of actor and critic with shared multiplicative gating coefficients $G_k(g)$
Methodology: Zero-shot adaptation by estimating $G_k(g)$ in a single forward pass, without gradient updates
Biology: Biologically inspired multiplicative gating mechanism analogous to gain modulation in cortical neurons
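
The gain-modulation analogy in the last item can be illustrated with a toy example. The following is a minimal sketch, not taken from the paper: it assumes a single unit with a Gaussian tuning curve (the function names, the curve shape, and the gain values are all hypothetical) and shows that a multiplicative gain rescales the response without moving the preferred stimulus.

```python
import numpy as np

def tuning_curve(stimulus, preferred=0.0, width=0.5):
    """Hypothetical Gaussian tuning curve of one state-dependent basis unit."""
    return np.exp(-0.5 * ((stimulus - preferred) / width) ** 2)

def gated_response(stimulus, gain):
    """Multiplicative gating: a context-dependent gain scales the response."""
    return gain * tuning_curve(stimulus)

stimuli = np.linspace(-2.0, 2.0, 81)
low = gated_response(stimuli, gain=0.3)
high = gated_response(stimuli, gain=1.5)

# The amplitude changes with the gain, but the preferred stimulus does not.
peak_low = stimuli[np.argmax(low)]
peak_high = stimuli[np.argmax(high)]
print(peak_low, peak_high)
```

In the bilinear model the role of the gain is played by the goal-dependent coefficients, while the basis functions play the role of the fixed tuning.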

Main Conclusions

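The single-layer bilinear model reaches higher reward faster than a standard two-layer MLP baseline (Fig. 1B-C), demonstrating the improved learning efficiency afforded by the multiplicative structure.
A G space shared between actor and critic yields performance comparable to separate gating (Fig. 1D), while reducing the parameter count and providing a coherent latent control interface.
Zero-shot generalization to unseen directions shows limited performance degradation (Fig. 2E); smooth interpolation in the goal embedding space supports adaptation to new directions.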
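
The interpolation argument in the last conclusion can be made concrete under a simplifying assumption. The sketch below assumes the gating layer is a plain linear map from the goal vector to the coefficients (the paper's actual gating architecture is not specified here; `W`, `K`, and `gating` are hypothetical names). Under that assumption, the coefficients of an unseen direction are exactly a weighted combination of the coefficients of its two neighbouring training directions. With a nonlinear gating network the relation would hold only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # number of basis functions (hypothetical)

# Eight training directions as unit goal vectors in the plane.
train_angles = np.arange(8) * (2 * np.pi / 8)
train_goals = np.stack([np.cos(train_angles), np.sin(train_angles)], axis=1)

# Assumed gating layer: a linear map from goal vector to K coefficients.
W = rng.normal(size=(K, 2))

def gating(goal):
    return W @ goal

# Unseen direction halfway between the first two training directions.
new_angle = 0.5 * (train_angles[0] + train_angles[1])
new_goal = np.array([np.cos(new_angle), np.sin(new_angle)])

# Express the new goal through its two neighbours, then compare coefficients.
weights = np.linalg.solve(train_goals[:2].T, new_goal)
G_new = gating(new_goal)
G_interp = weights[0] * gating(train_goals[0]) + weights[1] * gating(train_goals[1])
print(np.allclose(G_new, G_interp))
```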

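Research gap: Current monolithic neural network architectures in reinforcement learning limit modularity, interpretability, and rapid adaptation to changing task objectives, requiring either full retraining or complex meta-learning approaches.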

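Abstract: Rapid adaptation in complex control systems remains a central challenge in reinforcement learning. We introduce a framework in which the policy and value functions share a low-dimensional coefficient vector, the goal embedding, which captures task identity and enables immediate adaptation to novel tasks without retraining representations. During pretraining, we jointly learn structured value bases and compatible policy bases through a bilinear actor-critic decomposition. The critic factorizes as $Q(s,a,g) = \sum_k G_k(g)\, y_k(s,a)$, where $G_k(g)$ is a goal-conditioned coefficient vector and $y_k(s,a)$ are learned value basis functions. This multiplicative gating, in which a context signal scales a set of state-dependent bases, is reminiscent of the gain modulation observed in layer 5 pyramidal neurons, where top-down inputs modulate the gain of sensory-driven responses without altering their tuning [1]. Building on successor features, we extend the decomposition to the actor, which is composed of a set of primitive policies weighted by the same coefficients $G_k(g)$. At test time the bases are frozen and $G_k(g)$ is estimated zero-shot in a single forward pass, enabling immediate adaptation to novel tasks without any gradient update. We train a Soft Actor-Critic agent in the MuJoCo Ant environment on a multi-directional locomotion objective that requires the agent to walk in eight directions specified as continuous goal vectors. The bilinear structure allows each policy head to specialize in a subset of directions, while the shared coefficient layer generalizes across them and adapts to new directions by interpolating in the goal embedding space. Our results suggest that a shared low-dimensional goal embedding offers a general mechanism for rapid, structured adaptation in high-dimensional control, and they highlight a potentially biologically plausible principle for efficient transfer in complex reinforcement learning systems.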
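
To make the decomposition described in the abstract concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation: the layer sizes, the random stand-in weights, the `tanh` bases, the linear gating map, and the choice to combine primitive policies by weighting their deterministic action means are all illustrative assumptions (the paper trains a stochastic Soft Actor-Critic policy, and the exact composition rule is not given here). The sketch shows only the structure: one forward pass yields the coefficients G_k(g), and the same coefficients weight both the value bases and the primitive policies while every other parameter stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, GOAL_DIM, K = 6, 3, 2, 4  # illustrative sizes

# Frozen parameters standing in for the pretrained bases and gating layer.
params = {
    "W_gate": rng.normal(size=(K, GOAL_DIM)),                 # goal -> G_k(g)
    "W_value": rng.normal(size=(K, STATE_DIM + ACTION_DIM)),  # value bases y_k(s, a)
    "W_policy": rng.normal(size=(K, ACTION_DIM, STATE_DIM)),  # primitive policies
}

def goal_coefficients(goal):
    """Single forward pass producing the shared coefficients G_k(g)."""
    return params["W_gate"] @ goal

def critic(state, action, coeffs):
    """Q(s, a, g) = sum_k G_k(g) * y_k(s, a)."""
    y = np.tanh(params["W_value"] @ np.concatenate([state, action]))
    return float(coeffs @ y)

def actor(state, coeffs):
    """Action mean as the G_k(g)-weighted sum of K primitive policy outputs."""
    primitives = np.tanh(params["W_policy"] @ state)  # shape (K, ACTION_DIM)
    return coeffs @ primitives

state = rng.normal(size=STATE_DIM)
goal = np.array([np.cos(0.3), np.sin(0.3)])  # an arbitrary direction as a goal vector
coeffs = goal_coefficients(goal)  # computed once per task, no gradient step
action = actor(state, coeffs)
q_value = critic(state, action, coeffs)
print(action.shape, q_value)
```

Because the goal enters only through `coeffs`, switching tasks in this sketch means recomputing one small vector; the value and policy bases are never touched.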