Paper List
-
SpikGPT: A High-Accuracy and Interpretable Spiking Attention Framework for Single-Cell Annotation
This paper addresses the core challenge of robust single-cell annotation across heterogeneous datasets with batch effects and the critical need to ide...
-
Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time
This paper addresses the core challenge of efficiently and accurately sampling the conformational landscape of biomolecules from diffusion-based struc...
-
Personalized optimization of pediatric HD-tDCS for dose consistency and target engagement
This paper addresses the critical limitation of one-size-fits-all HD-tDCS protocols in pediatric populations by developing a personalized optimization...
-
Realistic Transition Paths for Large Biomolecular Systems: A Langevin Bridge Approach
This paper addresses the core challenge of generating physically realistic and computationally efficient transition paths between distinct protein con...
-
Consistent Synthetic Sequences Unlock Structural Diversity in Fully Atomistic De Novo Protein Design
This paper addresses the core pain point of low sequence-structure alignment in existing synthetic datasets (e.g., AFDB), which severely limits the pe...
-
MoRSAIK: Sequence Motif Reactor Simulation, Analysis and Inference Kit in Python
This work addresses the computational bottleneck in simulating prebiotic RNA reactor dynamics by developing a Python package that tracks sequence moti...
-
On the Approximation of Phylogenetic Distance Functions by Artificial Neural Networks
This paper addresses the core challenge of developing computationally efficient and scalable neural network architectures that can learn accurate phyl...
-
EcoCast: A Spatio-Temporal Model for Continual Biodiversity and Climate Risk Forecasting
This paper addresses the critical bottleneck in conservation: the lack of timely, high-resolution, near-term forecasts of species distribution shifts ...
Binary Latent Protein Fitness Landscapes for Quantum Annealing Optimization
University of Alabama at Birmingham
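30-Second Read
IN SHORT: Bridges protein representation learning and combinatorial optimization by mapping sequences into a binary latent space for QUBO-based fitness optimization.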
Key Innovations
- Methodology: First framework to transform protein language model embeddings into binary latent representations for QUBO-based fitness modeling
- Methodology: Enables direct compatibility with quantum annealing hardware through a native QUBO formulation
- Biology: Demonstrates that simple binary representations can capture meaningful structure in protein fitness landscapes
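The step from continuous embeddings to binary latent codes can be illustrated with a minimal sketch. The summary does not specify the binarization scheme, so everything here is an assumption: the helper name `binarize_embeddings`, the random Gaussian projection used to reach the target dimension, the per-dimension median threshold, and the random array standing in for protein language model embeddings.

```python
import numpy as np

def binarize_embeddings(emb, m, seed=0):
    """Project embeddings to m dimensions, then threshold each dimension at its median.

    Assumption: the reduction and thresholding rule are illustrative choices,
    not the scheme used by Q-BioLat.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical reduction step: random Gaussian projection to m dimensions.
    proj = rng.standard_normal((emb.shape[1], m)) / np.sqrt(emb.shape[1])
    z = emb @ proj
    # A median threshold gives each bit a roughly balanced 0/1 split.
    thresholds = np.median(z, axis=0)
    return (z > thresholds).astype(np.int8)

# Random stand-in for embeddings of 200 sequences with 128 features each.
rng = np.random.default_rng(1)
emb = rng.standard_normal((200, 128))
codes = binarize_embeddings(emb, m=32)
```

Thresholding at the median keeps every bit informative, which matters when the codes later serve as the variables of a quadratic model.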
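Main Conclusions
- Q-BioLat achieves Spearman correlations of 0.385-0.413 on the ProteinGym GFP dataset (10,000 samples, latent dimensions 32-64)
- Optimized sequences consistently retrieve nearest neighbors in the top fitness percentiles, with simulated annealing achieving an improvement of 1.529± in surrogate score
- Genetic algorithms outperform other methods in higher-dimensional latent spaces (m=64), while local search better preserves sequence realism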
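Abstract: We present Q-BioLat, a framework for modeling and optimizing protein fitness landscapes in a binary latent space. Starting from protein sequences, we use a pretrained protein language model to obtain continuous embeddings, which are then converted into compact binary latent representations. In this space, protein fitness is approximated with a quadratic unconstrained binary optimization (QUBO) model, enabling efficient combinatorial search with classical heuristics such as simulated annealing and genetic algorithms. On the ProteinGym benchmark, we show that Q-BioLat captures meaningful structure in protein fitness landscapes and can identify high-fitness variants. Despite using a simple binarization scheme, our method consistently retrieves sequences whose nearest neighbors lie at the top of the training fitness distribution, particularly under the strongest configurations. We further show that different optimization strategies behave differently: evolutionary search performs better in higher-dimensional latent spaces, while local search remains competitive at preserving realistic sequences. Beyond its empirical performance, Q-BioLat provides a natural bridge between protein representation learning and combinatorial optimization. By formulating protein fitness as a QUBO problem, our framework is directly compatible with emerging quantum annealing hardware, opening new directions for quantum-assisted protein engineering.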
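The pipeline described in the abstract (fit a QUBO surrogate on binary codes, then search it with a classical heuristic) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the ridge-regression fit, the function names, the annealing schedule, and the synthetic data are all hypothetical.

```python
import numpy as np

def qubo_features(Z):
    """Linear terms z_i followed by pairwise products z_i * z_j for i < j."""
    iu, ju = np.triu_indices(Z.shape[1], k=1)
    return np.hstack([Z, Z[:, iu] * Z[:, ju]])

def fit_qubo(Z, y, ridge=1.0):
    """Fit an upper-triangular QUBO matrix Q by ridge regression (assumed fitting method)."""
    X = qubo_features(Z.astype(float))
    mu = y.mean()
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ (y - mu))
    m = Z.shape[1]
    Q = np.zeros((m, m))
    Q[np.diag_indices(m)] = w[:m]          # linear coefficients on the diagonal
    Q[np.triu_indices(m, k=1)] = w[m:]     # pairwise coefficients above it
    return Q, mu

def qubo_score(Q, z):
    """Surrogate fitness z^T Q z; z_i^2 = z_i for binary z, so the diagonal is linear."""
    return float(z @ Q @ z)

def simulated_annealing(Q, z0, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Maximize the surrogate score with single-bit flips and geometric cooling."""
    rng = np.random.default_rng(seed)
    z = z0.astype(float).copy()
    cur = qubo_score(Q, z)
    best_z, best = z.copy(), cur
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        i = rng.integers(len(z))
        z[i] = 1 - z[i]                    # propose flipping one bit
        new = qubo_score(Q, z)
        if new >= cur or rng.random() < np.exp((new - cur) / t):
            cur = new
            if cur > best:
                best_z, best = z.copy(), cur
        else:
            z[i] = 1 - z[i]                # reject: undo the flip
    return best_z.astype(int), best

# Synthetic stand-in data: 300 binary codes of length 8 with noisy quadratic fitness.
rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(300, 8))
Q_true = np.triu(rng.standard_normal((8, 8)))
y = np.einsum("ni,ij,nj->n", Z, Q_true, Z) + 0.1 * rng.standard_normal(300)

Q, mu = fit_qubo(Z, y)
z_best, best = simulated_annealing(Q, Z[0])
```

Because the surrogate is already in QUBO form, the same matrix `Q` could in principle be handed to a quantum annealer in place of the classical loop. Decoding `z_best` back to a protein sequence, for example by nearest-neighbor retrieval in the training set, is a separate step not shown here.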