Paper List

Computational Neuroscience

Translating Measures onto Mechanisms: The Cognitive Relevance of Higher-Order Information

2025-12-02

This review addresses the core challenge of translating abstract higher-order information theory metrics (e.g., synergy, redundancy) into defensible, ...

Artificial Intelligence

Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs

2025-12-02

This paper addresses the critical gap in understanding whether LLMs spontaneously develop human-like Bayesian strategies for processing uncertain info...

Bioinformatics

Vessel Network Topology in Molecular Communication: Insights from Experiments and Theory

2025-12-02

This work addresses the critical lack of experimentally validated channel models for molecular communication within complex vessel networks, which is ...

Biophysics

Modulation of DNA rheology by a transcription factor that forms aging microgels

2025-12-02

This work addresses the fundamental question of how the transcription factor NANOG, essential for embryonic stem cell pluripotency, physically regulat...

Systems Biology

Imperfect molecular detection renormalizes apparent kinetic rates in stochastic gene regulatory networks

2025-12-02

This paper addresses the core challenge of distinguishing genuine stochastic dynamics of gene regulatory networks from artifacts introduced by imperfe...

Bioinformatics

PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer

2025-12-02

This paper addresses the dual challenge of achieving computational efficiency without sacrificing accuracy in whole-transcriptome single-cell represen...

Mathematical Biology

Beyond Bayesian Inference: The Correlation Integral Likelihood Framework and Gradient Flow Methods for Deterministic Sampling

2025-12-02

This paper addresses the core challenge of calibrating complex biological models (e.g., PDEs, agent-based models) with incomplete, noisy, or heterogen...

Bioinformatics

Contrastive Deep Learning for Variant Detection in Wastewater Genomic Sequencing

2025-12-02

This paper addresses the core challenge of detecting viral variants in wastewater sequencing data without reference genomes or labeled annotations, ov...


Journal: arXiv Preprint

Publication date: 2026-03

Bioinformatics | Computational Biology

Fold-CP: A Context Parallelism Framework for Biomolecular Modeling

NVIDIA | Rezo Therapeutics | Proxima | Earendil Labs

Dejun Lin, Simon Chu, Vishanth Iyer, Youhan Lee, John St John, Kevin Boyd, Brian Roland, Xiaowei Ren, Guoqing Zhou, Zhonglin Cao, Polina Binder, Yuliya Zhautouskaya, Jakub Zakrzewski, Maximilian Stadler, Kyle Gion, Yuxing Peng, Xi Chen, Tianjing Zhang, Philipp Junk, Michelle Dimon, Paweł Gniewek, Fabian Ortega, McKinley Polen, Ivan Grubisic, Ali Bashir, Graham Holt, Danny Kovtun, Matthias Grass, Luca Naef, Rui Wang, Jian Peng, Anthony Costa, Saee Paliwal, Eddie Calleja, Timur Rvachov, Neha Tadimeti, Roy Tal, Emine Kucukbenli

30-Second Summary

IN SHORT: This paper addresses the critical bottleneck of GPU memory limitations that restrict AlphaFold 3-like models to processing only a few thousand residues, preventing the structural prediction of large biomolecular assemblies essential for understanding cellular function and disease mechanisms.

Key Innovations

Methodology: Introduces a novel 2D context parallelism (CP) framework that tiles the O(N^2) pair representation tensor across a square grid of GPUs, achieving per-device memory scaling of O(N^2/P), a significant improvement over prior 1D sharding approaches like Dynamic Axial Parallelism (O(N^2/√P)).
Methodology: Develops custom distributed algorithms for core geometric modules (Triangle Attention, Triangle Multiplication, etc.) using low-level torch.distributed primitives and a custom autograd imperative, avoiding the memory overhead of native PyTorch DTensor operations during backpropagation.
Biology: Demonstrates practical utility by enabling the structural scoring of over 90% of the mammalian protein complexes in the CORUM database and the full-length folding of the disease-relevant PI4KA lipid kinase complex with its intrinsically disordered region, tasks previously infeasible due to memory constraints.
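
To make the O(N^2/P) figure concrete, the sketch below estimates the size of one GPU's tile of the pair representation under 2D tiling. It is a back-of-the-envelope illustration, not Fold-CP code: the function names are hypothetical, and the channel count (128) and element size (2 bytes, as for bf16) are assumed values chosen for the example. It counts a single copy of the pair tensor only, ignoring activations, gradients, and communication buffers.

```python
import math


def tile_shape(n_tokens, n_gpus):
    """Shape of one GPU's tile when an N x N tensor is split over a sqrt(P) x sqrt(P) grid."""
    q = math.isqrt(n_gpus)
    if q * q != n_gpus:
        raise ValueError("a square device grid needs a perfect-square GPU count")
    side = math.ceil(n_tokens / q)  # rows (and columns) held by each device
    return side, side


def pair_bytes_per_device(n_tokens, n_gpus, channels=128, bytes_per_elem=2):
    """Bytes for one copy of the local pair-representation tile.

    channels and bytes_per_elem are illustrative assumptions, not values
    taken from the paper.
    """
    rows, cols = tile_shape(n_tokens, n_gpus)
    return rows * cols * channels * bytes_per_elem


if __name__ == "__main__":
    n = 30_000
    for p in (1, 4, 16, 64):
        gib = pair_bytes_per_device(n, p) / 2**30
        print(f"P={p:>2}  tile={tile_shape(n, p)}  ~{gib:.1f} GiB per device")
```

Under these assumptions, going from 1 to 64 GPUs shrinks the per-device tile by a factor of 64, which is the O(N^2/P) behaviour described above.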

Main Conclusions

Fold-CP's 2D tiling strategy makes per-device memory scale as O(N^2/P), so capacity grows with the number of GPUs; it successfully predicts structures for assemblies exceeding 30,000 residues using 64 GPUs, well beyond the previous single-GPU limit of ~2,048 tokens.
The framework maintains accuracy parity with single-device baselines while providing a scalable pathway, as evidenced by its use to score >90% of the complexes in the CORUM database.
By implementing novel distributed algorithms (e.g., Cannon-style ring for Triangle Multiplication) and a square device mesh topology, Fold-CP achieves practical execution speed, making large-scale in-context folding computationally feasible for the first time.
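
The Cannon-style ring mentioned above can be illustrated with a small single-process simulation. The sketch assumes the standard "outgoing" triangle multiplication contraction, out[i][j] = sum_k a[i][k] * b[j][k], for a single channel, and uses the textbook Cannon algorithm on a q x q grid of tiles. It is not Fold-CP's implementation: the helper names are hypothetical, plain Python lists stand in for GPU tensors, and list re-indexing stands in for the point-to-point sends a real device mesh would use.

```python
def block(M, i, j, b):
    """Return the b x b tile at tile-row i, tile-column j of matrix M."""
    return [row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]


def matmul(X, Y):
    """Dense product of two small matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def cannon_matmul(A, B, q):
    """Compute A @ B with Cannon's algorithm on a simulated q x q device grid."""
    n = len(A)
    assert n % q == 0, "matrix size must be divisible by the grid side"
    b = n // q
    # Initial skew: device (i, j) starts with A[i, (i+j) % q] and B[(i+j) % q, j].
    a_tiles = [[block(A, i, (i + j) % q, b) for j in range(q)] for i in range(q)]
    b_tiles = [[block(B, (i + j) % q, j, b) for j in range(q)] for i in range(q)]
    c_tiles = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Each device multiplies the two tiles it currently holds.
        for i in range(q):
            for j in range(q):
                c_tiles[i][j] = add(c_tiles[i][j], matmul(a_tiles[i][j], b_tiles[i][j]))
        # Ring shift: A tiles move one step left along rows, B tiles one step up along columns.
        a_tiles = [[a_tiles[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_tiles = [[b_tiles[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Stitch the per-device output tiles back into one matrix.
    return [sum((c_tiles[i][j][r] for j in range(q)), []) for i in range(q) for r in range(b)]


def triangle_mult_outgoing(a, b, q):
    """out[i][j] = sum_k a[i][k] * b[j][k], i.e. a @ b^T, via the simulated ring."""
    b_t = [list(col) for col in zip(*b)]
    return cannon_matmul(a, b_t, q)


if __name__ == "__main__":
    n, q = 4, 2
    a = [[(3 * i + j) % 5 for j in range(n)] for i in range(n)]
    b = [[(i + 2 * j) % 7 for j in range(n)] for i in range(n)]
    reference = [[sum(a[i][k] * b[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    assert triangle_mult_outgoing(a, b, q) == reference
```

The point of the ring is that each simulated device only ever holds one tile of each operand and one output tile, so no device needs the full N x N tensor at any step.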

Research Gap: Current state-of-the-art co-folding models (e.g., AlphaFold 3) hit a memory wall due to the quadratic O(N^2) scaling of their pairwise representation, imposing a practical limit of a few thousand residues per GPU. This prevents the modeling of large macromolecular assemblies (>70% of CORUM complexes), forcing reliance on approximations like sequence cropping, serial chunking, or linear attention that compromise accuracy and global context.

Abstract: Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by hardware memory requirements of models like AlphaFold 3, imposing a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open source reference architectures and implement custom multi-dimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling; for an N-token input distributed across P GPUs, per-device memory scales as O(N^2/P), enabling the structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of Comprehensive Resource of Mammalian protein complexes (CORUM) database, as well as folding of disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.