Paper List

Bioinformatics

MCP-AI: Protocol-Driven Intelligence Framework for Autonomous Reasoning in Healthcare

2025-12-05

This paper addresses the critical gap in healthcare AI systems that lack contextual reasoning, long-term state management, and verifiable workflows by...

Bioinformatics

Model Gateway: Model Management Platform for Model-Driven Drug Discovery

2025-12-05

This paper addresses the critical bottleneck of fragmented, ad-hoc model management in pharmaceutical research by providing a centralized, scalable ML...

Bioinformatics

Tree Thinking in the Genomic Era: Unifying Models Across Cells, Populations, and Species

2025-12-05

This paper addresses the fragmentation of tree-based inference methods across biological scales by identifying shared algorithmic principles and stati...

Bioinformatics

SSDLabeler: Realistic semi-synthetic data generation for multi-label artifact classification in EEG

2025-12-05

This paper addresses the core challenge of training robust multi-label EEG artifact classifiers by overcoming the scarcity and limited diversity of ma...

Neuroscience

Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening

2025-12-05

This paper addresses the core challenge of objectively quantifying listeners' selective attention to specific musical components (e.g., vocals, drums,...

Bioengineering

Physics-Guided Surrogate Modeling for Machine Learning–Driven DLD Design Optimization

2025-12-05

This paper addresses the core bottleneck of translating microfluidic DLD devices from research prototypes to clinical applications by replacing weeks-...

Bioinformatics

Mechanistic Interpretability of Antibody Language Models Using SAEs

2025-12-05

This work addresses the core challenge of achieving both interpretability and controllable generation in domain-specific protein language models, spec...

Theoretical Biology

Fluctuating Environments Favor Extreme Dormancy Strategies and Penalize Intermediate Ones

2025-12-05

This paper addresses the core challenge of determining how organisms should tune dormancy duration to match the temporal autocorrelation of their envi...


Journal: arXiv Preprint

Publication date: 2026-03

Bioinformatics, Computational Biology

Fold-CP: A Context Parallelism Framework for Biomolecular Modeling

NVIDIA | Rezo Therapeutics | Proxima | Earendil Labs

Dejun Lin, Simon Chu, Vishanth Iyer, Youhan Lee, John St John, Kevin Boyd, Brian Roland, Xiaowei Ren, Guoqing Zhou, Zhonglin Cao, Polina Binder, Yuliya Zhautouskaya, Jakub Zakrzewski, Maximilian Stadler, Kyle Gion, Yuxing Peng, Xi Chen, Tianjing Zhang, Philipp Junk, Michelle Dimon, Paweł Gniewek, Fabian Ortega, McKinley Polen, Ivan Grubisic, Ali Bashir, Graham Holt, Danny Kovtun, Matthias Grass, Luca Naef, Rui Wang, Jian Peng, Anthony Costa, Saee Paliwal, Eddie Calleja, Timur Rvachov, Neha Tadimeti, Roy Tal, Emine Kucukbenli

30-Second Read

IN SHORT: This paper addresses the critical bottleneck of GPU memory limitations that restrict AlphaFold 3-like models to processing only a few thousand residues, preventing the structural prediction of large biomolecular assemblies essential for understanding cellular function and disease mechanisms.

Key Innovations

Methodology: Introduces a novel 2D context parallelism (CP) framework that tiles the O(N^2) pair representation tensor across a square grid of GPUs, achieving per-device memory scaling of O(N^2/P), a significant improvement over prior 1D sharding approaches like Dynamic Axial Parallelism (O(N^2/√P)).
Methodology: Develops custom distributed algorithms for core geometric modules (Triangle Attention, Triangle Multiplication, etc.) using low-level torch.distributed primitives and a custom autograd implementation, avoiding the memory overhead of native PyTorch DTensor operations during backpropagation.
Biology: Demonstrates practical utility by enabling the structural scoring of over 90% of the mammalian protein complexes in the CORUM database and the full-length folding of the disease-relevant PI4KA lipid kinase complex with its intrinsically disordered region, tasks previously infeasible due to memory constraints.
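
The 2D tiling of the pair representation can be pictured as a mapping from a device rank to the block of rows and columns that device holds. The sketch below is an illustration only, not Fold-CP's code: the function name `tile_bounds`, the row-major rank layout, and the ceil-division tile size are all assumptions made for the example.

```python
import math


def tile_bounds(n_tokens, n_devices, rank):
    """Return ((row_start, row_stop), (col_start, col_stop)) for one device.

    Hypothetical helper: assumes a square sqrt(P) x sqrt(P) device mesh with
    ranks laid out in row-major order, which the source does not specify.
    """
    q = math.isqrt(n_devices)
    if q * q != n_devices:
        raise ValueError("n_devices must be a perfect square for a square mesh")
    if not 0 <= rank < n_devices:
        raise ValueError("rank out of range")

    tile = -(-n_tokens // q)      # ceil(n_tokens / q): edge length of a full tile
    r, c = divmod(rank, q)        # mesh coordinates of this rank

    def span(k):
        # Clamp both ends so trailing tiles shrink (or become empty) at the edge.
        return (min(k * tile, n_tokens), min((k + 1) * tile, n_tokens))

    return span(r), span(c)


if __name__ == "__main__":
    # 30,000 tokens on an 8 x 8 mesh: each device holds a 3750 x 3750 tile.
    print(tile_bounds(30_000, 64, 0))    # ((0, 3750), (0, 3750))
    print(tile_bounds(30_000, 64, 63))   # ((26250, 30000), (26250, 30000))
```

Because both the row and the column axis are divided by √P, each device stores roughly N^2/P pair entries, which is the per-device scaling quoted above.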

Key Findings

Fold-CP's 2D tiling strategy lets per-device memory shrink linearly with GPU count (O(N^2/P)), successfully predicting structures for assemblies exceeding 30,000 residues using 64 GPUs and breaking the previous single-GPU limit of ~2,048 tokens.
The framework maintains accuracy parity with single-device baselines while providing a scalable pathway, as evidenced by its application to score >90% of the CORUM database complexes.
By implementing novel distributed algorithms (e.g., Cannon-style ring for Triangle Multiplication) and a square device mesh topology, Fold-CP achieves practical execution speed, making large-scale in-context folding computationally feasible for the first time.
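
For readers unfamiliar with the term, the classic Cannon algorithm multiplies two block-partitioned matrices on a square grid by skewing the blocks once and then shifting them around rings. The sketch below simulates it in a single process with plain Python lists. It shows the generic textbook algorithm only: how Fold-CP adapts the idea to Triangle Multiplication is not described in this summary, and all function names here are hypothetical.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_blocks(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(m) // q
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def join_blocks(blocks):
    """Inverse of split_blocks: stitch a grid of blocks back into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([v for blk in block_row for v in blk[r]])
    return out


def cannon_matmul(a, b, q):
    """Compute a @ b on a simulated q x q device grid using Cannon's algorithm."""
    n = len(a)
    if n % q:
        raise ValueError("matrix size must be divisible by the grid size")
    s = n // q
    A, B = split_blocks(a, q), split_blocks(b, q)

    # Initial skew: row i of A shifts left by i, column j of B shifts up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]

    for _ in range(q):
        # Each "device" (i, j) multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                partial = matmul(A[i][j], B[i][j])
                for r in range(s):
                    for c in range(s):
                        C[i][j][r][c] += partial[r][c]
        # Ring shifts: A moves one step left, B moves one step up.
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    return join_blocks(C)


if __name__ == "__main__":
    a = [[i * 4 + j for j in range(4)] for i in range(4)]
    b = [[(i + 2 * j) % 5 for j in range(4)] for i in range(4)]
    print(cannon_matmul(a, b, 2) == matmul(a, b))
```

In a real multi-GPU setting the list re-indexing in the shift steps would be neighbor-to-neighbor sends and receives, so each device only ever holds one block of each operand at a time.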

Research Gap: Current state-of-the-art co-folding models (e.g., AlphaFold 3) hit a memory wall due to the quadratic O(N^2) scaling of their pairwise representation, imposing a practical limit of a few thousand residues per GPU. This prevents the modeling of large macromolecular assemblies (>70% of CORUM complexes), forcing reliance on approximations like sequence cropping, serial chunking, or linear attention that compromise accuracy and global context.
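
A back-of-the-envelope calculation shows how quickly the quadratic term grows. The sketch below counts the bytes of a single copy of the pair tensor only. The channel width of 128 and the 2-byte element size are assumptions chosen for illustration, not figures from the paper, and real runs also hold activations, gradients, and other buffers that this estimate ignores.

```python
def pair_tensor_bytes(n_tokens, channels=128, bytes_per_element=2, n_devices=1):
    """Bytes per device for one N x N x C pair tensor split evenly over devices.

    Hypothetical helper: `channels` and `bytes_per_element` are assumed
    defaults, and an even split across `n_devices` is an idealization.
    """
    total = n_tokens * n_tokens * channels * bytes_per_element
    return total / n_devices


if __name__ == "__main__":
    gib = 2 ** 30
    for n, p in [(2_048, 1), (30_000, 1), (30_000, 64)]:
        size = pair_tensor_bytes(n, n_devices=p)
        print(f"N={n:>6}, P={p:>2}: {size / gib:8.2f} GiB per device")
```

Under these assumptions one copy of the pair tensor is 1 GiB at 2,048 tokens and over 200 GiB at 30,000 tokens, which drops to a few GiB per device once it is divided across 64 GPUs.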

Abstract: Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by the hardware memory requirements of models like AlphaFold 3, imposing a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open-source reference architectures and implement custom multi-dimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling; for an N-token input distributed across P GPUs, per-device memory scales as O(N^2/P), enabling the structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of the Comprehensive Resource of Mammalian Protein Complexes (CORUM) database, as well as the folding of the disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.