Paper List
-
Macroscopic Dominance from Microscopic Extremes: Symmetry Breaking in Spatial Competition
This paper addresses the fundamental question of how microscopic stochastic advantages in spatial exploration translate into macroscopic resource domi...
-
Linear Readout of Neural Manifolds with Continuous Variables
This paper addresses the core challenge of quantifying how the geometric structure of high-dimensional neural population activity (neural manifolds) d...
-
Theory of Cell Body Lensing and Phototaxis Sign Reversal in “Eyeless” Mutants of Chlamydomonas
This paper solves the core puzzle of how eyeless mutants of Chlamydomonas exhibit reversed phototaxis by quantitatively modeling the competition betwe...
-
Cross-Species Transfer Learning for Electrophysiology-to-Transcriptomics Mapping in Cortical GABAergic Interneurons
This paper addresses the challenge of predicting transcriptomic identity from electrophysiological recordings in human cortical interneurons, where li...
-
Uncovering statistical structure in large-scale neural activity with Restricted Boltzmann Machines
This paper addresses the core challenge of modeling large-scale neural population activity (1500-2000 neurons) with interpretable higher-order interac...
-
Realizing Common Random Numbers: Event-Keyed Hashing for Causally Valid Stochastic Models
This paper addresses the critical problem that standard stateful PRNG implementations in agent-based models violate causal validity by making random d...
-
A Standardized Framework for Evaluating Gene Expression Generative Models
This paper addresses the critical lack of standardized evaluation protocols for single-cell gene expression generative models, where inconsistent metr...
-
Single Molecule Localization Microscopy Challenge: A Biologically Inspired Benchmark for Long-Sequence Modeling
This paper addresses the core challenge of evaluating state-space models on biologically realistic, sparse, and stochastic temporal processes, which a...
Fold-CP: A Context Parallelism Framework for Biomolecular Modeling
NVIDIA | Rezo Therapeutics | Proxima | Earendil Labs
30-Second Summary
IN SHORT: This paper addresses the critical bottleneck of GPU memory limitations that restrict AlphaFold 3-like models to processing only a few thousand residues, preventing the structural prediction of large biomolecular assemblies essential for understanding cellular function and disease mechanisms.
Core Innovations
- Methodology: Introduces a novel 2D context parallelism (CP) framework that tiles the O(N^2) pair representation tensor across a square grid of GPUs, achieving per-device memory scaling of O(N^2/P), a significant improvement over prior 1D sharding approaches like Dynamic Axial Parallelism (O(N^2/√P)).
- Methodology: Develops custom distributed algorithms for core geometric modules (Triangle Attention, Triangle Multiplication, etc.) using low-level torch.distributed primitives and a custom autograd imperative, avoiding the memory overhead of native PyTorch DTensor operations during backpropagation.
- Biology: Demonstrates practical utility by enabling the structural scoring of over 90% of the mammalian protein complexes in the CORUM database and the full-length folding of the disease-relevant PI4KA lipid kinase complex with its intrinsically disordered region, tasks previously infeasible due to memory constraints.
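The O(N^2/P) per-device scaling of the tiled pair representation can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not taken from Fold-CP; the function names, the 128-channel pair width, and the 2-byte element size are assumptions chosen for illustration, and the count covers a single copy of the pair tensor only (no activations, gradients, or communication buffers).

```python
def tile_side(n_tokens: int, n_devices: int) -> int:
    """Side length of one square tile on a sqrt(P) x sqrt(P) device grid."""
    grid = round(n_devices ** 0.5)
    if grid * grid != n_devices:
        raise ValueError("n_devices must be a perfect square for a square mesh")
    # Ceiling division so the last row/column of tiles covers the remainder.
    return -(-n_tokens // grid)


def pair_bytes_per_device(n_tokens: int, n_devices: int,
                          channels: int = 128,        # assumed pair channel width
                          bytes_per_elem: int = 2):   # assumed 16-bit storage
    """Bytes held per device for one copy of an N x N x C pair tensor."""
    side = tile_side(n_tokens, n_devices)
    return side * side * channels * bytes_per_elem


if __name__ == "__main__":
    for p in (1, 4, 16, 64):
        gb = pair_bytes_per_device(30_000, p) / 1e9
        print(f"P={p:>2}: {gb:.1f} GB per device")
```

Under these assumptions, a 30,000-token input needs about 230.4 GB for one copy of the pair tensor on a single device and about 3.6 GB per device on an 8 x 8 grid of 64 devices, a 64-fold reduction.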
Key Findings
- Fold-CP's 2D tiling strategy makes per-device memory fall linearly with GPU count (O(N^2/P)), successfully predicting structures for assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs and breaking the previous single-GPU limit of ~2,048 tokens.
- The framework maintains accuracy parity with single-device baselines while providing a scalable pathway, as evidenced by its application to score >90% of the CORUM database complexes.
- By implementing novel distributed algorithms (e.g., Cannon-style ring for Triangle Multiplication) and a square device mesh topology, Fold-CP achieves practical execution speed, making large-scale in-context folding computationally feasible for the first time.
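For readers unfamiliar with the Cannon-style ring pattern mentioned above, the following is a minimal single-process sketch of the classic Cannon algorithm for a block matrix product on a q x q grid. It is not Fold-CP's implementation: the real framework works on multi-channel tensors with torch.distributed, whereas this example simulates the grid with nested Python lists and plain matrices, and all function names are hypothetical. The connection is that the triangle multiplicative update in AlphaFold-style models contracts two pair-shaped operands over a shared index, which per channel is a matrix product.

```python
def matmul(a, b):
    """Dense product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(mat, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(mat) // q
    return [[[row[c * s:(c + 1) * s] for row in mat[r * s:(r + 1) * s]]
             for c in range(q)] for r in range(q)]


def cannon(a, b, q):
    """Compute a @ b by simulating Cannon's algorithm on a q x q device grid."""
    n = len(a)
    assert n % q == 0, "matrix size must be divisible by the grid size"
    s = n // q
    A, B = split(a, q), split(b, q)
    # Initial skew: shift row r of A left by r, column c of B up by c, so that
    # "device" (r, c) starts with blocks sharing the inner index (r + c) mod q.
    A = [[A[r][(c + r) % q] for c in range(q)] for r in range(q)]
    B = [[B[(r + c) % q][c] for c in range(q)] for r in range(q)]
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Local multiply-accumulate on every device.
        C = [[add(C[r][c], matmul(A[r][c], B[r][c])) for c in range(q)]
             for r in range(q)]
        # Ring shifts: A blocks move one step left, B blocks one step up.
        A = [[A[r][(c + 1) % q] for c in range(q)] for r in range(q)]
        B = [[B[(r + 1) % q][c] for c in range(q)] for r in range(q)]
    # Stitch the output blocks back into one n x n matrix.
    return [sum((C[r][c][i] for c in range(q)), [])
            for r in range(q) for i in range(s)]


if __name__ == "__main__":
    a = [[i * 4 + j for j in range(4)] for i in range(4)]
    b = [[(i + 1) * (j + 2) for j in range(4)] for i in range(4)]
    print(cannon(a, b, 2) == matmul(a, b))
```

Each simulated device only ever holds one block of each operand and one output block, which is why this pattern pairs naturally with a square device mesh: no device needs a full row or column of the operands at any step.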
Abstract: Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by the hardware memory requirements of models like AlphaFold 3, imposing a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open source reference architectures and implement custom multi-dimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling; for an N-token input distributed across P GPUs, per-device memory scales as O(N^2/P), enabling the structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of the Comprehensive Resource of Mammalian protein complexes (CORUM) database, as well as folding of the disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.