Fold-CP: A Context Parallelism Framework for Biomolecular Modeling
NVIDIA | Rezo Therapeutics | Proxima | Earendil Labs
30-Second Read
IN SHORT: This paper addresses the critical bottleneck of GPU memory limitations that restrict AlphaFold 3-like models to processing only a few thousand residues, preventing the structural prediction of large biomolecular assemblies essential for understanding cellular function and disease mechanisms.
Key Innovations
- Methodology: Introduces a 2D context parallelism (CP) framework that tiles the O(N^2) pair representation tensor across a square grid of P GPUs, so that each device holds one tile and per-device memory scales as O(N^2/P), an improvement over prior 1D sharding approaches such as Dynamic Axial Parallelism (O(N^2/√P)).
- Methodology: Develops custom distributed algorithms for the core geometric modules (Triangle Attention, Triangle Multiplication, etc.) using low-level torch.distributed primitives and custom autograd implementations, avoiding the memory overhead that native PyTorch DTensor operations incur during backpropagation.
- Biology: Demonstrates practical utility by enabling structural scoring of over 90% of the mammalian protein complexes in the CORUM database and full-length folding of the disease-relevant PI4KA lipid kinase complex together with its intrinsically disordered region, tasks previously infeasible because of memory constraints.
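The O(N^2/P) figure in the first item can be checked with a little arithmetic: on a √P × √P mesh, each GPU holds a tile of roughly (N/√P) × (N/√P) pair entries. The sketch below is only an illustration. The function names, the 128-channel pair width, and the 2-byte (bf16) element size are assumptions made here, not values taken from the paper, and the estimate covers a single copy of the pair tensor rather than the total activation memory.

```python
import math


def pair_tile_shape(n_tokens, n_gpus):
    """Rows and columns of the pair-tensor tile held by one GPU on a square mesh."""
    side = math.isqrt(n_gpus)
    if side * side != n_gpus:
        raise ValueError("a square device mesh needs a perfect-square GPU count")
    # Round up so that inputs not divisible by the mesh side are padded.
    tile = math.ceil(n_tokens / side)
    return tile, tile


def pair_tile_gib(n_tokens, n_gpus, channels=128, bytes_per_value=2):
    """Size in GiB of one GPU's tile of a single pair tensor.

    channels and bytes_per_value are assumed values (128 channels, bf16).
    """
    rows, cols = pair_tile_shape(n_tokens, n_gpus)
    return rows * cols * channels * bytes_per_value / 2**30


if __name__ == "__main__":
    for gpus in (1, 4, 16, 64):
        shape = pair_tile_shape(30_000, gpus)
        print(f"{gpus:>2} GPUs: tile {shape}, {pair_tile_gib(30_000, gpus):.2f} GiB")
```

Under these assumptions, going from 1 to 64 GPUs shrinks the per-device tile by a factor of 64, which is the scaling the item describes.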
Main Conclusions
- Fold-CP's 2D tiling strategy makes per-device memory fall in proportion to the number of GPUs, allowing structure prediction for assemblies exceeding 30,000 residues on 64 GPUs, well beyond the previous single-GPU limit of ~2,048 tokens.
- The framework maintains accuracy parity with single-device baselines while scaling to much larger inputs; its use to score >90% of the complexes in the CORUM database illustrates that scalability.
- By implementing distributed algorithms (e.g., a Cannon-style ring for Triangle Multiplication) on a square device mesh topology, Fold-CP achieves practical execution speed, making full-context folding of large assemblies computationally feasible.
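For readers unfamiliar with the Cannon-style ring mentioned in the last item, here is a minimal single-process sketch of the textbook Cannon algorithm for C = A·B on a q × q grid of tiles. It is not Fold-CP's implementation: the devices are simulated with nested Python lists, the ring shifts are plain list rotations rather than torch.distributed calls, and all function names are hypothetical. The link to Triangle Multiplication is that, for a single channel, the "outgoing" contraction out[i][j] = Σ_k a[i][k]·b[j][k] is the matrix product A·Bᵀ (gating and normalization are omitted here).

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_blocks(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) tiles."""
    t = len(m) // q
    return [[[row[bj * t:(bj + 1) * t] for row in m[bi * t:(bi + 1) * t]]
             for bj in range(q)] for bi in range(q)]


def join_blocks(blocks):
    """Inverse of split_blocks: stitch a grid of tiles back into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([v for blk in block_row for v in blk[r]])
    return out


def cannon_matmul(a, b, q):
    """Compute a @ b with Cannon's algorithm on a simulated q x q device grid."""
    n = len(a)
    assert n % q == 0, "matrix size must be divisible by the grid side"
    t = n // q
    a_blk, b_blk = split_blocks(a, q), split_blocks(b, q)
    # Initial skew: row i of A rotates left by i, column j of B rotates up by j.
    a_loc = [[a_blk[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_loc = [[b_blk[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c_loc = [[[[0] * t for _ in range(t)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Each simulated device multiplies the two tiles it currently holds.
        for i in range(q):
            for j in range(q):
                prod = matmul(a_loc[i][j], b_loc[i][j])
                for r in range(t):
                    for c in range(t):
                        c_loc[i][j][r][c] += prod[r][c]
        # Ring shifts: A tiles move one step left, B tiles one step up.
        a_loc = [[a_loc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_loc = [[b_loc[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return join_blocks(c_loc)


# Single-channel "outgoing" contraction: out[i][j] = sum_k a[i][k] * b[j][k].
n, q = 6, 3
a = [[(3 * i + j) % 7 - 3 for j in range(n)] for i in range(n)]
b = [[(i - 2 * j) % 5 - 2 for j in range(n)] for i in range(n)]
b_t = [list(col) for col in zip(*b)]
out = cannon_matmul(a, b_t, q)
assert out == matmul(a, b_t)
```

Each simulated device only ever holds one tile of A, one of B, and one of C, which is why this pattern pairs naturally with a square device mesh: no device has to gather a full row or column of the pair tensor.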
Abstract: Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by the hardware memory requirements of models like AlphaFold 3, which impose a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open-source reference architectures and implement custom multi-dimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling; for an N-token input distributed across P GPUs, per-device memory scales as O(N^2/P), enabling the structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of the Comprehensive Resource of Mammalian protein complexes (CORUM) database, as well as folding of the disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.