Paper List
-
Developing the PsyCogMetrics™ AI Lab to Evaluate Large Language Models and Advance Cognitive Science
This paper addresses the critical gap between sophisticated LLM evaluation needs and the lack of accessible, scientifically rigorous platforms that in...
-
Equivalence of approximation by networks of single- and multi-spike neurons
This paper resolves the fundamental question of whether single-spike spiking neural networks (SNNs) are inherently less expressive than multi-spike SN...
-
The neuroscience of transformers
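Proposes a novel computational mapping between the Transformer architecture and cortical column microcircuits, bridging modern AI and neuroscience.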
-
Framing local structural identifiability and observability in terms of parameter-state symmetries
This paper addresses the core challenge of systematically determining which parameters and states in a mechanistic ODE model can be uniquely inferred ...
-
Leveraging Phytolith Research using Artificial Intelligence
This paper addresses the critical bottleneck in phytolith research by automating the labor-intensive manual microscopy process through a multimodal AI...
-
Neural network-based encoding in free-viewing fMRI with gaze-aware models
This paper addresses the core challenge of building computationally efficient and ecologically valid brain encoding models for naturalistic vision by ...
-
Scalable DNA Ternary Full Adder Enabled by a Competitive Blocking Circuit
This paper addresses the core bottleneck of carry information attenuation and limited computational scale in DNA binary adders by introducing a scalab...
-
ELISA: An Interpretable Hybrid Generative AI Agent for Expression-Grounded Discovery in Single-Cell Genomics
This paper addresses the critical bottleneck of translating high-dimensional single-cell transcriptomic data into interpretable biological hypotheses ...
Fold-CP: A Context Parallelism Framework for Biomolecular Modeling
NVIDIA | Rezo Therapeutics | Proxima | Earendil Labs
30-Second Read
IN SHORT: This paper addresses the critical bottleneck of GPU memory limitations that restrict AlphaFold 3-like models to processing only a few thousand residues, preventing the structural prediction of large biomolecular assemblies essential for understanding cellular function and disease mechanisms.
Key Innovations
- Methodology: Introduces a novel 2D context parallelism (CP) framework that tiles the O(N^2) pair representation tensor across a square grid of GPUs, achieving per-device memory scaling of O(N^2/P), a significant improvement over prior 1D sharding approaches like Dynamic Axial Parallelism (O(N^2/√P)).
- Methodology: Develops custom distributed algorithms for core geometric modules (Triangle Attention, Triangle Multiplication, etc.) using low-level torch.distributed primitives and custom autograd implementations, avoiding the memory overhead of native PyTorch DTensor operations during backpropagation.
- Biology: Demonstrates practical utility by enabling the structural scoring of over 90% of the mammalian protein complexes in the CORUM database and the full-length folding of the disease-relevant PI4KA lipid kinase complex with its intrinsically disordered region, tasks previously infeasible due to memory constraints.
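To make the O(N^2/P) scaling concrete, the following is a rough back-of-the-envelope sketch, not code from Fold-CP. It assumes the pair tensor is padded so that it splits evenly over a √P × √P grid, and it uses a hypothetical pair channel width of 128 stored in 2-byte precision; both values and the function names are illustrative assumptions. It counts only the pair representation itself, not activations or communication buffers.

```python
import math

def pair_tile_shape(n_tokens: int, n_gpus: int) -> tuple[int, int]:
    """Rows and columns of the pair tensor held by one GPU on a square grid."""
    side = math.isqrt(n_gpus)
    if side * side != n_gpus:
        raise ValueError("a square device mesh needs a perfect-square GPU count")
    # Assumption: tokens are padded up so every GPU gets an equal tile.
    tile = math.ceil(n_tokens / side)
    return tile, tile

def pair_bytes_per_gpu(n_tokens: int, n_gpus: int,
                       channels: int = 128, bytes_per_elem: int = 2) -> int:
    """Bytes needed for one GPU's tile of the N x N x channels pair tensor.

    channels and bytes_per_elem are illustrative defaults, not values
    taken from the paper.
    """
    rows, cols = pair_tile_shape(n_tokens, n_gpus)
    return rows * cols * channels * bytes_per_elem

if __name__ == "__main__":
    for gpus in (1, 4, 16, 64):
        gib = pair_bytes_per_gpu(30_000, gpus) / 2**30
        print(f"{gpus:>2} GPUs: {pair_tile_shape(30_000, gpus)} tile, {gib:.1f} GiB")
```

Under these assumptions, going from 1 to 64 GPUs shrinks each GPU's share of the pair tensor by a factor of 64, which is the 1/P behaviour the summary describes.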
Main Conclusions
- Fold-CP's 2D tiling strategy makes per-device memory scale as O(N^2/P), so the memory needed per GPU falls in proportion to the number of GPUs. This allowed structure prediction for assemblies exceeding 30,000 residues on 64 GPUs, well beyond the previous single-GPU limit of ~2,048 tokens.
- The framework maintains accuracy parity with single-device baselines while providing a scalable pathway, as evidenced by its application to score >90% of the CORUM database complexes.
- By implementing novel distributed algorithms (e.g., Cannon-style ring for Triangle Multiplication) and a square device mesh topology, Fold-CP achieves practical execution speed, making large-scale in-context folding computationally feasible for the first time.
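The "Cannon-style ring" mentioned above can be illustrated with a small single-process simulation. This is a sketch of the classic Cannon algorithm, not the authors' implementation: it assumes the outgoing triangle multiplication reduces, per channel, to out[i][j] = sum over k of a[i][k] * b[j][k], which is the matrix product of a with the transpose of b. Gating, normalization, channels, and real GPU communication are all omitted, and every function name here is hypothetical.

```python
def matmul(x, y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def add(x, y):
    """Elementwise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def split_blocks(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) tiles."""
    t = len(m) // q
    return [[[row[j * t:(j + 1) * t] for row in m[i * t:(i + 1) * t]]
             for j in range(q)] for i in range(q)]

def cannon_matmul(a, b, q):
    """Simulate Cannon's algorithm for a @ b on a q x q grid of devices."""
    n = len(a)
    assert n % q == 0, "matrix size must be divisible by the grid side"
    t = n // q
    A, B = split_blocks(a, q), split_blocks(b, q)
    # Initial skew: tile row i of A moves left by i, tile column j of B moves up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[[[0] * t for _ in range(t)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Each device multiplies the two tiles it currently holds.
        C = [[add(C[i][j], matmul(A[i][j], B[i][j])) for j in range(q)]
             for i in range(q)]
        # Ring shift: A tiles move one step left, B tiles one step up.
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Stitch the output tiles back into one n x n matrix.
    return [sum((C[i][j][r] for j in range(q)), [])
            for i in range(q) for r in range(t)]

def triangle_outgoing(a, b, q):
    """out[i][j] = sum_k a[i][k] * b[j][k], computed tile by tile."""
    b_t = [list(col) for col in zip(*b)]  # transpose so the update is a plain product
    return cannon_matmul(a, b_t, q)

if __name__ == "__main__":
    a = [[i * 4 + j for j in range(4)] for i in range(4)]
    b = [[(i + 1) * (j + 2) for j in range(4)] for i in range(4)]
    reference = [[sum(a[i][k] * b[j][k] for k in range(4)) for j in range(4)]
                 for i in range(4)]
    assert triangle_outgoing(a, b, 2) == reference
```

The point of the ring schedule is that each simulated device only ever holds one tile of each operand at a time and exchanges tiles with its grid neighbours, so no device needs a full row or column of the pair tensor in memory.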
Abstract: Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by hardware memory requirements of models like AlphaFold 3, imposing a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open source reference architectures and implement custom multi-dimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling; for an N-token input distributed across P GPUs, per-device memory scales as O(N^2/P), enabling the structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of the Comprehensive Resource of Mammalian protein complexes (CORUM) database, as well as the folding of the disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.