Paper List
-
Discovery of a Hematopoietic Manifold in scGPT Yields a Method for Extracting Performant Algorithms from Biological Foundation Model Internals
This work addresses the core challenge of extracting reusable, interpretable, and high-performance biological algorithms from the opaque internal repr...
-
MS2MetGAN: Latent-space adversarial training for metabolite–spectrum matching in MS/MS database search
This paper addresses the critical bottleneck in metabolite identification: the generation of high-quality negative training samples that are structura...
-
Toward Robust, Reproducible, and Widely Accessible Intracranial Language Brain-Computer Interfaces: A Comprehensive Review of Neural Mechanisms, Hardware, Algorithms, Evaluation, Clinical Pathways and Future Directions
This review addresses the core challenge of fragmented and heterogeneous evidence that hinders the clinical translation of intracranial language BCIs,...
-
Less Is More in Chemotherapy of Breast Cancer
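This paper addresses the oversimplification of existing tumor-immune models by incorporating cell-cycle time delays and competition terms, enabling quantitative comparison of chemotherapy regimens.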
-
Fold-CP: A Context Parallelism Framework for Biomolecular Modeling
This paper addresses the critical bottleneck of GPU memory limitations that restrict AlphaFold 3-like models to processing only a few thousand residue...
-
Open Biomedical Knowledge Graphs at Scale: Construction, Federation, and AI Agent Access with Samyama Graph Database
This paper addresses the core pain point of fragmented biomedical data by constructing and federating large-scale, open knowledge graphs to enable sea...
-
Predictive Analytics for Foot Ulcers Using Time-Series Temperature and Pressure Data
This paper addresses the critical need for continuous, real-time monitoring of diabetic foot health by developing an unsupervised anomaly detection fr...
-
Hypothesis-Based Particle Detection for Accurate Nanoparticle Counting and Digital Diagnostics
This paper addresses the core challenge of achieving accurate, interpretable, and training-free nanoparticle counting in digital diagnostic assays, wh...
Cross-Species Antimicrobial Resistance Prediction from Genomic Foundation Models
Department of Computer Science, School of Engineering and Applied Science, Columbia University
30-Second Summary
IN SHORT: This paper addresses the core challenge of predicting antimicrobial resistance across phylogenetically distinct bacterial species, where traditional methods fail due to reliance on species-specific genomic shortcuts rather than transferable resistance mechanisms.
Key Innovations
- Methodology: Developed diagnostic-driven layer selection for genomic foundation models, identifying Layer 10 in Evo-1-8k-base as the deepest jointly stable extraction point through activation scale, isotropy, effective rank, and cross-seed stability analysis.
- Methodology: Introduced MiniRocket-based local pattern preservation for embedding aggregation, treating per-window embeddings as ordered multivariate signals to preserve sparse cassette-scale resistance signals that global pooling dilutes.
- Biology: Established the mechanism-mix hypothesis: cross-species AMR prediction performance depends on whether resistance is cassette-mediated (transferable) or chromosomal/diffuse (species-specific), not just on the aggregation method.
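The layer diagnostics named in the first bullet can be illustrated with a small sketch. This is not the paper's implementation: the function names are hypothetical, mean pairwise cosine similarity is assumed here as a simple isotropy proxy, and the participation ratio (tr C)^2 / tr(C^2) is assumed as a stand-in for effective rank, since the source does not give its exact definitions.

```python
import math
from itertools import combinations


def mean_pairwise_cosine(vectors):
    """Isotropy proxy (assumption): average cosine similarity over all pairs.

    Values near 1 mean the embeddings point in almost the same direction
    (collapsed geometry); lower values mean more angular diversity.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def participation_ratio(vectors):
    """Effective-rank stand-in (assumption): (tr C)^2 / tr(C^2).

    C is the covariance of the embeddings. The ratio equals the number of
    dimensions when variance is spread evenly and drops toward 1 when a
    single direction dominates.
    """
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - means[j] for j in range(d)] for v in vectors]
    cov = [[sum(row[i] * row[j] for row in centered) / n for j in range(d)]
           for i in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    trace_of_square = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    return trace * trace / trace_of_square


# Toy embeddings (hypothetical): one well-spread set, one collapsed set.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
collapsed = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
```

Computing both quantities per layer and per seed would give one way to look for the kind of joint stability boundary the bullet describes; the thresholds for calling a layer "stable" are not specified in the source.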
Main Conclusions
- MiniRocket aggregation with a k-NN classifier achieved MCC=0.753 on cross-species validation (val_outside) and substantially outperformed global pooling with the same classifier (F1=0.982 vs. 0.901), while the Kover baseline collapsed from within-species F1≈0.68 to cross-species F1=0.02.
- Cross-species performance is mechanism-dependent: MiniRocket excels when cassette-mediated resistance predominates (e.g., plasmid-borne β-lactamases), while global pooling remains competitive for chromosomal/diffuse mechanisms.
- Layer 10 of Evo-1-8k-base is the deepest jointly stable extraction layer, with sharp degradation starting at Layer 11, evidenced by isotropy collapse (angular diversity peaks at L9-L10) and effective rank compression at L11.
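The conclusions above quote both MCC and F1. As a reminder of how the two metrics are computed and why they can disagree, here is a minimal sketch using the standard formulas on hypothetical binary labels (1 = resistant, 0 = susceptible); none of the numbers below come from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, tn, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); true negatives are ignored."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical example 1: a balanced set with one error of each kind.
balanced = confusion_counts([1, 1, 1, 1, 0, 0, 0, 0],
                            [1, 1, 1, 0, 0, 0, 0, 1])

# Hypothetical example 2: a classifier that always predicts "resistant"
# on a mostly resistant set. F1 looks strong, MCC reports no skill.
always_positive = confusion_counts([1, 1, 1, 0], [1, 1, 1, 1])
```

Because F1 ignores true negatives, a high F1 on a resistance-heavy held-out set does not by itself imply a high MCC, which is one reason to read the two figures separately.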
Abstract: Cross-species antimicrobial resistance (AMR) prediction is fundamentally an out-of-distribution generalization problem: models trained on one set of bacterial taxa must transfer to phylogenetically distinct genomes that may rely on different resistance mechanisms. Critically, resistance is not monolithic. Across species, it arises from a heterogeneous mixture of localized, horizontally transferred gene cassettes and diffuse, species-specific genomic backgrounds, making successful transfer inherently mechanism-dependent. Using a strict species holdout protocol, we first establish an interpretable k-mer baseline with Kover, showing that strong within-species performance collapses under true cross-species evaluation. This motivates the need for representation-level choices that explicitly preserve transferable biological signals rather than amplify phylogenetic shortcuts. We introduce two ingredients that make genomic foundation model embeddings effective for cross-species AMR prediction. First, for layer selection, we develop diagnostics for activation scale, isotropy, effective rank, and cross-seed stability under native bfloat16 inference. These reveal a sharp stability boundary at Layer 11 in Evo-1-8k-base, identifying Layer 10 as the deepest jointly stable layer; extracting embeddings here improves downstream conditioning, reproducibility, and robustness. Second, for feature aggregation, we argue that global pooling obscures localized resistance mechanisms. Instead, we treat per-window embeddings as an ordered multivariate signal and apply MiniRocket to summarize multi-scale local activation patterns. This preserves cassette-scale signals (e.g., plasmid-borne β-lactamases) that global averages dilute, reorganizing feature space toward phenotype-aligned neighborhoods where simple classifiers can generalize across species.
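The dilution argument can be made concrete with a toy sketch. It is not MiniRocket itself: it uses a single hypothetical embedding dimension, one hand-picked kernel of length 3 with weights drawn from {-1, 2}, and a fixed bias, whereas MiniRocket uses many length-9 kernels over multiple dilations with data-derived biases. The pooling statistic, the proportion of positive values (PPV), is the one MiniRocket uses.

```python
def global_mean(signal):
    """Global average pooling over the ordered per-window values."""
    return sum(signal) / len(signal)


def local_ppv(signal, kernel, dilation, bias):
    """Proportion of positions where the dilated convolution exceeds the bias.

    Simplified, single-kernel analogue of a MiniRocket feature (assumption);
    the convolution is 'valid', so only fully covered positions are counted.
    """
    span = (len(kernel) - 1) * dilation
    responses = [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]
    return sum(1 for r in responses if r > bias) / len(responses)


# Hypothetical per-window activations along two genomes (one dimension).
# cassette_like: one sharp, localized peak (a cassette-scale signal).
# diffuse: the same total activation spread evenly across all windows.
cassette_like = [0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
diffuse = [0.5] * 10

kernel = [-1.0, 2.0, -1.0]  # peak detector with weights from {-1, 2}

same_mean = global_mean(cassette_like) == global_mean(diffuse)
ppv_cassette = local_ppv(cassette_like, kernel, dilation=1, bias=1.0)
ppv_diffuse = local_ppv(diffuse, kernel, dilation=1, bias=1.0)
```

Here the two signals are indistinguishable after global pooling, yet the local feature separates them, which is the behaviour the abstract attributes to preserving the window order.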
On ampicillin resistance across 3,388 genomes from 126 species, we show that cross-species performance depends on which resistance mechanisms dominate the held-out species, not on aggregation method alone. MiniRocket excels when cassette-mediated resistance predominates; Global Pooling remains competitive for chromosomal or diffuse mechanisms. Both approaches perform similarly under same-species evaluation. Beyond accuracy, MiniRocket enables zero-training aggregation, interpretable predictions via neighbor auditing, and biological validation through mechanism-based clustering. Unlike complex decision boundaries learned by gradient boosting, k-NN exposes the underlying geometric reorganization that explains when and why local pattern preservation succeeds: reduced phylogenetic hubness and increased cross-species mechanism sharing. Together, our results establish aggregation choice as a central axis in cross-species AMR prediction and provide a reproducible, diagnostic-driven framework for deploying genomic foundation models under distribution shift.
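A strict species holdout and k-NN neighbor auditing can be sketched as follows. The records, feature vectors, species names, and function names are all hypothetical placeholders, and Euclidean distance with majority voting is an assumption; the source does not specify the distance metric or the value of k.

```python
import math
from collections import Counter


def species_holdout(records, held_out_species):
    """Split so that no held-out species contributes any training genome."""
    train = [r for r in records if r["species"] not in held_out_species]
    test = [r for r in records if r["species"] in held_out_species]
    return train, test


def knn_predict_with_audit(query_features, train, k=3):
    """Majority-vote k-NN that also returns the neighbors for inspection."""
    ranked = sorted(train, key=lambda r: math.dist(query_features, r["features"]))
    neighbors = ranked[:k]
    votes = Counter(r["label"] for r in neighbors)
    return votes.most_common(1)[0][0], neighbors


# Toy aggregated features (hypothetical), e.g. the output of an aggregation step.
records = [
    {"id": "g1", "species": "species_A", "features": (0.90, 0.10), "label": "resistant"},
    {"id": "g2", "species": "species_A", "features": (0.10, 0.20), "label": "susceptible"},
    {"id": "g3", "species": "species_B", "features": (0.80, 0.20), "label": "resistant"},
    {"id": "g4", "species": "species_B", "features": (0.20, 0.10), "label": "susceptible"},
    {"id": "g5", "species": "species_C", "features": (0.85, 0.15), "label": "resistant"},
    {"id": "g6", "species": "species_C", "features": (0.00, 0.30), "label": "susceptible"},
    {"id": "g7", "species": "species_D", "features": (0.88, 0.12), "label": "resistant"},
]

train, test = species_holdout(records, held_out_species={"species_D"})
predicted, neighbors = knn_predict_with_audit(test[0]["features"], train, k=3)

# Audit: which genomes and species drove the prediction?
neighbor_ids = [n["id"] for n in neighbors]
neighbor_species = {n["species"] for n in neighbors}
```

Auditing then amounts to reading off the neighbors: if a held-out genome is labeled by neighbors drawn from several different species, the prediction rests on shared signal rather than on a single dominant relative, which is the kind of cross-species sharing the abstract refers to.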