Paper List
-
A Unified Variational Principle for Branching Transport Networks: Wave Impedance, Viscous Flow, and Tissue Metabolism
This paper solves the core problem of predicting the empirically observed branching exponent (α≈2.7) in mammalian arterial trees, which neither Murray...
-
Household Bubbling Strategies for Epidemic Control and Social Connectivity
This paper addresses the core challenge of designing household merging (social bubble) strategies that effectively control epidemic risk while maximiz...
-
Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening
This paper addresses the core challenge of bridging the gap between scalable chemical structure screening and biologically informative but resource-in...
-
A mechanical bifurcation constrains the evolution of cell sheet folding in the family Volvocaceae
This paper addresses the core problem of why there is an evolutionary gap in species with intermediate cell numbers (e.g., 256 cells) in Volvocaceae, ...
-
Bayesian Inference in Epidemic Modelling: A Beginner’s Guide Illustrated with the SIR Model
This guide addresses the core challenge of estimating uncertain epidemiological parameters (like transmission and recovery rates) from noisy, real-wor...
-
Geometric framework for biological evolution
This paper addresses the fundamental challenge of developing a coordinate-independent, geometric description of evolutionary dynamics that bridges gen...
-
A multiscale discrete-to-continuum framework for structured population models
This paper addresses the core challenge of systematically deriving uniformly valid continuum approximations from discrete structured population models...
-
Whole slide and microscopy image analysis with QuPath and OMERO
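Enables QuPath to analyze images stored on OMERO servers directly, without downloading entire datasets, overcoming local storage limits in large-scale studies.
-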
Cross-Species Antimicrobial Resistance Prediction from Genomic Foundation Models
Department of Computer Science, School of Engineering and Applied Science, Columbia University
30-Second Read
IN SHORT: This paper addresses the core challenge of predicting antimicrobial resistance across phylogenetically distinct bacterial species, where traditional methods fail due to reliance on species-specific genomic shortcuts rather than transferable resistance mechanisms.
Core Innovations
- Methodology: Developed diagnostic-driven layer selection for genomic foundation models, identifying Layer 10 of Evo-1-8k-base as the deepest jointly stable extraction point through activation-scale, isotropy, effective-rank, and cross-seed stability analyses.
- Methodology: Introduced MiniRocket-based local pattern preservation for embedding aggregation, treating per-window embeddings as ordered multivariate signals so that sparse, cassette-scale resistance signals, which global pooling dilutes, are preserved.
- Biology: Established the mechanism-mix hypothesis, under which cross-species AMR prediction performance depends on whether resistance is cassette-mediated (transferable) or chromosomal/diffuse (species-specific), not on the aggregation method alone.
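The four layer diagnostics listed above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: the exact formulas are not given here, so isotropy is approximated by mean pairwise angle (the "angular diversity" mentioned in the conclusions), effective rank uses the Roy and Vetterli entropy definition, cross-seed stability is measured with linear CKA between two extraction runs, and the helper `deepest_stable_layer` with its thresholds is hypothetical. Embeddings are assumed to have been cast from bfloat16 to a float32/float64 array first.

```python
import numpy as np

def activation_scale(X: np.ndarray) -> float:
    """Mean L2 norm of the embedding vectors (rows of X)."""
    return float(np.linalg.norm(X, axis=1).mean())

def angular_diversity(X: np.ndarray) -> float:
    """Isotropy proxy (assumption): mean pairwise angle in radians between embeddings.
    Values near 0 mean the vectors have collapsed into a narrow cone."""
    U = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(U @ U.T, -1.0, 1.0)
    iu = np.triu_indices(len(X), k=1)
    return float(np.arccos(cos[iu]).mean())

def effective_rank(X: np.ndarray) -> float:
    """Roy and Vetterli effective rank: exp of the Shannon entropy of the
    normalized singular values of the mean-centered embedding matrix."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def cross_seed_stability(X_a: np.ndarray, X_b: np.ndarray) -> float:
    """Linear CKA between two runs over the same inputs (1.0 = identical geometry)."""
    A = X_a - X_a.mean(axis=0)
    B = X_b - X_b.mean(axis=0)
    num = np.linalg.norm(A.T @ B, "fro") ** 2
    den = np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro")
    return float(num / den)

def deepest_stable_layer(layers, max_scale=1e3, min_angle=0.5,
                         min_rank_frac=0.1, min_cka=0.9):
    """Hypothetical rule: deepest layer passing every check.
    `layers` maps layer index -> (run_a, run_b), each shaped (n_sequences, dim).
    Thresholds are illustrative values, not taken from the paper."""
    passing = []
    for idx, (run_a, run_b) in layers.items():
        ok = (np.isfinite(run_a).all()
              and activation_scale(run_a) <= max_scale
              and angular_diversity(run_a) >= min_angle
              and effective_rank(run_a) >= min_rank_frac * run_a.shape[1]
              and cross_seed_stability(run_a, run_b) >= min_cka)
        if ok:
            passing.append(idx)
    return max(passing) if passing else None

# Synthetic usage: layer 0 is well spread, layer 1 has collapsed into a narrow cone.
rng = np.random.default_rng(0)
healthy = rng.normal(size=(200, 64))
collapsed = rng.normal(size=64) + 0.01 * rng.normal(size=(200, 64))
layers = {
    0: (healthy, healthy + 0.01 * rng.normal(size=healthy.shape)),
    1: (collapsed, collapsed + 0.01 * rng.normal(size=collapsed.shape)),
}
print(deepest_stable_layer(layers))
```

Checking all criteria jointly, rather than optimizing any single one, reflects the "deepest jointly stable" framing; in practice the thresholds would have to be calibrated on the model's own layer profile.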
Main Conclusions
- MiniRocket aggregation with a k-NN classifier achieved MCC = 0.753 on cross-species validation (val_outside) and substantially outperformed global pooling with the same classifier (k-NN F1 = 0.982 vs. 0.901), while the Kover baseline collapsed from a within-species F1 of ~0.68 to a cross-species F1 of 0.02.
- Cross-species performance is mechanism-dependent: MiniRocket excels when cassette-mediated resistance predominates (e.g., plasmid-borne β-lactamases), while global pooling remains competitive for chromosomal/diffuse mechanisms.
- Layer 10 is the deepest jointly stable layer of Evo-1-8k-base; stability degrades sharply from Layer 11 onward, as shown by isotropy collapse (angular diversity peaks at L9-L10) and effective-rank compression at L11.
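A toy example makes the dilution argument concrete. It assumes, purely for illustration, a genome split into 1,000 windows with all-zero background embeddings, a cassette-like signal confined to 3 adjacent windows in one feature, and a diffuse signal spread evenly over every window in another feature. The `local_max_pool` helper is a hypothetical stand-in for local pattern preservation, not MiniRocket itself.

```python
import numpy as np

def global_mean_pool(windows: np.ndarray) -> np.ndarray:
    """Global pooling: average per-window embeddings (n_windows, dim) into one vector."""
    return windows.mean(axis=0)

def local_max_pool(windows: np.ndarray, width: int = 3) -> np.ndarray:
    """Hypothetical local summary: the maximum, over all runs of `width` adjacent
    windows, of that run's mean embedding. A short localized signal is averaged
    only with its immediate neighbours, not with the whole genome."""
    n = windows.shape[0]
    run_means = np.stack([windows[i:i + width].mean(axis=0)
                          for i in range(n - width + 1)])
    return run_means.max(axis=0)

windows = np.zeros((1000, 8))   # 1,000 windows x 8 embedding features (toy sizes)
windows[500:503, 0] = 1.0       # feature 0: cassette-like, 3 adjacent windows
windows[:, 1] = 0.2             # feature 1: diffuse, present in every window

pooled_global = global_mean_pool(windows)
pooled_local = local_max_pool(windows)
print("cassette:", pooled_global[0], pooled_local[0])
print("diffuse: ", pooled_global[1], pooled_local[1])
```

Mean pooling shrinks the cassette feature by the fraction of windows it occupies (3/1,000), whereas the diffuse feature comes through both summaries essentially unchanged, which mirrors the mechanism dependence reported above.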
Abstract: Cross-species antimicrobial resistance (AMR) prediction is fundamentally an out-of-distribution generalization problem: models trained on one set of bacterial taxa must transfer to phylogenetically distinct genomes that may rely on different resistance mechanisms. Critically, resistance is not monolithic. Across species, it arises from a heterogeneous mixture of localized, horizontally transferred gene cassettes and diffuse, species-specific genomic backgrounds, making successful transfer inherently mechanism-dependent. Using a strict species holdout protocol, we first establish an interpretable k-mer baseline with Kover, showing that strong within-species performance collapses under true cross-species evaluation. This motivates representation-level choices that explicitly preserve transferable biological signals rather than amplify phylogenetic shortcuts. We introduce two ingredients that make genomic foundation model embeddings effective for cross-species AMR prediction. First, for layer selection, we develop diagnostics for activation scale, isotropy, effective rank, and cross-seed stability under native bfloat16 inference. These reveal a sharp stability boundary at Layer 11 in Evo-1-8k-base, identifying Layer 10 as the deepest jointly stable layer; extracting embeddings here improves downstream conditioning, reproducibility, and robustness. Second, for feature aggregation, we argue that global pooling obscures localized resistance mechanisms. Instead, we treat per-window embeddings as an ordered multivariate signal and apply MiniRocket to summarize multi-scale local activation patterns. This preserves cassette-scale signals (e.g., plasmid-borne β-lactamases) that global averages dilute, reorganizing feature space toward phenotype-aligned neighborhoods where simple classifiers can generalize across species.
On ampicillin resistance across 3,388 genomes from 126 species, we show that cross-species performance depends on which resistance mechanisms dominate the held-out species, not on aggregation method alone. MiniRocket excels when cassette-mediated resistance predominates; Global Pooling remains competitive for chromosomal or diffuse mechanisms. Both approaches perform similarly under same-species evaluation. Beyond accuracy, MiniRocket enables zero-training aggregation, interpretable predictions via neighbor auditing, and biological validation through mechanism-based clustering. Unlike complex decision boundaries learned by gradient boosting, k-NN exposes the underlying geometric reorganization that explains when and why local pattern preservation succeeds: reduced phylogenetic hubness and increased cross-species mechanism sharing. Together, our results establish aggregation choice as a central axis in cross-species AMR prediction and provide a reproducible, diagnostic-driven framework for deploying genomic foundation models under distribution shift.
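The evaluation ideas in the abstract (strict species holdout, a k-NN classifier, MCC, neighbor auditing, and hubness) can be sketched as below. This is a minimal sketch under stated assumptions: cosine similarity, an unweighted majority vote over binary resistant/susceptible labels, and hubness measured as the skewness of k-occurrence counts of training genomes among the held-out genomes' neighbor lists. The paper's exact distance, k, and hubness definition are not given here, and all function names are hypothetical.

```python
import numpy as np

def species_holdout_split(species, held_out):
    """Boolean train/test masks: every genome of a held-out species goes to test."""
    test = np.isin(species, held_out)
    return ~test, test

def knn_predict(X_train, y_train, X_test, k=5):
    """Cosine k-NN with a majority vote over 0/1 labels (ties resolve to 0).
    Returns predictions and neighbor indices into X_train for auditing."""
    A = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    B = X_test / np.linalg.norm(X_test, axis=1, keepdims=True)
    nbrs = np.argsort(-(B @ A.T), axis=1)[:, :k]
    votes = y_train[nbrs].mean(axis=1)
    return (votes > 0.5).astype(int), nbrs

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return float((tp * tn - fp * fn) / denom) if denom > 0 else 0.0

def hubness_skew(nbrs, n_train):
    """Skewness of how often each training genome appears in neighbor lists.
    Large positive values mean a few 'hub' genomes dominate predictions."""
    counts = np.bincount(nbrs.ravel(), minlength=n_train).astype(float)
    centered = counts - counts.mean()
    sd = counts.std()
    return float((centered ** 3).mean() / sd ** 3) if sd > 0 else 0.0

# Toy data: three species, two well-separated phenotype clusters.
rng = np.random.default_rng(0)
species = np.array(["A"] * 4 + ["B"] * 4 + ["C"] * 4)
y = np.tile([1, 1, 0, 0], 3)
X = np.where(y[:, None] == 1, [5.0, 0.0], [0.0, 5.0]) + rng.normal(scale=0.1, size=(12, 2))

train, test = species_holdout_split(species, ["C"])
pred, nbrs = knn_predict(X[train], y[train], X[test], k=3)
print(pred, mcc(y[test], pred))
print(species[train][nbrs])  # which training species each prediction borrowed from
print(hubness_skew(nbrs, int(train.sum())))
```

Auditing a prediction then amounts to reading `species[train][nbrs]` alongside the neighbors' labels: neighbors from several other species that share the held-out genome's phenotype are the kind of cross-species mechanism sharing the abstract describes, whereas neighbors drawn from a single closely related species may hint at a phylogenetic shortcut.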