Paper List
-
Ill-Conditioning in Dictionary-Based Dynamic-Equation Learning: A Systems Biology Case Study
This paper addresses the critical challenge of numerical ill-conditioning and multicollinearity in library-based sparse regression methods (e.g., SIND...
-
Hybrid eTFCE–GRF: Exact Cluster-Size Retrieval with Analytical p-Values for Voxel-Based Morphometry
This paper addresses the computational bottleneck in voxel-based neuroimaging analysis by providing a method that delivers exact cluster-size retrieva...
-
abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance
This paper addresses the critical challenge of quantitatively evaluating antibiotic prescribing policies under realistic uncertainty and partial obser...
-
PesTwin: a biology-informed Digital Twin for enabling precision farming
This paper addresses the critical bottleneck in precision agriculture: the inability to accurately forecast pest outbreaks in real-time, leading to su...
-
Equivariant Asynchronous Diffusion: An Adaptive Denoising Schedule for Accelerated Molecular Conformation Generation
This paper addresses the core challenge of generating physically plausible 3D molecular structures by bridging the gap between autoregressive methods ...
-
Omics Data Discovery Agents
This paper addresses the core challenge of making published omics data computationally reusable by automating the extraction, quantification, and inte...
-
Single-cell directional sensing at ultra-low chemoattractant concentrations from extreme first-passage events
This work addresses the core challenge of how a cell can rapidly and accurately determine the direction of a chemoattractant source when the signal is...
-
SDSR: A Spectral Divide-and-Conquer Approach for Species Tree Reconstruction
This paper addresses the computational bottleneck in reconstructing species trees from thousands of species and multiple genes by introducing a scalab...
Cross-Species Antimicrobial Resistance Prediction from Genomic Foundation Models
Department of Computer Science, School of Engineering and Applied Science, Columbia University
30-Second Read
IN SHORT: This paper addresses the core challenge of predicting antimicrobial resistance across phylogenetically distinct bacterial species, where traditional methods fail due to reliance on species-specific genomic shortcuts rather than transferable resistance mechanisms.
Key Innovations
- Methodology: Developed diagnostic-driven layer selection for genomic foundation models, identifying Layer 10 of Evo-1-8k-base as the deepest jointly stable extraction point through analysis of activation scale, isotropy, effective rank, and cross-seed stability.
- Methodology: Introduced MiniRocket-based local pattern preservation for embedding aggregation, treating per-window embeddings as ordered multivariate signals so that sparse, cassette-scale resistance signals are not diluted by global pooling.
- Biology: Established the mechanism-mix hypothesis, under which cross-species AMR prediction performance depends on whether resistance is cassette-mediated (transferable) or chromosomal/diffuse (species-specific), not on aggregation method alone.
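The aggregation idea in the second innovation can be illustrated with a simplified, MiniRocket-style transform. This is a sketch under stated assumptions, not the authors' pipeline or the reference MiniRocket implementation (full implementations exist elsewhere, e.g. sktime's `MiniRocketMultivariate`). It uses the 84 fixed zero-sum kernels of length 9, a few dilations, biases taken from quantiles of kernel responses on reference genomes, and proportion-of-positive-values (PPV) features. The names `fit`, `transform`, `dilated_conv` and `global_mean_pool`, the summing of a random three-channel subset per kernel, and the default dilations and quantiles are hypothetical choices for illustration. Each genome is assumed to be a `(n_windows, d_model)` array of per-window embeddings in genomic order.

```python
import itertools

import numpy as np

# 84 fixed MiniRocket-style kernels: length 9, three weights of 2 and six of -1.
# Every kernel sums to zero, so it responds to local shape, not absolute level.
KERNELS = np.array([[2.0 if i in idx else -1.0 for i in range(9)]
                    for idx in itertools.combinations(range(9), 3)])


def dilated_conv(x, w, d):
    """Same-length 1D convolution of signal x with a length-9 kernel w at dilation d."""
    n = len(x)
    xp = np.pad(x, 4 * d)  # zero padding keeps output position t centred on input t
    return sum(w[k] * xp[k * d:k * d + n] for k in range(9))


def fit(reference, dilations=(1, 2, 4), quantiles=(0.25, 0.5, 0.75), seed=0):
    """Pick channel subsets and biases from reference (training) genomes.

    reference: list of (n_windows, d_model) arrays, rows in genomic order.
    Returns a list of (dilation, kernel index, channels, bias) tuples.
    """
    rng = np.random.default_rng(seed)
    d_model = reference[0].shape[1]
    params = []
    for d in dilations:
        for k, w in enumerate(KERNELS):
            # Hypothetical simplification: each kernel sees the sum of a random channel subset.
            ch = rng.choice(d_model, size=min(3, d_model), replace=False)
            outs = np.concatenate([dilated_conv(g[:, ch].sum(axis=1), w, d)
                                   for g in reference])
            for q in quantiles:
                params.append((d, k, ch, float(np.quantile(outs, q))))
    return params


def transform(windows, params):
    """PPV features: fraction of positions where each kernel response exceeds its bias."""
    return np.array([np.mean(dilated_conv(windows[:, ch].sum(axis=1), KERNELS[k], d) > b)
                     for d, k, ch, b in params])


def global_mean_pool(windows):
    """Baseline aggregation: average over windows, discarding their order."""
    return windows.mean(axis=0)
```

Fitting on training genomes only and then applying `transform` to held-out genomes keeps the biases free of test-species information; with the defaults, each genome maps to 3 × 84 × 3 = 756 features. Unlike `global_mean_pool`, the output depends on window order, which is what lets local patterns survive aggregation.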
Main Conclusions
- MiniRocket aggregation with a k-NN classifier achieved MCC = 0.753 on cross-species validation (val_outside) and substantially outperformed global pooling (k-NN F1 = 0.982 vs. 0.901), whereas the Kover baseline collapsed from within-species F1 ≈ 0.68 to cross-species F1 = 0.02.
- Cross-species performance is mechanism-dependent: MiniRocket excels when cassette-mediated resistance predominates (e.g., plasmid-borne β-lactamases), while global pooling remains competitive for chromosomal/diffuse mechanisms.
- Layer 10 of Evo-1-8k-base is the deepest jointly stable extraction point; stability degrades sharply from Layer 11 onward, as shown by isotropy collapse (angular diversity peaks at L9-L10) and effective-rank compression at L11.
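The layer-selection diagnostics can be sketched with NumPy, assuming each layer's embeddings are available as an `(n_windows, d_model)` array. The metrics below are common stand-ins chosen for illustration: mean pairwise cosine similarity as an isotropy proxy, entropy-based effective rank, and row-wise cosine agreement between two inference runs as a proxy for cross-seed stability. The paper's exact definitions and thresholds may differ, and all function names and default thresholds here are hypothetical. The selection rule stops at the first layer that fails any check, mirroring the idea of a sharp stability boundary.

```python
import numpy as np


def _unit(h):
    """Normalise each row to unit L2 norm."""
    return h / np.linalg.norm(h, axis=1, keepdims=True)


def activation_scale(h):
    """Mean L2 norm of the per-window embeddings (rows of h)."""
    return float(np.linalg.norm(h, axis=1).mean())


def mean_cosine(h):
    """Mean off-diagonal cosine similarity; values near 1 indicate isotropy collapse."""
    g = _unit(h) @ _unit(h).T
    n = len(h)
    return float((g.sum() - np.trace(g)) / (n * (n - 1)))


def effective_rank(h):
    """Entropy-based effective rank of the mean-centred embedding matrix."""
    s = np.linalg.svd(h - h.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))


def cross_run_stability(h_a, h_b):
    """Mean cosine similarity between two runs' embeddings of the same inputs."""
    return float((_unit(h_a) * _unit(h_b)).sum(axis=1).mean())


def deepest_stable_layer(diag, max_scale=1e3, max_cos=0.9, min_erank=10.0, min_stab=0.99):
    """Walk layers from shallow to deep; return the last layer before the first failure.

    diag maps layer index -> dict with keys "scale", "cos", "erank", "stab".
    Default thresholds are hypothetical placeholders, not values from the paper.
    """
    best = None
    for layer in sorted(diag):
        d = diag[layer]
        if (d["scale"] > max_scale or d["cos"] > max_cos
                or d["erank"] < min_erank or d["stab"] < min_stab):
            break
        best = layer
    return best
```

If every layer up to 10 passes and Layer 11 fails, `deepest_stable_layer` returns 10 even when a deeper layer happens to pass again, which matches the "deepest jointly stable" reading rather than "deepest passing".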
Abstract: Cross-species antimicrobial resistance (AMR) prediction is fundamentally an out-of-distribution generalization problem: models trained on one set of bacterial taxa must transfer to phylogenetically distinct genomes that may rely on different resistance mechanisms. Critically, resistance is not monolithic. Across species, it arises from a heterogeneous mixture of localized, horizontally transferred gene cassettes and diffuse, species-specific genomic backgrounds, making successful transfer inherently mechanism-dependent. Using a strict species holdout protocol, we first establish an interpretable k-mer baseline with Kover, showing that strong within-species performance collapses under true cross-species evaluation. This motivates representation-level choices that explicitly preserve transferable biological signals rather than amplify phylogenetic shortcuts. We introduce two ingredients that make genomic foundation model embeddings effective for cross-species AMR prediction. First, for layer selection, we develop diagnostics for activation scale, isotropy, effective rank, and cross-seed stability under native bfloat16 inference. These reveal a sharp stability boundary at Layer 11 in Evo-1-8k-base, identifying Layer 10 as the deepest jointly stable layer; extracting embeddings here improves downstream conditioning, reproducibility, and robustness. Second, for feature aggregation, we argue that global pooling obscures localized resistance mechanisms. Instead, we treat per-window embeddings as an ordered multivariate signal and apply MiniRocket to summarize multi-scale local activation patterns. This preserves cassette-scale signals (e.g., plasmid-borne β-lactamases) that global averages dilute, reorganizing feature space toward phenotype-aligned neighborhoods where simple classifiers can generalize across species.
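A strict species holdout combined with the k-NN classifier and MCC metric can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code: the function names, Euclidean distance, majority voting, and binary encoding (1 = resistant, 0 = susceptible) are assumptions. The property that matters is that no genome from a held-out species appears in training, so every neighbour of a test genome must come from another species.

```python
import numpy as np


def species_holdout(species, held_out):
    """Boolean train/test masks: every genome of a held-out species goes to test."""
    test = np.isin(species, list(held_out))
    return ~test, test


def knn_predict(X_train, y_train, X_test, k=5):
    """Majority-vote k-NN (Euclidean) for binary labels; use an odd k to avoid ties.

    Also returns neighbour indices so each prediction can be audited.
    """
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    pred = (y_train[idx].mean(axis=1) > 0.5).astype(int)
    return pred, idx


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = np.sqrt(float(tp + fp) * float(tp + fn) * float(tn + fp) * float(tn + fn))
    return 0.0 if denom == 0 else float((tp * tn - fp * fn) / denom)
```

MCC is a natural choice when resistant and susceptible genomes are imbalanced within a held-out species, since it stays near 0 for a classifier that predicts only the majority class.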
On ampicillin resistance across 3,388 genomes from 126 species, we show that cross-species performance depends on which resistance mechanisms dominate the held-out species, not on aggregation method alone. MiniRocket excels when cassette-mediated resistance predominates; global pooling remains competitive for chromosomal or diffuse mechanisms. Both approaches perform similarly under same-species evaluation. Beyond accuracy, MiniRocket enables zero-training aggregation, interpretable predictions via neighbor auditing, and biological validation through mechanism-based clustering. Unlike gradient boosting, whose complex decision boundaries are hard to inspect, k-NN exposes the underlying geometric reorganization that explains when and why local pattern preservation succeeds: reduced phylogenetic hubness and increased cross-species mechanism sharing. Together, our results establish aggregation choice as a central axis in cross-species AMR prediction and provide a reproducible, diagnostic-driven framework for deploying genomic foundation models under distribution shift.
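The geometric analysis behind neighbor auditing can be approximated with two simple statistics: hubness, measured here as the skewness of the k-occurrence distribution (how often each genome appears in other genomes' neighbor lists), and the share of neighbor links that cross species. Because this sketch has no mechanism annotations, it uses agreement in the phenotype label across cross-species links as a stand-in for mechanism sharing; that proxy, the Euclidean metric, and all function names are assumptions rather than the paper's definitions.

```python
import numpy as np


def knn_graph(X, k=5):
    """Indices of each genome's k nearest neighbours (Euclidean), excluding itself."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1).astype(float)
    np.fill_diagonal(d, np.inf)  # a genome is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def k_occurrence_skewness(idx, n):
    """Skewness of how often each point appears in others' k-NN lists (hubness)."""
    counts = np.bincount(idx.ravel(), minlength=n).astype(float)
    sd = counts.std()
    return 0.0 if sd == 0 else float(((counts - counts.mean()) ** 3).mean() / sd ** 3)


def cross_species_stats(idx, species, labels):
    """Return (share of neighbour links crossing species,
    share of those cross-species links whose two genomes share a phenotype label)."""
    query = np.repeat(np.arange(len(idx)), idx.shape[1])
    nbr = idx.ravel()
    cross = species[query] != species[nbr]
    if not cross.any():
        return 0.0, float("nan")
    agree = labels[query][cross] == labels[nbr][cross]
    return float(cross.mean()), float(agree.mean())


def audit(i, idx, species, labels):
    """Neighbour audit for genome i: (neighbour index, species, label) triples."""
    return [(int(j), str(species[j]), int(labels[j])) for j in idx[i]]
```

Comparing these statistics between MiniRocket and global-pooling features would show, under these assumptions, whether an aggregation lowers hubness and routes more neighbor links across species toward genomes with the same phenotype.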