Paper List

AI for Science

Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange

2026-03-15

This paper addresses the fundamental limitation of current AI-assisted scientific research by enabling truly autonomous, decentralized investigation w...
Artificial Intelligence

D-MEM: Dopamine-Gated Agentic Memory via Reward Prediction Error Routing

2026-03-15

This paper addresses the fundamental scalability bottleneck in LLM agentic memory systems: the O(N²) computational complexity and unbounded API token ...
Biophysics

Countershading coloration in blue shark skin emerges from hierarchically organized and spatially tuned photonic architectures inside skin denticles

2026-03-14

This paper solves the core problem of how blue sharks achieve their striking dorsoventral countershading camouflage, revealing that coloration origina...
Computer Vision

Human-like Object Grouping in Self-supervised Vision Transformers

2026-03-14

This paper addresses the core challenge of quantifying how well self-supervised vision models capture human-like object grouping in natural scenes, br...
Bioinformatics

Hierarchical p-Adic Framework for Gene Regulatory Networks: Theory and Stability Analysis

2026-03-14

This paper addresses the core challenge of mathematically capturing the inherent hierarchical organization and multi-scale stability of gene regulator...
Computational Neuroscience

Towards unified brain-to-text decoding across speech production and perception

2026-03-13

This paper addresses the core challenge of developing a unified brain-to-text decoding framework that works across both speech production and percepti...
Artificial Intelligence

Dual-Laws Model for a theory of artificial consciousness

2026-03-13

This paper addresses the core challenge of developing a comprehensive, testable theory of consciousness that bridges biological and artificial systems...
Computational Neuroscience

Pulse desynchronization of neural populations by targeting the centroid of the limit cycle in phase space

2026-03-13

This work addresses the core challenge of determining optimal pulse timing and intensity for desynchronizing pathological neural oscillations when the...


Journal: arXiv Preprint

Publication date: 2025

Bioinformatics, Computational Biology

Cross-Species Antimicrobial Resistance Prediction from Genomic Foundation Models

Department of Computer Science, School of Engineering and Applied Science, Columbia University

Huilin Tai

30-Second Read

IN SHORT: This paper addresses the core challenge of predicting antimicrobial resistance across phylogenetically distinct bacterial species, where traditional methods fail due to reliance on species-specific genomic shortcuts rather than transferable resistance mechanisms.

Key Innovations

Methodology: Developed diagnostic-driven layer selection for genomic foundation models, identifying Layer 10 in Evo-1-8k-base as the deepest jointly stable extraction point through activation scale, isotropy, effective rank, and cross-seed stability analysis.
Methodology: Introduced MiniRocket-based local pattern preservation for embedding aggregation, treating per-window embeddings as ordered multivariate signals to preserve sparse cassette-scale resistance signals that global pooling dilutes.
Biology: Established the mechanism-mix hypothesis: cross-species AMR prediction performance depends on whether resistance is cassette-mediated (transferable) or chromosomal/diffuse (species-specific), not just on the aggregation method.
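
The dilution argument behind the second innovation can be illustrated with a toy calculation. The sketch below is not MiniRocket: it swaps in a much simpler feature (the maximum of a sliding-window mean) purely to show why a sparse, localized signal that barely moves a global average can still dominate a local-pattern feature. The signal values, window count, and function names are all hypothetical.

```python
from statistics import mean

def global_mean_pool(signal):
    """Average one embedding channel over all genome windows."""
    return mean(signal)

def local_max_pool(signal, width=3):
    """Maximum of the sliding-window mean: a crude local-pattern feature.

    This is a stand-in for illustration only, not the MiniRocket transform.
    """
    return max(
        mean(signal[i:i + width])
        for i in range(len(signal) - width + 1)
    )

# Hypothetical single channel over 100 windows of a genome.
background = [0.0] * 100
with_cassette = background.copy()
with_cassette[40:43] = [1.0, 1.0, 1.0]  # a 3-window localized signal

# How far apart the two genomes sit under each aggregation.
gap_global = global_mean_pool(with_cassette) - global_mean_pool(background)
gap_local = local_max_pool(with_cassette) - local_max_pool(background)
print(f"global pooling gap: {gap_global:.2f}, local feature gap: {gap_local:.2f}")
```

In this toy setting the three-window signal shifts the global mean by about 0.03 but shifts the local feature by 1.0, which is the kind of separation a distance-based classifier can exploit.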

Main Conclusions

MiniRocket aggregation with a k-NN classifier achieved MCC=0.753 on cross-species validation (val_outside), substantially outperforming global pooling (F1=0.982 vs 0.901 for k-NN), while the Kover baseline collapsed from within-species F1≈0.68 to cross-species F1=0.02.
Cross-species performance is mechanism-dependent: MiniRocket excels when cassette-mediated resistance predominates (e.g., plasmid-borne β-lactamases), while global pooling remains competitive for chromosomal/diffuse mechanisms.
Layer 10 of Evo-1-8k-base is the deepest jointly stable extraction layer, with sharp degradation at Layer 11 evidenced by isotropy collapse (angular diversity peaks at L9-L10) and effective rank compression at L11.
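
The conclusions above quote both MCC and F1, and the two metrics can diverge sharply on imbalanced data because F1 ignores true negatives. The following is a minimal sketch using the standard definitions; the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
from math import sqrt

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN). True negatives do not enter."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if any margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical held-out set: 90 resistant and 10 susceptible genomes.
# Classifier A labels every genome as resistant.
a = dict(tp=90, tn=0, fp=10, fn=0)
# Classifier B also recognizes most susceptible genomes.
b = dict(tp=85, tn=8, fp=2, fn=5)

for name, c in (("A", a), ("B", b)):
    f1 = f1_score(c["tp"], c["fp"], c["fn"])
    m = mcc(c["tp"], c["tn"], c["fp"], c["fn"])
    print(f"classifier {name}: F1={f1:.3f}, MCC={m:.3f}")
```

Classifier A scores a high F1 despite having no ability to identify susceptible genomes, while its MCC is zero. This is one reason to read the F1 and MCC figures side by side rather than interchangeably.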

Research gap: Current AMR prediction methods rely on random or stratified train-test splits with phylogenetic overlap, allowing models to exploit species-specific shortcuts rather than learning truly transferable resistance mechanisms, leading to catastrophic failure under strict cross-species evaluation.
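
A strict species holdout differs from a random split in one respect: every genome of a held-out species goes to the test set, so no species appears on both sides. The sketch below shows one way such a split could be written. The function name, the 20% holdout fraction, and the toy records are assumptions for illustration, not the paper's protocol.

```python
import random

def species_holdout_split(records, holdout_fraction=0.2, seed=0):
    """Split (genome_id, species) records so that no species spans both sets."""
    species = sorted({sp for _, sp in records})
    rng = random.Random(seed)
    rng.shuffle(species)
    # Hold out whole species, never individual genomes.
    n_holdout = max(1, round(len(species) * holdout_fraction))
    held_out = set(species[:n_holdout])
    train = [r for r in records if r[1] not in held_out]
    test = [r for r in records if r[1] in held_out]
    return train, test

# Hypothetical toy dataset: 5 species with 4 genomes each.
records = [
    (f"genome_{sp}_{i}", f"species_{sp}")
    for sp in "ABCDE"
    for i in range(4)
]

train, test = species_holdout_split(records)
train_species = {sp for _, sp in train}
test_species = {sp for _, sp in test}
print("held-out species:", sorted(test_species))
print("species overlap:", train_species & test_species)
```

A random or stratified split over the same records would almost always place genomes of the same species in both sets, which is the phylogenetic overlap the research gap describes.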

Abstract: Cross-species antimicrobial resistance (AMR) prediction is fundamentally an out-of-distribution generalization problem: models trained on one set of bacterial taxa must transfer to phylogenetically distinct genomes that may rely on different resistance mechanisms. Critically, resistance is not monolithic. Across species, it arises from a heterogeneous mixture of localized, horizontally transferred gene cassettes and diffuse, species-specific genomic backgrounds, making successful transfer inherently mechanism-dependent. Using a strict species holdout protocol, we first establish an interpretable k-mer baseline with Kover, showing that strong within-species performance collapses under true cross-species evaluation. This motivates the need for representation-level choices that explicitly preserve transferable biological signals rather than amplify phylogenetic shortcuts. We introduce two ingredients that make genomic foundation model embeddings effective for cross-species AMR prediction. First, for layer selection, we develop diagnostics for activation scale, isotropy, effective rank, and cross-seed stability under native bfloat16 inference. These reveal a sharp stability boundary at Layer 11 in Evo-1-8k-base, identifying Layer 10 as the deepest jointly stable layer; extracting embeddings here improves downstream conditioning, reproducibility, and robustness. Second, for feature aggregation, we argue that global pooling obscures localized resistance mechanisms. Instead, we treat per-window embeddings as an ordered multivariate signal and apply MiniRocket to summarize multi-scale local activation patterns. This preserves cassette-scale signals (e.g., plasmid-borne β-lactamases) that global averages dilute, reorganizing feature space toward phenotype-aligned neighborhoods where simple classifiers can generalize across species.
On ampicillin resistance across 3,388 genomes from 126 species, we show that cross-species performance depends on which resistance mechanisms dominate the held-out species, not on aggregation method alone. MiniRocket excels when cassette-mediated resistance predominates; Global Pooling remains competitive for chromosomal or diffuse mechanisms. Both approaches perform similarly under same-species evaluation. Beyond accuracy, MiniRocket enables zero-training aggregation, interpretable predictions via neighbor auditing, and biological validation through mechanism-based clustering. Unlike complex decision boundaries learned by gradient boosting, k-NN exposes the underlying geometric reorganization that explains when and why local pattern preservation succeeds: reduced phylogenetic hubness and increased cross-species mechanism sharing. Together, our results establish aggregation choice as a central axis in cross-species AMR prediction and provide a reproducible, diagnostic-driven framework for deploying genomic foundation models under distribution shift.
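
The abstract's point about neighbor auditing can be made concrete with a small k-NN sketch that returns, alongside each prediction, the neighbors that produced it. Everything here is hypothetical: the two-dimensional vectors stand in for aggregated embeddings, and the function name and species labels are invented for illustration.

```python
from collections import Counter
from math import dist

def knn_predict_with_audit(query, reference, k=3):
    """Return the majority label and the k nearest neighbors as an audit trail.

    reference: list of (vector, label, species) tuples.
    """
    ranked = sorted(reference, key=lambda item: dist(query, item[0]))
    neighbors = ranked[:k]
    votes = Counter(label for _, label, _ in neighbors)
    prediction = votes.most_common(1)[0][0]
    # Each audit entry records which species voted, how, and from how far away.
    audit = [(species, label, dist(query, vec)) for vec, label, species in neighbors]
    return prediction, audit

# Hypothetical reference set in a toy 2-D feature space.
reference = [
    ((0.0, 0.0), "susceptible", "species_A"),
    ((0.1, 0.0), "susceptible", "species_B"),
    ((1.0, 1.0), "resistant", "species_A"),
    ((0.9, 1.1), "resistant", "species_B"),
    ((1.1, 0.9), "resistant", "species_C"),
]

prediction, audit = knn_predict_with_audit((1.0, 0.9), reference)
print("prediction:", prediction)
for species, label, distance in audit:
    print(f"  neighbor from {species}: {label} (distance {distance:.3f})")
```

In this toy case the three nearest neighbors come from three different species and agree on the label. Inspecting the audit trail in this way is how one could check whether a prediction rests on shared phenotype rather than on a single phylogenetically close hub.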