Paper List

Human-Computer Interaction

Developing the PsyCogMetrics™ AI Lab to Evaluate Large Language Models and Advance Cognitive Science

2026-03-13

This paper addresses the critical gap between sophisticated LLM evaluation needs and the lack of accessible, scientifically rigorous platforms that in...
Computational Neuroscience

Equivalence of approximation by networks of single- and multi-spike neurons

2026-03-13

This paper resolves the fundamental question of whether single-spike spiking neural networks (SNNs) are inherently less expressive than multi-spike SN...
Computational Neuroscience

The neuroscience of transformers

2026-03-13

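Proposes a novel computational mapping between the Transformer architecture and cortical column microcircuits, bridging modern AI and neuroscience.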
Applied Mathematics

Framing local structural identifiability and observability in terms of parameter-state symmetries

2026-03-12

This paper addresses the core challenge of systematically determining which parameters and states in a mechanistic ODE model can be uniquely inferred ...
Bioinformatics

Leveraging Phytolith Research using Artificial Intelligence

2026-03-12

This paper addresses the critical bottleneck in phytolith research by automating the labor-intensive manual microscopy process through a multimodal AI...
Computational Neuroscience

Neural network-based encoding in free-viewing fMRI with gaze-aware models

2026-03-12

This paper addresses the core challenge of building computationally efficient and ecologically valid brain encoding models for naturalistic vision by ...
Bioinformatics

Scalable DNA Ternary Full Adder Enabled by a Competitive Blocking Circuit

2026-03-12

This paper addresses the core bottleneck of carry information attenuation and limited computational scale in DNA binary adders by introducing a scalab...
Bioinformatics

ELISA: An Interpretable Hybrid Generative AI Agent for Expression-Grounded Discovery in Single-Cell Genomics

2026-03-12

This paper addresses the critical bottleneck of translating high-dimensional single-cell transcriptomic data into interpretable biological hypotheses ...

Journal: arXiv Preprint

Publication date: 2025

Bioinformatics, Computational Biology

Cross-Species Antimicrobial Resistance Prediction from Genomic Foundation Models

Department of Computer Science, School of Engineering and Applied Science, Columbia University

Huilin Tai

30-Second Read

IN SHORT: This paper addresses the core challenge of predicting antimicrobial resistance across phylogenetically distinct bacterial species, where traditional methods fail due to reliance on species-specific genomic shortcuts rather than transferable resistance mechanisms.

Core Innovations

Methodology: Developed diagnostic-driven layer selection for genomic foundation models, identifying Layer 10 in Evo-1-8k-base as the deepest jointly stable extraction point through activation scale, isotropy, effective rank, and cross-seed stability analysis.
Methodology: Introduced MiniRocket-based local pattern preservation for embedding aggregation, treating per-window embeddings as ordered multivariate signals to preserve sparse cassette-scale resistance signals that global pooling dilutes.
Biology: Established the mechanism-mix hypothesis: cross-species AMR prediction performance depends on whether resistance is cassette-mediated (transferable) or chromosomal/diffuse (species-specific), not just on the aggregation method.
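
The layer-selection diagnostics named above can be illustrated with a minimal sketch. It assumes an entropy-based definition of effective rank (the exponential of the Shannon entropy of the normalized singular values) and uses mean pairwise cosine similarity as an isotropy proxy. The paper's exact definitions are not given in this summary, and the function names and synthetic "layers" below are hypothetical.

```python
import numpy as np

def effective_rank(embeddings):
    """Entropy-based effective rank of an (n_samples, n_features) matrix.

    Assumed definition: exp(H(p)), where p is the singular-value
    spectrum of the mean-centered matrix, normalized to sum to 1.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    p = singular_values / singular_values.sum()
    p = p[p > 0]  # drop exact zeros so log() is defined
    return float(np.exp(-(p * np.log(p)).sum()))

def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of rows.

    Values near 0 suggest angular diversity; values near 1 suggest
    that the vectors have collapsed onto a shared direction.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = len(embeddings)
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))

# Synthetic stand-ins for two layers (not real Evo activations).
rng = np.random.default_rng(0)
diverse_layer = rng.normal(size=(200, 32))
shared_direction = rng.normal(size=(1, 32))
collapsed_layer = (5.0 + rng.normal(size=(200, 1))) * shared_direction
collapsed_layer = collapsed_layer + 1e-3 * rng.normal(size=(200, 32))

for name, layer in [("diverse", diverse_layer), ("collapsed", collapsed_layer)]:
    print(name, effective_rank(layer), mean_pairwise_cosine(layer))
```

In a layer sweep, one would compute both numbers per layer and look for the depth at which effective rank drops and pairwise cosine rises together; that is the kind of boundary the summary describes around Layer 11.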

Key Findings

MiniRocket aggregation with a k-NN classifier achieved MCC=0.753 on cross-species validation (val_outside) and substantially outperformed global pooling (F1=0.982 vs. 0.901 for k-NN), while the Kover baseline collapsed from within-species F1≈0.68 to cross-species F1=0.02.
Cross-species performance is mechanism-dependent: MiniRocket excels when cassette-mediated resistance predominates (e.g., plasmid-borne β-lactamases), while global pooling remains competitive for chromosomal/diffuse mechanisms.
Layer 10 embeddings from Evo-1-8k-base are the deepest jointly stable choice, with sharp degradation from Layer 11 onward, evidenced by isotropy collapse (angular diversity peaks at L9-L10) and effective rank compression at L11.
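
Since the findings above quote both MCC and F1, a short sketch of how the two metrics are computed from confusion counts may help when comparing them. The labels are toy values chosen for illustration, not data from the paper, and the function names are hypothetical.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = resistant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); ignores true negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; uses all four cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy imbalanced set: 9 resistant, 1 susceptible, classifier says "resistant" for all.
y_true = [1] * 9 + [0]
y_pred = [1] * 10
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("F1 :", f1_score(tp, fp, fn))
print("MCC:", mcc(tp, tn, fp, fn))
```

The toy case shows why the two can diverge: F1 does not use true negatives, so a classifier that labels everything resistant can still score high on an imbalanced set, whereas MCC does not reward it.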

Research gap: Current AMR prediction methods rely on random or stratified train-test splits with phylogenetic overlap, allowing models to exploit species-specific shortcuts rather than learn truly transferable resistance mechanisms, which leads to catastrophic failure under strict cross-species evaluation.
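
The contrast with a strict species holdout can be sketched as follows. This is a minimal illustration that assumes each genome carries a species label and that whole species are assigned to the held-out side; the paper's actual protocol (including how val_outside is built) is not specified in this summary, and the function name and toy labels are hypothetical.

```python
import random

def species_holdout_split(species_labels, holdout_fraction=0.2, seed=0):
    """Split genome indices so that no species appears on both sides.

    Whole species are sampled for the held-out set; every genome of a
    held-out species goes to the test side, all others to training.
    """
    species = sorted(set(species_labels))
    rng = random.Random(seed)
    rng.shuffle(species)
    n_holdout = max(1, round(len(species) * holdout_fraction))
    held_out = set(species[:n_holdout])
    train_idx = [i for i, s in enumerate(species_labels) if s not in held_out]
    test_idx = [i for i, s in enumerate(species_labels) if s in held_out]
    return train_idx, test_idx, held_out

# Toy labels for ten genomes from five species (illustrative only).
labels = (["E. coli"] * 3 + ["K. pneumoniae"] * 2 + ["S. aureus"] * 2
          + ["P. aeruginosa"] * 2 + ["A. baumannii"])
train_idx, test_idx, held_out = species_holdout_split(labels, holdout_fraction=0.4)
print("held-out species:", sorted(held_out))
print("train genomes:", train_idx)
print("test genomes:", test_idx)
```

A random or stratified split over genomes would almost always place the same species on both sides, which is the phylogenetic overlap the research gap describes.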

Abstract: Cross-species antimicrobial resistance (AMR) prediction is fundamentally an out-of-distribution generalization problem: models trained on one set of bacterial taxa must transfer to phylogenetically distinct genomes that may rely on different resistance mechanisms. Critically, resistance is not monolithic. Across species, it arises from a heterogeneous mixture of localized, horizontally transferred gene cassettes and diffuse, species-specific genomic backgrounds, making successful transfer inherently mechanism-dependent. Using a strict species holdout protocol, we first establish an interpretable k-mer baseline with Kover, showing that strong within-species performance collapses under true cross-species evaluation. This motivates the need for representation-level choices that explicitly preserve transferable biological signals rather than amplify phylogenetic shortcuts. We introduce two ingredients that make genomic foundation model embeddings effective for cross-species AMR prediction. First, for layer selection, we develop diagnostics for activation scale, isotropy, effective rank, and cross-seed stability under native bfloat16 inference. These reveal a sharp stability boundary at Layer 11 in Evo-1-8k-base, identifying Layer 10 as the deepest jointly stable layer; extracting embeddings here improves downstream conditioning, reproducibility, and robustness. Second, for feature aggregation, we argue that global pooling obscures localized resistance mechanisms. Instead, we treat per-window embeddings as an ordered multivariate signal and apply MiniRocket to summarize multi-scale local activation patterns. This preserves cassette-scale signals (e.g., plasmid-borne β-lactamases) that global averages dilute, reorganizing feature space toward phenotype-aligned neighborhoods where simple classifiers can generalize across species.
On ampicillin resistance across 3,388 genomes from 126 species, we show that cross-species performance depends on which resistance mechanisms dominate the held-out species, not on aggregation method alone. MiniRocket excels when cassette-mediated resistance predominates; Global Pooling remains competitive for chromosomal or diffuse mechanisms. Both approaches perform similarly under same-species evaluation. Beyond accuracy, MiniRocket enables zero-training aggregation, interpretable predictions via neighbor auditing, and biological validation through mechanism-based clustering. Unlike complex decision boundaries learned by gradient boosting, k-NN exposes the underlying geometric reorganization that explains when and why local pattern preservation succeeds: reduced phylogenetic hubness and increased cross-species mechanism sharing. Together, our results establish aggregation choice as a central axis in cross-species AMR prediction and provide a reproducible, diagnostic-driven framework for deploying genomic foundation models under distribution shift.
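
The aggregation argument in the abstract can be made concrete with a small sketch. It is a simplified stand-in for the idea behind MiniRocket (dilated convolutions followed by proportion-of-positive-values pooling), not MiniRocket itself and not the authors' pipeline. The kernels, dilations, bias, and synthetic per-window "embeddings" are all assumptions chosen for illustration.

```python
import numpy as np

def global_mean_pool(windows):
    """Average a (n_windows, n_dims) array over the window axis."""
    return windows.mean(axis=0)

def dilated_conv(signal, weights, dilation):
    """Valid 1-D convolution of one channel with a dilated kernel."""
    span = (len(weights) - 1) * dilation
    out = np.zeros(len(signal) - span)
    for j, w in enumerate(weights):
        start = j * dilation
        out += w * signal[start:start + len(out)]
    return out

def ppv_features(windows, kernels, dilations, bias):
    """Proportion of positions where each kernel response exceeds bias.

    Feature order: channel, then kernel, then dilation.
    """
    feats = []
    for channel in windows.T:
        for weights in kernels:
            for dilation in dilations:
                response = dilated_conv(channel, weights, dilation)
                feats.append(np.mean(response > bias))
    return np.array(feats)

def make_demo(seed=0, n_windows=200, n_dims=4):
    """Build two toy genomes with identical windows in different order."""
    rng = np.random.default_rng(seed)
    contiguous = rng.normal(scale=0.3, size=(n_windows, n_dims))
    contiguous[100:106, 0] += 3.0  # a localized, cassette-like block
    perm = np.arange(n_windows)
    for src, dst in zip(range(100, 106), (10, 40, 70, 130, 160, 190)):
        perm[src], perm[dst] = perm[dst], perm[src]
    scattered = contiguous[perm]  # same rows, block broken apart
    return contiguous, scattered

KERNELS = [(1.0, 1.0, 1.0), (-1.0, 2.0, -1.0)]  # illustrative kernel bank
DILATIONS = (1, 2)
BIAS = 6.0

contiguous, scattered = make_demo()
print("mean pools equal:",
      np.allclose(global_mean_pool(contiguous), global_mean_pool(scattered)))
print("local features (contiguous):", ppv_features(contiguous, KERNELS, DILATIONS, BIAS)[:4])
print("local features (scattered): ", ppv_features(scattered, KERNELS, DILATIONS, BIAS)[:4])
```

Mean pooling is invariant to window order, so the two toy genomes are indistinguishable after pooling, while the order-aware features separate them. Real MiniRocket uses a fixed bank of length-9 kernels with data-derived biases; an existing library implementation would be the natural choice in practice.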