Paper List

Computational Neuroscience

Formation of Artificial Neural Assemblies by Biologically Plausible Inhibition Mechanisms

2026-03-12

This work addresses the core limitation of the Assembly Calculus model (its fixed-size, biologically implausible k-WTA selection process) by introducing...

Bioinformatics

How to make the most of your masked language model for protein engineering

2026-03-11

This paper addresses the critical bottleneck of efficiently sampling high-quality, diverse protein sequences from Masked Language Models (MLMs) for pr...

Network Psychometrics

Module control in youth symptom networks across COVID-19

2026-03-11

This paper addresses the core challenge of distinguishing whether a prolonged societal stressor (COVID-19) fundamentally reorganizes the architecture ...

Computational Neuroscience

JEDI: Jointly Embedded Inference of Neural Dynamics

2026-03-11

This paper addresses the core challenge of inferring context-dependent neural dynamics from noisy, high-dimensional recordings using a single unified ...

Biophysics

ATP Level and Phosphorylation Free Energy Regulate Trigger-Wave Speed and Critical Nucleus Size in Cellular Biochemical Systems

2026-03-11

This work addresses the core challenge of quantitatively predicting how the cellular energy state (ATP level and phosphorylation free energy) governs ...

Bioinformatics

Packaging Jupyter notebooks as installable desktop apps using LabConstrictor

2026-03-11

This paper addresses the core pain point of ensuring Jupyter notebook reproducibility and accessibility across different computing environments, parti...

Bioinformatics

SNPgen: Phenotype-Supervised Genotype Representation and Synthetic Data Generation via Latent Diffusion

2026-03-11

This paper addresses the core challenge of generating privacy-preserving synthetic genotype data that maintains both statistical fidelity and downstre...

Bioinformatics

Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements

2026-03-11

This paper addresses the challenge of efficiently generating novel, cell-type-specific regulatory DNA sequences with high predicted activity while min...

Journal: arXiv Preprint

Publication date: 2026-03-07

Bioinformatics, Computational Chemistry

MS2MetGAN: Latent-space adversarial training for metabolite–spectrum matching in MS/MS database search

University of Tennessee at Chattanooga | Middle Georgia State University

Yingfeng Wang, Meng Tsai, Alexzander Dwyer, Estelle Nuckels

30-Second Summary

IN SHORT: This paper addresses the critical bottleneck in metabolite identification: the generation of high-quality negative training samples that are structurally similar to true metabolites, which is essential for training robust machine learning classifiers.

Key Innovations

Methodology: Introduces a latent-space approach in which autoencoders encode metabolite structures and MS/MS spectra into numerical vectors, turning metabolite–spectrum matching into vector matching.
Methodology: Proposes a GAN framework designed to generate challenging decoy metabolite latent vectors conditioned on spectrum latent vectors, yielding more informative negative training samples.
Methodology: Shows that adversarial training (GAN-9) improves classifier stability, reducing the standard deviation of accuracy across datasets from 0.3286 (GAN-0) to 0.1618 while increasing mean accuracy.
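
To illustrate how decoy latent vectors could become negative training samples, here is a minimal Python sketch. It is not the authors' implementation: `toy_generator` is a hypothetical stand-in for a trained GAN generator (it simply adds Gaussian noise to the spectrum vector), the latent vectors are random placeholders rather than autoencoder outputs, and concatenating the two vectors as classifier input is an assumption made for illustration.

```python
import random

def toy_generator(spectrum_vec, rng, scale=0.5):
    # Hypothetical stand-in for a trained generator:
    # returns the spectrum vector perturbed by Gaussian noise.
    return [x + scale * rng.gauss(0.0, 1.0) for x in spectrum_vec]

def make_training_pairs(metabolite_vecs, spectrum_vecs, generator, rng):
    """Return (features, labels): true pairs labeled 1, decoy pairs labeled 0."""
    features, labels = [], []
    for met, spec in zip(metabolite_vecs, spectrum_vecs):
        # Positive sample: true metabolite vector joined with its spectrum vector.
        features.append(met + spec)
        labels.append(1)
        # Negative sample: decoy vector conditioned on the same spectrum vector.
        decoy = generator(spec, rng)
        features.append(decoy + spec)
        labels.append(0)
    return features, labels

rng = random.Random(0)
dim = 4  # assumed latent dimension, chosen only for the demo
metabolites = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
spectra = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
features, labels = make_training_pairs(metabolites, spectra, toy_generator, rng)
```

Each spectrum contributes one positive and one negative pair, so a downstream classifier sees balanced classes; the real system would presumably replace the noise-based generator with the adversarially trained one.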

Main Conclusions

MS2MetGAN achieves the best overall performance, with a mean accuracy of 76.33% against the MetaCyc database and 79.35% against isomer decoys, outperforming 8 baseline tools including MIDAS (69.21%), SF-Matching (65.79%), and CSI:FingerID (49.66%).
The GAN training procedure improves performance stability across diverse test datasets, reducing the standard deviation of accuracy from 0.3286 (GAN-0) to 0.1618 (GAN-9) for MetaCyc searches and from 0.3122 to 0.1663 for isomer decoy searches.
MS2MetGAN demonstrates strong transferability, outperforming baseline tools on 66.67% to 100% of test datasets in pairwise comparisons, with particularly strong performance against isomer decoys, where it beats each baseline on 77.78% to 100% of datasets.
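
To put the stability figures in relative terms, the short Python calculation below uses only the standard deviations quoted above. The helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before

# Standard deviations of accuracy quoted above (GAN-0 -> GAN-9).
metacyc = relative_reduction(0.3286, 0.1618)
isomer = relative_reduction(0.3122, 0.1663)

print(f"MetaCyc: {metacyc:.1%}")        # about 50.8%
print(f"Isomer decoys: {isomer:.1%}")   # about 46.7%
```

In other words, the reported spread of accuracy across datasets roughly halves in both search settings.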

Research gap: Current metabolite identification tools are limited by the quality of negative training samples, typically constructed from mismatched compounds or spectra that are not structurally similar enough to true metabolites, leading to suboptimal classifier performance.

Abstract: Database search is a widely used approach for identifying metabolites from tandem mass spectra (MS/MS). In this strategy, an experimental spectrum is matched against a user-specified database of candidate metabolites, and candidates are ranked such that true metabolite–spectrum matches receive the highest scores. Machine-learning methods have been widely incorporated into database-search–based identification tools and have substantially improved performance. To further improve identification accuracy, we propose a new framework for generating negative training samples. The framework first uses autoencoders to learn latent representations of metabolite structures and MS/MS spectra, thereby recasting metabolite–spectrum matching as matching between latent vectors. It then uses a GAN to generate latent vectors of decoy metabolites and constructs decoy metabolite–spectrum matches as negative samples for training. Experimental results show that our tool, MS2MetGAN, achieves better overall performance than existing metabolite identification methods.
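
The database-search step described in the abstract, where candidates are ranked so that the true match scores highest, can be sketched in a few lines of Python. This is a sketch under stated assumptions: cosine similarity stands in for the trained classifier's score, and the candidate names and vectors are made-up placeholders, not data from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(spectrum_vec, candidates, score=cosine):
    """Return candidate names sorted from highest to lowest score."""
    scored = [(score(vec, spectrum_vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]

# Placeholder latent vectors (hypothetical values for illustration only).
spectrum = [1.0, 0.0, 1.0]
database = {
    "candidate_A": [0.9, 0.1, 1.1],
    "candidate_B": [-1.0, 0.5, 0.0],
    "candidate_C": [0.0, 1.0, 0.0],
}
ranking = rank_candidates(spectrum, database)
print(ranking)  # ['candidate_A', 'candidate_C', 'candidate_B']
```

Passing a different `score` callable is how a learned matching model could be swapped in without changing the ranking logic.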