Paper List
-
Nyxus: A Next Generation Image Feature Extraction Library for the Big Data and AI Era
This paper addresses the core pain point of efficiently extracting standardized, comparable features from massive (terabyte to petabyte-scale) biomedi...
-
Topological Enhancement of Protein Kinetic Stability
This work addresses the long-standing puzzle of why knotted proteins exist by demonstrating that deep knots provide a functional advantage through enh...
-
A Multi-Label Temporal Convolutional Framework for Transcription Factor Binding Characterization
This paper addresses the critical limitation of existing TF binding prediction methods that treat transcription factors as independent entities, faili...
-
Social Distancing Equilibria in Games under Conventional SI Dynamics
This paper solves the core problem of proving the existence and uniqueness of Nash equilibria in finite-duration SI epidemic games, showing they are a...
-
Binding Free Energies without Alchemy
This paper addresses the core bottleneck of computational expense in Absolute Binding Free Energy calculations by eliminating the need for numerous al...
-
SHREC: A Spectral Embedding-Based Approach for Ab-Initio Reconstruction of Helical Molecules
This paper addresses the core bottleneck in cryo-EM helical reconstruction: eliminating the dependency on accurate initial symmetry parameter estimati...
-
Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection
This paper addresses the critical gap in evaluating AI-guided scientific selection strategies under realistic budget constraints, where existing metri...
-
Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration
This paper addresses the core challenge of accurately decomposing shared (joint) and dataset-specific (individual) sources of variation in multi-modal...
MS2MetGAN: Latent-space adversarial training for metabolite–spectrum matching in MS/MS database search
University of Tennessee at Chattanooga | Middle Georgia State University
30-Second Read
IN SHORT: This paper addresses the critical bottleneck in metabolite identification: the generation of high-quality negative training samples that are structurally similar to true metabolites, which is essential for training robust machine learning classifiers.
Key Innovations
- Methodology: Introduces a latent-space approach in which autoencoders encode metabolite structures and MS/MS spectra into numerical vectors, turning metabolite-spectrum matching into vector matching.
- Methodology: Proposes a GAN framework that generates challenging decoy metabolite latent vectors conditioned on spectrum latent vectors, yielding more informative negative training samples.
- Methodology: Demonstrates that adversarial training (GAN-9) improves classifier stability, reducing the standard deviation of accuracy across datasets from 0.3286 (GAN-0) to 0.1618 while increasing mean accuracy.
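To make the negative-sample idea above concrete, here is a minimal sketch of how labeled (metabolite vector, spectrum vector) pairs might be assembled once both sides live in a latent space. It is not the paper's implementation: the function names, the vector sizes, and especially `placeholder_decoy` are assumptions for illustration. The placeholder simply jitters the true metabolite vector with Gaussian noise; it is not a GAN and is not conditioned on the spectrum vector, it only stands in for wherever a trained generator would supply decoys.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def placeholder_decoy(true_vec, scale, rng):
    """Hypothetical stand-in for a trained generator.

    Adds Gaussian noise to the true latent vector so the decoy stays
    close to it. A real system would call its generator here instead.
    """
    return [x + rng.gauss(0.0, scale) for x in true_vec]


def build_training_pairs(spectrum_vec, true_vec, n_decoys, scale, rng):
    """Return (metabolite_vec, spectrum_vec, label) triples.

    Label 1 marks the true match; label 0 marks a decoy match.
    """
    pairs = [(true_vec, spectrum_vec, 1)]
    for _ in range(n_decoys):
        decoy = placeholder_decoy(true_vec, scale, rng)
        pairs.append((decoy, spectrum_vec, 0))
    return pairs


# Hypothetical 4-dimensional latent vectors (illustrative values only).
rng = random.Random(0)
spectrum_vec = [0.2, -0.1, 0.7, 0.4]
true_vec = [0.5, 0.3, -0.2, 0.9]

pairs = build_training_pairs(spectrum_vec, true_vec, n_decoys=3, scale=0.05, rng=rng)
for metabolite_vec, _, label in pairs:
    print(label, round(euclidean(metabolite_vec, true_vec), 4))
```

The printed distances show how close each decoy sits to the true vector. Decoys that are near the true metabolite in latent space are harder for a classifier to reject, which is the intuition behind using them as negative samples.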
Key Findings
- MS2MetGAN achieves the best overall performance, with a mean accuracy of 76.33% against the MetaCyc database and 79.35% against isomer decoys, outperforming 8 baseline tools including MIDAS (69.21%), SF-Matching (65.79%), and CSI:FingerID (49.66%).
- The GAN training procedure improves performance stability across diverse test datasets, reducing standard deviation of accuracy from 0.3286 (GAN-0) to 0.1618 (GAN-9) for MetaCyc searches and from 0.3122 to 0.1663 for isomer decoy searches.
- MS2MetGAN demonstrates strong transferability, outperforming baseline tools on 66.67%-100% of test datasets in pairwise comparisons, with particularly strong performance against isomer decoys where it beats all baselines on 77.78%-100% of datasets.
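The summary statistics quoted above (mean accuracy, standard deviation across datasets, and the share of datasets on which one tool beats another) can be computed as in the sketch below. All accuracy values are made up for illustration and are not the paper's results. The use of nine datasets is an assumption, chosen only because 6 of 9 equals 66.67%; the choice of sample standard deviation (`statistics.stdev`) is also an assumption, since the summary does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-dataset accuracies."""
    return mean(accuracies), stdev(accuracies)


def win_rate(ours, baseline):
    """Fraction of datasets on which `ours` is strictly more accurate."""
    if len(ours) != len(baseline):
        raise ValueError("both tools need one accuracy per dataset")
    wins = sum(1 for a, b in zip(ours, baseline) if a > b)
    return wins / len(ours)


# Hypothetical per-dataset accuracies for two tools on nine datasets.
ours = [0.80, 0.70, 0.90, 0.60, 0.75, 0.85, 0.55, 0.95, 0.65]
baseline = [0.70, 0.75, 0.60, 0.50, 0.65, 0.80, 0.60, 0.90, 0.70]

avg, spread = summarize(ours)
rate = win_rate(ours, baseline)
print(f"mean={avg:.4f} std={spread:.4f} win_rate={100 * rate:.2f}%")
```

A lower standard deviation at the same or higher mean indicates that a tool's accuracy varies less from dataset to dataset, which is how the stability claim is framed in the findings.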
Abstract: Database search is a widely used approach for identifying metabolites from tandem mass spectra (MS/MS). In this strategy, an experimental spectrum is matched against a user-specified database of candidate metabolites, and candidates are ranked such that true metabolite–spectrum matches receive the highest scores. Machine-learning methods have been widely incorporated into database-search–based identification tools and have substantially improved performance. To further improve identification accuracy, we propose a new framework for generating negative training samples. The framework first uses autoencoders to learn latent representations of metabolite structures and MS/MS spectra, thereby recasting metabolite–spectrum matching as matching between latent vectors. It then uses a GAN to generate latent vectors of decoy metabolites and constructs decoy metabolite–spectrum matches as negative samples for training. Experimental results show that our tool, MS2MetGAN, achieves better overall performance than existing metabolite identification methods.
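The database-search step described in the abstract, scoring every candidate against a spectrum and ranking them, can be sketched as follows. This is an illustration under stated assumptions rather than the tool's actual scoring: cosine similarity is used as a placeholder scorer where the real system would use its trained classifier, and the candidate names, vectors, and the `rank_candidates` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(spectrum_vec, candidates, score=cosine):
    """Return candidate names sorted from highest to lowest score.

    `candidates` maps a candidate name to its latent vector. `score` is a
    placeholder; a trained classifier's scoring function could be passed in.
    """
    scored = [(score(vec, spectrum_vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Hypothetical latent vectors for three database candidates and one spectrum.
candidates = {
    "cand_A": [0.9, 0.1, 0.0],
    "cand_B": [0.1, 0.8, 0.3],
    "cand_C": [-0.5, 0.2, 0.7],
}
spectrum_vec = [1.0, 0.0, 0.1]

ranking = rank_candidates(spectrum_vec, candidates)
print(ranking)
```

An identification counts as correct under a top-1 criterion when the true metabolite is the first entry of the ranking. Whether the reported accuracies use exactly this criterion is not stated in the summary.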