Paper List
-
An AI Implementation Science Study to Improve Trustworthy Data in a Large Healthcare System
This paper addresses the critical gap between theoretical AI research and real-world clinical implementation by providing a practical framework for as...
-
The BEAT-CF Causal Model: A model for guiding the design of trials and observational analyses of cystic fibrosis exacerbations
This paper addresses the critical gap in cystic fibrosis exacerbation management by providing a formal causal framework that integrates expert knowled...
-
Hierarchical Molecular Language Models (HMLMs)
This paper addresses the core challenge of accurately modeling context-dependent signaling, pathway cross-talk, and temporal dynamics across multiple ...
-
Stability analysis of action potential generation using Markov models of voltage‑gated sodium channel isoforms
This work addresses the challenge of systematically characterizing how the high-dimensional parameter space of Markov models for different sodium chan...
-
Approximate Bayesian Inference on Mechanisms of Network Growth and Evolution
This paper addresses the core challenge of inferring the relative contributions of multiple, simultaneous generative mechanisms in network formation w...
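Approximate Bayesian computation (ABC) infers parameters by simulating a model and keeping only draws whose simulated data resemble the observations. The toy sketch below uses rejection ABC to estimate the mixing weight `alpha` between two growth mechanisms. Several choices here are assumptions for illustration only and are not the paper's model: the mechanisms (preferential versus uniform attachment), the summary statistic (maximum degree), the uniform prior, and the tolerance.

```python
import random


def grow_network(n_nodes, alpha, rng):
    """Grow a toy network in which each new node adds one edge.

    With probability alpha the target is chosen preferentially (proportional
    to degree); otherwise it is chosen uniformly among existing nodes.
    Both mechanisms are illustrative assumptions.
    """
    degrees = [1, 1]    # seed graph: a single edge between nodes 0 and 1
    endpoints = [0, 1]  # each node appears once per incident edge
    for new in range(2, n_nodes):
        if rng.random() < alpha:
            target = rng.choice(endpoints)  # preferential attachment
        else:
            target = rng.randrange(new)     # uniform random attachment
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([target, new])
    return degrees


def summary(degrees):
    """Summary statistic (assumed): the maximum degree, since hubs suggest preferential growth."""
    return max(degrees)


def abc_rejection(observed_stat, n_nodes, n_draws, tolerance, seed=0):
    """Rejection ABC: keep prior draws whose simulated statistic is within tolerance."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        alpha = rng.random()  # prior: alpha ~ Uniform(0, 1)
        sim = grow_network(n_nodes, alpha, rng)
        if abs(summary(sim) - observed_stat) <= tolerance:
            accepted.append(alpha)
    return accepted  # approximate posterior sample for alpha


if __name__ == "__main__":
    observed = summary(grow_network(300, 0.8, random.Random(42)))
    posterior = abc_rejection(observed, n_nodes=300, n_draws=500, tolerance=2, seed=1)
    if posterior:
        print(f"accepted {len(posterior)} draws, "
              f"posterior mean alpha = {sum(posterior) / len(posterior):.2f}")
```

Shrinking the tolerance makes the accepted sample closer to the true posterior but accepts fewer draws. Methods that track several mechanisms at once would need richer summary statistics than a single maximum degree.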
-
EnzyCLIP: A Cross-Attention Dual Encoder Framework with Contrastive Learning for Predicting Enzyme Kinetic Constants
This paper addresses the core challenge of jointly predicting enzyme kinetic parameters (Kcat and Km) by modeling dynamic enzyme-substrate interaction...
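The title pairs a dual encoder with contrastive learning. A common form of that objective is the CLIP-style symmetric InfoNCE loss. Under this loss, the i-th enzyme and i-th substrate in a batch form a positive pair, and every other pairing in the batch acts as a negative. The sketch below is an assumption about the general technique, not EnzyCLIP's published loss. The temperature of 0.07 is an illustrative value, and the cross-attention and kinetic-constant regression heads are not modeled.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_loss(enzyme_emb, substrate_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired enzyme/substrate embeddings."""
    e = [l2_normalize(v) for v in enzyme_emb]
    s = [l2_normalize(v) for v in substrate_emb]
    n = len(e)
    # logits[i][j]: cosine similarity of enzyme i and substrate j, divided by temperature
    logits = [[sum(a * b for a, b in zip(e[i], s[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Enzyme -> substrate direction: each row should peak on its diagonal entry
    loss_e2s = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Substrate -> enzyme direction: each column should peak on its diagonal entry
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_s2e = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_e2s + loss_s2e) / 2
```

Correctly aligned pairs give a loss near zero. Swapped pairs give a large loss, and this gap is the signal that pulls matching enzyme and substrate embeddings together.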
-
Tissue stress measurements with Bayesian Inversion Stress Microscopy
This paper addresses the core challenge of measuring absolute, tissue-scale mechanical stress without making assumptions about tissue rheology, which ...
-
DeepFRI Demystified: Interpretability vs. Accuracy in AI Protein Function Prediction
This study addresses the critical gap between high predictive accuracy and biological interpretability in DeepFRI, revealing that the model often prio...
-
MS2MetGAN: Latent-space adversarial training for metabolite–spectrum matching in MS/MS database search
University of Tennessee at Chattanooga | Middle Georgia State University
30-Second Summary
IN SHORT: This paper addresses the critical bottleneck in metabolite identification: the generation of high-quality negative training samples that are structurally similar to true metabolites, which is essential for training robust machine learning classifiers.
Core Innovations
- Methodology: Introduces a latent-space approach in which autoencoders encode metabolite structures and MS/MS spectra into numerical vectors, turning metabolite-spectrum matching into vector matching.
- Methodology: Proposes a GAN framework that generates challenging decoy metabolite latent vectors conditioned on spectrum latent vectors, yielding more informative negative training samples.
- Methodology: Shows that adversarial training improves classifier stability. For MetaCyc searches, the standard deviation of accuracy across datasets drops from 0.3286 (GAN-0) to 0.1618 (GAN-9), while mean accuracy increases.
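Once both autoencoders have produced latent vectors, matching reduces to scoring pairs of vectors and ranking candidates by that score. The sketch below is a hypothetical stand-in for the paper's trained classifier. It uses a bilinear score `z_spec^T W z_met` with an assumed interaction matrix `W`. Because `W` may be rectangular, the spectrum and metabolite latent spaces can differ in size. A plain linear model on the concatenated vectors would not work here: for a fixed spectrum, it would rank candidates identically regardless of which spectrum was given.

```python
def bilinear_score(z_spec, W, z_met):
    """Match score s = z_spec^T W z_met; W is a hypothetical learned interaction matrix."""
    return sum(z_spec[i] * W[i][j] * z_met[j]
               for i in range(len(z_spec))
               for j in range(len(z_met)))


def rank_candidates(z_spec, W, candidates):
    """Return candidate metabolite IDs sorted from best to worst match score."""
    scores = {cid: bilinear_score(z_spec, W, z) for cid, z in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    spectrum = [0.9, 0.1, 0.0]
    library = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
    print(rank_candidates(spectrum, identity, library))  # ['A', 'B', 'C']
```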
Main Conclusions
- MS2MetGAN achieves the best overall performance, with a mean accuracy of 76.33% against the MetaCyc database and 79.35% against isomer decoys. It outperforms 8 baseline tools, including MIDAS (69.21%), SF-Matching (65.79%), and CSI:FingerID (49.66%).
- The GAN training procedure makes performance more stable across diverse test datasets. The standard deviation of accuracy falls from 0.3286 (GAN-0) to 0.1618 (GAN-9) for MetaCyc searches and from 0.3122 to 0.1663 for isomer decoy searches.
- MS2MetGAN transfers well across datasets. In pairwise comparisons, it outperforms each baseline tool on 66.67%-100% of test datasets. Against isomer decoys, it beats each baseline on 77.78%-100% of datasets.
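The stability and pairwise figures above come from simple per-dataset statistics. The sketch below shows how such numbers can be computed from lists of per-dataset accuracies. Two choices here are assumptions, because the summary does not specify them: the use of population rather than sample standard deviation, and counting ties as non-wins. As an arithmetic check, a pairwise figure of 66.67% corresponds to winning on two of every three datasets.

```python
import statistics


def stability_summary(accuracies):
    """Mean and (population, assumed) standard deviation of per-dataset accuracy."""
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


def win_fraction(tool_acc, baseline_acc):
    """Fraction of datasets on which the tool strictly beats the baseline (ties are not wins)."""
    wins = sum(1 for t, b in zip(tool_acc, baseline_acc) if t > b)
    return wins / len(tool_acc)


if __name__ == "__main__":
    # Hypothetical per-dataset accuracies, not values from the paper
    tool = [0.80, 0.70, 0.90]
    baseline = [0.60, 0.75, 0.50]
    print(stability_summary(tool))
    print(f"{win_fraction(tool, baseline):.2%}")  # 66.67%
```

A lower standard deviation means accuracy varies less from one test dataset to the next, which is the sense in which the GAN-trained model is described as more stable.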
Abstract: Database search is a widely used approach for identifying metabolites from tandem mass spectra (MS/MS). In this strategy, an experimental spectrum is matched against a user-specified database of candidate metabolites, and candidates are ranked such that true metabolite–spectrum matches receive the highest scores. Machine-learning methods have been widely incorporated into database-search-based identification tools and have substantially improved performance. To further improve identification accuracy, we propose a new framework for generating negative training samples. The framework first uses autoencoders to learn latent representations of metabolite structures and MS/MS spectra, thereby recasting metabolite–spectrum matching as matching between latent vectors. It then uses a GAN to generate latent vectors of decoy metabolites and constructs decoy metabolite–spectrum matches as negative samples for training. Experimental results show that our tool, MS2MetGAN, achieves better overall performance than existing metabolite identification methods.
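The abstract describes how a labeled training set is built. Each true spectrum-metabolite latent pair is a positive sample. Each spectrum paired with a generated decoy latent vector is a negative sample. The sketch below shows only this dataset assembly step. The generator is passed in as a callable, and `toy_generator` is a hypothetical placeholder: it adds Gaussian noise to the spectrum latent. It is not a trained GAN and not the paper's generator.

```python
import random


def build_training_set(pairs, generator, decoys_per_spectrum, seed=0):
    """Assemble (spectrum_latent, metabolite_latent, label) triples.

    pairs: list of (spectrum_latent, true_metabolite_latent) tuples.
    Label 1 marks a true match; label 0 marks a generated decoy match.
    """
    rng = random.Random(seed)
    data = []
    for z_spec, z_met in pairs:
        data.append((list(z_spec), list(z_met), 1))  # positive sample
        for _ in range(decoys_per_spectrum):
            z_decoy = generator(z_spec, rng)  # decoy conditioned on the spectrum latent
            data.append((list(z_spec), z_decoy, 0))  # negative sample
    rng.shuffle(data)
    return data


def toy_generator(z_spec, rng):
    """Hypothetical stand-in for a trained conditional generator:
    a noisy copy of the spectrum latent (assumes equal latent sizes)."""
    return [x + rng.gauss(0.0, 0.1) for x in z_spec]


if __name__ == "__main__":
    pairs = [([0.1, 0.2], [0.3, 0.4]), ([0.5, 0.6], [0.7, 0.8])]
    for row in build_training_set(pairs, toy_generator, decoys_per_spectrum=2):
        print(row)
```

In an adversarial setup, the generator would be trained to produce decoys that the classifier finds hard to reject. Decoys produced that way carry more information as negative samples than random non-matches do.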