Paper List
-
GOPHER: Optimization-based Phenotype Randomization for Genome-Wide Association Studies with Differential Privacy
This paper addresses the core challenge of balancing rigorous privacy protection with data utility when releasing full GWAS summary statistics, overco...
-
Real-time Cricket Sorting By Sex A low-cost embedded solution using YOLOv8 and Raspberry Pi
This paper addresses the critical bottleneck in industrial insect farming: the lack of automated, real-time sex sorting systems for Acheta domesticus ...
-
Training Dynamics of Learning 3D-Rotational Equivariance
This work addresses the core dilemma of whether to use computationally expensive equivariant architectures or faster symmetry-agnostic models with dat...
-
Fast and Accurate Node-Age Estimation Under Fossil Calibration Uncertainty Using the Adjusted Pairwise Likelihood
This paper addresses the dual challenge of computational inefficiency and sensitivity to fossil calibration errors in Bayesian divergence time estimat...
-
Few-shot Protein Fitness Prediction via In-context Learning and Test-time Training
This paper addresses the core challenge of accurately predicting protein fitness with only a handful of experimental observations, where data collecti...
-
scCluBench: Comprehensive Benchmarking of Clustering Algorithms for Single-Cell RNA Sequencing
This paper addresses the critical gap of fragmented and non-standardized benchmarking in single-cell RNA-seq clustering, which hinders objective compa...
-
Simulation and inference methods for non-Markovian stochastic biochemical reaction networks
This paper addresses the computational bottleneck of simulating and performing Bayesian inference for non-Markovian biochemical systems with history-d...
-
Assessment of Simulation-based Inference Methods for Stochastic Compartmental Models
This paper addresses the core challenge of performing accurate Bayesian parameter inference for stochastic epidemic models when the likelihood functio...
-
MS2MetGAN: Latent-space adversarial training for metabolite–spectrum matching in MS/MS database search
University of Tennessee at Chattanooga | Middle Georgia State University
30-Second Read
IN SHORT: This paper addresses the critical bottleneck in metabolite identification: the generation of high-quality negative training samples that are structurally similar to true metabolites, which is essential for training robust machine learning classifiers.
Core Innovations
- Methodology: Introduces a novel latent-space approach in which autoencoders encode metabolite structures and MS/MS spectra into numerical vectors, turning metabolite–spectrum matching into vector matching.
- Methodology: Proposes a GAN framework that generates challenging decoy metabolite latent vectors conditioned on spectrum latent vectors, yielding more informative negative training samples.
- Methodology: Demonstrates that adversarial training significantly improves classifier stability: for MetaCyc searches, the standard deviation of accuracy across datasets drops from 0.3286 (GAN-0) to 0.1618 (GAN-9), while mean accuracy increases.
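The negative-sample construction described above (true metabolite–spectrum matches as positives, generated decoy matches as negatives) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MS2MetGAN implementation: the latent dimensions, the concatenation of spectrum and metabolite latents into a single classifier input, and the `make_toy_generator` helper (an untrained linear-plus-tanh map standing in for a trained GAN generator) are all hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]

def make_toy_generator(spec_dim: int, noise_dim: int, met_dim: int,
                       rng: random.Random) -> Callable[[Vector, Vector], Vector]:
    """Hypothetical stand-in for a trained conditional generator G(s, z).

    Uses fixed random weights (no adversarial training) and a tanh output,
    mapping [spectrum latent, noise] to a decoy metabolite latent.
    """
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(spec_dim + noise_dim)]
               for _ in range(met_dim)]

    def generator(s: Vector, z: Vector) -> Vector:
        x = s + z  # condition on the spectrum by concatenating it with noise
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return generator

def build_training_pairs(spectrum_latents: List[Vector],
                         metabolite_latents: List[Vector],
                         generator: Callable[[Vector, Vector], Vector],
                         noise_dim: int,
                         rng: random.Random) -> List[Tuple[Vector, int]]:
    """Label true (spectrum, metabolite) matches 1 and generated decoy matches 0."""
    pairs: List[Tuple[Vector, int]] = []
    for s, m in zip(spectrum_latents, metabolite_latents):
        pairs.append((s + m, 1))  # positive sample: true match
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        pairs.append((s + generator(s, z), 0))  # negative sample: decoy match
    return pairs

rng = random.Random(0)
spectra = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
metabolites = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(3)]
gen = make_toy_generator(spec_dim=4, noise_dim=2, met_dim=6, rng=rng)
pairs = build_training_pairs(spectra, metabolites, gen, noise_dim=2, rng=rng)
print(len(pairs), [label for _, label in pairs])  # 6 [1, 0, 1, 0, 1, 0]
```

Conditioning the generator on the spectrum latent is what lets each decoy be tailored to a specific spectrum. In an actual GAN the generator's weights would be trained adversarially so that decoys become hard to separate from true matches; this toy version only shows how such decoys would be turned into labeled training pairs.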
Main Conclusions
- MS2MetGAN achieves the best overall performance, with a mean accuracy of 76.33% against the MetaCyc database and 79.35% against isomer decoys, outperforming eight baseline tools, including MIDAS (69.21%), SF-Matching (65.79%), and CSI:FingerID (49.66%).
- The GAN training procedure improves performance stability across diverse test datasets, reducing the standard deviation of accuracy from 0.3286 (GAN-0) to 0.1618 (GAN-9) for MetaCyc searches and from 0.3122 to 0.1663 for isomer decoy searches.
- MS2MetGAN shows strong transferability: in pairwise comparisons it outperforms each baseline tool on 66.67% to 100% of test datasets, and against isomer decoys it outperforms each baseline on 77.78% to 100% of datasets.
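The stability and transferability figures above reduce to simple per-dataset statistics: a mean and standard deviation of accuracy, and the share of datasets on which one tool beats another. The sketch below is a hedged illustration: it assumes the sample (n - 1) standard deviation, since the summary does not say whether sample or population standard deviation was used, and it counts a pairwise "win" as strictly higher accuracy on a dataset. The accuracy lists are made-up placeholders, not values from the paper.

```python
import statistics
from typing import Dict, List

def summarize(accuracies: List[float]) -> Dict[str, float]:
    """Mean and standard deviation of per-dataset accuracy."""
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.stdev(accuracies),  # assumption: sample (n - 1) std
    }

def win_fraction(tool: List[float], baseline: List[float]) -> float:
    """Share of datasets where `tool` is strictly more accurate than `baseline`."""
    if len(tool) != len(baseline) or not tool:
        raise ValueError("need equal-length, non-empty accuracy lists")
    wins = sum(1 for t, b in zip(tool, baseline) if t > b)
    return wins / len(tool)

# Made-up per-dataset accuracies, for illustration only.
tool_acc = [0.9, 0.8, 0.7]
baseline_acc = [0.85, 0.82, 0.6]
stats = summarize(tool_acc)
print(round(stats["mean"], 4), round(stats["std"], 4))  # 0.8 0.1
print(round(win_fraction(tool_acc, baseline_acc), 4))   # 0.6667
```

As a worked example of the percentages, winning on 7 of 9 datasets gives 7/9, or about 77.78%.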
Abstract: Database search is a widely used approach for identifying metabolites from tandem mass spectra (MS/MS). In this strategy, an experimental spectrum is matched against a user-specified database of candidate metabolites, and candidates are ranked such that true metabolite–spectrum matches receive the highest scores. Machine-learning methods have been widely incorporated into database-search–based identification tools and have substantially improved performance. To further improve identification accuracy, we propose a new framework for generating negative training samples. The framework first uses autoencoders to learn latent representations of metabolite structures and MS/MS spectra, thereby recasting metabolite–spectrum matching as matching between latent vectors. It then uses a GAN to generate latent vectors of decoy metabolites and constructs decoy metabolite–spectrum matches as negative samples for training. Experimental results show that our tool, MS2MetGAN, achieves better overall performance than existing metabolite identification methods.
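The ranking step of database search can be sketched as scoring every candidate metabolite against the query spectrum in latent space and sorting by score, so that the true match should come first. This is a hypothetical illustration rather than MS2MetGAN's scorer: cosine similarity stands in for the trained classifier, it assumes spectrum and metabolite latents share one vector space of equal dimension, and the candidate names and vectors are invented.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_candidates(
    spectrum_latent: Vector,
    candidates: Dict[str, Vector],
    score_fn: Callable[[Vector, Vector], float] = cosine,
) -> List[Tuple[str, float]]:
    """Score each candidate metabolite latent against the spectrum latent,
    returning (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, score_fn(spectrum_latent, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy query: the true match points in nearly the same direction as the spectrum latent.
spectrum = [1.0, 0.0, 0.0]
candidates = {
    "true_match": [0.9, 0.1, 0.0],
    "isomer_decoy": [0.5, 0.5, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
ranking = rank_candidates(spectrum, candidates)
print([name for name, _ in ranking])  # ['true_match', 'isomer_decoy', 'unrelated']
```

Passing `score_fn` as a parameter keeps the ranking logic independent of the scorer, so a learned classifier could replace the cosine placeholder without changing `rank_candidates`.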