Paper List

Bioinformatics

Simulation and inference methods for non-Markovian stochastic biochemical reaction networks

Unknown

This paper addresses the computational bottleneck of simulating and performing Bayesian inference for non-Markovian biochemical systems with history-d...
Computational Neuroscience

Translating Measures onto Mechanisms: The Cognitive Relevance of Higher-Order Information

Unknown

This review addresses the core challenge of translating abstract higher-order information theory metrics (e.g., synergy, redundancy) into defensible, ...
Artificial Intelligence

Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs

Unknown

This paper addresses the critical gap in understanding whether LLMs spontaneously develop human-like Bayesian strategies for processing uncertain info...
Bioinformatics

Vessel Network Topology in Molecular Communication: Insights from Experiments and Theory

Unknown

This work addresses the critical lack of experimentally validated channel models for molecular communication within complex vessel networks, which is ...
Biophysics

Modulation of DNA rheology by a transcription factor that forms aging microgels

Unknown

This work addresses the fundamental question of how the transcription factor NANOG, essential for embryonic stem cell pluripotency, physically regulat...
Systems Biology

Imperfect molecular detection renormalizes apparent kinetic rates in stochastic gene regulatory networks

Unknown

This paper addresses the core challenge of distinguishing genuine stochastic dynamics of gene regulatory networks from artifacts introduced by imperfe...
Network Science

Approximate Bayesian Inference on Mechanisms of Network Growth and Evolution

Unknown

This paper addresses the core challenge of inferring the relative contributions of multiple, simultaneous generative mechanisms in network formation w...
Health Informatics

An AI Implementation Science Study to Improve Trustworthy Data in a Large Healthcare System

Unknown

This paper addresses the critical gap between theoretical AI research and real-world clinical implementation by providing a practical framework for as...

Journal: ArXiv Preprint

Published: Unknown

Bioinformatics, Computational Chemistry

Pharmacophore-based design by learning on voxel grids

AIDD, Genentech

Omar Mahmood, Pedro O. Pinheiro, Richard Bonneau, Saeed Saremi, Vishnu Sresht

The 30-Second View

IN SHORT: This paper addresses the computational bottleneck and limited novelty in conventional pharmacophore-based virtual screening by introducing a voxel captioning method that generates novel molecules directly from 3D pharmacophore-shape profiles.

Innovation (TL;DR)

Methodology: Proposes VoxCap, the first voxel captioning method for generating SMILES strings from voxelized 3D pharmacophore-shape profiles, bridging 3D structural information with 1D string generation.
Methodology: Introduces a 'fast search' workflow that reduces computational complexity from O(database size) to O(n_g × n_a), enabling screening of billion-compound libraries previously considered intractable.
Biology: Demonstrates superior performance in generating diverse, novel scaffolds with high pharmacophore-shape similarity (Tanimoto Combo score ≥ 1.2), addressing both in-distribution and out-of-distribution query molecules.
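
As a rough illustration of the hit criterion, the Tanimoto Combo score is conventionally the sum of a shape Tanimoto and a pharmacophore (colour) Tanimoto, each in [0, 1], so the combined score ranges from 0 to 2. The sketch below is a minimal toy version, assuming both terms are computed on already-aligned binary voxel occupancies stored as Python sets; production overlay tools use Gaussian overlaps and optimise the alignment first. The function names and toy coordinates are hypothetical and do not come from the paper.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two sets of occupied voxels."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def tanimoto_combo(shape_q: set, shape_c: set, pharm_q: set, pharm_c: set) -> float:
    """Shape Tanimoto plus pharmacophore Tanimoto; ranges from 0 to 2."""
    return tanimoto(shape_q, shape_c) + tanimoto(pharm_q, pharm_c)


def is_hit(score: float, threshold: float = 1.2) -> bool:
    """Threshold of 1.2 is the value quoted in the summary above."""
    return score >= threshold


# Toy, pre-aligned occupancies (assumption: voxel indices are (x, y, z) tuples).
query_shape = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
cand_shape = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)}

# Pharmacophore voxels carry a feature type so only like features overlap.
query_pharm = {("acceptor", (0, 0, 0)), ("donor", (1, 0, 0)), ("aromatic", (2, 0, 0))}
cand_pharm = {("donor", (1, 0, 0)), ("aromatic", (2, 0, 0))}

score = tanimoto_combo(query_shape, cand_shape, query_pharm, cand_pharm)
print(f"Tanimoto Combo = {score:.3f}, hit = {is_hit(score)}")
```

Here the shape term is 3/5 and the pharmacophore term is 2/3, so the toy candidate clears the 1.2 threshold only because both components contribute; a perfect shape match alone (score 1.0) would not count as a hit.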

Key conclusions

VoxCap generates significantly more hits than baseline methods, with median hits per query increasing from 0 (baseline) to 116.5 on GEOM-drugs and from 0 to 115 on ChEMBL (p<0.001).
The model produces diverse scaffolds, with median unique scaffold hits of 55.5 (GEOM-drugs) and 72 (ChEMBL), compared to 0 for baselines and 7-8.5 for PGMG.
The fast search workflow reduces computational requirements by orders of magnitude while maintaining hit rates, enabling practical screening of billion-compound libraries like Enamine REAL (60B compounds).
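
The per-query statistics quoted above can be reproduced in spirit with a few lines of bookkeeping. The following is a minimal sketch, assuming each generated molecule has already been assigned a scaffold identifier and a Tanimoto Combo score against its query; the data layout, the scaffold labels, and the numbers are invented for illustration and are not results from the paper.

```python
from statistics import median

HIT_THRESHOLD = 1.2  # Tanimoto Combo cut-off quoted in the summary


def summarize(results: dict) -> tuple:
    """Return (median hits per query, median unique scaffold hits per query).

    `results` maps a query id to a list of (scaffold_id, tanimoto_combo) pairs,
    one pair per generated molecule (hypothetical layout).
    """
    hits_per_query = []
    scaffolds_per_query = []
    for generated in results.values():
        hits = [(scaf, tc) for scaf, tc in generated if tc >= HIT_THRESHOLD]
        hits_per_query.append(len(hits))
        # Several hits may share one scaffold; count each scaffold once.
        scaffolds_per_query.append(len({scaf for scaf, _ in hits}))
    return median(hits_per_query), median(scaffolds_per_query)


# Toy data: three queries with made-up scaffold labels and scores.
toy_results = {
    "q1": [("A", 1.30), ("A", 1.25), ("B", 1.50), ("C", 0.90)],
    "q2": [("D", 1.10)],
    "q3": [("E", 1.20), ("F", 1.90)],
}

median_hits, median_scaffolds = summarize(toy_results)
print(median_hits, median_scaffolds)
```

Reporting the median rather than the mean keeps a single prolific query from masking queries that return nothing, which is why a baseline median of 0 is informative.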

Background and Gap: Current pharmacophore-based virtual screening suffers from two major limitations: (1) computational expense scales poorly with library size, making billion-compound libraries intractable; (2) candidate generation is restricted to existing library compounds, limiting novelty and scaffold diversity.

Abstract: Ligand-based drug discovery (LBDD) relies on making use of known binders to a protein target to find structurally diverse molecules similarly likely to bind. This process typically involves a brute force search of the known binder (query) against a molecular library using some metric of molecular similarity. One popular approach overlays the pharmacophore-shape profile of the known binder to 3D conformations enumerated for each of the library molecules, computes overlaps, and picks a set of diverse library molecules with high overlaps. While this virtual screening workflow has had considerable success in hit diversification, scaffold hopping, and patent busting, it scales poorly with library sizes and restricts candidate generation to existing library compounds. Leveraging recent advances in voxel-based generative modelling, we propose a pharmacophore-based generative model and workflows that address the scaling and fecundity issues of conventional pharmacophore-based virtual screening. We introduce VoxCap, a voxel captioning method for generating SMILES strings from voxelised molecular representations. We propose two workflows as practical use cases as well as benchmarks for pharmacophore-based generation: de-novo design, in which we aim to generate new molecules with high pharmacophore-shape similarities to query molecules, and fast search, which aims to combine generative design with a cheap 2D substructure similarity search for efficient hit identification. Our results show that VoxCap significantly outperforms previous methods in generating diverse de-novo hits. When combined with our fast search workflow, VoxCap reduces computational time by orders of magnitude while returning hits for all query molecules, enabling the search of large libraries that are intractable to search by brute force.
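
The fast search idea described in the abstract (generate candidates, retrieve library neighbours with a cheap 2D search, and reserve the expensive 3D comparison for that shortlist) can be sketched as follows. This is a toy outline under stated assumptions, not the authors' implementation: n_g is taken to mean the number of generated molecules per query and n_a the number of library analogues retrieved per generated molecule, which this summary does not define; character bigrams stand in for a real 2D fingerprint; and the generator output and `toy_score_3d` are hypothetical placeholders for VoxCap samples and a pharmacophore-shape overlay.

```python
from typing import Callable, List, Tuple


def bigrams(smiles: str) -> frozenset:
    """Toy 2D fingerprint: set of character bigrams (stand-in for a real fingerprint)."""
    return frozenset(smiles[i:i + 2] for i in range(len(smiles) - 1))


def tanimoto_2d(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fast_search(
    generated: List[str],
    library: List[str],
    score_3d: Callable[[str], float],
    n_a: int,
    threshold: float = 1.2,
) -> Tuple[List[str], int]:
    """Return (library hits, number of expensive 3D scorings performed)."""
    lib_fps = [(smi, bigrams(smi)) for smi in library]  # computed once per library
    scored = {}  # cache so each library molecule is scored in 3D at most once
    for gen in generated:  # n_g generated molecules
        gen_fp = bigrams(gen)
        # Cheap 2D step: keep only the n_a nearest library analogues.
        # (A linear scan here; real systems would use an indexed search.)
        analogues = sorted(
            lib_fps, key=lambda item: tanimoto_2d(gen_fp, item[1]), reverse=True
        )[:n_a]
        for smi, _ in analogues:
            if smi not in scored:
                scored[smi] = score_3d(smi)  # expensive step, at most n_g * n_a calls
    hits = [smi for smi, tc in scored.items() if tc >= threshold]
    return hits, len(scored)


def toy_score_3d(smiles: str) -> float:
    """Hypothetical stand-in for a 3D overlay score against the query."""
    return 1.5 if smiles.startswith("c1") else 1.0


library = ["CCO", "CCN", "CCC", "c1ccccc1", "c1ccccc1O", "CCCl"]
generated = ["CCO", "c1ccccc1N"]  # pretend these came from the generative model

hits, n_expensive = fast_search(generated, library, toy_score_3d, n_a=2)
print(hits, n_expensive)
```

The point of the design is that the number of expensive 3D comparisons is bounded by n_g × n_a regardless of library size, whereas brute-force screening performs one such comparison per library compound.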