Paper List
-
STAR-GO: Improving Protein Function Prediction by Learning to Hierarchically Integrate Ontology-Informed Semantic Embeddings
This paper addresses the core challenge of generalizing protein function prediction to unseen or newly introduced Gene Ontology (GO) terms by overcomi...
-
Incorporating indel channels into average-case analysis of seed-chain-extend
This paper addresses the core pain point of bridging the theoretical gap for the widely used seed-chain-extend heuristic by providing the first rigoro...
-
Competition, stability, and functionality in excitatory-inhibitory neural circuits
This paper addresses the core challenge of extending interpretable energy-based frameworks to biologically realistic asymmetric neural networks, where...
-
Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4
This paper addresses the core challenge of generating accurate and clinically relevant patient notes from sparse inputs (ICD codes and basic demograph...
-
Hypothesis-Based Particle Detection for Accurate Nanoparticle Counting and Digital Diagnostics
This paper addresses the core challenge of achieving accurate, interpretable, and training-free nanoparticle counting in digital diagnostic assays, wh...
-
MCP-AI: Protocol-Driven Intelligence Framework for Autonomous Reasoning in Healthcare
This paper addresses the critical gap in healthcare AI systems that lack contextual reasoning, long-term state management, and verifiable workflows by...
-
Model Gateway: Model Management Platform for Model-Driven Drug Discovery
This paper addresses the critical bottleneck of fragmented, ad-hoc model management in pharmaceutical research by providing a centralized, scalable ML...
-
Tree Thinking in the Genomic Era: Unifying Models Across Cells, Populations, and Species
This paper addresses the fragmentation of tree-based inference methods across biological scales by identifying shared algorithmic principles and stati...
Tree Thinking in the Genomic Era: Unifying Models Across Cells, Populations, and Species
Stanford University | University of Oxford | University of California, Berkeley | Peking University | Guangzhou Medical University
The 30-Second View
IN SHORT: This paper addresses the fragmentation of tree-based inference methods across biological scales by identifying shared algorithmic principles and statistical challenges in phylogenetics, population genetics, and cell lineage tracing.
Innovation (TL;DR)
- Methodology: Identifies deep conceptual parallels between phylogenetic placement algorithms and ARG threading methods, demonstrating how phylogenetic placement generalizes to ARG reconstruction.
- Biology: Shows that quartet-based network methods in phylogenetics and ABBA-BABA statistics in population genetics capture the same underlying signal of gene flow through asymmetric genealogical relationships.
- Methodology: Demonstrates how ARG-based migration inference methods (e.g., GAIA, spacetrees) extend classical phylogeographic approaches by leveraging the full sequence of locally correlated genealogies along the genome.
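To make the ABBA-BABA signal mentioned above concrete, here is a minimal sketch of Patterson's D statistic, D = (nABBA - nBABA) / (nABBA + nBABA), computed from four aligned haploid sequences. The function name `d_statistic`, the toy sequences, and the assumption that the outgroup always carries the ancestral allele are illustrative choices, not taken from the paper.

```python
from collections import Counter


def d_statistic(p1, p2, p3, outgroup):
    """Return (D, counts) for four aligned haploid sequences.

    Assumes the tree ((P1, P2), P3), Outgroup and treats the outgroup
    allele as ancestral ("A"); any other allele is derived ("B").
    """
    counts = Counter()
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if a == o and b == c and b != o:
            # P1 ancestral, P2 and P3 share the derived allele: ABBA
            counts["ABBA"] += 1
        elif b == o and a == c and a != o:
            # P2 ancestral, P1 and P3 share the derived allele: BABA
            counts["BABA"] += 1
    total = counts["ABBA"] + counts["BABA"]
    if total == 0:
        return float("nan"), counts
    return (counts["ABBA"] - counts["BABA"]) / total, counts


# Toy alignment: three ABBA sites, one BABA site, one BBAA site, one invariant site.
p1 = "AAAGGA"
p2 = "GGGAGA"
p3 = "GGGGAA"
og = "AAAAAA"

d, site_counts = d_statistic(p1, p2, p3, og)
print(d, dict(site_counts))
```

Without gene flow, ABBA and BABA patterns are expected to be equally frequent, so D is near zero. A clearly positive value points to excess allele sharing between P2 and P3. In practice, significance is usually assessed with a block jackknife over the genome, which this sketch omits.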
Key conclusions
- Tree-based models provide a unified framework for ancestry inference across biological scales; ARGs representing ~2.48 million SARS-CoV-2 genomes demonstrate that inference is feasible at pandemic scale.
- Methodological parallels exist across domains: phylogenetic placement algorithms share core logic with ARG threading, and quartet-based methods in phylogenetics mirror ABBA-BABA statistics in population genetics for detecting gene flow.
- Current ARG inference algorithms remain constrained by simplifying assumptions (neutrality, panmixia, constant population size) and face challenges in uncertainty quantification, particularly for non-model species or limited sample sizes.
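As a rough illustration of what the simplifying assumptions listed above buy, the sketch below simulates the time to the most recent common ancestor (TMRCA) under the standard Kingman coalescent, which assumes neutrality, panmixia, and constant population size. The summary does not say which model any particular ARG method uses, so the choice of the Kingman coalescent, the function names, and the parameter values are all assumptions made for illustration.

```python
import random


def simulate_tmrca(n, rng):
    """Simulate one TMRCA for n lineages, in coalescent time units.

    With k lineages remaining, the waiting time to the next coalescence
    is exponential with rate k * (k - 1) / 2.
    """
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2
        t += rng.expovariate(rate)
        k -= 1
    return t


def expected_tmrca(n):
    """Closed-form expectation: sum over k of 2 / (k (k - 1)) = 2 (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(42)  # fixed seed so the run is reproducible
n_samples = 10
replicates = 20000
mean_tmrca = sum(simulate_tmrca(n_samples, rng) for _ in range(replicates)) / replicates

print(f"simulated mean TMRCA: {mean_tmrca:.3f}")
print(f"expected TMRCA:       {expected_tmrca(n_samples):.3f}")
```

The closed form exists only because the coalescence rate depends on nothing but the number of lineages. Selection, population structure, or changing population size breaks that property, which is one reason relaxing these assumptions makes inference harder.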
Abstract: The ongoing explosion of genome sequence data is transforming how we reconstruct and understand the histories of biological systems. Across biological scales, from individual cells to populations and species, tree-based models provide a common framework for representing ancestry. Once limited to species phylogenetics, “tree thinking” now extends deeply to population genomics and cell biology, revealing the genealogical structure of genetic and phenotypic variation within and across organisms. Recently, there have been great methodological and computational advances in tree-based methods, including methods for inferring ancestral recombination graphs in populations, phylogenetic frameworks for comparative genomics, and lineage-tracing techniques in developmental and cancer biology. Despite differences in data types and biological contexts, these approaches share core statistical and algorithmic challenges: efficiently inferring branching histories from genomic information, integrating temporal and spatial signals, and connecting genealogical structures to evolutionary and functional processes. Recognizing these shared foundations opens opportunities for cross-fertilization between fields that are traditionally studied in isolation. By examining how tree-based methods are applied across cellular, population, and species scales, we identify the conceptual parallels that unite them and the distinct challenges that each domain presents. These comparisons offer new perspectives that can inform algorithmic innovations and lead to more powerful inference strategies across the full spectrum of biological systems.