Paper List
-
Emergent Spatiotemporal Dynamics in Large-Scale Brain Networks with Next Generation Neural Mass Models
This work addresses the core challenge of understanding how complex, brain-wide spatiotemporal patterns emerge from the interaction of biophysically d...
-
Human-Centred Evaluation of Text-to-Image Generation Models for Self-expression of Mental Distress: A Dataset Based on GPT-4o
This paper addresses the critical gap in evaluating how AI-generated images can effectively support cross-cultural mental distress communication, part...
-
GOPHER: Optimization-based Phenotype Randomization for Genome-Wide Association Studies with Differential Privacy
This paper addresses the core challenge of balancing rigorous privacy protection with data utility when releasing full GWAS summary statistics, overco...
-
Real-time Cricket Sorting By Sex: A low-cost embedded solution using YOLOv8 and Raspberry Pi
This paper addresses the critical bottleneck in industrial insect farming: the lack of automated, real-time sex sorting systems for Acheta domesticus ...
-
Collective adsorption of pheromones at the water-air interface
This paper addresses the core challenge of understanding how amphiphilic pheromones, previously assumed to be transported in the gas phase, can be sta...
-
pHapCompass: Probabilistic Assembly and Uncertainty Quantification of Polyploid Haplotype Phase
This paper addresses the core challenge of accurately assembling polyploid haplotypes from sequencing data, where read assignment ambiguity and an exp...
-
Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors
This paper addresses the core challenge of automating the discovery of biologically plausible recurrent neural network (RNN) dynamics that can replica...
-
Influence of Object Affordance on Action Language Understanding: Evidence from Dynamic Causal Modeling Analysis
This study addresses the core challenge of moving beyond correlational evidence to establish the *causal direction* and *temporal dynamics* of how obj...
MoRSAIK: Sequence Motif Reactor Simulation, Analysis and Inference Kit in Python
Max-Planck-Institut für Astrophysik | Ludwig-Maximilians-Universität München | Technische Universität München | Exzellenzcluster ORIGINS
The 30-Second View
IN SHORT: This work addresses the computational bottleneck in simulating prebiotic RNA reactor dynamics by developing a Python package that tracks sequence motif concentrations instead of full RNA strands, enabling efficient Bayesian inference of reaction parameters.
Innovation (TL;DR)
- Methodology: First implementation of Bayesian inference methods for RNA reactor simulations, using Geometric Variational Inference via the NIFTy.re framework
- Methodology: Novel mean-field approximation approach that tracks k-mer motif concentrations (default k=4) instead of exponentially growing full RNA sequences
- Biology: Enables systematic investigation of templated ligation dynamics under varying environmental conditions relevant to the RNA world hypothesis
Key conclusions
- MoRSAIK reduces computational complexity from exponential to polynomial by tracking k-mer motifs (k=4 default) instead of full RNA strands
- The package enables Bayesian inference of reaction rate constants from templated ligation count data using Geometric Variational Inference
- Integration with JAX provides differentiable models for efficient gradient-based optimization and uncertainty quantification
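To give a feel for the first conclusion, the following minimal Python sketch compares the number of distinct RNA sequences up to a given strand length with the number of species that remain when only strands of up to k nucleotides are followed. The function name `num_strands_up_to` is hypothetical and not part of the MoRSAIK API, and the count is a simplified illustration of state-space size under that assumption, not the package's actual bookkeeping.

```python
def num_strands_up_to(max_length: int, alphabet_size: int = 4) -> int:
    """Count distinct sequences with 1..max_length nucleotides.

    Hypothetical helper for illustration only (not a MoRSAIK function).
    With a 4-letter alphabet there are 4**n sequences of length n.
    """
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


# Tracking every full strand: the count grows exponentially with length.
full_space = num_strands_up_to(10)   # 1398100 sequences up to length 10

# Tracking only strands up to k = 4: the count is fixed by k,
# no matter how long the strands in the reactor become.
motif_space = num_strands_up_to(4)   # 340 species

print(full_space, motif_space)
```

The point of the comparison is that the second number does not change when longer strands appear in the reactor, whereas the first keeps growing by a factor of four per added nucleotide.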
Abstract: Origins of life research investigates how life could emerge from prebiotic chemistry alone. Living systems as we know them today rely on RNA, DNA and proteins. According to the central dogma of molecular biology, information is stored in DNA and transferred by RNA, resulting in proteins that catalyze functional reactions, such as the synthesis and replication of DNA and RNA. The RNA world hypothesis provides one possible explanation of how this mechanism evolved (Crick 1968; Higgs and Lehman 2014; Orgel 1968; Pressman, Blanco, and Chen 2015; Szostak 2012). It states that life could emerge from RNA strands alone, which store and transfer biological information and also catalyze reactions as ribozymes. Before this state could have emerged, however, the prebiotic world was probably a purely chemical pool of short RNA strands with random sequences and without biological function. Despite the lack of guidance by proteins, the RNA sequences reacted with each other. In such an RNA reactor, RNA strands undergo hybridization and dehybridization, as well as ligation and cleavage. In this context, the relevant questions are which conditions allow longer RNA strands to be built and how information-carrying RNA sequences can emerge. A key reaction for the emergence of longer RNA strands is templated ligation, in which two strands hybridize adjacently onto a template strand and ligate. The better the two strands match the complementary sequence of the template strand, the higher the rate of this reaction. The extended strands can then serve as templates for the next generation of templated ligation, which accelerates the production of complementary strands. This process, however, is highly sensitive to the environmental conditions that determine the reaction rates within an RNA reactor (Göppel et al. 2022; Rosenberger et al. 2021).
In order to investigate those RNA reactors, efficient simulations are needed because the space of possible RNA sequences increases exponentially with the length of the strands, as does the number of reactions between two strands. In addition, simulations have to be compared to experimental data for validation and parameter calibration. Here, we present the MoRSAIK Python package for sequence motif (or k-mer) reactor simulation, analysis and inference. It enables users to simulate RNA sequence motif dynamics in the mean-field approximation, to infer the reaction parameters from data with Bayesian methods, and to analyze results by computing observables and plotting. MoRSAIK simulates an RNA reactor by following the reactions and concentrations of all strands up to a certain length (four nucleotides by default). Longer strands are followed indirectly, by tracking the concentrations of the sequence motifs of that maximum length that they contain.
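The idea of following longer strands through the motifs they contain can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MoRSAIK's implementation: the helper names `kmers` and `motif_concentrations`, the example pool, and the choice to add each strand's concentration to every motif it contains are all hypothetical.

```python
from collections import Counter


def kmers(strand: str, k: int = 4) -> list[str]:
    """Return the overlapping length-k motifs contained in a strand.

    Strands no longer than k are returned whole, mirroring the idea that
    short strands are tracked directly. Hypothetical helper.
    """
    if len(strand) <= k:
        return [strand]
    return [strand[i:i + k] for i in range(len(strand) - k + 1)]


def motif_concentrations(pool: dict[str, float], k: int = 4) -> Counter:
    """Accumulate motif concentrations from strand concentrations.

    Assumption for illustration: each strand contributes its full
    concentration to every motif it contains.
    """
    totals: Counter = Counter()
    for strand, concentration in pool.items():
        for motif in kmers(strand, k):
            totals[motif] += concentration
    return totals


# Example pool (made-up strands and concentrations in arbitrary units).
pool = {"ACGU": 1.0, "ACGUAC": 0.5, "GU": 2.0}
totals = motif_concentrations(pool)

# "ACGUAC" contains the motifs ACGU, CGUA and GUAC, so ACGU collects
# 1.0 from the 4-mer itself plus 0.5 from the longer strand.
print(dict(totals))
```

In this representation the six-nucleotide strand never appears as its own species; only its three overlapping 4-mers carry its concentration, which is what keeps the number of tracked quantities bounded.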