Paper List

AI for Science

Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange

2026-03-15

This paper addresses the fundamental limitation of current AI-assisted scientific research by enabling truly autonomous, decentralized investigation w...
Artificial Intelligence

D-MEM: Dopamine-Gated Agentic Memory via Reward Prediction Error Routing

2026-03-15

This paper addresses the fundamental scalability bottleneck in LLM agentic memory systems: the O(N²) computational complexity and unbounded API token ...
Biophysics

Countershading coloration in blue shark skin emerges from hierarchically organized and spatially tuned photonic architectures inside skin denticles

2026-03-14

This paper solves the core problem of how blue sharks achieve their striking dorsoventral countershading camouflage, revealing that coloration origina...
Computer Vision

Human-like Object Grouping in Self-supervised Vision Transformers

2026-03-14

This paper addresses the core challenge of quantifying how well self-supervised vision models capture human-like object grouping in natural scenes, br...
Bioinformatics

Hierarchical p-Adic Framework for Gene Regulatory Networks: Theory and Stability Analysis

2026-03-14

This paper addresses the core challenge of mathematically capturing the inherent hierarchical organization and multi-scale stability of gene regulator...
Computational Neuroscience

Towards unified brain-to-text decoding across speech production and perception

2026-03-13

This paper addresses the core challenge of developing a unified brain-to-text decoding framework that works across both speech production and percepti...
Artificial Intelligence

Dual-Laws Model for a theory of artificial consciousness

2026-03-13

This paper addresses the core challenge of developing a comprehensive, testable theory of consciousness that bridges biological and artificial systems...
Computational Neuroscience

Pulse desynchronization of neural populations by targeting the centroid of the limit cycle in phase space

2026-03-13

This work addresses the core challenge of determining optimal pulse timing and intensity for desynchronizing pathological neural oscillations when the...


Journal: arXiv Preprint

Published: 2026-03-11

Systems Biology | Scientific Machine Learning

Ill-Conditioning in Dictionary-Based Dynamic-Equation Learning: A Systems Biology Case Study

Northwestern University | NSF-Simons National Institute for Theory and Mathematics in Biology

Yuxiang Feng, Niall M. Mangan, Manu Jayadharan

30-Second Read

IN SHORT: This paper addresses the critical challenge of numerical ill-conditioning and multicollinearity in library-based sparse regression methods (e.g., SINDy), which leads to unstable and inaccurate recovery of governing equations from biological time-series data.

Key Innovations

Methodology: Quantitatively demonstrates that severe ill-conditioning (condition numbers up to 10^18) arises even from simple two- or three-term combinations in polynomial libraries, fundamentally limiting sparse identification methods.
Methodology: Shows that orthogonal polynomial bases (e.g., Legendre, Chebyshev) fail to improve conditioning when the data distribution deviates from the basis's weight function, sometimes performing worse than monomials.
Methodology: Proposes and validates aligning the data sampling distribution with the orthogonal basis's weight function to mitigate ill-conditioning and improve model-recovery accuracy.
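The claim that two or three terms already suffice for severe ill-conditioning can be illustrated with a small NumPy sketch. It is not the authors' code: the sampling intervals, the sample count, and the helper `library_condition` are hypothetical choices. The sketch builds the monomial libraries [1, x] and [1, x, x^2] from states confined to a narrow band away from the origin and compares their 2-norm condition numbers (via `numpy.linalg.cond`) with those obtained from broad sampling on [-1, 1].

```python
import numpy as np

def library_condition(x, degree):
    """2-norm condition number of the monomial library [1, x, ..., x^degree]."""
    theta = np.vander(x, degree + 1, increasing=True)
    return np.linalg.cond(theta)

rng = np.random.default_rng(0)
# Hypothetical constrained trajectory: states stay in a narrow band away
# from zero, as a population or concentration near steady state might.
x_narrow = rng.uniform(0.9, 1.1, size=200)
# Broad, symmetric sampling for comparison.
x_broad = rng.uniform(-1.0, 1.0, size=200)

for degree in (1, 2):  # two-term and three-term libraries
    print(f"{degree + 1} terms: "
          f"narrow={library_condition(x_narrow, degree):.1e}, "
          f"broad={library_condition(x_broad, degree):.1e}")
```

When x stays close to 1, the columns 1 and x are almost parallel, so the narrow-band library is much worse conditioned than the broadly sampled one, and adding x^2 compounds the effect.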

Main Conclusions

Ill-conditioning is pervasive in polynomial libraries for biological systems: condition numbers reach O(10^5) for Lotka-Volterra and O(10^18) for chemical reaction network models, leading to systematic model misidentification.
Orthogonal polynomial bases are not a universal solution; they can worsen conditioning when data distributions (e.g., from constrained biological trajectories) deviate from the basis's required weight function.
Distribution-aligned sampling is a key enabler: when data are sampled according to the orthogonal basis's weight function, conditioning improves significantly, enabling more accurate equation recovery.
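The interplay between basis choice and sampling distribution can be sketched by comparing condition numbers across three bases and three hypothetical sampling distributions. Uniform sampling on [-1, 1] matches the Legendre weight w(x) = 1; arcsine-distributed samples x = cos(πU), with U ~ Uniform(0, 1), match the first-kind Chebyshev weight w(x) = (1 - x^2)^(-1/2); the band [0.5, 1] stands in for a constrained trajectory. The degree, sample size, and intervals are illustrative assumptions, not the paper's settings. `legvander` and `chebvander` from `numpy.polynomial` build the pseudo-Vandermonde (library) matrices.

```python
import numpy as np
from numpy.polynomial import chebyshev, legendre

DEGREE = 6        # hypothetical library degree
N_SAMPLES = 2000  # hypothetical number of state samples

rng = np.random.default_rng(1)
samples = {
    # Matches the Legendre weight w(x) = 1 on [-1, 1].
    "uniform": rng.uniform(-1.0, 1.0, N_SAMPLES),
    # Matches the Chebyshev weight w(x) = 1/sqrt(1 - x^2) (arcsine law).
    "arcsine": np.cos(np.pi * rng.uniform(0.0, 1.0, N_SAMPLES)),
    # Hypothetical constrained trajectory covering only part of [-1, 1].
    "narrow": rng.uniform(0.5, 1.0, N_SAMPLES),
}
bases = {
    "monomial": lambda x: np.vander(x, DEGREE + 1, increasing=True),
    "legendre": lambda x: legendre.legvander(x, DEGREE),
    "chebyshev": lambda x: chebyshev.chebvander(x, DEGREE),
}

# Condition number for every (basis, sampling distribution) pair.
cond = {
    basis: {name: np.linalg.cond(build(x)) for name, x in samples.items()}
    for basis, build in bases.items()
}

for basis, row in cond.items():
    print(basis.ljust(10), "  ".join(f"{k}={v:.1e}" for k, v in row.items()))
```

With matched sampling, the orthogonal-basis columns are nearly orthogonal in the empirical inner product, so the condition number stays small; on the narrow band every basis becomes poorly conditioned, which is the sense in which orthogonality alone is not a fix.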

Research Gap: While identifiability issues (e.g., sloppy parameters) are well-studied in systems biology, a systematic analysis of numerical ill-conditioning and its impact on data-driven model discovery using sparse regression libraries has been lacking, particularly regarding the interplay between basis choice and data distribution.

Abstract: Data-driven discovery of governing equations from time-series data provides a powerful framework for understanding complex biological systems. Library-based approaches that use sparse regression over candidate functions have shown considerable promise, but they face a critical challenge when candidate functions become strongly correlated: numerical ill-conditioning. Poor or restricted sampling, together with particular choices of candidate libraries, can produce strong multicollinearity and numerical instability. In such cases, measurement noise may lead to widely different recovered models, obscuring the true underlying dynamics and hindering accurate system identification. Although sparse regularization promotes parsimonious solutions and can partially mitigate conditioning issues, strong correlations may persist, regularization may bias the recovered models, and the regression problem may remain highly sensitive to small perturbations in the data. We present a systematic analysis of how ill-conditioning affects sparse identification of biological dynamics using benchmark models from systems biology. We show that combinations involving as few as two or three terms can already exhibit strong multicollinearity and extremely large condition numbers. We further show that orthogonal polynomial bases do not consistently resolve ill-conditioning and can perform worse than monomial libraries when the data distribution deviates from the weight function associated with the orthogonal basis. Finally, we demonstrate that when data are sampled from distributions aligned with the appropriate weight functions corresponding to the orthogonal basis, numerical conditioning improves, and orthogonal polynomial bases can yield improved model recovery accuracy across two baseline models.
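A minimal sparse-regression loop shows how poor conditioning turns measurement noise into unstable recovered models. The sketch uses sequentially thresholded least squares (STLSQ), the optimizer introduced with the original SINDy method; whether the paper uses this exact optimizer is an assumption. The test system (logistic growth dx/dt = x - x^2 with the library [1, x, x^2, x^3]), the noise level, the threshold of 0.1, and the helper names `stlsq` and `recovery_error` are all hypothetical. To keep the example short, derivatives are taken as exact values plus Gaussian noise rather than estimated from a simulated trajectory.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: alternate a least-squares fit
    with hard-thresholding of small coefficients."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        active = ~small
        if not active.any():
            break
        # Refit using only the surviving candidate terms.
        xi[active] = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)[0]
    return xi

# Hypothetical test system: logistic growth dx/dt = x - x^2,
# i.e. coefficients [0, 1, -1, 0] on the library [1, x, x^2, x^3].
XI_TRUE = np.array([0.0, 1.0, -1.0, 0.0])

def recovery_error(low, high, n=200, noise=0.01, trials=20, seed=0):
    """Library condition number and mean coefficient error over several
    noisy draws of the derivatives at the same sampled states."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, n)
    theta = np.vander(x, 4, increasing=True)
    exact = theta @ XI_TRUE
    errors = [
        np.linalg.norm(stlsq(theta, exact + noise * rng.standard_normal(n)) - XI_TRUE)
        for _ in range(trials)
    ]
    return np.linalg.cond(theta), float(np.mean(errors))

for label, (lo, hi) in {"broad": (0.05, 1.0), "near equilibrium": (0.9, 1.0)}.items():
    kappa, err = recovery_error(lo, hi)
    print(f"{label:17s} cond={kappa:.1e}  mean |xi - xi_true|={err:.3f}")
```

Sampling states only near the equilibrium x = 1 raises the library's condition number by orders of magnitude, and the recovered coefficients drift far from [0, 1, -1, 0], whereas broad sampling recovers them closely at the same noise level.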
Relevance to Life Sciences: Numerical ill-conditioning is especially consequential in model discovery for biological systems, where nonlinear interactions are often represented by nonlinear functions such as polynomials, and where multiscale dynamics, constrained state trajectories, and limited sampling imposed by experimental constraints can further amplify multicollinearity. We demonstrate these effects across benchmark models relevant to metabolic networks, regulatory networks, and population dynamics. Our results show that poor conditioning can impair the recovery of biologically meaningful governing equations, while sampling strategies matched to the candidate basis can improve identification accuracy. These results imply that most biological experiments need to sample a broader range of dynamics to produce data sets suitable for data-driven model discovery with current methods.

Mathematical Content: This paper studies sparse regression-based equation discovery in the presence of multicollinearity and numerical ill-conditioning. We analyze the conditioning of candidate libraries, especially monomial and orthogonal polynomial bases, using condition numbers, and we assess model recovery under realistic sampling conditions using publicly available experimental data. We compare how basis choice and sampling distribution affect regression stability, sparsity, and the accuracy of recovered dynamical models.
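Why a large condition number matters can be made explicit with a standard perturbation bound (a textbook result, not taken from the paper). For a single state variable, write the library regression as Θξ = Ẋ, with Θ of full column rank and largest and smallest singular values σ_max and σ_min > 0, and assume the noise-free derivatives lie in the range of Θ. A perturbation δẊ in the derivative data then changes the least-squares solution by δξ = Θ⁺δẊ, which gives:

$$
\kappa(\Theta) = \frac{\sigma_{\max}(\Theta)}{\sigma_{\min}(\Theta)}, \qquad
\|\delta\xi\|_2 \le \frac{\|\delta\dot{X}\|_2}{\sigma_{\min}(\Theta)}, \qquad
\|\dot{X}\|_2 = \|\Theta\xi\|_2 \le \sigma_{\max}(\Theta)\,\|\xi\|_2,
$$

$$
\frac{\|\delta\xi\|_2}{\|\xi\|_2} \le \kappa(\Theta)\,\frac{\|\delta\dot{X}\|_2}{\|\dot{X}\|_2}.
$$

At κ(Θ) on the order of 10^18, this bound permits relative coefficient errors far larger than the coefficients themselves even for perturbations at double-precision rounding level (about 10^-16), so it offers no practical protection against noise.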