Paper List

Journal: arXiv preprint
Published: Unknown
Machine Learning · Computational Biology

Training Dynamics of Learning 3D-Rotational Equivariance

Genentech Computational Sciences | New York University

Max W. Shen, Ewa M. Nowara, Michael Maser, Kyunghyun Cho

The 30-Second View

IN SHORT: This work addresses the core dilemma of whether to use computationally expensive equivariant architectures or faster symmetry-agnostic models with data augmentation, and does so by quantifying how quickly and how completely the latter learn 3D rotational symmetry.

Innovation (TL;DR)

  • Methodology: Introduces a principled, generalizable framework that decomposes total loss into a 'twirled prediction error' (ℒ_mean) and an 'equivariance error' (ℒ_equiv), enabling precise measurement of the percentage of loss attributable to imperfect symmetry learning (see the sketch after this list).
  • Methodology: Empirically demonstrates that models learning 3D-rotational equivariance via data augmentation achieve very low equivariance error (≤2% of total loss) remarkably quickly, within 1k-10k training steps, across diverse molecular tasks and model scales.
  • Theory: Provides theoretical and experimental evidence that learning equivariance is an easier task than the main prediction task, characterized by a smoother and better-conditioned loss landscape (e.g., ~1000× lower condition number for ℒ_equiv vs. ℒ_mean in force field prediction).
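
To make the decomposition concrete, the sketch below estimates ℒ_mean and ℒ_equiv for a squared-error loss by Monte Carlo averaging over random 3D rotations: the 'twirled' prediction is the rotation-averaged output g⁻¹f(gx), ℒ_mean is its error against the target, and ℒ_equiv is the spread of predictions around it. This is an illustrative reading of the framework, not the authors' code; `model`, `x`, and `y` are placeholders.

```python
# Minimal Monte Carlo sketch of the L_mean / L_equiv decomposition for a
# squared-error loss. Placeholder assumptions: `model` maps an (n, 3) array
# of points to an (n, 3) array of vectors (e.g., forces), and input and
# output transform by the same rotation. Not the authors' implementation.
import numpy as np
from scipy.spatial.transform import Rotation


def loss_decomposition(model, x, y, n_rotations=128):
    preds = []
    for _ in range(n_rotations):
        R = Rotation.random().as_matrix()       # random g in SO(3)
        # g^{-1} f(g x): rotate the input, predict, rotate the output back.
        # For a perfectly equivariant model this equals f(x) for every g.
        preds.append(model(x @ R.T) @ R)
    preds = np.stack(preds)

    f_bar = preds.mean(axis=0)                  # "twirled" prediction
    loss_mean = np.mean((f_bar - y) ** 2)       # twirled prediction error
    loss_equiv = np.mean((preds - f_bar) ** 2)  # equivariance error
    # For squared error these sum exactly to the rotation-averaged total
    # loss, so loss_equiv / (loss_mean + loss_equiv) is the percent of
    # loss attributable to imperfect equivariance.
    return loss_mean, loss_equiv
```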

Key conclusions

  • Non-equivariant models with data augmentation learn 3D rotational equivariance rapidly and effectively, reducing the equivariance error component to ≤2% of the total validation loss within the first 1k-10k training steps.
  • The loss penalty for imperfect equivariance (ℒ_equiv) is small throughout training for 3D rotations, meaning the primary trade-off is the 'efficiency gap' (slower training/inference of equivariant models) rather than a significant accuracy penalty.
  • The speed of learning equivariance is robust to model size (1M to 400M parameters), dataset size (500 to 1M samples), and optimizer choice, indicating that rapid symmetry learning is a fundamental property of the task's loss landscape rather than an artifact of any particular training setup.

Background and Gap: The field lacks a principled, quantitative understanding of how effectively and efficiently symmetry-agnostic models learn desired symmetries through data augmentation, making the architectural choice between equivariant and non-equivariant models largely heuristic and confounded by implementation details.

Abstract: While data augmentation is widely used to train symmetry-agnostic models, it remains unclear how quickly and effectively they learn to respect symmetries. We investigate this by deriving a principled measure of equivariance error that, for convex losses, calculates the percent of total loss attributable to imperfections in learned symmetry. We focus our empirical investigation on 3D-rotational equivariance in high-dimensional molecular tasks (flow matching, force field prediction, denoising voxels) and find that models quickly reduce equivariance error to ≤2% of held-out loss within 1k-10k training steps, a result robust to model and dataset size. This happens because learning 3D-rotational equivariance is an easier learning task, with a smoother and better-conditioned loss landscape, than the main prediction task. For 3D rotations, the loss penalty for non-equivariant models is small throughout training, so they may achieve lower test loss than equivariant models per GPU-hour unless the equivariant “efficiency gap” is narrowed. We also experimentally and theoretically investigate the relationships between relative equivariance error, learning gradients, and model parameters.
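
The claim that ℒ_equiv sits in a smoother, better-conditioned landscape than ℒ_mean (the ~1000× condition-number gap noted above) can be probed with standard Hessian tooling. The sketch below is one such probe, not the authors' procedure: it estimates the largest Hessian eigenvalue of a scalar loss via power iteration on autograd Hessian-vector products. `loss_fn` and `params` are hypothetical placeholders, and a full condition-number estimate would also need the smallest nonzero eigenvalue (e.g., via Lanczos).

```python
# Power iteration on Hessian-vector products to estimate the top Hessian
# eigenvalue of a loss. `loss_fn` is assumed to be a closure returning a
# scalar loss (e.g., L_equiv or L_mean on a fixed batch) and `params` the
# list of model parameters. Illustrative only, not the paper's code.
import torch


def top_hessian_eigenvalue(loss_fn, params, n_iters=50):
    loss = loss_fn()
    # Build the gradient with create_graph=True so we can differentiate
    # through it to get Hessian-vector products.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    eig = torch.tensor(0.0)
    for _ in range(n_iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. params.
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = v @ hv                      # Rayleigh quotient (||v|| = 1)
        v = hv / hv.norm()
    return eig.item()
```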