Paper List

AI for Science

Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange

2026-03-15

This paper addresses the fundamental limitation of current AI-assisted scientific research by enabling truly autonomous, decentralized investigation w...
Artificial Intelligence

D-MEM: Dopamine-Gated Agentic Memory via Reward Prediction Error Routing

2026-03-15

This paper addresses the fundamental scalability bottleneck in LLM agentic memory systems: the O(N²) computational complexity and unbounded API token ...
Biophysics

Countershading coloration in blue shark skin emerges from hierarchically organized and spatially tuned photonic architectures inside skin denticles

2026-03-14

This paper solves the core problem of how blue sharks achieve their striking dorsoventral countershading camouflage, revealing that coloration origina...
Computer Vision

Human-like Object Grouping in Self-supervised Vision Transformers

2026-03-14

This paper addresses the core challenge of quantifying how well self-supervised vision models capture human-like object grouping in natural scenes, br...
Bioinformatics

Hierarchical p-Adic Framework for Gene Regulatory Networks: Theory and Stability Analysis

2026-03-14

This paper addresses the core challenge of mathematically capturing the inherent hierarchical organization and multi-scale stability of gene regulator...
Computational Neuroscience

Towards unified brain-to-text decoding across speech production and perception

2026-03-13

This paper addresses the core challenge of developing a unified brain-to-text decoding framework that works across both speech production and percepti...
Artificial Intelligence

Dual-Laws Model for a theory of artificial consciousness

2026-03-13

This paper addresses the core challenge of developing a comprehensive, testable theory of consciousness that bridges biological and artificial systems...
Computational Neuroscience

Pulse desynchronization of neural populations by targeting the centroid of the limit cycle in phase space

2026-03-13

This work addresses the core challenge of determining optimal pulse timing and intensity for desynchronizing pathological neural oscillations when the...

Journal: arXiv Preprint

Published: 2025-12-03

Bioinformatics | Privacy-Preserving ML

GOPHER: Optimization-based Phenotype Randomization for Genome-Wide Association Studies with Differential Privacy

Department of Biomedical Informatics & Data Science, Yale School of Medicine | Department of Technology and Operations Management, Harvard Business School | Department of Computer Science, Yale University

Anupama Nandi, Seth Neel, Hyunghoon Cho

30-Second Read

IN SHORT: This paper addresses the core challenge of balancing rigorous privacy protection with data utility when releasing full GWAS summary statistics, overcoming the limitations of prior methods that either add excessive noise or restrict output to a small subset of results.

Key Innovations

Methodology: Introduces an optimization-based phenotype randomization mechanism (GOPHER-LP) that directly minimizes expected error in GWAS statistics, formulated as a linear programming problem to enhance utility beyond baseline methods like randomized response.
Methodology: Proposes GOPHER-MultiLP, which incorporates personalized priors derived from predictive models (e.g., polygenic risk scores) trained on a held-out subset, enabling sample-specific optimization that leverages genotype information to further reduce noise.
Theory: Adopts and extends the concept of phenotypic differential privacy (analogous to label DP), focusing protection on sensitive phenotypes while treating genotypes as public, providing a practical middle ground between full DP and unrestricted release.
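
For context, the randomized-response baseline named above can be sketched in a few lines. This is not GOPHER-LP or GOPHER-MultiLP; it is only the standard binary randomized-response mechanism that those methods are described as improving on, assuming a binary (0/1) phenotype. The function names are illustrative and do not come from the paper.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true binary label under epsilon-label-DP."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(labels, epsilon, rng):
    """Report each 0/1 phenotype truthfully with probability p, flipped otherwise."""
    p = keep_probability(epsilon)
    return [y if rng.random() < p else 1 - y for y in labels]


def debias(noisy_labels, epsilon):
    """Rescale noisy labels so each one is an unbiased estimate of the true label.

    Since E[noisy | y] = (1 - p) + (2p - 1) * y, inverting that affine map
    gives an unbiased (but higher-variance) estimate of y.
    """
    p = keep_probability(epsilon)
    return [(y - (1.0 - p)) / (2.0 * p - 1.0) for y in noisy_labels]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    true_labels = [1, 0, 0, 1, 1, 0, 1, 0]
    noisy = randomized_response(true_labels, epsilon=1.0, rng=rng)
    print(noisy)
    print(debias(noisy, epsilon=1.0))
```

Smaller values of epsilon push the keep probability toward 0.5, so the debiased estimates become noisier. According to the summary above, the optimization-based mechanisms aim to reduce this error.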

Key Findings

The GOPHER framework enables the release of complete GWAS statistics (e.g., over 500,000 variants) with provable privacy guarantees, a significant scalability advance over prior methods limited to releasing only 3-5 top associations.
Experiments on UK Biobank data (n=100,000) demonstrate that the mechanisms yield association statistics that accurately match non-private GWAS results while maintaining rigorous (ε, δ)-DP guarantees.
The phenotype-randomization approach decouples the added noise from the number of genetic variants analyzed, addressing a fundamental scalability challenge not previously solved in the DP-GWAS literature.
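
The decoupling described in the last finding follows from the post-processing property of differential privacy: once the phenotype vector has been randomized, any computation on it together with the public genotypes costs no additional privacy budget. The sketch below illustrates this with a simple genotype-phenotype covariance score per variant. The score is a stand-in chosen for brevity, not the association test used in the paper, and `association_scores` is a hypothetical helper name.

```python
def association_scores(genotypes, noisy_phenotypes):
    """Covariance between each variant's genotypes and the privatized phenotypes.

    genotypes: list of variants, each a list of n allele counts (treated as public).
    noisy_phenotypes: list of n phenotype values that were already randomized
        by some label-DP mechanism (assumed to have happened upstream).
    """
    n = len(noisy_phenotypes)
    y_mean = sum(noisy_phenotypes) / n
    scores = []
    for variant in genotypes:
        g_mean = sum(variant) / n
        cov = sum(
            (g - g_mean) * (y - y_mean)
            for g, y in zip(variant, noisy_phenotypes)
        ) / n
        scores.append(cov)
    return scores


if __name__ == "__main__":
    # Toy data: 2 variants, 4 samples. The same privatized phenotype vector is
    # reused for every variant, so adding variants adds no new noise.
    genotypes = [[0, 1, 2, 1], [2, 2, 0, 0]]
    noisy_phenotypes = [0, 1, 1, 0]
    print(association_scores(genotypes, noisy_phenotypes))
```

The loop over variants never touches the raw phenotypes, so scoring 2 variants or 500,000 uses the same single randomization step.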

Research Gap: Existing differentially private GWAS methods face a scalability-utility trade-off: they either add noise proportional to the high dimensionality of GWAS results (rendering outputs unusable) or restrict releases to only a small number of top associations (limiting downstream analyses like meta-analyses and risk prediction).
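
To see why output-perturbation noise can grow with dimensionality, consider a rough back-of-the-envelope sketch. It assumes the Laplace mechanism with basic sequential composition and the same sensitivity for every statistic. These are simplifying assumptions made here for illustration; the prior methods referred to above may use tighter accounting.

```python
def laplace_scale_per_statistic(sensitivity, total_epsilon, num_statistics):
    """Laplace noise scale per statistic when a total budget is split evenly.

    Under basic composition each of the num_statistics releases gets
    total_epsilon / num_statistics, and the Laplace mechanism uses
    scale = sensitivity / per_query_epsilon.
    """
    per_query_epsilon = total_epsilon / num_statistics
    return sensitivity / per_query_epsilon


if __name__ == "__main__":
    for m in (5, 500_000):
        scale = laplace_scale_per_statistic(
            sensitivity=1.0, total_epsilon=1.0, num_statistics=m
        )
        print(m, scale)
```

Under these assumptions the noise scale grows linearly with the number of released statistics, which is the scaling problem that phenotype randomization is described as sidestepping.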

Abstract: Genome-wide association studies (GWAS) are an essential tool in biomedical research for identifying genetic factors linked to health and disease. However, publicly releasing GWAS summary statistics poses well-recognized privacy risks, including the potential to infer an individual’s participation in the study or to reveal sensitive phenotypic information (e.g., disease status). While differential privacy (DP) offers a rigorous mathematical framework for mitigating these risks, existing DP techniques for GWAS either introduce excessive noise or restrict the release to a limited set of results. In this work, we present practical DP mechanisms for releasing the complete set of genome-wide association statistics with privacy guarantees. We demonstrate the accuracy of the privacy-preserving statistics released by our mechanisms on a range of GWAS datasets from the UK Biobank, utilizing both real and simulated phenotypes. We introduce two key techniques to overcome the limitations of prior approaches: (1) an optimization-based randomization mechanism that directly minimizes the expected error in GWAS results to enhance utility, and (2) the use of personalized priors, derived from predictive models privately trained on a subset of the dataset, to enable sample-specific optimization which further reduces the amount of noise introduced by DP. Overall, our work provides practical tools for accurately releasing comprehensive GWAS results with provable protection of study participants.

Code