Paper List

Bioinformatics

Mapping of Lesion Images to Somatic Mutations

2020-10-19

This paper addresses the critical bottleneck of delayed genetic analysis in cancer diagnosis by predicting a patient's full somatic mutation profile d...
Artificial Intelligence

Reinventing Clinical Dialogue: Agentic Paradigms for LLM‑Enabled Healthcare Communication

2018-08-01

This paper addresses the core challenge of transforming reactive, stateless LLMs into autonomous, reliable clinical dialogue agents capable of longitu...
Bioinformatics

Binary Latent Protein Fitness Landscapes for Quantum Annealing Optimization

2018-06-03

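Bridges protein representation learning and combinatorial optimization by mapping sequences into a binary latent space for QUBO-based fitness optimization.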
Bio-inspired Robotics

Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement

2018-02-09

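Demonstrates that model-free reinforcement learning can effectively guide fish schools using virtual visual stimuli, overcoming the lack of an accurate behavioral model.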


Journal: ArXiv Preprint

Publication date: 2026-03-13

Human-Computer Interaction, Artificial Intelligence

Developing the PsyCogMetrics™ AI Lab to Evaluate Large Language Models and Advance Cognitive Science

Marywood University | The University of Scranton | University of North Carolina Wilmington | California State University Dominguez Hills

Zhiye Jin, Yibai Li, K. D. Joshi, Xuefei (Nancy) Deng, Xiaobing (Emily) Li

30-Second Read

IN SHORT: This paper addresses the critical gap between sophisticated LLM evaluation needs and the lack of accessible, scientifically rigorous platforms that integrate psychometric and cognitive science methodologies for non-technical stakeholders.

Key Innovations

Methodology: Introduces the first cloud-based platform applying Classical Test Theory (CTT) and psychometric validity principles (Cronbach's α > .70, AVE > .50) to systematically evaluate LLMs as cognitive entities rather than mere tools.
Methodology: Implements a three-cycle Action Design Science framework (Relevance-Rigor-Design) with nested Build–Intervene–Evaluate loops, bridging Popperian falsifiability, Cognitive Load Theory, and stakeholder requirements into a unified evaluation system.
Biology: Validates that modern LLMs (GPT-4, LLaMA-3) satisfy core psychometric validity criteria (convergent, discriminant, predictive, and external validity) and outperform earlier models (GPT-3.5, LLaMA-2) across these dimensions.
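
The two thresholds cited above (Cronbach's α > .70 and AVE > .50) can be illustrated with a short sketch. This is not the platform's implementation, which the summary does not describe. It only applies the standard textbook formulas: α = k/(k−1) · (1 − Σ item variances / variance of total score), and AVE as the mean of squared standardized factor loadings. The function names, the response matrix, and the loadings are all hypothetical and exist only for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized factor loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)


def meets_thresholds(scores, loadings, alpha_min=0.70, ave_min=0.50):
    """Check the reliability and convergent-validity cutoffs quoted in the summary."""
    return (cronbach_alpha(scores) > alpha_min
            and average_variance_extracted(loadings) > ave_min)


# Hypothetical data: four repeated LLM runs answering a three-item scale,
# and made-up standardized loadings for the same construct.
example_scores = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 3, 2]]
example_loadings = [0.8, 0.7, 0.9]

if __name__ == "__main__":
    print("alpha:", round(cronbach_alpha(example_scores), 3))
    print("AVE:", round(average_variance_extracted(example_loadings), 3))
    print("passes:", meets_thresholds(example_scores, example_loadings))
```

In practice the loadings would come from a fitted factor model rather than being typed in by hand. The sketch takes them as given to keep the threshold check itself visible.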

Main Conclusions

The PsyCogMetrics™ AI Lab successfully operationalizes psychometric principles with demonstrated reliability metrics (Cronbach's α > .70) and validity frameworks (convergent/discriminant validity) for LLM evaluation.
The platform addresses three critical pain points: it mitigates benchmark saturation through dynamic evaluation, reduces data contamination via reproducible workflows, and expands coverage through cognitive science methodologies.
Design validation shows that GPT-4 and LLaMA-3 satisfy psychometric validity criteria and outperform earlier models, with GPT-4 reaching parity with six-year-old humans on Theory of Mind vignettes (Strachan et al., 2024).

Research gap: Current LLM evaluation suffers from four problems: benchmark saturation (new models achieve near-ceiling scores without real capability improvements), data contamination (test sets leak into training data), lack of coverage for emerging capabilities, and developer-oriented tools that exclude psychology and cognitive science experts who lack programming infrastructure.

Abstract: This study presents the development of the PsyCogMetrics™ AI Lab (https://psycogmetrics.ai), an integrated, cloud-based platform that operationalizes psychometric and cognitive-science methodologies for Large Language Model (LLM) evaluation. Framed as a three-cycle Action Design Science study, the Relevance Cycle identifies key limitations in current evaluation methods and unfulfilled stakeholder needs. The Rigor Cycle draws on kernel theories such as Popperian falsifiability, Classical Test Theory, and Cognitive Load Theory to derive deductive design objectives. The Design Cycle operationalizes these objectives through nested Build–Intervene–Evaluate loops. The study contributes a novel IT artifact, a validated design for LLM evaluation, benefiting research at the intersection of AI, psychology, cognitive science, and the social and behavioral sciences.
