arXivDaily: arXiv daily academic digest, updated Monday through Friday
2508.14950 2026-05-15 eess.IV cs.LG

Potential and challenges of generative adversarial networks for super-resolution in 4D Flow MRI

Oliver Welin Odeback, Arivazhagan Geetha Balasubramanian, Jonas Schollenberger, Edward Ferdian, Alistair A. Young, C. Alberto Figueroa, Susanne Schnell, Outi Tammisola, Ricardo Vinuesa, Tobias Granberg, Alexander Fyrdahl, David Marlevi

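AI summary This paper examines the potential and challenges of generative adversarial networks (GANs) for super-resolution in 4D Flow MRI. To address the low resolution and high noise that degrade near-wall velocity measurements, the authors implement a dedicated GAN architecture and evaluate it under three adversarial loss functions. The experiments show that the Wasserstein GAN gives the best near-wall velocity recovery and training stability, indicating the promise of GANs for improving 4D Flow MRI image quality.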

Comments 26 pages, 10 figures

Journal ref
Computers in Biology and Medicine 211 (2026) 111745
Abstract

4D Flow Magnetic Resonance Imaging (4D Flow MRI) enables non-invasive quantification of blood flow and hemodynamic parameters. However, its clinical application is limited by low spatial resolution and noise, particularly affecting near-wall velocity measurements. Machine learning-based super-resolution has shown promise in addressing these limitations, but challenges remain, not least in recovering near-wall velocities. Generative adversarial networks (GANs) offer a compelling solution, having demonstrated strong capabilities in restoring sharp boundaries in non-medical super-resolution tasks. Yet, their application in 4D Flow MRI remains unexplored, with implementation challenged by known issues such as training instability and non-convergence. In this study, we investigate GAN-based super-resolution in 4D Flow MRI. Training and validation were conducted using patient-specific cerebrovascular in-silico models, converted into synthetic images via an MR-true reconstruction pipeline. A dedicated GAN architecture was implemented and evaluated across three adversarial loss functions: Vanilla, Relativistic, and Wasserstein. Our results demonstrate that the proposed GAN improved near-wall velocity recovery compared to a non-adversarial reference (vNRMSE: 6.9% vs. 9.6%), but also that implementation specifics are critical for stable network training. While Vanilla and Relativistic GANs proved unstable compared to generator-only training (vNRMSE: 8.1% and 7.8% vs. 7.2%), a Wasserstein GAN demonstrated optimal stability and incremental improvement (vNRMSE: 6.9% vs. 7.2%). The Wasserstein GAN further outperformed the generator-only baseline at low SNR (vNRMSE: 8.7% vs. 10.7%). These findings highlight the potential of GAN-based super-resolution in enhancing 4D Flow MRI, particularly in challenging cerebrovascular regions, while emphasizing the need for careful selection of adversarial strategies.
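
The three adversarial losses named in the abstract can be contrasted in a few lines. The sketch below uses their textbook discriminator-side forms on raw scores (logits). The function names are illustrative, and the paper's exact formulation is not given in the abstract and may differ, for example by using an averaged relativistic variant or a gradient penalty for the Wasserstein critic.

```python
import math


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def _mean(values):
    return sum(values) / len(values)


def vanilla_d_loss(real_scores, fake_scores):
    """Binary cross-entropy on logits: real -> 1, fake -> 0."""
    real_term = _mean([_softplus(-s) for s in real_scores])  # -log(sigmoid(s))
    fake_term = _mean([_softplus(s) for s in fake_scores])   # -log(1 - sigmoid(s))
    return real_term + fake_term


def relativistic_d_loss(real_scores, fake_scores):
    """Pairwise relativistic loss: each real sample should outscore its paired fake."""
    return _mean([_softplus(-(r - f)) for r, f in zip(real_scores, fake_scores)])


def wasserstein_d_loss(real_scores, fake_scores):
    """Wasserstein critic loss; in practice the critic also needs a Lipschitz constraint."""
    return _mean(fake_scores) - _mean(real_scores)
```

The Wasserstein form is linear in the scores rather than saturating, which is one commonly cited reason for its more stable training; whether that explains the stability reported here is not stated in the abstract.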

2508.07876 2026-05-15 stat.ML cs.LG math.DS math.ST stat.TH

Stochastic dynamics learning with state-space systems

Juan-Pablo Ortega, Florian Rossmannek

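AI summary This paper studies stochastic dynamics learning with state-space systems, aiming to strengthen the theoretical foundations of reservoir computing (RC). Through a unified treatment of fading memory and the echo state property (ESP) in deterministic and stochastic settings, the authors show that fading memory and solution stability hold generically even without the ESP, which supports the broad use of RC models. In the stochastic case, the paper introduces a new perspective based on attractor dynamics on the space of probability distributions, extending earlier work on non-autonomous dynamical systems and offering deeper insight into causality, stability, and memory in RC models.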

Journal ref
Mathematical Models and Methods in Applied Sciences, 2026
Abstract

This work advances the theoretical foundations of reservoir computing (RC) by providing a unified treatment of fading memory and the echo state property (ESP) in both deterministic and stochastic settings. We investigate state-space systems, a central model class in time series learning, and establish that fading memory and solution stability hold generically -- even in the absence of the ESP -- offering a robust explanation for the empirical success of RC models without strict contractivity conditions. In the stochastic case, we critically assess stochastic echo states, proposing a novel distributional perspective rooted in attractor dynamics on the space of probability distributions, which leads to a rich and coherent theory. Our results extend and generalize previous work on non-autonomous dynamical systems, offering new insights into causality, stability, and memory in RC models. This lays the groundwork for reliable generative modeling of temporal data in both deterministic and stochastic regimes.

2508.03941 2026-05-15 cs.IR cs.LG

Measuring the stability and plasticity of recommender systems

Maria João Lavoura, Robert Jungnickel, João Vinagre

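AI summary This paper studies the stability and plasticity of recommender systems over long-term operation and proposes an offline evaluation methodology for analyzing how recommendation models behave when they are retrained. The methodology profiles algorithms by their ability to retain past patterns (stability) and to adapt to new changes (plasticity), yielding a framework for long-term performance analysis that is agnostic to datasets, algorithms, and metrics. Preliminary results indicate that different types of recommendation algorithms differ in stability and plasticity, with a possible trade-off between the two.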

Comments Final version published in the proceedings of ACM UMAP 2026: https://doi.org/10.1145/3774935.3812707

Abstract

The typical offline protocol to evaluate recommendation algorithms is to collect a dataset of user-item interactions and then use a part of this dataset to train a model, and the remaining data to measure how closely the model recommendations match the observed user interactions. This protocol is straightforward, useful and practical, but it only provides snapshot performance. We know, however, that online systems evolve over time. In general, it is a good idea that models are frequently retrained with recent data. But if this is the case, to what extent can we trust previous evaluations? How will a model perform when a different pattern (re)emerges? In this paper we propose a methodology to study how recommendation models behave when they are retrained. The idea is to profile algorithms according to their ability to, on the one hand, retain past patterns - stability - and, on the other hand, (quickly) adapt to changes - plasticity. We devise an offline evaluation protocol that provides detail on the long-term behavior of models, and that is agnostic to datasets, algorithms and metrics. To illustrate the potential of this framework, we present preliminary results of three different types of algorithms on the GoodReads dataset that suggest different stability and plasticity profiles depending on the algorithmic technique, and a possible trade-off between stability and plasticity. We further discuss the potential and limitations of the proposal and advance some possible improvements.

2507.13941 2026-05-15 q-bio.NC cs.AI cs.CV eess.IV

Shared representations in brains and models reveal a two-route cortical organization during scene perception

Pablo Marcos-Manchón, Lluís Fuentemilla

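AI summary This study analyzes 7T fMRI data to examine how information is organized and routed in the human brain during scene perception. Using representational similarity analysis, it compares representational structure shared across individuals with hierarchical features from vision and language neural networks and finds two separate processing routes: one for scene layout and environmental context, and another specialized for animate content. The finding refines classical models of visual processing and characterizes scene perception as a distributed cortical network with separable representational routes.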

Comments For associated code, see https://github.com/memory-formation/convergent-transformations

Abstract

The brain transforms visual inputs into high-dimensional cortical representations that support diverse cognitive and behavioral goals. Characterizing how this information is organized and routed across the human brain is essential for understanding how we process complex visual scenes. Here, we applied representational similarity analysis to 7T fMRI data collected during natural scene viewing. We quantified representational geometry shared across individuals and compared it to hierarchical features from vision and language neural networks. This analysis revealed two distinct processing routes: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content. Vision models aligned with shared structure in both routes, whereas language models corresponded primarily with the lateral pathway. These findings refine classical visual-stream models by characterizing scene perception as a distributed cortical network with separable representational routes for context and animate content.

2507.05193 2026-05-15 eess.IV cs.CV

RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis

Songxiao Yang, Haolin Wang, Yao Fu, Ye Tian, Tamotsu Kamishima, Masayuki Ikebe, Yafei Ou, Masatoshi Okutomi

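AI summary This study presents RAM-W600, a multi-task wrist radiograph dataset for computer-aided diagnosis and disease monitoring of rheumatoid arthritis (RA). The dataset contains 1048 conventional wrist radiographs of 388 patients from six medical centers, with pixel-level wrist bone instance segmentation annotations and SvdH bone erosion scores, and is the first public resource for wrist bone instance segmentation. It can support RA-related research such as joint space narrowing quantification, bone erosion detection, and bone deformity evaluation, may also apply to tasks such as wrist fracture localization, and is expected to lower the barrier to wrist RA research and advance computer-aided diagnosis.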

Comments Published in NeurIPS 2025

Abstract

Rheumatoid arthritis (RA) is a common autoimmune disease that has been the focus of research in computer-aided diagnosis (CAD) and disease monitoring. In clinical settings, conventional radiography (CR) is widely used for the screening and evaluation of RA due to its low cost and accessibility. The wrist is a critical region for the diagnosis of RA. However, CAD research in this area remains limited, primarily due to the challenges in acquiring high-quality instance-level annotations. (i) The wrist comprises numerous small bones with narrow joint spaces, complex structures, and frequent overlaps, requiring detailed anatomical knowledge for accurate annotation. (ii) Disease progression in RA often leads to osteophyte, bone erosion (BE), and even bony ankylosis, which alter bone morphology and increase annotation difficulty, necessitating expertise in rheumatology. This work presents a multi-task dataset for wrist bone in CR, including two tasks: (i) wrist bone instance segmentation and (ii) Sharp/van der Heijde (SvdH) BE scoring, which is the first public resource for wrist bone instance segmentation. This dataset comprises 1048 wrist conventional radiographs of 388 patients from six medical centers, with pixel-level instance segmentation annotations for 618 images and SvdH BE scores for 800 images. This dataset can potentially support a wide range of research tasks related to RA, including joint space narrowing (JSN) progression quantification, BE detection, bone deformity evaluation, and osteophyte detection. It may also be applied to other wrist-related tasks, such as carpal bone fracture localization. We hope this dataset will significantly lower the barrier to research on wrist RA and accelerate progress in CAD research within the RA-related domain.

2506.20425 2026-05-15 stat.ML cs.LG stat.CO stat.ME

Scalable Subset Selection in Linear Mixed Models

Ryan Thompson, Matt P. Wand, Joanna J. J. Wang

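AI summary This paper studies scalable subset selection in linear mixed models, which contain both fixed and random effects. Because existing methods are computationally inefficient with large numbers of predictors, the authors propose a new $\ell_0$ regularized subset selection method that combines a coordinate descent algorithm with guaranteed convergence and a local search algorithm for traversing the nonconvex optimization surface. On the statistical side, the method comes with a finite-sample bound on the KL divergence, and it performs well in experiments on synthetic and real data.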

Abstract

Linear mixed models (LMMs), which incorporate fixed and random effects, are key tools for analyzing heterogeneous data, such as in personalized medicine. Nowadays, this type of data is increasingly wide, sometimes containing thousands of candidate predictors, necessitating sparsity for prediction and interpretation. However, existing sparse learning methods for LMMs do not scale well beyond tens or hundreds of predictors, leaving a large gap compared with sparse methods for linear models, which ignore random effects. This paper closes the gap with a new $\ell_0$ regularized method for LMM subset selection that can run on datasets containing thousands of predictors in seconds to minutes. On the computational front, we develop a coordinate descent algorithm as our main workhorse and provide a guarantee of its convergence. We also develop a local search algorithm to help traverse the nonconvex optimization surface. Both algorithms readily extend to subset selection in generalized LMMs via a penalized quasi-likelihood approximation. On the statistical front, we provide a finite-sample bound on the Kullback-Leibler divergence of the new method. We then demonstrate its excellent performance in experiments involving synthetic and real datasets.
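
To make the idea of $\ell_0$ regularized coordinate descent concrete, here is a minimal sketch for the simpler case of an ordinary linear model without random effects. It is not the authors' LMM algorithm. It minimizes 0.5*||y - X b||^2 + lam*||b||_0 and assumes, for illustration only, that every column of X has unit Euclidean norm, in which case each coordinate update is a hard threshold at sqrt(2*lam). The function name and arguments are hypothetical.

```python
import math


def l0_coordinate_descent(X, y, lam, n_sweeps=50):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_0.

    Assumption: each column of X (a list of rows) has unit Euclidean norm.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    threshold = math.sqrt(2.0 * lam)
    for _ in range(n_sweeps):
        changed = False
        for j in range(p):
            # Correlation of column j with the partial residual (beta_j's own fit added back).
            rho = sum(X[i][j] * residual[i] for i in range(n)) + beta[j]
            # Keeping the coordinate pays off only if 0.5*rho^2 exceeds the penalty lam.
            new_bj = rho if abs(rho) > threshold else 0.0
            delta = new_bj - beta[j]
            if abs(delta) > 1e-12:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
                changed = True
        if not changed:
            break
    return beta
```

With an identity design, `l0_coordinate_descent([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [3.0, 0.1, -2.0], 0.5)` keeps the two large coefficients and zeroes the small one. Coordinate descent of this kind only reaches a coordinate-wise stationary point, which is why the abstract pairs it with a local search step.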

2505.16714 2026-05-15 quant-ph cs.LG

Experimental robustness benchmarking of quantum neural networks on a superconducting quantum processor

Hai-Feng Zhang, Zhao-Yun Chen, Peng Wang, Liang-Liang Guo, Tian-Le Wang, Xiao-Yan Yang, Ren-Ze Zhao, Ze-An Zhao, Sheng Zhang, Lei Du, Hao-Ran Tao, Zhi-Long Jia, Wei-Cheng Kong, Huan-Yu Liu, Athanasios V. Vasilakos, Yang Yang, Yu-Chun Wu, Ji Guan, Peng Duan, Guo-Ping Guo

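AI summary This study reports the first systematic experimental robustness evaluation of 20-qubit quantum neural network classifiers on a superconducting quantum processor, examining the security of quantum machine learning models under adversarial attack. It proposes an efficient adversarial attack algorithm for quantifying the robustness of quantum neural networks and verifies that adversarial training significantly improves robustness by regularizing input gradients. The experiments also show that quantum neural networks are more adversarially robust than classical neural networks, which is attributed to inherent quantum noise, and that the empirical results closely match the theoretical lower bound, confirming the effectiveness of the attack and the tightness of the robustness bounds.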

Comments There are 8 pages with 5 figures in the main text

Journal ref
SCIENCE CHINA Physics, Mechanics & Astronomy Volume 69, Issue 6: 260315 (2026)
Abstract

Quantum machine learning (QML) models, like their classical counterparts, are vulnerable to adversarial attacks, hindering their secure deployment. Here, we report the first systematic experimental robustness benchmark for 20-qubit quantum neural network (QNN) classifiers executed on a superconducting processor. Our benchmarking framework features an efficient adversarial attack algorithm designed for QNNs, enabling quantitative characterization of adversarial robustness and robustness bounds. From our analysis, we verify that adversarial training reduces sensitivity to targeted perturbations by regularizing input gradients, significantly enhancing QNN's robustness. Additionally, our analysis reveals that QNNs exhibit superior adversarial robustness compared to classical neural networks, an advantage attributed to inherent quantum noise. Furthermore, the empirical upper bound extracted from our attack experiments shows a minimal deviation ($3 \times 10^{-3}$) from the theoretical lower bound, providing strong experimental confirmation of the attack's effectiveness and the tightness of fidelity-based robustness bounds. This work establishes a critical experimental framework for assessing and improving quantum adversarial robustness, paving the way for secure and reliable QML applications.

2505.09552 2026-05-15 stat.ME cs.LG stat.ML

Scalable Krylov Subspace Methods for Generalized Mixed-Effects Models with Crossed Random Effects

Pascal Kündig, Fabio Sigrist

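AI summary This paper addresses computational bottlenecks in generalized mixed-effects models with crossed random effects by proposing Krylov subspace-based methods that improve computational efficiency for high-dimensional data. Through theoretical analysis and experiments, it establishes convergence and accuracy results for the preconditioned stochastic Lanczos quadrature and conjugate gradient methods and develops scalable methods for computing predictive variances. Experiments show that the new methods are substantially faster and numerically more stable than conventional Cholesky-based computations.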

Abstract

Mixed-effects models are widely used to model data with hierarchical grouping structures and high-cardinality categorical predictor variables. However, for high-dimensional crossed random effects, current standard computations relying on Cholesky decompositions can become prohibitively slow. In this work, we present Krylov subspace-based methods that address existing computational bottlenecks, and we analyze them both theoretically and empirically. In particular, we derive new results on the convergence and accuracy of the preconditioned stochastic Lanczos quadrature and conjugate gradient methods for mixed-effects models, and we develop scalable methods for calculating predictive variances. In experiments with simulated and real-world data, the proposed methods yield speedups by factors of up to about 10,000 and are numerically more stable than Cholesky-based computations.
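
For readers unfamiliar with Krylov subspace solvers, the following is a minimal, unpreconditioned conjugate gradient sketch for a symmetric positive definite system A x = b. It replaces a Cholesky factorization with repeated matrix-vector products, which is the property such methods exploit. It uses dense Python lists for clarity; the preconditioners, sparse structure, and stochastic Lanczos quadrature described in the abstract are not shown, and the function name is illustrative.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (dense list of rows)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the zero starting point
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # residual norm small enough: stop
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is the residual made A-conjugate to the previous direction.
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` approximates the exact solution (1/11, 7/11). Only the product A p is needed per iteration, so the matrix never has to be factorized or even stored densely in a practical implementation.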

2505.09246 2026-05-15 cs.IR cs.AI cs.CL

Autofocus Retrieval: An Effective Pipeline for Multi-Hop Question Answering With Semi-Structured Knowledge

Derian Boer, Stephen Roth, Stefan Kramer

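AI summary This paper presents Autofocus-Retriever (AF-Retriever), a multi-hop question answering framework built on semi-structured knowledge bases that combines structured and unstructured information. It uses exchangeable large language models to extract entity attributes and relational constraints, together with vector similarity search and an incremental scope expansion procedure, and achieves better zero- and one-shot results than existing methods on multiple benchmarks. Its core contribution is a pipeline of four constraint-driven retrieval steps followed by four supplementing and ranking steps, which improves the accuracy and robustness of answer retrieval.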

Journal ref
Transactions on Machine Learning Research 2026
Abstract

In many real-world settings, machine learning models and interactive systems have access to both structured knowledge, e.g., knowledge graphs or tables, and unstructured content, e.g., natural language documents. Yet most rely on only one of the two. Semi-Structured Knowledge Bases (SKBs) bridge this gap by linking unstructured content to nodes within structured data. In this work, we present Autofocus-Retriever (AF-Retriever), a modular framework for SKB-based, multi-hop question answering. It combines structural and textual retrieval through novel integration steps and optimizations, achieving the best zero- and one-shot results across all three STaRK QA benchmarks, which span diverse domains and evaluation metrics. AF-Retriever's average first-hit rate surpasses the second-best method by 32.1%. Its performance is driven by (1) leveraging exchangeable large language models (LLMs) to extract entity attributes and relational constraints for both parsing and reranking the top-k answers, (2) vector similarity search for ranking both extracted entities and final answers, (3) a novel incremental scope expansion procedure that prepares for the reranking on a configurable number of candidates that best fulfill the given constraints, and (4) a hybrid retrieval strategy that reduces error susceptibility. In summary, while constantly adjusting the focus like an optical autofocus, AF-Retriever delivers a configurable number of answer candidates in four constraint-driven retrieval steps, which are then supplemented and ranked through four additional processing steps. An ablation study and a detailed error analysis, including a comparison of three different LLM reranking strategies, provide component-level insights. The source code is available at https://github.com/kramerlab/AF-Retriever .

2504.11703 2026-05-15 cs.CR cs.AI

Progent: Securing AI Agents with Privilege Control

Tianneng Shi, Jingxuan He, Zhun Wang, Hongwei Li, Linyu Wu, Wenbo Guo, Dawn Song

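AI summary AI agents interact with external environments through tool calls, which exposes them to attacks such as indirect prompt injection that can trigger unauthorized actions. This paper proposes Progent, a framework that secures AI agents through privilege control. Progent expresses privilege as a symbolic security policy over tool names and arguments and checks every tool call through a deterministic procedure, enforcing the principle of least privilege. An LLM automatically generates and dynamically updates the policy, and an SMT solver classifies each update as a narrowing or an expansion so that the agent's action space cannot grow without approval, which prevents privilege escalation while preserving utility. Experiments show significantly lower attack success rates on multiple benchmarks.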

Abstract

AI agents interact with external environments through tool calls, exposing them to attacks like indirect prompt injection that can trigger unauthorized actions. Securing these agents is challenging: they behave autonomously and probabilistically, security requirements evolve depending on the user's task and execution state, and there is an inherent tradeoff between security and utility. In this work, we introduce Progent, a novel framework that secures AI agents via privilege control. Progent represents privilege as a security policy consisting of symbolic rules over tool names and arguments. These rules specify which tool calls are allowed for task completion and which unnecessary ones are blocked for security. Every tool call is checked against such a policy through a deterministic procedure, enforcing the principle of least privilege. To handle diverse user tasks and evolving execution contexts, an LLM automatically generates the initial policy from the user's task and updates it during execution as new information arrives. Each proposed update is determined by an SMT solver to be either a narrowing (applied automatically) or an expansion (requiring explicit approval), ensuring that the agent's effective action space can only shrink without approval (monotonic confinement). This deterministic update mechanism preserves utility and prevents silent privilege escalation, even when adversarial inputs are present. Our evaluation on popular benchmarks (i.e., AgentDojo and ASB) shows that Progent significantly reduces attack success rates while maintaining high utility. We further validate Progent's practicality by showcasing its effectiveness in real-world agent frameworks such as LangChain and OpenAI Agents SDK.
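
A deterministic check of tool calls against rules over tool names and arguments can be sketched as follows. This is a toy illustration of the general idea, not Progent's policy language or API: the `Rule` class, the `check_call` function, the first-match and default-deny semantics, and the example tools are all hypothetical, and the LLM-driven policy updates and SMT-based narrowing check described in the abstract are not modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: applies to one tool when `condition` holds on its arguments."""
    tool: str
    allow: bool
    condition: Callable[[Dict[str, Any]], bool]


def check_call(policy: List[Rule], tool: str, args: Dict[str, Any]) -> bool:
    """Return True if the call is permitted. First matching rule wins; default is deny."""
    for rule in policy:
        if rule.tool == tool and rule.condition(args):
            return rule.allow
    return False  # least privilege: anything not explicitly allowed is blocked


# Example policy for a task that only needs to read workspace files and email colleagues.
EXAMPLE_POLICY = [
    Rule("read_file", True, lambda a: str(a.get("path", "")).startswith("/workspace/")),
    Rule("send_email", True, lambda a: str(a.get("to", "")).endswith("@example.com")),
]
```

Under this example policy, `check_call(EXAMPLE_POLICY, "read_file", {"path": "/workspace/notes.txt"})` is permitted, while a read of `/etc/passwd` or any call to an unlisted tool is denied. Because the check is a pure function of the policy and the call, its outcome does not depend on what the model was prompted with.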

2504.01571 2026-05-15 cs.GR cs.AI cs.CV cs.LG

Pro-DG: Procedural Diffusion Guidance for Architectural Facade Generation

Aleksander Plocharski, Jan Swidzinski, Przemyslaw Musialski

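AI summary This paper presents Procedural Diffusion Guidance (Pro-DG), a method for architectural facade generation that uses hierarchical procedural rules to generate control maps within the stable diffusion framework and produce photo-realistic facade images. Starting from a single input image and its segmentation, it applies an inverse procedural module to identify the facade's hierarchical layout and, using structural features, builds a new ControlNet pipeline that generates facade images guided by procedural transformations. The method allows precise control of local appearance together with large-scale structural edits, and experiments show that it outperforms existing methods in preserving architectural style and achieving controllable edits.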

Comments 17 pages, 15 figures, Computer Graphics Forum 2026 Journal Paper

Abstract

We use hierarchical procedural rules for the generation of control maps within the stable diffusion framework to produce photo-realistic architectural facade images. Starting from a single input image and its segmentation, we apply an inverse procedural module to identify the facade's hierarchical layout. Leveraging this hierarchy and structural features, we introduce a novel ControlNet pipeline that generates new facade imagery guided by procedural transformations. Our method enables various structural edits, including floor duplication and window rearrangement, by integrating hierarchical alignment directly into control maps. This precisely guides the diffusion-based generative process, ensuring local appearance fidelity alongside extensive structural modifications. Comprehensive evaluations, including comparisons with inpainting-based approaches and synthetic benchmarks, confirm our approach's superior capability in preserving architectural identity and achieving accurate, controllable edits. Quantitative results and user feedback validate our method's effectiveness.

2501.18756 2026-05-15 stat.ML cs.LG math.OC

A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization

Nuojin Cheng, Leonard Papenmeier, Stephen Becker, Luigi Nardi

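AI summary This paper proposes a unified theoretical framework, Variational Entropy Search, that reveals a deep connection between Expected Improvement (EI) and information-theoretic acquisition functions, challenging the conventional view that they are fundamentally different. By interpreting EI as a variational approximation of Max-value Entropy Search (MES), it proposes a new acquisition function, VES-Gamma, which is competitive on low- and high-dimensional synthetic and real-world benchmarks and in many cases outperforms EI and MES.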

Journal ref
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:10106-10120, 2025
Abstract

Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function's optimum and are often considered fundamentally distinct from EI. In this work, we challenge this prevailing perspective by introducing a unified theoretical framework, Variational Entropy Search, which reveals that EI and information-theoretic acquisition functions are more closely related than previously recognized. We demonstrate that EI can be interpreted as a variational inference approximation of the popular information-theoretic acquisition function, named Max-value Entropy Search. Building on this insight, we propose VES-Gamma, a novel acquisition function that balances the strengths of EI and MES. Extensive empirical evaluations across both low- and high-dimensional synthetic and real-world benchmarks demonstrate that VES-Gamma is competitive with state-of-the-art acquisition functions and in many cases outperforms EI and MES.
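
As background for the abstract above, Expected Improvement has a well-known closed form when the surrogate's posterior at a candidate point is Gaussian. The sketch below evaluates it for a maximization problem. The function and argument names are illustrative, and it shows plain EI only, not VES-Gamma or Max-value Entropy Search.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2).

    `best` is the largest objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf
```

The first term rewards candidates whose posterior mean already exceeds the incumbent (exploitation) and the second rewards posterior uncertainty (exploration); when `mu == best`, EI reduces to `sigma / sqrt(2 * pi)`.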

2410.03280 2026-05-15 eess.AS cs.AI cs.LG eess.SP

Manikin-Recorded Cardiopulmonary Sounds Dataset Using Digital Stethoscope

Yasaman Torabi, Shahram Shirani, James P. Reilly

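AI summary This study presents a dataset of cardiopulmonary sounds recorded with a digital stethoscope, containing normal sounds and a range of abnormal heart and lung sounds such as murmurs, arrhythmias, and adventitious breath sounds. The recordings were collected from a clinical manikin, cover individual and mixed sounds at different body locations, and were processed with frequency filters to highlight specific sound types. The dataset is a resource for AI research on automated cardiopulmonary disease detection, sound classification, and deep learning.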

Journal ref
IEEE Data Descriptions, vol. 2, pp. 133-140, 2025
Abstract

Heart and lung sounds are crucial for healthcare monitoring. Recent improvements in stethoscope technology have made it possible to capture patient sounds with enhanced precision. In this dataset, we used a digital stethoscope to capture both heart and lung sounds, including individual and mixed recordings. To our knowledge, this is the first dataset to offer both separate and mixed cardiorespiratory sounds. The recordings were collected from a clinical manikin, a patient simulator designed to replicate human physiological conditions, generating clean heart and lung sounds at different body locations. This dataset includes both normal sounds and various abnormalities (i.e., murmur, atrial fibrillation, tachycardia, atrioventricular block, third and fourth heart sound, wheezing, crackles, rhonchi, pleural rub, and gurgling sounds). The dataset includes audio recordings of chest examinations performed at different anatomical locations, as determined by specialist nurses. Each recording has been enhanced using frequency filters to highlight specific sound types. This dataset is useful for applications in artificial intelligence, such as automated cardiopulmonary disease detection, sound classification, unsupervised separation techniques, and deep learning algorithms related to audio signal processing.

2410.02091 2026-05-15 cs.SE cs.AI cs.HC econ.GN q-fin.EC

The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot

Fangchen Song, Ashish Agarwal, Wen Wen

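AI summary This study examines the impact of generative AI on collaborative open-source software (OSS) development, focusing on the role of GitHub Copilot, an AI programming assistant, in open-source projects on GitHub. It finds that Copilot use increases project-level code contributions by 5.9%, driven mainly by higher developer participation and individual productivity, but also increases coordination time by 8%. The study further shows that the effects differ between core and peripheral developers, which informs the long-term impact of AI on open-source communities.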

Abstract

Generative artificial intelligence (AI) facilitates content production and enhances ideation capabilities, which can significantly influence developer productivity and participation in software development. To explore its impact on collaborative open-source software (OSS) development, we investigate the role of GitHub Copilot, a generative AI pair programmer, in OSS development where multiple distributed developers voluntarily collaborate. Using GitHub's proprietary Copilot usage data, combined with public OSS project data obtained from GitHub, we find that Copilot use increases project-level code contributions by 5.9%. This gain is driven by a 3.4% rise in developer coding participation and a 2.1% increase in individual productivity. However, Copilot use also leads to an increase in coordination time by 8% due to more code discussions. This reveals an important tradeoff: While AI expands who can contribute and how much they contribute, it slows coordination in collective development efforts. Despite this tension, the combined effect of these two competing forces remains positive, indicating a net gain in overall project-level timely merge of code contributions from using AI pair programmers. Interestingly, we also find the effects differ across developer roles. Peripheral developers show relatively smaller increases in project-level code contributions and experience larger increases in coordination time than core developers. In summary, our study underscores the dual role of AI pair programmers in affecting project-level code contributions and coordination time in OSS development. Our findings on the differential effects between core and peripheral developers also provide important implications for the structure of OSS communities in the long run.

2404.13649 2026-05-15 stat.ML cs.LG stat.ME

Distributional Principal Autoencoders

Xinwei Shen, Nicolai Meinshausen

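AI summary This paper proposes the Distributional Principal Autoencoder (DPA), a dimension reduction method designed so that reconstructed data retain the distribution of the original data. It learns the conditional distribution of the data given its low-dimensional latent variables, so that the reconstructed data are identically distributed as the original data. Experiments on climate data, single-cell data, and image data show that DPA preserves the original distribution and important structural features of the data.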

Abstract

Dimension reduction techniques usually lose information in the sense that reconstructed data are not identical to the original data. However, we argue that it is possible to have reconstructed data identically distributed as the original data, irrespective of the retained dimension or the specific mapping. This can be achieved by learning a distributional model that matches the conditional distribution of data given its low-dimensional latent variables. Motivated by this, we propose Distributional Principal Autoencoder (DPA) that consists of an encoder that maps high-dimensional data to low-dimensional latent variables and a decoder that maps the latent variables back to the data space. For reducing the dimension, the DPA encoder aims to minimise the unexplained variability of the data with an adaptive choice of the latent dimension. For reconstructing data, the DPA decoder aims to match the conditional distribution of all data that are mapped to a certain latent value, thus ensuring that the reconstructed data retains the original data distribution. Our numerical results on climate data, single-cell data, and image benchmarks demonstrate the practical feasibility and success of the approach in reconstructing the original distribution of the data. DPA embeddings are shown to preserve meaningful structures of data such as the seasonal cycle for precipitations and cell types for gene expression.

2303.14511 2026-05-15 hep-ex cs.AI cs.LG hep-ph physics.data-an

Improving robustness of jet tagging algorithms with adversarial training: exploring the loss surface

Annika Stein

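AI summary This paper studies how adversarial training can improve the robustness of jet tagging algorithms in high-energy physics, focusing on how slight distortions of input features affect model performance. By exploring the geometry of the loss surface, the author gives a geometric interpretation of model robustness in the presence of systematic uncertainties and shows that adversarial training improves robustness while maintaining high performance.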

Comments 5 pages, 2 figures; submitted to ACAT 2022 proceedings

Journal ref
2026 J. Phys.: Conf. Ser. 3206 012085
Abstract

In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst's perspective, obtaining highest possible performance is desirable, but recently, some attention has been shifted towards studying robustness of models to investigate how well these perform under slight distortions of input features. Especially for tasks that involve many (low-level) inputs, the application of deep neural networks brings new challenges. In the context of jet flavor tagging, adversarial attacks are used to probe a typical classifier's vulnerability and can be understood as a model for systematic uncertainties. A corresponding defense strategy, adversarial training, improves robustness, while maintaining high performance. Investigating the loss surface corresponding to the inputs and models in question reveals geometric interpretations of robustness, taking correlations into account.
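
The abstract does not name the attack it uses, so the sketch below assumes the fast gradient sign method (FGSM), a common choice for this kind of study. It perturbs the inputs of a logistic-regression classifier, chosen here only because its input gradient is available in closed form; the model, function name, and parameters are illustrative and unrelated to the jet tagger in the paper.

```python
import math


def fgsm_perturb(x, y, weights, bias, epsilon):
    """One FGSM step for logistic regression with label y in {0, 1}.

    Each feature moves by +/- epsilon in the direction that increases the
    cross-entropy loss.
    """
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    # Gradient of the cross-entropy loss with respect to each input feature.
    grad = [(prob - y) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in grad]  # sign of each gradient component
    return [xi + epsilon * s for xi, s in zip(x, signs)]
```

Adversarial training in this setting would then mix such perturbed inputs into the training batches, so that the model is fit on the distorted features as well as the nominal ones.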

2211.16113 2026-05-15 cs.NE cs.LG

Timing-Based Backpropagation in Spiking Neural Networks Without Single-Spike Restrictions

Kakei Yamamoto, Yusuke Sakemi, Kazuyuki Aihara

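AI summary This paper proposes a new backpropagation algorithm without single-spike restrictions for training spiking neural networks (SNNs), which encodes information in the relative timing of multiple spikes of individual neurons. Unlike conventional methods, it allows each neuron to fire multiple times, which increases the computational capacity of the network and yields accuracy comparable to non-convolutional artificial neural networks. The study also finds that the spike count depends on the time constants of the postsynaptic current and the membrane potential, and that there is an optimal time constant that maximizes test accuracy, a phenomenon not observed in conventional single-spike temporal coding methods.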

Comments 10 pages, 5 figures

Journal ref
2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024, pp. 1-9
Abstract

We propose a novel backpropagation algorithm for training spiking neural networks (SNNs) that encodes information in the relative multiple spike timing of individual neurons without single-spike restrictions. The proposed algorithm inherits the advantages of conventional timing-based methods in that it computes accurate gradients with respect to spike timing, which promotes ideal temporal coding. Unlike conventional methods where each neuron fires at most once, the proposed algorithm allows each neuron to fire multiple times. This extension naturally improves the computational capacity of SNNs. Our SNN model outperformed comparable SNN models and achieved accuracy as high as that of non-convolutional artificial neural networks. The spike count of our networks varied with the time constants of the postsynaptic current and the membrane potential. Moreover, we found that there existed an optimal time constant with the maximum test accuracy. This was not seen in conventional SNNs with single-spike restrictions on time-to-first-spike (TTFS) coding. This result demonstrates the computational properties of SNNs that biologically encode information into the multi-spike timing of individual neurons. Our code will be made publicly available.

2202.05568 2026-05-15 stat.ML cs.IT cs.LG math.IT math.PR math.ST stat.TH

Change of measure through the Legendre transform

Antoine Picard-Weibel, Benjamin Guedj

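AI summary This paper studies change of measure through the Legendre transform for deriving PAC-Bayes generalisation bounds. Combining the Legendre transform with the Fenchel-Young inequality, the authors construct change-of-measure inequalities based on $f$-divergences, going beyond the bounded exponential moment condition required by the classical Donsker-Varadhan theorem. The approach gives learning theory a more flexible tool and establishes PAC-Bayes guarantees under a wider range of assumptions.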

Comments 27 pages

Abstract

PAC-Bayes generalisation bounds are derived via change-of-measure inequalities that transfer concentration properties from a reference measure to all posterior measures. The specific choice of change of measure determines the assumptions required on the empirical risk; in particular, the classical Donsker--Varadhan theorem leads to bounds relying on bounded exponential moments. We study change-of-measure inequalities based on \(f\)-divergences, obtained by combining the Legendre transform of \(f\) with the Fenchel--Young inequality. Beyond their intrinsic interest in probability theory, we show how these inequalities are helpful in learning theory and yield PAC-Bayes bounds under tailored assumptions on the empirical risk, thereby extending the range of conditions under which PAC-Bayesian guarantees can be established.
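
The mechanism named in the abstract can be written out in its most basic form. The derivation below is the standard textbook argument, given as a sketch under the assumptions that \(f\) is convex, \(Q \ll P\), and all expectations are finite; the symbols \(\varphi\), \(P\), and \(Q\) are notation introduced here, and the inequalities actually proved in the paper may be sharper or stated differently.

```latex
% Legendre transform of a convex f, and the Fenchel--Young inequality
f^*(y) = \sup_{x} \{\, xy - f(x) \,\}, \qquad xy \le f(x) + f^*(y).

% Apply it pointwise with x = dQ/dP and y = \varphi, then integrate against P:
\mathbb{E}_Q[\varphi]
  = \mathbb{E}_P\!\left[ \tfrac{dQ}{dP}\, \varphi \right]
  \le \mathbb{E}_P\!\left[ f\!\left( \tfrac{dQ}{dP} \right) \right]
      + \mathbb{E}_P\!\left[ f^*(\varphi) \right]
  = D_f(Q \,\|\, P) + \mathbb{E}_P\!\left[ f^*(\varphi) \right].

% Classical Donsker--Varadhan counterpart for the Kullback--Leibler divergence:
\mathbb{E}_Q[\varphi] \le \mathrm{KL}(Q \,\|\, P) + \log \mathbb{E}_P\!\left[ e^{\varphi} \right].
```

The Donsker-Varadhan form needs the exponential moment \(\mathbb{E}_P[e^{\varphi}]\) to be controlled, whereas the \(f\)-divergence form only needs \(\mathbb{E}_P[f^*(\varphi)]\), which is how a different choice of \(f\) changes the assumption placed on the empirical risk.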

2605.14188 2026-05-15 quant-ph cs.CL cs.DL physics.atom-ph

QOuLiPo: What a quantum computer sees when it reads a book

Christophe Jurczak

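AI summary This paper studies how a quantum computer "reads" books: eight classic works of the Renaissance era are fed to a neutral-atom quantum processor, with textual structure converted to graph structure, to explore how quantum hardware processes text. It introduces a "rigidity rho" metric for how unique a book's structure is, and inverts the pipeline by designing text structure to match the hardware's graph structure, producing a new collection of texts named QOuLiPo that serves as a benchmark for evaluating quantum processors. The work offers the Digital Humanities a new way to engage with quantum computing and shows the potential of quantum processors for handling complex textual structure.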

Details
Abstract

What does a book look like to a quantum computer? This paper takes eight classical works of the Renaissance and its late-antique inheritance -- from Augustine to Galileo -- and runs each through a neutral-atom quantum processor. The bridge is graphs: each textual unit becomes an atom, and graph edges are physical blockade constraints for engineered exact unit-disk designs, or a 2D approximation to the semantic graph for natural texts. Three contributions follow. First, we introduce rigidity rho, a metric for how unique a book's structural backbone is -- distinguishing Marguerite de Navarre's Heptameron (rigid, twelve-nouvelle hard core) from Boethius (fully fungible, every chapter substitutable). Second, we invert the pipeline: rather than extracting a graph from existing prose, we pick a target graph the hardware encodes natively, and write a book whose structure matches it. The twenty-nine texts written this way, collected under the name QOuLiPo, extend the OuLiPo tradition to graph-topological constraints and, together with the eight natural texts, form a benchmark distribution against which neutral-atom hardware can be tracked as it scales. Third, we run both natural and engineered texts on Pasqal's FRESNEL processor up to one hundred atoms; engineered texts reach high approximation ratios, the cleanest instances returning the exact backbone. A cloud-accessible quantum machine plus an agentic coding environment now lets a single investigator run this pipeline end-to-end. What is reported is an application layer, not a speedup -- humanistic instances ready to load onto neutral-atom processors as they scale, already complementing classical text analysis. The Digital Humanities community has a stake in building familiarity with this hardware now: the engineered-corpus design choices made today fix the benchmark distribution future hardware will be measured against.

2605.14177 2026-05-15 cs.IR cs.AI cs.CL

Thinking Ahead: Prospection-Guided Retrieval of Memory with Language Models

Harshita Chopra, Krishna Kant Chintalapudi, Suman Nath, Ryen W. White, Chirag Shah

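AI summary This paper studies how prospection can guide language models to retrieve user-specific facts from long-term dialogue histories, improving personalized dialogue systems. To address the fact that conventional retrieval relies on semantic similarity and struggles to find relevant facts that lie far from the query, the authors propose Prospection-Guided Retrieval (PGR), which constructs plausible future steps as retrieval probes and so surfaces relevant memories in the user's history that conventional methods tend to miss. Experiments show that the method substantially improves retrieval and response quality across several benchmarks.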

Comments Preprint

Details
Abstract

Long-horizon personalization requires dialogue assistants to retrieve user-specific facts from extended interaction histories. In practice, many relevant facts often have low semantic similarity to the query under dense retrieval. Standard Retrieval-Augmented Generation (RAG) and GraphRAG systems are still largely retrospective: they rely on embedding similarity to the query or on fixed graph traversals, so they often miss facts that matter for the user's needs but lie far from the query in embedding space. Inspired by prospection, the human ability to use imagined futures as cues for recall, we introduce Prospection-Guided Retrieval (PGR), which decouples retrieval from how memories are stored. Given a user query, PGR first expands the goal into a short Tree-of-Thought (ToT) or linear chain of plausible next steps, and uses these steps as retrieval probes rather than relying on the original query alone. The facts retrieved by these probes are then used to personalize the next round of prospection, enabling PGR to uncover additional memories that become relevant only after the simulation is grounded in the user's history. We also introduce MemoryQuest, a challenging multi-session benchmark in which each query is annotated with 3--5 dated reference facts subject to a low query-reference similarity constraint. Across 1,625 queries spanning 185 user profiles from 3 publicly available datasets, PGR-TOT substantially improves retrieval, including nearly 3x recall on MemoryQuest over the strongest baseline. In pairwise LLM-as-judge comparisons against baselines, PGR-generated responses are preferred on 89--98% of queries, with blinded human annotations on held-out subsets showing the same trend. Overall, the results demonstrate that explicit prospection yields large gains in long-horizon retrieval and response quality relative to similarity-only baselines.

2605.14153 2026-05-15 cs.CR cs.AI

ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents

Seunghyun Lee, David Brumley

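AI summary This paper presents ExploitBench, a graded benchmark for evaluating the cybersecurity capabilities of large language models (LLMs), which decomposes exploitation into 16 measurable stages, from crashing code to taking full control of the target system. A deterministic verification mechanism assesses model performance at each stage. Experiments on 41 V8 bugs show that current publicly deployed frontier models do well at triggering vulnerabilities and crashes but fall clearly short on advanced capabilities such as arbitrary code execution, whereas the private model shows stronger exploitation capability.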

Details
Abstract

Exploitation is not a binary event. It is a ladder of acquiring progressive capabilities, from executing a single buggy line of code to taking full control of the target. However, existing LLM security benchmarks treat a crash as exploitation success. That single binary outcome collapses the hard parts of exploitation: the transition from triggering a bug to constructing reusable primitives and control. We present ExploitBench, a capability-graded benchmark that decomposes exploitation into 16 measurable flags, from coverage and crash through sandbox primitives, arbitrary read/write, control-flow hijack, and arbitrary code execution. Each capability is verified by a deterministic oracle that uses a per-run randomized challenge-response for primitives, differential execution against ground-truth binaries to measure progress, and a signal-handler proof for code execution. We instantiate ExploitBench on 41 V8 bugs because V8 is both widely deployed and exploitation-hardened. We report three arms: <model,env> as the primary measurement of model-environment capability, <model,env, adaptive coaching> as a secondary arm that adds adaptive coaching to test whether targeted feedback shifts outcomes, and <model,env,harness> as an ablation that swaps in the model's native CLI to check whether vendor-side optimizations increase exploitation capabilities. Our results show a sharp capability split between publicly deployed frontier models and the private frontier. Across the 8 publicly deployed models tested, reaching the vulnerable code and triggering a crash is routine, but arbitrary code execution is not. The private model shows arbitrary code execution on approximately half. Overall, results suggest that exploit construction against hardened targets is an emerging frontier capability.

2605.14142 2026-05-15 stat.ML cs.LG stat.CO

To discretize continually: Mean shift interacting particle systems for Bayesian inference

Ayoub Belhadji, Daniel Sharp, Youssef M. Marzouk

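AI summary This paper proposes an interacting particle system based on maximum mean discrepancy (MMD) minimization for approximating integrals against a probability distribution whose unnormalized density is known. The method extends the classical mean shift algorithm and algorithms for optimal quantization of empirical distributions to continuous distributions, is unaffected by the unknown normalizing constant, and supports both gradient-free and gradient-informed implementations. Experiments show good convergence, multi-modality capture, and scaling to high dimensions on sampling tasks including multi-modal mixtures, Bayesian hierarchical models, and PDE-constrained inverse problems.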

Details
Abstract

Integration against a probability distribution given its unnormalized density is a central task in Bayesian inference and other fields. We introduce new methods for approximating such expectations with a small set of weighted samples -- i.e., a quadrature rule -- constructed via an interacting particle system that minimizes maximum mean discrepancy (MMD) to the target distribution. These methods extend the classical mean shift algorithm, as well as recent algorithms for optimal quantization of empirical distributions, to the case of continuous distributions. Crucially, our approach creates dynamics for MMD minimization that are invariant to the unknown normalizing constant; they also admit both gradient-free and gradient-informed implementations. The resulting mean shift interacting particle systems converge quickly, capture anisotropy and multi-modality, avoid mode collapse, and scale to high dimensions. We demonstrate their performance on a wide range of benchmark sampling problems, including multi-modal mixtures, Bayesian hierarchical models, PDE-constrained inverse problems, and beyond.
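
For orientation, the classical mean shift algorithm that the abstract says is being extended can be sketched as follows. This is a minimal illustration of the textbook procedure on a finite sample set with a Gaussian kernel, not the authors' interacting particle system; the function names, the `bandwidth` parameter, and the stopping rule are assumptions made for the example.

```python
import math

def mean_shift_step(x, samples, bandwidth):
    """One classical mean shift update: move x to the Gaussian-kernel-weighted
    mean of the samples. Assumes x is close enough to some sample that the
    weights do not all underflow to zero."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * bandwidth ** 2))
        for s in samples
    ]
    total = sum(weights)
    return [
        sum(w * s[d] for w, s in zip(weights, samples)) / total
        for d in range(len(x))
    ]

def mean_shift(x, samples, bandwidth, steps=100, tol=1e-9):
    """Iterate the update until the point moves less than tol (hypothetical stopping rule)."""
    for _ in range(steps):
        new_x = mean_shift_step(x, samples, bandwidth)
        if math.dist(x, new_x) < tol:
            return new_x
        x = new_x
    return x

# Two small clusters, near (0, 0) and near (5, 5).
samples = [(0.0, 0.1), (0.1, 0.0), (-0.1, -0.1),
           (5.0, 5.1), (5.1, 5.0), (4.9, 4.9)]
mode_a = mean_shift([0.5, 0.5], samples, bandwidth=0.5)
mode_b = mean_shift([4.5, 4.5], samples, bandwidth=0.5)
```

Each starting point climbs to the mode of the kernel density estimate nearest to it, so `mode_a` settles near the first cluster and `mode_b` near the second.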

2605.14123 2026-05-15 eess.IV cs.CV

Keyed Nonlinear Transform: Lightweight Privacy-Enhancing Feature Sharing for Medical Image Analysis

Haebom Lee, Gyeongjung Kim

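AI summary This paper proposes Keyed Nonlinear Transform (KNT), a lightweight feature transformation that strengthens privacy protection in medical image analysis by addressing the leakage of patient identity information during feature sharing. The method obfuscates intermediate features with a key-conditioned nonlinear transform, reducing their re-identifiability while preserving classification performance and computational efficiency. Experiments show that KNT markedly improves privacy protection without retraining the model and applies to a range of medical imaging tasks.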

Details
Abstract

Feature sharing via split inference offers a lightweight alternative to federated learning for resource-constrained hospitals, but transmitted features still leak patient identity information and lack practical mechanisms for controlled feature sharing. We propose Keyed Nonlinear Transform (KNT), a drop-in feature transformation that applies key-conditioned obfuscation to intermediate representations. KNT reduces re-identification AUC from 0.635 to 0.586, corresponding to a 36% reduction in above-chance identity signal, while introducing only 0.15 ms CPU overhead, without backbone retraining, and preserving classification performance within 1.0 pp. Our analysis shows that KNT's nonlinear transform prevents closed-form inversion and shifts recovery to iterative gradient-based optimization under full key compromise, substantially increasing inversion difficulty. The same transform generalizes to dense prediction tasks, incurring only a 4.4 pp Dice reduction on skin-lesion segmentation without retraining. These results position KNT as a practical and efficient privacy layer for split inference deployments.
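
The 36% figure can be reproduced from the two reported AUC values, assuming (this is an assumption, not stated in the abstract) that "above-chance identity signal" means AUC minus the chance level of 0.5:

```latex
\[
\frac{(0.635 - 0.5) - (0.586 - 0.5)}{0.635 - 0.5}
  = \frac{0.049}{0.135}
  \approx 0.363 \approx 36\%.
\]
```

Under that reading, the raw AUC drop of 0.049 is small in absolute terms but removes roughly a third of the signal an attacker has beyond random guessing.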

2605.14098 2026-05-15 stat.ML cs.CL cs.LG

Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning

Yu Gu, Zijun Yu, Vahid Partovi Nia, Masoud Asgharian

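AI summary This study addresses aggregation uncertainty when combining multiple reasoning paths in chain-of-thought (CoT) reasoning, and proposes a conformal aggregation method that improves accuracy when the system is allowed to abstain. Unlike conventional majority voting, the method uses weighted score aggregation and calibrates the abstention rule with conformal risk control, giving finite-sample control of the confident-error rate. Experiments show high selective accuracy on several benchmarks without retraining the model.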

Comments 9 pages, 4 figures, submitted

Details
Abstract

Chain-of-thought (CoT) reasoning with self-consistency improves performance by aggregating multiple sampled reasoning paths. In this setting, correctness is no longer tied to a single reasoning trace but to the aggregation rule over a pool of candidate paths, making aggregation uncertainty the central challenge. This issue is critical where confidently incorrect answers are far more costly than abstentions. We introduce a conformal procedure for CoT reasoning that directly addresses aggregation uncertainty. Our approach replaces majority voting with weighted score aggregation over reasoning paths and calibrates an abstention rule using conformal risk control. This approach leads to finite-sample guarantees on the confident-error rate--the probability that the system answers and is wrong. We further identify score separability as the key condition under which abstention provably improves selective accuracy, and derive closed-form expressions that predict accuracy gains from calibration data alone. The method is fully inference-time, and requires no retraining. Across four benchmarks, four open-source models, and three score classes, realized confident-error rates are consistent with the prescribed targets up to calibration-split and test-set variability. Our method achieves $90.1\%$ selective accuracy on GSM8K by abstaining on less than $5\%$ of problems, compared with $82\%$ accuracy under the majority-voting baseline.
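
The two ingredients named in the abstract, weighted score aggregation and an abstention threshold calibrated by conformal risk control, can be sketched roughly as below. This is an illustrative reading under stated assumptions, not the authors' procedure: the per-path weights, the normalized confidence score, and the helper names `aggregate` and `calibrate_threshold` are hypothetical. The threshold rule is the generic conformal risk control recipe for a loss bounded by 1: pick the smallest threshold whose adjusted empirical risk, (errors + 1) / (n + 1), is at most the target `alpha`.

```python
from collections import defaultdict

def aggregate(paths):
    """paths: list of (answer, weight) pairs with positive weights, one per
    sampled reasoning path. Returns the answer with the largest total weight
    and its share of the total weight as a confidence score in (0, 1]."""
    totals = defaultdict(float)
    for answer, weight in paths:
        totals[answer] += weight
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())

def calibrate_threshold(calib, alpha):
    """calib: list of (confidence, is_correct) pairs from held-out problems.
    The loss 1[confidence >= lam and wrong] is non-increasing in lam, so the
    first candidate meeting the bound is the smallest valid threshold.
    Returns infinity (always abstain) if no candidate qualifies."""
    n = len(calib)
    candidates = sorted({conf for conf, _ in calib}) + [float("inf")]
    for lam in candidates:
        errors = sum(1 for conf, ok in calib if conf >= lam and not ok)
        if (errors + 1) / (n + 1) <= alpha:
            return lam
    return float("inf")

# Toy usage: aggregate four paths, then calibrate on 19 held-out problems.
answer, confidence = aggregate([("42", 0.9), ("41", 0.2), ("42", 0.4), ("17", 0.5)])
calib = [(0.3, False), (0.5, False), (0.8, False), (0.6, True)] + [(0.9, True)] * 15
threshold = calibrate_threshold(calib, alpha=0.12)
decision = answer if confidence >= threshold else None  # None means abstain
```

With very little calibration data the adjusted risk can never reach the target, and the sketch then returns an infinite threshold, which amounts to abstaining on every query.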

2605.14090 2026-05-15 cs.CY cs.GR cs.LG

Synthetic Sociality: How Generative Models Privatize the Social Fabric

Ana Dodik, Moira Weigel

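AI summary This paper proposes a critical theoretical framework for analyzing the effects of generative models at both the descriptive and normative levels. It argues that generative models automate not only intellectual labor but also reproduce and reshape a broader set of human social capacities, termed "social doing." By tracing the commodification of sociality in the digital economy, the paper examines how generative models depend on social data and introduces the concept of "Synthetic Sociality" to describe a social reality shaped by privately owned, undemocratically governed generative models; it closes with a normative analysis and future design directions.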

Details
Abstract

We put forth a critical theoretical framework for analyzing generative models both descriptively and normatively. Our thesis is that generative models automate the production not only of intellectual labor or intelligence, but of a broader set of human social capacities we name "social doing." We do this by historicizing the commodification of sociality in the digital economy, leading to the availability of social data as the precondition for generative models. We elaborate our definition of "social doing" by drawing a distinction between "use" and "exchange" sociality and further differentiate between the ways that generative models either substitute for or mediate existing social relations and processes. We then turn to existing empirical research on how people use generative model-based products and the effects that their use has upon them. In this, we introduce the concept of Synthetic Sociality, a social reality in part fabricated by Silicon Valley's privately owned and undemocratically governed generative models. Lastly, we offer a normative analysis based on our findings and framework, and discuss future design opportunities.

2605.14066 2026-05-15 eess.AS cs.AI cs.CL cs.SD

A Benchmark for Early-stage Parkinson's Disease Detection from Speech

Terry Yi Zhong, Cristian Tejedor-Garcia, Khiet P. Truong, Janna Maas, Louis ten Bosch, Bastiaan R. Bloem

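AI summary This study proposes the first benchmark for speech-based early-stage Parkinson's disease detection, aiming to resolve the difficulty of comparing existing results that stems from differences in datasets, languages, tasks, and evaluation protocols. The benchmark uses a speaker-independent split, supports fair and reproducible cross-method evaluation on publicly available datasets, covers three common speech tasks, and tests methods under different training-resource settings. The study also provides multi-dimensional evaluation breakdowns to support fine-grained comparison and clinical use, giving a reusable reference for robust and clinically meaningful early Parkinson's disease detection.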

Comments Submitted to Interspeech2026

Details
Abstract

Early-stage Parkinson's disease (EarlyPD) detection from speech is clinically meaningful yet underexplored, and published results are hard to compare because studies differ in datasets, languages, tasks, evaluation protocols, and EarlyPD definitions. To address this issue, we propose the first benchmark for speech-based EarlyPD detection, with a speaker-independent split designed for fair and replicable cross-method evaluation on researcher-accessible datasets. The benchmark covers three common speech tasks and evaluates methods under different training-resource settings. We also present multi-dimensional evaluation breakdowns by dataset, aggregation level, gender, and disease stage to support fine-grained comparisons and clinical adoption. Our results provide a replicable reference and actionable insights, encouraging the adoption of this publicly available benchmark to advance robust and clinically meaningful EarlyPD detection from speech.

2605.14041 2026-05-15 stat.ME cs.LG

Wahkon: A Statistically Principled Deep RKHS Superposition Network

Yongkai Chen, Wenxuan Zhong, Ping Ma

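AI summary This paper proposes Wahkon, a deep reproducing kernel Hilbert space (RKHS) superposition network that aims to combine the predictive power of deep learning with the statistical guarantees of RKHS methods. Building on Kolmogorov's superposition principle and RKHS regularization in the tradition of Wahba's splines, the method establishes a finite-dimensional deep representer theorem, giving a trainable model structure with layerwise complexity control. Theory shows that the estimator is equivalent to the maximum a posteriori estimate under a hierarchical Gaussian-process prior and attains optimal convergence rates, with an explicit trade-off between depth, width, and regularity; experiments show it outperforms conventional deep models on several benchmark tasks and in single-cell data analysis.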

Details
Abstract

Deep learning excels at prediction but often lacks finite-sample guarantees and calibrated uncertainty; RKHS (Reproducing Kernel Hilbert Space)-based methods provide those guarantees but struggle to adapt in high dimensions. We propose Wahkon, a deep RKHS superposition network that unifies Kolmogorov's superposition principle with RKHS regularization in the smoothing-spline tradition of Wahba. This yields a finite-dimensional deep representer theorem that makes training tractable and provides explicit layerwise complexity control. We show the penalized estimator is exactly the MAP (maximum a posteriori) estimate under a hierarchical Gaussian-process prior, extending the spline/GP duality to deep compositions. Using metric-entropy arguments, we establish minimax-optimal convergence rates under mild smoothness and clarify how depth and width trade off with regularity. Empirically, Wahkon outperforms multilayer perceptrons, Neural Tangent Kernels, and Kolmogorov--Arnold Networks across simulation benchmarks and a single-cell CITE-seq study. By unifying Kolmogorov's superposition principle with RKHS regularization, Wahkon delivers accuracy, interpretability, and statistical rigor in a single framework.

2605.14025 2026-05-15 q-bio.NC cs.AI

Do Language Models Align with Brains? Prediction Scores Are Not Enough

Xiao Jia

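AI summary This paper examines whether language models align with the brain in language processing and questions whether prediction scores alone suffice to show that language models capture brain-relevant language computation. Using the L-PACT framework, the study evaluates evidence along predictive, relational, mechanism-stripping, and reliability dimensions, and finds that the tested language models fail the control comparisons on several key criteria, so their alignment with the brain is not yet adequately supported. The study calls for more cautious interpretation of model-brain relationships, so that superficially positive results are not mistaken for structural alignment.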

Comments 39 pages, 4 main figures, 6 supplementary figures

Details
Abstract

Brain-language model comparisons often interpret neural prediction scores as evidence that model representations capture brain-relevant language computation. We asked whether language models align with brains, and whether prediction scores are enough to support that claim, using L-PACT, a source-audited framework that evaluates predictive, relational, mechanism-stripping, and reliability-bounded evidence. Across primary naturalistic language neural datasets and derived language-model representations, L-PACT compared real model features with nuisance baselines and severe controls, tested whether model-to-brain profiles reproduced brain-to-brain patterns, recomputed held-out scores after mechanism stripping, and normalized evidence against brain-brain ceilings. The locked analysis set contains 414 predictive-control rows, 2304 relational profile rows, 4320 mechanism-stripping rows, 420 brain-brain ceiling rows, and 146 integrated decision rows. Assay-sensitivity checks showed that brain-brain reliability, brain-as-model run-to-run relational profiles, independent low-level neural and WAV-derived acoustic-envelope gates, and a deterministic implanted-signal simulation can produce positive evidence when expected. Nevertheless, no real model row passed the predictive, relational, mechanism-stripping, or operational Turing-bounded reliability gates; all 146 integrated rows were control-explained. Less stringent single-criterion rules would have counted raw positive predictive, relational, stripping-delta, and ceiling-normalized effects, but L-PACT downgraded them because controls explained the apparent evidence. In the analyzed derived artifact set, the tested language-model representations do not satisfy L-PACT alignment gates; apparent positives are converted into an auditable control-explained taxonomy rather than treated as structural alignment.

2605.14021 2026-05-15 cs.CY cs.AI

Measuring Google AI Overviews: Activation, Source Quality, Claim Fidelity, and Publisher Impact

Haofei Xu, Umar Iqbal, Jacob M. Montgomery

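AI summary This study presents a large-scale longitudinal measurement of Google AI Overviews (AIOs), analyzing their activation rate, the quality of cited sources, claim accuracy, and impact on publishers. AIO activation reaches 64.7% for question-form queries but is markedly lower for politically sensitive topics; cited sources are more credible than conventional search results, yet some do not appear in the search results at all, indicating a selection mechanism distinct from Google's ranking algorithm. In addition, about 11% of claims in AIO answers are unsupported by the cited sources, and more than half of cited pages carry advertising, which may affect publisher revenue. The study documents the far-reaching effect of generative AI on the online information ecosystem.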

Comments Under Review

Details
Abstract

Google AI Overviews (AIOs) are arguably the most widely encountered deployment of generative AI, reaching over 2 billion users who may not realize the answers they see are AI-generated. Where search engines have traditionally surfaced ranked sources and left users to evaluate them, AIOs synthesize and deliver a single answer - giving Google unprecedented editorial control over what users read and know. We present a large-scale longitudinal measurement study, issuing 55,393 trending queries across 19 topical categories over a 40-day window (March 13 - April 21, 2026). We report four main findings. First, overall AIO activation is 13.7%, rising to 64.7% for question-form queries, while politically sensitive topics see markedly lower rates. Second, AIO-cited domains are more credible than co-displayed first-page results, yet nearly 30% do not appear in those results at all, indicating a source selection mechanism distinct from Google's ranking algorithm. Third, decomposing responses into 98,020 atomic claims, 11.0% are unsupported by the cited pages - with omission the dominant failure mode - and source quality and claim fidelity are largely independent. Fourth, well over half of AIO-cited pages carry display advertising, meaning publishers lose revenue when AIOs suppress the click-through, even as Google's own sponsored ads continue to appear on the same page. Together, these findings document a rapid transformation of the online information ecosystem whose consequences for epistemic security remain poorly understood.

2605.14019 2026-05-15 econ.EM cs.LG math.ST stat.CO stat.TH

Regret Equals Covariance: A Closed-Form Characterization for Stochastic Optimization

Irene Aldridge

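AI summary This paper studies how to measure regret in stochastic optimization and gives an exact covariance decomposition that expresses expected regret as the covariance between the uncertain parameters and the optimal decision plus an estimable residual term. For linear programs and unconstrained quadratic programs the residual is zero, so regret can be computed directly from the covariance, avoiding the high computational cost of the conventional Sample Average Approximation approach. In practice the covariance can be estimated efficiently from historical data, greatly reducing computation, and the results are validated through theoretical analysis and experiments.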

Comments 33 pages

Details
Abstract

Regret is the cost of uncertainty in algorithmic decision-making. Quantifying regret typically requires computationally expensive simulation via Sample Average Approximation (SAA), with complexity $\mathcal{O}(Bn^{2}d^{3})$ in the number of scenarios $B$, variables $n$, and constraints $d$. This paper proves that expected regret in any stochastic optimization problem admits the exact decomposition $\mathrm{Regret}(c) = \mathrm{Cov}(c,\,\pi^{*}(c)) + R(c)$, where $c$ is the vector of uncertain parameters, $\pi^{*}(c)$ is the optimal decision, and $R(c)$ is a residual whose magnitude we bound explicitly under Lipschitz, smooth, and strongly convex conditions. For linear programs and unconstrained quadratic programs, including the classical Markowitz portfolio problem, we prove $R(c)=0$ exactly, so that $\mathrm{Regret}(c) = \mathrm{Cov}(c,\pi^{*}(c))$ holds without approximation. When historical cost-decision pairs $\{(c_i, \pi^{*}(c_i))\}$ are available, the covariance can be estimated in $\mathcal{O}(nd^{2})$ time, which is orders of magnitude faster than SAA. The estimation is performed by a single pass through the data. We derive concentration bounds, a central limit theorem, and an asymptotically unbiased residual estimator, and we validate all results on synthetic LP, QP, and integer programming instances and on a rolling-window portfolio experiment using ten years of CRSP equity data.