arXivDaily: daily arXiv digest, updated Monday to Friday
2601.18685 2026-05-25 math.HO cs.LG

LLAMA LIMA: A Living Meta-Analysis on the Effects of Generative AI on Learning Mathematics

Anselm Strohmaier, Samira Bödefeld, Oliver Straser, Frank Reinhold

Affiliations: University of Education Freiburg, Institute of Mathematics Education

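AI summary: This paper presents a living meta-analysis (LLAMA LIMA) of the effects of generative AI on mathematics learning, addressing the problem that research in this field moves so fast that conventional reviews quickly become outdated. Following the PRISMA-LSR guidelines, the authors continuously update the literature base, use a Bayesian multilevel meta-regression model to handle nested and cumulative data, and publish updated results at regular intervals. The third version includes 24 studies; the results show a positive effect of generative AI on mathematics learning, with larger benefits when it complements rather than replaces teachers' instruction.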

Comments This is a living publication. See the first page of the PDF for more information

English abstract

The capabilities of generative AI in mathematics education are rapidly evolving, posing significant challenges for research to keep pace. Research syntheses remain scarce and risk being outdated by the time of publication. To address this issue, we present a Living Meta-Analysis (LIMA) on the effects of generative AI-based interventions for learning mathematics. Following PRISMA-LSR guidelines, we continuously update the literature base, apply a Bayesian multilevel meta-regression model to account for nested and cumulative data, and publish updated versions on a preprint server at regular intervals. This paper reports results from the third version, including 24 studies, 3 of which were newly included since the second version. The analyses indicate a positive effect (g = 0.40) with a wide credible interval [0.14, 0.67], reflecting the still limited evidence base. Results indicate no publication bias. Moderator analyses indicate moderate evidence that generative AI is more beneficial when it complements regular instruction rather than replacing teachers.
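The pooled effect above is reported as a standardized mean difference g. As a rough illustration of how such numbers are produced, the sketch below computes Hedges' g for one study and pools several studies with a DerSimonian-Laird random-effects model. This is a deliberately simpler, frequentist stand-in: the paper uses a Bayesian multilevel meta-regression that also models nesting, which this sketch does not. Function names are our own.

```python
import math

def hedges_g(m_treat, m_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Bias-corrected standardized mean difference (Hedges' g) and its sampling variance."""
    df = n_treat + n_ctrl - 2
    s_pooled = math.sqrt(((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2) / df)
    d = (m_treat - m_ctrl) / s_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_treat + n_ctrl) / (n_treat * n_ctrl) + d**2 / (2 * (n_treat + n_ctrl))
    return j * d, j**2 * var_d

def dersimonian_laird(effects, variances, z=1.96):
    """Random-effects pooled estimate, between-study variance tau^2, and a 95% interval."""
    w = [1 / v for v in variances]
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # weights inflated by heterogeneity
    pooled = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, tau2, (pooled - z * se, pooled + z * se)
```

A wide interval such as the reported [0.14, 0.67] arises naturally in this framework when there are few studies or large between-study heterogeneity (large tau^2).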

2601.09600 2026-05-25 cs.CY cs.AI cs.HC cs.IR

Information Access of the Oppressed: Freirean Design for Emancipatory Information Access

Bhaskar Mitra, Nicola Neophytou, Sireesh Gururaja

Affiliations: Independent Researcher; Carnegie Mellon University

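AI summary: This paper examines how, in the face of authoritarian control over online information access platforms, emancipatory information access systems can be designed using Paulo Freire's theory of emancipatory education. It challenges the traditional dichotomy between technology developers and users, arguing that "Freirean Design" should turn platforms into tools that community members jointly build and use in their struggles, thereby achieving structural emancipation.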

English abstract

Online information access (IA) platforms are targets of authoritarian capture. We explore the question of how to safeguard our platforms and ensure emancipatory outcomes through the lens of Paulo Freire's theories of emancipatory pedagogy. Freire's theories provide a radically different lens for exploring IA's sociotechnical concerns relative to the currently dominant frames of fairness, accountability, and transparency. We make explicit, with the intention to challenge, the technologist-user dichotomy in IA platform development that mirrors the teacher-student relation in Freire's analysis. By extending Freire's analysis to IA, we critique the technologists-as-liberator frame where it is the burden of (altruistic) technologists to mitigate the risks of emerging technologies for marginalized communities. Instead, we advocate for Freirean Design whose goal is to structurally expose the platform for co-option and co-construction by community members in aid of their emancipatory struggles.

2601.03260 2026-05-25 cs.CE cs.CL

SciNet: Evaluating AI Agents in Relation-Aware Scientific Literature Retrieval

Chenyang Shao, Fengli Xu, Yong Li

Affiliations: Department of Electronic Engineering, BNRist, Tsinghua University, Beijing, China; Zhongguancun Academy

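AI summary: The study introduces SciNet, the first relation-aware dataset for scientific literature retrieval, to address the weakness of existing AI retrieval agents in understanding the complex relationships among scientific papers. SciNet is built on a meta-database of 269 million papers and contains 8,940 carefully designed tasks covering several levels of relational understanding, from retrieving individual papers to reconstructing paths of scientific evolution. Experiments show that existing retrieval agents often achieve below 20% accuracy on relation-aware tasks, while agents equipped with SciNet improve literature review quality by 25.3%, underscoring the importance of relation-aware retrieval for deeper scientific insight.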

English abstract

AI agents have seen widespread adoption in information retrieval for scientific research, giving rise to tools such as Deep Research. However, existing retrieval agents mainly rely on keyword- or embedding-based methods. While effective at capturing content-level similarities, they struggle to understand complex relational networks among scientific papers, such as identifying corroborating or conflicting studies and tracing technological lineages. This fundamental limitation often results in fragmented knowledge structures, misinterpreted research sentiment, and ineffective modeling of collective scientific progress. To address this limitation, we introduce SciNet, the first Scientific Network relation-aware dataset for information retrieval agents. Built on a meta-database of 269 million papers across 7 disciplines and containing 8,940 carefully designed tasks, SciNet systematically captures three levels of relational understanding: ego-centric retrieval of papers with novel knowledge structures, pairwise identification of scholarly relationships, and path-wise reconstruction of scientific evolution. Extensive evaluation of three categories of retrieval agents shows that their accuracy on relation-aware tasks often falls below 20%, highlighting a fundamental shortcoming of current retrieval paradigms. Importantly, in a downstream literature review application, agents empowered with SciNet achieve a 25.3% improvement in review quality, highlighting the critical value of relation-aware retrieval for deepening scientific insights. We publicly release SciNet at https://github.com/tsinghua-fib-lab/SciNet to support future research.
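The "path-wise reconstruction of scientific evolution" task asks an agent to recover a chain of papers linking an earlier work to a later one. A minimal sketch of the underlying graph operation, assuming the relations are available as a directed "builds-on" adjacency map (the data format, paper IDs, and function name are hypothetical, not SciNet's schema): breadth-first search returns the shortest lineage.

```python
from collections import deque

def lineage_path(builds_on, source, target):
    """Shortest chain of 'builds-on' links from an earlier paper to a later one (BFS).

    builds_on: dict mapping a paper ID to the IDs of later papers that build on it.
    Returns the list of paper IDs on the path, or None if target is unreachable.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for successor in builds_on.get(node, []):
            if successor not in parent:
                parent[successor] = node
                queue.append(successor)
    return None
```

Keyword or embedding retrieval scores each paper in isolation, so it has no direct way to produce such an ordered chain; that gap is what the relation-aware tasks probe.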

2512.18470 2026-05-25 cs.SE cs.AI cs.MA

SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

Tue Le, Minh V. T. Thai, Dung Nguyen Manh, Huy Phan Nhat, Nghi D. Q. Bui

Affiliations: FPT Software AI Center; School of Computing and Information Systems; University of Melbourne; Center of AI Research; VinUniversity

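AI summary: Existing benchmarks for AI coding agents focus mainly on single tasks such as fixing a bug or adding a small feature, whereas real-world software engineering is a long-term evolutionary process involving coordination across many files and many iterations. The authors therefore propose SWE-EVO, a benchmark built from the release notes of seven mature open-source Python projects. It contains 48 tasks that require multi-step modifications touching 21 files on average, validated by large test suites. Experiments show that current agents still fall well short on long-horizon, multi-file tasks; the study also proposes Fix Rate, a new metric for measuring partial progress.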

English abstract

Existing benchmarks for AI coding agents focus on isolated, single-issue tasks such as fixing a bug or adding a small feature. However, real-world software engineering is a long-horizon endeavor: developers interpret high-level requirements, coordinate changes across many files, and evolve codebases over multiple iterations while preserving functionality. We introduce SWE-EVO, a benchmark for this long-horizon software evolution challenge. Constructed from release notes of seven mature open-source Python projects, SWE-EVO comprises 48 tasks requiring multi-step modifications spanning an average of 21 files, validated against test suites averaging 874 tests per instance. Experiments reveal a striking capability gap: GPT-5.4 with OpenHands achieves only 25% on SWE-EVO versus 72.80% achieved by GPT-5.2 on SWE-Bench Verified, showing that current agents struggle with sustained, multi-file reasoning. We also propose Fix Rate, a metric capturing partial progress on these complex, long-horizon tasks.
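The abstract does not define Fix Rate precisely. The sketch below shows one plausible way to score partial progress, under our own assumptions: a task's score is the fraction of its target (fail-to-pass) tests that pass after the agent's patch, and any regression in previously passing tests zeroes the score. Both the definition and the function names are hypothetical.

```python
def fix_rate(fail_to_pass, pass_to_pass):
    """Hypothetical partial-progress score for one task instance.

    fail_to_pass: {test_id: passed_after_patch} for tests the release is meant to fix.
    pass_to_pass: {test_id: passed_after_patch} for tests that passed before the patch.
    """
    # Assumption: breaking existing functionality voids any partial progress.
    if not all(pass_to_pass.values()):
        return 0.0
    if not fail_to_pass:
        return 1.0
    return sum(fail_to_pass.values()) / len(fail_to_pass)

def benchmark_scores(instances):
    """Mean fix rate and resolved rate (all target tests pass, no regressions)."""
    rates = [fix_rate(f2p, p2p) for f2p, p2p in instances]
    resolved = [rate == 1.0 for rate in rates]
    return sum(rates) / len(rates), sum(resolved) / len(resolved)
```

The point of a graded score like this is that an agent solving 3 of 4 target behaviors is distinguished from one solving none, which a binary resolved rate cannot show.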

2512.15436 2026-05-25 stat.ML cs.LG

Online Partitioned Local Depth for semi-supervised applications

John D. Foley, Justin T. Lee

Affiliations: Metron, Inc.

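AI summary: This paper proposes online PaLD, a variant of the partitioned local depth (PaLD) algorithm adapted to online settings, mainly for semi-supervised prediction. Once the cohesion network of a reference dataset has been precomputed (an $O(n^3)$ step), the algorithm extends it to each new data point in $O(n^2)$ time, improving computational efficiency. The study illustrates the potential of online PaLD through anomaly detection and semi-supervised classification on health-care datasets, broadening the range of applications of the PaLD framework.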

Comments Added theorem statements and refined results; 21 pages, 2 figures

English abstract

We introduce an extension of the partitioned local depth (PaLD) algorithm that is adapted to online applications such as semi-supervised prediction. PaLD is best known for unsupervised, parameter-free clustering, but its robustness is based on triples of data points, making exact analysis computationally expensive. Research is ongoing to improve the scalability of the underlying discrete algorithm and expand the breadth of PaLD's applications. The new algorithm we present, online PaLD, is well-suited to situations where it is possible to pre-compute a cohesion network from a reference dataset. After $O(n^3)$ steps to construct a queryable data structure, online PaLD can extend the cohesion network to a new data point in $O(n^2)$ time. Our approach complements previous speed-up approaches based on approximation and parallelism. In practical terms, online PaLD makes larger datasets accessible to exact analysis with a relatively simple implementation. We present applications to online anomaly detection and semi-supervised classification for health-care datasets as initial illustrations of online PaLD's potential to expand applications of the PaLD framework.
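To see where the $O(n^3)$ cost comes from, here is a sketch of the exact (offline) PaLD cohesion matrix as we understand the standard definition: for every pair (x, y), the local focus is the set of points at least as close to x or to y as x and y are to each other, and each point w in that focus gives x a share of support if w is closer to x than to y (ties split in half). This is background only; it is not the paper's online algorithm or its queryable data structure.

```python
import numpy as np

def pald_cohesion(D):
    """Exact PaLD cohesion matrix from a distance matrix D, O(n^3) time."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    C = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if y == x:
                continue
            d_xy = D[x, y]
            # Local focus of the pair (x, y): points within d_xy of x or of y.
            focus = (D[:, x] <= d_xy) | (D[:, y] <= d_xy)
            # Support for x from each focus member w: 1 if closer to x, 1/2 on ties.
            support = (D[:, x] < D[:, y]) + 0.5 * (D[:, x] == D[:, y])
            C[x] += focus * support / focus.sum()
    return C / (n - 1)
```

Each of the $n(n-1)$ ordered pairs scans all $n$ points, which gives the cubic cost; adding one new point changes local foci throughout, which is why a dedicated online update is nontrivial.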

2510.11195 2026-05-25 cs.CR cs.AI

RAG-Pull: Turning Retrieval into a Code-Injection Channel via Invisible Unicode Perturbations

Aritra Dhar, Vasilije Stambolic, Lukas Cavigelli

Affiliations: Computing System Labs, Huawei Technologies Switzerland AG; BKW Energie AG

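AI summary: This paper proposes RAG-Pull, a new black-box attack on retrieval-augmented generation (RAG) systems that inserts invisible Unicode perturbations into queries or code repositories to steer retrieval toward malicious code, thereby breaking the model's safety alignment. The study finds that small perturbations to either the query or the target code alone can substantially shift retrieval results, while combining the two achieves near-perfect attack success. The method exposes a security vulnerability of RAG systems and offers a new perspective for research on the security of large language models.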

English abstract

Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of LLM responses, reduces hallucination, and eliminates the need for model retraining. It does so by adding external data into the LLM's context. We develop a new class of black-box attack, RAG-Pull, that inserts hidden UTF characters into queries or external code repositories, redirecting retrieval toward malicious code, thereby breaking the models' safety alignment. We observe that query and code perturbations alone can shift retrieval toward attacker-controlled snippets, while combined query-and-target perturbations achieve near-perfect success. Once retrieved, these snippets introduce exploitable vulnerabilities such as remote code execution and SQL injection. RAG-Pull's minimal perturbations can alter the model's safety alignment and increase preference towards unsafe code, therefore opening up a new class of attacks on LLMs.
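RAG-Pull hides its perturbations in characters that render as nothing. The snippet below is our own illustration, not the authors' attack code: it shows that a single zero-width space yields a string that looks identical but compares (and tokenizes) differently, plus one crude sanitization step that drops Unicode format characters (general category Cf) before indexing or querying. Whether this blocks RAG-Pull specifically is an assumption we have not tested.

```python
import unicodedata

def find_invisible(text):
    """Return (index, character name) for every Unicode format character (category Cf)."""
    return [(i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
            for i, ch in enumerate(text) if unicodedata.category(ch) == "Cf"]

def strip_invisible(text):
    """Drop all Cf characters: zero-width space, joiners, BOM, tag characters, etc."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

clean = "def login(user):"
perturbed = "def lo\u200bgin(user):"  # prints the same, but is a different string
print(clean == perturbed, find_invisible(perturbed))
```

Two caveats: stripping zero-width joiners breaks emoji sequences and some scripts that rely on them, and some invisible code points (for example variation selectors) are not in category Cf, so a real filter needs a broader allow/deny policy.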

2510.09136 2026-05-25 cs.IR cs.AI

Controlled Personalization in Legacy Media Online Services: A Case Study in News Recommendation

Marlene Holzleitner, Stephan Leitner, Hanna Lind Jorgensen, Christoph Schmitz, Jacob Welander, Dietmar Jannach

Affiliations: University of Klagenfurt; University of Bergen

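AI summary: This paper studies the effects of a "controlled personalization" recommendation strategy used by legacy news media on their online platforms, which aims to balance technology-driven content recommendation with core editorial values. Through an A/B test on the website of a major Norwegian legacy news organization, the study finds that even modest personalization significantly increases click-through rates and reduces navigation effort, while improving content diversity and catalog coverage and reducing popularity bias. The results indicate that controlled personalization can meet user needs while preserving editorial goals, offering a viable path for legacy media to adopt personalization technology.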

English abstract

Personalized news recommendations have become a standard feature of large news aggregation services, optimizing user engagement through automated content selection. In contrast, legacy news media often approach personalization cautiously, striving to balance technological innovation with core editorial values. As a result, online platforms of traditional news outlets typically combine editorially curated content with algorithmically selected articles - a strategy we term controlled personalization. In this industry article, we evaluate the effectiveness of controlled personalization through an A/B test conducted on the website of a major Norwegian legacy news organization. Our findings indicate that even a modest level of personalization yields substantial benefits. Specifically, we observe that users exposed to personalized content demonstrate higher click-through rates and reduced navigation effort, suggesting improved discovery of relevant content. Moreover, our analysis reveals that controlled personalization contributes to greater content diversity and catalog coverage while also reducing popularity bias. Overall, our results suggest that controlled personalization can successfully align user needs with editorial goals, offering a viable path for legacy media to adopt personalization technologies while upholding journalistic values.
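The abstract does not say which statistical test underlies the click-through comparison. As a generic illustration of how an A/B difference in CTR is often checked, here is a two-sided two-proportion z-test; the counts in any call are hypothetical, and the test assumes independent impressions, which repeated visits by the same user can violate.

```python
import math

def two_proportion_ztest(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rate between variants A and B.

    Returns (CTR difference B - A, z statistic, p-value).
    """
    ctr_a, ctr_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)  # CTR under H0: no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_b - ctr_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return ctr_b - ctr_a, z, p_value
```

For example, 100 clicks per 1,000 views in the control arm against 150 per 1,000 in the personalized arm gives z of about 3.4, a difference unlikely to be chance under these assumptions.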

2510.04406 2026-05-25 stat.ML cs.LG

Decomposition-Based Modular Conformal Prediction for Two-Stage Modeling

William Zhang, Saurabh Amin, Georgia Perakis

Affiliations: Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, USA; Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA, USA; Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA

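AI summary: This paper proposes a decomposition-based modular conformal prediction framework for uncertainty quantification in two-stage modeling. The method decomposes the overall prediction residual into stage-specific components, so that sources of uncertainty can be identified and attributed to individual model stages. With a parameter selection strategy based on family-wise error rate (FWER) control and an extension to non-stationary settings, the method achieves better coverage and diagnostic capability than standard conformal prediction under structural, stage-wise shifts.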

Comments 11 pages (37 with appendix), 15 figures

English abstract

Conformal prediction offers finite-sample coverage guarantees under minimal assumptions. However, existing methods treat the entire modeling process as a black box, overlooking opportunities to exploit and understand modular structure. We introduce a conformal prediction framework for two-stage sequential models, where an upstream predictor generates intermediate representations for a downstream model. By decomposing the overall prediction residual into stage-specific components, our method enables practitioners to attribute uncertainty to specific pipeline stages. We develop a risk-controlled parameter selection procedure using family-wise error rate (FWER) control to calibrate stage-wise scaling parameters, and introduce an adaptive extension for non-stationary settings. Experiments on synthetic distribution shifts, as well as real-world supply chain and stock market data, demonstrate that our approach improves coverage under structural, stage-wise shifts compared to standard conformal methods, while identifying stage-wise error contributions. This framework offers diagnostic advantages and robust coverage that standard conformal methods lack.
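The residual decomposition can be sketched concretely. Assuming the true intermediate value h is observed on calibration data, the total residual y - f(h_pred) splits into an upstream part f(h) - f(h_pred) and a downstream part y - f(h). The sketch calibrates a split-conformal radius for each part and adds them; a Bonferroni (union) bound then gives miscoverage at most alpha_up + alpha_down. This simple additive rule is our own stand-in, not the paper's FWER-based calibration of stage-wise scaling parameters, and the function names are hypothetical.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
    return np.inf if k > len(scores) else scores[k - 1]

def stagewise_radius(y, h_true, h_pred, downstream, alpha_up, alpha_down):
    """Per-stage conformal radii and a combined interval half-width.

    y - f(h_pred) = [f(h_true) - f(h_pred)] + [y - f(h_true)], so the absolute
    total residual is at most the sum of the two absolute stage residuals.
    """
    r_up = np.abs(downstream(h_true) - downstream(h_pred))  # upstream error, propagated
    r_down = np.abs(y - downstream(h_true))                 # downstream error
    q_up = conformal_quantile(r_up, alpha_up)
    q_down = conformal_quantile(r_down, alpha_down)
    # Union bound: P(|total| > q_up + q_down) <= alpha_up + alpha_down
    return q_up, q_down, q_up + q_down
```

Comparing q_up with q_down is the diagnostic step: it says which stage dominates the interval width, which a single black-box residual cannot reveal.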

2509.22271 2026-05-25 cs.HC cs.RO

Human Autonomy and Sense of Agency in Human-Robot Interaction: A Systematic Literature Review

Felix Glawe, Tim Schmeckel, Philipp Brauner, Martina Ziefle

Affiliations: Chair for Communication Science

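AI summary: This paper systematically reviews 22 empirical studies published between 2011 and 2024 on the importance of human autonomy and sense of agency in human-robot interaction and the factors that influence them. Through thematic synthesis, it identifies five clusters of potentially influential factors: robot adaptiveness, communication style, anthropomorphism, presence of a robot, and individual differences. The review notes that current empirical evidence remains fragmented and calls for unified conceptual definitions and more qualitative research to support HRI design grounded in ethical and psychological principles.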

English abstract

Human autonomy and sense of agency are increasingly recognised as critical for user well-being, motivation, and the ethical deployment of robots in human-robot interaction (HRI). Given the rapid development of artificial intelligence, robot capabilities and their potential to function as colleagues and companions are growing. This systematic literature review synthesises 22 empirical studies selected from an initial pool of 728 articles published between 2011 and 2024. Articles were retrieved from major scientific databases and identified based on empirical focus and conceptual relevance, namely, how to preserve and promote human autonomy and sense of agency in HRI. Derived through thematic synthesis, five clusters of potentially influential factors are revealed: robot adaptiveness, communication style, anthropomorphism, presence of a robot, and individual differences. Measured through psychometric scales or the intentional binding paradigm, perceptions of autonomy and agency varied across industrial, educational, healthcare, care, and hospitality settings. The review underscores the theoretical differences between the two concepts, even though their use in HRI remains entangled. Despite increasing interest, the current body of empirical evidence remains limited and fragmented, underscoring the necessity for standardised definitions, more robust operationalisations, and further exploratory and qualitative research. By identifying existing gaps and highlighting emerging trends, this review contributes to the development of human-centered, autonomy-supportive robot design strategies that uphold ethical and psychological principles, ultimately supporting well-being in human-robot interaction.

2509.06858 2026-05-25 physics.soc-ph cs.AI nlin.AO

Disentangling Interaction and Bias Effects in Opinion Dynamics of Large Language Models

Vincent C. Brockers, David A. Ehrlich, Viola Priesemann

Affiliations: Max-Planck-Institute for Dynamics and Self-Organization; Institute for the Dynamics of Complex Systems; University of Göttingen; Campus Institute for Dynamics of Biological Networks

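AI summary: The study examines how, when large language models are used to simulate human opinion dynamics, the effects of genuine interaction are masked by systematic biases. It proposes a Bayesian framework to disentangle and quantify three biases (topic bias, agreement bias, and anchoring bias) and applies it to multi-turn dialogue experiments with several models on different topics. The results show that opinions tend to converge quickly, that the influence of both biases and interaction decays over time, and that bias profiles differ across models. The study also shows that fine-tuning shifts a model's opinion attractor, providing quantitative tools for assessing the promise and limitations of language models as simulators of human behavior.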

English abstract

Large Language Models are increasingly used to simulate human opinion dynamics, yet the effect of genuine interaction is often obscured by systematic biases. We develop a Bayesian framework to disentangle and quantify three such biases: (i) a topic bias toward the LLM's default stance; (ii) an agreement bias favoring agreement with the prompted statement irrespective of the question; and (iii) an anchoring bias toward the initiating agent's stance. We apply this framework to various LLMs that performed multi-step dialogues on 12 different questions from climate change and societal justice to music preferences. We find that opinion trajectories tend to quickly converge to a shared attractor, with the influence of both interaction and biases decaying over time, and with the impact of biases differing between LLMs. In addition, we show that fine-tuning an LLM on different sets of strongly opinionated statements (including misinformation) shifts the opinion attractor correspondingly. By exposing stark differences between LLMs and providing quantitative tools for comparing interaction and bias contributions to opinion shifts in LLM agent discussions, our approach highlights both promises and pitfalls of using LLMs as proxies for human behavior.
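Why would trajectories converge to a shared attractor shaped by the biases? A deliberately simple deterministic toy model (our own, not the paper's Bayesian framework) makes the intuition concrete: two agents each move toward the other (interaction, weight w) and toward three bias targets, namely a default stance (topic bias), full agreement (agreement bias), and the initiator's opening stance (anchoring bias). All weights, targets, and the opinion scale (-1 = disagree, +1 = agree) are hypothetical.

```python
def simulate(o_a, o_b, w, biases, steps):
    """Simultaneous linear updates for two agents.

    biases: list of (strength, target) pairs; assumes small strengths so the
    iteration is stable (0 < sum of strengths < 2 and |1 - 2w - sum| < 1).
    """
    for _ in range(steps):
        pull_a = sum(k * (target - o_a) for k, target in biases)
        pull_b = sum(k * (target - o_b) for k, target in biases)
        o_a, o_b = o_a + w * (o_b - o_a) + pull_a, o_b + w * (o_a - o_b) + pull_b
    return o_a, o_b

def attractor(biases):
    """Fixed point shared by both agents: strength-weighted mean of the bias targets."""
    return sum(k * t for k, t in biases) / sum(k for k, _ in biases)

# Hypothetical setup: initiator starts at 0.9, the other agent at -0.5.
biases = [(0.1, 0.2),   # topic bias toward a default stance of 0.2
          (0.05, 1.0),  # agreement bias toward +1
          (0.05, 0.9)]  # anchoring bias toward the initiator's opening stance
print(simulate(0.9, -0.5, 0.3, biases, 200), attractor(biases))
```

In this toy, interaction only shrinks the gap between the agents, while the location of the final consensus is set entirely by the biases, which is one way interaction effects can be obscured.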

2507.09330 2026-05-25 physics.flu-dyn cs.LG physics.comp-ph

WellPINN: Accurate Well Representation for Transient Fluid Pressure Diffusion in Subsurface Reservoirs with Physics-Informed Neural Networks

Linus Walter, Qingkai Kong, Sara Hanson-Hedgecock, Víctor Vilarrasa

Affiliations: Global Change Research Group (GCRG), IMEDEA, CSIC-UIB

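AI summary: This paper proposes WellPINN, a modeling approach based on physics-informed neural networks (PINNs) for representing transient fluid pressure diffusion around wells in subsurface reservoirs more accurately. The method trains several PINN models in sequence and progressively shrinks the equivalent well radius until it matches the actual well dimensions, addressing the inaccurate near-well pressure predictions of existing methods in the early stage of injection. WellPINN accurately infers fluid pressure over the entire injection period, substantially improving the potential of PINNs for inverse modeling and operational scenario simulation.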

English abstract

Accurate representation of wells is essential for reliable reservoir characterization and simulation of operational scenarios in subsurface flow models. Physics-informed neural networks (PINNs) have recently emerged as a promising method for reservoir modeling, offering seamless integration of monitoring data and governing physical equations. However, existing PINN-based studies face major challenges in capturing fluid pressure near wells, particularly during the early stage after injection begins. To address this, we propose WellPINN, a modeling workflow that combines the outputs of multiple sequentially trained PINN models to accurately represent wells. This workflow iteratively approximates the radius of the equivalent well to match the actual well dimensions by decomposing the domain into stepwise shrinking subdomains with a simultaneously reducing equivalent well radius. Our results demonstrate that sequential training of superimposing networks around the pumping well is the first workflow that focuses on accurate inference of fluid pressure from pumping rates throughout the entire injection period, significantly advancing the potential of PINNs for inverse modeling and operational scenario simulations. All data and code for this paper will be made openly available at https://github.com/linuswalter/WellPINN.
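The near-well difficulty can be seen in the classic Theis line-source solution for transient pressure (head) change around a well pumping or injecting at a constant rate: the response grows roughly like -ln r as r approaches 0, so a network must resolve very steep gradients at the real well radius. The sketch below is background we supply, not part of WellPINN: it evaluates the Theis solution and builds a hypothetical geometric sequence of equivalent well radii of the "stepwise shrinking" kind the abstract describes (the shrink factor and function names are our assumptions).

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp1(u, terms=60):
    """Exponential integral E1(u), i.e. the Theis well function W(u).

    Power series; accurate for 0 < u <= ~5, which covers near-well radii.
    """
    series, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= -u / k  # term == (-u)**k / k!
        series += term / k
    return -EULER_GAMMA - math.log(u) - series

def theis_head_change(r, t, rate, transmissivity, storativity):
    """Head change (m) at radius r (m), time t (s), constant rate (m^3/s)."""
    u = r**2 * storativity / (4 * transmissivity * t)
    return rate / (4 * math.pi * transmissivity) * exp1(u)

def radius_schedule(r_start, r_well, factor=0.5):
    """Hypothetical geometric sequence of equivalent well radii ending at r_well."""
    radii = [r_start]
    while radii[-1] > r_well:
        radii.append(max(radii[-1] * factor, r_well))
    return radii
```

A sequential scheme can fit each stage on a coarser radius, where gradients are milder, and pass the result to the next, smaller radius; how WellPINN superimposes the networks is described in the paper, not here.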

2507.06252 2026-05-25 cs.CR cs.AI cs.LG

False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems

Samaneh Shafee, Alysson Bessani, Pedro M. Ferreira

Affiliations: Faculty of Sciences, University of Lisbon; CIENCES, University of Lisbon

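AI summary: This paper studies the impact of adversarial attacks built with large language models (LLMs) on text-based cyber threat intelligence (CTI) systems. It analyzes three attack types (evasion, flooding, and poisoning) and exposes the vulnerability of CTI systems that ingest text from open sources. In particular, by generating fake text, attackers can mislead classifiers, degrade system performance, and disrupt functionality; evasion attacks are especially critical in the CTI pipeline because they precede and enable the other attacks.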

Journal ref Future Generation Computer Systems, 2026

English abstract

Cyber Threat Intelligence (CTI) has emerged as a vital complementary approach that operates in the early phases of the cyber threat lifecycle. CTI involves collecting, processing, and analyzing threat data to provide a more accurate and rapid understanding of cyber threats. Due to the large volume of data, automation through Machine Learning (ML) and Natural Language Processing (NLP) models is essential for effective CTI extraction. These automated systems leverage Open Source Intelligence (OSINT) from sources like social networks, forums, and blogs to identify Indicators of Compromise (IoCs). Although prior research has focused on adversarial attacks on specific ML models, this study expands the scope by investigating vulnerabilities within various components of the entire CTI pipeline and their susceptibility to adversarial attacks. These vulnerabilities arise because they ingest textual inputs from various open sources, including real and potentially fake content. We analyse three types of attacks against CTI pipelines, namely evasion, flooding, and poisoning, and assess their impact on the system's information selection capabilities. Regarding fake text generation specifically, the work demonstrates how adversarial text generation techniques can create fake cybersecurity and cybersecurity-like text that misleads classifiers, degrades performance, and disrupts system functionality. The focus is primarily on the evasion attack, as it precedes and enables flooding and poisoning attacks within the CTI pipeline.
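To make "evasion" concrete: the attacker rewrites threat-related text so that the pipeline's relevance classifier no longer flags it. The paper does this with LLM-based text generation; the toy below swaps in a much simpler technique, greedy word substitution against a made-up linear keyword scorer, purely to show the mechanics. The weights, substitutions, and threshold are all hypothetical.

```python
# Toy linear "is this CTI-relevant?" scorer; weights are made up for illustration.
WEIGHTS = {"exploit": 2.0, "cve": 2.5, "malware": 2.0, "ransomware": 2.5,
           "phishing": 1.5, "recipe": -2.0, "football": -1.5}
# Hypothetical innocuous-looking replacements an attacker might choose.
SUBSTITUTES = {"exploit": "trick", "cve": "bug", "malware": "tool",
               "ransomware": "lockware", "phishing": "outreach"}

def score(text):
    return sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())

def is_threat(text, threshold=2.0):
    return score(text) >= threshold

def greedy_evade(text, threshold=2.0):
    """Replace the most incriminating words first until the classifier stops firing."""
    words = text.split()
    order = sorted(range(len(words)), key=lambda i: -WEIGHTS.get(words[i].lower(), 0.0))
    for i in order:
        if not is_threat(" ".join(words), threshold):
            break
        replacement = SUBSTITUTES.get(words[i].lower())
        if replacement is not None:
            words[i] = replacement
    return " ".join(words)

print(greedy_evade("New ransomware exploit targets unpatched servers"))
```

A human analyst may still read the rewritten sentence as a threat report, but the automated filter drops it; that silent miss is what makes evasion a precondition for the flooding and poisoning stages.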

2507.05311 2026-05-25 cs.IR cs.AI

PLACE: Prompt Learning for Attributed Community Search in Large Graphs

Shuheng Fang, Kangfei Zhao, Rener Zhang, Yu Rong, Jeffrey Xu Yu

Affiliations: Shenzhen Institute of Computing Sciences; Beijing Institute of Technology; Chinese University of Hong Kong; The Hong Kong University of Science and Technology (Guangzhou)

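AI summary: This paper proposes PLACE, a graph prompt learning framework for attributed community search. Inspired by prompt tuning in natural language processing, the method inserts learnable prompt tokens into the graph to form a prompt-augmented graph that strengthens connections among query-relevant nodes, helping the graph neural network identify structural cohesiveness and attribute similarity more effectively. Experiments on nine real-world graphs show that PLACE clearly outperforms existing methods, with F1 scores 22% higher on average.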

Comments 14 pages, 9 figures

Journal ref KDD 2026

English abstract

In this paper, we propose PLACE (Prompt Learning for Attributed Community Search), an innovative graph prompt learning framework for attributed community search (ACS). Inspired by prompt tuning in Natural Language Processing (NLP), where learnable prompt tokens are inserted to contextualize NLP queries, PLACE integrates structural and learnable prompt tokens into the graph as a query-dependent refinement mechanism, forming a prompt-augmented graph. Within this prompt-augmented graph structure, the learned prompt tokens serve as a bridge that strengthens connections between graph nodes for the query, enabling the GNN to more effectively identify patterns of structural cohesiveness and attribute similarity related to the specific query. We employ an alternating training paradigm to optimize both the prompt parameters and the GNN jointly. Moreover, we design a divide-and-conquer strategy to enhance scalability, supporting the model to handle million-scale graphs. Extensive experiments on 9 real-world graphs demonstrate the effectiveness of PLACE for three types of ACS queries, where PLACE achieves F1 scores 22% higher on average than state-of-the-art methods.
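The core idea of a prompt-augmented graph can be sketched in a few lines of numpy, under our own assumptions: prompt tokens become extra nodes, each linked to the query node and its one-hop neighbours, and one mean-aggregation layer stands in for the GNN. The wiring rule, function names, and the fixed (untrained) prompt features are hypothetical; PLACE's actual token construction, alternating optimization, and divide-and-conquer scaling are not reproduced.

```python
import numpy as np

def prompt_augment(A, X, query, prompts):
    """Append prompt-token nodes to a graph (hypothetical wiring rule).

    A: (n, n) adjacency, X: (n, d) node features, query: query node index,
    prompts: (k, d) prompt-token features (trainable parameters in a real model).
    Each prompt token is linked to the query node and to the query's neighbours.
    """
    n, k = A.shape[0], prompts.shape[0]
    A_aug = np.zeros((n + k, n + k))
    A_aug[:n, :n] = A
    anchors = np.flatnonzero(A[query]).tolist() + [query]
    for j in range(k):
        for node in anchors:
            A_aug[n + j, node] = A_aug[node, n + j] = 1.0
    return A_aug, np.vstack([X, prompts])

def mean_aggregate(A, X):
    """One message-passing step: average each node's own and neighbours' features."""
    A_hat = A + np.eye(A.shape[0])
    return (A_hat @ X) / A_hat.sum(axis=1, keepdims=True)
```

Because the prompt tokens sit next to every node around the query, messages passed through them mix in query-specific parameters; training those parameters is what lets the model adapt the neighbourhood to each query.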

2507.05064 2026-05-25 stat.ML cs.LG stat.ME

Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes

Tim Gyger, Reinhard Furrer, Fabio Sigrist

Affiliations: Institute of Financial Services; Lucerne University of Applied Sciences and Arts; University of Zurich; Seminar for Statistics, ETH Zurich

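AI summary: This paper proposes the VIF approximation, a full-scale approximation for Gaussian processes that combines the strengths of global inducing points and local Vecchia approximations, to overcome the computational bottleneck of Gaussian processes on large datasets. A correlation-based neighbor-finding strategy makes the Vecchia approximation of the residual process more effective and is computed efficiently with a modified cover tree algorithm. The framework is also extended to non-Gaussian likelihoods through iterative methods that greatly reduce computational cost, and its advantages in computational efficiency, accuracy, and numerical stability are demonstrated on simulated and real-world datasets.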

English abstract

Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that substantially reduce computational costs for training and prediction by several orders of magnitude compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives. All methods are implemented in the open source C++ library GPBoost with high-level Python and R interfaces.
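For readers unfamiliar with the Vecchia side of VIF, here is a plain Vecchia log-likelihood for a 1D Gaussian process: the joint density is replaced by a product of conditionals, each conditioning only on up to m previously ordered points. This sketch uses ordinary Euclidean nearest-previous neighbors on the full process with an exponential kernel; it does not include VIF's inducing points, its correlation-based neighbor search on the residual process, or the cover tree, and it is not GPBoost's API. With m equal to n - 1 it reproduces the exact log-likelihood, which is a handy sanity check.

```python
import numpy as np

def exp_kernel(x1, x2, sigma2=1.0, rho=0.5):
    """Exponential covariance for 1D inputs."""
    return sigma2 * np.exp(-np.abs(x1[:, None] - x2[None, :]) / rho)

def gaussian_logpdf(y, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def vecchia_loglik(x, y, m, nugget=1e-6):
    """Sum of log p(y_i | y_N(i)), N(i) = up to m nearest earlier points in the ordering."""
    total = 0.0
    for i in range(len(x)):
        prev = np.arange(i)
        nbrs = prev[np.argsort(np.abs(x[prev] - x[i]))[:m]]
        k_ii = exp_kernel(x[i:i + 1], x[i:i + 1])[0, 0] + nugget
        if len(nbrs) == 0:
            mean, var = 0.0, k_ii
        else:
            K_nn = exp_kernel(x[nbrs], x[nbrs]) + nugget * np.eye(len(nbrs))
            k_in = exp_kernel(x[i:i + 1], x[nbrs])[0]
            w = np.linalg.solve(K_nn, k_in)  # kriging weights
            mean, var = w @ y[nbrs], k_ii - w @ k_in
        total += gaussian_logpdf(y[i], mean, var)
    return total

def exact_loglik(x, y, nugget=1e-6):
    """Exact zero-mean GP log-likelihood via a Cholesky factorization, O(n^3)."""
    K = exp_kernel(x, x) + nugget * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)
```

Each conditional costs O(m^3), so the whole likelihood is O(n m^3) instead of O(n^3); the quality of the approximation depends on how well the chosen neighbors capture the correlation, which is what VIF's neighbor strategy targets.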

2506.04390 2026-05-25 cs.CR cs.AI

Through the Stealth Lens: Attention-Aware Defenses Against Poisoning in RAG

Sarthak Choudhary, Nils Palumbo, Ashish Hooda, Krishnamurthy Dj Dvijotham, Somesh Jha

Affiliations: University of Wisconsin-Madison; ServiceNow Research

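AI summary: This paper studies stealth-aware defenses against data poisoning attacks on retrieval-augmented generation (RAG) systems. The authors propose an attention-based defense that analyzes the language model's attention weights, introducing the Normalized Passage Attention Score (NPAS) and the Attention-Variance Filter (AV Filter) to detect and filter poisoned retrieved passages. Experiments show that the method substantially improves robustness and reveal how difficult it is to achieve truly stealthy poisoning attacks.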

Comments Accepted at ICML 2026

English Abstract

Retrieval-augmented generation (RAG) systems are vulnerable to attacks that inject poisoned passages into the retrieved context, even at low corruption rates. We show that existing attacks are not designed to be stealthy, allowing reliable detection and mitigation. We formalize a distinguishability-based security game to quantify stealth for such attacks. If a few poisoned passages control the response, they must bias the inference process more than the benign ones, inherently compromising stealth. This motivates analyzing intermediate signals of LLMs, such as attention weights, to approximate the influence of different passages on the response. Leveraging attention weights, we introduce the Normalized Passage Attention Score (NPAS) and a lightweight Attention-Variance Filter (AV Filter) that flags anomalous passages. Our method improves robustness, yielding up to ~20% higher accuracy than baseline defenses. We also develop adaptive attacks that attempt to conceal such anomalies, achieving up to a 35% success rate and underscoring the challenges of achieving true stealth in poisoning RAG systems.
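The abstract does not give formulas for NPAS or the AV Filter, so the sketch below assumes one plausible reading to make the idea concrete: sum the attention that response tokens pay to each passage's tokens, normalize across passages, and then, in the spirit of a variance filter, drop the most-attended passage while the variance of the renormalized scores exceeds a threshold. The function names, the aggregation over heads and layers, and the `max_var` threshold are hypothetical choices, not the paper's definitions.

```python
import numpy as np

def passage_attention_scores(attn, spans):
    """attn: (num_response_tokens, num_context_tokens) attention weights,
    assumed already averaged over heads and layers.
    spans: list of (start, end) token ranges, one per retrieved passage.
    Returns each passage's share of the total attention mass (sums to 1)."""
    mass = np.array([attn[:, s:e].sum() for s, e in spans])
    return mass / mass.sum()

def variance_filter(scores, max_var):
    """Hypothetical variance-based filter: repeatedly remove the passage with
    the highest share while the variance of the remaining renormalized shares
    exceeds max_var. Returns the indices of the kept passages."""
    keep = list(range(len(scores)))
    while len(keep) > 1:
        s = scores[keep] / scores[keep].sum()
        if s.var() <= max_var:
            break
        keep.pop(int(np.argmax(s)))  # position in keep == position in s
    return keep
```

The intuition matches the abstract's argument: a passage that dominates the response must attract disproportionate attention, which shows up as a high-variance score vector.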

2506.03530 2026-05-25 cs.MM cs.CL cs.CV

How Far Are We from Generating Missing Modalities with Foundation Models?

Guanzhou Ke, Bo Wang, Guoqing Chao, Weiming Hu, Shengfeng He

Affiliations * Institute of Data Science and Intelligent Decision Support, Beijing Jiaotong University; School of Computing and Information Systems, Singapore Management University; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Computer Science and Technology, Harbin Institute of Technology

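AI Summary This study examines the potential and limitations of foundation models for generating missing-modality data. It formalizes three paradigms for missing-modality reconstruction and systematically evaluates 42 model variants. Current foundation models are found to be weak at fine-grained semantic extraction and at robust validation of generated modalities, which leads to suboptimal outputs. To address this, the authors propose an agentic framework that uses dynamic, modality-aware mining strategies and a self-refinement mechanism; experiments report reductions of at least 14% in FID for missing image reconstruction and at least 10% in MER for missing text reconstruction.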

Comments T-PAMI

English Abstract

Multimodal foundation models have demonstrated impressive capabilities across diverse tasks. However, their potential as plug-and-play solutions for missing modality reconstruction remains underexplored. To bridge this gap, we identify and formalize three potential paradigms for missing modality reconstruction, and perform a comprehensive evaluation across these paradigms, covering 42 model variants in terms of reconstruction accuracy and adaptability to downstream tasks. Our analysis reveals that current foundation models often fall short in two critical aspects: (i) fine-grained semantic extraction from the available modalities, and (ii) robust validation of generated modalities. These limitations lead to suboptimal and, at times, misaligned generations. To address these challenges, we propose an agentic framework tailored for missing modality reconstruction. This framework dynamically formulates modality-aware mining strategies based on the input context, facilitating the extraction of richer and more discriminative semantic features. In addition, we introduce a self-refinement mechanism, which iteratively verifies and enhances the quality of generated modalities through internal feedback. Experimental results show that our method reduces FID for missing image reconstruction by at least 14% and MER for missing text reconstruction by at least 10% compared to baselines. Code is released at: https://github.com/Guanzhou-Ke/AFM2.

2506.00474 2026-05-25 eess.IV cs.CV

A European Multi-Center Breast Cancer MRI Dataset

Gustav Müller-Franzes, Lorena Escudero Sánchez, Nicholas Payne, Alexandra Athanasiou, Michael Kalogeropoulos, Aitor Lopez, Alfredo Miguel Soro Busto, Julia Camps Herrero, Nika Rasoolzadeh, Tianyu Zhang, Ritse Mann, Debora Jutz, Maike Bode, Christiane Kuhl, Yuan Gao, Wouter Veldhuis, Oliver Lester Saldanha, JieFu Zhu, Jakob Nikolas Kather, Daniel Truhn, Fiona J. Gilbert

Affiliations * University of Cambridge; MITERA Hospital; Ribera Salud Group; Radboud University Medical Center; University Hospital RWTH Aachen; University Medical Center Utrecht; University Hospital Carl Gustav Carus; EKFZ Technical University Dresden

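AI Summary This work releases a public European multi-center breast cancer MRI dataset to address the shortage of large, diverse data for AI-assisted breast MRI diagnosis. The dataset contains 741 breast MRI examinations from six clinical institutions in five European countries, covering malignant, benign, and non-lesion cases, and was acquired with different scanners and parameters, reflecting real clinical variability. The authors also run baseline benchmarks with a transformer-based model to illustrate potential uses of the dataset and to provide reference performance for future method comparisons.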

English Abstract

Early detection of breast cancer is critical for improving patient outcomes. While mammography remains the primary screening modality, magnetic resonance imaging (MRI) is increasingly recommended as a supplemental tool for women with dense breast tissue and those at elevated risk. However, the acquisition and interpretation of multiparametric breast MRI are time-consuming and require specialized expertise, limiting scalability in clinical practice. Artificial intelligence (AI) methods have shown promise in supporting breast MRI interpretation, but their development is hindered by the limited availability of large, diverse, and publicly accessible datasets. To address this gap, we present a publicly available, multi-centre breast MRI dataset collected across six clinical institutions in five European countries. The dataset comprises 741 examinations from women undergoing screening or diagnostic breast MRI and includes malignant, benign, and non-lesion cases. Data were acquired using heterogeneous scanners, field strengths, and acquisition protocols, reflecting real-world clinical variability. In addition, we report baseline benchmark experiments using a transformer-based model to illustrate potential use cases of the dataset and to provide reference performance for future methodological comparisons.

2411.08126 2026-05-25 stat.ML cs.LG

A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing

Zeyu Bian, Lan Wang, Zhengling Qi

Affiliations * Department of Statistics, Florida State University; Department of Management Science, University of Miami; Department of Decision Sciences, The George Washington University

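AI Summary This paper studies offline dynamic pricing when historical data do not cover the full price range, in particular the realistic case where the optimal price may never have been observed. The authors propose a nonparametric partial identification framework that uses the monotonicity of demand in price to bound the value of unobserved prices, and design two dynamic pricing policies: a pessimistic policy that maximizes worst-case revenue and an opportunistic policy that minimizes worst-case regret. The methods perform well in no-coverage settings and give firms practical guidance for choosing a pricing policy according to their risk preferences.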

English Abstract

We study offline dynamic pricing when historical data provide incomplete coverage of the price space such that some candidate prices, including the optimal one, may be entirely unobserved. This setting is common in practice and is especially difficult in dynamic environments. Existing offline reinforcement learning methods typically rely on full or partial coverage and can therefore perform poorly in such settings. We develop a nonparametric partial identification framework for offline dynamic pricing that exploits the monotonicity of demand in price to bound the value of unobserved prices. Within this framework, we formulate two dynamic decision rules: a pessimistic policy that maximizes worst-case revenue and an opportunistic policy that minimizes worst-case regret. These rules are tailored to a sequential no-coverage environment and are not direct extensions of existing pessimistic offline RL or static opportunistic approaches. We establish finite-sample regret bounds for both policies, recovering the standard rate when the optimal price is covered and quantifying the additional cost when it is not. We also develop efficient algorithms and show, through simulations and an airline ticket application, that our methods outperform standard offline RL baselines in no-coverage settings. Managerially, the framework provides a practical mapping from a firm's risk posture to its pricing policy: firms seeking revenue stability and downside protection should prefer the pessimistic policy, whereas firms willing to bear measured risk for potential gains from underexplored prices should prefer the opportunistic policy.
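To make the monotonicity-based bounds concrete, here is a deliberately simplified, single-period sketch on a finite price grid. It assumes demand is nonincreasing in price, that demand at observed prices is known exactly (no estimation noise), and that demand never exceeds an assumed cap `d_max`. An unobserved price then has demand bounded by its nearest observed neighbours; the pessimistic choice maximizes the lower revenue bound, and the opportunistic choice minimizes the worst-case regret over demand curves consistent with those bounds. The paper's policies are dynamic and estimation-based, so this is only an illustration of the partial-identification idea, and all function names are hypothetical.

```python
import numpy as np

def demand_bounds(p, obs_prices, obs_demand, d_max):
    """Demand bounds at price p implied only by monotonicity.
    obs_prices: sorted ascending; obs_demand: nonincreasing; d_max: assumed cap."""
    below, above = obs_prices <= p, obs_prices >= p
    upper = obs_demand[below][-1] if below.any() else d_max
    lower = obs_demand[above][0] if above.any() else 0.0
    return lower, upper

def worst_case_regret(p, grid, bounds):
    """max over q and over monotone demand curves within the bounds of
    q*d(q) - p*d(p)."""
    L_p, U_p = bounds[p]
    worst = 0.0
    for q in grid:
        L_q, U_q = bounds[q]
        if q < p:    # d(q) >= d(p) holds automatically at these extremes
            d_q, d_p = U_q, L_p
        elif q > p:  # monotonicity forces d(q) <= d(p)
            d_q = min(U_q, U_p)
            d_p = max(d_q, L_p)
        else:
            continue
        worst = max(worst, q * d_q - p * d_p)
    return worst

def choose_prices(grid, obs_prices, obs_demand, d_max):
    """Return (pessimistic price, opportunistic price) on the grid."""
    bounds = {p: demand_bounds(p, obs_prices, obs_demand, d_max) for p in grid}
    pessimistic = max(grid, key=lambda p: p * bounds[p][0])
    opportunistic = min(grid, key=lambda p: worst_case_regret(p, grid, bounds))
    return pessimistic, opportunistic
```

With observed prices 10 and 20 (demand 100 and 40), a cap of 150, and the grid [10, 15, 20, 25], the pessimistic rule stays at the observed price 10, while the opportunistic rule picks the never-observed price 15, mirroring the risk-posture contrast in the abstract.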

2410.19842 2026-05-25 eess.SP cs.LG

A comprehensive evaluation of pretraining strategies for channel-agnostic contrastive self-supervision of biosignals

Thea Brüsch, Mikkel N. Schmidt, Tommy S. Alstrøm

Affiliations * Department of Applied Mathematics and Computer Science

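AI Summary This study investigates strategies for creating positive pairs in channel-agnostic self-supervised learning on biosignals, addressing the difficulty of designing data augmentations for multichannel time series and the limited reusability of models trained on specific channel configurations. It proposes contrastive random lead coding (CRLC), which generates positive pairs from random subsets of the input channels, and validates it on EEG and ECG data. Experiments show that CRLC outperforms competing strategies in the channel-agnostic setting and, on EEG tasks, even surpasses the current state-of-the-art reference model, offering a new direction for self-supervised learning on biosignals.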

English Abstract

Contrastive learning yields impressive results for self-supervision in computer vision. The approach relies on the creation of positive pairs, something which is often achieved through augmentations. However, for multivariate time series, effective augmentations can be difficult to design. Additionally, the number of input channels in biosignal datasets often varies from application to application, limiting the usefulness of large self-supervised models trained with specific channel configurations. Motivated by these challenges, we set out to investigate strategies for creating positive pairs for channel-agnostic self-supervision of biosignals. We introduce contrastive random lead coding (CRLC), where random subsets of the input channels are used to create positive pairs, and compare it with using augmentations and temporally neighboring segments as positive pairs. We validate our approach by pre-training models on EEG and ECG data and then fine-tuning them for downstream tasks. CRLC outperforms competing strategies in both scenarios in the channel-agnostic setting. Notably, for EEG tasks, CRLC surpasses the current state-of-the-art reference model. While the state-of-the-art reference model is superior on the ECG task, incorporating CRLC allows us to obtain comparable results. In conclusion, CRLC helps generalization across variable channel setups when training our channel-agnostic model. The code is available at https://github.com/theabrusch/Multiview_TS_SSL.
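The positive-pair construction described above can be sketched in a few lines of NumPy: split the channels of one segment into two disjoint, non-empty random subsets, encode each view with an encoder that accepts any channel count, and score the pairs with an InfoNCE-style loss. The toy encoder (a shared per-channel linear map followed by averaging over channels), the InfoNCE loss, and the temperature are assumptions for illustration; this is not the code in the linked repository.

```python
import numpy as np

def random_lead_views(x, rng, min_channels=1):
    """Split the channels of one segment x (channels x time) into two disjoint,
    non-empty random subsets; the two views of the same segment form a positive pair."""
    n_ch = x.shape[0]
    perm = rng.permutation(n_ch)
    k = rng.integers(min_channels, n_ch - min_channels + 1)  # size of the first subset
    idx_a, idx_b = np.sort(perm[:k]), np.sort(perm[k:])
    return x[idx_a], x[idx_b], idx_a, idx_b

def channel_agnostic_embed(view, w):
    """Toy channel-agnostic encoder (assumption): the same map w (time x dim) is
    applied to every channel, then averaged, so any channel count works."""
    return np.tanh(view @ w).mean(axis=0)

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss: row i of z_a and row i of z_b are positives,
    all other rows of z_b act as negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because the encoder never sees a fixed channel layout, the same pretrained weights can be applied to recordings with a different montage, which is the property the channel-agnostic setting relies on.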

2408.03085 2026-05-25 quant-ph cs.LG

Universal Matrix Multiplication on Quantum Computer

Jiaqi Yao, Tianjian Huang, Zipeng Cai, Ding Liu

Affiliations * School of Computer Science and Technology, Tiangong University

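AI Summary This paper studies how to implement matrix multiplication, the most central and computationally intensive operation in deep neural networks, efficiently on quantum computers. The authors propose a universal quantum matrix multiplication framework built on an optimized quantum arithmetic logic unit: classical data are encoded into parameterized \(R_z\) rotation gates via the quantum Fourier transform, reducing the base-gate complexity of the quantum adder to \(O(n)\), and the column-wise multiplication principle of classical arithmetic reduces the gate complexity of the quantum multiplier to \(O(n^2)\). The approach is further extended to a quantum version of the Strassen algorithm, with experiments quantifying the trade-off between reduced multiplication time and increased addition resources, providing a reliable technical pathway toward general-purpose quantum matrix operations.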

English Abstract

As the most central and computationally intensive component of deep neural networks, the execution efficiency of matrix multiplication directly determines the training and inference performance of models. Harnessing the parallel processing capabilities afforded by quantum superposition and entanglement to reshape matrix multiplication implementations has become a promising entry point for optimising underlying quantum arithmetic logic and improving the operational efficiency of quantum circuits. This paper proposes a universal quantum matrix multiplication (QMM) framework designed to achieve substantial computational acceleration through an optimised quantum arithmetic logic unit. To circumvent the limitations of multi-register and multi-control gates in conventional quantum arithmetic circuits, we encode classical data directly into parameterised \(R_z\) rotation gates using the quantum Fourier transform (QFT), thereby reducing the base gate complexity of the quantum adder to \(O(n)\). In addition, by adopting the column-wise multiplication principle from classical arithmetic, we optimize the gate complexity of the quantum multiplier to \(O(n^2)\). We further extend this approach to a quantum version of the Strassen algorithm, and experimentally quantify the trade-off between reduced multiplication time and increased overhead in addition resources. This work establishes a reliable technical pathway for constructing general-purpose quantum matrix operations, with the potential to unlock substantial computational power for training modern machine learning models.
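The QFT-based addition idea can be checked with a small classical state-vector simulation of the well-known Draper-style Fourier adder, which we assume is close in spirit to the encoding described above (it is not the paper's circuit). After a QFT, adding the classical constant a is a diagonal phase exp(2πi·a·k/N) on |k⟩, and that phase factorizes into one single-qubit rotation per qubit, which is why the addition step needs only n rotations; the cost of the surrounding QFT is not counted here. The function names are illustrative.

```python
import numpy as np

def qft(state):
    """QFT on a 2^n state vector: |x> -> (1/sqrt(N)) sum_k exp(2*pi*i*x*k/N) |k>."""
    return np.fft.ifft(state) * np.sqrt(len(state))

def iqft(state):
    """Inverse QFT."""
    return np.fft.fft(state) / np.sqrt(len(state))

def fourier_add_constant(x, a, n_qubits):
    """Compute |x> -> |(x + a) mod 2^n> as QFT, diagonal phase, inverse QFT.
    Returns the most likely output basis state and the final state vector."""
    N = 2 ** n_qubits
    state = np.zeros(N, dtype=complex)
    state[x] = 1.0
    phi = qft(state)
    k = np.arange(N)
    phi *= np.exp(2j * np.pi * a * k / N)  # the "encode a into rotations" step
    out = iqft(phi)
    return int(np.argmax(np.abs(out))), out

def per_qubit_angles(a, n_qubits):
    """Angles theta_j = 2*pi*a*2^j / 2^n for qubit j (little-endian): the phase
    exp(2*pi*i*a*k/N) equals prod_j exp(i*theta_j*k_j), i.e. n single-qubit
    phase (R_z up to global phase) rotations."""
    N = 2 ** n_qubits
    return [2 * np.pi * a * (2 ** j) / N for j in range(n_qubits)]
```

For example, `fourier_add_constant(13, 6, 4)` returns basis state 3, since 13 + 6 = 19 wraps modulo 16; the addition register needs no ancillas or multi-controlled gates, which is the limitation of conventional arithmetic circuits the abstract sets out to avoid.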

2407.04573 2026-05-25 cs.IR cs.CL

Vector Retrieval with Similarity and Diversity: How Hard Is It?

Hang Gao, Dong Deng, Yongfeng Zhang

Affiliations * Rutgers University

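AI Summary This paper studies how to balance similarity and diversity jointly in dense vector retrieval. It proposes an optimization framework called VRSD, which maximizes the similarity between the query vector and the sum of the selected candidate vectors. The problem is proven to be NP-complete, giving it a solid theoretical foundation. The authors further propose a parameter-free heuristic algorithm and show on multiple datasets that it outperforms established methods such as MMR and k-DPP.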

English Abstract

Dense vector retrieval is an important building block of modern machine learning systems, underlying applications ranging from semantic search to retrieval-augmented generation and knowledge-intensive reasoning. Beyond retrieving items that are individually similar to a query, many applications require a set of results that is also diverse, complementary, and collectively informative. Balancing similarity and diversity is therefore central to effective retrieval, but remains challenging to optimize in a stable and theoretically grounded way. Maximal Marginal Relevance (MMR) is a widely adopted heuristic for this problem, yet its reliance on a manually tuned parameter leads to optimization fluctuations and unpredictable retrieval results. More broadly, existing methods provide limited theoretical insight into how similarity and diversity interact in dense vector spaces, leaving the joint optimization problem insufficiently understood. To address these challenges, this paper introduces a novel approach that characterizes both constraints simultaneously by maximizing the similarity between the query vector and the sum of the selected candidate vectors. We formally define this optimization problem, Vector Retrieval with Similarity and Diversity (VRSD), and prove that it is NP-complete, establishing a rigorous theoretical bound on the inherent difficulty of this dual-objective retrieval. Subsequently, we present a parameter-free heuristic algorithm to solve VRSD. Extensive evaluations on multiple datasets, incorporating both objective geometric metrics and LLM-simulated subjective assessments, demonstrate that our VRSD heuristic consistently outperforms established baselines, including MMR and Determinantal Point Processes (k-DPP).
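The VRSD objective itself is easy to state in code. The abstract does not describe the paper's parameter-free heuristic, so the sketch below uses a plain greedy procedure as a stand-in (an assumption): at each step it adds the candidate that maximizes the cosine similarity between the query and the sum of the selected vectors. A top-k baseline is included for contrast.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def greedy_sum_similarity(query, candidates, k):
    """Greedy stand-in for the VRSD objective: repeatedly add the candidate
    that maximizes cos(query, sum of selected vectors)."""
    selected, total = [], np.zeros_like(query, dtype=float)
    for _ in range(k):
        best, best_score = None, -np.inf
        for i, c in enumerate(candidates):
            if i in selected:
                continue
            score = cosine(query, total + c)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        total = total + candidates[best]
    return selected

def top_k_similarity(query, candidates, k):
    """Baseline: the k candidates individually most similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])[:k]
```

With query [1, 1] and candidates [0.8, 0.6], [0.59, 0.81], [0.75, 0.66], top-k returns indices [2, 0], two vectors leaning toward the first axis, while the sum objective returns [2, 1]: the second pick balances the first, so diversity emerges without a trade-off parameter like MMR's λ.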

2404.05108 2026-05-25 quant-ph cs.IT cs.LG math.IT

Efficient Gradient Estimation for Parameterized Quantum Systems with Lie Algebraic Symmetries

Mohsen Heidari, Masih Mozakka, Wojciech Szpankowski

Affiliations * Department of Computer Sciences, Indiana University, Bloomington, IN, USA; Department of Computer Sciences, Purdue University, West Lafayette, IN, USA

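AI Summary This paper studies gradient estimation for training parameterized quantum circuits (PQCs). To address the shortcomings of existing methods with respect to high-dimensional Hilbert spaces and information loss in quantum measurement, it proposes a framework based on the Lie algebraic structure of PQCs and the Hadamard test. By analyzing the differential of the matrix exponential, the gradient is expressed as a linear combination of expectation values obtained from Hadamard tests, with coefficients that depend only on the circuit's parameterization and can be estimated efficiently with shadow tomography. The method substantially reduces both the number of measurements and the computation time, achieving exponential and polynomial improvements, respectively.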

Comments 32 pages

English Abstract

Gradient estimation is a central challenge in training parameterized quantum circuits (PQCs) for hybrid quantum-classical optimization and learning problems. This difficulty arises from several factors, including the exponential dimensionality of the Hilbert spaces and the information loss in quantum measurements. Existing estimators, such as finite difference and the parameter shift rule, often fail to adequately address these challenges for certain classes of PQCs. In this work, we propose a novel gradient estimation framework that leverages the underlying Lie algebraic structure of PQCs, combined with the Hadamard test. By analyzing the differential of the matrix exponential, we derive an expression for the gradient as a linear combination of expectation values obtained via Hadamard tests. The coefficients in this decomposition depend solely on the circuit's parameterization and can be estimated using state-of-the-art shadow tomography techniques. Hence, our approach enables efficient gradient estimation, requiring a number of measurement shots that scales logarithmically with the number of parameters, and with polynomial classical and quantum time. This is an exponential reduction in the measurement cost and a polynomial speed-up in time compared to existing works.
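The "differential of the matrix exponential" the abstract refers to is, we assume, the standard identity below. Its commutator series shows why Lie algebraic structure matters: the nested commutators stay inside the Lie algebra generated by the circuit's generators, so \(e^{-X}\,\partial_\theta e^{X}\) is a linear combination of a basis of that algebra. The Hadamard test is the usual way to estimate the resulting expectation values. How the paper combines these ingredients with shadow tomography is not reproduced here.

```latex
% Derivative of a parameterized matrix exponential, with ad_X(Y) = [X, Y]:
\frac{d}{d\theta}\, e^{X(\theta)}
  = \int_0^1 e^{sX(\theta)}\,\frac{dX}{d\theta}\,e^{(1-s)X(\theta)}\,ds
  = e^{X(\theta)} \sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)!}\,
    \operatorname{ad}_{X(\theta)}^{\,k}\!\left(\frac{dX}{d\theta}\right)

% Hadamard test: ancilla prepared in |+>, controlled-U, ancilla measured in the X basis
P(0) - P(1) = \operatorname{Re}\,\langle \psi \,|\, U \,|\, \psi \rangle
```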

2605.22923 2026-05-25 cs.IR cs.CL

AI-Friendly LaTeX: Using LaTeX Code as a Knowledge Source for Retrieval-Augmented Generation

Tom Verhoeff

Affiliations * Department of Mathematics & Computer Science, Eindhoven University of Technology

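AI Summary This paper studies how LaTeX source code can serve as a knowledge source for retrieval-augmented generation (RAG), improving the accuracy of large language models on mathematical and technical content. The author proposes a focused preprocessing approach that converts LaTeX source files and their auxiliary files into Markdown and JSONL chunks suitable for indexing in a vector database, better preserving structural information and semantic content. The approach addresses the difficulties of using raw LaTeX source in AI applications and offers a new direction for the automated processing of technical documents.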

Comments 19 pages, 3 figures

English Abstract

Large language models can answer questions about textbooks, lecture notes, and programming exercises more reliably when their answers are grounded in an explicit knowledge source. Retrieval-augmented generation (RAG) is a common approach: relevant fragments of a document are retrieved and inserted into the model context before answering. For mathematical and technical material, the original LaTeX source can be a better starting point than a PDF, because it contains structural information, labels, sectioning commands, macros, and authorial intent that are often lost or distorted in PDF extraction. However, LaTeX source is not automatically AI-friendly. Cross-references must be resolved, custom macros must be interpreted, exercises and examples must be identified, and author-supplied semantic metadata may be needed. This article describes a focused preprocessing approach for turning LaTeX source, together with its compiled auxiliary files and optional author annotations, into Markdown and JSONL chunks suitable for indexing in a vector database.
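Two of the preprocessing steps named above, resolving cross-references from the compiled auxiliary file and emitting JSONL chunks, can be sketched with the standard library alone. The sketch reads `\newlabel{key}{{number}...}` lines from a `.aux` file, replaces `\ref`/`\eqref` with the printed numbers, and writes one JSON record per `\section`. The function names and the record fields (`source`, `section`, `text`) are hypothetical, macro expansion and author annotations are not handled, and text before the first `\section` is dropped; this is not the author's tool.

```python
import json
import re

NEWLABEL = re.compile(r"\\newlabel\{([^}]*)\}\{\{([^}]*)\}")

def read_labels(aux_text):
    """Map label -> printed number from \\newlabel lines in a LaTeX .aux file,
    e.g. \\newlabel{sec:intro}{{1}{1}} -> {'sec:intro': '1'}."""
    return dict(NEWLABEL.findall(aux_text))

def resolve_refs(tex, labels):
    """Replace \\ref{key} and \\eqref{key} with numbers from the .aux file;
    unknown keys are left visible as ??key??."""
    def sub(m):
        num = labels.get(m.group(2), f"??{m.group(2)}??")
        return f"({num})" if m.group(1) == "eqref" else num
    return re.sub(r"\\(ref|eqref)\{([^}]*)\}", sub, tex)

def section_chunks(tex, source):
    """Split at \\section{...} and emit one JSONL record per section."""
    parts = re.split(r"\\section\{([^}]*)\}", tex)
    lines = []
    for title, body in zip(parts[1::2], parts[2::2]):
        record = {"source": source, "section": title, "text": body.strip()}
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

Resolving references before chunking matters because a retrieved chunk that says "see Section~1" is meaningful to the model, whereas a bare `\ref{sec:intro}` is not.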

2605.22893 2026-05-25 eess.SP cs.LG

L-FAME: Longitudinal Focused Attention Meditation EEG Dataset and Benchmark

Angqi Li, Ab Basit Rafi Syed, Hamzeh Alzweri, Taosheng Liu, Barry H. Cohen, Saiprasad Ravishankar

Affiliations * Department of CMSE; Department of CSE; Michigan State University; Department of Psychology; Department of Applied Psychology; New York University; Department of BME

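AI Summary This paper introduces the L-FAME dataset and an accompanying benchmark to advance research on different meditation practices and on how their neural effects evolve over a six-week training period. The dataset contains pre- and post-intervention EEG recordings and psychological assessments from 74 healthy college students randomly assigned to three meditation groups. The benchmark defines three classification tasks (cognitive state decoding, fine-grained classification of meditation techniques, and cross-session adaptation) and reports baseline results for a range of machine learning and deep learning methods, providing a resource for computational meditation research and for developing EEG-based machine learning methods.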

Comments Code and dataset available at: https://huggingface.co/datasets/L-FAME-Dataset-Benchmark/L-FAME

English Abstract

We introduce a novel Longitudinal Focused Attention Meditation Electroencephalography (L-FAME) dataset and an accompanying benchmark, designed to foster research into the neural effects of various meditation practices and the evolution of these effects over a six-week training period. The dataset contains EEG recordings and psychological assessments from 74 healthy college participants, collected at two distinct time points: pre-intervention and post-intervention. Participants were randomly assigned to one of three distinct meditation groups: two mantra-based techniques (SA-TA-NA-MA and Hare Krishna) and one Breath Focus practice. Leveraging this unique longitudinal and comparative dataset, we propose a benchmark suite comprising three distinct classification tasks: (1) cognitive state decoding to distinguish between resting and meditation states, (2) fine-grained classification of the specific meditation techniques, and (3) cross-session adaptation to evaluate model generalization across the longitudinal time gap. We provide comprehensive baseline results for these tasks utilizing a range of classical machine learning algorithms and deep learning architectures. The complete dataset, preprocessing pipelines, and benchmark evaluation code will be publicly released, offering a valuable resource and a standardized framework for the development and comparison of new analytical methods in computational meditation research and EEG-based machine learning. The dataset is available at https://huggingface.co/datasets/L-FAME-Dataset-Benchmark/L-FAME

2605.22886 2026-05-25 cs.IT cs.LG cs.NI math.IT

Resilience Characterization of AI-Native Wireless Receivers via Persistent Homology

Christo Kurisummoottil Thomas, Emilio Calvanese Strinati

Affiliations * CEA-Leti

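AI Summary This paper studies the resilience of deep-learning-based wireless receivers under non-stationary channels and proposes a real-time metric grounded in persistent homology, the Topological Resilience Index (TRI), to quantify the structural stability of neural network receivers during online adaptation. TRI characterizes resilience along three complementary dimensions: model-channel mismatch, distribution shift of the channel impulse response, and the topology of the channel manifold. Theoretical analysis shows that TRI is bounded, monotonic, and stable, and simulations of an OFDM receiver show that it warns of channel changes earlier than conventional baselines and, when used to guide re-adaptation, substantially reduces the bit error rate.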

English Abstract

AI-native wireless receivers based on deep learning exhibit remarkable performance under stationary channel conditions, yet their resilience to distributional shifts remains poorly characterized by conventional metrics such as bit error rate (BER). To overcome these limitations, this paper proposes a novel real-time metric, the Topological Resilience Index (TRI), grounded in persistent homology and persistence exponents. TRI quantifies the structural stability of a neural network receiver's parameter space during online adaptation to non-stationary channels. Specifically, TRI captures resilience through three complementary dimensions: (i) validation-loss resilience measuring model-channel mismatch, grounded in the topological persistence of loss-landscape sublevel sets; (ii) channel impulse response (CIR) distribution shift, tracking geometric drift of CIR vectors from the calibration reference distribution; and (iii) channel manifold topology, quantified by the spectral gap of the Gaussian kernel matrix normalized by the Ollivier-Ricci curvature norm. We establish theoretical guarantees showing that TRI is bounded, monotonic under performance degradation, and Lipschitz-stable with respect to perturbations in channel distributions measured in Wasserstein distance. Simulation results for an OFDM deep-learning receiver adapting across ten ITU-R inter-environment transitions at three shift rates demonstrate that TRI provides a consistent mean warning lead of more than one OFDM symbol over gradient-norm and validation-loss baselines, whereas the gradient-norm baseline achieves zero lead in every scenario. Furthermore, the proposed TRI-guided burst re-adaptation reduces post-shift BER by 80% relative to no adaptation within 200 OFDM symbols.
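Component (i) rests on the persistence of sublevel sets. TRI's exact definition (persistence exponents, normalization, and the other two components) is not reproduced here; the sketch below only shows the standard 0-dimensional sublevel-set persistence computation with union-find, applied to a 1-D sequence such as a validation-loss trace, which is an assumption made for illustration.

```python
import math

def sublevel_persistence_1d(values):
    """0-dimensional persistent homology of the sublevel-set filtration of a
    1-D sampled function (e.g. a loss trace). Returns (birth, death) pairs;
    the component of the global minimum never dies (death = inf)."""
    parent, birth = {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(len(values)), key=lambda j: values[j]):
        parent[i], birth[i] = i, values[i]
        for j in (i - 1, i + 1):
            if j in parent:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the younger component (later birth) dies here.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # skip zero-length pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    pairs += [(birth[r], math.inf) for r in {find(i) for i in parent}]
    return pairs
```

Each finite pair is a local basin of the trace (born at its minimum, dying when it merges with a deeper basin); long-lived pairs indicate pronounced, stable structure, while a trace dominated by short-lived pairs is noisy.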

2605.22859 2026-05-25 eess.SP cs.AI

Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules

Emil Hardarson, Konstantin Popov, Sigridur Sigurdardottir, Anna Sigridur Islind, Erna Sif Arnardóttir, María Óskarsdóttir

Affiliations * Department of Computer Science, Reykjavik University; Reykjavik University Sleep Institute; Reykjavik University; Department of Engineering, Reykjavik University; School of Mathematical Sciences, University of Southampton

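AI Summary This paper proposes a transparent sleep staging method grounded in the clinical scoring rules of sleep medicine. It turns the American Academy of Sleep Medicine (AASM) scoring logic into executable code and generates a natural-language explanation for each staging decision, improving interpretability. Its agreement with the reference staging is lower than that of mainstream deep learning methods, but its decision process is explicit and follows clinical guidelines, so it can complement deep learning models as a tool for auditing, debugging, and governing sleep staging systems.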

English Abstract

Automated sleep staging is commonly approached as a supervised machine learning problem, with deep learning methods dominating recent research. While machine learning models achieve near-human level agreement with human-scored reference sleep stages, their decisions are typically opaque and not designed to follow clinical scoring rules. We propose a transparent alternative: a deterministic, rule-based sleep staging method that explicitly operationalizes the American Academy of Sleep Medicine's (AASM) scoring logic as executable code, coupled with epoch-level natural-language justifications derived from an explanation trace. We evaluate the approach on 50 polysomnography recordings with a 10-scorer majority-vote consensus as reference. Across all recordings, the method agreed with the majority-vote reference in 60.5% of epochs (κ = 0.42), with substantially higher agreement on a dataset used during development (77.1%, κ = 0.61). Agreement with the reference was highest for sleep stage N2 (recall 83.5%) and moderate for sleep stage R (recall 68.7%), while Wake and N1 recall were low. Despite lower agreement with the reference than contemporary deep learning models, the method provides deterministic decisions and natural language explanations aligned with AASM scoring rules, making it a complementary tool for auditing, debugging, and governing deep learning-based sleep staging.
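The evaluation figures quoted above (majority-vote consensus, epoch agreement, Cohen's κ, per-stage recall) follow standard definitions. The sketch below computes them for per-epoch stage labels; it covers only this evaluation arithmetic, not the paper's rule engine, and the tie-breaking rule in the majority vote is an arbitrary assumption.

```python
from collections import Counter

def majority_vote(labels_per_scorer):
    """Per-epoch consensus from several scorers (list of equal-length label lists).
    Ties go to the label encountered first (Counter.most_common ordering)."""
    return [Counter(epoch).most_common(1)[0][0] for epoch in zip(*labels_per_scorer)]

def cohens_kappa(a, b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e): observed agreement corrected for
    the agreement expected from the two label marginals alone."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def recall(pred, ref, stage):
    """Fraction of reference epochs of `stage` that were predicted as `stage`."""
    hits = [p == stage for p, r in zip(pred, ref) if r == stage]
    return sum(hits) / len(hits)
```

For instance, two stagings that agree on 3 of 4 epochs, with marginals {W: 1, N2: 2, R: 1} and {W: 1, N2: 1, R: 2}, have p_e = 5/16 and therefore κ = 7/11 ≈ 0.64 rather than 0.75, which is why the abstract reports κ alongside raw agreement.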

2605.22858 2026-05-25 eess.SP cs.LG

Classification of IED-free EEG Responses for Assisted Epilepsy Diagnosis

Giacomo Zanardini, Ryan Moesman, Paul van der Kleij, Robert van den Berg, Justin Dauwels

Affiliations * Signal Processing Systems; Delft University of Technology; Erasmus Medical Center

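AI Summary This paper studies how stimulation-evoked EEG can support epilepsy diagnosis when routine EEG lacks interictal epileptiform discharges (IEDs). The authors propose a machine-learning classification approach based on multi-domain features (temporal, spectral, wavelet, and connectivity) and use a stacked ensemble to fuse the feature sets and improve classification performance. Experiments on multiple datasets show good diagnostic performance; in particular, EEG recorded during intermittent photic stimulation (IPS) effectively distinguishes patients with epilepsy from those without, offering a new approach to assisted diagnosis in the IED-free setting.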

Comments Accepted at IEEE EMBC2026

English Abstract

Diagnosing epilepsy is challenging when routine EEGs lack interictal epileptiform discharges (IEDs). Intermittent photic stimulation (IPS) and hyperventilation (HV) can increase diagnostic yield, but their interpretation is subjective. We propose a reproducible pipeline that classifies EEG recordings acquired during stimulation procedures, using machine-learning features spanning temporal, spectral, wavelet, and connectivity domains, and a stacked ensemble to combine complementary feature sets. Performance is evaluated with leave-one-subject-out (LOSO) cross-validation on the TUH Epilepsy Corpus and a clinical Erasmus MC (EMC) cohort, including IED-free analyses on TUH. On TUH, ensembles achieve up to 97.8% AUC / 93.1% BAC on IED-free resting-state EEG and 94.1% AUC / 86.8% BAC on IED-free IPS. On EMC, IPS provides the strongest discrimination (79.4% AUC / 73.9% BAC), while HV performance benefits from stratifying subjects by responsiveness. These results indicate that stimulation-evoked activity, particularly IPS, contains meaningful discriminative information for IED-free epilepsy classification and that multi-domain ensembling improves robustness.
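The evaluation protocol and metrics named above can be sketched in NumPy: a leave-one-subject-out splitter that keeps every recording of the held-out subject out of training, balanced accuracy (BAC, the mean per-class recall), and AUC computed as the probability that a random positive scores above a random negative. Function names are illustrative, and the feature extraction and stacked ensemble are not shown.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx): each subject's recordings
    form the test fold exactly once and never appear in that fold's training set."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        yield s, np.flatnonzero(subject_ids != s), np.flatnonzero(subject_ids == s)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall (BAC), insensitive to class imbalance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative),
    with ties counted as 1/2; computed by exhaustive pairwise comparison."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

Splitting by subject rather than by recording is what prevents a model from scoring well simply by recognizing individuals it has already seen, which matters when several recordings per subject are available.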

2605.22857 2026-05-25 eess.SP cs.LG

JointHRRP-Net: A Statistically Constrained Decoupling Network for Joint Target and Jamming Recognition in Composite Jamming

Yunfei Zhao, Mei Liu, Shuowei Liu, Xunzhang Gao, Yujie Zhou

Affiliations * College of Electronic Science and Technology, National University of Defense Technology

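AI Summary Radar automatic target recognition based on high-resolution range profiles (HRRP) degrades markedly in composite jamming environments. This paper proposes JointHRRP-Net, a unified framework for joint target-jamming recognition: a statistically constrained decoupling module separates target-dominant and jamming-dominant latent feature branches from the mixed HRRP, and a multi-scale temporal encoding module followed by a dual-expert decision module performs single-label target classification and multi-label jamming classification. Experiments show that the method outperforms existing approaches across different signal-to-noise and signal-to-jamming ratios and that its target representation remains discriminative for rejecting unknown targets.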

Comments Submitted to IEEE Transactions on Geoscience and Remote Sensing (TGRS). 15 pages, 12 figures

English Abstract

High-resolution range profile (HRRP)-based radar automatic target recognition suffers from severe performance degradation in composite jamming environments. Active jamming introduces suppression- and deception-related components into the received range profile. After pulse compression, these components are coupled with target echoes in the HRRP domain, making target-related scattering peaks difficult to distinguish and weakening feature separability. To address this problem, this paper proposes JointHRRP-Net, a unified framework for joint target-jamming recognition. A statistically constrained decoupling module is first developed to generate target-dominant and jamming-dominant latent branches from the mixed HRRP representation. Correlation-guided statistical constraints are imposed to suppress redundant cross-branch information and alleviate target-jamming feature entanglement. A multi-scale temporal encoding module is then designed to model local scattering structures and long-range range-cell dependencies, followed by a dual-expert decision module for single-label target classification and multi-label jamming classification. Experiments under diverse signal-to-jamming ratio (SJR) and signal-to-noise ratio (SNR) levels demonstrate that JointHRRP-Net outperforms representative baseline methods in both target recognition and composite jamming recognition. Open-set evaluation further shows that the learned target representation remains discriminative for unknown-target rejection. These results demonstrate the effectiveness and robustness of JointHRRP-Net in composite jamming scenarios.
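The abstract does not give the exact form of the correlation-guided statistical constraint, so the Python sketch below assumes one plausible version: the mean squared Pearson correlation between every target-branch and every jamming-branch feature dimension over a batch, which is zero when the two branches are linearly uncorrelated. It also sketches the two decision heads as an argmax over target logits (single-label) and independent sigmoid thresholds over jamming logits (multi-label). All names, the 0.5 threshold, and the plain-float inputs are assumptions; in actual training the penalty would be computed on differentiable tensors and added to the classification losses.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def cross_branch_penalty(target_feats, jam_feats):
    """Hypothetical decorrelation penalty between the two latent branches.

    target_feats: n rows of length d_t; jam_feats: n rows of length d_j.
    Returns the mean squared correlation over all d_t * d_j dimension pairs.
    """
    t_cols = list(zip(*target_feats))
    j_cols = list(zip(*jam_feats))
    total = sum(pearson(t, j) ** 2 for t in t_cols for j in j_cols)
    return total / (len(t_cols) * len(j_cols))


def dual_expert_decision(target_logits, jam_logits, threshold=0.5):
    """Single-label target class plus the set of active jamming types."""
    target_class = max(range(len(target_logits)), key=lambda k: target_logits[k])
    jam_labels = [k for k, z in enumerate(jam_logits) if 1 / (1 + math.exp(-z)) > threshold]
    return target_class, jam_labels
```

Separating the heads this way reflects the task structure stated in the abstract: a return contains exactly one target, while several jamming types can be present at once.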

2605.22855 2026-05-25 cs.GT cs.AI cs.CL cs.LG

PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations

Yingjie Lei

Affiliations * University of Aberdeen

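AI Summary This paper presents PrefBench, a benchmark for evaluating zero-shot large language model (LLM) agents in personalized pricing negotiations with hidden buyer preferences. The platform simulates buyers negotiating over a fixed vehicle-customization bundle; the seller sees only public information, while key buyer parameters such as valuation, patience, and counter-offer behavior stay hidden. Experiments show that although LLM agents follow the protocol and close a high proportion of deals, their profits are poor and fall far below those of a simple concession strategy, exposing a weakness of current LLMs in profit-sensitive negotiation. PrefBench provides a controlled environment for studying pricing-agent behavior under hidden buyer preferences.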

Comments 24 pages, 3 figures, 5 tables. Code is available at https://github.com/ChaosTheProducer/PrefBench

English Abstract

Personalized pricing negotiations are a challenging testbed for LLM agents because successful interaction does not guarantee profitable decision making. A seller may produce valid actions and close many deals while still pricing poorly when buyer willingness to pay and bargaining traits remain hidden. This paper presents PrefBench, a simulator-based benchmark for hidden-preference personalized pricing negotiations. Each episode pairs a simulated buyer with a fixed vehicle-customization bundle; the seller observes public persona descriptors, bundle information, and negotiation history, while latent buyer variables govern valuation, patience, counter-offer behavior, and walkaway decisions. PrefBench evaluates this setting through an LLM-facing state-summary protocol that constrains agents to return strict JSON actions under a fixed hidden-information boundary. We evaluate zero-shot LLM sellers against heuristic references over 7,500 episodes. The tested LLMs follow the protocol reliably and achieve deal rates above 0.99, but their seller-profit outcomes remain weak: the best LLM average profit is only slightly above the random baseline and far below a simple concession heuristic under the same episode stream. These results show that structured action compliance and agreement-seeking behavior can coexist with weak profit-sensitive bargaining. PrefBench provides a controlled benchmark for evaluating pricing-agent behavior under hidden buyer preferences.
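The abstract does not specify PrefBench's action schema, so the Python sketch below uses a hypothetical one (an `"offer"` with a positive `"price"`, `"accept"`, or `"walk_away"`) to show what strict JSON action parsing might look like, together with a hypothetical linear concession rule of the kind a "simple concession heuristic" could follow. Neither is the benchmark's actual protocol or baseline; the linked repository is the reference for those.

```python
import json

ALLOWED_ACTIONS = {"offer", "accept", "walk_away"}  # hypothetical action set
ALLOWED_KEYS = {"action", "price"}                  # hypothetical schema keys


def parse_action(raw):
    """Strictly parse an agent reply; return the action dict or raise ValueError."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict) or set(obj) - ALLOWED_KEYS:
        raise ValueError("reply must be a JSON object with only the allowed keys")
    action = obj.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action == "offer":
        price = obj.get("price")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)) or price <= 0:
            raise ValueError("an offer needs a positive numeric price")
    return obj


def concession_price(list_price, floor, round_idx, max_rounds):
    """Hypothetical heuristic: concede linearly from list_price to floor over max_rounds."""
    frac = min(round_idx / max_rounds, 1.0)
    return list_price - frac * (list_price - floor)
```

A validator like this captures only format compliance; as the abstract notes, passing it says nothing about whether the quoted price is profitable, which is why a fixed concession schedule can still outperform a compliant LLM seller.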

2605.22853 2026-05-25 eess.SP cs.LG q-bio.QM

Topological Signal Processing: An Application-Oriented Tutorial

Flavia Petruso, Maria Giulia Preti, Dimitri Van De Ville

Affiliations * Neuro-X Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland

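AI Summary This paper introduces the foundational concepts of topological signal processing (TSP) and practical methods for applying it, aiming to help researchers understand and use this emerging field. TSP extends conventional graph signal processing (GSP) to signals defined on nodes, edges, triangles, and other higher-order network structures, and uses tools such as the combinatorial Hodge Laplacian to analyze higher-order interactions in complex systems. Through practical case studies such as brain imaging, the paper shows TSP's potential for revealing nontrivial interactions between brain regions, with the goal of promoting its wider use in both theoretical and applied research.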

English Abstract

Many modern datasets are large and carry complex structural relationships. Graph-based methods have traditionally been used to represent networked data, modeling individual elements as nodes and pairwise interactions as edges. Furthermore, Graph Signal Processing (GSP) has been developed to analyze signals on graph nodes, such as temperature measurements (node signals) across different regions of a country represented as a graph. Topological Signal Processing (TSP) is an emerging field that generalizes GSP, enabling the analysis of signals defined not only on nodes but also on edges, triangles, and higher-dimensional network elements, modeled as simplicial complexes and related topological structures. This makes TSP naturally well-suited for studying higher-order interactions in complex systems by extending classical signal processing concepts, such as filtering and Fourier transforms, to the topological level. Despite its versatility, TSP remains challenging for many practitioners. Therefore, we present an accessible overview of TSP foundations while drawing connections with application-oriented settings. We focus on processing techniques based on the combinatorial Hodge Laplacian, which generalizes the graph Laplacian to simplicial complexes. In particular, we review key TSP concepts, relate them to real-world examples, and discuss how higher-order structures and signals can be derived from datasets. For instance, we introduce an edge-level signal capturing lagged interactions between nodal signals, and demonstrate its use in a case study on TSP-based analysis of brain imaging data, revealing nontrivial interactions between sets of brain regions. Overall, we aim to promote a broader adoption of TSP by bridging methodological developments with applications, fostering its use among a wide community of theoretical and applied researchers.
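As a small worked example of the combinatorial Hodge Laplacian mentioned above, the Python sketch below builds the node-to-edge incidence matrix B1 and the edge-to-triangle incidence matrix B2 for a toy complex (four nodes, four edges, one filled triangle), then forms the graph Laplacian L0 = B1 B1^T and the edge-level Hodge Laplacian L1 = B1^T B1 + B2 B2^T. The toy complex and all function names are illustrative assumptions, not taken from the tutorial; orientations follow the common convention of ordering vertices by index.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def node_edge_incidence(n_nodes, edges):
    """B1: -1 at the tail and +1 at the head of each edge (i, j) with i < j."""
    b1 = [[0] * len(edges) for _ in range(n_nodes)]
    for e, (i, j) in enumerate(edges):
        b1[i][e] = -1
        b1[j][e] = 1
    return b1


def edge_triangle_incidence(edges, triangles):
    """B2: the boundary of [i, j, k] (i < j < k) is [j, k] - [i, k] + [i, j]."""
    index = {e: n for n, e in enumerate(edges)}
    b2 = [[0] * len(triangles) for _ in range(len(edges))]
    for t, (i, j, k) in enumerate(triangles):
        b2[index[(j, k)]][t] += 1
        b2[index[(i, k)]][t] -= 1
        b2[index[(i, j)]][t] += 1
    return b2


# Toy complex: triangle (0, 1, 2) is filled; edge (2, 3) hangs off node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
triangles = [(0, 1, 2)]
B1 = node_edge_incidence(4, edges)
B2 = edge_triangle_incidence(edges, triangles)

L0 = matmul(B1, transpose(B1))  # graph Laplacian on nodes (degree minus adjacency)
L1 = matadd(matmul(transpose(B1), B1),   # lower part: edges sharing a node
            matmul(B2, transpose(B2)))   # upper part: edges sharing a triangle
```

Here L0 equals the familiar degree-minus-adjacency graph Laplacian, which is the sense in which L1 generalizes it to edges. The product B1 B2 is the zero matrix (the boundary of a boundary vanishes), a quick check that the orientations are consistent.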