arXivDaily arXiv daily academic digest, updated Monday to Friday
2507.07156 2026-06-18 stat.ML cs.CG cs.LG math.AT updated version

Unreduced Persistence Diagrams for Topological Machine Learning

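Nicole Abreu, Parker B. Edwards, Francis Motta

Affiliations Department of Mathematics and Statistics, Florida Atlantic University, Boca Raton, FL

AI summary Studies how topological feature vectors generated from unreduced boundary matrices perform in machine learning, finding that they match or even exceed fully reduced persistence diagrams while requiring an order of magnitude less memory to compute.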

Comments Substantially expanded to include additional ML and software benchmark experiments. 11 figures, 4 tables, 20 pages (without appendix and references)

Abstract

Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram (PD). Computing persistence diagrams, however, is often the most computationally demanding step in such a pipeline. To explore this dynamic, we introduce several methods to generate topological feature vectors from unreduced boundary matrices and investigate their theoretical and computational properties. We compared the performance of pipelines trained on vectorizations of unreduced PDs to that of pipelines trained on vectorizations of fully-reduced PDs across several data and task types. Our results indicate that models trained on PDs built from unreduced diagrams can perform on par with, and even outperform, those trained on fully-reduced diagrams on some tasks. We also benchmarked the computational performance of an algorithm for computing unreduced diagrams, which was implemented as a heavily modified version of Ripser. These computations are parallelizable and required an order of magnitude less memory on average compared to computing full persistence diagrams. Our results suggest that machine learning pipelines which incorporate topology-based features may benefit in terms of computational cost and performance by utilizing information contained in unreduced boundary matrices.
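As a rough illustration of the reduction step that the unreduced approach avoids, the sketch below builds the Z/2 boundary matrix of a small filtered triangle and applies the standard column reduction. The toy complex, the function names, and the column-as-set representation are assumptions made for illustration; this is not the authors' modified Ripser, and the abstract does not say how feature vectors are derived from the unreduced matrix.

```python
def low(col):
    """Largest row index in a column, or None for an empty column."""
    return max(col) if col else None


def reduce_boundary(columns):
    """Standard persistence reduction over Z/2; each column is a set of row indices."""
    cols = [set(c) for c in columns]
    owner = {}  # lowest row index -> index of the column that owns it
    for j, col in enumerate(cols):
        # Add earlier columns (mod 2) until this column's low is unique or it is empty.
        while col and low(col) in owner:
            col ^= cols[owner[low(col)]]
        if col:
            owner[low(col)] = j
    return cols


# Hypothetical filtered triangle: vertices 0, 1, 2; edges 3=(0,1), 4=(1,2), 5=(0,2); face 6.
boundary = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]

unreduced_lows = [low(c) for c in boundary]  # columns 4 and 5 share the same low
reduced = reduce_boundary(boundary)
pairs = sorted((low(c), j) for j, c in enumerate(reduced) if c)  # (birth, death) indices
```

Before reduction two edge columns share the same lowest entry; the reduction resolves that conflict by emptying column 5, which is the kind of work an unreduced pipeline would not perform.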

2412.15557 2026-06-18 cs.SE cs.CL updated version

MORTAR: Multi-turn Metamorphic Testing for LLM-based Dialogue Systems

Aaron Guoxiang Guo, Aldeida Aleti, Neelofar Neelofar, Chakkrit Tantithamthavorn, Yuanyuan Qi, Tsong Yueh Chen

Affiliations Faculty of Information Technology, Monash University; School of Computing Technologies, RMIT University; School of Science, Computing and Emerging Technologies, Swinburne University of Technology

AI summary Proposes MORTAR, which uses multi-turn metamorphic relations to generate test cases automatically, addressing the oracle problem in multi-turn testing of LLM dialogue systems and revealing more and higher-quality bugs per test case than single-turn testing.

Comments Accepted for publication in IEEE Transactions on Software Engineering (TSE)

Abstract

With the widespread application of LLM-based dialogue systems in daily life, quality assurance has become more important than ever. Recent research has successfully introduced methods to identify unexpected behaviour in single-turn testing scenarios. However, although multi-turn interaction is the common real-world usage of dialogue systems, testing methods for such interactions remain underexplored. This is largely due to the oracle problem in multi-turn testing, which continues to pose a significant challenge for dialogue system developers and researchers. In this paper, we propose MORTAR, a metamorphic multi-turn dialogue testing approach, which mitigates the test oracle problem in testing LLM-based dialogue systems. MORTAR formalises multi-turn testing for dialogue systems, and automates the generation of question-answer dialogue test cases with multiple dialogue-level perturbations and metamorphic relations (MRs). The automated MR matching mechanism gives MORTAR more flexibility and efficiency in metamorphic testing. The proposed approach is fully automated without reliance on LLM judges. In testing six popular LLM-based dialogue systems, MORTAR reaches significantly better effectiveness with over 150% more bugs revealed per test case when compared to the single-turn metamorphic testing baseline. Regarding the quality of bugs, MORTAR reveals higher-quality bugs in terms of diversity, precision and uniqueness. MORTAR is expected to inspire more multi-turn testing approaches, and assist developers in evaluating the dialogue system performance more comprehensively with constrained test resources and budget.
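The idea of checking a metamorphic relation instead of a ground-truth answer can be sketched as follows. Everything here is hypothetical: the toy dialogue systems, the single perturbation (inserting an irrelevant turn), and the relation (the final answer must not change) are illustrative stand-ins, not MORTAR's actual perturbations or MR matching mechanism.

```python
def toy_dialogue_system(turns):
    """Hypothetical system under test: answers the last question from stated facts."""
    facts = {}
    for turn in turns[:-1]:
        if " is " in turn:
            key, value = turn.split(" is ", 1)
            facts[key.strip().lower()] = value.strip()
    question = turns[-1].removeprefix("What is ").rstrip("?").strip().lower()
    return facts.get(question, "unknown")


def forgetful_system(turns):
    """Hypothetical buggy system: only reads the turn just before the question."""
    return toy_dialogue_system(turns[-2:])


def insert_irrelevant_turn(turns, filler="The weather is mild"):
    """Dialogue-level perturbation: add a turn that should not affect the answer."""
    return turns[:-1] + [filler] + turns[-1:]


def violates_relation(system, turns):
    """Metamorphic relation: the answer must be unchanged by the perturbation."""
    return system(turns) != system(insert_irrelevant_turn(turns))


dialogue = ["The capital is Canberra", "What is the capital?"]
bug_found = violates_relation(forgetful_system, dialogue)
```

No expected answer is supplied anywhere: the check only compares the system's output on the original and the perturbed dialogue, which is how a metamorphic relation sidesteps the oracle problem.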

2505.16057 2026-06-18 cs.HC cs.AI cs.MM updated version

Signals of Provenance: Practices & Challenges of Navigating Indicators in AI-Generated Media for Sighted and Blind Individuals

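Ayae Ide, Tory Park, Jaron Mink, Tanusree Sharma

Affiliations Pennsylvania State University; Arizona State University

AI summary Interviews 28 sighted and blind or low-vision users about how they use indicators of AI-generated content, finding that content-based and menu-aided indicators each have strengths and weaknesses and that blind and low-vision users face additional challenges because of inaccessible interfaces, and offers design recommendations.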

Comments error found in reporting of results

Abstract

AI-Generated (AIG) content has become increasingly widespread as recent advances in generative models and easy-to-use tools have significantly lowered the technical barriers to producing highly realistic audio, images, and videos through simple natural language prompts. In response, platforms are adopting provable provenance and recommending that AIG content be self-disclosed and signaled to users. However, these indicators may often be missed, especially when they rely solely on visual cues, which makes them ineffective for users with different sensory abilities. To address this gap, we conducted semi-structured interviews (N=28) with 15 sighted and 13 blind or low-vision (BLV) participants to examine their interaction with AIG content through self-disclosed AI indicators. Our findings reveal diverse mental models and practices, highlighting different strengths and weaknesses of content-based (e.g., title, description) and menu-aided (e.g., AI labels) indicators. While sighted participants leveraged visual and audio cues, BLV participants primarily relied on audio and existing assistive tools, limiting their ability to identify AIG. Participants in both groups frequently overlooked menu-aided indicators deployed by platforms and instead interacted with content-based indicators such as titles and comments. We uncovered usability challenges stemming from inconsistent indicator placement, unclear metadata, and cognitive overload. These issues were especially critical for BLV individuals due to the insufficient accessibility of interface elements. We provide practical recommendations and design implications for future AIG indicators across several dimensions.

2409.03500 2026-06-18 cs.CY cs.AI updated version

Quality Perceptions and Intended Engagement in Response to AI-Generated and AI-Assisted News

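Fabrizio Gilardi, Sabrina Di Lorenzo, Juri Ezzaini, Beryl Santa, Benjamin Streiff, Eric Zurfluh, Emma Hoes

Affiliations University of Zurich

AI summary In a preregistered survey experiment (N=599), examines how readers perceive the quality of human-written, AI-assisted, and fully AI-generated news and how willing they are to engage once AI involvement is disclosed; quality ratings were similar, but after disclosure the AI groups reported a higher short-term willingness to keep reading.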

Comments Forthcoming, Scientific Reports

Abstract

The increasing use of artificial intelligence (AI) in news production raises important questions about how audiences perceive and respond to AI-generated journalism. This preregistered survey experiment (N = 599, German-speaking Switzerland) examines (i) perceptions of article quality (measured as credibility, readability, and expertise) across news excerpts that were human-written, AI-assisted, or fully AI-generated, and (ii) self-reported intentions to engage following disclosure of AI involvement. Participants rated two short news excerpts before learning how they had been produced. Articles across all conditions were evaluated similarly in perceived quality. After disclosure, participants in the AI-assisted and AI-generated conditions reported a higher willingness to continue reading their assigned articles compared to the control group, but future willingness to read AI-generated news did not differ across conditions. Overall, the findings suggest that readers assess AI-generated and human-written news comparably in quality, while disclosure of AI use can momentarily increase curiosity or interest without yet changing longer-term reading intentions.

2410.21258 2026-06-18 quant-ph cs.CC cs.LG updated version

Provable quantum speedups for computing persistence in topological data analysis

Casper Gyurik, Alexander Schmidhuber, Robbie King, Vedran Dunjko, Ryu Hayakawa

Affiliations applied Quantum algorithms (aQa), Leiden University, 2300 RA Leiden, The Netherlands; Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, USA; Department of Computing; Yukawa Institute for Theoretical Physics & The Hakubi Center, Kyoto University, Japan

AI summary Proposes an efficient quantum algorithm for deciding whether a hole persists in topological data analysis and proves that the problem is BQP_1-hard, implying an exponential quantum speedup under standard complexity-theoretic assumptions.

Comments 17 pages

Journal ref PRX Quantum 7, 020361 (2026)

Abstract

Topological data analysis (TDA) aims to extract noise-robust features from a data set by examining the number and persistence of holes in its topology. We provide an efficient quantum algorithm for a computational problem closely related to a core task in TDA -- determining whether a given hole persists across different length scales. Further, we prove the problem itself is $\mathsf{BQP}_1$-hard, implying that a classical solution is extremely unlikely; this stands in contrast to all previous quantum approaches to TDA, where the problems were also intractable for quantum computers, or where a rigorous proof of classical hardness still remains open. This result implies an exponential quantum speedup for this problem under standard complexity-theoretic assumptions. Our approach relies on encoding the persistence of a hole in a variant of the guided sparse Hamiltonian problem, where the guiding state is constructed from a harmonic representative of the hole.

2601.18637 2026-06-18 quant-ph cs.LG stat.ML

Universality of Many-body Projected Ensemble for Learning Quantum Data Distribution

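Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

Affiliations Quantum Laboratory, Fujitsu Research, Fujitsu Limited, Kawasaki, Kanagawa 211-8588, Japan

AI summary Examines the universality of the Many-body Projected Ensemble (MPE) framework in quantum machine learning, proves that it can approximate any distribution of pure states, proposes an Incremental MPE variant to improve training, and validates its effectiveness in learning complex quantum data distributions through experiments.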

Comments 21 pages, 6 figures (added Github repository)

Journal ref IJCNN 2026

Abstract

Generating quantum data by learning the underlying quantum distribution poses challenges in both theoretical and practical scenarios, yet it is a critical task for understanding quantum systems. A fundamental question in quantum machine learning (QML) is the universality of approximation: whether a parameterized QML model can approximate any quantum distribution. We address this question by proving a universality theorem for the Many-body Projected Ensemble (MPE) framework, a method for quantum state design that uses a single many-body wave function to prepare random states. This demonstrates that MPE can approximate any distribution of pure states within a 1-Wasserstein distance error. This theorem provides a rigorous guarantee of universal expressivity, addressing key theoretical gaps in QML. For practicality, we propose an Incremental MPE variant with layer-wise training to improve the trainability. Numerical experiments on clustered quantum states and quantum chemistry datasets validate MPE's efficacy in learning complex quantum data distributions.

2512.04115 2026-06-18 cs.CY cs.AI cs.HC

Artificial Intelligence Competence of K-12 Students Shapes Their AI Risk Perception: A Co-occurrence Network Analysis

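Ville Heilala, Pieta Sikström, Mika Setälä, Tommi Kärkkäinen

Affiliations University of Jyväskylä

AI summary Uses co-occurrence network analysis to examine how the AI competence of Finnish upper secondary students relates to their risk perception, finding that competence level shapes which types of risk students focus on, with implications for the opportunities and equity of AI use in education.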

Comments Accepted for Proceedings of the 41st ACM/SIGAPP Symposium on Applied Computing (SAC'26)

Abstract

As artificial intelligence (AI) becomes increasingly integrated into education, understanding how students perceive its risks is essential for supporting responsible and effective adoption. This research aimed to examine the relationships between perceived AI competence and risks among Finnish K-12 upper secondary students (n = 163) by utilizing a co-occurrence analysis. Students reported their self-perceived AI competence and concerns related to AI across systemic, institutional, and personal domains. The findings showed that students with lower competence emphasized personal and learning-related risks, such as reduced creativity, lack of critical thinking, and misuse, whereas higher-competence students focused more on systemic and institutional risks, including bias, inaccuracy, and cheating. These differences suggest that students' self-reported AI competence is related to how they evaluate both the risks and opportunities associated with artificial intelligence in education (AIED). The results of this study highlight the need for educational institutions to incorporate AI literacy into their curricula, provide teacher guidance, and inform policy development to ensure personalized opportunities for utilization and equitable integration of AI into K-12 education.
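A co-occurrence analysis of this kind can be sketched by counting how often pairs of codes appear in the same response; the counts then serve as edge weights of a network. The coded responses below are invented for illustration and are not the study's data, and the function name is an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(responses):
    """Count how often each unordered pair of codes appears in the same response."""
    counts = Counter()
    for codes in responses:
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical coded responses: a competence level plus the risks one student named.
responses = [
    ["low_competence", "reduced_creativity", "misuse"],
    ["low_competence", "reduced_creativity"],
    ["high_competence", "bias", "cheating"],
    ["high_competence", "bias"],
]
edges = cooccurrence(responses)  # edge weights of the co-occurrence network
```

Sorting each response before pairing keeps every pair in a canonical order, so ("bias", "high_competence") is counted once per response regardless of how the codes were listed.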

2510.13562 2026-06-18 physics.med-ph cs.CV cs.NA math.NA

An efficient approach with theoretical guarantees to simultaneously reconstruct activity and attenuation sinogram for TOF-PET

Liyang Hu, Chong Chen

Affiliations State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China

AI summary Proposes a new maximum-likelihood-based method for simultaneously reconstructing the activity and attenuation sinogram in TOF-PET; by exploiting the exponential form of the attenuation correction factors and a constraint on the total activity, the authors prove the model is well-posed and show experimentally that it is superior in accuracy and efficiency.

Comments 32 pages, 11 figures, 4 tables

Journal ref IEEE Transactions on Computational Imaging 2026

Abstract

In positron emission tomography (PET), it is indispensable to perform attenuation correction in order to obtain the quantitatively accurate activity map (tracer distribution) in the body. Generally, this is carried out based on the estimated attenuation map obtained from computed tomography or magnetic resonance imaging. However, apart from errors in the attenuation correction factors obtained, the additional scan not only brings in new radiation doses and/or increases the scanning time but also leads to severe misalignment induced by various motions during and between the two sequential scans. To address these issues, based on maximum likelihood estimation, we propose a new mathematical model for simultaneously reconstructing the activity and attenuation sinogram from the time-of-flight (TOF)-PET emission data only. In particular, we make full use of the exclusively exponential form of the attenuation correction factors, and consider the constraint of a total amount of the activity in some mask region in the proposed model. Furthermore, we prove its well-posedness, including the existence, uniqueness and stability of the solution. We propose an alternating update algorithm to solve the model, and also analyze its convergence. Finally, numerical experiments with various TOF-PET emission data demonstrate that the proposed method converges numerically, is robust to noise, outperforms some state-of-the-art methods in terms of accuracy and efficiency, and is capable of autonomous attenuation correction.

2506.20869 2026-06-18 cs.SE cs.AI cs.IR

Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation

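Md Toufique Hasan, Muhammad Waseem, Kai-Kristian Kemell, Ayman Asad Khan, Mika Saari, Pekka Abrahamsson

Affiliations Faculty of Information Technology and Communication Sciences, Tampere University

AI summary Presents five domain-specific RAG applications spanning governance, cybersecurity, agriculture, industrial research, and medical diagnostics, built with multilingual OCR, semantic vector retrieval, and domain-adapted LLMs; evaluates them across six dimensions and distills twelve key lessons learned.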

Comments Published in the Proceedings of the 51st Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2025. Lecture Notes in Computer Science, volume 16082, pages 143-158. Springer, 2026

Journal ref LNCS 16082, 143-158, 2026

Abstract

Retrieval-Augmented Generation (RAG) systems are emerging as a key approach for grounding Large Language Models (LLMs) in external knowledge, addressing limitations in factual accuracy and contextual relevance. However, there is a lack of empirical studies that report on the development of RAG-based implementations grounded in real-world use cases, evaluated through general user involvement, and accompanied by systematic documentation of lessons learned. This paper presents five domain-specific RAG applications developed for real-world scenarios across governance, cybersecurity, agriculture, industrial research, and medical diagnostics. Each system incorporates multilingual OCR, semantic retrieval via vector embeddings, and domain-adapted LLMs, deployed through local servers or cloud APIs to meet distinct user needs. A web-based evaluation involving a total of 100 participants assessed the systems across six dimensions: (i) Ease of Use, (ii) Relevance, (iii) Transparency, (iv) Responsiveness, (v) Accuracy, and (vi) Likelihood of Recommendation. Based on user feedback and our development experience, we documented twelve key lessons learned, highlighting technical, operational, and ethical challenges affecting the reliability and usability of RAG systems in practice.
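The retrieval step of a RAG system, semantic retrieval via vector embeddings, can be sketched as a cosine-similarity ranking. The three-dimensional vectors, document ids, and function names below are made up for illustration; a real deployment would obtain embeddings from a trained encoder and typically use a vector database rather than a dictionary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k document ids whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical embeddings keyed by document id.
index = {
    "crop_rotation": [0.9, 0.1, 0.0],
    "firewall_rules": [0.0, 0.2, 0.9],
    "soil_ph": [0.8, 0.3, 0.1],
}
top = retrieve([1.0, 0.2, 0.0], index)
```

The retrieved passages would then be placed in the LLM prompt as grounding context, which is the generation half of the pipeline and is not shown here.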

2508.06406 2026-06-18 cs.DC cs.LG

Blockchain-Enabled Federated Learning

Murtaza Rangwala, KR Venugopal, Rajkumar Buyya

Affiliations Quantum Cloud and Distributed Systems (qCLOUDS) Lab, School of Computing and Information Systems, The University of Melbourne, Australia; Department of Computer Science and Engineering, University of Visvesvaraya College of Engineering, Bangalore University, India

AI summary Examines how blockchain-enabled federated learning addresses challenges of trust, privacy, and coordination, analyzes its architecture through a four-dimensional taxonomy, evaluates trade-offs among consensus mechanisms, storage architectures, and trust models, and presents the TrustMesh framework applied across IoT devices.

Comments 32 pages, 6 figures, chapter for edited book (Federated Learning: Foundations and Applications)

Abstract

Blockchain-enabled federated learning (BCFL) addresses fundamental challenges of trust, privacy, and coordination in collaborative AI systems. This chapter provides comprehensive architectural analysis of BCFL systems through a systematic four-dimensional taxonomy examining coordination structures, consensus mechanisms, storage architectures, and trust models. We analyze design patterns from blockchain-verified centralized coordination to fully decentralized peer-to-peer networks, evaluating trade-offs in scalability, security, and performance. Through detailed examination of consensus mechanisms designed for federated learning contexts, including Proof of Quality and Proof of Federated Learning, we demonstrate how computational work can be repurposed from arbitrary cryptographic puzzles to productive machine learning tasks. The chapter addresses critical storage challenges by examining multi-tier architectures that balance blockchain's transaction constraints with neural networks' large parameter requirements while maintaining cryptographic integrity. A technical case study of the TrustMesh framework illustrates practical implementation considerations in BCFL systems through distributed image classification training, demonstrating effective collaborative learning across IoT devices with highly non-IID data distributions while maintaining complete transparency and fault tolerance. Analysis of real-world deployments across healthcare consortiums, financial services, and IoT security applications validates the practical viability of BCFL systems, achieving performance comparable to centralized approaches while providing enhanced security guarantees and enabling new models of trustless collaborative intelligence.

2507.05647 2026-06-18 eess.IV cs.CV

Diffusion-Based Limited-Angle CT Reconstruction under Noisy Conditions

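Jiaqi Guo, Santiago López-Tapia

Affiliations Dept. of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA

AI summary Proposes a diffusion-based method for limited-angle CT reconstruction that completes missing angular views with a Mean-Reverting Stochastic Differential Equation and adds a noise-aware rectification mechanism for robustness; experiments show strong performance across noise intensities and acquisition conditions.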

Comments Accepted at the 2025 IEEE International Conference on Image Processing (ICIP), Workshop

Abstract

Limited-Angle Computed Tomography (LACT) is a challenging inverse problem where missing angular projections lead to incomplete sinograms and severe artifacts in the reconstructed images. While recent learning-based methods have demonstrated effectiveness, most of them assume ideal, noise-free measurements and fail to address the impact of measurement noise. To overcome this limitation, we treat LACT as a sinogram inpainting task and propose a diffusion-based framework that completes missing angular views using a Mean-Reverting Stochastic Differential Equation (MR-SDE) formulation. To improve robustness under realistic noise, we propose RNSD$^+$, a novel noise-aware rectification mechanism that explicitly models inference-time uncertainty, enabling reliable and robust reconstruction. Extensive experiments demonstrate that our method consistently surpasses baseline models in data consistency and perceptual quality, and generalizes well across varying noise intensity and acquisition scenarios.
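To give a feel for what "mean-reverting" means, the sketch below simulates a generic scalar mean-reverting SDE, dx = theta * (mu - x) dt + sigma dW, with Euler-Maruyama steps. The scalar setting, the parameter names, and the constant coefficients are assumptions; the abstract does not spell out the MR-SDE used for sinogram inpainting, so this should not be read as the paper's formulation.

```python
import random


def simulate_mean_reverting(x0, mu, theta, sigma, dt, steps, seed=0):
    """Euler-Maruyama integration of dx = theta * (mu - x) dt + sigma dW."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        # Brownian increment over a step of length dt has standard deviation sqrt(dt).
        noise = rng.gauss(0.0, 1.0) * (dt ** 0.5)
        x += theta * (mu - x) * dt + sigma * noise
        path.append(x)
    return path


# With sigma = 0 the drift alone pulls the state from x0 toward the mean mu.
noiseless = simulate_mean_reverting(x0=5.0, mu=1.0, theta=2.0, sigma=0.0, dt=0.01, steps=1000)
noisy = simulate_mean_reverting(x0=5.0, mu=1.0, theta=2.0, sigma=0.3, dt=0.01, steps=1000)
```

The drift term is what distinguishes this process from a pure noising process: the state is continually pulled toward mu while noise is added.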

2506.09822 2026-06-18 cs.CE cs.AI

Superstudent intelligence in thermodynamics

Rebecca Loubet, Pascal Zittlau, Marco Hoffmann, Luisa Vollmer, Sophie Fellenz, Heike Leitte, Fabian Jirasek, Johannes Lenhard, Hans Hasse

Affiliations Laboratory of Engineering Thermodynamics (LTD); Visual Information Analysis Research Group (VIA); Machine Learning Research Group (ML)

AI summary Shows that OpenAI's o3 model outperformed all students in a thermodynamics exam, demonstrating machine capability on complex tasks, with implications for engineering education and practice.

Comments This document is the unedited Author's version of a yet to be Submitted Work to Physical Review Physics Education Research. 15 pages, 2 figures, Graphical Abstract, Highlights and SI available (12 pages)

Abstract

In this short note, we report and analyze a striking event: OpenAI's large language model o3 has outwitted all students in a university exam on thermodynamics. The thermodynamics exam is a difficult hurdle for most students, where they must show that they have mastered the fundamentals of this important topic. Consequently, the failure rates are very high, A-grades are rare - and they are considered proof of the students' exceptional intellectual abilities. This is because pattern learning does not help in the exam. The problems can only be solved by knowledgeably and creatively combining principles of thermodynamics. We have given our latest thermodynamics exam not only to the students but also to OpenAI's most powerful reasoning model, o3, and have assessed the answers of o3 exactly the same way as those of the students. In zero-shot mode, the model o3 solved all problems correctly, better than all students who took the exam; its overall score was in the range of the best scores we have seen in more than 10,000 similar exams since 1985. This is a turning point: machines now excel in complex tasks, usually taken as proof of human intellectual capabilities. We discuss the consequences this has for the work of engineers and the education of future engineers.

2505.03863 2026-06-18 cs.CR cs.AI

Data-Driven Falsification of Cyber-Physical Systems

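Atanu Kundu, Sauvik Gon, Rajarshi Ray

Affiliations Indian Association for the Cultivation of Science

AI summary Proposes a framework that connects the falsification of cyber-physical systems with the falsification of deep neural networks and uses the interpretability of decision trees to speed up falsification, showing promise in efficiently finding multiple counterexamples on the ARCH-COMP 2024 benchmarks.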

Abstract

Cyber-Physical Systems (CPS) are abundant in safety-critical domains such as healthcare, avionics, and autonomous vehicles. Formal verification of their operational safety is, therefore, of utmost importance. In this paper, we address the falsification problem, where the focus is on searching for an unsafe execution in the system instead of proving its absence. The contribution of this paper is a framework that (a) connects the falsification of CPS with the falsification of deep neural networks (DNNs) and (b) leverages the inherent interpretability of Decision Trees for faster falsification of CPS. This is achieved by: (1) building a surrogate model of the CPS under test, either as a DNN model or a Decision Tree, (2) applying various DNN falsification tools to falsify CPS, and (3) a novel falsification algorithm guided by the explanations of safety violations of the CPS model extracted from its Decision Tree surrogate. The proposed framework has the potential to exploit a repertoire of adversarial attack algorithms designed to falsify robustness properties of DNNs, as well as state-of-the-art falsification algorithms for DNNs. Although the presented methodology is applicable to systems that can be executed/simulated in general, we demonstrate its effectiveness, particularly in CPS. We show that our framework, implemented as a tool FlexiFal, can detect hard-to-find counterexamples in CPS that have linear and non-linear dynamics. Decision tree-guided falsification shows promising results in efficiently finding multiple counterexamples in the ARCH-COMP 2024 falsification benchmarks [khandait2024arch].
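The falsification problem itself, searching for an input that drives the system into an unsafe state, can be sketched with plain random search. The toy dynamics, the safety limit, and the function names are hypothetical, and this baseline is neither the DNN-based nor the decision-tree-guided algorithm of FlexiFal.

```python
import random


def simulate(u, steps=50, dt=0.1):
    """Hypothetical system under test: x' = -x + u, started at x = 0 (forward Euler)."""
    x, trace = 0.0, []
    for _ in range(steps):
        x += dt * (-x + u)
        trace.append(x)
    return trace


def robustness(trace, limit=1.5):
    """Positive when the safety property 'x stays below limit' holds on the trace."""
    return limit - max(trace)


def falsify(budget=200, seed=1):
    """Random search for a constant input whose trace violates the safety property."""
    rng = random.Random(seed)
    for _ in range(budget):
        u = rng.uniform(0.0, 2.0)
        if robustness(simulate(u)) < 0:
            return u  # counterexample: an input that produces an unsafe execution
    return None


counterexample = falsify()
```

A guided method would replace the uniform sampling with a search informed by a surrogate model, aiming to reach violating inputs with fewer simulations.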

2406.15537 2026-06-18 q-bio.NC cs.AI cs.SD eess.AS

R&B -- Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity

Matteo Ferrante, Matteo Ciferri, Nicola Toschi

Affiliations Department of Biomedicine and Prevention, University of Rome Tor Vergata; A.A. Martinos Center for Biomedical Imaging, Harvard Medical School/MGH, Boston (US)

AI summary Decodes music from fMRI data using the CLAP model and voxel-wise encoding models, achieving cross-subject music identification and advancing understanding of the neural basis of music perception and emotion.

Comments The first two authors contributed equally to this work

Journal ref Neural Networks, 203, 109195 (2026)

Abstract

Music is a universal phenomenon that profoundly influences human experiences across cultures. This study investigates whether music can be decoded from human brain activity measured with functional MRI (fMRI) during its perception. Leveraging recent advancements in extensive datasets and pre-trained computational models, we construct mappings between neural data and latent representations of musical stimuli. Our approach integrates functional and anatomical alignment techniques to facilitate cross-subject decoding, addressing the challenges posed by the low temporal resolution and signal-to-noise ratio (SNR) in fMRI data. Starting from the GTZan fMRI dataset, where five participants listened to 540 musical stimuli from 10 different genres while their brain activity was recorded, we used the CLAP (Contrastive Language-Audio Pretraining) model to extract latent representations of the musical stimuli and developed voxel-wise encoding models to identify brain regions responsive to these stimuli. By applying a threshold to the association between predicted and actual brain activity, we identified specific regions of interest (ROIs) which can be interpreted as key players in music processing. Our decoding pipeline, primarily retrieval-based, employs a linear map to project brain activity to the corresponding CLAP features. This enables us to predict and retrieve the musical stimuli most similar to those that originated the fMRI data. Our results demonstrate state-of-the-art identification accuracy, with our methods significantly outperforming existing approaches. Our findings suggest that neural-based music retrieval systems could enable personalized recommendations and therapeutic applications. Future work could use higher temporal resolution neuroimaging and generative models to improve decoding accuracy and explore the neural underpinnings of music perception and emotion.

2211.01960 2026-06-18 q-bio.NC cs.HC cs.LG

FingerFlex: Inferring Finger Trajectories from ECoG signals

FingerFlex:从ECoG信号推断手指轨迹

Vladislav Lomtev, Alexander Kovalev, Alexey Timchenko

Affiliations * Bauman Moscow State Technical University; ALVI Labs; Brain Dynamics Group, Higher School of Economics; University of Tuebingen

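AI summary Proposes FingerFlex, a convolutional encoder-decoder architecture for regressing finger movements from electrocorticographic (ECoG) data. It reaches a correlation coefficient of up to 0.74 between true and predicted trajectories, a step toward high-precision cortical motor brain-computer interfaces.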

Comments 6 pages, 3 figures, 4 tables. Preprint. Under review

Journal ref 10.1109/IEEECONF58974.2023.10405112

详情
AI中文摘要

运动脑机接口(BCI)的发展严重依赖于神经时间序列解码算法。近年来深度学习架构的进步使得自动特征选择能够近似数据中的高阶依赖关系。本文提出了FingerFlex模型——一种针对电极皮层脑数据中手指运动回归的卷积编码器-解码器架构。在公开的BCI竞赛IV数据集4上,取得了最先进的性能,真值与预测轨迹之间的相关系数高达0.74。所提出的方法为开发完全功能的高精度运动皮层脑机接口提供了机会。

英文摘要

Motor brain-computer interface (BCI) development relies critically on neural time series decoding algorithms. Recent advances in deep learning architectures allow for automatic feature selection to approximate higher-order dependencies in data. This article presents the FingerFlex model - a convolutional encoder-decoder architecture adapted for finger movement regression on electrocorticographic (ECoG) brain data. State-of-the-art performance was achieved on a publicly available BCI competition IV dataset 4 with a correlation coefficient between true and predicted trajectories up to 0.74. The presented method provides the opportunity for developing fully-functional high-precision cortical motor brain-computer interfaces.
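The evaluation metric quoted in this abstract can be illustrated with a short sketch. It assumes the reported figure is Pearson's correlation coefficient computed per finger between the true and predicted trajectory (the abstract says only "correlation coefficient"); the function name `pearson_r` and the sample values are illustrative.

```python
import math
from typing import Sequence


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation between a true and a predicted trajectory."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("trajectories must have equal length >= 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up finger flexion samples; a real evaluation would use the held-out
# portion of the recording.
true_flexion = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted_flexion = [0.1, 0.9, 2.2, 2.8, 4.1]
print(pearson_r(true_flexion, predicted_flexion))
```

Because the metric is invariant to scale and offset, it rewards predicting the shape of a movement rather than its exact amplitude.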

2606.19194 2026-06-18 cs.RO New submission

Invertible Neural Network Adapter for One-Step Flow Matching in Robot Manipulation

用于机器人操作中一步流匹配的可逆神经网络适配器

Yu Zhang, Kangyi Ji, Yongxiang Zou, Rongtao Xu, Feng Zheng, Long Cheng

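AI summary Proposes an invertible neural network adapter that generates high-dimensional actions in a single denoising step, reducing inference complexity while maintaining accuracy and improving efficiency in both simulation and real-world experiments.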

详情
AI中文摘要

本文提出了一种用于通用机器人操作的可逆神经网络适配器,旨在通过一步去噪过程,基于多模态观测(包括视觉、语言和本体感受输入)生成精确的高维动作。基于流匹配公式,所提出的适配器有效地将动作生成轨迹约束在可逆潜空间内,从而仅需单次推理步骤即可实现高效、高质量的灵巧动作合成。与传统的迭代流匹配策略相比,所提出的框架显著降低了推理复杂度,同时保持了强大的动作预测精度和稳定性。在多种仿真基准和真实机器人平台上进行了大量实验,以评估所提出方法的有效性。在仿真基准测试中,所提出的适配器在广泛的操作任务上持续表现出优于或接近最先进的性能。此外,真实世界实验显示,视觉-语言-动作(VLA)模型的推理效率显著提升,平均推理延迟从110毫秒降低到61毫秒,同时保持了强大的任务性能。

英文摘要

This paper presents an invertible neural network adapter for general robotic manipulation, designed to generate precise high-dimensional actions conditioned on multimodal observations, including visual, linguistic, and proprioceptive inputs, through a one-step denoising process. Built upon a flow-matching formulation, the proposed adapter effectively constrains the action generation trajectory within an invertible latent space, thereby enabling efficient and high-quality dexterous action synthesis with only a single inference step. Compared with conventional iterative flow-matching policies, the proposed framework substantially reduces inference complexity while maintaining strong action prediction accuracy and stability. Extensive experiments are conducted across a diverse set of simulation benchmarks and real-world robotic platforms to evaluate the effectiveness of the proposed method. Across simulation benchmarks, the proposed adapter consistently demonstrates superior or near state-of-the-art performance on a wide range of manipulation tasks. Furthermore, real-world experiments reveal a significant improvement in inference efficiency for vision-language-action (VLA) models, reducing the average inference latency from 110 ms to 61 ms while maintaining strong task performance.
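The contrast between iterative and one-step sampling can be sketched with a toy velocity field. This is an assumption-laden illustration, not the proposed adapter: `make_straight_field` is a hypothetical hand-built field whose trajectories are exactly straight, which is the idealized case in which a single Euler step already lands on the target. A learned field is generally curved, and making one step sufficient is what the paper attributes to its invertible latent space.

```python
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]


def euler_sample(x0: Vector, velocity: VelocityField, num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)  # one "network call" per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_straight_field(target: Vector) -> VelocityField:
    """Toy field whose trajectories are straight lines ending at `target`."""
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


target_action = [0.5, -0.2, 1.0]   # assumed 3-dimensional action
noise = [1.3, 0.4, -0.7]           # assumed starting noise sample
field = make_straight_field(target_action)

one_step = euler_sample(noise, field, num_steps=1)    # 1 evaluation
ten_steps = euler_sample(noise, field, num_steps=10)  # 10 evaluations
print(one_step, ten_steps)
```

Both calls reach the same point here, so the ten-step variant spends ten evaluations for no gain; inference latency scales with the number of evaluations.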

2606.18698 2026-06-18 cs.RO cs.AI cs.LG New submission

Leveraging Energy Features for Surface Classification with Deep Learning: A Comparative Analysis Across Three Independent Datasets

利用能量特征进行基于深度学习的表面分类:三个独立数据集的比较分析

Alexander Belyaev, Oleg Kushnarev

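AI summary Evaluates energy-derived features as a standalone or supplementary modality for surface classification, comparing several deep learning architectures on three datasets. CNNs perform best; energy features alone reach 85-90% accuracy, combining them with inertial features reaches 96-99%, and adding energy features to inertial data gives a consistent 1-2% gain.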

详情
AI中文摘要

基于能量的方法在移动机器人表面分类中仍是一个相对未被充分研究的途径,尽管在受限环境中取得了有希望的结果。本研究评估了使用能量衍生特征作为独立分类模态或作为惯性数据补充输入的可行性。在三个公开数据集上进行了全面评估,比较了现代深度学习架构(包括循环神经网络、卷积神经网络、仅编码器变压器和Mamba状态空间模型)在自动超参数调整和输入序列长度优化下的性能。模型在所有评估数据集上均实现了比先前报道值更高的准确率,其中卷积神经网络取得了最高的整体性能。当仅依赖基于能量的特征时,模型分类准确率在85-90%范围内,比与惯性特征结合时(96-99%)低约5-10%。用能量特征增强惯性数据导致平均准确率持续提高1-2%。这些发现表明,仅依赖能量特征的分类器为独立部署提供了足够的准确性,同时在与其它感知模态结合使用时也提供了一致的增益。

英文摘要

The energy-based method remains a comparatively underexamined approach for surface classification in mobile robotics, despite promising results in constrained environments. This study evaluated the viability of using energy-derived features as either a standalone classification modality or as supplementary input to inertial data. A comprehensive evaluation was conducted across three publicly available datasets, comparing the performance of modern deep learning architectures including recurrent neural networks, convolutional neural networks, encoder-only transformers, and Mamba state-space models, under automated hyperparameter tuning and input sequence length optimization. The models achieved higher accuracy than previously reported values on all evaluated datasets, with the convolutional neural network yielding the highest overall performance. When relying exclusively on energy-based features, the models attained classification accuracies in the range of 85-90%, approximately 5-10% lower than those achieved when combined with inertial features (96-99%). Augmenting inertial data with energy features resulted in a consistent mean accuracy improvement of 1-2%. These findings indicate that classifiers relying solely on energy features offer sufficient accuracy for standalone deployment, while also providing a consistent gain when used in combination with other sensing modalities.

2606.18418 2026-06-18 cs.LG New submission

P$^2$CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations

P$^2$CE: 模型无关的可行帕累托最优反事实解释

Arthur Hendricks Mendes de Oliveira, Giovani Valdrighi, Marcos Medeiros Raimundo

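AI summary Proposes P$^2$CE, an algorithm that uses an isolation forest outlier detector and SHAP values to generate plausible, Pareto-optimal counterfactual explanations, balancing feasibility, plausibility, and computational efficiency.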

Comments Under review in the Machine Learning journal

详情
AI中文摘要

机器学习算法在社会应用中的日益普及引发了对公平性和透明度的担忧,从而推动了反事实解释的发展。这些解释通过提供可操作的输入特征更改,帮助个人理解并可能改变在贷款申请、工作选择等领域的不利决策。现有方法往往难以平衡可行性、合理性和计算效率。为此,我们提出了P$^2$CE,一种生成可行帕累托最优反事实解释的算法,为用户提供不同可行性概念之间的多样化最优权衡。P$^2$CE使用辅助隔离森林异常检测器确保解释符合数据分布,并利用SHAP值在短时间内获得最优结果,与底层模型无关。我们在三个数据集上进行了实证评估,结果表明,与相关技术相比,该算法在解决方案质量和计算效率方面均表现出优越性能。

英文摘要

The increasing use of machine learning algorithms in social applications has raised concerns about fairness and transparency, leading to the development of counterfactual explanations. These explanations help individuals understand and potentially alter unfavorable decisions in areas such as loan applications and job selection by providing actionable changes to input features that would lead to a desired outcome. Existing methods often struggle to balance feasibility, plausibility, and computational efficiency. To address this, we introduce P$^2$CE, an algorithm for generating plausible Pareto-optimal counterfactual explanations, offering users a diverse set of optimal trade-offs between different notions of feasibility. P$^2$CE employs an auxiliary isolation forest outlier detector to ensure that explanations are in accordance with the data distribution and leverages SHAP values to obtain optimal results with short computing times, regardless of the underlying model. Our algorithm was empirically evaluated on three datasets, demonstrating superior performance in terms of both solution quality and computational efficiency compared to related techniques.
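What "Pareto-optimal" means for a set of counterfactuals can be shown with a small filter. This sketch covers only the dominance check; it does not implement the SHAP-guided search or the isolation forest plausibility test named in the abstract. The two cost dimensions (number of features changed, total normalized change) are assumed stand-ins for "different notions of feasibility".

```python
from typing import List, Tuple

Cost = Tuple[float, ...]


def dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b on every cost and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates: List[Cost]) -> List[Cost]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical counterfactuals that all flip the model's decision, scored as
# (number of features changed, total normalized change).
candidates = [(1, 0.9), (2, 0.4), (3, 0.1), (2, 0.6), (3, 0.5)]
print(pareto_front(candidates))
```

The three surviving candidates are the trade-offs a user would choose among: change one feature a lot, or several features a little.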

2606.18318 2026-06-18 cs.CV cs.CR New submission

Budget-Aware Adaptive Adversarial Patches for Black-Box Object Detection

预算感知的自适应对抗补丁用于黑盒目标检测

Pedram MohajerAnsari, Amir Salarpour, David Fernandez, Mert D. Pesé

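AI summary Proposes a query-efficient, budget-adaptive black-box attack that combines contextual Thompson-sampling placement with NES-style pixel updates. Under a strict plain-image suppression test, it achieves strong suppression on CNN-based detectors and substantial suppression on a transformer-based detector, and exposes a query-footprint trade-off.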

Comments Accepted to the 2026 IEEE International Conference on Image Processing (ICIP 2026)

详情
AI中文摘要

对抗补丁对现代目标检测器构成实际威胁。先前工作揭示了脆弱性,但三个差距限制了可操作的见解:(i) 很少有基于分数的黑盒攻击在严格查询预算下联合优化补丁的位置、纹理和大小;(ii) 成功很少与补丁的视觉足迹相关联;(iii) 评估常常混淆EOT鲁棒性与纯视图抑制。我们提出\method{},一种查询高效、预算自适应的黑盒攻击,它结合了轻量级的上下文汤普森采样放置器与NES风格的像素更新,仅在进展停滞时增大补丁。报告基于严格的纯图像抑制测试;EOT被审计但从不作为成功的替代,可选的外观/可打印性权重揭示了强度-可见性权衡。在YOLOv5、Faster R-CNN和YOLOS上,\method{}在基于CNN的检测器上实现了强抑制,在基于Transformer的检测器上实现了显著抑制,使用紧凑的补丁,并相对于固定大小和启发式基线暴露了清晰的查询-足迹权衡。打印-捕获实验进一步展示了跨未见物理对象和视角的迁移。

英文摘要

Adversarial patches pose a practical threat to modern object detectors. Prior work shows vulnerability, but three gaps limit actionable insight: (i) few score-based black-box attacks jointly optimize patch location, texture, and size under tight query budgets; (ii) success is rarely tied to the patch's visual footprint; and (iii) evaluations often conflate EOT robustness with plain-view suppression. We present \method{}, a query-efficient, budget-adaptive black-box attack that couples a lightweight Contextual Thompson-Sampling placer with NES-style pixel updates, growing the patch only when progress stalls. Reporting is anchored by a strict plain-image suppression test; EOT is audited but never used as a substitute for success, and optional appearance/printability weights expose strength-visibility trade-offs. Across YOLOv5, Faster R-CNN, and YOLOS, \method{} achieves strong suppression on CNN-based detectors and substantial suppression on the transformer-based detector, using compact patches and exposing clear query-footprint trade-offs relative to fixed-size and heuristic baselines. A print-capture pilot further shows transfer across unseen physical objects and viewpoints.
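The "NES-style pixel updates" mentioned in this abstract rely on estimating a gradient from score queries alone. A generic antithetic-sampling estimator is sketched below; it is not the paper's attack, and `toy_score` is a hypothetical linear stand-in for a detector confidence so that the estimate can be compared with a known gradient.

```python
import random
from typing import Callable, List, Optional


def nes_gradient(score: Callable[[List[float]], float],
                 x: List[float],
                 sigma: float = 0.1,
                 num_pairs: int = 50,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Estimate the gradient of a black-box score with antithetic sampling.

    Uses 2 * num_pairs score queries and no access to model internals.
    """
    if rng is None:
        rng = random.Random(0)
    grad = [0.0] * len(x)
    for _ in range(num_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        plus = score([xi + sigma * e for xi, e in zip(x, eps)])
        minus = score([xi - sigma * e for xi, e in zip(x, eps)])
        weight = (plus - minus) / (2.0 * sigma * num_pairs)
        grad = [g + weight * e for g, e in zip(grad, eps)]
    return grad


# Hypothetical detector confidence: a linear function of three "pixels".
true_grad = [1.0, -2.0, 0.5]


def toy_score(pixels: List[float]) -> float:
    return sum(a * p for a, p in zip(true_grad, pixels))


estimate = nes_gradient(toy_score, [0.2, 0.4, 0.6], num_pairs=4000)
print(estimate)
```

Each sampled pair costs two queries, which is why a tight query budget directly limits how accurate the pixel updates can be.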

2606.18283 2026-06-18 cs.LG New submission

Gaussian Mixture Attention: Linear-Time Sequence Mixing via Probabilistic Latent Routing

高斯混合注意力:通过概率潜在路由实现线性时间序列混合

Yongchao Huang, Hassan Raza

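AI summary Proposes Gaussian Mixture Attention (GMA), which replaces pairwise query-key comparison with latent routing through K Gaussian mixture components, giving linear memory scaling for fixed K and performance competitive with attention baselines on long-context classification.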

Comments 55 pages

详情
AI中文摘要

标准点积注意力的密集token间交互模式仍然是扩展Transformer架构到长上下文的主要瓶颈。我们引入\textbf{高斯混合注意力(GMA)},一种概率注意力风格的序列混合器,通过$K$个学习的高斯混合分量进行路由,替代显式的逐对查询-键比较。查询和键被映射到共享潜在路由空间上的后验\textit{责任}向量;它们的重叠定义了隐式的责任空间亲和性,而值被写入和读取自一个$K$槽的潜在记忆。通过利用矩阵乘法的结合性,GMA避免了生成诱导的$N\times N$亲和矩阵,而是使用两个责任矩阵,其主导激活存储规模为$\mathcal{O}(NK)$而非固定$K$下的$\mathcal{O}(N^2)$。我们制定了GMA的双向和因果变体,提供了高斯混合分量的端到端可微参数化,并分析了其责任调制的梯度结构、约束非负低秩亲和性解释以及局部路由稳定性。实验上,GMA表现出预期的固定$K$线性内存缩放,并在长上下文分类上与注意力基线竞争,而因果GMA在WikiText-103上优于测试的线性/随机特征注意力变体,但在当前实现中仍落后于优化的因果SDPA和Mamba。对学习到的责任的分析进一步显示了广泛的组件使用和与表面形式词类别的适度对齐,支持GMA作为一种概率性、可解释、固定$K$的线性时间注意力风格替代方案,而非优化softmax注意力或状态空间模型的通用替代。

英文摘要

The dense token-to-token interaction pattern of standard dot-product attention remains a central bottleneck in scaling Transformer architectures to long contexts. We introduce Gaussian Mixture Attention (GMA), a probabilistic attention-style sequence mixer that replaces explicit pairwise query-key comparison with routing through $K$ learned Gaussian mixture components. Queries and keys are mapped to posterior responsibility vectors over a shared latent routing space; their overlap defines an implicit responsibility-space affinity, while values are written into and read from a $K$-slot latent memory. By exploiting the associativity of matrix multiplication, GMA avoids materializing the induced $N\times N$ affinity matrix and instead uses two responsibility matrices whose dominant activation storage scales as $\mathcal{O}(NK)$ rather than $\mathcal{O}(N^2)$ for fixed $K$. We formulate bidirectional and causal variants of GMA, provide an end-to-end differentiable parameterization of the Gaussian mixture components, and analyze its responsibility-modulated gradient structure, constrained non-negative low-rank affinity interpretation, and local routing stability. Empirically, GMA exhibits the intended fixed-$K$ linear memory scaling and is competitive with attention-style baselines on long-context classification, while causal GMA improves over tested linear/random-feature attention variants on WikiText-103 but remains behind optimized causal SDPA and Mamba in the current implementation. Analysis of learned responsibilities further shows broad component usage and moderate alignment with surface-form token categories, supporting GMA as a probabilistic, interpretable, fixed-$K$ linear-time attention-style alternative rather than a universal replacement for optimized softmax attention or state-space models.
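The associativity argument in this abstract can be checked on a toy example. The sketch below assumes the responsibilities are already computed and omits any normalization or causal masking the actual method may apply; it shows only that writing values into a K-slot memory first gives the same output as materializing the N x N affinity. The matrices and names (`R_q`, `R_k`, `V`) are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


# N = 4 tokens, K = 2 mixture components, d = 3 value dimensions.
R_q = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]  # query responsibilities, N x K
R_k = [[0.6, 0.4], [0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]  # key responsibilities, N x K
V = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0],
     [3.0, 1.0, 0.0], [1.0, 2.0, 1.0]]                  # values, N x d

# Quadratic route: build the N x N affinity, then apply it to the values.
affinity = matmul(R_q, transpose(R_k))   # N x N
out_quadratic = matmul(affinity, V)      # N x d

# Linear route: write values into a K-slot memory, then read it back.
memory = matmul(transpose(R_k), V)       # K x d
out_linear = matmul(R_q, memory)         # N x d
```

The linear route never stores anything larger than N x K or K x d, which is where the $\mathcal{O}(NK)$ activation footprint comes from.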

2602.11557 2026-06-18 cs.LG stat.ML Cross-list

The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient

小批量随机梯度最陡下降的隐式偏差

Jichu Li, Xuan Tang, Difan Zou

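AI summary Studies the implicit bias of mini-batch stochastic steepest descent in multi-class classification, showing how batch size, momentum, and variance reduction affect max-margin behavior and convergence rates. Momentum enables small-batch convergence, and variance reduction recovers the full-batch implicit bias.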

详情
AI中文摘要

多种广泛使用的优化方法,如SignSGD和Muon,可以被解释为在不同范数诱导几何下的最陡下降实例。在这项工作中,我们研究了多类分类中小批量随机最陡下降的隐式偏差,刻画了批大小、动量和方差缩减如何在一般逐项和Schatten-$p$范数下塑造极限最大间隔行为和收敛率。我们证明,在没有动量时,最坏情况下的收敛和成功分类只能通过全批量梯度保证。相反,动量通过批量-动量权衡使得小批量收敛到近似最大间隔解成为可能,尽管会减慢收敛速度。该方法提供了完全显式、与维度无关的收敛率,优于先前的结果。此外,我们证明方差缩减可以恢复任意批大小下的精确全批量隐式偏差,尽管收敛速度较慢。最后,我们进一步研究了无动量的单批量最陡下降,并通过一个具体数据示例揭示了其收敛到根本不同偏差的特性,这揭示了纯随机更新的一个关键局限性。总体而言,我们的统一分析阐明了随机优化何时与全批量行为一致,并为更深入地探索随机梯度最陡下降算法的训练行为铺平了道路。

英文摘要

A variety of widely used optimization methods such as SignSGD and Muon can be interpreted as instances of steepest descent under different norm-induced geometries. In this work, we study the implicit bias of mini-batch stochastic steepest descent in multi-class classification, characterizing how batch size, momentum, and variance reduction shape the limiting max-margin behavior and convergence rates under general entry-wise and Schatten-$p$ norms. We show that, without momentum, worst-case convergence and successful classification can only be guaranteed with full-batch gradient. In contrast, momentum enables small-batch convergence to an approximate max-margin solution through a batch-momentum trade-off, though it slows convergence. This approach provides fully explicit, dimension-free rates that improve upon prior results. Moreover, we prove that variance reduction can recover the exact full-batch implicit bias for any batch size, albeit at a slower convergence rate. Finally, we further investigate the batch-size-one steepest descent without momentum, and reveal its convergence to a fundamentally different bias via a concrete data example, exposing a key limitation of purely stochastic updates. Overall, our unified analysis clarifies when stochastic optimization aligns with full-batch behavior, and paves the way for deeper explorations of the training behavior of stochastic gradient steepest descent algorithms.
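The link between sign-based updates and steepest descent can be made concrete. The sketch below computes the unit-norm direction that most decreases a linearized loss under two entry-wise norms: the max norm yields the sign update, and the Euclidean norm yields the normalized gradient. The Schatten-norm geometry associated with Muon is not shown, and the function name is illustrative.

```python
import math
from typing import List


def steepest_direction(grad: List[float], norm: str) -> List[float]:
    """Return the unit-norm direction v that minimizes <grad, v>."""
    if norm == "linf":
        # Max-norm ball: every coordinate moves fully against its gradient sign.
        return [0.0 if g == 0 else -math.copysign(1.0, g) for g in grad]
    if norm == "l2":
        length = math.sqrt(sum(g * g for g in grad))
        if length == 0:
            return [0.0 for _ in grad]
        return [-g / length for g in grad]
    raise ValueError(f"unsupported norm: {norm}")


grad = [3.0, -4.0, 0.0]
print(steepest_direction(grad, "linf"))  # sign-based update
print(steepest_direction(grad, "l2"))    # normalized gradient update
```

In a mini-batch setting `grad` would be a noisy estimate; the sign operation is nonlinear, so averaging signs of noisy gradients differs from taking the sign of the averaged gradient, which is the kind of batch-size effect the abstract analyzes.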

2605.02089 2026-06-18 cs.CV Version update

Cross-Lingual Learning within Arabic Script for Low-Resource HTR

阿拉伯文字内低资源手写文本识别的跨语言学习

Sana Al-azzawi, Elisa Barney, Marcus Liwicki

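AI summary For low-resource handwritten text recognition in Arabic script, evaluates cross-lingual joint training of CRNN and HTR-VT models on KHATT, NUST-UHWR, and PHTD, and finds that it reduces character error rates in low-resource settings.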

Comments Accepted at the DALL workshop at ICDAR 2026

详情
AI中文摘要

有限标注数据下的手写文本识别(HTR)仍然是一个具有挑战性的问题,尤其是对于阿拉伯文字语言。尽管现代基于序列的识别器在高资源设置下表现良好,但随着训练数据的稀缺,其准确率急剧下降。阿拉伯文字语言共享一个书写系统,具有大量字符重叠,这促使跨语言学习成为缓解数据稀缺的一种策略。我们在低资源场景(样本数K=100、500、1000标注行)下,对阿拉伯语(KHATT)、乌尔都语(NUST-UHWR)和波斯语(PHTD)进行了受控的行级跨语言联合训练研究。基于CRNN和Vision Transformer的HTR-VT模型在多个相关阿拉伯文字数据集的联合集上进行训练以缓解数据稀缺,并在单个目标语言上进行评估。两种架构在低资源条件下均受益于跨语言训练。CRNN在目标语言数据极其有限时仍然更有效,而随着更多目标语言数据的可用,HTR-VT的跨语言训练收益变得不太一致。在波斯语(PHTD)上,联合训练实现了9.99的字符错误率(CER),尽管未使用全部可用训练数据,仍超越了先前报告的结果。在另一个乌尔都语数据集(UNHD)上,联合训练将CER从17.20降低到14.45。

英文摘要

Handwritten Text Recognition (HTR) with limited labeled data remains a challenging problem, particularly for Arabic-script languages. Although modern sequence-based recognizers perform well in high-resource settings, their accuracy degrades sharply as training data becomes scarce. Arabic-script languages share a common writing system with substantial character overlap, motivating cross-lingual learning as a strategy to mitigate data scarcity. We conduct a controlled line-level study of cross-lingual joint training for Arabic-script HTR under low-resource regimes (number of samples K = 100, 500, 1000 labeled lines) on Arabic (KHATT), Urdu (NUST-UHWR) and Persian (PHTD). CRNN and Vision Transformer-based HTR-VT models are trained on the union of multiple related Arabic-script datasets to mitigate the data scarcity and are evaluated on individual target languages. Both architectures benefit from cross-language training under low-resource conditions. CRNN remains more effective under extremely limited target-language data, whereas the benefits of cross-language training for HTR-VT become less consistent as larger amounts of target-language data become available. On Persian (PHTD), joint training achieves a Character Error Rate (CER) of 9.99, surpassing previously reported results despite not using the full available training data. On an additional Urdu dataset (UNHD), joint training reduces CER from 17.20 to 14.45.
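The Character Error Rate quoted in this abstract is conventionally the character-level edit distance divided by the reference length. The sketch below assumes that standard definition and reports it as a percentage, which matches the magnitude of the quoted figures but is not confirmed by the abstract. Characters are compared as Unicode code points, so any script-specific normalization is left out.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


print(cer("handwriting", "handwritng"))  # one deleted character out of 11
```

A corpus-level CER is usually obtained by summing edit distances and reference lengths over all lines before dividing, rather than averaging per-line rates.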

2605.29649 2026-06-18 cs.AI Version update

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

LLM进化的符号AI规划领域无关启发式

Elliot Gestrin, Jendrik Seipp

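AI summary Uses evolutionary search to have large language models generate domain-independent heuristics that outperform the best hand-engineered heuristics on unseen test domains, and benchmarks the informedness-speed trade-off of heuristics, which the authors believe has not been done before.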

Comments Accepted at the LM4Plan workshop at ICAPS 2026

详情
AI中文摘要

启发式搜索是符号AI规划中的主导范式,最强的启发式是规划研究者数十年工作的成果。最近的工作表明,大型语言模型(LLM)可以为单个规划领域设计启发式,但迄今为止,没有LLM生成的启发式能在任意规划任务上工作。在本文中,我们使用进化搜索来产生第一个LLM生成的领域无关启发式,其超越了手工最优的现有技术。我们让LLM变异用C++编写的父启发式,将候选解存储在MAP-Elites档案中,以信息性和速度作为键,并通过混合覆盖率和求解时间计算适应度分数。为了将进化程序置于上下文中,我们还额外基准测试了一组广泛的手工启发式在信息性-速度权衡上的表现,据我们所知,这之前从未做过。在未见测试域上,我们最好的进化启发式比最强基线解决了更多任务,我们的完整启发式套件跨越了所述权衡的帕累托前沿。我们还发现,从平凡的盲目启发式开始进化优于从强FF启发式开始,即使最终程序本身是FF变体,并且LLM推理努力影响候选编译成功的频率远大于影响那些编译成功的候选的质量。由于进化程序是纯C++,它们可以作为即插即用替代品插入现有规划器,并继承底层搜索的健全性和完备性保证。

英文摘要

Heuristic search is the dominant paradigm in symbolic AI planning, and the strongest heuristics are the result of decades of work by planning researchers. Recent work has shown that large language models (LLMs) can design heuristics for individual planning domains, but no LLM-generated heuristic has so far worked on arbitrary planning tasks. In this paper, we use evolutionary search to produce the first LLM-generated domain-independent heuristics that exceed the hand-engineered state of the art. We let an LLM mutate parent heuristics written in C++, store candidates in a MAP-Elites archive keyed on informedness and speed and calculate fitness scores by blending coverage with solving time. To place the evolved programs in context, we additionally benchmark a broad set of hand-engineered heuristics on their informedness-speed tradeoff, which to our knowledge has not been done before. On unseen testing domains, our best evolved heuristic solves more tasks than even the strongest baseline, with our full heuristic suite spanning the Pareto frontier of said tradeoff. We also find that seeding evolution from the trivial blind heuristic outperforms seeding from the strong FF heuristic, even when the resulting program is itself an FF variant, and that LLM reasoning effort affects how often candidates compile much more than the quality of those that do. Because the evolved programs are plain C++, they slot into existing planners as drop-in replacements and inherit the soundness and completeness guarantees of the underlying search.
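The archive mechanics described in this abstract can be sketched in a few lines. Everything numeric here is an assumption: the abstract says candidates are keyed on informedness and speed and scored by blending coverage with solving time, but gives no bin counts, scales, or weights. The `Candidate` fields, the 4 x 4 grid, and the 0.8 blend weight are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    informedness: float  # assumed scale in [0, 1], higher is more informed
    speed: float         # assumed scale in [0, 1], higher is faster
    coverage: float      # fraction of training tasks solved
    solve_time: float    # mean seconds per solved task


def fitness(c: Candidate, time_limit: float = 300.0, weight: float = 0.8) -> float:
    """Blend coverage with a time score (weights are illustrative)."""
    time_score = max(0.0, 1.0 - c.solve_time / time_limit)
    return weight * c.coverage + (1.0 - weight) * time_score


def cell(c: Candidate, bins: int = 4) -> Tuple[int, int]:
    """Map a candidate to its (informedness, speed) grid cell."""
    def to_bin(x: float) -> int:
        return min(bins - 1, int(x * bins))
    return (to_bin(c.informedness), to_bin(c.speed))


class Archive:
    """Keeps the fittest candidate seen so far in each grid cell."""

    def __init__(self) -> None:
        self.elites: Dict[Tuple[int, int], Candidate] = {}

    def add(self, c: Candidate) -> bool:
        key = cell(c)
        incumbent = self.elites.get(key)
        if incumbent is None or fitness(c) > fitness(incumbent):
            self.elites[key] = c
            return True
        return False


archive = Archive()
results = [archive.add(c) for c in [
    Candidate("blind_like", 0.05, 0.95, 0.40, 30.0),
    Candidate("ff_like", 0.80, 0.30, 0.70, 120.0),
    Candidate("ff_variant", 0.85, 0.35, 0.75, 150.0),  # same cell, fitter
    Candidate("ff_slow", 0.78, 0.28, 0.60, 200.0),     # same cell, weaker
]]
print(results, sorted(archive.elites))
```

Keeping one elite per cell is what preserves fast-but-uninformed and slow-but-informed programs side by side, so later mutations can start from either end of the trade-off.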

2605.22142 2026-06-18 cs.LG cs.AI Version update

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

部分可观测性下知识图谱的短期到长期记忆转移

Taewoon Kim, Vincent François-Lavet, Michael Cochez

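AI summary Studies short-term-to-long-term memory transfer in knowledge graphs under partial observability, casting it as a neuro-symbolic value-based decision: each observed triple is kept or dropped before long-term insertion. The learned policy outperforms symbolic and neural baselines on the RoomKG benchmark.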

详情
AI中文摘要

在部分可观测性下的强化学习需要决定保留哪些信息,但大多数基于记忆的方法并未显式建模符号观察的短期到长期转移。我们研究了这一转移过程,将其建模为一个神经符号价值决策问题:对于每个观察到的三元组,智能体需决定在长期插入前是否保留或丢弃。为处理可变大小的短期缓冲区,我们采用了一种每项Q学习设计,使用共享参数和实际的时间差分更新,跨连续步骤匹配项目。在长期记忆容量为128的RoomKG基准测试中,学习到的转移决策优于符号和神经基线,包括带有时间注释的符号基线和基于历史的LSTM/Transformer基线。在转移策略消融分析中,一个轻量级的本地短期-only变体表现最佳,且在步骤层面行为显示,策略保留导航和查询相关的事实,同时丢弃低价值的候选事实,支持在内存限制下显式且可解释的记忆决策。

英文摘要

Reinforcement learning under partial observability requires deciding what information to retain, yet most memory-based approaches do not explicitly model short-term-to-long-term transfer of symbolic observations. We study this transfer process in a temporal knowledge-graph memory setting and cast it as a neuro-symbolic value-based decision problem: for each observed triple, the agent chooses whether to keep or drop it before long-term insertion. To handle variable-sized short-term buffers, we use a per-item Q-learning design with shared parameters and a practical temporal-difference update over matched items across consecutive steps. On the RoomKG benchmark at long-term memory capacity 128, learned transfer decisions outperform symbolic and neural baselines, including symbolic baselines with temporal annotations and history-based LSTM/Transformer baselines. Across transfer-policy ablations, a lightweight local short-term-only variant performs best, and step-level behavior shows that the policy keeps navigation- and query-relevant facts while discarding lower-value candidate facts, supporting explicit and interpretable memory decisions under memory constraints.

2604.13899 2026-06-18 cs.CL cs.AI Version update

Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection

我们是否仍然需要人在回路中?比较主动学习中用于敌意检测的人类与LLM标注

Ahmad Dawar Hakimi, Lea Hirlimann, Isabelle Augenstein, Hinrich Schütze

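AI summary Compares LLM and human annotation in active learning for hostility detection. LLM annotation at scale is cheaper and yields better classifiers, while active learning offers no reliable advantage over random sampling under LLM annotation.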

详情
AI中文摘要

指令微调的LLM可以低成本标注数千个实例。这为主动学习(AL)提出了两个问题:LLM标签能否替代AL回路中的人类标签?当整个语料库可以廉价标注时,AL是否仍然必要?我们在一个新的包含277,902条德国政治TikTok评论(25,974条LLM标注,5,000条人工标注)的数据集上进行了研究,比较了LLM和人类标注在七种条件、四种编码器和10个随机种子下的表现。在模仿人类标注任务的双问题界面下,大规模LLM标注的性能优于人类监督分类器,成本约为其十分之一(GPT-5.2 Batch API为28美元,Prolific为316美元)。这一优势对于闭源(GPT-5.2)和开源(Qwen3.5-122B-10B)LLM均成立,在软标签评估下具有鲁棒性,并且是通过双问题分解实现的;整体单提示基线仅与人类监督持平。在任一LLM标注器下,主动学习相比随机采样没有可靠优势。然而,错误结构差异显著:只有GPT-5.2在双问题界面下产生的分类器具有接近人类的FP/FN平衡,而其他LLM变体过度标记了边境管制和经济竞争话语。我们发布了数据集和代码。

英文摘要

Instruction-tuned LLMs can annotate thousands of instances at low cost. This raises two questions for active learning (AL): can LLM labels replace human labels within the AL loop, and does AL remain necessary when entire corpora can be cheaply labeled? We investigate both on a new dataset of 277,902 German political TikTok comments (25,974 LLM-labeled, 5,000 human-annotated), comparing LLM and human annotation across seven conditions, four encoders, and 10 random seeds. Under a two-question interface that mirrors the human annotation task, LLM annotation at scale outperforms human-supervised classifiers at roughly one-tenth the cost ($28 for GPT-5.2 Batch API vs. $316 for Prolific). The advantage holds for both a closed-source (GPT-5.2) and an open-weight (Qwen3.5-122B-10B) LLM, is robust under soft-label evaluation, and is unlocked specifically by the two-question decomposition; a holistic single-prompt baseline only ties with human supervision. AL provides no reliable advantage over random sampling under either LLM annotator. However, error structure varies sharply: only GPT-5.2 under the two-question interface produces classifiers with near-human FP/FN balance, while other LLM variants over-flag border-control and economic competition discourse. We release the dataset and code.

2601.19792 2026-06-18 cs.CL cs.AI cs.HC Version update

LVLMs and Humans Ground Differently in Referential Communication

LVLMs与人类在指称交流中的基础不同

Peter Zeng, Weiling Li, Amie J. Paige, Zhengxiang Wang, Panagiotis Kaliosis, Dimitris Samaras, Gregory Zelinsky, Susan E. Brennan, Owen Rambow

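AI summary In multi-turn referential communication experiments pairing humans and AI, LVLMs fail to use common ground as humans do when generating and resolving referring expressions, which hinders smooth communication.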

Comments 27 pages, 16 figures

详情
AI中文摘要

对于生成式AI代理与人类用户有效合作,准确预测人类意图的能力至关重要。但这种协作能力仍然受到一个关键缺陷的限制:无法建模共同基础。我们提出了一个因子设计的指称交流实验,涉及指导者-匹配者配对(人类-人类、人类-AI、AI-人类和AI-AI),他们在多轮重复回合中交互,以匹配与任何明显词汇化标签无关的物体图片。我们表明,LVLMs无法以促进顺畅交流的方式交互式生成和解析指称表达,而这是人类语言使用的基础技能。我们发布了包含356个对话(89对,每对4轮)的语料库,以及用于数据收集的在线流程和用于分析准确性、效率和词汇重叠的工具。

英文摘要

For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an inability to model common ground. We present a referential communication experiment with a factorial design involving director-matcher pairs (human-human, human-AI, AI-human, and AI-AI) that interact with multiple turns in repeated rounds to match pictures of objects not associated with any obvious lexicalized labels. We show that LVLMs cannot interactively generate and resolve referring expressions in a way that enables smooth communication, a crucial skill that underlies human language use. We release our corpus of 356 dialogues (89 pairs over 4 rounds each) along with the online pipeline for data collection and the tools for analyzing accuracy, efficiency, and lexical overlap.

2509.09631 2026-06-18 cs.SD cs.CL cs.CV Version update

DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Discrete Flow Matching

DiFlow-TTS: 基于离散流匹配的紧凑低延迟零样本文本转语音

Ngoc-Son Nguyen, Thanh V. T. Tran, Hieu-Nghia Huynh-Nguyen, Truong-Son Hy, Van Nguyen

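AI summary Proposes DiFlow-TTS, a zero-shot TTS framework that uses discrete flow matching and a factorized discrete flow denoiser to balance generation quality and low latency.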

Comments Accepted at Interspeech 2026 (Long Paper Track)

详情
AI中文摘要

零样本文本转语音(TTS)在复制未见过的声音方面取得了显著进展,但平衡生成质量和推理效率仍然具有挑战性。自回归模型存在高延迟问题,而基于扩散的方法受限于训练时的配置。此外,大多数基于流的方法在连续空间中运行,由于连续令牌空间本质上比离散空间更复杂,这引入了优化挑战。为了解决这些限制,我们提出了DiFlow-TTS,一种基于离散流匹配的新型零样本TTS框架。该模型由一个用于语言建模的确定性音素-内容映射器和一个同时生成韵律和声学令牌流的分解离散流去噪器组成。实验结果表明了我们的方法在多个评估指标上的有效性。

英文摘要

Zero-shot text-to-speech (TTS) has made significant progress in replicating unseen voices, yet balancing generation quality and inference efficiency remains challenging. Autoregressive models suffer from high latency, while diffusion-based approaches are constrained by training-time configurations. Moreover, most flow-based methods operate in continuous space, which introduces optimization challenges because continuous token spaces are inherently more complex than discrete ones. To address these limitations, we propose DiFlow-TTS, a novel zero-shot TTS framework based on discrete flow matching. The model consists of a deterministic Phoneme-Content Mapper for linguistic modeling and a Factorized Discrete Flow Denoiser that simultaneously generates prosody and acoustic token streams. Experimental results demonstrate the effectiveness of our approach across multiple evaluation metrics.

2507.16859 2026-06-18 cs.RO cs.AI Version update

Enhancing Fatigue Detection through Heterogeneous Multi-Source Data Integration and Cross-Domain Modality Imputation

通过异构多源数据集成与跨域模态插补增强疲劳检测

Luobin Cui, Yanlai Wu, Tang Ying, Weikai Li

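AI summary For deployment settings where high-quality sensors are unavailable, proposes a heterogeneous multi-source fatigue-detection framework that uses shared modalities for cross-domain modality imputation, drawing on source-domain knowledge to support fatigue detection in the target domain.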

Comments 4 figures, 14 pages

详情
AI中文摘要

疲劳检测对于安全相关应用(如航空、采矿和长途运输)中的人类操作员至关重要。可靠的操作员疲劳估计可以支持人机系统中的及时警告、自适应任务调度、接管提醒和其他安全管理决策。然而,这些功能的有效性取决于疲劳相关信号是否能在部署环境中可靠捕获。虽然许多研究已显示高保真传感器在受控实验室环境中的价值,但在实际环境中,由于噪声、光照条件和视野限制,其性能往往会下降,从而限制了实际应用。本文形式化了一种面向实际部署的疲劳检测设置,其中高质量传感器在实际应用中通常不可用。为解决这一问题,我们利用来自异构源域的知识,包括难以在现场部署但常用于受控环境的高保真传感器,来辅助真实目标域中的疲劳检测。基于这一思想,我们设计了一个异构多源疲劳检测框架,该框架利用目标域中的可用模态,同时通过基于共享模态的跨域模态插补来利用源域中的多样化配置。

英文摘要

Fatigue detection for human operators is important in safety-related applications such as aviation, mining, and long-haul transport. Reliable estimation of operator fatigue can support timely warnings, adaptive task scheduling, takeover reminders, and other safety-management decisions in human-machine systems. However, the effectiveness of these functions depends on whether fatigue-related signals can be reliably captured in the deployment environment. While many studies have shown the value of high-fidelity sensors in controlled laboratory environments, their performance often degrades when used in real-world settings because of noise, lighting conditions, and field-of-view constraints, thereby limiting their practical use. This paper formalizes a deployment-oriented setting for real-world fatigue detection, where high-quality sensors are often unavailable in practical applications. To address this issue, we use knowledge from heterogeneous source domains, including high-fidelity sensors that are difficult to deploy in the field but commonly used in controlled environments, to assist fatigue detection in the real-world target domain. Based on this idea, we design a heterogeneous and multi-source fatigue-detection framework that uses the available modalities in the target domain while leveraging diverse configurations in the source domains through cross-domain modality imputation based on shared modalities.

2506.08764 2026-06-18 cs.LG Version update

On the Stability of the Jacobian Matrix in Deep Neural Networks

深度神经网络中雅可比矩阵的稳定性

Benjamin Dadoun, Soufiane Hayou, Hanan Salam, Mohamed El Amine Seddik, Pierre Youssef

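AI summary Uses random matrix theory to establish a general theorem on the spectral stability of the Jacobian in deep neural networks, covering sparse and non-i.i.d. weights and extending the theoretical basis for initialization schemes.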

Comments 21 pages, 28 figures; the main theorem was wrong (again) and is now corrected

详情
AI中文摘要

深度神经网络随着深度增加容易出现梯度爆炸或消失,这一现象与输入-输出雅可比矩阵的谱行为密切相关。先前的工作确定了确保雅可比稳定性的关键初始化方案,但这些分析通常局限于具有独立同分布权重的全连接网络。在这项工作中,我们显著超越了这些限制:我们建立了一个适用于深度神经网络的通用稳定性定理,该定理能够处理稀疏性(例如由剪枝引入的)以及非独立同分布、弱相关权重(例如由训练引起的)。我们的结果依赖于随机矩阵理论的最新进展,并为更广泛类别的网络模型提供了谱稳定性的严格保证。这扩展了具有结构化和依赖随机性的现代神经网络中初始化方案的理论基础。

英文摘要

Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.

2206.05018 2026-06-18 cs.SD cs.CL eess.AS

Going Beyond the Cookie Theft Picture Test: Detecting Cognitive Impairments using Acoustic Features

超越饼干盗窃图片测试:利用音频特征检测认知障碍

Franziska Braun, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Korbinian Riedhammer, Sebastian P. Bayerl

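AI summary Detects cognitive impairment from acoustic features using data from two standardized neuropsychological tests and a semi-structured clinical interview. The baseline uses OpenSMILE features with SVM classifiers; wav2vec 2.0 features reach up to 85% accuracy.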

Comments Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)

Journal ref Proceedings of the 25th International Conference on Text, Speech, and Dialogue (TSD 2022)

详情
AI中文摘要

标准化测试在认知障碍检测中起关键作用。先前研究显示,利用标准化图片描述任务的音频数据可自动检测认知障碍。本文进一步评估了在德国SKT和德国CERAD-NB测试及患者与心理学家的半结构化访谈中提取的音频特征,证明标准化测试的音频特征可可靠区分认知障碍者与非障碍者。此外,即使从访谈中随机语音样本提取的特征也能作为认知障碍的判别指标。在基线实验中,我们使用OpenSMILE特征和支持向量机分类器;在改进设置中,使用wav2vec 2.0特征可达到高达85%的准确率。

英文摘要

Standardized tests play a crucial role in the detection of cognitive impairment. Previous work demonstrated that automatic detection of cognitive impairment is possible using audio data from a standardized picture description task. The presented study goes beyond that, evaluating our methods on data taken from two standardized neuropsychological tests, namely the German SKT and a German version of the CERAD-NB, and a semi-structured clinical interview between a patient and a psychologist. For the tests, we focus on speech recordings of three sub-tests: reading numbers (SKT 3), interference (SKT 7), and verbal fluency (CERAD-NB 1). We show that acoustic features from standardized tests can be used to reliably discriminate cognitively impaired individuals from non-impaired ones. Furthermore, we provide evidence that even features extracted from random speech samples of the interview can be a discriminator of cognitive impairment. In our baseline experiments, we use OpenSMILE features and Support Vector Machine classifiers. In an improved setup, we show that using wav2vec 2.0 features instead, we can achieve an accuracy of up to 85%.