arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.07657 2026-06-09 cs.NE cs.LG new submission

QDS-SNN: Energy-efficient Quantum Deeply-Supervised Spiking Neural Network Algorithm for Traffic Sign Recognition

Zhiguo Qu, Keqi Li, Le Sun, Wenjie Liu, Yimin Yu, Saif Al-Kuwari, Ahmed Farouk

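Affiliations: School of Computer Science, School of Software, Nanjing University of Information Science and Technology

AI summary: Proposes a quantum deeply-supervised spiking neural network (QDS-SNN) that combines a quantum neural network with temporally and spatially adaptive LIF neurons, reaching 99.72% accuracy on the GTSRB dataset in 6 time steps while cutting energy consumption by 55.77%.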

Comments 13 pages, 10 Figures, 8 Tables

Abstract

Traffic sign recognition is crucial for intelligent transportation and autonomous driving, as it can improve driving efficiency and ensure road safety. However, traditional recognition methods are based on large datasets and intensive computation, which limits their real-time applicability. Spiking Neural Networks (SNNs) offer a biologically inspired, energy-efficient alternative due to their spatiotemporal processing capabilities, but suffer from information loss and vanishing gradients during training. To overcome these limitations, this study proposes a Quantum Deep-supervised Spiking Neural Network (QDS-SNN) that integrates Quantum Neural Networks (QNNs) for efficient, low-power deep supervision. Using quantum superposition and entanglement, QNNs enable expressive representations and parallel computation, thereby enhancing performance without compromising energy efficiency. The proposed QDS-SNN incorporates a temporally and spatially adaptive LIF (TSA-LIF) neuron and a quantum-assisted classifier module (QACM) to mitigate gradient issues and improve training effectiveness. This study conducts experiments on the PennyLane quantum simulation platform, and the results show that QDS-SNN achieves 99.72% accuracy on the GTSRB dataset in only 6 time steps, outperforming the MS-ResNet baseline by 1.32% while reducing energy consumption by 55.77%. On the TSRD dataset, it achieves 97.90% accuracy while reducing energy use to 52.68% of the baseline. These results demonstrate that QDS-SNN offers a high-performance, energy-efficient solution for traffic sign recognition in intelligent transportation systems.
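
For readers unfamiliar with spiking neurons, the sketch below simulates a plain leaky integrate-and-fire (LIF) neuron over 6 time steps. It is not the TSA-LIF neuron from the abstract, whose adaptive mechanism is not described there; the function name, the update rule, and every parameter value are illustrative assumptions.

```python
def simulate_lif(input_currents, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Simulate one discrete-time leaky integrate-and-fire neuron.

    All parameter names and default values are illustrative assumptions,
    not values taken from the QDS-SNN paper.
    """
    v = v_reset
    spikes = []
    for current in input_currents:
        # Leaky integration: the membrane potential decays toward v_reset
        # while accumulating the input current.
        v = v + (current - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after emitting a spike
        else:
            spikes.append(0)
    return spikes


# Six time steps, matching the number of time steps quoted in the abstract.
spike_train = simulate_lif([1.5] * 6)
print(spike_train)
```

With a constant input of 1.5 the potential needs two steps to cross the threshold, so the neuron fires on every second step. Information is carried by these sparse binary events, which is the usual argument for the energy efficiency of SNNs.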

2606.07656 2026-06-09 physics.chem-ph cs.CE cs.LG new submission

SC3: The Multi-Solvent Solubility Challenge and Benchmark

Vansh Ramani, Har Ashish Arora, Dhairya Kuchhal, Sergei Tatarin, Lev Krasnov, Sayan Ranu, Tarak Karmakar

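Affiliations: Indian Institute of Technology Delhi, India; Kurnakov Institute of General and Inorganic Chemistry RAS, Russia

AI summary: To address shortcomings in existing multi-solvent solubility benchmarks, proposes the SC3 benchmark, comprising a reproducible curation pipeline, tiered consensus sets, and evaluation metrics, and shows that the best model still sits at 5 times the aleatoric limit.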

Comments 34 pages, 16 tables, 22 figures

Abstract

Solubility prediction is a standard benchmark in computational chemistry, yet multi-solvent models that reportedly approach the experimental-noise ceiling (i.e., the aleatoric limit) are not yet reliable enough to be deployed. We argue that this gap is partly artefactual: published benchmarks differ in curation policies, evaluate on count-weighted RMSE that hides failure on tail-heavy solvent distributions, and treat the widely cited 0.6-0.8 log S inter-laboratory figure as the aleatoric ceiling even though it reflects worst-case, not expected, disagreement. We introduce SC3, a multi-solvent solubility benchmark built on BigSolDB v2.1 with three contributions: (i) a reproducible curation pipeline yielding 101,535 measurements over 1,327 solutes and 206 solvents, with a recalibrated aleatoric floor of 0.106 log S, roughly 6 times tighter than the conventional figure; (ii) nested Gold/Silver/Bronze consensus tiers with per-point standard deviation, three leakage-checked splits, and a multi-solvent metric suite (PS-RMSE, Z-RMSE); and (iii) a 31-model benchmark across six families, whose best Bronze PS-RMSE sits at 5 times the aleatoric limit, a gap that we observe is not closed by any deep alternative tested. We perform three follow-on analyses (data scaling, transfer from quantum-chemistry solvation energies, and feature-level attribution), which demonstrate that calibrated per-point uncertainty is reusable infrastructure for diagnosis beyond point prediction.
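
The abstract's point that count-weighted RMSE can hide failures on rare solvents can be illustrated with a small sketch. The abstract does not define PS-RMSE or Z-RMSE, so the per-solvent macro-average below is only an assumed stand-in for a solvent-balanced metric, and the data are made up.

```python
import math
from collections import defaultdict


def count_weighted_rmse(records):
    """RMSE pooled over all measurements; records are (solvent, y_true, y_pred)."""
    squared = [(t - p) ** 2 for _, t, p in records]
    return math.sqrt(sum(squared) / len(squared))


def per_solvent_macro_rmse(records):
    """Unweighted mean of per-solvent RMSEs (an illustrative macro-average,
    not necessarily the PS-RMSE defined in the paper)."""
    by_solvent = defaultdict(list)
    for solvent, t, p in records:
        by_solvent[solvent].append((t - p) ** 2)
    rmses = [math.sqrt(sum(v) / len(v)) for v in by_solvent.values()]
    return sum(rmses) / len(rmses)


# Made-up data: 99 accurate predictions in a common solvent,
# 1 poor prediction in a rare solvent.
records = [("water", 0.0, 0.1)] * 99 + [("rare_solvent", 0.0, 2.0)]
pooled = count_weighted_rmse(records)
macro = per_solvent_macro_rmse(records)
print(round(pooled, 3), round(macro, 3))
```

The pooled score stays small because the common solvent dominates the count, while the solvent-balanced score exposes the failure on the rare solvent.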

2606.07655 2026-06-09 eess.SP cs.CR cs.CV new submission

FADRW: A Feature-Aware Modulated and Dynamically Reweighted Loss for Few-Shot Linguistic Steganalysis

Shuo Liu, Xianghong Lin, Yukun Wei, Zhongliang Yang

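Affiliations: International School, Beijing University of Posts and Telecommunications; School of Cyberspace Security, Beijing University of Posts and Telecommunications

AI summary: To address extreme class imbalance and feature marginalization in linguistic steganography detection, proposes the FADRW loss, which improves few-shot steganalysis through dynamic reweighting and feature-aware modulation.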

Comments Accepted by IEEE Signal Processing Letters

Abstract

The ubiquity of social media platforms facilitates malicious linguistic steganography, posing significant security risks. However, detection is severely hampered by two fundamental issues during model training. Firstly, extreme class imbalance (less than 1% steganographic samples) induces a strong decision bias. Secondly, the invisibility of generative steganography means its features are nearly indistinguishable from benign text; this similarity, compounded by their extreme rarity, leads to severe feature marginalization, where faint steganographic signals are completely overwhelmed. To directly address these optimization-level challenges, we propose FADRW (Feature-Aware Modulated and Dynamically Reweighted Loss), a novel loss function framework engineered for few-shot steganalysis. FADRW employs Dynamic Reweighting to progressively counteract decision bias, and a Feature-Aware Modulation module to structurally reshape the feature space, preventing feature marginalization by enhancing the separability of these subtle features. Extensive experiments on datasets from three real-world social platforms demonstrate that FADRW significantly outperforms state-of-the-art methods, particularly in the challenging few-shot steganographic sample scenario.
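
To make the class-imbalance problem concrete, here is a minimal sketch of a static, inverse-frequency weighted cross-entropy. This is a generic textbook baseline, not the FADRW loss: the abstract does not specify how its Dynamic Reweighting or Feature-Aware Modulation are computed, and all function names and data below are assumptions.

```python
import math


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_binary_cross_entropy(probs, labels, weights):
    """Mean of per-sample weighted negative log-likelihoods.

    probs[i] is the predicted probability that sample i is
    steganographic (label 1).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        likelihood = p if y == 1 else 1.0 - p
        total += -weights[y] * math.log(likelihood)
    return total / len(labels)


# Made-up toy data: 1 steganographic sample among 100.
labels = [1] + [0] * 99
weights = inverse_frequency_weights(labels)
loss = weighted_binary_cross_entropy([0.5] * 100, labels, weights)
print(weights[1], round(weights[0], 3), round(loss, 4))
```

With these weights the single positive sample contributes as much to the loss as the 99 negatives combined, which counteracts the decision bias toward the majority class.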

2606.07650 2026-06-09 cs.CR cs.CV cs.NI new submission

Detecting Aimbot Cheaters in MOGs

Salman Shaikh, Tao Ni, Marc Dacier

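Affiliations: University of California, Berkeley; Stanford University

AI summary: Proposes PATCH, a proactive defense that uses adversarial patches as in-game honeytokens to trigger the cheater's object detection model, enabling detection or disruption; in a custom Unreal Engine game the white-box detection rate exceeds 90%, and cross-model transferability reaches 60-90%.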

Abstract

Multiplayer Online Games (MOGs) have become a multibillion-dollar industry in the entertainment sector. However, the presence of cheaters undermines the experience of honest players and devalues the effort of game developers, as it directly affects player retention, competitive integrity, the legitimacy and trustworthiness of a game, and, most importantly, overall revenue. Among various cheating techniques, visual aimbots represent an emerging threat. They use computer vision models to detect opponents from client screen captures rather than accessing game memory, making them completely undetectable by commercial kernel-level anti-cheat solutions. In this paper, we introduce PATCH, a novel proactive defense strategy that deploys adversarial patches as in-game honeytokens to mitigate the presence of visual aimbot cheaters. Our approach centers on deliberately triggering the cheaters' object detection model, enabling either direct detection or rendering the game unplayable for the cheater via patch flooding on their viewport. We evaluate our approach on several criteria: the effectiveness of different patch sizes, the scalability of patches to different screen resolutions, and efficacy against diverse visual aimbot cheat configurations. We also explore various YOLO models to assess patch transferability. Evaluation on a custom Unreal Engine game demonstrates a detection rate of over 90 percent in white-box scenarios for almost all patch sizes, and cross-model transferability of 60 to 90 percent with larger patches. We further validate our approach on Fortnite, a commercial MOG, demonstrating real-world applicability.

2606.07628 2026-06-09 cs.CY cs.CV new submission

Frankenstein in the Pipeline: Computational Epistemicide in Facial Recognition

Nina da Hora

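Affiliations: Universidade Estadual de Campinas; Instituto da Hora

AI summary: Drawing on Mary Shelley's Frankenstein as a methodological framework, the paper analyzes how embedding-based facial recognition progressively reduces the face to data through detection, landmarking, alignment/frontalization, and embedding, enacting "computational epistemicide," and argues for abolition as a normative stance.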

Comments Accepted to ACM FAccT 2026. Author's version. 17 pages, 2 figures

Abstract

While the eugenic roots of computer vision are well-documented in critical technology studies, less attention has been paid to the operational mechanisms through which this violence is enacted at the level of the pipeline. This paper employs Mary Shelley's Frankenstein not as a metaphor for unintended consequences, but as a diagnostic framework for method: disassembly, reconstruction, and the production of a creature whose legitimacy is asserted by the procedure that made it. I argue that embedding-based facial recognition enacts what I call computational epistemicide, an extension of Sueli Carneiro's concept of epistemicide to the computational domain, by destroying the face as a living, relational surface and authorizing a numerical proxy as the privileged site of identity. Across detection/cropping, landmarking, alignment/frontalization, and embedding, the face is progressively narrowed to what can be stabilized as data, producing a canonical face as the condition of legibility and a corresponding form-subject as the condition of recognition. Vectorization completes the Frankensteinian "stitching": the dissected face is reassembled into a fixed-dimensional artifact designed to circulate across databases and institutions. I then show how distance-based similarity and thresholding operationalize a norm of "close enough," making recognition inseparable from standardization and rendering reformist "ethical AI" optimization structurally insufficient. The paper concludes by arguing for abolition as a normative stance: refusing vectorized identity as a legitimate basis for rights and access, and dismantling the institutional impulse to govern human life through dissectible data points.

2606.07612 2026-06-09 cs.CY cs.AI cs.LG new submission

Position: Anthropomorphic Misalignment Research Needs Stronger Evidence

Vansh Gupta, Peter Nutter, Samuel Stante, Andreas Krause, Florian Tramèr, Lukas Fluri, Xin Chen, Anna Hedström

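Affiliations: University of Cambridge

AI summary: Argues that anthropomorphic misalignment research (AMR) rests on weak evidence owing to conceptual ambiguity, non-robust datasets, and inadequate experimental design, and proposes a framework of evidence levels and a diagnostic checklist to improve methodological rigor.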

Abstract

We argue that many Anthropomorphic Misalignment Research (AMR) studies need stronger evidence to ensure that they can provide a robust foundation for critical safety decisions, such as model deployment and regulation. By evaluating failure modes across different misalignment concepts, such as deception, emergent misalignment, and sycophancy, we show how conceptual ambiguity, non-robust datasets, experimental design, and insufficient causal interventions can lead to overinterpretation of model behaviors. This position paper aims to offer guidance on evidentiary considerations that can help improve methodological rigor in AMR. To achieve this, we provide a clear call to action through a proposed framework of evidence levels and a diagnostic checklist. These shared standards will enable more productive scientific discourse and ensure that claims about AI risks rest on solid empirical foundations.

2606.07611 2026-06-09 cs.IR cs.AI cs.LG cs.SE new submission

MIRAGE: Metadata-Integrated Repository Analysis and Guided Enhancement for MSR Datasets

Aabia Ather, Muhammad Usayd Ather, Qurat-Ul-Ain Somroo, Muhammad Khuram Shahzad

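Affiliations: SEECS, NUST

AI summary: Proposes improving the analysis of MSR datasets through metadata enrichment, FAIRness assessment, and topic-driven analysis; extends a dataset directory and shows how repository hosting site and data format affect citations and usability.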

Comments 8 pages, 8 figures

Abstract

This paper proposes an improved approach to the analysis of Mining Software Repositories (MSR) datasets via metadata enrichment, FAIRness assessment, and topic-driven analysis. This research expands upon an earlier dataset directory created specifically for the analysis of MSR datasets by adding new annotations to the datasets, enriching the metadata categories, and offering more advanced filtering options. The metadata of the MSR papers presented from 2013 to 2024 was gathered using the Semantic Scholar API. The analysis is based on Latent Dirichlet Allocation (LDA) topic modeling and statistical analysis. Dataset-level attributes were added to the expanded dataset directory, namely repository hosting site, format, accessibility, reusability, and dataset quality. The study reveals that the choice of repository hosting sites and data formats influences citation patterns and dataset usability. Furthermore, the enhanced annotation approach improves the analysis and discoverability of MSR datasets, supporting more effective reuse and evaluation of research artifacts.

2606.07588 2026-06-09 cs.NE cs.LG math.OC quant-ph new submission

Information-Geometric Optimization on Spheres

Vladimir Jaćimović

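Affiliations: Faculty of Natural Sciences and Mathematics, University of Montenegro

AI summary: For black-box optimization on spheres, designs two information-geometric optimization flows based on the hyperbolic information geometry of the Poincaré and Bergman balls, and shows how ensembles of generalized Kuramoto oscillators compute natural search gradients and realize IGO algorithms.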

Abstract

We consider the black-box optimization problem on a sphere. Two information-geometric optimization flows (IGO flows) are designed with rigorous calculation of natural search gradients based on hyperbolic (information) geometry of Poincaré and Bergman balls. We demonstrate that ensembles of generalized Kuramoto oscillators on spheres compute natural search gradients and realize IGO algorithms on both manifolds. The relationship between natural gradient policies in Bergman balls and quantum decision making is pointed out.

2606.07580 2026-06-09 eess.SY cs.LG cs.SY new submission

Quantifying Uncertainty in Space Debris Capture with Active Tether-Net Systems Caused by Noisy Observations

Feng Liu, Achira Boonrath, Eleonora M. Botta, Souma Chowdhury

Affiliations: Department of Mechanical and Aerospace Engineering (AIAA Student Member; AIAA Member; AIAA Senior Member)

AI summary: To address the uncertainty that noisy observations introduce when active tether-net systems capture space debris, proposes a quantification framework based on Sobol variance analysis and a perturbation method to assess uncertainty in the Capture Quality Index.

Comments Presented at 2025 AIAA Aviation Forum

Abstract

As Low Earth Orbit grows more crowded with space debris, the need for reliable and efficient debris removal solutions becomes more urgent. An active tether-net system with maneuverable units is one promising solution to this problem; its success depends on the robustness of the net maneuver and closing decisions. These in turn are affected by uncertainties attributed to i) noisy observation of the target debris state (e.g., sensing errors), and ii) imperfect simulations of the complex net dynamics and net/debris interaction behavior, on which the decision system is trained. This paper focuses on the first of these two uncertainty sources and presents a pipeline to propagate and quantify the resulting uncertainty in debris capture performance, expressed in terms of the Capture Quality Index (CQI). This quantification is performed both for an active tether-net using a fixed baseline control and for one using a trained neuro-control policy to guide the net maneuver during the deployment phase. Two uncertainty quantification (UQ) techniques are used: Sobol's variance-based sensitivity analysis and a perturbation-based method. A high-fidelity simulator and a lower-fidelity surrogate-based environment are used to demonstrate trade-offs between prediction accuracy and ease of resolving uncertainties.
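
As a rough illustration of what a perturbation-based analysis can look like, the sketch below shifts each observed input by its assumed noise level and records the resulting change in a scalar output. The abstract does not describe the authors' actual procedure or how CQI is computed, so the scoring function, the input names, and the noise scales here are all hypothetical.

```python
def perturbation_sensitivity(model, nominal, noise_scale):
    """Estimate how much each input's observation noise shifts the output.

    Uses central differences around the nominal input and returns one
    approximate output deviation per input dimension.
    """
    deviations = []
    for i, sigma in enumerate(noise_scale):
        up = list(nominal)
        down = list(nominal)
        up[i] += sigma
        down[i] -= sigma
        deviations.append(abs(model(up) - model(down)) / 2.0)
    return deviations


def toy_capture_score(state):
    """Hypothetical stand-in for a capture-quality metric (not the paper's CQI)."""
    position_error, velocity_error = state
    return 3.0 * position_error + 0.5 * velocity_error


deviations = perturbation_sensitivity(toy_capture_score, [1.0, 2.0], [0.1, 0.2])
print([round(d, 3) for d in deviations])
```

In this toy setup the position observation dominates the output uncertainty even though its noise level is smaller, because the score is more sensitive to it. A variance-based method such as Sobol's would additionally capture interactions between inputs, at the cost of many more model evaluations.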

2606.07574 2026-06-09 cs.DC cs.AI cs.LG stat.CO stat.ML new submission

Accelerating Birkhoff Projection for Manifold-Constrained Hyper-Connections

Chenrui Wang, Yixuan Qiu

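Affiliations: School of Statistics; Renmin University of China; School of Statistics and Data Science; Institute of Big Data Research; Shanghai University of Finance and Economics

AI summary: To address the computational bottleneck of Birkhoff projection in manifold-constrained hyper-connections, proposes an end-to-end acceleration framework based on a dual formulation and Newton's method, combined with implicit differentiation and a CUDA kernel, achieving over 20x speedup.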

Abstract

Manifold-constrained hyper-connections (mHCs) have recently been proposed as a principled extension of hyper-connections, where the residual mixing matrices are constrained to be doubly stochastic via projection onto the Birkhoff polytope. In practical mHC implementations, this constraint is enforced by Sinkhorn-Knopp iterations, and the backward pass relies on unrolling the iterative solver. This design introduces substantial computation and memory overhead, and may also yield inaccurate projections when the algorithm converges slowly on challenging inputs, undermining the intended norm-control and stability guarantees of mHCs. In this work, we focus on the practically important 4x4 Birkhoff projection setting and develop an end-to-end acceleration framework. By leveraging the dual formulation, we reduce the problem to a three-dimensional unconstrained convex problem and solve it with Newton's method, achieving fast convergence and high accuracy. For the backward pass, we replace the unrolled differentiation with implicit differentiation, yielding exact gradients without storing intermediate states. To exploit massive parallelism, we design a warp-level CUDA kernel that uses only register-level primitives, avoiding global and shared memory I/O. Extensive experiments against representative open-source baselines demonstrate that the proposed solver yields substantially more reliable doubly stochastic projections, especially when the input magnitude is large, and achieves significant end-to-end speedups (including the backward pass), reaching over 20x acceleration at large batch sizes while maintaining orders of magnitude smaller marginal errors.
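
The Sinkhorn-Knopp baseline mentioned in the abstract can be sketched in a few lines. This shows the iterative baseline only, not the paper's dual Newton solver or CUDA kernel; exponentiating the input to obtain positive entries, the iteration count, and the example matrix are assumptions made for illustration.

```python
import math


def sinkhorn_knopp(logits, n_iters=50):
    """Approximately map a square matrix of logits to a doubly stochastic matrix.

    Entries are exponentiated to make them positive (an assumption of this
    sketch), then rows and columns are normalized alternately.
    """
    n = len(logits)
    m = [[math.exp(x) for x in row] for row in logits]
    for _ in range(n_iters):
        # Row normalization: every row sums to 1.
        m = [[x / sum(row) for x in row] for row in m]
        # Column normalization: every column sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


# Made-up 4x4 input, matching the matrix size discussed in the abstract.
logits = [
    [0.0, 1.0, 2.0, 0.5],
    [1.5, 0.0, 0.5, 2.0],
    [2.0, 0.3, 0.0, 1.0],
    [0.5, 2.0, 1.0, 0.2],
]
projected = sinkhorn_knopp(logits)
row_sums = [sum(row) for row in projected]
col_sums = [sum(projected[i][j] for i in range(4)) for j in range(4)]
print([round(s, 6) for s in row_sums], [round(s, 6) for s in col_sums])
```

Each pass fixes one set of marginals and slightly disturbs the other, so accuracy depends on the number of iterations. For inputs of large magnitude the exponentiated entries span many orders of magnitude and convergence slows, which is the failure mode the abstract attributes to this baseline.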

2606.07572 2026-06-09 physics.soc-ph cs.LG stat.AP new submission

Forecasting Japanese elections: A nonlinear machine-learning approach

Sota Kato, Xuan Luo, Budrul Ahsan, Asahi Obata, Takafumi Nakanishi

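Affiliations: International University of Japan; The Tokyo Foundation; IBM Japan; Rice University; Tokyo University of Technology

AI summary: Introduces nonlinear machine-learning models based on decision trees and ensemble learning to forecast Japanese lower-house election results, showing better predictive accuracy than the traditional linear model in both in-sample and out-of-sample evaluations.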

Abstract

Despite Japan being one of the world's largest advanced democracies, the development of election forecasting models for its national elections remains limited. This study introduces nonlinear machine-learning forecasting models, based on decision tree and ensemble learning methods, for predicting the outcomes of Japanese lower-house elections. To assess the methodological benefits of our approach, we replicated the theoretical framework and dataset of Lewis-Beck and Tien's (LBT) foundational statistical forecasting model for Japanese elections. Our models demonstrated moderately but consistently improved predictive accuracy compared to LBT's model in both in-sample and out-of-sample evaluations, suggesting that nonlinear algorithms offer an alternative approach to classical linear methods in capturing complex electoral dynamics. This study represents one of the earlier applications of nonlinear machine-learning techniques to single-country election forecasting. It offers a replicable framework that, when combined with the country-specific electoral theories of other nations, may enhance the predictive performance of forecasting models in broader national contexts.

2606.07570 2026-06-09 cs.DL cs.LG new submission

Can LLMs extract scientific consensus? A case study in high-temperature superconductivity

Mouyang Cheng, Wenhao He, Zhuotao Jin, Bowen Yu, Ju Li, Boris Kozinsky, Yao Wang, Pavel Volkov, Liangzi Deng, Ching-Wu Chu, Xiao-Gang Wen, Mingda Li

Affiliations: Center for Computational Science and Engineering, MIT; Department of Materials Science and Engineering, MIT; Department of Physics, MIT; Department of Nuclear Science and Engineering, MIT; John A. Paulson School of Engineering and Applied Sciences, Harvard University; Department of Chemistry, Emory University; Department of Physics, University of Connecticut; Department of Physics and Texas Center for Superconductivity, University of Houston

AI summary: Using high-temperature superconductivity as a testbed, builds a knowledge graph from nearly 18,000 highly cited publications and finds that LLM-extracted representations recover coherent, physically interpretable structure, suggesting that LLMs can serve as scalable tools for deciphering contested scientific knowledge.

Comments 23 pages, 4 figures

Abstract

Scientific knowledge is increasingly dispersed across vast and heterogeneous scientific literature, where important claims are often implicit, evolving, and internally debated. While large language models (LLMs) have shown impressive performance in information extraction and summarization, their ability to recover latent scientific consensus remains unclear. Here, we investigate this problem in the context of high-temperature superconductivity (HTS), a long-standing and highly debated topic in condensed matter physics, as a challenging testbed. Using nearly 18,000 highly cited publications from the past seven decades, we construct a structured knowledge graph linking competing superconducting mechanisms, material families, evidential modalities, and citation relations. We find that LLM-extracted representations recover coherent and physically interpretable structures, including family-dependent mechanism profiles, evidence-specific correlations, and citation-mediated temporal evolution of scientific beliefs. Ablation studies on the LLM further show that the global structure remains robust across prompting, decoding, and model variations. Our results suggest that LLMs can indeed serve as scalable tools for deciphering scientific knowledge in domains characterized by competing interpretations and evolving knowledge.

2606.07568 2026-06-09 cs.HC cs.AI cs.CV cs.LG physics.data-an new submission

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

Ishaan Singh Chandok, Core Francisco Park

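Affiliations: GitHub

AI summary: To reduce the time spent on manual verification and correction in scientific data annotation, proposes a behavioral cloning framework with 9 synthetic tasks that simulate expert strategies; key findings include hierarchical skill acquisition, efficient fine-tuning after multi-task pretraining, and a shared mistake representation in the models' internal states.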

Comments ICML 2026 Oral

Abstract

Scientific data annotation, such as tracking animals in video or proofreading neural reconstructions, remains bottlenecked by the "last mile" problem: even with strong automation, verification and correction consume substantial human effort. Standard approaches train models to directly predict annotations, discarding the rich supervision in how experts navigate, click, verify, and correct. We introduce a framework for studying behavioral cloning on scientific annotation: 9 synthetic tasks paired with synthetic annotations that simulate realistic human strategies including exploration, mistake correction, and strategic decision-making. Our experiments reveal several findings. First, skills emerge hierarchically: models learn GUI mechanics before task-critical decisions, and commit fewer mistakes than the training data while retaining the ability to correct errors when they occur. Second, scaling models on multi-task behavioral cloning shows that larger models are more data efficient within our scale range. Third, multi-task pretraining enables efficient fine-tuning to new tasks, while training from scratch fails entirely. Fourth, linear probes reveal that models internally represent latent variables of the annotation process such as task phase and data position; interestingly, we find a shared mistake representation that generalizes across different annotation tasks. Overall, our framework establishes systematic benchmarks and identifies key bottlenecks, providing a foundation for scaling behavioral cloning to real-world scientific data annotation.

2606.07567 2026-06-09 q-bio.BM cs.AI cs.CE new submission

SurfDesign: Effective Protein Design on Molecular Surfaces

Fang Wu, Shuting Jin, Xiangru Tang, Mark Gerstein, Xiangxiang Zeng, Yejin Choi, Jure Leskovec, Jinbo Xu

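Affiliations: Stanford University; Wuhan University of Science and Technology; Yale University; School of Medicine, Yale University; Hunan University; Yuelushan Laboratory; Kumo.AI; Toyota Technological Institute at Chicago

AI summary: Proposes SurfDesign, a framework that models molecular surfaces as continuous geometric manifolds and integrates pretrained protein language models, using surface-based equivariant message passing to capture geometric features; it outperforms existing methods on de novo binder and enzyme design benchmarks.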

Journal ref KDD 2026 AI4Science

Abstract

Protein function is largely determined by molecular surface geometry and physicochemical complementarity, yet most protein design methods condition only on backbone structure. We introduce SurfDesign, a surface-conditioned protein design framework that models molecular surfaces as continuous geometric manifolds and integrates them with pretrained protein language models. SurfDesign employs surface-based equivariant message passing to capture surface normals, curvature, and directional geometry, together with a parameter-efficient fine-tuning strategy. Focusing on functional protein design, we show that SurfDesign consistently outperforms prior surface-conditioned and backbone-only methods on de novo binder and enzyme design benchmarks. We also report strong performance on inverse-folding benchmarks as a diagnostic of structural compatibility. Our results highlight manifold-aware surface representations as a principled foundation for functional protein and enzyme design. Code is available at https://github.com/smiles724/SurfDesign.

2606.07564 2026-06-09 physics.ins-det cs.AI hep-ex new submission

Considerations for an Integrated Detector Design at FCC-ee: A Human-AI Exploration

Charles Young

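Affiliations: SLAC National Accelerator Laboratory

AI summary: Through a dialogue between a physicist and an AI assistant, explores FCC-ee detector design from initial concepts to revised ones, illustrating the potential and limitations of human-AI collaboration in experimental physics design.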

Comments 103 pages, one figure

Abstract

This report explores detector design considerations for the Future Circular Collider in its electron-positron mode (FCC-ee) through an extended dialogue between a physicist and an AI assistant. Starting from initial "prejudice" detector concepts proposed by the AI assistant without explicit physicist input, each subsystem is examined in detail, with the AI's assumptions challenged and revised through the exchange. The discussion covers the full detector from beam pipe to luminosity monitor, with particular attention to the interplay between subsystem choices and the practical considerations (calibration, stability, and operational simplicity) that are essential for a fifteen-year precision physics program. The narrative documents how the integrated detector design evolved substantially from the starting point to the AI assistant's revised "prejudice" detector concepts. The focus of this report is on the process, to illustrate both the potential and the limitations of human-AI collaboration in experimental physics design; the physics capabilities of any of the "prejudice" detector concepts remain to be explored.

2606.07562 2026-06-09 q-bio.BM cs.AI New submission

The Montparnasse Algorithm for RNA Design

RNA设计的蒙帕纳斯算法

Tristan Cazenave

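Affiliations * Tristan Cazenave

AI summary Proposes Montparnasse, a Monte Carlo search framework based on Generalized Nested Rollout Policy Adaptation with a problem-specific prior and lexicographic multicriteria evaluation. On the Eterna100 benchmark it reaches full coverage more than three times faster than DesiRNA, the previous state of the art, and on hemoglobin alpha mRNA secondary structure optimization it finds sequences with more paired bases than the MFE-optimal solution of LinearDesign.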

Details
AI Chinese abstract

RNA设计包括发现一个优化预定义标准(如二级结构)的核苷酸序列。它对合成生物学、医学和纳米技术很有用。我们提出了Montparnasse,一个基于广义嵌套滚动策略适应的蒙特卡洛搜索框架,并增加了问题特定的先验、第1级的慢速和长期适应,以及字典序多准则评估。Montparnasse在所有时间限制下一致地比现有最优方法DesiRNA更快地解决了Eterna100 V1基准的所有100个谜题,总体达到完全覆盖的速度快三倍以上。在血红蛋白α的信使RNA二级结构优化中,它识别出的序列比LinearDesign的MFE最优解具有更多的配对碱基。

English abstract

RNA design consists of discovering a nucleotide sequence that optimizes predefined criteria, such as secondary structure. It is useful for synthetic biology, medicine, and nanotechnology. We propose Montparnasse, a Monte Carlo search framework based on Generalized Nested Rollout Policy Adaptation, augmented with a problem-specific prior, slow and long adaptation at level 1, and a lexicographic multicriteria evaluation. Montparnasse solves all 100 puzzles of the Eterna100 V1 benchmark consistently faster than DesiRNA, the previous state of the art, across all time limits, reaching full coverage more than three times faster overall. On messenger RNA secondary structure optimization for hemoglobin alpha, it identifies sequences with more paired bases than the MFE-optimal solution of LinearDesign.
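As a rough illustration of what a lexicographic multicriteria evaluation can look like, the sketch below ranks candidate structures by a tuple of scores, so that a later criterion only matters when earlier ones tie. The two criteria used here (mismatches against a target dot-bracket structure, then number of paired bases) and all function names are assumptions chosen for illustration; the abstract does not state which criteria Montparnasse uses or in what order, and no folding is performed here.

```python
from typing import Tuple


def paired_positions(structure: str) -> int:
    """Count positions marked as paired in a dot-bracket string."""
    return sum(1 for ch in structure if ch in "()")


def mismatches(predicted: str, target: str) -> int:
    """Number of positions where two equal-length structures differ."""
    return sum(1 for x, y in zip(predicted, target) if x != y)


def lexicographic_score(predicted: str, target: str) -> Tuple[int, int]:
    """Hypothetical two-criterion score; larger tuples are better.

    Python compares tuples element by element, which is exactly a
    lexicographic order: the second entry only breaks ties in the first.
    """
    return (-mismatches(predicted, target), paired_positions(predicted))


if __name__ == "__main__":
    target = "((..))"
    # Both candidates differ from the target at two positions,
    # so the tie is broken by the number of paired positions.
    candidates = ["(....)", "((()))"]
    best = max(candidates, key=lambda s: lexicographic_score(s, target))
    print(best)
```

Returning a tuple rather than a weighted sum avoids having to pick weights: the first criterion always dominates, which is the defining property of a lexicographic ordering.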

2606.07556 2026-06-09 cs.NI cs.AI stat.ME New submission

Selecting New Measurement Locations to Diversify Traffic-Pattern Coverage: A Real-World Evaluation for Total Traffic Volume Estimation

选择新的测量位置以多样化交通模式覆盖:总交通量估计的实际评估

Masaaki Inoue, Akifumi Okuno, Shintaro Fukushima

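Affiliations * TOYOTA Motor Corporation; Institute of Statistical Mathematics; The Graduate University for Advanced Studies, SOKENDAI; RIKEN

AI summary To address the limited coverage of fixed traffic counters, proposes using widely available device data to select new counter locations that increase the diversity of observed traffic patterns, improving city-wide traffic volume estimation, and validates the approach with field measurements.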

Comments 12 pages, 7 figures

Details
AI Chinese abstract

准确测量交通量和流量对于现代智能交通至关重要。然而,尽管传感器设备最近取得了技术进步,安装和维护固定交通计数器的成本仍然很高。因此,它仅限于可以安装计数器的一小部分位置点,这严重限制了在城市范围内掌握和预测总交通量的可能性。相比之下,具有位置历史的设备(如智能手机和联网车辆)现在被广泛使用,并提供更广泛的空间覆盖。然而,这些设备的数据通常是部分且嘈杂的,因此不足以直接估计总交通量和流量。在本文中,我们利用这些广泛可用设备的信息来帮助决定在何处放置额外的交通计数器,并研究选择新的测量位置如何改善城市范围的交通估计性能。为此,我们提出了一种算法,该算法选择额外的计数器位置以增加观测到的交通信号模式的多样性,而不是简单地将计数器均匀分布在空间上。目标是捕获当前计数器集中稀有的交通模式类型,并使收集的观测结果对后续估计和预测更具代表性。我们还进行了实际评估;在一个目标城市中,我们选择了预期能改善交通预测的新位置,然后自费在这些位置进行了新的实地测量。所得数据提高了不同保真度下交通量估计的准确性。

English abstract

Accurate measurement of traffic volumes and flows is vital for modern intelligent transportation. However, despite recent technological advances in sensor devices, it is still expensive to install and maintain fixed traffic counters. Counters are therefore installed at only a small fraction of candidate locations, which severely limits the ability to measure and predict total traffic volume at a city-wide level. By contrast, devices with location history such as smartphones and connected vehicles are now widely used and provide much wider spatial coverage. However, the data from these devices are usually partial and noisy, so they are not sufficient to estimate total traffic volumes and flows directly. In this paper, we use the information from these widely available devices to help decide where to place additional traffic counters, and we study how selecting new measurement locations can improve city-wide traffic estimation performance. To achieve this, we propose an algorithm that chooses additional counter locations to increase the diversity of observed traffic signal patterns, rather than simply spreading counters evenly over space. The goal is to capture traffic-pattern types that are rare in the current counter set and to make the collected observations more representative for later estimation and forecasting. We also present a real-world evaluation: in a target city, we selected new locations expected to improve traffic prediction and then commissioned new field measurements at those locations at our own expense. The resulting data improved traffic volume estimation accuracy across different fidelities.
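One simple way to picture diversity-driven site selection is a greedy max-min rule: repeatedly pick the candidate whose traffic pattern is farthest from everything already covered. The sketch below is only that generic heuristic under stated assumptions, not the authors' algorithm; the pattern vectors (for example, normalized device counts per time slot), the Euclidean distance, and the location names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length pattern vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse_locations(
    existing: Dict[str, Sequence[float]],
    candidates: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Greedily pick up to k candidates that are least like covered patterns."""
    covered = list(existing.values())
    remaining = dict(candidates)
    chosen: List[str] = []
    for _ in range(min(k, len(remaining))):
        # Score each candidate by the distance to its nearest covered pattern;
        # a large value means this pattern type is rare in the current set.
        best = max(
            remaining,
            key=lambda name: min(
                (distance(remaining[name], p) for p in covered),
                default=math.inf,
            ),
        )
        chosen.append(best)
        covered.append(remaining.pop(best))
    return chosen


if __name__ == "__main__":
    # Hypothetical 3-slot profiles (morning, midday, night).
    existing = {"A": [1.0, 0.2, 0.1], "B": [0.9, 0.3, 0.1]}
    candidates = {
        "C": [1.0, 0.25, 0.1],  # nearly the same as A
        "D": [0.1, 0.2, 1.0],   # night-dominated pattern
        "E": [0.5, 0.5, 0.5],   # flat pattern
    }
    print(select_diverse_locations(existing, candidates, k=2))
```

In this toy example the near-duplicate site C is passed over in favor of D and E, which is the behavior the abstract contrasts with spreading counters evenly over space.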

2606.07552 2026-06-09 cs.MA cs.AI cs.LG New submission

Symbolic Reasoning Frameworks Modulate LLM Risk Aversion in Multi-Agent Strategic Settings

符号推理框架在多智能体战略环境中调节大语言模型的风险规避

Augustin Chan

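Affiliations * iterative.day

AI summary Injecting symbolic reasoning frameworks (such as I-Ching and Tarot) as reflective prompts differentially modulates LLM risk aversion and produces framework-specific winner distributions in multi-agent games; the effect appears to come from the reflective process rather than from content-following.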

Comments 17 pages, 3 figures, 6 tables, 6 listings. Code and data: https://doi.org/10.5281/zenodo.20338937

Details
AI Chinese abstract

大型语言模型在作为战略智能体部署时表现出内在的行为倾向——尤其是风险规避的“乌龟”偏向于防御性玩法。我们证明,符号推理框架作为每轮反思提示注入一个智能体,能够差异化地调节这种偏向,并重塑多智能体生态系统,产生框架特定的胜者分布。在一个7玩家的战国策外交变体(41局游戏,4种条件,单战役记忆积累)中,每个框架产生独特的生态系统特征:在控制条件下,燕国主导(7/11,64%);在易经蓍草占卜下,燕国和楚国共同主导,而秦国被完全压制(0/10);在塔罗牌下,秦国主导(5/10,Fisher vs. 合并p=0.006);在乱序文本消融(保留提示结构的无意义神谕文本)下,齐国主导(5/10,Fisher vs. 合并p=0.006)。接受框架的智能体(韩国)从未获胜,且在不同条件下生存率无差异(Fisher p=1.0),但塔罗牌持续提升韩国的峰值领土(平均3.0个SC vs. 2.1-2.5个其他,Kruskal-Wallis p=0.010)。两个框架的内容均不能预测后续行动——卦象主题(卡方p=0.95)和塔罗牌姿态(卡方p=0.69)均与行动选择独立——表明调节作用是通过反思过程而非内容遵循实现的。我们将其作为一篇观察论文呈现,确立智能体层面的对齐框架选择在多智能体环境中产生独特的系统级后果。

English abstract

Large language models exhibit innate behavioral tendencies when deployed as strategic agents -- notably a risk-averse "turtle" bias toward defensive play. We show that symbolic reasoning frameworks, injected as per-round reflective prompts into one agent, differentially modulate this bias and reshape the multi-agent ecosystem to produce framework-specific winner distributions. In a 7-player Warring States Diplomacy variant (41 games, 4 conditions, single-campaign memory accumulation), each framework produces a distinct ecosystem signature: under control, Yan dominates (7/11, 64%); under I-Ching yarrow divination, Yan and Chu co-dominate while Qin is completely suppressed (0/10); under Tarot, Qin dominates (5/10, Fisher vs. pooled p = 0.006); under scrambled-text ablation (incoherent oracle text preserving prompt structure), Qi dominates (5/10, Fisher vs. pooled p = 0.006). The framework-receiving agent (Han) never wins and shows no survival difference across conditions (Fisher p = 1.0), but Tarot consistently elevates Han's peak territory (mean 3.0 SCs vs. 2.1-2.5 others, Kruskal-Wallis p = 0.010). Neither framework's content predicts subsequent actions -- hexagram themes (chi-squared p = 0.95) and Tarot card postures (chi-squared p = 0.69) are both independent of action choice -- suggesting the modulation operates through the reflective process, not content-following. We present this as an observation paper establishing that alignment-framework choice at the agent level produces distinctive system-level consequences in multi-agent settings.

2606.07551 2026-06-09 cs.CY cs.HC cs.RO New submission

Astro, I'm Home! Investigating Factors that Influence the Acceptance of Home Robots Using Supervised Machine Learning

Astro,我回家了!利用监督机器学习研究影响家庭机器人接受度的因素

Katrin Fischer, Essence Wilson, Steffie Kim, Dmitri Williams

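Affiliations * University of Southern California

AI summary Applies regularization techniques (e.g., Lasso and Ridge regression) to factors influencing social robot acceptance. Performance expectancy, social influence, and hedonic motivation are the strongest predictors of intention to use, and usability, trust, and competence emerge as promising additional variables.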

Comments Preprint submitted to the 18th International Conference on Social Robotics (ICSR 2026)

Details
AI Chinese abstract

社交机器人在家庭环境中的使用正在增加。这项探索性研究应用正则化技术(例如Lasso和Ridge回归)来调查变量并识别社交机器人背景下技术接受的新模型。在原始的UTAUT2框架内,绩效期望、社会影响和享乐动机成为使用技术意图的最强和最一致的预测因子。此外,可用性、信任和能力被识别为预测使用意图模型中的有希望的变量。

English abstract

The use of social robots in home environments is on the rise. This exploratory study applies regularization techniques (e.g., Lasso and Ridge regression) to investigate variables and identify new models of technology acceptance in the context of social robots. Within the original UTAUT2 framework, performance expectancy, social influence, and hedonic motivation emerged as the strongest and most consistent predictors of intention to use the technology. In addition, usability, trust, and competence were identified as promising variables in a model predicting intention to use.
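To make concrete why Lasso and Ridge are useful for screening candidate predictors, the sketch below uses the textbook special case of an orthonormal design, where both estimators have closed forms in terms of the ordinary least squares (OLS) coefficients. It assumes the objectives are scaled as (1/2)||y - Xb||^2 + lam*||b||_1 for Lasso and (1/2)||y - Xb||^2 + (lam/2)*||b||^2 for Ridge. The coefficient values are invented for illustration and are not the study's results.

```python
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam, and clip to exactly zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols: List[float], lam: float) -> List[float]:
    """Lasso solution when the design matrix has orthonormal columns."""
    return [soft_threshold(b, lam) for b in ols]


def ridge_orthonormal(ols: List[float], lam: float) -> List[float]:
    """Ridge solution when the design matrix has orthonormal columns."""
    return [b / (1.0 + lam) for b in ols]


if __name__ == "__main__":
    # Hypothetical OLS coefficients for three candidate predictors.
    ols = [0.8, -0.3, 0.05]
    print("lasso:", lasso_orthonormal(ols, lam=0.1))
    print("ridge:", ridge_orthonormal(ols, lam=0.1))
```

The contrast is the point: Lasso sets weak coefficients to exactly zero, which acts as variable selection, while Ridge shrinks every coefficient proportionally but keeps all of them in the model.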

2606.07548 2026-06-09 cs.IR cs.AI cs.CL New submission

Evaluating Advanced Prompting on Gemini Flash for Multi-Hop Biomedical QA

评估 Gemini Flash 上的高级提示工程用于多跳生物医学问答

Ahmed Bajaber, Mohammed Alliheedi

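Affiliations * Saudi Med AI Lab (SMAIL); Prince Sultan University; Al-Baha University

AI summary A multi-component prompt (role-playing, multi-shot Chain-of-Thought examples, and formatting rules) achieves a Concept Level Score of 0.720 on Gemini 2.0 Flash, well above the 0.565 baseline and nearly matching Gemini 2.5 Flash, indicating that prompt design is a key factor in unlocking LLM reasoning capabilities.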

Comments 8 pages, proceedings of the BioCreative IX Challenge and Workshop (BC9) at IJCAI 2025

Details
Journal ref
Proc. BioCreative IX Workshop (BC9), IJCAI 2025, Montreal, Canada
AI Chinese abstract

MedHopQA 挑战为大型语言模型(LLM)提供了一个关键测试:在高风险的生物医学领域中进行复杂的多跳推理。本文详细介绍了我们对 Google Gemini Flash 模型的直接基于 API 的评估,重点关注高级提示工程的影响。我们为 Gemini 2.0 Flash 设计了一个复杂的多组件提示,结合了角色扮演、显式的多步思维链(CoT)示例和详细的格式规则。使用这个复杂提示的最佳运行获得了0.720的概念级得分。这一结果显著优于仅得0.565的基线提示。值得注意的是,在高效的 Gemini 2.0 Flash 上的性能与下一代 Gemini 2.5 Flash 的结果几乎相同。我们的发现表明,复杂的提示设计是释放现代LLM全部推理能力的关键因素。

English abstract

The MedHopQA challenge presents a critical test for Large Language Models (LLMs): complex, multi-hop reasoning in the high-stakes biomedical domain. This paper details our direct API-based evaluation of Google's Gemini Flash models, focusing on the impact of advanced prompt engineering. We designed a sophisticated, multi-component prompt for Gemini 2.0 Flash that combined role-playing, explicit multi-shot Chain-of-Thought (CoT) examples, and detailed formatting rules. Our best run, using this complex prompt, achieved a Concept Level Score of 0.720. This result dramatically outperformed a baseline prompt which scored only 0.565. Remarkably, this performance on the efficient Gemini 2.0 Flash was almost identical to the result from the next-generation Gemini 2.5 Flash. Our findings demonstrate that sophisticated prompt design is a critical factor for unlocking the full reasoning capabilities of modern LLMs.

2606.07546 2026-06-09 cs.IR cs.AI cs.LG New submission

Beyond Item IDs: Scaling Short-Form-Video Recommendation via Semantic-Native Long Sequence Modeling

超越视频ID:通过语义原生长序列建模实现短视频推荐规模化

Ruixiao Sun, Diego Uribe Mora, Zhimeng Jiang, Yuanzhen Lin, Jiarui Wang, Yuening Li, Danfeng Guo, Zhizhong Chen, Chuan He, Liang Liu

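Affiliations * Google, Mountain View, USA

AI summary To overcome sequence-length limits caused by the semantic sparsity of Video IDs and the quadratic complexity of Transformers in short-form video recommendation, adopts Semantic IDs and a Global-Aware Compression Transformer to model ultra-long user behavior sequences at billion-user scale, cutting memory and compute overhead and improving satisfied engagement and content consumption in online A/B tests.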

Comments This manuscript has been accepted by SIGIR 2026

Details
AI Chinese abstract

捕捉用户跨广泛观看历史的兴趣对于短视频推荐至关重要,但扩展序列长度受到两个瓶颈的限制:原子视频ID的语义稀疏性和Transformer的二次计算复杂度。传统的正交视频ID无法捕捉内容关系,并且需要大型嵌入表,而自注意力的二次复杂度在严格的工业延迟和资源约束下限制了最大序列长度。在这项工作中,我们提出了一个在生产环境中部署的框架,用于在十亿用户规模上建模超长用户行为序列。我们首先通过采用内容原生的语义ID来解决表示瓶颈。通过使用深度截断、粗粒度的语义ID,我们将嵌入表大小从语料库基数中缩小。这种紧凑的表示通过共享语义前缀自然地泛化到冷启动内容。其次,为了克服序列扩展障碍,我们引入了全局感知压缩Transformer,它利用非参数时间折叠和统一全局查询集成来有效压缩序列,缓解了标准自注意力的内存和计算瓶颈。在我们计算基础设施上的离线分析显示,峰值内存占用减少了一个数量级,计算开销大幅降低。这种效率提升使得在生产中以可承受的成本支持更长的序列长度,在大规模在线A/B测试中,在满意的用户参与度和满意的内容消费方面取得了显著的在线收益。

English abstract

Capturing user interests across extensive watch histories is critical for short-form video recommendation, yet scaling sequence length is limited by two bottlenecks: the semantic sparsity of atomic Video IDs and the quadratic computational complexity of Transformers. Traditional orthogonal Video IDs fail to capture content relationships and demand large embedding tables, while the quadratic complexity of self-attention restricts the maximum sequence length under strict industrial latency and resource constraints. In this work, we present a production-deployed framework for modeling ultra-long user behavior sequences at a billion-user scale. We first address the representation bottleneck by adopting content-native Semantic IDs. By utilizing depth-truncated, coarse-grained Semantic IDs, we shrink the embedding table size from corpus cardinality. This compact representation naturally generalizes to cold-start content through shared semantic prefixes. Second, to overcome the sequence scaling barrier, we introduce a Global-Aware Compression Transformer that leverages non-parametric temporal folding and unified global query integration to effectively condense the sequence, alleviating both the memory and computational bottlenecks of standard self-attention. Offline profiling on our computing infrastructure demonstrates an order-of-magnitude reduction in peak memory footprint and a drastic decrease in computational overhead. This efficiency gain enables supporting longer sequence lengths at an affordable cost in production, yielding substantial online gains in satisfied user engagement and satisfied content consumption in large-scale online A/B tests.
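The idea of depth-truncated Semantic IDs with shared prefixes can be sketched as follows, assuming a Semantic ID is a tuple of discrete codes ordered from coarse to fine. The class name, the code values, and the zero-initialized rows are hypothetical; the abstract does not describe how the IDs are produced or how the production embedding table is organized.

```python
from typing import Dict, List, Tuple

SemanticID = Tuple[int, ...]


class PrefixEmbeddingTable:
    """Toy embedding table keyed by truncated Semantic ID prefixes."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.rows: Dict[SemanticID, List[float]] = {}

    def lookup(self, semantic_id: SemanticID, depth: int) -> List[float]:
        # Keep only the first `depth` (coarsest) codes as the table key.
        key = semantic_id[:depth]
        if key not in self.rows:
            # A real system would use learned parameters; zeros are a stand-in.
            self.rows[key] = [0.0] * self.dim
        return self.rows[key]


if __name__ == "__main__":
    table = PrefixEmbeddingTable(dim=4)
    seen_video = (3, 17, 42, 8)
    cold_start_video = (3, 17, 5, 99)   # new item, same coarse prefix
    other_video = (7, 1, 1, 1)

    a = table.lookup(seen_video, depth=2)
    b = table.lookup(cold_start_video, depth=2)
    table.lookup(other_video, depth=2)

    print(a is b)            # both items map to the same row
    print(len(table.rows))   # rows grow with distinct prefixes, not items
```

Because the table is keyed by prefix, its size is bounded by the number of distinct coarse prefixes rather than the number of videos, and a cold-start item inherits the representation of items that share its prefix.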

2606.07544 2026-06-09 cs.CY cs.AI cs.HC New submission

AI-Integrated Learning Management System for Middle School: A Longitudinal Study of Learning Outcomes Through High School and Beyond

面向中学的AI集成学习管理系统:一项从高中到毕业后的学习成果纵向研究

Misan Paul Etchie, Taiwo Olutosin

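Affiliations * National Agricultural University

AI summary Proposes a privacy-first, AI-integrated learning management system that adds policy-gated AI assistance (formative feedback, spaced review, adaptive practice) and teacher dashboards to everyday middle school coursework, paired with a longitudinal study design to assess long-run effects on learning trajectories through high school and beyond.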

Details
AI Chinese abstract

中学是构建核心学术技能和学习习惯的关键时期,这些习惯会延续到高年级,但许多学生仍因帮助有限且滞后而落后。学习管理系统(LMS)已成为分发材料、收集作业、评估学生任务和记录成绩的标准基础设施,但在大多数部署中,它们更像工作流工具而非教学支持。结果是常见的瓶颈:学生在困惑中继续练习,教师对问题进行分诊,而本可纠正误解的反馈在错误观念固化后才到达。为弥补这一差距,我们提出一个面向中学教学的AI集成LMS,并配以纵向研究设计,以测试持续、有边界的AI支持是否能改变高中及毕业后的学习成果。该平台在常规课程中添加了政策约束的AI辅助,提供形成性反馈和提示,基于掌握程度推荐间隔复习和适应性练习,并提供教师仪表盘以总结误解模式并标记持续困难。由于平台面向未成年人,设计以隐私为先,采用数据最小化、基于角色的访问控制、适龄响应约束和可审计的AI交互日志。除了短期表现,评估计划将细粒度的学习轨迹(尝试、修订、求助和节奏)与机构成果(在可行情况下)联系起来,以便将工具采纳效应与学习轨迹的长期变化区分开来。

English abstract

Middle school is a key window for building core academic skills and the learning routines students carry into later grades, yet many students still fall behind because help is often limited and comes too late, after they have already been stuck for a while. Learning Management Systems (LMSs) are now standard infrastructure for distributing materials, collecting work, assessing students' tasks, and recording grades, but in most deployments they still behave more like workflow tools than instructional supports. The result is the usual bottleneck: students keep practicing through confusion, teachers triage questions, and feedback that could have corrected the misunderstanding arrives after the misconception has already hardened. To address this gap, we propose an AI-integrated LMS for middle school instruction, paired with a longitudinal study design to test whether sustained, bounded AI support changes outcomes through high school and into post-high school pathways. The proposed platform adds policy-gated AI assistance to everyday coursework, delivering formative feedback and hinting, recommending spaced review and adaptive practice based on mastery, and providing teacher-facing dashboards that summarize misconception patterns and flag sustained struggle. Because the platform is intended for minors, the design is privacy-first, using data minimization, role-based access control, age-appropriate response constraints, and auditable logs of AI interactions. Beyond short-term performance, the evaluation plan links fine-grained learning traces (attempts, revisions, help-seeking, and pacing) to institutional outcomes where feasible, so we can separate tool adoption effects from longer-run changes in learning trajectories.

2606.07543 2026-06-09 cs.CY cs.AI cs.HC New submission

Concerns and Strategic Responses of Older Workers Navigating Generative AI in Bridge Employment

老年工人在桥梁就业中应对生成式AI的关切与战略回应

Aditya Nayak, Aakash Gautam, Rama Adithya Varanasi

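Affiliations * University of Pittsburgh; New York University

AI summary Based on interviews with 21 professionals, examines how older workers in bridge employment respond to temporal and structural disruptions from generative AI by reconfiguring tasks through boundary work (conceptualized as AI resilience), and recommends balancing individual-, meso-, and macro-level strategies to reduce burnout.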

Comments CHIWORK'26

Details
AI Chinese abstract

生成式AI正在快速改变工作场所。这不成比例地影响了弱势群体,包括在最终退休前通过桥梁就业重新进入劳动力市场的老年工人。通过对21名专业人士进行深入的半结构化访谈,我们考察了老年工人在追求桥梁角色时如何应对生成式AI驱动的干扰,重点关注他们对GenAI整合的关切以及对这些变化的回应。我们的发现表明,由于GenAI,老年工人在桥梁就业决策过程的所有阶段都经历了时间和结构性干扰。作为回应,他们通过不同形式的边界工作重新配置任务,旨在恢复稳定性和连续性。我们将这些回应概念化为AI韧性,它重塑了老年工人的桥梁就业决策,使其成为一个持续的协商和适应过程。最后,我们提出建议,通过平衡个体层面的AI韧性策略、中观层面的AI韧性集体以及宏观层面的对抗性和可争议的AI中介组织结构,来减少老年工人的倦怠。

English abstract

Generative AI (GenAI) is transforming workplaces at a rapid pace. This disproportionately affects vulnerable communities, including older workers (OWs) who re-enter the workforce through bridge employment prior to final retirement. Through in-depth semi-structured interviews with 21 professionals, we examine how OWs navigate GenAI-driven disruptions while pursuing bridge roles, focusing on their concerns about GenAI integration and their responses to these changes. Our findings show that OWs experienced both temporal and structural disruptions across all stages of the bridge employment decision-making process due to GenAI. In response, they reconfigured their tasks through different forms of boundary work aimed at restoring stability and continuity. We conceptualize these responses as AI resilience, which reshaped OWs' bridge employment decision-making into an ongoing process of negotiation and adaptation. We conclude by offering recommendations to reduce burnout among OWs by balancing individual-level AI resilience strategies with meso-level AI resilience collectives and macro-level adversarial and contestable AI-mediated organizational structures.

2606.07542 2026-06-09 cs.CY cs.AI New submission

DIYHealth Suite: Dataset, Model, and Benchmark for Health Management at Home

DIYHealth Suite:家庭健康管理的数据集、模型与基准

Changshuo Liu, Junran Wu, Zhongle Xie, Wenqiao Zhang, Kaiping Zheng, Jiaqi Zhu, Qingpeng Cai, Ooi Gene Anne, Marcus Chun Jin Tan, Jianwei Yin, James Wei Luen Yip, Beng Chin Ooi

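Affiliations * National University of Singapore; University of Science and Technology of China

AI summary To address heterogeneous data, variable tasks, and the lack of a unified benchmark in home health management, presents a framework comprising the large-scale multimodal dataset DIYHealth-900K, the adaptive foundation model DIYHealthGPT (using Hybrid Hyper Low-Rank Adaptation), and the home care benchmark DIYHealthBench, with state-of-the-art results on 11 tasks.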

Comments Accepted by ICML 2026

Details
AI Chinese abstract

生成式AI正在重塑医疗保健,然而现有大多数进展依赖于医院级设备,这限制了其在临床环境之外的健康管理的可及性和潜力。随着便携式设备和远程医疗的普及,医疗保健正转向基于家庭的自我诊断(DIY)护理。尽管前景广阔,但仍存在几个独特挑战:(i)家庭收集的数据是异构的,且缺乏标准化的大规模数据集;(ii)模型需要适应变化的任务需求和不断变化的个体状况;(iii)家庭护理任务的广泛范围缺乏统一的基准进行系统评估。在本文中,我们提出DIYHealth Suite,一个通过定制数据集、模型和基准来应对这些挑战的综合框架。我们首先整理了DIYHealth-900K,一个大规模多模态数据集,捕捉了多样化的真实世界家庭护理场景。在此基础上,我们提出DIYHealthGPT,一个用于家庭健康管理的自适应基础模型,由新颖的混合超低秩适应技术驱动。最后,我们建立了DIYHealthBench,首个评估基础模型在家庭护理任务上的基准。大量实验表明,DIYHealthGPT在开放问答和封闭问答设置下的11项家庭护理任务中,均优于通用和医学专用基线,达到了最先进的性能,为下一代个性化家庭健康管理奠定了基础。

English abstract

Generative AI is reshaping healthcare, yet most existing advances rely on hospital-grade devices, which limits their accessibility and potential for health management outside clinical settings. With the proliferation of portable devices and telemedicine, healthcare is shifting toward home-based Diagnosis-It-Yourself (DIY) care. Despite this promise, several distinctive challenges remain: (i) home-collected data are heterogeneous, exacerbated by the absence of standardized large-scale datasets; (ii) models require adaptation to variable task demands and evolving individual conditions; (iii) the broad spectrum of home care tasks lacks a unified benchmark for systematic evaluation. In this paper, we present DIYHealth Suite, a comprehensive framework designed to address these challenges through a tailored dataset, model, and benchmark. We first curate DIYHealth-900K, a large-scale multimodal dataset capturing diverse real-world home care scenarios. Building on this, we propose DIYHealthGPT, an adaptive foundation model for home-based health management, powered by the novel Hybrid Hyper Low-Rank Adaptation technique. Finally, we establish DIYHealthBench, the first benchmark to evaluate foundation models on home care tasks. Extensive experiments demonstrate that DIYHealthGPT delivers state-of-the-art performance over both general-purpose and medical-specific baselines on 11 home care tasks in both open-QA and closed-QA settings, laying the groundwork for the next generation of personalized health management at home.

2606.07541 2026-06-09 cs.HC cs.AI cs.CV cs.CY cs.MM New submission

Multimodal Large Language Models as Synthetic Participants in Video-Based Studies: An Evaluation

多模态大语言模型作为视频研究中的合成参与者:一项评估

Prabal Shrestha, Bohan Jiang, Haoning Xue, Huan Liu, Xinyi Zhou

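Affiliations * University of California, Berkeley

AI summary Evaluates how well multimodal large language models simulate subjective human ratings in a video perception task, finding biased rating distributions and limited agreement with human participants.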

Comments Accepted to SocialLLM @ ICWSM 2026

Details
AI Chinese abstract

多模态大语言模型在视频理解和推理等客观任务上表现出色。然而,它们能否近似主观人类反应仍不清楚,因为主观反应不仅依赖于内容理解,还依赖于个体的社会背景。为填补这一空白,我们评估了MLLMs作为合成参与者在一项新兴任务中的表现:评估对短视频的感知感官参与度。基于感知信息感官价值框架,我们使用17项量表(测量情绪唤醒、戏剧冲击和新奇性)比较了招募的人类参与者和基于档案条件的MLLM模拟(n=673)的评分。我们发现,即使领先的MLLMs(Gemini 3 Flash和Qwen 3 Omni)与人类参与者的一致性也有限。这些模型在评分分布中表现出明显的向下均值偏移和中心趋势偏差。它们既引入又扁平化了子群体差异,同时对参与者档案的敏感性不一致。提示策略对这些指标的影响不同,适度改善某些方面同时恶化其他方面。这些结果突显了开发MLLMs作为视频研究中合成参与者的挑战与机遇。数据和代码:https://github.com/MINDLab25/mllm-human-simulation-eval

English abstract

Multimodal large language models (MLLMs) have shown strong performance on objective tasks such as video understanding and reasoning. However, it remains unclear whether they can approximate subjective human responses, which depend not only on content comprehension but also on individuals' social contexts. To address this gap, we evaluate MLLMs as synthetic participants in an emerging task: assessing perceived sensory engagement with short videos. Grounded in the Perceived Message Sensation Value (PMSV) framework, we compare ratings from recruited human participants and profile-conditioned MLLM simulations (n=673) using a 17-item scale measuring emotional arousal, dramatic impact, and novelty. We find that even leading MLLMs (Gemini 3 Flash and Qwen 3 Omni) show limited agreement with human participants. The models exhibit distinct downward mean-shift and central-tendency biases in their rating distributions. They both introduce and flatten subgroup differences, while showing inconsistent sensitivity to participant profiles. Prompting strategies affect these metrics differently, modestly improving some aspects while worsening others. These results highlight both the challenges and opportunities of developing MLLMs as synthetic participants in video-based research. Data and code: https://github.com/MINDLab25/mllm-human-simulation-eval

2606.07538 2026-06-09 cs.IR cs.AI New submission

Bidirectional Semantic Complementary Tool Retrieval for Remote Sensing Agents

面向遥感智能体的双向语义互补工具检索

Zeyuan Wang, Dongyang Hou, Cheng Yang, Xuezhi Cui, Linrui Xu, Bo Yu, Gaozhi Zhou, Ziyu Li, Liangtian Liu, Kai Ouyang, Wang Guo, Lili Zhu, Chao Tao

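Affiliations * School of Geosciences and Info-Physics, Central South University; School of Mechanical and Electrical Engineering, Central South University; Hunan Key Laboratory of Land Resources Evaluation and Utilization, Hunan Provincial Institute of Land and Resources Planning

AI summary To address the semantic asymmetry between queries and tool documentation in tool retrieval for remote sensing agents, proposes a bidirectional semantic complementary method: planning-based query enhancement supplies functional semantics, and a dynamic tool dependency graph injects contextual semantics, significantly improving tool retrieval accuracy on complex remote sensing tasks.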

Details
AI Chinese abstract

基于大语言模型的智能体为遥感数据的自动化处理提供了新范式。它们在复杂遥感任务中的成功依赖于广泛的专用工具库。然而,工具文档通常超出大语言模型的上下文窗口限制,使得精确的工具检索对于智能体工作流至关重要。现有工具检索方法面临“语义不对称”瓶颈:自然语言查询通常表达宏观意图,缺乏工具特定语义,而工具文档提供细粒度的技术描述,缺乏工作流的操作上下文。为弥合这一语义鸿沟,本文提出一种双向语义互补工具检索方法。首先,在查询端,我们引入一种基于规划的查询增强机制,利用智能体的推理能力将抽象意图分解为逻辑子任务,从而主动补充查询缺失的功能语义。其次,在工具端,针对遥感工具链的强耦合特性,我们构建了一个具有持续学习能力的动态工具依赖图。通过采用邻域信息聚合机制,将前驱工具的上下文信息显式注入当前节点表示,从而用上下文语义丰富工具描述。在遥感数据集GeoPlan-bench和通用数据集API-Bank上的实验结果表明,所提方法不仅显著提高了复杂遥感任务的工具检索精度,而且展现出向通用领域任务迁移的鲁棒可扩展性。源代码和数据集可在https://github.com/geox-lab/BSCTR获取。

English abstract

Large language model (LLM)-based agents provide a novel paradigm for the automated processing of remote sensing (RS) data. Their success in complex RS tasks relies on extensive specialized tool libraries. However, tool documentation often exceeds the context window limits of LLMs, making precise tool retrieval essential for agentic workflows. Existing tool retrieval methods face a "semantic asymmetry" bottleneck: natural language queries typically express macro-level intentions lacking tool-specific semantics, while tool documentation provides fine-grained technical descriptions lacking operational context for workflows. To bridge this semantic gap, this paper proposes a bidirectional semantic complementary tool retrieval method. First, on the query side, we introduce a planning-based query enhancement mechanism that leverages the reasoning capabilities of agents to decompose abstract intentions into logical subtasks, thereby actively supplementing the query with missing functional semantics. Second, on the tool side, addressing the strong coupling characteristics of RS tool chains, we construct a dynamic tool dependency graph with continual learning capabilities. By employing a neighborhood information aggregation mechanism, contextual information from precursor tools is explicitly injected into the current node representation, enriching tool descriptions with contextual semantics. Experimental results on the RS dataset GeoPlan-bench and the general-purpose dataset API-Bank demonstrate that the proposed method not only significantly improves tool retrieval accuracy for complex RS tasks but also exhibits robust extensibility for transfer to general-domain tasks. The source code and dataset are available at https://github.com/geox-lab/BSCTR.

2606.07534 2026-06-09 cs.IR cs.CL New submission

PulseBench-Tab: A Multilingual Benchmark for Table Extraction with Graph-Based Evaluation

PulseBench-Tab:基于图评估的多语言表格提取基准

Ritvik Pandey, Sid Manchkanti, Mohammed Wazir Adain, Mohammed Hadi, Dushyanth Sekhar

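Affiliations * Pulse AI; Georgia Institute of Technology; S&P Global, Enterprise Data Organization

AI summary Introduces PulseBench-Tab, a multilingual benchmark of 1,820 annotated tables in 9 languages, and T-LAG, a new evaluation metric that models tables as directed graphs over cell adjacencies and scores structural and content fidelity jointly via optimal bipartite matching.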

Comments 14 pages, 5 figures, 8 tables. Dataset: https://huggingface.co/datasets/pulse-ai/PulseBench-Tab Code: https://github.com/Pulse-Software-Corp/PulseBench-Tab

Details
AI Chinese abstract

我们推出了PulseBench-Tab,一个用于评估从文档图像中提取表格的开放多语言基准。该基准包含1,820个人工标注的表格,涵盖9种语言和4种文字系统(拉丁、中日韩、阿拉伯、西里尔),来自380份真实世界源文档,包括财务申报、政府报告和监管披露。表格的单元格数量从2到1,183不等,其中48.1%包含合并或跨行/列单元格。除了数据集,我们还提出了T-LAG(表格逻辑邻接图),一种新颖的评估指标,将表格建模为基于单元格邻接的有向图,并通过最优二分匹配在单一分数中计算结构和内容保真度。我们评估了9个商业和开源表格提取系统在基准上的表现,并报告了每种语言的细分结果。完整数据集、评分代码以及所有提供商的输出均已公开。

English abstract

We introduce PulseBench-Tab, an open multilingual benchmark for evaluating table extraction from document images. The benchmark comprises 1,820 human-annotated tables spanning 9 languages and 4 scripts (Latin, CJK, Arabic, Cyrillic), drawn from 380 real-world source documents including financial filings, government reports, and regulatory disclosures. Tables range from 2 to 1,183 cells, with 48.1% containing merged or spanning cells. Alongside the dataset, we propose T-LAG (Table Logical Adjacency Graph), a novel evaluation metric that models tables as directed graphs over cell adjacencies and computes structural and content fidelity in a single score via optimal bipartite matching. We evaluate 9 commercial and open-source table extraction systems across the benchmark and report per-language breakdowns. The full dataset, scoring code, and all provider outputs are publicly available.
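To give a feel for scoring tables through cell adjacencies, the sketch below turns a rectangular grid into a set of directed "right" and "down" edges between cell contents and compares two tables by the F1 of their edge sets. This is a deliberately simplified stand-in, not T-LAG itself: it uses exact string matching rather than optimal bipartite matching, ignores merged or spanning cells, and collapses repeated identical edges because it works on sets. All names are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source cell text, direction, target cell text)


def adjacency_edges(table: List[List[str]]) -> Set[Edge]:
    """Directed edges from each cell to its right and lower neighbor."""
    edges: Set[Edge] = set()
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            if c + 1 < len(row):
                edges.add((cell, "right", row[c + 1]))
            if r + 1 < len(table) and c < len(table[r + 1]):
                edges.add((cell, "down", table[r + 1][c]))
    return edges


def edge_f1(predicted: List[List[str]], truth: List[List[str]]) -> float:
    """F1 overlap between the adjacency edge sets of two tables."""
    p, t = adjacency_edges(predicted), adjacency_edges(truth)
    if not p and not t:
        return 1.0
    overlap = len(p & t)
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(t)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truth = [["Year", "Revenue"], ["2023", "10"], ["2024", "12"]]
    # Same cell texts, but the second row's columns are swapped.
    predicted = [["Year", "Revenue"], ["10", "2023"], ["2024", "12"]]
    print(edge_f1(truth, truth))
    print(edge_f1(predicted, truth))
```

Even in this reduced form, an extraction that recovers every cell string but misplaces one row is penalized, because the error shows up in the edges rather than in the cell contents.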

2606.06895 2026-06-09 cs.CR cs.AI cs.CY cs.ET Cross-list

Blockchain Infrastructure for Intelligent Cyber-Physical-Social Systems: Post-Quantum Security, Interoperability, and Trustworthy Data Economies in the Era of Embodied AI

面向智能信息-物理-社会系统的区块链基础设施:具身AI时代的后量子安全、互操作性与可信数据经济

Song Guo, Huawei Huang, Dongping Liu, Aoyu Zhang, Luyao Zhang

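Affiliations * Hong Kong University of Science and Technology; Sun Yat-sen University; Amazon Web Services; Duke Kunshan University

AI summary This tutorial examines blockchain as a coordination layer that combines post-quantum cryptography with embodied AI to enable scalable, trustworthy data economies and cross-organizational governance.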

Details
AI Chinese abstract

通过基于世界模型的机器人技术部署具身人工智能,为区块链基础设施带来了变革性机遇,迫切需求可信数据溯源、跨组织治理以及跨去中心化生态系统的激励兼容共享。同时,2025年诺贝尔物理学奖和图灵奖所认可的量子计算进展威胁着保障这些数据经济的密码学原语,形成相互依存的紧迫需求:具身AI的长期验证依赖于能够抵御量子对手的密码敏捷架构。本教程考察区块链作为协调层,架起这一双重转型的桥梁——从金融底层到基础性信息-物理-社会系统基础设施,同时抵御量子密码分析并实现可扩展、可信的数据经济。会议以沉浸式AWS Braket演示开场,让参与者接触超导、离子阱和中性原子硬件,评估密码威胁时间线并见证ECDSA向后量子签名的过渡。五个集成模块依次涵盖:具身AI与世界模型需求、量子硬件现实与基于证据的安全迁移、通过BrokerChain协议实现可扩展跨分片架构、实施Croissant元数据标准与机器人学习溯源的可信数据经济,以及面向多模态云部署的行业生态系统集成。通过桥接量子硬件现实与具身AI数据需求,本教程将区块链描绘为下一代去中心化智能环境的统一基础设施,提供开源框架和路线图,用于构建抗量子、可互操作且数据可信的系统。

English abstract

The deployment of embodied artificial intelligence via world-model-based robotics presents a transformative opportunity for blockchain infrastructure, establishing urgent demand for trustworthy data provenance, cross-organizational governance, and incentive-compatible sharing across decentralized ecosystems. Simultaneously, quantum computing advances recognized by the 2025 Nobel Prize in Physics and the Turing Award threaten the cryptographic primitives securing these data economies, creating an interdependent imperative: long-lived verification for embodied AI depends on crypto-agile architectures capable of withstanding quantum adversaries. This tutorial examines blockchain as the coordination layer bridging this dual transition, from financial substrate to foundational Cyber-Physical-Social Systems infrastructure that simultaneously secures against quantum cryptanalysis and enables scalable, trustworthy data economies. The session opens with an immersive AWS Braket demonstration engaging participants with superconducting, trapped-ion, and neutral-atom hardware to assess cryptographic threat timelines and witness ECDSA-to-post-quantum signature transitions. Five integrated modules progress from embodied AI and world-model requirements through quantum hardware reality and evidence-based security migration, to scalable cross-shard architectures via BrokerChain protocols, trustworthy data economies implementing Croissant metadata standards and robotic learning provenance, and industry ecosystem integration for multi-modal cloud deployment. By bridging quantum hardware realities with embodied AI data requirements, this tutorial charts blockchain as unified infrastructure for next-generation decentralized intelligent environments, providing open-source frameworks and roadmaps for architecting quantum-resistant, interoperable, and data-trustworthy systems.

2602.14033 2026-06-09 cs.IT cs.AI math.IT Cross-list

BRAIN: Bayesian Reasoning via Active Inference for Agentic and Embodied Intelligence in Mobile Networks

BRAIN: 通过主动推理进行贝叶斯推理以实现移动网络中的智能体与具身智能

Osman Tugay Basaran, Martin Maier, Falko Dressler

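Affiliations * School of Electrical Engineering and Computer Science, TU Berlin; Optical Zeitgeist Laboratory, INRS; Federal Ministry of Research, Technology and Space (BMFTR, Germany)

AI summary Proposes BRAIN, a Bayesian reasoning agent based on active inference that uses a deep generative model and variational free energy minimization to unify perception and action, achieving robust causal reasoning, adaptability, and real-time interpretability in dynamic radio resource allocation.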

Details
AI Chinese abstract

未来的第六代(6G)移动网络将需要不仅自主高效,而且能够在动态环境中实时适应并透明决策的人工智能(AI)智能体。然而,当前网络中的主流智能体AI方法在这方面表现出显著缺陷。传统的基于深度强化学习(DRL)的智能体缺乏可解释性,并且常常遭受脆弱的适应性问题,包括在非平稳条件下对过去知识的灾难性遗忘。在本文中,我们针对这些挑战提出了一种替代解决方案:通过主动推理进行贝叶斯推理(BRAIN)智能体。BRAIN利用网络环境的深度生成模型,并通过最小化变分自由能将感知和行动统一在单个闭环范式中。我们在GPU加速的测试平台上将BRAIN实现为O-RAN扩展应用(xApp),并展示了其相对于标准DRL基线的优势。在我们的实验中,BRAIN表现出:(i)针对动态无线资源分配的鲁棒因果推理,在变化的流量负载下维持切片特定的服务质量(QoS)目标(吞吐量、延迟、可靠性);(ii)卓越的自适应性,在突然的流量变化中比基准方法高出高达28.3%的鲁棒性(无需任何重新训练即可实现);(iii)通过人类可解释的信念状态诊断实现其实时决策的可解释性。

English abstract

Future sixth-generation (6G) mobile networks will demand artificial intelligence (AI) agents that are not only autonomous and efficient, but also capable of real-time adaptation in dynamic environments and transparent in their decision-making. However, prevailing agentic AI approaches in networking exhibit significant shortcomings in this regard. Conventional deep reinforcement learning (DRL)-based agents lack explainability and often suffer from brittle adaptation, including catastrophic forgetting of past knowledge under non-stationary conditions. In this paper, we propose an alternative solution for these challenges: the Bayesian reasoning via Active Inference (BRAIN) agent. BRAIN harnesses a deep generative model of the network environment and minimizes variational free energy to unify perception and action in a single closed-loop paradigm. We implement BRAIN as an O-RAN eXtended application (xApp) on a GPU-accelerated testbed and demonstrate its advantages over standard DRL baselines. In our experiments, BRAIN exhibits (i) robust causal reasoning for dynamic radio resource allocation, maintaining slice-specific quality of service (QoS) targets (throughput, latency, reliability) under varying traffic loads, (ii) superior adaptability with up to 28.3% higher robustness to sudden traffic shifts versus benchmarks (achieved without any retraining), and (iii) real-time interpretability of its decisions through human-interpretable belief state diagnostics.
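For readers unfamiliar with the quantity being minimized, the standard textbook form of variational free energy is given below. The symbols are assumptions introduced here for illustration: $o$ denotes observations, $s$ hidden states, $p(o, s)$ the generative model, and $q(s)$ the agent's approximate posterior (its belief state). The abstract does not give BRAIN's exact formulation, which may include additional terms such as expected free energy over future actions.

```latex
F[q] \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     \;=\; D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] \;-\; \ln p(o)
     \;\ge\; -\ln p(o)
```

Since the KL term is non-negative, lowering $F$ both pulls $q(s)$ toward the true posterior $p(s \mid o)$ and tightens a bound on the surprise $-\ln p(o)$, which is the sense in which a single objective can drive both perception and action.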

2511.18590 2026-06-09 astro-ph.GA cs.LG hep-th cross-list

From Simulations to Surveys: Domain Adaptation for Galaxy Observations


Kaley Brauer, Aditya Prasad Dash, Meet J. Vyas, Ahmed Salim, Stiven Briand Massala

Affiliations * Center for Astrophysics, Harvard University; Physics and Astronomy, University of California, Los Angeles; International Centre for Space and Cosmology, Ahmedabad University; Department of Computing, Universiti Teknologi Malaysia; Université Paris-Saclay, CentraleSupélec, ENS Paris-Saclay, CNRS, LMPS - Laboratoire de Mécanique Paris-Saclay

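AI Summary Proposes a domain adaptation pipeline that combines a feature-level domain loss with an optimal-transport-based top-k soft matching loss to transfer galaxy classifiers trained on TNG50 simulations to real SDSS observations, raising target-domain accuracy from ~46% to ~87%.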

Comments 8 pages, 4 figures. Will be presented at NeurIPS 2025 ML4PS

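Details
AI Chinese Abstract (translated)

Large photometric surveys will image billions of galaxies, but we currently lack fast, reliable automated methods to infer their physical properties, such as morphology, stellar mass, and star formation rate. Simulations provide galaxy images with ground-truth physical labels, but domain shifts in PSF, noise, background, selection, and label priors degrade transfer to real surveys. We present a preliminary domain adaptation pipeline that trains on simulated TNG50 galaxies and evaluates on real SDSS galaxies with morphology labels (elliptical/spiral/irregular). We train three backbones (CNN, $E(2)$-steerable CNN, ResNet-18) using focal loss and effective-number class weighting, together with a feature-level domain loss $L_D$ built on GeomLoss (entropic Sinkhorn OT, energy distance, Gaussian MMD, and related metrics). We show that combining these losses with an OT-based "top-$k$ soft matching" loss, which focuses $L_D$ on the worst-matched source-target pairs, further enhances domain alignment. With Euclidean distance, scheduled alignment weights, and top-$k$ matching, target-domain accuracy (macro F1) rises from ~46% (~30%) without adaptation to ~87% (~62.6%), with a domain AUC near 0.5, indicating good mixing in the latent space.

English Abstract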

Large photometric surveys will image billions of galaxies, but we currently lack quick, reliable automated ways to infer their physical properties, such as morphology, stellar mass, and star formation rate. Simulations provide galaxy images with ground-truth physical labels, but domain shifts in PSF, noise, backgrounds, selection, and label priors degrade transfer to real surveys. We present a preliminary domain adaptation pipeline that trains on simulated TNG50 galaxies and evaluates on real SDSS galaxies with morphology labels (elliptical/spiral/irregular). We train three backbones (CNN, $E(2)$-steerable CNN, ResNet-18) with focal loss and effective-number class weighting, and a feature-level domain loss $L_D$ built from GeomLoss (entropic Sinkhorn OT, energy distance, Gaussian MMD, and related metrics). We show that combining these losses with an OT-based "top-$k$ soft matching" loss, which focuses $L_D$ on the worst-matched source-target pairs, can further enhance domain alignment. With Euclidean distance, scheduled alignment weights, and top-$k$ matching, target accuracy (macro F1) rises from $\sim$46% ($\sim$30%) without adaptation to $\sim$87% ($\sim$62.6%), with a domain AUC near 0.5, indicating strong latent-space mixing.
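
The abstract names a "top-$k$ soft matching" loss but gives no formula. The sketch below is one plausible reading, not the authors' implementation: it computes an entropic optimal-transport plan with plain NumPy Sinkhorn iterations (standing in for GeomLoss), forms each source sample's plan-weighted average cost to the target batch, and averages the $k$ largest of those costs. The function names, the uniform sample weights, the squared Euclidean cost, and the way `eps` is scaled by the largest cost are all assumptions made for illustration.

```python
import numpy as np

def pairwise_sq_euclidean(x, y):
    """Squared Euclidean distance between every row of x and every row of y."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn iterations.

    eps is expressed relative to the largest cost (an assumption of this sketch)
    so that exp(-cost / (eps * scale)) does not underflow.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform source weights
    b = np.full(m, 1.0 / m)  # uniform target weights
    scale = max(float(cost.max()), 1e-12)
    kernel = np.exp(-cost / (eps * scale))
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]

def top_k_soft_matching_loss(source_feats, target_feats, k, eps=0.1):
    """Mean soft-matched cost over the k worst-matched source samples."""
    if k < 1:
        raise ValueError("k must be at least 1")
    cost = pairwise_sq_euclidean(source_feats, target_feats)
    plan = sinkhorn_plan(cost, eps=eps)
    # Plan-weighted average cost from each source sample to the target batch.
    per_source = (plan * cost).sum(axis=1) / plan.sum(axis=1)
    k = min(k, per_source.shape[0])
    return float(np.sort(per_source)[-k:].mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim_feats = rng.normal(size=(32, 4))             # stand-in for simulated-galaxy features
    obs_feats = rng.normal(loc=1.0, size=(24, 4))    # stand-in for observed-galaxy features
    print(top_k_soft_matching_loss(sim_feats, obs_feats, k=8))
```

In a training loop this term would be computed on backbone features with an autodiff framework and added to the classification loss with a scheduled weight. The NumPy version only illustrates the arithmetic.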