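arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.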

2606.05494 2026-06-05 cs.CL cs.AI

MASF: A Multi-Model Adaptive Selection Framework for Abstractive Text Summarization

Ahmed Alansary, Ali Hamdi

Affiliations: University of Science and Technology of China

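AI summary: Proposes a multi-model adaptive selection framework that ensembles several fine-tuned Transformer models and selects the best summary according to automatic evaluation metrics; it reaches a BERTScore of 88.63% on CNN/DailyMail, outperforming large models such as GPT3-D2.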

Comments: 6 pages, 3 figures, IMSA2026

Abstract

Automatic text summarization has become increasingly important due to the rapid growth of digital textual information. This paper presents a Multi-Model Adaptive Summarization Framework designed to improve the robustness and quality of abstractive text summarization. Relying on a single model often leads to inconsistent summarization quality across articles with varying structures and topics. To address this limitation, the proposed framework integrates multiple fine-tuned transformer-based summarization models and introduces an adaptive selection mechanism. In this framework, each model independently generates a candidate summary for the same input article. The generated summaries are then evaluated using automatic evaluation metrics that capture both lexical similarity and semantic relevance. Based on these scores, the framework selects the highest-quality summary as the final output. The models are fine-tuned and evaluated on the widely used CNN/DailyMail news summarization dataset. Experimental results demonstrate that the proposed framework achieves the highest BERTScore among all compared methods, with a score of 88.63%. It also outperforms several LLMs, such as GPT3-D2, Falcon-7b, and Mpt-7b. These findings show that combining multiple transformer-based models within an adaptive selection strategy improves the quality and robustness of automatic text summarization systems.
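The generate-score-select loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each candidate is scored against the source article (the abstract does not say whether scoring is reference-based or reference-free), and it substitutes a toy unigram-overlap F1 for the lexical metric and a toy bag-of-words cosine for the semantic one. The function names, model labels, and equal weights are hypothetical.

```python
from collections import Counter
import math


def unigram_f1(candidate: str, source: str) -> float:
    """Toy lexical score: unigram-overlap F1 (a rough ROUGE-1 stand-in)."""
    cand = Counter(candidate.lower().split())
    src = Counter(source.lower().split())
    overlap = sum((cand & src).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(src.values())
    return 2 * precision * recall / (precision + recall)


def bow_cosine(candidate: str, source: str) -> float:
    """Toy 'semantic' score: cosine of bag-of-words vectors.
    A real system would use an embedding-based metric such as BERTScore."""
    a = Counter(candidate.lower().split())
    b = Counter(source.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_summary(candidates: dict[str, str], source: str,
                   w_lex: float = 0.5, w_sem: float = 0.5) -> tuple[str, str]:
    """Score every model's candidate and return (model_name, best_summary)."""
    def score(text: str) -> float:
        return w_lex * unigram_f1(text, source) + w_sem * bow_cosine(text, source)

    best_model = max(candidates, key=lambda m: score(candidates[m]))
    return best_model, candidates[best_model]
```

For example, given three candidates keyed `"model_a"`, `"model_b"`, and `"model_c"`, `select_summary` returns whichever one has the highest weighted score; swapping in stronger metrics changes only the two scoring functions.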

2606.05491 2026-06-05 cs.CV cs.RO

Unpaired RGB-Thermal Gaussian-Splatting Using Visual Geometric Transformers

Jean Cordonnier, Chenghao Xu, Olga Fink, Malcolm Mielle

Affiliations: Ecole Polytechnique Federale de Lausanne; Schindler EPFL Lab

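AI summary: Proposes an unpaired RGB-thermal novel view synthesis framework that uses VGGT to estimate camera poses for each modality, aligns them with Procrustes, and performs joint reconstruction with multi-modal 3D Gaussian Splatting, enabling thermal view synthesis while preserving RGB fidelity.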

Comments: Accepted at ICRA 2026's Workshop MM-SpatialAI: Multi-Modal Spatial AI for Robust Navigation and Open-World Understanding

Abstract

Multi-modal novel view synthesis (NVS) combining RGB and thermal imagery enables precise 3D scene reconstruction with visual and thermal information. However, existing methods typically rely on precisely calibrated RGB-thermal image pairs or stereo setups, limiting scalability and practical deployment. To address this, we introduce a framework for unpaired RGB-thermal NVS that leverages VGGT, a 3D feed-forward transformer architecture, to independently estimate camera poses for each modality. The pose sets are then aligned using the Procrustes algorithm with a cross-modal feature matcher, enabling joint registration without paired calibration. Building on this alignment, we further propose a multi-modal 3D Gaussian Splatting approach that learns directly from unpaired RGB and thermal images. Experiments on diverse scenes demonstrate that our method achieves competitive performance in thermal view synthesis while maintaining RGB fidelity. Moreover, we show that existing reconstruction approaches can produce modality-specific reconstructions that lack cross-modal consistency. We thus introduce a benchmarking framework to rigorously evaluate both per-modality image synthesis and the multi-modal coherence of reconstructed scenes.
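The pose-alignment step rests on a standard similarity Procrustes fit. As a rough sketch, assuming the cross-modal matcher has already produced N corresponding 3D points (for example, matched camera centers) from the RGB and thermal reconstructions, the Umeyama closed-form solution recovers the scale, rotation, and translation that map one set onto the other. The function name is illustrative; how the paper builds correspondences or weights outliers is not described here.

```python
import numpy as np


def umeyama_similarity(src: np.ndarray, dst: np.ndarray):
    """Find s, R, t minimizing sum_i ||dst_i - (s * R @ src_i + t)||^2.

    src, dst: (N, 3) arrays of corresponding points (e.g. camera centers).
    Returns scale s (float), rotation R (3x3), translation t (3,).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    x, y = src - mu_src, dst - mu_dst
    n = src.shape[0]
    cov = y.T @ x / n                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # prevent a reflection
    R = U @ S @ Vt
    var_src = (x ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

The scale term matters here because each modality's VGGT reconstruction is only defined up to a similarity transform, so a rotation-plus-translation fit alone would leave the two pose sets at different scales.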

2606.05489 2026-06-05 cs.CV cs.DB

LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval

Shahrzad Esmat, Chaunte W. Lacewell, Sameh Gobriel, Nilesh Jain, Ali Jannesari

Affiliations: Iowa State University; Intel Corporation

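AI summary: Proposes a phase-aware agent built on a large language model that optimizes a coupled parameter space in stages, substantially improving vector-retrieval throughput on benchmarks such as HICO-DET.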

Comments: 13 pages, 5 figures, 8 tables

Abstract

Retrieval systems underpin modern AI applications, spanning visual search, recommendation engines, and multi-modal question answering. Modern multi-stage retrieval systems require the joint optimization of highly coupled parameters, yet traditional hyperparameter optimization (HPO) methods, including Tree-structured Parzen Estimators (TPE) and Gaussian Process Bayesian Optimization, rely on an independence assumption that fundamentally prevents them from navigating these coupled configuration spaces. We address this limitation with a phase-aware large language model (LLM) agent that conditions each proposal on its full optimization history, navigating the coupled parameter space across phase-partitioned exploration, exploitation, and fine-tuning stages. Evaluated on the HICO-DET human-object interaction retrieval benchmark using Intel VDMS (Visual Data Management System), our agent outperforms Optuna TPE by 33.3% and VDTuner by 34.2% under SIEVE (Safeguarded Index Evaluation of Vector-search Efficiency, a quality-constrained throughput metric), delivering a 15.3x throughput gain over UniIR. Validation across three benchmarks confirms that the agent's advantage grows with the degree of parameter coupling: it leads by 33.3% on HICO-DET (high coupling), while methods converge within 1% on GLDv2 (moderate coupling) and within 3.6% on SIFT1M (near-independent control). Cross-system validation on Milvus confirms that the optimizer ranks first on all three datasets without modification, demonstrating transferability across vector database management system (VDBMS) platforms.
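The abstract describes SIEVE only as a quality-constrained throughput metric and does not give its formula. The sketch below is a hypothetical stand-in that captures the general idea: a configuration earns its measured throughput only if its recall meets a target, so a tuner cannot win by trading away accuracy. The `Trial` fields, the 0.9 recall floor, the `ef_search` knob, and `best_config` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    params: dict          # index configuration, e.g. {"ef_search": 64} (hypothetical knob)
    recall_at_k: float    # measured retrieval quality in [0, 1]
    qps: float            # measured queries per second


def constrained_throughput(trial: Trial, recall_floor: float = 0.9) -> float:
    """Hypothetical quality-constrained score: QPS if recall meets the floor, else 0."""
    return trial.qps if trial.recall_at_k >= recall_floor else 0.0


def best_config(trials: list[Trial], recall_floor: float = 0.9) -> Trial:
    """Pick the fastest configuration among those that satisfy the quality floor."""
    return max(trials, key=lambda tr: constrained_throughput(tr, recall_floor))
```

Under such a score, the fastest configuration with recall below the floor scores zero, which is exactly the region where coupled parameters (index build settings interacting with search-time settings) make naive per-parameter tuning go wrong.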

2606.05486 2026-06-05 cs.CL cs.LG

Localizing Prompt Ambiguity in Large Language Models with Probe-Targeted Attribution

Govind Ramesh, Yao Dou, Wei Xu

Affiliations: Georgia Institute of Technology

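AI summary: Proposes PRIG, which uses a linear probe and gradient attribution to localize ambiguity in prompts through intermediate representations rather than the output layer, achieving high AUROC on synthetic and human-written benchmarks.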

Comments: 23 pages, 5 figures, 5 tables

Abstract

Prompt ambiguity is a common source of failure in large language models, but is difficult to localize because it is a latent property of the prompt, while existing attribution methods are designed to explain observable outputs such as logits or generated tokens. We introduce PRIG, a gradient attribution method that uses a probe logit to attribute latent ambiguity to token positions. Specifically, PRIG trains a linear probe to distinguish clear prompts from ambiguous prompts and attributes the probe score to earlier token representations in the residual stream. To enable token-level evaluation, we construct synthetic ambiguity datasets across coding, math, and writing by rewriting one task-critical sentence per prompt, and complement them with a human-written gold benchmark. In this setting, PRIG localizes ambiguous spans substantially better than gradient attribution baselines, achieving 0.840 AUROC on the combined synthetic benchmark and 0.891 AUROC on the gold set. It also outperforms GPT-5.4 on sentence-level ambiguity identification and retains useful signal out-of-domain. These results establish PRIG as a practical tool for identifying which parts of a prompt are ambiguous. More broadly, they suggest that latent prompt properties can be localized through intermediate representations, rather than through output-level attribution.
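A rough numpy sketch of the probe-then-attribute idea, under simplifying assumptions that are not from the paper: the probe is a logistic-regression direction `w` over mean-pooled hidden states of a single layer, and attribution is gradient-times-input taken at that same layer. PRIG itself attributes the probe score back to earlier-layer token representations through the network, which needs autograd over the model. Here, because the probe logit is linear in the pooled state, its gradient with respect to token t's representation is simply `w / T`, so each token's score is `(w · h_t) / T` and the scores sum to the logit minus the bias.

```python
import numpy as np


def train_linear_probe(X: np.ndarray, y: np.ndarray, lr: float = 0.5, steps: int = 500):
    """Logistic-regression probe. X: (n, d) pooled states; y: 1 = ambiguous, 0 = clear."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(ambiguous)
        grad = p - y                               # d(log-loss)/d(logit)
        w -= lr * X.T @ grad / n
        b -= lr * grad.mean()
    return w, b


def token_attributions(H: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Gradient x input of the probe logit w . mean(H) + b w.r.t. each token row of H (T, d)."""
    T = H.shape[0]
    grad = np.tile(w / T, (T, 1))     # d logit / d h_t is the same for every token
    return (grad * H).sum(axis=1)     # one attribution score per token position
```

In the toy setting, tokens whose representations point along the probe's "ambiguous" direction receive the largest scores, which is the token-level signal the AUROC figures above evaluate.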

2606.05481 2026-06-05 cs.LG cs.AI eess.SP

Towards Unified and Data-Efficient Prognostics and Health Management with Tabular Foundation Models

Raffael Theiler, Lev Telyatnikov, Leandro Von Krannichfeldt, Olga Fink

Affiliations: IMOS Lab, EPFL

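AI summary: Proposes using tabular foundation models with in-context learning on industrial time series for Prognostics and Health Management (PHM) tasks; the models perform well in low-data regimes and outperform sequence models and gradient-boosted trees.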

Abstract

Data-driven Prognostics and Health Management (PHM) uses time-varying condition-monitoring data to diagnose system states and estimate remaining useful life in engineered assets. These tasks are central to maintenance planning, but industrial PHM data are often fragmented, partially observed, and poorly labeled, which hinders supervised learning. Foundation models offer a route toward reusable predictive systems, yet most time-series foundation models are designed for forecasting and assume long, coherent, regularly sampled sequences. To address this gap, we propose a framework for applying Tabular Foundation Models to industrial time series using in-context learning, and we evaluate them on a variety of PHM tasks. By converting raw unit-level signals into tabular rows, we show that these models perform well across multiple tasks, including prognostics and diagnostics, and are highly data efficient. We compare them directly with sequence models, transformer baselines, and gradient-boosted trees under a common evaluation protocol. The results indicate that tabular foundation models achieve the best average ranks across prognostic and diagnostic tasks. Our findings further show that PFN-based models are competitive in low-data regimes, that temporal context can be preserved in the tabular representation, and that performance depends on representative context construction under subsampling. These results demonstrate that tabular foundation models provide a practical and general interface for heterogeneous PHM problems.
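The abstract does not spell out how unit-level signals become table rows. One common pattern, shown here purely as an assumed illustration, is to slide a fixed window over each unit's run-to-failure history, summarize the window with simple statistics, and label the row with the remaining useful life (cycles left until the unit's last recorded cycle). The window length, the chosen statistics, and the function name are hypothetical.

```python
from statistics import mean


def signals_to_rows(units: dict[str, list[float]], window: int = 3) -> list[dict]:
    """Turn per-unit sensor sequences into tabular rows with an RUL target.

    units: maps unit id -> one sensor reading per cycle, ending at failure.
    Each row summarizes the last `window` readings up to the current cycle.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a slope")
    rows = []
    for unit_id, series in units.items():
        last_cycle = len(series) - 1
        for t in range(window - 1, len(series)):
            win = series[t - window + 1 : t + 1]
            rows.append({
                "unit": unit_id,
                "cycle": t,
                "mean": mean(win),
                "min": min(win),
                "max": max(win),
                "last": win[-1],
                "slope": (win[-1] - win[0]) / (window - 1),  # crude trend feature
                "rul": last_cycle - t,                        # target: cycles remaining
            })
    return rows
```

Rows like these can then be split into a labeled context set and unlabeled query rows for an in-context tabular model; the window statistics are one simple way to keep some temporal context inside a flat row.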

2606.05478 2026-06-05 cs.CV cs.LG

Can We Predict The Human Preference For Text-to-Image Content Prior To Generation And Is It Even Useful To Do So?

Joong Ho Kim, Keith G. Mills

Affiliations: LSU ATHENA Lab

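AI summary: Studies whether human preference metric (HPM) scores can be predicted before a diffusion model generates an image, uses these predictions to improve generation quality, and evaluates which HPMs are suitable for the task.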

Comments: Code is available at https://github.com/LSU-ATHENA/HPM-Predict

Abstract

Diffusion Models (DMs) have revolutionized text-driven generation by enabling the synthesis of high-quality, photorealistic visual content from user prompts. Whereas prior advances in visual generation such as VAEs and GANs were primarily evaluated on perceptual or visual similarity metrics such as FID and PSNR, DM advances have fostered the development of more advanced Human Preference Metrics (HPMs) that model and quantify human judgment as scalar values. However, DMs synthesize content through an inherently stochastic process in which random noise seeds generation. The initial random noise directly affects the quality of generated outputs, both qualitatively and quantitatively, and this influence is pronounced in smaller models intended for local deployment. Given this phenomenon, we first investigate to what extent we can predict scalar HPM scores before committing compute resources to generation. We then investigate to what extent such predictions can be leveraged to improve the quality of generated images, and study which HPMs are best suited for this task. Our investigation reveals that this is not only possible but also achievable with negligible hardware overhead.
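One way to use such a prediction, sketched here under loud assumptions, is best-of-K seed screening: sample several noise seeds, score each initial latent with a cheap predictor, and run the expensive denoising loop only for the top-scoring seed. `predict_hpm` below is a hypothetical placeholder (a toy function of the latent), not the paper's predictor, and the 16-dimensional latent is for illustration only.

```python
import random


def initial_latent(seed: int, dim: int = 16) -> list[float]:
    """Draw the Gaussian starting noise a diffusion sampler would use for this seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def predict_hpm(latent: list[float]) -> float:
    """Hypothetical cheap predictor of the final human-preference score.
    A real one would be a small learned model; this toy just favors low-energy noise."""
    return -sum(x * x for x in latent) / len(latent)


def pick_seed(candidate_seeds: list[int]) -> int:
    """Score every seed before generation and return the most promising one."""
    return max(candidate_seeds, key=lambda s: predict_hpm(initial_latent(s)))
```

The design choice is that scoring K latents costs K cheap predictor calls, while generation still runs once, which is why the overhead can stay small relative to a full denoising pass.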

2606.05468 2026-06-05 cs.RO

FlowPRO: Reward-Free Reinforced Fine-Tuning of Flow-Matching VLAs via Proximalized Preference Optimization

Yihao Wu, He Zhang, Junbo Tan, Xueqian Wang, Zhengyou Zhang

Affiliations: Tencent Robotics X; Futian Laboratory; Tsinghua University

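AI summary: Proposes the FlowPRO framework, which combines proximalized preference optimization (RPRO) with an intervention-and-rollback data collection method to enable reward-free offline reinforced fine-tuning, achieving the highest success rate on four long-horizon bimanual tasks.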

Abstract

Post-training Vision-Language-Action (VLA) models into policies that can be reliably deployed on real robots remains a major bottleneck. SFT and DAgger exploit failure signals only indirectly, and reward-based RL is bottlenecked by the difficulty of real-world reward design and of training reliable critics. We present FlowPRO, a reward-free offline reinforced fine-tuning framework for flow-matching VLAs. Algorithmically, we propose RPRO (Robotic Flow-matching Proximalized Preference Optimization), a preference-optimization objective tailored to the flow-matching action head of VLA models. RPRO pairs a contrastive optimizer with an explicit proximal regularizer that anchors the absolute magnitude of the implicit reward, thereby eliminating the reward-hacking failure mode of plain Flow-DPO. On the data side, a teleoperated intervention-and-rollback paradigm produces naturally paired positive and negative trajectories $(\tau^w, \tau^l)$ on a real robot from a single operator action; a Smooth Interpolation procedure, combined with batch mixing, then converts these sparse corrections into dense per-state supervision while preserving the base policy's capabilities. On four long-horizon bimanual tasks, FlowPRO attains the highest success rate, outperforming four representative baselines, and ablations confirm the contribution of each loss component.
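For intuition about the objective, the sketch below writes a Flow-DPO-style preference loss in the form commonly used for diffusion and flow models, where a sample's implicit reward is read off from how much the policy's flow-matching error falls relative to a frozen reference, plus a squared penalty that keeps those implicit rewards near zero. The exact form of RPRO's proximal regularizer, the coefficients `beta` and `lam`, and the scalar-per-sample interface are assumptions made for illustration, not the paper's definition.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def rpro_like_loss(fm_w_policy: float, fm_w_ref: float,
                   fm_l_policy: float, fm_l_ref: float,
                   beta: float = 1.0, lam: float = 0.1) -> float:
    """Hypothetical proximalized Flow-DPO loss for one preference pair.

    fm_* are flow-matching (velocity-regression) losses of the chosen (w) and
    rejected (l) action chunks under the trainable policy and the frozen reference.
    """
    # Implicit reward: how much better the policy fits the sample than the reference does.
    r_w = -beta * (fm_w_policy - fm_w_ref)
    r_l = -beta * (fm_l_policy - fm_l_ref)
    contrastive = -log_sigmoid(r_w - r_l)      # prefer the chosen trajectory
    proximal = lam * (r_w ** 2 + r_l ** 2)     # anchor |implicit reward| (assumed form)
    return contrastive + proximal
```

Without the proximal term, the contrastive part depends only on the margin `r_w - r_l`, so both implicit rewards can drift together (for instance, the policy fitting even the chosen actions worse than the reference) while the loss still decreases; the squared penalty discourages that drift.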

2606.05464 2026-06-05 cs.AI

Step-by-Step Optimization-like Reasoning in LLMs over Expanding Search Spaces

Nicolás Astorga, Nabeel Seedat, Mihaela van der Schaar

Affiliations: University of Cambridge

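AI summary: Introduces the OPT* task family, which uses verifiable-reward training and search-guided strategies to improve LLMs' step-by-step optimization-like reasoning over expanding search spaces.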

Abstract

Verifiable reward training has improved mathematical and coding reasoning, but these domains capture only part of step-by-step decision making. Many real-world tasks require finding a high-value feasible plan among many valid alternatives. We introduce OPT*, a scalable family of optimization-style tasks for training and evaluating LLM step-by-step optimization-like reasoning along a complexity axis: each task provides a feasibility checker and evaluator, while a complexity parameter expands the search space without requiring new human labels. This motivates studying these tasks in two regimes: (i) solver-guided online policy optimization, which uses a solver as a value oracle for partial states and applies rank-based reward shaping to reinforce better next steps, and (ii) search-based offline RL when such solvers are unavailable. Theoretically, we relate success in large search spaces to the information a reasoner extracts per unit of search budget. Empirically, we ablate the ingredients that make search efficient on OPT* and show that training on OPT* improves step-by-step optimization-like reasoning.
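Rank-based reward shaping of the kind mentioned for the solver-guided regime can be sketched as follows, assuming the solver returns a value estimate for the partial state reached by each sampled next step. Rewards depend only on each candidate's rank among its siblings (best gets 1, worst gets 0, ties share the average rank), which makes them insensitive to the scale of the solver's values. The linear rank-to-reward mapping is an assumption; the paper may use a different one.

```python
def rank_rewards(values: list[float]) -> list[float]:
    """Map solver values of candidate next steps to rewards in [0, 1] by rank.

    Higher value -> higher reward; tied values receive the average of their ranks.
    """
    n = len(values)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: values[i])   # indices, ascending by value
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1                                        # extend the tie group
        avg = (i + j) / 2.0                               # average 0-based rank of the group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [r / (n - 1) for r in ranks]
```

Because only the ordering matters, multiplying every solver value by a constant leaves the rewards unchanged, which is convenient when value scales differ across task instances of different complexity.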

2606.05461 2026-06-05 cs.AI

Output Type Before Quality: A Standards-Derived XAI Admissibility Rubric for Autonomous-Driving Safety

Abhinaw Priyadershi, Mandar Pitale, Jelena Frtunikj, Maria Spence

Affiliations: NVIDIA Corporation; NVIDIA GmbH

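AI summary: Addresses the evidence-type gap, the mismatch between safety standards for ML-based autonomous driving and the output types of XAI methods. Derives 19 testable evidentiary criteria from several safety standards, evaluates six XAI method classes, finds that causal XAI is structurally required at three lifecycle stages, and introduces the concept of structural admissibility.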

Abstract

Safety standards for ML-based autonomous driving specify the kind of evidence an assurance case must contain (directed cause-and-effect chains, quantified interventional effects, named root-cause variables), yet the XAI literature is organised by output type and technique family (saliency maps, feature attribution, counterfactuals, causal graphs, language traces). SHAP, the most-recommended ADS XAI method, returns a ranked feature list that no implementation effort can convert into a directed chain (Fig. 1). We name this mismatch the evidence-type gap. From AMLAS, ISO 26262, ISO 21448, and ISO/PAS 8800 we derive 19 testable evidentiary criteria across 7 lifecycle stages, with representative clause-cited derivations, and score six XAI method classes structurally. Causal XAI emerges as structurally required to satisfy the derived criteria at three stages: hazard identification (+62% rubric gap), incident investigation (+50%), and data management (+50%); the verdict set is stable across thresholds $T \in (0\%, 50\%]$ and survives a worst-case single-cell flip down to $T = 25\%$. At the remaining four stages, correlational or language-based methods are comparable or sufficient. The rubric identifies structural admissibility (necessary but not sufficient for compliance): an admissible method's specific output content may still be wrong, and validating that fidelity (the edges a fitted SCM produces, the cause a trace names) is the open assurance challenge. A single-VLA proof of concept on 1,996 real-world driving clips (79,840 rows, ten splits) is consistent with each method's observed output type matching its rubric prediction. XAI method selection for ADS safety assurance should be driven by lifecycle-stage evidence demand, not by method popularity.

2606.05460 2026-06-05 cs.CV

ORACLE-CT: Anatomy-Aware Support Pooling for CT Classification

Lavsen Dahal, Yubraj Bhandari, Geoffrey Rubin, Joseph Y. Lo

Affiliations: Center for Virtual Imaging Trials, RAI Labs, Department of Radiology, Duke University; Electrical and Computer Engineering, Pratt School of Engineering, Duke University; Department of Mathematics, Trinity College of Arts & Sciences, Duke University; Department of Radiology and Imaging Sciences, University of Arizona College of Medicine

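AI summary: Proposes the ORACLE-CT framework, which uses multi-organ segmentation to define label-specific anatomical support regions and restricts attention pooling to them, addressing the mismatch between localized disease evidence and global aggregation in CT classification; it improves performance across multiple encoders.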

Abstract

Abdominal CT disease classification is challenging because each scan is a large 3D volume with many possible findings, while diagnostic evidence is often confined to specific organs or anatomical compartments. Most study-level classifiers aggregate encoder features using anatomy-agnostic pooling or attention, creating a mismatch between localized disease evidence and global evidence aggregation. We propose ORACLE-CT, an encoder-agnostic anatomy-aware aggregation framework that uses multi-organ segmentation to define label-specific anatomical supports and restrict attention pooling to relevant regions. The framework supports single-organ, multi-organ union, comparative, localized, and global support strategies. We evaluate ORACLE-CT with three encoder families: DINOv3, I3D-ResNet-121, and the radiology-native Pillar-0 encoder. Models are trained end-to-end on MERLIN and evaluated internally and under frozen external transfer to Duke-Abdomen and AMOS. Compared with global average pooling, support-masked pooling improved MERLIN macro-AUROC/AUPRC from 0.838/0.638 to 0.858/0.676 for DINOv3 and from 0.829/0.617 to 0.848/0.659 for I3D-ResNet-121. On harmonized 10-label external evaluation, DINOv3 improved on Duke-Abdomen from 0.802/0.628 to 0.835/0.683 and on AMOS from 0.742/0.313 to 0.762/0.350, with similar gains for I3D-ResNet-121. For Pillar-0, most gains came from learned attention, with smaller additional benefit from anatomical masking. ORACLE-CT improves discrimination and external robustness while preserving an auditable link between predictions and anatomical evidence.
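The core of support-masked pooling can be sketched as attention pooling whose softmax is restricted to tokens inside a label's anatomical support. In this minimal numpy version, assumed for illustration, `feats` are encoder tokens (for example, patch features), `support` is a boolean mask derived from a multi-organ segmentation for one label, and `query` is a per-label attention vector (learned in a real model). Tokens outside the support receive a logit of negative infinity and therefore zero weight. Names and shapes are hypothetical.

```python
import numpy as np


def support_masked_pool(feats: np.ndarray, support: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Attention-pool token features over one label's anatomical support.

    feats:   (N, D) token features from any encoder
    support: (N,) boolean mask, True where the token lies in the label's organ(s)
    query:   (D,) per-label attention vector
    Returns the (D,) pooled feature for that label.
    """
    if not support.any():
        raise ValueError("empty anatomical support for this label")
    logits = feats @ query / np.sqrt(feats.shape[1])
    logits = np.where(support, logits, -np.inf)    # forbid attention outside the support
    weights = np.exp(logits - logits.max())        # max is finite since support is non-empty
    weights /= weights.sum()
    return weights @ feats
```

Setting `support` to all True and `query` to zeros makes every logit equal, so the weights become uniform and the function reduces to global average pooling, the baseline the reported gains are measured against.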

2606.05458 2026-06-05 cs.CV

Horse Eye Blink Detection and Classification for Equine Affective State Assessment

João Alves, Signe Møller-Skuldbøl, Pia Haubro Andersen, Rikke Gade

Affiliations: Visual Analysis and Perception Lab, Aalborg University; Department of Animal Biosciences, Swedish University of Agricultural Sciences

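AI summary: Develops and evaluates three video-based methods for automated horse blink classification (a frame-level YOLOv12 detector, optical-flow magnitude thresholding, and a fine-tuned VideoMAE model), achieving a macro-F1 of 0.898 for blink classification and 0.926 for binary blink detection on a public dataset, and shows both the potential and the challenges of fine-grained action-unit detection for equine welfare monitoring.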

Comments: CVPRW2026 CV4Animals

Abstract

Automated detection of equine facial action units (AUs) is a promising yet under-explored avenue for pain and affective state assessment in horses. Half- and full-blink movements are recognised indicators of pain and stress, but as micro-expressions their subtle, fine-grained nature makes them easily missed by the naked eye and discernible only through frame-by-frame video inspection, which makes reliable automated detection from video particularly demanding. We develop and evaluate three methods for automated blink classification from horse videos: a frame-based YOLOv12 detector, an optical flow magnitude thresholding approach, and a fine-tuned VideoMAE model, tested on a publicly available dataset. We achieve a macro-F1 score of 0.898 on blink classification and 0.926 on binary blink detection. Our results highlight both the potential and the inherent challenges of fine-grained AU detection for equine welfare monitoring.
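The optical-flow approach can be pictured as thresholding a per-frame motion signal. The sketch assumes an upstream step (not shown) has already computed the mean optical-flow magnitude inside the eye region for each frame, and it reports contiguous runs of frames above a threshold as blink events. The threshold and minimum run length are hypothetical tuning parameters, and telling half blinks from full blinks would need additional cues not modeled here.

```python
def detect_blinks(flow_mag: list[float], threshold: float,
                  min_frames: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) frame indices (inclusive) of runs where eye-region
    flow magnitude exceeds `threshold` for at least `min_frames` frames."""
    events, start = [], None
    for i, m in enumerate(flow_mag):
        if m > threshold and start is None:
            start = i                                  # a run begins
        elif m <= threshold and start is not None:
            if i - start >= min_frames:
                events.append((start, i - 1))          # keep runs long enough to count
            start = None                               # the run ends
    if start is not None and len(flow_mag) - start >= min_frames:
        events.append((start, len(flow_mag) - 1))      # run that reaches the last frame
    return events
```

The minimum run length is a simple guard against single-frame flow spikes from head motion or compression noise, which is one reason such a baseline can be brittle for subtle half blinks.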

2606.05455 2026-06-05 cs.CV

Disentangled Fine-Grained Prototype Learning for Incomplete Image-Tabular Classification

Feixiang Zhou, Jianyang Xie, Zhuangzhi Gao, Qinkai Yu, Fu Wang, Yuheng Fan, Jing Li, Zheheng Jiang, Yitian Zhao, Yanda Meng, He Zhao, Gregory Y. H. Lip, Yalin Zheng

Affiliations: School of Eye and Vision Sciences, University of Liverpool, U.K.; Department of Cardiovascular and Metabolic Medicine, University of Liverpool, U.K.; School of Computer Science, University of Exeter, U.K.; School of Computer Science and Engineering, South China University of Technology, China; School of Computing and Mathematical Sciences, University of Leicester, U.K.; Ningbo Institute of Industrial Technology, Chinese Academy of Sciences, China; Bioengineering Program, Biological and Environmental Science and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Saudi Arabia

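AI summary: Targets the missing-modality problem in image-tabular multimodal learning with DFPL, a framework that achieves robust classification through shared-specific prototype modeling, prototype-level disentanglement, and fine-grained alignment.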

Abstract

The missing-modality problem poses a significant challenge in image-tabular multimodal learning across a wide range of multimedia applications, including product understanding, recommendation systems, and medical diagnosis. This challenge is particularly pronounced when the two modalities are highly heterogeneous, as images and tabular attributes differ substantially in their semantic granularity and data distributions. Existing methods learn modality-invariant representations through disentanglement and alignment over global token-averaged features, capturing only coarse cross-modal consistency and overlooking fine-grained semantic and distributional misalignment, which hampers the exploitation of complementary cues under missing modalities. To address this, we propose DFPL, a novel framework for fine-grained prototype learning. Specifically, Shared-Specific Prototype Modeling (SSPM) extracts compact and diverse shared and modality-specific prototypes, and further performs prototype-level disentanglement to suppress redundant intra-modality correlations. Additionally, we propose a Prototype-guided Fine-grained Alignment (PFA) module that jointly enforces prototype-level distribution matching and prototype-to-class semantic alignment within a unified prototype space, thereby preserving both fine-grained distributional and semantic consistency across modalities. We further introduce a Class-aware Multi-scale Aggregation (CMA) module to adaptively aggregate shared semantics and modality-specific characteristics from global and prototype levels for robust predictions. Extensive experiments on three diverse image-tabular benchmarks show that our method outperforms previous approaches under various missing-modality settings. Code will be made publicly available.

2606.05449 2026-06-05 cs.AI cs.GT econ.EM

Insurance of Agentic AI

Quanyan Zhu

Affiliations: Department of Electrical and Computer Engineering, New York University, Tandon School of Engineering

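AI summary: Analyzes the novel risks introduced by agentic AI, proposes a framework for underwriting, pricing, reinsurance, and product design, and constructs a coordinated architecture that integrates multiple insurance coverages.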

Abstract

Agentic artificial intelligence (AI) systems are transforming the risk landscape by extending beyond information generation to autonomous planning, tool invocation, decision execution, and persistent modification of digital and physical environments. These capabilities introduce novel exposures that do not fit neatly within traditional insurance categories such as cyber, professional liability, product liability, or directors and officers coverage. This paper examines the emerging insurance market for agentic AI and develops a framework for understanding its underwriting, pricing, reinsurance, and product-design implications. We characterize agentic AI as a continuum of autonomy and delegated authority, emphasizing the distinction between informational outputs and systems capable of independently generating insured events through external actions. We analyze major risk pathways, including hallucinations, prompt-injection attacks, autonomous decision errors, model drift, dependency failures, and cyber-physical harms, and evaluate how existing insurance products are adapting to address these exposures. The paper further proposes an actuarial framework based on exposure assessment, scenario analysis, dependency mapping, and accumulation-risk management, drawing parallels to the evolution of cyber insurance. Finally, we present a coordinated insurance architecture that integrates cyber, technology errors and omissions, product liability, performance-warranty, and affirmative AI-liability coverages through explicit allocation mechanisms and dedicated AI aggregates. The analysis suggests that the future of agentic-AI insurance lies not in a single monoline product but in a layered ecosystem of complementary coverages supported by improved governance, transparency, telemetry, and regulatory clarity.
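The layered architecture argued for above can be illustrated with standard excess-of-loss arithmetic. In the hypothetical tower below (layer names, attachment points, and limits are invented for illustration, not taken from the paper), each layer pays the part of a single loss above its attachment point, capped at its per-occurrence limit; anything not paid by a layer, whether below the first attachment or above the top of the tower, stays with the insured. Aggregate limits are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    attachment: float   # loss level at which this layer starts paying
    limit: float        # maximum this layer pays per occurrence


def allocate_loss(loss: float, tower: list[Layer]) -> dict[str, float]:
    """Split a single loss across stacked, non-overlapping excess-of-loss layers."""
    payout = {}
    for layer in tower:
        payout[layer.name] = min(max(loss - layer.attachment, 0.0), layer.limit)
    payout["retained"] = loss - sum(payout.values())   # deductible plus any excess over the tower
    return payout
```

For example, with a primary layer paying from 1M up to a 4M limit and an excess layer attaching at 5M with a 10M limit, an 8M loss splits into 4M primary, 3M excess, and 1M retained. Explicit attachment points of this kind are one concrete way to make allocation between overlapping coverages unambiguous.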

2606.05445 2026-06-05 cs.AI

Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

Jiateng Liu, Bingxuan Li, Zhenhailong Wang, Rushi Wang, Kaiwen Hong, Cheng Qian, Jiayu Liu, Denghui Zhang, Katherine Driggs-Campbell, Manling Li, Heng Ji

Affiliations: UIUC; Stevens Institute of Technology; Northwestern University

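AI summary: Proposes the Brick-Composer framework, which trains multimodal large language models with three signals (Human Design Sparks, World Feedback, and Synthetic Experience) to tackle brick selection and pose estimation in brick assembly, raising step-level assembly success from below 1% to about 15%.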

Comments: 10 Pages, 10 figures

English abstract

We dream of AI agents that can read arbitrary designs and construct real-world objects from reusable building blocks. As a first step toward this vision, we study whether multimodal large language models (MLLMs) possess the visual grounding and spatial reasoning capabilities required for brick assembly. We formulate brick assembly as a sequential decision-making problem in which each step involves two subtasks: brick selection (identifying the target brick among candidate components) and brick pose estimation (predicting where and how the selected brick should be placed). To support this study, we introduce BC-Bench (Brick Construction Benchmark), the first benchmark for evaluating MLLMs on assembly with diverse bricks. Experiments show that current state-of-the-art MLLMs remain far from reliable builders, struggling with fine-grained brick selection and failing at precise pose estimation. To bridge this gap, we propose Brick-Composer, a learning framework that equips MLLMs with assembly skills through three complementary signals: Human Design Sparks, which provide affordance-rich construction demonstrations; World Feedback, which grounds predicted actions in visual and physical consequences; and Synthetic Experience, which scales learning beyond existing object designs. Brick-Composer more than triples brick selection accuracy, substantially reduces pose estimation errors, and raises strict step-level assembly success from less than 1% to around 15%. After training, a Qwen-3-8B model can correctly compose up to 42% of the steps for a complete object, suggesting that MLLMs can acquire assembly capabilities through targeted, physically grounded learning.

2606.05444 2026-06-05 cs.CL cs.AI cs.LG

Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

Adriana-Valentina Costache, Eduard Poesina, Silviu-Florin Gheorghe, Paul Irofti, Radu Tudor Ionescu

Affiliations: Department of Computer Science, University of Bucharest

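AI summary: Proposes a pipeline that uses cycle-consistent machine translation to generate or expand training data, scores translation quality by cosine similarity in a BERT latent space, and uses those scores to weight the loss function, substantially improving coreference resolution for low-resource languages.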

English abstract

Coreference resolution is a core NLP task with a broad range of downstream applications, e.g., machine translation, question answering, and document summarization. While the task is well-studied in English, comparatively less attention is dedicated to coreference resolution in other languages, especially low-resource ones. To mitigate this gap, we propose a novel coreference resolution pipeline that harnesses machine translation (MT) from English to a target low-resource language, to generate or expand training data. To automatically validate the quality of the translated samples, we back-translate the samples and assess the similarity with the original English samples via cosine similarity in the latent space of a BERT model. The resulting similarity scores are integrated into the loss function to weight training samples according to their MT cycle consistency. Extensive experiments on four low-resource languages show that our pipeline brings significant performance gains in coreference resolution. Moreover, our pipeline enables accurate coreference resolution in languages where no previous corpora were available.
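A minimal sketch of the sample-weighting idea described above, assuming each English sample and its back-translation have already been embedded as vectors (the paper uses a BERT latent space; here the embeddings are plain lists). Clipping negative similarities to zero and normalizing by the total weight are illustrative assumptions, and all function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cycle_weight(original_emb, back_translated_emb):
    """Hypothetical sample weight: cosine similarity clipped to [0, 1]."""
    return max(0.0, cosine_similarity(original_emb, back_translated_emb))


def weighted_loss(per_sample_losses, weights):
    """Weight-normalized average of per-sample losses (the normalization is an assumption)."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * loss for w, loss in zip(weights, per_sample_losses)) / total


# Toy usage: a faithful round trip keeps full weight, a drifted one is down-weighted.
w_good = cycle_weight([0.2, 0.9, 0.1], [0.2, 0.9, 0.1])
w_bad = cycle_weight([0.2, 0.9, 0.1], [0.9, -0.1, 0.4])
batch_loss = weighted_loss([0.7, 2.3], [w_good, w_bad])
```

Normalizing by the summed weights keeps the loss scale stable when many translated samples receive low similarity scores.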

2606.05438 2026-06-05 cs.LG math.OC

Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization

Dongruo Zhou

Affiliations: Department of Computer Science, Indiana University

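AI summary: For higher-order smooth nonconvex optimization, constructs hard instances via a block-chain mechanism and proves, for the first time, first-order lower bounds that match the known upper bounds, such as Ω(ε^{-7/4}) in the Hessian-Lipschitz case and Ω(ε^{-5/3}) in the third-order smooth case.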

Comments: 24 pages, 1 table

English abstract

We study the deterministic first-order oracle complexity of finding $ε$-stationary points in smooth nonconvex optimization when the objective satisfies higher-order smoothness assumptions. While the classical $ε^{-2}$ rate is optimal under only Lipschitz gradients, higher-order smoothness leads to accelerated first-order upper bounds, most notably the $ε^{-7/4}$ rate under Lipschitz Hessians and the $ε^{-5/3}$ rate under Lipschitz third derivatives. The matching lower bounds, however, have remained open. We resolve this gap by proving a new dimension-free first-order lower bound for higher-order smooth nonconvex functions, valid for every finite smoothness order. In particular, our construction gives a matching $Ω(ε^{-7/4})$ lower bound in the Hessian-Lipschitz case and a matching $Ω(ε^{-5/3})$ lower bound in the third-order-smooth regime. The hard instance is based on a block-chain mechanism that enforces blockwise oracle revelation while preserving the smoothness structure needed for the scalar hard instance. The lower-bound construction was discovered with the assistance of ChatGPT 5.5 Pro and subsequently verified by the authors.

2606.05437 2026-06-05 cs.RO cs.CV

Uncertainty-Aware Adaptive Sensor Fusion for Autonomous Navigation

Simegnew Yihunie Alaba, Yuichi Motai

Affiliations: IEEE

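AI summary: Proposes a hybrid deep learning method combined with an Unscented Kalman Filter (UKF) that fuses visual and inertial features adaptively, based on estimated uncertainty, to improve the pose estimation accuracy of visual-inertial odometry (VIO) for autonomous navigation.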

Comments: 13 pages

English abstract

This work introduces a hybrid deep learning approach integrated with an Unscented Kalman Filter (UKF) to enhance pose estimation accuracy in Visual-Inertial Odometry (VIO) for autonomous navigation. The proposed model employs a Vision Transformer (ViT) network to capture temporal dependencies in inertial measurement unit (IMU) data and a Multiscale Convolutional Neural Network (MCNN) to learn optical-flow-based motion cues from visual data. An adaptive sensor fusion module dynamically weights IMU and visual features according to their estimated uncertainty, improving robustness in diverse and challenging environmental conditions. Additionally, a novel uncertainty-aware loss function explicitly incorporates prediction uncertainty into the learning process, enabling robust and accurate navigation under noisy, incomplete, or unreliable sensor inputs. Comprehensive evaluations on the KITTI dataset demonstrate that the proposed method significantly outperforms baseline approaches in terms of Absolute Trajectory Error (ATE) and Relative Pose Error (RPE). The lightweight and computationally efficient model processes data at 155 FPS on an NVIDIA A100 GPU, making it well suited for deployment in resource-constrained autonomous systems.
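The abstract does not specify the fusion rule or the loss. A common way to realize both ideas is sketched below as an assumption, not as the authors' method: inverse-variance weighting of the IMU and visual features, and a heteroscedastic Gaussian negative log-likelihood in which the network predicts a log-variance `s` for each output. Both function names are hypothetical.

```python
import math


def fuse(imu_feat, vis_feat, imu_var, vis_var):
    """Element-wise inverse-variance fusion: the less uncertain source dominates."""
    fused = []
    for fi, fv, vi, vv in zip(imu_feat, vis_feat, imu_var, vis_var):
        wi, wv = 1.0 / vi, 1.0 / vv
        fused.append((wi * fi + wv * fv) / (wi + wv))
    return fused


def gaussian_nll(target, pred, log_var):
    """Heteroscedastic Gaussian NLL (constant term dropped), averaged over outputs.

    A large predicted variance shrinks the squared-error term but pays a
    log-variance penalty, so the model cannot claim high uncertainty for free.
    """
    terms = [
        0.5 * (math.exp(-s) * (y - p) ** 2 + s)
        for y, p, s in zip(target, pred, log_var)
    ]
    return sum(terms) / len(terms)


# Toy usage: in the second dimension the visual feature is noisier (variance 3),
# so the fused value leans toward the IMU feature.
fused = fuse([1.0, 0.0], [3.0, 4.0], [1.0, 1.0], [1.0, 3.0])
loss = gaussian_nll([2.0], [0.0], [0.0])
```

Predicting a log-variance rather than a variance keeps the uncertainty positive without extra constraints.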

2606.05436 2026-06-05 cs.AI cs.CL cs.IR

Ten Headache Specialists versus Artificial Intelligence for Clinical Literature Summarization: A Critical Evaluation and Comparison

Alejandro Lozano, Keiko Ihara, Ping-Hao Yang, Carrie E. Robertson, Jennifer Stern, Allan Purdy, Hsiangkuo Yuan, Pengfei Zhang, Yulia Orlova, Olga Fermo, Jennifer Hranilovich, Fred Cohen, Todd J. Schwedt, Jenelle A. Jindal, Serena Yeung-Levy, Chia-Chun Chiang

Affiliations: Stanford University, Palo Alto, CA, USA; Department of Neurology, Mayo Clinic, Rochester, MN, USA; Department of Neurology, Dalhousie University, Halifax, Canada; Jefferson Headache Center, Department of Neurology, Thomas Jefferson University, PA, USA; Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Neurology, University of Florida, Gainesville, FL, USA; University of Colorado School of Medicine, Department of Pediatrics, Division of Child Neurology, Aurora, CO, USA; Department of Medicine, Mount Sinai Hospital, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Neurology, Mayo Clinic, Scottsdale, AZ, USA; Harvard Medical School, Boston, MA, USA; Department of Neurology, Mount Sinai Hospital, Icahn School of Medicine at Mount Sinai, New York, NY, USA

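AI summary: Builds a RAG-based AI framework to compare three large language models with ten headache specialists on clinical literature summarization; expert-written summaries were preferred, but experts sometimes struggled to tell human- and AI-generated summaries apart.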

English abstract

Summarizing the latest medical literature to guide clinical decision-making is essential for evidence-based medicine and high-quality patient care. Yet clinicians face increasing challenges due to limited time with patients and a rapidly growing volume of published articles. Although retrieval-augmented large language models (LLMs) have shown promise in clinical summarization, human evaluations of their effectiveness in synthesizing broader scientific literature and direct comparisons to expert-written syntheses remain scarce. We constructed a RAG-based agentic AI framework using three state-of-the-art LLMs: Sonnet, GPT-4o, and Llama 3.1. A headache specialist created 13 questions, three for prompt optimization and ten for evaluation. Ten headache specialists across the United States and Canada each wrote a summary for one question, yielding four summaries per question (expert, Sonnet, GPT-4o, and Llama). The experts, blinded to authorship, critically evaluated the summaries, excluding the topic for which they wrote a summary, based on correctness, completeness, conciseness, and clinical utility, scoring each from 1 to 10 using standardized rubrics. They also ranked the summaries by preference and indicated whether they believed each summary was written by an expert or an LLM. Our study, comparing LLM- and expert-written literature summaries evaluated by headache specialists, showed that expert-written summaries were preferred, although experts sometimes found it challenging to distinguish between human- and AI-generated summaries. We also identified key expert-valued features beyond standard evaluation metrics that can guide future refinement of both human and AI literature summarization pipelines.

2606.05435 2026-06-05 cs.LG cs.CR

DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum

Naima Tasnim, Lalitha Sankar, Oliver Kosut

Affiliations: University of Southern California

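AI summary: Proposes the DP-MacAdam algorithm, which jointly uses gradient mean and variance estimates for adaptive clipping and momentum updates, improving model utility without manual tuning of the clipping threshold.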

Comments: 6 pages, 2 tables

English abstract

Differentially private stochastic gradient descent (DP-SGD) has become the standard framework for privacy-preserving machine learning, yet its reliance on a fixed gradient clipping threshold to limit sensitivity remains a significant practical limitation. Adaptive clipping algorithms such as AdaClip shift and scale the gradient before clipping and adding noise, so that the clipped gradient yields a more informative descent direction. The shift and scaling parameters are selected adaptively based on the empirical mean and variance. However, existing adaptive clipping algorithms have not also used these empirical estimates in a momentum update to accelerate training itself. Conversely, DP-Adam exploits Adam-like momentum updates based on the gradient mean and variance to accelerate training, but does not use these estimates for adaptive clipping. In this work, we propose Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum (DP-MacAdam), a novel algorithm that combines the two approaches so that the same mean and variance estimates serve both clipping and momentum. Our analysis shows that DP-MacAdam estimates the gradient variances in a bias-free manner. In addition, we empirically evaluate the privacy and accuracy of DP-MacAdam, demonstrating that it achieves improved model utility compared to DP-SGD, AdaClip, and DP-Adam baselines, without requiring manual tuning of the clipping threshold.
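To make the combination concrete, here is a hedged sketch of one way the pieces could fit together: AdaClip-style standardization with running mean and standard deviation estimates, per-sample clipping, Gaussian noise, and an Adam-style update whose moments supply those same estimates for the next step. All names and hyperparameters are hypothetical. The sketch does not reproduce the paper's bias-free variance estimator (in particular, it does not remove the injected noise's contribution from the variance estimate) and omits privacy accounting.

```python
import math
import random


def clip_to_norm(vec, max_norm):
    """Rescale vec so its L2 norm is at most max_norm (standard DP-SGD clipping)."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [x * scale for x in vec]


def private_gradient(per_sample_grads, mean_est, std_est, max_norm, noise_mult, rng):
    """Hypothetical AdaClip-style private aggregate.

    Each per-sample gradient is shifted by mean_est and scaled by std_est,
    clipped, summed, and perturbed with Gaussian noise; the noisy average is
    then mapped back to the original gradient scale.
    """
    n = len(per_sample_grads)
    total = [0.0] * len(mean_est)
    for g in per_sample_grads:
        z = [(gi - mi) / si for gi, mi, si in zip(g, mean_est, std_est)]
        z = clip_to_norm(z, max_norm)
        total = [t + zi for t, zi in zip(total, z)]
    noisy_mean = [(t + rng.gauss(0.0, noise_mult * max_norm)) / n for t in total]
    return [zi * si + mi for zi, si, mi in zip(noisy_mean, std_est, mean_est)]


def adam_step(params, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam update; returns new params, raw moments, and bias-corrected moments."""
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, grad)]
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, grad)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    params = [p - lr * mh / (math.sqrt(vh) + eps) for p, mh, vh in zip(params, m_hat, v_hat)]
    return params, m, v, m_hat, v_hat


def std_from_moments(m_hat, v_hat, floor=1e-3):
    """Per-coordinate std from first and second moments, floored to stay positive."""
    return [math.sqrt(max(vh - mh * mh, floor ** 2)) for mh, vh in zip(m_hat, v_hat)]


# Toy loop: the same moments drive both the Adam update and the next clipping transform.
rng = random.Random(0)
params, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
mean_est, std_est = [0.0, 0.0], [1.0, 1.0]
for t in range(1, 4):
    grads = [[1.0, -2.0], [0.5, -1.0], [1.5, -3.0]]  # stand-in per-sample gradients
    g = private_gradient(grads, mean_est, std_est, max_norm=1.0, noise_mult=1.0, rng=rng)
    params, m, v, m_hat, v_hat = adam_step(params, g, m, v, t)
    mean_est, std_est = m_hat, std_from_moments(m_hat, v_hat)
```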

2606.05434 2026-06-05 cs.LG cs.AI

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

Chirag Chawla, Rohan Charudatt Salvi, Madhav S. Baidya

Affiliations: Indian Institute of Technology (BHU); Department of Computer Science, University of Illinois Chicago

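AI summary: Proposes Selective-Advantage Entropy-Adaptive Horizon GRPO (SA-AH-GRPO), which uses asymmetric token-level discounting (applying the entropy-based discount only to negative-advantage rollouts) to stabilize training and improve mathematical reasoning performance.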

Comments: 16 pages, 4 Figures, 7 Tables

English abstract

Group Relative Policy Optimisation (GRPO) has emerged as an effective reinforcement-learning algorithm for aligning language models on reasoning tasks, but it treats every token position and every sampled rollout symmetrically. We introduce two complementary extensions: (i) Adaptive-Horizon GRPO (AH-GRPO), which weights each token's policy gradient using a cumulative entropy-based discount that reduces the effective horizon when the model is uncertain, and (ii) Selective-Advantage AH-GRPO (SA-AH-GRPO), which applies this discounting only to negative-advantage rollouts, leaving positive-advantage, successful trajectories unattenuated. We evaluate standard GRPO with alpha = 0, AH-GRPO with alpha = 0.5, and SA-AH-GRPO with alpha = 0.5 on the GSM8K mathematical reasoning benchmark using both Qwen 2.5-1.5B-Instruct and Qwen 2.5-3B-Instruct fine-tuned with LoRA. On the 3B model, SA-AH-GRPO achieves Pass@1 = 0.858 at its peak at step 30 and maintains 0.846 at 180 steps, with training variance reduced to 0.0246, a 3.6 times reduction relative to GRPO while matching its peak accuracy. On the 1.5B model, SA-AH-GRPO achieves a peak Pass@1 of 0.686, improving over the zero-shot baseline of 0.637. Our analysis shows that asymmetric discounting preserves the full gradient signal on correct solutions, prevents entropy collapse, and substantially stabilises training, suggesting a principled inductive bias for reinforcement learning with verifiable rewards on structured generation tasks.
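A minimal sketch of the weighting scheme, under stated assumptions: the group-relative advantage follows the usual GRPO standardization within a group of rollouts, and the entropy-based discount is assumed to take the form w_t = exp(-alpha * sum of token entropies before t), which the abstract does not specify. Setting `alpha = 0` gives uniform weights, matching the description of standard GRPO as the `alpha = 0` case. Function names are hypothetical.

```python
import math


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one group of rollouts."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def entropy_discounts(token_entropies, alpha):
    """Assumed AH discount: w_t = exp(-alpha * sum of entropies of tokens before t)."""
    weights, cumulative = [], 0.0
    for h in token_entropies:
        weights.append(math.exp(-alpha * cumulative))
        cumulative += h
    return weights


def token_weights(advantage, token_entropies, alpha, selective=True):
    """Per-token multipliers on one rollout's policy-gradient terms.

    selective=False gives AH-GRPO (discount every rollout); selective=True gives
    SA-AH-GRPO (positive-advantage rollouts keep weight 1 on every token).
    """
    if selective and advantage >= 0:
        return [1.0] * len(token_entropies)
    return entropy_discounts(token_entropies, alpha)


# Toy group of two rollouts: one correct (reward 1), one incorrect (reward 0).
advs = group_advantages([1.0, 0.0])
entropies = [[0.2, 0.4, 0.3], [1.0, 1.5, 2.0]]
scaled = [
    [a * w for w in token_weights(a, h, alpha=0.5)]
    for a, h in zip(advs, entropies)
]
```

The asymmetry is visible in `scaled`: the correct rollout keeps its full advantage on every token, while the incorrect rollout's later, high-entropy tokens receive a shrinking share of the negative signal.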

2606.05433 2026-06-05 cs.AI cs.SY eess.SY

Zero knowledge verification for frontier AI training is possible

Pierre Peigné, Ky Nguyen, Paul Wang

Affiliations: Lefebvre General-Purpose AI Policy Lab; Sorbonne Université; CNRS; LIP6

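AI summary: Proposes a zero-knowledge verification architecture that combines a pre-committed training specification, inter-node network observations, and on-the-fly Merkle commitments of intermediate computation. Verification runs in a zero-knowledge virtual machine (zkVM) with native BF16/FP32 precompiles that checks the floating-point computation the GPU actually performed, making training verifiable while keeping the model architecture confidential; a proof of concept is projected within about 36 months.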

Comments: 44 pages, 2 figures

English abstract

Frontier AI governance frameworks increasingly use cumulative training compute as the primary criterion for designating high-impact models, but enforcement rests on self-reporting because no technical verification primitive for training exists. Any future international agreement on frontier AI faces the same problem at higher stakes: coordinated regulation of technologies with significant externalities has historically rested on technical verification, without which agreements are declaratory. Recent governance analyses judge zero-knowledge proofs a promising candidate but currently impractical at frontier scale [26, 4]. We argue the impracticality is paradigm-bound rather than fundamental, and propose a verification architecture for frontier dense pre-training combining a pre-committed training specification, inter-node network observations, and on-the-fly Merkle commitments of intermediate computation, verified through a zero-knowledge Virtual Machine (zkVM) with native BF16/FP32 precompiles. The proof checks the actual floating-point computation the GPU performed rather than a fixed-point approximation, and preserves model-architecture confidentiality through a private training specification. The protocol produces three proof types: a genesis proof at initialisation, in-training step proofs across the run, and ex-ante attestations enforcing policy-relevant claims as running invariants, turning the training record into a governance-enforceable artefact. We estimate a deployable proof of concept within approximately 36 months at single-digit-percent training-side overhead, against a six-to-ten-year cycle for verification-grade custom silicon. Thirteen open research and engineering problems are catalogued as a research agenda for external contribution.

2606.05429 2026-06-05 cs.AI

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

Rayyan Abdalla, Amir Hussein, Min Wu, Dinesh Manocha

Affiliations: University of California, Berkeley

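AI summary: Proposes SAGE-PTQ, a graph-guided, saliency-aware quantization framework that separates salient from non-salient weights to achieve ultra-low-bit quantization with minimal scaling overhead, reaching a WikiText2 perplexity of 6.74 on LLaMA-3-8B while using less than 50% of BiLLM's memory.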

Comments: Preprint. 18 pages, 10 figures, 7 tables, including appendix

English abstract

Post-training quantization (PTQ) is critical for the efficient deployment of large language models (LLMs). Recent ultra-low-bit PTQ methods rely on rigid weight-saliency assumptions or position heuristics, introducing substantial hidden scaling overhead. We propose SAGE-PTQ (Saliency-Aware Graph-guided Efficient PTQ), a novel ultra-low-bit quantization framework for LLMs that minimizes hidden scaling cost. SAGE-PTQ separates salient and unsalient weights using distributional statistics, then models subsampled unsalient weights as a sparse graph to estimate the optimal number of groups per layer. SAGE-PTQ applies dual-mode quantization, assigning multi-bit precision to salient weights and binarizing unsalient weights. To reduce scaling overhead, SAGE-PTQ uses one per-channel scale for salient weights and one scalar per unsalient group. Finally, SAGE-PTQ implements adaptive saliency thresholding to select the optimal saliency ratio per matrix. SAGE-PTQ achieves 1.03 weight bits and only 0.004 scaling bits per matrix on average, outperforming state-of-the-art methods such as BiLLM and PB-LLM. On LLaMA-3-8B, SAGE-PTQ achieves 6.74 WikiText2 perplexity, compared to 55.8 for BiLLM, while using less than 50% of BiLLM's GPU memory. On LLaMA-2-70B, SAGE-PTQ provides 1.5x faster decoding on one NVIDIA L40 GPU, demonstrating practical inference efficiency.
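A hedged sketch of the dual-mode idea: a distributional rule flags salient weights, salient weights get multi-bit symmetric quantization with a single scale, and each non-salient group is binarized with one scalar, alpha = mean(|w|), a standard binarization choice. The `k`-sigma saliency rule and all function names are assumptions; the graph-based estimate of the number of groups and the adaptive saliency thresholding are not reproduced.

```python
import statistics


def split_salient(weights, k=2.0):
    """Assumed saliency rule: flag weights more than k standard deviations from the mean."""
    mu = statistics.fmean(weights)
    sd = statistics.pstdev(weights)
    return [abs(w - mu) > k * sd for w in weights]


def binarize_group(group):
    """Binarize a non-salient group with one scalar: w ~ alpha * sign(w), alpha = mean(|w|)."""
    alpha = statistics.fmean(abs(w) for w in group)
    return [alpha if w >= 0 else -alpha for w in group], alpha


def quantize_symmetric(values, bits):
    """Multi-bit symmetric uniform quantization with a single shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) * scale for v in values], scale


# Toy usage on one row: the outlier stays multi-bit, the rest is binarized.
row = [0.1, -0.2, 0.15, -0.05, 0.1, -0.1, 0.2, -0.15, 0.05, 3.0]
mask = split_salient(row)
salient_q, salient_scale = quantize_symmetric([w for w, s in zip(row, mask) if s], bits=4)
binary_q, group_alpha = binarize_group([w for w, s in zip(row, mask) if not s])
```

Storing one scale for the salient set and one `alpha` per binarized group is what keeps the scaling overhead small relative to the weight bits.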

2606.05422 2026-06-05 cs.RO

Learning from Demonstrations over Riemannian Manifolds using Neural ODEs: An Extended Abstract

Diana Cuervo Espinosa, Mahathi Anand, Angela P. Schoellig

Affiliations: ETH Zürich

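AI summary: Because robot states such as orientation evolve on curved spaces, proposes learning from demonstrations on Riemannian manifolds with neural ordinary differential equations, which numerically estimate geodesics to generate natural motion at lower computational cost.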

Comments: 2 pages

English abstract

Learning from demonstrations (LfD) is usually performed over Euclidean spaces, while the robot state, e.g., orientation, naturally evolves over curved spaces. Therefore, to ensure natural, complex motion generation, we investigate learning from demonstrations over Riemannian manifolds that can encode both position and orientation data. Here, geodesic paths provide natural motion between any two points on the manifold. We propose to numerically estimate geodesics via neural ordinary differential equations, mitigating the large computational overhead of existing approaches. Finally, these geodesics can be decoded back into the original task space before deployment on the robot. In this extended abstract, we discuss the architecture of our framework, provide some initial insights from our simulation experiments, including a comparison to other geodesic computation mechanisms, and discuss the challenges and prospects for future work.
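For intuition about geodesics as ODE solutions, the sketch below uses the unit sphere, where the geodesic equation is x'' = -|x'|^2 x and the closed-form answer is spherical linear interpolation (slerp). Integrating the ODE with classical RK4 recovers the slerp endpoint. This is a stand-in only: the paper learns geodesics with neural ODEs on a manifold encoding position and orientation, whereas here the dynamics are known analytically and no network is involved. Function names are hypothetical.

```python
import math


def slerp(p, q, t):
    """Closed-form great-circle geodesic between unit vectors p and q at time t."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)
    if theta < 1e-12:
        return list(p)
    s = math.sin(theta)
    return [(math.sin((1 - t) * theta) * a + math.sin(t * theta) * b) / s for a, b in zip(p, q)]


def _rhs(state):
    """Sphere geodesic ODE as a first-order system: x' = v, v' = -|v|^2 x."""
    x, v = state[:3], state[3:]
    speed2 = sum(c * c for c in v)
    return v + [-speed2 * c for c in x]


def _shift(state, k, h):
    return [s + h * ki for s, ki in zip(state, k)]


def integrate_geodesic(x0, v0, steps=1000):
    """Integrate the geodesic ODE over t in [0, 1] with classical RK4."""
    h = 1.0 / steps
    state = list(x0) + list(v0)
    for _ in range(steps):
        k1 = _rhs(state)
        k2 = _rhs(_shift(state, k1, h / 2))
        k3 = _rhs(_shift(state, k2, h / 2))
        k4 = _rhs(_shift(state, k3, h))
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state[:3]


# Toy usage: from the x-axis toward the y-axis; the initial speed equals the angle (pi/2).
p, q = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
end = integrate_geodesic(p, [0.0, math.pi / 2, 0.0])
target = slerp(p, q, 1.0)
```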

2606.05421 2026-06-05 cs.CL

ComplexityMT: Benchmarking the Interaction Between Text Complexity and Machine Translation

Joseph Marvin Imperial, Junhong Liang, Belal Shoer, Abdullah Barayan, Rodrigo Wilkens, Omar Mussa, Dawn Knight, Eugénio Ribeiro, Ekaterina Kochmar, Sowmya Vajjala, Fernando Alva-Manchego, Harish Tayyar Madabushi

Affiliations: University of Bath; Cardiff University; National University Philippines; MBZUAI; University of Exeter; INESC-ID Lisboa; Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR; National Research Council, Canada; King Abdulaziz University; Saudi Electronic University

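AI summary: Proposes the ComplexityMT benchmark, which uses CEFR levels to evaluate how text complexity and machine translation affect each other across six languages; more complex texts are harder to translate, and translation changes the CEFR level of the target text.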

English abstract

When a text is translated, does the translation retain the complexity of the original? We introduce ComplexityMT, a new challenge for assessing how text complexity and machine translation interact with and influence each other, using the Common European Framework of Reference for Languages (CEFR) levels as the measure of text complexity. Across six languages (Arabic, Dutch, English, French, Hindi, and Russian), we evaluate three open-weight models, one closed model, and a commercial machine translation system on two tasks: i) correlation of CEFR with translation difficulty, and ii) shifts in CEFR levels of the source texts. Our experiments show that higher CEFR levels make texts more difficult to translate, and that, for most languages, machine translation shifts the CEFR level of the target text relative to the original source. These findings provide new insights for researchers and practitioners working on multilingual pedagogical content generation and machine translation difficulty estimation.

2606.05420 2026-06-05 cs.AI stat.AP

Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers

Gianluca Guidi, Francesca Dominici, Tiziano Squartini, Callaway Sprinkle, Jonathan Gilmour, Kevin Butler, Eric Bell, Scott Delaney, Falco J. Bargagli-Stoffi

Affiliations: Department of Biostatistics, Harvard T.H. Chan School of Public Health; Department of Computer Science, University of Pisa; IMT School of Advanced Studies, Lucca; Environmental Systems Research Institute; Baxtel; Department of Environmental Health, Harvard T.H. Chan School of Public Health; Department of Biostatistics, UCLA Fielding School of Public Health

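AI summary: Collects facility-level data on 403 US hyperscale data centers to estimate their electricity consumption, electricity sources, and CO2 emissions; their electricity demand is about 1.8% of total US consumption, and their carbon intensity is about 48% above the national average.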

English abstract

The rapid proliferation of hyperscale data centers (HDCs) in the US, mainly driven by the adoption of artificial intelligence, has raised concerns about this industry's environmental footprint. We compiled facility-level information on 403 US hyperscale data centers operating between May 2024 and April 2025 and estimated their electricity consumption, electricity sources, and attributable CO2 emissions. Across different facility-load scenarios, these HDCs consumed approximately 68-99 TWh of electricity and were associated with about 37-54 million metric tons of CO2. Under the central scenario, HDC electricity demand corresponded to approximately 1.8% of total US electricity consumption, with roughly 54% of attributed generation supplied by fossil-fuel sources. The HDC electricity-weighted average carbon intensity was approximately 545 gCO2/kWh, about 48% above the contemporaneous US national grid-average carbon intensity of 370 gCO2/kWh. Our approach provides an attributional tool for assessing the environmental footprint of hyperscale data centers using the most recent EPA eGRID plant-level data.
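As a quick consistency check, multiplying the consumption range by the weighted average carbon intensity reproduces the reported emissions range (1 TWh = 10^9 kWh and 1 Mt = 10^12 g, so Mt = TWh × gCO2/kWh / 1000). Applying the single central-scenario intensity to both bounds is an assumption of this check, not necessarily how the paper computes each scenario. With the rounded inputs, 545/370 gives an excess of about 47%; the stated 48% presumably reflects unrounded values.

```python
def emissions_mt(consumption_twh, intensity_g_per_kwh):
    """Million metric tons of CO2 from TWh consumed and gCO2/kWh.

    1 TWh = 1e9 kWh and 1 Mt = 1e12 g, so Mt = TWh * (g/kWh) / 1000.
    """
    return consumption_twh * intensity_g_per_kwh / 1000.0


low = emissions_mt(68, 545)     # about 37.1 Mt
high = emissions_mt(99, 545)    # about 54.0 Mt
excess = 545 / 370 - 1          # about 0.47 with these rounded inputs
```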

2606.05415 2026-06-05 cs.CL cs.AI cs.LG

Executable Schema Contracts: From Automatic Ingestion to Multi-Source Retrieval

Padmaja Jonnalagedda, Yuguang Yao, Xiang Gao, Hilaf Hasson, Kamalika Das

Affiliations: Intuit AI Research

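AI summary: Proposes a system that automatically discovers an executable schema from multi-source data and uses it as a shared contract, improving multi-source question answering through schema-constrained retrieval routing and structural analysis.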

Comments: 9 pages, 4 figures, plus supplementary appendix

English abstract

Real-world data spans tables, documents, and semi-structured files with implicit semantics. Querying this data requires integrating evidence across inconsistent schemas and formats, yet existing approaches either demand costly manual engineering or bypass structure entirely. We present a system that automatically discovers an executable schema from raw multi-source data and uses it as a shared contract for knowledge graph construction and query-time retrieval. A closed-world field catalog constrains LLM-based schema discovery to attested fields; deterministic structural analysis infers identity keys, foreign keys, and source hierarchy; and the resulting schema drives extraction, deduplication, and cross-source linking into a provenance-aware knowledge graph. At query time the schema -- optionally extended via a monotonic protocol -- conditions a multi-tool agent routing retrieval across structured lookup, graph traversal, and vector search, returning grounded answers with traceable citations. In controlled zero-shot comparisons using the same LLM, data, and evaluation harness, the system improves over retrieval-only and decomposition-based baselines across four QA benchmarks, with ablations showing that schema-conditioned routing, structural intelligence, and schema-guided construction each contribute to the gains.
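The abstract names two deterministic steps: a closed-world field catalog that limits schema discovery to attested fields, and structural inference of identity and foreign keys. A minimal sketch under simple assumptions (tables as lists of dicts with hashable scalar values; uniqueness and value inclusion as the tests) might look like this; the heuristics and function names are hypothetical, not the system's implementation.

```python
def constrain_to_catalog(proposed_fields, rows):
    """Closed-world filter: keep only proposed fields that actually occur in the data."""
    attested = set()
    for r in rows:
        attested.update(r.keys())
    return [f for f in proposed_fields if f in attested]


def candidate_keys(rows):
    """Columns present and unique in every row: identity-key candidates."""
    if not rows:
        return []
    keys = []
    for col in rows[0]:
        values = [r.get(col) for r in rows]
        if None not in values and len(set(values)) == len(values):
            keys.append(col)
    return keys


def candidate_foreign_keys(child_rows, parent_rows, parent_key):
    """Child columns whose non-null values all appear in parent_rows[*][parent_key]."""
    parent_values = {r[parent_key] for r in parent_rows}
    fks = []
    for col in (child_rows[0] if child_rows else {}):
        values = {r.get(col) for r in child_rows} - {None}
        if values and values <= parent_values:
            fks.append(col)
    return fks


customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ana"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 5.0},
    {"order_id": 11, "customer_id": 1, "total": 5.0},
]
```

In this toy data, an LLM-proposed `email` field would be rejected by the catalog, `customer_id` and `order_id` emerge as identity keys, and `orders.customer_id` is detected as a foreign key into `customers`.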

2606.05413 2026-06-05 cs.LG cs.AI

CausalPOI: Spatio-Temporal Graph-Based Causal Modeling for Cold-Start POI Check-in Forecasting

Zhaoqi Zhang, Miao Xie, Yi Li, Linyou Cai, Siqiang Luo, Gao Cong

Affiliations: Nanyang Technological University; China Agricultural University; Meituan

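AI summary: Proposes CausalPOI, which models semantic and spatial relationships between POIs with a spatio-temporal functional interaction graph and simulates factual and counterfactual scenarios with structurally aligned treatment and control graphs, addressing cold-start POI check-in forecasting and significantly outperforming baselines on real-world datasets.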

DOI: 10.1145/3770855.3817641
Comments: Accepted at KDD 2026

English abstract

As urban environments continue to evolve rapidly, accurately modeling the dynamic behaviour of Points of Interest (POIs) is essential for supporting data-driven urban planning and commercial decision-making. While recent advancements in spatio-temporal graph learning have improved POI forecasting, most methods rely on proximity-based graphs and correlation-driven modeling, which overlook the functional dependencies between POIs and fail to capture the causal effects of urban interventions. In this paper, we introduce a novel research problem, cold-start POI check-in forecasting, which aims to predict the future check-in pattern of a newly introduced POI by modeling its temporal evolution and functional interactions with nearby POIs in a structured urban spatial context. To address these challenges, we propose CausalPOI, a spatio-temporal graph-based causal representation learning framework. CausalPOI leverages a Spatio-Temporal Functional Interaction Graph to model semantic and spatial relationships between POIs, and constructs structurally aligned treatment and control graphs to simulate factual and counterfactual scenarios. Extensive experiments on real-world SafeGraph datasets demonstrate that CausalPOI significantly outperforms state-of-the-art baselines across the board, validating its effectiveness in spatio-temporal forecasting, semantic interaction modeling, and causal effect estimation, and providing a more interpretable and actionable foundation for urban intervention analysis. Source code is available on GitHub.

2606.05411 2026-06-05 cs.AI cs.HC

A Motivational Architecture for Conversational AGI

Anna Mikeda, Ben Goertzel

Affiliations: Glass Umbrella; SingularityNet

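AI summary: This paper proposes a conversational motivational architecture that reinterprets the OpenPsi motivational lineage in dialogue-native terms and couples it with MetaMo's higher-level motivational scaffold. Through a ten-stage motivational processing pipeline, a dual decision strategy, and a functional distinction between pre-action feelings and post-action emotions, it lets a conversational agent manage motives such as competence, uncertainty reduction, and affiliation.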

Comments: 16 pages. Accepted for AGI-26 proceedings

English abstract

Motivational architectures in cognitive AI have largely been designed for physical agents regulating bodily needs. Conversational agents operate in a different regime: their sensorimotor loop is linguistic, their environment is a user's evolving mental state, and their consequential actions are speech acts, tool invocations, and strategic silences. This paper proposes a conversational reinterpretation of the OpenPsi motivational lineage, coupled to MetaMo's higher-level motivational scaffold, for agents built on a modular execution substrate. Homeostasis is recast in dialogue-native terms: the agent regulates competence, uncertainty reduction, affiliation, affinity, legitimacy, nurturing, and aesthetic coherence rather than bodily deficits. We propose three contributions: a ten-stage motivational processing pipeline that architecturally separates cognitive modulation from situational appraisal; a dual decision strategy blending urgency-driven fast response with deliberative multi-goal optimization; and an architecturally useful distinction between pre-action feelings and post-action emotions as functionally different forms of affect. We specialize the framework to two example agents, CompanionAgent and ResearchAgent, and sketch its extension to social robotics and domain-generic human-level AGI.
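
One way to picture the dual decision strategy is a switch between an urgency-driven fast path and a deliberative multi-goal path. This is a minimal sketch under stated assumptions: the effect table, the 0.8 threshold and `choose_action` are hypothetical, only the drive names and the three action types come from the abstract, and the paper describes a blend of the two modes, which may not reduce to the hard threshold used here.

```python
from typing import Dict, Tuple

# Hypothetical effect of each action type on each drive's satisfaction, in [-1, 1].
# Action types and drive names come from the abstract; the numbers are invented.
EFFECTS: Dict[str, Dict[str, float]] = {
    "speech_act": {"competence": 0.3, "uncertainty_reduction": 0.2, "affiliation": 0.6},
    "tool_invocation": {"competence": 0.8, "uncertainty_reduction": 0.7, "affiliation": 0.0},
    "strategic_silence": {"competence": 0.0, "uncertainty_reduction": -0.1, "affiliation": 0.2},
}

# Hypothetical cut-over point between the fast and the deliberative mode.
URGENCY_THRESHOLD = 0.8


def choose_action(urgency: Dict[str, float]) -> Tuple[str, str]:
    """Return (action, mode) for the current drive urgencies (each in [0, 1])."""
    most_urgent = max(urgency, key=urgency.__getitem__)
    if urgency[most_urgent] >= URGENCY_THRESHOLD:
        # Fast, urgency-driven path: serve the single most pressing drive.
        action = max(EFFECTS, key=lambda a: EFFECTS[a].get(most_urgent, 0.0))
        return action, "fast"

    # Deliberative path: trade off all drives, weighting each by its urgency.
    def score(action: str) -> float:
        return sum(u * EFFECTS[action].get(drive, 0.0) for drive, u in urgency.items())

    return max(EFFECTS, key=score), "deliberative"


print(choose_action({"competence": 0.2, "uncertainty_reduction": 0.1, "affiliation": 0.9}))
# ('speech_act', 'fast')
print(choose_action({"competence": 0.5, "uncertainty_reduction": 0.5, "affiliation": 0.5}))
# ('tool_invocation', 'deliberative')
```

The fast path ignores trade-offs entirely, which is cheap but can neglect secondary drives; the deliberative path weighs every drive, so a moderately useful tool call can beat a speech act that serves only affiliation.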

2606.05408 2026-06-05 cs.AI cs.NE

Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution

Can Gurkan, Forrest Stonedahl, Uri Wilensky

Affiliations: Northwestern University; Augustana College

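AI summary: Studies whether an LLM that repeatedly mutates a program without selection pressure explores new forms or circles back to old ones, finding that LLM mutation consistently converges to restricted attractor regions; at the structural level, in 87% of chains more than 93% of mutations repeat a previously seen structural form.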

Comments: Accepted to the Genetic and Evolutionary Computation Conference (GECCO '26) Workshop on Large Language Models for and with Evolutionary Computation

English abstract

When an LLM repeatedly mutates a program, does it explore new forms or circle back to the same ones? We study this question by analyzing LLM-driven mutation chains in the absence of selection pressure within a domain-specific language, varying prompt design, model family, and stochastic replication. We find that LLM-based mutation consistently converges toward restricted attractor regions in program space. Convergence is especially severe at the structural level: in 87% of chains, over 93% of mutations revisit a previously seen structural form, with most variation confined to terminal substitutions within recurring templates. Cycle analysis reveals short cycles and self-loops dominating the transition structure. The rate of convergence varies with prompt wording and model choice, but the phenomenon is robust across conditions. A classical GP subtree mutation operator does not exhibit comparable convergence, suggesting that the effect is intrinsic to the LLM mutation pipeline. These findings reveal a tension at the heart of LLM-driven program evolution: the same capabilities that enable semantics-aware program transformation also carry a systematic bias toward structural homogeneity that must be accounted for if such systems are to sustain open-ended exploration. Source code is available at https://github.com/can-gurkan/lmca.
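
The structural-revisit statistic can be approximated with a short script. This is a hedged sketch, not the authors' code from the linked repository: it assumes programs are nested `(operator, *children)` tuples, defines a program's structural form by replacing every terminal with a placeholder, and counts a mutation as a revisit when its form already appeared earlier in the chain. The toy turtle-style commands are hypothetical, and the paper's own definition of structural form may differ.

```python
from typing import Any, Dict, List, Sequence

# Programs are nested tuples (operator, *children); any non-tuple leaf is a
# terminal. This encoding and the toy commands below are assumptions.


def structural_form(program: Any) -> Any:
    """Replace every terminal with the placeholder 'T', keeping operators and shape."""
    if isinstance(program, tuple):
        op, *children = program
        return (op, *(structural_form(child) for child in children))
    return "T"


def chain_stats(chain: Sequence[Any]) -> Dict[str, Any]:
    """Revisit rate, self-loop rate and cycle lengths over one mutation chain."""
    if len(chain) < 2:
        raise ValueError("a chain needs at least one mutation")
    forms = [structural_form(p) for p in chain]
    last_seen = {forms[0]: 0}  # structural form -> most recent index in the chain
    cycle_lengths: List[int] = []
    for i in range(1, len(forms)):
        form = forms[i]
        if form in last_seen:
            # Revisit: length 1 is a self-loop, small values are short cycles.
            cycle_lengths.append(i - last_seen[form])
        last_seen[form] = i
    n_mutations = len(forms) - 1
    return {
        "revisit_rate": len(cycle_lengths) / n_mutations,
        "self_loop_rate": cycle_lengths.count(1) / n_mutations,
        "cycle_lengths": cycle_lengths,
    }


chain = [
    ("fd", 1),
    ("fd", 2),                       # terminal substitution only
    ("seq", ("fd", 2), ("rt", 90)),  # genuinely new structure
    ("seq", ("fd", 5), ("rt", 45)),  # terminal substitution only
    ("fd", 3),                       # back to the first template
]
print(chain_stats(chain))
# {'revisit_rate': 0.75, 'self_loop_rate': 0.5, 'cycle_lengths': [1, 1, 3]}
```

The toy chain mirrors the reported pattern: most steps only substitute terminals inside a recurring template, so three of four mutations count as revisits even though every concrete program in the chain is different.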

2606.05407 2026-06-05 cs.RO

MoDex: A Diffusion Policy for Sequential Multi-Object Dexterous Grasping

Haofei Lu, Hongjia Liu, Yifei Dong, Florian T. Pokorny, Jens Lundell, Danica Kragic

Affiliations: Department of Robotics, Perception and Learning, KTH Royal Institute of Technology; Robotics and Autonomous Systems at University of Turku

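AI summary: Proposes MoDex, a diffusion policy that predicts grasp poses conditioned on an opposition space and a point cloud, enabling a single dexterous hand to grasp multiple objects sequentially without releasing those already held, and improves success rates through two-stage training (imitation learning followed by reinforcement learning fine-tuning).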

Comments: Submitted to CoRL 2026

English abstract

This work addresses sequentially grasping multiple objects with a single dexterous hand without releasing those already held. Most dexterous grasping methods commit all of the hand's degrees of freedom to a single object, underutilizing its dexterity and leaving no redundancy for subsequent grasps. The proposed solution, MoDex, is a diffusion policy that predicts the next gripper pose directly from observations, conditioned on an opposition space and a point cloud. The opposition space condition specifies which fingers participate in the current grasp, enabling the gripper to use only a subset of its available degrees of freedom while reserving the remaining degrees of freedom for subsequent grasps. To facilitate sim-to-real transfer, MoDex is trained in two stages: first through imitation learning on expert demonstrations, and subsequently through reinforcement learning fine-tuning, which consistently improves success rates over the pre-trained policy. We evaluate MoDex in simulation on a MuJoCo-based Franka Emika Panda robot equipped with an Allegro Hand and on the corresponding real-world hardware platform. Across both simulation and real-world experiments, MoDex achieves higher success rates than the evaluated learning-based baselines, improving performance by 2.92-17.92% and 6.67-17.78%, respectively. Project page: https://modex2026.github.io/.
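
The role of the opposition-space condition can be illustrated with a joint mask. In this sketch the opposition space is reduced to a set of participating fingers, the Allegro Hand's 16 joints (four fingers with four joints each) are assumed to be laid out finger by finger in the order shown, and `predicted` stands in for the diffusion policy's output. `apply_grasp_step`, `free_fingers` and this encoding are hypothetical illustrations, not MoDex's implementation.

```python
from typing import List, Sequence, Set

# Allegro Hand: four fingers with four joints each (16 DoF). The finger order
# and the flat joint layout below are assumptions made for this sketch.
FINGERS = ("thumb", "index", "middle", "ring")
JOINTS_PER_FINGER = 4
NUM_JOINTS = len(FINGERS) * JOINTS_PER_FINGER


def joint_mask(active: Set[str]) -> List[bool]:
    """True for every joint that belongs to a finger in the opposition space."""
    unknown = active - set(FINGERS)
    if unknown:
        raise ValueError(f"unknown fingers: {sorted(unknown)}")
    return [finger in active for finger in FINGERS for _ in range(JOINTS_PER_FINGER)]


def apply_grasp_step(current: Sequence[float], predicted: Sequence[float],
                     active: Set[str], holding: Set[str]) -> List[float]:
    """Move only the opposition-space fingers; every other joint keeps its pose.

    `predicted` stands in for the joint targets a policy would output.
    """
    if len(current) != NUM_JOINTS or len(predicted) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joint values")
    if active & holding:
        raise ValueError("opposition space overlaps fingers that already hold an object")
    mask = joint_mask(active)
    return [p if m else c for c, p, m in zip(current, predicted, mask)]


def free_fingers(active: Set[str], holding: Set[str]) -> Set[str]:
    """Fingers left in reserve for subsequent grasps."""
    return set(FINGERS) - active - holding


current = [0.0] * NUM_JOINTS
predicted = [1.0] * NUM_JOINTS
new_pose = apply_grasp_step(current, predicted, active={"thumb", "index"}, holding={"ring"})
print(new_pose[:8], new_pose[8:])
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(free_fingers({"thumb", "index"}, {"ring"}))  # {'middle'}
```

Rejecting an opposition space that overlaps the fingers already holding an object is what keeps earlier grasps intact in this sketch; the fingers returned by `free_fingers` are the redundancy left for the next grasp.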
