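arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.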

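2606.14156 2026-06-15 cs.LG cs.AI New submission

Learning High Coverage Discriminative Parsimonious Rulesets

Mariamma Antony, Raman Sankaran, Chiranjib Bhattacharyya, Uma Satya Ranjan

Affiliations: Indian Institute of Science; Compass

AI summary: Proposes CDPR, which learns high-coverage, discriminative, and parsimonious rule sets via submodular maximization algorithms, markedly improving interpretability while maintaining high accuracy, with coverage more than 2.5 times that of the next-best algorithm.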

Abstract

Learning systems based on IF-THEN rule representations readily offer interpretability, making them a crucial focus in contemporary AI research. A key objective for such rule sets is to achieve both high discriminative power and interpretability. While existing state-of-the-art algorithms implicitly prioritize predictive accuracy, they often fall short on one or more quality metrics that ensure interpretability, such as coverage and parsimony of rule sets. Motivated by this, this paper proposes the development of CDPR, which aims to create highly accurate and interpretable rule sets for classification problems. To the best of our knowledge, this represents the first attempt to establish such an approach. In this study, we introduce two algorithms rooted in submodular maximization, which not only provide provable guarantees on coverage but also yield rule sets that are both discriminative and parsimonious. We empirically demonstrate that rule sets learned through our approaches achieve higher accuracy and interpretability, with more than a 2.5-fold improvement in average coverage rates compared to the next best algorithm.
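To make the submodular-maximization idea concrete, here is a minimal sketch of greedy coverage maximization over candidate rules. It is not the CDPR algorithm from the paper: the function name `greedy_rule_cover`, the toy rules, and the sample indices are all hypothetical, and only coverage is optimized (discriminative power is ignored). Coverage of a set of rules is a monotone submodular function, and the greedy strategy is the classic way to maximize such functions under a cardinality constraint, which is what keeps the rule set small.

```python
from typing import Dict, List, Set


def greedy_rule_cover(rule_coverage: Dict[str, Set[int]], max_rules: int) -> List[str]:
    """Pick at most max_rules rules, each time adding the rule that covers
    the most samples not yet covered by the rules chosen so far."""
    covered: Set[int] = set()
    chosen: List[str] = []
    for _ in range(max_rules):
        best_rule, best_gain = None, 0
        for rule, samples in rule_coverage.items():
            if rule in chosen:
                continue
            gain = len(samples - covered)  # marginal coverage gain of this rule
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no remaining rule adds coverage
            break
        chosen.append(best_rule)
        covered |= rule_coverage[best_rule]
    return chosen


# Hypothetical candidate rules mapped to the indices of the samples they cover.
rules = {
    "IF age > 40 THEN positive": {0, 1, 2, 3},
    "IF income < 30k THEN positive": {2, 3, 4},
    "IF smoker THEN positive": {4, 5},
    "IF age > 60 THEN positive": {0, 1},
}
selected = greedy_rule_cover(rules, max_rules=2)
covered_samples = set().union(*(rules[r] for r in selected))
```

The cap `max_rules` stands in for parsimony; a fuller sketch would also score each rule's class purity before admitting it.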

2606.14155 2026-06-15 cs.LG cs.CL New submission

Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems

Tan Zhu, Tong Yao, Kananart Kuwaranancharoen, Amit Singh, Yushang Lai, Deepa Mohan, Shankara Bhargava

Affiliations: Retail Intelligence, Walmart Global Tech

AI summary: Proposes the GTBP framework, which back-propagates local target outputs through a graph structure to adapt context in multi-LLM agentic workflows, with theoretical stability guarantees and experimental gains over baselines.

2606.14153 2026-06-15 cs.CV cs.RO New submission

Encoder Winners Do Not Reliably Transfer Across VLA Backbone Scale: A Frozen-Backbone Grafting Diagnostic

Qingping Zeng, Fei She

Affiliations: Tsinghua University

AI summary: Proposes a frozen-backbone grafting diagnostic and finds that the vision encoder that is best on a small VLA is not necessarily best on a larger backbone; encoder choice depends on backbone scale.

Comments 23 pages, 5 figures, 8 tables

Abstract

Vision-language-action (VLA) policies typically inherit their vision encoder from upstream VLM releases, but it is unclear whether an encoder choice validated on a small VLA transfers to a larger backbone. We introduce a frozen-backbone grafting diagnostic: the vision tower of a released VLA is replaced by a candidate encoder under a fixed protocol (adaptive average pooling, LayerNorm, and a single trainable linear projector), with the language model and action expert frozen. Across four encoders, two LIBERO suites, two backbones (SmolVLA-450M and $π_{0.5}$-3.3B), and two-to-three seeds per cell (40 main grafting runs plus native, LoRA, pooling, and zero-/shuffled-image controls, all scored by offline action MSE), the small-backbone winner does not reliably select the large-backbone top tier: SigLIP is best on SmolVLA across both suites, while on $π_{0.5}$ DINOv2-small leads the spatial suite and the object suite is a seed-sensitive near-tie band; three of the four backbone-suite comparisons (and 11 of 12 seed-level cells) support backbone-dependent rankings. The grafting wrapper is itself non-neutral with opposite sign across backbones (+45-56% MSE on the SmolVLA native tower, -50-52% on $π_{0.5}$), so all conclusions are conditional on the fixed grafting protocol. We position frozen grafting as a cheap target-backbone diagnostic to run before committing to an encoder at scale, not as a closed-loop deployment claim.
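The fixed grafting protocol named in the abstract (adaptive average pooling, LayerNorm, one trainable linear projector) can be sketched in plain Python as follows. This is an illustrative assumption, not the authors' code: pooling is done in 1D over the token sequence, LayerNorm has no learned scale or shift, and every name and dimension here (`graft_adapter`, `enc_dim`, `backbone_dim`) is hypothetical. In the setting described, only `weight` and `bias` would be trained, with the language model and action expert kept frozen.

```python
import math
import random
from typing import List

Vector = List[float]


def adaptive_avg_pool(tokens: List[Vector], out_len: int) -> List[Vector]:
    """Average encoder tokens into out_len bins that together span the sequence."""
    n, dim = len(tokens), len(tokens[0])
    pooled = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = math.ceil((i + 1) * n / out_len)
        window = tokens[start:end]
        pooled.append([sum(t[d] for t in window) / len(window) for d in range(dim)])
    return pooled


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one token to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Project one token; weight has shape (backbone_dim, enc_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def graft_adapter(tokens: List[Vector], out_len: int,
                  weight: List[Vector], bias: Vector) -> List[Vector]:
    """Pool, normalize, then project candidate-encoder tokens to the backbone width."""
    return [linear(layer_norm(t), weight, bias) for t in adaptive_avg_pool(tokens, out_len)]


random.seed(0)
enc_dim, backbone_dim = 6, 4  # hypothetical widths
encoder_tokens = [[random.gauss(0, 1) for _ in range(enc_dim)] for _ in range(10)]
weight = [[random.gauss(0, 0.1) for _ in range(enc_dim)] for _ in range(backbone_dim)]
bias = [0.0] * backbone_dim
grafted = graft_adapter(encoder_tokens, out_len=4, weight=weight, bias=bias)
```

Because the adapter is this small, differences between candidate encoders are attributed mostly to the encoder features themselves, which is consistent with the abstract's caveat that conclusions are conditional on the fixed protocol.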

2606.14150 2026-06-15 cs.LG cs.CL New submission

Small LLMs: Pruning vs. Training from Scratch

Yufeng Xu, Taiming Lu, Kunjun Li, Jiachen Zhu, Mingjie Sun, Zhuang Liu

Affiliations: Princeton University; New York University; Carnegie Mellon University

AI summary: Compares pruning with training from scratch using six pruning methods on Llama-3.1-8B, finding that pruning is better under a limited budget, while coarse-grained pruning can be surpassed when the budget is ample.

Comments Our code is available at https://github.com/zlab-princeton/llm-pruning-collection

Abstract

Pruning promises a shortcut to strong small language models. In this work, we examine this promise by pruning Llama-3.1-8B at pruning ratios of 0.5-0.8 with six methods spanning depth, width, and sparse granularities, under two controlled token-matched settings. (1) With the same training token budget, pruned initialization consistently outperforms random initialization. This shows that the parent model provides a strong starting point, although the advantage narrows as the training token budget grows and as the pruning ratio rises, nearly vanishing at the highest pruning ratio we study. (2) When training from scratch is instead given the full token budget consumed by the whole pipeline, pruning at finer granularities still retains an advantage, while coarser structured pruning can be matched or surpassed. This suggests that the parent model transfers knowledge that additional training tokens alone cannot fully recover, but only at fine granularity. Taken together, our results yield a clear recommendation: with a large pretrained model in hand and a limited training token budget, pruning is better than training from scratch; when the training budget is not limited, training from scratch can be competitive for coarser pruning, so a large pretrained parent is not always necessary.

2606.14149 2026-06-15 cs.LG New submission

Trust but Verify: Mitigating Medical Hallucinations via Post-Hoc Adversarial Auditing and Multi-Agent Feedback Loops

Muhammad Osama, Maheera Amjad, Zartasha Mustansar, Arslan Shaukat, Muhammad U. S. Khan

Affiliations: Data Science and Machine Learning Lab, SINES, NUST; SINES, NUST; CEME, NUST

AI summary: Proposes a five-agent "Trust but Verify" system that uses post-hoc adversarial auditing and multi-agent feedback loops to reduce, by about 53%, the hallucination error rate of large language models that recommend banned drugs when answering clinical questions.

Abstract

Large Language Models (LLMs) are increasingly deployed in healthcare settings, yet their tendency to hallucinate poses risks when clinical decisions are involved. This study examines whether LLMs recommend recently banned or withdrawn pharmaceuticals when answering clinical questions and tests an agent-based method for reducing such errors. We developed a five-agent "Trust but Verify" system using a single LLM backbone. To measure regulatory knowledge obsolescence, we created an adversarial dataset of 103 clinical MCQs where historically correct answers now refer to banned substances. This scale ensures statistical significance across various therapeutic classes. We evaluated three open-access model families (GPT-OSS, Llama-3, Falcon-3) under vanilla and agentic conditions. Performance was measured via pointwise score, label accuracy, Hallucination Error Rate (HER), and Component Fidelity (CF) score. We also observed clinical safety regression in proprietary models. In default configurations, all models showed high hallucination rates, consistently selecting banned drugs that matched training data patterns. Our proposed agentic architecture reduced HER by approximately 53% across models. Pointwise scores shifted from -0.25 (unsafe recommendation) toward 0.0 (appropriate refusal). The safety audit intercepted dangerous outputs even when models' parametric knowledge favored the banned substance. The proposed multi-agent framework offers a model-agnostic method for enforcing regulatory compliance that prioritizes patient safety over fluent text generation. Our work demonstrates a practical approach for deploying autonomous AI systems in safety-critical healthcare settings. It shows how real-time regulatory data can be integrated into LLM pipelines to support clinical decision-making.
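A post-hoc audit of the kind described can be sketched as a check that runs after the model has answered and replaces any answer naming a banned substance with a refusal. Everything below is a hypothetical illustration rather than the paper's five-agent system: the drug names are placeholders, `audit_answer` and `banned_selection_rate` are made-up helpers, and the rate computed here is a simple stand-in, since the abstract does not define HER precisely.

```python
from typing import Dict, List, Set

REFUSAL = "REFUSE"  # hypothetical marker for an appropriate refusal


def audit_answer(options: Dict[str, str], chosen_label: str, banned: Set[str]) -> str:
    """Return the chosen label unless it points at a banned drug, in which case refuse."""
    drug = options[chosen_label].strip().lower()
    return REFUSAL if drug in banned else chosen_label


def banned_selection_rate(answers: List[str], options: Dict[str, str], banned: Set[str]) -> float:
    """Fraction of final answers that still select a banned drug (assumed stand-in metric)."""
    unsafe = sum(
        1 for a in answers
        if a != REFUSAL and options[a].strip().lower() in banned
    )
    return unsafe / len(answers)


# Placeholder multiple-choice options and a placeholder banned list.
options = {"A": "drug_alpha", "B": "drug_beta", "C": "drug_gamma", "D": "drug_delta"}
banned = {"drug_beta"}

vanilla_answers = ["B", "A", "B", "C"]  # four hypothetical model runs on the same question
audited_answers = [audit_answer(options, a, banned) for a in vanilla_answers]

rate_before = banned_selection_rate(vanilla_answers, options, banned)
rate_after = banned_selection_rate(audited_answers, options, banned)
```

The audit depends only on the final answer and an external banned list, not on the model's internals, which is one way to read the abstract's claim that the framework is model-agnostic.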

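2606.14145 2026-06-15 cs.CL New submission

Personal Care Utility: Health as Everyday Infrastructure

Mahyar Abbasian, Elahe Khatibi, Saba A. Farahani, Nitish Nagesh, Arshia Ilaty, Hooman Sajjadi, Amir Rahmani, Ramesh Jain

Affiliations: University of California, Irvine

AI summary: Proposes the Personal Care Utility (PCU), a layered, event-driven architecture that treats health as everyday infrastructure, organizes continuous signals through a Personicle, and separates clinical decisions from language expression, using Type 2 Diabetes as an example to demonstrate real-time interventions and knowledge-grounded guidance.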

Comments 12 pages, 2 figures, 3 tables

Abstract

Healthcare is essential, expert, and episodic by design - built around the roughly one hour per year a person spends with a clinician. The 8,759 hours outside clinical settings, where eating, sleeping, movement, medication, and stress actually shape long-term health, have no comparable infrastructure. The bottleneck for personalized health is not raw data or reasoning capability; it is the absence of that infrastructure layer. This paper introduces the Personal Care Utility (PCU): a layered, event-driven architecture proposed as the missing utility for everyday health, in the way that payments, networks, and power are utilities for their domains. PCU organizes continuous personal signals into semantically meaningful life events through a Personicle, estimates dynamic health state against personal baselines, reasons about cause and context, and routes guidance through an orchestrator that separates clinical decision logic, behavioral strategy selection, and natural-language expression. This separation lets large language models support reasoning and communication while keeping safety-critical clinical decisions grounded in validated evidence. We instantiate PCU for Type 2 Diabetes - turning CGM, meal, activity, medication, sleep, stress, and clinical data into glycemic events, individualized state estimates, causal explanations, and knowledge-grounded interventions. A day-in-the-life scenario shows the same infrastructure producing real-time nudges, weekly summaries, medication check-ins, silence, or deterministic safety alerts depending on context and risk. We close with how PCU generalizes to other chronic conditions and the governance questions any always-on personal health utility must address. The result is a blueprint that treats personalization not as a final messaging layer, but as an architectural property of everyday health guidance.

2606.14141 2026-06-15 cs.SD cs.AI cs.CL New submission

Spatio-Temporal Audio Language Modeling for Dynamic Sound Sources

Oh Hyun-Bin, Kazuki Shimada, Yuhta Takida, Kim Sung-Bin, Toshimitsu Uesaka, Takashi Shibuya, Kyeongyoon Lee, Tae-Hyun Oh, Yuki Mitsufuji

Affiliations: POSTECH; Sony AI; Sony Group Corporation; Sungkyunkwan University; KAIST

AI summary: Proposes the ST-AudioLM model, which uses a spatio-temporal audio encoder to jointly learn event semantics and source trajectories, improving the semantic-localization trade-off for question answering about dynamic sound sources on the ST-AudioQA benchmark.

Abstract

Sound events are entities with semantic identities, locations, and trajectories, but current audio-language models usually reason about clips as global event content. Conversely, sound event localization models track source directions over time but offer limited semantic coverage for language reasoning. To address this gap, we introduce ST-AudioQA, a spatio-temporal audio QA dataset and benchmark built from first-order ambisonic (FOA) renderings of static and moving sound sources. Each scene provides source identity, activity, direction, distance, and motion metadata, enabling dense trajectory supervision and questions about what is sounding, where it is, how it moves, and how sources relate. We further propose ST-Audio Encoder, a time-resolved FOA audio encoder that learns event semantics together with source trajectories, and ST-AudioLM, which connects the audio tokens from the encoder to an LLM for spatio-temporal audio QA. Experiments show that this representation improves the semantic-localization tradeoff and yields stronger reasoning performance than static spatial and localization-oriented baselines.

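2606.14139 2026-06-15 cs.LG New submission

Decoupled Latent Optimization of Diffusion Models for Full Waveform Inversion

Chen Min, Zheng Ma

Affiliations: School of Mathematical Sciences, Shanghai Jiao Tong University; CMA-Shanghai, Shanghai Jiao Tong University

AI summary: Proposes Decoupled Latent Optimization (DLO), which separates the physical variable from the latent variable through a quadratic-penalty objective, combines data-fidelity gradients with a diffusion prior, and outperforms classical regularizers and existing diffusion methods on the OpenFWI benchmark.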

Comments 35 pages, 14 figures

Abstract

Full waveform inversion (FWI) recovers subsurface velocity from seismic recordings by solving a severely ill-posed, nonconvex PDE-constrained optimization. Classical regularizers stabilize the inversion but fail to reproduce realistic geological structures; recent diffusion-prior methods improve realism at the cost of a fragile trade-off between data fidelity and prior consistency. We propose Decoupled Latent Optimization (DLO), which relaxes the standard latent-optimization formulation into a quadratic-penalty objective over an auxiliary physical variable and a latent variable. The data-fidelity gradient acts in physical space, the diffusion sampler contributes only through a decoded prior sample, and the standard smoothed-velocity initialization of classical FWI is preserved. On the OpenFWI benchmark, DLO outperforms classical regularizers and existing diffusion-based methods under clean, noisy, and missing-trace acquisitions. The prior, trained on 70×70 OpenFWI models, transfers directly to the Marmousi and Overthrust benchmarks, where DLO recovers intricate fault structures and remains robust to initialization smoothing and measurement noise.
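As a rough illustration of what such a relaxation could look like, consider the following notation, all of which is assumed here rather than taken from the paper: $F$ is the wave-equation forward operator, $d$ the recorded data, $D$ the decoder that maps a latent variable to a velocity model, $x$ the auxiliary physical variable, $z$ the latent variable, and $\rho > 0$ a penalty weight. A latent-optimization problem and one possible quadratic-penalty relaxation of it would then read:

```latex
\min_{z}\; \tfrac{1}{2}\,\lVert F(D(z)) - d \rVert_2^2
\quad\longrightarrow\quad
\min_{x,\,z}\; \tfrac{1}{2}\,\lVert F(x) - d \rVert_2^2
  \;+\; \tfrac{\rho}{2}\,\lVert x - D(z) \rVert_2^2
```

In this form the data-fidelity term depends only on $x$, so its gradient acts in physical space, while the prior enters only through the decoded sample $D(z)$ in the penalty term. This matches the decoupling the abstract describes, though the paper's exact objective may differ.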

2606.14130 2026-06-15 cs.LG cs.MA New submission

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

Omar Adalat, Edwin Hamel-De le Court, Francesco Belardinelli

Affiliations: Imperial College London; University of Manchester

AI summary: Proposes a decentralised shielding method that uses a contract mechanism to coordinate agents' local LTL safety obligations, guaranteeing global safety and optimising team reward without centralised runtime control.

Abstract

Safe coordination problems surface in multi-agent reinforcement learning when global safety cannot be enforced by any agent unilaterally: the admissibility of one agent's action may depend on the dynamics of other agents. Decentralised shields can enforce safety at runtime, but purely factorised permissions often exclude optimal team behaviour that is safe only through coordination. We study deterministic safety guarantees for agents trained and deployed under decentralised execution, recovering team-optimal safe behaviour without centralised runtime control. Agents have a shared global specification $\phi$ in the safety fragment of Linear Temporal Logic ($\mathsf{LTL}_{\mathsf{safe}}$), and select among tuples of local $\mathsf{LTL}_{\mathsf{safe}}$ obligations whose conjunction implies the global specification $\phi$. Each agent may rely on the other agents' local obligations as assumptions because the whole contract tuple is certified simultaneously and allows projection into local action masks. At learning time, a non-stationary multi-armed bandit chooses among a library of local $\mathsf{LTL}_{\mathsf{safe}}$ obligations to select the tuple that optimises team reward, all without forgoing end-to-end safety. We evaluate the approach across 6 environments and 15 algorithmic variants.
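The bandit step can be illustrated with a generic non-stationary bandit that selects among contract tuples. The abstract does not say which bandit algorithm is used, so the sketch below assumes an epsilon-greedy bandit with a constant step size (an exponential recency-weighted average), a standard choice when rewards drift over time. The contract tuples, the reward function, and the class name are hypothetical, and the certification of tuples against $\phi$ is assumed to have happened beforehand.

```python
import random
from typing import List, Tuple


class RecencyWeightedBandit:
    """Epsilon-greedy bandit whose value estimates favour recent rewards."""

    def __init__(self, n_arms: int, step_size: float = 0.2,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.values: List[float] = [0.0] * n_arms
        self.step_size = step_size
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Explore with probability epsilon, otherwise pick the best current estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # A constant step size forgets old rewards geometrically,
        # which lets the estimate track a drifting reward.
        self.values[arm] += self.step_size * (reward - self.values[arm])


# Hypothetical pre-certified contract tuples (one local obligation per agent).
contract_tuples: List[Tuple[str, str]] = [
    ("agent0 yields at crossing", "agent1 proceeds"),
    ("agent0 proceeds", "agent1 yields at crossing"),
]


def team_reward(arm: int, episode: int) -> float:
    """Toy drifting reward: tuple 0 is better early on, tuple 1 later."""
    return 1.0 if (arm == 0) == (episode < 200) else 0.0


bandit = RecencyWeightedBandit(n_arms=len(contract_tuples), epsilon=0.2)
for episode in range(400):
    arm = bandit.select()
    bandit.update(arm, team_reward(arm, episode))
```

Since every arm is a tuple that already implies the global specification, the bandit only trades off team reward; exploration never leaves the safe set.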

2606.14129 2026-06-15 cs.CV New submission

BoRAD: Bootstrap your Own Representations for Multi-class Anomaly Detection

Duy Hoang Khuong, Tri Nguyen Minh, Ngu Huynh Cong Viet

Affiliations: Department of Artificial Intelligence, FPT University; Department of IT, FPT University; Department of Computing Fundamental, FPT University

AI summary: Proposes the BoRAD framework, which uses prototype regularization to address the shortcut and mis-reconstruction problems of reconstruction models in multi-class anomaly detection, enabling a single model to detect anomalies across classes without labels.

Abstract

Reconstruction-based anomaly detection is attractive for industrial inspection, but scaling it from category-specific training to a one-for-all setting is challenging. A single model must reconstruct diverse normal appearances without copying abnormal details, which exposes two coupled failure modes: identical shortcut, where anomalies pass through the reconstruction path, and mis-reconstruction, where normal categories are confused with one another. We propose BoRAD, a label-free training framework that treats this as a representation-capacity allocation problem. BoRAD uses a shared learnable prototype bank to impose two complementary regularizers: spatial prototype alignment contracts local within-prototype variation to suppress anomaly copying, while prototype-relative global alignment preserves between-prototype structure and improves sensitivity to abnormal angular deviations. The prototype bank and prediction heads are used only during training; inference remains a standard teacher-student feature discrepancy pass, with no class labels, negative pairs, memory retrieval, or prototype lookup. BoRAD achieves competitive one-for-all anomaly detection performance, including 86.2% mAD on MVTec AD, 80.7% mAD on VisA and 73.1% mAD on Real-IAD. Diagnostic analyses further show reduced anomaly leakage, improved normal-category separability, and stronger anomaly-normal score separation.
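A teacher-student feature discrepancy pass can be sketched as follows. This assumes cosine distance between teacher and student features at each spatial location and a max over locations for the image-level score, which is a common choice in this family of methods but is not confirmed by the abstract. The feature grids and function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
FeatureGrid = List[List[Vector]]  # H x W grid of feature vectors


def cosine_distance(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """1 - cosine similarity; close to 0 when the two features point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm + eps)


def anomaly_map(teacher: FeatureGrid, student: FeatureGrid) -> List[List[float]]:
    """Per-location discrepancy between teacher features and student reconstructions."""
    return [
        [cosine_distance(t, s) for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher, student)
    ]


def image_score(amap: List[List[float]]) -> float:
    """Image-level anomaly score: the largest local discrepancy."""
    return max(max(row) for row in amap)


# Toy 2 x 2 feature grids: the student fails to reproduce the bottom-right feature.
teacher = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 0.0]]]
student = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 1.0]]]
amap = anomaly_map(teacher, student)
score = image_score(amap)
```

Nothing here consults a prototype bank or class label, in line with the abstract's statement that those components are used only during training.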

2606.14125 2026-06-15 cs.CV cs.AI New submission

Conditioning Matters: Stabilizing Inversion and Attention in Diffusion Image Editing

Zheyuan Zhan, Hongchen Li, Can Wang, Yinfei Ma, Mingzhen Huang, Ruoshi Bai, Jiawei Chen, Siwei Lyu, Defang Chen

Affiliations: State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; College of Computer Science, Zhejiang University; University at Buffalo, State University of New York

AI summary: Proposes the SimEdit framework, which refines the precision of textual conditioning and applies token-wise cross-branch attention control to improve diffusion inversion stability and editing fidelity, significantly outperforming prior methods on PIE-Bench.

Comments Accepted to ECML PKDD 2026 Research Track

Abstract

Inversion-based image editing offers flexible and training-free control but still struggles with inversion accuracy and the trade-off between editing fidelity and background preservation. While recent methods improve inversion formulations or attention interactions, the role of textual conditioning in shaping diffusion dynamics and editing behavior remains underexplored. We show both empirically and theoretically that the precision of textual conditioning influences inversion stability by modulating the geometry of the diffusion velocity field, while also affecting the consistency of cross-branch attention during editing. These effects directly impact background preservation and semantic fidelity. Building on this analysis, we propose SimEdit, a conditioning-aware framework with two complementary components: (a) conditioning refinement, which constructs conditioning signals with improved semantic precision and structural alignment to facilitate stable inversion and consistent attention manipulation, and (b) token-wise cross-branch attention control, which separates edit-relevant and structure-preserving components and modulates them asymmetrically during attention manipulation. Extensive experiments on PIE-Bench demonstrate that SimEdit consistently improves both inversion reconstruction quality and editing performance over previous attention-manipulation approaches. Our code is available at https://github.com/zju-pi/SimEdit.

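2606.14123 2026-06-15 cs.LG cs.AI New submission

Recovering Stranded Discrimination in Knowledge Tracing: Per-Item Bias Correction via Empirical-Bayes Shrinkage

Xiaoran Yan, Cheng Tang, Atsushi Shimada

Affiliations: Kyushu University

AI summary: Proposes SLC, which converts binary observations into Gaussian pseudo-observations via Laplace/IRLS, applies empirical-Bayes shrinkage through a Kalman smoother, and fits an offset-Platt link to correct per-item bias in knowledge-tracing models and recover stranded discrimination, improving AUC and NLL across multiple datasets and backbones.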

Comments 25 pages, 3 figures. Accepted at ECML PKDD 2026 (Research Track). Code: https://github.com/xiaoran-y/SLC

Abstract

Deployed knowledge-tracing models are typically frozen after training, yet systematic per-item logit bias arises from limited per-item expressivity in backbone architectures and from post-deployment shifts in item properties, degrading prediction quality. Global post-hoc calibrators such as Platt scaling, temperature scaling, and isotonic regression improve probability estimates but leave discriminative ability, as measured by AUC, unchanged. This AUC invariance is a structural consequence of monotone score-only transforms; recovering the stranded discrimination requires conditioning on item identity. We propose SLC (State-space Logit Correction), which converts binary observations to Gaussian pseudo-observations via Laplace/IRLS, applies empirical-Bayes shrinkage through a Kalman smoother, and fits an offset-Platt link. The state-space formulation also yields a detectability bound that characterizes the Bernoulli information floor, explaining why temporal tracking provides no benefit at current data densities. Across four datasets, five backbones, and three seeds, SLC improves AUC on all four datasets and NLL on three, with the advantage concentrating on sparse items. Cross-domain controls suggest that the same phenomenon can arise beyond education when the deployed backbone leaves entity-level bias.
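The AUC-invariance argument can be checked on a toy example. The sketch below uses hypothetical logits, labels, and item offsets chosen by hand; it does not implement SLC, which estimates the offsets with empirical-Bayes shrinkage. It only shows that a monotone score-only transform such as temperature scaling leaves AUC untouched, whereas an offset that depends on item identity can reorder predictions across items and therefore change AUC.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical frozen-backbone logits; item 0 is systematically under-predicted.
items: List[int] = [0, 0, 0, 1, 1, 1]
labels: List[int] = [1, 1, 0, 1, 0, 0]
logits: List[float] = [-1.0, -0.5, -2.0, 0.5, 0.0, -0.8]

base_auc = auc(logits, labels)

# Temperature scaling is monotone in the score alone, so the ranking cannot change.
temperature_auc = auc([z / 2.0 for z in logits], labels)

# A per-item offset (hand-picked here) conditions on item identity and can reorder pairs.
item_offset = {0: 1.5, 1: 0.0}
corrected_auc = auc([z + item_offset[i] for z, i in zip(logits, items)], labels)
```

Within each item the ordering is already correct in this toy data; the lost discrimination sits entirely in cross-item comparisons, which is the sense in which it is "stranded" until the correction conditions on the item.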

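2606.14122 2026-06-15 cs.CL New submission

Beyond Perplexity: UTF-8 Validity in Byte-aware Language Models

Sangwhan Moon, Daisuke Oba, Youmi Ma, Tatsuya Hiraoka, Naoaki Okazaki

Affiliations: University of Tokyo; National Institute of Information and Communications Technology

AI summary: Studies the problem of language models with byte-level tokenization generating invalid UTF-8 sequences; multilingual training experiments find that UTF-8 validity converges about twice as slowly as perplexity and that rare characters show higher structural validity, indicating that reliable UTF-8 generation is a capability that needs to be evaluated separately.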
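For readers unfamiliar with what an invalid UTF-8 sequence looks like, here is a minimal sketch of a validity check on generated byte strings. The sample outputs and the `validity_rate` helper are hypothetical, and the paper's actual metric may be defined differently, for example per character rather than per sample. The check relies only on Python's strict UTF-8 decoder.

```python
from typing import Iterable


def is_valid_utf8(data: bytes) -> bool:
    """True if the byte string decodes under strict UTF-8 rules."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


def validity_rate(samples: Iterable[bytes]) -> float:
    """Fraction of generated samples that are well-formed UTF-8."""
    samples = list(samples)
    return sum(is_valid_utf8(s) for s in samples) / len(samples)


# Hypothetical model outputs at the byte level.
generated = [
    "héllo".encode("utf-8"),  # valid: includes a 2-byte character
    b"\xe4\xb8\xad",          # valid: one complete 3-byte character
    b"\xe4\xb8",              # invalid: 3-byte character cut off after 2 bytes
    b"\x80abc",               # invalid: continuation byte with no lead byte
]
rate = validity_rate(generated)
```

A model can assign high probability to the right next byte on average, and so score well on perplexity, while still emitting truncated or orphaned multi-byte sequences like the last two samples.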

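2606.14119 2026-06-15 cs.AI New submission

FactoryLLM: A Safe and Open-Source AI Playground for Evaluating LLMs in Smart Factories

Yash Pulse, Yong-Bin Kang, Abhik Banerjee, Abdur Forkan, Prem Prakash Jayaraman

Affiliations: GitHub; arXiv

AI summary: Proposes FactoryLLM, a safe and open-source AI playground that evaluates RAG-based large language models through multi-machine document analysis, using a dual evaluation setup with RAGAS and NVIDIA LLM-as-a-Judge metrics; a case study validates its effectiveness for cross-machine document reasoning.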

Comments 6 pages, 3 figures, IEEE INDIN 2026

Abstract

Fault diagnostics and recovery in smart factories are challenging because critical information is dispersed across the manuals of multiple machines that are interconnected through the manufacturing process. Large Language Models (LLMs) offer a promising approach. In this paper, we propose FactoryLLM, a safe and open-source AI playground designed for evaluating different LLM-based retrieval-augmented generation (RAG) models by analysing documents from multiple machines across the manufacturing process. FactoryLLM enables the user to configure the LLM and assess performance when reasoning over multiple documents, through a dual evaluation setup using both RAGAS and NVIDIA's LLM-as-a-Judge metrics. FactoryLLM is safe because it allows users to run local or open-source LLMs without sharing sensitive industrial data, providing a controlled environment for experimentation. We demonstrate the efficacy of FactoryLLM through a case study that involves an Autonomous Intelligent Vehicle and its Mobile Planner software, evaluating three LLMs across 30 maintenance queries derived from approximately 600 pages of cross-machine documentation. The results suggest that FactoryLLM is effective in cross-machine document reasoning: every model achieved a groundedness score above 0.88. The full code and documentation for the community to test FactoryLLM on their own manufacturing-specific scenarios are publicly available.

2606.14116 2026-06-15 cs.LG stat.ME new submission

DTVEM-RE: A Hierarchical Random-Effects Extension of the Differential Time-Varying Effect Model for Person-Specific Multi-Lag Estimation in Intensive Longitudinal Data

Amartya Bhattacharya

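Affiliations: Geisel School of Medicine, Dartmouth College

AI summary: To address DTVEM's assumption that everyone shares the same lag structure, proposes the DTVEM-RE extension, which gives each person their own lag coefficients via a hierarchical Bayesian VAR and a continuous-time OU model; simulation and empirical results show that it recovers between-person variation and improves prediction.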

Abstract

The Differential Time-Varying Effect Model (DTVEM) of Jacobson et al. (2019) is a popular tool for finding the best time lag in intensive longitudinal data, but it assumes everyone shares the same lag structure. The original authors named fixing this as future work, and the assumption clashes with the premise of modern clinical research, which is that people differ. We present DTVEM-RE, an extension that lets each person have their own lag coefficients, with two versions of the confirmatory step: a discrete-time hierarchical Bayesian VAR in Stan, which pools across people and gives calibrated uncertainty, and a continuous-time per-person Ornstein-Uhlenbeck model in ctsem, which handles unevenly spaced beeps directly. We report four results. A simulation shows the Bayesian version recovers the between-person spread tau_a with bias below 0.01 and coverage of 90 to 93 percent. On the Fisher et al. (2017) EMA dataset (N=40), person-specific lag-1 effects vary by an order of magnitude across three mood items, the Bayesian and GAMM estimates agree closely (r=0.87 to 0.92), and DTVEM-RE gives the best one-step-ahead prediction among four discrete-time methods. A multi-lag version shows all nine tau_k values have credible intervals excluding zero, and the lag where people differ most changes across items, something lag-1-only methods like mlVAR cannot detect. Finally, the two versions agree almost exactly on person-specific lag-1 estimates (r >= 0.995), differing only as shrinkage predicts. DTVEM-RE is, to our knowledge, the first person-specific implementation of DTVEM-style lag detection, and it contains standard DTVEM as a special case.
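
The remark that the two versions differ "only as shrinkage predicts" can be illustrated with a minimal sketch. This is not the authors' Stan or ctsem model: it assumes a plain per-person lag-1 regression followed by normal-normal empirical-Bayes shrinkage, and the names `lag1_ols`, `shrink`, `se2`, and `tau2` are hypothetical.

```python
import numpy as np

def lag1_ols(y):
    """No-intercept least-squares slope of y[t] on y[t-1] after mean-centring."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()              # remove the person's mean level
    prev, curr = y[:-1], y[1:]
    return float(prev @ curr / (prev @ prev))

def shrink(raw, se2, tau2):
    """Pull each raw estimate toward the group mean (illustrative only).

    raw  : per-person lag-1 estimates
    se2  : sampling variance of each estimate
    tau2 : assumed between-person variance
    """
    raw = np.asarray(raw, dtype=float)
    se2 = np.asarray(se2, dtype=float)
    mu = raw.mean()
    weight = tau2 / (tau2 + se2)  # 1 = trust the person, 0 = trust the group
    return mu + weight * (raw - mu)

raw = [0.1, 0.5, 0.9]
# Equal sampling and between-person variance: each estimate moves halfway
# toward the group mean of 0.5.
pooled = shrink(raw, se2=[0.04, 0.04, 0.04], tau2=0.04)
print(pooled)
```

Under these assumptions, people with noisier series (larger `se2`) are pulled harder toward the group mean, which is the pattern a pooled model is expected to show relative to per-person fits.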

2606.14108 2026-06-15 cs.LG cs.AI new submission

Numbers Already Carry Their Own Embeddings

Suhyun Bae, Donghun Lee

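Affiliations: Department of Mathematics, Korea University

AI summary: Proposes AOE, a training-free embedding method that preserves both a number's real value and its p-adic modular signature, works plug-and-play, and is the first to reach perfect accuracy on an algebraic-combinatorial benchmark.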

Comments Presented at the MATH-AI Workshop at NeurIPS 2025

2606.14094 2026-06-15 cs.CV cs.AI new submission

FEMOT: Multi-Object Tracking using Frame and Event Cameras

Shiao Wang, Xiao Wang, Chao Wang, Yitao Li, Menghao Liu, Bo Jiang, Yaowei Wang, Yonghong Tian, Jin Tang

Affiliations: School of Computer Science and Technology, Anhui University; Peng Cheng Laboratory; National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University; School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University; Harbin Institute of Technology, Shenzhen

AI summary: Proposes FEMOT, a large-scale RGB-event multi-object tracking dataset, and FEMOTR, a multimodal tracking framework that decouples features and fuses them in the frequency domain, exploiting complementary information for robust tracking.

Abstract

Conventional RGB cameras have been widely used in multi-object tracking due to their ability to capture rich appearance and semantic information. However, their performance is often degraded under complex real-world challenges, such as motion blur, low illumination, and overexposure. Bio-inspired event cameras offer high temporal resolution and high dynamic range, providing complementary cues under extreme scenarios. Nevertheless, RGB-event multi-object tracking remains underexplored due to the lack of large-scale and well-annotated datasets. To address this issue, we propose FEMOT, a large-scale RGB-event multi-object tracking dataset that covers diverse real-world scenarios and 14 challenging attributes. With both RGB and event data as well as high-quality annotations, FEMOT provides a reliable platform for systematically evaluating RGB-event multi-object tracking methods. Based on FEMOT, we retrain and evaluate over ten strong trackers, thereby establishing a comprehensive benchmark for future research. Furthermore, we propose FEMOTR, a multimodal tracking framework that decouples RGB and event features and fuses them in the frequency domain, thereby effectively exploiting their complementary characteristics for robust object localization and identity association. Extensive experiments on FEMOT and DSEC-MOT datasets demonstrate the effectiveness of the proposed method. The source code and benchmark dataset have been released on https://github.com/Event-AHU/FEMOT.

2606.14089 2026-06-15 cs.RO new submission

A Modular Dual-Arm Apple Harvesting Robot with Enhanced Field Performance

Keyi Zhu, Kyle Lammers, Chaaran Arunachalam, Kaixiang Zhang, Renfu Lu, Zhaojian Li

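Affiliations: Michigan State University; United States Department of Agriculture Agricultural Research Service

AI summary: Proposes a modular dual-arm apple harvesting robot whose vertically stacked arms work the upper and lower zones of a single tree simultaneously. With five advances, including foundation-model perception, 7th-order jerk-bounded trajectory generation, and a linear sweep harvesting strategy, it achieves an 80.0% picking success rate and a 7.53 s mean per-arm cycle time in commercial orchards, with 91.2% of fruit retaining the Extra Fancy grade.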

Abstract

Robotic apple harvesting offers a promising solution to labor shortages in commercial orchards, but low throughput and poor performance in orchard environments hinder its commercial adoption. This paper presents a modular dual-arm apple harvesting robot that uses vertically stacked arms to enable simultaneous operation in the upper and lower zones of a single tree, simplifying platform positioning from multi-tree lateral repositioning to single-tree stops. Compared to our prior horizontal dual-arm system, the platform integrates 5 advances: (1) a foundation-model-based perception pipeline combining Grounding-DINO and EfficientViT-SAM for robust fruit localization in unstructured outdoor environments; (2) 7th-order jerk-bounded trajectory generation paired with a Control Barrier Function safety filter to achieve fast yet safe arm motions; (3) a linear sweep harvesting strategy with a 10 cm approach buffer and rotational detachment that improves picking reliability; (4) a temporal-logic-based dual-arm coordination policy with vision-arm async scheduling that maximizes usage of a shared vacuum source; and (5) field validation in 2 commercial orchards covering different apple varieties and tree architectures during the 2025 harvest season. Across the 1738 arm cycles collected in these field trials, the system achieved an 80.0% per-attempt success rate and a mean per-arm cycle time of 7.53 s. Fruit damage assessments confirmed that 91.2% of robotically harvested fruit retained the highest USDA grade (Extra Fancy), with bruise rates between 2.4% and 4.9%. With further improvements in the picking cycle time and handling of heavy foliage occlusions, this new modular robot design holds promise for commercial harvesting of apples.
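
The abstract does not spell out the 7th-order trajectory, so the following is only a sketch of one common choice: a 7th-degree polynomial time scaling whose velocity, acceleration, and jerk all vanish at both ends of a move. The function names and the example joint values are hypothetical, and the Control Barrier Function filter and any explicit jerk limit are not modelled.

```python
import numpy as np

def s7(tau):
    """7th-degree time scaling that rises from 0 to 1 for tau in [0, 1]."""
    tau = np.asarray(tau, dtype=float)
    return 35 * tau**4 - 84 * tau**5 + 70 * tau**6 - 20 * tau**7

def s7_rate(tau):
    """First derivative of s7. The tau**3 * (1 - tau)**3 factor makes
    velocity, acceleration, and jerk zero at tau = 0 and tau = 1."""
    tau = np.asarray(tau, dtype=float)
    return 140 * tau**3 * (1 - tau) ** 3

def rest_to_rest(q0, q1, duration, n=101):
    """Sample a single-joint move from q0 to q1 that starts and ends at rest."""
    t = np.linspace(0.0, duration, n)
    tau = t / duration
    q = q0 + (q1 - q0) * s7(tau)
    qd = (q1 - q0) * s7_rate(tau) / duration
    return t, q, qd

t, q, qd = rest_to_rest(q0=0.0, q1=2.0, duration=4.0)
print(q[0], q[-1], qd.max())
```

With this profile the peak velocity is 2.1875 times the average velocity, so shortening `duration` trades cycle time against the actuator's velocity and jerk limits.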

2606.14086 2026-06-15 cs.SD new submission

Explainable and Trustworthy Speech Emotion Recognition Using Confidence Score and Reinforcement Learning Rectified Speech Emotion Descriptors

Youjun Chen, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Shujie Hu, Huimeng Wang, Haoning Xu, Chengxi Deng, Bowen Zhang, Xunying Liu

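Affiliations: The Chinese University of Hong Kong; Institute of Software, Chinese Academy of Sciences; National Research Council Canada; Tsinghua University

AI summary: Proposes an online method that rectifies speech emotion descriptors using confidence scores and reinforcement learning to post-train speech emotion recognition systems, yielding absolute performance gains of 2.9% on IEMOCAP and 3.3% on MELD.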

Comments Accepted by Interspeech 2026

2606.14084 2026-06-15 cs.RO new submission

Self-Improving VLA Policies: Selected Diffusion Noise for Spurious-Robust Action Smoothing

Duc Minh Nguyen, Bao-Ngoc Dao, Tung M. Luu, Binh Gia Nguyen, Vinh Tong, Anji Liu, Vu N. Duong, Dung D. Le, Daniel Sonntag, Trung Le, Ngan Le, Jan Peter, An Thai Le, Minh Nhat Vu, Mathias Niepert, Khoa D. Doan, Duy M. H. Nguyen, Vien Anh Ngo

Affiliations: Center for AI Research, VinUniversity; VinRobotics; KAIST; University of Stuttgart; IMPRS-IS; National University of Singapore; DFKI; University of Oldenburg; Monash University; University of Arkansas; TU Darmstadt

AI summary: Proposes a training-free selected diffusion noise method that dynamically samples noise vectors to improve the robustness and action smoothness of vision-language-action policies, raising success rates by 8% in simulation and 10% in real-world settings.

Abstract

Diffusion-based Vision-Language-Action (VLA) policies enable strong generalization in robotic manipulation, but remain sensitive to spurious visual correlations and noisy action generation, leading to brittle behavior under perturbations. We introduce Selected Diffusion Noise (SDN), a simple, training-free test-time method that improves both robustness and success rate by leveraging the diffusion noise space as a controllable degree of freedom. SDN dynamically samples noise vectors that are maximally separated from a reference set to mitigate reliance on spurious cues, while selecting candidates that yield more coherent action trajectories. This dual objective encourages stable behavior even under object-masked observations and reduces action jitter without modifying model parameters. We evaluate SDN on two simulation benchmarks (Google Robot, Widow-X) and two real-world robotic datasets across multiple VLA policies, including pi_0, Groot-N1.5, and Groot-N1.6. SDN consistently improves success rates by +8% in simulation and +10% in real-world settings, while producing smoother and more stable actions. Our results highlight that diffusion noise selection can serve as an effective and general mechanism for enhancing VLA policies at test time.
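
The dual objective described above (noise that is far from a reference set, yet yields a coherent action) can be sketched as a simple scoring rule. The abstract gives no formula, so the score, the weight `alpha`, and the `toy_policy` stand-in below are all assumptions made for illustration, not the SDN algorithm itself.

```python
import numpy as np

def select_noise(candidates, reference, policy, prev_action, alpha=1.0):
    """Return the index of the candidate noise vector with the best score.

    score = distance to the nearest reference vector
            - alpha * size of the jump from the previous action
    """
    cands = np.asarray(candidates, dtype=float)   # shape (K, D)
    ref = np.asarray(reference, dtype=float)      # shape (R, D)
    # Separation: distance from each candidate to its closest reference vector.
    gaps = np.linalg.norm(cands[:, None, :] - ref[None, :, :], axis=-1)
    separation = gaps.min(axis=1)
    # Coherence: how far the resulting action would move from the last one.
    jump = np.array([np.linalg.norm(policy(z) - prev_action) for z in cands])
    return int(np.argmax(separation - alpha * jump))

candidates = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
reference = [[0.0, 0.0]]
toy_policy = lambda z: 0.1 * z     # hypothetical stand-in for a diffusion policy
best = select_noise(candidates, reference, toy_policy, np.zeros(2), alpha=1.0)
print(best)
```

Raising `alpha` in this sketch favours smoothness over separation; with a large enough value the rule falls back to the candidate that changes the action least.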

2606.14083 2026-06-15 cs.RO new submission

The N2D Haptic Glove: A Multi-Finger Glove for 2D Directional Force Feedback for Contact Rich Manipulation

Yao-Ting Huang, Jake Honma, Omar Hernandez, Logan Li, Kaitlin Calimbahin, Bryce Hackel, Michael C. Yip

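Affiliations: University of California San Diego

AI summary: Proposes the N2D Haptic Glove, which renders two-dimensional flexion-extension force feedback at the fingertips through capstan drives, significantly reducing contact force error and improving consistency in teleoperation.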

Abstract

Humans rely on directional fingertip forces to probe and regulate contact during manipulation, yet most wearable haptic gloves render only vibration or single-axis force, leaving force direction ambiguous. Without directional cues, users must infer contact force from vision alone, often leading to over-pressing, inconsistent control, and reduced precision in robotic teleoperation. We present the N2D Haptic Glove, a multi-finger wearable device that renders planar flexion-extension fingertip forces using capstan-drive transmissions for high-transparency force feedback. Through benchtop validations and a user study involving haptic teleoperation of a robotic arm and hand, we demonstrate that compared to visual-only and single-axis haptic baselines, planar fingertip feedback significantly reduces contact force error during precise manipulation, improves trial-to-trial consistency, and enhances overall user experience in axial probing tasks. These findings establish the N2D Haptic Glove and directional finger-based haptic devices as a promising modality for contact-rich teleoperation, immersive virtual reality simulations, and robot learning from demonstrations. The N2D Haptic Glove's hardware and software system will be fully open-sourced at https://ucsdarclab.github.io/n2d-glove/.

2606.14079 2026-06-15 cs.LG new submission

Deep Spectral Learning of Embedded Latent Transfer Operators for Stochastic Dynamical Systems

Ryogo Tanaka, Yoshinobu Kawahara

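Affiliations: Graduate School of Information Science and Technology, The University of Osaka; Center for Advanced Intelligence Project, RIKEN

AI summary: Proposes a Deep Spectral Encoder in which learnable nonlinear feature maps define Markovian latent states, and functional canonical correlation analysis with Galerkin projection estimates the transfer and observation operators; it supports Bayesian filtering and Koopman spectral decomposition and performs stably and well under noise and partial observability.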

Comments Accepted at the 42nd Conference on Uncertainty in Artificial Intelligence (UAI 2026)

Abstract

We propose a spectral learning method for stochastic nonlinear dynamical systems represented with embedded latent transfer operators in deep feature spaces. We instantiate the method as Deep Spectral Encoder (DSE), an operator-based latent state-space model in which a time-invariant neural encoder implements learnable nonlinear feature maps from observations, and these features define Markovian latent states whose temporal evolution and observation mapping are described by the transfer and observation operators, respectively. Functional canonical correlation analysis in a learnable Galerkin-projected feature space provides state coordinates from past and future observations, and the two linear operators are estimated on the state coordinates as ridge-regularized closed-form solutions that coincide with Galerkin projections of the associated covariance operators. On this representation, we generalize sequential Bayesian filtering and Koopman spectral mode decomposition in feature space. Experiments on several scenarios show stable and superior performance compared with sequential Bayesian filtering and dynamic mode decomposition baselines, even under noise and partial observability.
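
The "ridge-regularized closed-form solutions" mentioned above follow the standard ridge regression formula for a linear operator. The sketch below shows that generic formula on toy latent coordinates; the neural encoder, the functional CCA step, and the observation operator are omitted, and the names `ridge_operator`, `X`, `Y`, and `lam` are assumptions rather than the paper's notation.

```python
import numpy as np

def ridge_operator(X, Y, lam):
    """Minimise ||Y - A X||_F^2 + lam * ||A||_F^2 over A.

    X, Y : arrays of shape (d, n) and (m, n); columns are samples.
    Closed form: A = Y X^T (X X^T + lam I)^-1.
    """
    gram = X @ X.T + lam * np.eye(X.shape[0])
    # gram is symmetric, so solving gram @ A.T = X @ Y.T yields A
    # without forming an explicit inverse.
    return np.linalg.solve(gram, X @ Y.T).T

# Toy latent coordinates: Y holds the one-step-ahead states of X.
X = np.eye(2)
Y = np.array([[0.9, 0.0], [0.0, 0.5]])
A = ridge_operator(X, Y, lam=0.0)
eigenvalues = np.linalg.eigvals(A)   # spectrum of the estimated operator
print(A)
print(eigenvalues)
```

The eigenvalues of the estimated operator are what a Koopman-style mode decomposition would inspect; a positive `lam` shrinks the operator toward zero and stabilises the solve when the Gram matrix is ill-conditioned.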

2606.14078 2026-06-15 cs.LG cs.AI new submission

Rethinking Backdoor Adversarial Unlearning through the Lens of Catastrophic Forgetting in Continual Learning

Zhenqian Zhu, Yamin Hu, Yujiang Liu, Luping Wei, Wenbo Hou, Bin Li, Haodong Li, Wenjian Luo

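Affiliations: Harbin Institute of Technology, Shenzhen; Shenzhen Key Laboratory of Media Security, Shenzhen University

AI summary: Models backdoor learning and unlearning as a three-stage process from a continual learning perspective, derives necessary conditions for complete backdoor unlearning from the mechanism of catastrophic forgetting, and proposes Blind Inversion-Backdoor Adversarial Unlearning (BI-BAU), which optimizes a maximum a posteriori objective with an expectation-maximization algorithm to eliminate backdoor effects.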

Comments Accepted by ACM CCS 2026

Abstract

Existing studies reveal that current backdoor defenses exhibit limited robustness and often fail against specific types of attacks. More concerningly, prevailing safety tuning strategies tend to provide only superficial safety protection, as they fall short of completely eliminating the backdoor effects. In this work, we present a novel formulation of backdoor learning and unlearning as a sequential, three-stage process from a continual learning perspective. Within this framework, we formally define complete backdoor unlearning and further derive the necessary conditions for achieving it based on the mechanism of catastrophic forgetting. Guided by these insights, we propose Blind Inversion-Backdoor Adversarial Unlearning (BI-BAU), which formulates the generation of adversarial examples satisfying the unlearning conditions as a blind inversion problem. We solve this by integrating the bi-level optimization process of adversarial training into an Expectation-Maximization (EM) algorithm framework to optimize the maximum a posteriori (MAP) objective. Furthermore, BI-BAU is extended to untargeted adversarial scenarios with unknown target classes, as well as to multi-modal contrastive learning tasks, enhancing its applicability to real-world deployment scenarios where pre-trained models may be compromised. Extensive experiments demonstrate that our method exhibits general applicability across a wide spectrum of backdoor attacks and can effectively and thoroughly eliminate the backdoor effects from a backdoor model.

2606.14072 2026-06-15 cs.CV cs.CL new submission

Diffusion-Refined Segmentation and Vision-Language Interpretation for Pediatric Brain Tumor MRI

Wentao Ke, Jianche Liu

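Affiliations: Department of Mechanical Engineering, Stanford University; School of Medicine, Stanford University

AI summary: Proposes a two-stage framework: coarse segmentation with Swin-UNETR, boundary refinement with a conditional diffusion model, and finally structured report generation with a multimodal language model, improving the accuracy and interpretability of pediatric brain tumor segmentation.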

Abstract

Accurate pediatric brain tumor segmentation remains challenging due to limited annotated data, heterogeneous imaging phenotypes, diffuse tumor boundaries, and class imbalance across tumor subregions. Here, we present a two-stage deep learning framework for improving multi-modal pediatric brain MRI segmentation and clinical interpretation. First, we evaluate 3D Res U-Net and Swin-UNETR baselines on BraTS-PEDs MRI scans, using four co-registered modalities to predict tumor core, whole tumor, and enhancing tumor regions. Second, we introduce diffusion-based refinement models conditioned on coarse Swin-UNETR predictions, including a 3D DDPM refiner and MedSegDiff. Conditioning substantially improves diffusion stability and performance, particularly for enhancing tumor boundary segmentation. Conditioned MedSegDiff achieves the strongest boundary agreement with the lowest HD95. Finally, predicted tumor volumes and representative segmentation overlays are integrated with a multimodal language model to generate structured radiology-style reports. Together, our results suggest that coarse-to-refined diffusion segmentation can improve pediatric tumor boundary delineation and support end-to-end interpretable AI-assisted neuro-oncology workflows.
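
HD95, the boundary metric cited above, is the 95th-percentile Hausdorff distance between predicted and reference boundaries. A minimal sketch follows; definitions vary slightly between toolkits, and this version assumes the maximum of the two directed 95th percentiles over explicit boundary point lists. The function name and inputs are hypothetical, not taken from the paper's code.

```python
import numpy as np

def hd95(points_a, points_b):
    """95th-percentile symmetric Hausdorff distance between two point sets.

    points_a, points_b : arrays of shape (N, 3) and (M, 3) holding boundary
    voxel coordinates (scaled to millimetres beforehand, if desired).
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    a_to_b = dists.min(axis=1)   # each point of A to its nearest point of B
    b_to_a = dists.min(axis=0)   # each point of B to its nearest point of A
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))

pred = [[0, 0, 0], [1, 0, 0]]
truth = [[0, 0, 0], [1, 0, 0]]
print(hd95(pred, truth))               # identical boundaries give 0.0
print(hd95([[0, 0, 0]], [[3, 4, 0]]))  # single points 5 units apart give 5.0
```

Lower is better: using the 95th percentile instead of the maximum keeps a few stray voxels from dominating the score, which is why the metric is sensitive to boundary quality rather than to isolated outliers.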

2606.14071 2026-06-15 cs.CV new submission

ShearFuse-UNet: Hadamard, DCT, and Shearlet Transform Fusion for Next-Day Wildfire Spread Prediction

Ene Meco, Yingyi Luo, Emadeldeen Hamdan, Adam Watts, Ahmet Enis Cetin

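Affiliations: University of Illinois Chicago; US Forest Service, Pacific Wildland Fire Science Laboratory

AI summary: Proposes ShearFuse-UNet, a lightweight deep learning model that fuses WHT, DCT, and Shearlet transform branches in the U-Net encoder to predict next-day wildfire spread from multi-modal satellite data, reaching an F1 score of 0.596 with 267k parameters and outperforming a ResNet18 U-Net.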

Abstract

We propose ShearFuse-UNet, a lightweight and computationally efficient deep learning model for next-day wildfire spread prediction from multi-modal satellite data. The model integrates three complementary transform-domain branches inside each encoder block of a U-Net backbone: a 2D Fast Walsh-Hadamard Transform (WHT) branch, a 2D Discrete Cosine Transform (DCT) branch, and a cone-adapted digital Shearlet residual branch. The WHT and DCT branches establish orthogonal latent spaces with learnable spectral scaling and fixed soft-thresholding, while the Shearlet branch provides anisotropic, multi-directional feature decomposition that explicitly encodes the elongated edge structures characteristic of fire fronts. A learned SpectralFusion gate adaptively combines the WHT and DCT responses, and the Shearlet reconstruction is added as a residual. This three-branch design bears a loose structural analogy to transformer self-attention: the WHT and DCT branches provide complementary spectral representations that are adaptively fused, while the Shearlet branch contributes directional content through a residual pathway. Unlike self-attention, the proposed design relies on fixed mathematical transforms rather than learned projection operators, reducing parameter count and computational cost. Evaluated on the WildfireSpreadTS dataset, ShearFuse-UNet achieves an F1 score of 0.596 with only 267k parameters, outperforming a ResNet18-based U-Net (14M parameters, F1 = 0.589) and demonstrating a highly favorable accuracy-efficiency trade-off. Results on the Google Next-Day Wildfire Spread dataset further validate these findings across a different benchmark.
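
The transform, scale, threshold, and invert pattern of the WHT branch can be sketched in one dimension. This is an illustration under stated assumptions, not the paper's layer: the model works on 2D feature maps inside a U-Net, whereas the code below handles a single length-2^k vector, and `wht_branch`, `scale`, and `t` are hypothetical names for the learnable spectral scaling and the fixed threshold.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform of a length-2^k vector."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sums
            x[i + h:i + 2 * h] = a - b  # butterfly: differences
        h *= 2
    return x / np.sqrt(n)  # with this scaling the transform is its own inverse

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t and zero out the small ones."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def wht_branch(x, scale, t):
    """Transform, rescale per coefficient, threshold, and transform back."""
    return fwht(soft_threshold(scale * fwht(x), t))

x = np.array([1.0, 1.0, 1.0, 1.0])
print(fwht(x))
print(wht_branch(x, scale=np.ones(4), t=0.5))
```

Because the transform itself is fixed, the only trainable quantity in this sketch is the `scale` vector, which is consistent with the abstract's point that fixed transforms keep the parameter count low.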

2606.14070 2026-06-15 cs.RO new submission

Development of a 3 in Sewer Pipe Inspection Robot with an Articulated Differential Mechanism using X-shaped Linkages

Shoya Umemura, Ryota Taniguchi, Atsushi Kakogawa

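Affiliations: Ritsumeikan University

AI summary: Proposes an improved 3 in sewer pipe inspection robot whose articulated differential mechanism increases traction and obstacle-crossing ability, together with a cable slack control method based on drive-wheel current sensing; experiments verify its obstacle-crossing performance.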

Comments The 23rd International Conference on Ubiquitous Robots (UR 2026), 15-18 July, Osaka Ibaraki Campus, Ritsumeikan University, Ibaraki, Osaka, Japan

2606.14068 2026-06-15 cs.CL new submission

Harsher on Male? Evaluating LLMs on Gender-Asymmetric Moral Framing Across Diverse Conflict Scenarios

Guangzong Si, Dong Wang, Zhenhao Li, Yifan Yu, Panwang Pan, Wentao Zhu

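Affiliations: University of Science and Technology of China; Eastern Institute of Technology, Ningbo

AI summary: Proposes the GAMA-Bench benchmark, which uses gender-mirrored scenarios to evaluate LLM responses to identical negative behavior, and finds that models lean toward punishment and blame for men while emphasizing empathy and therapy for women.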

Comments Under review

Abstract

Existing studies on gender bias in LLMs have largely focused on stereotypes, occupational associations, or explicit harmful outputs. In this work, we ask whether LLMs apply consistent response standards to the same negative behavior under matched male-actor and female-actor conditions. We introduce GAMA-Bench, a gender-mirrored benchmark of 1,298 scenarios covering intimate relationship and public social conflicts. It constructs gender-neutral misconduct templates through controlled grids and cross-model review, then compiles them into paired first-person prompts with matched actor-gender and role-reference variations. We further design a structured response-framing protocol to measure how models allocate punishment, empathy, escalation, instruction, and blame. Experiments on 10 representative LLMs reveal a consistent male-disadvantaging asymmetry: male actors receive more punitive, escalatory, and blame-centered framing, whereas female actors receive more therapeutic and empathy-oriented framing for the same misconduct. Further analyses show that this pattern persists across model families, scenario tracks, model scale, and explicit thinking-style reasoning. The official code is available at https://github.com/xufeiqiong/GAMA-Bench.

2606.14063 2026-06-15 cs.RO cs.SY eess.SY new submission

Semidefinite Relaxations for Collision-Free Motion Planning

Bernhard Paus Graesdal, Alexandre Amice, Pablo A. Parrilo, Russ Tedrake

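Affiliations: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

AI summary: Studies collision-free motion planning for a point robot moving through spherical obstacles, proposes a semidefinite relaxation, analyzes its tightness theoretically, and exploits symmetry to reduce computational complexity; the relaxation is 10 to 100 times faster than direct nonlinear programming.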

Abstract

We study semidefinite relaxations for collision-free motion planning. We focus on a point robot moving from start to goal through spherical obstacles in $\mathbb{R}^n$, subject to path continuity constraints and squared derivative costs, a setting that is conceptually simple yet captures the hardness of collision-free motion planning. We formulate this problem exactly as a nonconvex problem over polynomial curves, and present a natural semidefinite relaxation. We contribute two key theoretical insights; to our knowledge, this is the first theoretical analysis of semidefinite relaxations for collision-free motion planning. First, we show that solving the convex relaxation is equivalent to solving, to global optimality, a related motion planning problem in a potentially higher-dimensional space. This geometric interpretation yields necessary and sufficient conditions for tightness, and a clear intuition for when the relaxation is loose. Second, we show that the relaxation admits a symmetry reduction that makes it significantly smaller than one might expect, with positive semidefinite cone sizes that scale linearly with the polynomial degree and are independent of the ambient dimension. The resulting relaxation is 10 to 100 times faster than direct nonlinear programming transcriptions solved with SNOPT and IPOPT, exhibits significantly lower variance in solve times, and reliably finds a locally optimal path for the original problem. We demonstrate its effectiveness as a convex steering function in an RRT planner for minimum-snap quadrotor planning with $C^4$ continuous trajectories.
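
The nonconvex constraint at the heart of this setting is easy to state for one polynomial segment $p(t)$ and one sphere with centre $c$ and radius $r$: the path is collision-free when $g(t) = \|p(t) - c\|^2 - r^2 \ge 0$ on $[0, 1]$. The sketch below only builds and checks $g$; it does not implement the semidefinite relaxation, and the function names and the single-segment parameterisation on $[0, 1]$ are assumptions.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def clearance_poly(path_coeffs, center, radius):
    """Coefficients (ascending powers of t) of g(t) = ||p(t) - c||^2 - r^2.

    path_coeffs : one row of polynomial coefficients per coordinate of p(t).
    """
    g = np.array([-float(radius) ** 2])
    for row, c in zip(np.asarray(path_coeffs, dtype=float), center):
        shifted = row.copy()
        shifted[0] -= c                      # p_i(t) - c_i
        g = P.polyadd(g, P.polymul(shifted, shifted))
    return g

def min_clearance(g):
    """Minimum of g on [0, 1], checked at the endpoints and stationary points."""
    ts = [0.0, 1.0]
    for r in P.polyroots(P.polyder(g)):
        if abs(r.imag) < 1e-9 and 0.0 <= r.real <= 1.0:
            ts.append(float(r.real))
    return min(float(P.polyval(t, g)) for t in ts)

# Straight line from (0, 0) to (2, 0): x(t) = 2t, y(t) = 0.
path = [[0.0, 2.0], [0.0, 0.0]]
g_hit = clearance_poly(path, center=(1.0, 0.0), radius=0.5)
g_free = clearance_poly(path, center=(1.0, 1.0), radius=0.5)
print(min_clearance(g_hit), min_clearance(g_free))
```

A negative minimum means the segment passes through the obstacle. Requiring $g(t) \ge 0$ for every obstacle is what makes the planning problem nonconvex in the path coefficients, which is the difficulty the relaxation addresses.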

2606.14060 2026-06-15 cs.LG cs.CL new submission

Non-Parametric Machine Text Detection via Multi-View Gaussian Processes

Aleem Khan, Nicholas Andrews

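Affiliations: Johns Hopkins University

AI summary: Proposes a multi-view, non-parametric detection framework that aggregates complementary feature views through a Gaussian process ensemble, improving robustness to adversarial attacks and providing calibrated probabilities and principled abstention on out-of-distribution inputs.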

Abstract

Adversarial conditions such as paraphrasing and targeted style transfer sharply degrade the accuracy of machine text detectors. A document, however, carries multiple complementary signals (e.g., stylistic features, likelihood and rank-order features, and structural features), and an attack that suppresses one may leave others intact. While a parametric classifier can learn to combine these features given sufficient supervision, classifiers are prone to making confidently incorrect predictions when the distribution shifts (e.g., novel attacks or unseen language models). To address this, we propose a multi-view, non-parametric detection framework that extracts complementary feature views from the same document and aggregates per-view evidence through a Gaussian process ensemble. By aggregating evidence across views, an adversary must simultaneously defeat multiple independent axes of detection, substantially raising the cost of evasion. The Gaussian process formulation additionally provides calibrated probabilities and principled abstention on out-of-distribution inputs, supporting reliable deployment in high-stakes settings. We evaluate on three benchmarks spanning diverse generators and attacks (the DetectRL and RAID benchmarks and the PAN2025 shared task) and demonstrate that our multi-view detector maintains strong performance under the considered attacks, outperforming existing approaches against held-out attacks.
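
The idea of fusing per-view Gaussian process evidence and abstaining far from the training data can be sketched as follows. This is a simplification under stated assumptions, not the paper's method: it uses GP regression on +1/-1 labels rather than a GP classifier (so it yields no calibrated probabilities), fuses views by precision weighting, and abstains by a variance threshold. All names and the threshold `max_var` are hypothetical.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_predict(x_train, y_train, x_test, noise=0.1):
    """GP regression posterior mean and variance with a unit-variance prior."""
    k = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf(x_test, x_train)
    mean = k_star @ np.linalg.solve(k, y_train)
    var = 1.0 - np.sum(k_star * np.linalg.solve(k, k_star.T).T, axis=1)
    return mean, var

def detect(train_views, labels, test_views, max_var=0.5):
    """Return +1 (machine), -1 (human), or 0 (abstain) per test document."""
    means, variances = zip(*(gp_predict(xtr, labels, xte)
                             for xtr, xte in zip(train_views, test_views)))
    means, variances = np.array(means), np.array(variances)
    weights = 1.0 / variances                       # trust confident views more
    fused = (weights * means).sum(axis=0) / weights.sum(axis=0)
    decision = np.sign(fused).astype(int)
    decision[variances.min(axis=0) > max_var] = 0   # no view is confident
    return decision

labels = np.array([-1.0, 1.0])
view_a = np.array([[0.0], [4.0]])    # e.g. a stylistic feature
view_b = np.array([[1.0], [5.0]])    # e.g. a likelihood feature
test_a = np.array([[4.0], [100.0]])
test_b = np.array([[5.0], [100.0]])
print(detect([view_a, view_b], labels, [test_a, test_b]))
```

In the toy data the first test document sits on a machine-written training example in both views, while the second lies far from anything seen, so every view reverts to its prior variance and the detector abstains.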

2606.14058 2026-06-15 cs.RO new submission

ReactSim-Bench: Benchmarking Reactive Behavior World Model Simulation in Autonomous Driving

Zhiyuan Zhang, Yanlun Peng, Jianing Zhang, Xianda Guo, Zehan Huang, Haoran Liu, Qifeng Li, Shaofeng Zhang, Xiaosong Jia, Junchi Yan

Affiliations: School of Computer Science & School of Artificial Intelligence, Shanghai Jiao Tong University; Great Wall Motor; Institute of Trustworthy Embodied AI (TEAI), Fudan University; School of Computer Science, Wuhan University; University of Science and Technology of China

AI summary: Proposes ReactSim-Bench, which decouples control of the ego vehicle from that of surrounding agents and feeds in ego behaviors that deviate from the log to evaluate the reactive capability of behavior world model simulation, systematically benchmarking multiple models with collision, map-based, and kinematic metrics.

Abstract

Reactive capability is a key property of data-driven behavior world model simulators for autonomous driving simulation systems. With this capability, simulated world agents can respond feasibly to autonomous vehicle (AV) behaviors that differ from the log. However, existing behavior simulation benchmarks do not directly measure reactive capability. They often let the simulator jointly control the AV and surrounding agents and evaluate realism through log similarity or open-loop prediction metrics. In this work, we introduce ReactSim-Bench for evaluating the reactive capability of behavior world model simulation in autonomous driving. We decouple the control of agents and the AV, using AV behaviors that differ from the log and require agents to respond as independent AV inputs. To obtain these AV behaviors, we construct a pipeline that uses an AV planner model to generate candidate behaviors and filters the data using rules and manual verification. Collision metrics, map-based metrics, and kinematic feasibility metrics are used to evaluate the safety and rule compliance of reactive responses. We construct 2,636 test scenarios with three categories and conduct a systematic evaluation of state-of-the-art models across multiple architectures, including Transformer-based, diffusion-based, and next-token-prediction-based models. We further analyze how replan frequency affects performance and provide insights for future studies.
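
Collision and kinematic feasibility metrics of the kind listed above can be sketched on plain 2D trajectories. The benchmark's actual definitions are not given in the abstract, so the centre-distance collision test, the finite-difference acceleration check, and the thresholds `min_gap` and `max_accel` are assumptions chosen for illustration.

```python
import numpy as np

def collision_steps(ego_xy, agent_xy, min_gap=2.0):
    """Indices of timesteps where ego and agent centres are closer than min_gap."""
    ego = np.asarray(ego_xy, dtype=float)
    agent = np.asarray(agent_xy, dtype=float)
    gap = np.linalg.norm(ego - agent, axis=1)
    return np.flatnonzero(gap < min_gap)

def peak_acceleration(xy, dt):
    """Largest acceleration magnitude from second finite differences of position."""
    xy = np.asarray(xy, dtype=float)
    accel = np.diff(xy, n=2, axis=0) / dt**2
    return float(np.linalg.norm(accel, axis=1).max())

def is_feasible(xy, dt, max_accel=8.0):
    """True when the trajectory never exceeds the assumed acceleration limit."""
    return peak_acceleration(xy, dt) <= max_accel

# Ego drives toward an agent; the log-replayed agent stays parked,
# while a reactive agent pulls away.
ego = [[0, 0], [1, 0], [2, 0], [3, 0]]
parked = [[3, 0], [3, 0], [3, 0], [3, 0]]
reactive = [[3, 0], [3, 0], [4, 0], [6, 0]]
print(collision_steps(ego, parked))
print(collision_steps(ego, reactive), peak_acceleration(reactive, dt=1.0))
```

The contrast between the two agents is the point of decoupled control: a simulator that merely replays the log collides with an ego vehicle that deviates from it, while a reactive one should avoid the collision without violating kinematic limits.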
