arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

检索范围排序方式

检索时间范围

重置

HOT 人工智能、机器人等 9

cs.AI 人工智能 cs.CV 计算机视觉 cs.CL 自然语言处理 cs.RO 机器人 cs.LG 机器学习 cs.SD 声音 cs.ET 新兴技术 eess.AS 音频语音 eess.IV 图像视频

CS 计算机 41

cs 计算机 cs.AI 人工智能 cs.AR 硬件架构 cs.CC 计算复杂性 cs.CE 计算工程 cs.CG 计算几何 cs.CL 自然语言处理 cs.CR 密码安全 cs.CV 计算机视觉 cs.CY 计算机与社会 cs.DB 数据库 cs.DC 分布式计算 cs.DL 数字图书馆 cs.DM 离散数学 cs.DS 数据结构 cs.ET 新兴技术 cs.FL 形式语言 cs.GL 综述文献 cs.GR 图形学 cs.GT 博弈论 cs.HC 人机交互 cs.IR 信息检索 cs.IT 信息论 cs.LG 机器学习 cs.LO 计算机逻辑 cs.MA 多智能体 cs.MM 多媒体 cs.MS 数学软件 cs.NA 数值分析 cs.NE 神经进化 cs.NI 网络架构 cs.OH 其他计算机 cs.OS 操作系统 cs.PF 性能 cs.PL 编程语言 cs.RO 机器人 cs.SC 符号计算 cs.SD 声音 cs.SE 软件工程 cs.SI 社会信息网络 cs.SY 系统控制

ECON 经济学 4

econ 经济学 econ.EM 计量经济 econ.GN 一般经济 econ.TH 理论经济

EESS 电气与系统 5

eess 电气与系统 eess.AS 音频语音 eess.IV 图像视频 eess.SP 信号处理 eess.SY 系统控制

MATH 数学 33

math 数学 math.AC 交换代数 math.AG 代数几何 math.AP 偏微分方程 math.AT 代数拓扑 math.CA 经典分析 math.CO 组合数学 math.CT 范畴论 math.CV 复变函数 math.DG 微分几何 math.DS 动力系统 math.FA 泛函分析 math.GM 一般数学 math.GN 一般拓扑 math.GR 群论 math.GT 几何拓扑 math.HO 历史综述 math.IT 信息论 math.KT K理论 math.LO 逻辑 math.MG 度量几何 math.MP 数学物理 math.NA 数值分析 math.NT 数论 math.OA 算子代数 math.OC 优化控制 math.PR 概率 math.QA 量子代数 math.RA 环与代数 math.RT 表示论 math.SG 辛几何 math.SP 谱理论 math.ST 统计理论

PHYSICS 物理 55

astro-ph 天体物理 astro-ph.CO 宇宙学 astro-ph.EP 地球行星 astro-ph.GA 星系物理 astro-ph.HE 高能天体 astro-ph.IM 天文仪器 astro-ph.SR 太阳恒星 cond-mat 凝聚态 cond-mat.dis-nn 无序神经 cond-mat.mes-hall 介观纳米 cond-mat.mtrl-sci 材料科学 cond-mat.other 其他凝聚态 cond-mat.quant-gas 量子气体 cond-mat.soft 软凝聚态 cond-mat.stat-mech 统计力学 cond-mat.str-el 强关联电子 cond-mat.supr-con 超导 gr-qc 广义相对论 hep-ex 高能实验 hep-lat 格点高能 hep-ph 高能唯象 hep-th 高能理论 math-ph 数学物理 nlin 非线性科学 nlin.AO 自适应系统 nlin.CD 混沌动力学 nlin.CG 胞自动机 nlin.PS 斑图孤子 nlin.SI 可积系统 nucl-ex 核物理实验 nucl-th 核物理理论 physics 物理 physics.acc-ph 加速器物理 physics.ao-ph 大气海洋 physics.app-ph 应用物理 physics.atm-clus 原子分子团簇 physics.atom-ph 原子物理 physics.bio-ph 生物物理 physics.chem-ph 化学物理 physics.class-ph 经典物理 physics.comp-ph 计算物理 physics.data-an 数据分析 physics.ed-ph 物理教育 physics.flu-dyn 流体动力学 physics.gen-ph 普通物理 physics.geo-ph 地球物理 physics.hist-ph 物理史哲 physics.ins-det 仪器探测 physics.med-ph 医学物理 physics.optics 光学 physics.plasm-ph 等离子体 physics.pop-ph 科普物理 physics.soc-ph 物理与社会 physics.space-ph 空间物理 quant-ph 量子物理

Q-BIO 定量生物 11

q-bio 定量生物 q-bio.BM 生物分子 q-bio.CB 细胞行为 q-bio.GN 基因组学 q-bio.MN 分子网络 q-bio.NC 神经认知 q-bio.OT 其他定量生物 q-bio.PE 种群进化 q-bio.QM 定量方法 q-bio.SC 亚细胞过程 q-bio.TO 组织器官

Q-FIN 定量金融 10

q-fin 定量金融 q-fin.CP 计算金融 q-fin.EC 经济学 q-fin.GN 一般金融 q-fin.MF 数学金融 q-fin.PM 投资组合 q-fin.PR 证券定价 q-fin.RM 风险管理 q-fin.ST 统计金融 q-fin.TR 交易微观结构

STAT 统计 7

stat 统计 stat.AP 统计应用 stat.CO 统计计算 stat.ME 统计方法 stat.ML 机器学习 stat.OT 其他统计 stat.TH 统计理论

cs.LG 机器学习应用 16 cs.AI AI应用与系统 13 cs.AI 机器学习与表示学习 12 cs.LG 强化学习与序列决策 12 cs.AI 可信、安全与AI治理 10 cs.LG 数据集、基准与评测 10 cs.AI 评测、基准与数据集 9 cs.LG 优化、泛化与理论分析 9 cs.CV 生成式视觉与世界模型 8 cs.LG 深度学习架构与训练方法 8 cs.RO 导航、定位与SLAM 8 cs.AI 机器人与具身智能 7 cs.LG 高效学习、压缩与部署 7 cs.AI 自然语言与多模态智能 6 cs.CV 数据集、基准、评测与训练方法 6 cs.CV 多模态与视觉语言模型 5 cs.CV 医学影像与生物视觉 5 cs.CL 大语言模型与基础模型 5 cs.LG 生成模型与概率建模 5 cs.LG 鲁棒性、不确定性与可信学习 5 cs.AI 智能体、规划与决策 4 cs.AI 其他/综合AI 4 cs.CV 3D视觉、点云与空间智能 4 cs.CV 鲁棒性、安全、隐私与可信视觉 4 cs.CL 对话系统与智能体 4 cs.CL 多模态语言处理 4 cs.CL 评测、数据集与基准 4 cs.CL 安全、隐私、公平与可解释NLP 4 cs.LG 其他/综合机器学习 4 cs.RO 操作、抓取与灵巧手 4 cs.RO 无人车、无人机与移动机器人 4 cs.AI 多智能体与博弈 3 cs.CV 具身智能、机器人与自动驾驶 3 cs.CV 图像识别、检索与分类 3 cs.CV 目标检测、分割与定位 3 cs.CV 其他/综合视觉 3 cs.CL 语音语言联合与音频文本 3 cs.CL 其他/综合NLP 3 cs.LG 图学习与结构化数据 3 cs.LG 迁移、元学习与持续学习 3 cs.RO 机器人学习与模仿强化学习 3 cs.RO 仿真、数据集与评测 3 cs.CL 机器翻译与跨语言处理 2 cs.LG 表示学习、自监督与对比学习 2 cs.RO 运动规划、控制与动力学 2 cs.RO 人机交互与协作机器人 2 cs.RO 具身智能与视觉语言动作模型 2 cs.RO 软体机器人与硬件设计 2 cs.SD 语音识别与关键词检测 2 cs.AI 搜索、优化与约束求解 1 cs.CV 低层视觉、计算成像与图像增强 1 cs.CL 文本生成、摘要与编辑 1 cs.CL 低资源、领域适配与高效训练 1 cs.RO 多机器人与群体系统 1 cs.RO 安全、鲁棒性与可信机器人 1 cs.RO 其他/综合机器人 1 cs.SD 语音合成与声音生成 1 cs.SD 音频事件检测与场景理解 1 cs.SD 安全、隐私与深度伪造音频 1 cs.SD 其他/综合语音音频 1

2604.23938 2026-06-19 cs.CL 版本更新

TSAssistant: A Human-in-the-Loop Agentic Framework for Automated Target Safety Assessment

TSAssistant: 一种人在回路中的自动化靶点安全性评估智能体框架

Xiaochen Zheng, Zhiwen Jiang, David Tokar, Yexiang Cheng, Alvaro Serra, Melanie Guerard, Klas Hatje, Tatyana Doktorova

发表机构 * Computational Sciences Center of Excellence（计算科学卓越中心）

AI总结提出TSAssistant多智能体框架，通过分层指令架构和交互式优化循环，将靶点安全性评估报告生成分解为专业子任务，实现高可重复性和证据溯源。

Comments Updated with quantitative and expert evaluations

详情

AI中文摘要

靶点安全性评估（TSA）需要系统整合遗传、转录组、靶点同源性、药理学和临床数据，以评估治疗靶点的潜在安全性风险。该过程劳动密集且依赖专家，在可扩展性和可重复性方面面临挑战。我们提出TSAssistant，一种人在回路中的多智能体框架，将TSA报告生成分解为专门子智能体的工作流：研究子智能体各自基于并引用单个TSA领域，合成子智能体整合跨领域发现。子智能体通过标准化工具接口从精选生物医学来源检索和综合证据，生成可单独引用、基于证据的章节，其行为由分层指令架构塑造，该架构将协调逻辑与领域专业知识和用户意图分离。为补充这些软约束，程序化执行钩子和持久记忆存储在整个工作流中强制执行硬约束，而交互式优化循环允许专家在完全保留跨迭代对话上下文的情况下审查和修订各个章节。我们不是进行单一的整体比较，而是将报告质量分解为可重复性、证据基础、任务级准确性和专家监督下的可控性，发现高可重复性和证据基础、与人类参考高度一致以及专家驱动的净正面改进。

英文摘要

Target Safety Assessment (TSA) requires systematic integration of genetic, transcriptomic, target homology, pharmacological, and clinical data to evaluate potential safety liabilities of therapeutic targets. This process is labor-intensive and expert-dependent, posing challenges in scalability and reproducibility. We present TSAssistant, a human-in-the-loop multi-agent framework that decomposes TSA report generation into a workflow of specialized subagents: Research Subagents that each ground and cite a single TSA domain, and Synthesis Subagents that integrate findings across domains. Subagents retrieve and synthesize evidence from curated biomedical sources through standardized tool interfaces and produce individually citable, evidence-grounded sections, with behavior shaped by a hierarchical instruction architecture that separates coordination logic from domain expertise and user intent. To complement these soft constraints, programmatic execution hooks and persistent memory stores enforce hard constraints across the workflow, while an interactive refinement loop allows experts to review and revise individual sections with full conversational context preserved across iterations. Rather than a single holistic comparison, we decompose report quality into reproducibility, evidential grounding, task-level accuracy, and controllability under expert oversight, finding high reproducibility and grounding, substantial agreement with the human reference, and net-positive expert-driven refinement.

URL PDF HTML ☆

赞 0 踩 0

2605.00665 2026-06-19 cs.CV 版本更新

Prediction of Alzheimer's Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank

基于深度学习的视网膜图像预测阿尔茨海默病风险因素：英国生物银行中生物学相关形态学关联的开发和验证

Seowung Leem, Yunchao Yang, Adam J. Woods, Ruogu Fang

发表机构 * J. Crayton Pruitt Family Dept. of Biomedical Engineering, University of Florida（朱·克雷顿·普瑞特生物医学工程系，佛罗里达大学）； University of Florida Research Computing（佛罗里达大学研究计算中心）； Meta AI (FAIR)（Meta AI（FAIR））； School of Behavioral and Brain Sciences, University of Texas at Dallas（德克萨斯大学达拉斯分校行为与脑科学学院）； Dept. of Electrical and Computer Engineering, University of Florida（佛罗里达大学电气与计算机工程系）； Dept. of Computer and Information Science and Engineering, University of Florida（佛罗里达大学计算机与信息科学与工程系）； Center for Cognitive Aging and Memory, University of Florida（佛罗里达大学认知衰老与记忆中心）

AI总结利用深度学习从视网膜彩色眼底照片预测12个阿尔茨海默病相关风险因素，并揭示其背后的视网膜结构特征，发现视神经头和视网膜血管等区域与风险因素及阿尔茨海默病前期变化相关。

Comments Accepted to the "Journal of Alzheimer's Disease" for publication

详情

PTLD: 从仿真到现实的触觉潜在知识蒸馏用于灵巧操作

Rosy Chen, Mustafa Mukadam, Michael Kaess, Tingfan Wu, Francois R Hogan, Jitendra Malik, Akash Sharma

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； University of Washington（华盛顿大学）； FAIR at Meta（Meta的FAIR团队）； UC Berkeley（伯克利大学）

AI总结提出PTLD方法，通过真实世界触觉策略数据蒸馏鲁棒状态估计器，解决触觉仿真困难问题，在灵巧操作任务中相比纯本体感策略提升182%和57%。

详情

AI中文摘要

触觉灵巧操作对于自动化复杂家务任务至关重要，但学习有效控制策略仍然是一个挑战。虽然最近的工作依赖于模仿学习，但通过机器人遥操作或动觉教学获取多指手的高质量演示是困难的。另一种方法是，通过强化学习我们可以在仿真中学习技能，但快速且真实的触觉观测仿真具有挑战性。为了弥合这一差距，我们引入了PTLD：从仿真到现实的触觉潜在知识蒸馏，这是一种无需触觉仿真即可学习触觉操作技能的新方法。我们的关键思想不是模拟触觉传感器或纯粹依赖本体感策略进行零样本从仿真到现实的迁移，而是利用现实世界中的特权传感器收集真实的触觉策略数据。然后，这些数据用于蒸馏一个鲁棒的状态估计器，该估计器基于触觉输入运行。我们的实验表明，PTLD可以通过结合触觉感知显著改善在仿真中训练的本体感操作策略。在基准的掌内旋转任务中，PTLD相比纯本体感策略实现了182%的提升。我们还展示了PTLD能够学习具有挑战性的触觉掌内重定向任务，在该任务中，我们观察到达到的目标数量相比仅使用本体感提高了57%。网站：此 https URL。

英文摘要

Tactile dexterous manipulation is essential to automating complex household tasks, yet learning effective control policies remains a challenge. While recent work has relied on imitation learning, obtaining high quality demonstrations for multi-fingered hands via robot teleoperation or kinesthetic teaching is prohibitive. Alternatively, with reinforcement we can learn skills in simulation, but fast and realistic simulation of tactile observations is challenging. To bridge this gap, we introduce PTLD: sim-to-real Privileged Tactile Latent Distillation, a novel approach to learning tactile manipulation skills without requiring tactile simulation. Instead of simulating tactile sensors or relying purely on proprioceptive policies to transfer zero-shot sim-to-real, our key idea is to leverage privileged sensors in the real world to collect real-world tactile policy data. This data is then used to distill a robust state estimator that operates on tactile input. We demonstrate from our experiments that PTLD can be used to improve proprioceptive manipulation policies trained in simulation significantly by incorporating tactile sensing. On the benchmark in-hand rotation task, PTLD achieves a 182% improvement over a proprioception only policy. We also show that PTLD enables learning the challenging task of tactile in-hand reorientation where we see a 57% improvement in the number of goals reached over using proprioception alone. Website: https://akashsharma02.github.io/ptld-website/.

URL PDF HTML ☆

赞 0 踩 0

2604.15838 2026-06-19 cs.LG 版本更新

Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift

可逆残差归一化缓解时空分布偏移

Zhaobo Hu, Vincent Gauthier, Mehdi Naima

发表机构 * CNRS -- LIP6 Sorbonne Universit\'e

AI总结针对时空分布偏移问题，提出可逆残差归一化框架，通过空间感知可逆变换同时处理时空维度偏移，结合图卷积与谱约束图神经网络实现自适应归一化。

详情

AI中文摘要

分布偏移严重降低了深度预测模型的性能。虽然这一问题在单变量时间序列中已有充分研究，但在时空领域中仍然是一个重大挑战。有效的解决方案如实例归一化及其变体可以通过标准化统计量来缓解时间偏移。然而，图上的分布偏移更为复杂，不仅涉及单个节点序列的漂移，还涉及空间网络中的异质性，其中不同节点表现出不同的统计特性。为了解决这个问题，我们提出了可逆残差归一化（RRN），一种新颖的框架，执行空间感知的可逆变换以解决空间和时间维度上的分布偏移。我们的方法在可逆残差块中集成了图卷积操作，实现了在保持可逆性的同时尊重底层图结构的自适应归一化。通过将中心归一化与谱约束图神经网络相结合，我们的方法以数据驱动的方式捕获和归一化复杂的时空关系。我们框架的双向性允许模型在归一化的潜在空间中学习，并通过逆变换恢复原始分布特性，为动态时空系统上的预测提供了一种鲁棒且模型无关的解决方案。

英文摘要

Distribution shift severely degrades the performance of deep forecasting models. While this issue is well-studied for individual time series, it remains a significant challenge in the spatio-temporal domain. Effective solutions like instance normalization and its variants can mitigate temporal shifts by standardizing statistics. However, distribution shift on a graph is far more complex, involving not only the drift of individual node series but also heterogeneity across the spatial network where different nodes exhibit distinct statistical properties. To tackle this problem, we propose Reversible Residual Normalization (RRN), a novel framework that performs spatially-aware invertible transformations to address distribution shift in both spatial and temporal dimensions. Our approach integrates graph convolutional operations within invertible residual blocks, enabling adaptive normalization that respects the underlying graph structure while maintaining reversibility. By combining Center Normalization with spectral-constrained graph neural networks, our method captures and normalizes complex Spatio-Temporal relationships in a data-driven manner. The bidirectional nature of our framework allows models to learn in a normalized latent space and recover original distributional properties through inverse transformation, offering a robust and model-agnostic solution for forecasting on dynamic spatio-temporal systems.

URL PDF HTML ☆

赞 0 踩 0

2604.13416 2026-06-19 cs.CV cs.AI 版本更新

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

DF3DV-1K：用于无干扰新视角合成的大规模数据集与基准

Cheng-You Lu, Yi-Shan Hung, Wei-Ling Chi, Hao-Ping Wang, Charlie Li-Ting Tsai, Yu-Cheng Chang, Yu-Lun Liu, Thomas Do, Chin-Teng Lin

发表机构 * University of Technology Sydney（悉尼科技大学）； University of Sydney（悉尼大学）； National Yang Ming Chiao Tung University（阳明交通大学）

AI总结为弥补无干扰辐射场领域缺乏大规模真实世界数据集的空白，构建了包含1048个场景、每场景提供干净和杂乱图像集的DF3DV-1K数据集，并基于此基准测试了九种最新方法，识别出最鲁棒的方法和最具挑战的场景。

详情

AI中文摘要

辐射场领域的进展已实现逼真的新视角合成。在多个领域中，已开发出大规模真实世界数据集以支持全面基准测试并促进超越场景特定重建的进展。然而，对于无干扰辐射场，每个场景同时包含干净和杂乱图像的大规模数据集仍然缺乏，限制了发展。为填补这一空白，我们引入了DF3DV-1K，一个包含1048个场景的大规模真实世界数据集，每个场景提供干净和杂乱的图像集用于基准测试。该数据集总共包含89,924张使用消费级相机拍摄的图像，模拟随意拍摄，涵盖128种干扰类型和161种场景主题，包括室内和室外环境。一个精心挑选的41个场景子集DF3DV-41被系统设计用于评估无干扰辐射场方法在挑战性场景下的鲁棒性。利用DF3DV-1K，我们对九种最新的无干扰辐射场方法和3D高斯泼溅进行了基准测试，识别出最鲁棒的方法和最具挑战的场景。除了基准测试，我们还展示了DF3DV-1K的一个应用：微调基于扩散的2D增强器以改进辐射场方法，在保留集（例如DF3DV-41）和On-the-go数据集上实现了平均0.96 dB PSNR和0.057 LPIPS的提升。我们希望DF3DV-1K能促进无干扰视觉的发展，并推动超越场景特定方法的进步。数据集和排行榜可在以下网址获取：此 https URL。

英文摘要

Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a large-scale dataset with clean and cluttered images per scene remains lacking, limiting the development. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes, each providing clean and cluttered image sets for benchmarking. In total, the dataset contains 89,924 images captured using consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated subset of 41 scenes, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB PSNR and 0.057 LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope DF3DV-1K facilitates the development of distractor-free vision and promotes progress beyond scene-specific approaches. The dataset and leaderboard are available at https://johnnylu305.github.io/df3dv1k_web/.

URL PDF HTML ☆

赞 0 踩 0

2604.13240 2026-06-19 cs.CV cs.LG 版本更新

A High-Resolution Landscape Dataset for Concept-Based XAI With Application to Species Distribution Models

基于概念的可解释AI的高分辨率景观数据集及其在物种分布模型中的应用

Augustin de la Brosse, Damien Garreau, Thomas Houet, Thomas Corpetti

发表机构 * Université Rennes 2, CNRS, Nantes Université, Univ Brest, LETG, UMR 6554（里昂大学第二分校、法国国家科学研究中心、南特大学、布列塔尼大学、LETG、UMR 6554）； LTSER Zone Atelier Armorique（Armorique 领域实验室区）； University of Würzburg, Center for Artificial Intelligence and Data Science（乌尔姆大学、人工智能与数据科学中心）

AI总结提出首个基于概念的可解释AI方法用于物种分布模型，利用高分辨率多光谱和LiDAR无人机影像构建景观概念数据集，通过Robust TCAV量化景观概念对模型预测的影响，案例研究验证了方法的有效性。

详情

AI中文摘要

绘制物种空间分布对于保护政策和入侵物种管理至关重要。物种分布模型（SDMs）是完成此任务的主要工具，具有两个目的：实现稳健的预测性能，同时提供关于分布驱动因素的生态见解。然而，深度学习SDMs日益增长的复杂性使得提取这些见解更具挑战性。为了调和这些目标，我们提出了首个基于概念的可解释AI（XAI）在SDMs中的实现。我们利用Robust TCAV（测试与概念激活向量）方法量化景观概念对模型预测的影响。为此，我们提供了一个新的开放获取的景观概念数据集，该数据集源自高分辨率多光谱和LiDAR无人机影像。它包括跨越15个不同景观概念的653个斑块和1,450个随机参考斑块，旨在适用于广泛的物种。我们通过两个水生昆虫（襀翅目和毛翅目）的案例研究，使用两个卷积神经网络和一个视觉Transformer来展示这种方法。结果表明，基于概念的XAI有助于根据专家知识验证SDMs，同时发现产生新生态假说的新颖关联。Robust TCAV还提供了景观层面的信息，对政策制定和土地管理有用。代码和数据集公开可用。

英文摘要

Mapping the spatial distribution of species is essential for conservation policy and invasive species management. Species distribution models (SDMs) are the primary tools for this task, serving two purposes: achieving robust predictive performance while providing ecological insights into the driving factors of distribution. However, the increasing complexity of deep learning SDMs has made extracting these insights more challenging. To reconcile these objectives, we propose the first implementation of concept-based Explainable AI (XAI) for SDMs. We leverage the Robust TCAV (Testing with Concept Activation Vectors) methodology to quantify the influence of landscape concepts on model predictions. To enable this, we provide a new open-access landscape concept dataset derived from high-resolution multispectral and LiDAR drone imagery. It includes 653 patches across 15 distinct landscape concepts and 1,450 random reference patches, designed to suit a wide range of species. We demonstrate this approach through a case study of two aquatic insects, Plecoptera and Trichoptera, using two Convolutional Neural Networks and one Vision Transformer. Results show that concept-based XAI helps validate SDMs against expert knowledge while uncovering novel associations that generate new ecological hypotheses. Robust TCAV also provides landscape-level information, useful for policy-making and land management. Code and datasets are publicly available.

URL PDF HTML ☆

赞 0 踩 0

2602.22495 2026-06-19 cs.LG cs.AI 版本更新

Reinforcement-aware Knowledge Distillation for LLM Reasoning

面向LLM推理的强化学习感知知识蒸馏

Zhaoyang Zhang, Shuli Jiang, Yantao Shen, Yuting Zhang, Dhananjay Ram, Shuo Yang, Zhuowen Tu, Wei Xia, Stefano Soatto

发表机构 * Meta ； Guo et al. ； Lin et al. ； Xu et al. ； Shao et al. ； Schulman et al. ； Xie et al.

AI总结提出RL感知蒸馏（RLAD），通过信任区域比率蒸馏（TRRD）在强化学习后训练中实现选择性模仿，解决分布不匹配和目标干扰问题，在逻辑推理和数学基准上优于现有方法。

详情

AI中文摘要

强化学习（RL）后训练最近推动了长链思维推理大语言模型（LLM）的重大进展，但这类模型的高推理成本促使将其蒸馏到更小的学生模型中。大多数现有的知识蒸馏（KD）方法是为监督微调（SFT）设计的，依赖于固定的教师轨迹或基于教师-学生KL散度的正则化。当与RL结合时，这些方法常常遭受分布不匹配和目标干扰：教师监督可能与学生不断变化的rollout分布不一致，并且KL正则化项可能与奖励最大化竞争，需要仔细的损失平衡。为了解决这些问题，我们提出了RL感知蒸馏（RLAD），它在RL期间执行选择性模仿——仅在改进当前策略更新时引导学生向教师学习。我们的核心组件，信任区域比率蒸馏（TRRD），用基于PPO/GRPO风格似然比的目标替代教师-学生KL正则化项，该目标锚定到教师-旧策略混合，从而在学生rollout上产生优势感知、信任区域约束的蒸馏，并自然平衡探索、利用和模仿。在多种逻辑推理和数学基准上，RLAD始终优于离线蒸馏、标准GRPO和基于KL的在策略教师-学生知识蒸馏。

英文摘要

Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought reasoning large language models (LLMs), but the high inference cost of such models motivates distillation into smaller students. Most existing knowledge distillation (KD) methods are designed for supervised fine-tuning (SFT), relying on fixed teacher traces or teacher-student Kullback-Leibler (KL) divergence-based regularization. When combined with RL, these approaches often suffer from distribution mismatch and objective interference: teacher supervision may not align with the student's evolving rollout distribution, and the KL regularizer can compete with reward maximization and require careful loss balancing. To address these issues, we propose RL-aware distillation (RLAD), which performs selective imitation during RL -- guiding the student toward the teacher only when it improves the current policy update. Our core component, Trust Region Ratio Distillation (TRRD), replaces the teacher-student KL regularizer with a PPO/GRPO-style likelihood-ratio objective anchored to a teacher--old-policy mixture, yielding advantage-aware, trust-region-bounded distillation on student rollouts and naturally balancing exploration, exploitation, and imitation. Across diverse logic reasoning and math benchmarks, RLAD consistently outperforms offline distillation, standard GRPO, and KL-based on-policy teacher-student knowledge distillation.

URL PDF HTML ☆

赞 0 踩 0

2604.07593 2026-06-19 cs.AI 版本更新

Too long; didn't solve

太长；没解决

Lucía M. Cabrera, Isaac Saxton-Knight, Jocelyn D'Arcy

发表机构 * Instituto Balseiro（巴塞罗那研究所）； Poindexter Labs（波因迪克斯实验室）

AI总结研究提示长度和解答长度与大型语言模型在数学问题上的性能关系，发现两者与模型失败率正相关。

2604.06464 2026-06-19 cs.LG physics.app-ph stat.ML 版本更新

脚手架效应：提示框架如何驱动临床VLM评估中的表面多模态增益

Doan Nam Long Vu, Simone Balloccu

发表机构 * Technical University of Darmstadt（达姆施塔特技术大学）

AI总结研究发现，在临床VLM评估中，提示中提及MRI可用性即可解释70-80%的性能提升，与图像数据是否存在无关，这种“脚手架效应”揭示了表面评估无法反映真实多模态推理能力。

详情

AI中文摘要

可信的临床AI要求性能提升反映真实的证据整合而非表面伪影。我们在两个临床神经影像队列\textsc{FOR2107}（情感障碍）和\textsc{OASIS-3}（认知衰退）上评估了12个开源视觉语言模型（VLM）的二分类性能。两个数据集都包含结构MRI数据，但这些数据不携带可靠的个体级诊断信号。在这些条件下，较小的VLM在引入神经影像上下文后F1分数提升高达58%，蒸馏模型变得与规模大一个数量级的模型相当。对比置信度分析显示，仅仅在任务提示中\textit{提及}MRI可用性就解释了70-80%的转变，与影像数据是否存在无关，这是模态坍塌的一个领域特定实例，我们称之为\textit{脚手架效应}。专家评估揭示了在所有条件下捏造基于神经影像的正当理由，而偏好对齐虽然消除了引用MRI的行为，却使两种条件都退化为随机基线。我们的发现表明，表面评估不足以作为多模态推理的指标，这对VLM在临床环境中的部署有直接影响。

英文摘要

Trustworthy clinical AI requires that performance gains reflect genuine evidence integration rather than surface-level artifacts. We evaluate 12 open-weight vision-language models (VLMs) on binary classification across two clinical neuroimaging cohorts, \textsc{FOR2107} (affective disorders) and \textsc{OASIS-3} (cognitive decline). Both datasets come with structural MRI data that carries no reliable individual-level diagnostic signal. Under these conditions, smaller VLMs exhibit gains of up to 58\% F1 upon introduction of neuroimaging context, with distilled models becoming competitive with counterparts an order of magnitude larger. A contrastive confidence analysis reveals that merely \emph{mentioning} MRI availability in the task prompt accounts for 70-80\% of this shift, independent of whether imaging data is present, a domain-specific instance of modality collapse we term the \emph{scaffold effect}. Expert evaluation reveals fabrication of neuroimaging-grounded justifications across all conditions, and preference alignment, while eliminating MRI-referencing behavior, collapses both conditions toward random baseline. Our findings demonstrate that surface evaluations are inadequate indicators of multimodal reasoning, with direct implications for the deployment of VLMs in clinical settings.

URL PDF HTML ☆

赞 0 踩 0

2602.04037 2026-06-19 cs.LG cs.RO 版本更新

DADP: Domain Adaptive Diffusion Policy

DADP: 领域自适应扩散策略

Pengcheng Wang, Qinghang Liu, Haotian Lin, Yiheng Li, Guojian Zhan, Masayoshi Tomizuka, Yixiao Wang

发表机构 * University of California, Berkeley, California, USA（加州大学伯克利分校）； Peking University, Beijing, China（北京大学）； Tsinghua University, Beijing, China（清华大学）

AI总结提出DADP，通过无监督解耦和领域感知扩散注入，实现跨动态环境的鲁棒零样本适应，在运动与操控任务上超越先前方法。

详情

质量优于点击：面向早期电商查询建议的迭代强化学习

Qi Sun, Kejun Xiao, Huaipeng Zhao, Tao Luo, Xiaoyi Zeng

发表机构 * Alibaba International Digital Commercial Group（阿里巴巴国际数字商业集团）

AI总结针对早期部署场景点击反馈稀疏的问题，提出质量优先的迭代强化学习框架QualEQS，从可回答性、事实性和信息增益三个维度优化查询建议质量，通过候选建议的组级分歧识别模糊上下文并挖掘难例进行迭代改进，在真实电商系统中ChatPV提升6.81%。

详情

AI中文摘要

现有的对话系统依赖查询建议来增强用户参与度。最近的方法主要使用点击率（CTR）模型优化生成模型，以与用户偏好对齐。然而，这些方法在早期部署场景中效果较差，因为点击反馈稀疏且不足以训练可靠的CTR模型。为弥补这一差距，我们提出了QualEQS，一个面向电商查询建议的质量优先迭代强化学习框架。我们将可操作的建议质量形式化为三个直接影响下游可用性的维度：可回答性、事实性和信息增益。为了在没有点击监督的情况下从在线流量中持续改进，我们进一步提出候选建议之间的组级分歧，以识别模糊的查询上下文并挖掘难训练案例进行迭代优化。我们还引入了EQS-Benchmark，一个包含16,949个真实电商查询的数据集，用于离线训练和评估。实验表明，我们基于质量的离线指标与在线性能强相关，为稀疏反馈部署提供了一种实用的评估方法。在离线和在线设置中，QualEQS均持续优于强基线，在真实企业级对话购物助手系统中，在线ChatPV提升了6.81%。

英文摘要

Existing dialogue systems rely on query suggestion to enhance user engagement. Recent approaches mainly optimize generative models using click-through rate (CTR) models to align with user preferences. However, these methods are less effective in early-stage deployment scenarios, where click feedback is sparse and insufficient for training a reliable CTR model. To bridge this gap, we propose QualEQS, a quality-first iterative reinforcement learning framework for e-commerce query suggestion. We formalize actionable suggestion quality along three dimensions that directly affect downstream usability: answerability, factuality, and information gain. To continuously improve from online traffic without click supervision, we further propose group-level disagreement among candidate suggestions to identify ambiguous query contexts and mine hard training cases for iterative refinement. We also introduce EQS-Benchmark, a dataset of 16,949 real-world e-commerce queries for offline training and evaluation. Experiments show that our quality-based offline metrics correlate strongly with online performance, providing a practical evaluation recipe for sparse-feedback deployment. In both offline and online settings, QualEQS consistently outperforms strong baselines, yielding a 6.81% improvement in online ChatPV in a real-world enterprise-level conversational shopping assistant system.

URL PDF HTML ☆

赞 0 踩 0

2603.16606 2026-06-19 cs.CL 版本更新

硬约束下的条件扩散引导：一种随机分析方法

Zhengyi Guo, Wenpin Tang, Renyuan Xu

发表机构 * Department of Industrial Engineering and Operations Research, Columbia University（哥伦比亚大学工业工程与运营管理系）； Department of Management Science and Engineering, Stanford University（斯坦福大学管理科学与工程系）

AI总结提出基于Doob h-变换和鞅表示的条件扩散引导框架，通过鞅损失和鞅协方差损失学习条件函数梯度，确保硬约束满足并给出非渐近保证。

详情

AI中文摘要

我们研究了扩散模型中在硬约束下的条件生成，其中生成的样本必须以概率1满足预设事件。这类约束在安全关键应用和稀有事件模拟中自然出现，而软或基于奖励的引导方法无法保证约束满足。基于扩散模型的概率解释，我们利用Doob h-变换、鞅表示和二次变差过程，开发了一个原则性的条件扩散引导框架。具体地，得到的引导动力学通过涉及条件函数对数梯度的显式漂移校正来增强预训练扩散，而不修改预训练得分网络。利用鞅和二次变差恒等式，我们提出了两种新的离策略学习算法，基于鞅损失和鞅协方差损失，仅使用预训练模型的轨迹来估计h及其梯度。我们为得到的条件采样器在总变差和Wasserstein距离下提供了非渐近保证，明确刻画了得分近似和引导估计误差的影响。数值实验证明了所提方法在强制硬约束和生成稀有事件样本方面的有效性。数值实验的代码可在此https URL找到。

英文摘要

We study conditional generation in diffusion models under hard constraints, where generated samples must satisfy prescribed events with probability one. Such constraints arise naturally in safety-critical applications and in rare-event simulation, where soft or reward-based guidance methods offer no guarantee of constraint satisfaction. Building on a probabilistic interpretation of diffusion models, we develop a principled conditional diffusion guidance framework based on Doob's h-transform, martingale representation and quadratic variation process. Specifically, the resulting guided dynamics augment a pretrained diffusion with an explicit drift correction involving the logarithmic gradient of a conditioning function, without modifying the pretrained score network. Leveraging martingale and quadratic-variation identities, we propose two novel off-policy learning algorithms based on a martingale loss and a martingale-covariation loss to estimate h and its gradient using only trajectories from the pretrained model. We provide non-asymptotic guarantees for the resulting conditional sampler in both total variation and Wasserstein distances, explicitly characterizing the impact of score approximation and guidance estimation errors. Numerical experiments demonstrate the effectiveness of the proposed methods in enforcing hard constraints and generating rare-event samples. The code of the numerical experiments can be found at https://github.com/ZhengyiGuo2002/CDG_Finance.

URL PDF HTML ☆

赞 0 踩 0

2603.07236 2026-06-19 cs.CV 版本更新

HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing

HY-WU (第一部分): 一种可扩展的功能性神经记忆框架及其在文本引导图像编辑中的应用

Mengxuan Wu, Xuanlei Zhao, Ziqiao Wang, Ruicheng Feng, Zhangyang Wang, Kai Wang

发表机构 * Tencent HY Team（腾讯 HY 团队）

AI总结提出HY-WU框架，通过功能性神经记忆模块即时生成实例特定权重更新，避免共享权重覆盖导致的干扰，解决持续学习与个性化中的灾难性遗忘问题。

详情

AI中文摘要

基础模型正从离线预测器过渡到期望长时间运行的部署系统。在实际部署中，目标并非固定：领域漂移、用户偏好演变，以及模型发布后出现新任务。这将持续学习和即时个性化从可选功能提升为核心架构要求。然而，大多数适应流程仍遵循静态权重范式：训练后（或任何适应步骤后），推理执行单一参数向量，而不考虑用户意图、领域或实例特定约束。这将训练或适应后的模型视为参数空间中的单个点。在异构且持续演变的机制中，不同目标可能在参数上诱导分离的可行区域，迫使任何单一共享更新陷入妥协、干扰或过度专业化。结果，持续学习和个性化通常实现为对共享权重的重复覆盖，冒着先前学习行为退化的风险。我们提出HY-WU（权重释放），一种记忆优先的适应框架，将适应压力从覆盖单一共享参数点转移。HY-WU将功能性（算子级）记忆实现为神经模块：一个根据实例条件即时合成权重更新的生成器，产生实例特定算子而无需测试时优化。

英文摘要

Foundation models are transitioning from offline predictors to deployed systems expected to operate over long time horizons. In real deployments, objectives are not fixed: domains drift, user preferences evolve, and new tasks appear after the model has shipped. This elevates continual learning and instant personalization from optional features to core architectural requirements. Yet most adaptation pipelines still follow a static weight paradigm: after training (or after any adaptation step), inference executes a single parameter vector regardless of user intent, domain, or instance-specific constraints. This treats the trained or adapted model as a single point in parameter space. In heterogeneous and continually evolving regimes, distinct objectives can induce separated feasible regions over parameters, forcing any single shared update into compromise, interference, or overspecialization. As a result, continual learning and personalization are often implemented as repeated overwriting of shared weights, risking degradation of previously learned behaviors. We propose HY-WU (Weight Unleashing), a memory-first adaptation framework that shifts adaptation pressure away from overwriting a single shared parameter point. HY-WU implements functional (operator-level) memory as a neural module: a generator that synthesizes weight updates on-the-fly from the instance condition, yielding instance-specific operators without test-time optimization.

URL PDF HTML ☆

赞 0 踩 0

2602.20573 2026-06-19 cs.LG 版本更新

MolGraphBench: A Benchmark of GNN Architectures for Molecular Regression Tasks

MolGraphBench：用于分子回归任务的GNN架构基准测试

Rajan, Ishaan Gupta

发表机构 * Rajan 1 ； Ishaan Gupta 2

AI总结提出MolGraphBench基准，比较四种GNN模型在分子回归任务上的性能，发现GCN和GIN为最优架构，并指出GNN层类型应作为可调超参数。

Comments 14 pages, 5 figures and 4 tables

详情

AI中文摘要

分子通常表示为SMILES字符串，可以轻松转换为手工设计的描述符或指纹（FP）用于分子性质预测。研究表明，SMILES可以转换为分子图 $G = (V, E)$，其中原子为节点 $(V)$，键为边 $(E)$。这些分子图随后可用于训练图神经网络（GNN）模型。尽管近年来GNN（现有和新架构）在分子性质预测中的应用激增，但仍缺乏严格的基准测试。我们提出了MolGraphBench，一个包含四种常用GNN模型的全面基准测试，用于分子性质预测。基准测试结果表明，基于绝对性能、训练效率、迁移学习和预测质量，图卷积网络（GCN）和图同构网络（GIN）是分子图回归任务的最优GNN架构。研究还表明，在融合（GNN-FP）框架中，分子指纹具有非互补性。此外，我们的GNN模型在三个数据集上取得了优于或与当前最先进GNN基线相当的性能（B3DB上GCN的RMSE为0.518，FreeSolv上GIN-FP的RMSE为1.022，RT数据集上GIN的MAE为63.783）。本研究的发现表明，GNN层类型应被视为可调超参数，而非固定设计选择，以实现更优性能。

英文摘要

Molecules are often represented as SMILES strings, which can be readily converted to hand-crafted descriptors or fingerprints (FP) for molecular property prediction. Research has demonstrated that SMILES can be converted to molecular graphs $G = (V, E)$, with atoms as nodes $(V)$ and bonds as edges $(E)$. These molecular graphs can subsequently be used to train graph neural networks (GNN) models. Despite the recent surge in application of GNN (existing and novel architectures) for molecular property prediction, a rigorous benchmark is still lacking. We propose MolGraphBench, a comprehensive benchmark of four commonly used GNN models for molecular property prediction. Benchmarking results demonstrate graph convolutional network (GCN) and graph isomorphism networks (GIN) as the optimal GNN architectures for molecular graph regression tasks, based on absolute performance, training efficiency, transfer learning and prediction quality. The study also indicates the non-complementary nature of molecular fingerprints in the fusion (GNN-FP) framework. Furthermore, our GNN models achieved performance superior or comparable performance to current state-of-the-art GNN baselines across three datasets (GCN with RMSE of $0.518$ on B3DB, GIN-FP with RMSE of $1.022$ on FreeSolv and GIN with MAE of $63.783$ on RT datasets). Findings from this study indicate that type of GNN-layer, should be treated as a tunable hyperparameter rather than a fixed design choice to achieve superior performance.

URL PDF HTML ☆

赞 0 踩 0

2603.04219 2026-06-19 cs.SD cs.AI eess.AS 版本更新

ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis

ZeSTA: 基于领域条件训练的零样本文本转语音增强用于数据高效的个性化语音合成

Youngwon Choi, Jinwoo Oh, Hwayeon Kim, Hyeonyu Kim

发表机构 * Maum AI Inc.（Maum AI公司）； Humelo Inc.（Humelo公司）

AI总结提出ZeSTA框架，通过轻量领域嵌入区分真实与合成语音，结合真实数据过采样，在极低资源下提升零样本文本转语音增强的说话人相似度，保持可懂度和感知质量。

Comments 6 pages, accepted to INTERSPEECH 2026