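arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.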

2605.14403 2026-05-15 cs.CV

DermAgent: A Self-Reflective Agentic System for Dermatological Image Analysis with Multi-Tool Reasoning and Traceable Decision-Making

Yize Liu, Siyuan Yan, Ming Hu, Lie Ju, Xieji Li, Feilong Tang, Wei Feng, Zongyuan Ge

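AI summary: DermAgent is a self-reflective agentic system for dermatological image analysis, designed to address the insufficient domain knowledge and hallucinations of existing multimodal large language models in skin-disease diagnosis. The system integrates seven specialized vision and language modules and performs traceable diagnostic reasoning within a plan-execute-reflect framework, combining multi-tool collaborative reasoning with external evidence retrieval to improve diagnostic accuracy and reliability. Experiments show that DermAgent performs strongly on multiple dermatology benchmarks and significantly outperforms existing state-of-the-art models.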

Comments MICCAI2026 early acceptance

2605.14399 2026-05-15 cs.CV cs.GR

SceneForge: Structured World Supervision from 3D Interventions

Jizhizi Li, Jiayang Ao, Danny Wicks, Petru-Daniel Tudosiu

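AI summary: SceneForge is an intervention-driven framework built on editable 3D world states. It aims to generate structured supervision signals that remain consistent under scene edits, viewpoint changes, and scene-level interventions. The method applies explicit interventions (such as object removal or camera changes) and propagates their effects on scene structure and physical properties, producing aligned outputs that include counterfactual observations, multi-view observations, and effect-aware signals such as shadows and reflections. Experiments show that SceneForge improves object-removal and scene-removal performance in multi-task learning, providing a scalable supervision basis for intervention-consistent multimodal learning.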

2605.14396 2026-05-15 cs.CV cs.CR cs.LG cs.RO

Systematic Discovery of Semantic Attacks in Online Map Construction through Conditional Diffusion

Chenyi Wang, Ruoyu Song, Raymond Muller, Jean-Philippe Monteuuis, Jonathan Petit, Z. Berkay Celik, Ryan Gerdes, Ming F. Li

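AI summary: Autonomous vehicles rely on online high-definition map construction to perceive key road elements such as lane boundaries, dividers, and pedestrian crossings, which directly affect the safety of motion planning. This paper proposes MIRAGE, a framework that uses a conditional diffusion model to systematically discover semantic attacks, that is, plausible environmental changes such as cast shadows or wet road surfaces, which bypass adversarial defenses and degrade map predictions. Experiments show that MIRAGE's attacks remain effective against multiple defense mechanisms, and the generated scenes score 80-84% on realism, far higher than traditional pixel-level attack methods.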

2605.14393 2026-05-15 cs.CV

Analogical Trajectory Transfer

Junho Kim, Eun Sun Lee, Gwangtak Bae, Seunggu Kang, Young Min Kim

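AI summary: This paper studies analogical trajectory transfer: converting a motion trajectory in one 3D environment to another environment that is semantically similar but laid out differently, thereby enabling analogical spatial reasoning in machines. To handle the collisions and geometric distortion caused by differences in object position, scale, and layout between scenes, the authors propose a method based on scene clustering and hierarchical mapping prediction, which decomposes the problem and combines the sub-problem solutions to produce semantically consistent and spatially coherent transfers. The method requires no training, runs fast, and outperforms baselines based on large language models and scene-graph matching across several application scenarios.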

2605.14392 2026-05-15 cs.AI

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

Yucheng Shi, Zhenwen Liang, Kishan Panaganti, Dian Yu, Wenhao Yu, Haitao Mi

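AI summary: This work proposes self-evolving reinforcement learning via verifiable environment synthesis, in which a language model not only generates problems but also builds the environments used to train itself. The core idea is to generate executable environment objects that sample problems, compute reference solutions, and score responses, while ensuring that the environments exhibit a stable solve-verify asymmetry so that the reward signal remains informative. The approach is validated through the EvoEnv framework, which improves benchmark performance, suggesting that self-improvement depends on constructing environments whose difficulty stays beyond the model's own ability rather than on simply producing more synthetic data.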

Comments Tech report, work in progress

Abstract

We pursue a vision for self-improving language models in which the model does not merely generate problems or traces to imitate, but constructs the environments that train it. In zero-data reasoning RL, this reframes self-improvement from a data-generation loop into an environment-construction loop, where each artifact is a reusable executable object that samples instances, computes references, and scores responses. Whether this vision sustains improvement hinges on a single property: the environments must exhibit stable solve-verify asymmetry, meaning the model must be able to write an oracle once that it cannot reliably execute in natural language on fresh instances. This asymmetry takes two complementary forms. Some tasks are algorithmically hard to reason through but trivial as code: a dynamic program or graph traversal, compiled once, yields unboundedly many calibrated instances. Others are intrinsically hard to solve but easy to verify, like planted subset-sum or constraint satisfaction. Both create a durable gap between proposing and solving that the policy cannot close by gaming the verifier, and it is this gap that keeps reward informative as the learner improves. We instantiate this view in EvoEnv, a single-policy generator-solver method that synthesizes Python environments from ten seeds and admits them only after staged validation, semantic self-review, solver-relative difficulty calibration, and novelty checks. The strongest evidence comes from the already-strong regime: on Qwen3-4B-Thinking, fixed public-data RLVR and fixed hand-crafted environment RLVR reduce the average, while EvoEnv improves it from 72.4 to 74.8, a relative gain of 3.3%. Stable self-improvement, we suggest, depends not on producing more synthetic data, but on models learning to construct worlds whose difficulty stays structurally beyond their own reach.
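
To make the "reusable executable object that samples instances, computes references, and scores responses" concrete, here is a minimal sketch of a planted subset-sum environment. It is an illustration under stated assumptions, not EvoEnv's actual code: the class name `PlantedSubsetSumEnv`, its parameters, and the binary reward are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    numbers: list[int]
    target: int


class PlantedSubsetSumEnv:
    """Hypothetical environment: sample an instance, keep a reference, score answers."""

    def __init__(self, n: int = 12, max_value: int = 10_000, seed: int = 0):
        self.n = n
        self.max_value = max_value
        self.rng = random.Random(seed)

    def sample(self) -> tuple[Instance, list[int]]:
        """Draw numbers, plant a random non-empty subset, and use its sum as the target."""
        numbers = [self.rng.randint(1, self.max_value) for _ in range(self.n)]
        k = self.rng.randint(1, self.n)
        planted = sorted(self.rng.sample(range(self.n), k))
        target = sum(numbers[i] for i in planted)
        return Instance(numbers, target), planted

    def score(self, instance: Instance, answer: list[int]) -> float:
        """Cheap verification: indices must be valid, distinct, and sum to the target."""
        if not answer or any(not isinstance(i, int) for i in answer):
            return 0.0
        if len(set(answer)) != len(answer):
            return 0.0
        if any(i < 0 or i >= len(instance.numbers) for i in answer):
            return 0.0
        total = sum(instance.numbers[i] for i in answer)
        return 1.0 if total == instance.target else 0.0


env = PlantedSubsetSumEnv(seed=1)
inst, reference = env.sample()
print(env.score(inst, reference))  # the planted subset always verifies
```

The asymmetry shows up in the costs: `score` is linear in the answer length, whereas finding a valid subset without the planted reference is a subset-sum search.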

2605.14391 2026-05-15 cs.CV

Dual-Latent Collaborative Decoding for Fidelity-Perception Balanced Image Compression

Qi Mao, Zijian Wang, Zhengxue Cheng, Lingyu Zhu, Siwei Ma

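AI summary: This paper studies how to balance reconstruction fidelity and perceptual quality in image compression. Existing methods usually rely on a single latent representation to handle structural detail, semantic information, and perceptual priors at once, which creates conflicts between objectives. The authors propose MoDE, a dual-latent collaborative decoding framework that treats scalar-quantized and vector-quantized latents as a fidelity expert and a perception expert respectively, and introduces expert-specific enhancement and cross-expert modulation modules so the two decode cooperatively. Experiments show that the method achieves a better fidelity-perception balance over a wide range of bitrates.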

2605.14389 2026-05-15 cs.AI cs.CL cs.LG

Nexus : An Agentic Framework for Time Series Forecasting

Sarkar Snigdha Sarathi Das, Palash Goyal, Mihir Parmar, Nanyun Peng, Vishy Tirumalashetty, Chun-Liang Li, Rui Zhang, Jinsung Yoon, Tomas Pfister

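AI summary: Time series forecasting involves not only numerical extrapolation but also reasoning over unstructured text such as news and events. To address the insensitivity of existing time series foundation models (TSFMs) to textual signals and the uneven performance of large language models (LLMs) across domains, this paper proposes Nexus, a multi-agent forecasting framework that decomposes forecasting into stages such as identifying macro- and micro-level temporal fluctuations and integrating contextual information, enabling more flexible prediction. Experiments show that Nexus outperforms existing state-of-the-art models on data from multiple domains while producing high-quality reasoning traces that reveal the drivers behind each forecast, supporting the view that real-world time series forecasting is an agentic reasoning problem that goes beyond pure sequence modeling.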

Comments 30 Pages, 3 figures, 5 Tables

2605.14380 2026-05-15 cs.CL

Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation

Hoang-Thuy-Duong Vu, Quoc-Cuong Pham, Huy-Hieu Pham

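AI summary: This work addresses the data scarcity and class imbalance that hamper classification of psychological defense mechanisms (PDMs), proposing a method that combines context-aware synthetic augmentation with a hybrid classification model. By integrating contextual language representations, basic clinical features, and 150 annotated defense items, the method substantially improves classification performance on the PsyDefDetect shared task, reaching 58.26% accuracy and 24.62% macro-F1, outperforming existing methods and establishing a strong baseline for psychological defense classification in low-resource settings.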

2605.14379 2026-05-15 cs.LG cs.AI cs.GT cs.MA

Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games

JB Lanier, Nathan Monette, Pierre Baldi, Roy Fox

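AI summary: In imperfect-information games, sparse rewards and long-horizon exploration make it computationally very hard to find approximate equilibria of large competitive games such as StarCraft and Dota. This paper proposes Data-Augmented Game Starts (DAGS), a multi-agent initial-state sampling strategy that samples intermediate states from offline human expert demonstrations as starting points for reinforcement learning, accelerating the exploration of policy-gradient methods in two-player zero-sum games. Experiments show that DAGS significantly reduces exploitability under a fixed compute budget. The paper also shows that augmenting the initial-state distribution can bias the resulting equilibrium and proposes a simple, effective mitigation.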

Comments 17 pages, 4 figures. JB Lanier and Nathan Monette contributed equally

2605.14374 2026-05-15 cs.LG cs.AI math.OC

Optimal Pattern Detection Tree for Symbolic Rule-Based Classification

Young-Chae Hong, Yangho Chen

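AI summary: This paper proposes the Optimal Pattern Detection Tree (OPDT), a symbolic rule-based classification model built on mixed-integer programming that discovers a single optimal pattern in the data for binary classification. To incorporate prior knowledge and compliance requirements, the authors further introduce a branch structure constraint (BSC) framework that lets decision makers embed domain knowledge directly into the model. By maximizing coverage while minimizing the false-positive rate from misclassification, the method can find hidden patterns with optimality guarantees on moderately sized datasets in reasonable time.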

Comments Published in Transactions on Machine Learning Research (TMLR). 26 pages, 4 figures. OpenReview URL: https://openreview.net/forum?id=RJ6eMDcDCv

2605.14368 2026-05-15 cs.CL cs.AI

Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

Injin Kong, Hyoungjoon Lee, Yohan Jo

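AI summary: This paper studies how to introduce diffusion into a pretrained language model effectively and proposes DiHAL, a geometry-guided diffusion-Transformer hybrid. The method uses geometric features to assess how suitable each layer is, selects an appropriate hidden-state interface, and replaces the lower Transformer layers with a diffusion bridge while keeping the upper layers and the language-model head. Experiments show that, under the same training budget, geometry-scored hidden-state recovery outperforms conventional continuous diffusion methods, demonstrating the feasibility of diffusion replacement inside a language model.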

2605.14366 2026-05-15 cs.CL cs.LG

Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

Zeli Su, Ziyin Zhang, Zhou Liu, Xuexian Song, Zhankai Xu, Longfei Zheng, Xiaolu Zhang, Rong Fu, Guixian Xu, Wentao Zhang

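AI summary: This work examines how to avoid the "alignment tax" incurred when fine-tuning large language models for low-resource language expansion. The authors propose reinforcement learning with semantic rewards: Group Relative Policy Optimization (GRPO) is used to align semantics at the embedding level instead of maximizing likelihood, improving low-resource language performance while preserving the model's general capabilities. Experiments on Tibetan-Chinese machine translation and Tibetan news generation show that the method mitigates the alignment tax and yields higher-quality, more transferable generations.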

Comments ACL 2026 Findings

2605.14365 2026-05-15 cs.LG cs.AI

LoMETab: Beyond Rank-1 Ensembles for Tabular Deep Learning

Changryeol Choi, Hyewon Park, Yujin Kwon, Gowun Jeong

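AI summary: In tabular deep learning, the leading methods perform so similarly that no clear ranking emerges. This paper proposes LoMETab, a rank-$r$ implicit ensemble model that introduces an adjustable rank and initialization scale to increase member diversity and expressiveness. Experiments show that LoMETab effectively increases predictive disagreement between members and offers good controllability and performance on both classification and regression tasks.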

Abstract

Recent tabular learning benchmarks increasingly show a tight performance cluster rather than a clear hierarchy among leading methods, spanning gradient boosted decision trees, attention-based architectures, and implicit ensembles such as TabM. As benchmark gains plateau, a complementary goal is to understand and control the mechanisms that make simple neural tabular models competitive. We propose LoMETab, a rank-$r$ generalization of multiplicative implicit ensembles. LoMETab lifts the rank-1 BatchEnsemble/TabM modulation to a rank-$r$ identity-residual Hadamard family by parameterizing each member weight as $W_k = W \odot (1 + A_k B_k^\top)$, where $W$ is shared and $(A_k, B_k)$ are member-specific low-rank factors. This exposes two practical diversity-control axes: the adapter rank $r$ and the initialization scale $\sigma_{\mathrm{init}}$, and we prove that for $r \ge 2$ this generalization strictly enlarges BatchEnsemble's hypothesis class. Empirically, we show that this added capacity manifests as measurable predictive diversity after training: on representative classification datasets, LoMETab sustains higher pairwise KL than an additive low-rank ablation, and $(r, \sigma_{\mathrm{init}})$ provides broad control over pairwise KL, varying by up to several orders of magnitude across configurations. The induced diversity is reflected in task-appropriate output-level measures: argmax disagreement for classification and ambiguity for regression, indicating that the control extends beyond pairwise KL to decision- and output-level member variation. Finally, experiments sweeping over adapter rank $r$ and initialization scale $\sigma_{\mathrm{init}}$ reveal that predictive performance is dataset-dependent over the $(r, \sigma_{\mathrm{init}})$ grid, supporting LoMETab as a controllable family of implicit ensembles rather than a fixed rank-1 construction.
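
The member-weight formula $W_k = W \odot (1 + A_k B_k^\top)$ can be checked on small matrices. The sketch below uses plain Python lists and is only an illustration: the function names are hypothetical, and the Gaussian initialization with standard deviation `sigma_init` is an assumption, since the abstract names the initialization scale but not the scheme.

```python
import random


def member_weight(W, A, B):
    """Return W ⊙ (1 + A Bᵀ) for shared W (m×n), A (m×r), and B (n×r)."""
    m, n = len(W), len(W[0])
    r = len(A[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # (A Bᵀ)[i][j] is the dot product of row i of A and row j of B.
            low_rank = sum(A[i][t] * B[j][t] for t in range(r))
            row.append(W[i][j] * (1.0 + low_rank))
        out.append(row)
    return out


def init_factors(m, n, r, sigma_init, rng):
    """Assumed init: i.i.d. Gaussian entries with standard deviation sigma_init."""
    A = [[rng.gauss(0.0, sigma_init) for _ in range(r)] for _ in range(m)]
    B = [[rng.gauss(0.0, sigma_init) for _ in range(r)] for _ in range(n)]
    return A, B


rng = random.Random(0)
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # shared weight, 2×3
members = []
for _ in range(4):  # four ensemble members with rank-2 adapters
    A_k, B_k = init_factors(2, 3, r=2, sigma_init=0.1, rng=rng)
    members.append(member_weight(W, A_k, B_k))
```

With zero factors every member equals the shared $W$, which is the "identity-residual" part of the construction; larger `sigma_init` or rank `r` moves members further from $W$ and from each other.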

2605.14359 2026-05-15 cs.LG cs.AI

RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression

Zhengjia Zhong, Shuyan Ke, Zaizhou Lin, Jiaqi Song, Hongyi Lan, Hui Li

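AI summary: This paper proposes RQ-MoE, a residual quantization framework that combines a mixture of experts with a dual-stream quantization mechanism to achieve efficient vector compression that adapts to the input. The method removes the decoding bottleneck of existing dynamic quantization approaches, supports parallel decoding, and increases expressiveness. Experiments show that RQ-MoE achieves state-of-the-art or near state-of-the-art performance on reconstruction and retrieval tasks while decoding 6 to 14 times faster than prior methods.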

Comments To appear at ICML 2026

2605.14358 2026-05-15 cs.AI cs.LG

Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces

Sanjoy Chowdhury, Dinesh Manocha

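AI summary: This work asks how many steps of a language model's long reasoning chain are actually necessary for the final prediction. It defines the "minimal core" as the smallest subset of steps that preserves the final answer or predictive distribution, and introduces metrics such as compression ratio, redundancy, and step necessity. Reasoning traces turn out to be widely redundant: on average 46% of steps can be removed without changing the answer, and necessity is highly concentrated in a few steps. The study also shows that minimal cores reveal the geometric structure of reasoning more clearly and transfer well across models, offering a new perspective on the nature of language-model reasoning.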
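
As a rough illustration of an answer-preserving subset of steps, the sketch below greedily drops steps while a predictor's answer stays unchanged. It is not the paper's procedure: `greedy_core`, the toy predictor, and the "fraction removed" measure are hypothetical, and a single greedy pass is not guaranteed to find the smallest such subset.

```python
from typing import Callable, Sequence


def greedy_core(steps: Sequence[str], predict: Callable[[list[str]], str]) -> list[int]:
    """Drop steps one at a time while the prediction is unchanged.

    Returns the indices of the retained steps. One greedy pass gives an
    answer-preserving subset, not necessarily the globally smallest one.
    """
    full_answer = predict(list(steps))
    kept = list(range(len(steps)))
    for i in range(len(steps)):
        trial = [j for j in kept if j != i]
        if predict([steps[j] for j in trial]) == full_answer:
            kept = trial
    return kept


def toy_predict(steps: list[str]) -> str:
    """Toy stand-in for a model: the answer is the sum of all 'add N' steps."""
    total = 0
    for s in steps:
        if s.startswith("add "):
            total += int(s.split()[1])
    return str(total)


trace = ["restate the problem", "add 2", "double-check units", "add 5", "add 0"]
core = greedy_core(trace, toy_predict)
fraction_removed = 1 - len(core) / len(trace)
print(core, fraction_removed)
```

In the toy trace only the two steps that change the sum survive, so three of five steps are redundant for this predictor.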

2605.14352 2026-05-15 cs.CL

Ideology Prediction of German Political Texts

Sinclair Schneider, Florian Steuber, Joao A. G. Schneider, Gabi Dreo Rodosek

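AI summary: This paper studies ideology prediction for German political texts with Transformer-based models, mapping a text's political orientation onto a continuous spectrum from -1 to 1. The authors build four corpora from different sources: plenary records of the German Bundestag, the online voting-advice tool Wahl-O-Mat, 33 newspapers of differing political leanings, and tweets by members of parliament. Comparing several pretrained models, they find that DeBERTa-large and Gemma2-2B perform best on different datasets. The results indicate that model architecture and the availability of domain-specific data strongly influence political-bias estimation.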

Comments This paper has been accepted for the upcoming 20th International AAAI Conference on Web and Social Media (ICWSM 2026)

Abstract

Elections represent a crucial milestone in a nation's ongoing development. To better understand the political rhetoric from various movements, ranging from left to right, we propose a transformer-based model capable of projecting the political orientation of a text on a continuous left-to-right spectrum, represented by a normalized scalar d between -1 and 1. This approach enables analysts to focus on specific segments of the political landscape, such as conservatives, while excluding liberal and far-right movements. With multiclass classifiers, such a task can be achieved only if the desired orientation coincides with one of their predefined classes. To determine the most suitable foundation model among 13 candidate transformers for this task, we constructed four distinct corpora. One corpus comprised annotated plenary notes from the German Bundestag, while another was based on an official online decision-making tool, Wahl-O-Mat. The third corpus consisted of articles from 33 newspapers, each identified by its political orientation, and the fourth included 535,200 tweets from 597 members of the 20th and 21st German Bundestag. To mitigate overfitting, we used two distinct corpora for training and two for testing. For in-domain performance, DeBERTa-large achieved the highest F1 score (F1 = 0.844), and it also led on the X (Twitter) out-of-domain test (ACC = 0.864). On the newspaper out-of-domain test, Gemma2-2B excelled (MAE = 0.172). This study demonstrates that transformer models can recognize political framing in German news at the level of public opinion polls. Our findings suggest that both the model architecture and the availability of domain-specific training data can be as influential as model size for estimating political bias. We discuss methodological limitations and outline directions for improving the robustness of bias measurement.

2605.14350 2026-05-15 cs.LG

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

Nicholas E. Corrado, Wenyuan Huang, Josiah P. Hanna

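AI summary: Multi-task reinforcement learning trains a single agent to perform well on many tasks at once, but jointly optimizing all tasks often leads to imbalanced learning, with easy tasks learned quickly and hard tasks progressing slowly. This paper proposes DRATS, an adaptive task-sampling method that dynamically prioritizes the tasks that are hardest to complete, addressing the uneven allocation of data. The method formulates multi-task learning as a feasibility problem and optimizes a minimax objective that minimizes the worst-task return gap, achieving higher data efficiency and better worst-task performance on several benchmarks.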
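
One simple way to prioritize the hardest tasks is to sample each task with probability that grows with its current return gap. The sketch below uses a softmax over gaps; it illustrates the general idea only and is not DRATS itself. The function names, the temperature parameter, and the gap values are assumptions.

```python
import math
import random


def sampling_probs(gaps: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Softmax over per-task return gaps: a larger gap means more sampling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(gaps.values())  # subtract the max for numerical stability
    weights = {t: math.exp((g - peak) / temperature) for t, g in gaps.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def sample_task(gaps: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw the next training task according to the gap-based distribution."""
    probs = sampling_probs(gaps, temperature)
    tasks = list(probs)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


# Hypothetical gaps: target return minus current return, per task.
gaps = {"reach": 0.1, "push": 0.4, "stack": 0.9}
rng = random.Random(0)
next_task = sample_task(gaps, temperature=0.1, rng=rng)
```

As the temperature approaches zero the distribution concentrates on the task with the largest gap, which matches a pure worst-case (minimax) emphasis; higher temperatures move toward uniform sampling.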

2605.14346 2026-05-15 cs.CV

Learning with Semantic Priors: Stabilizing Point-Supervised Infrared Small Target Detection via Hierarchical Knowledge Distillation

Yuanhang Yao, Ping Qian, Zhu Liu, Long Ma, Weimin Wang

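AI summary: This paper studies how to stabilize point-supervised infrared small target detection. To address the pseudo-label noise and training instability caused by the limited semantic capacity of lightweight detectors, it proposes a hierarchical knowledge distillation framework based on a vision foundation model (VFM). The method uses a bi-level optimization process together with semantic-conditioned affine modulation (SCAM) and a dynamic collaborative learning strategy to improve detection accuracy and training stability. Experiments show significant improvements across a range of infrared small target detection models.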

2605.14343 2026-05-15 cs.LG math.ST stat.ML stat.TH

Nearest-Neighbor Radii under Dependent Sampling

Yuanyuan Gao, Yilong Hou, Zhexiao Lin

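AI summary: This paper studies the neighborhood radii of nearest-neighbor methods under dependent sampling, moving beyond the usual independent-sampling assumption. Analyzing strongly mixing dependent observations, it establishes almost-sure convergence under polynomial mixing and sharp non-asymptotic moment bounds under geometric mixing. These bounds depend on the local intrinsic dimension rather than the ambient dimension and therefore apply to high-dimensional manifold data. Experiments confirm the theory, showing that nearest-neighbor geometry remains informative even under dependent sampling.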

Comments 33 pages

2605.14341 2026-05-15 cs.CV

AnyBand-Diff: A Unified Remote Sensing Image Generation and Band Repair Framework with Spectral Priors

Zuopeng Zhao, Ying Liu, Xiaoyu Li, Su Luo, Lu Li, Wenwen Liu

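AI summary: This paper proposes AnyBand-Diff, a unified framework for remote sensing image generation and band repair, aimed at the spectral distortion and radiometric inconsistency that arise when existing diffusion models ignore physical laws. The method introduces a diffusion architecture with spectral priors, combined with a dual random masking strategy and a physics-guided sampling mechanism, so that complete spectral information can be recovered from any subset of bands while the generated images remain radiometrically consistent. Experiments show that AnyBand-Diff performs well in generating reliable remote sensing images and in high-accuracy spectral reconstruction, pointing to new uses of physics-aware generative models in Earth observation.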

2605.14340 2026-05-15 cs.SD

Refining Pseudo-Audio Prompts with Speech-Text Alignment for Text-Only Domain Adaptation in LLM-Based ASR

Ryo Magoshi, Takashi Maekaku, Yusuke Shinohara

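AI summary: LLM-based automatic speech recognition systems achieve good performance by connecting an audio encoder to a large language model, but their adaptation to new domains is limited by the lack of paired speech and text data. This paper proposes a framework that explicitly models speech-text alignment to generate more expressive pseudo-audio prompts, narrowing the gap between modalities and improving adaptation to the target domain. Experiments show that the method outperforms existing text-only adaptation methods in both overall error rate and out-of-vocabulary word coverage.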

Comments Submitted to Interspeech 2026

2605.14337 2026-05-15 cs.CV

IG-Diff: Complex Night Scene Restoration with Illumination-Guided Diffusion Model

Yifan Chen, Fei Yin, Chunle Guo, Chongyi Li, Yujiu Yang

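AI summary: Image restoration in complex night scenes is challenging because of insufficient illumination and multiple coexisting degradations. This paper proposes an illumination-guided diffusion model (IG-Diff) whose illumination guidance module improves restoration in low-light scenes with several degradations present at once. The authors also build a dataset of complex night scenes covering multiple degradation types, providing a resource for related research.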

Comments Accepted by CGI-2025

2605.14333 2026-05-15 cs.CV

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

Yang Yue, Fangyun Wei, Tianyu He, Jinjing Zhao, Zanlin Ni, Zeyu Liu, Jiayi Guo, Lei Shi, Yue Dong, Li Chen, Ji Li, Gao Huang, Dong Chen

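AI summary: This paper studies how to improve the quality of generated text and faces in autoregressive image generation based on discrete tokenization. The authors observe that conventional tokenizers lose fine-grained structure through aggressive downsampling and quantization, making it hard to preserve legible text and sharp facial features. They propose InsightTok, which introduces localized, content-aware perceptual losses that raise text and face fidelity and significantly outperform existing tokenizers without sacrificing overall reconstruction quality. Used in the autoregressive image generation model InsightAR, the tokenizer yields images with clearer text and more realistic facial detail.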

Comments Code and checkpoints are available at https://github.com/LeapLabTHU/InsightTok

2605.14327 2026-05-15 cs.LG cs.AI

AIM-DDI: A Model-Agnostic Multimodal Integration Module for Drug-Drug Interaction Prediction

Yerin Park, Sangseon Lee

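AI summary: Drug-drug interaction (DDI) prediction is important in computational biomedicine, but accurate prediction for drugs unseen during training remains a key challenge. This paper proposes AIM-DDI, a model-agnostic multimodal integration module that maps heterogeneous drug information (structural, chemical, and semantic) into a shared latent space and models cross-modal dependencies with a unified fusion module, so it can be plugged into different DDI prediction architectures. Experiments show that AIM-DDI improves prediction performance across several DDI models on the DrugBank dataset, especially in the hardest setting where neither drug appeared during training.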

2605.14326 2026-05-15 cs.CV

D2-CDIG: Controlled Diffusion Remote Sensing Image Generation with Dual Priors of DEM and Cloud-Fog

Zuopeng Zhao, Ying Liu, Kanyaphakphachsorn Pharksuwan, Su Luo, Xiaoyu Li, Maocai Ning

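AI summary: This paper proposes D2-CDIG, a controlled diffusion framework for remote sensing image generation, aimed at the limited accuracy and naturalness of existing methods under complex terrain and atmospheric conditions. The method fuses a digital elevation model (DEM) and cloud-fog information as dual priors, enabling precise control over surface features and atmospheric phenomena, and adds an adjustable cloud-fog slider for flexible control of cloud thickness and distribution. Experiments show that D2-CDIG significantly improves image quality, richness of detail, and realism over traditional methods, providing high-quality data for training remote sensing foundation models and for downstream tasks.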

2605.14323 2026-05-15 cs.LG cs.AI cs.CL

Dynamic Latent Routing

Fangyuan Yu, Xin Su, Amir Abdullah

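AI summary: This paper studies the temporal stitching of sub-policies in Markov decision processes (MDPs) with time-varying reward functions. The authors propose a generalized Dijkstra search (GDS) and prove that temporally composing intermediate optimal sub-policies recovers the globally optimal goal-reaching policy. Building on GDS's "search, select, update" principle, they further propose Dynamic Latent Routing (DLR), which jointly learns discrete latent codes, a routing policy, and model parameters in a single training stage. Experiments show that in low-data fine-tuning settings DLR performs well across multiple datasets and models and outperforms conventional supervised fine-tuning.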

2605.14318 2026-05-15 cs.AI cs.LG

Semantic Feature Segmentation for Interpretable Predictive Maintenance in Complex Systems

Emilio Mastriani, Alessandro Costa, Federico Incardona, Kevin Munari, Sebastiano Spinello

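AI summary: This paper studies interpretable predictive maintenance in complex systems. Because heterogeneous and redundant monitored variables obscure fault information and reduce model interpretability, it proposes a semantic feature segmentation framework. The method decomposes the monitored feature space into a canonical component that retains the dominant predictive information and a residual component that contains structurally peripheral signals, and it defines functional groups from domain knowledge to reflect the system's operating mechanisms. Experiments show that the canonical component outperforms the residual component and conventional methods in both predictive risk and structural stability, combining predictive performance with semantic interpretability.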

Comments 18 pages, 7 figures. Under review at Neural Computing and Applications. Keywords: semantic segmentation, change point detection, fault anticipation

Abstract

Predictive maintenance in complex systems is often complicated by the heterogeneity and redundancy of monitored variables, which can obscure fault-relevant information and reduce model interpretability. This work proposes a semantic feature segmentation framework that decomposes the monitored feature space into a canonical component, expected to retain the dominant predictive information, and a residual component containing structurally peripheral signals. The segmentation is defined through domain-informed criteria and organizes monitoring variables into functional groups reflecting operational mechanisms such as throughput, latency, pressure, network activity, and structural state. To evaluate the effectiveness of this decomposition, we adopt a predictive perspective in which expected predictive risk is used as an operational proxy for task-relevant information. Experimental results obtained through time-aware cross-validation show that the canonical space consistently achieves lower predictive risk than the residual space across multiple temporal configurations, indicating that the semantic segmentation concentrates the most relevant information for fault anticipation. In addition, the canonical segments exhibit significantly stronger intra-segment coherence than inter-segment dependence, and this structural organization remains stable after redundancy reduction. When compared with the full feature space and with a Principal Component Analysis (PCA) representation, the canonical space achieves comparable predictive performance while preserving the semantic meaning of the original variables. These findings suggest that semantic feature segmentation provides an interpretable and information-preserving decomposition of monitoring signals, enabling competitive predictive performance without sacrificing the operational interpretability required in predictive maintenance applications.

2605.14317 2026-05-15 cs.LG physics.ao-ph

Guided Diffusion Sampling for Precipitation Forecast Interventions

Ayumu Ueyama, Kazuhiko Kawamoto, Hiroshi Kera

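AI summary: This paper studies how data-driven weather forecasting models can be used to design interventions on extreme precipitation that reduce its harmful impact. The authors propose a gradient-guided diffusion sampling method that steers the sampling trajectory of a diffusion-based weather forecasting model, reducing precipitation while staying consistent with the distribution of atmospheric states. The physical plausibility of the interventions is assessed from three angles: vertical structure, deviation of the latent-space trajectory, and cross-model transferability. Experiments show that the method reduces extreme precipitation more effectively than adversarial perturbation methods.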

Comments 12+7 pages, 7+2 figures

2605.14315 2026-05-15 cs.CV

TurboVGGT: Fast Visual Geometry Reconstruction with Adaptive Alternating Attention

David Huang, Guile Wu, Chengjie Huang, Bingbing Liu, Dongfeng Bai

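AI summary: This paper proposes TurboVGGT, a method for fast multi-view 3D reconstruction. It uses a visual geometry Transformer with adaptive alternating attention, substantially improving computational efficiency while maintaining reconstruction quality. By combining adaptive sparse global attention with intra-frame attention, TurboVGGT captures both global relations across frames and local detail within each frame. Experiments show strong results on several 3D reconstruction benchmarks, with both speed and accuracy.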

Comments Technical Report

2605.14310 2026-05-15 cs.CV

CoRDS: Coreset-based Representative and Diverse Selection for Streaming Video Understanding

Ailar Mahdizadeh, Puria Azadi, Muchen Li, Xiangteng He, Leonid Sigal

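AI summary: In streaming video understanding, efficiently compressing the key-value (KV) cache of vision-language models to support long-horizon reasoning is an important problem. This paper treats KV cache compression as a coreset selection problem and proposes a method based on geometric coverage and diversity optimization that jointly optimizes representations in key and value space, preserving both retrieval structure and output-relevant information. The method introduces an orthogonality-driven diversity criterion to increase the diversity of the retained cache subset, and experiments show that it outperforms conventional heuristic compression methods on multiple open-source models and video benchmarks.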
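
For readers unfamiliar with coreset selection by geometric coverage, the sketch below shows the classic greedy k-center (farthest-point) heuristic applied to a small set of vectors. It is a generic baseline shown for orientation, not the CoRDS algorithm: it looks at a single vector space only and omits the joint key-value objective and the orthogonality-driven diversity criterion. The function name and the toy points are hypothetical.

```python
import math


def farthest_point_selection(points, budget):
    """Greedy k-center: repeatedly add the point farthest from the selected set.

    Returns the indices of the selected points, in selection order.
    """
    if budget <= 0 or not points:
        return []
    selected = [0]  # start from the first point (an arbitrary choice)
    # dist[i] is the distance from point i to its nearest selected point.
    dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < min(budget, len(points)):
        nxt = max(range(len(points)), key=dist.__getitem__)
        if dist[nxt] == 0.0:
            break  # every remaining point duplicates a selected one
        selected.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return selected


# Toy "cache entries": two tight clusters and one point in between.
keys = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (5.0, 0.0)]
kept = farthest_point_selection(keys, budget=3)
print(kept)
```

With a budget of three, the heuristic keeps one entry from each cluster plus the in-between point, so near-duplicate entries are dropped first.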