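arXivDaily: a daily digest of new arXiv papers that mirrors the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.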

2605.06047 2026-05-12 cs.LG cs.AI

TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models

Duong Nguyen, Mohammed Jawhar, Nicolas Chesneau

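AI summary: Tabular foundation models (TFMs) perform well on zero-shot learning tasks, but their inductive biases are fixed at inference time, which makes them hard to adapt to a specific task or dataset. This paper proposes TFM-Retouche, a lightweight input-space adapter that is trained on top of a frozen TFM backbone without modifying the model architecture. It learns small residual corrections in input space that align the data with the pretrained model's inductive biases. Experiments show that the method substantially improves performance across multiple tasks while striking a good balance between computational efficiency and predictive quality.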
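
To make the idea of an input-space adapter concrete, here is a minimal Python sketch under strong simplifying assumptions. The "frozen backbone" is a hypothetical fixed linear scorer, and the adapter is a learned per-feature shift added to the input. This residual correction starts at zero, so the frozen model's behavior is unchanged at the outset. Gradients come from central finite differences rather than backpropagation, and an L2 penalty keeps the correction small. The names `frozen_model`, `ResidualInputAdapter`, and `fit` are illustrative and not taken from the paper.

```python
class ResidualInputAdapter:
    """Learns a small additive correction x -> x + delta; the backbone is never touched."""

    def __init__(self, dim):
        # Zero init: at the start the adapter is the identity map.
        self.delta = [0.0] * dim

    def __call__(self, x):
        return [xi + di for xi, di in zip(x, self.delta)]


def frozen_model(x):
    # Hypothetical frozen backbone: a fixed linear scorer that is never updated.
    return 2.0 * x[0] - 1.0 * x[1]


def mse(adapter, data):
    return sum((frozen_model(adapter(x)) - y) ** 2 for x, y in data) / len(data)


def fit(adapter, data, lr=0.05, steps=200, l2=1e-3, eps=1e-5):
    """Gradient descent on the adapter only; the L2 term keeps delta small."""
    for _ in range(steps):
        grads = []
        for i, d in enumerate(adapter.delta):
            adapter.delta[i] = d + eps
            up = mse(adapter, data)
            adapter.delta[i] = d - eps
            down = mse(adapter, data)
            adapter.delta[i] = d  # restore before probing the next coordinate
            grads.append((up - down) / (2 * eps) + 2 * l2 * d)
        adapter.delta = [d - lr * g for d, g in zip(adapter.delta, grads)]
    return adapter


# Toy data whose targets are offset by +1 from what the frozen model predicts.
data = [([1.0, 0.0], 3.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 2.0)]
adapter = ResidualInputAdapter(dim=2)
before = mse(adapter, data)
fit(adapter, data)
after = mse(adapter, data)
```

Only `delta` changes during training. The mismatch between data and backbone is therefore absorbed entirely in input space.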

2605.05611 2026-05-12 cs.SD cs.AI eess.AS

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

Rixi Xu, Qingyu Liu, Haitao Li, Yushen Chen, Zhikang Niu, Yunting Yang, Jian Zhao, Ke Li, Berrak Sisman, Qinyuan Cheng, Xipeng Qiu, Kai Yu, Xie Chen

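AI summary: This paper presents X-Voice, a 0.4B-parameter multilingual zero-shot voice-cloning model that lets users clone any voice and have it speak 30 languages. The model is trained on a 420,000-hour multilingual corpus and uses the International Phonetic Alphabet (IPA) as a unified representation. A two-stage training framework achieves zero-shot cloning without complex preprocessing. Building on the F5-TTS architecture, X-Voice injects language identifiers at two levels and uses a decoupled schedule for classifier-free guidance. It outperforms existing systems in both subjective and objective evaluations and achieves cross-lingual cloning ability comparable to models at the 10-billion-parameter scale.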

Comments 16 pages, 4 figures, 9 tables

2605.05103 2026-05-12 cs.CL cs.AI cs.CY

Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement

Nicholas S. Kersting, Vittorio Castelli, Chieh Ting Yeh, Xinzhu Wang, Saad Taame

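AI summary: This paper proposes "concept fields", a new method that measures semantic change between sentences in a text corpus and uses it to detect hallucination and novelty in generated text. From the differences between adjacent sentences in a sentence-embedding space, the method builds a local drift field and estimates pointwise uncertainty. It then evaluates generated content by measuring how well a candidate sentence transition matches that field. The work introduces a vector sequence database (VSDB) for efficient computation. It validates the method on domains as different as federal regulations and literary works, showing stability and interpretability across domains.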

Comments 25 pages, 8 figures
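
The core scoring step can be illustrated with a small Python sketch, assuming sentence embeddings are already computed (here, toy 2-D vectors). Each pair of adjacent sentences in the corpus is treated as a sample of the local drift. The sketch averages the drifts of the k transitions whose starting sentence lies nearest to the candidate's previous sentence, then scores the candidate transition by its cosine similarity with that average. Brute-force nearest-neighbor search stands in for the paper's VSDB, pointwise uncertainty is omitted, and all function names are hypothetical.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def cosine(a, b):
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def local_drift(corpus, point, k=2):
    """Average step vector of the k corpus transitions that start closest to `point`."""
    transitions = [
        (seq[i], sub(seq[i + 1], seq[i]))
        for seq in corpus  # each seq is one document's sentence embeddings, in order
        for i in range(len(seq) - 1)
    ]
    transitions.sort(key=lambda t: norm(sub(t[0], point)))
    nearest = transitions[:k]
    return [sum(step[j] for _, step in nearest) / len(nearest) for j in range(len(point))]


def transition_score(corpus, prev_emb, next_emb, k=2):
    """Near +1: the candidate moves with the corpus drift; near 0 or below: it departs from it."""
    return cosine(sub(next_emb, prev_emb), local_drift(corpus, prev_emb, k))


corpus = [[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
          [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]]
print(transition_score(corpus, [0.0, 0.5], [1.0, 0.5]))  # follows the drift
print(transition_score(corpus, [0.0, 0.5], [0.0, 1.5]))  # orthogonal to the drift
```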

2605.04899 2026-05-12 cs.LG

A geometric relation of the error introduced by sampling a language model's output distribution to its internal state

Albert F. Modenbach

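AI summary: This paper studies the geometric relationship between the error a language model introduces when it samples from its output distribution and the model's internal state. By analyzing the geometry of token embeddings, the author derives a 1-form related to the Lie algebra $\mathfrak{so}(n)$ and finds that its curvature is semantically meaningful. In a chess reasoning task, this curvature interacts with the model's world model. This interaction reveals how the model internally represents the problem in terms of board region and piece importance.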

Comments 12 Pages, 10 Figures, 2 Appendices. To appear in Proceedings of ICML 2026

2605.04827 2026-05-12 cs.LG

Trustworthy Federated Label Distribution Learning under Annotation Quality Disparity

Junxiang Wu, Zhiqiang Kou, Hongwei Zeng, Wenke Huang, Biao Liu, Hanlin Gu, Yuheng Jia, Di Jiang, Yang Liu, Xin Geng

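AI summary: This paper studies trustworthy federated label distribution learning (Fed-LDL) when annotation quality differs across clients. It proposes FedQual, a quality-aware framework that uses global semantic anchors to guide adaptive training on each client and re-weights aggregation on the server according to client reliability. To evaluate the approach, the authors build four new Fed-LDL benchmark datasets. They also prove that client-specific calibration outperforms uniform calibration, and experiments further confirm FedQual's effectiveness.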
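
Server-side reliability re-weighting can be pictured as a weighted average of client parameters. The sketch below assumes each client already has a non-negative reliability score. How FedQual actually estimates reliability is not described here, so the scores and the function name are hypothetical.

```python
def reliability_weighted_average(client_params, reliabilities):
    """Aggregate flat parameter vectors, weighting each client by its reliability score."""
    if len(client_params) != len(reliabilities):
        raise ValueError("need one reliability score per client")
    if any(r < 0 for r in reliabilities):
        raise ValueError("reliability scores must be non-negative")
    total = sum(reliabilities)
    if total == 0:
        raise ValueError("at least one client needs positive reliability")
    dim = len(client_params[0])
    return [
        sum(r * params[j] for params, r in zip(client_params, reliabilities)) / total
        for j in range(dim)
    ]


# Two clients; the second is judged three times as reliable as the first.
global_params = reliability_weighted_average([[1.0, 1.0], [3.0, 3.0]], [1.0, 3.0])
```

With equal scores this reduces to a plain average of the client parameters.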

2605.04738 2026-05-12 cs.LG

OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization

Zhikai Li, Zhen Dong, Xuewen Liu, Jing Zhang, Qingyi Gu

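AI summary: Large language models (LLMs) have enormous parameter counts, which makes inference resource-intensive and slow. To address this, the paper proposes OSAQ, which exploits the low-rank structure of the Hessian to construct an additive weight transformation. This transformation suppresses systematic outliers in the weights and improves model performance under low-bit quantization. The method requires no inter-layer transformations, adds no inference overhead, and has an efficient closed-form solution. In experiments, it substantially improves results under 2-bit quantization.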

Comments ICML 2026
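
The reason weight outliers hurt low-bit quantization shows up even with plain per-tensor symmetric round-to-nearest quantization. This is a generic baseline, not OSAQ itself. A single large weight stretches the quantization step, so the ordinary weights collapse onto very few levels. The 4-bit setting and the example weights below are arbitrary choices for illustration.

```python
def quantize_dequantize(weights, bits=4):
    """Per-tensor symmetric round-to-nearest quantization, followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4 bits; the integer grid is -qmax..qmax
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0.0:
        return list(weights)
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


normal = [0.1, -0.2, 0.15, -0.05]
err_clean = mean_abs_error(normal, quantize_dequantize(normal))
# Same four weights plus one outlier; measure the error on the original four only.
err_outlier = mean_abs_error(normal, quantize_dequantize(normal + [4.0])[:4])
```

With the outlier present, every ordinary weight in this example rounds to zero. That collapse is the failure mode that outlier-suppression methods target.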

2605.04671 2026-05-12 cs.LG

ITBoost: Information-Theoretic Trust for Robust Boosting

Ye Su, Longlong Zhao, Diego Garcia-Gil, Jipeng Guo, Gangchun Zhang, Jinxin Chen, Jinsong Chen

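AI summary: Gradient boosting excels at learning from tabular data but degrades under label noise. The main reason is that it concentrates on samples with large gradients without asking whether those errors come from hard-to-learn samples or from unreliable labels. The paper proposes ITBoost, an information-theoretic trust mechanism that tracks how each sample's residual evolves across iterations. It assesses sample reliability with the minimum description length (MDL) principle and down-weights unreliable samples whose residuals fluctuate strongly. Theoretical analysis shows that ITBoost has a tighter generalization bound under label noise. Experiments on multiple tabular datasets show greater robustness than mainstream boosting and deep models, with strong performance retained on clean data.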
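
The intuition behind trust weighting from residual trajectories can be sketched in Python. This is a deliberately simple stand-in, not ITBoost's MDL criterion. Volatility is measured as the mean absolute change in a sample's residual between consecutive boosting iterations, and the weight decays exponentially with it. The `temperature` parameter and the function name are assumptions.

```python
import math


def trajectory_weights(residual_histories, temperature=1.0):
    """Map each sample's residual trajectory to a weight in (0, 1].

    residual_histories[i] lists sample i's residual at successive boosting iterations.
    Samples whose residual jumps around get smaller weights; steady ones stay near 1.
    """
    weights = []
    for history in residual_histories:
        steps = [abs(b - a) for a, b in zip(history, history[1:])]
        volatility = sum(steps) / len(steps) if steps else 0.0
        weights.append(math.exp(-volatility / temperature))
    return weights


histories = [
    [1.0, 0.8, 0.6, 0.4],    # hard but learnable: residual shrinks steadily
    [1.0, -1.0, 1.0, -1.0],  # fluctuating: a plausible sign of a noisy label
    [0.3, 0.3, 0.3],         # unchanged residual
]
weights = trajectory_weights(histories)
```

In a boosting loop, such weights would scale each sample's gradient contribution in the next round.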

2605.04665 2026-05-12 cs.CL

Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs

Aofan Liu, Jingxiang Meng

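AI summary: When a request is rephrased in a semantically equivalent way, does a large language model still answer in the format the original task requires? The study finds that models often fail to keep their output format consistent, even at temperature zero. The paper names this "paraphrase-induced output-mode collapse". Under closed-form prompts, a semantically equivalent prompt variant can push the model from a concise output format to verbose conversational text, which causes automatic evaluators to misjudge the answer. The authors build the PARACONSIST benchmark to measure output consistency and semantic stability across prompt variants. They find that task structure, rather than model identity, is the main predictor of collapse.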

Comments Added a footnote; author order is alphabetical by last name

2605.04274 2026-05-12 cs.LG cs.AI stat.ML

A Mean Curvature Approach to Boundary Detection: Geometric Insights for Unsupervised Learning

Alexandre L. M. Levada

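AI summary: This paper proposes Mean Curvature Boundary Points (MCBP), a boundary detection method based on mean curvature for unsupervised learning on high-dimensional data. The method estimates a discrete approximation of the shape operator from local k-nearest-neighbor neighborhoods and so directly models the intrinsic curvature of the data manifold. Each point's mean curvature can then be computed without an explicit parametrization and used as a principled descriptor of boundary structure. The study shows that high-curvature regions correspond to cluster transitions, geometric irregularities, and low-density interfaces. It introduces an adaptive percentile-based thresholding strategy for multiscale boundary extraction. It also proposes a curvature-based data decomposition that improves cluster separability and the robustness of downstream algorithms. Experiments show that MCBP substantially improves clustering performance on synthetic and real datasets, especially in complex high-dimensional settings.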

Comments 30 pages, 6 tables, 8 figures

Abstract

Accurate boundary detection in high-dimensional data remains a central challenge in unsupervised learning, particularly in the presence of non-linear structures and heterogeneous densities. In this work, we introduce Mean Curvature Boundary Points (MCBP), a novel geometric framework grounded in Geometric Machine Learning that departs from traditional density-based approaches by explicitly modeling the intrinsic curvature of the data manifold. The method relies on a discrete approximation of the shape operator, estimated from local k-nearest neighbor patches, to compute pointwise mean curvature without requiring explicit manifold parametrization. The key insight of MCBP is to use mean curvature as a principled descriptor of boundary structure: high-curvature regions naturally correspond to transitions between clusters, geometric irregularities, and low-density interfaces. This yields a unified geometric interpretation of boundary, outlier, and transition points. We further introduce an adaptive percentile-based thresholding scheme that enables multiscale boundary extraction without relying on ad hoc density parameters. Beyond detection, we propose a curvature-driven data decomposition that separates samples into smooth (low-curvature) and boundary (high-curvature) subsets, effectively acting as a non-linear geometric filtering mechanism. This representation enhances cluster separability and improves the robustness of downstream unsupervised algorithms. Extensive experiments on synthetic and real-world datasets demonstrate that MCBP consistently improves clustering performance, particularly in complex and high-dimensional scenarios. These results position MCBP as a concrete contribution to Geometric Machine Learning, highlighting the potential of curvature-aware analysis as a unifying paradigm bridging differential geometry and data-driven modeling.
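
The adaptive percentile thresholding and the smooth/boundary decomposition can be sketched as follows, assuming per-point mean-curvature scores have already been estimated. The shape-operator estimation from k-NN patches is not shown. The threshold is the p-th percentile of the scores with linear interpolation, and sweeping p yields boundaries at several scales. Function names and example scores are illustrative.

```python
import math


def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between closest ranks."""
    s = sorted(values)
    if not s:
        raise ValueError("need at least one value")
    pos = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def split_by_curvature(curvatures, p=90.0):
    """Return indices of smooth (low-curvature) and boundary (high-curvature) points."""
    tau = percentile(curvatures, p)
    smooth = [i for i, c in enumerate(curvatures) if c <= tau]
    boundary = [i for i, c in enumerate(curvatures) if c > tau]
    return smooth, boundary, tau


curv = [0.10, 0.20, 0.10, 0.15, 2.00, 0.12, 0.18, 0.11, 0.14, 1.50]
for p in (70.0, 80.0, 90.0):  # multiscale: stricter thresholds keep fewer boundary points
    smooth, boundary, tau = split_by_curvature(curv, p)
    print(p, boundary)
```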

2605.04078 2026-05-12 cs.LG cs.AI

Validity-Calibrated Reasoning Distillation

Khouloud Saadi, Di Wang

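AI summary: The study proposes validity-calibrated reasoning distillation, a method for transferring the multi-step reasoning ability of large language models to small models. Conventional approaches rely on a fixed teacher-student structure and imitation of reasoning paths. This method instead treats reasoning distillation as a problem of allocating local learning signals. It compares the student's and the teacher's next steps under the same prefix and uses their relative validity to scale the strength of each distillation update dynamically. The result is a more flexible guidance mechanism that better matches the actual reasoning process. Experiments show that the method outperforms existing distillation baselines on mathematical reasoning, code generation, and instruction following.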
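
One way to picture "scaling each distillation update by relative validity" is the per-step weighting below. It assumes some external scorer has already assigned each teacher step and each student step a validity score in [0, 1]. The clipped-difference rule and the function names are hypothetical choices for illustration, not the paper's formula.

```python
def calibrated_weights(teacher_validity, student_validity):
    """Per-step weight: how much more valid the teacher's next step is, clipped to [0, 1].

    If the student's own step is already at least as valid, the step gets weight 0
    and the student is not pushed toward the teacher there.
    """
    return [min(1.0, max(0.0, t - s)) for t, s in zip(teacher_validity, student_validity)]


def weighted_distill_loss(step_losses, weights):
    """Weighted mean of per-step distillation losses (0.0 if every weight is zero)."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * loss_i for w, loss_i in zip(weights, step_losses)) / total


teacher = [0.9, 0.5, 0.2]
student = [0.1, 0.5, 0.8]
weights = calibrated_weights(teacher, student)
loss = weighted_distill_loss([2.0, 5.0, 7.0], weights)
```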

2605.03799 2026-05-12 cs.CL

Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF

Mullosharaf K. Arabov

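AI summary: This is a systematic, research-oriented practical guide to the full modern natural language processing pipeline. It covers tokenization, vectorization, fine-tuning of large language models, retrieval-augmented generation, and reinforcement learning from human feedback. It pays particular attention to low-resource, morphologically rich languages such as Tajik and Tatar. It also presents original research artifacts, including subword tokenizers, word embeddings, lexical databases, and transliteration benchmarks, showing how to do rigorous NLP when data is scarce. The book combines theory with detailed implementation recipes and emphasizes reproducibility by requiring that each chapter's code, models, and reports be released publicly. It also advocates open-source models over commercial APIs. It is aimed at senior undergraduates, graduate students, and developers who want to move from classical machine learning methods to state-of-the-art LLM systems.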

Comments 136 pages, 12 practical works, preprint. Textbook for senior undergraduates and graduate students. Original contributions on low-resource languages (Tajik, Tatar and others). Companion repository available

2605.03456 2026-05-12 cs.CV

VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection

Chih-Chung Liu, Zhiwei Lin, Yongtao Wang

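AI summary: The study proposes VL-SAM-v3, a unified framework for improving open-world object detection, especially under fine-grained appearance variation, rare categories, and complex scenes. The method draws on retrieval-based external visual memory to produce two complementary visual priors: one for instance-level spatial localization and one for category-aware local context modeling. Combined with memory-guided prompt refinement, these priors support both open-vocabulary and open-ended detection. Experiments show that VL-SAM-v3 substantially improves zero-shot detection on LVIS, particularly for rare categories, and that the approach generalizes well.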

2605.03276 2026-05-12 cs.CV

VEBench: Benchmarking Large Multimodal Models for Real-World Video Editing

Andong Deng, Dawei Du, Zhenfang Chen, Wen Zhong, Fan Chen, Guang Chen, Chia-Wen Kuo, Longyin Wen, Chen Chen, Sijie Zhu

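AI summary: VEBench is a comprehensive benchmark for evaluating large multimodal models on real-world video editing tasks. It contains a large set of high-quality edited videos and human-verified question-answer pairs. These test how well models recognize video editing techniques and simulate editing workflows. The study shows that current models still fall well short of human-level understanding of video editing. This gap underscores the need to combine video understanding with reasoning about creative editing operations.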

Comments CVPR Findings 2026

2605.02743 2026-05-12 cs.AI cs.CV cs.HC

Triple Spectral Fusion for Sensor-based Human Activity Recognition

Ye Zhang, Longguang Wang, Qing Gao, Chaocan Xiang, Mohammed Bennamoun, Yulan Guo

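AI summary: This paper proposes a triple spectral fusion framework for sensor-based human activity recognition (HAR). It targets the difficulty of fusing temporal information from multiple heterogeneous sensors. The method combines three components: adaptive complementary filtering, adaptive filtering in the graph Fourier domain, and wavelet frequency selection. Together they fuse and compress sensor features along the temporal, spatial, and frequency dimensions, improving recognition accuracy and robustness. Experiments show that the framework performs strongly on multiple benchmark datasets.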
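
For readers unfamiliar with complementary filtering, the classic fixed-gain form fuses a gyroscope with an accelerometer angle estimate. The gyroscope is accurate in the short term but drifts over time; the accelerometer is noisy but drift-free. The sketch below is that textbook filter, not the paper's adaptive variant. The values of `alpha` and `dt` and the degree units are example assumptions.

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98):
    """Fuse angular-rate and absolute-angle readings into one angle estimate per step.

    alpha close to 1 trusts the integrated gyro in the short term, while the
    (1 - alpha) share of the accelerometer slowly pulls the estimate back and
    cancels gyro drift.
    """
    angle = accel_angles[0]
    fused = []
    for rate, acc in zip(gyro_rates, accel_angles):
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * acc
        fused.append(angle)
    return fused


# A gyro with a constant 0.1 deg/s bias while the device actually stays level.
n, dt = 2000, 0.01
fused = complementary_filter([0.1] * n, [0.0] * n, dt)
raw_integration = 0.1 * dt * n  # integrating the gyro alone drifts by about 2 degrees
```

The fused estimate settles near a small constant offset instead of growing without bound. This bounded error is why complementary filters are a common baseline for inertial fusion.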

2605.02175 2026-05-12 cs.AI

Intervention Complexity as a Canonical Reward and a Measure of Intelligence

Brendan McCane

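AI summary: This paper proposes intervention complexity, a new general measure of intelligence that serves as a canonical reward in place of conventional externally specified reward functions. The measure is defined in terms of a resource function, such as program length or running time. It satisfies five natural properties, including being environment-derived, universal, and minimal, and it completes the Legg-Hutter intelligence framework without requiring any externally specified input. The work also introduces a two-dimensional characterization of intelligence, agent capability and learning efficiency. It shows that the choice of resource bias determines whether the measure is computable. Together these results provide a theoretical basis for studying superintelligence and for pretraining general agents.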

Comments 23 pages

2605.02169 2026-05-12 cs.CV cs.DC cs.LG

Heterogeneous Model Fusion for Privacy-Aware Multi-Camera Surveillance via Synthetic Domain Adaptation

Peggy Joy Lu, Wei-Yu Chen, Yao-Tsung Huang, Vincent Shin-Mu Tseng

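AI summary: This paper proposes HeroCrystal, a privacy-preserving framework for multi-camera domain-adaptive object detection that tackles data privacy, class imbalance, and heterogeneous architectures. Through synthetic domain adaptation, it combines generative models, federated learning, and knowledge distillation to improve detection performance without exposing raw data. Experiments show that HeroCrystal performs strongly on several cross-domain detection benchmarks and reaches a state-of-the-art mAP of 33.4%.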

Comments 42 pages, 13 figures. Published in Information Fusion (Elsevier). DOI: 10.1016/j.inffus.2026.104413

DOI: 10.1016/j.inffus.2026.104413
Journal ref: Information Fusion, 2026

Abstract

We propose HeroCrystal, a novel privacy-preserving framework for multi-camera domain-adaptive object detection, addressing challenges such as data privacy, class imbalance, and heterogeneous architectures. Our framework consists of three key stages. In the Generated Stage, we introduce a one-shot, target-aware diffusion-based generation module that learns visual style from a single target-domain image while leveraging prompt-based control to synthesize specific object instances. Unlike conventional style transfer-based methods that require large target datasets and ignore semantic-level discrepancies, our approach enables privacy-preserving augmentation to reduce ethical concerns, and introduces controllable rare object generation to mitigate long-tailed category degradation. In the Federated Stage, we employ probabilistic Faster R-CNN on the client side to improve localization accuracy, and a dynamic model contrastive strategy to suppress domain-specific bias. The server side performs model fusion across heterogeneous architectures without accessing raw data. Finally, in the Distilled Stage, we propose an inconsistent categories integration algorithm to resolve label inconsistency and architecture heterogeneity across clients. Extensive experiments on multiple cross-domain detection benchmarks demonstrate that our method outperforms existing multi-source domain adaptation and federated learning baselines under multi-class, privacy-preserving settings. Our method improves mAP by +2.1% over prior privacy-preserving approaches and achieves a new state-of-the-art mAP of 33.4%, highlighting the effectiveness of HeroCrystal in enabling practical multi-camera AI surveillance systems. The source code is publicly available at https://github.com/ccuvislab/HeroCrystal.

2605.01529 2026-05-12 cs.RO

Good in Bad (GiB): Sifting Through End-user Demonstrations for Learning a Better Policy

Noushad Sojib, Ola Ghattas, Momotaz Begum

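AI summary: This paper proposes GiB (Good-in-Bad), an algorithm for learning more robust robot policies from demonstrations by non-expert users. It automatically identifies and discards erroneous subtasks within demonstrations while keeping the high-quality parts, which improves policy learning. GiB learns latent features with a self-supervised model and uses the Mahalanobis distance to detect low-quality segments. It outperforms conventional methods in both simulated and real-world environments.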
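
Mahalanobis-distance screening of demonstration segments can be sketched in pure Python, assuming each segment has already been mapped to a feature vector. GiB uses latent features from a self-supervised model; here the vectors are toy 2-D points. A mean and covariance are fitted on reference segments, and a candidate segment is flagged when its distance exceeds a threshold. The threshold of 3.0, the small ridge term, and the function names are assumptions.

```python
import math


def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, ridge=1e-6):
    """Sample covariance (n - 1 denominator) plus a small ridge so it stays invertible."""
    mu, n, d = mean_vec(rows), len(rows), len(rows[0])
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis(x, mu, cov):
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return math.sqrt(sum(di * yi for di, yi in zip(diff, solve(cov, diff))))


def flag_low_quality(reference, candidates, threshold=3.0):
    """True for candidates that lie far from the reference feature distribution."""
    mu, cov = mean_vec(reference), covariance(reference)
    return [mahalanobis(c, mu, cov) > threshold for c in candidates]


reference = [[1, 0], [-1, 0], [0, 1], [0, -1], [1, 1], [-1, -1]]
flags = flag_low_quality(reference, [[0.0, 0.0], [10.0, 10.0]])
```

Unlike plain Euclidean distance, the covariance term accounts for correlated features. A segment is flagged when it is unusual relative to how good segments typically vary.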

2605.01507 2026-05-12 cs.AI

MILD: Mediator Agent System with Bidirectional Perception and Multi-Layered Alignment for Human-Vehicle Collaboration

Jiyao Wang, Yunbiao Wang, Yubo Jiao, Xiao Yang, Dengbo He, Sasan Jafarnejad, Luis Miranda-Moreno, Raphael Frank, Jiangbo Yu

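AI summary: The study addresses two problems in human-vehicle collaboration under partial driving automation: the cognitive burden on drivers and the poor mutual understanding of intent. It proposes MILD, a mediator agent system that improves the efficiency and safety of this collaboration through bidirectional perception and multi-layered alignment. MILD introduces a perception agent and a strategy agent. It uses Evidence- and Constraint-weighted Policy Optimization (ECPO) to ensure that decisions both comply with safety rules and satisfy user preferences. Experiments show that MILD outperforms existing methods in perception accuracy and strategy quality and substantially improves user trust and the driving experience.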

Abstract

Prior studies report that partial driving automation can increase the cognitive demands on human drivers. This effect largely arises from human drivers' lack of transparent insight into the vehicle's intentions and decision logic, as well as from automated systems' limited awareness of the driver's dynamic state and preferences. This bidirectional misalignment undermines shared situational awareness and exacerbates coordination failures in human-vehicle interaction. To address these limitations, we argue for a paradigm shift that elevates the human role from passive supervisor to active manager. We introduce the Mediator-in-the-Loop-Driving (MILD) system, based on an agentic system architecture to facilitate synergistic human-vehicle collaboration. MILD integrates a perception agent for joint in-cabin and out-of-cabin understanding with a lightweight strategy agent that generates compliant and explainable action suggestions. To ensure these strategies are strictly aligned with safety regulations and human values, we develop Evidence- and Constraint-weighted Policy Optimization (ECPO). ECPO leverages automatic validators to steer the agent toward behaviors that are not only accurate but also structurally complete, substantiated by evidence, and free from constraint violations. Furthermore, a retrieval-augmented generation module dynamically incorporates constraints from traffic regulations, speed recommendations, and driver preferences into the decision loop. Field experiments across three open datasets demonstrate that MILD consistently outperforms baselines in both perception accuracy and strategy quality under auditable offline metrics, and yields higher human-rated policy adequacy, comfort, and explanation than baselines. This work offers a practical pathway for building auditable and aligned agents for human-vehicle collaborative driving.

2605.01345 2026-05-12 cs.CV cs.AI cs.LG

The Perceptual Bandwidth Bottleneck in Vision-Language Models: Active Visual Reasoning via Sequential Experimental Design

Anjie Liu, Ziqin Gong, Yan Song, Yuxiang Chen, Xiaolong Liu, Hengtong Lu, Kaike Zhang, Chen Wei, Jun Wang

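AI summary: Modern vision-language models (VLMs) face a perceptual bandwidth bottleneck: a wide field of view preserves global context but sacrifices the fine-grained detail needed for complex reasoning. This paper addresses the problem through active visual reasoning framed as sequential experimental design, optimizing visual information acquisition around gathering task-relevant evidence. It introduces FOVEA, a training-free framework that improves reasoning on high-resolution scenes through an evidence-driven probing strategy. Experiments show that it outperforms existing methods on several high-resolution benchmarks, with especially large gains on search-dominated tasks such as remote sensing.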

Comments 27 pages, 5 figures, accepted at ICML 2026

2605.01323 2026-05-12 cs.CL cs.AI

SiNFluD: Creating and Evaluating Figurative Language Dataset for Sindhi

Wazir Ali, Adeeb Noor, Saifullah Tumrani

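AI summary: This paper introduces SiNFluD, a new benchmark dataset for figurative language classification in Sindhi. The authors collected raw text from blogs, social media, and literary works and annotated it with the Doccano tool, reaching an inter-annotator agreement of 0.81. The evaluation uses cross-validation and several pretrained models, and XLM-RoBERTa-XL performs best under few-shot fine-tuning. The dataset provides an important resource for research on Sindhi figurative language.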
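
The summary does not say which agreement statistic produced the 0.81 figure. Cohen's kappa for two annotators is a common choice, but that is an assumption here. Kappa corrects the observed agreement for the agreement expected by chance from each annotator's label frequencies. The label values below are invented examples.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators independently pick the same label.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined, treat as perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


a = ["literal", "figurative", "figurative", "literal"]
b = ["literal", "figurative", "literal", "literal"]
kappa = cohens_kappa(a, b)
```

Here the annotators agree on 3 of 4 items, yet kappa is only 0.5 because half of that agreement would be expected by chance.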

2605.01011 2026-05-12 cs.CL cs.AI cs.LG

CLEAR: Revealing How Noise and Ambiguity Degrade Reliability in LLMs for Medicine

Kevin H. Guo, Chao Yan, Avinash Baidya, Katherine Brown, Xiang Gao, Juming Xiong, Zhijun Yin, Bradley A. Malin

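AI summary: The study proposes CLEAR, a framework for evaluating how reliable medical large language models are in the face of noise and ambiguity. CLEAR systematically varies three factors: the number of answer options, whether a correct answer or an abstention option is present, and how the options are worded. This exposes several key weaknesses in current medical benchmarks. Models become worse at identifying the correct response as the number of answer options grows. The same happens when the abstention option shifts from an explicit refusal to an admission of uncertainty. Larger models show this reliability problem more strongly.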

2605.00884 2026-05-12 cs.CV

LiteVLA-H: Dual-Rate Vision-Language-Action Inference for Onboard Aerial Guidance and Semantic Perception

Justin Williams, Kishor Datta Gupta, Roy George, Mrinmoy Sarkar

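AI summary: This paper presents LiteVLA-H, a compact vision-language-action (VLA) system for low-latency closed-loop guidance and semantic perception on drones under tight compute and communication constraints. Operating at two rates on an NVIDIA Jetson AGX Orin, the system runs fast action generation alongside slower semantic understanding. The study finds that multimodal prefill dominates end-to-end latency on edge devices. Based on this finding, the authors design an efficient scheduling strategy. They also use knowledge-preserving fine-tuning to improve performance on flight control and semantic description tasks.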

2605.00445 2026-05-12 cs.LG

The Power of Order: Fooling LLMs with Adversarial Table Permutations

Xinshuai Dong, Haifeng Chen, Xuyuan Liu, Shengyu Chen, Haoyu Wang, Shaoan Xie, Kun Zhang, Zhengzhang Chen

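AI summary: This paper studies how robust large language models (LLMs) are to input structure when processing tabular data. It finds that even semantics-preserving permutations of a table's rows and columns can make models produce wrong or inconsistent outputs. The authors propose a gradient-based adversarial table permutation attack that efficiently finds the permutations most damaging to model performance. Experiments show that the attack substantially degrades a range of LLMs, revealing a widespread vulnerability in how current models handle structured data.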
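
A semantics-preserving table permutation reorders rows and columns while keeping every cell attached to its header, so the set of records does not change. The sketch below draws one random permutation and checks this invariance. The paper's attack instead searches this permutation space with gradients, which is not shown. The markdown serialization is one plausible way to place a table in a prompt and is assumed for illustration.

```python
import random


def permute_table(header, rows, rng):
    """Shuffle column order and row order together; cells stay with their headers."""
    col_order = list(range(len(header)))
    row_order = list(range(len(rows)))
    rng.shuffle(col_order)
    rng.shuffle(row_order)
    new_header = [header[j] for j in col_order]
    new_rows = [[rows[i][j] for j in col_order] for i in row_order]
    return new_header, new_rows


def as_records(header, rows):
    """Order-independent view of a table: a sorted list of (header, value) tuples per row."""
    return sorted(tuple(sorted(zip(header, row))) for row in rows)


def to_markdown(header, rows):
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)


header = ["name", "age", "city"]
rows = [["Ann", 31, "Oslo"], ["Bo", 25, "Lima"], ["Cy", 40, "Rome"]]
p_header, p_rows = permute_table(header, rows, random.Random(0))
print(to_markdown(p_header, p_rows))  # same facts, different layout in the prompt
```

Any answer that changes between `to_markdown(header, rows)` and the permuted version reflects sensitivity to layout, not to content.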

2604.26412 2026-05-12 cs.CL

When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?

Tianyu Liu, Yuhao Shen, Xinyi Hu, Baolin Zhang, Hengxin Zhang, Jun Dai, Jun Zhang, Shuang Ge, Lei Chen, Yue Li, MingCheng Wan

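AI summary: The study examines hidden-state drift in long-range speculative decoding and shows that current hidden-state-based drafters lose accuracy the further ahead they predict. It argues that reusing the target model's key-value (KV) cache supplies richer context and can improve long-range speculation. The authors introduce the KVShot framework to test this hypothesis. They identify key training and architectural bottlenecks in current methods and offer guidance for designing efficient inference architectures.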

Abstract

Speculative decoding accelerates LLM inference, but SOTA hidden-state-based drafters suffer from long-range decay: draft accuracy degrades as the speculative step increases. Existing work attributes this decay to train-inference mismatch and proposes test-time training (TTT) as a remedy, yet we observe that long-range decay persists even in TTT-trained drafters. We revisit long-range decay from the perspective of context information preservation. In hidden-state reuse, we argue the target hidden state acts as a biased context compression: it aggregates historical token information according to the attention query at the current position, yielding a compact representation optimized for immediate next-token prediction. This compression can suppress information less relevant to the current query but important for later speculative steps. In contrast, the target model's KV cache serves as an explicit context, retaining the complete set of token-wise KV representations. We therefore posit the KV-Reuse Hypothesis: allowing the draft model to reuse the target KV cache can provide richer signals for long-horizon drafting. To test this hypothesis, we introduce KVShot, a diagnostic framework that compares three reuse paradigms: hidden-only, KV-only, and hybrid. Extensive evaluations on Qwen3-8B show that KV-Reuse improves long-range acceptance, although end-to-end speedups remain marginal under current training pipelines. Our analysis identifies two key structural bottlenecks: shallow drafters struggle to estimate target queries accurately, and draft-side KV projections receive sparse gradient signals. These findings suggest that realizing the full potential of KV-aware decoding requires moving beyond TTT toward block-wise training paradigms. By exposing these bottlenecks, KVShot provides a foundational diagnostic testbed and a clear roadmap for designing next-generation inference architectures.
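
To make "long-range decay" measurable, one can record how many tokens of each drafted block survive verification and then compute the acceptance rate at each speculative step. The sketch below assumes greedy (exact-match) verification. A draft token is accepted only if it equals the target model's own choice, and everything after the first mismatch is discarded. The function names and token IDs are illustrative.

```python
def accepted_prefix_length(draft_tokens, target_tokens):
    """Number of leading draft tokens that match the target model's greedy choices."""
    n = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        n += 1
    return n


def acceptance_by_step(accepted_lengths, max_steps):
    """Fraction of drafted blocks whose k-th speculative token (k = 1..max_steps) was accepted."""
    total = len(accepted_lengths)
    return [sum(1 for a in accepted_lengths if a >= k) / total for k in range(1, max_steps + 1)]


blocks = [
    ([5, 7, 9, 2], [5, 7, 9, 2]),  # all four draft tokens accepted
    ([5, 7, 9, 2], [5, 7, 1, 2]),  # first mismatch at step 3
    ([4, 1, 1, 1], [4, 2, 1, 1]),  # first mismatch at step 2
    ([8, 8, 8, 8], [3, 8, 8, 8]),  # first mismatch at step 1
]
lengths = [accepted_prefix_length(d, t) for d, t in blocks]
rates = acceptance_by_step(lengths, max_steps=4)
```

A token at step k can only be accepted if all earlier ones were, so these rates never increase with k. The decay the paper studies is how quickly they fall.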

2604.26326 2026-05-12 cs.LG cs.CL stat.ML

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

Bolian Li, Yifan Wang, Yi Ding, Anamika Lochab, Ananth Grama, Ruqi Zhang

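AI summary: This paper studies performance saturation in reinforcement learning (RL) for large language models (LLMs). It proposes Entrocraft, a new method that addresses saturation by precisely controlling the entropy curve. The method relies on rejection sampling based on a biased advantage distribution, needs no regularization, and works with any advantage estimator. Theoretical analysis explains the behavior of existing RL methods and entropy-preserving methods and shows that a linear annealing schedule is superior for entropy scheduling. Experiments show that Entrocraft effectively mitigates performance saturation and substantially improves generalization, output diversity, and long-horizon training performance.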
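
A linear annealing schedule for a target entropy can be written in a few lines, together with the entropy of a token distribution to compare against it. This only illustrates what linear annealing of the entropy curve means. Entrocraft's rejection-sampling mechanism for tracking the target is not reproduced, and `h_start`, `h_end`, and the function names are assumptions.

```python
import math


def linear_entropy_target(step, total_steps, h_start, h_end):
    """Target entropy that moves linearly from h_start to h_end, then stays at h_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return h_start + (h_end - h_start) * frac


def entropy(probs):
    """Shannon entropy in nats of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# How far a policy's current entropy sits from the schedule at a given step.
current = entropy([0.7, 0.1, 0.1, 0.1])
target = linear_entropy_target(step=500, total_steps=1000, h_start=1.2, h_end=0.4)
gap = current - target
```

A controller would push entropy down when `gap` is positive and up when it is negative. The schedule itself decides only where the target sits at each step.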

2604.25674 2026-05-12 cs.CL

Modeling Human-Like Color Naming Behavior in Context

Yuqing Zhang, Ecesu Ürker, Tessa Verhoef, Gemma Boleda, Arianna Bisazza

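AI summary: The study models human-like color naming behavior, using neural agents trained with supervised and reinforcement learning to produce human-like vocabularies. Existing models generate vocabularies whose geometric structure differs from human color categories. To close this gap, the study introduces upsampling of rare color terms and a multi-listener interaction mechanism, and uses a convexity measure to assess the geometric consistency of the vocabularies. Experiments show that these methods increase vocabulary diversity and informativeness and bring the generated vocabularies closer to human color naming systems.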

Comments Cognitive Science Society Annual Conference 2026

2604.25031 2026-05-12 cs.CL cs.AI

Faithful Autoformalization via Roundtrip Verification and Repair

Daneshvar Amrollahi, Jerry Lopez, Clark Barrett

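AI summary: This paper studies how to verify whether large language models formalize natural language faithfully. It proposes a roundtrip verification method that needs no ground-truth annotations. The method formalizes a statement, back-translates the formalization, formalizes it again, and uses formal tools to check whether the two formalizations are logically equivalent. When they disagree, the system localizes the faulty step and applies a targeted repair. Experiments show that diagnosis-guided repair works best in two legal domains. Rules whose formalizations pass the equivalence check also show less drift in natural-language reasoning.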
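
The equivalence check at the heart of roundtrip verification can be illustrated for propositional formulas with brute-force truth tables. Here formulas are represented as Python functions over a variable assignment. Real pipelines would use a formal tool such as an SMT solver instead, and the example formulas are invented. The point is only that a disagreement comes with a concrete counterexample, which is what makes targeted repair possible.

```python
from itertools import product


def equivalent(f, g, variables):
    """Return (True, None) if f and g agree on every assignment, else (False, counterexample)."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if f(env) != g(env):
            return False, env
    return True, None


# First formalization of a rule, and two candidate re-formalizations after back-translation.
def original(e):
    return not (e["licensed"] and e["insured"])


def faithful(e):
    return (not e["licensed"]) or (not e["insured"])


def drifted(e):
    return (not e["licensed"]) and (not e["insured"])


print(equivalent(original, faithful, ["licensed", "insured"]))
print(equivalent(original, drifted, ["licensed", "insured"]))
```

Truth tables grow as 2^n in the number of variables. This is why the approach only suits toy cases and solver-based checking is used in practice.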

2604.23789 2026-05-12 cs.CV

MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation

Haojie Zhang, Di Wu, Bingyan Liu, Linjie Zhong, Yuancheng Wei, Xingsong Ye, Nanqing Liu, Yaling Liang

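AI summary: The study targets multi-shot generation for cinematic storytelling, a complex problem. It presents MuSS, a large-scale dataset, together with a cinematic narrative benchmark to advance multi-shot subject-to-video (S2V) generation. The work addresses three core challenges: realistic narrative logic, spatiotemporal alignment conflicts, and the "copy-paste" dilemma. To meet them, MuSS is built with progressive caption generation and a cross-shot matching mechanism that ensure local accuracy and global narrative coherence. The study also introduces ACP-Var, a new evaluation metric for a model's ability to maintain narrative continuity and consistent 3D structure. Experiments show that the dataset substantially improves models' storytelling and cross-shot identity preservation.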

Comments 17 pages, 9 figures

2604.23693 2026-05-12 cs.RO

Decentralized Heterogeneous Multi-Robot Collaborative Exploration for Indoor and Outdoor 3D Environments

Yuxiang Li, Kun Chen, Jiancheng Wang, Shihao Fang, Haoyao Chen, Yunhui Liu

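AI summary: This paper proposes a decentralized framework for collaborative exploration of indoor and outdoor 3D environments by heterogeneous multi-robot teams. It aims to exploit each robot's distinct characteristics to improve exploration efficiency. The method designs a basic perception map that fuses terrain and observation metrics. It then uses an improved supervoxel segmentation technique to simplify the map structure and enable lightweight communication. The method models the traversability and observation capabilities of each robot type. On that basis, it casts task viewpoint allocation as a heterogeneous multi-depot traveling salesman problem with robot capability constraints and solves it with an improved genetic algorithm. Finally, it optimizes exploration paths and resolves path conflicts between robots. Experiments show more efficient exploration and lower communication cost in complex environments.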

2604.22942 2026-05-12 cs.CV cs.AI cs.LG

VS-DDPM: Efficient Low-Cost Diffusion Model for Medical Modality Translation

Nikoo Moradi, Gijs Luijten, Behrus Hinrichs-Puladi, Jens Kleesiek, Victor Alves, Jan Egger, André Ferreira

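AI summary: The study proposes VS-DDPM, a 3D variable-step denoising diffusion probabilistic model that substantially speeds up inference for medical image synthesis while maintaining generation quality. The model performs well on several medical modality translation tasks, such as reconstructing missing MRI sequences, tumor removal, and MRI-to-synthetic-CT conversion, and reaches state-of-the-art results on several metrics. It does not achieve the best performance on every task. Nevertheless, VS-DDPM demonstrates robustness and tunability for high-fidelity medical image generation.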