2606.00890 2026-06-02 cs.CV

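Dive into Waves: Morlet Spectral Transformer for Cross-Subject Emotion Decoding from EEG

Jiaxin Qing, Lexin Li

Affiliations: University of California, Berkeley

AI summary: To address cross-subject variability in EEG-based emotion recognition, the paper proposes the Morlet Spectral Transformer (MST), built on Morlet wavelet tokenization, long-context baseline removal, and band-specific spatial projection. Without pretraining, MST outperforms large pretrained models and frequency-domain methods on the SEED-family datasets.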

Abstract

We study cross-subject emotion recognition from EEG, a practically important yet challenging problem in brain-computer interfaces. Unlike tasks with clear waveform signatures, emotion-related EEG signals are primarily encoded in spectral power and are weak, noisy, and highly variable across subjects. Existing approaches rely either on large pretrained EEG foundation models, which require massive data yet still struggle with cross-subject variability, or frequency-domain encoders, which better reflect spectral structure but suffer from mismatched representations, drift-dominated tokenization, and lack of band-specific spatial modeling. In this article, we propose the Morlet Spectral Transformer (MST), built around three key components and integrated with a spatiotemporal Transformer backbone. First, Morlet wavelet tokenization provides a time-frequency representation that matches the multi-scale structure of brain rhythms, and extends classical differential entropy features to a form suitable for Transformers. Second, long-context baseline removal acts as a simple temporal normalization that removes subject-specific drift and redundancy across nearby windows. Third, frequency-specific spatial projection learns a separate channel mixer for each frequency band, capturing interpretable band-specific patterns and reducing cross-channel mixing. We show that, even without pretraining, MST consistently outperforms both large pretrained EEG foundation models and frequency-based methods across all SEED-family datasets. These results suggest that careful representation design can yield an accurate, cost-effective, and interpretable alternative to large-scale pretraining.
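As a rough illustration of the three components named in the abstract, the sketch below computes Morlet log-power features, subtracts a baseline, and applies a separate channel mixer per frequency band. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names, the five-cycle wavelet width, the use of a plain temporal mean as the "long-context" baseline, and the mixer shapes are all hypothetical choices made for illustration.

```python
import numpy as np

def morlet_wavelet(freq_hz, sfreq, n_cycles=5.0):
    """Complex Morlet wavelet centred on freq_hz (n_cycles is an assumed width)."""
    sigma_t = n_cycles / (2.0 * np.pi * freq_hz)          # temporal std in seconds
    t = np.arange(-4.0 * sigma_t, 4.0 * sigma_t, 1.0 / sfreq)
    wavelet = np.exp(2j * np.pi * freq_hz * t) * np.exp(-(t ** 2) / (2.0 * sigma_t ** 2))
    return wavelet / np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy

def morlet_log_power(eeg, sfreq, freqs):
    """eeg: (channels, time) -> log power of shape (bands, channels, time)."""
    n_channels, n_times = eeg.shape
    out = np.empty((len(freqs), n_channels, n_times))
    for b, f in enumerate(freqs):
        w = morlet_wavelet(f, sfreq)
        if len(w) > n_times:
            raise ValueError("signal is shorter than the wavelet")
        for c in range(n_channels):
            analytic = np.convolve(eeg[c], w, mode="same")
            out[b, c] = np.log(np.abs(analytic) ** 2 + 1e-12)
    return out

def remove_baseline(log_power):
    """Assumed baseline: the mean over the whole time context, per band and channel."""
    return log_power - log_power.mean(axis=-1, keepdims=True)

def band_specific_projection(tokens, mixers):
    """tokens: (bands, channels, time); mixers: (bands, out_dim, channels).

    Each band gets its own channel-mixing matrix, so no weights are shared
    across frequency bands.
    """
    return np.einsum("bdc,bct->bdt", mixers, tokens)
```

In a trained model the mixers would be learned parameters; here they are just arrays of the right shape. Taking the logarithm of band power is what links this representation to the classical differential entropy features the abstract mentions, since the differential entropy of a Gaussian signal is an affine function of the log of its variance.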

2606.00881 2026-06-02 cs.CL

Chunking Methods on Retrieval-Augmented Generation - Effectiveness Evaluation Against Computational Cost and Limitations

Mateusz Śmigielski, Michał Rajkowski, Mateusz Zbrocki, Michał Bernacki-Janson, Karol Kunicki, Julianna Godziszewska, Maciej Piasecki, Konrad Wojtasik

Affiliations: Department of Artificial Intelligence, Faculty of Information and Communication Technology, Wrocław University of Science and Technology

AI summary: The study presents what its authors describe as the first systematic evaluation of a wide range of chunking methods in RAG systems, and highlights important, often overlooked problems in chunking strategies.

Abstract

Retrieval-Augmented Generation (RAG) has demonstrated significant capabilities in enhancing the performance of Large Language Models (LLMs). One of the key tasks in RAG systems is the chunking process. Traditionally, fixed-size chunking and semantic chunking have been the standard approaches. However, interest in chunking strategies has been increasing, leading to a growing number of proposed methods that often claim improved performance over these conventional techniques. Many of these approaches are tailored to specific use cases and data types, with limited evidence of their effectiveness across diverse scenarios. As a result, it remains challenging to directly compare different techniques and assess their relative strengths. To the best of our knowledge, this study is the first to systematically evaluate the effectiveness of a wide range of chunking methods and emphasize the underlying challenges of chunking strategies in RAG systems. While chunking is commonly treated as a simple preprocessing step, we show that it introduces a range of impactful and often overlooked issues.
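For readers unfamiliar with the baseline the abstract calls fixed-size chunking, here is a minimal sketch. The function name, the token-list input, and the `overlap` parameter are illustrative assumptions; the paper's own implementations and chunk sizes are not given in the abstract.

```python
def fixed_size_chunks(tokens, chunk_size, overlap=0):
    """Split a token list into windows of chunk_size, sharing `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks

words = "retrieval augmented generation splits documents into chunks".split()
chunks = fixed_size_chunks(words, chunk_size=3, overlap=1)
```

The overlap trades index size for context continuity: each boundary token appears in two chunks, so a sentence cut at a boundary is still partly visible from either side.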

2606.00880 2026-06-02 cs.LG cs.AI

Task diversity produces systematic transfer but inhibits continual reinforcement learning

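Purab Seth, Neil Shah, Kunal Jha, Samuel J. Gershman, Max Kleiman-Weiner, Wilka Carvalho

Affiliations: MIT; University of California, Berkeley; Princeton University; Harvard University

AI summary: Introduces Banyan, a GPU-accelerated continual reinforcement learning domain, to study how task diversity (map layouts, interactable objects, and sub-goal hierarchies) affects an agent's ability to keep learning across distribution shifts. Diversity promotes local transfer across individual shifts, but this does not by itself prevent plateaus on longer-horizon tasks or forgetting of earlier task distributions.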

Abstract

Continual reinforcement learning aims to produce agents that learn not only to improve at their current tasks but also to adapt as task distributions change. Training an agent on many diverse tasks can induce zero-shot generalization, but previous work generally evaluates this generalization after training -- with frozen weights. Whether task diversity also improves an agent's ability to continue learning across distribution shifts remains unclear. We introduce Banyan, a GPU-accelerated continual RL domain in which task diversity factors into three independently controllable axes: the map layouts an agent must navigate, the objects it must interact with, and the hierarchical structures of sub-goal dependencies. Across individual distribution shifts, increasing diversity along each axis causes agents to begin training on the new tasks near the performance attained on the previous one, even when the shift changes the structure of the optimal policy. However, as the number of shifts increases, this local transfer does not by itself yield sustained continual learning: longer-horizon tasks plateau, and earlier task distributions are forgotten after later training. Banyan is a benchmark for studying when controlled task diversity produces transferable learning, when that transfer persists, and where it falls short of proper continual learning.

2606.00875 2026-06-02 cs.CL

Enhancing LLM Metacognition via Cognitive Pairwise Training

Weitao Li, Hao Zhou, Xuanyu Lei, Fandong Meng, Yuanhang Liu, Jingyi Ren, Ante Wang, Xiaolong Wang, Yuanchi Zhang, Fuwen Luo, Guangwen Yang, Lin Gan, Weizhi Ma, Yang Liu

Affiliations: National Engineering Laboratory for Intelligent Information Processing, Academy of Mathematics and Physics, Chinese Academy of Sciences; University of Science and Technology of China

AI summary: Proposes Cognitive Pairwise Training (CPT), which learns to distinguish reliable from unreliable reasoning through pairwise comparisons of reasoning traces, improving the trade-off between reasoning and metacognition in LLMs.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has become central to LLM reasoning, but its outcome-level rewards can make models more willing to give confident answers when evidence or reasoning is unreliable. Existing SFT or RL methods mainly teach LLMs to refuse or express uncertainty at the response level, which can overfit abstention behavior rather than improve reasoning reliability. To address this limitation, we propose Cognitive Pairwise Training (CPT), a cognitive mid-training alignment stage that turns pairwise comparisons over reasoning traces into a reusable alignment signal. By learning to distinguish trustworthy from flawed reasoning, CPT encourages the model to internalize a reasoning-quality discrimination boundary rather than memorize surface refusal patterns. Across five model scales and three model families, CPT improves the reasoning--metacognition trade-off. At 14B, CPT+RL outperforms the standard SFT+RL pipeline by +2.2 math-average points and +5.2 abstention-F1 points. Further analyses show that CPT improves trace quality and exhibits strong robustness and scalability across evaluation and training settings. Code and models are released at https://github.com/Tsinghua-dhy/CPT.

2606.00857 2026-06-02 cs.RO cs.AI

From Cues to Horizons: Dynamic Risk Horizon Profiling for Trajectory Prediction

Xinyi Ning, Zilin Bian, Dachuan Zuo, Semiha Ergan, Kaan Ozbay

Affiliations: Department of Civil and Urban Engineering, New York University; Department of Civil Engineering Technology and Environmental Management Safety, Rochester Institute of Technology

AI summary: Proposes a risk horizon profiling (RHP) module that models the distribution of risk over future horizons with a continuous, learnable potential field to improve trajectory prediction accuracy, reducing 5 s RMSE by 25.0% on highD and 5 s minFDE by 29.1% on SHRP2.

Comments: 11 pages, 7 figures, submitted to IEEE Transactions on Intelligent Transportation Systems (T-ITS)

Abstract

Accurate and reliable vehicle trajectory prediction is essential for safe autonomous driving. Recent studies have incorporated safety risk into trajectory prediction to quantify dangers posed by surrounding agents. However, most risk-aware approaches use past risk information as a secondary signal to help guide decisions, overlooking its future evolution and uncertainty. In this paper, we propose a risk horizon profiling (RHP) module that incorporates a continuous, learnable potential field model for risk-aware trajectory prediction. The RHP module calculates the spatial-temporal proximity of surrounding objects to profile risk distributions across future horizons, which supports better trajectory prediction by adaptively identifying what human drivers perceive as critical moments. We evaluate our method on two datasets from different driving settings, highD for highway corridors and SHRP2 for urban streets, which cover diverse risk scenarios including safe, near-crash, and crash events. Compared to the baseline methods, our framework achieves a 25.0% reduction in 5s RMSE on the highD dataset and a 29.1% reduction in 5s minFDE on SHRP2. These results indicate strong performance for both short and long horizon prediction and robust generalization across highway and urban scenarios. The proposed method enables more realistic AV path planning and strategic selection, thereby supporting safer autonomous driving and more advanced driver-assistance systems. The source code for this work is available at: https://github.com/bilab-nyu/RHP
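The two metrics quoted in the abstract can be made concrete with a short sketch. The definitions below follow common usage in the trajectory prediction literature and are assumptions here, since the abstract does not spell them out: RMSE is taken as the root of the mean squared Euclidean displacement at one horizon step, and minFDE as the final-step displacement of the best of several predicted modes. Array layouts and function names are hypothetical.

```python
import numpy as np

def rmse_at_horizon(pred, truth, step):
    """pred, truth: (n_samples, horizon, 2). RMSE of displacement at index `step`."""
    err = pred[:, step] - truth[:, step]
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=-1))))

def min_fde(pred_modes, truth):
    """pred_modes: (n_samples, n_modes, horizon, 2); truth: (n_samples, horizon, 2).

    For each sample, keep the mode whose final position is closest to the truth,
    then average that minimum final displacement over samples.
    """
    final_err = np.linalg.norm(pred_modes[:, :, -1] - truth[:, None, -1], axis=-1)
    return float(np.mean(np.min(final_err, axis=1)))

def relative_reduction(baseline, ours):
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100.0 * (baseline - ours) / baseline
```

Under this reading, a "25.0% reduction in 5s RMSE" means `relative_reduction(baseline_rmse, model_rmse)` evaluates to 25.0 when both RMSE values are taken at the step corresponding to five seconds.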

2606.00852 2026-06-02 cs.CV cs.AI cs.LG

RefDiffNet: Learning to Expose Subtle PCB Defects Before Detection

Vinay Edula, Nilesh Badwe, Priyanka Bagade

Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology Kanpur; Department of Materials Science and Engineering, Indian Institute of Technology Kanpur

AI summary: Proposes RefDiffNet, a lightweight plug-and-play input enhancement module that uses a defect-free reference image to highlight defective regions, improving downstream detector performance on PCB defect detection.

Abstract

Printed circuit board (PCB) defect detection is challenging because many defects are small and difficult to distinguish from complex background patterns. Most deep learning-based PCB inspection methods rely only on the inspected PCB image for defect detection, ignoring the defect-free reference image that encodes the expected layout of traces, pads, and other PCB structures. In this work, we propose RefDiffNet, a lightweight plug-and-play input enhancement block placed before the detector backbone to enhance the image before defect detection. RefDiffNet brings one proven idea from classical inspection into the deep learning era, using a defect-free reference image to reveal defects. RefDiffNet compares the defective image with the aligned reference, captures structural changes relative to the reference, and uses a lightweight encoder to output the original image with defective regions highlighted, thereby making the downstream detector's task easier. Results on HRIPCB and DeepPCB show that RefDiffNet consistently improves performance across detector families, including one-stage detectors from YOLOv8 to YOLOv26, the transformer-based RT-DETR, and the two-stage Faster R-CNN. It achieves up to 18% relative mAP50:95 gain with negligible overhead, introducing only 0.004 - 0.005M additional parameters and 0.7 - 0.8 GFLOPs, amounting to at most 0.25% of the parameter count of any evaluated detector. Results establish RefDiffNet as a lightweight, plug-and-play, detector-agnostic input enhancement module that substantially improves PCB defect detection with minimal computational cost.

2606.00851 2026-06-02 cs.SD cs.CL cs.HC cs.LG eess.AS

Sympatheia: Emotionally Adaptive Voice Assistant with Continuous Affect Conditioning

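Sukru Samet Dindar, Riki Shimizu, Xilin Jiang, Nima Mesgarani

Affiliations: Department of Electrical Engineering, Columbia University

AI summary: Proposes Sympatheia, a spoken dialogue framework that produces emotionally adaptive responses by inferring affect from the user's speech and combining it with a continuous valence-arousal control signal; it outperforms baseline models.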

Abstract

Empathetic spoken dialogue systems must infer a user's emotional state to respond appropriately, yet everyday speech often carries weak, neutral, or ambiguous affective cues. To address this, we introduce Sympatheia, a speech-to-speech dialogue framework conditioned on affect inferred from the user's speech and, when available, explicit affect specifications provided as a continuous valence--arousal (VA) control signal by a multimodal sensing module or user interface. To train our model, we construct Sympatheia-18k, an emotion-conditioned synthetic spoken dialogue corpus with 12 emotion anchors. This dataset includes an emotional split for learning affective speech behavior, and a neutral split that pairs emotionally neutral queries with multiple emotion-conditioned responses to isolate explicit emotion control in emotionally ambiguous cases. Empirical results show that Sympatheia outperforms speech conversational baselines in generating responses whose semantic content and spoken delivery are both emotionally appropriate. We further show that the same VA interface can integrate emotion estimates from diverse sensing modules, including facial expression, biosignals, and textual affect descriptions, improving response alignment when speech alone provides limited emotional evidence. These results suggest that continuous affect conditioning is an effective practical step for building emotionally adaptive voice assistants.

2606.00846 2026-06-02 cs.LG

CUPID in the Model Zoo: Online Matchmaking for Selecting Your Dream LLM

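Son Nguyen, Xinyuan Liu, Ransalu Senanayake

Affiliations: University of California, Berkeley

AI summary: Proposes an active learning framework based on a dueling bandit algorithm that iteratively selects pairs of LLMs and collects user feedback, efficiently matching user preferences to model capabilities.

Comments: 38 pages, 11 figures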

Abstract

Users increasingly face the challenge of selecting an appropriate LLM for a given task from a rapidly growing pool of LLMs, each with distinct but often opaque latent properties. Compounding this challenge, users may lack the vocabulary or awareness to explicitly articulate the characteristics they value in an LLM's responses or deployment. We propose an interaction-efficient active learning framework in which a dueling bandit algorithm iteratively selects pairs of LLMs, collects user feedback about their responses, and updates its belief about the user's latent preferences. We introduce a novel belief-aware upper confidence bound strategy that balances exploration of the model pool with exploitation of inferred preferences, enabling efficient alignment between user needs and LLM capabilities under user-specified cost and time budgets. Through diverse experiments on LLMs and human studies, we experimentally verify that our model can efficiently match well-aligned LLMs to users at a lower cost.
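To make the dueling-bandit loop in the abstract more tangible, the sketch below shows one generic way to pick the next pair of models from pairwise win counts using upper confidence bounds. It is a simplified, textbook-style heuristic and is explicitly not the paper's belief-aware strategy, whose details are not given in the abstract; the function name, the `alpha` exploration constant, and the champion/challenger rule are assumptions for illustration.

```python
import numpy as np

def select_duel(wins, t, alpha=0.5):
    """Pick a pair (champion, challenger) from pairwise preference counts.

    wins[i, j] is the number of times model i was preferred over model j;
    t is the current round, used to scale the exploration bonus.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins + wins.T                                   # comparisons per pair
    with np.errstate(divide="ignore", invalid="ignore"):
        mean = np.where(n > 0, wins / n, 0.5)           # empirical win rate
        bonus = np.where(n > 0, np.sqrt(alpha * np.log(max(t, 2)) / n), 1.0)
    ucb = np.minimum(mean + bonus, 1.0)                 # optimistic win rate
    np.fill_diagonal(ucb, 0.5)
    # Champion: the model whose weakest optimistic matchup is strongest.
    first = int(np.argmax(ucb.min(axis=1)))
    # Challenger: the opponent with the best optimistic chance against it.
    scores = ucb[:, first].copy()
    scores[first] = -np.inf
    second = int(np.argmax(scores))
    return first, second
```

After the user states a preference between the two responses, the caller would increment the matching entry of `wins` and call `select_duel` again, stopping once the cost or time budget is spent.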

2606.00844 2026-06-02 cs.CV cs.AI cs.LG

MoEIoU: Rethinking Bounding-Box Regression as a Mixture of Experts

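Vinay Edula, Priyanka Bagade

Affiliations: Indian Institute of Technology Kanpur

AI summary: Proposes the MoEIoU loss, which uses a mixture-of-experts formulation to jointly optimize overlap, center alignment, and aspect ratio with a curriculum-based weighting schedule, and outperforms existing IoU losses across multiple datasets and YOLO architectures.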

Abstract

Bounding-box regression is a fundamental component of object detection, playing a critical role in precise object localization. Existing Intersection-over-Union (IoU)-based loss functions extend the IoU objective by incorporating geometric penalties, such as center-distance and aspect-ratio mismatch, to improve bounding-box regression. However, these penalties typically remain fixed throughout training and do not account for the optimization dynamics in which predicted boxes initially exhibit large center-distance and shape errors, with later stages focusing on improving overlap with the ground truth. To address this limitation, we introduce MoEIoU, a mixture-of-experts based regression loss that jointly models overlap, center alignment, and aspect-ratio mismatch. MoEIoU aggregates these components using a log-sum-exp function, which emphasizes the dominant localization error while maintaining smooth contributions from other terms. Additionally, a curriculum-based weighting schedule is employed to prioritize correcting box position and shape in early training stages and improving overlap in later stages. We evaluated the proposed MoEIoU on PASCAL VOC, HRIPCB, and MS COCO using multiple YOLO architectures, along with large-scale simulation experiments. It consistently outperforms standard and recent state-of-the-art losses, demonstrating faster convergence and improved localization accuracy. We further show that this adaptive aggregation improves existing IoU-based losses, yielding consistent gains and providing more effective optimization guidance for bounding-box regression in object detection frameworks.
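The log-sum-exp aggregation and curriculum weighting described in the abstract can be sketched as follows. The exact MoEIoU formula is not given in the abstract, so everything here is an assumption for illustration: the three penalty terms reuse the well-known IoU, DIoU center-distance, and CIoU aspect-ratio definitions, and the weight schedule, temperature, and function names are hypothetical.

```python
import numpy as np

def box_terms(pred, gt):
    """Boxes as (x1, y1, x2, y2). Returns [overlap, centre, aspect] penalties."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    inter_w = max(min(px2, gx2) - max(px1, gx1), 0.0)
    inter_h = max(min(py2, gy2) - max(py1, gy1), 0.0)
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    overlap = 1.0 - inter / union                        # 1 - IoU
    # Squared centre distance over squared enclosing-box diagonal (DIoU term).
    diag2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2
    centre2 = ((px1 + px2 - gx1 - gx2) / 2.0) ** 2 + ((py1 + py2 - gy1 - gy2) / 2.0) ** 2
    centre = centre2 / diag2
    # Aspect-ratio mismatch (CIoU term v).
    aspect = (4.0 / np.pi ** 2) * (
        np.arctan((gx2 - gx1) / (gy2 - gy1)) - np.arctan((px2 - px1) / (py2 - py1))
    ) ** 2
    return np.array([overlap, centre, aspect])

def curriculum_weights(progress):
    """progress in [0, 1]; weight moves from position/shape terms to overlap."""
    w_overlap = 0.2 + 0.6 * progress                     # assumed linear schedule
    w_geom = (1.0 - w_overlap) / 2.0
    return np.array([w_overlap, w_geom, w_geom])         # sums to 1

def lse_aggregate(terms, weights, temperature=1.0):
    """Smooth maximum: tau * log(sum_k w_k * exp(term_k / tau)), computed stably."""
    z = np.asarray(terms) / temperature + np.log(np.asarray(weights))
    m = z.max()
    return float(temperature * (m + np.log(np.sum(np.exp(z - m)))))
```

Because the weights sum to one, the aggregate lies between the weighted mean of the terms and their maximum, and it is zero when the boxes coincide. Lowering `temperature` pushes the result toward the largest term, which is the sense in which log-sum-exp "emphasizes the dominant localization error".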

2606.00840 2026-06-02 cs.AI

Certificate-Guided Evaluation of Reinforcement Learning Generalization

Vignesh Subramanian, Đorđe Žikelić, Suguman Bansal

Affiliations: School of Computer Science, Georgia Institute of Technology; School of Computing and Information Systems, Singapore Management University

AI summary: Proposes a logic-driven framework that uses a neural certificate function to validate how well reinforcement learning algorithms generalize to unseen tasks, and shows that a lower certificate violation rate is associated with more test tasks being solved.

Abstract

This work presents a logic-driven framework to evaluate the performance of reinforcement learning (RL) algorithms in their ability to generalize to unseen tasks. Our framework defines a family of inductive reach-avoid tasks, characterized by structural similarities in task dynamics, enabling evaluation of generalization capabilities. We introduce a neural certificate function that validates trajectories generated by RL algorithms by enforcing key conditions, thereby serving as a litmus test for RL generalization. We empirically demonstrate our method's capability in certifying generalization for several state-of-the-art generalizable RL algorithms on challenging continuous environments. Our results show that a lower percentage of certificate function violations correlates with a higher number of test tasks successfully solved, highlighting the effectiveness of our framework in evaluating and distinguishing generalization capabilities of RL algorithms. This work provides a principled approach for benchmarking RL generalization.

2606.00838 2026-06-02 cs.AI

Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications

Vignesh Subramanian, Subhajit Roy, Suguman Bansal

Affiliations: School of Computer Science, Georgia Institute of Technology, USA; Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, India

AI summary: Proposes DIBS, which decouples learning task-specific policies from learning the evolution function and replaces noisy reward aggregation with behavioral cloning, improving training stability and zero-shot generalization.

Abstract

Inductive generalization is a framework for reinforcement learning (RL) generalization in which inductively related task instances admit inductively related policies. Prior work captures this structure via a higher-order policy-evolution function learned directly with RL, but suffers from poor training scalability: as training tasks grow, aggregated reward feedback becomes noisy and conflicting, destabilizing training and weakening generalization. We propose DIBS, a decoupled behavioral cloning approach that separates learning task-specific policies from learning the evolution function. We first learn individual teacher policies per task via standard RL, then fit the evolution function via behavioral cloning on teacher-labeled state-action pairs. This replaces noisy reward aggregation with dense, stable supervision. DIBS achieves significant improvements in both training stability and zero-shot generalization against existing RL and meta-RL algorithms.

2606.00837 2026-06-02 cs.RO cs.LG

Coarse-to-Fine Compositional Diffusion for Long-Horizon Planning

Byoungwoo Park, Utkarsh A. Mishra, Jaemoo Choi, Juho Lee, Yongxin Chen

Affiliations: KAIST; Georgia Institute of Technology

AI summary: Proposes Coarse-to-Fine Compositional Diffusion (CoFi), which first forms a global scaffold and then refines local detail, improving global coherence and local quality in long-horizon robotic planning, panoramic image generation, and long video generation while requiring 2-8x fewer denoiser evaluations.

Comments: Project page: https://cofi-diffusion.github.io

Abstract

Diffusion models provide strong priors for generating structured data, but many tasks require outputs beyond the scale on which these models are typically trained. Compositional generation addresses this by composing overlapping local plans from a pretrained short-horizon prior into a long-horizon output. However, standard composition primarily enforces agreement between neighboring local plans, yielding local consistency without directly specifying the global structure of the full composition. As a result, locally compatible plans may still form an implausible route, task sequence, or temporal evolution. Existing methods improve global coherence by repeatedly propagating local consistency signals or by adding inference-time optimization, but these procedures become expensive as the number or dimensionality of local plans increases. We propose Coarse-to-Fine Compositional Diffusion (CoFi), an inference-time sampler that separates global structure formation from local detail refinement. CoFi first aligns local denoised estimates around a shared coarse structure, producing a global scaffold that captures the long-range task-level arrangement. It then diffuses this scaffold to an intermediate noise level and denoises it with the same pretrained local prior, restoring local fine structure while preserving the scaffold-induced global coherence. Across long-horizon robotic planning, panoramic image generation, and long video generation, CoFi not only improves both global coherence and local sample quality over prior compositional baselines, but also requires 2-8x fewer denoiser evaluations.

2606.00835 2026-06-02 cs.LG

Online Packet Scheduling with Deadlines and Learning

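Gianmarco Genalti, Achraf Azize, Vianney Perchet

Affiliations: Politecnico di Milano; FairPlay Joint Team, CREST, ENSAE, IP Paris

AI summary: For online packet scheduling with unknown weights under partial feedback, the paper draws a connection to sleeping bandits and proposes algorithms that minimize α-regret, achieving optimal bounds under different levels of relaxation.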

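Abstract

Network routers that enforce quality-of-service (QoS) guarantees must decide, at every clock cycle, which soon-to-expire packet to transmit, even though a packet's value is unknown before it is processed. We frame this as online packet scheduling with deadlines (OPSD) under partial feedback: packets arrive at each clock cycle with different deadlines, but their weights are observed only after execution. Under a stochastic assumption on the unknown weights, we explore different variants of OPSD with bandit feedback. We establish a connection between our setting and sleeping bandits, and formulate the learning objective as α-regret minimization. We provide algorithms with provable α-regret guarantees under different levels of relaxation, distinguishing systems that allow randomization from those that do not. In each case, our algorithms achieve an α-regret upper bound of $\widetilde{\mathcal{O}}\left(\sqrt{KT}\right)$, matching the lower bound for the standard bandit setting. On the practically relevant 2-bounded-deadline instances, where a packet's deadline is at most one clock cycle after its arrival, our deterministic algorithm achieves the provably tightest competitive ratio. Notably, when the number of distinct packet types $K \ge 2$ is finite, it is possible to break the established competitive-ratio barrier of $\Phi = \frac{1+\sqrt{5}}{2}$ and obtain a tighter competitive ratio $\theta_K$ in the range $[\sqrt{2}, \Phi)$.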

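Partial Fairness Awareness: Belief-Guided Strategic Mechanism for Strategic Agents

Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu, Hao Zou, Shanzhi Gu, Liyang Xu, Huan Chen, Yuanlong Chen, Wenjing Yang, Haotian Wang

Affiliations: National University of Defense Technology, Changsha, China; Peking University, Beijing, China; Shanghai University of Finance and Economics, Shanghai, China; ZGC Laboratory, Beijing, China; Faculty of Computing, Harbin Institute of Technology, Harbin, China

AI summary: To address the fairness exposure dilemma in strategic classification, the paper poses the partial fairness awareness (PFA) problem: the candidate set of fairness constraints is released while the true constraint is concealed, and a belief-guided mechanism aligns agents' beliefs with the system's fairness constraint. Experiments show that PFA outperforms fully public or fully private fairness regimes in reducing group fairness gaps, raising the acceptance rate of qualified individuals, and stabilizing outcomes.

Comments: Accepted by AAAI 2026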

DOI: 10.1609/aaai.v40i29.39600

Abstract

Strategic machine learning investigates scenarios where agents manipulate their features to receive favorable decisions from predictive models. To address fairness concerns intrinsic to strategic classification, recent work has introduced group-specific fairness constraints. However, current fairness-aware approaches face a fundamental dilemma in the issue of fairness exposure: making these constraints public enables strategic manipulation and can lead to fairness reversal, while keeping them hidden may reduce social welfare and discourage genuine improvement. To fill this gap, we subsequently propose the problem of partial fairness awareness (PFA), as our theoretical analysis informs that such a dilemma can be mitigated by releasing the candidate set of fairness constraints and concealing the grounding constraint. To be specific, we introduce a belief-guided strategic mechanism, wherein agents iteratively interact with the decision system and maintain a belief distribution over the candidate set of fairness constraints. This belief-guided process enables agents, through iterative interaction and feedback, to update their belief distribution over the candidate set, thereby gradually aligning their belief with the grounding fairness constraint employed by the system. Extensive experiments on real-world and synthetic datasets demonstrate that PFA achieves lower group fairness gaps, higher acceptance of truly qualified individuals, and more stable outcomes compared to fully public or private fairness regimes.

2606.00825 2026-06-02 cs.CV cs.ET cs.HC cs.MA

SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory

Samiul Alam, Shakhrul Iman Siam, Michael J. Proulx, James Fort, Richard Newcombe, Hyo Jin Kim, Mi Zhang

Affiliations: The Ohio State University; Meta Project

AI summary: Introduces the SuperMemory-VQA dataset, comprising 52.9 hours of everyday activities recorded with AI glasses and 4,853 multiple-choice question-answer pairs, to evaluate AI assistants on long-horizon memory tasks; existing systems are found to be insufficiently reliable.

Comments: 34 pages, 21 figures, 5 tables

Abstract

AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genuinely useful, such systems must move beyond short-term video comprehension and address memory gaps that humans experience for practical, personal, or social purposes over longitudinal egocentric video streams. However, existing egocentric datasets predominantly focus on action recognition or generic QAs from short clips, measuring perceptual capabilities rather than realistic human memory needs. We introduce SuperMemory-VQA, an egocentric visual question answering (VQA) dataset for evaluating AI assistants on practical, long-horizon memory tasks. It contains 52.9 hours of everyday activities recorded with AI glasses, including synchronized RGB video, audio transcription, eye gaze, IMU, and SLAM trajectories. Through a human-verified annotation pipeline, we construct 4,853 grounded question-answer pairs that span object and location memory, intent recall, visual scene recall, timeline reconstruction, conversational memory, and in-context retrieval. Each question is posed as multiple-choice with an explicit "unanswerable" option to test hallucination robustness. Benchmarking leading agentic frameworks and LLM backbones reveals that existing systems remain far from reliable on real-world memory tasks, highlighting the need for new architectures for grounded AI memory that can answer only when evidence is sufficient. A participant survey further supports that our questions are realistic, useful, and aligned with everyday memory needs.

hZACH-ViT: Curved Latent Geometry for Compact Vision Transformers in Low-Data Medical Imaging

Ryze: Evidence-Enriched Data Synthesis from Biomedical Papers

Citation Grounding: Detecting and Reducing LLM Citation Hallucinations via Legal Citation Graphs

An Exploratory Study into using Machine-Learning for Fast Step-by-step Emulation of Numerical Mechanical Thrombectomy Simulations for Ischemic Stroke

MMDG-Bench: A Benchmark for Multimodal Domain Generalization

Cohort-Scale Neural Atlases of Ultrasound Video

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

GABI: Geometry-Aware Boundary Integration for Spacecraft Segmentation

Dive into Waves: Morlet Spectral Transformer for Cross-Subject Emotion Decoding from EEG

Chunking Methods on Retrieval-Augmented Generation - Effectiveness Evaluation Against Computational Cost and Limitations

Task diversity produces systematic transfer but inhibits continual reinforcement learning

IDEAFix: Evaluation Framework for Creative Defixation Prompting in LLMs

Images as Tables: In-Context Learning with TabPFN for Low-Data Detection of AI-Generated Images

Benchmarks for Vision-Language Models in Urban Perception Should Be Reliability-Aware and Negotiated

Enhancing LLM Metacognition via Cognitive Pairwise Training

From Cues to Horizons: Dynamic Risk Horizon Profiling for Trajectory Prediction

RefDiffNet: Learning to Expose Subtle PCB Defects Before Detection

Sympatheia: Emotionally Adaptive Voice Assistant with Continuous Affect Conditioning

CUPID in the Model Zoo: Online Matchmaking for Selecting Your Dream LLM

MoEIoU: Rethinking Bounding-Box Regression as a Mixture of Experts

Certificate-Guided Evaluation of Reinforcement Learning Generalization

Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications

Coarse-to-Fine Compositional Diffusion for Long-Horizon Planning

Online Packet Scheduling with Deadlines and Learning

Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations

Subliminal Learning is a LoRA Artifact

The Right Inference Strategy Is All You Need: Nearly Training-Free Domain-Wise Inference for EgoCross Challenge

RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes

Partial Fairness Awareness: Belief-Guided Strategic Mechanism for Strategic Agents

SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory