arXivDaily: arXiv daily academic digest, updated Monday through Friday
All subject categories: 2332
2508.20206 2026-05-13 cs.LG cs.AI

Filter then Attend: Improving attention-based Time Series Forecasting with Spectral Filtering

Elisha Dayag, Nhat Thanh Van Tran, Jack Xin

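AI summary: This paper studies how frequency-domain filtering can improve Transformer-based long-term time series forecasting models. The authors propose adding learnable frequency-domain filters at the model's input stage to strengthen the model's use of different frequency components. Experiments show that the method improves forecasting performance on multiple datasets and allows the embedding dimension to be reduced, making the models smaller and more efficient.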

Abstract

Transformer-based models are at the forefront of long time-series forecasting (LTSF). While these models often achieve state-of-the-art results, they suffer from a bias toward low frequencies in the data and from high computational and memory requirements. Recent work has established that learnable frequency filters can be an integral part of a deep forecasting model by enhancing the model's spectral utilization. These works use a multilayer perceptron to process their filtered signals and thus do not address the issues found in transformer-based models. In this paper, we establish that adding a filter to the beginning of transformer-based models enhances their performance in long time-series forecasting. We add learnable filters, which contribute only $\approx 1000$ additional parameters, to several transformer-based models and observe, in multiple instances, a 5-10% relative improvement in forecasting performance. Additionally, we find that with filters added, we can decrease the embedding dimension of our models, resulting in transformer-based architectures that are both smaller and more effective than their non-filtering base models. We also conduct synthetic experiments to analyze how the filters enable transformer-based models to better utilize the full spectrum for forecasting.
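A learnable input filter of this kind can be sketched as: transform the lookback window to the frequency domain, scale each frequency bin by a weight, and transform back. This is a minimal plain-Python sketch under stated assumptions, not the authors' implementation: it uses a naive O(L^2) DFT, one real weight per non-negative frequency (shared with the mirrored bin so the output stays real), and fixed weights where a model would learn them by gradient descent; `spectral_filter` is a hypothetical name.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, keeping only the real part (the inputs here are real signals)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_filter(x, weights):
    """Scale each frequency bin of a real signal by a real weight.

    `weights` has len(x) // 2 + 1 entries (one per non-negative frequency);
    bin k and its mirror n - k share a weight, so the filtered signal stays real.
    In a forecasting model these weights would be trainable parameters.
    """
    n = len(x)
    if len(weights) != n // 2 + 1:
        raise ValueError("expected len(x) // 2 + 1 weights")
    spectrum = dft(x)
    gains = [weights[min(k, n - k)] for k in range(n)]
    return idft([s * g for s, g in zip(spectrum, gains)])


# Usage: a low-pass setting keeps the slow sinusoid and removes the fast one.
n = 16
signal = [math.sin(2 * math.pi * t / n) + math.sin(2 * math.pi * 6 * t / n) for t in range(n)]
low_pass = [1.0 if k <= 2 else 0.0 for k in range(n // 2 + 1)]
filtered = spectral_filter(signal, low_pass)
```

With this parameterization a window of length L adds L // 2 + 1 weights per filter; the paper's exact parameterization (for example complex weights or several filters) may differ.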

2508.16070 2026-05-13 cs.CL

Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants

Chongyang Li, Zhiqiang Yuan, Hanbo Bi, Zexi Jia, Jinchao Zhang

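AI summary: This paper studies how to make vision language models more practical in walking assistance systems for blind and low-vision users. To address the redundant outputs of existing models and their lack of proactive environmental risk assessment, it proposes WalkVLM-LR, a walking assistance model with less redundancy. The model optimizes the conciseness and accuracy of its outputs with human-preference-based reward functions and adds an environment awareness discriminator to assess risk more efficiently. Experiments show that it outperforms existing methods in output conciseness and temporal redundancy.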

Comments ICASSP 2026 Best Industry Paper

Abstract

Approximately 283 million people worldwide live with visual impairments, motivating increasing research into leveraging Vision Language Models (VLMs) to develop effective walking assistance systems for blind and low-vision individuals. However, existing VLMs for walking assistance often produce outputs that contain considerable redundancy and extraneous detail, hindering users' ability to accurately assess their surroundings. Moreover, these models typically lack the capability to proactively assess environmental risks and adaptively trigger reminders appropriate to the scene, leading to excessive temporal redundancy. To mitigate output and temporal redundancy, we propose WalkVLM-LR, a walking assistance model with less redundancy. To reduce output redundancy, we introduce four human-preference-based custom reward functions within the GRPO-based reasoning framework that optimize the output for conciseness, fluency, keyword density, and accuracy, producing more informative and streamlined outputs. To minimize temporal redundancy, we incorporate an environment awareness discriminator that lets WalkVLM-LR assess scene risk levels and avoid unnecessary reminders; it shares the visual encoder with the VLM to reduce redundant computation and improve discriminative efficiency. Experimental results demonstrate that our method achieves state-of-the-art performance across all evaluation metrics compared with other models, particularly in output conciseness and temporal redundancy.
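GRPO samples a group of responses to the same input, scores each one, and uses the group-normalized reward (each reward minus the group mean, divided by the group standard deviation) as the advantage. The sketch below pairs that normalization with a toy composite reward; `conciseness`, `keyword_density`, their equal weights, and the keyword list are hypothetical stand-ins for two of the paper's four reward functions (fluency and accuracy are omitted), and using the population standard deviation is an assumption.

```python
import statistics


def conciseness(text, target_words=12):
    """Hypothetical reward: 1.0 at or below the target length, decaying beyond it."""
    n = len(text.split())
    return 1.0 if n <= target_words else target_words / n


def keyword_density(text, keywords):
    """Hypothetical reward: fraction of words that are safety-relevant keywords."""
    words = [w.strip(".,!").lower() for w in text.split()]
    return sum(w in keywords for w in words) / max(len(words), 1)


def composite_reward(text, keywords, w_concise=0.5, w_keyword=0.5):
    return w_concise * conciseness(text) + w_keyword * keyword_density(text, keywords)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (r_i - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


keywords = {"stairs", "car", "left", "right", "stop"}
group = [  # three sampled responses for the same scene
    "Stop. Stairs ahead on the left.",
    "There is a beautiful sunny sky and some trees and a building far away, also stairs.",
    "Car approaching from the right.",
]
rewards = [composite_reward(text, keywords) for text in group]
advantages = group_relative_advantages(rewards)
```

The short, keyword-dense responses receive positive advantages and the verbose one a negative advantage, which is the signal that pushes the policy toward concise reminders.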

2508.14780 2026-05-13 cs.LG cs.IT math.IT

Context Steering: A New Paradigm for Compression-based Embeddings by Synthesizing Relevant Information Features

Guillermo Sarasa, Ana Granados, Francisco de Borja Rodríguez

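AI summary: This paper proposes "context steering", a new method for compression-based embeddings that synthesizes relevant information features to make embeddings better suited to the task. The method actively guides the feature-generation process, analyzing how each object influences the relational context within a clustering framework, to produce tailored embeddings that highlight class-distinctive information. Experiments show that it yields robust task-oriented embeddings across a variety of heterogeneous datasets and improves both classification and clustering performance.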

Abstract

Compression-based dissimilarities (CD) offer a flexible and domain-agnostic means of measuring similarity by identifying implicit information through redundancies between data objects. However, because similarity features are derived from the data rather than defined as an input, they are often difficult to align with the task at hand, particularly in complex clustering or classification settings. To address this issue, we introduce "context steering", a novel methodology that actively guides the feature-shaping process. Instead of passively accepting the emergent data structure (typically a hierarchy derived from clustering CDs), our approach "steers" the process by systematically analyzing how each object influences the relational context within a clustering framework. This process generates a custom-tailored embedding that isolates and amplifies class-distinctive information. We validate this supervised context-steering strategy using Normalized Compression Distance (NCD) and Relative Compression Distance (NRC) combined with hierarchical clustering, and evaluate the learned embeddings through both classification performance and cluster-quality metrics. Experiments on heterogeneous datasets, from text to real-world audio, show that the proposed approach yields robust task-oriented embeddings from compression dissimilarities, moving from traditional transductive uses of distance matrices to an inductive representation that can be applied to unseen data.
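NCD, one of the two dissimilarities used in the experiments, has a standard definition: $\mathrm{NCD}(x,y) = \frac{C(xy) - \min(C(x), C(y))}{\max(C(x), C(y))}$, where $C$ is the compressed length. A minimal sketch with zlib as the compressor (an assumption; the compressor used in the paper is not stated here) produces the kind of distance matrix that hierarchical clustering would consume; the context-steering procedure itself is not shown.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for near-identical inputs, near 1 for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(objects):
    """Pairwise dissimilarity matrix, e.g. as input to hierarchical clustering."""
    return [[ncd(a, b) for b in objects] for a in objects]


rng = random.Random(0)
text_a = b"the quick brown fox jumps over the lazy dog. " * 20
text_b = b"the quick brown fox jumps over the lazy cat. " * 20
noise = bytes(rng.randrange(256) for _ in range(900))  # incompressible reference
matrix = ncd_matrix([text_a, text_b, noise])
```

In practice NCD(x, x) is slightly above 0 because of compressor overhead, while related texts score well below unrelated or random data.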

2508.10036 2026-05-13 cs.CL cs.AI cs.IR cs.LG

Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion

Dong Zhao, Yadong Wang, Xiang Chen, Chenxi Wang, Hongliang Dai, Chuanxing Geng, Shengzhong Zhang, Shaoyuan Li, Sheng-Jun Huang

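AI summary: The study proposes APIE, an active prompting framework that guides large language models on information extraction tasks. Based on the principle of "introspective confusion", it quantifies two dimensions, format uncertainty and content uncertainty, to assess the model's own confusion, and uses this score to select the most challenging and informative samples as few-shot exemplars. Experiments show that the method significantly improves extraction accuracy and robustness on multiple benchmark datasets.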

Comments Published at AAAI 2026

Abstract

Large Language Models (LLMs) show remarkable potential for few-shot information extraction (IE), yet their performance is highly sensitive to the choice of in-context examples. Conventional selection strategies often fail to provide informative guidance, as they overlook a key source of model fallibility: confusion stemming not just from semantic content, but also from the generation of well-structured formats required by IE tasks. To address this, we introduce Active Prompting for Information Extraction (APIE), a novel active prompting framework guided by a principle we term introspective confusion. Our method empowers an LLM to assess its own confusion through a dual-component uncertainty metric that uniquely quantifies both Format Uncertainty (difficulty in generating correct syntax) and Content Uncertainty (inconsistency in extracted semantics). By ranking unlabeled data with this comprehensive score, our framework actively selects the most challenging and informative samples to serve as few-shot exemplars. Extensive experiments on four benchmarks show that our approach consistently outperforms strong baselines, yielding significant improvements in both extraction accuracy and robustness. Our work highlights the critical importance of a fine-grained, dual-level view of model uncertainty when it comes to building effective and reliable structured generation systems.
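One way to read the dual-component score: sample several extractions per unlabeled input, measure format uncertainty as the fraction that fail to parse, and content uncertainty as disagreement among those that do parse. The sketch below is a hypothetical instantiation, not the paper's metric: JSON lists as the output format, one minus the mean pairwise Jaccard similarity as disagreement, maximal content uncertainty when fewer than two outputs parse, and an unweighted sum as the combined score are all assumptions.

```python
import itertools
import json


def parse_entities(output: str):
    """Return the set of extracted strings, or None if the output is not a JSON list of strings."""
    try:
        value = json.loads(output)
    except json.JSONDecodeError:
        return None
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        return None
    return set(value)


def jaccard(a: set, b: set) -> float:
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def introspective_confusion(samples):
    """Hypothetical score = format uncertainty + content uncertainty over sampled outputs."""
    parsed = [parse_entities(s) for s in samples]
    valid = [p for p in parsed if p is not None]
    format_u = 1 - len(valid) / len(samples)
    pairs = list(itertools.combinations(valid, 2))
    # With fewer than two parsable outputs, treat content uncertainty as maximal.
    content_u = 1 - sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
    return format_u + content_u


def select_exemplars(pool, k):
    """Pick the k unlabeled inputs whose sampled outputs show the most confusion."""
    ranked = sorted(pool, key=lambda item: introspective_confusion(item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


pool = [  # (input text, sampled LLM outputs); all values are made up
    ("Alice joined Acme.", ['["Alice", "Acme"]', '["Alice", "Acme"]', '["Alice", "Acme"]']),
    ("Bob met Carol in Paris.", ['["Bob", "Paris"]', '["Carol"]', 'Bob, Carol']),
    ("Dana works at Initech.", ['["Dana", "Initech"]', '["Dana"]', '["Dana", "Initech"]']),
]
chosen = select_exemplars(pool, k=1)
```

The second input is chosen because one sample is malformed (format uncertainty) and the parsable samples disagree completely (content uncertainty).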

2508.08420 2026-05-13 cs.LG stat.ML

Regret minimization in Linear Bandits with offline data via extended D-optimal exploration

Sushant Vijayan, Arun Suggala, Karthikeyan Shanmugam, Soumyabrata Pal

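AI summary: This paper studies how to minimize cumulative online regret in linear bandit problems when offline data is available. It proposes the Offline-Online Phased Elimination (OOPE) algorithm, which uses an extended D-optimal design in its exploration phases to exploit the offline data effectively and substantially reduce online regret. The algorithm's online regret bound is $\tilde{O}(\sqrt{d_{\mathrm{eff}} T \log (|\mathcal{A}|T)} + d^2)$, where $d_{\mathrm{eff}}$ measures the number of poorly explored directions in the offline data and thus reflects its quality. The paper also gives minimax regret lower bounds that depend on the quality of the offline data, and further tightens the $O(d^2)$ term with a Frank-Wolfe approximation to the extended design.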

Comments Accepted to TMLR, with J2C certification, link: https://openreview.net/forum?id=4WcK8gKgCi

Abstract

We consider the problem of online regret minimization in linear bandits with access to prior observations (offline data) from the underlying bandit model. Extensive offline data is available in numerous applications, such as recommendation systems and online advertising, and consequently this problem has been studied intensively in recent literature. Our algorithm, Offline-Online Phased Elimination (OOPE), effectively incorporates the offline data to substantially reduce the online regret compared to prior work. To leverage offline information prudently, OOPE uses an extended D-optimal design within each exploration phase. OOPE achieves an online regret of $\tilde{O}(\sqrt{d_{\mathrm{eff}} T \log \left(|\mathcal{A}|T\right)}+d^2)$, where $d_{\mathrm{eff}} \leq d$ is the effective problem dimension, which measures the number of poorly explored directions in the offline data and depends on the eigen-spectrum $(\lambda_k)_{k \in [d]}$ of the Gram matrix of the offline data. The eigen-spectrum $(\lambda_k)_{k \in [d]}$ is a quantitative measure of the \emph{quality} of the offline data. If the offline data is poorly explored ($d_{\mathrm{eff}} \approx d$), we recover the established regret bounds for the purely online setting, while when the offline data is abundant ($T_{\mathrm{off}} \gg T$) and well explored ($d_{\mathrm{eff}} = o(1)$), the online regret is reduced substantially. Additionally, we provide the first known minimax regret lower bounds in this setting that depend explicitly on the quality of the offline data. These lower bounds establish the optimality of our algorithm in regimes where offline data is either well explored or poorly explored. Finally, by using a Frank-Wolfe approximation to the extended optimal design, we further improve the $O(d^{2})$ term to $O\left(\frac{d^{2}}{d_{\mathrm{eff}}} \min \{ d_{\mathrm{eff}},1\} \right)$, which can be a substantial improvement in high dimensions when the offline data is of moderate quality ($d_{\mathrm{eff}} = \Omega(1)$).
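A D-optimal design chooses a distribution $\pi$ over arms that maximizes the log-determinant of the information matrix, and Frank-Wolfe approximates it by repeatedly shifting mass toward the arm with the largest gradient $n\,x_a^\top M^{-1} x_a$. The sketch below assumes one plausible "extended" objective, $\log\det\big(V_{\mathrm{off}} + n\sum_a \pi_a x_a x_a^\top\big)$ with $V_{\mathrm{off}}$ the offline Gram matrix and $n$ the phase length; the paper's exact objective, step sizes, and stopping rule may differ. It is written in plain Python with a small Gauss-Jordan inverse to stay self-contained.

```python
def mat_inv(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    d = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def information_matrix(arms, pi, v_off, n):
    """M(pi) = V_off + n * sum_a pi_a x_a x_a^T."""
    d = len(v_off)
    return [[v_off[i][j] + n * sum(p * x[i] * x[j] for p, x in zip(pi, arms))
             for j in range(d)] for i in range(d)]


def frank_wolfe_design(arms, v_off, n, iters=200):
    """Approximately maximize log det M(pi) over the probability simplex."""
    k = len(arms)
    pi = [1.0 / k] * k  # uniform start keeps M(pi) invertible when the arms span R^d
    for t in range(iters):
        m_inv = mat_inv(information_matrix(arms, pi, v_off, n))
        # d/dpi_a log det M = n * x_a^T M^{-1} x_a; move toward the best vertex.
        grads = [n * sum(x[i] * m_inv[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))
                 for x in arms]
        best = max(range(k), key=grads.__getitem__)
        step = 2.0 / (t + 3)  # always < 1, so no weight is driven exactly to zero
        pi = [(1 - step) * p for p in pi]
        pi[best] += step
    return pi


arms = [[1.0, 0.0], [0.0, 1.0]]
offline_gram = [[100.0, 0.0], [0.0, 0.0]]  # offline data covers only the first direction
design = frank_wolfe_design(arms, offline_gram, n=10)
```

Because the offline data already covers the first direction, almost all online exploration mass goes to the second arm; with no offline data the same routine approaches the uniform design.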

2508.05269 2026-05-13 cs.CV

B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding

Changho Choi, Youngwoo Shin, Gyojin Han, Dong-Jae Lee, Junmo Kim

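AI summary: The study proposes B4DL, a new benchmark for training and evaluating multimodal large language models (MLLMs) on spatio-temporal understanding of 4D LiDAR. To address the limited use of 4D LiDAR data in MLLMs, it designs a scalable data generation pipeline and proposes the first MLLM that directly processes raw 4D LiDAR data and bridges it with language understanding, offering a unified solution for spatio-temporal reasoning in dynamic outdoor environments.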

Comments Accepted at ACM MM 2025

Abstract

Understanding dynamic outdoor environments requires capturing complex object interactions and their evolution over time. LiDAR-based 4D point clouds provide precise spatial geometry and rich temporal cues, making them ideal for representing real-world scenes. However, despite their potential, 4D LiDAR remains underexplored in the context of Multimodal Large Language Models (MLLMs) due to the absence of high-quality, modality-specific annotations and the lack of MLLM architectures capable of processing its high-dimensional composition. To address these challenges, we introduce B4DL, a new benchmark specifically designed for training and evaluating MLLMs on 4D LiDAR understanding. In addition, we propose a scalable data generation pipeline and an MLLM model that, for the first time, directly processes raw 4D LiDAR by bridging it with language understanding. Combined with our dataset and benchmark, our model offers a unified solution for spatio-temporal reasoning in dynamic outdoor environments. We provide rendered 4D LiDAR videos, generated dataset, and inference outputs on diverse scenarios at: https://github.com/ccho4702/B4DL

2507.21159 2026-05-13 cs.AI cs.LG cs.MA

MAC: Masked Agent Collaboration Boosts Large Language Model Medical Decision-Making

Zhihao Peng, Liuxin Bao, Yixuan Yuan

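AI summary: The study proposes MAC, a masked agent collaboration framework for improving the medical decision-making of large language models. Through Pareto-optimal agent construction and a cross-consistency maximization mechanism, the method achieves adaptive progressive propagation of collaborative information, improving the accuracy and robustness of medical decisions. It also introduces model diversity assessment and an output-consistency screening strategy to optimize the agent collaboration process and reduce the impact of semantic inconsistency.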

Abstract

Large language models (LLMs) have proven effective across artificial intelligence, and multi-agent systems (MAS) that enable LLMs to collaborate hold considerable promise for healthcare. However, the absence of a systematic pipeline for agent construction and the rigidity of static collaboration patterns make current MAS-based models vulnerable to collaboration failures, resulting in substantial performance degradation in medical decision-making scenarios. To this end, we propose a novel Masked Agent Collaboration (MAC) framework that harnesses Pareto-optimal agent construction and cross-consistency maximization mechanisms to achieve adaptive progressive propagation of collaborative information, boosting medical decision-making capacity. Specifically, we first conduct a Pareto-frontier analysis of the LLM pool over key factors, including model size, inference time, diversity score, and throughput ratio, where an LLM's diversity score is derived from the similarity between pairs of its own outputs. This analysis identifies Pareto-optimal models that balance efficiency and capability; these are selected as collaborative agents, accounting for the fundamental trade-offs inherent in practical LLM deployment. Afterward, we measure the pairwise similarity between the outputs of the collaborative agents to determine their cross-consistency values, then mask out the agent with the lowest cross-consistency value to eliminate output that is likely semantically inconsistent. Finally, the agents collaborate through adaptive progressive propagation: each agent aggregates the outputs of the unmasked agents from the previous layer as its input and generates its own output via prompt engineering.
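Two of the steps above are simple to sketch: keeping the Pareto-optimal models in the pool, and masking the agent whose output agrees least with the others. Everything here is a hypothetical illustration: the model names and factor values are made up, "better" is assumed to mean smaller size and inference time but higher diversity and throughput, and token-level Jaccard similarity stands in for whatever similarity measure the authors use.

```python
import itertools


def dominates(a, b):
    """a dominates b if it is no worse on every factor and strictly better on at least one.

    Assumed orientation: size and latency lower is better; diversity and throughput higher is better.
    """
    no_worse = (a["size"] <= b["size"] and a["latency"] <= b["latency"]
                and a["diversity"] >= b["diversity"] and a["throughput"] >= b["throughput"])
    strictly = (a["size"] < b["size"] or a["latency"] < b["latency"]
                or a["diversity"] > b["diversity"] or a["throughput"] > b["throughput"])
    return no_worse and strictly


def pareto_front(models):
    return [m for m in models if not any(dominates(o, m) for o in models if o is not m)]


def token_jaccard(x, y):
    a, b = set(x.lower().split()), set(y.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def mask_least_consistent(outputs):
    """Return the agent names kept after masking the lowest mean pairwise similarity."""
    names = list(outputs)
    score = {n: 0.0 for n in names}
    for a, b in itertools.combinations(names, 2):
        s = token_jaccard(outputs[a], outputs[b])
        score[a] += s / (len(names) - 1)
        score[b] += s / (len(names) - 1)
    masked = min(names, key=score.get)
    return [n for n in names if n != masked]


pool = [  # hypothetical models and factor values
    {"name": "A", "size": 7, "latency": 1.0, "diversity": 0.6, "throughput": 50},
    {"name": "B", "size": 13, "latency": 2.0, "diversity": 0.5, "throughput": 30},
    {"name": "C", "size": 70, "latency": 4.0, "diversity": 0.9, "throughput": 10},
]
agents = [m["name"] for m in pareto_front(pool)]  # B is dominated by A
layer_outputs = {  # "D" is a third hypothetical agent in one collaboration layer
    "A": "likely diagnosis is pneumonia",
    "C": "the likely diagnosis is pneumonia",
    "D": "recommend an immediate cardiac workup",
}
kept = mask_least_consistent(layer_outputs)
```

Here agent D shares no tokens with the others, so it receives the lowest cross-consistency and is masked before its output can propagate to the next layer.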

2507.13625 2026-05-13 cs.AI

Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety

Yuxin Zhang, Xi Wang, Mo Hu, Zhenyu Zhang

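AI summary: This paper studies multi-hop question answering over complex construction safety regulations to support automated compliance checking. It proposes BifrostRAG, a dual-graph retrieval-augmented generation system that models both linguistic relationships and document structure; its hybrid retrieval mechanism, which fuses graph traversal with semantic vector search, improves large language models' reasoning over both the content and the structure of regulations. Experiments show that BifrostRAG performs strongly on a multi-hop question dataset, significantly outperforming vector-only and graph-only baselines, and offers a transferable solution for intelligent processing of complex technical documents.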

Comments 22 pages, 13 figures

Journal ref
Automation in Construction, Volume 183, March 2026, 106794
Abstract

Information retrieval and question answering from safety regulations are essential for automated construction compliance checking but are hindered by the linguistic and structural complexity of regulatory text. Many queries are multi-hop, requiring synthesis across interlinked clauses. To address the challenge, this paper introduces BifrostRAG, a dual-graph retrieval-augmented generation (RAG) system that models both linguistic relationships and document structure. The proposed architecture supports a hybrid retrieval mechanism that combines graph traversal with vector-based semantic search, enabling large language models to reason over both the content and the structure of the text. On a multi-hop question dataset, BifrostRAG achieves 92.8% precision, 85.5% recall, and an F1 score of 87.3%. These results significantly outperform vector-only and graph-only RAG baselines, establishing BifrostRAG as a robust knowledge engine for LLM-driven compliance checking. The dual-graph, hybrid retrieval mechanism presented in this paper offers a transferable blueprint for navigating complex technical documents across knowledge-intensive engineering domains.
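A hybrid retriever of this general shape can be sketched as: rank clauses by vector similarity to the query, then follow graph edges (for example, cross-references between clauses) from the top hits to pull in linked clauses that a multi-hop question needs. The sketch is a toy under assumptions: hand-written embedding vectors, hypothetical clause names, a single edge type, and breadth-first expansion; it does not reproduce BifrostRAG's dual graphs.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def hybrid_retrieve(query_vec, clause_vecs, edges, k=1, hops=1):
    """Vector search for k seed clauses, then breadth-first expansion along graph edges."""
    seeds = sorted(clause_vecs, key=lambda c: cosine(query_vec, clause_vecs[c]), reverse=True)[:k]
    retrieved, frontier = list(seeds), list(seeds)
    for _ in range(hops):
        nxt = []
        for clause in frontier:
            for neighbor in edges.get(clause, []):
                if neighbor not in retrieved:
                    retrieved.append(neighbor)
                    nxt.append(neighbor)
        frontier = nxt
    return retrieved


clause_vecs = {  # hypothetical clauses with toy embeddings
    "scaffold_fall_protection": [0.9, 0.1, 0.0],
    "definitions": [0.1, 0.2, 0.9],  # lexically unlike the query
    "ladders": [0.2, 0.9, 0.1],
}
edges = {"scaffold_fall_protection": ["definitions"]}  # hypothetical cross-reference
query = [1.0, 0.0, 0.1]
context = hybrid_retrieve(query, clause_vecs, edges, k=1, hops=1)
```

Vector search alone would rank "definitions" last; the graph hop recovers it because the top-ranked clause references it, which is the kind of gap a multi-hop question exposes.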

2507.12002 2026-05-13 cs.LG

Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing

Alice Zhang, Callihan Bertley, Dawei Liang, Edison Thomaz

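AI summary: The study proposes a new method that uses smartwatch audio and motion sensing data to detect face-to-face verbal conversations in real-world environments. By fusing microphone audio with six-axis inertial sensor data, it designs and trains convolutional and attention-based neural networks that capture non-verbal conversational cues. Experiments show that multimodal fusion significantly improves detection, reaching macro F1-scores of 82.0% in the lab and 77.2% in a semi-naturalistic setting, which supports the method's effectiveness in practice.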

Comments Accepted to ACM Transactions on Intelligent Systems and Technology

Abstract

Social interactions play a crucial role in shaping human behavior, relationships, and societies. They encompass various forms of communication, such as verbal conversation, non-verbal gestures, facial expressions, and body language. In this work, we develop a novel computational approach to detect face-to-face verbal conversations, a foundational aspect of human social interactions. We leverage multimodal data captured by a commodity smartwatch, specifically synchronizing microphone audio with 6-axis inertial signals (accelerometer and gyroscope). We design, train, and evaluate convolutional and attention-based neural networks using three different fusion methods to integrate the audio and motion modalities. To validate this framework, we conduct a lab study with 11 participants and a semi-naturalistic study with 24 participants. Our comprehensive evaluation demonstrates that fusing inertial data with audio significantly improves detection performance by capturing non-verbal conversational dynamics. Overall, our framework achieved 82.0$\pm$3.0% macro F1-score when detecting conversations in the lab and 77.2$\pm$1.8% in the semi-naturalistic setting. Lastly, we demonstrate real-time conversation detection by deploying our trained model to a user application running on a commercial smartwatch.
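Macro F1, the metric reported above, is the unweighted mean of per-class F1 scores, so the minority class counts as much as the majority class. A minimal sketch for the binary case, assuming labels 1 for conversation and 0 for no conversation and made-up predictions:

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of per-class F1 (1 = conversation, 0 = no conversation)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)  # (2/3 for class 1 + 4/5 for class 0) / 2
```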

2507.06694 2026-05-13 cs.LG cs.SY eess.SP eess.SY

Heterogeneous Graph Neural Networks for Short-term State Forecasting in Power Systems across Domains and Time Scales: A Hydroelectric Power Plant Case Study

Raffael Theiler, Olga Fink

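AI summary: This paper studies how to use heterogeneous graph neural networks for short-term state forecasting in power systems that span multiple physical domains and time scales. To address the limitations of conventional graph neural networks on heterogeneous sensor data, the authors propose an approach based on heterogeneous graph attention networks that models sensor relationships within and across two domains, hydraulic and electrical. Experiments show that the method improves normalized root mean square error by 35.5% on average over conventional baselines, confirming its effectiveness for multi-domain, multi-time-scale power system state forecasting.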

Comments 25 pages, 9 figures

Abstract

Accurate short-term state forecasting is essential for efficient and stable operation of modern power systems, especially in the context of increasing variability introduced by renewable and distributed energy resources. As these systems evolve rapidly, it becomes increasingly important to reliably predict their states in the short term to ensure operational stability, support control decisions, and enable interpretable monitoring of sensor and machine behavior. Modern power systems often span multiple physical domains (including electrical, mechanical, hydraulic, and thermal), posing significant challenges for modeling and prediction. Graph Neural Networks (GNNs) have emerged as a promising data-driven framework for system state estimation and state forecasting in such settings. By leveraging the topological structure of sensor networks, GNNs can implicitly learn inter-sensor relationships and propagate information across the network. However, most existing GNN-based methods are designed under the assumption of homogeneous sensor relationships and are typically constrained to a single physical domain. This limitation restricts their ability to integrate and reason over heterogeneous sensor data commonly encountered in real-world energy systems, such as those used in energy conversion infrastructure. In this work, we propose the use of Heterogeneous Graph Attention Networks to address these limitations. Our approach models both homogeneous intra-domain and heterogeneous inter-domain relationships among sensor data from two distinct physical domains, hydraulic and electrical, which exhibit fundamentally different temporal dynamics. Experimental results demonstrate that our method significantly outperforms conventional baselines, by 35.5% on average in terms of normalized root mean square error, confirming its effectiveness in multi-domain, multi-rate power system state forecasting.

2507.03622 2026-05-13 cs.LG cs.AI stat.ML

Localising Dropout Variance in Twin Networks

Cooper Doyle

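AI summary: The paper studies how to localize the sources of predictive uncertainty in twin-network models. It proposes a layer-wise variance decomposition that splits total predictive variance into an encoder component and an outcome-head component; toggling Monte Carlo Dropout independently in the shared encoder and the outcome heads separates uncertainty from these two sources. Experiments show that encoder variance dominates under distribution shift and is the main indicator of prediction error, while head variance becomes informative only once encoder uncertainty is controlled. The method is cheap and gives practical guidance for data collection.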

Comments 14 pages, 5 figures, 3 tables

Abstract

Accurate individual treatment-effect estimation demands not only reliable point predictions but also uncertainty measures that help practitioners \emph{locate} the source of model failure. We introduce a layer-wise variance decomposition for deep twin-network models: by toggling Monte Carlo Dropout independently in the shared encoder and the outcome heads, we split total predictive variance into an \emph{encoder component} ($\sigma_{\mathrm{enc}}^2$) and a \emph{head component} ($\sigma_{\mathrm{head}}^2$), with $\sigma_{\mathrm{enc}}^2 + \sigma_{\mathrm{head}}^2 \approx \sigma_{\mathrm{tot}}^2$ by the law of total variance. Across three synthetic covariate-shift regimes, the encoder component dominates under distributional shift ($\rho_{\mathrm{enc}}=0.53$) while the head component becomes informative only once encoder uncertainty is controlled. On a real-world twins cohort with induced multivariate shift, only $\sigma_{\mathrm{enc}}^2$ spikes on out-of-distribution samples and becomes the primary error predictor ($\rho_{\mathrm{enc}}\!\approx\!0.89$), while $\sigma_{\mathrm{head}}^2$ remains flat. The decomposition adds negligible cost over standard MC Dropout and provides a practical diagnostic for deciding whether to collect more diverse covariates or more outcome data.
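One way to realize such a decomposition is nested Monte Carlo sampling: draw an outer set of encoder dropout masks and, for each one, an inner set of head masks. Then $\sigma_{\mathrm{enc}}^2$ is the variance across encoder masks of the inner means, $\sigma_{\mathrm{head}}^2$ is the mean of the inner variances, and with population variances and equal inner sample counts the two add up to the total variance exactly. The toy "network" below (fixed linear encoder and head with inverted dropout, pure Python) is an assumption made for illustration; the paper's estimator, which toggles dropout in each part independently, may differ.

```python
import random
import statistics


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale survivors by 1 / (1 - p)."""
    return [0.0 if rng.random() < p else v / (1 - p) for v in values]


def encoder(x, rng, p):
    hidden = [x * w for w in (0.5, -1.0, 2.0, 1.5)]  # toy fixed weights
    return dropout(hidden, p, rng)


def head(h, rng, p):
    weights = (1.0, 0.5, -0.25, 0.75)  # toy fixed weights
    return sum(w * v for w, v in zip(weights, dropout(h, p, rng)))


def decompose(x, n_enc=200, n_head=50, p=0.2, seed=0):
    rng = random.Random(seed)
    inner_means, inner_vars, all_preds = [], [], []
    for _ in range(n_enc):
        h = encoder(x, rng, p)                             # one encoder mask ...
        preds = [head(h, rng, p) for _ in range(n_head)]   # ... many head masks
        inner_means.append(statistics.fmean(preds))
        inner_vars.append(statistics.pvariance(preds))
        all_preds.extend(preds)
    var_enc = statistics.pvariance(inner_means)  # Var over encoder masks of E[y | encoder mask]
    var_head = statistics.fmean(inner_vars)      # mean over encoder masks of Var[y | encoder mask]
    var_tot = statistics.pvariance(all_preds)
    return var_enc, var_head, var_tot


var_enc, var_head, var_tot = decompose(x=1.0)
```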

2506.23723 2026-05-13 cs.RO

A comprehensive control architecture for semi-autonomous dual-arm robots in agriculture settings

Jozsef Palmieri, Paolo Di Lillo, Stefano Chiaverini, Alessandro Marino

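AI summary: This paper presents a comprehensive control architecture for semi-autonomous dual-arm robots in agricultural settings, aimed at complex tasks such as grape harvesting. Built on a 16-DOF dual-arm mobile robot, the architecture uses Hierarchical Quadratic Programming (HQP) to handle equality and inequality constraints at multiple priority levels while harvesting grape bunches selected by the perception system. To cope with environmental uncertainty and potential collisions, the architecture also handles interaction forces within the HQP framework and lets a human operator assist in completing tasks. Extensive tests in the laboratory and in a real vineyard validate its effectiveness.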

Journal ref
Control Engineering Practice, Vol. 163, 2025
Abstract

The adoption of mobile robotic platforms in complex environments, such as agricultural settings, requires these systems to exhibit a flexible yet effective architecture that integrates perception and control. In such scenarios, several tasks need to be accomplished simultaneously, ranging from managing robot limits to performing operational tasks and handling human inputs. The purpose of this paper is to present a comprehensive control architecture for achieving complex tasks such as robotized harvesting in vineyards within the framework of the European project CANOPIES. In detail, a 16-DOF dual-arm mobile robot is employed, controlled via a Hierarchical Quadratic Programming (HQP) approach capable of handling both equality and inequality constraints at various priorities to harvest grape bunches selected by the perception system developed within the project. Furthermore, given the complexity of the scenario and the uncertainty in the perception system, which could potentially lead to collisions with the environment, the handling of interaction forces is necessary. Remarkably, this was achieved using the same HQP framework. This feature is further leveraged to enable semi-autonomous operations, allowing a human operator to assist the robotic counterpart in completing harvesting tasks. Finally, the obtained results are validated through extensive testing conducted first in a laboratory environment to prove individual functionalities, then in a real vineyard, encompassing both autonomous and semi-autonomous grape harvesting operations.

2506.22809 2026-05-13 cs.LG cs.AI cs.CL

Learning Adapter Rank via Symmetry Breaking

Cooper Doyle, Andy Hu, Rebecca Chan, Anna Leontjeva

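AI summary: To address the non-identifiability of adapter rank coordinates in low-rank adaptation (LoRA), the study uses variational inference with a diagonal posterior to break LoRA's rotational symmetry, enabling automatic determination of the importance of each rank direction. On this basis it proposes Low-Rank Variational Dropout (LRVD), a framework that performs Bayesian inference directly in the low-rank space, and its instantiation BayesLoRA, which jointly learns effective adapter rank and predictive uncertainty with only a few additional parameters. Experiments show that it yields compact predictive calibration at comparable training cost and matches or exceeds strong low-rank sparsification baselines.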

Comments 8 pages, 2 figures, 4 tables

Abstract

Low-rank adaptation is effective partly because downstream updates lie in a low-dimensional subspace, but the latent rank coordinates of LoRA are not identifiable: any invertible reparameterization of the adapter factors leaves the weight update unchanged. We show that variational inference with a diagonal rank-wise posterior turns this non-identifiability into a useful inductive bias. By breaking LoRA's rotational gauge symmetry, the variational objective selects a preferred basis in rank space, enabling automatic relevance determination over rank directions. This yields Low-Rank Variational Dropout (LRVD), a Bayesian framework that performs inference directly in the low-rank adaptation space rather than the ambient weight space. As an instantiation, BayesLoRA jointly learns effective adapter rank and predictive uncertainty with only $\mathcal{O}(r)$ additional parameters. Empirically, BayesLoRA induces stable rank structure aligned with the dominant singular directions of learned updates, yields compact predictive calibration and matches or exceeds strong low-rank sparsification baselines at comparable training cost.
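The non-identifiability is easy to check numerically: for LoRA factors $B$ ($d \times r$) and $A$ ($r \times k$) and any invertible $r \times r$ matrix $R$, $(BR)(R^{-1}A) = BA$, so the rank coordinates carry no intrinsic meaning. A rank-wise diagonal gate, as in $B\,\mathrm{diag}(g)\,A$, is not invariant under such reparameterizations, which is why a diagonal rank-wise posterior singles out a preferred basis. The plain-Python sketch below uses made-up matrices and a hand-set gate rather than learned variational parameters.

```python
def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def diag(values):
    return [[v if i == j else 0.0 for j in range(len(values))] for i, v in enumerate(values)]


def close(x, y, tol=1e-9):
    return all(abs(a - b) < tol for rx, ry in zip(x, y) for a, b in zip(rx, ry))


B = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]          # d x r with d = 3, r = 2
A = [[1.0, 0.0, 2.0, 1.0], [0.5, -1.0, 0.0, 2.0]]  # r x k with k = 4
R = [[2.0, 1.0], [1.0, 1.0]]                       # invertible (det = 1)
R_inv = [[1.0, -1.0], [-1.0, 2.0]]

# Gauge symmetry: the weight update is unchanged by the reparameterization.
delta_w = matmul(B, A)
delta_w_reparam = matmul(matmul(B, R), matmul(R_inv, A))
symmetric = close(delta_w, delta_w_reparam)

# A rank-wise (diagonal) gate breaks the symmetry: gating the same rank
# coordinates of the reparameterized factors yields a different update.
gate = diag([1.0, 0.0])  # hand-set: keep rank direction 0, prune direction 1
gated = matmul(matmul(B, gate), A)
gated_reparam = matmul(matmul(matmul(B, R), gate), matmul(R_inv, A))
symmetry_broken = not close(gated, gated_reparam)
```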

2506.19417 2026-05-13 cs.LG cs.MA

Focusing Influence Mechanism for Multi-Agent Reinforcement Learning

Yisak Park, Sunwoo Lee, Seungyul Han

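AI summary: Cooperative multi-agent reinforcement learning (MARL) struggles with coordinated exploration under sparse rewards. This paper proposes the Focusing Influence Mechanism (FIM), which uses an entropy-based criterion to steer agents' influence toward under-explored regions of the state space and uses eligibility traces to keep multiple agents consistently collaborating on beneficial regions, improving the coordination and persistence of joint behavior. Experiments show that FIM improves cooperative performance across diverse MARL benchmarks, with clear gains in sparse-reward settings.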

Comments 9 technical pages, followed by references and appendix

Abstract

Cooperative multi-agent reinforcement learning (MARL) under sparse rewards remains fundamentally challenging because agents often fail to concentrate their influence, leading to insufficiently coordinated exploration. To address this, we propose the Focusing Influence Mechanism (FIM), a framework that encourages agents to focus their influence on under-explored parts of the state space through an entropy-based criterion, while leveraging eligibility traces to enable multiple agents to consistently align and sustain their influence on the same parts of the state space when beneficial, thereby promoting coordinated and persistent joint behavior. By emphasizing under-explored regions of the state space, FIM facilitates more efficient and structured exploration even under extremely sparse rewards. Across diverse MARL benchmarks, FIM consistently improves cooperative performance over strong baselines.
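As a rough illustration of the two ingredients, one can score each state dimension by the entropy of its empirical visitation histogram (low entropy suggesting an under-explored dimension) and keep an eligibility trace that accumulates, with decay, whenever an agent influences that dimension. This is a hypothetical sketch: the binning, the choice of the lowest-entropy dimension as the focus, and the trace parameters are assumptions, not FIM's actual criterion.

```python
import math
from collections import Counter


def entropy(values, bins=4, lo=0.0, hi=1.0):
    """Shannon entropy (nats) of a histogram of values in [lo, hi]."""
    counts = Counter(min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts.values())


def focus_dimension(visited_states):
    """Pick the state dimension whose visited values are least spread out."""
    dims = range(len(visited_states[0]))
    return min(dims, key=lambda d: entropy([s[d] for s in visited_states]))


def update_trace(trace, influenced, gamma=0.99, lam=0.9):
    """Accumulating eligibility trace: decay by gamma * lambda, add 1 when the agent
    changed the focused dimension at this step."""
    return gamma * lam * trace + (1.0 if influenced else 0.0)


visited = [(0.1, 0.05), (0.4, 0.02), (0.7, 0.08), (0.95, 0.01)]
focus = focus_dimension(visited)  # dimension 1 barely varies, so it becomes the focus
trace = 0.0
for influenced in [True, True, False]:
    trace = update_trace(trace, influenced)
```

The trace stays high while agents keep acting on the same focused dimension and decays once they stop, which is one way to encode the "sustain influence on the same part of the state space" idea.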

2506.14097 2026-05-13 cs.RO cond-mat.soft physics.comp-ph

Smooth-Rigid-Body Contact as a ReLCP: A Recursively Generated Linear Complementarity Problem

Bryce Palmer, Hasan Metin Aktulga, Tong Gao

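AI summary: This paper reformulates complementarity-based time-stepping for frictionless nonsmooth contact between smooth rigid bodies as a recursively generated linear complementarity problem (ReLCP), built up as a sequence of LCPs of increasing dimension. Starting from the classical single-constraint shared-normal signed-distance (SNSD) LCP, the method adds a unilateral constraint only when the discrete-time update predicted by the current contact set would cause surface penetration. It therefore acts directly on smooth geometry, enforces nonpenetration, and avoids the oversampling of proxy-surface models. Theoretical analysis shows that for strictly convex bodies and sufficiently small timesteps the method terminates finitely with a unique velocity update, and numerical experiments confirm its stability and efficiency at large timesteps.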

Abstract

This paper reformulates complementarity-based time-stepping for frictionless nonsmooth contact between smooth rigid bodies as a recursively generated linear complementarity problem (ReLCP), involving a sequence of LCPs of increasing dimension. Starting from a classical single-constraint shared-normal signed-distance (SNSD) LCP, the method adds unilateral constraints only when the discrete-time update predicted by the current contact set would violate nonpenetration of the underlying smooth surfaces. The resulting procedure acts directly on smooth geometry, enforces nonpenetration to a prescribed tolerance, and avoids the oversampling inherent to proxy-surface contact models such as tessellations or multi-sphere decompositions, for which improved geometric fidelity can drive rapid growth in constraint count and cost. For strictly convex bodies, we prove that an initially overlap-free configuration and sufficiently small timesteps imply finite termination of the adaptive augmentation and a unique discrete-time velocity update. In the small timestep limit and for any fixed overlap-free discrete state with a fixed geometric overlap tolerance, we prove that the recursion terminates after the initial solve, reducing the method to the classical single-constraint SNSD LCP and retaining the usual consistency of complementarity time-stepping with the underlying differential variational inequality. Numerical tests on colliding ellipsoids, compacting ellipsoid suspensions, growing bacterial colonies, and taut chainmail networks demonstrate stable large-timestep behavior, bounded interpenetration without discretization-induced surface roughness, and substantial reductions in both active constraint counts and runtime relative to representative discrete-surface complementarity formulations.
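The recursion can be illustrated on a plain LCP (find $z \ge 0$ with $w = Mz + q \ge 0$ and $z^\top w = 0$): solve the problem restricted to the current constraint set, check the remaining candidate constraints against the resulting update, add any that are violated, and re-solve. The sketch is an analogy under assumptions: candidates come from a fixed dense matrix rather than from smooth-surface geometry, violation means a negative residual rather than a penetration test, and the inner solver is projected Gauss-Seidel, which converges for symmetric positive definite $M$.

```python
def pgs(M, q, iters=500):
    """Projected Gauss-Seidel for the LCP: z >= 0, w = M z + q >= 0, z . w = 0."""
    n = len(q)
    z = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            r = q[i] + sum(M[i][j] * z[j] for j in range(n) if j != i)
            z[i] = max(0.0, -r / M[i][i])
    return z


def recursive_lcp(M, q, active, tol=1e-9, max_rounds=10):
    """Grow the active constraint set until no inactive constraint is violated."""
    active = list(active)
    n = len(q)
    for _ in range(max_rounds):
        sub_M = [[M[i][j] for j in active] for i in active]
        sub_z = pgs(sub_M, [q[i] for i in active])
        z = [0.0] * n
        for idx, val in zip(active, sub_z):
            z[idx] = val
        w = [q[i] + sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
        violated = [i for i in range(n) if i not in active and w[i] < -tol]
        if not violated:
            return z, active
        active.extend(violated)
    raise RuntimeError("constraint set did not stabilize")


M = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
q = [-1.0, 1.0, -1.0]
z, active = recursive_lcp(M, q, active=[0])  # start from a single constraint
```

Only two of the three candidate constraints are ever added, mirroring how the adaptive augmentation keeps the constraint count small when most candidate contacts are inactive.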

2506.08902 2026-05-13 cs.LG cs.AI

Intention-Conditioned Flow Occupancy Models

Chongyi Zheng, Seohong Park, Sergey Levine, Benjamin Eysenbach

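AI summary: This paper proposes intention-conditioned flow occupancy models (InFOM), a probabilistic model that predicts the distribution of states an agent will visit in the temporally distant future. Built with flow matching, the model includes a latent variable capturing user intention, which increases its expressivity and supports generalized policy improvement. Experiments on 36 state-based and 4 image-based benchmark tasks show that, compared with alternative pre-training methods, InFOM achieves a 1.8× median improvement in returns and a 36% higher success rate.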

Comments ICLR 2026

Abstract

Large-scale pre-training has fundamentally changed how machine learning research is done today: large foundation models are trained once and can then be used by anyone in the community (including those without the data or compute resources to train a model from scratch) to adapt and fine-tune to specific tasks. Applying this same framework to reinforcement learning (RL) is appealing because it offers compelling avenues for addressing core challenges in RL, including sample efficiency and robustness. However, a fundamental challenge remains in pre-training large models for RL: actions have long-term dependencies, so training a foundation model that reasons across time is important. Recent advances in generative AI have provided new tools for modeling highly complex distributions. In this paper, we build a probabilistic model to predict which states an agent will visit in the temporally distant future (i.e., an occupancy measure) using flow matching. As large datasets are often constructed by many distinct users performing distinct tasks, we include in our model a latent variable capturing the user intention. This intention increases the expressivity of our model and enables adaptation with generalized policy improvement. We call our proposed method intention-conditioned flow occupancy models (InFOM). Our experiments on $36$ state-based and $4$ image-based benchmark tasks demonstrate that, compared with alternative pre-training methods, the proposed method achieves a $1.8 \times$ median improvement in returns and increases success rates by $36\%$. Website: https://chongyi-zheng.github.io/infom Code: https://github.com/chongyi-zheng/infom
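Two generic ingredients behind an occupancy model trained with flow matching can be sketched: a future state drawn from the discounted occupancy measure (offset $k \ge 1$ with probability $(1-\gamma)\gamma^{k-1}$, clipped at the end of the trajectory here), and a flow-matching regression target along the straight path $x_\tau = (1-\tau)x_0 + \tau x_1$, whose velocity is $x_1 - x_0$. This is not InFOM's training objective, which also conditions on an inferred intention latent and may construct targets differently; the function names, toy trajectory, and clipping rule are assumptions.

```python
import random


def sample_future_state(trajectory, t, gamma, rng):
    """Draw s_{t+k} with k ~ Geometric(1 - gamma) on {1, 2, ...}, clipped to the trajectory end."""
    k = 1
    while rng.random() < gamma:
        k += 1
    return trajectory[min(t + k, len(trajectory) - 1)]


def flow_matching_example(trajectory, t, gamma, rng):
    """Return (x_tau, tau, target velocity) for a velocity network v(x_tau, tau | s_t),
    plus the endpoints x0 and x1 for inspection."""
    x1 = sample_future_state(trajectory, t, gamma, rng)   # data sample: a future state
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]                # noise sample
    tau = rng.random()                                    # flow time in [0, 1)
    x_tau = [(1 - tau) * a + tau * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]              # d x_tau / d tau along the path
    return x_tau, tau, target, x0, x1


rng = random.Random(0)
trajectory = [[float(i), 0.5 * i] for i in range(20)]  # toy 2-D states
x_tau, tau, target, x0, x1 = flow_matching_example(trajectory, t=5, gamma=0.9, rng=rng)
```

A velocity network regressed onto `target` at `x_tau` would, once trained, transport noise to samples of future states by integrating the learned field from $\tau = 0$ to $\tau = 1$.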

2506.02215 2026-05-13 cs.RO cs.SY eess.SY

Active inference as a unified model of collision avoidance behavior in human drivers

Julian F. Schumann, Johan Engström, Leif Johnson, Matthew O'Kelly, Joao Messias, Jens Kober, Arkady Zgonnikov

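AI summary: This paper proposes a computational cognitive model based on active inference that gives a unified account of human drivers' collision avoidance behavior. The model simulates human responses by minimizing free energy in two typical collision scenarios, front-to-rear lead vehicle braking and lateral incursion by an oncoming vehicle, and reproduces a range of previous empirical findings, including response timing and maneuver selection. The study demonstrates the potential of active inference as a unified framework for understanding human behavior in complex driving tasks.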

Abstract

Collision avoidance, which involves rapid threat detection and quick execution of the appropriate evasive maneuver, is a critical aspect of driving. However, existing models of human collision avoidance behavior are fragmented, focusing on specific scenarios or only describing certain aspects of the avoidance behavior, such as response times. This paper addresses these gaps by proposing a novel computational cognitive model of human collision avoidance behavior based on active inference. Active inference provides a unified approach to modeling human behavior: the minimization of free energy. Building on prior active inference work, our model incorporates established cognitive mechanisms such as evidence accumulation to simulate human responses in two distinct collision avoidance scenarios: front-to-rear lead vehicle braking and lateral incursion by an oncoming vehicle. We demonstrate that our model explains a wide range of previous empirical findings on human collision avoidance behavior. Specifically, the model closely reproduces both aggregate results from meta-analyses previously reported in the literature and detailed, scenario-specific effects observed in a recent driving simulator study, including response timing, maneuver selection, and execution. Our results highlight the potential of active inference as a unified framework for understanding and modeling human behavior in complex real-life driving tasks.
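Evidence accumulation, one of the cognitive mechanisms the model builds on, is commonly described as noisy integration of a perceptual signal until it crosses a threshold, with the crossing time giving the response time. The sketch below is a generic accumulator simulated with Euler-Maruyama steps, not the authors' active inference model; the drift values (standing in for, say, how strongly the lead vehicle's looming grows), noise level, and threshold are hypothetical.

```python
import math
import random


def response_time(drift, threshold=1.0, noise=0.0, dt=0.01, t_max=10.0, seed=0):
    """Integrate evidence += drift * dt + noise * sqrt(dt) * N(0, 1) until it reaches threshold.

    Returns the crossing time in seconds, or None if there is no response within t_max.
    """
    rng = random.Random(seed)
    evidence, t = 0.0, 0.0
    while t < t_max:
        evidence += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if evidence >= threshold:
            return t
    return None


# Stronger evidence (e.g. harder braking by the lead vehicle) gives faster responses.
fast = response_time(drift=2.0)
slow = response_time(drift=0.5)
noisy = response_time(drift=2.0, noise=0.5, seed=1)
```

With zero noise the crossing time is simply threshold / drift (up to one time step); adding noise produces the response-time variability that empirical studies report.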

2505.20761 2026-05-13 cs.LG stat.ML

Practical estimation of the optimal classification error with soft labels and calibration

Ryota Ushio, Takashi Ishida, Masashi Sugiyama

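AI summary This paper studies how to estimate the optimal classification error rate (i.e., the Bayes error) for binary classification in a way that is both practical and theoretically rigorous. The authors extend an existing soft-label-based method in two important ways. First, they analyze the bias of the hard-label-based estimator. They show that its decay rate depends on how well the two class-conditional distributions are separated, and that it can be substantially better than the previous result as the number of hard labels per instance grows. Second, they address estimation when soft labels are corrupted. They point out that even calibrated soft labels can yield inaccurate estimates, and they propose an estimator based on isotonic calibration that remains statistically consistent under weaker assumptions. The method does not require access to the instances, which makes it suitable for practical settings with privacy constraints. Experiments validate the effectiveness of the method.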

Comments ICLR 2026 camera-ready version updated; 40 pages, 12 figures; GitHub: https://github.com/RyotaUshio/bayes-error-estimation

English abstract

While the performance of machine learning systems has experienced significant improvement in recent years, relatively little attention has been paid to the fundamental question: to what extent can we improve our models? This paper provides a practical and theoretically supported means of answering this question in the setting of binary classification. We extend a previous work that utilizes soft labels for estimating the Bayes error, the optimal error rate, in two important ways. First, we theoretically investigate the properties of the bias of the hard-label-based estimator discussed in the original work. We reveal that the decay rate of the bias is adaptive to how well the two class-conditional distributions are separated, and it can decay significantly faster than the previous result suggested as the number of hard labels per instance grows. Second, we tackle a more challenging problem setting: estimation with corrupted soft labels. One might be tempted to use calibrated soft labels instead of clean ones. However, we reveal that a calibration guarantee is not enough; that is, even perfectly calibrated soft labels can result in a substantially inaccurate estimate. Then, we show that isotonic calibration can provide a statistically consistent estimator under an assumption weaker than that of the previous work. Our method is instance-free, i.e., we do not assume access to any input instances. This feature allows it to be adopted in practical scenarios where the instances are not available due to privacy issues. Experiments with synthetic and real-world datasets show the validity of our methods and theory. The code is available at https://github.com/RyotaUshio/bayes-error-estimation.
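The soft-label approach rests on the identity that the Bayes error of a binary problem equals the expected value of min(η(x), 1 − η(x)), where η(x) = P(y = 1 | x). This lets the error be estimated from labels alone, without the instances. The sketch below shows three ingredients: the plug-in estimator, empirical soft labels built from several hard labels per instance, and a pool-adjacent-violators isotonic fit. Combining them as shown (calibrating corrupted soft labels against one hard label each, then plugging in) is only an illustration, not the paper's estimator. The data split and any consistency conditions it needs are not modeled, and all numbers are made up.

```python
from typing import List, Sequence


def bayes_error_from_soft_labels(soft_labels: Sequence[float]) -> float:
    """Plug-in estimate of the Bayes error: mean of min(eta, 1 - eta),
    where each soft label eta approximates P(y = 1 | x) for one instance."""
    return sum(min(p, 1.0 - p) for p in soft_labels) / len(soft_labels)


def soft_labels_from_votes(votes: Sequence[Sequence[int]]) -> List[float]:
    """Turn m binary (0/1) hard labels per instance into empirical soft labels k / m."""
    return [sum(v) / len(v) for v in votes]


def isotonic_fit(scores: Sequence[float], labels: Sequence[float]) -> List[float]:
    """Pool-adjacent-violators: least-squares non-decreasing fit of labels on scores.

    Tied scores are pooled into one block; fitted values are returned in the
    original order of the inputs.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block: [label_sum, count, largest_score_in_block]
    for i in order:
        if blocks and blocks[-1][2] == scores[i]:
            blocks[-1][0] += labels[i]
            blocks[-1][1] += 1
        else:
            blocks.append([float(labels[i]), 1, scores[i]])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            label_sum, count, top = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2] = top
    fitted_sorted: List[float] = []
    for label_sum, count, _ in blocks:
        fitted_sorted.extend([label_sum / count] * count)
    fitted = [0.0] * len(scores)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


# Hypothetical corrupted soft labels and one observed hard label per instance.
corrupted = [0.05, 0.3, 0.35, 0.6, 0.9]
hard = [0, 1, 0, 1, 1]
calibrated = isotonic_fit(corrupted, hard)
print(bayes_error_from_soft_labels(calibrated))
```

The isotonic step keeps the ranking of the corrupted soft labels but replaces their values with label frequencies. This is why it can repair a monotone distortion that a calibration guarantee alone does not rule out.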

2505.19770 2026-05-13 cs.LG cs.CL

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO

Ruizhe Shi, Minhak Song, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon S. Du

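AI summary This paper presents a fine-grained theoretical analysis of the performance gap between two-stage reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO). It shows that the gap stems from an explicit representation gap under exact optimization and an implicit representation gap under finite samples. Under exact optimization, the relative capacities of the reward model and the policy model affect the quality of the final policy, and RLHF, DPO, or online DPO may each come out ahead depending on how the models are misspecified. Under approximate optimization with a sparse ground-truth reward, RLHF has a statistical advantage in the number of samples needed to recover an effective reward model, indicating that two-stage learning is preferable in some scenarios. Together, these results provide a comprehensive theoretical account of the performance difference between RLHF and DPO and offer guidance for choosing between them in practice.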

Comments ICML accepted version

English abstract

We present a fine-grained theoretical analysis of the performance gap between two-stage reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO). Our study decomposes this gap into two sources: the explicit representation gap under exact optimization and the implicit representation gap under finite samples. In the exact optimization setting, we characterize how the relative capacities of the reward and policy model classes influence the quality of the final policy. We show that RLHF, DPO, or online DPO can outperform one another depending on the type of model misspecification. Notably, online DPO can outperform both RLHF and standard DPO when the reward and policy model classes are isomorphic and both misspecified. In the approximate optimization setting, we provide a concrete construction where the ground-truth reward is sparse and show that RLHF requires significantly fewer samples than DPO to recover an effective reward model, highlighting a statistical advantage of two-stage learning. Together, these results provide a comprehensive understanding of the performance gap between RLHF and DPO under various settings, and offer practical insights into when each method is preferred.

2505.13770 2026-05-13 cs.AI cs.CL cs.LG stat.ME stat.ML

Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference

Jin Du, Li Chen, Xun Xian, An Luo, Fangqiao Tian, Ganghua Wang, Charles Doss, Xiaotong Shen, Jie Ding

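AI summary This study examines how well large language models (LLMs) handle statistical pitfalls in causal inference. It finds that current models fall clearly short on complex statistical problems such as Simpson's paradox and selection bias. To address this, the study proposes CausalPitfalls, a comprehensive benchmark that uses structured challenges at multiple difficulty levels, paired with grading rubrics, to systematically evaluate models' causal reasoning ability and the reliability of their answers. The experimental results reveal the limitations of existing LLMs in statistical causal reasoning and provide an important reference for building trustworthy causal reasoning systems.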

English abstract

Reliable causal inference is essential for making decisions in high-stakes areas like medicine, economics, and public policy. However, it remains unclear whether large language models (LLMs) can handle rigorous and trustworthy statistical causal inference. Current benchmarks usually involve simplified tasks. For example, these tasks might only ask LLMs to identify semantic causal relationships or draw conclusions directly from raw data. As a result, models may overlook important statistical pitfalls, such as Simpson's paradox or selection bias. This oversight limits the applicability of LLMs in the real world. To address these limitations, we propose CausalPitfalls, a comprehensive benchmark designed to rigorously evaluate the capability of LLMs in overcoming common causal inference pitfalls. Our benchmark features structured challenges across multiple difficulty levels, each paired with grading rubrics. This approach allows us to quantitatively measure both causal reasoning capabilities and the reliability of LLMs' responses. We evaluate models using two protocols: (1) direct prompting, which assesses intrinsic causal reasoning, and (2) code-assisted prompting, where models generate executable code for explicit statistical analysis. Additionally, we validate the effectiveness of the rubric-based judge by comparing its scoring with assessments from human experts. Our results reveal significant limitations in current LLMs when performing statistical causal inference. The CausalPitfalls benchmark provides essential guidance and quantitative metrics to advance the development of trustworthy causal reasoning systems.
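Simpson's paradox, one of the pitfalls named above, can be reproduced with a few lines of arithmetic. The counts below are invented for illustration and do not come from the benchmark. Treatment A has the higher recovery rate within each severity stratum, yet the lower rate once the strata are pooled, because A is given mostly to severe cases. Direct standardization (weighting each stratum by its share of all patients) recovers the within-stratum ordering, under the assumption that severity is the only confounder.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Invented counts: (recovered, treated) for each treatment and severity stratum.
DATA: Dict[str, Dict[str, Tuple[int, int]]] = {
    "A": {"mild": (18, 20), "severe": (40, 80)},
    "B": {"mild": (64, 80), "severe": (8, 20)},
}


def stratum_rates(data):
    """Recovery rate of each treatment within each stratum."""
    return {t: {s: Fraction(r, n) for s, (r, n) in strata.items()} for t, strata in data.items()}


def pooled_rates(data):
    """Recovery rate of each treatment after collapsing the strata (the naive comparison)."""
    return {
        t: Fraction(sum(r for r, _ in strata.values()), sum(n for _, n in strata.values()))
        for t, strata in data.items()
    }


def standardized_rates(data):
    """Direct standardization: weight stratum rates by each stratum's share of all patients."""
    strata_names = list(next(iter(data.values())).keys())
    totals = {s: sum(data[t][s][1] for t in data) for s in strata_names}
    grand_total = sum(totals.values())
    rates = stratum_rates(data)
    return {
        t: sum(Fraction(totals[s], grand_total) * rates[t][s] for s in strata_names)
        for t in data
    }


print({t: {s: float(v) for s, v in r.items()} for t, r in stratum_rates(DATA).items()})
print({t: float(v) for t, v in pooled_rates(DATA).items()})        # B looks better pooled
print({t: float(v) for t, v in standardized_rates(DATA).items()})  # A is better again
```

Exact `Fraction` arithmetic avoids rounding noise, so the reversal (0.58 vs. 0.72 pooled, 0.7 vs. 0.6 standardized) is visible without tolerance checks.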

2505.05772 2026-05-13 cs.CL cs.LG

Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM

Zehao Fan, Garrett Gagnon, Zhenyu Liu, Liu Liu

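AI summary This paper studies how to run the decoding phase of large language models (LLMs) efficiently on processing-in-memory (PIM) architectures. Conventional PIM designs, built for dense attention, struggle with the irregular access patterns introduced by key-value (KV) cache sparsity. To address this, the paper proposes STARC, which clusters KV pairs by semantic similarity and maps them to contiguous memory regions aligned with the PIM storage structure, reducing memory accesses and computation during decoding. Experiments show that STARC significantly lowers attention-layer latency and energy consumption while maintaining model accuracy, demonstrating its effectiveness for efficient long-context LLM inference on PIM architectures.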

Comments Early preprint; peer-reviewed version of record published in ASPLOS '26

English abstract

Transformer-based models are the foundation of modern machine learning, but their execution, particularly during autoregressive decoding in large language models (LLMs), places significant pressure on memory systems due to frequent memory accesses and growing key-value (KV) caches. This creates a bottleneck in memory bandwidth, especially as context lengths increase. Processing-in-memory (PIM) architectures are a promising solution, offering high internal bandwidth and compute parallelism near memory. However, current PIM designs are primarily optimized for dense attention and struggle with the dynamic, irregular access patterns introduced by modern KV cache sparsity techniques. Consequently, they suffer from workload imbalance, reducing throughput and resource utilization. In this work, we propose STARC, a novel sparsity-optimized data mapping scheme tailored specifically for efficient LLM decoding on PIM architectures. STARC clusters KV pairs by semantic similarity and maps them to contiguous memory regions aligned with PIM bank structures. During decoding, queries retrieve relevant tokens at cluster granularity by matching against precomputed centroids, enabling selective attention and parallel processing without frequent reclustering or data movement overhead. Experiments on the HBM-PIM system show that, compared to common token-wise sparsity methods, STARC reduces attention-layer latency by 19%--31% and energy consumption by 19%--27%. Under a KV cache budget of 1024, it achieves up to 54%--74% latency reduction and 45%--67% energy reduction compared to full KV cache retrieval. Meanwhile, STARC maintains model accuracy comparable to state-of-the-art sparse attention methods, demonstrating its effectiveness in enabling efficient and hardware-friendly long-context LLM inference on PIM architectures.
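The retrieval idea described above, where keys are clustered once and each query then scores cluster centroids and attends only to tokens in the best-matching clusters, can be sketched in plain Python. This is an illustration under stated assumptions, not STARC itself. Plain k-means stands in for whatever clustering the paper uses. Whole clusters are taken in rank order until the next one would exceed the token `budget`. The PIM bank layout and HBM-PIM execution are not modeled. The helper names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def kmeans(keys: Sequence[Vec], k: int, iters: int = 10) -> Tuple[List[List[float]], List[List[int]]]:
    """Plain k-means over key vectors; returns (centroids, token indices per cluster).

    Centroids start from evenly spaced keys so the result is deterministic.
    """
    n = len(keys)
    centroids = [list(keys[i * n // k]) for i in range(k)]
    assign = [0] * n
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, key in enumerate(keys):
            assign[i] = min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(key, centroids[c])))
        # Update step: mean of members (an empty cluster keeps its old centroid).
        for c in range(k):
            members = [keys[i] for i in range(n) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    clusters = [[i for i in range(n) if assign[i] == c] for c in range(k)]
    return centroids, clusters


def select_tokens(query: Vec, centroids, clusters, budget: int) -> List[int]:
    """Rank clusters by query-centroid similarity and keep whole clusters
    until adding the next one would exceed the token budget."""
    ranked = sorted(range(len(centroids)), key=lambda c: dot(query, centroids[c]), reverse=True)
    selected: List[int] = []
    for c in ranked:
        if len(selected) + len(clusters[c]) > budget:
            break
        selected.extend(clusters[c])
    return sorted(selected)


def sparse_attention(query: Vec, keys, values, token_ids: Sequence[int]) -> List[float]:
    """Scaled dot-product attention restricted to the selected tokens (must be non-empty)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, keys[i]) * scale for i in token_ids]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * values[i][d] for w, i in zip(weights, token_ids)) / z
            for d in range(len(values[0]))]


keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
centroids, clusters = kmeans(keys, k=2)
chosen = select_tokens([1.0, 0.0], centroids, clusters, budget=2)
print(chosen)  # [0, 1]
print(sparse_attention([1.0, 0.0], keys, values, chosen))
```

Selecting at cluster granularity means a query touches a few contiguous groups instead of scattered individual tokens. This is the property the abstract ties to a PIM-friendly memory layout.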

2505.05665 2026-05-13 cs.RO cs.AI cs.CL

Characterizing the Robustness of Black-Box LLM Planners Under Perturbed Observations with Adaptive Stress Testing

Neeloy Chakraborty, John Pohovey, Melkior Ornik, Katherine Driggs-Campbell

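AI summary This study investigates the robustness of black-box large language model (LLM) planners when their observations are perturbed. It proposes two perturbation dimensions: one simulates semantically similar prompt variations and the other simulates the effects of sensor noise. It then uses adaptive stress testing (AST) combined with Monte Carlo tree search (MCTS) to explore the perturbation space efficiently and discover scenarios and configurations that can cause the model to act with high uncertainty or crash. Experiments show that the method can identify potential runtime failures in advance, improving the reliability of LLMs in safety-critical scenarios.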

Comments Accepted to ACL Findings 2026; 31 pages, 26 figures, 6 tables

English abstract

Large language models (LLMs) have recently demonstrated success in decision-making tasks including planning, control, and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. This unwanted behavior is further exacerbated in environments where sensors are noisy or unreliable. Characterizing how LLM planners respond to varied observations is necessary to proactively avoid failures in safety-critical scenarios. We specifically investigate the response of LLMs along two different perturbation dimensions. Like prior works, one dimension generates semantically similar prompts with varied phrasing by randomizing the order of details, modifying access to few-shot examples, etc. Unique to our work, the second dimension simulates access to varied sensors and noise to mimic raw sensor or detection algorithm failures. An initial case study in which perturbations are manually applied shows that both dimensions lead LLMs to hallucinate in a multi-agent driving environment. However, manually covering the entire perturbation space for several scenarios is infeasible. As such, we propose a novel method for efficiently searching the space of prompt perturbations using adaptive stress testing (AST) with Monte-Carlo tree search (MCTS). Our AST formulation enables discovery of scenarios, sensor configurations, and prompt phrasing that cause language models to act with high uncertainty or even crash. By generating MCTS prompt perturbation trees across diverse scenarios, we show through extensive experiments that offline analyses can be used to proactively understand potential failures that may arise at runtime. Code is available at https://sites.google.com/illinois.edu/astllm.

2504.14707 2026-05-13 cs.CL

FLAME: A New Dataset on FLemish Accounts of Momentary Experiences

Ratna Kandala, Niels Vanhasbroeck, Katie Hoemann

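AI summary This paper introduces the FLAME dataset, which contains nearly 25,000 everyday personal narratives in Belgian Dutch (Flemish), to support natural language processing research on underrepresented language varieties. The study asks which topic modeling approach is best suited to uncovering the latent themes in this corpus and compares K-Means clustering, LDA, and BERTopic. It finds that BERTopic performs best at producing coherent and culturally relevant topics, highlighting the importance of contextual embeddings for topic modeling in low-resource, culturally specific domains.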

English abstract

We introduce FLAME (FLemish Accounts of Momentary Experiences), a new corpus of nearly 25,000 daily personal narratives in Belgian-Dutch (Flemish), designed to support research on underrepresented language varieties in Natural Language Processing (NLP). Personal narratives of this kind hold rich potential for uncovering culturally grounded, everyday themes, yet extracting meaningful topics from such data is non-trivial, given the informal register, cultural specificity, and low-resource nature of the Flemish variety. We therefore ask: which topic modeling approach is best suited to reveal the latent themes in this corpus? To answer this, we benchmark three widely used methods: K-Means Clustering, Latent Dirichlet Allocation (LDA), and BERTopic, evaluating their ability to identify coherent and culturally relevant topics. While LDA achieves strong performance on automated coherence metrics, human evaluation reveals that BERTopic consistently produces the most coherent and culturally resonant topics, exposing the limitations of purely statistical methods on narrative-rich data. The diminished performance of K-Means compared to prior work on similar Dutch corpora further highlights the unique linguistic challenges posed by this dataset. Our findings demonstrate that contextual embeddings are critical for robust topic modeling in low-resource, culturally specific domains, and underscore the importance of human-centered evaluation alongside automated metrics.

2503.23947 2026-05-13 cs.CV

Spectral-Adaptive Modulation Networks for Visual Perception

Guhnoo Yun, Juhan Yoo, Kijung Kim, Jeongho Lee, Paul Hongsuck Seo, Dong Hwan Kim

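AI summary This paper studies how 2D convolution and self-attention differ in their frequency-domain characteristics, and it uses graph spectral analysis to explain their frequency-response behavior. Building on this analysis, the authors propose a spectral-adaptive modulation (SPAM) mixer that processes visual features adaptively using multi-scale convolutional kernels and a spectral re-scaling mechanism. Based on SPAM, they build SPANetV2, a new vision backbone that outperforms existing state-of-the-art models on multiple vision tasks.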

Comments Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

English abstract

Recent studies have shown that 2D convolution and self-attention exhibit distinct spectral behaviors, and optimizing their spectral properties can enhance vision model performance. However, theoretical analyses remain limited in explaining why 2D convolution is more effective in high-pass filtering than self-attention and why larger kernels favor shape bias, akin to self-attention. In this paper, we employ graph spectral analysis to theoretically simulate and compare the frequency responses of 2D convolution and self-attention within a unified framework. Our results corroborate previous empirical findings and reveal that node connectivity, modulated by window size, is a key factor in shaping spectral functions. Leveraging this insight, we introduce a spectral-adaptive modulation (SPAM) mixer, which processes visual features in a spectral-adaptive manner using multi-scale convolutional kernels and a spectral re-scaling mechanism to refine spectral components. Based on SPAM, we develop SPANetV2 as a novel vision backbone. Extensive experiments demonstrate that SPANetV2 outperforms state-of-the-art models across multiple vision tasks, including ImageNet-1K classification, COCO object detection, and ADE20K semantic segmentation.

2503.18760 2026-05-13 cs.CL

Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages

Nick McKenna, Xinnuo Xu, Jack Williams, Nick Wilson, Benjamin Van Durme, Christian Poelitz

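AI summary This study explores how to improve the code generation ability of large language models for low-resource programming languages. To address the lack of real data, the authors propose a method for generating synthetic function demonstrations. Language documentation is collated and given to a strong teacher model, which generates high-quality training data, and student models are then fine-tuned on that data. Experiments show that the method significantly improves model performance on question-answering datasets and outperforms conventional retrieval-augmented generation.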

Comments Published at LREC 2026

English abstract

A key consideration when training an LLM is whether the target language is more or less resourced, for example English compared to Welsh, or Python compared to Excel. Typical training data for programming languages consists of real program demonstrations coupled with explanatory human-written comments. In this work we present a novel approach to the creation of such data for low-resource programming languages, which lack naturally occurring data. Our process generates synthetic, textbook-quality demonstrations of how to use library functions, which we show makes for good model finetuning data. We demonstrate this in the example domain of Excel formulas. First, we collate language documentation, then we use this to augment a powerful teacher model which generates synthetic training data, and finally we finetune student models on the demonstrations. Our technique improves student performance on 2 question-answering datasets: WikiTQ and TAT-QA. We also show advantages of finetuning over standard RAG approaches, which can offer only modest improvement due to the unfamiliarity of the target domain to student models.

2503.16072 2026-05-13 cs.LG cs.AI cs.CL

Toxicity Detection Should Measure Contextual Harm, Not Text-Intrinsic Badness

Sergei Berezin, Reza Farahbakhsh, Noel Crespi

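AI summary This paper points out that current toxicity detection methods often treat toxicity as an intrinsic property of text and ignore the actual harm it causes in a specific context. The authors argue that toxicity detection should be treated as an assessment of the harm caused by a communicative act in its situation, rather than as a pure text classification task. To this end, they propose the Contextual Stress Framework (CSF), which defines toxicity as a relation between norm violation and the stress or disruption it induces, and they introduce CSF-Eval, an evaluation agenda for measuring toxicity detection more comprehensively.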

English abstract

Toxicity detection has become core safety infrastructure for online moderation, dataset filtering, and deployed language-model systems. Yet most detectors still treat toxicity as an intrinsic property of isolated text. This position paper argues that toxicity detection should be evaluated as the contextual measurement of situated communicative harm, rather than as single-label text classification. Toxicity is not contained in words alone; it emerges when a communicative act is interpreted by an audience within a normative and social context. We introduce the Contextual Stress Framework (CSF), which defines toxicity as a relation between perceived norm violation and induced stress or disruption. CSF explains why text-intrinsic detectors overflag dialectal or reclaimed language, miss coded or pragmatic abuse, and remain brittle under meaning-preserving transformations. We propose CSF-Eval, an evaluation agenda that separates text risk, norm violation, disruption, uncertainty, and policy action.

2503.09051 2026-05-13 cs.LG cs.AI

Model-Level GNN Explanations via Rule-to-Graph Readout for Logit Reconstruction

Shengyao Lu, Jiuding Yang, Aedan J. DeFrates, Keith G. Mills, Baochun Li, Di Niu

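AI summary This paper proposes a new model-level explanation framework for graph neural networks (GNNs) that shifts the explanation target from class-wise rule extraction to rule-based logit reconstruction. The method recasts the graph-level readout of a pretrained GNN as a weighted rule-level readout. Subgraph concepts are composed into logical rules, rule embeddings are computed directly from their symbolic structure, and the frozen classifier head then reconstructs the original multiclass logits. Experiments show that the method reconstructs the original logits with high fidelity on multiple graph classification datasets and outperforms existing methods in explanation efficiency and functional analysis.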

English abstract

We propose a novel model-level GNN explanation framework that shifts the explanation target from class-wise rule extraction to rule-based logit reconstruction. Our method recasts the graph-level readout of a pretrained GNN as a weighted rule-level readout: grounded subgraph concepts are composed into logical rules, rule embeddings are computed directly from their symbolic structure, and active rules are passed through the frozen classifier head to reconstruct the GNN's raw multiclass logits. As a result, our approach provides global explanations that remain instantiable on unseen graphs, support subgraph-level grounding, and admit rule-level contribution analysis at test-time. Experiments on three synthetic and two real-world graph classification benchmarks show that our approach faithfully reconstructs the base GNN's raw multiclass logits, achieving high probability-level fidelity across datasets. Rule-level ablations further demonstrate that the identified critical rules actively support the predicted class while suppressing non-target classes, suggesting that they act as functional units rather than merely serving as post-hoc symbolic artifacts. Compared with prior class-wise rule-based explainers, our approach achieves competitive or better prediction agreement while being up to \(20\times\) faster, and additionally provides rule weights, test-time grounding, and logit-level contribution analysis.
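The weighted rule-level readout can be illustrated with a toy example. Active rules contribute their weighted embeddings to a pooled vector, which passes through a frozen classifier head to produce logits. The sketch assumes the head is a single linear layer. Under that assumption, each class logit splits exactly into the bias plus one term per rule, which is one simple way to obtain logit-level rule contributions. The rule names (`ring`, `star`), embeddings, weights, and head parameters are all hypothetical. How the paper derives embeddings from symbolic rule structure is not reproduced here.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def rule_level_readout(rule_emb: Dict[str, List[float]], rule_weight: Dict[str, float],
                       active: Sequence[str]) -> List[float]:
    """Weighted sum of the embeddings of the rules that are active on a graph."""
    dim = len(next(iter(rule_emb.values())))
    h = [0.0] * dim
    for r in active:
        for d in range(dim):
            h[d] += rule_weight[r] * rule_emb[r][d]
    return h


def linear_head(W: Matrix, b: Sequence[float], h: Sequence[float]) -> List[float]:
    """Stand-in for a frozen classifier head: logits = W h + b."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]


def rule_contributions(W: Matrix, rule_emb, rule_weight, active) -> Dict[str, List[float]]:
    """Per-rule, per-class logit terms; with a linear head, logits = b + sum of these."""
    return {r: [rule_weight[r] * sum(w * e for w, e in zip(row, rule_emb[r])) for row in W]
            for r in active}


# Hypothetical two-rule, three-class example.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
b = [0.0, 0.5, -0.5]
rule_emb = {"ring": [2.0, 0.0], "star": [0.0, 1.0]}
rule_weight = {"ring": 0.5, "star": 2.0}
active = ["ring", "star"]

logits = linear_head(W, b, rule_level_readout(rule_emb, rule_weight, active))
print(logits)  # [1.0, 2.5, -1.5]
print(rule_contributions(W, rule_emb, rule_weight, active))
```

In this toy case, `star` pushes class 1 up by 2.0 and class 2 down by 2.0. This is the kind of "supports the predicted class while suppressing others" reading the abstract describes. With a nonlinear head the decomposition would not be exact.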

2503.00341 2026-05-13 cs.RO cs.SY eess.SY

Feasible Force Set Shaping for a Payload-Carrying Platform Consisting of Tiltable Multiple UAVs Connected Via Passive Hinge Joints

Takumi Ito, Hayato Kawashima, Riku Funada, Mitsuji Sampei

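AI summary This paper studies a method for shaping the feasible force set of a payload-carrying platform made up of multiple tiltable UAVs connected via passive hinge joints, and it proposes a control law that exploits the advantages of this force set. By adjusting the UAVs' tilt angles, the feasible force set can be shaped to encompass a required shape, enabling the platform to generate force redundantly in multiple directions. The method effectively improves the platform's payload control capability and motion flexibility.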

Comments This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND

English abstract

This paper presents a method for shaping the feasible force set of a payload-carrying platform composed of multiple Unmanned Aerial Vehicles (UAVs) and proposes a control law that leverages the advantages of this shaped force set. The UAVs are connected to the payload through passively rotatable hinge joints. The joint angles are controlled by the differential thrust produced by the rotors, while the total force generated by all the rotors is responsible for controlling the payload. The shape of the set of total forces depends on the tilt angles of the UAVs, which allows us to shape the feasible force set by adjusting these tilt angles. This paper aims to ensure that the feasible force set encompasses the required shape, enabling the platform to generate force redundantly, that is, in various directions. We then propose a control law that takes advantage of this redundancy.
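The link between tilt angles and the feasible force set can be illustrated with a deliberately simplified planar model, which is an assumption and not the paper's formulation. Each UAV pushes along its body axis, tilted by θ_i from vertical, with a thrust anywhere in [0, f_max]. Hinge dynamics and differential-thrust joint control are ignored. Under these assumptions the feasible total force is a Minkowski sum of segments (a zonotope). That set equals the convex hull of the 2^n sums obtained by setting each thrust to 0 or f_max, so a required force can be checked with a point-in-convex-hull test.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def corner_forces(tilts: Sequence[float], f_max: float) -> List[Point]:
    """Total force (x, z) for every on/off combination of per-UAV maximum thrust.

    Planar assumption: UAV i pushes along (sin(tilt_i), cos(tilt_i)) with a
    thrust magnitude anywhere in [0, f_max].
    """
    dirs = [(math.sin(t), math.cos(t)) for t in tilts]
    corners = []
    for thrusts in product((0.0, f_max), repeat=len(tilts)):
        corners.append((sum(f * d[0] for f, d in zip(thrusts, dirs)),
                        sum(f * d[1] for f, d in zip(thrusts, dirs))))
    return corners


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def is_feasible(force: Point, tilts: Sequence[float], f_max: float, tol: float = 1e-9) -> bool:
    """True if `force` lies in the (planar) feasible force set for the given tilts."""
    corners = corner_forces(tilts, f_max)
    xs = [c[0] for c in corners]
    zs = [c[1] for c in corners]
    # Bounding-box check; it also handles degenerate (collinear) sets.
    if not (min(xs) - tol <= force[0] <= max(xs) + tol and min(zs) - tol <= force[1] <= max(zs) + tol):
        return False
    hull = convex_hull(corners)
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        # For a counter-clockwise hull, inside points lie on the left of every edge.
        if (b[0] - a[0]) * (force[1] - a[1]) - (b[1] - a[1]) * (force[0] - a[0]) < -tol:
            return False
    return True


upright = [0.0, 0.0]
tilted = [math.radians(30), math.radians(-30)]
print(is_feasible((3.0, 10.0), upright, f_max=10.0))  # False: no lateral force when untilted
print(is_feasible((3.0, 10.0), tilted, f_max=10.0))   # True
```

Tilting the two UAVs in opposite directions trades some maximum vertical force for a set that extends sideways. This is a small planar instance of shaping the feasible set so that it covers a required region.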

2502.20209 2026-05-13 cs.CV cs.AI

DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild

Luis Marquez-Carpintero, Sergio Suescun-Ferrandiz, Carolina Lorenzo Álvarez, Jorge Fernandez-Herrero, Diego Viejo, Rosabel Roig-Vila, Miguel Cazorla

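AI summary This paper presents DIPSER, a new dataset for assessing student attention levels in real classroom settings. The dataset includes multi-view RGB camera data and smartwatch sensor data that capture students' posture, facial expressions, and physiological indicators. It provides attention and emotion labels generated from student self-reports and from assessments by four experts. Combining facial and environmental camera data with smart-wearable metrics, and covering ethnic groups rarely represented in previous datasets, it is currently the most comprehensive dataset for analyzing student attention and emotion in in-person classroom teaching.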

English abstract

In this paper, a novel dataset is introduced, designed to assess student attention within in-person classroom settings. This dataset encompasses RGB camera data, featuring multiple cameras per student to capture both posture and facial expressions, in addition to smartwatch sensor data for each individual. This dataset allows machine learning algorithms to be trained to predict attention and correlate it with emotion. A comprehensive suite of attention and emotion labels for each student is provided, generated through self-reporting as well as evaluations by four different experts. Our dataset uniquely combines facial and environmental camera data and smartwatch metrics, and it includes ethnicities underrepresented in similar datasets, all within in-the-wild, in-person settings, making it the most comprehensive dataset of its kind currently available. The dataset presented offers an extensive and diverse collection of data pertaining to student interactions across different educational contexts, augmented with additional metadata from other tools. This initiative addresses existing deficiencies by offering a valuable resource for the analysis of student attention and emotion in face-to-face lessons.

2502.19716 2026-05-13 cs.CV cs.LG

Fully AI-Generated Image Detection: Definition, Recent Advances and Challenges

Qijie Xu, Can Wang, Jiawei Chen, Siwei Lyu, Defang Chen

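AI summary This paper reviews research progress on detecting fully AI-generated images and discusses the field's core problems, detection methods, and challenges. It focuses on two key components, dataset construction and feature extraction. It also systematically categorizes existing methods by the prior knowledge they use to extract generation artifacts and explains how they differ. The article points out the limitations of current detection techniques and outlines future research directions for improving detection robustness and generalization.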

English abstract

Recent advances in visual generative models have enabled the creation of highly realistic, fully AI-generated images without relying on real source content. While beneficial for many applications, these models also pose significant societal risks, as they can be easily exploited to produce convincing Deepfakes. Detecting them represents a foundational yet challenging problem in AI media forensics, requiring detectors to reliably extract the inherent artifacts imprinted by generative architectures. In this Review, we provide a systematic overview of fully AI-generated image detection. Following the standard detector design pipeline, we focus on two key components: dataset construction and artifact extraction. We analyze how dataset design influences the generalization and robustness of learned artifacts, and categorize existing artifact extraction methods based on the primary inductive priors leveraged to isolate artifacts. Within this framework, we systematically review existing works. Finally, we highlight open problems and envision several future directions for developing more robust and generalizable detectors. Reviewed works in this survey can be found at https://github.com/zju-pi/Awesome-Fully-AI-Generated-Image-Detection.