arXivDaily: arXiv daily academic digest, updated Monday through Friday
2505.22152 2026-05-12 cs.LG cs.SI

Uncertainty Estimation for Heterophilic Graphs Through the Lens of Information Theory

Dominik Fuchsgruber, Tom Wollschläger, Johannes Bordne, Stephan Günnemann

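AI summary: This paper studies uncertainty estimation on heterophilic graphs, where a node's features differ markedly from those of its neighbors, and notes that conventional methods rely on a homophily assumption and perform poorly in heterophilic settings. The authors analyze message passing neural networks from an information-theoretic perspective and propose a data processing inequality suited to graph structure, showing that on heterophilic graphs information about a node's prediction target can increase with network depth. Building on this finding, they propose an uncertainty estimation method that uses all node representations simultaneously; it achieves state-of-the-art results on heterophilic graphs and also performs well on homophilic graphs without explicitly relying on a homophily assumption.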

English abstract

While uncertainty estimation for graphs recently gained traction, most methods rely on homophily and deteriorate in heterophilic settings. We address this by analyzing message passing neural networks from an information-theoretic perspective and developing a suitable analog to data processing inequality to quantify information throughout the model's layers. In contrast to non-graph domains, information about the node-level prediction target can increase with model depth if a node's features are semantically different from its neighbors. Therefore, on heterophilic graphs, the latent embeddings of an MPNN each provide different information about the data distribution - different from homophilic settings. This reveals that considering all node representations simultaneously is a key design principle for epistemic uncertainty estimation on graphs beyond homophily. We empirically confirm this with a simple post-hoc density estimator on the joint node embedding space that provides state-of-the-art uncertainty on heterophilic graphs. At the same time, it matches prior work on homophilic graphs without explicitly exploiting homophily through post-processing.
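
The design principle in this abstract, scoring a node by the density of its joint embedding across all layers, can be illustrated with a minimal sketch. The abstract does not say which density estimator is used, so the diagonal Gaussian below is an assumed stand-in, and the function names and toy data are hypothetical.

```python
import math

def joint_embedding(layer_embeddings):
    """Concatenate one node's embeddings from every layer into a single vector."""
    return [x for layer in layer_embeddings for x in layer]

def fit_diag_gaussian(vectors, var_floor=1e-6):
    """Fit a per-dimension mean and variance (diagonal Gaussian, an assumption)."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
           for d in range(dim)]
    return mean, var

def uncertainty(x, mean, var):
    """Negative log-density under the fitted Gaussian; higher means less familiar."""
    return 0.5 * sum(math.log(2 * math.pi * s) + (xi - m) ** 2 / s
                     for xi, m, s in zip(x, mean, var))

# Hypothetical toy data: each node has a 2-d layer-1 and a 1-d layer-2 embedding.
train_nodes = [
    [[0.0, 0.1], [1.0]],
    [[0.1, 0.0], [1.1]],
    [[-0.1, 0.05], [0.9]],
]
mean, var = fit_diag_gaussian([joint_embedding(node) for node in train_nodes])

familiar = joint_embedding([[0.0, 0.05], [1.0]])
# Identical to `familiar` at layer 1, unusual only at layer 2.
shifted = joint_embedding([[0.0, 0.05], [5.0]])
print(uncertainty(familiar, mean, var) < uncertainty(shifted, mean, var))
```

The two query nodes differ only in their second-layer embedding, so an estimator that looked at the first layer alone could not tell them apart. That is the motivation for scoring the joint space.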

2505.20654 2026-05-12 cs.CL cs.AI

Chinese Cyberbullying Detection: Dataset, Method, and Validation

Yi Zhu, Xin Zou, Xindong Wu

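AI summary: To address the limitation that existing cyberbullying detection datasets are organized by the polarity of speech, this study proposes a new incident-based annotation method and constructs CHNCI, the first Chinese cyberbullying incident detection dataset, containing 220,676 comments across 91 incidents. The study generates pseudo labels with explanation-generating detection methods, validates them through human annotation, and proposes evaluation criteria for deciding whether a case constitutes a cyberbullying incident. Experiments show that the dataset can serve as a benchmark for cyberbullying detection and incident prediction tasks, providing important support for related research.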

English abstract

Existing cyberbullying detection benchmarks are organized by the polarity of speech, such as "offensive" and "non-offensive", which makes them essentially hate speech detection benchmarks. However, in the real world, cyberbullying often attracts widespread social attention through incidents. To address this problem, we propose a novel annotation method to construct a cyberbullying dataset that is organized by incidents. The constructed CHNCI is the first Chinese cyberbullying incident detection dataset, which consists of 220,676 comments in 91 incidents. Specifically, we first combine three cyberbullying detection methods based on explanation generation as an ensemble method to generate the pseudo labels, and then let human annotators judge these labels. Then we propose the evaluation criteria for validating whether a case constitutes a cyberbullying incident. Experimental results demonstrate that the constructed dataset can be a benchmark for the tasks of cyberbullying detection and incident prediction. To the best of our knowledge, this is the first study of the Chinese cyberbullying incident detection task.

2505.20381 2026-05-12 cs.CV

ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking

Sijia Chen, Yanqiu Yu, En Yu, Wenbing Tao

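AI summary: ReaMOT is a reasoning-based multi-object tracking task that aims to track targets specified by language instructions through logical reasoning, overcoming existing methods' reliance on explicit visual-textual matching. To this end, the researchers propose the ReaMOT Challenge benchmark, which contains a large number of language instructions and video sequences, and design the ReaTrack framework, which combines a large language model with motion priors to achieve more robust tracking. Experiments show that ReaTrack delivers significant gains on high-level reasoning tasks.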

Comments Code: https://github.com/chen-si-jia/ReaMOT

English abstract

Referring Multi-Object Tracking (RMOT) aims to track targets specified by language instructions. However, existing RMOT paradigms heavily rely on explicit visual-textual matching and consequently fail to generalize to complex instructions that require logical reasoning. To overcome this, we propose Reasoning-based Multi-Object Tracking (ReaMOT), a novel task that elevates tracking to a cognitive level, requiring models to infer and track specific targets satisfying implicit constraints via logical reasoning. To advance this field, we construct the ReaMOT Challenge, a comprehensive benchmark featuring a tailored metric suite and a large-scale dataset. This dataset comprises 1,156 language instructions, 423,359 image-language pairs, and 869 distinct video sequences systematically categorized into six distinct evaluation scenarios, with over 75% of the instructions dedicated to High Level Reasoning. Furthermore, recognizing that traditional trackers lack cognitive capacity while direct application of Large Vision-Language Models (LVLMs) yields severe temporal inconsistencies, we propose ReaTrack. Driven by the insight to decouple high-level cognitive localization from low-level physical motion continuity, this training-free framework dynamically aligns the semantic detections of a Thinking-variant LVLM with the robust motion priors of SAM2. Extensive experiments on the ReaMOT Challenge benchmark demonstrate that ReaTrack establishes a new leading performance standard. Notably, it achieves a more than threefold improvement in RHOTA on the High Level Reasoning subset. Our dataset and code will be available at https://github.com/chen-si-jia/ReaMOT.

2505.20001 2026-05-12 cs.CV

NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-Identification

Shihao Li, Huaibo Huang, Junxian Duan, Aihua Zheng, Jin Tang, Jixin Ma

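AI summary: This paper studies multi-modal object re-identification, which aims to obtain complete identity features from heterogeneous modalities. To address the problem that existing methods rely on implicit feature fusion and struggle to model fine-grained recognition patterns, it proposes NEXT, a multi-grained mixture-of-experts framework based on text modulation. The method generates high-quality descriptive text using attribute confidence and decomposes recognition into semantic and structural branches that capture fine-grained appearance features and coarse-grained structure features, respectively; a multi-grained feature aggregation strategy then yields a more accurate identity representation. Experiments show that the method significantly outperforms existing state-of-the-art methods on multiple datasets.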

English abstract

Multi-modal object Re-IDentification (ReID) aims to obtain complete identity features across heterogeneous modalities. However, most existing methods rely on implicit feature fusion modules, making it difficult to model fine-grained recognition patterns under various challenges in the real world. Benefiting from the powerful Multi-modal Large Language Models (MLLMs), the object appearances are effectively translated into descriptive captions. In this paper, we propose a reliable caption generation pipeline based on attribute confidence, which significantly reduces the unknown recognition rate of MLLMs and improves the quality of generated text. Additionally, to model diverse identity patterns, we propose a novel ReID framework, named NEXT, the Multi-grained Mixture of Experts via Text-Modulation for Multi-modal Object Re-Identification. Specifically, we decouple the recognition problem into semantic and structural branches to separately capture fine-grained appearance features and coarse-grained structure features. For semantic recognition, we first propose Text-Modulated Semantic Experts (TMSE), which randomly samples high-quality captions to modulate experts capturing semantic features and mining inter-modality complementary cues. Second, to recognize structure features, we propose Context-Shared Structure Experts (CSSE), which focuses on the holistic object structure and maintains identity structural consistency via a soft routing mechanism. Finally, we propose Multi-Grained Features Aggregation (MGFA), which adopts a unified fusion strategy to effectively integrate multi-grained expert features into the final identity representations. Extensive experiments on two public person datasets and three vehicle datasets demonstrate the effectiveness of our method, showing that it significantly outperforms existing state-of-the-art methods.

2505.19519 2026-05-12 cs.CV

Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift

Gihoon Kim, Hyungjin Park, Taesup Kim

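AI summary: This paper studies how to personalize text-to-image diffusion models without causing distributional drift. The authors point out that existing methods tend to overfit to the reference images and ignore the user's prompt during personalization, and that the root cause is a failure to ensure image fidelity and text alignment simultaneously. They therefore propose an optimization objective based on Lipschitz regularization that constrains parameter updates and keeps the pretrained model's output distribution stable, enabling accurate adaptation to new concepts while retaining the original generative capabilities. Experiments show superior quantitative and qualitative performance across multiple diffusion model architectures.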

Comments Accepted at ICLR 2026

English abstract

Personalizing text-to-image diffusion models involves integrating novel visual concepts from a small set of reference images while retaining the model's original generative capabilities. However, this process often leads to overfitting, where the model ignores the user's prompt and merely replicates the reference images. We attribute this issue to a fundamental misalignment between the true goals of personalization, which are subject fidelity and text alignment, and the training objectives of existing methods that fail to enforce both objectives simultaneously. Specifically, prior approaches often overlook the need to explicitly preserve the pretrained model's output distribution, resulting in distributional drift that undermines diversity and coherence. To resolve these challenges, we introduce a Lipschitz-based regularization objective that constrains parameter updates during personalization, ensuring bounded deviation from the original distribution. This promotes consistency with the pretrained model's behavior while enabling accurate adaptation to new concepts. Furthermore, our method offers a computationally efficient alternative to commonly used, resource-intensive sampling techniques. Through extensive experiments across diverse diffusion model architectures, we demonstrate that our approach achieves superior performance in both quantitative metrics and qualitative evaluations, consistently excelling in visual fidelity and prompt adherence. We further support these findings with comprehensive analyses, including ablation studies and visualizations.

2505.06182 2026-05-12 cs.RO cs.LG

Apple: Toward General Active Perception via Reinforcement Learning

Tim Schneider, Cristiana de Farias, Roberto Calandra, Liming Chen, Jan Peters

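AI summary: This paper proposes a new framework named APPLE that aims to solve general active perception problems through reinforcement learning. The framework combines a transformer-based perception module with a decision-making policy and trains them jointly under a unified optimization objective, enabling robots to actively gather information about their environment. Experiments show that APPLE performs well across a variety of tasks, including tactile exploration tasks, demonstrating its potential as a general active perception method.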

Comments 27 pages; 21 figures; accepted at the Fourteenth International Conference on Learning Representations (ICLR 2026)

English abstract

Active perception is a fundamental skill that enables us humans to deal with uncertainty in our inherently partially observable environment. For senses such as touch, where the information is sparse and local, active perception becomes crucial. In recent years, active perception has emerged as an important research domain in robotics. However, current methods are often bound to specific tasks or make strong assumptions, which limit their generality. To address this gap, this work introduces APPLE (Active Perception Policy Learning) - a novel framework that leverages reinforcement learning (RL) to address a range of different active perception problems. APPLE jointly trains a transformer-based perception module and decision-making policy with a unified optimization objective, learning how to actively gather information. By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active perception problems. We evaluate two variants of APPLE across different tasks, including tactile exploration problems from the Tactile MNIST benchmark. Experiments demonstrate the efficacy of APPLE, achieving high accuracies on both regression and classification tasks. These findings underscore the potential of APPLE as a versatile and general framework for advancing active perception in robotics. Project page: https://timschneider42.github.io/apple

2504.19774 2026-05-12 cs.LG

If Concept Bottlenecks are the Question, are Foundation Models the Answer?

Nicola Debole, Pietro Barbiero, Francesco Giannini, Andrea Passerini, Stefano Teso, Emanuele Marconato

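AI summary: This paper examines whether foundation models can serve as an effective alternative for concept learning in concept bottleneck models (CBMs). Through experiments, the study analyzes how weak supervision from foundation models performs in concept learning and finds that it differs from expert annotations and that concept accuracy and quality are not strongly correlated. The study reveals both the potential and the limitations of foundation models for improving CBM interpretability, providing a useful reference for future research.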

English abstract

Concept Bottleneck Models (CBMs) are neural networks designed to conjoin high performance with ante-hoc interpretability. CBMs work by first mapping inputs (e.g., images) to high-level concepts (e.g., visible objects and their properties) and then using these to solve a downstream task (e.g., tagging or scoring an image) in an interpretable manner. Their performance and interpretability, however, hinge on the quality of the concepts they learn. The go-to strategy for ensuring good quality concepts is to leverage expert annotations, which are expensive to collect and seldom available in applications. Researchers have recently addressed this issue by introducing "VLM-CBM" architectures that replace manual annotations with weak supervision from foundation models. It is, however, unclear what impact doing so has on the quality of the learned concepts. To answer this question, we put state-of-the-art VLM-CBMs to the test, analyzing their learned concepts empirically using a selection of significant metrics. Our results show that, depending on the task, VLM supervision can differ appreciably from expert annotations, and that concept accuracy and quality are not strongly correlated. Our code is available at https://github.com/debryu/CQA.

2504.10766 2026-05-12 cs.LG cs.AI cs.CL

How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients

Ming Li, Yanhong Li, Ziyue Li, Tianyi Zhou

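AI summary: As fine-tuning of large language models shifts from instruction following to complex reasoning, how different data affect the fine-tuning process remains unclear. By analyzing the spectral properties of layer-wise gradients induced by instruction and reasoning data, this paper shows that data quality metrics such as IFD, InsTag, Difficulty, and Reward can be explained and unified through spectral properties of the gradients' singular value decomposition. The study finds that higher-quality data usually have lower nuclear norms and higher effective ranks, with effective rank being more robust and discriminative in capturing subtle quality differences. In addition, models within the same family show similar gradient patterns while different families differ markedly, offering a new perspective for optimizing data exploration strategies.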

Comments ACL 2026, camera-ready

English abstract

As the post-training of large language models (LLMs) advances from instruction-following to complex reasoning tasks, understanding how different data affect finetuning dynamics remains largely unexplored. In this paper, we present a spectral analysis of layer-wise gradients induced by low/high-quality instruction and reasoning data for LLM post-training. Our analysis reveals that widely-studied metrics for data evaluation, e.g., IFD, InsTag, Difficulty, and Reward, can be explained and unified by spectral properties computed from gradients' singular value decomposition (SVD). Specifically, higher-quality data are usually associated with lower nuclear norms and higher effective ranks. Notably, effective rank exhibits better robustness and resolution than nuclear norm in capturing subtle quality differences. For example, reasoning data achieves substantially higher effective ranks than instruction data, implying richer gradient structures on more complex tasks. Our experiments also highlight that models within the same family share similar gradient patterns regardless of their sizes, whereas different model families diverge significantly. Providing a unified view on the effects of data quality across instruction and reasoning data, this work illuminates the interplay between data quality and training stability, shedding novel insights into developing better data exploration strategies for post-training.
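
The two spectral quantities named in this abstract can be computed directly from a gradient matrix's singular values. The abstract does not define effective rank, so this sketch assumes one common definition: the exponential of the Shannon entropy of the normalized singular values. The singular values themselves are assumed to come from an SVD routine run elsewhere, and the toy spectra are made up for illustration.

```python
import math

def nuclear_norm(singular_values):
    """Nuclear norm: the sum of the singular values."""
    return sum(singular_values)

def effective_rank(singular_values, eps=1e-12):
    """exp(entropy) of the singular values normalized to sum to 1 (assumed definition)."""
    total = sum(singular_values)
    if total <= eps:
        return 0.0
    probs = [s / total for s in singular_values if s > eps]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)

# Two hypothetical gradient spectra with the same nuclear norm.
flat = [1.0, 1.0, 1.0, 1.0]    # energy spread evenly over four directions
peaked = [4.0, 0.0, 0.0, 0.0]  # energy concentrated in one direction

for name, sv in (("flat", flat), ("peaked", peaked)):
    print(name, nuclear_norm(sv), round(effective_rank(sv), 6))
```

The two spectra share a nuclear norm of 4 but have effective ranks of about 4 and 1. This is one way to see how effective rank could separate cases that the nuclear norm cannot.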

2504.02343 2026-05-12 cs.LG

Toward General and Robust LLM-enhanced Text-attributed Graph Learning

Zihao Zhang, Xunkai Li, Rong-Hua Li, Zhenjun Li, Bing Zhou, Guoren Wang

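AI summary: With the rapid development of large language models (LLMs) and the wide use of text-attributed graphs (TAGs) across many domains, LLM-enhanced TAG learning has become an active research topic. However, existing methods lack a unified framework for systematically integrating the complex interactions between LLMs and graph neural networks (GNNs), and they struggle with the text and edge sparsity of real-world TAGs. This paper therefore proposes the unified framework UltraTAG and its instantiation UltraTAG-S, which mitigate sparsity through strategies such as text propagation, augmentation, and node selection; experiments show that UltraTAG-S significantly outperforms existing methods across a variety of scenarios and exhibits stronger robustness and generalization.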

English abstract

Recent advancements in Large Language Models (LLMs) and the proliferation of Text-Attributed Graphs (TAGs) across various domains have positioned LLM-enhanced TAG learning as a critical research area. By utilizing rich graph descriptions, this paradigm leverages LLMs to generate high-quality embeddings, thereby enhancing the representational capacity of Graph Neural Networks (GNNs). However, the field faces significant challenges: (1) the absence of a unified framework to systematize the diverse optimization perspectives arising from the complex interactions between LLMs and GNNs, and (2) the lack of a robust method capable of handling real-world TAGs, which often suffer from text and edge sparsity, leading to suboptimal performance. To address these challenges, we propose UltraTAG, a unified pipeline for LLM-enhanced TAG learning. UltraTAG provides a unified, comprehensive, and domain-adaptive framework that not only organizes existing methodologies but also paves the way for future advancements in the field. Building on this framework, we propose UltraTAG-S, a robust instantiation of UltraTAG designed to tackle the inherent sparsity issues in real-world TAGs. UltraTAG-S employs LLM-based text propagation and text augmentation to mitigate text sparsity, while leveraging LLM-augmented node selection techniques based on PageRank and edge reconfiguration strategies to address edge sparsity. Our extensive experiments demonstrate that UltraTAG-S significantly outperforms existing baselines, achieving improvements of 2.12% and 17.47% in ideal and sparse settings, respectively. Moreover, as the data sparsity ratio increases, the performance improvement of UltraTAG-S also rises, which underscores the effectiveness and robustness of UltraTAG-S.

2503.05066 2026-05-12 cs.LG cs.AI cs.CL

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

Shwai He, Weilin Cai, Jiayi Huang, Ang Li

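AI summary: In Mixture of Experts (MoE) architectures, the "straggler effect" caused by imbalanced expert load can significantly hurt inference efficiency. This paper proposes a capacity-aware inference method that introduces capacity-aware token drop and expanded drop strategies, effectively mitigating expert load imbalance and improving expert utilization and inference speed. Experiments show notable performance and efficiency gains on a variety of language and multimodal MoE models.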

Comments ICLR 2026

English abstract

The Mixture of Experts (MoE) is an effective architecture for scaling large language models by leveraging sparse expert activation to balance performance and efficiency. However, under expert parallelism, MoE suffers from inference inefficiencies due to imbalanced token-to-expert assignment, where underloaded experts complete computations early but must wait for overloaded experts, leading to global delays. We define this phenomenon as the Straggler Effect, as the most burdened experts dictate the overall inference latency. To address this, we first propose Capacity-Aware Token Drop, which enforces expert capacity limits by discarding excess tokens from overloaded experts, effectively reducing load imbalance with minimal performance impact (e.g., 30% speedup with only 0.9% degradation on OLMoE). Next, given the presence of low-load experts remaining well below the capacity threshold, we introduce Capacity-Aware Expanded Drop, which allows tokens to include additional local experts in their candidate set before enforcing strict local capacity constraints, thereby improving load balance and enhancing the utilization of underused experts. Extensive experiments on both language and multimodal MoE models demonstrate the effectiveness of our approach, yielding substantial gains in expert utilization, model performance, and inference efficiency, e.g., applying Expanded Drop to Mixtral-8×7B-Instruct yields a 0.2% average performance improvement and a 1.85× inference speedup. The code is released at: https://github.com/CASE-Lab-UMD/Capacity-Aware-MoE.
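
The idea of enforcing a per-expert capacity by discarding excess tokens can be sketched in a few lines. The abstract does not say which tokens are dropped from an overloaded expert, so this sketch assumes the tokens with the lowest gate score go first. The function name, tuple layout, and toy routing table are hypothetical and are not taken from the released code.

```python
from collections import defaultdict

def capacity_aware_drop(assignments, capacity):
    """Cap each expert's load at `capacity` tokens.

    assignments: list of (token_id, expert_id, gate_score) tuples.
    Returns (kept, dropped). Assumption: within an overloaded expert,
    the lowest-scoring assignments are the ones discarded.
    """
    per_expert = defaultdict(list)
    for assignment in assignments:
        per_expert[assignment[1]].append(assignment)

    kept, dropped = [], []
    for expert_id in sorted(per_expert):
        ranked = sorted(per_expert[expert_id], key=lambda a: a[2], reverse=True)
        kept.extend(ranked[:capacity])
        dropped.extend(ranked[capacity:])
    return kept, dropped

# Hypothetical routing: expert 0 is overloaded (4 tokens), expert 1 is not.
assignments = [(0, 0, 0.9), (1, 0, 0.8), (2, 0, 0.3), (3, 0, 0.6), (4, 1, 0.7)]
kept, dropped = capacity_aware_drop(assignments, capacity=2)
print("kept:", kept)
print("dropped:", dropped)
```

After the drop, no expert processes more than `capacity` tokens, so the slowest expert no longer sets a latency that grows with the imbalance. The Expanded Drop variant described in the abstract is not modeled here.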

2503.03481 2026-05-12 cs.RO cs.SY eess.SY

Cyclic Nullspace Coordination: Perpetual Flight of Aerial Carriers for Static Suspension

Chiara Gabellieri, Yaolei Shen, Martina Paolucci, Antonio Franchi

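AI summary: This paper studies how three or more aerial vehicles can cooperatively suspend a static load without interrupting their flight, and proposes an algorithm for generating the vehicles' coordinated trajectories. The method rests on two core ideas: first, selecting specific force directions from the nullspace of the load's grasp matrix that form a Hamiltonian cycle, in order to generate coordinated force trajectories; second, mapping these directions to periodic coordinates via graph coloring so as to generate elliptical trajectories in 2D subspaces, ensuring that the vehicles keep moving without disturbing the load's static state. The method scales to any number of vehicles (three or more) and is validated through simulations and experiments.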

Comments Accepted for publication in the IEEE Transactions on Control Systems Technology

English abstract

This work demonstrates that the non-stop flights of three or more carriers are compatible with holding a constant pose of a cable-suspended load. It also presents an algorithm for generating the carriers' coordinated non-stop trajectories. The proposed method builds upon two pillars: (1) the choice of n special linearly independent directions of internal forces within the 3n-6-dimensional nullspace of the grasp matrix of the load, chosen as the edges of a Hamiltonian cycle on the graph that connects the cable attachment points on the load. Adjacent pairs of directions are used to generate n forces evolving on distinct 2D affine subspaces, despite the attachment points being generically in 3D; (2) the construction of elliptical trajectories within these subspaces by mapping, through appropriate graph coloring, each edge of the Hamiltonian cycle to a periodic coordinate while ensuring that no adjacent coordinates exhibit simultaneous zero derivatives. Combined with conditions for load statics and attachment point positions, these choices ensure that each of the n force trajectories projects onto the corresponding cable constraint sphere with non-zero tangential velocity, enabling perpetual motion of the carriers while the load is still. The work provides a scalable constructive design for any n greater than or equal to 3 with tuning guidelines, quantifies sensitivity and single-carrier failures, and provides a fixed-wing-compatible planner that preserves load statics under speed/bank/flight-path constraints. The theoretical findings are validated through simulations and laboratory experiments with quadrotor UAVs.
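
Why forces directed along the edges between attachment points lie in the nullspace of the grasp matrix can be checked numerically. A pair of equal and opposite forces along the line joining two attachment points adds no net force and no net torque to the load. The sketch below is a simplified illustration: the attachment points and magnitudes are hypothetical, the edge directions are left unnormalized, and gravity compensation and the trajectory design from the paper are not modeled.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(a, k):
    return tuple(k * x for x in a)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def net_wrench(points, forces):
    """Net force and net torque (about the origin) applied to the load."""
    force, torque = (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)
    for p, f in zip(points, forces):
        force = add(force, f)
        torque = add(torque, cross(p, f))
    return force, torque

def cycle_internal_forces(points, magnitudes):
    """Superpose equal-and-opposite force pairs along the cycle edges i -> i+1."""
    n = len(points)
    forces = [(0.0, 0.0, 0.0)] * n
    for i, m in enumerate(magnitudes):
        j = (i + 1) % n
        d = sub(points[j], points[i])  # unnormalized edge direction
        forces[i] = add(forces[i], scale(d, m))
        forces[j] = add(forces[j], scale(d, -m))
    return forces

# Hypothetical attachment points for n = 3 carriers.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
forces = cycle_internal_forces(points, magnitudes=[0.5, -1.25, 2.0])
force, torque = net_wrench(points, forces)
print(force, torque)
```

Any choice of the n edge magnitudes leaves the load's net wrench at zero. Varying them over time changes the cable forces, and hence where the carriers must be, without moving the load.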

2502.10760 2026-05-12 cs.CL cs.LG stat.ML

Why is prompting hard? Understanding prompts on binary sequence predictors

Li Kevin Wenliang, Anian Ruoss, Jordi Grau-Moya, Marcus Hutter, Tim Genewein

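AI summary: This paper explores why designing effective prompts is challenging for binary sequence predictors, framing the search for an optimal prompt as conditioning a near-optimal sequence predictor. Through many controlled experiments, the study finds that unintuitive optimal prompts can be better understood in light of the pretraining distribution, and that even with exhaustive search, optimal prompts for practical neural predictors are hard to identify reliably. It also notes that some popular prompting methods, such as using demonstrations from the target task, can perform poorly, and shows that the patterns of optimal prompts on frontier models resemble those in the binary examples and in previous findings.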

Journal ref
Artificial Intelligence and Statistics 2026
English abstract

Frontier models can be prompted or conditioned to do many tasks, but finding good prompts is not always easy, nor is understanding some performant prompts. We view prompting as finding the best conditioning sequence on a near-optimal sequence predictor. On numerous well-controlled experiments, we show that unintuitive optimal conditioning sequences can be better understood given the pretraining distribution, which is not usually available. Even using exhaustive search, reliably identifying optimal prompts for practical neural predictors can be surprisingly difficult. Popular prompting methods, such as using demonstrations from the targeted task, can be surprisingly suboptimal. Using the same empirical framework, we analyze optimal prompts on frontier models, revealing patterns similar to the binary examples and previous findings. Taken together, this work takes an initial step towards understanding optimal prompts, from a statistical and empirical perspective that complements research on frontier models.

2502.03749 2026-05-12 cs.LG math.OC

PINS: Proximal Iterations with Sparse Newton and Sinkhorn for Optimal Transport

Di Wu, Ling Liang, Haizhao Yang

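AI summary: Optimal transport (OT) is widely used in machine learning, but computing high-accuracy solutions for large instances remains computationally expensive. This paper proposes PINS, a two-loop solver that combines an entropic proximal-point method with sparse Newton and Sinkhorn iterations, effectively overcoming the slow convergence and bias plateau of conventional Sinkhorn methods at small entropic regularization parameters. Experiments show that PINS outperforms existing methods in relative cost error and runtime efficiency, and is more memory-efficient on large-scale datasets.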

English abstract

Optimal transport (OT) is a widely used tool in machine learning, but computing high-accuracy solutions for large instances remains costly. Entropic regularization and the Sinkhorn algorithm improve scalability; however, when the regularization parameter is small, Sinkhorn convergence slows, and the iterates approach an entropic solution that remains separated from the true OT plan by an entropic-bias plateau. We introduce PINS (Proximal Iterations with sparse Newton and Sinkhorn), a two-loop solver designed to move beyond this plateau. The outer loop applies an entropic proximal-point method, solving the original OT problem through a sequence of entropic subproblems with shifted cost matrices. Each inner subproblem is then solved by a Sinkhorn warm-up followed by sparse-Newton refinement. We prove that PINS converges globally to an optimal solution of the unregularized OT problem and that the inner Hessian admits a sparsification at every outer iteration with a structure independent of the cost matrix. On synthetic and augmented-MNIST instances, PINS achieves much lower relative cost errors than Sinkhorn-type baselines, which stall at the entropic-bias plateau, and is 5× to 73× faster than Sinkhorn with the same outer loop at matched accuracy. On large-scale DOTmark instances, a streaming implementation reduces peak memory by 24% to 54% compared with the network-simplex linear programming (LP) solver and remains feasible under per-process memory budgets for which the LP solver fails.
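
For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of the plain Sinkhorn iteration for entropic OT. It is the textbook algorithm only, not PINS: the proximal outer loop and the sparse-Newton refinement are not shown. The 2×2 cost matrix, the value of `eps`, and the iteration count are illustrative assumptions.

```python
import math

def sinkhorn(cost, a, b, eps=0.1, iters=500):
    """Entropic OT plan between marginals a and b via alternating scaling."""
    n, m = len(a), len(b)
    # Gibbs kernel of the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows, then columns, to match the target marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy problem: the unregularized optimum keeps all mass on the diagonal (cost 0).
cost = [[0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5]
b = [0.5, 0.5]
plan = sinkhorn(cost, a, b)

entropic_cost = sum(cost[i][j] * plan[i][j] for i in range(2) for j in range(2))
print(plan)
print(entropic_cost)
```

The returned plan matches both marginals, yet its transport cost stays slightly above the true optimum of 0 because the entropic term keeps a little mass off the diagonal. That residual gap is a small-scale picture of the entropic bias the abstract describes.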

2411.18111 2026-05-12 cs.CV

When Large Vision-Language Models Meet Person Re-Identification

Qizao Wang, Bin Li, Xiangyang Xue

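AI summary: This paper studies how to apply large vision-language models (LVLMs) to person re-identification (ReID). Conventional ReID relies on extracting discriminative identity features, whereas LVLMs excel at cross-modal understanding and generation. The authors therefore propose the LVLM-ReID framework, which uses instructions to guide the LVLM to generate a semantic token capturing key appearance semantics of the person, strengthens the link between semantic and visual features through a semantic-guided interaction module, and finally uses the reinforced semantic token as the representation of pedestrian identity. The method achieves competitive performance on multiple benchmarks without additional image-text annotations, demonstrating the potential of LVLM-generated semantics for improving ReID.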

Comments Accepted by ICASSP 2026

English abstract

Large Vision-Language Models (LVLMs) that incorporate visual models and large language models have achieved impressive results across cross-modal understanding and reasoning tasks. In recent years, person re-identification (ReID) has also started to explore cross-modal semantics to improve the accuracy of identity recognition. However, effectively utilizing LVLMs for ReID remains an open challenge. While LVLMs operate under a generative paradigm by predicting the next output word, ReID requires the extraction of discriminative identity features to match pedestrians across cameras. In this paper, we propose LVLM-ReID, a novel framework that harnesses the strengths of LVLMs to promote ReID. Specifically, we employ instructions to guide the LVLM in generating one semantic token that encapsulates key appearance semantics from the person image. This token is further refined through our Semantic-Guided Interaction (SGI) module, establishing a reciprocal interaction between the semantic token and visual tokens. Ultimately, the reinforced semantic token serves as the representation of pedestrian identity. Our framework integrates the semantic understanding and generation capabilities of LVLM into end-to-end ReID training, allowing LVLM to capture rich semantic cues during both training and inference. LVLM-ReID achieves competitive results on multiple benchmarks without additional image-text annotations, demonstrating the potential of LVLM-generated semantics to advance person ReID.

2411.04077 2026-05-12 cs.CV

H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models

Nhi Pham, Michael Schott

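AI summary: This paper proposes H-POPE, a benchmark based on hierarchical polling-based evaluation, for systematically assessing hallucinations of large vision-language models at the level of object existence and attributes. The method evaluates along a coarse-to-fine hierarchy and reveals that models are more prone to hallucination on fine-grained attributes. The study further examines whether models rely on visual input when generating text, offering a new perspective on the generation mechanisms of vision-language models.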

Comments Poster at https://sites.google.com/berkeley.edu/bb-stat/home

English abstract

By leveraging both texts and images, large vision language models (LVLMs) have shown significant progress in various multi-modal tasks. Nevertheless, these models often suffer from hallucinations, e.g., they exhibit inconsistencies between the visual input and the textual output. To address this, we propose H-POPE, a coarse-to-fine-grained benchmark that systematically assesses hallucination in object existence and attributes. Our evaluation shows that models are prone to hallucinations on object existence, and even more so on fine-grained attributes. We further investigate whether these models rely on visual input to formulate the output texts.

2410.10247 2026-05-12 cs.CV cs.AI

LPT: Less-overfitting Prompt Tuning for Vision-Language Model

Chenhao Ding, Xinyuan Gao, Songlin Dong, Jizhou Han, Qiang Wang, Zhengdong Zhou, Yuhang He, Yihong Gong

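AI summary: To address the overfitting that vision-language models are prone to during transfer, this study proposes LPT, a lightweight prompt tuning framework. Its core approach uses CLIP to filter out fine-grained foreground information so that prompts are guided by basic visual concepts, and introduces a feature-level structural preservation constraint and an output-level hierarchical logit constraint to strengthen generalization. Experiments show that LPT significantly improves generalization on multiple benchmark tasks and effectively alleviates overfitting.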

English abstract

Vision-language models (VLMs) have demonstrated exceptional generalization capabilities for downstream tasks. Due to its efficiency, prompt learning has gradually become a more effective and efficient method for transferring VLMs to downstream tasks, surpassing traditional finetuning methods. However, during the transfer process, these models are prone to severe overfitting, leading to a significant decline in generalization ability. To address this issue, we propose a framework named LPT, specifically designed for vision-language models. Specifically, we use CLIP to filter out fine-grained foreground information that may lead to overfitting, thereby guiding the prompts with basic visual concepts. Additionally, to further mitigate overfitting, we have developed a Structural Preservation (SP) constraint at the feature level, which aligns the model's overall feature space structure with the frozen CLIP, endowing the feature space with overall plasticity and enabling effective reshaping of the feature space during optimization. Moreover, we employ Hierarchical Logit (HL) constraint at the output layer to constrain the overall class information in the output, complementing the role of SP at the output end. Extensive experiments across various benchmarks (from base-to-novel, cross-dataset transfer, and domain generalization) demonstrate that our approach significantly improves generalization capability and effectively alleviates overfitting compared to state-of-the-art methods.

2408.17366 2026-05-12 cs.LG cs.AI

Leveraging Graph Neural Networks to Forecast Electricity Consumption

Eloi Campagne, Yvenn Amara-Ouali, Yannig Goude, Argyris Kalogeratos

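AI summary: This paper studies how to use graph neural networks for electricity demand forecasting, in order to cope with the complexity and uncertainty introduced by renewable energy integration and decentralized grids. It proposes a graph-based approach that effectively captures the spatial distribution and relational properties among nodes in the grid, and considers several models for forecasting, including graph convolutional networks and GraphSAGE. The approach extends beyond the conventional generalized additive model framework and provides a complete framework for building graph models and evaluating their performance and explainability, with experiments on synthetic data and real data from the French mainland regions.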

Comments 17 pages, ECML PKDD 2024 Workshop paper

English abstract

Accurate electricity demand forecasting is essential for several reasons, especially as the integration of renewable energy sources and the transition to a decentralized network paradigm introduce greater complexity and uncertainty. The proposed methodology leverages graph-based representations to effectively capture the spatial distribution and relational intricacies inherent in this decentralized network structure. This research work offers a novel approach that extends beyond the conventional Generalized Additive Model framework by considering models like Graph Convolutional Networks or Graph SAGE. These graph-based models enable the incorporation of various levels of interconnectedness and information sharing among nodes, where each node corresponds to the combined load (i.e. consumption) of a subset of consumers (e.g. the regions of a country). More specifically, we introduce a range of methods for inferring graphs tailored to consumption forecasting, along with a framework for evaluating the developed models in terms of both performance and explainability. We conduct experiments on electricity forecasting, in both a synthetic and a real framework considering the French mainland regions, and the performance and merits of our approach are discussed.

2407.07639 2026-05-12 cs.LG cs.AI

Explaining Graph Neural Networks for Node Similarity on Graphs

Daniel Daza, Cuong Xuan Chu, Trung-Kien Tran, Daria Stepanova, Michael Cochez, Paul Groth

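AI summary: This paper studies how to provide explanations for node similarity computed by graph neural networks (GNNs), in order to make similarity search over graph data more understandable. The authors compare two prominent explanation approaches, mutual information (MI) based and gradient-based (GB) explanations, and find that gradient-based explanations have three important advantages: they are actionable, they are consistent, and they can be pruned significantly into sparse explanations without losing their effect on similarity scores. The study provides valuable empirical analysis and guidance for GNN explainability.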

Comments Accepted in Transactions on Machine Learning Research (2026)

English abstract

Similarity search is a fundamental task for exploiting information in various applications dealing with graph data, such as citation networks or knowledge graphs. While this task has been intensively approached from heuristics to graph embeddings and graph neural networks (GNNs), providing explanations for similarity has received less attention. In this work we are concerned with explainable similarity search over graphs, by investigating how GNN-based methods for computing node similarities can be augmented with explanations. Specifically, we evaluate the performance of two prominent approaches towards explanations in GNNs, based on the concepts of mutual information (MI), and gradient-based explanations (GB). We discuss their suitability and empirically validate the properties of their explanations over different popular graph benchmarks. We find that unlike MI explanations, gradient-based explanations have three desirable properties. First, they are actionable: selecting inputs depending on them results in predictable changes in similarity scores. Second, they are consistent: the effect of selecting certain inputs overlaps very little with the effect of discarding them. Third, they can be pruned significantly to obtain sparse explanations that retain the effect on similarity scores.

2406.19741 2026-05-12 cs.RO cs.AI

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar

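AI summary This paper presents ROS-LLM, a framework that lets non-expert users program robots intuitively through natural language instructions by combining the Robot Operating System (ROS) with large language models (LLMs). The system automatically extracts behaviors from LLM output and executes ROS actions, supports multiple behavior modes, extends the robot's action library through imitation learning, and uses human and environment feedback for LLM reflection. Experiments show robustness, scalability, and versatility across a range of complex scenarios, and the code is open-source to support use and reproduction.

Comments This document contains 26 pages and 13 figures

Details
Journal ref
Nature Machine Intelligence 8, 313-325 (2026)
English abstract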

We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.

2406.12910 2026-05-12 cs.LG cs.AI cs.NE physics.chem-ph q-bio.BM

Human-level molecular optimization driven by mol-gene evolution

Jiebin Fang, Churu Mao, Yuchen Zhu, Xiaoming Chen, Chang-Yu Hsieh, Zhongjun Ma

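AI summary This study proposes the Deep Genetic Molecular Modification Algorithm (DGMM), aimed at balancing structural novelty against pharmacological properties in drug molecule optimization. A discrete variational autoencoder (D-VAE) encodes molecules as a quantized code, the "mol-gene", which combines deep learning with genetic algorithms to optimize molecular structures in a way comparable to medicinal chemists. The method can discover pharmacologically similar but structurally distinct compounds and reveals the trade-offs of structural optimization in drug discovery; its effectiveness is demonstrated in several applications.

Details
English abstract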

De novo molecule generation allows the search for more drug-like hits across a vast chemical space. However, lead optimization is still required, and the process of optimizing molecular structures faces the challenge of balancing structural novelty with pharmacological properties. This study introduces the Deep Genetic Molecular Modification Algorithm (DGMM), which brings structure modification to the level of medicinal chemists. A discrete variational autoencoder (D-VAE) is used in DGMM to encode molecules as quantization code, mol-gene, which incorporates deep learning into genetic algorithms for flexible structural optimization. The mol-gene allows for the discovery of pharmacologically similar but structurally distinct compounds, and reveals the trade-offs of structural optimization in drug discovery. We demonstrate the effectiveness of the DGMM in several applications.

2405.17642 2026-05-12 cs.LG cs.AI stat.ME

Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels

Oleksii Furman, Patryk Wielopolski, Łukasz Lenkiewicz, Jerzy Stefanowski, Maciej Zięba

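AI summary As AI systems grow more complex, the need for explainability becomes more pressing. This paper proposes a unified gradient-based optimization method that generates local, global, and group-wise counterfactual explanations, addressing the lack of integration across granularity levels in existing methods. By combining instance grouping and counterfactual generation into a single efficient process and introducing plausibility criteria, it makes group-wise counterfactuals more realistic and practical. Experiments confirm a good balance between validity, proximity, and plausibility.

Details
English abstract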

The growing complexity of AI systems has intensified the need for transparency through Explainable AI (XAI). Counterfactual explanations (CFs) offer actionable "what-if" scenarios on three levels: Local CFs providing instance-specific insights, Global CFs addressing broader trends, and Group-wise CFs (GWCFs) striking a balance and revealing patterns within cohesive groups. Despite the availability of methods for each granularity level, the field lacks a unified method that integrates these complementary approaches. We address this limitation by proposing a gradient-based optimization method for differentiable models that generates Local, Global, and Group-wise Counterfactual Explanations in a unified manner. We especially enhance GWCF generation by combining instance grouping and counterfactual generation into a single efficient process, replacing traditional two-step methods. Moreover, to ensure trustworthiness, we integrate plausibility criteria into the GWCF domain, making explanations both valid and realistic. Our results demonstrate the method's effectiveness in balancing validity, proximity, and plausibility while optimizing group granularity, with utility validated through practical use cases.

2405.12969 2026-05-12 cs.LG

EchoAlign: Bridging Generative and Discriminative Learning under Noisy Labels

Yuxiang Zheng, Zhongyi Han, Yilong Yin

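AI summary This paper proposes EchoAlign, a new framework that bridges generative and discriminative learning under noisy labels. Rather than correcting labels directly, the method uses generative models to adjust instance features so that they align with the noisy labels, and uses feature similarity to select reliable samples, improving model robustness. Experiments show that EchoAlign outperforms existing methods on multiple benchmark datasets, with stronger performance and stability especially under high noise.

Comments 27 pages, 7 figures. The article has been accepted by Frontiers of Computer Science (FCS), with the DOI: 10.1007/s11704-026-51604-z

Details
English abstract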

Noisy labels severely hinder the accuracy and generalization of machine learning models, especially when ambiguous instance features make reliable annotation difficult. Existing approaches, including transition-matrix-based label correction, struggle to capture complex relationships between instances and noisy labels, limiting their effectiveness in such settings. We present EchoAlign, a framework that bridges generative and discriminative learning under noisy labels. Instead of correcting labels, EchoAlign treats noisy labels as supervision targets and modifies the corresponding instances to align with them. The framework has two components: EchoMod uses controllable generative models to adjust instance features while preserving key instance-level structural cues, such as shape and edges, and avoiding excessive distortion; EchoSelect mitigates distribution shifts by retaining a reliable subset of original instances, guided by feature similarity between original and modified samples. This generative-discriminative interplay enables robust learning in highly noisy settings. Experiments on three benchmark datasets show that EchoAlign outperforms state-of-the-art methods in most evaluated settings. Under 30% instance-dependent noise, EchoSelect retains nearly twice as many correctly labeled samples as competing approaches while maintaining 99% selection accuracy, demonstrating the robustness and effectiveness of EchoAlign.

2404.18923 2026-05-12 cs.CL

Holmes: A Benchmark to Assess the Linguistic Competence of Language Models

Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych

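AI summary This paper introduces Holmes, a new benchmark for assessing the linguistic competence of language models, that is, their unconscious understanding of linguistic phenomena. Using classifier-based probing, the study analyzes models' internal representations of syntax, morphology, semantics, and other linguistic phenomena, and finds that linguistic competence is closely tied to model size, while model architecture and instruction tuning also significantly affect performance. The authors also propose FlashHolmes, a streamlined version that reduces the computation load while maintaining high precision.

Details
English abstract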

We introduce Holmes, a new benchmark designed to assess language models' (LMs) linguistic competence - their unconscious understanding of linguistic phenomena. Specifically, we use classifier-based probing to examine LMs' internal representations regarding distinct linguistic phenomena (e.g., part-of-speech tagging). As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities, such as following instructions in prompting-based evaluations. Composing Holmes, we review over 270 probing studies and include more than 200 datasets to assess syntax, morphology, semantics, reasoning, and discourse phenomena. Analyzing over 50 LMs reveals that, aligned with known trends, their linguistic competence correlates with model size. However, surprisingly, model architecture and instruction tuning also significantly influence performance, particularly in morphology and syntax. Finally, we propose FlashHolmes, a streamlined version that reduces the computation load while maintaining high-ranking precision.

2404.14442 2026-05-12 cs.LG cs.AI

Toward a Unified Lyapunov-Certified ODE Convergence Analysis of Smooth Q-Learning with p-Norms

Donghwan Lee, Hyunjun Na

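AI summary This paper studies the convergence analysis of standard Q-learning and its smooth variants and proposes a unified convergence framework based on ordinary differential equations (ODEs). By introducing a smooth p-norm Lyapunov function, the approach avoids the non-smoothness issues of classical infinity-norm methods and yields concise yet rigorous stability proofs. The framework applies to several smooth Q-learning algorithms, including those built on the log-sum-exponential softmax, Boltzmann softmax, and mellowmax operators, and remains valid even when the Bellman operator is not a contraction.

Details
English abstract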

Convergence of Q-learning has been the subject of extensive study for decades. Among the available techniques, the ordinary differential equation (ODE) method is particularly appealing as a general-purpose, off-the-shelf tool for sanity-checking the convergence of a wide range of reinforcement learning algorithms. In this paper, we develop a unified ODE-based convergence framework that applies to standard Q-learning and several soft/smoothed variants, including those built on the log-sum-exponential softmax, Boltzmann softmax, and mellowmax operators. Our analysis uses a smooth p-norm Lyapunov function, leading to concise yet rigorous stability arguments and circumventing the non-smoothness issues inherent to classical $\infty$-norm-based approaches. To the best of our knowledge, the proposed framework is among the first to provide a unified ODE-based treatment that is broadly applicable to smooth Q-learning algorithms while also encompassing standard Q-learning. Moreover, it remains valid even in settings where the associated Bellman operator is not a contraction, as may happen in Boltzmann soft Q-learning.
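
For readers unfamiliar with the three smoothed operators named in this abstract, the sketch below writes out their commonly used textbook definitions over a list of action values, plus a p-norm helper showing how a large p approaches the max norm. The function names and the temperature parameters `beta` and `omega` are illustrative assumptions; the paper's actual Lyapunov construction is not reproduced here.

```python
import math

def log_sum_exp(q, beta):
    """(1/beta) * log(sum_a exp(beta * q[a])), shifted by max(q) for stability."""
    m = max(q)
    return m + math.log(sum(math.exp(beta * (x - m)) for x in q)) / beta

def boltzmann_softmax(q, beta):
    """Average of q weighted by softmax(beta * q)."""
    m = max(q)
    weights = [math.exp(beta * (x - m)) for x in q]
    return sum(x * w for x, w in zip(q, weights)) / sum(weights)

def mellowmax(q, omega):
    """(1/omega) * log(mean_a exp(omega * q[a]))."""
    return log_sum_exp(q, omega) - math.log(len(q)) / omega

def p_norm(x, p):
    """p-norm, scaled by the largest entry to avoid overflow for large p."""
    m = max(abs(v) for v in x)
    if m == 0.0:
        return 0.0
    return m * sum((abs(v) / m) ** p for v in x) ** (1.0 / p)
```

For positive temperatures, mellowmax and the Boltzmann softmax never exceed the hard max, while log-sum-exp never falls below it; all three approach the max as the temperature parameter grows, just as `p_norm` approaches the largest absolute entry as p grows.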

2312.08413 2026-05-12 cs.LG cs.CR cs.CY

Privacy Constrained Fairness Estimation for Decision Trees

Florian van der Steen, Fré Vink, Heysem Kaya

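AI summary As data grows in value, protecting sensitive information and ensuring the fairness of AI models become especially important. This paper studies fairness estimation for decision tree models under differential privacy constraints and proposes a new method, PAFER, that can accurately estimate statistical fairness while guaranteeing privacy. Experiments show that the method keeps the model interpretable, achieves low fairness-estimation error, and performs better on decision trees that humans find easier to understand.

Comments 52 pages, under review in Applied Intelligence journal

Details
English abstract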

The protection of sensitive data becomes more vital as data increases in value and potency. Furthermore, pressure from regulators and society on model developers to make their Artificial Intelligence (AI) models non-discriminatory is increasing. In addition, there is a need for interpretable, transparent AI models for high-stakes tasks. In general, measuring the fairness of any AI model requires the sensitive attributes of the individuals in the dataset, thus raising privacy concerns. In this work, the trade-offs between fairness, privacy and interpretability are further explored. We specifically examine the Statistical Parity (SP) of Decision Trees (DTs) with Differential Privacy (DP), each a popular method in its respective subfield. We propose a novel method, dubbed Privacy-Aware Fairness Estimation of Rules (PAFER), that can estimate SP in a DP-aware manner for DTs. DP, making use of a third-party legal entity that securely holds this sensitive data, guarantees privacy by adding noise to the sensitive data. We experimentally compare several DP mechanisms. We show that using the Laplacian mechanism, the method is able to estimate SP with low error while guaranteeing the privacy of the individuals in the dataset with high certainty. We further show experimentally and theoretically that the method performs better for DTs that humans generally find easier to interpret.
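
To make the idea of estimating statistical parity from Laplace-noised counts concrete, here is a minimal sketch. It is not PAFER: the helper names, the per-group count dictionaries, and the choice to measure SP as the gap between the highest and lowest positive rates are all assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noisy_counts(counts, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism on per-group counts (one person changes one count by 1).

    Clamping at zero is post-processing and does not weaken the guarantee.
    """
    scale = sensitivity / epsilon
    return {g: max(c + laplace_noise(scale, rng), 0.0) for g, c in counts.items()}

def statistical_parity_gap(positives, totals):
    """Gap between the highest and lowest positive-decision rate across groups."""
    rates = [min(positives[g] / totals[g], 1.0) for g in totals if totals[g] > 0]
    if len(rates) < 2:
        return 0.0  # not enough groups left to compare
    return max(rates) - min(rates)
```

Note that querying both the positive counts and the totals spends privacy budget twice, so a caller would typically split `epsilon` between the two calls.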

2311.03600 2026-05-12 cs.RO

Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model

Sayantan Auddy, Jakob Hollenstein, Matteo Saveriano, Antonio Rodríguez-Sánchez, Justus Piater

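AI summary This study proposes a scalable and efficient method for continual learning from demonstration, using a hypernetwork to generate a stable dynamics model so that robots can learn and retain multiple skills in real environments. The hypernetwork generates a trajectory-learning dynamics model and a trajectory-stabilizing Lyapunov function, which together form a clock-augmented stable neural ODE solver (sNODE), and stochastic regularization is introduced in the hypernetwork to reduce training time complexity. Experiments show superior trajectory accuracy, continual learning performance, and stability on several complex datasets and real-world tasks.

Comments To appear in IEEE Transactions on Cognitive and Developmental Systems

Details
English abstract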

Robots capable of learning from demonstration (LfD) must exhibit stability while executing learned motion skills. To be effective in the real world, they should also remember multiple skills over time -- a capability lacking in current stable-LfD methods. We propose an approach to stable, continual LfD, and highlight the role of stability in improving continual learning. Our proposed hypernetwork generates the parameters of two neural networks: a trajectory learning dynamics model, and a trajectory-stabilizing Lyapunov function. These generated networks form a clock-augmented stable neural ODE solver (sNODE), a stable dynamics model that offers a superior stability-accuracy trade-off compared to the state-of-the-art. We further propose stochastic hypernetwork regularization with a single, uniformly-sampled task embedding, reducing the cumulative training time for $N$ tasks from O($N^2$) to O($N$) without degrading performance on real-world tasks. We introduce high-dimensional variants of the popular LASA dataset to assess scalability and extend a dataset of robotic LfD tasks to assess real-world performance. We empirically evaluate our approach on multiple LfD datasets of varying complexity, including sequences of 7--26 tasks, trajectories of 2--32 dimensions, and real-world tasks involving position and orientation. Our thorough evaluation on multiple LfD datasets demonstrates that our approach sequentially learns and retains multiple motion skills without retraining on past demonstrations, and outperforms other relevant baselines in terms of trajectory errors, continual learning scores, and stability metrics. Notably, we show that stability greatly enhances continual learning performance, particularly in size-efficient chunked hypernetworks. Our code is available at https://github.com/sayantanauddy/clfd-snode.
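
The O($N^2$) to O($N$) claim can be illustrated by counting regularization terms per training step. The sketch below assumes that the full regularizer, while learning task k, evaluates one term for each of the k-1 earlier task embeddings, and that the stochastic variant evaluates one term for a single uniformly sampled earlier task. The function names are hypothetical and this is a counting argument, not the authors' implementation.

```python
import random

def full_regularizer_terms(num_tasks):
    """Terms per step, summed over tasks, when every earlier task is revisited."""
    return sum(k - 1 for k in range(1, num_tasks + 1))

def sampled_regularizer_terms(num_tasks):
    """Terms per step, summed over tasks, with one sampled earlier task."""
    return sum(1 for _ in range(2, num_tasks + 1))

def sample_previous_task(current_task, rng):
    """Uniformly pick one earlier task index in [0, current_task); needs current_task >= 1."""
    return rng.randrange(current_task)
```

Under these assumptions the full regularizer costs N(N-1)/2 terms across N tasks, while the sampled one costs N-1, which matches the quadratic-versus-linear growth stated in the abstract.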

2306.03606 2026-05-12 cs.AI

BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs

Daniel Daza, Dimitrios Alivanistos, Payal Mitra, Thom Pijnenburg, Michael Cochez, Paul Groth

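AI summary This paper proposes BioBLP, a modular framework for learning entity embeddings in biomedical knowledge graphs with multimodal entity attributes; it handles attribute data of different modalities and supports entities with missing attributes. The method also introduces an efficient pretraining strategy that significantly improves model performance and reduces training time. Experiments show that on drug-protein interaction prediction, BioBLP compares favorably with baselines that ignore attribute data and outperforms them in settings involving low-degree entities.

Details
Journal ref
J Biomed Semant 14, 20 (2023)
English abstract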

Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences or molecular graphs. Other works incorporate such data, but assume that entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. We propose a modular framework for learning embeddings in KGs with entity attributes that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. In the standard link prediction evaluation, the proposed method results in competitive, yet lower performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. We find settings involving low degree entities, which make up a substantial share of the entities in the KG, where our method outperforms the baselines. Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. Our implementation is available at https://github.com/elsevier-AI-Lab/BioBLP .

2010.03496 2026-05-12 cs.CL cs.AI cs.LG

Inductive Entity Representations from Text via Link Prediction

Daniel Daza, Michael Cochez, Paul Groth

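AI summary This study explores how to learn inductive entity representations from textual descriptions in knowledge graphs via a link prediction objective, and evaluates how well these representations generalize across tasks. It evaluates an architecture based on a pretrained language model that handles entities unseen during training and achieves notable gains in link prediction, entity classification, and information retrieval. Experiments show that the learned entity representations transfer across tasks without fine-tuning and generalize better than existing methods.

Comments The Web Conference 2021

Details
English abstract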

Knowledge Graphs (KG) are of vital importance for multiple applications on the web, including information retrieval, recommender systems, and metadata annotation. Regardless of whether they are built manually by domain experts or with automatic pipelines, KGs are often incomplete. Recent work has begun to explore the use of textual descriptions available in knowledge graphs to learn vector representations of entities in order to perform link prediction. However, the extent to which these representations learned for link prediction generalize to other tasks is unclear. This is important given the cost of learning such representations. Ideally, we would prefer representations that do not need to be trained again when transferring to a different task, while retaining reasonable performance. In this work, we propose a holistic evaluation protocol for entity representations learned via a link prediction objective. We consider the inductive link prediction and entity classification tasks, which involve entities not seen during training. We also consider an information retrieval task for entity-oriented search. We evaluate an architecture based on a pretrained language model, that exhibits strong generalization to entities not observed during training, and outperforms related state-of-the-art methods (22% MRR improvement in link prediction on average). We further provide evidence that the learned representations transfer well to other tasks without fine-tuning. In the entity classification task we obtain an average improvement of 16% in accuracy compared with baselines that also employ pre-trained models. In the information retrieval task, we obtain significant improvements of up to 8.8% in NDCG@10 for natural language queries. We thus show that the learned representations are not limited to KG-specific tasks, and have greater generalization properties than evaluated in previous work.
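
The two ranking metrics quoted in this abstract, MRR and NDCG@10, can be sketched as follows. This uses the common linear-gain form of DCG and computes the ideal ordering from the supplied relevance list only; both are assumptions, since the abstract does not state which variant the paper uses, and the function names are illustrative.

```python
import math

def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct entity for each query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the same relevances sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered result list scores an NDCG of 1, and moving a relevant result further down the list lowers the score through the logarithmic discount.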

2605.10111 2026-05-12 cs.LG cs.AI cs.CV

CFSPMNet: Cross-subject Fourier-guided Spatial-Patch Mamba Network for EEG Motor Imagery Decoding in Stroke Patients

Xiangkai Wang, Yun Zhao, Dongyi He, Qingling Xia, Gen Li, Xinlai Xing, Yuchi Pan, Bin Jiang

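AI summary This study addresses the difficulty of cross-subject use in brain-computer interface (BCI) decoding for stroke patients and proposes a new neural network framework, CFSPMNet. The method combines Fourier-domain state reorganization with shared-private prototype matching and models latent neural-state organization, improving the accuracy and robustness of cross-subject MI-EEG decoding. Experiments show that CFSPMNet outperforms representative existing methods on two stroke MI-EEG datasets.

Details
English abstract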

Motor imagery electroencephalography (MI-EEG) decoding offers a non-invasive route for post-stroke rehabilitation, but cross-patient use remains difficult because pathological neural reorganization changes task-related EEG dynamics, aperiodic activity, local excitability, cross-regional coordination, and trial-level brain-state context. This makes source-learned MI representations unreliable for unseen patients. To address this problem, we propose CFSPMNet, a cross-patient adaptation framework that models post-stroke MI-EEG as latent neural-state organization. CFSPMNet combines a Fourier-Reorganized State Mamba Network (FRSM) with Shared-Private Prototype Matching (SPPM). FRSM represents each trial as a latent physiological token sequence, reorganizes token states in the Fourier domain, and uses Fourier-derived trial context to guide Mamba state-space propagation. SPPM improves target pseudo-label updating by combining semantic confidence with shared-private physiological consistency, filtering confident but physiologically inconsistent target predictions. Leave-one-subject-out experiments on two stroke MI-EEG datasets show that CFSPMNet outperforms representative CNN-, Transformer-, Mamba-, and adaptation-based baselines, achieving average accuracies of 68.23% on XW-Stroke and 73.33% on 2019-Stroke, with gains of 5.63 and 8.25 percentage points over the strongest competitors. Ablation, sensitivity, feature-alignment, pseudo-label selection, and neurophysiological visualization analyses further support the roles of Fourier-domain token-state reorganization and calibrated pseudo-label updating. These results suggest that latent neural-state modeling can improve rehabilitation-oriented cross-patient BCI decoding. Code is available at https://github.com/wxk1224/CFSPMNet.
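
The leave-one-subject-out protocol mentioned in this abstract can be sketched as below: each fold holds out every trial of one subject for testing and trains on the rest, and the reported figure is the mean over folds. The function names and the flat list of per-trial subject IDs are assumptions for illustration, not the authors' data format.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def mean_accuracy(fold_accuracies):
    """Average accuracy across folds."""
    return sum(fold_accuracies) / len(fold_accuracies)
```

Because no trial from the held-out subject appears in training, each fold measures how well a model transfers to an unseen patient.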

2605.10108 2026-05-12 cs.CL cs.LG

GLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extraction

Ihor Stepanov, Oleksandr Lukashov, Mykhailo Shtopko, Vivek Kalyanarangan

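AI summary This paper proposes GLiNER-Relex, a unified framework for jointly performing named entity recognition (NER) and relation extraction (RE). Built on a shared bidirectional transformer encoder, it jointly models entity type and relation type labels, enabling zero-shot extraction of arbitrary entity and relation types at inference time. Experiments show competitive performance on several standard relation extraction benchmarks, combined with computational efficiency and flexibility, and the model is released as an open-source package.

Comments 19 pages, 1 figure, 2 tables

Details
English abstract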

Joint named entity recognition (NER) and relation extraction (RE) is a fundamental task in natural language processing for constructing knowledge graphs from unstructured text. While recent approaches treat NER and RE as separate tasks requiring distinct models, we introduce GLiNER-Relex, a unified architecture that extends the GLiNER framework to perform both entity recognition and relation extraction in a single model. Our approach leverages a shared bidirectional transformer encoder to jointly represent text, entity type labels, and relation type labels, enabling zero-shot extraction of arbitrary entity and relation types specified at inference time. GLiNER-Relex constructs entity pair representations from recognized spans and scores them against relation type embeddings using a dedicated relation scoring module. We evaluate our model on four standard relation extraction benchmarks: CoNLL04, DocRED, FewRel, and CrossRE, and demonstrate competitive performance against both specialized relation extraction models and large language models, while maintaining the computational efficiency characteristic of the GLiNER family. The model is released as an open-source Python package with a simple inference API that allows users to specify arbitrary entity and relation type labels at inference time and obtain both entities and relation triplets in a single call. All models and code are publicly available.