arXivDaily: daily arXiv academic digest, updated Monday to Friday
All subject categories: 2085
2605.07215 2026-05-11 cs.RO

PISTO: Proximal Inference for Stochastic Trajectory Optimization

Hongzhe Yu, Zinuo Chang, Yongxin Chen

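AI summary: This paper proposes PISTO, a stochastic trajectory optimization algorithm that stabilizes STOMP's updates by adding a KL-divergence regularization term, which gives the update a trust-region interpretation and lets the mean update be computed efficiently. Built on a variational inference framework, the method estimates expectations with importance-weighted Monte Carlo sampling, yielding a gradient-free optimization procedure that can handle discontinuous and non-differentiable cost functions. Experiments show that PISTO outperforms existing methods on robot arm motion planning and contact-rich MuJoCo tasks, with higher success rates and faster path generation.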

Comments 8 pages

Abstract

Stochastic trajectory optimization methods like STOMP enable planning with non-differentiable costs, offering substantial flexibility over gradient-based approaches. We show that STOMP implicitly minimizes the KL divergence from a Boltzmann trajectory distribution, revealing an elegant Variational Inference (VI) structure underlying its updates. Building on this insight, we propose the Proximal Inference for Stochastic Trajectory Optimization (PISTO) algorithm that stabilizes the updates by augmenting the objective with a KL regularization between successive Gaussian proposals. This proximal formulation admits a trust-region interpretation and yields closed-form mean updates computable as expectations under a surrogate distribution. We estimate these expectations via importance-weighted Monte Carlo sampling, producing a simple, derivative-free algorithm that inherits STOMP's ability to handle non-differentiable and discontinuous costs without modification. On robot arm motion planning benchmarks, PISTO achieves an 89% success rate, outperforming CHOMP (63%) and STOMP (68%), while producing shorter, smoother paths at twice the speed of competing stochastic methods. We further validate PISTO on contact-rich MuJoCo locomotion and manipulation tasks, where it consistently outperforms both CEM and MPPI baselines in reward.
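To make "importance-weighted, derivative-free mean update" concrete, here is a minimal sketch of a generic STOMP-style update, not PISTO's exact closed-form update (which the abstract does not spell out). It assumes an isotropic Gaussian proposal with fixed `sigma`, Boltzmann weights exp(-cost / temperature), and a hypothetical damping factor `eta` that blends the old mean with the weighted sample mean as a stand-in for the trust-region effect of the KL proximal term. The cost is only evaluated, never differentiated.

```python
import math
import random

def importance_weights(costs, temperature):
    """Normalized Boltzmann weights exp(-cost / temperature), shifted by the min cost for stability."""
    c_min = min(costs)
    raw = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(raw)
    return [r / total for r in raw]

def damped_mean_update(mean, sigma, cost_fn, rng, n_samples=256, temperature=1.0, eta=0.5):
    """One derivative-free update of a Gaussian proposal mean (isotropic, fixed sigma).

    eta is a hypothetical damping factor (an assumption, not from the paper): the new mean is a
    convex combination of the previous mean and the importance-weighted sample mean.
    """
    samples = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(n_samples)]
    weights = importance_weights([cost_fn(s) for s in samples], temperature)
    weighted_mean = [sum(w * s[i] for w, s in zip(weights, samples)) for i in range(len(mean))]
    return [(1.0 - eta) * m + eta * x for m, x in zip(mean, weighted_mean)]

# Toy "trajectory" of three waypoints pulled toward a target by a black-box cost.
target = [1.0, 2.0, 3.0]

def cost(traj):
    return sum((x - t) ** 2 for x, t in zip(traj, target))

rng = random.Random(0)
mean = [0.0, 0.0, 0.0]
for _ in range(20):
    mean = damped_mean_update(mean, sigma=0.5, cost_fn=cost, rng=rng)
print(round(cost(mean), 3))
```

Smaller `eta` keeps successive proposals closer together (more stable, slower); `eta = 1` reduces to a plain weighted-mean update with no damping.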

2605.07214 2026-05-11 cs.AI

HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization

Yuping Yan, Jirui Han, Fei Ming, Yuanshuai Li, Yaochu Jin

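AI summary: HMACE is a heterogeneous multi-agent collaborative evolution framework for the heuristic-design problem in combinatorial optimization. It decomposes the evolutionary process into four cooperating agents responsible for strategy exploration, heuristic generation, evaluation, and memory update, improving the diversity and efficiency of the search. Experiments show that on representative combinatorial optimization problems such as the traveling salesman problem and online bin packing, HMACE achieves a better balance between solution quality and computational efficiency than existing single-agent and multi-agent methods.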

Abstract

Large Language Models have recently emerged as a promising paradigm for automated heuristic design for NP-hard combinatorial optimization problems. Despite this progress, existing LLM-based methods typically rely on monolithic workflows constrained by rigid templates, thereby restricting memory-guided exploration and triggering premature convergence to local optima. To design an autonomous and collaborative architecture, we introduce HMACE, a Heterogeneous Multi-Agent Collaborative Evolution framework that reconceptualizes heuristic search as an organizational design problem. HMACE decomposes each evolutionary generation into an autonomous, role-specialized loop with four coordinated agents: a Proposer for strategy exploration, a Generator for executable heuristic synthesis, an Evaluator for empirical assessment, and a Reflector for archive-backed memory update. By coupling behavior-aware retrieval, lightweight candidate filtering, and fitness-grounded archive updates, HMACE guides the search toward diverse and promising heuristic behaviors while avoiding redundant evaluations. Extensive evaluations on representative COPs, including TSP, Online BPP, MKP, and PFSP, show that HMACE achieves a favorable quality-efficiency trade-off compared to state-of-the-art single-agent and multi-agent baselines. In the matched LLM-driven reference comparison, HMACE achieves the lowest average gaps on TSP and Online BPP (0.464% and 0.223%, respectively), while requiring only 0.13M and 0.42M tokens for the two tasks, substantially fewer than the compared baselines.

2605.07213 2026-05-11 cs.CV

LoHGNet: Infrared Small Target Detection through Lorentz Geometric Encoding with High-Order Relation Learning

Qianwen Ma, Yang Xu, Shangwei Deng, Xiaobo Li, Haofeng Hu

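AI summary: Infrared small target detection (IRSTD) remains challenging because target features are sparse and background interference is severe. To overcome the limitations of existing methods in feature representation and contextual relation modeling, this paper proposes LoHGNet, which combines Lorentz geometric encoding with high-order relation learning. Modeling features in hyperbolic space strengthens the hierarchical geometric representation of weak, small targets, and a high-order relation module captures the complex dependencies between targets and background, substantially improving detection in complex scenes. Experimental results on three datasets show strong detection accuracy and scene adaptability.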

Abstract

Infrared small target detection (IRSTD) remains challenging due to the scarcity of useful target cues and the presence of severe background clutter. Most current methods rely on conventional feature learning and local interaction modeling, where features are represented in Euclidean space. However, such designs may still be limited in describing the subtle differences of weak targets and the contextual relations between targets and backgrounds. To address these limitations, we propose LoHGNet, an IRSTD network that integrates Lorentz geometric encoding with high-order relation learning. By introducing Lorentz-manifold-based feature learning, LoHGNet offers a different feature representation from conventional IRSTD methods and provides new discriminative cues for IRSTD. Specifically, a Lorentz encoding branch is constructed with the Geometric Attention Guided Lorentz Residual Convolution Module (GA-LRCM) to perform feature modeling under hyperbolic geometric constraints and enhance the hierarchical geometric representation capability of weak targets. Subsequently, the hyperbolic features are mapped into the Euclidean tangent space through logarithmic mapping, and a High-Order Relation Learning Module (HORL) is designed to model the high-order contextual dependencies between targets and backgrounds via hypergraph construction, thereby improving target discrimination in complex backgrounds. Experimental results on three datasets demonstrate that the proposed LoHGNet achieves competitive performance in both detection accuracy and adaptability to complex scenes. The code will be available at https://github.com/Kingwin97.
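The step "mapped into the Euclidean tangent space through logarithmic mapping" can be illustrated with the standard exponential and logarithmic maps of the Lorentz (hyperboloid) model. This sketch assumes curvature -1 and the origin (1, 0, ..., 0) as base point; the abstract does not state which curvature or base point LoHGNet uses. For a point p = (x0, x_s), log_o(p) = arccosh(x0) * x_s / ||x_s||, and exp_o(v) = (cosh ||v||, sinh ||v|| * v / ||v||).

```python
import math

def minkowski_inner(a, b):
    """Lorentzian inner product <a, b>_L = -a0*b0 + sum_i ai*bi."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def lorentz_exp0(v):
    """Exponential map at the hyperboloid origin (curvature -1): tangent vector -> manifold point."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm < 1e-12:
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * x for x in v]

def lorentz_log0(p):
    """Logarithmic map at the origin: point [x0, x1..xn] -> spatial tangent components."""
    x0, spatial = p[0], p[1:]
    norm = math.sqrt(sum(x * x for x in spatial))
    if norm < 1e-12:
        return [0.0] * len(spatial)
    dist = math.acosh(max(x0, 1.0))  # geodesic distance from the origin
    return [dist * x / norm for x in spatial]

v = [0.3, -1.2, 0.5]
p = lorentz_exp0(v)
print(minkowski_inner(p, p))  # close to -1.0, so p lies on the hyperboloid
print(lorentz_log0(p))        # recovers v up to rounding
```

Once features are in the tangent space they are ordinary Euclidean vectors, which is what allows a Euclidean module such as the hypergraph-based HORL to operate on them.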

2605.07212 2026-05-11 cs.LG cs.AI cs.HC cs.NE eess.SP

Same Brain, Different Prediction: How Preprocessing Choices Undermine EEG Decoding Reliability

Dengzhe Hou, Zihao Wu, Lingyu Jiang, Zirui Li, Fangzhou Lin, Kazunori D. Yamada

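AI summary: This study examines how preprocessing choices affect the stability of model predictions in electroencephalography (EEG) decoding. It points out that current deep learning models are usually trained and evaluated under a single preprocessing pipeline that is not explicitly reported, which leaves their predictions highly unstable. The study formalizes preprocessing choices as a counterfactual intervention space and shows that predictions change markedly across preprocessing variants, with up to 42% of trial-level predictions flipping in some cases. The authors provide three tools to quantify, decompose, and reduce this instability: a Walsh-Hadamard-transform-based decomposition, a preprocessing uncertainty metric, and a graph-structured regularization strategy.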

Abstract

Electroencephalography (EEG) is a cornerstone of brain-computer interfaces and clinical neuroscience, yet deep learning models are typically trained and evaluated under a single, unreported preprocessing pipeline. We formalize preprocessing choices as a counterfactual intervention space and show that EEG predictions are surprisingly unstable under this space: across six datasets spanning four paradigms, up to 42% of trial-level predictions flip when only the preprocessing changes, a variability that standard uncertainty methods do not explicitly quantify because they condition on a fixed preprocessing pipeline. We provide three tools to make this instability measurable, decomposable, and reducible. First, a Walsh-Hadamard decomposition of the 2^7 pipeline space reveals that sensitivity is near-additive in practice under the binary intervention design, enabling efficient step-by-step optimization. Second, we introduce Preprocessing Uncertainty (PU), a per-trial diagnostic that captures a dimension of instability complementary to model-based confidence. Third, we study Normalized Adaptive PGI (NA-PGI), a graph-structured regularizer that exploits the compositional structure of preprocessing interventions as one mitigation strategy with clear scope conditions.
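The Walsh-Hadamard decomposition treats each binary preprocessing step as a bit, so a metric over all 2^k pipelines becomes a function on {0,1}^k. Its Walsh coefficients separate the overall mean, single-step (main) effects, and multi-step interactions; "near-additive" means the interaction coefficients are small. The sketch below uses k = 3 and a hypothetical, exactly additive accuracy function for illustration (the paper uses k = 7); `interaction_share` is a hypothetical helper, not a metric defined in the abstract.

```python
def walsh_hadamard_coefficients(values):
    """Fast Walsh-Hadamard transform of a function over {0,1}^k, normalized by 2^k.

    values[x] is the metric of the pipeline whose binary choices are the bits of x.
    Returns coef[S] = mean over x of values[x] * (-1)^popcount(x & S).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [c / n for c in a]

def interaction_share(coef):
    """Fraction of non-constant variance carried by interaction terms (subsets with >= 2 steps)."""
    total = sum(c * c for s, c in enumerate(coef) if s != 0)
    inter = sum(c * c for s, c in enumerate(coef) if bin(s).count("1") >= 2)
    return inter / total if total else 0.0

# Hypothetical accuracy for 3 binary steps: each step shifts accuracy independently.
values = [0.7 + 0.05 * (x & 1) - 0.02 * ((x >> 1) & 1) + 0.01 * ((x >> 2) & 1) for x in range(8)]
coef = walsh_hadamard_coefficients(values)
print(coef)
print(interaction_share(coef))
```

For an additive function every interaction coefficient vanishes, so each step can be tuned independently, which is the property the abstract describes as enabling step-by-step optimization.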

2605.07211 2026-05-11 cs.LG cs.AI

HARMONY: Bridging the Personalization-Generalization Gap by Mitigating Representation Skew in Heterogeneous Split Federated Learning

Jiseok Youn, You Rim Choi, Goodsol Lee, Sangtae Ha, Hyung-Sin Kim, Saewoong Bahk

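AI summary: In heterogeneous split federated learning, differences in client architectures and imbalanced data distributions cause existing methods to suffer from representation skew, which degrades the server's predictions on out-of-distribution (OOD) classes. This paper proposes HARMONY, which adapts meta-learning to support personalized extractors with different parameters and architectures and introduces server-side contrastive learning to align feature representations, thereby mitigating representation skew. While preserving client personalization and without sharing raw labels, HARMONY substantially improves test accuracy both with and without OOD classes.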

Comments 7 pages (except references), 5 figures

Abstract

Mobile devices face diverse resource constraints and non-IID data class distributions, requiring fast on-device inference for local in-distribution (ID) classes and on-demand remote support for client-specific out-of-distribution (OOD) classes. Hybrid split federated learning (Hybrid SFL) couples personalized client-side front ends (supporting early exit) with a generalized server-side backend for fallback inference, balancing accuracy and cost. However, under client architectural heterogeneity, the existing hybrid SFL suffers from representation skew, where features from customized extractors fail to align in the shared space, leading to a sharp degradation in the server model responsible for OOD prediction. We propose HARMONY, the first hybrid SFL framework to support heterogeneous client architectures. HARMONY modifies meta-learning to simulate diverse extractors across parameters and architectures, and to learn to personalize. To mitigate representation skew, HARMONY conducts server-side contrastive learning to align extracted features, neither sacrificing clients' personalization nor sharing raw labels. Compared to the state of the art across multiple datasets and model families, HARMONY improves test accuracy by up to 43.0%/28.3% without/with OOD, respectively, while maintaining acceptable latency.
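Hybrid SFL pairs an early-exit client front end with a server backend used as a fallback. The abstract does not say how the exit decision is made; this sketch assumes a simple softmax-confidence threshold on the personalized client head. `hybrid_predict`, `threshold`, and `server_fn` are hypothetical names, with `server_fn` standing in for the remote call to the generalized backend.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hybrid_predict(client_logits, server_fn, threshold=0.8):
    """Exit early on-device when the local head is confident; otherwise fall back to the server.

    client_logits: outputs of the personalized client head over its local (ID) classes.
    server_fn: hypothetical callable standing in for remote inference on the server backend.
    Returns (source, predicted_label).
    """
    probs = softmax(client_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return "client", best
    return "server", server_fn()

print(hybrid_predict([5.0, 0.0, 0.0], server_fn=lambda: 7))  # confident: answered locally
print(hybrid_predict([0.1, 0.0, 0.2], server_fn=lambda: 7))  # uncertain: deferred to the server
```

The threshold trades latency for accuracy: raising it sends more inputs to the server, which is where the representation-skew problem described above would show up.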

2605.07209 2026-05-11 cs.CL cs.AI cs.LG

Hallucination Detection via Activations of Open-Weight Proxy Analyzers

Akshita Singh, Prabesh Paudel, Siddhartha Roy

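AI summary: This paper proposes a proxy-analyzer framework for detecting hallucinations in large language models. A small, locally hosted open-weight model reads already-generated text and uses its own internal activations to identify hallucinations; the approach works with closed APIs such as GPT-4 as well as with any open-weight model. The authors construct 18 features grounded in how transformers process text and train a stacking ensemble on several datasets. Experiments show that the method outperforms existing approaches across multiple analyzer architectures, and that model size shows no clear positive correlation with performance.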

Comments 12 pages, 4 figures. Code available at https://github.com/hallu-detect/llm_hallucination_detection

Abstract

We introduce a proxy-analyzer framework for detecting hallucinations in large language models. Instead of looking inside the generating model, our system reads already-generated text through a small locally hosted open-weight model and spots hallucinations using the reader's own internal activations. This works just as well when the generator is a closed API like GPT-4 as when it is any open-weight model. We built eighteen features grounded in how transformers process text, covering residual stream norms, per-head source-document attention, entropy, MLP activations, logit-lens trajectories, and three new token-level grounding statistics. We trained a stacking ensemble on 72,135 samples from five hallucination datasets. We tested across seven analyzer architectures from 0.5 billion to 9 billion parameters: Qwen2.5 at 0.5B and 7B, Gemma-2 at 2B and 9B, Pythia at 1.4B, and LLaMA-3 at both 3B and 8B. Across all seven, we consistently beat ReDeEP's token-level AUC of 0.73 on RAGTruth by 7.4 to 10.3 percentage points. Qwen2.5-7B reached an F1 of 0.717, just above ReDeEP's 0.713, while Qwen2.5-0.5B hit 0.706. The most striking finding is how tightly all seven models cluster: AUC spans only 2.3 percentage points across an eighteen-fold difference in model size. Even more surprising, our 3B LLaMA outperforms our 8B LLaMA on RAGTruth, showing that bigger is not always better even within the same model family. Both RAGTruth and LLM-AggreFact include outputs from multiple LLM families, so our results are not skewed toward any particular generator.
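Entropy is one of the eighteen listed features. A minimal sketch of how such a feature could be computed from the analyzer's logits follows; averaging over the positions of a span is an assumption, since the abstract does not say how the authors aggregate token-level values, and the logits here are plain lists rather than real model outputs.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution softmax(logits)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(per_position_logits):
    """Average entropy over the positions of a span: one scalar feature per span (assumed aggregation)."""
    values = [token_entropy(row) for row in per_position_logits]
    return sum(values) / len(values)

print(token_entropy([0.0, 0.0, 0.0, 0.0]))  # uniform over 4 tokens: ln 4
print(token_entropy([100.0, 0.0, 0.0]))     # nearly deterministic: about 0
```

High entropy means the analyzer finds the continuation surprising; features like this become columns in the table fed to the stacking ensemble.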

2605.07208 2026-05-11 cs.LG

FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution

Jianrong Ding, Jianyuan Zhong, Zhengyan Shi, Qiang Xu

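AI summary: This study proposes FAME, a spatiotemporal framework for forecasting the impact of academic papers. FAME combines textual features with a verified knowledge-flow graph to build a dynamic latent space that captures the evolving trajectories of scientific topics, allowing a more accurate assessment of a paper's potential impact. Experiments show that FAME substantially outperforms state-of-the-art LLM evaluators on multidimensional impact forecasting in three fast-evolving subfields and can effectively improve LLMs' forecasting performance.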

Abstract

Large Language Models (LLMs) are increasingly used to brainstorm and evaluate research ideas, yet assessing such judgments is fundamentally difficult because the true impact of a new idea may take years to emerge. We address this challenge by using the impact forecasting of human-authored manuscripts as a verifiable proxy task. In a prospective forecasting study, we find that frontier LLMs fail to reliably distinguish high-impact papers from ordinary publications, suggesting that static text-based judging is insufficient for scientific evaluation. To address this limitation, we propose FAME (Forecasting Academic Impact via Continuous-Time Manifold Evolution), a spatiotemporal framework for modeling the dynamic trajectories of scientific topics. FAME projects papers into a dynamic latent space informed by textual features and a verified knowledge-flow graph, learning geometric constraints that align impactful manuscripts with the forward momentum of their fields. Experiments on 3,200 arXiv papers across three fast-evolving subfields show that FAME consistently and substantially outperforms state-of-the-art LLM evaluators in prospective multidimensional impact forecasting. Furthermore, integrating FAME's dynamic geometric signals into LLMs significantly improves their forecasting performance. These results support manuscript impact forecasting as a useful, measurable proxy benchmark and position FAME as a strong, trajectory-aware foundation for automated scientific evaluation.

2605.07204 2026-05-11 cs.LG

Arrow: A Foundation Model for Causal Discovery

Ryan Thompson, He Zhao, Daniel M. Steinberg, Edwin V. Bonilla

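AI summary: This paper proposes Arrow, a foundation model for zero-shot causal discovery on observational tabular data. The model factorizes a directed acyclic graph into an undirected skeleton and a topological order, which guarantees acyclicity, and uses a transformer-based architecture to contextualize variables, predicting edge probabilities and node orderings that together define a causal graph. Arrow performs well on a variety of synthetic and real datasets at lower inference cost than existing methods, demonstrating the effectiveness of large-scale pretraining for causal discovery.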

Abstract

We introduce Arrow, a foundation model for zero-shot causal discovery on observational tabular data. Arrow factorizes a directed acyclic graph into an undirected skeleton and a topological order, guaranteeing acyclicity by construction. Given a new dataset, it uses a transformer-based architecture to contextualize variables within and across observations, then predicts skeleton edge probabilities and node order scores that together define a graph. Arrow is trained in a supervised fashion on synthetic datasets with ground-truth graphs, using an end-to-end differentiable directed edge composite likelihood induced by the skeleton-order factorization. The training distribution spans diverse graph families, functional forms, noise models, and dataset shapes. Across in- and out-of-distribution synthetic, semi-synthetic, and real datasets, Arrow matches or outperforms existing causal discovery methods at substantially lower inference cost than competitive alternatives. Our results demonstrate that large-scale pretraining on diverse synthetic data can yield zero-shot causal discovery models that are fast, accurate, and reusable on new datasets.
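Why the skeleton-order factorization guarantees acyclicity: if every kept skeleton edge is oriented from the node that comes earlier in the order to the one that comes later, no directed cycle can exist. The sketch below shows only this decoding step, assuming a hard 0.5 threshold on edge probabilities and that a lower order score means "earlier"; both are assumptions, and Arrow's differentiable training likelihood is not modeled here.

```python
def orient_skeleton(edge_probs, order_scores, threshold=0.5):
    """Combine an undirected skeleton with a node ordering to obtain a DAG.

    edge_probs: dict mapping frozenset({i, j}) -> probability that the undirected edge exists.
    order_scores: list of per-node scores; a lower score is assumed to mean an earlier position.
    Each kept edge points from the earlier node to the later one, so no cycle can form.
    """
    ranked = sorted(range(len(order_scores)), key=order_scores.__getitem__)
    rank = {node: r for r, node in enumerate(ranked)}
    edges = []
    for pair, p in edge_probs.items():
        if p < threshold:
            continue
        i, j = sorted(pair, key=rank.__getitem__)
        edges.append((i, j))
    return sorted(edges)

def is_acyclic(n, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed in topological order."""
    indegree = [0] * n
    children = [[] for _ in range(n)]
    for i, j in edges:
        indegree[j] += 1
        children[i].append(j)
    queue = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return removed == n

edge_probs = {frozenset({0, 1}): 0.9, frozenset({1, 2}): 0.8, frozenset({0, 2}): 0.7, frozenset({2, 3}): 0.3}
order_scores = [0.5, 0.1, 0.9, 0.2]  # implied order: 1, 3, 0, 2
dag = orient_skeleton(edge_probs, order_scores)
print(dag, is_acyclic(4, dag))
```

Because acyclicity follows from the construction, no acyclicity penalty or post-hoc cycle repair is needed at inference time.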

2605.07201 2026-05-11 cs.CL cs.AI cs.LG

PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat

Srikar Kashyap Pulipaka

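AI summary: This paper describes a system for the EEUCA 2026 shared task "Understanding Toxic Behavior in Gaming Communities," which requires classifying World of Tanks chat messages into six toxicity categories. The author tries several approaches, including encoder-based models, instruction-tuned large language models fine-tuned with LoRA, hierarchical classification, one-vs-rest strategies, and ensembles. The final system combines Llama 3.1 8B with carefully calibrated 5% synthetic data augmentation, achieving a macro-F1 score of 0.6234 and ranking 4th out of 35 teams. The paper also analyzes how the dataset's annotation patterns affect model generalization, revealing a "validation trap" in which high validation performance fails to carry over to the test set.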

Comments Accepted to the EEUCA workshop at ACL 2026

Abstract

This paper describes our system for the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities. The task involves classifying World of Tanks chat messages into six toxicity categories: Non-toxic, Insults/Flaming, Other Offensive, Hate/Harassment, Threats, and Extremism. We explore multiple approaches including encoder-based models, instruction-tuned LLMs with LoRA fine-tuning, hierarchical classification, one-vs-rest strategies, and various ensemble methods. Our best system combines Llama 3.1 8B with carefully calibrated 5% synthetic data augmentation, achieving an F1-macro score of 0.6234 on the test set, placing 4th out of 35 participating teams. We provide extensive analysis of the dataset's annotation patterns and their impact on model generalization, revealing a critical "validation trap" phenomenon where high validation performance correlates with poor test transfer.
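The task is scored with macro-F1, the unweighted mean of per-class F1 scores, so rare categories such as Threats count as much as Non-toxic. A minimal sketch follows; scoring a class with no true and no predicted examples as 0 is an assumption, and the official scorer may treat such classes differently.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN); classes with zero denominator score 0."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

labels = ["Non-toxic", "Insults/Flaming", "Threats"]
y_true = ["Non-toxic", "Non-toxic", "Insults/Flaming", "Threats"]
y_pred = ["Non-toxic", "Insults/Flaming", "Insults/Flaming", "Non-toxic"]
print(macro_f1(y_true, y_pred, labels))
```

Here the single missed Threats message zeroes one of three per-class scores, which shows why a model that rarely predicts minority classes is penalized heavily under macro averaging.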

2605.07199 2026-05-11 cs.AI cs.LG

Three-in-One World Model: Energy-Based Consistency, Prediction, and Counterfactual Inference for Marketing Intervention

Junichiro Niimi

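AI summary: This paper proposes a "Three-in-One" world model that handles consumer heterogeneity, changing internal states, and explicit interventions in marketing within a single framework. A Deep Boltzmann Machine (DBM) learns latent belief representations of consumers, and lightweight task adapters support three tasks: consistency evaluation, outcome prediction, and counterfactual inference. Experiments show that the model preserves consumer heterogeneity better than existing methods, especially when price and promotion interventions are confounded.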

Abstract

Marketing decisions reflect the interaction of latent consumer heterogeneity, time-varying internal states, and explicit interventions, a structure that current prediction- and language-oriented models do not capture in a unified manner. We propose a Three-in-One world-model architecture in which a Deep Boltzmann Machine (DBM) learns a frozen belief representation from demographics, time, and lagged actions and outcomes, with lightweight task-specific adapters attached on top. The same belief supports three tasks within a single framework: (i) energy-based consistency evaluation through the DBM's free energy, (ii) outcome prediction through adapters, and (iii) counterfactual inference by holding the belief fixed and varying only the action input given to the adapter. Using a controlled simulation in which the latent price sensitivity, promotion responsiveness, and base preference of each consumer are known, we show that the adapters match a strong MLP baseline on visit- and purchase-AUC while recovering heterogeneous treatment effects substantially better than S-, T-, X-, and DR-learner meta-learners and a Causal Forest baseline built on the same raw features, with the largest gap on a confounded price-promotion intervention. Complementing this, free-energy clamps systematically penalize counterfactual purchase trajectories that lack prior promotional exposure, and the penalty itself depends on the latent base preference in the expected direction. These results indicate that DBM beliefs disentangle latent traits in a form that survives counterfactual queries, providing an integrated world-model substrate for marketing intervention.
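The consistency task scores configurations by free energy: lower free energy means the model finds a configuration more plausible, so a "penalized" counterfactual trajectory is one with higher free energy. A DBM's free energy has no closed form in general, so this sketch substitutes a single-hidden-layer binary RBM, whose free energy is exact: F(v) = -a·v - Σ_j softplus(b_j + Σ_i W_ij v_i). The RBM is a simplified stand-in, not the paper's model, and all weights below are toy values.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def rbm_free_energy(v, visible_bias, hidden_bias, weights):
    """Free energy of a binary RBM; weights[i][j] connects visible unit i to hidden unit j."""
    linear = sum(a * x for a, x in zip(visible_bias, v))
    hidden_terms = 0.0
    for j, b in enumerate(hidden_bias):
        pre = b + sum(weights[i][j] * v[i] for i in range(len(v)))
        hidden_terms += softplus(pre)
    return -linear - hidden_terms

# Toy model: visible unit 0 is strongly tied to hidden unit 0.
w = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zeros_v, zeros_h = [0.0, 0.0], [0.0, 0.0, 0.0]
print(rbm_free_energy([1, 0], zeros_v, zeros_h, w))  # lower: supported by the learned coupling
print(rbm_free_energy([0, 0], zeros_v, zeros_h, w))  # higher: -3 ln 2
```

Comparing free energies of a factual and a clamped counterfactual configuration gives a scalar plausibility penalty, which is the role the abstract assigns to free-energy clamps.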

2605.07195 2026-05-11 cs.CV

See Tomorrow, Act Today: Foresight-Driven Autonomous Driving

Bozhou Zhang, Nan Song, Yuang Wang, Jiankang Deng, Xiatian Zhu, Li Zhang

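AI summary: Current end-to-end planning methods for autonomous driving are mostly reactive, predicting future actions only from past and current observations. This paper proposes ForeSight, which recasts autonomous driving as anticipatory decision-making: a pretrained world model generates plausible future scenes, and actions are planned conditioned on these imagined futures, enabling more forward-looking decisions. Experiments show that the method substantially outperforms state-of-the-art methods on the NAVSIM and nuScenes datasets.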

Comments CVPR Findings 2026

Abstract

Current end-to-end autonomous driving planners are fundamentally reactive: they condition on historical and present observations to predict future actions. We argue that autonomous agents should instead imagine future scenes before deciding, just as human drivers mentally simulate "what will happen next" before acting. We introduce ForeSight, a foundation-world-model-centric planning framework that reframes autonomous driving as anticipatory decision-making. Rather than treating world models as auxiliary components, ForeSight makes future scene imagination the primary driver of action prediction. Our approach operates in two stages: (1) generating plausible future visual worlds via a pretrained world model, and (2) planning actions conditioned on these imagined futures. This paradigm shift from "what should I do now?" to "what will happen, and how should I respond?" enables genuinely anticipatory rather than reactive planning. By grounding decisions in anticipated contexts rather than present observations alone, ForeSight navigates dynamic, interactive scenarios more effectively. Extensive experiments on NAVSIM and nuScenes demonstrate that explicit future imagination significantly outperforms previous state-of-the-art alternatives, validating our foresight-driven approach.

2605.07194 2026-05-11 cs.CV cs.AI cs.LG

Closed-Form Linear-Probe Dataset Distillation for Pre-trained Vision Models

Bincheng Peng, Guang Li, Ping Liu, Takahiro Ogawa, Miki Haseyama

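AI summary: This paper studies how to compress a large training set into a small synthetic dataset that preserves training utility for linear probing on frozen pre-trained vision models. The authors propose Closed-Form Linear-Probe Dataset Distillation (CLP-DD), a bilevel formulation that computes the linear probe induced by the synthetic set in closed form directly from pre-trained features and updates the synthetic images with a temperature-scaled cross-entropy loss on real features. Experiments show that it approaches or matches trajectory-based distillation at a fraction of the compute; on ImageNet-1K it matches or surpasses LGM with DSA on three of four backbones while running roughly 14× faster.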

Abstract

Dataset distillation compresses a large training set into a small synthetic set that preserves downstream training utility. While most existing methods target training networks from scratch, modern visual transfer learning often uses frozen pre-trained encoders followed by lightweight linear probing. Existing distillation methods for this setting either unroll iterative linear-probe updates with trajectory-based gradient matching, or rely on closed-form formulations originally designed for from-scratch training with neural-tangent-kernel (NTK) approximations. Neither route exploits the fact that frozen-feature linear probing admits a closed-form solution determined directly by the pre-trained features themselves, with no infinite-width approximation and no inner-loop trajectory. We propose Closed-Form Linear-Probe Dataset Distillation (CLP-DD), a bilevel formulation that computes the linear probe induced by the synthetic set with a sample-space kernel ridge solver. The synthetic images are then updated by evaluating this induced classifier on real features through a temperature-scaled softmax cross-entropy, where the classifier columns act as learned class anchors in feature space. We further show that the choice of outer objective is decisive: pairing the closed-form inner solver with a standard MSE outer loss substantially underperforms trajectory-based methods, while the discriminative outer loss closes most of the gap. On ImageNet-100 with four pre-trained backbones, CLP-DD substantially improves over LGM without DSA and approaches LGM with DSA at a fraction of the computational cost. On ImageNet-1K, CLP-DD matches or surpasses LGM with DSA on three of four backbones while running roughly 14× faster and using less than one-eighth of the GPU memory.
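The inner and outer steps can be sketched directly. With synthetic features X_s (m × d) and one-hot labels Y_s, the sample-space kernel ridge solution is α = (X_s X_sᵀ + λI)⁻¹ Y_s, the probe is W = X_sᵀ α (d × C, one column per class anchor), and the outer loss is softmax cross-entropy of (X_r W) / τ on real features. The linear kernel, the ridge value, the absence of a bias term, and τ are assumptions; the sketch evaluates only the forward outer loss and does not backpropagate into synthetic images.

```python
import math

def solve(A, B):
    """Solve A X = B (A: n x n, B: n x k) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + list(b) for row, b in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closed_form_probe(syn_feats, syn_onehot, ridge=1e-3):
    """Linear probe W (d x C) from the m x m kernel ridge system on synthetic features."""
    m = len(syn_feats)
    K = [[dot(syn_feats[i], syn_feats[j]) + (ridge if i == j else 0.0) for j in range(m)] for i in range(m)]
    alpha = solve(K, syn_onehot)  # (K + ridge * I)^{-1} Y, shape m x C
    d, C = len(syn_feats[0]), len(syn_onehot[0])
    return [[sum(syn_feats[i][k] * alpha[i][c] for i in range(m)) for c in range(C)] for k in range(d)]

def probe_cross_entropy(W, real_feats, real_labels, temperature=0.1):
    """Mean temperature-scaled softmax cross-entropy of the induced probe on real features."""
    C = len(W[0])
    total = 0.0
    for x, y in zip(real_feats, real_labels):
        logits = [sum(x[k] * W[k][c] for k in range(len(x))) / temperature for c in range(C)]
        mx = max(logits)
        log_norm = mx + math.log(sum(math.exp(z - mx) for z in logits))
        total += log_norm - logits[y]
    return total / len(real_feats)

syn = [[1.0, 0.0], [0.0, 1.0]]   # one synthetic feature vector per class
Y = [[1.0, 0.0], [0.0, 1.0]]
W = closed_form_probe(syn, Y)
print(probe_cross_entropy(W, [[0.9, 0.1], [0.2, 0.8]], [0, 1]))
```

Solving in sample space means the linear system is m × m (number of synthetic images) rather than d × d (feature dimension), which is cheap when the distilled set is small.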

2605.07193 2026-05-11 cs.LG

Coupling Models for One-Step Discrete Generation

Fred Zhangzhi Peng, Avishek Joey Bose, Anru R. Zhang, Alexander Tong

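AI summary: This paper proposes Coupling Models, a one-step discrete generative model designed to remove the dependence of conventional generation on autoregressive decoding or iterative refinement. The model learns a direct coupling between discrete sequences and Gaussian latent variables, enabling single-step generation without complex continuous flows or hand-specified data-to-noise couplings. Experiments show that Coupling Models substantially outperform existing one-step baselines across several tasks, demonstrating their effectiveness for discrete generation.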

Comments Code is available at https://github.com/pengzhangzhi/Coupling-Models

Abstract

Generative modeling over discrete structures underpins applications across deep learning, from biological sequence design and code generation to large language models, yet generation often remains sequential, relying on autoregressive decoding or iterative refinement. In this work, we introduce Coupling Models, a one-step discrete generative modeling approach that learns a direct coupling between discrete sequences and Gaussian latents. Unlike recent distillation methods that compress a pretrained multi-step sampler into a few steps, a Coupling Model trains a purpose-built decoder to invert this coupling and generate samples in a single step. The model also avoids complex continuous flows over the simplex and hand-specified data-to-noise couplings. Empirically, Coupling Models improve on the strongest one-step baselines in each domain: they reduce LM1B text-generation perplexity by 33% at their lowest-perplexity operating point, Fly Brain enhancer-design FBD by 18%, and MNIST-Binary FID by 46%. These results suggest that effective one-step discrete generation depends strongly on how data and noise are coupled before decoding. Code is available at https://github.com/pengzhangzhi/Coupling-Models.

2605.07192 2026-05-11 cs.CV

AsyncEvGS: Asynchronous Event-Assisted Gaussian Splatting for Handheld Motion-Blurred Scenes

Jun Dai, Renbiao Jin, Bo Xu, Yutian Chen, Linning Xu, Mulin Yu, Tianfan Xue, Shi Guo

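AI summary: This paper presents an asynchronous RGB-event dual-camera system and a corresponding reconstruction framework, AsyncEvGS, for 3D reconstruction of handheld captures with severe motion blur. The method exploits the high temporal resolution of event cameras and uses the Visual Geometry Grounded Transformer (VGGT) for cross-domain pose estimation, making the initialization of 3D Gaussian Splatting (3DGS) more robust; reconstruction is then optimized with a structure-driven event loss and view-specific consistency regularizers. The authors also build AsyncEv-Deblur, a high-resolution RGB-event dataset. Experiments show state-of-the-art reconstruction results on several benchmark datasets.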

Abstract

3D reconstruction methods such as 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) achieve impressive photorealism but fail when input images suffer from severe motion blur. While event cameras provide high-temporal-resolution motion cues, existing event-assisted approaches rely on low-resolution sensors and strict synchronization, limiting their practicality for handheld 3D capture on common devices such as smartphones. We introduce a flexible, high-resolution asynchronous RGB-Event dual-camera system and a corresponding reconstruction framework. Our approach first reconstructs sharp images from the event data and then employs a cross-domain pose estimation module based on the Visual Geometry Grounded Transformer (VGGT) to obtain robust initialization for 3DGS. During optimization, we employ a structure-driven event loss and view-specific consistency regularizers to mitigate the ill-posed behavior of traditional event losses and deblurring losses, ensuring both stable and high-fidelity reconstruction. We further contribute AsyncEv-Deblur, a new high-resolution RGB-Event dataset captured with our asynchronous system. Experiments demonstrate that our method achieves state-of-the-art performance on both our challenging dataset and existing benchmarks, substantially improving reconstruction robustness under severe motion blur. Project page: https://openimaginglab.github.io/AsyncEvGS/

2605.07191 2026-05-11 cs.CV cs.LG

Attention Transfer Is Not Universally Effective for Vision Transformers

Huaiyuan Qin, Muli Yang, Gabriel James Goenawan, Peng Hu, Chen Gong, Xi Peng, Hongyuan Zhu

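AI summary: This study shows that although recent work reported that attention transfer alone can recover the full benefit of a pre-trained teacher Vision Transformer (ViT), the approach is not universally effective across ViT families. For some families, attention transfer even performs worse than the from-scratch baseline, and the root cause is an architectural mismatch between student and teacher. Adding the teacher's native architectural components to the student completely resolves the failures for these families, indicating that the effectiveness of attention transfer depends on how well the student architecture matches the teacher's.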

Abstract

A recent work shows that Attention Transfer, which transfers only the attention patterns from a pre-trained teacher Vision Transformer (ViT) to a randomly initialized standard student ViT, is sufficient to recover the full benefit of the teacher's pre-trained weights. We revisit this finding on a comprehensive benchmark of 20 teachers from 11 well-known ViT families and reveal that Attention Transfer is not universally effective. While 7 families transfer successfully, 4 consistently fail, falling up to 5.1% below the from-scratch no-transfer baseline. Further results demonstrate that this failure is family-consistent across model sizes, and persists under extended training durations, different transfer datasets, and out-of-distribution evaluations. Controlled analyses then consistently localize the problem to the attention-routing channel, indicating that the key issue is not whether the student can match the teacher's attention patterns, but whether the matched patterns remain functional for the student. Crucially, we identify architectural mismatch between the pre-trained teacher and the standard student as the primary mechanism. By adding only the teacher's native architectural components to the student in a randomly initialized state, we completely reverse the failure for all 4 families. Notably, these components alone do not improve from-scratch training, confirming that they specifically unlock the usability of the teacher's attention. We further systematically show that this failure is not explained by an inadequate choice of transfer loss or by differences in pre-training recipes. Our findings refine the prevailing understanding of attention in ViT representations: attention is sufficient only when the student architecture matches the teacher.
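"Transferring only the attention patterns" means training the student so that its attention maps match the teacher's. A common way to express this is a per-query KL divergence between the teacher's and student's attention distributions; this sketch assumes that loss for a single head, which may differ from the loss used in the cited prior work (the abstract notes the failure is not explained by the choice of transfer loss).

```python
import math

def row_softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_transfer_loss(teacher_scores, student_scores):
    """Mean over query rows of KL(teacher attention || student attention) for one head.

    Inputs are pre-softmax attention scores: one row per query token, one entry per key token.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_scores, student_scores):
        p = row_softmax(t_row)
        q = row_softmax(s_row)
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0.0)
    return total / len(teacher_scores)

teacher = [[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]]
uniform_student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(attention_transfer_loss(teacher, teacher))          # 0.0: patterns already match
print(attention_transfer_loss(teacher, uniform_student))  # positive: student must learn the routing
```

Driving this loss to zero only guarantees matched patterns; the abstract's point is that matched patterns can still be unusable when the student lacks the teacher's architectural components.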

2605.07186 2026-05-11 cs.CL cs.AI

The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval

Zekai Tong, Ruiyao Xu, Aryan Shrivastava, Chenhao Tan, Ari Holtzman

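AI summary: This paper studies how the information-retrieval performance of large language models (LLMs) changes on imperfect text. When whitespace is inserted inside words so that they break into fragments, the models' detection accuracy follows a U-shaped curve, which the authors call the "Text Uncanny Valley." They propose a mode-transition hypothesis: models process near-normal text in a word-level mode and heavily fragmented text in a character-level mode, while the valley is the transition region where neither mode works. Experiments indicate that this degradation matters for noisy-text scenarios and that its severity varies with how much a task relies on exact lexical alignment.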

Comments 18 pages, 9 figures

Abstract

Existing Large Language Model (LLM) benchmarks primarily focus on syntactically correct inputs, leaving a significant gap in evaluation on imperfect text. In this work, we study how word-boundary corruption affects how LLMs detect targeted information. When whitespace characters are inserted within words to break them into fragments, LLMs' detection accuracy follows a U-shaped curve as the insertion rate increases. We refer to this curve as the Text Uncanny Valley. To explain this observation, we propose a mode transition hypothesis: LLMs operate in a word-level mode for near-normal text and a character-level mode for heavily fragmented text, with the valley marking the disordered transition where neither mode is effective. Four experiments and one analysis are consistent with this account: in-context learning fails to rescue valley-bottom performance; regularizing the perturbation substantially reduces the U-shape; a math reasoning task replicates the U-shape for Gemini 3.0 Flash but not for stronger models, suggesting the effect is attenuated when tasks rely less on exact lexical alignment; and tokenization entropy peaks before the F1 minimum, consistent with a regime-conflict interpretation. These findings reveal a failure mode invisible to clean-text benchmarks yet directly relevant to any deployment scenario involving noisy or uncurated text inputs.
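The perturbation itself is easy to reproduce in spirit. This sketch assumes a per-gap Bernoulli model: each gap between two adjacent alphanumeric characters inside a word receives a plain space with probability `rate`. The paper's exact sampling scheme and choice of whitespace characters are not given in the abstract, so treat `insert_intraword_spaces` as a hypothetical reconstruction.

```python
import random

def insert_intraword_spaces(text, rate, seed=0):
    """Insert a space into each gap between adjacent alphanumeric characters with probability `rate`."""
    rng = random.Random(seed)
    out_words = []
    for word in text.split(" "):
        pieces = [word[0]] if word else []
        for prev, ch in zip(word, word[1:]):
            if prev.isalnum() and ch.isalnum() and rng.random() < rate:
                pieces.append(" ")
            pieces.append(ch)
        out_words.append("".join(pieces))
    return " ".join(out_words)

for rate in (0.0, 0.3, 1.0):
    print(rate, insert_intraword_spaces("the secret code is 4417", rate))
```

At rate 0 the text is intact (word-level regime), at rate 1 every word is spelled out character by character (character-level regime), and intermediate rates produce the mixed fragments where the abstract locates the valley.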

2605.07182 2026-05-11 cs.LG

Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control

Ali Taghibakhshi, Ruisi Cai, Saurav Muralidharan, Sharath Turuvekere Sreenivas, Aditya Vavre, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Sheldon Liang, Marcin Chochowski, Zijia Chen, Akhiad Bercovich, Ran Zilberstein, Ran El-Yaniv, Yonatan Geifman, Daniel Korzekwa, Yoshi Suhara, Oluwatobi Olabiyi, Ashwath Aithal, Nima Tajbakhsh, Pavlo Molchanov

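AI summary: Star Elastic is a new post-training method for large language models (LLMs) that produces multiple nested submodels in a single training run, greatly reducing training cost and improving inference efficiency. It introduces elastic budget control, which lets inference use different submodels for different reasoning phases (thinking and answering), giving a better accuracy-latency trade-off. Experiments show that Star Elastic maintains model quality while reducing training cost by 360x relative to pretraining from scratch and 7x relative to state-of-the-art compression methods; it supports nesting along several architectural axes together with curriculum-based knowledge distillation, making it suitable for efficient deployment of large models.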

Abstract

Training a family of large language models (LLMs), either from scratch or via iterative compression, is prohibitively expensive and inefficient, requiring separate training runs for each model in the family. In this paper, we introduce Star Elastic, a novel LLM post-training method that adds N nested submodels to a given parent reasoning model using the compute of one run (N-fold savings) via a single post-training job. Beyond reducing training costs, Star Elastic also addresses a fundamental limitation of efficient reasoning: the rigidity of static architectures, which forces the allocation of constant resources regardless of token difficulty. By unlocking elastic budget control, Star Elastic enables a novel inference scheme that uses different submodels for each reasoning phase (thinking and answering). Star Elastic supports (1) nesting along the SSM, embedding channel, MoE, and FFN axes, (2) learning nested submodels via an end-to-end trainable router, and (3) curriculum-based knowledge distillation. Building on the Nemotron Elastic framework, we apply Star Elastic to the NVIDIA Nemotron Nano models, with a particular focus on hybrid Mixture-of-Experts (MoE) architectures: from Nemotron Nano v3 (30B/3.6A), we generate 23B (2.8A) and 12B (2.0A) variants with 160B training tokens. All nested models match or outperform independently trained baselines of comparable size and achieve a 360x reduction versus pretraining from scratch and a 7x reduction over state-of-the-art compression. Crucially, elastic budget control advances the accuracy-latency Pareto frontier, achieving up to 16% higher accuracy and 1.9x lower latency via dynamic per-phase model selection. We further extend Star Elastic to quantized regimes via Quantization-Aware Distillation (QAD), producing nested NVFP4 and FP8 elastic checkpoints that preserve zero-shot slicing while delivering smaller deployment footprints.
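Nested submodels share one set of parent weights. The abstract does not describe how nesting is realized; a common construction in elastic models, assumed here, is that a smaller submodel uses a prefix of the parent's neurons, so selecting a budget is just choosing a width. The sketch shows this for one ReLU FFN block along the FFN axis only, with a hypothetical per-phase width table standing in for the learned router.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def nested_ffn(x, w_in, w_out, width):
    """FFN block that uses only the first `width` hidden neurons of the parent weights.

    w_in: hidden x d_model (one row per neuron); w_out: d_model x hidden.
    Smaller submodels are prefixes of larger ones, so all sizes share one set of weights.
    """
    hidden = [relu(sum(w * xi for w, xi in zip(w_in[h], x))) for h in range(width)]
    return [sum(w_out[o][h] * hidden[h] for h in range(width)) for o in range(len(w_out))]

# Hypothetical per-phase budget: a narrow submodel while thinking, the full parent while answering.
PHASE_WIDTH = {"thinking": 2, "answering": 4}

w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
w_out = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, -1.0]]
x = [1.0, -1.0]
for phase, width in PHASE_WIDTH.items():
    print(phase, nested_ffn(x, w_in, w_out, width))
```

In Star Elastic the ordering of neurons and the per-phase choice are learned (the abstract mentions an end-to-end trainable router); here both are fixed by hand purely for illustration.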

2605.07181 2026-05-11 cs.CV

SatSurfGS: Generalizable 2D Gaussian Splatting for Sparse-View Satellite Surface Reconstruction

Min Chen, Wei Guo, Bin Wang, Wen Li, Tong Fang, Jinbo Zhang, Junqi Zhao, Hong Kuang, Han Hu, Xuming Ge, Qing Zhu, Bo Xu

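AI Summary Surface reconstruction from sparse-view satellite images is hampered by the spatial heterogeneity of multi-view matching reliability. Large illumination differences, weak textures and repetitive textures leave geometric constraints sparse and locally unreliable. To address this, the paper proposes SatSurfGS, a generalizable reconstruction framework built on 2D Gaussian Splatting. It explicitly models local geometric reliability at three levels: feature learning, Gaussian parameter estimation and training optimization. The framework introduces three components:
- a confidence-aware feature fusion module;
- a cross-stage self-consistency residual guidance module;
- a confidence bidirectional routing loss.

Together these improve reconstruction quality and generalization. Experiments show that the method outperforms mainstream existing methods in rendering quality, reconstruction accuracy and inference efficiency.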

English Abstract

Sparse-view satellite image surface reconstruction remains highly challenging, fundamentally because the reliability of multi-view matching under satellite imaging conditions is strongly spatially heterogeneous. Affected by large photometric differences, weak textures, and repetitive textures, multi-view geometric constraints are often sparse, unevenly distributed, and locally unreliable. Although 2D Gaussian Splatting (2DGS) is more suitable than 3D Gaussian Splatting (3DGS) for the explicit representation of continuous surfaces, research on generalizable feed-forward 2DGS frameworks for sparse-view satellite surface reconstruction is still lacking. To address this issue, we propose SatSurfGS, a generalizable sparse-view surface reconstruction method for satellite imagery based on 2DGS. The proposed method builds a coarse-to-fine Gaussian attribute prediction framework and explicitly models local geometric reliability at three levels: feature learning, Gaussian parameter estimation, and training optimization. Specifically, we propose a confidence-aware monocular multi-view feature fusion module to adaptively integrate monocular priors and multi-view matching features according to local confidence; a cross-stage self-consistency residual guidance module to stabilize stage-wise Gaussian parameter refinement using the residual between the rendered height map from the previous stage and the current-stage MVS height map, together with confidence information; and a confidence bidirectional routing loss to achieve differentiated allocation of geometric and appearance supervision. Experiments on satellite datasets show that the proposed method achieves improved rendering quality, surface reconstruction accuracy, cross-dataset generalization, and inference efficiency compared with representative generalizable baselines and competitive per-scene optimization methods.
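To make "adaptively integrate monocular priors and multi-view matching features according to local confidence" concrete, here is a minimal sketch. It assumes a per-pixel convex blend, which is our simplification; the actual module is learned and its form is not given in the abstract. `fuse_features` is a hypothetical name, and feature maps are flattened to lists of per-pixel feature vectors.

```python
from typing import List

def fuse_features(mono: List[List[float]], mvs: List[List[float]],
                  confidence: List[float]) -> List[List[float]]:
    """Per-pixel blend: trust multi-view (MVS) features where matching confidence is high,
    fall back to monocular features where it is low."""
    fused = []
    for f_mono, f_mvs, c in zip(mono, mvs, confidence):
        c = min(max(c, 0.0), 1.0)  # clamp confidence to [0, 1]
        fused.append([c * a + (1.0 - c) * b for a, b in zip(f_mvs, f_mono)])
    return fused
```

In weakly textured or repetitive regions, where matching confidence drops, the blend leans on the monocular prior. This mirrors the motivation stated in the abstract.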

2605.07180 2026-05-11 cs.CL

Learning Agent Routing From Early Experience

Yimin Wang, Jiahao Qiu, Xuan Qi, Xinzhe Juan, Jingzhe Shi, Zelin Zhao, Hongru Wang, Shilong Liu, Mengdi Wang

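AI Summary This study examines how to route queries effectively in realistic cold-start settings, choosing between lightweight large language model (LLM) inference and full agent execution. It proposes BoundaryRouter, a training-free routing framework. BoundaryRouter uses early behavioral experience and rubric-guided reasoning to decide how each query is handled, improving both efficiency and performance. Experiments show that the method reduces inference time compared with full agent execution. It also improves performance over direct LLM inference and over prompt-based routing.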

Comments 17 pages

English Abstract

LLM agents achieve strong performance on complex reasoning tasks but incur high latency and compute cost. In practice, many queries fall within the capability boundary of cutting-edge LLMs and do not require full agent execution, making effective routing between LLMs and agents a key challenge. We study the problem of routing queries between lightweight LLM inference and full agent execution under realistic cold-start settings. To address this, we propose BoundaryRouter, a training-free routing framework that uses early behavioral experience and rubric-guided reasoning to decide whether to answer a query with direct LLM inference or escalate to an agent. BoundaryRouter builds a compact experience memory by executing both systems on a shared seed set and retrieves similar cases at inference time to guide routing decisions. To evaluate this method, we introduce RouteBench, a benchmark covering in-domain, paraphrased, and out-of-domain route settings. Experiments show that BoundaryRouter reduces inference time by 60.6% compared to the agent while improving performance by 28.6% over direct LLM inference, outperforming prompt-based and retrieval-only routing by an average of 37.9% and 8.2%, respectively.
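The experience-memory lookup can be sketched as nearest-neighbor retrieval over seed queries on which both systems were run. The voting rule below is our own stand-in: the paper uses rubric-guided LLM reasoning over the retrieved cases, not a fixed threshold. `Experience`, `route`, `k` and `threshold` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Experience:
    embedding: List[float]  # embedding of a seed query
    llm_solved: bool        # direct LLM inference answered it correctly
    agent_solved: bool      # full agent execution answered it correctly

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(query: Sequence[float], memory: List[Experience],
          k: int = 3, threshold: float = 0.5) -> str:
    """Escalate to the agent when similar past queries needed it; otherwise answer directly."""
    if not memory:
        return "agent"  # no experience yet: fall back to the more capable path
    neighbors = sorted(memory, key=lambda e: cosine(query, e.embedding), reverse=True)[:k]
    # A neighbor votes for escalation only if the agent solved it and the LLM did not.
    votes = sum(1 for e in neighbors if e.agent_solved and not e.llm_solved)
    return "agent" if votes / len(neighbors) > threshold else "llm"
```

The memory is built once from a shared seed set. Routing therefore costs one embedding plus a similarity search, which is consistent with the training-free, cold-start framing.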

2605.07178 2026-05-11 cs.CV

Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection

Kai Zheng, Hang-Cheng Dong, Jiatong Pan, Zhenkai Wu, Fupeng Wei, Wei Zhang

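AI Summary This paper studies how to extract structured textual information from single-modal remote sensing images to improve change detection. The authors propose S2M, a framework that directly exploits the annotation masks already present in change detection datasets. From these masks it automatically generates structured quadruple descriptions (where, what, how it changed, how many), providing precise, noise-free multimodal supervision. Experiments show that the method outperforms existing methods on the new Gaza-Change-v2 dataset. This demonstrates the substantial potential of the structured information contained in the masks themselves for change detection.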

English Abstract

Remote sensing change detection is pivotal for urban monitoring, disaster assessment, and environmental resource management. Yet, unimodal deep learning methods frequently confuse genuine semantic changes with visually similar but irrelevant variations. Recent multimodal approaches incorporate text as auxiliary supervision, but their descriptions are either semantically coarse and unstructured or model-generated and thus noisy. Critically, all of them overlook a simple fact: fine-grained change semantics are already implicitly encoded in the ground-truth mask labels that come standard with every change detection dataset. These masks know where the change happened, what the land-cover types were before and after, how the transition occurred, and how many objects were involved. In this paper, we propose S2M, a framework that obtains structured textual features directly from change labels at zero additional annotation cost. Specifically, each change region is automatically transcribed into a semantic quadruple (where, what, how, how many) and converted into several fixed-template text descriptions, providing precise, dense, and noise-free multimodal supervision. We adopt a two-stage training strategy. The model is first fine-tuned on remote sensing imagery to obtain robust domain-specific representations. A multimodal decoder with a bi-directional contrastive loss is then introduced to achieve deep alignment between visual features and structured textual embeddings. To validate our method, we construct Gaza-Change-v2, a new multi-class change detection (MCD) dataset covering the Gaza Strip. On this MCD dataset, S2M achieves a SeK of 17.80% and an F_scd of 66.14%, notably surpassing even multimodal methods that leverage large language models. Our work demonstrates that masks can indeed talk: they tell us exactly what, where, how, and how many changes have occurred.
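The mask-to-text step can be sketched as follows, under stated assumptions. We assume per-pixel before and after class maps plus a binary change mask. Changed pixels are grouped into 4-connected regions that share the same transition, and a made-up fixed template is filled. The location rule (image quadrant of the region centroid) and the template wording are hypothetical; the paper's actual templates are not given in the abstract.

```python
from collections import deque
from typing import Dict, List, Tuple

Grid = List[List[str]]

def change_regions(pre: Grid, post: Grid, change: List[List[bool]]):
    """Yield ((before, after), pixels) for each 4-connected region sharing one transition."""
    h, w = len(change), len(change[0])
    seen = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if not change[i][j] or seen[i][j]:
                continue
            key = (pre[i][j], post[i][j])
            seen[i][j] = True
            queue, pixels = deque([(i, j)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and change[ny][nx]
                            and not seen[ny][nx] and (pre[ny][nx], post[ny][nx]) == key):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield key, pixels

def describe_changes(pre: Grid, post: Grid, change: List[List[bool]]) -> List[str]:
    """Fill a fixed template with (where, what, how, how many) per transition type."""
    h, w = len(change), len(change[0])
    places: Dict[Tuple[str, str], List[str]] = {}
    for (before, after), pixels in change_regions(pre, post, change):
        cy = sum(y for y, _ in pixels) / len(pixels)
        cx = sum(x for _, x in pixels) / len(pixels)
        where = ("upper" if cy < h / 2 else "lower") + "-" + ("left" if cx < w / 2 else "right")
        places.setdefault((before, after), []).append(where)
    return [
        f"{len(locs)} region(s) changed from {before} to {after}, located at {', '.join(locs)}."
        for (before, after), locs in sorted(places.items())
    ]
```

Each sentence is derived deterministically from labels that already exist. This is why the abstract can describe the resulting supervision as noise-free and available at zero extra annotation cost.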

2605.07175 2026-05-11 cs.LG cs.AI

Learning Multi-Relational Graph Representations for DNA Methylation-Based Biological Age Estimation

Qing Qing, Xikun Zhang, Zhongyuan Zhang, Jiarui Liu, Xingtong Yu, Xiaotao Shen, Ziqi Xu, Qixin Zhang, Zhe Wang, Renqiang Luo

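AI Summary This study aims to estimate biological age more accurately from DNA methylation data. It proposes RelAge-GNN, a multi-relational graph neural network framework. The method builds three complementary graphs over CpG sites, capturing co-methylation patterns, genomic co-localization and gene-level associations. Each graph is modeled by an independent graph neural network branch, and the resulting representations are fused through a learnable gating mechanism. Experiments on large-scale datasets show that RelAge-GNN achieves competitive accuracy and stronger correlation with chronological age. It is also more sensitive in detecting disease-related age acceleration and offers biologically meaningful interpretability.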

English Abstract

Aging clocks aim to estimate biological age, a measure of physiological state distinct from chronological age, from observable biomarkers, and are widely used for health assessment and disease analysis. DNA methylation is a particularly informative biomarker due to its stability and strong association with aging, and recent learning-based approaches have improved predictive performance. However, most existing methods treat CpG sites as independent features, overlooking the complex and heterogeneous biological relationships among them. We propose RelAge-GNN, a multi-relational graph neural network framework for DNA methylation-based age prediction. Our method constructs three complementary graphs capturing co-methylation patterns, genomic co-localization, and gene-level associations among CpG sites. Each graph is modeled by an independent GNN branch, and a learnable gating mechanism adaptively fuses the resulting representations. Experiments on large-scale datasets show that RelAge-GNN achieves competitive accuracy and stronger correlation with chronological age compared to state-of-the-art methods. Moreover, the model exhibits improved sensitivity in detecting age acceleration across diverse disease cohorts, highlighting its potential utility for disease characterization. Finally, through post hoc interpretability analyses, we quantify the contributions of different relational structures and CpG sites, providing biologically meaningful insights and suggesting potential directions for aging-related research. Our code is available at: https://anonymous.4open.science/r/RelAge-GNN-F1E3/.
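A learnable gate over three branch outputs is often implemented as a softmax over per-branch scores followed by a weighted sum. The sketch below assumes that form. The abstract does not state the exact gating function, and whether the gate logits depend on the input is also unspecified. `gated_fusion` is a hypothetical name, and `gate_logits` stands in for the learned parameters.

```python
import math
from typing import List, Sequence, Tuple

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gated_fusion(branch_outputs: List[List[float]],
                 gate_logits: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Fuse per-branch representations h_r as sum_r softmax(gate_logits)_r * h_r."""
    weights = softmax(gate_logits)
    dim = len(branch_outputs[0])
    fused = [sum(w * h[d] for w, h in zip(weights, branch_outputs)) for d in range(dim)]
    return fused, weights
```

The returned gate weights are one simple way to read off how much each relation type (co-methylation, co-localization, gene-level) contributes. This is in the spirit of the post hoc analysis the abstract mentions, though the paper's own attribution method may differ.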

2605.07174 2026-05-11 cs.AI

Repeated Deceptive Path Planning against Learnable Observer

Shiyue Cao, Pei Xu, Likun Yang, Lei Cui, Shizhao Yu, Shiyu Zhang, Yongjian Ren, Xiaotang Chen, Kaiqi Huang

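AI Summary This paper studies deceptive path planning against a learnable observer, that is, how an agent can conceal its true destination. Conventional methods assume a static observer, but in practice observers can adapt by learning from historical trajectories. The authors therefore propose the Repeated Deceptive Path Planning (RDPP) framework and design Deceptive Meta Planning (DeMP). DeMP uses two-level optimization, combining short-term policy adjustment with long-term model updates. This mitigates adaptation lag and substantially improves deception against learning observers.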

Comments Full version of the extended abstract accepted at AAMAS 2026

English Abstract

We study the problem of deceptive path planning (DPP), where an agent aims to conceal its true destination from external observers. While existing work assumes static, non-learning observers, real-world adversaries, such as those in critical goods transportation or military operations, can adapt by learning from historical trajectories. To address this gap, we introduce Repeated Deceptive Path Planning (RDPP), a new formulation that explicitly models learnable observers. We show that existing DPP methods fail under this setting, as they cannot adapt to evolving adversarial predictions. While incorporating the observer's previous predictions into updates enables some adaptation, such incremental updates cause accumulating lag that degrades deception. To this end, we propose Deceptive Meta Planning (DeMP), a two-level optimization framework with two components:
- episode-level adaptation, which enables short-term policy adjustment to counter the updated observer;
- meta-level updates, which leverage cross-episode feedback to capture how observers update their models and accelerate adaptation in future episodes.

In this way, DeMP mitigates the accumulation of adaptation lag, enabling sustained deception against a learning observer. Experiments across environments demonstrate that DeMP significantly outperforms existing approaches in RDPP while maintaining competitive path cost. Our results highlight the importance of modeling repeated interactions with learnable adversaries, providing new insights into deception and privacy in multi-agent systems.

2605.07172 2026-05-11 cs.CL

Topology-Enhanced Alignment for Large Language Models: Trajectory Topology Loss and Topological Preference Optimization

Yurui Pan, Ke Xu, Bo Peng

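AI Summary This paper proposes a topology-based alignment method for large language models built on two components, Trajectory Topology Loss (TTL) and Topological Preference Optimization (TPO). Both use 0-dimensional persistent homology to analyze the global geometry of semantic trajectories, with the aim of improving the semantic coherence and alignment of generated content. The method adds topological regularization to supervised fine-tuning (SFT) and to direct preference optimization (DPO) respectively, steering generation trajectories toward semantic bridge structures. It outperforms conventional non-topological methods on multiple benchmarks while maintaining or improving the safety of generated content.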

Comments Accepted to ACL 2026. 15 pages

English Abstract

Alignment of large language models (LLMs) via SFT and RLHF/DPO typically ignores the global geometry of the representation space, relying instead on local token likelihoods or scalar scores. We view generation as tracing a semantic trajectory in hidden space and propose a topology-enhanced alignment framework that regularizes these trajectories using 0-dimensional persistent homology. First, for SFT, we introduce Trajectory Topology Loss (TTL). Treating prompt and gold-answer embeddings as a mixed point cloud, we use a 0D persistent homology algorithm to extract "prompt-answer bridges." TTL aligns the model's actual update direction with these topological bridges rather than arbitrary directions. Second, for DPO, we propose Topological Preference Optimization (TPO). TPO constructs topic-specific semantic preference vectors and aligns the improvement direction between rejected and chosen responses with these vectors in an intermediate hidden layer. We also introduce a dynamic weighting scheme to balance DPO and TPO losses. We evaluate on Qwen2.5-7B-Instruct with UltraChat and Anthropic HH-RLHF. Our topology-enhanced objectives consistently outperform strong non-topological baselines (e.g., per-example, nearest-neighbor, random regularizers) on automatic preference metrics and LLM-judge evaluations, while maintaining or reducing toxicity. The results show that persistent homology and trajectory geometry offer a promising direction for controllable alignment.
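0-dimensional persistent homology of a point cloud (Vietoris-Rips filtration) can be computed with Kruskal's minimum-spanning-tree algorithm. Every point is born at scale 0, and each union-find merge marks a component dying at that edge length. The sketch below does exactly that. The definition of a "prompt-answer bridge" used here, a merge edge with one prompt endpoint and one answer endpoint, is our own assumption and is not taken from the paper. `zero_dim_persistence` and `prompt_answer_bridges` are hypothetical names.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

def zero_dim_persistence(points: Sequence[Sequence[float]]) -> List[Tuple[float, int, int]]:
    """Return (death_scale, i, j) for each merge of connected components (Kruskal order)."""
    parent = list(range(len(points)))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges.append((d, i, j))  # one 0D feature dies at scale d
    return merges

def prompt_answer_bridges(prompt_pts, answer_pts) -> List[Tuple[float, int, int]]:
    """Assumed bridge definition: merge edges joining a prompt point to an answer point."""
    points = list(prompt_pts) + list(answer_pts)
    n_prompt = len(prompt_pts)
    return [(d, i, j) for d, i, j in zero_dim_persistence(points)
            if (i < n_prompt) != (j < n_prompt)]
```

In this picture, a long-lived bridge, one with a large death scale, connects well-separated prompt and answer clusters. The direction along that edge is one candidate for the kind of topological target the TTL description alludes to.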

2605.07171 2026-05-11 cs.LG cs.SY eess.SY stat.ML

Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

Ishank Juneja, Carlee Joe-Wong, Osman Yağan

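AI Summary This paper studies multi-armed bandits with a cost-subsidy constraint, where the goal is to minimize total cost while guaranteeing a minimum reward. It targets the setting where the reward constraint is specified relative to the unknown best reward. For this setting, the authors propose Cost-Ordered Feasibility (COF), an algorithm that combines sampling information across arms to assess the feasibility of low-cost arms. They prove upper bounds on its cumulative cost regret and quality regret. Experiments show that COF outperforms existing methods in both theoretical guarantees and empirical performance.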

English Abstract

The classic multi-armed bandit (MAB) problem tackles the challenge of accruing maximum reward while making decisions under uncertainty. However, in many applications the goal is to minimize cost subject to a constraint on the minimum permissible reward, an objective captured by multi-armed bandits with cost subsidy (MAB-CS). This paper focuses on the setting where the quality (reward) constraint is specified relative to the unknown best reward and the cost of each arm is known. We characterize the expected number of sub-optimal samples required by any policy by proving instance-dependent lower bounds that offer new insight into the problem and strictly generalize prior bounds. Then, we propose an algorithm called Cost-Ordered Feasibility (COF) that leverages this insight and intelligently combines samples from all arms to gauge the feasibility of a cheap arm. Thereafter, we analyze COF to establish instance-dependent upper bounds on its expected cumulative cost and quality regret, i.e., regret relative to the cheapest feasible arm. Finally, we empirically validate the merits of COF, comparing it to baselines from the literature through extensive simulation experiments on the MovieLens and Goodreads datasets as well as representative synthetic instances. Not only does our paper develop qualitatively better theoretical regret upper bounds, but COF also convincingly demonstrates improved empirical performance.
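The benchmark that the two regrets are measured against can be written down directly when the means are known. This is not COF itself, which must estimate feasibility from samples. The sketch assumes the common MAB-CS conventions as we understand them:
- an arm is feasible if its mean reward is at least (1 - alpha) times the best mean;
- cost regret charges the extra cost paid over the cheapest feasible arm;
- quality regret charges any shortfall below the (1 - alpha) threshold.

All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def cheapest_feasible_arm(means: Sequence[float], costs: Sequence[float], alpha: float) -> int:
    """Oracle arm: cheapest arm whose mean reward is at least (1 - alpha) * best mean."""
    threshold = (1.0 - alpha) * max(means)
    feasible = [i for i, mu in enumerate(means) if mu >= threshold]
    return min(feasible, key=lambda i: (costs[i], i))  # ties broken by index

def regrets(pulls: List[int], means: Sequence[float], costs: Sequence[float],
            alpha: float) -> Tuple[float, float]:
    """Cumulative (cost regret, quality regret) of a sequence of pulled arms."""
    star = cheapest_feasible_arm(means, costs, alpha)
    threshold = (1.0 - alpha) * max(means)
    cost_regret = sum(max(costs[a] - costs[star], 0.0) for a in pulls)
    quality_regret = sum(max(threshold - means[a], 0.0) for a in pulls)
    return cost_regret, quality_regret
```

For example, with means [0.9, 0.8, 0.5], costs [1.0, 0.5, 0.1] and alpha = 0.2, the threshold is 0.72 and arm 1 is the oracle. Pulling the expensive best arm incurs cost regret. Pulling the cheap infeasible arm incurs quality regret. A good policy must avoid both.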

2605.07170 2026-05-11 cs.CL

A Reproducible Multi-Architecture Baseline for Token-Level Chinese Metaphor Identification under the MIPVU Framework

Yufeng Wu

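AI Summary This study addresses token-level Chinese metaphor identification and presents a reproducible multi-architecture baseline, with experiments on the PSU Chinese Metaphor Corpus. It systematically compares three model architectures:
- encoder fine-tuning based on Chinese RoBERTa;
- a MelBERT model built with a resource derived from the Modern Chinese Dictionary;
- the generative model Qwen3.5-9B fine-tuned with QLoRA.

MelBERT (MIP-only) achieves the best test F1 of 0.7281. This is marginally above full MelBERT and clearly above plain RoBERTa and the Qwen model. The analysis also reveals the recall limitations of generative models and problems with some task formulations. The study releases complete training scripts and data resources as a reference for future Chinese metaphor identification research.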

English Abstract

Metaphor is pervasive in everyday language, yet token-level computational identification of metaphor-related words in Chinese under the MIPVU framework remains under-explored relative to English. This paper presents a reproducible multi-architecture baseline for token-level metaphor identification on the PSU Chinese Metaphor Corpus (PSU CMC), the only widely available MIPVU-annotated Chinese corpus. We systematically compare three model families: (i) encoder fine-tuning with Chinese RoBERTa-wwm-ext-large; (ii) MelBERT adapted to Chinese using a newly constructed basic-meaning resource derived from the Modern Chinese Dictionary, 7th edition (MCD7), comprising 74,823 entries with 71.51% PSU CMC vocabulary coverage; and (iii) Qwen3.5-9B fine-tuned with QLoRA as an instruction-tuned generative baseline. Across five fixed seeds, MelBERT MIP-only achieves the strongest performance at 0.7281 +/- 0.0050 test positive F1, marginally above MelBERT Full (0.7270 +/- 0.0069) and clearly above plain RoBERTa (0.7142 +/- 0.0121). The Qwen QLoRA generative configuration trails encoder baselines by approximately 11 F1 points (0.6157 +/- 0.0113). Three findings merit attention: (1) the SPV channel of MelBERT does not contribute reliable positive signal in Chinese, consistent with the dominance of conventional metaphor; (2) the Qwen-encoder gap is concentrated in recall, reflecting the discrete-commitment limitation of generative output; (3) several Qwen task formulations fail due to format design rather than model capacity. We release all split manifests, per-seed outputs, the MCD7 basic-meaning embedding pipeline, and training scripts to serve as a common reference for future Chinese metaphor identification research.
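The reported figures are positive-class F1 scores summarized as mean +/- spread across five fixed seeds. The sketch below shows that arithmetic. It assumes the spread is the sample standard deviation (`statistics.stdev`), which the abstract does not specify. `positive_f1` and `summarize` are hypothetical names, and labels are 1 for metaphor-related tokens and 0 otherwise.

```python
from statistics import mean, stdev
from typing import Sequence

def positive_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 of the positive (metaphor-related) class only."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def summarize(seed_scores: Sequence[float]) -> str:
    """Format per-seed scores as 'mean +/- sample std' to four decimals."""
    return f"{mean(seed_scores):.4f} +/- {stdev(seed_scores):.4f}"
```

Scoring only the positive class matters here because most tokens are literal, so overall accuracy would look high even for a model that rarely flags metaphors. The recall-driven gap reported for the Qwen configuration would be hidden by such a metric.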

2605.07166 2026-05-11 cs.LG

Neurosymbolic Imitation Learning with Human Guidance: A Privileged Information Approach

Nikhilesh Prabhakar, Varun Balaji, Athresh Karanam, Kristian Kersting, Sriraam Natarajan

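AI Summary This paper proposes an imitation learning framework that combines neural and symbolic methods. Purely neural methods need many samples and overfit easily, while purely symbolic methods struggle with high-dimensional data; the framework aims to address both problems. It exploits additional privileged information available only during training, such as gaze data, improving the model's generalization and learning efficiency. Experimental results confirm the method's effectiveness in complex environments.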

Comments Under Review for ECML-PKDD 2026

English Abstract

Imitation learning is widely used for learning to act in complex environments. While purely neural methods handle high-dimensional data effectively, they require a large number of samples and are prone to overfitting. Purely symbolic approaches, while they generalize well, do not handle high-dimensional data effectively. We propose a neurosymbolic approach that achieves the best of both worlds, i.e., handling high-dimensional data while achieving generalization. The key advantage of our approach is that it can effectively exploit additional privileged information that is available only during training (in our case, gaze data). Our empirical evaluations demonstrate the effectiveness, efficiency, and generalization capability of our proposed approach.

2605.07164 2026-05-11 cs.CL

Rethinking Experience Utilization in Self-Evolving Language Model Agents

Weixiang Zhao, Yingshuo Wang, Yichen Zhang, Yanyan Zhao, Yu Zhang, Yang Wu, Dandan Tu, Bing Qin, Ting Liu

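AI Summary This paper studies experience utilization as a key design dimension of self-evolving language model agents. Existing work focuses mostly on constructing and updating experience while neglecting how to use it effectively at runtime. The authors propose ExpWeaver, a lightweight method that exposes experience as an optional resource invoked dynamically during reasoning. Experiments show that ExpWeaver outperforms conventional experience-usage strategies across multiple frameworks and environments, and that its effect can be amplified further through training. The analysis shows that ExpWeaver invokes experience selectively according to decision needs and reasoning uncertainty. The authors argue this calls for shifting research from "what experience to store" toward "when and how to use experience".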

Comments 30 pages, 20 figures, 7 tables

English Abstract

Self-evolving agents improve by accumulating and reusing experience from past interactions. Existing work has largely focused on how experience is constructed, represented, and updated, while paying less attention to how experience should be used during runtime decision-making. As a result, most agents rely on rigid usage strategies, either injecting experience once at initialization or at every step, without considering whether it is needed for the current decision. This paper studies experience utilization as a critical design dimension of self-evolving agents. We ask whether agents benefit from interweaving experience use with decision-making, so that experience is invoked only when additional guidance is needed. To examine this question, we introduce ExpWeaver, a lightweight instantiation that leaves experience construction unchanged and modifies only runtime utilization by exposing experience as an optional resource during reasoning. Across four representative frameworks, seven LLM backbones, and three types of environments, ExpWeaver consistently achieves the best performance among different utilization strategies. Reinforcement learning experiments further show that this behavior can be amplified through training. Usage-pattern, causal ablation, and entropy-based analyses reveal that ExpWeaver enables agents to invoke experience selectively, at beneficial decision points, and under higher reasoning uncertainty. Overall, our findings call for a shift from merely studying what experience to store toward understanding how and when experience should enter decision-making.

2605.07162 2026-05-11 cs.CL

CLIPer: Tailoring Diverse User Preference via Classifier-Guided Inference-Time Personalization

Jinyan Su, Jinpeng Zhou, Claire Cardie, Wen Sun

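AI Summary This paper proposes CLIPer, a lightweight personalization method that uses a classifier model at inference time to steer a large language model. The goal is responses that match diverse user preferences such as helpfulness, conciseness and humor. The method avoids extensive fine-tuning and adds only minimal computational overhead. It enables controllable, fine-grained personalization for both single and multiple preference dimensions. Experiments show that CLIPer is scalable and effective for personalized language generation.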

English Abstract

Personalized LLMs can significantly enhance user experiences by tailoring responses to preferences such as helpfulness, conciseness, and humor. However, fine-tuning models to address all possible combinations of user preferences is computationally expensive and impractical. In this paper, we introduce CLIPer (Classifier-guided Inference-time Personalization), a lightweight personalization approach that leverages a classifier model to dynamically steer LLM generation toward different user preferences at inference time. Our method eliminates the need for extensive fine-tuning, incurring negligible additional computational overhead while enabling more controllable and nuanced personalization across single and multi-dimensional preferences. Comprehensive empirical analyses demonstrate the scalability and effectiveness of our approach in delivering personalized language generation.
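One common way to let a classifier steer decoding is to add its weighted log-probability to the base model's log-probability for each candidate continuation. The sketch below uses that form. It is a generic classifier-guided decoding rule, not necessarily CLIPer's exact scoring. Each preference dimension is assumed to be a (classifier, weight) pair, so multi-dimensional preferences are simply several pairs. `guided_next_token` and the toy `concise` classifier are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Preference = Tuple[Callable[[str], float], float]  # (P(preference satisfied | text), weight)

def guided_next_token(candidates: Dict[str, float], prefix: str,
                      preferences: List[Preference]) -> str:
    """Greedy step: base log-prob plus weighted log-probs from preference classifiers."""
    best_token, best_score = None, -math.inf
    for token, base_logprob in candidates.items():
        text = prefix + token
        score = base_logprob + sum(weight * math.log(max(clf(text), 1e-12))
                                   for clf, weight in preferences)
        if score > best_score:
            best_token, best_score = token, score
    return best_token

def concise(text: str) -> float:
    # Toy conciseness classifier: short texts are very likely "concise".
    return 1.0 if len(text) <= 10 else 0.1
```

Only the scoring at inference time changes and the base model is untouched, which matches the abstract's point that no preference-specific fine-tuning is needed. The weights give a knob per preference dimension.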

2605.07157 2026-05-11 cs.LG

Learned Lagrangian Models of PDEs via Euler-Lagrange Residual Minimization

Lyra Zhornyak, Eric Forgoston, M. Ani Hsieh

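AI Summary This paper proposes a method that uses a learned continuous Lagrangian to directly forecast the dynamics of systems governed by partial differential equations. It achieves stable long-range predictions by minimizing the Euler-Lagrange residual. The method is built on an optimization-based integrator with a mesh-free near-symplectic construction on local space-time patches. This design separates model error from integration error and avoids the global coupling introduced by fixed discretizations. Experiments on a double pendulum and on one- and two-dimensional wave equations show accuracy comparable to classical symplectic methods. The method also handles spatially varying dynamics and arbitrary boundary conditions.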

Comments 9 pages, 8 figures, 2 tables, 7 pages of appendices

English Abstract

We present the first method to directly use a learned continuous Lagrangian to forecast the dynamics of systems governed by partial differential equations, exploiting the inherent conservative structure to achieve stable long-range predictions. We develop an optimization-based integrator that minimizes the squared Euler-Lagrange residual via a mesh-free near-symplectic construction on local space-time patches. Unlike integrators for analytical models, integrators for learned models should decouple model error (phase error) from integration error (conservation error). By relying on optimization rather than time-stepping, we bypass the global coupling inherent to fixed discretizations, which slows time- and space-stepping and complicates learning. Our method scales linearly with domain size via Jacobi iteration and places no structural requirements on the learned network, allowing it to be coupled with existing physics-guided machine learning (ML) methods. We validate our approach on a learned representation of a double pendulum, a one-dimensional wave equation, and a two-dimensional wave equation. Our method achieves error comparable to classical symplectic methods while generalizing to spatially varying dynamics and arbitrary boundary conditions without retraining.
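The core idea of stepping by minimizing an Euler-Lagrange residual can be shown on a single degree of freedom with a classic discrete variational integrator. This is a simplification: the paper works with PDEs on local space-time patches, which this sketch does not attempt. Assumptions:
- a harmonic-oscillator Lagrangian stands in for the learned network;
- the action over one step uses a midpoint rule;
- all derivatives come from finite differences, so the step treats the Lagrangian as a black box.

For a scalar residual, Gauss-Newton on r(q_next)^2 reduces to Newton's method on r. All function names are hypothetical.

```python
import math

def lagrangian(q: float, qdot: float) -> float:
    # Stand-in for a learned Lagrangian: unit-mass, unit-stiffness harmonic oscillator.
    return 0.5 * qdot ** 2 - 0.5 * q ** 2

def discrete_lagrangian(q0: float, q1: float, h: float) -> float:
    # Midpoint approximation of the action over one time step.
    return h * lagrangian(0.5 * (q0 + q1), (q1 - q0) / h)

def d_arg(f, args, k, eps=1e-6):
    # Central finite difference of f with respect to argument k.
    up, dn = list(args), list(args)
    up[k] += eps
    dn[k] -= eps
    return (f(*up) - f(*dn)) / (2 * eps)

def el_residual(q_prev, q_curr, q_next, h):
    # Discrete Euler-Lagrange equation: D2 Ld(q_prev, q_curr) + D1 Ld(q_curr, q_next) = 0.
    return (d_arg(discrete_lagrangian, (q_prev, q_curr, h), 1)
            + d_arg(discrete_lagrangian, (q_curr, q_next, h), 0))

def step(q_prev, q_curr, h, iters=10, eps=1e-6):
    # Minimize r(q_next)^2; for a scalar residual, Gauss-Newton is Newton on r.
    q_next = 2 * q_curr - q_prev  # extrapolated initial guess
    for _ in range(iters):
        r = el_residual(q_prev, q_curr, q_next, h)
        dr = (el_residual(q_prev, q_curr, q_next + eps, h)
              - el_residual(q_prev, q_curr, q_next - eps, h)) / (2 * eps)
        if dr == 0.0:
            break
        q_next -= r / dr
    return q_next

def rollout(q0, q1, h, n_steps):
    qs = [q0, q1]
    for _ in range(n_steps):
        qs.append(step(qs[-2], qs[-1], h))
    return qs

h = 0.1
qs = rollout(1.0, math.cos(h), h, n_steps=62)  # starts on the exact solution q(t) = cos(t)
max_err = max(abs(q - math.cos(i * h)) for i, q in enumerate(qs))
```

Over roughly one period, the trajectory stays close to cos(t) with only a small, slowly growing phase error and no amplitude blow-up. This is the conservative behavior that motivates residual-minimizing integrators for learned Lagrangians.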

2605.07156 2026-05-11 cs.CV

Hierarchical Perfusion Graphs for Tumor Heterogeneity Modeling in Glioma Molecular Subtyping

Han Jang, Junhyeok Lee, Heeseong Eum, Joon Jang, Yoseob Han, Seung Hong Choi, Kyu Sung Choi

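AI Summary This study proposes HiPerfGNN, a non-invasive method for precise classification of glioma molecular subtypes based on dynamic susceptibility contrast (DSC) MRI. The method uses a vector-quantized variational autoencoder to learn discrete hemodynamic representations from raw time-intensity curves. These representations are combined with structural MRI to build a hierarchical graph neural network that captures tumor heterogeneity. The model shows strong classification performance on both internal and external cohorts, underscoring the value of hemodynamic information in radiogenomics.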

Comments Accepted at MICCAI 2026. 11 pages, 2 figures, 2 tables

English Abstract

Precise molecular subtyping of gliomas, including isocitrate dehydrogenase (IDH) mutation and 1p/19q codeletion, directly guides surgical and therapeutic decisions, yet currently relies on invasive tissue sampling. Deep learning on structural MRI has emerged as a non-invasive alternative, but anatomy-only approaches cannot capture the hemodynamic signatures that distinguish molecular subtypes. Radiogenomics based on dynamic susceptibility contrast (DSC) MRI holds immense potential for non-invasively characterizing glioma molecular subtypes, yet clinical deployment has been hindered by inter-site variability and the limitations of voxel-wise analysis. We introduce HiPerfGNN, a framework that first learns discrete hemodynamic representations from raw time-intensity curves using a vector-quantized variational autoencoder (VQ-VAE). These quantized perfusion codes define coarse-level graph nodes representing functional tumor habitats, each of which is hierarchically subdivided into fine-level subregions guided by structural MRI. A hierarchical graph neural network then propagates information across scales for molecular prediction. On an internal cohort (n=475), the model achieved AUCs of 0.96 (IDH), 0.89 (1p/19q), and 0.84 (WHO grade), and maintained robust IDH performance (AUC 0.89) on an independent external cohort (n=397) without recalibration. Gradient-based saliency analysis confirms biologically grounded attention patterns aligned with known glioma pathophysiology. Our results demonstrate the added value of integrating perfusion dynamics into radiogenomic pipelines for glioma molecular subtyping. Code is available at https://github.com/janghana/HiPerfGNN.
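The quantization step that turns raw time-intensity curves into discrete perfusion codes can be sketched as nearest-codebook assignment, the lookup at the heart of a VQ-VAE. Assumptions in this sketch:
- the encoder is omitted, so curves are matched to the codebook directly;
- the codebook is given rather than learned;
- the finer, structure-guided subdivision of each habitat is not shown.

`nearest_code` and `perfusion_habitats` are hypothetical names.

```python
from typing import Dict, List, Sequence

def nearest_code(curve: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the codebook vector closest to the curve in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(curve, codebook[k])))

def perfusion_habitats(curves: Dict[str, Sequence[float]],
                       codebook: List[Sequence[float]]) -> Dict[int, List[str]]:
    """Group voxels by quantized perfusion code; each group is one coarse graph node."""
    habitats: Dict[int, List[str]] = {}
    for voxel_id, curve in curves.items():
        habitats.setdefault(nearest_code(curve, codebook), []).append(voxel_id)
    return habitats
```

Grouping voxels by code rather than analyzing them one by one is what lets the coarse graph nodes stand for functional tumor habitats. The abstract presents this as a way around the limitations of voxel-wise analysis.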