arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
All subject categories: 1968
2602.04585 2026-05-15 cs.CV

ImmuVis: Hyperconvolutional Foundation Model for Imaging Mass Cytometry

Dawid Uchal, Marcin Możejko, Krzysztof Gogolewski, Piotr Kupidura, Szymon Łukasik, Jakub Giezgała, Tomasz Nocoń, Kacper Pietrzyk, Robert Pieniuta, Mateusz Sulimowicz, Michal Orzyłowski, Tomasz Siłkowski, Karol Zagródka, Eike Staub, Ewa Szczurek

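AI summary: This paper proposes ImmuVis, an efficient foundation model designed for imaging mass cytometry (IMC) data. By introducing marker-adaptive hyperconvolutions, the model addresses the lack of a fixed channel set in IMC data and can flexibly handle the marker combinations used in different studies. ImmuVis is pretrained on the large-scale IMC17M dataset, has lower compute cost than Transformer-based methods, performs well on virtual staining and classification tasks, and provides calibrated uncertainty estimates, offering a practical framework for real-world IMC modeling.

Comments: 38 pages, 19 figures

Abstract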

We present ImmuVis, a family of efficient foundation models for imaging mass cytometry (IMC), a high-throughput multiplex imaging technology that handles molecular marker measurements as image channels and enables large-scale spatial tissue profiling. Unlike natural images, multiplex imaging lacks a fixed channel space, as real-world marker sets vary across studies, violating a core assumption of standard vision backbones. To address this, ImmuVis introduces marker-adaptive hyperconvolutions that generate convolutional kernels from learned marker embeddings, enabling a single model to operate on arbitrary measured marker subsets without retraining. We pretrain ImmuVis on the largest dataset to date, IMC17M (28 cohorts, 24,405 images, 265 markers, over 17M patches), using self-supervised masked reconstruction. ImmuVis outperforms state-of-the-art baselines and ablations in virtual staining and downstream classification tasks at substantially lower compute cost than transformer-based alternatives, and is the sole model that provides calibrated uncertainty via a heteroscedastic likelihood objective. These results position ImmuVis as a practical framework for real-world IMC modeling.

2602.02427 2026-05-15 cs.LG

Embedding Perturbation may Better Reflect Intermediate-Step Uncertainty in LLM Reasoning

Qihao Wen, Jiahao Wang, Yang Nan, Pengfei He, Ravi Tandon, Han Xu

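AI summary: This paper studies how to more accurately quantify the uncertainty of intermediate steps in large language model (LLM) reasoning. The authors propose analyzing how embedding perturbations affect the generated output in order to identify steps in the reasoning process where the model may be uncertain or wrong. Experiments show that the perturbation-based uncertainty metrics outperform conventional probability-based, sampling-based, and Bayesian methods at uncertainty estimation while being simpler and more efficient.

Abstract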

Large Language Models (LLMs) have achieved significant breakthroughs across diverse domains; however, they can still produce unreliable or misleading outputs. For responsible LLM application, Uncertainty Quantification (UQ) techniques are used to estimate a model's uncertainty about its outputs, indicating the likelihood that those outputs may be problematic. For LLM reasoning tasks, it is essential to estimate the uncertainty not only for the final answer, but also for the intermediate steps of the reasoning, as this can enable more fine-grained and targeted interventions. In this study, we explore which UQ metrics better reflect the LLM's "intermediate uncertainty" during reasoning. Our study reveals that an LLM's incorrect reasoning steps tend to contain tokens that are highly sensitive to perturbations of the preceding token embeddings, indicating the model's uncertainty among multiple competing continuations. In this way, uncertain (possibly incorrect) intermediate steps can be readily identified using this sensitivity score as guidance in practice. In our experiments, we show that such perturbation-based metrics achieve stronger uncertainty quantification performance than baselines including probability-based, sampling-based, and Bayesian methods. Such metrics are also simple and efficient.

2601.22197 2026-05-15 cs.LG cs.AI eess.SP

Neural Signals Generate Clinical Notes in the Wild

Jathurshan Pradeepkumar, Zheng Chen, Jimeng Sun

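AI summary: Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term electroencephalography (EEG) recordings remains time-consuming. This paper presents CELM, the first clinical EEG-to-language foundation model capable of multi-scale, end-to-end clinical report generation from long-duration, variable-length EEG recordings. The model combines pretrained EEG models with language models and is trained on a large-scale dataset of about 11,000 hours of EEG recordings from 9,048 patients paired with 9,922 clinical reports; the authors also release an automated report-structuring pipeline as a benchmark. Experiments show that CELM outperforms existing methods across multiple evaluation settings, and clinical experts judge its reports to be better in clinical coherence, diagnostic reliability, and agreement with expert interpretation.

Abstract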

Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term EEG recordings remains labor-intensive. We present CELM, the first clinical EEG-to-Language foundation model capable of summarizing long-duration, variable-length EEG recordings and performing end-to-end clinical report generation at multiple scales. CELM integrates pretrained EEG foundation models with language models to enable scalable multimodal learning. We curate a large-scale clinical EEG dataset containing 9,922 reports paired with approximately 11,000 hours of EEG recordings from 9,048 patients to train CELM, and release the benchmark with an automated report-structuring pipeline to facilitate future research. Experimental results show that CELM consistently outperforms existing methods across all evaluation settings. Importantly, we further conduct human evaluation with clinical experts, demonstrating that CELM generates reports that are more clinically coherent, diagnostically reliable, and better aligned with expert interpretation. We release our model and benchmark construction pipeline at https://github.com/Jathurshan0330/CELM.

2601.21929 2026-05-15 cs.LG

LoRIF: Low-Rank Influence Functions for Scalable Training Data Attribution

Shuangqi Li, Hieu Le, Jingyi Xu, Mathieu Salzmann

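AI summary: Training data attribution (TDA) aims to identify which training examples most influenced a model's prediction. LoRIF is a gradient-based attribution method that exploits the low-rank structure of gradients to address the storage and compute bottlenecks of attribution over large training sets. Using low-rank factorization and truncated singular value decomposition (SVD), it reduces storage and memory requirements while maintaining high attribution quality, and it shows substantial efficiency gains on large models and datasets.

Abstract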

Training data attribution (TDA) identifies which training examples most influenced a model's prediction. Influence function methods are a theoretically grounded family of TDA methods that exploit gradients. To overcome the scalability challenge arising from gradient computation, the most popular strategy is random projection (e.g., TRAK, LoGRA). However, this still faces two bottlenecks when scaling to large training sets and high-quality attribution: (i) storing and loading projected per-example gradients for all $N$ training examples, where query latency is dominated by I/O; and (ii) forming the $D \times D$ inverse Hessian approximation, which costs $O(D^2)$ memory. Both bottlenecks scale with the projection dimension $D$, yet increasing $D$ is necessary for attribution quality, creating a quality-scalability tradeoff. We introduce LoRIF (Low-Rank Influence Functions), which exploits the low-rank structure of gradients to address both bottlenecks. First, we store rank-$c$ factors of projected per-example gradients rather than full matrices, reducing storage and query-time I/O from $O(D)$ to $O(c\sqrt{D})$ per layer per sample. Second, we use truncated SVD with the Woodbury identity to approximate the inverse Hessian term in an $r$-dimensional subspace, reducing memory from $O(D^2)$ to $O(Dr)$. On models from 0.1B to 70B parameters trained on datasets with millions of examples, LoRIF achieves up to 20$\times$ storage reduction and query-time speedup compared to LoGRA, while matching or exceeding its attribution quality. LoRIF makes gradient-based TDA practical at frontier scale.
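The two complexity claims in this abstract can be checked with simple counting. The sketch below is only an illustration under stated assumptions: it assumes each layer's $D$-dimensional projected gradient is arranged as a square $\sqrt{D} \times \sqrt{D}$ matrix and stored as two rank-$c$ factors, and the function names and example sizes are hypothetical, not taken from the paper.

```python
import math

def full_storage(D):
    """Numbers stored per layer per sample for a full projected gradient: O(D)."""
    return D

def factored_storage(D, c):
    """Numbers stored for two rank-c factors of a side x side matrix: O(c * sqrt(D)).

    Assumption (not from the paper): the D-dimensional projected gradient is
    reshaped into a square matrix with side = sqrt(D).
    """
    side = math.isqrt(D)
    if side * side != D:
        raise ValueError("this sketch assumes D is a perfect square")
    return 2 * c * side

def inverse_hessian_memory(D, r=None):
    """Dense D x D approximation if r is None, otherwise a D x r subspace basis."""
    return D * D if r is None else D * r

# Hypothetical sizes chosen only to make the arithmetic concrete.
D, c, r = 4096, 4, 64
print(full_storage(D), factored_storage(D, c))                   # 4096 vs 512
print(inverse_hessian_memory(D), inverse_hessian_memory(D, r))   # 16777216 vs 262144
```

With these made-up sizes the factored form stores 8 times fewer numbers per layer per sample; the reduction actually reported in the abstract (up to 20$\times$) depends on the paper's own choices of $D$ and $c$.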

2601.20173 2026-05-15 cs.LG cs.HC

MAPLE: Self-Supervised Learning-Enhanced Nonlinear Dimensionality Reduction for Visual Analysis

Zeyang Huang, Takanori Fujiwara, Angelos Chatzimparmpas, Wandrille Duchemin, Andreas Kerren

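AI summary: This paper proposes MAPLE, a new nonlinear dimensionality reduction method that enhances UMAP through improved manifold modeling. MAPLE takes a self-supervised learning approach and uses maximum manifold capacity representations (MMCRs) to encode low-dimensional manifold structure more efficiently, treating the variance among locally similar data points differently from the variance among dissimilar ones; it is especially suited to biological or image data with high intra-cluster variance and curved manifold structure. Experiments show that MAPLE yields clearer cluster separation and finer subcluster structure while remaining computationally efficient.

Abstract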

We present a new nonlinear dimensionality reduction method, MAPLE, that enhances UMAP by improving manifold modeling. MAPLE employs a self-supervised learning approach to more efficiently encode low-dimensional manifold geometry. Central to this approach are maximum manifold capacity representations (MMCRs), which help untangle complex manifolds by compressing variances among locally similar data points while amplifying variance among dissimilar data points. This design is particularly effective for high-dimensional data with substantial intra-cluster variance and curved manifold structures, such as biological or image data. Our qualitative and quantitative evaluations demonstrate that MAPLE can produce clearer visual cluster separations and finer subcluster resolution than UMAP while maintaining tractable computational cost.

2601.02179 2026-05-15 cs.CL

Confidence Estimation for LLMs in Multi-turn Interactions

Caiqi Zhang, Ruihan Yang, Xiaochen Zhu, Chengzu Li, Tiancheng Hu, Yijiang River Dong, Deqing Yang, Nigel Collier

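AI summary: This paper studies confidence estimation for large language models in multi-turn conversations. Current research focuses mostly on single-turn settings, while how model confidence changes as context accumulates and ambiguity is progressively resolved over multiple turns has not been adequately explored. The authors propose an evaluation framework based on per-turn calibration and monotonicity of confidence as information increases, and introduce new metrics and a data generation method. Experiments show that conventional methods perform poorly in multi-turn settings, whereas the proposed logit-based probe, P(Sufficient), is more effective at tracking evidence accumulation, providing a foundational methodology for building more reliable and trustworthy conversational agents.

Comments: ACL 2026 Findings

Abstract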

While confidence estimation is a promising direction for mitigating hallucinations in Large Language Models (LLMs), current research overwhelmingly focuses on single-turn settings. The dynamics of model confidence in multi-turn conversations, where context accumulates and ambiguity is progressively resolved, remain largely unexplored. This work presents the first systematic study of confidence estimation in multi-turn interactions, establishing a formal evaluation framework grounded in two key desiderata: per-turn calibration and monotonicity of confidence as more information becomes available. To facilitate this, we introduce novel metrics, including a length-normalized Expected Calibration Error (InfoECE), and a new "Hinter-Guesser" paradigm for generating controlled evaluation datasets. Our experiments reveal that widely-used confidence techniques struggle with calibration and monotonicity in multi-turn dialogues. In contrast, a novel logit-based probe we introduce, P(Sufficient), proves comparatively more effective, robustly tracking evidence accumulation and distinguishing it from conversational filler. Our work provides a foundational methodology for developing more reliable and trustworthy conversational agents.
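The two desiderata named in this abstract, per-turn calibration and monotonicity, can be made concrete with a small sketch. The code below computes the standard binned Expected Calibration Error, not the paper's length-normalized InfoECE, and counts monotonicity violations under the assumption that every turn adds information; both function names are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: weighted mean of |accuracy - confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece

def monotonicity_violations(turn_confidences):
    """Count turns where confidence drops relative to the previous turn.

    Assumption: each turn adds information, so any drop counts as a violation.
    """
    pairs = zip(turn_confidences, turn_confidences[1:])
    return sum(1 for prev, cur in pairs if cur < prev)

# Four answers at 0.9 confidence, three of them correct.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
# Confidence across four turns of one dialogue, with one drop.
print(monotonicity_violations([0.2, 0.5, 0.4, 0.9]))
```

In the first call the model is overconfident by 0.15 (confidence 0.9 against accuracy 0.75), which is what the ECE reports up to floating-point rounding.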

2512.09115 2026-05-15 cs.CV

SuperF: Neural Implicit Fields for Multi-Image Super-Resolution

Sander Riisøen Jyhne, Christian Igel, Morten Goodwin, Per-Arne Andersen, Serge Belongie, Nico Lang

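AI summary: This paper proposes SuperF, a multi-image super-resolution method that aims to improve the optical resolution of an image using multiple low-resolution images with sub-pixel shifts. The method builds on coordinate-based neural networks (neural fields): it shares one implicit neural representation (INR) across frames and jointly optimizes frame alignment and reconstruction, avoiding the "hallucination" problem common in single-image super-resolution. SuperF does not rely on high-resolution training data, and experiments show high-quality super-resolution results on satellite imagery and on ground-level images from handheld cameras, with upsampling factors of up to 8.

Comments: Published at ICLR 2026, Project website: https://sjyhne.github.io/superf/, 23 pages, 13 figures, 8 tables

Abstract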

High-resolution imagery is often hindered by limitations in sensor technology, atmospheric conditions, and costs. Such challenges occur in satellite remote sensing, but also with handheld cameras, such as our smartphones. Hence, super-resolution aims to enhance the image resolution algorithmically. Since single-image super-resolution requires solving an inverse problem, such methods must exploit strong priors, e.g. learned from high-resolution training data, or be constrained by auxiliary data, e.g. by a high-resolution guide from another modality. While qualitatively pleasing, such approaches often lead to "hallucinated" structures that do not match reality. In contrast, multi-image super-resolution (MISR) aims to improve the (optical) resolution by constraining the super-resolution process with multiple views taken with sub-pixel shifts. Here, we propose SuperF, a test-time optimization approach for MISR that leverages coordinate-based neural networks, also called neural fields. Their ability to represent continuous signals with an implicit neural representation (INR) makes them an ideal fit for the MISR task. The key characteristic of our approach is to share an INR for multiple shifted low-resolution frames and to jointly optimize the frame alignment with the INR. Our approach advances related INR baselines, adopted from burst fusion for layer separation, by directly parameterizing the sub-pixel alignment as optimizable affine transformation parameters and by optimizing via a super-sampled coordinate grid that corresponds to the output resolution. Our experiments yield compelling results on simulated bursts of satellite imagery and ground-level images from handheld cameras, with upsampling factors of up to 8. A key advantage of SuperF is that this approach does not rely on any high-resolution training data.

2512.07805 2026-05-15 cs.LG cs.AI cs.CL

Group Representational Position Encoding

Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao

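AI summary: This paper proposes GRAPE, a unified positional encoding framework based on group actions that covers both multiplicative and additive mechanisms. Multiplicative GRAPE uses the exponential map to produce norm-preserving relative position representations; it recovers RoPE exactly and extends to more complex subspace coupling structures. Additive GRAPE is based on rank-1 or low-rank unipotent actions; it reproduces ALiBi and FoX exactly while preserving streaming computation. GRAPE provides a theoretically rigorous design space for positional encoding in long-context models, unifying and extending existing methods.

Comments: Published in ICLR 2026. Project Page: https://github.com/model-architectures/GRAPE

Abstract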

We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE unifies two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\operatorname{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n) = \exp(n \, \omega \, \mathbf{L})$ with a rank-2 skew-symmetric generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes correspond to canonical coordinate pairs with a log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(r d)$ cost per head, respectively. In Additive GRAPE, additive logits arise from rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Overall, GRAPE provides a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project page: https://github.com/model-architectures/GRAPE.
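The "relative, norm-preserving" property of the multiplicative map can be verified in the smallest case. For $d = 2$, taking the skew-symmetric generator $\mathbf{L} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ gives $\exp(n \omega \mathbf{L})$ as the plane rotation by angle $n\omega$. The sketch below covers only this single-plane case; the function names are hypothetical and it is not the GRAPE implementation.

```python
import math

def rotation(n, omega):
    """Closed form of exp(n * omega * L) for L = [[0, -1], [1, 0]]."""
    theta = n * omega
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(G, v):
    """Multiply the 2x2 matrix G by the 2-vector v."""
    return (G[0][0] * v[0] + G[0][1] * v[1],
            G[1][0] * v[0] + G[1][1] * v[1])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

omega = 0.3
q, k = (1.0, 2.0), (0.5, -1.0)
m, n = 3, 7

# Relative law: the score of rotated query and key depends only on n - m.
absolute = dot(apply(rotation(m, omega), q), apply(rotation(n, omega), k))
relative = dot(q, apply(rotation(n - m, omega), k))
print(math.isclose(absolute, relative))

# Norm preservation: rotating q does not change its length.
print(math.isclose(dot(q, q), dot(apply(rotation(m, omega), q),
                                  apply(rotation(m, omega), q))))
```

Stacking $d/2$ such planes on canonical coordinate pairs, each with its own frequency, is the setting in which the abstract states that RoPE is recovered.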

2512.02920 2026-05-15 cs.LG cs.CV cs.SI

Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation

Ziniu Zhang, Minxuan Duan, Haris N. Koutsopoulos, Hongyang R. Zhang

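AI summary: This paper studies how to use road network data and satellite imagery for traffic accident prediction and causal analysis. The authors build a multimodal dataset covering six U.S. states with nine million accident records and one million high-resolution satellite images, together with annotations such as weather, road type, and traffic volume, and evaluate multimodal learning methods that fuse visual and network embeddings. Experiments show that fusing the two modalities significantly improves prediction performance, reaching an average AUROC of 90.1%, and that precipitation, road type, and seasonal factors have a significant effect on accident rates.

Comments: 17 pages. Appeared in KDD 2026

Abstract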

We consider analyzing traffic accident patterns using both road network data and satellite images aligned to road graph nodes. Previous work for predicting accident occurrences relies primarily on road network structural features while overlooking physical and environmental information from the road surface and its surroundings. In this work, we construct a large multimodal dataset spanning six U.S. states, containing nine million traffic accident records from official sources, and one million high-resolution satellite images for each node of the road network. Additionally, every node is annotated with features such as the region's weather statistics and road type (e.g., residential vs. motorway), and each edge is annotated with traffic volume information (i.e., Average Annual Daily Traffic). Utilizing this dataset, we conduct a comprehensive evaluation of multimodal learning methods that integrate both visual and network embeddings. Our findings show that integrating both data modalities improves prediction accuracy, achieving an average AUROC of $90.1\%$, a $3.7\%$ gain over graph neural network models that use only graph structures. With the improved embeddings, we conduct a causal analysis using a matching estimator to identify the key factors influencing traffic accidents. We find that accident rates rise by $24\%$ under higher precipitation, by $22\%$ on higher-speed roads such as motorways, and by $29\%$ due to seasonal patterns, after adjusting for other confounding factors. Ablation studies confirm that satellite imagery features are essential for achieving accurate prediction.

2512.01766 2026-05-15 cs.LG

On the Unreasonable Effectiveness of Last-layer Retraining

John C. Hill, Tyler LaBonte, Xinchen Zhang, Vidya Muthukumar

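AI summary: This paper studies why last-layer retraining (LLR) is effective at improving model robustness on minority groups. The authors note that LLR substantially improves worst-group accuracy even when retraining uses an imbalanced subset of the training set. Their experiments show that the effect of LLR stems mainly from group balance in the retraining dataset rather than from the previously hypothesized mitigation of neural collapse. The paper further analyzes how the recently proposed CB-LLR and AFR algorithms improve robustness through implicit group balancing.

Abstract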

Last-layer retraining (LLR) methods -- wherein the last layer of a neural network is reinitialized and retrained on a held-out set following ERM training -- have garnered interest as an efficient approach to rectify dependence on spurious correlations and improve performance on minority groups. Surprisingly, LLR has been found to improve worst-group accuracy even when the held-out set is an imbalanced subset of the training set. We initially hypothesize that this "unreasonable effectiveness" of LLR is explained by its ability to mitigate neural collapse through the held-out set, resulting in the implicit bias of gradient descent benefiting robustness. Our empirical investigation does not support this hypothesis. Instead, we present strong evidence for an alternative hypothesis: that the success of LLR is primarily due to better group balance in the held-out set. We conclude by showing how the recent algorithms CB-LLR and AFR perform implicit group-balancing to elicit a robustness improvement.
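To make "group balance in the held-out set" concrete, the sketch below subsamples a held-out set so that every group contributes the same number of examples before the last layer would be retrained. It is an explicit balancing step written for illustration only; it is not the procedure of CB-LLR or AFR, which the abstract describes as balancing implicitly, and the function name and toy data are hypothetical.

```python
import random
from collections import defaultdict

def group_balanced_subset(examples, group_of, seed=0):
    """Subsample so that every group has as many examples as the smallest group.

    `group_of` maps an example to its group label (hypothetical interface).
    Raises ValueError if `examples` is empty.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for example in examples:
        by_group[group_of(example)].append(example)
    smallest = min(len(members) for members in by_group.values())
    subset = []
    for group in sorted(by_group):
        subset.extend(rng.sample(by_group[group], smallest))
    return subset

# Toy held-out set: 90 majority-group examples and 10 minority-group examples.
held_out = [(i, "majority") for i in range(90)] + [(i, "minority") for i in range(10)]
balanced = group_balanced_subset(held_out, group_of=lambda ex: ex[1])
print(len(balanced))  # 20: ten examples from each group
```

Explicit balancing of this kind requires group labels on the held-out set, which is one reason methods that balance implicitly are of practical interest.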

2511.17299 2026-05-15 cs.RO

MonoSpheres: Large-Scale Monocular SLAM-Based UAV Exploration through Perception-Coupled Mapping and Planning

Tomáš Musil, Matěj Petrlík, Martin Saska

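AI summary: This paper presents MonoSpheres, a monocular vision-based method for large-scale autonomous exploration with unmanned aerial vehicles, addressing the sparse depth data, free-space gaps, and depth uncertainty that arise when exploring 3D environments with only a monocular camera. Through perception-coupled mapping and planning modules, the method achieves safe and efficient exploration of unstructured indoor and outdoor environments and, according to the authors, is the first to achieve monocular 3D autonomous exploration in real-world outdoor environments. Experiments validate the method, and the code is open-sourced to support follow-up research.

Comments: 8 pages, 9 figures, accepted to IEEE Robotics and Automation Letters

Abstract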

Autonomous exploration of unknown environments is a key capability for mobile robots, but it is largely unsolved for robots equipped with only a single monocular camera and no dense range sensors. In this paper, we present a novel approach to monocular vision-based exploration that can safely cover large-scale unstructured indoor and outdoor 3D environments by explicitly accounting for the properties of a sparse monocular SLAM frontend in both mapping and planning. The mapping module solves the problems of sparse depth data, free-space gaps, and large depth uncertainty by oversampling free space in texture-sparse areas and keeping track of obstacle position uncertainty. The planning module handles the added free-space uncertainty through rapid replanning and perception-aware heading control. We further show that frontier-based exploration is possible with sparse monocular depth data when parallax requirements and the possibility of textureless surfaces are taken into account. We evaluate our approach extensively in diverse real-world and simulated environments, including ablation studies. To the best of the authors' knowledge, the proposed method is the first to achieve 3D monocular exploration in real-world unstructured outdoor environments. We open-source our implementation to support future research.

2511.07308 2026-05-15 cs.LG

Can Stationary Distributions of Scale-Invariant Neural Networks Be Described by the Thermodynamics of an Ideal Gas?

Ildus Sadrtdinov, Ekaterina Lobacheva, Ivan Klimov, Mikhail Burtsev, Mikhail I. Katsnelson, Dmitry Vetrov

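AI summary: This paper examines the dynamics of deep neural network training and proposes a thermodynamics-based framework for describing the stationary distributions of stochastic gradient descent (SGD) with weight decay in scale-invariant neural networks. The study draws analogies between training hyperparameters (such as learning rate and weight decay) and thermodynamic variables (such as temperature, pressure, and volume) and, through theoretical analysis and experimental validation, reveals a close correspondence between SGD dynamics and ideal gas behavior. The framework offers a physical perspective on training and can help guide hyperparameter tuning and the design of learning rate schedulers.

Comments: Accepted at IJCAI-ECAI 2026 (the 35th International Joint Conference on Artificial Intelligence)

Abstract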

Understanding the training dynamics of deep neural networks remains a major open problem, with physics-inspired approaches offering promising insights. Building on this perspective, we develop a thermodynamic framework to describe the stationary distributions of stochastic gradient descent (SGD) with weight decay for scale-invariant neural networks, a setting that both reflects practical architectures with normalization layers and permits theoretical analysis. We establish analogies between training hyperparameters (e.g., learning rate, weight decay) and thermodynamic variables such as temperature, pressure, and volume. Starting with a simplified isotropic noise model, we uncover a close correspondence between SGD dynamics and ideal gas behavior, validated through theory and simulation. Extending to training of neural networks, we show that key predictions of the framework, including the behavior of stationary entropy, align closely with experimental observations. This framework provides a principled foundation for interpreting training dynamics and may guide future work on hyperparameter tuning and the design of learning rate schedulers.

2510.23477 2026-05-15 cs.CL

MMTutorBench: The First Multimodal Benchmark for AI Math Tutoring

Tengchao Yang, Sichen Guo, Mengzhao Jia, Jiaming Su, Yuanyang Liu, Zhihan Zhang, Meng Jiang

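AI summary: MMTutorBench is the first multimodal benchmark for evaluating AI math tutoring, designed to test models on problem solving, diagnosing student difficulties, and step-by-step guidance. The benchmark contains 685 math problems built around pedagogically significant key steps, each paired with detailed rubrics, and is organized into three tasks: Insight Discovery, Operation Formulation, and Operation Execution. Experiments show that current mainstream multimodal large language models still fall well short of human tutors and that input variants significantly affect model performance, underscoring the benchmark's value for evaluating and advancing AI math tutoring systems.

Abstract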

Effective math tutoring requires not only solving problems but also diagnosing students' difficulties and guiding them step by step. While multimodal large language models (MLLMs) show promise, existing benchmarks largely overlook these tutoring skills. We introduce MMTutorBench, the first benchmark for AI math tutoring, consisting of 685 problems built around pedagogically significant key steps. Each problem is paired with problem-specific rubrics that enable fine-grained evaluation across six dimensions, and the benchmark is structured into three tasks: Insight Discovery, Operation Formulation, and Operation Execution. We evaluate 12 leading MLLMs and find clear performance gaps between proprietary and open-source systems, substantial room for improvement relative to human tutors, and consistent trends across input variants: OCR pipelines degrade tutoring quality, few-shot prompting yields limited gains, and our rubric-based LLM-as-a-Judge proves highly reliable. These results highlight both the difficulty and diagnostic value of MMTutorBench for advancing AI tutoring.

2510.18326 2026-05-15 cs.CV

Enhancing Few-Shot Classification of Benchmark and Disaster Imagery with ABHFA-Net

Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu Duong

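AI summary: With natural and human-induced disasters becoming more frequent, there is an urgent need for visual recognition systems that remain robust when labeled data is limited. This paper proposes an attention- and Bhattacharyya distance-based feature aggregation network (ABHFA-Net) to improve few-shot classification on benchmark and disaster imagery. The method models class prototypes as probability distributions and classifies using the Bhattacharyya distance, while a spatial-channel attention mechanism and a contrastive softmax loss improve feature discriminability and class separability. Experiments show that ABHFA-Net performs strongly on multiple benchmark and real-world disaster datasets, with notable gains on disaster image classification.

Comments: Revised and Submitted to SN Computer journal

Abstract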

The rising incidence of natural and human-induced disasters necessitates robust visual recognition systems capable of operating under limited labeled data conditions. However, disaster-related image classification remains challenging due to data scarcity, high intra-class variability, and domain-specific complexities in remote sensing imagery. To address these challenges, we propose the Attention Bhattacharyya Distance-based Feature Aggregation Network (ABHFA-Net), a novel few-shot learning (FSL) framework that models class prototypes as probability distributions and performs classification via Bhattacharyya distance-based comparison. Our approach integrates a spatial channel attention mechanism to enhance discriminative feature learning in the few-shot context and introduces a Bhattacharyya-based contrastive softmax loss for improved class separability. Extensive experiments on both benchmark datasets (CIFAR-FS, FC-100, miniImageNet, tieredImageNet) and real-world disaster datasets (AIDER, CDD, MEDIC) demonstrate the effectiveness of the proposed method. In particular, ABHFA-Net achieves 80.7% and 92.3% accuracy on CIFAR-FS under 5-way 1-shot and 5-shot settings, respectively, outperforming existing state-of-the-art methods. On disaster datasets, the model consistently improves classification performance, achieving up to 68.2% (1-shot) and 78.3% (5-shot) accuracy on AIDER, highlighting its robustness in real-world scenarios. These results establish ABHFA-Net as a strong and practical solution for few-shot disaster image classification, particularly in data-scarce and time-critical remote sensing applications. The code repository for our work is available at https://github.com/GreedYLearner1146/ABHFA-Net.
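Classification "via Bhattacharyya distance-based comparison" can be illustrated with the closed form for Gaussians. The sketch below assumes, purely for illustration, that each class prototype and the query are diagonal Gaussians given by per-dimension means and variances; the abstract does not state which distribution family ABHFA-Net uses, and the function names and toy prototypes are hypothetical.

```python
import math

def bhattacharyya_diag_gauss(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal Gaussians.

    Per dimension: (mu1 - mu2)^2 / (8 * v) + 0.5 * ln(v / sqrt(var1 * var2)),
    where v = (var1 + var2) / 2; the dimensions are summed.
    """
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)
        total += 0.125 * (m1 - m2) ** 2 / v
        total += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return total

def classify(query, prototypes):
    """Return the label whose prototype distribution is closest to the query."""
    q_mu, q_var = query
    return min(
        prototypes,
        key=lambda label: bhattacharyya_diag_gauss(q_mu, q_var, *prototypes[label]),
    )

# Toy 1-D prototypes as (means, variances); labels are made up.
prototypes = {"flood": ([0.0], [1.0]), "fire": ([3.0], [1.0])}
print(bhattacharyya_diag_gauss([0.0], [1.0], [2.0], [1.0]))  # 0.5
print(classify(([0.1], [1.0]), prototypes))                  # flood
```

Unlike a Euclidean distance between prototype means, this distance also grows when the two variances differ, so class spread enters the comparison.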

2510.16196 2026-05-15 cs.CV cs.AI

Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI

Zheng Huang, Enpei Zhang, Weikang Qiu, Yinghao Cai, Carl Yang, Elynn Chen, Xiang Zhang, Rex Ying, Dawei Zhou, Yujun Yan

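AI summary: This paper studies how to reconstruct visual stimuli from functional magnetic resonance imaging (fMRI) signals in order to understand how the brain encodes visual information. It finds that fMRI signals are more similar to the text space of a language model than to vision-based or joint text-image spaces, and argues that a structured text space should be used to better represent the compositional nature of visual stimuli. Building on this, the authors propose PRISM, which projects fMRI signals into a structured text space and combines object generation with an attribute-relationship search module, significantly improving image reconstruction quality and reducing perceptual loss on real-world datasets.

Abstract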

Understanding how the brain encodes visual information is a central challenge in neuroscience and machine learning. A promising approach is to reconstruct visual stimuli, essentially images, from functional Magnetic Resonance Imaging (fMRI) signals. This involves two stages: transforming fMRI signals into a latent space and then using a pretrained generative model to reconstruct images. The reconstruction quality depends on how similar the latent space is to the structure of neural activity and how well the generative model produces images from that space. Yet, it remains unclear which type of latent space best supports this transformation and how it should be organized to represent visual stimuli effectively. We present two key findings. First, fMRI signals are more similar to the text space of a language model than to either a vision-based space or a joint text-image space. Second, text representations and the generative model should be adapted to capture the compositional nature of visual stimuli, including objects, their detailed attributes, and relationships. Building on these insights, we propose PRISM, a model that Projects fMRI sIgnals into a Structured text space as an interMediate representation for visual stimuli reconstruction. It includes an object-centric diffusion module that generates images by composing individual objects to reduce object detection errors, and an attribute-relationship search module that automatically identifies key attributes and relationships that best align with the neural activity. Extensive experiments on real-world datasets demonstrate that our framework outperforms existing methods, achieving up to an 8% reduction in perceptual loss. These results highlight the importance of using structured text as the intermediate space to bridge fMRI signals and image reconstruction.

2510.07060 2026-05-15 cs.CL

Does Local News Stay Local?: Online Content Shifts in Sinclair-Acquired Stations

Miriam Wanner, Sophia Hager, Anjalie Field

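AI summary: This paper studies how acquisition by Sinclair affects the news content of local news stations. Using computational methods to analyze content changes before and after acquisition, for local stations and in comparison with national news outlets, it finds that after being acquired by Sinclair, local stations report on national news more often, reduce coverage of local issues, and increase coverage of polarizing national topics. The study reveals the potential influence of changes in media ownership on the slant of news content.

Comments: Published at NLP+CSS Workshop @ ACL 2026

Abstract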

Local news stations are often considered to be reliable sources of non-politicized information, particularly about local concerns that residents care about. Because these stations are trusted news sources, viewers are particularly susceptible to the information they report. The Sinclair Broadcast Group is a broadcasting company that has acquired many local news stations in the last decade. We investigate the effects of local news stations being acquired by Sinclair: how does coverage change? We use computational methods to investigate changes in internet content put out by local news stations before and after being acquired by Sinclair and in comparison to national news outlets. We find clear evidence that local news stations report more frequently on national news at the expense of local topics, and that their coverage of polarizing national topics increases.

2510.00231 2026-05-15 cs.LG cs.AI

The Pitfalls of KV Cache Compression

Alex Chen, Renato Geh, Aditya Grover, Guy Van den Broeck, Daniel Israel

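AI summary: This paper examines potential problems with KV cache compression in realistic use, in particular the performance degradation it can cause in multi-instruction prompting. The study evaluates five KV cache compression methods on large language models and finds that some instructions degrade sharply under compression, to the point of being completely ignored by the model; using system prompt leakage as a case study, it analyzes the effect of compression on instruction following. The paper further identifies key factors that affect leakage and proposes simple changes to KV cache eviction policies to improve overall performance on multi-instruction tasks.

Comments: In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics, ACL 2026

Abstract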

KV cache compression promises increased throughput and efficiency with negligible loss in performance. While the gains in throughput are indisputable and recent literature has indeed shown minimal degradation on particular benchmarks, in general the consequences of compression in realistic scenarios such as multi-instruction prompting have been insufficiently studied. In this paper, we identify several pitfalls that practitioners should be aware of when deploying KV cache compressed LLMs. We evaluate five KV cache compression methods (StreamingLLM, SnapKV, TOVA, H2O, and K-Norm) on Llama3.1 8B and Qwen2.5 14B under multi-instruction prompting with IFEval. Importantly, we show that certain instructions degrade much more rapidly with compression, effectively causing them to be completely ignored by the LLM. As a practical example, we highlight system prompt leakage as a case study, empirically demonstrating the impact of compression on leakage and general instruction-following. We identify several factors that contribute to system prompt leakage: compression method, instruction order, and KV eviction bias. We then propose simple changes to KV cache eviction policies that can reduce the impact of these factors and improve the overall performance in multi-instruction tasks.
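One way to see how an eviction policy can cause an instruction to be ignored is to look at which token positions keep their cache entries. The sketch below is a simplified position-based rule in the spirit of StreamingLLM (keep a few initial "sink" tokens plus a recent window); it is not the paper's evaluation code or its proposed policy changes, and the function names and sizes are hypothetical.

```python
def kept_positions(seq_len, n_sink, window):
    """Positions whose KV entries survive: the first n_sink and the last `window`."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))

def surviving_fraction(span, kept):
    """Fraction of an instruction's token span [start, end) still in the cache."""
    start, end = span
    kept_set = set(kept)
    return sum(1 for pos in range(start, end) if pos in kept_set) / (end - start)

# Toy prompt of 10 tokens with two instructions at different positions.
kept = kept_positions(seq_len=10, n_sink=2, window=3)
print(kept)                               # [0, 1, 7, 8, 9]
print(surviving_fraction((3, 6), kept))   # 0.0: the middle instruction is evicted
print(surviving_fraction((6, 9), kept))   # two of its three tokens survive
```

Under this rule, whether an instruction survives depends only on where it sits in the prompt, which is consistent with the abstract listing instruction order and eviction bias among the factors behind leakage.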

2509.14159 2026-05-15 cs.RO

MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies

Dayi Dong, Maulik Bhatt, Seoyeon Choi, Negar Mehr

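AI summary: As robots become more widely used in society, their ability to coordinate with other robots and humans on multi-modal tasks becomes crucial. Conventional imitation learning methods often fail to capture the multiple possible behavior modes in multi-modal expert demonstrations, and existing diffusion-based multi-agent methods typically rely on centralized planning or explicit communication. This paper proposes MIMIC-D, a diffusion-based decentralized multi-agent imitation learning framework that jointly trains all agents' policies using only local information to achieve implicit coordination, and it shows strong multi-modal coordination in simulation and hardware experiments.

Comments: 8 pages, 4 figures, 5 tables

Abstract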

As robots become more integrated in society, their ability to coordinate with other robots and humans on multi-modal tasks (those with multiple valid solutions) is crucial. Such behaviors can be learned from expert demonstrations via imitation learning (IL), but when expert demonstrations are multi-modal, standard IL approaches usually average across modes or collapse to a single mode, preventing effective coordination. Inspired by diffusion models' ability to capture complex multi-modal trajectory distributions in single-agent settings, we develop a diffusion-based framework for coordinated multi-modal behavior in multi-agent systems. However, existing multi-agent diffusion approaches typically require a centralized planner or explicit communication among agents. This assumption can fail in real-world scenarios where robots must operate independently or with agents like humans that they cannot directly communicate with. Therefore, we propose MIMIC-D, a joint training with decentralized execution paradigm for multi-modal multi-agent IL via diffusion. We jointly train all agents' policies with only local information to achieve implicit coordination. In simulation and hardware experiments, our method exhibits robust multi-modal coordination behavior in various tasks and environments, improving upon state-of-the-art baselines.

2509.01416 2026-05-15 cs.LG

MD-PNOP: Equation-Recast Neural Operators for Minimal-Data Extrapolation and PDE Solver Acceleration

Qiyun Cheng, Md Hossain Sahadath, Huihua Yang, Shaowu Pan, Wei Ji

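AI summary This study proposes MD-PNOP, a framework that aims to accelerate parametric partial differential equation (PDE) solvers and to enable parameter extrapolation from minimal data. By recasting the parameter-induced operator difference as additional source terms and solving iteratively with a pretrained neural operator, the method extrapolates from a single training configuration to many unseen parameter settings without retraining. Experiments show that MD-PNOP substantially improves solver efficiency while preserving physical conservation, and that it applies to complex problems in practical applications such as neutron transport.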

Abstract

The computational overhead of traditional numerical solvers for partial differential equations (PDEs) remains a critical bottleneck for large-scale parametric studies and design optimization. We introduce a Minimal-Data Parametric Neural Operator Preconditioning (MD-PNOP) framework, which establishes a new strategy for accelerating parametric PDE solvers while strictly preserving physical constraints. To address the extrapolation limitation of neural operators, the parameter-induced operator difference is recast as additional source terms and incorporated into an iterative solution scheme using a pretrained neural operator. This equation-recast formulation enables systematic parameter extrapolation from a single training configuration to a broad range of unseen parameter settings without retraining. The neural operator predictions are then embedded into iterative PDE solvers as improved initial guesses, thereby reducing convergence iterations without sacrificing accuracy. Unlike purely data-driven approaches, MD-PNOP guarantees that the governing equations remain fully enforced, eliminating concerns regarding loss of physics or interpretability. The framework is architecture-agnostic and is demonstrated using both DeepONet and FNO for Boltzmann transport equation solvers in neutron transport applications. Numerical results demonstrate that neural operators trained on a single set of constant parameters successfully accelerate solutions with heterogeneous, sinusoidal, and discontinuous parameter distributions. Moreover, MD-PNOP consistently achieves an approximately 50% reduction in computational time while maintaining full-order fidelity for fixed-source, single-group eigenvalue, and multigroup coupled eigenvalue problems.
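One way to read the equation-recast idea, sketched on a toy 2x2 linear system rather than a PDE. The problem A_new u = f is rewritten as A_ref u = f + (A_ref - A_new) u and solved by fixed-point iteration, so only a solver for the reference operator is ever called. This is an assumption-laden illustration, not the authors' scheme: an exact solver for A_ref stands in for the pretrained neural operator, and all names and matrices are hypothetical. This toy iteration converges only when the operator difference is small relative to the reference operator.

```python
def solve2(A, b):
    """Solve a 2x2 linear system exactly with Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]


def matvec(A, x):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]


def recast_solve(A_ref, A_new, f, reference_solver, tol=1e-10, max_iter=200):
    """Solve A_new u = f using only a solver for the reference operator A_ref."""
    diff = [[A_ref[i][j] - A_new[i][j] for j in range(2)] for i in range(2)]
    u = reference_solver(f)  # initial guess: solution of the reference problem
    for it in range(1, max_iter + 1):
        extra = matvec(diff, u)  # operator difference acting as an extra source
        rhs = [f[i] + extra[i] for i in range(2)]
        u_next = reference_solver(rhs)
        change = max(abs(u_next[i] - u[i]) for i in range(2))
        u = u_next
        if change < tol:
            return u, it
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical reference and shifted operators.
A_ref = [[4.0, 1.0], [1.0, 3.0]]
A_new = [[5.0, 1.0], [1.0, 4.0]]
f = [1.0, 2.0]

u, iters = recast_solve(A_ref, A_new, f, lambda rhs: solve2(A_ref, rhs))
residual = [matvec(A_new, u)[i] - f[i] for i in range(2)]
print(u, iters, residual)
```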

2506.11067 2026-05-15 cs.CL

A Large Language Model Based Pipeline for Review of Systems Entity Recognition from Clinical Notes

Hieu Nghiem, Zhuqi Miao, Hemanth Reddy Singareddy, Jivan Lamichhane, Abdulaziz Ahmed, Johnson Thomas, Dursun Delen, William Paiva

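AI summary This study proposes an efficient large language model (LLM)-based pipeline for automatically identifying Review of Systems (ROS) entities in clinical notes, such as diseases, symptoms, and the body systems they belong to. It uses four open-source LLMs and introduces a novel attribution algorithm to improve entity recognition accuracy. Experimental results show that the pipeline performs well across several tasks and is promising for resource-constrained settings.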

Comments Accepted by IEEE EMBC 2026. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Abstract

Objective: Develop a cost-effective, large language model (LLM)-based pipeline for automatically extracting Review of Systems (ROS) entities from clinical notes. Materials and Methods: The pipeline extracts the ROS section from the clinical note using SecTag header terminology, then uses few-shot LLMs to identify ROS entities such as diseases or symptoms, their positive/negative status, and associated body systems. We implemented the pipeline using four open-source LLMs: llama3.1:8b, gemma3:27b, mistral3.1:24b and gpt-oss:20b. Additionally, we introduced a novel attribution algorithm that aligns LLM-identified ROS entities with their source text, addressing non-exact and synonymous matches. The evaluation was conducted on 24 general medicine notes containing 340 annotated ROS entities. Results: Open-source LLMs enable a local, cost-efficient pipeline while delivering promising performance. Larger models like Gemma, Mistral, and Gpt-oss demonstrate robust performance across the three entity recognition tasks of the pipeline: ROS entity extraction, negation detection, and body system classification (highest F1 score = 0.952). With the attribution algorithm, all models show improvements across key performance metrics, including higher F1 score and accuracy, along with lower error rate. Notably, the smaller Llama model also achieved promising results despite using only one-third of the VRAM of the larger models. Discussion and Conclusion: From an application perspective, our pipeline provides a scalable, locally deployable solution for easing the ROS documentation burden. Open-source LLMs offer a practical AI option for resource-limited healthcare settings. Methodologically, our newly developed algorithm facilitates accuracy improvements for zero- and few-shot LLMs in named entity recognition.

2506.04646 2026-05-15 cs.RO cs.LG

ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation

Zhuoyun Zhong, Seyedali Golestaneh, Constantinos Chamzas

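AI summary This paper proposes ActivePusher, a new framework for active learning and planning in nonprehensile manipulation such as pushing and rolling. The method combines a residual-physics model with an uncertainty-based active learning strategy to efficiently collect the most informative training data, and it integrates with model-based motion planners to improve the reliability of long-horizon planning. Experiments show higher data efficiency and planning success rates in both simulation and real-world environments.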

Comments Accepted by the 2026 IEEE International Conference on Robotics & Automation (ICRA 2026)

Abstract

Planning with learned dynamics models offers a promising approach toward versatile real-world manipulation, particularly in nonprehensile settings such as pushing or rolling, where accurate analytical models are difficult to obtain. However, collecting training data for learning-based methods can be costly and inefficient, as it often relies on randomly sampled interactions that are not necessarily the most informative. Furthermore, learned models tend to exhibit high uncertainty in underexplored regions of the skill space, undermining the reliability of long-horizon planning. To address these challenges, we propose ActivePusher, a novel framework that combines residual-physics modeling with uncertainty-based active learning, to focus data acquisition on the most informative skill parameters. Additionally, ActivePusher seamlessly integrates with model-based kinodynamic planners, leveraging uncertainty estimates to bias control sampling toward more reliable actions. We evaluate our approach in both simulation and real-world environments, and demonstrate that it consistently improves data efficiency and achieves higher planning success rates in comparison to baseline methods. The source code is available at https://github.com/elpis-lab/ActivePusher.

2505.23912 2026-05-15 cs.CL cs.AI

LoVeC: Reinforcement Learning for Better Verbalized Confidence in Long-Form Generations

Caiqi Zhang, Xiaochen Zhu, Chengzu Li, Nigel Collier, Andreas Vlachos

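AI summary This paper proposes LoVeC, a reinforcement learning based method that dynamically adds interpretable confidence scores during long-form generation to improve the factual accuracy of generated content. It overcomes the limitations of existing methods in computational efficiency and task generalization, and enables more efficient and more robust confidence estimation in long-form question answering. Experiments show that LoVeC achieves better calibration and cross-domain generalization on multiple datasets, and is 20 times more efficient than traditional methods.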

Comments ACL 2026 Main

Abstract

Hallucination remains a major challenge for the safe and trustworthy deployment of large language models (LLMs) in factual content generation. Prior work has explored confidence estimation as an effective approach to hallucination detection, but often relies on post-hoc self-consistency methods that require computationally expensive sampling. Verbalized confidence offers a more efficient alternative, but existing approaches are largely limited to short-form question answering (QA) tasks and do not generalize well to open-ended generation. In this paper, we propose LoVeC (Long-form Verbalized Confidence), a novel reinforcement learning based method that trains LLMs to append an on-the-fly numerical confidence score to each generated statement during long-form generation. The confidence score serves as a direct and interpretable signal of the factuality of generation. We introduce two evaluation settings, free-form tagging and iterative tagging, to assess different verbalized confidence estimation methods. Experiments on three long-form QA datasets show that our RL-trained models achieve better calibration and generalize robustly across domains. Also, our method is highly efficient, being 20 times faster than traditional self-consistency methods while achieving better calibration.

2505.17353 2026-05-15 cs.CV cs.AI cs.LG eess.IV

Dual Ascent Diffusion for Inverse Problems

Minseo Kim, Axel Levy, Gordon Wetzstein

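AI summary This paper studies how to use diffusion models to address ill-posedness in inverse problems and proposes a new method based on a dual ascent optimization framework. On image restoration tasks, the method achieves better image quality, stronger robustness to noise, and faster computation, while representing the observations more faithfully. The work offers a more efficient and accurate approach to solving inverse problems.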

Comments Project page: https://soniaminseokim.github.io/ddiff/

Abstract

Ill-posed inverse problems are fundamental in many domains, ranging from astrophysics to medical imaging. Emerging diffusion models provide a powerful prior for solving these problems. Existing maximum-a-posteriori (MAP) or posterior sampling approaches, however, rely on different computational approximations, leading to inaccurate or suboptimal samples. To address this issue, we introduce a new approach to solving MAP problems with diffusion model priors using a dual ascent optimization framework. Our framework achieves better image quality as measured by various metrics for image restoration problems, it is more robust to high levels of measurement noise, it is faster, and it estimates solutions that represent the observations more faithfully than the state of the art.

2502.16060 2026-05-15 cs.LG cs.AI eess.SP

Tokenizing Single-Channel EEG with Time-Frequency Motif Learning

Jathurshan Pradeepkumar, Xihao Piao, Zheng Chen, Jimeng Sun

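AI summary This paper proposes TFM-Tokenizer, a novel EEG tokenization framework that addresses the important problem of EEG tokenization by learning a vocabulary of time-frequency motifs from single-channel EEG signals and encoding them into discrete tokens. The method uses a dual-path architecture with time-frequency masking to produce robust motif representations, and it works with a range of downstream models, including lightweight transformers and existing foundation models. Experiments show that the tokenizer significantly improves performance on multiple EEG benchmark datasets, with better generalization and device adaptability.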

Comments Accepted to ICLR 2026

Abstract

Foundation models are reshaping EEG analysis, yet EEG tokenization remains an important open problem. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from single-channel EEG signals and encodes them into discrete tokens. We propose a dual-path architecture with time-frequency masking to capture robust motif representations. The framework is model-agnostic, supporting both lightweight transformers and existing foundation models for downstream tasks. Our study demonstrates three key benefits: Accuracy: Experiments on four diverse EEG benchmarks demonstrate consistent performance gains across both single- and multi-dataset pretraining settings, achieving up to 11% improvement in Cohen's Kappa over strong baselines. Generalization: As a plug-and-play component, it consistently boosts the performance of diverse foundation models, including BIOT and LaBraM. Scalability: By operating at the single-channel level rather than relying on the strict 10-20 EEG system, our method has the potential to be device-agnostic. Experiments on ear-EEG sleep staging, which differs from the pretraining data in signal format, channel configuration, recording device, and task, show that our tokenizer outperforms baselines by 14%. A comprehensive token analysis reveals strong class-discriminative, frequency-aware, and consistent structure, enabling improved representation quality and interpretability. Code is available at https://github.com/Jathurshan0330/TFM-Tokenizer.

2502.00270 2026-05-15 cs.LG cs.AI stat.ML

DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks

Zhiliang Chen, Gregory Kang Ruey Lau, Chuan-Sheng Foo, Bryan Kian Hsiang Low

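AI summary This paper studies how to optimize training data mixtures for large language models when the downstream evaluation task is unknown. Because the task data is often unseen in practice, conventional data selection methods are hard to apply. The authors propose DUET, a feedback-based optimization method that combines influence functions with Bayesian optimization to perform global-to-local data mixture optimization without prior knowledge of the task data. Experiments show that DUET outperforms existing methods on a variety of language tasks, demonstrating its effectiveness in the unseen-task setting.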

Comments Accepted to ICLR 2026 main conference

Abstract

The performance of an LLM depends heavily on the relevance of its training data to the downstream evaluation task. However, in practice, the data involved in an unseen evaluation task is often unknown (e.g., conversations between an LLM and a user are end-to-end encrypted). Hence, it is unclear what data are relevant for fine-tuning the LLM to maximize its performance on the specific unseen evaluation task. Instead, one can only deploy the LLM on the unseen task to gather multiple rounds of feedback on how well the model performs (e.g., user ratings). This novel setting offers a refreshing perspective on optimizing training data mixtures via feedback from an unseen evaluation task, which prior data mixing and selection works do not consider. Our paper presents DUET, a novel global-to-local algorithm that interleaves influence functions as a data selection method with Bayesian optimization to optimize the data mixture via feedback from a specific unseen evaluation task. By analyzing DUET's cumulative regret, we theoretically show that DUET converges to the optimal training data mixture for an unseen task even without any knowledge of the task's data. Finally, our experiments across a variety of language tasks demonstrate that DUET outperforms existing data selection and mixing methods in the unseen-task setting.

2411.18104 2026-05-15 cs.CL cs.AI cs.LG

Training and Evaluating Language Models with Template-based Data Generation

Yifan Zhang

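AI summary To address the weaknesses of large language models on complex multi-step reasoning tasks such as mathematical problem solving, this paper proposes Template-based Data Generation (TDG), which uses the frontier model GPT-4 to automatically generate parameterized meta-templates and thereby synthesize large numbers of high-quality problems and solutions. The study builds TemplateMath Part I: TemplateGSM, a dataset of over 7 million grade school math problems, each paired with a programmatically verifiable solution. This alleviates data scarcity and provides a reinforcement learning mechanism with verifiable rewards for model alignment, supporting the development of a new generation of large language models with strong reasoning abilities.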

Comments Published in ICLR 2025 DATA-FM Workshop. Project Page: https://github.com/iiis-ai/TemplateMath

Abstract

The rapid advancement of large language models (LLMs) such as GPT-3, PaLM, and Llama has significantly transformed natural language processing, showcasing remarkable capabilities in understanding and generating language. However, a fundamental bottleneck persists: these models often struggle with tasks requiring complex, multi-step reasoning, particularly in mathematical problem-solving. This deficiency stems from the critical scarcity of large-scale, high-quality, domain-specific datasets necessary for cultivating sophisticated reasoning abilities. To overcome this challenge, we introduce Template-based Data Generation (TDG), a novel and scalable paradigm that harnesses frontier LLMs (GPT-4) to automatically generate parameterized meta-templates, which in turn synthesize a virtually infinite stream of high-quality problems and solutions. Using this paradigm, we create TemplateMath Part I: TemplateGSM, a foundational dataset of over 7 million synthetically generated grade school math problems. Each problem is accompanied by a programmatically verifiable solution, offering an unprecedented level of quality at scale. This resource not only resolves the data scarcity issue for supervised fine-tuning but also provides a robust mechanism for model alignment through Reinforcement Learning with Verifiable Rewards (RLVR). Our approach elevates data augmentation by leveraging GPT-4 to generate meta-templates, ensuring diverse and complex problem structures. By providing a scalable solution to the data and verification bottleneck, TDG and TemplateGSM pave the way for a new generation of LLMs with powerful, reliable reasoning skills.

2410.06431 2026-05-15 cs.LG

Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs

Ruijia Niu, Dongxia Wu, Rose Yu, Yi-An Ma

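AI summary This paper studies uncertainty quantification during fine-tuning of large language models. To address the tendency of existing methods to become overconfident under limited adaptation data, it proposes UQ4CT, a functional-level uncertainty quantification method. Using a mixture-of-experts fine-tuning framework, the method introduces a calibration loss during training that aligns the model's functional-level confidence with predictive correctness, improving calibration. Experiments show that UQ4CT significantly reduces expected calibration error on multiple benchmark tasks while maintaining high accuracy, and is more robust under distribution shift.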

Abstract

Accurate uncertainty quantification in large language models (LLMs) is essential for reliable confidence estimation, yet fine-tuned LLMs often become overconfident under limited adaptation data. Existing uncertainty methods for PEFT-based LLMs are largely post hoc, estimating uncertainty after fine-tuning rather than improving how adapters specialize to task-specific input-output relationships. We propose Functional-Level Uncertainty Quantification for Calibrated Fine-Tuning (UQ4CT), which calibrates uncertainty over the functional space induced by prompt-dependent mixtures of LoRA experts. UQ4CT implements this perspective through a mixture-of-experts fine-tuning framework, where a calibration loss aligns functional-level confidence with predictive correctness during training. Across four multiple-choice benchmarks and two open-ended generative QA tasks, UQ4CT reduces Expected Calibration Error (ECE) by over 25% while preserving high accuracy. Under distribution shift, UQ4CT maintains superior calibration and competitive accuracy, demonstrating improved reliability and generalization for fine-tuned LLMs.
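For readers unfamiliar with the metric, a minimal sketch of the usual binned estimate of Expected Calibration Error: predictions are grouped by confidence, and the gaps between each bin's accuracy and mean confidence are averaged with weights proportional to bin size. The function name, the equal-width binning, and the choice of 10 bins are assumptions for illustration; the abstract does not state which ECE variant the paper uses.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# An overconfident toy model: 90% confidence but only half the answers correct.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, False, False]))
```

A lower value means stated confidence tracks empirical accuracy more closely; a perfectly calibrated set of predictions scores 0.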

2605.15172 2026-05-15 cs.CR cs.CL

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Rui Wen, Mark Russinovich, Andrew Paverd, Jun Sakuma, Ahmed Salem

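AI summary This paper proposes MetaBackdoor, a new backdoor attack that uses the positional encoding in large language models as the trigger mechanism, so that the backdoor can be activated without modifying the input text. The study finds that triggers based on positional information can effectively activate stealthy backdoor behavior, causing the model to leak sensitive information or perform malicious actions once a specific length condition is met. The method expands the threat model of LLM backdoor attacks, reveals positional encoding as a previously overlooked attack surface, and poses new challenges for the design of defenses.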

Abstract

Backdoor attacks pose a serious security threat to large language models (LLMs), which are increasingly deployed as general-purpose assistants in safety- and privacy-critical applications. Existing LLM backdoors rely primarily on content-based triggers, requiring explicit modification of the input text. In this work, we show that this assumption is unnecessary and limiting. We introduce MetaBackdoor, a new class of backdoor attacks that exploits positional information as the trigger, without modifying textual content. Our key insight is that Transformer-based LLMs necessarily encode token positions to process ordered sequences. As a result, length-correlated positional structure is reflected in the model's internal computation and can be used as an effective non-content trigger signal. We demonstrate that even a simple length-based positional trigger is sufficient to activate stealthy backdoors. Unlike prior attacks, MetaBackdoor operates on visibly and semantically clean inputs and enables qualitatively new capabilities. We show that a backdoored LLM can be induced to disclose sensitive internal information, including proprietary system prompts, once a length condition is satisfied. We further demonstrate a self-activation scenario, where normal multi-turn interaction can move the conversation context into the trigger region and induce malicious tool-call behavior without attacker-supplied trigger text. In addition, MetaBackdoor is orthogonal to content-based backdoors and can be composed with them to create more precise and harder-to-detect activation conditions. Our results expand the threat model of LLM backdoors by revealing positional encoding as a previously overlooked attack surface. This challenges defenses that focus on detecting suspicious text and highlights the need for new defense strategies that explicitly account for positional triggers in modern LLM architectures.

2605.15154 2026-05-15 stat.ML cs.LG

RoSHAP: A Distributional Framework and Robust Metric for Stable Feature Attribution

Lanxin Xiang, Liang Shi, Youhui Ye, Boyu Jiang, Dawei Zhou, Feng Guo

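AI summary This paper proposes RoSHAP, a distributional framework and robust metric for more stable feature attribution analysis. Building on SHAP values, the method models the distribution of feature attribution scores through bootstrap resampling and kernel density estimation, and proves that under mild regularity conditions the aggregated score is asymptotically Gaussian, which reduces the computational cost of distribution estimation. RoSHAP improves the stability of feature rankings, outperforms conventional single-run attribution methods in simulations and real-data experiments, and achieves predictive performance comparable to full-feature models with fewer features.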

Abstract

Feature attribution analysis is critical for interpreting machine learning models and supporting reliable data-driven decisions. However, feature attribution measures often exhibit stochastic variation: different train-test splits, random seeds, or model-fitting procedures can produce substantially different attribution values and feature rankings. This paper proposes a framework that accounts for the stochastic nature of feature attribution, together with a robust attribution metric, RoSHAP, for stable feature ranking based on the SHAP metric. The proposed framework models the distribution of feature attribution scores and estimates it through bootstrap resampling and kernel density estimation. We show that, under mild regularity conditions, the aggregated feature attribution score is asymptotically Gaussian, which greatly reduces the computational cost of distribution estimation. RoSHAP summarizes the distribution of SHAP values into a robust feature-ranking criterion that simultaneously rewards features that are active, strong, and stable. Through simulations and real-data experiments, the proposed framework and RoSHAP outperform standard single-run attribution measures in identifying signal features. In addition, models built using RoSHAP-selected features achieve predictive performance comparable to full-feature models while using substantially fewer predictors. The proposed RoSHAP approach improves the stability and interpretability of machine learning models, enabling reliable and consistent insights for analysis.

2605.15127 2026-05-15 cs.HC cs.AI

Understanding How International Students in the U.S. Are Using Conversational AI to Support Cross-Cultural Adaptation

Laleh Nourian, Anisa Callis, Stephanie Patterson, Jadeline Miao, Jamison Heard, Garreth W. Tigwell

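AI summary This paper studies how international students in the U.S. use conversational AI to support cross-cultural adaptation. Through a survey and interviews, it reveals the usage patterns, motivations, and limitations of AI tools when international students face cultural adaptation challenges. The study finds that AI is seen as a first-aid tool for immediate problems, but students also hope it can develop into a long-term support companion. It offers recommendations for designing AI support systems that better fit the needs of international students.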

Comments 33 pages, single column. 4 figures, 9 tables

Abstract

Moving to a new culture and adapting to a new life as an international student can be a stressful experience. In the US, international students face unique overlapping challenges, yet the current support ecosystem, including university support systems and informal social networks, remains largely fragmented. While conversational AI has emerged as a tool used by many (e.g., generative AI chatbots like ChatGPT and Google Gemini), we do not have a clear understanding of how international students adopt and perceive these technologies as support tools. We conducted a survey study (n=60) to map the relationship between international students' challenges and AI adoption patterns, followed by an interview study with 14 participants to identify the underlying motivations and boundaries of use. Our findings show that AI is perceived as a first-aid tool for immediate challenges; however, there is interest in transforming AI from a tool for short-term help into a long-term support companion. By identifying where and how AI can provide long-term support, and where it is insufficient, we contribute recommendations for creating AI-powered support tailored to the unique needs of international students.