arXivDaily: arXiv daily academic digest, updated Monday to Friday
2409.03897 2026-05-18 cs.LG cs.DC

On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments

Leo Muxing Wang, Pengkun Yang, Lili Su

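AI summary This paper studies the convergence rates of federated Q-learning in heterogeneous environments, examining how the communication frequency and the number of agents affect convergence speed when multiple agents cooperatively learn the optimal Q-function. It finds that increasing the number of agents gives a linear speed-up, whereas increasing the communication interval causes significant performance degradation, and that this degradation is fundamental. The paper also reveals a two-phase pattern in the convergence and proposes adjusting the learning rate between phases to speed up overall convergence.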

English abstract

Large-scale multi-agent systems are often deployed across wide geographic areas, where agents interact with heterogeneous environments. There is an emerging interest in understanding the role of heterogeneity in the performance of the federated versions of classic reinforcement learning algorithms. In this paper, we study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates every $E$ iterations. We observe an interesting phenomenon in the convergence speeds in terms of $K$ and $E$. Similar to the homogeneous environment settings, there is a linear speed-up with respect to $K$ in reducing the errors that arise from sampling randomness. Yet, in sharp contrast to the homogeneous settings, $E>1$ leads to significant performance degradation. Specifically, we provide a fine-grained characterization of the error evolution in the presence of environmental heterogeneity, which decays to zero as the number of iterations $T$ increases. The slow convergence of having $E>1$ turns out to be fundamental rather than an artifact of our analysis. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta(E/T)$. In addition, our experiments demonstrate that the convergence exhibits an interesting two-phase phenomenon. For any given stepsize, there is a sharp phase-transition of the convergence: the error decays rapidly in the beginning yet later bounces up and stabilizes. Provided that the phase-transition time can be estimated, choosing different stepsizes for the two phases leads to faster overall convergence.
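
The averaging scheme described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the tabular setup, the shared reward table, the constant stepsize, and the names `random_kernels` and `federated_q_learning` are all assumptions made for the example.

```python
import numpy as np

def random_kernels(K, S, A, seed=0):
    """Hypothetical heterogeneous environments: one transition kernel per agent."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(S), size=(K, S, A))  # shape (K, S, A, S)

def federated_q_learning(P, R, gamma=0.9, lr=0.1, E=5, T=200, seed=0):
    """K agents run local Q-learning and average their tables every E iterations.

    P: (K, S, A, S) per-agent transition kernels; R: (S, A) rewards (assumed shared).
    """
    rng = np.random.default_rng(seed)
    K, S, A, _ = P.shape
    Q = np.zeros((K, S, A))
    for t in range(1, T + 1):
        for k in range(K):
            # Synchronous setting: draw one next state for every (s, a) pair.
            next_states = np.array(
                [[rng.choice(S, p=P[k, s, a]) for a in range(A)] for s in range(S)]
            )
            target = R + gamma * Q[k].max(axis=1)[next_states]
            Q[k] = (1 - lr) * Q[k] + lr * target
        if t % E == 0:
            # Communication round: replace every local table by the average.
            Q[:] = Q.mean(axis=0)
    return Q.mean(axis=0)
```

Setting `E=1` averages after every local update, while larger `E` lets the local tables drift toward each agent's own environment between communication rounds.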

2408.07331 2026-05-18 cs.LG

RSEA-MVGNN: Multi-View Graph Neural Network with Reliable Structural Enhancement and Aggregation

Junyu Chen, Long Shi, Badong Chen

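AI summary This paper proposes RSEA-MVGNN, a multi-view graph neural network designed to effectively fuse multi-view graph data whose views have distinct graph structure features. The method estimates the uncertainty of each view with subjective logic and applies a de-correlation algorithm for reliable structural enhancement, thereby increasing feature diversity; the model also evaluates view quality from each view's belief and uncertainty, so that high-quality views dominate GNN aggregation. Experiments show that the method outperforms existing state-of-the-art methods on several real-world datasets.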

Journal ref
Information Fusion 121 (2025) 103143
English abstract

Graph Neural Networks (GNNs) have exhibited remarkable efficacy in learning from multi-view graph data. In the framework of multi-view graph neural networks, a critical challenge lies in effectively combining diverse views, where each view has distinct graph structure features (GSFs). Existing approaches to this challenge primarily focus on two aspects: 1) prioritizing the most important GSFs, 2) utilizing GNNs for feature aggregation. However, prioritizing the most important GSFs can lead to limited feature diversity, and existing GNN-based aggregation strategies treat each view equally without considering view quality. To address these issues, we propose a novel Multi-View Graph Neural Network with Reliable Structural Enhancement and Aggregation (RSEA-MVGNN). Firstly, we estimate view-specific uncertainty employing subjective logic. Based on this uncertainty, we design reliable structural enhancement via a feature de-correlation algorithm. This approach enables each enhancement to focus on different GSFs, thereby achieving diverse feature representation in the enhanced structure. Secondly, the model learns view-specific beliefs and uncertainty as opinions, which are utilized to evaluate view quality. Based on these opinions, the model enables high-quality views to dominate GNN aggregation, thereby facilitating representation learning. Experimental results conducted on five real-world datasets demonstrate that RSEA-MVGNN outperforms several state-of-the-art GNN-based methods.

2407.02039 2026-05-18 cs.CL

Prompt Stability Scoring for Text Annotation with Large Language Models

Christopher Barrie, Elli Palaiologou, Petter Törnberg

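AI summary As large language models are increasingly used for text annotation, the reproducibility of model outputs may be affected by small changes in prompt design. The paper proposes a general framework for evaluating prompt stability, defines the Prompt Stability Score (PSS) by adapting intra- and inter-coder reliability scoring, and provides a corresponding Python package. Experiments on several datasets validate the approach, and the paper offers practical recommendations for applied researchers on improving annotation stability.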

Comments 39 pages, 5 figures

English abstract

Researchers are increasingly using language models (LMs) for text annotation. These approaches rely only on a prompt telling the model to return a given output according to a set of instructions. The reproducibility of LM outputs may nonetheless be vulnerable to small changes in the prompt design. This calls into question the replicability of classification routines. To tackle this problem, researchers have typically tested a variety of semantically similar prompts to determine what we call "prompt stability." These approaches remain ad hoc and task-specific. In this article, we propose a general framework for diagnosing prompt stability by adapting traditional approaches to intra- and inter-coder reliability scoring. We call the resulting metric the Prompt Stability Score (PSS) and provide a Python package `promptstability` for its estimation. Using six different datasets and twelve outcomes, we classify $\sim$3.1m rows of data and $\sim$300m input tokens to: a) diagnose when prompt stability is low; and b) demonstrate the functionality of the package. We conclude by providing best practice recommendations for applied researchers.

2406.18944 2026-05-18 cs.CV cs.AI cs.CR

Rethinking and Red-Teaming Protective Perturbation in Personalized Diffusion Models

Yixin Liu, Ruoxi Chen, Xun Chen, Lichao Sun

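AI summary Personalized diffusion models (PDMs) excel at generating images of specific subjects from small amounts of data, but they are highly sensitive to small adversarial perturbations, which causes significant degradation when they are fine-tuned on corrupted data. Through the lens of shortcut learning, this paper analyzes the fine-tuning process of PDMs in depth, shows that adversarial perturbations induce a latent semantic misalignment in the CLIP embedding space, and accordingly proposes a systematic countermeasure framework that includes image purification and contrastive decoupling learning, which effectively improves model robustness and generalization.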

Comments Code is available at https://github.com/liuyixin-louis/DiffShortcut

English abstract

Personalized diffusion models (PDMs) have become prominent for adapting pre-trained text-to-image models to generate images of specific subjects using minimal training data. However, PDMs are susceptible to minor adversarial perturbations, leading to significant degradation when fine-tuned on corrupted datasets. These vulnerabilities are exploited to create protective perturbations that prevent unauthorized image generation. Existing purification methods attempt to red-team the protective perturbation to break the protection but often over-purify images, resulting in information loss. In this work, we conduct an in-depth analysis of the fine-tuning process of PDMs through the lens of shortcut learning. We hypothesize and empirically demonstrate that adversarial perturbations induce a latent-space misalignment between images and their text prompts in the CLIP embedding space. This misalignment causes the model to erroneously associate noisy patterns with unique identifiers during fine-tuning, resulting in poor generalization. Based on these insights, we propose a systematic red-teaming framework that includes data purification and contrastive decoupling learning. We first employ off-the-shelf image restoration techniques to realign images with their original semantic content in latent space. Then, we introduce contrastive decoupling learning with noise tokens to decouple the learning of personalized concepts from spurious noise patterns. Our study not only uncovers shortcut learning vulnerabilities in PDMs but also provides a thorough evaluation framework for developing stronger protection. Our extensive evaluation demonstrates its advantages over existing purification methods and its robustness against adaptive perturbations.

2404.03099 2026-05-18 cs.LG cs.AI cs.CE cs.IT math.IT stat.ML

Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks

Leonardo Ferreira Guilhoto, Paris Perdikaris

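AI summary This paper proposes NEON, a neural network architecture for making predictions with uncertainty in infinite-dimensional function spaces, using far fewer parameters than deep ensembles of comparable performance. The study focuses on composite Bayesian optimization, that is, optimizing a composite function formed from an unknown function-valued map and a known functional, and its experiments show that NEON achieves leading optimization results in several scenarios while substantially reducing model complexity.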

Journal ref
Guilhoto, Leonardo Ferreira, and Paris Perdikaris. "Composite Bayesian optimization in function spaces using NEON - Neural Epistemic Operator Networks." Scientific Reports 14.1 (2024): 29199
English abstract

Operator learning is a rising field of scientific computing where inputs or outputs of a machine learning model are functions defined in infinite-dimensional spaces. In this paper, we introduce NEON (Neural Epistemic Operator Networks), an architecture for generating predictions with uncertainty using a single operator network backbone, which presents orders of magnitude fewer trainable parameters than deep ensembles of comparable performance. We showcase the utility of this method for sequential decision-making by examining the problem of composite Bayesian Optimization (BO), where we aim to optimize a function $f=g\circ h$, where $h:X\to C(\mathcal{Y},\mathbb{R}^{d_s})$ is an unknown map which outputs elements of a function space, and $g: C(\mathcal{Y},\mathbb{R}^{d_s})\to \mathbb{R}$ is a known and cheap-to-compute functional. By comparing our approach to other state-of-the-art methods on toy and real-world scenarios, we demonstrate that NEON achieves state-of-the-art performance while requiring orders of magnitude fewer trainable parameters.
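
The composite structure $f=g\circ h$ in this abstract can be made concrete with a toy sketch. Everything here is an assumption for illustration: the grid `Y_GRID`, the sinusoidal `h_true`, the squared-error functional `g`, and the perturbed copies standing in for samples from an uncertainty-aware surrogate. It does not implement NEON; it only shows the known functional being applied to sampled function-valued predictions instead of modelling $f$ directly.

```python
import numpy as np

Y_GRID = np.linspace(0.0, 1.0, 50)             # discretised domain Y (assumption)
TARGET = np.sin(2 * np.pi * 0.3 * Y_GRID)      # hypothetical target profile

def h_true(x):
    """Hypothetical unknown map: scalar design x -> function values on Y_GRID."""
    return np.sin(2 * np.pi * x * Y_GRID)

def g(u):
    """Known, cheap functional: negative mean squared distance to the target."""
    return -float(np.mean((u - TARGET) ** 2))

def f(x):
    """Composite objective f = g(h(x))."""
    return g(h_true(x))

def expected_objective(x, surrogates):
    """Monte Carlo estimate of E[g(h(x))] over sampled surrogate maps."""
    return float(np.mean([g(h_hat(x)) for h_hat in surrogates]))

def pick_next(candidates, surrogates):
    """Choose the candidate with the best expected composite objective."""
    scores = [expected_objective(x, surrogates) for x in candidates]
    return candidates[int(np.argmax(scores))]

# Stand-in for surrogate samples: slightly shifted copies of the true map.
surrogates = [lambda x, d=d: h_true(x + d) for d in (-0.02, 0.0, 0.02)]
```

Calling `pick_next([0.1, 0.3, 0.5, 0.8], surrogates)` selects the design whose predicted output functions lie closest to the target on average.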

2403.13805 2026-05-18 cs.CV cs.AI cs.LG

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

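AI summary This paper proposes RAR, a method for improving the performance of multimodal large language models (MLLMs) on fine-grained and few-shot visual recognition tasks. RAR combines the multimodal retrieval capability of CLIP with the rich knowledge of MLLMs: it builds a multimodal retriever that stores category information beyond the model's context window and, at inference time, retrieves relevant category information for the MLLM to rank and predict from. The method addresses the performance drop MLLMs suffer when faced with many categories and achieves significant gains on several fine-grained and zero-shot recognition benchmarks.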

Comments Project: https://github.com/Liuziyu77/RAR

English abstract

CLIP (Contrastive Language-Image Pre-training) uses contrastive learning from noisy image-text pairs to excel at recognizing a wide array of candidates, yet its focus on broad associations hinders the precision in distinguishing subtle differences among fine-grained items. Conversely, Multimodal Large Language Models (MLLMs) excel at classifying fine-grained categories, thanks to their substantial knowledge from pre-training on web-level corpora. However, the performance of MLLMs declines with an increase in category numbers, primarily due to growing complexity and constraints of limited context window size. To synergize the strengths of both approaches and enhance the few-shot/zero-shot recognition abilities for datasets characterized by extensive and fine-grained vocabularies, this paper introduces RAR, a Retrieving And Ranking augmented method for MLLMs. We initially establish a multi-modal retriever based on CLIP to create and store explicit memory for different categories beyond the immediate context window. During inference, RAR retrieves the top-k similar results from the memory and uses MLLMs to rank and make the final predictions. Our proposed approach not only addresses the inherent limitations in fine-grained recognition but also preserves the model's comprehensive knowledge base, significantly boosting accuracy across a range of vision-language recognition tasks. Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and 2 object detection datasets under the zero-shot recognition setting.
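
The retrieve-then-rank flow in this abstract might look roughly like the sketch below. It assumes the embeddings have already been computed (for example by a CLIP encoder) and replaces the MLLM ranking step with a caller-supplied function; `retrieve_top_k`, `rank_candidates`, and the `ranker` argument are hypothetical names, not the project's API.

```python
import numpy as np

def retrieve_top_k(query, memory, labels, k=3):
    """Return the k labels whose stored embeddings are most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    scores = m @ q
    top = np.argsort(-scores)[:k]          # indices of the k highest similarities
    return [labels[i] for i in top], scores[top]

def rank_candidates(candidates, ranker):
    """Stand-in for the MLLM step: `ranker` reorders the retrieved candidates."""
    ranked = ranker(candidates)
    return ranked[0], ranked
```

In a real pipeline `ranker` would prompt a multimodal model with the image and the retrieved category names; here any callable that returns a reordered list will do.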

2402.10380 2026-05-18 cs.LG cs.AI cs.CL

Subgraph-level Universal Prompt Tuning

Junhyun Lee, Wooseong Yang, Jaewoo Kang

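AI summary How to effectively adapt graph neural networks pre-trained with different strategies remains a challenge. This paper proposes Subgraph-level Universal Prompt Tuning (SUPT), which assigns prompt features at the subgraph level, preserving the method's universality while greatly reducing the number of tuned parameters. Experiments show that SUPT performs well on a variety of downstream tasks, with an average performance improvement of more than 6.6% in few-shot scenarios.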

Journal ref
Information Sciences 749 (2026) 123516
English abstract

In the evolving landscape of machine learning, the adaptation of pre-trained models through prompt tuning has become increasingly prominent. This trend is particularly observable in the graph domain, where diverse pre-training strategies present unique challenges in developing effective prompt-based tuning methods for graph neural networks. Previous approaches have been limited, focusing on specialized prompting functions tailored to models with edge prediction pre-training tasks. These methods, however, suffer from a lack of generalizability across different pre-training strategies. Recently, a simple prompt tuning method has been designed for any pre-training strategy, functioning within the input graph's feature space. This allows it to theoretically emulate any type of prompting function, thereby significantly increasing its versatility for a range of downstream applications. Nevertheless, the capacity of such simple prompts to fully grasp the complex contexts found in graphs remains an open question, necessitating further investigation. Addressing this challenge, our work introduces the Subgraph-level Universal Prompt Tuning (SUPT) approach, focusing on the detailed context within subgraphs. In SUPT, prompt features are assigned at the subgraph level, preserving the method's universal capability. This requires far fewer tuning parameters than fine-tuning-based methods, outperforming them in 42 out of 45 full-shot scenario experiments with an average improvement of over 2.5%. In few-shot scenarios, it excels in 41 out of 45 experiments, achieving an average performance increase of more than 6.6%.

2311.03658 2026-05-18 cs.CL cs.AI cs.LG stat.ML

The Linear Representation Hypothesis and the Geometry of Large Language Models

Kiho Park, Yo Joong Choe, Victor Veitch

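AI summary This paper examines the "linear representation hypothesis", the idea that high-level concepts are represented as linear directions in a representation space. It gives two formalizations of "linear representation", one in the output (word) space and one in the input (sentence) space. By introducing a causal inner product, the authors obtain a non-Euclidean inner product structure that unifies the different notions of linear representation and can be used to construct probes and steering vectors. Experiments show that linear representations of concepts do exist in large language models and that the choice of inner product plays a fundamental role in interpreting and controlling the model.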

Comments Accepted for a presentation at ICML 2024 and an oral presentation at NeurIPS 2023 Workshop on Causal Representation Learning. Code is available at https://github.com/KihoPark/linear_rep_geometry

Journal ref
In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024
English abstract

Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does "linear representation" actually mean? And, how do we make sense of geometric notions (e.g., cosine similarity or projection) in the representation space? To answer these, we use the language of counterfactuals to give two formalizations of "linear representation", one in the output (word) representation space, and one in the input (sentence) space. We then prove these connect to linear probing and model steering, respectively. To make sense of geometric notions, we use the formalization to identify a particular (non-Euclidean) inner product that respects language structure in a sense we make precise. Using this causal inner product, we show how to unify all notions of linear representation. In particular, this allows the construction of probes and steering vectors using counterfactual pairs. Experiments with LLaMA-2 demonstrate the existence of linear representations of concepts, the connection to interpretation and control, and the fundamental role of the choice of inner product.

2212.12130 2026-05-18 cs.CV

Learning to Detect and Segment for Open Vocabulary Object Detection

Tao Wang, Nan Li

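AI summary This work addresses detection and segmentation in open vocabulary object detection and proposes CondHead, a dynamic network design that improves generalization to novel object categories. The core idea is to conditionally parameterize the network heads, using semantic embeddings to guide the model toward class-specific knowledge and thereby produce more accurate box regression and segmentation predictions. The method significantly improves existing open vocabulary detection methods with very little computational overhead.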

Comments We apologize that author Nan Li was not on the published version due to the CVPR 2023 policy that authors cannot be added after the abstract deadline

English abstract

Open vocabulary object detection has been greatly advanced by the recent development of vision-language pretrained models, which help recognize novel objects with only semantic categories. The prior works mainly focus on knowledge transferring to the object proposal classification and employ class-agnostic box and mask prediction. In this work, we propose CondHead, a principled dynamic network design to better generalize the box regression and mask segmentation for the open vocabulary setting. The core idea is to conditionally parameterize the network heads on semantic embedding and thus the model is guided with class-specific knowledge to better detect novel categories. Specifically, CondHead is composed of two streams of network heads, the dynamically aggregated head and the dynamically generated head. The former is instantiated with a set of static heads that are conditionally aggregated; these heads are optimized as experts and are expected to learn sophisticated prediction. The latter is instantiated with dynamically generated parameters and encodes general class-specific information. With such a conditional design, the detection model is bridged by the semantic embedding to offer strongly generalizable class-wise box and mask prediction. Our method brings significant improvement to the state-of-the-art open vocabulary object detection methods with very minor overhead, e.g., it surpasses a RegionCLIP model by 3.0 detection AP on novel categories, with only 1.1% more computation.

1911.05467 2026-05-18 cs.LG cs.NA math.NA

ChebNet: Efficient and Stable Constructions of Deep Neural Networks with Rectified Power Units via Chebyshev Approximations

Shanshan Tang, Bo Li, Haijun Yu

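AI summary This paper proposes ChebNet, an efficient and stable construction of deep neural networks based on Chebyshev polynomial approximation, aimed at improving the approximation of smooth functions. Compared with RePU-activated networks built from power series approximations, ChebNet uses a hierarchical structure of Chebyshev approximation in the frequency domain to obtain a more stable and computationally efficient construction. Experiments show that ChebNet matches the approximation performance of the power series approach while being more stable, and that it can be fine-tuned to obtain better results, offering a practical way to approximate smooth functions efficiently in real applications.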

Comments 6 figures, 3 tables, to appear in Communications in Mathematics and Statistics

Journal ref
Communications in Mathematics and Statistics, 2024
English abstract

In a previous study [B. Li, S. Tang and H. Yu, Commun. Comput. Phys. 27(2):379-411, 2020], it is shown that deep neural networks built with rectified power units (RePU) as activation functions can give better approximation for sufficiently smooth functions than those built with rectified linear units, by converting polynomial approximations using power series into deep neural networks with optimal complexity and no approximation error. However, in practice, power series approximations are not easy to obtain due to the associated stability issue. In this paper, we propose a new and more stable way to construct RePU deep neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in the frequency domain, we obtain an efficient and stable deep neural network construction, which we call ChebNet. The approximation of smooth functions by ChebNets is no worse than the approximation by deep RePU nets using power series. At the same time, ChebNets are much more stable. Numerical results show that the constructed ChebNets can be further fine-tuned to obtain much better results than those obtained by tuning deep RePU nets constructed by the power series approach. As spectral accuracy is hard to obtain by direct training of deep neural networks, ChebNets provide a practical way to obtain spectral accuracy, and are expected to be useful in real applications that require efficient approximations of smooth functions.
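
The stability gap between power series and Chebyshev representations mentioned in this abstract can be seen numerically without any neural network. The sketch below is only an illustration of that gap using NumPy's polynomial module; it does not build a RePU network, and the helper names `basis_condition_numbers` and `cheb_approx_error` are made up for the example.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

def basis_condition_numbers(deg):
    """Condition numbers of the monomial and Chebyshev bases at Chebyshev nodes."""
    x = C.chebpts1(deg + 1)                      # first-kind Chebyshev points
    cond_monomial = np.linalg.cond(P.polyvander(x, deg))
    cond_chebyshev = np.linalg.cond(C.chebvander(x, deg))
    return cond_monomial, cond_chebyshev

def cheb_approx_error(func, deg, n_test=1001):
    """Max error on [-1, 1] of the degree-`deg` Chebyshev interpolant of `func`."""
    coef = C.chebinterpolate(func, deg)
    x = np.linspace(-1.0, 1.0, n_test)
    return float(np.max(np.abs(C.chebval(x, coef) - func(x))))
```

For a moderate degree such as 30, the monomial Vandermonde matrix is severely ill-conditioned while the Chebyshev one stays well-conditioned, which is the practical obstacle to obtaining power series coefficients accurately.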

2605.16255 2026-05-18 cs.DC cs.AI

Designing Datacenter Power Delivery Hierarchies for the AI Era

Grant Wilkins, Fiodar Kazhamiaka, Alok Gautam Kumbhare, Chaojie Zhang, Ricardo Bianchini

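AI summary This paper studies the challenges of designing datacenter power delivery hierarchies in the AI era and proposes an evaluation framework that combines throughput, power, and cost metrics to analyze how multi-resource stranding affects deployable capacity, capital expenditure, and performance.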

English abstract

Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1MW per deployment by 2027. This poses a major challenge for datacenter power delivery designers. As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned. Designs must remain efficient over long datacenter lifetimes and multiple hardware generations. Power utilization is particularly important as grid power capacity is a scarce resource in the AI era. Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix. Moreover, these factors evolve over time, have inter-dependencies across multiple resource dimensions, and generally do not lend themselves to closed-form analysis. To address this challenge, we develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences. The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure. Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance, and quantify how rising density from rack- and pod-scale AI systems shapes these outcomes. For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.

2605.16245 2026-05-18 cs.CY cs.AI cs.CL cs.LG cs.SI

AI-Mediated Communication Can Steer Collective Opinion

Stratis Tsirtsis, Kai Rawal, Chris Russell, Brent Mittelstadt, Sandra Wachter

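AI summary This paper studies how AI affects collective opinion formation when it mediates human-to-human communication. Through empirical and theoretical analyses, it shows how directional biases introduced by AI can be amplified through the network and shift collective opinion, and it investigates whether platforms can control such biases.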

English abstract

Generative artificial intelligence (AI) is increasingly integrated into the online platforms where humans exchange opinions; large language models (LLMs) now polish users' posts on LinkedIn and provide context for content shared on X. While prior work has shown that AI can express biased opinions and shape individuals' opinions during human-AI interactions, less attention has been paid to its influence on collective opinion formation when mediating human-to-human communication. We address this gap via a combination of empirical and theoretical analyses. We show empirically that LLMs from multiple popular families introduce directional biases when instructed to edit human-written texts on contested topics, for example, nudging texts in favor of gun control and against atheism. Building on this observation, we introduce a mathematical model of opinion dynamics in which an AI system sits between users on a social network, transforming the opinions they express and perceive. By analytically characterizing the equilibrium of this model and performing simulations on real social network data, we show that biases introduced by AI in human-to-human communication can be amplified through the network and shift collective opinion in their direction. In light of these findings, we investigate whether such biases are controllable by online platforms. We audit the "Explain this post" feature on X and find evidence of pro-life bias in Grok's outputs on abortion-related content, which we trace back to specific design choices. We conclude with a discussion of the broader implications of our findings in relation to ongoing legislative efforts in the European Union.

2605.16230 2026-05-18 cond-mat.mtrl-sci cs.LG

Universal Magnetic Structure Prediction from Atomic Coordinates with Near-Experimental Accuracy

Abhijatmedhi Chotrattanapituk, Ryotaro Okabe, Eunbi Rha, Mariya Al-Hinai, Eugene Jiang, Daniel Pajerowski, Yongqiang Cheng, Joshua J. Turner, Mingda Li

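AI summary This paper proposes the magnetic structure network (MSN), which predicts magnetic structures directly from atomic crystal structures. It uses the primitive modulated structure representation (PMSR) to encode modulated structures in a unified way, achieving high-accuracy magnetic structure prediction and offering a new approach to the discovery of magnetic materials.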

Comments 9 pages, 3 figures

English abstract

Magnetic order is a fundamental property of materials, governing collective behavior and enabling a broad range of functionalities. Yet magnetic structure remains difficult to determine: experiments are costly and specialized, while first-principles methods often struggle with the noncollinear and incommensurate orders found in real materials. Here we introduce the magnetic structure network (MSN), an E(3)-equivariant graph neural network that predicts both collinear and noncollinear magnetic structures directly from atomic crystal structures and is trained on experimentally determined structures from MAGNDATA. By proposing the primitive modulated structure representation (PMSR), we are able to encode commensurate and incommensurate structures in a unified way without symmetry assumptions. The model achieves strong performance across all modulation components and reconstructs experimental magnetic structures with high fidelity. Our approach provides a scalable framework for rapid magnetic structure prediction and opens a route to data-driven discovery of magnetic materials.

2605.16208 2026-05-18 stat.ML cs.LG

A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature

Chaeyeon Lee, Sehwan Kim, Hyungrok Do

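AI summary This paper proposes QSurv, which uses Gauss-Legendre numerical quadrature to enable nonparametric continuous-time survival modeling without time discretization or restrictive distributional assumptions. It captures non-stationary hazard dynamics effectively, and experiments show advantages in instantaneous hazard function estimation.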

English abstract

Flexible continuous-time survival modeling is critical for capturing complex time-varying hazard dynamics in high-dimensional data; however, training such models remains challenging due to the intractable integral required for likelihood estimation. We introduce QSurv, a scalable deep learning framework that enables nonparametric continuous-time modeling without relying on time discretization or restrictive distributional assumptions. We propose a training objective based on Gauss-Legendre numerical quadrature, which approximates the cumulative hazard with high-order accuracy while facilitating efficient end-to-end training via standard backpropagation. Furthermore, to effectively capture non-stationary hazard dynamics in complex architectures, we introduce time-conditioned low-rank adaptation, a mechanism that conditions general neural backbones on time by dynamically modulating weights via low-rank updates. We provide theoretical analysis establishing approximation error bounds for cumulative-hazard evaluation. Comprehensive experiments across synthetic benchmarks, large-scale real-world tabular datasets, and high-dimensional medical imaging tasks demonstrate that QSurv achieves competitive predictive performance with advantages in instantaneous hazard function estimation, enabling more interpretable characterization of time-varying risk patterns.
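
The quadrature idea in this abstract can be illustrated with a short sketch: the cumulative hazard $H(t)=\int_0^t h(s)\,ds$ is replaced by a weighted sum of hazard evaluations at Gauss-Legendre nodes, which then enters the standard right-censored log-likelihood. The function names and the use of a plain Python callable for the hazard are assumptions for the example; how QSurv parameterizes the hazard is not shown here.

```python
import numpy as np

def cumulative_hazard(hazard, t, n_nodes=16):
    """Approximate H(t) = integral of hazard(s) over [0, t] by Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    s = 0.5 * t * (nodes + 1.0)              # map nodes from [-1, 1] to [0, t]
    return float(0.5 * t * np.sum(weights * hazard(s)))

def neg_log_likelihood(hazard, t, event, n_nodes=16):
    """Right-censored NLL for one subject: -event * log h(t) + H(t)."""
    log_h = float(np.log(hazard(np.array([t]))[0]))
    return -event * log_h + cumulative_hazard(hazard, t, n_nodes)
```

Because the sum is a fixed linear combination of hazard evaluations, it stays differentiable with respect to whatever parameters define `hazard`, which is what makes end-to-end training by backpropagation possible in a framework that supports automatic differentiation.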

2605.16194 2026-05-18 cs.DL cs.AI cs.IR cs.MA

paper.json: A Coordination Convention for LLM-Agent-Actionable Papers

Arquimedes Canedo

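AI summary This paper proposes the paper.json file, which addresses recurring failures of LLM agents reading academic papers through conventions such as stable claim IDs, an explicit does-not-claim list, exact per-figure commands, and stable definition IDs.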

English abstract

LLM agents routinely serve as first (and sometimes only) readers of academic papers, skimming for sub-claims, extracting reproducibility steps, and generalizing scope. Standard prose papers produce recurring failures in this role: sub-claims that cannot be cited at sub-paper granularity, scope overextension beyond what the paper tests, and figure commands buried in codebases rather than the paper itself. We propose `paper.json`, a companion JSON file that travels with the PDF and addresses each failure with a lightweight convention: stable claim IDs (C1), an explicit does-not-claim list (C2), exact per-figure shell commands (C3), and stable definition IDs (C5). A fifth convention (C4) holds that minimum viable compliance, hand-written JSON alongside the PDF, is achievable in under an hour for a finished paper without touching the human-readable output. C1, C2, C3, and C5 are open invitations: an agent that reads a compliant paper and acts on it produces evidence for or against them. This paper is itself compliant: `uv run validator.py paper.json --against paper.typ` passes. Repo: https://github.com/arquicanedo/paper-json

2605.16184 2026-05-18 cs.DC cs.LG

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

Yishun Lu, Junhao Zhang, Zeyu Yang, Wes Armour

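AI summary This paper presents Asteria, a system that separates second-order optimization logic from the GPU training path to address the systems cost of maintaining large matrix-based optimizer states, improving the training efficiency of large language models in memory-constrained and distributed training settings.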

English abstract

Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce Asteria, a runtime system designed to remove this bottleneck by separating second-order optimization logic from the critical GPU training path. Rather than keeping all preconditioner state on the accelerator, Asteria dynamically distributes optimizer state across GPU memory, CPU memory, and optional NVMe storage according to architectural constraints and runtime pressure. It further uses training hooks to prepare shadow states in advance, allowing expensive inverse-root computations to proceed asynchronously on the host while GPU computation continues. For distributed training, Asteria employs a bounded-staleness protocol that limits synchronization frequency while preserving optimizer effectiveness through topology-aware coordination. We evaluate Asteria on both memory-constrained and distributed training settings. On a DGX Spark platform with a single GB10 GPU and 128GB unified memory, Asteria supports second-order training for a 1B-parameter language model. On multi-node GH200 systems, it lowers visible optimizer overhead, reduces recurring latency spikes, accelerates convergence in wall-clock time, and maintains the optimization advantages of SOAP and KL-Shampoo in a 7B-parameter language model. Our results suggest that second-order LLM training can be made practical not by simplifying the optimizer alone, but by rethinking how optimizer state, background computation, and distributed synchronization are managed at the runtime level.

2605.16145 2026-05-18 stat.ML cs.LG

Skew-adaptive conformal prediction

Paulo C. Marques F., Helton Graziadei

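AI summary This paper proposes a skew-adaptive conformal prediction method. It builds a conformity score from an asymmetric interval family via the gauge approach and trains an additional predictive model on an inverse hyperbolic sine transform to capture how uncertainty tilts across the feature space. The method preserves finite-sample marginal validity while adapting to local scale and skewness.

Comments 17 pages, 2 figures

AI abstract (translated from Chinese)

We develop a skew-adaptive extension of split conformal prediction for regression. The method starts from an asymmetric interval family centered at a point prediction and uses the gauge approach to derive the conformity score induced by this family. The inverse hyperbolic sine transform of the signed scaled residuals provides the training target for an additional predictive model, whose role is to learn how predictive uncertainty should tilt across the feature space. The resulting procedure preserves finite-sample marginal validity under exchangeability while producing intervals that adapt to local scale and local skewness. We also develop a calibration-sample-based estimator for comparing the expected relative width of the skew-adaptive and classical scaled-score intervals. Experiments on a variety of datasets show gains in prediction interval efficiency over the scaled-score construction and conformalized quantile regression, and show that the proposed estimator closely matches the corresponding average width ratio observed on the test sample.

English abstract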

We develop a skew-adaptive extension of split conformal prediction for regression. The method starts from an asymmetric interval family centered at a point prediction and uses the gauge approach to deduce the conformity score induced by this family. The inverse hyperbolic sine transform of signed scaled residuals provides the training target for an additional predictive model, whose role is to learn how predictive uncertainty should tilt across the feature space. The resulting procedure preserves the finite-sample marginal validity of split conformal prediction under exchangeability, while producing intervals that adapt to both local scale and local skewness. We also develop a calibration-sample-based estimator for comparing the expected relative future width of the skew-adaptive and classical scaled-score intervals. Experiments on a variety of datasets indicate gains in prediction interval efficiency over the scaled-score construction and conformalized quantile regression, and show that the proposed estimator closely matches the corresponding average width ratio observed on the test sample.
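As a point of reference for the abstract above, the following is a minimal sketch of the classical scaled-score split conformal interval that the paper uses as a baseline, together with the inverse hyperbolic sine target computed from signed scaled residuals. It is not the authors' skew-adaptive construction: the gauge-derived score and the asymmetric interval family are not specified in the abstract, so they are omitted. The function names, and the assumption that a point predictor `mu` and a positive scale estimate `sigma` have already been fitted on separate training data, are illustrative.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample split conformal quantile.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or infinity when the calibration set is too small for the requested level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def scaled_score_interval(cal_y, cal_mu, cal_sigma, mu_new, sigma_new, alpha=0.1):
    """Classical symmetric interval from scores |y - mu| / sigma (baseline only)."""
    scores = [abs(y - m) / s for y, m, s in zip(cal_y, cal_mu, cal_sigma)]
    q = conformal_quantile(scores, alpha)
    return mu_new - q * sigma_new, mu_new + q * sigma_new

def asinh_tilt_targets(y, mu, sigma):
    """asinh of signed scaled residuals: a possible training target for a tilt model.

    How the paper maps this target into an asymmetric interval is an open
    assumption here and is not implemented.
    """
    return [math.asinh((yi - mi) / si) for yi, mi, si in zip(y, mu, sigma)]
```

The symmetric interval has the same half-width on both sides of the point prediction, which is the limitation the skew-adaptive method is described as addressing.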

2605.16114 2026-05-18 cs.NE cs.LG

Scalable neuromorphic computing from autonomous spiking dynamics in a clockless reconfigurable chip

Eric Oliveira Gomes, Damien Rontani

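AI summary This paper proposes a scalable neuromorphic architecture based on spiking dynamics that emerge from the autonomous, time-continuous evolution of clockless digital circuits. Networks of configurable Boolean spiking neurons are implemented on FPGAs, and the system processes spike-encoded data efficiently on an audio classification task while consuming significantly less power than traditional digital approaches.

AI abstract (translated from Chinese)

We propose a scalable neuromorphic architecture based on spiking dynamics arising from the autonomous, time-continuous evolution of clockless (asynchronous) digital circuits. Implemented on commercially available field-programmable gate arrays (FPGAs), our system realizes networks of interacting Boolean spiking neurons with configurable excitatory and inhibitory synaptic weights. A complete processing pipeline handles spike-encoded data efficiently to solve machine-learning tasks. We demonstrate competitive performance on an audio classification task with spike-based encoding. Power consumption is significantly lower than in traditional digital implementations, which makes our approach an efficient alternative that bridges the gap to dedicated analog neuromorphic systems without requiring specialized hardware design. More generally, our approach establishes clockless digital hardware as a viable platform for neuromorphic computing and paves the way for turning reconfigurable chips into energy-efficient quasi-analog neuromorphic processors.

English abstract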

We propose a scalable neuromorphic architecture based on spiking dynamics emerging from the autonomous time-continuous evolution of clockless (asynchronous) digital circuits. Implemented on commercially available field-programmable gate arrays (FPGAs), our system implements networks of interacting Boolean spiking neurons with configurable excitatory and inhibitory synaptic weights. A complete processing pipeline enables efficient handling of spike-encoded data for solving machine-learning tasks. We demonstrate competitive performance for an audio classification task with spike-based encoding and high-speed processing. Power consumption is significantly lower than traditional digital implementations; this makes our approach an efficient alternative that bridges the gap to dedicated analog neuromorphic systems without the need for specialized hardware design. More generally, our approach establishes clockless digital hardware as a viable platform for neuromorphic computing. It paves the way for reconfigurable chips to be turned into energy-efficient quasi-analog neuromorphic processors.

2605.16094 2026-05-18 cs.IT cs.AI math.IT

GeoGS-CE: Learning Delay--Beam Channel Priors with 3D Gaussians for High-Mobility Scenarios

Yumeng Zhang, Jiajia Guo, Chaozheng Wen, Chenghong Bian, Jun Zhang

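AI summary This paper proposes the GeoGS-CE framework, which models channel characteristics in high-mobility scenarios with 3D Gaussians and uses the delay-beam power spectrum as a prior to improve channel frequency response reconstruction under sparse pilots.

AI abstract (translated from Chinese)

Wideband channel estimation (CE) remains challenging in high-mobility scenarios because channel responses vary rapidly, while practical systems can allocate only sparse pilots to accommodate dense users. Fortunately, many high-mobility environments, such as high-speed railways, exhibit scheduled trajectories, predictable velocities, and a limited number of dominant propagation paths. These properties induce a delay-beam power spectrum that is more stable than the instantaneous complex channel frequency response (CFR), less sensitive to random phase coherence, and rich in geometric information. To exploit these environmental properties, we propose GeoGS-CE, a two-stage channel estimation framework for sparse-pilot high-mobility scenarios. In the offline stage, GeoGS-CE jointly models: 1) a scene-level 3D Gaussian representation that captures the non-line-of-sight (NLoS) geometric scattering support, and 2) a leakage-aware differentiable wireless rendering process that maps the NLoS Gaussians, together with an explicit virtual line-of-sight (LoS) component, to the measured delay-beam power spectrum while accounting for practical OFDM delay and array leakage effects. In the online stage, the delay-beam power spectrum is predicted for each user location and used as a strong covariance prior, enabling accurate full-band and full-array CFR reconstruction and tracking through a linear MMSE estimator. Simulations based on channels generated from the Guangshen high-speed railway show that the proposed geometric prior substantially improves CFR reconstruction over pilot-only and non-geometric baselines.

English abstract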

Wideband channel estimation (CE) in high-mobility scenarios remains challenging because channel responses vary rapidly, while practical systems can allocate only sparse pilots to accommodate dense users. Fortunately, many high-mobility environments, such as high-speed railways, exhibit scheduled trajectories, predictable velocities, and a limited number of dominant propagation paths. These properties induce a delay--beam power spectrum that is more stable than the instantaneous complex channel frequency response (CFR), less sensitive to the random phase coherence, and rich in geometric information. To exploit such environmental properties, we propose GeoGS-CE, a two-stage channel estimation framework for sparse-pilot high-mobility scenarios. In the offline stage, GeoGS-CE jointly models: 1) a scene-level 3D Gaussian representation that captures the non-line-of-sight (NLoS) geometric scattering support, and 2) a leakage-aware differentiable wireless rendering process that maps the NLoS Gaussians, together with an explicit virtual line-of-sight (LoS) component, to the measured delay--beam power spectrum, while accounting for practical OFDM delay and array leakage effects. In the online stage, the delay--beam power spectrum is predicted for each user location and used as a strong covariance prior, enabling accurate full-band and full-array CFR reconstruction and tracking through a linear MMSE estimator. Simulations based on channels generated from a segment of the Guangshen high-speed railway show that the proposed geometric prior substantially improves CFR reconstruction over pilot-only and non-geometric baselines.
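The role of a power spectrum as a covariance prior can be illustrated with the simplest form of the linear MMSE estimator. For a zero-mean channel vector h with covariance R, observed as y = A h + n with noise variance σ², the standard LMMSE estimate is R Aᴴ (A R Aᴴ + σ² I)⁻¹ y. The sketch below covers only the special case in which each delay-beam coefficient is observed directly (A = I) and the prior is diagonal, so the estimate reduces to a per-coefficient shrinkage. This simplification, the function name, and the argument names are assumptions for illustration; the paper's estimator works from sparse pilots and its exact covariance construction is not given in the abstract.

```python
def lmmse_diagonal(y, prior_power, noise_var):
    """Per-coefficient LMMSE estimate for y_i = h_i + n_i.

    prior_power[i] is the prior variance of h_i (e.g. a predicted power
    spectrum value); noise_var must be positive. Works for real or complex y.
    """
    return [p / (p + noise_var) * yi for yi, p in zip(y, prior_power)]

# Coefficients with a strong prior are kept; those with no prior power are zeroed.
observed = [2.0, 2.0, 1.0 + 1.0j]
prior = [1.0, 0.0, 3.0]
print(lmmse_diagonal(observed, prior, noise_var=1.0))
```

Coefficients where the prior predicts little power are suppressed, which is why an accurate predicted spectrum can help when few pilots are available.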

2605.16090 2026-05-18 cs.CR cs.CV

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

Hao Yang, Zhuo Ma, Yang Liu, Yilong Yang, Guancheng Wang, JianFeng Ma

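AI summary This paper proposes the CrossMPI attack, which achieves cross-modal prompt injection through image-only perturbation. It moves the optimization into the model's hidden state space and adds a layer selection strategy and a distance-decremental perturbation budget strategy, which improve attack performance.

AI abstract (translated from Chinese)

Large vision-language models (LVLMs) have become a powerful paradigm for multimodal intelligence, but their growing deployment also expands the attack surface for prompt injection. Despite growing concern, existing attacks still suffer from a critical limitation: the injected prompt steers the model's interpretation of a single input only. Alternatively, these attacks are multimodal but fail to achieve cross-modal prompt perturbation. To address this, we introduce CrossMPI, a novel cross-modal prompt injection attack that steers the model's interpretation of both textual and visual inputs through image-only prompt injection. Our design rests on the following key advances. First, we shift the focus of injected-prompt perturbation optimization from the visual embedding space (typically only 10^5 parameters) to the model's hidden state space (used for multimodal information integration, with 10^7 parameters). Two strategies then mitigate the optimization challenges posed by the larger parameter space. To constrain the optimized model parameter space, we introduce a layer selection strategy that identifies the layers most critical to multimodal integration. Interestingly, unlike past experience, our analysis shows that the optimal layers for LVLM prompt perturbation lie in the middle of the model rather than at the end. To constrain the image perturbation space, we propose a new distance-decremental perturbation budget assignment strategy that allocates less budget as the pixel distance to semantically critical regions increases. Extensive experiments across multiple LVLMs and datasets show that our method significantly outperforms baseline approaches.

English abstract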

Large vision-language models (LVLMs) have emerged as a powerful paradigm for multimodal intelligence, but their growing deployment also expands the attack surface of prompt injection. Despite this growing concern, existing attacks still suffer from a critical limitation: the injected prompt for one modality only steers the model's interpretation of that singular input. Alternatively, these attacks remain multimodal but fail to achieve cross-modal prompt perturbation. To bridge this gap, we introduce a novel cross-modal prompt injection attack CrossMPI, which can steer the model's interpretation of both textual and visual inputs via image-only prompt injection. Our design is underpinned by the following key breakthroughs. First, we turn the focus of the injected prompt perturbation optimization from the visual embedding space (typically with only $10^5$ parameters) to the model hidden state space (for multimodal information integration and with $10^7$ parameters). Then, two strategies are adopted to mitigate the optimization challenges posed by the larger parameter space. To constrain the optimized model parameter space, we introduce a layer selection strategy that identifies the layers most critical to multimodal integration. Interestingly, deviating from the past experience, our analysis reveals that the optimal layers for LVLM prompt perturbation reside in the middle of the model rather than the last. To constrain the image perturbation space, we propose a new distance-decremental perturbation budget assignment strategy that allocates budgets decrementally as the pixel distance to semantic-critical regions increases. Extensive experiments across multiple LVLMs and datasets show that our method significantly outperforms baseline approaches.

2605.16085 2026-05-18 cs.DB cs.AI

Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks

Jingcheng Wu, Ratan Bahadur Thapa, Mojtaba Nayyeri, Lucas Etteldorf, Max Finkenbeiner, Fabian Leeske, Steffen Staab

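AI summary This paper proposes a hybrid architecture that combines language models with graph neural networks and models databases as relational entity graphs to improve predictive performance on relational databases. Experiments show results close to supervised baselines and a narrower gap to RDL.

Comments 15 pages, 7 figures, 4 tables. Preprint of a paper accepted at the 1st Workshop on Extraction from Triplet Text-Table-Knowledge Graph and associated Challenge (TRIPLET), co-located with ESWC 2026

AI abstract (translated from Chinese)

Relational databases store large amounts of structured information and are essential for complex predictive applications. However, deep learning progress on relational data has been limited: conventional approaches flatten databases into single tables through manual feature engineering and lose relational context. Relational deep learning (RDL) models databases as relational entity graphs (REGs) to be processed by graph neural networks (GNNs), but it is task- and database-specific. To combine the strengths of both paradigms, this paper proposes a hybrid architecture that combines a fine-tuned BART encoder, which captures intra-row semantics, with a GraphSAGE-based GNN over REGs, which injects relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, reaching a ROC-AUC of 67.40 on the driver-dnf task of the rel-f1 dataset. This performance is comparable to supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to 5.22 points, although a large gap remains to state-of-the-art foundation models such as KumoRFM (82.63). These results suggest that lightweight hybrid LM-GNN architectures offer a promising and resource-efficient path toward foundation models for relational databases.

English abstract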

Relational databases store much of the world's structured information, and they are essential for driving complex predictive applications. However, deep learning progress on relational data remains limited, as conventional approaches flatten databases into single tables via manual feature engineering, discarding relational context. Relational deep learning (RDL) addresses this by modeling databases as relational entity graphs (REGs) for graph neural networks (GNNs), but remains task- and database-specific. To combine the strengths of both paradigms, we propose a hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over REGs to inject relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. This performance is competitive with supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to within 5.22 points, though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63). These results suggest that lightweight hybrid LM-GNN architectures offer a promising and resource-efficient path towards foundation models for relational databases.

2605.16078 2026-05-18 stat.ML cs.LG

A numerical study into neural network surrogate model performance for uncertainty propagation

Noah Wade, Kirubel Teferra

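AI summary This paper studies how well neural network surrogate models capture the full distribution of solution fields over the entire probability space, with particular attention to the tails, and compares fully connected networks with Deep Operator Networks on the heat conduction equation.

AI abstract (translated from Chinese)

Neural network surrogate models have emerged as a promising approach for modeling the solution fields of the various boundary value problems encountered in physical modeling. Stochastic problems are of particular interest because surrogates can significantly reduce the repeated evaluation of expensive forward models with traditional numerical solvers during parametric analysis. However, many studies in the literature focus mainly on the ability of neural network surrogates to represent deterministic samples or mean field solutions, and overlook surrogate performance at the tails of the distribution. This paper examines in detail the ability of neural network surrogate models to capture the full distribution of solution fields over the entire probability space, with emphasis on the tails. The canonical problem is the heat conduction equation with a highly stochastic source term, which produces extreme variation in the thermal solution field. A classic feed-forward fully connected network is compared with a Deep Operator Network architecture, using both data-driven and physics-informed loss functions. The results show that worst-case prediction errors are an order of magnitude larger than the mean field error, which highlights the importance of outlier samples. The larger errors associated with extreme samples arise because the networks must extrapolate beyond the range of the training data. The paper presents a method for identifying these samples and discusses potential ways to handle their errors. Among the models considered, the fully connected neural network trained with a weak form residual loss handles these extrapolated inputs best and achieves the highest prediction accuracy on the numerically generated datasets.

English abstract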

Neural network surrogate models have emerged as a promising approach to model solution fields for a wide variety of boundary value problems encountered in physical modeling. Stochastic problems represent an area of particularly high interest because of the potential to significantly reduce the repeated evaluation of expensive forward models via traditional numerical solvers when conducting parametric analysis. However, many studies found in the literature primarily focus on the ability of neural network surrogate models to represent deterministic samples or mean field solutions and largely overlook surrogate model performance at the tails of the distribution. The present study examines in detail the ability of neural network surrogate models to capture the full distribution of solution fields over the entire probability space, with emphasis placed on the tails of the distribution. Serving as a canonical problem is the heat conduction equation with a highly stochastic source term, inducing extremely large variation in the thermal solution field. Comparisons are made between a classic feed-forward fully connected network and a Deep Operator Network architecture, using both data-driven and physics-informed loss functions. Results show that the worst-case prediction errors are an order of magnitude larger than the mean field error, highlighting the importance of the outlier samples. The large errors associated with extreme samples result from the networks having to extrapolate beyond the bounds of the training data. A method for identifying these samples is presented along with a discussion of potential approaches to account for their errors. Among the models considered, the fully connected neural network trained using a weak form residual loss performs best in handling these extrapolated inputs, achieving the highest prediction accuracy for the numerically produced datasets.

2605.16046 2026-05-18 cs.SE cs.AI

XSearch: Explainable Code Search via Concept-to-Code Alignment

Yiming Liu, Ruofan Liu, Yun Lin, Zicong Zhang, Weiyu Kong, Pengnian Qi, Xiao Cheng, Weinan Zhang, Qianxiang Wang, Linpeng Huang

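AI summary This paper proposes the XSearch framework, which recasts code search as a concept alignment problem to improve explainability and generalization, with substantial gains on out-of-distribution benchmarks.

Comments Accepted to ISSTA 2026

AI abstract (translated from Chinese)

Semantic code search is widely used in academia and industry. These approaches embed natural-language queries and code snippets into a shared embedding space and retrieve results by vector similarity. Despite strong performance on benchmark datasets, they often suffer from poor explainability and generalization. Retrieved code may look semantically similar yet miss critical functional requirements of the query, with no explanation of why the result was selected. These failures become more severe under distribution shift, where models struggle to generalize to unseen benchmarks. This paper proposes XSearch, an intrinsically explainable code search framework. Our key insight is that, by relying on global embedding similarity, existing retrievers inherently take an inductive view: they learn statistical patterns rather than truly understanding the query's functional requirements. We address this by reformulating code search as a deductive concept alignment problem. XSearch (i) identifies functional concepts in the query and (ii) explicitly aligns these concepts with the corresponding code statements. This explain-then-predict design yields inherent concept-level explanations and mitigates the shortcut learning that harms out-of-distribution generalization. We train an encoder with explicit concept-alignment objectives and retrieve through explicit matching between query concepts and code statements. Experiments show that XSearch, trained on CodeSearchNet with GraphCodeBERT (125M parameters), improves performance on out-of-distribution benchmarks from 0.02 to 0.33 (15x) over eight state-of-the-art retrievers, and consistently outperforms baselines with up to 7B parameters. A user study shows that concept-alignment explanations let users evaluate retrieved results faster and more accurately.

English abstract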

Semantic code search has been widely adopted in both academia and industry. These approaches embed natural-language queries and code snippets into a shared embedding space and retrieve results based on vector similarity. Despite strong performance on benchmark datasets, they often suffer from poor explainability and generalization. Retrieved code may appear semantically similar yet miss critical functional requirements of the query, while providing no explanation of why the result was retrieved. Moreover, such failures become more severe under distribution shift, where models struggle to generalize to unseen benchmarks. In this work, we propose XSearch, an intrinsically explainable code search framework. Our key insight is that by relying on global embedding similarity, existing retrievers inherently take an inductive view. They learn statistical patterns rather than truly understanding the query's functional requirements. We address this problem by reformulating code search as a deductive concept alignment problem. XSearch (i) identifies functional concepts in the query and (ii) explicitly aligns them with corresponding code statements. This explain-then-predict design produces inherent concept-level explanations and mitigates shortcut learning that harms out-of-distribution generalization. We train an encoder with explicit concept-alignment objectives and perform retrieval through explicit matching between query concepts and code statements. Experiments show that, trained on CodeSearchNet using GraphCodeBERT (125M parameters), XSearch improves performance on out-of-distribution benchmarks from 0.02 to 0.33 (15x) over eight state-of-the-art retrievers, and consistently outperforms both encoder- and decoder-based baselines with up to 7B parameters. A user study demonstrates that concept-alignment explanations enable users to evaluate retrieved results faster and more accurately.

2605.16041 2026-05-18 stat.ML cs.LG

Explainable AI Isn't Enough! Rethinking Algorithmic Contestability

Timo Freiesleben, Kristof Meding, Gunnar König

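AI summary This paper argues for the importance of algorithmic contestability, proposes a new definition, shows that traditional XAI methods are insufficient for challenging algorithmic decisions, and identifies three types of evidence that support reversing a decision.

AI abstract (translated from Chinese)

Machine learning systems increasingly influence life-changing decisions about individuals, such as loan approvals, hiring, and cheating detection, which raises the question of how to respond to adverse decisions made by these systems. While explainable AI (XAI) has focused mainly on algorithmic recourse, the problem of algorithmic contestability has received far less attention. This paper proposes a formal definition of contestability as an algorithmic problem, starting from the presumption that a decision may be incorrect, and identifies three types of evidence for challenging and overturning decisions.

English abstract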

Machine learning systems increasingly make life-changing decisions about individuals, such as loan approvals, hiring, and cheating detection, raising a pressing question: how can individuals respond to negative decisions made by these opaque systems? While explainable artificial intelligence (XAI) has largely focused on algorithmic recourse -- helping individuals change their features to obtain a desired outcome -- the parallel problem of algorithmic contestability -- helping individuals review and correct erroneous algorithmic decisions -- has received far less attention, despite its central ethical and legal importance. We trace this neglect to the absence of clear formal definitions and a systematic operationalization of contestability as an algorithmic problem. To address it, we propose an operational definition of contestability as a natural complement to recourse: contestability starts from the presumption that a decision may be incorrect and focuses on identifying evidence to challenge and potentially overturn it, whereas recourse assumes the decision is valid and instead provides pathways for changing it. We show that standard XAI explanations, such as counterfactuals, LIME, or Anchors, even when combined with human intuitions about decision continuity or monotonicity, reveal only errors in the neighborhood of the individual, but provide insufficient grounds for overturning the decision at hand. Going thus beyond traditional XAI, we identify three types of evidence warranting reversal according to the decision maker's own ethical standards: predictive multiplicity, incorrect feature values, and neglected overruling evidence. We argue that these render decisions normatively indefensible and thus successfully contestable. Finally, we analyze how existing EU legislation connects to our framework and argue that individuals already hold some legal rights to these forms of evidence.

2605.16035 2026-05-18 cs.CR cs.AI cs.MA

Who Owns This Agent? Tracing AI Agents Back to Their Owners

Ruben Chocron, Doron Jonathan Ben Chayim, Eyal Lenga, Gilad Gressel, Alina Oprea, Yisroel Mirsky

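AI summary This paper proposes a canary-based method for attributing agents to their owners, addressing the inability to trace malicious or misconfigured agents back to their operators, and demonstrates its reliability and robustness in practical scenarios.

Comments Under Review

AI abstract (translated from Chinese)

AI agents are increasingly deployed to act autonomously in the world, yet there is still no reliable way to trace a harmful agent back to the account that deployed it. This paper defines this gap as the agent attribution problem: linking an observed agent interaction to the responsible account at the hosting vendor. We propose a canary-based protocol in which an authorized party injects a canary into the agent's interaction stream and the vendor searches a narrow window of session logs to recover the originating session and account. Simple canaries suffice in non-adversarial settings. For adversarial operators who filter or paraphrase content, we develop robust canary constructions that cannot be suppressed without degrading the agent's own task performance, which gives the defender a formal asymmetric advantage. We evaluate a variety of scenarios, including real-world agents, and show that our attribution method is reliable, robust, and scalable for vendor-side deployment.

English abstract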

AI agents are increasingly deployed to act autonomously in the world, yet there is still no reliable way to trace a harmful agent back to the account that deployed it. This creates the same accountability gap across both ends of the intent spectrum: benign operators may deploy misconfigured or overbroad agents that cause harm unintentionally, while malicious operators may deliberately weaponize agents for scams, harassment, or cyber attacks. In many cases, these agents are powered by vendor-hosted models, a dependency that holds even for sophisticated adversaries such as state actors conducting cyber operations. In either case, affected parties can observe the behavior but cannot notify the responsible operator, stop the session, or identify the account for investigation. We formalize this gap as the problem of agent attribution: linking an observed agent interaction to the responsible account at the hosting vendor. To our knowledge, this is the first work to define the problem and present a practical solution. Our protocol is canary-based: an authorized party injects a canary into the agent's interaction stream, and the vendor searches a narrow window of session logs to recover the originating session and account. Simple canaries suffice in non-adversarial settings. For adversarial operators who filter or paraphrase incoming content, we develop robust canary constructions that cannot be suppressed without degrading the agent's own task performance, yielding a formal asymmetry in the defender's favor. We evaluate a variety of scenarios including real-world agents and show that our attribution method is reliable, robust, and scalable for vendor-side deployment.
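The non-adversarial case described above (inject a canary, then search session logs) can be sketched in a few lines. Everything here is hypothetical: the abstract does not specify the canary format, how it is delivered to the agent, or the vendor's log schema, and the robust constructions for adversarial operators are not attempted. The sketch assumes the authorized party and the vendor share a secret key, and that session logs are available as plain text keyed by a session identifier.

```python
import hashlib
import hmac

def make_canary(key: bytes, incident_id: str) -> str:
    """Derive an unguessable marker for one incident (hypothetical format)."""
    tag = hmac.new(key, incident_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return f"ref-{tag}"

def find_sessions(canary: str, session_logs: dict) -> list:
    """Return the session ids whose logged input contains the canary verbatim."""
    return [sid for sid, text in session_logs.items() if canary in text]

# Hypothetical usage: the canary is embedded in content the agent will read,
# and the vendor later scans a narrow time window of its own logs.
key = b"shared-secret-for-illustration-only"
canary = make_canary(key, "incident-1")
logs = {
    "session-a": "user asked for weather",
    "session-b": "fetched page: welcome " + canary + " footer",
}
print(find_sessions(canary, logs))
```

Verbatim matching of this kind fails as soon as an operator paraphrases or filters incoming content, which is the case the paper's robust canaries are described as handling.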

2605.15996 2026-05-18 stat.ML cs.LG math.ST stat.TH

Testing properties of trees in graphical models with covariance queries

Sofiya Burova, Francisco Calvillo, Gábor Lugosi, Piotr Zwiernik

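AI summary This paper studies property testing of tree structures in high-dimensional graphical models. It designs randomized tests that use a sub-quadratic number of queries and gives explicit query complexity bounds for properties such as the number of leaves, maximum degree, typical distance, and diameter.

AI abstract (translated from Chinese)

We consider the problem of testing properties of the graphs underlying high-dimensional graphical models. We adopt the covariance query model introduced by Lugosi et al. (2021) and study the case in which the underlying graph is a tree. The main results show that, although reconstructing the entire tree may be costly, certain global structural properties can be tested efficiently. In particular, we design randomized tests for global structural properties that use a sub-quadratic number of queries. We develop testing procedures for several fundamental properties, including the number of leaves, the maximum degree, the typical distance, and the diameter. For each property, we obtain explicit query complexity bounds that depend on the target threshold and tolerance parameters.

English abstract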

We consider the problem of testing properties of graphs underlying high-dimensional graphical models. We adopt the model of covariance queries introduced by Lugosi, Truszkowski, Velona, and Zwiernik (2021). We study the case when the underlying graph is a tree. The main results of the paper show that, while reconstructing the entire tree may be costly, certain global structural properties can be tested efficiently. In particular, we design randomized tests for global structural properties that use a sub-quadratic number of queries. We develop testing procedures for several fundamental properties, including the number of leaves, the maximum degree, the typical distance, and the diameter of the tree. For each property, we obtain explicit query complexity bounds that depend on the target threshold and tolerance parameters.

2605.15952 2026-05-18 cs.HC cs.RO

Driving Through the Network: Performance and Workload Under Latency and Video Impairment

Ines Trautmannsheimer, Ahmed Azab, Frank Diermeyer

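AI summary Using a driving simulator, this study examines how network latency and video quality affect driver performance and workload. Latency and bitrate each raise operator load; physiological measures show sub-additive interactions, while interactions in performance and oculomotor measures are small.

Comments Preprint of VEHITS 2026 : 12th International Conference on Vehicle Technology and Intelligent Transport Systems

AI abstract (translated from Chinese)

Teleoperation promises to extend the operational range of automated vehicles, but it depends critically on network latency and video quality. We report a fixed-base driving-simulator study (N=25) with a 2x2 manipulation of added latency (100/300 ms) and bitrate (500/2000 kbit/s), plus a best-case baseline (0 ms added, 9000 kbit/s). We measured the effective glass-to-glass (G2G) latency in each condition (baseline about 413 ms; effective totals about 500-700 ms) and verified stable framerate and encoder settings. Multimodal measures covered performance (speed, steering reversals, crashes), oculomotor behavior (blink rate, fixation duration), physiology (RR interval, heart rate, skin conductance), and subjective workload. Latency and bitrate each increased operator load and slightly affected performance. Physiological measures (heart rate, RR interval) showed sub-additive interactions, whereas performance and oculomotor interactions were small or non-significant. Equivalence tests showed that 300 ms with 2000 kbit/s was equivalent to the best case (SESOI ±2 km/h), while 300 ms with 500 kbit/s was not. We argue that latency and video quality should be treated as largely independent design variables, and that physiology-aware adaptation can anticipate overload before safety is compromised.

English abstract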

Teleoperation promises to extend the operational envelope of automated vehicles, yet it critically depends on network latency and video quality. We report a fixed-base driving-simulator study (N=25) with a 2x2 manipulation of added latency (100/300 ms) and bitrate (500/2000 kbit/s), plus a best-case baseline (0 ms added, 9000 kbit/s). We measured effective glass-to-glass (G2G) latency per condition (baseline approx. 413 ms; effective totals approx. 500-700 ms) and verified stable framerate and encoder settings. Multimodal measures covered performance (speed, steering reversals, crashes), oculomotor behavior (blink rate, fixation duration), physiology (RR interval, heart rate, skin conductance), and subjective workload. Latency and bitrate each increased operator load and modestly affected performance. Physiological measures (heart rate, RR interval) exhibited sub-additive interactions, whereas performance and oculomotor interactions were small or non-significant. Equivalence tests showed that 300 ms with 2000 kbit/s was velocity-equivalent to best-case (SESOI +/- 2 km/h), while 300 ms with 500 kbit/s was not. We argue that latency and video quality should be treated as largely independent design levers, and that physiology-aware adaptation can anticipate overload before safety is compromised.
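The equivalence claim above (velocity within a SESOI of ±2 km/h) can be illustrated with a two one-sided tests (TOST) style check. A TOST at level alpha declares equivalence exactly when the (1 − 2·alpha) confidence interval for the mean difference lies inside the equivalence bounds. The sketch below uses a normal approximation for that interval to stay within the standard library; a t-based interval would be more appropriate for small samples, and the abstract does not say which procedure the authors used. The function name and the per-participant paired-difference layout are assumptions, and the numbers in the usage lines are made up.

```python
from statistics import NormalDist, mean, stdev

def equivalent_within(diffs, margin, alpha=0.05):
    """Check whether paired differences are equivalent to zero within ±margin.

    diffs: per-participant differences (condition minus baseline), length >= 2.
    Returns (is_equivalent, (lower, upper)) using a normal-approximation
    (1 - 2 * alpha) confidence interval for the mean difference.
    """
    n = len(diffs)
    centre = mean(diffs)
    std_err = stdev(diffs) / n ** 0.5
    z = NormalDist().inv_cdf(1 - alpha)
    lower, upper = centre - z * std_err, centre + z * std_err
    return (-margin < lower) and (upper < margin), (lower, upper)

# Made-up speed differences in km/h, for illustration only.
small_shift = [0.5, -0.5, 0.3, -0.3, 0.1, -0.1, 0.2, -0.2]
large_shift = [d + 3.0 for d in small_shift]
print(equivalent_within(small_shift, margin=2.0))
print(equivalent_within(large_shift, margin=2.0))
```

Failing this check does not by itself show a difference; it only means equivalence within the chosen margin could not be established.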

2605.15938 2026-05-18 physics.bio-ph cs.LG

Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery

Marco Rando, Robin A. Heinonen, Yujia Qi, Agnese Seminara

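AI summary This paper trains an olfactory search agent with Q-learning. The agent uses a clock to recover the plume and combines behaviors observed in insects into its navigation strategy, but the policy needs more flexibility to adapt to the local intermittency level for better robustness.

Comments 15 pages, 13 figures, 1 table

AI abstract (translated from Chinese)

Finding an odor source in a turbulent flow requires using the history of olfactory observations effectively to form a robust navigation strategy. This paper uses tabular Q-learning to train an olfactory search agent with minimal memory: only a running clock since the last whiff. The agent learns an interpretable strategy for recovering the plume that combines the surging, casting, and downwind-return behaviors observed in insects. Although it performs well on data from direct numerical simulations, the agent is limited by its inability to adapt to the local intermittency level; we show that providing more flexibility improves robustness.

English abstract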

Finding an odor source in a turbulent flow requires effectively leveraging the history of olfactory observations into a robust navigation strategy. In this work, we use tabular Q-learning to train an olfactory search agent with a minimal memory of past observations: only a running clock since the last whiff. This agent learns an interpretable strategy to recover the plume which combines well-known behaviors observed in insects: surging, casting, and a return downwind. While achieving good performance on data from direct numerical simulations of turbulence, the agent is limited by an inability to adapt its strategy to the local intermittency level; we show that providing more flexibility improves robustness.
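The clock-state idea can be made concrete with the standard tabular Q-learning update, where the only state is the number of steps since the last whiff. This is a minimal sketch rather than the authors' setup: the action labels, the clock cap `MAX_CLOCK`, the learning rate, the discount factor, and the reward are all placeholder assumptions, and no turbulence data or environment is included.

```python
ACTIONS = ("surge", "cast", "return_downwind")  # hypothetical action labels
MAX_CLOCK = 20  # assumed cap on the clock so the table stays finite

def next_clock(clock, whiff):
    """Reset the clock on a whiff, otherwise count one more step (capped)."""
    return 0 if whiff else min(clock + 1, MAX_CLOCK)

def q_update(q, state, action, reward, next_state, lr=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += lr * (td_target - q[state][action])
    return q[state][action]

def greedy_action(q, state):
    """Index of the highest-valued action for a given clock value."""
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])

# One row per clock value, one column per action, initialised to zero.
q_table = [[0.0] * len(ACTIONS) for _ in range(MAX_CLOCK + 1)]
```

Because the table is indexed only by the clock, the learned policy can be read off directly as "which action is preferred after k steps without a whiff", which is what makes such a strategy interpretable.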

2605.15920 2026-05-18 stat.ML cs.LG

Unsupervised Domain Shift Detection with Interpretable Subspace Attribution

Sebastian Springer, Alessandro Laio

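AI summary This paper proposes an unsupervised tool for detecting domain shift. It detects localized density anomalies in high-dimensional feature spaces and identifies the feature subspace where the shift occurs, which makes the source of the shift interpretable, and it provides a compensation protocol.

AI abstract (translated from Chinese)

We developed a tool for detecting domain shifts, that is, subtle differences in the probability distributions of datasets. We identify these shifts by detecting localized density anomalies in high-dimensional feature spaces. If an anomaly is present, we determine the feature subspace in which it is most pronounced. This lets us trace the shift to a small set of features, which makes it interpretable. In addition, we provide a protocol for compensating domain shifts by extracting, from two unlabelled datasets, subsets of samples with no detectable residual distributional difference. We validate the framework on controlled 20-dimensional benchmarks, recovering both broad and localized shifts together with their supporting feature subspaces. We then apply it to healthy electrocardiogram (ECG) recordings represented by 782 features. In age- and sex-matched cohort comparisons, the method detects device-induced shifts, extracts representative subsets enriched in the imbalanced device components, and identifies ECG features associated with the acquisition contrast. These results suggest that density-shift detection and subspace attribution provide a practical framework for uncovering hidden cohort biases before downstream modelling.

English abstract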

We developed a tool for detecting domain shifts, namely subtle differences in the probability distributions of datasets. We identify these shifts using an algorithm designed to detect localised density anomalies in high-dimensional feature spaces. If an anomaly is present, we then identify the feature subspace in which the anomaly is most pronounced. This allows us to trace the domain shift to a small set of features, making the shift interpretable. Moreover, we provide a protocol for compensating domain shifts by extracting, from two unlabelled datasets, subsets of samples with no detectable residual distributional difference. We validate the framework on controlled 20-dimensional benchmarks with known ground truth, recovering both broad and localized shifts together with their supporting feature subspaces. We then apply it to healthy electrocardiogram (ECG) recordings represented by 782 features. In age- and sex-matched cohort comparisons differing in measurement-device composition, the method detects device-induced shifts, extracts representative subsets enriched in the imbalanced device components, and identifies ECG features associated with the acquisition contrast. These results suggest that density-shift detection and subspace attribution provide a practical framework for uncovering hidden cohort biases before downstream modelling.

2605.15915 2026-05-18 cs.HC cs.AI cs.CL

SLIP & ETHICS: Graduated Intervention for AI Emotional Companions

Minseo Kim

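AI summary This paper presents the SLIP and ETHICS frameworks, which use graduated intervention to address the tension between safety and rapport in AI emotional companions. Experiments show that interventions are insufficient during sustained high-energy states, but that greater model capability improves detection.

Comments Accepted to PervasiveHealth 2026. 11 pages, 2 figures, 4 tables. Proc. of the 20th EAI International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth 2026)

AI abstract (translated from Chinese)

AI emotional companions face a tension between safety and rapport: strict safeguards can damage the supportive alliance, while permissive systems can harm users. This paper presents SLIP (Staged Layers of Intervention Protocol), a four-stage graduated approach that derives interventions (none, soft, hard) from structured qualitative indicators, namely affect intensity (a) and narrative dynamism (m). ETHICS (Emergent Taxonomy for Human-AI Interaction Context Signals) is a "signals not labels" taxonomy. An evaluation combining a small-scale production deployment (N=68, 10 users, 10 weeks) with a synthetic persona battery (N=91, 5 behavioral-risk profiles) showed a 0% false positive rate for the flow persona and the expected escalation patterns in crisis-oriented personas. However, initial results showed that 8 consecutive days of high-energy elevation produced zero interventions (0/8), exposing a boundary between the "do not pathologize" principle and safety. A subsequent three-model stress test showed that increased model capability raises detection from 0/8 to 6/8 while keeping flow false positives at 0/10 in the largest model. These findings position graduated intervention as a design direction for navigating, rather than resolving, the safety-rapport tension in affective computing.

English abstract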

AI emotional companions face a safety-rapport paradox: restrictive safeguards can damage supportive alliance, while permissive systems risk user harm. We present SLIP (Staged Layers of Intervention Protocol), a four-stage graduated methodology deriving interventions (none, soft, hard) from structured qualitative indicators -- affect intensity (a) and narrative dynamism (m) -- alongside ETHICS (Emergent Taxonomy for Human-AI Interaction Context Signals), a "signals not labels" taxonomy. An evaluation combining a small-scale production deployment (N=68 entries, 10 users, 10 weeks) with a synthetic persona battery (N=91, 5 behavioral-risk profiles) achieved 0% false positives for the flow persona and showed expected escalation patterns in crisis-oriented personas. However, initial results showed that 8 consecutive days of high-energy elevation produced zero interventions (0/8), exposing a boundary where the "do not pathologize" principle conflicts with safety. A subsequent three-model stress test demonstrated that increased model capability improves detection from 0/8 to 6/8 while preserving 0/10 flow false positives in the largest model. Read as preliminary, these findings position graduated intervention as a design direction for navigating -- not resolving -- the safety-rapport tension in affective computing.