arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.14260 2026-06-15 cs.IR cs.AI New submission

ChronoID: Infusing Explicit Temporal Signals into Semantic IDs for Generative Recommendation

Dongdong Nian, Dongqi Fu, Chenliang Xu, Yinglong Xia, Hong Li, Hong Yan, Jian Kang

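Affiliations: University of Rochester; Meta MRS; MBZUAI

AI summary: Proposes ChronoID, a framework that infuses explicit temporal signals into semantic IDs along three orthogonal dimensions to address the missing temporal information in generative recommendation, and builds a new benchmark to validate it.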

Abstract

Semantic IDs are crucial in generative recommendation, but they have a fundamental limitation: temporal information is not well incorporated into them. Instead, time influences recommendation only implicitly (e.g., through session construction heuristics, preference alignment, or sequence order), while existing semantic ID learning remains entirely time-agnostic. This design conflates interactions occurring under distinct temporal contexts into identical semantic representations, implicitly assuming that item semantics and user intent are temporally stationary. Such an assumption is misaligned with real-world recommendation scenarios, where evolving interaction rhythms play a central role. In this work, we investigate where and how explicit time should be incorporated into semantic IDs for generative recommendation. First, we systematically characterize the design space along three orthogonal dimensions of temporal signals and present a unified framework, ChronoID, for time-aware semantic ID learning. Then, by contributing a new time-explicit generative recommendation benchmark, ChronoID answers three questions: what is an effective way of infusing time, how to design the architecture, and where the gain comes from.

2606.14248 2026-06-15 eess.IV cs.CV New submission

Spectrum Aware Illumination Estimation Using Multispectral Image

Hyejin Oh, Woo-Shik Kim, Sangyoon Lee, YungKyung Park, Je-Won Kang

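Affiliations: Department of Electronic and Electrical Engineering, Ewha W. University; Telechips; Samsung Advanced Institute of Technology; Department of Design, Ewha W. University

AI summary: Proposes a deep learning framework that combines spectral attention with an illuminant prior, using a spatio-spectral feature extraction block and a cross-sensor spectral-domain transform to estimate illuminant spectra accurately, and validates it on a real-world multispectral dataset.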

Comments Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). DOI: 10.1109/TCSVT.2026.3701975

Abstract

Multispectral (MS) imaging extends beyond conventional RGB imaging by capturing more spectral bands, thereby improving illuminant spectrum estimation (ISE). However, existing methods often fail to fully exploit spectral information, resulting in suboptimal performance under diverse lighting conditions and across different sensor domains. Hence, we propose a deep learning framework with a spatio-spectral feature extraction block, which incorporates spectral attention mechanisms to enhance spectral correlation and preserve illuminant-relevant spatial features. Through the inclusion of an illuminant prior (IP), our approach prioritizes specific channels that provide more meaningful information in an MS image. We also propose a spectral-domain transform across different MS sensor spaces. The results demonstrate that illuminant spectra learned in high-dimensional sensor spaces can be effectively transformed to various lower-dimensional camera sensor spaces without any additional training. To facilitate evaluation, we introduce a real-world MS dataset containing high-dimensional ground-truth illumination spectra captured under diverse lighting conditions. Through extensive experiments, we demonstrate that our method achieves superior accuracy compared to existing models, thus providing a practical solution for real-world ISE. The code and dataset are available at https://github.com/hyejin5/Spectrum-Aware-Illumination-Estimation-Using-Multispectral-Image.

2606.14210 2026-06-15 cs.CR cs.AI New submission

From Prompts to Responses: Dual-Sided Data Leakage and Defense in Split Large Language Models

Zixuan Gu, Xiaojun Ye, Yang Liu

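Affiliations: GitHub

AI summary: Proposes the PIDI attack, which leaks both input prompts and output responses in split LLMs, and designs the ADMI defense, which resists such attacks through adapter warmup and mutual information regularization.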

Comments 18 pages, Accepted at ICML 2026

Abstract

Large language models (LLMs) are increasingly deployed in privacy-sensitive domains, where users must balance the risk of data exposure through external APIs against the high computational cost of local deployment. Split learning has therefore emerged as a promising paradigm for LLM fine-tuning and inference under limited local resources. However, it introduces new privacy risks. Prior work primarily studies leakage of private input prompts, typically via inversion attacks on intermediate representations, while the potential for sensitive information leakage through generative response outputs remains largely unexplored. In this work, we unveil novel vulnerabilities of Split-LLM by presenting Patched Model Inversion with Dual-Sided Initialization (PIDI), a two-stage attack that simultaneously targets both private input prompts and output responses in Split-LLM settings. It combines dual-sided initialization with a patched inversion strategy to tackle long sequences, substantially outperforming prior inversion methods. To counter threats from both sides, we further propose the Adapter-based DualGuard with Mutual Information Defense (ADMI), which integrates an adapter-based local warmup strategy and mutual information regularization to provide strong empirical privacy protection with minimal impact on task performance. Extensive experiments across diverse tasks and models demonstrate that ADMI effectively defends against PIDI and other state-of-the-art inversion attacks. Our code is publicly available at https://github.com/FLAIR-THU/VFLAIR-LLM.

2606.14181 2026-06-15 math.NA cs.LG cs.NA New submission

Robin-Neumann Coupling of PINN and FEM Solvers: A Steklov-Poincaré View, with Application to Fluid-Structure Interaction with Contact

Mikel Landajuela

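Affiliations: Lawrence Livermore National Laboratory

AI summary: Proposes a domain-decomposition-based PINN-FEM coupling framework, proves contraction of the Robin-Neumann iteration via Steklov-Poincaré operator theory, introduces a Fourier-mode probe that diagnoses the network's spectral cap, and handles the topology change in a fluid-structure interaction problem with contact without remeshing.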

Abstract

Physics-informed neural networks (PINNs) are meshless and carry moving geometry and topology change through resampling of collocation points; the finite-element method (FEM) is the workhorse for boundary-fitted discretisations. Coupling the two across a shared interface promises the best of both, yet existing PINN-FEM schemes are validated only empirically. We put the coupling on a domain-decomposition footing: viewing each solver as a Steklov-Poincaré (trace-to-flux) operator, we transfer the classical Dirichlet-Neumann (DN) divergence diagnosis and its Robin-Neumann (RN) cure, including a closed-form, sweep-free interface impedance, and prove a PINN-specific contraction theorem: a trained network realises only a perturbed Steklov operator with a per-step training residual, and RN still contracts, with no shared-eigenbasis hypothesis, to a floor set by the achieved training loss. Because a PINN has no stiffness matrix, we introduce a Fourier-mode interface probe that recovers the network's resolvable Steklov eigenvalues to within 0.5% and doubles as a diagnostic of the network's spectral cap. The theory predicts measured PINN-FEM contraction rates to within 7% on 1D and 2D Poisson couplings, and a two-slab analogue of the large-added-mass regime shows RN's per-mode impedance matching winning decisively where tuned scalar relaxation saturates. We demonstrate the framework on a Stokes/rigid-disc problem with Alart-Curnier contact: the meshless PINN fluid absorbs the topology change at contact by collocation exclusion alone, no remeshing and no cut cells, and the static-equilibrium contact reaction matches the submerged weight to 0.4% under mesh refinement. We quantify remaining limitations: the warm-started PINN drifts off the Stokes manifold over long horizons, and matched FEM-FEM benchmarks attribute pre-impact squeeze-film signatures to PINN under-resolution.

2606.14127 2026-06-15 cs.IR cs.CL New submission

CoRe: A Continuously Reward-Finetuned LLM Query Rewriter for Multi-Stage Context-Aware Relevance in Web-Scale Video Search

Yilin Wen, Rong Yang, Xiaojia Chang, Hong Sun, Gefu Tang, Chunhui Liu, Jeffrey Chen, Zeyu Ma, Lisong Qiu, Xiaochuan Fan, Congjia Yu, Quan Zhou, Yuheng Chen, Zian Wang

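Affiliations: TikTok

AI summary: Presents CoRe, an LLM query rewriter redeployed weekly using semi-online Mixed Preference Optimization and a multiplicative-ratio reward; in multi-stage video search it yields statistically significant reductions in change-query rate, with relevance metrics moving in the expected direction.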

Comments 12 pages, 3 figures

Abstract

LLM-based query rewriters in production face a tension: the training reward must reflect how the rewrite is consumed by the production ranker, yet the training procedure must be cheap enough to support continuous redeployment as data drifts. We present CoRe (Context Relevance), such a system, redeployed weekly for over five months in a major short-video search engine. Our reward uses the deployed multimodal relevance model as its source and a multiplicative ratio form mirroring the production fusion algebra, closing the simulation-production gap that offline reward proxies leave open. A semi-online Mixed Preference Optimization loop makes this reward affordable at multi-million-instance weekly scale: a DPO-style pairwise objective restricts the gradient pass to a small top-k/bottom-k subset of sampled trajectories, and a phase structure reduces trainer/inference-server parameter syncs from per-step to per-phase. An automated promotion gate over reward-like and stability metrics detected and recovered from a real reward-hacking incident in production. Rewriter output is consumed as parallel relevance signals at recall, rawrank, and finerank without displacing the original signals, bounding rewriter-failure blast radius. Online A/B from two sequential production launches, first deploying the rewriter at finerank, then extending consumption to recall and rawrank, delivers statistically significant reductions in change-query rate on rewrite-impacted queries, with all headline relevance and engagement metrics moving in the expected direction.
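The DPO-style step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the CoRe implementation: the sample dictionary keys (`reward`, `logp`, `ref_logp`), the best-with-worst pairing rule, and the value of `beta` are all hypothetical. It selects the top-k and bottom-k sampled rewrites by reward and evaluates the standard DPO pairwise loss on those pairs only.

```python
import math

def select_pairs(samples, k):
    """Pair the k highest-reward samples with the k lowest-reward ones.

    Pairing best-with-worst is an assumption made for this sketch.
    """
    if k < 1 or len(samples) < 2 * k:
        raise ValueError("need k >= 1 and at least 2*k samples")
    ranked = sorted(samples, key=lambda s: s["reward"], reverse=True)
    top = ranked[:k]
    bottom = ranked[-k:][::-1]  # worst sample first
    return list(zip(top, bottom))

def dpo_loss(pairs, beta=0.1):
    """Mean DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    total = 0.0
    for chosen, rejected in pairs:
        margin = beta * (
            (chosen["logp"] - chosen["ref_logp"])
            - (rejected["logp"] - rejected["ref_logp"])
        )
        total += math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return total / len(pairs)
```

Only the 2k selected trajectories would need a gradient pass in a real trainer; the remaining samples contribute nothing to this loss.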

2606.14120 2026-06-15 eess.SP cs.AI cs.LG cs.SD eess.AS New submission

FAConformer: Frequency-Aware Convolutional Transformer for Auditory Attention Decoding

Ziwei Wang, Xingyi He, Tianwang Jia, Hongbin Wang, Dongrui Wu

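Affiliations: Hubei Key Laboratory of Brain-inspired Intelligent Systems, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology

AI summary: Proposes FAConformer, which uses band-specific encoding and adaptive cross-band interaction to exploit frequency-domain EEG information for auditory attention decoding, surpassing the current state-of-the-art model by 4.9% on public datasets.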

Comments 15 pages, 7 figures

Abstract

Auditory attention decoding (AAD) aims to infer the attended speaker from neural responses in multi-speaker acoustic environments and is a key problem for neuro-steered hearing systems. Although recent studies have achieved encouraging progress, existing AAD models still do not fully exploit frequency domain electroencephalography (EEG) information. In particular, most approaches introduce multi-band information through handcrafted feature extraction or direct cross-band feature concatenation, which mainly exploit frequency information at a shallow level and may overlook band-specific patterns and cross-band interactions. To address these limitations, this paper proposes FAConformer, a frequency-aware CNN-Transformer framework for AAD that explicitly integrates band-specific encoding and adaptive cross-band interaction. Specifically, FAConformer first decomposes EEG signals into multiple frequency bands and assigns each band to an independent CNN-Transformer encoder for band-specific modeling. The resulting band-wise features are then adaptively fused by a carefully designed frequency-aware attention (FAA) module that models cross-band dependencies by treating band-wise features as tokens. Further, band-wise auxiliary supervision (BAS) is introduced to prevent weakly contributing branches from being under-optimized during joint training. In this way, FAConformer performs frequency-aware modeling that more effectively exploits frequency domain information. Extensive experiments on two public AAD datasets with three decision-window lengths demonstrated that FAConformer consistently outperformed 12 competitive baselines, surpassing the current state-of-the-art model by 4.9%. Further analyses of band importance, ablation, and parameter sensitivity verify the effectiveness, robustness, and interpretability of the proposed framework. Code is available at https://github.com/wzwvv/FAConformer.

2606.14117 2026-06-15 stat.ME cs.AI New submission

A Two-Stage Statistical Framework for Evaluating Associative Interference in Large Language Models

Achraf Cohen, Andrew Kincaid

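Affiliations: Department of Mathematics and Statistics, University of West Florida

AI summary: Proposes a two-stage statistical framework that separates response compliance from task consistency and evaluates associative interference in three LLMs across domains such as Gender-Career, finding that the effects vary by model.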

Comments 11 pages; 2 figures

Abstract

Large language models (LLMs) are increasingly evaluated for bias using adaptations of human psychological paradigms, yet methodological limitations, particularly the conflation of refusal behavior with task performance, have hindered clear interpretation. Here, we adapt the Implicit Association Test (IAT) to a controlled, forced-choice framework and introduce a two-stage modeling approach that separates response compliance from task-consistent classification. Across three contemporary LLMs (Claude Sonnet-4, Gemini 2.5 Pro, and GPT-5), we evaluate associative interference, defined as reduced task consistency in incongruent relative to congruent conditions. While compliance with the structured response format was uniformly high, interference effects varied substantially across models and domains. Claude Sonnet-4 exhibited strong interference in the Gender-Career domain (ΔP = 0.086, 95% CrI [0.026, 0.173]) and smaller but credible effects in Gender-Science. Gemini 2.5 Pro showed attenuated interference, and GPT-5 exhibited minimal or no detectable interference across domains. These findings demonstrate that IAT-style associative asymmetries are not a universal property of LLMs, but instead depend on model-specific characteristics. By isolating interference from compliance and modeling item-level variability, this study provides a principled framework for evaluating structured response patterns in LLMs. The results highlight the importance of model-specific assessment and suggest that associative interference can be substantially mitigated in modern systems.

2606.14113 2026-06-15 cs.SE cs.CL New submission

Simulating Students' Java Programming Errors with Large Language Models

Ali Keramati, Jie Cao, Iman Mohammadi, Mark Warschauer, Yang Shi

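Affiliations: University of California, Los Angeles

AI summary: Explores using LLMs to simulate students' programming errors, evaluating five models on diversity and alignment; Claude Sonnet 4 achieves the best balance, and synthetic errors are hard to distinguish from authentic ones.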

Abstract

Understanding student errors in programming is a cornerstone of programming education, yet obtaining a representative set of student errors for any newly designed task remains slow and costly, since authentic submissions only accumulate after extensive classroom deployment. This paper explores whether large language models (LLMs) can serve as scalable proxies for students by simulating realistic logical errors in code submissions. Using the CodeWorkout dataset of 74,000+ unique student Java submissions across 37 problems, we evaluate five LLMs under three mainstream prompting strategies: Input-Output (IO), Chain-of-Thought (CoT), and iterative Self-Refine. We assess performance along two key dimensions: diversity (the range of distinct error patterns) and alignment (agreement with authentic student mistakes), and examine how these vary with the struggling level of programming tasks. Our quantitative findings reveal that while all models generate diverse errors, their alignment with human submissions diverges: Claude Sonnet 4 achieves the most balanced performance. In addition, we conducted a blinded expert annotation study (N = 401) comparing synthetic and authentic errors. This qualitative analysis confirms that the generated errors are functionally indistinguishable from authentic student errors. Moreover, higher-struggling-level problems elicit more diverse but less student-like errors. These results highlight trade-offs in using LLMs to simulate human learners and suggest design considerations for integrating synthetic errors into teachable agents, intelligent tutoring systems, and large-scale learning analytics.

2606.14106 2026-06-15 cs.MA cs.CV New submission

Naive Visual Memory is Not Enough: A Failure-Mode Study of GUI Agents

Seoyoung Choi, Minseok Ko, Hyunseok Lee, Kunwoong Kim, Woomin Song, Chanseok Jeon, Jinwoo Shin

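Affiliations: KAIST

AI summary: Proposes Action-Grounded Visual Memory (AGMem), which stores image crops of the local GUI regions tied to successful actions rather than full screenshots, reducing action-level errors in GUI agents and improving task success rate on OSWorld by 33.3% over full-image memory.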

Comments 9 pages, 5 figures, ICML 2026 WORKSHOP

Abstract

Graphical User Interface (GUI) agents are increasingly used to automate complex computer tasks across applications, websites, and operating systems. To improve their reliability, recent work has introduced experiential memory, where agents retrieve prior trajectories to guide decision-making in similar states. More recent approaches further extend this idea to visual memory by storing and retrieving screenshots from past interactions, providing agents with richer contextual information than text-only memories. However, the effect of visual memory in GUI agents remains insufficiently understood: it is unclear which failures visual memory mitigates and which it exacerbates. To systematically analyze the effect of visual memory, we introduce a taxonomy of four GUI agent failures (i.e., cognitive failure, visual state misunderstanding, hidden operation blindness, and grounding error) that map to distinct stages of the perception-reasoning-action pipeline. We find that prepending full-image memory has a divergent effect on the failure distribution: it reduces state-level failures but worsens action-level ones, increasing hidden operation blindness and grounding error. Motivated by this finding, we propose Action-Grounded Visual Memory (AGMem), an action-grounded memory framework for GUI agents. The core idea of AGMem is to store image crops that capture the local GUI region closely related to a successful action or a recovery, rather than storing full screenshots. Experiments on OSWorld show that AGMem improves task success rates by 33.3% over full-image memory. These results demonstrate that AGMem is an effective representation for visual memory in GUI agents.
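The idea of storing a local crop around an action, rather than a full screenshot, can be illustrated with a small geometry helper. This is a hypothetical sketch: the function name, the 256-pixel crop size, and the assumption that the action is a click at pixel coordinates are not taken from the paper. It computes a crop box centered on the click and shifts it so it stays inside the screen.

```python
def action_crop_box(click_x, click_y, screen_w, screen_h, crop_w=256, crop_h=256):
    """Return (left, top, right, bottom) of a crop around a click.

    The box is centered on the click when possible and shifted inward
    near screen edges so that it never leaves the screen.
    """
    crop_w = min(crop_w, screen_w)
    crop_h = min(crop_h, screen_h)
    left = min(max(click_x - crop_w // 2, 0), screen_w - crop_w)
    top = min(max(click_y - crop_h // 2, 0), screen_h - crop_h)
    return left, top, left + crop_w, top + crop_h
```

The returned tuple follows the (left, top, right, bottom) convention that common image libraries accept for cropping, so it could be passed to whatever screenshot representation the agent already uses.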

2606.14053 2026-06-15 stat.ML cs.LG New submission

Hybrid Uncertainty Sensitivity Analysis Based on the HSIC for High-Dimensional Responses with Aleatory-Epistemic Separation

Shijie Zhong, Jiangfeng Fu, Pengfei Wei

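Affiliations: School of Power and Energy, Northwestern Polytechnical University

AI summary: Proposes a double-space tensor-product RKHS framework that uses factorized kernels and a double Möbius inversion to orthogonally decompose a global dependence measure into pure aleatory effects, pure epistemic effects, and their interaction contributions, enabling sensitivity analysis under hybrid uncertainty for high-dimensional responses.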

Comments 19 pages, 7 figures

Abstract

Quantifying the influence of hybrid aleatory and epistemic uncertainties on high-dimensional system responses remains a major challenge in global sensitivity analysis (GSA). Existing Hilbert-Schmidt Independence Criterion (HSIC)-based approaches are primarily restricted to single-output settings and lack a rigorous decomposition of heterogeneous uncertainty sources and their interactions. To address this limitation, a novel double-space tensor-product RKHS framework is proposed for sensitivity analysis under hybrid uncertainty. By constructing factorized kernels over both the latent input space and the multidimensional output space, a concurrent double Möbius inversion is derived to orthogonally decompose the global dependence measure into pure aleatory effects, pure epistemic effects, and their interaction contributions. The resulting dimension-wise sensitivity indices preserve the uncertainty attribution structure across all output dimensions. To satisfy the independence assumptions required by the decomposition, an auxiliary-variable representation based on the inverse probability integral transform is introduced, enabling the treatment of hierarchical uncertainties and copula-induced correlations within a unified latent space. A fully vectorized single-loop implementation is further developed to avoid the computational burden of nested Monte Carlo simulation. Statistical significance and estimation uncertainty are quantified through permutation testing and bootstrap confidence intervals. Numerical studies on a modified multi-output Ishigami function and an aerodynamic pressure-field problem demonstrate the accuracy, scalability, and practical applicability of the proposed framework.
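For readers unfamiliar with the dependence measure this abstract builds on, here is a minimal sketch of the basic (biased) empirical HSIC estimator for two scalar samples. It is the plain single-input, single-output statistic, not the paper's double-space decomposition; the Gaussian kernels, unit bandwidths, and 1/n² normalization are choices made for this sketch.

```python
import math

def gaussian_gram(xs, bandwidth):
    """Gram matrix of a Gaussian kernel on scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs] for a in xs]

def center(K):
    """Double-center a symmetric Gram matrix, i.e. compute H K H with H = I - 11^T / n."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]

def hsic(xs, ys, bw_x=1.0, bw_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, bw_x))
    Lc = center(gaussian_gram(ys, bw_y))
    # trace(K H L H) equals the elementwise inner product of the centered matrices.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n * n)
```

A constant output gives a value of zero up to rounding, while an output that is a copy of the input gives a strictly positive value; in practice, significance would be judged with a permutation test as the abstract describes.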

2606.14047 2026-06-15 cs.IR cs.AI cs.CL cs.LG New submission

Knowledge Graph Enhanced Memory-Augmented Retrieval for Long Context Modeling

Ghadir Alselwi, Basem Suleiman, Hao Xue, Shoaib Jameel, Hakim Hacid, Flora D. Salim, Imran Razzak

Affiliations: University of New South Wales; Hong Kong University of Science and Technology (Guangzhou); University of Southampton; Technology Innovation Institute; Mohamed Bin Zayed University of Artificial Intelligence

AI summary: Proposes the KGERMAR framework, which dynamically builds contextual knowledge graphs and fuses them through a multi-component memory architecture, lowering perplexity by up to 8.5% and improving memory efficiency by 2-2.5x in long-context modeling.

Abstract

Long-context language modeling requires not only extending context windows but also maintaining a coherent understanding of entity states and relationships across thousands of tokens, a challenge that semantic similarity alone cannot address. KGERMAR addresses this by constructing dynamic, context-specific knowledge graphs from input text during inference, enabling domain-adaptive retrieval that leverages both semantic similarity and explicit entity relationships. The framework performs real-time entity and relation extraction to build contextual knowledge graphs, then integrates graph-structural embeddings with textual semantics through a multi-component memory architecture. Three memory banks (contextual, semantic, and structural) are maintained, with retrieval signals fused via learned weights to capture both surface-level semantics and deeper relational patterns. Evaluated on SlimPajama (84.7K training examples), WikiText-103 (4,358 examples), PG-19 (100 examples), and Proof-pile (46.3K examples), KGERMAR achieves up to 8.5% lower perplexity and 2-2.5x better memory efficiency than memory-augmented baselines across context lengths from 1K to 32K tokens, with superior in-context learning performance across five NLU tasks. The dynamic knowledge graph construction approach advances memory-augmented language modeling by enabling domain-specific knowledge representation that adapts to input contexts rather than relying on fixed knowledge bases.

2606.14028 2026-06-15 stat.ML cs.LG New submission

Anytime-Valid Confirmation of Label-Shift Corrections

Seungjin Choi

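Affiliations: Seungjin Choi

AI summary: For confirming a pre-specified shift correction when labels are scarce, proposes an anytime-valid sequential test based on conditional e-values, building a nonnegative martingale from a product of likelihood ratios and turning routine model monitoring into a formal test.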

Comments ICML 2026 Workshop on Hypothesis Testing

Abstract

In small-batch scientific deployments, labeled target outcomes may be too scarce for reliable shift estimation even when unlabeled target inputs are available. We address the complementary setting where the practitioner has a pre-specified label-shift correction from domain knowledge and asks whether incoming labeled outcomes support it. We show that the per-observation likelihood ratio between a label-shift-corrected predictive and the source predictive is a conditional e-value, so its running product is a nonnegative martingale and Ville's inequality yields an anytime-valid confirmation rule. The log martingale equals the cumulative negative log-predictive density (NLPD) gap between the source and the corrected predictive, converting routine model monitoring into a formal sequential test. Rejection means the incoming data support the posited correction relative to the source predictive, but it is not a precise estimate of the degree of shift. Closed forms are available for GP sources with Gaussian label-shift ratios. GP regression simulations validate Type I control, finite-sample power, miscalibration sensitivity, and the small-batch advantage of a reliable prior over label-based re-estimation.
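The confirmation rule in this abstract can be sketched in a few lines. This is an illustration under simplifying assumptions, not the author's implementation: both predictives are taken to be Gaussian with a shared standard deviation, and the corrected predictive is hypothetically the source predictive shifted by a pre-specified constant `delta`. The running sum of log likelihood ratios is the cumulative NLPD gap, and by Ville's inequality the test may stop the first time it reaches log(1/α).

```python
import math

def gaussian_logpdf(y, mean, std):
    """Log density of N(mean, std^2) evaluated at y."""
    z = (y - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def confirm_correction(ys, source_means, delta, std=1.0, alpha=0.05):
    """Sequential test of H0: outcomes follow the source predictive.

    Returns (stopping_time, log_e), where stopping_time is the first step at
    which the log e-process reaches log(1/alpha), or None if it never does.
    `delta` is a hypothetical constant shift defining the corrected predictive.
    """
    log_e = 0.0
    threshold = math.log(1.0 / alpha)
    for t, (y, m) in enumerate(zip(ys, source_means), start=1):
        # Per-observation log likelihood ratio: corrected vs. source predictive.
        log_e += gaussian_logpdf(y, m + delta, std) - gaussian_logpdf(y, m, std)
        if log_e >= threshold:
            return t, log_e
    return None, log_e
```

Because the threshold holds at every step simultaneously, the practitioner can check after each new labeled outcome without inflating the Type I error, which is what makes small-batch monitoring usable as a formal test.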

2606.14023 2026-06-15 stat.ML cs.LG stat.ME New submission

Geometric Domain Adaptation via Optimal Transport for Linear Regression in R^2

Brian Britos, Mathias Bourel

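Affiliations: University of the People

AI summary: For linear regression where the source and target domains are related by a rotation, translation, or scaling, proposes a method combining K-means and optimal transport to estimate the transformation, enabling model adaptation when target data is scarce; proves that the optimal transport map recovers the transformation when p ≥ 2.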

Abstract

Optimal Transport has recently become a powerful method for domain adaptation by aligning source and target distributions. We study a supervised domain adaptation problem where source and target domains are related by a rotation, a translation, or a homothety in $\mathbb{R}^2$. We prove that the optimal transport map recovers the underlying map when using a $p$-norm cost with $p \geq 2$. Based on this insight, we develop a method combining $K$-means and optimal transport to estimate the underlying map, enabling adaptation of linear regression models when target data is scarce. Simulations demonstrate improved performance over baseline methods. Rather than relying on highly expressive deep learning architectures, we focus on classical machine learning models to emphasize interpretability and theoretical insight. This perspective allows us to explicitly characterize the role of optimal transport in recovering geometric transformations such as rotations, translations, and homotheties. Our contributions include a theoretical result linking optimal transport to rotations, translations, and homotheties in $\mathbb{R}^2$, and a practical method for adaptation in linear regression, offering both conceptual clarity and applied value in domain adaptation tasks in this space.
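A toy illustration of the translation case only, assuming two equal-size point clouds with uniform weights, so that discrete optimal transport reduces to an assignment problem (solved here by brute force, which is practical only for a handful of points). This is not the authors' K-means-based method, and all function names are hypothetical. With squared Euclidean cost (p = 2), the optimal matching pairs each source point with its translated copy, so every matched displacement equals the shift.

```python
from itertools import permutations

def sq_dist(a, b):
    """Squared Euclidean distance between two points in the plane."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def optimal_assignment(source, target):
    """Brute-force optimal transport between equal-size, uniformly weighted clouds.

    Returns (perm, cost) where source[i] is matched to target[perm[i]].
    """
    n = len(source)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(sq_dist(source[i], target[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost

def matched_displacements(source, target):
    """Displacement vectors target[perm[i]] - source[i] under the optimal matching."""
    perm, _ = optimal_assignment(source, target)
    return [
        (target[perm[i]][0] - source[i][0], target[perm[i]][1] - source[i][1])
        for i in range(len(source))
    ]
```

For larger clouds one would replace the brute-force search with a proper assignment or optimal transport solver; the sketch keeps to the standard library so the matching logic stays visible.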

2606.14003 2026-06-15 cond-mat.mtrl-sci cs.LG physics.comp-ph New submission

XRDiff: Crystal Structure Prediction from Powder X-Ray Diffraction Data Using Diffusion Models

XRDiff: 使用扩散模型从粉末X射线衍射数据进行晶体结构预测

Nofit Segal, Mingda Li, Benjamin Kurt Miller, Rafael Gómez-Bombarelli

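Affiliations * Department of Materials Science and Engineering, MIT; Department of Nuclear Science and Engineering, MIT; FAIR, Meta

AI summary Proposes XRDiff, a diffusion model that recovers crystal structures from powder X-ray diffraction data, achieving strong structure recovery rates on simulated benchmarks and using a peak-based encoding to improve generalization to experimental data.

Details
AI Chinese abstract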

从粉末X射线衍射(PXRD)图谱确定材料的晶体结构是材料科学中的一个核心挑战。PXRD是一种易于使用且广泛应用的表征技术,但由于相位信息的丢失,从衍射数据恢复原子结构需要求解一个欠定逆问题。生成建模可以为原子结构提供先验,并通过模拟的结构-谱图对学习从PXRD图谱到晶体结构的映射。我们提出了XRDiff,一个扩散模型,能够在给定化学计量比或更困难的情况下(给定元素组成和晶胞中原子总数)从PXRD恢复晶体结构。我们在每个化学计量比具有多个多晶型物且给定组成的所有多晶型物被一起保留的数据集上进行评估,确保高性能反映了对衍射信号的真实利用。XRDiff在模拟基准上实现了强结构恢复率,表明模型学习了足够精确的谱图到结构的映射,能够区分多晶型物。为了解决对实验数据的泛化问题,我们比较了全谱编码和基于峰描述符的编码。基于峰的编码泛化能力显著更好,甚至优于使用针对实验噪声分布进行增强的全谱训练模型。这些结果表明,对真实世界PXRD中存在的噪声和伪影具有鲁棒性的表示为弥合模拟与实验之间的差距提供了一条实用且可扩展的路径,使得能够从实验PXRD中零样本求解晶体结构,输入完整的或部分的化学成分。

English abstract

Determining the crystal structure of a material from its powder X-ray diffraction (PXRD) pattern is a central challenge in materials science. PXRD is an accessible and widely used characterization technique, yet recovering the atomic structure from diffraction data requires solving an underdetermined inverse problem due to the loss of phase information. Generative modeling can provide a prior over atomic structure and learn the mapping from PXRD patterns to crystal structures via simulated structure-spectrum pairs. We present XRDiff, a diffusion model that recovers crystal structures from PXRD given either the stoichiometry or, in a more challenging setting, the elemental constituents and total number of atoms in the unit cell. We evaluate on datasets where each stoichiometry has multiple polymorphs and all polymorphs of a given composition are held out together, ensuring that high performance reflects genuine use of the diffraction signal. XRDiff achieves strong structure recovery rates on simulated benchmarks, indicating that the model learns a spectrum-to-structure mapping precise enough to differentiate between polymorphs. To address generalization to experimental data, we compare a full-spectrum encoding against an encoding based on peak descriptors. The peak-based encoding generalizes substantially better, outperforming even a model trained on full spectra with augmentations fitted to the experimental noise distribution. These results demonstrate that representations robust to the noise and artifacts present in real-world PXRD offer a practical and scalable path toward closing the simulation-to-experiment gap, enabling zero-shot crystal structure solution from experimental PXRD with full or partial chemical composition input.
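
To make the contrast between full-spectrum and peak-based encodings concrete, the sketch below reduces a sampled pattern to a short list of (position, relative intensity) pairs. The abstract does not say which peak descriptors XRDiff uses, so the local-maximum rule, the `min_rel_height` threshold, and the function name `peak_descriptors` are all assumptions made for illustration.

```python
def peak_descriptors(two_theta, intensity, k=3, min_rel_height=0.05):
    """Return up to k (position, relative intensity) pairs for the strongest local maxima."""
    top = max(intensity)
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_local_max = intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
        # Ignore small bumps below a fraction of the strongest reading.
        if is_local_max and intensity[i] >= min_rel_height * top:
            peaks.append((two_theta[i], intensity[i] / top))
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:k]
    return sorted(strongest)  # report in order of increasing angle


pattern_angles = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
pattern_counts = [0.0, 5.0, 1.0, 0.2, 10.0, 2.0, 0.0]
print(peak_descriptors(pattern_angles, pattern_counts))  # -> [(11.0, 0.5), (14.0, 1.0)]
```

A representation of this kind discards the background between peaks, which is one plausible reason for the better transfer to experimental data that the abstract reports.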

2606.13994 2026-06-15 cs.CR cs.AI cs.LG New submission

Hidden in Plain Sight: Benchmarking Agent Safety Against Decomposition Attacks with DECOMPBENCH

隐于无形:使用DECOMPBENCH基准测试代理安全对抗分解攻击

Vikhyath Kothamasu, Virginia Smith, Chhavi Yadav

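Affiliations * Carnegie Mellon University; Simons Institute, UC Berkeley

AI summary Proposes the DeCompBench benchmark, which uses decomposition attacks that split harmful tasks into benign subtasks to expose how fragile existing agent safety mechanisms are against such attacks.

Details
AI Chinese abstract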

基于LLM的代理变得越来越强大且广泛部署,在现实世界中造成了日益增长的对抗性滥用动机。一个关键的新兴威胁是分解攻击\cite{glukhov2024breach, jones2024adversaries},其中有害任务被分解为更简单、良性的子任务,这些子任务单独执行时能规避安全机制,但累积起来却实现了恶意意图。尽管最近的基准测试评估了代理在多轮和多工具使用设置中的安全性,但它们并未明确捕捉这种形式的分解滥用,且可能无法代表现实的对抗性执行流程。为此,我们引入了DeCompBench,这是一个专门设计用于评估分解攻击下代理安全性的基准。DeCompBench采用分解即设计原则,使用图形框架创建,能够将有害任务分解为单独良性且可执行的子任务,并具有现实的工作流程。我们使用自定义分解器的实验表明,最先进的代理在整体有害任务上表现出高拒绝率,但在其分解变体上拒绝率显著降低,同时往往无意中实现了对抗性目标。这些发现强调了针对分解攻击进行安全性评估及相应防御的必要性。我们的数据集已公开,可在以下网址获取:https://huggingface.co/datasets/decompositionbench/DeCompBench 。

English abstract

LLM-based agents are becoming increasingly capable and widely deployed, creating growing incentives for adversarial misuse in the real world. A key emerging threat is decomposition attacks \cite{glukhov2024breach, jones2024adversaries}, in which a harmful task is broken into simpler, benign subtasks that evade safety mechanisms when executed separately but cumulatively fulfill the malicious intent. Although recent benchmarks assess agent safety in multi-turn and multi-tool-use settings, they do not explicitly capture this form of decompositional misuse and may not represent realistic adversarial execution flows. To this end, we introduce DeCompBench, a benchmark designed specifically to evaluate agentic safety under decomposition attacks. DeCompBench is created with a decomposition-by-design principle using a graphical framework and enables harmful task decomposition into individually benign and executable subtasks with realistic workflows. Our experiments using a custom decomposer show that state-of-the-art agents exhibit high refusal rates on monolithic harmful tasks, but significantly lower refusal rates on their decomposed variants, while often inadvertently fulfilling the adversarial objectives. These findings underscore the need for safety evaluations against decomposition attacks and corresponding defenses. Our dataset is publicly available and can be found at https://huggingface.co/datasets/decompositionbench/DeCompBench.

2606.13984 2026-06-15 stat.ML cs.LG stat.ME New submission

A General Framework for Decision Trees via Bregman Divergences

基于Bregman散度的决策树通用框架

Mathias Bourel

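Affiliations * IESTA, Facultad de Ciencias Económicas y de Administración, Universidad de la República, Uruguay; IRL-2030, Instituto Franco-Uruguayo de Matemática e Interacciones (IFUMI)

AI summary Proposes a generalization of CART based on Bregman divergences that unifies a range of loss functions and splitting criteria, and studies how strong convexity and smoothness of the generating convex function affect impurity gains and the stability and consistency of the estimator.

Details
AI Chinese abstract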

决策树因其可解释性、灵活性以及适应非线性结构的能力,成为统计学习中的基本工具之一。其中,由Breiman、Friedman、Olshen和Stone于1984年引入的分类与回归树(CART)成为最具影响力的算法之一,至今仍是分类和回归问题中最广泛使用的方法之一。另一方面,由Lev Bregman于1967年在凸优化背景下引入的Bregman散度,提供了广泛的一类损失函数,自然地推广了平方欧氏距离。该族包括Kullback-Leibler散度、Poisson散度和Itakura-Saito散度,以及与指数族分布相关的若干损失函数。此外,Bregman散度具有丰富的几何结构,并与凸分析和信息几何有深刻联系。本文提出基于Bregman散度的CART范式推广,从而获得适应不同统计模型和底层几何结构的更广泛的决策树族。尽管CART或经典实现(如rpart)等算法包含了不同的杂质准则,但这些准则通常针对每个特定模型以临时方式引入。相比之下,Bregman散度方法提供了一个统一的框架,使得这些准则可以从共同的凸和几何原理中推导和解释。除了算法构建,我们还研究了这些树的理论性质。特别地,我们研究了生成凸函数的性质(如强凸性或光滑性)如何影响父节点与子节点之间的杂质增益,以及估计器的稳定性和一致性。

English abstract

Decision trees are among the fundamental tools in statistical learning due to their interpretability, flexibility, and ability to adapt to nonlinear structures. Among them, Classification and Regression Trees (CART), introduced by Breiman, Friedman, Olshen, and Stone in 1984, became one of the most influential algorithms and remain one of the most widely used methods for classification and regression problems. On the other hand, Bregman divergences, introduced by Lev Bregman in 1967 in the context of convex optimization, provide a broad family of loss functions that naturally generalize the squared Euclidean distance. This family includes, among others, the Kullback-Leibler divergence, the Poisson divergence, and the Itakura-Saito divergence, as well as several losses associated with distributions belonging to the exponential family. Moreover, Bregman divergences possess a rich geometric structure and deep connections with convex analysis and information geometry. In this work, we propose a generalization of the CART paradigm based on Bregman divergences, thereby obtaining a broader family of decision trees adapted to different statistical models and underlying geometries. Although algorithms such as CART or classical implementations such as rpart incorporate different impurity criteria, these are usually introduced in an ad hoc manner for each specific model. In contrast, the Bregman divergence approach provides a unified framework that allows these criteria to be derived and interpreted from common convex and geometric principles. Beyond the algorithmic construction, we also investigate theoretical properties of these trees. In particular, we study how properties of the generating convex function -- such as strong convexity or smoothness -- influence impurity gains between parent and child nodes, as well as stability and consistency properties of the estimator.
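
One way to see how a single convex generator can produce a family of impurity criteria is the sketch below. It relies on a standard property of Bregman divergences: the mean of a sample minimizes the average divergence to a single point, and the minimum equals the Jensen gap $\mathbb{E}[\phi(Y)] - \phi(\mathbb{E}[Y])$. The function names and the choice of scalar responses are assumptions for illustration; the paper's actual construction may differ.

```python
import math


def squared(y):
    """Generator of the squared Euclidean distance; yields variance impurity."""
    return y * y


def neg_entropy(y):
    """Generator of the generalized KL (Poisson-type) divergence; requires y > 0."""
    return y * math.log(y)


def bregman_impurity(values, phi):
    """Average Bregman divergence from the values to their mean,
    which simplifies to mean(phi(y)) - phi(mean(y))."""
    mean = sum(values) / len(values)
    return sum(phi(v) for v in values) / len(values) - phi(mean)


def split_gain(left, right, phi):
    """Impurity decrease obtained by splitting a parent node into two children."""
    parent = left + right
    n = len(parent)
    return (
        bregman_impurity(parent, phi)
        - len(left) / n * bregman_impurity(left, phi)
        - len(right) / n * bregman_impurity(right, phi)
    )
```

With `squared` the gain is the usual CART variance reduction; swapping in `neg_entropy` changes the criterion without touching the splitting code, which is the kind of unification the abstract describes.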

2606.13982 2026-06-15 stat.ML cs.LG New submission

Adaptive Nucleus Truncation for Long-Form Reasoning

自适应核截断用于长形式推理

Ousmane Amadou Dia

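Affiliations * University of California, Berkeley

AI summary Proposes Adaptive Nucleus Truncation Sampling (ANTS), which dynamically adjusts the truncation width with an entropy-conditioned controller to improve reasoning in long-form generation, with average gains of 1.9 to 5.2 points on a 33B-parameter sparse MoE model.

Details
AI Chinese abstract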

采样在长形式语言模型推理中扮演重要角色。在数千个解码步骤中,候选token集合的微小变化可能累积成不同的推理轨迹、稳定性配置和最终答案。现有的截断方法如top-$p$、min-$p$和固定top-$n\sigma$采样改进了无限制采样,但它们依赖固定阈值,无法适应熵、任务难度、训练阶段或生成预算的变化。我们引入自适应核截断采样(ANTS),将top-$n\sigma$采样从固定解码规则扩展为长形式生成的自适应展开控制机制。ANTS在温度缩放前选择最大logit周围的标准邻域,使用熵条件控制器自适应调整截断宽度,并保留一个无截断回退臂以在截断不安全时稳定训练。在33B总参数/4B活跃参数的稀疏混合专家推理模型上,ANTS在8K、16K和32K生成预算下分别比基于百分比的基准平均提升1.9、3.8和5.2分。最大提升出现在指令遵循和数学推理上,其中IFBench在32K时提升超过10分,AIME 2025提升7分。代码生成揭示了重要的预算交互:在Codeforces上,ANTS在8K时落后于基线,但在16K和32K时逆转差距并显著提升ELO。这些结果表明,采样器设计不应仅被视为解码超参数,而应作为我们稳定和扩展长预算推理的一部分。

English abstract

Sampling plays an important role in long-form language-model reasoning. Over thousands of decoding steps, small changes in the candidate token set can compound into different reasoning trajectories, stability profiles, and final answers. Existing truncation methods such as top-$p$, min-$p$, and fixed top-$n\sigma$ sampling improve over unrestricted sampling, but they rely on fixed thresholds that cannot adapt to changes in entropy, task difficulty, training stage, or generation budget. We introduce Adaptive Nucleus Truncation Sampling (ANTS), which extends top-$n\sigma$ sampling from a fixed decoding rule into an adaptive rollout-control mechanism for long-form generation. ANTS selects standardized neighborhoods around the maximum logit before temperature scaling, adapts the truncation width using an entropy-conditioned controller, and retains a no-truncation fallback arm to stabilize training when truncation becomes unsafe. On a 33B-total / 4B-active sparse Mixture-of-Experts reasoning model, ANTS improves average performance over percentage-based benchmarks by +1.9, +3.8, and +5.2 points at 8K, 16K, and 32K generation budgets, respectively. The strongest gains appear on instruction following and mathematical reasoning, with IFBench improving by more than 10 points at 32K and AIME 2025 improving by 7 points. Code generation reveals an important budget interaction. On Codeforces, ANTS trails the baseline at 8K, but reverses this gap and substantially improves Elo at 16K and 32K. These results suggest that sampler design should be treated not just as a decoding hyperparameter, but as part of how we stabilize and scale long-budget reasoning.
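
A rough sketch of the two ingredients named above: a top-$n\sigma$ mask that keeps tokens whose logit lies within $n$ standard deviations of the maximum logit, and a controller that picks $n$ from the entropy of the distribution. The abstract does not give the controller's form, so the linear rule in `adaptive_width` and the bounds `n_min` and `n_max` are hypothetical, and the no-truncation fallback arm is not modeled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def top_n_sigma_mask(logits, n):
    """Keep tokens whose logit is within n standard deviations of the maximum logit."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    return [x >= threshold for x in logits]


def adaptive_width(logits, n_min=0.5, n_max=2.0):
    """Hypothetical controller: widen the band linearly with normalized entropy."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return n_min + (n_max - n_min) * entropy / math.log(len(logits))


logits = [5.0, 4.5, 1.0, 0.0, -1.0]
mask = top_n_sigma_mask(logits, adaptive_width(logits))
```

Because the mask is computed on raw logits, it is applied before any temperature scaling, matching the order of operations the abstract describes.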

2606.13978 2026-06-15 astro-ph.IM cs.LG New submission

Classification of Astronomical Spectra Using PCA-Compressed Flux and Inverse-Variance Features

使用PCA压缩通量和逆方差特征对天文光谱进行分类

Bruno Santos Meneses Barreto, Marcio Eisencraft

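Affiliations * Departamento de Engenharia de Telecomunicações e Controle, Universidade de São Paulo

AI summary Proposes a method that combines flux and inverse-variance features, compresses them with PCA, and classifies SDSS DR17 spectra into stars, galaxies, and quasars with a LightGBM classifier, reaching 94.6% accuracy on the test set.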

Comments This manuscript has been submitted to the Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT) and is currently under peer review

Details
AI Chinese abstract

本文评估了一种用于将SDSS DR17天文光谱分类为恒星、星系和类星体的信号处理和监督学习流程。每个光谱由其测量的通量和逆方差信息表示,结合了光谱形状与波长依赖的可靠性分布。在重新采样到共同的对数波长网格后,通量和逆方差向量被标准化并分别使用主成分分析进行压缩。得到的成分被连接起来并用于训练多个分类器。最佳性能由LightGBM梯度提升分类器获得,在测试集上达到94.6%的准确率和92.1%的平衡准确率。

English abstract

This paper evaluates a signal-processing and supervised-learning pipeline for classifying SDSS DR17 astronomical spectra into stars, galaxies, and quasars. Each spectrum is represented by its measured flux and inverse-variance information, combining spectral shape with a wavelength-dependent reliability profile. After resampling onto a common logarithmic wavelength grid, the flux and inverse-variance vectors are standardized and separately compressed using principal component analysis. The resulting components are concatenated and used to train several classifiers. The best performance was obtained with the LightGBM gradient-boosting classifier, reaching $94.6\%$ accuracy and $92.1\%$ balanced accuracy on the test set.

2606.13968 2026-06-15 cs.DC cs.AI New submission

STREAM: Multi-Tier LLM Inference Middleware with Dual-Channel HPC Token Streaming

STREAM:具有双通道 HPC 令牌流的多层 LLM 推理中间件

Anas Nassar, Steve Mohr, Leonard Apanasevich, Himanshu Sharma

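Affiliations * Advanced Cyberinfrastructure for Education and Research (ACER), University of Illinois Chicago

AI summary Proposes STREAM, which achieves sub-second TTFT through a three-tier routing architecture (local, HPC, cloud) and dual-channel HPC streaming that separates the control plane from the data plane, addressing the inability of existing systems to unify the three inference settings.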

Comments 6 pages, 1 figure, PEARC '26

Details
AI Chinese abstract

研究人员和从业者在使用大型语言模型时面临碎片化局面:本地模型免费且私密,但硬件限制了可用的模型大小和上下文窗口;机构 HPC 中心提供强大的 GPU 资源且无边际成本,并将数据保留在机构边界内,但运行在防火墙后且专为批处理作业而非交互使用设计;商业云 API 按需提供前沿模型质量,但带来显著成本和不适合敏感研究数据的数据保留策略。现有系统无法统一这三者。STREAM(智能分层路由引擎)通过四项贡献解决了这一差距:(1)三层路由架构,结合本地、HPC 和云推理,并配备基于本地 LLM 的复杂度判断器;(2)双通道 HPC 流架构,将 Globus Compute 控制平面(认证和作业调度)与 WebSocket 中继数据平面(令牌传递)分离,实现亚秒级 TTFT(中位数 0.54 秒,比批处理模式的 11.40 秒快 21.1 倍),通过机构防火墙无需 VPN 或防火墙规则更改,端到端 AES-256-GCM 加密确保中继操作员无法读取令牌负载;(3)层级感知的上下文摘要,防止长对话将简单查询强制推送到昂贵层级;(4)HPC 即 API 代理模式,将 HPC 推理暴露为与 OpenAI 兼容的端点,可从任何标准客户端调用,无需 HPC 专业知识,这种部署模式仅因贡献(2)的亚秒级 TTFT 而变得实用。Llama 3.2 3B 在跨越十个领域的 1,200 个查询基准测试中实现了 85.1% 的免费层级保留率。测量的 TTFT:本地 0.26 秒,HPC(中继)0.54 秒,云 1.68 秒。

English abstract

Researchers and practitioners working with large language models face a fragmented landscape: local models are free and private but hardware limits the model size and context windows a researcher can use; institutional HPC centers offer powerful GPU resources at no marginal cost and keep data within institutional boundaries, but operate behind firewalls and are designed for batch jobs rather than interactive use; commercial cloud APIs provide frontier-model quality on demand but impose significant cost and data retention policies unsuitable for sensitive research data. No existing system unifies all three. STREAM (Smart Tiered Routing Engine for AI Models) addresses this gap with four contributions: (1) a three-tier routing architecture combining local, HPC, and cloud inference with a local LLM-based complexity judge; (2) a dual-channel HPC streaming architecture that separates the Globus Compute control plane (authentication and job dispatch) from a WebSocket relay data plane (token delivery), enabling sub-second TTFT (0.54 s median, 21.1x over batch mode's 11.40 s) through institutional firewalls without VPN or firewall rule changes, with end-to-end AES-256-GCM encryption ensuring the relay operator cannot read token payloads; (3) tier-aware context summarization that prevents long conversations from forcing simple queries onto expensive tiers; and (4) an HPC-as-API proxy mode that exposes HPC inference as an OpenAI-compatible endpoint callable from any standard client with no HPC expertise, a deployment pattern made practical only by the sub-second TTFT of contribution (2). Llama 3.2 3B achieves 85.1% free-tier retention on a 1,200-query benchmark spanning ten domains. Measured TTFT: 0.26 s local, 0.54 s HPC (relay), 1.68 s cloud.
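
The three-tier idea can be sketched as a small routing table. The TTFT values are the medians quoted in the abstract; the integer complexity scale, the `max_complexity` cutoffs, and the names `Tier`, `route`, and `speedup` are hypothetical, since the abstract does not describe how the complexity judge's output is mapped to a tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_complexity: int  # hypothetical: highest judge score this tier should handle
    ttft_s: float        # median time-to-first-token quoted in the abstract


# Ordered from cheapest to most expensive.
TIERS = (
    Tier("local", 1, 0.26),
    Tier("hpc", 2, 0.54),
    Tier("cloud", 3, 1.68),
)


def route(complexity, available=("local", "hpc", "cloud")):
    """Pick the cheapest available tier whose capacity covers the judged complexity."""
    for tier in TIERS:
        if tier.name in available and complexity <= tier.max_complexity:
            return tier
    raise ValueError("no available tier can serve this query")


def speedup(batch_ttft_s, streaming_ttft_s):
    """Factor by which streaming shortens time-to-first-token relative to batch mode."""
    return batch_ttft_s / streaming_ttft_s
```

As a check on the quoted figures, `speedup(11.40, 0.54)` rounds to 21.1, consistent with the 21.1x improvement the abstract reports for relay streaming over batch mode.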

2606.13962 2026-06-15 cs.HC cs.AI New submission

The Silent Cost of Artificial Intelligence Assistance: A Theory of Autonomy Surrender, the Recovery Mechanism, and the Restoration of Human Agency

人工智能辅助的隐性成本:自主性让渡理论、恢复机制与人类能动性的重建

Ancuta Margondai, Julie Rader, Emma Rader, Sara Willox, Mustapha Mouloua

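Affiliations * Department of Modeling and Simulation

AI summary Building on the HIAG framework, proposes a theoretical model of autonomy surrender, identifies the silent cost of AI assistance driven by cognitive bandwidth depletion, and designs recovery mechanisms to restore human agency.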

Comments 15 pages, 1 figure. Submitted version

Details
AI Chinese abstract

人工智能融入人类决策环境引入了一种此前未被充分理论化的成本:人类为获取信息和计算辅助而逐渐让渡自主性。基于人类身份与自主性差距(HIAG)框架,本文提出了一个自主性让渡的理论模型,将其视为由认知带宽消耗驱动的可测量、累积过程。该模型提出三种相互作用机制:AI辅助的隐性成本(自主性在无意识中逐步转移)、让渡阈值(超过该阈值后,恢复自主功能在认知和心理上变得困难)以及恢复机制(确立了设计义务和伦理责任,伴随人类有意识地重新掌握控制权)。本文认为,人类重新进入决策循环并非被动选择,而是一种需要有意恢复带宽的主动认知事件。AI系统的设计必须包含结构化的重新进入路径(此处称为恢复机制),以在适当分配责任的同时保留人类能动性。该模型进一步预测了一种终端状态(此处称为偏好反转),即对AI辅助的功能依赖不再被视为缺陷,而被体验为一种偏好,从而将自主性的恢复从设计问题转变为文化政治问题。本文为AI系统设计、治理框架和人因研究提供了启示。

English abstract

The integration of artificial intelligence into human decision-making environments has introduced a previously undertheorized cost: the gradual surrender of human autonomy in exchange for access to information and computational assistance. Building on the Human Identity and Autonomy Gap (HIAG) framework, this paper advances a theoretical model of autonomy surrender as a measurable, cumulative process driven by cognitive bandwidth depletion. The model proposes three interacting mechanisms: the silent cost of AI assistance, in which autonomy is transferred incrementally and without awareness; the surrender threshold, beyond which reclaiming autonomous function becomes cognitively and psychologically difficult; and the recovery mechanism, which establishes the design obligation and the ethical responsibility accompanying deliberate human re-assumption of control. The paper argues that human re-entry into the decision loop is not a passive option but an active cognitive event requiring intentional bandwidth restoration. The design of AI systems must incorporate structured re-entry pathways, here termed recovery mechanisms, that preserve human agency while appropriately distributing responsibility. The model further predicts a terminal state, here termed preference inversion, in which functional dependence on AI assistance is experienced not as a deficit but as a preference, transforming the restoration of autonomy from a design problem into a cultural and political one. Implications are drawn for AI system design, governance frameworks, and human factors research.

2606.13957 2026-06-15 eess.IV cs.CV cs.MM New submission

High-Fidelity Video Compression based on Invertible Neural Transform and Implicit Conditioning

基于可逆神经变换和隐式条件的高保真视频压缩

Siyue Teng, Ho Man Kwan, Yuxuan Jiang, Fan Zhang, David Bull

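Affiliations * Visual Information Lab, University of Bristol, UK

AI summary Proposes InnVC, a video codec based on invertible neural networks and an implicit conditioning field; by keeping the main transform path invertible and decoupling strongly correlated content from fine details, it achieves notable gains in the high-quality regime.

Details
AI Chinese abstract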

基于学习的视频压缩最近在率失真性能上已与传统视频编解码器相媲美。然而,大多数现有方法依赖于不可逆的分析-合成变换,重建质量受到量化和变换近似误差的双重影响。在高质量点,量化误差较小,变换引起的失真占主导地位,这一限制尤为突出。为此,我们提出InnVC,一种基于可逆神经网络的视频编解码器,用于宽范围和高保真压缩。核心思想是在量化前保留可逆的主变换路径,同时通过紧凑的隐式条件场注入内容自适应上下文。这将强相关的视频内容与难以建模的细节解耦,使不同组件专门负责互补的重建任务,从而实现更高效的压缩。为进一步提高可压缩性,我们引入了一种调度掩码策略,逐步将信息内容集中到更少的潜在通道中,以实现更有效的熵编码。在UVG和MCL-JCV基准上的实验表明,InnVC在广泛的质量范围内实现了强大的压缩性能,在高质量区域尤为有效,相对于x265在UVG上PSNR的BD率降低21.66%,MS-SSIM降低46.06%。据我们所知,InnVC是首个在单一架构尺度内覆盖从低比特率到高保真操作点的神经视频编解码器,PSNR跨度超过20 dB。

English abstract

Learning-based video compression has recently achieved competitive rate-distortion performance compared to conventional video codecs. However, most existing methods rely on non-invertible analysis-synthesis transforms, with reconstruction quality subject to both quantization and transform approximation errors. This limitation becomes particularly restrictive at higher quality points, where quantization errors are small and transform-induced distortion dominates. To address this, we propose InnVC, an Invertible neural network based Video Codec for wide-range and high-fidelity compression. The core idea is to preserve an invertible main transform path prior to quantization, while injecting content-adaptive context through a compact implicit conditioning field. This decouples strongly correlated video content from harder-to-model fine details, allowing different components to specialize in complementary reconstruction tasks for more efficient compression. To further improve compressibility, we introduce a scheduled masking strategy that progressively concentrates informative content into fewer latent channels for more effective entropy coding. Experiments on the UVG and MCL-JCV benchmarks show that InnVC achieves strong compression performance over a broad quality range, being particularly effective in the high-quality regime, yielding BD-rate reductions of 21.66% in PSNR and 46.06% in MS-SSIM relative to x265 on UVG. To the best of our knowledge, InnVC is the first neural video codec to cover operating points from low bitrate to high fidelity within a single architecture scale, spanning more than 20 dB in PSNR.
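
Why an invertible transform path removes transform approximation error can be seen in a generic additive coupling layer, sketched below. This is a textbook building block for invertible networks and is not claimed to be InnVC's actual layer; `toy_shift` is a stand-in for a learned sub-network.

```python
def coupling_forward(x1, x2, shift_fn):
    """Additive coupling: pass x1 through unchanged and shift x2 by shift_fn(x1)."""
    shift = shift_fn(x1)
    return x1, [b + s for b, s in zip(x2, shift)]


def coupling_inverse(y1, y2, shift_fn):
    """Undo coupling_forward by recomputing the same shift from y1, which equals x1."""
    shift = shift_fn(y1)
    return y1, [b - s for b, s in zip(y2, shift)]


def toy_shift(values):
    """Stand-in for a learned network; it is only ever evaluated, never inverted."""
    return [0.5 * v * v + 1.0 for v in values]


x1, x2 = [1.0, -2.0], [0.25, 3.0]
y1, y2 = coupling_forward(x1, x2, toy_shift)
r1, r2 = coupling_inverse(y1, y2, toy_shift)
```

The inverse recovers the input up to floating-point rounding no matter how complicated `shift_fn` is, so in a codec built this way the only loss before entropy coding would come from quantization.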

2606.13952 2026-06-15 cs.CR cs.ET cs.LG New submission

Side-Channel Attacks Bypass Protection in 3D Printers

侧信道攻击绕过3D打印机的保护

Eric Yocam, Varghese Vaidyan, Micah Flack, Gurcan Comert, Judith L. Mwakalonge

Affiliations * Department of Computer Science, California Polytechnic State University; Beacom College of Computer and Cyber Sciences, Dakota State University; Idaho National Laboratory; Department of Computational Data Science and Engineering, North Carolina A&T State University; Department of Engineering, South Carolina State University

AI summary Presents the first evaluation of the Active Motor Noise Cancellation (AMNC) hardware countermeasure in commercial 3D printers, finding that it fully neutralizes the acoustic channel while the vibration channel still leaks geometric information in a device-specific way.

Comments 11 pages, 6 figures, 4 tables

Details
AI Chinese abstract

主动电机噪声消除(AMNC)作为硬件对策,已部署在商用熔融沉积成型(FDM)3D打印机中,用于防御针对知识产权(IP)的声学侧信道攻击。我们首次对部署的AMNC对策进行实证评估,使用来自两台配备AMNC的Bambu Lab打印机的同步声学和振动记录公共数据集,涵盖12个物体类别。AMNC完全中和了声学信道:分类准确率与8.33%的随机基线无法区分。AMNC未针对的振动信道仍然泄漏。通过汇总统计,泄漏是粗略且幅度驱动的(振动准确率约31%合并,36-47%打印机内),而波形形状几乎不携带信息(仅频率特征为随机)。一个摄入打印有序演化的全序列时间模型将准确率提升至约61%,而顺序打乱的控制(约33%)表明,一个实质性成分是真正的顺序性并依赖于打印进程。泄漏具有设备特异性:在一台打印机上训练的分类器转移到另一台时接近随机。我们得出结论:AMNC仅是声学防御;振动仍然是一个部分、几何相关的侧信道,它未解决,但在此数据集上不支持完整的几何重建;重建级攻击需要AMNC同样未涉及的磁或电源信道。我们发布所有代码。

English abstract

Active Motor Noise Cancellation (AMNC) ships in commercial fused deposition modeling (FDM) 3D printers as a hardware countermeasure against acoustic side-channel attacks that target intellectual property (IP). We present the first empirical evaluation of a deployed AMNC countermeasure, using a public dataset of synchronized acoustic and vibration recordings from two AMNC-equipped Bambu Lab printers across 12 object classes. AMNC fully neutralizes the acoustic channel: classification accuracy is indistinguishable from the 8.33% random baseline. The vibration channel, which AMNC does not target, still leaks. With summary statistics the leak is coarse and amplitude-driven (vibration accuracy approximately 31% pooled, 36-47% within-printer), while the waveform shape carries essentially nothing (frequency-only features at chance). A full-sequence temporal model that ingests the ordered evolution of the print raises accuracy to approximately 61%, and an order-shuffling control (approximately 33%) shows that a substantial component is genuinely sequential and tied to print progression. The leak is device-specific: a classifier trained on one printer transfers near chance to the other. We conclude that AMNC is an acoustic-only defense: vibration remains a partial, geometry-correlated side channel it does not address, but one that does not, on this dataset, support full geometric reconstruction; reconstruction-grade attacks would require the magnetic or power channels AMNC also leaves untouched. We release all code.

2606.13941 2026-06-15 gr-qc astro-ph.IM cs.LG New submission

Binary Black Hole Parameter Estimation with Hybrid CNN-Transformer Neural Networks

使用混合CNN-Transformer神经网络进行双黑洞参数估计

Panagiotis N. Sakellariou, Spiros V. Georgakopoulos, Sotiris Tasoulis, Vassilis P. Plagianakos

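Affiliations * University of Thessaly

AI summary Proposes a hybrid CNN-Transformer deep learning strategy for estimating the intrinsic and extrinsic parameters of non-precessing binary black hole systems, showing strong predictive performance and robustness on simulated and real gravitational-wave events.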

Comments Accepted manuscript. 12 pages, 10 figures

Journal ref Astronomy and Computing, vol. 54, 101027 (2026)

Details
AI Chinese abstract

引力波的探测彻底改变了我们探索宇宙基本方面的能力。传统上,建模的引力波信号通过基于模板的匹配滤波来识别,随后在信噪比时间序列中跨多个探测器进行符合分析。机器学习和深度学习的最新进展激发了人们对其在信号检测和参数估计中应用的兴趣。在本研究中,提出了一种混合深度学习策略,利用Transformer编码器的有效性以及成熟的卷积神经网络架构,尝试估计非进动双黑洞系统的内禀和外在参数。这项工作的主要焦点是点估计,即为每个参数生成单一最佳拟合值,而非完整的后验分布。该方法在嵌入高斯噪声的模拟信号和真实引力波事件上进行了评估,并在关键天体物理参数上展示了强大的预测性能和鲁棒性。

English abstract

The detection of gravitational waves has revolutionized our ability to explore fundamental aspects of the Universe. Traditionally, modeled gravitational-wave signals have been identified using template-based matched filtering, followed by coincidence analysis across multiple detectors in the signal-to-noise ratio time series. Recent advances in Machine Learning and Deep Learning have sparked growing interest in their application to both signal detection and parameter estimation. In this study, a hybrid Deep Learning strategy is proposed that leverages the effectiveness of Transformer encoders alongside well-established Convolutional Neural Network architectures in an attempt to estimate the intrinsic and extrinsic parameters of non-precessing binary black hole systems. The primary focus of this work is point estimation, producing single best-fit values for each parameter rather than full posterior distributions. This method is evaluated on both simulated signals embedded in Gaussian noise and real gravitational-wave events, and it demonstrates strong predictive performance and robustness across key astrophysical parameters.

2606.13912 2026-06-15 cond-mat.dis-nn cond-mat.str-el cs.LG physics.comp-ph quant-ph New submission

Direct/adaptive-mixture phase-gradient learning for neural-network quantum states with complex phase structure

具有复杂相位结构的神经网络量子态的直接/自适应混合相位梯度学习

Yi-Ran Xue, Rui Wang, Baigeng Wang, Chenan Wei

Affiliations * National Laboratory of Solid State Microstructures and Department of Physics; Department of Physics, University of Massachusetts; A. Alikhanyan National Science Laboratory; Collaborative Innovation Center of Advanced Microstructures, Nanjing University; Jiangsu Physical Science Research Center; Hefei National Laboratory

AI summary To address the fragile optimization of neural-network quantum states with complex phase structure, proposes a direct phase-gradient estimator and an adaptive mixture method that substantially reduce variance and improve accuracy, validated on a 100-site flux ladder and chiral XXX chains.

Comments 24 pages, 8 figures

Details
AI Chinese abstract

神经网络量子态是量子多体物理中领先的变分工具,但当基态具有非平凡符号或复杂相位结构时(这在规范场、时间反演对称性破缺和费米子统计中是普遍存在的),其优化变得脆弱。我们将这种脆弱性归因于相位梯度的随机估计器,而非网络表达能力。蒙特卡洛能量梯度的相位部分是一个有噪声的得分函数估计器;相反,对局部能量进行微分得到一个直接估计器,该估计器对相同的相位力无偏,方差低得多,并且只需要分离的振幅-相位假设。在100位点通量梯子上演示,以这种方式训练的小型网络达到0.89%的中位误差,而调整后的标准基线停滞在1.8%,更宽或更深的标准梯度网络误差从8.4%退化到24.6%。该优势延续到手性XXX链:直接估计器再次收敛到比标准估计器明显更低的误差,跨越α和系统尺寸;该优势随通量增加而在零通量控制中消失。两个估计器的自适应混合在最优混合系数下方差绝不会比更好的端点差,通过种子分辨的诊断将大部分增益归因于消除失败运行。因此,估计器设计成为复值神经量子态的一类重要杠杆。

English abstract

Neural-network quantum states (NQS) are a leading variational tool for quantum many-body physics, yet their optimization is fragile whenever the ground state carries a non-trivial sign or complex phase structure, a situation generic to gauge fields, broken time-reversal symmetry, and fermionic statistics. We trace this fragility to the stochastic estimator of the phase gradient rather than to network expressiveness. The phase sector of the Monte Carlo energy gradient is a noisy score-function estimator; differentiating the local energy instead yields a direct estimator that is unbiased for the same phase force, has far lower variance, and requires only a separated amplitude--phase ansatz. Demonstrated on a 100-site flux ladder, a small network trained this way reaches $0.89\%$ median error, where tuned standard baselines plateau at $1.8\%$ and wider or deeper standard-gradient networks degrade from $8.4\%$ to $24.6\%$. The advantage carries over to chiral XXX chains: the direct estimator again converges to a markedly lower error than the standard one, across $\alpha$ and system size; it grows with flux and vanishes in zero-flux controls. An adaptive mixture of the two estimators is provably never worse in variance than the better endpoint at the optimal mixing coefficient, with seed-resolved diagnostics tracing much of the gain to eliminating failed runs. Estimator design thus emerges as a first-class lever for complex-valued neural quantum states.
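
The claim that the mixture is never worse than the better endpoint follows from a short variance calculation, sketched here for a single scalar gradient component. The symbols $g_{\mathrm{d}}$ (direct estimator), $g_{\mathrm{s}}$ (standard estimator), $\sigma_{\mathrm{d}}^2$, $\sigma_{\mathrm{s}}^2$, and $c$ are notation introduced for this sketch, and the paper's actual statement may be more general.

```latex
\begin{align*}
g_\lambda &= \lambda\, g_{\mathrm{d}} + (1-\lambda)\, g_{\mathrm{s}}, \\
\mathrm{Var}(g_\lambda) &= \lambda^2 \sigma_{\mathrm{d}}^2 + (1-\lambda)^2 \sigma_{\mathrm{s}}^2
  + 2\lambda(1-\lambda)\, c, \qquad c = \mathrm{Cov}(g_{\mathrm{d}}, g_{\mathrm{s}}), \\
\lambda^\star &= \frac{\sigma_{\mathrm{s}}^2 - c}{\sigma_{\mathrm{d}}^2 + \sigma_{\mathrm{s}}^2 - 2c}, \\
\mathrm{Var}(g_{\lambda^\star}) &\le \min\{\mathrm{Var}(g_1), \mathrm{Var}(g_0)\}
  = \min(\sigma_{\mathrm{d}}^2, \sigma_{\mathrm{s}}^2).
\end{align*}
```

Since both estimators are unbiased for the same quantity, so is every $g_\lambda$. The expression for $\lambda^\star$ comes from setting the derivative of the quadratic in $\lambda$ to zero and is valid when $\sigma_{\mathrm{d}}^2 + \sigma_{\mathrm{s}}^2 - 2c = \mathrm{Var}(g_{\mathrm{d}} - g_{\mathrm{s}}) > 0$; the inequality holds because $\lambda = 1$ and $\lambda = 0$ are both admissible choices.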

2606.13905 2026-06-15 cs.IR cs.CL New submission

ADORE: Iterative Query Expansion with Retrieval-Grounded Relevance Feedback

ADORE: 基于检索反馈的迭代查询扩展

Amin Bigdeli, Negar Arabzadeh, Radin Hamidi Rad, Sajad Ebrahimi, Charles L. A. Clarke, Ebrahim Bagheri

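Affiliations * University of Waterloo; Mila – Quebec AI Institute; University of Toronto; University of California, Berkeley

AI summary Proposes the ADORE framework, which iteratively generates pseudo-passages, retrieves from the corpus, and assesses relevance, using retrieval feedback to guide query expansion and substantially improve retrieval performance.

Details
AI Chinese abstract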

基于LLM的查询扩展通过为原始查询添加额外上下文来改进检索。然而,大多数方法仍然是生成驱动的,产生看似合理的伪文档或扩展,而不检查目标语料库的响应。这可能导致检索漂移、放大误导性词汇或遗漏区分相关与不相关文档的术语。我们认为,有效的扩展需要基于检索的反馈,而不仅仅是单次生成或未经验证的迭代。我们引入ADORE(适应、观察、相关性评估),一个迭代框架,将检索结果转化为下一次扩展的反馈。在每一轮中,LLM生成伪段落,检索器暴露语料库响应,相关性评估器根据原始查询评估检索到的文档。这些判断识别出需要强化、仍未被覆盖以及需要抑制的内容。在TREC Deep Learning、BEIR和BRIGHT上,ADORE始终优于强大的查询扩展基线,在几乎所有评估设置中均有显著提升:在BEIR上,平均nDCG@10比BM25提高24.5%,比此前最强的查询扩展方法提高3.6%;在BRIGHT上,比BM25提高122.9%,比最佳查询扩展基线提高9.2%。我们的代码和数据已公开。

English abstract

LLM-based query expansion improves retrieval by enriching the original query with additional context. Yet most methods remain generation-driven, producing plausible pseudo-documents or expansions without checking how the target corpus responds. This can introduce retrieval drift, amplify misleading vocabulary, or miss terms that distinguish relevant from non-relevant documents. We argue that effective expansion requires retrieval-grounded feedback, not just single-pass generation or unverified iteration. We introduce ADORE (ADapt, Observe, Relevance Evaluate), an iterative framework that turns retrieval outcomes into feedback for the next expansion. At each round, an LLM generates pseudo-passages, a retriever exposes the corpus response, and a relevance assessor evaluates retrieved documents against the original query. These judgments identify what to reinforce, what remains undercovered, and what to suppress. Across TREC Deep Learning, BEIR, and BRIGHT, ADORE consistently outperforms strong query expansion baselines with notable improvements across nearly all evaluation settings, improving average nDCG@10 by 24.5% over BM25 and 3.6% over the strongest prior query expansion method on BEIR, and by 122.9% over BM25 and 9.2% over the best query expansion baseline on BRIGHT. Our code and data are publicly available.
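
The round structure described above can be sketched as a loop over three injected components. The callables `generate`, `retrieve`, and `assess` are hypothetical stand-ins for the LLM, the retriever, and the relevance assessor, and the feedback dictionary is a simplification: it tracks what to reinforce and what to suppress but does not model the "undercovered" signal the abstract mentions.

```python
def adore_loop(query, generate, retrieve, assess, rounds=3):
    """Iterate expansion -> retrieval -> relevance feedback.

    generate(query, feedback) -> list of pseudo-passage strings
    retrieve(expanded_query)  -> list of document ids
    assess(query, doc_id)     -> True if the document is judged relevant
    """
    feedback = {"reinforce": [], "suppress": []}
    expanded = query
    for _ in range(rounds):
        # Expansion is always anchored on the original query.
        expanded = " ".join([query] + generate(query, feedback))
        judged = [(doc, assess(query, doc)) for doc in retrieve(expanded)]
        feedback = {
            "reinforce": [doc for doc, relevant in judged if relevant],
            "suppress": [doc for doc, relevant in judged if not relevant],
        }
    return expanded, feedback
```

Judging retrieved documents against the original query rather than the expanded one is what keeps the loop from drifting toward its own pseudo-passages.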

2606.13892 2026-06-15 cs.CR cs.AI New submission

Crypto x AI, AI x Crypto: A Survey

Crypto x AI, AI x Crypto: 综述

Sarah Allen, Pranay Anchuri, James Austgen, Maryam Bahrani, Samuel Breckenridge, Aaron Buchwald, Christian Cachin, Andrés Fábrega, Jared Fernandez, James Hsin-yu Chiang, Marwa Mouallem, Roi Bar-Zur, Neil DeSilva, Ittay Eyal, Giulia Fanti, Ari Juels, Andrew Miller, Christian Sillaber, Dani Vilardell, Pramod Viswanath, Wenhao Wang, Matt Weinberg, Sen Yang, Jianzhu Yao, Fan Zhang

Affiliations * Initiative for CryptoCurrencies and Contracts (IC3); Ava Labs; Carnegie Mellon University; Cornell Tech; Flashbots; Offchain Labs; Ritual Labs; Technion; University of Bern; Princeton University; ETH Zurich; Teleport; Flashbots(X); Tel Aviv University

AI summary This survey systematically reviews research at the intersection of AI and blockchain (crypto), summarizing existing work, key findings, open problems, and industry misconceptions, and concludes that integration of the two is still at an early stage.

Details
AI Chinese abstract

Crypto x AI的交叉领域正在催生大量论文、产品、在线帖子和公司。然而,所有的喧嚣掩盖了已经完成的工作、存在的机遇和挑战,以及值得关注的开放问题。本综述论文探讨了AI能为基于区块链的技术(广义上的“crypto”)做什么(crypto x AI),反之亦然(AI x crypto)。我们系统化了现有工作,总结了关键要点,强调了开放的研究问题,并对普遍的行业误解提供了观点,得出结论:AI和crypto仍处于有意义整合的非常早期阶段。

English abstract

The intersection of crypto x AI is spawning papers, products, online posts, and companies. All the surrounding buzz, though, obscures what exactly has been done, what the opportunities and challenges are, and what open questions deserve attention. This survey paper asks what AI can do for blockchain-based technologies (broadly construed as "crypto") (crypto x AI), and vice versa (AI x crypto). We systematize existing work, summarize key takeaways, highlight open research questions, and offer a perspective on pervasive industry misconceptions, concluding that AI and crypto are still in the very early stages of meaningful integration.

2606.13868 2026-06-15 astro-ph.IM cs.LG New submission

Multi-Variable Stellar Parameter Estimation Using Residual Multitask Neural Networks

使用残差多任务神经网络的多变量恒星参数估计

Bruno Santos Meneses Barreto, Marcio Eisencraft

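Affiliations * Escola Politécnica, Universidade de São Paulo, SP

AI summary Proposes an end-to-end pipeline that uses a fully connected multitask neural network with residual blocks, tuned via Bayesian optimization, to estimate effective temperature, metallicity, and surface gravity from SDSS spectra, reaching range-normalized errors of 1% to 3% at low model complexity.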

Comments This manuscript has been submitted to the Congresso Brasileiro de Automática (CBA) and is currently under peer review

Abstract

We present an end-to-end pipeline for estimating stellar parameters from Sloan Digital Sky Survey Data Release 12 spectra using a fully connected multitask neural network with residual blocks, whose hyperparameters are tuned via Bayesian optimization. The preprocessing pipeline includes per-spectrum standardization, RobustScaler normalization of the target variables -- effective temperature $T_{\mathrm{eff}}$, metallicity $[\mathrm{Fe/H}]$, and surface gravity $\log g$ -- and data augmentation via Gaussian noise injection. On a held-out test set, the model achieved Mean Absolute Errors (MAE) of $59.76~\mathrm{K}$ for $T_{\mathrm{eff}}$, $0.103~\mathrm{dex}$ for $[\mathrm{Fe/H}]$, and $0.130~\mathrm{dex}$ for $\log g$. Normalized against the full-scale range of each parameter, these results represent range-normalized errors between $1\%$ and $3\%$, achieved with a highly efficient model complexity of approximately 540,000 trainable parameters. These results demonstrate that a compact residual multitask architecture, combined with principled signal preprocessing, provides a parameter-efficient solution for nonlinear parameter estimation in large-scale spectral datasets. In particular, the proposed model achieves competitive performance with substantially lower complexity than deeper neural network baselines.
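The three preprocessing steps named in the abstract can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' code: the function names are hypothetical, the median/interquartile-range scaling follows the convention that scikit-learn's `RobustScaler` uses by default, and the noise level `sigma` is an assumed free parameter that the abstract does not specify.

```python
import random
import statistics


def standardize_spectrum(flux):
    """Rescale one spectrum to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(flux)
    std = statistics.pstdev(flux)
    if std == 0.0:
        # A flat spectrum has no shape to preserve; return zeros.
        return [0.0 for _ in flux]
    return [(value - mean) / std for value in flux]


def fit_robust_scaler(targets):
    """Return (median, IQR) for one target column such as T_eff."""
    q1, median, q3 = statistics.quantiles(targets, n=4, method="inclusive")
    iqr = q3 - q1
    # Fall back to 1.0 so a constant column does not divide by zero.
    return median, (iqr if iqr > 0.0 else 1.0)


def robust_scale(targets, median, iqr):
    """Center on the median and divide by the interquartile range."""
    return [(t - median) / iqr for t in targets]


def add_gaussian_noise(flux, sigma, rng):
    """Augment a spectrum with independent zero-mean Gaussian noise per pixel."""
    return [value + rng.gauss(0.0, sigma) for value in flux]
```

Scaling the targets by median and interquartile range rather than mean and standard deviation keeps a few extreme stars from dominating the scale, which is the usual motivation for a robust scaler.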

2606.13859 2026-06-15 cond-mat.mtrl-sci cs.LG New submission

Closed-loop discovery of out-of-distribution processing protocols by evolutionary search and uncertainty-aware learning

Yu Liu, Stanislav Udovenko, Ching-Che Lin, Jaegyu Kim, Lane W. Martin, Susan Trolier-McKinstry, Sergei V. Kalinin

Affiliations * Department of Materials Science and Engineering, University of Tennessee, Knoxville; Materials Science and Engineering Department, Materials Research Institute, The Pennsylvania State University; Department of Materials Science and NanoEngineering, Rice University; Rice Advanced Materials Institute, Rice University; Department of Materials Science and Engineering, University of California, Berkeley; Departments of Chemistry and Physics and Astronomy, Rice University; Physical Sciences Division, Pacific Northwest National Laboratory

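AI summary Proposes a closed-loop workflow that combines evolutionary search over a compact waveform representation with uncertainty-aware deep kernel learning to automatically discover out-of-distribution processing protocols that enhance the nonlinear response of ferroelectric thin films, with experiments validating the underlying mechanism.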

Abstract

Many materials and chemical systems exhibit history-dependent responses, where functional outcomes are governed not only by final-state variables but by the time-dependent sequence of fields, temperatures, or chemical potentials applied during operation. Discovering new processing protocols is therefore a high-dimensional search problem in which the control variable is an entire waveform or sample history, and conventional strategies either remain confined to conservative interpolative families or become prohibitively measurement intensive. Here, a closed-loop workflow is introduced that couples evolutionary search over a compact waveform representation with uncertainty-aware deep kernel learning to generate, rank, and experimentally validate candidate protocols. Applied to ferroelectric thin films, with the scanning-probe tip-bias waveform as the protocol and the nonlinear electromechanical response as the reward, the workflow discovers waveform families that enhance nonlinearity by de-aging the film. Spatially resolved before/after measurements show that the best-performing waveforms selectively activate pre-existing, weakly pinned domain-wall segments, whereas the worst drive long-range irreversible switching. This framework reframes protocol tuning as out-of-distribution discovery, generalizable to synthesis and annealing trajectories, battery formation protocols, and other high-dimensional control problems.
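One generation of a loop like the one described above could look roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the waveform is reduced to a hypothetical bounded parameter vector, `surrogate` is a placeholder callable returning a predicted mean and uncertainty (standing in for the deep kernel learning model), and the upper-confidence-bound score `mean + kappa * std` is an assumed ranking rule that the abstract does not state.

```python
import random


def mutate(params, bounds, sigma, rng):
    """Gaussian mutation of a parameter vector, clipped to per-parameter bounds."""
    child = []
    for value, (low, high) in zip(params, bounds):
        step = rng.gauss(0.0, sigma * (high - low))
        child.append(min(high, max(low, value + step)))
    return child


def ucb_rank(candidates, surrogate, kappa):
    """Sort candidates by predicted mean plus kappa times predicted uncertainty."""
    def score(params):
        mean, std = surrogate(params)
        return mean + kappa * std
    return sorted(candidates, key=score, reverse=True)


def propose_batch(parents, bounds, surrogate, rng, n_children=20, batch=3,
                  sigma=0.1, kappa=1.0):
    """Mutate parents, rank the children, and return the top few to measure next."""
    children = [mutate(rng.choice(parents), bounds, sigma, rng)
                for _ in range(n_children)]
    return ucb_rank(children, surrogate, kappa)[:batch]
```

In a closed loop, the returned batch would be measured on the instrument, the surrogate retrained on the new results, and the best measured waveforms used as the next generation's parents. A larger `kappa` favors uncertain candidates, which is one way to push the search outside the region the model has already seen.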

2606.13858 2026-06-15 cs.IR cs.AI New submission

Mood-Aware Music Recommendation: Integrating User Affective Signals into Ranking Systems

Terence Zeng, Abhishek K. Umrawal

Affiliations * University of Illinois Urbana-Champaign

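AI summary Proposes a mood-conditioned ranking framework that integrates user affective signals into the recommendation process via softmax-based sampling in the energy-valence space; single-blind experiments indicate improved perceived recommendation quality.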

Comments 13 pages, 4 figures, and 1 table

Abstract

Recommendation systems are essential in modern music streaming platforms due to the vast amount of available content. While collaborative filtering is widely used to suggest items based on the preferences of others with similar patterns, it performs poorly in domains where user-item interactions are sparse, such as music. Content-based filtering is an alternative approach that examines the qualities of the items themselves. Genre, instrumentation, and lyrics have been explored; however, relatively little attention has been given to emotion recognition. Since a user's emotional state strongly influences their music choice, incorporating mood signals offers a promising direction for personalization. In this work, we propose a mood-conditioned ranking framework that integrates user affective signals into the recommendation process via softmax-based sampling in the energy-valence space. We evaluate the approach via single-blind experiments in which participants compare recommendations from the proposed system against a baseline. The results indicate improved perceived recommendation quality, providing preliminary evidence for the effectiveness of incorporating mood-based inputs into music recommendations.
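Softmax-based sampling in an energy-valence space can be illustrated with a small sketch. The abstract does not give the scoring function, so everything specific here is an assumption: each track is a hypothetical `(name, energy, valence)` tuple with coordinates in [0, 1], the score is the negative squared distance to the user's target mood point, and `temperature` controls how sharply the sampling concentrates on the closest tracks.

```python
import math
import random


def mood_softmax_probs(tracks, target, temperature=0.25):
    """Softmax over negative squared distance to the target (energy, valence) point."""
    scores = []
    for _, energy, valence in tracks:
        dist_sq = (energy - target[0]) ** 2 + (valence - target[1]) ** 2
        scores.append(-dist_sq / temperature)
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_playlist(tracks, target, k, rng, temperature=0.25):
    """Draw up to k distinct tracks, recomputing the softmax after each pick."""
    pool = list(tracks)
    picked = []
    for _ in range(min(k, len(pool))):
        probs = mood_softmax_probs(pool, target, temperature)
        index = rng.choices(range(len(pool)), weights=probs, k=1)[0]
        picked.append(pool.pop(index)[0])
    return picked
```

Sampling instead of taking the top-k keeps some variety in the list: tracks near the target mood are most likely to appear, but more distant ones still have a nonzero chance, and lowering `temperature` moves the behavior toward a plain nearest-neighbor ranking.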

2606.13854 2026-06-15 cs.HC cs.AI New submission

SpheriCity: Designing Trustworthy Conversational AI for Sustainability Decision Support

Ahmed Qayyum, Madison Werner, Kathryn Youngblood, Jenna R. Jambeck, Tahiya Chowdhury

Affiliations * Department of Computer Science, Colby College; Circularity Informatics Lab, University of Georgia

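AI summary Presents SpheriCity, a provenance-first conversational AI prototype that uses structured synthesis and interaction scaffolds to support trustworthy knowledge sensemaking from city-level circularity assessment reports, addressing the transparency and trust problems of large language models in high-stakes sustainability contexts.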

Comments Accepted to ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies (COMPASS '26)

Abstract

We present SpheriCity, an expert-grounded conversational prototype designed to support trustworthy knowledge sensemaking from sustainability reports. City-level circularity assessment reports contain rich information about materials, infrastructure, and policy interventions, yet their length and heterogeneous structure make cross-document synthesis and comparison difficult for practitioners and researchers working on circular economy initiatives. While large language models (LLMs) promise faster knowledge access and synthesis, their opaque reasoning, hallucinations, and lack of source transparency introduce risks for trust and interpretability, and require verification in high-stakes sustainability contexts. SpheriCity addresses these challenges through a provenance-first conversational agent that foregrounds evidence traceability, structured synthesis, and interaction scaffolds to support exploratory querying and cross-document synthesis across sustainability reports. We conducted a formative expert review with six sustainability experts using representative queries spanning cross-city comparison, policy summarization, and recommendation-oriented tasks. Experts evaluated responses across multiple dimensions and provided qualitative reflections on the system's usefulness for sustainability knowledge work. Our results reveal that transparent sourcing, contextual explanation, interpretability, and alignment with expert workflows strongly shape expert trust and judgments of system usefulness. This work contributes (1) a conversational prototype for sustainability knowledge sensemaking, (2) an expert-grounded evaluation framework for assessing AI responses in high-stakes knowledge domains, and (3) design insights into how provenance, uncertainty communication, and workflow integration influence expert users' trust in AI assistance for sustainability decision support.