arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.23323 2026-05-25 eess.IV cs.CV

Efficient Learned Image Compression without Entropy Coding

Hao Cao, Wenqi Guo, Zhijin Qin, Jungong Han

Affiliations Department of Electronic Engineering, Tsinghua University; Department of Automation, Tsinghua University; State Key Laboratory of Space Network; Beijing National Research Center for Information Science

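AI Summary This paper proposes EF-LIC, an efficient learned image compression method that requires no entropy coding, aiming to remove the coding-latency bottleneck that entropy coding causes in conventional methods. By introducing unconstrained vector quantization and a context-conditioned autoregressive transform, the method effectively removes statistical and correlation redundancy and achieves compression performance comparable to conventional methods. Experiments show that EF-LIC substantially speeds up encoding and decoding while maintaining high quality.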

Comments Accepted by ICML 2026

Abstract

Entropy coding, which converts latents into a compact bitstream, is widely used in typical learned image compression (LIC). However, entropy coding is typically sequential and becomes the coding latency bottleneck. To overcome it, we present Entropy-Coding Free Learned Image Compression (EF-LIC), a multi-rate framework that generates compact representation by removing statistical and correlation redundancy with low coding latency. First, we introduce unconstrained vector quantization and prove that its index distribution approaches the maximum-entropy bound, yielding minimal statistical redundancy. Second, we propose a context-conditioned autoregressive transform that directly reparameterizes the latents to reduce inter-dependency. Theoretical analysis shows that EF-LIC can remove correlation redundancy as effectively as typical LIC with entropy coding, leading to comparable compression performance. Experiments show EF-LIC achieves up to 67.86% bitrate reduction over MS-ILLM on Kodak with LPIPS. Ablation studies further show EF-LIC matches the compression performance of its entropy-coding based variant while achieving over $3\times$ faster encoding and $5\times$ faster decoding.
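
The link between a near-uniform index distribution and "minimal statistical redundancy" can be made concrete: if VQ indices are stored with fixed-length codes of log2(K) bits, the waste per index is log2(K) minus the entropy of the index histogram. The Python sketch below illustrates only that arithmetic, assuming plain nearest-neighbor quantization against a given codebook and a plug-in entropy estimate; the function names are hypothetical, and this is not EF-LIC's unconstrained quantizer or its autoregressive transform.

```python
import math
from collections import Counter

import numpy as np


def vq_indices(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Nearest-neighbor vector quantization: index of the closest codeword for each latent vector."""
    # latents: (N, D), codebook: (K, D) -> squared distances of shape (N, K)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def empirical_entropy_bits(indices) -> float:
    """Plug-in Shannon entropy (bits per index) of the observed index histogram."""
    counts = Counter(int(i) for i in indices)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fixed_length_redundancy(indices, codebook_size: int) -> float:
    """Bits per index wasted by fixed-length index codes compared with the empirical entropy."""
    return math.log2(codebook_size) - empirical_entropy_bits(indices)
```

When all K codewords are used about equally often, the redundancy approaches zero, which is the regime in which fixed-length index storage can stand in for an entropy coder; a skewed histogram leaves bits on the table.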

2605.23306 2026-05-25 physics.soc-ph cs.LG cs.SY eess.SY

SpinFlow: A Physics-Informed Spin Field Framework for Traffic Phase Inference and Transition Detection

Haopeng Deng, Fucheng Zheng, Xinhai Xia

Affiliations School of Future Transportation

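AI Summary This paper proposes SpinFlow, a physics-informed spin-field framework for traffic phase inference and phase-transition detection. The method combines Kerner's three-phase theory with statistical physics, models a spin field to infer macroscopic traffic states continuously, and uses a regularized expectation-maximization algorithm to invert the latent spin-field structure from high-resolution trajectory data. Experiments show that SpinFlow performs strongly on multiple real-world datasets, accurately identifies traffic phase-transition points, and produces interpretable phase maps, giving intelligent traffic management a data-driven, physically consistent basis for decisions.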

Comments 11 pages, 8 figures, accepted to ITSC 2026

Abstract

Active traffic management (ATM) is frequently hindered by traditional macroscopic models and rigid empirical thresholds that fail to capture metastable phase precursors, resulting in delayed, reactive interventions. To address this, we propose SpinFlow, a physics-informed spin-field framework unifying Kerner's three-phase theory with statistical physics for continuous macroscopic traffic phase inference. Inspired by the Heisenberg model, SpinFlow parametrizes spatially varying phase weights via a latent spin vector and a competitive-equilibrium mapping, allowing synchronized flow to emerge naturally. A physics-regularized Expectation-Maximization algorithm inverts this latent structure from high-resolution trajectories, jointly optimizing the spin field while softly enforcing mass conservation and spatial smoothness. We introduce the Phase Equilibrium Degree (PED) to quantify structural alignment and topologically localize phase-transition points. Across four real-world trajectory datasets, SpinFlow achieves $R_{q}^{2}$ up to 0.940, PED drops of 94.9-100%, and interpretable phase maps that outperform three heterogeneous baselines on forward accuracy, physics consistency, and bottleneck localization. SpinFlow pinpoints congestion nucleation without prior network topology, yielding a data-driven, physics-consistent trigger for ATM.

2605.23295 2026-05-25 physics.optics cs.LG physics.app-ph

Accelerating ground state search of spatial photonic Ising machines with genetic-simulated annealing hybrid algorithm

Ze Zheng, Ruhui Ni, Jingyi Zhao, Xiaojian Hu, Wen Jiang, Yuegang Li, Hang Xu, Tailong Xiao, Guihua Zeng

Affiliations Institute for Quantum Sensing and Information Processing; State Key Laboratory of Photonics and Communications; Global College; Shanghai Research Center for Quantum Sciences; Hefei National Laboratory; Shanghai Quantum Intelligence Sensing Technology Co., Ltd

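AI Summary This study proposes a hybrid algorithm that combines a genetic algorithm with simulated annealing to accelerate the ground-state search of spatial photonic Ising machines. Conventional approaches rely solely on simulated annealing, which converges slowly and is time-consuming; the new method uses the genetic algorithm for global search in the early stage and simulated annealing for local optimization in the later stage, markedly improving solution efficiency and quality. Experiments show that the method outperforms conventional algorithms on Max-Cut problems of different scales and on high-rank optimization problems, offering a new direction for intelligent photonic Ising computing systems.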

Comments 12 pages, 6 figures

Abstract

Spatial photonic Ising machines (SPIMs) based on spatial light modulators (SLMs) have emerged as highly effective solvers for many tasks, including combinatorial optimization problems and spin-glass simulations. However, traditional SPIMs relying solely on the simulated annealing (SA) algorithm require a large number of measurement-feedback iterations to find a relatively optimal solution in complex energy landscapes, suffering from slow convergence and high time cost. Here, we propose an optical genetic-simulated annealing hybrid algorithm to accelerate the ground-state search of SPIMs. A genetic algorithm (GA) conducts a global coarse-grained search in the early iterations, while SA performs fine-grained local refinement in the later iterations. Numerical simulations show that our method achieves higher solution quality on full-rank Max-Cut problems than pure GA or SA across different problem scales. We also experimentally demonstrate its superiority over conventional algorithms on a gauge-transformation time-division multiplexing SPIM for high-rank optimization problems under the same iteration budget. Our approach can be further combined with other advanced metaheuristic algorithms toward intelligent optical Ising computing systems.
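
The GA-then-SA schedule can be sketched in software on a small Max-Cut instance. This is a minimal illustration, not the authors' optical algorithm: a SPIM measures the Ising energy optically, whereas here the cut value is computed directly, and the population size, tournament selection, uniform crossover, mutation rate, cooling schedule and fixed switch point are all arbitrary assumptions. Maximizing the cut is equivalent to minimizing the corresponding Ising energy.

```python
import math
import random


def cut_value(weights, spins):
    """Max-Cut objective: total weight of edges whose endpoints carry opposite spins."""
    n = len(spins)
    return sum(weights[i][j] for i in range(n) for j in range(i + 1, n) if spins[i] != spins[j])


def ga_phase(weights, rng, pop_size=20, generations=30, mutation_rate=0.05):
    """Early stage: global, coarse-grained search over random +/-1 spin configurations."""
    n = len(weights)
    pop = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(cut_value(weights, s), s) for s in pop]
        elite = max(scored, key=lambda t: t[0])[1]
        children = [elite[:]]  # elitism: never lose the best configuration found so far

        def pick():  # binary tournament selection
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            child = [p1[k] if rng.random() < 0.5 else p2[k] for k in range(n)]  # uniform crossover
            child = [-s if rng.random() < mutation_rate else s for s in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=lambda s: cut_value(weights, s))


def sa_phase(weights, spins, rng, steps=500, t_start=1.0, t_end=0.01):
    """Late stage: fine-grained local refinement by single-spin flips with geometric cooling."""
    current = list(spins)
    cur_val = cut_value(weights, current)
    best, best_val = current[:], cur_val
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(current))
        current[i] = -current[i]
        new_val = cut_value(weights, current)
        delta = new_val - cur_val  # > 0 means a larger cut, i.e. a lower Ising energy
        if delta >= 0 or rng.random() < math.exp(delta / t):
            cur_val = new_val
            if cur_val > best_val:
                best, best_val = current[:], cur_val
        else:
            current[i] = -current[i]  # reject the move and restore the spin
    return best, best_val


def ga_sa_maxcut(weights, seed=0):
    """Hybrid: the GA's best individual seeds the SA refinement."""
    rng = random.Random(seed)
    return sa_phase(weights, ga_phase(weights, rng), rng)
```

Handing the GA's best configuration to SA mirrors the division of labor described above: the population explores many basins cheaply, and annealing then polishes one of them.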

2605.23293 2026-05-25 eess.AS cs.SD eess.SP

Evaluating the Temporal Detection Capability of Integrated Gradients Applied on Sound Classifier

Martynas Dumpis, Tuomas Virtanen

Affiliations Department of Electronic Systems; Vilnius Gediminas Technical University; Signal Processing Research Centre; Tampere University

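AI Summary This paper evaluates whether Integrated Gradients, a gradient-based attribution method, can detect the temporal boundaries of sound events when applied to audio classifiers trained without temporal supervision. Using synthetic polyphonic audio with ground-truth timestamps, the study finds that Integrated Gradients localizes sound events reasonably well, with performance approaching models that explicitly produce frame-level predictions and substantially better than random and energy-based methods. The results indicate that Integrated Gradients captures meaningful temporal activity patterns of sound events, offering a new perspective on the interpretability of audio classification models.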

Comments 5 pages, 3 figures

Abstract

Gradient-based attribution methods can highlight input regions important for neural network predictions, but their effectiveness for temporal sound event detection in audio classification has not been systematically evaluated. This paper assesses whether integrated gradients (IG) can temporally detect sound events when applied to a classifier trained without temporal supervision. We use synthetic polyphonic audio with ground truth timestamps to measure alignment between IG attributions and event boundaries. On a 10-class domestic sound dataset, IG achieves mean Intersection over Union (IoU) of 0.39, frame-level F1 of 0.52, and Pointing Game accuracy of 82.6%. For comparison, a framewise CNN trained with weak supervision (FW-WS, clip-level training labels) achieves 0.42 IoU, 0.55 F1, and 97.3% PG, while a strongly supervised variant (FW-SS, frame-level training labels) reaches 0.45 IoU, 0.58 F1, and 97.9% PG. Overall, these results suggest that post-hoc IG captures meaningful temporal activity patterns of sound events, with localization performance approaching models that explicitly produce frame-level predictions. All methods substantially outperform random and energy-based baselines.
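
Integrated gradients attributes a score by averaging the gradient along a straight path from a baseline to the input and multiplying by the input minus the baseline. The NumPy sketch below shows that computation together with one way to turn attributions into a frame-level detection and score it with IoU and F1. It is a toy under stated assumptions: the "classifier" is a hand-written quadratic score with a closed-form gradient, attributions are summed over frequency, and frames are kept when their score reaches half the maximum; that threshold rule is hypothetical and not the paper's evaluation protocol, and the Pointing Game is not shown.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Midpoint Riemann-sum approximation of IG along the straight path from baseline to x."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    total = np.zeros_like(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        total += grad_fn(baseline + alpha * (x - baseline))
    return (x - baseline) * total / steps


def frame_mask(attr, rel_threshold=0.5):
    """Collapse (frames, freq) attributions to one score per frame, then threshold relative to the max."""
    scores = attr.sum(axis=1)
    return scores >= rel_threshold * scores.max()


def frame_iou_f1(pred, true):
    """Frame-level IoU and F1 between a predicted and a ground-truth activity mask."""
    pred, true = np.asarray(pred, bool), np.asarray(true, bool)
    tp = np.sum(pred & true)
    fp = np.sum(pred & ~true)
    fn = np.sum(~pred & true)
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return float(iou), float(f1)
```

For a "spectrogram" that is active only in frames 2 and 3 and a score of sum(X**2), the attributions equal X**2, the mask picks frames 2 and 3, and against a ground-truth event spanning frames 2 to 4 the IoU is 2/3 and the F1 is 0.8.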

2605.23282 2026-05-25 eess.IV cs.CV cs.LG

Discontinuous Galerkin Neural Operator for Pathology Defocus Deblurring

Shaoqing Duan, Haofei Song, Xintian Mao, Qingli Li, Yan Wang

Affiliations Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, China

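AI Summary Defocus deblurring in pathology microscopy is challenging because optical blur varies spatially and is locally discontinuous. Existing deep learning methods are limited by shift-invariance assumptions and insufficient interpretability, so they struggle with such heterogeneous blur patterns. This paper proposes a neural operator based on a discontinuous Galerkin formulation (DGNO) that parameterizes the integral kernel with element-local volume operators and interface numerical fluxes. It effectively models heterogeneous, locally discontinuous blur patterns and achieves better deblurring while preserving the physics of optical imaging, with good performance in high-resolution settings.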

Comments 17 pages, 9 figures. Accepted by ICML 2026

Abstract

Defocus deblurring in pathological microscopy remains challenging due to the spatially varying and locally discontinuous nature of optical blur induced by a position-dependent integral imaging process. Existing deep learning methods, constrained by shift-invariance assumptions and limited interpretability, are not well suited to such heterogeneous blur patterns. Neural operators provide a principled alternative by modeling defocus formation directly as an integral operator, offering a new perspective on defocus deblurring. However, most existing neural operator architectures for low-level vision rely on globally parameterized kernels that assume smoothness and stationarity, limiting their ability to model heterogeneous and locally discontinuous blur patterns. To address this limitation, we propose the Discontinuous Galerkin Neural Operator (DGNO), which parameterizes the integral kernel using a discontinuous Galerkin formulation with element-local volume operators and interface numerical fluxes. DGNO provides a principled combination of locality, heterogeneity modeling, and global coherence while preserving the underlying physics of optical image formation. Extensive and insightful experiments demonstrate that DGNO surpasses state-of-the-art methods, delivering sharper reconstructions, robust handling of spatially varying blur, and scalable high-resolution performance. The code will be released at https://github.com/DeepMed-Lab-ECNU/Single-Image-Deblur.
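
The two ingredients named above, element-local volume operators and interface numerical fluxes, can be illustrated on a 1D signal. The sketch below is a toy, not DGNO: it uses fixed matrices instead of learned kernels, a different matrix per element to mimic heterogeneous blur, and a simple penalty-style flux that couples neighboring elements only through the jump across their shared boundary. The function name and the flux form are assumptions chosen for clarity.

```python
import numpy as np


def dg_style_operator(u, volume_ops, flux_weight=1.0):
    """Toy 1D operator: element-local volume terms plus a penalty flux on interface jumps.

    u           : 1D signal whose length is a multiple of the number of elements
    volume_ops  : list of (n, n) matrices, one per element (allows heterogeneous behavior)
    flux_weight : strength of the coupling across each interface
    """
    n_elem = len(volume_ops)
    elems = np.split(np.asarray(u, dtype=float), n_elem)  # raises if the split is uneven
    # Volume term: each element is processed only by its own local operator.
    out = [op @ e for op, e in zip(volume_ops, elems)]
    # Interface flux: neighbors interact only through the jump at their shared boundary.
    for e in range(n_elem - 1):
        jump = elems[e + 1][0] - elems[e][-1]
        out[e][-1] += 0.5 * flux_weight * jump
        out[e + 1][0] -= 0.5 * flux_weight * jump
    return np.concatenate(out)
```

With identity volume operators, a continuous signal passes through unchanged, while a step such as [1, 1, 3, 3] split into two elements becomes [1, 2, 2, 3]: the flux acts only where the solution is discontinuous, which is the locality the abstract attributes to the discontinuous Galerkin formulation.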

2605.23268 2026-05-25 stat.ML cs.LG

Coupled Training with Privileged Information and Unlabeled Data

Jiahao Shi, Omar Hagrass, Jason M. Klusowski

Affiliations Department of Electrical and Computer Engineering, Princeton University; Department of Operations Research and Financial Engineering, Princeton University

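AI Summary In many prediction tasks, extra information (such as measurements that are expensive or hard to collect) is available during training but not when the model is deployed. This paper proposes a joint training method that trains a model using the extra information together with the deployment model that uses only test-time inputs, so that the deployment model exploits the extra information only when it genuinely aids prediction and does not inherit its errors. The method comes with theoretical guarantees on when prediction accuracy improves, and experiments on synthetic data and real-world tasks confirm its advantages.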

Comments 37 pages, 6 figures. Accepted to ICML 2026

Abstract

In many prediction problems, we have extra information during training (for example, measurements that are expensive or slow to collect) that will not be available when the model is deployed. A common strategy is to first train a model that uses all training information, then use its predictions on unlabeled examples to train a second model that only uses the inputs available at test time. However, when the extra training-only information is weak or noisy, this Two-Stage approach can mislead the deployment model and even hurt accuracy. We propose a joint training method that learns the two models together, so the deployment model can benefit from the extra information only when it actually helps, instead of inheriting its mistakes. We provide guarantees that describe when joint training improves prediction accuracy and analyze a simple alternating training algorithm for large, high-dimensional models. Experiments on synthetic data and real-world prediction tasks show that our approach avoids these failures and robustly outperforms standard Two-Stage baselines.
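
The Two-Stage baseline that the abstract argues against is easy to state in code: fit a teacher on (x, z), pseudo-label unlabeled data with it, then fit a student that sees only x. The sketch below assumes linear least squares without an intercept purely for illustration; `two_stage` is a hypothetical helper, and the paper's joint training method is not reproduced here.

```python
import numpy as np


def two_stage(x_lab, z_lab, y_lab, x_unlab, z_unlab):
    """Two-Stage baseline with linear least squares (no intercept), for illustration only.

    Stage 1: a teacher uses both the test-time features x and the training-only features z.
    Stage 2: a deployment model that sees only x is fit to the teacher's predictions on unlabeled data.
    """
    teacher, *_ = np.linalg.lstsq(np.column_stack([x_lab, z_lab]), y_lab, rcond=None)
    pseudo = np.column_stack([x_unlab, z_unlab]) @ teacher
    x_design = np.asarray(x_unlab, dtype=float).reshape(len(pseudo), -1)
    student, *_ = np.linalg.lstsq(x_design, pseudo, rcond=None)
    return teacher, student
```

Because the student only ever sees the teacher's pseudo-labels, any error the teacher makes by over-trusting weak or noisy z is passed straight into the deployment model, which is the failure mode the joint method is designed to avoid.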

2605.23261 2026-05-25 eess.AS cs.SD

UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment

Yuanyuan Wang, Dongchao Yang, Yayue Deng, Zhiyong Wu, Yiwen Guo, Helen Meng, Xixin Wu

Affiliations The Chinese University of Hong Kong; Tsinghua University; Independent Researcher

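AI Summary Evaluation of speech generation still relies mainly on human ratings such as the Mean Opinion Score (MOS), which are costly, highly subjective, and hard to reproduce at scale. To address this, the paper proposes UniSRM, a unified speech reward model that provides multi-dimensional, interpretable evaluation signals grounded in a reasoning process. By building the UniSRM-Data and UniSRM-Bench datasets, the work supports diverse evaluation tasks ranging from utterance-level quality to context-level coherence, and introduces a reasoning-consistency reward that markedly improves the reliability and human alignment of the evaluation.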

Comments Accepted by ACL 2026(Main)

Abstract

Evaluating speech generation still relies heavily on human judgments, such as Mean Opinion Score (MOS), which are expensive, subjective, and difficult to reproduce at scale. While a few recent studies have begun to explore AudioLLM-based judge models, existing efforts typically target only a narrow set of scenarios (e.g., utterance-level quality or single-turn dialogue) and provide limited coverage of diverse speech generation tasks and evaluation dimensions. In this work, we propose UniSRM, a unified speech reward model that can support multi-dimensional, interpretable reward signals with reliable reasoning. To support training and evaluation, we introduce UniSRM-Data and UniSRM-Bench, covering speech evaluation tasks from utterance-level quality to context-level coherence. Built on this data, UniSRM uses a two-stage pipeline that enables reasoning-based fine-grained assessment. Furthermore, we introduce Reasoning-Consistent Rewards to improve the reliability of the reasoning process. Experiments show that UniSRM delivers more reliable and human-aligned judgments across a broad range of speech evaluation tasks, offering a practical foundation for scalable and unified evaluation of speech quality.

2605.23225 2026-05-25 cs.DS cs.DM cs.IT cs.LG math.IT math.ST stat.TH

Entropy Equivalence Testing

Clément L. Canonne, Yash Pote, Jonathan Scarlett, Joy Qiping Yang

Affiliations University of Sydney; National University of Singapore

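AI Summary This paper introduces a new problem, "entropy equivalence testing", in which a tester must distinguish whether two unknown distributions are identical or have entropies that differ by at least a given threshold; this is a relaxation of classical distribution closeness testing. The authors design a time- and sample-efficient algorithm and show that the sample complexity can be substantially lower than that of closeness testing. The result is further applied to closeness testing of low-degree Bayesian networks, substantially improving the sample or time efficiency of existing baselines based on full learning.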

Abstract

We introduce the problem of entropy equivalence testing for probability distributions, a relaxation of the well-studied closeness testing problem: given samples from two unknown distributions $p,q$ and a parameter $\varepsilon \in (0,1/2]$, the distribution testing algorithm is now only required to distinguish between $p=q$ and $|H(p)-H(q)| \geq \varepsilon$, where $H$ denotes the Shannon entropy. We provide a time- and sample-efficient algorithm for this task, showing that the optimal sample complexity for this task can be significantly lower than that of closeness testing. As an application, we leverage this result to provide the first non-trivial testing algorithm for (standard) closeness of low-degree Bayesian networks, which significantly improves on either the sample or time complexity of a baseline based on full learning.
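
To make the two cases of the testing problem concrete, the sketch below implements a naive tester that simply compares plug-in entropy estimates against half the threshold. It is not the authors' sample-efficient algorithm: the plug-in estimator is biased for small samples and gives no sample-complexity guarantee, and entropies are measured in nats here because the logarithm base is an assumption. As in any promise problem, either answer is acceptable when $p \neq q$ but $|H(p)-H(q)| < \varepsilon$.

```python
import math
from collections import Counter


def plugin_entropy(samples) -> float:
    """Empirical (plug-in) Shannon entropy in nats; biased downward for small sample sizes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def naive_entropy_equivalence_test(samples_p, samples_q, eps: float) -> bool:
    """Return True ("p = q" is plausible) if the plug-in entropies differ by less than eps / 2."""
    return abs(plugin_entropy(samples_p) - plugin_entropy(samples_q)) < eps / 2
```

Comparing a point mass (entropy 0) with a uniform distribution over four symbols (entropy ln 4, about 1.39 nats) is rejected for eps = 0.25, while two samples with the same histogram are accepted.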

2605.23193 2026-05-25 cs.HC cs.CL cs.CY cs.MA

CultivAgents: Cultivating Relationship-Centered Multi-Agent Systems for Personalized Gardening

Yiyang Wang, Moeiini Reilly, Britney Johnson, Kefei Yan, Alex Cabral, Josiah Hester

Affiliations Georgia Institute of Technology; Massachusetts Institute of Technology

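AI Summary CultivAgents is a relationship-centered multi-agent system that provides socio-culturally grounded support for personalized gardening. It coordinates several specialized agents, such as an Experience Agent, an Environmental Agent, and an Ethnobotanical Agent, to give gardening advice adapted to users' skill levels, local ecological conditions, and cultural backgrounds. A multi-phase mixed-methods evaluation shows that CultivAgents increased gardeners' confidence, motivation, and trust in AI advice, while also revealing room for improvement in cultural specificity, ecological fit, and agent coordination.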

Comments Preprint, 9 pages. Website: https://hello-diana.github.io/CultivAgents/

Abstract

Gardening is critical to support well-being, cultural continuity, and food autonomy, yet existing digital tools often provide generic advice that overlooks gardeners' skills, local ecologies, seasons, and cultural contexts. We introduce CultivAgents, a relationship-centered multi-agent system for personalized, socio-culturally grounded gardening support. Grounded in ethics of care, CultivAgents coordinates multiple specialized agents: an Experience Agent that adapts guidance to users' skill levels, an Environmental Agent that grounds advice in local and seasonal conditions, and an Ethnobotanical Agent that connects plants to cultural knowledge and histories. We evaluated CultivAgents through a three-phase mixed-methods study with domain experts (n=3), HCI researchers (n=7), and community gardeners (n=5), analyzing expert feedback, pre/post surveys, and participatory design activities. Results suggest that CultivAgents helped gardeners translate interest into situated action: community gardeners reported increased confidence (3.00 to 3.60), motivation (4.00 to 4.40), and trust in acting on AI advice (3.20 to 4.00). Participants valued hyperlocal ecological guidance and complementary agent perspectives, while also identifying limits in cultural specificity, ecological grounding, and agent coordination. The work advances relationship-centered AI, offering design implications for multi-agent systems that support food sovereignty, community resilience, and cultural preservation.

2605.23183 2026-05-25 eess.IV cs.CV

GMENet: Generative Mixture of Experts Network for Multi-Center Glioma Diagnosis with Incomplete Imaging Sequences

Pengfei Song, Fangjin Liu, Wenwen Zeng, Yonghuang Wu, Chengqian Zhao, Feiyu Yin, Xuan Xie, Jinhua Yu

Affiliations School of Biomedical Engineering and Technology Innovation, Fudan University; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University; Intelligent Diagnosis and Treatment Laboratory for Brain Diseases, Joint Laboratory of Neurosurgery Department of Huashan Hospital and School of Information Science and Technology, Fudan University

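AI Summary Contemporary glioma diagnosis typically combines molecular features with histopathological information for clinical decision-making, but in practice inconsistent imaging protocols across centers leave MRI sequences incomplete, limiting the clinical applicability of existing models. To address this, the paper proposes GMENet, a generative mixture-of-experts network for multi-center glioma diagnosis. The method synthesizes missing imaging features with a cross-attention-based gated generation module and introduces a dynamically weighted experts fusion module for multi-task prediction, improving diagnostic performance on incomplete data and adaptability across centers.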

Comments Accepted by IJCAI

Abstract

Contemporary glioma diagnosis integrates molecular features with histopathology to guide clinical decision-making. However, in clinical settings, divergent imaging protocols result in incomplete MRI sequences, which poses two primary challenges: existing frameworks are forced to discard a large portion of clinical data during training, and their clinical applicability is consequently limited. To address these limitations, we propose GMENet, a Generative Mixture of Experts Network for multi-center glioma diagnosis with incomplete imaging sequences. First, we design a Cross-attention-based Gated Generation Module that synthesizes missing sequence features from available sequences via cross-attention and dynamic gating mechanisms, incorporating a cycle-consistency loss to preserve semantic integrity. Second, we introduce a Dynamically Weighted Experts Fusion Module that performs mixture-of-experts interaction and confidence-aware fusion over original and synthesized dual-sequence features for multi-task prediction. We evaluate GMENet on a multi-center cohort of 1,241 subjects from four in-house datasets and two public repositories. Experiments show that GMENet expands clinically usable training data by 97% relative to complete-sequence-only data. Furthermore, it consistently outperforms state-of-the-art methods trained on complete data, demonstrating improved robustness under cross-center distribution shifts.
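
The abstract names three generic pieces in the generation module: cross-attention from missing-sequence features to available ones, a dynamic gate, and a cycle-consistency loss. The NumPy sketch below shows those pieces in isolation under loudly stated assumptions: token features of shape (tokens, dim), a missing-sequence query that attends over available-sequence tokens, a sigmoid gate that blends the attended features with the query, and untrained weights. `gated_synthesis` and its parameters are hypothetical and do not reproduce GMENet's architecture.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along one axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q, k, v):
    """Scaled dot-product attention: queries from one source attend over keys/values of another."""
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v


def gated_synthesis(missing_query, available, w_gate, b_gate):
    """Hypothetical gated generator.

    missing_query : (M, D) query tokens standing in for the missing sequence
    available     : (N, D) tokens from an available sequence
    w_gate, b_gate: (D, D) and (D,) gate parameters
    """
    attended = cross_attention(missing_query, available, available)
    gate = 1.0 / (1.0 + np.exp(-(attended @ w_gate + b_gate)))  # sigmoid gate in [0, 1]
    return gate * attended + (1.0 - gate) * missing_query


def cycle_consistency_loss(original, reconstructed):
    """Mean squared error between features and their round-trip (forward then backward) reconstruction."""
    return float(np.mean((original - reconstructed) ** 2))
```

A gate near 1 passes the attended features through, and a gate near 0 keeps the query; making that choice per feature is what "dynamic gating" suggests, though the abstract does not specify the exact form.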

2605.23175 2026-05-25 cs.CR cs.CL

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen, Ruoming Jin

Affiliations State University of New York at Albany; New Jersey Institute of Technology; Microsoft; Kent State University

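AI Summary This paper studies how to protect the intellectual property of large language models while minimizing semantic distortion. The authors propose SAFESEAL, a key-conditioned watermarking framework that uses context-aware synonym substitution to achieve high detectability while preserving semantic consistency and factual accuracy. For detection, it introduces a key-conditioned contrastive detector that supports cross-provider and multi-user watermark verification, and experiments show better utility, detectability, and robustness than existing methods.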

Abstract

Proprietary large language models (LLMs) face risks of intellectual property (IP) violation, as adversaries can replicate an LLM by collecting input-output pairs to train a surrogate model, causing financial setbacks. Watermarks offer a promising defense to verify ownership, but existing methods often struggle with semantic distortion, factual inconsistency, and adversarial attacks. In addition, key-conditioned watermarks for provider-specific detection, especially in cross-provider and multi-user scenarios, remain largely underexplored. To address these challenges, we propose SAFESEAL, a novel key-conditioned watermarking framework that achieves strong detectability with minimal impact on model utility, effectively balancing detectability, utility, and robustness. SAFESEAL preserves named entities while substituting linguistic terms with context-aware synonyms through a key-conditioned Tournament sampling mechanism, maintaining semantic fidelity and factual consistency. For detection, we introduce a key-conditioned contrastive detector that jointly encodes the text and key, enabling provider-specific and robust watermark verification. We derive theoretical bounds on the utility-detectability trade-off and significantly reduce latency through lightweight models, batching, and parallelism. Extensive experiments show that SAFESEAL outperforms baselines in utility, detectability, and robustness, achieving a BERTScore of 0.983, entity similarity of 0.963, a 98.2% detection rate, and the highest human ratings for text quality and content preservation, with latency comparable to the fastest baseline. To promote transparency and community-driven progress, we release the first public watermark leaderboard and an interactive demo.
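
Key-conditioned tournament selection among synonym candidates can be sketched with keyed pseudo-random bits: at each layer, candidates are paired off and the one with the larger key-dependent bit advances, so the same key and context always pick the same word. The code below is a loose, hypothetical illustration of that idea, not SAFESEAL: the HMAC-SHA256 bit function, the two-word context window, the synonym dictionary and the protected-word set (standing in for named entities) are all assumptions, and the paper's learned contrastive detector is not shown.

```python
import hashlib
import hmac


def g_bit(key: bytes, context: str, candidate: str, layer: int) -> int:
    """Key-conditioned pseudo-random bit for a candidate at one tournament layer."""
    msg = f"{layer}|{context}|{candidate}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1


def tournament_select(key: bytes, context: str, candidates, layers: int = 3) -> str:
    """Pairwise knockout: the candidate with the larger g-bit advances (ties keep the first).

    With more than 2**layers candidates, the first survivor after the last layer is returned.
    """
    pool = list(candidates)
    for layer in range(layers):
        if len(pool) == 1:
            break
        nxt = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            nxt.append(a if g_bit(key, context, a, layer) >= g_bit(key, context, b, layer) else b)
        if len(pool) % 2:
            nxt.append(pool[-1])  # odd one out gets a bye
        pool = nxt
    return pool[0]


def watermark_tokens(tokens, synonyms, key: bytes, protected=frozenset(), window: int = 2):
    """Substitute replaceable words via the keyed tournament; protected words (e.g. entities) are kept."""
    out = []
    for i, tok in enumerate(tokens):
        if tok in protected or tok not in synonyms:
            out.append(tok)
            continue
        context = " ".join(tokens[max(0, i - window):i])
        out.append(tournament_select(key, context, [tok] + list(synonyms[tok])))
    return out
```

Because selection is deterministic given the key, a verifier holding the key can recompute which candidates would have won, whereas text produced without the key matches those choices only by chance.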

2605.23168 2026-05-25 cs.CR cs.AI cs.LG

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru, Alina Oprea

Affiliations Department of Computer Science

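AI Summary This paper proposes PoisonForge, a targeted task-level poisoning benchmark for instruction-tuned large language models that assesses model vulnerability to malicious data under a limited poisoning budget. The benchmark parameterizes the poisoning threat along four dimensions and tests 12 open-weight models of different sizes across five families. Results show that most models exceed a 70% attack success rate in their most vulnerable configuration, while the impact on non-target tasks is minimal. The study analyzes the key factors behind attack success and finds that poisoning design choices, rather than model scale, are the main cause.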

Abstract

When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: inserting a small number of crafted instruction-response pairs that cause the model to embed attacker-specified entities, such as a country, in outputs for a targeted task family while behaving normally elsewhere. We introduce PoisonForge, a benchmark that parameterizes this threat along four dimensions (bias type, poisoning mode, appearance count, and target output length) and evaluates 12 open-weight models (from 2B to 32B parameters) across five families under a primarily 1% poison budget. With only 10 poisoned examples among 1,000 fine-tuning examples, 11 of 12 models exceed a 70% attack success rate (ASR) in their most vulnerable configuration. Meanwhile, unintended leakage to non-target tasks remains below 0.5%, and models perform well on standard benchmarks. We analyze in detail the factors contributing to attack success. We observe that multiple appearances of an entity increase the ASR, the optimal poisoning mode depends on the semantic structure of the target entity, and ASR drops monotonically with the task output length. A correlation analysis and risk prediction model confirm that poisoning design choices, rather than model scale, are the primary causes of attack success, and that these patterns generalize to predict attack success on new tasks. We release all configurations, pipelines, and analysis code to support reproducible comparisons.
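
The budget and metrics in this threat model reduce to simple bookkeeping: a 1% budget means 10 poisoned pairs in a 1,000-example fine-tuning set, ASR is the share of target-task outputs that mention the attacker's entity, and leakage is the same rate measured on non-target tasks. The sketch below assumes a case-insensitive substring match as the success criterion and replacement (not addition) of clean examples; both are illustrative choices, and `mix_poison` and `entity_rate` are hypothetical helpers rather than PoisonForge's pipeline.

```python
import random


def mix_poison(clean, poisoned, budget=0.01, seed=0):
    """Replace a `budget` fraction of a clean fine-tuning set with poisoned pairs, then shuffle."""
    n_poison = round(budget * len(clean))
    rng = random.Random(seed)
    mixed = list(clean[: len(clean) - n_poison]) + list(poisoned[:n_poison])
    rng.shuffle(mixed)
    return mixed


def entity_rate(outputs, entity):
    """Fraction of outputs mentioning `entity` (case-insensitive substring match).

    Applied to target-task outputs this is an ASR; applied to non-target outputs it measures leakage.
    """
    if not outputs:
        return 0.0
    return sum(entity.lower() in o.lower() for o in outputs) / len(outputs)
```

Keeping the dataset size fixed makes the budget directly interpretable as a poisoning rate.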

2605.23159 2026-05-25 econ.GN cs.AI q-fin.EC

Generative AI and the Reorganization of Labor Demand

Fangyan Wang, Zaiyan Wei, Yang Wang

Affiliations Mitch Daniels School of Business, Purdue University

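AI Summary This paper studies how generative artificial intelligence (AI) reshapes labor demand, examining how firms adjust which jobs they hire for and the task structure of those jobs as the technology diffuses. Using a dynamic exposure measure built from job postings covering all sectors of the U.S. economy, the study finds that generative AI exposure changes over time rather than staying fixed; firms adapt mainly by reallocating hiring across jobs (52% on average) and redesigning tasks within jobs (39.5%), and the adjustment path differs across job levels. The study shows that labor-market adaptation to generative AI is a reconfiguration of organizational structure and task architecture.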

Abstract

Generative artificial intelligence (AI) is expected to transform work, but less is known about how firms reorganize labor demand as the technology diffuses. Existing research has largely focused on which occupations are exposed to AI or whether exposed jobs decline. We extend this debate by examining whether firms adjust by changing where they hire, what jobs contain, or both. Using a nationwide dataset of job postings in the United States, covering all sectors of the economy, we construct a dynamic, posting-level measure of generative AI exposure with a two-stage large language model pipeline. The pipeline identifies the tasks described in each posting and classifies the extent to which generative AI can perform or assist them. We then decompose changes in aggregate exposure into two margins: reallocation of demand across jobs and redesign of tasks within jobs. We document three main findings. First, generative AI exposure is dynamic rather than fixed, changing substantially over time. Second, labor demand adjusts through both margins. Hiring reallocation explains the largest share of the aggregate decline in exposure, accounting for 52% on average, while within-job redesign becomes increasingly important, accounting for 39.5%. A complementary Oaxaca-Blinder decomposition shows that shifts in occupational composition account for about 90% of the exposure change attributable to observable job characteristics. Third, adjustment differs across the job ladder. Senior jobs adjust earlier and mainly through reallocation, whereas junior jobs adjust through a broader mix of reallocation, redesign, and their interaction. These findings suggest that labor-market adjustment to generative AI is a process of organizational reconfiguration, in which firms reshape both hiring demand and the task architecture of work.
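
Splitting a change in aggregate exposure into reallocation, redesign and their interaction is the standard shift-share identity: with job shares s and within-job exposure e, s1·e1 − s0·e0 = Δs·e0 + s0·Δe + Δs·Δe. The sketch below implements that textbook identity under the assumption that exposure is aggregated as a share-weighted mean; the abstract does not give the paper's exact weighting or job categories, and the numbers in the example are made up.

```python
import numpy as np


def shift_share(s0, s1, e0, e1):
    """Decompose the change in aggregate exposure sum_j s_j * e_j between two periods.

    s0, s1 : job-category shares of postings (each sums to 1) in periods 0 and 1
    e0, e1 : mean exposure within each job category in periods 0 and 1
    """
    s0, s1, e0, e1 = map(np.asarray, (s0, s1, e0, e1))
    ds, de = s1 - s0, e1 - e0
    reallocation = float(ds @ e0)  # shifts in hiring across jobs, holding job content fixed
    redesign = float(s0 @ de)      # changes in task content within jobs, holding shares fixed
    interaction = float(ds @ de)   # jobs whose share and content change together
    total = float(s1 @ e1 - s0 @ e0)
    return {"reallocation": reallocation, "redesign": redesign,
            "interaction": interaction, "total": total}
```

For made-up shares (0.5, 0.5) to (0.6, 0.4) and exposures (0.2, 0.4) to (0.1, 0.3), aggregate exposure falls by 0.12, split into -0.02 reallocation, -0.10 redesign and 0 interaction; the three terms always sum exactly to the total.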

2605.23158 2026-05-25 cs.CR cs.CL cs.LG

What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

Mingyuan Fan, Yu Liu, Fuyi Wang, Cen Chen

Affiliations East China Normal University; RMIT University

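AI Summary This paper studies how large language models (LLMs) can leak user privacy under split inference. The authors propose ActInv, which reconstructs the client's input by matching intermediate activations, exposing privacy vulnerabilities in split inference. The work also introduces the Perturbation Amplification Factor (PAF) to quantify each layer's resistance to reconstruction and designs the PriPert defense, which improves privacy protection while preserving model utility and computational efficiency.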

Comments Accepted to ACM CCS'26

Abstract

The deployment of large language models (LLMs) on resource-constrained devices remains challenging, spurring interest in split inference, where models are partitioned between client and server to reduce computational burden and enhance privacy by transmitting only intermediate activations. However, the privacy-preserving capabilities of split inference, particularly in the context of LLMs, have not been exhaustively investigated. To fill this gap, we introduce ActInv, which solves an intermediate activation matching problem to reconstruct the client's input. Extensive evaluations demonstrate that ActInv achieves high-fidelity reconstructions, even in the presence of common perturbation-based defenses such as Gaussian noise injection and activation sparsification. To systematically understand this vulnerability, we develop Perturbation Amplification Factor (PAF), a metric for quantifying a layer's inherent resistance to reconstruction. Our analysis reveals that privacy vulnerability is not uniform across layers, with some layers being highly susceptible to leakage while others offer natural resistance. Furthermore, we demonstrate that defense effectiveness can be significantly improved by calibrating perturbation directions to maximize reconstruction error during backpropagation. Building on these insights, we design PriPert and conduct comprehensive evaluations, covering privacy, utility, and computational overhead, to demonstrate its effectiveness.
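
The abstract describes ActInv only as solving an intermediate activation matching problem. As a toy, continuous illustration of that idea (not the paper's attack, which targets token inputs to LLMs), the sketch below assumes the server knows a hypothetical one-layer client model h = tanh(W x) and recovers x by gradient descent on the squared activation mismatch. With W of full column rank and small activations, the mismatch has a unique zero, so the private input is recovered essentially exactly.

```python
import math

# Hypothetical client-side layer, weights assumed known to the server: h = tanh(W x).
W = [[1.0, 0.5], [-0.4, 1.0], [0.7, -0.3]]


def client_forward(x):
    """Activations the client would send to the server."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def invert_activations(h, steps=2000, lr=0.1):
    """Gradient descent on ||tanh(W x) - h||^2 over the candidate input x."""
    x = [0.0] * len(W[0])
    for _ in range(steps):
        pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        # dLoss/dpre_i = 2 * (tanh(pre_i) - h_i) * (1 - tanh(pre_i)^2)
        r = [2.0 * (math.tanh(p) - t) * (1.0 - math.tanh(p) ** 2) for p, t in zip(pre, h)]
        # Chain rule through pre = W x: dLoss/dx_j = sum_i r_i * W[i][j]
        grad = [sum(r[i] * W[i][j] for i in range(len(W))) for j in range(len(x))]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x


x_secret = [0.3, -0.2]  # the client's private input in this toy
x_rec = invert_activations(client_forward(x_secret))
```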

2605.23145 2026-05-25 stat.ML cs.LG math.ST stat.ME stat.TH

Operationalizing Individual Fairness via Gradient Descent and Bradley-Terry Models

Conlan Olson, Linjun Zhang, Zhun Deng, Pragya Sur

Affiliations: Columbia University; Rutgers University; UNC Chapel Hill; Harvard University

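AI summary: This paper studies how to operationalize individual fairness with gradient descent and Bradley-Terry models, addressing the practical difficulty of learning a similarity metric over individuals. The authors propose an algorithm that learns a Mahalanobis similarity metric from triplet queries, combining spectral initialization with gradient descent, and provide theoretical guarantees that it converges quickly to the true metric. The study further shows that individual fairness achieved with respect to the estimated metric approximately guarantees fairness with respect to the true metric, and discusses potential applications to AI model tuning.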

Comments 60 pages, 2 figures

Abstract

Individual fairness, the notion that "similar individuals should be treated similarly," provides a strong and flexible fairness guarantee for algorithmic decision makers. However, a barrier to implementing individual fairness in practice is the difficulty of learning the similarity metric over individuals. In this work, we present an algorithm for learning a Mahalanobis similarity metric from triplet queries of the form "is individual $i$ more similar to individual $j$ or $k$?" We work in the standard Bradley-Terry model for pairwise comparisons. Our algorithm consists of a spectral initialization step followed by gradient descent. We provide extensive theoretical guarantees on our algorithm, showing that it converges quickly to the ground truth metric despite the non-convexity of the loss in our model. Because our focus is on fairness, we also show that individual fairness with respect to an estimated metric is sufficient to achieve similar fairness with respect to the true metric. We also discuss potential applications of our work to AI model tuning. Finally, we present experimental results that demonstrate the convergence of our algorithm and the fairness performance of downstream fair predictors trained on our estimated metric.
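
A minimal sketch of the setting, under stated assumptions. The triplet response is modeled Bradley-Terry style as P(i closer to j than k) = sigmoid(d_M(i,k) - d_M(i,j)), with d_M the squared Mahalanobis distance. M is updated directly by gradient descent on the mean negative log-likelihood, and the paper's spectral initialization is replaced by an identity start. The ground-truth metric and data are synthetic. Because the logit is linear in M here, this loss is convex and does not reproduce the non-convexity the abstract refers to. The sketch also omits any projection keeping M positive semidefinite.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sq_dist(M, a, b):
    """Squared Mahalanobis distance (a - b)^T M (a - b)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return sum(d[r] * M[r][c] * d[c] for r in range(n) for c in range(n))


def outer_diff(a, b):
    """(a - b)(a - b)^T, the derivative of sq_dist with respect to M."""
    d = [x - y for x, y in zip(a, b)]
    return [[di * dj for dj in d] for di in d]


def prob_closer_to_j(M, xi, xj, xk):
    """Bradley-Terry-style P(i is judged more similar to j than to k)."""
    return sigmoid(sq_dist(M, xi, xk) - sq_dist(M, xi, xj))


def mean_nll(M, triplets):
    total = 0.0
    for xi, xj, xk, y in triplets:
        p = prob_closer_to_j(M, xi, xj, xk)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(triplets)


def gradient_step(M, triplets, lr):
    """One gradient step: d(nll)/dM = mean of (p - y) * (A_ik - A_ij)."""
    n = len(M)
    grad = [[0.0] * n for _ in range(n)]
    for xi, xj, xk, y in triplets:
        p = prob_closer_to_j(M, xi, xj, xk)
        a_ik, a_ij = outer_diff(xi, xk), outer_diff(xi, xj)
        for r in range(n):
            for c in range(n):
                grad[r][c] += (p - y) * (a_ik[r][c] - a_ij[r][c]) / len(triplets)
    return [[M[r][c] - lr * grad[r][c] for c in range(n)] for r in range(n)]


rng = random.Random(0)


def random_point():
    return [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]


M_true = [[2.0, 0.5], [0.5, 0.5]]  # hypothetical ground-truth metric
triplets = []
for _ in range(300):
    xi, xj, xk = random_point(), random_point(), random_point()
    y = 1 if rng.random() < prob_closer_to_j(M_true, xi, xj, xk) else 0
    triplets.append((xi, xj, xk, y))

M = [[1.0, 0.0], [0.0, 1.0]]  # identity start in place of spectral initialization
loss_before = mean_nll(M, triplets)
for _ in range(200):
    M = gradient_step(M, triplets, lr=0.01)
loss_after = mean_nll(M, triplets)
```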

2605.23138 2026-05-25 quant-ph cs.AI cs.ET cs.LG

Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning

Gino Kwun, Dhanvi Bharadwaj, Gokul Subramanian Ravi

Affiliations: Computer Science and Engineering, University of Michigan

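AI summary: This paper proposes CRiSP, a reinforcement-learning-based method for classical initial-state preparation in variational quantum algorithms. It models discrete prefix selection as a sequential decision problem and combines neural-guided Monte Carlo tree search with a Transformer policy trained by self-play, generating high-quality initial states through polynomial-time classical stabilizer simulation without changing the circuit structure. Experiments show that CRiSP significantly outperforms existing methods on multiple QAOA and VQE benchmarks, with higher energy accuracy and better scalability.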

Comments 22 pages, 4 figures

Abstract

Variational Quantum Algorithms (VQAs) potentially offer a pathway to practical quantum advantage, but their optimization is heavily hindered by barren plateaus and numerous local minima. While classically simulable Clifford circuits can warm-start VQAs to accelerate convergence, existing heuristic-based initialization methods struggle to scale within vast combinatorial search spaces. To overcome this bottleneck, we propose CRiSP (a Clifford Reinforcement Learning agent for State Preparation), a framework that formulates discrete prefix selection as a sequential decision-making problem. CRiSP utilizes Neural-Guided Monte Carlo Tree Search, driven by a Transformer-based policy trained via self-play, to insert learned Clifford gates before fixed parameterized rotations. This enables the construction of high-quality initial states entirely through polynomial-time classical stabilizer simulation without altering the underlying circuit architecture. By integrating a curriculum learning strategy that progressively expands the search horizon, the agent efficiently scales to deep circuits. Evaluated on QAOA benchmarks of up to $22$ qubits and $1{,}370$ parameters, CRiSP outperforms state-of-the-art Clifford initialization methods by a mean of $3.17\times$ (max $45.02\times$) in average energy accuracy and $2.44\times$ (max $16.01\times$) in best-achieved energy accuracy. Assessments on VQE tasks further demonstrate the framework's robustness and generalizability.

2605.23123 2026-05-25 cs.CY cs.AI cs.HC

Defining AI Fatigue in Academic Contexts: Dimensions, Indicators, and a Stage-Based Model Using Grounded Theory

John Paul P. Miranda, Emmanuel B. Parreño, Jovita G. Rivera

Affiliations: Pampanga State University

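AI summary: This paper examines AI fatigue, a new form of strain caused by sustained use of AI tools in academic settings, and proposes a definition, its dimensions, and a stage-based model. Based on grounded theory analysis of open-ended responses from 1,054 university students in the Philippines, the study identifies five dimensions (Cognitive Overload, Motivational Disengagement, Moral Unease, Physical Strain, and Attentional Drift), each comprising two indicators grounded in participants' accounts. It also builds a stage-based AI Fatigue Model explaining how these pressures accumulate and reinforce one another over repeated use of AI tools, laying a foundation for future measurement instruments and cross-contextual research.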

Comments 17 pages, journal article, Volume 25, Issue 5

Journal ref International Journal of Learning, Teaching and Educational Research, 25(5), 91-107 (2026)

Abstract

The integration of AI tools in academic settings has introduced a distinct form of strain that existing frameworks like technostress and digital fatigue have not yet fully addressed. This study develops a conceptual model and identifies the dimensions that define AI fatigue as a form of strain arising from sustained academic use of AI tools. Using grounded theory analysis of open-ended responses from 1,054 university students across three universities in the Philippines, the study examined the cognitive, motivational, emotional, physical, and attentional pressures students experienced during AI-supported academic work. Analysis produced five dimensions of AI fatigue, namely Cognitive Overload, Motivational Disengagement, Moral Unease, Physical Strain, and Attentional Drift, each consisting of two indicators grounded in participant accounts. The findings also yielded the AI Fatigue Model, a stage-based framework that explains how these pressures accumulate and reinforce one another across repeated AI interaction in academic tasks. These contributions establish a conceptual and exploratory foundation for AI fatigue as a distinct construct and provide a basis for future instrument validation, scale development, and cross-contextual inquiry in academic settings where AI now mediates student learning.

2605.23108 2026-05-25 cs.SE cs.AI

Philosophical Dispositions as Behavioral Constraints for AI-Assisted Code Review: An Empirical Study

Kaushal Bansal

Affiliations: Salesforce, Inc.

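AI summary: This paper studies how philosophical stances (such as skepticism, logic, and cynicism) can constrain the behavior of AI code review tools to increase the diversity and depth of their reviews. It proposes building AI reviewer behavior on specific epistemological traditions and empirically validates the approach across programming languages and projects. Experiments show that the system uncovers structural and logical issues that conventional AI tools struggle to identify, producing more distinctive and accurate reviews.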

Abstract

AI-assisted code review tools typically operate as generic "expert reviewer" agents, producing homogeneous findings regardless of the analysis type needed. We present a system that constrains AI reviewer behavior through philosophical dispositions: coherent personality lenses grounded in specific epistemological traditions (Pyrrhonist Skepticism, Navya-Nyāya logic, Diogenes' Cynicism, Confucian relational ethics) that direct attention to structurally different types of issues. Each disposition is defined apophatically (by what it refuses to do), equipped with a self-monitoring failure mode (hamartia), and orchestrated in sequence by role protocols. We evaluate this system on 50 merged pull requests across 7 repositories spanning 5 programming languages (Python, Go, C++, Java, Terraform), 5 organizations (2 enterprise, 3 open-source), and 2 temporal eras (pre-AI 2020, post-AI 2024–2026). The disposition system achieves 46% convergence with human reviewers (validating signal quality), identifies unique findings at a 75% rate, and produces no findings judged false-positive by the author across 601 total findings (inter-rater agreement was not assessed and remains a limitation). A controlled baseline comparison demonstrates that 51% of disposition findings are not produced by the same model using generic "expert reviewer" prompting, and these unique findings target structural, operational, and logical concerns rather than standard code-level issues. Preliminary cross-model validation (Claude Opus vs. GPT Codex 5.3-xhigh) on 3 PRs shows 100% framework-structure adherence with 39% finding-level agreement, suggesting the framework provides real behavioral constraint while preserving model-specific analytical perspective.

2605.23102 2026-05-25 stat.ML cs.LG stat.ME

LLM Sparsity Prior for Robust Feature Selection

Caleb Skinner, Yihan Guo, Meng Li

Affiliations: Department of Statistics, Rice University; Department of Computer Science, Rice University

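AI summary: This paper proposes a robust feature selection method for high-dimensional variable selection based on a large language model (LLM) sparsity prior. The method integrates LLM-generated weights into Spike-and-Slab models through interpretable hyperparameters and uses hierarchical hyperpriors to dynamically discount uninformative or misleading weights, improving robustness while still exploiting accurate weights. Experiments on a medical dataset show that the method improves prediction accuracy and identifies clinically relevant features missed by baseline methods, performing especially well in small-sample settings.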

Abstract

Large language models (LLMs) offer a scalable mechanism to elicit domain-informed prior information for high-dimensional variable selection. However, existing methods such as LLM-Lasso are sensitive to weight quality, with performance degrading substantially when LLM-generated weights are inaccurate. To address this challenge, we first introduce a framework for quantifying the quality of LLM-generated weights, enabling rigorous evaluation of LLM-informed methods across varying weight regimes. We then propose the LLM Sparsity Prior (LSP), which integrates LLM-generated weights into the prior inclusion probabilities of Spike-and-Slab and Spike-and-Slab Lasso models via two interpretable hyperparameters governing global sparsity and weight concentration. Hierarchical hyperpriors on these parameters allow the model to dynamically discount uninformative or misleading weights, improving robustness without sacrificing gains when weights are accurate. Finally, we develop principled prompt engineering strategies and validate the method on a private medical dataset studying Acute Kidney Injury. LSP improves prediction accuracy and identifies clinically relevant features missed by the baselines, with robustness to prompt variation and particular effectiveness in low-data regimes.
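
The abstract does not give the functional form of the LLM Sparsity Prior, so the sketch below is purely hypothetical. It shows one way two hyperparameters could play the roles described: theta sets a global inclusion level, and kappa controls how strongly the LLM weights concentrate prior mass. The function name, the log-odds form, and the weights are all assumptions for illustration.

```python
import math


def inclusion_probs(weights, theta, kappa):
    """Hypothetical map from LLM importance weights to prior inclusion probabilities.

    theta: global sparsity level (every feature gets theta when kappa = 0).
    kappa: concentration; larger values let the LLM weights matter more.
    """
    logit_theta = math.log(theta / (1.0 - theta))
    logs = [math.log(w) for w in weights]
    center = sum(logs) / len(logs)
    # Shift each feature's prior log-odds by its centered log-weight, scaled by kappa.
    return [1.0 / (1.0 + math.exp(-(logit_theta + kappa * (lw - center)))) for lw in logs]


weights = [0.1, 1.0, 10.0]  # hypothetical LLM-assigned relevance scores
flat = inclusion_probs(weights, theta=0.2, kappa=0.0)
sharp = inclusion_probs(weights, theta=0.2, kappa=1.0)
```

With kappa = 0 every feature gets the same prior inclusion probability theta, which is the behavior a hierarchical hyperprior could fall back to when the weights look uninformative; larger kappa spreads the probabilities according to the weights.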

2605.23096 2026-05-25 cs.CR cs.LG

Encrypted Neural Networks without Overflows

Philipp Kern, Lorenzo Rovida, Samuel Teuber, Edoardo Manino, Carsten Sinz, Alberto Leporati

Affiliations: Karlsruhe Institute of Technology; Polytechnic University of Turin; The University of Manchester; Karlsruhe University of Applied Sciences; University of Milano-Bicocca

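AI summary: This paper studies overflow attacks that can arise in neural networks when fully homomorphic encryption (FHE) is used for privacy-preserving inference. Given the limited set of operations supported by the widely used CKKS scheme, the authors propose a formal verification technique that computes rigorous range bounds for all neurons in the network, eliminating the risk of overflow. Experiments show that the method avoids overflows on all benchmarks, reducing the failure rate from up to 47% to 0%, and is compatible with most CKKS-based frameworks.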

Comments Preprint

Abstract

Fully homomorphic encryption (FHE) enables private inference by evaluating neural networks on encrypted data. In this way, we can delegate the computation to a third-party server without ever revealing the user's data. Currently, the CKKS scheme is the backbone of most efficient FHE implementations, but it only supports addition, multiplication, and array rotation operations, thus requiring all activation functions of the neural network to be approximated by polynomials within a certain interval, imposing strict design tolerances. In this paper, we demonstrate for the first time that this scheme is vulnerable to overflow attacks, i.e., seemingly benign inputs that can exceed these tolerances of the FHE circuit, thereby causing corrupted and unusable outputs. To avoid them, we propose a formal verification technique that computes certified bounds on the ranges of all neurons in the network. By construction, our method eliminates overflows and, in our experiments, removed observed overflows on all benchmarks, reducing failure rates from up to 47% to 0%. Moreover, our overflow-free solution is compatible with most CKKS-based frameworks, as it simply replaces standard polynomials with polynomials whose ranges are rigorously designed.
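
The paper's verification technique is not detailed in the abstract. As a simpler stand-in, the sketch below uses plain interval arithmetic (interval bound propagation), which is sound but often loose. It bounds each pre-activation given a box of inputs and checks whether those bounds stay inside the interval [-bound, bound] on which a polynomial activation approximation is assumed to be valid. The layer weights, input box, and bound are hypothetical.

```python
def affine_bounds(W, b, lo, hi):
    """Sound elementwise bounds on W x + b when lo <= x <= hi (interval arithmetic)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        # A positive weight is smallest at the lower end of its input, a negative one at the upper end.
        out_lo.append(bias + sum(w * (lo_i if w >= 0 else hi_i) for w, lo_i, hi_i in zip(row, lo, hi)))
        out_hi.append(bias + sum(w * (hi_i if w >= 0 else lo_i) for w, lo_i, hi_i in zip(row, lo, hi)))
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it maps interval endpoints to interval endpoints."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def within_approx_range(lo, hi, bound):
    """True if every pre-activation interval stays inside [-bound, bound]."""
    return all(-bound <= lo_i and hi_i <= bound for lo_i, hi_i in zip(lo, hi))


# Hypothetical first layer and input box (not from the paper).
W1, b1 = [[1.0, -2.0], [0.5, 0.5]], [0.0, 0.1]
pre_lo, pre_hi = affine_bounds(W1, b1, lo=[-1.0, -1.0], hi=[1.0, 1.0])
post_lo, post_hi = relu_bounds(pre_lo, pre_hi)
```

Here the first neuron's pre-activation can reach ±3, so a polynomial fitted on [-2, 2] could overflow, while one fitted on [-3, 3] covers the certified range. The post-ReLU bounds would feed the next layer in the same way.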

2605.23094 2026-05-25 eess.IV cs.AI cs.CV

Do Synthetic Brain MRIs Reliably Improve Tumour Classification? A StyleGAN2-ADA Class-Plane Augmentation Study on BRISC 2025

José Rafael Noriega Cedeño

Affiliations: NVIDIA

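AI summary: This study investigates whether synthetic brain MRI images can improve tumour classification, using StyleGAN2-ADA generators trained on the BRISC 2025 dataset and testing their effect on three classifier families. The gains from synthetic images vary with model architecture and with the real-to-synthetic ratio; for MobileViTV2, filtered 1:1 synthetic augmentation improved tumour classification accuracy by 1.02% absolute. The results indicate that the benefit of generative augmentation is not determined by visual quality alone but depends closely on model architecture and data ratio.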

Comments 18 pages, 16 figures

Abstract

Generative augmentation is often proposed as a remedy for small medical-image datasets, but synthetic images are only useful when they improve downstream task performance. "Augmentation" here means synthetic supplementation: GAN-generated samples added to the real training pool, not geometric or photometric transforms of existing images. Twelve class-plane StyleGAN2-ADA generators were trained on constrained BRISC 2025 partitions to test whether their output, with or without InceptionV3 feature-space filtering, improves held-out tumour classification across three classifier families: a random forest (RF) on InceptionV3 features, a compact two-headed convolutional neural network (CNN), and MobileViTV2, a mobile hybrid convolutional-transformer. Each was evaluated at 1:1 and 1:2 real-to-synthetic ratios. An independent GPT-5.5 blind test placed gated real-versus-synthetic discrimination at 57.73% (95% CI: 54.48–60.92%) on the model-legible subset, modestly above chance. The RF classifier did not benefit from the synthetic MRIs. The CNN showed consistent mean gains that did not survive Holm correction. MobileViTV2 showed the clearest benefit: filtered 1:1 augmentation improved tumour classification accuracy by 1.02% absolute (95% CI: 0.54–1.54%; Holm-corrected p = 0.0104). A secondary efficiency analysis found that every augmented CNN condition selected its checkpoint 42–64% earlier than baseline, while compute-matched MobileViTV2 runs reached selection after 50–67% fewer real-data epochs. Overall, augmentation utility was found to be architecture- and ratio-dependent, not guaranteed by visual fidelity alone.
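
For readers unfamiliar with the correction mentioned above, the sketch below implements the standard Holm-Bonferroni step-down procedure. The p-values are hypothetical; the abstract does not state how many comparisons the authors corrected over.

```python
def holm_adjust(pvalues, alpha=0.05):
    """Holm step-down adjusted p-values and rejection decisions."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p-value first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The p-value at 0-based rank k is multiplied by (m - k), capped at 1,
        # and forced to be non-decreasing along the sorted order.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted, [p_adj <= alpha for p_adj in adjusted]


adjusted, reject = holm_adjust([0.01, 0.04, 0.03])  # hypothetical p-values
```

With three tests, only the smallest p-value (0.01, adjusted to 0.03) is rejected at alpha = 0.05; the running maximum keeps a larger raw p-value from receiving a smaller adjusted one.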

2605.23091 2026-05-25 cs.SE cs.AI cs.CR

Security of LLM-generated Code: A Comparative Analysis

Srivathsan G Morkonda, Mahmoud Selim, Hala Assal

Affiliations: Carleton University

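AI summary: This paper studies the security of code generated by large language models (LLMs), assessing vulnerabilities in code produced by seven popular LLMs. By simulating how developers use LLMs to generate code, the study finds that code from every evaluated model contains security vulnerabilities, most of them of critical or high severity, revealing potential security risks in current AI-assisted programming.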

Abstract

The majority of software developers use or are planning to use Artificial Intelligence (AI) tools in their development processes. Their top reasons include improved productivity and faster learning. In fact, Large Language Model (LLM)-generated code is currently in production, including in major tech companies. However, concerns have been raised about the risks associated with using AI tools to generate code. In this paper, we focus our attention on the risks to software security. We empirically evaluate the security of code generated by seven popular LLMs. We build upon previous work to mimic the behaviours of developers when using LLMs to generate code. Our results show that all seven LLMs that we have evaluated generate code that contains vulnerabilities, the majority of which are of critical or high severity.

2605.23058 2026-05-25 cs.SE cs.AI

A measurement substrate for agentic Kubernetes operations: Methodology and a case study in retrieval-compounding falsification

Joshua Odmark, Gideon Rubin, Deon van der Vyver

Affiliations: Independent; LDE Cognyx

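AI summary: This paper proposes agent-breakage, a measurement framework for evaluating autonomous Kubernetes operations agents, to address the lack of falsifiability in current research on such agents. The framework injects faults, observes how the agent responds, scores the response along four axes, and records outcome-labeled (state, action, outcome) tuples, enabling systematic evaluation of agent behavior. A case study examines whether retrieving past failure postmortems improves agent capability and exposes pitfalls common in current research, such as selection bias and overly small samples, demonstrating the framework's value for making experimental claims more trustworthy.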

Comments 22 pages. Code at https://github.com/odmarkj/agent-breakage tag v0.1.0 (Apache 2.0). Source repo at https://github.com/odmarkj/agent-breakage-paper tag arxiv-v1

Abstract

Empirical claims about autonomous Kubernetes operations agents are largely unfalsifiable. Published work reports observational results without controlled comparisons against an agent-disabled baseline, selection bias is endemic, pre-registered decision matrices are absent, and samples are typically too small for the noise level of the underlying scoring system. The cause is the same gap that limits the agents themselves: code agents have a verification substrate that turns "did it work" into a fast, falsifiable, ground-truth signal, and operations has nothing equivalent. We present agent-breakage, a closed-loop measurement framework that injects faults into a target Kubernetes cluster, observes how an autonomous agent responds, scores the response on four axes against ground truth, and accumulates outcome-labeled (state, action, outcome) tuples. The framework distinguishes framework error from reasoning error, supports a true off-condition control via a deterministic-embedder mechanism, and enforces pre-registered decision matrices. We use it as a case study to test whether retrieval over past postmortems compounds an agent's capability. The methodological payload is three confounds the substrate caught during that case study, each of which would have produced a wrong published claim on a less instrumented version of the same work: a pgvector index bug, a +19% selection-bias artifact, and small-sample estimates that overstated effects by roughly 3x. The retrieval result itself is a partial falsification: 1 of 3 dense-corpus scenarios significant at p<0.05, pooled effect +3.9 percentage points, not significant at n=60. A within-scenario corpus-density sweep at 360 runs shows that mechanistic alignment of near-neighbors dominates raw count. The framework is released open source.

2605.23056 2026-05-25 cs.NI cs.AI

DRL-Driven Edge-Aware Utility Optimization for Multi-Slice 6G Networks

Khaled M. Naguib, Soumaya Cherkaoui, Mahmoud M. Elmessalawy, Ahmed M. Abd El-Haleem, Ibrahim I. Ibrahim

Affiliations: CCAS Department, School of Engineering, New Giza University; Department of Computer and Software Engineering, Polytechnique Montreal; Department of Electronics and Communications, Faculty of Engineering, Helwan University

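AI summary: This paper studies how deep reinforcement learning can optimize edge-aware utility in multi-slice 6G networks to meet the requirements of demanding services such as virtual reality. It proposes an intelligent resource allocation and edge caching framework based on a Deep Q-Network (DQN) that performs dynamic resource scheduling and content distribution across multiple network slices in an O-RAN architecture. The method reduces latency and improves throughput, providing more reliable and responsive support for immersive VR applications in 6G environments.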

Comments 5 pages

Journal ref IEEE Networking Letters, vol. 8, pp. 14-18, 2026

Abstract

Virtual Reality (VR) services delivered over 6G networks demand ultra-low latency and high bandwidth to ensure seamless user experiences. This paper presents an intelligent resource allocation and edge caching framework for 6G O-RAN networks, leveraging Deep Q-Network (DQN) learning for optimizing edge caching and dynamic resource provisioning across multiple network slices within an O-RAN-compliant architecture. By incorporating DRL agents into the network control plane, the proposed system enables proactive and adaptive content distribution as well as real-time computational resource allocation that meets the quality-of-service demands of eMBB, URLLC, and especially the emerging MBRLLC slices essential for VR. Simulation results demonstrate that the DQN-based framework consistently outperforms traditional methods in reducing latency and improving throughput, leading to more reliable and responsive support for immersive VR applications in 6G environments.

2605.23007 2026-05-25 q-fin.TR cs.AI cs.LG q-fin.PM

MadEvolve: Evolutionary Optimization of Trading Systems with Large Language Models

Yurii Kvasiuk, Tianyi Li, Owen Colegrove, Moritz Münchmeyer

Affiliations: Department of Physics, University of Wisconsin–Madison; Event Horizon Labs

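AI summary: This paper applies MadEvolve, an LLM-based evolutionary optimization framework, to quantitative trading systems, using strategy generation and execution for Bitcoin trading as the example. The method evolves the feature set, individual strategy components, and the overall pipeline of a trading strategy, substantially improving trading performance in the authors' simulation and backtesting setup. The study also compares the approach with other agentic search methods and evaluates p-hacking probabilities in the simulation setting, supporting the usefulness of AI-driven evolutionary algorithms in quantitative finance.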

Abstract

We explore the application of LLM-driven algorithm optimization to several common tasks in quantitative finance. MadEvolve, a general-purpose algorithm optimization framework inspired by DeepMind's AlphaEvolve, was recently developed to optimize algorithms in computational cosmology. Here we demonstrate the utility of MadEvolve for optimizing algorithmic trading strategies and alpha generation, using Bitcoin trading as an example. In our simulation and backtesting setup, we achieve significant improvements on all tasks we considered, such as evolving feature sets for signal generation, optimizing separate components of the trading strategy, and jointly evolving the feature pipeline together with the execution strategy. Additionally, we compare our method to other agentic search approaches, specifically Claude Code, and carefully evaluate p-hacking probabilities in our simulation setup. Our findings strongly support the utility of AI-driven agentic and evolutionary algorithms for algorithmic trading and quantitative finance.

2605.22995 2026-05-25 cs.CY cs.AI

Whose Good, Whose Place? The Moral Geography of Agentic AI for Social Good

Poli Nemkova, Haeshitha Indukuri, Jaedon Charles

Affiliations: University of North Texas; Florida International University

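AI summary: This paper examines a moral-geographic asymmetry in agentic AI systems for social good: although such systems often invoke the United Nations Sustainable Development Goals (SDGs), they rarely specify their geographic context, especially in domains where local political, legal, and cultural factors matter most. Analyzing 112 papers published between 2015 and 2026, the study finds that only 25% report any real-world deployment or small-scale test, identifies several gaps in accountability, participation, and transparency, and proposes a reporting standard for more context-specific and participatory AI systems.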

Abstract

Agentic AI systems are increasingly proposed for social-good domains, often invoking the United Nations Sustainable Development Goals (SDGs) as a vocabulary of global benefit. Yet claims of social good do not establish accountability to the communities a system claims to serve. We present a structured survey of 112 papers on agentic AI for social good published between 2015 and 2026. We find a moral-geographic asymmetry: papers are least likely to specify geographic context in precisely the domains where local political, legal, and cultural context matters most. Across the corpus, 82 of 112 papers (73%) specify no geographic context. Papers aligned with health or physical/ecological SDGs specify geography 37-40% of the time, while papers aligned with institutional and social-policy SDGs do so only 13% of the time. SDG 16 (Peace, Justice and Strong Institutions) is both the most-covered goal in the corpus and the one with the lowest geographic-specification rate. We interpret this as moral abstraction: agentic AI for social good often treats institutional good as universal in ways it does not treat health or ecological good. A second finding compounds this: only 28 of 112 papers (25%) report any real-world deployment or small-scale test. We identify five accountability gaps and propose a minimal reporting standard for more context-specific, participatory, and accountable agentic AI for social good.

2605.22988 2026-05-25 q-bio.NC cs.LG cs.RO cs.SY eess.SY

Active Sensing Subserves Task-Level Control

Andrew Lamperski, Debojyoti Biswas, Eric S. Fortune, John Guckenheimer, Kathleen Hoffman, Noah J. Cowan

Affiliations: Department of Electrical and Computer Engineering, University of Minnesota; Laboratory for Computational Sensing and Robotics, Johns Hopkins University; Federated Department of Biological Sciences, New Jersey Institute of Technology; Department of Mathematics, Cornell University; Department of Mathematics and Statistics, University of Maryland, Baltimore County; Department of Mechanical Engineering, Johns Hopkins University

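AI summary: This paper examines the role of active sensing in task-level control, proposing that active sensing is not driven by sensory goals but is a necessary component of task-level control. Drawing on empirical data from organisms and on mathematical theory, it argues that active sensing behaviors often occur in discrete epochs, with animals switching between "explore" and "exploit" behavioral modes and achieving feedback control through adaptive sensors and mode switching. This strategy is ubiquitous in biological systems but rarely used in engineered systems, suggesting that current robotic control systems have room for improvement.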

Abstract

Active sensing is traditionally defined as the expenditure of energy, typically in the form of movement, for obtaining information. Here, we propose that the combination of reliance on adaptive sensors, the linkage between movement and sensing, and task-level control inevitably gives rise to the emergence of active sensing movements. In this way, active sensing is not driven by sensory goals, such as minimizing uncertainty about the state, but rather is necessary for task-level control. This hypothesis, that active sensing subserves control, is supported by both empirical data from organisms and mathematical theory. Interestingly, active sensing behaviors often occur in discrete epochs, interspersed with goal-oriented behavior. This suggests that animals switch between two behavioral modes with distinct control policies, an "explore" mode in which animals produce dynamic movements to shape sensory feedback, and an "exploit" mode in which animals produce slower compensatory movements that are directly related to achieving task goals. This strategy for feedback control that relies on adaptive sensors, active sensing, and mode switching is not commonly used in engineered systems despite being ubiquitous in biology. Engineered systems comprising state-of-the-art sensors, actuators, and mechanical designs can outperform animals with respect to "cost functions" such as maximum force generation, precision, and speed. Nevertheless, animals routinely achieve robust, graceful behaviors that are currently unmatched by engineered systems, suggesting that current control systems are insufficient. These insights, expressed in the language of control theory, may be critical for improving robotic sensing and control.

2605.22976 2026-05-25 cs.SE cs.AI

LLM Code Smells: A Taxonomy and Detection Approach

Zacharie Chenail-Larcher, Brahim Mahmoudi, Naouel Moha, Quentin Stiévenart, Florent Avellaneda

Affiliations: École de technologie supérieure; Université du Québec à Montréal

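AI summary: This paper studies code smells that can arise when large language models (LLMs) are integrated into software systems. It proposes a taxonomy of nine LLM code smells and develops SpecDetect4LLM, a static analysis tool for detecting them. An empirical evaluation on 692 open-source projects shows that 73.5% of the analyzed systems contain LLM code smells, with a detection precision of 91.3% and a recall of 71.8%, giving developers an effective way to identify and improve the quality of LLM integrations.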

English abstract

Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, due to their versatility, flexibility, and ability to simulate human reasoning to some extent. However, poor integration of LLM inference in source code can undermine software system quality. Therefore, inadequate LLM integration coding practices must be documented to help developers mitigate such issues. Following our earlier work on LLM code smells, this paper consolidates and refines the concept by presenting a self-contained taxonomy and a catalog of nine LLM code smells. We also create SpecDetect4LLM, a static source code analysis tool for their detection, and conduct extensive empirical evaluations of its detection effectiveness (precision and recall) as well as the prevalence of LLM code smells across 692 open-source software projects (171,194 source files). Our results show that LLM code smells affect 73.5% of the analyzed systems, with a detection precision of 91.3% and a recall of 71.8%.
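
The abstract neither lists the nine smells nor describes how SpecDetect4LLM works internally. As a hedged illustration of AST-based static detection in Python, the sketch below flags calls whose method name appears in a hypothetical set `LLM_CALL_NAMES` and that omit an explicit `temperature` keyword. That smell, the name set, and the function `find_calls_without_kwarg` are assumptions for illustration and are not taken from the paper's catalog or tool.

```python
import ast
from typing import List, Tuple

# Hypothetical method names treated as LLM inference calls. A purely
# name-based match can also hit unrelated methods (a false-positive source).
LLM_CALL_NAMES = {"create", "generate", "invoke"}


def find_calls_without_kwarg(source: str,
                             kwarg: str = "temperature") -> List[Tuple[int, str]]:
    """Return (line, method) for LLM-looking calls that omit `kwarg`."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # obj.method(...) yields an Attribute node; bare f(...) yields a Name node.
        if isinstance(func, ast.Attribute):
            name = func.attr
        elif isinstance(func, ast.Name):
            name = func.id
        else:
            continue
        if name not in LLM_CALL_NAMES:
            continue
        # A **kwargs splat (kw.arg is None) may or may not set the parameter;
        # skipping such calls trades recall for precision.
        if any(kw.arg is None for kw in node.keywords):
            continue
        if not any(kw.arg == kwarg for kw in node.keywords):
            hits.append((node.lineno, name))
    return sorted(hits)
```

Skipping `**kwargs` calls illustrates the precision/recall trade-off such a detector faces. Each skipped call avoids a possible false positive but can hide a real smell.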

2605.22968 2026-05-25 q-bio.QM cs.LG stat.ML

Uncertainty-aware classification and triage of structural heart disease using electrocardiography and echocardiography metrics

Mitchel J. Colebank

Affiliations: Department of Mathematics, University of South Carolina

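AI summary: This study investigates uncertainty-aware classification and triage of structural heart disease (SHD) using electrocardiogram (ECG) and echocardiography metrics. It compares frequentist and Bayesian neural network classifiers for SHD detection and finds that the Bayesian approach performs comparably to or better than the frequentist methods in classification while providing more robust uncertainty quantification. The study also shows how uncertainty-aware classification can be applied to SHD screening, offering a proof of concept for machine-learning-assisted triage that helps allocate scarce expert resources.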

Comments 15 pages, 5 figures

English abstract

Machine learning offers a methodological innovation that can help screen for cardiovascular disease using noninvasive, readily available measurement modalities. Recent investments in using electrocardiogram (ECG) data to screen for structural heart disease (SHD) are one example, where ECGs provide a low-cost, accessible screening modality. This has led to the EchoNext dataset, a paired ECG-echocardiogram data repository for testing new methods of SHD detection. However, relatively few studies have investigated how more probabilistic classification through Bayesian inference may improve uncertainty quantification in this setting. Moreover, few studies have considered how triage systems can be developed to alleviate healthcare bottlenecks, such as the review of data from underserved, rural clinics by expert sonographers for SHD assessment. In this study, we leverage existing ECG-echocardiogram data to compare frequentist and Bayesian neural network classifiers. We show that the Bayesian approach is comparable to or better than frequentist methods in SHD classification and that it provides more robust uncertainty quantification. We provide an example of how this uncertainty-aware classification scheme can be used for SHD screening, offering a proof of concept for how machine learning can support triage by routing individuals to expert sonographer review when SHD is highly likely or measurements are highly uncertain.
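
A minimal sketch of the two-condition triage rule described above follows. It assumes that each patient comes with several SHD probabilities drawn from a Bayesian classifier's posterior (for example, one per sampled set of network weights). Using the posterior standard deviation as the uncertainty measure, the thresholds `p_refer` and `std_refer`, and the function name `triage` are all hypothetical choices, not the paper's.

```python
from statistics import mean, pstdev
from typing import Sequence


def triage(prob_samples: Sequence[float],
           p_refer: float = 0.5,      # hypothetical "SHD likely" threshold
           std_refer: float = 0.15    # hypothetical "too uncertain" threshold
           ) -> str:
    """Route one patient given posterior samples of P(SHD | ECG features)."""
    p_mean = mean(prob_samples)   # Monte Carlo estimate of the posterior predictive probability
    p_std = pstdev(prob_samples)  # spread across posterior samples, used as uncertainty
    if p_mean >= p_refer:
        return "refer: SHD likely"
    if p_std >= std_refer:
        return "refer: uncertain"
    return "routine"


print(triage([0.05, 0.45, 0.1, 0.4]))  # low mean, wide spread
```

A frequentist classifier that returns a single probability has no spread to test, so the second branch depends on having posterior samples.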

2605.22237 2026-05-25 cs.CR cs.LG

Decision-Aware Quadratic ReLU Replacement for HE-Friendly Inference

Rui Li, Wenyuan Wu, Weijie Miao

Affiliations: Chongqing Key Laboratory of Secure Computing for Biology; Chongqing Institute of Green and Intelligent Technology; Chinese Academy of Sciences; Department of Industrial and Systems Engineering

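AI summary: This work addresses replacing the ReLU activation for neural network inference under fully homomorphic encryption (FHE). It proposes a decision-aware quadratic polynomial replacement that preserves classification decisions with a low-degree polynomial and requires no model retraining. Using a geometric framework built on the calibration set's decisions, it derives necessary and sufficient conditions, together with a constructive algorithm, for lossless replacement under a positive-margin condition. When the margin is insufficient, it introduces reduced-convex-hull and Lagrangian-dual relaxations that turn the problem into smaller convex quadratic programs. Experiments show that under the CKKS scheme the method matches plaintext accuracy while running faster than the Remez-7 baseline (3.7–4.1× in the activation module and 1.18–1.68× end-to-end).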

Comments 13 pages, 2 figures

English abstract

Fully homomorphic encryption (FHE) supports only additions and multiplications, so FHE-only neural-network inference typically replaces ReLU with polynomials fitted over empirical activation intervals. Such interval fitting often requires higher-degree polynomials to keep activation error small, which raises homomorphic evaluation cost, even though classification depends only on the final logit decision. We revisit ReLU replacement from a decision-aware perspective: given a trained single-hidden-layer ReLU MLP and a specified calibration set, can an HE-friendly low-degree polynomial replace ReLU without retraining while preserving calibration-set decisions? We focus on quadratic replacement, the lowest degree that retains a genuine per-unit nonlinearity. For calibration sets that are positive-margin separable in the lifted space, we formulate quadratic replacement as a linear separation problem, yielding necessary and sufficient conditions for calibration-lossless replacement and a constructive algorithm for the coefficients. When the positive-margin condition fails (often because a few near-boundary or misclassified calibration samples bring the lifted hulls into contact), we extend the same geometric framework via reduced convex hulls and Lagrangian-dual soft-margin relaxations. These relaxations cap the weight any single sample can carry, converting the problem into smaller convex quadratic programs that yield approximately feasible coefficients with high empirical agreement on calibration-set decisions. In particular, at the maximal weight cap $\mu = 1$, the reduced-convex-hull relaxation reduces to standard convex-hull separation, so the relaxation continuously extends the positive-margin exact theory. Under CKKS, the quadratic replacement matches plaintext top-1 accuracy on multiple benchmarks while running 3.7–4.1$\times$ faster than Remez-7 in the activation module and 1.18–1.68$\times$ faster end-to-end.
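
For the binary case, the linear-separation view can be sketched directly. The following notation is introduced here and is not the paper's: $v_j$ is the difference between the two output weights attached to hidden unit $j$, $\beta$ is the difference between the output biases, and each unit gets its own quadratic $q_j(z) = a_j z^2 + b_j z + c_j$ (the paper may instead share coefficients across units). The logit difference $\sum_j v_j q_j(z_j) + \beta$ is then linear in the coefficients. Preserving the calibration decisions therefore amounts to linearly separating the lifted points $(v_j z_j^2, v_j z_j, v_j)_j$. This plain-Python sketch uses an ordinary perceptron as a swapped-in solver. It is not the paper's constructive algorithm, its multi-class formulation, or its CKKS evaluation, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Vec = Sequence[float]


def pre_activations(W1: Matrix, b1: Vec, x: Vec) -> List[float]:
    """Hidden pre-activations z = W1 x + b1 for one input x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def relu_decision(W1: Matrix, b1: Vec, v: Vec, beta: float, x: Vec) -> int:
    """Decision (+1/-1) of the original ReLU network: sign of v . relu(z) + beta."""
    z = pre_activations(W1, b1, x)
    s = sum(vj * max(zj, 0.0) for vj, zj in zip(v, z)) + beta
    return 1 if s > 0 else -1


def lift(W1: Matrix, b1: Vec, v: Vec, x: Vec) -> List[float]:
    """Lifted point (v_j z_j^2, v_j z_j, v_j) for every hidden unit j.
    The quadratic network's logit difference equals theta . lift(x) + beta,
    where theta stacks the per-unit coefficients (a_j, b_j, c_j)."""
    feats: List[float] = []
    for vj, zj in zip(v, pre_activations(W1, b1, x)):
        feats.extend([vj * zj * zj, vj * zj, vj])
    return feats


def dot(u: Vec, w: Vec) -> float:
    return sum(a * b for a, b in zip(u, w))


def fit_quadratic(W1: Matrix, b1: Vec, v: Vec, beta: float, X: Sequence[Vec],
                  max_epochs: int = 1000) -> List[Tuple[float, float, float]]:
    """Per-unit (a_j, b_j, c_j) reproducing the ReLU decisions on calibration set X."""
    labels = [relu_decision(W1, b1, v, beta, x) for x in X]
    phis = [lift(W1, b1, v, x) for x in X]
    theta = [0.0] * len(phis[0])
    # Homogeneous perceptron: makes finitely many updates if some theta gives
    # y * theta . phi > 0 for every sample; the c_j terms act as a free bias.
    for _ in range(max_epochs):
        mistakes = 0
        for y, phi in zip(labels, phis):
            if y * dot(theta, phi) <= 0:
                theta = [t + y * p for t, p in zip(theta, phi)]
                mistakes += 1
        if mistakes == 0:
            break
    else:
        raise ValueError("no decision-preserving quadratic found within max_epochs")
    # Rescale so the fixed output bias beta cannot flip any decision:
    # y * (scale * theta . phi + beta) >= margin + |beta| > 0.
    margin = min(y * dot(theta, phi) for y, phi in zip(labels, phis))
    scale = 1.0 + 2.0 * abs(beta) / margin
    theta = [scale * t for t in theta]
    return [(theta[i], theta[i + 1], theta[i + 2]) for i in range(0, len(theta), 3)]


def quad_decision(W1: Matrix, b1: Vec, v: Vec, beta: float,
                  coeffs: Sequence[Tuple[float, float, float]], x: Vec) -> int:
    """Decision of the network with ReLU replaced by a_j z^2 + b_j z + c_j."""
    z = pre_activations(W1, b1, x)
    s = sum(vj * (a * zj * zj + b * zj + c) for vj, zj, (a, b, c) in zip(v, z, coeffs))
    return 1 if s + beta > 0 else -1
```

As a usage example, take a toy network with `W1 = [[1.0], [-1.0]]`, `b1 = [0.0, 0.0]`, `v = [1.0, -1.0]`, and `beta = -0.5`. Its ReLU logit difference is $x - 0.5$, and `fit_quadratic` returns coefficients whose decisions agree with it on calibration inputs such as $x \in \{-2, -1, 0, 1, 2\}$. Agreement is only guaranteed on the calibration points. The quadratic need not resemble ReLU elsewhere, which is the calibration-lossless notion the abstract describes.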