arXivDaily: arXiv daily academic digest, updated Monday to Friday
2504.09006 2026-05-18 cs.GT cs.LG

Learning in Structured Stackelberg Games

Maria-Florina Balcan, Kiriaki Fragkia, Keegan Harris

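AI summary: This paper studies structured Stackelberg games, introduces the Stackelberg-Littlestone dimension to obtain optimal online learning algorithms, and provides upper and lower bounds on sample complexity in the distributional setting.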

Comments Accepted as a spotlight paper to ICML 2026

Abstract

We initiate the study of structured Stackelberg games, a novel form of strategic interaction between a leader and a follower where contextual information can be predictive of the follower's (unknown) type. Motivated by applications such as security games and AI safety, we show how this additional structure can help the leader learn a utility-maximizing policy in both the online and distributional settings. In the online setting, we first prove that standard learning-theoretic measures of complexity do not characterize the difficulty of the leader's learning task. Notably, we find that there exists a learning-theoretic measure of complexity, analogous to the Littlestone dimension in online classification, that tightly characterizes the leader's instance-optimal regret. We term this the Stackelberg-Littlestone dimension, and leverage it to provide a provably optimal online learning algorithm. In the distributional setting, we provide analogous results by showing that two new dimensions control the sample complexity upper and lower bounds.
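
To make the leader-follower interaction in the abstract concrete, here is a minimal sketch of a single Stackelberg round: the leader commits to a mixed strategy, and a follower of a given type best-responds. It is an illustration of the standard game model only, not the paper's learning algorithm. The payoff matrices, the two follower types, and the function names are all hypothetical, and ties are broken by lowest index rather than in the leader's favour.

```python
def best_response(mixed_strategy, follower_payoff):
    """Index of the follower action with the highest expected follower payoff.

    follower_payoff[i][j] is the follower's utility when the leader plays i
    and the follower plays j. Ties go to the lowest index (a simplification).
    """
    num_responses = len(follower_payoff[0])
    expected = [
        sum(p * follower_payoff[i][j] for i, p in enumerate(mixed_strategy))
        for j in range(num_responses)
    ]
    return max(range(num_responses), key=lambda j: expected[j])


def leader_utility(mixed_strategy, leader_payoff, follower_payoff):
    """Leader's expected utility once the follower has best-responded."""
    j = best_response(mixed_strategy, follower_payoff)
    return sum(p * leader_payoff[i][j] for i, p in enumerate(mixed_strategy))


# Hypothetical two-target security game: the leader protects target i,
# the follower attacks target j. The leader gains 1 if i == j, else loses 1.
leader_payoff = [[1, -1], [-1, 1]]
follower_type_a = [[-1, 1], [3, -1]]   # type A values target 0 more
follower_type_b = [[-1, 3], [1, -1]]   # type B values target 1 more

strategy = [0.6, 0.4]                  # protect target 0 with probability 0.6
utility_vs_a = leader_utility(strategy, leader_payoff, follower_type_a)
utility_vs_b = leader_utility(strategy, leader_payoff, follower_type_b)
print(utility_vs_a, utility_vs_b)
```

The same commitment earns the leader a positive payoff against one type and a negative payoff against the other, which is why context that predicts the follower's type is valuable to the leader.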

2503.23927 2026-05-18 stat.ML cs.LG

Detecting Localized Density Anomalies in Multivariate Data via Coin-Flip Statistics

Sebastian Springer, Andre Scaffidi, Maximilian Autenrieth, Gabriella Contardo, Alessandro Laio, Roberto Trotta, Heikki Haario

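AI summary: This paper proposes EagleEye, a method that encodes k-nearest-neighbour lists as binary sequences to detect local over- and under-densities in multivariate data, and validates it in three scenarios.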

Comments Code Availability: The code used to generate the results of this study is available at GitHub via the link: https://github.com/sspring137/EagleEye

Abstract

Detecting localized differences between two samples is a central task in scientific data analysis, required for the identification of signal events, regime changes, or model mismatch. We introduce EagleEye, a method that pinpoints local over- and under-densities in multivariate feature spaces. EagleEye assigns each point an anomaly score by encoding its ordered k-nearest-neighbour list as a binary membership sequence and testing whether the cumulative number of successes in this sequence is consistent with a binomial (coin-flipping) null model. In the presence of a genuine local anomaly, neighbours will preferentially belong to one of the two datasets, yielding an excess of "successes" relative to the binomial null model. These local, pointwise detections are consolidated into interpretable anomaly sets through a deterministic refinement procedure that can also estimate the irreducible background and local density anomaly purity. We demonstrate EagleEye's efficacy in three scenarios. We first consider an artificial data example with known localized over- and under-densities. Second, we demonstrate how EagleEye may be used for new physics searches at particle collider experiments in the presence of systematic background modelling differences. Finally, we conduct a climate analysis study that reveals localized changes in spatiotemporal temperature-pattern recurrence.
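
The coin-flip test described in the abstract can be sketched in a few lines. This is a toy illustration, not the authors' implementation (the linked repository has that). It assumes one-dimensional data, a one-sided binomial tail as the test, and a score defined as the negative log of the smallest tail p-value over neighbourhood sizes 1 to k. The function names and the synthetic data are hypothetical.

```python
import math


def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )


def knn_membership(point, reference, test, k):
    """Binary sequence over the k nearest pooled neighbours: 1 if from `test`."""
    pooled = [(abs(x - point), 0) for x in reference]
    pooled += [(abs(x - point), 1) for x in test]
    pooled.sort(key=lambda pair: pair[0])
    return [label for _, label in pooled[:k]]


def anomaly_score(point, reference, test, k):
    """-log of the smallest binomial tail p-value over prefixes of length 1..k."""
    p_null = len(test) / (len(reference) + len(test))
    smallest = 1.0
    successes = 0
    for n, label in enumerate(knn_membership(point, reference, test, k), start=1):
        successes += label
        smallest = min(smallest, binomial_upper_tail(successes, n, p_null))
    return -math.log(smallest)


# Synthetic data: two interleaved grids, plus an extra cluster of test points near 5.0.
reference = [i / 10 for i in range(100)]
test = [i / 10 + 0.05 for i in range(100)] + [5.0 + j / 100 for j in range(-10, 11)]

score_cluster = anomaly_score(5.0, reference, test, k=20)
score_background = anomaly_score(1.0, reference, test, k=20)
print(score_cluster, score_background)
```

Near the injected cluster almost every neighbour comes from the test set, so the binomial tail probability is tiny and the score is large. Away from it the membership sequence alternates and the score stays small.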

2503.15107 2026-05-18 stat.ML cs.LG

Interpretability of Graph Neural Networks to Assess Effects of Global Change Drivers on Ecological Networks

Emre Anakok, Pierre Barbillon, Colin Fontaine, Elisa Thebault

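AI summary: The study uses graph neural networks to analyze the effects of global change drivers on pollination network connectivity, examines interactions between environmental variables and plant genera, and checks how debiasing techniques affect the estimated effects.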

Abstract

Pollinators play a crucial role in plant reproduction, both in natural ecosystems and in human-modified landscapes. Global change drivers, including climate change and land use modifications, can alter plant-pollinator interactions. To assess the potential influence of global change drivers on pollination, large-scale interaction, climate, and land use data are required. While recent machine learning methods, such as graph neural networks (GNNs), allow the analysis of such datasets, interpreting their results can be challenging. We explore existing methods for interpreting GNNs in order to highlight the effects of various environmental covariates on pollination network connectivity. An extensive simulation study is performed to confirm whether these methods can detect the interactive effect between a covariate and a genus of plant on connectivity, and whether the application of debiasing techniques influences the estimation of these effects. An application to the Spipoll dataset, with and without accounting for sampling effects, highlights the potential impact of land use on network connectivity and shows that accounting for sampling effects partially alters the estimation of these effects.

2502.04271 2026-05-18 quant-ph cs.LG

Variational decision diagrams for quantum-inspired machine learning applications

Vladimir Vargas-Calderón, Santiago Acevedo-Mancera, Herbert Vinck-Posada

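AI summary: This paper proposes variational decision diagrams, which combine the structural benefits of decision diagrams with the adaptability of variational methods to represent quantum states efficiently, and applies them to ground-state estimation for Ising and Heisenberg Hamiltonians, demonstrating that training is feasible.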

Comments 11 pages, 3 figures, presented at Quantum Information in Spain (ICE-9)

Abstract

Decision diagrams (DDs) have emerged as an efficient tool for simulating quantum circuits due to their capacity to exploit data redundancies in quantum states and quantum operations, enabling the efficient computation of probability amplitudes. However, their application in quantum machine learning (QML) has remained unexplored. This paper introduces variational decision diagrams (VDDs), a novel graph structure that combines the structural benefits of DDs with the adaptability of variational methods for efficiently representing quantum states. We investigate the trainability of VDDs by applying them to the ground state estimation problem for transverse-field Ising and Heisenberg Hamiltonians. Analysis of gradient variance suggests that training VDDs is possible, as no signs of vanishing gradients (also known as barren plateaus) are observed. This work provides new insights into the use of decision diagrams in QML as an alternative way to design and train variational ansätze.

2412.11308 2026-05-18 stat.ML cs.LG

From XAI to MLOps: Explainable Concept Drift Detection with Profile Drift Detection

Ugur Dar, Mustafa Cavus

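AI summary: This paper proposes Profile Drift Detection, which uses Partial Dependence Profiles, an explainable-AI tool, together with new drift metrics to detect concept drift and understand its causes. Experiments show that it balances the sensitivity and stability of drift signals while maintaining predictive performance.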

Comments 15 pages, 6 figures

Journal ref
Future Generation Computer Systems (2026)
Abstract

Predictive models often degrade in performance due to evolving data distributions, a phenomenon known as data drift. Among its forms, concept drift, where the relationship between explanatory variables and the response variable changes, is particularly challenging to detect and adapt to. Traditional drift detection methods often rely on metrics such as accuracy or marginal variable distributions, which may fail to capture subtle but important conceptual changes. This paper proposes a novel method, Profile Drift Detection (PDD), which enables both the detection of concept drift and an enhanced understanding of its underlying causes by leveraging an explainable AI tool: Partial Dependence Profiles (PDPs). PDD quantifies changes in PDPs through new drift metrics that are sensitive to shifts in the data stream while remaining computationally efficient. This approach is aligned with MLOps practices, emphasizing continuous model monitoring and adaptive retraining in dynamic environments. Experiments on synthetic and real-world datasets demonstrate that PDD outperforms existing methods by maintaining high predictive performance while effectively balancing sensitivity and stability in drift signals. The results highlight its suitability for real-time applications, and the paper concludes by discussing the method's advantages, limitations, and potential extensions to broader use cases.

2407.20240 2026-05-18 cs.CY cs.AI

Social and Ethical Risks Posed by General-Purpose LLMs for Settling Newcomers in Canada

Isar Nejadgholi, Maryam Molamohammadi, Samir Bakhtawar

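AI summary: The study examines the risks that general-purpose large language models may pose in the newcomer settlement domain and stresses the need for customized AI tools that ensure human oversight and accountability.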

Comments 26 pages, 8 figures

Abstract

The non-profit settlement sector in Canada supports newcomers in achieving successful integration. This sector faces increasing operational pressures amidst rising immigration targets, which highlights a need for enhanced efficiency and innovation, potentially through reliable AI solutions. The ad-hoc use of general-purpose generative AI, such as ChatGPT, might become a common practice among newcomers and service providers to address this need. However, these tools are not tailored for the settlement domain and can have detrimental implications for immigrants and refugees. We explore the risks that these tools might pose to newcomers in order to, first, warn against the unguarded use of generative AI and, second, incentivize further research and development in creating AI literacy programs as well as customized LLMs that are aligned with the preferences of the impacted communities. Crucially, such technologies should be designed to integrate seamlessly into the existing workflow of the settlement sector, ensuring human oversight, trustworthiness, and accountability.

2605.15765 2026-05-18 cs.CG cs.DS cs.RO math.OC

Optimizing Line Segment Inspection with Limited-Range Drones

José-Miguel Díaz-Báñez, José-Manuel Higes, Alina Kasiuk, Inmaculada Ventura

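AI summary: This paper studies how to inspect line segments efficiently with drones, proposes approximation algorithms for the NP-hard problem, proves hardness even when the segments lie on a single line and only two drones are used, and reports experiments in which the algorithm performs close to optimally across a range of scenarios.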

Comments 28 pages, 14 figures

Abstract

Optimization problems with drones are widely studied in a variety of civilian tasks, mainly because drones can traverse rough terrain and carry cameras and other sensors for surveillance tasks. The limited battery life of these aerial robots poses challenges in operational research. In this paper, we address the following optimization problem. We are given a set of line segments (e.g., tubes in a solar plant) to be inspected by drones. The objective is to detect broken pipes using artificial intelligence, so path planning must be carried out efficiently. On the one hand, the limited capacity of the batteries necessitates periodic visits (tours) to a fixed base station. On the other hand, it is desirable to allocate a set of tours to each drone so that the segments are covered as quickly as possible, aiming to minimize the makespan, which is the maximum time spent by any drone. We prove that this optimization problem is strongly NP-hard even when the segments are positioned on a line and the scenario involves only two drones. We then propose approximation algorithms. Our computational experiments demonstrate that the proposed algorithm achieves near-optimal performance across diverse operational scenarios.
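
The makespan objective in the abstract can be illustrated with a small sketch. This is not one of the paper's approximation algorithms. It is a generic longest-first greedy assignment, shown under simplifying assumptions: all segments lie on a half-line with the base station at 0, each segment is inspected in its own tour, drones fly at unit speed, and recharging takes no time. All names are hypothetical.

```python
def tour_length(segment):
    """Round trip from a base at 0 that covers [a, b], assuming 0 <= a <= b."""
    _, far_end = segment
    return 2.0 * far_end


def greedy_makespan(segments, num_drones, battery_range):
    """Assign one tour per segment, longest first, to the least-loaded drone."""
    tours = sorted((tour_length(s) for s in segments), reverse=True)
    if any(t > battery_range for t in tours):
        raise ValueError("a segment cannot be inspected within one battery charge")
    loads = [0.0] * num_drones
    for t in tours:
        loads[loads.index(min(loads))] += t
    return max(loads), loads


segments = [(1.0, 2.0), (3.0, 4.0), (0.5, 1.0), (2.0, 3.0)]
makespan, loads = greedy_makespan(segments, num_drones=2, battery_range=8.0)
print(makespan, loads)
```

The makespan is the load of the busiest drone, so balancing tours across drones is what the objective rewards. The paper's problem is harder because a tour may cover several segments as long as it fits within the battery range.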

2605.15733 2026-05-18 cs.NE cs.AI cs.CV

Structure Abstraction and Generalization in a Hippocampal-Entorhinal Inspired World Model

Tianqiu Zhang, Muyang Lyu, Xiao Liu, Si Wu

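AI summary: This paper proposes a brain-inspired hierarchical model that extracts latent transitions with an inverse model and builds a predictive visual world model, showing that abstract structure can be extracted concurrently from continuous high-dimensional dynamics and reused for structural generalization.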

Comments Project page: https://hpc-mec-worldmodel.github.io/

Abstract

Humans abstract experiences into structured representations to facilitate pattern inference and knowledge transfer. While the hippocampal-entorhinal (HPC-MEC) circuit is known to represent both spatial and conceptual spaces, the mechanisms for concurrently extracting abstract structures from continuous, high-dimensional dynamics remain poorly understood. We propose a brain-inspired hierarchical model that simultaneously infers latent transitions and constructs a predictive visual world model. Our architecture employs an inverse model for structural extraction alongside an HPC-MEC coupling model that dissociates relational structures (MEC) from integrated episodic scenes (HPC). Using primitive transformation dynamics as a benchmark, we demonstrate the model's capacity for structural abstraction. By leveraging velocity-driven path integration, the framework enables robust prediction and structural reuse across diverse contexts, thereby achieving structural generalization. This work provides a novel computational framework for understanding how brain-inspired, self-supervised learning of world models facilitates the acquisition of reusable abstract knowledge.

2605.15714 2026-05-18 cs.SE cs.AI

Position: Early-Stage Quality Assurance in Annotation Pipelines Is More Cost-Effective Than Late-Stage Validation

Sunil Kothari, Sumukha Sharma Thoppanahalli Chandramouli, Naman Khandelwal, Parth Kulshreshtha, Ashi Jain, Kriti Banka, Tanuja Chintada, Venkata Triveni, Gulipalli Praveen Kumar, Manish Mehta, Tao Liu

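AI summary: This paper argues that quality assurance early in annotation pipelines is more effective than late-stage validation, emphasizes the effect of timing on error rates and cost, proposes three QA trigger points, and recommends changes to research and practice.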

Comments 8 pages

Abstract

This position paper argues that the machine learning community should prioritize early-stage quality assurance in annotation pipelines over the prevailing practice of late-stage validation. Data quality bottlenecks increasingly limit foundation model improvement, yet quality assurance research focuses almost exclusively on validation methods rather than validation timing. When validation occurs, not merely what methods are employed, fundamentally determines both error rates and annotation costs. This temporal neglect is puzzling given the well-established "shift-left" principle from software engineering, where empirical studies demonstrate 4x to 100x cost multipliers for defects detected in later stages (Boehm, 1981; Shull et al., 2002). Annotation pipelines exhibit analogous dynamics: errors caught before annotation begins cost a fraction of those discovered after review cycles complete. We propose a taxonomy of three QA trigger points, namely pre-annotation (T0), post-annotation (T1), and post-review (T2), that decompose annotation workflows into discrete validation opportunities. A parametric error-propagation model formalizes when timing affects final error rates versus only economics, making timing a measurable design variable rather than a configuration afterthought. A survey of 47 recent papers reveals that only 4% report when validation occurs, a striking gap given timing's demonstrated impact in adjacent fields. Without explicit attention to QA timing, the community risks optimizing validation methods while ignoring the structural variable that may matter most. Acting on this position requires three steps: researchers should report QA timing configurations alongside validation methods; annotation platforms should expose timing as a first-class parameter; and the community should run controlled experiments that measure stage-specific detection rates directly.
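
The distinction the abstract draws, timing that changes only cost versus timing that also changes the final error rate, can be illustrated with a toy calculation. This is a hypothetical model, not the paper's parametric error-propagation model. The detection rates and the cost per fix are made-up numbers, with the late-stage cost chosen inside the 4x to 100x range cited in the abstract.

```python
def run_pipeline(initial_errors, checks):
    """Apply QA checks in order and return (remaining errors, total repair cost).

    checks is a list of (stage_name, detection_rate, cost_per_fix) tuples.
    """
    remaining = initial_errors
    total_cost = 0.0
    for _, detection_rate, cost_per_fix in checks:
        caught = remaining * detection_rate
        total_cost += caught * cost_per_fix
        remaining -= caught
    return remaining, total_cost


# Same hypothetical detection rate, applied early (T0) or late (T2).
early_errors, early_cost = run_pipeline(100.0, [("T0", 0.8, 1.0)])
late_errors, late_cost = run_pipeline(100.0, [("T2", 0.8, 10.0)])
print(early_errors, early_cost)
print(late_errors, late_cost)
```

With equal detection rates the final error count is identical and only the cost differs, which is the "only economics" case. If the detection rate itself depended on the stage, the final error count would differ as well.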

2605.15707 2026-05-18 eess.IV cs.CV

Evaluation of Anatomical Shape Priors in Deep Learning-Based Cardiac Multi-Compartment Segmentation

Michael Hudler, Franz Thaler, Martin Urschler

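AI summary: This paper evaluates lightweight explicit shape priors for multi-compartment cardiac CT segmentation and finds that a standard 3D U-Net remains a strong baseline, that handcrafted priors bring limited benefit, and that future gains will need more expressive learned priors.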

Comments Published in the Proceedings of the Third Austrian Symposium on AI, Robotics, and Vision (AIRoV 2026), pp. 23-27

Abstract

Whole-heart multi-compartment CT segmentation is clinically important, but standard CNNs do not explicitly enforce anatomical plausibility. Based on statistics derived from the training data, we evaluate whether lightweight explicit shape priors, implemented as shape-aware losses and spatial label distribution heatmap-guided U-Net variants, improve 3D cardiac segmentation on MM-WHS CT and WHS++. Across all experiments, a standard 3D U-Net surprisingly remained a very strong baseline, with handcrafted priors yielding at best marginal and inconsistent changes and often degrading performance. These results suggest that the baseline already captures substantial implicit anatomical regularities and that future gains will likely require more expressive learned priors rather than simple handcrafted anatomical shape constraints.

2605.15688 2026-05-18 stat.ML cs.AI cs.LG math.PR

$α$-TCAV: A Unified Framework for Testing with Concept Activation Vectors

Ekkehard Schnoor, Jawher Said, Malik Tiomoko, Wojciech Samek, Alexander Jung

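AI summary: This paper proposes the $α$-TCAV framework, which addresses the variance problem caused by the discontinuous indicator function in standard TCAV by replacing it with a parameterized smooth function, yielding a unified probabilistic formulation, and provides guidance on parameter tuning that challenges established practice.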

Comments 44 pages, 12 figures

Abstract

Concept Activation Vectors (CAVs) are a fundamental tool for concept-based explainability in deep learning, yet their practical utility is limited by statistical instability. We analyze the stochastic nature of CAVs and the Testing with CAVs (TCAV) method, deriving the distributions of major CAV classes including PatternCAV, FastCAV, and ridge regression-based CAVs. We then identify a fundamental flaw in the standard TCAV score: its reliance on a discontinuous indicator function induces non-decaying variance in critical regimes. To address this, we introduce $α$-TCAV, a generalized framework that replaces the indicator with a parameterized smooth function, yielding a unified probabilistic formulation that subsumes both TCAV and Multi-TCAV. We characterize the induced distributions of sensitivity scores and different TCAV variants, showing that established state-of-the-art choices lack theoretical justification. We provide principled guidance on tuning the parameter in $α$-TCAV, either to imitate Multi-TCAV at substantially lower computational cost or to obtain a calibrated Bayes-optimal probabilistic measure of a concept's influence. Finally, our analysis yields practical recommendations that challenge established routines: most notably, allocating the full sampling budget to a single CAV rather than splitting it across several.
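
The instability that the abstract attributes to the indicator function is easy to see on a toy example. In the sketch below, the standard TCAV score is the fraction of inputs with positive concept sensitivity. The smooth variant uses a logistic function with a hypothetical `sharpness` parameter as a stand-in: the abstract does not specify the paper's smooth function or its parameter, so this is an assumption made only for illustration.

```python
import math


def tcav_score(sensitivities):
    """Standard TCAV score: fraction of inputs with positive sensitivity."""
    return sum(1.0 for s in sensitivities if s > 0) / len(sensitivities)


def smooth_tcav_score(sensitivities, sharpness):
    """Illustrative smooth variant: a logistic function replaces the indicator."""
    return sum(
        1.0 / (1.0 + math.exp(-sharpness * s)) for s in sensitivities
    ) / len(sensitivities)


# Two sets of sensitivities that differ only by tiny sign flips near zero,
# as could happen when the CAV is re-estimated from a different random sample.
run_a = [0.001, 0.001, 0.5]
run_b = [-0.001, -0.001, 0.5]

hard_gap = abs(tcav_score(run_a) - tcav_score(run_b))
smooth_gap = abs(smooth_tcav_score(run_a, 1.0) - smooth_tcav_score(run_b, 1.0))
print(hard_gap, smooth_gap)
```

A negligible perturbation moves the indicator-based score by two thirds, while the smooth score barely changes. A larger sharpness value brings the smooth score back toward the indicator-based one.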

2605.15681 2026-05-18 cs.GR cs.CV

DealMaTe: Multi-Dimensional Material Transfer via Diffusion Transformer

Nisha Huang, Yizhou Lin, Jie Guo, Xiu Li, Tong-Yee Lee, Zitong Yu

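AI summary: DealMaTe performs material transfer from depth, normal, and lighting images using a simplified diffusion framework that removes text guidance and reference networks, adds a lightweight 3D information injection method, and optimizes the attention mechanism for efficient, high-quality material transfer.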

Abstract

Recent diffusion-based material transfer methods rely on image fine-tuning or complex architectures with auxiliary networks, but face challenges such as text dependency, additional computational costs, and feature misalignment. To address these limitations, we propose DealMaTe, which uses depth, normal, and lighting images for material transfer (the name is formed from the marked letters in DEpth, normAl, Lighting, MAterial TransfEr). DealMaTe is a simplified diffusion framework that eliminates text guidance and reference networks. We design a lightweight 3D information injection method, Multi-Dim 3D Shader LoRA, which, without modifying the base model weights, enables compatible control conditions and achieves harmonious and stable results. Additionally, we optimize the attention mechanism with Shader Causal Mutual Attention and key-value (KV) caching to reduce inference latency caused by multiple conditions, improve computational efficiency, and achieve high-quality material transfer results with low architectural complexity. Extensive experiments covering a wide variety of objects and lighting conditions consistently demonstrate that DealMaTe achieves remarkable high-fidelity material transfer under arbitrary input materials. The code is available at https://github.com/haha-lisa/DealMaTe.

2605.15673 2026-05-18 eess.IV cs.CV cs.LG

Highly Detailed and Generalizable Broadleaf Tree Crown Instance Segmentation from UAV Imagery

Mitsutaka Nakada, Takahiko Ikebata, Kengo Ikebata, Yuji Mizuno, Yusuke Onoda, Ryuichi Takeshige, Kyaw Kyaw Htoo, Kanehiro Kitayama, Robert Ong, Masanori Onishi

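AI summary: This paper proposes a highly detailed tree crown instance segmentation model that delineates individual crowns in broadleaf forests from UAV imagery, uses a large, high-quality annotated dataset to improve segmentation performance, and applies to structurally complex forests in geographically and biologically distinct regions.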

Comments 12 pages, 5 figures, 3 Tables

Abstract

We present a highly detailed instance segmentation model for delineating individual tree crowns in natural broadleaf forests using aerial imagery acquired by unmanned aerial vehicles (UAVs). Tree crown delineation in broadleaf forests is more challenging than in other forest types due to the diversity of crown shapes and the lack of clearly defined treetops. To address this issue, we developed a deep-learning-based crown segmentation model trained on high-quality annotated crown outlines. Skilled annotators manually delineated 18,507 crown polygons from orthomosaic images collected across seven forests in Japan, and we developed a model based on Mask2Former with multiple backbone architectures. The best model achieved high segmentation performance in structurally complex broadleaf forests using only RGB imagery. This performance was maintained when applied to geographically distinct forests within Japan, as well as to biologically distinct tropical rainforests in Borneo. These results demonstrate that using a large number of high-quality annotated datasets is critical for achieving detailed and generalizable crown segmentation across diverse forest ecosystems. The developed model has been integrated into DF Scanner Pro, software that supports practical forest monitoring using UAVs, and this implementation is expected to enable a wide range of users to analyze tree-level information in broadleaf forests from UAV imagery.

2605.15671 2026-05-18 eess.IV cs.CV

Degradation-Aware Blur-Segmentation of Brain Tumor

Yuchun Wang, Xiaosong Li, Gefei Liang, Yang Liu

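AI summary: This paper proposes the DABSeg network, which performs deblurring and segmentation jointly to improve the robustness and clinical utility of multimodal 3D brain tumor segmentation under degraded conditions.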

Abstract

Multimodal 3D MRI brain tumor segmentation is a pivotal step in radiotherapy target delineation, surgical planning and post-treatment assessment. Existing methods often assume artifact-free MRI images. However, inevitable patient motion during scanning introduces artifacts and blur that degrade boundary and texture features, leading to poor segmentation performance. To bridge this gap, we introduce Degradation-Aware Blur-Segmentation Net (DABSeg), a synchronous deblurring 3D multimodal MRI segmentation network that unifies blur removal and accurate segmentation. Specifically, we propose a feature-domain motion-deblurring stem to compensate for blur and rebalance intensity. Concurrently, the backbone network embeds a blur-aware cross-modal cross-attention module and multi-scale residual aggregation to yield effective modality complementarity. Notably, we optimize a joint loss that combines weighted Dice with a clear-reference reconstruction term, where imbalanced weights are applied to small targets to boost learning intensity and predictive stability for small lesions and border regions. Systematic comparisons and ablation experiments on the BraTS2020 dataset under both clear and degraded conditions consistently demonstrate that DABSeg surpasses state-of-the-art methods in tumor Dice score and boundary precision. These results validate the effectiveness of degradation-aware cross-task collaborative learning in improving the robustness and clinical utility of multimodal 3D brain tumor segmentation under realistic degraded conditions. The source code is available at https://github.com/YuchunWang24/DABSeg_ICPR
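
The joint loss mentioned in the abstract (weighted Dice plus a clear-reference reconstruction term) can be sketched in plain Python on flattened arrays. This is a minimal illustration, not the authors' loss. The L1 form of the reconstruction term, the class weights, and the `recon_weight` balance factor are assumptions, and a real implementation would operate on tensors in a deep learning framework.

```python
def soft_dice(pred, target, eps=1e-6):
    """Soft Dice coefficient between two flattened probability/label lists."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def joint_loss(pred_by_class, target_by_class, class_weights,
               restored, clear_reference, recon_weight):
    """Weighted Dice loss plus an L1 reconstruction term (illustrative form)."""
    dice_term = sum(
        w * (1.0 - soft_dice(p, t))
        for w, p, t in zip(class_weights, pred_by_class, target_by_class)
    ) / sum(class_weights)
    recon_term = sum(
        abs(r - c) for r, c in zip(restored, clear_reference)
    ) / len(restored)
    return dice_term + recon_weight * recon_term


targets = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]      # second class is the small target
weights = [1.0, 3.0]                              # hypothetical: small target weighted up
clear = [0.2, 0.4, 0.6]

perfect = joint_loss(targets, targets, weights, clear, clear, recon_weight=0.5)
missed_small = joint_loss([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]], targets, weights,
                          clear, clear, recon_weight=0.5)
print(perfect, missed_small)
```

Missing the small class alone drives the loss to about 0.75 because of its larger weight, which reflects the abstract's point about boosting learning intensity for small lesions.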

2605.15656 2026-05-18 eess.SP cs.AI

TFZ-Tree: An Ultra-Lightweight Waveform Classification Framework for Resource-Constrained Devices

Hao Wang, Kuang Zhang, Yonggang Chi, Tianqi Zhao, Yanbo Fu, Jiaxing Guo

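AI summary: This paper proposes the TFZ-Tree framework, which combines time-frequency multidimensional features with an optimized Z-test tree for ultra-lightweight waveform classification aimed at resource-constrained devices, recognizing ten IoT waveform types in real time with 99.5% average accuracy under AWGN.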

Abstract

Under the trend of multi-waveform coexistence in 6G IoT, intelligent receivers must first identify physical-layer waveform types before performing correct demodulation and resource scheduling. However, existing signal identification research largely focuses on symbol-level modulation classification. Research directly targeting physical-layer waveform types (e.g., OFDM, OTFS, LoRa) is not only extremely scarce but also heavily reliant on deep neural networks and complex time-frequency transforms, making deployment on resource-constrained terminals difficult. Symbol modulation classification methods themselves cannot circumvent the prerequisite of "waveform identification first." To address this dual gap, we propose an ultra-lightweight waveform classification framework based on time-frequency multidimensional features with a cooperative Z-test tree (ZTree). The framework employs low-complexity time-domain feature extraction, and the classification backend adopts a ZTree optimized by Z-statistical testing, which uses hypothesis testing confidence to automatically control decision tree splitting and size, ensuring efficient execution on resource-limited processors. Tested on ten 6G candidate waveforms including OFDM, OTFS, DSSS, LoRa, and NB-IoT, the method achieves 99.5% average accuracy under AWGN and 87.4% under TDL-C multipath channels, with main confusion between OTFS and LoRa. Implemented in C on an x86 platform, single inference latency is under 4 ms. To the best of our knowledge, this is the first work achieving real-time recognition of ten IoT waveform types. Future work will target deployment acceleration on embedded MCUs. Code and dataset are open-sourced at: https://github.com/Einstein-sworder/IoT-wave.
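
The idea of letting a hypothesis test decide whether a tree node should split can be sketched as follows. This is not the paper's ZTree procedure, whose exact statistic is not given in the abstract. The sketch assumes a two-proportion Z-test on the share of one target class in the two candidate children and a conventional critical value of 1.96. The function names and label data are hypothetical.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Z statistic for the difference between two sample proportions."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if std_err == 0 else (p_a - p_b) / std_err


def should_split(left_labels, right_labels, target_class, z_critical=1.96):
    """Accept a split only if the children differ significantly in class share."""
    z = two_proportion_z(
        left_labels.count(target_class), len(left_labels),
        right_labels.count(target_class), len(right_labels),
    )
    return abs(z) > z_critical


# A candidate split that separates the classes well ...
useful = should_split(["OFDM"] * 18 + ["LoRa"] * 2,
                      ["OFDM"] * 3 + ["LoRa"] * 17, "OFDM")
# ... and one whose children look statistically alike.
useless = should_split(["OFDM"] * 10 + ["LoRa"] * 10,
                       ["OFDM"] * 11 + ["LoRa"] * 9, "OFDM")
print(useful, useless)
```

Rejecting splits that fail the test keeps the tree small, which is the mechanism the abstract credits for efficient execution on limited hardware.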

2605.15630 2026-05-18 physics.chem-ph cond-mat.stat-mech cs.LG

Reweighting free energy profiles between universal machine learning interatomic potentials for fast consensus building

Sauradeep Majumdar, Miguel Steiner, Johannes C. B. Dietschreit, Swagata Roy, Daniel Willimetz, Lukaš Grajciar, Rafael Gómez-Bombarelli

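AI summary: This paper presents a systematic and scalable framework that reweights potentials of mean force so that free energy profiles can be transferred between machine learning interatomic potentials, recovering accurate thermodynamic properties at low computational cost.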

Comments 19 pages, 4 figures, 1 table, SI appended

Abstract

Free energy profiles serve as a fundamental bridge between microscopic atomic fluctuations and macroscopic thermodynamic observables. Estimating the free energy profile along a reaction coordinate, referred to as the potential of mean force (PMF), with density functional theory (DFT) accuracy is computationally expensive. Universal machine learning interatomic potentials (MLIPs) drastically reduce this cost, but their accuracy is strongly determined by their training data and hence can be uncertain for a given system. In this work, we present a systematic and scalable framework for reweighting PMFs, initially sampled with a single 'source' MLIP, across a representative suite of target MLIPs. Because traditional direct exponential reweighting fails for large system sizes due to low phase-space overlap between potentials, we deploy robust analytical corrections. Applying this to a complex 601-atom system of Li$^+$ transport in a nanoconfined electrolyte, we demonstrate that a mean energy-gap approximation effectively bypasses statistical collapse, producing a highly stable PMF matching the target PMF. Using this approach, we recover high-fidelity target thermodynamics across multiple DFT reference levels (PBE+D3, PBE-sol, r$^2$SCAN, r$^2$SCAN-D4) at a fraction of the computational cost of full simulations. Furthermore, thermodynamic analysis reveals that the studied MLIPs partition into two distinct clusters driven by their training data. Our reweighting framework successfully recovers target thermodynamic properties, specifically reaction and activation free energies, even when the phase-space overlap between potentials is critically low. Ultimately, this approach establishes a vital diagnostic protocol to achieve affordable cross-model consensus on materials chemistry properties without redundant, resource-intensive simulations.
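
The contrast between direct exponential reweighting and a mean energy-gap correction can be sketched per bin of the reaction coordinate. This is an illustration under assumptions, not the paper's method: it reads the "mean energy-gap approximation" as replacing the exponential average by the bin-wise mean of the energy gap (the first-order term of a cumulant expansion), assumes unbiased source sampling within each bin, and uses made-up energy gaps in units of kT.

```python
import math


def perturbation_correction(gaps, kT):
    """Direct exponential reweighting: -kT * ln(mean(exp(-gap / kT)))."""
    shift = min(gaps)  # subtract the smallest gap so the exponentials cannot overflow
    mean_exp = sum(math.exp(-(g - shift) / kT) for g in gaps) / len(gaps)
    return shift - kT * math.log(mean_exp)


def mean_gap_correction(gaps):
    """First-order alternative: the average target-minus-source energy gap."""
    return sum(gaps) / len(gaps)


def reweight_pmf(source_pmf, gaps_per_bin, correction):
    """Add a per-bin correction to the source PMF and shift its minimum to zero."""
    shifted = [f + correction(gaps) for f, gaps in zip(source_pmf, gaps_per_bin)]
    lowest = min(shifted)
    return [f - lowest for f in shifted]


def effective_sample_size(gaps, kT):
    """Kish effective sample size of the exponential weights."""
    shift = min(gaps)
    weights = [math.exp(-(g - shift) / kT) for g in gaps]
    return sum(weights) ** 2 / sum(w * w for w in weights)


kT = 1.0
source_pmf = [0.0, 5.0]
gaps_per_bin = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]   # gap = U_target - U_source

exact = reweight_pmf(source_pmf, gaps_per_bin,
                     lambda gaps: perturbation_correction(gaps, kT))
approx = reweight_pmf(source_pmf, gaps_per_bin, mean_gap_correction)
collapsed = effective_sample_size([0.0, 50.0, 50.0, 50.0], kT)
print(exact, approx, collapsed)
```

When the gap is constant within each bin the two corrections agree. The last line shows the failure mode the abstract describes: with widely spread gaps a single sample carries almost all the exponential weight, so the effective sample size falls to about one.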

2605.15620 2026-05-18 stat.ML cs.LG

Pessimistic Risk-Aware Policy Learning in Contextual Bandits

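Yilong Wan, Yuqiang Li, Xianyi Wu

AI summary This paper proposes a unified framework for optimizing Lipschitz-continuous risk functionals, covering mean-variance, entropic risk, and others. Using novel empirical concentration inequalities, it derives data-dependent suboptimality bounds without strong overlap assumptions and attains minimax optimality.

Details
AI abstract (translated from Chinese)

We study risk-aware offline policy learning, which aims to learn from logged data a decision rule that is optimal under general risk criteria. The problem matters in high-stakes domains where online interaction is infeasible and adverse outcomes must be tightly controlled. Existing work on offline contextual bandits either focuses on expected-reward criteria or limits risk considerations to policy evaluation rather than optimization. This paper proposes a unified distributional framework for optimizing Lipschitz-continuous risk functionals, covering mean-variance, entropic risk, conditional value-at-risk, and others. By developing novel empirical concentration inequalities for importance-sampling-based distributional estimators, the analysis derives data-dependent suboptimality bounds without strong overlap assumptions. The rate is minimax optimal and matches that of risk-neutral offline policy optimization, indicating that optimizing general Lipschitz risk criteria carries no additional statistical cost.

English abstract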

We study risk-aware offline policy learning, aiming to learn a decision rule from logged data that is optimal under general risk criteria. This problem is crucial in high-stakes domains where online interaction is infeasible and adverse outcomes must be carefully controlled. However, existing literature on offline contextual bandits either centers on expected-reward criteria or restricts risk considerations to policy evaluation instead of optimization. In this work, we propose a unified distributional framework for optimizing Lipschitz-continuous risk functionals, a broad class of risk measures encompassing mean-variance, entropic risk, and conditional value-at-risk, among others. By developing novel empirical concentration inequalities for importance sampling-based distributional estimators, our analysis derives data-dependent suboptimality bounds with an $\tilde{\mathcal{O}}(1/\sqrt{n})$ rate, without relying on restrictive uniform overlap assumptions. This rate is minimax optimal and matches that of risk-neutral offline policy optimization, indicating that optimizing general Lipschitz risk criteria incurs no additional statistical cost relative to the expected-reward criterion.
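
The risk functionals named in the abstract can each be evaluated on an importance-weighted empirical reward distribution. The sketch below assumes self-normalized importance weights and common textbook conventions: mean minus a variance penalty, entropic risk $-\frac{1}{\lambda}\log \mathbb{E}[e^{-\lambda R}]$, and CVaR as the mean reward over the worst $\alpha$ fraction of probability mass. These conventions and all function names are illustrative assumptions and may differ from the paper's definitions.

```python
import math

def normalize(weights):
    """Self-normalize importance weights into a probability vector."""
    total = sum(weights)
    return [w / total for w in weights]

def mean_variance(rewards, weights, penalty=1.0):
    """Weighted mean minus `penalty` times the weighted variance."""
    p = normalize(weights)
    mean = sum(pi * r for pi, r in zip(p, rewards))
    var = sum(pi * (r - mean) ** 2 for pi, r in zip(p, rewards))
    return mean - penalty * var

def entropic_risk(rewards, weights, lam=1.0):
    """Certainty equivalent -(1/lam) * log E[exp(-lam * R)]."""
    p = normalize(weights)
    return -math.log(sum(pi * math.exp(-lam * r) for pi, r in zip(p, rewards))) / lam

def cvar_lower(rewards, weights, alpha=0.25):
    """Mean reward over the worst `alpha` fraction of probability mass."""
    p = normalize(weights)
    remaining, acc = alpha, 0.0
    for r, pi in sorted(zip(rewards, p)):  # ascending reward: worst outcomes first
        take = min(pi, remaining)
        acc += take * r
        remaining -= take
        if remaining <= 1e-12:
            break
    return acc / alpha

rewards = [0.0, 1.0, 2.0, 3.0]
uniform = [1.0, 1.0, 1.0, 1.0]
# Hypothetical importance ratios pi(a|x) / mu(a|x) that up-weight low rewards.
shifted = [2.0, 1.0, 0.5, 0.5]

for name, w in (("uniform", uniform), ("shifted", shifted)):
    print(name, mean_variance(rewards, w), entropic_risk(rewards, w), cvar_lower(rewards, w, 0.5))
```

Because every functional here depends only on the weighted empirical distribution, one distributional estimator can serve all of them, which is the sense in which the framework is unified.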

2605.15617 2026-05-18 cs.DC cs.AI

A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM

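Shaoke Xi, ChonLam Lao, Boyi Jia, Jiaqi Gao, Zhipeng Zhang, Jiamin Cao, Brian Sutioso, Erci Xu, Minlan Yu, Kui Ren, Yong Li, Zhengping Qian, Ennan Zhai, Jingren Zhou

AI summary PrismLLM builds a high-fidelity execution graph with a slicing-based approach, letting engineers emulate large-scale training behavior on a few GPUs, accurately reproduce performance and memory behavior, and avoid the cost of cluster access.

Comments 13 pages body, 21 pages total

Details
AI abstract (translated from Chinese)

Large language model (LLM) training today relies on clusters of thousands of GPUs. Although this scale speeds up model development, developing, debugging, and performance-tuning the training framework becomes complex and expensive. Engineers need frequent access to production clusters to reproduce behaviors or evaluate optimizations, yet most GPUs are already committed to production workloads. PrismLLM builds a high-fidelity execution graph with a slicing-based approach, decoupling large-scale execution from the need to access large clusters, so that engineers can run and observe a set of ranks of interest with only a few GPUs. PrismLLM uses hybrid emulation: some ranks execute the original program while the remaining ranks are replayed as virtual participants. Experiments show that PrismLLM accurately reproduces performance and memory behavior on large-scale LLM training workloads, with an average iteration-time error of only 0.58% and a peak GPU memory error below 0.01%. PrismLLM can emulate clusters of up to 8192 GPUs using fewer than 1% of the physical GPUs required by the original deployment.

English abstract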

Large language model (LLM) training today runs on clusters spanning thousands of GPUs. While this scale enables rapid model advances, developing, debugging, and performance-tuning the training framework inevitably becomes complex and costly. This is because engineers often need to reproduce production behaviors to diagnose failures or evaluate optimizations, thereby demanding frequent and even exclusive access to production-scale clusters -- which becomes increasingly hard given that the majority of GPUs are already committed to production workloads. Simulation relies on complex performance models that are difficult to maintain, and downscaled experiments often fail to capture scale-dependent behaviors. We present PrismLLM to decouple large-scale execution from the need to access large clusters, enabling engineers to run and observe ranks of interest under faithful large-scale behavior using only a few GPUs. PrismLLM constructs a high-fidelity execution graph via a slicing-based approach that captures computation, communication, and dependencies of the target scale. Then, PrismLLM performs hybrid emulation where selected ranks execute the original program while the remaining ranks are replayed as virtual participants. Experiments on large-scale LLM training workloads show that PrismLLM accurately reproduces performance and memory behavior, achieving only 0.58% average error in iteration time and less than 0.01% error in peak GPU memory usage. PrismLLM can emulate clusters of up to 8192 GPUs using fewer than 1% of the physical GPUs required by the original deployment.

2605.15579 2026-05-18 eess.IV cs.CV

TVRN: Invertible Neural Networks for Compression-Aware Temporal Video Rescaling

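Xinmin Feng, Li Li, Dong Liu, Feng Wu

AI summary This paper proposes the TVRN framework, which uses an invertible architecture and a learning-to-rank strategy to address compression-aware temporal video rescaling and improve reconstruction quality.

Comments Accepted by IEEE Transactions on Image Processing

Details
AI abstract (translated from Chinese)

To fit diverse display and bandwidth constraints, high-frame-rate videos are first temporally downscaled to low frame rate (LFR) and later upscaled, which calls for joint optimization to achieve effective frame-rate rescaling. Existing methods, however, usually link the two operations through training objectives without fully exploiting their reciprocal nature, which can lose high-frequency information. They also overlook the effect of lossy codecs on LFR videos, limiting practical use. This paper proposes TVRN, an end-to-end framework for compression-aware frame-rate rescaling. To regularize the high-frequency information lost during frame-rate downscaling, TVRN adopts an invertible architecture that combines a multi-input multi-output temporal wavelet transform with a high-frequency reconstruction module. To enable end-to-end training through non-differentiable lossy codecs, a surrogate network is designed to approximate their gradients. Finally, to improve robustness across compression levels, TVRN is extended to an asymmetric architecture through a learning-to-rank strategy. Extensive experiments show that TVRN outperforms existing methods under industrial video compression settings. Source code is publicly available at https://github.com/fengxinmin/TVRN_public.

English abstract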

To fit diverse display and bandwidth constraints, high-frame-rate videos are temporally downscaled to low-frame-rate (LFR) and later upscaled, requiring joint optimization for effective frame-rate rescaling. However, existing methods typically link the two operations via training objectives, without fully exploiting their reciprocal nature, which may cause high-frequency information loss. Moreover, they overlook the impact of lossy codecs on LFR videos, limiting real-world applicability. In this work, we propose an end-to-end framework for compression-aware frame-rate rescaling, named TVRN. To regularize high-frequency information lost during frame-rate downscaling, TVRN adopts an invertible architecture that combines a Multi-Input Multi-Output Temporal Wavelet Transform with a high-frequency reconstruction module. To enable end-to-end training through non-differentiable lossy codecs, we design a surrogate network that approximates their gradients. Finally, to improve robustness under various compression levels, we extend TVRN to an asymmetric architecture by incorporating compression-aware features learned via a learning-to-rank strategy. Extensive experiments show that TVRN outperforms existing methods in reconstruction quality under industrial video compression settings. Source code is publicly available at https://github.com/fengxinmin/TVRN_public.

2605.15571 2026-05-18 stat.ML cs.LG

MaxSketch: Robust Distinct Counting in Streams via Random Projections

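Nikos Tsikouras, Constantine Caramanis, Christos Tzamos

AI summary This paper proposes MaxSketch, which uses random Gaussian projections for robust distinct counting in high-dimensional, noisy data streams, and proves that under geometric structure the memory requirement drops to $\widetilde{O}(\log n / \varepsilon^2)$.

Details
AI abstract (translated from Chinese)

Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, and repeated instances of the same object are only approximately similar; for example, different images of the same individual can differ substantially at the pixel level. Classical sketches such as HyperLogLog rely on consistent hash values for identical elements and fail in this regime. Recent work on robust distinct counting in general metric spaces achieves $\widetilde{\Theta}(\sqrt{n})$ memory, which is optimal in the worst case. This paper shows that substantially better memory guarantees are possible under geometric structure that is common in learned representations. We introduce MaxSketch, a simple max-linear sketch built from random Gaussian projections, and prove that it can estimate the number of latent objects. Specifically, under this assumption, $m = \widetilde{O}(\log n / \varepsilon^2)$ random projections (and hence $\widetilde{O}(\log n / \varepsilon^2)$ memory) suffice to recover the true distinct count within a $(1+\varepsilon)$ factor. Experiments on image streams confirm that MaxSketch estimates distinct counts accurately and generalizes beyond the training regime. The results connect classical streaming algorithms with modern representation learning and show how geometric structure can fundamentally reduce the complexity of distinct counting.

English abstract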

Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only approximately similar -- for example, different images of the same individual may vary significantly at the pixel level. Classical sketches such as HyperLogLog rely on consistent hash values for identical elements and break down in this regime. Recent work on robust distinct counting in general metric spaces achieves $\widetilde{\Theta}(\sqrt{n})$ memory, which is tight in the worst case. We show that substantially improved memory guarantees are possible under geometric structure common in learned representations. We introduce MaxSketch, a simple max-linear sketch built from random Gaussian projections, and prove that it succeeds in estimating the number of distinct latent objects. Concretely, we show that under this assumption $m = \widetilde{O}(\log n / \varepsilon^2)$ random projections (and hence $\widetilde{O}(\log n/\varepsilon^2)$ memory) suffice to recover the true distinct count within a $(1+\varepsilon)$ factor. Experiments on image streams confirm that MaxSketch accurately estimates distinct counts and generalizes beyond the training regime. Our results bridge classical streaming algorithms and modern representation learning, showing how geometric structure can fundamentally reduce the complexity of distinct counting.
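
A max-linear sketch built from random Gaussian projections can be sketched as follows: each of the $m$ coordinates stores the largest projection $\langle g_j, x\rangle$ seen so far. Since the maximum is idempotent, exact repeats leave the sketch unchanged, and small perturbations move it only slightly. The class and variable names are hypothetical, and the estimator that turns the sketch into a distinct count is not described in the abstract, so it is deliberately left out.

```python
import random

class MaxLinearSketch:
    """Keeps, for each random Gaussian direction, the largest projection seen so far."""

    def __init__(self, dim, num_projections, seed=0):
        rng = random.Random(seed)
        self.directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                           for _ in range(num_projections)]
        self.values = [float("-inf")] * num_projections

    def update(self, x):
        for j, g in enumerate(self.directions):
            projection = sum(gi * xi for gi, xi in zip(g, x))
            if projection > self.values[j]:
                self.values[j] = projection

def sketch_stream(stream, dim, num_projections=16, seed=0):
    """Feed every observation of the stream into a fresh sketch."""
    sketch = MaxLinearSketch(dim, num_projections, seed)
    for x in stream:
        sketch.update(x)
    return sketch.values

rng = random.Random(1)
objects = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]

# Exact repeats: the same three objects, each seen five times, in shuffled order.
repeated = objects * 5
rng.shuffle(repeated)

# Noisy repeats: every observation is perturbed by at most 1e-3 per coordinate.
noisy = [[xi + rng.uniform(-1e-3, 1e-3) for xi in x] for x in repeated]

clean_sketch = sketch_stream(objects, dim=4)
repeat_sketch = sketch_stream(repeated, dim=4)
noisy_sketch = sketch_stream(noisy, dim=4)
max_shift = max(abs(a - b) for a, b in zip(clean_sketch, noisy_sketch))
print("identical under exact repeats:", clean_sketch == repeat_sketch)
print("largest change under noise:", max_shift)
```

Memory is one float per projection, independent of stream length, which is where a bound stated in terms of $m$ comes from.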

2605.15569 2026-05-18 cs.CR cs.AI cs.SE

Detecting Privilege Escalation in Polyglot Microservices via Agentic Program Analysis

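Penghui Li, Hong Yau Chong, Yinzhi Cao, Junfeng Yang

AI summary This paper proposes the Neo framework, which combines LLMs with classic program analysis to handle the complexity of detecting privilege escalation in microservices. It finds 24 zero-day vulnerabilities, with precision and recall better than existing methods.

Comments In Proceedings of the 47th IEEE Symposium on Security and Privacy (S&P)

Details
AI abstract (translated from Chinese)

Microservices are widely adopted for their scalability and fault tolerance, but their architecture makes privilege and permission control more complex, creating privilege escalation risks. This paper proposes the Neo framework, which combines large language models with classic program analysis. By dynamically generating analysis plans, adapting code search strategies, and validating semantics, it enables scalable code exploration across services and languages. Evaluated on 25 open-source microservice applications, Neo found 24 zero-day vulnerabilities with 81.0% precision and 85.0% recall. Compared with existing methods, Neo shows significant gains in both detection accuracy and scalability. Its extensibility to other application domains and vulnerability types was also demonstrated, uncovering 18 additional zero-day vulnerabilities.

English abstract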

Microservices are widely adopted in modern cloud systems due to their scalability and fault tolerance. However, microservice architectures introduce significant complexity in privilege and permission control, creating risks of privilege escalation where attackers can gain unauthorized access to resources or operations. Detecting such vulnerabilities is challenging due to complex cross-service interactions, polyglot codebases, and diverse privileged operations and permission checks. We present Neo, an agentic program analysis framework that combines large language models (LLMs) with classic program analysis to address these challenges. Neo leverages an LLM-based agent that dynamically generates analysis plans, adapts code search strategies, and validates semantics. We develop code search primitives that enable Neo to perform scalable and flexible code exploration across services and languages. We evaluated Neo on 25 open-source microservice applications spanning 7 programming languages and 6.2 million lines of code. Neo uncovered 24 zero-day privilege escalation vulnerabilities and achieved 81.0% precision and 85.0% recall on a ground-truth dataset. Compared to existing program analysis and agentic solutions, Neo demonstrated significant improvements in both detection accuracy and scalability. We further showcased Neo's extensibility by applying it to other application domains and vulnerability types, uncovering 18 additional zero-day vulnerabilities.

2605.15558 2026-05-18 eess.IV cs.CV

Text-RSIR: A Text-Guided Framework for Efficient Remote Sensing Image Transmission and Reconstruction

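Hao Yang, Xianping Ma, Peifeng Ma, Man-On Pun

AI summary This paper proposes a text-guided remote sensing image transmission system that replaces high-resolution data with low-resolution images and compact textual descriptions to improve transmission efficiency. A text-conditioned image restoration model recovers fine details while preserving semantic consistency.

Comments 15 pages, 8 figures, submitted to ISPRS JPRS

Details
AI abstract (translated from Chinese)

High-resolution remote sensing imagery is essential for environmental monitoring, urban mapping, and land cover analysis, but its transmission is often hindered by limited bandwidth and high communication costs. Conventional pipelines transmit full-resolution pixel data, which is redundant and inefficient. This paper proposes a text-guided remote sensing image transmission system that replaces complete high-resolution data with low-resolution images paired with compact textual descriptions. An onboard text generator produces spatial and semantic summaries, reducing the transmitted data volume to about 2% of the original size. For ground-based reconstruction, a text-conditioned image restoration model uses cross-modal learning to recover fine spatial details and maintain semantic consistency. Experiments on the Alsat-2B, UC Merced Land Use, and Aerial Image datasets show reconstruction PSNRs of 16.36 dB, 26.87 dB, and 27.41 dB, respectively, enabling efficient and information-preserving remote sensing image transmission. The implementation will be released publicly on GitHub.

English abstract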

High-resolution remote sensing imagery is critical for environmental monitoring, urban mapping, and land cover analysis, but its transmission is often hindered by limited bandwidth and high communication costs. Conventional pipelines transmit full-resolution pixel data, resulting in redundant and inefficient delivery. This paper proposes a text-guided remote sensing image transmission system that replaces complete high-resolution data with low-resolution images accompanied by compact textual descriptions. An onboard text generator produces spatial and semantic summaries, reducing the transmitted data volume to approximately 2% of the original size. For ground-based reconstruction, a text-conditioned image restoration model is introduced, which leverages cross-modal learning to recover fine spatial details and maintain semantic coherence. Experimental results on the Alsat-2B, UC Merced Land Use, and Aerial Image datasets demonstrate that the proposed framework achieves reconstruction PSNRs of 16.36 dB, 26.87 dB, and 27.41 dB, respectively, enabling efficient and information-preserving image transfer for remote sensing applications. The implementation will be made publicly available on GitHub at https://github.com/haoyangofficial/textrssr.

2605.15543 2026-05-18 cs.GT cs.AI

Domain-Independent Game Abstraction using Word Embedding Techniques

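Juho Kim, Tuomas Sandholm

AI summary This paper proposes a game abstraction method based on word embedding techniques from natural language processing. It treats actions as words and uses word vector representations and clustering to obtain domain-independent game abstraction. Experiments show the method is effective but does not match specialized algorithms.

Details
AI abstract (translated from Chinese)

Many real-world games are very large and must be shrunk through game abstraction. Although game abstraction has advanced considerably over the past two decades, most work is tied to a specific domain (such as poker) and is hard to carry over to others. This paper proposes a domain-independent approach to game abstraction that uses word embedding techniques from natural language processing: actions are treated as words, and word vectors are trained and then clustered to produce the abstraction. Experimental results show that the method is effective, although it does not perform as well as algorithms optimized for specific games.

English abstract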

Many games of interest in the real world are intractably large, thereby necessitating the use of game abstraction to shrink them in size, typically by many orders of magnitude. Over the last two decades, there have been significant advances in game abstraction; however, the domain-specific nature (usually poker) of much of the prior work prevents those techniques from being easily generalized to other settings without extensively analyzing the game at hand. In this paper, we propose a domain-independent approach to game abstraction, which applies word embedding techniques from the field of natural language processing. Treating each action as a word and gameplay data as a corpus, word vectors can be trained to represent each action as a real-valued vector, which can then be clustered to facilitate game abstraction. We also explore the use of foundational embedding models and show that action embeddings obtained this way can capture a surprising amount of information about the underlying game. Experimental results demonstrate that our proposed game abstraction technique is effective, although it does not outperform specialized algorithms tailored to specific games.

2605.15507 2026-05-18 cs.IT cs.AI cs.LG math.IT

PrismQuant: Rate-Distortion-Optimal Vector Quantization for Gaussian-Mixture Sources

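Bumsu Park, Chanho Park, Youngmok Park, Namyoon Lee

AI summary For Gaussian-mixture sources, PrismQuant achieves rate-distortion optimization by transmitting the component label and applying a component-matched KLT. Combined with EM-driven learning and entropy-constrained quantization, it closely approaches the theoretical bound and compares favorably with learned codecs.

Details
AI abstract (translated from Chinese)

For a Gaussian source under mean-squared error, classical transform coding is rate-distortion (RD) optimal: the KLT diagonalizes the covariance, reverse waterfilling allocates the bits, and scalar quantization closes the loop. For multimodal sources, however, a single covariance cannot capture heterogeneous local geometries, and the RD function loses its closed form. This paper revisits the problem through Gaussian-mixture sources and builds an RD theory for them. The key finding is that the mixture structure adds only a component label cost. Conditioned on the active mixture component, each branch is Gaussian; the challenge is allocating bits across heterogeneous branches. We prove that the genie-aided conditional RD function is governed by a single global reverse-waterfilling level. Building on this, we propose PrismQuant, which transmits the component label losslessly and encodes the residual with the component-matched KLT followed by scalar quantization, achieving a rate within H(C)/n bits per source dimension of the converse, with a vanishing asymptotic gap. We further develop a practical implementation based on EM-driven Gaussian-mixture learning, component-adaptive KLTs, and entropy-constrained scalar quantization (ECSQ). Experiments on synthetic Gaussian mixtures show that PrismQuant closely approaches the theoretical RD bound, and experiments on real-world channel-state-information (CSI) data show competitive or superior performance relative to transformer-based learned codecs with a model more than an order of magnitude smaller.

English abstract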

For a Gaussian source under mean-squared error (MSE), classical transform coding is rate-distortion (RD) optimal: the Karhunen-Loève transform (KLT) diagonalizes the covariance, reverse waterfilling allocates the bits, and scalar quantization closes the loop. This elegant story breaks down for multimodal sources, where no single covariance can capture heterogeneous local geometries, and the RD function loses its closed form. We revisit this problem through Gaussian-mixture sources and develop a constructive RD theory for them. Our key finding is that the mixture structure incurs only a component label cost. Conditioned on the active mixture component, each branch is Gaussian; the challenge is allocating bits across heterogeneous branches. We prove that the genie-aided conditional RD function is governed by a single global reverse-waterfilling level shared across all components and eigenmodes. Building on this result, we introduce PrismQuant, which transmits the component label losslessly and encodes the residual using the component-matched KLT, followed by scalar quantization, achieving a rate within $H(C)/n$ bits per source dimension of the converse, with a vanishing asymptotic gap. We further develop a practical implementation based on EM-driven Gaussian-mixture learning, component-adaptive KLTs, and entropy-constrained scalar quantization (ECSQ). Experiments on synthetic Gaussian mixtures show that PrismQuant closely approaches the theoretical RD bound, while experiments on real-world channel-state-information (CSI) data demonstrate competitive or superior performance compared with transformer-based learned codecs at more than one order of magnitude smaller model size.
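
The classical reverse water-filling step for a single Gaussian source can be made concrete with a short sketch. Given covariance eigenvalues $\lambda_i$ and a total MSE budget $D$, it finds the water level $\theta$ with $\sum_i \min(\theta, \lambda_i) = D$ by bisection and returns the rate $\sum_i \max\big(0, \tfrac{1}{2}\log_2(\lambda_i/\theta)\big)$. The function name and the example eigenvalues are illustrative; the paper's shared level across mixture components is not implemented here.

```python
import math

def reverse_waterfilling(eigenvalues, distortion, iters=200):
    """Return (water_level, rate_in_bits) for a total MSE budget `distortion` > 0."""
    if distortion >= sum(eigenvalues):
        # The budget covers all the variance: nothing needs to be coded.
        return max(eigenvalues), 0.0
    lo, hi = 0.0, max(eigenvalues)
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        # Each eigenmode contributes distortion min(theta, lambda_i).
        if sum(min(theta, lam) for lam in eigenvalues) < distortion:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    # Only modes above the water level receive bits.
    rate = sum(0.5 * math.log2(lam / theta) for lam in eigenvalues if lam > theta)
    return theta, rate

level, bits = reverse_waterfilling([4.0, 1.0, 0.25], distortion=0.75)
print(f"water level = {level:.4f}, rate = {bits:.4f} bits")
```

In this example the weakest eigenmode sits at the water level and receives no bits, while the two stronger modes share the whole rate.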

2605.15460 2026-05-18 cs.IR cs.AI

Differentially Private Motif-Preserving Multi-modal Hashing

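Zehua Cheng, Wei Dai, Jiahao Sun

AI summary This paper proposes the DMP-MH framework, which uses a Sanitize-then-Distill approach to preserve the structural features of multi-modal data under privacy guarantees. Experiments show that it improves retrieval performance while preserving privacy.

Comments 9 Pages

Details
AI abstract (translated from Chinese)

Cross-modal hashing enables efficient retrieval by encoding images and text into compact binary codes. Existing methods rely on semantic similarity graphs derived from user interactions for supervision, but these graphs encode sensitive behavioral patterns and are vulnerable to link reconstruction attacks. Existing privacy-preserving methods fail on graph-structured data: differentially private SGD destroys relational motifs by treating samples independently, while graph synthesis methods face unbounded local sensitivity in scale-free networks, where a single-edge modification at a hub node changes triangle counts by O(N) and requires costly noise injection. We call this phenomenon Hubness Explosion. This paper proposes DMP-MH, a Sanitize-then-Distill framework that decouples privacy from representation learning. The method first bounds sensitivity by deterministically clipping node degrees, capping the L2-sensitivity of triangle motifs independently of dataset size. A sanitized synthetic graph is then generated under (ε,δ)-edge differential privacy. Finally, dual-stream hashing networks distill this topology through a holistic structural loss that enforces cross-modal alignment. Evaluated on MIRFlickr-25K and NUS-WIDE under a strict inductive protocol, DMP-MH preserves privacy while outperforming private baselines by up to 11.4 mAP points and retaining up to 92.5% of non-private performance.

English abstract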

Cross-modal hashing enables efficient retrieval by encoding images and text into compact binary codes. State-of-the-art methods rely on semantic similarity graphs derived from user interactions for supervision, yet these graphs encode sensitive behavioral patterns vulnerable to link reconstruction attacks. Existing privacy-preserving approaches fail on graph-structured data: Differentially Private SGD destroys relational motifs by treating samples independently, while graph synthesis methods suffer from unbounded local sensitivity in scale-free networks, where hub nodes cause single-edge modifications to alter triangle counts by $\mathcal{O}(N)$, necessitating prohibitive noise injection. We term this phenomenon Hubness Explosion. We propose DMP-MH, a Sanitize-then-Distill framework that decouples privacy from representation learning. Our approach first bounds sensitivity by deterministically clipping node degrees, capping the $L_2$-sensitivity of triangle motifs independently of dataset size. A sanitized synthetic graph is then generated via Noisy Mirror Descent under $(\varepsilon,\delta)$-Edge Differential Privacy. Finally, dual-stream hashing networks distill this topology using a holistic structural loss that enforces cross-modal alignment. Evaluated on MIRFlickr-25K and NUS-WIDE under a strict inductive protocol, DMP-MH outperforms private baselines by up to 11.4 mAP points while retaining up to 92.5% of non-private performance.

2605.15456 2026-05-18 eess.IV cs.CV math.OC

DIPA: Distilled Preconditioned Algorithms for Solving Imaging Inverse Problems

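Romario Gualdrón-Hurtado, Roman Jacome, Leon Suarez, Henry Arguello

AI summary This paper proposes the DIPA algorithms, which improve reconstruction quality through teacher-guided distillation with linear and nonlinear preconditioning operators, and validates them on magnetic resonance imaging, compressed sensing, and super-resolution imaging.

Comments 17 pages, 8 figures, 8 tables

Details
AI abstract (translated from Chinese)

Solving imaging inverse problems usually involves designing suitable prior models, but minimizing the data-fidelity term is difficult because physical constraints make the sensing matrix ill-conditioned. Classical optimization theory addresses this with preconditioning techniques that transform the algorithm's gradient step to speed up convergence and improve numerical stability. This paper extends the preconditioning concept to improving reconstruction quality and introduces DIPA: Distilled Preconditioned Algorithms, in which a preconditioning operator (PO) is optimized with teacher-guided distillation criteria. The teacher and student differ in the sensing operators used during reconstruction: the teacher uses a simulated, better-conditioned, and more informative sensing matrix, whereas the student uses the physically feasible sensing matrix. Different distillation loss functions are designed to transfer different properties of the teacher algorithm to the preconditioned student. The PO can be linear (L-DIPA), which allows interpretability, or nonlinear (N-DIPA), parametrized by a neural network, which offers better scalability. The proposed PO design is validated across several imaging modalities, including magnetic resonance imaging, compressed sensing, and super-resolution imaging.

English abstract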

Solving imaging inverse problems has usually been addressed by designing proper prior models of the underlying signal. However, minimizing the data fidelity term poses significant challenges due to the ill-conditioned sensing matrix caused by physical constraints in the acquisition system. Thus, preconditioning techniques have been adopted in classical optimization theory to address ill-conditioned data-fidelity minimization by transforming the algorithm gradient step to achieve faster convergence and better numerical stability. We extend the preconditioning concept beyond convergence acceleration and use it to improve reconstruction quality. We introduce DIPA: Distilled Preconditioned Algorithms, where a preconditioning operator (PO) is optimized using teacher-guided distillation criteria. Unlike standard model-compression knowledge distillation (KD), the teacher and student differ by the sensing operators available during reconstruction: the teacher uses a simulated, better-conditioned, and more informative sensing matrix, whereas the student uses the physically feasible sensing matrix. We design different distillation loss functions to transfer different properties of the teacher algorithm to the preconditioned student. The PO can be linear (L-DIPA), allowing interpretability, or non-linear (N-DIPA), parametrized by a neural network, offering better scalability. We validate the proposed PO design across several imaging modalities, including magnetic resonance imaging, compressed sensing, and super-resolution imaging.

2605.15425 2026-05-18 cs.SE cs.AI

Runtime-Structured Task Decomposition for Agentic Coding Systems

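Shubhi Asthana, Bing Zhang, Chad DeLuca, Hima Patel, Ruchi Mahindru

AI summary This paper proposes runtime-structured task decomposition, which manages task partitioning and execution flow through executable control logic, lowering retry cost and improving the efficiency and reliability of agentic coding systems.

Comments Paper presented at ACM Conference on AI and Agentic Systems 2026 at the Agentic Software Engineering workshop

Details
AI abstract (translated from Chinese)

Agentic coding systems increasingly use large language models (LLMs) for software engineering tasks such as debugging, root cause analysis, and code review. Many existing systems, however, encode task logic, execution flow, and output generation in a single prompt. This design leads to brittle behavior, difficult debugging, and high retry costs, because failures often require rerunning the whole workflow. We propose runtime-structured task decomposition, an architectural approach that manages task partitioning and execution flow through executable control logic instead of relying on prompt structure alone. LLMs are used only for focused judgment tasks, and their outputs are validated against predefined schemas before downstream execution. We evaluate the approach on two software engineering workloads with three configurations: monolithic execution, static decomposition (fixed subtasks and no runtime branching), and runtime-structured decomposition. Each configuration is evaluated over 10 runs. The results show that decomposition alone does not necessarily reduce retry cost. In the Kubernetes root cause analysis workload, the static decomposition baseline had a retry cost of 1,632±145 tokens versus 904±17 tokens for the monolithic baseline, because failures forced reruns of downstream subtasks. A similar pattern appeared in the multi-file debugging workload, where the static baseline consumed 933 tokens versus 703 tokens for the monolithic system. The runtime-structured approach reran only the failed subtasks, reducing retry cost to 436±132 tokens (root cause analysis) and 460 tokens (debugging). Overall, the approach achieved up to 51.7% lower retry cost than monolithic systems and up to 73.2% lower retry cost than static decomposition baselines, improving the efficiency, debuggability, and operational reliability of agentic coding systems.

English abstract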

Agentic coding systems increasingly use large language models (LLMs) for software engineering tasks such as debugging, root cause analysis, and code review. However, many existing systems encode task logic, execution flow, and output generation inside monolithic prompts. This design creates brittle behavior, limited debuggability, and high retry costs because failures often require rerunning the full workflow. We present runtime-structured task decomposition, an architectural approach in which task partitioning and execution flow are managed through executable control logic rather than prompt structure alone. LLMs are used only for focused judgment tasks, and outputs are validated against predefined schemas before downstream execution. We evaluate this approach on two software engineering workloads using three configurations: monolithic execution, static decomposition with fixed subtasks and no runtime branching, and runtime-structured decomposition. Each configuration was evaluated across 10 runs. Our results show that decomposition alone does not necessarily reduce retry cost. In the Kubernetes root cause analysis workload, the static decomposition baseline produced a retry cost of 1,632 +/- 145 tokens versus 904 +/- 17 tokens for the monolithic baseline because failures forced reruns of downstream subtasks. A similar pattern appeared in the multi-file debugging workload, where the static baseline consumed 933 tokens compared to 703 tokens for the monolithic system. The runtime-structured approach reran only failed subtasks, reducing retry costs to 436 +/- 132 tokens for root cause analysis and 460 tokens for debugging. Overall, the approach achieved up to 51.7% lower retry cost than monolithic systems and 73.2% lower retry cost than static decomposition baselines, improving efficiency, debuggability, and operational reliability in agentic coding systems.
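
The headline percentages can be checked directly against the mean token counts reported above. The sketch below only redoes that arithmetic; the helper name is illustrative, and the per-workload dictionaries simply restate the abstract's figures.

```python
def relative_reduction(baseline_tokens, new_tokens):
    """Percentage by which `new_tokens` undercuts `baseline_tokens`."""
    return 100.0 * (1.0 - new_tokens / baseline_tokens)

# Mean retry costs in tokens, as reported in the abstract.
root_cause = {"monolithic": 904, "static": 1632, "runtime": 436}
debugging = {"monolithic": 703, "static": 933, "runtime": 460}

for name, costs in (("root cause analysis", root_cause), ("debugging", debugging)):
    vs_mono = relative_reduction(costs["monolithic"], costs["runtime"])
    vs_static = relative_reduction(costs["static"], costs["runtime"])
    print(f"{name}: {vs_mono:.2f}% vs monolithic, {vs_static:.2f}% vs static")
```

The root cause analysis workload gives roughly 51.77% and 73.28%, which match the quoted 51.7% and 73.2% when truncated to one decimal; the debugging workload gives smaller reductions, consistent with the "up to" wording.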

2605.15412 2026-05-18 cs.CE cs.AI cs.CL

From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

从反馈循环到政策更新:基于强化微调的LLM驱动的alpha因子发现

Lingzhe Zhang, Tong Jia, Yunpeng Zhai, Zixuan Xie, Chiming Duan, Minghua He, Philip S. Yu, Ying Li

AI总结 本文提出QuantEvolver框架,通过强化微调将可执行量化评估转化为策略更新,提升LLM在alpha因子发现中的表现,生成高质量且互补的因子池。

详情
AI中文摘要

现代量化交易日益依赖系统模型从大规模金融数据中提取预测信号,其中alpha因子发现是将市场观察转化为可交易信号的核心。最近基于LLM的方法在自动化因子生成方面表现出色,但大多数仍依赖提示级生成-评估-反馈循环进行迭代优化。随着循环变长,反复追加的历史候选和反馈会导致上下文爆炸,增加推理成本,稀释有用信息,并引入反馈漂移。此外,这些方法通常依赖非常大的LLM,其稳定的生成偏好可能导致结构相似的表达、冗余候选和搜索停滞。为了解决这些限制,我们提出QuantEvolver,一种基于强化微调的自进化alpha因子发现框架。与在提示中积累反馈不同,QuantEvolver将可执行量化评估转化为策略更新,使Miner LLM通过参数学习内化历史优化经验。具体而言,QuantEvolver构建高质量种子因子,构建多样化的种子-时间窗训练任务,生成可执行的Factor DSL表达式,通过Regime Backtest进行评估,并通过多样性-互补性奖励优化Miner LLM。在训练过程中,高质量因子持续积累在Mined Factor Database中,最终成为发现的因子库。在三个现实市场基准上的广泛实验表明,QuantEvolver的有效性,其在每个任务的主要评估指标上均优于现有基于LLM的alpha因子发现基线,产生更高质量和更互补的因子池。

英文摘要

Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into tradable signals. Recent LLM-based methods have shown promise in automating factor generation, but most of them still rely on prompt-level generation-evaluation-feedback loops for iterative optimization. As the loop becomes longer, repeatedly appended historical candidates and feedback can cause context explosion, increase inference cost, dilute useful information, and introduce feedback drift. Moreover, these methods often depend on very large LLMs whose stable generation preferences may lead to structurally similar expressions, redundant candidates, and search stagnation. To address these limitations, we propose QuantEvolver, a self-evolving alpha factor discovery framework based on reinforcement fine-tuning. Instead of accumulating feedback in the prompt, QuantEvolver converts executable quantitative evaluation into policy updates, enabling a Miner LLM to internalize historical optimization experience through parameter learning. Specifically, QuantEvolver constructs high-quality seed factors, builds diverse seed and time-window training tasks, generates executable Factor DSL expressions, evaluates them through Regime Backtest, and optimizes the Miner LLM with Diversity-Complementarity Reward. During training, high-quality factors are continuously accumulated in a Mined Factor Database, which serves as the final discovered factor library. Extensive experiments on three realistic market benchmarks demonstrate the effectiveness of QuantEvolver, which consistently improves the primary evaluation metric of each task over existing LLM-based alpha factor discovery baselines and produces higher-quality and more complementary factor pools.

2605.15411 2026-05-18 stat.ML cs.LG math.OC

Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning

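Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, Zhengyuan Zhou

AI summary This paper studies contextual dynamic pricing in a semiparametric scalar-index valuation model. Using oracle price map learning together with $\beta$-Hölder smoothness and a revenue-geometry condition, it proposes a modular coarse-to-fine policy that attains the optimal regret bound for nonparametric oracle-map learning.

Details
AI abstract (translated from Chinese)

We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is $v_t=\mu_\ast(\mathsf c_t)+\xi_t$, with an unknown utility map $\mu_\ast$ and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map $u\mapsto p^\ast(u)$ induced by the scalar index $u=\mu_\ast(\mathsf c)$ and the noise tail. Under $\beta$-Hölder smoothness ($\beta\geq 2$) and a revenue-geometry condition that gives a unique, stable, interior maximizer, this oracle map is itself $(\beta-1)$-smooth. We exploit this through $\mathsf{ORBIT}$, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map via bandit convex optimization. For the baseline linear utility model $\mu_\ast(\mathsf c)=\mathsf c^\top\theta_\ast$, an adaptive elliptical exploration scheme constructs the required scalar pilot online without assumptions on the context distribution. The resulting policy achieves regret $\widetilde{O}\big(T^{\frac{2\beta-1}{4\beta-3}}+\sqrt{dT}\big)$. For fixed $d$, we establish a matching lower bound in the horizon dependence, showing that the nonparametric oracle-map learning term is minimax sharp. The same scalar-pilot interface also extends to sparse high-dimensional linear utility and nonparametric Hölder utility.

English abstract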

We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is $v_t=\mu_\ast(\mathsf c_t)+\xi_t$, with an unknown utility map $\mu_\ast$ and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map $u\mapsto p^\ast(u)$ induced by the scalar index $u=\mu_\ast(\mathsf c)$ and the noise tail. Under the $\beta$-Hölder smoothness of the tail function for $\beta\geq 2$ and a revenue-geometry condition that gives a unique, stable, interior maximizer, this oracle map is itself $(\beta-1)$-smooth. We exploit such structure through $\mathsf{ORBIT}$, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For the baseline linear utility model $\mu_\ast(\mathsf c)=\mathsf c^\top\theta_\ast$, an adaptive elliptical exploration scheme constructs the required scalar pilot online without distributional assumptions on the contexts. The resulting policy achieves regret $\widetilde{O}\big(T^{\frac{2\beta-1}{4\beta-3}}+\sqrt{dT}\big)$. For fixed $d$, we establish a matching lower bound in the horizon dependence, unveiling that the nonparametric oracle-map learning term is minimax sharp. The same scalar-pilot interface also yields extensions to sparse high-dimensional linear utility and nonparametric Hölder utility.

2605.15410 2026-05-18 quant-ph cs.AI cs.LG

Diagonal Adaptive Non-local Observables on Quantum Neural Networks

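Huan-Hsin Tseng, Yan Li, Hsin-Yi Lin, Samuel Yen-Chi Chen

AI summary This paper proposes diagonal adaptive non-local observables: by considering only diagonal observables paired with quantum circuits, it reduces the parameter count and classical optimization cost while retaining the capability of full non-local observables.

Comments Accepted at ICCCN2026

Details
AI abstract (translated from Chinese)

Adaptive Non-local Observables (ANOs) have shown that making quantum observables dynamic can substantially enlarge the function space of variational quantum algorithms, partly shifting hardware demands from circuit synthesis to measurement design. This advantage, however, comes with a large increase in the number of parameters and in classical optimization cost. We propose a special form of ANO that significantly reduces this burden by considering only diagonal observables paired with quantum circuits. Mathematically, this is equivalent to the full ANO over its larger parameter space, because diagonal matrices are canonical representatives of the ANO space modulo unitary similarity. Diagonal ANO therefore keeps the capability of full ANO while reducing the complexity of $k$-local observables from $O(4^k)$ to $O(2^k)$ and lowering the corresponding measurement-side classical computation. In this sense, diagonal ANO retains much of the benefit of full ANO while covering conventional VQCs as a special case.

English abstract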

Adaptive Non-local Observables (ANOs) have shown that making quantum observables dynamic can substantially enlarge the function space of Variational Quantum Algorithms, partly shifting hardware demands from circuit synthesis to measurement design. However, this advantage is accompanied by a steep increase in the number of parameters, as well as the classical optimization cost for varying general Hermitian observables. We propose a special form of ANO that significantly reduces this burden by considering only diagonal observables paired with quantum circuits. Mathematically, this is equivalent to the full ANO of a large parameter space since diagonal matrices are canonical representatives of the ANO space modulo unitary similarity. As a result, Diagonal ANO retains the same capability of full ANO while reducing $k$-local observable complexity from $O(4^k)$ to $O(2^k)$ and lowering the corresponding measurement-side classical computation. In this sense, diagonal ANO preserves much of the benefit of full ANO while encompassing conventional VQCs as a special case.
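
The parameter counts and the role of unitary similarity can be illustrated with a small sketch. A general Hermitian observable on $k$ qubits is a $2^k \times 2^k$ matrix with $4^k$ real parameters, while a diagonal one has $2^k$. By the spectral theorem $H = U D U^\dagger$, so $\langle\psi|H|\psi\rangle = \langle U^\dagger\psi|D|U^\dagger\psi\rangle$: the non-diagonal part can be moved into the circuit. For brevity the example is restricted to one qubit with real amplitudes and a real rotation; the function names are illustrative and not from the paper.

```python
import math

def hermitian_param_count(k):
    """Real parameters of a general Hermitian matrix of size 2^k x 2^k."""
    return 4 ** k

def diagonal_param_count(k):
    """Real parameters of a real diagonal matrix of size 2^k x 2^k."""
    return 2 ** k

def expectation_full(theta, diag, psi):
    """<psi|H|psi> with H = U diag U^T built explicitly (U is a real rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    u = [[c, -s], [s, c]]
    h = [[sum(u[i][m] * diag[m] * u[j][m] for m in range(2)) for j in range(2)]
         for i in range(2)]
    return sum(psi[i] * h[i][j] * psi[j] for i in range(2) for j in range(2))

def expectation_diagonal(theta, diag, psi):
    """Same value, measuring only the diagonal observable on the rotated state U^T psi."""
    c, s = math.cos(theta), math.sin(theta)
    phi = [c * psi[0] + s * psi[1], -s * psi[0] + c * psi[1]]
    return sum(d * amp * amp for d, amp in zip(diag, phi))

psi = [0.6, 0.8]      # normalized real state
diag = [1.5, -0.5]    # eigenvalues acting as trainable diagonal entries
full = expectation_full(0.7, diag, psi)
rotated = expectation_diagonal(0.7, diag, psi)
print(full, rotated)
print([(k, hermitian_param_count(k), diagonal_param_count(k)) for k in (1, 2, 3)])
```

The two expectation values agree up to floating-point rounding, which is the sense in which the rotation can be absorbed into the circuit while only the diagonal entries remain as observable parameters.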