arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.19690 2026-06-19 cs.LG New submission

Multi-Granular Attention-Driven Reinforcement Learning Framework for Web Intelligent Enhancement Systems

Navin Chhibber, Deepak Singh, Anokh Kishore, Nikita Chawla, K. Anguraj

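AI summary Proposes the MGAR-WIES framework, which combines semantic graph modeling, attention mechanisms, and adaptive reinforcement learning to address semantic understanding and scalability for heterogeneous, dynamic data in web environments, reaching 80% accuracy.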

Comments 2026 3rd International Conference on Integrated Intelligence and Communication Systems (ICIICS), 6 Pages


Abstract

In recent years, web intelligent enhancement systems have increasingly relied on heterogeneous and dynamic web data to deliver personalized, context-aware services. However, traditional machine learning, deep learning, and reinforcement learning models often struggle with semantic understanding, adaptability, and scalability in continuously evolving web environments. In this research, a Multi-Granular Attention-based Reinforcement Web Intelligent Enhancement System (MGAR-WIES) is proposed to address these challenges by integrating semantic graph modeling, attention mechanisms, and adaptive reinforcement learning. Initially, heterogeneous web data comprising structured, semi-structured, and unstructured sources are collected and preprocessed to generate unified feature representations. These representations are transformed into a dynamic semantic graph, where entities and their relationships are modeled using graph embeddings enhanced by attention mechanisms to capture both local relevance and global contextual dependencies. Subsequently, an adaptive multi-agent reinforcement learning strategy leverages the attention-aware semantic states to optimize personalized web actions such as content recommendation, navigation optimization, and service adaptation. Finally, continuous online feedback is integrated to update graph representations and learning policies in real time, ensuring sustained adaptability and performance. The proposed MGAR-WIES achieved better results in terms of accuracy (80%) than existing approaches.

2606.19676 2026-06-19 cs.CV cs.AI New submission

TeleMorpher: Toward Robust Simultaneous Motion-Location Editing

Haengbok Chung

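AI summary Proposes TeleMorpher, a diffusion-based one-shot framework that edits a protagonist's motion and location in a video simultaneously by using motion priors, pose warping, and injection into a baseline motion editor; it performs strongly in both quantitative and qualitative evaluations.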


Abstract

Diffusion models have achieved remarkable success in image and video generation and editing. While recent studies have extended these efforts toward motion editing, simultaneously transforming both motion and location, despite its practical importance, remains largely unexplored. To better understand robust motion-location editing, we first analyze the fundamental factors that degrade its quality. Based on this analysis, we propose TeleMorpher, to the best of our knowledge one of the first one-shot frameworks for simultaneous motion-location editing. Our approach leverages motion priors (a target motion-centric video generated from an off-the-shelf model as motion-editing guidance) and the ground truth motion to enable more controllable and precise motion-location editing. Our framework works as follows: (1) We first disentangle the protagonist and the background via pre-trained segmentation and inpainting models. (2) We then introduce a training-free pose warping that edits the protagonist's motion with the motion prior as guidance. (3) The warped motion video is directly injected into a baseline motion editor during inference, mitigating the difference between source and target motions while preserving the appearance of the source video. (4) To enhance the reliability of quantitative evaluations, we propose two new LPIPS-based metrics: one measures background consistency before and after motion editing, and the other measures the fidelity of motion editing via the difference between the protagonist skeletons extracted from the source and target videos. Experiments with in-the-wild videos and the TaiChi dataset demonstrate that TeleMorpher achieves superior performance across both quantitative and qualitative measurements (real-human evaluation), underscoring its effectiveness.

2606.19638 2026-06-19 cs.CL New submission

MiqraBERT: Regression-Based Sentence-BERT Finetuning for Biblical Hebrew Parallel Detection

David M. Smiley

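AI summary Proposes MiqraBERT, a Sentence-BERT model finetuned with cosine-similarity regression to detect textual parallels in Biblical Hebrew; it improves distributional separation 2.7-fold and reduces the overlap region from roughly 24% to about 6%.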


Abstract

Textual reuse pervades the Hebrew Bible, yet the computational methods used to detect it still rest largely on lexical overlap, and they falter once a parallel involves paraphrase, lexical substitution, or syntactic reworking. This paper introduces MiqraBERT, a Sentence-BERT model finetuned from AlephBERT (a Modern Hebrew encoder) for verse-level semantic similarity in Biblical Hebrew. The training set comprises 1,650 labeled verse and half-verse pairs: 825 true parallels drawn from the Chronicles synoptic material and from foundational studies of poetic parallelism, balanced against 825 randomly sampled negatives. Through cosine-similarity regression, the model learns an embedding space in which parallel verses cluster together and unrelated verses move apart. We evaluate separation with distribution-based metrics, Wasserstein distance and the overlap coefficient, across ten random seeds. MiqraBERT improves distributional separation 2.7-fold over the pre-trained baseline and reduces the ambiguous overlap region from roughly 24% to about 6%. Narrative synoptic parallels reach a recall@10 of 87.1%; poetic parallels remain difficult, below 9%. This genre-dependent asymmetry confines the model's reliable scope to narrative textual reuse. MiqraBERT is publicly available at https://huggingface.co/davidmsmiley/MiqraBERT
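The two separation metrics named in the abstract can be illustrated with a minimal sketch. It assumes equal-sized samples of cosine similarities and a simple histogram estimate of the overlap; the function names, bin count, and sample values are hypothetical and are not taken from the MiqraBERT evaluation code.

```python
def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-sized samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-sized samples the optimal coupling pairs sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def overlap_coefficient(a, b, bins=20, lo=-1.0, hi=1.0):
    """Histogram estimate of the area shared by two distributions on [lo, hi]."""
    def hist(xs):
        counts = [0] * bins
        for x in xs:
            i = int((x - lo) / (hi - lo) * bins)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp to valid bins
        return [c / len(xs) for c in counts]

    return sum(min(p, q) for p, q in zip(hist(a), hist(b)))


# Made-up cosine similarities, for illustration only.
parallel = [0.82, 0.91, 0.77, 0.88, 0.95]
unrelated = [0.10, 0.35, 0.22, 0.05, 0.41]

separation = wasserstein_1d(parallel, unrelated)
overlap = overlap_coefficient(parallel, unrelated)
```

A larger Wasserstein distance and a smaller overlap both indicate that parallel and unrelated pairs are easier to tell apart by a similarity threshold.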

2606.19568 2026-06-19 cs.SD cs.AI New submission

Exploring Feature Extraction Technique Parameters for Acoustic Gunshot Classification

Sinclair Gurny, Ryan Quinn

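AI summary Systematically studies how feature extraction techniques and their parameters affect acoustic gunshot classification, evaluating with ResNet-18 on a dataset of 23,000 gunshot recordings; the right technique improves top-1 accuracy by up to 20%, and the right parameters add up to a further 4.7%.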


Abstract

Acoustic gunshot detection is a problem with applications across civilian public safety, military operations, and wildlife conservation, yet the field lacks a rigorous exploration of feature extraction techniques with a focus on generalization to realistic data. The mixed effectiveness of commercial gunshot detection and classification systems indicates an open problem that is not adequately addressed by the current literature. In this paper, we present a systematic investigation of common feature extraction techniques using a dataset of 23,000 gunshot recordings across 85 firearms and 21 calibers. We benchmark three feature extraction techniques with 12 total unique parameter sets using ResNet-18. Our results demonstrate that using the correct feature extraction technique can improve top-1 accuracy by up to 20%, and utilizing the correct parameters for a given feature extraction technique can improve that value by up to 4.7%.

2606.19525 2026-06-19 cs.RO New submission

A Categorial and Sheaf-Theoretic Semantics for Autonomic Component Ensembles

Manuel Hernández, Eduardo Sánchez-Soto

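AI summary For SCEL, the language for autonomic component ensembles, proposes a multi-layered mathematical model based on category theory and sheaf theory that models a society of robots as a sheaf on a topological space, quantifies system failures through sheaf cohomology, and turns the verification of distributed systems into geometric analysis.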


Abstract

The proliferation of large-scale, decentralized systems of autonomous agents, such as swarms of robots and networked cyber-physical systems, presents a formidable challenge to traditional formal methods. The Software Component Ensemble Language (SCEL) offers a formal model for such systems, but its operational semantics is not ideal for reasoning about global, structural, and emergent properties. This report proposes a new, multi-layered mathematical model for SCEL using category theory and sheaf theory. We argue that a society of robots described in SCEL can be formally modeled as a sheaf on a topological space, where components are points, ensembles are open sets, and distributed knowledge forms the sheaf's data. In this framework, computational processes like information sharing become equivalent to the sheaf-theoretic operation of "gluing" local data. System failures can then be understood and quantified as topological obstructions, measurable by sheaf cohomology. This approach transforms the verification of a complex distributed system into the analysis of the geometry of a mathematical object, providing deep, structural insights for the design of robust autonomic systems.
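The "gluing" idea in the abstract can be made concrete with a toy sketch. It assumes that an ensemble is a set of component names and that a local section is a dictionary of what each component knows; this encoding and the `glue` function are illustrative assumptions, not the construction used in the report.

```python
from itertools import combinations


def glue(sections):
    """Glue local sections into one global assignment, or return None.

    `sections` maps an open set (a frozenset of component names) to a dict
    holding the knowledge of each component in that set.
    """
    # Gluing condition: sections must agree wherever their open sets overlap.
    for (u, s), (v, t) in combinations(sections.items(), 2):
        if any(s[c] != t[c] for c in u & v):
            return None  # a disagreement on an overlap obstructs gluing
    glued = {}
    for s in sections.values():
        glued.update(s)
    return glued


consistent = {
    frozenset({"r1", "r2"}): {"r1": "goal=A", "r2": "goal=A"},
    frozenset({"r2", "r3"}): {"r2": "goal=A", "r3": "goal=B"},
}
conflicting = {
    frozenset({"r1", "r2"}): {"r1": "goal=A", "r2": "goal=A"},
    frozenset({"r2", "r3"}): {"r2": "goal=C", "r3": "goal=B"},
}
```

In this toy setting, `glue(conflicting)` returning `None` plays the role of an obstruction: the two ensembles hold incompatible knowledge about the shared component `r2`, so no global state exists.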

2606.19397 2026-06-19 cs.RO New submission

DiffusionVS: A Generative Framework for Robust Visual Servoing Based on Diffusion Policy

Hongkang Cui, Rui He, Haoyao Chen

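AI summary Proposes a visual servoing method based on Diffusion Policy that generates camera velocity through conditional denoising and uses online training to improve generalization, with success rates of nearly 100% in simulation and 93% in physical experiments.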

Comments 8 pages, 4 figures, 7 tables


Abstract

Visual servoing is a fundamental technique in robotic manipulation and navigation. Regression-based visual servoing frequently experiences trajectory jitter as a result of noise-sensitive single-step mappings and the accumulation of errors during distribution shifts. In contrast, Diffusion Policy maintains temporal consistency by predicting action sequences and improves robustness through implicit data augmentation. This paper presents a novel diffusion-based servoing method. Based on Diffusion Policy, the proposed approach uses normalized image coordinates of observed tag corners as input and generates camera velocity through conditional denoising. To overcome the generalization limitations of models trained on static datasets, an online training paradigm is adopted, continuously expanding the diversity of training data through interactive experience collection. This strategy substantially enhances both the performance and generalization capability of the model. Comprehensive simulations and real-world experiments demonstrate the effectiveness of the proposed method, achieving success rates of nearly 100% in simulation and 93% in physical experiments. Beyond the specific pipeline, we further validate the generality of the diffusion mechanism. Experiments show that existing visual servoing networks consistently achieve improved performance when integrated with our diffusion-based module. These results indicate that the proposed strategy possesses broad applicability and can enhance various visual servoing systems beyond the specific architecture presented here.
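The input described in the abstract, normalized image coordinates of tag corners, can be sketched as follows. The sketch assumes a pinhole camera with known intrinsics `fx`, `fy`, `cx`, `cy` and four corners per tag; the function names and numeric values are hypothetical, and the denoising network itself is not shown.

```python
def normalize_corners(corners_px, fx, fy, cx, cy):
    """Map pixel corners (u, v) to normalized image coordinates (x, y)."""
    return [((u - cx) / fx, (v - cy) / fy) for u, v in corners_px]


def to_observation(corners_px, fx, fy, cx, cy):
    """Flatten the normalized corners into one conditioning vector."""
    return [c for xy in normalize_corners(corners_px, fx, fy, cx, cy) for c in xy]


# Four made-up tag corners in pixels and made-up intrinsics.
corners = [(320.0, 240.0), (420.0, 240.0), (420.0, 340.0), (320.0, 340.0)]
obs = to_observation(corners, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Working in normalized coordinates removes the dependence on a particular focal length and principal point, which is one plausible reason to prefer them over raw pixels as a policy input.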

2606.20325 2026-06-19 cs.LG cs.SC math.DS New submission

Recurrent neural networks approximate continuous functions

Valentin Abadie, Clemens Hutter, Helmut Bölcskei

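AI summary Proves that for any continuous function on [-1,1] there is a ReLU recurrent neural network with fixed weights and fixed hidden dimension whose time evolution uniformly approximates the function, and gives convergence rates and minimax lower bounds.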


Abstract

Classical approximation theorems ask for a new neural network whenever the target accuracy is improved. This paper studies the opposite possibility: can the network be chosen once and for all, and can accuracy be bought only by letting it run longer? We prove that this is possible for every continuous function on [-1,1]. More precisely, each such function is uniformly approximated by the time evolution of a single ReLU recurrent neural network with fixed weights and fixed hidden dimension. The mechanism behind the construction is a new intermediate model, the Turing machine with neural units (TMNU). This model retains the algorithmic freedom needed to implement polynomial approximation schemes, while remaining rigid enough to be simulated by RNNs with explicit bounds on hidden dimension and weight magnitude. The resulting convergence rates reflect the underlying polynomial approximation rates. We complement the construction with minimax lower bounds showing that runtime is not merely a proof artifact, but an unavoidable resource in this fixed-network approximation paradigm.

2606.19876 2026-06-19 cs.LG math.OC New submission

Global Convergence of Gradient Descent for Score Matching in Gaussian Mixtures via Reverse Fisher Divergence

Alexander Tyurin

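AI summary Studies the global convergence of gradient descent for fitting Gaussian mixture models under the reverse Fisher divergence, proving that from arbitrary or random initialization each student component converges near its closest teacher component, and giving conditions for convergence in total variation distance.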


Abstract

The score matching problem is a central training objective in modern generative modeling, diffusion models, fitting unnormalized statistical models, and inverse problems. A standard approach is to minimize the forward Fisher divergence, where the expectation is taken with respect to the teacher distribution. However, recent results show that even in simple Gaussian mixture model settings, this objective can lead to undesirable and initialization-dependent convergence behavior. In this paper, we study an alternative objective: the reverse Fisher divergence, where the expectation is taken with respect to the student distribution. We analyze gradient descent (GD) for fitting Gaussian mixture models and show that this change in the objective leads to significantly better optimization properties. First, when the teacher distribution is a single Gaussian and the student is a Gaussian mixture model with fixed weights and identity covariances, we prove the global convergence of GD from arbitrary initializations. Second, we extend the analysis to the case where the teacher is also a Gaussian mixture model and prove global convergence guarantees under a global random initialization scheme and a $\widetilde{\Omega}(1)$-separation assumption on the target means. In particular, with high probability, each student component converges near its closest teacher component, and we provide conditions under which the student distribution converges in total variation distance. Our proofs rely on a new Lyapunov-based analysis of the gradient descent dynamics, showing that the reverse Fisher divergence has a much more favorable optimization landscape than the forward Fisher divergence.
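The two objectives contrasted in the abstract can be written side by side. The notation below ($p$ for the teacher, $q_\theta$ for the student, $w_i$, $\mu_i$, step size $\eta$) and the absence of a constant factor such as $1/2$ are assumptions made for illustration; the paper may use a different normalization.

```latex
% Forward Fisher divergence: expectation under the teacher p
\mathrm{F}(p \,\|\, q_\theta)
  = \mathbb{E}_{x \sim p}\!\left[ \left\| \nabla_x \log p(x) - \nabla_x \log q_\theta(x) \right\|^2 \right]

% Reverse Fisher divergence: same integrand, expectation under the student q_\theta
\mathrm{F}(q_\theta \,\|\, p)
  = \mathbb{E}_{x \sim q_\theta}\!\left[ \left\| \nabla_x \log q_\theta(x) - \nabla_x \log p(x) \right\|^2 \right]

% Student with fixed weights w_i and identity covariances; only the means are updated by GD
q_\theta(x) = \sum_{i=1}^{k} w_i \, \mathcal{N}(x;\, \mu_i,\, I),
\qquad
\mu_i^{(t+1)} = \mu_i^{(t)} - \eta \, \nabla_{\mu_i} \mathrm{F}\!\left(q_{\theta^{(t)}} \,\|\, p\right)
```

The only difference between the two divergences is the distribution under which the squared score difference is averaged, which is the change the abstract credits for the improved optimization landscape.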

2606.20347 2026-06-19 cs.LG cond-mat.dis-nn New submission

Critical Percolation as a Synthetic Data Model for Interpretability

Aryeh Brill, Tom Ingebretsen Carlson

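AI summary Proposes synthetic datasets of hierarchical functions defined on critical mean-field percolation clusters, which are sparse, fractal, and power-law distributed; an almost linear-time algorithm generates data at arbitrary scale for evaluating interpretability methods.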

Comments 21 pages, 10 figures, accepted to the Mechanistic Interpretability Workshop at ICML 2026


Abstract

Neural networks learn features that reflect the hierarchical, multi-scale structure of natural data. Synthetic datasets used to evaluate interpretability methods typically lack this structure, limiting their value as realistic toy models. To close this gap, we introduce a family of synthetic datasets consisting of hierarchical functions defined on critical mean-field percolation clusters embedded in a high-dimensional data space. The percolation data consists of sparse, low-dimensional fractal clusters with a power-law size distribution. Latent variables modeling a taxonomic hierarchy generate each data point's target value. The data model is analytically tractable with known critical exponents that fix its properties without requiring hyperparameter tuning. We leverage a mapping between percolation clusters, random trees, and additive coalescence to propose an almost linear-time algorithm to jointly sample a random tree and its hierarchical latent decomposition, enabling data generation at arbitrary scale. Using probing experiments, we find that the model's ground-truth latent variables can be linearly decoded from neural network activations. Together, sparsity, self-similarity, power-law statistics, and analytical tractability make critical percolation a principled testbed for interpretability research.
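To give a feel for the cluster-size statistics mentioned in the abstract, the following sketch samples cluster sizes of a random graph at mean degree 1, a standard stand-in for critical mean-field percolation. It uses union-find over uniformly random edges; it is not the paper's tree-and-coalescence sampler, and the function name and parameters are hypothetical.

```python
import random
from collections import Counter


def critical_cluster_sizes(n, seed=0):
    """Cluster sizes of a random graph on n nodes with n // 2 random edges.

    With n // 2 edges the mean degree is about 1, the mean-field threshold.
    Edges are drawn with replacement, so rare self-loops and repeats occur.
    """
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(n // 2):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a != b:
            parent[a] = b  # merge the two clusters
    return sorted(Counter(find(i) for i in range(n)).values(), reverse=True)


sizes = critical_cluster_sizes(10_000, seed=1)
```

Plotting a histogram of `sizes` on log-log axes is a quick way to inspect the heavy tail: most clusters are tiny, while a few are far larger than the median.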

2606.18951 2026-06-19 cs.RO New submission

A High-accuracy Event-based Underwater SLAM System

Yifan Peng, Qihang Liu, Haoying Li, Yuzhe Li, Junfeng Wu, Ziyang Hong

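AI summary To address poor Time Surface imaging quality and matching failures in event-based underwater SLAM, proposes a high-accuracy stereo SLAM system built on a structure-aware metric and Bayesian Optimization, and contributes UWE, the first high-quality underwater event dataset.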


Abstract

While event cameras offer immense potential for underwater SLAM, existing Time Surface (TS)-based methods prove highly unreliable when deployed underwater. Fluctuating camera velocities severely degrade TS imaging quality, while wide stereo baselines and repetitive underwater textures induce critical matching failures, frequently triggering system failure. To overcome these challenges, we develop the first high-accuracy event-based underwater stereo SLAM system. A structure-aware metric for TS is designed based on structure tensor coherence and gradients to quantitatively evaluate TS structural information density. By decoupling optimal TS generation into two distinct stages based on system initialization, Bayesian Optimization (BO) first predicts an optimal prior TS sequentially before initialization, while an asynchronous online local search runs periodically to obtain an appropriate TS in real time during the tracking stage. We use the prior disparity to guarantee precise data association and a "latest-observation-first" triangulation mechanism to realize stable triangulation. As a benchmark for these solutions and a resource for the community, we also contribute UWE, the first high-quality real-world underwater event dataset containing variable camera motions, complex textures, and different trajectory features. Extensive evaluations on public datasets and UWE show that the proposed SLAM system achieves competitive accuracy compared with the state-of-the-art event-based method. The code and data will be open-sourced.
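Structure tensor coherence, one ingredient of the metric named in the abstract, can be sketched in a few lines. The version below sums the tensor over the whole image and uses central differences; how the paper combines coherence with gradient magnitude is not stated in the abstract, so this is only an assumed, simplified illustration with a hypothetical function name.

```python
def structure_coherence(img):
    """Coherence in [0, 1] of the structure tensor summed over an image.

    `img` is a list of rows of floats. 1 means one dominant edge direction,
    0 means no preferred direction (or a flat image).
    """
    h, w = len(img), len(img[0])
    jxx = jyy = jxy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0  # central differences
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    trace = jxx + jyy
    if trace == 0.0:
        return 0.0  # flat image: no structure at all
    # (lambda1 - lambda2) / (lambda1 + lambda2) for the 2x2 tensor
    return ((jxx - jyy) ** 2 + 4.0 * jxy ** 2) ** 0.5 / trace


ramp = [[float(x) for x in range(5)] for _ in range(5)]  # intensity grows along x
flat = [[1.0] * 5 for _ in range(5)]
```

A time surface with crisp, consistently oriented edges would score near 1 under this measure, while a blurred or noisy one would score lower, which is the kind of signal a TS quality metric could exploit.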

2606.12500 2026-06-19 cs.LG cs.AI New submission

Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation

Xian Liu, Carlo G. Prato, Gustav Markkula

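AI summary Replaces traditional rule-based behaviour models with machine learning behaviour models in traffic microsimulation and predicts crash frequency by analysing simulated conflicts with Extreme Value Theory; at five signalised intersections in Leeds, UK, the ML model improves prediction accuracy without location-specific calibration.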


Abstract

Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequency for current or planned road infrastructure designs. However, existing microsimulation-based safety studies have adopted simplified rule-based behaviour models, which reproduce traffic flow reasonably well but often fail to generate realistic conflict dynamics, limiting crash prediction accuracy. Recent advances in machine learning (ML)-based behaviour models offer a promising opportunity to potentially improve microsimulation realism and crash frequency predictions by learning human driving behaviour directly from large-scale trajectory datasets. To investigate this possibility, traffic microsimulation was conducted for five real-world signalised intersections in Leeds, UK, using both a standard rule-based model and a state-of-the-art ML model. Simulated vehicle trajectories were analysed using a two-dimensional Time-to-Collision metric to identify simulated conflicts, which were then modelled using Extreme Value Theory to predict crash frequency. Results show that conflicts from the ML model yielded crash predictions in line with the real-world crash data, whereas the rule-based model did not permit meaningful predictions, presumably due to a lack of model calibration to the specific simulated intersections. Directly using ML-generated simulated crashes to predict real-world crash frequency also yielded poor results, suggesting that while current ML models can realistically reproduce conflicts, they are not yet able to generate realistic crashes. Overall, the findings demonstrate that ML-based behaviour models are promising for improving crash prediction from simulated conflicts, without a need for location-specific model calibration, and suggest clear future directions for ML-based traffic microsimulation.
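A two-dimensional Time-to-Collision can be illustrated with a simple geometric sketch. It assumes each vehicle is a circle moving at constant velocity, which is a simplification chosen here for brevity; the abstract does not define the exact metric used in the study, and `ttc_2d` is a hypothetical name.

```python
import math


def ttc_2d(p1, v1, p2, v2, r1, r2):
    """Time until two circles first touch under constant velocity, else None."""
    px, py = p2[0] - p1[0], p2[1] - p1[1]  # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]  # relative velocity
    reach = r1 + r2
    c = px * px + py * py - reach * reach
    if c <= 0.0:
        return 0.0  # already in contact
    a = vx * vx + vy * vy
    if a == 0.0:
        return None  # no relative motion
    b = 2.0 * (px * vx + py * vy)
    disc = b * b - 4.0 * a * c
    if disc < 0.0 or b >= 0.0:
        return None  # paths miss each other, or the vehicles are separating
    # Smaller root of |p + v t| = reach: the first moment of contact.
    return (-b - math.sqrt(disc)) / (2.0 * a)


# A vehicle at 10 m/s approaching a stopped vehicle 50 m ahead, radii 2 m each.
head_on = ttc_2d((0.0, 0.0), (10.0, 0.0), (50.0, 0.0), (0.0, 0.0), 2.0, 2.0)
```

In a conflict analysis, the minimum TTC per vehicle interaction would typically be collected and the extreme low values passed to an Extreme Value Theory model; that modelling step is not shown here.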

2606.10136 2026-06-19 cs.CV New submission

iSAGE: A Human-in-the-Loop Framework for Remote Sensing Semantic Segmentation via Sparse Point Supervision

Osmar Luiz Ferreira de Carvalho, Osmar Abilio de Carvalho Junior, Anesmar Olino de Albuquerque, Daniel Guerreiro e Silva

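AI summary Proposes the iSAGE framework, in which experts click on pixels where the model is wrong rather than on arbitrary pixels, matching dense supervision without auxiliary machinery; on the BsB Aerial and ISPRS Vaihingen datasets it reaches performance comparable to dense supervision at very low annotation rates.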

Comments 47 pages, 8 tables, 6 figures


Abstract

Semantic segmentation in remote sensing requires costly pixel-level annotations, and nearly every problem demands a new dataset since models rarely transfer across sensors, platforms, or geographies. Existing human-in-the-loop frameworks expand sparse clicks into dense supervision via auxiliary machinery (pseudo-labels, propagation, CRFs, foundation-model prompts, auxiliary heads), all operating on the model's predictive distribution. A confidently wrong pixel is indistinguishable from a confidently correct one in that distribution by construction, so no rule reading it can separate the two; the distinguishing signal is external to the model. This paper hypothesizes that expert clicks targeting confident model errors, not arbitrary pixels, suffice to match dense supervision, with no expansion machinery. iSAGE (Iterative Sparse Annotation Guided by Expert) realizes this hypothesis on an integrated open-source platform, where an error-weighted loss amplifies the gradient at each click and the annotation record itself is the dataset, extensible, correctable, and auditable. Experiments use a minimum-effort regime: at most one labeled pixel per class per frame. On BsB Aerial, iSAGE recovers 97.2% of dense supervision (74.79% mIoU on 0.040% of pixels) with contrasting class dynamics: amorphous classes (permeable areas) saturate from the seed, while small classes (cars) require late-iteration effort. On ISPRS Vaihingen (external benchmark), iSAGE reaches 76.78% mIoU with 0.011% of pixels, matching the dense baseline (76.65%) and exceeding all published methods. Under the same pipeline, four output-reading mechanisms (oracle entropy across budgets 1–100x, pseudo-labels across thresholds 0.90–0.99, CRF-based propagation, uniform random) plateau 7.4 to 14.5 pp below iSAGE. Across 31 surveyed methods, iSAGE is the only iterative human-in-the-loop framework operating without auxiliary machinery.

2606.11171 2026-06-19 cs.LG cond-mat.stat-mech cs.IT math.IT math.OC math.ST stat.TH New submission

Indexed Bellman Information Complexity

Algorithmic and Minimax Complexity in Kernel Bandits

Yunbei Xu

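AI summary Using a unified MAIR framework, places GP-UCB and the MAMS algorithm in a common language, proposes a safe master algorithm that combines the strengths of both, and shows that in overparameterized models algorithmic complexity can be more informative than class-wide minimax or DEC certificates.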

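AI abstract (translated from Chinese)

Gaussian process upper confidence bound (GP-UCB) and decision-estimation coefficient (DEC) methods can look, at first sight, as if they belong to different theories. This paper places the two viewpoints in a common algorithmic-information language for frequentist RKHS bandits. GP-UCB fixes an algorithmic (rather than true) Gaussian process prior and exploits the complexity of the realized trajectory as well as computational tractability, whereas MAMS optimizes a robust class-wide MAIR/DEC envelope. Through a unified MAIR framework and heterogeneous positive-semidefinite algorithmic priors, we generalize the GP-UCB analysis and the MAMS algorithm, propose a safe master algorithm that combines the strengths of both, and give a kernel-bandit construction showing that, in overparameterized models, algorithmic complexity can be more informative than class-wide minimax or DEC certificates. The resulting message is that algorithmic information and class-wide minimax coefficients answer different questions and can lead to different gaps; kernel bandits provide a clean setting in which this distinction becomes mathematically visible.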

Abstract

We develop indexed Bellman information complexity, a representation-level theory of interactive decision making centered on information indices and reference histories. The representation strips away problem-specific syntax and retains only the ingredients needed for dynamic programming and information accounting, thereby unifying the earlier framework of indexed algorithmic information ratios (AIR). On the upper-bound side, regret is controlled by Bellman supersolutions or potential identities whose gradient bracket is paid for by indexed information. Upper-confidence-bound (UCB), estimation-to-decision/decision-estimation-coefficient (E2D/DEC), and adaptive-minimax-sampling or exploration-by-optimization (AMS/EBO) methods appear as three relaxations of this same identity. On the lower-bound side, the posterior-reference trajectory supplies both the information telescope and the ghost quantile of small-regret trajectories. The resulting critical radius in the lower bound is an effective-dimension-scale quantity, as in Fano and local-prior-mass lower bounds, rather than the constant radius of a two-point Le Cam argument. The examples show that DEC is best viewed as a one-step relaxation of indexed Bellman information complexity, not as a universally tight conversion mechanism. We illustrate the framework through several applications, with particular emphasis on kernel bandits. In this setting, the active action marginal provides a concrete basis for comparing UCB, E2D, and AMS/EBO.

2606.19943 2026-06-19 eess.IV cs.AI New submission

SIMBA: A Bidirectional Retrieval Forward Simulation Framework for Modeling FY-4A GIIRS Hyperspectral Infrared Radiances Toward NWP Applications

Jingdong Shen, Fu Wang*, Qifeng Lu, Hao Huang, Chunqiang Wu, Chi Yang, Xiaofang Liu

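AI summary Proposes the SIMBA framework, which jointly performs atmospheric profile retrieval and radiance reconstruction and strengthens their coupling with a cycle-consistency constraint and a bidirectional Mamba module; on FY-4A GIIRS data it outperforms several deep learning baselines.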


Abstract

Hyperspectral infrared observations are an important data source for numerical weather prediction (NWP) because they provide rich information on the vertical structure of atmospheric temperature and humidity. However, most existing deep learning methods mainly focus on one-way retrieval from radiances to atmospheric profiles, while the reverse radiance simulation process and the consistency between atmospheric state space and radiance observation space are insufficiently considered. In this study, we propose SIMBA, a unified bidirectional retrieval-forward simulation framework for FY-4A GIIRS hyperspectral infrared radiance modeling toward NWP applications. The framework jointly performs atmospheric profile retrieval and radiance reconstruction, introduces a cycle-consistency constraint to strengthen the coupling between the two processes, and employs a bidirectional Mamba state-space module to capture long-range dependencies along pressure levels. Using collocated FY-4A GIIRS observations and ERA5 reanalysis data, the proposed method is evaluated for temperature retrieval, specific humidity retrieval, long-wave radiance reconstruction, and medium-wave radiance reconstruction. Experimental results show that SIMBA outperforms several representative deep learning baselines across both retrieval and reconstruction tasks, while ablation experiments confirm the contribution of the bidirectional design and cycle-consistency mechanism. These results demonstrate that the proposed framework is effective for joint atmospheric profile retrieval and hyperspectral infrared radiance modeling, and suggest potential for future Jacobian-related analysis and NWP-oriented extensions.

2606.19574 2026-06-19 eess.IV cs.CV New submission

FrequencyFormer: A Co-Designed Sensor-to-Processor Pipeline for Frequency-Domain Vision Transformer Inference

Chengwei Zhou, Ovishake Sen, Xuming Chen, Rishith Paramasivam, Shaahin Angizi, Swarup Bhunia, Baibhab Chatterjee, Gourav Datta

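AI summary Proposes FrequencyFormer, which compresses images into frequency-domain tokens with a multi-scale DCT tokenizer and combines near-sensor LUT hardware with a low-power communication architecture, achieving up to 128x data compression and 28.8 TOPS/W energy efficiency while remaining compatible with a range of vision tasks.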

详情
AI中文摘要

在传感器边缘系统上部署视觉Transformer(ViT)不仅受限于设备计算能力,还受限于从传感器到处理器传输高维图像数据所需的能量和带宽。虽然传感器内和近传感器计算通过早期特征提取降低了这一成本,但现有方法通常仅提供适度的压缩。我们观察到频域提供了视觉信息的自然紧凑表示,并且可以在传感器级别利用以减少传感器到处理器的数据移动。基于这一见解,我们提出了FrequencyFormer,一种用于高效ViT推理的协同设计传感器到处理器流水线。FrequencyFormer包括:(1)多尺度DCT标记化器,将224x224图像压缩为紧凑的频域令牌,实现高达128倍的片外数据量减少,且精度损失较小;(2)基于查找表(LUT)的近传感器硬件实现,利用固定DCT系数实现无乘法器、节能且面积高效的标记化;(3)改进的基于MIPI的低功耗通信架构,进一步降低传输能量。FrequencyFormer可作为标准ViT补丁嵌入的直接替代,并与分类、检测和分割任务的预训练骨干网络兼容。该流水线实现了28.8 TOPS/W的能效,将通信能量降低230倍,并将总传感器侧能量降低2.22倍,展示了频域标记化作为传感器内ViT部署的可扩展基础。

英文摘要

Deploying vision transformers (ViTs) on sensor-edge systems is limited not only by on-device compute, but also by the energy and bandwidth required to transmit high-dimensional image data from the sensor to the processor. While in-sensor and near-sensor computing reduce this cost through early feature extraction, existing methods often provide only modest compression. We observe that the frequency domain provides a naturally compact representation of visual information and can be exploited at the sensor level to reduce sensor-to-processor data movement. Building on this insight, we present FrequencyFormer, a co-designed sensor-to-processor pipeline for efficient ViT inference. FrequencyFormer includes: (1) a multi-scale DCT tokenizer that compresses a 224x224 image into compact frequency-domain tokens, achieving up to 128x reduction in off-chip data volume with modest accuracy loss; (2) a LUT-based near-sensor hardware implementation that leverages fixed DCT coefficients for multiplier-free, energy- and area-efficient tokenization; and (3) a modified MIPI-based low-power communication architecture that further reduces transfer energy. FrequencyFormer serves as a drop-in replacement for standard ViT patch embedding and remains compatible with pretrained backbones across classification, detection, and segmentation tasks. The pipeline achieves 28.8 TOPS/W, reduces communication energy by 230x, and lowers total sensor-side energy by 2.22x, demonstrating frequency-domain tokenization as a scalable foundation for in-sensor ViT deployment.
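
The idea of frequency-domain tokenization can be illustrated with a minimal single-scale sketch. It is not the authors' multi-scale tokenizer or their LUT hardware: the function names `dct_matrix` and `dct_tokens`, the 16x16 patch size, and the choice to keep a 4x4 block of low-frequency coefficients are all assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (rows are frequencies)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis

def dct_tokens(image: np.ndarray, patch: int = 16, keep: int = 4) -> np.ndarray:
    """Split a grayscale image into patches and keep the keep x keep
    lowest-frequency DCT coefficients of each patch as one token.
    Hypothetical single-scale stand-in for a frequency-domain tokenizer."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile into patches"
    d = dct_matrix(patch)  # fixed coefficients, computed once
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            block = image[r:r + patch, c:c + patch]
            coeffs = d @ block @ d.T  # separable 2D DCT
            tokens.append(coeffs[:keep, :keep].ravel())
    return np.stack(tokens)
```

With these illustrative settings, a 224x224 single-channel image (50,176 values) becomes 196 tokens of 16 coefficients each (3,136 values). That ratio is a property of the assumed settings only and should not be read as the 128x figure reported in the abstract.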

2606.20520 2026-06-19 cs.CR cs.AI cs.DC cs.LG 新提交

Sovereign Execution Brokers: Enforcing Certificate-Bound Authority in Agentic Control Planes

主权执行代理:在智能体控制平面中强制执行证书绑定权限

Jun He, Deying Yu

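AI summary To address the lack of a mandatory authority check when autonomous agents execute mutations in production environments, proposes the Sovereign Execution Broker (SEB), which enforces authority at runtime through certificate verification, state checks, and scoped identities, and evaluates its security and performance on AWS and Kubernetes.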

Comments 19 pages, 6 figures, 10 tables

详情
AI中文摘要

自主代理越来越多地连接到云、部署和数据控制工作流,但生产环境的变更权限不应存在于非确定性推理过程中。现有的访问控制机制授权身份,而保证层认证提议的操作;两者单独都无法在变更时刻提供对认证权限的强制执行点。本文介绍了主权执行代理(SEB),一种用于证书绑定智能体基础设施的运行时强制边界。SEB消耗由主权保证边界(SAB)颁发的证书,验证请求的变更与认证的执行合约匹配,检查有效期窗口、策略时期、撤销时期和实时状态漂移,铸造范围执行身份,调用基础设施API,并记录签名的决策和结果记录。通过分离提议、准入和执行,SEB将认证权限转化为短暂的、可撤销的、可审计的运行时能力,前提是生产变更API拒绝非代理身份。我们展示了SEB执行模型、证书和重放验证谓词、范围身份语义、绕过预防部署模式、失败行为以及一个具体的原型实现。我们在AWS和Kubernetes集群上评估了原型,测量了延迟开销、撤销传播、漂移检测以及故障注入下的安全性。

英文摘要

Autonomous agents are increasingly connected to cloud, deployment, and data-control workflows, but production mutation authority should not reside inside non-deterministic reasoning processes. Existing access-control mechanisms authorize identities, while assurance layers certify proposed actions; neither alone provides a mandatory enforcement point for certified authority at the moment of mutation. This paper introduces the Sovereign Execution Broker (SEB), a runtime enforcement boundary for certificate-bound agentic infrastructure. SEB consumes certificates issued by the Sovereign Assurance Boundary (SAB), verifies that the requested mutation matches the certified execution contract, checks validity windows, policy epochs, revocation epochs, and live-state drift, mints a scoped execution identity, invokes infrastructure APIs, and records signed decision and outcome records. By separating proposal, admission, and execution, SEB turns certified authority into a short-lived, revocable, auditable runtime capability, provided that production mutation APIs reject non-broker identities. We present the SEB execution model, certificate and replay-verification predicates, scoped identity semantics, bypass-prevention deployment patterns, failure behavior, and a concrete prototype implementation. We evaluate the prototype on AWS and Kubernetes clusters, measuring latency overheads, revocation propagation, drift detection, and security under fault injection.
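
The admission checks listed in the abstract (contract match, validity window, policy epoch, revocation epoch, live-state drift) can be sketched as a single predicate. This is a hypothetical illustration, not the paper's certificate format or verification logic: the `Certificate` and `BrokerView` types, every field name, and the exact comparison rules are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Certificate:
    """Hypothetical certificate fields; the real SAB format is not given in the abstract."""
    contract_hash: str     # digest of the certified execution contract
    not_before: float      # start of validity window (epoch seconds)
    not_after: float       # end of validity window (epoch seconds)
    policy_epoch: int      # policy version the certificate was issued under
    revocation_epoch: int  # revocation-list version seen at issuance
    state_digest: str      # digest of the live state the certificate assumed

@dataclass(frozen=True)
class BrokerView:
    """What the broker observes at the moment of mutation (assumed shape)."""
    now: float
    policy_epoch: int
    revocation_epoch: int
    live_state_digest: str

def admit(cert: Certificate, requested_contract_hash: str,
          view: BrokerView) -> Tuple[bool, str]:
    """Return (decision, reason); every check must pass before execution."""
    if requested_contract_hash != cert.contract_hash:
        return False, "contract mismatch"
    if not (cert.not_before <= view.now <= cert.not_after):
        return False, "outside validity window"
    if cert.policy_epoch != view.policy_epoch:
        return False, "stale policy epoch"
    if cert.revocation_epoch < view.revocation_epoch:
        return False, "stale revocation epoch"
    if cert.state_digest != view.live_state_digest:
        return False, "live-state drift"
    return True, "admitted"
```

Returning a reason string alongside the decision mirrors the abstract's emphasis on recorded decision records; signing those records and minting the scoped identity are left out of this sketch.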

2606.20388 2026-06-19 cs.HC cs.AI cs.DB 新提交

DataMagic: Transforming Tabular Data into Data Insight Video

DataMagic: 将表格数据转化为数据洞察视频

Yupeng Xie, Chen Ma, Zhenyang Wang, Liangwei Wang, Jiayi Zhu, Chuxuan Zeng, Zhouan Shen, Boyan Li, Yuyu Luo

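AI summary Presents DataMagic, a system that uses the declarative specification DVSpec and a multi-agent architecture to turn raw tabular data and natural language queries into narrative data-insight videos, with support for interactive exploration.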

Comments 5 pages, 3 figures, accepted at VLDB 2026

详情
AI中文摘要

数据视频整合动态图表、语音叙述和同步动画，以时间叙事的方式传达数据洞察，使其成为提高数据管理生命周期中数据消费效率的有效媒介。然而，制作高质量的数据视频需要涵盖数据分析、叙事设计和视频制作的专业知识。现有方法存在不足：静态可视化工具（如BI仪表板）缺乏叙事逻辑和动画；创作工具要求用户预先准备可视化，而非从原始数据开始；像素级视频生成模型无法保证数据保真度或来源。我们演示了DataMagic，一个端到端的交互式系统，将原始表格数据和自然语言查询转化为叙事性数据洞察视频。为确保数据保真度，DataMagic引入了声明式规范DVSpec，通过数据驱动的语义引用将视觉和动画元素绑定到底层数据字段。为解决设计空间的组合爆炸问题，DataMagic采用先生成后编排的多智能体架构，并行生成候选场景，然后通过全局编排优化叙事连贯性。利用DVSpec逻辑与渲染的解耦，系统进一步支持三种交互模式和基于结构化来源的数据问答，将单向视频转化为可探索的交互式数据界面。在109个真实世界样本上的评估验证了DataMagic的有效性。主页：https://datamagic-home.github.io/

英文摘要

Data videos integrate dynamic charts, voice narration, and synchronized animations to communicate data insights as temporal narratives, making them an effective medium for improving data consumption efficiency in the data management lifecycle. However, producing high-quality data videos requires expertise spanning data analysis, narrative design, and video production. Existing approaches fall short: static visualization tools (e.g., BI dashboards) lack narrative logic and animation; authoring tools require users to pre-prepare visualizations rather than working from raw data; pixel-level video generation models cannot guarantee data fidelity or provenance. We demonstrate DataMagic, an end-to-end interactive system that transforms raw tabular data and natural language queries into narrative data-insight videos. To ensure data fidelity, DataMagic introduces the declarative specification DVSpec, which binds visual and animation elements to underlying data fields through data-driven semantic references. To address the combinatorial explosion of the design space, DataMagic adopts a Generate-then-Orchestrate multi-agent architecture that generates candidate scenes in parallel and then optimizes narrative coherence through global orchestration. Leveraging DVSpec's decoupling of logic and rendering, the system further supports three interaction modes and structured provenance-based data Q&A, transforming one-way videos into explorable interactive data interfaces. Evaluation on 109 real-world samples validates the effectiveness of DataMagic. Homepage: https://datamagic-home.github.io/

2606.20324 2026-06-19 cs.SE cs.LG 新提交

A Model-Driven Approach for Developing Families of Reinforcement Learning Environments

一种模型驱动的方法用于开发强化学习环境族

Xiaoran Liu, Istvan David

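AI summary Proposes a model-driven approach that uses a hybrid genetic algorithm and model transformations to automatically generate families of reinforcement learning training environments, addressing the labor-intensive and error-prone manual development of such families, and demonstrates it in a wildfire mitigation scenario.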

详情
AI中文摘要

虚拟训练环境是软件密集型系统,强化学习(RL)智能体在其中学习、适应并展示有意义的行为。虚拟训练环境为在现实环境中训练智能体提供了一种安全且成本效益高的替代方案。然而,为了收敛,大多数现实的RL问题需要在多个相似但略有不同的环境中进行训练——即环境变体族。环境族的典型开发过程是一项劳动密集型且容易出错的手动工作,难以扩展。为了缓解这些问题,本文提出了一种模型驱动的方法来开发RL训练环境族。为了获得环境族,我们开发了一种方法和原型工具。在我们的方法中,一种混合遗传算法——基于种群的全局搜索和启发式局部搜索的结合——生成环境族。变异和约束被表达为模型转换,并通过最先进的模型转换引擎操作化为搜索过程。我们在野火缓解场景和课程学习(一种依赖于环境族的特定学习范式)中展示了我们方法的有效性。

英文摘要

Virtual training environments are software-intensive systems in which reinforcement learning (RL) agents learn, adapt, and demonstrate meaningful behavior. Virtual training environments offer a safe and cost-efficient alternative to training agents in real-world settings. However, to converge, most realistic RL problems require training in multiple, mostly similar but slightly different environments - i.e., families of environment variants. The typical development process of environment families is a labor-intensive and error-prone manual endeavor that does not scale well. To alleviate these issues, in this paper, we propose a model-driven approach for developing families of RL training environments. To obtain the family of environments, we develop an approach and prototype tool. In our approach, a hybrid genetic algorithm - a combination of population-based global search and heuristic local search - generates environment families. Mutations and constraints are expressed as model transformations and are operationalized into a search process by a state-of-the-art model transformation engine. We demonstrate the soundness of our approach in a wildfire mitigation scenario and curriculum learning - a particular learning paradigm that relies on environment families.

2606.20151 2026-06-19 cs.NE cs.AI 新提交

Hybrid ANN-SNN Pipeline with Local Plasticity

混合ANN-SNN流水线与局部可塑性

Denis Larionov, Khairutin Shtanchaev, Mikhail Kiselev, Mikhail Korovin, Ivan Tugoy

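AI summary Proposes a hybrid ANN-SNN pipeline that uses the rich embeddings of a pretrained ANN to build a high-performance SNN, trained with rate coding and local learning rules, reaching 99.09% accuracy on a 64-class ImageNet benchmark.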

Comments 9 pages, 4 figures, source code available

详情
AI中文摘要

本文提出了一种混合ANN-SNN流水线,有效利用预训练人工神经网络(ANN)的丰富嵌入来实现高性能脉冲神经网络(SNN)。该架构将预训练的EfficientNet编码器与CoLaNET脉冲分类器耦合。我们通过速率编码将编码器的激活转换为脉冲序列,并使用局部、生物启发的学习规则训练后续的SNN分类器,绕过了端到端的梯度传播。该方法在64类ImageNet基准测试中达到了99.09%的准确率,展现了与传统深度网络相当的性能。该工作为将强大的预训练编码器适应于下游脉冲神经网络任务提供了一种生物上合理且高效的框架。

英文摘要

This work proposes a hybrid ANN-SNN pipeline that effectively leverages the rich embeddings of pretrained artificial neural networks (ANNs) to enable high-performance spiking neural networks (SNNs). The architecture couples a pretrained EfficientNet encoder with a CoLaNET spiking classifier. We convert the encoder's activations into spike trains via rate-coding and train the subsequent SNN classifier using local, biologically inspired learning rules, bypassing end-to-end gradient propagation. This approach achieves 99.09% accuracy on a 64-class ImageNet benchmark, demonstrating performance on par with conventional deep networks. The work presents a biologically plausible and efficient framework for adapting powerful pretrained encoders to downstream spiking neural network tasks.
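
Rate coding, the step that turns encoder activations into spike trains, can be sketched as follows. The abstract does not specify the exact scheme, so this uses a common assumption: each activation is normalized to a per-step firing probability and sampled independently at every time step (a Bernoulli approximation of a Poisson spike train). The function name `rate_encode` and its parameters are illustrative.

```python
import numpy as np

def rate_encode(activations, n_steps: int = 100, seed: int = 0) -> np.ndarray:
    """Convert non-negative activations into binary spike trains.

    Returns an array of shape (n_steps, n_neurons) where entry [t, i] is 1
    if neuron i spikes at step t. Firing probability is proportional to the
    activation, normalized by the largest activation (an assumed scaling).
    """
    rng = np.random.default_rng(seed)
    a = np.clip(np.asarray(activations, dtype=float).ravel(), 0.0, None)
    peak = a.max()
    rates = a / peak if peak > 0 else np.zeros_like(a)
    # One independent Bernoulli draw per neuron per time step.
    return (rng.random((n_steps, a.size)) < rates).astype(np.uint8)
```

Under this scaling the most active unit fires at every step and a unit with zero activation never fires, so the spike count over the window approximates the relative activation. A downstream spiking classifier trained with local rules would consume these trains; that part is not sketched here.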

2606.19975 2026-06-19 cs.CY cs.AI 新提交

The Algorithmic-Human Manager: AI, Apps, and Workers in the Indian Gig Economy

算法-人类管理者:印度零工经济中的AI、应用程序与工人

Omir Kumar, Krishnan Narayanan

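AI summary Studies how AI and digital technologies shape algorithmic management in India's blue-collar gig economy, finds that they expand access to work but raise concerns about fairness, transparency, and worker dignity, and proposes a hybrid Algorithmic-Human Manager governance model.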

Comments Published by the Centre for Responsible AI (CeRAI) at IIT Madras

详情
AI中文摘要

本文考察了人工智能和数字技术对印度蓝领零工经济的影响,重点关注算法管理——即在基于位置的服务(如拼车和配送)中使用自动化系统来分配、监控和评估工作。采用社会正义框架和混合方法(包括对16名零工工人和21名关键利益相关者的访谈),研究揭示了一个双重现实:虽然AI驱动的系统扩大了工作机会并产生了运营效率,但它们同时引入了与公平、透明度和工人尊严相关的重大挑战。关键发现表明,算法系统设计上不透明,产生不公平的结果,并且其结构不能为额外劳动提供相应报酬。研究倡导一种务实的混合治理模型——算法-人类管理者框架,其中技术效率和人类问责制共同运作而非对立。研究结果对政策制定者、平台公司以及致力于为印度和全球南方的零工经济设计公平AI治理框架的民间社会组织具有启示意义。

英文摘要

This paper examines the impact of artificial intelligence and digital technologies on the blue-collar gig economy in India, focusing on algorithmic management: the use of automated systems to allocate, monitor, and evaluate work in location-based services such as ride sharing and delivery. Using a social justice framework and a mixed-methods approach comprising interviews with 16 gig workers and 21 key stakeholders, the study uncovers a dual reality: while AI-powered systems expand access to work and generate operational efficiencies, they simultaneously introduce significant challenges related to fairness, transparency, and worker dignity. Key findings reveal that algorithmic systems are opaque by design, produce inequitable outcomes, and are not structured to reward additional labour with proportionate pay. The study advocates for a pragmatic hybrid governance model, an Algorithmic-Human Manager framework, in which technological efficiency and human accountability operate together rather than in opposition. The findings carry implications for policymakers, platform companies, and civil society organizations working to design equitable AI governance frameworks for the gig economy in India and across the Global South.

2606.19799 2026-06-19 cs.SE cs.LG 新提交

The Hidden Environmental Cost of Poor Coding Practices in TensorFlow and Keras Applications: A Study on Resource Leaks and Carbon Emissions

TensorFlow和Keras应用中不良编码实践的隐藏环境成本:资源泄漏与碳排放研究

Bashar Abdallah, Gustavo Santos, Rola Al Bataineh, Alain Abran, Mohammad Hamdaqa

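AI summary Studies the effect of two resource-leak smells in TensorFlow/Keras, IMR and UTR, on energy consumption and carbon emissions; experiments show they increase electricity consumption by about 32% and 46%, respectively, indicating that resource leaks measurably reduce ML energy efficiency and add to the environmental burden.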

详情
AI中文摘要

效率和可持续性是机器学习(ML)应用开发和部署中的关键考量。在影响可持续性的因素中,ML代码中的资源泄漏可能引入隐藏的低效率,从而增加能源消耗和CO2排放。尽管如此,量化其环境影响的实证证据仍然有限。这篇新兴结果论文对两种常见的资源泄漏气味,即不当模型重用(IMR)和未释放张量引用(UTR),及其对TensorFlow和Keras工作负载中能源消耗和CO2排放的影响进行了初步实证研究。通过执行相同的训练任务,并与无气味基线进行比较,对每种气味进行了受控实验。我们的初步结果表明,两种气味都持续增加了估计的用电量和碳排放。IMR和UTR分别使电力消耗增加约32%和46%,CO2排放也成比例增加。配对统计检验表明这些差异是系统性的且具有统计显著性,提供了初步的实证证据,表明资源泄漏气味可能降低ML的能效和环境可持续性。这些发现表明,资源泄漏气味对软件质量和可持续性构成可衡量的风险,强调了将资源生命周期管理和能效考虑纳入ML开发的重要性。

英文摘要

Efficiency and sustainability are critical considerations in the development and deployment of machine learning (ML) applications. Among the factors influencing sustainability, resource leaks in ML code can introduce hidden inefficiencies that elevate energy consumption and CO2 emissions. Despite this, empirical evidence quantifying their environmental impact remains limited. This emerging results paper presents an initial empirical investigation of two common resource-leak smells, namely Improper Model Reuse (IMR) and Unreleased Tensor References (UTR), and their impact on energy consumption and CO2 emissions in TensorFlow and Keras workloads. Controlled experiments were conducted for each smell by executing identical training tasks while comparing against a smell-free baseline. Our preliminary results show that both smells consistently increase estimated electricity usage and carbon emissions. IMR and UTR increased electricity consumption by approximately 32% and 46%, respectively, with proportional increases in CO2 emissions. Paired statistical tests indicate that these differences are systematic and statistically significant, providing initial empirical evidence that resource-leak smells may degrade ML energy efficiency and environmental sustainability. These findings suggest that resource-leak smells pose measurable risks to both software quality and sustainability, emphasizing the importance of integrating resource-lifecycle management and energy-efficiency considerations into ML development.

2606.19660 2026-06-19 cs.CR cs.CL 新提交

A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots

基于RAG的聊天机器人中针对提示注入的分层安全框架

Gulshan Saleem, Nisar Ahmed, Muhammad Imran Zaman, Ali Hassan

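AI summary Proposes a three-layer defense framework that combines input filtering, a context instruction hierarchy, and output auditing, reducing the prompt injection attack success rate from 71.4% to 11.3% with a 4.8% false positive rate and 61.2 ms of latency overhead.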

Comments Submitted to ICCK Transactions on Information Security and Cryptography

详情
AI中文摘要

提示注入被OWASP Top 10 for LLM Applications列为大语言模型(LLM)部署中最关键的漏洞,然而现有防御措施仅在孤立的流水线阶段运行且不完整。输入过滤器无法检查检索到的文档,而输出监控器无法阻止恶意载荷到达模型。因此,检索增强生成(RAG)聊天机器人仍然容易受到间接注入攻击,其中被污染的知识库文档会损害每个检索到它的用户。我们提出了一个三层框架,在推理流水线中拦截直接和间接的提示注入。第一层使用基于规则的模式库和微调后的语义异常分类器筛选用户输入。第二层在上下文组装期间强制执行基于来源的指令层级,防止检索到的内容覆盖操作员策略。第三层在交付前使用策略规则引擎和语义漂移检测器审计模型输出。一个持续审计循环聚合结构化日志,并支持重新训练以适应新兴攻击模式。该框架与模型无关,作为中间件部署,无需修改底层LLM。在GPT-4o、Llama 3和Mistral 7B上对5,080个样本的评估显示,该框架将攻击成功率(ASR)从71.4%降至11.3%,比最佳单层基线高出27.3个百分点,比已发布的护栏系统高出23.8个百分点,同时保持4.8%的误报率和61.2毫秒的中位延迟开销。消融研究证实,所有三层提供互补保护,且其组合效果超过单个贡献的总和。

英文摘要

Prompt injection is ranked as the most critical vulnerability in large language model (LLM) deployments by the OWASP Top 10 for LLM Applications, yet existing defenses operate at isolated pipeline stages and remain incomplete. Input filters cannot inspect retrieved documents, while output monitors cannot prevent malicious payloads from reaching the model. Consequently, retrieval-augmented generation (RAG) chatbots remain vulnerable to indirect injection, where a poisoned knowledge-base document compromises every user whose query retrieves it. We present a three-layer framework that intercepts both direct and indirect prompt injection throughout the inference pipeline. Layer 1 screens user input using a rule-based pattern library and a fine-tuned semantic anomaly classifier. Layer 2 enforces a provenance-based instruction hierarchy during context assembly, preventing retrieved content from overriding operator policy. Layer 3 audits model output using a policy rule engine and semantic drift detector before delivery. A continuous audit loop aggregates structured logs and supports retraining to adapt the classifier to emerging attack patterns. The framework is model-agnostic and deploys as middleware without modifying the underlying LLM. Evaluation on 5,080 samples across GPT-4o, Llama 3, and Mistral 7B shows that the framework reduces Attack Success Rate (ASR) from 71.4% to 11.3%, outperforming the best single-layer baseline by 27.3 percentage points and a published guardrail system by 23.8 percentage points, while maintaining a 4.8% false positive rate and a median latency overhead of 61.2 ms. Ablation studies confirm that all three layers provide complementary protection and that their combined effect exceeds the sum of individual contributions.

2606.19635 2026-06-19 cs.IR cs.AI cs.LG 新提交

Token Factory: Efficiently Integrating Diverse Signals into Large Recommendation Models

Token Factory:高效整合多样化信号于大型推荐模型

Xilun Chen, Shao-Chuan Wang, Baykal Cakici, Lukasz Heldt, Lichan Hong, Raghu Keshavan, Aniruddh Nath, Li Wei, Xinyang Xi

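AI summary Proposes the Token Factory framework, which converts traditional signals into soft tokens and integrates them efficiently into transformer-based large recommendation models, avoiding prompt length explosion and improving performance.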

Comments 8 pages, 10 figures

详情
AI中文摘要

大型推荐模型(LRM)在工业级推荐任务中展现了强大的能力。然而,如何有效且高效地将传统信号整合到这些基于Transformer的架构中仍然是一个主要挑战。传统的直接“文本化”这些信号或创建离散物品表示的方法往往导致过长的提示、巨大的内存占用和高计算开销。为了克服这些限制,我们提出了“Token Factory”,一个旨在将传统信号转化为可由LRM直接处理的“软令牌”的框架。这种方法能够高效集成和压缩异构输入特征,防止提示长度爆炸,同时提升模型性能。我们详细描述了Token Factory的架构,并展示了在工业级推荐环境中验证其有效性的实验结果。

英文摘要

Large Recommendation Models (LRMs) have demonstrated promising capabilities in industry-scale recommendation tasks. However, holistically integrating traditional signals into these transformer-based architectures effectively and efficiently remains a major challenge. Conventional approaches that "textualize" these signals directly or create discrete item representations often lead to excessively long prompts, substantial memory footprints, and high computational overhead. To overcome these limitations, we propose "Token Factory", a framework designed to transform traditional signals into "soft tokens" that can be directly processed by LRMs. This approach enables efficient integration and compression of heterogeneous input features, preventing prompt length explosion while enhancing model performance. We detail the architecture of Token Factory and present experimental results validating its effectiveness in a production-scale recommendation environment.

2606.19566 2026-06-19 eess.SY cs.AI cs.SY 新提交

GDGU: A Gradient Difference-based Graph Unlearning Method for Cyberattack Localization in Electric Vehicle Charging Networks

GDGU:基于梯度差异的图遗忘方法用于电动汽车充电网络中的网络攻击定位

Nanhong Liu, Mucun Sun, Jie Zhang

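AI summary To handle data deletion requests from electric vehicle charging stations, proposes a gradient difference-based graph unlearning method (GDGU) that unlearns efficiently through a first-order parameter correction, preserving localization performance while substantially reducing computational overhead.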

详情
AI中文摘要

电动汽车充电站(EVCS)可能使配电馈线暴露于网络攻击。尽管包括图神经网络在内的机器学习方法可以定位哪个母线被攻破,但在数据共享和模型训练方面仍存在重大挑战。例如,隐私法规允许EVCS所有者从已部署的模型中删除其训练数据,但每次请求都从头重新训练在计算上不可行。为了解决这个问题,我们研究了用于EVCS网络攻击定位的图遗忘(GU),将其形式化为图级多标签分类任务上的特征级遗忘问题。具体来说,我们提出了基于梯度差异的图遗忘(GDGU),通过一阶参数校正消除请求删除数据的影响。该校正基于原始训练数据与修改后数据集之间的梯度差异计算,其中仅遗忘请求的EVCS母线的充电功率特征。然后,应用批归一化重新校准和简短的恢复微调步骤以恢复定位效用。我们在IEEE 34母线、123母线和8500节点配电网络上,使用三种图神经网络骨干网络和累积遗忘场景,将GDGU与两种二阶GU基线进行比较。GDGU在定位效用上与最强基线相当,遗忘保真度接近完全重新训练,同时遗忘速度比从头重新训练快10到12倍,且内存使用远少于二阶GU基线。

英文摘要

Electric vehicle charging stations (EVCSs) can expose distribution feeders to cyberattacks. While machine learning methods, including graph neural networks, can localize which bus is compromised, significant challenges remain in data sharing and model training. For example, privacy regulations grant EVCS owners the right to delete their training data from a deployed model, yet retraining from scratch on every request is computationally prohibitive. To address this, we study graph unlearning (GU) for EVCS cyberattack localization, formulated as a feature-level unlearning problem on a graph-level multi-label classification task. Specifically, we propose gradient difference-based graph unlearning (GDGU), which removes the influence of the requested deletion data through a first-order parameter correction. The correction is computed from the gradient difference between the original training data and a modified dataset in which only the charging power features at the requested EVCS buses are unlearned. Then, a batch-normalization recalibration and a brief recovery fine-tuning step are applied to restore localization utility. We benchmark GDGU against two second-order GU baselines on the IEEE 34-bus, 123-bus, and 8500-node distribution networks across three graph neural network backbones and cumulative unlearning scenarios. GDGU matches the strongest baseline on localization utility and reaches forgetting fidelity close to full-retraining, while unlearning 10 to 12 times faster than retraining from scratch and using far less memory than the second-order GU baselines.
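
The first-order correction can be illustrated on a toy linear model. This is only a sketch of the general idea, not GDGU itself: it uses a least-squares model instead of a graph neural network, zeroes the forgotten feature columns as a stand-in for "unlearned" features, and assumes the update form theta' = theta - lr * (grad on modified data - grad on original data). The sign, the step size `lr`, and all function names are assumptions; batch-normalization recalibration and recovery fine-tuning are omitted.

```python
import numpy as np

def mse_grad(theta, X, y):
    """Gradient of 0.5 * mean squared error for a linear model."""
    return X.T @ (X @ theta - y) / len(y)

def mse_loss(theta, X, y):
    """0.5 * mean squared error for a linear model."""
    return 0.5 * float(np.mean((X @ theta - y) ** 2))

def gradient_difference_unlearn(theta, X, y, forget_cols, lr=0.1):
    """One first-order correction built from a gradient difference.

    X_mod is the modified dataset in which the requested feature columns
    are blanked out; the correction moves theta toward fitting X_mod and
    away from the contribution of the original data (assumed form).
    """
    X_mod = X.copy()
    X_mod[:, forget_cols] = 0.0
    correction = mse_grad(theta, X_mod, y) - mse_grad(theta, X, y)
    return theta - lr * correction, X_mod

def make_toy_problem(seed=0, n=200):
    """Synthetic regression data and a model trained on the original data."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=n)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X, y, theta
```

Because the trained model sits at a stationary point of the original loss, the gradient on the original data is close to zero and the correction reduces to a single gradient step on the modified data. That is what makes a first-order update cheap compared with retraining or with second-order, Hessian-based baselines.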

2606.19533 2026-06-19 cs.AR cs.AI 新提交

A Tool for the Synthesis of Adaptive Probabilistic Processors Based on the Ising Model

基于伊辛模型的自适应概率处理器合成工具

Jonathan Juracy Carneiro da Silva, Leonardo R. Gobatto, Jose Rodrigo Azambuja

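AI summary Proposes a tool that automatically synthesizes and simulates probabilistic architectures by mapping combinatorial optimization problems to the Ising model and adaptively selecting the update algorithm, improving convergence behavior and supporting hardware implementation.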

Comments ACM/IEEE/SBC/SBMICRO Symposium on Integrated Circuits and Systems Design 2026

详情
AI中文摘要

本文提出一种用于合成和仿真概率架构的工具,通过将组合优化问题映射到伊辛模型来求解。该方法根据问题特征(如规模和拓扑)自动构建伊辛哈密顿量并确定概率元件(p-bits)的数量。此外,该工具引入了一种自适应策略,用于在吉布斯采样、模拟退火(SA)、模拟量子退火(SQA)和基于簇的方法中选择最合适的更新算法。使用基准问题的实验结果表明,与固定方法相比,该方法具有更好的收敛行为和灵活性。所提出的框架能够系统评估概率计算策略,并支持基于MTJ和p-bits的未来硬件实现的开发。

英文摘要

This work presents a tool for the synthesis and simulation of probabilistic architectures for solving combinatorial optimization problems by mapping them to the Ising model. The proposed approach automatically constructs the Ising Hamiltonian and determines the number of probabilistic elements (p-bits) based on problem characteristics such as size and topology. Furthermore, the tool introduces an adaptive strategy for selecting the most suitable update algorithm among Gibbs Sampling, Simulated Annealing (SA), Simulated Quantum Annealing (SQA), and cluster-based methods. Experimental results using benchmark problems demonstrate improved convergence behavior and flexibility compared to fixed approaches. The proposed framework enables systematic evaluation of probabilistic computing strategies and supports the development of future hardware implementations based on MTJs and p-bits.
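
As a concrete instance of mapping a combinatorial problem to the Ising model, the sketch below encodes Max-Cut as couplings J = -W and samples it with p-bit style Gibbs updates under a rising inverse temperature. It is an illustrative simulation only, not the tool described in the abstract: the function names, the linear annealing schedule, and the sweep count are assumptions, and no adaptive algorithm selection is attempted.

```python
import numpy as np

def maxcut_to_ising(weights) -> np.ndarray:
    """Couplings J = -W (zero diagonal): minimizing energy maximizes the cut."""
    J = -np.asarray(weights, dtype=float)
    np.fill_diagonal(J, 0.0)
    return J

def ising_energy(J: np.ndarray, s: np.ndarray) -> float:
    """E(s) = -sum_{i<j} J_ij s_i s_j for symmetric J and spins in {-1, +1}."""
    return -0.5 * float(s @ J @ s)

def cut_value(weights, s: np.ndarray) -> float:
    """Total weight of edges whose endpoints have opposite spins."""
    W = np.asarray(weights, dtype=float)
    return 0.25 * float(np.sum(W * (1.0 - np.outer(s, s))))

def anneal_pbits(J: np.ndarray, sweeps: int = 200, beta_max: float = 3.0, seed: int = 0):
    """Sequential p-bit updates: s_i = +1 with probability (1 + tanh(beta * I_i)) / 2,
    where I_i = sum_j J_ij s_j is the local field. Returns the best state seen."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    s = rng.choice([-1.0, 1.0], size=n)
    best_s, best_e = s.copy(), ising_energy(J, s)
    for beta in np.linspace(0.1, beta_max, sweeps):
        for i in range(n):
            local_field = J[i] @ s
            s[i] = 1.0 if np.tanh(beta * local_field) > rng.uniform(-1.0, 1.0) else -1.0
        e = ising_energy(J, s)
        if e < best_e:
            best_s, best_e = s.copy(), e
    return best_s, best_e
```

For a graph with total edge weight W_total, the two quantities are related by cut = (W_total - E) / 2, so the lowest-energy spin configuration is a maximum cut. The number of p-bits equals the number of graph nodes in this encoding.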

2606.20022 2026-06-19 stat.ML cs.LG math.OC 新提交

Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

具有有界噪声的随机线性上下文赌博机:一种集合成员方法

Haonan Xu, Yingying Li

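AI summary For stochastic linear contextual bandits with bounded reward noise, proposes the SME-OFU algorithm, based on set-membership estimation and the optimism principle, which achieves an O(log T) regret bound, improving on the optimal bound under sub-Gaussian noise.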

Comments 23 pages, 1 figure

详情
AI中文摘要

本文考虑具有有界奖励噪声的随机线性上下文赌博机(SLCB)。现有工作通常假设次高斯奖励噪声和有界期望奖励,在此条件下最优遗憾界关于时间T为$\tilde{O}(\sqrt{T})$。然而,在许多应用中,实现/观测到的奖励也自然有界,这意味着奖励噪声有界。有界噪声比次高斯条件更具信息性,但在SLCB文献中尚未被明确利用。本文通过利用一种称为集合成员估计(SME)的不确定性量化方法,并应用面对不确定性的乐观原则(OFU),提出了一种新颖的算法SME-OFU。我们的算法享有改进的遗憾界$O(\log T)$。注意,这并不与次高斯噪声下现有的最优界$\tilde{O}(\sqrt{T})$矛盾,因为有界噪声是更强的条件。最后,仿真表明,当奖励噪声有界时,SME-OFU相对于为次高斯噪声设计的基准算法在经验上有所改进。

英文摘要

This paper considers stochastic linear contextual bandits (SLCB) with bounded reward noise. Existing works typically assume sub-Gaussian reward noise and bounded expected rewards, under which the optimal regret bound scales as $\tilde{O}(\sqrt{T})$ in terms of horizon $T$. However, in many applications, realized/observed rewards are also naturally bounded, implying bounded reward noise. Bounded noise is more informative than the sub-Gaussian condition but has not been leveraged explicitly in the SLCB literature. In this paper, we propose a novel algorithm SME-OFU by utilizing an uncertainty quantification method called set-membership estimation (SME) and applying the principle of optimism in the face of uncertainty (OFU). Our algorithm enjoys an improved regret bound $O(\log T)$. Notice that this does not contradict the existing optimal bound $\tilde{O}(\sqrt{T})$ for sub-Gaussian noise because bounded noise is a stronger condition. Finally, simulations show empirical improvements of SME-OFU over a benchmark algorithm designed for sub-Gaussian noise when the reward noise is bounded.
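
Set-membership estimation is easiest to see in one dimension. If the reward is y = theta * x + w with |w| <= w_max, each observation confines theta to an interval, and the feasible set is the intersection of those intervals. The sketch below is a scalar illustration under that assumption, not the SME-OFU algorithm of the paper; the function names are hypothetical.

```python
def sme_interval(observations, w_max, lo=float("-inf"), hi=float("inf")):
    """Intersect the constraints |y - theta * x| <= w_max over all (x, y) pairs.

    Returns the interval (lo, hi) of parameters consistent with every
    observation, assuming the noise bound w_max is valid.
    """
    for x, y in observations:
        if x == 0:
            continue  # carries no information about theta
        a, b = (y - w_max) / x, (y + w_max) / x
        if a > b:  # dividing by a negative x flips the interval
            a, b = b, a
        lo, hi = max(lo, a), min(hi, b)
    return lo, hi

def optimistic_value(x, lo, hi):
    """Largest reward theta * x achievable by any theta in [lo, hi] (the OFU step)."""
    return max(lo * x, hi * x)
```

For example, with a true parameter of 2 and `w_max = 0.5`, the observations (1, 2.4), (2, 3.6), and (-1, -1.7) yield the intervals [1.9, 2.9], [1.55, 2.05], and [1.2, 2.2], whose intersection is [1.9, 2.05]. Unlike a confidence ellipsoid built for sub-Gaussian noise, this set never excludes the true parameter as long as the noise bound holds, and it can shrink quickly when observations land near the noise boundary.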

2606.20356 2026-06-19 math.OC cs.AI cs.LG math.PR stat.ML 新提交

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

公共噪声Wasserstein不确定性下的平均场控制鲁棒$Q$-学习

Mathieu Laurière, Ariel Neufeld, Kyunghyun Park

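AI summary Proposes a robust $Q$-learning algorithm for discrete-time mean-field control under Wasserstein uncertainty in the common noise distribution, combining quantization and projection with Wasserstein duality; proves convergence and finite-time bounds for synchronous and asynchronous learning, and illustrates the robustness-performance tradeoff on systemic risk and epidemic models.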

详情
AI中文摘要

在本文中,我们提出了一种针对公共噪声定律下Wasserstein不确定性的离散时间平均场控制问题的鲁棒$Q$-学习算法。该算法将量化投影方案与公共噪声空间上的Wasserstein对偶重述相结合。我们建立了其收敛性以及同步和异步学习方案的有限时间迭代界。关于系统风险和流行病模型的数值实验将异步实现与理想化的Bellman迭代进行了比较,说明了在公共噪声误设下的鲁棒性-性能权衡,并报告了异步$Q$-学习算法的观察收敛行为。

英文摘要

In this article, we present a robust $Q$-learning algorithm for discrete-time mean-field control problems under Wasserstein uncertainty in the common noise law. The algorithm combines a quantization-and-projection scheme with a Wasserstein dual reformulation on the common-noise space. We establish its convergence together with finite-time iteration bounds for both synchronous and asynchronous learning schemes. Numerical experiments on systemic risk and epidemic models compare the asynchronous implementation with an idealized Bellman iteration, illustrate the robustness-performance tradeoff under common-noise misspecification, and report the observed convergence behavior of the asynchronous $Q$-learning algorithm.

2606.20082 2026-06-19 math.OC cs.DS cs.LG 新提交

Beyond Averaging in John Ellipsoid Approximation: High-Accuracy Algorithms in the Leverage-Score Model

超越John椭球逼近中的平均化:杠杆分数模型中的高精度算法

Xiaoyu Li, Junwei Yu, Jiaojiao Jiang, Junbin Gao, Andi Han

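AI summary Separates three costs in John ellipsoid approximation algorithms (certification, identification, and accuracy), shows that the accuracy dependence is only doubly logarithmic, and proposes an accelerated method and a damped Newton method that achieve high-accuracy approximation in the leverage-score model.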

详情
AI中文摘要

对称多面体 $P=\{\mathbf{x}\in\mathbb{R}^d:\|\mathbf{A}\mathbf{x}\|_\infty\le1\}$, $\mathbf{A}\in\mathbb{R}^{n\times d}$ 的 John 椭球由一系列杠杆分数算法计算,从 Cohen, Cousins, Lee 和 Yang (COLT 2019) 到其后续工作 [WY24, CLS+25],均在 $\Theta(\varepsilon^{-1}\log(n/d))$ 次迭代内达到 $(1+\varepsilon)$-逼近。我们将这一复杂度分离为现代算法混淆的三种成本(认证、识别和精度),并发现历史上的 $\varepsilon^{-1}$ 仅存在于第一种成本中。在等价的 D-最优设计形式 $\min_{\mathbf{p}\in\Delta_n}-\log\det(\sum_i p_i\mathbf{a}_i\mathbf{a}_i^\top)$ 中,杠杆分数预言机恰好是一阶预言机,而 $(1+\varepsilon)$-John 保证对应于 Frank-Wolfe 间隙 $g(\mathbf{p})\le\varepsilon d$;通过这一对应关系,成本得以分离。$\varepsilon^{-1}$ 是认证的产物:迭代点的均匀平均(该系列算法中使用的认证)的间隙恰好为 $\Theta(1/T)$,无论每次迭代多么廉价。相反,针对最后迭代点,同一预言机是快速的:热启动加速方法在 $\varepsilon$-无关的初始化 $C(\mathbf{A})$ 后,仅需 $C(\mathbf{A})+O(\sqrt{\kappa}\log(1/\varepsilon))$ 次查询即可达到保证;一旦最优面被识别,面问题成为无约束自和谐最小化,其 Hessian 可由预言机精确恢复,因此阻尼牛顿法仅需 $O(\log\log(1/\varepsilon))$ 步,总查询数为 $C(\mathbf{A})+O(d^2\log\log(1/\varepsilon))$。因此,在 $\varepsilon$-无关、条件依赖的初始化后,精度依赖是双对数的;开放问题在于剩余的识别成本(达到最优面的无条件界)和下界。精度并非障碍。

英文摘要

The John ellipsoid of a symmetric polytope $P=\{\mathbf{x}\in\mathbb{R}^d:\|\mathbf{A}\mathbf{x}\|_\infty\le1\}$, $\mathbf{A}\in\mathbb{R}^{n\times d}$, is computed by a long line of leverage-score algorithms, from Cohen, Cousins, Lee and Yang (COLT 2019) to its successors [WY24, CLS+25], all reaching a $(1+\varepsilon)$-approximation in $\Theta(\varepsilon^{-1}\log(n/d))$ iterations. We separate this complexity into three costs the modern line conflates (certification, identification, and accuracy) and locate the historical $\varepsilon^{-1}$ in the first alone. In the equivalent D-optimal-design form $\min_{\mathbf{p}\in\Delta_n}-\log\det(\sum_i p_i\mathbf{a}_i\mathbf{a}_i^\top)$, the leverage-score oracle is exactly the first-order oracle and the $(1+\varepsilon)$-John guarantee the Frank-Wolfe gap $g(\mathbf{p})\le\varepsilon d$; through this dictionary the costs come apart. The $\varepsilon^{-1}$ is a certification artifact: the uniform average of the iterates, the certificate used throughout the line, has gap exactly $\Theta(1/T)$, however cheap each iteration is made. Pointed instead at the last iterate the same oracle is fast: a warm-started accelerated method reaches the guarantee in $C(\mathbf{A})+O(\sqrt{\kappa}\log(1/\varepsilon))$ queries after an $\varepsilon$-independent setup $C(\mathbf{A})$, and once the optimal face is identified the facial problem is an unconstrained self-concordant minimization whose Hessian the oracle recovers exactly, so damped Newton needs only $O(\log\log(1/\varepsilon))$ steps, for a total of $C(\mathbf{A})+O(d^2\log\log(1/\varepsilon))$ queries. The accuracy dependence is thus doubly logarithmic after an $\varepsilon$-independent, condition-dependent setup; the open problem is the remaining identification cost (a condition-free bound on reaching the optimal face) and lower bounds. Accuracy is not the obstruction.
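
For orientation, the averaged fixed-point iteration that this line of work starts from can be sketched in a few lines of NumPy. This is the classical baseline with the uniform-average certificate, not the accelerated or Newton methods proposed in the paper; the function names are illustrative, and a dense matrix inverse stands in for the leverage-score oracle.

```python
import numpy as np

def quad_scores(A: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Oracle output: a_i^T (A^T diag(w) A)^{-1} a_i for every row a_i of A."""
    M = A.T @ (w[:, None] * A)
    return np.einsum("ij,ij->i", A @ np.linalg.inv(M), A)

def john_weights(A: np.ndarray, iters: int = 50) -> np.ndarray:
    """Fixed-point iteration w_i <- w_i * a_i^T (A^T W A)^{-1} a_i, started at
    w_i = d/n, returning the uniform average of the iterates. The ellipsoid
    {x : x^T A^T diag(w) A x <= 1} approximates the John ellipsoid of
    {x : ||Ax||_inf <= 1}."""
    n, d = A.shape
    w = np.full(n, d / n)
    total = w.copy()
    for _ in range(iters - 1):
        w = w * quad_scores(A, w)
        total += w
    return total / iters
```

Every iterate sums to d, because the sum of w_i times its score is the trace of a d x d identity matrix. A weight vector w certifies a $(1+\varepsilon)$-approximation when `quad_scores(A, w).max() <= 1 + eps`, which corresponds to the Frank-Wolfe gap condition $g(\mathbf{p})\le\varepsilon d$ with $\mathbf{p}=\mathbf{w}/d$.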

2606.20062 2026-06-19 math.OC cs.LG math.PR 新提交

Optimal Coarse Correlated Equilibria in Mean Field Games: Linear Programming and No-Regret Learning

平均场博弈中的最优粗相关均衡:线性规划与无遗憾学习

Luciano Campi, Federico Cannerozzi, Ioannis Tzouanas

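AI summary For continuous-time mean field games, gives a linear programming characterization of optimal coarse correlated equilibria and designs a no-regret learning algorithm based on Lagrangian duality, with convergence rates.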

Comments 55 pages, 3 figures

详情
AI中文摘要

我们引入了连续时间平均场博弈的最优粗相关均衡。粗相关均衡是一种随机推荐方案,任何玩家都无法通过忽略推荐并转向替代策略而获益。问题如下:一个协调者在所有平均场粗相关均衡中选择一个,以优化一个规定的性能准则,该准则可能不同于代表性玩家的目标。在问题公式化之后,我们开发了一个线性规划(LP)公式,证明了最优LP粗相关均衡的存在性,并将LP刻画与原始概率设定联系起来。基于这一刻画,我们设计了一个无遗憾原始-对偶算法,基于外部遗憾约束的等价拉格朗日公式,用于学习此类均衡。我们提供了学习算法的显式收敛速率,数值例子说明了该方法。

英文摘要

We introduce optimal coarse correlated equilibria for continuous-time mean field games. A coarse correlated equilibrium is a randomized recommendation scheme from which no player can gain by ignoring the recommendation and switching to an alternative strategy. The problem is as follows: a moderator selects, among all mean-field coarse correlated equilibria, one that optimizes a prescribed performance criterion, which may differ from the representative player's objective. After formulating the problem, we develop a linear programming (LP) formulation, prove the existence of optimal LP coarse correlated equilibria, and relate the LP characterization to the original probabilistic setting. Building on this characterization, we design a no-regret primal-dual algorithm, based on an equivalent Lagrangian formulation of the external-regret constraint, for learning such equilibria. We provide explicit convergence rates for the learning algorithm, and numerical examples illustrate the method.

2606.19859 2026-06-19 cs.IT cs.LG math.IT math.PR math.ST stat.TH 新提交

Doeblin Curves

Doeblin 曲线

Dongmin Lee, William Lu, Anuran Makur, Japneet Singh

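AI summary Introduces Doeblin curves, which quantify the contraction behavior of Markov kernels at different levels of divergence and power, and applies them to finer-grained contraction analysis in noisy iterative optimization, reliable computation with noisy circuits, and differential privacy.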

Comments 42 pages, 2 figures

Journal ref IEEE Transactions on Information Theory, vol. 72, no. 6, pp. 3556-3596, June 2026

详情
AI中文摘要

近期关于 Doeblin 系数的研究揭示了它们作为 TV 距离的 Dobrushin 收缩系数的多路泛化的有用性,这与它们在马尔可夫链遍历性理论中的经典作用不同。然而,为了建立信息收缩的存在性,通常需要强条件,例如远离 0。基于最近提出的非线性信息收缩概念,我们旨在提出一种更细粒度的基于 Doeblin 的多路收缩行为刻画,即使对于 Doeblin 系数为 0 的信道,也能产生非平凡的收缩保证。为此,我们引入了 Doeblin 曲线的概念——一种非线性函数,它量化了马尔可夫核在特定散度和功率水平下对输入分布集合的收缩行为。在我们的分析过程中,我们发展了 Doeblin 系数的新变分刻画,提出了 Doeblin 曲线的若干性质,定义了功率约束 Doeblin 曲线的几个版本,并利用上述变分刻画推导了上下界。然后,我们将这些结果应用于不同领域,包括噪声迭代优化的泛化界、噪声电路可靠计算的误差界以及在线迭代算法的差分隐私保证。特别是,我们将这些领域的结果扩展到更广泛的领域或群体设置,利用 Doeblin 曲线揭示比 Doeblin 系数更细粒度的收缩现象。

英文摘要

Recent research on Doeblin coefficients has shed light on their usefulness as a multi-way generalization of the Dobrushin contraction coefficient for TV distance, in a separate vein from their classic role in the theory of Markov chain ergodicity. However, strong conditions, such as being bounded away from 0, are typically necessary for Doeblin coefficients to establish the existence of information contraction. Building on recently formulated concepts of nonlinear information contraction, we aim to propose a finer-grained Doeblin-based characterization of multi-way contraction behavior which yields non-vacuous contraction guarantees even for channels whose Doeblin coefficient is 0. To this end, we introduce the notion of a Doeblin curve -- a nonlinear function which quantifies the contraction behavior of a Markov kernel on collections of input distributions at specific levels of divergence and power. Through the course of our analysis, we develop a new variational characterization of Doeblin coefficients, present several properties of Doeblin curves, define several versions of power-constrained Doeblin curves, and derive upper and lower bounds using our aforementioned variational characterization. We then utilize these results in diverse areas, including generalization bounds for noisy iterative optimization, error bounds for reliable computation with noisy circuits, and differential privacy guarantees for online iterative algorithms. In particular, we extend results in these areas to broader domains or group settings, leveraging Doeblin curves to reveal finer-grained contraction phenomena than Doeblin coefficients.
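
For a finite channel given as a row-stochastic matrix, both coefficients mentioned above are easy to compute, which makes the motivating gap concrete. The sketch assumes the standard finite-alphabet definitions (Doeblin coefficient as the sum of column-wise minima, Dobrushin coefficient as the largest total variation distance between two rows); it does not implement Doeblin curves, and the function names are illustrative.

```python
import numpy as np

def doeblin_coefficient(W: np.ndarray) -> float:
    """Sum over outputs of the minimum transition probability across inputs."""
    return float(np.min(W, axis=0).sum())

def dobrushin_coefficient(W: np.ndarray) -> float:
    """Maximum total variation distance between any two rows of W."""
    n = W.shape[0]
    return max(
        0.5 * float(np.abs(W[i] - W[j]).sum())
        for i in range(n)
        for j in range(n)
    )

# Binary symmetric channel with crossover probability 0.1.
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])

# Each row puts zero mass on a different output, so every column minimum is 0.
cyclic = np.array([[0.5, 0.5, 0.0],
                   [0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5]])
```

The Dobrushin coefficient is at most one minus the Doeblin coefficient. For `bsc` the bound is tight (0.8 versus 1 - 0.2). For `cyclic` the Doeblin coefficient is 0, so the bound is vacuous, yet the channel still contracts total variation by a factor of 0.5. Channels of this kind are the case the abstract says Doeblin curves are meant to handle.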