Large Language Models / LLM - arXivDaily Topic

2603.10184 2026-06-19 stat.ML cs.LG Version update 60%

Stabilizing Bandits using Regularization: Precise Regret and A Quantitative Central Limit Theorem

Budhaditya Halder, Ishan Sengupta, Koustav Chowdhury, Samya Praharaj, Koulik Khamaru

Affiliations: Department of Statistics, Rutgers University; Indian Statistical Institute, Kolkata

Topic match (Other LLM): studies the stability of bandit algorithms; weakly related to LLMs.

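AI summary: The paper proposes a refined stability condition, proves that regularized stochastic mirror descent algorithms satisfy it, and derives a non-asymptotic Berry–Esseen bound for empirical reward estimates under adaptive sampling, matching upper and lower regret bounds, and asymptotic normality under adversarial corruption. It also shows that regularization is a necessary price for valid inference.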

Comments: Updated rate of convergence and precise regret in version 2

Abstract

Statistical inference with bandit data presents fundamental challenges owing to adaptive sampling, which violates the independence assumptions underlying classical asymptotic theory. Recent work has identified stability (Lai and Wei, 1982) as a sufficient condition for valid inference under adaptivity. This paper first provides a refined stability condition, stated in terms of the iterates of an online algorithm, and shows that a large class of regularized stochastic-mirror-descent-style algorithms satisfy it. This refined condition allows us to strengthen the asymptotic results of Lai and Wei (1982) in several ways. First, we derive a non-asymptotic Berry–Esseen bound for the empirical reward estimates under adaptive sampling. Second, we derive matching non-asymptotic upper and lower bounds on the regret of the proposed algorithm, yielding a precise characterization of its regret. Third, we show that these regularized algorithms preserve asymptotic normality and valid inference under a prescribed level of adversarial corruption. Finally, we show that regularization is necessary rather than incidental: Lai–Wei stability is incompatible with the optimal $O(\sqrt{T})$ regret rate (the rate attained by unregularized algorithms such as EXP3), so that a controlled, polylogarithmic inflation in regret is the price of valid inference.
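For orientation, here is a minimal sketch of the unregularized baseline named in the abstract, EXP3, together with the per-arm sample means whose sampling distribution the paper studies. This is not the authors' regularized algorithm. Bernoulli rewards, the arm means, the step size `eta`, and the function name `exp3` are assumptions made for illustration. Under adaptive sampling the pull counts `pulls` are random and depend on past rewards, which is why a classical CLT for `mean_hat` cannot be taken for granted.

```python
import numpy as np


def exp3(means, horizon, eta, seed=0):
    """EXP3 on Bernoulli arms (illustrative sketch, not the paper's method).

    Returns the pull count of each arm and the sample-mean reward of each arm.
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    s_hat = np.zeros(k)             # cumulative importance-weighted reward estimates
    pulls = np.zeros(k, dtype=int)  # N_a(T): how often each arm was sampled
    reward_sums = np.zeros(k)
    for _ in range(horizon):
        # Exponential weights, i.e. mirror descent with the negative-entropy regularizer.
        logits = eta * s_hat
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(k, p=p)
        x = float(rng.random() < means[a])  # Bernoulli reward in {0, 1}
        pulls[a] += 1
        reward_sums[a] += x
        # Loss-based importance-weighted estimator: unbiased for every arm's reward.
        s_hat += 1.0
        s_hat[a] -= (1.0 - x) / p[a]
    # Empirical reward estimates: plain sample means of the observed rewards (NaN if unpulled).
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_hat = reward_sums / pulls
    return pulls, mean_hat


# Usage: step size of the order used in standard EXP3 analyses, sqrt(log k / (k T)).
horizon = 3000
pulls, mean_hat = exp3([0.3, 0.5, 0.7], horizon=horizon,
                       eta=np.sqrt(2 * np.log(3) / (3 * horizon)), seed=1)
print(pulls, mean_hat)
```

Repeating the run over many seeds and looking at the spread of `pulls` gives an informal view of stability in the Lai–Wei sense, roughly whether each arm's pull count concentrates around a deterministic sequence.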

2511.22283 2026-06-19 cs.LG Version update 60%

The Hidden Cost of Approximation in Online Mirror Descent

Ofir Schlisselberg, Uri Sherman, Tomer Koren, Yishay Mansour

Affiliations: Tel Aviv University; Google Research

Topic match (Other LLM): studies the robustness of online mirror descent to approximation errors; related to optimization.

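AI summary: Studies the robustness of online mirror descent (OMD) to approximation errors and finds that regularizer smoothness is closely tied to error tolerance: uniformly smooth regularizers admit tight bounds, negative entropy on the simplex requires exponentially small errors, and log-barrier and Tsallis regularizers tolerate errors that are only polynomially small.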

Abstract

Online mirror descent (OMD) is a fundamental algorithmic paradigm that underlies many algorithms in optimization, machine learning and sequential decision-making. The OMD iterates are defined as solutions to optimization subproblems which, oftentimes, can be solved only approximately, leading to an inexact version of the algorithm. Nonetheless, existing OMD analyses typically assume an idealized error-free setting, thereby limiting our understanding of performance guarantees that should be expected in practice. In this work we initiate a systematic study into inexact OMD, and uncover an intricate relation between regularizer smoothness and robustness to approximation errors. When the regularizer is uniformly smooth, we establish a tight bound on the excess regret due to errors. Then, for barrier regularizers over the simplex and its subsets, we identify a sharp separation: negative entropy requires exponentially small errors to avoid linear regret, whereas log-barrier and Tsallis regularizers remain robust even when the errors are only polynomial. Finally, we show that when the losses are stochastic and the domain is the simplex, negative entropy regains robustness, but this property does not extend to all subsets, where exponentially small errors are again necessary to avoid suboptimal regret.
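To make the negative-entropy side of the separation concrete, here is a toy sketch built on one assumed error model, not the paper's construction: after each exact multiplicative-weights step, weights below `eps` are zeroed and the rest renormalized, a hypothetical stand-in for an inexact subproblem solver. Entropic OMD drives a losing coordinate's weight down like exp(-eta * loss gap), so an error of size `eps` can erase that coordinate for good, and the algorithm can no longer switch to it when it becomes the best. The loss sequence and the names `entropic_omd` and `eps` are illustrative.

```python
import numpy as np


def entropic_omd(losses, eta, eps=0.0):
    """OMD with the negative-entropy regularizer on the simplex (multiplicative weights).

    After each exact step, coordinates below `eps` are set to zero and the vector is
    renormalized; this truncation is an assumed model of approximation error.
    Returns the regret against the best fixed coordinate.
    """
    T, d = losses.shape
    x = np.full(d, 1.0 / d)
    alg_loss = 0.0
    for t in range(T):
        alg_loss += x @ losses[t]
        # Exact OMD step for negative entropy: x_i <- x_i * exp(-eta * g_i), renormalized.
        y = x * np.exp(-eta * losses[t])
        y /= y.sum()
        # Injected approximation error (assumption): truncate tiny weights.
        y[y < eps] = 0.0
        x = y / y.sum()
    return alg_loss - losses.sum(axis=0).min()


T = 300
losses = np.zeros((T, 2))
losses[:100, 1] = 1.0   # first third: coordinate 0 is better
losses[100:, 0] = 1.0   # remaining rounds: coordinate 1 is better
eta = np.sqrt(8 * np.log(2) / T)
print(entropic_omd(losses, eta))            # exact updates
print(entropic_omd(losses, eta, eps=1e-3))  # truncated updates
```

With exact updates the regret stays within the usual ln(d)/eta + eta*T/8 guarantee for losses in [0, 1]. With truncation the weight of coordinate 1 hits zero during the first phase and never recovers, so the regret grows linearly in T.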

2509.23806 2026-06-19 cs.SE cs.LG Version update 60%

Influence-Guided Concolic Testing of Transformer Robustness

Chih-Duo Hong, Chih-Cheng Yang, Yu Wang, Fang Yu

Affiliations: Department of Management Information Systems

Topic match (Other LLM): tests Transformer robustness, but focuses mainly on software testing.

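AI summary: Proposes a concolic testing method that ranks path predicates by SHAP-estimated influence, implements multi-head attention semantics in pure Python, and makes the softmax boundary explicit. On CIFAR-10, across 500 model-input pairs (three compact Transformer classifiers, ResNet18 and VGG16), it achieves a 60% attack success rate, 45 percentage points above a differential evolution baseline (15%). In the two-layer Transformer study, predicate prioritization reduces median attack time by 51%.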

Comments: Accepted at the 26th International Conference on Software Quality, Reliability, and Security

Abstract

Concolic testing for neural networks alternates concrete execution with constraint solving to search for inputs that flip model decisions. We present a concolic tester for Transformer classifiers that uses SHAP estimates to rank pending path predicates by their impact on the current prediction. To support self-attention with multiple heads in execution backed by SMT solving, we implement attention semantics in pure Python that are compatible with the solver and make the softmax boundary explicit by concretizing exponentiation arguments. We evaluate our method on CIFAR-10 across three compact Transformer classifiers, ResNet18, and VGG16 under a one-pixel budget and a 900s horizon. Across the 500 model-input pairs in this matched comparison, our method achieves 60% success, compared with 15% for a differential evolution baseline that treats the model as a black box. In the primary two-layer Transformer branch-ordering study, SHAP-based predicate prioritization raises success from 56% to 60% and reduces median attack time by 51%. These results show that influence-guided path exploration can make concolic testing a practical way to find adversarial examples in Transformer models.
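The predicate-prioritization idea can be pictured with a small sketch. It assumes each pending path predicate is tagged with the single input feature (for example, one pixel) that it constrains, and it computes exact Shapley values by brute-force enumeration of feature orderings rather than with SHAP's estimators, so it only scales to a handful of features. `exact_shapley`, `rank_predicates`, `toy_model` and the predicate tuples are hypothetical names, not the authors' tool.

```python
from itertools import permutations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x relative to `baseline`.

    Enumerates every feature ordering, so it is feasible only for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        z = list(baseline)
        prev = f(z)
        for i in order:
            z[i] = x[i]            # add feature i to the coalition
            cur = f(z)
            phi[i] += cur - prev   # marginal contribution of feature i in this ordering
            prev = cur
    return [p / factorial(n) for p in phi]


def rank_predicates(pending, phi):
    """Order (feature_index, predicate) pairs by |Shapley value| of the feature, largest first."""
    return sorted(pending, key=lambda fp: abs(phi[fp[0]]), reverse=True)


def toy_model(z):
    # Stand-in for a classifier score; a real tester would query the Transformer.
    return 2 * z[0] - 3 * z[1] + 0.5 * z[2]


phi = exact_shapley(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
pending = [(0, "p0"), (1, "p1"), (2, "p2")]
print(rank_predicates(pending, phi))  # [(1, 'p1'), (0, 'p0'), (2, 'p2')]
```

For a linear model the Shapley value of feature i reduces to w_i * (x_i - baseline_i), which gives an easy sanity check. The solver would then try to negate the highest-ranked predicate first.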

2507.05169 2026-06-19 cs.LG cs.AI cs.CL cs.CV cs.RO Version update 60%

Critique of World Model

Critique of World Model: A Generative Latent Prediction Architecture for World Modeling

Eric Xing, Mingkai Deng, Jinyu Hou

Topic match (Other LLM): a survey of world-model architectures involving generative prediction; related to LLMs.

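AI summary: Starting from the psychological notion of "hypothetical thinking", the paper argues that the core goal of a world model is to simulate all actionable possibilities of the real world. It then designs a Generative Latent Prediction (GLP) architecture based on stateful, hierarchical, multi-level, mixed continuous/discrete representations.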

Abstract

World Model, the algorithmic simulator of the real-world environment which biological agents experience and act upon, has been an emerging topic in recent years due to the rising need to develop virtual agents with artificial (general) intelligence. There has been much discussion on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed Sci-Fi classic Dune, and drawing inspiration from the concept of "hypothetical thinking" in psychology literature, we argue the primary goal of a world model to be simulating all actionable possibilities of the real world for purposeful reasoning and acting. We examine the key design dimensions of world modeling: data, representation, architecture, learning objective, and usage, surveying existing approaches and analyzing their tradeoffs. Building on this examination, we propose a new Generative Latent Prediction (GLP) architecture for a general-purpose world model, based on stateful, hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervised learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.

2502.03227 2026-06-19 cs.LG cs.CV Version update 60%

Adversarial Dependence Minimization

Pierre-François De Plaen, Tinne Tuytelaars, Marc Proesmans, Luc Van Gool

Affiliations: CVL, ETH Zürich, Switzerland; INSAIT, Sofia University, Bulgaria

Topic match (Other LLM): the algorithm can be applied in self-supervised learning to prevent dimensional collapse.

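AI summary: Proposes the ADM algorithm, which minimizes the statistical dependence between feature dimensions through an adversarial game, proves that the dimensions are mutually independent at the global optimum, and applies it to nonlinear decorrelation, to improving generalization in image classification, and to preventing dimensional collapse in self-supervised learning.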
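Here is a rough sketch of a dependence score that such a game could use. For illustration, it assumes an adversary that tries to predict each feature dimension from the others while the encoder is trained to defeat it; this may not be the paper's exact formulation. The sketch uses linear least-squares predictors, so the score only detects linear dependence and maximizing it amounts to decorrelation. The nonlinear decorrelation in the summary would need nonlinear predictors. `linear_predictability` is a hypothetical helper, and the encoder side of the game is not implemented.

```python
import numpy as np


def linear_predictability(z):
    """For each column j of z (n_samples x d), regress z_j on the other columns
    (least squares with intercept) and return 1 - R^2, the unexplained variance fraction.

    Values near 1: dimension j is linearly unpredictable from the rest.
    Values near 0: it is almost a linear function of the others.
    """
    n, d = z.shape
    out = np.empty(d)
    for j in range(d):
        others = np.delete(z, j, axis=1)
        a = np.column_stack([np.ones(n), others])  # design matrix with intercept
        coef, *_ = np.linalg.lstsq(a, z[:, j], rcond=None)
        resid = z[:, j] - a @ coef
        out[j] = resid.var() / z[:, j].var()
    return out


rng = np.random.default_rng(0)
indep = rng.standard_normal((5000, 3))                 # independent dimensions
x0, x1 = rng.standard_normal((2, 5000))
dep = np.column_stack([x0, x1, x0 + x1 + 0.1 * rng.standard_normal(5000)])
print(linear_predictability(indep))  # entries close to 1
print(linear_predictability(dep))    # entries close to 0
```

In an adversarial setup the predictor minimizes its prediction error while the encoder maximizes it. A normalized score like this one keeps the encoder from simply rescaling features to win.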

2602.05533 2026-06-19 cs.AI Version update 55%

Conditional Diffusion Guidance under Hard Constraint: A Stochastic Analysis Approach

Zhengyi Guo, Wenpin Tang, Renyuan Xu

Affiliations: Department of Industrial Engineering and Operations Research, Columbia University; Department of Management Science and Engineering, Stanford University

Topic match (Other LLM): conditional generation with diffusion models; weakly related to LLMs.

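AI summary: Proposes a conditional diffusion guidance framework based on Doob's h-transform and martingale representation. It learns the gradient of the conditioning function with a martingale loss and a martingale-covariation loss, enforces hard constraints, and comes with non-asymptotic guarantees.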

Abstract

We study conditional generation in diffusion models under hard constraints, where generated samples must satisfy prescribed events with probability one. Such constraints arise naturally in safety-critical applications and in rare-event simulation, where soft or reward-based guidance methods offer no guarantee of constraint satisfaction. Building on a probabilistic interpretation of diffusion models, we develop a principled conditional diffusion guidance framework based on Doob's h-transform, martingale representation and quadratic variation process. Specifically, the resulting guided dynamics augment a pretrained diffusion with an explicit drift correction involving the logarithmic gradient of a conditioning function, without modifying the pretrained score network. Leveraging martingale and quadratic-variation identities, we propose two novel off-policy learning algorithms based on a martingale loss and a martingale-covariation loss to estimate h and its gradient using only trajectories from the pretrained model. We provide non-asymptotic guarantees for the resulting conditional sampler in both total variation and Wasserstein distances, explicitly characterizing the impact of score approximation and guidance estimation errors. Numerical experiments demonstrate the effectiveness of the proposed methods in enforcing hard constraints and generating rare-event samples. The code of the numerical experiments can be found at https://github.com/ZhengyiGuo2002/CDG_Finance.
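The core mechanism is an explicit drift correction $\nabla_x \log h$ added to an unchanged base diffusion. It can be seen on the simplest possible "pretrained" process, standard Brownian motion on [0, 1], where h is available in closed form. This minimal sketch works under that assumption: the hard constraint is the event {X_1 > c}, so h(t, x) = P(W_1 > c | W_t = x) = Phi((x - c) / sqrt(1 - t)). The paper instead learns h and its gradient from pretrained-model trajectories. Nothing here reproduces its martingale losses, and the names `h_transform_drift` and `simulate` are hypothetical.

```python
import numpy as np
from scipy.special import log_ndtr  # log of the standard normal CDF, stable in the tails


def h_transform_drift(t, x, c):
    """d/dx log h(t, x) for h(t, x) = Phi((x - c) / sqrt(1 - t)), W standard Brownian motion."""
    s = np.sqrt(1.0 - t)
    z = (x - c) / s
    log_pdf = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)
    # phi(z) / Phi(z), computed in log space to avoid underflow for very negative z.
    return np.exp(log_pdf - log_ndtr(z)) / s


def simulate(n_paths, n_steps, c, guided, seed=0):
    """Euler-Maruyama on [0, 1] from x = 0; adds the h-transform drift when guided."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.zeros(n_paths)
    for k in range(n_steps):  # t stays <= 1 - dt, so sqrt(1 - t) > 0
        t = k * dt
        drift = h_transform_drift(t, x, c) if guided else 0.0
        x = x + drift * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    return x


c = 1.0
plain = simulate(2000, 2000, c, guided=False)
guided = simulate(2000, 2000, c, guided=True)
print((plain > c).mean(), (guided > c).mean())  # rare event vs. (nearly) always satisfied
```

In exact simulation the guided endpoint exceeds c with probability one. The Euler–Maruyama discretization only approximates this, so a small fraction of guided paths may still end just below c. Shrinking the step size reduces that fraction.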
