arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.20534 2026-06-19 math.OC New submission

On Second-Order Methods for Bilevel Optimization

Jiawen Bi, Jiaxiang Li, Mingyi Hong, Shuzhong Zhang

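AI summary For bilevel optimization, this paper proposes a single-loop cubic regularized Newton algorithm that, in the nonconvex upper-level and strongly convex lower-level setting, achieves the optimal O(ε^{-1.5}) total oracle complexity, the first such method to attain the optimal rate for finding second-order stationary points.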

Abstract

Bilevel optimization is an indispensable modeling tool for modern machine learning and engineering design. However, the theory and practice of finding second-order stationary points in bilevel optimization remain largely unsettled. Even for bilevel optimization with a strongly convex lower-level problem, the hyperfunction it induces is in general nonconvex. Although Cubic Regularized Newton (CRN) methods famously achieve the optimal $\mathcal{O}(\varepsilon^{-1.5})$ SOSP (second-order stationary point) rate in single-level optimization, it is unclear how to control the accuracy of the hypergradient and hyper-Hessian computations when applying second-order methods to bilevel problems so that the overall process is efficient. In this paper, we set out to answer this question. In particular, we first formulate a double-loop CRN baseline that achieves the optimal outer rate but requires repeated lower-level solves. Next, we propose a single-loop cubic regularized Newton algorithm that combines one lower-level gradient step with one Newton step for the hypergradient, and prove an overall deterministic $\mathcal{O}(\varepsilon^{-1.5})$ total oracle complexity, which is optimal. In addition, we illustrate that some intuitively simple modifications of our method may fail to preserve the convergence result. To the best of our knowledge, this is the first deterministic single-loop method for the unconstrained NCSC (nonconvex upper-level and strongly convex lower-level) bilevel optimization setting that achieves the optimal $\mathcal{O}(\varepsilon^{-1.5})$ convergence rate for finding an $\varepsilon$-SOSP of the hyperfunction.
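
The objects named in this abstract can be written out as follows. This is the standard textbook formulation, assuming the lower-level objective $g(x,\cdot)$ is strongly convex and both objectives are sufficiently smooth. The symbols $f$, $g$, $\Phi$, $M$, $\rho$ and $s_k$ are notation chosen here for illustration and may not match the paper's, and the last line is the classical single-level cubic regularized Newton step applied to the hyperfunction, not the paper's single-loop scheme.

```latex
\begin{aligned}
&\min_{x}\ \Phi(x) := f\bigl(x, y^*(x)\bigr), \qquad y^*(x) := \arg\min_{y}\, g(x,y), \\
&\nabla \Phi(x) = \nabla_x f(x,y^*(x))
  - \nabla^2_{xy} g(x,y^*(x))\,\bigl[\nabla^2_{yy} g(x,y^*(x))\bigr]^{-1}\nabla_y f(x,y^*(x)), \\
&s_k \in \arg\min_{s}\ \nabla\Phi(x_k)^\top s + \tfrac12\, s^\top \nabla^2\Phi(x_k)\, s + \tfrac{M}{6}\|s\|^3,
  \qquad x_{k+1} = x_k + s_k .
\end{aligned}
```

A common definition of an $\varepsilon$-SOSP is a point with $\|\nabla\Phi(x)\|\le\varepsilon$ and $\lambda_{\min}(\nabla^2\Phi(x))\ge-\sqrt{\rho\,\varepsilon}$, where $\rho$ is a Lipschitz constant of the Hessian; the paper's exact constants may differ.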

2606.20498 2026-06-19 math.OC New submission

CLUSTER: Derivative-free optimization of smooth functions with parameter-change costs

Serena Landers, Sahil Pontula, Shiekh Zia Uddin, Sachin Vaidya, Marin Soljačić, Steven G. Johnson

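AI summary For derivative-free optimization in which changing parameters incurs a cost, the paper proposes the CLUSTER algorithm, built on quadratic-interpolation optimization; it improves performance by about 50% on test problems (including an optics experiment), outperforms Bayesian optimization and Nelder-Mead, and comes with a convergence guarantee.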

Comments 18 pages, 9 figures

Abstract

We introduce the CLUSTER algorithm (coordinate-level update strategy for trust-region step evaluation refinement) for local derivative-free optimization problems in which there is a cost to changing each parameter (or cluster of parameters). For example, this type of cost model is appropriate for optimizing robot-controlled laboratory experiments, in which a robot may incur a separate motion for each parameter cluster to be adjusted. We build on a class of quadratic-interpolation optimization algorithms by Powell and Conn that are known to perform well for twice-differentiable objectives (e.g., low-noise experiments), and show that the CLUSTER variants improve performance on a variety of test problems (including an optics laboratory experiment) by around 50%, and greatly outperform common competing algorithms for laboratory optimization (Bayesian optimization and Nelder-Mead). We also adapt the convergence proof of the Conn algorithm to obtain a similar convergence guarantee for CLUSTER-Conn.

2606.20446 2026-06-19 math.OC New submission

High-Probability Last-Iterate Guarantees for Two-Point Gaussian Zeroth-Order Stochastic Gradient Descent

Haishan Ye

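AI summary For smooth, strongly convex stochastic optimization, the paper proves that the standard same-sample two-point Gaussian zeroth-order stochastic gradient method has a direct high-probability last-iterate rate of Õ(d/T) (up to logarithmic factors), with confidence dependence logarithmic in 1/δ.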

Abstract

We establish a direct high-probability last-iterate guarantee for the standard same-sample two-point Gaussian zeroth-order stochastic-gradient method applied to smooth, strongly convex stochastic optimization. At each iteration, the method draws a fresh Gaussian direction, evaluates the objective at two symmetric perturbations using the same stochastic sample, and takes a norm-normalized stochastic-approximation step. Assuming unbiased stochastic gradients and a conditional exponential-moment bound on the squared norm of the stochastic-gradient noise, we prove that, whenever \(d\ge 16\log(6T/\delta)\), \[ f(\mathbf{x}_T)-f(\mathbf{x}^*) = \widetilde{\mathcal O}\!\left(\frac{d}{T}\right) \] with probability at least \(1-\delta\), up to fixed problem parameters and logarithmic factors. The confidence dependence is therefore logarithmic rather than polynomial in \(1/\delta\). The analysis is direct: it neither invokes Markov's inequality to convert an expectation bound nor truncates the noise. We are not aware of a prior direct high-probability last-iterate result at this zeroth-order scale for the same-sample Gaussian recursion under conditional sub-Gaussian stochastic-gradient noise. The proof combines a uniform weighted scan for Gaussian angles with an angle-enlarged product-martingale boundary that controls the signed suffix-product term arising from the unrolled stochastic recursion.
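
The iteration described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact recursion: the function name, the `sample` callback, the smoothing radius `mu`, the step-size schedule, and in particular the choice to normalize by the squared norm of the direction are all assumptions made here.

```python
import numpy as np

def two_point_gaussian_zo_sgd(f, sample, x0, T, mu=1e-4,
                              step=lambda t: 1.0 / (t + 1), seed=0):
    """Same-sample two-point Gaussian zeroth-order SGD (illustrative sketch).

    f(x, xi) is the stochastic objective and sample(rng) draws one sample xi.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    for t in range(T):
        u = rng.standard_normal(d)   # fresh Gaussian direction
        xi = sample(rng)             # one stochastic sample, reused for both evaluations
        # Symmetric finite difference along u with the SAME sample xi.
        diff = (f(x + mu * u, xi) - f(x - mu * u, xi)) / (2.0 * mu)
        # Assumed normalization: divide by ||u||^2 so the update is a
        # (noisy) projection of the gradient onto the direction u.
        x -= step(t) * diff * u / np.dot(u, u)
    return x
```

With this normalization the expected update is the gradient scaled by 1/d, so a step size proportional to d (for example `step=lambda t: d / (t + d)`) is a natural choice in experiments.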

2606.20383 2026-06-19 math.OC New submission

A Single-Loop Minorized Dual Decomposition Method for Nonsmooth Multi-Stage Stochastic Programming

Dan Luo, Hailin Sun, Lei Yang, Yang You

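AI summary For multi-stage stochastic programs with nonsmooth composite objectives, the paper proposes a single-loop minorized dual decomposition method that exploits the stage-wise and scenario-wise decomposable structure and updates via one step of a symmetric Gauss-Seidel based inexact ADMM, yielding global convergence and parallelizable computation.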

Abstract

In this paper, we study multi-stage stochastic programming (MSP) problems with nonsmooth composite objectives. Tailored to their intrinsic stage-wise and scenario-wise structure, we develop a single-loop minorized dual decomposition method, in which each iteration constructs a minorized problem and its restricted Wolfe dual, and then performs one iteration of the symmetric Gauss-Seidel based inexact alternating direction method of multipliers on the resulting dual problem to generate the next iterate. A key feature of the proposed optimization framework is that the resulting updates preserve the stage-wise and scenario-wise decomposable structure of the MSP problem and are suitable for parallel implementation. We establish global convergence of the generated iterates for the three-stage case and further establish the corresponding global convergence theorem for the general multi-stage setting. Numerical experiments illustrate the computational viability of the proposed framework and its favorable scaling behavior with respect to the stage-wise and scenario-wise structure.

2606.20356 2026-06-19 math.OC cs.AI cs.LG math.PR stat.ML New submission

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

Mathieu Laurière, Ariel Neufeld, Kyunghyun Park

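AI summary Proposes a robust $Q$-learning algorithm for discrete-time mean-field control under Wasserstein uncertainty in the common-noise distribution; it combines quantization-and-projection with Wasserstein duality, proves convergence and finite-time bounds for synchronous and asynchronous learning, and illustrates the robustness-performance tradeoff on systemic risk and epidemic models.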

Abstract

In this article, we present a robust $Q$-learning algorithm for discrete-time mean-field control problems under Wasserstein uncertainty in the common noise law. The algorithm combines a quantization-and-projection scheme with a Wasserstein dual reformulation on the common-noise space. We establish its convergence together with finite-time iteration bounds for both synchronous and asynchronous learning schemes. Numerical experiments on systemic risk and epidemic models compare the asynchronous implementation with an idealized Bellman iteration, illustrate the robustness-performance tradeoff under common-noise misspecification, and report the observed convergence behavior of the asynchronous $Q$-learning algorithm.

2606.20304 2026-06-19 math.OC New submission

Diagonal Hessian Approximation Based on Conjugacy Condition for Noisy Derivative-Free Optimization Problems in High Dimensions

Morteza Kimiaei, Saman Babaie-Kafaki

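AI summary For high-dimensional noisy derivative-free optimization, the paper proposes replacing the full affine-scaling matrix with a diagonal approximation built from conjugacy conditions; when the noise is large, the method is often more efficient and more stable than MAES-type methods.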

Comments 26 pages, 4 figures

Abstract

We consider large-scale noisy derivative-free optimization (DFO) problems in which only function values are available and gradient or subgradient information cannot be reliably estimated. Matrix-adaptation evolution strategies (MAES) and their limited-memory variants are among the most robust DFO methods under noise; however, their performance may deteriorate when the noise level is large. In such regimes, sorting and selection may misidentify informative sampled points, making the recombination step less reliable and weakening the scaling information used by affine or matrix-adaptation mechanisms. This can substantially reduce the efficiency of MAES-type methods, especially in high-dimensional settings. To address this limitation, we propose a DFO method that replaces the full affine-scaling matrix with a diagonal approximation constructed from conjugacy-type conditions. The proposed mechanism does not attempt to estimate gradients, subgradients, or interpolation models, nor does it learn dense covariance information from noisy rankings. Instead, it uses consecutive normalized recombination displacements in a conservative diagonal update, thereby limiting the influence of unreliable selection information while preserving the derivative-free structure of the underlying evolutionary framework. As a result, the method is computationally cheaper than full matrix-adaptation schemes and limited-memory affine-scaling variants, while providing a stable scaling mechanism in noisy environments. Numerical experiments on noisy benchmark problems show that the proposed method is competitive with, and often more efficient than, MAES-type baselines, particularly when the noise level is large and ranking-based selection becomes unreliable.

2606.20239 2026-06-19 math.OC New submission

Optimizing Agricultural Drone Operations: From Launch and Recovery Siting to Tiered Routing Strategies

Ethan Kolby, Josh Noble, Max Z. Li

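AI summary Proposes a framework for agricultural drone spraying operations: a $p$-median heuristic cuts facility-siting time from over 97 seconds to under 1.2 seconds, and tiered routing reduces computation time by an order of magnitude, enabling minute-scale planning.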

Comments 33 pages, 4 tables, 10 figures, preprint submitted to Drone Systems & Applications

Abstract

Drones are increasingly used in agriculture, where tight margins demand efficient planning. Current optimization tools suffer from exponential runtimes as problem sizes grow, necessitating practical heuristics for daily operations. This paper presents an operational framework and benchmarking analysis for drone spraying operations. We evaluate the trade-offs between facility siting methods and tiered routing parameters. For facility siting, comparing a Mixed-Integer Program (MIP) baseline against a $p$-Median heuristic shows that the heuristic reduces runtime by three orders of magnitude, from over 97 seconds to under 1.2 seconds, with only a 4% reduction in serviced field area. For route planning, a tiered problem decomposition approach partitioning the target area into 6 to 8 spatial clusters reduces computation time by an order of magnitude with minimal degradation in serviced area. This framework achieves minute-scale planning on commodity hardware, demonstrating operational relevance. Future research will incorporate weather modeling, integrated optimization of facility location and routing, and validation across diverse field geometries.

2606.20082 2026-06-19 math.OC cs.DS cs.LG New submission

Beyond Averaging in John Ellipsoid Approximation: High-Accuracy Algorithms in the Leverage-Score Model

Xiaoyu Li, Junwei Yu, Jiaojiao Jiang, Junbin Gao, Andi Han

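AI summary The paper separates three costs in John ellipsoid approximation algorithms (certification, identification, and accuracy), shows that the accuracy dependence is only doubly logarithmic, and proposes an accelerated method and a damped Newton method that achieve high-accuracy approximation in the leverage-score model.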

Abstract

The John ellipsoid of a symmetric polytope $P=\{\mathbf{x}\in\mathbb{R}^d:\|\mathbf{A}\mathbf{x}\|_\infty\le1\}$, $\mathbf{A}\in\mathbb{R}^{n\times d}$, is computed by a long line of leverage-score algorithms, from Cohen, Cousins, Lee and Yang (COLT 2019) to its successors [WY24, CLS+25], all reaching a $(1+\varepsilon)$-approximation in $\Theta(\varepsilon^{-1}\log(n/d))$ iterations. We separate this complexity into three costs the modern line conflates (certification, identification, and accuracy) and locate the historical $\varepsilon^{-1}$ in the first alone. In the equivalent D-optimal-design form $\min_{\mathbf{p}\in\Delta_n}-\log\det(\sum_i p_i\mathbf{a}_i\mathbf{a}_i^\top)$, the leverage-score oracle is exactly the first-order oracle and the $(1+\varepsilon)$-John guarantee is the Frank-Wolfe gap condition $g(\mathbf{p})\le\varepsilon d$; through this dictionary the costs come apart. The $\varepsilon^{-1}$ is a certification artifact: the uniform average of the iterates, the certificate used throughout the line, has gap exactly $\Theta(1/T)$, however cheap each iteration is made. Pointed instead at the last iterate, the same oracle is fast: a warm-started accelerated method reaches the guarantee in $C(\mathbf{A})+O(\sqrt{\kappa}\log(1/\varepsilon))$ queries after an $\varepsilon$-independent setup $C(\mathbf{A})$, and once the optimal face is identified the facial problem is an unconstrained self-concordant minimization whose Hessian the oracle recovers exactly, so damped Newton needs only $O(\log\log(1/\varepsilon))$ steps, for a total of $C(\mathbf{A})+O(d^2\log\log(1/\varepsilon))$ queries. The accuracy dependence is thus doubly logarithmic after an $\varepsilon$-independent, condition-dependent setup; the open problem is the remaining identification cost (a condition-free bound on reaching the optimal face) and lower bounds. Accuracy is not the obstruction.
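
The dictionary between leverage scores and the Frank-Wolfe gap can be checked numerically. The sketch below assumes the D-optimal-design form stated in the abstract, where the gradient of the objective is minus the vector of leverage scores and their $\mathbf{p}$-weighted sum equals $d$, so the gap is $\max_i \sigma_i(\mathbf{p}) - d$. The function names are made up for illustration, and the multiplicative update is the classical fixed-point baseline, not the paper's accelerated or damped Newton methods.

```python
import numpy as np

def leverage_scores(A, p):
    """sigma_i = a_i^T (sum_j p_j a_j a_j^T)^{-1} a_i for the rows a_i of A."""
    M = A.T @ (p[:, None] * A)
    return np.einsum("ij,ij->i", A @ np.linalg.inv(M), A)

def fw_gap(A, p):
    """Frank-Wolfe gap of f(p) = -log det(sum_i p_i a_i a_i^T) on the simplex."""
    d = A.shape[1]
    return leverage_scores(A, p).max() - d

def multiplicative_updates(A, T):
    """Classical fixed-point iteration p_i <- p_i * sigma_i / d (baseline only)."""
    n, d = A.shape
    p = np.full(n, 1.0 / n)
    for _ in range(T):
        # The update stays on the simplex because sum_i p_i * sigma_i = d.
        p = p * leverage_scores(A, p) / d
    return p
```

A point $\mathbf{p}$ then satisfies the $(1+\varepsilon)$-John guarantee in this form when `fw_gap(A, p) <= eps * d`.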

2606.20062 2026-06-19 math.OC cs.LG math.PR New submission

Optimal Coarse Correlated Equilibria in Mean Field Games: Linear Programming and No-Regret Learning

Luciano Campi, Federico Cannerozzi, Ioannis Tzouanas

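AI summary For continuous-time mean field games, the paper gives a linear programming characterization of optimal coarse correlated equilibria and designs a no-regret learning algorithm based on Lagrangian duality, with convergence rates.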

Comments 55 pages, 3 figures

Abstract

We introduce optimal coarse correlated equilibria for continuous-time mean field games. A coarse correlated equilibrium is a randomized recommendation scheme from which no player can gain by ignoring the recommendation and switching to an alternative strategy. The problem is as follows: a moderator selects, among all mean-field coarse correlated equilibria, one that optimizes a prescribed performance criterion, which may differ from the representative player's objective. After formulating the problem, we develop a linear programming (LP) formulation, prove the existence of optimal LP coarse correlated equilibria, and relate the LP characterization to the original probabilistic setting. Building on this characterization, we design a no-regret primal-dual algorithm, based on an equivalent Lagrangian formulation of the external-regret constraint, for learning such equilibria. We provide explicit convergence rates for the learning algorithm, and numerical examples illustrate the method.

2606.20059 2026-06-19 math.OC math.DG New submission

Optimization with inequality constraints by the embedded gradient vector field method

Petre Birtea, Ioan Casu, Dan Comanescu

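AI summary Converts inequality constraints into equalities via quadratic slack variables and, using Riemannian geometry and the embedded gradient vector field method, derives explicit determinantal formulas for the Lagrange multipliers and reinterprets the KKT conditions.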

Abstract

We develop a geometric framework for constrained optimization problems with inequality constraints through the introduction of quadratic slack variables. This formulation makes it possible to employ the language of Riemannian geometry and to solve the problem via the embedded gradient vector field method. We lift the feasible set to a smooth submanifold of an extended ambient space. The stratified structure of the resulting constraint manifold is analyzed in detail, yielding a natural partition according to which constraints are active. Using the embedded gradient vector field formalism, we derive explicit, determinantal formulas for the Lagrange multiplier functions directly from the geometry of the constraint manifold, recovering and re-framing the classical Karush-Kuhn-Tucker first-order necessary conditions without invoking the classical Lagrange multiplier method. Second-order optimality conditions are obtained by computing the restricted Hessian on each stratum, and a complete sign condition on the Lagrange multipliers is identified as the geometric counterpart of the classical complementary slackness condition. The theory is illustrated on the double ice-cream cone example, where the geometry of the problem determines the nature and number of local minima.
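
The quadratic-slack reformulation mentioned in this abstract is, in its generic form, the following. The sign convention $g_i(x)\le 0$ and the symbols $f$, $g_i$, $s_i$, $\lambda_i$ are assumptions made here for illustration; the paper may use a different convention.

```latex
\min_{x\in\mathbb{R}^n} f(x)\ \ \text{s.t.}\ \ g_i(x)\le 0,\ \ i=1,\dots,m
\quad\Longleftrightarrow\quad
\min_{(x,s)\in\mathbb{R}^{n+m}} f(x)\ \ \text{s.t.}\ \ g_i(x)+s_i^2=0,\ \ i=1,\dots,m .
```

Constraint $i$ is active exactly when $s_i=0$, which is what produces a partition of the lifted constraint set by active index sets. For the Lagrangian $f(x)+\sum_i\lambda_i\,(g_i(x)+s_i^2)$, stationarity in $s_i$ reads $2\lambda_i s_i=0$, so first-order conditions alone give $\lambda_i g_i(x)=0$ but say nothing about the sign of $\lambda_i$.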

2606.20013 2026-06-19 math.OC New submission

A Regularized Nikaido-Isoda Function Approach to Multi-Leader-Follower Games

Atsushi Hori, Takayuki Okuno, Ellen H. Fukuda

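AI summary Proposes a new reformulation based on a regularized Nikaido-Isoda function that approximates a multi-leader-follower game by a single-level differentiable Nash equilibrium problem, avoiding the need for higher-order derivatives and covering a broader class of games.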

Abstract

A multi-leader-follower game (MLFG) is a hierarchical noncooperative game in which leaders compete at the upper level while taking into account the followers' best responses at the lower level. A typical approach to solving the MLFG reformulates it as an equilibrium problem with equilibrium constraints (EPEC) by replacing the lower-level game with its KKT conditions. Another approach, when each follower's response is unique, is to reformulate the MLFG as a Nash equilibrium problem by substituting these response functions into each leader's problem. However, both reformulations may lack scalability since higher-order derivatives may be required when solving the resulting problems. In this paper, we propose a new reformulation of the MLFG by exploiting a regularized Nikaido-Isoda function and approximating the MLFG by a single-level differentiable Nash equilibrium problem with a penalty parameter. The proposed reformulation neither requires derivative information on the followers' game nor assumes convexity of each follower's problem; hence, it can handle a broader class of MLFGs. Under global subanalyticity, we analyze the mathematical relationship between equilibria of the original MLFG and the proposed reformulation.
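
For reference, the regularized Nikaido-Isoda function of a generic Nash equilibrium problem is usually defined as below. The notation ($\theta_\nu$ for player $\nu$'s cost, $X$ for the joint strategy set, $\alpha>0$ for the regularization parameter) is chosen here for illustration; how the paper applies the function to the followers' game, and where its penalty parameter enters, is not specified in the abstract.

```latex
\Psi_\alpha(x,y) := \sum_{\nu=1}^{N}\Bigl[\theta_\nu(x^\nu,x^{-\nu})-\theta_\nu(y^\nu,x^{-\nu})\Bigr]-\frac{\alpha}{2}\,\|x-y\|^2,
\qquad
V_\alpha(x) := \max_{y\in X}\ \Psi_\alpha(x,y).
```

In the classical setting where each player's problem is convex in its own variable, $V_\alpha(x)\ge 0$ on $X$ and $V_\alpha(x)=0$ exactly at Nash equilibria, which is what makes $V_\alpha$ usable as a merit or penalty term.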

2606.19871 2026-06-19 math.OC cs.MA cs.SY eess.SY New submission

Semiglobal Input-Delay Tolerance Algorithm for Distributed Nonconvex Optimization of Networked Nonlinear Systems

Jing-Zhe Xu, Zhi-Wei Liu, Ming-Feng Ge, Yan-Wu Wang, Dinxin He

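AI summary For networked nonlinear systems with input delays and consensus constraints, the paper proposes a semiglobal input-delay tolerant algorithm that, through a hierarchical design and input-to-state stability analysis, solves nonconvex optimization problems in a distributed manner under the Polyak-Łojasiewicz condition.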

Comments 36 pages, 5 figures

Abstract

This paper studies a class of distributed optimization problems in networked nonlinear systems (NNSs) subject to input delays and consensus constraints. It introduces input-delay tolerant semiglobal convergence (IDTSC), meaning that for any prescribed compact initial set there exists an admissible delay bound under which the optimal solution is computed within consensus constraints and all node states converge to the solution. Building on a hierarchical design and input-to-state stability analysis, a new semiglobal input-delay tolerant (SIDT) algorithm is developed that practically achieves IDTSC for distributed optimization under the coupling between input delays and nonlinear dynamics. Further, by relaxing strict convexity requirements through the Polyak-Łojasiewicz condition, the SIDT algorithm broadens its applicability to nonconvex optimization. Finally, numerical experiments corroborate the theory on NNSs with input delays.
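
The Polyak-Łojasiewicz condition invoked in this abstract is, in its standard form for a differentiable function $f$ with minimum value $f^*$, the inequality below. The symbols $f$, $f^*$ and $\mu$ are generic and not taken from the paper.

```latex
\tfrac12\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x)-f^*\bigr)
\qquad \text{for all } x,\ \text{with some } \mu>0 .
```

The condition forces every stationary point to be a global minimizer without requiring convexity; a commonly cited nonconvex example is $f(x)=x^2+3\sin^2 x$.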

2606.19789 2026-06-19 math.OC stat.ME New submission

Dynamic Core Allocation for Malleable Jobs with Unknown Speed-up Parameters

S. A. Bodas, J. L. Dorsman, M. Mandjes, L. Ravner

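AI summary For malleable jobs with unknown speed-up parameters in a multicore system, the paper proposes an iterative learning-and-control framework that estimates the unknown parameters by maximum likelihood and solves a Markov decision process to update the allocation policy, with the goal of minimizing the long-run mean number of jobs.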

Abstract

We study dynamic resource allocation in a multicore computing system with a fixed number of processing cores and a stream of malleable jobs. Each job may adjust its level of parallelism during execution, allowing adaptive redistribution of resources across concurrently active jobs. Jobs belong to one of two observable classes, each characterized by a distinct speed-up function with unknown parameters. The objective is to learn a core-allocation policy that minimizes the long-run mean number of jobs in the system, equivalently the mean response time in steady state. To address this uncertainty, we develop an iterative learning-and-control framework. The system alternates between estimating the unknown speed-up parameters from observed job completions and solving the associated Markov decision process (MDP) to update the allocation policy. Within each job class, cores are shared equally among active jobs; the fraction of capacity assigned to each class is obtained from the MDP formulation of \cite{berg2017}, evaluated at the current parameter estimates. We construct a maximum likelihood estimator based on state-dependent inter-departure times and prove its strong consistency under a fixed allocation policy. We further propose two learning algorithms that combine this estimation step with dynamic programming-based policy updates, and illustrate their performance through numerical experiments.

2606.19772 2026-06-19 math.OC New submission

Signature Methods for Optimal Market Making

Alberto Gennaro, Thibaut Mastrolia, Francesca Primavera

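AI summary Proposes a signature-based method for mean-variance optimal market making: signature linearization reduces the problem to a pseudo-linear optimization, and the Sig-REINFORCE algorithm learns the optimal quotes.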

Comments v1

Abstract

We propose a signature-based method to solve the optimal market-making problem under a mean-variance criterion. By exploiting signature linearization techniques, we reduce the market-making problem to a pseudo-linear optimization over the expected signature of an augmented market path, and we develop a signature algorithm named Sig-REINFORCE to learn the optimal bid and ask quotes. We test our method in two scenarios, in which market-order arrivals follow either a Poisson or a self-exciting Hawkes process, and we benchmark it against a Proximal Policy Optimization (PPO) baseline.
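
To make the notion of a path signature concrete, the sketch below computes the first two signature levels of a piecewise-linear path with NumPy, using Chen's identity to append one linear segment at a time. The function name and the truncation at level 2 are choices made here for illustration; the paper's augmented market path and the Sig-REINFORCE algorithm are not reproduced.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array of shape (n_points, dim).
    Returns (S1, S2) with shapes (dim,) and (dim, dim), where S2[i, j] is the
    iterated integral of dX^i followed by dX^j.
    """
    path = np.asarray(path, dtype=float)
    dim = path.shape[1]
    S1 = np.zeros(dim)
    S2 = np.zeros((dim, dim))
    for delta in np.diff(path, axis=0):
        # Chen's identity for concatenating a linear segment with increment delta:
        # the segment alone has level-2 term 0.5 * delta (x) delta.
        S2 += np.outer(S1, delta) + 0.5 * np.outer(delta, delta)
        S1 += delta
    return S1, S2
```

A functional that is linear in the signature, as in a signature linearization, then takes the form of a weighted sum of the entries of `S1` and `S2` (and of higher levels when the truncation depth is larger).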

2606.19705 2026-06-19 math.OC New submission

Stochastic Representations of Stationary HJBI-Type Variational Inequalities with Bilateral Constraints

Sheng Huang, Qingmeng Wei

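AI summary Using an augmented infinite-horizon two-player zero-sum stochastic differential game and a mixed control-stopping game, the paper gives two stochastic representations of stationary HJBI-type variational inequalities with bilateral constraints and proves that the value functions are the unique bounded viscosity solutions of the corresponding variational inequalities.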

Abstract

In this paper, we study probabilistic representations for stationary HJBI-type variational inequalities with bilateral constraints. We provide two complementary stochastic representations. The first representation is obtained through an augmented infinite-horizon two-player zero-sum stochastic differential game (SDG). By enlarging the control spaces with two additional stopping symbols, the obstacle terms are incorporated into the running payoff. Using the framework of infinite-horizon stochastic recursive differential games, we show that the resulting lower and upper value functions are the unique bounded viscosity solutions of the corresponding HJBI variational inequalities. The second representation is given by a two-player zero-sum mixed control-stopping SDG. In this formulation, each player chooses both a continuous control and a stopping decision, and the payoff is defined by a BSDE with a random terminal time. To make the stopping component compatible with the Elliott-Kalton strategy framework, we introduce nonanticipative stopping strategies depending on the opponent's control process. The proof is based on penalized infinite-horizon SDGs coupled with their own value functions, together with dynamic programming arguments and stability estimates for backward semigroups. We prove that the value functions of the mixed control-stopping game coincide with the unique bounded viscosity solutions of the bilateral HJBI variational inequalities.

2606.19669 2026-06-19 math.OC cs.SY eess.SY New submission

Learning Neural Maximal Lyapunov Functions on $\mathsf{SO}(n)$

Adeel Akhtar, Matthieu Barreau

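AI summary Proposes a neural Lyapunov architecture based on the logarithmic map, learns the maximal region of attraction through a Zubov-type characterization, and derives explicit formulas for the derivative of the logarithmic map, enabling a two-phase training algorithm.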

Comments Accepted to IEEE Control Systems Letters (L-CSS), 6 pages, 2 figures

Abstract

Establishing stability guarantees for dynamical systems on Lie groups is a fundamental challenge, as classical Lyapunov methods developed for Euclidean spaces do not directly transfer to curved geometries. In this paper, we propose a framework for learning maximal Lyapunov functions for systems evolving on the special orthogonal group $\mathsf{SO}(n)$. Theoretically, we introduce a neural Lyapunov architecture based on the logarithmic map with proven approximation capabilities, and we formulate the learning problem via a Zubov-type characterization of the maximal region of attraction. A key technical contribution is the derivation of explicit, numerically tractable formulas for the derivative of the logarithmic map, enabling training through a two-phase algorithm that balances computational efficiency and accuracy. Empirically, we validate the approach on a low-dimensional nonlinear system.
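
As a concrete instance of the logarithmic map this abstract builds on, the sketch below implements the principal logarithm on $\mathsf{SO}(3)$ and a simple log-based candidate function. It covers only $n=3$, the formula is valid for rotation angles strictly below $\pi$, and `candidate_lyapunov` is a hypothetical fixed function used for illustration, not the neural architecture proposed in the paper.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: matrix exponential of hat(w)."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-12:
        return np.eye(3) + K
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def log_so3(R):
    """Principal logarithm of R in SO(3), valid for rotation angles in [0, pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return 0.5 * (R - R.T)   # first-order approximation near the identity
    return theta / (2.0 * np.sin(theta)) * (R - R.T)

def candidate_lyapunov(R):
    """Hypothetical candidate V(R) = 0.5 * ||Log(R)||_F^2 (the squared rotation angle)."""
    L = log_so3(R)
    return 0.5 * np.sum(L * L)
```

The candidate vanishes only at the identity and grows with the rotation angle, which is the basic shape a Lyapunov function for an equilibrium at the identity needs; the singularity of the logarithm at angle $\pi$ is one reason the derivative formulas require care.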

2606.19663 2026-06-19 math.OC math.PR New submission

Counterexample to a conjecture on the pairwise independent correlation gap using AI

Arjun Ramachandra, Karthik Natarajan

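AI summary With the help of the AI tool GPT5.5 Pro, the authors construct a counterexample that refutes a conjecture of Ramachandra and Natarajan (2025) on the pairwise independent correlation gap.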

Abstract

Aided by the AI tool GPT5.5 Pro, we provide a counterexample to a conjecture made by Ramachandra and Natarajan (2025) [Pairwise independent correlation gap, Operations Research Letters, 107255, 6040].

2606.19639 2026-06-19 math.OC New submission

Mean-Field Control with a Common Hidden State under Decentralized Observations

Erhan Bayraktar, Ali D. Kara

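AI summary Studies optimal control with multiple decision makers who receive decentralized observations through identical channels and share a hidden state; the mean-field limit reduces the problem to a single-agent control problem, and the paper proves that randomized controls are necessary and establishes near-optimality convergence rates for the finite-population problem.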

Abstract

We study optimal control of a system with multiple decision makers who share a common hidden state and receive fully decentralized observations through identical channels. The dynamics of the hidden state and the cost incurred by the agents depend on the agents' actions only through their empirical distribution. In the limit problem with infinitely many agents, the problem reduces to a single-agent control problem where the agent affects the hidden state dynamics via the conditional law of the actions given the past values of the hidden state process. We formulate this problem as a deterministic measure-valued control problem over the space of policies and provide a dynamic programming recursion. We first show that, for the limiting problem, randomization over the control actions is necessary for optimality. However, randomization over the selection of policies (i.e., mixture policies) is not required. We then show that the optimal symmetric policies designed for the infinite population problem are near optimal for the finite population problem. In particular, we establish convergence rates that decay with the number of agents as $\frac{1}{\sqrt{N}}$, and grow exponentially with the memory length used in the policy.

2606.19553 2026-06-19 math.OC 新提交

On the Limits of Biased Derivative Information for Nonconvex Stochastic Optimization

关于非凸随机优化中有偏导数信息的局限性

Anant Shyam, Brian Bullins

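AI summary For smooth non-convex objectives, studies lower bounds for finding δ-stationary points with biased stochastic derivatives, develops trust-region methods that match the lower bounds for certain ranges of bias, and shows that higher-order variance reduction lowers the complexity in high-bias regimes.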

Comments 39 pages

详情
AI中文摘要

我们考虑对于δ>0寻找δ-稳定点的问题,即x∈R^d使得||∇F(x)||≤δ,其中导数预言机不仅是随机的而且有偏。在一阶情形下,我们给出了寻找O((ε+B^2)^{1/2})-稳定点的紧下界,其中ε>0,B是梯度偏差的界,与Ajalloeian和Stich (2020)的上界匹配。然后,我们为使用高阶导数信息寻找O(ε+B_max)-稳定点的算法建立了依赖于偏差的下界,其中B_max是所有导数最大偏差的界。为了补充这些下界,我们开发了基于信任域的方法,在特定偏差范围内提供与相应下界匹配的保证。我们进一步通过高阶方差缩减方案在高偏差设置中改进了预言机复杂度,特别展示了在某些情况下使用高阶导数信息的好处,而这类改进在随机无偏设置中已知是无法实现的。

英文摘要

We consider the problem of finding $\delta$-stationary points for $\delta > 0$, i.e., $x \in \mathbb{R}^d$ such that $\|\nabla F(x)\| \le \delta$, for smooth, non-convex objectives, where the derivative oracles are not only stochastic but also biased. In the first-order setting, we provide tight lower bounds for finding an $O((\varepsilon + B^2)^{1/2})$-stationary point, for $\varepsilon > 0$ and where $B$ is a bound on the gradient bias, matching the upper bounds of Ajalloeian and Stich (2020). We then establish bias-dependent lower bounds for algorithms that use higher-order derivative information for finding $O(\varepsilon + B_{\max})$-stationary points, where $B_{\max}$ is a bound on the maximum bias for all derivatives. To complement these lower bounds, we develop trust-region based methods that, for certain ranges of bias, provide guarantees that match the corresponding lower bounds. We further improve upon the oracle complexity in high bias settings through a higher-order variance reduction scheme, in particular demonstrating the benefits, in some cases, of using higher-order derivative information, whereas such improvements are known to be unattainable for stochastic unbiased settings.

2606.19516 2026-06-19 math.OC 新提交

A land of monotone plenty, bis repetita: from classical to weak optimal transport

单调富饶之地,重复之韵:从经典最优输运到弱最优输运

Virginie Ehrlacher, Rodrigue Lelotte, Luca Nenna

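AI summary Shows that c-cyclical monotonicity is equivalent to the zeroth-order optimality condition of the classical optimal transport problem and extends it to weak optimal transport, where it corresponds to the first-order optimality condition, giving a unified characterization of optimal transport plans.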

详情
AI中文摘要

著名的c-循环单调性被证明归结为最优输运问题的零阶最优性条件。更精确地说,我们证明最优性等价于线性输运成本泛函在允许扰动的径向锥上的非负性。然后,我们利用这一观点将c-循环单调性扩展到弱最优输运问题,在该问题中它对应于一阶最优性条件,即弱输运成本泛函在优化器附近的线性化的非负性。总之,这为这一单调性概念提供了新的视角。对于经典和弱最优输运,我们证明该性质(在适当假设下)刻画了最优输运计划。在经典情形中,我们恢复了文献中的已知结果,但给出了重新审视的证明。

英文摘要

The celebrated c-cyclical monotonicity property is shown to boil down to the zeroth-order optimality condition for the optimal transport problem. More precisely, we show that optimality is equivalent to the non-negativity of the linear transport cost functional on the radial cone of admissible perturbations. We then utilise this point of view to extend the c-cyclical monotonicity property to the weak optimal transport problem, for which it corresponds to the first-order optimality condition, namely to the non-negativity of the linearisation of the weak transport cost functional near the optimiser. Altogether, this sheds new light on this monotonicity concept. For both classical and weak optimal transport, we show that this property characterises (under suitable assumptions) optimal transport plans. In the classical case, we recover known results of the literature but with revisited proofs.
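
For reference, the classical property the abstract starts from can be stated as follows. This is the standard textbook definition; the notation ($\Gamma$, $c$, $X$, $Y$) is chosen here for illustration and is not taken from the paper.

```latex
% Standard definition (notation assumed here, not from the paper):
% a set \Gamma \subset X \times Y is c-cyclically monotone if, for every N \ge 1
% and every finite family (x_1, y_1), \dots, (x_N, y_N) \in \Gamma,
\sum_{i=1}^{N} c(x_i, y_i) \;\le\; \sum_{i=1}^{N} c(x_i, y_{i+1}),
\qquad y_{N+1} := y_1 .
```

In words, cyclically reassigning the targets $y_i$ among the points $x_i$ never lowers the total cost.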

2606.13481 2026-06-19 math.OC 新提交

Towards a Control interpretation of Quantum Advantage

走向量子优势的控制解释

Dario Pighin

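AI summary Proposes a control-theoretic framework for explaining quantum advantage: quantum computation is recast as an operator controllability problem via the bilinear controlled Schrödinger equation. Controllability is established for the Quantum Fourier Transform, with an $O(n^2)$ upper bound on the minimal time, and for the Maximum Independent Set problem.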

详情
AI中文摘要

我们开发了一个控制论框架来理解量子优势(QA),提供了一条系统化的途径来刻画量子优势何时以及如何产生。双线性受控薛定谔方程是共同主线:目标量子计算被重新表述为特殊酉群 $SU(N)$ 上的算子可控性问题,而量子优势则与相关最小时间函数的 $n$ 的多项式上界相关联。我们在两个典型问题上说明了该框架:a) 超导数字量子处理器(如 IBM 的 ibm_brisbane)上的量子傅里叶变换(QFT),通过李代数论证证明了算子可控性,并利用门串联引理结合标准 QFT 电路分解推导出最小时间的 $O(n^2)$ 上界;b) 中性原子模拟量子处理器(如 Pasqal 的硬件)上的最大独立集(MIS)问题,将里德伯封锁哈密顿量分析为双线性控制系统,并将量子近似优化算法(QAOA)重新表述为连续时间最优控制问题。通过可控性结果,我们展示了该问题如何在 Pasqal 量子计算机上求解,并引入了基于控制的 MIS 量子优势定义。最后,我们概述了几个开放问题,为控制理论与量子计算交叉领域的未来研究指明了方向。

英文摘要

We develop a control-theoretic framework for understanding Quantum Advantage (QA), providing a systematic route to characterize when and how QA can arise. The bilinear controlled Schrödinger equation is the common thread: the target quantum computation is recast as an operator controllability problem on the special unitary group $SU(N)$, and QA is identified with a polynomial-in-$n$ upper bound on the associated minimal-time function. We illustrate the framework on two paradigmatic problems: a) the Quantum Fourier Transform (QFT) on superconducting digital quantum processors (such as IBM's ibm_brisbane), for which we prove operator controllability by a Lie-algebraic argument and derive an $O(n^2)$ upper bound on the minimal time via a gate-concatenation lemma combined with the standard QFT circuit decomposition; b) the Maximum Independent Set (MIS) problem on neutral-atom analog quantum processors (such as Pasqal's hardware), for which we analyze the Rydberg-blockade Hamiltonian as a bilinear control system and reformulate the Quantum Approximate Optimization Algorithm (QAOA) as a continuous-time optimal control problem. By a controllability result, we show how the problem can be solved on Pasqal Quantum Computers and we introduce a control-based definition of Quantum Advantage for MIS. We conclude by outlining several open problems that chart directions for future research at the intersection of Control Theory and Quantum Computing.
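
As a rough sanity check on the quadratic scaling, the textbook QFT circuit on $n$ qubits can be counted gate by gate. This is the standard decomposition, not the paper's derivation; how gate counts translate into a minimal control time depends on the paper's gate-concatenation lemma, which is not reproduced here.

```latex
% Textbook QFT circuit on n qubits (standard decomposition, not taken from the paper):
% n Hadamard gates, n(n-1)/2 controlled-phase gates, and \lfloor n/2 \rfloor final swaps.
\underbrace{n}_{\text{Hadamard}}
+ \underbrace{\tfrac{n(n-1)}{2}}_{\text{controlled phase}}
+ \underbrace{\lfloor n/2 \rfloor}_{\text{swap}}
= O(n^2).
```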

2606.20394 2026-06-19 cs.RO math.OC 交叉投稿

Agentic AutoResearch for Space Autonomy: An Auditable, LLM-Driven Research Agent for Aerospace Control Problems

面向空间自主性的智能体自动研究:用于航空航天控制问题的可审计、LLM驱动的研究代理

Amit Jain, Richard Linares

Affiliations * Department of Aeronautics and Astronautics

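AI summary Proposes the AutoResearch framework, which uses a large language model as an offline research agent to iteratively develop spacecraft control policies and audits every reported result through a built-in credibility layer that certifies it against measured seed noise; validated on rendezvous and docking problems.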

详情
AI中文摘要

航天器的制导、导航与控制功能日益通过从专家求解器中提炼的学习策略来实现。开发这样的策略本身就是一个研究过程:研究者选择架构和超参数,运行实验,并必须判断一个明显的改进是真实的还是仅仅是种子噪声。本文提出了AutoResearch框架,其中大语言模型自主驱动这一循环,用于航空航天控制问题,并结合了一个内置在循环中的可信层,该层根据问题自身测量的种子噪声对每个报告的结果进行认证。语言模型仅作为离线研究代理,负责开发控制策略;它产生的训练策略随后部署在航天器上,而模型本身从不操作飞行器。在每次迭代中,代理读取自然语言描述的问题描述和运行历史,对训练脚本提出一次编辑,执行它,并记录结果。任何报告的结果在通过相同的三项检查之前不会被认可:测量的每个问题的种子噪声、最佳配置的重新播种验证,以及代理编辑的留一法剪枝。相同的循环被原样应用于两个航空航天控制问题:Clohessy-Wiltshire相对交会问题和带有安全约束的避碰对接问题(经过禁飞区),每个问题都针对已知的最优控制基准进行了校准。在这两个问题中,经过审计的策略以多个标准差超过了测量的种子噪声;对相同参数的未定向搜索则没有。在对接问题上,差距变得明显:未定向搜索没有产生可行的策略,而学习到的策略在每个种子上都保持在禁飞区之外。

英文摘要

Spacecraft guidance, navigation, and control functions are increasingly realized as learned policies distilled from expert solvers. Developing such a policy is itself a research process: an investigator selects an architecture and hyperparameters, runs experiments, and must determine whether an apparent improvement is genuine or merely seed noise. This paper presents AutoResearch, a framework in which a large language model autonomously drives that loop for aerospace control problems, coupled with a credibility layer, built into the loop, that certifies each reported result against the problem's own measured seed noise. The language model serves only as the offline research agent that develops the control policy; the trained policy it produces is then deployed onboard the spacecraft, while the model itself never operates the vehicle. At each iteration the agent reads a plain-language problem description and the run history, proposes a single edit to the training script, executes it, and logs the outcome. No reported result is credited until it passes the same three checks: measured per-problem seed noise, reseeded verification of the best configuration, and leave-one-out pruning of the agent's edits. The same loop is applied, unchanged, to two aerospace control problems: a Clohessy-Wiltshire relative rendezvous and a safety-constrained collision-avoidance docking past a keep-out zone, each calibrated against a known optimal control benchmark. In both, the audited policy clears the measured seed noise by many standard deviations; an undirected search over the same parameters does not. On the docking problem the gap becomes categorical: undirected search yields no feasible policy, while the learned policy stays outside the keep-out zone on every seed.
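
The first of the three checks, comparing an apparent improvement against measured seed noise, can be illustrated with a minimal sketch. The function name, the threshold `k`, the "higher score is better" convention, and the scores are all hypothetical; the paper's actual audit procedure (including reseeded verification and leave-one-out pruning) is not specified in the abstract and is not reproduced here.

```python
import statistics


def clears_seed_noise(baseline_scores, candidate_scores, k=3.0):
    """Return True if the candidate's mean score exceeds the baseline's mean
    by more than k standard deviations of the baseline's seed-to-seed spread.

    Assumes higher scores are better (a convention chosen for this sketch).
    """
    noise = statistics.stdev(baseline_scores)  # sample std across seeds
    gain = statistics.mean(candidate_scores) - statistics.mean(baseline_scores)
    return gain > k * noise


# Hypothetical scores, one entry per random seed.
baseline = [0.70, 0.72, 0.71, 0.69, 0.73]
candidate_small = [0.72, 0.71, 0.73, 0.70, 0.74]  # gain comparable to the noise
candidate_large = [0.90, 0.91, 0.89, 0.92, 0.90]  # gain far above the noise

small_is_credited = clears_seed_noise(baseline, candidate_small)
large_is_credited = clears_seed_noise(baseline, candidate_large)
```

With these made-up numbers, only the large improvement is credited; the small one is indistinguishable from rerunning the baseline with a different seed.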

2606.20022 2026-06-19 stat.ML cs.LG math.OC 交叉投稿

Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

具有有界噪声的随机线性上下文赌博机:一种集合成员方法

Haonan Xu, Yingying Li

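AI summary For stochastic linear contextual bandits with bounded reward noise, proposes the SME-OFU algorithm, based on set-membership estimation and the optimism principle, which achieves an $O(\log T)$ regret bound, improving on the optimal $\tilde{O}(\sqrt{T})$ bound under sub-Gaussian noise.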

Comments 23 pages, 1 figure

详情
AI中文摘要

本文考虑具有有界奖励噪声的随机线性上下文赌博机(SLCB)。现有工作通常假设次高斯奖励噪声和有界期望奖励,在此条件下最优遗憾界关于时间T为$\tilde{O}(\sqrt{T})$。然而,在许多应用中,实现/观测到的奖励也自然有界,这意味着奖励噪声有界。有界噪声比次高斯条件更具信息性,但在SLCB文献中尚未被明确利用。本文通过利用一种称为集合成员估计(SME)的不确定性量化方法,并应用面对不确定性的乐观原则(OFU),提出了一种新颖的算法SME-OFU。我们的算法享有改进的遗憾界$O(\log T)$。注意,这并不与次高斯噪声下现有的最优界$\tilde{O}(\sqrt{T})$矛盾,因为有界噪声是更强的条件。最后,仿真表明,当奖励噪声有界时,SME-OFU相对于为次高斯噪声设计的基准算法在经验上有所改进。

英文摘要

This paper considers stochastic linear contextual bandits (SLCB) with bounded reward noise. Existing works typically assume sub-Gaussian reward noise and bounded expected rewards, under which the optimal regret bound scales as $\tilde{O}(\sqrt{T})$ in terms of horizon $T$. However, in many applications, realized/observed rewards are also naturally bounded, implying bounded reward noise. Bounded noise is more informative than the sub-Gaussian condition but has not been leveraged explicitly in the SLCB literature. In this paper, we propose a novel algorithm SME-OFU by utilizing an uncertainty quantification method called set-membership estimation (SME) and applying the principle of optimism in the face of uncertainty (OFU). Our algorithm enjoys an improved regret bound $O(\log T)$. Notice that this does not contradict the existing optimal bound $\tilde{O}(\sqrt{T})$ for sub-Gaussian noise because bounded noise is a stronger condition. Finally, simulations show empirical improvements of SME-OFU over a benchmark algorithm designed for sub-Gaussian noise when the reward noise is bounded.
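
The idea behind set-membership estimation can be sketched in the simplest case of a scalar parameter. The snippet below is an illustration under assumptions made here (scalar model $r_t = \theta x_t + w_t$ with $|w_t| \le W$, hypothetical function names and prior range), not the SME-OFU algorithm from the paper: each observation rules out every $\theta$ that would need noise larger than $W$, so the feasible set is an intersection of intervals.

```python
import random


def sme_interval(contexts, rewards, noise_bound, lo=-10.0, hi=10.0):
    """Intersect the constraints |r - theta * x| <= noise_bound over all samples.

    Returns the interval [lo, hi] of scalar parameters consistent with the data.
    The initial range [-10, 10] is an arbitrary prior bound for this sketch.
    """
    for x, r in zip(contexts, rewards):
        if x == 0:
            continue  # a zero context carries no information about theta
        a, b = (r - noise_bound) / x, (r + noise_bound) / x
        if a > b:  # dividing by a negative context flips the interval
            a, b = b, a
        lo, hi = max(lo, a), min(hi, b)
    return lo, hi


def optimistic_value(x, lo, hi):
    """OFU-style score: the best reward any feasible theta could give for context x."""
    return max(lo * x, hi * x)


random.seed(0)
theta_true, W = 1.5, 0.2
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
rs = [theta_true * x + random.uniform(-W, W) for x in xs]
lo, hi = sme_interval(xs, rs, W)
```

Because the noise is bounded, the true parameter can never be excluded, and the interval can shrink much faster than a confidence interval built only from a sub-Gaussian tail bound.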

2606.19878 2026-06-19 cs.LG math.OC stat.ML 交叉投稿

On the Oracle Complexity of Interpolation-Based Gradient Descent

基于插值的梯度下降的预言复杂度

Dongmin Lee, William Lu, Anuran Makur

Affiliations * Purdue University

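AI summary Proposes piecewise polynomial interpolation-based gradient descent (PPI-GD), which queries the first-order oracle at equidistant points in the data domain to build polynomial interpolants that approximate the full gradient. The oracle complexity is analyzed for strongly convex and non-convex losses, and the method is shown to outperform several GD variants when the data dimension is bounded and the loss is sufficiently smooth.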

Comments 16 pages, 2 figures

详情
AI中文摘要

最近关于经验风险最小化(ERM)的一阶优化器的工作表明,可以利用ERM损失函数在训练数据中的光滑性(而非优化参数中的光滑性)来改进梯度下降(GD)方法的预言复杂度。在本文中,我们提出了一种不精确梯度方法——分段多项式插值梯度下降(PPI-GD),该方法通过在数据域中的等距点处查询一阶预言来近似每次迭代中的全梯度,从而在数据域的适当大小的块上构造所得梯度样本的多项式插值。我们分析了PPI-GD在强凸和非凸损失函数下的预言复杂度,其中数据空间维数以训练样本数量的多对数函数为界,并发现当损失函数足够光滑时,PPI-GD在关键区域优于几种GD变体。此外,我们的分析将双三次样条插值误差分析中的几种技术扩展到$d$变量张量积多项式插值的设置中,这可能对插值分析具有独立意义。

英文摘要

Recent work on first-order optimizers for empirical risk minimization (ERM) has suggested that smoothness of ERM loss functions in the training data, rather than in the optimization parameters, can be leveraged to improve the oracle complexity of gradient descent (GD) methods. In this paper, we propose an inexact gradient method, piecewise polynomial interpolation-based gradient descent (PPI-GD), which approximates the full gradient in each iteration by querying the first-order oracle at equidistant points in the data domain to construct polynomial interpolants of the resulting gradient samples over appropriately sized patches of the data domain. We analyze the oracle complexity of PPI-GD for strongly convex and non-convex loss functions when the data space dimension is bounded by a polylogarithmic function of the number of training samples, and find it to outperform several GD variants in key regimes when the loss function is sufficiently smooth. Furthermore, our analysis extends several techniques from the error analysis of bicubic spline interpolants to the setting of $d$-variate tensor product polynomial interpolants which may be of independent interest in interpolation analysis.
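
A one-dimensional toy version of the interpolation idea might look as follows. This is a sketch under simplifying assumptions made here (scalar data in $[0, 1)$, a hypothetical toy loss, piecewise linear rather than higher-degree interpolation), not the PPI-GD algorithm as analyzed in the paper. The point is that each iteration costs `num_nodes` oracle calls instead of one call per training sample.

```python
import math
import random


def grad_oracle(theta, z):
    """First-order oracle: d/dtheta of the toy loss 0.5 * (theta - sin(z))**2."""
    return theta - math.sin(z)


def interpolated_full_gradient(theta, data, num_nodes):
    """Approximate (1/n) * sum_i grad_oracle(theta, z_i) with num_nodes oracle calls.

    The oracle is queried at equidistant nodes in [0, 1]; the gradient at each
    data point is then read off a piecewise linear interpolant of those samples.
    """
    h = 1.0 / (num_nodes - 1)
    node_grads = [grad_oracle(theta, k * h) for k in range(num_nodes)]
    total = 0.0
    for z in data:
        k = min(int(z / h), num_nodes - 2)  # index of the patch containing z
        t = (z - k * h) / h                 # local coordinate within the patch
        total += (1.0 - t) * node_grads[k] + t * node_grads[k + 1]
    return total / len(data)


random.seed(0)
data = [random.random() for _ in range(1000)]  # 1000 training samples in [0, 1)

theta = 0.0
for _ in range(100):
    theta -= 0.5 * interpolated_full_gradient(theta, data, num_nodes=11)

# Exact minimizer of the toy empirical risk, for comparison.
theta_star = sum(math.sin(z) for z in data) / len(data)
```

Here 11 oracle calls per iteration replace 1000, and the interpolation error bounds how far the inexact method can land from `theta_star`; smoother dependence on the data permits fewer nodes or higher-degree interpolants.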

2606.19876 2026-06-19 cs.LG math.OC 交叉投稿

Global Convergence of Gradient Descent for Score Matching in Gaussian Mixtures via Reverse Fisher Divergence

通过反向Fisher散度实现高斯混合模型中得分匹配的梯度下降全局收敛

Alexander Tyurin

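AI summary Studies global convergence of gradient descent for fitting Gaussian mixture models under the reverse Fisher divergence. For a single-Gaussian teacher, convergence holds from arbitrary initialization; for a mixture teacher, under random initialization, each student component converges near its closest teacher component. Conditions for convergence in total variation distance are also given.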

详情
AI中文摘要

得分匹配问题是现代生成建模、扩散模型、拟合非归一化统计模型和逆问题中的核心训练目标。标准方法是最小化前向Fisher散度,其中期望相对于教师分布取。然而,最近结果表明,即使在简单的高斯混合模型设置中,该目标也可能导致不良且依赖初始化的收敛行为。本文研究另一种目标:反向Fisher散度,其中期望相对于学生分布取。我们分析梯度下降(GD)拟合高斯混合模型,并表明目标函数的这一改变导致显著更好的优化性质。首先,当教师分布是单个高斯分布且学生是固定权重和单位协方差的高斯混合模型时,我们证明了从任意初始化出发GD的全局收敛性。其次,我们将分析扩展到教师也是高斯混合模型的情况,并在全局随机初始化方案和目标均值满足$\widetilde{\Omega}(1)$-分离假设下证明了全局收敛保证。特别地,以高概率,每个学生分量收敛到其最近的教师分量,并且我们提供了学生分布在全变差距离下收敛的条件。我们的证明依赖于基于Lyapunov的梯度下降动力学新分析,表明反向Fisher散度比前向Fisher散度具有更有利的优化景观。

英文摘要

The score matching problem is a central training objective in modern generative modeling, diffusion models, fitting unnormalized statistical models, and inverse problems. A standard approach is to minimize the forward Fisher divergence, where the expectation is taken with respect to the teacher distribution. However, recent results show that even in simple Gaussian mixture model settings, this objective can lead to undesirable and initialization-dependent convergence behavior. In this paper, we study an alternative objective: the reverse Fisher divergence, where the expectation is taken with respect to the student distribution. We analyze gradient descent (GD) for fitting Gaussian mixture models and show that this change in the objective leads to significantly better optimization properties. First, when the teacher distribution is a single Gaussian and the student is a Gaussian mixture model with fixed weights and identity covariances, we prove the global convergence of GD from arbitrary initializations. Second, we extend the analysis to the case where the teacher is also a Gaussian mixture model and prove global convergence guarantees under a global random initialization scheme and a $\widetilde{\Omega}(1)$-separation assumption on the target means. In particular, with high probability, each student component converges near its closest teacher component, and we provide conditions under which the student distribution converges in total variation distance. Our proofs rely on a new Lyapunov-based analysis of the gradient descent dynamics, showing that the reverse Fisher divergence has a much more favorable optimization landscape than the forward Fisher divergence.
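
The two objectives contrasted in the abstract differ only in which distribution the expectation is taken under. Writing $p$ for the teacher density and $q_\theta$ for the student density (symbols chosen here, not taken from the paper, and with normalization constants such as a factor $\tfrac12$ omitted since conventions vary), they can be written as:

```latex
% Notation assumed for illustration: p = teacher density, q_\theta = student density.
\text{forward:}\quad
F(p \,\|\, q_\theta) = \mathbb{E}_{x \sim p}
  \bigl\| \nabla_x \log p(x) - \nabla_x \log q_\theta(x) \bigr\|^2 ,
\qquad
\text{reverse:}\quad
F(q_\theta \,\|\, p) = \mathbb{E}_{x \sim q_\theta}
  \bigl\| \nabla_x \log q_\theta(x) - \nabla_x \log p(x) \bigr\|^2 .
```

In the reverse form the sampling distribution itself depends on $\theta$, which changes the gradient dynamics even though the integrand is the same.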

2606.19751 2026-06-19 cs.DB math.OC 交叉投稿

DeQL: A Decision Query Language for Prescriptive Analytics over Relational Data

DeQL:一种用于关系数据规范性分析的决策查询语言

Matteo Brucato, Fjodor Kholodkov, Soren Little, Jakob Mayer, Duc Nguyen

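AI summary DeQL extends SQL to support decision queries: two constructs, CREATE CANDIDATES and DECIDE, define the option space, constraints, and objective, covering decisions such as subset selection, allocation, and scheduling, with extensions for optimization under uncertainty and model scoring.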

详情
AI中文摘要

DeQL(决策查询语言)扩展了SQL以表达决策查询:给定从关系数据中提取的选项、策略约束和可测量的目标,DeQL查询计算出最佳行动方案。两个构造实现了这一扩展:CREATE CANDIDATES,定义来自关系源的选项空间;DECIDE,声明决策变量、命名约束以及针对这些变量的目标。该设计遵循SQL的原则:用户说明要优化的内容,而引擎选择如何求解;每个查询消费并产生关系;问题的结构对引擎保持可见。本文档规范了该语言(其设计原则、语法、形式文法及执行模型),并附有涵盖子集选择、分配、指派、调度以及多级聚合决策的示例,以及针对不确定性优化、内联模型评分和时间与质量受限求解的扩展。这是该规范的第一版;该语言正在积极开发中,本版本固定了后续修订将基于的核心构造。

英文摘要

DeQL (Decision Query Language) extends SQL to express decision queries: given options drawn from relational data, constraints from policy, and a measurable objective, a DeQL query computes the best course of action. Two constructs carry the extension: CREATE CANDIDATES, which defines the space of options from relational sources, and DECIDE, which declares decision variables, named constraints, and an objective over them. The design follows SQL's principles: the user states what to optimize while the engine chooses how to solve it, every query consumes and produces relations, and the structure of a problem stays visible to the engine. This document specifies the language (its design principles, syntax, formal grammar, and execution model) with examples spanning subset selection, allocation, assignment, scheduling, and decisions at multiple levels of aggregation, and extensions for optimization under uncertainty, inline model scoring, and time- and quality-bounded solving. It is the first version of the specification; the language is under active development, and this version fixes the core constructs on which later revisions will build.

2606.19695 2026-06-19 eess.SY cs.GT cs.SY math.OC 交叉投稿

A Unified Framework for Joint Sensor Placement and Scheduling for Intrusion Detection

入侵检测中联合传感器放置与调度的统一框架

Jayanth Bhargav, Mahsa Ghasemi, Shreyas Sundaram

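AI summary Proposes a unified framework that jointly optimizes sensor placement and orientation scheduling, using a game-theoretic utility function and weak submodularity to achieve near-optimal detection performance.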

Comments 27 pages, 4 figures

详情
AI中文摘要

我们考虑一个入侵检测任务,其中防御者必须联合优化传感器放置位置和方向,以最小化入侵者穿越受保护环境时被漏检的概率。我们将此问题分解为一个元问题(称为SensorPlacement)和一个嵌入的子问题(称为OrientationScheduling)。对于固定的传感器放置,OrientationScheduling子问题被建模为防御者和入侵者之间的两人零和博弈,其中防御者寻求已部署传感器的方向策略以最小化漏检概率,而入侵者则寻求路径选择策略以最大化该概率。由于防御者的策略空间随传感器数量和方向组合增长,通过标准线性规划求解博弈变得不可行。为此,我们开发了一种迭代且高效的均衡求解算法,该算法利用博弈收益函数的结构,并建立了收敛到博弈纳什均衡(NE)的理论保证。该NE值随后被用作SensorPlacement元问题中的效用度量。我们证明了这个基于博弈值的效用函数在传感器放置集合上是弱子模的,并提出了一个具有近最优性保证的贪婪放置算法。据我们所知,这是第一个将博弈论效用设计与(弱)子模优化相结合的统一框架,实现了传感器放置和方向调度的原则性联合优化。通过大量仿真,我们证明所提出的方法实现了近最优的检测性能,同时与基线相比显著减少了计算时间。

英文摘要

We consider an intrusion detection task in which a defender must jointly optimize sensor placement locations and orientations to minimize the probability of missed detection of an intruder traversing a protected environment. We decompose this problem into a meta problem, termed SensorPlacement, and an embedded subproblem, termed OrientationScheduling. The OrientationScheduling subproblem, for a fixed sensor placement, is modeled as a 2-player zero-sum game between the defender and the intruder, where the defender seeks an orientation strategy for the deployed sensors to minimize the probability of missed detection, while the intruder seeks a path selection strategy to maximize it. Since the defender's strategy space grows combinatorially with the number of sensors and orientations, solving the game via standard linear programming becomes prohibitive. To this end, we develop an iterative and efficient equilibrium-seeking algorithm that exploits the structure of the game's payoff function and establishes theoretical guarantees for convergence to the Nash equilibrium (NE) of the game. This NE value is then used as a utility measure in the SensorPlacement meta problem. We show that this game-value-based utility function is weakly submodular over the set of sensor placements and propose a greedy placement algorithm with near-optimality guarantees. To our knowledge, this is the first unified framework to integrate game-theoretic utility design with (weak) submodular optimization, enabling principled joint optimization of sensor placement and orientation scheduling. Through extensive simulations, we demonstrate that the proposed approach achieves near-optimal detection performance while significantly reducing computation time compared to baselines.
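
The outer greedy step can be sketched generically. In the paper, the utility of a placement is derived from the Nash equilibrium value of the orientation game; the sketch below substitutes a hypothetical coverage-count utility (which is submodular, a stronger property than the weak submodularity shown in the paper) purely to make the loop runnable. All names and data are illustrative.

```python
def greedy_placement(candidates, utility, budget):
    """Pick up to `budget` sites, each time adding the site with the largest
    marginal gain in `utility` over the sites already chosen."""
    chosen = []
    for _ in range(budget):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        base = utility(chosen)
        best = max(remaining, key=lambda c: utility(chosen + [c]) - base)
        chosen.append(best)
    return chosen


# Hypothetical stand-in utility: number of grid cells seen by the placed sensors.
coverage = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5, 6, 7},
    "D": {1, 7},
}


def covered_cells(placement):
    cells = set()
    for site in placement:
        cells |= coverage[site]
    return len(cells)


placement = greedy_placement(list(coverage), covered_cells, budget=2)
```

In the paper's setting, each call to `utility` would itself require solving the orientation game for the candidate placement, which is why an efficient equilibrium solver matters for the overall running time.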

2606.19521 2026-06-19 cs.LG math.OC 交叉投稿

Interactive Pareto navigation for deep multi-task learning

深度多任务学习的交互式帕累托导航

Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz

Affiliations * Department of Computer Science, TU Dortmund, Dortmund, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence

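AI summary Proposes the Preference Pareto Exploration (PPE) framework, a predictor-corrector method that steers along tangent directions of the Pareto manifold according to the decision maker's preference and uses a Krylov subspace method to avoid explicit Hessian computations, enabling efficient interactive multi-objective optimization.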

详情
AI中文摘要

在多任务学习中,处理越来越多的目标在计算资源和决策者选择适当权衡的能力方面都很快变得具有挑战性。因此,一种广泛使用的方法是通过加权和将各个损失聚合到单个损失函数中。这通常由于帕累托前沿的形状而无法捕捉决策者的偏好,或者需要多次调整和计算,这在深度学习应用中变得过于昂贵。为了解决这些问题,我们引入了一个新颖的框架,偏好帕累托探索(PPE),它在交互式探索过程中强制执行决策者的偏好,同时考虑帕累托集的几何形状。PPE基于预测-校正方法,该方法沿着帕累托最优解流形的切线方向执行预测步骤,遵循决策者的偏好。随后的校正步骤产生反映该偏好的新权衡。为了在表征流形切空间时避免显式的Hessian计算,我们采用了一种仅依赖于矩阵-向量乘积的Krylov子空间方法。这些乘积可以通过自动微分高效获得,确保了整个优化过程的效率和鲁棒性。该方法的有效性和性能通过玩具问题和深度学习示例进行了展示。

英文摘要

In multi-task learning, handling an increasing number of objectives can quickly become challenging, both in terms of the computational resources and the decision maker's capacity to choose appropriate trade-offs. A widely used approach is thus to aggregate the individual losses into a single loss function by a weighted sum. This often either fails to capture the decision maker's preferences as a result of the shape of the Pareto front, or requires multiple adjustments and computations, which becomes prohibitively expensive in deep learning applications. To address these issues, we introduce a novel framework, Preference Pareto Exploration (PPE), which enforces the decision maker's preferences while accounting for the geometry of the Pareto set in an interactive exploration process. PPE is based on a predictor-corrector method that performs predictor steps tangential to the manifold of Pareto-optimal solutions, following the decision maker's preference. The subsequent corrector step results in a new trade-off reflecting this preference. To avoid explicit Hessian computations when characterizing the tangent space of the manifold, we employ a Krylov subspace method that relies solely on matrix-vector products. These products can be efficiently obtained via automatic differentiation, ensuring both efficiency and robustness throughout the optimization process. The method's functionality and performance are demonstrated using both toy problems and examples from deep learning.
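
The matrix-free idea can be illustrated with conjugate gradient, one common Krylov subspace method, chosen here as a stand-in since the abstract does not name the specific solver. The sketch solves $Hv = b$ while touching the Hessian only through Hessian-vector products. Two further assumptions are made for the sake of a dependency-free example: the products come from central differences of the gradient rather than automatic differentiation, and the toy Hessian is symmetric positive definite, which conjugate gradient requires and which the paper's tangent-space system need not satisfy.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def grad(x):
    """Gradient of the toy objective f(x) = 0.5 * x^T A x with A = [[4, 1], [1, 3]]."""
    return [4.0 * x[0] + 1.0 * x[1], 1.0 * x[0] + 3.0 * x[1]]


def hessian_vector_product(x, v, eps=1e-4):
    """Approximate H(x) v by central differences of the gradient.

    Stand-in for an autodiff Hessian-vector product; the Hessian is never formed.
    """
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(grad(xp), grad(xm))]


def conjugate_gradient(matvec, b, max_iters=50, tol=1e-8):
    """Solve H v = b for symmetric positive definite H, given only v -> H v."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - H x for the initial guess x = 0
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(max_iters):
        if rs ** 0.5 < tol:
            break
        hp = matvec(p)
        alpha = rs / dot(p, hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


x0 = [0.5, -0.2]  # point at which the Hessian is (implicitly) evaluated
solution = conjugate_gradient(lambda v: hessian_vector_product(x0, v), [1.0, 2.0])
```

For a network with millions of parameters the same loop applies unchanged; only `matvec` is swapped for an autodiff Hessian-vector product, so memory stays linear in the parameter count.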

2606.19368 2026-06-19 math.NA cs.LG cs.NA math.OC 交叉投稿

Neural Architectures as Functional Priors in Physics-Informed Control Problems

物理信息控制问题中的神经架构作为函数先验

Sonia Rubio Herranz, Fernando Carlos López Hernández, Antonio López Montes

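AI summary Studies neural architectures as implicit functional priors in control problems governed by ordinary differential equations, finding that different architectures (MLP versus Fourier-based KAN) produce qualitatively different controls under identical conditions and exhibit functional specialization.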

Comments 17 pages, 6 figures. Physics-informed neural networks, optimal control, spectral bias, Kolmogorov-Arnold Networks

详情
AI中文摘要

在这项工作中,我们研究了神经架构作为隐式函数先验在由常微分方程控制的问题中的作用。我们的目标不是关注高度复杂的问题,而是在最简单的物理可解释设置中研究受控动力系统中依赖于架构的效应。特别地,我们研究了一个受控的线性RLC电路和一个非线性Duffing型动力系统。这两个系统首先通过经典最优控制公式进行分析,然后通过基于PINN的方法进行分析。我们比较了多层感知器(MLP)和基于傅里叶的KAN类架构的不同组合,并分析了它们对所得控制的影响。数值实验表明,即使在相同的控制方程、损失函数、初始和目标状态、训练参数以及物理约束下,不同的架构选择也会系统地产生定性不同的控制。学习到的解在谱结构、平滑性、能量分布和相空间行为方面出现显著差异。这项工作的一个核心观察是,当神经架构被允许足够的自由度来塑造学习到的控制结构时,会出现功能特化现象。更具体地说,在我们考虑的系统中,基于傅里叶的架构倾向于产生具有更丰富振荡内容的轨迹,而更平滑的低频偏置架构倾向于产生更规则且能量效率更高的控制。这表明控制问题的不同功能组件可能由不同的神经架构更有效地处理,从而导致状态表示和控制生成之间的隐式特化。

英文摘要

In this work we investigate the role of neural architectures as implicit functional priors in control problems governed by ordinary differential equations. Rather than focusing on highly complex problems, our objective is to investigate architecture-dependent effects in controlled dynamical systems within the simplest physically interpretable settings possible. In particular, we study a controlled linear RLC electrical circuit and a nonlinear Duffing-type dynamical system. Both systems are analyzed first through classical optimal-control formulations and later through PINN-based approaches. We compare different combinations of multilayer perceptrons (MLPs) and Fourier-based KAN-like architectures, and analyze their influence on the resulting controls. The numerical experiments suggest that different architectural choices systematically generate qualitatively distinct controls, even under identical governing equations, loss functionals, initial and target states, training parameters and physical constraints. Significant differences appear in the spectral structure, smoothness, energy distribution, and phase-space behavior of the learned solutions. A central observation of this work is the emergence of a functional specialization phenomenon when the neural architectures are allowed sufficient freedom to shape the structure of the learned controls. More specifically, in the systems considered here, Fourier-based architectures tend to produce trajectories with richer oscillatory content, whereas smoother low-frequency-biased architectures tend to generate more regular and energetically efficient controls. This suggests that different functional components of the control problem may be handled more efficiently by different neural architectures, leading to an implicit specialization between state representation and control generation.

2606.18679 2026-06-19 cs.DS cs.GT cs.LG math.OC 交叉投稿

Fair Online Resource Allocation

公平在线资源分配

Christopher En, Yuri Faenza, Andrea Lodi, Gonzalo Muñoz

Affiliations * Columbia University, IEOR Department; Cornell Tech; Universidad de Chile

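AI summary Studies fairness in online resource allocation, proposing an algorithm based on dual mirror descent that enforces fairness constraints within batches and achieves sublinear regret; the trade-off between welfare and fairness is examined on refugee data.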

Comments 30 pages, 4 figures. To appear in the proceedings of EC 2026

详情
AI中文摘要

我们研究公平在线资源分配问题,其动机源于难民安置和航班调度等应用,其中代理顺序到达并必须分配到容量有限的设施。我们引入一个模型,在资源约束和Lipschitz公平性要求下最大化整体福利,该要求确保同一批次中到达的相似代理获得相似的预期结果。我们首先分析离线问题,证明最优公平分配的价值至少是最优不公平分配的$\Omega(1/\gamma)$倍,其中$\gamma$是公平系数,从而界定了公平的代价。对于在线设置,我们提出一种基于对偶镜像下降的算法,该算法在估计最优对偶变量的同时,在批次内强制执行公平约束。我们证明该算法相对于最优离线流体基准实现了亚线性遗憾。最后,我们使用难民经济项目的真实数据验证了理论结果,展示了算法的性能,并考察了福利最大化与公平执行之间的权衡。

英文摘要

We study the problem of fair online resource allocation, motivated by applications such as refugee resettlement and airline scheduling, where agents arrive sequentially and must be assigned to facilities with limited capacities. We introduce a model that maximizes the overall welfare subject to resource constraints and a Lipschitz fairness requirement, which ensures that similar agents arriving in the same batch receive similar expected outcomes. We first analyze the offline problem, proving that the value of the optimal fair allocation is at least an $\Omega(1/\gamma)$ fraction of the optimal unfair allocation, where $\gamma$ is the fairness coefficient, thereby bounding the price of fairness. For the online setting, we propose an algorithm based on dual mirror descent that enforces fairness constraints within batches while estimating optimal dual variables. We prove that this algorithm achieves sublinear regret relative to the optimal offline fluid benchmark. Finally, we validate our theoretical results using real-world data from the Refugee Economies Programme, demonstrating the algorithm's performance and examining the trade-offs between welfare maximization and fairness enforcement.
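
The dual-price mechanism underlying such algorithms can be sketched without the fairness layer. The snippet below is a simplified illustration under assumptions made here: one agent per step instead of batches, a Euclidean mirror map (so the update is projected subgradient descent on the dual), no fairness constraints, and hypothetical names and data. It is not the algorithm from the paper.

```python
import random


def dual_descent_allocate(values, capacities, step=0.05):
    """Assign sequentially arriving agents to facilities using dual prices.

    values[t][j] is the welfare of placing agent t at facility j.
    Each facility j carries a price mu[j]; an agent goes to the facility with
    the best price-adjusted value, or is left unassigned if none is positive.
    """
    horizon, m = len(values), len(capacities)
    rho = [c / horizon for c in capacities]  # target usage per step
    mu = [0.0] * m                           # dual prices, kept non-negative
    remaining = list(capacities)
    assignment = []
    for v in values:
        choice = None
        open_sites = [j for j in range(m) if remaining[j] > 0]
        if open_sites:
            best = max(open_sites, key=lambda k: v[k] - mu[k])
            if v[best] - mu[best] > 0:
                choice = best
                remaining[best] -= 1
        assignment.append(choice)
        # Raise the price of the facility just used, lower the others.
        for j in range(m):
            used = 1.0 if choice == j else 0.0
            mu[j] = max(0.0, mu[j] + step * (used - rho[j]))
    return assignment, mu


random.seed(1)
capacities = [50, 50, 50]
values = [[random.random() for _ in capacities] for _ in range(300)]
assignment, prices = dual_descent_allocate(values, capacities)
usage = [sum(1 for a in assignment if a == j) for j in range(len(capacities))]
```

Prices rise for facilities that fill faster than their per-step budget, which makes the rule hold capacity back for later, higher-value agents instead of assigning greedily.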