arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.05140 2026-06-04 math.AP math-ph math.MP math.PR stat.ML

Phase transitions for the noisy transformer model in arbitrary dimension

Kyunghoo Mun, Matthew Rosenzweig

AI summary: Studies the McKean-Vlasov free energy of the unnormalized self-attention (USA) noisy transformer model on spheres of arbitrary dimension, proves a sharp dichotomy for global minimizers, and gives the critical conditions separating continuous from discontinuous phase transitions.

Comments: 18 pages

Abstract

We study the McKean--Vlasov free energy on the unit sphere associated with the unnormalized self-attention (USA) model for noisy transformer dynamics. We prove a sharp global-minimizer dichotomy in every dimension $d\ge2$. There is a unique $\beta_*^{(d)}>0$ such that \begin{equation*} \frac{I_{d/2+1}(\beta_*^{(d)})}{I_{d/2}(\beta_*^{(d)})}=\frac1d, \end{equation*} where $I_\nu$ is the modified Bessel function of the first kind. For $0<\beta\le \beta_*^{(d)}$, the uniform density remains the unique global minimizer up to the linear-stability threshold \begin{equation*} K_\#^{(d)}(\beta)=\frac{\beta^{d/2}}{2^{d/2}\Gamma(d/2)I_{d/2}(\beta)}, \end{equation*} and the phase transition is continuous. For $\beta>\beta_*^{(d)}$, the uniform density is not globally minimizing at $K_\#^{(d)}(\beta)$, so the critical coupling satisfies $K_c<K_\#^{(d)}(\beta)$ and the transition is discontinuous. This result generalizes the authors' recent $d=2$ work arXiv:2604.16288 to arbitrary dimension. The proof uses the sharp Beckner--Onofri/logarithmic Hardy--Littlewood--Sobolev (HLS) inequality on the sphere, together with a Funk--Hecke/Bessel coefficient computation and a degree-two quartic obstruction.
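
Because the threshold $\beta_*^{(d)}$ is defined only implicitly, a small numerical sketch may help in reading the two formulas above. It relies only on the Bessel-ratio equation and the expression for $K_\#^{(d)}(\beta)$ quoted in the abstract; the function names, the power-series evaluation of $I_\nu$, and the bisection bracket are illustrative assumptions, not taken from the paper, and the sketch is intended for moderate $d$ only.

```python
import math

def bessel_i(nu, x, tol=1e-17, max_terms=500):
    """Modified Bessel function of the first kind I_nu(x), x > 0, via its power series."""
    half = 0.5 * x
    term = half ** nu / math.gamma(nu + 1.0)      # k = 0 term of the series
    total = term
    for k in range(max_terms):
        # ratio of consecutive series terms: (x/2)^2 / ((k+1)(k+1+nu))
        term *= half * half / ((k + 1) * (k + 1 + nu))
        total += term
        if term < tol * total:
            break
    return total

def bessel_ratio(d, beta):
    """Left-hand side of the threshold equation: I_{d/2+1}(beta) / I_{d/2}(beta)."""
    return bessel_i(d / 2 + 1, beta) / bessel_i(d / 2, beta)

def beta_star(d, lo=1e-6, hi=20.0, iters=200):
    """Bisection for bessel_ratio(d, beta) = 1/d on an assumed bracket [lo, hi]."""
    def f(b):
        return bessel_ratio(d, b) - 1.0 / d
    assert f(lo) < 0.0 < f(hi), "bracket does not contain the root"
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def k_sharp(d, beta):
    """Linear-stability threshold K_#^{(d)}(beta) as written in the abstract."""
    nu = d / 2
    return beta ** nu / (2.0 ** nu * math.gamma(nu) * bessel_i(nu, beta))

if __name__ == "__main__":
    for d in (2, 3, 5, 10):
        b = beta_star(d)
        print(d, b, k_sharp(d, b))
```

Bisection is used here because the ratio $I_{\nu+1}/I_\nu$ increases from 0 to 1 on $(0,\infty)$, so a sign change on the bracket pins down the unique root.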

2606.05103 2026-06-04 cs.LG astro-ph.IM cs.CV stat.ML

Identifying Gems from Roman RAPIDly

Karan Gandhi, Ashish A. Mahabal, Jacob E. Jencson, Russ R. Laher, Ben Rusholme, Lin Yan, Ryan M. Lau, Schuyler D. Van Dyk, Mansi M. Kasliwal

AI summary: To address the absence of real data from the Roman Space Telescope, proposes the machine learning model RuBR and a general methodology for separating genuine transient and variable sources from bogus detections in the RAPID pipeline; experiments indicate that the approach is robust for the Roman era.

Comments: 15 pages, 10 figures, Submitted to the Publications of the Astronomical Society of the Pacific

Abstract

The Nancy Grace Roman Space Telescope (Roman), set for launch as early as September 2026, will conduct wide-field infrared imaging surveys with unprecedented spatial resolution and cadence, enabling the discovery of millions of astronomical transients. Hence, it is necessary to have automated pipelines for generating alerts in place so that the telescope can begin discovering reliable transients and variable objects soon after it is launched. However, no real Roman data currently exist, making the development of such pipelines difficult. In this work, we present a machine learning model $RuBR$ and a general methodology for distinguishing genuine transient and variable detections from spurious (bogus) detections within the RAPID pipeline. In particular, we present three models using this methodology: $RuBR_{comb}$ trained and tested on combined locally injected and OpenUniverse2024 transients, $RuBR_{loc}$ trained on locally injected transients and tested on OpenUniverse2024 transients, and $RuBR_{DA}$ that combines locally injected transients with a fraction of OpenUniverse2024 transients in domain-adaptation mode for training. This paves the way for strategies to adapt the $RuBR_{comb}$ model to real observations in the absence of any ground-truth labels during the early phases of the Roman mission. While the image differencing pipeline continues to be improved, our experimental results demonstrate the effectiveness of the proposed approach and its promise for robust real-bogus classification in the Roman era.

2606.05072 2026-06-04 math.ST stat.TH

Adaptive Sequential Change Detection using Mixtures of Predictive Distributions

Topi Halme, H. Vincent Poor, Visa Koivunen

AI summary: For change detection in sequences of independent observations with an unknown post-change distribution, proposes PM-CuSum, an algorithm built on a mixture of sliding-window predictive distributions, which attains first-order asymptotic optimality with a smaller remainder term in the asymptotic delay.

Abstract

This paper studies the problem of detecting a change in the distribution of a sequence of independent observations when the post-change distribution is unknown. We propose a novel change detection algorithm, termed Predictive-Mixture CuSum (PM-CuSum), which combines predictive distributions constructed from sliding windows of different lengths within a CuSum recursion. The predictive distributions are aggregated using adaptive weights based on their recent predictive performance. We show that PM-CuSum achieves first-order asymptotic optimality under mild conditions, and that its asymptotic delay bound has a smaller remainder order than what is achieved by any fixed (even oracle) window. Numerical simulations demonstrate that PM-CuSum performs well compared to existing methods. Moreover, it is demonstrated that forming likelihood ratios using full predictive distributions can substantially improve performance compared to plug-in likelihoods.
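
To make the ingredients named in the abstract concrete (sliding-window predictive distributions, adaptive mixture weights, a CuSum recursion), here is a minimal sketch. It is not the authors' PM-CuSum: the pre-change law N(0, 1), the Gaussian predictive N(window mean, 1 + 1/w), the softmax weighting with a forgetting factor, and all parameter values are assumptions chosen for illustration.

```python
import math

def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def pm_cusum_sketch(xs, windows=(2, 5, 10, 20), threshold=10.0, forget=0.9):
    """Return the first index at which the statistic crosses `threshold`, else None.

    Assumptions (not from the paper): pre-change law N(0, 1); a window of
    length w yields the predictive N(window mean, 1 + 1/w); mixture weights
    are a softmax of exponentially discounted past log predictive scores.
    """
    scores = {w: 0.0 for w in windows}    # discounted log predictive scores
    stat = 0.0                            # CuSum statistic
    for t, x in enumerate(xs):
        avail = [w for w in windows if t >= w]
        if not avail:
            continue                      # not enough history for any window
        log_pred = {}
        for w in avail:
            past = xs[t - w:t]
            log_pred[w] = log_normal_pdf(x, sum(past) / w, 1.0 + 1.0 / w)
        # softmax weights over the available windows (shifted for stability)
        top = max(scores[w] for w in avail)
        raw = {w: math.exp(scores[w] - top) for w in avail}
        norm = sum(raw.values())
        # log of the mixture predictive density at x
        m = max(log_pred.values())
        log_mix = m + math.log(
            sum(raw[w] / norm * math.exp(log_pred[w] - m) for w in avail)
        )
        # CuSum recursion driven by the mixture log-likelihood ratio
        stat = max(0.0, stat) + log_mix - log_normal_pdf(x, 0.0, 1.0)
        if stat >= threshold:
            return t
        for w in avail:                   # weights are updated after being used
            scores[w] = forget * scores[w] + log_pred[w]
    return None
```

Using the full predictive variance 1 + 1/w rather than the plug-in variance 1 mirrors the abstract's distinction between predictive and plug-in likelihoods, under the Gaussian assumption made here.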

2606.05046 2026-06-04 cs.LG stat.ML

Graph Cascades: Contagion-Based Mesoscopic Rewiring for Structure-Aware Graph Machine Learning

Meher Chaitanya, My Le, Luana Ruiz

AI summary: Proposes Graph Cascades, a mesoscopic rewiring strategy based on contagion diffusion that builds an auxiliary graph to help graph neural networks and graph transformers capture intermediate-scale structure; it improves several backbones on node classification, and the paper characterizes theoretically when rewiring helps.

Abstract

We introduce Graph Cascades, a mesoscopic rewiring strategy for Graph Neural Networks (GNNs) and Graph Transformers (GTs) that captures intermediate-scale graph structure beyond purely local edges or fully global attention. Using contagion-based diffusion processes, Graph Cascades constructs, in O(|V|+|E|) time, an auxiliary graph where node pairs supported by repeated multi-hop reinforcement are promoted to direct neighbors. We theoretically characterize when reinforcement-based rewiring helps: sufficient conditions under which reinforcement-based edge selection is more label-aligned than direct adjacency, an SBM witness in which two-hop reinforcement is perfectly homophilic, and a formalization of mesoscopic connectivity via graph effective resistance. Empirically, across node-classification benchmarks, Graph Cascades improves multiple GNN and sparse-GT backbones, with the most reliable gains observed on heterophilic and moderate- to high-degree homophilic graphs. The theoretical conditions also identify regimes where mesoscopic rewiring is unlikely to be beneficial -- low-degree regular graphs and graphs with structural bottlenecks -- and these predictions match the observed failures. We additionally observe tight correlations between performance and structural properties in the rewired graphs.
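
The idea of promoting node pairs "supported by repeated multi-hop reinforcement" can be illustrated with a deliberately simplified stand-in: count the two-hop paths between non-adjacent nodes and add an edge when the count reaches a threshold. This is not the contagion process of the paper and does not run in O(|V|+|E|) time; the function name and the `min_support` parameter are hypothetical.

```python
from itertools import combinations

def two_hop_rewire(adj, min_support=2):
    """Return new edges between non-adjacent pairs with >= min_support two-hop paths.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    """
    support = {}
    for nbrs in adj.values():
        # every pair of neighbours of a node is joined by one two-hop path through it
        for u, v in combinations(sorted(nbrs), 2):
            if v not in adj[u]:
                support[(u, v)] = support.get((u, v), 0) + 1
    return {pair for pair, count in support.items() if count >= min_support}

if __name__ == "__main__":
    cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(two_hop_rewire(cycle))
```

On a 4-cycle each diagonal pair is reinforced by two distinct two-hop paths, so both diagonals are promoted at the default threshold.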

2606.05026 2026-06-04 stat.ME stat.AP

Removal of Multivariate Environmental Influences in Structural Health Monitoring through Conditional Covariances and Supervised Learning

Lizzie Neumann, Philipp Wittenberg, Jan Gertheiss

AI summary: Because environmental factors in structural health monitoring affect higher-order statistical moments of the outputs, proposes and compares a nonparametric kernel estimator, a random forest, a semiparametric additive model, and a deep learning approach for identifying and quantifying multivariate confounding effects, and shows how conditional covariance matrices can be used in an SHM pipeline.

Comments: 25 pages, 8 figures

Abstract

In structural health monitoring (SHM) systems, data is collected from a multitude of sensors measuring, for example, vibration or strain in the structure, along with additional features that capture environmental or operational information. It is well known that changes in the measured sensor outputs do not necessarily originate from structural damage but are often induced by environmental changes. One popular approach to account for these effects is regressing the system outputs on the confounding factors, also known as "response surface modeling". Afterward, the predicted values are subtracted from the observed ones to obtain corrected data with the environmental effects (supposedly) removed. However, the evaluation of real-world SHM data shows that environmental conditions may affect not only the expected output values but also higher-order statistical moments, particularly the variances of and the covariances and correlations between the output quantities, such as eigenfrequencies of different modes or strain sensors at different locations. By construction, the (supervised) machine learning techniques commonly used for response surface modeling cannot account for those higher-order effects. To address these issues, we present and discuss several approaches for identifying and quantifying multivariate confounding effects on output covariances and correlations: a nonparametric, kernel-based estimator, a random forest, a semiparametric additive model, and a deep learning approach. Furthermore, we show how the resulting conditional covariance matrices can be used in an SHM pipeline. We compare the competing methods on both artificial data and real-world load test data from the Vahrendorfer Stadtweg bridge in Hamburg, Germany, as well as eigenfrequency data from the railway bridge KW51 near Leuven, Belgium.
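
Of the four approaches listed, the kernel-based one is the easiest to sketch. The following is a generic Nadaraya-Watson style estimate of the conditional covariance of the outputs given a scalar covariate such as temperature. The Gaussian kernel, the scalar covariate, and the function name are assumptions; the paper's estimator may differ in its details.

```python
import math

def conditional_covariance(xs, ys, x0, bandwidth):
    """Kernel-weighted estimate of E[Y | X = x0] and Cov(Y | X = x0).

    xs: scalar covariate values; ys: output vectors of equal length.
    """
    kernel = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(kernel)
    w = [k / total for k in kernel]                 # normalized kernel weights
    p = len(ys[0])
    mean = [sum(wi * y[j] for wi, y in zip(w, ys)) for j in range(p)]
    cov = [
        [
            sum(wi * (y[a] - mean[a]) * (y[b] - mean[b]) for wi, y in zip(w, ys))
            for b in range(p)
        ]
        for a in range(p)
    ]
    return mean, cov

if __name__ == "__main__":
    # toy data: outputs positively correlated near x = 0, negatively near x = 10
    xs = [0.0, 0.0, 10.0, 10.0]
    ys = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
    print(conditional_covariance(xs, ys, 0.0, 0.5)[1])
    print(conditional_covariance(xs, ys, 10.0, 0.5)[1])
```

In the toy data the conditional mean is zero everywhere, so mean regression alone would remove nothing, while the covariance flips sign with the covariate. This is the kind of higher-order effect the abstract says response surface modeling cannot capture.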

2606.04946 2026-06-04 cs.DS cs.LG stat.ML

A General Framework for Dynamic Consistent Submodular Maximization

Paul Dütting, Federico Fusco, Silvio Lattanzi, Ashkan Norouzi-Fard, Ola Svensson, Morteza Zadimoghaddam

AI summary: For submodular maximization in the fully dynamic setting, proposes a general algorithmic framework that yields the first constant-factor approximations with sublinear consistency.

Comments: Accepted at ICML 2026

Abstract

Consistency is an important property in dynamic submodular maximization and entails maintaining a near-optimal solution at all times, making only a small number of adjustments to the solution in each step. Prior work has explored this question for the insertion-only case, where the algorithm faces a stream of $n$ insertions, and has established lower and upper bounds for the cardinality-constrained version of the problem. We consider this question in the fully dynamic setting, where the stream of operations may contain both insertions and deletions. We develop a general framework for designing algorithms for this setting, and instantiate it to obtain the first constant-factor approximations with sublinear consistency. For cardinality constraints, we propose a $\frac 12 - O(\varepsilon)$ approximation that is $O\left(\frac{1}{\varepsilon^2}\right)$ consistent. For rank-$k$ matroid constraints, we construct a $\frac 14 - O(\varepsilon)$ approximation to the dynamic optimum that is $O\left(\frac{\log k}{\varepsilon^2}\right)$ consistent.

2606.04930 2026-06-04 cs.LG cs.AI stat.ML

AdaKoop: Efficient Modeling of Nonlinear Dynamics from Nonstationary Data Streams with Koopman Operator Regression

Naoki Chihara, Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai

AI summary: Proposes AdaKoop, a streaming algorithm based on Koopman operator theory and a probabilistic framework that represents nonlinear dynamics as a linear system, enabling efficient and stable modeling of nonstationary data streams and outperforming existing methods on 71 benchmark datasets.

Journal ref: The 32nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2026
Comments: Accepted by KDD'26

Abstract

Real-time data analysis requires the ability to accurately and adaptively address nonlinear dynamics in a nonstationary data stream while preserving computational efficiency. However, nonlinear dynamics are so complex that capturing dynamically changing nonlinear patterns and utilizing them for downstream tasks under strict time constraints is nontrivial. To bridge the gap between nonlinear complexity and computational tractability, this study applies Koopman operator theory, which states that nonlinear dynamics can be represented as linear transitions in an infinite-dimensional space. Building upon finite-dimensional approximations of this operator, we present AdaKoop, an efficient streaming algorithm for modeling nonlinear dynamics over nonstationary data streams. Our approach utilizes a probabilistic framework grounded in Koopman operator theory, treating both raw observations and reproducing kernel Hilbert space (RKHS) features as emissions from latent vectors. This dual-view formulation allows nonlinear dynamics to be expressed as a tractable linear system. Therefore, AdaKoop enables the efficient and stable modeling of nonlinear dynamics in a streaming fashion, avoiding the prohibitive computational costs of iterative nonlinear optimization. Furthermore, to address nonstationarity in data streams, AdaKoop adaptively detects the switching of patterns via statistical hypothesis testing for abrupt pattern shifts and incrementally updates model parameters to handle continuous changes. Extensive experiments on a total of 71 practical benchmark datasets across various domains demonstrate that AdaKoop outperforms state-of-the-art methods in terms of real-time forecasting accuracy and computational efficiency.
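
The claim that nonlinear dynamics become linear after lifting can be checked by hand on a standard textbook system that admits a finite-dimensional Koopman-invariant subspace. The sketch below shows only this lifting idea. The matrix is written down analytically rather than learned, and none of AdaKoop's streaming regression, RKHS features, or change detection is reproduced; the system and all names are illustrative assumptions.

```python
def step(x, y, lam=0.9, mu=0.5):
    """One step of a nonlinear map: x' = lam*x, y' = mu*y + (lam**2 - mu)*x**2."""
    return lam * x, mu * y + (lam ** 2 - mu) * x ** 2

def lift(x, y):
    """Observables (x, y, x^2), which span an invariant subspace for `step`."""
    return [x, y, x * x]

def koopman_matrix(lam=0.9, mu=0.5):
    """Linear evolution of the lifted state, derived by hand from `step`."""
    return [
        [lam, 0.0, 0.0],
        [0.0, mu, lam ** 2 - mu],
        [0.0, 0.0, lam ** 2],
    ]

def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

if __name__ == "__main__":
    x, y = 1.5, -0.3
    print(lift(*step(x, y)))                      # nonlinear step, then lift
    print(matvec(koopman_matrix(), lift(x, y)))   # lift, then linear step
```

The two printed vectors agree up to rounding, which is the sense in which the lifted dynamics are linear; for most systems no finite lifting is exact, and an approximation of the operator is estimated from data instead.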

2606.04916 2026-06-04 cs.LG econ.GN q-fin.EC stat.ML

Worker Utility as Hysteresis: A Preisach Model of Transaction Acceptance in Gig Labour Markets

Piotr Frydrych

AI summary: Proposes the Preisach hysteresis model as a representation of gig workers' hidden preferences, estimates acceptance and rejection utilities with a dual-output neural network combined with an XGBoost classifier, reaches Jaccard = 0.827 and ROC AUC = 0.799 on 36,891 transactions, and confirms that price decreases affect completion rates more than price increases.

Comments: 18 pages, 5 figures

Abstract

Worker utility is not observed -- only its consequence is. Each gig transaction produces a single bit: accepted or rejected. We argue this structure points directly to the Preisach hysteresis model as the natural representation of latent worker preferences. The Preisach operator models aggregate output as an integral over a population of binary threshold elements -- precisely the structure that emerges when heterogeneous workers each carry a private acceptance wage. We estimate two latent utility surfaces: acceptance utility U_1(X) and rejection utility U_0(X), via a dual-output neural network (shared layers 256->128, margin loss enforcing U_1 >= U_0). Classification reduces to the Preisach gap U_1(X) - U_0(X), passed into an XGBoost classifier alongside clip-stabilised price-to-threshold encodings. On 36,891 gig transactions, this pipeline achieves Jaccard = 0.827 and ROC AUC = 0.799. The price-to-threshold encoding accounts for +11.0 pp AUC over raw utility features. The model confirms the directional asymmetry hysteresis predicts: price decreases depress completion rates more than equivalent increases raise them. Applied to the full dataset, the model's recommendations simultaneously reduce the total wage bill by 21.3% and increase expected fill rate by 9.7 pp. For 74.2% of transactions, P(accept) already exceeds 0.80; reducing the wage keeps it above threshold (mean post-cut P = 0.972), releasing cost savings (median 31%). For the remaining 25.4%, a median 7% wage increase recovers +43 pp acceptance. A model without an explicit indifference zone cannot execute both moves simultaneously.
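
The Preisach picture of "a population of binary threshold elements" can be made tangible with a few relay elements. The sketch below is only the classical relay construction, with hypothetical thresholds standing in for private acceptance wages; it does not reproduce the paper's neural network, features, or classifier.

```python
class Relay:
    """Binary hysteresis element: switches on at `up`, off at `down` (down < up)."""

    def __init__(self, down, up):
        self.down, self.up, self.state = down, up, 0

    def update(self, u):
        if u >= self.up:
            self.state = 1
        elif u <= self.down:
            self.state = 0
        # inside (down, up) the state is unchanged: this is the element's memory
        return self.state

def acceptance_rate(relays, price):
    """Aggregate output: the fraction of relays that are on after seeing `price`."""
    return sum(r.update(price) for r in relays) / len(relays)

if __name__ == "__main__":
    workers = [Relay(0.2, 0.4), Relay(0.3, 0.7), Relay(0.5, 0.9)]
    for price in (0.0, 0.6, 1.0, 0.6):
        print(price, acceptance_rate(workers, price))
```

The same price of 0.6 yields different aggregate acceptance on the way up and on the way down, which is the history dependence that the term hysteresis refers to.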

2606.04900 2026-06-04 stat.AP

Multi-objective probabilistic forecast combination for inventory demand

Shengjie Wang, Yanfei Kang, Evangelos Spiliotis, Fotios Petropoulos

AI summary: Because existing probabilistic forecast combination methods overlook operational objectives, proposes a multi-objective optimization framework that jointly optimizes forecast accuracy and inventory decision performance; its balance and robustness are validated on Walmart retail data and Royal Air Force spare parts data.

Abstract

Probabilistic forecasts are essential for inventory management, where decisions depend on the full distribution of future demand. While probabilistic forecast combination is widely used to improve statistical accuracy, most existing approaches optimize statistical loss alone and overlook operational objectives. However, in inventory settings, higher forecast accuracy does not necessarily translate into better decision performance, especially under nonlinear cost structures and multiple, potentially conflicting, decision targets. To address this gap, we propose a multi-objective probabilistic forecast combination framework that simultaneously considers forecast accuracy and inventory decision performance. The framework formulates forecast combination as a multi-objective optimization problem and derives a set of Pareto-optimal combinations, enabling explicit trade-offs between forecasting and operational goals. Empirical studies using Walmart retail data and Royal Air Force spare parts data demonstrate that the proposed approach achieves more balanced and robust performance than individual models, simple averaging, and single-objective optimization. Our results provide a practical and flexible framework for aligning probabilistic forecasting with inventory decision-making.
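
A set of Pareto-optimal combinations is what remains after discarding every candidate that another candidate beats on both objectives. The sketch below filters hypothetical candidate weightings by two losses to be minimized; the labels and numbers are made up for illustration, and the paper's optimizer and loss definitions are not reproduced.

```python
def pareto_front(candidates):
    """Keep the non-dominated candidates.

    candidates: list of (label, forecast_loss, inventory_cost), both minimized.
    A candidate is dominated if another is no worse on both objectives and
    strictly better on at least one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o[1] <= c[1] and o[2] <= c[2] and (o[1] < c[1] or o[2] < c[2])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

if __name__ == "__main__":
    # hypothetical combination weightings with made-up losses
    candidates = [("A", 1.0, 5.0), ("B", 2.0, 3.0), ("C", 2.5, 3.5), ("D", 3.0, 1.0)]
    print([label for label, _, _ in pareto_front(candidates)])
```

Candidate C is dropped because B is better on both losses, while A, B, and D each represent a different trade-off between forecast accuracy and inventory cost.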

2606.04879 2026-06-04 stat.ME stat.AP

Bootstrap-based Hypothesis Test of 2D Contours using Elastic Shape Analysis

Susan Glenn, Justin Strait, Kelly Moran, Chris Danly, Matthew P Selwood

AI summary: Proposes a hypothesis test based on empirical confidence intervals for the elastic shape distance, using a bootstrap for non-smooth functionals to handle the non-differentiability of the distance, and validates it on inertial confinement fusion images.

Comments: 35 pages, 11 figures

Abstract

Shapes of objects in images are often complex, high-dimensional, and vary in ways not captured by standard Euclidean geometry and statistics. Statistical shape analysis encompasses methods for flexible and interpretable measurement of intrinsic shape and shape variability in geometric objects. Elastic Shape Analysis (ESA) is one such method that measures shape differences between objects, represented by contours, in a way that is invariant to rotation, scale, translation, and parameterization. Although ESA is useful for quantifying shape of objects in many image applications, formal methods for statistical inference in image-based ESA remain limited. This work introduces a hypothesis test procedure based on empirical confidence intervals for the elastic shape distance (ESD) between a proposed underlying true shape and an estimated shape. The confidence intervals are created using a bootstrap procedure for non-smooth functionals, which accounts for the non-differentiability of the ESD. The effectiveness of the method is illustrated through both numerical studies and real world image examples from inertial confinement fusion (ICF).

2606.04859 2026-06-04 math.PR math.ST stat.ME stat.TH

Stein's method for the Wishart distribution

Gabriel Bailly, Robert E. Gaunt, Frédéric Ouimet, Donald Richards, Rainer von Sachs

AI summary: Develops Stein's method for the Wishart distribution on the cone of positive definite matrices, establishing an extended-generator-based Stein characterization, the transition semigroup, the solution of the Stein equation and its regularity estimates, and demonstrates the method in four applications.

Comments: 93 pages, 6 tables, 2 figures

Abstract

In this work, we develop Stein's method for the Wishart distribution on the cone of positive definite matrices. We establish the basic ingredients of a Wishart Stein framework: we derive an extended-generator-based Stein characterization from the Wishart diffusion process, identify the corresponding transition semigroup through the noncentral Wishart law, provide an explicit semigroup representation for the solution of the Stein equation, and obtain regularity estimates for the solution. The new methodology is demonstrated in four applications: (i) an order $n^{-1}$ bound, for smooth test functions, for the Wishart approximation of uncentered group-mean scatter matrices in MANOVA; (ii) a quantitative multivariate Satterthwaite approximation; (iii) local/integrated De Bruijn identities and logarithmic Sobolev inequalities for the Wishart measure; and (iv) Stein's method of moments for the shape and scale parameters, including structured scale estimation.

2606.04845 2026-06-04 stat.ML cs.LG math.ST stat.CO stat.TH

Bayesian learning for the stochastic shortest path problem

Chon Wai Ho, Sumeetpal S. Singh, Jiaqi Guo

AI summary: For the stochastic shortest path problem, proposes a Bayesian framework that constructs the posterior of the optimal action-value function Q* directly through Bellman's optimality equations, addresses the unidentifiability caused by relaxing the likelihood, and achieves uncertainty quantification with data-efficient learning.

Comments: 50 pages, 19 figures

Abstract

Sequential decision-making problems are often modelled as a Markov decision process (MDP). We focus on the stochastic shortest path (SSP) problem, which is an infinite-horizon undiscounted MDP with absorbing terminal states. We develop a Bayesian framework to learn the optimal decision strategy through interactions with the decision-making task. Specifically, we learn the optimal action-value function $Q^*$, but unlike many existing Bayesian approaches, we do not rely on unrealistic modelling assumptions and ad-hoc approximations. Our approach is to directly construct the posterior beliefs for $Q^*$ through Bellman's optimality equations. For deterministic rewards, we characterise the posterior as a distribution with a manifold density. To facilitate simpler inference, we relax the likelihood so that a Lebesgue density exists. The flip side is that this relaxation creates unidentifiability issues. Specifically, the relaxed posterior can have significant mass on improper decision rules, while the exact posterior will not. We also calculate the exact posterior probabilities for optimal action selections for the tabular parametrisation of $Q^*$, a Gaussian likelihood relaxation and a Gaussian prior, which is useful in benchmarking studies. Numerical studies on variants of the Deep Sea benchmark verify our findings. We demonstrate that our framework faithfully quantifies uncertainty and, compared to other temporal-difference-based Bayesian methodologies, is more data efficient. We conclude with recommendations for future work.
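
For readers less familiar with the SSP setting, the Bellman optimality equations for an undiscounted problem with an absorbing terminal state read $Q^*(s,a)=\mathbb{E}[r+\max_{a'}Q^*(s',a')]$, with value zero at the terminal state. The sketch below solves them by plain value iteration on a tiny model that is assumed to be known. It illustrates the equations only, not the Bayesian posterior construction of the paper, and the toy MDP with its state and action names is hypothetical.

```python
def q_value_iteration(transitions, terminal, sweeps=200):
    """Iterate the Bellman optimality equations of an undiscounted SSP.

    transitions[(state, action)] = list of (probability, next_state, reward).
    """
    q = {sa: 0.0 for sa in transitions}

    def value(state):
        if state == terminal:
            return 0.0                     # absorbing terminal state
        return max(v for (s, _), v in q.items() if s == state)

    for _ in range(sweeps):
        for sa, outcomes in transitions.items():
            q[sa] = sum(p * (r + value(s2)) for p, s2, r in outcomes)
    return q

if __name__ == "__main__":
    toy = {
        (0, "short"): [(1.0, "T", -5.0)],                  # costly direct exit
        (0, "long"): [(1.0, 1, -1.0)],                     # cheap detour via state 1
        (1, "go"): [(0.5, "T", -1.0), (0.5, 1, -1.0)],     # exits with probability 1/2
    }
    for sa, v in q_value_iteration(toy, "T").items():
        print(sa, round(v, 6))
```

In the toy model, state 1 needs two steps on average to terminate, so the detour is worth -3 in total and beats the direct exit worth -5.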

2606.04784 2026-06-04 cs.IT cs.DS math.IT math.ST stat.TH

The Preisach Extremum Stack is a Shannon-Minimal Sufficient Statistic for Rate-Independent Functionals

Piotr Frydrych

AI summary: Proves that the Preisach extremum stack is a Shannon-minimal sufficient statistic for rate-independent functionals, and gives information-theoretic corollaries and an online maintenance algorithm.

Comments: 4 pages

Abstract

Let R denote the class of all computable, causal functionals that are rate-independent in the classical sense (invariant under monotone time reparametrizations), and let Pi_n be the Preisach extremum stack of an input sequence u_{0:n}. We prove a characterization theorem establishing that every F in R satisfies Fu = f(Pi_n) for a computable f, and derive two information-theoretic results. First, under any probability measure on u_{0:n}, the equality I(u_{0:n}; Fu) = I(Pi_n; Fu) holds for every F in R and is an immediate corollary of the characterization theorem. Second, the main result: Pi_n is a Shannon-minimal sufficient statistic in the sense that I(u_{0:n}; Pi_n) <= I(u_{0:n}; S) for every random variable S from which all R-queries are computable. The proof uses the finite indicator family of [Frydrych, 2026] to reconstruct Pi_n from any sufficient S. As a corollary, online maintenance of Pi_n suffices for rate-independent estimation: the NNLS estimator of the Preisach measure mu can be assembled from the incremental stack process (Pi_t)_{t=0}^n in O(k * L^2) memory per step, where k = |Pi_t| and L is the grid resolution.
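
Online maintenance of an extremum stack can be sketched with the classical wiping-out rule of Preisach memory: a new input value that passes an earlier extremum erases the inner loop it closes. The exact definition of Pi_n in the paper may differ, so the code below is an assumption-laden illustration; the function names are hypothetical.

```python
def push(stack, u):
    """Update a stack of alternating extrema with a new input sample `u`."""
    if stack and u == stack[-1]:
        return stack                      # holding the input changes nothing
    # Monotone continuation of the last leg: the old top was not an extremum.
    if len(stack) >= 2 and (stack[-1] - stack[-2]) * (u - stack[-1]) > 0:
        stack.pop()
    # Wiping-out: passing an earlier extremum erases the loop it closes.
    while len(stack) >= 2:
        rising = u > stack[-1]
        passed = u >= stack[-2] if rising else u <= stack[-2]
        if not passed:
            break
        if len(stack) >= 3:
            del stack[-2:]                # remove the closed inner loop
        else:
            del stack[0]                  # oldest entry is dominated; keep the turn
            break
    stack.append(u)
    return stack

def extremum_stack(samples):
    stack = []
    for u in samples:
        push(stack, u)
    return stack

if __name__ == "__main__":
    print(extremum_stack([0, 3, 1, 2]))
    print(extremum_stack([0, 3, 1, 2, 4]))
    print(extremum_stack([0, 1, 2, 3, 2, 1, 1.5, 2, 3, 4]))
```

The last two inputs trace the same path at different sampling rates and leave the same stack, which is the rate-independence property the abstract builds on.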

2606.04683 2026-06-04 math.ST stat.TH

Minimax Private Estimation of Smooth Optimal-Transport Maps

Clément Lalanne, David Rodríguez-Vítores, Franck Iutzeler, Jean-Michel Loubes

AI summary: For estimating smooth optimal transport maps under differential privacy constraints, proposes estimators built on wavelet-based density estimation and stability bounds that attain near-minimax optimal rates under both central and local DP models, and establishes matching minimax lower bounds.

Abstract

We study the problem of estimating smooth optimal transport (OT) maps between two probability distributions under differential privacy (DP) constraints. Leveraging wavelet-based density estimators and recent stability bounds for smooth OT maps, we propose differentially private estimators that apply to both central and local DP models. Our main estimator achieves near-minimax optimal rates in dimension $d \geq 2$, and we complement it with a quantile-based estimator that attains minimax optimal rates in dimension $d = 1$ under central DP. We further establish matching minimax lower bounds, confirming the near-optimality of our approach. To the best of our knowledge, this constitutes the first differentially private procedure for OT map estimation with minimax optimality guarantees.

2606.04673 2026-06-04 stat.ME

Improving Longitudinal Targeted Maximum Likelihood Estimation in Target Trial Emulation using Joint Calibrated Weights

Juliette M. Limozin, Shaun R. Seaman, Li Su

AI summary: Proposes joint calibrated longitudinal targeted maximum likelihood estimation (LTMLE), which calibrates the weights for the treatment and censoring processes simultaneously to improve the small-sample performance and robustness of per-protocol effect estimation in target trial emulation.

Comments: Main text: 34 pages, 3 figures, 8 tables. Supplementary Materials included

Abstract

In target trial emulation (TTE), marginal structural models (MSMs) can be used to characterise per-protocol treatment effects over time. The MSM parameters are often estimated by inverse probability weighting (IPW), with weights estimated by maximum likelihood. However, IPW-based estimators can be unstable in small samples and are sensitive to misspecification of the weight models. An alternative method for estimating the MSM parameters is longitudinal targeted maximum likelihood estimation (LTMLE). LTMLE is double robust and potentially more efficient than IPW. Nevertheless, LTMLE also relies on inverse probability weights and may therefore share the instability of IPW-based estimators. We propose joint calibrated LTMLE, which integrates LTMLE with joint calibrated weights tailored for per-protocol effect estimation in TTE. This calibration of weights improves finite-sample performance by enforcing covariate balance in both the treatment and censoring processes simultaneously. Simulations show that the proposed method has improved efficiency and robustness to weight model misspecification, compared to standard LTMLE. We illustrate the method using a case study to evaluate the effect of highly active antiretroviral therapy on CD4 cell count among HIV-positive women.

2606.04637 2026-06-04 stat.AP

Optimal designs for incomplete stepped wedge trials

Richard Hooper, Alan Girling

AI summary: For incomplete stepped wedge trials, proposes optimal designs that minimize the number of cluster-periods of data collection or maximize precision at a given cost, gives the solution for two-period trials, and states a conjecture for multi-period trials.

Abstract

Background: Stepped wedge trials are longitudinal randomised evaluations, usually cluster-randomised, in which the experimental intervention is introduced in a staggered fashion. Incomplete stepped wedge designs focus the effort of data collection on particular periods in particular sequences. Methods: We suppose there is a cost for every period in every cluster where we collect data, and that there are a fixed number of individuals, m, with data available in each period in each cluster. If we are willing to pay the cost of data collection in that cluster-period then we collect the data on all m individuals, and if we are not willing to pay the cost then we collect no data in that cluster-period. We consider the problem of designing a trial to minimise the total number of cluster-periods of data collection needed to achieve given precision for the treatment effect estimator, or equivalently, to maximise precision for a given number of cluster-periods of data collection. Results: We present the solution for two-period trials, which has two distinct forms, depending on the correlation between two cluster-period means from the same cluster in different periods. We also present a conjecture on the form of the solution for multi-period trials, informed by results from a greedy search of the design space. Conclusions: A real-life stepped wedge design problem will involve trading off the costs of various design elements subject also to constraints on the scale of data collection. Nevertheless, the solutions to the problem considered here add significantly to our understanding of the optimal design of incomplete stepped wedge trials.

2606.04603 2026-06-04 cs.IR cs.LG stat.ML

Distributional Approximate Nearest Neighbour Search for Uncertainty-Aware Retrieval

面向不确定性感知检索的分布近似最近邻搜索

Olivier Jeunen

AI总结 提出DINOSAUR框架,通过为每个物品采样多个嵌入并构建索引,在检索时对用户嵌入进行采样,以隐式边缘化嵌入不确定性,从而在不改变模型架构或索引基础设施的情况下提升长尾物品的覆盖。

详情
AI中文摘要

近似最近邻搜索索引构成了现实世界推荐系统的骨干,支持在百万级物品目录上进行实时候选检索。通常,为每个用户和每个物品学习一个点估计嵌入。在服务时,用户嵌入查询索引以获取相关物品。由于这些表示是从稀疏交互数据中学习的,它们带有噪声,可能无法捕捉所有有助于“相关性”的细微差别——忽略了其固有的基本不确定性。结果是检索管道系统性地偏向于少数嵌入估计良好的热门头部物品,而牺牲了长尾中多数小众、多样和偶然的内容。 我们提出了DINOSAUR(面向不确定性感知检索的分布近似最近邻搜索):一个简单且与基础设施兼容的框架,将嵌入不确定性纳入候选生成。DINOSAUR不为点估计建立索引,而是为每个物品采样$S_i$个嵌入,并在这一增强集上构建索引。类似地,在查询时,对用户嵌入进行采样。这种双边的随机检索过程隐式地边缘化了嵌入不确定性,无需改变模型架构或ANN索引基础设施。 在分析方面,我们展示了当不确定性消失时,DINOSAUR恢复标准的点估计检索,并刻画了增加的嵌入方差如何扩展不确定物品可检索的潜在空间区域。可重复的实证观察与这些预期一致,显示出在离线召回率小幅损失的情况下,覆盖率大幅提升。

英文摘要

Approximate Nearest Neighbour search indices form the backbone of real-world recommender systems, enabling real-time candidate retrieval over million-item catalogues. Typically, a single point estimate embedding is learnt for every user and every item. At serving time, the user embedding queries the index for relevant items. Since these representations are learnt from sparse interaction data, they are noisy and might fail to capture all the nuances that contribute to "relevance", ignoring the fundamental uncertainty that is inherent to them. The result is a retrieval pipeline that is systematically biased toward the small minority of popular head items with well-estimated embeddings, at the expense of the long-tail majority of niche, diverse, and serendipitous content. We propose DINOSAUR (Distributional Approximate Nearest Neighbour Search for Uncertainty-Aware Retrieval): a simple and infrastructure-compatible framework to incorporate embedding uncertainty into candidate generation. Rather than indexing point estimates, DINOSAUR samples $S_i$ embeddings per item and constructs an index on this augmented set. Analogously, at query time, a user embedding is sampled. This two-sided stochastic retrieval process implicitly marginalises over embedding uncertainty, without requiring changes to model architecture or ANN index infrastructure. On the analytical side, we show that DINOSAUR recovers standard point-estimate retrieval as uncertainty vanishes, and we characterise how increased embedding variance expands the regions of latent space in which uncertain items are retrievable. Reproducible empirical observations align with these expectations, showing large coverage gains with small losses in offline recall.
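The two-sided sampling idea in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it assumes isotropic Gaussian uncertainty around each embedding, uses brute-force inner-product search in place of a real ANN index, and the names `build_augmented_index` and `retrieve` are hypothetical.

```python
import numpy as np

def build_augmented_index(item_means, item_stds, samples_per_item, rng):
    """Sample S_i embeddings per item; return (vectors, owning item ids).

    Assumption: each item embedding is Gaussian with mean `mu` and an
    isotropic standard deviation `sd` (the abstract does not fix a family).
    """
    vectors, owners = [], []
    for item_id, (mu, sd, s) in enumerate(zip(item_means, item_stds, samples_per_item)):
        draws = mu + sd * rng.standard_normal((s, mu.shape[0]))
        vectors.append(draws)
        owners.extend([item_id] * s)
    return np.vstack(vectors), np.asarray(owners)

def retrieve(user_mean, user_std, vectors, owners, k, rng):
    """Sample one user embedding, score by inner product, return top-k distinct items."""
    query = user_mean + user_std * rng.standard_normal(user_mean.shape[0])
    # Brute-force scoring stands in for the ANN index lookup.
    order = np.argsort(-(vectors @ query), kind="stable")
    seen, result = set(), []
    for idx in order:
        item = int(owners[idx])
        if item not in seen:  # several samples may belong to the same item
            seen.add(item)
            result.append(item)
            if len(result) == k:
                break
    return result
```

When every standard deviation is zero, all samples of an item coincide with its point estimate, so the sketch reduces to ordinary point-estimate retrieval, in line with the limiting behaviour the abstract describes.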

2606.04576 2026-06-04 stat.ML cs.LG econ.EM q-fin.RM

ReSGA: A Large Tail Risk Model for Learning Value-at-Risk and Expected Shortfall

ReSGA: 一种用于学习风险价值和预期缺口的大尾部风险模型

Yichi Zhang, Ke Zhu, Zhoufan Zhu

AI总结 提出检索增强自分组自编码器(ReSGA),利用数百万参数捕捉资产横截面依赖和长期时间动态,在1926-2023年美国股票数据上优于12种基准模型,并通过新规模增强左尾动量策略实现经济收益。

详情
AI中文摘要

学习风险价值(VaR)和预期缺口(ES)对于有效管理金融风险至关重要。在大数据时代,参数有限的现有方法容易受到模型错误设定的影响。为了解决这一局限性,我们提出了一种大尾部风险模型——检索增强自分组自编码器(ReSGA),该模型设计有数百万个参数,利用资产的特征来挖掘丰富的横截面依赖性和长期时间动态。应用于1926年至2023年的月度美国股票收益数据,包含153个公司特征,ReSGA在样本外损失和统计回测方面优于十二种计量经济学和机器学习竞争对手。此外,其预测优势可以通过一种新的规模增强左尾动量策略构建的多空十分位投资组合转化为显著的经济收益。为了阐明复杂性的作用,我们进一步进行了系统的规模分析,并证明联合VaR-ES预测的改进主要由数据复杂性驱动,而非模型复杂性。最后,我们的组重要性和迁移学习分析展示了ReSGA的可解释性和跨市场泛化能力。

英文摘要

Learning Value-at-Risk (VaR) and Expected Shortfall (ES) is important for managing financial risks effectively. Existing approaches with limited parameters are vulnerable to model misspecification in the era of big data. To address this limitation, we propose a large tail risk model, the retrieval-enhanced self-grouping autoencoder (ReSGA), which is designed with millions of parameters to exploit the rich cross-sectional dependence and long-term temporal dynamics of assets using their characteristics. Applied to monthly US equity returns from 1926 to 2023 with 153 firm characteristics, ReSGA outperforms twelve econometric and machine learning competitors in terms of out-of-sample loss and statistical backtesting. In addition, its forecast advantages can translate into significant economic gains from long-short decile portfolios that are constructed by a new size-enhanced left-side momentum strategy. To clarify the role of complexity, we further conduct a systematic scaling analysis and demonstrate that improvements in joint VaR-ES forecasting are primarily driven by data complexity rather than model complexity. Finally, our analyses of group-importance and transfer-learning exhibit the interpretability and cross-market generalizability of ReSGA.

2606.04574 2026-06-04 cs.LG cs.NE q-fin.ST q-fin.TR stat.ML

Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning

基于深度强化学习的加密货币市场动态多对交易策略

Damian Lebiedź, Robert Ślepaczuk

AI总结 本研究提出一种结合深度强化学习执行覆盖层的层次化“过滤-排序”配对选择方法和“固定风险、自适应均值”执行模型,在加密货币市场实现优于启发式基准的统计套利表现。

详情
Comments
61 pages, 37 figures, 16 tables
AI中文摘要

本研究旨在确定深度强化学习(DRL)作为专门执行覆盖层是否能够增强高波动性加密货币市场中的配对交易。尽管该策略的经典实现在传统股票市场中已被证明成功,但在高方差环境中往往表现出刚性并面临严重的发散风险。为应对这一需求,本研究引入了新颖概念。为构建稳健系统,我们开发了层次化的“过滤-排序”配对选择方法和专有的“固定风险、自适应均值”执行模型。该系统采用带有长短期记忆(LSTM)层的近端策略优化(PPO)智能体,在严格确定性风险管理边界内控制执行决策。在币安USD-M期货市场的1小时间隔数据上评估,优化后的强化学习策略在样本外表现显著优于启发式基线。平稳循环块自举稳健性检验证实,智能体的风险调整后超额收益在10%水平上统计显著。尽管略低于更严格的5%阈值,这一结果凸显了数字资产特有的极端异质方差。最终,本论文通过引入结合统计套利与DRL执行策略的混合架构,为量化金融文献做出贡献。此外,它通过确定性屏蔽提供了一种安全强化学习的新框架,证明将神经策略锚定于统计稳健边界能成功缓解严重的发散风险。

英文摘要

This study aims to determine whether the application of Deep Reinforcement Learning (DRL) as a specialized execution overlay can enhance pair trading in highly volatile cryptocurrency markets. Although classical implementations of the strategy have proven successful in traditional equities, they frequently exhibit rigidity and suffer from severe divergence risks when applied to high-variance environments. To address this need, this research introduces novel concepts. To construct a robust system, we developed a hierarchical "Filter-then-Rank" pair selection methodology and a proprietary "Fixed Risk, Adaptive Mean" execution model. The system employs a Proximal Policy Optimization (PPO) agent with a Long Short-Term Memory (LSTM) layer to govern execution decisions within strict deterministic risk management boundaries. Evaluated on 1-hour interval data from the Binance USD-M Futures market, the optimized RL policy achieved an out-of-sample performance that substantially outperformed the heuristic baseline. A stationary circular block bootstrap robustness check confirms that the agent's risk-adjusted outperformance is statistically significant at the 10 percent level. Although falling marginally short of the stricter 5 percent threshold, this result highlights the extreme idiosyncratic variance characteristic of digital assets. Ultimately, this thesis contributes to the quantitative finance literature by introducing a hybrid architecture that combines statistical arbitrage with DRL execution policies. Furthermore, it delivers a novel framework for safe reinforcement learning via deterministic shielding, proving that anchoring a neural policy to statistically robust boundaries successfully mitigates severe divergence risks.

2606.04561 2026-06-04 math.ST stat.TH

Penalized Order Selection for ARFIMA Models

ARFIMA模型的惩罚阶数选择

Chunhao Cai

AI总结 通过最小化嵌套折叠凹惩罚的集成Toeplitz剖面Whittle准则,研究有限阶平稳ARFIMA过程的阶数选择,证明存在一个局部极小值点能以趋于1的概率选择真实阶数并将非活跃系数精确设为零。

详情
AI中文摘要

我们研究通过最小化集成Toeplitz剖面Whittle准则的嵌套折叠凹惩罚来选择有限阶平稳ARFIMA过程的阶数。AR和MA阶数通过后缀组编码,因此零尾部的精确恢复等价于阶数的精确恢复。分析在一个加权筛邻域上进行,其几何权重阻止增长滞后扰动将AR或MA根推向单位圆。在有限真实阶数、互质根分离的短记忆多项式、内部平稳记忆参数、增长筛阶数、适当调优以及独立次高斯新息的条件下,我们证明了存在一个局部极小值点,而非非凸准则的全局最优性。该局部极小值点以趋于1的概率选择真实的AR和MA阶数,将所有非活跃系数精确设为零,并达到活跃坐标速率\(O_p(L_n n^{-1/2})\),其中随机尺度\(L_n\)由局部失配\(\Delta\)控制。证明结合了加权根分离、匹配有限段Toeplitz控制、有限段迹偏差控制、平稳观测向量的次高斯二次型论证以及折叠凹惩罚的局部支配步骤。

英文摘要

We study order selection for a finite-order stationary ARFIMA process by minimizing a nested folded-concave penalization of the integrated Toeplitz profile Whittle criterion. The AR and MA orders are encoded through suffix groups, so that exact recovery of the zero tails is equivalent to exact recovery of the orders. The analysis is carried out on a weighted sieve neighborhood whose geometric weights keep growing-lag perturbations from pushing AR or MA roots toward the unit circle. Under finite true orders, coprime root-separated short-memory polynomials, an interior stationary memory parameter, growing sieve orders, suitable tuning, and independent sub-Gaussian innovations, we prove existence of an oracle local minimizer, rather than global optimality of the nonconvex criterion. This local minimizer selects the true AR and MA orders with probability tending to one, sets all inactive coefficients exactly to zero, and achieves the active-coordinate rate \(O_p(L_n n^{-1/2})\), where the stochastic scale \(L_n\) is governed by the local mismatch \(\Delta\). The proof combines weighted root-separation, matched finite-section Toeplitz control, finite-section trace-bias control, a sub-Gaussian quadratic-form argument for the stationary observed vector, and a local domination step for folded-concave penalties.

2606.04546 2026-06-04 stat.ME stat.AP

Bivariate inverse Gaussian degradation processes with shared random effects and an application to fatigue cracks

具有共享随机效应的双变量逆高斯退化过程及其在疲劳裂纹中的应用

Yuvraj Dutta, Sandip Barui, Debanjan Mitra, Narayanaswamy Balakrishnan

AI总结 提出一种基于广义伽马共享脆弱性的双变量逆高斯退化模型,灵活选择脆弱性分布,通过模拟和疲劳裂纹数据验证了其优越的拟合性能。

详情
AI中文摘要

逆高斯(IG)过程是单变量退化数据广泛使用的模型。对于涉及两个性能特征(PC)的双变量退化数据,通常通过未观测的共享脆弱因子结合IG过程引入依赖性。以往研究通常假设特定的脆弱性分布,如正态或伽马,但由于脆弱性不可观测,此类选择难以证明。本文提出一个通用的IG GG框架,用于建模具有依赖PC的双变量退化数据。每个退化过程使用IG过程建模,而共享脆弱性遵循广义伽马(GG)族,其包含指数、伽马、威布尔和对数正态分布作为特例。该框架允许在GG族内灵活选择适当的脆弱性分布,从而改进模型拟合。开发了便捷的参数估计程序,并通过模拟研究评估,显示出令人满意的性能。将所提模型应用于疲劳裂纹数据,并与若干现有基于脆弱性和基于copula的模型进行比较。结果表明,IG GG模型提供了更优的拟合。还讨论了IG GG框架下的系统可靠性估计。

英文摘要

The inverse Gaussian (IG) process is a widely used model for univariate degradation data. For bivariate degradation data involving two performance characteristics (PCs), dependence is often introduced through an unobserved shared frailty factor combined with IG processes. Previous studies typically assume a specific frailty distribution, such as normal or gamma, although such choices are difficult to justify because the frailty is unobserved. This paper proposes a general IG GG framework for modeling bivariate degradation data with dependent PCs. Each degradation process is modeled using an IG process, while the shared frailty follows the generalized gamma (GG) family, which includes exponential, gamma, Weibull, and lognormal distributions as special cases. The proposed framework allows flexible selection of an appropriate frailty distribution within the GG family, leading to improved model fitting. Convenient parameter estimation procedures are developed and evaluated through simulation studies, demonstrating satisfactory performance. The proposed model is applied to fatigue crack data and compared with several existing frailty-based and copula-based models. Results show that the IG GG model provides a superior fit. System reliability estimation under the IG GG framework is also discussed.

2606.04523 2026-06-04 stat.ME

Bias Correction for Scalar-on-Density Regression Models

标量对密度回归模型的偏差校正

Fenglin Xie, Todd Ogden

AI总结 针对标量对密度回归模型中因每个观测单元测量次数有限导致的系数函数衰减偏差,提出基于模拟外推(SIMEX)的偏差校正方法,并证明偏差随测量次数增加单调递减。

详情
Comments
26 pages, 7 figures, 1 table
AI中文摘要

在标量对函数回归模型的一种扩展中,协变量被视为从每个观测单元收集的有限数量测量值估计得到的密度。当测量次数相对较少时,估计的系数函数会遭受衰减偏差。本文研究了偏差如何依赖于每个单元的测量次数,并提出了一种基于模拟外推(SIMEX)的偏差校正方法。我们证明偏差随每个单元测量次数的增加而单调递减。所提出的SIMEX过程应用bootstrap重采样来模拟更少的测量次数,然后外推到无限多次测量,从而校正有限测量偏差。在样本量和噪声水平范围内进行的综合模拟研究表明,系数函数的平均积分平方误差随每个单元测量次数的增加而减小,并且SIMEX外推估计比基于全部测量值的朴素估计具有更低的偏差。通过应用于国家健康与营养调查(NHANES)进一步说明了该方法的实用性,我们将24小时身体活动概况与全因死亡率相关联。该例子支持了方法的有效性,并展示了其检测和校正有限测量偏差的能力。

英文摘要

In one extension of scalar-on-function regression modeling, the covariate is taken to be a density that is estimated from a finite number of measurements gathered for each observational unit. When this number of measurements is relatively small, the estimated coefficient function suffers from attenuation bias. This paper studies how the bias depends on the number of measurements per unit and proposes a bias-correction method based on simulation extrapolation (SIMEX). We establish that the bias decreases monotonically as the number of measurements per unit increases. The proposed SIMEX procedure applies bootstrap resampling to simulate smaller measurement counts and then extrapolates to infinitely many measurements, thereby correcting finite-measurement bias. A comprehensive simulation study, conducted over a range of sample sizes and noise levels, shows that the mean integrated squared error of the coefficient function decreases with more measurements per unit and that the SIMEX-extrapolated estimates achieve lower bias than the naive estimates based on the full set of measurements. The practical utility of the method is further illustrated through an application to the National Health and Nutrition Examination Survey, for which we relate 24-hour physical activity profiles to all-cause mortality. This example supports the validity of the method and demonstrates its ability to detect and correct for finite-measurement bias.
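The extrapolation step of a SIMEX-style correction can be illustrated as follows. This is a sketch under stated assumptions, not the procedure from the paper: it assumes the estimate varies polynomially in the reciprocal of the per-unit measurement count, and the helper names `bootstrap_subsample` and `extrapolate_to_infinite_measurements` are hypothetical.

```python
import numpy as np

def bootstrap_subsample(measurements, m, rng):
    """Draw m measurements with replacement to mimic a unit observed only m times."""
    return rng.choice(np.asarray(measurements), size=m, replace=True)

def extrapolate_to_infinite_measurements(counts, estimates, degree=1):
    """Fit estimate ~ polynomial in 1/count and return its value at 1/count = 0.

    Assumption: the bias is well approximated by a low-degree polynomial in
    1/count; the abstract does not state which extrapolant is used.
    """
    inv = 1.0 / np.asarray(counts, dtype=float)
    coeffs = np.polyfit(inv, np.asarray(estimates, dtype=float), deg=degree)
    return float(coeffs[-1])  # constant term = fitted value as count -> infinity
```

In use, one would re-estimate the coefficient function on subsampled data at several smaller counts, then pass the resulting summaries (for example, the coefficient at one grid point) to the extrapolation helper.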

2606.04520 2026-06-04 stat.ME math.ST stat.TH

Beyond First-order Asymptotics in Sequential Mean Testing

序列均值检验中的一阶渐近之外

Vikas Deep, Shubhada Agrawal

AI总结 研究在水平α的势为一框架下,基于KL_inf统计量的序列检验的停止时间的中心极限定理,给出渐近最优检验的二阶刻画。

详情
AI中文摘要

我们重新审视在水平-α势为一框架下对有界分布均值进行序列检验的问题。我们研究了一种基于 $\mathrm{KL_{inf}}$ 的序列检验,已知该检验在 $\alpha \to 0$ 时以精确常数达到期望停止时间的信息论下界。超越一阶渐近,我们建立了该检验停止时间的中心极限定理(CLT)。我们的分析分两步进行。首先,我们证明了 $\mathrm{KL_{inf}}$ 统计量本身的一个新颖的 CLT,刻画了其围绕确定性极限的波动。然后,我们利用这一结果证明,适当中心化并按 $\sqrt{\log(1/\alpha)}$ 缩放的停止时间依分布收敛到具有显式方差的高斯极限。这给出了有界分布渐近最优序列检验的二阶刻画。最后,我们通过数值实验验证了我们的理论发现。

英文摘要

We revisit the problem of sequentially testing the mean of bounded distributions in a level-$\alpha$ power-one framework. We study a $\mathrm{KL_{inf}}$-based sequential test that is known to attain the information-theoretic lower bound on the expected stopping time with exact constants as $\alpha \to 0$. Going beyond first-order asymptotics, we establish a central limit theorem (CLT) for the stopping time of this test. Our analysis proceeds in two steps. First, we prove a novel CLT for the $\mathrm{KL_{inf}}$ statistic itself, characterizing its fluctuations around its deterministic limit. We then leverage this result to show that the stopping time, centered appropriately and scaled by $\sqrt{\log(1/\alpha)}$, converges in distribution to a Gaussian limit with an explicit variance. This yields a second-order characterization of an asymptotically optimal sequential test for bounded distributions. Finally, we present numerical experiments that corroborate our theoretical findings.

2606.04495 2026-06-04 stat.ME

Fused Spatial Latent Block Models for Co-Clustering

融合空间潜在块模型的共聚类方法

Biao Cai, Yuanxing Chen, Kuangnan Fang, Xiaolong Lin

AI总结 提出融合空间潜在块模型(F-SpLBM),利用LBM、惩罚融合和Potts模型实现空间转录组学中斑点与基因的共聚类,并证明其超多项式收敛率和空间平滑带来的精度提升。

详情
AI中文摘要

空间转录组学是一种快速发展的技术,可在完整组织切片中捕获基因表达及其空间坐标,从而实现转录活性的原位映射。该技术为研究组织异质性和空间基因表达模式提供了前所未有的机会。揭示空间可变基因模块与斑点类型之间的关联可以增进我们对病理机制的理解。然而,目前仍缺乏利用空间信息实现斑点和基因空间一致共聚类的严格统计方法,且该方向的理论研究有限。我们提出了一种融合空间潜在块模型(F-SpLBM)。我们的模型使用LBM揭示斑点和基因之间的共表达模式,通过惩罚融合自动确定共簇数量,并利用Potts模型整合空间信息。我们证明了基于融合的程序能够以超多项式收敛速度恢复真实的块结构。我们还证明了参数估计量的渐近正态性,并量化了空间平滑带来的精度提升。模拟和真实数据分析表明,F-SpLBM能够产生空间一致且生物学可解释的聚类结果。

英文摘要

Spatial transcriptomics is a rapidly growing technique that captures gene expression together with spatial coordinates in intact tissue sections, enabling in situ mapping of transcriptional activity. This technology offers unprecedented opportunities to study tissue heterogeneity and spatial gene expression patterns. Uncovering the associations between spatially variable gene modules and spot types can advance our understanding of pathological mechanisms. However, rigorous statistical methods that exploit spatial information to achieve spatially coherent co-clustering of spots and genes are still lacking, and theoretical investigations in this direction remain limited. We propose a fused spatial latent block model (F-SpLBM). Our model uses the LBM to uncover co-expression patterns between spots and genes, penalized fusion to automatically determine the number of co-clusters, and the Potts model to incorporate spatial information. We establish that the fusion-based procedure recovers the true block structure with the misclassification rate converging at a super-polynomial rate. We also prove asymptotic normality of the parameter estimators and quantify the accuracy gain from spatial smoothing. Simulations and real-data analyses demonstrate that F-SpLBM yields spatially coherent and biologically interpretable clustering results.

2606.04486 2026-06-04 cs.CR cs.CL cs.LG stat.ML

Global Sketch-Based Watermarking for Diffusion Language Models

基于全局草图的扩散语言模型水印

Daniel Zhao

AI总结 提出一种针对掩码扩散语言模型的全局向量草图水印方法,通过控制文本的整体统计特征实现与局部上下文无关的检测。

详情
AI中文摘要

语言模型的水印方法在自回归设置中已被广泛研究,其中令牌是顺序生成的。这些工作主要关注局部上下文方案,该方案根据前序令牌扰动下一个令牌的分布。在扩散语言模型中,许多未解析位置的分布被联合采样,使得整个序列的加性统计在生成过程中是可处理的。我们提出了一种针对掩码扩散语言模型的水印,该水印控制文本的全局向量草图表示。与上下文相关的水印相比,草图公式将检测与生成过程中看到的局部上下文解耦,从而产生一个顺序无关的统计量和一个不表现为简单令牌偏差的水印规则。我们分析了该方法的失真、合理性和鲁棒性。

英文摘要

Watermarking methods for language models have been studied extensively in the autoregressive setting, where tokens are generated sequentially. These works largely focus on local-context schemes that perturb the next token's distribution as a function of its preceding tokens. In diffusion language models, distributions over many unresolved positions are jointly sampled, allowing additive statistics of the entire sequence to be tractable during generation. We propose a watermark for masked diffusion language models that controls a global, vector-valued sketch representation of the text. Compared to context-dependent watermarking, the sketch formulation decouples detection from the local contexts seen during generation, resulting in an order-agnostic statistic and a watermarking rule which does not manifest as a simple token bias. We analyze the distortion, soundness, and robustness properties of the method.

2606.04476 2026-06-04 cs.LG math.OC math.ST stat.ML stat.TH

When Both Layers Learn: Training Dynamics of Representing Linear Models via ReLU Networks

当两层都学习:通过ReLU网络表示线性模型的训练动力学

Berk Tinaz, Changzhi Xie, Mahdi Soltanolkotabi

AI总结 本文研究单隐层ReLU网络联合训练两层以拟合线性目标函数的梯度下降动力学,通过三阶段分析证明从随机初始化出发能以线性速率收敛到全局最小化器并达到最优样本复杂度。

详情
Comments
47 pages, 8 figures, published at the 39th Annual Conference on Learning Theory (COLT), 2026
AI中文摘要

在本文中,我们研究了联合训练单隐层ReLU网络的两层以拟合线性目标函数的梯度下降动力学。具体来说,我们考虑一个可实现设置,其中输入从高斯分布中独立同分布采样,标签遵循一个植入的线性模型。这种风格化的框架捕捉了逆问题和某些自编码器模型中端到端训练的关键特征。尽管其表面简单,但动力学仍然难以理解,部分原因是损失景观包含多个非严格鞍点,这使得不清楚为什么从随机初始化开始的梯度下降能够可靠地逃离坏的驻点区域。我们提供了优化景观的详细刻画,并证明从适度小的随机初始化开始——同时训练两层——梯度下降以线性速率收敛到全局最小化器,并具有阶次最优的样本复杂度。我们的分析通过三个阶段追踪轨迹:对齐阶段,其中隐藏权重逐渐与植入方向对齐,而输出权重保持正确的符号模式;增长阶段,其中两层的范数增加同时保持对齐;以及局部细化阶段,其中对齐的神经元快速收敛到植入方向,产生快速的局部收敛。为了严格证明梯度下降避免非严格鞍点,我们为端到端动力学开发了轨迹级控制论证。此外,我们建立了沿整个轨迹成立的新颖的均匀集中结果,这对于获得阶次最优的样本复杂度至关重要。我们通过一系列配置的大量实验验证了我们的理论。

英文摘要

In this paper, we study the gradient descent dynamics for jointly training both layers of a one-hidden-layer ReLU network to fit a linear target function. Concretely, we consider a realizable setting where inputs are drawn i.i.d. from a Gaussian distribution and labels follow a planted linear model. This stylized framework captures salient features of end-to-end training in inverse problems and certain auto-encoder models. Despite its apparent simplicity, the dynamics remain poorly understood, in part because the loss landscape contains multiple non-strict saddle points, making it unclear why gradient descent from random initialization reliably escapes bad stationary regions. We provide a detailed characterization of the optimization landscape and prove that gradient descent from a moderately small random initialization, training both layers simultaneously, converges to a global minimizer at a linear rate with order-wise optimal sample complexity. Our analysis tracks the trajectory through three phases: an alignment phase in which hidden weights progressively align with the planted direction while the output weights maintain the correct sign pattern; a growth phase in which the norms of both layers increase while preserving alignment; and a local refinement phase in which the aligned neurons rapidly converge to the planted direction, yielding fast local convergence. To rigorously show that GD avoids non-strict saddles, we develop trajectory-level control arguments for the end-to-end dynamics. In addition, we establish novel uniform concentration results that hold along the entire trajectory, and are essential for obtaining order-wise optimal sample complexity. We corroborate our theory with extensive experiments across a range of configurations.

2606.04445 2026-06-04 cs.LG cs.AI math.ST stat.TH

RowNet: A Memory Transformer for Tabular Regression

RowNet: 用于表格回归的记忆Transformer

Askat Rakhymbekov, Gulshat Muhametjanova

AI总结 针对房地产估值中表格回归问题,提出RowNet,一种基于检索的神经网络架构,通过记忆库中的成对相似性特征、目标一致性增强和混合专家模块实现价格预测。

详情
Comments
Retrieval-based neural architecture for real estate valuation. Related to TabR (arXiv:2307.14338) and retrieval-augmented tabular learning
AI中文摘要

房地产估值是一个结构化回归问题,其中价格受异构特征类型、稀疏区域效应、非线性交互以及可比房产的实际逻辑影响。标准多层感知器将每一行视为孤立向量,必须仅从监督中学习局部性、尺度敏感性和类别匹配。梯度提升决策树提供了强大的表格基线,但其以特征为中心的分裂机制并未显式建模相似历史观测的检索。本文提出了RowNet,一种用于房地产每平方米价格预测的基于检索的神经网络架构。RowNet通过针对标记属性记忆库的成对相似性特征来表示查询属性。第一检索层从仅特征相似性中估计粗略目标。第二层通过目标一致性特征增强记忆比较,并使用多个学习注意力头检索互补的可比集。最终的混合专家模块结合了学习门控、残差校正、熵正则化和头多样性正则化以产生预测。

英文摘要

Real estate valuation is a structured regression problem in which prices are governed by heterogeneous feature types, sparse regional effects, nonlinear interactions, and the practical logic of comparable properties. Standard multilayer perceptrons treat each row as an isolated vector and must learn locality, scale sensitivity, and categorical matching from supervision alone. Gradient-boosted decision trees provide strong tabular baselines, but their feature-centric splitting mechanism does not explicitly model the retrieval of similar historical observations. This paper presents RowNet, a retrieval-based neural architecture for real estate price-per-square-meter prediction. RowNet represents a query property through pairwise similarity features against a memory bank of labeled properties. A first retrieval layer estimates a coarse target from feature-only similarities. A second layer augments the memory comparison with target-consistency features and uses multiple learned attention heads to retrieve complementary comparable sets. A final mixture-of-experts module combines learned gating, residual correction, entropy regularization, and head-diversity regularization to produce the prediction.

2606.04440 2026-06-04 math.ST stat.TH

Asymptotic analysis of parameterised univariate Gaussian splitting

参数化单变量高斯分裂的渐近分析

Dmitry Mikhin, Athena Xiourouppa

AI总结 本文对 arXiv:2606.01530 中提出的单变量高斯分裂算法进行渐近分析,该算法通过最小化近似与原始高斯之间的平方 L^2 范数,用均匀间隔的同方差高斯分量混合来逼近标准一维高斯分布。

详情
Comments
23 pages
AI中文摘要

本文档详细推导了 arXiv:2606.01530 中开发的单变量分裂算法。该算法通过均匀间隔的同方差高斯分量的混合来逼近标准一维高斯分布。通过最小化近似与原始高斯之间失配的平方 $L^2$ 范数来求解。本文提出了在混合均值步长 $h$ 小和混合分量数量 $M$ 大的极限下所提议分裂的渐近分析。

英文摘要

This document provides in-depth details for the derivation of the univariate splitting algorithm developed in arXiv:2606.01530. The algorithm approximates the standard, 1-D Gaussian distribution with a mixture of uniformly spaced homoscedastic Gaussian components. The solution is found by minimising the squared $L^2$ norm of the mismatch between the approximation and the original Gaussian. This text presents asymptotic analyses of the proposed splitting in the limit of small step $h$ between the mixand means and in the limit of large number of mixands $M$.
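The least-squares structure behind this kind of splitting can be sketched numerically. The code below is an illustration under assumptions, not the algorithm of arXiv:2606.01530: it fixes the step `step` and the common standard deviation `sigma` by hand, leaves the weights unconstrained, and uses the closed form $\int N(x; a, s^2)\,N(x; b, t^2)\,dx = N(a-b; 0, s^2+t^2)$ so that no numerical integration is needed. The function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)

def l2_optimal_weights(num_components, step, sigma):
    """Weights minimising the squared L2 distance to N(0, 1).

    Means are uniformly spaced and centred at zero; all components share
    the standard deviation `sigma` (both are inputs here, by assumption).
    """
    means = step * (np.arange(num_components) - (num_components - 1) / 2.0)
    # Gram matrix of the components: <phi_i, phi_j> = N(mu_i - mu_j; 0, 2 sigma^2)
    gram = normal_pdf(means[:, None] - means[None, :], 2.0 * sigma**2)
    # Inner products with the target: <phi_i, N(0, 1)> = N(mu_i; 0, 1 + sigma^2)
    cross = normal_pdf(means, 1.0 + sigma**2)
    weights = np.linalg.solve(gram, cross)  # normal equations of the L2 problem
    target_norm = normal_pdf(0.0, 2.0)      # ||N(0, 1)||^2 = N(0; 0, 2)
    sq_error = target_norm - 2.0 * cross @ weights + weights @ gram @ weights
    return means, weights, float(sq_error)
```

For example, `l2_optimal_weights(11, 0.5, 0.5)` returns the eleven means from -2.5 to 2.5 together with the weights and the residual squared $L^2$ error, which can then be inspected as $h$ shrinks or $M$ grows.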

2606.04429 2026-06-04 stat.ML cs.LG

Flatness and Generalization: Learning Multi-Index Models with Homogeneous Neural Networks

平坦性与泛化:使用齐次神经网络学习多指标模型

Harsh Vardhan, Hossein Taheri, Arya Mazumdar

AI总结 本文研究两层齐次神经网络学习多指标模型时,平坦性与泛化之间的关系,证明最平坦插值器总能泛化,而某些非泛化插值器的平坦性无法接近最平坦值。

详情
AI中文摘要

用于解释一阶梯度方法在非凸神经网络上泛化能力的常见启发式方法是“平坦插值器泛化良好”(Hochreiter and Schmidhuber, 1994; Keskar et al., 2017),其中平坦性可通过经验损失Hessian矩阵的迹来衡量。然而,Dinh等人(2017)表明,利用网络的对称性(可在保持总体和经验损失不变的情况下改变平坦性),任何插值器都可以变得更尖锐或更平坦。这一结果使得之前的启发式陈述变得空洞。在本文中,我们表明,对于使用两层非凸齐次神经网络学习未知多指标模型,尽管存在对称性,平坦性与泛化之间仍存在联系。这种联系涉及“最平坦”插值器,即所有插值器中具有阶数最小平坦性的插值器。首先,我们证明存在一类自然的非泛化插值器,其平坦性即使利用对称性也无法接近最平坦可能值。其次,我们证明,对于由单指标模型之和生成的数据,如果近似误差和标签噪声较低,任何最平坦插值器都能实现较小的总体损失,即最平坦插值器总是泛化的。这建立了平坦性与泛化之间的直接联系,适用于一大类激活函数和现实数据分布。

英文摘要

A common heuristic used to explain the generalization of first-order gradient methods on non-convex neural networks is that "flat interpolators generalize well" (Hochreiter and Schmidhuber, 1994; Keskar et al., 2017), where flatness can be measured by the trace of the Hessian of the empirical loss. However, Dinh et al. (2017) showed that, using symmetries of the network that can change flatness while keeping the population and empirical losses unchanged, any interpolator can be made sharper or flatter. This result makes the earlier heuristic statement vacuous. In this paper, we show that for learning an unknown multi-index model with $2$-layer non-convex homogeneous neural networks, there is a connection between flatness and generalization, despite the existence of symmetries. This connection pertains to the "flattest" interpolators, i.e., the interpolators that have orderwise minimum flatness among all interpolators. First, we show that there exists a natural class of non-generalizing interpolators whose flatness cannot be made closer to the flattest possible, even using symmetries. Second, we show that for data generated by a sum of single-index models, if the approximation error and label noise are low, any flattest interpolator achieves small population loss, i.e., the flattest interpolators always generalize. This establishes a direct link between flatness and generalization which applies to a large class of activations and realistic data distributions.
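The rescaling symmetry mentioned in this abstract is easy to check numerically. The sketch below is illustrative and rests on several assumptions: a bias-free two-layer ReLU network, the squared loss $\frac{1}{2n}\sum_i (f(x_i)-y_i)^2$, and an exact interpolator, where the Hessian reduces to the Gauss-Newton term so that its trace is the mean squared gradient norm of the network output. All function names are hypothetical.

```python
import numpy as np

def network(x, w, a):
    """Two-layer ReLU network f(x) = sum_j a_j * relu(w_j . x)."""
    return np.maximum(x @ w.T, 0.0) @ a

def hessian_trace_at_interpolator(x, w, a):
    """Trace of the squared-loss Hessian when all residuals are zero.

    Returns (trace, output_layer_part, hidden_layer_part).
    """
    pre = x @ w.T                                # (n, m) pre-activations
    act = np.maximum(pre, 0.0)
    # Gradient w.r.t. a_j is relu(w_j . x)
    out_part = np.mean(np.sum(act**2, axis=1))
    # Gradient w.r.t. w_j is a_j * 1[w_j . x > 0] * x
    gate = (pre > 0).astype(float)
    hid_part = np.mean(np.sum((gate * a) ** 2, axis=1) * np.sum(x**2, axis=1))
    return out_part + hid_part, out_part, hid_part

def rescale(w, a, c):
    """Symmetry (w, a) -> (c * w, a / c) with c > 0; leaves the function unchanged."""
    return c * w, a / c
```

Under the rescaling with factor `c`, the output-layer part is multiplied by `c**2` and the hidden-layer part by `1 / c**2`, so the trace changes while the network outputs, and hence both losses, stay the same.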

2606.04423 2026-06-04 cs.LG stat.ML

The price of multi-group transductive learning

多组转导学习的代价

Noah Bergam, Samuel Deng, Daniel Hsu

AI总结 本文证明在转导学习设置中,多组学习器在某些组上的错误率可能相对于单组设置产生乘法惩罚,且惩罚随组数线性增长至样本量的平方根,这与统计设置中惩罚至多对数增长且与组数无关形成鲜明对比。

详情
AI中文摘要

我们证明,在转导设置中,每个多组学习器在某些组上的错误率相对于单组设置可能产生乘法惩罚,并且惩罚可以随组数线性增加,最多达到样本量的平方根。这与类似(组可实现)统计设置中的最优多组学习器形成鲜明对比,后者的惩罚始终至多是样本量的对数,且与组数无关。

英文摘要

We show that every multi-group learner in the transductive setting may incur a multiplicative penalty in its error rate on some group relative to the error rate achievable in the single-group setting, and that the penalty can increase linearly with the number of groups, up to roughly the square root of the sample size. This stands in stark contrast to optimal multi-group learners in an analogous (group-realizable) statistical setting, where the penalty is always at most logarithmic in the sample size and independent of the number of groups.

2606.04417 2026-06-04 stat.ME stat.CO

saCI: An R Package for Stochastic Approximation Confidence Intervals for Correlation Coefficients

saCI: 一个用于相关系数随机逼近置信区间的R包

Pengyu Chen, Yifan Jiang, Jiashuo Shao

AI总结 本文介绍了一个R包saCI,它实现了随机逼近方法,用于构建Pearson相关系数的非参数置信区间,并提供了与BCa自助法的比较及交互式Shiny应用。

详情
Comments
8 pages, 1 figure, R package
AI中文摘要

本文介绍了saCI,一个实现随机逼近方法以构建Pearson相关系数非参数置信区间的R包。该包基于Garthwaite (1996)提出的算法,并由Xiong & Xu (2016)进一步发展。实现提供了随机逼近(SA)方法和Bootstrap BCa方法以供比较,以及一个用于探索性分析的交互式Shiny应用。该包已成功发布在CRAN上,证明了其符合R包标准和可重复性。

英文摘要

This paper presents saCI, an R package that implements the stochastic approximation method for constructing nonparametric confidence intervals for Pearson's correlation coefficient. The package is based on the algorithm proposed by Garthwaite (1996) and further developed by Xiong & Xu (2016). The implementation provides both the stochastic approximation (SA) method and the bootstrap BCa method for comparison, along with an interactive Shiny application for exploratory analysis. The package has been successfully published on CRAN, demonstrating its compliance with R package standards and reproducibility.

2606.04416 2026-06-04 stat.ME math.ST stat.AP stat.TH

Powerful Multivariate Sensitivity Analysis via Sample Splitting in an Observational Study of the Effects of Poverty on Cardiovascular Disease Risk Factors

基于样本分割的观察性研究中贫困对心血管疾病风险因素影响的多变量敏感性分析

William Bekerman, Anurag Mehta, Rebecca E. Hasson, Leah E. Robinson, Dylan S. Small, Colin B. Fogarty

AI总结 提出通过样本分割(规划样本选择最优线性组合,分析样本进行推断)来增强观察性研究中多结局全局零假设检验对未测量偏倚的敏感性分析功效,并应用于贫困对儿童青少年心血管疾病风险因素影响的研究。

详情
AI中文摘要

在评估观察性研究中暴露对两个或更多结局的因果效应时,结局的线性组合可能降低全局零假设检验对潜在未测量偏倚的敏感性。虽然可以使用Scheffe投影或其约束变体考虑所有评分结局的线性组合,但找到最小化对未测量偏倚敏感性的组合需要对多重检验进行校正,这会削弱功效,尤其是当许多结局感兴趣时。为了缓解这一问题,我们提出将样本分割为规划样本(用于识别最优线性组合)和分析样本(用于进行推断)。我们提供了该方法的线性组合集的新颖刻画,保证该方法与全样本替代方法实现相同的渐近功效,并进行了广泛的模拟研究,证明在有限样本中增强了功效。最后,我们将该方法应用于研究贫困对儿童和青少年心血管疾病风险因素出现的影响。我们发现了与身体成分、体力活动和烟草暴露相关的结局的不良后果。尽管贫困对高烟草暴露的影响对未测量混杂因素表现出一定的稳健性,但其他发现仍然对潜在偏倚敏感。

英文摘要

When assessing the causal effect of an exposure on two or more outcomes in an observational study, a linear combination of outcomes may lessen the sensitivity of a test of the global null hypothesis to potential unmeasured biases. While all linear combinations of scored outcomes can be considered using Scheffe projections or constrained variants thereof, finding the combination that minimizes sensitivity to unmeasured biases requires corrections for multiple testing, which can erode power, especially when many outcomes are of interest. To mitigate this issue, we propose splitting the sample into a planning sample to identify an optimal linear combination and an analysis sample to conduct inference. We provide a novel characterization of the set of linear combinations for which this approach is guaranteed to achieve the same asymptotic power as full-sample alternatives and conduct extensive simulation studies that demonstrate enhanced power in finite samples. Finally, we apply our method to investigate the effects of poverty on the emergence of cardiovascular disease risk factors in children and adolescents. We discover adverse consequences on outcomes related to body composition, physical activity, and tobacco exposure. Although the impact of poverty on elevated tobacco exposure shows some robustness to unmeasured confounding, the other findings remain sensitive to potential biases.

2606.04404 2026-06-04 stat.ML cs.LG

Knockoffs-based False Discovery Rate Control and Simplification for Deep Neural Networks

基于Knockoffs的深度神经网络错误发现率控制与简化

Huiqi Zhang, Wenyu Liao, Yiqing Shi, Xiaobo Huang, Fang Xie

AI总结 本文基于knockoff方法和正则化神经网络,提出了三种在控制错误发现率条件下的变量筛选方法(单层过滤、多层过滤、变量权重聚合过滤),以简化深度神经网络并降低计算复杂度。

详情
AI中文摘要

深度神经网络是机器学习中广泛使用的框架,已广泛应用于各个领域。然而,深度神经网络通常涉及大量参数和输入,其中许多可能与目标或真实输出无关。这些参数和输入变量不仅增加了计算复杂度,还导致了额外的计算成本。解决这一问题的一种方法是knockoff方法,该方法在高维回归中已被证明能有效控制错误发现率。基于knockoff方法和正则化神经网络,本文提出了三种在控制错误发现率条件下的变量筛选方法:单层过滤、多层过滤、变量权重聚合过滤。与现有算法相比,我们发现我们的算法表现出令人满意的性能。

英文摘要

Deep neural networks are a widely used machine learning framework with applications in many fields. However, deep neural networks often involve a large number of parameters and inputs, many of which may be irrelevant to the goal or true output. These parameters and input variables increase both computational complexity and computational cost. One solution to this problem is knockoff methods, which have proven successful in controlling false discovery rates in high-dimensional regression. Building on knockoff methods and a regularised neural network, this paper proposes three variable screening methods that control the false discovery rate: the one layer filter, the multiple layers filter, and the variable weight aggregation filter. In comparison with existing algorithms, we find that our algorithms show satisfactory performance.
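For readers unfamiliar with knockoffs, the selection step shared by knockoff procedures can be sketched as below. This shows the generic knockoff+ threshold applied to a vector of feature statistics; it is not any of the three filters proposed in the paper, and how the statistics `w_stats` are computed from a network is left open here. The function name is hypothetical.

```python
import numpy as np

def knockoff_plus_select(w_stats, q):
    """Knockoff+ selection at target FDR level q.

    w_stats[j] is large and positive when original variable j looks more
    important than its knockoff copy. Returns (threshold, selected indices).
    """
    w_stats = np.asarray(w_stats, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, smallest first.
    candidates = np.sort(np.unique(np.abs(w_stats[w_stats != 0])))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(w_stats <= -t)) / max(1, np.sum(w_stats >= t))
        if fdp_hat <= q:
            return t, np.flatnonzero(w_stats >= t)
    return np.inf, np.array([], dtype=int)  # nothing can be selected at level q
```

With `w_stats = [10, 8, 6, 5, 4, -1, 0.5, -0.3]` and `q = 0.25`, the first threshold whose estimated proportion falls below the target is 4, so the five variables with the largest statistics are selected.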

2606.04384 2026-06-04 cs.LG cs.CR stat.ML

Revisiting Privacy Amplification by Subsampling in Selective Release DPSGD

重新审视选择性释放DPSGD中的子采样隐私放大

Xiaobo Huang, Fang Xie

AI总结 针对DPSGD中梯度裁剪和噪声注入导致的效用下降和收敛缓慢问题,重新评估选择性释放机制的隐私分析,提出基于裁剪梯度的差分隐私选择性释放算法(DPSR-CG),通过严格的隐私分析和实验证明其在保持严格隐私保证的同时实现优异模型性能。

详情
AI中文摘要

机器学习对敏感数据的依赖需要差分隐私随机梯度下降(DPSGD)等隐私保护技术。然而,由于梯度裁剪和噪声注入,DPSGD存在显著的效用下降和收敛缓慢的问题。先前的工作试图从不同角度改进DPSGD;值得注意的是,差分隐私选择性更新与释放(DPSUR)算法取得了显著的模型效用。然而,DPSUR中的隐私核算忽略了选择性释放机制引入的采样概率变化,这损害了其隐私保证的严谨性。为了解决这些限制,我们重新评估了选择性释放机制的隐私分析,并提出了一种新颖的算法:基于裁剪梯度的差分隐私选择性释放(DPSR-CG)。通过严格的新推导隐私分析以及在多个数据集(MNIST、CIFAR-10、IMDB和FMNIST)上的广泛实验,我们证明了我们的DPSR-CG机制在保持严格隐私保证的同时实现了卓越的模型性能。

英文摘要

Machine learning's reliance on sensitive data necessitates privacy-preserving techniques like Differentially Private Stochastic Gradient Descent (DPSGD). However, DPSGD suffers from substantial utility degradation and slow convergence due to gradient clipping and noise injection. Prior works have attempted to improve DPSGD from various perspectives; notably, the Differentially Private Selective Update and Release (DPSUR) algorithm has achieved remarkable model utility. However, the privacy accounting in DPSUR overlooks the variation in sampling probability introduced by the selective release mechanism, which compromises the rigor of its privacy guarantees. To address these limitations, we re-evaluate the privacy analysis of the selective release mechanism and propose a novel algorithm: Differentially Private Selective Release based on Clipped Gradients (DPSR-CG). Through a rigorous, newly derived privacy analysis and extensive experiments on multiple datasets (MNIST, CIFAR-10, IMDB, and FMNIST), we demonstrate that our DPSR-CG mechanism maintains strict privacy guarantees while achieving exceptional model performance.
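
The two operations the abstract blames for utility loss, gradient clipping and noise injection, can be sketched as below. This is a plain DPSGD step, not the DPSR-CG algorithm, and it performs no privacy accounting. The function names and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def clip(grad, clip_norm):
    """Rescale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One noisy step: clip each gradient, sum, add Gaussian noise, average."""
    total = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    batch_size = len(per_example_grads)
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]
```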

2606.04380 2026-06-04 stat.ML cs.LG

REGAIN: REconciliation GAIN-driven Auxiliary Direction Learning

REGAIN:基于调和增益的辅助方向学习

Weijia Li, Shun Hu, Yanfei Kang

AI总结 提出REGAIN框架,通过学习归一化辅助方向并利用冻结预测预言机,基于目标加权损失减少选择方向,以改进预测调和。

详情
AI中文摘要

预测调和通常从固定测量系统开始,询问如何将预测投影到一致空间。我们提出不同问题:哪些额外的线性测量应被预测并纳入调和系统?我们提出REGAIN,一种调和增益框架,学习归一化辅助方向,用冻结预测预言机预测诱导序列,并通过增强广义最小二乘调和后的目标加权损失减少选择方向。与基于方差的分量或基于可预测性的辅助选择不同,REGAIN优化辅助测量对最终调和预测的下游影响。我们提供统计特征,表明有用的辅助方向必须提供关于未解决目标不确定性的互补信息,而不仅仅是易于预测。分析还阐明了协方差风险减少机制、偏差变化在实现二次风险中的作用以及估计增益信号的稳定性。开发了带有保留增益筛选的分阶段学习算法,以及可选的联合优化步骤。在北京PM2.5和澳大利亚旅游数据上的实验表明,增益选择的测量可以改进普通多变量和层次预测,特别是当它们揭示原始测量系统未捕捉的残差不确定性时。

英文摘要

Forecast reconciliation usually starts from a fixed measurement system and asks how forecasts should be projected onto a coherent space. We ask a different question: which additional linear measurements should be forecast and included in the reconciliation system? We propose REGAIN, a reconciliation-gain framework that learns normalized auxiliary directions, forecasts the induced series with a frozen forecasting oracle, and selects directions by their target-weighted loss reduction after augmented generalized least-squares reconciliation. Unlike variance-based components or predictability-based auxiliary selection, REGAIN optimizes the downstream effect of an auxiliary measurement on the final reconciled forecasts. We provide a statistical characterization showing that useful auxiliary directions must provide complementary information about unresolved target uncertainty, rather than merely being easy to forecast. The analysis also clarifies the covariance-risk reduction mechanism, the role of bias changes in realized quadratic risk, and the stability of estimated gain signals. A stagewise learning algorithm with held-out gain screening is developed, together with an optional joint refinement step. Experiments on Beijing PM2.5 and Australian Tourism data show that gain-selected measurements can improve both ordinary multivariate and hierarchical forecasts, especially when they reveal residual uncertainty not captured by the original measurement system.

2606.04375 2026-06-04 cs.LG stat.ML

When Do Fewer Coordinates Suffice in DP-SGD?

何时在DP-SGD中更少的坐标就足够了?

Huiqi Zhang, Fang Xie

AI总结 本文提出一种无需公共数据的两阶段坐标稀疏私有训练方法TP-TopK,通过私有预热阶段识别坐标支撑集,使得噪声项缩放比例从全参数维度d降至活跃维度k,并在非凸平稳性边界下给出坐标限制有效的条件。

详情
Comments
14 pages
AI中文摘要

差分隐私随机梯度下降(DP-SGD)向每个更新的坐标注入噪声,使得注入的噪声能量随环境参数维度\(d\)缩放。我们探究私有训练何时可以更新更少的坐标而不丢失优化所需的信号。我们提出TP-TopK(两阶段TopK DP-SGD),一种无需公共数据的坐标稀疏私有训练的两阶段方法,其中私有预热阶段识别用于指导主训练阶段的坐标支撑集。我们给出了一个刻画坐标限制何时有益的准则,通过非凸平稳性边界表明在该条件下相关噪声项随活跃维度\(k\)而非全参数维度\(d\)缩放,并提供了基于预热的坐标排序可靠性的下界。在MNIST、FMNIST和CIFAR-10上的实验表明,学习到的坐标支撑集比大小匹配的随机支撑集能保留更多的梯度能量,当活跃维度较小且预热分数信息丰富时收益最大。

英文摘要

Differentially private stochastic gradient descent (DP-SGD) injects noise into every updated coordinate, making the injected noise energy scale with the ambient parameter dimension \(d\). We ask when private training can update fewer coordinates without losing the signal needed for optimization. We propose TP-TopK (Two-Phase TopK DP-SGD), a two-phase method for coordinate-sparse private training without public data, in which a private warm-up phase identifies a coordinate support used to guide the main training phase. We give a criterion characterizing when coordinate restriction can be beneficial, show via a nonconvex stationarity bound that under this condition the relevant noise term scales with the active dimension \(k\) rather than the full parameter dimension \(d\), and provide a lower bound on the reliability of warm-up-based coordinate ranking. Experiments on MNIST, FMNIST, and CIFAR-10 show that learned coordinate supports can retain more gradient energy than size-matched random supports, with the largest gains when the active dimension is small and warm-up scores are informative.
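
The scaling claim can be illustrated with a toy sketch: independent Gaussian noise of standard deviation sigma on m coordinates has expected squared norm m times sigma squared, so restricting updates to k coordinates shrinks that term from d to k. The code assumes the warm-up scores are already given and omits clipping and privacy accounting, so it is not the TP-TopK procedure itself. All function names are hypothetical.

```python
import random


def topk_support(scores, k):
    """Indices of the k coordinates with the largest absolute warm-up score."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(order[:k])


def noisy_restricted_gradient(grad, support, sigma, rng):
    """Keep only supported coordinates and add Gaussian noise to each of them."""
    out = [0.0] * len(grad)
    for i in support:
        out[i] = grad[i] + rng.gauss(0.0, sigma)
    return out


def expected_noise_energy(num_coords, sigma):
    """Expected squared L2 norm of i.i.d. N(0, sigma^2) noise on num_coords entries."""
    return num_coords * sigma ** 2


def retained_energy(grad, support):
    """Fraction of squared gradient norm that lies on the chosen support."""
    total = sum(g * g for g in grad)
    return sum(grad[i] ** 2 for i in support) / total if total > 0 else 0.0
```

Comparing `expected_noise_energy(d, sigma)` with `expected_noise_energy(k, sigma)` against `retained_energy` shows the trade-off the abstract describes: less noise, at the cost of whatever gradient signal lies off the support.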

2606.04357 2026-06-04 stat.CO stat.ME

A New Perspective on Reverse Diffusion for Monte Carlo Sampling

蒙特卡洛采样中反向扩散的新视角

Jairon H. N. Batista, Flávio B. Gonçalves, Yuri F. Saporito, Rodrigo S. Targino

AI总结 提出一种无需时间离散化和分数估计的反向扩散采样方法,通过嵌入目标密度到有限时域扩散过程的初始边际,并基于Radon-Nikodym导数推导出两类并行化MCMC算法,有效处理多模态和复杂依赖结构。

详情
AI中文摘要

本文提出了一种使用反向扩散过程从非归一化密度中采样的新视角。核心思想是将目标密度嵌入到适当构造的有限时域扩散过程的初始时刻边际分布中。与现有方法不同,所提出的方法既没有时间离散化误差,也不需要分数函数估计,因此蒙特卡洛变异性是唯一的近似来源。一个关键的理论结果刻画了反向扩散转移分布相对于Ornstein-Uhlenbeck(OU)过程的Radon-Nikodym导数。这种表示提供了一种易处理的测度变换公式,并作为两类不同蒙特卡洛算法的基础。第一类通过一系列伪边际Metropolis-Hastings MCMC算法近似反向转移分布。该方案产生目标分布的近似独立同分布样本,并且由于轨迹可以独立生成,因此完全可并行化。第二类包括针对整个扩散路径在[0,T]上的联合分布进行采样的MCMC算法,其中T是适当选择的时域。所提出的采样器结合了三种类型的更新。一种更新根据OU动力学在时间上向前模拟扩散,条件于其初始值。其余两种通过Metropolis型步骤更新反向分量:一种条件于时间T的终值,另一种则不。在这两种情况下,接受概率都使用Barker型伯努利工厂构造实现。所提出的方法对于具有多模态和复杂依赖结构的目标表现良好,为广泛使用的随机游走Metropolis算法提供了一种可扩展且高效的替代方案。

英文摘要

This paper introduces a novel perspective on the use of reverse diffusion processes for sampling from unnormalized densities. The central idea is to embed the target density as the marginal at the initial time of a suitably constructed diffusion process evolving over a finite horizon. In contrast to existing approaches, the proposed methodology involves neither time discretization error nor score function estimation, so that Monte Carlo variability is the only source of approximation. A key theoretical result characterizes the Radon-Nikodym derivative of the reverse diffusion transition distribution with respect to that of an Ornstein-Uhlenbeck (OU) process. This representation provides a tractable change-of-measure formulation and serves as the foundation for two distinct classes of Monte Carlo algorithms. The first class approximates the reverse transition distribution via a sequence of pseudo-marginal Metropolis-Hastings MCMC algorithms. The resulting scheme produces an approximate i.i.d. sample from the target distribution and is fully parallelizable, as trajectories can be generated independently. The second class consists of MCMC algorithms targeting the joint law of the whole diffusion path in $[0,T]$, for a suitably chosen horizon $T$. The proposed samplers combine three types of updates. One update simulates the diffusion forward in time according to an OU dynamics, conditional on its initial value. The remaining two update the backward component via Metropolis-type steps: one conditions on the terminal value at time $T$ and the other one does not. In both cases, acceptance probabilities are implemented using Barker-type Bernoulli factory constructions. The proposed methods perform well for targets with multimodality and complex dependence structures, providing a scalable and efficient alternative to the widely used random-walk Metropolis algorithm.
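
As background on the Barker-type acceptance rule mentioned above, here is a minimal sketch that assumes the target ratio can be evaluated directly. In the paper the acceptance decision is realised through Bernoulli factory constructions instead, so this only illustrates the acceptance probability r / (1 + r), not the proposed samplers. The function names and the symmetric-proposal assumption are illustrative.

```python
import math
import random


def barker_accept_prob(log_ratio):
    """Barker acceptance probability r / (1 + r), computed stably from log r."""
    if log_ratio >= 0:
        return 1.0 / (1.0 + math.exp(-log_ratio))
    e = math.exp(log_ratio)
    return e / (1.0 + e)


def barker_step(x, proposal, log_target, rng):
    """Accept or reject a proposal drawn from a symmetric proposal kernel."""
    p = barker_accept_prob(log_target(proposal) - log_target(x))
    return proposal if rng.random() < p else x
```

Unlike the Metropolis-Hastings rule min(1, r), the Barker probability never exceeds r / (1 + r), which is the form Bernoulli factory methods are able to simulate.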

2606.04334 2026-06-04 astro-ph.EP astro-ph.IM stat.AP

Hybrid Particle Gaussian Mixture (H-PGM) Solution for Cislunar Target Tracking

混合粒子高斯混合(H-PGM)方法用于地月空间目标跟踪

Ishan Paranjape, Tarun Hejmadi, Utkarsh Ranjan Mishra, Suman Chakravorty

AI总结 针对地月空间三体非平面效应主导下高斯轨道确定方法失效的问题,提出一种基于马尔可夫链蒙特卡洛和卡尔曼更新的混合粒子高斯滤波框架,融合角度观测和先验信息实现短长期目标跟踪。

详情
Comments
38 pages, 14 figures, to be submitted to the Journal of Astronautical Sciences
AI中文摘要

高斯轨道确定方法是天体动力学中最流行的、假设最少的目标跟踪技术之一,尤其用于生成初始状态估计。然而,由于高斯方法假设开普勒运动(属于更大的二体问题的一部分),该方法无法应用于地月空间环境,其中三体非平面效应占主导地位。在这项工作中,我们展示了一种混合粒子高斯混合(H-PGM)滤波方法,这是一种纯递归概率轨道确定框架,依赖于基于马尔可夫链蒙特卡洛(MCMC)的粒子高斯混合-II(PGM-II)和基于卡尔曼更新的粒子高斯混合-I(PGM-I)滤波器的顺序组合。该方法使我们能够将概率信息与来自地面望远镜的仅角度观测融合,用于地月空间目标的短期和长期跟踪。该方法还允许我们融合其他目标先验信息,以在短期内减少目标不确定性。这种混合滤波技术在几种流行且重要的地月轨道体制中进行了演示,并与几种同质和混合滤波框架进行了比较。

英文摘要

Gauss's method of orbit determination (OD) is one of the most popular, minimal-assumption target tracking techniques in astrodynamics, especially for generating an initial state estimate. However, because Gauss's method assumes Keplerian motion (part of the larger two-body problem), it cannot be applied in a cislunar environment, where three-body, non-planar effects dominate. In this work, we showcase a hybrid Particle Gaussian Mixture (H-PGM) filtering method, a purely recursive probabilistic OD framework that relies upon a sequential combination of the Markov Chain Monte Carlo (MCMC) based Particle Gaussian Mixture-II (PGM-II) and Kalman update based Particle Gaussian Mixture-I (PGM-I) filters. This method allows us to fuse probabilistic information with angles-only observations from terrestrial telescopes for short- and long-term cislunar target tracking. This method also allows us to fuse other target a priori information in an effort to reduce target uncertainty in the short term. This hybrid filtering technique is demonstrated for several popular and important cislunar orbit regimes and compared with several homogeneous and hybrid filtering frameworks.

2606.04324 2026-06-04 cs.LG stat.ML

Neural Galerkin Normalizing Flows for Bayesian Inference of Diffusions with Inaccessible Boundaries

用于具有不可达边界的扩散模型贝叶斯推断的神经Galerkin归一化流

Riccardo Saporiti, Fabio Nobile

AI总结 提出一种新的归一化流架构,通过神经Galerkin框架求解Fokker-Planck方程,学习扩散过程在两次观测之间的转移密度函数,从而高效实现贝叶斯推断。

详情
Comments
27 pages, 12 figures
AI中文摘要

从离散观测对扩散模型参数进行贝叶斯推断的主要挑战之一是,在连续观测时间之间无法获得转移密度函数的解析表达式,而该函数是推导似然函数所必需的。扩展先前使用归一化流求解Fokker-Planck型偏微分方程的研究,我们提出一种新的归一化流架构,用于学习扩散过程在两个观测时间之间的转移密度函数。我们通过神经Galerkin框架,以狄拉克质量作为初始条件,在初始数据和扩散系数的指定训练分布上求解相关的Fokker-Planck方程来实现这一点。我们特别关注扩散矩阵在某些不可达边界区域消失的过程,例如满足Feller条件的随机波动率模型。沿观测轨迹评估所获得的转移密度的乘积近似似然函数,从而通过马尔可夫链蒙特卡洛实现廉价的后验采样。在离线训练阶段之后,推断变得显著更高效,因为它避免了为MCMC采样器提出的每个参数实时求解Fokker-Planck方程,或依赖其他涉及重复模拟扩散桥的无似然贝叶斯推断方法。

英文摘要

One of the primary challenges in Bayesian inference on the parameters of a diffusion model from discrete observations is the unavailability of an analytical expression for the transition density function between consecutive observation times, which is needed to derive the likelihood function. Extending previous studies that solve Fokker-Planck (FP) type partial differential equations with Normalizing Flows, we propose a new Normalizing Flow architecture to learn the transition density function of the diffusion process between two observation times. We do so by solving in a Neural Galerkin framework the associated FP equation with a Dirac mass as initial condition, over a specified training distribution of the initial datum and the coefficients of the diffusion. We specifically focus on processes whose diffusion matrix vanishes in certain inaccessible boundary regions, such as Stochastic Volatility models that satisfy a Feller condition. The product of the obtained transition densities evaluated along the observed trajectory approximates the likelihood function, thereby enabling cheap posterior sampling via Markov chain Monte Carlo (MCMC). After the offline training phase, inference becomes significantly more efficient, as it avoids the need to solve the FP equation in real time for each parameter proposed by the MCMC sampler or to rely on other likelihood-free methods for Bayesian inference that involve repeated simulation of diffusion bridges.
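
The likelihood construction described above, a product of transition densities along the observed path, can be sketched with a stand-in transition density. The example uses an Ornstein-Uhlenbeck process, whose transition density is known in closed form, purely as an assumption for illustration. In the paper the density would come from the trained normalizing flow, and the processes of interest have inaccessible boundaries, which the OU process does not.

```python
import math


def ou_log_transition(x_next, x_prev, dt, theta, sigma):
    """Log transition density of dX = -theta * X dt + sigma dW over a step dt."""
    mean = x_prev * math.exp(-theta * dt)
    var = sigma ** 2 * (1.0 - math.exp(-2.0 * theta * dt)) / (2.0 * theta)
    return -0.5 * math.log(2.0 * math.pi * var) - (x_next - mean) ** 2 / (2.0 * var)


def log_likelihood(path, dt, theta, sigma, log_transition=ou_log_transition):
    """Sum of log transition densities over consecutive, equally spaced observations."""
    return sum(
        log_transition(b, a, dt, theta, sigma) for a, b in zip(path, path[1:])
    )
```

An MCMC sampler over `(theta, sigma)` would call `log_likelihood` once per proposal, which is cheap only if `log_transition` is cheap to evaluate; that is the role the offline-trained flow plays in the paper.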

2606.04322 2026-06-04 stat.ME math.ST stat.TH

Robust Prediction Variance Estimation for Gaussian Process Regression Under Covariance Smoothness Misspecification

协方差平滑性误指定下高斯过程回归的鲁棒预测方差估计

Roberto Rivera

AI总结 针对协方差平滑性误指定导致预测方差低估的问题,提出一种新的均方预测误差估计方法,该方法在非等价工作模型与真实模型下表现更优。

详情
AI中文摘要

最佳线性无偏预测(BLUP)一直是广义线性混合模型、空间模型和高斯过程回归(GPR)中的主导方法。除了最优性质外,BLUP程序还能量化预测不确定性。然而,BLUP的一般实现步骤如下:(i)假设概率分布和协方差函数已知,仅协方差参数值未知;(ii)将参数估计值代入BLUP方程,得到估计的最佳线性无偏预测(EBLUP)及其方差。在实际应用中,真实的协方差函数是未知的,选择错误的协方差模型(特别是其平滑性)来估计参数会产生准EBLUP,其预测方差被低估。本文聚焦于GPR背景,首先证明当工作模型与真实模型非等价时,误指定对均方预测误差(MSPE)的影响收敛于一个正常数,且该影响在预测位置上平滑。然后,我们提出一种考虑协方差函数不确定性的准EBLUP的MSPE估计新方法。将新估计量与另外四种预测方差估计量进行比较。新的预测方差估计量通常优于所有其他竞争者,且协方差平滑性误指定越大,MSPE估计量之间的差异越大。

英文摘要

Best Linear Unbiased Prediction (BLUP) has been a dominant approach in Generalized Linear Mixed Models, spatial models, and Gaussian Process Regression (GPR). In addition to their optimal properties, BLUP procedures quantify prediction uncertainty. However, the general implementation of BLUP goes as follows: (i) assume the probability distribution and covariance function are known and that only the covariance parameter values are unknown; (ii) plug in parameter estimates into BLUP equations to get the Estimated Best Linear Unbiased Prediction (EBLUP) and its variance. In applications, the reality is that the true covariance function for the process is unknown and choosing the wrong covariance model, particularly its smoothness, to estimate parameters yields a quasi-EBLUP whose prediction variance is biased downward. Focusing on a GPR context, in this paper we first demonstrate that the effect of misspecification on the mean squared prediction error (MSPE) of the quasi-EBLUP converges to a positive constant when the working and true measures are non-equivalent, and is smooth in the prediction location. We then propose a new way to estimate the MSPE of the quasi-EBLUP that accounts for covariance function uncertainty. Our new estimator is compared to four other prediction variance estimators. The new prediction variance estimator generally performs better than all other competitors, and the larger the misspecification of the covariance smoothness, the wider the difference among MSPE estimators.

2606.04307 2026-06-04 cs.LG stat.CO stat.ME

Folded Transport MCMC: Certifiable Quotient Posterior Computation for Symmetric Bayesian Models

折叠传输MCMC:对称贝叶斯模型的可认证商后验计算

Jun Hu

AI总结 针对对称贝叶斯模型中的冗余多峰性导致MCMC收敛诊断退化的问题,提出Folded Transport MCMC方法,通过在对称群的基本域上构建独立采样器直接对商后验进行推断,并利用LCNF振荡认证框架在商度量下提供可证明的认证下界。

详情
Comments
48 pages (including supplementary material), 5 figures, 6 tables. Submitted to Journal of the Royal Statistical Society: Series B
AI中文摘要

具有有限对称性的贝叶斯模型——如可交换分量的混合模型、具有紧密间隔模态的结构识别——定义的后验在标签置换群下不变,产生冗余的多峰性,从而降低MCMC收敛诊断的质量。我们引入折叠传输MCMC(FolT-MCMC),该方法通过在对称群的基本域上构建独立采样器,直接对商后验进行推断。商提议分布通过对群轨道上学习的归一化流进行对称化得到。我们证明了基于LCNF振荡的认证框架可以迁移到商度量,并具有稳定子修正的球质量界和改进的覆盖半径,并且当未折叠流表现出跨模态提议缺陷时,分位数核心认证下界会得到改善。在高斯混合(d=2-20)、标签切换目标(最多24个等价模态)以及标准贝叶斯三分量混合后验上,分位数核心认证改进比从2倍到145倍不等,且折叠认证经验上几乎与维度无关。在台风山竹期间超高层建筑的真实加速度计数据上,FolT-MCMC产生了非平凡的分位数核心认证,而未折叠认证是平凡的。

英文摘要

Bayesian models with finite symmetry - mixture models with exchangeable components, structural identification with closely-spaced modes - define posteriors that are invariant under a group of label permutations, creating redundant multimodality that degrades MCMC convergence diagnostics. We introduce Folded Transport MCMC (FolT-MCMC), which performs inference directly on the quotient posterior by constructing an independence sampler on the fundamental domain of the symmetry group. The quotient proposal is formed by symmetrising a learned normalising flow over the group orbits. We prove that the LCNF oscillation-based certification framework transfers to the quotient metric with a stabiliser-corrected ball-mass bound and improved covering radius, and that the quantile-core certified lower bound improves whenever the unfolded flow exhibits cross-mode proposal deficiency. On Gaussian mixtures (d = 2 - 20), label-switching targets (up to 24 equivalent modes), and a standard Bayesian three-component mixture posterior, the quantile-core certified improvement ratio ranges from 2x to 145x, with the folded certificate empirically nearly dimension-free. On real accelerometer data from a supertall building during Typhoon Mangkhut, FolT-MCMC yields a non-vacuous quantile-core certificate where the unfolded certificate is vacuous.
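
Symmetrising a proposal over group orbits, and folding samples into a fundamental domain, can be sketched for the simplest case. The code assumes the symmetry group is the full permutation group acting on the coordinates of a tuple, and `density` is a hypothetical stand-in for the learned normalising flow. The certification framework from the paper is not represented.

```python
import itertools
import math


def symmetrised_density(density, x):
    """Average of density over all permutations of x (the orbit of x)."""
    perms = list(itertools.permutations(x))
    return sum(density(p) for p in perms) / len(perms)


def fold_to_fundamental_domain(x):
    """Canonical orbit representative: coordinates sorted in increasing order."""
    return tuple(sorted(x))
```

By construction the symmetrised density takes the same value on every relabelling of a point, so a sampler can work with sorted representatives only.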

2606.04305 2026-06-04 cs.LG stat.ML

Offline-to-Online Learning in Linear Bandits

线性Bandit中的离线到在线学习

Kushagra Chandak, Toshinori Kitamura, Xiaoqi Tan

AI总结 针对随机线性Bandit问题,提出一种平衡离线数据与在线探索的算法,实现次线性遗憾并随离线样本增加降低离线参考遗憾。

详情
AI中文摘要

我们研究了在随机线性Bandit设置中利用额外离线数据集进行在线学习的问题。尽管该问题在实践中频繁出现,但在结构化环境中,离线到在线的权衡仍然缺乏深入理解。我们提出了一种线性Bandit算法来平衡这种权衡:它在早期回合依赖离线数据,并随着时间推移逐渐增加探索。我们建立了遗憾界,表明我们的方法同时与纯在线和纯离线解决方案具有竞争力。特别地,相对于最优动作,它在在线交互次数上实现了次线性遗憾,而相对于离线参考的遗憾随着离线样本数量的增加而降低。实验结果进一步证明了该方法在各种问题参数下的有效性。

英文摘要

We study online learning with an additional offline dataset in the stochastic linear bandit setting. Although this problem arises frequently in practice, the offline-to-online tradeoff remains poorly understood in structured environments. We propose a linear bandit algorithm that balances this tradeoff: it relies on offline data during early rounds, and increasingly favors exploration as the horizon grows. We establish regret bounds showing that our method is simultaneously competitive with both purely online and purely offline solutions. In particular, it achieves sublinear regret relative to the optimal action in the number of online interactions, while its regret relative to an offline reference decreases as the number of offline samples grows. Empirical results further demonstrate its effectiveness across various problem parameters.

2606.04276 2026-06-04 math.ST stat.TH

Local Sensitivity Under Transport Restrictions

传输限制下的局部敏感性

Hongseok Namkoong

AI总结 本文通过Otto-Wasserstein几何中的局部敏感性量化结构知识(模型在观测数据前施加的限制)的价值,并应用于因果推断中倾向性得分已知与未知时敏感性差异的经典问题。

详情
AI中文摘要

我们量化了结构知识(建模者在看到数据之前对世界施加的限制)的价值。我们的分析工具是估计量在Otto-Wasserstein几何中对分布扰动的局部敏感性:估计量在单位概率质量位移(传输)下的最大一阶变化,等于有效影响函数空间梯度的对偶范数。建模者通过限制传输类别来编码结构知识,由此导致的敏感性降低就是其归纳偏差的价值。我们通过阐明因果推断中一个长期存在的谜题来说明这一点:尽管当倾向性得分未知时存在实际困难,但经典半参数效率边界在倾向性得分已知或未知时保持不变。我们的方法刻画了已知倾向性如何显著降低对错误设定的敏感性。

英文摘要

We quantify the value of structural knowledge, restrictions a modeler places on the world before seeing data. Our analytic workhorse is the local sensitivity of an estimand to distributional perturbations in the Otto-Wasserstein geometry: the largest first-order change in the estimand per unit displacement of probability mass (transport), equal to the dual norm of the spatial gradient of the efficient influence function. The modeler encodes structural knowledge by restricting the class of transports, and the resulting reduction in sensitivity is the value of their inductive bias. We illustrate by shedding light on a longstanding puzzle in causal inference where classical semiparametric efficiency bounds remain identical regardless of whether the propensity score is known, despite observed practical difficulties when it is unknown. Our approach characterizes how known propensities significantly reduce sensitivity to misspecification.

2606.04267 2026-06-04 math.ST cs.NA math.NA stat.TH

Unbiased estimation of squared concentration in the Fisher-von Mises-Langevin distribution and the impossibility of unbiased concentration

Fisher-von Mises-Langevin分布中平方浓度的无偏估计及浓度无偏估计的不可能性

Zain Jabbar, Yuqin Jiang, Andrey A. Popov

AI总结 本文证明了Fisher-von Mises-Langevin分布中浓度参数的无偏估计不可能,转而提出平方浓度(强度)的无偏估计,并基于部分和U统计量给出了(几乎)无偏估计量。

详情
AI中文摘要

Fisher-von Mises-Langevin分布中浓度参数的估计是方向统计中与高斯分布精度矩阵估计类似的问题。本文证明了该参数的无偏估计是不可能的。基于这一认识,我们提供了Fisher-von Mises-Langevin分布的另一种参数化形式,即平方浓度,我们称之为强度。我们进一步证明了其无偏估计是可能的,并基于部分和U统计量给出了(几乎)无偏估计量。我们在合成数据、纽约出租车行程数据和球形词嵌入上展示了我们的新估计量。

英文摘要

The estimation of the concentration parameter in the Fisher-von Mises-Langevin distribution is the directional statistics analogue of the estimation of the precision matrix for the Gaussian distribution. In this work we show that unbiased estimation of this parameter is impossible. With this realization in hand, we provide an alternative parameterization of the Fisher-von Mises-Langevin distribution in terms of the squared concentration, which we term the intensity. We further show that unbiased estimation thereof is possible, and provide (almost) unbiased estimators thereof in terms of a partial sum U-statistic. We showcase our new estimator on synthetic data, New York taxi trip data, and spherical word embeddings.

2606.04250 2026-06-04 stat.ME

Locally Equivalent Weights for Multilevel Regression and Poststratification

多层回归与事后分层的局部等价权重

Ryan Giordano, Alice Cima, Jared Murray, Erin Hartman, Avi Feller

AI总结 提出多层回归与事后分层(MrP)的局部等价权重(MrPlew),将MrP表示为与校准权重局部等价的加权估计量,从而支持方差估计、协变量平衡和子组贡献等诊断,并证明其渐近性质。

详情
Comments
60 pages
AI中文摘要

多层回归与事后分层(MrP)已成为从非概率调查中估计总体数量的主流方法,并且是传统调查校准加权方法(如迭代加权)的主要基于模型的替代方案。对于简单线性回归模型,MrP方法承认“等价权重”,允许在MrP和传统校准加权之间进行直接比较。然而,对于最广泛使用的MrP模型(如逻辑回归),这样的权重一直不可用。在本文中,我们开发了一种自然的推广,“MrP局部等价权重”(MrPlew),它将MrP表示为一种加权风格的估计量,在观测响应附近与校准权重局部等价。这使得一系列标准加权诊断成为可能,包括频率派抽样变异性、协变量平衡和子组贡献。我们正式证明了在这些情况下使用MrPlew的合理性:我们证明了基于MrPlew的方差估计量对于常见的指数族模型渐近等价于无穷小刀切法,并引入了一类基于数据扰动不变性的新型模型检查,将协变量平衡和子组贡献推广到非线性模型。我们进一步表明,MrPlew可以轻松使用现有的MCMC样本计算,并提供开源软件,使用标准软件的输出来计算MrPlew。我们通过几个使用MrP的经典研究说明了我们的方法,包括通过逻辑回归结果模型,表明隐含的协变量平衡有时对于MrP比对于迭代加权更差。鉴于计算的简便性,我们建议将MrPlew作为MrP模型审查工作流程的标准部分。

英文摘要

Multilevel regression and poststratification (MrP) has become a workhorse method for estimating population quantities from non-probability surveys, and is the primary model-based alternative to traditional survey calibration weighting methods, such as raking. For simple linear regression models, MrP methods admit "equivalent weights", allowing for direct comparisons between MrP and traditional calibration weighting. Such weights, however, have been unavailable for the most widely used MrP models, such as logistic regression. In this paper, we develop a natural generalization, "MrP locally equivalent weights" (MrPlew), which represent MrP as a weighting-style estimator that is locally equivalent to calibration weights near the observed responses. This enables a suite of standard weighting diagnostics, including frequentist sampling variability, covariate balance, and subgroup contribution. We formally justify the use of MrPlew in these cases: we prove the MrPlew-based variance estimator is asymptotically equivalent to the infinitesimal jackknife for common exponential family models, and we introduce a novel class of model checks based on invariance to data perturbations that generalize covariate balance and subgroup contribution to nonlinear models. We further show that MrPlew can be computed easily using existing MCMC samples and provide open-source software to compute MrPlew using the output of standard software. We illustrate our approach for several canonical studies that use MrP, including via a logistic regression outcome model, showing that implied covariate balance can sometimes be worse for MrP than for raking. Given the ease of computing, we recommend making MrPlew a standard part of the MrP model interrogation workflow.
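
For context on the calibration baseline the abstract compares against, raking (iterative proportional fitting) can be sketched as below. This is the traditional weighting method, not MrPlew, and the argument names are illustrative assumptions. Each variable is given as a list of group labels, one per respondent, together with the population total for each group.

```python
def rake(weights, groups_by_var, targets_by_var, n_iter=100):
    """Rescale weights until weighted group totals match each variable's targets."""
    w = [float(x) for x in weights]
    for _ in range(n_iter):
        for groups, targets in zip(groups_by_var, targets_by_var):
            totals = {}
            for wi, g in zip(w, groups):
                totals[g] = totals.get(g, 0.0) + wi
            # Scale every respondent by target / current total for their group.
            w = [wi * targets[g] / totals[g] for wi, g in zip(w, groups)]
    return w


def weighted_margin(weights, groups):
    """Weighted total per group, the quantity a covariate balance check compares."""
    totals = {}
    for wi, g in zip(weights, groups):
        totals[g] = totals.get(g, 0.0) + wi
    return totals
```

Comparing `weighted_margin` against the population targets is the covariate balance check that raking satisfies by construction on the raked variables.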

2606.04237 2026-06-04 stat.ME stat.CO

Constrained Weighted Bayesian Bootstrap

约束加权贝叶斯自助法

Sam Rosen, Jason Xu

AI总结 本文提出约束加权贝叶斯自助法,在温和假设下将加权贝叶斯自助法扩展到一般约束后验分布采样,通过凸优化工具实现快速算法,并证明其渐近分布协方差与有效估计量受限最大似然估计匹配。

详情
Comments
24 Pages, 8 Figures. Accepted to 42nd Conference on Uncertainty in Artificial Intelligence (uai2026)
AI中文摘要

我们证明了加权贝叶斯自助法(一种后验分布近似采样方法)可以在温和假设下扩展到从一般约束后验分布中采样。该方法涉及一个简单的算法,可以利用凸优化中的快速工具。在正则条件下,我们展示了约束加权贝叶斯自助法样本的渐近分布具有与受限最大似然估计(一种有效估计量)匹配的协方差。我们在各种约束贝叶斯问题上对方法进行了实证评估,展示了该方法的广泛适用性以及相对于现有同类方法的优势。约束加权贝叶斯自助法能够快速从约束后验中采样,为通常通过仅提供点估计的优化方法解决的问题提供足够的不确定性量化。作为案例研究,使用欧式期权价格所需的约束,通过约束加权贝叶斯自助法推导出了期权定价曲面的不确定性估计。

英文摘要

We prove the weighted Bayesian bootstrap, a method for approximate sampling of a posterior distribution, can be extended to sample from general constrained posterior distributions under mild assumptions. The method entails a simple algorithm that can take advantage of fast tools from convex optimization. Under regularity conditions, we show the asymptotic distribution of samples from the constrained weighted Bayesian bootstrap has a covariance matching the restricted maximum likelihood estimator, an efficient estimator. We assess the method empirically on a variety of constrained Bayesian problems, demonstrating broad applicability of the method as well as advantages over existing peer methods. The constrained weighted Bayesian bootstrap quickly samples from constrained posteriors, providing adequate uncertainty quantification for problems typically solved via optimization methods designed to deliver only a point estimate. As a case study, using constraints required in European-style option prices, uncertainty estimates of an option pricing surface are derived with constrained weighted Bayesian bootstrap.
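
The idea of sampling by repeatedly solving a randomly weighted, constrained optimization problem can be sketched in one dimension. The example assumes i.i.d. Exp(1) weights, a squared-error loss with no prior term, and a single lower-bound constraint, for which the constrained minimiser is available in closed form. These choices are illustrative assumptions, and the paper's general algorithm relies on convex optimization solvers instead.

```python
import random


def constrained_wbb_mean(data, lower, n_draws, rng):
    """Draws of argmin over theta >= lower of sum_i w_i * (x_i - theta)^2."""
    draws = []
    for _ in range(n_draws):
        w = [rng.expovariate(1.0) for _ in data]  # random observation weights
        weighted_mean = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
        # For a 1-D quadratic, the constrained minimiser is the clipped mean.
        draws.append(max(lower, weighted_mean))
    return draws
```

The spread of the returned draws serves as an uncertainty estimate for the constrained parameter, with point mass on the boundary whenever the constraint is active.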

2606.04215 2026-06-04 stat.AP

Contextual Geospatial Features for Identifying Informal Environmental-Health Hazards Undetectable from Satellites: A ULAB Case Study

用于识别卫星无法检测的非正式环境健康危害的情境地理空间特征:以废旧铅酸电池回收为例

Naia Ormaza-Zulueta, Zia Mehrabi

AI总结 本文提出利用情境地理空间特征和领域知识设计案例特定特征,以识别卫星无法检测的非正式环境健康危害,并通过废旧铅酸电池回收案例验证了该方法在训练集外检测非正式回收站点的有效性。

详情
AI中文摘要

可靠、可扩展地检测非正式、小规模的环境健康危害(废旧铅酸电池(ULAB)回收、家庭规模的电子垃圾焚烧、室内汞合金提取、砖窑、小型制革厂)仍然是一个未解决的问题。这些活动对卫星不可见,且不在正式登记册中,却对低收入和中等收入国家的低收入人群造成不成比例的伤害。本文阐述了问题类别,并探讨了一种可能的应对方案:情境地理空间特征,以及由领域专业知识指导的案例特定特征设计。我们以ULAB回收作为示范案例,利用了来自Pure Earth有毒场地识别项目的孟加拉国和印度164个已验证场地。在此样本量下,训练集上的五折交叉验证无法在统计上区分工程化的情境特征与简单的双特征社会人口基线。附加价值仅在训练集外评估时才显现。在非NCR印度和孟加拉国的172个保留的非正式回收站点上,模型给出的分数比匹配的随机城市对照点高出数倍;在独立的131个监管确认的正式回收商集合上,非正式站点在非NCR印度的得分显著高于正式站点,表明模型捕捉到了非正式回收商特有的结构,而非通用工业信号。我们将这些结果视为探索性而非确认性:标签稀疏性、兴趣点覆盖的空白以及未在南亚以外测试的迁移性仍然是开放问题。最后,我们提出了七个开放问题,并邀请环境健康和地理空间机器学习社区将非正式危害检测作为一类值得解决的问题来参与。

英文摘要

Reliable, scalable detection of informal, small-scale environmental-health hazards (used lead-acid battery (ULAB) recycling, household-scale e-waste burning, indoor mercury amalgamation, brick kilns, small tanneries) remains an unsolved problem. These operations are invisible to satellites and absent from formal registries, yet disproportionately harm low-income populations in low- and middle-income countries. This paper articulates the problem class and explores a possible response: contextual geospatial features, with case-specific feature design informed by domain expertise. We use ULAB recycling as a demonstration case, drawing on 164 verified sites in Bangladesh and India from Pure Earth's Toxic Sites Identification Programme. At this sample size, five-fold cross-validation on the training set cannot statistically distinguish the engineered contextual features from a simple two-feature socio-demographic baseline. The added value only becomes visible when we evaluate outside the training set. On 172 held-out informal-recycling sites in non-NCR India and Bangladesh, the model assigns scores several times higher than to matched random urban controls; and on an independent set of 131 regulatory-confirmed formal recyclers, informal sites score materially higher than formal ones in non-NCR India, indicating that the model is picking up informal-recycler-specific structure rather than generic industrial signal. We frame these results as exploratory rather than confirmatory: label sparsity, gaps in point-of-interest coverage, and untested transfer beyond South Asia all remain open. We close with seven open problems and invite the environmental-health and geospatial machine-learning communities to engage with informal-hazard detection as a class of problems worth solving.

2606.04182 2026-06-04 cs.LG cs.AI stat.ML

Exact Unlearning in Reinforcement Learning

强化学习中的精确遗忘

Thanh Nguyen-Tang, Raman Arora

AI总结 本文提出强化学习中的精确遗忘问题,通过ρ-TV稳定算法实现数据删除后输出与从未学习该数据时不可区分,并给出近乎最优的遗憾界。

详情
Comments
ICML Spotlight
AI中文摘要

我们提出了强化学习中的精确遗忘问题,目标是设计一个高效框架,使得在收到删除请求后能够移除任何用户的数据,即遗忘后在线学习者的输出与从未与学习者交互过的用户所产生的结果不可区分。对于任意 $ρ>0$,我们证明存在一个 $ρ$-TV 稳定的强化学习算法,支持精确遗忘过程,其期望计算成本仅为从头重新训练计算成本的 $ρ\sqrt{\ln T}$ 分之一。我们为表格型马尔可夫决策过程构造了这样一个 $ρ$-TV 稳定的强化学习算法,其遗憾界为 $\mathcal{O}(H^2 \sqrt{SAT} + H^3 S^2 A + H^{2.5} S^2 A/ρ)$,其中 $S, A, H, T$ 分别表示状态数、动作数、回合长度和回合数。我们还为 $ρ$-TV 稳定的强化学习算法建立了 $\Omega(H\sqrt{SAT} + SAH/ρ)$ 的下界,表明我们的算法几乎是极小化最优的。

英文摘要

We formulate the problem of exact unlearning in reinforcement learning, where the goal is to design an efficient framework that enables the removal of any user's data upon deletion request, i.e., the online learner's output after unlearning is indistinguishable from what would have been produced had the deleted user never interacted with the learner. For any $ρ>0$, we show that there exists a reinforcement learning (RL) algorithm that is $ρ$-TV-stable and supports an exact unlearning procedure whose expected computational cost is only a $ρ\sqrt{\ln T}$ fraction of the computational cost of retraining from scratch. We construct such a $ρ$-TV-stable RL algorithm for tabular Markov decision processes (MDPs), which achieves a regret bound of $\mathcal{O}(H^2 \sqrt{SAT} + H^3 S^2 A + H^{2.5} S^2 A/ρ)$, where $S, A, H$, and $T$ denote the number of states, the number of actions, the episode horizon, and the number of episodes, respectively. We also establish a lower bound of $Ω(H\sqrt{SAT} + SAH/ρ)$ for $ρ$-TV-stable RL algorithms, showing that our algorithm is nearly minimax optimal.

2606.04176 2026-06-04 cs.LG math.ST stat.ML stat.TH

Low-rank Distributional Matrix Completion

低秩分布矩阵补全

Jiayi Wang, Raymond K. W. Wong

AI总结 针对每个条目为概率分布的矩阵,提出基于核均值嵌入和Tucker秩的低秩结构,通过函数展开算子连接无限维与有限维,实现分布矩阵补全并给出非渐近误差界。

详情
AI中文摘要

我们研究了矩阵补全问题的分布推广,其中目标矩阵的每个条目是概率分布而非标量。在此设置中,仅观察到矩阵条目的一个子集,即使对于观察到的条目,底层分布也无法直接获取;相反,我们观察到从这些分布中抽取的有限样本。为了表示分布条目,我们采用核均值嵌入,并引入分布值矩阵的Tucker秩概念以捕捉其低秩结构。核嵌入的无限维性质带来了重大的方法论挑战。为解决此问题,我们引入了函数展开算子,将所提出的分布低秩结构与有限维张量的经典Tucker秩联系起来。基于此框架,我们提出了一种用于分布矩阵补全的新估计器。我们建立了非渐近误差界,刻画了估计器的统计性能。在合成数据和真实世界应用上的大量实验证明了所提方法的有效性。

英文摘要

We study a distributional generalization of the matrix completion problem in which each entry of the target matrix is a probability distribution rather than a scalar. In this setting, only a subset of matrix entries is observed, and even for observed entries, the underlying distributions are not directly accessible; instead, we observe finitely many samples drawn from them. To represent distributional entries, we employ kernel mean embeddings and introduce a notion of Tucker rank for distribution-valued matrices to capture their low-rank structure. The infinite-dimensional nature of kernel embeddings poses significant methodological challenges. To address this, we introduce functional unfolding operators that link the proposed distributional low-rank structure to the classical Tucker rank for finite-dimensional tensors. Based on this framework, we propose a novel estimator for distributional matrix completion. We establish non-asymptotic error bounds that characterize the statistical performance of the estimator. Extensive experiments on synthetic data and a real-world application demonstrate the effectiveness of the proposed method.

2606.04175 2026-06-04 stat.AP

Inferring cellular heterogeneity with mixture models for DNA methylation rates

利用DNA甲基化率的混合模型推断细胞异质性

Hugo Barbot, Yuna Blum, Magali Richard, David Causeur

AI总结 针对DNA甲基化数据中细胞异质性导致的反卷积精度问题,提出基于非负Beta回归混合模型和EM算法的特征选择方法,显著提升反卷积准确性。

详情
AI中文摘要

细胞异质性是生物组织的标志,在疾病进展、诊断和预后中起核心作用。然而,从整体分子谱中准确表征这种异质性仍然具有挑战性,因为观察到的信号来自多个细胞群体的混合。细胞反卷积旨在从这种异质性测量中恢复组成细胞类型的相对丰度,但大多数现有方法隐含地依赖于残差误差的限制性假设,包括独立性、同方差性和正态性。这些假设在组学数据中很少满足,因为组学数据固有地有界且过度分散。在这项工作中,我们展示了全基因组细胞类型特异性DNA甲基化谱表现出潜在的组结构,当忽略这些结构时,会显著损害反卷积精度。因此,我们提出了一种通过期望最大化算法估计的非负Beta回归混合模型,用于DNA甲基化率。我们的框架通过混合成分识别自然地整合了特征选择机制,使成分选择成为推理过程中的关键步骤。我们进一步提出了一个专门的成分选择标准,并通过在多个体外基准数据集上的广泛比较研究评估了该方法的性能。我们的结果表明,反卷积精度对潜在成分结构高度敏感,并表明显式建模这种异质性比标准的全基因组反卷积策略带来了显著的改进。总之,这项工作将DNA甲基化数据的混合建模确立为稳健和准确细胞反卷积的一个强大新方向。

英文摘要

Cellular heterogeneity is a hallmark of biological tissues and plays a central role in disease progression, diagnosis, and prognosis. Yet, accurately characterizing this heterogeneity from bulk molecular profiles remains challenging because observed signals arise from mixtures of multiple cell populations. Cell deconvolution aims to recover the relative abundance of constituent cell types from such heterogeneous measurements, but most existing approaches implicitly rely on restrictive assumptions on residual errors, including independence, homoscedasticity, and normality. These assumptions are rarely satisfied in omics data, which are inherently bounded and overdispersed. In this work, we show that whole-genome cell-type-specific DNA methylation profiles exhibit latent group structures that can substantially impair deconvolution accuracy when ignored. We therefore propose a mixture of non-negative Beta regression models estimated through an Expectation-Maximization algorithm for DNA methylation rates. Our framework naturally incorporates a feature selection mechanism through mixture component identification, making component selection a critical step of the inference procedure. We further propose a dedicated criterion for component selection and assess the performance of the approach through an extensive comparative study across several in vitro benchmark datasets. Our results demonstrate that deconvolution accuracy is highly sensitive to latent component structure and show that explicitly modeling this heterogeneity yields substantial improvements over standard whole-genome deconvolution strategies. Altogether, this work establishes mixture modeling of DNA methylation data as a powerful new direction for robust and accurate cell deconvolution.

2606.04170 2026-06-04 stat.AP

A Retrospective Benchmark of Spatiotemporal Covariates for Daily Active-Fire Detection in Cerrado Conservation Units

塞拉多保护区每日活跃火灾检测中时空协变量的回顾性基准

Juliano Eleno Silva Pádua, Alexandre Luis Magalhães Levada, Fredy João Valente

AI总结 本研究使用逻辑回归、随机森林和XGBoost,基于INPE卫星标签和MapBiomas土地覆盖数据,构建了巴西塞拉多保护区每日活跃火灾检测的回顾性基准,评估了四类协变量的性能。

详情
Comments
26 pages, 19 figures, 7 tables
AI中文摘要

野火威胁着巴西塞拉多地区的生物多样性、碳储量和管理能力,该地区的保护单位及其官方缓冲区必须在强旱季火灾制度下分配预防资源。本研究利用INPE BDQueimadas参考卫星标签(AQUA_M-T)、同一年MapBiomas Collection 9土地覆盖过滤的约束伪缺失,以及通过Google Earth Engine提取的四个嵌套协变量阶段,为巴西米纳斯吉拉斯州的塞拉多部分开发了一个回顾性每日活跃火灾检测基准。在全局训练基上采用五折时间序列交叉验证,并在空间上保留给Pau Furado州立公园和Serra do Cabral州立公园及其官方缓冲区的独立不平衡测试集上,评估了逻辑回归、随机森林和XGBoost。主要指标是AUC-PR,AUC-ROC、阈值精确率和召回率、SHAP解释以及回顾性得分图作为补充诊断。时间交叉验证显示,所有三个模型系列在完整时间记忆阶段达到最高平均AUC-PR。在更严格的1:100患病率设计下,保留的AOI测试较弱:随机森林在两个AOI中均在阶段3达到峰值,而XGBoost图暴露了高召回率、高警告量的行为。由此产生的基线为比较每日CU尺度活跃火灾检测排序中的大气、地表、静态空间和短期记忆协变量提供了可重复的参考。由于几个阶段使用同一天协变量,本研究是回顾性分类基准而非前瞻性预测。

英文摘要

Wildfires threaten biodiversity, carbon stocks, and management capacity in the Brazilian Cerrado, where Conservation Units and their official buffer zones must allocate prevention resources under a strong dry-season fire regime. This work develops a retrospective daily active-fire detection benchmark for the Cerrado portion of Minas Gerais, Brazil, using INPE BDQueimadas reference satellite labels (AQUA_M-T), constrained pseudo absences with same-year MapBiomas Collection 9 land-cover filtering, and four nested covariate stages extracted through Google Earth Engine. Logistic Regression, Random Forest, and XGBoost are evaluated under five-fold time-series cross-validation on a global training base and on independent imbalanced test sets spatially held out to Parque Estadual do Pau Furado and Parque Estadual da Serra do Cabral with their official buffer zones. AUC-PR is the primary metric, with AUC-ROC, threshold precision and recall, SHAP explanations, and retrospective score maps used as complementary diagnostics. Temporal cross-validation showed the highest mean AUC-PR at the complete temporal-memory stage for all three model families. Held-out AOI tests were weaker under the stricter 1:100 prevalence design: Random Forest peaked at Stage 3 in both AOIs, while XGBoost maps exposed high-recall, high-warning-volume behavior. The resulting baseline provides a reproducible reference for comparing atmospheric, surface, static spatial, and short-term memory covariates in daily CU-scale active-fire detection ranking. Because several stages use same-day covariates, the study is a retrospective classification benchmark rather than a prospective forecast.

2606.04134 2026-06-04 stat.ME

Optimal Treatment Policy Estimation for Recurrent Events with a Competing Terminal Event: An Instrumented Difference-in-Differences Approach

存在竞争性终点事件时复发事件的最优治疗策略估计:一种工具变量差分法

Ritoban Kundu, James Flory, Sean Hennessy, Ashkan Ertefaie

AI总结 针对存在竞争性终点事件的复发事件,提出基于工具变量差分法(iDID)的框架来估计最优治疗策略,通过多重稳健估计量解决未测量混杂问题,并避免因增加死亡率而减少不良事件的策略。

详情
AI中文摘要

学习慢性疾病的可重复和可推广最优治疗策略需要具有长期随访的大规模代表性人群。行政健康数据提供了一个自然的起点,但其使用常受未测量混杂因素的限制。我们通过提出一种基于工具变量差分法(iDID)的新框架来解决这一问题,以估计受终止事件影响的复发事件结果的最优策略。iDID设计在此场景中特别有用,因为它利用政策诱导的治疗变异,同时允许不同人群之间存在持续的未测量差异,其假设比传统IV或DID方法所需的假设对行政健康数据更合理。我们方法的一个关键特征是明确解决了避免那些通过增加死亡率而简单减少复发不良事件的策略这一基本挑战。我们推导了两种不同的逆概率加权识别方法,并开发了一种多重稳健估计量,只要多个子集的干扰模型中的任何一个被正确设定,该估计量就能达到一致性。通过大样本理论,我们建立了估计量的一致性和渐近正态性,并通过模拟证明了其在有限样本下优于现有方法的性能。最后,我们将该框架应用于一个全国性的Medicare数据集,以优化一线2型糖尿病策略,具体目标是减少疾病相关住院,同时考虑生存情况。

英文摘要

Learning reproducible and generalizable optimal treatment policies for chronic diseases requires large, representative populations with long-term follow-up. Administrative health data provide a natural starting point, but their use is often limited by unmeasured confounding. We address this by proposing a novel framework based on Instrumented Difference-in-Differences (iDID) to estimate optimal policies for recurrent event outcomes subject to a terminating event. The iDID design is particularly useful in this setting because it leverages policy-induced treatment variation while allowing for persistent unmeasured differences across populations, relying on assumptions that are more plausible for administrative health data than those required by conventional IV or DID approaches. A key feature of our approach is that it explicitly addresses the fundamental challenge of avoiding policies that trivially reduce recurrent adverse events by increasing mortality. We derive two distinct Inverse Probability Weighted identifications and develop a multiply robust estimator that achieves consistency if any one of several subsets of nuisance models is correctly specified. We establish the estimator's consistency and asymptotic normality through large-sample theory and demonstrate its superior finite-sample performance over existing methods via simulation. Finally, we apply this framework to a national Medicare dataset to optimize first-line Type 2 Diabetes strategies, specifically targeting the minimization of disease-related hospitalizations while accounting for survival.

2606.04128 2026-06-04 stat.ME

On prediction-powered inference for quantile regression via convolution smoothing

基于卷积平滑的分位数回归预测驱动推断

Shota Takeishi, Jimin Ding, Xuming He

AI总结 针对金标准数据有限而替代数据广泛可得的场景,提出基于卷积平滑的分位数回归预测驱动推断方法,解决计算困难和置信区间过保守问题,并建立渐近分布理论。

详情
Comments
32 pages, 8 figures
AI中文摘要

本文研究数据有限场景下的分位数回归,其中金标准结果仅对有限数量的观测可用,而替代结果广泛可用。随着现代AI低成本预测的普及,这类场景日益常见,推动了“预测驱动推断”研究的增长,以改进统计推断。然而,将该框架直接扩展到分位数回归会带来两个挑战:由于次梯度不连续性导致的计算困难,以及过于保守的置信区间。为解决这些问题,我们提出基于卷积平滑的检查损失目标函数,并开发了两种估计量变体。所提出的估计量计算可行,数值研究表明它们能缓解过度覆盖。作为理论贡献,我们在可能错误设定的线性分位数回归模型下建立了所提估计量的渐近分布。我们进一步提出两种估计量的集成,并通过模拟和本地住房数据集的应用来说明所提方法。

英文摘要

This paper studies quantile regression in a data-limited setting where the gold-standard outcome is available only for a limited number of observations, whereas a surrogate outcome is widely available. Such settings are becoming increasingly common with the availability of low-cost predictions from modern AI, motivating a growing line of research on "prediction-powered inference" for improved statistical inference. Naively extending this framework to quantile regression, however, raises two challenges: computational difficulties due to the discontinuity of the subgradient, and overly conservative confidence intervals. To address these issues, we propose a convolution-based smoothing of the check-loss objective and develop two variants of the estimator. The proposed estimators are computationally tractable, and our numerical studies show that they mitigate overcoverage. As a theoretical contribution, we establish the asymptotic distributions of the proposed estimators under a possibly misspecified linear quantile regression model. We further propose an ensemble of the two estimators and illustrate the proposed methods through simulations and an application to a local housing dataset.
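
The smoothing step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel with bandwidth `h` (the abstract does not say which kernel the authors use) and relies on the identity that the check loss equals |u|/2 + (tau - 1/2)u, so convolving with Gaussian noise has a closed form. The function names are hypothetical and this is not the authors' estimator, only the smoothed loss and its continuous derivative.

```python
import math

def check_loss(u, tau):
    """Standard quantile check loss: u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def _norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a Gaussian kernel of bandwidth h > 0.

    Assumption: Gaussian kernel. Uses E|z + Z| for Z ~ N(0, 1),
    which is sqrt(2/pi) * exp(-z^2 / 2) + z * (1 - 2 * Phi(-z)).
    """
    z = u / h
    g = math.sqrt(2.0 / math.pi) * math.exp(-0.5 * z * z) + z * (1.0 - 2.0 * _norm_cdf(-z))
    return 0.5 * h * g + (tau - 0.5) * u

def smoothed_check_grad(u, tau, h):
    """Derivative in u; continuous, unlike the subgradient of the check loss."""
    return tau - _norm_cdf(-u / h)
```

As `h` shrinks, the smoothed loss approaches the ordinary check loss, while for any fixed `h > 0` the derivative is smooth, which is what makes gradient-based fitting straightforward.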

2606.04114 2026-06-04 stat.ME

Global Warming Has Been Accelerating Since At Least 1990

全球变暖至少自1990年起一直在加速

J. Eduardo Vera-Valdes

AI总结 使用线对数模型检验全球温度的超线性趋势,发现至少自1990年以来存在加速证据,且显著性随数据更新增强。

详情
AI中文摘要

我们研究了全球温度的加速现象,将加速定义为随时间呈超线性(大于线性)的增加。我们开发了一个统计框架,使用线对数设定来检验超线性趋势。我们的结果表明,自至少1990年以来,全球温度存在加速的证据,且随着包含更近期的数据,显著性增强。相比之下,在二次设定下,加速的证据仅在最长估计窗口中显著。我们还表明,如果真实温度趋势是超线性的,标准断点检验最终会检测到线性趋势模型斜率的变化,这可能解释了报道的全球温度趋势中的结构性断点。

英文摘要

We investigate acceleration in global temperature, defining acceleration as a supralinear (greater-than-linear) increase over time. We develop a statistical framework to test for supralinear trends using a linearithmic specification. Our results indicate evidence of acceleration in global temperature since at least 1990, with significance strengthening as more recent data are included. In contrast, evidence for acceleration under a quadratic specification is significant only in the longest estimation window. We also show that, if the true temperature trend is supralinear, standard break-point tests will eventually detect changes in the slope of a linear trend model, which may explain reported structural breaks in global temperature trends.
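
A rough sketch of what a test for a supralinear trend could look like. It assumes the "linearithmic specification" means a regressor of the form t·log t added to a linear trend, and it uses classical OLS standard errors on synthetic data; real temperature series are autocorrelated, so a serious analysis would need robust standard errors. The function name and the data are illustrative only.

```python
import numpy as np

def linearithmic_fit(y):
    """OLS fit of y_t = a + b*t + c*t*log(t) + e_t.

    Returns the coefficients (a, b, c) and the classical t-statistic for c.
    A clearly positive c points to faster-than-linear growth.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(1, n + 1, dtype=float)
    X = np.column_stack([np.ones(n), t, t * np.log(t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta[2] / np.sqrt(cov[2, 2])

# Synthetic series with a known supralinear component (not real temperature data).
rng = np.random.default_rng(0)
t = np.arange(1, 151)
y = 0.01 * t + 0.005 * t * np.log(t) + rng.normal(0.0, 0.05, t.size)
beta, t_stat = linearithmic_fit(y)
```

Because t and t·log t are strongly collinear, the coefficient on the linearithmic term is estimated less precisely than one might expect, which is one reason the length of the estimation window matters.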

2606.04110 2026-06-04 cs.LG stat.ML

Variance Reduction for Heavy-Tailed Monetization Metrics in Ranking Experiments via Post-Stratification

基于事后分层的排序实验中重尾货币化指标的方差缩减

Neeti Pokharna, Olivier Jeunen, Yatharth Saraf, Aleksei Ustimenko

AI总结 针对排序实验中重尾货币化指标方差大、统计功效低的问题,提出结合事后分层与CUPED的方差缩减框架,利用实验前协变量提升灵敏度,在ShareChat部署后以约45%的流量实现同等统计置信度。

详情
Comments
Accepted as Industry Track paper in the 2026 ACM SIGIR Conference on Research and Development in Information Retrieval
AI中文摘要

排序和检索系统的在线评估通常依赖于下游货币化指标,如应用收入或创作者收益。这些指标通常是重尾的,一小部分用户主导了均值和方差,导致A/B实验的统计功效低、结论不可靠,尤其是在流量有限的情况下。我们提出了一个实用的在线实验方差缩减框架,将事后分层与CUPED相结合。我们的方法利用实验前协变量提高货币化实验的灵敏度,无需额外流量。在ShareChat的排序驱动货币化实验中部署后,该方法显著降低了方差并提高了决策稳定性,与标准指标相比,可在减少约45%流量的情况下实现同等的统计置信度。我们进一步讨论了实际设计选择、防护措施和局限性,为事后分层在现实信息检索和推荐系统中的适用场景提供了指导。

英文摘要

Online evaluation of ranking and retrieval systems often relies on downstream monetization metrics such as app revenue or creator earnings. These metrics are typically heavy-tailed, with a small fraction of users dominating both mean and variance, leading to low statistical power and unreliable conclusions in A/B experiments, especially under limited traffic. We present a practical framework for variance reduction in online experiments by combining post-stratification with CUPED. Our approach leverages pre-experiment covariates to improve the sensitivity of monetization experiments without requiring additional traffic. Deployed at ShareChat across ranking-driven monetization experiments, the method substantially reduces variance and improves decision stability, achieving equivalent statistical confidence with ~45% less traffic than standard metrics. We further discuss practical design choices, guardrails, and limitations, providing guidance on when post-stratification is appropriate for real-world information retrieval and recommendation systems.
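
The two building blocks named in this abstract can be sketched as follows. This is a textbook version of CUPED and of a post-stratified mean, not the deployed ShareChat pipeline; how the authors combine the two is not specified in the abstract. The function names, the single covariate `x`, and the `shares` dictionary of population stratum weights are assumptions made for illustration.

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED: subtract the part of y explained by a pre-experiment covariate x.

    theta = cov(x, y) / var(x); the adjusted metric keeps the same mean
    but has lower (or equal) variance.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

def post_stratified_mean(y, strata, shares):
    """Weighted mean of per-stratum means.

    shares maps stratum label -> known population share (summing to 1).
    Every stratum in shares must appear in the sample, otherwise the
    result is NaN.
    """
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    return sum(w * y[strata == s].mean() for s, w in shares.items())
```

For heavy-tailed revenue, a natural (assumed) choice of strata is pre-experiment spend buckets, so that the few very high spenders are weighted by their known population share rather than by their chance share in each arm.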

2606.04073 2026-06-04 cs.LG cs.AI stat.ML

TPA-AD: A Two-Stage Pseudo Anomaly-Guided Method for Bearing Time-Series Anomaly Detection

TPA-AD: 一种用于轴承时间序列异常检测的两阶段伪异常引导方法

Xiancheng Wang, Zhibo Zhang, Ran Li, Rui Wang, Minghang Zhao, Shisheng Zhong, Lin Wang

AI总结 提出一种两阶段伪异常引导方法TPA-AD,通过重构模型和特征误差控制生成边界伪异常窗口,结合对比学习与KNN实现无监督轴承时间序列异常检测,在轴承故障和退化数据集上表现稳定且具泛化性。

详情
AI中文摘要

本文提出了一种两阶段伪异常引导的异常检测方法(TPA-AD),用于在仅正常样本可用的训练设置下进行轴箱轴承时间序列异常检测(TSAD)。该方法首先利用重构模型和每特征目标误差控制在正常边界附近生成伪异常窗口,然后通过正常窗口与伪异常窗口之间的对比学习学习异常敏感表示,最后使用k近邻(KNN)生成窗口级和点级异常分数。与依赖已知故障类别、真实异常先验或随机异常注入的现有方法相比,TPA-AD通过在边界邻域构建伪异常提高了正常边界的可分离性,并能联合处理混合变量场景中的连续和离散特征。主要实验在轴承故障检测数据集和退化过程数据集上进行,并在13个公共TSAD数据集上进行了额外的探索性扩展。结果表明,所提方法产生相对稳定的异常响应,对退化演化敏感,并在公共TSAD基准和真实高速列车相关轴承数据上表现出一定程度的更广泛适用性。

英文摘要

This paper proposes TPA-AD, a Two-stage Pseudo Anomaly-guided Anomaly Detection method for axle-box bearing time-series anomaly detection (TSAD) in the setting where only normal samples are available for training. The method first generates pseudo-anomalous windows near the normal boundary using a reconstruction model and per-feature target-error control. It then learns anomaly-sensitive representations through contrastive learning between normal and pseudo-anomalous windows, and finally produces window-level and point-level anomaly scores using k-nearest neighbors (KNN). Compared with existing methods that rely on known fault categories, real anomaly priors, or random anomaly injection, TPA-AD improves the separability of the normal boundary by constructing pseudo-anomalies in boundary neighborhoods and can jointly handle continuous and discrete features in mixed-variable scenarios. The main experiments are conducted on bearing fault detection datasets and degradation-process datasets, with an additional exploratory extension on 13 public TSAD datasets. The results show that the proposed method yields relatively stable anomaly responses, is sensitive to degradation evolution, and demonstrates a certain degree of broader applicability on public TSAD benchmarks and real high-speed-train-related bearing data.

2606.04065 2026-06-04 stat.ML cs.LG math.ST stat.TH

Finite-Iteration Local Dynamics and Warm Starts for Alternating Power Iteration in Spiked Tensor PCA

尖峰张量PCA中交替幂迭代的有限迭代局部动力学与热启动

Yanjin Xiang, Zhihua Zhang

AI总结 研究固定阶非对称秩一张量模型中同步交替幂迭代的有限迭代局部理论,提出与初始化无关的误差分解和热启动机制。

详情
Comments
67 pages, 0 figures. The paper studies local dynamics and warm-start analysis for alternating power iteration in spiked tensor PCA
AI中文摘要

我们研究了固定阶非对称秩一张量模型中的同步交替幂迭代。主要贡献是一个与任何特定初始化无关的有限迭代局部理论。一旦迭代进入种植秩一方向的足够小邻域,其误差分解为几何衰减的瞬态部分和由种植点处固定正交噪声收缩引起的内在噪声基底。确定性有限样本条件被明确陈述,但在粗粒度的固定阶多线性噪声事件下,它们简化为固定或缓慢扩展局部半径的保守高信号区域。然后,我们将热启动机制与任何特定谱构造分离。一个通用的单扫描原理表明,如果符号兼容的初始器具有相关性γ_N,第一扫描噪声水平a_N,且a_N/(γ_N^{d-1}ω_{N,d})→0,则可以选择一个扩展半径r_N=o(ω_{N,d}),使得第一扫描进入局部盆地。进入后,局部仿射收缩导致收敛到该盆地中唯一的信息性局部不动点。对于中心Gram初始化,我们通过信号保持的仅噪声留一比较和平均留一片收缩估计(称为压回估计),在独立同分布有限四阶矩噪声下验证了所需的相关性和同一样本第一扫描噪声界。留一比较保持尖峰固定并对删除坐标取平均,因此种植坐标通过ℓ₂加权和而非最坏情况非相干界进入。

英文摘要

We study simultaneous alternating power iteration for fixed-order asymmetric rank-one spiked tensor models. Our main contribution is a finite-iteration local theory that is independent of any particular initialization. Once the iterates enter a sufficiently small neighborhood of the planted rank-one direction, their error decomposes into a geometrically decaying transient and an intrinsic noise floor caused by fixed orthogonal noise contractions at the planted point. The deterministic finite-sample conditions are stated explicitly, but under a coarse fixed-order multilinear noise event they reduce to a conservative high-signal regime for fixed or slowly expanding local radii. We then separate the warm-start mechanism from any specific spectral construction. A generic one-sweep principle shows that, if a sign-compatible initializer has correlation \(γ_N\), first-sweep noise level \(a_N\), and \(a_N/(γ_N^{d-1}ω_{N,d})\to0\), then one can choose an expanding radius \(r_N=o(ω_{N,d})\) for which the first sweep enters the local basin. After entry, the local affine contraction yields convergence to the unique informative local fixed point in that basin. For centered-Gram initialization, we verify the required correlation and same-sample first-sweep noise bound under i.i.d. finite-fourth-moment noise by a signal-preserving noise-only leave-one comparison and an averaged leave-one slice-contraction estimate, which we call a pressed-back estimate. The leave-one comparison keeps the spike fixed and averages over the deleted coordinate, so planted coordinates enter through \(\ell_2\)-weighted sums rather than worst-case incoherence bounds.
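
A small numerical sketch of the iteration studied here, for an order-3 tensor. It assumes the model T = λ·u⊗v⊗w + noise with i.i.d. standard Gaussian noise and a warm start built by perturbing the planted vectors; the noise scaling, the value of `lam`, and the helper names are illustrative choices, not the paper's setup. "Simultaneous" is taken to mean that all three factors are updated from the previous iterates in one sweep.

```python
import numpy as np

def unit(x):
    """Normalize a vector to unit Euclidean norm."""
    return x / np.linalg.norm(x)

def simultaneous_power_sweep(T, u, v, w):
    """One sweep: contract T against the other two current factors, then normalize."""
    u_new = np.einsum("ijk,j,k->i", T, v, w)
    v_new = np.einsum("ijk,i,k->j", T, u, w)
    w_new = np.einsum("ijk,i,j->k", T, u, v)
    return unit(u_new), unit(v_new), unit(w_new)

rng = np.random.default_rng(0)
n, lam = 20, 60.0  # dimension and signal strength (assumed, high-signal regime)
u0, v0, w0 = (unit(rng.normal(size=n)) for _ in range(3))
T = lam * np.einsum("i,j,k->ijk", u0, v0, w0) + rng.normal(size=(n, n, n))

# Warm start: planted direction plus a unit-norm random perturbation.
u, v, w = (unit(x + unit(rng.normal(size=n))) for x in (u0, v0, w0))
for _ in range(10):
    u, v, w = simultaneous_power_sweep(T, u, v, w)
```

After a few sweeps the correlations with the planted vectors stop improving and plateau slightly below 1, which is the kind of noise floor the abstract describes.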

2606.04031 2026-06-04 cs.LG math.OC stat.ML

Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent

耦合梯度下降中瞬态放大的伪谱界

Ahanaf Hasan Ariq

AI总结 针对耦合梯度下降中块三角雅可比矩阵的非正态性导致的瞬态放大,提出尖锐的伪谱理论,给出Kreiss常数的上界与匹配极小极大下界,并导出随机耦合下降的有限步迭代复杂度界。

详情
Comments
11 pages, 3 tables. Accepted as poster at HiLD 2026 (4th Workshop on High-dimensional Learning Dynamics, ICML 2026)
AI中文摘要

耦合梯度下降——其中一个参数块的更新依赖于另一个——是双层优化、双时间尺度随机逼近和对抗训练的基础。当耦合雅可比矩阵为块三角时,渐近稳定性由对角块的谱半径决定,但由于非正态性,收敛前的瞬态放大可能任意大。我们为这种块三角雅可比矩阵发展了尖锐的伪谱理论,证明当对角块对称且谱半径至多为γ<1时,Kreiss常数满足K(J) ≤ 2/(1-γ) + ||C||/(4(1-γ)),并建立了匹配的极小极大下界。我们刻画了谱不稳定的临界耦合阈值,并通过Neumann级数扰动框架将分析扩展到近自指系统。作为推论,我们得到了随机耦合下降的有限步迭代复杂度界O(K(J)^2 log(1/δ))。将结果表述为非平稳双时间尺度优化的标度律,我们的理论揭示了谱半径分析无法看到的非渐近、实例依赖的高维学习动力学。在线性二次问题、基于IQC的比较和神经网络训练上的实验证实了该理论。

英文摘要

Coupled gradient descent--where the update of one parameter block depends on another--underlies bilevel optimization, two-time-scale stochastic approximation, and adversarial training. When the coupled Jacobian is block-triangular, asymptotic stability is governed by the spectral radii of the diagonal blocks, yet transient amplification before convergence can be arbitrarily large due to non-normality. We develop a sharp pseudospectral theory for such block-triangular Jacobians, proving that the Kreiss constant satisfies $K(J) \leq 2/(1-γ) + \|C\|/(4(1-γ))$ when the diagonal blocks are symmetric with spectral radii at most $γ< 1$, and we establish matching minimax lower bounds. We characterize the critical coupling threshold for spectral instability and extend the analysis to nearly self-referential systems via a Neumann-series perturbation framework. As a consequence, we obtain a finite-horizon iteration-complexity bound of $O(K(J)^2 \log(1/δ))$ for stochastic coupled descent. Framed as scaling laws for non-stationary two-time-scale optimization, our results expose a non-asymptotic, instance-dependent regime of high-dimensional learning dynamics that is invisible to spectral-radius analysis. Experiments on linear-quadratic problems, IQC-based comparisons, and neural-network training confirm the theory.
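
Transient amplification of a block-triangular Jacobian can be seen numerically with a toy example. The sketch below takes diagonal blocks equal to γ·I and a coupling block C = 5·I (both arbitrary choices), tracks the peak of ‖J^k‖, and compares a grid estimate of the Kreiss constant with the upper bound quoted in the abstract. The grid search only gives a lower estimate of the true supremum, and the function names are hypothetical.

```python
import numpy as np

def transient_peak(J, max_iter=500):
    """Largest spectral norm of J^k over k = 0..max_iter."""
    P = np.eye(J.shape[0])
    peak = 1.0
    for _ in range(max_iter):
        P = P @ J
        peak = max(peak, np.linalg.norm(P, 2))
    return peak

def kreiss_grid_estimate(J, radii, n_angles=64):
    """Grid lower estimate of sup_{|z|>1} (|z| - 1) * ||(zI - J)^{-1}||."""
    I = np.eye(J.shape[0])
    best = 0.0
    for r in radii:
        for th in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
            z = r * np.exp(1j * th)
            best = max(best, (r - 1.0) * np.linalg.norm(np.linalg.inv(z * I - J), 2))
    return best

gamma = 0.9
A = gamma * np.eye(2)          # symmetric diagonal blocks, spectral radius gamma
C = 5.0 * np.eye(2)            # coupling block (assumed), ||C|| = 5
J = np.block([[A, C], [np.zeros((2, 2)), A]])

rho = max(abs(np.linalg.eigvals(J)))
peak = transient_peak(J)
kreiss = kreiss_grid_estimate(J, np.linspace(1.01, 2.0, 100))
bound = 2.0 / (1.0 - gamma) + np.linalg.norm(C, 2) / (4.0 * (1.0 - gamma))
```

Here the spectral radius is 0.9, so the iteration converges asymptotically, yet the powers of J grow well above 1 before decaying; this is the gap between spectral-radius analysis and pseudospectral analysis that the abstract highlights.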

2606.03863 2026-06-04 stat.ME stat.AP

Assessing the Impact of Intercurrent Events on Power and Sample Size for Estimands with Time-to-Event Endpoints

评估并发事件对时间至事件终点估计目标的功效和样本量的影响

Daniel J Bratton, Fiona Guillard, Sunita Rehal, Thomas Drury

AI总结 本文提出一组公式,用于计算固定随访期临床试验中时间至事件终点估计目标的功效,考虑了治疗策略、假设、复合或组合策略处理并发事件,并通过鼻息肉病例研究验证了方法的实用性和准确性。

详情
AI中文摘要

根据ICH E9(R1)附录,主要估计目标的精确定义需考虑并发事件(IEs),这对临床试验的设计和解释至关重要。然而,传统的功效和样本量计算往往未能充分纳入IEs的影响及其相应的处理策略,导致研究存在功效过高或过低的风险。虽然基于模拟的方法可以解决这一复杂性,但它们通常计算密集且可能仅探索有限的情景。在本文中,我们介绍了一组公式,用于计算具有时间至事件终点的估计目标的功效,应用于固定随访期的试验。我们专注于使用治疗策略、假设、复合或这些策略的组合来处理IEs的估计目标,假设IEs彼此独立且与主要终点独立。与基于模拟的估计进行验证显示高度一致性,并探讨了在结果和IEs相关的情况下功效估计的偏差。我们通过鼻息肉病例研究说明了我们方法的实际应用,考察了样本量需求对不同IE发生率及其对IE后结果影响的敏感性。所提出的公式有助于快速准确地进行功效和保证计算,使临床试验设计更紧密地与感兴趣的估计目标对齐。

英文摘要

The precise definition of a primary estimand, accounting for intercurrent events (IEs) as per the ICH E9(R1) addendum, is fundamental to the design and interpretation of clinical trials. Conventional power and sample size calculations, however, often do not adequately incorporate the impact of IEs and their corresponding handling strategies, creating a risk of over- or under-powered studies. While simulation-based approaches can address this complexity, they are often computationally intensive and may only explore a limited set of scenarios. In this paper, we introduce a set of formulae for calculating power for estimands with time-to-event endpoints, applied to trials with fixed follow-up durations. We focus on estimands that use treatment policy, hypothetical, composite, or a combination of strategies for handling IEs, under the assumption that IEs occur independently of each other and the primary endpoint. Validation against simulation-based estimates shows strong agreement, and we explore deviations in power estimates in scenarios where outcomes and IEs are dependent. We illustrate the practical application of our approach through a case study in nasal polyposis, examining the sensitivity of sample size requirements to varying IE rates and their impacts on post-IE outcomes. The proposed formulae facilitate rapid and accurate power and assurance calculations, enabling clinical trial designs to be more closely aligned with the estimand of interest.

2606.03656 2026-06-04 stat.ME

Beyond Point Estimates: Reliable Evaluation of Prediction Performance Metrics under Clustered Data

超越点估计:聚类数据下预测性能指标的可靠评估

Taekwon Hong, Daeyoung Lim, Woojung Bae

AI总结 针对聚类数据下预测性能指标缺乏不确定性量化的问题,提出基于混淆矩阵概率光滑泛函表示的统一框架,利用聚类稳健方差估计实现置信区间、假设检验和配对模型比较,并提供功效和样本量近似方法。

详情
AI中文摘要

预测性能指标(如准确率和F1分数)通常以单一数值报告,没有不确定性度量。在探索性设置中,这种省略是可以容忍的,因为模型评估用于非正式比较而非正式决策。但随着机器学习在现实世界应用中的部署,评估结果越来越多地用于支持二元决策——模型是否达到所需标准——这使得不确定性量化变得至关重要。当数据存在依赖性时(如重复测量、聚类受试者或时间序列),问题更加复杂,因为变异性更难评估且容易被低估。我们开发了一个统一框架,通过将广泛类别的性能指标表示为混淆矩阵概率的光滑泛函来建立联系。这种表示允许使用聚类稳健三明治方差估计量,在聚类数据下为二元和多类问题获得渐近有效的置信区间、假设检验和配对模型比较。我们还基于试点数据提供了功效和样本量近似,从而实现模型评估的原则性研究设计。模拟显示,所提出的方法在各种依赖结构下实现了接近名义覆盖,而朴素方法低估了变异性。一个真实数据应用进一步说明了考虑聚类如何实质性地改变结论。这些结果为在依赖和聚类数据下需要证明决策的预测性能评估中的不确定性量化和研究设计提供了实用基础。

英文摘要

Prediction performance metrics such as accuracy and the F1 score are typically reported as single numbers, with no measure of uncertainty. The omission has been tolerable in exploratory settings, where model evaluation is used for informal comparison rather than formal decision-making. But as machine learning is deployed in real-world applications, evaluation results are increasingly used to support binary decisions -- whether a model meets a required standard or not -- making uncertainty quantification essential. The problem is compounded when data are dependent, as in repeated measurements, clustered subjects, or time series, where variability is harder to assess and easy to underestimate. We develop a unified framework that links a broad class of performance metrics through their representation as smooth functionals of confusion-matrix probabilities. This representation allows the use of the cluster-robust sandwich variance estimator to obtain asymptotically valid confidence intervals, hypothesis tests, and paired model comparisons for both binary and multiclass problems under clustered data. We also provide power and sample size approximations based on pilot data, enabling principled study design for model evaluation. Simulations show that the proposed methods achieve near-nominal coverage across a range of dependence structures, while naive methods underestimate variability. A real-data application further illustrates how accounting for clustering can materially change conclusions. These results offer a practical foundation for uncertainty quantification and study design in prediction performance evaluation, in settings where decisions should be justified under dependent and clustered data.
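
For the simplest metric, accuracy, a cluster-robust standard error can be sketched in a few lines. This is the usual sandwich estimator for a mean with observations grouped into clusters, with a G/(G-1) small-sample factor; the paper's framework covers general smooth functionals of confusion-matrix probabilities, which this sketch does not attempt. The function name and inputs are assumptions.

```python
import numpy as np

def accuracy_with_cluster_se(correct, cluster):
    """Accuracy and its cluster-robust standard error.

    correct: 0/1 indicator per prediction; cluster: cluster label per prediction.
    Residuals are summed within each cluster before squaring, so within-cluster
    correlation inflates the variance as it should.
    """
    correct = np.asarray(correct, dtype=float)
    _, idx = np.unique(np.asarray(cluster), return_inverse=True)
    G = idx.max() + 1
    N = correct.size
    p = correct.mean()
    scores = np.bincount(idx, weights=correct - p, minlength=G)
    var = G / (G - 1) * np.sum(scores ** 2) / N ** 2
    return p, np.sqrt(var)
```

When all predictions within a cluster share the same outcome, the effective sample size is the number of clusters rather than the number of predictions, and the robust standard error is correspondingly much larger than the naive binomial one.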

2606.03415 2026-06-04 stat.ME

A Better Comparison under right-censoring: ABC Statistic for Equivalence Testing and Quantification

右删失下更好的比较:用于等价检验和量化的ABC统计量

Simon Mack, Kathrin Möllenhoff, Dennis Dobler

AI总结 提出基于两条生存曲线之间归一化积分绝对距离的ABC统计量,用于右删失数据下的等价检验和差异量化,并发展了大样本性质和重抽样方法。

详情
Comments
27 pages, 6 tables, 4 figures; version2: Added previously incomplete summary of the contents of the appendix
AI中文摘要

ABC(曲线间面积)统计量是一种$L^1$距离,针对易于解释的估计目标。定义为两条生存曲线之间(归一化)积分绝对距离,即使生存函数交叉时也是有意义的量。基于右删失的生存时间数据,估计基于来自两个独立样本组的Kaplan-Meier曲线。在本文中,我们发展了ABC统计量的大样本性质,并研究了多种重抽样选项以近似该统计量的分布,该分布可能在极限情况下非正态。这些突破使得构建等价检验成为可能,用于确定两个生存函数之间的差异在实际中无关紧要。或者,点估计可以附带置信区间,以可理解的方式量化曲线之间的差异。一项广泛的模拟研究在各种情景下探索了这些推断方法:比例、交叉和部分相等的生存函数。应用于肺癌试验的总生存期和无进展生存期数据,说明了这些方法的优点和一些考虑点。

英文摘要

The ABC (area between curves) statistic is an $L^1$-distance that targets an easy-to-interpret estimand. Defined as the (normalized) integrated absolute distance between two survival curves, it is a meaningful quantity even when the survival functions cross. For right-censored time-to-event data, estimation is based on Kaplan-Meier curves obtained from two independent sample groups. In the present paper, we develop the large-sample properties of the ABC statistic and investigate various resampling options for approximating the statistic's distribution, which is possibly non-normal in the limit. These breakthroughs enable the construction of equivalence tests that can be used to establish that differences between two survival functions are practically irrelevant. Alternatively, the point estimator can be accompanied by confidence intervals that comprehensibly quantify the difference between the curves. An extensive simulation study explores these inferential methods under various scenarios: proportional, crossing, and partially equal survival functions. An application to data on overall and progression-free survival in a lung cancer trial illustrates the methods' benefits and some points of consideration.
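
A minimal sketch of the point estimate described here: two Kaplan-Meier curves and the area between them on [0, τ]. Normalizing by the horizon `tau` is an assumption (the abstract only says "normalized"), and the resampling-based inference from the paper is not reproduced. Function names are hypothetical.

```python
import numpy as np

def kaplan_meier(time, event):
    """Distinct event times and the KM survival estimate just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    times = np.unique(time[event])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)

def step_eval(times, surv, grid):
    """Evaluate the right-continuous KM step function on grid points."""
    idx = np.searchsorted(times, grid, side="right")
    return np.concatenate([[1.0], surv])[idx]

def abc_statistic(t1, e1, t2, e2, tau):
    """Integrated |S1 - S2| over [0, tau], divided by tau (assumed normalization)."""
    k1, k2 = kaplan_meier(t1, e1), kaplan_meier(t2, e2)
    knots = np.unique(np.concatenate([[0.0], k1[0], k2[0], [tau]]))
    knots = knots[knots <= tau]
    left = knots[:-1]  # both curves are constant between consecutive knots
    diff = np.abs(step_eval(*k1, left) - step_eval(*k2, left))
    return np.sum(diff * np.diff(knots)) / tau
```

Because the absolute difference is integrated, crossing curves contribute positive area on both sides of the crossing instead of cancelling out.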

2606.03973 2026-06-04 math.PR cs.IT math.IT math.ST stat.TH

A remark on the majorizing measures theorem for general processes

关于一般过程的主测度定理的一个注记

Reese Pathak, Nikita Zhivotovskiy

AI总结 本文证明了对一大类随机向量,主测度定理的下界成立,其中关键依赖于率失真积分。

详情
AI中文摘要

我们证明主测度定理的下界对一大类随机向量成立。具体地,设 $X \sim \mu$ 是 $\mathbf{R}^n$ 中的中心化随机向量,满足 \[ C_{\mathrm{KL}}(\mu) = \sup_{\substack{\theta \neq \eta \\ \theta, \eta \in \mathbf{R}^n}} \frac{\mathrm{KL}(\mu_\theta \| \mu_\eta)}{\|\theta - \eta\|_2^2} < \infty, \] 其中 $\mu_\theta$ 表示平移 $\theta + X$ 的分布。则对任意非空有界集 $T \subset \mathbf{R}^n$,有 \[ \sqrt{C_{\mathrm{KL}}(\mu)}\, \mathbf{E}_\mu \Big[\sup_{t \in T} \, \langle X, t \rangle \Big] \gtrsim \gamma_2(T), \] 其中右边表示 Talagrand 的一般链(generic chaining)泛函。该结果作为特例恢复了中心化高斯过程的主测度定理的下界。我们的论证关键依赖于 J. Liu 最近引入的率失真积分。

英文摘要

We show that the lower bound in the majorizing measures theorem holds for a large class of random vectors. Specifically, suppose $X \sim μ$ is a centered random vector in $\mathbf{R}^n$ with \[ C_{\mathrm{KL}}(μ) = \sup_{\substack{θ\neq η\\ θ, η\in \mathbf{R}^n}} \frac{\mathrm{KL}(μ_θ\| μ_η)}{\|θ- η\|_2^2} < \infty, \] where $μ_θ$ denotes the law of the translate $θ+ X$. Then, for every nonempty, bounded $T \subset \mathbf{R}^n$, \[ \sqrt{C_{\mathrm{KL}}(μ)}\, \mathbf{E}_μ\Big[\sup_{t \in T} \, \langle X, t \rangle \Big] \gtrsim γ_2(T), \] where the right-hand side denotes Talagrand's generic chaining functional. This result recovers, as a special case, the lower bound in the majorizing measures theorem for centered Gaussian processes. Our argument critically relies on the rate-distortion integral, recently introduced by J. Liu.

2606.01760 2026-06-04 hep-ex stat.ME

HS3: A Descriptive, Interoperable Serialization Standard for Statistical Models in High-Energy Physics

HS3:高能物理中统计模型的一种描述性、可互操作的序列化标准

Carsten Burgard, Oliver Schulz, Giordon Stark, Jonas Rembser, Simon Cello, Cornelius Grunwald

AI总结 本文提出HS3标准,一种与实现无关、人类可读且可扩展的统计模型序列化格式,旨在解决高能物理中模型表示缺乏通用标准的问题,支持似然函数的计算图表示和跨框架互操作。

详情
Comments
18 pages, 3 figures, 3 code listings
AI中文摘要

高能物理中的统计模型形式化地编码了观测数据、感兴趣的物理参数以及实验和理论不确定性之间的关系。基于似然的推断是精确测量、有效场论拟合和交叉分析组合的核心工具。因此,对机器可读、描述性和可移植的模型表示的需求日益增长。现有的格式如ROOT工作空间、pyhf JSON和CMS DataCards提供了有价值的能力,但仍局限于特定的软件栈,并且缺乏用于交换、验证或长期保存的通用标准。我们引入HS3,即高能物理统计序列化标准,这是一种与实现无关、人类可读且可扩展的统计模型序列化格式。HS3的设计使得新的统计结构可以通过向后兼容的扩展来加入,而推断过程和实现特定的执行细节则由下游框架负责。HS3将似然表示为计算图,由命名分布、函数、数据集、域和分析方案组成。它支持分箱和未分箱似然以及分层复合模型。HS3可以从ROOT/RooFit转换,并且是pyhf的超集。我们描述了HS3的设计原则、结构和语义,并总结了在C++、Python和Julia中的现有实现。我们还介绍了在HEPData上对公开似然的早期应用、跨框架验证和可重复性工作。HS3为LHC及其他领域提供了FAIR(可发现、可访问、可互操作、可重用)、长寿命的统计模型的基础。该标准旨在服务于更广泛的科学界,并随着时间的推移而发展,以适用于广泛的领域。

英文摘要

Statistical models in high-energy physics formally encode the relationship between observed data, physics parameters of interest, and experimental and theoretical uncertainties. Likelihood-based inference is the central tool for precision measurements, effective field theory fits, and cross-analysis combinations. Consequently, there is an increasing need for machine-readable, descriptive, and portable model representations. Existing formats such as ROOT workspaces, pyhf JSON, and CMS DataCards provide valuable capabilities but remain tied to specific software stacks and offer no universal standard for exchange, validation, or long-term preservation. We introduce HS3, the High-Energy Physics Statistics Serialization Standard, an implementation-agnostic, human-readable, and extensible serialization format for statistical models. HS3 is designed such that new statistical constructs can be incorporated through backward-compatible extensions, while inference procedures and implementation-specific execution details remain the responsibility of downstream frameworks. HS3 represents likelihoods as computational graphs composed of named distributions, functions, datasets, domains, and analysis prescriptions. It supports binned and unbinned likelihoods as well as hierarchical composite models. HS3 is convertible from and to ROOT/RooFit and is a superset of pyhf. We describe the design principles, structure, and semantics of HS3 and summarize existing implementations in C++, Python, and Julia. We also present early applications to public likelihoods on HEPData, cross-framework validation, and reproducibility efforts. HS3 provides a foundation for FAIR (Findable, Accessible, Interoperable, Reusable), long-lived statistical models at the LHC and beyond. The standard is intended to serve the broader scientific community and to evolve over time for application across a wide range of domains.

2605.28349 2026-06-04 econ.EM stat.AP

Robust Inference for Dyadic Data with Dependent Ordered Nodes

具有依赖有序节点的二元数据的稳健推断

Ulrich Hounyo, Jiahao Lin, Xiaojun Song

AI总结 针对有序节点间存在共同潜在冲击导致传统二元依赖范式失效的问题,提出两种考虑节点顺序依赖的推断方法(修正方差估计器和行-列移动块刀切法),并证明其渐近有效性。

详情
AI中文摘要

二元回归模型通常在传统的二元依赖范式下进行分析,其中两个观测值可能仅当对应的二元组共享一个节点时才存在依赖。本文研究当这种范式因节点有序且邻近节点暴露于共同潜在冲击而失效时的推断问题。在这种设定下,没有共同端点的二元组在端点顺序接近时仍可能相关。尽管每个额外的协方差项可能较弱,但邻近节点二元组对的数量随样本量发散,因此它们对渐近方差的总体贡献不可忽略。我们为具有有序节点依赖的二元数组开发了一个推断框架。第一个估计量是依赖节点二元CRVE,它保留了端点邻近的二元组之间的协方差项。第二个是行-列移动块刀切法,它删除相邻的节点块以及所有触及这些节点的二元组。我们在沿有序节点索引的弱依赖条件下建立了两种方法的渐近有效性。蒙特卡洛证据表明,考虑有序节点依赖可以显著改善尺寸控制,并且刀切版本在有限样本中相对稳定。

英文摘要

Dyadic regression models are commonly analyzed under the conventional dyadic dependence framework, where two observations may be dependent only if the corresponding dyads share a node. This paper studies inference when nodes are ordered and nearby nodes are exposed to common latent shocks, so that dyads with no shared endpoint may still be dependent. Although each additional covariance term may be weak, the number of nearby-node dyad pairs grows with the sample size, making their aggregate contribution asymptotically non-negligible. We develop an inferential framework for dyadic arrays with ordered-node dependence and propose two variance estimators: a dependent-node dyadic cluster-robust variance estimator that retains covariance terms between dyads with nearby endpoints, and a row-column moving-block jackknife method that deletes adjacent blocks of nodes together with all dyads touching those nodes. We establish the asymptotic validity of both procedures under weak dependence along the ordered node index. Monte Carlo evidence shows improvements in size control, with the jackknife procedure displaying comparatively stable finite-sample performance. An application to international trade gravity regressions shows that accounting for ordered-node dependence substantially weakens the statistical evidence for free trade agreement effects.

2504.12988 2026-06-04 cs.LG stat.ML

Why Ask One When You Can Ask $k$? Learning-to-Defer to the Top-$k$ Experts

为何只问一个专家?学习将任务推迟到Top-$k$专家

Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

AI总结 提出Top-$k$学习推迟框架,通过将查询分配给最优的$k$个专家,实现多专家协作,并开发了与$k$无关的替代损失函数,在准确性和成本之间取得更优权衡。

详情
AI中文摘要

现有的学习推迟(L2D)框架仅限于单专家推迟,迫使每个查询仅依赖一个专家,无法利用集体专业知识。我们首次提出了Top-$k$学习推迟框架,将查询分配给成本效益最高的$k$个实体。我们的公式统一并严格推广了先前的方法,包括单阶段和两阶段机制、选择性预测以及经典级联。特别地,它将通常的Top-1推迟规则作为特例,同时当$k>1$时能够与多个专家进行原则性协作。我们进一步提出了Top-$k(x)$学习推迟,这是一种自适应变体,根据输入难度、专家质量和咨询成本学习每个查询的最佳专家数量。为了实现实际学习,我们开发了一种新颖的替代损失函数,该函数在单阶段设置中是贝叶斯一致且$\mathcal{H}_h$一致的,在两阶段设置中是$(\mathcal{H}_r,\mathcal{H}_g)$一致的。关键是,该替代损失与$k$无关,允许一次性学习单个策略并灵活地部署到不同的$k$值。在两个机制上的实验表明,Top-$k$和Top-$k(x)$在准确性和成本之间实现了更优的权衡,为L2D中的多专家推迟开辟了新方向。

英文摘要

Existing Learning-to-Defer (L2D) frameworks are limited to single-expert deferral, forcing each query to rely on only one expert and preventing the use of collective expertise. We introduce the first framework for Top-$k$ Learning-to-Defer, which allocates queries to the $k$ most cost-effective entities. Our formulation unifies and strictly generalizes prior approaches, including the one-stage and two-stage regimes, selective prediction, and classical cascades. In particular, it recovers the usual Top-1 deferral rule as a special case while enabling principled collaboration with multiple experts when $k>1$. We further propose Top-$k(x)$ Learning-to-Defer, an adaptive variant that learns the optimal number of experts per query based on input difficulty, expert quality, and consultation cost. To enable practical learning, we develop a novel surrogate loss that is Bayes-consistent, $\mathcal{H}_h$-consistent in the one-stage setting, and $(\mathcal{H}_r,\mathcal{H}_g)$-consistent in the two-stage setting. Crucially, this surrogate is independent of $k$, allowing a single policy to be learned once and deployed flexibly across $k$. Experiments across both regimes show that Top-$k$ and Top-$k(x)$ deliver superior accuracy-cost trade-offs, opening a new direction for multi-expert deferral in L2D.

2410.15761 2026-06-04 cs.CL cs.LG stat.ML

Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees

基于LLM的抽取式问答中的最优查询分配:一个具有理论保证的学习-推迟框架

Yannis Montreuil, Shu Heng Yeo, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

AI总结 提出一个学习-推迟框架,通过将查询分配给专门专家,在保证高置信度预测的同时优化计算效率,并在SQuADv1、SQuADv2和TriviaQA上验证了其提高答案可靠性和降低计算开销的效果。

详情
Comments
25 pages, 17 main paper
AI中文摘要

大型语言模型在生成任务中表现出色,但在结构化文本选择(特别是抽取式问答)中效率低下。这一挑战在资源受限环境中被放大,因为部署多个专门模型处理不同任务是不切实际的。我们提出一个学习-推迟框架,将查询分配给专门专家,确保高置信度预测的同时优化计算效率。我们的方法整合了一个原则性的分配策略,并提供了关于最优推迟的理论保证,以平衡性能和成本。在SQuADv1、SQuADv2和TriviaQA上的实证评估表明,我们的方法增强了答案可靠性,同时显著降低了计算开销,使其非常适合可扩展且高效的EQA部署。

英文摘要

Large Language Models excel in generative tasks but exhibit inefficiencies in structured text selection, particularly in extractive question answering. This challenge is magnified in resource-constrained environments, where deploying multiple specialized models for different tasks is impractical. We propose a Learning-to-Defer framework that allocates queries to specialized experts, ensuring high-confidence predictions while optimizing computational efficiency. Our approach integrates a principled allocation strategy with theoretical guarantees on optimal deferral that balances performance and cost. Empirical evaluations on SQuADv1, SQuADv2, and TriviaQA demonstrate that our method enhances answer reliability while significantly reducing computational overhead, making it well-suited for scalable and efficient EQA deployment.

2605.25934 2026-06-04 stat.ME math.ST stat.AP stat.TH

Weighted NPMLE for the Marginal Mean of Recurrent Events with a Competing Terminal Event

加权NPMLE用于存在竞争终点事件的复发事件边际均值

Anna Bellach, Michael R. Kosorok

AI总结 针对复发事件与竞争终点事件,提出基于加权似然的半参数回归模型,直接估计复发事件的边际均值,并证明估计量的一致性和渐近正态性,通过STATCOPE试验数据验证方法有效性。

详情
AI中文摘要

复发事件和终点事件的回归建模在生存分析中持续带来方法论挑战。现有方法要么对两种事件类型之间的依赖结构做出不可验证的假设,要么依赖于边际均值的比例强度假设。本文提出一种基于新颖加权似然函数的半参数回归模型,直接针对复发事件的边际均值。我们的通用模型涵盖了一大类半参数回归模型,并适应外部时变协变量对边际均值强度的影响。我们建立了估计量的一致性和渐近正态性,并提出了方差的sandwich估计量。我们提出了一种新颖的模拟程序,直接针对复发事件的边际均值强度。在模拟研究中,我们展示了加权NPMLE在独立右删失下的强大性能。通过应用于STATCOPE试验(一项研究辛伐他汀对COPD急性加重疗效的大型随机临床试验)的数据,展示了所提出方法的实际效用。我们提供了急性加重次数的个性化预测,并重新评估了辛伐他汀治疗的效果,将死亡作为GOLD 4期患者的竞争终点事件。

英文摘要

Regression modeling of recurrent and terminal events continues to present methodological challenges in survival analysis. Existing approaches either make unverifiable assumptions about the dependency structure between the two event types or rely on the proportional intensity assumption for the marginal mean. A semiparametric regression model is proposed that is based on a novel weighted likelihood function, thereby targeting directly the marginal mean of the recurrent event. Our general model captures a large class of semiparametric regression models and accommodates external time-dependent covariate effects on the marginal mean intensity. We establish the consistency and asymptotic normality of the estimators and propose a sandwich estimator of the variance. We propose a novel simulation procedure that directly targets the marginal mean intensity of the recurrent events. In simulation studies, we demonstrate a strong performance of the weighted NPMLE under independent right-censoring. The practical utility of the proposed methodology is demonstrated through application to data from the STATCOPE trial, a large randomized clinical trial that investigated the efficacy of simvastatin for COPD exacerbations. We provide personalized predictions for the number of exacerbations and reassess the effect of simvastatin treatment, accounting for death as a competing terminal event for patients with GOLD stage 4.

2602.05725 2026-06-04 cs.LG math.OC stat.ML

Muon in Associative Memory Learning: Training Dynamics and Scaling Laws

联想记忆学习中的Muon:训练动力学与缩放定律

Binghui Li, Kaifei Wang, Han Zhong, Pinyan Lu, Liwei Wang

AI总结 本文在联想记忆模型中研究Muon优化器的训练动力学和缩放定律,证明其相比梯度下降在无噪声情况下实现指数加速,在有噪声情况下具有更优的缩放效率。

详情
Comments
Published as a conference paper at ICML 2026; 53 pages
AI中文摘要

Muon通过梯度的矩阵符号更新矩阵参数,并显示出强大的经验增益,但其动力学和缩放行为在理论上仍不清楚。我们在具有softmax检索和查询-答案对上的层次频谱(含和不含标签噪声)的线性联想记忆模型中研究Muon。在该设置下,我们证明梯度下降以高度不平衡的速率学习频率分量,导致收敛缓慢,瓶颈在于低频分量。相比之下,Muon优化器缓解了这种不平衡,实现了更快且更均匀的进展。具体地,在无噪声情况下,Muon实现了相对于梯度下降的指数加速;在具有幂律频谱的有噪声情况下,我们推导了Muon的缩放定律,并展示了其相对于梯度下降的优越缩放效率。此外,我们表明Muon可以解释为由自适应任务对齐和块对称梯度结构产生的隐式矩阵预处理器。相比之下,具有坐标符号算子的预处理器在已知未知任务表示的oracle访问下才能匹配Muon,而这在实践中的SignGD中是不可行的。在合成长尾分类和LLaMA风格预训练上的实验证实了该理论。

英文摘要

Muon updates matrix parameters via the matrix sign of the gradient and has shown strong empirical gains, yet its dynamics and scaling behavior remain unclear in theory. We study Muon in a linear associative memory model with softmax retrieval and a hierarchical frequency spectrum over query-answer pairs, with and without label noise. In this setting, we show that Gradient Descent (GD) learns frequency components at highly imbalanced rates, leading to slow convergence bottlenecked by low-frequency components. In contrast, the Muon optimizer mitigates this imbalance, leading to faster and more uniform progress. Specifically, in the noiseless case, Muon achieves an exponential speedup over GD; in the noisy case with a power-law frequency spectrum, we derive Muon's scaling law and demonstrate its superior scaling efficiency over GD. Furthermore, we show that Muon can be interpreted as an implicit matrix preconditioner arising from adaptive task alignment and block-symmetric gradient structure. In contrast, the preconditioner with coordinate-wise sign operator could match Muon under oracle access to unknown task representations, which is infeasible for SignGD in practice. Experiments on synthetic long-tail classification and LLaMA-style pre-training corroborate the theory.
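
To make the phrase "matrix sign of the gradient" concrete, the following is a minimal NumPy sketch. It assumes that the matrix sign of a gradient $G$ with thin SVD $G = U S V^\top$ is the polar factor $U V^\top$. The function names `matrix_sign` and `muon_like_step`, the learning rate, and the omission of momentum are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def matrix_sign(grad):
    """Polar factor U @ Vt of grad (the assumed meaning of 'matrix sign')."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt

def muon_like_step(weights, grad, lr=0.1):
    """One hypothetical update: move along the matrix sign of the gradient."""
    return weights - lr * matrix_sign(grad)

rng = np.random.default_rng(0)
# Gradient whose columns differ in scale by several orders of magnitude.
grad = rng.normal(size=(4, 3)) * np.array([100.0, 1.0, 0.01])
direction = matrix_sign(grad)
# Singular values of the update direction (all equal for a full-rank gradient).
singular_values = np.linalg.svd(direction, compute_uv=False)
new_weights = muon_like_step(np.zeros((4, 3)), grad)
```

In this sketch every singular value of the update direction equals 1, so strongly and weakly represented directions of the gradient are moved at the same rate. That is one way to read the abstract's claim that Muon mitigates the imbalance seen under gradient descent.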

2605.21659 2026-06-04 stat.CO

Adaptive Generalized Elliptical Slice Sampling

自适应广义椭圆切片采样

Nicholas Marco, Surya T. Tokdar

AI总结 针对无梯度MCMC中手动调参、维度扩展和局部几何适应性的挑战,提出自适应广义椭圆切片采样(AGESS),通过结合椭圆切片采样与衰减自适应,实现从慢混合到快混合的自校正,并保持遍历性。

详情
AI中文摘要

无梯度MCMC的核心挑战是设计能同时绕过手动调参、高效扩展维度并适应局部目标几何的算法。虽然自适应策略可以自动调整随机游走Metropolis等通用框架,但其混合时间随维度呈线性增长。椭圆切片采样(ESS)提供了一个有前景的替代方案:它无需调参,适应局部几何,并在有利条件下可实现近乎与维度无关的扩展。然而,若目标分布与用于生成椭圆定义辅助变量的分布不匹配,其效率会迅速下降,从而阻碍其在高维场景中的应用。我们证明,ESS与衰减自适应的精心结合可直接解决这些瓶颈。由此产生的自适应广义椭圆切片采样器(AGESS)能从慢混合状态自校正为快混合状态,同时在满足温和正则条件的多种目标密度下保持遍历性。该算法的实用性在一系列具有挑战性的应用中得到了验证,包括广义回归、深度高斯过程代理建模和高维稀疏回归。我们的理论结果和案例研究共同证明了AGESS在非椭圆、不可微、多模态或高维目标分布上的效率和鲁棒性。

英文摘要

A central challenge in gradient-free MCMC is designing algorithms that simultaneously bypass manual tuning, scale efficiently with dimension, and adapt to local target geometry. While adaptive strategies can auto-tune generic frameworks like random walk Metropolis, they offer slow, linear-order scaling of mixing times with dimension. Elliptical slice sampling (ESS) offers a promising alternative: it is tuning-free, adjusts to local geometry, and can achieve nearly dimension-free scaling under favorable conditions. However, its efficiency degrades rapidly if there is a mismatch between the target distribution and the distribution used to generate the ellipse-defining auxiliary variables, precluding its use in high-dimensional settings. We demonstrate that a careful synthesis of ESS and diminishing adaptation directly resolves these bottlenecks. The resulting adaptive generalized elliptical slice sampler (AGESS) self-corrects from a slow-mixing to a fast-mixing regime, while preserving ergodicity across a wide variety of target densities satisfying mild regularity conditions. The algorithm's utility is demonstrated across a broad collection of challenging applications, including generalized regression, deep Gaussian process surrogate modeling, and high-dimensional sparse regression. Together, our theoretical results and the case studies give evidence of the efficiency and robustness of AGESS across target distributions that are non-elliptical, non-differentiable, multi-modal, or high-dimensional.
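
For orientation, here is a minimal sketch of one transition of plain elliptical slice sampling, the building block the abstract starts from. It is not AGESS: the adaptation of the ellipse-generating distribution is not shown. It assumes a target proportional to $N(x; 0, \Sigma)\,\exp(\ell(x))$; the names `elliptical_slice_step`, `log_lik`, and `prior_chol` and the toy target are hypothetical.

```python
import numpy as np

def elliptical_slice_step(x, log_lik, prior_chol, rng):
    """One standard ESS transition for a target prop. to N(x; 0, Sigma) * exp(log_lik(x))."""
    nu = prior_chol @ rng.standard_normal(x.shape[0])  # auxiliary draw defining the ellipse
    log_y = log_lik(x) - rng.exponential()             # slice height: log_lik(x) + log(U)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        proposal = x * np.cos(theta) + nu * np.sin(theta)
        if log_lik(proposal) > log_y:
            return proposal
        # Shrink the angle bracket toward theta = 0, which is the current state.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

# Toy target: prior N(0, 1), one observation y = 1 with unit noise,
# so the exact posterior is N(0.5, 0.5).
rng = np.random.default_rng(0)
log_lik = lambda x: -0.5 * (x[0] - 1.0) ** 2
x = np.zeros(1)
draws = []
for _ in range(5000):
    x = elliptical_slice_step(x, log_lik, np.eye(1), rng)
    draws.append(x[0])
posterior_mean = float(np.mean(draws))
posterior_var = float(np.var(draws))
```

No step size appears anywhere, which is the tuning-free property the abstract refers to. The only fixed ingredient is the Gaussian used to draw `nu`, and a poor match between that Gaussian and the target is the failure mode the abstract describes.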

2605.20657 2026-06-04 eess.SY cs.SY stat.AP

Cooling Channel Design Optimization for High Power Multi-Chip Packages

高功率多芯片封装的冷却通道设计优化

Michael Acquah, Zheng Liu

AI总结 针对高功率多芯片封装的热管理问题,提出一种基于物理的计算框架,结合替代模型和混合整数二次规划算法优化嵌入式冷却通道布局,显著降低芯片峰值和平均温度。

详情
Comments
9 pages, 8 figures
AI中文摘要

热管理是下一代高性能计算系统的主要挑战,特别是对于异构多芯片封装,如NVIDIA GB200 Grace Blackwell超级芯片。本文开发了一个基于物理的计算框架,用于优化高功率多芯片模块的嵌入式冷却通道布局。该模型将稳态热传导与基于多孔介质的冷却剂输运表示相结合,并结合逐行冷却剂能量平衡,以估计微通道网络内的芯片温度场。与传统设计不同,交错冷却架构通过几何变量参数化,包括通道数量、宽度和芯片区域上的扩展,从而实现系统化的设计探索。为了实现高效优化,采用替代方法近似几何参数与温度指标之间的关系。使用混合整数二次规划算法优化所得模型,以最小化基于峰值和平均芯片温度的加权目标。为了提高物理相关性,进一步约束通道布局以增加GPU区域附近的冷却覆盖,这些区域的热负荷最高。该框架应用于基于NVIDIA GB200架构的代表性多芯片配置,包括两个图形处理单元和一个中央处理单元。结果表明,与基线配置相比,优化设计将芯片峰值温度降低了140.45°C,平均芯片温度降低了35.87°C。

英文摘要

Thermal management is a major challenge in next-generation high-performance computing systems, particularly for heterogeneous multi-chip packages such as the NVIDIA GB200 Grace Blackwell Superchip. In this work, a physics-based computational framework is developed to optimize embedded cooling channel layouts for high-power multi-chip modules. The model couples steady-state heat conduction with a porous media-based representation of coolant transport, coupled with a row-wise coolant energy balance, to estimate chip temperature fields within microchannel networks. Unlike conventional designs, an interdigitated cooling architecture is parameterized using geometric variables, including channel count, width, and expansion over chip regions, enabling systematic design exploration. To enable efficient optimization, a surrogate-based approach is employed to approximate the relationship between geometric parameters and temperature metrics. The resulting model is optimized using a mixed-integer quadratic programming algorithm to minimize a weighted objective based on peak and average chip temperatures. To improve physical relevance, channel placement is further constrained to increase cooling coverage near GPU regions, where thermal loads are highest. The framework is applied to a representative multi-chip configuration based on NVIDIA GB200 architecture, consisting of two graphics processing units and one central processing unit. The results demonstrate that the optimal design reduces the peak chip temperature by 140.45°C and the average chip temperature by 35.87°C compared to the baseline configuration.

2605.20468 2026-06-04 cs.LG stat.ME stat.ML

CASCADE Conformal Prediction: Uncertainty-Adaptive Prediction Intervals for Two-Stage Clinical Decision Support

CASCADE 共形预测:两阶段临床决策支持的不确定性自适应预测区间

Ricardo Diaz-Rincon, Muxuan Liang, Adolfo Ramirez-Zamora, Benjamin Shickel

AI总结 提出 CASCADE 共形预测框架,通过传播分类器认知不确定性动态调整回归预测区间,在帕金森病用药管理中实现高效且鲁棒的区间估计。

详情
Comments
Accepted to ICML 2026 AgenticUQ Workshop. 14 Pages, 3 Figures
AI中文摘要

由于疾病进展的异质性、患者反应的差异性以及药物副作用,帕金森病(PD)的有效用药管理具有挑战性。虽然AI模型可以预测左旋多巴等效日剂量(LEDD)作为用药需求的度量,但标准的不确定性量化通常无法传达这些预测的可靠性,将高置信度和低置信度的临床决策等同对待。我们引入了CASCADE(通过共形和分布估计的校准自适应缩放),一种新颖的共形预测框架,它将来自筛选分类器的认知不确定性传播以自适应下游预测。与依赖辅助残差回归的标准共形方法不同,我们利用来自主要分类任务(识别是否需要改变用药)的认知不确定性,动态缩放次要回归任务(预测改变多少)的预测区间。通过将Venn-Abers多概率不确定性直接映射到非一致性分数,我们的框架实现了连续的风险自适应。我们证明,这种“级联效应”为高置信度患者产生高效的区间(比标准共形基线窄38.9%),同时自动扩展区间以确保对不确定病例的鲁棒覆盖,弥合了PD中离散临床决策与连续剂量预测之间的差距。

英文摘要

Effective medication management in Parkinson's Disease (PD) is challenging due to heterogeneous disease progression, variable patient response, and medication side effects. While AI models can forecast levodopa equivalent daily dose (LEDD) as a measure of medication needs, standard uncertainty quantification often fails to communicate the reliability of these predictions, treating high and low confidence clinical decisions identically. We introduce CASCADE (Calibrated Adaptive Scaling via Conformal And Distributional Estimation), a novel conformal prediction framework that propagates epistemic uncertainty from a screening classifier to adapt downstream predictions. Unlike standard conformal methods that rely on auxiliary residual regression, we leverage epistemic uncertainty from a primary classification task (identifying whether a medication change is needed) to dynamically scale the prediction intervals of a secondary regression task (predicting how much change). By mapping Venn-Abers multi-probabilistic uncertainty directly to non-conformity scores, our framework achieves continuous risk adaptation. We demonstrate that this ``cascade effect'' produces highly efficient intervals for confident patients (38.9% narrower than standard conformal baselines) while automatically expanding intervals to ensure robust coverage for uncertain cases, bridging the gap between discrete clinical decision-making and continuous dose forecasting in PD.
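
The idea of scaling regression intervals by an upstream uncertainty can be illustrated with a normalized split conformal sketch. This is a generic technique, not the CASCADE procedure itself: the abstract does not give the mapping from Venn-Abers outputs to non-conformity scores, so the per-patient uncertainty `u` below is a hypothetical stand-in, and all data values are made up.

```python
import math
import numpy as np

def scaled_conformal_quantile(residuals, uncertainty, alpha=0.1):
    """Conformal quantile of normalized scores |y - y_hat| / u on a calibration set."""
    scores = np.sort(np.abs(residuals) / uncertainty)
    n = scores.shape[0]
    rank = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    if rank > n:
        return math.inf                        # too few calibration points for this alpha
    return float(scores[rank - 1])

def scaled_interval(y_hat, uncertainty, q):
    """Interval y_hat +/- q * u: wider when the upstream uncertainty u is larger."""
    return y_hat - q * uncertainty, y_hat + q * uncertainty

# Toy calibration data (hypothetical): regression residuals and classifier uncertainty.
cal_residuals = np.array([1.0, -2.0, 0.5, 3.0, -1.5, 2.5, -0.5, 1.0, -4.0])
cal_uncertainty = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 2.0])
q = scaled_conformal_quantile(cal_residuals, cal_uncertainty, alpha=0.2)

confident = scaled_interval(100.0, 0.5, q)  # patient with low upstream uncertainty
uncertain = scaled_interval(100.0, 2.0, q)  # patient with high upstream uncertainty
```

A single calibrated quantile `q` is shared by all patients, and the interval width varies only through `u`, so confident cases get narrow intervals and uncertain cases get wide ones.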

2605.19271 2026-06-04 stat.ME

Ranking with Confidence: A Probabilistic Framework for Deterministic Ranking Methods

排名置信度:确定性排名方法的概率框架

Shunpu Zhang

AI总结 提出一个概率框架,将真实排名视为潜在随机变量,基于成对优势概率引入新排名准则,并开发最坏最优排名方法构建置信区间,首次为经典确定性排名提供形式化不确定性量化。

详情
AI中文摘要

排名在教育到在线平台等领域的决策中至关重要,但经典确定性方法如Borda计数法或Copeland型成对方法忽略了由于抽样噪声或不完整数据带来的不确定性。我们提出了一个概率框架,将真实排名视为潜在随机变量,从而能够量化排名的不确定性。我们引入了基于成对优势概率的新排名准则,推导了近似推断程序,并提供了一种新颖的最坏最优排名方法来构建排名的联合和个体置信区间。我们的方法是首个为经典确定性排名提供形式化不确定性量化的方法。它对缺失数据具有固有的鲁棒性:与Copeland型方法不同,后者通过给观察较少的实体分配较少的胜利来惩罚它们,我们的成对概率模型调整了不完整性,消除了对具有更完整记录的项目的偏见。由此产生的排名反映了潜在表现而非数据可用性,从而在高风险应用中增强了公平性、透明度和统计可靠性。

英文摘要

Rankings are central to decision-making in fields ranging from education to online platforms, yet classical deterministic methods such as the Borda count method or Copeland-type pairwise methods ignore uncertainty due to sampling noise or incomplete data. We propose a probabilistic framework that treats true ranks as latent random variables, enabling quantification of ranking uncertainty. We introduce new ranking criteria based on pairwise dominance probabilities, derive approximate inference procedures, and provide a novel Worst Best rank method to construct simultaneous and individual confidence intervals for ranks. Our approach is the first to provide formal uncertainty quantification for classical deterministic rankings. It is inherently robust to missing data: unlike Copeland type methods, which penalize entities with fewer observed comparisons by assigning them fewer wins, our pairwise probability model adjusts for incompleteness, eliminating bias toward items with more complete records. The resulting rankings reflect underlying performance rather than data availability, enhancing fairness, transparency, and statistical reliability in high-stakes applications.
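
The contrast between a deterministic Copeland-type count and pairwise probabilities can be shown on a small example with a missing comparison. This is only an illustration of the inputs involved; the paper's probabilistic model and its Worst Best rank method are not reproduced here, and the data and function names are hypothetical.

```python
import numpy as np

# wins[i, j] = number of observed comparisons in which item i beat item j.
wins = np.array([
    [0, 3, 3],
    [1, 0, 0],  # items 1 and 2 were never compared with each other
    [1, 0, 0],
])

def copeland_scores(wins):
    """Number of opponents each item beats in a majority of observed comparisons."""
    return (wins > wins.T).sum(axis=1)

def pairwise_win_rates(wins):
    """Empirical P(i beats j); NaN where the pair was never compared."""
    totals = wins + wins.T
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, wins / totals, np.nan)

scores = copeland_scores(wins)
rates = pairwise_win_rates(wins)
```

The Copeland count gives items 1 and 2 a score of zero against each other simply because no comparison was observed, whereas the win-rate matrix records that pair as unknown. A model built on pairwise probabilities can treat such entries as missing instead of as losses.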

2605.18931 2026-06-04 stat.ML cs.AI cs.LG

Markov Chain Decoders Overcome the Heavy-Tail Limitations of Lipschitz Generative Models

马尔可夫链解码器克服Lipschitz生成模型的重尾限制

Abdelhakim Ziani, Andras Horvath, Paolo Ballarini

AI总结 针对Lipschitz生成模型无法生成重尾分布的问题,提出用基于马尔可夫链的Phase-Type分布替换高斯解码器,显著降低了尾部误差和极端分位数误差。

详情
Journal ref
22nd European Performance Engineering Workshop (EPEW 2026), Jun 2025, Grimstad, Norway
AI中文摘要

重尾分布在性能评估、网络流量和风险建模中普遍存在。这种行为对现代深度生成模型构成了根本性挑战。标准变分自编码器(VAE)采用高斯解码器似然和Lipschitz约束神经网络,这种组合在结构上无法产生重尾输出:高斯尾部呈指数衰减,而Lipschitz连续性阻止解码器放大来自潜在空间的罕见事件以充分克服这种衰减。我们提供了这一局限性的理论刻画,并使用合成Pareto数据(跨越尾部指数$α$ ∈ {2, 3, 5, 30}和维度d ∈ {1, 5, 10}的网格)进行了受控实证演示。作为解决方案,我们在保持编码器、潜在空间和训练过程不变的情况下,将高斯解码器替换为基于马尔可夫链的Phase-Type(PH)分布。PH分布允许对任何正值分布(包括重尾族)进行任意精确的近似。实验表明,对于重尾数据,与高斯基线相比,基于PH的模型将尾部Kolmogorov-Smirnov距离减少了最多6倍,极端分位数误差减少了最多10倍。这些结果表明,将基于马尔可夫链的分布集成到生成模型的解码器中,为重尾生成问题提供了一个有原则且实际有效的解决方案。

英文摘要

Heavy-tailed distributions are prevalent in performance evaluation, network traffic, and risk modeling. This behavior poses a fundamental challenge for modern deep generative models. Standard Variational Autoencoders (VAEs) employ Gaussian decoder likelihoods and Lipschitz-constrained neural networks, a combination that is structurally incapable of producing heavy-tailed outputs: the Gaussian tail decays exponentially, and Lipschitz continuity prevents the decoder from amplifying rare events from the latent space enough to overcome this decay. We provide both a theoretical characterization of this limitation and a controlled empirical demonstration using synthetic Pareto data across a grid of tail indices $\alpha \in \{2, 3, 5, 30\}$ and dimensions $d \in \{1, 5, 10\}$. As a solution, we replace the Gaussian decoder with a Phase-Type (PH) distribution based on Markov chains, while keeping the encoder, latent space, and training procedure identical. PH distributions allow for arbitrarily precise approximations of any positive-valued distribution, including heavy-tailed families. Experiments show that the PH-based model reduces the tail Kolmogorov-Smirnov distance by a factor of up to 6 and the extreme quantile error by a factor of up to 10 compared to the Gaussian baseline on heavy-tailed data. These results demonstrate that integrating Markov chain-based distributions into the decoder of a generative model provides a principled and practically effective solution to the heavy-tail generation problem.

2506.01250 2026-06-04 cs.LG stat.ML

Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration

神经方差感知的深度表示与浅层探索的对抗性老虎机

Youngmin Oh, Jinje Park, Taejin Paik

AI总结 提出首个方差感知的上下文对抗性老虎机算法,结合浅层探索与神经网络非线性效用逼近,通过迭代自改进与谱分析将网络宽度需求从Ω̃(T^{14})降至Ω̃(T^{6}),并实现次线性遗憾。

详情
Comments
Accepted at AISTATS 2026; code at https://github.com/youngmin0oh/NVLDB-AISTATS2026
AI中文摘要

我们首次引入了方差感知的上下文对抗性老虎机算法,该算法利用浅层探索策略与神经网络进行非线性效用逼近。一个关键的理论挑战是缺乏闭式估计量,这导致先前的工作需要极大的网络宽度$m$(即$m = \widetilde{\Omega}(T^{14})$)。我们通过一种结合迭代自改进与谱分析的新颖分析方法解决了这一约束。我们的分析将网络宽度需求显著降低至$m = \widetilde{\Omega}(T^{6})$,并表明我们的算法在UCB和TS框架下均实现了次线性遗憾$\widetilde{\mathcal{O}}(d\sqrt{\sum_{t=1}^{T} \sigma_t^2} + \sqrt{dT})$。实验结果表明,所提出的算法不仅计算高效,在实际环境中表现出次线性遗憾,而且在合成和实际任务上均达到了最先进的性能。

英文摘要

We introduce the first variance-aware algorithms for contextual dueling bandits that leverage shallow exploration strategies with neural networks for nonlinear utility approximation. A key theoretical challenge is the absence of a closed-form estimator, which led prior work to require an extremely large network width $m$ (i.e., $m = \widetilde{\Omega}(T^{14})$). We address this constraint with a novel analytical approach that combines iterative self-improvement with spectral analysis. Our analysis significantly reduces the network width requirement to $m = \widetilde{\Omega}(T^{6})$, and shows that our algorithms achieve a sublinear regret of $\widetilde{\mathcal{O}}(d\sqrt{\sum_{t=1}^{T} \sigma_t^2} + \sqrt{dT})$ under both UCB and TS frameworks. Empirical results show that the proposed algorithms are computationally efficient, exhibit sublinear regret in practical settings, and achieve state-of-the-art performance on both synthetic and real-world tasks.

2408.08630 2026-06-04 stat.ME

Spatial Principal Component Analysis and Moran Statistics for Multivariate Functional Areal Data

空间主成分分析与多元函数面域数据的Moran统计量

Dharini Pathmanathan, Issa-Mbenard Dabo, Tzung Hsuen Khoo, Alaa Ali-Hassan, Sophie Dabo-Niang

AI总结 提出多元函数面域空间主成分分析(mfasPCA)框架和多元函数Moran's I统计量,用于评估空间自相关并对方域观测的多元函数数据进行降维。

详情
AI中文摘要

本文介绍了多元函数面域空间主成分分析(mfasPCA)框架,以及多元函数Moran's I统计量,以评估在面域单元上观测的多元函数数据的空间自相关并进行降维。所提出的框架在空间-函数上是通用的:函数参数可以表示时间、年龄、波长或其他有序连续量,而空间依赖性通过空间权重矩阵在面域单元之间引入。主成分方法通过Moran型空间加权准则定义。我们提出基于特征值的置换检验来评估空间结构化成分的显著性。检验框架包括综合检验、带Holm调整的分量检验以及基于特征值尾和的有序秩检验。模拟研究表明,mfasPCA能够捕获正负空间-函数结构,并将其集中在各自自相关模式下的主要成分中。一个实际数据应用说明了mfasPCA如何识别多元函数变异的空间结构化模式。

英文摘要

The paper introduces a multivariate functional areal spatial principal component analysis (mfasPCA) framework, together with multivariate functional Moran's I statistics, to enable the assessment of spatial autocorrelation and dimension reduction for multivariate functional data observed over areal units. The proposed framework is spatial-functional in scope: the functional argument may represent time, age, wavelength, or another ordered continuum, while spatial dependence is introduced across areal units through a spatial weight matrix. The principal component method is defined through a Moran-type spatially weighted criterion. We propose eigenvalue-based permutation tests to assess the significance of spatially structured components. The testing framework includes omnibus tests, componentwise tests with Holm adjustment, and sequential rank-wise tests based on tail sums of eigenvalues. Simulation studies show that mfasPCA captures positive and negative spatial-functional structures and concentrates them in the leading components under the respective autocorrelation regimes. A real-data application illustrates how mfasPCA identifies spatially structured modes of multivariate functional variation.

2604.14575 2026-06-04 cs.LG cs.AI stat.ME stat.ML

Generative Augmented Inference

生成式增强推断

Cheng Lu, Mengxin Wang, Dennis J. Zhang, Heng Zhang

AI总结 提出生成式增强推断(GAI)框架,将AI输出视为学习真实标签的高维信息特征而非代理,通过非参数方法建模,实现人机数据联合的一致估计和有效推断,在随机标注下渐近效率严格优于仅用人类数据。

详情
AI中文摘要

大型语言模型使得廉价的AI生成标注成为可能,但如何可靠地将其用于因果推断仍然具有挑战性。简单地将AI和人类数据混合会引入偏差,而现有方法如预测驱动推断(PPI;Angelopoulos et al., 2023a)将AI输出视为真实标签的代理——这一假设在实践中常被生成模型输出所违背。我们提出生成式增强推断(GAI),一个将AI输出视为学习人类标签的一般性、潜在高维信息特征而非替代品的框架。GAI使用非参数方法灵活建模这种关系,从而能够从人类和AI的联合数据中进行一致估计和有效推断。我们建立了渐近正态性,并证明在随机标注下,只要AI输出对真实标签具有信息量,GAI在渐近效率上严格优于仅使用人类数据的估计。在真实数据集上的实证研究表明,与仅使用人类数据和基于PPI的估计相比,GAI在多种生成数据源上显著降低了估计误差并提高了置信区间质量。

英文摘要

Large language models enable inexpensive AI-generated annotations, but using them reliably for causal inference remains challenging. Naively pooling AI and human data induces bias, while existing methods such as Prediction-Powered Inference (PPI; Angelopoulos et al., 2023a) treat AI outputs as proxies of true labels -- an assumption often violated for generative model outputs in practice. We propose Generative Augmented Inference (GAI), a framework that treats AI outputs as general, potentially high-dimensional informative features for learning human labels rather than as surrogates. GAI flexibly models this relationship using nonparametric methods, enabling consistent estimation and valid inference from combined human and AI data. We establish asymptotic normality and show that, under random labeling, GAI strictly improves asymptotic efficiency over human-data-only estimation whenever AI outputs are informative for true labels. Empirical studies on real-world datasets demonstrate that GAI significantly reduces estimation error and improves confidence interval quality across diverse generative data sources relative to human-only and PPI-based estimation.

2604.14486 2026-06-04 math.ST econ.EM stat.ME stat.TH

Tweedie Calculus

Tweedie 微积分

Santiago Torres

AI总结 本文针对加性噪声模型,提出一个统一框架,通过傅里叶分析刻画后验期望的Tweedie表示,并证明其存在性、唯一性和连续性,推广到非高斯噪声和非线性后验泛函。

详情
AI中文摘要

Tweedie公式在高斯噪声下将潜变量的后验均值直接表示为观测数据密度的函数,是经验贝叶斯和测量误差分析的基石。然而,目前尚无一般理论解释类似恒等式何时成立、如何构造,以及如何将其推广到非高斯噪声和均值以外的后验泛函。本文针对加性噪声模型发展了这样一个框架。我刻画了在给定观测信号下,未观测潜变量的条件期望何时能直接表示为观测密度的函数——我将这些恒等式称为 \emph{Tweedie表示}——并证明它们由一个线性映射 \emph{Tweedie泛函} 控制。在一般条件下,我证明了该泛函存在、唯一且连续。我提供了一种基于傅里叶分析的构造性计算方法:通过延拓一个显式缓增分布的逆傅里叶变换得到该泛函。该理论给出了非高斯噪声下的后验均值公式,并为非线性后验泛函提供了新的表示。应用包括差分隐私中的拉普拉斯机制和复合决策问题中的异方差高斯序列模型。

英文摘要

Tweedie's formula, which under Gaussian noise expresses the posterior mean of a latent variable directly from the observed-data density, is a cornerstone of empirical Bayes and measurement-error analysis. No general theory, however, explains when analogous identities hold, how they are structured, or how to derive them for non-Gaussian noise and for posterior functionals other than the mean. This paper develops such a framework for additive-noise models. I characterize when conditional expectations of an unobserved latent variable, given the observed signal, admit direct expressions in terms of the observed density -- identities I call \emph{Tweedie representations} -- and show that they are governed by a linear map, the \emph{Tweedie functional}. Under general conditions, I prove that this functional exists, is unique, and is continuous. I provide a constructive method for its computation based on Fourier analysis: the functional is obtained by extending the inverse Fourier transform of an explicit tempered distribution. The theory yields posterior-mean formulas for non-Gaussian noise and provides new representations for nonlinear posterior functionals. Applications include Laplace mechanisms in differential privacy and heteroskedastic Gaussian sequence models in compound decision problems.
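
As a reminder of the classical identity this abstract generalizes, the Gaussian-noise Tweedie formula states that for $Y = X + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$ and marginal density $f$ of $Y$, $E[X \mid Y = y] = y + \sigma^2 \frac{d}{dy} \log f(y)$. The sketch below checks it numerically in the normal-normal model, where the posterior mean is known in closed form. The prior and noise variances are hypothetical values chosen for illustration, and nothing here reproduces the paper's Tweedie functional.

```python
import numpy as np

def tweedie_posterior_mean(y, score, noise_var):
    """Classical Gaussian-noise Tweedie formula: y + sigma^2 * d/dy log f(y)."""
    return y + noise_var * score(y)

prior_var, noise_var = 4.0, 1.0       # hypothetical: X ~ N(0, 4), noise ~ N(0, 1)
marginal_var = prior_var + noise_var  # Y = X + noise is N(0, 5)
score = lambda y: -y / marginal_var   # derivative of log N(y; 0, marginal_var)

y = np.linspace(-3.0, 3.0, 7)
via_tweedie = tweedie_posterior_mean(y, score, noise_var)
via_conjugacy = y * prior_var / marginal_var  # textbook normal-normal posterior mean
```

The two arrays agree, and the Tweedie route never uses the prior directly: it needs only the marginal density of the observations. This is why the formula is useful in empirical Bayes settings where the prior is unknown.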

2603.21180 2026-06-04 cs.LG stat.CO stat.ME stat.ML

ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization

ALMAB-DC:用于序贯实验设计和黑箱优化的主动学习、多臂老虎机和分布式计算

Foo Hui-Mean, Yuan-chin I Chang

AI总结 提出ALMAB-DC框架,结合高斯过程代理模型、多臂老虎机控制和异步分布式调度,解决昂贵黑箱优化问题,在多个基准上显著优于现有方法。

详情
Comments
33 pages, and 13 figures
AI中文摘要

在昂贵且无梯度目标下的序贯实验设计是计算统计学中的一个核心挑战:评估预算严格受限,必须从每次观测中高效提取信息。我们提出ALMAB-DC,一种基于高斯过程的序贯设计框架,结合主动学习、多臂老虎机(MAB)和分布式异步计算,用于昂贵的黑箱实验。具有不确定性感知获取函数的高斯过程代理模型识别信息量大的查询点;UCB或汤普森采样老虎机控制器在并行工作节点间分配评估;异步调度器处理异构运行时间。我们给出了老虎机组件的累积遗憾界,并通过阿姆达尔定律刻画了并行可扩展性。我们在五个基准上验证了ALMAB-DC。在两个统计实验设计任务中,ALMAB-DC在剂量-响应优化中实现了比等间距、随机和D最优设计更低的简单遗憾,在自适应空间场估计中匹配了贪婪最大方差基准并优于拉丁超立方采样;在$K=4$时,分布式设置达到目标性能所需的序贯挂钟轮次仅为四分之一。在三个机器学习/工程任务(CIFAR-10 HPO、CFD阻力最小化、MuJoCo RL)中,ALMAB-DC实现了93.4%的CIFAR-10准确率(超过BOHB 1.7个百分点和Optuna 1.1个百分点),将翼型阻力降低至$C_D = 0.059$(比网格搜索低36.9%),并将RL回报比网格搜索提高50%。所有相对于非ALMAB基线的优势在Bonferroni校正的Mann-Whitney $U$检验下均具有统计显著性。分布式执行在$K = 16$个智能体时实现了$7.5\times$加速,与阿姆达尔定律一致。

英文摘要

Sequential experimental design under expensive, gradient-free objectives is a central challenge in computational statistics: evaluation budgets are tightly constrained and information must be extracted efficiently from each observation. We propose \textbf{ALMAB-DC}, a GP-based sequential design framework combining active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experimentation. A Gaussian process surrogate with uncertainty-aware acquisition identifies informative query points; a UCB or Thompson-sampling bandit controller allocates evaluations across parallel workers; and an asynchronous scheduler handles heterogeneous runtimes. We present cumulative regret bounds for the bandit components and characterize parallel scalability via Amdahl's Law. We validate ALMAB-DC on five benchmarks. On the two statistical experimental-design tasks, ALMAB-DC achieves lower simple regret than Equal Spacing, Random, and D-optimal designs in dose--response optimization, and in adaptive spatial field estimation matches the Greedy Max-Variance benchmark while outperforming Latin Hypercube Sampling; at $K=4$ the distributed setting reaches target performance in one-quarter of sequential wall-clock rounds. On three ML/engineering tasks (CIFAR-10 HPO, CFD drag minimization, MuJoCo RL), ALMAB-DC achieves 93.4\% CIFAR-10 accuracy (outperforming BOHB by 1.7\,pp and Optuna by 1.1\,pp), reduces airfoil drag to $C_D = 0.059$ (36.9\% below Grid Search), and improves RL return by 50\% over Grid Search. All advantages over non-ALMAB baselines are statistically significant under Bonferroni-corrected Mann--Whitney $U$ tests. Distributed execution achieves $7.5\times$ speedup at $K = 16$ agents, consistent with Amdahl's Law.
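
The Amdahl's Law statement can be made concrete with a short calculation. Amdahl's Law gives the speedup on $K$ workers as $S(K) = 1 / ((1 - p) + p / K)$, where $p$ is the parallelizable fraction of the work. The sketch below inverts this to find the $p$ implied by the reported $7.5\times$ speedup at $K = 16$. The implied fraction is a back-calculation under the pure Amdahl model, not a figure reported by the authors, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's Law: speedup when a fraction p of the work is spread over K workers."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)

def implied_parallel_fraction(speedup, workers):
    """Solve S = 1 / ((1 - p) + p / K) for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)

# Reported in the abstract: 7.5x speedup at K = 16 agents.
p = implied_parallel_fraction(7.5, 16)
# Limiting speedup as K grows without bound under the same model.
max_speedup = 1.0 / (1.0 - p)
```

Under this model a $7.5\times$ speedup on 16 workers corresponds to a parallel fraction of roughly 0.92, which caps the achievable speedup at about 13 no matter how many workers are added.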

2604.00915 2026-06-04 cs.LG stat.ML

Orthogonal Learner for Estimating Heterogeneous Long-Term Treatment Effects

正交学习器用于估计异质性长期处理效应

Haorui Ma, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel

AI总结 提出LT-O-learners,通过定制重叠权重重新定位损失函数,解决低重叠区域下异质性长期处理效应估计不稳定问题,并证明其Neyman正交性和对干扰误差的鲁棒性。

详情
AI中文摘要

异质性长期处理效应(HLTEs)的估计对于营销、经济学和医学中的个性化决策具有重要意义,在这些领域中,短期观测数据集通常与长期观测数据集相结合。然而,由于某些子群体在处理分配或长期结果上的重叠有限,HLTE估计面临挑战,可能导致具有大有限样本方差不稳定的HLTE估计。为了解决这一挑战,我们引入了LT-O-learners(长期正交学习器),这是一组新颖的正交学习器,用于在具有替代性的典型HLTE设置中进行HLTE估计。我们的LT-O-learners的关键思想是通过定制的重叠权重重新定位损失函数,这些权重降低了低重叠样本的权重。我们证明了重新定位的损失函数逐点恢复真实的HLTE,并满足Neyman正交性。我们进一步证明了两个关键的理论结果:(i)干扰误差仅通过高阶项进入误差界,这意味着我们的学习器对干扰估计误差具有鲁棒性。(ii)在线性函数类下,重新定位通过低重叠区域中的重叠权重有效控制了HLTE估计器的渐近方差。我们在合成和真实世界数据集上进行实验,以确认我们的LT-O-learners的理论性质,特别是在低重叠区域中的鲁棒性。据我们所知,我们是第一个在长期设置中对低重叠鲁棒的HLTE估计正交学习器。

英文摘要

Estimation of heterogeneous long-term treatment effects (HLTEs) is relevant for personalized decision-making in marketing, economics, and medicine, where short-term observational datasets are often combined with long-term observational datasets. However, HLTE estimation is challenging due to limited overlap in treatment assignments or in long-term outcomes for certain subpopulations, which can lead to unstable HLTE estimates with large finite-sample variance. To address this challenge, we introduce the LT-O-learners (Long-Term Orthogonal Learners), a set of novel orthogonal learners for HLTE estimation in the canonical HLTE setting with surrogacy. The key idea of our LT-O-learners is to retarget the loss via custom overlap weights that downweight low-overlap samples. We show that the retargeted loss recovers the true HLTE pointwise and satisfies Neyman-orthogonality. We further prove two key theoretical results: (i) The nuisance error enters the error bound only through higher-order terms, which means our learners are robust to nuisance estimation error. (ii) Under a linear function class, the retargeting effectively controls the asymptotic variance of the HLTE estimator via the overlap weights in low-overlap regimes. We conduct experiments on synthetic and real-world datasets to confirm the theoretical properties of our LT-O-learners, particularly robustness in low-overlap regimes. To our knowledge, ours are the first orthogonal learners for HLTE estimation robust to low overlap in long-term settings.

2603.27095 2026-06-04 stat.AP

Socioeconomic Drivers of Physical Morbidity Across U.S. Counties: A Spatial Causal Inference Approach

美国各县身体发病的社会经济驱动因素:一种空间因果推断方法

Ranadeep Daw, Hunter N. Evans, Indrabati Bhattacharya

AI总结 本研究采用双稳健广义倾向得分估计器结合空间谱基函数,识别美国各县社会经济因素对身体不健康天数的影响。

详情
AI中文摘要

识别社会经济决定因素对人口健康的因果效应具有广泛兴趣——从统计方法发展到公共卫生实践和政策制定。该问题的统计方面需要解决几个问题:暴露和结果中的空间自相关、处理与协变量之间的混杂,以及地理逻辑推断的需求。我们通过使用谱基函数——Moran特征向量图和ICAR精度矩阵特征向量——在连续处理的双稳健广义倾向得分估计器中联合解决这些问题。应用于2022年美国各县的健康数据,该框架识别了六个选定预测因子对每月平均身体不健康天数的影响。本文还讨论了可能的进一步应用和方法扩展作为未来研究方向。

英文摘要

Identifying the causal effects of socioeconomic determinants on population health is of broad interest, from statistical methodology development to public health practice and policy development. On the statistical side, the problem raises several issues: spatial autocorrelation in both exposures and outcomes, confounding between treatments and covariates, and the need for geographically logical inference. We address these jointly by using spectral basis functions (Moran Eigenvector Maps and ICAR precision matrix eigenvectors) within a doubly robust generalized propensity score estimator for continuous treatments. Applied to 2022 county health data across U.S. counties, the framework identifies the effects of six chosen predictors on the average number of physically unhealthy days per month. Possible further applications and methodological extensions are also discussed as directions for future research.

2603.24786 2026-06-04 econ.EM math.ST stat.TH

Refined Cluster Robust Inference

精细化聚类稳健推断

Bulat Gafarov, Takuya Ura

AI总结 针对小聚类数下t统计量正态近似精度差的问题,本文提出基于条件Cramér-Edgeworth展开的临界值方法,实现三阶精细化,并给出闭式表达式。

详情
AI中文摘要

在实证研究中,进行对聚类依赖和异质性稳健的推断已成为标准做法。当聚类数量较少时,回归系数t统计量的正态近似可能效果不佳。本文利用基于t统计量的条件Cramér-Edgeworth展开的临界值来解决这一问题。我们的方法保证了三阶精细化,无论解释变量是离散的还是连续的。该临界值是估计的得分偏度和峰度的闭式函数。模拟表明,我们的方法在仅有10个聚类时也能在尺寸控制上产生显著差异。

英文摘要

It has become standard for empirical studies to conduct inference robust to cluster dependence and heterogeneity. With a small number of clusters, the normal approximation for the $t$-statistics of regression coefficients may be poor. This paper tackles this problem using a critical value based on the conditional Cramér-Edgeworth expansion for the $t$-statistics. Our approach guarantees third-order refinement, regardless of whether a regressor is discrete or not. The critical value is a closed-form function of the estimated score skewness and kurtosis. Simulations show that our proposal can make a difference in size control with as few as 10 clusters. Keywords: Cluster robust inference, Cramér-Edgeworth expansion, Asymptotic refinement
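For orientation, the sketch below computes the conventional cluster-robust $t$-statistic whose normal approximation the abstract describes as poor with few clusters. It assumes a single regressor with no intercept and the basic "CR0" sandwich variance. The function name `cluster_robust_t` and the toy data are hypothetical, and the paper's Cramér-Edgeworth critical value is not reproduced here.

```python
from collections import defaultdict
from math import sqrt

def cluster_robust_t(x, y, cluster):
    """OLS slope through the origin with a CR0 cluster-robust t-statistic."""
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Sum the scores x_i * residual_i within each cluster before squaring,
    # so arbitrary dependence inside a cluster is allowed.
    scores = defaultdict(float)
    for xi, yi, g in zip(x, y, cluster):
        scores[g] += xi * (yi - beta * xi)
    variance = sum(s * s for s in scores.values()) / (sxx * sxx)
    se = sqrt(variance)
    return beta, se, beta / se

# Toy data (illustrative only): six observations in three clusters.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
cluster = [0, 0, 1, 1, 2, 2]
beta, se, t_stat = cluster_robust_t(x, y, cluster)
```

With only three clusters the variance estimate rests on three squared terms, which is the small-sample regime where comparing `t_stat` against a normal quantile is least reliable.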

2603.19005 2026-06-04 cs.LG cs.AI stat.ME

AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science

AgentDS技术报告:领域特定数据科学中人机协作的未来基准测试

An Luo, Jin Du, Xun Xian, Robert Specht, Fangqiao Tian, Ganghua Wang, Xuan Bi, Charles Fleming, Ashish Kundu, Jayanth Srinivasa, Mingyi Hong, Rui Zhang, Tianxi Li, Galin Jones, Jie Ding

AI总结 提出AgentDS基准测试和竞赛,通过17个跨行业挑战评估AI代理及人机协作在领域特定数据科学中的表现,发现AI代理在领域推理上存在不足,人机协作优于纯AI方法。

详情
AI中文摘要

数据科学在将复杂数据转化为跨领域的可操作洞察方面发挥着关键作用。大型语言模型(LLM)和人工智能(AI)代理的最新发展显著自动化了数据科学工作流程。然而,目前尚不清楚AI代理在多大程度上能够匹配人类专家在领域特定数据科学任务上的表现,以及人类专业知识在哪些方面仍具有优势。我们引入了AgentDS,一个旨在评估AI代理和人机协作在领域特定数据科学中表现的基准测试和竞赛。AgentDS包含来自六个行业(商业、食品生产、医疗保健、保险、制造业和零售银行)的17个挑战。我们组织了一场公开竞赛,涉及29支队伍和80名参与者,从而能够系统比较人机协作方法与纯AI基线。我们的结果表明,当前的AI代理在领域特定推理方面存在困难。纯AI基线的表现低于竞赛参与者的前四分位数,而最强的解决方案来自人机协作。这些发现挑战了AI完全自动化的说法,并强调了人类专业知识在数据科学中的持久重要性,同时为下一代AI指明了方向。访问AgentDS网站:https://agentds.org/,开源数据集:https://huggingface.co/datasets/lainmn/AgentDS。

英文摘要

Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data science workflows. However, it remains unclear to what extent AI agents can match the performance of human experts on domain-specific data science tasks, and in which aspects human expertise continues to provide advantages. We introduce AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration performance in domain-specific data science. AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking. We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines. Our results show that current AI agents struggle with domain-specific reasoning. AI-only baselines perform below the top quartile of competition participants, while the strongest solutions arise from human-AI collaboration. These findings challenge the narrative of complete automation by AI and underscore the enduring importance of human expertise in data science, while illuminating directions for the next generation of AI. Visit the AgentDS website here: https://agentds.org/ and open source datasets here: https://huggingface.co/datasets/lainmn/AgentDS .

2603.13704 2026-06-04 stat.ME

A Kernel-Based Nonparametric Test for Conditional Independence of Functional Data

基于核的函数型数据条件独立性非参数检验

Yin Tang, Bing Li

AI总结 针对函数型数据,提出基于联合条件协方差算子的核检验方法,推导渐近分布并构造检验统计量,应用于活动和生物特征数据集及宏观经济数据集。

详情
AI中文摘要

条件独立性是统计研究多个领域(例如充分降维、因果推断和统计图模型)的基本概念。在许多现代应用中,数据以随机函数的形式出现,因此判断两个随机函数在给定第三个随机函数时是否条件独立变得重要。然而,据我们所知,文献中现有的条件独立性检验仅适用于多元数据,尚未扩展到函数型设置。为填补这一空白,我们基于联合条件协方差算子(CCCO)开发了一种针对随机函数的核检验。我们利用最近建立的回归算子锐化收敛速度(Choi 等人,2026),严格推导了 CCCO 估计量的渐近分布。基于此结果,我们利用渐近分布中出现的算子的谱分解构造了检验统计量。通过应用于活动和生物特征数据集以及宏观经济数据集,展示了所提出方法的效果。

英文摘要

Conditional independence is a fundamental concept in many areas of statistical research, including, for example, sufficient dimension reduction, causal inference, and statistical graphical models. In many modern applications, data arise in the form of random functions, making it important to determine whether two random functions are conditionally independent given a third. However, to the best of our knowledge, existing conditional independence tests in the literature apply only to multivariate data, and extensions to the functional setting are not available. To fill this gap, we develop a kernel-based test for conditional independence of random functions based on the conjoined conditional covariance operator (CCCO). We rigorously derive the asymptotic distribution of the CCCO estimator using a recently established sharpened convergence rate for the regression operator (Choi et al., 2026). Based on this result, we construct a test statistic using the spectral decomposition of the operator appearing in the asymptotic distribution. The proposed method is illustrated through applications to an activity and biometrics dataset and a macroeconomic dataset.

2602.20651 2026-06-04 cs.LG stat.AP stat.ML

Sparse Bayesian Deep Functional Learning with Structured Region Selection

稀疏贝叶斯深度函数学习与结构化区域选择

Xiaoxian Zhu, Yingmeng Li, Shuangge Ma, Mengyun Wu

AI总结 提出稀疏贝叶斯函数深度神经网络(sBayFDNN),通过深度贝叶斯架构学习自适应函数嵌入以捕捉复杂非线性关系,并利用结构化先验实现具有量化不确定性的可解释区域选择,理论上首次为贝叶斯深度函数模型提供了近似误差界、后验一致性和区域选择一致性的严格保证。

详情
AI中文摘要

在现代应用如心电图监测、神经影像、可穿戴传感和工业设备诊断中,复杂且连续结构化的数据无处不在,为函数数据分析带来了挑战和机遇。然而,现有方法面临关键权衡:传统函数模型受限于线性,而深度学习方法缺乏可解释的区域选择以处理稀疏效应。为弥合这些差距,我们提出了一种稀疏贝叶斯函数深度神经网络(sBayFDNN)。它通过深度贝叶斯架构学习自适应函数嵌入以捕捉复杂的非线性关系,同时结构化先验使得能够对具有量化不确定性的影响域进行可解释的区域选择。理论上,我们建立了严格的近似误差界、后验一致性和区域选择一致性。这些结果为贝叶斯深度函数模型提供了首个理论保证,确保了其可靠性和统计严谨性。实证上,全面的模拟和真实世界研究证实了sBayFDNN的有效性和优越性。关键的是,sBayFDNN在识别复杂依赖关系以实现准确预测方面表现出色,并能更精确地识别功能上有意义的区域,这些能力从根本上超越了现有方法。

英文摘要

In modern applications such as ECG monitoring, neuroimaging, wearable sensing, and industrial equipment diagnostics, complex and continuously structured data are ubiquitous, presenting both challenges and opportunities for functional data analysis. However, existing methods face a critical trade-off: conventional functional models are limited by linearity, whereas deep learning approaches lack interpretable region selection for sparse effects. To bridge these gaps, we propose a sparse Bayesian functional deep neural network (sBayFDNN). It learns adaptive functional embeddings through a deep Bayesian architecture to capture complex nonlinear relationships, while a structured prior enables interpretable, region-wise selection of influential domains with quantified uncertainty. Theoretically, we establish rigorous approximation error bounds, posterior consistency, and region selection consistency. These results provide the first theoretical guarantees for a Bayesian deep functional model, ensuring its reliability and statistical rigor. Empirically, comprehensive simulations and real-world studies confirm the effectiveness and superiority of sBayFDNN. Crucially, sBayFDNN excels in recognizing intricate dependencies for accurate predictions and more precisely identifies functionally meaningful regions, capabilities fundamentally beyond existing approaches.

2602.19799 2026-06-04 stat.ML cs.LG math.OC

Path-conditioned training: a principled way to rescale ReLU neural networks

路径条件训练:一种缩放ReLU神经网络的原则性方法

Arthur Lebeurrier, Titouan Vayer, Rémi Gribonval

AI总结 本文提出一种基于路径提升框架的几何准则来缩放ReLU网络参数,通过最小化该准则实现核对齐,从而加速训练。

详情
Journal ref
Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), Seoul, South Korea, PMLR 306 (2026)
AI中文摘要

尽管最近算法有所进展,我们仍然缺乏原则性的方法来利用ReLU神经网络参数中记录良好的缩放对称性。虽然两个适当缩放的权重实现相同的函数,但训练动态可能截然不同。为了对这一现象提供新的视角,我们基于最近的路径提升框架,该框架提供了ReLU网络的紧凑分解。我们引入了一个几何动机的准则来缩放神经网络参数,其最小化导致一种条件策略,将路径提升空间中的核与选定的参考对齐。我们推导了一种有效的算法来执行这种对齐。在随机网络初始化的背景下,我们分析了架构和初始化尺度如何共同影响所提出方法的输出。数值实验展示了其加速训练的潜力。

英文摘要

Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. While two properly rescaled sets of weights implement the same function, their training dynamics can be dramatically different. To offer a fresh perspective on exploiting this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion to rescale neural network parameters, whose minimization leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly impact the output of the proposed method. Numerical experiments illustrate its potential to speed up training.

2602.03972 2026-06-04 stat.ML cs.AI cs.LG

Fixed Budget is No Harder Than Fixed Confidence in Best-Arm Identification up to Logarithmic Factors

固定预算在最佳臂识别中不比固定置信度难(对数因子范围内)

Kapilan Balagopalan, Yinan Li, Yao Zhao, Tuan Nguyen, Anton Daitche, Houssam Nassif, Kwang-Sung Jun

AI总结 本文提出元算法FC2FB,将固定置信度算法转化为固定预算算法,证明固定预算的样本复杂度在log因子内不高于固定置信度。

详情
Journal ref
International Conference on Machine Learning (ICML'26), Seoul, Korea, 2026
AI中文摘要

最佳臂识别(BAI)问题是交互式机器学习中最基本的问题之一,有两种形式:固定预算设置(FB)和固定置信度设置(FC)。对于具有唯一最佳臂的$K$臂赌博机,两种设置的最优样本复杂度已被确定,且在对数因子内匹配。这引出了一个关于通用的、可能具有结构化的BAI问题的有趣研究问题:FB是否比FC更难,还是相反?在本文中,我们证明FB在对数因子内并不比FC难。我们通过构造性方式做到这一点:我们提出了一种名为FC2FB(固定置信度到固定预算)的新算法,这是一种元算法,它接收一个FC算法$\mathcal{A}$并将其转化为FB算法。我们证明FC2FB的样本复杂度与$\mathcal{A}$的样本复杂度在对数因子内匹配。这意味着最优FC样本复杂度是FB最优样本复杂度的一个上界(在对数因子内)。我们的结果不仅揭示了FB和FC之间的基本关系,而且具有重要含义:FC2FB与现有最先进的FC算法相结合,可以改善许多FB问题的样本复杂度。

英文摘要

The best-arm identification (BAI) problem is one of the most fundamental problems in interactive machine learning, and it comes in two flavors: the fixed-budget setting (FB) and the fixed-confidence setting (FC). For $K$-armed bandits with a unique best arm, the optimal sample complexities for both settings have been settled, and they match up to logarithmic factors. This prompts an interesting research question about generic, potentially structured BAI problems: is FB harder than FC, or the other way around? In this paper, we show that FB is no harder than FC up to logarithmic factors. We do this constructively: we propose a novel algorithm called FC2FB (fixed confidence to fixed budget), a meta-algorithm that takes in an FC algorithm $\mathcal{A}$ and turns it into an FB algorithm. We prove that FC2FB enjoys a sample complexity that matches, up to logarithmic factors, the sample complexity of $\mathcal{A}$. This means that the optimal FC sample complexity is an upper bound on the optimal FB sample complexity up to logarithmic factors. Our result not only reveals a fundamental relationship between FB and FC, but also has a significant implication: FC2FB combined with existing state-of-the-art FC algorithms leads to improved sample complexity for a number of FB problems.

2602.15202 2026-06-04 quant-ph cs.AI cs.NA eess.SP math.NA stat.CO

Tomography by Design: An Algebraic Approach to Low-Rank Quantum States

按设计断层扫描:低秩量子态的代数方法

Shakir Showkat Sofi, Charlotte Vermeylen, Lieven De Lathauwer

AI总结 提出一种代数算法,通过测量特定可观测量估计密度矩阵的结构化条目,并利用低秩假设通过数值线性代数完成矩阵,实现高效且确定性的量子态层析。

详情
Comments
5 pages, Accepted to EUSIPCO 2026
AI中文摘要

我们提出了一种用于量子态层析的代数算法,该算法利用对某些可观测量的测量来估计底层密度矩阵的结构化条目。在低秩假设下,其余条目可以仅使用标准数值线性代数运算获得。所提出的代数矩阵补全框架适用于一大类通用的低秩混合量子态,并且与最先进的方法相比,计算效率高,同时提供确定性的恢复保证。

英文摘要

We present an algebraic algorithm for quantum state tomography that leverages measurements of certain observables to estimate structured entries of the underlying density matrix. Under low-rank assumptions, the remaining entries can be obtained solely using standard numerical linear algebra operations. The proposed algebraic matrix completion framework applies to a broad class of generic, low-rank mixed quantum states and, compared with state-of-the-art methods, is computationally efficient while providing deterministic recovery guarantees.

2602.12643 2026-06-04 cs.LG cs.AI stat.ML

Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics

通过潜在动力学统一无模型效率与基于模型的表示

Jashaswimalya Acharjee, Balaraman Ravindran

AI总结 提出统一潜在动力学算法,通过将状态-动作对嵌入到值函数近似线性的潜在空间,无需规划开销即可融合无模型效率与基于模型表示的优势,在80个环境中匹配或超越专门基线。

详情
Comments
Similarities found with a prior work. Hence, requesting for withdrawal until further notice
AI中文摘要

我们提出了统一潜在动力学(ULD),一种新颖的强化学习算法,它统一了无模型方法的效率与基于模型方法的表示优势,且不产生规划开销。通过将状态-动作对嵌入到真实值函数近似线性的潜在空间中,我们的方法支持跨不同领域使用单一超参数集——从低维和像素输入的连续控制到高维Atari游戏。我们证明,在温和条件下,基于嵌入的时序差分更新的不动点与相应线性基于模型的值扩展的不动点一致,并推导了将嵌入保真度与值逼近质量相关联的显式误差界。在实践中,ULD采用编码器、值函数和策略网络的同步更新、短视界预测动力学的辅助损失以及奖励尺度归一化,以确保在稀疏奖励下的稳定学习。在涵盖Gym运动控制、DeepMind Control(本体感觉和视觉)以及Atari的80个环境上的评估表明,我们的方法匹配或超过了专门的无模型和通用基于模型的基线的性能——以最少的调参和更少的参数实现了跨领域能力。这些结果表明,仅与值对齐的潜在表示就能提供传统上归因于完整基于模型规划的适应性和样本效率。

英文摘要

We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm that unifies the efficiency of model-free methods with the representational strengths of model-based approaches, without incurring planning overhead. By embedding state-action pairs into a latent space in which the true value function is approximately linear, our method supports a single set of hyperparameters across diverse domains -- from continuous control with low-dimensional and pixel inputs to high-dimensional Atari games. We prove that, under mild conditions, the fixed point of our embedding-based temporal-difference updates coincides with that of a corresponding linear model-based value expansion, and we derive explicit error bounds relating embedding fidelity to value approximation quality. In practice, ULD employs synchronized updates of encoder, value, and policy networks, auxiliary losses for short-horizon predictive dynamics, and reward-scale normalization to ensure stable learning under sparse rewards. Evaluated on 80 environments spanning Gym locomotion, DeepMind Control (proprioceptive and visual), and Atari, our approach matches or exceeds the performance of specialized model-free and general model-based baselines -- achieving cross-domain competence with minimal tuning and a fraction of the parameter footprint. These results indicate that value-aligned latent representations alone can deliver the adaptability and sample efficiency traditionally attributed to full model-based planning.

2602.11406 2026-06-04 stat.ML cs.LG

The Cost of Learning Under Multiple Change Points

多个变化点下的学习成本

Tomer Gafni, Garud Iyengar, Assaf Zeevi

AI总结 针对多变化点在线学习问题,提出选择性检测算法ATC,实现近乎极小化最优的遗憾界。

详情
Comments
A version of this work has been accepted for publication in the Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), Seoul, South Korea
AI中文摘要

我们考虑具有多个变化点的环境中的在线学习问题。与使用经典“高置信度”检测方案广泛研究的单变化点问题不同,多变化点环境提出了新的学习理论和算法挑战。具体来说,我们表明经典方法可能由于我们称之为内生混杂的现象而表现出灾难性失败(高遗憾)。为了克服这一点,我们提出了一类新的学习算法,称为任意时间跟踪CUSUM(ATC)。这些是无视时间范围的在线算法,实现选择性检测原则,平衡忽略“小”(难以检测)变化的需要,同时对显著变化做出“快速”反应。我们证明,适当调整的ATC算法的性能几乎是极小化最优的;其遗憾保证紧密匹配任何学习算法在多变化点问题中可实现性能的新信息论下界。在合成数据以及真实数据上的实验验证了上述理论发现。

英文摘要

We consider an online learning problem in environments with multiple change points. In contrast to the single change point problem that is widely studied using classical "high confidence" detection schemes, the multiple change point environment presents new learning-theoretic and algorithmic challenges. Specifically, we show that classical methods may exhibit catastrophic failure (high regret) due to a phenomenon we refer to as endogenous confounding. To overcome this, we propose a new class of learning algorithms dubbed Anytime Tracking CUSUM (ATC). These are horizon-free online algorithms that implement a selective detection principle, balancing the need to ignore "small" (hard-to-detect) shifts, while reacting "quickly" to significant ones. We prove that the performance of a properly tuned ATC algorithm is nearly minimax-optimal; its regret is guaranteed to closely match a novel information-theoretic lower bound on the achievable performance of any learning algorithm in the multiple change point problem. Experiments on synthetic as well as real-world data validate the aforementioned theoretical findings.
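As background for the name, the sketch below shows the classical one-sided CUSUM detector for an upward mean shift. It is not the ATC algorithm from the paper; the function name `cusum_alarm`, the `drift` allowance, and the threshold are illustrative assumptions.

```python
def cusum_alarm(xs, mu0, drift, threshold):
    """Return the first index at which the one-sided CUSUM statistic
    exceeds `threshold`, or None if it never does."""
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate evidence of an upward shift; reset at zero so old
        # pre-change observations do not mask a later change.
        s = max(0.0, s + (x - mu0 - drift))
        if s > threshold:
            return t
    return None

# A large shift (0 -> 1) at index 20 is flagged a few steps later,
# while a small shift (0 -> 0.2) below the drift allowance is ignored.
large_shift = [0.0] * 20 + [1.0] * 20
small_shift = [0.0] * 20 + [0.2] * 20
alarm_large = cusum_alarm(large_shift, mu0=0.0, drift=0.25, threshold=3.0)
alarm_small = cusum_alarm(small_shift, mu0=0.0, drift=0.25, threshold=3.0)
```

The `drift` allowance is what lets this toy detector ignore shifts too small to matter, loosely echoing the "selective detection" idea in the abstract; how ATC actually tunes this trade-off is specified in the paper, not here.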

2602.08142 2026-06-04 cs.LG stat.ML

Variance-Gated Ensembles: An Epistemic-Aware Framework for Uncertainty Estimation

方差门控集成:一种面向认知不确定性的估计框架

H. Martin Gillis, Isaac Xu, Thomas Trappenberg

AI总结 提出方差门控集成(VGE)框架,通过从集成统计量计算信噪比门控注入认知敏感性,实现高效且可微的不确定性估计,在计算效率与性能上匹配或超越现有方法。

详情
Comments
Published in Transactions on Machine Learning Research (06/2026)
AI中文摘要

机器学习应用需要快速且可靠的逐样本不确定性估计。常见方法是使用贝叶斯或近似方法的预测分布,并将不确定性加性分解为偶然(即数据相关)和认知(即模型相关)分量。然而,加性分解最近受到质疑,有证据表明当使用有限集成采样和/或不匹配的预测分布时,该分解会失效。本文介绍方差门控集成(VGE),一种直观、可微的框架,通过从集成统计量计算的信噪比门控注入认知敏感性。VGE提供:(i)方差门控边际不确定性(VGMU)分数,将决策边际与集成预测方差耦合;(ii)方差门控归一化(VGN)层,通过每类可学习的集成成员概率归一化,将方差门控不确定性机制推广到训练。我们推导出闭合形式的向量-雅可比积,使得通过集成样本均值和方差进行端到端训练成为可能。VGE在保持计算效率的同时,匹配或超越最先进的信息论基线。因此,VGE为集成模型中的认知感知不确定性估计提供了一种实用且可扩展的方法。

英文摘要

Machine learning applications require fast and reliable per-sample uncertainty estimation. A common approach is to use predictive distributions from Bayesian or approximation methods and additively decompose uncertainty into aleatoric (i.e., data-related) and epistemic (i.e., model-related) components. However, additive decomposition has recently been questioned, with evidence that it breaks down when using finite-ensemble sampling and/or mismatched predictive distributions. This paper introduces Variance-Gated Ensembles (VGE), an intuitive, differentiable framework that injects epistemic sensitivity via a signal-to-noise gate computed from ensemble statistics. VGE provides: (i) a Variance-Gated Margin Uncertainty (VGMU) score that couples decision margins with ensemble predictive variance; and (ii) a Variance-Gated Normalization (VGN) layer that generalizes the variance-gated uncertainty mechanism to training via per-class, learnable normalization of ensemble member probabilities. We derive closed-form vector-Jacobian products enabling end-to-end training through ensemble sample mean and variance. VGE matches or exceeds state-of-the-art information-theoretic baselines while remaining computationally efficient. As a result, VGE provides a practical and scalable approach to epistemic-aware uncertainty estimation in ensemble models.

2602.06883 2026-06-04 cs.LG cs.CV stat.ML

Vision Transformer Finetuning Benefits from Non-Smooth Components

视觉变换器微调受益于非平滑组件

Ambroise Odonnat, Laetitia Chapel, Romain Tavenard, Ievgen Redko

AI总结 本文通过分析视觉变换器组件的可塑性(即输出对输入变化的敏感度),发现高可塑性(低平滑性)的注意力模块和前馈层在微调中表现更好,挑战了平滑性有利的传统观点。

详情
Comments
Accepted at ICML 2026
AI中文摘要

变换器架构的平滑性在泛化、训练稳定性和对抗鲁棒性方面已被广泛研究。然而,其在迁移学习中的作用仍知之甚少。本文分析了视觉变换器组件使其输出适应输入变化的能力,即它们的\emph{可塑性}。定义为平均变化率,它捕捉了对输入扰动的敏感性;特别地,高可塑性意味着低平滑性。我们的理论分析和大量实验——在大规模视觉变换器上进行超过1000次微调运行——表明,这一视角为选择在适应过程中优先考虑的组件提供了原则性指导。对从业者的关键启示是,注意力模块和前馈层的高可塑性始终导致更好的微调性能。我们的发现偏离了平滑性是可取的普遍假设,为变换器的功能特性提供了新的视角。代码可在 https://github.com/ambroiseodt/vit-plasticity 获取。

英文摘要

The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. However, its role in transfer learning remains poorly understood. In this paper, we analyze the ability of vision transformer components to adapt their outputs to changes in inputs, or, in other words, their \emph{plasticity}. Defined as an average rate of change, it captures the sensitivity to input perturbation; in particular, a high plasticity implies a low smoothness. Our theoretical analysis and extensive experiments -- over $1,000$ finetuning runs on large-scale vision transformers -- showcase that this perspective provides principled guidance in choosing the components to prioritize during adaptation. A key takeaway for practitioners is that the high plasticity of the attention modules and feedforward layers consistently leads to better finetuning performance. Our findings depart from the prevailing assumption that smoothness is desirable, offering a novel perspective on transformers' functional properties. The code is available at https://github.com/ambroiseodt/vit-plasticity.

2512.21917 2026-06-04 cs.LG cs.AI econ.EM stat.ML

Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model

半参数偏好优化:你的语言模型秘密地是一个单索引模型

Nathan Kallus

AI总结 本文提出半参数偏好优化方法,通过放宽偏好与潜在奖励之间的链接函数假设,在未知且无限制的链接函数下进行策略对齐,并证明策略类的可实现性诱导出半参数单索引二元选择模型,直接学习策略并给出链接无关的收敛保证。

详情
AI中文摘要

策略对齐到偏好数据通常假设观察到的偏好与潜在奖励之间存在已知的链接函数(例如,Bradley-Terry模型/逻辑链接)。这种链接的错误设定可能会使推断的奖励产生偏差,并使学习到的策略偏离对齐。我们研究了在未知且无限制的链接函数下的策略对齐。我们提出了一个$f$-散度约束的奖励最大化问题,并表明策略类中的可实现性诱导出一个半参数单索引二元选择模型,其中标量策略诱导的索引捕获了所有对示范的依赖,而剩余的偏好分布是无限制的。与计量经济学中要求识别此类模型的结构参数并进行估计不同,我们开发了直接学习策略的方法,其中奖励函数是隐式的,分析了与最优策略的误差,并允许不可识别和非参数的索引。我们证明了基于通用函数复杂度度量的链接无关收敛保证,并通过实验验证了方法和理论。代码可在 https://github.com/causalml/spo/ 获取。

英文摘要

Policy alignment to preference data typically assumes a known link function between observed preferences and latent rewards (e.g., Bradley-Terry model / logistic link). Misspecification of this link can bias inferred rewards and misalign learned policies. We study policy alignment under an unknown and unrestricted link function. We formulate an $f$-divergence-constrained reward maximization problem and show that realizability in a policy class induces a semiparametric single-index binary choice model, where a scalar policy-induced index captures all dependence on demonstrations and the remaining preference distribution is unrestricted. Rather than impose identifiability of structural parameters of such a model and estimate them, as in econometrics, we develop methods that directly learn policies, with the reward function implicit, analyzing error to the optimal policy and allowing for unidentifiable and nonparametric indices. We prove link-agnostic convergence guarantees in terms of generic function complexity measures and validate the methods and theory empirically. Code is available at https://github.com/causalml/spo/.

2601.07144 2026-06-04 stat.ML cs.LG math.ST stat.TH

Optimal Transport under Group Fairness Constraints

群体公平约束下的最优运输

Linus Bleistein, Mathieu Dagréou, Francisco Andrade, Thomas Boudou, Aurélien Bellet

AI总结 针对最优运输中的群体公平问题,提出通过修正Sinkhorn算法实现完美公平,并开发两种松弛策略(惩罚OT和双层优化学习成本)以平衡公平与匹配质量,给出理论保证和实证结果。

详情
Comments
Accepted at ICML 2026 (spotlight)
AI中文摘要

确保匹配算法中的公平性是分配稀缺资源和职位的关键挑战。聚焦于最优运输(OT),我们引入了一种新的群体公平概念,要求OT计划中任意两个给定群体的个体匹配概率满足预定义目标。我们首先提出一种修正的Sinkhorn算法来高效计算完全公平的运输计划。由于实际中精确公平会显著降低匹配质量,我们随后开发了两种松弛策略。第一种涉及求解一个惩罚OT问题,我们为其推导了新的有限样本复杂度保证。第二种策略利用双层优化学习一个诱导公平OT解的基础成本,并建立了匹配未见数据时公平性偏差的界。最后,我们展示了实证结果,说明了我们方法的性能以及公平性与运输成本之间的权衡。

英文摘要

Ensuring fairness in matching algorithms is a key challenge in allocating scarce resources and positions. Focusing on Optimal Transport (OT), we introduce a novel notion of group fairness requiring that the probability of matching two individuals from any two given groups in the OT plan satisfies a predefined target. We first propose a modified Sinkhorn algorithm to compute perfectly fair transport plans efficiently. Since exact fairness can significantly degrade matching quality in practice, we then develop two relaxation strategies. The first one involves solving a penalized OT problem, for which we derive novel finite-sample complexity guarantees. Our second strategy leverages bilevel optimization to learn a ground cost that induces a fair OT solution, and we establish a bound on the deviation of fairness when matching unseen data. Finally, we present empirical results illustrating the performance of our approaches and the trade-off between fairness and transport cost.
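For readers unfamiliar with the baseline, the sketch below implements the standard entropic-regularized Sinkhorn iteration in pure Python. It does not include the fairness constraints or the modification proposed in the paper; the function name `sinkhorn`, the regularization `eps`, and the toy marginals are assumptions chosen for illustration.

```python
from math import exp

def sinkhorn(cost, a, b, eps=0.5, iters=500):
    """Entropic OT plan between histograms a and b via Sinkhorn scaling."""
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

cost = [[0.0, 1.0], [1.0, 0.0]]
a, b = [0.7, 0.3], [0.4, 0.6]
plan = sinkhorn(cost, a, b)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

A group-fairness requirement of the kind described in the abstract would add constraints on sums of `plan` entries over pairs of groups; the plain iteration above enforces only the two marginal constraints.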

2601.21868 2026-06-04 stat.ML cs.LG

On Forgetting and Stability of Score-based Generative models

关于基于分数的生成模型的遗忘与稳定性

Stanislas Strasman, Gabriel Cardoso, Sylvain Le Corff, Vincent Lemaire, Antonio Ocello

AI总结 本文利用反向时间动力学马尔可夫链的遗忘和稳定性性质,在弱假设下通过Lyapunov漂移条件和Doeblin型小化条件,定量分析了基于分数的生成模型的采样误差,并证明了采样过程的定量稳定性。

详情
AI中文摘要

理解生成模型的稳定性和长期行为是现代机器学习中的一个基本问题。本文通过利用与反向时间动力学相关的马尔可夫链的稳定性和遗忘性质,为基于分数的生成模型的采样误差提供了定量界限。在弱假设下,我们提供了两个结构性质以确保反向过程的初始化和离散化误差的传播:一个Lyapunov漂移条件和一个Doeblin型小化条件。一个实际结果是采样过程的定量稳定性,因为反向扩散动力学沿采样轨迹诱导了一种收缩机制。我们的结果阐明了随机动力学在基于分数的模型中的作用,并为分析此类方法中的误差传播提供了一个原则性框架。

英文摘要

Understanding the stability and long-time behavior of generative models is a fundamental problem in modern machine learning. This paper provides quantitative bounds on the sampling error of score-based generative models by leveraging stability and forgetting properties of the Markov chain associated with the reverse-time dynamics. Under weak assumptions, we identify two structural properties that control the propagation of initialization and discretization errors of the backward process: a Lyapunov drift condition and a Doeblin-type minorization condition. A practical consequence is quantitative stability of the sampling procedure, as the reverse diffusion dynamics induces a contraction mechanism along the sampling trajectory. Our results clarify the role of stochastic dynamics in score-based models and provide a principled framework for analyzing propagation of errors in such approaches.

2601.18777 2026-06-04 cs.LG cs.AI cs.CL cs.IR stat.AP

PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation

PRECISE: 使用预测驱动的排名估计减少LLM评估的偏差

Abhishek Divekar, Anirban Majumder

AI总结 提出PRECISE框架,通过结合少量人工标注与LLM判断,利用预测驱动推断(PPI)方法,在低资源下可靠估计搜索、排序和RAG系统的指标,并校正LLM偏差。

详情
Comments
Accepted at AAAI 2026 - Innovative Applications of AI (IAAI-26)
AI中文摘要

评估搜索、排序和RAG系统的质量传统上需要大量人工相关性标注。近年来,一些已部署的系统探索使用大型语言模型(LLM)作为自动评判者,但其固有偏差阻碍了直接用于指标估计。我们提出了一个扩展预测驱动推断(PPI)的统计框架,将最少的人工标注与LLM判断相结合,以生成需要子实例标注的指标的可靠估计。我们的方法仅需少至100个人工标注查询和10,000个未标注示例,相比传统方法显著减少了标注需求。我们为基于LLM的查询改写应用中的相关性提升推断制定了所提出的框架(PRECISE),将PPI扩展到查询-文档级别的子实例标注。通过重新制定指标集成空间,我们将计算复杂度从O(2^|C|)降低到O(2^K),其中|C|表示语料库大小(百万量级)。在多个著名检索数据集上的详细实验表明,我们的方法降低了业务关键指标Precision@K的估计方差,同时在低资源设置下有效校正了LLM偏差。

英文摘要

Evaluating the quality of search, ranking and RAG systems traditionally requires a significant number of human relevance annotations. Recently, several deployed systems have explored the use of Large Language Models (LLMs) as automated judges for this task, but their inherent biases prevent direct use for metric estimation. We present a statistical framework extending Prediction-Powered Inference (PPI) that combines minimal human annotations with LLM judgments to produce reliable estimates of metrics that require sub-instance annotations. Our method requires as few as 100 human-annotated queries and 10,000 unlabeled examples, reducing annotation requirements significantly compared to traditional approaches. We formulate our proposed framework (PRECISE) for inference of relevance uplift for an LLM-based query reformulation application, extending PPI to sub-instance annotations at the query-document level. By reformulating the metric-integration space, we reduce the computational complexity from $O(2^{|C|})$ to $O(2^K)$, where $|C|$ represents the corpus size (on the order of millions). Detailed experiments across prominent retrieval datasets demonstrate that our method reduces the variance of estimates for the business-critical Precision@K metric, while effectively correcting for LLM bias in low-resource settings.
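The basic PPI mean estimator that the framework builds on can be sketched in a few lines: average the judge's scores on the large unlabeled set, then add a "rectifier" equal to the average human-minus-judge gap on the small labeled set. This is only the textbook mean estimator, not the PRECISE extension to sub-instance annotations; the function name `ppi_mean` and the toy labels are hypothetical.

```python
def ppi_mean(human_labels, judge_on_labeled, judge_on_unlabeled):
    """Prediction-powered estimate of a mean metric.

    human_labels / judge_on_labeled: paired scores on the small labeled set.
    judge_on_unlabeled: judge scores on the large unlabeled set.
    """
    judge_mean = sum(judge_on_unlabeled) / len(judge_on_unlabeled)
    # Rectifier: average amount by which the judge misses the human label.
    rectifier = sum(h - j for h, j in zip(human_labels, judge_on_labeled)) / len(human_labels)
    return judge_mean + rectifier

# Toy example: the judge is systematically too generous.
human = [1, 0, 0, 1, 0]
judge_labeled = [1, 1, 0, 1, 1]
judge_unlabeled = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
naive = sum(judge_unlabeled) / len(judge_unlabeled)   # judge-only estimate
corrected = ppi_mean(human, judge_labeled, judge_unlabeled)
```

In this toy case the judge-only estimate is 0.7 while the rectified estimate is about 0.3, showing how a handful of human labels can offset a consistent judge bias.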

2601.18175 2026-06-04 cs.AI cs.LG cs.SY eess.SY stat.ML

Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

成功条件化作为策略改进:模仿成功所解决的优化问题

Daniel Russo

AI总结 本文证明成功条件化(模仿成功轨迹)精确求解了一个信任区域优化问题,其χ²散度约束半径由数据自动确定,并揭示了相对策略改进、策略变化幅度和动作影响之间的等式关系。

详情
AI中文摘要

一种广泛使用的策略改进技术是成功条件化,即收集轨迹,识别那些实现期望结果的轨迹,并更新策略以模仿沿成功轨迹采取的动作。这一原则有许多名称——带SFT的拒绝采样、目标条件化RL、决策Transformer——但它解决了什么优化问题(如果有的话)一直不清楚。我们证明成功条件化精确求解了一个信任区域优化问题,在由数据自动确定半径的χ²散度约束下最大化策略改进。这产生了一个恒等式:相对策略改进、策略变化幅度以及我们称为动作影响(衡量动作选择中的随机变化如何影响成功率)的量在每个状态下都完全相等。因此,成功条件化表现为一个保守的改进算子。精确的成功条件化不会降低性能或引发危险的分布偏移,但当它失败时,它会以可观察的方式失败,即几乎不改变策略。我们将我们的理论应用于常见的回报阈值设定实践,表明这可以放大改进,但代价是可能与真实目标不一致。

英文摘要

A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transformers -- yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing policy improvement subject to a $χ^2$ divergence constraint whose radius is determined automatically by the data. This yields an identity: relative policy improvement, the magnitude of policy change, and a quantity we call action-influence -- measuring how random variation in action choices affects success rates -- are exactly equal at every state. Success conditioning thus emerges as a conservative improvement operator. Exact success conditioning cannot degrade performance or induce dangerous distribution shift, but when it fails, it does so observably, by hardly changing the policy at all. We apply our theory to the common practice of return thresholding, showing this can amplify improvement, but at the cost of potential misalignment with the true objective.
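The identity can be checked by hand in the simplest case. The sketch below assumes a single state (a bandit) with known per-action success probabilities, where success conditioning is Bayes' rule, $\pi^+(a) \propto \pi(a)\,p(a)$. A direct calculation then gives relative improvement equal to $\chi^2(\pi^+ \,\|\, \pi)$. The numbers and names are illustrative, and the general multi-state statement is the paper's, not this snippet's.

```python
def success_condition(policy, p_success):
    """Return the success-conditioned policy pi+(a) = pi(a) p(a) / J
    and the baseline success rate J = sum_a pi(a) p(a)."""
    j = sum(pi * p for pi, p in zip(policy, p_success))
    return [pi * p / j for pi, p in zip(policy, p_success)], j

def chi2(new, old):
    """Chi-squared divergence of `new` from `old`."""
    return sum((n - o) ** 2 / o for n, o in zip(new, old))

policy = [0.5, 0.3, 0.2]        # behavior policy over three actions
p_success = [0.2, 0.5, 0.9]     # success probability of each action
new_policy, j_old = success_condition(policy, p_success)
j_new = sum(pi * p for pi, p in zip(new_policy, p_success))
relative_improvement = (j_new - j_old) / j_old
divergence = chi2(new_policy, policy)
```

Both quantities equal $\mathrm{Var}_\pi(p)/J^2$ here, so the update can only move the policy as far as the variation in success rates across actions allows; if every action succeeds equally often, the policy does not change at all.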

2601.03569 2026-06-04 cs.LG stat.AP

Local Intrinsic Dimensionality of Ground Motion Data for Early Detection of Catastrophic Slope Failure

用于早期检测灾难性边坡破坏的地震动数据的局部内在维度

Yuansan Liu, James Bailey, Antoinette Tordesillas

AI总结 提出时空局部内在维度(st-LID)无监督框架,通过运动增强、贝叶斯空间融合和时间建模,提高滑坡监测中破坏区域的早期检测精度和提前时间。

详情
Comments
20 pages, 9 figures. ECML-PKDD 2026
AI中文摘要

局部内在维度(LID)在高维数据异常检测中显示出强大潜力,包括颗粒介质中滑坡破坏检测,其中早期准确识别破坏区域对于有效的地质灾害缓解至关重要。然而,由于表面位移数据中固有的空间相关性和时间动态,这项任务仍然具有挑战性。为了解决这一差距,我们提出了一种新颖的无监督框架,称为时空LID(st-LID),它将LID推广到滑坡监测网络中的稳健破坏检测。我们的方法引入了三个关键创新:(1)运动增强,将速度纳入LID计算以捕获瞬时变形率和短期时间动态;(2)贝叶斯空间融合,通过贝叶斯估计聚合空间邻域内的LID值,以嵌入空间相关性并考虑局部噪声;以及(3)时间建模(t-LID),一种新变体,表征位移数据的长期动态,提供位移行为的稳健时间表示。通过统一这些组件,st-LID识别出现有方法经常忽略的复杂多阶段破坏区域。大量实验表明,st-LID在检测精度和提前时间方面始终优于最先进的无监督基线,为滑坡早期预警系统和有针对性的风险干预提供了稳健基础,以增强社区韧性和准备策略。

英文摘要

Local Intrinsic Dimensionality (LID) has shown strong potential for anomaly detection in high-dimensional data, including landslide failure detection in granular media, where early and accurate identification of failure zones is crucial for effective geohazard mitigation. However, this task is still challenging due to the spatial correlations and temporal dynamics that are inherently present in surface displacement data. To address this gap, we propose a novel unsupervised framework called spatiotemporal LID (st-LID) that generalizes the LID for robust failure detection in landslide monitoring networks. Our approach introduces three key innovations: (1) Kinematic enhancement, incorporating velocity into the LID computation to capture instantaneous deformation rates and short-term temporal dynamics; (2) Bayesian spatial fusion, which aggregates LID values across spatial neighborhoods via Bayesian estimation, to embed spatial correlations and account for localized noise; and (3) Temporal modeling (t-LID), a new variant that characterizes long-term dynamics of displacement data, providing a robust temporal representation of displacement behavior. By unifying these components, st-LID identifies complex, multi-stage failure zones often overlooked by existing methods. Extensive experiments show that st-LID consistently outperforms state-of-the-art unsupervised baselines in detection precision and lead-time, providing a robust foundation for landslide early warning systems and targeted risk intervention to enhance community resilience and preparedness strategies.
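As a reference point for the base quantity, the sketch below implements the commonly used maximum-likelihood LID estimator from a point's sorted $k$-nearest-neighbor distances. It covers none of the three st-LID extensions (kinematic enhancement, Bayesian spatial fusion, temporal modeling); the function name `lid_mle` and the synthetic distances are assumptions for illustration.

```python
from math import log

def lid_mle(neighbor_distances):
    """Maximum-likelihood LID estimate from sorted, positive distances
    to the k nearest neighbors: -1 / mean(log(r_i / r_k))."""
    r_max = neighbor_distances[-1]
    mean_log_ratio = sum(log(r / r_max) for r in neighbor_distances) / len(neighbor_distances)
    return -1.0 / mean_log_ratio

# Idealized distances for data spread uniformly in d dimensions:
# the i-th neighbor sits at radius proportional to (i / k) ** (1 / d).
k, d = 100, 2
distances = [(i / k) ** (1.0 / d) for i in range(1, k + 1)]
estimate = lid_mle(distances)
```

For these idealized two-dimensional distances the estimate lands close to 2; on real displacement data the neighbor distances would come from a nearest-neighbor search over the monitored points.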

2505.18281 2026-06-04 stat.ME

Measuring the Impact of Missingness in Traffic Stop Data

衡量交通拦截数据中缺失性的影响

Saatvik Kher, Johanna Hardin

AI总结 本文利用斯坦福开放警务项目的数据,通过多种指标识别数据并非完全随机缺失,并开发量化与可视化缺失趋势的方法,通过敏感性分析扩展结果检验和平均处理效应锐界的工作,证明缺失种族变量的观测值假设会根本改变偏差计算。

详情
AI中文摘要

在本文中,我们探讨了通过斯坦福开放警务项目可获得的数据。这些数据包含近100个不同城市和高速公路巡逻队的数百万次交通拦截信息。利用多种指标,我们识别出数据并非完全随机缺失。此外,我们开发了量化和可视化不同变量在数据集中缺失趋势的方法。随后,我们进行敏感性分析,以扩展结果检验以及平均处理效应锐界的工作。我们证明,偏差计算会根据对未记录种族变量的观测值所做的假设而发生根本性变化。我们提出了将缺失性敏感性分析扩展到多种不同背景的方法。

英文摘要

In this article we explore the data available through the Stanford Open Policing Project. The data consist of information on millions of traffic stops across close to 100 different cities and highway patrols. Using a variety of metrics, we identify that the data are not missing completely at random. Furthermore, we develop ways of quantifying and visualizing missingness trends for different variables across the datasets. We follow up by performing a sensitivity analysis to extend work done on the outcome test as well as to extend work done on sharp bounds on the average treatment effect. We demonstrate that bias calculations can fundamentally shift depending on the assumptions made about the observations for which the race variable has not been recorded. We suggest ways that our missingness sensitivity analysis can be extended to myriad different contexts.

2512.20753 2026-06-04 stat.AP

Algorithmic Bias in Lending: Evidence from a Fintech Audit

借贷中的算法偏见:来自金融科技审计的证据

Madison Coots, Robert Bartlett, Julian Nyarko, Sharad Goel

AI总结 本研究利用美国金融科技平台约8万笔个人贷款数据,审计发现算法承保模型对男性和黑人借款人的风险校准偏差导致其获得相对优惠的定价,并指出纳入种族和性别可纠正这种偏差,揭示了不同公平概念之间的冲突。

详情
AI中文摘要

算法借贷改变了消费信贷格局,机器学习模型通常用于辅助承保决策。为遵守公平借贷法律,这些算法排除了受法律保护的特征,如种族和性别。然而,算法承保仍可能无意中偏袒某些群体,引发对借贷算法是否表现出歧视行为的担忧。利用美国一家大型金融科技平台的专有贷款级数据,我们对约8万笔个人贷款的借贷决策进行了审计。我们发现,向男性和黑人借款人发放的贷款产生的利润低于其他群体,这表明男性和黑人借款人受益于相对优惠的定价。我们将这些差异追溯到平台承保模型的校准偏差,该模型高估了女性的风险,低估了黑人的风险。然后,我们表明,通过在承保模型中纳入种族和性别,可以纠正这种校准偏差及相应的差异,这说明了不同公平概念之间的紧张关系。

英文摘要

Algorithmic lending has transformed the consumer credit landscape, with machine learning models commonly facilitating underwriting decisions. To comply with fair lending laws, these algorithms exclude legally protected characteristics, such as race and gender. Yet algorithmic underwriting can still inadvertently favor certain groups, prompting concerns about whether lending algorithms exhibit discriminatory behavior. Using proprietary loan-level data from a major U.S. fintech platform, we audit lending decisions across approximately 80,000 personal loans. We find that loans made to men and Black borrowers yielded lower profits than loans to other groups, suggesting that men and Black borrowers benefited from relatively favorable pricing. We trace these disparities to miscalibration in the platform's underwriting model, which overestimates risk for women and underestimates risk for Black borrowers. We then show that one could correct this miscalibration -- and the corresponding disparities -- by including race and gender in underwriting models, illustrating a tension between competing notions of fairness.

2510.05013 2026-06-04 stat.ML cs.LG

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

通过自我探索的机器人好奇心驱动行为与语言发展

Theodore Jerome Tinker, Kenji Doya, Jun Tani

AI总结 本研究通过好奇心驱动的机器人自我探索,结合Q学习实现主动推理,揭示了组合泛化、快速学习、先配对后组合以及异常处理导致的U型发展模式,为人类高效语言习得提供解释。

详情
Comments
27 pages, 22 pages of supplementary material
AI中文摘要

婴儿通过极少的经验就能泛化习得语言,而大型语言模型需要数十亿的训练标记。人类高效发展的基础是什么?我们通过实验研究了这一问题,其中机器人代理通过好奇心驱动的自我探索学习执行与祈使句(例如,推红色立方体)相关的动作。我们的方法使用Q学习摊销主动推理,实现内在动机的发展性学习。模拟揭示了与发展心理学观察相对应的关键发现。i) 随着组合元素规模的增加,泛化能力显著提高。ii) 好奇心驱动的探索能够加速学习。iii) 句子和动作的机械配对先于组合泛化。iv) 异常处理导致U型发展表现,这种模式类似于儿童语言学习中的表征重述。这些结果表明,好奇心驱动的主动推理解释了内在动机的感觉运动-语言学习如何支持人类和人工代理中的可扩展组合泛化和异常处理。

英文摘要

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

2411.15075 2026-06-04 stat.AP stat.ME

The Effects of Major League Baseball's Ban on Infield Shifts: A Quasi-Experimental Analysis

美国职业棒球大联盟内野防守布阵禁令的影响:一项准实验分析

Lee Kennedy-Shaffer

AI总结 利用面板数据和准实验方法(双重差分和合成控制法),分析2023年MLB内野防守布阵禁令对联盟整体和球员个体进攻表现的影响。

详情
Journal ref
Am.Stat. 80 (2026) 193-203
Comments
25 pages main text, 12 pages appendices, 8 figures (4 main, 4 appendix), 3 tables
AI中文摘要

从2020年到2023年,美国职业棒球大联盟(MLB)修改了影响球队组成、球员站位和比赛时间的规则。理解这些规则的效果对于联盟、球队、球员及其他相关方评估其影响、倡导进一步改变或撤销先前改变至关重要。面板数据和准实验方法为这些情境下的因果推断提供了有用工具。我通过分析2023年防守布阵禁令在联盟范围和球员个体层面的效果,展示了这一潜力。使用双重差分分析,我表明该政策使左打者的场内球安打率(BABIP)和上垒率(OBP)小幅提高了9个点(即0.009)。对于个体球员,合成控制分析识别出多名球员的进攻表现(上垒率、上垒加长打率、加权上垒率)因规则改变而显著提升(多个案例中超过70个点,即0.070),而其他先前防守布阵率高的球员则几乎没有受到影响。本文既估计了这一特定规则改变的影响,也展示了这些因果推断方法如何在更广泛的体育分析中(在球员、球队和联盟层面)具有潜在价值。

英文摘要

From 2020 to 2023, Major League Baseball changed rules affecting team composition, player positioning, and game time. Understanding the effects of these rules is crucial for leagues, teams, players, and other relevant parties to assess their impact and to advocate either for further changes or undoing previous ones. Panel data and quasi-experimental methods provide useful tools for causal inference in these settings. I demonstrate this potential by analyzing the effect of the 2023 shift ban at both the league-wide and player-specific levels. Using difference-in-differences analysis, I show that the policy increased batting average on balls in play and on-base percentage for left-handed batters by a modest amount (nine points). For individual players, synthetic control analyses identify several players whose offensive performance (on-base percentage, on-base plus slugging percentage, and weighted on-base average) improved substantially (over 70 points in several cases) because of the rule change, and other players with previously high shift rates for whom it had little effect. This article both estimates the impact of this specific rule change and demonstrates how these methods for causal inference are potentially valuable for sports analytics -- at the player, team, and league levels -- more broadly.
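The difference-in-differences idea mentioned in this abstract can be sketched in its simplest two-group, two-period form. This is only an illustration: the function names, the choice of groups, and every number below are made up for the example and are not taken from the paper, whose actual models are presumably richer. The helper also shows the baseball convention that one "point" is 0.001 of a rate statistic.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on group means.

    The control group's change stands in for what would have happened
    to the treated group without the policy (parallel-trends assumption).
    """
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


def to_points(delta):
    """Express a change in a rate statistic in baseball 'points' (1 point = 0.001)."""
    return round(delta * 1000)


# Hypothetical group means, NOT values from the paper.
effect = did_estimate(
    treated_pre=0.280, treated_post=0.295,   # group exposed to the rule change
    control_pre=0.300, control_post=0.305,   # comparison group
)
print(to_points(effect))
```

The estimate is only as credible as the parallel-trends assumption, which is why the choice of comparison group matters more than the arithmetic.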

2512.06553 2026-06-04 stat.AP cs.LG

A Latent Variable Framework for Scaling Laws in Large Language Models

大型语言模型中缩放定律的潜变量框架

Peiyao Cai, Chengyu Cui, Felipe Maia Polo, Seamus Somerstep, Leshem Choshen, Mikhail Yurochkin, Yuekai Sun, Kean Ming Tan, Gongjun Xu

AI总结 提出基于潜变量建模的统计框架,通过引入潜变量捕获不同模型家族和基准的异构性,以更准确地建模大型语言模型的缩放定律。

详情
AI中文摘要

我们提出了一个基于潜变量建模的统计框架,用于大型语言模型(LLMs)的缩放定律。我们的工作受到大量具有不同架构和训练策略的新LLM家族迅速涌现的推动,这些模型在越来越多的基准上进行评估。这种异构性使得单一的全局缩放曲线不足以捕捉不同家族和基准之间的性能变化。为了解决这个问题,我们提出了一个潜变量建模框架,其中每个LLM家族与一个潜变量相关联,该潜变量捕获该家族中常见的底层特征。然后,LLM在不同基准上的性能由其潜在技能驱动,这些技能由潜变量和模型自身的可观测特征共同决定。我们开发了该潜变量模型的估计程序,并建立了其统计性质。我们还设计了支持估计和各种下游任务的高效数值算法。在实验上,我们在Open LLM Leaderboard(v1/v2)的12个广泛使用的基准上评估了该方法。

英文摘要

We propose a statistical framework built on latent variable modeling for scaling laws of large language models (LLMs). Our work is motivated by the rapid emergence of numerous new LLM families with distinct architectures and training strategies, evaluated on an increasing number of benchmarks. This heterogeneity makes a single global scaling curve inadequate for capturing how performance varies across families and benchmarks. To address this, we propose a latent variable modeling framework in which each LLM family is associated with a latent variable that captures the common underlying features in that family. An LLM's performance on different benchmarks is then driven by its latent skills, which are jointly determined by the latent variable and the model's own observable features. We develop an estimation procedure for this latent variable model and establish its statistical properties. We also design efficient numerical algorithms that support estimation and various downstream tasks. Empirically, we evaluate the approach on 12 widely used benchmarks from the Open LLM Leaderboard (v1/v2).

2511.18731 2026-06-04 stat.ME

Can discrete-time analyses be trusted for stepped wedge trials with continuous recruitment?

连续招募的阶梯楔形试验能否信任离散时间分析?

Hao Wang, Guangyu Tong, Heather Allore, Kelsey L. Grantham, Monica Taljaard, Fan Li

AI总结 通过模拟研究,评估在连续招募设计的阶梯楔形整群随机试验中使用离散时间线性混合模型的偏差和鲁棒推断性能。

详情
AI中文摘要

在阶梯楔形整群随机试验(SW-CRTs)中,干预措施在多个时期内依次向整群推广。通常使用将时间视为离散的线性混合模型来分析SW-CRTs的数据。然而,最近的一项系统综述发现,95.1%的横截面SW-CRTs随时间连续招募个体。尽管这种连续招募设计非常普遍,但在分析此类SW-CRTs时如何获得模型鲁棒推断的指导有限。在本文中,我们通过模拟研究,探讨在连续招募设计且结局为连续变量的情况下,使用此类离散时间线性混合模型的影响。具体而言,在数据生成过程中,我们使用连续时间指数衰减相关结构来刻画连续招募,考虑存在或不存在固定连续周期效应,并处理有或无随机或暴露时间依赖的干预效应的情况。然后,我们在三种流行的离散时间工作相关结构(简单可交换、嵌套可交换和离散时间指数衰减)下分析模拟数据,并使用鲁棒三明治方差估计量。我们的结果表明,离散时间分析通常产生可忽略的偏差,并且使用Mancl和DeRouen校正的鲁棒方差估计量始终达到名义覆盖率和I类错误率。一个重要的例外是当招募模式在对照期和干预期之间系统性变化时,离散时间分析会导致略微有偏的估计。最后,我们通过重新分析一项已完成的SW-CRT来阐明这些发现。

英文摘要

In stepped wedge cluster randomized trials (SW-CRTs), interventions are sequentially rolled out to clusters over multiple periods. It is common practice to analyze data from SW-CRTs using linear mixed models that treat time as discrete. However, a recent systematic review found that 95.1% of cross-sectional SW-CRTs recruit individuals continuously over time. Despite the high prevalence of such continuous recruitment designs, there has been limited guidance on how to draw model-robust inference when analyzing such SW-CRTs. In this article, we investigate through simulations the implications of using such discrete-time linear mixed models in the case of continuous recruitment designs with a continuous outcome. Specifically, in the data-generating process, we characterize continuous recruitment using a continuous-time exponential decay correlation structure in the presence or absence of a fixed continuous period effect, addressing scenarios both with and without a random or exposure-time-dependent intervention effect. We then analyze the simulated data under three popular discrete-time working correlation structures: simple exchangeable, nested exchangeable, and discrete-time exponential decay, with a robust sandwich variance estimator. Our results demonstrate that discrete-time analysis often yields negligible bias and that the robust variance estimator with the Mancl and DeRouen correction consistently achieves nominal coverage and type I error rate. One important exception occurs when recruitment patterns vary systematically between control and intervention periods, where discrete-time analysis leads to slightly biased estimates. Finally, we illustrate these findings by reanalyzing a completed SW-CRT.

2511.06320 2026-06-04 stat.AP

Bayesian Predictive Probabilities for Online Experimentation

在线实验的贝叶斯预测概率

Abbas Zaidi, Rina Friedberg, Samir Khan, Yao-Yang Leow, Maulik Soneji, Houssam Nassif, Richard Mudd

AI总结 提出基于贝叶斯预测概率的在线实验中期分析方法,无需数值积分即可估计预测概率,并在Instagram数据上验证其降低错误率、保持实验保真度的实际效益。

详情
Journal ref
Presented at CODE@MIT 2025
Comments
5 pages, 1 figure
AI中文摘要

在线随机对照实验(A/B测试)在决策中的广泛采用造成了持续的能力限制,需要进行中期分析。因此,平台用户越来越倾向于通过偷看(peeking)来优化有限资源。然而,这种过程容易出错,且常常与实验结束时的结果不一致(例如,I型错误膨胀)。我们引入了一个基于贝叶斯预测概率的系统,使我们能够在不影响实验保真度的情况下进行中期分析;这一思想在技术领域之外的应用中已被广泛用于更有效地做出实验决策。受实验平台大规模部署的启发,我们展示了如何在不使用数值积分技术的情况下估计预测概率,并推荐了系统以大规模研究其属性作为持续健康检查,以及系统设计建议——所有这些都基于Instagram的实验数据——以证明其带来的实际效益。

英文摘要

The widespread adoption of online randomized controlled experiments (A/B tests) for decision-making has created ongoing capacity constraints that necessitate interim analyses. As a consequence, platform users are increasingly motivated to use ad hoc means of optimizing limited resources via peeking. Such processes, however, are error-prone and often misaligned with end-of-experiment outcomes (e.g., inflated type I error). We introduce a system based on Bayesian predictive probabilities that enables us to perform interim analyses without compromising the fidelity of the experiment; this idea has been widely used in applications outside the technology domain to make experimental decisions more efficiently. Motivated by at-scale deployment within an experimentation platform, we demonstrate how predictive probabilities can be estimated without numerical integration techniques, recommend systems for studying their properties at scale as an ongoing health check, and offer system design recommendations. All of this is based on experiment data from Instagram and demonstrates the practical benefits the approach enables.

2408.04607 2026-06-04 stat.ML cond-mat.dis-nn cs.LG

Risk and cross validation in ridge regression with correlated samples

带相关样本的岭回归中的风险与交叉验证

Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

AI总结 利用随机矩阵理论和自由概率,研究了数据点具有任意相关性时岭回归的渐近风险,并提出了修正的广义交叉验证估计器CorrGCV,同时扩展到测试点与训练集相关的情况。

详情
Journal ref
International Conference on Machine Learning (2025), https://proceedings.mlr.press/v267/atanasov25a.html
Comments
50 pages, 19 figures. v4: ICML 2025 camera-ready. v5: Fix typo in statement of Theorem 5. v6: typos corrected, to appear in 2026 JSTAT Machine Learning focus collection
AI中文摘要

近年来,我们对高维岭回归的理解取得了实质性进展,但现有理论假设训练样本是独立的。通过利用随机矩阵理论和自由概率的技术,我们为数据点具有任意相关性时岭回归的样本内和样本外风险提供了精确的渐近结果。我们证明,在这种情况下,广义交叉验证估计器(GCV)无法正确预测样本外风险。然而,当噪声残差与数据点具有相同相关性时,可以修改GCV以产生一个在高维极限下集中的高效可计算无偏估计器,我们称之为CorrGCV。我们进一步将渐近分析扩展到测试点与训练集具有非平凡相关性的情况,这是时间序列预测中经常遇到的情况。假设已知时间序列的相关结构,这再次产生了GCV估计器的扩展,并精确刻画了此类测试点对长期风险产生过于乐观预测的程度。我们在各种高维数据上验证了理论的预测。

英文摘要

Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging techniques from random matrix theory and free probability, we provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. However, in the case where the noise residuals have the same correlations as the data points, one can modify the GCV to yield an efficiently-computable unbiased estimator that concentrates in the high-dimensional limit, which we dub CorrGCV. We further extend our asymptotic analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting. Assuming knowledge of the correlation structure of the time series, this again yields an extension of the GCV estimator, and sharply characterizes the degree to which such test points yield an overly optimistic prediction of long-time risk. We validate the predictions of our theory across a variety of high dimensional data.

2511.03000 2026-06-04 stat.ML cs.IT cs.LG math.IT

Unifying Information-Theoretic and Pair-Counting Clustering Similarity

统一信息论与配对计数的聚类相似性

Alexander J. Gates

AI总结 本文通过加权展开和高阶扩展两个视角,统一了配对计数与信息论两类聚类相似性度量,揭示了它们之间的分析联系。

详情
Comments
23 pages, 2 figures
AI中文摘要

比较聚类结果对于评估无监督模型至关重要,然而现有的许多相似性度量可能产生广泛分歧、有时甚至矛盾的评估。聚类相似性度量通常分为两大族:配对计数和信息论,分别反映它们是通过元素对还是通过完整聚类列联表的聚合信息来量化一致性。先前的工作已发现这些族之间的相似性,并应用了经验归一化或机会校正方案,但它们更深层的分析联系仍仅部分被理解。在此,我们开发了一个分析框架,通过两个互补视角统一这些族。首先,两个族都表示为观察到的与期望的共现的加权展开,配对计数作为二次低阶近似出现,而信息论度量作为高阶频率加权扩展。其次,我们将配对计数推广到k元组一致性,并表明信息论度量可以被视为系统性地累积超出成对水平的高阶共分配结构。我们针对Rand指数和互信息从分析上说明了这些方法,并展示了每个族中的其他指数如何作为自然扩展出现。总之,这些观点阐明了两个体系何时以及为何产生分歧,将它们的敏感性直接与权重和近似阶数联系起来,并为跨应用选择、解释和扩展聚类相似性度量提供了原则性基础。

英文摘要

Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate information across full cluster contingency tables. Prior work has uncovered parallels between these families and applied empirical normalization or chance-correction schemes, but their deeper analytical connection remains only partially understood. Here, we develop an analytical framework that unifies these families through two complementary perspectives. First, both families are expressed as weighted expansions of observed versus expected co-occurrences, with pair-counting arising as a quadratic, low-order approximation and information-theoretic measures as higher-order, frequency-weighted extensions. Second, we generalize pair-counting to k-tuple agreement and show that information-theoretic measures can be viewed as systematically accumulating higher-order co-assignment structure beyond the pairwise level. We illustrate the approaches analytically for the Rand index and Mutual Information, and show how other indices in each family emerge as natural extensions. Together, these views clarify when and why the two regimes diverge, relating their sensitivities directly to weighting and approximation order, and provide a principled basis for selecting, interpreting, and extending clustering similarity measures across applications.
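To make the two families named in this abstract concrete, here is a minimal sketch that computes one representative of each, the Rand index (pair-counting) and mutual information (information-theoretic), from the same contingency table. It uses the standard textbook definitions only; it does not implement the paper's weighted expansions or k-tuple generalization, and the function names are chosen for this example.

```python
from collections import Counter
from math import comb, log


def rand_index(labels_a, labels_b):
    """Fraction of element pairs on which two clusterings agree.

    A pair agrees if it is co-clustered in both or separated in both.
    Requires at least two elements.
    """
    n = len(labels_a)
    table = Counter(zip(labels_a, labels_b))   # contingency table cells
    rows = Counter(labels_a)                   # cluster sizes in A
    cols = Counter(labels_b)                   # cluster sizes in B
    total_pairs = comb(n, 2)
    same_both = sum(comb(c, 2) for c in table.values())
    same_a = sum(comb(c, 2) for c in rows.values())
    same_b = sum(comb(c, 2) for c in cols.values())
    # Pairs separated in both = total - same_a - same_b + same_both.
    agreements = total_pairs + 2 * same_both - same_a - same_b
    return agreements / total_pairs


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two cluster assignments."""
    n = len(labels_a)
    table = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    return sum(
        (c / n) * log(c * n / (rows[a] * cols[b]))
        for (a, b), c in table.items()
    )


a = [0, 0, 1, 1]
b = [0, 1, 0, 1]   # statistically independent of a
print(rand_index(a, a), mutual_information(a, a))
print(rand_index(a, b), mutual_information(a, b))
```

Both functions read the same table, but the Rand index sees it only through pair counts (quadratic in the cell sizes), while mutual information weights each cell by a logarithm of observed versus expected frequency.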

2502.00470 2026-06-04 math.OC cs.LG stat.ML

On the Relationship Between CoCoA and ADMM for Distributed Empirical Risk Minimization

关于CoCoA与ADMM在分布式经验风险最小化中的关系

Runxiong Wu, Andi Wang

AI总结 本文从统一原始-对偶视角揭示CoCoA与ADMM两类分布式ERM算法的内在联系,证明岭正则化下CoCoA等价于特定近端ADMM方案,并给出ADMM型方法的统一收敛分析和早停准则。

详情
Journal ref
Published in Transactions on Machine Learning Research (06/2026)
Comments
21 pages, 4 figures, 1 table
AI中文摘要

分布式经验风险最小化(ERM)通常通过两类有影响力但看似独立的方法来研究:源自分布式对偶坐标上升的CoCoA型算法,以及源自共识和近端分裂的ADMM型算法。本文从统一的原始-对偶视角研究这两类算法的联系。我们证明共识ADMM、线性化共识ADMM、两种分布式近端ADMM变体以及岭正则化CoCoA都可以写成一种涉及全局原始变量和块对偶变量的通用更新形式。这种重新表述使几个先前隐藏的联系变得明确:对于岭正则化ERM,CoCoA在对偶更新层面上与特定的近端ADMM方案一致。此外,原始问题上的共识ADMM等价于对偶问题上的近端ADMM,并具有显式参数映射以及鞍点目标符号反转;线性化变体也存在类似的对应关系。这些结果表明,在岭正则化ERM问题下,经过精细调参的ADMM型算法至少与CoCoA性能相当。统一视角还为共识ADMM提供了自然的原始-对偶间隙早停准则,并为ADMM型方法提供了统一的$O(1/T)$遍历收敛分析。在合成回归问题和真实SVM数据集上的实验支持了预测的关系,阐明了调参的作用,并表明适当调参的ADMM变体在岭正则化设置下可以优于CoCoA。

英文摘要

Distributed empirical risk minimization (ERM) is often studied through two influential yet seemingly separate families of methods: CoCoA-type algorithms, derived from distributed dual coordinate ascent, and ADMM-type algorithms, derived from consensus and proximal splitting. In this paper, we investigate the connection between the two types of algorithms from a unified primal-dual perspective. We show that consensus ADMM, linearized consensus ADMM, two distributed proximal ADMM variants, and ridge-regularized CoCoA can all be written in a common update form involving a global primal variable and block dual variables. This reformulation makes several previously hidden connections explicit: for ridge-regularized ERM, CoCoA coincides with a particular proximal ADMM scheme at the level of the dual update. Moreover, consensus ADMM on the primal problem is equivalent to proximal ADMM on the dual problem under an explicit parameter mapping together with a sign reversal of the saddle objective; similar correspondences also hold for the linearized variants. These results indicate that ADMM-type algorithms, when finely tuned, perform at least as well as CoCoA on ridge-regularized ERM problems. The unified view also yields a natural primal-dual gap stopping criterion for consensus ADMM and a unified $O(1/T)$ ergodic convergence analysis for the ADMM-type methods. Experiments on synthetic regression problems and real SVM datasets support the predicted relationships, clarify the role of tuning parameters, and show that suitably tuned ADMM variants can outperform CoCoA in the ridge-regularized setting.

2509.23385 2026-06-04 stat.ML cs.LG

Flow Matching Calibration for Simulation-Based Inference under Model Misspecification

模型误设定下基于模拟推断的流匹配校准

Pierre-Louis Ruhlmann, Michael Arbel, Florence Forbes, Pedro L. C. Rodrigues

AI总结 针对基于模拟推断中模型误设定导致的偏差,提出流匹配校正后验估计方法,通过少量校准样本利用流匹配范式修正后验估计器,提高推断准确性和不确定性量化。

详情
AI中文摘要

基于模拟的推断(SBI)通过从模拟数据中估计复杂非线性模型的参数,正在变革实验科学。然而,一个持续的挑战是模型误设定。在贝叶斯设置中,针对后验分布,误差可能来自模拟器、噪声或先验建模。这些模型组件只是现实世界的近似,严重的不匹配可能导致有偏或过于自信的后验。我们通过引入流匹配校正后验估计(FMCPE)来解决这个问题,该框架利用流匹配范式,使用少量校准样本细化基于模拟训练的后验估计器。我们的方法分两个阶段进行:首先,在大量模拟数据上训练后验近似器;其次,流匹配将其预测向由校准观测支持的真实后验传输。我们依靠后者来指导校正,无需明确知道误设定形式或哪些模型组件受到影响。这种设计使FMCPE能够结合SBI的可扩展性和对分布偏移的鲁棒性。在合成基准和真实世界数据集上,我们表明我们的提议一致地减轻了误设定的影响,与标准SBI基线相比,提供了改进的推断准确性和不确定性量化,同时保持计算效率。

英文摘要

Simulation-based inference (SBI) is transforming experimental sciences by enabling parameter estimation in complex non-linear models from simulated data. A persistent challenge, however, is model misspecification. In a Bayesian setting, targeting posterior distributions, errors may arise from the simulator, the noise, or the prior modelling. These model components are only approximations of reality, and severe mismatches can yield biased or overconfident posteriors. We address this issue by introducing Flow Matching Corrected Posterior Estimation (FMCPE), a framework that leverages the flow matching paradigm to refine simulation-trained posterior estimators using a small set of calibration samples. Our approach proceeds in two stages: first, a posterior approximator is trained on abundant simulated data; second, flow matching transports its predictions toward the true posterior supported by calibration observations. We rely on the latter to guide the correction, without requiring explicit knowledge of the misspecification form or of which model components are affected. This design enables FMCPE to combine the scalability of SBI with robustness to distributional shift. Across synthetic benchmarks and real-world datasets, we show that our proposal consistently mitigates the effects of misspecification, delivering improved inference accuracy and uncertainty quantification compared to standard SBI baselines, while remaining computationally efficient.

2510.07559 2026-06-04 stat.CO math.PR math.ST stat.ME stat.TH

A coupling-based approach to f-divergences diagnostics for Markov chain Monte Carlo

基于耦合的马尔可夫链蒙特卡洛f-散度诊断方法

Adrien Corenflos, Hai-Dang Dau

AI总结 提出一种基于耦合的权重协调方案,通过计算一致的重要性权重来估计任意f-散度,为MCMC收敛提供通用诊断工具,并给出Radon-Nikodym导数的一致估计。

详情
Comments
15 pages + 23 pages of appendix comprising mostly proofs. 8 figures. Main differences w.r.t. v1 are: - the addition of a theorem on the almost sure convergence of the weights system to 1/N under minimal assumptions. - fixed numerical simulations
AI中文摘要

马尔可夫链蒙特卡洛收敛的理论分析(通常基于统计散度)与实际使用的诊断之间存在长期差距。我们首次引入基于任意$f$-散度的通用马尔可夫链蒙特卡洛收敛诊断方法,允许用户直接监控Kullback-Leibler散度、$\chi^2$散度、Hellinger距离和全变差距离等。我们的方法基于一种耦合的“权重协调”方案,为相互作用的马尔可夫链生成直接、可计算且一致的相对于其目标分布的重要性权重。除了用作收敛诊断外,这些权重是Radon-Nikodym导数$\mathrm{d}\pi/\mathrm{d}\mu_t$的一致估计,比收敛界限本身更丰富,可自然应用于重要性加权推断。我们展示了这种加权如何为任意$f$-散度提供上界,证明这些界随时间收紧并在链接近平稳时收敛到零,并证明尽管比现有的基于耦合的全变差估计更保守,我们的方法仍然是实用且广泛适用的诊断工具。

英文摘要

A long-standing gap exists between the theoretical analysis of Markov chain Monte Carlo convergence, which is often based on statistical divergences, and the diagnostics used in practice. We introduce the first general convergence diagnostics for Markov chain Monte Carlo based on any $f$-divergence, allowing users to directly monitor, among others, the Kullback-Leibler and the $χ^2$ divergences as well as the Hellinger and the total variation distances. Our approach rests on a coupling-based "weight harmonization" scheme that produces direct, computable, and consistent importance weights for interacting Markov chains with respect to their target distribution. Beyond their use as convergence diagnostics, these weights are consistent estimates of the Radon-Nikodym derivative $\mathrm{d}π/\mathrm{d} μ_t$, a richer object than the convergence bounds alone, with natural applications to importance-weighted inference. We show how such weightings can provide upper bounds to any $f$-divergence, prove that these bounds tighten over time and converge to zero as the chains approach stationarity, and demonstrate that, while more conservative than existing coupling-based total variation estimators, our method remains a practical and broadly applicable diagnostic tool.
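As a rough illustration of how importance weights relate to an $f$-divergence, the sketch below uses the standard identity $D_f(\pi\,\|\,\mu_t)=\mathbb{E}_{\mu_t}[f(\mathrm{d}\pi/\mathrm{d}\mu_t)]$ and plugs in $N W_i$ as the density-ratio estimate at the $i$-th of $N$ samples. This is a generic plug-in estimate under the assumption that suitable weights are already available; it is not the paper's weight-harmonization scheme and does not reproduce its upper-bound guarantees. All names are hypothetical.

```python
from math import log, sqrt

# Generator functions f with f(1) = 0, one per divergence.
F_GENERATORS = {
    "kl": lambda r: r * log(r) if r > 0 else 0.0,   # KL(pi || mu)
    "chi2": lambda r: (r - 1.0) ** 2,               # chi-squared divergence
    "tv": lambda r: 0.5 * abs(r - 1.0),             # total variation distance
    # Squared Hellinger distance, without the 1/2 factor some texts use.
    "hellinger2": lambda r: (sqrt(r) - 1.0) ** 2,
}


def f_divergence_from_weights(weights, name):
    """Plug-in estimate (1/N) * sum_i f(N * W_i) with weights normalized to sum to 1."""
    n = len(weights)
    total = sum(weights)
    f = F_GENERATORS[name]
    return sum(f(n * w / total) for w in weights) / n


uniform = [0.25, 0.25, 0.25, 0.25]    # weights equal to 1/N
skewed = [0.5, 0.5, 0.0, 0.0]         # half the samples carry all the mass
for name in F_GENERATORS:
    print(name,
          f_divergence_from_weights(uniform, name),
          f_divergence_from_weights(skewed, name))
```

Uniform weights give zero for every divergence, which matches the intuition that weights settling at 1/N indicate the chains have reached their target.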

2509.23935 2026-06-04 stat.ME

RAPSEM: Identifying Latent Mediators Without Sequential Ignorability via a Rank-Preserving Structural Equation Model

RAPSEM: 通过秩保持结构方程模型识别无序列可忽略性下的潜在中介变量

Sofia Morelli, Roberto Faleh, Holger Brandt

AI总结 提出秩保持结构方程模型(RAPSEM),利用G估计和两阶段矩方法,在弱假设下识别潜在中介变量,并通过模拟和实证研究验证其稳健性。

详情
Comments
31 pages, 8 figures, 8 tables, submitted to Psychometrika, Cambridge University Press
AI中文摘要

标准结构方程模型(SEM)常用于识别潜在中介变量。然而,有效的推断通常依赖于强且常被违反的序列可忽略性假设。我们引入了秩保持结构方程模型(RAPSEM),该模型通过G估计增强了稳健性,同时通过两阶段矩方法(2SMM)进行因子得分校正,保持了测量模型的完整性。RAPSEM将无未测量的中介-结果混杂假设替换为较弱的无未观察到的效应修饰假设。通过利用治疗随机化,RAPSEM以等价于通过结构涌现工具进行工具变量估计的方式实现识别。具体而言,识别依赖于影响中介变量但对结果无直接效应的治疗-协变量交互作用,使研究者能够利用治疗反应中的自然异质性作为可检验的识别来源。我们为核心识别假设提供了稳健性评估,并建立了所得估计量的一致性和渐近正态性。模拟研究表明,在存在未观测混杂的情况下,RAPSEM保持无偏,而标准SEM产生有偏结果。根据结构工具的强度,RAPSEM在样本量超过500时达到合理的统计功效。该方法在配套的rapsem R包中实现,并通过教育研究中的一个实证例子展示了其实用性。代码见 https://github.com/PsychometricsMZ/RAPSEM。

英文摘要

Standard structural equation models (SEMs) are often used to identify latent mediators. However, valid inference typically relies on the strong, frequently violated Sequential Ignorability assumption. We introduce the Rank-Preserving Structural Equation Model (RAPSEM), which increases robustness through G-estimation while maintaining the measurement model's integrity through a two-stage method of moments (2SMM) for factor score corrections. RAPSEM replaces the assumption of no unmeasured mediator-outcome confounding with the weaker assumption of no unobserved effect modification. By leveraging treatment randomization, RAPSEM achieves identification in a manner equivalent to instrumental variable estimation through structurally emerging instruments. Specifically, identification relies on treatment-covariate interactions that influence the mediator but have no direct effect on the outcome, allowing researchers to utilize natural heterogeneity in treatment response as a testable source of identification. We provide a robustness assessment for the core identifying assumption and establish the consistency and asymptotic normality of the resulting estimator. Simulation studies demonstrate that RAPSEM remains unbiased under unobserved confounding, whereas standard SEM yields biased results. RAPSEM achieves reasonable power for sample sizes above 500, depending on the strength of the structural instruments. The method is implemented in the accompanying rapsem R package, and its practical utility is illustrated through an empirical example from educational research. The code is available at https://github.com/PsychometricsMZ/RAPSEM.

2509.19500 2026-06-04 stat.AP

One Person, How Many Votes? Demographic Distortions in United States Elections

一人几票?美国选举中的人口统计扭曲

Lee Kennedy-Shaffer

AI总结 本文利用人口普查数据提出量化美国选举制度中基于地理单元(国会选区和州)对人口统计群体造成的代表性扭曲的指标,并可视化2000-2020年的数据,发现白种人、农村居民和自有住房家庭在参议院和选举人团中过度代表,而黑人和西班牙裔、城市居民和租房家庭则代表不足。

详情
Comments
28 pages, 3 figures, 1 table
AI中文摘要

美国的代议制民主依赖于选举系统,该系统将选票转化为三个关键机构的代表:联邦立法机构的两院(众议院和参议院)以及选举总统和副总统的选举人团。这一过程通过基于地理单元(国会选区和州)的重新加权进行,可能引入显著的扭曲。在本文中,我提出了可应用于人口统计群体的这种扭曲的量化指标,利用人口普查数据来评估和可视化这些扭曲效应。这些指标包括在这些系统下选票的绝对权重以及通过扭曲在机构中代表的超额人口。可视化2000年至2020年的这些指标显示,关键人口类别中存在持续的不平等分配。白种人(非西班牙裔)居民、农村居民和自有住房家庭在参议院和选举人团中代表过度;黑人和西班牙裔人、城市居民和租房家庭代表不足。对于城市居民,这种代表不足相当于参议院中少了2500万居民,选举人团中少了近500万居民。我讨论了这些扭曲效应及其与选举系统其他特征相互作用对进一步研究的意义。

英文摘要

Representative democracy in the United States relies on election systems that transmit votes into representatives in three key bodies: the two chambers of the federal legislature (House of Representatives and Senate) and the Electoral College, which selects the President and Vice-President. This happens through a process of re-weighting based on geographic units (congressional districts and states) that can introduce substantial distortion. In this paper, I propose quantitative measures of this distortion that can be applied to demographic groups, using Census data, to assess and visualize these distortive effects. These include the absolute weight of votes under these systems and the excess population represented in the bodies through the distortions. Visualizing these metrics from 2000 to 2020 shows persistent malapportionment in key demographic categories. White (non-Hispanic) residents, residents of rural areas, and owner-occupied households are overrepresented in the Senate and Electoral College; Black and Hispanic people, urban dwellers, and renter-occupied households are underrepresented. For urban residents, this underrepresentation is the equivalent of 25 million fewer residents in the Senate and nearly 5 million in the Electoral College. I discuss implications for further research on the effects of these distortions and their interactions with other features of the electoral system.

2509.08846 2026-06-04 cs.LG cs.AI stat.ML

Uncertainty Estimation using Variance-Gated Distributions

使用方差门控分布的不确定性估计

H. Martin Gillis, Isaac Xu, Thomas Trappenberg

AI总结 提出基于类概率分布信噪比的方差门控不确定性估计框架,通过集成置信因子缩放预测,解决神经网络预测不确定性分解中的加性分解问题。

详情
Comments
NeurIPS Workshop: Mathematical Foundations and Operational Integration of Machine Learning for Uncertainty-Aware Decision-Making
AI中文摘要

评估神经网络每个样本的不确定性量化对于涉及高风险应用的决策至关重要。一种常见的方法是使用贝叶斯或近似模型的预测分布,并将相应的预测不确定性分解为认知(模型相关)和偶然(数据相关)成分。然而,加性分解最近受到质疑。在这项工作中,我们提出了一个基于不同模型预测中类概率分布信噪比的不确定性估计和分解的直观框架。我们引入了一种方差门控度量,该度量通过从集成中导出的置信因子来缩放预测。我们使用这个度量来讨论委员会机器多样性崩溃的存在性。

英文摘要

Evaluation of per-sample uncertainty quantification from neural networks is essential for decision-making involving high-risk applications. A common approach is to use the predictive distribution from Bayesian or approximation models and decompose the corresponding predictive uncertainty into epistemic (model-related) and aleatoric (data-related) components. However, additive decomposition has recently been questioned. In this work, we propose an intuitive framework for uncertainty estimation and decomposition based on the signal-to-noise ratio of class probability distributions across different model predictions. We introduce a variance-gated measure that scales predictions by a confidence factor derived from ensembles. We use this measure to discuss the existence of a collapse in the diversity of committee machines.

2508.05866 2026-06-04 stat.ME

Identifiability and Inference for Generalized Latent Factor Models

广义潜在因子模型的可识别性与推断

Chengyu Cui, Gongjun Xu

AI总结 针对广义潜在因子模型,在常用可识别性条件下建立了最大似然估计的统计推断理论,并通过数值模拟和人格评估数据集验证。

详情
Comments
36 pages, 4 figures
AI中文摘要

广义潜在因子分析不仅在统计学和机器学习中提供了一种有用的潜在嵌入方法,而且在心理测量学、计量经济学和社会科学等各个科学领域中被广泛使用。确保潜在因子和载荷矩阵的可识别性对于模型的可估计性和可解释性至关重要,实践者已经采用了各种可识别性条件。然而,在常用可识别性条件下,潜在因子和因子载荷的基本统计推断问题在很大程度上仍未得到解决,特别是对于相关因子和/或非正交载荷矩阵。在这项工作中,我们专注于广义因子模型的最大似然估计,并在广泛使用的可识别性条件下建立统计推断性质。通过数值模拟和对人格评估数据集的应用进一步说明了所发展的理论。

英文摘要

Generalized latent factor analysis not only provides a useful latent embedding approach in statistics and machine learning, but also serves as a widely used tool across various scientific fields, such as psychometrics, econometrics, and social sciences. Ensuring the identifiability of latent factors and the loading matrix is essential for the model's estimability and interpretability, and various identifiability conditions have been employed by practitioners. However, fundamental statistical inference issues for latent factors and factor loadings under commonly used identifiability conditions remain largely unaddressed, especially for correlated factors and/or non-orthogonal loading matrices. In this work, we focus on maximum likelihood estimation for generalized factor models and establish statistical inference properties under widely used identifiability conditions. The developed theory is further illustrated through numerical simulations and an application to a personality assessment dataset.

1709.00310 2026-06-04 eess.SY cs.SY stat.AP

Coherent Track Before Detect: Detection via simultaneous trajectory estimation and long time integration

相干跟踪前检测:通过同时估计轨迹和长时间积分进行检测

Kimin Kim, Murat Uney, Bernard Mulgrew

AI总结 本文研究了雷达在低信噪比下检测机动小目标的问题,提出了一种相干跟踪前检测方法,通过同时估计目标轨迹和反射系数,实现长时间积分以提高检测性能。

详情
Comments
This article is based on Chapter 3 in "Reliable detection and characterisation of dim targets via track-before-detect", a PhD Thesis by Kimin Kim, The University of Edinburgh (https://era.ed.ac.uk/bitstream/handle/1842/38091/Kim2021.pdf?sequence=1&isAllowed=y)
AI中文摘要

在本文中,我们考虑了雷达对机动小目标的检测问题。此类目标在接收到的信号中会引起较低的信噪比(SNR)反射。我们考虑了共址和分离的发射/接收对,即单静态和双静态配置,以及包含这两种类型的多静态设置。我们提出了相干跟踪前检测:一种检测方法,能够在所有这些配置中相干地整合这些反射,并在连续的CPI(相干处理间隔)中继续整合任意长的时间。我们估计反射系数的复值以进行整合,同时估计目标轨迹。这些计算还结合了估计分离发射机的未知时间参考偏移,这对于相干处理是必要的。检测通过将所得整合值用于纳伊曼-皮尔逊检验,与常数假警率阈值进行比较来实现。我们在一个模拟示例中展示了该方法的有效性,该示例中一个信噪比极低的目标无法用传统技术检测。

英文摘要

In this work, we consider the detection of manoeuvring small objects with radars. Such objects induce low signal to noise ratio (SNR) reflections in the received signal. We consider both co-located and separated transmitter/receiver pairs, i.e., mono-static and bi-static configurations, respectively, as well as multi-static settings involving both types. We propose coherent track before detect: A detection approach which is capable of coherently integrating these reflections within a coherent processing interval (CPI) in all these configurations and continuing integration for an arbitrarily long time across consecutive CPIs. We estimate the complex value of the reflection coefficients for integration while simultaneously estimating the object trajectory. Compounded with these computations is the estimation of the unknown time reference shift of the separated transmitters necessary for coherent processing. Detection is made by using the resulting integration value in a Neyman-Pearson test against a constant false alarm rate threshold. We demonstrate the efficacy of our approach in a simulation example with a very low SNR object which cannot be detected with conventional techniques.

2505.15354 2026-06-04 cs.LG stat.ML

Post-Training Corrections for Improved Time-Series Forecasting

用于改进时间序列预测的后训练校正

Hamza Cherkaoui, Malik Tiomoko, Giuseppe Paolo, Zhang Yili, Yu Meng, Zhang Keli, Hafiz Tiomoko Ali

AI总结 提出一种无需重训练或修改架构的轻量级后训练自适应优化框架,通过强化学习、上下文赌博机或遗传算法自动学习表达性变换来校正模型输出,并支持人类专家通过自然语言引导校正,从而在多个基准上以最小计算开销持续提升预测精度。

详情
AI中文摘要

时间序列预测模型即使在能源、金融和医疗等关键领域也经常产生系统性的、可预测的错误。我们引入了一种新颖的后训练自适应优化框架,无需重训练或架构更改即可提高预测准确性。我们的方法自动应用通过强化学习、上下文赌博机或遗传算法优化的表达性变换,以轻量级和模型无关的方式校正模型输出。理论上,我们证明了仿射校正总能降低均方误差;实际上,我们通过基于动态动作的优化扩展了这一思想。该框架还支持可选的人回路组件:领域专家可以使用自然语言指导校正,自然语言由语言模型解析为动作。在多个基准(例如电力、天气、交通)上,我们观察到以最小的计算开销持续提高准确性。我们的交互式演示展示了该框架的实时可用性。通过将自动事后改进与可解释和可扩展的机制相结合,我们的方法为实际预测系统提供了强大的新方向。

英文摘要

Time-series forecasting is a critical task in various business domains, but it remains inherently challenging. Typically, large forecasting models are trained in a single, resource-intensive run. Once training is completed, a natural question arises: is there still potential for meaningful improvement in the model's performance? Motivated by techniques from boosting, we introduce the concept of post-training corrections. This approach enhances a trained forecaster by sequentially applying a carefully selected set of corrections to its predictions. Our method offers a lightweight, model-agnostic, and scalable strategy to improve forecasting performance in practical settings. We provide theoretical foundations for the approach, starting with the affine correction case, and analyze the expected performance gains and computational costs in more general settings. Across a range of benchmark datasets, our method consistently delivers up to a $30\%$ improvement in forecasting accuracy over existing state-of-the-art models, with minimal computational overhead.
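The affine correction case mentioned in the abstract can be illustrated with a minimal sketch. It assumes the correction is a scalar map `a * forecast + b` fitted by ordinary least squares on a held-out set; the function names and the toy numbers are hypothetical, and the paper's actual correction family and selection procedure are not reproduced here.

```python
def fit_affine_correction(forecasts, targets):
    """Least-squares fit of targets ~ a * forecasts + b (hypothetical helper)."""
    n = len(forecasts)
    if n != len(targets) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_f = sum(forecasts) / n
    mean_y = sum(targets) / n
    var_f = sum((f - mean_f) ** 2 for f in forecasts)
    if var_f == 0.0:
        # Constant forecasts: only the shift can be estimated.
        return 1.0, mean_y - mean_f
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(forecasts, targets))
    a = cov / var_f
    b = mean_y - a * mean_f
    return a, b


def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


# Toy held-out data (made up): the raw forecaster is biased low.
raw = [1.0, 2.0, 3.0, 4.0]
truth = [2.1, 4.0, 6.2, 7.9]
a, b = fit_affine_correction(raw, truth)
corrected = [a * f + b for f in raw]
```

On the data used for fitting, the corrected error cannot exceed the raw error, because the identity map (`a = 1`, `b = 0`) is itself an affine correction. Whether the gain carries over to new data depends on how stable the forecaster's bias is.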

2504.13291 2026-06-04 stat.ME stat.CO

Estimating equations for causal survival analysis with pooled logistic regression

基于合并逻辑回归的因果生存分析的估计方程

Paul N Zivich, Stephen R Cole, Bonnie E Shook-Sa, Justin B DeMonte, Jessie K Edwards

AI总结 提出使用估计方程重新表述合并逻辑回归模型,结合经验三明治方差估计器进行推断,以降低计算负担并避免非参数自举法。

详情
AI中文摘要

背景:合并逻辑回归模型常用于生存分析。然而,标准实现可能计算量很大,当使用非参数自举法进行推断时,这一问题更加严重。为减轻计算负担,研究者通常对时间间隔进行粗化或假设时间参数模型。这些方法施加了限制性假设,可能缺乏充分的实质性理由。方法:本文使用估计方程重新表述合并逻辑回归模型,以简化计算并允许通过经验三明治方差估计器进行推断,从而避免计算量更大的自举法。通过两个公开数据示例展示了所提出的实现。使用蒙特卡洛模拟研究说明了经验三明治方差估计器的性能。结果:如应用示例所示,所提出的实现大幅减少了运行时间,且无需粗化数据即可应用。在模拟研究中,经验三明治方差估计器得到了名义置信区间覆盖率。结论:本文提出的实现为合并逻辑回归的标准实现提供了一种改进的替代方案,无需对时间施加限制性约束。

英文摘要

Background: Pooled logistic regression models are commonly applied in survival analysis. However, the standard implementation can be computationally demanding, which is further exacerbated when using the nonparametric bootstrap for inference. To ease these computational burdens, investigators often coarsen time intervals or assume a parametric model for time. These approaches impose restrictive assumptions, which may not always have a well-motivated substantive justification. Methods: Here, the pooled logistic regression model is re-framed using estimating equations to simplify computations and allow for inference via the empirical sandwich variance estimator, thus avoiding the more computationally demanding bootstrap. The proposed implementation is demonstrated using two examples with publicly available data. The performance of the empirical sandwich variance estimator is illustrated using a Monte Carlo simulation study. Results: As shown in the applied examples, the proposed implementation substantially reduced run-times and could be applied without needing to coarsen the data. In the simulation study, the empirical sandwich variance estimator results in nominal confidence interval coverage. Conclusions: The implementation proposed here offers an improved alternative to the standard implementation of pooled logistic regression without needing to impose restrictive constraints on time.
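For context on why the standard implementation is costly, the sketch below shows the person-period (long) data layout on which pooled logistic regression is commonly fitted. It assumes integer-valued follow-up intervals. The helper names are hypothetical, and this is the conventional stacked layout, not the estimating-equation reformulation the abstract proposes.

```python
def to_person_period(records):
    """Expand (subject_id, last_interval, had_event) into one row per interval at risk."""
    rows = []
    for subject_id, last_interval, had_event in records:
        for interval in range(1, last_interval + 1):
            # The event indicator is 1 only in the final interval of a subject who had the event.
            event_now = 1 if (had_event and interval == last_interval) else 0
            rows.append((subject_id, interval, event_now))
    return rows


def discrete_hazards(rows):
    """Empirical hazard per interval: events divided by subjects at risk."""
    at_risk, events = {}, {}
    for _, interval, event_now in rows:
        at_risk[interval] = at_risk.get(interval, 0) + 1
        events[interval] = events.get(interval, 0) + event_now
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


# Toy cohort (made up): subject "b" is censored after interval 2.
cohort = [("a", 3, True), ("b", 2, False), ("c", 1, True)]
long_rows = to_person_period(cohort)
hazards = discrete_hazards(long_rows)
```

The stacked dataset grows with the number of subjects times the number of intervals, which is the source of the computational burden when time is finely resolved. A pooled logistic model with one indicator per interval and no covariates reproduces the empirical hazards computed here.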

2503.18721 2026-06-04 math.ST cs.CR cs.LG stat.ME stat.ML stat.TH

Differentially Private Joint Independence Test

差分隐私联合独立性检验

Xingwei Liu, Yuexin Chen, Jin-Ting Zhang, Wangli Xu

AI总结 针对隐私约束下的多随机向量联合依赖检测问题,提出基于差分隐私置换的dHSIC检验方法,实现有效水平、点态一致性和极小极大最优功效。

详情
Comments
57 pages, 7 figures
AI中文摘要

多个随机向量之间的联合依赖识别在许多统计应用中扮演重要角色,其中数据可能包含敏感或机密信息。本文在差分隐私背景下考虑$d$变量希尔伯特-施密特独立性准则(dHSIC)。鉴于dHSIC经验估计的极限分布是复杂的高斯混沌,非隐私场景下的检验通常基于置换和自助法。为了在隐私约束下检测联合依赖,我们提出了一种采用差分隐私置换方法的基于dHSIC的检验程序。我们证明该方法具有隐私保证、有效水平和点态一致性,而自助法存在功效不一致的问题。我们进一步研究了所提检验在dHSIC和$L_2$度量下的均匀功效,表明该检验在不同隐私机制下达到极小极大最优功效。作为副产品,我们证明了Pfister等人(2018)提出的非隐私置换dHSIC检验是我们差分隐私置换检验的特例,并且我们的结果也建立了其点态和均匀功效——从而解决了该工作中的开放问题。因果推断中的数值模拟和真实数据分析表明,我们提出的检验在实证中表现良好。

英文摘要

Identification of joint dependence among several random vectors plays an important role in many statistical applications, where the data may contain sensitive or confidential information. In this paper, we consider the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) in the context of differential privacy. Given that the limiting distribution of the empirical estimate of dHSIC is a complicated Gaussian chaos, constructing tests in the non-private regime is typically based on permutation and bootstrap methods. To detect joint dependence under privacy constraints, we propose a dHSIC-based testing procedure employing a differentially private permutation methodology. We show that our method enjoys privacy guarantees, a valid level, and pointwise consistency, whereas the bootstrap counterpart suffers from inconsistent power. We further investigate the uniform power of the proposed test under the dHSIC and $L_2$ metrics, showing that the proposed test attains the minimax optimal power across different privacy regimes. As a byproduct, we show that the non-private permutation dHSIC test proposed in Pfister et al. (2018) is a special case of our differentially private permutation test, and our results also establish its pointwise and uniform power--thus resolving an open problem from that work. Both numerical simulations and real data analysis in causal inference suggest that our proposed test performs well empirically.

2503.21358 2026-06-04 stat.ME math.PR

Inference in stochastic differential equations using the Laplace approximation: Demonstration and examples

使用拉普拉斯逼近的随机微分方程推断:演示与实例

Uffe Høgsbro Thygesen, Kasper Kristensen

AI总结 本文展示如何正确使用拉普拉斯逼近来估计随机微分方程中的状态和参数,特别关注非线性动力学、状态依赖噪声和非高斯测量误差,并通过模拟案例验证其计算可行性和灵活性。

详情
Comments
40 pages, 7 figures, 1 table
AI中文摘要

随机微分方程是生态学中动态系统和时间序列的自然框架,因为它们允许非线性第一性原理知识和动力学中的不确定性,并且可以与测量误差结合。然而,估计方法通常在技术和计算上具有挑战性。在这里,我们证明当正确使用时,拉普拉斯逼近对于估计这些模型中的状态和参数是有用的。我们特别关注非线性动力学、状态依赖噪声强度和非高斯测量误差。我们的技术在观测时间之间添加状态,使用离散化方法(最简单的情况下是Euler-Maruyama方法)逼近转移密度,并使用拉普拉斯逼近消除未观测状态。我们证明一致性要求特定形式的逼近,并提供不同的实现方法。通过模拟案例研究,我们证明转移概率得到良好逼近,推断在计算上可行,并且该框架导致简单且灵活的实现。

英文摘要

Stochastic differential equations are a natural framework for dynamic systems and time series in ecology, because they allow for non-linear first-principle knowledge and uncertainty in the dynamics, and can be combined with measurement errors. However, estimation methods are often technically and computationally challenging. Here, we demonstrate that the Laplace approximation is useful for estimating states and parameters in these models, when done correctly. We give special attention to non-linear dynamics, state-dependent noise intensities, and non-Gaussian measurement errors. Our technique adds states between times of observations, approximates transition densities using discretization methods - in the simplest case, the Euler-Maruyama method - and eliminates unobserved states using the Laplace approximation. We demonstrate that consistency requires a particular form of the approximation, and provide different approaches to implementation. Using simulated case studies, we demonstrate that transition probabilities are well approximated, that inference is computationally feasible, and that the framework leads to simple and flexible implementations.
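The Euler-Maruyama discretization named in the abstract can be sketched as follows for a scalar SDE $dX = f(X)\,dt + g(X)\,dW$. The function names and the logistic-growth example are assumptions for illustration; the Laplace approximation step that eliminates unobserved states is not implemented here.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW on [0, t_end]."""
    dt = t_end / n_steps
    x = x0
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


def em_log_transition_density(x_next, x, drift, diffusion, dt):
    """Gaussian log-density of x_next given x under the Euler-Maruyama step."""
    mean = x + drift(x) * dt
    var = diffusion(x) ** 2 * dt
    return -0.5 * math.log(2.0 * math.pi * var) - (x_next - mean) ** 2 / (2.0 * var)


# Hypothetical example: logistic growth with state-dependent noise intensity.
rng = random.Random(0)
path = euler_maruyama(
    drift=lambda x: 1.0 * x * (1.0 - x / 10.0),
    diffusion=lambda x: 0.2 * x,
    x0=1.0, t_end=5.0, n_steps=500, rng=rng,
)
```

Inserting extra time points between observations, as the abstract describes, corresponds to choosing `n_steps` larger than the number of observations, so that each Gaussian one-step transition is a better approximation to the true transition density.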

1802.10003 2026-06-04 econ.GN math.OC q-fin.EC stat.AP

Stock management (Gestão de estoques)

库存管理(库存管理)

Cainan K. de Oliveira, Henrique G. Menck, Pedro Y. Takito, Eliandro Rodrigues Cirilo, Neyva Maria Lopes Romeiro, Érica R. Takano Natti, Paulo Laerte Natti

AI总结 本文提出数学和统计方法用于库存管理,通过ABC曲线分析确定优先级物品,利用EOQ模型和(Q,R)模型最小化库存成本。

详情
Journal ref
In: Applied Production Engineering 2. Chapter4. Ponta Grossa: Atena, 2022, v. 2, p. 46-60
Comments
In Portuguese, 17 pages, 12 figures, 7 tables. Conference SEMAT2017
AI中文摘要

在生产中需要大量储备原材料,但储存材料会带来成本。库存无序会导致最终产品成本非常高,并在生产链中产生其他问题。本文提出了适用于库存管理的数学和统计方法。使用ABC曲线分析来确定优先级物品,即最昂贵和周转率最高的物品,从而通过库存控制模型确定采购批量和周期,以最小化这些材料的总存储成本。利用经济订货量(EOQ)模型和(Q,R)模型,对公司库存成本进行了最小化。对模型结果进行了比较。

英文摘要

There is a great need to stock materials for production, but storing materials comes at a cost. Lack of organization in the inventory can result in a very high cost for the final product, in addition to generating other problems in the production chain. In this work we present mathematical and statistical methods applicable to stock management. The stock analysis using ABC curves serves to identify which are the priority items, the most expensive and with the highest turnover (demand), and thus determine, through stock control models, the purchase lot size and the periodicity that minimize the total costs of storing these materials. Using the Economic Order Quantity (EOQ) model and the (Q,R) model, the inventory costs of a company were minimized. The comparison of the results provided by the models was performed.
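As a worked illustration of the Economic Order Quantity model mentioned above, the sketch below uses the textbook formula $Q^* = \sqrt{2DS/H}$, where $D$ is annual demand, $S$ the cost per order, and $H$ the annual holding cost per unit. The figures are made up, and the chapter's own cost model and data may differ.

```python
import math


def economic_order_quantity(annual_demand, order_cost, holding_cost):
    """Textbook EOQ: the lot size minimizing ordering plus holding cost."""
    if annual_demand <= 0 or order_cost <= 0 or holding_cost <= 0:
        raise ValueError("all inputs must be positive")
    return math.sqrt(2.0 * annual_demand * order_cost / holding_cost)


def total_annual_cost(quantity, annual_demand, order_cost, holding_cost):
    """Ordering cost (D / Q) * S plus average holding cost (Q / 2) * H."""
    return (annual_demand / quantity) * order_cost + (quantity / 2.0) * holding_cost


# Hypothetical item: 1200 units per year, 50 per order, 3 per unit-year of storage.
q_star = economic_order_quantity(1200.0, 50.0, 3.0)
cost_at_q_star = total_annual_cost(q_star, 1200.0, 50.0, 3.0)
```

For these inputs the optimal lot size is 200 units, at which the ordering and holding costs are equal (300 each). That balance is the defining property of the EOQ solution.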

2502.08870 2026-06-04 cs.LG stat.ML

When and why randomised exploration works (in linear bandits)

随机探索何时以及为何有效(在线性赌博机中)

Marc Abeille, David Janz, Ciara Pike-Burke

AI总结 本文提出一种不依赖强制乐观或后验膨胀的分析方法,证明在动作空间光滑且强凸的d维线性赌博机中,随机探索算法(如汤普森采样)可实现O(d√n log(n))的n步遗憾界,首次表明在非平凡线性赌博机设置中汤普森采样能达到最优维度依赖。

详情
Comments
Minor corrections to formulas and text; results unchanged
AI中文摘要

我们提供了一种分析随机探索算法(如汤普森采样)的方法,该方法不依赖于强制乐观或后验膨胀。通过这种方法,我们证明在$d$维线性赌博机设置中,当动作空间光滑且强凸时,随机探索算法享有$O(d\sqrt{n} \log(n))$阶的$n$步遗憾界。值得注意的是,这首次表明存在非平凡的线性赌博机设置,其中汤普森采样可以在遗憾中实现最优维度依赖。

英文摘要

We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the $d$-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an $n$-step regret bound of the order $O(d\sqrt{n} \log(n))$. Notably, this shows for the first time that there exist non-trivial linear bandit settings where Thompson sampling can achieve optimal dimension dependence in the regret.

2411.03383 2026-06-04 math.ST math.CA stat.ML stat.TH

Near-Optimal and Tractable Estimation under Shift-Invariance

平移不变性下的近最优且可处理的估计

Dmitrii M. Ostrovskii

AI总结 针对满足未知s阶线性递推关系的离散时间信号,在i.i.d.复高斯噪声下,本文证明了其统计复杂度(以平方极小极大半径衡量)与s-稀疏信号几乎相同,并给出了近最优且可处理的估计器。

详情
Comments
28 pages. In the previous version (v2), our construction of the reproducing filter was erroneous. It is now replaced with an alternative construction using the Christoffel function. The only change from v3 is a typesetting correction in the abstract
AI中文摘要

估计一个满足未知$s$阶线性递推关系的离散时间信号$(x_{1}, ..., x_{n}) \in \mathbb{C}^n$,并在i.i.d.复高斯噪声中观测,难度有多大?所有此类信号的类虽然是参数化的,但极其丰富:它包含$\mathbb{C}$上总次数为$s$的所有指数多项式,包括具有$s$个任意频率的谐波振荡。几何上,该类对应于$\mathbb{C}^\mathbb{Z}$中所有$s$维平移不变子空间的并集到$\mathbb{C}^{n}$上的投影。我们证明,该类的统计复杂度(以$(1-δ)$置信$\ell_2$球的平方极小极大半径衡量)与$s$-稀疏信号的几乎相同,即$O\left(s\log(en) + \log(δ^{-1})\right) \cdot \log^2(es) \cdot \log(en/s)$。此外,相应的近极小极大估计器是可处理的,并且可用于在相关检测问题中构建具有近极小极大检测阈值的检验统计量。这些统计结果依赖于一个简单的分析观察:将$\mathbb{C}^\mathbb{Z}$的任何平移不变子空间的Christoffel函数的傅里叶系数解释为在所有$\ell_p$范数($p \in [1,\infty]$)下具有最小可能谱的再生滤波器。

英文摘要

How hard is it to estimate a discrete-time signal $(x_{1}, ..., x_{n}) \in \mathbb{C}^n$ satisfying an unknown linear recurrence relation of order $s$ and observed in i.i.d. complex Gaussian noise? The class of all such signals is parametric but extremely rich: it contains all exponential polynomials over $\mathbb{C}$ with total degree $s$, including harmonic oscillations with $s$ arbitrary frequencies. Geometrically, this class corresponds to the projection onto $\mathbb{C}^{n}$ of the union of all shift-invariant subspaces of $\mathbb{C}^\mathbb{Z}$ of dimension $s$. We show that the statistical complexity of this class, as measured by the squared minimax radius of the $(1-δ)$-confidence $\ell_2$-ball, is nearly the same as for the class of $s$-sparse signals, namely $O\left(s\log(en) + \log(δ^{-1})\right) \cdot \log^2(es) \cdot \log(en/s)$. Moreover, the corresponding near-minimax estimator is tractable, and it can be used to build a test statistic with a near-minimax detection threshold in the associated detection problem. These statistical results rely upon a simple analytic observation: the interpretation of the Fourier coefficients of the Christoffel function of any shift-invariant subspace of $\mathbb{C}^\mathbb{Z}$ as a reproducing filter with the smallest possible spectrum in all $\ell_p$-norms, $p \in [1,\infty]$, at once.

2411.13939 2026-06-04 math.ST math.DS math.PR stat.TH

Filtering and Statistical Properties of Unimodal Maps Perturbed by Heteroscedastic Noises

异方差噪声扰动下单峰映射的滤波与统计性质

Fabrizio Lillo, Stefano Marmi, Matteo Tanzi, Sandro Vaienti

AI总结 针对异方差马尔可夫链噪声和观测噪声扰动的单峰映射,提出滤波理论并证明观测增加时预测分布与初始猜测无关,同时给出浓度不等式、极值分布和泊松分布等极限定理。

详情
AI中文摘要

我们提出了一个受异方差马尔可夫链噪声扰动并经历由不确定观测引起的另一种异方差噪声的单峰映射理论。我们处理并解决了滤波问题,表明通过收集越来越多的观测,无论初始猜测如何,人们将预测底层马尔可夫链状态的相同分布。此外,我们给出了其他极限定理,特别强调了浓度不等式以及极值分布和泊松分布。我们的结果适用于来自金融系统性风险模型的一族映射。

英文摘要

We propose a theory of unimodal maps perturbed by a heteroscedastic Markov chain noise and experiencing another heteroscedastic noise due to uncertain observation. We address and treat the filtering problem, showing that by collecting more and more observations, one would predict the same distribution for the state of the underlying Markov chain regardless of one's initial guess. Moreover, we give other limit theorems, emphasizing in particular concentration inequalities and extreme value and Poisson distributions. Our results apply to a family of maps arising from a model of systemic risk in finance.

2411.05591 2026-06-04 stat.ML cs.LG

Decentralized EM Algorithm for Gaussian Mixtures under Data Heterogeneity and Partial Labeling

数据异质性和部分标记下高斯混合的分布式EM算法

Xuetong Li, Shuyuan Wu, Bin Du, Hansheng Wang

AI总结 针对分布式联邦学习中数据异质性导致经典EM算法估计有偏的问题,提出动量网络EM(MNEM)算法和半监督MNEM(semi-MNEM)算法,实现渐近有效估计并加速收敛。

详情
AI中文摘要

我们系统研究了分布式联邦学习(DFL)中高斯混合模型的几种基于网络的期望最大化(EM)算法。理论研究表明,当数据在不同站点间异质分布时,直接将经典EM算法扩展到DFL会导致有偏估计。为解决这一问题,我们引入了动量网络EM(MNEM)算法,该算法整合了当前和先前DFL迭代的历史估计信息。我们进一步开发了半监督MNEM(semi-MNEM)算法,利用部分标记数据提供的信息。严格的理论分析表明,在适当的正则条件下,即使数据异质,MNEM估计器也能达到与全样本估计器相同的渐近效率。此外,即使不同混合成分分离较差,semi-MNEM估计器也能显著提高MNEM算法的收敛速度。进行了大量模拟,并分析了一个广泛使用的胸部X射线数据集,以证明所提出方法的有限样本性能。

英文摘要

We systematically study several network-based Expectation-Maximization (EM) algorithms for the Gaussian mixture model within decentralized federated learning (DFL). Our theoretical investigation shows that directly extending the classic EM algorithm to DFL leads to a biased estimator when data are heterogeneously distributed across sites. To address this, we introduce a momentum network EM (MNEM) algorithm, which integrates information from both current and historical estimators from previous DFL iterations. We further develop a semi-supervised MNEM (semi-MNEM) algorithm, which utilizes information provided by partially labeled data. Rigorous theoretical analysis demonstrates that the MNEM estimator can achieve the same asymptotic efficiency as the whole-sample estimator under appropriate regularity conditions, even with heterogeneous data. Moreover, the semi-MNEM estimator significantly improves the convergence speed of the MNEM algorithm, even if different mixture components are poorly separated. Extensive simulations are conducted, and a widely used chest X-ray dataset is analyzed to demonstrate the finite-sample performance of the proposed methods.

2405.08730 2026-06-04 stat.ME stat.AP

A Generalized Difference-in-Differences Estimator for Randomized Stepped-Wedge and Observational Staggered Adoption Settings

随机阶梯楔形与观察性交错采纳场景的广义双重差分估计量

Lee Kennedy-Shaffer

AI总结 针对随机阶梯楔形试验和观察性交错采纳研究,提出一种基于2×2双重差分加权平均的非参数估计方法,以灵活、无偏地估计目标处理效应,并平衡偏差-方差-泛化性权衡。

详情
Comments
56 pages main text, 19 pages additional material, 8 figures, 13 tables. Update includes simulation results and updated discussion
AI中文摘要

交错处理采纳出现在许多场景的政策影响和实施评估中,包括随机阶梯楔形试验和具有面板数据的非随机准实验。在这两种场景中,获得可解释、无偏的效应估计需要仔细考虑目标估计量和可能的处理效应异质性。本文针对任一场景提出了一种新颖的非参数估计方法。通过使用2×2双重差分比较的加权平均值作为构建块来构造估计量,研究者可以针对任何假设的处理效应异质性定位目标估计量。这在有效利用比较以减少精度损失的同时,提供了理想的偏差和解释性质,且无需正确的方差设定。该方法通过一项关于新型结核病诊断工具影响的随机阶梯楔形试验和一项关于美国各州COVID-19疫苗经济激励彩票效果的观察性交错采纳研究进行了演示,并与使用先前方法的分析进行了比较。提供了包含R代码的完整算法来实现该方法并与现有方法进行比较。所提出的方法允许高度灵活性和对期望效应的清晰定位,为偏差-方差-泛化性权衡提供了一种解决方案。

英文摘要

Staggered treatment adoption arises in the evaluation of policy impact and implementation in many settings, including both randomized stepped-wedge trials and non-randomized quasi-experiments with panel data. In both settings, getting an interpretable, unbiased effect estimate requires careful consideration of the target estimand and possible treatment effect heterogeneities. This paper proposes a novel non-parametric approach to this estimation for either setting. By constructing an estimator using weighted averages of two-by-two difference-in-differences comparisons as building blocks, the investigator can target the desired estimand for any assumed treatment effect heterogeneities. This provides desirable bias and interpretation properties while using the comparisons efficiently to mitigate the loss of precision, without requiring correct variance specification. The methods are demonstrated for both a randomized stepped-wedge trial on the impact of novel tuberculosis diagnostic tools and an observational staggered adoption study on the effects of COVID-19 vaccine financial incentive lotteries in U.S. states; these are compared to analyses using previous methods. A full algorithm with R code is provided to implement this method and to compare against existing methods. The proposed method allows for high flexibility and clear targeting of desired effects, providing one solution to the bias-variance-generalizability tradeoff.
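The building block described in the abstract, a two-by-two difference-in-differences comparison, and a weighted average of several such comparisons can be sketched as follows. The function names, the normalization of the weights, and the toy outcome values are assumptions; the paper's rule for choosing weights to target a given estimand is not reproduced.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated unit minus change in the comparison unit."""
    return (treated_post - treated_pre) - (control_post - control_pre)


def weighted_did(comparisons, weights):
    """Normalized weighted average of 2x2 DiD values (weights are user-supplied)."""
    if len(comparisons) != len(weights):
        raise ValueError("one weight per comparison is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * d for w, d in zip(weights, comparisons)) / total


# Toy mean outcomes (made up) for two treated-versus-control pairs.
d1 = did_2x2(treated_pre=10.0, treated_post=15.0, control_pre=8.0, control_post=9.0)
d2 = did_2x2(treated_pre=12.0, treated_post=15.0, control_pre=7.0, control_post=8.0)
estimate = weighted_did([d1, d2], [1.0, 1.0])
```

With equal weights this is a plain average of the individual comparisons. Different weightings trade off which treatment effects are targeted against the precision of the resulting estimate.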

2205.08609 2026-06-04 stat.ML cs.LG stat.ME

Bagged Polynomial Regression and Neural Networks

袋装多项式回归与神经网络

Sylvia Klosin, Jaume Vives-i-Bastida

AI总结 针对高维预测问题,提出基于随机投影的袋装多项式回归(BPR),在保持与神经网络相当精度的同时提供可解释性和诊断工具。

详情
AI中文摘要

气候和环境应用越来越依赖于从遥感和其他科学数据中进行高维预测。神经网络(NN)在这些场景中能够提供强大的准确性,但往往难以审计且难以与领域知识对齐。作为替代方案,我们提出了基于随机投影的袋装多项式回归(BPR),这是一种计量经济学原生的集成方法,它对在随机选择的协变量组上拟合的多个正则化低次多项式模型进行平均。我们提供了新颖的有限样本和渐近风险界,并展示了协变量划分如何通过控制字典基增长来改善光滑目标函数的速率。速率改进对于边际效应的估计可能尤其重要。在使用光学和雷达图像进行基于卫星的作物分类的应用中,BPR 在保持易于诊断的同时达到了与 NN 相当的准确性。我们提供了实用的透明度工具、系数汇总和偏依赖诊断,表明 BPR 捕捉到了 NN 未能捕捉到的直观特征关系。

英文摘要

Climate and environmental applications increasingly rely on high-dimensional prediction from remote sensing and other scientific data. Neural networks (NN) can deliver strong accuracy in these settings, but they are often hard to audit and hard to align with domain knowledge. As an alternative, we propose bagged polynomial regression with random projections (BPR), an econometrics-native ensemble that averages many regularized low-degree polynomial models fit on randomly selected covariate groups. We provide novel finite-sample and asymptotic risk bounds and show how covariate partitioning can improve rates for smooth target functions by controlling dictionary basis growth. Rate improvements may be particularly relevant for the estimation of marginal effects. In an application to satellite-based crop classification using optical and radar imagery, BPR matches NN accuracy while remaining straightforward to diagnose. We provide practical transparency tools, coefficient summaries and partial-dependence diagnostics, that show BPR captures intuitive feature relationships that NNs do not.

2301.04512 2026-06-04 math.ST stat.ME stat.TH

Partial Conditioning for Inference of Many-Normal-Means with Hölder Constraints

具有Hölder约束的多正态均值推断的部分条件化

Jiasen Yang, Xiao Wang, Chuanhai Liu

AI总结 针对具有Hölder约束的多正态均值问题,提出部分条件化方法以生成有效且高效的边际推断,在有效性和效率上均优于现有方法。

详情
Journal ref
International Journal of Approximate Reasoning, 2023, Volume 159, 108946
AI中文摘要

推断模型已被提出用于有效且高效的无先验概率推断。随着该理论逐渐普及,它需要针对实际具有挑战性的问题进行进一步发展。本文考虑了均值被约束在彼此邻域内的多正态均值问题,该约束由Hölder空间形式化表示。提出了一种称为部分条件化的新方法,用于生成关于个体均值的有效且高效的边际推断。结果表明,该方法在有效性上优于基准方法,在效率上优于保守方法。我们通过指出部分条件化推断模型的一般理论值得未来发展来结束本文。

英文摘要

Inferential models have been proposed for valid and efficient prior-free probabilistic inference. As it gradually gained popularity, this theory is subject to further developments for practically challenging problems. This paper considers the many-normal-means problem with the means constrained to be in the neighborhood of each other, formally represented by a Hölder space. A new method, called partial conditioning, is proposed to generate valid and efficient marginal inference about the individual means. It is shown that the method outperforms both a fiducial-counterpart in terms of validity and a conservative-counterpart in terms of efficiency. We conclude the paper by remarking that a general theory of partial conditioning for inferential models deserves future development.

1710.04238 2026-06-04 stat.ME cs.LG cs.NA math.NA

Regression-aware decompositions

回归感知的分解

Mark Tygert

AI总结 本文提出了一种回归感知的分解方法,通过结合线性最小二乘回归模型与插值分解,实现了对矩阵B的监督降维,从而揭示了B中与A回归相关的结构。

详情
Journal ref
Linear Algebra and Its Applications, 565 (6): 208-224, 2019
Comments
19 pages, 9 figures, 2 tables
AI中文摘要

线性最小二乘回归通过设计矩阵A来近似给定矩阵B,通过最小化谱范数或Frobenius范数的差异||AX-B||来实现。另一种流行的近似方法是通过主成分分析(PCA)进行低秩近似,即奇异值分解(SVD)或插值分解(ID)。传统上,PCA/SVD和ID仅使用被近似的矩阵B,而不受任何辅助矩阵A的监督。然而,线性最小二乘回归模型可以指导ID,从而产生回归感知的ID。作为额外的好处,这为一种典型的判别分析(A和B之间的相关性)提供了解释。回归感知的分解有效使监督信息能够指导经典的降维方法,而经典降维方法历来是完全无监督的。回归感知的分解揭示了B中与A回归相关的结构。

英文摘要

Linear least-squares regression with a "design" matrix A approximates a given matrix B via minimization of the spectral- or Frobenius-norm discrepancy ||AX-B|| over every conformingly sized matrix X. Another popular approximation is low-rank approximation via principal component analysis (PCA) -- which is essentially singular value decomposition (SVD) -- or interpolative decomposition (ID). Classically, PCA/SVD and ID operate solely with the matrix B being approximated, not supervised by any auxiliary matrix A. However, linear least-squares regression models can inform the ID, yielding regression-aware ID. As a bonus, this provides an interpretation as regression-aware PCA for a kind of canonical correlation analysis between A and B. The regression-aware decompositions effectively enable supervision to inform classical dimensionality reduction, which classically has been totally unsupervised. The regression-aware decompositions reveal the structure inherent in B that is relevant to regression against A.

1306.6770 2026-06-04 math.PR cs.NA math.DS math.NA math.OC math.ST stat.TH

Numerical Methods and Analysis via Random Field Based Malliavin Calculus for Backward Stochastic PDEs

基于随机场Malliavin微积分的随机偏微分方程的数值方法与分析

Wanyang Dai

AI总结 本文研究了随机偏微分方程的适配解、数值方法及相关收敛分析,提出了一种基于随机场Malliavin微积分的新理论,用于证明在随机环境下一阶和二阶Malliavin导数基于B-SPDE的适配解的存在性和唯一性。

详情
Journal ref
Computers & Mathematics with Applications Volume 119, 1 August 2022, Pages 21-58
Comments
39 pages
AI中文摘要

我们研究了一种统一的反向随机偏微分方程(B-SPDE)的适配解、数值方法及相关收敛分析。该方程是向量值的,其漂移和扩散系数可能涉及非线性和高阶偏微分算子。在某些广义Lipschitz和线性增长条件下,证明了B-SPDE的适配解的存在性和唯一性。所采用的方法基于时间和空间的完全离散方案。对方法的误差估计或收敛速率的分析进行了研究。分析的关键是开发新的随机场基于Malliavin微积分理论,以证明在随机环境下一阶和二阶Malliavin导数基于B-SPDE的适配解的存在性和唯一性。

英文摘要

We study the adapted solution, numerical methods, and related convergence analysis for a unified backward stochastic partial differential equation (B-SPDE). The equation is vector-valued, whose drift and diffusion coefficients may involve nonlinear and high-order partial differential operators. Under certain generalized Lipschitz and linear growth conditions, the existence and uniqueness of adapted solution to the B-SPDE are justified. The methods are based on completely discrete schemes in terms of both time and space. The analysis concerning error estimation or rate of convergence of the methods is conducted. The key of the analysis is to develop new theory for random field based Malliavin calculus to prove the existence and uniqueness of adapted solutions to the first-order and second-order Malliavin derivative based B-SPDEs under random environments.

2401.14077 2026-06-04 cs.MS stat.CO

LongMemory.jl: Generating, Estimating, and Forecasting Long Memory Models in Julia

LongMemory.jl: 在 Julia 中生成、估计和预测长记忆模型

J. Eduardo Vera-Valdés

AI总结 本文介绍 LongMemory.jl 包,提供在 Julia 中生成、估计和预测长记忆时间序列模型的功能,并与其他现有工具进行性能比较。

详情
AI中文摘要

LongMemory.jl 是一个用于 Julia 中时间序列长记忆建模的包。该包提供生成长记忆、估计模型参数和预测的功能。生成方法包括分数差分、随机误差持续时间和横截面聚合。估计量包括用于估计赫斯特效应的经典方法、基于对数周期图回归的方法以及参数方法。所有参数估计器都提供预测。此外,该包增加了绘图功能以说明长记忆动态和预测。本文介绍了长记忆建模的理论发展,展示了使用包中包含的数据的示例,并将 LongMemory.jl 的特性与当前替代方案(包括基准测试)进行了比较。对于某些理论发展,LongMemory.jl 提供了任何编程语言中的第一个公开实现。该包的一个显著特点是所有函数都用同一种编程语言实现,充分利用了 Julia 提供的易用性和速度。因此,所有代码对用户都是可访问的。多重分派(该语言的一个新特性)用于加速计算并提供对相关方法的一致调用。该包与 R 包 LongMemoryTS 和 fracdiff 相关。

英文摘要

LongMemory.jl is a package for time series long memory modelling in Julia. The package provides functions to generate long memory, estimate model parameters, and forecast. Generating methods include fractional differencing, stochastic error duration, and cross-sectional aggregation. Estimators include the classic ones used to estimate the Hurst effect, those inspired by log-periodogram regression, and parametric ones. Forecasting is provided for all parametric estimators. Moreover, the package adds plotting capabilities to illustrate long memory dynamics and forecasting. This article presents the theoretical developments for long memory modelling, shows examples using the data included with the package, and compares the properties of LongMemory.jl with current alternatives, including benchmarks. For some of the theoretical developments, LongMemory.jl provides the first publicly available implementation in any programming language. A notable feature of this package is that all functions are implemented in the same programming language, taking advantage of the ease of use and speed provided by Julia. Therefore, all code is accessible to the user. Multiple dispatch, a novel feature of the language, is used to speed computations and provide consistent calls to related methods. The package is related to the R packages LongMemoryTS and fracdiff.
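Fractional differencing, the first generating method listed above, can be illustrated with a minimal sketch. It uses the standard expansion $(1-L)^d = \sum_k \pi_k L^k$ with $\pi_0 = 1$ and $\pi_k = \pi_{k-1}(k-1-d)/k$. This is written in Python for illustration only: the function names are hypothetical and are not the LongMemory.jl API.

```python
import random


def frac_diff_weights(d, n):
    """First n coefficients of the power series of (1 - L)^d."""
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply the truncated filter (1 - L)^d to a finite series."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


# Generate a long memory series by applying (1 - L)^(-d) to white noise.
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
long_memory_series = frac_diff(noise, -0.3)
```

Applying the filter with `d` and then with `-d` recovers the original series up to rounding, because the truncated filters are lower-triangular Toeplitz operators whose product is the identity. This direct convolution costs quadratic time in the series length.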

2209.15448 2026-06-04 cs.LG math.ST stat.ME stat.TH

Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

人机交互的福音:混杂环境下的超级强化学习

Jiayi Wang, Zhengling Qi, Chengchun Shi

AI总结 提出利用人机交互中的观察动作进行超级策略学习,在存在未测量混杂的情况下,通过近端因果推断实现优于标准最优策略和行为策略的超级策略。

详情
AI中文摘要

随着人工智能在社会中越来越普遍,整合人类和AI系统以发挥各自优势并降低风险的有效方法已成为重要优先事项。在本文中,我们引入了超级策略学习的范式,该范式利用人机交互进行数据驱动的序贯决策。这种方法将来自AI或人类的观察动作作为输入,以实现决策者(人类或AI)在策略学习中更强的oracle。在存在未测量混杂的决策过程中,过去智能体采取的动作可以揭示未公开信息的有价值见解。通过以一种新颖且合法的方式将这些信息纳入策略搜索,所提出的超级策略学习将产生一个超级策略,该策略保证优于标准最优策略和行为策略(例如,过去智能体的动作)。我们将这种更强的oracle称为人机交互的福音。此外,为了解决使用批处理数据寻找超级策略时的未测量混杂问题,在近端因果推断框架下建立了一系列非参数和因果识别。基于这些新颖的识别结果,我们开发了几种超级策略学习算法,并系统研究了它们的理论性质,例如有限样本遗憾保证。最后,通过大量模拟和实际应用说明了我们方法的有效性。

英文摘要

As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super policy learning that takes advantage of Human-AI interaction for data driven sequential decision making. This approach utilizes the observed action, either from AI or humans, as input for achieving a stronger oracle in policy learning for the decision maker (humans or AI). In the decision process with unmeasured confounding, the actions taken by past agents can offer valuable insights into undisclosed information. By including this information for the policy search in a novel and legitimate manner, the proposed super policy learning will yield a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior one (e.g., past agents' actions). We call this stronger oracle a blessing from human-AI interaction. Furthermore, to address the issue of unmeasured confounding in finding super-policies using the batch data, a number of nonparametric and causal identifications are established under the framework of proximal causal inference. Building upon these novel identification results, we develop several super-policy learning algorithms and systematically study their theoretical properties such as finite-sample regret guarantee. Finally, we illustrate the effectiveness of our proposal through extensive simulations and real-world applications.

1502.07873 2026-06-04 stat.CO cs.NA math.NA

Fast Bayesian Optimal Experimental Design for Seismic Source Inversion

快速贝叶斯最优实验设计用于地震源反演

Quan Long, Mohammad Motamed, Raul Tempone

AI总结 本文提出了一种快速方法,用于在统计地震源反演中最优设计实验,通过高效计算接收器或地震仪的最佳数量和位置,利用弹性动力学波动方程建模正向问题,并采用拉普拉斯近似加速估计预期Kullback-Leibler散度以提高实验设计效率。

详情
AI中文摘要

我们开发了一种快速方法,用于在统计地震源反演中最优设计实验。特别是,我们高效地计算了接收器或地震仪的最佳数量和位置。地震源由点矩张量乘以时间依赖函数建模。参数包括源位置、矩张量分量以及时间函数中的起始时间和频率。正向问题由弹性动力学波动方程建模。我们证明了成本函数的Hessian,通常定义为实验数据与模拟数据差异加权L2范数的平方,与测量时间和接收器数量成正比。因此,在贝叶斯设置下,参数的后验分布集中在“真实”参数周围,我们可以采用拉普拉斯近似以加速估计预期Kullback-Leibler散度(预期信息增益),即实验设计过程中的最优标准。由于源参数跨度较大,我们使用缩放矩阵以高效控制原始Hessian矩阵的条件数。我们使用二阶精度有限差分法计算Hessian矩阵,并采用稀疏求积或蒙特卡洛采样进行数值积分。我们在二维地震源反演问题上展示了该方法的效率、精度和适用性。

英文摘要

We develop a fast method for optimally designing experiments in the context of statistical seismic source inversion. In particular, we efficiently compute the optimal number and locations of the receivers or seismographs. The seismic source is modeled by a point moment tensor multiplied by a time-dependent function. The parameters include the source location, moment tensor components, and start time and frequency in the time function. The forward problem is modeled by elastodynamic wave equations. We show that the Hessian of the cost functional, which is usually defined as the square of the weighted L2 norm of the difference between the experimental data and the simulated data, is proportional to the measurement time and the number of receivers. Consequently, the posterior distribution of the parameters, in a Bayesian setting, concentrates around the "true" parameters, and we can employ Laplace approximation and speed up the estimation of the expected Kullback-Leibler divergence (expected information gain), the optimality criterion in the experimental design procedure. Since the source parameters span several magnitudes, we use a scaling matrix for efficient control of the condition number of the original Hessian matrix. We use a second-order accurate finite difference method to compute the Hessian matrix and either sparse quadrature or Monte Carlo sampling to carry out numerical integration. We demonstrate the efficiency, accuracy, and applicability of our method on a two-dimensional seismic source inversion problem.

1501.03323 2026-06-04 math.NA cs.NA stat.CO

Coordinate Transformation and Polynomial Chaos for the Bayesian Inference of a Gaussian Process with Parametrized Prior Covariance Function

坐标变换与多项式混沌用于具有参数化先验协方差函数的高斯过程的贝叶斯推断

Ihab Sraj, Olivier P. Le Maître, Omar M. Knio, Ibrahim Hoteit

AI总结 本文提出了一种基于坐标变换和多项式混沌的高斯过程贝叶斯推断方法,用于在不确定的协方差函数超参数下进行模型降维和参数估计。

详情
Comments
34 pages, 17 figures
AI中文摘要

本文研究基于先验高斯场的贝叶斯推断中的模型降维问题,其中协方差函数的超参数具有不确定性。传统上,降维通过先验高斯过程的Karhunen-Loève展开实现,并假设协方差函数的超参数固定,尽管这些超参数本身是不确定的。然后利用可用观测数据推断Karhunen-Loève坐标的后验分布。因此,得到的推断场依赖于所假设的超参数。本文旨在利用贝叶斯推断高效地同时估计场和协方差超参数。为此,通过坐标变换推导出广义的Karhunen-Loève展开,以考虑对协方差超参数的依赖性。利用类似的坐标变换,采用多项式混沌展开来加速贝叶斯推断,从而避免显式展开解对不确定超参数的依赖性。我们在瞬态扩散方程上验证了所提方法的可行性,从含噪数据中推断空间变化的对数扩散率场。在推断公式中考虑超参数的不确定性时,推断出的剖面更接近真实剖面。

英文摘要

This paper addresses model dimensionality reduction for Bayesian inference based on prior Gaussian fields with uncertainty in the covariance function hyper-parameters. The dimensionality reduction is traditionally achieved using the Karhunen-Loève expansion of a prior Gaussian process assuming a covariance function with fixed hyper-parameters, despite the fact that these are uncertain in nature. The posterior distribution of the Karhunen-Loève coordinates is then inferred using available observations. The resulting inferred field is therefore dependent on the assumed hyper-parameters. Here, we seek to efficiently estimate both the field and covariance hyper-parameters using Bayesian inference. To this end, a generalized Karhunen-Loève expansion is derived using a coordinate transformation to account for the dependence on the covariance hyper-parameters. Polynomial Chaos expansions are employed to accelerate the Bayesian inference using similar coordinate transformations, enabling us to avoid explicitly expanding the solution's dependence on the uncertain hyper-parameters. We demonstrate the feasibility of the proposed method on a transient diffusion equation by inferring spatially-varying log-diffusivity fields from noisy data. The inferred profiles were found to be closer to the true profiles when the hyper-parameters' uncertainty was included in the inference formulation.

1409.6111 2026-06-04 math.OC cs.LG cs.MA cs.SY eess.SY stat.ML

Distributed Clustering and Learning Over Networks

网络上的分布式聚类与学习

Xiaochuan Zhao, Ali H. Sayed

AI总结 本文提出了一种自适应的聚类和学习方案,使智能体能够学习应与哪些邻居合作以及应忽略哪些邻居,从而在网络中实现更准确的学习和估计。通过详细的均方分析,评估了聚类机制的第一类和第二类错误概率(虚警与漏检),并证明这些概率随步长指数衰减,从而可以使正确聚类的概率任意接近于一。

详情
Comments
47 pages, 6 figures
AI中文摘要

网络上的分布式处理依赖于网络内处理和邻近智能体之间的合作。当智能体共享共同目标时,合作是有益的。然而,在许多应用中,智能体可能属于不同的集群,追求不同的目标。此时,无差别合作会导致不期望的结果。在本文中,我们提出了一种自适应的聚类和学习方案,使智能体能够学习应与哪些邻居合作以及应忽略哪些其他邻居。由此得到的算法使智能体能够识别其所属集群,并在网络中实现更高的学习和估计精度。我们进行了详细的均方分析,并评估了聚类机制的第一类和第二类错误概率,即虚警概率和漏检概率。此外,我们证明这些概率随步长指数衰减,从而可以使正确聚类的概率任意接近于一。

英文摘要

Distributed processing over networks relies on in-network processing and cooperation among neighboring agents. Cooperation is beneficial when agents share a common objective. However, in many applications agents may belong to different clusters that pursue different objectives. Then, indiscriminate cooperation will lead to undesired results. In this work, we propose an adaptive clustering and learning scheme that allows agents to learn which neighbors they should cooperate with and which other neighbors they should ignore. In doing so, the resulting algorithm enables the agents to identify their clusters and to attain improved learning and estimation accuracy over networks. We carry out a detailed mean-square analysis and assess the error probabilities of Types I and II, i.e., false alarm and mis-detection, for the clustering mechanism. Among other results, we establish that these probabilities decay exponentially with the step-sizes so that the probability of correct clustering can be made arbitrarily close to one.

1705.05495 2026-06-04 stat.ML cs.SY eess.SY

A Bayesian Filtering Algorithm for Gaussian Mixture Models

一种用于高斯混合模型的贝叶斯滤波算法

Adrian G. Wills, Johannes Hendriks, Christopher Renton, Brett Ninness

AI总结 本文提出了一种针对可由高斯混合模型建模的状态空间系统的贝叶斯滤波算法,通过在时间更新和测量更新后引入高斯混合缩减步骤来处理滤波问题中的指数增长混合项问题,并在多个模拟系统上对统一算法的平方根实现进行了性能评估。

详情
AI中文摘要

本文开发了一种用于一类可由高斯混合模型建模的状态空间系统的贝叶斯滤波算法。通常,该滤波问题的精确解涉及混合项数量的指数增长,此处通过在时间和测量更新后利用高斯混合缩减步骤来处理。此外,还提出了统一算法的平方根实现,并在多个模拟系统上对该算法进行了性能评估。这包括对两个严格不在本文考虑范围内的非线性系统的状态估计。

英文摘要

A Bayesian filtering algorithm is developed for a class of state-space systems that can be modelled via Gaussian mixtures. In general, the exact solution to this filtering problem involves an exponential growth in the number of mixture terms and this is handled here by utilising a Gaussian mixture reduction step after both the time and measurement updates. In addition, a square-root implementation of the unified algorithm is presented and this algorithm is profiled on several simulated systems. This includes the state estimation for two non-linear systems that are strictly outside the class considered in this paper.
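
Mixture reduction steps like the one described above are commonly built on a moment-preserving merge of two components. The following is a minimal sketch assuming that standard merge; the abstract does not say which reduction rule the authors use, and `merge_components` is a hypothetical name chosen for illustration.

```python
import numpy as np

def merge_components(w1, m1, P1, w2, m2, P2):
    """Merge two weighted Gaussians into one that keeps the pair's
    total weight, mean, and covariance (moment matching)."""
    w = w1 + w2
    a1, a2 = w1 / w, w2 / w
    m = a1 * m1 + a2 * m2
    d1, d2 = m1 - m, m2 - m
    # The spread-of-means terms preserve the overall covariance of the pair.
    P = a1 * (P1 + np.outer(d1, d1)) + a2 * (P2 + np.outer(d2, d2))
    return w, m, P

# Made-up components: weights 0.3 and 0.1, unit covariances, means 4 apart.
w, m, P = merge_components(0.3, np.array([0.0, 0.0]), np.eye(2),
                           0.1, np.array([4.0, 0.0]), np.eye(2))
```

Applying such a merge repeatedly to the closest pair of components keeps the number of mixture terms bounded after each time and measurement update, at the cost of an approximation error.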

1808.03408 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

A Unified Analysis of AdaGrad with Weighted Aggregation and Momentum Acceleration

AdaGrad的统一分析:带加权聚合和动量加速

Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu

AI总结 本文提出了一种名为AdaUSM的加权AdaGrad算法,通过统一动量方案和新型加权自适应学习率,实现了在非凸随机设置下的O(log(T)/√T)收敛率,并从新视角解释了Adam和RMSProp的自适应学习率。

详情
Comments
IEEE TNNLS
AI中文摘要

将自适应学习率和动量技术整合到SGD中,可以得到一大类高效加速的自适应随机算法,如AdaGrad、RMSProp、Adam、AccAdaGrad等。尽管这些算法在实践中效果显著,但其收敛理论仍存在较大空白,在困难的非凸随机设置下尤其如此。为此,我们提出了带统一动量的加权AdaGrad,称为AdaUSM,其主要特点包括(1)采用统一的动量方案,涵盖重球动量和Nesterov加速梯度动量;(2)采用新颖的加权自适应学习率,能够统一AdaGrad、AccAdaGrad、Adam和RMSProp的学习率。此外,当在AdaUSM中采用多项式增长的权重时,可以得到非凸随机设置下的O(log(T)/√T)收敛率。我们还展示了Adam和RMSProp的自适应学习率对应于在AdaUSM中采用指数增长的权重,从而为理解Adam和RMSProp提供了新的视角。最后,我们还在各种深度学习模型和数据集上进行了AdaUSM与带动量的SGD、AdaGrad、AdaEMA、Adam和AMSGrad的比较实验。

英文摘要

Integrating adaptive learning rate and momentum techniques into SGD leads to a large class of efficiently accelerated adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, AccAdaGrad, etc. In spite of their effectiveness in practice, there is still a large gap in their convergence theory, especially in the difficult non-convex stochastic setting. To fill this gap, we propose weighted AdaGrad with unified momentum, dubbed AdaUSM, which has the main characteristics that (1) it incorporates a unified momentum scheme which covers both the heavy ball momentum and the Nesterov accelerated gradient momentum; (2) it adopts a novel weighted adaptive learning rate that can unify the learning rates of AdaGrad, AccAdaGrad, Adam, and RMSProp. Moreover, when we take polynomially growing weights in AdaUSM, we obtain its $\mathcal{O}(\log(T)/\sqrt{T})$ convergence rate in the non-convex stochastic setting. We also show that the adaptive learning rates of Adam and RMSProp correspond to taking exponentially growing weights in AdaUSM, thereby providing a new perspective for understanding Adam and RMSProp. Lastly, comparative experiments of AdaUSM against SGD with momentum, AdaGrad, AdaEMA, Adam, and AMSGrad on various deep learning models and datasets are also carried out.
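
The claim that Adam-style learning rates correspond to exponentially growing weights can be checked with an elementary identity. The sketch below is not the AdaUSM algorithm (its exact normalization is not given in the abstract); it only assumes a weighted average of squared gradients, with hypothetical function names and made-up scalar gradients, and compares it with the usual bias-corrected exponential moving average.

```python
def weighted_second_moment(grads, weights):
    """Weighted average of squared gradients: sum(a_i g_i^2) / sum(a_i)."""
    num = sum(a * g * g for a, g in zip(weights, grads))
    return num / sum(weights)

def adam_second_moment(grads, beta):
    """Bias-corrected exponential moving average of squared gradients."""
    v = 0.0
    for g in grads:
        v = beta * v + (1.0 - beta) * g * g
    return v / (1.0 - beta ** len(grads))

grads = [0.5, -1.0, 2.0, 0.1]            # made-up scalar gradients
beta = 0.9
# Exponentially growing weights a_i = beta^(-i), i = 1..t.
exp_weights = [beta ** (-i) for i in range(1, len(grads) + 1)]
uniform_weights = [1.0] * len(grads)

v_exp = weighted_second_moment(grads, exp_weights)
v_adam = adam_second_moment(grads, beta)
v_uniform = weighted_second_moment(grads, uniform_weights)
```

With weights `beta ** (-i)` the two quantities agree up to rounding, while uniform weights give the plain average of squared gradients, which is AdaGrad's accumulator divided by the step count.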

2209.08050 2026-06-04 stat.ME

Reweighted and Circularized Anderson-Darling Tests of Goodness-of-Fit

重加权和循环化的Anderson-Darling拟合优度检验

Chuanhai Liu

AI总结 本文在重加权Anderson-Darling检验的背景下,通过几何理解、提出循环对称检验(循环化检验)以及建立大样本结果,改进了拟合优度检验的性能。

详情
Journal ref
Journal of Nonparametric Statistics, 2023, 35, Pages 869-904
AI中文摘要

本文研究了重加权Anderson-Darling检验背景下的拟合优度综合检验,并做出了三方面贡献。第一个贡献是提供几何理解。论证了对于可交换分布偏差具有最小方差的检验统计量可以作为良好的通用检验。第二个贡献是提出更好的综合检验,称为循环对称检验,通过循环化重加权Anderson-Darling检验统计量或更一般地基于观测顺序统计量的检验统计量得到。由此产生的检验称为循环化检验。一项有限但具有说服力的有限样本性能模拟研究表明,循环化检验具有良好的性能,因为在模拟研究中它们通常优于其父方法。第三个贡献是建立了新的渐近结果。

英文摘要

This paper examines omnibus tests of goodness of fit in the context of reweighted Anderson-Darling tests and makes three contributions. The first contribution is to provide a geometric understanding. It is argued that the test statistic with minimum variance for exchangeable distributional deviations can serve as a good general-purpose test. The second contribution is to propose better omnibus tests, called circularly symmetric tests and obtained by circularizing reweighted Anderson-Darling test statistics or, more generally, test statistics based on the observed order statistics. The resulting tests are called circularized tests. A limited but arguably convincing simulation study on finite-sample performance demonstrates that circularized tests have good performance, as they typically outperform their parent methods in the simulation study. The third contribution is to establish new large-sample results.
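
For orientation, the parent statistic that the reweighted and circularized variants start from can be computed in a few lines. This sketch implements only the classical Anderson-Darling statistic for a fully specified null distribution; the reweighting and circularization proposed in the paper are not reproduced here, and `anderson_darling` is a hypothetical name.

```python
import math

def anderson_darling(u):
    """Classical A^2 statistic for values u that should be Uniform(0, 1),
    e.g. u_i = F0(x_i) under a fully specified null CDF F0."""
    u = sorted(u)
    n = len(u)
    # Sum over order statistics: (2i - 1) * [ln u_(i) + ln(1 - u_(n+1-i))].
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

a2_single = anderson_darling([0.5])
a2_sample = anderson_darling([0.1, 0.4, 0.7])
a2_mirror = anderson_darling([0.9, 0.6, 0.3])   # same sample reflected about 1/2
```

The statistic is unchanged when the sample is reflected about 1/2, so the two three-point samples above give the same value.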

2107.01629 2026-06-04 stat.ML cs.LG econ.GN q-fin.EC stat.AP

From Live to Recording: Consumer Demand and Response to Price Across the Livestreaming Lifecycle

从直播到录制:消费者对直播生命周期中价格的需求与响应

Ziwei Cong, Jia Liu, Puneet Manchanda

AI总结 利用大型直播平台数据,研究消费者在直播前后对价格敏感性的差异,发现直播前需求价格弹性更高,主要由消费者自选择和质量不确定性驱动。

详情
Comments
An earlier version of this paper was distributed under the title "The Role of 'Live' in Livestreaming Markets: Evidence Using Orthogonal Random Forest."
AI中文摘要

直播已发展成为一个蓬勃发展的行业,创作者可以直接从中获利并与观众和粉丝互动。在实践中,创作者和平台通常将营销工作集中在直播前的时期。然而,直播活动在结束后自然过渡到录制格式,创造了潜在的“剩余”变现机会。本研究利用一个大型直播平台的数据,系统性地考察了消费者在整个直播生命周期中对直播活动的需求,该平台允许消费者在直播结束后购买付费直播活动的录制版本。我们发现,与直播后时期相比,直播前时期的需求对价格更敏感。这部分由两种机制驱动:消费者自选择(不常消费的消费者可能错过了直播活动,对录制版本表现出更高的支付意愿)和质量不确定性(消费者在直播前时期面临的事件质量不确定性高于直播后时期)。我们的研究结果为直播市场的定价和定向策略提供了启示。

英文摘要

Livestreaming has evolved into a thriving industry where creators can directly monetize and engage with their audiences and followers. In practice, creators and platforms typically concentrate their marketing efforts on the period leading up to the livestream. However, livestreaming events naturally transition into recorded formats once the event concludes, creating potential "residual" opportunities for monetization. This study systematically examines consumer demand for live events throughout the entire livestream life-cycle, using data from a large livestreaming platform that allows consumers to purchase the recorded version of a paid live event after the livestream ends. We find that the demand is surprisingly more price-sensitive during the pre-livestream period compared to the post-period. This is partly driven by two mechanisms: consumer self-selection (infrequent consumers who may have missed the live events exhibit a higher willingness to pay for recorded versions) and quality uncertainty (consumers face higher uncertainty in event quality during the pre-period than in the post-period). Our findings generate implications for the pricing and targeting strategies in livestreaming markets.

1402.3365 2026-06-04 math.NA cs.NA stat.ME

Regularization Parameter Estimation for Underdetermined problems by the $χ^2$ principle with application to $2D$ focusing gravity inversion

通过χ²原理估计欠定问题的正则化参数并应用于二维聚焦重力反演

Saeed Vatankhah, Rosemary A Renaut, Vahid E Ardestani

AI总结 本文提出了一种基于χ²原理的正则化参数估计方法,用于解决欠定问题,并在二维聚焦重力反演中应用,通过数值实验验证了该方法在效率和鲁棒性上的优势,同时对比了L曲线和MDP方法。

详情
Journal ref
Inverse Problems 30 (2014) 085002
AI中文摘要

χ²原理将Morozov偏差原理(MDP)推广到Tikhonov正则化最小二乘问题的增广残差。当数据保真项按测量数据上已知的高斯噪声分布加权,而正则项按模型参数上未知的逆协方差信息加权时,Tikhonov泛函的最小值是服从自由度为m+p-n的χ²分布的随机变量,其中模型矩阵G为m×n,正则器L为p×n。本文证明当m<n且m+p≥n时,该结果仍然成立。使用牛顿求根算法寻找正则化参数α,该参数在映射后的模型数据满足白噪声假设的情况下给出最优的逆协方差加权。对于小规模问题,算法通过广义奇异值分解实现。数值结果针对近似零阶至二阶导数的正则器验证了该算法,并与广义交叉验证和无偏预测风险估计方法进行了对比。欠定的二维聚焦重力数据反演会产生具有非光滑性质的模型,该领域的典型实现使用迭代最小支持(MS)稳定器,并在每次迭代中更新正则器和正则化参数。对于含噪声的模拟数据集,在该迭代框架中使用适用于欠定数据集的正则化参数估计方法,并与L曲线和MDP方法进行对比。实验展示了χ²原理的效率和鲁棒性,且其通常优于L曲线和MDP方法。此外,在不掌握模型均值信息的情况下,MS稳定器可普遍用于χ²原理。

英文摘要

The $χ^2$-principle generalizes the Morozov discrepancy principle (MDP) to the augmented residual of the Tikhonov regularized least squares problem. When the data fidelity term is weighted by a known Gaussian noise distribution on the measured data and the regularization term is weighted by unknown inverse covariance information on the model parameters, the minimum of the Tikhonov functional is a random variable following a $χ^2$-distribution with $m+p-n$ degrees of freedom, where the model matrix $G$ is $m \times n$ and the regularizer $L$ is $p\times n$. It is proved that the result holds also for $m<n$ when $m+p\ge n$. A Newton root-finding algorithm is used to find the regularization parameter $α$ which yields the optimal inverse covariance weighting in the case of a white noise assumption on the mapped model data. It is implemented for small-scale problems using the generalized singular value decomposition. Numerical results verify the algorithm for regularizers approximating zeroth- to second-order derivatives, contrasted with the methods of generalized cross validation and unbiased predictive risk estimation. The inversion of underdetermined $2D$ focusing gravity data produces models with non-smooth properties, for which typical implementations in this field use the iterative minimum support (MS) stabilizer, with both the regularizer and the regularization parameter updated at each iteration. For a simulated data set with noise, the regularization parameter estimation methods for underdetermined data sets are used in this iterative framework and contrasted with the L-curve and MDP. Experiments demonstrate the efficiency and robustness of the $χ^2$-principle, which generally outperforms the L-curve and MDP. Furthermore, the MS stabilizer is of general use for the $χ^2$-principle when implemented without knowledge of a mean value of the model.

1412.4428 2026-06-04 stat.ME econ.GN q-fin.EC q-fin.MF

Nonparametric Stochastic Discount Factor Decomposition

非参数随机折现因子分解

Timothy Christensen

AI总结 本文提出了一种经验框架,用于分析随机折现因子过程的永久-暂时分解,通过非参数方法估计Hansen和Scheinkman(2009)的Perron-Frobenius特征函数问题的解,并研究了递归偏好模型中的持续价值函数非参数估计器。

详情
Journal ref
Econometrica 85(5) (2017) 1501-1536
AI中文摘要

动态经济中的随机折现因子(SDF)过程允许进行永久-暂时分解,其中永久成分表征长期投资期限内的定价。本文介绍了一个经验框架,用于分析SDF过程的永久-暂时分解。具体而言,我们展示了如何非参数地估计Hansen和Scheinkman(2009)的Perron-Frobenius特征函数问题的解。我们的经验框架使研究者能够(i)恢复估计的永久和暂时成分的时间序列,(ii)估计表征长期投资期限内定价的收益率和测度变换。我们还通过将价值函数递归重新解释为非线性Perron-Frobenius问题,为一类具有递归偏好的模型引入了延续价值函数的非参数估计量。我们建立了特征函数估计量的一致性和收敛速度,以及特征值估计量和相关泛函估计量的渐近正态性。作为应用,我们研究了一个代表性代理人具有递归偏好的经济体,并允许一般(非线性)的消费和收益增长动态。

英文摘要

Stochastic discount factor (SDF) processes in dynamic economies admit a permanent-transitory decomposition in which the permanent component characterizes pricing over long investment horizons. This paper introduces an empirical framework to analyze the permanent-transitory decomposition of SDF processes. Specifically, we show how to estimate nonparametrically the solution to the Perron-Frobenius eigenfunction problem of Hansen and Scheinkman (2009). Our empirical framework allows researchers to (i) recover the time series of the estimated permanent and transitory components and (ii) estimate the yield and the change of measure which characterize pricing over long investment horizons. We also introduce nonparametric estimators of the continuation value function in a class of models with recursive preferences by reinterpreting the value function recursion as a nonlinear Perron-Frobenius problem. We establish consistency and convergence rates of the eigenfunction estimators and asymptotic normality of the eigenvalue estimator and estimators of related functionals. As an application, we study an economy where the representative agent is endowed with recursive preferences, allowing for general (nonlinear) consumption and earnings growth dynamics.
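
A finite-state toy version of the Perron-Frobenius problem may help fix ideas. The sketch below is not the paper's nonparametric estimator: it assumes a made-up 2x2 positive matrix standing in for a discretized pricing operator and recovers its dominant eigenvalue and positive eigenvector by power iteration; `perron_pair` is a hypothetical name.

```python
def perron_pair(M, iters=200):
    """Power iteration for the dominant eigenvalue and positive
    eigenvector of a matrix with strictly positive entries."""
    n = len(M)
    v = [1.0 / n] * n                  # start from a positive vector summing to 1
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)                   # eigenvalue estimate, valid since sum(v) == 1
        v = [x / lam for x in w]       # renormalize so the entries sum to 1
    return lam, v

# Made-up positive matrix; its eigenvalues are 0.9 and 0.3.
lam, v = perron_pair([[0.5, 0.2], [0.4, 0.7]])
```

In this toy setting the dominant eigenvalue plays the role of the long-horizon discount factor, and the positive eigenvector is the discrete analogue of the eigenfunction that the paper estimates nonparametrically.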

1610.00199 2026-06-04 math.NA cs.NA stat.ML

Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation From Undersampled Data

欠采样数据子空间估计的Grassmannian梯度下降算法的收敛性

Dejiao Zhang, Laura Balzano

AI总结 本文研究了在欠采样数据下,基于Grassmannian约束的一阶增量梯度下降算法的收敛性,提出了一种自适应步长方案,证明了该方法在任意随机初始化下能够收敛到真实子空间,即使在非凸问题和正交约束下也能保证收敛。

详情
Comments
31 pages, 3 figures
AI中文摘要

子空间学习和矩阵分解问题在科学和工程中有着广泛的应用,随着数据集规模的增大,高效算法变得至关重要。许多相关的问题形式是非凸的,并且在各种场景中已观察到,直接求解非凸问题不仅高效,而且结果可靠准确。本文讨论一种特定方法的收敛理论:约束在Grassmannian流形上的一阶增量梯度下降。该算法的输出是输入流数据矩阵所张成的d维子空间的一组标准正交基。我们研究了两种采样情况:流数据矩阵中的每个数据向量被完全采样,或者被采样矩阵A_t∈R^{m×n}(m<<n)欠采样。我们的结果涵盖A_t为高斯矩阵或单位矩阵的行子集这两种情形。我们提出了一种仅依赖于采样数据和算法输出的自适应步长方案。我们证明了在完全采样数据的情况下,该步长方案在每次迭代中最大化收敛度量的改进,并且尽管问题非凸且带有正交约束,该方法从任意随机初始化出发都能收敛到真实子空间。对于欠采样数据的情况,我们证明了在每次迭代中,所定义的收敛度量以高概率具有单调的期望改进。

英文摘要

Subspace learning and matrix factorization problems have a great many applications in science and engineering, and efficient algorithms are critical as dataset sizes continue to grow. Many relevant problem formulations are non-convex, and in a variety of contexts it has been observed that solving the non-convex problem directly is not only efficient but reliably accurate. We discuss convergence theory for a particular method: first order incremental gradient descent constrained to the Grassmannian. The output of the algorithm is an orthonormal basis for a $d$-dimensional subspace spanned by an input streaming data matrix. We study two sampling cases: where each data vector of the streaming matrix is fully sampled, or where it is undersampled by a sampling matrix $A_t\in \mathbb{R}^{m\times n}$ with $m\ll n$. Our results cover two cases, where $A_t$ is Gaussian or a subset of rows of the identity matrix. We propose an adaptive stepsize scheme that depends only on the sampled data and algorithm outputs. We prove that with fully sampled data, the stepsize scheme maximizes the improvement of our convergence metric at each iteration, and this method converges from any random initialization to the true subspace, despite the non-convex formulation and orthogonality constraints. For the case of undersampled data, we establish monotonic expected improvement on the defined convergence metric for each iteration with high probability.
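
A rank-one geodesic step on the Grassmannian for a fully sampled vector can be sketched as follows. The update form and the step angle `theta = arctan(||r|| / ||p||)` are assumptions made for illustration (the abstract does not spell out the stepsize rule), and `grouse_step` is a hypothetical name.

```python
import numpy as np

def grouse_step(U, v):
    """One rank-one geodesic update of an orthonormal basis U (n x d)
    toward a fully observed data vector v (length n)."""
    w = U.T @ v                 # least-squares weights (U has orthonormal columns)
    p = U @ w                   # projection of v onto span(U)
    r = v - p                   # residual, orthogonal to span(U)
    p_norm, r_norm = np.linalg.norm(p), np.linalg.norm(r)
    if p_norm < 1e-12 or r_norm < 1e-12:
        return U                # no rotation is defined or needed in these edge cases
    theta = np.arctan(r_norm / p_norm)      # assumed greedy step angle
    direction = (np.cos(theta) - 1.0) * p / p_norm + np.sin(theta) * r / r_norm
    return U + np.outer(direction, w / np.linalg.norm(w))

rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((6, 2)))   # random orthonormal start
v = rng.standard_normal(6)
U1 = grouse_step(U0, v)
```

With this angle the updated basis stays orthonormal and its span contains `v` exactly, which is one way to see why such a step makes the largest possible progress toward a fully observed vector.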

1905.08870 2026-06-04 stat.AP cs.SY econ.GN eess.SY q-fin.EC

The perils of automated fitting of datasets: the case of a wind turbine cost model

自动化拟合数据集的危险:风力涡轮机成本模型的案例

Claude Klöckl, Katharina Gruber, Peter Regner, Sebastian Wehrle, Johannes Schmidt

AI总结 本文评述了Rinne等人通过自动化回归分析得到的风力涡轮机成本模型:其特定地点投资成本模型较为先进,但非特定地点成本模型存在缺陷,将其应用于其他场景可能得出不合理的结果。

详情
Comments
Updated for submission to Examples and Counterexamples. In response to referee feedback, we have extensively revised the paper and integrated new data (the-windpower.net), updated all figures, and made sure that the wind turbine examples are more numerous, named explicitly, and show implausible behavior more clearly.
AI中文摘要

Rinne 等人对风力涡轮机技术及土地利用对风能潜力的影响进行了有趣的分析,深入揭示了各个因素对总体潜力的贡献。该论文提出了一个用于特定地点风力涡轮机投资成本(即道路和电网接入成本)的详细模型,并辅以一个用于估计非特定地点成本的模型。我们认为他们提出的特定地点投资成本模型是前沿的。然而,我们认为其非特定地点成本模型存在缺陷。这一缺陷很可能不影响该论文给出的结果,但我们预计会有相当大的泛化误差。因此,将该风力涡轮机成本模型应用于其他情境可能导致不合理的结果。更广泛地说,该风力涡轮机成本模型的推导是自动化回归分析应用出错的一个例子。

英文摘要

Rinne et al. conduct an interesting analysis of the impact of wind turbine technology and land-use on wind power potentials, which allows profound insights into each factor's contribution to overall potentials. The paper presents a detailed model of site-specific wind turbine investment cost (i.e. road- and grid access costs) complemented by a model used to estimate site-independent costs. We believe that they propose a cutting-edge model of site-specific investment costs. However, the site-independent cost model is flawed in our opinion. This flaw most likely does not impact the results presented in the paper, although we expect a considerable generalization error. Thus the application of the wind turbine cost model in other contexts may lead to unreasonable results. More generally, the derivation of the wind turbine cost model serves as an example of how applications of automated regression analysis can go wrong.

1807.11660 2026-06-04 stat.AP cs.SY eess.SY

Unmanned Aerial Vehicle Path Planning for Traffic Estimation and Detection of Non-Recurrent Congestion

无人机路径规划用于交通估计及非重复拥堵检测

Cesar N. Yahia, Shannon E. Scott, Stephen D. Boyles, Christian G. Claudel

AI总结 本文提出了一种基于无人机的路径规划算法,通过结合无人机观测与速度-密度传感器数据,以最小化道路/交通状态的不确定性,从而提高非重复拥堵情况下的交通事件检测能力。

详情
Journal ref
Transportation Letters (2021): 1-14
AI中文摘要

无人机(UAVs)提供了一种从视频数据中提取道路和交通信息的新方法。通过分析视频帧中的对象,无人机可以检测交通特征和道路事件。利用无人机的机动性和检测能力,我们研究了一种导航算法,旨在在非重复拥堵条件下最大化道路/交通状态的信息。我们提出了一种主动探索框架,该框架(1)将无人机观测与速度-密度传感器数据相结合,(2)量化道路/交通状态的不确定性,(3)自适应地导航无人机以最小化这种不确定性。导航算法使用A-最优信息度量(均值不确定性),并依赖于由双状态集合卡尔曼滤波器(EnKF)生成的协方差矩阵。在EnKF过程中,由于观测是事件状态变量的非线性函数,我们使用代表模型预测测量的诊断变量。我们还提出了一种状态更新程序,以维持事件参数与测量之间的单调关系。我们比较了无人机导航-估计程序所得到的交通/事件状态估计结果与不使用目标无人机观测的相应估计结果。我们的结果表明,无人机在速度-密度数据不具信息性的情况下,有助于检测拥堵条件下的事件。

英文摘要

Unmanned aerial vehicles (UAVs) provide a novel means of extracting road and traffic information from video data. In particular, by analyzing objects in a video frame, UAVs can detect traffic characteristics and road incidents. Leveraging the mobility and detection capabilities of UAVs, we investigate a navigation algorithm that seeks to maximize information on the road/traffic state under non-recurrent congestion. We propose an active exploration framework that (1) assimilates UAV observations with speed-density sensor data, (2) quantifies uncertainty on the road/traffic state, and (3) adaptively navigates the UAV to minimize this uncertainty. The navigation algorithm uses the A-optimal information measure (mean uncertainty), and it depends on covariance matrices generated by a dual state ensemble Kalman filter (EnKF). In the EnKF procedure, since observations are a nonlinear function of the incident state variables, we use diagnostic variables that represent model predicted measurements. We also present a state update procedure that maintains a monotonic relationship between incident parameters and measurements. We compare the traffic/incident state estimates resulting from the UAV navigation-estimation procedure against corresponding estimates that do not use targeted UAV observations. Our results indicate that UAVs aid in detection of incidents under congested conditions where speed-density data are not informative.
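
The A-optimal criterion mentioned above reduces to comparing covariance traces. The sketch below assumes a plain linear Kalman covariance update in place of the paper's dual state ensemble Kalman filter, with a made-up two-cell road state; all names are hypothetical.

```python
import numpy as np

def posterior_cov(P, H, R):
    """Kalman covariance update for a linear observation y = H x + noise."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return P - K @ H @ P

def a_optimal_choice(P, candidates, R):
    """Index of the candidate observation operator that minimizes the
    trace of the updated covariance (the A-optimality criterion)."""
    traces = [float(np.trace(posterior_cov(P, H, R))) for H in candidates]
    return int(np.argmin(traces)), traces

P = np.diag([4.0, 1.0])                       # made-up prior variance per road cell
candidates = [np.array([[1.0, 0.0]]),         # observe cell 0
              np.array([[0.0, 1.0]])]         # observe cell 1
best, traces = a_optimal_choice(P, candidates, R=np.array([[1.0]]))
```

Here observing the more uncertain cell wins, since it removes the most variance; a navigation rule of this kind would send the UAV wherever the expected trace reduction is largest.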

1610.02962 2026-06-04 stat.ML cs.NA math.NA

Low-Rank Dynamic Mode Decomposition: An Exact and Tractable Solution

低秩动态模式分解:一个精确且可处理的解决方案

Patrick Héas, Cédric Herzet

AI总结 本文研究了使用低秩动态模式分解对高维动态系统进行线性近似,提出了一种精确且可处理的解决方案,并通过数值模拟验证了其有效性。

详情
Journal ref
Journal of Nonlinear Science, 2021
AI中文摘要

本文研究了使用低秩动态模式分解对高维动态系统进行线性近似。通过数据驱动的方法寻找此近似,被形式化为求解一个低秩约束优化问题。该问题是非凸的,现有最先进的算法都是次优的。本文证明存在一个闭式解,该解可在多项式时间内计算,并刻画了最优近似误差的l2范数。本文还提出了基于奇异值分解或特征值分解构建简化模型的低复杂度算法。这些算法通过使用合成和物理数据基准的数值模拟进行评估。

英文摘要

This work studies the linear approximation of high-dimensional dynamical systems using low-rank dynamic mode decomposition (DMD). Searching for this approximation in a data-driven approach is formalised as attempting to solve a low-rank constrained optimisation problem. This problem is non-convex and state-of-the-art algorithms are all sub-optimal. This paper shows that there exists a closed-form solution, which is computed in polynomial time, and characterises the $\ell_2$-norm of the optimal approximation error. The paper also proposes low-complexity algorithms building reduced models from this optimal solution, based on singular value decomposition or eigenvalue decomposition. The algorithms are evaluated by numerical simulations using synthetic and physical data benchmarks.
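
To make the rank-constrained problem concrete, the sketch below uses the classical reduced-rank regression construction for minimizing the Frobenius error of `Y - A X` over matrices `A` of rank at most `k`. Whether this coincides term by term with the paper's closed form is an assumption; the data are synthetic and `low_rank_dmd` is a hypothetical name.

```python
import numpy as np

def low_rank_dmd(X, Y, k):
    """Rank-k operator A minimizing ||Y - A X||_F, via the classical
    reduced-rank regression construction."""
    X_pinv = np.linalg.pinv(X)
    Z = Y @ X_pinv @ X                  # Y projected onto the row space of X
    Uz, _, _ = np.linalg.svd(Z, full_matrices=False)
    P = Uz[:, :k]                       # top-k left singular vectors of Z
    return P @ (P.T @ Y @ X_pinv)

rng = np.random.default_rng(1)
n, m, k = 5, 40, 2
A_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # rank-k operator
X = rng.standard_normal((n, m))         # input snapshots (synthetic)
Y = A_true @ X                          # their one-step images, noise-free
A_hat = low_rank_dmd(X, Y, k)
```

On noise-free data generated by a rank-`k` operator this construction recovers the operator exactly; with noisy snapshots it returns the best rank-`k` fit in the Frobenius norm.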

1904.11665 2026-06-04 stat.CO cs.NA math.NA math.ST stat.TH

Rapid evaluation of the spectral signal detection threshold and Stieltjes transform

快速评估谱信号检测阈值和Stieltjes变换

William Leeb

AI总结 本文提出了一种快速算法,用于在大数据矩阵极限下评估谱信号检测阈值,并设计了新的算法来计算谱分布的Stieltjes变换,以提高谱去噪方法的参数估计精度。

详情
AI中文摘要

准确检测信号成分是低信噪比统计应用中经常遇到的挑战,在异方差噪声环境下尤为困难。在某些信号加噪声数据模型中,如经典尖峰(spiked)协方差模型及其变体,对于各向同性噪声,在数据矩阵无限大的极限下,谱信号检测阈值(可仅归因于噪声的最大样本特征值)存在闭式公式。然而,对于更一般的噪声模型,目前缺乏可证明快速且准确的阈值数值计算方法。本文提出了一种快速算法,用于在数据矩阵无限大的极限下计算谱信号检测阈值。我们考虑具有可分离方差轮廓(其方差矩阵为秩一)的噪声矩阵,因为这类矩阵在应用中经常出现。该解法基于牛顿法的嵌套应用。我们还设计了一种新算法,用于在超过阈值的实数值处计算谱分布的Stieltjes变换。已知该域上的Stieltjes变换是谱去噪方法参数估计中的关键量。两种算法的正确性均通过详细分析刻画Stieltjes变换的主方程得到证明,其性能在数值实验中得到展示。

英文摘要

Accurate detection of signal components is a frequently-encountered challenge in statistical applications with low signal-to-noise ratio. This problem is particularly challenging in settings with heteroscedastic noise. In certain signal-plus-noise models of data, such as the classical spiked covariance model and its variants, there are closed formulas for the spectral signal detection threshold (the largest sample eigenvalue attributable solely to noise) for isotropic noise in the limit of infinitely large data matrices. However, more general noise models currently lack provably fast and accurate methods for numerically evaluating the threshold. In this work, we introduce a rapid algorithm for evaluating the spectral signal detection threshold in the limit of infinitely large data matrices. We consider noise matrices with a separable variance profile (whose variance matrix is rank one), as these arise often in applications. The solution is based on nested applications of Newton's method. We also devise a new algorithm for evaluating the Stieltjes transform of the spectral distribution at real values exceeding the threshold. The Stieltjes transform on this domain is known to be a key quantity in parameter estimation for spectral denoising methods. The correctness of both algorithms is proven from a detailed analysis of the master equations characterizing the Stieltjes transform, and their performance is demonstrated in numerical experiments.

1710.04465 2026-06-04 cs.RO cs.SY eess.SY stat.CO

Markerless visual servoing on unknown objects for humanoid robot platforms

无标记未知物体的人形机器人视觉伺服

Claudio Fantacci, Giulia Vezzani, Ugo Pattacini, Vadim Tikhanoff, Lorenzo Natale

AI总结 本文提出了一种面向人形机器人、针对未知物体的无标记视觉伺服框架:通过立体视觉计算物体的可抓取体积,利用递归贝叶斯滤波估计末端执行器的6D姿态,通过非线性约束优化问题计算目标姿态,并通过基于图像的视觉伺服控制实现对末端执行器的精确控制。

详情
Journal ref
IEEE International Conference on Robotics and Automation (ICRA), 2018
AI中文摘要

为了让人形机器人精确地伸手够到一个物体,关键在于充分掌握末端执行器以及物体的姿态和形状。本文提出了一种针对未知物体的无标记视觉伺服框架,分为四个主要部分:I) 建立最小二乘最小化问题,利用机器人的立体视觉求出机器人手可抓取的物体体积;II) 基于序贯蒙特卡洛(SMC)滤波的递归贝叶斯滤波技术,在不使用标记的情况下估计机器人末端执行器的6D姿态(位置和方向);III) 建立非线性约束优化问题,计算相对于物体的期望可抓取姿态;IV) 通过基于图像的视觉伺服控制,命令机器人末端执行器向期望姿态移动。我们通过在iCub人形机器人平台上的大量实验验证了该方法的有效性和鲁棒性,实现了实时计算、平滑轨迹和亚像素精度。

英文摘要

To precisely reach for an object with a humanoid robot, it is of central importance to have good knowledge of the end-effector, object pose and shape. In this work we propose a framework for markerless visual servoing on unknown objects, which is divided into four main parts: I) a least-squares minimization problem is formulated to find the volume of the object graspable by the robot's hand using its stereo vision; II) a recursive Bayesian filtering technique, based on Sequential Monte Carlo (SMC) filtering, estimates the 6D pose (position and orientation) of the robot's end-effector without the use of markers; III) a nonlinear constrained optimization problem is formulated to compute the desired graspable pose about the object; IV) an image-based visual servo control commands the robot's end-effector toward the desired pose. We demonstrate the effectiveness and robustness of our approach with extensive experiments on the iCub humanoid robot platform, achieving real-time computation, smooth trajectories and sub-pixel precision.

1512.04811 2026-06-04 physics.data-an cs.NA math.NA stat.AP

Entropy-based Time-Varying Window Width Selection for Nonlinear type Time-Frequency Analysis

基于熵的时变窗口宽度选择用于非线性类型时频分析

Yae-lin Sheu, Liang-Yan Hsu, Pi-Tai Chou, Hau-tieng Wu

AI总结 本文提出了一种时变最优窗口宽度选择方法,用于优化多种非线性时频分析方法的性能,包括重分配方法和同步压缩变换及其变体;其核心思想是选择使时频表示中的分布最集中的窗口。

详情
AI中文摘要

我们提出了一种时变最优窗口宽度(TVOWW)选择方案,以优化几种非线性类型时频分析方法的性能,包括重分配方法和同步压缩变换(SST)及其变体。使时频表示(TFR)中的分布最集中的窗口被视为最优窗口。TVOWW选择方案特别适用于包含快速变化的瞬时频率和小频谱间隙的信号。为了证明该方法的有效性,除了分析合成信号外,我们还研究了阿秒物理中由双色中红外激光场驱动的原子时变偶极矩。

英文摘要

We propose a time-varying optimal window width (TVOWW) selection scheme to optimize the performance of several nonlinear-type time-frequency analyses, including the reassignment method, and the synchrosqueezing transform (SST) and its variations. A window rendering the most concentrated distribution in the time-frequency representation (TFR) is regarded as the optimal window. The TVOWW selection scheme is particularly useful for signals that comprise fast-varying instantaneous frequencies and small spectral gaps. To demonstrate the efficacy of the method, in addition to analyzing a synthetic signal, we study an atomic time-varying dipole moment driven by two-color mid-infrared laser fields in attosecond physics.

1709.04673 2026-06-04 eess.SY cs.SY math.DS stat.ML

Analyzing Approximate Value Iteration Algorithms

分析近似值迭代算法

Arunselvan Ramaswamy, Shalabh Bhatnagar

AI总结 本文研究了在仅能获得Bellman算子的含噪且可能有偏的近似时,值迭代方案的随机迭代版本,即近似值迭代(AVI)方案。本文用神经网络近似Bellman算子,考虑了训练样本带来的误差和偏差,并给出了可验证的充分条件,确保AVI稳定(几乎必然有界)并收敛到近似Bellman算子的不动点。此外,还展示了AVI的稳定性分析可以扩展到集值随机逼近的一般情况,并可用于寻找压缩集值映射的不动点。

详情
AI中文摘要

在本文中,我们考虑了值迭代方案的随机迭代版本,其中仅能获得Bellman算子的含噪且可能有偏的近似。我们将这种版本称为近似值迭代(AVI)方案。神经网络常被用作函数逼近器,以应对Bellman的维度诅咒。在本文中,它们被用于近似Bellman算子。由于神经网络通常通过样本数据进行训练,可能会引入误差和偏差。AVI的设计考虑了Bellman算子的有偏近似和采样误差。我们给出了可验证的充分条件,确保AVI稳定(几乎必然有界)并收敛到近似Bellman算子的不动点。为了确保AVI的稳定性,我们给出了三组不同但相关的充分条件,这些条件基于适当Lyapunov函数的存在性。这些基于Lyapunov函数的条件易于验证,并且在文献中是新的。此外,本文还给出了构造所需Lyapunov函数的方法,进一步增强了可验证性。我们还展示了AVI的稳定性分析可以很容易地扩展到集值随机逼近的一般情况。最后,我们展示AVI也可用于更一般的情形,即寻找压缩集值映射的不动点。

英文摘要

In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI) scheme. Neural networks are often used as function approximators, in order to counter Bellman's curse of dimensionality. In this paper, they are used to approximate the Bellman operator. Since neural networks are typically trained using sample data, errors and biases may be introduced. The design of AVI accounts for implementations with biased approximations of the Bellman operator and sampling errors. We present verifiable sufficient conditions under which AVI is stable (almost surely bounded) and converges to a fixed point of the approximate Bellman operator. To ensure the stability of AVI, we present three different yet related sets of sufficient conditions that are based on the existence of an appropriate Lyapunov function. These Lyapunov-function-based conditions are easily verifiable and new to the literature. The verifiability is enhanced by the fact that a recipe for the construction of the necessary Lyapunov function is also provided. We also show that the stability analysis of AVI can be readily extended to the general case of set-valued stochastic approximations. Finally, we show that AVI can also be used in more general circumstances, i.e., for finding fixed points of contractive set-valued maps.
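
The stochastic iteration described above can be illustrated on a toy problem. This sketch assumes a made-up two-state, two-action MDP, zero-mean Gaussian noise added to the exact Bellman operator (so the biased case covered by the paper is not exercised), and a diminishing step size; no neural network is involved and all names are hypothetical.

```python
import random

GAMMA = 0.5
# Toy MDP (made up): REWARD[s][a] and deterministic next state NEXT[s][a].
REWARD = [[1.0, 0.0], [0.0, 2.0]]
NEXT = [[0, 1], [0, 1]]

def bellman(V):
    """Exact Bellman optimality operator for the toy MDP."""
    return [max(REWARD[s][a] + GAMMA * V[NEXT[s][a]] for a in range(2))
            for s in range(2)]

def noisy_value_iteration(steps=5000, noise_std=0.1, seed=0):
    """Stochastic iteration V <- V + a_n * (T V + noise - V)."""
    rng = random.Random(seed)
    V = [0.0, 0.0]
    for n in range(1, steps + 1):
        a_n = n ** -0.6                  # diminishing step size
        TV = bellman(V)
        V = [v + a_n * (tv + rng.gauss(0.0, noise_std) - v)
             for v, tv in zip(V, TV)]
    return V

V_exact = [0.0, 0.0]
for _ in range(100):                     # plain value iteration as a reference
    V_exact = bellman(V_exact)
V_noisy = noisy_value_iteration()
```

The reference fixed point of this toy problem is `[2, 4]`, and the noisy iterates settle close to it; stability and convergence in the presence of bias are what the paper's Lyapunov conditions are meant to guarantee.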

1902.03056 2026-06-04 math.ST cs.NA math.NA math.OC stat.TH

Bernstein Concentration Inequalities for Tensors via Einstein Products

通过爱因斯坦乘积的张量伯恩斯坦集中不等式

Z. Luo, L. Qi, Ph. L. Toint

AI总结 本文提出了一种将伯恩斯坦矩阵集中不等式推广到一般阶随机张量的方法,通过爱因斯坦乘积建立矩阵与张量之间的强联系,从而利用已有的矩阵结果。

详情
Journal ref
Frontiers of Mathematics in China, vol. 5(2), pp. 367-384, 2020
Comments
12 pages
AI中文摘要

本文提出了一种将伯恩斯坦矩阵集中不等式推广到一般阶随机张量的方法。该推广基于张量之间的爱因斯坦乘积的使用,从而可以建立矩阵与张量之间的强联系,进而可以利用已有的对于前者的现有结果。

英文摘要

A generalization of the Bernstein matrix concentration inequality to random tensors of general order is proposed. This generalization is based on the use of Einstein products between tensors, from which a strong link can be established between matrices and tensors, in turn allowing exploitation of existing results for the former.
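
The link between tensors and matrices that the Einstein product provides can be checked numerically. The sketch below assumes the usual definition, in which the trailing modes of the first tensor are contracted with the leading modes of the second, and verifies that unfolding turns the product into an ordinary matrix product; function names and shapes are chosen for illustration.

```python
import numpy as np

def einstein_product(A, B, n_contract):
    """Contract the last n_contract modes of A with the first
    n_contract modes of B."""
    return np.tensordot(A, B, axes=n_contract)

def unfold(T, n_row_modes):
    """Reshape a tensor into a matrix whose rows index the first
    n_row_modes modes and whose columns index the rest."""
    rows = int(np.prod(T.shape[:n_row_modes]))
    return T.reshape(rows, -1)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4, 5))    # modes (I1, I2, K1, K2)
B = rng.standard_normal((4, 5, 3, 2))    # modes (K1, K2, J1, J2)
C = einstein_product(A, B, 2)            # modes (I1, I2, J1, J2)
```

Because `unfold(C, 2)` equals `unfold(A, 2) @ unfold(B, 2)`, statements about products of random matrices can be carried over to tensors of matching shapes, which is the kind of transfer the abstract refers to.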

1903.00979 2026-06-04 math.OC cs.LG cs.SY eess.SY math.DS stat.ML

Analysis of a Generalized Expectation-Maximization Algorithm for Gaussian Mixture Models: A Control Systems Perspective

高斯混合模型广义期望最大化算法的分析:控制系统视角

Sarthak Chatterjee, Orlando Romero, Sérgio Pequito

AI总结 本文从控制系统的角度分析了高斯混合模型的一种广义期望最大化(GEM)算法,探讨了其收敛性质,并通过示例展示了该方法的优势。

详情
Comments
17 pages, 7 figures
AI中文摘要

期望最大化(EM)算法是无监督学习中求解基于参数分布的聚类问题最常用的方法之一。在本文中,我们在高斯混合模型的背景下分析一种广义EM(GEM)算法,其中EM的最大化步骤被替换为递增步骤。我们证明这种GEM算法可以被理解为带有反馈非线性的线性时不变(LTI)系统。因此,我们利用鲁棒控制理论的工具来探索其部分收敛性质。最后,我们解释了如何设计所提出的GEM,并通过一个教学示例说明所提出方法的优势。

英文摘要

The Expectation-Maximization (EM) algorithm is one of the most popular methods used to solve the problem of parametric distribution-based clustering in unsupervised learning. In this paper, we propose to analyze a generalized EM (GEM) algorithm in the context of Gaussian mixture models, where the maximization step in the EM is replaced by an increasing step. We show that this GEM algorithm can be understood as a linear time-invariant (LTI) system with a feedback nonlinearity. Therefore, we explore some of its convergence properties by leveraging tools from robust control theory. Lastly, we explain how the proposed GEM can be designed, and present a pedagogical example to understand the advantages of the proposed approach.

1705.09395 2026-06-04 stat.CO cs.NA math.NA

Optimal Experimental Design Using A Consistent Bayesian Approach

使用一致贝叶斯方法进行最优实验设计

Scott N. Walsh, Tim M. Wildey, John D. Jakeman

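AI summary: This paper proposes an optimal experimental design method based on the consistent Bayesian approach, in which a computational model guides the optimal acquisition of experimental data to inform the stochastic description of model input parameters; numerical experiments demonstrate the method on PDE-based models.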

详情
AI中文摘要

我们考虑利用计算模型指导最优实验数据获取,以改进模型输入参数的随机描述。我们的方法基于最近发展的用于求解随机逆问题的一致贝叶斯方法,该方法旨在寻找一个后验概率密度,使其在模型和数据的意义下一致,即后验密度通过计算模型的推动匹配观测密度几乎处处。给定一组潜在观测,我们的最优实验设计(OED)寻求最大化从先验概率密度在模型参数上的预期信息增益的观测,或观测集。我们讨论了观测密度空间的表征以及一种计算高效的方法,用于将观测密度重新缩放以满足一致贝叶斯方法的基本假设。数值结果用于将我们的方法与现有OED方法进行比较,并在一组代表性PDE模型上展示我们的OED。

英文摘要

We consider the utilization of a computational model to guide the optimal acquisition of experimental data to inform the stochastic description of model input parameters. Our formulation is based on the recently developed consistent Bayesian approach for solving stochastic inverse problems which seeks a posterior probability density that is consistent with the model and the data in the sense that the push-forward of the posterior (through the computational model) matches the observed density on the observations almost everywhere. Given a set of potential observations, our optimal experimental design (OED) seeks the observation, or set of observations, that maximizes the expected information gain from the prior probability density on the model parameters. We discuss the characterization of the space of observed densities and a computationally efficient approach for rescaling observed densities to satisfy the fundamental assumptions of the consistent Bayesian approach. Numerical results are presented to compare our approach with existing OED methodologies using the classical/statistical Bayesian approach and to demonstrate our OED on a set of representative PDE-based models.

1704.00680 2026-06-04 math.NA cs.NA stat.CO

A Consistent Bayesian Formulation for Stochastic Inverse Problems Based on Push-forward Measures

基于推进测度的随机反问题一致贝叶斯公式

T. Butler, J. D. Jakeman, T. Wildey

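AI summary: This paper presents a consistent Bayesian formulation, based on push-forward measures, for stochastic inverse problems: a Bayesian updating approach infers the parameters of a deterministic model from stochastic observational data, and analytical and numerical results illustrate the properties of the approach.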

详情
AI中文摘要

我们提出了一个反问题的公式,并给出了求解该反问题的数值方法,该反问题旨在从随机观测数据(感兴趣的量)中推断确定性模型的参数。解以概率测度形式给出,通过贝叶斯更新方法对可测映射进行处理,得到一个后验概率测度,当通过确定性模型传播时,会产生一个与观测概率测度在数据上精确匹配的推进测度。我们提出的方法称为一致贝叶斯推断,其简单且仅需计算由先验概率测度和确定性模型组合诱导的推进概率测度。我们建立了观测一致后验的存在性和唯一性,并给出了稳定性与误差分析。我们还讨论了一致贝叶斯推断与经典/统计贝叶斯推断以及最近发展出的测度论推断方法之间的关系。最后,通过分析和数值结果来突出一致贝叶斯方法的某些性质以及该方法与上述两种替代推断方法之间的差异。

英文摘要

We formulate, and present a numerical method for solving, an inverse problem for inferring parameters of a deterministic model from stochastic observational data (quantities of interest). The solution, given as a probability measure, is derived using a Bayesian updating approach for measurable maps that finds a posterior probability measure that, when propagated through the deterministic model, produces a push-forward measure that exactly matches the observed probability measure on the data. Our approach for finding such posterior measures, which we call consistent Bayesian inference, is simple and only requires the computation of the push-forward probability measure induced by the combination of a prior probability measure and the deterministic model. We establish existence and uniqueness of observation-consistent posteriors and present stability and error analysis. We also discuss the relationships between consistent Bayesian inference, classical/statistical Bayesian inference, and a recently developed measure-theoretic approach for inference. Finally, analytical and numerical results are presented to highlight certain properties of the consistent Bayesian approach and the differences between this approach and the two aforementioned alternatives for inference.
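To make the push-forward idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the density form of the update, posterior = prior × observed(Q(λ)) / push-forward-of-prior(Q(λ)), and samples it by rejection. The toy map Q(λ) = λ², the Uniform(-1, 1) prior, the Uniform(0.25, 0.64) observed density, and the function name `consistent_bayes_rejection` are all hypothetical choices made so that the push-forward density, 1/(2√q) on (0, 1), is known in closed form.

```python
import math
import random


def consistent_bayes_rejection(n_draws, rng):
    """Rejection sampling from pi_post = pi_prior * pi_obs(Q) / pi_pushforward(Q)."""
    obs_lo, obs_hi = 0.25, 0.64  # assumed observed density: Uniform(0.25, 0.64) on q

    def ratio(q):
        obs = 1.0 / (obs_hi - obs_lo) if obs_lo <= q <= obs_hi else 0.0
        pushforward = 1.0 / (2.0 * math.sqrt(q))  # density of q = lam**2 under the prior
        return obs / pushforward

    bound = ratio(obs_hi)  # the ratio is increasing in q on the observed support
    accepted = []
    for _ in range(n_draws):
        lam = rng.uniform(-1.0, 1.0)  # prior draw
        q = lam * lam
        # Accept with probability ratio(q) / bound; zero ratio is never accepted.
        if q > 0.0 and rng.random() * bound < ratio(q):
            accepted.append(lam)
    return accepted


posterior_samples = consistent_bayes_rejection(2000, random.Random(0))
```

Pushing the accepted samples through Q should reproduce the observed density, which is the consistency property the abstract describes; here every accepted λ satisfies 0.25 ≤ λ² ≤ 0.64 by construction.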

1708.02196 2026-06-04 stat.AP cs.SY eess.SY

Joint Smoothing, Tracking, and Forecasting Based on Continuous-Time Target Trajectory Fitting

基于连续时间目标轨迹拟合的联合平滑、跟踪与预测

Tiancheng Li, Huimin Chen, Shudong Sun, Juan M Corchado

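AI summary: This paper proposes a continuous-time state estimation framework that unifies the traditionally separate tasks of smoothing, tracking, and forecasting (STF) into a single online data-fitting problem for targets with smooth motion, such as targets moving with nearly constant acceleration or affected by insignificant noise. Unlike the conventional Markov transition formulation, the state process is modeled by a continuous trajectory function of time (FoT), and the best-fitting FoT is found over a sliding time window. The framework relaxes the need for stringent statistical modeling of target motion in real time and applies to real-world targets such as passenger aircraft and ships, which follow scheduled or segmented smooth paths but for which little statistical knowledge of their real-time motion or of the sensors is available. It also inherits the advantages of data fitting, accommodating arbitrary sensor revisit times, target maneuvers, and missed detections, and it is compared with state-of-the-art estimators in maneuvering and non-maneuvering target scenarios.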

详情
Journal ref
IEEE Transactions on Automation Science and Engineering, Volume: 16, Issue: 3, July 2019, Pages: 1476-1483
Comments
16 pages, 8 figures, 5 tables, 80 references; Codes available
AI中文摘要

我们提出了一种连续时间状态估计框架,将传统的平滑、跟踪和预测(STF)任务统一为一个在线数据拟合问题,适用于具有平滑运动过程的目标,例如目标以近似恒定加速度或受微小噪声影响移动。与传统的马尔可夫转移公式根本不同,状态过程通过连续时间轨迹函数(FoT)建模,STF问题被公式化为一个在线数据拟合问题,目标是找到最佳拟合观测的轨迹FoT。然后,目标的状态,无论是过去(即平滑)、当前(即过滤)还是近未来(即预测),都可以从FoT中推断出来。我们的框架在实时环境中释放了对目标运动的严格统计建模需求,并适用于多种现实世界目标,如乘客飞机和船舶,它们在计划或分段平滑路径上移动,但对其实时运动和传感器的统计知识有限。此外,所提出的STF框架继承了数据拟合的优点,能够适应任意传感器重访时间、目标机动和漏检。所提出的方法在机动或非机动目标场景中与最先进的估计器进行比较。

英文摘要

We present a continuous-time state estimation framework that unifies the traditionally separate tasks of smoothing, tracking, and forecasting (STF) for a class of targets subject to smooth motion processes, e.g., a target that moves with nearly constant acceleration or is affected by insignificant noise. Fundamentally different from the conventional Markov transition formulation, the state process is modeled by a continuous trajectory function of time (FoT), and the STF problem is formulated as an online data fitting problem with the goal of finding the trajectory FoT that best fits the observations in a sliding time window. Then, the state of the target, whether in the past (smoothing), at present (filtering), or in the near future (forecasting), can be inferred from the FoT. Our framework relaxes the need for stringent statistical modeling of the target motion in real time and is applicable to a broad range of significant real-world targets, such as passenger aircraft and ships, which move on scheduled, (segmented) smooth paths but for which little statistical knowledge is available about their real-time movement or even about the sensors. In addition, the proposed STF framework inherits the advantages of data fitting in accommodating arbitrary sensor revisit times, target maneuvering, and missed detections. The proposed method is compared with state-of-the-art estimators in scenarios with either maneuvering or non-maneuvering targets.

1701.00178 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML

Lazily Adapted Constant Kinky Inference for Nonparametric Regression and Model-Reference Adaptive Control

惰性适应的常数Kinky推断用于非参数回归和模型参考自适应控制

Jan-Peter Calliess

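AI summary: This paper proposes lazily adapted constant kinky inference for nonparametric regression and model-reference adaptive control: the Hölder constant is estimated online, and strong universal approximation guarantees show that any continuous function can be learned in the limit of increasingly dense data.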

详情
AI中文摘要

非线性集合成员预测、Lipschitz插值或Kinky推断是机器学习中利用预设Lipschitz性质来计算未观测函数值推断的方法。在已知目标函数真实最佳Lipschitz常数的上界时,这些方法提供收敛保证和预测的界限。考虑一个更一般的设置,该设置基于相对于伪度量的Hölder连续性,我们提出了一种在线方法,用于从可能受有界观测误差影响的函数值观测中估计Hölder常数。利用此方法在Kinky推断规则中计算自适应参数,从而得到一种非参数机器学习方法,我们为此建立了强通用逼近保证。也就是说,我们证明我们的预测规则在数据越来越密集的情况下,可以学习任意连续函数,其最坏误差界取决于观测不确定性水平。我们在非参数模型参考自适应控制(MRAC)的背景下应用了我们的方法。在一系列模拟飞机滚动动力学和性能指标中,我们的方法优于基于高斯过程和RBF神经网络最近提出的方法。对于离散时间系统,我们为我们的基于学习的控制器在批量学习和在线学习设置下的跟踪成功率提供了保证。

英文摘要

Techniques known as Nonlinear Set Membership prediction, Lipschitz Interpolation or Kinky Inference are approaches to machine learning that utilise presupposed Lipschitz properties to compute inferences over unobserved function values. Provided a bound on the true best Lipschitz constant of the target function is known a priori, they offer convergence guarantees as well as bounds around the predictions. Considering a more general setting that builds on Hölder continuity relative to pseudo-metrics, we propose an online method for estimating the Hölder constant from function value observations that are possibly corrupted by bounded observational errors. Utilising this to compute adaptive parameters within a kinky inference rule gives rise to a nonparametric machine learning method, for which we establish strong universal approximation guarantees. That is, we show that our prediction rule can learn any continuous function in the limit of increasingly dense data to within a worst-case error bound that depends on the level of observational uncertainty. We apply our method in the context of nonparametric model-reference adaptive control (MRAC). Across a range of simulated aircraft roll-dynamics and performance metrics our approach outperforms recently proposed alternatives that were based on Gaussian processes and RBF-neural networks. For discrete-time systems, we provide guarantees on the tracking success of our learning-based controllers both for the batch and the online learning setting.
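The Lipschitz-interpolation idea behind kinky inference can be illustrated with a minimal sketch. It assumes one-dimensional inputs, the metric |x − y|, Hölder exponent p = 1, and noise-free observations. The function names `estimate_holder_constant` and `kinky_predict` are hypothetical, and the constant estimate shown is the plain empirical maximum slope, which may differ from the paper's exact lazily adapted estimator.

```python
from itertools import combinations


def estimate_holder_constant(xs, fs, p=1.0):
    """Smallest constant consistent with the data: max |f_i - f_j| / |x_i - x_j|**p."""
    return max(
        abs(fi - fj) / abs(xi - xj) ** p
        for (xi, fi), (xj, fj) in combinations(zip(xs, fs), 2)
        if xi != xj
    )


def kinky_predict(x, xs, fs, L, p=1.0):
    """Midpoint between the tightest upper and lower Hölder envelopes at x."""
    upper = min(f + L * abs(x - s) ** p for s, f in zip(xs, fs))
    lower = max(f - L * abs(x - s) ** p for s, f in zip(xs, fs))
    return 0.5 * (upper + lower)


# Toy data sampled from f(x) = |x|, whose best Lipschitz constant is 1.
xs = [-2.0, -1.0, 0.5, 2.0]
fs = [abs(x) for x in xs]
L_hat = estimate_holder_constant(xs, fs)
prediction = kinky_predict(1.0, xs, fs, L_hat)
```

With a constant at least as large as the true one, the two envelopes bracket the target function, so the gap `upper - lower` gives a worst-case error bound around each prediction, and the predictor reproduces the training values exactly.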

1904.06569 2026-06-04 math.NA cs.NA stat.ME

Regularity and convergence analysis in Sobolev and Hölder spaces for generalized Whittle-Matérn fields

Sobolev 和 Hölder 空间中的广义 Whittle-Matérn 场的正则性和收敛性分析

Sonja G. Cox, Kristin Kirchner

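AI summary: This paper studies the regularity and convergence of Galerkin approximations of generalized Whittle-Matérn fields in Sobolev and Hölder spaces. Under minimal assumptions, it proves that spectral Galerkin methods and finite element approximations converge in $L_q(Ω; H^σ(\mathcal{D}))$ and $L_q(Ω; C^δ(\overline{\mathcal{D}}))$ at (essentially) optimal rates, and it provides error estimates for the covariance function.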

详情
Journal ref
Numer. Math. 146 (2020) 819-873
Comments
41 pages, 2 figures
AI中文摘要

我们分析了高斯随机场 $\mathcal{Z}\colon\mathcal{D}\timesΩ\to\mathbb{R}$ 的几种 Galerkin 近似,该随机场以欧氏区域 $\mathcal{D}\subset\mathbb{R}^d$ 为指标集,其协方差结构由二阶椭圆微分算子 $L:=-\nabla\cdot(A\nabla) + κ^2$ 的负分数次幂 $L^{-2β}$ 决定。在对区域 $\mathcal{D}$、系数 $A\colon\mathcal{D}\to\mathbb{R}^{d\times d}$、$κ\colon\mathcal{D}\to\mathbb{R}$ 以及分数指数 $β>0$ 的最小假设下,我们证明了 (i) 谱 Galerkin 方法和 (ii) 有限元近似在 $L_q(Ω; H^σ(\mathcal{D}))$ 和 $L_q(Ω; C^δ(\overline{\mathcal{D}}))$ 中以(本质上)最优速率收敛。具体而言,我们的分析仅基于微分算子 $L$ 的 $H^{1+α}(\mathcal{D})$ 正则性,其中 $0<α\leq 1$。对于这种设定,我们进一步给出了这些近似的协方差函数误差在 $L_{\infty}(\mathcal{D}\times\mathcal{D})$ 和混合 Sobolev 空间 $H^{σ,σ}(\mathcal{D}\times\mathcal{D})$ 中的严格估计,表明其收敛速度比相应的 $L_q(Ω; H^σ(\mathcal{D}))$ 速率快两倍以上。对于此类高斯随机场的著名例子,即原始的 Whittle-Matérn 类($L=-Δ+ κ^2$ 且 $κ\equiv \operatorname{const.}$),我们进行了若干数值实验以验证我们的理论结果。

英文摘要

We analyze several Galerkin approximations of a Gaussian random field $\mathcal{Z}\colon\mathcal{D}\timesΩ\to\mathbb{R}$ indexed by a Euclidean domain $\mathcal{D}\subset\mathbb{R}^d$ whose covariance structure is determined by a negative fractional power $L^{-2β}$ of a second-order elliptic differential operator $L:= -\nabla\cdot(A\nabla) + κ^2$. Under minimal assumptions on the domain $\mathcal{D}$, the coefficients $A\colon\mathcal{D}\to\mathbb{R}^{d\times d}$, $κ\colon\mathcal{D}\to\mathbb{R}$, and the fractional exponent $β>0$, we prove convergence in $L_q(Ω; H^σ(\mathcal{D}))$ and in $L_q(Ω; C^δ(\overline{\mathcal{D}}))$ at (essentially) optimal rates for (i) spectral Galerkin methods and (ii) finite element approximations. Specifically, our analysis is solely based on $H^{1+α}(\mathcal{D})$-regularity of the differential operator $L$, where $0<α\leq 1$. For this setting, we furthermore provide rigorous estimates for the error in the covariance function of these approximations in $L_{\infty}(\mathcal{D}\times\mathcal{D})$ and in the mixed Sobolev space $H^{σ,σ}(\mathcal{D}\times\mathcal{D})$, showing convergence which is more than twice as fast compared to the corresponding $L_q(Ω; H^σ(\mathcal{D}))$-rate. For the well-known example of such Gaussian random fields, the original Whittle-Matérn class, where $L=-Δ+ κ^2$ and $κ\equiv \operatorname{const.}$, we perform several numerical experiments which validate our theoretical results.
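As a rough illustration of a spectral Galerkin approximation, the sketch below draws one sample of a truncated eigenfunction expansion of $\mathcal{Z} = L^{-β}\mathcal{W}$, whose covariance operator is $L^{-2β}$. It assumes the simplest setting, not taken from the paper's experiments: $\mathcal{D}=(0,1)$, $L=-Δ+κ^2$ with constant $κ$ and homogeneous Dirichlet boundary conditions, so the eigenpairs are known in closed form. The function name `sample_whittle_matern_1d` is hypothetical.

```python
import math
import random


def sample_whittle_matern_1d(xs, beta, kappa, n_modes, rng):
    """Draw one truncated spectral sample of Z = L**(-beta) W on D = (0, 1).

    Assumes L = -d^2/dx^2 + kappa**2 with homogeneous Dirichlet boundary
    conditions, so e_j(x) = sqrt(2) * sin(j * pi * x) and
    lambda_j = kappa**2 + (j * pi)**2.
    """
    # Independent N(0, 1) weights scaled by lambda_j**(-beta).
    coeffs = [
        (kappa ** 2 + (j * math.pi) ** 2) ** (-beta) * rng.gauss(0.0, 1.0)
        for j in range(1, n_modes + 1)
    ]
    return [
        sum(c * math.sqrt(2.0) * math.sin(j * math.pi * x)
            for j, c in enumerate(coeffs, start=1))
        for x in xs
    ]


grid = [i / 100 for i in range(101)]
field = sample_whittle_matern_1d(grid, beta=0.75, kappa=1.0, n_modes=200, rng=random.Random(0))
```

Increasing `n_modes` refines the approximation; the sample vanishes at both endpoints because of the assumed Dirichlet conditions.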

1906.00729 2026-06-04 cs.LG cs.GT cs.SY eess.SY math.OC stat.ML

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

策略优化在零和线性二次博弈中可证明收敛至纳什均衡

Kaiqing Zhang, Zhuoran Yang, Tamer Başar

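AI summary: This paper studies the global convergence of policy optimization for finding Nash equilibria in zero-sum linear quadratic (LQ) games. By analyzing the optimization landscape of LQ games, it shows that stationary points with respect to linear feedback control policies constitute Nash equilibria, proposes three projected nested-gradient methods guaranteed to converge to the Nash equilibrium, and shows that these algorithms enjoy globally sublinear and locally linear convergence rates.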

详情
Comments
Fixed some typos, addressed some comments from NeurIPS reviews
AI中文摘要

我们研究了策略优化在寻找零和线性二次(LQ)博弈纳什均衡(NE)中的全局收敛性。为此,我们首先分析了LQ博弈的景观,将其视为策略空间中的非凸非凹鞍点问题。具体来说,我们证明了尽管具有非凸性和非凹性,零和LQ博弈仍具有如下性质:目标函数关于线性反馈控制策略的驻点构成博弈的纳什均衡。在此基础上,我们开发了三种投影嵌套梯度方法,这些方法保证能够收敛到博弈的纳什均衡。此外,我们证明所有这些算法都具有全局次线性和局部线性收敛率。还提供了仿真结果以说明算法令人满意的收敛特性。据我们所知,这项工作似乎是首个研究LQ博弈优化景观并可证明策略优化方法收敛到纳什均衡的工作。我们的工作为理解一般零和马尔可夫博弈中基于策略的强化学习算法的理论方面迈出了初步的一步。

英文摘要

We study the global convergence of policy optimization for finding the Nash equilibria (NE) in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of LQ games, viewing it as a nonconvex-nonconcave saddle-point problem in the policy space. Specifically, we show that despite its nonconvexity and nonconcavity, zero-sum LQ games have the property that the stationary point of the objective function with respect to the linear feedback control policies constitutes the NE of the game. Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game. Moreover, we show that all of these algorithms enjoy both globally sublinear and locally linear convergence rates. Simulation results are also provided to illustrate the satisfactory convergence properties of the algorithms. To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria. Our work serves as an initial step toward understanding the theoretical aspects of policy-based reinforcement learning algorithms for zero-sum Markov games in general.

1612.04425 2026-06-04 math.OC cs.DC cs.NA math.NA stat.ML

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

关于异步并行迭代与无界延迟收敛性分析

Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin

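AI summary: This paper analyzes the convergence of an asynchronous parallel method with unbounded delays from a probabilistic viewpoint, gives an explicit stepsize formula that depends on delay statistics, validates the theoretical model experimentally, and shows that existing stepsizes induced by the maximum delay are too conservative.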

详情
Journal ref
Journal of the Operations Research Society of China, 7 (2019), 5-42
Comments
accepted to JORSC
AI中文摘要

近年来,由于涉及非常大规模数据和大量决策变量的问题,异步并行(async-parallel)迭代算法得到了迅速发展。由于异步性,迭代计算使用的是过时的信息,而延迟是指信息创建后被更新的次数。几乎所有的近期工作都假设最大延迟是有限的,并据此设置步长参数。然而,最大延迟在实践中是未知的。本文从概率角度分析了异步并行方法在无界延迟下的收敛性,给出了依赖延迟统计的显式步长公式。使用p+1个相同处理器,我们实验证实延迟近似服从参数为p的泊松分布,与我们的理论模型相符,因此可以据此设置步长。在凸和非凸优化问题上的模拟验证了我们的分析,并显示现有最大延迟诱导的步长过于保守,通常会减慢算法的收敛速度。

英文摘要

Recent years have witnessed the surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents convergence analysis of an async-parallel method from a probabilistic viewpoint, and it allows for large unbounded delays. An explicit stepsize formula that guarantees convergence is given in terms of the delays' statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validity of our analysis and also show that the existing maximum-delay induced stepsize is too conservative, often slowing down the convergence of the algorithm.

1905.13268 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Interpretable PID Parameter Tuning for Control Engineering using General Dynamic Neural Networks: An Extensive Comparison

使用通用动态神经网络进行可解释的PID参数调节:一种广泛的比较

Johannes Günther, Elias Reichensdörfer, Patrick M. Pilarski, Klaus Diepold

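AI summary: This paper studies how extending PID controllers with General Dynamic Neural Networks (GDNN) can improve the performance and interpretability of complex control systems. In an extensive comparison on four benchmark systems, the neural PID controller outperforms standard PID control in 15 of 16 tasks and model-based control in 13 of 16 tasks.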

详情
AI中文摘要

现代自动化系统依赖于闭环控制,其中控制器根据观察与受控过程交互。这些系统日益复杂,但大多数控制器仍是线性比例-积分-微分(PID)控制器。PID控制器在处理线性和近线性系统时表现良好,但其简单性与控制复杂过程所需鲁棒性相矛盾。现代机器学习提供了一种方法,即通过神经网络扩展PID控制器,以超越其线性能力。然而,这种扩展以失去稳定性保证和控制器可解释性为代价。本文研究了通过循环神经网络(即通用动态神经网络GDNN)扩展PID控制器的效用,证明GDNN(神经)PID控制器在多种控制系统中表现良好,并强调其作为可扩展和可解释的控制选项。为此,我们通过四个基准系统进行了广泛研究,这些系统代表了最常用的控制工程基准。所有控制基准均在有噪声和无噪声、有干扰和无干扰的情况下进行评估。神经PID控制器在16项任务中优于传统PID控制15项,在16项任务中优于模型驱动控制13项。作为第二项贡献,我们解决了防止神经网络用于实际控制过程的可解释性不足问题。我们使用有界输入有界输出稳定性分析来评估神经网络建议的参数,从而使其变得可理解。这种严格的评估与更好的可解释性相结合,是神经网络控制方法接受的重要步骤。此外,这也是可解释和安全应用人工智能的重要步骤。

英文摘要

Modern automation systems rely on closed-loop control, wherein a controller interacts with a controlled process, based on observations. These systems are increasingly complex, yet most controllers are linear Proportional-Integral-Derivative (PID) controllers. PID controllers perform well on linear and near-linear systems but their simplicity is at odds with the robustness required to reliably control complex processes. Modern machine learning offers a way to extend PID controllers beyond their linear capabilities by using neural networks. However, such an extension comes at the cost of losing stability guarantees and controller interpretability. In this paper, we examine the utility of extending PID controllers with recurrent neural networks, namely General Dynamic Neural Networks (GDNN); we show that GDNN (neural) PID controllers perform well on a range of control systems and highlight how they can be a scalable and interpretable option for control systems. To do so, we provide an extensive study using four benchmark systems that represent the most common control engineering benchmarks. All control benchmarks are evaluated with and without noise as well as with and without disturbances. The neural PID controller performs better than standard PID control in 15 of 16 tasks and better than model-based control in 13 of 16 tasks. As a second contribution, we address the lack of interpretability that prevents neural networks from being used in real-world control processes. We use bounded-input bounded-output stability analysis to evaluate the parameters suggested by the neural network, thus making them understandable. This combination of rigorous evaluation paired with better interpretability is an important step towards the acceptance of neural-network-based control approaches. It is furthermore an important step towards interpretable and safely applied artificial intelligence.
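For orientation, the standard PID baseline that the abstract compares against can be sketched as a textbook discrete-time loop. This is not the neural GDNN controller from the paper; the class name `PID`, the gains, and the toy first-order plant dy/dt = -y + u are hypothetical choices for illustration only.

```python
class PID:
    """Textbook discrete-time PID controller (not the neural GDNN variant)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        # Skip the derivative term on the first call, when no previous error exists.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller, setpoint=1.0, steps=2000, dt=0.01):
    """Forward-Euler simulation of the toy plant dy/dt = -y + u."""
    y = 0.0
    for _ in range(steps):
        u = controller.step(setpoint, y)
        y += dt * (-y + u)
    return y


final_output = simulate(PID(kp=2.0, ki=1.0, kd=0.0, dt=0.01))
```

The three gains are the parameters a tuning method has to choose; on this linear plant a fixed PI setting already drives the output to the setpoint, which is the regime where the abstract says PID performs well.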

1810.11596 2026-06-04 math.NA cs.NA physics.bio-ph physics.comp-ph stat.ML

Nonlocal flocking dynamics: Learning the fractional order of PDEs from particle simulations

非局部聚群动力学:从粒子模拟中学习PDE的分数阶次

Zhiping Mao, Zhen Li, George Em Karniadakis

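AI summary: This paper studies nonlocal flocking dynamics, learning the fractional order of PDEs from particle simulations to connect a discrete agent-based model with a continuum fPDE model, and shows how to extract the effective nonlocal influence function from particle trajectories.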

详情
Journal ref
Commun. Appl. Math. Comput. 2019, 1: 597-619
Comments
22 pages, 7 figures
AI中文摘要

聚群指的是大量相互作用实体的集体行为,其中个体之间的相互作用产生大规模的集体运动。我们采用基于代理的模型来描述聚群中每个个体的微观动力学,并用分数PDE来建模感兴趣的宏观量的演化。通过将微观模型应用连续假设,推导出具有现象学相互作用函数的宏观模型。而不是为非局部聚群动力学指定一个任意的分数阶次,我们直接从由基于代理的模拟生成的粒子轨迹中学习fPDE中的有效非局部影响函数。我们展示了如何利用学习框架将离散的基于代理的模型与连续的fPDE在1D和2D非局部聚群动力学中连接起来。特别是,Cucker-Smale粒子模型用于描述每个个体的微尺度动力学,而带有非局部相互作用项的欧拉方程用于计算宏观量的演化。由粒子模拟生成的轨迹模仿了通过实验可获得的跟踪日志数据。它们可以用于使用高斯过程回归模型(通过贝叶斯优化实现)学习影响函数的分数阶次。我们展示了通过有限体积方案求解学习到的欧拉方程的数值解可以得到与基于代理系统集体行为一致的正确密度分布。所提出的方法为如何将离散的基于代理的模型扩展到基于连续的PDE模型提供了新的见解,并可作为从粒子轨迹中提取有效控制方程的范例。

英文摘要

Flocking refers to collective behavior of a large number of interacting entities, where the interactions between discrete individuals produce collective motion on the large scale. We employ an agent-based model to describe the microscopic dynamics of each individual in a flock, and use a fractional PDE to model the evolution of macroscopic quantities of interest. The macroscopic models with phenomenological interaction functions are derived by applying the continuum hypothesis to the microscopic model. Instead of specifying the fPDEs with an ad hoc fractional order for nonlocal flocking dynamics, we learn the effective nonlocal influence function in fPDEs directly from particle trajectories generated by the agent-based simulations. We demonstrate how the learning framework is used to connect the discrete agent-based model to the continuum fPDEs in 1D and 2D nonlocal flocking dynamics. In particular, a Cucker-Smale particle model is employed to describe the microscale dynamics of each individual, while Euler equations with nonlocal interaction terms are used to compute the evolution of macroscale quantities. The trajectories generated by the particle simulations mimic the field data of tracking logs that can be obtained experimentally. They can be used to learn the fractional order of the influence function using a Gaussian process regression model implemented with Bayesian optimization. We show that the numerical solution of the learned Euler equations solved by the finite volume scheme can yield correct density distributions consistent with the collective behavior of the agent-based system. The proposed method offers new insights into how to scale the discrete agent-based models to the continuum-based PDE models, and could serve as a paradigm for extracting effective governing equations for nonlocal flocking dynamics directly from particle trajectories.
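The microscale side of this pipeline can be sketched with a small Cucker-Smale simulation in one dimension. The influence function φ(r) = (1 + r²)^(-γ) is the classical Cucker-Smale choice and is an assumption here, as are the forward-Euler time stepping, the parameter values, and the function names.

```python
def cucker_smale_step(xs, vs, dt, gamma=0.5):
    """One forward-Euler step of a 1D Cucker-Smale flock with N agents."""
    n = len(xs)
    new_vs = []
    for i in range(n):
        # Alignment force: influence-weighted average of velocity differences.
        force = sum(
            (vs[j] - vs[i]) / (1.0 + (xs[j] - xs[i]) ** 2) ** gamma
            for j in range(n)
        ) / n
        new_vs.append(vs[i] + dt * force)
    new_xs = [x + dt * v for x, v in zip(xs, vs)]
    return new_xs, new_vs


def velocity_spread(vs):
    """Difference between the fastest and slowest agent."""
    return max(vs) - min(vs)


xs, vs = [0.0, 1.0, 2.0, 3.0], [1.0, -0.5, 0.25, 2.0]
initial_mean = sum(vs) / len(vs)
initial_spread = velocity_spread(vs)
for _ in range(500):
    xs, vs = cucker_smale_step(xs, vs, dt=0.01)
```

Because the influence function is symmetric, the mean velocity is conserved while the velocity spread shrinks, which is the flocking behavior whose trajectories would serve as training data for the macroscale model.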

1708.01974 2026-06-04 math.ST econ.GN q-fin.EC stat.ME stat.TH

Model Misspecification in ABC: Consequences and Diagnostics

ABC中模型不规范性:后果与诊断

David T. Frazier, Christian P. Robert, Judith Rousseau

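AI summary: This paper studies the behavior of ABC under model misspecification, shows that different ABC variants can yield substantially different results when the model is misspecified, and proposes two approaches for diagnosing misspecification.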

详情
AI中文摘要

我们分析了当生成模拟数据的模型与实际数据生成过程不一致时,近似贝叶斯计算(ABC)的行为;即当ABC中的数据模拟器不规范时。我们通过理论分析和简单但实际相关的例子,证明当模型不规范时,不同的ABC版本会产生显著不同的结果。我们的理论结果表明,尽管模型不规范,只要满足正则性条件,接受/拒绝ABC方法会将后验质量集中在适当定义的伪真参数值上。然而,在模型不规范的情况下,ABC后验不会产生具有有效频率覆盖的可信区间,并且具有非标准的渐近行为。此外,我们还研究了在模型不规范情况下流行的局部回归调整ABC的理论行为,并证明这种方法会将后验质量集中在与接受/拒绝ABC完全不同的伪真值上。利用我们的理论结果,我们建议两种方法来诊断ABC中的模型不规范性。所有理论结果和诊断均在简单的运行示例中进行说明。

英文摘要

We analyze the behavior of approximate Bayesian computation (ABC) when the model generating the simulated data differs from the actual data generating process; i.e., when the data simulator in ABC is misspecified. We demonstrate both theoretically and in simple, but practically relevant, examples that when the model is misspecified different versions of ABC can yield substantially different results. Our theoretical results demonstrate that even though the model is misspecified, under regularity conditions, the accept/reject ABC approach concentrates posterior mass on an appropriately defined pseudo-true parameter value. However, under model misspecification the ABC posterior does not yield credible sets with valid frequentist coverage and has non-standard asymptotic behavior. In addition, we examine the theoretical behavior of the popular local regression adjustment to ABC under model misspecification and demonstrate that this approach concentrates posterior mass on a completely different pseudo-true value than accept/reject ABC. Using our theoretical results, we suggest two approaches to diagnose model misspecification in ABC. All theoretical results and diagnostics are illustrated in a simple running example.
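The accept/reject ABC scheme discussed above can be sketched in a few lines. This is a generic illustration under assumed choices (Gaussian toy data, a Uniform(-5, 5) prior, the sample mean as summary statistic, tolerance 0.2); the function name `abc_rejection` is hypothetical, and the example uses a correctly specified simulator rather than reproducing the paper's misspecified setting.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, eps, n_draws, rng):
    """Accept/reject ABC: keep prior draws whose simulated summary is within eps."""
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        simulated = simulator(theta, len(observed), rng)
        if abs(summary(simulated) - target) <= eps:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
# Toy setup (assumed): data from N(1, 1), model N(theta, 1), prior Uniform(-5, 5).
observed = [rng.gauss(1.0, 1.0) for _ in range(50)]
posterior_draws = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    eps=0.2,
    n_draws=5000,
    rng=rng,
)
```

To explore misspecification, one could swap in a simulator that cannot reproduce the observed summaries, for example by also matching the sample variance while fixing the wrong variance in the simulator, and watch how the accepted draws behave as `eps` shrinks.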

1812.03412 2026-06-04 cs.LG cs.NA math.NA stat.ML

Learning Multiplication-free Linear Transformations

学习无乘法线性变换

Cristian Rusu

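AI summary: This paper proposes several dictionary learning algorithms for sparse representations that impose specific structures on the learned dictionaries so that they are numerically efficient to use, reducing the number of additions and multiplications or avoiding multiplications altogether. The approach factorizes the dictionary into highly structured basic building blocks (binary orthonormal, scaling, and shear transformations) that admit closed-form solutions to the optimization problems. Its effectiveness is shown on image data, with comparisons against well-known efficient transforms such as the fast Fourier and fast discrete cosine transforms.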

详情
AI中文摘要

在本文中,我们提出了几种字典学习算法,用于稀疏表示,同时对学习到的字典施加特定结构,使得它们在数值上更高效:减少加法/乘法的次数,甚至避免乘法。我们的工作基于字典的高结构化基本构建块(二进制正交、缩放和剪切变换)来建立,可以写出我们考虑的优化问题的闭式解。我们在图像数据上展示了我们方法的有效性,并可以与已知的数值高效变换如快速傅里叶变换和快速离散余弦变换进行比较。

英文摘要

In this paper, we propose several dictionary learning algorithms for sparse representations that also impose specific structures on the learned dictionaries such that they are numerically efficient to use: a reduced number of additions/multiplications, or even no multiplications at all. We base our work on factorizations of the dictionary in highly structured basic building blocks (binary orthonormal, scaling and shear transformations) for which we can write closed-form solutions to the optimization problems that we consider. We show the effectiveness of our methods on image data where we can compare against well-known numerically efficient transforms such as the fast Fourier and the fast discrete cosine transforms.

1410.7091 2026-06-04 math.OC cs.SY eess.SY math.ST stat.TH

On Some Distributed Disorder Detection

关于某些分布式紊乱检测

Krzysztof Szajowski

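AI summary: This paper studies distributed disorder detection in multivariate data sources whose components have different information value, proposes a mathematical model with the Markov property, and uses a Bayesian approach to detect changes in transition probabilities in a multidimensional process.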

详情
Journal ref
Springer Proceedings in Mathematics and Statistics 2015
Comments
8. arXiv admin note: substantial text overlap with arXiv:1111.4504, arXiv:1304.6986
AI中文摘要

具有不同信息价值组件的多变量数据源在实践中似乎频繁出现。在不同时间改变其同质性的模型具有重要意义。是否任何变化对整个过程有影响不仅取决于变化的时刻,还取决于坐标。这在复杂系统的可靠性分析和监视系统中入侵者定位等问题中尤为重要。本文开发了一种具有离散时间的信号源数学模型,该模型在变化时间具有马尔可夫性质。研究还包括在多维过程中在特定灵敏度水平上检测转移概率变化的多变量检测。此外,随机向量的观测被描绘出来。每个所选坐标在某些未知时刻前后形成具有不同转移概率的马尔可夫过程。统计学家的目标是基于过程的观测来估计时刻。使用贝叶斯方法,风险函数取决于假警概率的度量和一些过度估计的成本。系统紊乱时刻由某些坐标上转移概率变化的检测确定。对关键坐标的总体建模基于简单的游戏。

英文摘要

Multivariate data sources with components of different information value seem to appear frequently in practice. Models in which the components change their homogeneity at different times are of significant importance. Whether a change influences the whole process depends not only on the moment of the change but also on which coordinates change. This is particularly important in problems such as the reliability analysis of complex systems and the location of an intruder in surveillance systems. In this paper we develop a mathematical model for such discrete-time signal sources that have the Markov property given the times of change. The research also covers multivariate detection of changes in the transition probabilities, at a certain sensitivity level, in the multidimensional process. Additionally, the observation of the random vector is described. Each chosen coordinate forms a Markov process with different transition probabilities before and after some unknown moment. The statistician's aim is to estimate these moments based on observation of the process. A Bayesian approach is used, with a risk function that depends on a measure of the chance of a false alarm and a cost of overestimation. The moment of the system's disorder is determined by detecting changes in the transition probabilities at some coordinates. The overall modeling of the critical coordinates is based on a simple game.

1812.07725 2026-06-04 math.OC cs.LG cs.NA math.NA math.PR stat.ML

Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization

打破可逆性加速Langevin动力学用于全局非凸优化

Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu

AI总结 本文研究了非可逆Langevin动力学在全局非凸优化中的应用,通过分析非可逆动力学算法的收敛性和混合速率,证明了非可逆算法在寻找局部极小值和探索状态空间方面的效率提升。

详情
AI中文摘要

Langevin动力学(LD)已被证明是一种强大的技术,用于优化非凸目标,作为一种高效的算法来寻找局部极小值,而最终在更长的时间尺度上访问全局极小值。LD基于一阶Langevin扩散,其时间是可逆的。我们研究了两种基于非可逆Langevin扩散的变种:欠阻尼Langevin动力学(ULD)和具有非对称漂移的Langevin动力学(NLD)。采用Tzen、Liang和Raginsky(2018)为LD到非可逆扩散的技术,我们证明了对于给定的局部极小值,其在初始化点任意距离内,以高概率,ULD轨迹会在依赖于局部极小值Hessian最小特征值的复发时间内结束于该局部极小值的小邻域之外,或者在复发时间内进入该邻域并停留可能极长的逃逸时间。ULD算法在Hessian最小特征值的依赖性方面优于Tzen、Liang和Raginsky(2018)中LD的复发时间。对于NLD算法也获得了相似的结果和改进。我们还展示了非可逆变种在离散时间中能够更快地退出局部极小值的吸引盆地,当目标函数有两个局部极小值被鞍点分隔时,并量化了改进的幅度。我们的分析表明,非可逆Langevin算法在寻找局部极小值和探索状态空间方面更有效。我们的分析基于在局部极小值周围对目标函数的二次近似。作为我们分析的副产品,我们获得了两个非可逆Langevin算法在2-Wasserstein距离下的最优混合速率。

英文摘要

Langevin dynamics (LD) has been proven to be a powerful technique for optimizing a non-convex objective, as an efficient algorithm that finds local minima while eventually visiting a global minimum on longer time-scales. LD is based on the first-order Langevin diffusion, which is reversible in time. We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dynamics (ULD) and the Langevin dynamics with a non-symmetric drift (NLD). Adopting the techniques of Tzen, Liang and Raginsky (2018) for LD to non-reversible diffusions, we show that for a given local minimum that is within an arbitrary distance from the initialization, with high probability, either the ULD trajectory ends up somewhere outside a small neighborhood of this local minimum within a recurrence time, which depends on the smallest eigenvalue of the Hessian at the local minimum, or it enters this neighborhood by the recurrence time and stays there for a potentially exponentially long escape time. The ULD algorithms improve upon the recurrence time obtained for LD in Tzen, Liang and Raginsky (2018) with respect to the dependency on the smallest eigenvalue of the Hessian at the local minimum. Similar results and improvements are obtained for the NLD algorithm. We also show that non-reversible variants can exit the basin of attraction of a local minimum faster in discrete time when the objective has two local minima separated by a saddle point, and we quantify the amount of improvement. Our analysis suggests that non-reversible Langevin algorithms are more efficient at locating a local minimum as well as at exploring the state space. Our analysis is based on the quadratic approximation of the objective around a local minimum. As a by-product of our analysis, we obtain optimal mixing rates for quadratic objectives in the 2-Wasserstein distance for the two non-reversible Langevin algorithms we consider.
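The difference between the reversible and underdamped dynamics can be sketched with simple Euler-type discretizations. These are generic textbook schemes and not necessarily the discretizations analyzed in the paper; the double-well objective, step size, inverse temperature `beta`, and friction value are hypothetical choices, and the NLD variant with a non-symmetric drift is not shown.

```python
import math
import random


def overdamped_langevin(grad, x0, step, beta, n_steps, rng):
    """Euler-Maruyama discretization of dX = -grad f(X) dt + sqrt(2 / beta) dB."""
    x = x0
    for _ in range(n_steps):
        x += -step * grad(x) + math.sqrt(2.0 * step / beta) * rng.gauss(0.0, 1.0)
    return x


def underdamped_langevin(grad, x0, step, beta, friction, n_steps, rng):
    """Simple Euler-type discretization of the underdamped (kinetic) diffusion
    dV = -(friction * V + grad f(X)) dt + sqrt(2 * friction / beta) dB, dX = V dt."""
    x, v = x0, 0.0
    for _ in range(n_steps):
        noise = math.sqrt(2.0 * friction * step / beta) * rng.gauss(0.0, 1.0)
        v += -step * (friction * v + grad(x)) + noise
        x += step * v
    return x


def grad_double_well(x):
    """Gradient of f(x) = (x**2 - 1)**2 / 4, with minima at x = -1 and x = +1."""
    return x * (x * x - 1.0)


rng = random.Random(0)
x_ld = overdamped_langevin(grad_double_well, 0.5, 0.01, 1e6, 5000, rng)
x_uld = underdamped_langevin(grad_double_well, 0.5, 0.01, 1e6, 1.0, 5000, rng)
```

At this very low temperature both chains settle near the local minimum at x = 1; lowering `beta` increases the noise, which is what eventually lets a chain cross the barrier at x = 0 and explore the other basin.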

1803.08600 2026-06-04 math.NA cs.NA math.PR stat.ML

Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates

随机梯度下降优化算法的误差下界:慢速与快速衰减学习率下的精确收敛速率

Arnulf Jentzen, Philippe von Wurstemberger

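AI summary: This paper studies lower error bounds for stochastic gradient descent, establishing essentially matching lower and upper bounds on the mean square error in terms of the decay of the learning rates, which precisely quantifies the convergence rate of the SGD method.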

详情
Journal ref
J. Complexity 57 (2020), 101438
Comments
42 pages
AI中文摘要

随机梯度下降(SGD)优化算法在一系列机器学习应用中扮演着核心角色。科学文献提供了大量关于SGD方法的上误差界,但对SGD方法的下误差界的研究则较少。本文的关键贡献是朝着这一方向迈出了一步。更具体地说,本文为每一个γ, ν∈(0,∞),建立了与学习率(γ/n^ν)_{n∈ℕ}相关的简单二次随机优化问题的SGD过程的均方误差的下界和上界,这使得我们可以精确地量化SGD方法的均方收敛速率,其依赖于学习率的渐进行为。

英文摘要

The stochastic gradient descent (SGD) optimization algorithm plays a central role in a series of machine learning applications. The scientific literature provides a vast amount of upper error bounds for the SGD method. Much less attention has been paid to proving lower error bounds for the SGD method. It is the key contribution of this paper to make a step in this direction. More precisely, in this article we establish for every $γ, ν\in (0,\infty)$ essentially matching lower and upper bounds for the mean square error of the SGD process with learning rates $(\frac{γ}{n^ν})_{n \in \mathbb{N}}$ associated to a simple quadratic stochastic optimization problem. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the asymptotic behavior of the learning rates.
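The role of the decay exponent $ν$ can be explored with a small experiment. The sketch assumes the quadratic problem is minimizing $\mathbb{E}[(θ - X)^2]/2$ for Gaussian $X$, which may differ from the exact problem studied in the paper; the function name `sgd_quadratic` and all parameter values are hypothetical.

```python
import random


def sgd_quadratic(samples, gamma, nu, theta0=0.0):
    """SGD on f(theta) = E[(theta - X)**2] / 2 with learning rates gamma / n**nu."""
    theta = theta0
    for n, x in enumerate(samples, start=1):
        stochastic_gradient = theta - x
        theta -= (gamma / n ** nu) * stochastic_gradient
    return theta


rng = random.Random(0)
samples = [rng.gauss(3.0, 1.0) for _ in range(10_000)]
theta_slow = sgd_quadratic(samples, gamma=1.0, nu=0.5)  # slowly decaying rates
theta_mean = sgd_quadratic(samples, gamma=1.0, nu=1.0)  # reproduces the running mean
theta_fast = sgd_quadratic(samples, gamma=1.0, nu=2.0)  # summable rates
```

With $γ = 1$ and $ν = 1$ the iterate equals the running sample mean. For $ν > 1$ the learning rates are summable, so the iterate keeps a non-vanishing dependence on early samples and need not converge to the minimizer.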

1902.00931 2026-06-04 math.OC cs.SY eess.SY stat.ME stat.ML

Optimal Experiment Design in Nonlinear Parameter Estimation with Exact Confidence Regions

非线性参数估计中基于模型的最优实验设计与精确置信区域

Anwesh Reddy Gottu Mukkula, Radoslav Paulen

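AI summary: This paper studies model-based optimal experiment design for nonlinear systems, which optimizes the geometry of the joint confidence regions of least-squares parameter estimates obtained in an a posteriori analysis. It proposes a design approach based on exact confidence regions and uses two small-scale case studies to compare different optimal design criteria against the linearization-based approach.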

详情
Comments
12 pages, 9 figures
AI中文摘要

本文研究了非线性系统基于模型的最优实验设计(OED)。OED是一种用于优化参数联合置信区域(CRs)几何形状的方法,这些CRs是在后验分析中获得的最小二乘参数估计的结果。通过利用可用的(实验)自由度,以获得更具信息量的测量结果。与通常使用的基于线性化CRs的方法不同,本文探索了一种在OED框架中显式考虑精确CRs的方法。我们提出了一种有限参数化的精确CRs的方法,并引入了一种使用内-外逼近椭球作为计算上更可行的替代方法来近似精确CRs。所采用的技术将OED问题转化为一个双层的有限维数学规划问题。我们通过两个小规模的示例案例研究了各种OED准则,并将所得到的最优设计与常用的线性化方法进行比较。我们还评估了两种简单的启发式数值方案在所研究问题中的双层优化性能。

英文摘要

A model-based optimal experiment design (OED) of nonlinear systems is studied. OED represents a methodology for optimizing the geometry of the parametric joint-confidence regions (CRs), which are obtained in an a posteriori analysis of the least-squares parameter estimates. The optimal design is achieved by using the available (experimental) degrees of freedom such that more informative measurements are obtained. Unlike the commonly used approaches, which base the OED procedure upon the linearized CRs, we explore a path where we explicitly consider the exact CRs in the OED framework. We use a methodology for a finite parametrization of the exact CRs within the OED problem and we introduce a novel approximation technique of the exact CRs using inner- and outer-approximating ellipsoids as a computationally less demanding alternative. The employed techniques give the OED problem as a finite-dimensional mathematical program of bilevel nature. We use two small-scale illustrative case studies to study various OED criteria and compare the resulting optimal designs with the commonly used linearization-based approach. We also assess the performance of two simple heuristic numerical schemes for bilevel optimization within the studied problems.

1904.10778 2026-06-04 cs.LG cs.SY eess.SY math.OC math.PR stat.ML

Some Limit Properties of Markov Chains Induced by Stochastic Recursive Algorithms

由随机递归算法诱导的马尔可夫链的一些极限性质

Abhishek Gupta, Hao Chen, Jianzong Pi, Gaurav Tendolkar

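AI summary: This paper studies limit properties of Markov chains induced by stochastic recursive algorithms. By analyzing the convergence of iterated random operators, it proves that the distribution of the random sequence converges weakly to the trajectory generated by the contraction operator, and further shows that the time average of the random sequence converges to the spatial mean of the invariant distribution.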

详情
Comments
Accepted in SIMODS, 37 pages
AI中文摘要

递归随机算法由于数据驱动应用而近期受到广泛关注。例如,随机梯度下降用于解决大规模优化问题,经验动态规划算法用于解决马尔可夫决策问题。这些递归随机算法近似某些收缩算子,并可以被视为迭代随机算子的框架内。因此,我们考虑在波兰空间上迭代随机算子,模拟该波兰空间上的迭代收缩算子。假设迭代随机算子按一定批次大小索引,当批次大小趋于无穷时,每个随机算子的实现(以某种方式)收敛于它所模拟的收缩算子。我们证明,从相同的初始条件出发,由迭代随机算子生成的随机序列的分布弱收敛于由收缩算子生成的轨迹。我们进一步证明,在某些条件下,随机序列的时间平均收敛于不变分布的空间均值。然后,我们将这些结果应用于逻辑回归、经验价值迭代和经验Q值迭代,以说明此处发展的通用理论。

英文摘要

Recursive stochastic algorithms have gained significant attention in the recent past due to data-driven applications. Examples include stochastic gradient descent for solving large-scale optimization problems and empirical dynamic programming algorithms for solving Markov decision problems. These recursive stochastic algorithms approximate certain contraction operators and can be viewed within the framework of iterated random operators. Accordingly, we consider iterated random operators over a Polish space that simulate an iterated contraction operator over that Polish space. Assume that the iterated random operators are indexed by certain batch sizes such that as batch sizes grow to infinity, each realization of the random operator converges (in some sense) to the contraction operator it is simulating. We show that starting from the same initial condition, the distribution of the random sequence generated by the iterated random operators converges weakly to the trajectory generated by the contraction operator. We further show that under certain conditions, the time average of the random sequence converges to the spatial mean of the invariant distribution. We then apply these results to logistic regression, empirical value iteration, and empirical Q value iteration for finite state finite action MDPs to illustrate the general theory developed here.

1810.01702 2026-06-04 math.ST cs.NA math.NA math.PR stat.TH

Nonparametric statistical inference for drift vector fields of multi-dimensional diffusions

多维扩散漂移向量场的非参数统计推断

Richard Nickl, Kolyan Ray

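AI summary: This paper studies the problem of determining a periodic Lipschitz vector field from an observed trajectory of the solution of a multi-dimensional stochastic differential equation. It derives convergence rates for a penalised least squares estimator and contraction rates for the associated posterior distributions; the rates are optimal up to log-factors in $L^2$-loss in any dimension and in supremum-norm loss when $d \le 4$. For $d \le 3$ it also proves nonparametric Bernstein-von Mises theorems for the posterior distributions.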

详情
Journal ref
Ann. Statist. 48 (2020), 1383-1408
Comments
55 pages, to appear in the Annals of Statistics
AI中文摘要

考虑从多维随机微分方程 $dX_t = b(X_t)dt + dW_t$, $t \geq 0$ 的解的观测轨迹$(X_t: 0 \le t \le T)$确定周期性Lipschitz向量场$b=(b_1, \dots, b_d)$的问题,其中$W_t$是标准的$d$维布朗运动。推导了惩罚最小二乘估计器的收敛速率,该估计器等于高维高斯乘积先验对应的最大后验(MAP)估计。这些结果由相关后验分布的收缩速率推出。所得速率在任意维度的$L^2$损失下在相差对数因子的意义下最优,并且当$d \le 4$时在上确界范数损失下也是如此。进一步,当$d \le 3$时,证明了$b$的后验分布的非参数伯恩斯坦-冯·米塞斯定理。由此推导出不变测度$μ_b$的隐含估计量的泛函中心极限定理。极限高斯过程分布具有从信息论角度看渐近最优的协方差结构。

英文摘要

The problem of determining a periodic Lipschitz vector field $b=(b_1, \dots, b_d)$ from an observed trajectory of the solution $(X_t: 0 \le t \le T)$ of the multi-dimensional stochastic differential equation \begin{equation*} dX_t = b(X_t)dt + dW_t, \quad t \geq 0, \end{equation*} where $W_t$ is a standard $d$-dimensional Brownian motion, is considered. Convergence rates of a penalised least squares estimator, which equals the maximum a posteriori (MAP) estimate corresponding to a high-dimensional Gaussian product prior, are derived. These results are deduced from corresponding contraction rates for the associated posterior distributions. The rates obtained are optimal up to log-factors in $L^2$-loss in any dimension, and also for supremum norm loss when $d \le 4$. Further, when $d \le 3$, nonparametric Bernstein-von Mises theorems are proved for the posterior distributions of $b$. From this we deduce functional central limit theorems for the implied estimators of the invariant measure $μ_b$. The limiting Gaussian process distributions have a covariance structure that is asymptotically optimal from an information-theoretic point of view.
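To make the observation model concrete, the sketch below simulates a trajectory of $dX_t = b(X_t)dt + dW_t$ with the Euler-Maruyama scheme. The two-dimensional periodic drift, the function names, and the discretisation parameters are hypothetical choices for illustration; the paper works with continuous-time observations and an unknown $b$.

```python
import math
import random


def periodic_drift(x):
    """Hypothetical 1-periodic Lipschitz drift b = (b_1, b_2) on R^2."""
    return [math.sin(2.0 * math.pi * x[0]), math.cos(2.0 * math.pi * x[1])]


def euler_maruyama(b, x0, horizon, n_steps, seed=0):
    """Approximate dX_t = b(X_t) dt + dW_t on [0, horizon] with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    x = list(x0)
    path = [tuple(x)]
    for _ in range(n_steps):
        drift = b(x)
        # Brownian increments are independent N(0, dt) in each coordinate.
        x = [xi + bi * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for xi, bi in zip(x, drift)]
        path.append(tuple(x))
    return path


if __name__ == "__main__":
    trajectory = euler_maruyama(periodic_drift, (0.0, 0.0), horizon=1.0, n_steps=1000)
    print(trajectory[-1])
```

A discretised path like this could serve as synthetic input for testing a drift estimator, with the caveat that the time step introduces an approximation error absent from the continuous-time model.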

1711.05337 2026-06-04 math.PR cs.NA math.NA stat.CO stat.ME

Geometric integrators and the Hamiltonian Monte Carlo method

几何积分方法与哈密顿蒙特卡洛方法

Nawaf Bou-Rabee, Jesús María Sanz-Serna

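AI summary: This paper surveys the relations between numerical integration and the Hamiltonian (or hybrid) Monte Carlo method (HMC). The computational cost of HMC lies mainly in the numerical integration, so efficient integrators are needed, but HMC requires volume-preserving and reversible methods, which limits the available integrators. These geometric properties also affect the integration error and hence the acceptance rate of proposals. Although the velocity Verlet algorithm is currently the method of choice, the paper argues that Verlet can be improved upon, and it discusses the behavior of HMC as the dimensionality of the target distribution increases.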

详情
Journal ref
Acta Numerica, Vol. 27, pp. 113-206, 2018
Comments
Final version will appear in Acta Numerica 2018
AI中文摘要

本文详细回顾了数值积分与哈密顿(或混合)蒙特卡洛方法(HMC)之间的关系。由于HMC的计算成本主要来自数值积分,因此这些积分应尽可能高效地进行。然而,HMC要求所用方法具有保体积和可逆这两个几何性质,这限制了可用积分器的数量。另一方面,这些几何性质对积分误差有重要的定量影响,进而影响提议的接受率。尽管目前速度Verlet(velocity Verlet)算法有充分理由成为首选方法,但本文认为Verlet可以进一步改进。此外,本文还详细讨论了HMC在目标分布维度增加时的行为。

英文摘要

This paper surveys in detail the relations between numerical integration and the Hamiltonian (or hybrid) Monte Carlo method (HMC). Since the computational cost of HMC mainly lies in the numerical integrations, these should be performed as efficiently as possible. However, HMC requires methods that have the geometric properties of being volume-preserving and reversible, and this limits the number of integrators that may be used. On the other hand, these geometric properties have important quantitative implications on the integration error, which in turn have an impact on the acceptance rate of the proposal. While at present the velocity Verlet algorithm is the method of choice for good reasons, we argue that Verlet can be improved upon. We also discuss in detail the behavior of HMC as the dimensionality of the target distribution increases.
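The velocity Verlet integrator mentioned above can be sketched as follows, together with a basic HMC accept/reject step. This is a minimal one-dimensional illustration under assumed choices (unit mass, a standard Gaussian target in the usage below, hypothetical function names); it is not the improved integrators the survey argues for.

```python
import math
import random


def velocity_verlet(q, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics for H(q, p) = U(q) + p^2 / 2 (unit mass assumed)."""
    p = p - 0.5 * step * grad_u(q)          # initial half step in momentum
    for i in range(n_steps):
        q = q + step * p                    # full step in position
        if i < n_steps - 1:
            p = p - step * grad_u(q)        # full momentum step between position updates
    p = p - 0.5 * step * grad_u(q)          # final half step in momentum
    return q, p


def hmc_step(q, u, grad_u, step, n_steps, rng):
    """One HMC transition: fresh momentum, Verlet trajectory, Metropolis accept/reject."""
    p0 = rng.gauss(0.0, 1.0)
    q_new, p_new = velocity_verlet(q, p0, grad_u, step, n_steps)
    h_old = u(q) + 0.5 * p0 ** 2
    h_new = u(q_new) + 0.5 * p_new ** 2
    # Accept with probability min(1, exp(h_old - h_new)); the energy error of the
    # integrator therefore drives the acceptance rate.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q


if __name__ == "__main__":
    rng = random.Random(0)
    q = 0.0
    samples = []
    for _ in range(2000):
        q = hmc_step(q, lambda x: 0.5 * x * x, lambda x: x, 0.2, 10, rng)
        samples.append(q)
    print(sum(s * s for s in samples) / len(samples))
```

Running the integrator forward, flipping the momentum, and running it again returns to the starting point up to round-off, which is the reversibility property the abstract refers to.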

1709.05963 2026-06-04 math.NA cs.LG cs.NA cs.NE math.PR stat.ML

Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations

高维完全非线性偏微分方程和二阶反向随机微分方程的机器学习近似算法

Christian Beck, Weinan E, Arnulf Jentzen

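AI summary: This paper proposes a machine-learning-based method for solving high-dimensional fully nonlinear second-order partial differential equations. It connects fully nonlinear PDEs with second-order backward stochastic differential equations, uses deep neural networks for the spatial approximation together with stochastic gradient descent-type optimization, and illustrates the efficiency and accuracy of the method on a high-dimensional Black-Scholes-Barenblatt equation, a Hamilton-Jacobi-Bellman equation, and a nonlinear expectation problem.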

详情
Journal ref
J. Nonlinear Sci. 29, 1563-1619 (2019)
Comments
56 pages, 12 figures
AI中文摘要

高维偏微分方程(PDE)出现在金融行业的多个模型中,例如衍生品定价模型、信用估值调整(CVA)模型或投资组合优化模型。这些应用中的PDE通常是高维的,因为维度对应于投资组合中的金融资产数量。此外,由于需要在模型中纳入某些非线性现象,如违约风险、交易成本、波动率不确定性(Knightian不确定性)或交易限制,这些PDE往往是完全非线性的。此类高维完全非线性PDE的求解极具挑战性,因为标准近似方法的计算努力随着维度呈指数增长。在本工作中,我们提出了一种新的方法来求解高维完全非线性二阶PDE。该方法可以特别用于采样高维非线性期望。该方法基于(i)完全非线性二阶PDE与二阶反向随机微分方程(2BSDE)之间的联系,(ii)PDE和2BSDE问题的合并公式,(iii)2BSDE的时间前向离散化和通过深度神经网络的空间近似,以及(iv)随机梯度下降型优化过程。使用Python中的TENSORFLOW获得的数值结果展示了该方法在100维Black-Scholes-Barenblatt方程、100维Hamilton-Jacobi-Bellman方程和100维G-布朗运动的非线性期望问题中的效率和准确性。

英文摘要

High-dimensional partial differential equations (PDE) appear in a number of models from the financial industry, such as in derivative pricing models, credit valuation adjustment (CVA) models, or portfolio optimization models. The PDEs in such applications are high-dimensional as the dimension corresponds to the number of financial assets in a portfolio. Moreover, such PDEs are often fully nonlinear due to the need to incorporate certain nonlinear phenomena in the model such as default risks, transaction costs, volatility uncertainty (Knightian uncertainty), or trading constraints in the model. Such high-dimensional fully nonlinear PDEs are exceedingly difficult to solve as the computational effort for standard approximation methods grows exponentially with the dimension. In this work we propose a new method for solving high-dimensional fully nonlinear second-order PDEs. Our method can in particular be used to sample from high-dimensional nonlinear expectations. The method is based on (i) a connection between fully nonlinear second-order PDEs and second-order backward stochastic differential equations (2BSDEs), (ii) a merged formulation of the PDE and the 2BSDE problem, (iii) a temporal forward discretization of the 2BSDE and a spatial approximation via deep neural nets, and (iv) a stochastic gradient descent-type optimization procedure. Numerical results obtained using ${\rm T{\small ENSOR}F{\small LOW}}$ in ${\rm P{\small YTHON}}$ illustrate the efficiency and the accuracy of the method in the cases of a $100$-dimensional Black-Scholes-Barenblatt equation, a $100$-dimensional Hamilton-Jacobi-Bellman equation, and a nonlinear expectation of a $ 100 $-dimensional $ G $-Brownian motion.

1706.04702 2026-06-04 math.NA cs.LG cs.NA cs.NE math.PR stat.ML

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

基于深度学习的高维抛物型偏微分方程和反向随机微分方程的数值方法

Weinan E, Jiequn Han, Arnulf Jentzen

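AI summary: This paper proposes a deep-learning-based algorithm for high-dimensional parabolic partial differential equations and backward stochastic differential equations (BSDEs). It draws an analogy between the BSDE and reinforcement learning, with the gradient of the solution playing the role of the policy function, and approximates the policy function with a neural network.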

详情
Journal ref
Commun. Math. Stat. 5, 349-380 (2017)
Comments
39 pages, 15 figures
AI中文摘要

我们提出了一种新的算法,用于求解高维抛物型偏微分方程(PDEs)和反向随机微分方程(BSDEs),通过将BSDE与强化学习进行类比,将解的梯度作为策略函数,损失函数由给定的终端条件与BSDE解之间的误差构成。策略函数随后通过神经网络进行近似,如深度强化学习中所做的那样。使用TensorFlow进行的数值结果展示了所提出算法在解决物理和金融领域中多个100维非线性PDEs方面的效率和准确性,例如Allen-Cahn方程、Hamilton-Jacobi-Bellman方程以及金融衍生品的非线性定价模型。

英文摘要

We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results using TensorFlow illustrate the efficiency and accuracy of the proposed algorithms for several 100-dimensional nonlinear PDEs from physics and finance such as the Allen-Cahn equation, the Hamilton-Jacobi-Bellman equation, and a nonlinear pricing model for financial derivatives.

1903.11483 2026-06-04 cs.LG cs.NE cs.RO cs.SY eess.SY stat.ML

Constructing Parsimonious Analytic Models for Dynamic Systems via Symbolic Regression

通过符号回归构建动态系统的简洁解析模型

Erik Derner, Jiří Kubalík, Nicola Ancona, Robert Babuška

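AI summary: This paper proposes using symbolic regression to construct parsimonious analytic models of dynamic systems. Two state-of-the-art symbolic regression algorithms are applied in both the state-space and input-output settings, and the approach is demonstrated on simulated examples and a real system, where it outperforms commonly used alternatives in most cases.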

详情
Journal ref
Applied Soft Computing, Volume 94, September 2020, 106432
AI中文摘要

构建动态系统的数学模型对于许多工程和科学学科至关重要。模型有助于模拟、分析系统行为、决策制定和自动控制算法的设计。即使像强化学习(RL)这样的无模型控制技术也已被证明能从使用模型中受益,通常这些模型是在线学习的。任何模型构建方法都必须处理模型的准确性和复杂性之间的权衡,这很难做到。本文提出利用符号回归(SR)来构建由解析方程描述的简洁过程模型。我们为方法配备了两种最先进的符号回归算法,它们自动搜索适合测量数据的方程:单节点遗传编程(SNGP)和多基因遗传编程(MGGP)。除了状态空间域中的标准问题表述外,我们还展示了该方法如何应用于非线性自回归加外生输入(NARX)类型的输入输出模型。我们展示了该方法在三个模拟示例中的应用,这些示例的状态空间最高可达14维:倒立摆、移动机器人和双足行走机器人。与深度神经网络和局部线性回归的比较表明,SR在大多数情况下优于这些常用替代方法。我们在真实摆系统上展示了解析模型的发现使RL控制器能够成功完成摆起任务,该模型仅基于100个数据样本构建。

英文摘要

Developing mathematical models of dynamic systems is central to many disciplines of engineering and science. Models facilitate simulations, analysis of the system's behavior, decision making and design of automatic control algorithms. Even inherently model-free control techniques such as reinforcement learning (RL) have been shown to benefit from the use of models, typically learned online. Any model construction method must address the tradeoff between the accuracy of the model and its complexity, which is difficult to strike. In this paper, we propose to employ symbolic regression (SR) to construct parsimonious process models described by analytic equations. We have equipped our method with two different state-of-the-art SR algorithms which automatically search for equations that fit the measured data: Single Node Genetic Programming (SNGP) and Multi-Gene Genetic Programming (MGGP). In addition to the standard problem formulation in the state-space domain, we show how the method can also be applied to input-output models of the NARX (nonlinear autoregressive with exogenous input) type. We present the approach on three simulated examples with up to 14-dimensional state space: an inverted pendulum, a mobile robot, and a bipedal walking robot. A comparison with deep neural networks and local linear regression shows that SR in most cases outperforms these commonly used alternative methods. We demonstrate on a real pendulum system that the analytic model found enables a RL controller to successfully perform the swing-up task, based on a model constructed from only 100 data samples.

1905.10706 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Interactive Differentiable Simulation

交互式可微分模拟

Eric Heiden, David Millard, Hejia Zhang, Gaurav S. Sukhatme

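AI summary: This paper introduces Interactive Differentiable Simulation (IDS), a differentiable physics engine that allows efficient, accurate inference of the physical properties of rigid-body systems. It performs system identification from visual input, yielding a world model whose parameters have physical meaning, and supports automatic task-based robot design and parameter estimation for nonlinear dynamical systems. Combined with adaptive model-predictive control, it improves sample efficiency by orders of magnitude over model-free reinforcement learning on nonlinear control domains.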

详情
AI中文摘要

智能体需要对世界有物理理解才能预测其未来行动的影响。虽然基于学习的环境动力学模型在样本效率上相比无模型强化学习算法有所改进,但通常无法泛化到训练数据之外的系统状态,且往往依赖于非解释性的潜在变量。我们引入交互式可微分模拟(IDS),一种可微分的物理引擎,能够高效准确地推断刚体系统的物理属性。将模型集成到深度学习架构中,该模型能够利用视觉输入实现系统识别,从而建立具有物理意义的世界模型。我们展示了通过自动计算IDS中的梯度,实现非线性动态系统的自动任务机器人设计和参数估计。当与自适应模型预测控制算法结合时,我们的方法在具有挑战性的非线性控制领域中,相比无模型强化学习算法显示出数量级的样本效率提升。

英文摘要

Intelligent agents need a physical understanding of the world to predict the impact of their actions in the future. While learning-based models of the environment dynamics have contributed to significant improvements in sample efficiency compared to model-free reinforcement learning algorithms, they typically fail to generalize to system states beyond the training data, while often grounding their predictions on non-interpretable latent variables. We introduce Interactive Differentiable Simulation (IDS), a differentiable physics engine, that allows for efficient, accurate inference of physical properties of rigid-body systems. Integrated into deep learning architectures, our model is able to accomplish system identification using visual input, leading to an interpretable model of the world whose parameters have physical meaning. We present experiments showing automatic task-based robot design and parameter estimation for nonlinear dynamical systems by automatically calculating gradients in IDS. When integrated into an adaptive model-predictive control algorithm, our approach exhibits orders of magnitude improvements in sample efficiency over model-free reinforcement learning algorithms on challenging nonlinear control domains.

1607.01027 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

Accelerate Stochastic Subgradient Method by Leveraging Local Growth Condition

通过利用局部增长条件加速随机子梯度方法

Yi Xu, Qihang Lin, Tianbao Yang

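AI summary: This paper develops a new theory showing that the local growth rate of the objective function in a neighborhood of the optimal solutions suffices to quantify the global convergence rate of first-order stochastic convex optimization. It accelerates the stochastic subgradient method by solving the problem in gradually shrinking local regions, and it presents practical variants that need neither the multiplicative growth constant nor the growth rate.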

详情
AI中文摘要

在本文中,我们为一阶随机凸优化开发了一种新理论,表明全局收敛率足以由最优解邻域内目标函数的局部增长率量化。具体而言,如果目标函数F(w)在ε子水平集内以速度‖w - w*‖_2^{1/θ}增长,其中w*是w最近的最优解,θ∈(0,1]表示局部增长率,则达到ε最优解的一阶随机优化迭代复杂度可以为~O(1/ε^{2(1-θ)}),这在至多对数因子范围内是最佳的。为了实现更快的全局收敛,我们通过在历史解的局部区域中迭代求解原始问题,开发了两种不同的加速随机子梯度方法,该局部区域的大小随着解接近最优集而逐渐减小。除了理论改进外,这项工作还包含了使所提算法实用的新贡献:(i) 我们提出了可以运行而无需知道乘法增长常数和增长率θ的加速随机子梯度方法的实用变体;(ii) 我们考虑了机器学习中的广泛问题集,以证明所提算法比传统随机子梯度方法具有更快的收敛速度。我们还表征了所提算法的复杂性,以确保在不假设光滑性的情况下梯度较小。

英文摘要

In this paper, a new theory is developed for first-order stochastic convex optimization, showing that the global convergence rate is sufficiently quantified by a local growth rate of the objective function in a neighborhood of the optimal solutions. In particular, if the objective function $F(\mathbf w)$ in the $ε$-sublevel set grows as fast as $\|\mathbf w - \mathbf w_*\|_2^{1/θ}$, where $\mathbf w_*$ represents the closest optimal solution to $\mathbf w$ and $θ\in(0,1]$ quantifies the local growth rate, the iteration complexity of first-order stochastic optimization for achieving an $ε$-optimal solution can be $\widetilde O(1/ε^{2(1-θ)})$, which is optimal at most up to a logarithmic factor. To achieve the faster global convergence, we develop two different accelerated stochastic subgradient methods by iteratively solving the original problem approximately in a local region around a historical solution with the size of the local region gradually decreasing as the solution approaches the optimal set. Besides the theoretical improvements, this work also includes new contributions towards making the proposed algorithms practical: (i) we present practical variants of accelerated stochastic subgradient methods that can run without the knowledge of multiplicative growth constant and even the growth rate $θ$; (ii) we consider a broad family of problems in machine learning to demonstrate that the proposed algorithms enjoy faster convergence than traditional stochastic subgradient method. We also characterize the complexity of the proposed algorithms for ensuring the gradient is small without the smoothness assumption.

1809.09170 2026-06-04 math.NA cs.LG cs.NA math.DS stat.ML

Numerical Aspects for Approximating Governing Equations Using Data

利用数据逼近控制方程的数值问题

Kailiang Wu, Dongbin Xiu

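AI summary: This paper presents numerical algorithms for locally recovering unknown governing differential equations from measurement data. It approximates the governing equation with standard basis functions such as polynomials, discusses key factors for accurate approximation, notably the use of many short bursts of trajectory data rather than a single long trajectory, and presents numerical examples for linear and nonlinear systems.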

详情
Journal ref
Journal of Computational Physics, 384, 200-221, 2019
Comments
26 pages, 17 figures
AI中文摘要

我们提出了有效的数值算法,用于从测量数据中局部恢复未知的控制微分方程。我们采用一组标准基函数(例如多项式)来高精度地逼近控制方程。在将问题转化为函数逼近问题后,我们讨论了确保准确逼近的几个重要方面。最值得注意的是,我们讨论了使用大量短时轨迹数据片段(short bursts)而非单一长轨迹数据的重要性。随后,我们给出了实现准确逼近的几种数值算法选项,以及最终方程逼近的误差估计。然后,我们展示了线性和非线性系统的大量数值示例,以说明我们的方程恢复算法的性质和有效性。

英文摘要

We present effective numerical algorithms for locally recovering unknown governing differential equations from measurement data. We employ a set of standard basis functions, e.g., polynomials, to approximate the governing equation with high accuracy. Upon recasting the problem into a function approximation problem, we discuss several important aspects for accurate approximation. Most notably, we discuss the importance of using a large number of short bursts of trajectory data, rather than using data from a single long trajectory. Several options for the numerical algorithms to perform accurate approximation are then presented, along with an error estimate of the final equation approximation. We then present an extensive set of numerical examples of both linear and nonlinear systems to demonstrate the properties and effectiveness of our equation recovery algorithms.
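The idea of recovering a governing equation from many short bursts with a polynomial basis can be sketched as below. Everything specific here is an assumption for illustration: the scalar test system dx/dt = x - x³, the single RK4 step used to generate each burst, the finite-difference estimate of the derivative, and the plain least-squares fit. The paper's own algorithms and error estimates are not reproduced.

```python
import random


def f_true(x):
    """Hypothetical 'unknown' right-hand side used only to generate data."""
    return x - x ** 3


def rk4_step(f, x, dt):
    """One classical Runge-Kutta step, standing in for a short measured burst."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def recover_polynomial(bursts, dt, degree):
    """Least-squares fit of (x1 - x0) / dt against the monomials 1, x0, ..., x0**degree."""
    size = degree + 1
    gram = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for x0, x1 in bursts:
        target = (x1 - x0) / dt             # finite-difference derivative estimate
        powers = [x0 ** k for k in range(size)]
        for i in range(size):
            rhs[i] += powers[i] * target
            for j in range(size):
                gram[i][j] += powers[i] * powers[j]
    return solve_linear(gram, rhs)


def demo(n_bursts=200, dt=1e-3, seed=0):
    """Generate short bursts from random initial states and recover the coefficients."""
    rng = random.Random(seed)
    bursts = []
    for _ in range(n_bursts):
        x0 = rng.uniform(-1.0, 1.0)
        bursts.append((x0, rk4_step(f_true, x0, dt)))
    return recover_polynomial(bursts, dt, degree=3)


if __name__ == "__main__":
    print(demo())  # coefficients of 1, x, x^2, x^3
```

Drawing the initial states at random spreads the data over the region of interest, which a single long trajectory that settles near an equilibrium would not do.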

1606.08090 2026-06-04 eess.SY cs.IT cs.SY math.IT math.ST stat.TH

Framework for state and unknown input estimation of linear time-varying systems

线性时变系统状态和未知输入估计的框架

Peng Lu, Erik-Jan van Kampen, Cornelis C. de Visser, Qiping Chu

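AI summary: This paper extends a Double-Model Adaptive Estimation approach to an unknown input filtering problem in which the existence condition is not satisfied, and shows that the state and the unknown inputs can still be estimated and decoupled.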

详情
Journal ref
Automatica, 73 (2016), 145-154
Comments
This paper has been accepted by Automatica. It considers unknown input estimation or fault and disturbance estimation. Existing approaches consider the case where the effects of fault and disturbance can be decoupled. In our paper, we consider the case where the effects of fault and disturbance are coupled. This approach can be easily extended to nonlinear systems
AI中文摘要

文献中未知输入解耦观测器和滤波器的设计需要假设存在条件成立。本文研究存在条件不满足时的未知输入滤波问题。本文没有设计传统的未知输入解耦滤波器,而是将双模型自适应估计方法加以扩展来求解该问题。我们证明了,在存在条件不满足的情况下,扩展的双模型自适应估计方法仍可估计并解耦状态和未知输入。文中给出了数值算例,将所提方法的性能与文献中的方法进行比较。

英文摘要

The design of unknown-input decoupled observers and filters requires the assumption of an existence condition in the literature. This paper addresses an unknown input filtering problem where the existence condition is not satisfied. Instead of designing a traditional unknown input decoupled filter, a Double-Model Adaptive Estimation approach is extended to solve the unknown input filtering problem. It is proved that the state and the unknown inputs can be estimated and decoupled using the extended Double-Model Adaptive Estimation approach without satisfying the existence condition. Numerical examples are presented in which the performance of the proposed approach is compared to methods from literature.

1905.13547 2026-06-04 cs.LG cs.SY eess.SY math.DS math.OC stat.ML

Learning robust control for LQR systems with multiplicative noise via policy gradient

通过策略梯度学习具有乘性噪声的LQR系统的鲁棒控制

Benjamin Gravell, Peyman Mohajerin Esfahani, Tyler Summers

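AI summary: This paper studies LQR systems with multiplicative noise, uses policy gradient methods to obtain robust control, and proves global convergence of policy gradient algorithms despite the non-convex cost function.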

详情
AI中文摘要

线性二次调节(LQR)问题重新成为强化学习控制复杂动态系统的重要理论基准,特别是当状态和动作空间连续时。与几乎所有近期相关工作不同,我们考虑了乘性噪声模型,这些模型由于显式地纳入系统动态中的固有不确定性和变化,从而提高了控制器的鲁棒性。鲁棒性是强化学习中一个关键但理解不足的问题;现有不考虑不确定性的方法可能会收敛到脆弱的策略或完全无法收敛。此外,有意地将乘性噪声注入到学习算法中可以增强策略的鲁棒性,如在领域随机化中的非正式工作所观察到的。尽管策略梯度算法需要优化非凸成本函数,我们展示了乘性噪声LQR成本具有称为梯度支配的特殊性质,该性质被用来证明策略梯度算法在问题参数上具有多项式依赖性的全局收敛性,以达到全局最优控制策略。结果在已知模型和未知模型设置中均提供,其中系统轨迹样本用于估计策略梯度。

英文摘要

The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimum control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.

1902.09964 2026-06-04 eess.SY cs.LG cs.SY stat.ML

A Neural-Network-Based Model Predictive Control of Three-Phase Inverter With an Output LC Filter

带输出LC滤波器的三相逆变器的神经网络模型预测控制

Ihab S. Mohamed, Stefano Rovetta, Ton Duc Do, Tomislav Dragicevic, Ahmed A. Zaki Diab

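AI summary: This paper proposes a control scheme for a two-level converter that combines model predictive control (MPC) with a feed-forward artificial neural network (ANN), aiming to lower total harmonic distortion (THD) and improve steady-state and dynamic performance for different types of loads. MPC generates the training data for the neural network, the trained ANN then performs voltage tracking without MPC, and the strategy is validated through MATLAB/Simulink simulation.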

详情
Comments
13 pages, 15 figures, 3 tables. This article has been submitted to IEEE Access
AI中文摘要

模型预测控制(MPC)已成为一种well-established的现代控制方法,用于具有输出LC滤波器的三相逆变器,其中需要高质量电压和低总谐波失真(THD)。尽管MPC是一种直观的控制器,易于理解和实现,但它有显著缺点,即需要大量的在线计算来解决优化问题。另一方面,在电力电子和驱动领域,基于人工神经网络的无模型方法的应用正在迅速增长。本文提出了一种新的双级逆变器控制方案,结合MPC和前馈ANN,旨在降低THD并提高系统在不同负载类型下的稳态和动态性能。首先,MPC在训练阶段用于生成用于训练所提出神经网络所需的数据。然后,一旦神经网络经过微调,就可以在不需要使用MPC的情况下在线用于电压跟踪目的。所提出的基于ANN的控制策略通过MATLAB/Simulink工具进行仿真验证,考虑了不同的负载条件。此外,评估了基于ANN的控制器在多种线性和非线性负载下的不同运行条件下性能,并与MPC的性能进行比较,证明了所提出基于ANN的控制策略在稳态和动态性能方面的优异表现。

英文摘要

Model predictive control (MPC) has become one of the well-established modern control methods for three-phase inverters with an output LC filter, where a high-quality voltage with low total harmonic distortion (THD) is needed. Although it is an intuitive controller, easy to understand and implement, it has the significant disadvantage of requiring a large number of online calculations for solving the optimization problem. On the other hand, the application of model-free approaches such as those based on artificial neural networks is currently growing rapidly in the area of power electronics and drives. This paper presents a new control scheme for a two-level converter based on combining MPC and a feed-forward ANN, with the aim of getting lower THD and improving the steady-state and dynamic performance of the system for different types of loads. First, MPC is used, as an expert, in the training phase to generate the data required for training the proposed neural network. Then, once the neural network is fine-tuned, it can be successfully used online for voltage tracking, without the need to use MPC. The proposed ANN-based control strategy is validated through simulation, using MATLAB/Simulink tools, taking into account different load conditions. Moreover, the performance of the ANN-based controller is evaluated on several samples of linear and non-linear loads under various operating conditions, and compared to that of MPC, demonstrating the excellent steady-state and dynamic performance of the proposed ANN-based control strategy.

1806.10749 2026-06-04 eess.SY cs.SY eess.SP math.PR stat.AP

On Adaptive Linear-Quadratic Regulators

关于自适应线性二次调节器

Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

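AI summary: This paper studies the performance of adaptive control policies through the regret with respect to the optimal regulator. It introduces a novel decomposition of adaptive policies to obtain a sharp regret expression, shows that adaptive policies based on slight modifications of the Certainty Equivalence scheme are efficient, and examines the rates at which the system's parameters are identified.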

详情
AI中文摘要

通过与最优调节器的 regret 来评估自适应控制策略的性能,该 regret 反映了由于动态参数不确定性导致的运行成本增加。然而,现有文献中并没有提供关于未知参数对 regret 影响的定量描述。此外,一些现有自适应策略的高效实现也存在问题。最后,关于系统参数识别精度的结果较为稀缺且不完整。本研究旨在全面解决这三个问题。首先,通过引入自适应策略的新分解方法,我们建立了任意策略的 regret 表达式,该表达式以偏离最优调节器的程度为基准。其次,我们展示了基于 Certainty Equivalence 方案的微小修改的自适应策略是高效的。具体而言,我们为两种随机自适应策略建立了近似平方根速率的 regret。所提出的 regret 上界是通过使用用于随机化未知参数估计的随机矩阵的反集中结果得到的。此外,我们研究了使用动态矩阵的最小额外信息,使得 regret 变为对数阶。最后,展示了系统未知参数的识别速率。

英文摘要

Performance of adaptive control policies is assessed through the regret with respect to the optimal regulator, which reflects the increase in the operating cost due to uncertainty about the dynamics parameters. However, available results in the literature do not provide a quantitative characterization of the effect of the unknown parameters on the regret. Further, there are problems regarding the efficient implementation of some of the existing adaptive policies. Finally, results regarding the accuracy with which the system's parameters are identified are scarce and rather incomplete. This study aims to comprehensively address these three issues. First, by introducing a novel decomposition of adaptive policies, we establish a sharp expression for the regret of an arbitrary policy in terms of the deviations from the optimal regulator. Second, we show that adaptive policies based on slight modifications of the Certainty Equivalence scheme are efficient. Specifically, we establish a regret of (nearly) square-root rate for two families of randomized adaptive policies. The presented regret bounds are obtained by using anti-concentration results on the random matrices employed for randomizing the estimates of the unknown parameters. Moreover, we study the minimal additional information on the dynamics matrices under which the regret becomes of logarithmic order. Finally, the rates at which the unknown parameters of the system are being identified are presented.

1812.08625 2026-06-04 math.NA cs.LG cs.NA physics.comp-ph stat.ML

Deep Theory of Functional Connections: A New Method for Estimating the Solutions of PDEs

深度函数连接理论:一种用于估计偏微分方程解的新方法

Carl Leake

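AI summary: This paper presents deep Theory of Functional Connections (TFC), a method that estimates the solutions of partial differential equations (PDEs) by combining neural networks with TFC. TFC transforms PDEs with boundary conditions into unconstrained optimization problems; a neural network serves as the free function and is trained in an unsupervised manner with the squared PDE residual as the loss. Unlike popular alternatives, the method does not discretize the domain into a grid and yields a closed-form, analytical, differentiable approximation of the solution throughout the training domain.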

详情
Journal ref
Mach. Learn. Knowl. Extr. 2020, 2(1), 37-55
Comments
14 pages, 7 figures
AI中文摘要

本文提出了一种名为深度函数连接理论(TFC)的新方法,通过将神经网络与TFC结合来估计偏微分方程(PDEs)的解。TFC用于将带有边界条件的PDEs转换为无约束优化问题,通过将边界条件嵌入到一个“约束表达式”中。在本工作中,神经网络被选为自由函数,并用于求解现在无约束的优化问题。损失函数取为PDE残差的平方。然后,神经网络以无监督的方式训练以解决无约束优化问题。与用于估计PDE解的流行方法相比,该方法有两个主要区别。首先,该方法不需要将域离散化为网格,而是在线性训练阶段随机采样域中的点。其次,训练后,该方法在整个训练域内提供闭合形式、解析、可微的解的近似。相比之下,其他流行方法如果需要在不在离散化网格上的点上估计解,则需要插值。深度TFC方法用于解决四个具有各种边界条件的问题。

英文摘要

This article presents a new methodology called deep Theory of Functional Connections (TFC) that estimates the solutions of partial differential equations (PDEs) by combining neural networks with TFC. TFC is used to transform PDEs with boundary conditions into unconstrained optimization problems by embedding the boundary conditions into a "constrained expression." In this work, a neural network is chosen as the free function, and used to solve the now unconstrained optimization problem. The loss function is taken as the square of the residual of the PDE. Then, the neural network is trained in an unsupervised manner to solve the unconstrained optimization problem. This methodology has two major differences when compared with popular methods used to estimate the solutions of PDEs. First, this methodology does not need to discretize the domain into a grid, rather, this methodology randomly samples points from the domain during the training phase. Second, after training, this methodology represents a closed form, analytical, differentiable approximation of the solution throughout the entire training domain. In contrast, other popular methods require interpolation if the estimated solution is desired at points that do not lie on the discretized grid. The deep TFC method for estimating the solution of PDEs is demonstrated on four problems with a variety of boundary conditions.
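The notion of a constrained expression can be illustrated in one dimension: for the two-point conditions y(0) = a and y(1) = b, the expression y(x) = g(x) + (1 - x)(a - g(0)) + x(b - g(1)) satisfies both conditions for any free function g. The sketch below checks this numerically. It is a simplified univariate illustration with hypothetical names, not the multivariate expressions or the neural-network training used in the article.

```python
import math


def constrained_expression(g, a, b):
    """Return y with y(0) = a and y(1) = b for an arbitrary free function g.

    y(x) = g(x) + (1 - x) * (a - g(0)) + x * (b - g(1))
    """
    g0, g1 = g(0.0), g(1.0)

    def y(x):
        return g(x) + (1.0 - x) * (a - g0) + x * (b - g1)

    return y


if __name__ == "__main__":
    # Any free function works; in deep TFC the free function is a neural network.
    for free_function in (math.sin, math.exp, lambda x: x ** 3 - 2.0):
        y = constrained_expression(free_function, a=2.0, b=-1.0)
        print(y(0.0), y(1.0), y(0.5))
```

Because the boundary conditions hold by construction, an optimizer only has to reduce the residual of the differential equation, which is what makes the resulting problem unconstrained.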

1809.00710 2026-06-04 math.OC cs.DC cs.MA cs.SY eess.SY stat.ML

A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks

在网络分布式优化中的最优算法的双方法

César A. Uribe, Soomin Lee, Alexander Gasnikov, Angelia Nedić

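AI summary: This paper studies dual-based algorithms for distributed convex optimization over networks. It provides complexity bounds for four different cases and proposes distributed algorithms, based on the dual of an appropriately formulated primal problem, that match the optimal rates of their centralized counterparts up to constant and logarithmic factors, with an additional cost related to the spectral properties of the network.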

详情
Comments
This work is an extended version of the manuscript: Optimal Algorithms for Distributed Optimization arXiv:1712.00232
AI中文摘要

我们研究网络上分布式凸优化问题的对偶算法,目标是最小化网络上的函数和$\sum_{i=1}^{m}f_i(z)$。我们针对四种不同情形给出了复杂度界:每个函数$f_i$强凸且光滑,每个函数强凸或光滑,以及函数是凸的但既不强凸也不光滑。我们的方法基于对适当构造的原始问题取对偶,其中包含一个刻画通信限制的图。我们提出的分布式算法在相差常数和对数因子的意义下达到与集中式算法相同的最优速率,并附带一个与网络谱性质相关的最优额外代价。首先,我们关注其Legendre-Fenchel共轭可以显式最小化的函数,即可容许函数或对偶友好函数。然后,我们研究针对非对偶友好函数的分布式优化算法,以及一种改进对所涉函数参数依赖性的方法。文中还给出了所提算法的数值分析。

英文摘要

We study dual-based algorithms for distributed convex optimization problems over networks, where the objective is to minimize a sum $\sum_{i=1}^{m}f_i(z)$ of functions over a network. We provide complexity bounds for four different cases, namely: each function $f_i$ is strongly convex and smooth, each function is either strongly convex or smooth, and when it is convex but neither strongly convex nor smooth. Our approach is based on the dual of an appropriately formulated primal problem, which includes a graph that models the communication restrictions. We propose distributed algorithms that achieve the same optimal rates as their centralized counterparts (up to constant and logarithmic factors), with an additional optimal cost related to the spectral properties of the network. Initially, we focus on functions for which we can explicitly minimize their Legendre-Fenchel conjugates, i.e., admissible or dual-friendly functions. Then, we study distributed optimization algorithms for non-dual-friendly functions, as well as a method to improve the dependency on the parameters of the functions involved. Numerical analysis of the proposed algorithms is also provided.

1703.09074 2026-06-04 math.NA cs.NA stat.CO

Randomized CP Tensor Decomposition

随机CP张量分解

N. Benjamin Erichson, Krithika Manohar, Steven L. Brunton, J. Nathan Kutz

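AI summary: This paper proposes a randomized algorithm for the CP tensor decomposition that handles large-scale tensor data efficiently, computing a low-rank approximation with controllable error.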

详情
AI中文摘要

CANDECOMP/PARAFAC(CP)张量分解是一种用于多维数据的流行降维方法。人们常常寻求降维,因为许多高维张量的内在秩相对于环境测量空间的维度较低。然而,“大数据”的出现给这种基本张量分解的计算带来了显著挑战。通过利用现代随机算法,我们证明可以从张量的较小表示中学习到相干结构,而所需时间仅为原来的一小部分。因此,这种简单但强大的算法使得即使对大规模张量也能计算近似CP分解。近似误差可以通过过采样和幂迭代计算来控制。除理论结果外,若干实验结果也展示了所提算法的性能。

英文摘要

The CANDECOMP/PARAFAC (CP) tensor decomposition is a popular dimensionality-reduction method for multiway data. Dimensionality reduction is often sought after since many high-dimensional tensors have low intrinsic rank relative to the dimension of the ambient measurement space. However, the emergence of `big data' poses significant computational challenges for computing this fundamental tensor decomposition. By leveraging modern randomized algorithms, we demonstrate that coherent structures can be learned from a smaller representation of the tensor in a fraction of the time. Thus, this simple but powerful algorithm enables one to compute the approximate CP decomposition even for massive tensors. The approximation error can thereby be controlled via oversampling and the computation of power iterations. In addition to theoretical results, several empirical results demonstrate the performance of the proposed algorithm.

1707.09350 2026-06-04 cs.SI cs.SY eess.SY math.ST physics.soc-ph stat.ML stat.TH

Centrality measures for graphons: Accounting for uncertainty in networks

Graphon的中心性度量:考虑网络中的不确定性

Marco Avella-Medina, Francesca Parise, Michael T. Schaub, Santiago Segarra

AI总结 本文研究了graphon中的中心性度量,提出了一种基于graphon理论的统计方法,通过定义graphon的中心性度量并建立其与传统图中心性度量之间的联系,以应对大规模不确定网络中的计算和概念挑战。

详情
Journal ref
IEEE Transactions on Network Science and Engineering, 2020, vol. 7, no. 1, pp. 520-537
Comments
Authors ordered alphabetically, all authors contributed equally. 21 pages, 7 figures
AI中文摘要

随着以图形式建模的关系数据集不断增长,其数据采集过程受到不确定性的渗透,基于图的分析技术在计算和概念上变得具有挑战性。特别是,节点中心性度量假设图是完全已知的,而这一前提不一定适用于大型不确定网络。因此,在存在不确定性的情况下,中心性度量可能无法准确提取节点的重要性。为缓解这些问题,我们建议基于graphon理论的统计方法:我们引入了graphon的中心性度量的正式定义,并建立了其与传统图中心性度量之间的联系。这种方法的一个关键优势是,在graphon的建模级别上定义的中心性度量本质上对特定图实现的随机变化具有鲁棒性。利用线性积分算子的理论,我们为graphon定义了度数、特征向量、Katz和PageRank中心性函数,并建立了集中不等式,证明了graphon中心性函数自然地作为更大规模图序列中对应度量的极限。同样的集中不等式也提供了graphon中心性函数与任何采样图上中心性度量之间的高概率界限,从而建立了所测中心性分数的不确定性度量。

英文摘要

As relational datasets modeled as graphs keep increasing in size and their data-acquisition is permeated by uncertainty, graph-based analysis techniques can become computationally and conceptually challenging. In particular, node centrality measures rely on the assumption that the graph is perfectly known -- a premise not necessarily fulfilled for large, uncertain networks. Accordingly, centrality measures may fail to faithfully extract the importance of nodes in the presence of uncertainty. To mitigate these problems, we suggest a statistical approach based on graphon theory: we introduce formal definitions of centrality measures for graphons and establish their connections to classical graph centrality measures. A key advantage of this approach is that centrality measures defined at the modeling level of graphons are inherently robust to stochastic variations of specific graph realizations. Using the theory of linear integral operators, we define degree, eigenvector, Katz and PageRank centrality functions for graphons and establish concentration inequalities demonstrating that graphon centrality functions arise naturally as limits of their counterparts defined on sequences of graphs of increasing size. The same concentration inequalities also provide high-probability bounds between the graphon centrality functions and the centrality measures on any sampled graph, thereby establishing a measure of uncertainty of the measured centrality score.
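The limit statement for degree centrality can be checked numerically on a toy case. The sketch below assumes a hypothetical graphon $W(x,y)=xy$, whose degree function is $\int_0^1 W(x,y)\,dy = x/2$, samples one graph from it, and compares normalised degrees with that function. The graphon, sample size, and variable names are illustrative choices, not taken from the paper.

```python
import numpy as np

def W(x, y):
    """Hypothetical graphon W(x, y) = x * y on [0, 1]^2."""
    return x * y

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(size=n)                       # latent node positions
P = W(x[:, None], x[None, :])                 # edge probabilities
upper = np.triu(rng.uniform(size=(n, n)) < P, k=1)
A = (upper | upper.T).astype(float)           # symmetric adjacency, no self-loops

sampled = A.sum(axis=1) / n                   # normalised degree of each node
limit = x / 2.0                               # graphon degree function at each position
max_gap = np.max(np.abs(sampled - limit))
```

The gap `max_gap` shrinks as `n` grows, which is the kind of behaviour the concentration inequalities quantify.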

1809.02341 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization

非线性优化中的快速安德森-切比雪夫加速方法

Zhize Li, Jian Li

AI总结 本文提出了一种快速安德森-切比雪夫加速方法,用于非线性优化问题,该方法在二次函数上实现了最优收敛率O(√κ ln(1/ε)),并提供了通用非线性问题的收敛分析,同时提出了动态猜测超参数的算法。

详情
Comments
To appear in AISTATS 2020
AI中文摘要

安德森加速(或安德森混合)是一种高效的固定点迭代方法$ x_{t+1}=G(x_t) $,例如梯度下降可以视为迭代应用操作$ G(x) \triangleq x-α\nabla f(x) $。本文表明,安德森加速结合切比雪夫多项式可以实现最优收敛率$ O(\sqrt{κ}\ln\frac{1}{ε}) $,这改进了之前对于二次函数提供的结果$ O(κ\ln\frac{1}{ε}) $(Toth and Kelley, 2015)。此外,我们为一般非线性问题提供了收敛分析。此外,如果超参数(例如Lipschitz光滑参数$ L $)不可用,我们提出了一种猜测算法来动态猜测它们,并证明了类似的收敛率。最后,实验结果表明,所提出的安德森-切比雪夫加速方法比其他算法如普通梯度下降(GD)、Nesterov加速GD收敛更快。此外,这些算法结合所提出的猜测算法(动态猜测超参数)实现了更好的性能。

英文摘要

Anderson acceleration (or Anderson mixing) is an efficient acceleration method for fixed point iterations $x_{t+1}=G(x_t)$, e.g., gradient descent can be viewed as iteratively applying the operation $G(x) \triangleq x-α\nabla f(x)$. It is known that Anderson acceleration is quite efficient in practice and can be viewed as an extension of Krylov subspace methods for nonlinear problems. In this paper, we show that Anderson acceleration with Chebyshev polynomial can achieve the optimal convergence rate $O(\sqrtκ\ln\frac{1}ε)$, which improves the previous result $O(κ\ln\frac{1}ε)$ provided by (Toth and Kelley, 2015) for quadratic functions. Moreover, we provide a convergence analysis for minimizing general nonlinear problems. Besides, if the hyperparameters (e.g., the Lipschitz smooth parameter $L$) are not available, we propose a guessing algorithm for guessing them dynamically and also prove a similar convergence rate. Finally, the experimental results demonstrate that the proposed Anderson-Chebyshev acceleration method converges significantly faster than other algorithms, e.g., vanilla gradient descent (GD), Nesterov's Accelerated GD. Also, these algorithms combined with the proposed guessing algorithm (guessing the hyperparameters dynamically) achieve much better performance.
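For orientation, here is a minimal sketch of plain Anderson mixing applied to the gradient-descent map $G(x)=x-α\nabla f(x)$ on a quadratic. It is the standard windowed least-squares form, not the Chebyshev variant analysed in the paper. The memory size `m`, the test problem, and all names are assumptions made for illustration.

```python
import numpy as np

def anderson(G, x0, m=5, iters=30):
    """Anderson mixing with memory m for the fixed-point iteration x = G(x)."""
    x = x0.copy()
    g_hist, f_hist = [], []                  # recent values G(x_k) and residuals G(x_k) - x_k
    for _ in range(iters):
        gx = G(x)
        f = gx - x
        g_hist = (g_hist + [gx])[-(m + 1):]
        f_hist = (f_hist + [f])[-(m + 1):]
        if len(f_hist) == 1:
            x = gx                           # first step: plain fixed-point update
            continue
        # Differences of consecutive residuals and of consecutive G-values.
        dF = np.diff(np.stack(f_hist, axis=1), axis=1)
        dG = np.diff(np.stack(g_hist, axis=1), axis=1)
        # Least-squares mixing coefficients: minimise ||f - dF @ gamma||.
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - dG @ gamma
    return x

# Hypothetical test problem: f(x) = 0.5 x^T A x - b^T x with condition number 20.
eigs = np.linspace(1.0, 20.0, 20)
A, b = np.diag(eigs), np.ones(20)
alpha = 1.0 / eigs.max()
G = lambda x: x - alpha * (A @ x - b)        # one gradient-descent step
x_star = b / eigs

x_gd = np.zeros(20)
for _ in range(30):
    x_gd = G(x_gd)
err_gd = np.linalg.norm(x_gd - x_star)
err_aa = np.linalg.norm(anderson(G, np.zeros(20)) - x_star)
```

Both runs call `G` the same number of times, so comparing `err_aa` with `err_gd` isolates the effect of the mixing step.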

1812.11137 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Differential Temporal Difference Learning

差分时间差分学习

Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn

AI总结 本文提出了一种新的差分时间差分学习算法,旨在解决传统时间差分学习方法中收敛缓慢和相对价值函数计算中一致性算法仅在特殊情况下存在的问题。

详情
Comments
Preliminary versions of some of the results in this article were submitted as arXiv:1604.01828
AI中文摘要

由马尔可夫决策过程导出的价值函数在许多统计和工程应用中的机器学习技术中作为算法和性能指标的核心组成部分。在大多数实际情况下,计算相关贝尔曼方程的解具有挑战性。一种流行的近似技术,即时间差分(TD)学习算法,是通用强化学习方法的重要子类。本文介绍的算法旨在解决TD学习方法的两个已知难题:由于非常高的方差导致的收敛缓慢,以及在计算相对价值函数的问题中,仅在特殊情况下存在一致算法。首先,我们表明这些价值函数的梯度具有可以用于算法设计的表示形式。基于这一结果,引入了一种新的差分TD学习算法。对于在欧几里得空间上具有光滑动力学的马尔可夫模型,在一般条件下,这些算法被证明是自洽的。数值结果表明,与标准方法相比,具有显著的方差减少。

英文摘要

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to the associated Bellman equations is challenging in most practical cases of interest. A popular class of approximation techniques, known as Temporal Difference (TD) learning algorithms, is an important sub-class of general reinforcement learning methods. The algorithms introduced in this paper are intended to resolve two well-known difficulties of TD-learning approaches: their slow convergence due to very high variance, and the fact that, for the problem of computing the relative value function, consistent algorithms exist only in special cases. First, we show that the gradients of these value functions admit a representation that lends itself to algorithm design. Based on this result, a new class of differential TD-learning algorithms is introduced. For Markovian models on Euclidean space with smooth dynamics, the algorithms are shown to be consistent under general conditions. Numerical results show dramatic variance reduction when compared to standard methods.

1904.12794 2026-06-04 physics.comp-ph cs.NA math.NA stat.ML

Constraint-Aware Neural Networks for Riemann Problems

具有约束意识的神经网络用于黎曼问题

Jim Magiera, Deep Ray, Jan S. Hesthaven, Christian Rohde

AI总结 本文提出两种生成具有约束意识的神经网络策略,用于解决强非线性波运动中的黎曼问题,通过减少约束偏差来提高数值模拟的精度和稳定性。

详情
AI中文摘要

神经网络越来越多地被用于复杂(数据驱动)模拟中作为替代模型或加速经典替代模型的计算。在许多应用中,物理约束,如质量或能量守恒,必须得到满足以获得可靠的结果。然而,标准机器学习算法通常不专门设计来尊重这些约束。我们提出了两种不同的策略来生成具有约束意识的神经网络。我们测试了它们在强非线性波运动压缩流体流动中的前沿捕捉方案中的性能。具体来说,在这种情况下,所谓的黎曼问题必须作为替代模型来解决。其解描述了数值模拟中捕获的波前的局部动力学。考虑了三个模型问题:立方通量模型问题、等温两相流模型和欧拉方程。我们证明了约束偏差的减少与所有模型问题中的离散化误差降低相关,此外还有满足约束的结构优势。

英文摘要

Neural networks are increasingly used in complex (data-driven) simulations as surrogates or for accelerating the computation of classical surrogates. In many applications physical constraints, such as mass or energy conservation, must be satisfied to obtain reliable results. However, standard machine learning algorithms are generally not tailored to respect such constraints. We propose two different strategies to generate constraint-aware neural networks. We test their performance in the context of front-capturing schemes for strongly nonlinear wave motion in compressible fluid flow. Precisely, in this context so-called Riemann problems have to be solved as surrogates. Their solution describes the local dynamics of the captured wave front in numerical simulations. Three model problems are considered: a cubic flux model problem, an isothermal two-phase flow model, and the Euler equations. We demonstrate that a decrease in the constraint deviation correlates with low discretization errors for all model problems, in addition to the structural advantage of fulfilling the constraint.

1612.02024 2026-06-04 math.ST econ.GN q-fin.EC stat.AP stat.ME stat.TH

Impossible Inference in Econometrics: Theory and Applications

计量经济学中的不可能推断:理论与应用

Marinho Bertanha, Marcelo J. Moreira

AI总结 本文研究了假设检验功效极低(即功效小于显著性水平)的模型,即检验不可能性(类型A)出现在任何替代假设无法与原假设区分的情况下。我们还研究了在参数估计中几乎必然有界置信区间的不可能性(类型B)的情形,这种不可能性发生在参数几乎未识别的条件下。本文框架将许多现有关于不可能推断的研究联系起来,这些研究依赖于不同的拓扑观念来展示模型无法区分或几乎未识别。我们还利用弱拓扑(由分布收敛诱导)推导了这两种不可能性。在弱拓扑下的不可能性通常更容易证明,适用于许多常用检验,并对稳健假设检验有用。最后,我们通过多个经济应用中的不连续模型和时间序列模型展示了不可能推断。

详情
AI中文摘要

本文研究了假设检验功效极低(即功效小于显著性水平)的模型,即检验不可能性(类型A)出现在任何替代假设无法与原假设区分的情况下。我们还研究了在参数估计中几乎必然有界置信区间的不可能性(类型B)的情形,这种不可能性发生在参数几乎未识别的条件下。本文理论框架将许多现有关于不可能推断的研究联系起来,这些研究依赖于不同的拓扑观念来展示模型无法区分或几乎未识别。我们还利用弱拓扑(由分布收敛诱导)推导了这两种不可能性。在弱拓扑下的不可能性通常更容易证明,适用于许多常用检验,并对稳健假设检验有用。最后,我们通过多个经济应用中的不连续模型和时间序列模型展示了不可能推断。

英文摘要

This paper studies models in which hypothesis tests have trivial power, that is, power smaller than size. This testing impossibility, or impossibility type A, arises when any alternative is not distinguishable from the null. We also study settings in which it is impossible to have almost surely bounded confidence sets for a parameter of interest. This second type of impossibility (type B) occurs under a condition weaker than the condition for type A impossibility: the parameter of interest must be nearly unidentified. Our theoretical framework connects many existing publications on impossible inference that rely on different notions of topologies to show models are not distinguishable or nearly unidentified. We also derive both types of impossibility using the weak topology induced by convergence in distribution. Impossibility in the weak topology is often easier to prove, it is applicable for many widely-used tests, and it is useful for robust hypothesis testing. We conclude by demonstrating impossible inference in multiple economic applications of models with discontinuity and time-series models.

1811.06838 2026-06-04 stat.ML cs.LG cs.NA math.NA

The Trace Criterion for Kernel Bandwidth Selection for Support Vector Data Description

核带宽选择的迹准则用于支持向量数据描述

Arin Chaudhuri, Carol Sadek, Deovrat Kakde, Wenhao Hu, Hansi Jiang, Seunghyun Kong, Yuewei Liao, Sergiy Peredriy, Haoyu Wang

AI总结 本文提出了一种新的无监督方法,用于选择支持向量数据描述(SVDD)中高斯核的带宽,通过利用核矩阵的低秩表示来建议带宽值,该方法在低维数据中与当前最佳方法竞争,并在许多高维数据类别中表现极佳。

详情
Comments
note: some text overlap with arXiv:1708.05106 because common background material is covered in both papers
AI中文摘要

支持向量数据描述(SVDD)是一种流行的异常检测技术。SVDD分类器将整个数据空间划分为内群区域和外群区域。计算SVDD分类器需要一个核函数,高斯核是一个常见选择。高斯核有一个带宽参数,正确设置该参数对获得良好结果至关重要。小带宽会导致过拟合,使得SVDD分类器高估异常数量,而大带宽会导致欠拟合,无法检测许多异常。本文提出了一种新的无监督方法,用于选择高斯核的带宽。我们的方法利用核矩阵的低秩表示来建议带宽值。我们的新方法在低维数据中与当前最佳方法竞争,并在许多高维数据类别中表现极佳。由于当使用高斯核时,SVDD的数学公式与单类支持向量机(OCSVM)的数学公式相同,因此我们的方法同样适用于OCSVM的高斯核带宽调整。

英文摘要

Support vector data description (SVDD) is a popular anomaly detection technique. The SVDD classifier partitions the whole data space into an inlier region, which consists of the region near the training data, and an outlier region, which consists of points away from the training data. The computation of the SVDD classifier requires a kernel function, for which the Gaussian kernel is a common choice. The Gaussian kernel has a bandwidth parameter, and it is important to set the value of this parameter correctly for good results. A small bandwidth leads to overfitting such that the resulting SVDD classifier overestimates the number of anomalies, whereas a large bandwidth leads to underfitting and an inability to detect many anomalies. In this paper, we present a new unsupervised method for selecting the Gaussian kernel bandwidth. Our method exploits a low-rank representation of the kernel matrix to suggest a kernel bandwidth value. Our new technique is competitive with the current state of the art for low-dimensional data and performs extremely well for many classes of high-dimensional data. Because the mathematical formulation of SVDD is identical with the mathematical formulation of one-class support vector machines (OCSVM) when the Gaussian kernel is used, our method is equally applicable to Gaussian kernel bandwidth tuning for OCSVM.
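To see why the bandwidth matters for a low-rank view of the kernel matrix, the sketch below computes the Gaussian kernel matrix at three bandwidths and counts its significant eigenvalues. This is not the paper's trace criterion. The kernel parameterisation $\exp(-\|x_i-x_j\|^2/(2s^2))$, the tolerance, and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(X, s):
    """Kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 s^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-d2 / (2 * s**2))

def effective_rank(K, tol=1e-6):
    """Number of eigenvalues above tol times the largest one."""
    w = np.linalg.eigvalsh(K)
    return int(np.sum(w > tol * w.max()))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))            # hypothetical training data
ranks = {s: effective_rank(gaussian_kernel(X, s)) for s in (0.01, 1.0, 100.0)}
```

A very small bandwidth makes the matrix close to the identity (every point is its own cluster, the overfitting regime), while a very large one makes it close to a constant matrix (the underfitting regime). A useful bandwidth sits between these extremes.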

1604.02100 2026-06-04 stat.ML cs.IT cs.NA math.IT math.NA math.SP physics.med-ph

Hankel Matrix Nuclear Norm Regularized Tensor Completion for $N$-dimensional Exponential Signals

Hankel矩阵核范数正则化的张量补全用于N维指数信号

Jiaxi Ying, Hengfa Lu, Qingtao Wei, Jian-Feng Cai, Di Guo, Jihui Wu, Zhong Chen, Xiaobo Qu

AI总结 本文提出了一种基于Hankel矩阵核范数正则化的张量补全方法,用于恢复N维(特别是N≥3)的指数信号,通过同时利用CANDECOMP/PARAFAC结构和因子向量的指数结构,有效恢复有限样本下的完整信号。

详情
Comments
15 pages, 12 figures
AI中文摘要

信号通常被建模为化学、生物学和医学成像光谱中的指数函数叠加。然而,由于快速数据采集或其他不可避免的原因,只能获取少量样本,因此如何恢复完整信号成为活跃的研究课题。但现有方法无法有效恢复N≥3维的指数信号。本文研究了从部分观测中恢复N维(特别是N≥3)指数信号的问题,并将其建模为具有指数因子向量的低秩张量补全问题。通过同时利用CANDECOMP/PARAFAC结构和因子向量的指数结构,后者通过最小化包含Hankel矩阵核范数的目标函数来促进。在模拟和真实磁共振光谱数据上的实验结果表明,所提出的方法能够从非常有限的样本中成功恢复完整信号,并且对估计的张量秩具有鲁棒性。

英文摘要

Signals are generally modeled as a superposition of exponential functions in spectroscopy of chemistry, biology and medical imaging. For fast data acquisition or other inevitable reasons, however, only a small number of samples may be acquired, and thus how to recover the full signal has become an active research topic. But existing approaches cannot efficiently recover $N$-dimensional exponential signals with $N\geq 3$. In this paper, we study the problem of recovering $N$-dimensional (particularly $N\geq 3$) exponential signals from partial observations, and formulate this problem as a low-rank tensor completion problem with exponential factor vectors. The full signal is reconstructed by simultaneously exploiting the CANDECOMP/PARAFAC structure and the exponential structure of the associated factor vectors. The latter is promoted by minimizing an objective function involving the nuclear norm of Hankel matrices. Experimental results on simulated and real magnetic resonance spectroscopy data show that the proposed approach can successfully recover full signals from very limited samples and is robust to the estimated tensor rank.
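The reason a Hankel nuclear norm promotes exponential structure is that the Hankel matrix of a sum of $R$ exponentials has rank $R$. The sketch below checks this for a one-dimensional signal. The frequencies, damping factors, amplitudes, and matrix size are hypothetical values picked for the demonstration.

```python
import numpy as np

def hankel(x, rows):
    """Hankel matrix H[i, j] = x[i + j] built from a 1-D signal x."""
    cols = len(x) - rows + 1
    idx = np.arange(rows)[:, None] + np.arange(cols)[None, :]
    return x[idx]

# Hypothetical signal: R = 3 damped complex exponentials, 64 samples.
n = np.arange(64)
freqs = np.array([0.10, 0.23, 0.37])
damps = np.array([0.01, 0.02, 0.05])
amps = np.array([1.0, 0.8, 0.5])
x = (amps * np.exp((-damps + 2j * np.pi * freqs) * n[:, None])).sum(axis=1)

H = hankel(x, rows=32)
sv = np.linalg.svd(H, compute_uv=False)
numerical_rank = int(np.sum(sv > 1e-8 * sv[0]))   # count of significant singular values
nuclear_norm = sv.sum()                           # convex surrogate for the rank
```

Because the nuclear norm is the sum of singular values, penalising it favours factor vectors whose Hankel matrices have few significant singular values, that is, few exponential components.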

1805.06604 2026-06-04 math.NA cs.NA math.OC stat.ML

Accelerating Nonnegative Matrix Factorization Algorithms using Extrapolation

利用外推加速非负矩阵因子分解算法

Andersen Man Shun Ang, Nicolas Gillis

AI总结 本文提出了一种通用框架,用于显著加速非负矩阵因子分解(NMF)算法。该框架灵感来源于用于加速凸优化中梯度方法的外推方案以及平行切线法。然而,在处理非凸NMF问题的双块精确坐标下降算法中使用外推是新颖的。我们通过合成、图像和文档数据集,展示了该方法在两个最先进的NMF算法(加速层次交替最小二乘法(A-HALS)和交替非负最小二乘法(ANLS))上的性能。

详情
Journal ref
Neural Computation 31 (2), pp. 417-439, 2019
Comments
19 pages, 6 figures, 6 tables. v2: few typos corrected, additional comparison with the extrapolated projected gradient method of Xu and Yin (SIAM J. on Imaging Sciences, 2013)
AI中文摘要

在本文中,我们提出了一种通用框架,用于显著加速非负矩阵因子分解(NMF)算法。该框架受到用于加速凸优化中梯度方法的外推方案和平行切线法的启发。然而,在处理非凸NMF问题的双块精确坐标下降算法中使用外推是新颖的。我们通过合成、图像和文档数据集,展示了该方法在两个最先进的NMF算法,即加速层次交替最小二乘法(A-HALS)和交替非负最小二乘法(ANLS)上的性能。

英文摘要

In this paper, we propose a general framework to significantly accelerate algorithms for nonnegative matrix factorization (NMF). This framework is inspired by the extrapolation scheme used to accelerate gradient methods in convex optimization and by the method of parallel tangents. However, the use of extrapolation in the context of the two-block exact coordinate descent algorithms tackling the non-convex NMF problems is novel. We illustrate the performance of this approach on two state-of-the-art NMF algorithms, namely, accelerated hierarchical alternating least squares (A-HALS) and alternating nonnegative least squares (ANLS), using synthetic, image and document data sets.

1803.00204 2026-06-04 cs.LG cs.AI cs.NA math.NA stat.ML

Scalar Quantization as Sparse Least Square Optimization

标量量化作为稀疏最小二乘优化

Chen Wang, Xiaomei Yang, Shaomin Fei, Kai Zhou, Xiaofeng Gong, Miao Du, Ruisen Luo

AI总结 本文提出了一种基于稀疏最小二乘优化的新方法,用于解决标量量化中的问题,通过引入l1、l1+l2和l0正则化,改进了传统聚类方法的不足,提升了在位宽缩减场景下的性能。

详情
Journal ref
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019
AI中文摘要

量化可以用来形成具有共享值的新向量/矩阵,其值接近原始数据。近年来,标量量化在值共享应用中的普及度迅速上升,因为它在减少神经网络复杂度方面具有巨大实用性。现有的基于聚类的量化技术虽然发展成熟,但存在多个缺点,包括对随机种子的依赖性、空集群或超出范围的集群,以及大量集群时的时间复杂度高。为克服这些问题,本文从新的视角研究标量量化问题,即稀疏最小二乘优化。具体来说,受稀疏最小二乘回归性质的启发,提出了几种基于l1最小二乘的量化算法。此外,还提出了类似的方案,具有l1 + l2和l0正则化。此外,为了计算给定数量的值/集群的量化结果,本文设计了一种迭代方法和一种基于聚类的方法,并且两者都建立在稀疏最小二乘之上。本文表明,后者方法在数学上等价于改进版的k-means聚类基量化算法,尽管两种算法起源于不同的直觉。所提出的算法在三种类型的数据上进行了测试,比较和分析了其计算性能,包括信息损失、时间消耗以及稀疏向量值的分布。本文为量化领域提供了新的视角,所提出的算法在某些位宽缩减场景下表现优异,当所需的量化后分辨率(值的数量)不显著低于原始数量时尤其如此。

英文摘要

Quantization can be used to form new vectors/matrices with shared values close to the original. In recent years, the popularity of scalar quantization for value-sharing applications has been soaring, as it has been found to be highly useful in reducing the complexity of neural networks. Existing clustering-based quantization techniques, while well developed, have multiple drawbacks, including dependency on the random seed, empty or out-of-range clusters, and high time complexity for a large number of clusters. To overcome these problems, in this paper, the problem of scalar quantization is examined from a new perspective, namely sparse least square optimization. Specifically, inspired by the property of sparse least square regression, several quantization algorithms based on $l_1$ least square are proposed. In addition, similar schemes with $l_1 + l_2$ and $l_0$ regularization are proposed. Furthermore, to compute quantization results with a given number of values/clusters, this paper designs an iterative method and a clustering-based method, both built on sparse least square. The paper shows that the latter method is mathematically equivalent to an improved version of the k-means clustering-based quantization algorithm, although the two algorithms originated from different intuitions. The proposed algorithms were tested with three types of data, and their computational performance, including information loss, time consumption, and the distribution of the values of the sparse vectors, was compared and analyzed. The paper offers a new perspective on quantization, and the proposed algorithms can outperform existing methods, especially under some bit-width reduction scenarios, when the required post-quantization resolution (number of values) is not significantly lower than the original number.

1605.06311 2026-06-04 stat.CO cs.CV cs.SY eess.SY

Poisson multi-Bernoulli conjugate prior for multiple extended object filtering

泊松多伯努利共轭先验用于多扩展目标滤波

Karl Granstrom, Maryam Fatemi, Lennart Svensson

AI总结 本文提出了一种用于多扩展目标滤波的泊松多伯努利混合(PMBM)共轭先验。通过泊松点过程描述尚未检测到的目标存在,而多伯努利混合描述已检测到的目标分布。预测和更新方程针对标准转移密度和测量似然性进行推导。预测和更新均保持密度的PMBM形式,因此PMBM密度是一种共轭先验。然而,未知的数据关联导致PMBM密度中出现难以处理的大量项,因此需要近似方法。本文给出了伽马高斯逆 Wishart 实现,并提供了处理数据关联问题的方法。模拟研究显示,扩展目标PMBM滤波器在与扩展目标d-GLMB和LMB滤波器的比较中表现良好。使用激光雷达数据的实验展示了同时跟踪已检测和未检测目标的优势。

详情
AI中文摘要

本文提出了一种用于多扩展目标滤波的泊松多伯努利混合(PMBM)共轭先验。通过泊松点过程描述尚未检测到的目标存在,而多伯努利混合描述已检测到的目标分布。预测和更新方程针对标准转移密度和测量似然性进行推导。预测和更新均保持密度的PMBM形式,因此PMBM密度是一种共轭先验。然而,未知的数据关联导致PMBM密度中出现难以处理的大量项,因此需要近似方法。本文给出了伽马高斯逆 Wishart 实现,并提供了处理数据关联问题的方法。模拟研究显示,扩展目标PMBM滤波器在与扩展目标d-GLMB和LMB滤波器的比较中表现良好。使用激光雷达数据的实验展示了同时跟踪已检测和未检测目标的优势。

英文摘要

This paper presents a Poisson multi-Bernoulli mixture (PMBM) conjugate prior for multiple extended object filtering. A Poisson point process is used to describe the existence of yet undetected targets, while a multi-Bernoulli mixture describes the distribution of the targets that have been detected. The prediction and update equations are presented for the standard transition density and measurement likelihood. Both the prediction and the update preserve the PMBM form of the density, and in this sense the PMBM density is a conjugate prior. However, the unknown data associations lead to an intractably large number of terms in the PMBM density, and approximations are necessary for tractability. A gamma Gaussian inverse Wishart implementation is presented, along with methods to handle the data association problem. A simulation study shows that the extended target PMBM filter performs well in comparison to the extended target d-GLMB and LMB filters. An experiment with Lidar data illustrates the benefit of tracking both detected and undetected targets.

1904.08831 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Neural-Attention-Based Deep Learning Architectures for Modeling Traffic Dynamics on Lane Graphs

基于神经注意力的深度学习架构用于车道图上的交通动态建模

Matthew A. Wright, Simon F. G. Ehlers, Roberto Horowitz

AI总结 本文提出了一种基于神经注意力的深度学习架构,用于建模车道图上的交通动态,通过显式编码车道间的关系类型来提高预测性能,并展示了该模型在复杂道路网络中的迁移能力。

详情
Comments
To appear at 2019 IEEE Conference on Intelligent Transportation Systems
AI中文摘要

深度神经网络可以成为强大的工具,但需要特定应用的精心设计以确保数据中最相关信息可被学习。在本文中,我们将深度神经网络应用于车辆交通动态的非线性时空物理问题。我们考虑在车道层面估计宏观量(例如交叉口的排队长度)的问题。由于建模如车道变更等社会行为的复杂性以及这些行为的宏观尺度影响,车道尺度的第一原理建模一直是一个挑战。遵循领域知识,上游/下游车道和邻近车道以不同的方式影响彼此的交通流量,我们应用了一种神经注意力机制,使神经网络层能够以不同的方式聚合来自不同车道的信息。使用微观交通模拟器作为测试平台,我们获得了结果,表明注意力神经网络模型可以利用附近车道的信息来提高预测效果,并且显式编码车道间的关系类型显著提高了性能。我们还展示了所学神经网络在更复杂道路网络中的迁移能力,讨论了其性能退化可能归因于拓扑复杂性增加所引起的新交通行为,并激励从多种道路网络拓扑中学习动态模型。

英文摘要

Deep neural networks can be powerful tools, but require careful application-specific design to ensure that the most informative relationships in the data are learnable. In this paper, we apply deep neural networks to the nonlinear spatiotemporal physics problem of vehicle traffic dynamics. We consider problems of estimating macroscopic quantities (e.g., the queue at an intersection) at a lane level. First-principles modeling at the lane scale has been a challenge due to complexities in modeling social behaviors like lane changes, and those behaviors' resultant macro-scale effects. Following domain knowledge that upstream/downstream lanes and neighboring lanes affect each other's traffic flows in distinct ways, we apply a form of neural attention that allows the neural network layers to aggregate information from different lanes in different manners. Using a microscopic traffic simulator as a testbed, we obtain results showing that an attentional neural network model can use information from nearby lanes to improve predictions, and that explicitly encoding the lane-to-lane relationship types significantly improves performance. We also demonstrate the transfer of our learned neural network to a more complex road network, discuss how its performance degradation may be attributable to new traffic behaviors induced by increased topological complexity, and motivate learning dynamics models from many road network topologies.

1905.09435 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling

MATCHA: 通过匹配分解采样加速去中心化SGD

Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, Soummya Kar

AI总结 该研究提出MATCHA算法,通过匹配分解采样在去中心化SGD中实现误差与运行时间的双赢,验证了其在各种数据集和深度神经网络上的有效性,证明其比传统去中心化SGD快5倍。

详情
AI中文摘要

本文研究了在基于随机梯度下降(SGD)的去中心化训练中常见的误差-运行时间权衡问题。尽管更密集(稀疏)的网络拓扑会使误差按迭代次数收敛得更快(更慢),但每次迭代的通信时间/延迟也更多(更少)。本文提出MATCHA算法,能够在任意网络拓扑中实现误差-运行时间的双赢。MATCHA的主要思想是通过将拓扑分解为匹配来并行化节点间通信。为了保持快速的误差收敛速度,它识别关键链接并更频繁地通过这些链接通信,同时通过较少使用其他链接来节省通信时间。在一系列数据集和深度神经网络上的实验验证了理论分析,并证明MATCHA在达到相同训练损失时所需时间最多可比传统去中心化SGD少5倍。

英文摘要

This paper studies the problem of error-runtime trade-off, typically encountered in decentralized training based on stochastic gradient descent (SGD) using a given network. While a denser (sparser) network topology results in faster (slower) error convergence in terms of iterations, it incurs more (less) communication time/delay per iteration. In this paper, we propose MATCHA, an algorithm that can achieve a win-win in this error-runtime trade-off for any arbitrary network topology. The main idea of MATCHA is to parallelize inter-node communication by decomposing the topology into matchings. To preserve fast error convergence speed, it identifies and communicates more frequently over critical links, and saves communication time by using other links less frequently. Experiments on a suite of datasets and deep neural networks validate the theoretical analyses and demonstrate that MATCHA takes up to $5\times$ less time than vanilla decentralized SGD to reach the same training loss.
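The decomposition step can be sketched as follows: links inside one matching share no node, so they can all communicate in parallel. The greedy assignment below is a simple stand-in and may use more matchings than an optimal edge colouring. The paper's actual decomposition procedure and its choice of activation frequencies are not reproduced here, and the function name and example topology are hypothetical.

```python
def matching_decomposition(edges):
    """Greedily split an undirected edge list into vertex-disjoint matchings."""
    matchings = []          # each entry: (set of used vertices, list of edges)
    for u, v in edges:
        for used, group in matchings:
            if u not in used and v not in used:
                used.update((u, v))
                group.append((u, v))
                break
        else:
            # No existing matching can take this edge, so open a new one.
            matchings.append(({u, v}, [(u, v)]))
    return [group for _, group in matchings]

# Hypothetical 5-node topology: a ring plus one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
matchings = matching_decomposition(edges)
for i, group in enumerate(matchings):
    print(f"matching {i}: {group}")
```

Activating each matching with its own probability per iteration, higher for matchings that contain critical links, would then give the kind of communication schedule the abstract describes.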

1705.06565 2026-06-04 math.NA cs.NA stat.CO

Numerical solution of fractional elliptic stochastic PDEs with spatial white noise

带空间白噪声的分数阶椭圆型随机偏微分方程的数值解法

David Bolin, Kristin Kirchner, Mihály Kovács

AI总结 本文研究了在$\mathbb{R}^d$有界域上带有加性空间白噪声的随机偏微分方程的数值解法,核心方法是利用分数次算子的逆运算表示并结合有限元方法进行空间离散,主要贡献是推导了强均方误差的收敛率并进行了数值验证。

详情
Journal ref
IMA J. Numer. Anal. (2018)
Comments
21 pages, 1 figure
AI中文摘要

考虑在$\mathbb{R}^d$有界域上带有加性空间白噪声的随机偏微分方程的数值近似问题。微分算子由整数阶椭圆微分算子$L$的分数次幂$L^β$($β\in(0,1)$)给出,因此是非局部的。其逆$L^{-β}$由Dunford-Taylor泛函演算中的Bochner积分表示。通过对该积分表示应用求积公式,将逆分数次幂算子$L^{-β}$近似为非分数次预解式$(I + t_j^2 L)^{-1}$的加权和,其中$t_j>0$为求积节点。然后通过标准有限元方法对预解式进行空间离散。该方法结合仅基于有限元离散化质量矩阵的白噪声近似,从而得到用于计算近似解样本的高效数值算法。对于所得到的近似解,分析了强均方误差并推导出显式收敛率。针对$L=κ^2-Δ$($κ>0$)、在单位立方体$(0,1)^d$($d=1,2,3$)上带齐次Dirichlet边界条件、且$β\in(0,1)$变化的数值实验验证了理论结果。

英文摘要

The numerical approximation of solutions to stochastic partial differential equations with additive spatial white noise on bounded domains in $\mathbb{R}^d$ is considered. The differential operator is given by the fractional power $L^β$, $β\in(0,1)$, of an integer order elliptic differential operator $L$ and is therefore non-local. Its inverse $L^{-β}$ is represented by a Bochner integral from the Dunford-Taylor functional calculus. By applying a quadrature formula to this integral representation, the inverse fractional power operator $L^{-β}$ is approximated by a weighted sum of non-fractional resolvents $(I + t_j^2 L)^{-1}$ at certain quadrature nodes $t_j>0$. The resolvents are then discretized in space by a standard finite element method. This approach is combined with an approximation of the white noise, which is based only on the mass matrix of the finite element discretization. In this way, an efficient numerical algorithm for computing samples of the approximate solution is obtained. For the resulting approximation, the strong mean-square error is analyzed and an explicit rate of convergence is derived. Numerical experiments for $L=κ^2-Δ$, $κ> 0$, with homogeneous Dirichlet boundary conditions on the unit cube $(0,1)^d$ in $d=1,2,3$ spatial dimensions for varying $β\in(0,1)$ attest the theoretical results.
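The quadrature idea can be tested on a single eigenvalue $λ>0$ in place of the operator $L$. The sketch assumes the scalar identity $λ^{-β}=\frac{2\sin(πβ)}{π}\int_0^\infty t^{2β-1}(1+t^2λ)^{-1}\,dt$ and an equispaced rule after the substitution $t=e^{y}$. The step size `k`, the number of nodes, and the function name are illustrative assumptions rather than the paper's exact choices.

```python
import numpy as np

def inv_frac_power(lam, beta, k=0.2, n_nodes=300):
    """Approximate lam**(-beta) by a weighted sum of resolvents (1 + t_j^2 lam)^(-1)."""
    y = k * np.arange(-n_nodes, n_nodes + 1)   # equispaced nodes after t = exp(y)
    t = np.exp(y)                              # quadrature nodes t_j > 0
    weights = (2 * k * np.sin(np.pi * beta) / np.pi) * t ** (2 * beta)
    return np.sum(weights / (1.0 + t**2 * lam))

approx = inv_frac_power(3.0, 0.5)
exact = 3.0 ** -0.5
error = abs(approx - exact)
```

In the operator setting each scalar factor $(1+t_j^2λ)^{-1}$ becomes one non-fractional solve with $(I+t_j^2L)$, which is the part a standard finite element code can handle.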

1905.07960 2026-06-04 cs.LG cs.SY eess.SY stat.ML

A novel Multiplicative Polynomial Kernel for Volterra series identification

一种新型的乘积多项式核用于Volterra级数识别

Alberto Dalla Libera, Ruggero Carli, Gianluigi Pillonetto

AI总结 本文提出了一种新的正则化网络用于Volterra模型的识别,通过引入由基本构建块乘积构成的新核,利用边际似然优化估计未知参数,实验表明该方法能更有效地选择影响系统输出的单项式,提升模型预测能力。

详情
AI中文摘要

Volterra级数在非线性系统识别中尤其有用,也得益于其能够近似广泛输入输出映射的能力。然而,从有限数据集中进行识别困难,由于维度灾难。最近的方法已展示出如何利用正则化核方法对此任务有所帮助。本文提出了一种新的正则化网络用于Volterra模型的识别。它依赖于一个新的核,由基本构建块的乘积给出。每个块包含一些未知参数,可以通过边际似然优化从数据中估计。与文献中提出的其他算法相比,数值实验表明,我们的方法能够更好地选择真正影响系统输出的单项式,大大提高了模型的预测能力。

英文摘要

Volterra series are especially useful for nonlinear system identification, also thanks to their capability to approximate a broad range of input-output maps. However, their identification from a finite set of data is hard, due to the curse of dimensionality. Recent approaches have shown how regularized kernel-based methods can be useful for this task. In this paper, we propose a new regularization network for Volterra model identification. It relies on a new kernel given by the product of basic building blocks. Each block contains some unknown parameters that can be estimated from data using marginal likelihood optimization. In comparison with other algorithms proposed in the literature, numerical experiments show that our approach allows better selection of the monomials that really influence the system output, greatly increasing the prediction capability of the model.

1604.03776 2026-06-04 stat.ME econ.GN q-fin.EC stat.AP

Detecting a Structural Change in Functional Time Series Using Local Wilcoxon Statistic

使用局部Wilcoxon统计量检测函数时间序列中的结构变化

Daniel Kosiorowski, Jerzy P. Rydlewski, Małgorzata Snarska

AI总结 本文提出了一种基于局部Wilcoxon统计量的新方法,用于检测函数时间序列中的结构变化,该统计量由Paindaveine和Van Bever (2013)提出的局部深度函数诱导。

详情
Journal ref
Statistical Papers, October 2019, Volume 60, Issue 5, pp 1677 - 1698
Comments
17 pages, 19 figures, LaTeX svjour3 class. The final publication is available at link.springer.com. DOI: 10.1007/s00362-017-0891-y
AI中文摘要

函数数据分析(FDA)是现代多元统计学的一部分,用于分析提供关于曲线、表面或任何其他在连续体上变化的数据的信息。在经济学和实证金融中,我们经常需要处理函数时间序列数据,其中很难决定这些数据应被视为同质还是异质。目前,关于函数数据适当同质性检验的讨论正在进行。我们提出了一种新的统计量,用于检测函数时间序列中的结构变化,该统计量基于由Paindaveine和Van Bever (2013)提出的局部深度函数诱导的局部Wilcoxon统计量。

英文摘要

Functional data analysis (FDA) is a part of modern multivariate statistics that analyses data providing information about curves, surfaces or anything else varying over a certain continuum. In economics and empirical finance we often have to deal with time series of functional data, where we cannot easily decide whether they are to be considered as homogeneous or heterogeneous. At present, a discussion on adequate tests of homogeneity for functional data is ongoing. We propose a novel statistic for detecting a structural change in functional time series based on a local Wilcoxon statistic induced by a local depth function proposed by Paindaveine and Van Bever (2013).

1811.07624 2026-06-04 math.NA cs.DS cs.LG cs.NA stat.ML

Approximate Eigenvalue Decompositions of Linear Transformations with a Few Householder Reflectors

利用少量Householder反射子进行线性变换的近似本征值分解

Cristian Rusu

AI总结 本文提出了一种利用少量Householder反射子构造高效正交矩阵的方法,用于近似正交或对称变换,并应用于快速Mahalanobis距离度量变换的学习。

详情
AI中文摘要

将信号分解为正交基(一组正交分量,每个分量归一化为单位长度)的能力,是许多信号处理方法和应用的核心。经典例子是傅里叶变换和小波变换,它们具有数值高效的实现(FFT和FWT)。不幸的是,正交变换通常结构不规则,因此通常不具有低计算复杂度的性质。在本文中,基于Householder反射子,我们引入了一类正交矩阵,这些矩阵在数值上易于操作:我们通过一个给定参数控制这些矩阵与向量的乘法复杂度。我们提供了数值算法,用于近似任何正交或对称变换,通过给定数量Householder反射子的乘积构造新的正交或对称结构。我们展示了分析和数值证据,以突出所提近似的准确性,并提供了一个应用于快速Mahalanobis距离度量变换学习的应用。

英文摘要

The ability to decompose a signal in an orthonormal basis (a set of orthogonal components, each normalized to have unit length) using a fast numerical procedure rests at the heart of many signal processing methods and applications. The classic examples are the Fourier and wavelet transforms that enjoy numerically efficient implementations (FFT and FWT, respectively). Unfortunately, orthonormal transformations are in general unstructured, and therefore they do not enjoy low computational complexity properties. In this paper, based on Householder reflectors, we introduce a class of orthonormal matrices that are numerically efficient to manipulate: we control the complexity of matrix-vector multiplications with these matrices using a given parameter. We provide numerical algorithms that approximate any orthonormal or symmetric transform with a new orthonormal or symmetric structure made up of products of a given number of Householder reflectors. We show analyses and numerical evidence to highlight the accuracy of the proposed approximations and provide an application to the case of learning fast Mahalanobis distance metric transformations.
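The complexity claim can be made concrete: a product of $h$ Householder reflectors $H_k = I - 2u_ku_k^T$ can be applied to a vector in $O(hn)$ operations without ever forming an $n\times n$ matrix. The sketch below uses random unit vectors $u_k$ purely for illustration. How the paper chooses the reflectors to approximate a given transform is not shown, and the function name is hypothetical.

```python
import numpy as np

def apply_reflectors(U, x):
    """Apply H_1 H_2 ... H_h to x, where H_k = I - 2 u_k u_k^T and u_k = U[:, k]."""
    y = x.copy()
    for k in range(U.shape[1] - 1, -1, -1):    # rightmost reflector acts first
        u = U[:, k]
        y = y - 2.0 * u * (u @ y)               # O(n) work per reflector
    return y

rng = np.random.default_rng(0)
n, h = 64, 4
U = rng.standard_normal((n, h))
U /= np.linalg.norm(U, axis=0)                  # unit-norm reflector vectors
x = rng.standard_normal(n)
y = apply_reflectors(U, x)

# Dense reference, built only to check the fast product.
Q = np.eye(n)
for k in range(h):
    Q = Q @ (np.eye(n) - 2.0 * np.outer(U[:, k], U[:, k]))
```

Here `h` plays the role of the complexity parameter: a larger `h` gives a richer family of orthonormal matrices at a proportionally higher cost per matrix-vector product.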

1812.04426 2026-06-04 cs.LG cs.NA math.NA physics.comp-ph stat.ML

PDE-Net 2.0: Learning PDEs from Data with A Numeric-Symbolic Hybrid Deep Network

PDE-Net 2.0:基于数据学习PDE的数值-符号混合深度网络

Zichao Long, Yiping Lu, Bin Dong

AI总结 本文提出PDE-Net 2.0,一种结合数值近似和符号计算的深度网络,用于从动态数据中学习偏微分方程,并具有较高的灵活性和表达能力。

详情
Comments
16 pages, 15 figures. arXiv admin note: substantial text overlap with arXiv:1710.09668
AI中文摘要

偏微分方程(PDEs)通常是基于经验观察推导得出的。然而,技术的进步使我们能够收集和存储大量数据,这为数据驱动的PDE发现提供了新机会。本文提出了一种新的深度神经网络,称为PDE-Net 2.0,用于从观测动态数据中发现(时间依赖的)PDE,仅需少量对驱动动态机制的先验知识。PDE-Net 2.0的设计基于我们先前的工作\cite{Long2018PDE},其中提出了原始版本的PDE-Net。PDE-Net 2.0是通过卷积近似微分算子和用于模型恢复的符号多层神经网络的结合。与现有方法相比,PDE-Net 2.0通过学习微分算子和PDE模型的非线性响应函数,具有最大的灵活性和表达能力。数值实验表明,PDE-Net 2.0有潜力揭示观测动态的隐藏PDE,并在噪声环境中预测相对较长时间的动力学行为。

英文摘要

Partial differential equations (PDEs) are commonly derived based on empirical observations. However, recent advances in technology enable us to collect and store massive amounts of data, which offers new opportunities for data-driven discovery of PDEs. In this paper, we propose a new deep neural network, called PDE-Net 2.0, to discover (time-dependent) PDEs from observed dynamic data with minor prior knowledge on the underlying mechanism that drives the dynamics. The design of PDE-Net 2.0 is based on our earlier work \cite{Long2018PDE} where the original version of PDE-Net was proposed. PDE-Net 2.0 is a combination of numerical approximation of differential operators by convolutions and a symbolic multi-layer neural network for model recovery. Compared with existing approaches, PDE-Net 2.0 has the most flexibility and expressive power by learning both differential operators and the nonlinear response function of the underlying PDE model. Numerical experiments show that PDE-Net 2.0 has the potential to uncover the hidden PDE of the observed dynamics, and predict the dynamical behavior for a relatively long time, even in a noisy environment.
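The link between convolutions and differential operators can be seen with fixed finite-difference stencils. In the sketch below the filters are hand-written central differences on a periodic 1-D grid, whereas in PDE-Net 2.0 the filters are learned from data. The grid size, test function, and helper name are assumptions made for this illustration.

```python
import numpy as np

# Periodic grid on [0, 2*pi) and a smooth test field u(x) = sin(x).
n = 256
h = 2 * np.pi / n
x = h * np.arange(n)
u = np.sin(x)

def circular_conv(u, kernel):
    """Correlate u with a centred odd-length stencil using periodic wrap-around."""
    r = len(kernel) // 2
    return sum(c * np.roll(u, r - j) for j, c in enumerate(kernel))

d1 = np.array([-1.0, 0.0, 1.0]) / (2 * h)     # central difference for d/dx
d2 = np.array([1.0, -2.0, 1.0]) / h**2        # central difference for d^2/dx^2
ux = circular_conv(u, d1)
uxx = circular_conv(u, d2)

err_first = np.max(np.abs(ux - np.cos(x)))    # compare with the exact derivative
err_second = np.max(np.abs(uxx + np.sin(x)))  # compare with the exact second derivative
```

A symbolic network placed on top of such filter outputs can then combine `u`, `ux`, and `uxx` into candidate right-hand sides of a PDE.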

1806.02957 2026-06-04 cs.LG cs.NA cs.NE math.NA physics.comp-ph stat.ML

A Deep Neural Network Surrogate for High-Dimensional Random Partial Differential Equations

高维随机偏微分方程的深度神经网络替代模型

Mohammad Amin Nabian, Hadi Meidani

AI总结 本文提出了一种基于深度学习的高维随机偏微分方程求解框架,用深度残差网络近似随机PDE,并以强形式或弱形式施加初始和边界条件,在扩散和热传导问题上验证了该方法的准确性。

详情
Journal ref
Probabilistic Engineering Mechanics, 57, pp.14-25 (2019)
AI中文摘要

由于众所周知的维度灾难,开发求解高维随机偏微分方程(PDEs)的高效数值算法一直是一项具有挑战性的任务。我们针对这类问题提出了一种基于深度学习的新求解框架。具体而言,随机PDE由前馈全连接深度残差网络近似,初始和边界约束以强形式或弱形式施加。该框架是无网格的,能够处理不规则计算域。近似深度神经网络的参数通过随机梯度下降(SGD)算法的变种迭代确定。通过与收敛的基于蒙特卡洛的有限元结果进行比较,所提出的框架在扩散和热传导问题上的精度得到了数值验证。

英文摘要

Developing efficient numerical algorithms for the solution of high-dimensional random Partial Differential Equations (PDEs) has been a challenging task due to the well-known curse of dimensionality. We present a new solution framework for these problems based on a deep learning approach. Specifically, the random PDE is approximated by a feed-forward fully-connected deep residual network, with either strong or weak enforcement of initial and boundary constraints. The framework is mesh-free, and can handle irregular computational domains. Parameters of the approximating deep neural network are determined iteratively using variants of the Stochastic Gradient Descent (SGD) algorithm. The satisfactory accuracy of the proposed framework is numerically demonstrated on diffusion and heat conduction problems, in comparison with the converged Monte Carlo-based finite element results.

1802.04447 2026-06-04 cs.SI cs.NA math.NA stat.AP

Graph Coarsening with Preserved Spectral Properties

具有保持谱性质的图粗化

Yu Jin, Andreas Loukas, Joseph F. JaJa

AI总结 本文提出了一种基于谱图理论的新方法,用于在粗化图时保持其谱性质,通过定义新的距离函数来衡量原始图与粗化图之间的差异,并提供高效的图粗化算法以保证谱性质的保持。

详情
Comments
Submitted to AISTATS 2020
AI中文摘要

大规模图在许多现实应用中被广泛用于表示对象关系。大规模图的出现给处理、分析和提取信息带来了显著的计算挑战。图粗化技术通常用于减少计算负载,同时试图保持原始图的基本结构属性。由于对粗化图所保持的具体图属性尚无共识,如何测量原始图与粗化图之间的差异仍然是一个关键挑战。在本文中,我们引入了一种基于谱图理论概念的新视角,提出并论证了新的距离函数,以表征原始图与粗化图之间的差异。我们展示了所提出的谱距离自然地捕捉了图粗化过程中的结构差异。此外,我们还提供了高效的图粗化算法,以生成能够证明保持原始图谱性质的图。实验表明,所提出的算法在图分类和块恢复任务中比先前的图粗化方法表现更优。

英文摘要

Large-scale graphs are widely used to represent object relationships in many real world applications. The occurrence of large-scale graphs presents significant computational challenges to process, analyze, and extract information. Graph coarsening techniques are commonly used to reduce the computational load while attempting to maintain the basic structural properties of the original graph. As there is no consensus on the specific graph properties preserved by coarse graphs, how to measure the differences between original and coarse graphs remains a key challenge. In this work, we introduce a new perspective regarding the graph coarsening based on concepts from spectral graph theory. We propose and justify new distance functions that characterize the differences between original and coarse graphs. We show that the proposed spectral distance naturally captures the structural differences in the graph coarsening process. In addition, we provide efficient graph coarsening algorithms to generate graphs which provably preserve the spectral properties from original graphs. Experiments show that our proposed algorithms consistently achieve better results compared to previous graph coarsening methods on graph classification and block recovery tasks.

1701.08711 2026-06-04 cs.CL cs.LG econ.GN q-fin.EC stat.ML

Predicting Auction Price of Vehicle License Plate with Deep Recurrent Neural Network

利用深度循环神经网络预测车辆车牌拍卖价格

Vinci Chow

AI总结 本文提出将车辆车牌价格预测视为自然语言处理任务,通过构建深度循环神经网络来预测香港车牌拍卖价格,并展示了模型在解释价格变化和扩展为车牌搜索引擎方面的贡献。

详情
AI中文摘要

在中国社会,迷信因素极为重要,具有吉祥数字的车辆车牌在拍卖中可以高价成交。与其他珍贵物品不同,车牌在拍卖前并不预估价格。本文提出将车牌价格预测视为自然语言处理(NLP)任务,因为价值取决于车牌上每个字符的含义和语义。本文构建了一个深度循环神经网络(RNN)来预测香港车牌的价格,基于车牌上的字符。在13年的历史拍卖价格上评估,深度RNN的预测可以解释超过80%的价格变化,显著优于以前的模型。此外,本文还展示了该模型如何扩展为车牌搜索引擎,并提供价格分布的估计。

英文摘要

In Chinese societies, superstition is of paramount importance, and vehicle license plates with desirable numbers can fetch very high prices in auctions. Unlike other valuable items, license plates are not allocated an estimated price before auction. I propose that the task of predicting plate prices can be viewed as a natural language processing (NLP) task, as the value depends on the meaning of each individual character on the plate and its semantics. I construct a deep recurrent neural network (RNN) to predict the prices of vehicle license plates in Hong Kong, based on the characters on a plate. I demonstrate the importance of having a deep network and of retraining. Evaluated on 13 years of historical auction prices, the deep RNN's predictions can explain over 80 percent of price variations, outperforming previous models by a significant margin. I also demonstrate how the model can be extended to become a search engine for plates and to provide estimates of the expected price distribution.

1712.00716 2026-06-04 stat.CO cs.IT cs.NA math.IT math.NA math.OC stat.ML

Convolutional Phase Retrieval via Gradient Descent

通过梯度下降进行卷积相位恢复

Qing Qu, Yuqian Zhang, Yonina C. Eldar, John Wright

AI总结 本文研究了通过梯度下降恢复未知信号的问题,利用随机核和足够多的观测来高效恢复信号,克服了测量算子中的依赖性挑战。

详情
Comments
64 pages, 9 figures, appeared in NeurIPS 2017. Accepted at IEEE Transactions on Information Theory. This is the final (minor) update: fixed typos and grammar issues
AI中文摘要

我们研究了卷积相位恢复问题,即从m个测量值中恢复未知信号x∈C^n,这些测量值是其与给定核a∈C^m的循环卷积的模值。该模型受信道估计、光学和水下声学通信等应用的启发,在这些应用中,感兴趣的信号受给定信道/滤波器作用,且相位信息难以或不可能获取。我们证明当a是随机的且观测数m足够大时,以高概率可以通过谱初始化和广义梯度下降结合的方法,高效地恢复x,仅相差一个全局相位。主要挑战是处理测量算子中的依赖性,我们通过使用去耦理论、混沌过程的上确界和随机循环矩阵的限制等距性质,以及交替最小化方法的最新分析来克服这一挑战。

英文摘要

We study the convolutional phase retrieval problem, of recovering an unknown signal $\mathbf x \in \mathbb C^n $ from $m$ measurements consisting of the magnitude of its cyclic convolution with a given kernel $\mathbf a \in \mathbb C^m $. This model is motivated by applications such as channel estimation, optics, and underwater acoustic communication, where the signal of interest is acted on by a given channel/filter, and phase information is difficult or impossible to acquire. We show that when $\mathbf a$ is random and the number of observations $m$ is sufficiently large, with high probability $\mathbf x$ can be efficiently recovered up to a global phase shift using a combination of spectral initialization and generalized gradient descent. The main challenge is coping with dependencies in the measurement operator. We overcome this challenge by using ideas from decoupling theory, suprema of chaos processes and the restricted isometry property of random circulant matrices, and recent analysis of alternating minimization methods.
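
The measurement model and the "up to a global phase shift" caveat can be checked with a minimal sketch. It only builds the phaseless measurements and does not implement the spectral initialization or the gradient descent. Zero-padding the signal to the kernel length m (which requires n ≤ m) is an assumption made here so that the cyclic convolution is well defined, and the function names are hypothetical.

```python
import cmath

def cyclic_convolve(a, x):
    """Cyclic convolution of kernel a (length m) with x zero-padded to length m."""
    m = len(a)
    xp = list(x) + [0j] * (m - len(x))
    return [sum(a[(i - j) % m] * xp[j] for j in range(m)) for i in range(m)]

def magnitudes(a, x):
    """Phaseless measurements: absolute values of the cyclic convolution."""
    return [abs(v) for v in cyclic_convolve(a, x)]

a = [1 + 2j, -0.5j, 0.3 + 0j, 2 - 1j, -1 + 0.5j]
x = [1 + 1j, 2 - 1j, 0.5j]
phase = cmath.exp(0.7j)  # an arbitrary unit-modulus factor
y = magnitudes(a, x)
y_rotated = magnitudes(a, [phase * v for v in x])
```

Because the convolution is linear, multiplying x by a unit-modulus factor leaves every magnitude unchanged, which is why recovery can only be expected up to a global phase.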

1806.10273 2026-06-04 eess.SP cs.NA math.NA math.ST stat.AP stat.TH

von Mises Tapering: A Circular Data Windowing

von Mises tapering: 一种圆形数据窗函数

H. M. de Oliveira, F. Chaves

AI总结 本文重新审视了连续标准窗函数,并引入了一种基于von Mises分布的新窗形,分析了连续时间窗的频谱特性,并与经典窗函数在频谱特性上进行了比较,为频谱分析提供了新的选择。

详情
Journal ref
XXXVI SIMPOSIO BRASILEIRO DE TELECOMUNICACOES E PROCESSAMENTO DE SINAIS-SBrT2018
Comments
5 pages, 5 figures
AI中文摘要

本文重新审视了连续标准窗函数,并引入了一种基于 von Mises 圆上正态分布的新锥形窗。文中考虑了连续时间窗并求出了它们的频谱,还就频谱特性与经典窗函数族作了简要比较。这些窗可作为频谱分析中的一种替代选择。

英文摘要

Continuous standard windowing is revisited and a new taper shape is introduced, which is based on the normal circular distribution by von Mises. Continuous-time windows are considered and their spectra obtained. A brief comparison with classical window families is performed in terms of their spectral properties. These windows can be used as an alternative in spectral analysis.

1501.06031 2026-06-04 stat.AP cs.NA math.NA q-bio.NC

From dynamics to links: a sparse reconstruction of the topology of a neural network

从动力学到链接:神经网络拓扑结构的稀疏重建

Giacomo Aletti, Davide Lonardoni, Giovanni Naldi, Thierry Nieus

AI总结 本文提出了一种利用记录电压识别神经网络连接性的新方法,该方法假设网络具有稀疏连接拓扑,并将其性能与基于尖峰序列计算的互相关方法进行了比较。

详情
AI中文摘要

神经科学中的一个主要挑战是确定反映神经活动的信号之间的相互关系以及神经回路中信息处理的发生方式。在细胞和分子层面,信号转导的机制已被深入研究,并且对神经元处理信息的某些基本过程有了更好的了解和认识。相比之下,关于复杂神经网络的组织和功能知之甚少。现在已有实验方法可以实时同时监测大量神经元的电活动。因此,对单个神经元尖峰活动的定性和定量分析是研究神经网络动态和架构的非常有价值的工具。这种活动并不仅仅源于单个神经元的内在性质,而主要是其他神经元直接影响的结果。推断实验尖峰序列被观测到的神经元之间的有效连接性在神经科学中至关重要:其一是为了正确解释所涉及神经元和神经网络的电生理活动,其二是为了正确地将电生理活动与网络完成的功能任务联系起来。在本文中,我们提出了一种利用记录电压识别神经网络连接性的新方法。我们的方法基于网络具有稀疏连接拓扑的假设。在简要描述我们的方法后,我们将报告其性能,并将其与在尖峰序列上计算的互相关进行比较,后者是该领域中的金标准方法。

英文摘要

One major challenge in neuroscience is the identification of interrelations between signals reflecting neural activity and how information processing occurs in the neural circuits. At the cellular and molecular level, mechanisms of signal transduction have been studied intensively, and a better knowledge and understanding of some basic processes of information handling by neurons has been achieved. In contrast, little is known about the organization and function of complex neuronal networks. Experimental methods are now available to simultaneously monitor the electrical activity of a large number of neurons in real time. The qualitative and quantitative analysis of the spiking activity of individual neurons is therefore a very valuable tool for the study of the dynamics and architecture of neural networks. Such activity is not due solely to the intrinsic properties of the individual neural cells; it is mostly a consequence of the direct influence of other neurons. The deduction of the effective connectivity between neurons whose experimental spike trains are observed is of crucial importance in neuroscience: first, for the correct interpretation of the electrophysiological activity of the involved neurons and neural networks, and second, for correctly relating the electrophysiological activity to the functional tasks accomplished by the network. In this work we propose a novel method for the identification of the connectivity of neural networks using recorded voltages. Our approach is based on the assumption that the network has a topology with sparse connections. After a brief description of our method, we report its performance and compare it to the cross-correlation computed on the spike trains, which represents a gold standard method in the field.

1904.10945 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Target-Based Temporal Difference Learning

基于目标的时序差分学习

Donghwan Lee, Niao He

AI总结 本文提出了一种新的基于目标的时序差分学习算法家族,并从理论上分析了其收敛性,展示了这些算法在收敛性能上可能优于标准时序差分学习。

详情
AI中文摘要

目标网络的使用已成为近期深度Q学习算法在强化学习中的流行和关键组成部分,但理论方面的了解仍然有限。在本工作中,我们介绍了一种新的基于目标的时序差分(TD)学习算法家族,并对其收敛性进行了理论分析。与标准TD学习不同,基于目标的TD算法维护两个独立的学习参数——目标变量和在线变量。特别地,我们介绍了该家族中的三个成员,称为平均TD、双TD和周期TD,其中目标变量通过平均、对称或周期性的方式更新,模仿了深度Q学习实践中使用的技术。我们为平均TD和双TD建立了渐近收敛分析,并为周期TD提供了有限样本分析。此外,我们还提供了一些模拟结果,显示这些基于目标的TD算法在收敛性能上可能优于标准TD学习。虽然本工作集中在线性函数逼近和策略评估设置上,但我们将其视为朝着理解具有目标网络的深度Q学习变体理论基础迈出的有意义一步。

英文摘要

The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side. In this work, we introduce a new family of target-based temporal difference (TD) learning algorithms and provide theoretical analysis of their convergence. In contrast to standard TD-learning, target-based TD algorithms maintain two separate learning parameters: the target variable and the online variable. In particular, we introduce three members of the family, called averaging TD, double TD, and periodic TD, where the target variable is updated in an averaging, symmetric, or periodic fashion, mirroring techniques used in deep Q-learning practice. We establish asymptotic convergence analyses for both averaging TD and double TD and a finite sample analysis for periodic TD. In addition, we also provide some simulation results showing potentially superior convergence of these target-based TD algorithms compared to standard TD-learning. While this work focuses on linear function approximation and the policy evaluation setting, we consider this a meaningful step towards the theoretical understanding of deep Q-learning variants with target networks.

1806.07200 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Adaptive Input Estimation in Linear Dynamical Systems with Applications to Learning-from-Observations

线性动态系统中的自适应输入估计及其在“从观察中学习”中的应用

Sebastian Curi, Kfir Y. Levy, Andreas Krause

AI总结 本文提出了一种自适应输入估计算法,在每个时间步高效地权衡偏差和方差以降低总体估计误差,并展示了其在“从观察中学习”框架下对控制器学习的有效性。

详情
Comments
CDC 2019
AI中文摘要

我们研究从系统输出的测量值估计动态系统输入的问题。为此,我们引入了一种新颖的估计算法,该算法明确地在偏差和方差之间进行权衡,以最优地降低总体估计误差。这种最优权衡在每个时间步都高效且自适应地完成。实验表明,与现有最先进的方法相比,我们的方法给出的估计往往误差低得多。最后,我们考虑了更复杂的“从观察中学习”(Learning-from-Observations)框架,其中智能体应从专家示范的输出中学习控制器。我们将我们的估计算法作为该框架中的基本模块,并展示了它能够成功地学习控制器。

英文摘要

We address the problem of estimating the inputs of a dynamical system from measurements of the system's outputs. To this end, we introduce a novel estimation algorithm that explicitly trades off bias and variance to optimally reduce the overall estimation error. This optimal trade-off is done efficiently and adaptively in every time step. Experimentally, we show that our method often produces estimates with substantially lower error compared to the state-of-the-art. Finally, we consider the more complex \emph{Learning-from-Observations} framework, where an agent should learn a controller from the outputs of an expert's demonstration. We incorporate our estimation algorithm as a building block inside this framework and show that it enables learning controllers successfully.

1712.09379 2026-06-04 math.OC cs.DS cs.LG cs.NA math.NA stat.ML

IHT dies hard: Provable accelerated Iterative Hard Thresholding

IHT死守:可证明的加速迭代硬阈值法

Rajiv Khanna, Anastasios Kyrillidis

AI总结 本文研究了在理论和实践中经典迭代硬阈值(IHT)方法中动量运动的使用,通过简单修改普通IHT,探讨了其在具有非凸约束的凸优化标准下的收敛行为,并观察到IHT的加速在投影梯度下降和Frank-Wolfe变体中带来了显著改进。

详情
Comments
accepted to AISTATS 2018
AI中文摘要

我们研究了经典迭代硬阈值(IHT)方法中动量运动的使用,理论和实践相结合。通过简单修改普通IHT,我们探讨了其在具有非凸约束的凸优化标准下的收敛行为,在标准假设下。在多样场景中,我们观察到IHT的加速在投影梯度下降和Frank-Wolfe变体中带来了显著改进。作为我们检查的副产品,我们研究了选择动量参数的影响:类似于凸设置,观察到两种行为模式——“波纹”和线性——这取决于动量的水平。

英文摘要

We study --both in theory and practice-- the use of momentum motions in classic iterative hard thresholding (IHT) methods. By simply modifying plain IHT, we investigate its convergence behavior on convex optimization criteria with non-convex constraints, under standard assumptions. In diverse scenarios, we observe that acceleration in IHT leads to significant improvements, compared to state-of-the-art projected gradient descent and Frank-Wolfe variants. As a byproduct of our inspection, we study the impact of selecting the momentum parameter: similar to convex settings, two modes of behavior are observed --"rippling" and linear-- depending on the level of momentum.
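
The idea of adding momentum to plain IHT can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact scheme: the extrapolation form `u = x + momentum * (x - x_prev)`, the fixed step size, and all function and parameter names are hypothetical choices, and the objective is a toy quadratic.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]

def accelerated_iht(grad, x0, k, step, momentum, iters):
    """IHT with a momentum (extrapolation) step before each gradient update."""
    x_prev = list(x0)
    x = list(x0)
    for _ in range(iters):
        # Extrapolate along the most recent displacement.
        u = [xi + momentum * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = grad(u)
        x_prev = x
        # Gradient step, then projection onto the set of k-sparse vectors.
        x = hard_threshold([ui - step * gi for ui, gi in zip(u, g)], k)
    return x

# Toy objective f(x) = 0.5 * ||x - b||^2, whose gradient is x - b.
b = [3.0, -5.0, 1.0, 0.5]
grad = lambda x: [xi - bi for xi, bi in zip(x, b)]
x_hat = accelerated_iht(grad, [0.0] * 4, k=2, step=0.5, momentum=0.3, iters=100)
```

Setting `momentum=0.0` recovers plain IHT, which makes it easy to compare the two variants on the same objective.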

1512.06888 2026-06-04 eess.SY cs.MA cs.SY math.OC stat.ML

On Distributed Cooperative Decision-Making in Multiarmed Bandits

关于多臂老虎机中分布式协作决策的探讨

Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

AI总结 本文研究了在多臂老虎机问题中分布式协作决策中的探索-利用权衡,设计了协作UCB算法,通过共识算法估计奖励和基于置信上界的启发式方法选择臂,并分析了通信图结构对群体决策性能的影响。

详情
Comments
This revision provides a correction to the original paper, which appeared in the Proceedings of the 2016 European Control Conference (ECC). The second statement of Proposition 1, Theorem 1 and their proofs are new. The new Theorem 1 is used to prove the regret bounds in Theorem 2
AI中文摘要

我们研究了在分布式协作决策中探索-利用权衡的问题,使用多臂老虎机(MAB)问题作为背景。对于分布式协作MAB问题,我们设计了协作UCB算法,该算法包含两个交错的分布式过程:(i)运行共识算法以估计奖励,以及(ii)基于置信上界(UCB)的启发式方法以选择臂。我们严格分析了协作UCB算法的性能,并表征了通信图结构对群体决策性能的影响。

英文摘要

We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm that comprises two interleaved distributed processes: (i) running consensus algorithms for estimation of rewards, and (ii) upper-confidence-bound-based heuristics for selection of arms. We rigorously analyze the performance of the cooperative UCB algorithm and characterize the influence of communication graph structure on the decision-making performance of the group.
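
For readers unfamiliar with upper-confidence-bound heuristics, a minimal sketch of the standard single-agent UCB1 rule is given here. It is not the cooperative UCB algorithm of the paper, in which the reward estimates would come from the running consensus step. The function names are hypothetical.

```python
import math

def ucb1_index(mean_reward, pulls, t):
    """UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return math.inf  # force every arm to be tried at least once
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)

def select_arm(means, counts, t):
    """Pick the arm with the largest index (ties go to the lowest arm number)."""
    indices = [ucb1_index(m, n, t) for m, n in zip(means, counts)]
    return max(range(len(indices)), key=indices.__getitem__)
```

The bonus shrinks as an arm is pulled more often, so rarely sampled arms are revisited even when their current empirical mean is lower.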

1810.08068 2026-06-04 math.NA cs.NA math.ST stat.TH

Expectation Propagation for Poisson Data

泊松数据的期望传播

Chen Zhang, Simon Arridge, Bangti Jin

AI总结 本文提出了一种基于期望传播的近似贝叶斯推断方法,用于近似由泊松似然函数和拉普拉斯型先验分布(如各向异性总变差先验)形成的后验分布,通过迭代产生高斯近似,并在每次迭代中通过矩匹配更新后验分布的一个因子,推导了显式更新公式,并讨论了高效稳定的求积规则。

详情
Comments
25 pages, to be published at Inverse Problems
AI中文摘要

泊松分布出现在涉及计数的数据处理中,并在反问题和成像中找到了许多应用。在本文中,我们开发了一种基于期望传播的近似贝叶斯推断技术,用于近似由泊松似然函数和拉普拉斯型先验分布(如各向异性总变差先验)形成的后验分布。该方法通过迭代产生高斯近似,并在每次迭代中通过矩匹配更新后验分布的一个因子。我们推导了以一维积分形式表示的显式更新公式,并讨论了评估这些积分的稳定和高效求积规则。该方法在二维PET图像上进行了展示。

英文摘要

The Poisson distribution arises naturally when dealing with data involving counts, and it has found many applications in inverse problems and imaging. In this work, we develop an approximate Bayesian inference technique based on expectation propagation for approximating the posterior distribution formed from the Poisson likelihood function and a Laplace type prior distribution, e.g., the anisotropic total variation prior. The approach iteratively yields a Gaussian approximation, and at each iteration, it updates the Gaussian approximation to one factor of the posterior distribution by moment matching. We derive explicit update formulas in terms of one-dimensional integrals, and also discuss stable and efficient quadrature rules for evaluating these integrals. The method is showcased on two-dimensional PET images.

1808.03258 2026-06-04 cs.LG cs.NA math.NA stat.ML

Application of Bounded Total Variation Denoising in Urban Traffic Analysis

有界总变差去噪在城市交通分析中的应用

Shanshan Tang, Haijun Yu

AI总结 本文将有界总变差去噪方法应用于城市交通分析,提出了两种估计噪声强度参数的方法,并表明去噪提高了交通预测精度和聚类效果。

详情
Journal ref
East Asian Journal on Applied Mathematics Vol.9, No.3, pp. 622-642, 2019
Comments
7 figures, 3 tables, to appear in East Asian Journal on Applied Mathematics
AI中文摘要

尽管人们认为在许多大数据应用中去噪并不总是必要的,但本文通过将有界总变差去噪方法应用于城市道路交通预测和聚类问题,证明了去噪对城市交通分析是有帮助的。我们提出了两种易于实现的方法来估计去噪算法中的噪声强度参数,并将去噪算法应用于北京出租车系统基于 GPS 的交通数据。在交通预测问题中,我们结合神经网络和历史匹配方法,对从北京某城区随机选择的道路进行预测。数值实验表明,应用所提出的有界总变差去噪算法显著提高了预测精度。我们还在聚类问题上测试了该算法,将一种最近开发的聚类分析方法应用于北京一百多个城市路段,并基于其速度剖面进行聚类。去噪后获得了更好的聚类结果。

英文摘要

While it is believed that denoising is not always necessary in many big data applications, we show in this paper that denoising is helpful in urban traffic analysis by applying the method of bounded total variation denoising to the urban road traffic prediction and clustering problem. We propose two easy-to-implement methods to estimate the noise strength parameter in the denoising algorithm, and apply the denoising algorithm to GPS-based traffic data from the Beijing taxi system. For the traffic prediction problem, we combine a neural network with the history matching method for roads randomly chosen from an urban area of Beijing. Numerical experiments show that the prediction accuracy is improved significantly by applying the proposed bounded total variation denoising algorithm. We also test the algorithm on a clustering problem, where a recently developed clustering analysis method is applied to more than one hundred urban road segments in Beijing based on their velocity profiles. Better clustering results are obtained after denoising.

1904.05310 2026-06-04 math.NA cs.NA stat.CO

Joint state-parameter estimation of a nonlinear stochastic energy balance model from sparse noisy data

基于稀疏噪声数据的非线性随机能量平衡模型联合状态-参数估计

Fei Lu, Nils Weitzel, Adam H. Monahan

AI总结 本文提出了一种基于正则化后验的联合参数-状态估计方法,用于非线性随机能量平衡模型中稀疏噪声数据的推断,克服了参数估计逆问题的病态性问题,提高了估计的准确性与可靠性。

详情
Journal ref
Nonlin. Processes Geophys. 26, 2019
AI中文摘要

尽管非线性随机偏微分方程在时空建模中自然出现,但此类系统的推断常面临两个主要挑战:稀疏噪声数据和参数估计逆问题的病态性。为克服这些挑战,我们引入了强正则化后验,通过归一化似然函数并利用参数和状态的物理约束作为先验来施加物理限制。我们通过正则化后验在具有物理动机的非线性随机能量平衡模型(SEBM)中研究联合参数-状态估计,用于古气候重建。通过粒子吉布斯采样器对高维后验进行采样,该采样器结合了MCMC与最优粒子滤波器,利用SEBM的结构特性。在使用高斯或均匀先验(基于参数的物理范围)的测试中,正则化后验克服了病态性并导致样本落在物理范围内,量化了估计的不确定性。由于病态性和正则化,参数的后验呈现相对较大的不确定性,因此变分方法中的后验最大值可能有较大的变化。相反,状态的后验通常集中在真实值附近,显著滤除观测噪声并减少无约束SEBM中的不确定性。

英文摘要

While nonlinear stochastic partial differential equations arise naturally in spatiotemporal modeling, inference for such systems often faces two major challenges: sparse noisy data and ill-posedness of the inverse problem of parameter estimation. To overcome the challenges, we introduce a strongly regularized posterior by normalizing the likelihood and by imposing physical constraints through priors of the parameters and states. We investigate joint parameter-state estimation by the regularized posterior in a physically motivated nonlinear stochastic energy balance model (SEBM) for paleoclimate reconstruction. The high-dimensional posterior is sampled by a particle Gibbs sampler that combines MCMC with an optimal particle filter exploiting the structure of the SEBM. In tests using either Gaussian or uniform priors based on the physical range of parameters, the regularized posteriors overcome the ill-posedness and lead to samples within physical ranges, quantifying the uncertainty in estimation. Due to the ill-posedness and the regularization, the posterior of parameters presents a relatively large uncertainty, and consequently, the maximum of the posterior, which is the minimizer in a variational approach, can have a large variation. In contrast, the posterior of states generally concentrates near the truth, substantially filtering out observation noise and reducing uncertainty in the unconstrained SEBM.

1607.06163 2026-06-04 math.ST econ.GN q-fin.EC stat.ME stat.TH

Indirect Inference With(Out) Constraints

带有(无)约束的间接推断

David T. Frazier, Eric Renault

AI总结 本文研究了在间接推断(I-I)估计中如何处理辅助模型参数的约束问题,提出了一种新的I-I方法,通过适当修改无约束的辅助统计量来提高估计精度,并展示了其在GARCH等模型中的应用。

详情
AI中文摘要

间接推断(I-I)估计结构性参数θ需要匹配观测和模拟统计量,这些统计量通常由依赖于工具参数β的辅助模型生成。工具参数的估计器将包含用于推断结构性参数的统计信息。因此,人为约束这些参数可能会限制辅助模型准确复制结构性数据特征的能力,从而导致识别丢失等问题。然而,在某些情况下,参数β自然带有q个限制。例如,当I-I基于GARCH辅助模型时,β必须在q个可能严格不等约束g(β) > 0下估计。在这些设置中,我们提出了一种新的I-I方法,使用适当修改的无约束辅助统计量,这些统计量易于计算且总是存在的。我们为这种无约束I-I方法陈述了相关的渐近理论,并展示它可以通过适当修改的绑定函数重新解释为标准I-I实现。文献中出现的几个例子展示了我们的方法。

英文摘要

Indirect Inference (I-I) estimation of structural parameters $θ$ requires matching observed and simulated statistics, which are most often generated using an auxiliary model that depends on instrumental parameters $β$. The estimators of the instrumental parameters will encapsulate the statistical information used for inference about the structural parameters. As such, artificially constraining these parameters may restrict the ability of the auxiliary model to accurately replicate features in the structural data, which may lead to a range of issues, such as a loss of identification. However, in certain situations the parameters $β$ naturally come with a set of $q$ restrictions. Examples include settings where $β$ must be estimated subject to $q$ possibly strict inequality constraints $g(β) > 0$, such as when I-I is based on GARCH auxiliary models. In these settings we propose a novel I-I approach that uses appropriately modified unconstrained auxiliary statistics, which are simple to compute and always exist. We state the relevant asymptotic theory for this I-I approach without constraints and show that it can be reinterpreted as a standard implementation of I-I through a properly modified binding function. Several examples that have featured in the literature illustrate our approach.

1806.06790 2026-06-04 cs.LG cs.AI cs.IT cs.SY eess.SY math.IT math.OC stat.ML

Towards Distributed Energy Services: Decentralizing Optimal Power Flow with Machine Learning

迈向分布式能源服务:利用机器学习实现最优功率流的去中心化

Roel Dobbe, Oscar Sondermeijer, David Fridovich-Keil, Daniel Arnold, Duncan Callaway, Claire Tomlin

AI总结 本文提出了一种基于机器学习的去中心化方法,通过本地可用信息学习可控分布式能源资源(DER)的控制策略,以重构和模仿集中式最优功率流(OPF)问题的解决方案,从而实现分布式能源服务。

详情
Comments
Accepted for publication. To appear in the IEEE Transactions on Smart Grid
AI中文摘要

实现最优功率流(OPF)方法以调节电力网络中的电压和功率流通常被认为需要大量通信。我们考虑包含多个可控分布式能源资源(DER)的配电系统,并提出一种数据驱动的方法,用于学习每个DER的控制策略,以仅利用本地可用信息来重构和模仿集中式OPF问题的解决方案。集体来看,所有本地控制器紧密匹配集中式OPF解决方案,提供接近最优的性能并满足系统约束。速率失真框架使得能够分析由此产生的完全去中心化控制策略在重构OPF解决方案方面的效果。该方法为决定DER应与哪些节点通信以改进其个别策略提供了自然扩展。该方法在单相和三相测试馈线网络上应用,使用真实负载和分布式发电机的数据,重点于不表现出跨时间依赖性的DER。它为配电系统运营商提供了一个框架,以高效规划和操作DER的贡献,以实现配电网络中的分布式能源服务。

英文摘要

The implementation of optimal power flow (OPF) methods to perform voltage and power flow regulation in electric networks is generally believed to require extensive communication. We consider distribution systems with multiple controllable Distributed Energy Resources (DERs) and present a data-driven approach to learn control policies for each DER to reconstruct and mimic the solution to a centralized OPF problem from solely locally available information. Collectively, all local controllers closely match the centralized OPF solution, providing near optimal performance and satisfaction of system constraints. A rate distortion framework enables the analysis of how well the resulting fully decentralized control policies are able to reconstruct the OPF solution. The methodology provides a natural extension to decide what nodes a DER should communicate with to improve the reconstruction of its individual policy. The method is applied on both single- and three-phase test feeder networks using data from real loads and distributed generators, focusing on DERs that do not exhibit inter-temporal dependencies. It provides a framework for Distribution System Operators to efficiently plan and operate the contributions of DERs to achieve Distributed Energy Services in distribution networks.

1804.02948 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Sample-Derived Disjunctive Rules for Secure Power System Operation

基于样本的析取规则用于安全电力系统运行

Jochen L. Cremer, Ioannis Konstantelos, Simon H. Tindemans, Goran Strbac

AI总结 本文提出了一种将决策树导出的析取规则嵌入标准优化框架的通用方法,用于故障前和故障后控制,以提高电力系统运行的安全性。

详情
Comments
6 pages, accepted paper to IEEE PMAPS 2018
AI中文摘要

机器学习技术过去曾利用蒙特卡洛样本来构建电力系统动态稳定性的预测器。在本文中,我们超越了预测任务,提出了一种综合方法,将预测器(如决策树(DT))纳入标准优化框架中,用于故障前和故障后控制。具体而言,我们提出了一种可推广的方法,用于将从决策树中导出的规则嵌入到运行决策模型中。我们首先指出了从预测框架过渡到控制框架时所面临的特定挑战。接着,我们介绍了基于广义析取规划(GDP)的求解策略,以及一种两步搜索方法,用于确定最优超参数以平衡成本和控制精度。我们通过IEEE 39节点系统的案例研究,展示了所提出的方法如何在运行条件存在高维不确定性的情况下构建覆盖多种故障情景的安全代理。与理想(oracle)模型相比,该方法仅以系统价格的小幅增加为代价实现了高效的系统控制。

英文摘要

Machine learning techniques have been used in the past using Monte Carlo samples to construct predictors of the dynamic stability of power systems. In this paper we move beyond the task of prediction and propose a comprehensive approach to use predictors, such as Decision Trees (DT), within a standard optimization framework for pre- and post-fault control purposes. In particular, we present a generalizable method for embedding rules derived from DTs in an operation decision-making model. We begin by pointing out the specific challenges entailed when moving from a prediction to a control framework. We proceed with introducing the solution strategy based on generalized disjunctive programming (GDP) as well as a two-step search method for identifying optimal hyper-parameters for balancing cost and control accuracy. We showcase how the proposed approach constructs security proxies that cover multiple contingencies while facing high-dimensional uncertainty with respect to operating conditions with the use of a case study on the IEEE 39-bus system. The method is shown to achieve efficient system control at a marginal increase in system price compared to an oracle model.

1905.04835 2026-06-04 cs.LG cs.CV cs.MA cs.RO cs.SY eess.SY stat.ML

Multi-Agent Image Classification via Reinforcement Learning

通过强化学习进行多智能体图像分类

Hossein K. Mousavi, Mohammadreza Nazari, Martin Takáč, Nader Motee

AI总结 本文研究了利用多个能够收集未知环境部分姿态依赖观测的移动智能体进行图像分类的问题,提出了一种网络架构,用于指导智能体形成局部信念、采取局部行动并从原始部分观测中提取相关特征,通过与邻居智能体交换信息更新自身信念,并利用强化学习技术实现分类问题的去中心化实现。

详情
Comments
Preprint of the paper to be published in IROS'19 proceedings
AI中文摘要

我们研究了使用多个能够收集未知环境部分姿态依赖观测的移动智能体进行分类问题。目标是在有限的时间范围内对图像进行分类。我们提出了一种网络架构,用于指导智能体如何形成局部信念、采取局部行动并从原始部分观测中提取相关特征。智能体被允许与邻居智能体交换信息以更新自身信念。证明了如何利用强化学习技术通过运行去中心化共识协议来实现分类问题的去中心化实现。我们在MNIST手写数字数据集上的实验结果展示了我们所提框架的有效性。

英文摘要

We investigate a classification problem using multiple mobile agents capable of collecting (partial) pose-dependent observations of an unknown environment. The objective is to classify an image over a finite time horizon. We propose a network architecture that specifies how agents should form a local belief, take local actions, and extract relevant features from their raw partial observations. Agents are allowed to exchange information with their neighboring agents to update their own beliefs. It is shown how reinforcement learning techniques can be utilized to achieve decentralized implementation of the classification problem by running a decentralized consensus protocol. Our experimental results on the MNIST handwritten digit dataset demonstrate the effectiveness of our proposed framework.

1903.11683 2026-06-04 stat.ML cs.CV cs.LG cs.RO cs.SY eess.SY stat.AP

Outlier-Robust Spatial Perception: Hardness, General-Purpose Algorithms, and Guarantees

抗异常值的空间感知:难度、通用算法和保证

Vasileios Tzoumas, Pasquale Antonante, Luca Carlone

AI总结 本文研究了空间感知中异常值的影响,证明了异常值剔除的不可近似性,给出了逐实例的次优性界,并提出了一种用于去除异常值的通用算法(自适应修剪)。

详情
AI中文摘要

空间感知是许多机器人应用的核心,涵盖了定位与建图、点云对齐和从相机图像中估计相对姿态等广泛的研究问题。错误的数据关联,乃至更一般的异常值,都会威胁空间感知的鲁棒性。尽管已有处理异常值的技术,但它们可能以不可预测的方式失败(例如RANSAC、鲁棒估计器),或具有指数级的运行时间(例如分支定界法)。在本文中,我们通过三个贡献推动了异常值剔除的前沿。首先,我们证明了即使是简单的线性异常值剔除实例也是不可近似的:在最坏情况下,无法设计出能高效计算近似解的拟多项式时间算法。我们的第二个贡献是给出首个逐实例的次优性界,用于评估给定异常值剔除结果的近似质量。我们的第三个贡献是提出了一种简单的通用算法,称为自适应修剪,用于去除异常值。该算法利用了最近提出的、能够求解无异常值问题的全局求解器,并迭代地去除误差较大的测量值。我们在三个空间感知问题上展示了所提出的算法:三维配准、双视图几何和SLAM。结果表明,我们的算法在各类应用中优于几种最先进的方法,同时是一种通用方法。

英文摘要

Spatial perception is the backbone of many robotics applications, and spans a broad range of research problems, including localization and mapping, point cloud alignment, and relative pose estimation from camera images. Robust spatial perception is jeopardized by the presence of incorrect data association, and in general, outliers. Although techniques to handle outliers do exist, they can fail in unpredictable manners (e.g., RANSAC, robust estimators), or can have exponential runtime (e.g., branch-and-bound). In this paper, we advance the state of the art in outlier rejection by making three contributions. First, we show that even a simple linear instance of outlier rejection is inapproximable: in the worst-case one cannot design a quasi-polynomial time algorithm that computes an approximate solution efficiently. Our second contribution is to provide the first per-instance sub-optimality bounds to assess the approximation quality of a given outlier rejection outcome. Our third contribution is to propose a simple general-purpose algorithm, named adaptive trimming, to remove outliers. Our algorithm leverages recently-proposed global solvers that are able to solve outlier-free problems, and iteratively removes measurements with large errors. We demonstrate the proposed algorithm on three spatial perception problems: 3D registration, two-view geometry, and SLAM. The results show that our algorithm outperforms several state-of-the-art methods across applications while being a general-purpose method.

1905.02679 2026-06-04 stat.CO cs.CE cs.NA math.NA

Multifidelity probability estimation via fusion of estimators

通过融合估计器进行多保真度概率估计

Boris Kramer, Alexandre Noll Marques, Benjamin Peherstorfer, Umberto Villa, Karen Willcox

AI总结 本文提出了一种多保真度方法,通过信息融合和重要性采样估计昂贵模型的失效概率,核心方法是融合多个概率估计器以降低方差,主要贡献是证明融合估计器在方差上最优。

详情
Journal ref
Journal of Computational Physics 392, 385-402, 2019
AI中文摘要

本文开发了一种多保真度方法,该方法通过信息融合和重要性采样,使能够对昂贵评估模型的失效概率进行估计。所提出的通用融合方法旨在通过结合多个概率估计器以实现方差降低。我们利用低保真度模型推导偏置密度用于重要性采样,然后融合重要性采样估计器,使得融合的多保真度估计器无偏且具有均方误差小于或等于任何单独重要性采样估计器的特性。通过融合所有可用的估计器,该方法规避了选择最佳偏置密度并仅使用该密度采样的挑战。严谨的分析表明,融合的估计器在所有可能的估计器组合中是最优的,即具有最小的方差。所提出方法的渐近行为在可承受10^5次采样的对流-扩散-反应偏微分方程模型上得到演示。为了展示该方法的扩展性,我们考虑了自由平面射流模型,并量化了流入中的不确定性如何传播到与湍流混合相关的感兴趣量。与仅使用高保真度模型的重要性采样估计器相比,我们的多保真度估计器将所需的CPU时间减少了65%,同时达到相似的变异系数。

英文摘要

This paper develops a multifidelity method that enables estimation of failure probabilities for expensive-to-evaluate models via information fusion and importance sampling. The presented general fusion method combines multiple probability estimators with the goal of variance reduction. We use low-fidelity models to derive biasing densities for importance sampling and then fuse the importance sampling estimators such that the fused multifidelity estimator is unbiased and has mean-squared error lower than or equal to that of any of the importance sampling estimators alone. By fusing all available estimators, the method circumvents the challenging problem of selecting the best biasing density and using only that density for sampling. A rigorous analysis shows that the fused estimator is optimal in the sense that it has minimal variance amongst all possible combinations of the estimators. The asymptotic behavior of the proposed method is demonstrated on a convection-diffusion-reaction partial differential equation model for which $10^5$ samples can be afforded. To illustrate the proposed method at scale, we consider a model of a free plane jet and quantify how uncertainties at the flow inlet propagate to a quantity of interest related to turbulent mixing. Compared to an importance sampling estimator that uses the high-fidelity model alone, our multifidelity estimator reduces the required CPU time by 65\% while achieving a similar coefficient of variation.
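
The variance-reduction effect of fusing estimators can be sketched in its simplest form. The sketch assumes the individual estimators are unbiased and statistically independent, in which case the minimum-variance convex combination uses inverse-variance weights. That independence assumption, the function names, and the sample numbers are illustrative choices, not taken from the paper.

```python
def fusion_weights(variances):
    """Minimum-variance weights for independent unbiased estimators (sum to 1)."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse]

def fuse(estimates, variances):
    """Return the fused estimate and its variance under independence."""
    weights = fusion_weights(variances)
    fused = sum(w * e for w, e in zip(weights, estimates))
    fused_var = sum(w * w * v for w, v in zip(weights, variances))
    return fused, fused_var

# Two hypothetical failure-probability estimates with their variances.
estimate, variance = fuse([0.012, 0.010], [1.0e-6, 4.0e-6])
```

Under these assumptions the fused variance equals the reciprocal of the summed inverse variances, so it can never exceed the variance of the best single estimator.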

1811.05537 2026-06-04 math.NA cs.LG cs.NA cs.NE math.DS stat.ML

Data Driven Governing Equations Approximation Using Deep Neural Networks

基于深度神经网络的数据驱动控制方程近似

Tong Qin, Kailiang Wu, Dongbin Xiu

AI总结 本文提出了一种数值框架,利用观测数据和深度神经网络近似未知的控制方程,以残差网络作为基本构建块,并提出了两种多步方法,通过数值算例展示了这些方法的性能。

详情
AI中文摘要

我们提出了一种数值框架,用于利用观测数据和深度神经网络(DNN)近似未知的控制方程。具体而言,我们提出使用残差网络(ResNet)作为方程近似的基本构建块。我们证明ResNet块可以被视为在时间积分上精确的单步方法。然后,我们提出了两种多步方法,即循环ResNet(RT-ResNet)方法和递归ResNet(RS-ResNet)方法。RT-ResNet是一种基于均匀时间步长的多步方法,而RS-ResNet是一种使用可变时间步长的自适应多步方法。这三种方法均基于底层动力系统的积分形式。因此,它们在恢复方程时不需要时间导数数据,并且能够处理分布相对稀疏的轨迹数据。文中给出了若干数值算例以展示这些方法的性能。

英文摘要

We present a numerical framework for approximating unknown governing equations using observation data and deep neural networks (DNN). In particular, we propose to use the residual network (ResNet) as the basic building block for equation approximation. We demonstrate that the ResNet block can be considered as a one-step method that is exact in temporal integration. We then present two multi-step methods, the recurrent ResNet (RT-ResNet) method and the recursive ResNet (RS-ResNet) method. The RT-ResNet is a multi-step method on uniform time steps, whereas the RS-ResNet is an adaptive multi-step method using variable time steps. All three methods presented here are based on the integral form of the underlying dynamical system. As a result, they do not require time derivative data for equation recovery and can cope with relatively coarsely distributed trajectory data. Several numerical examples are presented to demonstrate the performance of the methods.
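The "one-step method" reading of a residual block can be sketched as follows. For an autonomous system dx/dt = f(x), the exact solution satisfies x(t + Δ) = x(t) + (integral of f over one step), so a block of the form x + N(x) is exact whenever N equals that increment. In this sketch no network is trained: the hypothetical `exact_increment` stands in for a learned N, using the scalar test equation dx/dt = -x, which is chosen here for illustration and is not taken from the paper.

```python
import math


def residual_block(x, increment):
    """One residual update: identity plus a (learned) increment."""
    return x + increment(x)


def rollout(x0, increment, steps):
    """Compose the same block repeatedly to march forward in time."""
    trajectory = [x0]
    for _ in range(steps):
        trajectory.append(residual_block(trajectory[-1], increment))
    return trajectory


dt = 0.5


def exact_increment(x):
    # For dx/dt = -x the flow over one step is x * exp(-dt), so the exact
    # increment is x * (exp(-dt) - 1). A trained network would approximate this.
    return x * (math.exp(-dt) - 1.0)


trajectory = rollout(1.0, exact_increment, steps=4)
print(trajectory[-1], math.exp(-4 * dt))
```

Only state snapshots separated by Δ enter this picture, which is why no time-derivative data is needed.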

1805.06454 2026-06-04 cs.CE cs.NA math.NA stat.ML

Unsupervised Machine Learning Based on Non-Negative Tensor Factorization for Analyzing Reactive-Mixing

基于非负张量分解的无监督机器学习用于分析反应-混合过程

V. V. Vesselinov, M. K. Mudunuru, S. Karra, D. O. Malley, B. S. Alexandrov

AI总结 本文提出了一种基于非负张量分解(NTF)的无监督机器学习方法,结合k-means聚类,用于分析反应-混合过程中的隐藏特征,通过非负特征提取来解构混合过程,应用于高分辨率有限元模拟以识别混合行为的加法特征。

详情
Comments
34 pages
AI中文摘要

分析反应扩散模拟需要大量的独立模型运行。对于每个高保真度模拟,输入参数被变化,并且预测的混合行为通过物种浓度的变化来表示。因此需要确定模型输入如何影响混合过程。这项任务具有挑战性,通常涉及对大型模型输出的解释。然而,通过应用机器学习(ML)方法可以自动化并显著简化这项任务。在本文中,我们提出了一种应用无监督ML方法(称为NTFk)的方案,该方法结合了非负张量分解(NTF)和基于k-means的自定义聚类过程,以揭示产物浓度中的隐藏特征。所提出的ML方法的一个优点是确保提取的特征是非负的,这对于获得有意义的混合过程分解至关重要。该ML方法应用于一组高分辨率FEM模拟,这些模拟代表了在扰动涡旋速度场中反应扩散过程。所应用的FEM确保物种浓度始终非负。模拟的反应是一个快速不可逆的双分子反应。控制混合的反应扩散模型输入参数包括速度场的性质、各向异性扩散和分子扩散。我们展示了该ML方法在生成有意义的模型输出分解以区分影响反应物、其混合和产物空间分布的不同物理过程方面的应用。所提出的ML分析使我们能够识别出表征混合行为的加法特征。

英文摘要

Analysis of reactive-diffusion simulations requires a large number of independent model runs. For each high-fidelity simulation, inputs are varied and the predicted mixing behavior is represented by changes in species concentration. It is then required to discern how the model inputs impact the mixing process. This task is challenging and typically involves interpretation of large model outputs. However, the task can be automated and substantially simplified by applying Machine Learning (ML) methods. In this paper, we present an application of an unsupervised ML method (called NTFk) using Non-negative Tensor Factorization (NTF) coupled with a custom clustering procedure based on k-means to reveal hidden features in product concentration. An attractive aspect of the proposed ML method is that it ensures the extracted features are non-negative, which is important for obtaining a meaningful deconstruction of the mixing processes. The ML method is applied to a large set of high-resolution FEM simulations representing reaction-diffusion processes in perturbed vortex-based velocity fields. The applied FEM ensures that species concentrations are always non-negative. The simulated reaction is a fast irreversible bimolecular reaction. The reactive-diffusion model input parameters that control mixing include properties of the velocity field, anisotropic dispersion, and molecular diffusion. We demonstrate the applicability of the ML method to produce a meaningful deconstruction of model outputs to discriminate between different physical processes impacting the reactants, their mixing, and the spatial distribution of the product. The presented ML analysis allowed us to identify additive features that characterize mixing behavior.

1904.08353 2026-06-04 stat.ML cs.LG cs.SY eess.SY

Towards Robust Deep Reinforcement Learning for Traffic Signal Control: Demand Surges, Incidents and Sensor Failures

面向交通信号控制的鲁棒深度强化学习:需求激增、事故和传感器故障

Filipe Rodrigues, Carlos Lima Azevedo

AI总结 本文提出了一种开源的回调框架,用于在交通模拟环境中灵活评估不同深度强化学习配置,研究了深度强化学习自适应交通控制器在需求激增、事故导致的容量下降和传感器故障等场景下的表现,并提出了缓解这些外源不确定性的具体设计。

详情
Comments
8 pages
AI中文摘要

强化学习(RL)是缓解交通拥堵问题的一种有前景的解决方案。特别是,深度RL算法已被证明能够产生性能优于传统系统的自适应交通信号控制器。然而,为了在高度动态的城市区域中保持可靠,此类控制器需要对一系列外源不确定性具有鲁棒性。在本文中,我们开发了一个基于回调的开源框架,以便在交通仿真环境中灵活评估不同的深度RL配置。借助该框架,我们研究了基于深度RL的自适应交通控制器在不同场景下的表现,即由特殊事件引起的需求激增、由事故导致的通行能力下降以及传感器故障。我们提炼了若干关键见解,用于开发面向交通控制的鲁棒深度RL算法,并提出了具体设计以减轻所考虑的外源不确定性的影响。

英文摘要

Reinforcement learning (RL) constitutes a promising solution for alleviating the problem of traffic congestion. In particular, deep RL algorithms have been shown to produce adaptive traffic signal controllers that outperform conventional systems. However, in order to be reliable in highly dynamic urban areas, such controllers need to be robust with respect to a series of exogenous sources of uncertainty. In this paper, we develop an open-source callback-based framework for promoting the flexible evaluation of different deep RL configurations under a traffic simulation environment. With this framework, we investigate how deep RL-based adaptive traffic controllers perform under different scenarios, namely under demand surges caused by special events, capacity reductions from incidents, and sensor failures. We extract several key insights for the development of robust deep RL algorithms for traffic control and propose concrete designs to mitigate the impact of the considered exogenous uncertainties.

1809.05525 2026-06-04 quant-ph cs.LG cs.SY eess.SY stat.ML

Robustness of Quantum-Enhanced Adaptive Phase Estimation

量子增强自适应相位估计的鲁棒性

Pantita Palittapongarnpim, Barry C. Sanders

AI总结 本研究提出了一种评估量子增强自适应相位估计策略鲁棒性的测试方法,并比较了不同策略所使用的资源,以确定其有效性并选择合适的策略。

详情
Journal ref
Phys. Rev. A 100, 012106 (2019)
Comments
15 pages, 2 figures, 2 tables
AI中文摘要

由于所有物理上的自适应量子增强计量方案都在具有部分理解的噪声条件下运行,因此实际的控制策略必须在未知噪声的情况下也具有鲁棒性。我们旨在设计一个测试来评估AQEM策略的鲁棒性,并评估策略所使用的资源。鲁棒性测试是在QEAPE上进行的,通过模拟四种相位噪声模型(正态分布噪声、随机电报噪声、偏态正态分布噪声和对数正态分布噪声)下的方案进行。控制策略要么是在相同嘈杂条件下由进化算法设计,尽管不知道其特性,要么是基于贝叶斯反馈的方法,假设没有噪声。我们的鲁棒性测试和资源比较方法可用于确定有效性和选择合适的策略。

英文摘要

As all physical adaptive quantum-enhanced metrology (AQEM) schemes operate under noisy conditions with only partially understood noise characteristics, a practical control policy must be robust even for unknown noise. We aim to devise a test to evaluate the robustness of AQEM policies and assess the resources used by the policies. The robustness test is performed on quantum-enhanced adaptive phase estimation (QEAPE) by simulating the scheme under four phase-noise models corresponding to normal-distribution noise, random-telegraph noise, skew-normal-distribution noise, and log-normal-distribution noise. Control policies are devised either by an evolutionary algorithm under the same noisy conditions, albeit ignorant of its properties, or by a Bayesian-based feedback method that assumes no noise. Our robustness test and resource comparison method can be used to determine the efficacy of policies and to select a suitable policy.

1812.08413 2026-06-04 astro-ph.SR cs.NA math.NA stat.CO

Compressed sensing and Sequential Monte Carlo for solar hard X-ray imaging

压缩感知与顺序蒙特卡洛方法用于太阳硬X射线成像

Anna Maria Massone, Federica Sciacchitano, Michele Piana, Alberto Sorrentino

AI总结 本文提出两种逆向方法用于恢复太阳硬X射线图像,通过实验可见度数据和合成可见度数据进行测试,验证了压缩感知与顺序蒙特卡洛方法在太阳成像中的有效性。

详情
Comments
submitted to 'Nuovo Cimento' as proceeding SOHE3
AI中文摘要

我们描述了两种用于恢复硬X射线太阳图像的逆向方法。这些方法在Reuven Ramaty高能太阳光谱成像仪(RHESSI)记录的实验可见度数据以及基于Spectrometer/Telescope for Imaging X-rays(STIX)设计的合成可见度数据上进行了测试。

英文摘要

We describe two inversion methods for the reconstruction of hard X-ray solar images. The methods are tested against experimental visibilities recorded by the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI) and synthetic visibilities based on the design of the Spectrometer/Telescope for Imaging X-rays (STIX).

1806.03816 2026-06-04 cs.LG cs.NA math.NA stat.ML

Adaptive MCMC via Combining Local Samplers

通过结合局部采样器实现自适应MCMC

Kiarash Shaloudegi, András György

AI总结 本文提出了一种自适应MCMC方法,通过结合多个并行运行的局部采样器,并依据核Stein差异(kernel Stein discrepancy)确定各条链的优先级,以提高整体采样效率;实验表明该方法在合成多峰问题和传感器定位任务中优于现有方法。

详情
AI中文摘要

马尔可夫链蒙特卡罗(MCMC)方法在机器学习中被广泛使用。MCMC的主要问题之一是如何设计能够在整个状态空间上快速混合的链,特别是如何选择MCMC算法的参数。本文采取了不同的思路:与并行MCMC方法类似,我们不去寻找一条能够从整个分布中采样的单一链,而是结合多条并行运行的链的样本,每条链仅探索状态空间的一部分(例如仅少数几个模态)。各条链依据核Stein差异(kernel Stein discrepancy)确定优先级,该指标能够很好地度量局部性能。独立链的样本通过一种用于估计样本空间不同区域概率的新技术进行组合。实验结果表明,所提出的算法可以在不同的采样问题中带来显著的加速。最重要的是,当以最先进的NUTS算法作为基础MCMC采样器时,我们的方法在单峰分布采样上与NUTS相比仍具竞争力,而在合成多峰问题以及具有挑战性的传感器定位任务中显著优于最先进的竞争方法。

英文摘要

Markov chain Monte Carlo (MCMC) methods are widely used in machine learning. One of the major problems with MCMC is the question of how to design chains that mix fast over the whole state space; in particular, how to select the parameters of an MCMC algorithm. Here we take a different approach and, similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the whole distribution, we combine samples from several chains run in parallel, each exploring only parts of the state space (e.g., a few modes only). The chains are prioritized based on kernel Stein discrepancy, which provides a good measure of performance locally. The samples from the independent chains are combined using a novel technique for estimating the probability of different regions of the sample space. Experimental results demonstrate that the proposed algorithm may provide significant speedups in different sampling problems. Most importantly, when combined with the state-of-the-art NUTS algorithm as the base MCMC sampler, our method remained competitive with NUTS on sampling from unimodal distributions, while significantly outperforming state-of-the-art competitors on synthetic multimodal problems as well as on a challenging sensor localization task.

1809.07192 2026-06-04 eess.SY cs.LG cs.SY math.OC stat.ML

Unbalanced Multi-Phase Distribution Grid Topology Estimation and Bus Phase Identification

不平衡多相配电网拓扑估计与节点相位识别

Yizheng Liao, Yang Weng, Guangyi Liu, Zhongyang Zhao, Chin-woo Tan, Ram Rajagopal

AI总结 本文提出了一种基于信息论的方法,利用智能电表数据估计配电网的多相拓扑并识别节点相位,通过将不平衡系统转换为对称分量并证明Chow-Liu算法在存在错误节点相位标签时能确定拓扑结构,最终通过Carson方程证明电压测量可正确识别节点相位连接,实验结果表明该方法在强负载不平衡和分布式能源接入条件下具有高准确性。

详情
Comments
17 pages, 18 figures
AI中文摘要

对配电网中分布式能源带来的不确定性进行监测和控制的需求日益增长。为此,准确的多相拓扑是关联不平衡配电网中各项量测的基础。然而,由于投资有限,此类拓扑信息往往不可得,低压配电网尤其如此。此外,由于人为错误或记录过时,节点相位标签信息并不准确。针对这一挑战,本文利用智能电表数据,提出了一种信息论方法来学习配电网拓扑。具体而言,多相不平衡系统被转换为对称分量,即正序、负序和零序。然后,本文证明,在存在错误节点相位标签的情况下,Chow-Liu算法能够利用潮流方程以及配电网辐射状多相结构所隐含的条件独立关系找到拓扑。最后,本文利用Carson方程证明,可以使用电压量测正确识别节点相位连接。为进行验证,使用三个真实数据集对IEEE系统进行了仿真。仿真结果表明,即使在负荷严重不平衡且存在分布式能源(DER)的条件下,该算法仍能高精度地找到多相拓扑,从而保证对配电网中DER的密切监测和控制。

英文摘要

There is an increasing need for monitoring and controlling uncertainties brought by distributed energy resources (DERs) in distribution grids. For this goal, an accurate multi-phase topology is the basis for correlating measurements in unbalanced distribution networks. Unfortunately, such topology knowledge is often unavailable due to limited investment, especially for low-voltage distribution grids. Also, the bus phase labeling information is inaccurate due to human errors or outdated records. To address this challenge, this paper utilizes smart meter data in an information-theoretic approach to learn the topology of distribution grids. Specifically, multi-phase unbalanced systems are converted into symmetrical components, namely positive, negative, and zero sequences. Then, this paper proves that the Chow-Liu algorithm finds the topology by utilizing power flow equations and the conditional independence relationships implied by the radial multi-phase structure of distribution grids, in the presence of incorrect bus phase labels. Finally, by utilizing Carson's equation, this paper proves that the bus phase connection can be correctly identified using voltage measurements. For validation, IEEE systems are simulated using three real data sets. The simulation results demonstrate that the algorithm is highly accurate in finding the multi-phase topology even under strongly unbalanced load conditions and with DERs. This ensures close monitoring and control of DERs in distribution grids.
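The Chow-Liu step mentioned above amounts to building a maximum-weight spanning tree whose edge weights are pairwise mutual informations. The sketch below is a generic illustration, not the paper's pipeline: it assumes jointly Gaussian variables so that mutual information follows from the correlation coefficient as -0.5 * ln(1 - rho^2), and the 4-bus correlation matrix is made up to mimic a radial chain 0-1-2-3.

```python
import math


def gaussian_mutual_information(rho):
    """Mutual information of two jointly Gaussian variables (assumes |rho| < 1)."""
    return -0.5 * math.log(1.0 - rho * rho)


def chow_liu_tree(corr):
    """Maximum-weight spanning tree over pairwise mutual information (Kruskal)."""
    n = len(corr)
    edges = sorted(
        ((gaussian_mutual_information(corr[i][j]), i, j)
         for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    parent = list(range(n))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # keep the edge only if it joins two components
            parent[root_i] = root_j
            tree.append((i, j))
    return sorted(tree)


# Hypothetical voltage correlations for a chain: correlation decays with hops.
corr = [
    [1.0, 0.8, 0.64, 0.512],
    [0.8, 1.0, 0.8, 0.64],
    [0.64, 0.8, 1.0, 0.8],
    [0.512, 0.64, 0.8, 1.0],
]
tree = chow_liu_tree(corr)
print(tree)
```

With real smart meter data the mutual information would have to be estimated from samples, and the Gaussian shortcut used here may not hold.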

1603.09060 2026-06-04 q-fin.TR econ.GN math.PR math.ST q-fin.EC stat.TH

The Perfect Marriage and Much More: Combining Dimension Reduction, Distance Measures and Covariance

完美的结合及更多:融合降维、距离度量与协方差

Ravi Kashyap

AI总结 本文提出了一种基于Bhattacharyya距离和Johnson-Lindenstrauss引理结合的新方法,用于比较不同分布的数据集,并通过扩展Stein引理展示了协方差与距离度量之间的关系,应用于资产定价及其他领域。

详情
AI中文摘要

我们提出了一种新方法,它将Bhattacharyya距离(一种衡量随机变量分布之间相似性的度量)与Johnson-Lindenstrauss引理(一种降维技术)相结合。由此得到的技术是一种简单而强大的工具,可用于比较代表任意两个分布的数据集。不同实体(市场、大学、医院、城市、证券组等)对应分布之间的距离度量的差异程度,反映了这些实体之间的差异程度,可为寻求多样化或寻求同类事物的参与者提供帮助。我们基于Stein引理的一般性推广,展示了协方差与距离度量之间的关系。我们考虑了一个资产定价应用,随后简要讨论了该方法如何适用于众多市场结构研究,并通过一个生物学应用说明它甚至可用于金融和社会科学之外的领域。我们使用来自六个不同国家的证券价格、成交量以及这两个变量的波动率给出了数值示例。

英文摘要

We develop a novel methodology based on the marriage between the Bhattacharyya distance, a measure of similarity across distributions of random variables, and the Johnson-Lindenstrauss Lemma, a technique for dimension reduction. The resulting technique is a simple yet powerful tool that allows comparisons between data sets representing any two distributions. The degree to which different entities (markets, universities, hospitals, cities, groups of securities, etc.) have different distance measures of their corresponding distributions tells us the extent to which they are different, aiding participants looking for diversification or looking for more of the same thing. We demonstrate a relationship between covariance and distance measures based on a generic extension of Stein's Lemma. We consider an asset pricing application and then briefly discuss how this methodology lends itself to numerous market-structure studies and even to applications outside the realm of finance and the social sciences, by illustrating a biological application. We provide numerical illustrations using security prices, volumes, and the volatilities of both these variables from six different countries.
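For readers unfamiliar with the Bhattacharyya distance, a minimal sketch for discrete distributions follows. It covers only the distance itself; the Johnson-Lindenstrauss projection step of the methodology is omitted, and the two histograms are invented for illustration.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions.

    p and q are sequences of probabilities over the same bins, each summing
    to one. The Bhattacharyya coefficient sum(sqrt(p_i * q_i)) lies in (0, 1]
    when the supports overlap, so the distance -ln(coefficient) is non-negative.
    """
    coefficient = sum(math.sqrt(p_i * q_i) for p_i, q_i in zip(p, q))
    return -math.log(coefficient)


# Hypothetical return histograms for two markets over three bins.
market_a = [0.1, 0.4, 0.5]
market_b = [0.3, 0.3, 0.4]
d_ab = bhattacharyya_distance(market_a, market_b)
d_ba = bhattacharyya_distance(market_b, market_a)
d_aa = bhattacharyya_distance(market_a, market_a)
print(d_ab, d_aa)
```

A distance near zero indicates nearly identical distributions, and larger values indicate less overlap.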

1708.02365 2026-06-04 econ.GN q-fin.EC stat.CO stat.ME

Indirect Inference with a Non-Smooth Criterion Function

间接推断与非光滑准则函数

David T. Frazier, Tatsushi Oka, Dan Zhu

AI总结 本文研究了在间接推断中使用非光滑准则函数的问题,提出了一种新的模拟算法来缓解由于内生变量为模型参数的不连续函数所导致的准则函数不连续性,从而允许使用基于导数的优化方法进行参数估计。

详情
Comments
This paper is a revision of arXiv:1708.02365 and supersedes the earlier arXiv paper "Derivative-Based Optimization with a Non-Smooth Simulated Criterion"
AI中文摘要

间接推断需要从所研究的模型中模拟内生变量的实现。当内生变量是模型参数的不连续函数时,所得到的间接推断准则函数也是不连续的,这不允许使用基于导数的优化程序。利用变量变换技术,我们提出了一种新的模拟算法,以减轻此类间接推断准则函数中固有的不连续性,并允许使用基于导数的优化程序来估计未知模型参数。与竞争方法不同,该方法不依赖于核平滑或带宽参数。文献中关于间接推断与不连续结果的几个蒙特卡洛例子展示了该方法,并证明了该方法在现有替代方法上的优越性能。

英文摘要

Indirect inference requires simulating realisations of endogenous variables from the model under study. When the endogenous variables are discontinuous functions of the model parameters, the resulting indirect inference criterion function is discontinuous and does not permit the use of derivative-based optimisation routines. Using a change of variables technique, we propose a novel simulation algorithm that alleviates the discontinuities inherent in such indirect inference criterion functions, and permits the application of derivative-based optimisation routines to estimate the unknown model parameters. Unlike competing approaches, this approach does not rely on kernel smoothing or bandwidth parameters. Several Monte Carlo examples that have featured in the literature on indirect inference with discontinuous outcomes illustrate the approach, and demonstrate the superior performance of this approach over existing alternatives.

1811.09358 2026-06-04 cs.LG cs.CV cs.NA math.NA math.OC stat.ML

A Sufficient Condition for Convergences of Adam and RMSProp

Adam和RMSProp收敛性的充分条件

Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu

AI总结 本文提出了一种易于检查的充分条件,该条件仅依赖于基础学习率参数和历史二阶矩量的组合,以保证通用的Adam/RMSProp算法在大规模非凸随机优化中的全局收敛性,并展示了几种Adam变体在非凸设置下的收敛性可由此条件直接推导。

详情
Comments
Accepted by CVPR2019 as an Oral presentation
AI中文摘要

Adam和RMSProp是训练深度神经网络中最具影响力的自适应随机算法,尽管在凸设置中通过几个简单的反例已被指出存在发散现象。许多尝试,如降低自适应学习率、采用大批次大小、引入时间去相关技术、寻找类比的替代方案等,已被尝试以促进Adam/RMSProp型算法收敛。与现有方法不同,我们引入了一种替代的易于检查的充分条件,该条件仅依赖于基础学习率参数和历史二阶矩量的组合,以保证通用的Adam/RMSProp算法在大规模非凸随机优化中的全局收敛性。此外,我们展示了几种Adam变体,如AdamNC、AdaEMA等,在非凸设置下的收敛性可通过所提出的充分条件直接推导。此外,我们表明Adam本质上是一种具有指数移动平均动量的特定加权AdaGrad,这为理解Adam和RMSProp提供了新的视角。这一观察结合该充分条件,为它们的发散性提供了更深入的解释。最后,我们通过将Adam和RMSProp应用于特定反例和训练深度神经网络来验证该充分条件。数值结果与我们的理论分析一致。

英文摘要

Adam and RMSProp are two of the most influential adaptive stochastic algorithms for training deep neural networks, which have been pointed out to be divergent even in the convex setting via a few simple counterexamples. Many attempts, such as decreasing an adaptive learning rate, adopting a big batch size, incorporating a temporal decorrelation technique, seeking an analogous surrogate, etc., have been made to promote the convergence of Adam/RMSProp-type algorithms. In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam/RMSProp for solving large-scale non-convex stochastic optimization. Moreover, we show that the convergence of several variants of Adam, such as AdamNC, AdaEMA, etc., is directly implied by the proposed sufficient condition in the non-convex setting. In addition, we illustrate that Adam is essentially a specifically weighted AdaGrad with exponential moving average momentum, which provides a novel perspective for understanding Adam and RMSProp. This observation, coupled with this sufficient condition, gives much deeper interpretations of their divergence. Finally, we validate the sufficient condition by applying Adam and RMSProp to tackle a certain counterexample and train deep neural networks. Numerical results are exactly in accord with our theoretical analysis.
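As a reminder of the two ingredients the abstract refers to, the base learning rate and the running second-order moment, here is a textbook Adam update in scalar form. It is only a sketch: the decaying base rate lr / sqrt(t) and the hyperparameter values are illustrative choices made here, and no claim is made that they satisfy the paper's sufficient condition.

```python
import math


def adam_minimize(grad, x0, steps=3000, lr=0.5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Scalar Adam with bias correction and a base learning rate lr / sqrt(t)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment (momentum) average
        v = beta2 * v + (1.0 - beta2) * g * g    # historical second-order moment
        m_hat = m / (1.0 - beta1 ** t)           # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        x -= (lr / math.sqrt(t)) * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

Setting `beta1=0.0` turns the same update into an RMSProp-style step with bias correction.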

1803.07726 2026-06-04 stat.ML cs.IT cs.LG cs.NA math.IT math.NA math.OC

Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval

梯度下降与随机初始化:非凸相位恢复的快速全局收敛性

Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma

AI总结 本文研究了通过二次方程恢复目标对象的问题,证明了在高斯设计下,随机初始化的梯度下降能在O(log n + log(1/ε))次迭代中获得ε精度的解,从而实现了计算和样本复杂度的近最优性,为相位恢复提供了首个无需精心设计初始化、样本分割或复杂鞍点逃离方案的全局收敛保证。

详情
Journal ref
Mathematical Programming 2019, Volume 176, Issue 1-2, 5-37
Comments
Accepted to Mathematical Programming
AI中文摘要

本文考虑了解二次方程组的问题,即从m个二次方程/样本y_i=(a_i^T x^natural)^2 (1≤i≤m)中恢复感兴趣的对象x^natural∈R^n。这个问题也被称为相位恢复,涵盖了多个领域,包括物理科学和机器学习。我们研究了为非凸最小二乘问题设计的梯度下降(或Wirtinger流)的效率。我们证明,在高斯设计下,梯度下降——当以随机方式初始化时——能在O(log n + log(1/ε))次迭代中获得ε精度的解,从而同时实现了近最优的计算和样本复杂度。这为相位恢复提供了首个关于普通梯度下降的全局收敛保证,无需(i)精心设计的初始化(ii)样本分割,或(iii)复杂的鞍点逃离方案。所有这些都通过利用统计模型分析优化算法,通过一种leave-one-out方法,实现了梯度下降迭代与数据之间的统计依赖性的解耦。

英文摘要

This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest $\mathbf{x}^{\natural}\in\mathbb{R}^{n}$ from $m$ quadratic equations/samples $y_{i}=(\mathbf{a}_{i}^{\top}\mathbf{x}^{\natural})^{2}$, $1\leq i\leq m$. This problem, also known as phase retrieval, spans multiple domains including physical sciences and machine learning. We investigate the efficiency of gradient descent (or Wirtinger flow) designed for the nonconvex least squares problem. We prove that under Gaussian designs, gradient descent --- when randomly initialized --- yields an $ε$-accurate solution in $O\big(\log n+\log(1/ε)\big)$ iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee concerning vanilla gradient descent for phase retrieval, without the need for (i) carefully designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of these are achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependency between the gradient descent iterates and the data.
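A small real-valued experiment makes the setting concrete. The sketch below runs plain gradient descent on the least-squares loss f(x) = (1/4m) Σ ((a_i·x)² − y_i)² from a random Gaussian start. The problem size, step size, iteration count and seed are arbitrary choices made here and are not taken from the paper; the sign of the solution is unrecoverable, so the error is measured up to a global sign.

```python
import math
import random


def gradient(x, a, y):
    """Gradient of f(x) = (1/4m) * sum_i ((a_i . x)^2 - y_i)^2."""
    m, n = len(a), len(x)
    grad = [0.0] * n
    for a_i, y_i in zip(a, y):
        inner = sum(a_ij * x_j for a_ij, x_j in zip(a_i, x))
        coeff = (inner * inner - y_i) * inner / m
        for j in range(n):
            grad[j] += coeff * a_i[j]
    return grad


def gradient_descent(a, y, x0, step=0.05, iters=500):
    """Vanilla gradient descent with a constant step size."""
    x = list(x0)
    for _ in range(iters):
        g = gradient(x, a, y)
        x = [x_j - step * g_j for x_j, g_j in zip(x, g)]
    return x


rng = random.Random(0)
n, m = 10, 200                                   # illustrative sizes, m = 20 n
x_true = [1.0 / math.sqrt(n)] * n                # unit-norm ground truth
a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sum(a_ij * x_j for a_ij, x_j in zip(a_i, x_true)) ** 2 for a_i in a]
x0 = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]  # random init

x_hat = gradient_descent(a, y, x0)
err_plus = math.sqrt(sum((p - q) ** 2 for p, q in zip(x_hat, x_true)))
err_minus = math.sqrt(sum((p + q) ** 2 for p, q in zip(x_hat, x_true)))
error = min(err_plus, err_minus)                 # error up to global sign
print(error)
```

No spectral initialization, sample splitting, or saddle-escaping perturbation is used here, which is the regime the abstract's guarantee addresses.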

1811.10745 2026-06-04 cs.LG cs.CR cs.NA math.NA stat.ML

ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies

基于Feynman-Kac形式的ResNets集成:提升自然准确率与鲁棒准确率

Bao Wang, Binjie Yuan, Zuoqiang Shi, Stanley J. Osher

AI总结 本文提出了一种基于Feynman-Kac形式的ResNets集成算法,通过在每个残差映射的输出中注入指定方差的高斯噪声,并对多个联合训练的修改后ResNets的输出取平均,来提高模型在干净图像和对抗图像上的准确率。

详情
Comments
18 pages, 6 figures
AI中文摘要

经验对抗风险最小化(EARM)是一种被广泛使用的数学框架,用于鲁棒地训练能够抵御对抗攻击的深度神经网络(DNN)。然而,训练得到的鲁棒模型在分类干净图像和对抗图像时的自然准确率和鲁棒准确率都远不能令人满意。在本工作中,我们将输运方程的最优控制理论与ResNets的训练和测试实践统一起来。基于这一统一观点,我们提出了一种简单而有效的ResNets集成算法,以提升鲁棒训练模型在干净图像和对抗图像上的准确率。所提出的算法包括两个组成部分:首先,我们在每个残差映射的输出中注入指定方差的高斯噪声,以修改基础ResNets;其次,我们对多个联合训练的修改后ResNets的输出取平均,得到最终预测。这两个步骤近似了Feynman-Kac公式,该公式用于表示带粘性的输运方程(即对流-扩散方程)的解。在CIFAR10基准上,这一简单算法得到的鲁棒模型在干净图像上的自然准确率为85.62%,在20次迭代的IFGSM攻击下的鲁棒准确率为57.94%,优于当前在CIFAR10上防御IFGSM攻击的最先进方法。所提出的ResNets集成的自然准确率和鲁棒准确率可以随着基础ResNet的进步而不断提高。代码可在 https://github.com/BaoWangMath/EnResNet 获取。

英文摘要

Empirical adversarial risk minimization (EARM) is a widely used mathematical framework to robustly train deep neural nets (DNNs) that are resistant to adversarial attacks. However, both natural and robust accuracies, in classifying clean and adversarial images, respectively, of the trained robust models are far from satisfactory. In this work, we unify the theory of optimal control of transport equations with the practice of training and testing of ResNets. Based on this unified viewpoint, we propose a simple yet effective ResNets ensemble algorithm to boost the accuracy of the robustly trained model on both clean and adversarial images. The proposed algorithm consists of two components: First, we modify the base ResNets by injecting Gaussian noise with a specified variance into the output of each residual mapping. Second, we average over the outputs of multiple jointly trained modified ResNets to get the final prediction. These two steps give an approximation to the Feynman-Kac formula for representing the solution of a transport equation with viscosity, or a convection-diffusion equation. For the CIFAR10 benchmark, this simple algorithm leads to a robust model with a natural accuracy of 85.62\% on clean images and a robust accuracy of 57.94\% under 20 iterations of the IFGSM attack, which outperforms the current state of the art in defending against the IFGSM attack on CIFAR10. Both natural and robust accuracies of the proposed ResNets ensemble can be improved dynamically as the building block ResNet advances. The code is available at: https://github.com/BaoWangMath/EnResNet.

1807.07120 2026-06-04 stat.AP cs.SY eess.SY

A Holistic Approach to Forecasting Wholesale Energy Market Prices

面向批发电力市场价格预测的综合方法

Ana Radovanovic, Tommaso Nesti, Bokan Chen

AI总结 本文提出一种综合方法,利用电网运营商所执行的供需匹配过程(即最优潮流,OPF)的基本性质,仅使用公开可用的数据(如全网发电类型构成、系统负荷和历史价格)来恢复电力市场结构并预测节点价格;该方法借助统计学习的最新进展来应对高维且稀疏的真实电网拓扑以及稀缺的公开市场数据,从而实现对实时价格的准确日前预测。

详情
Comments
14 pages, 14 figures. Accepted for publication in IEEE Transactions on Power Systems
AI中文摘要

电力市场价格预测使能源市场参与者能够在满足其经济和环境目标的同时调整自身的用电或供电。通过利用电网运营商所执行的供需匹配过程(即最优潮流,OPF)的基本性质,我们开发了一种方法,仅使用公开可用的数据(具体包括全网发电类型构成、系统负荷和历史价格)来恢复能源市场的结构并预测由此产生的节点价格。我们的方法利用统计学习的最新进展来应对高维且稀疏的真实电网拓扑以及稀缺的公开市场数据,同时利用底层OPF机制的结构特性。使用西南电力池(SPP)市场数据进行的严格验证表明,电网层面的发电构成与相应的市场价格之间存在强相关性,从而能够对实时价格做出准确的日前预测。所提出的方法在采用完全去中心化的市场参与者视角的前提下,性能与最先进的行业基准十分接近。最后,我们指出了所提出的方法及其他被评估的方法在预测大幅价格尖峰方面的局限性。

英文摘要

Electricity market price predictions enable energy market participants to shape their consumption or supply while meeting their economic and environmental objectives. By utilizing the basic properties of the supply-demand matching process performed by grid operators, known as Optimal Power Flow (OPF), we develop a methodology to recover the energy market's structure and predict the resulting nodal prices by using only publicly available data, specifically grid-wide generation type mix, system load, and historical prices. Our methodology uses the latest advancements in statistical learning to cope with high-dimensional and sparse real power grid topologies, as well as scarce, public market data, while exploiting structural characteristics of the underlying OPF mechanism. Rigorous validations using the Southwest Power Pool (SPP) market data reveal a strong correlation between the grid-level mix and corresponding market prices, resulting in accurate day-ahead predictions of real-time prices. The proposed approach demonstrates remarkable proximity to the state-of-the-art industry benchmark while assuming a fully decentralized, market-participant perspective. Finally, we recognize the limitations of the proposed and other evaluated methodologies in predicting large price spike values.

1811.01091 2026-06-04 stat.CO cs.NA math.NA

Efficient Marginalization-based MCMC Methods for Hierarchical Bayesian Inverse Problems

高效的基于边缘化的MCMC方法用于分层贝叶斯反问题

Arvind K. Saibaba, Johnathan Bardsley, D. Andrew Brown, Alen Alexanderian

AI总结 本文提出了一种高效的基于边缘化的MCMC方法,用于解决分层贝叶斯反问题,通过结合低秩技术与边缘化方法,提高了高维状态下的计算效率和统计效率。

详情
Comments
27 pages, 8 figures, 5 tables
AI中文摘要

贝叶斯反问题中的分层模型的特点是:为未知状态和测量误差精度假设先验概率分布,并为先验参数假设超先验分布。利用贝叶斯定理将这些概率模型结合起来,得到的后验分布往往无法直接采样,即使对于具有高斯测量误差和高斯先验的线性模型也是如此。吉布斯采样可用于从后验分布中采样,但当状态维度较大时会出现问题。这是因为每次迭代所需的高斯样本的计算成本可能高得难以承受,而且马尔可夫链的统计效率会随着状态维度的增加而下降。后一个问题可以通过基于边缘化的技术来缓解,但这些技术的计算成本同样可能高得难以承受。在本文中,我们将Brown、Saibaba和Vallelian(2018)的低秩技术与Rue和Held(2005)的边缘化方法相结合。我们考虑了该方法的两种变体:延迟接受和伪边缘化。我们对所提出算法的接受率和计算成本进行了详细分析,并在两个数值测试案例(图像去模糊和热传导方程反问题)上比较了它们的性能。

英文摘要

Hierarchical models in Bayesian inverse problems are characterized by an assumed prior probability distribution for the unknown state and measurement error precision, and hyper-priors for the prior parameters. Combining these probability models using Bayes' law often yields a posterior distribution that cannot be sampled from directly, even for a linear model with Gaussian measurement error and Gaussian prior. Gibbs sampling can be used to sample from the posterior, but problems arise when the dimension of the state is large. This is because the Gaussian sample required for each iteration can be prohibitively expensive to compute, and because the statistical efficiency of the Markov chain degrades as the dimension of the state increases. The latter problem can be mitigated using marginalization-based techniques, but these can be computationally prohibitive as well. In this paper, we combine the low-rank techniques of Brown, Saibaba, and Vallelian (2018) with the marginalization approach of Rue and Held (2005). We consider two variants of this approach: delayed acceptance and pseudo-marginalization. We provide a detailed analysis of the acceptance rates and computational costs associated with our proposed algorithms, and compare their performances on two numerical test cases---image deblurring and inverse heat equation.

1807.06613 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY stat.ML

Deep Reinforcement Learning for Swarm Systems

深度强化学习用于群体系统

Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

AI总结 本文提出了一种基于分布均值嵌入的新状态表示方法,用于深度多智能体强化学习,以更有效地处理大规模同质群体系统的去中心化决策问题。

详情
Journal ref
Journal of Machine Learning Research 20(54):1-31, 2019
Comments
31 pages, 12 figures, version 3 (published in JMLR Volume 20)
AI中文摘要

近来,深度强化学习(RL)方法已成功应用于多智能体场景。通常,这些方法依赖于将各智能体的状态拼接起来,以表示去中心化决策所需的信息内容。然而,对于包含大量同质智能体的群体系统,拼接的可扩展性很差,因为它没有利用这类系统固有的基本性质:(i)群体中的智能体是可互换的,(ii)群体中智能体的确切数量无关紧要。因此,我们基于分布的均值嵌入,为深度多智能体RL提出了一种新的状态表示。我们将智能体视为某个分布的样本,并使用经验均值嵌入作为去中心化策略的输入。我们使用直方图、径向基函数和端到端学习的神经网络定义了均值嵌入的不同特征空间。我们在群体文献中两个著名的问题(会合与追逃)上,在全局可观测和局部可观测两种设置下评估了该表示。对于局部设置,我们还引入了简单的通信协议。在所有方法中,使用神经网络特征的均值嵌入表示能够在相邻智能体之间实现最丰富的信息交换,从而促进更复杂的集体策略的形成。

英文摘要

Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and a locally observable setup. For the local setup, we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.
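The two swarm properties listed above, interchangeable agents and an irrelevant agent count, can be checked directly on an empirical mean embedding. The following sketch uses radial basis function features, one of the feature spaces named in the abstract; the centers, width, and agent states are hypothetical values chosen for illustration.

```python
import math


def rbf_features(state, centers, width=1.0):
    """Radial basis function features of one agent state."""
    return [
        math.exp(-sum((s - c) ** 2 for s, c in zip(state, center)) / (2.0 * width ** 2))
        for center in centers
    ]


def mean_embedding(states, centers, width=1.0):
    """Average the per-agent features: a fixed-size summary of the swarm."""
    features = [rbf_features(state, centers, width) for state in states]
    return [sum(column) / len(features) for column in zip(*features)]


centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hypothetical RBF centers
swarm = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.5)]                # hypothetical 2-D agent states

embedding = mean_embedding(swarm, centers)
shuffled = mean_embedding(list(reversed(swarm)), centers)   # same agents, new order
doubled = mean_embedding(swarm + swarm, centers)            # twice as many agents
print(embedding)
```

The embedding length equals the number of centers regardless of swarm size, whereas a concatenated state would grow with every added agent.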

1905.13587 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

GENO -- GENeric Optimization for Classical Machine Learning

GENO -- 为经典机器学习设计的通用优化

Sören Laue, Matthias Mitterreiter, Joachim Giesen

AI总结 本文提出GENO框架,通过结合建模语言和通用求解器,实现了对大多数经典机器学习问题的高效自动求解,展示了其在效率上的优势。

详情
AI中文摘要

尽管优化是机器学习的长期算法核心,但新模型仍需要耗时实现新求解器。因此,有成千上万种针对机器学习问题的优化算法实现。一个自然的问题是,是否总需要实现新求解器,或者是否存在一个适用于大多数模型的算法。普遍认为这种“万能算法”无法工作,因为该算法无法利用模型特定的结构,因此无法在广泛的问题上高效且稳健。本文挑战这一普遍观点。我们设计并实现了优化框架GENO(GENeric Optimization),它结合了建模语言和通用求解器。GENO从优化问题类的声明性规范中生成求解器。该框架足够灵活,可以涵盖大多数经典机器学习问题。我们在广泛的经典问题以及一些最近提出的问题上展示了自动生成的求解器的性能:(1) 与精心设计的专用求解器一样高效,(2) 比最近的最先进求解器有相当大的优势,(3) 比传统建模语言加求解器方法快多个数量级。

英文摘要

Although optimization is the longstanding algorithmic backbone of machine learning, new models still require the time-consuming implementation of new solvers. As a result, there are thousands of implementations of optimization algorithms for machine learning problems. A natural question is whether it is always necessary to implement a new solver, or whether one algorithm is sufficient for most models. Common belief suggests that such a one-algorithm-fits-all approach cannot work, because this algorithm cannot exploit model-specific structure and thus cannot be efficient and robust on a wide variety of problems. Here, we challenge this common belief. We have designed and implemented the optimization framework GENO (GENeric Optimization) that combines a modeling language with a generic solver. GENO generates a solver from the declarative specification of an optimization problem class. The framework is flexible enough to encompass most of the classical machine learning problems. We show on a wide variety of classical but also some recently suggested problems that the automatically generated solvers are (1) as efficient as well-engineered specialized solvers, (2) more efficient by a decent margin than recent state-of-the-art solvers, and (3) orders of magnitude more efficient than classical modeling language plus solver approaches.

1905.13428 2026-06-04 cs.LG cs.MA cs.SY eess.SY stat.ML

Attentional Policies for Cross-Context Multi-Agent Reinforcement Learning

面向跨上下文多智能体强化学习的注意力策略

Matthew A. Wright, Roberto Horowitz

AI总结 本文提出了一种新的神经策略架构,用于解决多智能体问题,通过在策略层面学习多智能体关系,利用注意力机制实现智能体间的协作,优于传统方法并在大规模智能体场景中表现更优。

详情
AI中文摘要

许多现实世界中强化学习的应用涉及与数量随时间变化的其他智能体交互。我们为这些多智能体问题提出了新的神经策略架构。与传统的为每个智能体训练离散策略并通过额外的跨策略机制强制合作的方法不同,我们遵循最近关于深度网络中关系归纳偏置力量的工作精神,在策略层面学习多智能体关系。在我们的方法中,所有智能体共享相同的策略,但各自在自己的上下文中独立应用该策略,以聚合其他智能体的状态信息以选择下一步动作。我们的架构结构允许其应用于具有不同数量智能体的环境。我们在基准多智能体自动驾驶协调问题上展示了我们的架构,取得了优于全知识、完全集中化参考解决方案的成果,并在智能体数量扩大时显著优于该方案。

英文摘要

Many potential applications of reinforcement learning in the real world involve interacting with other agents whose numbers vary over time. We propose new neural policy architectures for these multi-agent problems. In contrast to other methods of training an individual, discrete policy for each agent and then enforcing cooperation through some additional inter-policy mechanism, we follow the spirit of recent work on the power of relational inductive biases in deep networks by learning multi-agent relationships at the policy level via an attentional architecture. In our method, all agents share the same policy, but independently apply it in their own context to aggregate the other agents' state information when selecting their next action. The structure of our architectures allows them to be applied to environments with varying numbers of agents. We demonstrate our architecture on a benchmark multi-agent autonomous vehicle coordination problem, obtaining superior results to a full-knowledge, fully-centralized reference solution, and significantly outperforming it when scaling to large numbers of agents.

1805.07297 2026-06-04 cs.LG cs.NA math.NA stat.ML

General solutions for nonlinear differential equations: a rule-based self-learning approach using deep reinforcement learning

非线性微分方程的通用解法:一种基于规则的自学习方法使用深度强化学习

Shiyin Wei, Xiaowei Jin, Hui Li

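AI summary: This paper proposes a rule-based self-learning approach that uses deep reinforcement learning to solve nonlinear ordinary and partial differential equations. A deep-neural-network actor outputs candidate solutions, and a critic is derived only from physical rules (governing equations plus boundary and initial conditions). The approach exhibits a transfer-learning characteristic and is verified to give high-accuracy solutions for the Schrödinger, Navier-Stokes, Burgers', Van der Pol, and Lorenz equations and an equation of motion.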

详情
AI中文摘要

本文首次提出了一种基于深度强化学习(DRL)的通用规则-based 自学习方法,用于求解非线性常微分方程和偏微分方程。求解器由一个深度神经网络结构的演员组成,该演员输出候选解,以及仅基于物理规则( governing equations 和边界和初始条件)的评论家。离散时间中的解被视为共享相同 governing equation 的多个任务,当前步骤参数为下一步提供了理想的初始化,由于解的时序连续性,展示了转移学习特性,表明DRL求解器已经捕捉到了方程的本质。该方法通过求解薛定谔、纳维-斯托克斯、伯格斯、范德波尔和洛伦兹方程及运动方程进行了验证。结果表明,该方法能够给出高精度的解,且求解过程有望更快。

英文摘要

A universal rule-based self-learning approach using deep reinforcement learning (DRL) is proposed for the first time to solve nonlinear ordinary differential equations and partial differential equations. The solver consists of a deep neural network-structured actor that outputs candidate solutions, and a critic derived only from physical rules (governing equations and boundary and initial conditions). Solutions in discretized time are treated as multiple tasks sharing the same governing equation, and the current step parameters provide an ideal initialization for the next owing to the temporal continuity of the solutions, which shows a transfer learning characteristic and indicates that the DRL solver has captured the intrinsic nature of the equation. The approach is verified through solving the Schrödinger, Navier-Stokes, Burgers', Van der Pol, and Lorenz equations and an equation of motion. The results indicate that the approach gives solutions with high accuracy, and the solution process promises to get faster.

1905.10457 2026-06-04 cs.LG cs.NA math.NA stat.ML

A Polynomial-Based Approach for Architectural Design and Learning with Deep Neural Networks

基于多项式的深度神经网络架构设计与学习方法

Joseph Daws, Clayton G. Webster

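AI summary: This paper proposes a new polynomial-based approach for reconstructing multivariate functions from training data. It uses polynomial approximations to identify a suitable network architecture and initialization, and the network is then improved by standard training, making a desirable local minimum more likely.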

详情
Comments
11 pages, 6 figures, submitted to NeurIPS 2019, corrected several typos and included new examples
AI中文摘要

在本研究中,我们提出了一种新的方法,通过多项式近似来从训练数据中重建多元函数,同时确定合适的网络架构和初始化。使用梯度下降训练深度神经网络可以被视为沿着损失景观移动网络参数以最小化损失函数。参数初始化对于基于下降的迭代训练方法至关重要。我们的方法产生了一个初始状态为训练数据多项式表示的网络。该技术的主要优势是,从该初始状态出发,网络可以通过标准训练过程进行改进。由于网络已经近似了数据,训练更可能产生一组与理想局部极小值相关的参数。我们提供了构建此类网络所需的理论细节,并考虑了几个数值示例,揭示了我们的方法最终能够有效训练网络,从初始状态开始,以实现对大量目标函数的改进近似。

英文摘要

In this effort we propose a novel approach for reconstructing multivariate functions from training data, by identifying both a suitable network architecture and an initialization using polynomial-based approximations. Training deep neural networks using gradient descent can be interpreted as moving the set of network parameters along the loss landscape in order to minimize the loss functional. The initialization of parameters is important for iterative training methods based on descent. Our procedure produces a network whose initial state is a polynomial representation of the training data. The major advantage of this technique is from this initialized state the network may be improved using standard training procedures. Since the network already approximates the data, training is more likely to produce a set of parameters associated with a desirable local minimum. We provide the details of the theory necessary for constructing such networks and also consider several numerical examples that reveal our approach ultimately produces networks which can be effectively trained from our initialized state to achieve an improved approximation for a large class of target functions.

1905.08930 2026-06-04 math.NA cs.LG cs.NA math.PR math.ST stat.ML stat.TH

Heavy Hitters and Bernoulli Convolutions

重 hitters与伯努利卷积

Alexander Kushkuley

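AI summary: This paper suggests a simple event-frequency approximation algorithm that is sensitive to event timeliness. The algorithm iteratively updates a categorical click distribution, producing the path of a random walk on the standard n-dimensional simplex. Under certain conditions this random walk is self-similar and corresponds to a biased Bernoulli convolution. Evaluating the algorithm leads naturally to estimating moments of biased (finite and infinite) Bernoulli convolutions.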

详情
Comments
1) fixed some typos and a reference 2) expanded section 3
AI中文摘要

提出了一种非常简单的事件频率近似算法,该算法对事件时效性敏感。该算法通过迭代更新类别点击分布,在标准n维单纯形上生成(路径)随机游走。在某些条件下,这种随机游走具有自相似性,并对应于有偏伯努利卷积。算法评估自然地导致对有偏(有限和无限)伯努利卷积矩的估计。

英文摘要

A very simple event frequency approximation algorithm that is sensitive to event timeliness is suggested. The algorithm iteratively updates categorical click-distribution, producing (path of) a random walk on a standard $n$-dimensional simplex. Under certain conditions, this random walk is self-similar and corresponds to a biased Bernoulli convolution. Algorithm evaluation naturally leads to estimation of moments of biased (finite and infinite) Bernoulli convolutions.
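The abstract does not spell out the update rule. The sketch below therefore assumes one common choice, exponential forgetting: every probability shrinks by a factor (1 - alpha) and the observed category gains alpha. The names `update` and `run` and the value of `alpha` are hypothetical.

```python
import random

def update(dist, event, alpha):
    """Shrink every probability by (1 - alpha), then add alpha to the observed event."""
    return [(1.0 - alpha) * p + (alpha if i == event else 0.0)
            for i, p in enumerate(dist)]

def run(events, n_categories, alpha=0.1):
    # Assumed starting point: the uniform distribution over categories.
    dist = [1.0 / n_categories] * n_categories
    for e in events:
        dist = update(dist, e, alpha)
    return dist

random.seed(0)
stream = [random.choice([0, 0, 0, 1, 2]) for _ in range(500)]
estimate = run(stream, n_categories=3)
print(estimate)
```

Each update is a convex combination of two points of the simplex, so the estimate stays a probability vector. Unrolling the recursion gives a geometrically weighted sum of the observed indicators, so recent events count more. Under this assumed rule and i.i.d. events, that sum has the form of a Bernoulli-convolution-type series.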

1905.10363 2026-06-04 math.NA cs.CE cs.LG cs.NA stat.ML

User-Device Authentication in Mobile Banking using APHEN for Paratuck2 Tensor Decomposition

使用APHEN进行Paratuck2张量分解的移动银行用户-设备认证

Jeremy Charlier, Eric Falk, Radu State, Jean Hilger

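AI summary: This paper studies how the Paratuck2 tensor decomposition and the APHEN algorithm can make the modeling of user-device authentication in mobile banking applications more efficient, with the aim of improving personal financial advertising.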

详情
AI中文摘要

新的金融欧洲法规,如PSD2,正在改变零售银行业务服务。值得注意的是,个人支出的监控现在不仅限于零售银行。然而,零售银行希望通过移动银行应用中的用户-设备认证来增强个人财务广告。为了解决认证的建模问题,我们依赖于张量分解,这是矩阵分解的高维类比。我们使用Paratuck2,因为它可以将张量表示为矩阵乘积和对角张量的乘积,因为用户和设备数量之间存在不平衡。我们强调为什么Paratuck2比流行的CP张量分解更适合这种情况,后者将张量分解为秩一张量的和。然而,Paratuck2的计算是计算密集型的。我们提出了一种新的近似Hessian基于牛顿求解算法,APHEN,能够比基于交替最小二乘或梯度下降的其他流行方法更准确和快速地解决Paratuck2。Paratuck2的结果用于通过神经网络预测用户认证的预测。我们应用我们的方法用于具体的案例,即基于移动银行应用生成的认证事件来针对客户进行财务广告活动。

英文摘要

New European financial regulations such as PSD2 are changing retail banking services. Notably, the monitoring of personal expenses is now open to institutions other than retail banks. Nonetheless, the retail banks are looking to leverage the user-device authentication on the mobile banking applications to enhance the personal financial advertisement. To address the profiling of the authentication, we rely on tensor decomposition, a higher dimensional analogue of matrix decomposition. We use Paratuck2, which expresses a tensor as a multiplication of matrices and diagonal tensors, because of the imbalance between the number of users and devices. We highlight why Paratuck2 is more appropriate in this case than the popular CP tensor decomposition, which decomposes a tensor as a sum of rank-one tensors. However, the computation of Paratuck2 is computationally intensive. We propose a new APproximate HEssian-based Newton resolution algorithm, APHEN, capable of solving Paratuck2 more accurately and faster than the other popular approaches based on alternating least squares or gradient descent. The results of Paratuck2 are used for the predictions of users' authentication with neural networks. We apply our method for the concrete case of targeting clients for financial advertising campaigns based on the authentication events generated by mobile banking applications.

1905.10224 2026-06-04 cs.LG cs.DM cs.NA cs.NE math.NA stat.ML

Semi-Supervised Classification on Non-Sparse Graphs Using Low-Rank Graph Convolutional Networks

利用低秩图卷积网络对非稀疏图进行半监督分类

Dominik Alfke, Martin Stoll

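AI summary: This paper proposes a low-rank graph convolutional network architecture for efficient semi-supervised learning on non-sparse graphs. Low-rank filters improve both runtime and accuracy, and the approach extends to hypergraph datasets.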

详情
AI中文摘要

图卷积网络(GCNs)已被证明是图数据集上半监督学习的成功工具。对于稀疏图,线性和多项式滤波函数取得了显著成果。然而,对于大规模非稀疏图,网络训练和评估变得成本过高。通过引入低秩滤波器,我们获得了显著的运行时间加速,同时提高了准确性。我们进一步提出了一种架构变化,模仿了模型降阶技术,称为降阶GCN。此外,我们展示了我们的方法如何也能应用于超图数据集,并展示了超图卷积如何高效实现。

英文摘要

Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised learning on graph-based datasets. For sparse graphs, linear and polynomial filter functions have yielded impressive results. For large non-sparse graphs, however, network training and evaluation becomes prohibitively expensive. By introducing low-rank filters, we gain significant runtime acceleration and simultaneously improved accuracy. We further propose an architecture change mimicking techniques from Model Order Reduction in what we call a reduced-order GCN. Moreover, we present how our method can also be applied to hypergraph datasets and how hypergraph convolution can be implemented efficiently.

1807.00251 2026-06-04 math.NA cs.LG cs.NA stat.ML

Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations

基于信任区域算法的训练响应方法:使用不定Hessian近似机学习方法

Jennifer B. Erway, Joshua Griffin, Roummel F. Marcia, Riadh Omheni

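AI summary: This paper proposes machine learning methods based on a quasi-Newton trust-region framework for large-scale optimization that allows indefinite Hessian approximations. Numerical experiments show that, under a fixed computational time budget, the methods outperform traditional limited-memory BFGS and Hessian-free methods.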

详情
AI中文摘要

机学习(ML)问题通常被表述为高度非线性和非凸的无约束优化问题。基于随机梯度下降的ML问题求解方法易于扩展到非常大的问题,但可能需要微调许多超参数。基于有限记忆Broyden-Fletcher-Goldfarb-Shanno(BFGS)更新的准牛顿方法通常不需要手动调整超参数,但会将潜在的不定Hessian近似为正定矩阵。Hessian自由方法利用了无需整个Hessian矩阵即可执行Hessian-向量乘法的能力,但每次迭代的复杂度显著高于准牛顿方法。在本文中,我们提出了一种基于准牛顿信任区域框架的替代方法,用于解决允许不定Hessian近似的大型优化问题。在标准测试数据集上的数值实验表明,在固定计算时间预算下,所提出的方法比传统有限记忆BFGS和Hessian自由方法表现更好。

英文摘要

Machine learning (ML) problems are often posed as highly nonlinear and nonconvex unconstrained optimization problems. Methods for solving ML problems based on stochastic gradient descent are easily scaled for very large problems but may involve fine-tuning many hyper-parameters. Quasi-Newton approaches based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update typically do not require manually tuning hyper-parameters but suffer from approximating a potentially indefinite Hessian with a positive-definite matrix. Hessian-free methods leverage the ability to perform Hessian-vector multiplication without needing the entire Hessian matrix, but each iteration's complexity is significantly greater than quasi-Newton methods. In this paper we propose an alternative approach for solving ML problems based on a quasi-Newton trust-region framework for solving large-scale optimization problems that allow for indefinite Hessian approximations. Numerical experiments on a standard testing data set show that with a fixed computational time budget, the proposed methods achieve better results than the traditional limited-memory BFGS and the Hessian-free methods.
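To see why a trust region copes with an indefinite Hessian approximation, consider the textbook Cauchy-point step. It is shown here as a generic illustration, not as the solver used in the paper. The quadratic model is m(p) = g·p + ½ p·Bp and the step is restricted to ‖p‖ ≤ delta. The function name `cauchy_point` is hypothetical.

```python
import math

def cauchy_point(g, B, delta):
    """Minimise m(p) = g.p + 0.5 p.B.p along -g subject to ||p|| <= delta."""
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    if g_norm == 0.0:
        return [0.0] * len(g)
    Bg = [sum(bij * gj for bij, gj in zip(row, g)) for row in B]
    curvature = sum(gi * bgi for gi, bgi in zip(g, Bg))  # g^T B g
    if curvature <= 0.0:
        # Non-positive curvature: the model keeps decreasing along -g,
        # so the step runs all the way to the trust-region boundary.
        tau = 1.0
    else:
        tau = min(1.0, g_norm ** 3 / (delta * curvature))
    scale = tau * delta / g_norm
    return [-scale * gi for gi in g]

B_indefinite = [[1.0, 0.0], [0.0, -2.0]]
step = cauchy_point([0.0, 1.0], B_indefinite, delta=0.5)
print(step)
```

A line-search quasi-Newton method needs a positive-definite matrix to guarantee a descent direction. In a trust region, the radius bounds the step even when the model is unbounded below, so negative curvature can be used rather than discarded.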

1905.09149 2026-06-04 math.NA cs.NA stat.CO

Analytic regularity and stochastic collocation of high dimensional Newton iterates

解析正则性与高维牛顿迭代的随机格点

Julio Enrique Castrillon-Candas, Mark Kon

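AI summary: This paper introduces concepts from uncertainty quantification and numerical analysis to efficiently evaluate stochastic high-dimensional Newton iterates. It develops a complex-analytic regularity theory with respect to the random variables, which justifies using sparse grids to compute stochastic moments, and derives convergence rates that are subexponential or algebraic in the number of realizations of the random perturbations. The accuracy of the method makes it well suited to computing low-probability events with high confidence. The method is applied to the power flow problem, and numerical experiments on the 39-bus New England power system model with large stochastic loads agree with the theoretical convergence rates.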

详情
AI中文摘要

在本文中,我们引入不确定性量化(UQ)和数值分析的概念,以高效评估随机高维牛顿迭代。特别是,我们开发了关于随机变量的复分析正则性理论。这为稀疏网格计算随机矩提供了理论依据。推导了收敛速率,并展示了其相对于随机扰动实现次数为亚指数或代数收敛。由于方法的准确性,稀疏网格非常适合计算低概率事件的高置信度。我们应用了我们的方法到电力潮流问题。在具有大随机负载的39节点新英格兰电力系统模型上的数值实验与理论收敛速率一致。

英文摘要

In this paper we introduce concepts from uncertainty quantification (UQ) and numerical analysis for the efficient evaluation of stochastic high dimensional Newton iterates. In particular, we develop complex analytic regularity theory of the solution with respect to the random variables. This justifies the application of sparse grids for the computation of stochastic moments. Convergence rates are derived and are shown to be subexponential or algebraic with respect to the number of realizations of random perturbations. Due to the accuracy of the method, sparse grids are well suited for computing low probability events with high confidence. We apply our method to the power flow problem. Numerical experiments on the 39 bus New England power system model with large stochastic loads are consistent with the theoretical convergence rates.

1905.07501 2026-06-04 cs.LG cs.NA math.NA stat.ML

Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning

在监督、无监督和强化学习中对时间序列预测施加约束

Panos Stinis

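AI summary: This paper studies how to enforce constraints coming from a dynamical system in supervised, unsupervised, and reinforcement learning in order to accelerate the training of deep neural networks and improve their predictive ability. Its main contribution is a novel approach based on homotopy of the action-value function that stabilizes and accelerates reinforcement learning training.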

详情
Comments
30 pages, 5 figures
AI中文摘要

我们假设我们给定了一个来自动态系统的数据时间序列,我们的任务是学习动态系统的流映射。我们提出了关于如何施加来自动态系统的约束以加速深度神经网络训练并提高其预测能力的一系列结果。特别是,我们为监督、无监督和强化学习三种主要学习模式提供了在训练过程中施加约束的方法。一般来说,动态约束需要包括类似于模型降阶形式中的记忆项。这些记忆项起到恢复力的作用,纠正由学习的流映射在预测过程中犯的错误。对于监督学习,约束被添加到目标函数中。对于无监督学习,特别是生成对抗网络,约束是通过增强判别器的输入引入的。最后,对于强化学习,特别是actor-critic方法,约束被添加到奖励函数中。此外,对于强化学习情况,我们提出了一种基于动作价值函数同伦的新方法,以稳定和加速训练。我们使用洛伦兹系统数值结果来说明各种构造。

英文摘要

We assume that we are given a time series of data from a dynamical system and our task is to learn the flow map of the dynamical system. We present a collection of results on how to enforce constraints coming from the dynamical system in order to accelerate the training of deep neural networks to represent the flow map of the system as well as increase their predictive ability. In particular, we provide ways to enforce constraints during training for all three major modes of learning, namely supervised, unsupervised and reinforcement learning. In general, the dynamic constraints need to include terms which are analogous to memory terms in model reduction formalisms. Such memory terms act as a restoring force which corrects the errors committed by the learned flow map during prediction. For supervised learning, the constraints are added to the objective function. For the case of unsupervised learning, in particular generative adversarial networks, the constraints are introduced by augmenting the input of the discriminator. Finally, for the case of reinforcement learning and in particular actor-critic methods, the constraints are added to the reward function. In addition, for the reinforcement learning case, we present a novel approach based on homotopy of the action-value function in order to stabilize and accelerate training. We use numerical results for the Lorenz system to illustrate the various constructions.

1905.07436 2026-06-04 math.OC cs.LG cs.SY eess.SY stat.ML

A Dynamical Systems Perspective on Nesterov Acceleration

从动力系统角度看待Nesterov加速法

Michael Muehlebach, Michael I. Jordan

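AI summary: This paper presents a dynamical systems framework for understanding Nesterov's accelerated gradient method. By analyzing the continuous-time dynamics and their discretization, it suggests that a curvature-dependent damping term is central to acceleration, and it establishes connections between the discrete and continuous-time dynamics.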

详情
Comments
11 pages, 4 figures, to appear in the Proceedings of the 36th International Conference on Machine Learning
AI中文摘要

我们提出一个动力系统框架来理解Nesterov加速梯度方法。与以往工作不同,我们的推导不依赖于步长消失的论证。我们展示Nesterov加速源于对常微分方程的半隐式欧拉积分方案的离散化。我们分析了底层微分方程及其离散化,以获得对加速现象的见解。分析表明,曲率依赖的阻尼项是该现象的核心。我们进一步建立了离散和连续时间动力学之间的联系。

英文摘要

We present a dynamical system framework for understanding Nesterov's accelerated gradient method. In contrast to earlier work, our derivation does not rely on a vanishing step size argument. We show that Nesterov acceleration arises from discretizing an ordinary differential equation with a semi-implicit Euler integration scheme. We analyze both the underlying differential equation as well as the discretization to obtain insights into the phenomenon of acceleration. The analysis suggests that a curvature-dependent damping term lies at the heart of the phenomenon. We further establish connections between the discretized and the continuous-time dynamics.
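The semi-implicit Euler pattern mentioned in the abstract can be seen by rewriting the standard textbook form of Nesterov's method in terms of a position and a velocity. The sketch below does only that algebraic rewriting. It does not reproduce the paper's ODE or its constants, and the test function, step size, and momentum value are illustrative assumptions.

```python
def nesterov_velocity_form(grad, x0, step, momentum, iters):
    """Nesterov's method written as a velocity update followed by a position update."""
    x = list(x0)
    v = [0.0] * len(x0)
    for _ in range(iters):
        lookahead = [xi + momentum * vi for xi, vi in zip(x, v)]
        g = grad(lookahead)
        # Velocity first, using the gradient at the look-ahead point ...
        v = [momentum * vi - step * gi for vi, gi in zip(v, g)]
        # ... then position, using the *new* velocity (semi-implicit Euler pattern).
        x = [xi + vi for xi, vi in zip(x, v)]
    return x

# f(x) = 0.5 * (x1^2 + 10 * x2^2), so L = 10 and mu = 1
grad_f = lambda x: [x[0], 10.0 * x[1]]
L, mu = 10.0, 1.0
beta = (L ** 0.5 - mu ** 0.5) / (L ** 0.5 + mu ** 0.5)
x_final = nesterov_velocity_form(grad_f, [5.0, -3.0], step=1.0 / L, momentum=beta, iters=200)
print(x_final)
```

With v_k = x_k - x_{k-1}, these two lines are exactly y_k = x_k + beta (x_k - x_{k-1}) followed by x_{k+1} = y_k - s ∇f(y_k). The gradient is evaluated at x + beta v rather than at x, which makes the effective damping depend on how the gradient changes along the velocity. That is one way to read the curvature-dependent damping the abstract refers to.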

1905.06978 2026-06-04 eess.SY cs.LG cs.RO cs.SY stat.AP

Randomized Algorithms for Data-Driven Stabilization of Stochastic Linear Systems

数据驱动的随机算法用于随机线性系统的稳定化

Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

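AI summary: This paper studies two randomized algorithms for data-driven stabilization of stochastic linear systems. Numerical analyses of the stochastic feedback and stochastic parameter methods examine stabilization speed and failure probability, and indicate that fast stabilization is guaranteed as long as the number of statistically independent randomizations is not too small.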

详情
AI中文摘要

数据驱动的控制策略在动态系统中广泛应用,尤其是在参数未知的情况下。一个关键问题是防止随机线性系统因决策者对动态参数不确定而失稳。本文提出了两种随机算法来解决这个问题,但其性能尚未充分研究。此外,算法中的关键参数,如随机化应用的幅度和频率的影响目前尚不明确。本文研究了数据驱动过程的稳定速度和失败概率。我们对两种方法:随机反馈和随机参数的性能进行了数值分析。所呈现的结果表明,只要统计独立的随机化数量不太多,就可以保证快速稳定化。

英文摘要

Data-driven control strategies for dynamical systems with unknown parameters are popular in theory and applications. An essential problem is to prevent stochastic linear systems from becoming destabilized due to the decision-maker's uncertainty about the dynamical parameter. Two randomized algorithms have been proposed for this problem, but their performance has not been sufficiently investigated. Further, the effect of key parameters of the algorithms, such as the magnitude and the frequency of applying the randomizations, is not currently understood. This work studies the stabilization speed and the failure probability of data-driven procedures. We provide numerical analyses for the performance of two methods: stochastic feedback, and stochastic parameter. The presented results imply that as long as the number of statistically independent randomizations is not too small, fast stabilization is guaranteed.

1810.05247 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Real-time Faulted Line Localization and PMU Placement in Power Systems through Convolutional Neural Networks

通过卷积神经网络实现电力系统中的实时故障线路定位与PMU布置

Wenting Li, Deepjyoti Deka, Michael Chertkov, Meng Wang

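AI summary: This paper proposes a faulted-line localization method based on a convolutional neural network that uses bus-voltage features to improve robustness, together with a joint PMU placement strategy. Simulations of various fault types verify that faults can be localized with high accuracy under low observability.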

详情
Comments
11 pages, 8 figures
AI中文摘要

多样化的故障类型、快速的重合闸和故障后复杂的暂态状态使得电力电网中的实时故障定位具有挑战性。现有定位技术依赖于静态负载等简化假设或需要更高的采样率或总测量可用性。本文提出了一种基于卷积神经网络(CNN)分类器的故障线路定位方法,利用母线电压。与以往的数据驱动方法不同,所提出的分类器基于具有物理解释的特征,提高了定位性能的鲁棒性。我们的基于CNN的定位工具的准确性明显优于文献中的其他机器学习分类器。为了进一步提高定位性能,提出了一种联合相量测量单元(PMU)布置策略,并与其他方法进行了验证。我们方法的一个重要方面是,在非常低的可观测性(7%的母线)下,算法仍能以高概率将故障线路定位到小的邻域。通过在IEEE 39母线和68母线电力系统中不同类型的故障模拟,验证了在变化的不确定条件、系统可观测性和测量质量下的方案性能。

英文摘要

Diverse fault types, fast re-closures, and complicated transient states after a fault event make real-time fault location in power grids challenging. Existing localization techniques in this area rely on simplistic assumptions, such as static loads, or require much higher sampling rates or total measurement availability. This paper proposes a faulted line localization method based on a Convolutional Neural Network (CNN) classifier using bus voltages. Unlike prior data-driven methods, the proposed classifier is based on features with physical interpretations that improve the robustness of the location performance. The accuracy of our CNN based localization tool is demonstrably superior to other machine learning classifiers in the literature. To further improve the location performance, a joint phasor measurement units (PMU) placement strategy is proposed and validated against other methods. A significant aspect of our methodology is that under very low observability (7% of buses), the algorithm is still able to localize the faulted line to a small neighborhood with high probability. The performance of our scheme is validated through simulations of faults of various types in the IEEE 39-bus and 68-bus power systems under varying uncertain conditions, system observability, and measurement quality.

1905.05663 2026-06-04 math.PR cs.NA math.NA math.ST q-fin.CP stat.TH

Approximation of Optimal Transport problems with marginal moments constraints

用边际矩约束近似最优运输问题

Aurélien Alfonsi, Rafaël Coyaud, Virginie Ehrlacher, Damiano Lombardi

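AI summary: This paper studies the approximation of optimal transport problems when the marginal constraints are replaced by moment constraints. It proves that the moment-constrained optimal transport (MCOT) problem is achieved by a finite discrete measure, and shows its application to multimarginal optimal transport problems and its convergence.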

详情
AI中文摘要

最优运输(OT)问题在物理、经济等多个领域均有广泛应用。获得这些问题的数值近似解是一个具有实际意义的挑战性问题。本文研究了在边际约束被某些矩约束替代时的OT问题的松弛情况。利用Tchakaloff定理,我们证明了矩约束最优运输问题(MCOT)可通过有限离散测度实现。有趣的是,对于多边际OT问题,该测度的点数与边际分布的数量呈线性关系,这有助于克服维度诅咒。该近似方法也适用于martingale OT问题。我们展示了MCOT问题向相应OT问题的收敛性。在某些基本情况下,我们得到了收敛率在$O(1/n)$或$O(1/n^2)$(其中$n$为矩的数量),这展示了矩函数的作用。最后,我们提出了利用MCOT由有限离散测度实现这一事实的算法,并提供了数值近似示例。

英文摘要

Optimal Transport (OT) problems arise in a wide range of applications, from physics to economics. Obtaining approximate numerical solutions to these problems is a challenging issue of practical importance. In this work, we investigate the relaxation of the OT problem when the marginal constraints are replaced by some moment constraints. Using Tchakaloff's theorem, we show that the Moment Constrained Optimal Transport problem (MCOT) is achieved by a finite discrete measure. Interestingly, for multimarginal OT problems, the number of points weighted by this measure scales linearly with the number of marginal laws, which is encouraging to bypass the curse of dimensionality. This approximation method is also relevant for Martingale OT problems. We show the convergence of the MCOT problem toward the corresponding OT problem. In some fundamental cases, we obtain rates of convergence in $O(1/n)$ or $O(1/n^2)$ where $n$ is the number of moments, which illustrates the role of the moment functions. Last, we present algorithms exploiting the fact that the MCOT is reached by a finite discrete measure and provide numerical examples of approximations.

1905.05380 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Control Regularization for Reduced Variance Reinforcement Learning

减少方差的强化学习中的控制正则化

Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick

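AI summary: This paper proposes a functional regularization method that reduces the variance of reinforcement learning in continuous control. It regularizes the behavior of the deep policy to stay close to a policy prior, which yields a bias-variance trade-off, approximately preserves the prior's stability guarantees, and makes learning more efficient.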

详情
Comments
Appearing in ICML 2019
AI中文摘要

在模型无关强化学习(RL)中,处理高方差是一个重要的挑战。现有方法不可靠,使用不同初始化/种子时性能表现方差较大。针对连续控制中出现的问题,我们提出了一种功能正则化方法来增强模型无关RL。具体而言,我们正则化深度策略的行为与先验策略相似,即在函数空间中进行正则化。我们证明功能正则化会产生偏倚-方差权衡,并提出了一种自适应调节策略来优化这种权衡。当策略先验具有控制理论稳定性保证时,我们进一步证明这种正则化在整个学习过程中近似保持这些稳定性保证。我们在多种设置中通过实验证明了我们的方法,展示了显著减少的方差、保证的动态稳定性和比深度RL更高效的训练。

英文摘要

Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.

1712.09718 2026-06-04 stat.CO cs.LG cs.SY eess.SY

Directional Statistics and Filtering Using libDirectional

基于libDirectional的方向统计与滤波

Gerhard Kurz, Igor Gilitschenski, Florian Pfaff, Lukas Drude, Uwe D. Hanebeck, Reinhold Haeb-Umbach, Roland Y. Siegwart

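AI summary: This paper presents libDirectional, a library for directional statistics and directional estimation. It supports commonly used distributions on the unit circle, such as the von Mises, wrapped normal, and wrapped Cauchy distributions, as well as distributions on higher-dimensional manifolds such as the unit hypersphere and the hypertorus, and implements several recursive filtering algorithms based on these distributions.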

详情
Comments
Version accepted for Publication in the Journal of Statistical Software
AI中文摘要

在本文中,我们介绍了libDirectional,一个用于方向统计和方向估计的MATLAB库。它支持单位圆上各种常用分布,如von Mises、wrapped normal和wrapped Cauchy分布。此外,还提供了更高维流形上的分布,如单位超球面和超 torus。基于这些分布,libDirectional中的几种递归滤波算法允许在这些流形上进行估计。该功能以清晰、文档齐全且面向对象的结构实现,易于使用且易于扩展。

英文摘要

In this paper, we present libDirectional, a MATLAB library for directional statistics and directional estimation. It supports a variety of commonly used distributions on the unit circle, such as the von Mises, wrapped normal, and wrapped Cauchy distributions. Furthermore, various distributions on higher-dimensional manifolds such as the unit hypersphere and the hypertorus are available. Based on these distributions, several recursive filtering algorithms in libDirectional allow estimation on these manifolds. The functionality is implemented in a clear, well-documented, and object-oriented structure that is both easy to use and easy to extend.
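As a small illustration of why circular data needs its own tools, the sketch below evaluates a von Mises density and a circular mean in plain Python. It does not use or mirror the libDirectional API, which is a MATLAB library. The function names and the truncation length of the Bessel series are assumptions chosen for this example.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I_0 via its power series sum_k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def von_mises_pdf(theta, mu, kappa):
    """Density exp(kappa * cos(theta - mu)) / (2 * pi * I_0(kappa)) on the circle."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def circular_mean(angles):
    """Direction of the resultant vector of the unit vectors at the given angles."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

n = 2000
grid = [2.0 * math.pi * i / n for i in range(n)]
mass = sum(von_mises_pdf(t, mu=1.0, kappa=2.0) for t in grid) * (2.0 * math.pi / n)
print(mass)
print(circular_mean([0.1, 2.0 * math.pi - 0.1]))
```

The two angles in the last line lie just either side of zero. Their arithmetic mean is π, which points the opposite way, while the circular mean is approximately zero. Periodicity of this kind is what distributions and filters defined directly on the circle are meant to respect.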

1905.04365 2026-06-04 math.ST cs.NA math.NA stat.TH

Hyperparameter Estimation in Bayesian MAP Estimation: Parameterizations and Consistency

贝叶斯MAP估计中的超参数估计:参数化与一致性

Matthew M. Dunlop, Tapio Helin, Andrew M. Stuart

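AI summary: This paper studies how the choice of parameterization affects MAP estimators under a conditionally Gaussian hierarchical prior. It examines the consistency and robustness of hyperparameter estimation under different parameterizations, and shows that hyperparameters can only be recovered up to measure equivalence.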

详情
Comments
36 pages, 8 figures
AI中文摘要

贝叶斯逆问题的公式化具有三个主要吸引力:它提供了一个清晰的建模框架;提供了不确定性量化的方法;并且允许有原则地学习超参数。后验分布可以通过采样方法进行探索,但对于许多问题,这样做计算上是不可行的。在这种情况下,通常寻求最大后验(MAP)估计器。尽管这些估计器计算相对便宜且具有吸引人的变分公式,但其关键缺点是缺乏参数化变化下的不变性。当使用分层先验来学习超参数时,这一点尤其显著。在本文中,我们研究了在条件高斯分层先验分布下参数化选择对MAP估计器的影响。具体而言,我们考虑了中心参数化、自然参数化(其中未知状态直接求解)和非中心参数化(其中未知状态变量是一个白化高斯变量,出现在考虑维度鲁棒MCMC算法时)。在非参数设置下,只有非中心参数化才能定义MAP估计。然而,我们证明基于非中心参数化的MAP估计不是超参数估计器的一致估计。相反,我们证明有限维中心MAP估计器的极限在维度趋于无穷时是一致的。我们还考虑了经验贝叶斯超参数估计,展示了这些估计的一致性,并证明它们比中心MAP估计更鲁棒,对噪声更稳健。贯穿整个研究的一个基本概念是,超参数只能在测度等价的意义下被恢复,这是在奥本-乌尔本贝克过程上下文中众所周知的现象。

英文摘要

The Bayesian formulation of inverse problems is attractive for three primary reasons: it provides a clear modelling framework; means for uncertainty quantification; and it allows for principled learning of hyperparameters. The posterior distribution may be explored by sampling methods, but for many problems it is computationally infeasible to do so. In this situation maximum a posteriori (MAP) estimators are often sought. Whilst these are relatively cheap to compute, and have an attractive variational formulation, a key drawback is their lack of invariance under change of parameterization. This is a particularly significant issue when hierarchical priors are employed to learn hyperparameters. In this paper we study the effect of the choice of parameterization on MAP estimators when a conditionally Gaussian hierarchical prior distribution is employed. Specifically we consider the centred parameterization, the natural parameterization in which the unknown state is solved for directly, and the noncentred parameterization, which works with a whitened Gaussian as the unknown state variable, and arises when considering dimension-robust MCMC algorithms; MAP estimation is well-defined in the nonparametric setting only for the noncentred parameterization. However, we show that MAP estimates based on the noncentred parameterization are not consistent as estimators of hyperparameters; conversely, we show that limits of finite-dimensional centred MAP estimators are consistent as the dimension tends to infinity. We also consider empirical Bayesian hyperparameter estimation, show consistency of these estimates, and demonstrate that they are more robust with respect to noise than centred MAP estimates. An underpinning concept throughout is that hyperparameters may only be recovered up to measure equivalence, a well-known phenomenon in the context of the Ornstein-Uhlenbeck process.

1905.04351 2026-06-04 cs.LG cs.NA math.NA physics.comp-ph physics.data-an stat.ML

Solving Irregular and Data-enriched Differential Equations using Deep Neural Networks

使用深度神经网络求解不规则和数据丰富的微分方程

Craig Michoski, Milos Milosavljevic, Todd Oliver, David Hatch

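AI summary: This paper presents a method for solving irregular and data-enriched differential equations with deep neural networks. By analyzing the Sod shock tube solution and a shock solution in compressible magnetohydrodynamics, it illustrates the method's advantages in performance and in parameter space exploration.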

详情
Comments
21 pages, 14 figures, 3 tables
AI中文摘要

近期的研究提出了一种简单的数值方法,用于利用深度神经网络(DNN)求解偏微分方程(PDEs)。本文回顾并扩展了该方法,并将其应用于分析数值PDEs和非线性分析中最基本的特征之一:不规则解。首先,讨论并分析了Sod激波管解到可压缩欧拉方程的解,然后与传统有限元和有限体积方法进行比较。这些方法被扩展以考虑性能改进和同时的参数空间探索。接下来,解决了一个压缩磁流体动力学(MHD)中的激波解,并在实验数据被用来增强一个原本不足以验证的PDE系统的情况下使用。这通过在模型PDE系统中加入源项并使用监督训练在合成实验数据上实现。所得到的DNN框架似乎在系统原型化方面表现出几乎幻想般的易用性,能够自然整合大规模数据集(无论是合成还是实验数据),同时能够同时进行整个参数空间的单次探索。

英文摘要

Recent work has introduced a simple numerical method for solving partial differential equations (PDEs) with deep neural networks (DNNs). This paper reviews and extends the method while applying it to analyze one of the most fundamental features in numerical PDEs and nonlinear analysis: irregular solutions. First, the Sod shock tube solution to compressible Euler equations is discussed, analyzed, and then compared to conventional finite element and finite volume methods. These methods are extended to consider performance improvements and simultaneous parameter space exploration. Next, a shock solution to compressible magnetohydrodynamics (MHD) is solved for, and used in a scenario where experimental data is utilized to enhance a PDE system that is a priori insufficient to validate against the observed/experimental data. This is accomplished by enriching the model PDE system with source terms and using supervised training on synthetic experimental data. The resulting DNN framework for PDEs seems to demonstrate almost fantastical ease of system prototyping, natural integration of large data sets (be they synthetic or experimental), all while simultaneously enabling single-pass exploration of the entire parameter space.

1905.04152 2026-06-04 eess.SY cs.LG cs.NI cs.SY stat.ML

Massive Autonomous UAV Path Planning: A Neural Network Based Mean-Field Game Theoretic Approach

大规模自主无人机路径规划:一种基于神经网络的均场博弈理论方法

Hamid Shiri, Jihong Park, Mehdi Bennis

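AI summary: This paper studies the autonomous control of massive numbers of UAVs in mission-critical applications. It proposes a neural-network-based mean-field game theoretic approach that achieves efficient path planning by reducing the number of UAV state exchanges and lowering computation energy.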

详情
Comments
6 pages, 5 figures, submitted to IEEE GLOBECOM 2019
AI中文摘要

本文研究了大规模无人驾驶航空器(UAVs)在关键任务中的自主控制问题,例如从源点向目的地派遣大量UAVs进行灭火任务。在风扰扰动下实现快速移动和低运动能耗同时避免UAV间碰撞是一项具有挑战性的控制任务,这会带来巨大的通信能耗用于实时交换UAV状态。我们通过利用均场博弈(MFG)理论控制方法来解决这个问题,该方法要求UAVs仅在初始源点交换一次状态。此后,每个UAV可以通过本地求解两个偏微分方程(PDEs)来控制其加速度,即哈密尔顿-雅可比-贝尔曼(HJB)方程和福克-科尔莫戈罗夫-柯尔莫哥洛夫(FPK)方程。然而,这种方法在解决PDEs时带来了巨大的计算能耗,特别是在多维UAV状态的情况下。我们通过使用机器学习(ML)方法来解决这个问题,其中两个独立的ML模型近似HJB和FPK方程的解。这些ML模型通过使用在线梯度下降方法进行训练和利用,具有较低的计算复杂度。数值评估验证了所提出的ML辅助MFG理论算法(称为MFG学习控制)在碰撞避免方面是有效的,具有低通信能耗和可接受的计算能耗。

英文摘要

This paper investigates the autonomous control of massive unmanned aerial vehicles (UAVs) for mission-critical applications (e.g., dispatching many UAVs from a source to a destination for firefighting). Achieving their fast travel and low motion energy without inter-UAV collision under wind perturbation is a daunting control task, which incurs huge communication energy for exchanging UAV states in real time. We tackle this problem by exploiting a mean-field game (MFG) theoretic control method that requires the UAV state exchanges only once at the initial source. Afterwards, each UAV can control its acceleration by locally solving two partial differential equations (PDEs), known as the Hamilton-Jacobi-Bellman (HJB) and Fokker-Planck-Kolmogorov (FPK) equations. This approach, however, brings about huge computation energy for solving the PDEs, particularly under multi-dimensional UAV states. We address this issue by utilizing a machine learning (ML) method where two separate ML models approximate the solutions of the HJB and FPK equations. These ML models are trained and exploited using an online gradient descent method with low computational complexity. Numerical evaluations validate that the proposed ML aided MFG theoretic algorithm, referred to as MFG learning control, is effective in collision avoidance with low communication energy and acceptable computation energy.

1805.03304 2026-06-04 math.NA cs.NA math.AP stat.CO

Bayesian parameter identification in Cahn-Hilliard models for biological growth

在生物生长的Cahn-Hilliard模型中进行贝叶斯参数识别

Christian Kahle, Kei Fong Lam, Jonas Latz, Elisabeth Ullmann

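AI summary: This paper studies the inverse problem of parameter estimation in a diffuse interface model for tumour growth. Within the Bayesian framework it constructs the likelihood and noise for two typical observation settings, analyzes the well-posedness of the posterior measure, and presents a numerical example with synthetic data in which the posterior is approximated by sequential Monte Carlo with tempering.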

详情
Journal ref
SIAM/ASA J. Uncertain. Quantif. 7(2), p. 526-552, 2019
AI中文摘要

我们考虑了在肿瘤生长的扩散界面模型中参数估计的反问题。该模型由一个四阶Cahn-Hilliard系统组成,包含三个现象学参数:肿瘤增殖率、营养消耗率和趋化敏感性。我们采用贝叶斯框架研究该反问题,并为两种典型观测设置构建似然函数和噪声。一种设置涉及无限维的数据空间,我们观测完整的肿瘤;另一种设置仅观测肿瘤体积,因此数据空间是有限维的。我们展示了两种设置下后验测度的well-posedness,基于并改进了[C. Kahle和K.F. Lam, Appl. Math. Optim. (2018)]中的分析结果。一个涉及合成数据的数值示例被呈现,其中后验测度通过具有退火的序贯蒙特卡洛方法进行数值近似。

英文摘要

We consider the inverse problem of parameter estimation in a diffuse interface model for tumour growth. The model consists of a fourth-order Cahn-Hilliard system and contains three phenomenological parameters: the tumour proliferation rate, the nutrient consumption rate, and the chemotactic sensitivity. We study the inverse problem within the Bayesian framework and construct the likelihood and noise for two typical observation settings. One setting involves an infinite-dimensional data space where we observe the full tumour. In the second setting we observe only the tumour volume, hence the data space is finite-dimensional. We show the well-posedness of the posterior measure for both settings, building upon and improving the analytical results in [C. Kahle and K.F. Lam, Appl. Math. Optim. (2018)]. A numerical example involving synthetic data is presented in which the posterior measure is numerically approximated by the sequential Monte Carlo approach with tempering.

1804.10636 2026-06-04 math.NA cs.NA eess.IV physics.med-ph stat.AP

Reconstruction of optical vector-fields with applications in endoscopic imaging

光学矢量场的重建及其在内窥镜成像中的应用

Milana Gataric, George S. D. Gordon, Francesco Renna, Alberto Gil C. P. Ramos, Maria P. Alcolea, Sarah E. Bohndiek

AI总结 本文提出了一种利用未知线性变换的成像设备进行光学矢量场幅度、相位和偏振重建的框架,通过引入有效的正则化项,能够针对任意表示系统恢复光学矢量场,并在傅里叶基底上恢复,用于识别组织异常的散射特征。

详情
AI中文摘要

我们介绍了一种框架,用于使用通过具有未知线性变换的成像设备获取的校准测量,来重建光学矢量场的幅度、相位和偏振。通过引入有效的正则化项,这种方法能够针对任意表示系统恢复光学矢量场,该系统可能与校准中使用的不同。特别是,它使能够针对傅里叶基底恢复光学矢量场,这已被证明能产生与组织异常相关的增强散射特征。我们通过合成全息图像以及实验设置中的生物组织样本来展示本方法的有效性,其中通过光纤内窥镜获取光学矢量场的测量,并观察到确实恢复的傅里叶系数在区分健康组织和早期食管癌病变方面是有用的。

英文摘要

We introduce a framework for the reconstruction of the amplitude, phase and polarisation of an optical vector-field using calibration measurements acquired by an imaging device with an unknown linear transformation. By incorporating effective regularisation terms, this new approach is able to recover an optical vector-field with respect to an arbitrary representation system, which may be different from the one used in calibration. In particular, it enables the recovery of an optical vector-field with respect to a Fourier basis, which is shown to yield indicative features of increased scattering associated with tissue abnormalities. We demonstrate the effectiveness of our approach using synthetic holographic images as well as biological tissue samples in an experimental setting where measurements of an optical vector-field are acquired by a fibre endoscope, and observe that indeed the recovered Fourier coefficients are useful in distinguishing healthy tissues from lesions in early stages of oesophageal cancer.

1905.01770 2026-06-04 math.NA cs.NA stat.AP

Propagation of Uncertainties in Density-Driven Flow

密度驱动流中不确定性的传播

Alexander Litvinenko, Dmitry Logashenko, Raul Tempone, Gabriel Wittum, David Keyes

AI总结 本文提出了一种并行方法,用于量化地下流中污染扩散中不确定性的传播,通过构建低成本的广义多项式混沌扩展(gPC)替代模型,利用稀疏和全张量网格进行系数计算,以评估渗透性和孔隙率的不确定性对解的影响。

详情
Comments
21 pages, 9 figures, 2 tables
AI中文摘要

准确建模地下流和地下水中的污染扩散对于农业和环境保护至关重要。本文展示了一种并行方法,用于量化污染在地下流中扩散过程中不确定性的传播。具体而言,我们考虑密度驱动的流,并估计渗透性和孔隙率的不确定性如何传播到解中。我们以Elder-like问题作为数值基准,并利用随机场来建模对孔隙率和渗透性的有限知识。我们构建了一个低成本的广义多项式混沌扩展(gPC)替代模型,其中gPC系数通过在稀疏和全张量网格上的投影来计算。我们并行化了基于多网格方法的确定性问题数值求解器,以及参数空间上的求积运算。

英文摘要

Accurate modeling of contamination in subsurface flow and water aquifers is crucial for agriculture and environmental protection. Here, we demonstrate a parallel method to quantify the propagation of uncertainty in the dispersal of pollution in subsurface flow. Specifically, we consider density-driven flow and estimate how uncertainty in permeability and porosity propagates to the solution. We take an Elder-like problem as a numerical benchmark and use random fields to model the limited knowledge of the porosity and permeability. We construct a low-cost generalized polynomial chaos expansion (gPC) surrogate model, where the gPC coefficients are computed by projection on sparse and full tensor grids. We parallelize both the numerical solver for the deterministic problem, which is based on the multigrid method, and the quadrature over the parametric space.
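To illustrate what "gPC coefficients computed by projection" means, here is a minimal one-dimensional sketch. It is not the authors' solver: the model `np.exp`, the uniform input on [-1, 1], the Legendre basis, and the function names are all assumptions chosen so that the result can be checked analytically.

```python
import numpy as np
from numpy.polynomial import legendre


def gpc_coefficients(f, order, n_quad):
    """Project f(xi), xi ~ Uniform(-1, 1), onto Legendre polynomials P_0..P_order."""
    nodes, weights = legendre.leggauss(n_quad)  # Gauss-Legendre rule on [-1, 1]
    fvals = f(nodes)
    coeffs = np.empty(order + 1)
    for k in range(order + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0
        pk = legendre.legval(nodes, basis)  # P_k evaluated at the quadrature nodes
        # c_k = E[f P_k] / E[P_k^2]; under the density 1/2, E[P_k^2] = 1 / (2k + 1)
        coeffs[k] = (2 * k + 1) * 0.5 * np.sum(weights * fvals * pk)
    return coeffs


def gpc_mean_and_variance(coeffs):
    """Read the mean and variance of the surrogate directly off its coefficients."""
    k = np.arange(len(coeffs))
    mean = coeffs[0]
    variance = np.sum(coeffs[1:] ** 2 / (2 * k[1:] + 1))
    return mean, variance


# Hypothetical stand-in for an expensive model evaluation.
coeffs = gpc_coefficients(np.exp, order=8, n_quad=16)
mean, variance = gpc_mean_and_variance(coeffs)
```

In several dimensions the same projection is carried out over a tensor or sparse grid of quadrature nodes, and each node requires one deterministic solve, which is the part the abstract describes as parallelized.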

1803.10986 2026-06-04 math.NA cs.LG cs.NA stat.ML

Error Analysis and Improving the Accuracy of Winograd Convolution for Deep Neural Networks

深度神经网络中Winograd卷积的误差分析与精度提升

Barbara Barabasz, Andrew Anderson, Kirk M. Soodhalter, David Gregg

AI总结 本文分析了Winograd卷积算法的误差,并提出改进方法以提高其精度,通过Huffman编码优化求和误差,实验选择采样点并探索混合精度卷积等方法以减少浮点误差。

详情
AI中文摘要

流行的深度神经网络(DNNs)大部分执行时间用于计算卷积。Winograd算法族可以显著减少所需的算术运算次数,并存在于许多DNN软件框架中。然而,性能提升是以浮点(FP)数值精度的降低为代价。在本文中,我们分析了最坏情况下的FP误差并证明了算法的范数和条件数的估计。我们证明误差界随卷积大小的增加呈指数增长,但改进后的算法的误差界比原始算法更小。我们提出几种减少FP误差的方法。我们提出基于Huffman编码的通用评估顺序以减少求和误差。我们通过实验研究采样

英文摘要

Popular deep neural networks (DNNs) spend the majority of their execution time computing convolutions. The Winograd family of algorithms can greatly reduce the number of arithmetic operations required and is present in many DNN software frameworks. However, the performance gain comes at the expense of reduced floating point (FP) numerical accuracy. In this paper, we analyse the worst-case FP error and derive estimates of the norm and conditioning of the algorithm. We show that the error bound grows exponentially with the size of the convolution, but that the error bound of the modified algorithm is smaller than that of the original one. We propose several methods for reducing FP error. We propose a canonical evaluation ordering based on Huffman coding that reduces summation error. We study the selection of sampling "points" experimentally and find empirically good points for the most important sizes. We identify the main factors associated with good points. In addition, we explore other methods to reduce FP error, including mixed-precision convolution and pairwise summation across DNN channels. Using our methods, we can significantly reduce FP error for a given block size, which allows larger block sizes and reduced computation.

1709.08625 2026-06-04 stat.CO cs.NA math.NA

HLIBCov: Parallel Hierarchical Matrix Approximation of Large Covariance Matrices and Likelihoods with Applications in Parameter Identification

HLIBCov: 并行分层矩阵近似大规模协方差矩阵和似然函数及其在参数识别中的应用

Alexander Litvinenko

AI总结 本文提出HLIBCov包,利用并行分层矩阵高效近似大规模不均匀协方差矩阵,通过最大化联合高斯对数似然函数估计协方差函数的未知参数,包括方差、平滑度和协方差长度。

详情
Comments
26 pages, 11 figures
AI中文摘要

我们提供了HLIBCov包的更多技术细节,该包使用并行分层(H)矩阵来识别协方差函数的未知参数(方差、平滑度和协方差长度)。这些参数通过最大化联合高斯对数似然函数进行估计。HLIBCov包以对数线性计算成本和存储需求近似大规模密集不均匀协方差矩阵。我们解释了如何在H矩阵格式中计算Cholesky分解、行列式、逆和二次型。为了展示数值性能,我们在具有200万个位置的示例中识别三个未知参数。

英文摘要

We provide more technical details about the HLIBCov package, which uses parallel hierarchical ($\mathcal{H}$-) matrices to identify unknown parameters of the covariance function (variance, smoothness, and covariance length). These parameters are estimated by maximizing the joint Gaussian log-likelihood function. The HLIBCov package approximates large dense inhomogeneous covariance matrices with a log-linear computational cost and storage requirement. We explain how to compute the Cholesky factorization, determinant, inverse and quadratic form in the $\mathcal{H}$-matrix format. To demonstrate the numerical performance, we identify three unknown parameters in an example with 2,000,000 locations on a PC-desktop.
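The quantity being maximized can be written down compactly in the dense case. The sketch below is not HLIBCov and does not use hierarchical matrices. It assumes an exponential covariance (a Matérn kernel with smoothness fixed at 1/2), random locations, and hypothetical function names, and it only shows how the Cholesky factor yields both the log-determinant and the quadratic form.

```python
import numpy as np


def exp_covariance(points, variance, length):
    """Dense exponential covariance matrix C_ij = variance * exp(-|x_i - x_j| / length)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return variance * np.exp(-dists / length)


def gaussian_loglik(z, C):
    """Joint Gaussian log-likelihood of zero-mean data z with covariance C."""
    n = z.size
    L = np.linalg.cholesky(C)                   # C = L L^T
    v = np.linalg.solve(L, z)                   # v = L^{-1} z, so z^T C^{-1} z = v^T v
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log det C from the Cholesky diagonal
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * (v @ v)


rng = np.random.default_rng(1)
points = rng.uniform(size=(50, 2))              # hypothetical measurement locations
C = exp_covariance(points, variance=2.0, length=0.3)
z = np.linalg.cholesky(C) @ rng.standard_normal(50)   # synthetic data drawn from N(0, C)
loglik = gaussian_loglik(z, C)
```

The dense version costs O(n^3) operations and O(n^2) storage, which is what makes a log-linear approximation necessary at the problem sizes quoted in the abstract.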

1904.13304 2026-06-04 cs.LG cs.SY eess.SP eess.SY stat.ML

A supervised-learning-based strategy for optimal demand response of an HVAC System

基于监督学习的HVAC系统最优需求响应策略

Youngjin Kim

AI总结 本文提出了一种基于监督学习的HVAC系统最优需求响应策略,通过训练人工神经网络并结合分段线性方程,解决多区建筑中HVAC系统的优化需求响应问题,同时确保热舒适性和经济性。

详情
Comments
12 pages
AI中文摘要

建筑的大型热容量使供暖、通风和空气调节(HVAC)系统能够被用作需求响应(DR)资源。优化HVAC单元的需求响应具有挑战性,特别是在多区建筑中,因为这需要详细的基于物理的模型来描述区域温度变化和建筑热状况。本文提出了一种基于监督学习(SL)的新策略,用于多区建筑中的HVAC系统最优需求响应。人工神经网络(ANNs)使用正常建筑运行条件下的数据进行训练。ANNs通过分段线性方程进行复制,并显式地整合到基于价格的需求响应优化问题中。该优化问题在各种电价和建筑热状况下得到解决。解决方案进一步用于训练深度神经网络(DNN)以直接确定最优需求响应计划,称为监督学习辅助元预测(SLAMP)。通过三种不同方法进行案例研究:显式ANN复制(EAR)、SLAMP和基于物理的建模。案例研究结果验证了所提出的SL策略的有效性,不仅在实际应用性和计算时间方面,还确保了居住者的热舒适性和HVAC系统的经济运行。

英文摘要

The large thermal capacity of buildings enables heating, ventilating, and air-conditioning (HVAC) systems to be exploited as demand response (DR) resources. Optimal DR of HVAC units is challenging, particularly for multi-zone buildings, because this requires detailed physics-based models of zonal temperature variations for HVAC system operation and building thermal conditions. This paper proposes a new strategy for optimal DR of an HVAC system in a multi-zone building, based on supervised learning (SL). Artificial neural networks (ANNs) are trained with data obtained under normal building operating conditions. The ANNs are replicated using piecewise linear equations, which are explicitly integrated into an optimal scheduling problem for price-based DR. The optimization problem is solved for various electricity prices and building thermal conditions. The solutions are further used to train a deep neural network (DNN) to directly determine the optimal DR schedule, referred to here as supervised-learning-aided meta-prediction (SLAMP). Case studies are performed using three different methods: explicit ANN replication (EAR), SLAMP, and physics-based modeling. The case study results verify the effectiveness of the proposed SL-based strategy, in terms of both practical applicability and computational time, while also ensuring the thermal comfort of occupants and cost-effective operation of the HVAC system.

1806.06317 2026-06-04 cs.LG cs.NA math.NA stat.ML

Laplacian Smoothing Gradient Descent

拉普拉斯平滑梯度下降

Stanley Osher, Bao Wang, Penghang Yin, Xiyang Luo, Farzin Barekat, Minh Pham, Alex Lin

AI总结 本文提出了一种简单的方法改进梯度下降和随机梯度下降,通过乘以正定矩阵的逆(可通过FFT高效计算)来减少方差、增大步长并提高泛化精度,同时在理论和实践中均表现出色。

详情
Comments
28 pages, 15 figures
AI中文摘要

我们提出了一类非常简单的梯度下降和随机梯度下降的修改方法。我们展示,当应用于从逻辑回归到深度神经网络的各种机器学习问题时,所提出的替代方法可以显著减少方差,允许采取更大的步长,并提高泛化准确性。这些方法仅涉及将通常的(随机)梯度乘以正定矩阵的逆(可以通过FFT高效计算),该矩阵来自一维离散拉普拉斯算子或其高阶推广,且条件数较低。它还保持均值并增加最小成分,减少最大成分。哈密尔顿-雅可比偏微分方程的理论表明,新算法的隐式版本几乎等同于在新的函数上进行梯度下降,该函数(i)具有与原函数相同的全局极小值,并且(ii)更“凸”。此外,我们证明具有这些替代方案的优化算法在离散Sobolev $H_σ^p$ 意义下一致收敛,并减少凸优化问题的最优性差距。代码可在:\url{https://github.com/BaoWangMath/LaplacianSmoothing-GradientDescent}

英文摘要

We propose a class of very simple modifications of gradient descent and stochastic gradient descent. We show that when applied to a large variety of machine learning problems, ranging from logistic regression to deep neural nets, the proposed surrogates can dramatically reduce the variance, allow a larger step size, and improve the generalization accuracy. The methods only involve multiplying the usual (stochastic) gradient by the inverse of a positive definite matrix (which can be computed efficiently by FFT) with a low condition number coming from a one-dimensional discrete Laplacian or its high-order generalizations. This multiplication also preserves the mean, increases the smallest component, and decreases the largest component. The theory of Hamilton-Jacobi partial differential equations demonstrates that the implicit version of the new algorithm is almost the same as doing gradient descent on a new function which (i) has the same global minima as the original function and (ii) is "more convex". Moreover, we show that optimization algorithms with these surrogates converge uniformly in the discrete Sobolev $H_σ^p$ sense and reduce the optimality gap for convex optimization problems. The code is available at: \url{https://github.com/BaoWangMath/LaplacianSmoothing-GradientDescent}
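The smoothing step described above can be sketched in a few lines. This is an illustration and not the code from the linked repository. It assumes the matrix is $A_\sigma = I - \sigma L$ with $L$ the periodic one-dimensional discrete Laplacian, and the function name and the value of `sigma` are hypothetical. Because $A_\sigma$ is circulant, solving $A_\sigma s = g$ reduces to a pointwise division in Fourier space.

```python
import numpy as np


def laplacian_smooth(grad, sigma=1.0):
    """Return A^{-1} grad, where A = I - sigma * L and L is the periodic 1D Laplacian."""
    n = grad.size  # assumes n >= 3 so the two off-diagonal entries are distinct
    # First column of the circulant matrix A: 1 + 2*sigma on the diagonal, -sigma beside it.
    col = np.zeros(n)
    col[0] = 1.0 + 2.0 * sigma
    col[1] = -sigma
    col[-1] = -sigma
    # The DFT diagonalises circulant matrices; the eigenvalues fft(col) are all >= 1.
    return np.real(np.fft.ifft(np.fft.fft(grad) / np.fft.fft(col)))


rng = np.random.default_rng(2)
g = rng.standard_normal(64)           # stand-in for a (stochastic) gradient vector
s = laplacian_smooth(g, sigma=1.0)    # smoothed surrogate used in place of g

# Explicit dense matrix, built only to check the FFT solve.
A = (1.0 + 2.0) * np.eye(64) - np.roll(np.eye(64), 1, axis=0) - np.roll(np.eye(64), -1, axis=0)
```

A descent step would then read `w -= lr * laplacian_smooth(grad, sigma)`. The zero-frequency eigenvalue of $A_\sigma$ equals 1, which is why the mean of the gradient is unchanged, and the cost per step is O(n log n).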

1904.11930 2026-06-04 cs.SI cs.SY eess.SY physics.soc-ph stat.ML

Spectral partitioning of time-varying networks with unobserved edges

具有未观测边的时间变化网络的谱划分

Michael T. Schaub, Santiago Segarra, Hoi-To Wai

AI总结 本文研究了在动态图信号观测下,如何通过谱算法对时间变化网络进行划分,并证明了该方法在恢复潜在社区结构上的一致性。

详情
Journal ref
2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
Comments
5 pages, 2 figures
AI中文摘要

我们讨论了`盲`社区检测的一种变体,旨在从网络上的动态图信号观测中划分一个未观测的网络。我们考虑了一种场景,其中观测到的图信号是通过过滤白噪声输入得到的,而底层网络对每次观测都是不同的。在这种情况下,过滤后的图信号可以被解释为定义在时间变化网络上的。我们将每个底层网络实现视为从潜在的随机块模型(SBM)中独立抽取的。为了推断潜在SBM的划分,我们提出了一种简单的谱算法,并提供了理论分析和恢复的一致性保证。我们通过合成和真实数据的数值实验来说明我们的结果,突显了我们方法的有效性。

英文摘要

We discuss a variant of `blind' community detection, in which we aim to partition an unobserved network from the observation of a (dynamical) graph signal defined on the network. We consider a scenario where our observed graph signals are obtained by filtering white noise input, and the underlying network is different for every observation. In this fashion, the filtered graph signals can be interpreted as defined on a time-varying network. We model each of the underlying network realizations as generated by an independent draw from a latent stochastic blockmodel (SBM). To infer the partition of the latent SBM, we propose a simple spectral algorithm for which we provide a theoretical analysis and establish consistency guarantees for the recovery. We illustrate our results using numerical experiments on synthetic and real data, highlighting the efficacy of our approach.

1809.00846 2026-06-04 cs.LG cs.CV cs.SY eess.SY stat.ML

Towards Understanding Regularization in Batch Normalization

向批量归一化中的正则化理解迈进

Ping Luo, Xinjiang Wang, Wenqi Shao, Zhanglin Peng

AI总结 本文通过理论分析探讨了批量归一化在神经网络训练中的收敛性和泛化能力,揭示了批量归一化作为隐式正则化的作用,并通过实验验证了其在卷积神经网络中的正则化特性。

详情
Comments
International Conference on Learning Representations (ICLR)
AI中文摘要

批量归一化(BN)在神经网络训练中提高了收敛性和泛化能力。本工作从理论上理解这些现象。我们通过使用由核层、BN层和非线性激活函数组成的基本网络块来分析BN。这个基本网络帮助我们从三个方面理解BN的影响。首先,将BN视为隐式正则化,可以将其分解为总体归一化(PN)和伽马衰减作为显式正则化。其次,BN和正则化的学习动态表明,使用大最大有效学习率训练可以收敛。第三,通过统计力学探讨BN的泛化能力。实验表明,卷积神经网络中的BN共享上述分析中的正则化特性。

英文摘要

Batch Normalization (BN) improves both convergence and generalization in training neural networks. This work studies these phenomena theoretically. We analyze BN by using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. This basic network helps us understand the impacts of BN in three aspects. First, by viewing BN as an implicit regularizer, BN can be decomposed into population normalization (PN) and gamma decay as an explicit regularization. Second, the learning dynamics of BN and the regularization show that training converges with large maximum and effective learning rates. Third, the generalization of BN is explored by using statistical mechanics. Experiments demonstrate that BN in convolutional neural networks shares the same traits of regularization as the above analyses.

1804.07833 2026-06-04 stat.CO cs.NA math.NA math.PR math.ST stat.TH

Two Metropolis-Hastings algorithms for posterior measures with non-Gaussian priors in infinite dimensions

在无限维中具有非高斯先验的后验测度的两种Metropolis-Hastings算法

Bamdad Hosseini

AI总结 本文提出两种Metropolis-Hastings算法,用于在无限维Hilbert空间上绝对连续的目标测度采样,重点在于设计可逆的自回归类型提案核,并引入Bessel-K先验作为伽玛分布在无限维中的推广,用于稀疏或压缩参数建模。

详情
Comments
Minor revisions
AI中文摘要

我们介绍了两种类别的Metropolis-Hastings算法,用于采样绝对连续于无限维Hilbert空间上非高斯先验测度的目标测度。特别是,我们关注某些先验测度的类别,其中可以设计出可逆的自回归类型提案核。然后,我们利用这些提案核设计满足详细平衡条件的算法。之后,我们引入了一种新的先验测度类,称为Bessel-K先验,作为伽玛分布在无限维中的推广。Bessel-K先验介于已知的先验之间,如伽玛分布和Besov先验,并能建模稀疏或压缩参数。我们展示了在密度估计、有限维去噪和圆上去卷积的数值例子中,针对Bessel-K先验的具体算法实例。

英文摘要

We introduce two classes of Metropolis-Hastings algorithms for sampling target measures that are absolutely continuous with respect to non-Gaussian prior measures on infinite-dimensional Hilbert spaces. In particular, we focus on certain classes of prior measures for which prior-reversible proposal kernels of the autoregressive type can be designed. We then use these proposal kernels to design algorithms that satisfy detailed balance with respect to the target measures. Afterwards, we introduce a new class of prior measures, called the Bessel-K priors, as a generalization of the gamma distribution to measures in infinite dimensions. The Bessel-K priors interpolate between well-known priors such as the gamma distribution and Besov priors and can model sparse or compressible parameters. We present concrete instances of our algorithms for the Bessel-K priors in the context of numerical examples in density estimation, finite-dimensional denoising and deconvolution on the circle.

1904.09668 2026-06-04 stat.CO cs.NA math.NA stat.ME

Kriging in Tensor Train data format

张量列车数据格式中的克里金法

Sergey Dolgov, Alexander Litvinenko, Dishi Liu

AI总结 本文提出了一种结合低张量秩技术和基于快速傅里叶变换的方法,用于加速统计运算,如克里金法、计算条件协方差和地质统计最优设计。通过将鲁棒的张量列车(TT)协方差矩阵近似和高效的TT-Cross算法融入基于FFT的克里金法中,将计算复杂度降低到O(d r³ n),并在合成和实际数据中展示了其相对于普通FFT方法的优势。

详情
Comments
19 pages, 4 figures, 1 table, UNCECOMP 2019 3rd International Conference on Uncertainty Quantification in Computational Sciences and Engineering, 24-26 June 2019, Crete, Greece, https://2019.uncecomp.org/
AI中文摘要

低张量秩技术与基于快速傅里叶变换(FFT)的方法的结合在加速各种统计运算如克里金法、计算条件协方差、地质统计最优设计等方面显得尤为重要。然而,将完整的张量近似为低秩格式可能会带来计算上的挑战。在本工作中,我们结合了鲁棒的张量列车(TT)协方差矩阵近似和高效的TT-Cross算法,将其应用于基于FFT的克里金法中。结果显示,这里克里金法的计算复杂度被降低到O(d r³ n),其中n是估计网格的模式大小,d是变量的数量(维度),r是协方差矩阵的TT近似秩。对于许多流行的协方差函数,TT秩r在n和d增加时保持稳定。在合成和实际数据示例中,展示了这种方法相对于使用普通FFT方法的优势。

英文摘要

The combination of low-tensor-rank techniques and Fast Fourier transform (FFT) based methods has turned out to be prominent in accelerating various statistical operations such as Kriging, computing conditional covariance, geostatistical optimal design, and others. However, the approximation of a full tensor by its low-rank format can be computationally formidable. In this work, we incorporate the robust Tensor Train (TT) approximation of covariance matrices and the efficient TT-Cross algorithm into FFT-based Kriging. It is shown that the computational complexity of Kriging is thereby reduced to $\mathcal{O}(d r^3 n)$, where $n$ is the mode size of the estimation grid, $d$ is the number of variables (the dimension), and $r$ is the rank of the TT approximation of the covariance matrix. For many popular covariance functions the TT rank $r$ remains stable for increasing $n$ and $d$. The advantages of this approach over those using plain FFT are demonstrated in synthetic and real data examples.

1904.09396 2026-06-04 eess.SY cs.SY math.OC stat.ML

Learning Sparse Dynamical Systems from a Single Sample Trajectory

从单个样本轨迹学习稀疏动力系统

Salar Fattahi, Nikolai Matni, Somayeh Sojoudi

AI总结 本文研究了从单个样本轨迹识别稀疏线性时不变系统的问题,提出了一种类似于Lasso的估计器来估计系统参数,并提供了在有限时间内准确恢复系统稀疏结构和参数值的保证。同时,该方法在样本复杂度上有所改进,并通过电力系统案例研究展示了其性能。

详情
AI中文摘要

本文解决了从单个样本轨迹识别稀疏线性时不变(LTI)系统的问题。我们引入了一种类似于Lasso的估计器来估计系统参数,考虑到其稀疏性。假设系统是稳定的,或者配备了初始稳定控制器,我们提供了在有限时间内准确恢复系统稀疏结构和参数值的严格保证。特别是,我们证明,只要样本轨迹的长度超过某个阈值,所提出的估计器可以以高概率正确识别系统矩阵的稀疏模式。此外,我们证明该阈值与系统矩阵中非零元素的数量呈多项式关系,但与系统维度呈对数关系,这优于现有稀疏系统识别问题的样本复杂度界。我们进一步将这些结果扩展到获得估计误差的$\ell_{\infty}$-范数的严格界,并展示了系统不同属性(如稳定性水平和互不相干性)如何影响该界。最后,通过电力系统的广泛案例研究来展示所提出估计方法的性能。

英文摘要

This paper addresses the problem of identifying sparse linear time-invariant (LTI) systems from a single sample trajectory generated by the system dynamics. We introduce a Lasso-like estimator for the parameters of the system, taking into account their sparse nature. Assuming that the system is stable, or that it is equipped with an initial stabilizing controller, we provide sharp finite-time guarantees on the accurate recovery of both the sparsity structure and the parameter values of the system. In particular, we show that the proposed estimator can correctly identify the sparsity pattern of the system matrices with high probability, provided that the length of the sample trajectory exceeds a threshold. Furthermore, we show that this threshold scales polynomially in the number of nonzero elements in the system matrices, but logarithmically in the system dimensions --- this improves on existing sample complexity bounds for the sparse system identification problem. We further extend these results to obtain sharp bounds on the $\ell_{\infty}$-norm of the estimation error and show how different properties of the system---such as its stability level and \textit{mutual incoherency}---affect this bound. Finally, an extensive case study on power systems is presented to illustrate the performance of the proposed estimation method.

1903.05803 2026-06-04 cs.LG cs.SY eess.SY stat.ML

On Applications of Bootstrap in Continuous Space Reinforcement Learning

在连续空间强化学习中Bootstrap应用的探讨

Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

AI总结 本文研究了在连续状态和动作空间决策问题中,基于Bootstrap的策略在 regret 方面的平方根缩放特性,并探讨了模型动态学习的准确性。

详情
AI中文摘要

在连续状态和动作空间的决策问题中,线性动力学模型被广泛采用。具体而言,针对受二次成本函数约束的随机线性系统,其策略在强化学习中涵盖了大量应用。最近的文献中研究了随机策略,以解决识别与控制之间的权衡。然而,关于基于Bootstrap观察状态和动作的策略知之甚少。在本文中,我们证明基于Bootstrap的策略在时间方面具有平方根缩放的regret。我们还获得了关于学习模型动态准确性结果。此外,还提供了支持技术结果的数值分析。

英文摘要

In decision making problems for continuous state and action spaces, linear dynamical models are widely employed. Specifically, policies for stochastic linear systems subject to quadratic cost functions capture a large number of applications in reinforcement learning. Selected randomized policies have been studied in the literature recently that address the trade-off between identification and control. However, little is known about policies based on bootstrapping observed states and actions. In this work, we show that bootstrap-based policies achieve a square root scaling of regret with respect to time. We also obtain results on the accuracy of learning the model's dynamics. Corroborative numerical analysis that illustrates the technical results is also provided.

1903.03712 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Adaptive Power System Emergency Control using Deep Reinforcement Learning

基于深度强化学习的自适应电力系统紧急控制

Qiuhua Huang, Renke Huang, Weituo Hao, Jie Tan, Rui Fan, Zhenyu Huang

AI总结 本文提出了一种基于深度强化学习的自适应电力系统紧急控制方法,通过高维特征提取和非线性泛化能力来应对现代电网中的不确定性与变化,展示了其在发电机动态制动和电压降低负载切除中的优异性能和鲁棒性。

详情
Comments
12 pages
AI中文摘要

电力系统紧急控制通常被视为电网安全和韧性的最后一道安全网。现有的紧急控制方案通常是基于设想的'最坏'场景或几个典型运行场景进行离线设计。这些方案在现代电网中出现越来越多的不确定性和变化时,面临着显著的适应性和鲁棒性问题。为了解决这些挑战,本文首次开发了新的自适应紧急控制方案,利用深度强化学习(DRL)的高维特征提取和非线性泛化能力来处理复杂的电力系统。此外,首次设计了一个名为RLGC的开源平台,以协助DRL算法在电力系统控制中的开发和基准测试。详细介绍了该平台和基于DRL的紧急控制方案,包括发电机动态制动和电压降低负载切除。在两个区域四机系统和IEEE 39节点系统中进行了广泛的案例研究,证明了所提出方案的优异性能和鲁棒性。

英文摘要

Power system emergency control is generally regarded as the last safety net for grid security and resiliency. Existing emergency control schemes are usually designed off-line based on either the conceived "worst" case scenario or a few typical operation scenarios. These schemes face significant adaptiveness and robustness issues as uncertainties and variations increase in modern electrical grids. To address these challenges, this paper develops, for the first time, novel adaptive emergency control schemes using deep reinforcement learning (DRL), leveraging the high-dimensional feature extraction and non-linear generalization capabilities of DRL for complex power systems. Furthermore, an open-source platform named RLGC has been designed for the first time to assist the development and benchmarking of DRL algorithms for power system control. Details of the platform and DRL-based emergency control schemes for generator dynamic braking and under-voltage load shedding are presented. Extensive case studies performed on both the two-area four-machine system and the IEEE 39-bus system demonstrate the excellent performance and robustness of the proposed schemes.

1506.08272 2026-06-04 math.OC cs.NA math.NA stat.ML

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

非凸优化中的异步并行随机梯度

Xiangru Lian, Yijun Huang, Yuncheng Li, Ji Liu

AI总结 本文研究了非凸优化中两种异步并行随机梯度方法,证明了其在计算机网络和共享内存系统中的渐进收敛率,并展示了线性加速的可能性。

详情
Comments
33 pages
AI中文摘要

非凸优化中的异步并行随机梯度(SG)实现已被广泛用于训练深度神经网络,并在实践中取得了许多成功。然而,现有的理论无法解释其收敛性和加速特性,主要是由于大多数深度学习公式是非凸的以及异步并行机制。为了填补理论空白并提供理论支持,本文研究了两种异步并行SG实现:一种是在计算机网络上,另一种是在共享内存系统上。我们为这两种算法建立了遍历收敛率$O(1/\sqrt{K})$,并证明了如果工作节点数受$\sqrt{K}$($K$是总迭代次数)限制,则线性加速是可行的。我们的结果扩展并改进了现有凸最小化分析。

英文摘要

Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in training deep neural networks and have recently achieved many successes in practice. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provide theoretical support, this paper studies two asynchronous parallel implementations of SG: one over a computer network and the other on a shared memory system. We establish an ergodic convergence rate $O(1/\sqrt{K})$ for both algorithms and prove that linear speedup is achievable if the number of workers is bounded by $\sqrt{K}$ ($K$ is the total number of iterations). Our results generalize and improve existing analysis for convex minimization.

1805.07970 2026-06-04 stat.ME cs.NA math.NA

Implicit Probabilistic Integrators for ODEs

隐式概率积分器用于微分方程

Onur Teymur, Han Cheng Lie, Tim Sullivan, Ben Calderhead

AI总结 本文提出了一种隐式概率积分器家族,用于初始值问题,基于多步Adams-Moulton方法。该方法通过动态反馈未来时间步的信息,不同于以往基于显式方法的概率积分器。本文还讨论了此类积分器的校准问题,并提供了一个示例展示概率积分器在反问题参数推断中的应用效果。

详情
Journal ref
Advances in Neural Information Processing Systems 31 (2018) pp. 7244-7253
Comments
32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada
AI中文摘要

我们介绍了一种用于初始值问题(IVPs)的隐式概率积分器家族,其起点是多步Adams-Moulton方法。隐式构造允许从未来时间步动态反馈信息,与以往所有基于显式方法的概率积分器不同。我们首先简要回顾了概率微分方程求解器这一迅速发展的领域。然后介绍我们的方法,该方法基于并改进了Conrad等人(2016)和Teymur等人(2016)的工作,并提供了一种严格证明其定义性和收敛性的证明。我们讨论了此类积分器的校准问题,并提出了一种方法。我们还提供了一个示例,展示了使用概率积分器(包括我们的新方法)在反问题参数推断中的效果。

英文摘要

We introduce a family of implicit probabilistic integrators for initial value problems (IVPs), taking as a starting point the multistep Adams-Moulton method. The implicit construction allows for dynamic feedback from the forthcoming time-step, in contrast to previous probabilistic integrators, all of which are based on explicit methods. We begin with a concise survey of the rapidly-expanding field of probabilistic ODE solvers. We then introduce our method, which builds on and adapts the work of Conrad et al. (2016) and Teymur et al. (2016), and provide a rigorous proof of its well-definedness and convergence. We discuss the problem of the calibration of such integrators and suggest one approach. We give an illustrative example highlighting the effect of the use of probabilistic integrators - including our new method - in the setting of parameter inference within an inverse problem.

1904.08361 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Decoupled Data Based Approach for Learning to Control Nonlinear Dynamical Systems

基于解耦数据的方法用于学习控制非线性动力学系统

Ran Wang, Karthikeya Parunandi, Dan Yu, Dileep Kalathil, Suman Chakravorty

AI总结 本文提出了一种解耦数据基于的方法,用于学习控制具有连续状态空间、连续动作空间和未知动态的非线性随机动力学系统,通过解耦的开环-闭环方法,利用黑盒仿真模型解决开环确定性轨迹优化问题,并通过线性化动态在该名义轨迹上开发闭环控制,从而使用线性二次调节器算法,证明了该方法的性能近似最优,并在训练时间上显著优于其他先进算法。

详情
AI中文摘要

本文解决了一个非线性随机动力学系统学习最优控制策略的问题,该系统具有连续状态空间、连续动作空间和未知动态。此类问题通常在随机自适应控制和强化学习文献中使用基于模型和无模型的方法分别解决。这两种方法都依赖于解决动态规划问题,无论是直接还是间接,以找到最优闭环控制策略。动态规划方法固有的'维度灾难'使这些方法也变得计算上困难。本文提出了一种新颖的解耦数据基于控制(D2C)算法,通过解耦的'开环-闭环'方法解决这个问题。首先,使用动力学系统的黑盒仿真模型解决一个开环确定性轨迹优化问题。然后,通过在该名义轨迹上线性化动态,开发围绕该开环轨迹的闭环控制。通过线性化,可以使用基于线性二次调节器的算法来实现该闭环控制。我们证明了D2C算法的性能近似最优。此外,仿真性能表明,与其它先进算法相比,训练时间显著减少。

英文摘要

This paper addresses the problem of learning the optimal control policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. This class of problems is typically addressed in the stochastic adaptive control and reinforcement learning literature using model-based and model-free approaches, respectively. Both methods rely on solving a dynamic programming problem, either directly or indirectly, to find the optimal closed-loop control policy. The inherent `curse of dimensionality' associated with dynamic programming also makes these approaches computationally difficult. This paper proposes a novel decoupled data-based control (D2C) algorithm that addresses this problem using a decoupled, `open loop - closed loop', approach. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a closed-loop control is developed around this open-loop trajectory by linearizing the dynamics about this nominal trajectory. By virtue of linearization, a linear quadratic regulator based algorithm can be used for this closed-loop control. We show that the performance of the D2C algorithm is approximately optimal. Moreover, simulation performance suggests a significant reduction in training time compared to other state-of-the-art algorithms.

1904.07200 2026-06-04 cs.LG cs.NA math.NA stat.ML

A Discussion on Solving Partial Differential Equations using Neural Networks

利用神经网络求解偏微分方程的讨论

Tim Dockhorn

AI总结 本文探讨了神经网络求解偏微分方程的能力,通过数值实验展示了小型神经网络能够准确学习复杂解,并分析了随机权重初始化对解质量的影响,提出了损失函数的选择、神经网络与经典数值方法的优劣比较,以及未来研究方向。

详情
Comments
9 pages, 2 figures
AI中文摘要

神经网络是否能够学习求解偏微分方程(PDEs)?本文针对两个(系统)PDEs,即泊松方程和稳态纳维-斯托克斯方程,探讨了这一问题。本文的贡献有五点:(1)数值实验表明,小型神经网络(<500个可学习参数)能够准确学习偏微分方程组的复杂解。(2)研究了随机权重初始化对神经网络近似解质量的影响,并展示了如何利用这种非确定性进行集成学习。(3)探讨了本文中所用损失函数的适用性。(4)研究了用神经网络求解(系统)PDEs与经典数值方法的优缺点。(5)提出了未来研究的全面方向列表。

英文摘要

Can neural networks learn to solve partial differential equations (PDEs)? We investigate this question for two (systems of) PDEs, namely, the Poisson equation and the steady Navier--Stokes equations. The contributions of this paper are five-fold. (1) Numerical experiments show that small neural networks (< 500 learnable parameters) are able to accurately learn complex solutions for systems of partial differential equations. (2) It investigates the influence of random weight initialization on the quality of the neural network approximate solution and demonstrates how one can take advantage of this non-determinism using ensemble learning. (3) It investigates the suitability of the loss function used in this work. (4) It studies the benefits and drawbacks of solving (systems of) PDEs with neural networks compared to classical numerical methods. (5) It proposes an exhaustive list of possible directions of future work.

1903.07266 2026-06-04 cs.LG cs.DC cs.MA cs.SY eess.SY stat.ML

Distributed stochastic optimization with gradient tracking over strongly-connected networks

在强连通网络上进行分布式随机优化与梯度跟踪

Ran Xin, Anit Kumar Sahu, Usman A. Khan, Soummya Kar

AI总结 本文研究了在强连通网络上最小化平滑且强凸局部成本函数之和的分布式随机优化问题,提出了一种新的分布式方法$\mathcal{S}$-$\mathcal{AB}$,通过辅助变量在期望意义上渐近跟踪全局成本的梯度,利用行和列随机权重确保共识和最优性,并在任意强连通图上应用。

详情
AI中文摘要

在本文中,我们研究了在强连通网络上最小化平滑且强凸局部成本函数之和的分布式随机优化问题,假设每个代理都有访问随机一阶oracle($\mathcal{SFO}$)的权限,我们提出了一种新的分布式方法,称为$\mathcal{S}$-$\mathcal{AB}$,其中每个代理使用辅助变量以期望意义渐近跟踪全局成本的梯度。$\mathcal{S}$-$\mathcal{AB}$算法同时使用行和列随机权重,以确保共识和最优性。由于未使用双随机权重,$\mathcal{S}$-$\mathcal{AB}$适用于任意强连通图。我们证明,在足够小的常数步长下,$\mathcal{S}$-$\mathcal{AB}$在期望均方意义上线性收敛到全局极小值的邻域。我们基于真实世界数据集进行了数值模拟以说明理论结果。

英文摘要

In this paper, we study distributed stochastic optimization to minimize a sum of smooth and strongly-convex local cost functions over a network of agents, communicating over a strongly-connected graph. Assuming that each agent has access to a stochastic first-order oracle ($\mathcal{SFO}$), we propose a novel distributed method, called $\mathcal{S}$-$\mathcal{AB}$, where each agent uses an auxiliary variable to asymptotically track the gradient of the global cost in expectation. The $\mathcal{S}$-$\mathcal{AB}$ algorithm employs row- and column-stochastic weights simultaneously to ensure both consensus and optimality. Since doubly-stochastic weights are not used, $\mathcal{S}$-$\mathcal{AB}$ is applicable to arbitrary strongly-connected graphs. We show that under a sufficiently small constant step-size, $\mathcal{S}$-$\mathcal{AB}$ converges linearly (in expected mean-square sense) to a neighborhood of the global minimizer. We present numerical simulations based on real-world data sets to illustrate the theoretical results.
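The update structure described above, with row-stochastic weights for the estimates and column-stochastic weights for the gradient tracker, can be sketched as follows. This is a simplified illustration and not the authors' implementation. It uses exact gradients instead of a stochastic first-order oracle, so the iterates converge to the minimizer itself rather than to a neighborhood of it. The four-node directed graph, the quadratic local costs, the step size, and all names are assumptions.

```python
import numpy as np


def ab_gradient_tracking(grad, x0, A, B, step, iters):
    """AB-type iteration: x mixes with row-stochastic A, tracker y mixes with column-stochastic B."""
    x = x0.copy()
    g = grad(x)
    y = g.copy()                      # tracker initialised with the local gradients
    for _ in range(iters):
        x_new = A @ x - step * y      # consensus step followed by descent along the tracker
        g_new = grad(x_new)
        y = B @ y + g_new - g         # gradient tracking: sum(y) always equals sum(g)
        x, g = x_new, g_new
    return x


n = 4
# M[i, j] = 1 if agent j sends to agent i: self-loops, a directed ring, and one extra edge 0 -> 2.
M = np.eye(n)
for i in range(n):
    M[i, (i - 1) % n] = 1.0
M[2, 0] = 1.0
A = M / M.sum(axis=1, keepdims=True)   # row-stochastic
B = M / M.sum(axis=0, keepdims=True)   # column-stochastic

b = np.array([1.0, 2.0, 3.0, 6.0])     # local costs f_i(x) = 0.5 * (x - b_i)^2


def grad(x):
    return x - b                        # entry i is the gradient of f_i at agent i's estimate


x_final = ab_gradient_tracking(grad, np.zeros(n), A, B, step=0.05, iters=3000)
```

In this example neither weight matrix is doubly stochastic, which is the situation the abstract highlights: each agent only needs to normalize over its own incoming edges for `A` and its own outgoing edges for `B`.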

1904.04083 2026-06-04 eess.SP cs.SY eess.SY stat.AP

Convolutive Blind Source Separation on Surface EMG Signals for Respiratory Diagnostics and Medical Ventilation Control

基于表面肌电信号的卷积盲源分离用于呼吸诊断和医疗通气控制

Herbert Buchner, Eike Petersen, Marcus Eger, Philipp Rostalski

AI总结 本文提出利用卷积盲源分离技术对表面肌电信号进行预处理,以解决区分吸气、呼气和心肌活动的问题,从而提升呼吸诊断和医疗通气控制的应用效果。

详情
Journal ref
2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
AI中文摘要

肌电图(EMG)是评估肌肉活动的重要工具,因此也是呼吸支持诊断和控制的重要测量手段。本文提出卷积盲源分离(BSS)作为一种有效的预处理方法,用于处理人类呼吸肌肉的表面肌电信号(sEMG)。具体而言,本文针对区分吸气、呼气和心肌活动的问题进行了研究,该问题目前是sEMG在自适应通气控制中应用的主要障碍。研究表明,使用所研究的宽带算法,可以实现这些成分的清晰分离。该算法基于一个通用的BSS框架,利用多种统计信号特性。除了四通道FIR结构外,对解混系统没有其他限制性假设。

英文摘要

The electromyogram (EMG) is an important tool for assessing the activity of a muscle and thus also a valuable measure for the diagnosis and control of respiratory support. In this article we propose convolutive blind source separation (BSS) as an effective tool to pre-process surface electromyogram (sEMG) data of the human respiratory muscles. Specifically, the problem of discriminating between inspiratory, expiratory and cardiac muscle activity is addressed, which currently poses a major obstacle for the clinical use of sEMG for adaptive ventilation control. It is shown that using the investigated broadband algorithm, a clear separation of these components can be achieved. The algorithm is based on a generic framework for BSS that utilizes multiple statistical signal characteristics. Apart from a four-channel FIR structure, there are no further restrictive assumptions on the demixing system.

1809.06581 2026-06-04 math.PR cs.NA math.NA math.ST stat.ME stat.TH

A probabilistic framework for approximating functions in active subspaces

基于主动子空间的函数近似概率框架

Mario Teixeira Parente

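AI summary: This paper proposes a comprehensive probabilistic framework for computing approximating functions in active subspaces. The approach reduces the dimension of the problem by integrating the high-dimensional function of interest f over the inactive directions with a suitable conditional density. Extending the results of Constantine et al. (2014, 2016), the paper develops a fully probabilistic framework and supports it with a simple numerical example.

Details
Comments
21 pages, 2 figures
AI Chinese abstract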

本文开发了一个全面的概率框架,用于在主动子空间中计算近似函数。Constantine等人(2014)提出了主动子空间方法,旨在将计算问题的维度降低。该方法可以视为通过低维函数近似高维函数f的一种尝试。为此,一种常见方法是在非主导方向上对f进行积分,使用合适的条件密度函数。在实践中,这可以通过有限的蒙特卡洛求和来实现,使得在每个固定的主动子空间输入下,所得到的近似函数在非主导变量上是随机的,并且其期望值,即低维函数在主动变量上的概率测度加权积分,也得到考虑。在此基础上,我们扩展了Constantine等人(2014, 2016)的结果,提出了一个完整的概率框架。结果通过一个简单的数值例子进行了支持。

English abstract

This paper develops a comprehensive probabilistic setup to compute approximating functions in active subspaces. Constantine et al. proposed the active subspace method in (Constantine et al., 2014) to reduce the dimension of computational problems. It can be seen as an attempt to approximate a high-dimensional function of interest $f$ by a low-dimensional one. To do this, a common approach is to integrate $f$ over the inactive, i.e. non-dominant, directions with a suitable conditional density function. In practice, this can be done with a finite Monte Carlo sum, making not only the resulting approximation random in the inactive variable for each fixed input from the active subspace, but also its expectation, i.e. the integral of the low-dimensional function weighted with a probability measure on the active variable. In this regard we develop a fully probabilistic framework extending results from (Constantine et al., 2014, 2016). The results are supported by a simple numerical example.
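As a rough illustration of the active subspace idea described in this abstract, the sketch below estimates the matrix of averaged gradient outer products from Monte Carlo samples and takes its leading eigenvectors as the active directions. The test function `f(x) = (w·x)^2`, the helper name `active_subspace`, and the sample size are assumptions made for this example only; the paper's conditional-expectation approximation and its probabilistic analysis are not reproduced here.

```python
import numpy as np

def active_subspace(grad_f, samples, k):
    """Return eigenvalues (descending) and the k leading eigenvectors of the
    Monte Carlo estimate of C = E[grad f(x) grad f(x)^T]."""
    G = np.array([grad_f(x) for x in samples])   # one gradient per row
    C = G.T @ G / len(samples)                   # averaged outer products
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order[:k]]

# Hypothetical ridge function f(x) = (w . x)^2, which varies only along w.
w = np.array([3.0, 4.0, 0.0]) / 5.0
grad_f = lambda x: 2.0 * (w @ x) * w

rng = np.random.default_rng(0)
samples = rng.standard_normal((500, 3))
eigvals, W1 = active_subspace(grad_f, samples, k=1)
```

For this ridge function every gradient is a multiple of `w`, so the single active direction in `W1` should align with `w` up to sign and the remaining eigenvalues should be numerically zero.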

1708.01945 2026-06-04 stat.ML cs.LG cs.NA math.NA

A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication

一种用于随机矩阵乘法误差估计的自助法

Miles E. Lopes, Shusen Wang, Michael W. Mahoney

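AI summary: This paper proposes a bootstrap method for directly estimating the accuracy of randomized matrix multiplication (sketching), treated as a prototype setting for more general problems. The method does not substantially increase the cost of standard sketching methods, which is made possible by an extrapolation technique, and both theoretical and empirical results are provided to demonstrate its effectiveness.

Details
Journal ref
Journal of Machine Learning Research, 20(39): 1-40, 2019
AI Chinese abstract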

近年来,随机方法作为解决大规模问题的一般途径,在数值线性代数中受到越来越多的关注。通常,这些方法的核心成分是某种形式的随机降维,它加速了计算,但也引入了随机近似误差。因此,降维步骤体现了成本与精度之间的权衡。然而,成本与精度之间的精确数值关系通常未知,因此用户可能难以准确知道(1)给定解的精度如何,或(2)达到给定精度水平需要多少计算。在本文中,我们将随机矩阵乘法(草图)作为处理这些一般问题的原型设置。作为解决方案,我们开发了一种自助方法,用于直接估计作为降维后维度之函数的精度(而不是推导以降维后维度表示的最坏情况精度界)。从计算角度来看,所提出的方法不会显著增加标准草图方法的成本,这得益于一种“外推”技术。此外,我们提供了理论和实证结果,以证明所提出方法的有效性。

English abstract

In recent years, randomized methods for numerical linear algebra have received growing interest as a general approach to large-scale problems. Typically, the essential ingredient of these methods is some form of randomized dimension reduction, which accelerates computations, but also creates random approximation error. In this way, the dimension reduction step encodes a tradeoff between cost and accuracy. However, the exact numerical relationship between cost and accuracy is typically unknown, and consequently, it may be difficult for the user to precisely know (1) how accurate a given solution is, or (2) how much computation is needed to achieve a given level of accuracy. In the current paper, we study randomized matrix multiplication (sketching) as a prototype setting for addressing these general problems. As a solution, we develop a bootstrap method for \emph{directly estimating} the accuracy as a function of the reduced dimension (as opposed to deriving worst-case bounds on the accuracy in terms of the reduced dimension). From a computational standpoint, the proposed method does not substantially increase the cost of standard sketching methods, and this is made possible by an "extrapolation" technique. In addition, we provide both theoretical and empirical results to demonstrate the effectiveness of the proposed method.
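The idea of estimating sketching error by resampling can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: it uses uniform row sampling as the sketch, measures error in the entrywise maximum norm, and omits the extrapolation step mentioned in the abstract. The function names `sketch_rows` and `bootstrap_error_quantile` are hypothetical.

```python
import numpy as np

def sketch_rows(A, B, t, rng):
    """Uniformly sample t rows (with replacement) from A and B and rescale,
    so that SA.T @ SB is an unbiased estimate of A.T @ B."""
    n = A.shape[0]
    idx = rng.integers(0, n, size=t)
    scale = np.sqrt(n / t)
    return scale * A[idx], scale * B[idx]

def bootstrap_error_quantile(SA, SB, num_boot=200, q=0.9, rng=None):
    """Resample the sketched rows and return the q-quantile of the
    max-norm deviation from the sketched product."""
    if rng is None:
        rng = np.random.default_rng()
    t = SA.shape[0]
    base = SA.T @ SB
    errs = np.empty(num_boot)
    for b in range(num_boot):
        idx = rng.integers(0, t, size=t)          # bootstrap resample of rows
        errs[b] = np.max(np.abs(SA[idx].T @ SB[idx] - base))
    return np.quantile(errs, q)

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 5))
B = rng.standard_normal((2000, 4))
SA, SB = sketch_rows(A, B, t=200, rng=rng)
estimated_error = bootstrap_error_quantile(SA, SB, rng=rng)
actual_error = np.max(np.abs(SA.T @ SB - A.T @ B))   # needs the full product
```

The bootstrap estimate uses only the sketched matrices, whereas `actual_error` requires the full product and is computed here only to allow a comparison.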

1904.01855 2026-06-04 math.OC cs.LG cs.SY eess.SY stat.ML

A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality

随机镜像下降的随机解释:风险敏感最优性

Navid Azizan, Babak Hassibi

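AI summary: This paper shows that stochastic mirror descent (SMD) is a risk-sensitive optimal estimator when the unknown weight vector and additive noise are non-Gaussian and belong to the exponential family, and it introduces a modified version called symmetric SMD (SSMD).

Details
AI Chinese abstract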

随机镜像下降(SMD)是一种相对较新的算法家族,最近在优化、机器学习和控制领域得到了广泛应用。它可以被视为经典随机梯度算法(SGD)的推广,其中权重向量的更新不是沿着随机梯度的负方向进行,而是在一个由梯度的(严格凸)势函数定义的“镜像域”中进行。这种势函数及其产生的镜像域相比SGD提供了更大的算法灵活性。尽管许多SMD的性质已经在文献中得到研究,但本文提出了SMD的一个新解释,即当未知权重向量和加性噪声非高斯且属于指数分布族时,SMD是一个风险敏感最优估计器。分析还建议了SMD的一种改进版本,称为对称SMD(SSMD)。证明依赖于Bregman散度的一些简单性质,使我们能够将结果从二次函数和高斯分布扩展到某些凸函数和指数分布族,方式较为流畅。

English abstract

Stochastic mirror descent (SMD) is a fairly new family of algorithms that has recently found a wide range of applications in optimization, machine learning, and control. It can be considered a generalization of the classical stochastic gradient algorithm (SGD), where instead of updating the weight vector along the negative direction of the stochastic gradient, the update is performed in a "mirror domain" defined by the gradient of a (strictly convex) potential function. This potential function, and the mirror domain it yields, provides considerable flexibility in the algorithm compared to SGD. While many properties of SMD have already been obtained in the literature, in this paper we exhibit a new interpretation of SMD, namely that it is a risk-sensitive optimal estimator when the unknown weight vector and additive noise are non-Gaussian and belong to the exponential family of distributions. The analysis also suggests a modified version of SMD, which we refer to as symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman divergence, which allow us to extend results from quadratics and Gaussians to certain convex functions and exponential families in a rather seamless way.
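The mirror-domain update described in this abstract can be illustrated with a minimal sketch. It assumes a squared-error loss on a single sample and the potential ψ(w) = ‖w‖_q^q / q, whose gradient acts coordinate-wise; the potential, step size, and data below are choices made for this example and are not taken from the paper. With q = 2 the update reduces to ordinary SGD.

```python
import numpy as np

def mirror_map(w, q):
    """Gradient of the potential psi(w) = ||w||_q^q / q (coordinate-wise)."""
    return np.sign(w) * np.abs(w) ** (q - 1)

def inverse_mirror_map(z, q):
    """Inverse of mirror_map, mapping the mirror domain back to weights."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd_step(w, x, y, eta, q):
    """One SMD step for the loss 0.5 * (x . w - y)^2 on a single sample."""
    grad = (x @ w - y) * x
    z = mirror_map(w, q) - eta * grad      # gradient step in the mirror domain
    return inverse_mirror_map(z, q)

# Hypothetical noiseless linear regression data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.full(3, 0.1)
for epoch in range(200):
    for i in rng.permutation(200):
        w = smd_step(w, X[i], y[i], eta=0.02, q=3)
```

Changing `q` changes the geometry of the update while leaving the stochastic gradient itself untouched, which is the flexibility relative to SGD that the abstract refers to.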

1904.01214 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Enhancement of Energy-Based Swing-Up Controller via Entropy Search

通过熵搜索增强基于能量的摆动上控制器

Chang Sik Lee, Dong Eui Chang

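AI summary: This paper uses Entropy Search, a Bayesian optimization method, to improve an energy-based controller for swinging up a rotary inverted pendulum (Furuta pendulum); experiments show that the resulting controller outperforms a nominal controller under various initial conditions.

Details
Comments
6 pages, 2019 Asian Control Conference
AI Chinese abstract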

基于能量的方法为稳定机械系统提供了一种简单而强大的控制方案。然而,由于它不对控制器参数空间施加强约束,寻找适合最优控制器的参数值被认为是困难的。本文旨在通过应用称为熵搜索的贝叶斯优化方法,生成一个最优的基于能量的控制器,用于旋转倒立摆(也称为Furuta摆)的摆动控制。仿真和实验表明,与常规控制器相比,最优控制器在各种初始条件下表现出改进的性能。

English abstract

An energy-based approach for stabilizing a mechanical system offers a simple yet powerful control scheme. However, since it does not impose strong constraints on the parameter space of the controller, finding appropriate parameter values for an optimal controller is known to be hard. This paper aims to generate an optimal energy-based controller for swinging up a rotary inverted pendulum, also known as the Furuta pendulum, by applying the Bayesian optimization method called Entropy Search. Simulations and experiments show that the optimal controller has improved performance compared to a nominal controller for various initial conditions.

1904.00035 2026-06-04 cs.RO cs.LG cs.SY eess.SY stat.ML

Autonomous Highway Driving using Deep Reinforcement Learning

使用深度强化学习实现自动驾驶高速公路驾驶

Subramanya Nageshrao, Eric Tseng, Dimitar Filev

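AI summary: This paper proposes a reinforcement-learning-based method in which an autonomous vehicle learns to make decisions in complex, varying environments by interacting directly with simulated traffic, addressing the shortcomings of rule-based decision makers and of a priori cost functions optimized in real time, and improving learning efficiency and safety.

Details
AI Chinese abstract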

自动驾驶车辆的操作空间可以是多样的,并且可能显著变化。这可能导致设计阶段未预料到的场景。因此,基于规则的决策者选择动作可能并不理想。同样,设计一个先验成本函数然后在实时中求解最优控制问题可能也不够有效。为了应对这些问题并避免在遇到意外场景时出现异常行为,我们提出了一种基于强化学习(RL)的方法,其中自动驾驶车辆通过与模拟交通直接交互来学习决策。决策者由深度神经网络实现,根据给定的系统状态提供动作选择。在关键应用如驾驶中,没有明确安全概念的RL代理可能无法收敛,或者需要极大量的样本才能找到可靠的策略。为了更好地解决这个问题,本文将强化学习与额外的短时间安全检查(SC)相结合。在关键场景中,安全检查还将为代理提供替代的安全动作,如果存在的话。这导致了两个新的贡献。首先,它扩展了可能导致不良“接近事件”或“碰撞”的状态。其次,安全检查的加入可以提供一个安全且稳定的训练环境。这显著提高了学习效率,同时不抑制有意义的探索,以确保安全和最优的学习行为。我们展示了所开发算法在高速公路驾驶场景中的性能,其中训练好的自动驾驶车辆在高速公路环境下遇到不同交通密度的情况。

English abstract

The operational space of an autonomous vehicle (AV) can be diverse and vary significantly. This may lead to scenarios that were not postulated in the design phase. Because of this, formulating a rule-based decision maker for selecting maneuvers may not be ideal. Similarly, it may not be effective to design an a priori cost function and then solve the optimal control problem in real time. To address these issues and to avoid peculiar behaviors when encountering unforeseen scenarios, we propose a reinforcement learning (RL) based method, where the ego car, i.e., an autonomous vehicle, learns to make decisions by directly interacting with simulated traffic. The decision maker for the AV is implemented as a deep neural network providing an action choice for a given system state. In a critical application such as driving, an RL agent without an explicit notion of safety may not converge, or it may need an extremely large number of samples before finding a reliable policy. To best address the issue, this paper combines reinforcement learning with an additional short-horizon safety check (SC). In a critical scenario, the safety check will also provide an alternate safe action to the agent, if one exists. This leads to two novel contributions. First, it generalizes the states that could lead to undesirable "near-misses" or "collisions". Second, inclusion of the safety check can provide a safe and stable training environment. This significantly enhances learning efficiency without inhibiting meaningful exploration to ensure safe and optimal learned behavior. We demonstrate the performance of the developed algorithm in a highway driving scenario where the trained AV encounters varying traffic density.

1711.07230 2026-06-04 eess.SY cs.SY math.OC stat.AP stat.ML

Optimism-Based Adaptive Regulation of Linear-Quadratic Systems

基于乐观的线性二次系统自适应调节

Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

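AI summary: This paper studies the exploration-exploitation trade-off in adaptive regulation of linear-quadratic systems. It establishes high-probability upper bounds on the worst-case regret of optimism-based adaptive policies, shows that the bounds are optimal up to logarithmic factors, and requires only mild assumptions: stabilizability of the system and a limit on the tail heaviness of the noise distribution.

Details
Comments
28 pages
AI Chinese abstract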

线性二次系统自适应调节的主要挑战在于探索与利用之间的权衡。自适应策略需要同时解决未知动态参数的估计(探索)和系统调节(利用)。为此,文献中采用了基于乐观的方法,通过偏倚识别以偏向于真实参数的乐观近似。已有大量渐近结果,但有限时间内的结果较少且存在重要限制。本文建立了基于乐观的自适应策略的最坏情况后悔结果。所呈现的高概率上界在对数因子范围内最优。本工作的非渐近分析仅需非常温和的假设:(i) 系统动态的可稳定性,以及 (ii) 限制噪声分布的尾重程度。为建立这些界,发展了某些新技巧,以全面处理具有重尾分布的依赖随机矩阵的概率行为。

English abstract

The main challenge for adaptive regulation of linear-quadratic systems is the trade-off between identification and control. An adaptive policy needs to address both the estimation of unknown dynamics parameters (exploration) and the regulation of the underlying system (exploitation). To this end, optimism-based methods, which bias the identification in favor of optimistic approximations of the true parameter, are employed in the literature. A number of asymptotic results have been established, but their finite-time counterparts are few and come with important restrictions. This study establishes results for the worst-case regret of optimism-based adaptive policies. The presented high-probability upper bounds are optimal up to logarithmic factors. The non-asymptotic analysis of this work requires only very mild assumptions: (i) stabilizability of the system's dynamics, and (ii) a limit on the degree of heaviness of the noise distribution. To establish such bounds, certain novel techniques are developed to comprehensively address the probabilistic behavior of dependent random matrices with heavy-tailed distributions.

1808.10788 2026-06-04 stat.ML cs.LG cs.NA math.NA

Data-driven discovery of PDEs in complex datasets

基于数据的复杂数据集中的PDE发现

Jens Berg, Kaj Nyström

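AI summary: This paper uses machine learning to discover hidden partial differential equations in complex data sets, shows how data transformations and feature selection help reveal the PDE governing a physical process, and validates the approach on a nonlinear second-order PDE and on simulations of the Swedish temperature distribution.

Details
AI Chinese abstract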

许多科学和工程中的过程可以用偏微分方程(PDEs)来描述。传统上,PDEs是通过考虑物理基本原理来推导感兴趣的物理量之间的关系。另一种方法是测量感兴趣的量并使用深度学习来逆向工程描述物理过程的PDEs。本文使用机器学习,特别是深度学习,来从测量数据中发现复杂数据集中的PDEs。我们包括来自已知模型问题的数据示例和来自气象站的实测数据。我们展示了输入数据的必要转换相当于发现的PDE中的坐标转换,并详细阐述了特征和模型选择。证明了非线性二次PDE的动力学可以被普通微分方程准确描述,该方程由我们的深度学习算法自动发现。更有趣的是,我们在瑞典温度分布更复杂的模拟中也展示了类似的结果。

English abstract

Many processes in science and engineering can be described by partial differential equations (PDEs). Traditionally, PDEs are derived by considering first principles of physics to obtain the relations between the physical quantities of interest. A different approach is to measure the quantities of interest and use deep learning to reverse engineer the PDEs that describe the physical process. In this paper we use machine learning, and deep learning in particular, to discover PDEs hidden in complex data sets from measurement data. We include examples of data from a known model problem, and real data from weather station measurements. We show how necessary transformations of the input data amount to coordinate transformations in the discovered PDE, and we elaborate on feature and model selection. It is shown that the dynamics of a non-linear, second-order PDE can be accurately described by an ordinary differential equation which is automatically discovered by our deep learning algorithm. Even more interestingly, we show that similar results apply in the context of more complex simulations of the Swedish temperature distribution.

1808.03215 2026-06-04 stat.CO cs.NA math.NA math.OC

Scalable Gaussian Process Computations Using Hierarchical Matrices

使用分层矩阵实现可扩展的高斯过程计算

Christopher J. Geoga, Mihai Anitescu, Michael L. Stein

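AI summary: This paper presents a kernel-independent method that applies hierarchical matrices to maximum likelihood estimation for Gaussian processes, computing the gradient, Hessian, and expected Fisher information matrix efficiently in quasilinear O(n log² n) complexity.

Details
AI Chinese abstract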

我们提出了一种不依赖核的方法,将分层矩阵应用于高斯过程的最大似然估计问题。所提出的近似方法提供了自然且可扩展的随机估计器,用于其梯度、Hessian以及期望Fisher信息矩阵,其计算复杂度为准线性O(n log²n),适用于广泛模型。为此,我们(i)选择特定的分层近似方法来计算协方差矩阵的精确导数,并(ii)使用一种稳定的Hutchinson随机迹估计器。由于观测和期望信息矩阵均可在准线性复杂度下计算,因此协方差矩阵对于最大似然估计(MLE)也可高效估计。在讨论相关数学后,我们展示了该方法的可扩展性,讨论了其实现细节,并验证了基于逆Fisher信息矩阵所得的MLE和置信区间与精确似然所得结果一致。

English abstract

We present a kernel-independent method that applies hierarchical matrices to the problem of maximum likelihood estimation for Gaussian processes. The proposed approximation provides natural and scalable stochastic estimators for its gradient and Hessian, as well as the expected Fisher information matrix, that are computable in quasilinear $O(n \log^2 n)$ complexity for a large range of models. To accomplish this, we (i) choose a specific hierarchical approximation for covariance matrices that enables the computation of their exact derivatives and (ii) use a stabilized form of the Hutchinson stochastic trace estimator. Since both the observed and expected information matrices can be computed in quasilinear complexity, covariance matrices for MLEs can also be estimated efficiently. After discussing the associated mathematics, we demonstrate the scalability of the method, discuss details of its implementation, and validate that the resulting MLEs and confidence intervals based on the inverse Fisher information matrix faithfully approach those obtained by the exact likelihood.
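For readers unfamiliar with the Hutchinson stochastic trace estimator mentioned in this abstract, a minimal sketch of the plain estimator follows. Note that this is the basic version with Rademacher probe vectors, not the stabilized form the authors use, and it is applied to a small dense test matrix rather than a hierarchical approximation; the function name and test matrix are assumptions for illustration.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_probes=200, seed=0):
    """Estimate tr(A) using only matrix-vector products v -> A @ v."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        total += z @ matvec(z)                # E[z^T A z] = tr(A)
    return total / num_probes

# Hypothetical symmetric positive definite test matrix.
rng = np.random.default_rng(1)
B = rng.standard_normal((50, 50))
A = B @ B.T / 50 + np.eye(50)

estimate = hutchinson_trace(lambda v: A @ v, n=50)
exact = np.trace(A)
```

The estimator needs only products with the matrix, which is why it pairs naturally with structured approximations that offer fast matrix-vector multiplication.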

1903.09136 2026-06-04 stat.ML cs.LG cs.SY eess.SP eess.SY

On Approximate Nonlinear Gaussian Message Passing On Factor Graphs

关于因子图上的近似非线性高斯信息传递

Eike Petersen, Christian Hoffmann, Philipp Rostalski

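AI summary: This paper proposes approximate Gaussian message passing rules on factor graphs for deterministic nonlinear transformation nodes, based on numerical quadrature and a Rauch-Tung-Striebel-type approximation, providing a new algorithmic framework for solving nonlinear problems.

Details
Journal ref
2018 IEEE Statistical Signal Processing Workshop (SSP)
AI Chinese abstract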

因子图近年来因其作为信号处理、估计和控制算法表示和构建的统一框架而受到越来越多的关注。因子图工具包中似乎未充分探索的一个能力是利用表格信息传递规则处理确定性非线性变换,如非线性滤波和平滑问题中的变换。在本贡献中,我们基于前向传递的数值求积过程和后向传递的Rauch-Tung-Striebel型近似方法,为满足马尔可夫性质的任意因子图中的确定性非线性变换节点提供了通用的前向(滤波)和后向(平滑)近似高斯信息传递规则。这些信息传递规则可用于推导许多使用因子图求解非线性问题的算法,如基于所提出信息传递规则的非线性修改Bryson-Frazier(MBF)平滑器的提出。

English abstract

Factor graphs have recently gained increasing attention as a unified framework for representing and constructing algorithms for signal processing, estimation, and control. One capability that does not seem to be well explored within the factor graph tool kit is the ability to handle deterministic nonlinear transformations, such as those occurring in nonlinear filtering and smoothing problems, using tabulated message passing rules. In this contribution, we provide general forward (filtering) and backward (smoothing) approximate Gaussian message passing rules for deterministic nonlinear transformation nodes in arbitrary factor graphs fulfilling a Markov property, based on numerical quadrature procedures for the forward pass and a Rauch-Tung-Striebel-type approximation of the backward pass. These message passing rules can be employed for deriving many algorithms for solving nonlinear problems using factor graphs, as is illustrated by the proposition of a nonlinear modified Bryson-Frazier (MBF) smoother based on the presented message passing rules.

1903.09122 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Finite Sample Analysis of Stochastic System Identification

随机系统辨识的有限样本分析

Anastasios Tsiamis, George J. Pappas

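AI summary: Using modern tools from machine learning and statistics, this paper studies the finite sample complexity of stochastic system identification. Based on a subspace identification algorithm and a finite number of output samples, it gives non-asymptotic high-probability upper bounds on the system parameter estimation errors, showing that with high probability the errors decrease at a rate of 1/√N.

Details
Comments
Under review
AI Chinese abstract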

在本文中,我们利用现代机器学习和统计学工具,分析了随机系统辨识的有限样本复杂性。一个未知的离散时间线性系统在高斯噪声下随时间演变,没有外部输入。目标是在给定有限时间跨度N内的单条输出测量轨迹的情况下,恢复系统参数以及卡尔曼滤波增益。基于子空间辨识算法和有限数量的N个输出样本,我们提供了系统参数估计误差的非渐近高概率上界。我们的分析利用了最近的随机矩阵理论、自归一化鞅和SVD鲁棒性结果,以证明在高概率下估计误差以1/√N的速度减小。我们的非渐近界不仅与经典渐近结果一致,而且即使在系统处于临界稳定的情况下也有效。

English abstract

In this paper, we analyze the finite sample complexity of stochastic system identification using modern tools from machine learning and statistics. An unknown discrete-time linear system evolves over time under Gaussian noise without external inputs. The objective is to recover the system parameters as well as the Kalman filter gain, given a single trajectory of output measurements over a finite horizon of length $N$. Based on a subspace identification algorithm and a finite number of $N$ output samples, we provide non-asymptotic high-probability upper bounds for the system parameter estimation errors. Our analysis uses recent results from random matrix theory, self-normalized martingales and SVD robustness, in order to show that with high probability the estimation errors decrease with a rate of $1/\sqrt{N}$. Our non-asymptotic bounds not only agree with classical asymptotic results, but are also valid even when the system is marginally stable.

1903.08990 2026-06-04 stat.AP cs.SY eess.SY

Optimal Intermittent Measurements for Tumor Tracking in X-ray Guided Radiotherapy

X射线引导放疗中肿瘤跟踪的最优间歇测量

Antoine Aspeel, Damien Dasnoy, Raphaël M. Jungers, Benoît Macq

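AI summary: This paper proposes an optimal intermittent Kalman predictor, with measurement times chosen by a genetic algorithm, to optimize tumor tracking under a limited measurement budget and reduce prediction error.

Details
Journal ref
Proc. SPIE 10951, Medical Imaging 2019: Image-Guided Procedures, Robotic Interventions, and Modeling, 109510C (8 March 2019)
Comments
8 pages, 4 figures, 1 table
AI Chinese abstract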

在放疗中,肿瘤跟踪是一个具有挑战性的任务,有助于更精确的剂量递送。一种常见做法是在治疗过程中实时获取X射线图像,用于估计肿瘤位置。这些信息用于预测肿瘤的短期轨迹。卡尔曼预测是该任务的经典方法。X射线采集的主要缺点是会照射患者,包括健康组织。在经典卡尔曼框架中,X射线测量是定期进行的,即以恒定速率进行。在本文中,我们提出了一种新方法,放宽这一约束,以便在最需要的时候进行测量。我们的目标是在给定的测量预算下优化跟踪过程。这一想法自然引导出一个最优的间歇卡尔曼预测器,其中测量时间被选择以最小化整个疗程内的均方预测误差。这个问题可以在呼吸模型被识别后直接解决,并且最优采样时间可以一次计算出来。这些最优测量时间是通过使用遗传算法解决组合优化问题得到的。我们创建了一个在一名患者上验证的测试基准。这种新的预测方法与常规卡尔曼预测器相比,根均方位置估计误差相对改进了9.8%。

English abstract

In radiation therapy, tumor tracking is a challenging task that enables more accurate dose delivery. One practice is to acquire X-ray images in real time during treatment and use them to estimate the tumor location. This information is used to predict the near-future tumor trajectory. Kalman prediction is a classical approach for this task. The main drawback of X-ray acquisition is that it irradiates the patient, including healthy tissue. In the classical Kalman framework, X-ray measurements are taken regularly, i.e. at a constant rate. In this paper, we propose a new approach which relaxes this constraint in order to take measurements when they are most useful. Our aim is to optimize the tracking process for a given budget of measurements. This idea naturally leads to an optimal intermittent Kalman predictor for which measurement times are selected to minimize the mean squared prediction error over the complete fraction. This optimization problem can be solved directly once the respiratory model has been identified, and the optimal sampling times can be computed at once. These optimal measurement times are obtained by solving a combinatorial optimization problem using a genetic algorithm. We created a test benchmark on trajectories validated on one patient. This new prediction method is compared to the regular Kalman predictor, and a relative improvement of 9.8% is observed in the root mean square position estimation error.
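The measurement-scheduling idea in this abstract can be illustrated on a toy problem. The sketch below assumes a scalar linear model with hypothetical parameters `a`, `q`, `r`, and `p0`, and it replaces the genetic algorithm with exhaustive search over all schedules, which is only feasible because the horizon is tiny. Because the error variance of a Kalman filter does not depend on the measured values, the cost of each schedule can be evaluated offline.

```python
from itertools import combinations

def schedule_cost(times, horizon, a=1.0, q=0.1, r=0.05, p0=1.0):
    """Sum of one-step prediction error variances for a scalar Kalman filter
    that receives a measurement only at the steps listed in `times`."""
    times = set(times)
    p, cost = p0, 0.0
    for k in range(horizon):
        if k in times:
            p = p * r / (p + r)    # measurement update shrinks the variance
        p = a * a * p + q          # time update (prediction to step k + 1)
        cost += p
    return cost

def best_schedule(horizon, budget, **model):
    """Exhaustive search over all schedules with exactly `budget` measurements."""
    return min(combinations(range(horizon), budget),
               key=lambda s: schedule_cost(s, horizon, **model))

regular = (0, 4, 8)                           # constant-rate baseline
optimal = best_schedule(horizon=12, budget=3)
improvement = schedule_cost(regular, 12) - schedule_cost(optimal, 12)
```

For realistic horizons the number of schedules grows combinatorially, which is the motivation for a heuristic search such as the genetic algorithm used in the paper.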

1903.08792 2026-06-04 cs.LG cs.SY eess.SY stat.ML

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

通过障碍函数实现端到端安全强化学习用于安全关键的连续控制任务

Richard Cheng, Gabor Orosz, Richard M. Murray, Joel W. Burdick

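AI summary: This paper proposes a controller architecture that combines a model-free reinforcement learning controller, a controller based on control barrier functions, and online learning of the unknown system dynamics to ensure safety during learning. The system dynamics are modeled with Gaussian processes, and the approach shows greater sample efficiency and safety on an inverted pendulum and on autonomous car-following with wireless vehicle-to-vehicle communication.

Details
Comments
Published in AAAI 2019
AI Chinese abstract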

强化学习(RL)算法在模拟应用之外取得有限成功,主要原因是学习过程中缺乏安全性保证。现实世界系统在最优控制器学习之前可能无法正常运行或崩溃。为了解决这个问题,我们提出了一种控制器架构,结合(1)模型无关的RL控制器、(2)利用控制障碍函数(CBFs)的模型基于控制器以及(3)在线学习未知系统动力学,以确保学习过程中的安全性。我们的通用框架利用RL算法的成功来学习高性能控制器,而基于CBF的控制器通过约束可探索策略集来保证安全并引导学习过程。我们利用高斯过程(GPs)来建模系统动力学及其不确定性。我们的新型控制器合成算法RL-CBF在学习过程中以高概率保证安全性,无论使用何种RL算法,并展示了更高的策略探索效率。我们在(1)倒立摆控制和(2)具有无线车辆到车辆通信的自动驾驶跟车任务中测试了我们的算法,并展示了我们的算法在学习过程中比其他最先进的算法具有更高的样本效率,并在整个学习过程中保持安全。

English abstract

Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.

1903.05196 2026-06-04 cs.LG cs.SY eess.SY stat.ML

A Review of Reinforcement Learning for Autonomous Building Energy Management

自主建筑能源管理中强化学习的综述

Karl Mason, Santiago Grijalva

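AI summary: This paper reviews the application of reinforcement learning to autonomous building energy management systems, summarizes the related literature, and discusses future research directions and challenges.

Details
Comments
17 pages, 3 figures
AI Chinese abstract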

近年来,建筑能源管理领域受到了广泛关注。该领域致力于结合传感器技术、通信技术和先进控制算法,以优化能源利用。强化学习是用于控制问题中最突出的机器学习算法之一,并已在建筑能源管理领域取得了许多成功应用。本文对与强化学习应用于开发自主建筑能源管理系统相关的文献进行了全面回顾。还概述了强化学习未来的研究方向和挑战。

English abstract

The area of building energy management has received a significant amount of interest in recent years. This area is concerned with combining advancements in sensor technologies, communications and advanced control algorithms to optimize energy utilization. Reinforcement learning is one of the most prominent machine learning algorithms used for control problems and has had many successful applications in the area of building energy management. This research gives a comprehensive review of the literature relating to the application of reinforcement learning to developing autonomous building energy management systems. The main direction for future research and challenges in reinforcement learning are also outlined.

1903.01032 2026-06-04 cs.LG cs.SY eess.SY stat.ML

A Fundamental Performance Limitation for Adversarial Classification

对抗分类中的基本性能限制

Abed AlRahman Al Makdah, Vaibhav Katewa, Fabio Pasqualetti

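AI summary: This paper studies a fundamental performance limitation in adversarial classification. It shows that, in optimizing their accuracy, binary classification algorithms inevitably become more sensitive to adversarial manipulation of the data, and that the fundamental tradeoff curve between accuracy and sensitivity depends solely on the statistics of the data and cannot be improved by tuning the algorithm.

Details
AI Chinese abstract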

尽管机器学习算法被广泛用于解决技术、经济和社会相关的问题,但对这些数据驱动算法性能的可证明保证却严重不足,尤其是在数据来自不可靠来源并通过未保护和易受攻击的通道传输时。在本文中,我们采取了重要的步骤来弥合这一差距,并正式证明,在试图优化其准确率时,二分类算法——包括基于机器学习技术的算法——不可避免地会变得更加敏感于数据的对抗操纵。进一步地,对于具有相同复杂度(即分类边界数量)的给定算法类,准确率与敏感度之间的根本权衡曲线仅取决于数据的统计特性,无法通过调整算法来改进。

English abstract

Despite the widespread use of machine learning algorithms to solve problems of technological, economic, and social relevance, provable guarantees on the performance of these data-driven algorithms are critically lacking, especially when the data originates from unreliable sources and is transmitted over unprotected and easily accessible channels. In this paper we take an important step to bridge this gap and formally show that, in a quest to optimize their accuracy, binary classification algorithms -- including those based on machine-learning techniques -- inevitably become more sensitive to adversarial manipulation of the data. Further, for a given class of algorithms with the same complexity (i.e., number of classification boundaries), the fundamental tradeoff curve between accuracy and sensitivity depends solely on the statistics of the data, and cannot be improved by tuning the algorithm.

1806.06173 2026-06-04 math.OC cs.CC cs.DS cs.SY eess.SY stat.ML

On the Complexity of Detecting Convexity over a Box

关于在盒子上检测凸性的复杂性

Amir Ali Ahmadi, Georgina Hall

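AI summary: This paper studies the complexity of testing convexity of polynomials over a box and proves that the problem is strongly NP-hard even for polynomials of degree three, which in some sense explains why convexity detection in nonlinear optimization solvers is limited to quadratic functions or functions with special structure.

Details
AI Chinese abstract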

最近已证明检测四次多项式全局凸性的问题是强NP难的,回答了N.Z. Shor提出的一个开放问题。这一结果在多项式次数方面是最小的当关注全局凸性时。然而,在许多应用中,人们只关心在紧致区域(最常见的是盒子,即超矩形)上检测凸性。在本文中,我们证明了这一问题也是强NP难的,事实上对于多项式次数仅为三的情况。这一结果在多项式次数方面是最小的,并在某种意义上解释了为什么非线性优化求解器中的凸性检测仅限于二次函数或具有特殊结构的函数。作为副产品,我们的证明还显示了测试区间族中所有矩阵是否为半正定的问题也是强NP难的。这个问题此前由Nemirovski证明为弱NP难,但在鲁棒控制理论中具有独立的兴趣。

English abstract

It has recently been shown that the problem of testing global convexity of polynomials of degree four is strongly NP-hard, answering an open question of N.Z. Shor. This result is minimal in the degree of the polynomial when global convexity is of concern. In a number of applications however, one is interested in testing convexity only over a compact region, most commonly a box (i.e., hyper-rectangle). In this paper, we show that this problem is also strongly NP-hard, in fact for polynomials of degree as low as three. This result is minimal in the degree of the polynomial and in some sense justifies why convexity detection in nonlinear optimization solvers is limited to quadratic functions or functions with special structure. As a byproduct, our proof shows that the problem of testing whether all matrices in an interval family are positive semidefinite is strongly NP-hard. This problem, which was previously shown to be (weakly) NP-hard by Nemirovski, is of independent interest in the theory of robust control.

1705.03181 2026-06-04 math.NA cs.NA math.ST stat.TH

Asymptotic Normality of Extensible Grid Sampling

可扩展网格采样的渐近正态性

Zhijian He, Lingjiong Zhu

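AI summary: This paper studies the asymptotic normality of estimates based on Hilbert's space-filling curve (HSFC) when a scrambled van der Corput sequence is used as input. It proves that the estimate is asymptotically normal for functions in $C^1([0,1]^d)$ (excluding constant functions) and for discontinuous functions under certain conditions, and it gives lower bounds on the variance of the HSFC-based estimate.

Details
Journal ref
Statistics and Computing, 2019, Volume 29, Issue 1, pp 53-65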
English abstract

Recently, He and Owen (2016) proposed the use of Hilbert's space filling curve (HSFC) in numerical integration as a way of reducing the dimension from $d>1$ to $d=1$. This paper studies the asymptotic normality of the HSFC-based estimate when using a scrambled van der Corput sequence as input. We show that the estimate has an asymptotic normal distribution for functions in $C^1([0,1]^d)$, excluding the trivial case of constant functions. The asymptotic normality also holds for discontinuous functions under mild conditions. It was previously known only that scrambled $(0,m,d)$-net quadratures enjoy the asymptotic normality for smooth enough functions, whose mixed partial gradients satisfy a Hölder condition. As a by-product, we find lower bounds for the variance of the HSFC-based estimate. Particularly, for nontrivial functions in $C^1([0,1]^d)$, the lower bound is of order $n^{-1-2/d}$, which matches the rate of the upper bound established in He and Owen (2016).
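The van der Corput sequence that serves as input in this abstract is easy to generate. The sketch below produces the plain (unscrambled) base-2 sequence via the radical inverse; the scrambling used in the paper, which randomizes the digits, and the mapping through the Hilbert curve are both omitted. The function name is chosen for this example.

```python
def van_der_corput(i, base=2):
    """Radical inverse of the non-negative integer i: reflect its base-b
    digits about the radix point to obtain a point in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)   # peel off the least significant digit
        denom *= base
        x += digit / denom
    return x

points = [van_der_corput(i) for i in range(8)]
# points == [0.0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```

Each new block of points fills the gaps left by the previous ones, which is what makes the sequence extensible: the first n points are well spread over the unit interval for every n.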

1903.03763 2026-06-04 eess.SY cs.LG cs.SY math.OC stat.ML

A tractable ellipsoidal approximation for voltage regulation problems

电压调节问题中的可处理椭球近似

Pan Li, Baihong Jin, Ruoxuan Xiong, Dai Wang, Alberto Sangiovanni-Vincentelli, Baosen Zhang

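AI summary: This paper proposes a machine-learning-based approach to chance-constrained optimization in voltage regulation problems in power system operation. It approximates the feasible region of uncertainty with an ellipsoid and proposes a learning model similar to support vector machines together with an efficient sampling algorithm.

Details
Comments
Accepted by ACC 2019 http://acc2019.a2c2.org/
AI Chinese abstract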

我们提出了一种机器学习方法来解决电压调节问题中的机会约束优化问题。我们的方法新颖之处在于用椭球近似不确定性可行区域。我们使用类似于支持向量机(SVM)的学习模型来提出这个问题,并提出了一种高效的采样算法来训练模型。我们使用标准的IEEE配电测试馈线在电压调节问题上展示了我们的方法。

English abstract

We present a machine learning approach to the solution of chance constrained optimizations in the context of voltage regulation problems in power system operation. The novelty of our approach resides in approximating the feasible region of uncertainty with an ellipsoid. We formulate this problem using a learning model similar to Support Vector Machines (SVM) and propose a sampling algorithm that efficiently trains the model. We demonstrate our approach on a voltage regulation problem using standard IEEE distribution test feeders.

1903.02250 2026-06-04 eess.SY cs.SY math.OC stat.ML

Nonlinear input design as optimal control of a Hamiltonian system

非线性输入设计作为哈密顿系统的最优控制

Jack Umenberger, Thomas B. Schön

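AI summary: This paper proposes an input design method for parametric probabilistic models. By representing the posterior distribution as trajectories of a Hamiltonian system, it transforms the input design problem into an optimal control problem, for use in parameter estimation in settings such as nonlinear dynamical systems.

Details
AI Chinese abstract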

我们提出了一种针对一般参数概率模型的输入设计方法,包括具有过程噪声的非线性动力系统。该方法的目标是选择输入,使得参数后验分布集中于参数的真实值附近;然而,后验分布的精确计算是不可行的。通过将(后验分布的样本)表示为特定哈密顿系统轨迹,我们将输入设计任务转化为一个最优控制问题。该方法通过数值示例进行了说明,包括MRI脉冲序列设计。

English abstract

We propose an input design method for a general class of parametric probabilistic models, including nonlinear dynamical systems with process noise. The goal of the procedure is to select inputs such that the parameter posterior distribution concentrates about the true value of the parameters; however, exact computation of the posterior is intractable. By representing (samples from) the posterior as trajectories from a certain Hamiltonian system, we transform the input design task into an optimal control problem. The method is illustrated via numerical examples, including MRI pulse sequence design.

1810.06749 2026-06-04 cs.LG cs.NA math.NA stat.ML

Optimally rotated coordinate systems for adaptive least-squares regression on sparse grids

最优旋转坐标系用于稀疏网格上的自适应最小二乘回归

Bastian Bohn, Michael Griebel, Jens Oettershagen

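AI summary: For high-dimensional data sets, this paper proposes a preprocessing approach that determines an optimized, problem-dependent coordinate system to reduce the effective dimensionality of the data, improving the performance of adaptive sparse grid least-squares regression algorithms.

Details
AI Chinese abstract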

对于低维数据集具有大量数据点时,标准核方法通常不再适用于回归。除了简单的线性模型或复杂的启发式深度学习模型外,基于网格的更大(核)模型类的离散化方法导致的算法自然地线性缩放数据点数量。在中等维或高维回归任务中,这些基于网格的离散化方法受到维度诅咒的影响。在此背景下,稀疏网格方法已证明可以很大程度上克服这一问题。在这种情况下,能够检测并利用名义上高维数据的低有效维数的空间和维度自适应稀疏网格特别成功。然而,它们仍然依赖于轴对齐的结构,并在具有主要偏斜和旋转坐标的数据中表现出问题。在本文中,我们提出了一种预处理方法,用于这些自适应稀疏网格算法,以确定一个优化的、问题相关的坐标系,从而在ANOVA意义上降低给定数据集的有效维度。我们通过合成数据以及现实世界数据的数值示例,展示了自适应稀疏网格最小二乘算法如何从我们的预处理方法中受益。

English abstract

For low-dimensional data sets with a large number of data points, standard kernel methods are usually no longer feasible for regression. Besides simple linear models or involved heuristic deep learning models, grid-based discretizations of larger (kernel) model classes lead to algorithms that naturally scale linearly in the number of data points. For moderate-dimensional or high-dimensional regression tasks, these grid-based discretizations suffer from the curse of dimensionality. Here, sparse grid methods have proven to circumvent this problem to a large extent. In this context, space- and dimension-adaptive sparse grids, which can detect and exploit a given low effective dimensionality of nominally high-dimensional data, are particularly successful. They nevertheless rely on an axis-aligned structure of the solution and exhibit issues for data with predominantly skewed and rotated coordinates. In this paper we propose a preprocessing approach for these adaptive sparse grid algorithms that determines an optimized, problem-dependent coordinate system and, thus, reduces the effective dimensionality of a given data set in the ANOVA sense. We provide numerical examples on synthetic data as well as real-world data to show how an adaptive sparse grid least squares algorithm benefits from our preprocessing method.

1703.00734 2026-06-04 stat.ML cs.DC cs.LG cs.NA math.NA stat.ME

Distributed Bayesian Matrix Factorization with Limited Communication

分布式贝叶斯矩阵分解与有限通信

Xiangju Qin, Paul Blomstedt, Eemeli Leppäaho, Pekka Parviainen, Samuel Kaski

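AI summary: This paper proposes a distributed Bayesian matrix factorization method that hierarchically decomposes the joint posterior distribution and combines parallel computation with an efficient approximate implementation, improving efficiency on large-scale data while preserving predictive accuracy.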

详情
Journal ref
Machine Learning, 2019
Comments
28 pages, 8 figures. The paper is published in the Machine Learning journal. An implementation of the method is available in the SMURFF software on GitHub (bmfpp branch): https://github.com/ExaScience/smurff
AI中文摘要

贝叶斯矩阵分解(BMF)是一种强大的工具,用于生成低秩矩阵表示,并预测缺失值和提供置信区间。对大规模矩阵的后验推断进行扩展具有挑战性,需要将数据和计算分布到多个工人上,使通信成为主要的计算瓶颈。 embarrassingly parallel 推断可以通过在不同数据子集上使用完全独立的计算来消除通信需求,但会受到BMF解的固有不可识别性的影响。我们引入了联合后验分布的分层分解,将子推断耦合起来,允许在最多三个阶段中进行 embarrassingly parallel 计算。使用高效的近似实现,我们在真实和模拟数据上经验性地展示了改进。我们的分布式方法能够实现比完整后验快几乎一个数量级的速度提升,对预测准确性影响微小。我们的方法在准确性上优于最先进的 embarrassingly parallel MCMC 方法,并在结果上与其它可用的分布式和并行BMF实现具有竞争力。

英文摘要

Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices and for predicting missing values and providing confidence intervals. Scaling up the posterior inference for massive-scale matrices is challenging and requires distributing both data and computation over many workers, making communication the main computational bottleneck. Embarrassingly parallel inference would remove the communication needed, by using completely independent computations on different data subsets, but it suffers from the inherent unidentifiability of BMF solutions. We introduce a hierarchical decomposition of the joint posterior distribution, which couples the subset inferences, allowing for embarrassingly parallel computations in a sequence of at most three stages. Using an efficient approximate implementation, we show improvements empirically on both real and simulated data. Our distributed approach is able to achieve a speed-up of almost an order of magnitude over the full posterior, with a negligible effect on predictive accuracy. Our method outperforms state-of-the-art embarrassingly parallel MCMC methods in accuracy, and achieves results competitive to other available distributed and parallel implementations of BMF.

1709.02577 2026-06-04 math.NA cs.NA stat.AP

An integrated quasi-Monte Carlo method for handling high dimensional problems with discontinuities in financial engineering

一种用于金融工程中高维问题的集成准蒙特卡洛方法

Zhijian He, Xiaoqun Wang

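AI summary: This paper proposes an integrated quasi-Monte Carlo method that uses a smoothing method to remove the discontinuities of typical functions in financial engineering and a new path generation method to reduce the effective dimension of the target function, improving computational efficiency for high-dimensional, discontinuous problems.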

详情
AI中文摘要

准蒙特卡洛(QMC)方法是一种用于复杂金融衍生品定价和对冲的有用数值工具。这些问题通常具有高维性和不连续性,这两个因素可能会显著降低QMC方法的性能。本文开发了一种集成方法,同时克服高维性和不连续性的挑战。为此,提出了一种平滑方法以消除某些典型函数中的不连续性。为了使平滑方法适用于更一般性的函数,设计了一种新的路径生成方法,用于模拟基础资产的路径,使得生成的函数具有所需形式。新的路径生成方法还具有降低目标函数有效维度的能力。我们提出的方法适用于多种模型规格,包括Black-Scholes、指数正常逆高斯莱维和Heston模型。针对这些模型的数值实验表明,在QMC设置下,所提出的平滑方法结合新的路径生成方法可以显著减少定价具有不连续支付的奇异期权和计算期权希腊字母的方差。对有效维度及其相关特征的调查解释了联合过程的显著增强。

英文摘要

The quasi-Monte Carlo (QMC) method is a useful numerical tool for pricing and hedging complex financial derivatives. These problems are usually high-dimensional and involve discontinuities, two factors that may significantly deteriorate the performance of the QMC method. This paper develops an integrated method that overcomes the challenges of high dimensionality and discontinuities concurrently. For this purpose, a smoothing method is proposed to remove the discontinuities for some typical functions arising from financial engineering. To make the smoothing method applicable to more general functions, a new path generation method is designed for simulating the paths of the underlying assets such that the resulting function has the required form. The new path generation method has the additional power to reduce the effective dimension of the target function. Our proposed method caters for a large variety of model specifications, including the Black-Scholes, exponential normal inverse Gaussian Lévy, and Heston models. Numerical experiments dealing with these models show that in the QMC setting the proposed smoothing method in combination with the new path generation method can lead to a dramatic variance reduction for pricing exotic options with discontinuous payoffs and for calculating options' Greeks. An investigation of the effective dimension and related characteristics explains the significant enhancement achieved by the combined procedure.
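As a rough illustration of the QMC setting described above, the sketch below builds a two-dimensional Halton point set and uses it to integrate a discontinuous, digital-option-like payoff. The payoff `digital_payoff` is a hypothetical stand-in; the paper's smoothing and path generation methods are not reproduced here.

```python
def van_der_corput(n, base=2):
    # Radical inverse of the integer n in the given base, a value in [0, 1).
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value

def halton(n, bases=(2, 3)):
    # n-th point of the Halton sequence, one coprime base per dimension.
    return tuple(van_der_corput(n, b) for b in bases)

def digital_payoff(u1, u2):
    # Hypothetical discontinuous integrand: pays 1 when u1 + u2 > 1.
    # Its exact mean over the unit square is 0.5.
    return 1.0 if u1 + u2 > 1.0 else 0.0

def qmc_estimate(n_points):
    # Skip index 0, which maps to the corner (0, 0).
    total = sum(digital_payoff(*halton(n)) for n in range(1, n_points + 1))
    return total / n_points

estimate = qmc_estimate(4096)
```

The discontinuity here is not aligned with the coordinate axes, which is the kind of structure that typically slows QMC convergence and motivates smoothing.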

1902.09427 2026-06-04 eess.SP cs.LG cs.SY eess.SY stat.ML

Fault Diagnosis Method Based on Scaling Law for On-line Refrigerant Leak Detection

基于缩放定律的故障诊断方法用于在线制冷剂泄漏检测

Shun Takeuchi, Takahiro Saito

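AI summary: This paper proposes a refrigerant-leak fault diagnosis method based on the physical modeling and control mechanism of air-conditioning systems. It derives a scaling law related to refrigerant leaks so that the model applies to air-conditioning systems with different configurations, estimates the scaling exponent from small-scale off-line fault test data obtained in a laboratory, and validates the approach on real-world data, showing its promise for early leak detection.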

详情
Journal ref
2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)
Comments
8 pages, 6 figures
AI中文摘要

利用仪器化传感器数据进行早期故障检测是机器学习在工业设施中的一个有前景的应用领域。然而,由于目标诊断系统中复杂的系统配置和不足的故障数据,训练出的故障检测模型的泛化性能难以提高。将训练好的模型应用于其他系统并不容易。本文提出了一种考虑空调系统物理建模和控制机制的制冷剂泄漏故障诊断方法。我们推导出与制冷剂泄漏相关的有用缩放定律。如果控制机制相同,模型可以应用于其他空调系统,而不论系统配置如何。在实验室中获得的小规模离线故障测试数据用于估计缩放指数。我们通过真实数据评估所提出的缩放定律。基于两组之间相互作用的统计假设检验,我们证明了不同空调系统的缩放指数是等价的。此外,我们基于缩放定律对实际过程数据的泄漏程度时间序列进行了估计,并通过与专家评估的比较,证明了该方法在早期泄漏检测中的有效性。

英文摘要

Early fault detection using instrumented sensor data is one of the promising application areas of machine learning in industrial facilities. However, it is difficult to improve the generalization performance of the trained fault-detection model because of the complex system configuration in the target diagnostic system and insufficient fault data. It is not trivial to apply the trained model to other systems. Here we propose a fault diagnosis method for refrigerant leak detection considering the physical modeling and control mechanism of an air-conditioning system. We derive a useful scaling law related to refrigerant leak. If the control mechanism is the same, the model can be applied to other air-conditioning systems irrespective of the system configuration. Small-scale off-line fault test data obtained in a laboratory are applied to estimate the scaling exponent. We evaluate the proposed scaling law by using real-world data. Based on a statistical hypothesis test of the interaction between two groups, we show that the scaling exponents of different air-conditioning systems are equivalent. In addition, we estimated the time series of the degree of leakage of real process data based on the scaling law and confirmed that the proposed method is promising for early leak detection through comparison with assessment by experts.
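A scaling exponent of the kind mentioned above is commonly estimated by least squares on log-transformed data. The sketch below assumes a generic power law y = c * x^k with hypothetical variables `xs` and `ys`; the abstract does not state which physical quantities enter the authors' scaling law, and this is not their estimator.

```python
import math

def fit_scaling_exponent(xs, ys):
    # Ordinary least squares on (log x, log y).
    # Returns (exponent k, prefactor c) for the model y = c * x**k.
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return k, c

# Synthetic, noise-free data generated from y = 2 * x**1.5.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]
exponent, prefactor = fit_scaling_exponent(xs, ys)
```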

1902.09426 2026-06-04 eess.SP cs.LG cs.SY eess.SY stat.ML

Semi-supervised Approach to Soft Sensor Modeling for Fault Detection in Industrial Systems with Multiple Operation Modes

基于半监督方法的软传感器建模用于具有多种操作模式的工业系统故障检测

Shun Takeuchi, Takuya Nishino, Takahiro Saito, Isamu Watanabe

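AI summary: This paper proposes a semi-supervised approach to soft sensor modeling for systems with multiple operation modes, where data with the target variable are too scarce for effective training; it exploits the properties of transition points between operation modes to improve the model's predictions.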

详情
Journal ref
International Conference on Advanced Intelligent Systems and Informatics 2017
Comments
7 pages, 1 figure
AI中文摘要

在工业系统中,某些需要监控以检测故障的过程变量往往难以或无法测量。软传感器技术广泛用于从易于测量的变量估计这些难以测量的过程变量。软传感器建模需要包含各种状态信息的训练数据集,但目标变量的故障数据集不足,无法作为训练数据集。本文描述了一种半监督方法用于软传感器建模,以将缺少目标变量的不完整数据集纳入训练数据集。为了整合不完整数据集,我们考虑系统中操作模式转换点的特性。在约束条件下,通过从模式转换信息中获得的约束条件估计操作模式的回归系数。在案例研究中,这种受约束的软传感器建模被用于预测具有加热和制冷操作模式的空调系统中的制冷剂泄漏。结果表明,这种建模方法对于具有多种操作模式的系统中的软传感器具有前景。

英文摘要

In industrial systems, certain process variables that need to be monitored for detecting faults are often difficult or impossible to measure. Soft sensor techniques are widely used to estimate such difficult-to-measure process variables from easy-to-measure ones. Soft sensor modeling requires training datasets including the information of various states such as operation modes, but the fault dataset with the target variable is insufficient as the training dataset. This paper describes a semi-supervised approach to soft sensor modeling to incorporate an incomplete dataset without the target variable in the training dataset. To incorporate the incomplete dataset, we consider the properties of processes at transition points between operation modes in the system. The regression coefficients of the operation modes are estimated under constraint conditions obtained from the information on the mode transitions. In a case study, this constrained soft sensor modeling was used to predict refrigerant leaks in air-conditioning systems with heating and cooling operation modes. The results show that this modeling method is promising for soft sensors in a system with multiple operation modes.

1806.07190 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Stable Gaussian Process based Tracking Control of Euler-Lagrange Systems

基于稳定高斯过程的欧拉-拉格朗日系统跟踪控制

Thomas Beckers, Dana Kulić, Sandra Hirche

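AI summary: This paper proposes a stable tracking control method based on Gaussian process regression for high-performance tracking control of unknown Euler-Lagrange systems. A data-driven model provides feed-forward compensation, and the model fidelity is used to adapt the feedback gains, guaranteeing a globally bounded tracking error with a specific probability.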

详情
Comments
Accepted manuscript for publication in Elsevier Automatica
AI中文摘要

对现实中的欧拉-拉格朗日系统实现完美的跟踪控制具有挑战性,因为系统模型的不确定性以及外部干扰会影响跟踪误差的大小。通过增加反馈增益或改进系统模型可以减小跟踪误差。后者显然更可取,因为它允许在低反馈增益下保持良好的跟踪性能。然而,准确的模型往往难以获得。在本文中,我们解决了未知欧拉-拉格朗日系统的稳定高性能跟踪控制问题。具体来说,我们使用高斯过程回归来获得一个数据驱动的模型,用于系统未知动力学的前馈补偿。模型保真度用于调整反馈增益,允许在状态空间中模型信心高的区域使用低反馈增益。所提出的控制律保证了具有特定概率的全局有界跟踪误差。仿真研究展示了其优于现有跟踪控制方法的优越性。

英文摘要

Perfect tracking control for real-world Euler-Lagrange systems is challenging due to uncertainties in the system model and external disturbances. The magnitude of the tracking error can be reduced either by increasing the feedback gains or by improving the model of the system. The latter is clearly preferable, as it allows good tracking performance to be maintained at low feedback gains. However, accurate models are often difficult to obtain. In this article, we address the problem of stable high-performance tracking control for unknown Euler-Lagrange systems. In particular, we employ Gaussian Process regression to obtain a data-driven model that is used for the feed-forward compensation of unknown dynamics of the system. The model fidelity is used to adapt the feedback gains, allowing low feedback gains in state space regions of high model confidence. The proposed control law guarantees a globally bounded tracking error with a specific probability. Simulation studies demonstrate the superiority over state-of-the-art tracking control approaches.

1902.08721 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Online Control with Adversarial Disturbances

对抗性扰动下的在线控制

Naman Agarwal, Brian Bullins, Elad Hazan, Sham M. Kakade, Karan Singh

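AI summary: This paper studies online control of linear dynamical systems with adversarial disturbances and presents an efficient algorithm with nearly tight regret bounds, performing nearly as well as a procedure with full knowledge of the disturbances in hindsight. It extends previous work in two main respects: it allows adversarial noise in the dynamics and general convex costs.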

详情
AI中文摘要

我们研究了具有对抗性扰动(而非统计噪声)的线性动态系统的控制问题。我们考虑的目标是regret:我们希望一种在线控制过程能够几乎达到完全了解扰动的控制过程的性能。我们的主要结果是一个高效的算法,该算法为该问题提供了几乎紧致的regret界。从技术角度来看,这项工作在两个主要方面扩展了先前的工作:我们的模型允许动态中的对抗性噪声,并允许一般的凸成本。

英文摘要

We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is one of regret: we desire an online control procedure that can do nearly as well as that of a procedure that has full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this problem. From a technical standpoint, this work generalizes upon previous work in two main aspects: our model allows for adversarial noise in the dynamics, and allows for general convex costs.
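The regret objective described above can be written out as follows. The notation ($A$, $B$, $w_t$, $c_t$, and the comparator class $\Pi$) is an assumption chosen for illustration; the abstract does not fix symbols or specify the comparator class.

```latex
\begin{equation*}
x_{t+1} = A x_t + B u_t + w_t,
\qquad
\mathrm{Regret}_T
= \sum_{t=1}^{T} c_t(x_t, u_t)
- \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\bigl(x_t^{\pi}, u_t^{\pi}\bigr),
\end{equation*}
```

Here $w_t$ is the adversarially chosen disturbance, $c_t$ is a convex cost, and $(x_t^{\pi}, u_t^{\pi})$ is the trajectory that a comparator policy $\pi$ would have produced against the same disturbance sequence.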

1807.08713 2026-06-04 stat.CO cs.NA math.NA

A practical example for the non-linear Bayesian filtering of model parameters

模型参数非线性贝叶斯滤波的实用示例

Matthieu Bulté, Jonas Latz, Elisabeth Ullmann

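AI summary: Using an example with real-world data, this tutorial introduces non-linear Bayesian filtering for estimating static parameters in time-dependent models, focusing on the particle-based methods Sequential Importance Sampling and Sequential Monte Carlo, and provides a Python implementation.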

详情
AI中文摘要

In this tutorial we consider the non-linear Bayesian filtering of static parameters in a time-dependent model. We outline the theoretical background and discuss appropriate solvers. We focus on particle-based filters and present Sequential Importance Sampling (SIS) and Sequential Monte Carlo (SMC). Throughout the paper we illustrate the concepts and techniques with a practical example using real-world data. The task is to estimate the gravitational acceleration of the Earth $g$ by using observations collected from a simple pendulum. Importantly, the particle filters enable the adaptive updating of the estimate for $g$ as new observations become available. For tutorial purposes we provide the data set and a Python implementation of the particle filters.

英文摘要

In this tutorial we consider the non-linear Bayesian filtering of static parameters in a time-dependent model. We outline the theoretical background and discuss appropriate solvers. We focus on particle-based filters and present Sequential Importance Sampling (SIS) and Sequential Monte Carlo (SMC). Throughout the paper we illustrate the concepts and techniques with a practical example using real-world data. The task is to estimate the gravitational acceleration of the Earth $g$ by using observations collected from a simple pendulum. Importantly, the particle filters enable the adaptive updating of the estimate for $g$ as new observations become available. For tutorial purposes we provide the data set and a Python implementation of the particle filters.
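The following is a minimal sketch of Sequential Importance Sampling for a static parameter, loosely modeled on the pendulum task. It is not the authors' implementation: it assumes the small-angle period formula, a unit pendulum length, Gaussian noise on directly observed periods, a uniform prior on $g$, and synthetic data, all of which are illustrative choices.

```python
import math
import random

def pendulum_period(g, length=1.0):
    # Small-angle period of a simple pendulum (assumed observation model).
    return 2.0 * math.pi * math.sqrt(length / g)

def sis_estimate(observations, n_particles=5000, noise_sd=0.01, seed=0):
    rng = random.Random(seed)
    # Particles are drawn once from a uniform prior on g. Because g is a
    # static parameter, plain SIS never moves them; only the weights change.
    particles = [rng.uniform(5.0, 15.0) for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    estimates = []
    for obs in observations:
        # Multiply each weight by the Gaussian likelihood of the new observation.
        for i, g in enumerate(particles):
            resid = obs - pendulum_period(g)
            log_w[i] += -0.5 * (resid / noise_sd) ** 2
        shift = max(log_w)  # subtract the maximum for numerical stability
        w = [math.exp(lw - shift) for lw in log_w]
        total = sum(w)
        estimates.append(sum(wi * g for wi, g in zip(w, particles)) / total)
    # Effective sample size after the last update.
    ess = total ** 2 / sum(wi * wi for wi in w)
    return estimates, ess

# Synthetic observations generated with g = 9.81 (not the paper's data set).
data_rng = random.Random(1)
data = [pendulum_period(9.81) + data_rng.gauss(0.0, 0.01) for _ in range(20)]
estimates, ess = sis_estimate(data)
```

The list `estimates` holds the posterior-mean estimate of $g$ after each new observation. The effective sample size `ess` shrinks as weights concentrate on few particles, which is the degeneracy that resampling-based SMC methods are designed to counter.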

1902.08594 2026-06-04 eess.SY cs.LG cs.MA cs.SY stat.ML

Regression-based Inverter Control for Decentralized Optimal Power Flow and Voltage Regulation

基于回归的逆变器控制用于分布式最优功率流和电压调节

Oscar Sondermeijer, Roel Dobbe, Daniel Arnold, Claire Tomlin, Tamás Keviczky

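AI summary: This paper proposes a systematic, data-driven approach for determining an inverter's reactive power output from local measurements with near-optimal results. Optimal power flow computations on a network model with historical load and generation data give the optimal injections; regression then finds, for each inverter, a function mapping its local historical data to an approximation of its optimal reactive power injection. The resulting decentralized controllers minimize losses and flatten voltages under voltage and capacity constraints, and enable an efficient volt-VAR optimization (VVO) scheme in which legacy control equipment works alongside existing inverters to operate distribution networks with high levels of distributed generation safely.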

详情
Comments
Cite as: Oscar Sondermeijer, Roel Dobbe, Daniel Arnold, Claire Tomlin and Tamás Keviczky, "Regression-based Inverter Control for Decentralized Optimal Power Flow and Voltage Regulation", IEEE Power & Energy Society General Meeting, Boston, July 2016
AI中文摘要

电子功率逆变器能够快速提供无功功率以维持客户电压在运行容差范围内,并减少配电网中的系统损耗。本文提出了一种系统化且数据驱动的方法,以确定无功功率逆变器输出作为本地测量函数的方式,以获得接近最优的结果。首先,我们使用网络模型和历史负荷和发电数据,并进行最优功率流计算,以计算网络中所有可控逆变器的全局最优无功功率注入。随后,我们使用回归找到每个逆变器的函数,将本地历史数据映射到其最优无功功率注入的近似值。所得函数随后作为参与逆变器的分布式控制器,根据新的本地测量预测最优注入。该方法在执行电压和容量约束下的损耗最小化和电压平坦化时能够实现接近最优的结果,并允许高效的电压-无功优化(VVO)方案,其中传统控制设备与现有逆变器协同工作,以安全运行具有更高分布式发电水平的配电网。

英文摘要

Electronic power inverters are capable of quickly delivering reactive power to maintain customer voltages within operating tolerances and to reduce system losses in distribution grids. This paper proposes a systematic and data-driven approach to determine reactive power inverter output as a function of local measurements in a manner that obtains near-optimal results. First, we use a network model and historic load and generation data and run optimal power flow computations to obtain globally optimal reactive power injections for all controllable inverters in the network. Subsequently, we use regression to find a function for each inverter that maps its local historical data to an approximation of its optimal reactive power injection. The resulting functions then serve as decentralized controllers in the participating inverters to predict the optimal injection based on new local measurements. The method achieves near-optimal results when performing voltage- and capacity-constrained loss minimization and voltage flattening, and allows for an efficient volt-VAR optimization (VVO) scheme in which legacy control equipment collaborates with existing inverters to facilitate safe operation of distribution networks with higher levels of distributed generation.

1807.04020 2026-06-04 math.NA cs.LG cs.NA stat.ML

Improved SVD-based Initialization for Nonnegative Matrix Factorization using Low-Rank Correction

改进的基于SVD的非负矩阵分解初始化方法:利用低秩修正

Atif Muhammad Syed, Sameer Qazi, Nicolas Gillis

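AI summary: This paper proposes an improved SVD-based initialization for nonnegative matrix factorization that takes the discarded SVD factors into account to reduce the initial error, while generating sparse initial factors and lowering the computational cost.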

详情
Journal ref
Pattern Recognition Letters 122, pp. 53-59, 2019
Comments
12 pages, 1 figure, 5 tables, submitted to pattern recognition letters
AI中文摘要

由于大多数非负矩阵分解(NMF)算法的迭代性质,初始化是一个关键因素,因为它显著影响收敛性和最终得到的解。许多初始化方案已被提出,其中最受欢迎的一类方法基于奇异值分解(SVD)。然而,这些基于SVD的初始化方法并不满足一个自然条件,即误差应随着因子分解的秩增加而减少。在本文中,我们提出了一种新的基于SVD的NMF初始化方法,专门针对这一不足,通过考虑用于获得非负初始化而被丢弃的SVD因子。这种方法称为非负SVD与低秩修正(NNSVD-LRC),通过利用被丢弃的SVD因子的低秩结构,在可忽略的额外计算成本下显著降低初始误差。与以往基于SVD的初始化方法相比,NNSVD-LRC还有两个其他优势:(1)它能够证明生成稀疏的初始因子;(2)它更快,因为它只需要计算秩为⌈r/2 + 1⌉的截断SVD,其中r是所求NMF分解的因子秩(与其他方法不同,其他方法需要计算秩为r的截断SVD)。我们在多个标准密集和稀疏数据集上展示了我们的新方法在NMF中与最先进的基于SVD的初始化方法竞争性。

英文摘要

Due to the iterative nature of most nonnegative matrix factorization (NMF) algorithms, initialization is a key aspect as it significantly influences both the convergence and the final solution obtained. Many initialization schemes have been proposed for NMF, among which one of the most popular classes of methods is based on the singular value decomposition (SVD). However, these SVD-based initializations do not satisfy a rather natural condition, namely that the error should decrease as the rank of factorization increases. In this paper, we propose a novel SVD-based NMF initialization to specifically address this shortcoming by taking into account the SVD factors that were discarded to obtain a nonnegative initialization. This method, referred to as nonnegative SVD with low-rank correction (NNSVD-LRC), allows us to significantly reduce the initial error at a negligible additional computational cost using the low-rank structure of the discarded SVD factors. NNSVD-LRC has two other advantages compared to previous SVD-based initializations: (1) it provably generates sparse initial factors, and (2) it is faster as it only requires computing a truncated SVD of rank $\lceil r/2 + 1 \rceil$ where $r$ is the factorization rank of the sought NMF decomposition (as opposed to a rank-$r$ truncated SVD for other methods). We show on several standard dense and sparse data sets that our new method competes favorably with state-of-the-art SVD-based initializations for NMF.
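One way to see why roughly half as many SVD factors can suffice is that a single rank-one factor $u v^\top$ with mixed signs yields two nonnegative rank-one terms. The sketch below demonstrates that sign-splitting identity on small hypothetical vectors; it is an illustration of the general idea behind nonnegative SVD initializations, not the NNSVD-LRC algorithm, whose details the abstract does not give.

```python
import math

def split_signs(vec):
    # Decompose vec = pos - neg with pos, neg >= 0 elementwise.
    pos = [max(x, 0.0) for x in vec]
    neg = [max(-x, 0.0) for x in vec]
    return pos, neg

def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]

def nonneg_part_of_rank_one(u, v):
    # The nonnegative part max(u v^T, 0) equals u+ v+^T + u- v-^T,
    # that is, two nonnegative rank-one terms.
    up, un = split_signs(u)
    vp, vn = split_signs(v)
    first, second = outer(up, vp), outer(un, vn)
    return [[first[i][j] + second[i][j] for j in range(len(v))]
            for i in range(len(u))]

def svd_rank_needed(r):
    # Truncated SVD rank quoted in the abstract for factorization rank r.
    return math.ceil(r / 2 + 1)

u, v = [1.0, -2.0, 3.0], [-1.0, 4.0]
approx = nonneg_part_of_rank_one(u, v)
clipped = [[max(ui * vj, 0.0) for vj in v] for ui in u]
```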

1304.2902 2026-06-04 math.ST cs.NA math.AP math.NA stat.TH

Random fields representations for stochastic elliptic boundary value problems and statistical inverse problems

随机场表示用于随机椭圆边界值问题和统计反问题

Anthony Nouy, Christian Soize

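AI summary: This paper presents a new approach for identifying an unknown non-Gaussian positive matrix-valued random field through a stochastic elliptic boundary value problem, solving a statistical inverse problem. It introduces a new general class of non-Gaussian positive-definite matrix-valued random fields and analyzes its properties, proposes a minimal parametrization of discretized random fields in this class, and builds a complete identification procedure on that parametrization. It also presents new results on the mathematical and numerical analysis of the parameterized stochastic elliptic boundary value problem. The numerical solution of this problem gives an explicit approximation of the map from the parameterized class of random fields to the corresponding set of random solutions, which can be used during identification to avoid solving multiple forward stochastic problems. Because the class may contain random fields that are not uniformly bounded, a dedicated mathematical analysis and dedicated approximation methods are developed. To approximate this very high-dimensional map, complexity reduction methods are introduced, based on low-rank approximations that exploit the tensor structure of the solution resulting from the parametrization.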

详情
Journal ref
Eur. J. Appl. Math 25 (2014) 339-373
Comments
European Journal of Applied Mathematics, 2014
AI中文摘要

本文提出了一种新的方法,用于通过随机椭圆边界值问题识别未知的非高斯正矩阵值随机场,解决统计反问题。引入了一种新的非高斯正定矩阵值随机场一般类,并分析了其性质。提出了该类随机场离散化后的最小参数化方法,并基于此提出了一种完整的识别过程。还展示了参数化随机椭圆边界值问题的数学和数值分析的新结果。该参数化问题的数值解提供了将参数化一般类随机场映射到相应随机解集的应用显式近似,此近似可用于识别过程以避免求解多个正向随机问题。由于所提出的随机场一般类可能包含非均匀有界的随机场,因此开发了特定的数学分析并引入了专门的近似方法。为了构造非常高维映射的近似算法,引入了复杂度降低方法,这些方法基于利用参数化一般类随机场所产生解的张量结构的低秩近似方法。

英文摘要

This paper presents new results allowing an unknown non-Gaussian positive matrix-valued random field to be identified through a stochastic elliptic boundary value problem, solving a statistical inverse problem. A new general class of non-Gaussian positive-definite matrix-valued random fields, adapted to the statistical inverse problems in high stochastic dimension for their experimental identification, is introduced and its properties are analyzed. A minimal parametrization of discretized random fields belonging to this general class is proposed. Using this parametrization of the general class, a complete identification procedure is proposed. New results of the mathematical and numerical analyzes of the parameterized stochastic elliptic boundary value problem are presented. The numerical solution of this parametric stochastic problem provides an explicit approximation of the application that maps the parameterized general class of random fields to the corresponding set of random solutions. This approximation can be used during the identification procedure in order to avoid the solution of multiple forward stochastic problems. Since the proposed general class of random fields possibly contains random fields which are not uniformly bounded, a particular mathematical analysis is developed and dedicated approximation methods are introduced. In order to obtain an algorithm for constructing the approximation of a very high-dimensional map, complexity reduction methods are introduced and are based on the use of low-rank approximation methods that exploit the tensor structure of the solution which results from the parametrization of the general class of random fields.

1902.06366 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Detecting and Diagnosing Incipient Building Faults Using Uncertainty Information from Deep Neural Networks

利用深度神经网络的不确定性信息检测和诊断建筑初期故障

Baihong Jin, Dan Li, Seshadhri Srinivasan, See-Kiong Ng, Kameshwar Poolla, Alberto Sangiovanni-Vincentelli

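AI summary: This paper proposes augmenting the supervised learning pipeline with Monte Carlo dropout to detect and diagnose unseen incipient faults, and evaluates the method on the RP-1043 dataset, where it is effective at indicating the most likely incipient fault types.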

详情
AI中文摘要

早期检测初期故障对于减少维护成本、节约能源和提高居住舒适度在建筑中至关重要。尽管深度神经网络等流行监督学习模型因其能够直接从标记的故障数据中学习而被认为具有前景,但监督学习方法的性能高度依赖于标记训练数据的可用性和质量。在故障检测与诊断(FDD)应用中,缺乏标记的初期故障数据已成为将这些监督学习技术应用于商业建筑的主要挑战。为克服这一挑战,本文提出利用蒙特卡洛dropout(MC-dropout)来增强监督学习流程,使生成的神经网络能够检测和诊断未见过的初期故障示例。我们还检查了所提出的MC-dropout方法在RP-1043数据集上的效果,以证明其在指示最可能的初期故障类型方面的有效性。

英文摘要

Early detection of incipient faults is of vital importance to reducing maintenance costs, saving energy, and enhancing occupant comfort in buildings. Popular supervised learning models such as deep neural networks are considered promising due to their ability to directly learn from labeled fault data; however, it is known that the performance of supervised learning approaches highly relies on the availability and quality of labeled training data. In Fault Detection and Diagnosis (FDD) applications, the lack of labeled incipient fault data has posed a major challenge to applying these supervised learning techniques to commercial buildings. To overcome this challenge, this paper proposes using Monte Carlo dropout (MC-dropout) to enhance the supervised learning pipeline, so that the resulting neural network is able to detect and diagnose unseen incipient fault examples. We also examine the proposed MC-dropout method on the RP-1043 dataset to demonstrate its effectiveness in indicating the most likely incipient fault types.
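The core mechanism of MC-dropout is to keep dropout active at prediction time and to summarize many stochastic forward passes. The sketch below shows this on a tiny hand-written network; the weights `W_HIDDEN` and `W_OUT`, the input, and the three-class setup are hypothetical, and the paper's actual architecture and fault classes are not reproduced.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stochastic_forward(x, w_hidden, w_out, rng, drop_p):
    # One forward pass with dropout left ON: each ReLU hidden unit is
    # zeroed with probability drop_p and rescaled otherwise.
    hidden = []
    for row in w_hidden:
        act = max(0.0, sum(w * xi for w, xi in zip(row, x)))
        keep = rng.random() >= drop_p
        hidden.append(act / (1.0 - drop_p) if keep else 0.0)
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(logits)

def mc_dropout_predict(x, w_hidden, w_out, n_samples=200, drop_p=0.5, seed=0):
    # Mean and variance of class probabilities over repeated stochastic passes.
    rng = random.Random(seed)
    samples = [stochastic_forward(x, w_hidden, w_out, rng, drop_p)
               for _ in range(n_samples)]
    n_classes = len(w_out)
    mean = [sum(s[k] for s in samples) / n_samples for k in range(n_classes)]
    var = [sum((s[k] - mean[k]) ** 2 for s in samples) / n_samples
           for k in range(n_classes)]
    return mean, var

# Hypothetical fixed weights: 2 inputs, 4 hidden units, 3 classes.
W_HIDDEN = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.4], [0.9, 0.1]]
W_OUT = [[1.0, -1.0, 0.5, 0.2], [-0.5, 0.8, 0.3, -0.4], [0.1, 0.2, -0.7, 0.6]]
mean_probs, var_probs = mc_dropout_predict([1.0, 0.5], W_HIDDEN, W_OUT)
```

A large spread in `var_probs` signals that the network is uncertain about an input, which is the kind of information the abstract proposes to use for flagging unseen incipient faults.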

1902.06361 2026-06-04 cs.LG cs.NE cs.SY eess.SY stat.ML

A One-Class Support Vector Machine Calibration Method for Time Series Change Point Detection

一种用于时间序列变化点检测的一类支持向量机校准方法

Baihong Jin, Yuxin Chen, Dan Li, Kameshwar Poolla, Alberto Sangiovanni-Vincentelli

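AI summary: This paper proposes a method for calibrating One-Class Support Vector Machine (OC-SVM) models for change point detection in time series, using a heuristic search to find a good combination of input data and hyperparameters. Experiments on the C-MAPSS dataset show that the OC-SVM detects change points with satisfactory accuracy while using less training data than state-of-the-art deep learning approaches.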

详情
AI中文摘要

识别系统健康状态的变化点对于检测发展中的初始故障至关重要。一类支持向量机(OC-SVM)是一种流行的机器学习模型,用于异常检测,因此可用于识别变化点;然而,有时难以获得一个能够用于传感器测量时间序列以识别系统健康状态变化点的良好的OC-SVM模型。在本文中,我们提出了一种新颖的OC-SVM模型校准方法。该方法使用启发式搜索方法来寻找一组良好的输入数据和超参数,以产生一个表现良好的模型。我们在C-MAPSS数据集上的结果表明,OC-SVM在使用较少训练数据的情况下也能在时间序列中实现满意的准确性,相较于最先进的深度学习方法。在我们的案例研究中,通过所提出的模型校准的OC-SVM在训练数据有限的情况下显示出特别的实用性。

英文摘要

It is important to identify the change point of a system's health status, which usually signifies an incipient fault under development. The One-Class Support Vector Machine (OC-SVM) is a popular machine learning model for anomaly detection and hence could be used for identifying change points; however, it is sometimes difficult to obtain a good OC-SVM model that can be used on sensor measurement time series to identify the change points in system health status. In this paper, we propose a novel approach for calibrating OC-SVM models. The approach uses a heuristic search method to find a good set of input data and hyperparameters that yield a well-performing model. Our results on the C-MAPSS dataset demonstrate that OC-SVM can also achieve satisfactory accuracy in detecting change points in time series with less training data, compared to state-of-the-art deep learning approaches. In our case study, the OC-SVM calibrated by the proposed approach is shown to be useful especially in scenarios with a limited amount of training data.

1712.10158 2026-06-04 q-bio.NC cs.LG cs.NE cs.SY eess.SY stat.ML

Non-linear motor control by local learning in spiking neural networks

通过局部学习在脉冲神经网络中实现非线性运动控制

Aditya Gilra, Wulfram Gerstner

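AI summary: This paper uses Feedback-based Online Local Learning Of Weights (FOLLOW) to train a heterogeneous spiking neural network to control a two-link arm so that it reproduces a desired state trajectory; the core contribution is learning an inverse model with a local plasticity rule to control non-linear dynamics.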

详情
Journal ref
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1773-1782, 2018
AI中文摘要

在具有隐藏神经元的脉冲神经网络中,使用局部、稳定且在线的规则学习权重,以控制非线性身体动力学是一个开放性问题。本文采用监督方案,反馈基于在线局部学习权重(FOLLOW),训练具有隐藏层的异质脉冲神经元网络,以控制双臂以重现期望状态轨迹。网络首先学习非线性动力学的逆模型,即从状态轨迹作为输入,学习推断产生轨迹的连续时间命令。连接权重通过涉及前突触放电和后突触误差反馈的局部可塑性规则进行调整。我们选择了一种称为微分前馈的网络架构,该架构在不同前馈和递归架构中提供了最低的测试误差。学习的逆模型随后用于生成连续时间运动命令以控制手臂,给定期望轨迹。

英文摘要

Learning weights in a spiking neural network with hidden neurons, using local, stable and online rules, to control non-linear body dynamics is an open problem. Here, we employ a supervised scheme, Feedback-based Online Local Learning Of Weights (FOLLOW), to train a network of heterogeneous spiking neurons with hidden layers, to control a two-link arm so as to reproduce a desired state trajectory. The network first learns an inverse model of the non-linear dynamics, i.e. from state trajectory as input to the network, it learns to infer the continuous-time command that produced the trajectory. Connection weights are adjusted via a local plasticity rule that involves pre-synaptic firing and post-synaptic feedback of the error in the inferred command. We choose a network architecture, termed differential feedforward, that gives the lowest test error from different feedforward and recurrent architectures. The learned inverse model is then used to generate a continuous-time motor command to control the arm, given a desired trajectory.

1810.00810 2026-06-04 math.NA cs.NA stat.CO

Multilevel Adaptive Sparse Grid Quadrature for Monte Carlo models

多级自适应稀疏网格求积法用于蒙特卡罗模型

Sandra Döpking, Sebastian Matera

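AI summary: This paper proposes a multilevel adaptive sparse grid quadrature method for integrals whose integrand is implicitly defined by a Monte Carlo model; it exploits different levels of sampling accuracy to reduce computational cost, with significant savings over a single-level approach.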

详情
AI中文摘要

许多问题需要通过某种蒙特卡罗(MC)采样来近似期望值,例如分子动力学(MD)或随机反应模型(也称为动力学蒙特卡罗(kMC)模拟)。通常,我们还对MC模型输出关于输入参数的积分感兴趣。本文提出了一种多级自适应稀疏网格策略,用于此类问题的数值积分,其中被积函数由蒙特卡罗模型隐式定义。在该方法中,我们利用蒙特卡罗模型中不同层级的采样精度来降低总体计算成本,相较于单级方法。与现有多级数值求积方法不同,我们的方法不基于 telescoping sum,而是利用稀疏网格的内在多级结构以及所采用的局部支撑、分段线性基函数。除了示例性的玩具模型外,我们还在现实的kMC模型上展示了该方法,用于CO氧化。我们发现与单级方法相比有显著节省——通常数量级的节省。

英文摘要

Many problems require approximating an expected value by some kind of Monte Carlo (MC) sampling, e.g. molecular dynamics (MD) or simulation of stochastic reaction models (also termed kinetic Monte Carlo (kMC)). Often, we are furthermore interested in some integral of the MC model's output over the input parameters. We present a Multilevel Adaptive Sparse Grid strategy for the numerical integration of such problems where the integrand is implicitly defined by a Monte Carlo model. In this approach, we exploit different levels of sampling accuracy in the Monte Carlo model to reduce the overall computational costs compared to a single-level approach. Unlike existing approaches for Multilevel Numerical Quadrature, our approach is not based on a telescoping sum; rather, we utilize the intrinsic multilevel structure of the sparse grids and the employed locally supported, piecewise linear basis functions. Besides illustrative toy models, we demonstrate the methodology on a realistic kMC model for CO oxidation. We find significant savings compared to the single-level approach, often orders of magnitude.
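The "intrinsic multilevel structure" of piecewise linear sparse grid bases can be seen already in one dimension. The sketch below computes hierarchical surpluses of a hat-function basis and sums their contributions to a quadrature level by level. It assumes a deterministic integrand that vanishes at both endpoints and does not include the Monte Carlo sampling levels or the adaptivity of the paper's method.

```python
def hierarchical_quadrature(f, max_level):
    # 1D quadrature on [0, 1] with a hierarchical hat-function basis.
    # Assumes f(0) = f(1) = 0, so no boundary basis functions are needed.
    total = 0.0
    level_contribs = []
    for level in range(1, max_level + 1):
        h = 2.0 ** -level
        contrib = 0.0
        # New nodes on this level are the odd multiples of h.
        for i in range(1, 2 ** level, 2):
            x = i * h
            # Surplus: f minus the coarser piecewise linear interpolant at x.
            surplus = f(x) - 0.5 * (f(x - h) + f(x + h))
            # A hat of height 1 and half-width h integrates to h.
            contrib += surplus * h
        level_contribs.append(contrib)
        total += contrib
    return total, level_contribs

def parabola(x):
    # Test integrand with exact integral 1/6 on [0, 1].
    return x * (1.0 - x)

integral, contribs = hierarchical_quadrature(parabola, 10)
```

For smooth integrands the per-level contributions decay quickly, so finer levels carry ever smaller corrections to the integral.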

1902.04671 2026-06-04 stat.AP cs.SY eess.SY

A Novel Maneuvering Target Tracking Approach by Stochastic Volatility GARCH Model

基于GARCH波动性的新型机动目标跟踪方法

Ehsan Hajiramezanali, Seyyed Hamed Fouladi, Hamidreza Amindavar

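AI summary: This paper proposes a new single-model maneuvering target tracking approach based on GARCH volatility. It models the Markovian jump acceleration of the target with a stochastic differential equation (SDE), estimates the original state and the stochastic volatility (SV) simultaneously with a Bayesian filter, and validates the improvement in tracking performance through simulation.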

详情
AI中文摘要

在本文中,我们介绍了一种基于GARCH波动性的新型单模型机动目标跟踪方法,该方法基于随机微分方程(SDE)。传统输入估计(IE)技术假设加速度水平恒定,无法涵盖所有可能的加速度本质。相比之下,多模型(MM)算法虽然弥补了IE的一些不足,但对转移概率矩阵敏感。本文提出了一种创新模型,通过使用新的加速度动态建模方法和贝叶斯滤波器来克服这些缺点。我们利用SDE通过GARCH过程作为SDE波动性来建模机动目标的马尔可夫跳跃加速度。在所提出的方案中,通过bootstrap粒子滤波器(PF)同时估计原始状态和随机波动性(SV)。我们引入bootstrap重采样以获得GARCH密度的统计特性。由于GARCH分布的厚尾性质,bootstrap PF在状态方程中可能出现的大误差情况下更为有效。我们通过分析证明,考虑GARCH加速度模型可以提高目标跟踪性能。最后,通过仿真研究验证了所提出策略(PF-AR-GARCH)的有效性和能力。

英文摘要

In this paper, we introduce a new single-model maneuvering target tracking approach using a stochastic differential equation (SDE) based on GARCH volatility. The traditional input estimation (IE) techniques assume a constant acceleration level, which does not capture the full range of possible accelerations. In contrast, the multiple model (MM) algorithms, which address some of IE's shortcomings, are sensitive to the transition probability matrices. In this paper, an innovative model is proposed to overcome these drawbacks by using a new generalized dynamic modeling of acceleration and a Bayesian filter. We utilize an SDE to model the Markovian jump acceleration of a maneuvering target, with a GARCH process as the SDE volatility. In the proposed scheme, the original state and stochastic volatility (SV) are estimated simultaneously by a bootstrap particle filter (PF). We introduce the bootstrap resampling to obtain the statistical properties of a GARCH density. Due to the heavy-tailed nature of the GARCH distribution, the bootstrap PF is more effective in the presence of large errors that can occur in the state equation. We show analytically that the target tracking performance is improved by considering the GARCH acceleration model. Finally, the effectiveness and capabilities of our proposed strategy (PF-AR-GARCH) are demonstrated and validated through simulation studies.
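For readers unfamiliar with GARCH volatility, the sketch below simulates a standard discrete-time GARCH(1,1) process, in which the conditional variance reacts to the size of the previous shock. The parameter values are hypothetical, and the coupling to an SDE for target acceleration and the particle filter from the paper are not reproduced.

```python
import math
import random

def simulate_garch(n, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    # GARCH(1,1): var_t = omega + alpha * eps_{t-1}**2 + beta * var_{t-1},
    # with shocks eps_t = sqrt(var_t) * z_t and z_t standard normal.
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    shocks, variances = [], []
    for _ in range(n):
        eps = math.sqrt(var) * rng.gauss(0.0, 1.0)
        shocks.append(eps)
        variances.append(var)
        var = omega + alpha * eps * eps + beta * var
    return shocks, variances

shocks, variances = simulate_garch(1000)
```

Large shocks raise the next conditional variance, so bursts of high volatility cluster in time; this gives the marginal distribution heavier tails than a Gaussian with constant variance.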

1811.10486 2026-06-04 stat.ME cs.NA math.NA

Selected Methods for non-Gaussian Data Analysis

非高斯数据分析中的选定方法

Krzysztof Domino

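AI summary: This work studies methods for analyzing non-Gaussian distributed data. For univariate data it discusses stochastic processes with auto-correlated increments and the Levy and Tsallis distributions; for multivariate non-Gaussian distributions it uses the copula approach and discusses how higher-order multivariate cumulants can extract information about non-Gaussian multivariate distributions.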

详情
Comments
ISBN: 978-83-926054-3-0
AI中文摘要

计算机工程的基本目标是对数据进行分析。此类数据通常是根据各种分布模型分布的大数据集。在本文中,我们专注于非高斯分布数据的分析。在单变量数据分析中,我们讨论了具有自相关增量的随机过程以及由特定随机过程导出的单变量分布,即Levy和Tsallis分布。对多变量非高斯分布的深入研究需要Copula方法。Copula是多变量分布中的一个组成部分,用于建模边缘之间的相互依赖性。存在许多Copula家族,它们以不同的依赖性度量来表征边缘之间的依赖性。重要的是,其中一种是“尾部”依赖性,用于建模许多边缘中同时出现极端值的情况。这些极端事件可能反映金融数据中的危机、机器学习中的异常值或交通拥堵。接下来我们讨论了高阶多变量cumulants,当多变量分布非高斯时,这些cumulants非零。然而,cumulants和copulas之间的关系并不直接且较为复杂。我们讨论了这些cumulants在提取关于非高斯多变量分布信息中的应用,以获取关于非高斯copulas的信息。在计算机科学中使用高阶多变量cumulants受到金融数据分析的启发,特别是安全投资组合评估。此外,高阶多变量cumulants在数据工程中的许多其他应用,特别是信号处理、非线性系统识别、盲源分离以及多源信号的方向寻找算法中。

英文摘要

The basic goal of computer engineering is the analysis of data. Such data are often large data sets distributed according to various distribution models. In this manuscript we focus on the analysis of non-Gaussian distributed data. In the case of univariate data analysis we discuss stochastic processes with auto-correlated increments and univariate distributions derived from specific stochastic processes, i.e. Levy and Tsallis distributions. Deep investigation of multivariate non-Gaussian distributions requires the copula approach. A copula is a component of a multivariate distribution that models the mutual interdependence between marginals. There are many copula families characterised by various measures of the dependence between marginals. Importantly, one of those is `tail' dependence, which models the simultaneous appearance of extreme values in many marginals. Those extreme events may reflect a crisis in financial data, outliers in machine learning, or traffic congestion. Next we discuss higher order multivariate cumulants, which are non-zero if the multivariate distribution is non-Gaussian. However, the relation between cumulants and copulas is not straightforward and is rather complicated. We discuss the application of those cumulants to extract information about non-Gaussian multivariate distributions, such as information about non-Gaussian copulas. The use of higher order multivariate cumulants in computer science is inspired by financial data analysis, especially by safe investment portfolio evaluation. There are many other applications of higher order multivariate cumulants in data engineering, especially in signal processing, non-linear system identification, blind source separation, and direction finding algorithms for multi-source signals.

1809.05066 2026-06-04 physics.chem-ph cs.NA math.NA stat.CO

Simulated Tempering Method in the Infinite Switch Limit with Adaptive Weight Learning

无限切换极限下带自适应权重学习的模拟回火方法

Anton Martinsson, Jianfeng Lu, Benedict Leimkuhler, Eric Vanden-Eijnden

AI总结 本文研究了模拟回火(simulated tempering)方法的理论基础,并据此设计了高效的算法。通过大偏差论证,证明了最有效的模拟回火方式是让温度无限快地变化。在此极限下,温度与物理变量的运动方程可以替换为仅关于物理变量的平均化方程,其中的力按照一个由温度权重定义的、依赖于位置的函数重新缩放。这些平均化方程与Gao的温度积分(integrated-over-temperature)方法中的方程类似,不同之处在于本文证明使用连续温度集优于离散温度集。本文为将温度权重取为配分函数的倒数提供了理论依据,从而将模拟回火与Wang-Landau采样联系起来。最后,本文描述了一种自洽算法,可在模拟过程中同时对正则系综采样并学习权重。该算法在谐振子系统和Curie-Weiss模型的连续变体上进行了测试,结果表明其表现良好,并能准确捕捉该模型中的二级相变。

详情
AI中文摘要

我们研究了模拟回火(simulated tempering)方法的理论基础,并利用我们的发现设计了高效的算法。借助最初用于副本交换分子动力学的大偏差论证[Plattner等人,J. Chem. Phys. 135:134111 (2011)],我们证明最有效的模拟回火方式是让温度无限快地变化。在此极限下,我们可以将温度与物理变量的运动方程替换为仅关于物理变量的平均化方程,其中的力按照一个由温度权重定义的、依赖于位置的函数重新缩放。这些平均化方程与Gao的温度积分(integrated-over-temperature)方法中使用的方程类似,不同之处在于我们证明使用连续温度集优于离散温度集。我们为将温度权重取为配分函数的倒数提供了理论依据,从而将模拟回火与Wang-Landau采样联系起来。最后,我们描述了一种自洽算法,可在模拟过程中同时对正则系综采样并学习权重。该算法在谐振子系统以及Curie-Weiss模型的连续变体上进行了测试,结果表明其表现良好,并能准确捕捉该模型中观察到的二级相变。

英文摘要

We investigate the theoretical foundations of the simulated tempering method and use our findings to design efficient algorithms. Employing a large deviation argument first used for replica exchange molecular dynamics [Plattner et al., J. Chem. Phys. 135:134111 (2011)], we demonstrate that the most efficient approach to simulated tempering is to vary the temperature infinitely rapidly. In this limit, we can replace the equations of motion for the temperature and physical variables by averaged equations for the latter alone, with the forces rescaled according to a position-dependent function defined in terms of temperature weights. The averaged equations are similar to those used in Gao's integrated-over-temperature method, except that we show that it is better to use a continuous rather than a discrete set of temperatures. We give a theoretical argument for the choice of the temperature weights as the reciprocal partition function, thereby relating simulated tempering to Wang-Landau sampling. Finally, we describe a self-consistent algorithm for simultaneously sampling the canonical ensemble and learning the weights during simulation. This algorithm is tested on a system of harmonic oscillators as well as a continuous variant of the Curie-Weiss model, where it is shown to perform well and to accurately capture the second-order phase transition observed in this model.

1602.05702 2026-06-04 cs.SD cs.SY eess.SY stat.ML

EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses

从录制的语音混合信号中基于EEG提取受关注说话人,及其在神经引导听力假体中的应用

Simon Van Eyndhoven, Tom Francart, Alexander Bertrand

AI总结 本文提出了一种基于EEG的受关注说话人提取方法,利用麦克风阵列记录和EEG记录来实现噪声环境下的说话人分离与去噪,展示了在无干净语音信号的情况下,通过EEG进行的听觉注意力检测的鲁棒性。

详情
Journal ref
IEEE Transactions on Biomedical Engineering, vol. 64, no. 5, pp. 1045-1056, 2017
Comments
This paper is published in IEEE Transactions on Biomedical Engineering (2016) and is under copyright. Please cite this paper as: S. Van Eyndhoven, T. Francart, and A. Bertrand, "EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses", IEEE Transactions on Biomedical Engineering, vol. 64, no. 5, pp. 1045-1056, 2017
AI中文摘要

OBJECTIVE: 我们的目标是在嘈杂的双说话人声学场景中提取受关注的说话人并对其去噪。该方法依靠双耳助听器的麦克风阵列记录,并辅以脑电图(EEG)记录来推断目标说话人。 METHODS: 在本研究中,我们提出了一种模块化处理流程:首先从麦克风记录中提取两个语音包络,然后根据EEG选出受关注的语音包络,最后用该包络引导多通道语音分离与去噪算法。 RESULTS: 在保留受关注语音的同时,对干扰(未受关注)语音和背景噪声实现了强抑制。此外,基于EEG的听觉注意力检测(AAD)在使用带噪语音信号时仍表现出鲁棒性。 CONCLUSIONS: 我们的结果表明,即使在嘈杂的声学环境中,且无法获得用于执行基于EEG的AAD的干净语音信号,基于AAD从麦克风阵列记录中提取说话人也是可行且鲁棒的。 SIGNIFICANCE: 当前关于AAD的研究总是假设可以获得干净的语音信号,这限制了其在真实环境中的适用性。我们扩展了这项研究,使其在只有包含带噪语音混合信号的麦克风记录可用时,也能检测出受关注的说话人。这为新型脑机接口以及神经引导听力假体中的有效滤波方案提供了关键要素。在这里,我们首次给出了基于EEG的受关注说话人提取与去噪的概念验证。

英文摘要

OBJECTIVE: We aim to extract and denoise the attended speaker in a noisy, two-speaker acoustic scenario, relying on microphone array recordings from a binaural hearing aid, which are complemented with electroencephalography (EEG) recordings to infer the speaker of interest. METHODS: In this study, we propose a modular processing flow that first extracts the two speech envelopes from the microphone recordings, then selects the attended speech envelope based on the EEG, and finally uses this envelope to inform a multi-channel speech separation and denoising algorithm. RESULTS: Strong suppression of interfering (unattended) speech and background noise is achieved, while the attended speech is preserved. Furthermore, EEG-based auditory attention detection (AAD) is shown to be robust to the use of noisy speech signals. CONCLUSIONS: Our results show that AAD-based speaker extraction from microphone array recordings is feasible and robust, even in noisy acoustic environments, and without access to the clean speech signals to perform EEG-based AAD. SIGNIFICANCE: Current research on AAD always assumes the availability of the clean speech signals, which limits the applicability in real settings. We have extended this research to detect the attended speaker even when only microphone recordings with noisy speech mixtures are available. This is an enabling ingredient for new brain-computer interfaces and effective filtering schemes in neuro-steered hearing prostheses. Here, we provide a first proof of concept for EEG-informed attended speaker extraction and denoising.

1806.05722 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Non-asymptotic Identification of LTI Systems from a Single Trajectory

基于单条轨迹的线性时不变(LTI)系统非渐近辨识

Samet Oymak, Necmiye Ozay

AI总结 该研究利用单条输入输出轨迹,对线性时不变系统马尔可夫参数的学习给出有限时间分析,再由经典Ho-Kalman算法得到平衡实现;结合Ho-Kalman算法的稳定性结果与马尔可夫参数的样本复杂度结果,确定学习系统平衡实现所需的数据量。

详情
Comments
Version 2 has two improvements: First, paper now uses spectral radius rather than largest singular value hence applies to a larger class of systems. Secondly, new sample complexity bounds are provided for approximating the system's Hankel operator via estimated Markov parameters
AI中文摘要

我们考虑从输入输出数据中学习线性时不变(LTI)动态系统的一个实现的问题。给定单条输入输出轨迹,我们对系统马尔可夫参数的学习给出有限时间分析,并由这些参数通过经典Ho-Kalman算法得到平衡实现。通过证明Ho-Kalman算法的稳定性结果,并将其与马尔可夫参数的样本复杂度结果相结合,我们给出了以高概率在给定精度内学习系统平衡实现所需的数据量。

英文摘要

We consider the problem of learning a realization for a linear time-invariant (LTI) dynamical system from input/output data. Given a single input/output trajectory, we provide finite time analysis for learning the system's Markov parameters, from which a balanced realization is obtained using the classical Ho-Kalman algorithm. By proving a stability result for the Ho-Kalman algorithm and combining it with the sample complexity results for Markov parameters, we show how much data is needed to learn a balanced realization of the system up to a desired accuracy with high probability.

1803.06443 2026-06-04 cs.LG cs.DC cs.SY eess.SY stat.ML

Communication Compression for Decentralized Training

去中心化训练中的通信压缩

Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu

AI总结 本文研究了在高延迟和低带宽网络中结合通信压缩与去中心化技术以实现鲁棒训练系统的问题,提出了两种新的压缩策略并证明了其收敛性。

详情
AI中文摘要

优化分布式学习系统是平衡计算与通信的艺术。已有两种研究方向试图解决网络速度慢的问题:通信压缩用于低带宽网络,去中心化用于高延迟网络。本文探讨了一个自然问题:能否将这两种技术结合,使系统同时鲁棒于带宽和延迟?尽管这种组合的系统影响是显而易见的,但其理论原理和算法设计却极具挑战性:与集中式算法不同,简单地在去中心化网络中压缩交换信息,即使以无偏随机方式,也会累积误差并导致无法收敛。本文提出了一种压缩的去中心化训练框架,并提出了两种不同的策略,分别称为 extrapolation compression 和 difference compression。我们分析了这两种算法并证明了它们以 $O(1/\sqrt{nT})$ 的速率收敛,其中 $n$ 是工作者数量,$T$ 是迭代次数,与全精度集中式训练的收敛速率相匹配。我们验证了我们的算法,并发现对于同时具有高延迟和低带宽的网络,我们的算法显著优于仅去中心化或仅量化算法。

英文摘要

Optimizing distributed learning systems is an art of balancing between computation and communication. There have been two lines of research that try to deal with slower networks: communication compression for low bandwidth networks, and decentralization for high latency networks. In this paper, we explore a natural question: can the combination of both techniques lead to a system that is robust to both bandwidth and latency? Although the system implication of such a combination is trivial, the underlying theoretical principle and algorithm design are challenging: unlike in centralized algorithms, simply compressing the exchanged information within the decentralized network, even in an unbiased stochastic way, would accumulate error and fail to converge. In this paper, we develop a framework of compressed, decentralized training and propose two different strategies, which we call extrapolation compression and difference compression. We analyze both algorithms and prove that both converge at the rate of $O(1/\sqrt{nT})$, where $n$ is the number of workers and $T$ is the number of iterations, matching the convergence rate for full precision, centralized training. We validate our algorithms and find that our proposed algorithms significantly outperform the best of the merely decentralized and merely quantized algorithms for networks with both high latency and low bandwidth.

1508.01132 2026-06-04 math.NA cs.NA stat.CO

Ensemble Transport Adaptive Importance Sampling

集合输运自适应重要性采样

Colin Cotter, Simon Cotter, Paul Russell

AI总结 本文提出了一种基于粒子集合的自适应重要性采样方法:利用粒子集合构造重要性采样提议分布,并结合最先进的集合输运(ensemble transport)重采样方法生成等权样本。该方法在低维问题中优于采用等效提议分布的MCMC方法,且收敛速度随集合成员数量的提升优于线性。此外,本文还引入了新的重采样策略多项式变换(multinomial transformation,MT),并探讨了如何快速调节混合提议分布的算法参数以优化性能。

详情
AI中文摘要

马尔可夫链蒙特卡洛方法是一类强大且常用的数值方法,用于从复杂概率分布中采样。随着这些方法的应用在规模和复杂性上不断增长,对高效方法的需求也随之增加。在本文中,我们提出了一种粒子集合算法。在每次迭代中,利用粒子集合构造一个重要性采样提议分布。从该分布中抽取分层样本并在后验下加权,然后使用最先进的集合输运(ensemble transport)重采样方法生成一个等权样本,供下一次迭代使用。我们证明,这种集合输运自适应重要性采样(ETAIS)方法在低维问题中优于采用等效提议分布的MCMC方法,并且其收敛速度随集合成员数量的提升优于线性。我们还引入了一种新的重采样策略,即多项式变换(multinomial transformation,MT),它虽然不如集合输运重采样器准确,但在集合规模较大时成本显著更低,因而可以与ETAIS结合用于复杂问题。我们还关注如何快速调节与混合提议分布相关的算法参数以优化性能。特别是,我们展示了该方法在多模态问题(例如混合模型推断中出现的问题)上的优越采样能力,以及在似然计算昂贵、需要求解微分方程的问题上的表现,后者实现了数量级的加速。集合的似然评估可以以分布式方式计算,这表明该方法是并行贝叶斯计算的良好候选。

英文摘要

Markov chain Monte Carlo methods are a powerful and commonly used family of numerical methods for sampling from complex probability distributions. As applications of these methods increase in size and complexity, the need for efficient methods increases. In this paper, we present a particle ensemble algorithm. At each iteration, an importance sampling proposal distribution is formed using an ensemble of particles. A stratified sample is taken from this distribution and weighted under the posterior; a state-of-the-art ensemble transport resampling method is then used to create an evenly weighted sample ready for the next iteration. We demonstrate that this ensemble transport adaptive importance sampling (ETAIS) method outperforms MCMC methods with equivalent proposal distributions for low-dimensional problems, and in fact shows better than linear improvements in convergence rates with respect to the number of ensemble members. We also introduce a new resampling strategy, multinomial transformation (MT), which, while not as accurate as the ensemble transport resampler, is substantially less costly for large ensemble sizes, and can then be used in conjunction with ETAIS for complex problems. We also focus on how algorithmic parameters regarding the mixture proposal can be quickly tuned to optimise performance. In particular, we demonstrate this methodology's superior sampling for multimodal problems, such as those arising from inference for mixture models, and for problems with expensive likelihoods requiring the solution of a differential equation, for which speed-ups of orders of magnitude are demonstrated. Likelihood evaluations of the ensemble could be computed in a distributed manner, suggesting that this methodology is a good candidate for parallel Bayesian computations.

1808.05441 2026-06-04 math.NA cs.NA math.OC stat.ME

A comparative study of structural similarity and regularization for joint inverse problems governed by PDEs

PDE约束联合反问题中结构相似性与正则化方法的比较研究

Benjamin Crestel, Georg Stadler, Omar Ghattas

AI总结 本文研究了由PDE正向模型支配的多个参数场的联合反演,比较了若干用于改善参数场重建效果的联合正则化方法(包括结构相似性项),并基于数值实验得出向量总变分方法更可取的结论。

详情
AI中文摘要

联合反演是指根据对由单个或多个正向模型支配的系统的观测,同时推断多个参数场。在许多情况下,这些参数场反映了同一介质的不同属性,因此在空间上相关或在结构上相似。通过联合正则化项对其空间相关性施加先验信息,我们力求使参数场的重建效果优于对每个场单独反演的结果。主要挑战之一是设计一种联合正则化泛函,使其既能体现各参数场之间的空间相关性或结构相似性,又允许对联合反问题使用可扩展且高效的求解器。我们描述了几种出于这些目标而提出的联合正则化:交叉梯度和归一化交叉梯度结构相似性项、向量总变分,以及基于梯度核范数的联合正则化。基于三类具有分片均匀参数场的反问题的数值结果,我们得出结论:向量总变分泛函优于所考虑的其他方法。它不仅在所有实验中都给出了良好的重建结果,还允许对由PDE正向模型支配的联合反问题使用可扩展且高效的求解器。

英文摘要

Joint inversion refers to the simultaneous inference of multiple parameter fields from observations of systems governed by single or multiple forward models. In many cases these parameter fields reflect different attributes of a single medium and are thus spatially correlated or structurally similar. By imposing prior information on their spatial correlations via a joint regularization term, we seek to improve the reconstruction of the parameter fields relative to inversion for each field independently. One of the main challenges is to devise a joint regularization functional that conveys the spatial correlations or structural similarity between the fields while at the same time permitting scalable and efficient solvers for the joint inverse problem. We describe several joint regularizations that are motivated by these goals: a cross-gradient and a normalized cross-gradient structural similarity term, the vectorial total variation, and a joint regularization based on the nuclear norm of the gradients. Based on numerical results from three classes of inverse problems with piecewise-homogeneous parameter fields, we conclude that the vectorial total variation functional is preferable to the other methods considered. Besides resulting in good reconstructions in all experiments, it allows for scalable, efficient solvers for joint inverse problems governed by PDE forward models.

1809.10227 2026-06-04 stat.ME cs.NA math.NA stat.CO

Symmetry Exploits for Bayesian Cubature Methods

贝叶斯求积(Bayesian cubature)方法中的对称性利用

Toni Karvonen, Simo Särkkä, Chris. J. Oates

AI总结 本文研究了如何利用对称性降低贝叶斯求积(Bayesian cubature)的计算成本,在标准贝叶斯求积方法之外,还将对称性的利用扩展到非对称测度、Bayes-Sard求积方法和多输出贝叶斯求积方法。

详情
Comments
Accepted for publication in Statistics and Computing
AI中文摘要

贝叶斯求积(Bayesian cubature)为数值积分提供了一个灵活的框架,可在其中编码并利用关于被积函数的先验知识。与许多经典求积方法相比,这种额外的灵活性带来了计算成本,该成本与被积函数求值次数的立方成正比。最近有研究发现,可以利用完全对称的点集来降低标准贝叶斯求积方法的计算成本,在某些情况下降幅相当可观。本文在贝叶斯求积框架内识别出另外几种利用对称性的方式。特别是,我们超越了早期工作,考虑了非对称测度,并且除标准贝叶斯求积方法外,还给出了针对Bayes-Sard求积方法和多输出贝叶斯求积方法的对称性利用方式。

英文摘要

Bayesian cubature provides a flexible framework for numerical integration, in which a priori knowledge on the integrand can be encoded and exploited. This additional flexibility, compared to many classical cubature methods, comes at a computational cost which is cubic in the number of evaluations of the integrand. It has been recently observed that fully symmetric point sets can be exploited in order to reduce - in some cases substantially - the computational cost of the standard Bayesian cubature method. This work identifies several additional symmetry exploits within the Bayesian cubature framework. In particular, we go beyond earlier work in considering non-symmetric measures and, in addition to the standard Bayesian cubature method, present exploits for the Bayes-Sard cubature method and the multi-output Bayesian cubature method.

1809.08911 2026-06-04 cs.LG cs.CY cs.SY eess.SP eess.SY stat.ML

Understanding Compressive Adversarial Privacy

理解压缩对抗隐私

Xiao Chen, Peter Kairouz, Ram Rajagopal

AI总结 本文提出了一种压缩对抗隐私框架,通过凸优化在数据隐私和效用之间取得平衡,并通过实证应用展示了该框架在保护敏感信息方面的有效性。

详情
Journal ref
2018 IEEE Conference on Decision and Control (CDC)
AI中文摘要

设计一种不牺牲过多隐私的数据共享机制可以被视为数据持有者与恶意攻击者之间的博弈。本文描述了一种压缩对抗隐私框架,该框架捕捉了数据隐私与效用之间的权衡。我们在假设数据持有者和攻击者只能使用线性变换修改数据的情况下,通过凸优化确定最优的数据发布机制。随后,我们构建了一个更加现实的数据发布机制,该机制可以依赖于非线性压缩模型,而攻击者则使用神经网络。通过一系列实证应用,我们展示了该框架,即压缩对抗隐私,能够保护敏感信息。

英文摘要

Designing a data sharing mechanism without sacrificing too much privacy can be considered as a game between data holders and malicious attackers. This paper describes a compressive adversarial privacy framework that captures the trade-off between the data privacy and utility. We characterize the optimal data releasing mechanism through convex optimization when assuming that both the data holder and attacker can only modify the data using linear transformations. We then build a more realistic data releasing mechanism that can rely on a nonlinear compression model while the attacker uses a neural network. We demonstrate in a series of empirical applications that this framework, consisting of compressive adversarial privacy, can preserve sensitive information.

1610.05202 2026-06-04 cs.LG cs.AI cs.DC cs.SY eess.SY stat.ML

Decentralized Collaborative Learning of Personalized Models over Networks

网络上个性化模型的去中心化协作学习

Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi

AI总结 本文研究了在协作对等网络中,如何通过与其他具有相似目标的代理通信来改进本地训练模型,提出两种异步 gossip 算法并基于 ADMM 实现去中心化算法。

详情
Comments
To appear in the Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017)
AI中文摘要

我们考虑了一个协作对等网络中的学习代理集合,其中每个代理根据其自身的学习目标学习一个个性化模型。本文研究的问题是:如何通过与其他具有相似目标的代理通信来改进本地训练的模型?我们引入并分析了两种异步 gossip 算法,以完全去中心化的方式运行。我们的第一种方法受标签传播启发,旨在在网络中平滑预训练的本地模型,同时考虑每个代理对其初始模型的置信度。我们的第二种方法中,代理通过基于本地数据集和邻居行为的迭代更新联合学习和传播模型。为了优化这个具有挑战性的目标,我们的去中心化算法基于 ADMM。

英文摘要

We consider a set of learning agents in a collaborative peer-to-peer network, where each agent learns a personalized model according to its own learning objective. The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives? We introduce and analyze two asynchronous gossip algorithms running in a fully decentralized manner. Our first approach, inspired from label propagation, aims to smooth pre-trained local models over the network while accounting for the confidence that each agent has in its initial model. In our second approach, agents jointly learn and propagate their model by making iterative updates based on both their local dataset and the behavior of their neighbors. To optimize this challenging objective, our decentralized algorithm is based on ADMM.

1606.02421 2026-06-04 stat.ML cs.AI cs.DC cs.LG cs.SY eess.SY

Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions

用于成对函数去中心化优化的 gossip 对偶平均法

Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon

AI总结 本文提出了基于对偶平均(dual averaging)的 gossip 算法,用于在去中心化网络中优化成对函数,适用于排序、距离度量学习和图推断等应用,可在同步和异步设置下求解,并在AUC最大化和度量学习问题上展示了其实际价值。

详情
AI中文摘要

在去中心化网络(如传感器网络、联网设备等)中,迫切需要高效的算法来优化全局成本函数,例如利用各计算单元收集的本地数据学习全局模型。本文研究数据点成对函数的去中心化最小化问题,其中数据点分布在一个图的节点上,该图定义了网络的通信拓扑。这一一般性问题在排序、距离度量学习和图推断等方面均有应用。我们提出了基于对偶平均(dual averaging)的新型 gossip 算法,用于在同步和异步设置下求解此类问题。所提出的框架足够灵活,能够处理该优化问题的带约束和带正则化的变体。我们的理论分析表明,所提出的算法保持了集中式对偶平均的收敛速度,仅相差一个加性偏差项。我们还给出了在ROC曲线下面积(AUC)最大化和度量学习问题上的数值模拟,展示了该方法的实际价值。

英文摘要

In decentralized networks (of sensors, connected objects, etc.), there is an important need for efficient algorithms to optimize a global cost function, for instance to learn a global model from the local data collected by each computing unit. In this paper, we address the problem of decentralized minimization of pairwise functions of the data points, where these points are distributed over the nodes of a graph defining the communication topology of the network. This general problem finds applications in ranking, distance metric learning and graph inference, among others. We propose new gossip algorithms based on dual averaging which aim at solving such problems both in synchronous and asynchronous settings. The proposed framework is flexible enough to deal with constrained and regularized variants of the optimization problem. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. We present numerical simulations on Area Under the ROC Curve (AUC) maximization and metric learning problems which illustrate the practical interest of our approach.

1511.05464 2026-06-04 stat.ML cs.DC cs.LG cs.SY eess.SY stat.CO

Extending Gossip Algorithms to Distributed Estimation of U-Statistics

将 gossip 算法扩展到分布式 U-统计量估计

Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon

AI总结 本文提出新的同步和异步随机 gossip 算法,用于在分布式网络中同时传播数据并维护局部的 U-统计量估计,证明了同步和异步情况下的收敛率分别为 O(1/t) 和 O(log t / t),并通过数值实验验证了算法的优越性。

详情
Comments
to be presented at NIPS 2015
AI中文摘要

高效且稳健的去中心化网络估计算法对于许多分布式系统至关重要。尽管样本均值统计的分布式估计已受到广泛关注,但依赖于对观测对的更昂贵平均的 U-统计量计算却是一个研究较少的领域。然而,这些数据函数对于描述统计总体的全局特性至关重要,重要例子包括曲线下面积、经验方差、基尼均差和簇内点散度。本文提出新的同步和异步随机 gossip 算法,同时在网络中传播数据并维护感兴趣的 U-统计量的局部估计。我们建立了同步和异步情况下的收敛率界分别为 O(1/t) 和 O(log t / t),其中 t 是迭代次数,且具有明确的数据和网络依赖项。除了在速率分析方面的优越比较外,数值实验还提供了实证证据,证明所提出的算法优于之前引入的方法。

英文摘要

Efficient and robust algorithms for decentralized estimation in networks are essential to many distributed systems. Whereas distributed estimation of sample mean statistics has been the subject of a good deal of attention, computation of $U$-statistics, relying on more expensive averaging over pairs of observations, is a less investigated area. Yet, such data functionals are essential to describe global properties of a statistical population, with important examples including Area Under the Curve, empirical variance, Gini mean difference and within-cluster point scatter. This paper proposes new synchronous and asynchronous randomized gossip algorithms which simultaneously propagate data across the network and maintain local estimates of the $U$-statistic of interest. We establish convergence rate bounds of $O(1/t)$ and $O(\log t / t)$ for the synchronous and asynchronous cases respectively, where $t$ is the number of iterations, with explicit data and network dependent terms. Beyond favorable comparisons in terms of rate analysis, numerical experiments provide empirical evidence that the proposed algorithms surpass the previously introduced approach.
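
As a point of reference for the quantities named in this abstract, the following is a minimal centralized sketch of a degree-two $U$-statistic: the average of a symmetric kernel over all unordered pairs of observations. It is not the gossip algorithm of the paper; it only shows the target value that decentralized estimates would approximate. The function names and the sample `data` are assumptions made for illustration.

```python
from itertools import combinations


def u_statistic(sample, kernel):
    """Average a symmetric two-argument kernel over all unordered pairs."""
    pairs = list(combinations(sample, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def gini_kernel(x, y):
    # Kernel of the Gini mean difference.
    return abs(x - y)


def variance_kernel(x, y):
    # Kernel whose U-statistic is the unbiased empirical variance.
    return 0.5 * (x - y) ** 2


# Hypothetical sample, used only to exercise the functions.
data = [1.0, 2.0, 4.0, 7.0]
gini_mean_difference = u_statistic(data, gini_kernel)
empirical_variance = u_statistic(data, variance_kernel)
```

The pairwise average costs a number of kernel evaluations that is quadratic in the sample size, which is the "more expensive averaging over pairs" the abstract refers to.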

1811.04685 2026-06-04 eess.SY cs.SY math.ST stat.TH

Joint Probability Distribution of Prediction Errors of ARIMA

ARIMA预测误差的联合概率分布

Xin Qin, Jyotirmoy V. Deshmukh

AI总结 本文提出了一种基于ARIMA模型计算多步预测误差联合概率分布的方法,适用于平稳过程和内蕴平稳过程下的单变量与多变量情形。

详情
Comments
Revised notations, 16 pages
AI中文摘要

对预测信号的若干步是否遵循由时序逻辑定义的行为给出概率保证,在监控中日益重要。本文推导了一种基于自回归积分滑动平均(ARIMA)模型计算多步预测误差联合概率分布的方法。我们涵盖了平稳过程和内蕴平稳过程下的单变量与多变量情形。

英文摘要

Producing probabilistic guarantees that several steps of a predicted signal follow a behavior defined in temporal logic is of rising importance in monitoring. In this paper, we derive a method to compute the joint probability distribution of the prediction errors of multiple steps based on the Autoregressive Integrated Moving Average (ARIMA) model. We cover stationary and intrinsically stationary processes in both the univariate and multivariate settings.
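
To make the notion of a joint distribution of multi-step prediction errors concrete, here is a minimal univariate sketch based on the standard textbook result, not necessarily the derivation used in the paper. It assumes known model coefficients and white-noise innovations with variance `sigma2`; if the innovations are also Gaussian, the errors for horizons 1..H are jointly normal with mean zero and the covariance computed below. The h-step error is written as $e_h = \sum_{s=1}^{h} \psi_{h-s}\,\varepsilon_{t+s}$, where $\psi_j$ are the coefficients of the moving-average expansion. All function and argument names are assumptions for illustration; for an ARIMA model, `ar` is taken to hold the coefficients of the AR polynomial already multiplied by the differencing factor.

```python
def psi_weights(ar, ma, n):
    """First n coefficients of the moving-average expansion, with psi[0] = 1.

    Convention: x_t = sum_i ar[i-1] * x_{t-i} + eps_t + sum_j ma[j-1] * eps_{t-j}.
    """
    psi = [1.0]
    for j in range(1, n):
        value = ma[j - 1] if j <= len(ma) else 0.0
        for i in range(1, min(j, len(ar)) + 1):
            value += ar[i - 1] * psi[j - i]
        psi.append(value)
    return psi


def forecast_error_covariance(ar, ma, sigma2, horizon):
    """Covariance matrix of the prediction errors for steps 1..horizon."""
    psi = psi_weights(ar, ma, horizon)
    cov = [[0.0] * horizon for _ in range(horizon)]
    for h in range(1, horizon + 1):
        for k in range(1, horizon + 1):
            # Errors e_h and e_k share the innovations eps_{t+1}..eps_{t+min(h,k)}.
            shared = min(h, k)
            cov[h - 1][k - 1] = sigma2 * sum(
                psi[h - s] * psi[k - s] for s in range(1, shared + 1)
            )
    return cov


# Random walk, i.e. ARIMA(0,1,0): the expanded AR polynomial has coefficient 1.
random_walk_cov = forecast_error_covariance(ar=[1.0], ma=[], sigma2=2.0, horizon=3)

# Stationary AR(1) with coefficient 0.5 and unit innovation variance.
ar1_cov = forecast_error_covariance(ar=[0.5], ma=[], sigma2=1.0, horizon=2)
```

For the random walk, every weight equals one, so the covariance between the h-step and k-step errors reduces to `sigma2 * min(h, k)`; the off-diagonal entries show why the errors at different horizons cannot be treated as independent.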

1811.04455 2026-06-04 stat.ML cs.LG cs.NA math.NA

Learning with tree-based tensor formats

基于树结构张量格式的学习

Erwan Grelier, Anthony Nouy, Mathilde Chevreuil

AI总结 本文研究了在统计学习设置中,通过经验风险最小化在树结构张量格式的模型类中近似高维函数的问题,提出了一种基于树结构张量格式的模型选择策略和树优化算法,以提高学习的数值稳定性与可靠性。

详情
AI中文摘要

本文关注统计学习设置下高维函数的逼近问题,方法是在树结构张量格式的函数模型类上进行经验风险最小化。这些是特定的秩结构函数类,可以视为具有与树相关的稀疏架构和多线性激活函数的深度神经网络。对于在给定模型类中的学习,我们利用树结构张量格式是多线性模型这一事实,将非线性集合上的风险最小化问题转化为一系列线性模型的学习问题。适当的表示变换可得到数值稳定的学习问题,并使稀疏性得以利用。对于高维问题或只有小数据集可用的情形,选择合适的模型类是一个关键问题。对于给定的树,选择使风险最小的树结构秩元组是一个组合问题。在这里,我们提出了一种秩自适应策略,在实践中可使风险随模型类复杂度的变化具有良好的收敛性。寻找合适的树同样是一个组合问题,它可以与为深度神经网络选择特定稀疏架构相联系。在这里,我们提出了一种随机算法,用于在具有给定元数(arity)的树类上最小化给定函数表示的复杂度,并允许树的拓扑结构发生变化。该树优化算法随后被纳入一种学习方案,该方案依次调整树和相应的树结构秩。与针对非线性模型类的经典学习算法不同,所提出的算法数值稳定、可靠,并且只要求用户具备较低水平的专业知识。

英文摘要

This paper is concerned with the approximation of high-dimensional functions in a statistical learning setting, by empirical risk minimization over model classes of functions in tree-based tensor format. These are particular classes of rank-structured functions that can be seen as deep neural networks with a sparse architecture related to the tree and multilinear activation functions. For learning in a given model class, we exploit the fact that tree-based tensor formats are multilinear models and recast the problem of risk minimization over a nonlinear set into a succession of learning problems with linear models. Suitable changes of representation yield numerically stable learning problems and allow sparsity to be exploited. For high-dimensional problems or when only a small data set is available, the selection of a good model class is a critical issue. For a given tree, the selection of the tuple of tree-based ranks that minimizes the risk is a combinatorial problem. Here, we propose a rank adaptation strategy which provides in practice a good convergence of the risk as a function of the model class complexity. Finding a good tree is also a combinatorial problem, which can be related to the choice of a particular sparse architecture for deep neural networks. Here, we propose a stochastic algorithm for minimizing the complexity of the representation of a given function over a class of trees with a given arity, allowing changes in the topology of the tree. This tree optimization algorithm is then included in a learning scheme that successively adapts the tree and the corresponding tree-based ranks. Contrary to classical learning algorithms for nonlinear model classes, the proposed algorithms are numerically stable, reliable, and require only a low level of expertise from the user.

1805.11572 2026-06-04 cs.CV cs.LG cs.NA math.NA stat.ML

Adversarial Regularizers in Inverse Problems

对抗正则化在反问题中的应用

Sebastian Lunz, Ozan Öktem, Carola-Bibiane Schönlieb

AI总结 本文提出了一种利用神经网络作为正则化函数的新框架,用于解决反问题,该方法通过学习真实图像分布与未正则化重建分布之间的差异来提升反问题求解的性能。

详情
Comments
published at NeurIPS 2018
AI中文摘要

医学成像和计算机视觉中的反问题传统上使用纯粹基于模型的方法来求解,其中变分正则化模型是最流行的方法之一。我们提出了一种将数据驱动方法应用于反问题的新框架,使用神经网络作为正则化泛函。该网络学习区分真实图像的分布与未正则化重建的分布。训练完成后,通过求解相应的变分问题将网络应用于反问题。与其他基于数据的反问题方法不同,该算法即使在只有无监督训练数据可用的情况下也能应用。实验展示了该框架在BSDS数据集上的去噪潜力以及在LIDC数据集上的计算机断层扫描重建潜力。

英文摘要

Inverse problems in medical imaging and computer vision are traditionally solved using purely model-based methods. Among those, variational regularization models are one of the most popular approaches. We propose a new framework for applying data-driven approaches to inverse problems, using a neural network as a regularization functional. The network learns to discriminate between the distribution of ground truth images and the distribution of unregularized reconstructions. Once trained, the network is applied to the inverse problem by solving the corresponding variational problem. Unlike other data-based approaches for inverse problems, the algorithm can be applied even if only unsupervised training data is available. Experiments demonstrate the potential of the framework for denoising on the BSDS dataset and for computed tomography reconstruction on the LIDC dataset.

1710.07747 2026-06-04 stat.ME cs.NA math.NA

Localization for MCMC: sampling high-dimensional posterior distributions with local structure

MCMC的局部化:对具有局部结构的高维后验分布进行采样

Matthias Morzfeld, Xin T. Tong, Youssef M. Marzouk

AI总结 本文研究了如何将数值天气预报中的协方差局部化思想用于贝叶斯反问题中高维后验分布的马尔可夫链蒙特卡洛采样。通过忽略先验精度矩阵和协方差矩阵中较小的非对角元素,并将观测的影响限制在其邻域内,可以对反问题强制实现预期的局部结构。对于线性问题,本文给出了局部化问题的后验矩接近原问题后验矩的条件,解释了关于局部结构的各项假设的物理含义,并讨论了局部问题中的高维性概念,它不同于函数空间MCMC中通常的高维性概念。吉布斯采样器是局部化反问题的自然选择,本文证明了对于局部化的线性问题,其收敛速度与维度无关。非线性问题同样可以通过局部化高效处理,作为示例,本文给出了一种局部化的Metropolis-within-Gibbs采样器,并用若干线性和非线性数值算例展示了局部化在反问题MCMC采样器中的应用。

详情
Comments
33 pages, 5 figures
AI中文摘要

我们研究了如何将数值天气预报中的协方差局部化思想用于贝叶斯反问题中高维后验分布的马尔可夫链蒙特卡洛(MCMC)采样。对反问题进行局部化,是指通过(i)忽略先验精度矩阵和协方差矩阵中较小的非对角元素;以及(ii)将观测的影响限制在其邻域内,来强制实现预期的“局部”结构。对于线性问题,我们可以给出局部化问题的后验矩接近原问题后验矩的条件。我们解释了关于局部结构的各项假设的物理含义,并讨论了局部问题中的高维性概念,它不同于函数空间MCMC中通常的高维性概念。吉布斯采样器是局部化反问题的自然选择,我们证明了对于局部化的线性问题,其收敛速度与维度无关。非线性问题同样可以通过局部化高效处理,作为这些思想的一个简单示例,我们给出了一种局部化的Metropolis-within-Gibbs采样器。若干线性和非线性数值算例展示了局部化在反问题MCMC采样器中的应用。

英文摘要

We investigate how ideas from covariance localization in numerical weather prediction can be used in Markov chain Monte Carlo (MCMC) sampling of high-dimensional posterior distributions arising in Bayesian inverse problems. To localize an inverse problem is to enforce an anticipated "local" structure by (i) neglecting small off-diagonal elements of the prior precision and covariance matrices; and (ii) restricting the influence of observations to their neighborhood. For linear problems we can specify the conditions under which posterior moments of the localized problem are close to those of the original problem. We explain physical interpretations of our assumptions about local structure and discuss the notion of high dimensionality in local problems, which is different from the usual notion of high dimensionality in function space MCMC. The Gibbs sampler is a natural choice of MCMC algorithm for localized inverse problems and we demonstrate that its convergence rate is independent of dimension for localized linear problems. Nonlinear problems can also be tackled efficiently by localization and, as a simple illustration of these ideas, we present a localized Metropolis-within-Gibbs sampler. Several linear and nonlinear numerical examples illustrate localization in the context of MCMC samplers for inverse problems.

1510.06946 2026-06-04 math.ST econ.GN q-fin.EC q-fin.ST stat.TH

Quantile Coherency: A General Measure for Dependence between Cyclical Economic Variables

分位数一致性:一种用于周期性经济变量之间依赖关系的通用度量方法

Jozef Baruník, Tobias Kley

AI总结 本文提出分位数一致性作为一种度量联合分布中频率域内一般依赖结构的通用方法,论证这种依赖关系对于经济时间序列是自然的,但传统分析方法无法察觉。文章定义了捕捉一般依赖结构的估计量,详细分析其渐近性质,并讨论如何对一般非线性过程进行推断。在实证分析中,研究了二元股票市场收益率之间的依赖关系,揭示了金融市场上尾部风险的测量新视角,并提供了一个建模示例,说明应用研究者如何通过分位数一致性来评估时间序列模型。

详情
Comments
paper (49 pages) and online supplement (31 pages), R codes to replicate the figures in the paper are available at https://github.com/tobiaskley/quantile_coherency_replication
AI中文摘要

在本文中,我们介绍分位数一致性作为一种度量联合分布中频率域内一般依赖结构的通用方法,并论证这种依赖关系对于经济时间序列是自然的,但仅使用传统分析方法时会使其变得不可见。我们定义了捕捉一般依赖结构的估计量,提供了对其渐近性质的详细分析,并讨论了如何对一般类别的可能非线性过程进行推断。在实证分析中,我们研究了二元股票市场收益率之间的依赖关系,为金融市场尾部风险的测量提供了新的见解。我们还提供了一个建模示例,以说明应用研究者如何通过分位数一致性来评估时间序列模型。

英文摘要

In this paper, we introduce quantile coherency to measure general dependence structures emerging in the joint distribution in the frequency domain and argue that this type of dependence is natural for economic time series but remains invisible when only the traditional analysis is employed. We define estimators which capture the general dependence structure, provide a detailed analysis of their asymptotic properties and discuss how to conduct inference for a general class of possibly nonlinear processes. In an empirical illustration we examine the dependence of bivariate stock market returns and shed new light on measurement of tail risk in financial markets. We also provide a modelling exercise to illustrate how applied researchers can benefit from using quantile coherency when assessing time series models.

1805.11706 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY stat.ML

Supervised Policy Update for Deep Reinforcement Learning

深度强化学习中的监督策略更新

Quan Vuong, Yiming Zhang, Keith W. Ross

AI总结 本文提出了一种新的样本效率高的方法,称为监督策略更新(SPU),用于深度强化学习。该方法通过当前策略生成的数据,在非参数化的近端策略空间中构建并求解一个约束优化问题,然后利用监督回归将最优的非参数化策略转换为参数化策略,从而生成新的样本。该方法适用于离散和连续动作空间,并能处理多种接近约束。本文展示了如何通过该方法解决自然策略梯度和信任区域策略优化(NPG/TRPO)以及近端策略优化(PPO)问题。SPU的实现比TRPO更简单,在样本效率方面,实验表明SPU在Mujoco模拟机器人任务中优于TRPO,在Atari视频游戏任务中优于PPO。

详情
Comments
Accepted as a conference paper at ICLR 2019
AI中文摘要

我们提出了一种新的样本效率高的方法,称为监督策略更新(SPU),用于深度强化学习。从当前策略生成的数据开始,SPU在非参数化的近端策略空间中构建并求解一个约束优化问题。利用监督回归,它将最优的非参数化策略转换为参数化策略,从而生成新的样本。该方法具有通用性,适用于离散和连续动作空间,并能处理多种接近约束。我们展示了如何通过该方法解决自然策略梯度和信任区域策略优化(NPG/TRPO)以及近端策略优化(PPO)问题。SPU的实现比TRPO更简单。在样本效率方面,我们的广泛实验表明,SPU在Mujoco模拟机器人任务中优于TRPO,在Atari视频游戏任务中优于PPO。

英文摘要

We propose a new sample-efficient methodology, called Supervised Policy Update (SPU), for deep reinforcement learning. Starting with data generated by the current policy, SPU formulates and solves a constrained optimization problem in the non-parameterized proximal policy space. Using supervised regression, it then converts the optimal non-parameterized policy to a parameterized policy, from which it draws new samples. The methodology is general in that it applies to both discrete and continuous action spaces, and can handle a wide variety of proximity constraints for the non-parameterized optimization problem. We show how the Natural Policy Gradient and Trust Region Policy Optimization (NPG/TRPO) problems, and the Proximal Policy Optimization (PPO) problem can be addressed by this methodology. The SPU implementation is much simpler than TRPO. In terms of sample efficiency, our extensive experiments show SPU outperforms TRPO in Mujoco simulated robotic tasks and outperforms PPO in Atari video game tasks.

1812.07810 2026-06-04 cs.LG cs.CR cs.NA math.NA stat.ML

Fast Botnet Detection From Streaming Logs Using Online Lanczos Method

从流日志中快速检测僵尸网络的在线兰茨斯方法

Zheng Chen, Xinli Yu, Chi Zhang, Jin Zhang, Cui Lin, Bo Song, Jianliang Gao, Xiaohua Hu, Wei-Shih Yang, Erjia Yan

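AI summary: This paper proposes a botnet detection method based on an online Lanczos method. It reduces the time complexity of PCA-based detection from cubic to sub-cubic, which enables more accurate and sensitive detection over sliding time windows, and it contributes a generalized online correlation matrix update formula and a new termination condition for the Lanczos iteration.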

详情
AI中文摘要

僵尸网络,作为一种协调的机器人网络,已成为恶意互联网活动的主要平台,如DDOS攻击、点击欺诈、网络爬虫、垃圾/谣言传播等。本文专注于设计和实验一种新的从流Web服务器日志中检测僵尸网络的方法,受到其广泛适用性、实时保护能力、易用性和敏感数据更安全的启发。我们的算法受到主成分分析(PCA)的启发,以捕捉数据中的相关性,我们首次将兰茨斯方法应用于改进基于PCA的僵尸网络检测的时间复杂度,从立方到亚立方,这使我们能够更准确和灵敏地检测滑动时间窗口中的僵尸网络,而不是固定时间窗口。我们贡献了一个通用的在线相关矩阵更新公式,以及基于误差界和对称矩阵非递减特征值的新终止条件。在电子商务网站日志数据集上,实验表明兰茨斯方法在不同时间窗口下的时间成本始终仅为PCA的20%至25%。

英文摘要

A botnet, a group of coordinated bots, is becoming the main platform for malicious Internet activities such as DDoS attacks, click fraud, web scraping, and spam/rumor distribution. This paper focuses on the design and evaluation of a new approach for botnet detection from streaming web server logs, motivated by its wide applicability, real-time protection capability, ease of use, and better security of sensitive data. Our algorithm is inspired by Principal Component Analysis (PCA) to capture correlation in data, and we are the first to recognize and adapt the Lanczos method to improve the time complexity of PCA-based botnet detection from cubic to sub-cubic, which enables us to detect botnets more accurately and sensitively with sliding time windows rather than fixed time windows. We contribute a generalized online correlation matrix update formula and a new termination condition for the Lanczos iteration, based on an error bound and the non-decreasing eigenvalues of symmetric matrices. On our dataset of e-commerce website logs, experiments show that the time cost of the Lanczos method with different time windows is consistently only 20% to 25% of that of PCA.

1812.07410 2026-06-04 cs.LG cs.SY eess.SY stat.ML

An Improved Deep Belief Network Model for Road Safety Analyses

一种改进的深度信念网络模型用于道路安全分析

Guangyuan Pan, Liping Fu, Lalita Thakali, Matthew Muresan, Ming Yu

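AI summary: This paper proposes an improved deep belief network model for crash prediction in road safety analysis, demonstrates its predictive performance in two case studies, and compares it with traditional models.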

详情
Journal ref
Transportation Research Board 97th Annual Meeting, 2018
AI中文摘要

碰撞预测是道路安全分析中的关键组成部分。广泛采用的碰撞预测方法是应用基于回归的技术。底层的校准过程通常耗时较长,需要大量的领域知识和专业知识,无法轻易自动化。本文介绍了一种新的机器学习(ML)方法作为传统技术的替代方案。所提出的ML模型称为正则化深度信念网络,是一种具有两个训练步骤的深度神经网络:首先使用无监督学习算法进行训练,然后通过用第一步训练得到的权重初始化贝叶斯神经网络进行微调。所得模型预计具有改进的预测能力和减少对耗时人工干预的需求。在本文中,我们试图通过两个案例研究展示这种新模型在碰撞预测中的潜力,包括来自加拿大安大略省800公里高速公路401号和其他高速公路的碰撞数据集。我们的目的是展示该ML方法与其他传统模型(包括负二项(NB)模型、核回归(KR)和贝叶斯神经网络(贝叶斯NN))的性能比较。我们还试图解决其他相关问题,如训练数据大小和训练参数的影响。

英文摘要

Crash prediction is a critical component of road safety analyses. A widely adopted approach to crash prediction is the application of regression-based techniques. The underlying calibration process is often time-consuming, requires significant domain knowledge and expertise, and cannot be easily automated. This paper introduces a new machine learning (ML) based approach as an alternative to the traditional techniques. The proposed ML model is called a regularized deep belief network, which is a deep neural network with two training steps: it is first trained using an unsupervised learning algorithm and then fine-tuned by initializing a Bayesian neural network with the trained weights from the first step. The resulting model is expected to have improved prediction power and a reduced need for time-consuming human intervention. In this paper, we attempt to demonstrate the potential of this new model for crash prediction through two case studies, including a collision data set from an 800 km stretch of Highway 401 and other highways in Ontario, Canada. Our intention is to show the performance of this ML approach in comparison to various traditional models, including the negative binomial (NB) model, kernel regression (KR), and the Bayesian neural network (Bayesian NN). We also attempt to address other related issues such as the effect of training data size and training parameters.

1812.06243 2026-06-04 cs.DS cs.LG cs.NA math.NA stat.ML

Algorithmic Theory of ODEs and Sampling from Well-conditioned Logconcave Densities

ODEs的算法理论与从良好条件的logconcave密度采样

Yin Tat Lee, Zhao Song, Santosh S. Vempala

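AI summary: This paper gives a general algorithm for solving multivariate ordinary differential equations whose solutions lie close to the span of a known basis of functions, and uses it to obtain a nearly linear implementation of HMC for a broad class of smooth, strongly logconcave densities, a class that includes the widely used loss function for logistic regression.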

详情
AI中文摘要

从统计学和机器学习中出现的logconcave函数采样已成为研究热点。最近的发展包括对 Langevin 动力学和 Hamiltonian Monte Carlo (HMC) 的分析。虽然这两种方法在足够强的光滑条件下对连续过程有维度无关的界,但所得到的离散算法的复杂度和函数评估次数随维度增长。受此问题启发,本文提出了一种通用算法,用于求解解接近已知基函数张成的多元微分方程。所得到的算法具有多项对数深度和几乎紧致的运行时间——几乎与解的表示大小成线性关系。我们将此应用于采样问题,以获得一个几乎线性的HMC实现,适用于广泛使用的logistic回归损失函数,其迭代次数(并行深度)和梯度评估次数为维度的多项对数(而非之前的多项式)。该类包括最近广泛研究的用于logistic回归的损失函数,其权重矩阵不相干。我们还给出了一个更快的算法,具有多项对数深度,适用于更一般和标准的强凸函数类,其梯度具有Lipschitz连续性。这些结果基于(1)对精确HMC过程的改进收缩界,以及(2)在实现HMC时出现的微分方程解的多项式近似次数的对数界。

英文摘要

Sampling logconcave functions arising in statistics and machine learning has been a subject of intensive study. Recent developments include analyses for Langevin dynamics and Hamiltonian Monte Carlo (HMC). While both approaches have dimension-independent bounds for the underlying continuous processes under sufficiently strong smoothness conditions, the resulting discrete algorithms have complexity and number of function evaluations growing with the dimension. Motivated by this problem, in this paper, we give a general algorithm for solving multivariate ordinary differential equations whose solution is close to the span of a known basis of functions (e.g., polynomials or piecewise polynomials). The resulting algorithm has polylogarithmic depth and essentially tight runtime: it is nearly linear in the size of the representation of the solution. We apply this to the sampling problem to obtain a nearly linear implementation of HMC for a broad class of smooth, strongly logconcave densities, with the number of iterations (parallel depth) and gradient evaluations being polylogarithmic in the dimension (rather than polynomial as in previous work). This class includes the widely-used loss function for logistic regression with incoherent weight matrices and has been the subject of much study recently. We also give a faster algorithm with polylogarithmic depth for the more general and standard class of strongly convex functions with Lipschitz gradient. These results are based on (1) an improved contraction bound for the exact HMC process and (2) logarithmic bounds on the degree of polynomials that approximate solutions of the differential equations arising in implementing HMC.

1808.04580 2026-06-04 cs.LG cs.NA math.NA stat.ML

NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks

NFFT与Krylov方法的结合:用于完全连接网络图拉普拉斯算子的快速矩阵-向量乘法

Dominik Alfke, Daniel Potts, Martin Stoll, Toni Volkmer

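AI summary: This paper uses NFFT-based fast summation to perform fast matrix-vector products with the graph Laplacian of fully connected networks, demonstrates applications to image segmentation and semi-supervised learning, and compares the approach with the Nyström method.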

详情
Comments
28 pages, 9 figures
AI中文摘要

图拉普拉斯算子是数据科学、机器学习和图像处理中的标准工具。对应的矩阵继承了底层网络的复杂结构,并在某些应用中被密集填充。这使得与图拉普拉斯算子的计算,特别是矩阵-向量乘法,成为一个困难的任务。典型应用是计算其若干特征值和特征向量。标准方法在图中节点数量过大时变得不可行。本文提出利用基于非等间距快速傅里叶变换(NFFT)的快速求和方法,以快速执行图拉普拉斯算子的密集矩阵-向量乘法,而无需形成整个矩阵。NFFT算法的巨大灵活性使我们能够将加速乘法嵌入到基于Lanczos的特征值计算程序或迭代线性系统求解器中,并考虑非标准高斯核。我们通过一系列测试问题展示了该方法的可行性,从图像分割到基于图的PDEs的半监督学习。特别是,我们比较了我们的方法与Nyström方法。此外,我们还提出并测试了改进的、混合版本的Nyström方法,该方法内部使用NFFT。

英文摘要

The graph Laplacian is a standard tool in data science, machine learning, and image processing. The corresponding matrix inherits the complex structure of the underlying network and is in certain applications densely populated. This makes computations with the graph Laplacian, in particular matrix-vector products, a hard task. A typical application is the computation of a number of its eigenvalues and eigenvectors. Standard methods become infeasible when the number of nodes in the graph is too large. We propose the use of the fast summation based on the nonequispaced fast Fourier transform (NFFT) to perform the dense matrix-vector product with the graph Laplacian quickly, without ever forming the whole matrix. The enormous flexibility of the NFFT algorithm allows us to embed the accelerated multiplication into Lanczos-based eigenvalue routines or iterative linear system solvers, and even to consider kernels other than the standard Gaussian. We illustrate the feasibility of our approach on a number of test problems from image segmentation to semi-supervised learning based on graph-based PDEs. In particular, we compare our approach with the Nyström method. Moreover, we present and test an enhanced, hybrid version of the Nyström method, which internally uses the NFFT.

1711.04178 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

CUR Decompositions, Similarity Matrices, and Subspace Clustering

CUR分解、相似矩阵与子空间聚类

Akram Aldroubi, Keaton Hamm, Ahmet Bugra Koku, Ali Sekmen

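AI summary: This paper presents a general framework for subspace clustering based on the CUR decomposition. The constructed similarity matrices give exact clustering in the noise-free case, the decomposition yields many distinct similarity matrices for handling noisy data, and two known subspace clustering methods are shown to be derivable from it.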

详情
Comments
Approximately 30 pages. Current version contains improved algorithm and numerical experiments from the previous version
AI中文摘要

本文提出了一种利用CUR分解解决子空间聚类问题的通用框架。CUR分解提供了一种自然方法来构造数据来自未知子空间联盟$\mathscr{U}=\underset{i=1}{\overset{M}\bigcup}S_i$的相似矩阵。由此构造的相似矩阵在无噪声情况下能够实现精确聚类。此外,这种分解还能从给定数据集生成多种不同的相似矩阵,从而具有足够的灵活性以对含噪声数据进行准确聚类。我们还展示了两种已知的子空间聚类方法可以从CUR分解中推导出来。本文还提出了一种基于相似矩阵理论构造的算法,并在合成和真实数据上进行了实验以测试该方法。此外,本文还利用了基于CUR的相似矩阵的改进版本,提供了一种启发式算法用于子空间聚类;该算法在Hopkins155运动分割数据集上的聚类性能目前最佳。

英文摘要

A general framework for solving the subspace clustering problem using the CUR decomposition is presented. The CUR decomposition provides a natural way to construct similarity matrices for data that come from a union of unknown subspaces $\mathscr{U}=\underset{i=1}{\overset{M}\bigcup}S_i$. The similarity matrices thus constructed give the exact clustering in the noise-free case. Additionally, this decomposition gives rise to many distinct similarity matrices from a given set of data, which allow enough flexibility to perform accurate clustering of noisy data. We also show that two known methods for subspace clustering can be derived from the CUR decomposition. An algorithm based on the theoretical construction of similarity matrices is presented, and experiments on synthetic and real data are presented to test the method. Additionally, an adaptation of our CUR based similarity matrices is utilized to provide a heuristic algorithm for subspace clustering; this algorithm yields the best overall performance to date for clustering the Hopkins155 motion segmentation dataset.

1802.08678 2026-06-04 eess.SY cs.LG cs.RO cs.SY stat.ML

Verifying Controllers Against Adversarial Examples with Bayesian Optimization

通过贝叶斯优化验证控制器对抗示例

Shromona Ghosh, Felix Berkenkamp, Gireeja Ranade, Shaz Qadeer, Ashish Kapoor

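AI summary: This paper proposes an active-testing framework based on Bayesian optimization for verifying controller safety. Safety constraints are specified using logic, and the space of behaviors is searched efficiently for adversarial examples that violate the safety specifications.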

详情
Comments
Proc. of the IEEE International Conference on Robotics and Automation, 2018
AI中文摘要

最近强化学习的成功促使开发了用于现实世界机器人的复杂控制器。由于这些机器人被部署在安全关键应用中并与人类交互,确保安全性以避免造成伤害变得至关重要。为此方向的一个初步步骤是测试控制器在仿真中的表现。为了做到这一点,我们需要明确安全的定义,然后高效地搜索所有行为空间以确定其安全性。在本文中,我们提出了一种基于贝叶斯优化的主动测试框架。我们使用逻辑指定安全约束,并利用问题中的结构来测试系统,以发现违反安全规范的对抗示例。这些规范被定义为轨迹上的光滑函数的复杂布尔组合,与强化学习中的奖励函数不同,它们是表达性强且对系统施加硬约束。在我们的框架中,我们利用单个函数的正则性假设,形式化为高斯过程(GP)先验。我们结合这些内容到一个连贯的优化框架中,利用问题结构。所得到的算法能够证明验证复杂的安全规范或找到对抗示例。实验结果表明,所提出的方法能够快速发现对抗示例。

英文摘要

Recent successes in reinforcement learning have led to the development of complex controllers for real-world robots. As these robots are deployed in safety-critical applications and interact with humans, it becomes critical to ensure safety in order to avoid causing harm. A first step in this direction is to test the controllers in simulation. To be able to do this, we need to capture what we mean by safety and then efficiently search the space of all behaviors to see if they are safe. In this paper, we present an active-testing framework based on Bayesian Optimization. We specify safety constraints using logic and exploit structure in the problem in order to test the system for adversarial counterexamples that violate the safety specifications. These specifications are defined as complex boolean combinations of smooth functions on the trajectories and, unlike reward functions in reinforcement learning, are expressive and impose hard constraints on the system. In our framework, we exploit regularity assumptions on individual functions in the form of a Gaussian Process (GP) prior. We combine these into a coherent optimization framework using problem structure. The resulting algorithm is able to provably verify complex safety specifications or alternatively find counterexamples. Experimental results show that the proposed method is able to find adversarial examples quickly.

1711.05188 2026-06-04 math.NA cs.NA stat.ME

Weak convergence of Galerkin approximations for fractional elliptic stochastic PDEs with spatial white noise

分数椭圆随机PDEs中空间白噪声的Galerkin近似的弱收敛性

David Bolin, Kristin Kirchner, Mihály Kovács

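AI summary: This paper studies the numerical approximation of solutions to fractional elliptic stochastic PDEs with spatial white noise, using a finite element discretization in space and a quadrature approximation of an integral representation of the fractional inverse. It derives weak convergence rates and validates the theoretical results numerically.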

详情
Journal ref
BIT Numer. Math. 58 (2018) pp. 881-906
Comments
22 pages, 1 figure
AI中文摘要

考虑在有界域上带有加性空间白噪声的随机偏微分方程解的数值近似。假设微分算子为整数阶椭圆微分算子的分数次幂。通过空间有限元离散和积分表示的分数逆算子的四次近似来近似解。对所得到的近似进行简要分析,特别是针对两次连续弗雷歇可微且二阶导数具有多项式增长的功能泛函,推导出显式的弱收敛率,并证明收敛率中来自随机性的部分比对应的强收敛率翻倍。不同功能泛函的数值实验验证了理论结果。

英文摘要

The numerical approximation of the solution to a stochastic partial differential equation with additive spatial white noise on a bounded domain is considered. The differential operator is assumed to be a fractional power of an integer order elliptic differential operator. The solution is approximated by means of a finite element discretization in space and a quadrature approximation of an integral representation of the fractional inverse from the Dunford-Taylor calculus. For the resulting approximation, a concise analysis of the weak error is performed. Specifically, for the class of twice continuously Fréchet differentiable functionals with second derivatives of polynomial growth, an explicit rate of weak convergence is derived, and it is shown that the component of the convergence rate stemming from the stochasticity is doubled compared to the corresponding strong rate. Numerical experiments for different functionals validate the theoretical results.
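The quadrature step can be illustrated in the scalar case, as a hedged sketch rather than the authors' scheme. For a positive number a (standing in for the elliptic operator) and 0 < β < 1, the integral representation a^(-β) = (2 sin(πβ)/π) ∫ e^(2βy) / (1 + e^(2y) a) dy over the real line can be approximated by an equally spaced sum. The specific quadrature, the step size `k`, the truncation `n_terms`, and the name `frac_inverse_power` are assumptions for illustration; the abstract does not specify them.

```python
import math


def frac_inverse_power(a, beta, k=0.2, n_terms=100):
    """Approximate a**(-beta) for a > 0 and 0 < beta < 1.

    Uses an equally spaced (sinc-type) sum for the integral
    a**(-beta) = (2 sin(pi beta) / pi) * int e^{2 beta y} / (1 + e^{2 y} a) dy.
    The step k and truncation n_terms are illustrative choices.
    """
    total = 0.0
    for l in range(-n_terms, n_terms + 1):
        y = l * k
        total += math.exp(2.0 * beta * y) / (1.0 + math.exp(2.0 * y) * a)
    return 2.0 * k * math.sin(math.pi * beta) / math.pi * total


approx = frac_inverse_power(2.0, 0.5)
exact = 2.0 ** -0.5
```

In an operator setting, each term of the sum would correspond to solving one linear system with the (discretized) operator in place of `a`. The truncation error depends on β, so values of β near 0 or 1 need more terms than the default used here.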

1805.07857 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

Parallel Transport Convolution: A New Tool for Convolutional Neural Networks on Manifolds

平行运输卷积:用于流形上卷积神经网络的新工具

Stefan C. Schonsheck, Bin Dong, Rongjie Lai

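AI summary: This paper proposes parallel transport convolution (PTC), a generalization of convolution to manifolds and their discrete counterparts that preserves compactly supported filters, directionality, and transferability across manifolds, enabling wavelet-like operations and deep convolutional neural networks on curved domains.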

详情
Comments
10 pages
AI中文摘要

卷积在科学和工程中的各种应用中扮演了重要的角色,是卷积神经网络中最关键的操作。近年来,研究者对在曲面域(如流形和图)上推广卷积的兴趣增长,但现有方法无法保持欧几里得卷积的所有理想特性,即紧凑支持滤波器、方向性和跨不同流形的可转移性。本文开发了一种新的卷积操作扩展,称为平行运输卷积(PTC),应用于黎曼流形及其离散对应物。PTC基于平行运输,能够沿流形传输信息并内在保持方向性。PTC允许构建具有紧凑支持的滤波器,并且对流形变形具有鲁棒性。这使得我们能够执行小波样操作,并在曲面域上定义深度卷积神经网络。

英文摘要

Convolution has played a prominent role in various applications in science and engineering for many years. It is the most important operation in convolutional neural networks. There has been growing research interest recently in generalizing convolutions to curved domains such as manifolds and graphs. However, existing approaches cannot preserve all the desirable properties of Euclidean convolutions, namely compactly supported filters, directionality, and transferability across different manifolds. In this paper we develop a new generalization of the convolution operation, referred to as parallel transport convolution (PTC), on Riemannian manifolds and their discrete counterparts. PTC is designed based on parallel transport, which is able to translate information along a manifold and to intrinsically preserve directionality. PTC allows for the construction of compactly supported filters and is also robust to manifold deformations. This enables us to perform wavelet-like operations and to define deep convolutional neural networks on curved domains.

1804.01013 2026-06-04 math.OC cs.RO cs.SY eess.SY stat.ML

Resilient Non-Submodular Maximization over Matroid Constraints

基于Matroid约束的抗扰非子模最大化

Vasileios Tzoumas, Ali Jadbabaie, George J. Pappas

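AI summary: This paper studies matroid-constrained control and sensing problems in the presence of sensor and actuator failures, and proposes a new algorithm with system-wide resiliency, scalability, and provable approximation bounds.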

详情
Comments
arXiv admin note: substantial text overlap with arXiv:1803.07954. Correction on problem statement (Problem 1), and change in authors' info
AI中文摘要

大规模系统的控制和传感导致了不仅在传感器和执行器布置上,而且在调度或可观测性/可控性上都出现组合问题。此类系统设计和实现中的组合约束可以利用一种称为Matroid的结构来捕捉。特别是,Matroid的代数结构可以被用来开发可扩展的算法用于传感器和执行器选择,以及具有可量化近似界。然而,在大规模系统中,传感器和执行器可能失效或可能被(网络-)攻击。本文的目标是关注在存在传感器和执行器失效情况下的Matroid约束问题。一般来说,鲁棒Matroid约束问题在计算上是困难的。与非鲁棒情况(无故障)相反,尽管它们通常涉及单调或子模目标函数,但尚无已知的可扩展近似算法。在本文中,我们提供了第一个算法,其具有以下特性:首先,它实现了系统级鲁棒性,即该算法适用于任何数量的拒绝服务攻击或故障。其次,它是可扩展的,因为我们的算法终止时的运行时间与最先进的非鲁棒Matroid约束优化算法相同。第三,它提供了对系统性能的可证明近似界,因为对于单调目标函数,我们的算法保证了接近最优的解。我们使用单调(不一定子模)集合函数的曲率概念来量化我们的算法的近似性能。最后,我们通过考虑一个控制感知的传感器选择场景,即受传感约束的机器人导航,来支持我们的理论分析。

英文摘要

The control and sensing of large-scale systems result in combinatorial problems not only for sensor and actuator placement but also for scheduling or observability/controllability. Such combinatorial constraints in system design and implementation can be captured using a structure known as matroids. In particular, the algebraic structure of matroids can be exploited to develop scalable algorithms for sensor and actuator selection, along with quantifiable approximation bounds. However, in large-scale systems, sensors and actuators may fail or may be (cyber-)attacked. The objective of this paper is to focus on resilient matroid-constrained problems arising in control and sensing in the presence of sensor and actuator failures. In general, resilient matroid-constrained problems are computationally hard. Contrary to the non-resilient case (with no failures), even though they often involve objective functions that are monotone or submodular, no scalable approximation algorithms are known for their solution. In this paper, we provide the first such algorithm, which also has the following properties: First, it achieves system-wide resiliency, i.e., the algorithm is valid for any number of denial-of-service attacks or failures. Second, it is scalable, as our algorithm terminates with the same running time as state-of-the-art algorithms for (non-resilient) matroid-constrained optimization. Third, it provides provable approximation bounds on the system performance, since for monotone objective functions our algorithm guarantees a solution close to the optimal. We quantify our algorithm's approximation performance using a notion of curvature for monotone (not necessarily submodular) set functions. Finally, we support our theoretical analyses with numerical experiments, by considering a control-aware sensor selection scenario, namely, sensing-constrained robot navigation.

1812.00375 2026-06-04 math.NA cs.NA stat.CO

Ensemble-based implicit sampling for Bayesian inverse problems with non-Gaussian priors

基于集合的隐式采样方法用于具有非高斯先验的贝叶斯反问题

Yuming Ba, Lijian Jiang

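AI summary: This paper proposes an ensemble-based implicit sampling method for Bayesian inverse problems with non-Gaussian priors. It combines the iterative ensemble smoother (IES) with implicit sampling to generate importance samples that build an importance density, avoids explicit computation of the Jacobian and Hessian matrices, and applies the method to inverse problems with non-Gaussian models.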

详情
Comments
27 pages
AI中文摘要

在本文中,我们开发了一种基于集合的隐式采样方法用于贝叶斯反问题。对于贝叶斯推断,迭代集合平滑(IES)和隐式采样被整合以获得重要样本,这些样本构建了重要密度。所提出的方法与重要采样有相似的思想。IES用于近似后验分布的均值和协方差。这提供了最大后验点和海森矩阵的逆,这些是构建隐式映射所必需的。重要样本由隐式映射生成,相应的权重是重要密度与后验密度的比率。在所提出的方法中,我们使用IES的集合样本来找到似然函数的优化解和海森矩阵的逆。这种方法避免了显式计算雅可比矩阵和海森矩阵,这在高维空间中计算成本很高。为了处理非高斯模型,离散余弦变换和高斯混合模型被用来表征非高斯先验。基于集合的隐式采样方法被扩展到非高斯先验,以探索反问题中的未知参数的后验。所提出的方法用于每个高斯模型在高斯混合模型中。所提出的方法显著提高了隐式采样方法的适用性。一些数值示例被呈现以展示所提方法的有效性,应用于地下水流问题和多孔介质中的异常扩散模型的反问题。

英文摘要

In this paper, we develop an ensemble-based implicit sampling method for Bayesian inverse problems. For Bayesian inference, the iterative ensemble smoother (IES) and implicit sampling are integrated to obtain importance ensemble samples, which build an importance density. The proposed method shares a similar idea with importance sampling. IES is used to approximate the mean and covariance of a posterior distribution. This provides the MAP point and the inverse of the Hessian matrix, which are necessary to construct the implicit map in implicit sampling. The importance samples are generated by the implicit map, and the corresponding weights are the ratio between the importance density and the posterior density. In the proposed method, we use the ensemble samples of IES to find the optimization solution of the likelihood function and the inverse of the Hessian matrix. This approach avoids the explicit computation of the Jacobian and Hessian matrices, which is very computationally expensive in high-dimensional spaces. To treat non-Gaussian models, the discrete cosine transform and the Gaussian mixture model are used to characterize the non-Gaussian priors. The ensemble-based implicit sampling method is extended to the non-Gaussian priors for exploring the posterior of unknowns in inverse problems. The proposed method is used for each individual Gaussian model in the Gaussian mixture model. The proposed approach substantially improves the applicability of the implicit sampling method. A few numerical examples are presented to demonstrate the efficacy of the proposed method, with applications to inverse problems for subsurface flow and anomalous diffusion models in porous media.

1812.00173 2026-06-04 stat.CO cs.NA math.NA

A Nonstationary Designer Space-Time Kernel

非平稳设计时空核

Michael McCourt, Gregory Fasshauer, David Kozak

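AI summary: This paper proposes a nonstationary kernel defined only over the half-line so that time is incorporated more naturally, addressing the limitations of stationary covariance structures when the model dynamics are fundamentally asymmetric.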

详情
Comments
5 pages, 2 figures, NIPS 2018 spacetime workshop
AI中文摘要

在空间统计中,克里格模型通常使用平稳协方差结构进行设计;这种平移不变性产生了具有众多有利性质的模型。然而,在模型动态具有根本不对称性的情况下,例如从固定初始轮廓随时间演变的现象建模时,这一假设可能会受到限制。我们提出了一种新的非平稳核,仅在半轴上定义,以更自然地纳入建模过程中的时间因素。

英文摘要

In spatial statistics, kriging models are often designed using a stationary covariance structure; this translation-invariance produces models which have numerous favorable properties. This assumption can be limiting, though, in circumstances where the dynamics of the model have a fundamental asymmetry, such as in modeling phenomena that evolve over time from a fixed initial profile. We propose a new nonstationary kernel which is only defined over the half-line to incorporate time more naturally in the modeling process.

1808.04360 2026-06-04 cs.DS cs.SY eess.SY math.OC stat.AP

Stochastic on-time arrival problem in transit networks

公共交通网络中的随机准时到达问题

Yang Liu, Sebastien Blandin, Samitha Samaranayake

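AI summary: This paper studies the stochastic on-time arrival problem in transit networks where both travel times and waiting times are stochastic. It proposes a network structure suited to a passenger's online decision-making, covering boarding, waiting, and transferring, designs a pseudo-polynomial dynamic programming algorithm, and defines transit line dominance together with techniques to identify it, reducing computation time by up to 90%.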

详情
Comments
29 pages; 12 figures. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/
AI中文摘要

本文考虑了公共交通网络中的随机准时到达问题,其中旅行时间和等待时间均为随机变量。该问题的一个特定挑战是由于交通线路到达顺序未知而导致的组合解空间。我们提出了一种适合乘客在线决策的网络结构,包括登机、等待和换乘。在此框架中,我们设计了一个动态规划算法,其时间复杂度为交通站数量和旅行时间预算的伪多项式,以及交通线路数量的指数,后者在实践中是一个小数字。为了减少搜索空间,我们提出了一种交通线路支配性定义,并提出识别支配性的技术,这在数值实验中将计算时间减少了高达90%。在合成网络和芝加哥公共交通网络上进行了广泛的数值实验。

英文摘要

This article considers the stochastic on-time arrival problem in transit networks where both the travel time and the waiting time for transit services are stochastic. A specific challenge of this problem is the combinatorial solution space due to the unknown ordering of transit line arrivals. We propose a network structure appropriate to the online decision-making of a passenger, including boarding, waiting and transferring. In this framework, we design a dynamic programming algorithm that is pseudo-polynomial in the number of transit stations and travel time budget, and exponential in the number of transit lines at a station, which is a small number in practice. To reduce the search space, we propose a definition of transit line dominance, and techniques to identify dominance, which decrease the computation time by up to 90% in numerical experiments. Extensive numerical experiments are conducted on both a synthetic network and the Chicago transit network.
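To illustrate why such a dynamic program is pseudo-polynomial in the travel time budget, here is a minimal sketch of the classical stochastic on-time arrival recursion on a small graph with integer travel times. It is a simplification under stated assumptions: it does not model waiting, transit line arrival order, or dominance as the paper does, and the function name `sota`, the `edges` layout, and the toy network are hypothetical.

```python
def sota(edges, dest, budget):
    """Max probability of reaching dest within t time units, for t = 0..budget.

    edges maps node -> list of (next_node, {travel_time: probability});
    travel times are assumed to be integers >= 1 (hypothetical input format).
    """
    nodes = set(edges) | {dest} | {j for outs in edges.values() for j, _ in outs}
    u = {i: [0.0] * (budget + 1) for i in nodes}
    u[dest] = [1.0] * (budget + 1)  # already at the destination
    for t in range(1, budget + 1):
        for i in nodes:
            if i == dest:
                continue
            best = 0.0
            for j, dist in edges.get(i, []):
                # Expected success probability if link (i, j) is taken now.
                p = sum(prob * u[j][t - tau] for tau, prob in dist.items() if tau <= t)
                best = max(best, p)
            u[i][t] = best
    return u


# Toy network: a risky direct link A->D versus a reliable detour via B.
edges = {
    "A": [("D", {1: 0.5, 4: 0.5}), ("B", {1: 1.0})],
    "B": [("D", {2: 1.0})],
}
u = sota(edges, "D", 4)
```

The table has one entry per (node, remaining budget) pair, which is where the dependence on the numeric value of the budget comes from. In the toy network the best choice at A changes with the budget: with 3 time units left the detour via B is certain to arrive, while with only 1 or 2 units the risky direct link is the only option with a nonzero chance.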

1804.01526 2026-06-04 cs.LG cs.NA math.NA stat.ML

Training DNNs with Hybrid Block Floating Point

用混合块浮点数训练DNNs

Mario Drumond, Tao Lin, Martin Jaggi, Babak Falsafi

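AI summary: This paper proposes hybrid block floating point (HBFP), which performs all dot products in block floating point and other operations in floating point, matching floating-point accuracy while improving hardware density and throughput.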

详情
Comments
9 pages, 3 figures. Accepted in Neural Information Processing Systems 2018 (NeurIPS 2018)
AI中文摘要

深度神经网络(DNN)的广泛应用催生了持续增长的计算需求,迫使数据中心运营商采用领域专用加速器来训练它们。这些加速器通常使用密集打包的全精度浮点运算以最大化面积性能。持续的研究努力旨在通过用固定点运算替换浮点运算来进一步提高这种性能密度。然而,这些尝试面临的主要障碍是固定点的动态范围狭窄,不足以支持DNN训练的收敛。我们识别出块浮点数(BFP)作为有前途的替代表示,因为它具有宽动态范围,并且能够使大多数DNN运算使用固定点逻辑进行。不幸的是,BFP单独引入了几个限制,使其无法直接应用。在本文中,我们引入了HBFP,一种混合BFP-FP方法,它在BFP中执行所有点积运算,其他运算使用浮点运算。HBFP实现了两全其美:浮点数的高精度和固定点的优越硬件密度。对于广泛的各种模型,我们证明HBFP在保持浮点数精度的同时,能够实现高达8.5倍的吞吐量。

英文摘要

The wide adoption of DNNs has given birth to unrelenting computing requirements, forcing datacenter operators to adopt domain-specific accelerators to train them. These accelerators typically employ densely packed full precision floating-point arithmetic to maximize performance per area. Ongoing research efforts seek to further increase that performance density by replacing floating-point with fixed-point arithmetic. However, a significant roadblock for these attempts has been fixed point's narrow dynamic range, which is insufficient for DNN training convergence. We identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic. Unfortunately, BFP alone introduces several limitations that preclude its direct applicability. In this work, we introduce HBFP, a hybrid BFP-FP approach, which performs all dot products in BFP and other operations in floating point. HBFP delivers the best of both worlds: the high accuracy of floating point at the superior hardware density of fixed point. For a wide variety of models, we show that HBFP matches floating point's accuracy while enabling hardware implementations that deliver up to 8.5x higher throughput.
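The idea behind block floating point can be sketched in a few lines: every value in a block is stored as a small integer mantissa, and the block shares a single exponent, so a dot product reduces to integer multiply-accumulate plus one final scaling. This is a simplified software illustration, not the paper's hardware design; the function names, the 8-bit mantissa width, and the rounding and clamping choices are assumptions.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to signed integer mantissas with one shared exponent."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    shared_exp = e - (mantissa_bits - 1)
    scale = 2.0 ** shared_exp
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to the nearest integer mantissa and clamp to the signed range.
    mantissas = [max(-limit, min(limit, round(x / scale))) for x in block]
    return mantissas, shared_exp


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product using integer multiply-accumulate on BFP blocks."""
    ma, ea = bfp_quantize(a, mantissa_bits)
    mb, eb = bfp_quantize(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(ma, mb))  # fixed-point style accumulation
    return acc * 2.0 ** (ea + eb)  # apply both shared exponents once


a = [0.5, -1.25, 3.0, 0.125]
b = [2.0, 0.75, -0.5, 4.0]
result = bfp_dot(a, b)
```

For these inputs every value is exactly representable with the shared exponent, so the result equals the floating-point dot product. In general, small values in a block dominated by a large one lose precision, which is one reason a hybrid scheme keeps other operations in floating point.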

1505.01668 2026-06-04 cs.MA cs.SY eess.SY stat.AP

Multi-Target Tracking in Distributed Sensor Networks using Particle PHD Filters

在分布式传感器网络中使用粒子PHD滤波器进行多目标跟踪

Mark R. Leonard, Abdelhak M. Zoubir

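AI summary: This paper studies multi-target tracking in distributed sensor networks using particle PHD filters based on random finite sets, which circumvent the data association problem. Performance is evaluated with the OSPA metric and compared with an existing method.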

详情
Comments
27 pages, 6 figures
AI中文摘要

多目标跟踪在民用和军事应用中是一个重要问题。本文研究了分布式传感器网络中的多目标跟踪。数据关联问题在多目标场景中尤为突出,可以通过多种解决方案来处理。我们考虑了基于随机有限集的概率假密度(PHD)滤波器的顺序蒙特卡洛实现。这种方法通过联合估计感兴趣区域内的所有目标,从而避免了数据关联问题。为此,我们开发了扩散粒子PHD滤波器(D-PPHDF)以及一个集中版本,称为多传感器粒子PHD滤波器(MS-PPHDF)。它们的性能在最优子模式分配(OSPA)度量下进行评估,与后验Cramér-Rao下限(PCRLB)的分布式扩展进行基准测试,并与现有分布式PHD粒子滤波器的性能进行比较。此外,还研究了所提出跟踪算法对异常值的鲁棒性以及不同量级的杂波对其性能的影响。

英文摘要

Multi-target tracking is an important problem in civilian and military applications. This paper investigates multi-target tracking in distributed sensor networks. Data association, which arises particularly in multi-object scenarios, can be tackled by various solutions. We consider sequential Monte Carlo implementations of the Probability Hypothesis Density (PHD) filter based on random finite sets. This approach circumvents the data association issue by jointly estimating all targets in the region of interest. To this end, we develop the Diffusion Particle PHD Filter (D-PPHDF) as well as a centralized version, called the Multi-Sensor Particle PHD Filter (MS-PPHDF). Their performance is evaluated in terms of the Optimal Subpattern Assignment (OSPA) metric, benchmarked against a distributed extension of the Posterior Cramér-Rao Lower Bound (PCRLB), and compared to the performance of an existing distributed PHD Particle Filter. Furthermore, the robustness of the proposed tracking algorithms against outliers and their performance with respect to different amounts of clutter are investigated.

1811.11433 2026-06-04 math.NA cs.LG cs.NA stat.ML

Beyond Pham's algorithm for joint diagonalization

超越Pham算法的联合对角化

Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

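AI summary: This paper proposes a new quasi-Newton method for optimizing the joint diagonalization criterion studied by Pham, and shows in experiments on simulated and real datasets that it outperforms Pham's algorithm.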

详情
AI中文摘要

一组矩阵的近似联合对角化问题在于找到一个基底,使得这些矩阵尽可能对角化。这个问题自然出现在多种统计学习任务中,如盲源分离。我们考虑了Pham(2001)在开创性论文中研究的对角化准则,并提出了一种新的拟牛顿方法来优化它。通过在模拟和真实数据集上的数值实验,我们展示了所提出的方法优于Pham算法。一个开源的Python包已发布。

英文摘要

The approximate joint diagonalization of a set of matrices consists in finding a basis in which these matrices are as diagonal as possible. This problem naturally appears in several statistical learning tasks such as blind signal separation. We consider the diagonalization criterion studied in a seminal paper by Pham (2001), and propose a new quasi-Newton method for its optimization. Through numerical experiments on simulated and real datasets, we show that the proposed method outperforms Pham's algorithm. An open source Python package is released.
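As a hedged illustration of what "as diagonal as possible" can mean, the log-likelihood-type criterion commonly attributed to Pham compares, for each positive definite matrix C, the log-determinant of the diagonal of B C B^T with the log-determinant of B C B^T itself. By Hadamard's inequality each term is nonnegative and vanishes exactly when B C B^T is diagonal. The pure-Python helpers, the normalization by 2n, and the toy matrices below are assumptions for illustration, and this is not the released package.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(map(float, row)) for row in A]
    n, d = len(M), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        if M[p][i] == 0.0:
            return 0.0
        if p != i:
            M[i], M[p] = M[p], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return d


def pham_criterion(B, mats):
    """Average of log det diag(B C B^T) - log det(B C B^T), divided by 2."""
    total = 0.0
    for C in mats:
        D = matmul(matmul(B, C), transpose(B))
        log_det_diag = sum(math.log(D[j][j]) for j in range(len(D)))
        total += log_det_diag - math.log(det(D))
    return total / (2 * len(mats))


# Toy data: matrices of the form A diag A^T share the exact diagonalizer inv(A).
A = [[1.0, 1.0], [0.0, 1.0]]
A_inv = [[1.0, -1.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
mats = [matmul(matmul(A, Dg), transpose(A))
        for Dg in ([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, 1.0]])]
at_solution = pham_criterion(A_inv, mats)
at_identity = pham_criterion(identity, mats)
```

On this toy set the criterion is zero at the exact diagonalizer and strictly positive at the identity, so a minimizer over B recovers a joint diagonalizer when one exists.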

1803.00444 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY stat.ML

Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling

通过非参数时空子目标建模实现逆强化学习

Adrian Šošić, Elmar Rueckert, Jan Peters, Abdelhak M. Zoubir, Heinz Koeppl

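AI summary: This paper proposes an inverse reinforcement learning method based on nonparametric spatio-temporal subgoal modeling. Explaining even a single trajectory locally within a context yields a more compact representation of behavior; an implicit intentional model forecasts behavior in unobserved situations, and the framework handles intentions that change over time and applies to active learning scenarios.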

详情
Comments
45 pages, 14 figures; Version 3, published in the Journal of Machine Learning Research
AI中文摘要

逆强化学习(IRL)领域的发展导致了更复杂的推理框架,这些框架放宽了原始建模假设,即观察到的代理行为仅反映单一意图。相反于学习全局行为模型,最近的IRL方法将演示数据分成部分,以考虑不同轨迹可能对应不同意图,例如因为它们由不同领域专家生成。在本工作中,我们进一步采用子目标的直观概念,建立一个前提:即使单条轨迹在特定上下文中局部解释也比全局更高效,从而实现更紧凑的行为表示。基于这一假设,我们构建了代理目标的隐式意图模型,以预测未观察到的情况。结果是一种集成的贝叶斯预测框架,显著优于现有IRL解决方案,并提供与专家计划一致的平滑策略估计。最值得注意的是,我们的框架自然处理代理意图随时间变化的情况,而经典IRL算法失败。此外,由于其概率性质,该模型可以轻松应用于主动学习场景,以指导专家的演示过程。

英文摘要

Advances in the field of inverse reinforcement learning (IRL) have led to sophisticated inference frameworks that relax the original modeling assumption of observing an agent behavior that reflects only a single intention. Instead of learning a global behavioral model, recent IRL methods divide the demonstration data into parts, to account for the fact that different trajectories may correspond to different intentions, e.g., because they were generated by different domain experts. In this work, we go one step further: using the intuitive concept of subgoals, we build upon the premise that even a single trajectory can be explained more efficiently locally within a certain context than globally, enabling a more compact representation of the observed behavior. Based on this assumption, we build an implicit intentional model of the agent's goals to forecast its behavior in unobserved situations. The result is an integrated Bayesian prediction framework that significantly outperforms existing IRL solutions and provides smooth policy estimates consistent with the expert's plan. Most notably, our framework naturally handles situations where the intentions of the agent change over time and classical IRL algorithms fail. In addition, due to its probabilistic nature, the model can be straightforwardly applied in active learning scenarios to guide the demonstration process of the expert.

1811.11259 2026-06-04 cs.LG cs.AI cs.DS cs.SY eess.SY stat.ML

Scaling Configuration of Energy Harvesting Sensors with Reinforcement Learning

基于强化学习的能源收集传感器的扩展配置

Francesco Fraternali, Bharathan Balaji, Rajesh Gupta

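AI summary: This paper uses reinforcement learning to automatically configure the sampling rate of energy harvesting sensors powered by indoor solar panels. By shortening the training phase and reducing computational requirements, it enables fast deployment and scaling to large deployments, maximizing the sensed data without depleting energy storage.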

详情
Journal ref
ENSsys '18: International Workshop on Energy Harvesting & Energy-Neutral Sensing Systems, November 4, 2018, Shenzhen, China
Comments
7 pages, 5 figures
AI中文摘要

随着物联网(IoT)的出现,越来越多的能源收集方法被用于补充或替代电池供电传感器。能源收集传感器需要根据应用、硬件和环境条件进行配置,以最大化其效用。目前,传感器配置要么是手动的,要么基于启发式方法,需要宝贵的领域专业知识。强化学习(RL)是一种有前景的方法,可以自动化配置并高效扩展IoT部署,但尚未在实践中得到应用。我们提出了解决这一差距的解决方案:减少RL的训练阶段,使节点在部署后短时间内即可运行,并减少计算需求以扩展到大规模部署。我们专注于配置基于室内太阳能板的能源收集传感器的采样率。我们基于三个月内从5个传感器节点收集的数据创建了一个模拟器。我们的模拟结果表明,RL可以有效学习能源可用性模式,并配置传感器节点的采样率以在确保不耗尽能源存储的情况下最大化传感数据。通过我们的方法,节点可以在部署的第一天内投入使用。我们还展示了通过使用相似光照条件的节点共享单个策略来减少RL策略数量的可能性。

英文摘要

With the advent of the Internet of Things (IoT), an increasing number of energy harvesting methods are being used to supplement or supplant battery based sensors. Energy harvesting sensors need to be configured according to the application, hardware, and environmental conditions to maximize their usefulness. As of today, the configuration of sensors is either manual or heuristics based, requiring valuable domain expertise. Reinforcement learning (RL) is a promising approach to automate configuration and efficiently scale IoT deployments, but it is not yet adopted in practice. We propose solutions to bridge this gap: reduce the training phase of RL so that nodes are operational within a short time after deployment and reduce the computational requirements to scale to large deployments. We focus on configuration of the sampling rate of indoor solar panel based energy harvesting sensors. We created a simulator based on 3 months of data collected from 5 sensor nodes subject to different lighting conditions. Our simulation results show that RL can effectively learn energy availability patterns and configure the sampling rate of the sensor nodes to maximize the sensing data while ensuring that energy storage is not depleted. The nodes can be operational within the first day by using our methods. We show that it is possible to reduce the number of RL policies by using a single policy for nodes that share similar lighting conditions.

1709.04072 2026-06-04 math.OC cs.NA math.NA stat.ML

A convergence framework for inexact nonconvex and nonsmooth algorithms and its applications to several iterations

一个不精确非凸和非光滑算法的收敛框架及其在若干迭代中的应用

Tao Sun, Hao Jiang, Lizhi Cheng, Wei Zhu

AI总结 本文提出了一种不精确非凸和非光滑算法的收敛框架,通过引入伪充分下降条件和伪相对误差条件,并假设连续性条件,证明在特定可和假设下,生成的序列收敛到目标函数的临界点,并将其应用于几种经典非凸迭代算法,推导了相应的收敛结果。

详情
AI中文摘要

在本文中,我们考虑了一种抽象的不精确非凸和非光滑算法的收敛性。我们为该算法引入了伪充分下降条件和伪相对误差条件,两者均与一个辅助序列相关,并假设连续性条件成立。实际上,许多经典的不精确非凸和非光滑算法都允许这三个条件。在辅助序列的特殊可和假设下,我们证明由一般算法生成的序列在假设Kurdyka-Lojasiewicz性质时收敛到目标函数的临界点。证明的核心在于构建一个新的Lyapunov函数,其连续差提供了一个对算法生成点连续差的界。然后,我们将我们的发现应用于几种经典的非凸迭代算法,并推导了相应的收敛结果。

英文摘要

In this paper, we consider the convergence of an abstract inexact nonconvex and nonsmooth algorithm. We introduce a pseudo sufficient descent condition and a pseudo relative error condition for the algorithm, both related to an auxiliary sequence, and assume that a continuity condition holds. In fact, many classical inexact nonconvex and nonsmooth algorithms satisfy these three conditions. Under a special summability assumption on the auxiliary sequence, we prove that the sequence generated by the general algorithm converges to a critical point of the objective function, provided the objective has the Kurdyka-Łojasiewicz property. The core of the proofs lies in building a new Lyapunov function, whose successive difference provides a bound for the successive difference of the points generated by the algorithm. We then apply our findings to several classical nonconvex iterative algorithms and derive the corresponding convergence results.

1707.02688 2026-06-04 math.NA cs.NA stat.CO

A General Framework for Enhancing Sparsity of Generalized Polynomial Chaos Expansions

增强广义多项式混沌展开稀疏性的通用框架

Xiu Yang, Xiaoliang Wan, Lin Lin, Huan Lei

AI总结 本文提出了一种通用框架,通过交替方向法迭代旋转来识别新的随机变量,从而提高不确定性表示的稀疏性,提高了压缩感知不确定性量化方法的效率和准确性,并展示了该方法在求解随机偏微分方程和高维问题中的有效性。

详情
Comments
Corrected the lemmas in the previous version using perturbation theory of singular value decomposition. arXiv admin note: text overlap with arXiv:1506.04344
AI中文摘要

压缩感知在数据有限时已成为不确定性量化中的强大补充。本文提供了一种通用框架,用于增强以广义多项式混沌展开形式表示的不确定性的稀疏性。我们使用交替方向法来识别新的随机变量集,通过迭代旋转,使得不确定性的新表示更稀疏。因此,我们提高了基于压缩感知的不确定性量化方法的效率和准确性。我们证明了之前开发的用于增强Hermite多项式展开稀疏性的迭代方法是该通用框架的特殊情况。此外,我们使用Legendre和Chebyshev多项式展开来展示该方法的有效性,并应用于求解随机偏微分方程和高维(O(100))问题。

英文摘要

Compressive sensing has become a powerful addition to uncertainty quantification when only limited data is available. In this paper we provide a general framework to enhance the sparsity of the representation of uncertainty in the form of a generalized polynomial chaos expansion. We use the alternating direction method to identify new sets of random variables through iterative rotations such that the new representation of the uncertainty is sparser. Consequently, we increase both the efficiency and accuracy of the compressive sensing-based uncertainty quantification method. We demonstrate that the previously developed iterative method to enhance the sparsity of Hermite polynomial expansions is a special case of this general framework. Moreover, we use Legendre and Chebyshev polynomial expansions to demonstrate the effectiveness of this method with applications in solving stochastic partial differential equations and high-dimensional (O(100)) problems.

1608.07435 2026-06-04 eess.SY cs.SY stat.CO

Skew-t Filter and Smoother with Improved Covariance Matrix Approximation

具有改进协方差矩阵近似的斜t分布滤波器与平滑器

Henri Nurminen, Tohid Ardeshiri, Robert Piché, Fredrik Gustafsson

AI总结 本文提出了一种针对线性离散时间状态空间模型中斜t分布测量噪声的滤波和平滑算法,通过变分贝叶斯后验近似和耦合位置和偏斜变量来减少变分近似的误差,实验表明该方法在准确性和速度上优于之前的鲁棒滤波器和平滑器及其他低复杂度替代方案。

详情
Journal ref
H. Nurminen, T. Ardeshiri, R. Piché, and F. Gustafsson, "Skew-t Filter and Smoother with Improved Covariance Matrix Approximation", IEEE Transactions on Signal Processing, vol. 66, no. 21, pp. 5618-5633, 2018
Comments
14 pages, 15 figures
AI中文摘要

本文提出了针对线性离散时间状态空间模型中具有斜t分布测量噪声的滤波和平滑算法。所提出的算法利用变分贝叶斯后验近似,结合耦合的位置和偏斜变量,以减少变分近似带来的误差。尽管变分更新使用了期望传播算法进行次优更新,但我们的仿真显示,所提出的方法在后验协方差矩阵的近似精度上优于先前提出的变分算法。因此,新的滤波器和平滑器在准确性和速度上均优于先前提出的鲁棒滤波器和平滑器及其他现有低复杂度替代方案。我们基于真实世界导航数据,特别是城市区域的GPS数据,进行了仿真和测试以展示新方法的性能。此外,还概述了所提出算法扩展到测量噪声分布为多元斜t分布的情况。最后,本文还对所提出算法的理论性能界限进行了研究。

英文摘要

Filtering and smoothing algorithms for linear discrete-time state-space models with skew-t-distributed measurement noise are proposed. The algorithms use a variational Bayes based posterior approximation with coupled location and skewness variables to reduce the error caused by the variational approximation. Although the variational update is done suboptimally using an expectation propagation algorithm, our simulations show that the proposed method gives a more accurate approximation of the posterior covariance matrix than an earlier proposed variational algorithm. Consequently, the novel filter and smoother outperform the earlier proposed robust filter and smoother and other existing low-complexity alternatives in accuracy and speed. We present both simulations and tests based on real-world navigation data, in particular GPS data in an urban area, to demonstrate the performance of the novel methods. Moreover, the extension of the proposed algorithms to cover the case where the distribution of the measurement noise is multivariate skew-$t$ is outlined. Finally, the paper presents a study of theoretical performance bounds for the proposed algorithms.

1811.10446 2026-06-04 math.NA cs.NA stat.ME

Non-deterministic inference using random set models: theory, approximation, and sampling method

基于随机集模型的非确定性推断:理论、近似与采样方法

Truong-Vinh Hoang, Hermann G. Matthies

AI总结 本文提出了一种非确定性推断框架,旨在解决用随机集表示不确定性的反问题,通过理论、近似和采样方法统一了区间变量、Dempster-Shafer证据理论中的质量信念函数、可能性理论和概率分布集合等不确定性描述。

详情
AI中文摘要

随机集是随机变量的推广,即一个值集随机变量。随机集理论允许将其他不确定性描述统一起来,如区间变量、Dempster-Shafer证据理论中的质量信念函数、可能性理论以及概率分布集合。本文的目标是开发一种非确定性推断框架,包括理论、近似和采样方法,用于处理用随机集表示不确定性的反问题。所提出的推断方法基于先验和测量诱导随机集的交集,得到后验随机集。该推断方法是Dempster组合规则的扩展,也是贝叶斯推断的推广。直接评估后验随机集可能不切实际。我们通过使用所提出概率分布生成的样本集近似后验随机集。我们使用后验随机集的容量变换密度函数用于此分布。该函数具有特殊性质:它是通过先验随机集的容量变换密度函数的贝叶斯推断得到的后验密度函数。此类所提出概率分布的样本可通过贝叶斯推断框架中开发的方法直接获得。通过这种近似方法,后验随机集的评估变得可行。

英文摘要

A random set is a generalisation of a random variable, i.e. a set-valued random variable. Random set theory allows a unification of other uncertainty descriptions such as interval variables, mass belief functions in the Dempster-Shafer theory of evidence, possibility theory, and sets of probability distributions. The aim of this work is to develop a non-deterministic inference framework, including theory, approximation and sampling method, that deals with inverse problems in which uncertainty is represented using random sets. The proposed inference method yields the posterior random set based on the intersection of the prior and the measurement-induced random sets. This inference method is an extension of Dempster's rule of combination and a generalisation of Bayesian inference. A direct evaluation of the posterior random set might be impractical. We approximate the posterior random set by a random discrete set whose domain is the set of samples generated using a proposed probability distribution. We use the capacity transform density function of the posterior random set for this proposed distribution. This function has a special property: it is the posterior density function yielded by Bayesian inference of the capacity transform density function of the prior random set. The samples of such a proposed probability distribution can be directly obtained using the methods developed in the Bayesian inference framework. With this approximation method, the evaluation of the posterior random set becomes tractable.

1811.10275 2026-06-04 stat.CO cs.LG cs.NA math.NA stat.ML

Rejoinder for "Probabilistic Integration: A Role in Statistical Computation?"

对“概率积分:在统计计算中的作用?”的回应

Francois-Xavier Briol, Chris J. Oates, Mark Girolami, Michael A. Osborne, Dino Sejdinovic

AI总结 本文是对即将发表在《统计科学》上的论文“概率积分:在统计计算中的作用?”的回应。作者感谢了评审员和同事们的帮助,并回应了讨论者提出的问题,探讨了贝叶斯方法在数值分析中的应用及其在统计计算中的作用。

详情
Comments
Accepted to Statistical Science
AI中文摘要

本文是对即将发表在《统计科学》上的论文“概率积分:在统计计算中的作用?”的回应。我们首先感谢评审员和许多同事帮助塑造了这篇论文,感谢编辑选择我们的论文进行讨论,当然还有所有讨论者对他们深入、有见地和建设性的评论。在本回应中,我们回应了讨论者提出的一些观点,并进一步探讨了论文背后的基本问题:(i)贝叶斯思想是否应用于数值分析?(ii)如果应该,此类方法在统计计算中应扮演什么角色?

英文摘要

This article is the rejoinder for the paper "Probabilistic Integration: A Role in Statistical Computation?" to appear in Statistical Science with discussion. We would first like to thank the reviewers and many of our colleagues who helped shape this paper, the editor for selecting our paper for discussion, and of course all of the discussants for their thoughtful, insightful and constructive comments. In this rejoinder, we respond to some of the points raised by the discussants and comment further on the fundamental questions underlying the paper: (i) Should Bayesian ideas be used in numerical analysis? (ii) If so, what role should such approaches have in statistical computation?

1809.01588 2026-06-04 math.NA cs.LG cs.NA math.AG math.ST stat.TH

Learning Paths from Signature Tensors

从签名张量学习路径

Max Pfeffer, Anna Seigal, Bernd Sturmfels

AI总结 本文通过张量分解、代数几何和数值优化方法,研究了张量的矩阵合同问题,并针对随机分析中的逆问题,提出从三阶签名张量恢复路径的方法,建立了路径的可识别性结果。

详情
Comments
22 pages, 3 figures
AI中文摘要

矩阵合同自然地推广到张量情形。我们应用张量分解、代数几何和数值优化的方法来研究这一群作用。给定位于另一个张量轨道上的张量,我们计算将其中一个变换为另一个的矩阵。我们的主要应用是一个来自随机分析的逆问题:从三阶签名张量恢复路径。我们为分段线性路径、多项式路径和通用字典建立了精确和数值的可识别性结果。数值优化被用于从不精确数据中恢复路径。我们还计算了具有给定签名张量的最短路径。

英文摘要

Matrix congruence extends naturally to the setting of tensors. We apply methods from tensor decomposition, algebraic geometry and numerical optimization to this group action. Given a tensor in the orbit of another tensor, we compute a matrix which transforms one to the other. Our primary application is an inverse problem from stochastic analysis: the recovery of paths from their third order signature tensors. We establish identifiability results, both exact and numerical, for piecewise linear paths, polynomial paths, and generic dictionaries. Numerical optimization is applied for recovery from inexact data. We also compute the shortest path with a given signature tensor.

1807.11287 2026-06-04 astro-ph.IM cs.NA math.NA stat.AP

Sparse Bayesian Imaging of Solar Flares

稀疏贝叶斯成像用于太阳耀斑

Federica Sciacchitano, Silvio Lugaro, Alberto Sorrentino

AI总结 本文提出了一种基于贝叶斯方法的稀疏成像技术,用于从NASA RHESSI数据中重建太阳耀斑图像,通过参数化方法和蒙特卡罗算法提高图像质量和参数不确定性量化。

详情
Comments
Accepted in SIAM Journal on Imaging Sciences
AI中文摘要

我们考虑将太阳耀斑成像视为参数化成像问题,其中耀斑被表示为有限数量的几何形状集合。我们建立了一个贝叶斯模型,其中构成图像的物体数量及其形状在事前是未知的。我们使用顺序蒙特卡罗算法来探索相应的后验分布。我们将该方法应用于合成和实验数据,这些数据在RHESSI社区中较为知名。该方法能够重建改进的太阳耀斑图像,并额外提供估计参数的不确定性量化。

英文摘要

We consider imaging of solar flares from NASA RHESSI data as a parametric imaging problem, where flares are represented as a finite collection of geometric shapes. We set up a Bayesian model in which the number of objects forming the image is a priori unknown, as well as their shapes. We use a Sequential Monte Carlo algorithm to explore the corresponding posterior distribution. We apply the method to synthetic and experimental data, largely known in the RHESSI community. The method reconstructs improved images of solar flares, with the additional advantage of providing uncertainty quantification of the estimated parameters.

1811.08101 2026-06-04 math.PR cs.NA math.AP math.NA math.ST stat.TH

Global sensitivity analysis for models described by stochastic differential equations

对由随机微分方程描述的模型进行全局灵敏度分析

Pierre Etoré, Clémentine Prieur, Dang Khoi Pham, Long Li

AI总结 本文研究了由随机微分方程描述的模型中参数不确定性对感兴趣量变异性影响的全局灵敏度分析,采用Sobol指数和多项式混沌展开与随机伽辽金投影方法进行分析。

详情
AI中文摘要

许多数学模型包含输入参数,这些参数并不精确已知。全局灵敏度分析旨在识别那些不确定性对感兴趣量变异性影响最大的参数。用于量化每个输入变量对感兴趣量影响的统计工具之一是Sobol灵敏度指数。在本文中,我们考虑由随机微分方程(SDE)描述的随机模型。我们专注于均值量,定义为感兴趣量相对于Wiener测度的期望值。我们的方法基于感兴趣量的Feynman-Kac表示,从而得到初始问题的参数化偏微分方程(PDE)表示。然后我们使用多项式混沌展开和随机伽辽金投影来处理参数化PDE的不确定性。

英文摘要

Many mathematical models involve input parameters that are not precisely known. Global sensitivity analysis aims to identify the parameters whose uncertainty has the largest impact on the variability of a quantity of interest. Among the statistical tools used to quantify the influence of each input variable on the quantity of interest are the Sobol' sensitivity indices. In this paper, we consider stochastic models described by stochastic differential equations (SDE). We focus the study on mean quantities, defined as the expectation with respect to the Wiener measure of a quantity of interest related to the solution of the SDE itself. Our approach is based on a Feynman-Kac representation of the quantity of interest, from which we get a parametrized partial differential equation (PDE) representation of our initial problem. We then handle the uncertainty on the parametrized PDE using polynomial chaos expansion and a stochastic Galerkin projection.

1804.05678 2026-06-04 math.NA cs.NA math.OC stat.CO

Sparse solutions in optimal control of PDEs with uncertain parameters: the linear case

含不确定参数的PDE最优控制中的稀疏解:线性情形

Chen Li, Georg Stadler

AI总结 本文研究由具有不确定参数的PDEs支配的优化控制问题的稀疏解,提出两种方法,一种是确定性控制优化均值目标,另一种是具有相同稀疏结构的随机控制方法。通过低秩算子近似和误差估计,结合预条件牛顿-共轭梯度法加速IRLS算法,数值实验显示牛顿变体在性能上优于IRLS方法。

详情
Comments
25 pages
AI中文摘要

我们研究由具有不确定系数的PDEs支配的最优控制问题的稀疏解。我们提出两种方法,一种是确定性控制优化均值目标,另一种是具有相同稀疏结构的随机控制方法。在两种方法中,控制不为零的区域可以解释为放置控制装置的最佳位置。本文专注于不确定参数以线性方式进入的线性PDEs。在这些假设下,确定性方法简化为结构已知的问题,因此我们主要研究随机控制方法。通过将逐点平方控制的均值的$L^1$范数纳入目标函数,实现共享稀疏性。我们使用仅在物理空间上定义的范数重加权函数重新表述问题,从而避免使用样本或数值积分来近似随机空间。我们证明将不动点算法应用于范数重加权表述会得到已被充分研究的迭代重加权最小二乘(IRLS)算法的一种变体,并提出一种新的预条件牛顿-共轭梯度法来加速IRLS算法。我们将我们的算法与低秩算子近似结合,并提供截断误差的估计。我们仔细研究了所得到算法的计算复杂性。通过由拉普拉斯和亥姆霍兹方程支配的控制问题进行数值实验,研究最优控制的稀疏结构和求解算法的性能。在这些实验中,牛顿变体明显优于IRLS方法。

英文摘要

We study sparse solutions of optimal control problems governed by PDEs with uncertain coefficients. We propose two formulations, one where the solution is a deterministic control optimizing the mean objective, and a formulation aiming at stochastic controls that share the same sparsity structure. In both formulations, regions where the controls do not vanish can be interpreted as optimal locations for placing control devices. In this paper, we focus on linear PDEs with linearly entering uncertain parameters. Under these assumptions, the deterministic formulation reduces to a problem with known structure, and thus we mainly focus on the stochastic control formulation. Here, shared sparsity is achieved by incorporating the $L^1$-norm of the mean of the pointwise squared controls in the objective. We reformulate the problem using a norm reweighting function that is defined over physical space only and thus helps to avoid approximation of the random space using samples or quadrature. We show that a fixed point algorithm applied to the norm reweighting formulation leads to a variant of the well-studied iterative reweighted least squares (IRLS) algorithm, and we propose a novel preconditioned Newton-conjugate gradient method to speed up the IRLS algorithm. We combine our algorithms with low-rank operator approximations, for which we provide estimates of the truncation error. We carefully examine the computational complexity of the resulting algorithms. The sparsity structure of the optimal controls and the performance of the solution algorithms are studied numerically using control problems governed by the Laplace and Helmholtz equations. In these experiments the Newton variant clearly outperforms the IRLS method.

1712.00232 2026-06-04 math.OC cs.CC cs.MA cs.SY eess.SY stat.ML

Optimal Algorithms for Distributed Optimization

分布式优化的最优算法

César A. Uribe, Soomin Lee, Alexander Gasnikov, Angelia Nedić

AI总结 本文研究了网络中分布式凸优化问题的最优收敛率,通过将网络通信限制建模为一组仿射约束,为四种不同设置提供了最优复杂度界,包括目标函数F(x)强凸且光滑、强凸或光滑、或仅凸的情况。结果表明,对偶问题上的Nesterov加速梯度下降法可以分布式执行,并获得与集中式版本相同的最优速率(相差常数或对数因子),其额外代价与交互矩阵的谱间隙有关。最后,讨论了该设置的一些扩展,如邻近友好(proximal friendly)函数、时变图和条件数的改进。

详情
AI中文摘要

在本文中,我们研究了网络中分布式凸优化问题的最优收敛率。我们将网络通信限制所施加的约束建模为一组仿射约束,并为四种不同的设置提供了最优复杂度界,即目标函数F(x)强凸且光滑,或者强凸或光滑,或者仅仅是凸的。我们的结果表明,对偶问题上的Nesterov加速梯度下降法可以分布式执行,并获得与集中式版本问题相同的最优速率(相差常数或对数因子),其额外代价与交互矩阵的谱间隙有关。最后,我们讨论了该设置的一些扩展,如邻近友好(proximal friendly)函数、时变图以及条件数的改进。

英文摘要

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is strongly convex and smooth, either strongly convex or smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, and improvement of the condition numbers.

1811.05646 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Fast Distribution Grid Line Outage Identification with $μ$PMU

基于μPMU的配电网线路故障快速识别

Yizheng Liao, Yang Weng, Chin-Woo Tan, Ram Rajagopal

AI总结 本文提出基于μPMU的随机时间序列分析的数据驱动故障监测方法,通过功率流分析证明线路故障后电压时间序列的统计特性显著变化,利用最大似然方法直接学习分布参数,实现快速准确的故障识别。

详情
Comments
9 pages
AI中文摘要

随着分布式能源资源(DERs)在城市配电网中的日益集成,由于DER行为的不确定性和复杂性,可靠性问题日益突出。在大规模DER渗透的情况下,依赖客户电话报修和智能电表“last gasp”信号的传统故障检测方法性能有限,因为可再生能源发电机可在线路故障后继续供电,且许多城市电网为网状结构,线路故障不会影响供电。为解决这些问题,我们提出了一种基于微型相量测量单元(μPMU)随机时间序列分析的数据驱动故障监测方法。具体而言,我们通过潮流分析证明,线路故障后电压时间序列的依赖性表现出显著的统计变化。这使得最优变点检测理论适用于借助采样快速且准确的μPMU识别线路故障。然而,现有变点检测方法需要故障后的电压分布,而该分布在配电系统中是未知的。因此,我们设计了一种基于最大似然的方法,直接从μPMU数据中学习分布参数。我们证明,基于估计参数的检测仍能实现最优性能,使其在配电网故障识别中极为实用。仿真结果表明,在八个配电网、共14种含或不含DERs的配置中,使用μPMU数据实现了高度准确的故障识别。

英文摘要

The growing integration of distributed energy resources (DERs) in urban distribution grids raises various reliability issues due to DERs' uncertain and complex behaviors. With large-scale DER penetration, traditional outage detection methods, which rely on customers making phone calls and smart meters' "last gasp" signals, will have limited performance, because renewable generators can supply power after line outages and many urban grids are meshed, so line outages do not affect power supply. To address these drawbacks, we propose a data-driven outage monitoring approach based on stochastic time series analysis of micro phasor measurement unit ($μ$PMU) data. Specifically, we prove via power flow analysis that the dependency of time-series voltage measurements exhibits significant statistical changes after line outages. This makes the theory of optimal change-point detection suitable for identifying line outages via $μ$PMUs with fast and accurate sampling. However, existing change-point detection methods require the post-outage voltage distribution, which is unknown in distribution systems. Therefore, we design a maximum likelihood-based method to directly learn the distribution parameters from $μ$PMU data. We prove that detection based on the estimated parameters still achieves the optimal performance, making it extremely useful for distribution grid outage identification. Simulation results show highly accurate outage identification in eight distribution grids with 14 configurations with and without DERs using $μ$PMU data.
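As a rough illustration of the change-point idea mentioned in this abstract, the following is a minimal CUSUM sketch. It is not the authors' method: it assumes a scalar Gaussian signal whose pre- and post-change means (`mu0`, `mu1`) and standard deviation (`sigma`) are known, whereas the paper learns the post-outage parameters from data. The function name, the threshold, and the toy data are all hypothetical.

```python
def cusum_alarm(samples, mu0, mu1, sigma, threshold):
    """Return the first index where the CUSUM statistic exceeds threshold, else None.

    Assumption for this sketch: samples are scalar and Gaussian, with mean mu0
    before the change and mu1 after it, and a common standard deviation sigma.
    """
    stat = 0.0
    for t, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2) at x.
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma**2
        # CUSUM recursion: accumulate evidence, but never drop below zero.
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return t
    return None


# Toy data (hypothetical): the mean shifts from 0 to 1 at index 10.
data = [0.0] * 10 + [1.0] * 10
alarm = cusum_alarm(data, mu0=0.0, mu1=1.0, sigma=1.0, threshold=2.0)
print(alarm)
```

A larger threshold lowers the false-alarm rate at the price of a longer detection delay, which is the trade-off that optimal change-point detection theory formalizes.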

1811.04006 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Reachability-based safe learning for optimal control problem

基于可达性的安全学习用于最优控制问题

Stanislav Fedorov, Antonio Candelieri

AI总结 本文提出了一种结合系统部分已知状态空间模型和未知动态作为加性有界扰动的安全学习方法,旨在通过安全集选择最优动作以实现目标集,同时在学习过程中更新扰动并提升最优控制的鲁棒性。

详情
AI中文摘要

在本文中,我们寻求一种整合安全性的学习方法,该方法依赖于部分已知的系统状态空间模型,并将未知动态视为加性有界扰动。我们引入了一个框架,用于在存在扰动的情况下安全地学习控制策略。基于已知模型部分,算法可以在满足安全保持条件的情况下,选择最优动作以追求目标集。在一些学习回合后,扰动可以根据现实数据进行更新。为此,对收集的扰动样本进行高斯过程回归。由于现实世界的不稳定性,例如摩擦或导电性随温度的变化,我们期望获得更鲁棒的最优控制问题解决方案。为了评估上述方法,我们选择倒立摆作为基准模型。所提出的算法能够学习到不违反预设安全约束的策略。当将其与探索设置结合时,观察到性能有所提升,从而确保在安全集内学习到最优策略。最后,我们概述了一些超出本文范围的未来研究方向。

英文摘要

In this work we seek an approach to integrate safety into the learning process that relies on a partly known state-space model of the system and regards the unknown dynamics as an additive bounded disturbance. We introduce a framework for safely learning a control strategy for a given system with an additive disturbance. On the basis of the known part of the model and a safe set in which the system can learn safely, the algorithm can choose optimal actions for pursuing the target set as long as the safety-preserving condition is satisfied. After some learning episodes, the disturbance can be updated based on real-world data. To this end, Gaussian Process regression is conducted on the collected disturbance samples. Because real-world conditions are unstable, for example friction or conductivity changing with temperature, we expect to obtain a more robust solution of the optimal control problem. To evaluate the approach described above, we choose an inverted pendulum as a benchmark model. The proposed algorithm manages to learn a policy that does not violate the pre-specified safety constraints. Observed performance improves when an exploration setup is incorporated to make sure that an optimal policy is learned everywhere in the safe set. Finally, we outline some promising directions for future research beyond the scope of this paper.

1811.03621 2026-06-04 cs.HC cs.CV cs.LG cs.SY eess.SY stat.ML

Satyam: Democratizing Groundtruth for Machine Vision

Satyam: 机器视觉领域地面真实数据的民主化

Hang Qiu, Krishna Chintalapudi, Ramesh Govindan

AI总结 本文提出Satyam系统,通过简化流程使非专业人员能够高效收集机器视觉的地面真实数据,从而提升自动驾驶、交通监控和视频监控系统的性能。

详情
AI中文摘要

机器学习的民主化已经催生了用于自动驾驶、交通监控和视频监控的基于机器学习的机器视觉系统。然而,如果不大幅简化为训练和测试这些系统而收集地面真实数据的过程,真正的民主化就无法实现。这种地面真实数据的收集对于确保在不同条件下具有良好的性能是必要的。在本文中,我们提出了Satyam系统的设计和评估,这是一个首创的系统,使非专业人士能够以最小的努力启动机器视觉的地面真实数据收集任务。Satyam利用众包平台Amazon Mechanical Turk,并自动化了地面真实数据收集的几个具有挑战性的方面:创建和启动定制的网页用户界面任务以获取所需的真实数据,在存在作弊者和未经训练的工人的情况下控制结果质量,根据任务复杂性调整价格,过滤作弊者和表现不佳的工人,以及处理工人的报酬。我们通过几种流行的基准视觉数据集验证了Satyam,并展示了通过Satyam获得的真实数据与由训练有素的专家获得的数据相当,并且在用于训练时提供匹配的机器学习性能。

英文摘要

The democratization of machine learning (ML) has led to ML-based machine vision systems for autonomous driving, traffic monitoring, and video surveillance. However, true democratization cannot be achieved without greatly simplifying the process of collecting groundtruth for training and testing these systems. This groundtruth collection is necessary to ensure good performance under varying conditions. In this paper, we present the design and evaluation of Satyam, a first-of-its-kind system that enables a layperson to launch groundtruth collection tasks for machine vision with minimal effort. Satyam leverages a crowdtasking platform, Amazon Mechanical Turk, and automates several challenging aspects of groundtruth collection: creating and launching custom web-UI tasks for obtaining the desired groundtruth, controlling result quality in the face of spammers and untrained workers, adapting prices to match task complexity, filtering spammers and workers with poor performance, and processing worker payments. We validate Satyam using several popular benchmark vision datasets, and demonstrate that groundtruth obtained by Satyam is comparable to that obtained from trained experts and provides matching ML performance when used for training.

1811.02033 2026-06-04 stat.ML cs.LG cs.NA math.AP math.NA

Physics-Informed Generative Adversarial Networks for Stochastic Differential Equations

用于随机微分方程的物理信息生成对抗网络

Liu Yang, Dongkun Zhang, George Em Karniadakis

AI总结 本文提出了一种新的物理信息生成对抗网络(PI-GANs),通过统一解决随机问题中的正向、逆向和混合问题,利用自动微分将物理定律编码到GAN架构中,展示了PI-GANs在高维随机微分方程求解中的准确性和有效性。

详情
AI中文摘要

我们开发了一种新的物理信息生成对抗网络(PI-GANs),以统一的方式解决基于有限数量的散布测量的正向、逆向和混合随机问题。与仅依赖数据训练的常规GANs不同,我们通过自动微分将以随机微分方程(SDEs)形式表示的控制物理定律编码到GANs的架构中。特别地,我们应用了带梯度惩罚的Wasserstein GAN(WGAN-GP),因为其比原始GANs具有更强的稳定性。我们首先测试了WGAN-GP基于稀疏放置传感器同时读取的数据实现来近似不同相关长度的高斯过程的能力。我们获得了良好的生成随机过程对目标过程的近似,即使输入噪声维度与目标随机过程的有效维度不匹配。我们还研究了判别器和生成器的过拟合问题,并发现除了先前报道的判别器过拟合之外,生成器也会出现过拟合。随后,我们考虑了解决需要近似三个随机过程(即解、激励和扩散系数)的椭圆SDEs。我们使用了三个生成器,其中两个是前馈深度神经网络(DNNs),另一个是由SDE诱导的神经网络。根据数据,我们使用一个或多个前馈DNNs作为PI-GANs中的判别器。在此,我们展示了PI-GANs在解决最多30维的SDEs中的准确性和有效性,但原则上,PI-GANs可以处理非常高的维数问题,只要有更多的传感器数据,并且计算成本具有低多项式增长。

英文摘要

We developed a new class of physics-informed generative adversarial networks (PI-GANs) to solve in a unified manner forward, inverse and mixed stochastic problems based on a limited number of scattered measurements. Unlike standard GANs relying only on data for training, here we encoded into the architecture of GANs the governing physical laws in the form of stochastic differential equations (SDEs) using automatic differentiation. In particular, we applied Wasserstein GANs with gradient penalty (WGAN-GP) for its enhanced stability compared to vanilla GANs. We first tested WGAN-GP in approximating Gaussian processes of different correlation lengths based on data realizations collected from simultaneous reads at sparsely placed sensors. We obtained a good approximation of the generated stochastic processes to the target ones even for a mismatch between the input noise dimensionality and the effective dimensionality of the target stochastic processes. We also studied the overfitting issue for both the discriminator and generator, and we found that overfitting occurs also in the generator in addition to the discriminator as previously reported. Subsequently, we considered the solution of elliptic SDEs requiring approximations of three stochastic processes, namely the solution, the forcing, and the diffusion coefficient. We used three generators for the PI-GANs, two of which were feed forward deep neural networks (DNNs) while the other one was the neural network induced by the SDE. Depending on the data, we employed one or multiple feed forward DNNs as the discriminators in PI-GANs. Here, we have demonstrated the accuracy and effectiveness of PI-GANs in solving SDEs for up to 30 dimensions, but in principle, PI-GANs could tackle very high dimensional problems given more sensor data with low-polynomial growth in computational cost.

1704.08999 2026-06-04 math.OC cs.SY eess.SY stat.ML

Distribution System Voltage Control under Uncertainties using Tractable Chance Constraints

基于可处理机会约束的不确定性下配电系统电压控制

Pan Li, Baihong Jin, Dai Wang, Baosen Zhang

AI总结 本文研究了在不确定性下通过可处理的机会约束进行配电系统电压控制,采用了一种考虑每个节点上可再生能源之间任意相关性的机会约束方法,并通过历史样本高效解决该问题,同时证明了该优化问题在多种概率分布下是凸的,比传统每节点机会约束更稳健且计算更高效。

详情
Comments
20 pages, 4 figures. Accepted by IEEE Transactions on Power Systems
AI中文摘要

电压控制在配电网络的运行中起着重要作用,尤其是在分布式能源资源高渗透率的情况下。这些资源引入了显著且快速变化的不确定性。在本文中,我们专注于无功功率补偿以在不确定性存在下控制电压。我们采用了一种考虑每个节点上可再生能源之间任意相关性的机会约束方法。我们展示了如何通过历史样本使用随机准梯度方法高效地解决该问题。我们还证明了该优化问题在广泛的概率分布下是凸的。与传统每节点机会约束相比,我们的方法对不确定性更具鲁棒性且计算上更具可处理性。我们使用标准的IEEE配电测试馈线来展示结果。

英文摘要

Voltage control plays an important role in the operation of electricity distribution networks, especially with high penetration of distributed energy resources. These resources introduce significant and fast varying uncertainties. In this paper, we focus on reactive power compensation to control voltage in the presence of uncertainties. We adopt a chance constraint approach that accounts for arbitrary correlations between renewable resources at each of the buses. We show how the problem can be solved efficiently using historical samples via a stochastic quasi gradient method. We also show that this optimization problem is convex for a wide variety of probabilistic distributions. Compared to conventional per-bus chance constraints, our formulation is more robust to uncertainty and more computationally tractable. We illustrate the results using standard IEEE distribution test feeders.
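To make the notion of a joint (network-wide) chance constraint concrete, here is a small sketch that checks such a constraint empirically on historical samples. It only illustrates the constraint itself, not the stochastic quasi gradient solver described in the abstract. The function name, the per-unit voltage limits, and the sample values are hypothetical.

```python
def joint_chance_satisfied(voltage_samples, v_min, v_max, epsilon):
    """Empirically check P(all bus voltages within [v_min, v_max]) >= 1 - epsilon.

    voltage_samples: list of scenarios, each a list of per-bus voltages.
    A scenario counts as safe only if every bus is within limits, so
    correlations between buses are captured by the samples themselves.
    """
    n_safe = sum(
        all(v_min <= v <= v_max for v in scenario) for scenario in voltage_samples
    )
    fraction = n_safe / len(voltage_samples)
    return fraction, fraction >= 1.0 - epsilon


# Hypothetical per-unit voltages at three buses over four historical scenarios.
samples = [
    [1.00, 1.02, 0.99],
    [1.01, 1.04, 0.97],
    [0.98, 1.06, 1.00],  # second bus exceeds the 1.05 limit in this scenario
    [1.00, 1.01, 0.96],
]
fraction, satisfied = joint_chance_satisfied(samples, 0.95, 1.05, epsilon=0.25)
print(fraction, satisfied)
```

A per-bus formulation would instead bound the violation probability of each bus separately, which does not by itself bound the probability that any bus in the network violates its limits.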

1811.01314 2026-06-04 eess.SY cs.SY math.OC stat.AP

Modeling Traffic Networks Using Integrated Route and Link Data

利用集成的路径和链接数据建模交通网络

Xilei Zhao, James C. Spall

AI总结 本文通过整合Google Maps的路径旅行时间数据和链接交通状况数据,利用最大似然估计技术建模交通网络,提出了一种更精确的旅行时间可靠性估计方法,并通过实证数据验证了其有效性。

详情
Comments
11 pages, 4 figures
AI中文摘要

实时导航服务,如Google Maps和Waze,在日常生活中被广泛使用。这些服务提供了实时交通状况和旅行时间预测的丰富数据资源;然而,它们尚未被完全应用于交通建模。本文旨在利用Google Maps的交通数据,并应用最大似然估计的先进技术来建模交通网络和旅行时间可靠性。本文整合了Google Maps的路径旅行时间数据和链接交通状况数据,以建模交通网络的复杂性。我们然后构建了Fisher信息矩阵,并应用渐近正态性以获得感兴趣网络中随机路线旅行时间估计的概率分布。我们还通过考虑两个层次的不确定性,即路线旅行时间的不确定性及其旅行时间估计的不确定性,推导出旅行时间可靠性。所提出的方法可以提供更真实和精确的旅行时间可靠性估计。该方法应用于巴尔的摩市中心的一个小型网络,其中我们提出了一种链接数据收集策略,并通过遵循该策略提供实证证据以显示数据的独立性。我们还展示了不同路线内的最大似然估计和旅行时间可靠性度量的结果。此外,我们使用不同网络的历史数据来验证这种方法,显示我们的方法相比经验数据的样本均值提供了更准确和精确的估计。

英文摘要

Real-time navigation services, such as Google Maps and Waze, are widely used in daily life. These services provide rich data resources on real-time traffic conditions and travel time predictions; however, they have not been fully applied in transportation modeling. This paper aims to use traffic data from Google Maps and apply cutting-edge techniques in maximum likelihood estimation to model traffic networks and travel time reliability. This paper integrates Google Maps travel time data for routes and traffic condition data for links to model the complexities of traffic networks. We then formulate the Fisher information matrix and apply asymptotic normality to obtain the probability distribution of the travel time estimates for a random route within the network of interest. We also derive the travel time reliability by considering two levels of uncertainties, i.e., the uncertainty of the route's travel time and the uncertainty of its travel time estimates. The proposed method could provide a more realistic and precise travel time reliability estimate. The methodology is applied to a small network in the downtown Baltimore area, where we propose a link data collection strategy and provide empirical evidence to show data independence by following this strategy. We also show results for maximum likelihood estimates and travel time reliability measures for different routes within the network. Furthermore, we use the historical data from a different network to validate this approach, showing our method provides a more accurate and precise estimate compared to the sample mean of the empirical data.

1605.03364 2026-06-04 math.NA cs.LG cs.NA stat.ML

Active Uncertainty Calibration in Bayesian ODE Solvers

在贝叶斯微分方程求解器中的主动不确定性校准

Hans Kersting, Philipp Hennig

AI总结 本文研究了如何在贝叶斯微分方程求解器中平衡计算成本与概率校准,提出了一种基于过滤的方法Bayesian Quadrature filtering (BQF),通过主动学习梯度测量的不精确性来提高不确定性校准。

详情
Journal ref
Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence (UAI2016) 309--318
Comments
10 pages, 3 figures, published at UAI 2016. Changes for Version 3: fixed minor index mistake in equation (14) (q-1-i instead of q+1-i on top of the product)
AI中文摘要

在统计学和机器学习中,对微分方程(ODEs)求解器的兴趣正在重新增长,这些求解器返回概率测度而非点估计。最近,Conrad等人引入了一种基于采样的方法类,这些方法在特定意义上是'well-calibrated'的。但是,这些方法的计算成本显著高于经典方法。另一方面,Schober等人指出经典Runge-Kutta ODE求解器与高斯滤波器之间存在精确的联系,这只能提供粗糙的概率校准,但计算开销可忽略不计。通过将ODE的解视为线性高斯SDE中的近似推断,我们研究了一类概率ODE求解器,这些求解器在计算成本和概率校准之间取得了平衡,并识别出不准确的梯度测量是不确定性的关键来源。我们提出了一种新的基于过滤的方法Bayesian Quadrature filtering (BQF),该方法利用贝叶斯二次法主动学习梯度测量的不精确性,通过收集多个梯度评估来提高不确定性校准。

英文摘要

There is resurging interest, in statistics and machine learning, in solvers for ordinary differential equations (ODEs) that return probability measures instead of point estimates. Recently, Conrad et al. introduced a sampling-based class of methods that are 'well-calibrated' in a specific sense. But the computational cost of these methods is significantly above that of classic methods. On the other hand, Schober et al. pointed out a precise connection between classic Runge-Kutta ODE solvers and Gaussian filters, which gives only a rough probabilistic calibration, but at negligible cost overhead. By formulating the solution of ODEs as approximate inference in linear Gaussian SDEs, we investigate a range of probabilistic ODE solvers that bridge the trade-off between computational cost and probabilistic calibration, and identify the inaccurate gradient measurement as the crucial source of uncertainty. We propose the novel filtering-based method Bayesian Quadrature filtering (BQF), which uses Bayesian quadrature to actively learn the imprecision in the gradient measurement by collecting multiple gradient evaluations.

1811.00641 2026-06-04 cs.LG cs.CL cs.NA math.NA stat.ML

Online Embedding Compression for Text Classification using Low Rank Matrix Factorization

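Anish Acharya, Rahul Goel, Angeliki Metallinou, Inderjit Dhillon

AI summary: This paper proposes an online embedding compression method that uses low-rank matrix factorization to compress the word embedding layer during training, easing the memory bottleneck of NLP models, and then retrains on the downstream task to recover accuracy. Experiments show 90% compression on sentence classification tasks, outperforming alternatives such as fixed-point quantization.

Details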
Comments
Accepted in Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019)
Abstract

Deep learning models have become state of the art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency. We propose a compression method that leverages low rank matrix factorization during training to compress the word embedding layer, which represents the size bottleneck for most NLP models. Our models are trained, compressed and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. Empirically, we show that the proposed method can achieve 90% compression with minimal impact on accuracy for sentence classification tasks, and outperforms alternative methods like fixed-point quantization or offline word embedding compression. We also analyze the inference time and storage space for our method through FLOP calculations, showing that we can compress DNN models by a configurable ratio and regain accuracy loss without introducing additional latency compared to fixed point quantization. Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence classification benchmark.
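
As a rough illustration of the storage arithmetic behind a low-rank embedding layer, the sketch below counts parameters when a V x d embedding matrix is replaced by two factors of shapes V x k and k x d. The vocabulary size, embedding width, and helper names are hypothetical choices made for this example, not values or code from the paper.

```python
def factored_params(vocab_size: int, embed_dim: int, rank: int) -> int:
    """Parameters stored by a rank-`rank` factorization: (V x k) plus (k x d)."""
    return rank * (vocab_size + embed_dim)


def compression(vocab_size: int, embed_dim: int, rank: int) -> float:
    """Fraction of the original V x d parameters saved by the factorization."""
    original = vocab_size * embed_dim
    return 1.0 - factored_params(vocab_size, embed_dim, rank) / original


def max_rank_for(vocab_size: int, embed_dim: int, target: float) -> int:
    """Largest rank whose compression is at least `target` (0 if none qualifies)."""
    rank = 0
    while compression(vocab_size, embed_dim, rank + 1) >= target:
        rank += 1
    return rank


# Hypothetical sizes: 100,000-word vocabulary, 300-dimensional embeddings.
k = max_rank_for(100_000, 300, 0.90)
print(k, compression(100_000, 300, k))
```

Under these assumed sizes, the rank is the single knob that sets the compression ratio, which is one way to read the abstract's "configurable ratio".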

1810.08907 2026-06-04 math.OC cs.LG cs.NA math.CA math.NA stat.ML

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations

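Bin Shi, Simon S. Du, Michael I. Jordan, Weijie J. Su

AI summary: This paper studies the acceleration phenomenon in optimization algorithms through high-resolution differential equations. It proposes a new limiting process that distinguishes Nesterov's accelerated gradient method from Polyak's heavy-ball method, and it reveals convergence properties of NAG-C for non-strongly convex functions.

Details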
Comments
82 pages, 11 figures
Abstract

Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms---Nesterov's accelerated gradient method for strongly convex functions (NAG-SC) and Polyak's heavy-ball method---we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they allow the identification of a term that we refer to as "gradient correction" that is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods. We also use the high-resolution ODE framework to study Nesterov's accelerated gradient method for (non-strongly) convex functions, uncovering a hitherto unknown result---that NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.
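
To make the "gradient correction" concrete, the sketch below writes both methods as a single recursion in x and toggles the one term that separates them. The recursions follow the standard textbook forms of the heavy-ball method and NAG-SC; the diagonal quadratic objective, the step size s = 1/L, the initialisation x_{-1} = x_0, and all function names are illustrative assumptions rather than the paper's setup.

```python
import math


def grad(x, eigs):
    """Gradient of f(x) = 0.5 * sum(eigs[i] * x[i]**2)."""
    return [lam * xi for lam, xi in zip(eigs, x)]


def run(eigs, x0, steps, gradient_correction):
    """Iterate heavy-ball (False) or NAG-SC (True) from x0, with x_{-1} = x0."""
    mu, big_l = min(eigs), max(eigs)
    s = 1.0 / big_l                                   # step size (illustrative choice)
    alpha = (1 - math.sqrt(mu * s)) / (1 + math.sqrt(mu * s))  # momentum coefficient
    x_prev, x = list(x0), list(x0)
    for _ in range(steps):
        g, g_prev = grad(x, eigs), grad(x_prev, eigs)
        x_next = []
        for i in range(len(x)):
            # Shared part: momentum step plus a plain gradient step.
            value = x[i] + alpha * (x[i] - x_prev[i]) - s * g[i]
            if gradient_correction:
                # Extra NAG-SC term: momentum applied to the change in gradient.
                value -= alpha * s * (g[i] - g_prev[i])
            x_next.append(value)
        x_prev, x = x, x_next
    return x


eigs = [1.0, 100.0]                                   # condition number 100
heavy_ball = run(eigs, [1.0, 1.0], 500, gradient_correction=False)
nag_sc = run(eigs, [1.0, 1.0], 500, gradient_correction=True)
```

On this toy quadratic both recursions converge; the point of the sketch is only that the two update rules differ by the single term involving the difference of consecutive gradients.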

1810.12429 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

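Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

AI summary: This paper proposes a new off-policy estimation method that applies importance sampling directly to the stationary state-visitation distributions, avoiding the exploding variance of existing estimators. Its key contribution is a new approach to estimating the density ratio of two stationary distributions, with a closed-form solution derived for the RKHS case.

Details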
Comments
21 pages, 5 figures, NIPS 2018 (spotlight)
Abstract

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique to derive (nearly) unbiased estimators, but is known to suffer from an excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly on the stationary state-visitation distributions to avoid the exploding variance issue faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled from only the behavior distribution. We develop a mini-max loss function for the estimation problem, and derive a closed-form solution for the case of RKHS. We support our method with both theoretical and empirical analyses.

1806.03085 2026-06-04 stat.ML cs.LG cs.NA math.NA

A Stein variational Newton method

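Gianluca Detommaso, Tiangang Cui, Alessio Spantini, Youssef Marzouk, Robert Scheichl

AI summary: This paper accelerates and generalizes Stein variational gradient descent (SVGD) by incorporating second-order information, which yields a Newton-like iteration in function space, and it reports significant computational gains in multiple test cases.

Details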
Journal ref
NIPS 2018
Comments
18 pages, 7 figures
Abstract

Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.
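
For orientation, here is a minimal sketch of the baseline first-order SVGD update that the abstract starts from, in one dimension with a fixed-bandwidth Gaussian kernel. It does not implement the Newton-type method of the paper; the target (a standard normal), the bandwidth, the step size, and the function names are all assumptions chosen for illustration.

```python
import math


def svgd_step(particles, score, step_size=0.1, bandwidth=1.0):
    """One SVGD update in 1-D with kernel k(x, y) = exp(-(x - y)**2 / (2 * bandwidth))."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-(xj - xi) ** 2 / (2.0 * bandwidth))
            grad_k = -(xj - xi) / bandwidth * k    # derivative of k with respect to xj
            phi += k * score(xj) + grad_k          # attraction term + repulsion term
        updated.append(xi + step_size * phi / n)
    return updated


# Target: standard normal, whose score (gradient of the log-density) is -x.
particles = [3.0 + 0.1 * i for i in range(20)]
for _ in range(1000):
    particles = svgd_step(particles, score=lambda x: -x)
mean = sum(particles) / len(particles)
spread = max(particles) - min(particles)
```

The kernel-weighted score pulls particles toward high-density regions, while the kernel-gradient term keeps them from collapsing onto a single point.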

1702.08142 2026-06-04 stat.ME cs.IT cs.NA math.IT math.NA stat.ML

Tensor Balancing on Statistical Manifold

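Mahito Sugiyama, Hiroyuki Nakahara, Koji Tsuda

AI summary: This paper proposes an efficient tensor balancing algorithm. It treats tensors as probability distributions and realizes balancing as a projection on a statistical manifold, obtaining the gradient through the Moebius inversion formula; the resulting Newton method converges quadratically and is substantially faster than existing approaches.

Details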
Comments
19 pages, 5 figures, accepted to the 34th International Conference on Machine Learning (ICML 2017)
Abstract

We solve tensor balancing, rescaling an Nth order nonnegative tensor by multiplying N tensors of order N - 1 so that every fiber sums to one. This generalizes a fundamental process of matrix balancing used to compare matrices in a wide range of applications from biology to economics. We present an efficient balancing algorithm with quadratic convergence using Newton's method and show in numerical experiments that the proposed algorithm is several orders of magnitude faster than existing ones. To theoretically prove the correctness of the algorithm, we model tensors as probability distributions in a statistical manifold and realize tensor balancing as projection onto a submanifold. The key to our algorithm is that the gradient of the manifold, used as a Jacobian matrix in Newton's method, can be analytically obtained using the Moebius inversion formula, an essential tool of combinatorial mathematics. Our model is not limited to tensor balancing, but has a wide applicability as it includes various statistical and machine learning models such as weighted DAGs and Boltzmann machines.
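
For readers unfamiliar with the matrix case that the abstract generalizes, the sketch below balances a positive square matrix with the classic Sinkhorn-Knopp fixed-point iteration. This is the well-known alternating rescaling scheme, not the Newton-based algorithm of the paper, and the example matrix and function name are hypothetical.

```python
def sinkhorn_knopp(matrix, iterations=200):
    """Alternately rescale rows and columns of a positive square matrix."""
    a = [row[:] for row in matrix]              # work on a copy
    n = len(a)
    for _ in range(iterations):
        for i in range(n):                      # make every row sum to one
            row_sum = sum(a[i])
            a[i] = [v / row_sum for v in a[i]]
        for j in range(n):                      # make every column sum to one
            col_sum = sum(a[i][j] for i in range(n))
            for i in range(n):
                a[i][j] /= col_sum
    return a


balanced = sinkhorn_knopp([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
row_sums = [sum(row) for row in balanced]
col_sums = [sum(balanced[i][j] for i in range(3)) for j in range(3)]
```

Each sweep only rescales rows and columns, so the result has the form D1 * A * D2 with diagonal D1 and D2, which is the order-2 instance of the rescaling described in the abstract.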

1608.02740 2026-06-04 econ.GN q-fin.EC stat.ME

Bayesian nonparametric sparse VAR models

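Monica Billio, Roberto Casarin, Luca Rossini

AI summary: This paper proposes a new Bayesian nonparametric Lasso prior (BNP-Lasso) for high-dimensional VAR models to improve estimation efficiency and prediction accuracy. It overcomes overparametrization and overfitting by clustering and shrinking the VAR coefficients, and it is suited to extracting causal networks from time series.

Details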
Comments
Forthcoming in "Journal of Econometrics" ---- Revised Version of the paper "Bayesian nonparametric Seemingly Unrelated Regression Models" ---- Supplementary Material available on request
Abstract

High dimensional vector autoregressive (VAR) models require a large number of parameters to be estimated and may suffer from inferential problems. We propose a new Bayesian nonparametric (BNP) Lasso prior (BNP-Lasso) for high-dimensional VAR models that can improve estimation efficiency and prediction accuracy. Our hierarchical prior overcomes overparametrization and overfitting issues by clustering the VAR coefficients into groups and by shrinking the coefficients of each group toward a common location. Clustering and shrinking effects induced by the BNP-Lasso prior are well suited for the extraction of causal networks from time series, since they account for some stylized facts in real-world networks, namely sparsity, community structures and heterogeneity in edge intensity. In order to fully capture the richness of the data and to achieve a better understanding of financial and macroeconomic risk, it is therefore crucial that the model used to extract networks accounts for these stylized facts.

1810.10078 2026-06-04 stat.ML cs.LG cs.NA math.NA

Model Selection for Nonnegative Matrix Factorization by Support Union Recovery

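Zhaoqiang Liu

AI summary: This paper proposes an algorithm that automatically selects the latent dimensionality for nonnegative matrix factorization by computing an empirical second-order moment and recovering the index set of non-zero rows of a matrix related to it. The algorithm provably detects the true latent dimensionality.

Details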
Abstract

Nonnegative matrix factorization (NMF) has been widely used in machine learning and signal processing because of its non-subtractive, part-based property which enhances interpretability. It is often assumed that the latent dimensionality (or the number of components) is given. Despite the large number of algorithms designed for NMF, there is little literature about automatic model selection for NMF with theoretical guarantees. In this paper, we propose an algorithm that first calculates an empirical second-order moment from the empirical fourth-order cumulant tensor, and then estimates the latent dimensionality by recovering the support union (the index set of non-zero rows) of a matrix related to the empirical second-order moment. By assuming a generative model of the data with additional mild conditions, our algorithm provably detects the true latent dimensionality. We show on synthetic examples that our proposed algorithm is able to find an approximately correct number of components.

1805.08637 2026-06-04 math.NA cs.NA stat.CO

Solvable Integration Problems and Optimal Sample Size Selection

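Robert J. Kunsch, Erich Novak, Daniel Rudolf

AI summary: This paper studies how to compute the integral of a function or the expectation of a random variable at minimal cost. It gives a new algorithm and complexity upper bounds based on i.i.d. samples, selecting the sample size from an estimate of the variance or of a $p$-moment so as to guarantee a small absolute error with high probability, which makes the problem solvable. Lower bounds show that the algorithm is optimal in terms of accuracy, confidence level, and norm of the input random variable. Although the worst-case cost may be infinite, adaptive stopping rules make the expected cost finite.

Details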
Comments
38 pages, to appear in the Journal of Complexity
Abstract

We compute the integral of a function or the expectation of a random variable with minimal cost and use, for our new algorithm and for upper bounds of the complexity, i.i.d. samples. Under certain assumptions it is possible to select a sample size based on a variance estimation, or -- more generally -- based on an estimation of a (central absolute) $p$-moment. That way one can guarantee a small absolute error with high probability; the problem is thus called solvable. The expected cost of the method depends on the $p$-moment of the random variable, which can be arbitrarily large. In order to prove the optimality of our algorithm we also provide lower bounds. These bounds apply not only to methods based on i.i.d. samples but also to general randomized algorithms. They show that -- up to constants -- the cost of the algorithm is optimal in terms of accuracy, confidence level, and norm of the particular input random variable. Since the considered classes of random variables or integrands are very large, the worst case cost would be infinite. Nevertheless one can define adaptive stopping rules such that for each input the expected cost is finite. We contrast these positive results with examples of integration problems that are not solvable.
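
To see why a variance bound translates into a sample size, consider the textbook Chebyshev argument for the plain sample mean of i.i.d. copies $X_1,\dots,X_n$ of $X$. This is only an illustration under the assumption of a known bound $\operatorname{Var}(X)\le\sigma^2$; the paper's algorithm works from estimated moments and is not reproduced here. The symbols $\varepsilon$ (error tolerance) and $\delta$ (allowed failure probability) are introduced for this sketch.

```latex
\[
  \Pr\Bigl(\Bigl|\frac{1}{n}\sum_{i=1}^{n} X_i - \mathbb{E}X\Bigr| \ge \varepsilon\Bigr)
  \;\le\; \frac{\sigma^2}{n\,\varepsilon^2}
  \;\le\; \delta
  \qquad\text{whenever}\qquad
  n \;\ge\; \frac{\sigma^2}{\varepsilon^2\,\delta}.
\]
```

The required $n$ grows with the variance bound, which is consistent with the abstract's remark that the cost depends on a moment of the input and can be arbitrarily large.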

1804.01932 2026-06-04 physics.data-an cs.NA math.NA physics.comp-ph q-bio.QM stat.ME

Density estimation on small datasets

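Wei-Chia Chen, Ammar Tareen, Justin B. Kinney

AI summary: This paper presents a field-theoretic approach for accurately estimating smooth probability distributions from limited sample data, providing a nonparametric Bayesian posterior without tunable parameters or large-data approximations.

Details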
Journal ref
Phys. Rev. Lett. 121, 160605 (2018)
Comments
Includes main text (5 pages, 3 figures) and Supplemental Information (10 pages, 4 figures). Same as version 3 but with Feynman diagrams properly rendered
Abstract

How might a smooth probability distribution be estimated, with accurately quantified uncertainty, from a limited amount of sampled data? Here we describe a field-theoretic approach that addresses this problem remarkably well in one dimension, providing an exact nonparametric Bayesian posterior without relying on tunable parameters or large-data approximations. Strong non-Gaussian constraints, which require a non-perturbative treatment, are found to play a major role in reducing distribution uncertainty. A software implementation of this method is provided.

1810.09365 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Coupled Longitudinal and Lateral Control of a Vehicle using Deep Learning

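Guillaume Devineau, Philip Polack, Florent Altché, Fabien Moutarde

AI summary: This paper examines the potential of deep neural networks to capture key characteristics of vehicle dynamics and to perform coupled longitudinal and lateral control. Two artificial neural networks, a multi-layer perceptron and a convolutional neural network, are trained on a dataset from high-fidelity vehicle dynamics simulations, evaluated on a challenging test track, and compared with conventional decoupled controllers.

Details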
Comments
Published in the IEEE 2018 International Conference on Intelligent Transportation Systems (ITSC 2018)
Abstract

This paper explores the capability of deep neural networks to capture key characteristics of vehicle dynamics, and their ability to perform coupled longitudinal and lateral control of a vehicle. To this end, two different artificial neural networks are trained to compute vehicle controls corresponding to a reference trajectory, using a dataset based on high-fidelity simulations of vehicle dynamics. In this study, control inputs are chosen as the steering angle of the front wheels, and the applied torque on each wheel. The performance of both models, namely a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN), is evaluated based on their ability to drive the vehicle on a challenging test track, shifting between long straight lines and tight curves. A comparison to conventional decoupled controllers on the same track is also provided.

1810.03398 2026-06-04 stat.CO cs.NA math.NA

Probabilistic Linear Solvers: A Unifying View

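Simon Bartels, Jon Cockayne, Ilse C. F. Ipsen, Philipp Hennig

AI summary: This paper presents a unifying framework for probabilistic linear solvers, bringing together different probabilistic approaches to solving linear systems. It also describes connections between probabilistic solvers and projection methods, and it introduces a probabilistic view of preconditioning.

Details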
Abstract

Several recent works have developed a new, probabilistic interpretation for numerical algorithms solving linear systems in which the solution is inferred in a Bayesian framework, either directly or by inferring the unknown action of the matrix inverse. These approaches have typically focused on replicating the behavior of the conjugate gradient method as a prototypical iterative method. In this work surprisingly general conditions for equivalence of these disparate methods are presented. We also describe connections between probabilistic linear solvers and projection methods for linear systems, providing a probabilistic interpretation of a far more general class of iterative methods. In particular, this provides such an interpretation of the generalised minimum residual method. A probabilistic view of preconditioning is also introduced. These developments unify the literature on probabilistic linear solvers, and provide foundational connections to the literature on iterative solvers for linear systems.

1810.06175 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

An Optimal Control Approach to Sequential Machine Teaching

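Laurent Lessard, Xuezhou Zhang, Xiaojin Zhu

AI summary: This paper proposes an optimal-control approach to sequential machine teaching. By formulating the task as a time-optimal control problem, it addresses the search for the shortest training sequence that drives a learning algorithm to a target model, and it demonstrates the approach's advantages in a case study.

Details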
Abstract

Given a sequential learning algorithm and a target model, sequential machine teaching aims to find the shortest training sequence to drive the learning algorithm to the target model. We present the first principled way to find such shortest training sequences. Our key insight is to formulate sequential machine teaching as a time-optimal control problem. This allows us to solve sequential teaching by leveraging key theoretical and computational tools developed over the past 60 years in the optimal control community. Specifically, we study the Pontryagin Maximum Principle, which yields a necessary condition for optimality of a training sequence. We present analytic, structural, and numerical implications of this approach on a case study with a least-squares loss function and gradient descent learner. We compute optimal training sequences for this problem, and although the sequences seem circuitous, we find that they can vastly outperform the best available heuristics for generating training sequences.

1709.00483 2026-06-04 math.NA cs.CV cs.NA math.OC stat.ML

Iteratively Linearized Reweighted Alternating Direction Method of Multipliers for a Class of Nonconvex Problems

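Tao Sun, Hao Jiang, Lizhi Cheng, Wei Zhu

AI summary: This paper proposes an iteratively linearized reweighted alternating direction method of multipliers for nonconvex and nonsmooth problems common in signal processing and machine learning. The method makes every subproblem convex so that it can be solved efficiently, and global convergence of the algorithm is proved.

Details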
Abstract

In this paper, we consider solving a class of nonconvex and nonsmooth problems frequently appearing in signal processing and machine learning research. The traditional alternating direction method of multipliers runs into both mathematical and computational difficulties when solving the nonconvex and nonsmooth subproblem. In view of this, we propose a reweighted alternating direction method of multipliers. In this algorithm, all subproblems are convex and easy to solve. We also provide several guarantees for the convergence and prove that the algorithm globally converges to a critical point of an auxiliary function with the help of the Kurdyka-Łojasiewicz property. Several numerical results are presented to demonstrate the efficiency of the proposed algorithm.

1708.07469 2026-06-04 q-fin.MF cs.NA math.NA q-fin.CP stat.ML

DGM: A deep learning algorithm for solving partial differential equations

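Justin Sirignano, Konstantinos Spiliopoulos

AI summary: This paper proposes DGM, a deep learning algorithm for solving high-dimensional partial differential equations. A neural network approximates the solution and is trained to satisfy the differential operator, initial condition, and boundary conditions, which allows free boundary problems to be solved effectively in high dimensions.

Details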
Comments
Deep learning, machine learning, partial differential equations
Abstract

High-dimensional PDEs have been a longstanding computational challenge. We propose to solve high-dimensional PDEs by approximating the solution with a deep neural network which is trained to satisfy the differential operator, initial condition, and boundary conditions. Our algorithm is meshfree, which is key since meshes become infeasible in higher dimensions. Instead of forming a mesh, the neural network is trained on batches of randomly sampled time and space points. The algorithm is tested on a class of high-dimensional free boundary PDEs, which we are able to accurately solve in up to $200$ dimensions. The algorithm is also tested on a high-dimensional Hamilton-Jacobi-Bellman PDE and Burgers' equation. The deep learning algorithm approximates the general solution to the Burgers' equation for a continuum of different boundary conditions and physical conditions (which can be viewed as a high-dimensional space). We call the algorithm a "Deep Galerkin Method (DGM)" since it is similar in spirit to Galerkin methods, with the solution approximated by a neural network instead of a linear combination of basis functions. In addition, we prove a theorem regarding the approximation power of neural networks for a class of quasilinear parabolic PDEs.

1810.04859 2026-06-04 cs.IT cs.AI cs.LG cs.SY eess.SY math.IT math.ST stat.TH

Policy Design for Active Sequential Hypothesis Testing using Deep Learning

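Dhruva Kartik, Ekraam Sabir, Urbashi Mitra, Prem Natarajan

AI summary: This paper studies how deep learning can be used to design more effective policies for active sequential hypothesis testing. It compares newly proposed heuristics with existing methods and shows significant performance gains in some scenarios.

Details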
Comments
Accepted at 56th Annual Allerton Conference on Communication, Control, and Computing
Abstract

Information theory has been very successful in obtaining performance limits for various problems such as communication, compression and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general, and thus heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP and bounds on its value function exist in the literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and another based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions and it is demonstrated using numerical experiments that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.

1801.08383 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Data-Driven Impulse Response Regularization via Deep Learning

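Carl Andersson, Niklas Wahlström, Thomas B. Schön

AI summary: This paper proposes a new data-driven model for impulse response estimation of stable linear single-input single-output systems. The model exploits hidden patterns in the input-output data better than non-parametric models do.

Details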
Abstract

We consider the problem of impulse response estimation of stable linear single-input single-output systems. It is a well-studied problem where flexible non-parametric models recently offered a leap in performance compared to the classical finite-dimensional model structures. Inspired by this development and the success of deep learning we propose a new flexible data-driven model. Our experiments indicate that the new model is capable of exploiting even more of the hidden patterns that are present in the input-output data as compared to the non-parametric models.

1606.09539 2026-06-04 math.NA cs.NA math.PR stat.ME

Analysis of multiscale integrators for multiple attractors and irreversible Langevin samplers

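Jianfeng Lu, Konstantinos Spiliopoulos

AI summary: This paper studies multiscale integrator numerical schemes for a class of stiff stochastic differential equations. It examines the behavior of multiscale SDEs as the stiffness parameter goes to its limit, shows how appropriate multiscale integrators correct the instability of classical schemes such as Euler-Maruyama, and presents numerical studies on introducing irreversibility in Langevin samplers to accelerate convergence to equilibrium.

Details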
Abstract

We study multiscale integrator numerical schemes for a class of stiff stochastic differential equations (SDEs). We consider multiscale SDEs with potentially multiple attractors that behave as diffusions on graphs as the stiffness parameter goes to its limit. Classical numerical discretization schemes, such as the Euler-Maruyama scheme, become unstable as the stiffness parameter converges to its limit, and appropriate multiscale integrators can correct for this. We rigorously establish the convergence of the numerical method to the related diffusion on the graph, identifying the appropriate choice of discretization parameters. Theoretical results are supplemented by numerical studies in the recently developing area of introducing irreversibility in Langevin samplers in order to accelerate convergence to equilibrium.
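
The instability of an explicit scheme on a stiff problem can be seen on a toy example. The sketch below applies Euler-Maruyama to the linear equation dX = -(X / eps) dt + sigma dW, which is a hypothetical stand-in and not the multiscale class analysed in the paper; the noise is switched off (sigma = 0) so that the stability effect is deterministic. The parameter values and names are assumptions.

```python
import math
import random


def euler_maruyama(x0, drift, sigma, dt, n_steps, rng):
    """Simulate dX = drift(X) dt + sigma dW with the Euler-Maruyama scheme."""
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))     # Brownian increment over one step
        x = x + drift(x) * dt + sigma * dw
    return x


eps = 0.01                                      # hypothetical stiffness parameter


def fast_drift(x):
    """Stiff linear restoring force, stronger as eps shrinks."""
    return -x / eps


rng = random.Random(0)
stable = euler_maruyama(1.0, fast_drift, 0.0, 0.005, 50, rng)    # dt < 2 * eps
unstable = euler_maruyama(1.0, fast_drift, 0.0, 0.05, 50, rng)   # dt > 2 * eps
```

Each step multiplies the state by (1 - dt / eps), so the iterates decay only when dt < 2 * eps; as eps shrinks, a fixed step size eventually violates this bound.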

1805.06094 2026-06-04 eess.SY cs.SY math.OC stat.AP

A Framework to Integrate Mode Choice in the Design of Mobility-on-Demand Systems

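Yang Liu, Prateek Bansal, Ricardo Daziano, Samitha Samaranayake

AI summary: This paper proposes a framework that integrates mode choice into the design of Mobility-on-Demand systems. Operations are optimized and analyzed within a multimodal transportation system through mode demand functions, Bayesian optimization is used to determine the optimal supply-side parameters, and numerical experiments verify the advantages of the approach.

Details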
Comments
31 pages, 11 figures. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract

Mobility-on-Demand (MoD) systems are generally designed and analyzed for a fixed and exogenous demand, but such frameworks fail to answer questions about the impact of these services on the urban transportation system, such as the effect of induced demand and the implications for transit ridership. In this study, we propose a unified framework to design, optimize and analyze MoD operations within a multimodal transportation system where the demand for a travel mode is a function of its level of service. An application of Bayesian optimization (BO) to derive the optimal supply-side MoD parameters (e.g., fleet size and fare) is also illustrated. The proposed framework is calibrated using the taxi demand data in Manhattan, New York. Travel demand is served by public transit and MoD services of varying passenger capacities (1, 4 and 10), and passengers are predicted to choose travel modes according to a mode choice model. This choice model is estimated using stated preference data collected in New York City. The convergence of the multimodal supply-demand system and the superiority of the BO-based optimization method over earlier approaches are established through numerical experiments. We finally consider a policy intervention where the government imposes a tax on the ride-hailing service and illustrate how the proposed framework can quantify the pros and cons of such policies for different stakeholders.

1810.03025 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY

Discretizing Logged Interaction Data Biases Learning for Decision-Making

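Peter Schulam, Suchi Saria

AI summary: This paper studies how discretizing irregularly sampled time series data affects the training of decision-making models. It shows that discretization introduces a bias and proposes using continuous-time models to avoid it.

Details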
Comments
This is a standalone short paper describing a new type of bias that can arise when learning from time series data for sequential decision-making problems
Abstract

Time series data that are not measured at regular intervals are commonly discretized as a preprocessing step. For example, data about customer arrival times might be simplified by summing the number of arrivals within hourly intervals, which produces a discrete-time time series that is easier to model. In this abstract, we show that discretization introduces a bias that affects models trained for decision-making. We refer to this phenomenon as discretization bias, and show that we can avoid it by using continuous-time models instead.
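
The preprocessing step mentioned in the abstract, summing arrivals within hourly intervals, can be sketched in a few lines. The arrival times and the function name below are hypothetical; the code only shows the binning itself, not the bias analysis.

```python
import math


def hourly_counts(arrival_times, n_hours):
    """Sum arrivals (given in hours since the start) into hourly bins."""
    counts = [0] * n_hours
    for t in arrival_times:
        hour = math.floor(t)           # index of the hourly interval containing t
        if 0 <= hour < n_hours:        # ignore arrivals outside the window
            counts[hour] += 1
    return counts


# Hypothetical arrival times, in hours.
arrivals = [0.05, 0.10, 0.95, 1.50, 3.01, 3.02, 3.99]
counts = hourly_counts(arrivals, 4)
print(counts)
```

Note what is lost: the arrivals at 0.05 and 0.95 land in the same bin, so the ordering and spacing of events within an hour are no longer available to any model trained on the counts.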

1706.07650 2026-06-04 math.NA cs.NA stat.CO

Semi-discrete optimal transport - the case p=1

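Valentin Hartmann, Dominic Schuhmacher

AI summary: This paper studies the problem of finding an optimal transport plan from an absolutely continuous measure μ to a finitely supported measure ν when the transport cost is the Euclidean distance. It presents an algorithm for computing the optimal transport plan, similar to existing algorithms for the case p=2, and reports its performance on several test cases along with applications.

Details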
Comments
28 pages, 8 figures; new application added and more thorough performance evaluation
AI中文摘要

我们考虑在欧几里得距离作为运输成本的情况下,从连续分布的测度μ(定义在R^d的子集X上)到有限支持的测度ν(定义在R^d上)的最优传输计划问题。我们可以将此问题视为将某种连续分布的空间资源分配到有限数量的处理站点,这些站点具有容量限制。本文详细讨论了该问题,包括与更广泛研究的平方欧几里得成本情况(即p=2的情况)的比较。我们提出了一个计算最优传输计划的算法,该算法与Aurenhammer、Hoffmann和Aronov [Algorithmica 20, 61-76, 1998]以及Mérigot [Computer Graphics Forum 30, 1583--1592, 2011]为p=2情况设计的算法类似。我们展示了使该方法适用于欧几里得成本所需的必要结果,评估了其在一组测试案例中的性能,并给出了多个应用。这些应用包括良好的拟合分区,一种新的视觉工具,用于评估有限样本是否与假设的概率密度一致。

英文摘要

We consider the problem of finding an optimal transport plan between an absolutely continuous measure $μ$ on $\mathcal{X} \subset \mathbb{R}^d$ and a finitely supported measure $ν$ on $\mathbb{R}^d$ when the transport cost is the Euclidean distance. We may think of this problem as closest distance allocation of some resource continuously distributed over space to a finite number of processing sites with capacity constraints. This article gives a detailed discussion of the problem, including a comparison with the much better studied case of squared Euclidean cost ("the case $p=2$"). We present an algorithm for computing the optimal transport plan, which is similar to the approach for $p=2$ by Aurenhammer, Hoffmann and Aronov [Algorithmica 20, 61-76, 1998] and Mérigot [Computer Graphics Forum 30, 1583--1592, 2011]. We show the necessary results to make the approach work for the Euclidean cost, evaluate its performance on a set of test cases, and give a number of applications. The latter include goodness-of-fit partitions, a novel visual tool for assessing whether a finite sample is consistent with a posited probability density.

1810.02022 2026-06-04 math.OC cs.LG cs.SY eess.SY math.DS stat.ML

Convergence of the Expectation-Maximization Algorithm Through Discrete-Time Lyapunov Stability Theory

通过离散时间李雅普诺夫稳定性理论分析期望-最大化算法的收敛性

Orlando Romero, Sarthak Chatterjee, Sérgio Pequito

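AI summary: This paper revisits the Expectation-Maximization algorithm from a dynamical systems perspective, treating it as a nonlinear state-space dynamical system and establishing its convergence through discrete-time Lyapunov stability theory.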

详情
Comments
Preprint submitted to ACC 2019
AI中文摘要

在本文中,我们提出了期望-最大化(EM)算法的动态系统视角。更具体地说,我们可以将EM算法分析为一个非线性状态空间动态系统。EM算法广泛应用于统计学、控制系统和机器学习中的数据聚类和密度估计。该算法属于一类称为近点方法的大型迭代算法。特别是,我们将EM算法的极限点和其他局部最大似然函数的极值点重新解释为其动态系统表示中的平衡点。此外,我们提出将其收敛性作为李雅普诺夫意义下的渐近稳定性来评估。因此,我们通过利用最近关于离散时间李雅普诺夫稳定性理论的结果,以建立EM算法动态系统表示中的渐近稳定性(从而收敛性)。

英文摘要

In this paper, we propose a dynamical systems perspective of the Expectation-Maximization (EM) algorithm. More precisely, we can analyze the EM algorithm as a nonlinear state-space dynamical system. The EM algorithm is widely adopted for data clustering and density estimation in statistics, control systems, and machine learning. This algorithm belongs to a large class of iterative algorithms known as proximal point methods. In particular, we re-interpret limit points of the EM algorithm and other local maximizers of the likelihood function it seeks to optimize as equilibria in its dynamical system representation. Furthermore, we propose to assess its convergence as asymptotic stability in the sense of Lyapunov. As a consequence, we proceed by leveraging recent results regarding discrete-time Lyapunov stability theory in order to establish asymptotic stability (and thus, convergence) in the dynamical system representation of the EM algorithm.
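The dynamical-systems reading of EM can be illustrated with a small sketch: one EM update is a map F, the iterates follow θ_{k+1} = F(θ_k), and the log-likelihood does not decrease along the trajectory, which is what makes it a natural ingredient for a Lyapunov-type argument. The model (a two-component 1-D Gaussian mixture with unit variances), the data, and all function names below are illustrative assumptions, not taken from the paper.

```python
import math

def em_step(theta, data):
    """One EM update F(theta) for a two-component 1-D Gaussian mixture.

    theta = (w, mu1, mu2): weight of component 1 and the two means.
    Both variances are fixed to 1 (an assumption made for brevity).
    """
    w, mu1, mu2 = theta
    # E-step: responsibility of component 1 for every data point.
    resp = []
    for x in data:
        p1 = w * math.exp(-0.5 * (x - mu1) ** 2)
        p2 = (1.0 - w) * math.exp(-0.5 * (x - mu2) ** 2)
        resp.append(p1 / (p1 + p2))
    # M-step: re-estimate the weight and the means.
    n1 = sum(resp)
    n2 = len(data) - n1
    mu1_new = sum(r * x for r, x in zip(resp, data)) / n1
    mu2_new = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
    return (n1 / len(data), mu1_new, mu2_new)

def log_likelihood(theta, data):
    """Log-likelihood of the mixture; it is non-decreasing along EM iterates."""
    w, mu1, mu2 = theta
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    total = 0.0
    for x in data:
        density = norm * (w * math.exp(-0.5 * (x - mu1) ** 2)
                          + (1.0 - w) * math.exp(-0.5 * (x - mu2) ** 2))
        total += math.log(density)
    return total

def iterate_em(theta0, data, steps):
    """State trajectory theta_0, theta_1, ... of the system theta_{k+1} = F(theta_k)."""
    trajectory = [theta0]
    for _ in range(steps):
        trajectory.append(em_step(trajectory[-1], data))
    return trajectory

# Hypothetical data with two well-separated clusters.
DATA = [-2.1, -1.9, -2.0, -1.5, -2.4, 1.8, 2.2, 2.0, 2.5, 1.6]
TRAJECTORY = iterate_em((0.5, -1.0, 1.0), DATA, steps=100)
```

In this picture an equilibrium is a parameter vector with F(θ) = θ, and convergence of EM corresponds to the trajectory settling at such a point.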

1809.11003 2026-06-04 cs.GR cs.LG cs.NA math.NA stat.ML

An inverse scattering approach for geometric body generation: a machine learning perspective

用于几何体生成的逆散射方法:一种机器学习视角

Jinhong Li, Hongyu Liu, Wing-Yan Tsui, Xianchao Wang

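AI summary: This paper proposes a machine-learning method based on inverse scattering techniques for generating 2D and 3D geometric bodies with prescribed characteristic values; a one-to-one correspondence between geometric bodies and far-field patterns enables efficient and stable body generation.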

详情
Comments
22 pages, comments are welcome
AI中文摘要

在本文中,我们关注通过指定特定几何体的特征值集来生成2D和3D几何形状。我们的主要动机之一是各种应用中的3D人体生成。我们开发了一种新的方法,能够根据定制的特征值生成所需的体。该方法采用机器学习的风味,通过训练数据集中的输入特征参数生成推断几何体。我们方法的一个关键成分和创新点是将波传播理论中的逆散射技术引入到体生成中。这是通过在由Helmholtz系统支配的源散射问题中建立几何体与远场模式之间精细的一一对应关系来实现的。这使得能够建立几何体空间与由远场模式定义的功能空间之间的一一对应关系。因此,远场模式可以作为形状生成器。通过首先操纵形状生成器,然后通过稳定的多频傅里叶方法从获得的形状生成器中重建对应的几何体,实现了具有指定特征参数的形状生成。我们的方法易于实现,能够产生更高效和稳定的体生成。我们为所提出的方法提供了理论分析和广泛的数值实验。本研究是首次尝试将逆散射方法与机器学习结合应用于几何体生成,并为进一步的发展打开了许多机会。

英文摘要

In this paper, we are concerned with 2D and 3D geometric shape generation by prescribing a set of characteristic values of a specific geometric body. One of the major motivations of our study is 3D human body generation in various applications. We develop a novel method that can generate the desired body with customized characteristic values. The proposed method has a machine-learning flavour: it generates the inferred geometric body with the input characteristic parameters from a training dataset. One of the critical ingredients and novelties of our method is the borrowing of inverse scattering techniques from the theory of wave propagation for body generation. This is done by establishing a delicate one-to-one correspondence between a geometric body and the far-field pattern of a source scattering problem governed by the Helmholtz system. This in turn enables us to establish a one-to-one correspondence between the geometric body space and the function space defined by the far-field patterns. Hence, the far-field patterns can act as shape generators. Shape generation with prescribed characteristic parameters is achieved by first manipulating the shape generators and then reconstructing the corresponding geometric body from the obtained shape generator by a stable multiple-frequency Fourier method. Our method is easy to implement and produces more efficient and stable body generations. We provide both theoretical analysis and extensive numerical experiments for the proposed method. The study is the first attempt to introduce inverse scattering approaches in combination with machine learning to geometric body generation, and it opens up many opportunities for further developments.

1809.10535 2026-06-04 eess.SY cs.SY stat.ML

Physics Informed Topology Learning in Networks of Linear Dynamical Systems

网络中线性动力系统中的物理引导拓扑学习

Saurav Talukdar, Deepjyoti Deka, Harish Doddi, Donatello Materassi, Misha Chertkov, Murti V. Salapaka

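AI summary: This paper studies learning the influence pathways of a network of dynamically related processes from observations, analyzes an algorithm based on multivariate Wiener filtering for reconstructing the interaction topology, and shows that the topology is exactly recoverable for a broad and important class of interactions that respect flow conservation, with applications to power distribution networks, dynamic thermal networks, and consensus networks.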

详情
Comments
14 pages, 10 figures
AI中文摘要

从观测中学习动态相关过程网络的影响路径在许多学科中具有重要意义。本文考虑了通过线性依赖动态交互的代理影响网络。分析了一种基于多变量维纳滤波的重建交互拓扑的算法。证明对于一个广泛且重要的交互类别,该类别尊重流量守恒,交互拓扑可以精确恢复。保证重建精确的问题类别包括电力分配网络、动态热网络和共识网络。通过共识网络、IEEE电力分配网络和建筑热动力学的模拟和实验展示了该方法的有效性。

英文摘要

Learning influence pathways of a network of dynamically related processes from observations is of considerable importance in many disciplines. In this article, influence networks of agents which interact dynamically via linear dependencies are considered. An algorithm for the reconstruction of the topology of interaction based on multivariate Wiener filtering is analyzed. It is shown that for a vast and important class of interactions that respect flow conservation, the topology of the interactions can be exactly recovered. The class of problems where reconstruction is guaranteed to be exact includes power distribution networks, dynamic thermal networks and consensus networks. The efficacy of the approach is illustrated through simulation and experiments on consensus networks, IEEE power distribution networks and thermal dynamics of buildings.

1803.01802 2026-06-04 eess.SY cs.SY stat.ML

Event-triggered Learning for Resource-efficient Networked Control

事件触发学习用于资源高效的网络化控制

Friedrich Solowjow, Dominik Baumann, Jochen Garcke, Sebastian Trimpe

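AI summary: This paper proposes event-triggered learning, a new concept that triggers an identification experiment whenever poor communication performance is detected and learns an improved prediction model from data, further reducing communication and improving robustness in changing environments.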

详情
Comments
7 pages, 4 figures, to appear in the 2018 American Control Conference (ACC)
AI中文摘要

常见的事件触发状态估计(ETSE)算法通过预测代理的行为来节省网络化控制系统中的通信量,仅在预测显著偏离时传输更新。因此,减少通信的效果严重依赖于用于预测代理状态或测量的动态模型的质量。本文提出事件触发学习作为一种新概念,进一步减少通信:每当检测到通信性能不佳时,会触发识别实验并从数据中学习改进的预测模型。通过将实际通信速率与基于当前模型预期的通信速率进行比较,获得有效的学习触发。通过分析通信时间的统计特性并利用强大的收敛结果,所提出的触发机制被证明仅在必要时刻限制学习实验。数值和物理实验表明,事件触发学习在变化环境中具有更好的鲁棒性,并且比常见的ETSE产生更低的通信率。

英文摘要

Common event-triggered state estimation (ETSE) algorithms save communication in networked control systems by predicting agents' behavior, and transmitting updates only when the predictions deviate significantly. The effectiveness in reducing communication thus heavily depends on the quality of the dynamics models used to predict the agents' states or measurements. Event-triggered learning is proposed herein as a novel concept to further reduce communication: whenever poor communication performance is detected, an identification experiment is triggered and an improved prediction model is learned from data. Effective learning triggers are obtained by comparing the actual communication rate with the one that is expected based on the current model. By analyzing statistical properties of the inter-communication times and leveraging powerful convergence results, the proposed trigger is proven to limit learning experiments to the necessary instants. Numerical and physical experiments demonstrate that event-triggered learning improves robustness toward changing environments and yields lower communication rates than common ETSE.
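The dependence of the communication rate on model quality can be seen in a toy sketch. It assumes a scalar linear process and a simple threshold trigger; the function name, the parameters, and the trigger rule are hypothetical and serve only to illustrate the idea, not the paper's algorithm.

```python
def count_transmissions(a_true, a_model, x0, disturbances, threshold):
    """Count how often a sender must transmit its state to a remote predictor.

    The true process follows x_{k+1} = a_true * x_k + w_k, while the receiver
    predicts with x_hat_{k+1} = a_model * x_hat_k. The state is transmitted
    only when the prediction error exceeds `threshold`.
    """
    x = x0
    x_hat = x0  # the receiver is assumed to start synchronised
    sent = 0
    for w in disturbances:
        x = a_true * x + w
        x_hat = a_model * x_hat
        if abs(x - x_hat) > threshold:
            x_hat = x   # transmit: the receiver resets its estimate
            sent += 1
    return sent

# Noise-free example: a perfect model never needs to transmit, a poor one does.
NO_NOISE = [0.0] * 20
GOOD_MODEL_COUNT = count_transmissions(0.95, 0.95, 10.0, NO_NOISE, 0.5)
POOR_MODEL_COUNT = count_transmissions(0.95, 0.80, 10.0, NO_NOISE, 0.5)
```

A learning trigger in the spirit of the abstract would compare the observed rate, for example `sent / len(disturbances)`, with the rate expected under the current model and start re-identification when the two disagree.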

1801.03765 2026-06-04 math.OC cs.NA math.NA stat.ML

Non-stationary Douglas-Rachford and alternating direction method of multipliers: adaptive stepsizes and convergence

非平稳Douglas-Rachford和交替方向乘数法:自适应步长和收敛性

Dirk A. Lorenz, Quoc Tran-Dinh

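AI summary: This paper revisits the classical Douglas-Rachford method for finding a zero of the sum of two maximal monotone operators. It analyzes the linear case to derive an adaptive stepsize rule that removes the need for stepsize tuning, proves convergence of a non-stationary DR scheme for convergent stepsize sequences with summable increments, derives the related non-stationary alternating direction method of multipliers (ADMM), and illustrates the efficiency of the methods in numerical experiments.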

详情
AI中文摘要

我们重新审视经典的Douglas-Rachford(DR)方法,用于寻找两个最大单调算子之和的零点。由于DR方法的实际性能严重依赖于步长,我们旨在开发一种自适应步长规则。为此,我们更深入地分析了该问题的线性情况,并利用我们的发现开发出一种消除步长调整需求的步长策略。我们分析了一种一般的非平稳DR方案,并证明了在可和增量步长序列下的收敛性。这反过来证明了使用新自适应步长规则的方法的收敛性。我们还从这种非平稳DR方法中推导出相关的非平稳交替方向乘数法(ADMM)。我们通过几个数值示例展示了所提方法的效率。

英文摘要

We revisit the classical Douglas-Rachford (DR) method for finding a zero of the sum of two maximal monotone operators. Since the practical performance of the DR method crucially depends on the stepsizes, we aim at developing an adaptive stepsize rule. To that end, we take a closer look at a linear case of the problem and use our findings to develop a stepsize strategy that eliminates the need for stepsize tuning. We analyze a general non-stationary DR scheme and prove its convergence for a convergent sequence of stepsizes with summable increments. This, in turn, proves the convergence of the method with the new adaptive stepsize rule. We also derive the related non-stationary alternating direction method of multipliers (ADMM) from such a non-stationary DR method. We illustrate the efficiency of the proposed methods on several numerical examples.
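For orientation, here is a minimal sketch of the classical, stationary Douglas-Rachford iteration with a fixed stepsize `t`. The adaptive, non-stationary rule analyzed in the paper is not reproduced here. The test problem (minimize 0.5*||x - a||^2 + lam*||x||_1, whose optimality condition is a zero of the sum of two subdifferentials) and all names are assumptions chosen because both proximal maps have closed forms.

```python
def soft_threshold(v, tau):
    """Proximal map of tau * |.| applied to a scalar."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def douglas_rachford(a, lam, t=1.0, iterations=200):
    """Stationary Douglas-Rachford splitting for
    minimize 0.5 * ||x - a||^2 + lam * ||x||_1.

    f(x) = 0.5 * ||x - a||^2 has prox_{t f}(v) = (v + t * a) / (1 + t);
    g(x) = lam * ||x||_1 has prox_{t g}(v) = soft_threshold(v, t * lam).
    """
    z = [0.0] * len(a)
    x = list(z)
    for _ in range(iterations):
        # Resolvent (prox) step on f.
        x = [(zi + t * ai) / (1.0 + t) for zi, ai in zip(z, a)]
        # Resolvent (prox) step on g at the reflected point 2x - z.
        y = [soft_threshold(2.0 * xi - zi, t * lam) for xi, zi in zip(x, z)]
        # Update of the governing sequence.
        z = [zi + yi - xi for zi, yi, xi in zip(z, y, x)]
    return x

# The exact minimizer is the soft-thresholding of a, here [2.0, 0.0, -3.0].
SOLUTION = douglas_rachford([3.0, -0.5, -4.0], lam=1.0)
```

The stepsize `t` does not change the limit of `x`, but in practice it strongly affects how fast the iteration gets there, which is the motivation for an adaptive rule.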

1801.03592 2026-06-04 math.OC cs.NA math.NA math.ST stat.TH

Estimation of the Robin coefficient field in a Poisson problem with uncertain conductivity field

在具有不确定导电率场的泊松问题中估计鲁金系数场

Ruanui Nicholson, Noemi Petra, Jari Kaipio

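AI summary: This paper studies the reconstruction of a heterogeneous coefficient field in a Robin boundary condition on an inaccessible part of the boundary. It handles the conductivity uncertainty with the Bayesian approximation error approach, quantifies uncertainty using a local linearization and a low-rank approximation, and demonstrates the feasibility of the Robin coefficient estimates.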

详情
AI中文摘要

我们考虑在具有不确定(或未知)非均匀导电率场的域内情况下,在不可达边界部分的鲁金边界条件下的异质系数场重建。为处理源于导电率系数不确定性的模型误差,我们将未知导电率视为噪声参数,并对其进行近似预边缘化,仅反演鲁金系数场。我们通过贝叶斯近似误差(BAE)方法近似相关建模误差。此处呈现的不确定性分析依赖于在最大后验(MAP)估计处对参数到观测映射的局部线性化,这导致参数后验密度的正态(高斯)近似。为计算MAP点,我们应用基于伴随方法的不精确牛顿共轭梯度方法。通过调用数据不匹配部分的Hessian的低秩近似,使协方差构造变得可行。考虑了两个数值实验:一个中导电率的先验协方差是各向同性的,另一个是各向异性的。结果与基于标准误差模型的结果进行了比较,特别强调后验不确定性估计的可行性。我们展示了BAE方法在意义上是可行的,因为预测的后验不确定性与实际估计误差一致,而忽略相关建模误差会导致鲁金系数的不可行估计。此外,我们还证明BAE方法在计算上大约与传统误差方法一样昂贵(以PDE求解次数衡量)

英文摘要

We consider the reconstruction of a heterogeneous coefficient field in a Robin boundary condition on an inaccessible part of the boundary in a Poisson problem with an uncertain (or unknown) inhomogeneous conductivity field in the interior of the domain. To account for model errors that stem from the uncertainty in the conductivity coefficient, we treat the unknown conductivity as a nuisance parameter and carry out approximative premarginalization over it, and invert for the Robin coefficient field only. We approximate the related modelling errors via the Bayesian approximation error (BAE) approach. The uncertainty analysis presented here relies on a local linearization of the parameter-to-observable map at the maximum a posteriori (MAP) estimates, which leads to a normal (Gaussian) approximation of the parameter posterior density. To compute the MAP point we apply an inexact Newton conjugate gradient approach based on the adjoint methodology. The construction of the covariance is made tractable by invoking a low-rank approximation of the data misfit component of the Hessian. Two numerical experiments are considered: one where the prior covariance on the conductivity is isotropic, and one where the prior covariance on the conductivity is anisotropic. Results are compared to those based on standard error models, with particular emphasis on the feasibility of the posterior uncertainty estimates. We show that the BAE approach is a feasible one in the sense that the predicted posterior uncertainty is consistent with the actual estimation errors, while neglecting the related modelling error yields infeasible estimates for the Robin coefficient. In addition, we demonstrate that the BAE approach is approximately as computationally expensive (measured in the number of PDE solves) as the conventional error approach.

1809.08438 2026-06-04 cs.DC cs.IT cs.SY eess.SY math.IT stat.ML

Trusted Multi-Party Computation and Verifiable Simulations: A Scalable Blockchain Approach

可信多方计算与可验证模拟:一种可扩展的区块链方法

Ravi Kiran Raman, Roman Vaculin, Michael Hind, Sekou L. Remy, Eleftheria K. Pissadaki, Nelson Kibichii Bore, Roozbeh Daneshvar, Biplav Srivastava, Kush R. Varshney

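AI summary: This paper proposes a scalable blockchain-based framework for trusted and verifiable computation results in multi-party settings, addressing scalability with a lossy compression scheme that reduces storage and communication costs.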

详情
Comments
16 pages, 8 figures
AI中文摘要

大规模计算实验,通常需要数周时间并在大规模数据集上运行,被广泛用于流行病学、气象学、计算生物学和医疗健康等领域,以理解现象并设计影响日常健康和经济的高风险政策。例如,OpenMalaria框架是一种计算密集型模拟,被各种非政府和政府机构用于理解疟疾传播和干预策略的有效性,从而制定医疗政策。鉴于这些共享结果是推断、技术解决方案设计和日常政策制定的基础,确保计算的验证和信任至关重要。特别是,在涉及多个独立计算代理的多代理环境中,对同辈生成结果的信任对于促进透明度、问责制和合作至关重要。通过使用分布式验证原子计算块和基于区块链的不可变审计机制的新型组合,本文提出了一种用于分布式计算信任的通用框架。特别地,我们通过使用损失性压缩方案来减少存储和通信成本,解决可扩展性问题。该框架不仅保证了最终结果的可验证性,还保证了本地计算的有效性,并通过神经网络训练的合成示例研究了其成本效益权衡。

英文摘要

Large-scale computational experiments, often running over weeks and over large datasets, are used extensively in fields such as epidemiology, meteorology, computational biology, and healthcare to understand phenomena, and design high-stakes policies affecting everyday health and economy. For instance, the OpenMalaria framework is a computationally-intensive simulation used by various non-governmental and governmental agencies to understand malarial disease spread and effectiveness of intervention strategies, and subsequently design healthcare policies. Given that such shared results form the basis of inferences drawn, technological solutions designed, and day-to-day policies drafted, it is essential that the computations are validated and trusted. In particular, in a multi-agent environment involving several independent computing agents, a notion of trust in results generated by peers is critical in facilitating transparency, accountability, and collaboration. Using a novel combination of distributed validation of atomic computation blocks and a blockchain-based immutable audit mechanism, this work proposes a universal framework for distributed trust in computations. In particular, we address the scalability problem by reducing the storage and communication costs using a lossy compression scheme. This framework guarantees not only verifiability of final results, but also the validity of local computations, and its cost-benefit tradeoffs are studied using a synthetic example of training a neural network.

1809.06401 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process

基于隐马尔可夫模型估计的Q学习:部分可观测马尔可夫决策过程

Hyung-Jin Yoon, Donghwan Lee, Naira Hovakimyan

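AI summary: This paper proposes an online Q-learning algorithm based on hidden Markov model estimation for partially observable Markov decision processes; it estimates the POMDP parameters and the Q function concurrently and establishes convergence.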

详情
AI中文摘要

目标是研究一种在线隐马尔可夫模型(HMM)估计基于的Q学习算法,用于有限状态和动作集的部分可观测马尔可夫决策过程(POMDP)。当完整状态观测可用时,Q学习在当前动作下找到最优动作价值函数(Q函数)。然而,当完整状态观测不可用时,Q学习表现不佳。本文将POMDP估计转化为HMM估计问题,并提出递归算法,同时估计POMDP参数和Q函数。此外,本文证明POMDP估计收敛到最大似然估计的平稳点,而Q函数估计收敛到满足由HMM估计过程确定的状态信念不变分布加权的贝尔曼最优性方程的固定点。

英文摘要

The objective is to study an on-line hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) on finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function given the current action (Q function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate the POMDP estimation as an HMM estimation problem and propose a recursive algorithm to estimate both the POMDP parameter and Q function concurrently. Also, we show that the POMDP estimation converges to a set of stationary points for the maximum likelihood estimate, and the Q function estimation converges to a fixed point that satisfies the Bellman optimality equation weighted on the invariant distribution of the state belief determined by the HMM estimation process.
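As background for the fully observed case mentioned in the abstract, the standard tabular Q-learning update can be sketched as follows. This is the textbook rule, not the HMM-based algorithm of the paper; the function name and the tiny example are illustrative assumptions.

```python
def q_learning_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one tabular Q-learning update in place.

    q[s][a] is the current estimate of the action-value function.
    The update moves q[state][action] toward the bootstrapped target
    reward + gamma * max_a' q[next_state][a'].
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]

# Two states, two actions, all estimates initialised to zero.
Q_TABLE = [[0.0, 0.0], [0.0, 0.0]]
FIRST = q_learning_update(Q_TABLE, 0, 1, 1.0, 1, alpha=0.5, gamma=0.9)
SECOND = q_learning_update(Q_TABLE, 0, 1, 1.0, 1, alpha=0.5, gamma=0.9)
```

The update needs the true `state` and `next_state`. Under partial observability these are not available, which is why the paper replaces them with a state belief obtained from HMM estimation.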

1804.03521 2026-06-04 eess.SY cs.SY stat.AP

Consensus-based approach to peer-to-peer electricity markets with product differentiation

基于共识的点对点电力市场方法:具有产品差异化

Etienne Sorin, Lucien Bobo, Pierre Pinson

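AI summary: This paper proposes a consensus-based approach to peer-to-peer electricity markets. A Multi-Bilateral Economic Dispatch (MBED) formulation enables multi-bilateral trading with product differentiation based on consumer preferences, and a Relaxed Consensus+Innovation (RCI) approach clears the market in a fully decentralized manner; the resulting outcomes differ from those of centralized markets while remaining optimal in respecting consumer preferences and maximizing social welfare.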

详情
Comments
Accepted for publication in IEEE Transactions on Power Systems
AI中文摘要

随着分布式发电容量的持续部署和消费者更积极的角色,电力系统及其运行正逐渐远离传统的自上而下分层结构。然而,电力市场结构尚未采纳这种演变。尊重现代电力系统的高维、分布式和动态特性,意味着设计点对点市场或至少利用这种去中心化结构以实现未来电力市场的自下而上方法。本文介绍了一种基于多双边经济调度(MBED)模型的点对点市场结构,允许基于消费者偏好的多双边交易。描述了放松共识+创新(RCI)方法以完全去中心化的方式解决MBED。一组现实案例研究及其分析表明,此类点对点市场结构可以产生不同于集中式市场结构的结果,并在尊重消费者偏好和最大化社会福利方面达到最优。此外,RCI求解方法允许完全去中心化的市场清算,其收敛具有可忽略的最优性差距,仅需有限的信息共享。

英文摘要

With the sustained deployment of distributed generation capacities and the more proactive role of consumers, power systems and their operation are drifting away from a conventional top-down hierarchical structure. Electricity market structures, however, have not yet embraced that evolution. Respecting the high-dimensional, distributed and dynamic nature of modern power systems would translate to designing peer-to-peer markets or, at least, to using such an underlying decentralized structure to enable a bottom-up approach to future electricity markets. A peer-to-peer market structure based on a Multi-Bilateral Economic Dispatch (MBED) formulation is introduced, allowing for multi-bilateral trading with product differentiation, for instance based on consumer preferences. A Relaxed Consensus+Innovation (RCI) approach is described to solve the MBED in a fully decentralized manner. A set of realistic case studies and their analysis allow us to show that such peer-to-peer market structures can effectively yield market outcomes that are different from centralized market structures and optimal in terms of respecting consumers' preferences while maximizing social welfare. Additionally, the RCI solving approach allows for a fully decentralized market clearing which converges with a negligible optimality gap, with a limited amount of information being shared.

1809.07570 2026-06-04 math.NA cs.NA math.ST stat.TH

Analysis of boundary effects on PDE-based sampling of Whittle-Matérn random fields

对基于PDE的Whittle-Matérn随机场采样的边界效应分析

Ustim Khristenko, Laura Scarabosio, Piotr Swierczynski, Elisabeth Ullmann, Barbara Wohlmuth

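AI summary: This paper studies boundary effects that arise when samples of a mean-zero Gaussian random field with Matérn covariance are generated on a bounded domain, uses a window technique with convenient boundary conditions on the extended domain, and analyzes how the covariance error decays under different boundary conditions.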

详情
AI中文摘要

我们考虑了生成具有Matérn协方差函数的零均值高斯随机场样本的问题。每个样本都需要求解带有高斯白噪声驱动项的微分方程,该方程定义在有限计算域上。这引入了不想要的边界效应,因为原来的随机偏微分方程是在整个$\mathbb{R}^d$上定义的,没有边界条件。我们采用窗口技术,将计算域嵌入到更大的域中,并在扩展域上假设方便的边界条件。为了减轻人工边界带来的污染,在数值研究中建议选择窗口大小至少等于Matérn场的相关长度。我们为窗口技术引入的协方差误差进行了严格分析,考虑了齐次狄利克雷、齐次奈曼和周期边界条件。我们证明误差随着窗口大小指数衰减,与边界条件类型无关。我们在1D和2D空间中进行了数值实验,验证了我们的理论结果。

英文摘要

We consider the generation of samples of a mean-zero Gaussian random field with Matérn covariance function. Every sample requires the solution of a differential equation with Gaussian white noise forcing, formulated on a bounded computational domain. This introduces unwanted boundary effects since the stochastic partial differential equation is originally posed on the whole $\mathbb{R}^d$, without boundary conditions. We use a window technique, whereby one embeds the computational domain into a larger domain, and postulates convenient boundary conditions on the extended domain. To mitigate the pollution from the artificial boundary it has been suggested in numerical studies to choose a window size that is at least as large as the correlation length of the Matérn field. We provide a rigorous analysis for the error in the covariance introduced by the window technique, for homogeneous Dirichlet, homogeneous Neumann, and periodic boundary conditions. We show that the error decays exponentially in the window size, independently of the type of boundary condition. We conduct numerical experiments in 1D and 2D space, confirming our theoretical results.

1809.06970 2026-06-04 cs.LG cs.NI cs.PF cs.SY eess.SY stat.ML

FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices

FastDeepIoT: 向理解和优化移动和嵌入式设备上神经网络执行时间迈进

Shuochao Yao, Yiran Zhao, Huajie Shao, Shengzhong Liu, Dongxin Liu, Lu Su, Tarek Abdelzaher

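AI summary: This paper proposes the FastDeepIoT framework, which uncovers the non-linear relation between neural network structure and execution time and uses it to improve the trade-off between execution time and accuracy on mobile and embedded devices, without prior knowledge of the hardware specifications or the implementation details of the deep learning library.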

详情
Comments
Accepted by SenSys '18
AI中文摘要

深度神经网络在许多传感应用问题中展现出巨大潜力,但其过度的资源需求会减慢执行时间,成为在低端设备上部署的重大障碍。为了解决这一挑战,最近的研究集中在压缩神经网络大小以提高性能。我们表明,改变神经网络大小并不成比例地影响感兴趣的性能属性,例如执行时间。相反,在网络配置空间中存在极端的运行时间非线性性。因此,我们提出了一个名为FastDeepIoT的新型框架,该框架揭示了神经网络结构与执行时间之间的非线性关系,然后利用这种理解来找到显著改善移动和嵌入式设备上执行时间与准确性权衡的网络配置。FastDeepIoT有两个关键贡献。首先,FastDeepIoT自动学习了一个准确且高度可解释的深度神经网络在目标设备上的执行时间模型。这无需事先了解硬件规格或所用深度学习库的详细实现。其次,FastDeepIoT告知压缩算法如何在经过分析的设备上最小化执行时间而不影响准确性。我们使用三种不同的传感相关任务在两部移动设备(Nexus 5和Galaxy Nexus)上评估了FastDeepIoT。FastDeepIoT进一步将神经网络的执行时间减少了48%到78%,并将能耗降低了37%到69%,与最先进的压缩算法相比。

英文摘要

Deep neural networks show great potential as solutions to many sensing application problems, but their excessive resource demand slows down execution time, posing a serious impediment to deployment on low-end devices. To address this challenge, recent literature focused on compressing neural network size to improve performance. We show that changing neural network size does not proportionally affect performance attributes of interest, such as execution time. Rather, extreme run-time nonlinearities exist over the network configuration space. Hence, we propose a novel framework, called FastDeepIoT, that uncovers the non-linear relation between neural network structure and execution time, then exploits that understanding to find network configurations that significantly improve the trade-off between execution time and accuracy on mobile and embedded devices. FastDeepIoT makes two key contributions. First, FastDeepIoT automatically learns an accurate and highly interpretable execution time model for deep neural networks on the target device. This is done without prior knowledge of either the hardware specifications or the detailed implementation of the used deep learning library. Second, FastDeepIoT informs a compression algorithm how to minimize execution time on the profiled device without impacting accuracy. We evaluate FastDeepIoT using three different sensing-related tasks on two mobile devices: Nexus 5 and Galaxy Nexus. FastDeepIoT further reduces the neural network execution time by $48\%$ to $78\%$ and energy consumption by $37\%$ to $69\%$ compared with the state-of-the-art compression algorithms.

1809.06009 2026-06-04 cs.LG cs.NA math.NA stat.ML

Uncertainty Propagation in Deep Neural Networks Using Extended Kalman Filtering

使用扩展卡尔曼滤波在深度神经网络中进行不确定性传播

Jessica S. Titensky, Hayden Jananthan, Jeremy Kepner

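AI summary: This paper proposes using extended Kalman filtering to propagate and quantify input uncertainty through deep neural networks; the method has lower computational overhead than existing techniques and naturally incorporates model error into the output uncertainty.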

详情
Comments
4 Pages, 8 figures. Accepted at MIT IEEE Undergraduate Research Technology Conference 2018. Publication pending
AI中文摘要

扩展卡尔曼滤波(EKF)可用于在假设输入分布具有温和假设的情况下通过深度神经网络(DNN)传播和量化输入不确定性。该方法在结果上与现有DNN不确定性传播方法相当,同时显著降低了计算开销。此外,EKF允许将模型误差自然地纳入输出不确定性中。

英文摘要

Extended Kalman Filtering (EKF) can be used to propagate and quantify input uncertainty through a Deep Neural Network (DNN) assuming mild hypotheses on the input distribution. This methodology yields results comparable to existing methods of uncertainty propagation for DNNs while lowering the computational overhead considerably. Additionally, EKF allows model error to be naturally incorporated into the output uncertainty.
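The core of an EKF-style propagation step is a first-order linearization: the mean is pushed through the layer, and the covariance is transformed by the layer's Jacobian J as J Σ Jᵀ, optionally plus a model-error covariance. The sketch below applies this to a single `tanh` layer in plain Python; the layer type, the function names, and the additive `model_cov` term are assumptions made for illustration, not the authors' implementation.

```python
import math

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def propagate_tanh_layer(mean, cov, weights, bias, model_cov=None):
    """Propagate a Gaussian (mean, cov) through y = tanh(W x + b) to first order.

    Returns the approximate output mean tanh(W mean + b) and the covariance
    J cov J^T, where J = diag(1 - tanh^2(W mean + b)) W is the layer Jacobian.
    If `model_cov` is given, it is added as a model-error term.
    """
    pre = [sum(w * m for w, m in zip(row, mean)) + b
           for row, b in zip(weights, bias)]
    out_mean = [math.tanh(p) for p in pre]
    # Jacobian: each row of W scaled by the derivative of tanh at that unit.
    jac = [[(1.0 - o * o) * w for w in row]
           for row, o in zip(weights, out_mean)]
    out_cov = matmul(matmul(jac, cov), transpose(jac))
    if model_cov is not None:
        out_cov = [[c + q for c, q in zip(c_row, q_row)]
                   for c_row, q_row in zip(out_cov, model_cov)]
    return out_mean, out_cov

# One input, one unit with weight 2: the variance is scaled by 2^2 at mean 0.
MEAN_OUT, COV_OUT = propagate_tanh_layer([0.0], [[0.01]], [[2.0]], [0.0])
```

Chaining this function layer by layer gives an output mean and covariance with one forward pass, in contrast to sampling-based approaches that need many passes.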

1712.09685 2026-06-04 stat.ML cs.NA math.NA

Neural network augmented inverse problems for PDEs

神经网络增强的偏微分方程反问题

Jens Berg, Kaj Nyström

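AI summary: This paper augments classical methods for inverse problems with neural networks, using the network as a prior for the coefficient estimated from noisy data, and demonstrates robustness on the Poisson equation in several space dimensions.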

详情
AI中文摘要

在本文中,我们展示了如何通过人工神经网络增强经典反问题方法。神经网络作为先验用于估计从噪声数据中提取的系数。神经网络是全局的、平滑的函数近似器,因此不需要显式正则化误差泛函来恢复平滑解和系数。我们通过一维、二维和三维空间中的泊松方程提供了详细的示例,并展示了神经网络增强方法在噪声和不完整数据、网格和几何结构方面的鲁棒性。

英文摘要

In this paper we show how to augment classical methods for inverse problems with artificial neural networks. The neural network acts as a prior for the coefficient to be estimated from noisy data. Neural networks are global, smooth function approximators and as such they do not require explicit regularization of the error functional to recover smooth solutions and coefficients. We give detailed examples using the Poisson equation in 1, 2, and 3 space dimensions and show that the neural network augmentation is robust with respect to noisy and incomplete data, mesh, and geometry.

1611.01845 2026-06-04 stat.ML cs.SY eess.SY math.OC

Urban MV and LV Distribution Grid Topology Estimation via Group Lasso

通过分组Lasso估计城市中压和低压配电网拓扑

Yizheng Liao, Yang Weng, Guangyi Liu, Ram Rajagopal

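AI summary: This paper proposes a distribution grid topology estimation method based on historical smart meter data; it uses a probabilistic graphical model to capture voltage dependencies and solves the topology estimation problem for MV and LV distribution grids with group lasso regression.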

详情
Journal ref
IEEE transactions on power systems 2018
Comments
15 pages, 16 figures
AI中文摘要

分布式能源资源的渗透率增加给城市配电网带来了诸多可靠性问题。拓扑估计是确保配电网运行稳健性的关键步骤。然而,配电网络中的节点连接和电网拓扑估计通常难以实现。例如,在城市电网中监测节点连接技术上具有挑战性且成本高昂,如地下线路。此外,单独使用放射状拓扑假设在大都市和负载密集区域不合适,因为这些区域的电网可能具有多种网状结构。为了解决这些问题,我们提出了一种数据驱动的拓扑估计方法,仅利用历史智能电表测量数据。特别地,利用概率图模型捕捉节点电压之间的统计依赖性。我们证明了在放射状和网状结构中,节点连接和电网拓扑估计问题可以转化为带有组最小一乘正则化的线性回归。使用PG&E住宅智能电表数据,在不同规模的8个中压和低压配电网及22种拓扑配置中进行了模拟,结果显示出高度准确的结果。

英文摘要

The increasing penetration of distributed energy resources poses numerous reliability issues to the urban distribution grid. Topology estimation is a critical step to ensure the robustness of distribution grid operation. However, estimating bus connectivity and grid topology is usually hard in distribution grids. For example, it is technically challenging and costly to monitor the bus connectivity in urban grids, e.g., underground lines. It is also inappropriate to use the radial topology assumption exclusively because the grids of metropolitan cities and regions with dense loads may contain many mesh structures. To resolve these drawbacks, we propose a data-driven topology estimation method for MV and LV distribution grids by only utilizing the historical smart meter measurements. Particularly, a probabilistic graphical model is utilized to capture the statistical dependencies amongst bus voltages. We prove that the bus connectivity and grid topology estimation problems, in radial and mesh structures, can be formulated as a linear regression with a least absolute shrinkage regularization on grouped variables (group lasso). Simulations show highly accurate results in eight MV and LV distribution networks of different sizes and 22 topology configurations using PG&E residential smart meter data.
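The key computational ingredient of group lasso is block soft-thresholding: the proximal map of the penalty lam * ||v||_2 shrinks a whole group of coefficients toward zero and sets the group exactly to zero when its norm is at most lam. The sketch below shows this map in plain Python. How groups are mapped to buses is the paper's modelling choice; the grouping used here (consecutive index pairs) and all names are purely illustrative assumptions.

```python
import math

def group_soft_threshold(v, lam):
    """Proximal map of lam * ||v||_2 for one group of coefficients."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)      # the whole group is switched off
    scale = 1.0 - lam / norm
    return [scale * x for x in v]  # the group is shrunk but kept

def prox_group_lasso(beta, groups, lam):
    """Apply block soft-thresholding to each index group of beta."""
    out = list(beta)
    for idx in groups:
        shrunk = group_soft_threshold([beta[i] for i in idx], lam)
        for i, val in zip(idx, shrunk):
            out[i] = val
    return out

# Two groups: the first has norm 5 and survives, the second has norm 0.5
# and is removed entirely.
RESULT = prox_group_lasso([3.0, 4.0, 0.3, 0.4], [[0, 1], [2, 3]], lam=1.0)
```

In a topology-estimation setting, a group that is set to zero would be read as "no connection" between the corresponding pair of buses, which is why group-level sparsity is the natural regularizer.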

1809.03343 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Distributed dynamic modeling and monitoring for large-scale industrial processes under closed-loop control

分布式动态建模与监控用于闭环控制下的大规模工业过程

Wenqing Li, Chunhui Zhao, Biao Huang

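AI summary: This paper proposes a distributed monitoring method that combines static and dynamic characteristics to distinguish real faults from changes in operating conditions; it decomposes the process with a sparse slow feature analysis algorithm, builds distributed models, and validates the method's effectiveness.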

详情
AI中文摘要

对于受闭环控制的大规模工业过程,过程动态直接由控制动作产生,可能在真实故障和正常操作条件变化间表现出不同行为。然而,传统分布式监控方法不考虑闭环控制机制,仅探索静态特性,无法区分真实故障与名义变化,导致不必要的警报。本文提出一种分布式监控方法,通过同时探索静态和动态特性,首先通过开发稀疏慢特征分析(SSFA)算法将大规模闭环过程分解为若干子系统,其次开发分布式模型分别捕捉局部和全局的静态和动态特性。基于分布式监控系统,提出两级监控策略,检查操作条件和控制动作对过程特性的影响,从而区分两种变化。通过基准数据和真实工业过程数据的案例研究验证了所提方法的有效性。

英文摘要

For large-scale industrial processes under closed-loop control, process dynamics directly resulting from control action are typical characteristics and may show different behaviors between real faults and normal changes of operating conditions. However, conventional distributed monitoring approaches do not consider the closed-loop control mechanism and only explore static characteristics, which thus are incapable of distinguishing between real process faults and nominal changes of operating conditions, leading to unnecessary alarms. In this regard, this paper proposes a distributed monitoring method for closed-loop industrial processes by concurrently exploring static and dynamic characteristics. First, the large-scale closed-loop process is decomposed into several subsystems by developing a sparse slow feature analysis (SSFA) algorithm that captures changes of both static and dynamic information. Second, distributed models are developed to separately capture static and dynamic characteristics from the local and global aspects. Based on the distributed monitoring system, a two-level monitoring strategy is proposed to check different influences on process characteristics resulting from changes of the operating conditions and control action, and thus the two changes can be well distinguished from each other. Case studies are conducted based on both benchmark data and real industrial process data to illustrate the effectiveness of the proposed method.

1809.01353 2026-06-04 cs.LG cs.NA math.NA stat.ML

IKA: Independent Kernel Approximator

IKA:独立核近似器

Matteo Ronchetti

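AI summary: This paper proposes IKA, a low-rank kernel approximation method built from a linear combination of arbitrarily chosen functions, which outperformed the Nyström method on the STL-10 dataset.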

详情
AI中文摘要

本文提出IKA方法,通过线性组合任意选择的函数进行低秩核近似,优于Nyström方法,在STL-10数据集上表现更优。

英文摘要

This paper describes a new method for low rank kernel approximation called IKA. The main advantage of IKA is that it produces a function $ψ(x)$ defined as a linear combination of arbitrarily chosen functions. In contrast, the approximation produced by the Nyström method is a linear combination of kernel evaluations. The proposed method consistently outperformed the Nyström method in a comparison on the STL-10 dataset. Numerical results are reproducible using the source code available at https://gitlab.com/matteo-ronchetti/IKA

1710.03608 2026-06-04 math.NA cs.LG cs.NA stat.ML

CTD: Fast, Accurate, and Interpretable Method for Static and Dynamic Tensor Decompositions

CTD: 一种快速、准确且可解释的静态和动态张量分解方法

Jungwoo Lee, Dongjin Choi, Lee Sael

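AI summary: This paper proposes CTD, an efficient and interpretable method for static and dynamic tensor decomposition that removes redundancy to improve accuracy and efficiency, with application to anomaly detection in online settings.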

详情
AI中文摘要

如何在高效且直接可解释的方式下发现张量中的模式和异常?如何在在线环境中处理不断到来的张量?张量模式和异常检测是关键问题,应用于安全监控、健康监测、网络安全等领域。标准的PARAFAC和Tucker分解结果不可直接解释。尽管已有基于采样的方法,但需要更快、更高效和更准确。本文提出CTD,一种基于采样的快速、准确且可解释的张量分解方法。CTD-S在准确性上比现有方法高17-83倍,速度和内存效率也分别提升5-86倍和7-12倍。CTD-D是首个可解释的动态张量分解方法,通过利用前一时间步的因素和重新排列操作,使速度提升2-3倍。通过CTD展示了如何在在线分布式拒绝服务(DDoS)攻击检测中有效解释结果。

英文摘要

How can we find patterns and anomalies in a tensor, or multi-dimensional array, in an efficient and directly interpretable way? How can we do this in an online environment, where a new tensor arrives each time step? Finding patterns and anomalies in a tensor is a crucial problem with many applications, including building safety monitoring, patient health monitoring, cyber security, terrorist detection, and fake user detection in social networks. Standard PARAFAC and Tucker decomposition results are not directly interpretable. Although a few sampling-based methods have previously been proposed towards better interpretability, they need to be made faster, more memory efficient, and more accurate. In this paper, we propose CTD, a fast, accurate, and directly interpretable tensor decomposition method based on sampling. CTD-S, the static version of CTD, provably guarantees a high accuracy that is 17 ~ 83x more accurate than that of the state-of-the-art method. Also, CTD-S is made 5 ~ 86x faster, and 7 ~ 12x more memory-efficient than the state-of-the-art method by removing redundancy. CTD-D, the dynamic version of CTD, is the first interpretable dynamic tensor decomposition method ever proposed. Also, it is made 2 ~ 3x faster than the already fast CTD-S by exploiting factors at the previous time step and by reordering operations. With CTD, we demonstrate how the results can be effectively interpreted in online distributed denial of service (DDoS) attack detection.

1706.02586 2026-06-04 math.OC cs.DS cs.SY eess.SY stat.ML

DSOS and SDSOS Optimization: More Tractable Alternatives to Sum of Squares and Semidefinite Optimization

DSOS和SDSOS优化:比平方和优化和半定优化更可行的替代方法

Amir Ali Ahmadi, Anirudha Majumdar

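AI summary: This paper proposes DSOS and SDSOS optimization as alternatives to SOS optimization. Based on linear programming and second-order cone programming, they trade off computation time against solution quality and target applications of semidefinite programming where scalability is a limitation.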

详情
AI中文摘要

近年来,平方和(SOS)优化的出现极大地影响了优化理论。然而,这种技术对大规模半定规划的依赖限制了其应用问题的规模。本文介绍DSOS和SDSOS优化作为SOS优化的替代方法,基于线性规划和二次锥规划,在计算时间和解的质量之间进行权衡。这些优化问题针对某些平方和多项式(或等价的半正定矩阵)的子集,适用于半定规划的一般应用,其中可扩展性是一个限制因素。我们证明了一些基本的SOS优化定理,这些定理依赖于实代数几何的结果,仍然适用于DSOS和SDSOS优化。此外,我们通过来自多项式优化、统计和机器学习、衍生品定价和控制理论等不同应用领域的数值实验,展示了在准确度上有合理权衡的情况下,可以处理目前传统SOS方法无法处理的更大规模的问题。最后,我们提供了一种最近的技术回顾,以弥补我们的DSOS/SDSOS方法与SOS方法之间的差距,但需要额外的运行时间。论文的补充材料介绍了一个用于DSOS和SDSOS优化的MATLAB工具包。

英文摘要

In recent years, optimization theory has been greatly impacted by the advent of sum of squares (SOS) optimization. The reliance of this technique on large-scale semidefinite programs, however, has limited the scale of problems to which it can be applied. In this paper, we introduce DSOS and SDSOS optimization as linear programming and second-order cone programming-based alternatives to sum of squares optimization that allow one to trade off computation time with solution quality. These are optimization problems over certain subsets of sum of squares polynomials (or equivalently subsets of positive semidefinite matrices), which can be of interest in general applications of semidefinite programming where scalability is a limitation. We show that some basic theorems from SOS optimization which rely on results from real algebraic geometry are still valid for DSOS and SDSOS optimization. Furthermore, we show with numerical experiments from diverse application areas (polynomial optimization, statistics and machine learning, derivative pricing, and control theory) that with reasonable tradeoffs in accuracy, we can handle problems at scales that are currently significantly beyond the reach of traditional sum of squares approaches. Finally, we provide a review of recent techniques that bridge the gap between our DSOS/SDSOS approach and the SOS approach at the expense of additional running time. The Supplementary Material of the paper introduces an accompanying MATLAB package for DSOS and SDSOS optimization.
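
A rough illustration of why DSOS constraints can reduce to linear programming, under an assumption the abstract does not spell out: in the Gram-matrix characterization, a polynomial $p(x) = z(x)^T Q z(x)$ is DSOS when the symmetric matrix $Q$ can be chosen diagonally dominant, and diagonal dominance is a finite set of linear inequalities in the entries of $Q$. The sketch below only checks that property for a given matrix. The function name `is_diagonally_dominant` and the example matrices are hypothetical and are not taken from the paper's MATLAB package.

```python
def is_diagonally_dominant(q, tol=1e-12):
    """Check a_ii >= sum_{j != i} |a_ij| for every row of a symmetric matrix.

    By Gershgorin's circle theorem such a matrix is positive semidefinite,
    which is what makes it a cheap sufficient certificate.
    """
    n = len(q)
    # The characterization assumed here applies to symmetric Gram matrices.
    for i in range(n):
        for j in range(n):
            if abs(q[i][j] - q[j][i]) > tol:
                raise ValueError("matrix must be symmetric")
    return all(
        q[i][i] + tol >= sum(abs(q[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )


# Hypothetical Gram matrices for illustration only.
q_dd = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
q_not_dd = [[1.0, 2.0], [2.0, 1.0]]

print(is_diagonally_dominant(q_dd))
print(is_diagonally_dominant(q_not_dd))
```

Each row condition can be written with auxiliary variables bounding the absolute values, which is why searching over such matrices needs only a linear program rather than a semidefinite one.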

1808.09379 2026-06-04 math.NA cs.NA math.PR stat.ME

A transport-based multifidelity preconditioner for Markov chain Monte Carlo

基于传输的多保真预条件器用于马尔可夫链蒙特卡罗

Benjamin Peherstorfer, Youssef Marzouk

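AI summary: This paper proposes a transport-based multifidelity preconditioner that combines high- and low-fidelity models. A transport map built with the low-fidelity model yields a non-Gaussian proposal for efficient sampling, while the stationary distribution of the Markov chain remains the high-fidelity posterior.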

详情
AI中文摘要

马尔可夫链蒙特卡罗(MCMC)采样后验分布时,当正向模型评估计算成本高时具有挑战性。用低成本低保真模型替代正向模型可显著降低计算成本;但单独使用低保真模型会使MCMC链的平稳分布对应低保真模型的后验,而非原始高保真模型的后验。我们提出一种多保真方法,将高保真模型与低保真模型结合而非替代。首先,用低保真模型构建传输映射,确定性地将参考高斯分布与低保真后验近似耦合。然后,用传输映射导出的非高斯提议分布探索高保真后验分布。这种多保真“预条件”MCMC方法通过提议分布专门针对后验,且高效构建于低保真模型。仅依赖低保真模型构造提议分布,保证MCMC链的平稳分布为高保真后验。在数值实验中,我们的多保真方法相比单保真MCMC采样方法实现了显著加速。

英文摘要

Markov chain Monte Carlo (MCMC) sampling of posterior distributions arising in Bayesian inverse problems is challenging when evaluations of the forward model are computationally expensive. Replacing the forward model with a low-cost, low-fidelity model often significantly reduces computational cost; however, employing a low-fidelity model alone means that the stationary distribution of the MCMC chain is the posterior distribution corresponding to the low-fidelity model, rather than the original posterior distribution corresponding to the high-fidelity model. We propose a multifidelity approach that combines, rather than replaces, the high-fidelity model with a low-fidelity model. First, the low-fidelity model is used to construct a transport map that deterministically couples a reference Gaussian distribution with an approximation of the low-fidelity posterior. Then, the high-fidelity posterior distribution is explored using a non-Gaussian proposal distribution derived from the transport map. This multifidelity "preconditioned" MCMC approach seeks efficient sampling via a proposal that is explicitly tailored to the posterior at hand and that is constructed efficiently with the low-fidelity model. By relying on the low-fidelity model only to construct the proposal distribution, our approach guarantees that the stationary distribution of the MCMC chain is the high-fidelity posterior. In our numerical examples, our multifidelity approach achieves significant speedups compared to single-fidelity MCMC sampling methods.

1806.00728 2026-06-04 stat.ML cs.CV cs.LG cs.SY eess.SP eess.SY

Data-Free/Data-Sparse Softmax Parameter Estimation with Structured Class Geometries

无数据/稀疏数据softmax参数估计与结构类几何

Nisar Ahmed

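AI summary: This paper estimates softmax parameters with little or no labeled data by using prior information about the geometry of class-label log-odds boundaries. The parameters follow from solving a linear system, without expensive data sampling or numerical optimization.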

详情
Comments
Final version accepted to IEEE Signal Processing Letters (double column), submitted July 21, 2018
AI中文摘要

本文考虑在少量或无标注训练数据可用时,但已知类标签对数几率边界相对几何结构信息的softmax参数估计问题。证明了'无数据'softmax模型合成对应于求解参数方程组,其中期望主导类对数几率边界通过分解输入特征空间的凸多面体编码。当方程可解时,线性方程给出仅使用类边界多面体规范的softmax参数解集。这允许softmax参数学习无需昂贵的暴力数据采样和数值优化。线性方程还可适应数据稀疏情况下的约束最大似然估计。由于某些多面体规范可能无法得到解,因此也展示了存在某些概率分类问题,其对数几率边界无法用m类softmax模型学习。

英文摘要

This note considers softmax parameter estimation when little/no labeled training data is available, but a priori information about the relative geometry of class label log-odds boundaries is available. It is shown that 'data-free' softmax model synthesis corresponds to solving a linear system of parameter equations, wherein desired dominant class log-odds boundaries are encoded via convex polytopes that decompose the input feature space. When solvable, the linear equations yield closed-form softmax parameter solution families using class boundary polytope specifications only. This allows softmax parameter learning to be implemented without expensive brute-force data sampling and numerical optimization. The linear equations can also be adapted to constrained maximum likelihood estimation in data-sparse settings. Since solutions may fail to exist for the linear parameter equations derived from certain polytope specifications, it is also shown that there exist probabilistic classification problems over m convexly separable classes for which the log-odds boundaries cannot be learned using an m-class softmax model.

1803.02916 2026-06-04 stat.AP cs.NA math.NA q-bio.QM

A Bayesian framework for molecular strain identification from mixed diagnostic samples

一种用于从混合诊断样本中识别微生物菌株的贝叶斯框架

Lauri Mustonen, Xiangxi Gao, Asteroide Santana, Rebecca Mitchell, Ymir Vigfusson, Lars Ruthotto

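AI summary: This paper proposes a Bayesian framework for identifying multiple microbial strains from mixed DNA samples. It poses strain identification as an inverse problem that simultaneously estimates a binary mutation matrix and a vector of mixture proportions, with applications such as pathogen identification for disease-outbreak monitoring.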

详情
Comments
25 pages, 4 figures
AI中文摘要

我们给出了一个数学表述,并开发了一个计算框架,用于从混合DNA样本中识别多种微生物菌株。我们的方法适用于需要高效识别病原体的公共卫生领域,例如疾病暴发监测。我们将菌株识别表述为一个反问题,旨在同时估计一个二值矩阵(编码每个菌株中突变的存在与否)和一个实值向量(表示菌株的混合比例),使二者的乘积近似等于测量数据向量。该问题的结构与盲反卷积相似,区别在于存在二值约束,我们在方法中强制满足这些约束。采用贝叶斯方法,我们推导出后验密度。我们提出了两种计算方法来求解非凸的最大后验估计问题:第一种是局部优化方法,通过将问题解耦为更小的独立子问题而变得高效且可扩展;第二种通过将问题转化为凸混合整数二次规划问题来获得全局最小解。解耦方法还提供了对后验分布进行积分的高效途径,从而给出关于欠定问题的模糊性以及数值解不确定性的有用信息。我们使用具有已知真实值的合成数据和实验数据,通过计算机模拟(in silico)评估了该框架的潜力和局限性。

英文摘要

We provide a mathematical formulation and develop a computational framework for identifying multiple strains of microorganisms from mixed samples of DNA. Our method is applicable in public health domains where efficient identification of pathogens is paramount, e.g., for the monitoring of disease outbreaks. We formulate strain identification as an inverse problem that aims at simultaneously estimating a binary matrix (encoding presence or absence of mutations in each strain) and a real-valued vector (representing the mixture of strains) such that their product is approximately equal to the measured data vector. The problem at hand has a similar structure to blind deconvolution, except for the presence of binary constraints, which we enforce in our approach. Following a Bayesian approach, we derive a posterior density. We present two computational methods for solving the non-convex maximum a posteriori estimation problem. The first one is a local optimization method that is made efficient and scalable by decoupling the problem into smaller independent subproblems, whereas the second one yields a global minimizer by converting the problem into a convex mixed-integer quadratic programming problem. The decoupling approach also provides an efficient way to integrate over the posterior. This provides useful information about the ambiguity of the underdetermined problem and, thus, the uncertainty associated with numerical solutions. We evaluate the potential and limitations of our framework in silico using synthetic and experimental data with available ground truths.

1612.01216 2026-06-04 math.OC cs.SY eess.SY stat.ML

Decentralized Frank-Wolfe Algorithm for Convex and Non-convex Problems

去中心化Frank-Wolfe算法用于凸和非凸问题

Hoi-To Wai, Jean Lafond, Anna Scaglione, Eric Moulines

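AI summary: This paper proposes a decentralized Frank-Wolfe algorithm, a projection-free method for high-dimensional constrained problems, with convergence guarantees for convex, strongly convex, and non-convex objectives.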

详情
Comments
Accepted to IEEE Transactions on Automatic Control. 33 pages, 7 figures, includes an improved constant in Lemma 2
AI中文摘要

去中心化优化算法因网络信息处理的最新进展而受到广泛关注。然而,传统的基于投影梯度下降的去中心化算法无法处理高维约束问题,因为投影步骤的计算成本过高。为此,本文采用无投影优化方法,即Frank-Wolfe(FW)或条件梯度算法。我们首先在经典FW算法的基础上提出去中心化FW(DeFW)算法,并通过将其视为不精确FW算法来研究其收敛性。采用递减步长规则并令$t$为迭代次数,我们证明DeFW算法的收敛率对于凸目标为$O(1/t)$;对于最优解位于约束集内部的强凸目标为$O(1/t^2)$;对于光滑但非凸的目标,向驻点收敛的速率为$O(1/\sqrt{t})$。我们进一步证明,基于共识的DeFW算法每次迭代只需两轮通信即可满足上述保证。此外,我们展示了DeFW算法在低复杂度鲁棒矩阵补全和通信高效稀疏学习中的优势。合成数据和真实数据上的数值结果支持我们的结论。

英文摘要

Decentralized optimization algorithms have received much attention due to the recent advances in network information processing. However, conventional decentralized algorithms based on projected gradient descent are incapable of handling high-dimensional constrained problems, as the projection step becomes computationally prohibitive. To address this problem, this paper adopts a projection-free optimization approach, a.k.a. the Frank-Wolfe (FW) or conditional gradient algorithm. We first develop a decentralized FW (DeFW) algorithm from the classical FW algorithm. The convergence of the proposed algorithm is studied by viewing the decentralized algorithm as an inexact FW algorithm. Using a diminishing step size rule and letting $t$ be the iteration number, we show that the DeFW algorithm's convergence rate is ${\cal O}(1/t)$ for convex objectives; is ${\cal O}(1/t^2)$ for strongly convex objectives with the optimal solution in the interior of the constraint set; and is ${\cal O}(1/\sqrt{t})$ towards a stationary point for smooth but non-convex objectives. We then show that a consensus-based DeFW algorithm meets the above guarantees with two communication rounds per iteration. Furthermore, we demonstrate the advantages of the proposed DeFW algorithm on low-complexity robust matrix completion and communication-efficient sparse learning. Numerical results on synthetic and real data are presented to support our findings.
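
To make the "projection-free" idea concrete, here is a minimal sketch of the classical, centralized Frank-Wolfe iteration that DeFW starts from. It is not the decentralized algorithm of the paper. The constraint set (the probability simplex), the quadratic objective, the step size $2/(t+2)$, and all names such as `frank_wolfe_simplex` are assumptions chosen for illustration.

```python
def frank_wolfe_simplex(grad, x0, num_iters=2000):
    """Classical Frank-Wolfe on the probability simplex.

    Each step solves a linear minimization over the simplex, whose solution
    is the vertex with the smallest gradient entry, so no projection is needed.
    """
    x = list(x0)
    for t in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: pick the best vertex e_i.
        i = min(range(len(x)), key=lambda k: g[k])
        gamma = 2.0 / (t + 2.0)  # diminishing step size (assumed rule)
        # Convex combination of the current iterate and the vertex e_i.
        x = [(1.0 - gamma) * xk for xk in x]
        x[i] += gamma
    return x


# Hypothetical convex objective f(x) = 0.5 * ||x - b||^2 with b in the simplex.
b = [0.2, 0.5, 0.3]


def grad(x):
    return [xk - bk for xk, bk in zip(x, b)]


x_star = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
print(x_star)
```

Every iterate is a convex combination of simplex vertices, so feasibility holds by construction. This is the property that makes the approach attractive when projections are expensive.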

1808.08657 2026-06-04 eess.SY cs.SY math.OC stat.AP

Localized solar power prediction based on weather data from local history and global forecasts

基于本地历史天气数据和全球预报的局部太阳能预测

Chaitanya Poolla, Abraham K. Ishihara

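AI summary: This paper uses time series models with exogenous inputs to combine local weather history with global forecast data, producing eighteen-hour-ahead forecasts with a mean accuracy of about 80%.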

详情
Comments
5 pages, 4 figures, 3 tables, 45th IEEE Photovoltaic Specialists Conference (PVSC)
AI中文摘要

随着商业建筑净零可持续性需求的增加,光伏资产的整合变得更加重要。然而,由于高太阳变化性和光伏输出预测的不确定性,这种整合仍具挑战性。现有方法通常仅使用本地电力/天气历史或全球天气预报,忽略全球现象或本地特征。本文提出利用本地历史天气和全球预报数据,基于时间序列建模与外生输入,实现18小时提前预测,准确率达约80%,使用美国国家海洋和大气管理局(NOAA)的高分辨率快速刷新(HRRR)模型数据。

英文摘要

With the recent interest in net-zero sustainability for commercial buildings, integration of photovoltaic (PV) assets becomes even more important. This integration remains a challenge due to high solar variability and uncertainty in the prediction of PV output. Most existing methods predict PV output using either local power/weather history or global weather forecasts, thereby ignoring either the impending global phenomena or the relevant local characteristics, respectively. This work proposes to leverage weather data from both local weather history and global forecasts based on time series modeling with exogenous inputs. The proposed model results in eighteen-hour-ahead forecasts with a mean accuracy of approximately 80% and uses data from the National Oceanic and Atmospheric Administration's (NOAA) High-Resolution Rapid Refresh (HRRR) model.

1803.07753 2026-06-04 eess.SY cs.SY stat.ML

Sample Complexity of Sparse System Identification Problem

稀疏系统辨识问题的样本复杂度

Salar Fattahi, Somayeh Sojoudi

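AI summary: This paper studies identification of sparse linear time-invariant systems. It proposes a sparsity-promoting block-regularized estimator that identifies the system dynamics from a limited number of input-state samples, analyzes it under high-dimensional scaling, and shows a small element-wise error once the number of sample trajectories exceeds a threshold.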

详情
AI中文摘要

在本文中,我们研究了稀疏线性时不变系统辨识问题。我们提出了一种促进稀疏性的块正则化估计器,仅使用有限数量的输入-状态数据样本来识别系统动态。我们从高维统计学的现代结果出发,刻画了该估计器在系统维度增长速率与可用样本轨迹数量相当或更快时的性质。特别是,我们证明,当样本轨迹数量超过某个阈值时,该估计器会产生小的元素误差。该阈值与每个块的大小和输入和状态矩阵中非零元素的数量呈多项式关系,但与系统维度呈对数关系。该结果的一个副产品是,稀疏系统辨识所需的样本轨迹数量显著少于系统维度。此外,我们展示了与最近备受瞩目的系统辨识问题的最小二乘估计器不同,本文提出的方法能够以上述数据样本数量精确恢复系统底层的稀疏结构。对合成系统、物理质量-弹簧网络和多智能体系统进行了广泛的案例研究,以展示所提出方法的有效性。

英文摘要

In this paper, we study the system identification problem for sparse linear time-invariant systems. We propose a sparsity-promoting block-regularized estimator to identify the dynamics of the system with only a limited number of input-state data samples. We characterize the properties of this estimator under high-dimensional scaling, where the growth rate of the system dimension is comparable to or even faster than that of the number of available sample trajectories. In particular, using contemporary results on high-dimensional statistics, we show that the proposed estimator results in a small element-wise error, provided that the number of sample trajectories is above a threshold. This threshold depends polynomially on the size of each block and the number of nonzero elements at different rows of input and state matrices, but only logarithmically on the system dimension. A by-product of this result is that the number of sample trajectories required for sparse system identification is significantly smaller than the dimension of the system. Furthermore, we show that, unlike the recently celebrated least-squares estimators for system identification problems, the method developed in this work is capable of exact recovery of the underlying sparsity structure of the system with the aforementioned number of data samples. Extensive case studies on synthetically generated systems, physical mass-spring networks, and multi-agent systems are offered to demonstrate the effectiveness of the proposed method.

1801.05859 2026-06-04 math.CA cs.NA eess.AS eess.SP math.NA stat.ME

A Kotel'nikov Representation for Wavelets

小波的 Kotelnikov 表示法

H. M. de Oliveira, R. J. Cintra, R. C. de Oliveira

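AI summary: This paper proposes a wavelet representation based on Kotel'nikov's results, covers how to obtain the low-frequency envelope and phase processes, revisits the interpretation of wavelets as a constant-quality-factor filter bank, and gives a condition on the spectral band under which an orthogonal analysis is guaranteed.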

详情
Comments
6 pages, 15 figures
AI中文摘要

本文提出了一种利用基带信号的小波表示方法,通过利用 Kotelnikov 的结果来实现。详细展示了如何在低频下获取包络和相位的过程。重新审视了将小波视为具有恒定品质因子滤波器组的典型解释。证明如果小波频谱支持被限制在频带$[f_m,f_M]$内,则当$f_M \leq 3f_m$时,可以保证正交分析,这是一个相当简单的结果,但与 Nyquist 率有某种相似之处。然而,在正交小波谱不满足此条件的情况下,展示了如何构造一个无频谱重叠的等效滤波器组。

英文摘要

This paper presents a wavelet representation using baseband signals, by exploiting Kotel'nikov results. Details of how to obtain the processes of envelope and phase at low frequency are shown. The archetypal interpretation of wavelets as an analysis with a filter bank of constant quality factor is revisited on these bases. It is shown that if the wavelet spectral support is confined to the band $[f_m,f_M]$, then an orthogonal analysis is guaranteed provided that $f_M \leq 3f_m$, a quite simple result, though one that parallels the Nyquist rate. Nevertheless, in cases of orthogonal wavelets whose spectrum does not satisfy this condition, it is shown how to construct an "equivalent" filter bank with no spectral overlapping.

1803.10309 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Canonical Correlation Analysis of Datasets with a Common Source Graph

具有共同源图的数据集的典型相关分析

Jia Chen, Gang Wang, Yanning Shen, Georgios B. Giannakis

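AI summary: This paper proposes graph-regularized canonical correlation analysis (gCCA), which encodes knowledge of the common sources in a graph regularizer, together with dual and kernel variants. Image classification tests illustrate its merits.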

详情
Comments
10 pages, 7 figures
AI中文摘要

典型相关分析(CCA)是一种用于发现两个或多个数据集是否共享隐藏源的强大技术。其优点包括降维、聚类、分类、特征选择和数据融合。然而,标准CCA未利用共同源的几何结构,这可能来自给定数据或通过(交叉)相关性推导。本文将共同源提供的额外信息编码为图,并作为图正则化器。这导致了一种新的图正则化CCA方法,称为图(g)CCA。新的gCCA考虑了图诱导的共同源知识,同时最小化所需典型变量的距离。针对数据量小于数据向量维度的多种实际设置,还开发了gCCA的对偶形式。一种设置包括内核用于处理非线性数据依赖性。所得到的图内核(gk)CCA也以闭式形式获得。最后,通过多个真实数据集上的图像分类测试来证明新线性、对偶和内核方法相对于竞争方法的优势。

英文摘要

Canonical correlation analysis (CCA) is a powerful technique for discovering whether or not hidden sources are commonly present in two (or more) datasets. Its well-appreciated merits include dimensionality reduction, clustering, classification, feature selection, and data fusion. Standard CCA, however, does not exploit the geometry of the common sources, which may be available from the given data or can be deduced from (cross-) correlations. In this paper, this extra information provided by the common sources generating the data is encoded in a graph, and is invoked as a graph regularizer. This leads to a novel graph-regularized CCA approach, termed graph (g) CCA. The novel gCCA accounts for the graph-induced knowledge of common sources, while minimizing the distance between the wanted canonical variables. Tailored for diverse practical settings where the number of data vectors is smaller than the data vector dimensions, the dual formulation of gCCA is also developed. One such setting includes kernels that are incorporated to account for nonlinear data dependencies. The resultant graph-kernel (gk) CCA is also obtained in closed form. Finally, corroborating image classification tests over several real datasets are presented to showcase the merits of the novel linear, dual, and kernel approaches relative to competing alternatives.

1802.06517 2026-06-04 cs.CE cs.NA math.NA math.OC stat.AP

Goal-Oriented Optimal Design of Experiments for Large-Scale Bayesian Linear Inverse Problems

面向目标的大规模贝叶斯线性反问题实验设计优化

Ahmed Attia, Alen Alexanderian, Arvind K. Saibaba

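AI summary: This paper proposes a goal-oriented optimal design of experiments framework for large-scale Bayesian linear inverse problems. It minimizes posterior uncertainty in a quantity of interest rather than in the parameter, introduces the A-GOODE and D-GOODE criteria, and tests the approach in numerical experiments.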

详情
Comments
25 pages, 13 figures
AI中文摘要

我们为由偏微分方程(PDE)控制的大规模贝叶斯线性反问题开发了一个面向目标的最优实验设计(GOODE)框架。该框架与经典贝叶斯最优实验设计(ODE)的区别在于:我们寻求的实验设计旨在最小化实验最终目标(例如感兴趣量,QoI)的后验不确定性,而不是被估计参数本身的后验不确定性。这适用于反问题的求解只是中间步骤、估计出的参数随后用于计算QoI的情形。在这类问题中,GOODE方法有两个优势:通过有针对性地采集数据,设计可以避免浪费实验资源;并且由于QoI通常是低维的,所得设计准则在计算上更容易评估。我们提出了两种修改后的设计准则A-GOODE和D-GOODE,它们是经典贝叶斯A-最优和D-最优准则的自然类比。我们分析了它们与其他ODE准则的联系,并利用信息论工具对GOODE准则给出了解释。随后,我们开发了一个高效的基于梯度的优化框架来求解GOODE优化问题。此外,我们给出了全面的数值实验,检验所提方法的各个方面。所针对的应用是在扩散和输运问题中优化传感器布置以识别污染源。我们使用$\ell_1$范数惩罚方法来强制传感器布置的稀疏性,并提出了一种指定相应惩罚参数的实用策略。

英文摘要

We develop a framework for goal-oriented optimal design of experiments (GOODE) for large-scale Bayesian linear inverse problems governed by PDEs. This framework differs from classical Bayesian optimal design of experiments (ODE) in the following sense: we seek experimental designs that minimize the posterior uncertainty in the experiment end-goal, e.g., a quantity of interest (QoI), rather than the estimated parameter itself. This is suitable for scenarios in which the solution of an inverse problem is an intermediate step and the estimated parameter is then used to compute a QoI. In such problems, a GOODE approach has two benefits: the designs can avoid wastage of experimental resources by a targeted collection of data, and the resulting design criteria are computationally easier to evaluate due to the often low-dimensionality of the QoIs. We present two modified design criteria, A-GOODE and D-GOODE, which are natural analogues of classical Bayesian A- and D-optimal criteria. We analyze the connections to other ODE criteria, and provide interpretations for the GOODE criteria by using tools from information theory. Then, we develop an efficient gradient-based optimization framework for solving the GOODE optimization problems. Additionally, we present comprehensive numerical experiments testing the various aspects of the presented approach. The driving application is the optimal placement of sensors to identify the source of contaminants in a diffusion and transport problem. We enforce sparsity of the sensor placements using an $\ell_1$-norm penalty approach, and propose a practical strategy for specifying the associated penalty parameter.

1709.05409 2026-06-04 eess.SY cs.SY math.DS stat.ME stat.ML

Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems

高斯过程隐力模型用于物理系统的学习与随机控制

Simo Särkkä, Mauricio A. Álvarez, Neil D. Lawrence

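AI summary: This paper addresses learning and stochastic control of physical systems with unknown input signals. It models the unknown signals as Gaussian processes, giving latent force models, and covers their statistical inference, learning methods, and observability and controllability theory.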

详情
AI中文摘要

本文关注包含未知输入信号的物理系统的学习与随机控制问题。这些未知信号被建模为具有特定参数化协方差结构的高斯过程(GP)。所得到的隐力模型(LFMs)可视为混合模型,包含基于原理的物理模型部分和非参数化的GP模型部分。我们简要回顾了此类模型的统计推断和学习方法,介绍了用于这些模型的随机控制方法,并提供了新的理论可观测性和可控性结果。

英文摘要

This article is concerned with learning and stochastic control in physical systems which contain unknown input signals. These unknown signals are modeled as Gaussian processes (GP) with certain parametrized covariance structures. The resulting latent force models (LFMs) can be seen as hybrid models that contain a first-principles physical model part and a non-parametric GP model part. We briefly review the statistical inference and learning methods for models of this kind, introduce stochastic control methodology for the models, and provide new theoretical observability and controllability results for them.

1512.08732 2026-06-04 math.CA cs.NA math.NA math.PR math.ST stat.TH

Recovery of periodicities hidden in heavy-tailed noise

从重尾噪声中恢复隐藏的周期性

Illya M. Karabash, Jürgen Prestin

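AI summary: This paper studies joint detection and estimation of the parameters of discrete signals in heavy-tailed noise. It constructs estimators via detection of singularities of anti-derivatives of $Z$-transforms and a two-level selection procedure, and proves that they are asymptotically strongly consistent.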

详情
Journal ref
Mathematische Nachrichten 291 (2018), no. 1, 86-102
Comments
This e-print differs in style, formatting, pagination, and small non-mathematical details from the version of this paper accepted for publication in "Mathematische Nachrichten" (Wiley-VCH). In comparison with the previous version, the following changes have been made: The introduction section 1 was split into two sections 1 and 2. The new section 5 was added. The algorithm 4.1 was added
AI中文摘要

我们研究形如$x(t) = \sum_{n=1}^{N} α_n e^{-i λ_n t } + ε_t$,$t \in \mathbb{N}$的离散信号的参数联合检测估计问题,其中加性噪声由独立的中心化复随机变量$ε_t$表示。假设$ε_t$的分布未知,但满足若干组条件。我们证明,在重尾噪声情形下,可以为信号的未知参数(即频率$λ_n$、频率个数$N$和复振幅$α_n$)构造渐近强一致的估计器。例如,所考虑的噪声类之一是:$ε_t$为独立同分布随机变量,满足$\mathbb{E}(ε_t) = 0$且$\mathbb{E}(|ε_t| \ln |ε_t|) < \infty$。估计器的构造基于对$Z$-变换原函数奇点的检测,以及针对上水平集的特殊离散化版本的两级选择过程。一致性证明依赖于随机傅里叶级数的收敛理论。我们还讨论了衰减信号以及频率个数无限的情形。

英文摘要

We address a parametric joint detection-estimation problem for discrete signals of the form $x(t) = \sum_{n=1}^{N} α_n e^{-i λ_n t } + ε_t$, $t \in \mathbb{N}$, with an additive noise represented by independent centered complex random variables $ε_t$. The distributions of $ε_t$ are assumed to be unknown, but satisfying various sets of conditions. We prove that in the case of a heavy-tailed noise it is possible to construct asymptotically strongly consistent estimators for the unknown parameters of the signal, i.e., the frequencies $λ_n$, their number $N$, and complex amplitudes $α_n$. For example, one of the classes of noise considered is the following: $ε_t$ are independent identically distributed random variables with $\mathbb{E} (ε_t) = 0$ and $\mathbb{E} (|ε_t| \ln |ε_t|) < \infty$. The construction of estimators is based on detection of singularities of anti-derivatives for $Z$-transforms and on a two-level selection procedure for special discretized versions of superlevel sets. The consistency proof relies on the convergence theory for random Fourier series. We also discuss decaying signals and the case of an infinite number of frequencies.

1808.03374 2026-06-04 math.NA cs.CE cs.NA q-bio.GN stat.AP

Fast computation of the principal components of genotype matrices in Julia

在Julia中快速计算基因型矩阵的主成分

Jiahao Chen, Andreas Noack, Alan Edelman

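AI summary: This paper describes a random matrix model for genotype matrices arising in genome-wide association studies and implements Golub-Kahan-Lanczos (GKL) bidiagonalization in Julia to compute their principal components, running up to 36 times faster than existing tools.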

详情
Comments
15 pages, 6 figures, 3 tables, repository at https://github.com/jiahao/paper-copper-2016
AI中文摘要

在全基因组关联研究(GWAS)中,求遗传数据矩阵最大的几个主成分是一项常见任务,既用于降维,也用于识别不需要的变异因素。我们描述了一个针对GWAS中出现的矩阵的简单随机矩阵模型,表明其奇异值的主体(bulk)部分服从Marchenko-Pastur分布,另有少数几个较大的离群值。我们还在Julia编程语言中实现了Golub-Kahan-Lanczos(GKL)双对角化,提供厚重启以及完全或部分重正交化两种策略,以控制数值舍入误差。与基因组数据分析中常用于计算主成分的软件工具(如EIGENSOFT和FlashPCA,二者分别使用稠密LAPACK例程和随机化子空间迭代)相比,我们的GKL双对角化实现最多可快36倍。

英文摘要

Finding the largest few principal components of a matrix of genetic data is a common task in genome-wide association studies (GWASs), both for dimensionality reduction and for identifying unwanted factors of variation. We describe a simple random matrix model for matrices that arise in GWASs, showing that the singular values have a bulk behavior that follows a Marchenko-Pastur distribution, with a handful of large outliers. We also implement Golub-Kahan-Lanczos (GKL) bidiagonalization in the Julia programming language, providing thick restarting and a choice between full and partial reorthogonalization strategies to control numerical roundoff. Our implementation of GKL bidiagonalization is up to 36 times faster than software tools used commonly in genomics data analysis for computing principal components, such as EIGENSOFT and FlashPCA, which use dense LAPACK routines and randomized subspace iteration respectively.
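
For reference, the bulk law mentioned above can be written down under textbook assumptions that may differ from the paper's exact model: a $p \times n$ matrix $X$ with independent entries of mean zero and variance $\sigma^2$, aspect ratio $\gamma = p/n \in (0, 1]$ held fixed as $n \to \infty$. The symbols $\gamma$, $\sigma^2$, and $\lambda_\pm$ are notation introduced here, not taken from the abstract.

```latex
% Marchenko-Pastur density for the eigenvalues of (1/n) X X^T
\begin{equation*}
  \rho(x) = \frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{2\pi \sigma^2 \gamma\, x},
  \qquad \lambda_- \le x \le \lambda_+,
  \qquad \lambda_\pm = \sigma^2 \bigl(1 \pm \sqrt{\gamma}\bigr)^2 .
\end{equation*}
% The singular values of X / sqrt(n) are the square roots of these eigenvalues,
% so their bulk lies in [sigma (1 - sqrt(gamma)), sigma (1 + sqrt(gamma))].
```

Singular values well above the upper edge $\sigma(1 + \sqrt{\gamma})$ are the kind of outliers that a few leading principal components would capture.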

1808.02316 2026-06-04 math.NA cs.NA eess.SP stat.ML

Modelling hidden structure of signals in group data analysis with modified (Lr, 1) and block-term decompositions

在群体数据分析中利用改进的(Lr, 1)和块项分解建模信号的隐藏结构

Pavel Kharyuk, Ivan Oseledets

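AI summary: This paper proposes modified (Lr, 1) and block-term decompositions for group data analysis, evaluates the approach on a multilabel classification task for a set of images, and compares it with known matrix models on clustering.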

详情
AI中文摘要

本文致力于阐述利用块项分解进行群体数据分析的思路,并探讨用(Lr, 1)和Tucker块建模群体活动的可能性。在群体数据分析应用中,考虑了一种块张量分解的新泛化形式。所提出的方法在图像集的多标签分类任务上进行了评估。此外,本文还报告了在聚类任务中使用所提出的张量模型与已知矩阵模型(即公共正交基提取和组独立成分分析)的比较结果。

英文摘要

This work elaborates on the idea of using block term decomposition for group data analysis and raises the possibility of modelling group activity with (Lr, 1) and Tucker blocks. A new generalization of block tensor decomposition is considered in application to group data analysis. The suggested approach was evaluated on a multilabel classification task for a set of images. This contribution also reports results of an investigation of clustering with the proposed tensor models in comparison with known matrix models, namely common orthogonal basis extraction and group independent component analysis.

1705.07646 2026-06-04 math.NA cs.NA stat.CO

An approximate empirical Bayesian method for large-scale linear-Gaussian inverse problems

大规模线性-高斯逆问题的近似经验贝叶斯方法

Qingping Zhou, Wenqing Liu, Jinglai Li, Youssef M. Marzouk

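AI summary: This paper proposes an approximate empirical Bayesian method for large-scale linear-Gaussian inverse problems. It evaluates the marginal likelihood approximately through a low-rank approximation and gains efficiency from the randomized SVD combined with a spectral approximation method.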

详情
AI中文摘要

我们研究了用于解决线性逆问题的贝叶斯推断方法,重点在于分层公式,其中先验或似然函数依赖于未指定的超参数。在实践中,这些超参数通常通过最大化边际似然函数(即数据在超参数条件下的概率密度)来确定。然而,对于大规模问题,评估边际似然函数在计算上具有挑战性。在本文中,我们提出了一种近似评估边际似然函数的方法,基于先验协方差到后验协方差更新的低秩近似。我们证明这种近似在最小最大意义上是最佳的。此外,我们提供了一种高效的算法来实现所提出的方法,基于随机化SVD和计算先验协方差矩阵平方根的谱近似方法。几个数值示例展示了所提方法的良好性能。

英文摘要

We study Bayesian inference methods for solving linear inverse problems, focusing on hierarchical formulations where the prior or the likelihood function depends on unspecified hyperparameters. In practice, these hyperparameters are often determined via an empirical Bayesian method that maximizes the marginal likelihood function, i.e., the probability density of the data conditional on the hyperparameters. Evaluating the marginal likelihood, however, is computationally challenging for large-scale problems. In this work, we present a method to approximately evaluate marginal likelihood functions, based on a low-rank approximation of the update from the prior covariance to the posterior covariance. We show that this approximation is optimal in a minimax sense. Moreover, we provide an efficient algorithm to implement the proposed method, based on a combination of the randomized SVD and a spectral approximation method to compute square roots of the prior covariance matrix. Several numerical examples demonstrate good performance of the proposed method.
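
To see why the marginal likelihood is costly at scale, it helps to write it out for a generic linear-Gaussian model. The notation below ($G$, $\Gamma_{\mathrm{pr}}$, $\Gamma_{\mathrm{obs}}$, $\theta$, $m$) and the zero prior mean are assumptions for illustration and may not match the paper's setup.

```latex
% Model: y = G x + e,  x ~ N(0, Gamma_pr(theta)),  e ~ N(0, Gamma_obs(theta)),  y in R^m
\begin{equation*}
  \log p(y \mid \theta)
  = -\tfrac12\, y^{\top} \Sigma_y(\theta)^{-1} y
    - \tfrac12 \log\det \Sigma_y(\theta)
    - \tfrac{m}{2} \log(2\pi),
  \qquad
  \Sigma_y(\theta) = G\, \Gamma_{\mathrm{pr}}(\theta)\, G^{\top} + \Gamma_{\mathrm{obs}}(\theta).
\end{equation*}
```

Each evaluation involves a linear solve and a log-determinant with the $m \times m$ matrix $\Sigma_y(\theta)$, and an empirical Bayes optimizer repeats this for many values of $\theta$, which is where a low-rank approximation would be expected to help.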

1804.01189 2026-06-04 eess.SY cs.CL cs.SY math.OC stat.ML

Real-Time Prediction of the Duration of Distribution System Outages

配电系统停电持续时间的实时预测

Aaron Jaech, Baosen Zhang, Mari Ostendorf, Daniel S. Kirschen

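AI summary: This paper trains neural networks on historical outage records to predict outage duration. An initial prediction from environmental factors is updated using text analysis of field reports, and case studies show that the language processing identifies phrases pointing to outage causes and repair steps.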

详情
Comments
Appears in IEEE Transactions on Power Systems
AI中文摘要

本文针对非计划停电持续时间的预测问题,利用历史停电记录训练一系列神经网络预测器。初始持续时间预测基于环境因素,随后利用自然语言处理自动分析陆续收到的现场报告文本,对预测进行更新。使用15年停电记录进行的实验显示,初始预测结果良好,利用文本信息后性能进一步提升。案例研究表明,语言处理能够识别出指向停电原因和修复步骤的短语。

英文摘要

This paper addresses the problem of predicting the duration of unplanned power outages, using historical outage records to train a series of neural network predictors. The initial duration prediction is made based on environmental factors, and it is updated based on incoming field reports using natural language processing to automatically analyze the text. Experiments using 15 years of outage records show good initial results and improved performance when the text is leveraged. Case studies show that the language processing identifies phrases that point to outage causes and repair steps.

1807.09657 2026-06-04 math.NA cs.NA stat.CO

A computational geometry method for the inverse scattering problem

一种用于反散射问题的计算几何方法

Maria L. Daza-Torres, Juan Antonio Infante del Río, Marcos A. Capistrán, J. Andrés Christen

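AI summary: This paper proposes a computational-geometry method for the inverse scattering problem. A point cloud approximates the support of the scatterer, a Bayesian model describes the joint distribution of the point cloud's non-convex hull and the scatterer's refractive index, the hull serves as spline control points for evaluating the volume potential, and a proposed probability transition kernel samples the posterior. The method reliably recovers the support and refractive index simultaneously.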

详情
Comments
20 pages, 8 figures
AI中文摘要

本文展示了一种用于求解二维星形、光滑、穿透性障碍物反散射问题的计算方法。我们的方法基于计算几何的经典思想。首先,我们用点云近似散射体的支撑集。其次,我们利用贝叶斯范式建模给定近场数据下点云非凸包和散射体常数折射率的联合条件概率分布。值得注意的是,我们使用点云的非凸包作为样条控制点,在更细的网格上计算出现在直接问题积分方程公式中的体积势。最后,为了采样出现的后验分布,我们提出一个与空间仿射变换兼容的概率转移核。我们的发现表明,我们的方法能够可靠地同时恢复散射体的支撑集和常数折射率。确实,我们的采样方法在估计诸如散射体面积等感兴趣量时表现出鲁棒性。我们最后指出了一系列方法的推广。

英文摘要

In this paper we demonstrate a computational method to solve the inverse scattering problem for a star-shaped, smooth, penetrable obstacle in 2D. Our method is based on classical ideas from computational geometry. First, we approximate the support of a scatterer by a point cloud. Secondly, we use the Bayesian paradigm to model the joint conditional probability distribution of the non-convex hull of the point cloud and the constant refractive index of the scatterer given near field data. Of note, we use the non-convex hull of the point cloud as spline control points to evaluate, on a finer mesh, the volume potential arising in the integral equation formulation of the direct problem. Finally, in order to sample the arising posterior distribution, we propose a probability transition kernel that commutes with affine transformations of space. Our findings indicate that our method reliably retrieves the support and constant refractive index of the scatterer simultaneously. Indeed, our sampling method robustly estimates a quantity of interest such as the area of the scatterer. We conclude by pointing out a series of generalizations of our method.

1807.09120 2026-06-04 eess.SY cs.SY math.PR stat.AP stat.ML

Finite Time Adaptive Stabilization of LQ Systems

线性二次系统有限时间自适应稳定化

Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

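AI summary: This paper establishes high-probability guarantees for finite-time adaptive stabilization of linear systems with unknown dynamics using random linear feedbacks, under a minimal set of assumptions and with new technical tools, in a setting where asymptotic approaches are of little help.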

详情
Comments
arXiv admin note: substantial text overlap with arXiv:1711.07230
AI中文摘要

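未知动态线性系统的稳定化是自适应控制中的经典问题。由于缺乏对系统参数的了解可能导致系统失稳,在调节之前需要先进行自适应稳定化,因此自适应稳定化必须在有限时间内完成。为实现这一目标,渐近方法帮助有限;现有的非渐近结果很少,尚无对该问题的完整处理。本文利用随机线性反馈这一新方法,建立了有限时间稳定化的高概率保证。由于我们仔细选取了一组最小假设,所得结果适用于相当一般的情形;这些假设包括底层系统的可稳定性以及对噪声分布重尾程度的限制。为推导这些结果,我们还引入了若干新概念和技术工具,以处理闭环矩阵的正则性和不稳定性。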

英文摘要

Stabilization of linear systems with unknown dynamics is a canonical problem in adaptive control. Since the lack of knowledge of system parameters can cause it to become destabilized, an adaptive stabilization procedure is needed prior to regulation. Therefore, the adaptive stabilization needs to be completed in finite time. In order to achieve this goal, asymptotic approaches are not very helpful. There are only a few existing non-asymptotic results and a full treatment of the problem is not currently available. In this work, leveraging the novel method of random linear feedbacks, we establish high probability guarantees for finite time stabilization. Our results hold for remarkably general settings because we carefully choose a minimal set of assumptions. These include stabilizability of the underlying system and restricting the degree of heaviness of the noise distribution. To derive our results, we also introduce a number of new concepts and technical tools to address regularity and instability of the closed-loop matrix.

1807.08855 2026-06-04 stat.ML cs.LG cs.RO cs.SY eess.SP eess.SY

Weak in the NEES?: Auto-tuning Kalman Filters with Bayesian Optimization

在NEES中薄弱:基于贝叶斯优化的自动调节卡尔曼滤波器

Zhaozhong Chen, Christoffer Heckman, Simon Julier, Nisar Ahmed

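AI summary: This paper proposes a Bayesian optimization method for auto-tuning Kalman filters. By intelligently sampling the parameter space and using a nonparametric Gaussian process surrogate, it efficiently identifies multiple local minima and quantifies the uncertainty of its results.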

详情
Comments
Final version presented at FUSION 2018 Conference, Cambridge, UK, July 2018 (submitted June 1, 2018)
AI中文摘要

卡尔曼滤波器被广泛用于数据融合应用,包括导航、跟踪和同时定位与建图问题。然而,调整各种卡尔曼滤波器模型参数需要大量时间和努力,例如过程噪声协方差、非白噪声预白化滤波器模型等。传统优化技术在调整时容易陷入较差的局部极小值,并且使用真实传感器数据实施成本较高。为了解决这些问题,本文开发了一种新的“黑箱”贝叶斯优化策略,用于自动调节卡尔曼滤波器。在该方法中,性能由两种随机目标函数之一来表征:当可用真实状态模型时为归一化估计误差平方(NEES),当只有传感器数据可用时为归一化创新误差平方(NIS)。通过智能采样参数空间,学习和利用非参数高斯过程代理函数,贝叶斯优化可以高效地识别多个局部极小值,并对其结果提供不确定性量化。

英文摘要

Kalman filters are routinely used for many data fusion applications including navigation, tracking, and simultaneous localization and mapping problems. However, significant time and effort is frequently required to tune various Kalman filter model parameters, e.g. process noise covariance, pre-whitening filter models for non-white noise, etc. Conventional optimization techniques for tuning can get stuck in poor local minima and can be expensive to implement with real sensor data. To address these issues, a new "black box" Bayesian optimization strategy is developed for automatically tuning Kalman filters. In this approach, performance is characterized by one of two stochastic objective functions: normalized estimation error squared (NEES) when ground truth state models are available, or the normalized innovation error squared (NIS) when only sensor data is available. By intelligently sampling the parameter space to both learn and exploit a nonparametric Gaussian process surrogate function for the NEES/NIS costs, Bayesian optimization can efficiently identify multiple local minima and provide uncertainty quantification on its results.
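As a minimal sketch of the two cost functions named in the abstract, the snippet below evaluates NEES and NIS for a single time step with NumPy. The function names and the toy numbers are illustrative assumptions, not taken from the paper. For a consistent filter under linear-Gaussian assumptions, the time-averaged NEES is expected to be close to the state dimension.

```python
import numpy as np

def nees(x_true, x_est, P):
    """Normalized estimation error squared for one time step: e^T P^{-1} e."""
    e = np.asarray(x_true, dtype=float) - np.asarray(x_est, dtype=float)
    # Solve P z = e instead of forming the explicit inverse of P.
    return float(e @ np.linalg.solve(P, e))

def nis(z, z_pred, S):
    """Normalized innovation squared: nu^T S^{-1} nu with nu = z - z_pred."""
    nu = np.asarray(z, dtype=float) - np.asarray(z_pred, dtype=float)
    return float(nu @ np.linalg.solve(S, nu))

# Toy example (hypothetical numbers): 2-state estimate with diagonal covariance.
x_true = np.array([1.0, 2.0])
x_est = np.array([0.0, 0.0])
P = np.diag([1.0, 4.0])
nees_value = nees(x_true, x_est, P)   # 1^2 / 1 + 2^2 / 4
```

An auto-tuner in the spirit of the abstract would treat the averaged NEES (or NIS) as a noisy black-box objective over the filter's noise parameters.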

1807.08084 2026-06-04 math.NA cs.DS cs.NA eess.SP stat.CO

Fast Matrix Inversion and Determinant Computation for Polarimetric Synthetic Aperture Radar

快速矩阵求逆与行列式计算用于极化合成孔径雷达

D. F. G. Coelho, R. J. Cintra, A. C. Frery, V. S. Dimitrov

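AI summary: This paper proposes a fast algorithm for inverting small matrices and computing their determinants in PolSAR image processing. It is based on the adjoint matrix and the symmetry of the input matrix, is implemented on a GPU, and is about twice as fast as the usual Cholesky factorization approach.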

详情
Journal ref
Computers and Geosciences, no. 119 (2018), pages 109-114
Comments
7 pages, 1 figure
AI中文摘要

本文介绍了一种快速算法,用于在全极化合成孔径雷达(PolSAR)图像处理和分析中同时求解小型矩阵的逆和行列式。所提出的快速算法基于伴随矩阵的计算和输入矩阵的对称性。该算法在通用图形处理单元(GPGPU)上实现,并与基于Cholesky分解的传统方法进行了比较。使用模拟观测数据和实际PolSAR传感器数据的评估显示,与传统Cholesky分解相比,该方法的速度提升约为两倍。此外,本文提供的表达式可以在任何平台上实现。

英文摘要

This paper introduces a fast algorithm for simultaneous inversion and determinant computation of small matrices in the context of fully Polarimetric Synthetic Aperture Radar (PolSAR) image processing and analysis. The proposed fast algorithm is based on the computation of the adjoint matrix and the symmetry of the input matrix. The algorithm is implemented on a general-purpose graphics processing unit (GPGPU) and compared to the usual approach based on Cholesky factorization. The assessment with simulated observations and data from an actual PolSAR sensor shows a speedup factor of about two when compared to the usual Cholesky factorization. Moreover, the expressions provided here can be implemented on any platform.
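The idea of obtaining the inverse and the determinant together can be sketched for a 3x3 Hermitian matrix, assuming that "adjoint" in the abstract means the classical adjoint (adjugate). This is a plain NumPy illustration written for this listing, not the paper's GPU implementation. Only six cofactors are computed, and the rest follow from Hermitian symmetry.

```python
import numpy as np

def hermitian3_inv_det(A):
    """Inverse and determinant of a 3x3 Hermitian matrix via its adjugate."""
    a, b, c = A[0, 0].real, A[1, 1].real, A[2, 2].real   # real diagonal
    d, e, f = A[0, 1], A[0, 2], A[1, 2]                  # upper triangle
    # Diagonal cofactors (real) and three off-diagonal cofactors.
    c00 = b * c - abs(f) ** 2
    c11 = a * c - abs(e) ** 2
    c22 = a * b - abs(d) ** 2
    c01 = f * np.conj(e) - np.conj(d) * c
    c02 = np.conj(d) * np.conj(f) - b * np.conj(e)
    c12 = d * np.conj(e) - a * np.conj(f)
    # Laplace expansion along the first row; real for a Hermitian matrix.
    det = (a * c00 + d * c01 + e * c02).real
    # Adjugate = transposed cofactor matrix; symmetry supplies the other half.
    adj = np.array([[c00, np.conj(c01), np.conj(c02)],
                    [c01, c11, np.conj(c12)],
                    [c02, c12, c22]])
    return adj / det, det

# Hypothetical test matrix (Hermitian, nonsingular).
A = np.array([[4, 1 + 1j, 2j],
              [1 - 1j, 3, 1],
              [-2j, 1, 5]], dtype=complex)
A_inv, A_det = hermitian3_inv_det(A)
```

The determinant comes for free from cofactors that the inverse needs anyway, which is the sense in which the two are computed together in this sketch.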

1607.03518 2026-06-04 math.NA cs.NA stat.AP stat.CO

Airborne contaminant source estimation using a finite-volume forward solver coupled with a Bayesian inversion approach

利用有限体积正向求解器与贝叶斯反演方法估计空中污染物源

Bamdad Hosseini, John M. Stockie

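AI summary: This paper proposes a numerical algorithm for the atmospheric dispersion problem with elevated point sources and ground-level deposition, combining a finite volume method with Bayesian inversion. Compared with a Gaussian plume solver, the finite volume solver gives tighter uncertainty bounds on the estimated emission rates.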

详情
Journal ref
Computers & Fluids, 154:27-43, 2017
Comments
Fixed a few typos in figures
AI中文摘要

我们提出了一种数值算法,用于解决具有高点源和地面沉积的大气扩散问题。该问题通过三维对流-扩散方程建模,包含delta分布源项,以及高度依赖的对流速度和扩散系数。我们使用分裂方法构建有限体积方案,其中Clawpack软件包用于对流求解,隐式时间离散化用于扩散项。该算法应用于实际工业场景,涉及锌冶炼厂的空气颗粒物排放,使用实际风速数据。我们还处理了各种实际考虑,如选择合适的正则化方法处理噪声风数据以及量化模型对参数不确定性的敏感性。随后,我们使用该算法在贝叶斯框架内估计工业现场多个源的锌排放速率。我们比较了有限体积求解器与高斯烟柱求解器,并展示有限体积求解器在估计排放速率的不确定性边界上更紧密。

英文摘要

We propose a numerical algorithm for solving the atmospheric dispersion problem with elevated point sources and ground-level deposition. The problem is modelled by the 3D advection-diffusion equation with delta-distribution source terms, as well as height-dependent advection speed and diffusion coefficients. We construct a finite volume scheme using a splitting approach in which the Clawpack software package is used as the advection solver and an implicit time discretization is proposed for the diffusion terms. The algorithm is then applied to an actual industrial scenario involving emissions of airborne particulates from a zinc smelter using actual wind measurements. We also address various practical considerations such as choosing appropriate methods for regularizing noisy wind data and quantifying sensitivity of the model to parameter uncertainty. Afterwards, we use the algorithm within a Bayesian framework for estimating emission rates of zinc from multiple sources over the industrial site. We compare our finite volume solver with a Gaussian plume solver within the Bayesian framework and demonstrate that the finite volume solver results in tighter uncertainty bounds on the estimated emission rates.

1807.07099 2026-06-04 eess.SP cs.LG cs.NA math.NA stat.ML

Comparative study of Discrete Wavelet Transforms and Wavelet Tensor Train decomposition to feature extraction of FTIR data of medicinal plants

对离散小波变换与小波张量分解在药用植物FTIR数据特征提取中的比较研究

Pavel Kharyuk, Dmitry Nazarenko, Ivan Oseledets

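AI summary: This paper compares Wavelet Tensor Train (WTT) decomposition with Discrete Wavelet Transforms (DWT) for feature extraction from FTIR data of medicinal plants. The best results of the two are similar, and WTT, having only one parameter (the rank) to tune, is easier to use across signal processing tasks.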

详情
AI中文摘要

本文利用7种植物样本的傅里叶变换红外(FTIR)光谱,探讨了预处理和特征提取对机器学习算法效率的影响。将小波张量分解(WTT)与离散小波变换(DWT)作为药用植物FTIR数据的特征提取技术进行比较。各种信号处理步骤在应用于分类和聚类任务时表现出不同的行为。通过网格搜索找到的WTT和DWT的最佳结果相似,显著提高了聚类质量和调优后的逻辑回归分类准确率,相比原始光谱。与DWT不同,WTT只有一个参数(秩)需要调优,使其成为在各种信号处理应用中更通用和易用的数据处理工具。

英文摘要

Fourier-transform infra-red (FTIR) spectra of samples from 7 plant species were used to explore the influence of preprocessing and feature extraction on the efficiency of machine learning algorithms. Wavelet Tensor Train (WTT) and Discrete Wavelet Transforms (DWT) were compared as feature extraction techniques for FTIR data of medicinal plants. Various combinations of signal processing steps showed different behavior when applied to classification and clustering tasks. The best results for WTT and DWT found through grid search were similar, significantly improving the quality of clustering as well as the classification accuracy of tuned logistic regression in comparison to the original spectra. Unlike DWT, WTT has only one parameter to be tuned (rank), making it a more versatile and easier-to-use data processing tool in various signal processing applications.

1704.07897 2026-06-04 math.NA cs.NA math.PR math.ST stat.TH

Diffeomorphic random sampling using optimal information transport

利用最优信息传输的微分同胚随机采样

Martin Bauer, Sarang Joshi, Klas Modin

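AI summary: This paper presents a diffeomorphic random sampling algorithm based on optimal information transport for sampling nonuniform probability distributions on Riemannian manifolds. It exploits the geometric connection between the Fisher-Rao metric and the group of diffeomorphisms and avoids the nonlinear Monge-Ampere equation.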

详情
Comments
8 pages, 3 figures
AI中文摘要

本文探讨了一种用于在黎曼流形上进行非均匀概率分布微分同胚随机采样的算法。该算法基于最优信息传输(OIT),类似于最优质量传输(OMT)。我们的框架利用 Fisher-Rao 度量在概率密度空间上的深度几何联系以及右不变信息度量在微分同胚群上的联系。所得到的采样算法是 OMT 的一种有前景的替代方法,特别是在我们的公式化是半显式且不涉及非线性 Monge--Ampere 方程的情况下。与马尔可夫链蒙特卡洛方法相比,我们预计在需要大量从低维非均匀分布中采样时,该算法表现良好。

英文摘要

In this article we explore an algorithm for diffeomorphic random sampling of nonuniform probability distributions on Riemannian manifolds. The algorithm is based on optimal information transport (OIT)---an analogue of optimal mass transport (OMT). Our framework uses the deep geometric connections between the Fisher-Rao metric on the space of probability densities and the right-invariant information metric on the group of diffeomorphisms. The resulting sampling algorithm is a promising alternative to OMT, in particular as our formulation is semi-explicit, free of the nonlinear Monge--Ampere equation. Compared to Markov Chain Monte Carlo methods, we expect our algorithm to stand up well when a large number of samples from a low dimensional nonuniform distribution is needed.

1709.07224 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY stat.ML

Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

局部通信协议用于通过深度强化学习学习复杂群集行为

Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

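AI summary: This paper proposes simple communication protocols that deep reinforcement learning can exploit to learn decentralized control policies in a multi-robot swarm. The protocols use histograms to encode local neighborhood relations and can transmit task-specific information, such as the shortest distance and direction to a target, to complete collaborative tasks.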

详情
Comments
13 pages, 4 figures, version 2, accepted at ANTS 2018
AI中文摘要

群集系统对强化学习(RL)构成挑战,因为算法需要学习去中心化控制策略以应对代理的有限局部感知和通信能力。虽然直接定义代理行为困难,但可通过先验知识定义简单的通信协议。本文提出多种简单通信协议,用于深度强化学习在多机器人群环境中寻找去中心化控制策略。协议基于直方图编码代理的局部邻域关系,并可传输任务特定信息,如到目标的最短距离和方向。在我们的框架中,我们采用信任区域策略优化的变体来学习复杂协作任务,如编队和建立通信链路。我们在模拟的2D物理环境中评估了我们的发现,并比较了不同通信协议的影响。

英文摘要

Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.

1803.06531 2026-06-04 eess.SY cs.SY math.OC stat.ML

Topology Estimation using Graphical Models in Multi-Phase Power Distribution Grids

利用图形模型进行多相电力分配电网拓扑估计

Deepjyoti Deka, Michael Chertkov, Scott Backhaus

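AI summary: This paper reconstructs the radial operational topology of multi-phase power distribution grids from synchronized voltage measurements, detecting operational lines with conditional independence tests, and validates the algorithm on IEEE test cases.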

详情
Comments
12 pages, 9 figures
AI中文摘要

配电电网是大电力系统中的中压和低压部分。结构上,大多数配电网络呈放射状运行,使带电线路形成树状结构,即森林,其中变电站位于每棵树的根部。然而,跟踪这些变化对于配电电网的操作和控制至关重要,但受限于实时监控的不足。本文开发了一个学习框架,从受节点功率波动影响的电网中同步电压测量中重建配电电网的径向运行结构。为了检测运行线路,我们的学习算法使用适用于广泛概率分布和高斯注入的连续随机变量条件独立性测试。此外,我们的算法适用于不平衡三相电力流的实际情况。算法性能在IEEE配电电网测试案例的AC电力流模拟上进行了验证。

英文摘要

The distribution grid is the medium- and low-voltage part of a large power system. Structurally, the majority of distribution networks operate radially, such that energized lines form a collection of trees, i.e., a forest, with a substation at the root of each tree. The operational topology (forest) may change from time to time; however, tracking these changes, even though important for distribution grid operation and control, is hindered by limited real-time monitoring. This paper develops a learning framework to reconstruct the radial operational structure of the distribution grid from synchronized voltage measurements in the grid subject to exogenous fluctuations in nodal power consumption. To detect operational lines, our learning algorithm uses conditional independence tests for continuous random variables that are applicable to a wide class of probability distributions of the nodal consumption, and to Gaussian injections in particular. Moreover, our algorithm applies to the practical case of unbalanced three-phase power flow. Algorithm performance is validated on AC power flow simulations over IEEE distribution grid test cases.
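For the Gaussian case mentioned in the abstract, conditional independence can be read off the precision (inverse covariance) matrix. The sketch below shows only that generic test on a hypothetical three-variable chain. It does not reproduce the paper's mapping from voltage measurements to operational lines, and the matrix entries are made up for illustration.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation of each pair of variables given all the others."""
    theta = np.linalg.inv(cov)                 # precision matrix
    scale = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(scale, scale)      # rho_ij = -theta_ij / sqrt(theta_ii theta_jj)
    np.fill_diagonal(rho, 1.0)
    return rho

# Hypothetical Gaussian chain 0 - 1 - 2: variables 0 and 2 interact only through 1,
# which shows up as a zero in the (0, 2) entry of the precision matrix.
theta_chain = np.array([[2.0, -1.0, 0.0],
                        [-1.0, 3.0, -1.0],
                        [0.0, -1.0, 2.0]])
cov_chain = np.linalg.inv(theta_chain)
rho = partial_correlations(cov_chain)
# rho[0, 2] is numerically zero: 0 and 2 are conditionally independent given 1.
```

With sample data, `cov` would be replaced by an empirical covariance and the zero test by a thresholded or statistical test.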

1807.03769 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML

Kernel-Based Learning for Smart Inverter Control

基于核方法的智能逆变器控制

Aditie Garg, Mana Jalali, Vassilis Kekatos, Nikolaos Gatsis

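AI summary: This paper puts forth nonlinear inverter control policies. Drawing an analogy to multi-task learning, it poses reactive control as a kernel-based regression task and, using a linearized grid model and anticipated data scenarios, jointly designs inverter rules at the feeder level to minimize voltage deviations and ohmic losses.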

详情
Comments
Submitted to the 2018 IEEE Global Signal and Information Processing Conf., Symposium on Smart Energy Infrastructures
AI中文摘要

目前,分布电网面临由间歇性太阳能发电引起的频繁电压波动的挑战。智能逆变器被倡导为一种快速响应的手段,用于调节电压并最小化电阻损耗。由于最优逆变器协调可能计算上具有挑战性,而预设的本地控制规则表现不佳,因此定制化的准静态控制规则被视为最佳折中方案。本文从仿射控制规则出发,提出非线性逆变器控制策略。通过类比多任务学习,将反应控制视为基于核的回归任务。利用线性化电网模型和给定的预期数据场景,在馈线层面联合设计逆变器规则,以最小化电压偏差和电阻损耗的凸组合,通过线性约束的二次规划。使用真实世界数据在基准馈线上的数值测试表明,非线性控制规则即使由少数非本地读数驱动,也能实现近最优性能。

英文摘要

Distribution grids are currently challenged by frequent voltage excursions induced by intermittent solar generation. Smart inverters have been advocated as a fast-responding means to regulate voltage and minimize ohmic losses. Since optimal inverter coordination may be computationally challenging and preset local control rules are subpar, the approach of customized control rules designed in a quasi-static fashion features as a golden middle. Departing from affine control rules, this work puts forth non-linear inverter control policies. Drawing analogies to multi-task learning, reactive control is posed as a kernel-based regression task. Leveraging a linearized grid model and given anticipated data scenarios, inverter rules are jointly designed at the feeder level to minimize a convex combination of voltage deviations and ohmic losses via a linearly-constrained quadratic program. Numerical tests using real-world data on a benchmark feeder demonstrate that nonlinear control rules driven also by a few non-local readings can attain near-optimal performance.

1807.02297 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

基于动态偏好的激励机制组合博弈问题

Tanner Fiez, Shreyas Sekar, Liyuan Zheng, Lillian J. Ratliff

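AI summary: This paper proposes a multi-armed bandit framework for matching incentives to users in a resource-constrained environment, combining greedy matching, the UCB algorithm, and Markov chain mixing times. It provides theoretical regret bounds and demonstrates performance on synthetic and realistic examples.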

详情
Comments
Published as a conference paper in Conference on Uncertainty in Artificial Intelligence (UAI) 2018
AI中文摘要

个性化激励或推荐设计以提高用户参与度正日益受到重视,随着数字平台提供商不断涌现。我们提出了一种多臂老虎机框架,用于匹配激励给用户,其偏好在事前未知且随时间动态变化,在资源受限环境下。我们设计了一种算法,结合了三个不同领域的思想:(i) 贪心匹配范式,(ii) 用于老虎机的上置信界算法 (UCB),以及 (iii) 马尔可夫链理论中的混合时间。对于该算法,我们提供了关于 regret 的理论界限,并通过合成和现实(如共享单车平台的供需匹配)示例展示了其性能。

英文摘要

The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platform providers continually emerge. We propose a multi-armed bandit framework for matching incentives to users, whose preferences are unknown a priori and evolving dynamically in time, in a resource constrained environment. We design an algorithm that combines ideas from three distinct domains: (i) a greedy matching paradigm, (ii) the upper confidence bound algorithm (UCB) for bandits, and (iii) mixing times from the theory of Markov chains. For this algorithm, we provide theoretical bounds on the regret and demonstrate its performance via both synthetic and realistic (matching supply and demand in a bike-sharing platform) examples.
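Of the three ingredients listed in the abstract, the upper confidence bound rule is the easiest to show in isolation. The sketch below is the classical UCB1 index for independent arms. The paper's algorithm additionally uses greedy matching and mixing-time arguments, which are not reproduced here, and the function names are illustrative.

```python
import math

def ucb_index(mean_reward, pulls, t):
    """UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return math.inf          # force at least one pull of every arm
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)

def select_arm(means, pulls, t):
    """Pick the arm with the largest index (ties go to the lowest arm number)."""
    scores = [ucb_index(m, n, t) for m, n in zip(means, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)

# An arm that has never been tried wins regardless of the other arm's mean.
first_choice = select_arm([0.5, 0.4], [10, 0], t=11)
# With equal pull counts the bonuses cancel and the higher mean wins.
later_choice = select_arm([0.9, 0.1], [50, 50], t=100)
```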

1807.00553 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.DS stat.ML

A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics

对自动化决策中偏见的更广泛视角:反思认识论与动态性

Roel Dobbe, Sarah Dean, Thomas Gilbert, Nitin Kohli

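AI summary: This paper examines sources of bias in automated decision-making, interpreting technical bias as an epistemological problem and emergent bias as a dynamical feedback phenomenon, and stresses the need to reflect on epistemology and to adopt value-sensitive design methodologies.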

详情
Comments
Presented at the 2018 Workshop on Fairness, Accountability and Transparency in Machine Learning during ICML 2018, Stockholm, Sweden
AI中文摘要

机器学习(ML)正日益应用于现实世界,提供可操作见解并成为自动化决策系统的基础。尽管训练数据中固有的偏见是公平性讨论的核心问题,但这些系统也受到技术性和新兴偏见的影响,后者常作为实施中的上下文特定产物出现。本文将技术偏见视为认识论问题,新兴偏见视为动态反馈现象。为激发关于如何改变机器学习实践以有效应对这些问题的讨论,本文探索了偏见的更广泛视角,强调反思认识论的必要性,并指出价值敏感设计方法以重新审视自动化决策系统的设计和实施过程。

英文摘要

Machine learning (ML) is increasingly deployed in real world contexts, supplying actionable insights and forming the basis of automated decision-making systems. While issues resulting from biases pre-existing in training data have been at the center of the fairness debate, these systems are also affected by technical and emergent biases, which often arise as context-specific artifacts of implementation. This position paper interprets technical bias as an epistemological problem and emergent bias as a dynamical feedback phenomenon. In order to stimulate debate on how to change machine learning practice to effectively address these issues, we explore this broader view on bias, stress the need to reflect on epistemology, and point to value-sensitive design methodologies to revisit the design and implementation process of automated decision-making systems.

1707.02695 2026-06-04 math.NA cs.NA stat.CO

Symmetrized importance samplers for stochastic differential equations

对随机微分方程的对称重要性采样方法

Andrew Leach, Kevin K. Lin, Matthias Morzfeld

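AI summary: This paper studies importance sampling methods for stochastic differential equations. A small-noise analysis suggests that a simple symmetrization procedure can significantly improve performance when the noise is not too large, which is confirmed on linear and nonlinear examples. Potential applications such as data assimilation are discussed.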

详情
Journal ref
Commun. Appl. Math. Comput. Sci. 13 (2018) 215-241
Comments
Added brief discussion of Hamilton-Jacobi equation. Also made various minor corrections. To appear in Communications in Applied Mathematics and Computational Science
AI中文摘要

我们研究了随机微分方程(SDEs)的一类重要性采样方法。进行了小噪声分析,结果表明,当噪声不是太大时,简单的对称化程序可以显著提高我们重要性采样方案的性能。我们证明了这一点在多个线性和非线性示例中确实成立。讨论了潜在的应用,例如数据同化。

英文摘要

We study a class of importance sampling methods for stochastic differential equations (SDEs). A small-noise analysis is performed, and the results suggest that a simple symmetrization procedure can significantly improve the performance of our importance sampling schemes when the noise is not too large. We demonstrate that this is indeed the case for a number of linear and nonlinear examples. Potential applications, e.g., data assimilation, are discussed.

1804.06467 2026-06-04 math.NA cs.NA math.ST stat.TH

A Galerkin Isogeometric Method for Karhunen-Loeve Approximation of Random Fields

一种用于随机场Karhunen-Loeve近似的Galerkin同调方法

Sharif Rahman

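AI summary: This paper proposes a NURBS-based Galerkin isogeometric method for solving a Fredholm integral eigenvalue problem, enabling random field discretization via the Karhunen-Loeve expansion, with exact geometry representation and globally smooth eigensolutions.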

详情
Comments
31 pages, 12 figures, five tables; accepted by Computer Methods in Applied Mechanics and Engineering
AI中文摘要

本文介绍了用于求解Fredholm积分特征值问题的Galerkin同调方法的首次应用,通过Karhunen-Loeve展开实现随机场离散化。该方法在Hilbert空间的有限维子空间上进行Galerkin投影,利用B样条和非均匀有理B样条(NURBS)构建子空间,并采用标准特征解方法。与现有Galerkin方法(如有限元和无网格方法)相比,基于NURBS的同调方法能够保持物理或计算域的几何精确性,并利用基函数的正则性获得全局光滑的特征解。因此,将同调方法引入随机场离散化不仅是新的,也提供了现有方法的计算优势。总体而言,使用NURBS进行随机场离散化丰富了同调范式。因此,可以展望未来不确定量化流程,其中几何建模、应力分析和随机模拟均可通过NURBS相同构建块集成。三个数值示例,包括三维随机场离散化问题,展示了同调方法在获得特征解方面的准确性和收敛性。

英文摘要

This paper marks the debut of a Galerkin isogeometric method for solving a Fredholm integral eigenvalue problem, enabling random field discretization by means of the Karhunen-Loeve expansion. The method involves a Galerkin projection onto a finite-dimensional subspace of a Hilbert space, basis splines (B-splines) and non-uniform rational B-splines (NURBS) spanning the subspace, and standard methods of eigensolutions. Compared with the existing Galerkin methods, such as the finite-element and mesh-free methods, the NURBS-based isogeometric method upholds exact geometrical representation of the physical or computational domain and exploits regularity of basis functions delivering globally smooth eigensolutions. Therefore, the introduction of the isogeometric method for random field discretization is not only new; it also offers a few computational advantages over existing methods. In the big picture, the use of NURBS for random field discretization enriches the isogeometric paradigm. As a result, an uncertainty quantification pipeline of the future can be envisioned where geometric modeling, stress analysis, and stochastic simulation are all integrated using the same building blocks of NURBS. Three numerical examples, including a three-dimensional random field discretization problem, illustrate the accuracy and convergence properties of the isogeometric method for obtaining eigensolutions.
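To make the Fredholm integral eigenvalue problem behind the Karhunen-Loeve expansion concrete, here is a deliberately simpler discretization than the one in the abstract: a midpoint-rule (Nystrom-type) scheme on a 1D interval, not a Galerkin isogeometric method. The exponential covariance kernel, the unit interval, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def kl_modes_1d(n=200, corr_len=0.5, sigma2=1.0, n_modes=5):
    """Approximate KL eigenpairs of an exponential covariance on [0, 1]."""
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h                      # cell midpoints
    C = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
    # Midpoint quadrature turns  int C(x, y) phi(y) dy = lam phi(x)
    # into the symmetric matrix eigenproblem (C h) phi = lam phi.
    lam, phi = np.linalg.eigh(C * h)
    order = np.argsort(lam)[::-1][:n_modes]           # largest eigenvalues first
    lam = lam[order]
    phi = phi[:, order] / np.sqrt(h)                  # so that int phi^2 dx = 1
    return x, lam, phi

x, lam, phi = kl_modes_1d(n=100, n_modes=100)
# The eigenvalues sum to the integrated variance, sigma2 * |domain| = 1 here.
total_variance = lam.sum()
```

Truncating after the leading modes gives the usual reduced representation of the random field. The fraction `lam[:k].sum() / total_variance` indicates how much variance `k` modes retain.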

1710.02669 2026-06-04 stat.CO econ.GN q-fin.EC stat.AP

Aggregated moving functional median in robust prediction of hierarchical functional time series - an application to forecasting web portal users behaviors

聚合移动函数中位数在鲁棒预测层次函数时间序列中的应用——以预测网络门户用户行为为例

Daniel Kosiorowski, Dominik Mielczarek, Jerzy P. Rydlewski

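AI summary: This paper proposes a nonparametric, robust method for forecasting hierarchical functional time series. Analytical, simulation, and empirical studies indicate that it is superior to Hyndman and Shang's method in robustness and computational complexity, and it is applied to managing a web portal.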

详情
AI中文摘要

本文介绍了一种新的非参数和鲁棒方法,用于预测层次函数时间序列。该方法与Hyndman和Shang的方法在无偏性、有效性、鲁棒性和计算复杂度方面进行了比较。考虑到分析、模拟和实证研究的结果,我们得出结论,我们的方法在某些统计标准,尤其是鲁棒性和计算复杂度方面优于Hyndman和Shang的方法。我们的方法的实证实用性通过管理一个分为四个子服务的网络门户的例子来展示。此外,还进行了涉及由FAR(1)过程和维纳过程组成的层次系统的广泛模拟研究。

英文摘要

In this article, a new nonparametric and robust method of forecasting hierarchical functional time series is presented. The method is compared with Hyndman and Shang's method with respect to unbiasedness, effectiveness, robustness, and computational complexity. Taking into account the results of analytical, simulation, and empirical studies, we conclude that our proposal is superior to that of Hyndman and Shang with respect to some statistical criteria, especially robustness and computational complexity. The empirical usefulness of our method is illustrated with the example of managing a web portal divided into four subservices. An extensive simulation study involving hierarchical systems consisting of FAR(1) processes and Wiener processes has been conducted as well.

1712.05460 2026-06-04 math.CO cs.NA math.NA stat.CO

Honey from the Hives: A Theoretical and Computational Exploration of Combinatorial Hives

蜂巢中的蜂蜜:组合蜂巢的理论与计算探索

John Lombard

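AI summary: This paper studies, in theory and computation, the generation of hives from pairs of Hermitian matrices through an optimization scheme, gives a practical algorithm, and experimentally confirms two numerical algorithms for estimating Littlewood-Richardson coefficients.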

详情
Comments
25 pages, 20 figures
AI中文摘要

本文首先回顾了Knutson和Tao提出的组合蜂巢,并探讨了Danilov和Koshevoy的猜想,分析了Appleby和Whitehead的提案中的障碍,并提出了更强的条件以几乎确定性生成蜂巢。我们首次将该方案映射到实用算法空间,以产生验证性的计算结果,并开辟了研究选定矩阵集合的随机几何和曲率的新领域。本文的第二部分关注Littlewood-Richardson系数及其估计方法,展示了两种数值算法的实验验证:一种是基于Narayanan提议的连续蜂巢多面体体积的圆润估计器,另一种是基于蜂巢晶格本身的坐标击中-运行构造的新方法。我们比较了每种方法的优势,并包括了在某些测试案例中的准确性数值结果。

英文摘要

In the first half of this manuscript, we begin with a brief review of combinatorial hives as introduced by Knutson and Tao, and focus on a conjecture by Danilov and Koshevoy for generating such a hive from Hermitian matrix pairs through an optimization scheme. We examine a proposal by Appleby and Whitehead in the spirit of this conjecture and analytically elucidate an obstruction in their construction for guaranteeing hive generation, while detailing stronger conditions under which we can produce hives with almost certain probability. We provide the first mapping of this prescription onto a practical algorithmic space that enables us to produce affirming computational results and open a new area of research into the analysis of the random geometries and curvatures of hive surfaces from select matrix ensembles. The second part of this manuscript concerns Littlewood-Richardson coefficients and methods of estimating them from the hive construction. We illustrate experimental confirmation of two numerical algorithms that we provide as tools for the community: one as a rounded estimator on the continuous hive polytope volume following a proposal by Narayanan, and the other as a novel construction using a coordinate hit-and-run on the hive lattice itself. We compare the advantages of each, and include numerical results on their accuracies for some tested cases.

1806.09919 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Tangent-Space Regularization for Neural-Network Models of Dynamical Systems

神经动力系统模型中的切空间正则化

Fredrik Bagge Carlson, Rolf Johansson, Anders Robertsson

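AI summary: This paper introduces tangent-space regularization for neural-network models of dynamical systems. Using properties of the tangent space of the dynamics function, it regularizes the model Jacobian to reduce the need for large amounts of training data, and it studies how different architectures learn the input-output Jacobian and how L2 regularization affects system stability.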

详情
AI中文摘要

本文介绍了神经网络动力系统模型中的切空间正则化概念。许多物理系统在控制应用中的动力学函数的切空间表现出有用性质,例如光滑性,这促使通过假设动力学的切空间来沿系统轨迹正则化模型雅可比矩阵。在没有假设的情况下,神经网络需要大量训练数据才能学习完整的非线性动力学而不过拟合。本文比较了不同网络架构在一步预测和模拟性能上的表现,并研究了不同架构学习具有正确输入输出雅可比矩阵的倾向。此外,探讨了L2权重正则化对学习雅可比特征值谱以及系统稳定性的影响。

英文摘要

This work introduces the concept of tangent space regularization for neural-network models of dynamical systems. The tangent space to the dynamics function of many physical systems of interest in control applications exhibits useful properties, e.g., smoothness, motivating regularization of the model Jacobian along system trajectories using assumptions on the tangent space of the dynamics. Without assumptions, large amounts of training data are required for a neural network to learn the full non-linear dynamics without overfitting. We compare different network architectures on one-step prediction and simulation performance and investigate the propensity of different architectures to learn models with correct input-output Jacobian. Furthermore, the influence of $L_2$ weight regularization on the learned Jacobian eigenvalue spectrum, and hence system stability, is investigated.

1707.01764 2026-06-04 math.ST cs.NA math.AP math.NA stat.TH

Bernstein - von Mises theorems for statistical inverse problems I: Schrödinger equation

贝塞尔-冯-米塞定理在统计反问题中的应用 I:薛定谔方程

Richard Nickl

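AI summary: This paper studies the inverse problem of determining the unknown potential in a Schrödinger equation, devises a nonparametric Bayesian prior, and proves a Bernstein-von Mises theorem, showing that the posterior distribution yields valid frequentist inference in the small noise limit.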

详情
Comments
46 pages, to appear in the Journal of the European Mathematical Society (JEMS)
AI中文摘要

本文研究了确定薛定谔方程中未知势函数的问题,提出非参数贝叶斯先验,并证明了贝塞尔-冯-米塞定理,表明后验分布在小噪声极限下能进行有效的统计推断。

英文摘要

The inverse problem of determining the unknown potential $f>0$ in the partial differential equation $$\frac{\Delta}{2} u - fu = 0 \text{ on } \mathcal O ~~\text{s.t. } u = g \text{ on } \partial \mathcal O,$$ where $\mathcal O$ is a bounded $C^\infty$-domain in $\mathbb R^d$ and $g>0$ is a given function prescribing boundary values, is considered. The data consist of the solution $u$ corrupted by additive Gaussian noise. A nonparametric Bayesian prior for the function $f$ is devised and a Bernstein - von Mises theorem is proved which entails that the posterior distribution given the observations is approximated in a suitable function space by an infinite-dimensional Gaussian measure that has a `minimal' covariance structure in an information-theoretic sense. As a consequence the posterior distribution performs valid and optimal frequentist statistical inference on $f$ in the small noise limit.

1606.04567 2026-06-04 cs.CE cs.NA math.NA physics.comp-ph stat.ML

Regression-based reduced-order models to predict transient thermal output for enhanced geothermal systems

基于回归的降阶模型用于预测增强地热系统瞬态热输出

M. K. Mudunuru, S. Karra, D. R. Harp, G. D. Guthrie, H. S. Viswanathan

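AI summary: This paper uses reduced-order models (ROMs) to predict the transient thermal output of enhanced geothermal systems. After identifying the key sensitive parameters, it proposes three ROMs of differing parsimony and assesses their predictive ability at different permeabilities.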

详情
Journal ref
M.K. Mudunuru, S. Karra, D.R. Harp, G.D. Guthrie, H.S. Viswanathan, Regression-based reduced-order models to predict transient thermal output for enhanced geothermal systems, Geothermics, Volume 70, 2017, Pages 192-205
Comments
25 pages, 8 figures
AI中文摘要

本文旨在评估基于三维物理模型开发的降阶模型(ROM)在预测增强地热储层瞬态热功率输出时的实用性,同时明确考虑地下系统和现场特定细节的不确定性。通过拉丁超立方抽样对模型输入进行数值模拟,识别出关键敏感参数,包括裂缝区渗透率、井/皮层因子、底孔压力和注入流量率。ROM的输入基于这些关键敏感参数,然后用于评估地下属性对热功率生产曲线的影响。所得到的ROM与现场数据和详细物理数值模拟进行比较。我们提出三种不同模型简洁程度的ROM,每种都描述了功率生产曲线的关键和本质特征。ROM-1能够准确再现低渗透率下的数值模拟功率输出以及某些现场数据特征,且相对简洁。ROM-2比ROM-1更复杂,但能准确描述现场数据。在较高渗透率下,ROM-2比ROM-1更准确地再现数值结果,但在低裂缝区渗透率下存在显著偏差。ROM-3结合了ROM-1和ROM-2的最佳特性,提供了一个折中的模型简洁性。它能够描述各种数值模拟和现场数据的特征。从所提出的流程中,我们证明所提出的简单ROM能够捕捉Fenton Hill HDR系统功率生产曲线的多种复杂特征。对于典型EGS应用,ROM-2和ROM-3优于ROM-1。

英文摘要

The goal of this paper is to assess the utility of Reduced-Order Models (ROMs) developed from 3D physics-based models for predicting transient thermal power output for an enhanced geothermal reservoir while explicitly accounting for uncertainties in the subsurface system and site-specific details. Numerical simulations are performed based on Latin Hypercube Sampling (LHS) of model inputs drawn from uniform probability distributions. Key sensitive parameters are identified from these simulations: fracture zone permeability, well/skin factor, bottom hole pressure, and injection flow rate. The inputs for the ROMs are based on these key sensitive parameters. The ROMs are then used to evaluate the influence of subsurface attributes on thermal power production curves. The resulting ROMs are compared with field data and the detailed physics-based numerical simulations. We propose three ROMs with different levels of model parsimony, each describing key and essential features of the power production curves. ROM-1 is relatively parsimonious and accurately reproduces the power output of numerical simulations for low permeabilities, as well as certain features of the field-scale data. ROM-2 is more complex than ROM-1 but accurately describes the field data. At higher permeabilities, ROM-2 reproduces numerical results better than ROM-1; however, there is a considerable deviation at low fracture zone permeabilities. ROM-3 is developed by taking the best aspects of ROM-1 and ROM-2 and provides a middle ground for model parsimony. It is able to describe various features of numerical simulations and field data. Using the proposed workflow, we demonstrate that the proposed simple ROMs are able to capture various complex features of the power production curves of the Fenton Hill HDR system. For typical EGS applications, ROM-2 and ROM-3 outperform ROM-1.
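The Latin Hypercube Sampling step mentioned in the abstract can be sketched in a few lines of NumPy. The sampler below places exactly one point in each of `n_samples` equal-probability strata along every input dimension and then rescales to uniform ranges. The four bounds are placeholder values for illustration only, not the ranges used in the paper.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """One point per equal-width stratum in every dimension, on [0, 1)."""
    jitter = rng.random((n_samples, n_dims))          # position inside each stratum
    strata = np.column_stack(
        [rng.permutation(n_samples) for _ in range(n_dims)]
    )                                                 # independent shuffle per dimension
    return (strata + jitter) / n_samples

def scale_to_bounds(unit, lower, upper):
    """Map unit-cube samples to uniform distributions on [lower, upper]."""
    return lower + unit * (upper - lower)

rng = np.random.default_rng(0)
unit = latin_hypercube(8, 4, rng)
# Placeholder bounds for four inputs (hypothetical values and units).
lower = np.array([0.0, -1.0, 10.0, 1.0])
upper = np.array([1.0, 1.0, 20.0, 5.0])
samples = scale_to_bounds(unit, lower, upper)
```

Each row of `samples` would be one input set for a forward simulation. The stratification is what distinguishes LHS from plain uniform sampling.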

1806.05419 2026-06-04 stat.ML cs.LG cs.NA math.NA math.ST stat.TH

Ranking Recovery from Limited Comparisons using Low-Rank Matrix Completion

通过低秩矩阵补全进行有限比较的排序恢复

Tal Levy, Alireza Vahid, Raja Giryes

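AI summary: This paper uses low-rank matrix completion to solve the classical rank aggregation problem, putting partial and noisy pairwise comparison data into matrix form and combining an alternating minimization algorithm with maximum likelihood estimation to recover the true underlying preference intensity.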

详情
Comments
10 Pages, 9 figures. A prediction table for 2018 FIFA soccer world cup is included
AI中文摘要

本文提出了一种新的方法,利用低秩矩阵补全技术解决经典的排名聚合问题。通过将成对比较的不完全和噪声数据转换为矩阵形式,并利用矩阵补全工具(如Netflix挑战中的低秩补全解决方案)来构建不同对象的偏好。在我们的方法中,利用多个比较数据估计对象i相对于对象j获胜(或被选择)的概率,其中仅已知N个对象的部分比较数据。然后将数据转换为矩阵形式,其无噪声解具有已知的秩为一。接着使用目标矩阵具有双线性形式的交替最小化算法,并结合最大似然估计对两个因素进行估计。重建的矩阵用于获得真实的潜在偏好强度。本工作在模拟场景和真实数据中展示了所提算法相对于当前最先进方法的改进。

英文摘要

This paper proposes a new method for solving the well-known rank aggregation problem from pairwise comparisons using the method of low-rank matrix completion. The partial and noisy data of pairwise comparisons is transformed into a matrix form. We then use tools from matrix completion, which has served as a major component in the low-rank completion solution of the Netflix challenge, to construct the preference of the different objects. In our approach, the data of multiple comparisons is used to create an estimate of the probability of object i to win (or be chosen) over object j, where only a partial set of comparisons between N objects is known. The data is then transformed into a matrix form for which the noiseless solution has a known rank of one. An alternating minimization algorithm, in which the target matrix takes a bilinear form, is then used in combination with maximum likelihood estimation for both factors. The reconstructed matrix is used to obtain the true underlying preference intensity. This work demonstrates the improvement of our proposed algorithm over the current state-of-the-art in both simulated scenarios and real data.
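One matrix form whose noiseless version has rank one, assumed here for illustration because the abstract does not spell out the construction, is the odds matrix under a Bradley-Terry-Luce model: if object i beats object j with probability w_i / (w_i + w_j), then the odds p_ij / p_ji equal w_i / w_j. The sketch uses fully observed, noiseless probabilities and hypothetical strengths `w`. The completion and maximum likelihood steps of the paper are not reproduced.

```python
import numpy as np

def odds_matrix(p):
    """M[i, j] = p[i, j] / p[j, i], the odds that object i beats object j."""
    return p / p.T

# Hypothetical latent strengths and the win probabilities they imply.
w = np.array([4.0, 2.0, 1.0])
p = w[:, None] / (w[:, None] + w[None, :])

M = odds_matrix(p)                    # equals the outer product of w and 1 / w
u, s, vt = np.linalg.svd(M)
scores = np.abs(u[:, 0])              # leading left singular vector is proportional to w
ranking = np.argsort(-scores)         # strongest object first
```

With partial and noisy comparisons, the missing entries of `M` would have to be filled in by a rank-one completion before the scores are read off.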

1606.07686 2026-06-04 math.NA cs.NA stat.ML

Gamblets for opening the complexity-bottleneck of implicit schemes for hyperbolic and parabolic ODEs/PDEs with rough coefficients

赌局基函数用于突破隐式方案在处理具有粗糙系数的超几何和抛物型ODEs/PDEs中的复杂性瓶颈

Houman Owhadi, Lei Zhang

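AI summary: This paper presents a generalization of gamblets for efficiently solving the linear systems arising in implicit schemes, with near-linear complexity and rigorous error bounds.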

详情
Journal ref
Journal of Computational Physics, 347, 99-128, 2017
Comments
55 pages, 26 figures
AI中文摘要

隐式方案是求解时间依赖PDEs(如超几何和抛物型PDEs)的流行方法。然而,每次时间步需求解对应的线性系统构成了处理具有粗糙系数的PDEs的复杂性瓶颈。我们介绍了在\cite{OwhadiMultigrid:2015}中提出的赌局基函数的推广,使得这些隐式系统能够在近线性复杂度下求解,并为超几何和抛物型PDEs的数值近似提供严谨的先验误差界。这些扩展的赌局基函数诱导了解空间的多分辨率分解,该分解适应于底层(超几何和抛物型)PDE(以及由空间离散化得到的ODE系统)和数值方案的时间步长。

英文摘要

Implicit schemes are popular methods for the integration of time-dependent PDEs such as hyperbolic and parabolic PDEs. However, the necessity to solve corresponding linear systems at each time step constitutes a complexity bottleneck in their application to PDEs with rough coefficients. We present a generalization of gamblets introduced in \cite{OwhadiMultigrid:2015} enabling the resolution of these implicit systems in near-linear complexity and provide rigorous a priori error bounds on the resulting numerical approximations of hyperbolic and parabolic PDEs. These generalized gamblets induce a multiresolution decomposition of the solution space that is adapted both to the underlying (hyperbolic or parabolic) PDE (and the system of ODEs resulting from space discretization) and to the time steps of the numerical scheme.

1804.01926 2026-06-04 cs.RO cs.SY eess.SY stat.ML

Scalable Magnetic Field SLAM in 3D Using Gaussian Process Maps

基于高斯过程地图的可扩展三维磁场SLAM

Manon Kok, Arno Solin

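AI summary: The paper presents 3D magnetic field SLAM that uses local magnetic anomalies as position information, modeling the map with Gaussian processes on a hexagonal block tiling and combining reduced-rank Gaussian process regression with a Rao-Blackwellised particle filter for efficient computation and map storage.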

详情
Comments
11 pages, 5 figures
AI中文摘要

我们提出了一种利用磁场局部异常作为位置信息源的可扩展且完全三维的磁场同时定位与建图(SLAM)方法。这些异常是由于建筑物结构和家具等物体中存在铁磁材料引起的。我们使用高斯过程模型表示磁场地图,并考虑磁场的已知物理性质。我们使用三维六边形分块进行局部地图构建。为了使我们的方法计算可行,我们结合降维高斯过程回归与 Rao-Blackwellised 粒子滤波。我们展示了使用智能手机测量可以得到准确的位置和姿态估计,并证明我们的方法在计算复杂度和地图存储方面都实现了可扩展的磁场SLAM算法。

英文摘要

We present a method for scalable and fully 3D magnetic field simultaneous localisation and mapping (SLAM) using local anomalies in the magnetic field as a source of position information. These anomalies are due to the presence of ferromagnetic material in the structure of buildings and in objects such as furniture. We represent the magnetic field map using a Gaussian process model and take well-known physical properties of the magnetic field into account. We build local maps using three-dimensional hexagonal block tiling. To make our approach computationally tractable we use reduced-rank Gaussian process regression in combination with a Rao-Blackwellised particle filter. We show that it is possible to obtain accurate position and orientation estimates using measurements from a smartphone, and that our approach provides a scalable magnetic field SLAM algorithm in terms of both computational complexity and map storage.

1806.03145 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Fidelity-based Probabilistic Q-learning for Control of Quantum Systems

基于保真度的概率Q学习用于量子系统的控制

Chunlin Chen, Daoyi Dong, Han-Xiong Li, Jian Chu, Tzyh-Jong Tarn

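AI summary: The paper proposes fidelity-based probabilistic Q-learning (FPQL) to balance exploration and exploitation in reinforcement learning and applies it to quantum system control; action probabilities are updated iteratively, giving a natural exploration strategy and faster learning.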

详情
Journal ref
IEEE Transactions on Neural Networks and Learning Systems, VOL. 25, NO. 5, pp.920-933, MAY 2014
Comments
13 pages, 16 figures
AI中文摘要

在强化学习中,探索与利用的平衡是一个关键问题,尤其是对于Q学习。本文提出一种基于保真度的概率Q学习(FPQL)方法,以自然解决此问题并应用于量子系统控制。该方法利用保真度指导学习过程,迭代更新每个状态的动作概率,从而实现自然探索策略而非依赖配置参数的尖锐策略。首先提出概率Q学习(PQL)算法以展示概率动作选择的基本思想,随后针对量子系统控制提出FPQL算法。通过两个例子(自旋-1/2系统和λ型原子系统)测试FPQL算法的性能。结果表明,FPQL算法在探索与利用之间取得更好的平衡,能够避免局部最优策略并加速学习过程。

英文摘要

The balance between exploration and exploitation is a key problem for reinforcement learning methods, especially for Q-learning. In this paper, a fidelity-based probabilistic Q-learning (FPQL) approach is presented to naturally solve this problem and is applied to learning control of quantum systems. In this approach, fidelity is adopted to help direct the learning process, and the probability of each action being selected at a certain state is updated iteratively along with the learning process, which leads to a natural exploration strategy instead of a pointed one with configured parameters. A probabilistic Q-learning (PQL) algorithm is first presented to demonstrate the basic idea of probabilistic action selection. Then the FPQL algorithm is presented for learning control of quantum systems. Two examples (a spin-1/2 system and a lambda-type atomic system) are used to test the performance of the FPQL algorithm. The results show that FPQL algorithms attain a better balance between exploration and exploitation, and can also avoid locally optimal policies and accelerate the learning process.

1806.02499 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Conditional probability calculation using restricted Boltzmann machine with application to system identification

基于受限玻尔兹曼机的条件概率计算及其在系统辨识中的应用

Erick de la Rosa, Wen Yu

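AI summary: The paper recasts nonlinear system identification as computing a conditional probability with a modified restricted Boltzmann machine, discusses binary-encoding and continuous-valued variants, gives a universal approximation analysis, and reports advantages when noise is large and the system dynamics are complex.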

详情
AI中文摘要

使用概率方法进行非线性系统辨识具有优势,如数据集中的噪声和离群值对概率模型影响小,输入特征可概率形式提取。概率模型的主要障碍是概率分布难以获得。本文将非线性系统辨识转化为求解条件概率问题,修改受限玻尔兹曼机(RBM),使联合概率、输入分布和条件概率可通过RBM训练计算。讨论了二进制编码和连续值方法,提出基于条件概率建模的通用逼近分析。使用两个非线性系统基准测试比较本文概率建模方法与其他黑盒建模方法。结果表明,该新方法在存在大量噪声和系统动态复杂时表现更优。

英文摘要

Probabilistic methods offer several advantages for nonlinear system identification: noise and outliers in the data set do not affect the probability models significantly, and the input features can be extracted in probabilistic form. The biggest obstacle for probability models is that the probability distributions are not easy to obtain. In this paper, we recast nonlinear system identification as the computation of a conditional probability. We then modify the restricted Boltzmann machine (RBM) so that the joint probability, the input distribution, and the conditional probability can be calculated through RBM training. Binary-encoding and continuous-valued methods are discussed. A universal approximation analysis for the conditional-probability-based modelling is proposed. We use two benchmark nonlinear systems to compare our probabilistic modelling method with other black-box modelling methods. The results show that the new method performs much better when the noise is large and the system dynamics are complex.

1806.01949 2026-06-04 cs.CE cs.NA math.NA physics.comp-ph stat.ML

Reduced-Order Modeling through Machine Learning Approaches for Brittle Fracture Applications

基于机器学习方法的脆性断裂降阶建模

A. Hunter, B. A. Moore, M. K. Mudunuru, V. T. Chau, R. L. Miller, R. B. Tchoua, C. Nyshadham, S. Karra, D. O. Malley, E. Rougier, H. S. Viswanathan, G. Srinivasan

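AI summary: The paper compares five reduced-order modeling approaches for brittle fracture in geomaterials (specifically concrete), four of them based on machine learning combined with physics-based assumptions, on a 2D concrete sample with 20 preexisting cracks under low-strain-rate pure tensile loading.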

详情
Comments
25 pages, 8 figures
AI中文摘要

本文介绍了五种用于地质材料(特别是混凝土)脆性断裂降阶建模的方法,其中四种基于机器学习算法,并结合物理假设以降低计算复杂度;研究对象为低应变率纯拉伸载荷下含20条预存裂纹的二维混凝土样本。

英文摘要

In this paper, five different approaches for reduced-order modeling of brittle fracture in geomaterials, specifically concrete, are presented and compared. Four of the five methods rely on machine learning (ML) algorithms to approximate important aspects of the brittle fracture problem. In addition to the ML algorithms, each method incorporates different physics-based assumptions in order to reduce the computational complexity while maintaining the physics as much as possible. This work specifically focuses on using the ML approaches to model a 2D concrete sample under low strain rate pure tensile loading conditions with 20 preexisting cracks present. A high-fidelity finite element-discrete element model is used to both produce a training dataset of 150 simulations and an additional 35 simulations for validation. Results from the ML approaches are directly compared against the results from the high-fidelity model. Strengths and weaknesses of each approach are discussed and the most important conclusion is that a combination of physics-informed and data-driven features are necessary for emulating the physics of crack propagation, interaction and coalescence. All of the models presented here have runtimes that are orders of magnitude faster than the original high-fidelity model and pave the path for developing accurate reduced order models that could be used to inform larger length-scale models with important sub-scale physics that often cannot be accounted for due to computational cost.

1606.04464 2026-06-04 cs.CE cs.NA math.NA physics.comp-ph physics.geo-ph stat.ML

Sequential geophysical and flow inversion to characterize fracture networks in subsurface systems

序贯地球物理与流体反演以表征地下系统中的裂缝网络

M. K. Mudunuru, S. Karra, N. Makedonska, T. Chen

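AI summary: The paper presents a non-intrusive sequential inversion framework that integrates geophysical and flow data to constrain subsurface discrete fracture networks: microseismic data bound the fracture-orientation statistics, and flow data then constrain the fracture lengths.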

详情
Comments
32 pages, 14 figures
AI中文摘要

地下应用包括地热能、地质碳封存、石油和天然气等,通常涉及最大化能量提取或流体存储。由于异质性和各向异性,表征地下环境极为复杂,需要从多种碎片化数据流中估计地下参数的不确定性。本文提出一种非侵入式序贯反演框架,整合地球物理和流体数据以约束地下离散裂缝网络(DFN)。该方法首先利用微震数据估计DFN裂缝取向的统计学界限,通过焦点机制(基于物理的方法)和地震数据的聚类分析(统计方法)结合来估计这些界限。然后,基于流体数据约束裂缝长度。通过一个具有代表性的合成示例验证了这种多物理场序贯反演的有效性。

英文摘要

Subsurface applications, including geothermal energy, geological carbon sequestration, and oil and gas, typically involve maximizing either the extraction of energy or the storage of fluids. Characterizing the subsurface is extremely complex due to heterogeneity and anisotropy. Due to this complexity, there are uncertainties in the subsurface parameters, which need to be estimated from multiple diverse and fragmented data streams. In this paper, we present a non-intrusive sequential inversion framework for integrating data from geophysical and flow sources to constrain subsurface Discrete Fracture Networks (DFN). In this approach, we first estimate bounds on the statistics for the DFN fracture orientations using microseismic data. These bounds are estimated through a combination of a focal mechanism (physics-based approach) and clustering analysis (statistical approach) of seismic data. Then, the fracture lengths are constrained based on the flow data. The efficacy of this multi-physics-based sequential inversion is demonstrated through a representative synthetic example.

1806.01678 2026-06-04 math.NA cs.LG cs.NA stat.ML

A Projection Method for Metric-Constrained Optimization

度量约束优化的一种投影方法

Nate Veldt, David Gleich, Anthony Wirth, James Saunderson

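AI summary: The paper proposes a new approach to metric-constrained optimization, generalizing and improving a projection algorithm to solve large optimization problems from graph clustering, and gives new approximation guarantees.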

详情
AI中文摘要

我们概述了一种解决强制输出变量三角不等式的优化问题的新方法。我们将其称为度量约束优化,并给出了在机器学习应用和图聚类理论近似算法中出现的几个例子。尽管这些问题是理论上的有趣问题,但实际求解具有挑战性,因为黑箱求解器需要高内存。为了解决这一挑战,我们首先证明了相关聚类的度量约束线性规划松弛等价于度量接近问题的特殊情况。然后我们通过推广和改进最初用于度量接近的简单投影算法,开发了一个通用求解器。我们为使用我们的框架找到几个具有挑战性的图聚类问题的最优解的下界提供了几种新的近似保证。我们还通过解决包含高达10^8个变量和10^11个约束的优化问题来展示我们框架的威力。

英文摘要

We outline a new approach for solving optimization problems which enforce triangle inequalities on output variables. We refer to this as metric-constrained optimization, and give several examples where problems of this form arise in machine learning applications and theoretical approximation algorithms for graph clustering. Although these problems are interesting from a theoretical perspective, they are challenging to solve in practice due to the high memory requirement of black-box solvers. In order to address this challenge we first prove that the metric-constrained linear program relaxation of correlation clustering is equivalent to a special case of the metric nearness problem. We then develop a general solver for metric-constrained linear and quadratic programs by generalizing and improving a simple projection algorithm originally developed for metric nearness. We give several novel approximation guarantees for using our framework to find lower bounds for optimal solutions to several challenging graph clustering problems. We also demonstrate the power of our framework by solving optimization problems involving up to 10^{8} variables and 10^{11} constraints.
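As a rough illustration of the kind of projection scheme this abstract refers to, the sketch below applies Dykstra-style cyclic projections to the plain least-squares metric nearness problem on a small complete graph. It is not the authors' solver: the function name `project_to_metric`, the dictionary layout, and the fixed number of passes are assumptions made for this example, and nonnegativity constraints and the linear or quadratic objectives treated in the paper are left out.

```python
from itertools import combinations


def project_to_metric(dist, passes=100):
    """Dykstra-style cyclic projection onto triangle-inequality constraints.

    `dist` maps every unordered pair (i, j) with i < j to a float.
    Returns a new dict that approximately satisfies
    x[a, b] <= x[a, c] + x[b, c] for all triples (hypothetical helper).
    """
    x = dict(dist)
    nodes = sorted({v for pair in x for v in pair})

    def key(a, b):
        return (a, b) if a < b else (b, a)

    # One constraint per triple and per choice of the "long" side.
    constraints = []
    for i, j, k in combinations(nodes, 3):
        for a, b, c in ((i, j, k), (i, k, j), (j, k, i)):
            constraints.append((key(a, b), key(a, c), key(b, c)))

    correction = {c: 0.0 for c in constraints}
    for _ in range(passes):
        for c in constraints:
            long_side, s1, s2 = c
            # Dykstra step: undo the correction applied for this constraint last pass.
            z = correction[c]
            x[long_side] += z
            x[s1] -= z
            x[s2] -= z
            # Project onto the half-space x[long] - x[s1] - x[s2] <= 0.
            # The constraint normal is (1, -1, -1), whose squared norm is 3.
            violation = x[long_side] - x[s1] - x[s2]
            theta = max(violation, 0.0) / 3.0
            x[long_side] -= theta
            x[s1] += theta
            x[s2] += theta
            correction[c] = theta
    return x


# Three points whose "distances" violate the triangle inequality: 10 > 1 + 1.
example = {(0, 1): 10.0, (0, 2): 1.0, (1, 2): 1.0}
repaired = project_to_metric(example)
```

In this toy case a single projection already repairs the violated triangle: the excess of 8 is split three ways, shrinking the long side and lengthening the two short ones. Each constraint touches only three variables, which is why schemes of this kind avoid holding the full constraint matrix in memory.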

1710.01852 2026-06-04 eess.SY cs.SY econ.EM eess.SP math.ST stat.TH

Finite Time Identification in Unstable Linear Systems

不稳定线性系统的有限时间辨识

Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

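AI summary: The paper studies finite-time parameter identification in unstable linear systems and establishes finite-time error bounds for least-squares estimates under a class of heavy-tailed noise distributions and transition matrices, using concentration inequalities for random matrices and martingale difference sequences.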

详情
AI中文摘要

稳定线性动力系统的参数辨识是文献中已被充分研究的问题,无论是低维还是高维情形。然而,对于不稳定情形几乎没有任何结果,特别是关于有限时间界的结果。在这种情形下,关于动态参数最小二乘估计的经典结果不再适用,因此需要发展新的概念和技术方法来解决这一问题。不稳定线性系统出现在控制理论、计量经济学和金融的关键实际应用中。本研究针对相当广泛的一类重尾噪声分布和此类系统的转移矩阵,建立了最小二乘估计辨识误差的有限时间界。结果将估计所需的时间长度(样本数)与问题维度以及真实转移矩阵和噪声分布的关键特征的函数联系起来。为此,适当利用了随机矩阵和鞅差序列的集中不等式。

英文摘要

Identification of the parameters of stable linear dynamical systems is a well-studied problem in the literature, both in the low and high-dimensional settings. However, there are hardly any results for the unstable case, especially regarding finite time bounds. For this setting, classical results on least-squares estimation of the dynamics parameters are not applicable and therefore new concepts and technical approaches need to be developed to address the issue. Unstable linear systems arise in key real applications in control theory, econometrics, and finance. This study establishes finite time bounds for the identification error of the least-squares estimates for a fairly large class of heavy-tailed noise distributions, and transition matrices of such systems. The results relate the time length (samples) required for estimation to a function of the problem dimension and key characteristics of the true underlying transition matrix and the noise distribution. To establish them, appropriate concentration inequalities for random matrices and for sequences of martingale differences are leveraged.
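To make the estimator in question concrete, here is a minimal sketch of least-squares identification for a scalar unstable system $x_{t+1} = a x_t + w_t$ with $|a| > 1$. The function names, the value a = 1.1, the Gaussian noise (which is not heavy-tailed), and the horizon of 100 steps are all assumptions chosen for illustration; the paper treats matrix-valued dynamics and a broader noise class.

```python
import random


def simulate(a, steps, noise_std, seed=0):
    """Simulate x_{t+1} = a * x_t + w_t from x_0 = 1 with Gaussian noise."""
    rng = random.Random(seed)
    x = [1.0]
    for _ in range(steps):
        x.append(a * x[-1] + rng.gauss(0.0, noise_std))
    return x


def least_squares_estimate(x):
    """Scalar least squares: argmin_a sum_t (x_{t+1} - a * x_t)^2."""
    num = sum(x[t + 1] * x[t] for t in range(len(x) - 1))
    den = sum(x[t] ** 2 for t in range(len(x) - 1))
    return num / den


# Unstable dynamics: |a| > 1, so the state grows geometrically.
trajectory = simulate(a=1.1, steps=100, noise_std=0.1)
a_hat = least_squares_estimate(trajectory)
```

Because the state grows while the noise does not, the signal-to-noise ratio of the regression improves over time, which is the intuition behind expecting small errors from short trajectories in the unstable regime.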

1806.01003 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Distributed Learning from Interactions in Social Networks

社交网络中交互的分布式学习

Francesco Sasso, Angelo Coluccia, Giuseppe Notarstefano

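AI summary: The paper proposes a distributed learning framework based on interactions in social networks, combining Bayesian and maximum-likelihood estimation and using graphical-model tools to obtain a parameter-hyperparameter estimator amenable to distributed computation, demonstrated on user profiling.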

详情
Comments
This submission is a shorter work (for conference publication) of a more comprehensive paper, already submitted as arXiv:1706.04081 (under review for journal publication). In this short submission only one social set-up is considered and only one of the relaxed estimators is proposed. Moreover, the exhaustive analysis, carried out in the longer manuscript, is completely missing in this version
AI中文摘要

我们考虑一个网络场景,其中代理可以根据表示某些交互的评分图来评估彼此。目标是设计一个分布式协议,由代理运行,使他们能够在有限的可能值中学习其未知状态。我们提出一个贝叶斯框架,其中评分和状态与具有未知参数和超参数的概率事件相关联。我们展示每个代理可以通过本地贝叶斯分类器和结合普通最大似然估计和经验贝叶斯方法的(集中式)最大似然(ML)估计器来学习其状态。通过使用图模型工具,我们可以获得评分和状态的条件依赖性的洞察,从而提供一个放松的概率模型,最终导致一个适合分布式计算的参数-超参数估计器。为了突出所提放松的适当性,我们将在社交互动设置中演示分布式估计器。

英文摘要

We consider a network scenario in which agents can evaluate each other according to a score graph that models some interactions. The goal is to design a distributed protocol, run by the agents, that allows them to learn their unknown state among a finite set of possible values. We propose a Bayesian framework in which scores and states are associated to probabilistic events with unknown parameters and hyperparameters, respectively. We show that each agent can learn its state by means of a local Bayesian classifier and a (centralized) Maximum-Likelihood (ML) estimator of parameter-hyperparameter that combines plain ML and Empirical Bayes approaches. By using tools from graphical models, which allow us to gain insight on conditional dependencies of scores and states, we provide a relaxed probabilistic model that ultimately leads to a parameter-hyperparameter estimator amenable to distributed computation. To highlight the appropriateness of the proposed relaxation, we demonstrate the distributed estimators on a social interaction set-up for user profiling.

1806.00589 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Efficient Entropy for Policy Gradient with Multidimensional Action Space

在多维动作空间中高效的策略梯度熵

Yiming Zhang, Quan Ho Vuong, Kenny Song, Xiao-Yue Gong, Keith W. Ross

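AI summary: The paper develops unbiased estimators for computing the policy-gradient entropy bonus efficiently in high-dimensional action spaces, and tests them on a multi-hunter multi-rabbit grid game and a multi-agent multi-arm bandit problem.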

详情
AI中文摘要

近年来,深度强化学习在解决高维状态空间(如Atari游戏)的序列决策过程方面表现出色。然而,许多强化学习问题涉及高维离散动作空间和高维状态空间。本文考虑熵奖励,用于在策略梯度中鼓励探索。在高维动作空间中,计算熵及其梯度需要枚举所有动作并为每个动作运行前向和反向传播,这可能计算上不可行。我们开发了几种新颖的无偏估计器用于熵奖励及其梯度。我们将这些估计器应用于几种参数化策略模型,包括独立采样、CommNet、带有修改MDP的自回归和带有LSTM的自回归。最后,我们在两个环境中测试我们的算法:一个多猎手多兔子网格游戏和一个多智能体多臂老虎机问题。结果表明,我们的熵估计器在边际额外计算成本下显著提升了性能。

英文摘要

In recent years, deep reinforcement learning has been shown to be adept at solving sequential decision processes with high-dimensional state spaces such as in the Atari games. Many reinforcement learning problems, however, involve high-dimensional discrete action spaces as well as high-dimensional state spaces. This paper considers entropy bonus, which is used to encourage exploration in policy gradient. In the case of high-dimensional action spaces, calculating the entropy and its gradient requires enumerating all the actions in the action space and running forward and backpropagation for each action, which may be computationally infeasible. We develop several novel unbiased estimators for the entropy bonus and its gradient. We apply these estimators to several models for the parameterized policies, including Independent Sampling, CommNet, Autoregressive with Modified MDP, and Autoregressive with LSTM. Finally, we test our algorithms on two environments: a multi-hunter multi-rabbit grid game and a multi-agent multi-arm bandit problem. The results show that our entropy estimators substantially improve performance with marginal additional computational cost.
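The cost contrast described in this abstract can be sketched as follows. The snippet compares exact entropy by enumeration of all joint actions with the simplest sample-based estimate, the average of $-\log \pi(a)$ over sampled actions, which is unbiased for the entropy. This is only an illustration for a policy with independent action dimensions; it is not claimed to be one of the paper's estimators, and the function names and the toy marginals are assumptions.

```python
import math
import random
from itertools import product


def exact_entropy_by_enumeration(marginals):
    """Entropy of a factorized policy, enumerating every joint action.

    The cost grows as the product of the per-dimension action counts.
    """
    total = 0.0
    for action in product(*[range(len(p)) for p in marginals]):
        prob = math.prod(p[a] for p, a in zip(marginals, action))
        if prob > 0.0:
            total -= prob * math.log(prob)
    return total


def sampled_entropy_estimate(marginals, num_samples, seed=0):
    """Monte Carlo estimate: average of -log pi(a) over sampled joint actions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        log_prob = 0.0
        for p in marginals:
            a = rng.choices(range(len(p)), weights=p)[0]
            log_prob += math.log(p[a])
        total -= log_prob
    return total / num_samples


# Three action dimensions with 3, 2 and 4 choices (24 joint actions).
marginals = [[0.7, 0.2, 0.1], [0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
exact = exact_entropy_by_enumeration(marginals)
estimate = sampled_entropy_estimate(marginals, num_samples=20000)
```

The sampled version needs only the log-probability of the actions actually drawn, so its cost scales with the number of samples rather than with the size of the joint action space.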

1805.12253 2026-06-04 q-bio.MN cs.SY eess.SY stat.ME

Sequential Experimental Design for Optimal Structural Intervention in Gene Regulatory Networks Based on the Mean Objective Cost of Uncertainty

基于不确定性均目标成本的基因调控网络最优结构干预的序列实验设计

Mahdi Imani, Roozbeh Dehghannasiri, Ulisses M. Braga-Neto, Edward R. Dougherty

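AI summary: The paper introduces finite-horizon dynamic programming based on the mean objective cost of uncertainty (MOCU) for sequential experimental design of optimal structural intervention in gene regulatory networks, and compares the greedy and dynamic-programming strategies.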

详情
AI中文摘要

科学家正在尝试使用日益复杂的模型,尤其是在医学领域,癌症等基于基因的疾病需要更好的细胞调控建模。复杂模型存在不确定性,需要通过实验来减少这种不确定性。由于实验可能昂贵且耗时,因此希望确定能提供最有用信息的实验。如果要进行一系列实验,则需要通过实验设计来确定实验顺序。经典方法是最大限度地减少模型的总体不确定性,即最大化熵的减少量。最近提出的一种方法同时考虑了模型不确定性和转化目标,例如基因调控网络中的最优结构干预,其目的是改变调控逻辑,以最大限度地降低长期处于癌变状态的可能性。平均目标不确定性代价(MOCU)根据模型不确定性对目标的影响程度来量化不确定性。实验设计即选择使MOCU减少最多的实验。本文提出了用于基于MOCU的序贯实验设计的有限时域动态规划方法,并将其与贪心方法进行比较,后者每次选择一个实验而不考虑整个实验时域。本文的一个显著特点是,证明了无论采用贪心策略还是动态规划策略,基于MOCU的设计都优于广泛使用的基于熵的设计,并研究了模型条件对相对性能的影响。

英文摘要

Scientists are attempting to use models of ever increasing complexity, especially in medicine, where gene-based diseases such as cancer require better modeling of cell regulation. Complex models suffer from uncertainty and experiments are needed to reduce this uncertainty. Because experiments can be costly and time-consuming it is desirable to determine experiments providing the most useful information. If a sequence of experiments is to be performed, experimental design is needed to determine the order. A classical approach is to maximally reduce the overall uncertainty in the model, meaning maximal entropy reduction. A recently proposed method takes into account both model uncertainty and the translational objective, for instance, optimal structural intervention in gene regulatory networks, where the aim is to alter the regulatory logic to maximally reduce the long-run likelihood of being in a cancerous state. The mean objective cost of uncertainty (MOCU) quantifies uncertainty based on the degree to which model uncertainty affects the objective. Experimental design involves choosing the experiment that yields the greatest reduction in MOCU. This paper introduces finite-horizon dynamic programming for MOCU-based sequential experimental design and compares it to the greedy approach, which selects one experiment at a time without consideration of the full horizon of experiments. A salient aspect of the paper is that it demonstrates the advantage of MOCU-based design over the widely used entropy-based design for both greedy and dynamic-programming strategies and investigates the effect of model conditions on the comparative performances.

1805.10801 2026-06-04 math.NA cs.NA math.PR math.ST stat.TH

Sequential sampling for optimal weighted least squares approximations in hierarchical spaces

分层空间中最优加权最小二乘逼近的序贯采样

Benjamin Arras, Markus Bachmayr, Albert Cohen

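AI summary: The paper studies weighted least squares approximation from random samples in hierarchical spaces and proposes sequential sampling algorithms that preserve stability and approximation properties, proving that the total number of samples at step m remains of order m log m with high probability.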

详情
Comments
17 pages, 7 figures
AI中文摘要

我们考虑根据未知函数u∈L²(D,ρ)在给定采样点x^1,…,x^n∈D处的取值来逼近该函数的问题,其中D⊂R^d是一般区域,ρ是一个概率测度。逼近在线性空间V_m中选取,其中m=dim(V_m),并通过加权最小二乘方法计算。最新结果表明,按照精心选择的、同时依赖于V_m和ρ的概率测度μ随机选取采样点具有优势:在近线性采样规模n~m log m下,加权最小二乘逼近以高概率稳定,且精度与到V_m上的精确L²(D,ρ)-正交投影相当。本文受自适应逼近背景启发,在此背景下通常生成一个维数递增的嵌套空间序列(V_m)_{m≥1}。尽管测度μ=μ_m随V_m变化,但可以将μ_m解释为μ_{m-1}与更新测度σ_m的混合,从而重复利用此前生成的样本。基于此观察,我们讨论了在所有空间V_m上一致地保持稳定性与逼近性质的序贯采样算法。主要结果是,第m步的样本总数以高概率保持在m log m的量级。数值实验验证了这一分析。

英文摘要

We consider the problem of approximating an unknown function $u\in L^2(D,ρ)$ from its evaluations at given sampling points $x^1,\dots,x^n\in D$, where $D\subset \mathbb{R}^d$ is a general domain and $ρ$ is a probability measure. The approximation is picked in a linear space $V_m$ where $m=\dim(V_m)$ and computed by a weighted least squares method. Recent results show the advantages of picking the sampling points at random according to a well-chosen probability measure $μ$ that depends both on $V_m$ and $ρ$. With such a random design, the weighted least squares approximation is proved to be stable with high probability, and to have precision comparable to that of the exact $L^2(D,ρ)$-orthonormal projection onto $V_m$, in a near-linear sampling regime $n\sim{m\log m}$. The present paper is motivated by the adaptive approximation context, in which one typically generates a nested sequence of spaces $(V_m)_{m\geq1}$ with increasing dimension. Although the measure $μ=μ_m$ changes with $V_m$, it is possible to recycle the previously generated samples by interpreting $μ_m$ as a mixture between $μ_{m-1}$ and an update measure $σ_m$. Based on this observation, we discuss sequential sampling algorithms that maintain the stability and approximation properties uniformly over all spaces $V_m$. Our main result is that the total number of computed samples at step $m$ remains of the order $m\log{m}$ with high probability. Numerical experiments confirm this analysis.

1805.10736 2026-06-04 math.ST cs.NA math.NA stat.TH

De-noising by thresholding operator adapted wavelets

基于算子自适应小波阈值化的去噪

Gene Ryan Yoo, Houman Owhadi

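AI summary: The paper studies approximating an unknown function u, when the prior information is the regularity of $\mathcal{L}u$ for a linear operator $\mathcal{L}$, by thresholding operator-adapted wavelet (gamblet) coefficients, and proves near-minimax optimality together with an energy-norm bound.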

详情
AI中文摘要

Donoho和Johnstone提出了一种通过将经验小波系数向零收缩,从含噪数据u+ζ中重建未知光滑函数u的方法。本文考虑这样的情形:关于未知函数u的先验信息可能不是u的正则性,而是$\mathcal{L}u$的正则性,其中$\mathcal{L}$是线性算子(如偏微分方程算子或图拉普拉斯算子)。我们证明,通过对u+ζ的gamblet(算子自适应小波)系数进行阈值化所得到的u的近似是近似极小极大最优的(至多相差一个乘法常数),并且以高概率,其能量范数(由算子定义)被u的能量范数控制,至多相差一个依赖于噪声幅度的常数。由于gamblet可以在O(N polylog N)复杂度内计算,并且在空间和特征空间上都具有局部性,所提出的方法具有近线性复杂度,并可推广到非齐次噪声。

英文摘要

Donoho and Johnstone proposed a method for reconstructing an unknown smooth function $u$ from noisy data $u+ζ$ by translating the empirical wavelet coefficients of $u+ζ$ towards zero. We consider the situation where the prior information on the unknown function $u$ may not be the regularity of $u$ but that of $\mathcal{L}u$, where $\mathcal{L}$ is a linear operator (such as a PDE or a graph Laplacian). We show that the approximation of $u$ obtained by thresholding the gamblet (operator-adapted wavelet) coefficients of $u+ζ$ is near minimax optimal (up to a multiplicative constant), and with high probability, its energy norm (defined by the operator) is bounded by that of $u$ up to a constant depending on the amplitude of the noise. Since gamblets can be computed in $\mathcal{O}(N \operatorname{polylog} N)$ complexity and are localized both in space and eigenspace, the proposed method is of near-linear complexity and generalizable to non-homogeneous noise.

1805.10638 2026-06-04 cs.LG cs.NA math.NA stat.ML

Fast K-Means Clustering with Anderson Acceleration

快速K均值聚类的安德森加速方法

Juyong Zhang, Yuxin Yao, Yue Peng, Hao Yu, Bailin Deng

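AI summary: The paper accelerates Lloyd's algorithm for K-Means clustering by treating its assignment and update steps as a fixed-point iteration and applying Anderson acceleration, with the parameter m adjusted dynamically for robust and consistent speedups.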

详情
AI中文摘要

我们提出了一种新的方法,用于加速K-均值聚类的Lloyd算法。与以往减少每次迭代计算成本或改进初始化的方法不同,我们的方法专注于减少收敛所需的迭代次数。这通过将Lloyd算法的分配步骤和更新步骤视为固定点迭代,并应用安德森加速,一种已建立的加速固定点求解器的技术来实现。经典安德森加速利用m个之前的迭代来找到加速的迭代,其在K-均值聚类中的性能对m的选择和样本分布敏感。我们提出了一种新的策略,动态调整m的值,以在不同问题实例上实现鲁棒且一致的加速。我们的方法补充了现有的加速技术,并可以与它们结合以实现最先进的性能。我们进行了广泛的实验来评估所提出方法的性能,在120个测试用例中,有106个用例优于其他算法,平均计算时间减少比率超过33%。

英文摘要

We propose a novel method to accelerate Lloyd's algorithm for K-Means clustering. Unlike previous acceleration approaches that reduce the computational cost per iteration or improve initialization, our approach focuses on reducing the number of iterations required for convergence. This is achieved by treating the assignment step and the update step of Lloyd's algorithm as a fixed-point iteration, and applying Anderson acceleration, a well-established technique for accelerating fixed-point solvers. Classical Anderson acceleration utilizes m previous iterates to find an accelerated iterate, and its performance on K-Means clustering can be sensitive to the choice of m and the distribution of samples. We propose a new strategy to dynamically adjust the value of m, which achieves robust and consistent speedups across different problem instances. Our method complements existing acceleration techniques, and can be combined with them to achieve state-of-the-art performance. We perform extensive experiments to evaluate the performance of the proposed method, where it outperforms other algorithms in 106 out of 120 test cases, and the mean decrease ratio of computational time is more than 33%.
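The fixed-point view can be illustrated on a generic scalar map rather than on K-Means itself. The sketch below compares plain iteration $x \leftarrow g(x)$ with Anderson acceleration using a single previous iterate (m = 1), which for a scalar map reduces to a secant step on the residual $g(x) - x$. The test map $g = \cos$, the function names, and the tolerances are assumptions for this example; the paper's dynamic choice of m and its safeguards are not reproduced. For K-Means, $g$ would be one Lloyd step acting on the centroids.

```python
import math


def fixed_point_plain(g, x0, tol=1e-10, max_iter=1000):
    """Plain iteration x <- g(x); returns (solution, number of g evaluations)."""
    x = x0
    for k in range(1, max_iter + 1):
        gx = g(x)
        if abs(gx - x) < tol:
            return gx, k
        x = gx
    return x, max_iter


def fixed_point_anderson_m1(g, x0, tol=1e-10, max_iter=1000):
    """Anderson acceleration with one previous iterate (m = 1), scalar case."""
    g_prev = g(x0)
    f_prev = g_prev - x0          # residual at the previous iterate
    x = g_prev                    # the first step is a plain fixed-point step
    for k in range(2, max_iter + 1):
        gx = g(x)
        f = gx - x                # residual at the current iterate
        if abs(f) < tol:
            return gx, k
        df = f - f_prev
        if df == 0.0:
            x_next = gx           # fall back to the unaccelerated step
        else:
            # theta minimizes |f - theta * (f - f_prev)|.
            theta = f / df
            x_next = gx - theta * (gx - g_prev)
        g_prev, f_prev = gx, f
        x = x_next
    return x, max_iter


root_plain, evals_plain = fixed_point_plain(math.cos, 1.0)
root_aa, evals_aa = fixed_point_anderson_m1(math.cos, 1.0)
```

Both variants call `g` once per iteration, so the evaluation counts are directly comparable; the accelerated version only adds a few scalar operations per step.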

1805.09613 2026-06-04 stat.ML cs.AI cs.LG cs.RO cs.SY eess.SY

A0C: Alpha Zero in Continuous Action Space

A0C:在连续动作空间中的Alpha Zero

Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

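AI summary: The paper presents the theoretical extensions needed to apply Alpha Zero in continuous action spaces and shows feasibility with preliminary experiments on the Pendulum swing-up task, a first step toward iterated search and learning in continuous action domains.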

详情
AI中文摘要

Alpha Zero的核心创新在于树搜索与深度学习的交替进行,这在国际象棋、将棋和围棋等棋类游戏中被证明非常成功。这些游戏具有离散动作空间。然而,许多现实世界的强化学习领域具有连续动作空间,例如机器人控制、导航和自动驾驶汽车。本文提出了将Alpha Zero扩展到连续动作空间所需的理论扩展。我们还在Pendulum摆起任务上给出了一些初步实验,从实证上展示了我们方法的可行性。因此,这项工作是将迭代搜索与学习应用于连续动作空间领域的第一步。

英文摘要

A core novelty of Alpha Zero is the interleaving of tree search and deep learning, which has proven very successful in board games like Chess, Shogi and Go. These games have a discrete action space. However, many real-world reinforcement learning domains have continuous action spaces, for example in robotic control, navigation and self-driving cars. This paper presents the necessary theoretical extensions of Alpha Zero to deal with continuous action space. We also provide some preliminary experiments on the Pendulum swing-up task, empirically showing the feasibility of our approach. Thereby, this work provides a first step towards the application of iterated search and learning in domains with a continuous action space.

1805.09464 2026-06-04 cs.LG cs.IT cs.NA math.IT math.NA math.OC stat.ML

Simple and practical algorithms for $\ell_p$-norm low-rank approximation

简单且实用的ℓp-范数低秩近似算法

Anastasios Kyrillidis

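AI summary: The paper proposes non-convex, gradient-based algorithms for entrywise $\ell_p$-norm low-rank approximation with p=1 or p=∞. They are easy to implement and typically attain better approximations faster than the state of the art; in theory they can attain (1+ε)-OPT approximations, but only when the algorithm's hyperparameters are known or approximable.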

详情
Comments
16 pages, 11 figures, to appear in UAI 2018
AI中文摘要

我们提出了一种实用算法,用于entrywise ℓp-范数低秩近似,其中p=1或p=∞。所提出的框架是非凸且基于梯度的,易于实现且通常在速度和精度上优于现有方法。从理论角度看,我们证明所提方案可以达到(1+ε)-OPT近似。我们的算法并非超参数无关:只有在假设算法的超参数已知或可近似的情况下,才能实现所需目标。即,我们的理论表明为了在多项式时间内获得良好的解,需要知道哪些问题量,且不与最近的不可近似性结果相矛盾,如[46]。

英文摘要

We propose practical algorithms for entrywise $\ell_p$-norm low-rank approximation, for $p = 1$ or $p = \infty$. The proposed framework, which is non-convex and gradient-based, is easy to implement and typically attains better approximations, faster, than the state of the art. From a theoretical standpoint, we show that the proposed scheme can attain $(1 + \varepsilon)$-OPT approximations. Our algorithms are not hyperparameter-free: they achieve the desiderata only assuming the algorithm's hyperparameters are known a priori, or are at least approximable. That is, our theory indicates which problem quantities need to be known in order to get a good solution within polynomial time, and does not contradict recent inapproximability results, as in [46].

1605.05898 2026-06-04 math.PR cs.NA math.NA math.ST stat.TH

Well-posed Bayesian inverse problems and heavy-tailed stable quasi-Banach space priors

适定的贝叶斯反问题与重尾稳定准巴拿赫空间先验

T. J. Sullivan

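AI summary: Within the framework of Bayesian inverse problems, the paper considers heavy-tailed stable distributions as priors on quasi-Banach spaces and shows, under weak regularity assumptions, that the posterior measure depends Lipschitz continuously in the Hellinger metric on perturbations of the misfit function and observed data.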

详情
Comments
To appear in Inverse Problems and Imaging. This preprint differs from the final published version in layout and typographical details
AI中文摘要

本文将Stuart(Acta Numer. 19:451--559, 2010)等人倡导的无限维参数空间中的贝叶斯反问题框架推广到稳定分布族中的重尾先验测度情形,例如无限维柯西分布,其多项式矩为无穷或无定义。本文表明,平方可积随机变量的Karhunen--Loève展开的类似物可用于在准巴拿赫空间上对此类测度进行采样。此外,在比以往更弱的正则性假设下,证明了贝叶斯后验测度在Hellinger度量下Lipschitz连续地依赖于失配函数和观测数据的扰动。

英文摘要

This article extends the framework of Bayesian inverse problems in infinite-dimensional parameter spaces, as advocated by Stuart (Acta Numer. 19:451--559, 2010) and others, to the case of a heavy-tailed prior measure in the family of stable distributions, such as an infinite-dimensional Cauchy distribution, for which polynomial moments are infinite or undefined. It is shown that analogues of the Karhunen--Loève expansion for square-integrable random variables can be used to sample such measures on quasi-Banach spaces. Furthermore, under weaker regularity assumptions than those used to date, the Bayesian posterior measure is shown to depend Lipschitz continuously in the Hellinger metric upon perturbations of the misfit function and observed data.

1805.08095 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

Small steps and giant leaps: Minimal Newton solvers for Deep Learning

小步与巨跃:用于深度学习的最小牛顿求解器

João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi

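AI summary: The paper proposes a fast second-order method usable as a drop-in replacement for current deep learning solvers. It needs only two additional forward-mode automatic differentiation operations per iteration, with cost comparable to two standard forward passes, and is easy to implement. It avoids the costly and noise-sensitive inversion of an approximate Hessian performed by existing second-order solvers.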

详情
AI中文摘要

我们提出了一种快速的二阶方法,可作为现有深度学习求解器的替代方案。与随机梯度下降(SGD)相比,该方法每个迭代仅需两次额外的前向模式自动微分操作,计算成本与两次标准前向传递相当,且易于实现。我们的方法解决了现有二阶求解器的长期问题,即每次迭代精确或通过共轭梯度法计算近似Hessian矩阵的逆矩阵,这一过程成本高且对噪声敏感。相反,我们提出保持一个梯度的估计值,该估计值通过逆Hessian矩阵投影得到,并在每次迭代中更新一次。该估计值的大小相同,类似于SGD中常用的动量变量。不维护Hessian的估计值。我们首先在具有已知闭式解的小问题上验证了我们的方法,称为CurveBall,包括噪声Rosenbrock函数和退化的两层线性网络,其中现有深度学习求解器似乎难以处理。然后我们在CIFAR和ImageNet上训练了多个大型模型,包括ResNet和VGG-f网络,展示了无需超参数调优的更快收敛速度。代码已提供。

英文摘要

We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. Compared to stochastic gradient descent (SGD), it only requires two additional forward-mode automatic differentiation operations per iteration, which has a computational cost comparable to two standard forward passes and is easy to implement. Our method addresses long-standing issues with current second-order solvers, which invert an approximate Hessian matrix every iteration exactly or by conjugate-gradient methods, a procedure that is both costly and sensitive to noise. Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration. This estimate has the same size and is similar to the momentum variable that is commonly used in SGD. No estimate of the Hessian is maintained. We first validate our method, called CurveBall, on small problems with known closed-form solutions (noisy Rosenbrock function and degenerate 2-layer linear networks), where current deep learning solvers seem to struggle. We then train several large models on CIFAR and ImageNet, including ResNet and VGG-f networks, where we demonstrate faster convergence with no hyperparameter tuning. Code is available.

1805.07640 2026-06-04 math.OC cs.SY eess.SY stat.ML

Comments on "Momentum fractional LMS for power signal parameter estimation"

对“用于功率信号参数估计的动量分数LMS”的评论

Shujaat Khan, Imran Naseem, Alishba Sadiq, Jawwad Ahmad, Muhammad Moinuddin

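AI summary: The paper points out serious flaws in the design and analysis of the recently proposed momentum fractional least mean squares (mFLMS) algorithm and shows by simulation that it has no advantage over the classical LMS method.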

详情
Comments
Least mean squares algorithm, Fractional least mean squares algorithm, Momentum fractional least mean square algorithm
AI中文摘要

本文旨在指出最近提出的动量分数最小均方(mFLMS)算法在设计和分析上存在严重缺陷。我们的担忧基于论文《用于功率信号参数估计的动量分数LMS》中发现的推导和分析证据。除了理论基础外,我们的主张还通过广泛的仿真结果得到验证。实验清楚地表明,新方法在性能上并不优于传统最小均方(LMS)方法。

英文摘要

The purpose of this paper is to point out that the recently proposed momentum fractional least mean squares (mFLMS) algorithm has serious flaws in its design and analysis. Our concerns are based on evidence found in the derivation and analysis in the paper titled "Momentum fractional LMS for power signal parameter estimation". In addition to the theoretical basis, our claims are verified through extensive simulation results. The experiments clearly show that the new method does not have any advantage over the classical least mean squares (LMS) method.
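For reference, the classical LMS baseline mentioned in this abstract can be sketched in a few lines: the weight vector is nudged along the instantaneous error times the input window, $w \leftarrow w + \mu\, e_n\, x_n$. The two-tap system, the step size, and the noise-free setting below are assumptions chosen for illustration, and the mFLMS variant under discussion is not implemented.

```python
import random


def lms_identify(inputs, desired, num_taps, step_size):
    """Classical LMS adaptation of FIR weights from input/desired pairs."""
    weights = [0.0] * num_taps
    for n in range(num_taps - 1, len(inputs)):
        # Most recent sample first: window[k] = inputs[n - k].
        window = [inputs[n - k] for k in range(num_taps)]
        estimate = sum(w * x for w, x in zip(weights, window))
        error = desired[n] - estimate
        # Stochastic-gradient step on the squared error.
        weights = [w + step_size * error * x for w, x in zip(weights, window)]
    return weights


rng = random.Random(0)
true_taps = [0.5, -0.3]                      # hypothetical unknown system
u = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
d = [
    sum(h * u[n - k] for k, h in enumerate(true_taps) if n - k >= 0)
    for n in range(len(u))
]
w = lms_identify(u, d, num_taps=2, step_size=0.05)
```

With noise-free data and a small step size, the learned weights settle on the true taps; adding measurement noise would leave a residual misadjustment that grows with the step size.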

1804.03016 2026-06-04 stat.ME cs.NA math.NA

A Bayes-Sard Cubature Method

一种贝叶斯-萨德求积方法

Toni Karvonen, Chris J. Oates, Simo Särkkä

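AI summary: The paper introduces Bayes-Sard cubature, which combines the flexibility of Bayesian cubature with the robustness of classical cubature by using a Gaussian process model whose mean is a parametric regression model, and reports a large error reduction on a high-dimensional financial integral.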

详情
AI中文摘要

本文聚焦于将数值积分作为推断任务进行建模。目前研究主要集中在贝叶斯求积的发展,其分布输出提供积分的不确定性量化。然而,贝叶斯求积的点估计在高维域中可能不准确且对先验敏感。为解决这些问题,我们引入了贝叶斯-萨德求积方法,通过高斯过程模型和参数回归模型结合,采用非信息性扁平先验,实现了高维金融积分的误差显著降低。

英文摘要

This paper focusses on the formulation of numerical integration as an inferential task. To date, research effort has largely focussed on the development of Bayesian cubature, whose distributional output provides uncertainty quantification for the integral. However, the point estimators associated to Bayesian cubature can be inaccurate and acutely sensitive to the prior when the domain is high-dimensional. To address these drawbacks we introduce Bayes-Sard cubature, a probabilistic framework that combines the flexibility of Bayesian cubature with the robustness of classical cubatures which are well-established. This is achieved by considering a Gaussian process model for the integrand whose mean is a parametric regression model, with an improper flat prior on each regression coefficient. The features in the regression model consist of test functions which are guaranteed to be exactly integrated, with remaining degrees of freedom afforded to the non-parametric part. The asymptotic convergence of the Bayes-Sard cubature method is established and the theoretical results are numerically verified. In particular, we report two orders of magnitude reduction in error compared to Bayesian cubature in the context of a high-dimensional financial integral.

1504.06043 2026-06-04 eess.SY cs.SY stat.ML

Stability of Stochastic Approximations with 'Controlled Markov' Noise and Temporal Difference Learning

具有'受控马尔可夫'噪声和时间差分学习的随机逼近稳定性

Arunselvan Ramaswamy, Shalabh Bhatnagar

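AI summary: This paper studies the stability of stochastic approximation algorithms driven by a 'controlled Markov' process, gives easily verifiable sufficient conditions for stability and convergence that cover reinforcement learning applications with continuous state spaces, and applies the theory to temporal difference learning.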

详情
Comments
18 pages
AI中文摘要

我们关注由'受控马尔可夫'过程驱动的随机逼近算法(SAs)的稳定性(几乎必然有界)。分析此类算法很重要,因为许多强化学习(RL)算法可以视为由'受控马尔可夫'过程驱动的SAs。本文提出了适用于由'受控马尔可夫'过程驱动的SAs的稳定性与收敛的易验证充分条件。许多RL应用涉及连续状态空间。虽然我们的分析确保了此类连续状态应用的稳定性,但传统分析却不。与文献相比,我们的分析实现了双重推广:(a)马尔可夫过程可能在连续状态空间中演变;(b)过程在任何给定的平稳策略下无需为遍历。时间差分学习(TD)是强化学习中重要的策略评估方法。本文发展的理论用于分析广义$TD(0)$,即TD的重要变种。本文的理论还用于分析监督学习的时间差分公式用于预测问题。

英文摘要

We are interested in understanding stability (almost sure boundedness) of stochastic approximation algorithms (SAs) driven by a 'controlled Markov' process. Analyzing this class of algorithms is important, since many reinforcement learning (RL) algorithms can be cast as SAs driven by a 'controlled Markov' process. In this paper, we present easily verifiable sufficient conditions for stability and convergence of SAs driven by a 'controlled Markov' process. Many RL applications involve continuous state spaces. While our analysis readily ensures stability for such continuous state applications, traditional analyses do not. Compared to the literature, our analysis presents a two-fold generalization: (a) the Markov process may evolve in a continuous state space and (b) the process need not be ergodic under any given stationary policy. Temporal difference learning (TD) is an important policy evaluation method in reinforcement learning. The theory developed herein is used to analyze generalized $TD(0)$, an important variant of TD. Our theory is also used to analyze a TD formulation of supervised learning for forecasting problems.

1804.01825 2026-06-04 cs.LG econ.GN q-fin.EC stat.ML

Evaluating Hospital Case Cost Prediction Models Using Azure Machine Learning Studio

利用Azure机器学习工作室评估医院病例成本预测模型

Alexei Botchkarev

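AI summary: This paper presents an Azure Machine Learning Studio tool for rapidly assessing multiple regression models; for hospital case cost prediction, robust regression, boosted decision tree regression and decision forest regression performed best.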

详情
AI中文摘要

准确的医院病例成本建模和预测能力对高效医疗财务管理和预算规划至关重要。已知各种回归机器学习算法在医疗成本预测中表现良好。本实验的目的是构建一个Azure机器学习工作室工具,用于快速评估多种类型的回归模型。该工具提供了一个统一的实验环境,可比较14种回归模型:线性回归、贝叶斯线性回归、决策森林回归、提升决策树回归、神经网络回归、泊松回归、回归高斯过程、梯度提升机、非线性最小二乘回归、投影寻踪回归、随机森林回归、鲁棒回归、鲁棒回归与mm型估计器、支持向量回归。该工具通过五个性能指标将评估结果按模型准确性排列在单一表格中。对回归机器学习模型进行医院病例成本预测的评估显示,鲁棒回归模型、提升决策树回归和决策森林回归具有优势。该操作工具已发布到网络上,可供实验和扩展使用。

英文摘要

The ability to model and predict hospital case costs accurately is critical for efficient health care financial management and budgetary planning. A variety of regression machine learning algorithms are known to be effective for health care cost predictions. The purpose of this experiment was to build an Azure Machine Learning Studio tool for rapid assessment of multiple types of regression models. The tool offers an environment for comparing 14 types of regression models in a unified experiment: linear regression, Bayesian linear regression, decision forest regression, boosted decision tree regression, neural network regression, Poisson regression, Gaussian processes for regression, gradient boosted machine, nonlinear least squares regression, projection pursuit regression, random forest regression, robust regression, robust regression with mm-type estimators, and support vector regression. The tool presents assessment results arranged by model accuracy in a single table using five performance metrics. Evaluation of regression machine learning models for hospital case cost prediction demonstrated the advantage of the robust regression model, boosted decision tree regression and decision forest regression. The operational tool has been published to the web and is openly available for experiments and extensions.

1805.03777 2026-06-04 stat.AP cs.SY eess.SY stat.ML

Deep Reinforcement Learning for Optimal Control of Space Heating

深度强化学习用于空间供暖的最优控制

Adam Nagy, Hussain Kazmi, Farah Cheaib, Johan Driesen

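AI summary: This paper proposes a deep reinforcement learning algorithm for controlling space heating in buildings; in simulation it outperforms rule-based control by 5-10% and offers faster computation and greater robustness to non-stationary building dynamics.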

详情
Comments
Accepted at Building Simulation and Optimization (BSO 2018), Cambridge, England
AI中文摘要

经典方法在控制供暖系统时常存在性能欠佳、无法适应动态条件和不合理假设(如建筑模型的存在)。本文提出了一种新颖的深度强化学习算法,以高效方式控制建筑空间供暖,并与其他已知技术进行基准测试。该算法在模拟环境中对多种价格信号表现出比基于规则的控制方法高5-10%的性能。我们得出结论,虽然该算法并非最优,但其提供了额外的实用优势,如更快的计算时间和对建筑动态非平稳性的更高鲁棒性。

英文摘要

Classical methods to control heating systems are often marred by suboptimal performance, inability to adapt to dynamic conditions and unreasonable assumptions, e.g. the existence of building models. This paper presents a novel deep reinforcement learning algorithm which can control space heating in buildings in a computationally efficient manner, and benchmarks it against other known techniques. The proposed algorithm outperforms rule-based control by 5-10% in a simulation environment for a number of price signals. We conclude that, while not optimal, the proposed algorithm offers additional practical advantages such as faster computation times and increased robustness to non-stationarities in building dynamics.

1710.00387 2026-06-04 math.NA cs.NA stat.ML

Efficient Preconditioning for Noisy Separable NMFs by Successive Projection Based Low-Rank Approximations

通过基于 successive projection 的低秩近似实现噪声可分离 NMF 的高效预处理

Tomohiko Mizutani, Mirai Tanaka

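AI summary: This paper proposes a modified preconditioner for SPA that replaces the truncated singular value decomposition with an SPA-based rank-$k$ approximation, aiming to keep the noise robustness of the preconditioned SPA while lowering its computational cost.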

详情
Journal ref
Machine Learning, 107(4), pages 643-673, 2018
Comments
32 pages, 4 figures
AI中文摘要

successive projection algorithm (SPA) 可快速求解在可分离假设下的非负矩阵分解问题。即使在问题中加入噪声,只要噪声引起的扰动较小,SPA 仍具有鲁棒性。Gillis 和 Vavasis (2015) 提出的预处理方法可以增强 SPA 的噪声鲁棒性,但需要额外计算成本。预处理构建过程包含计算输入矩阵的 top-k 截断奇异值分解步骤,该步骤是预处理 SPA 高效实现的主要障碍。为解决成本问题,本文提出修改预处理算法的方案。虽然原始算法使用最佳 rank-k 近似,但本文改用基于 SPA 的 rank-k 近似。我们分析了近似的精度并评估了算法的计算成本。然后通过实证研究揭示了基于 SPA 的 rank-k 近似算法和改进的预处理 SPA 的实际性能。

英文摘要

The successive projection algorithm (SPA) can quickly solve a nonnegative matrix factorization problem under a separability assumption. Even if noise is added to the problem, SPA is robust as long as the perturbations caused by the noise are small. In particular, robustness against noise should be high when handling the problems arising from real applications. The preconditioner proposed by Gillis and Vavasis (2015) makes it possible to enhance the noise robustness of SPA, but at an additional computational cost. The construction of the preconditioner contains a step to compute the top-$k$ truncated singular value decomposition of an input matrix. It is known that the decomposition provides the best rank-$k$ approximation to the input matrix; in other words, a matrix with the smallest approximation error among all matrices of rank at most $k$. This step is an obstacle to an efficient implementation of the preconditioned SPA. To address the cost issue, we propose a modification of the algorithm for constructing the preconditioner. Whereas the original algorithm uses the best rank-$k$ approximation, our modification uses an alternative. Ideally, this alternative should have high approximation accuracy and low computational cost. To ensure this, our modification employs a rank-$k$ approximation produced by an SPA-based algorithm. We analyze the accuracy of the approximation and evaluate the computational cost of the algorithm. We then present an empirical study revealing the actual performance of the SPA-based rank-$k$ approximation algorithm and the modified preconditioned SPA.
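For orientation, the basic (unpreconditioned) SPA that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard algorithm on a noiseless separable matrix, not the preconditioned or modified variant proposed in the paper; the function name `spa` and the toy matrices `W` and `H` are assumptions made for the example.

```python
import numpy as np

def spa(M, r):
    """Pick r column indices of M by successive projection."""
    R = np.array(M, dtype=float)              # residual matrix, updated in place of M
    picked = []
    for _ in range(r):
        norms = np.sum(R * R, axis=0)         # squared column norms of the residual
        j = int(np.argmax(norms))             # column farthest from the current span
        picked.append(j)
        u = R[:, j] / np.sqrt(norms[j])
        R = R - np.outer(u, u @ R)            # project out the chosen direction
    return picked

# Toy separable instance: the first three columns of M are the columns of W,
# the remaining columns are convex combinations of them.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
H = np.array([[1.0, 0.0, 0.0, 0.5, 0.2],
              [0.0, 1.0, 0.0, 0.5, 0.3],
              [0.0, 0.0, 1.0, 0.0, 0.5]])
M = W @ H
idx = spa(M, 3)
```

On this noiseless input the selected indices are the three pure columns, in some order. With noise, the selection can degrade, which is the situation preconditioning is meant to improve.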

1805.02765 2026-06-04 eess.SY cs.SY math.OC stat.AP

3D printing of a leaf spring: A demonstration of closed-loop control in additive manufacturing

叶片弹簧的3D打印:加成型制造中闭环控制的演示

Kevin Garanger, Thanakorn Khamvilai, Eric Feron

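AI summary: This paper demonstrates closed-loop control in additive manufacturing by 3D printing a leaf spring whose infill density is adjusted after each part to reach a target stiffness, reducing the stiffness error from 11.63% to 1.34%.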

详情
Comments
Accepted to CCTA 2018 conference
AI中文摘要

本文提出在塑料物体打印过程中集成反馈控制回路。打印对象为由不同填充密度部分组成的叶片弹簧,这些部分的填充密度是控制变量。为实现预期刚度,每次完成部分后进行测量并相应调整填充密度,以闭环框架实现。通过闭环控制,最终刚度误差从11.63%降低到1.34%。该实验作为概念验证,展示了反馈控制在增材制造中的相关性。通过将打印过程和测量视为随机过程,本文展示了如何利用随机最优控制和卡尔曼滤波来提高由基础打印机制造的对象质量。

英文摘要

This paper presents the integration of a feedback control loop during the printing of a plastic object using additive manufacturing. The printed object is a leaf spring made of several parts of different infill density values, which are the control variables in this problem. In order to achieve a desired objective stiffness, measurements are taken after each part is completed and the infill density is adjusted accordingly in a closed-loop framework. The absolute error in the stiffness at the end of printing is reduced from 11.63% to 1.34% by using closed-loop instead of open-loop control. This experiment serves as a proof of concept to show the relevance of using feedback control in additive manufacturing. By considering the printing process and the measurements as stochastic processes, we show how stochastic optimal control and Kalman filtering can be used to improve the quality of objects manufactured with rudimentary printers.

1709.01781 2026-06-04 math.NA cs.NA math.OC stat.ME

Parameterizations for Ensemble Kalman Inversion

集束卡尔曼反演的参数化方法

Neil K. Chada, Marco A. Iglesias, Lassi Roininen, Andrew M. Stuart

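AI summary: This paper shows how geometric and hierarchical ideas can be used to design effective parameterizations for ensemble Kalman inversion in applications such as electrical impedance tomography, groundwater flow and source inversion.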

详情
AI中文摘要

使用集束方法解决反问题具有吸引力,因为它是一种无导数方法,也适合并行化。其基本迭代形式产生一个解集,位于初始集束的线性跨度内。因此,未知场的参数化选择是方法成功的关键组成部分。本文展示了如何利用几何和分层思想设计有效的参数化方案,以解决电阻抗成像、地下水流动和源反演等应用中的反问题。特别是,我们展示了如何利用几何思想,包括水平集方法,来重建分段连续场,并展示了如何利用分层方法学习连续场中的关键参数,如长度尺度,从而改进重建结果。几何和分层思想在水平集方法中结合,以找到具有未知拓扑的分段常数重建。

英文摘要

The use of ensemble methods to solve inverse problems is attractive because it is a derivative-free methodology which is also well-adapted to parallelization. In its basic iterative form the method produces an ensemble of solutions which lie in the linear span of the initial ensemble. Choice of the parameterization of the unknown field is thus a key component of the success of the method. We demonstrate how both geometric ideas and hierarchical ideas can be used to design effective parameterizations for a number of applied inverse problems arising in electrical impedance tomography, groundwater flow and source inversion. In particular we show how geometric ideas, including the level set method, can be used to reconstruct piecewise continuous fields, and we show how hierarchical methods can be used to learn key parameters in continuous fields, such as length-scales, resulting in improved reconstructions. Geometric and hierarchical ideas are combined in the level set method to find piecewise constant reconstructions with interfaces of unknown topology.
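The remark that the iterates stay in the linear span of the initial ensemble can be checked numerically. The sketch below uses a basic, deterministic ensemble Kalman inversion update without perturbed observations; the forward map `forward`, the data `y`, the noise covariance `Gamma` and the ensemble size are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def forward(u):
    # Hypothetical nonlinear forward map from R^5 to R^2.
    return np.array([u[0] ** 2 + u[1], np.sin(u[2]) + u[4]])

def eki_step(U, y, Gamma):
    """One ensemble Kalman inversion update; U has shape (d, J)."""
    J = U.shape[1]
    P = np.column_stack([forward(U[:, j]) for j in range(J)])   # predictions, (m, J)
    dU = U - U.mean(axis=1, keepdims=True)                      # parameter deviations
    dP = P - P.mean(axis=1, keepdims=True)                      # prediction deviations
    C_pp = dP @ dP.T / J                                        # prediction covariance
    innov = np.linalg.solve(C_pp + Gamma, y[:, None] - P)       # (m, J)
    coeff = dP.T @ innov / J                                    # (J, J) mixing weights
    # U + C_up (C_pp + Gamma)^{-1} (y - P), with C_up = dU dP^T / J,
    # written so that each update is visibly a combination of ensemble deviations.
    return U + dU @ coeff

rng = np.random.default_rng(1)
U0 = rng.standard_normal((5, 3))          # 3 ensemble members in R^5
y = np.array([1.0, 0.5])
Gamma = 0.1 * np.eye(2)
U1 = eki_step(U0, y, Gamma)

rank_before = np.linalg.matrix_rank(U0, tol=1e-10)
rank_after = np.linalg.matrix_rank(np.hstack([U0, U1]), tol=1e-10)
```

Stacking the updated members next to the initial ones does not raise the rank, so the new ensemble lies in the span of the old one. This is why the choice of parameterization, which fixes that span, matters so much for the method.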

1702.04837 2026-06-04 stat.ML cs.LG cs.NA math.NA

Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging

草图岭回归:优化视角、统计视角和模型平均

Shusen Wang, Alex Gittens, Michael W. Mahoney

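AI summary: This paper studies classical sketch and Hessian sketch for matrix ridge regression from optimization and statistical perspectives: classical sketch recovers nearly optimal solutions, whereas Hessian sketch carries no such guarantee. Theory and experiments show that model averaging greatly reduces the gap between the risks of the true and sketched solutions.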

详情
Journal ref
Journal of Machine Learning Research, 19, pp. 1-50, 2018
Comments
To appear in Journal of Machine Learning Research, 2018. A short version has appeared in International Conference on Machine Learning (ICML), 2017
AI中文摘要

我们探讨了经典草图和Hessian草图在近似求解矩阵岭回归(MRR)问题中的统计和优化影响。先前研究量化了经典草图对更简单的最小二乘回归(LSR)问题的影响。我们证明经典草图对MRR的优化属性的影响与对LSR的影响类似:即恢复近似最优解。相反,Hessian草图没有这种保证,其近似误差由响应中的“质量”与最优目标值之间的微妙交互决定。对于两种类型的近似,sketched MRR中的正则化导致与sketched LSR不同的统计特性。特别是,在sketched MRR中存在偏误-方差权衡,这在sketched LSR中不存在。我们提供了sketched MRR的偏误和方差的上界和下界,这些界限表明经典草图显著增加方差,而Hessian草图显著增加偏误。经验上,sketched MRR的解的风险可能比最优MRR解高一个数量级。我们理论和实证表明,模型平均显著降低真实解与sketched解风险之间的差距。因此,在并行或分布式设置中,草图结合模型平均是一种强大的技术,能够快速获得近似最优解,同时大幅减轻草图带来的统计风险增加。

英文摘要

We address the statistical and optimization impacts of the classical sketch and Hessian sketch used to approximately solve the Matrix Ridge Regression (MRR) problem. Prior research has quantified the effects of classical sketch on the strictly simpler least squares regression (LSR) problem. We establish that classical sketch has a similar effect upon the optimization properties of MRR as it does on those of LSR: namely, it recovers nearly optimal solutions. By contrast, Hessian sketch does not have this guarantee; instead, the approximation error is governed by a subtle interplay between the "mass" in the responses and the optimal objective value. For both types of approximation, the regularization in the sketched MRR problem results in significantly different statistical properties from those of the sketched LSR problem. In particular, there is a bias-variance trade-off in sketched MRR that is not present in sketched LSR. We provide upper and lower bounds on the bias and variance of sketched MRR; these bounds show that classical sketch significantly increases the variance, while Hessian sketch significantly increases the bias. Empirically, sketched MRR solutions can have risks that are higher by an order of magnitude than those of the optimal MRR solutions. We establish theoretically and empirically that model averaging greatly decreases the gap between the risks of the true and sketched solutions to the MRR problem. Thus, in parallel or distributed settings, sketching combined with model averaging is a powerful technique that quickly obtains near-optimal solutions to the MRR problem while greatly mitigating the increased statistical risk incurred by sketching.
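The two sketches and the averaging step can be written in a few lines. The sketch below assumes the objective (1/n)||XW - Y||_F^2 + gamma ||W||_F^2, a dense Gaussian sketching matrix, and synthetic data; the function names and all parameter values are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def mrr_optimal(X, Y, gamma):
    # Exact minimizer of (1/n)||XW - Y||_F^2 + gamma ||W||_F^2.
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * gamma * np.eye(d), X.T @ Y)

def gaussian_sketch(s, n, rng):
    # s x n sketching matrix with i.i.d. N(0, 1/s) entries.
    return rng.standard_normal((s, n)) / np.sqrt(s)

def classical_sketch(X, Y, gamma, S):
    # Sketch both the inputs and the responses.
    n, d = X.shape
    SX, SY = S @ X, S @ Y
    return np.linalg.solve(SX.T @ SX + n * gamma * np.eye(d), SX.T @ SY)

def hessian_sketch(X, Y, gamma, S):
    # Sketch only the quadratic (Hessian) term; keep X^T Y exact.
    n, d = X.shape
    SX = S @ X
    return np.linalg.solve(SX.T @ SX + n * gamma * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
n, d, m, s, g = 2000, 10, 3, 100, 8
X = rng.standard_normal((n, d))
Y = X @ rng.standard_normal((d, m)) + 0.5 * rng.standard_normal((n, m))
gamma = 0.1

W_opt = mrr_optimal(X, Y, gamma)
W_list = [classical_sketch(X, Y, gamma, gaussian_sketch(s, n, rng)) for _ in range(g)]
W_avg = np.mean(W_list, axis=0)               # model averaging over g sketches
W_hess = hessian_sketch(X, Y, gamma, gaussian_sketch(s, n, rng))

single_errors = [np.linalg.norm(W - W_opt) for W in W_list]
avg_error = np.linalg.norm(W_avg - W_opt)
```

By the triangle inequality, `avg_error` can never exceed the mean of `single_errors`. How much smaller it is depends on the data and the sketch, and each of the `g` sketched problems can be solved on a separate worker.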

1804.09062 2026-06-04 cs.ET cs.IT cs.SY eess.SY math.IT q-bio.MN stat.AP

A reaction network scheme which implements the EM algorithm

一种实现EM算法的反应网络方案

Muppirala Viswa Virinchi, Abhishek Behera, Manoj Gopalkrishnan

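AI summary: This paper describes a reaction network scheme for a large class of statistical problems, including how a cell could infer its environment from receptor-ligand bindings; the scheme implements information projection and a generalized EM algorithm for maximum likelihood estimation.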

详情
Comments
15 pages, 3 figures
AI中文摘要

本文详细阐述了如何通过化学反应网络生成活细胞复杂行为的算法机制。尽管已有研究显示反应网络在计算上是通用的,能够原则上实现任何算法,但仍有空间构造能良好映射生物现实、高效利用反应网络本征动力学的计算潜力,并与统计力学接触。我们描述了一种新的反应网络方案,用于解决包括细胞如何从受体-配体结合推断其环境在内的大量统计问题。具体而言,我们展示了反应网络如何实现信息投影,从而实现广义期望最大化算法,以解决在部分观测指数族上的分类数据最大似然估计问题。我们的方案可视为对E.T. Jaynes关于统计力学作为统计推断愿景的算法解释。

英文摘要

A detailed algorithmic explanation is required for how a network of chemical reactions can generate the sophisticated behavior displayed by living cells. Though several previous works have shown that reaction networks are computationally universal and can in principle implement any algorithm, there is scope for constructions that map well onto biological reality, make efficient use of the computational potential of the native dynamics of reaction networks, and make contact with statistical mechanics. We describe a new reaction network scheme for solving a large class of statistical problems including the problem of how a cell would infer its environment from receptor-ligand bindings. Specifically we show how reaction networks can implement information projection, and consequently a generalized Expectation-Maximization algorithm, to solve maximum likelihood estimation problems in partially-observed exponential families on categorical data. Our scheme can be thought of as an algorithmic interpretation of E. T. Jaynes's vision of statistical mechanics as statistical inference.

1804.07323 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems

非参数随机组合梯度下降法在连续马尔可夫决策问题中的Q学习

Alec Koppel, Ekaterina Tolstaya, Ethan Stump, Alejandro Ribeiro

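AI summary: This paper proposes a nonparametric stochastic compositional gradient method for Q-learning in continuous Markov decision problems. It reformulates Bellman's optimality equation as a nested non-convex stochastic optimization problem over a reproducing kernel Hilbert space and proves that the algorithm converges with probability 1 to a stationary point.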

详情
AI中文摘要

我们考虑定义在连续状态和动作空间上的马尔可夫决策问题,其中自主代理试图学习从状态到动作的映射以最大化长期折扣奖励累积。我们通过考虑定义在动作价值函数上的贝尔曼最优性方程,将其重新表述为一个嵌套非凸随机优化问题,该问题定义在再生核希尔伯特空间(RKHS)上。我们开发了一种功能扩展的随机准梯度方法来解决这个问题,由于RKHS的结构,它允许以标量权重和过去的状态-动作对参数化,其增长与算法迭代次数成比例。为缓解这种复杂性爆炸,我们应用核正交匹配追踪到核权重和字典序列,从而在底层优化方法的下降方向上产生可控的误差。我们证明所得到的算法,称为KQ学习,以概率1收敛于该问题的 stationary 点,从而在假设其属于RKHS的情况下得到贝尔曼最优性算子的固定点。在常数学习率下,我们进一步得到收敛于一个小的贝尔曼误差,该误差取决于所选的学习率。在连续山车和倒立摆任务上的数值评估表明,收敛的简洁学习动作价值函数、与最先进方法具有竞争力的策略,并表现出可靠、可重复的学习行为。

英文摘要

We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address this problem by considering Bellman's optimality equation defined over action-value functions, which we reformulate into a nested non-convex stochastic optimization problem defined over a Reproducing Kernel Hilbert Space (RKHS). We develop a functional generalization of the stochastic quasi-gradient method to solve it, which, owing to the structure of the RKHS, admits a parameterization in terms of scalar weights and past state-action pairs which grows proportionately with the algorithm iteration index. To ameliorate this complexity explosion, we apply Kernel Orthogonal Matching Pursuit to the sequence of kernel weights and dictionaries, which yields a controllable error in the descent direction of the underlying optimization method. We prove that the resulting algorithm, called KQ-Learning, converges with probability 1 to a stationary point of this problem, yielding a fixed point of the Bellman optimality operator under the hypothesis that it belongs to the RKHS. Under constant learning rates, we further obtain convergence to a small Bellman error that depends on the chosen learning rates. Numerical evaluation on the Continuous Mountain Car and Inverted Pendulum tasks yields convergent, parsimonious learned action-value functions and policies that are competitive with the state of the art and exhibit reliable, reproducible learning behavior.

1804.07010 2026-06-04 stat.ML cs.LG cs.SY eess.SY math.AP math.OC

Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations

前向-后向随机神经网络:高维偏微分方程的深度学习

Maziar Raissi

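AI summary: This paper proposes a method for solving high-dimensional partial differential equations that exploits their connection with forward-backward stochastic differential equations, approximating the solution with a deep neural network to avoid grid-based discretization and the associated curse of dimensionality.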

详情
AI中文摘要

经典偏微分方程数值方法因依赖精细的时空网格而受维度诅咒限制。受现代深度学习技术启发,本文提出一种可扩展的算法,通过深度神经网络近似未知解,并利用自动微分优势。通过将高维偏微分方程与前向-后向随机微分方程联系起来,利用布朗运动独立实现作为训练数据,测试了Black-Scholes-Barenblatt和Hamilton-Jacobi-Bellman方程等100维基准问题的有效性。

英文摘要

Classical numerical methods for solving partial differential equations suffer from the curse of dimensionality mainly due to their reliance on meticulously generated spatio-temporal grids. Inspired by modern deep learning based techniques for solving forward and inverse problems associated with partial differential equations, we circumvent the tyranny of numerical discretization by devising an algorithm that is scalable to high dimensions. In particular, we approximate the unknown solution by a deep neural network which essentially enables us to benefit from the merits of automatic differentiation. To train the aforementioned neural network we leverage the well-known connection between high-dimensional partial differential equations and forward-backward stochastic differential equations. In fact, independent realizations of a standard Brownian motion will act as training data. We test the effectiveness of our approach for a couple of benchmark problems spanning a number of scientific domains including Black-Scholes-Barenblatt and Hamilton-Jacobi-Bellman equations, both in 100 dimensions.

1804.06114 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

A Support Tensor Train Machine

支持张量列车机

Cong Chen, Kim Batselier, Ching-Yun Ko, Ngai Wong

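AI summary: This paper proposes the support tensor train machine, which replaces the rank-one tensor of the support tensor machine with a tensor train to increase expressive power; experiments indicate that it outperforms the SVM and STM.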

详情
Comments
7 pages
AI中文摘要

近年来,将传统向量机技术扩展到张量形式引起了广泛关注。例如,支持张量机(STM)利用秩一张量捕捉数据结构,从而缓解传统支持向量机(SVM)中的过拟合和维度灾难问题。然而,秩一张量的表达能力对于许多现实数据来说是有限的。为克服这一限制,我们引入支持张量列车机(STTM),通过将STM中的秩一张量替换为张量列车。实验验证并确认STTM优于SVM和STM。

英文摘要

There has been growing interest in extending traditional vector-based machine learning techniques to their tensor forms. An example is the support tensor machine (STM) that utilizes a rank-one tensor to capture the data structure, thereby alleviating the overfitting and curse of dimensionality problems in the conventional support vector machine (SVM). However, the expressive power of a rank-one tensor is restrictive for many real-world data. To overcome this limitation, we introduce a support tensor train machine (STTM) by replacing the rank-one tensor in an STM with a tensor train. Experiments validate and confirm the superiority of an STTM over the SVM and STM.

1702.02826 2026-06-04 math.ST econ.GN math-ph math.MP q-fin.EC stat.TH

Super Generalized Central Limit Theorem: Limit distributions for sums of non-identical random variables with power-laws

超广义中心极限定理:非同分布随机变量求和的极限分布

Masaru Shintani, Ken Umeno

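AI summary: This paper studies sums of non-identically distributed random variables with power laws, proves convergence to a unique stable distribution, and uses the result to explain the universality of stable laws.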

详情
Comments
4 pages, 1 figure
AI中文摘要

在自然界或社会中,幂律普遍存在,因此在大数据时代研究幂律的数学特性至关重要。本文证明非同分布的随机过程叠加后,其密度收敛于唯一稳定分布。这一性质可用于解释稳定分布的普遍性,即非同分布股票价格波动的对数收益率之和遵循稳定分布。

英文摘要

In nature and in societies, power laws are ubiquitous, so it is important to investigate their mathematical characteristics in the current era of big data. In this paper we prove that the superposition of non-identical stochastic processes with power laws converges in density to a unique stable distribution. This property can be used to explain the universality of stable laws, for example that sums of the logarithmic returns of non-identical stock price fluctuations follow stable distributions.

1709.10314 2026-06-04 math.NA astro-ph.CO cs.NA math.PR math.ST stat.TH

Fast generation of isotropic Gaussian random fields on the sphere

球面上各向同性高斯随机场的快速生成

Peter E. Creasey, Annika Lang

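AI summary: This paper presents an algorithm based on Markov properties and one-dimensional fast Fourier transforms that generates isotropic Gaussian random fields on an n x n grid on the sphere in O(n^2 log n), together with an efficient method for setting up the conditional covariance matrices.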

详情
Journal ref
Monte Carlo Meth. Appl., Vol. 24, No. 1, 1-11, March 2018
Comments
Corrected link to software in arXiv's online abstract, added journal reference
AI中文摘要

在单位球面上高效模拟各向同性高斯随机场是数值应用中常见的任务。本文提出了一种基于马尔可夫性质和一维快速傅里叶变换的快速算法,能够在O(n²logn)时间内生成n×n网格的样本。此外,还推导出设置必要条件协方差矩阵的高效方法,并通过模拟展示了该算法的性能。代码的开源实现已发布在https://github.com/pec27/smerfs。

英文摘要

The efficient simulation of isotropic Gaussian random fields on the unit sphere is a task encountered frequently in numerical applications. A fast algorithm based on Markov properties and fast Fourier Transforms in 1d is presented that generates samples on an n x n grid in O(n^2 log n). Furthermore, an efficient method to set up the necessary conditional covariance matrices is derived and simulations demonstrate the performance of the algorithm. An open source implementation of the code has been made available at https://github.com/pec27/smerfs .

1709.05153 2026-06-04 math.ST cs.NA math.NA math.PR stat.TH

Operator Fitting for Parameter Estimation of Stochastic Differential Equations

算子拟合用于随机微分方程参数估计

Asbjørn N. Riseth, Jake P. Taylor-King

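AI summary: This paper proposes estimating the parameters of stochastic differential equations by fitting a finite-dimensional approximation of the Koopman operator; for some systems the cost of evaluating the objective is independent of the number of samples, and the method is tested on two simple systems.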

详情
Comments
21 pages, 3 figures, 2 tables
AI中文摘要

参数估计是模型开发中的关键部分。当模型是确定性时,可以最小化拟合误差;而对于随机系统,必须更加谨慎。广义上,随机动力系统参数化方法可以分为最大似然估计和矩方法启发式技术。我们提出了一种方法,其中通过拟合Koopman算子的有限维近似与由扩展动态模式分解近似生成的隐含Koopman算子进行匹配。这种方法的一个优势是,对于某些动力系统,目标评估成本可以与样本数量无关。我们在这两个简单的随机微分方程形式的系统上测试了我们的方法,与基准技术进行比较,并考虑了被近似的算子的有限特征展开。其他小的技巧变化也进行了考虑,并讨论了我们方法的优势。

英文摘要

Estimation of parameters is a crucial part of model development. When models are deterministic, one can minimise the fitting error; for stochastic systems one must be more careful. Broadly, parameterisation methods for stochastic dynamical systems fall into two groups: techniques inspired by maximum likelihood estimation and techniques inspired by the method of moments. We propose a method where one matches a finite dimensional approximation of the Koopman operator with the implied Koopman operator as generated by an extended dynamic mode decomposition approximation. One advantage of this approach is that the objective evaluation cost can be independent of the number of samples for some dynamical systems. We test our approach on two simple systems in the form of stochastic differential equations, compare to benchmark techniques, and consider limited eigen-expansions of the operators being approximated. Other small variations on the technique are also considered, and we discuss the advantages of our formulation.

1409.4685 2026-06-04 math.NA cs.NA math.ST stat.TH

Perturbation-based inference for diffusion processes: Obtaining effective models from multiscale data

基于扰动的扩散过程推断:从多尺度数据中获得有效模型

Sebastian Krumscheid

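AI summary: This paper studies the inference of parameters in stochastic differential equation models from discrete-time observations and proposes a new, convergent inference procedure suited to estimating the parameters of effective models from multiscale data.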

详情
AI中文摘要

我们考虑了从离散时间观测(例如实验或模拟数据)中推断随机微分方程模型参数的推断问题。具体来说,我们研究了在没有模型自身观测数据的情况下,仅能获取扰动版本数据的情况。受此扰动论证的启发,我们从数值分析的角度研究了估计程序的收敛性。更具体地说,我们引入了适当的一致性、稳定性和收敛性概念,并研究它们之间的联系。结果表明,标准的统计技术,如最大似然估计器,在这种情况下并不收敛,因为它们无法保持稳定性。由于这一缺陷,我们引入并分析了一种新的随机微分方程参数推断程序,该程序被证明是收敛的。因此,该方法特别适用于从相应多尺度过程的观测中估计有效(即粗粒度)模型的参数。我们通过几个数值例子来说明这些理论发现。

英文摘要

We consider the inference problem for parameters in stochastic differential equation models from discrete time observations (e.g. experimental or simulation data). Specifically, we study the case where one does not have access to observations of the model itself, but only to a perturbed version which converges weakly to the solution of the model. Motivated by this perturbation argument, we study the convergence of estimation procedures from a numerical analysis point of view. More precisely, we introduce appropriate consistency, stability, and convergence concepts and study their connection. It turns out that standard statistical techniques, such as the maximum likelihood estimator, are not convergent methodologies in this setting, since they fail to be stable. Due to this shortcoming, we introduce and analyse a novel inference procedure for parameters in stochastic differential equation models which turns out to be convergent. As such, the method is particularly suited for the estimation of parameters in effective (i.e. coarse-grained) models from observations of the corresponding multiscale process. We illustrate these theoretical findings via several numerical examples.

1610.02967 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

Distributed Convex Optimization with Many Convex Constraints

具有许多凸约束的分布式凸优化

Joachim Giesen, Sören Laue

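AI summary: This paper proposes an extension of ADMM for distributed convex optimization with many convex constraints; the method inherits the convergence guarantees of ADMM and the augmented Lagrangian method.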

详情
AI中文摘要

我们解决在分布式环境下求解具有许多凸约束的凸优化问题。我们的方法基于一种扩展的交替方向乘子法(ADMM),该方法近期在大数据领域受到广泛关注。尽管ADMM早在数十年前就被发明,但迄今为止只能应用于无约束问题或具有线性等式或不等式约束的问题。我们的扩展方法能够直接处理任意不等式约束。它结合了ADMM在分布式环境下求解凸优化问题的能力,以及增广拉格朗日方法求解约束优化问题的能力,并且我们证明它继承了ADMM和增广拉格朗日方法的收敛保证。

英文摘要

We address the problem of solving convex optimization problems with many convex constraints in a distributed setting. Our approach is based on an extension of the alternating direction method of multipliers (ADMM) that recently gained a lot of attention in the Big Data context. Although it was invented decades ago, ADMM so far can be applied only to unconstrained problems and problems with linear equality or inequality constraints. Our extension can handle arbitrary inequality constraints directly. It combines the ability of ADMM to solve convex optimization problems in a distributed setting with the ability of the Augmented Lagrangian method to solve constrained optimization problems, and as we show, it inherits the convergence guarantees of ADMM and the Augmented Lagrangian method.

1804.01647 2026-06-04 math.NA cs.NA math.PR math.ST stat.TH

Mathematical Properties of Polynomial Dimensional Decomposition

多项式维度分解的数学性质

Sharif Rahman

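AI summary: This paper studies the mathematical properties of polynomial dimensional decomposition (PDD): it proves completeness of the polynomial orthogonal basis, establishes mean-square convergence to the correct limit, and shows that PDD can be markedly more efficient than polynomial chaos expansion.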

详情
Comments
28 pages, two figures, one table; accepted by SIAM/ASA Journal on Uncertainty Quantification
AI中文摘要

许多高维不确定性量化问题通过多项式维度分解(PDD)解决,其以随机正交多项式表示傅里叶级数展开。本文构建了维度-wise和正交分割的多项式空间,证明了在给定假设下多项式正交基的完备性,并展示了均方收敛到正确极限。二阶矩误差分析表明,PDD的误差不会超过多项式混沌扩展(PCE)在适当选择截断参数时的误差。从计算效率比较来看,估计输出函数方差时,PDD近似在计算效率上显著优于PCE近似。

英文摘要

Many high-dimensional uncertainty quantification problems are solved by polynomial dimensional decomposition (PDD), which represents a Fourier-like series expansion in terms of random orthonormal polynomials with increasing dimensions. This study constructs dimension-wise and orthogonal splitting of polynomial spaces, proves completeness of polynomial orthogonal basis for prescribed assumptions, and demonstrates mean-square convergence to the correct limit -- all associated with PDD. A second-moment error analysis reveals that PDD cannot commit larger error than polynomial chaos expansion (PCE) for the appropriately chosen truncation parameters. A comparison of the computational effort required to estimate, with the same precision, the variance of an output function involving exponentially attenuating expansion coefficients shows that the PDD approximation can be markedly more efficient than the PCE approximation.

1710.10781 2026-06-04 math.NA cs.CV cs.LG cs.NA stat.ML

Stochastic variance reduced multiplicative update for nonnegative matrix factorization

随机方差缩减乘法更新用于非负矩阵分解

Hiroyuki Kasai

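AI summary: This paper proposes a stochastic variance-reduced multiplicative update algorithm to speed up the convergence of nonnegative matrix factorization; numerical experiments on several datasets indicate that it outperforms existing algorithms.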

详情
Comments
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018)
AI中文摘要

非负矩阵分解(NMF)是一种降维和因子分析方法,其因子矩阵具有低秩非负约束。考虑到NMF中的随机学习,本文特别针对最流行的乘法更新(MU)规则,该规则收敛速度较慢。本文提出一种随机梯度的方差缩减技术,数值比较表明,所提出的算法在不同合成和实际数据集上均优于现有算法。

英文摘要

Nonnegative matrix factorization (NMF), a dimensionality reduction and factor analysis method, is a special case in which factor matrices have low-rank nonnegative constraints. Considering stochastic learning in NMF, we specifically address the multiplicative update (MU) rule, which is the most popular but converges slowly. This paper introduces a variance-reduction technique for stochastic gradients into the stochastic MU rule. Numerical comparisons suggest that our proposed algorithms robustly outperform state-of-the-art algorithms across different synthetic and real-world datasets.
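As background for the MU rule mentioned in the abstract, here is a minimal sketch of the classical batch multiplicative update for the Frobenius-norm objective. It is the well-known baseline, not the stochastic variance-reduced variant the paper proposes; the function name `mu_step`, the small constant `eps` and the random test matrix are assumptions for illustration.

```python
import numpy as np

def mu_step(V, W, H, eps=1e-12):
    """One batch multiplicative update for min ||V - W H||_F with W, H >= 0."""
    # Each factor is rescaled elementwise by a nonnegative ratio,
    # so nonnegativity is preserved without any projection step.
    H = H * (W.T @ V) / (W.T @ W @ H + eps)
    W = W * (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(0)
V = rng.random((30, 20))                  # nonnegative data matrix
W = rng.random((30, 4))                   # rank-4 factors, random nonnegative init
H = rng.random((4, 20))

err_before = np.linalg.norm(V - W @ H)
for _ in range(100):
    W, H = mu_step(V, W, H)
err_after = np.linalg.norm(V - W @ H)
```

Each call uses the full matrix `V`. A stochastic variant would apply the same kind of update to sampled columns of `V`, and that is the setting in which variance reduction becomes relevant.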

1804.00684 2026-06-04 cs.LG cs.NA math.NA stat.ML

Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data

基于图的深度建模与稀疏时空数据的实时预测

Bao Wang, Xiyang Luo, Fangbo Zhang, Baichuan Yuan, Andrea L. Bertozzi, P. Jeffrey Brantingham

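AI summary: This paper presents a generic framework for modeling, analyzing and forecasting sparse spatio-temporal data that couples a self-exciting point process for macroscale behavior with a graph-structured recurrent neural network for microscale patterns, enabling real-time forecasting.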

详情
Comments
9 pages, 19 figures
AI中文摘要

我们提出了一种通用框架,用于时空数据的建模、分析和预测,特别关注在空间和时间上都稀疏的数据。我们的多尺度框架是两个主要组件的无缝耦合:一个自激发点过程用于建模时空数据的宏尺度统计行为,以及一个图结构循环神经网络(GSRNN)用于在推断图上发现时空数据的微尺度模式。这种新颖的深度神经网络(DNN)结合了图节点的实时交互,以实现更准确的实时预测。该方法在犯罪和交通预测上得到了验证。

英文摘要

We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real time interactions of the graph nodes to enable more accurate real time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting.

1804.00430 2026-06-04 stat.CO astro-ph.IM cs.SY eess.SY math.CV

Constrained Least Squares for Extended Complex Factor Analysis

受限最小二乘法用于扩展复数因子分析

Ahmad Mouri Sardarabadi, Alle-Jan van der Veen, L. V. E. Koopmans

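AI summary: For factor analysis solved as a nonlinear least squares problem with the Gauss-Newton algorithm, this paper shows that the matrices in the per-iteration linear systems can be diagonalized in closed form, so the descent direction is found efficiently and the parameters are updated without constructing these matrices.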

详情
英文摘要

For subspace estimation with an unknown colored noise, Factor Analysis (FA) is a good candidate for replacing the popular eigenvalue decomposition (EVD). Finding the unknowns in factor analysis can be done by solving a non-linear least squares problem. For this type of optimization problem, the Gauss-Newton (GN) algorithm is a powerful and simple method. The most expensive part of the GN algorithm is finding the direction of descent by solving a system of equations at each iteration. In this paper we show that for FA, the matrices involved in solving these systems of equations can be diagonalized in closed form and the solution can be found in a computationally efficient way. We show how the unknown parameters can be updated without actually constructing these matrices. The convergence performance of the algorithm is studied via numerical simulations.

1705.10887 2026-06-04 stat.ML cs.CV cs.LG cs.NA math.NA

Efficient, sparse representation of manifold distance matrices for classical scaling

高效表示经典标度中的流形距离矩阵

Javier S. Turek, Alexander Huth

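AI summary: This paper proposes a sparse method based on biharmonic interpolation for efficiently representing geodesic distance matrices; compared with current leading methods it is 2x faster and uses 20x less memory at similar quality, and it can handle large point sets.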

详情
Comments
Conference CVPR 2018
AI中文摘要

Geodesic距离矩阵可以揭示对非刚性变形不敏感的形状特性,因此常用于分析和表示3-D形状。然而,这些矩阵随点数的平方增长,因此对于大规模点集常用低秩近似来存储和分析。本文提出了一种新颖的稀疏方法,利用双调和插值高效表示流形距离矩阵。该方法利用数据流形的知识,学习一个稀疏插值算子,通过部分点近似距离。我们证明,与现有方法相比,该方法在处理大规模点集的MDS问题时速度快2倍,内存占用低20倍,质量相似。这使得分析之前不可行的大规模点集成为可能。

英文摘要

Geodesic distance matrices can reveal shape properties that are largely invariant to non-rigid deformations, and thus are often used to analyze and represent 3-D shapes. However, these matrices grow quadratically with the number of points. Thus for large point sets it is common to use a low-rank approximation to the distance matrix, which fits in memory and can be efficiently analyzed using methods such as multidimensional scaling (MDS). In this paper we present a novel sparse method for efficiently representing geodesic distance matrices using biharmonic interpolation. This method exploits knowledge of the data manifold to learn a sparse interpolation operator that approximates distances using a subset of points. We show that our method is 2x faster and uses 20x less memory than current leading methods for solving MDS on large point sets, with similar quality. This enables analyses of large point sets that were previously infeasible.
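The abstract takes multidimensional scaling on a distance matrix as its baseline. As a point of reference, here is a minimal sketch of classical scaling on a dense matrix, which is the quadratic-memory setting the paper aims to avoid. It is not the authors' sparse biharmonic method, and the Euclidean test distances and the name `classical_mds` are illustrative assumptions.

```python
import numpy as np

def classical_mds(dist, dim):
    """Embed points in `dim` dimensions from a dense pairwise distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # eigh returns eigenvalues in ascending order; keep the `dim` largest.
    top = np.argsort(eigvals)[::-1][:dim]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Assumed toy data: Euclidean distances stand in for geodesic ones.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(dist, dim=2)
recovered = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
max_error = np.abs(recovered - dist).max()
```

The dense `dist` array needs memory quadratic in the number of points, which is why low-rank or sparse representations matter for large point sets.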

1701.01394 2026-06-04 cs.DS cs.LG cs.NA math.NA stat.ML

On spectral partitioning of signed graphs

关于带符号图的谱划分

Andrew V. Knyazev

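AI summary: This paper argues that the standard graph Laplacian is preferable to the signed Laplacian for spectral partitioning of signed graphs: partitions based on the leading eigenvectors of the signed Laplacian may be meaningless, whereas partitions based on the Fiedler vector of the standard Laplacian are not, and negative eigenvalues make the Fiedler vector easier to compute.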

详情
Comments
12 pages, 10 figures. Rev 2 to appear in proceedings of the SIAM Workshop on Combinatorial Scientific Computing 2018 (CSC18)
AI中文摘要

我们主张标准图拉普拉斯矩阵比符号拉普拉斯矩阵更适合带符号图的谱划分。简单例子表明,基于符号拉普拉斯矩阵主特征向量的划分可能无意义,而基于标准图拉普拉斯矩阵Fiedler向量的划分更有效。我们观察到负特征值对带符号图的谱划分有益,使Fiedler向量更容易计算。

英文摘要

We argue that the standard graph Laplacian is preferable for spectral partitioning of signed graphs compared to the signed Laplacian. Simple examples demonstrate that partitioning based on signs of components of the leading eigenvectors of the signed Laplacian may be meaningless, in contrast to partitioning based on the Fiedler vector of the standard graph Laplacian for signed graphs. We observe that negative eigenvalues are beneficial for spectral partitioning of signed graphs, making the Fiedler vector easier to compute.
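A small sketch may help make the Fiedler-vector partition concrete. It assumes that "standard graph Laplacian" means $L = D - A$ with signed row sums on the diagonal, and it takes the eigenvector of the smallest eigenvalue that is not the constant vector. The six-vertex graph (two triangles joined by one negative edge) and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fiedler_partition(adjacency):
    """Split vertices by the sign of the first non-constant Laplacian eigenvector."""
    n = adjacency.shape[0]
    # Standard Laplacian with signed row sums on the diagonal (assumed convention).
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # ascending eigenvalues
    ones = np.ones(n) / np.sqrt(n)
    for value, vector in zip(eigvals, eigvecs.T):
        # Skip the constant eigenvector, which always has eigenvalue 0.
        if abs(vector @ ones) < 1.0 - 1e-8:
            return value, vector > 0
    raise ValueError("no non-constant eigenvector found")

# Two triangles {0,1,2} and {3,4,5} joined by a single negative edge (2,3).
A = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, -1)]:
    A[i, j] = A[j, i] = w
eigenvalue, labels = fiedler_partition(A)
```

On this toy graph the selected eigenvalue is negative and the sign pattern separates the two triangles, in line with the abstract's remark that negative eigenvalues help rather than hurt.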

1803.08522 2026-06-04 eess.SY cs.SY stat.CO

Frequency violations from random disturbances: an MCMC approach

随机扰动引起的频率违规:一种MCMC方法

John Moriarty, Jure Vogrinc, Alessandro Zocca

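AI summary: This paper proposes an MCMC method, a Metropolis-Hastings variant called ghost sampling, for efficiently sampling rare power disturbances and studying the causes and probabilities of rate-of-change-of-frequency (RoCoF) violations.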

详情
AI中文摘要

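电力系统的频率稳定性正日益受到各类扰动的挑战,尤其是可再生能源渗透率的提高既增大了发电的波动性,又降低了系统抵御扰动的惯性。本文特别关注异常大的功率扰动如何引起频率变化率(RoCoF)违规。我们提出Metropolis-Hastings马尔可夫链蒙特卡洛方法的一种新变体,称为ghost sampling,用于高效采样导致节点频率违规的罕见功率扰动。生成有代表性的随机样本可以回答一些重要的统计问题,例如“哪台发电机最有可能因RoCoF违规而断开?”或“在已发生违规的条件下,同时出现多个RoCoF违规的概率是多少?”该方法可以从功率扰动的任意联合分布中进行条件采样,例如相关扰动和非高斯扰动。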

英文摘要

The frequency stability of power systems is increasingly challenged by various types of disturbances. In particular, the increasing penetration of renewable energy sources is increasing the variability of power generation and at the same time reducing system inertia against disturbances. In this paper we are particularly interested in understanding how rate of change of frequency (RoCoF) violations could arise from unusually large power disturbances. We devise a novel specialization, named ghost sampling, of the Metropolis-Hastings Markov Chain Monte Carlo method that is tailored to efficiently sample rare power disturbances leading to nodal frequency violations. Generating a representative random sample addresses important statistical questions such as "which generator is most likely to be disconnected due to a RoCoF violation?" or "what is the probability of having simultaneous RoCoF violations, given that a violation occurs?" Our method can perform conditional sampling from any joint distribution of power disturbances including, for instance, correlated and non-Gaussian disturbances, features which have both been recently shown to be significant in security analyses.

1803.07661 2026-06-04 cs.LG cs.NA math.NA stat.ML

Efficient Recurrent Neural Networks using Structured Matrices in FPGAs

在FPGA上使用结构化矩阵实现高效的循环神经网络

Zhe Li, Shuo Wang, Caiwen Ding, Qinru Qiu, Yanzhi Wang, Yun Liang

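AI summary: This paper implements RNNs on FPGAs using block-circulant weight matrices to achieve simultaneous model compression and acceleration; experiments show up to a 35.7x improvement in energy efficiency over ESE.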

详情
Comments
To appear in International Conference on Learning Representations 2018 Workshop Track
AI中文摘要

循环神经网络(RNN)在时间序列相关应用中正变得越来越重要,要求高效的实时实现。最近基于剪枝的工作ESE由于剪枝后网络结构的不规则性导致性能/能效下降。我们提出在RNN中使用块循环矩阵来表示权重矩阵,从而实现同时的模型压缩和加速。我们的目标是在FPGA上实现最高性能和能效的RNN,同时满足一定的精度要求(可忽略的精度下降)。实验结果表明,所提出的框架在实际FPGA部署中相比ESE实现了最大能效提升35.7倍。

英文摘要

Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications which require efficient and real-time implementations. The recent pruning-based work ESE suffers from degradation of performance/energy efficiency due to the irregular network structure after pruning. We propose block-circulant matrices for weight matrix representation in RNNs, thereby achieving simultaneous model compression and acceleration. We aim to implement RNNs on FPGAs with the highest performance and energy efficiency, subject to a given accuracy requirement (negligible accuracy degradation). Experimental results on actual FPGA deployments show that the proposed framework achieves a maximum energy efficiency improvement of 35.7$\times$ compared with ESE.
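To illustrate why block-circulant weights compress and accelerate a layer, the sketch below stores only the first column of each circulant block and computes the matrix-vector product with FFTs. The FFT route is the standard property of circulant matrices and is an assumption about how such a layer could be evaluated, not a description of the authors' FPGA design. The function names and block sizes are hypothetical.

```python
import numpy as np

def block_circulant_matvec(first_cols, x):
    """first_cols has shape (p, q, b): the first column of each b-by-b circulant block."""
    p, q, b = first_cols.shape
    x_blocks = x.reshape(q, b)
    # A circulant matvec is a circular convolution, i.e. a product in Fourier space.
    prod = np.fft.fft(first_cols, axis=-1) * np.fft.fft(x_blocks, axis=-1)[None, :, :]
    return np.real(np.fft.ifft(prod.sum(axis=1), axis=-1)).reshape(p * b)

def to_dense(first_cols):
    """Expand the compressed representation into the full (p*b)-by-(q*b) matrix."""
    p, q, b = first_cols.shape
    idx = (np.arange(b)[:, None] - np.arange(b)[None, :]) % b  # C[r, s] = c[(r - s) mod b]
    return np.block([[first_cols[i, j][idx] for j in range(q)] for i in range(p)])

rng = np.random.default_rng(0)
first_cols = rng.normal(size=(2, 3, 4))  # 2 x 3 grid of 4 x 4 circulant blocks
x = rng.normal(size=12)
y_fft = block_circulant_matvec(first_cols, x)
y_dense = to_dense(first_cols) @ x
```

Each block needs `b` stored parameters instead of `b * b`, and the regular structure avoids the irregular sparsity that the abstract attributes to pruning.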

1803.08137 2026-06-04 cs.CV cs.AI cs.NA math.NA stat.ML

Robust Blind Deconvolution via Mirror Descent

通过镜像下降实现鲁棒盲去卷积

Sathya N. Ravi, Ronak Mehta, Vikas Singh

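AI summary: This paper studies the robustness and convergence of blind deconvolution and derives an algorithm with convergence and robustness guarantees whose results on standard datasets are competitive with or superior to the state of the art.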

详情
AI中文摘要

我们重新审视盲去卷积问题,重点在于理解其鲁棒性和收敛性属性。可证明的鲁棒性对噪声和其他扰动的容忍能力最近在视觉领域受到关注,从获得对抗攻击的免疫性到评估和描述关键任务应用中算法的失败模式。此外,许多基于深度架构的盲去卷积方法内部使用或优化基本公式,因此更清楚地理解该子模块的行为、何时可以求解以及它可以容忍多少噪声注入是首要要求。我们推导了盲去卷积理论基础的新见解。出现的算法具有良好的收敛保证,并在我们论文中正式定义的意义上被证明是鲁棒的。有趣的是,这些技术结果在实践中表现非常出色,其中在标准数据集上,我们的算法结果与或优于现有最先进方法。关键词:盲去卷积,鲁棒连续优化

英文摘要

We revisit the Blind Deconvolution problem with a focus on understanding its robustness and convergence properties. Provable robustness to noise and other perturbations is receiving recent interest in vision, from obtaining immunity to adversarial attacks to assessing and describing failure modes of algorithms in mission critical applications. Further, many blind deconvolution methods based on deep architectures internally make use of or optimize the basic formulation, so a clearer understanding of how this sub-module behaves, when it can be solved, and what noise injection it can tolerate is a first order requirement. We derive new insights into the theoretical underpinnings of blind deconvolution. The algorithm that emerges has nice convergence guarantees and is provably robust in a sense we formalize in the paper. Interestingly, these technical results play out very well in practice, where on standard datasets our algorithm yields results competitive with or superior to the state of the art. Keywords: blind deconvolution, robust continuous optimization

1612.05276 2026-06-04 nlin.AO cs.SY eess.SY stat.ML

Learning Optimal Control of Synchronization in Networks of Coupled Oscillators using Genetic Programming-based Symbolic Regression

利用基于遗传编程的符号回归学习耦合振子网络同步控制的最优控制

Julien Gout, Markus Quade, Kamran Shafi, Robert K. Niven, Markus Abel

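AI summary: This paper proposes a multi-objective, genetic-programming-based approach for inferring optimal control functions that drive networks of coupled oscillators from a synchronized to a non-synchronized state and vice versa, with the control laws learned in an interpretable symbolic form.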

详情
Journal ref
Nonlinear Dynamics 91 (2), 1001-1021, 2018
Comments
Submitted to nonlinear dynamics
AI中文摘要

耦合动态系统的网络为建模具有极其复杂动态的系统提供了强大方法,如人类大脑。此类网络中同步控制在许多领域有广泛应用,包括工程和医学。本文将动态系统中的同步控制建模为优化问题,并提出一种基于遗传编程的多目标方法,以推断驱动系统从同步到非同步及反之的最优控制函数。基于遗传编程的控制器允许以可解释的符号形式学习最优控制函数。所提方法的有效性在控制耦合振子网络同步方面得到验证,从简单的耦合振子系统到复杂层次网络均展示了其学习高效且可解释的控制函数的能力。

英文摘要

Networks of coupled dynamical systems provide a powerful way to model systems with enormously complex dynamics, such as the human brain. Control of synchronization in such networked systems has far reaching applications in many domains, including engineering and medicine. In this paper, we formulate the synchronization control in dynamical systems as an optimization problem and present a multi-objective genetic programming-based approach to infer optimal control functions that drive the system from a synchronized to a non-synchronized state and vice-versa. The genetic programming-based controller allows learning optimal control functions in an interpretable symbolic form. The effectiveness of the proposed approach is demonstrated in controlling synchronization in coupled oscillator systems linked in networks of increasing order complexity, ranging from a simple coupled oscillator system to a hierarchical network of coupled oscillators. The results show that the proposed method can learn highly-effective and interpretable control functions for such systems.

1610.05660 2026-06-04 stat.CO cs.NA math.NA

Langevin Diffusion for Population Based Sampling with an Application in Bayesian Inference for Pharmacodynamics

基于朗之万扩散的群体采样算法及其在药效学贝叶斯推断中的应用

Georgios Arampatzis, Daniel Wälchli, Panagiotis Angelikopoulos, Stephen Wu, Panagiotis Hadjidoukas, Petros Koumoutsakos

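AI summary: This paper proposes an efficient and robust algorithm for sampling posterior distributions in Bayesian inference that combines manifold Metropolis-adjusted Langevin transition kernels with population-based sampling (TMCMC), and demonstrates it on the posterior distribution of the parameters of a pharmacodynamics model.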

详情
AI中文摘要

我们提出了一种用于贝叶斯推断问题中高效且稳健地对后验概率分布进行采样的算法。该算法将流形Metropolis调整朗之万(Manifold Metropolis Adjusted Langevin)转移核的局部搜索能力,与一种基于群体的采样算法,即过渡马尔可夫链蒙特卡罗(TMCMC)的全局探索优势结合起来。朗之万扩散过程由目标分布的Hessian或Fisher信息决定,并针对非正定情形作了适当修改。在可获取梯度的概率分布采样中,该方法优于其他基于群体的算法,并能处理原本不可识别的模型。我们使用临床数据计算胶质瘤生长及其药物诱导抑制的药效学模型参数的后验分布,以此展示该方法的能力和优势。

英文摘要

We propose an algorithm for the efficient and robust sampling of the posterior probability distribution in Bayesian inference problems. The algorithm combines the local search capabilities of the Manifold Metropolis Adjusted Langevin transition kernels with the advantages of global exploration by a population-based sampling algorithm, the Transitional Markov Chain Monte Carlo (TMCMC). The Langevin diffusion process is determined by either the Hessian or the Fisher Information of the target distribution, with appropriate modifications for non-positive definiteness. The present method is shown to be superior to other population-based algorithms in sampling probability distributions for which gradients are available, and is shown to handle otherwise unidentifiable models. We demonstrate the capabilities and advantages of the method in computing the posterior distribution of the parameters in a pharmacodynamics model, for glioma growth and its drug-induced inhibition, using clinical data.

1803.06989 2026-06-04 math.ST cs.LG cs.NA math.NA stat.ML stat.TH

Numerical Integration on Graphs: where to sample and how to weigh

图上的数值积分:在哪里采样和如何加权

George C. Linderman, Stefan Steinerberger

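AI summary: This paper studies numerical integration on graphs, recasts the problem as a geometric one (the optimal packing of heat balls), discusses how to choose sample vertices and weights, and demonstrates the method's efficiency with numerical examples.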

详情
AI中文摘要

设G=(V,E,w)为有限连通加权图。我们关注寻找顶点子集W⊆V和权重a_w,使得1/|V|∑_{v∈V}f(v)≈∑_{w∈W}a_wf(w),其中f:V→R是图几何下光滑的函数。主要应用是当f依赖于图结构但单点评估成本高的问题。证明不等式显示积分问题可转化为几何问题(最优热球打包)。讨论如何构造热球打包近似解;数值示例展示方法效率。

英文摘要

Let $G=(V,E,w)$ be a finite, connected graph with weighted edges. We are interested in the problem of finding a subset $W \subset V$ of vertices and weights $a_w$ such that $$ \frac{1}{|V|}\sum_{v \in V} f(v) \sim \sum_{w \in W} a_w f(w) $$ for functions $f:V \rightarrow \mathbb{R}$ that are "smooth" with respect to the geometry of the graph. The main applications are problems where $f$ is known to somehow depend on the underlying graph but is expensive to evaluate on even a single vertex. We prove an inequality showing that the integration problem can be rewritten as a geometric problem ("the optimal packing of heat balls"). We discuss how one would construct approximate solutions of the heat ball packing problem; numerical examples demonstrate the efficiency of the method.

1505.03898 2026-06-04 cs.IT cs.NA math.IT math.NA math.OC stat.ML

Pinball Loss Minimization for One-bit Compressive Sensing: Convex Models and Algorithms

一比特压缩感知中的Pinball损失最小化:凸模型与算法

Xiaolin Huang, Lei Shi, Ming Yan, Johan A. K. Suykens

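AI summary: This paper proposes convex models based on the pinball loss, together with algorithms for solving them, to improve the decoding performance of one-bit compressive sensing on noisy data.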

详情
Comments
11 pages
AI中文摘要

一比特量化通过低功耗高速比较器实现,因此一比特压缩感知(1bit-CS)在信号处理中变得有吸引力。当信号采集和传输过程中受到噪声干扰时,1bit-CS通常被建模为最小化带有稀疏性约束的损失函数。一侧ℓ1损失和线性损失是1bit-CS中两种流行的损失函数。为提高噪声数据下的解码性能,本文考虑了Pinball损失,它在一侧ℓ1损失和线性损失之间架起桥梁。使用Pinball损失,提出了两种凸模型,即弹性网络Pinball模型及其带有ℓ1范数约束的修改版本。为高效求解这些模型,设计了相应的对偶坐标上升算法并证明了其收敛性。数值实验验证了所提算法的有效性以及Pinball损失最小化在1bit-CS中的性能。

英文摘要

The one-bit quantization is implemented by one single comparator that operates at low power and a high rate. Hence one-bit compressive sensing (1bit-CS) becomes attractive in signal processing. When measurements are corrupted by noise during signal acquisition and transmission, 1bit-CS is usually modeled as minimizing a loss function with a sparsity constraint. The one-sided $\ell_1$ loss and the linear loss are two popular loss functions for 1bit-CS. To improve the decoding performance on noisy data, we consider the pinball loss, which provides a bridge between the one-sided $\ell_1$ loss and the linear loss. Using the pinball loss, two convex models, an elastic-net pinball model and its modification with the $\ell_1$-norm constraint, are proposed. To efficiently solve them, the corresponding dual coordinate ascent algorithms are designed and their convergence is proved. The numerical experiments confirm the effectiveness of the proposed algorithms and the performance of the pinball loss minimization for 1bit-CS.
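The "bridge" between the one-sided $\ell_1$ loss and the linear loss can be seen in a few lines. The sketch below uses one common parametrization of the pinball loss, $L_\tau(u) = u$ for $u \ge 0$ and $-\tau u$ for $u < 0$ with $\tau \in [-1, 0]$. This parametrization and the name `pinball_loss` are assumptions made for illustration, and the paper's exact convention may differ.

```python
import numpy as np

def pinball_loss(u, tau):
    """Pinball loss under the assumed convention: u if u >= 0, else -tau * u."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, u, -tau * u)

u = np.array([-2.0, -0.5, 0.0, 1.5])
one_sided = pinball_loss(u, tau=0.0)    # equals max(u, 0): one-sided l1 loss
linear = pinball_loss(u, tau=-1.0)      # equals u: linear loss
between = pinball_loss(u, tau=-0.5)     # negative side is scaled by 0.5
```

Under this convention, moving `tau` from 0 to -1 interpolates continuously between the two losses named in the abstract.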

1703.02419 2026-06-04 stat.CO cs.LG cs.SY eess.SY

Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo

利用序贯蒙特卡洛方法进行非线性动力系统概率学习

Thomas B. Schön, Andreas Svensson, Lawrence Murray, Fredrik Lindsten

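AI summary: This tutorial introduces the particle Metropolis-Hastings algorithm, a sequential Monte Carlo based method for learning probabilistic nonlinear state-space models in which a particle filter guides the Markov chain through the parameter space, and illustrates it on a numerical example.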

详情
Comments
Thomas B. Schön, Andreas Svensson, Lawrence Murray and Fredrik Lindsten, 2018. Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo. In Mechanical Systems and Signal Processing, Volume 104, pp. 866-883
AI中文摘要

概率建模能够表示和操纵数据、模型、预测和决策中的不确定性。本文关注从测量数据中学习动态系统概率模型的问题,特别是非线性状态空间模型的学习。由于该问题没有闭式解,因此必须使用近似方法。本文提供了一个自包含的介绍,介绍了一种最先进的方法——粒子Metropolis-Hastings算法,该算法已被证明能提供实用的近似。这是一种基于蒙特卡洛的方法,其中粒子滤波用于引导马尔可夫链蒙特卡洛方法通过参数空间。粒子Metropolis-Hastings算法的一个关键优点是,在温和的假设下,它保证收敛到“真实解”,尽管它基于仅有限数量粒子的粒子滤波。本文还提供了一个数值示例,通过为序贯蒙特卡洛方法量身定制的建模语言来展示该方法。此类建模语言的目的是将高级蒙特卡洛方法(包括粒子Metropolis-Hastings)的威力带给大量用户,而无需他们了解所有底层数学细节。

英文摘要

Probabilistic modeling provides the capability to represent and manipulate uncertainty in data, models, predictions and decisions. We are concerned with the problem of learning probabilistic models of dynamical systems from measured data. Specifically, we consider learning of probabilistic nonlinear state-space models. There is no closed-form solution available for this problem, implying that we are forced to use approximations. In this tutorial we will provide a self-contained introduction to one of the state-of-the-art methods---the particle Metropolis--Hastings algorithm---which has proven to offer a practical approximation. This is a Monte Carlo based method, where the particle filter is used to guide a Markov chain Monte Carlo method through the parameter space. One of the key merits of the particle Metropolis--Hastings algorithm is that it is guaranteed to converge to the "true solution" under mild assumptions, despite being based on a particle filter with only a finite number of particles. We will also provide a motivating numerical example illustrating the method using a modeling language tailored for sequential Monte Carlo methods. The intention of modeling languages of this kind is to open up the power of sophisticated Monte Carlo methods---including particle Metropolis--Hastings---to a large group of users without requiring them to know all the underlying mathematical details.

1706.03151 2026-06-04 eess.SY cs.SY stat.AP

Adaptive Interference Removal for Un-coordinated Radar/Communication Co-existence

自适应干扰消除用于非协调雷达/通信共存

Le Zheng, Marco Lops, Xiaodong Wang

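AI summary: This paper proposes two algorithms that exploit the sparsity of the interference and of the data-block error vector to perform iterative joint interference removal and data demodulation. The first is based on classical on-grid compressed sensing and the second imposes an atomic-norm constraint; in both cases the radar parameters and demodulation errors are estimated by solving a convex problem.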

详情
AI中文摘要

现有共存通信/雷达系统方法假设雷达和通信系统协调,即共享信息如相对位置、发射波形和信道状态。本文考虑非协调场景,通信接收器需在多个雷达存在下工作,其中仅部分雷达可能活跃,需估计活跃波形及相关参数以在解调前消除干扰。提出两种算法解决联合波形估计/数据解调问题,均利用适当表示的干扰和数据块误差向量的稀疏性,实现迭代联合干扰消除/解调过程。前者基于经典网格压缩感知,后者引入原子范数约束:两种情况下雷达参数和通信解调误差可通过求解凸优化问题估计。还提出改进原子范数算法效率的方法。通过大量仿真验证算法性能,考虑干扰源和各自信道状态的各种条件。

英文摘要

Most existing approaches to co-existing communication/radar systems assume that the radar and communication systems are coordinated, i.e., they share information, such as relative position, transmitted waveforms and channel state. In this paper, we consider an un-coordinated scenario where a communication receiver is to operate in the presence of a number of radars, of which only a subset may be active, which poses the problem of estimating the active waveforms and the relevant parameters thereof, so as to cancel them prior to demodulation. Two algorithms are proposed for such a joint waveform estimation/data demodulation problem, both exploiting sparsity of a proper representation of the interference and of the vector containing the errors of the data block, so as to implement an iterative joint interference removal/data demodulation process. The former algorithm is based on classical on-grid compressed sensing (CS), while the latter enforces an atomic norm (AN) constraint: in both cases the radar parameters and the communication demodulation errors can be estimated by solving a convex problem. We also propose a way to improve the efficiency of the AN-based algorithm. The performance of these algorithms is demonstrated through extensive simulations, taking into account a variety of conditions concerning both the interferers and the respective channel states.

1609.07180 2026-06-04 math.NA cs.NA stat.CO

Low Rank Independence Samplers in Bayesian Inverse Problems

低秩独立采样器在贝叶斯逆问题中的应用

D. Andrew Brown, Arvind Saibaba, Sarah Vallélian

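AI summary: This paper proposes an efficient scheme for sampling high-dimensional Gaussian distributions in Bayesian linear inverse problems, using Metropolis-Hastings independence sampling with a proposal built from a low-rank approximation of the prior-preconditioned Hessian.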

详情
Comments
23 pages, 9 figures, 4 tables; plus supplementary materials (15 pages, 16 figures, 1 table)
AI中文摘要

在贝叶斯逆问题中,后验分布用于量化重构解的不确定性。在实践中,马尔可夫链蒙特卡洛算法常用于从后验分布中采样。然而,此类算法的实现可能计算成本较高。本文提出了一种计算高效的方案,用于在病态贝叶斯线性逆问题中采样高维高斯分布。我们的方法使用Metropolis-Hastings独立采样,其建议分布基于先验预条件Hessian的低秩近似。我们展示了接受率与保留的特征值数量之间的依赖关系,并讨论了接受率较高的条件。我们通过在图像去模糊、计算机断层扫描和核磁共振弛豫测量的数值实验中使用该采样器,展示了所提采样器的效果。

英文摘要

In Bayesian inverse problems, the posterior distribution is used to quantify uncertainty about the reconstructed solution. In practice, Markov chain Monte Carlo algorithms often are used to draw samples from the posterior distribution. However, implementations of such algorithms can be computationally expensive. We present a computationally efficient scheme for sampling high-dimensional Gaussian distributions in ill-posed Bayesian linear inverse problems. Our approach uses Metropolis-Hastings independence sampling with a proposal distribution based on a low-rank approximation of the prior-preconditioned Hessian. We show the dependence of the acceptance rate on the number of eigenvalues retained and discuss conditions under which the acceptance rate is high. We demonstrate our proposed sampler by using it with Metropolis-Hastings-within-Gibbs sampling in numerical experiments in image deblurring, computerized tomography, and NMR relaxometry.

1609.06842 2026-06-04 physics.data-an cs.NA math.NA math.OC physics.geo-ph stat.CO

Efficient big data assimilation through sparse representation: A 3D benchmark case study in seismic history matching

通过稀疏表示实现高效的大数据同化:地震历史匹配的三维基准案例研究

Xiaodong Luo, Tuhin Bhakta, Morten Jakobsen, Geir Nævdal

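AI summary: This paper proposes an efficient seismic history matching method based on sparse representation: a discrete wavelet transform reduces the data size, and the leading wavelet coefficients are used as the data for updating reservoir models.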

详情
Comments
Overlapping with our conference paper at ECMOR XV, 2016. This is the initial draft submitted for review
AI中文摘要

在先前的工作中,作者提出了基于集合的4D地震历史匹配(SHM)框架,其中包括选择地震数据类型、处理大数据及相关数据噪声估计的方法,以及使用最新开发的迭代集合历史匹配算法。在地震历史匹配中,通常使用反演的地震属性,如声阻抗,作为观测数据。在此过程中,可能会引入额外的不确定性。所提出的SHM框架通过采用幅度随角度变化(AVA)数据来避免此类中间反演过程。此外,SHM通常涉及将大量观测地震属性同化到储层模型中。为处理SHM中的大数据问题,所提出的框架采用基于小波的稀疏表示程序:首先对观测地震属性应用离散小波变换。然后在小波域中进行不确定性分析,以估计所得小波系数中的噪声,并计算相应的阈值。高于阈值的小波系数称为主导小波系数,用作历史匹配的数据。保留的主导小波系数保留了观测地震属性中最显著的特征,同时使数据量大幅减少。最后,采用迭代集合平滑器来更新储层模型,使模拟地震属性的主导小波系数更好地匹配观测地震属性的主导小波系数。

英文摘要

In a previous work \citep{luo2016sparse2d_spej}, the authors proposed an ensemble-based 4D seismic history matching (SHM) framework, which has some relatively new ingredients, in terms of the type of seismic data chosen, the way to handle big seismic data and related data noise estimation, and the use of a recently developed iterative ensemble history matching algorithm. In seismic history matching, it is customary to use inverted seismic attributes, such as acoustic impedance, as the observed data. In doing so, extra uncertainties may arise during the inversion processes. The proposed SHM framework avoids such intermediate inversion processes by adopting amplitude versus angle (AVA) data. In addition, SHM typically involves assimilating a large amount of observed seismic attributes into reservoir models. To handle the big-data problem in SHM, the proposed framework adopts the following wavelet-based sparse representation procedure: First, a discrete wavelet transform is applied to observed seismic attributes. Then, uncertainty analysis is conducted in the wavelet domain to estimate noise in the resulting wavelet coefficients, and to calculate a corresponding threshold value. Wavelet coefficients above the threshold value, called leading wavelet coefficients hereafter, are used as the data for history matching. The retained leading wavelet coefficients preserve the most salient features of the observed seismic attributes, while yielding a substantially smaller data size. Finally, an iterative ensemble smoother is adopted to update reservoir models, in such a way that the leading wavelet coefficients of simulated seismic attributes better match those of observed seismic attributes. (The rest of the abstract was omitted for the length restriction.)

1803.03104 2026-06-04 eess.SY cs.CV cs.SY math.DS stat.ML

Applicability and interpretation of the deterministic weighted cepstral distance

确定性加权倒谱距离的适用性与解释

Oliver Lauwers, Bart De Moor

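AI summary: Combining systems theory and machine learning, this paper extends the weighted cepstral distance to deterministic linear time-invariant single-input single-output models and gives a purely data-driven method, based only on input/output signals, for assessing the stability and phase type of the generating models.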

详情
Comments
18 pages, 5 figures, submitted for review to Automatica
AI中文摘要

量化数据对象之间的相似性是现代数据科学的重要部分。使用哪种相似性度量在很大程度上取决于具体应用。本文结合系统理论和机器学习的见解,研究了此前针对ARMA模型信号定义的加权倒谱距离。我们将其推广到可逆的确定性线性时不变单输入单输出模型,并评估其适用性。我们证明该距离总能用底层模型的极点和零点来解释,并且在稳定、最小相位模型或不稳定、最大相位模型的情况下,可以用子空间夹角给出几何解释。然后,我们提出了一种仅使用输入/输出信号信息来评估生成模型的稳定性和相位类型的方法。由此,我们证明了推广后的加权倒谱距离与加权倒谱模型范数之间的联系,并提供了一种纯数据驱动的方法来评估输入/输出信号对的不同底层动态,而无需任何系统辨识步骤。这在时间序列聚类等机器学习任务中可能很有用。与本文配套发布了一个iPython教程,包含文中各种方法和算法的实现,以及对所证明的等价性的若干数值示例。

英文摘要

Quantifying similarity between data objects is an important part of modern data science. Deciding what similarity measure to use is very application dependent. In this paper, we combine insights from systems theory and machine learning, and investigate the weighted cepstral distance, which was previously defined for signals coming from ARMA models. We provide an extension of this distance to invertible deterministic linear time-invariant single-input single-output models, and assess its applicability. We show that it can always be interpreted in terms of the poles and zeros of the underlying model, and that, in the case of stable, minimum-phase, or unstable, maximum-phase models, a geometrical interpretation in terms of subspace angles can be given. We then devise a method to assess stability and phase-type of the generating models, using only input/output signal information. In this way, we prove a connection between the extended weighted cepstral distance and a weighted cepstral model norm, and we provide a purely data-driven way to assess different underlying dynamics of input/output signal pairs, without the need for any system identification step. This can be useful in machine learning tasks such as time series clustering. An iPython tutorial is published as a complement to this paper, containing implementations of the various methods and algorithms presented here, as well as some numerical illustrations of the equivalences proven here.

1703.01923 2026-06-04 eess.SY cs.SY math.DS stat.ML

A time series distance measure for efficient clustering of input output signals by their underlying dynamics

一种用于通过其底层动态高效聚类输入输出信号的时间序列距离度量

Oliver Lauwers, Bart De Moor

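AI summary: This paper proposes an extension of the Martin cepstral distance for efficiently clustering input/output signals by their underlying dynamics, at lower computational complexity than explicit system identification.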

详情
Comments
6 pages, 4 figures, sent in for review to IEEE L-CSS (CDC 2017 option)
AI中文摘要

本文从由多个确定性线性动态系统生成的输入/输出时间序列数据集出发,解决自动聚类问题。我们提出一种Martin cepstral距离的扩展,能够高效聚类这些时间序列,并应用于模拟的电气电路数据。传统方法分为两类:一类使用时间序列距离度量(如欧几里得距离、动态时间规整)和聚类技术(如k均值、k-medoids、层次聚类)来寻找数据集中的自然群体;另一类则利用输入/输出数据识别动态系统,并应用模型范数度量(如H2、H-infinity)来确定相似系统。本文展示所提出的新距离度量在性能上与显式建模每对输入/输出数据相当,但计算复杂度显著降低。计算两个长度为N的时间序列之间的距离复杂度为O(N logN)。

英文摘要

Starting from a dataset with input/output time series generated by multiple deterministic linear dynamical systems, this paper tackles the problem of automatically clustering these time series. We propose an extension to the so-called Martin cepstral distance that allows these time series to be clustered efficiently, and apply it to simulated electrical circuit data. Traditionally, two ways of handling the problem are used. The first class of methods employs a distance measure on time series (e.g. Euclidean, Dynamic Time Warping) and a clustering technique (e.g. k-means, k-medoids, hierarchical clustering) to find natural groups in the dataset. It is, however, often not clear whether these distance measures effectively take into account the specific temporal correlations in these time series. The second class of methods uses the input/output data to identify a dynamic system using an identification scheme, and then applies a model norm-based distance (e.g. H2, H-infinity) to find out which systems are similar. This, however, can be very time consuming for large amounts of long time series data. We show that the new distance measure presented in this paper performs as well as when every input/output pair is modelled explicitly, but remains computationally much less complex. The complexity of calculating this distance between two time series of length $N$ is $O(N \log N)$.
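The $O(N \log N)$ cost quoted above is what one would expect from an FFT-based cepstrum. The sketch below assumes a weighted distance of the Martin form, $\sum_k k\,(c_1(k) - c_2(k))^2$, where each $c$ is the output power cepstrum minus the input power cepstrum. Both the formula and the function names are assumptions for illustration and may differ in detail from the paper's definition.

```python
import numpy as np

def power_cepstrum(signal, eps=1e-12):
    """Inverse FFT of the log power spectrum; eps guards against log(0)."""
    spectrum = np.abs(np.fft.fft(signal)) ** 2
    return np.real(np.fft.ifft(np.log(spectrum + eps)))

def weighted_cepstral_distance(u1, y1, u2, y2):
    """Assumed Martin-type distance between two input/output pairs of equal length."""
    n = len(y1)
    c1 = power_cepstrum(y1) - power_cepstrum(u1)  # cepstrum attributed to system 1
    c2 = power_cepstrum(y2) - power_cepstrum(u2)  # cepstrum attributed to system 2
    k = np.arange(1, n // 2)
    return float(np.sum(k * (c1[k] - c2[k]) ** 2))

def ar1(u, a):
    """First-order system y[t] = a * y[t-1] + u[t], used only as toy data."""
    y = np.zeros_like(u)
    for t in range(len(u)):
        y[t] = a * (y[t - 1] if t > 0 else 0.0) + u[t]
    return y

rng = np.random.default_rng(0)
u = rng.normal(size=1024)
y_slow, y_fast = ar1(u, 0.9), ar1(u, 0.5)
d_same = weighted_cepstral_distance(u, y_slow, u, y_slow)
d_diff = weighted_cepstral_distance(u, y_slow, u, y_fast)
d_swap = weighted_cepstral_distance(u, y_fast, u, y_slow)
```

No model is identified at any point: the distance is computed from the signals alone, which is the property the abstract emphasizes.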

1803.02553 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Graph Learning from Filtered Signals: Graph System and Diffusion Kernel Identification

基于滤波信号的图学习:图系统与扩散核识别

Hilmi E. Egilmez, Eduardo Pavez, Antonio Ortega

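AI summary: This paper proposes a graph signal processing framework for building graph models from classes of filtered signals. Graph modeling is cast as a graph system identification problem in which a weighted graph (a graph Laplacian matrix) and a graph-based filter (a function of the graph Laplacian) are learned jointly from multiple signal observations. The approach applies to learning diffusion kernels and is demonstrated on a real climate dataset.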

详情
Comments
Submitted to IEEE Trans. on Signal and Information Processing over Networks (13 pages)
AI中文摘要

本文介绍了一种新的图信号处理框架,用于从滤波信号类中构建图模型。在我们的框架中,图建模被公式化为图系统识别问题,目标是学习加权图(图拉普拉斯矩阵)和图滤波器(图拉普拉斯矩阵函数)。为了求解提出的问题,开发了一种算法,从多个信号/数据观测中联合识别图和图滤波器(GBF)。我们的算法在GBF是一一对应函数的假设下有效。所提出的方法可以应用于学习扩散(热)核,这些核在各种领域中用于建模扩散过程。此外,对于特定的图滤波器选择,所提出的问题减少为图拉普拉斯估计问题。我们的实验结果表明,所提出算法优于当前最先进的方法。我们还实现了该框架在一个真实气候数据集上,用于温度信号建模。

英文摘要

This paper introduces a novel graph signal processing framework for building graph-based models from classes of filtered signals. In our framework, graph-based modeling is formulated as a graph system identification problem, where the goal is to learn a weighted graph (a graph Laplacian matrix) and a graph-based filter (a function of graph Laplacian matrices). In order to solve the proposed problem, an algorithm is developed to jointly identify a graph and a graph-based filter (GBF) from multiple signal/data observations. Our algorithm is valid under the assumption that GBFs are one-to-one functions. The proposed approach can be applied to learn diffusion (heat) kernels, which are popular in various fields for modeling diffusion processes. In addition, for specific choices of graph-based filters, the proposed problem reduces to a graph Laplacian estimation problem. Our experimental results demonstrate that the proposed algorithm outperforms the current state-of-the-art methods. We also implement our framework on a real climate dataset for modeling of temperature signals.

1803.01626 2026-06-04 stat.ML cs.LG cs.SY eess.SY

Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs

MDP中无折扣强化学习的方差感知遗憾界

Mohammad Sadegh Talebi, Odalric-Ambrym Maillard

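AI summary: This paper revisits reinforcement learning in unknown discrete MDPs under the average-reward criterion and, by using the local variance of the bias function in place of the MDP diameter, derives an improved regret bound for the KL-UCRL algorithm.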

详情
Comments
To appear in Proceedings of the 29th International Conference on Algorithmic Learning Theory (ALT 2018)
AI中文摘要

在未知和离散的马尔可夫决策过程(MDP)中,考虑平均回报准则下的强化学习问题,其中学习者从某一初始状态出发,在不重置的情况下通过单一观测流与系统交互。我们重新审视该问题的极小极大下界,用偏差函数的局部方差取代MDP的直径。此外,我们给出了KL-UCRL算法的新分析,对遍历MDP建立了量级为$\widetilde{\mathcal O}\Bigl({\textstyle \sqrt{S\sum_{s,a}{\bf V}^\star_{s,a}T}}\Bigr)$的高概率遗憾界,其中$S$表示状态数,${\bf V}^\star_{s,a}$是偏差函数关于在状态$s$执行动作$a$后下一状态分布的方差。该界改进了此前已知的最优遗憾界$\widetilde{\mathcal O}(DS\sqrt{AT})$,其中$A$和$D$分别表示每个状态的最大动作数和MDP的直径。最后,我们在若干基准MDP中比较了两个界的主导项,表明在某些情况下所推导的界可带来一个数量级的改进。我们的分析利用了运输引理的新变体并结合Kullback-Leibler集中不等式,我们认为这些结果本身也具有独立的价值。

英文摘要

The problem of reinforcement learning in an unknown and discrete Markov Decision Process (MDP) under the average-reward criterion is considered, when the learner interacts with the system in a single stream of observations, starting from an initial state without any reset. We revisit the minimax lower bound for that problem by making the local variance of the bias function appear in place of the diameter of the MDP. Furthermore, we provide a novel analysis of the KL-UCRL algorithm establishing a high-probability regret bound scaling as $\widetilde{\mathcal O}\Bigl({\textstyle \sqrt{S\sum_{s,a}{\bf V}^\star_{s,a}T}}\Bigr)$ for this algorithm for ergodic MDPs, where $S$ denotes the number of states and where ${\bf V}^\star_{s,a}$ is the variance of the bias function with respect to the next-state distribution following action $a$ in state $s$. The resulting bound improves upon the best previously known regret bound $\widetilde{\mathcal O}(DS\sqrt{AT})$ for that algorithm, where $A$ and $D$ respectively denote the maximum number of actions (per state) and the diameter of the MDP. We finally compare the leading terms of the two bounds in some benchmark MDPs, indicating that the derived bound can provide an order of magnitude improvement in some cases. Our analysis leverages novel variations of the transportation lemma combined with Kullback-Leibler concentration inequalities, which we believe to be of independent interest.

1803.01221 2026-06-04 eess.SY cs.SY stat.OT

Byzantine-Resilient Locally Optimum Detection Using Collaborative Autonomous Networks

使用协作自主网络的拜占庭容错局部最优检测

Bhavya Kailkhura, Priyadip Ray, Deepak Rajan, Anton Yen, Peter Barnes, Ryan Goldhahn

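AI summary: This paper proposes a locally optimum detection scheme for a weak radioactive source buried in background clutter, implemented with a decentralized ADMM-based algorithm that approaches centralized performance at low SNR and converges quickly; a robust variant provides Byzantine-resilient detection.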

详情
Comments
Proceedings of the 2017 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2017), 10.-13. December 2017, Curacao, Dutch Antilles
AI中文摘要

本文提出了一种局部最优检测(LOD)方案,用于检测埋藏在背景杂波中的弱放射源。我们开发了一种基于交替方向乘子法(ADMM)的去中心化算法,用于在自主传感器网络中实现该方案。结果表明,该算法在低信噪比情况下的性能接近集中式clairvoyant检测算法,并表现出优异的收敛速度和可扩展性(相对于节点数量)。我们还设计了一种低开销、鲁棒的ADMM算法,用于拜占庭容错检测,并展示了其对数据篡改攻击的鲁棒性。

英文摘要

In this paper, we propose a locally optimum detection (LOD) scheme for detecting a weak radioactive source buried in background clutter. We develop a decentralized algorithm, based on the alternating direction method of multipliers (ADMM), for implementing the proposed scheme in autonomous sensor networks. Results show that the algorithm's performance approaches that of the centralized clairvoyant detection algorithm in the low SNR regime, and that it exhibits excellent convergence rate and scaling behavior (w.r.t. the number of nodes). We also devise a low-overhead, robust ADMM algorithm for Byzantine-resilient detection, and demonstrate its robustness to data falsification attacks.

1610.09572 2026-06-04 stat.CO cs.NA math.NA math.PR

Density Tracking by Quadrature for Stochastic Differential Equations

基于数值求积的随机微分方程密度追踪

Harish S. Bhat, R. W. M. A. Madushani

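AI summary: This paper proposes the DTQ method, which discretizes the SDE in time and applies quadrature to solve the Chapman-Kolmogorov equation of the resulting Markov chain; the computed density converges in $L^1$ to the chain's exact density at an exponential rate and to the SDE's exact density at first order, and numerical experiments confirm its efficiency.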

详情
Comments
42 pages, 2 figures, minor revisions made to previous version, comments welcome
AI中文摘要

我们开发并分析了一种方法,即通过四阶求积法进行密度追踪(DTQ),用于计算随机微分方程解的概率密度函数。方法的推导始于对随机微分方程的时间离散化,从而得到一个具有连续状态空间的离散时间马尔可夫链。在每个时间步,DTQ 使用四阶求积法求解该马尔可夫链的 Chapman-Kolmogorov 方程。本文专注于 DTQ 方法的一个特例,即在时间上使用欧拉-马尔尤拉方法,在空间上使用梯形求积规则。我们的主要结果证明了 DTQ 计算的密度在 $L^1 $ 范数下收敛到马尔可夫链的精确密度(具有指数收敛率)和随机微分方程的精确密度(具有一阶收敛率)。我们建立了一个 Chernoff 绑定,证明了 DTQ 的域截断版本的收敛性。我们进行了数值实验,以显示 DTQ 的实测性能与理论结果一致,并证明 DTQ 在相同误差水平下比福克-普朗克求解器快几倍。

英文摘要

We develop and analyze a method, density tracking by quadrature (DTQ), to compute the probability density function of the solution of a stochastic differential equation. The derivation of the method begins with the discretization in time of the stochastic differential equation, resulting in a discrete-time Markov chain with continuous state space. At each time step, DTQ applies quadrature to solve the Chapman-Kolmogorov equation for this Markov chain. In this paper, we focus on a particular case of the DTQ method that arises from applying the Euler-Maruyama method in time and the trapezoidal quadrature rule in space. Our main result establishes that the density computed by DTQ converges in $L^1$ to both the exact density of the Markov chain (with exponential convergence rate), and to the exact density of the stochastic differential equation (with first-order convergence rate). We establish a Chernoff bound that implies convergence of a domain-truncated version of DTQ. We carry out numerical tests to show that the empirical performance of DTQ matches theoretical results, and also to demonstrate that DTQ can compute densities several times faster than a Fokker-Planck solver, for the same level of error.
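The special case described in this abstract (Euler-Maruyama in time, trapezoidal rule in space) can be illustrated with a minimal sketch. It is not the authors' code: the function name `dtq_density`, the fixed truncated uniform grid, and the use of full weights at the grid endpoints (harmless only when the density is negligible there) are assumptions made for illustration.

```python
import numpy as np

def dtq_density(drift, diffusion, x0, T, n_steps, grid):
    """Approximate the density of X_T for dX = drift(X) dt + diffusion(X) dW, X_0 = x0."""
    h = T / n_steps          # time step of the Euler-Maruyama chain
    k = grid[1] - grid[0]    # spacing of the (assumed uniform) spatial grid

    def kernel(x, y):
        # Euler-Maruyama transition density: X_{n+1} given X_n = y is Gaussian.
        mean = y + drift(y) * h
        var = diffusion(y) ** 2 * h
        return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

    # After one step from the point mass at x0, the density is exactly Gaussian.
    p = kernel(grid, np.full_like(grid, x0))
    # Quadrature matrix A[i, j] = k * G(x_i, y_j): one Chapman-Kolmogorov step
    # becomes a matrix-vector product (endpoint weights are not halved here,
    # which assumes the density is negligible at the grid boundary).
    A = k * kernel(grid[:, None], grid[None, :])
    for _ in range(n_steps - 1):
        p = A @ p
    return p
```

For an Ornstein-Uhlenbeck test case one might call `dtq_density(lambda y: -y, lambda y: np.ones_like(y), 1.0, 1.0, 100, np.linspace(-6.0, 6.0, 601))`; the grid limits and step counts here are arbitrary choices.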

1707.09825 2026-06-04 math.ST cs.NA math.AP math.NA math.PR stat.TH

On Approximation for Fractional Stochastic Partial Differential Equations on the Sphere

关于球面上分数随机偏微分方程的近似方法

Vo V. Anh, Philip Broadbridge, Andriy Olenko, Yu Guang Wang

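AI summary: Studies the exact solution of a fractional stochastic partial differential equation on the sphere and its numerical approximation by a truncated Karhunen-Loève expansion, derives convergence rates for the truncation error, and illustrates the theory with numerical examples and cosmic microwave background simulations.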

详情
Comments
28 pages, 7 figures
AI中文摘要

本文给出了一个分数随机偏微分方程在单位球$\mathbb{S}^{2}\subset \mathbb{R}^{3}$上的精确解,该解以分数布朗运动为驱动噪声,初始条件由分数随机柯西问题给出。通过截断Karhunen-Loève展开给出解的数值近似。证明了截断误差在度数上的收敛率及均方误差在时间上的收敛率。通过使用各向同性高斯随机场作为初始条件以及宇宙微波背景辐射(CMB)的演化模拟来验证理论结果。

英文摘要

This paper gives the exact solution in terms of the Karhunen-Loève expansion to a fractional stochastic partial differential equation on the unit sphere $\mathbb{S}^{2}\subset \mathbb{R}^{3}$ with fractional Brownian motion as driving noise and with random initial condition given by a fractional stochastic Cauchy problem. A numerical approximation to the solution is given by truncating the Karhunen-Loève expansion. We show the convergence rates of the truncation errors in degree and the mean square approximation errors in time. Numerical examples using an isotropic Gaussian random field as initial condition and simulations of evolution of cosmic microwave background (CMB) are given to illustrate the theoretical results.

1803.00491 2026-06-04 stat.ML cs.LG cs.NA math.NA

The Power Mean Laplacian for Multilayer Graph Clustering

多层图聚类的幂均值拉普拉斯

Pedro Mercado, Antoine Gautier, Francesco Tudisco, Matthias Hein

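AI summary: Introduces a one-parameter family of matrix power means for merging the Laplacians of a multilayer graph, analyzes it in expectation under the stochastic block model, and verifies on real-world data that it recovers ground-truth clusters in different settings.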

详情
Comments
19 pages, 3 figures. Accepted in Artificial Intelligence and Statistics (AISTATS), 2018
AI中文摘要

多层图编码了相同实体集之间的不同种类的相互作用。当对这样的多层图进行聚类时,自然的问题是应如何融合不同层的信息。本文介绍了一种参数化的矩阵幂均值家族,用于融合不同层的拉普拉斯矩阵,并在随机块模型中分析其期望性能。我们证明该家族在不同设置下能够恢复真实聚类,并在真实世界数据中验证了这一点。尽管计算矩阵幂均值对于大图来说可能非常昂贵,我们引入了一种数值方案,以高效计算大规模稀疏图的幂均值的特征向量。

英文摘要

Multilayer graphs encode different kinds of interactions between the same set of entities. When one wants to cluster such a multilayer graph, the natural question arises of how one should merge the information from the different layers. We introduce in this paper a one-parameter family of matrix power means for merging the Laplacians from different layers and analyze it in expectation in the stochastic block model. We show that this family allows one to recover ground-truth clusters under different settings and verify this on real-world data. While computing the matrix power mean can be very expensive for large graphs, we introduce a numerical scheme to efficiently compute its eigenvectors for the case of large sparse graphs.
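As a rough illustration of the matrix power mean named in this abstract, the sketch below computes $M_p = \bigl(\tfrac{1}{T}\sum_i L_i^p\bigr)^{1/p}$ densely via eigendecompositions. The function names, the use of symmetric normalized Laplacians, and the diagonal `shift` that keeps the matrices positive definite for negative $p$ are assumptions for illustration; the efficient sparse scheme mentioned in the abstract is not reproduced.

```python
import numpy as np

def normalized_laplacian(W):
    # Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}; assumes no isolated nodes.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def sym_power(M, p):
    # Real power of a symmetric positive definite matrix via its eigendecomposition.
    w, V = np.linalg.eigh(M)
    return (V * w ** p) @ V.T

def power_mean_laplacian(adjacencies, p, shift=0.1):
    # Power mean of the shifted layer Laplacians; the shift value is an assumed choice.
    if p == 0:
        raise ValueError("p = 0 is not handled in this sketch")
    n = len(adjacencies[0])
    shifted = [normalized_laplacian(W) + shift * np.eye(n) for W in adjacencies]
    mean_of_powers = sum(sym_power(L, p) for L in shifted) / len(shifted)
    return sym_power(mean_of_powers, 1.0 / p)
```

With `p = 1` this reduces to the arithmetic mean of the shifted Laplacians. Clustering would then use eigenvectors of the smallest eigenvalues of the result, as in standard spectral clustering.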

1711.01631 2026-06-04 stat.AP cs.SY eess.SY

Space Time MUSIC: Consistent Signal Subspace Estimation for Wide-band Sensor Arrays

时空MUSIC:适用于宽带传感器阵列的一致信号子空间估计

Elio D. Di Claudio, Raffaele Parisi, Giovanni Jacovitti

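AI summary: Proposes a space-time MUSIC (ST-MUSIC) approach to DOA estimation with wide-band sensor arrays that, by assuming sensor impulse responses of finite length, consistently recovers signal subspaces at arbitrary frequencies.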

详情
Comments
15 pages, 10 figures. Accepted in a revised form by the IEEE Trans. on Signal Processing on 12 February 2018. © IEEE 2018
AI中文摘要

宽带方向到达(DOA)估计是声纳、雷达、声学、生物医学和多媒体应用中的关键任务。许多先进的宽带DOA估计器通过近似最大似然、加权子空间拟合或聚焦技术对频率分组的阵列输出进行相干处理。本文表明,通过滤波器组方法获得的分组信号不遵循有限秩窄带阵列模型,因为频谱泄漏和分组内频率变化的阵列响应变化会生成依赖于源过程特定实现的虚拟源。因此,基于分组的现有DOA估计器即使在完美了解阵列响应的情况下也无法保证一致性。本文假设了一个更现实的阵列模型,其中传感器脉冲响应具有有限长度,该模型在时空 formulations 下仍具有有限秩。通过将MUSIC型(ST-MUSIC)估计器应用于宽带时空传感器互相关矩阵的主导特征向量,可以在温和条件下一致恢复任意频率的信号子空间。本文开发了一种基于最大似然的ST-MUSIC子空间估计方法以恢复一致性。通过信息理论准则估计每个频率上活跃的源数量。样本ST-MUSIC子空间可以馈送至任何子空间拟合DOA估计器,单频或多频。仿真表明,当模型不匹配超过信噪比底噪时,新方法在足够高的信噪比下明显优于分组方法。

英文摘要

Wide-band Direction of Arrival (DOA) estimation with sensor arrays is an essential task in sonar, radar, acoustics, biomedical and multimedia applications. Many state-of-the-art wide-band DOA estimators coherently process frequency-binned array outputs by approximate Maximum Likelihood, Weighted Subspace Fitting or focusing techniques. This paper shows that bin signals obtained by filter-bank approaches do not obey the finite-rank narrow-band array model, because spectral leakage and the change of the array response with frequency within the bin create "ghost sources" that depend on the particular realization of the source process. Therefore, existing DOA estimators based on binning cannot claim consistency even with perfect knowledge of the array response. In this work, a more realistic array model with sensor impulse responses of finite length is assumed, which still has finite rank under a space-time formulation. It is shown that signal subspaces at arbitrary frequencies can be consistently recovered under mild conditions by applying MUSIC-type (ST-MUSIC) estimators to the dominant eigenvectors of the wide-band space-time sensor cross-correlation matrix. A novel Maximum Likelihood based ST-MUSIC subspace estimate is developed in order to recover consistency. The number of sources active at each frequency is estimated by Information Theoretic Criteria. The sample ST-MUSIC subspaces can be fed to any subspace-fitting DOA estimator at single or multiple frequencies. Simulations confirm that the new technique clearly outperforms binning approaches at sufficiently high signal-to-noise ratio, when model mismatches exceed the noise floor.

1802.10275 2026-06-04 cs.LG cs.NA math.NA stat.ML

Solving for high dimensional committor functions using artificial neural networks

利用人工神经网络求解高维承诺函数

Yuehaw Khoo, Jianfeng Lu, Lexing Ying

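AI summary: Proposes a neural-network-based method for studying transitions between states governed by stochastic processes. Using a variational formulation and a neural-network parameterization, it computes the committor function, which satisfies a high-dimensional Fokker-Planck equation, and attains moderate accuracy on high-dimensional problems.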

详情
Comments
12 pages, 6 figures
AI中文摘要

在本注释中,我们提出了一种基于人工神经网络的方法,用于研究由随机过程支配的状态转换。特别是,我们旨在为过渡路径理论的核心对象——承诺函数,设计数值方案,该函数满足高维Fokker-Planck方程。通过处理此类偏微分方程的变分公式,并将承诺函数参数化为神经网络,可以利用随机算法优化神经网络权重来获得近似解。数值示例表明,对于高维问题可以实现中等精度。

英文摘要

In this note we propose a method based on artificial neural networks to study the transition between states governed by stochastic processes. In particular, we aim for numerical schemes for the committor function, the central object of transition path theory, which satisfies a high-dimensional Fokker-Planck equation. By working with the variational formulation of such a partial differential equation and parameterizing the committor function in terms of a neural network, approximations can be obtained by optimizing the neural network weights using stochastic algorithms. The numerical examples show that moderate accuracy can be achieved for high-dimensional problems.

1802.09794 2026-06-04 eess.SY cs.SY stat.ML

Identification of LTV Dynamical Models with Smooth or Discontinuous Time Evolution by means of Convex Optimization

通过凸优化识别具有平滑或不连续时间演化的LTV动态模型

Fredrik Bagge Carlson, Anders Robertsson, Rolf Johansson

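AI summary: Connects trend filtering with system identification to obtain convex-optimization methods for identifying linear time-varying (LTV) dynamical models with smooth or discontinuous changes in dynamics, and discusses priors on the model parameters for cases of insufficient excitation.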

详情
AI中文摘要

我们建立了趋势过滤与系统识别之间的联系,从而得到了基于凸优化的线性时变(LTV)动态模型识别新方法。我们展示了如何通过设计成本函数促进具有时间连续动态变化的模型,或导致模型系数在有限(稀疏)时间实例处出现不连续变化。我们进一步讨论了在激励不足的情况下对模型参数引入先验信息。识别问题被转化为凸优化问题,适用于如ARX模型和具有时间变化参数的状态空间模型。我们通过跳跃线性系统的仿真、非光滑摩擦和刚性接触的非线性机械臂以及基于模型的轨迹导向强化学习在平滑非线性系统中的应用来说明方法的使用。

英文摘要

We establish a connection between trend filtering and system identification which results in a family of new identification methods for linear, time-varying (LTV) dynamical models based on convex optimization. We demonstrate how the design of the cost function can either promote a model whose dynamics change continuously over time or cause discontinuous changes in model coefficients at a finite (sparse) set of time instances. We further discuss the introduction of priors on the model parameters for situations where excitation is insufficient for identification. The identification problems are cast as convex optimization problems and are applicable to, e.g., ARX models and state-space models with time-varying parameters. We illustrate usage of the methods in simulations of jump-linear systems and of a nonlinear robot arm with non-smooth friction and stiff contacts, as well as in model-based, trajectory-centric reinforcement learning on a smooth nonlinear system.

1710.00598 2026-06-04 eess.SY cs.SY stat.ML

Lasso Regularization Paths for NARMAX Models via Coordinate Descent

基于坐标下降法的NARMAX模型Lasso正则化路径

Antônio H. Ribeiro, Luis A. Aguirre

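AI summary: Proposes a cyclical coordinate descent algorithm for estimating $L_1$-regularized NARMAX models; including error regressors makes the regression problem nonlinear, yet the computational efficiency of the original algorithm is retained.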

详情
Comments
2018 American Control Conference
AI中文摘要

我们提出了一种新的算法,用于估计以基函数线性组合表示的NARMAX模型的L1正则化估计。由于L1范数惩罚,Lasso估计倾向于产生一些精确为零的系数,从而产生可解释的模型。本研究的创新点在于在Lasso估计中引入误差回归变量(从而产生非线性回归问题)。所提出的算法使用循环坐标下降法计算NARMAX模型的整个正则化路径参数。它通过同时更新回归矩阵和参数向量来处理误差项。在比较计算时间时,我们发现该修改并未降低原始算法的计算效率,并且可以在很少的廉价迭代中提供最重要的回归变量。通过两个示例,该方法被用于线性和多项式模型。

英文摘要

We propose a new algorithm for estimating NARMAX models with $L_1$ regularization for models represented as a linear combination of basis functions. Due to the $L_1$-norm penalty the Lasso estimation tends to produce some coefficients that are exactly zero and hence gives interpretable models. The novelty of the contribution is the inclusion of error regressors in the Lasso estimation (which yields a nonlinear regression problem). The proposed algorithm uses cyclical coordinate descent to compute the parameters of the NARMAX models for the entire regularization path. It deals with the error terms by updating the regressor matrix along with the parameter vector. In comparative timings we find that the modification does not reduce the computational efficiency of the original algorithm and can provide the most important regressors in very few inexpensive iterations. The method is illustrated for linear and polynomial models by means of two examples.
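For orientation, the sketch below shows plain cyclical coordinate descent for the Lasso on a fixed regressor matrix, minimizing $\tfrac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$. It is the standard linear-regression version only: the error-regressor update that this abstract describes for NARMAX models is not included, and the names `lasso_cd` and `soft_threshold` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |.|: shrink z toward zero by t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclical coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n          # assumes no all-zero column
    r = np.asarray(y, dtype=float).copy()      # residual y - X @ beta
    for _ in range(n_iter):
        for j in range(d):
            r += X[:, j] * beta[j]             # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * beta[j]             # add the updated coordinate back
    return beta
```

A regularization path is typically traced by calling such a routine over a decreasing sequence of `lam` values, warm-starting each solve from the previous solution.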

1612.03233 2026-06-04 math.ST cs.NA math.NA math.RT stat.CO stat.ME stat.TH

New Tests of Uniformity on the Compact Classical Groups as Diagnostics for Weak-star Mixing of Markov Chains

紧凑经典群上均匀性检验的新测试作为马尔可夫链弱星混合的诊断

Amir Sepehri

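AI summary: Introduces two new families of non-parametric goodness-of-fit tests on the compact classical groups, one for the eigenvalue distribution and one for the uniform distribution, both consistent against all fixed alternatives. Derives their asymptotic distributions, proves asymptotic admissibility, and studies local power and the global properties of the power function.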

详情
Comments
Accepted for publication in Bernoulli
AI中文摘要

本文介绍两种新的非参数拟合优度检验,用于经典紧群上的均匀分布和特征值分布。一种检验针对由均匀分布诱导的特征值分布,另一种检验针对整个群上的均匀分布,均在所有固定替代假设下一致。我们推导了在原假设和一般替代假设下的渐近分布,并证明这些检验是渐近可接受的。局部功效被推导出来,并探讨了针对局部替代假设的功率函数的全局性质。新检验在两个随机游走上得到验证,其中混合时间在文献中被研究。新检验及其他检验应用于由\cite{jones2011randomized}提出的马尔可夫链采样器,提供了强有力证据支持采样器快速混合的主张。

英文摘要

This paper introduces two new families of non-parametric tests of goodness-of-fit on the compact classical groups. One of them is a family of tests for the eigenvalue distribution induced by the uniform distribution, which is consistent against all fixed alternatives. The other is a family of tests for the uniform distribution on the entire group, which is again consistent against all fixed alternatives. We find the asymptotic distribution under the null and general alternatives. The tests are proved to be asymptotically admissible. Local power is derived and the global properties of the power function against local alternatives are explored. The new tests are validated on two random walks for which the mixing-time is studied in the literature. The new tests, and several others, are applied to the Markov chain sampler proposed by \cite{jones2011randomized}, providing strong evidence supporting the claim that the sampler mixes quickly.

1802.08286 2026-06-04 eess.SY cs.SY stat.AP

Reliability and Market Price of Energy in the Presence of Intermittent and Non-Dispatchable Renewable Energies

间歇性和不可调度可再生能源存在下的能源可靠性与市场价格

Ashkan Zeinalzadeh, Donya Ghavidel, Vijay Gupta

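AI summary: Studies how intermittent, non-dispatchable renewable energy affects the operating costs of conventional generators. A market clearing price model with renewable energy and congestion constraints quantifies these costs and shows how renewable penetration affects committed power, market price, and consumer cost.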

详情
Comments
11 pages
AI中文摘要

可再生能源的间歇性增加了传统发电机的运营成本。随着可再生能源供应比例的增加,这些成本也随之上升。本文通过开发一个考虑可再生能源和拥堵约束的市场清算价格模型,量化了这些成本。我们考虑了一个电力市场,其中发电机向独立系统运营商(ISO)提出每单位能量的报价。ISO通过求解优化问题来调度各发电机的能源,以最小化代表消费者购买能源的总成本。为确保发电机能够在期望的置信水平内满足负载,我们利用条件价值-at-风险(CVAR)度量在电力市场中引入负载方差的概念,并推导出承诺功率和能源市场清算价格作为CVAR函数。研究表明,可再生能源渗透率的增加可能由于可再生能源发电的不确定性而增加承诺功率、能源市场清算价格和消费者能源成本。我们还获得了拥堵约束对承诺功率影响的上限。我们通过描述性模拟来说明可再生能源渗透率和可靠性水平对非可再生能源发电机承诺功率的影响,以及调度与承诺功率之间的差异、能源市场价格和可再生能源与非可再生能源发电机利润的影响。

英文摘要

The intermittent nature of renewable energies increases the operation costs of conventional generators. As the share of energy supplied by renewable sources increases, these costs also increase. In this paper, we quantify these costs by developing a market clearing price of energy in the presence of renewable energy and congestion constraints. We consider an electricity market where generators propose their asking price per unit of energy to an independent system operator (ISO). The ISO solves an optimization problem to dispatch energy from each generator so as to minimize the total cost of energy purchased on behalf of the consumers. To ensure that the generators are able to meet the load within a desired confidence level, we incorporate the notion of load variance using the Conditional Value-at-Risk (CVAR) measure in an electricity market, and we derive the amount of committed power and the market clearing price of energy as functions of CVAR. It is shown that a higher penetration of renewable energies may increase the committed power, the market clearing price of energy and the consumer cost of energy due to renewable generation uncertainties. We also obtain an upper bound on the amount by which congestion constraints can affect the committed power. We present descriptive simulations to illustrate the impact of renewable energy penetration and reliability levels on the power committed by the non-renewable generators, the difference between the dispatched and committed power, the market price of energy, and the profit of renewable and non-renewable generators.
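The risk measure named in this abstract can be illustrated with an empirical estimate: the Conditional Value-at-Risk at level $\alpha$ is the mean loss in the worst $(1-\alpha)$ tail. The sketch below covers only this generic sample-based definition; the function name and the convention of averaging samples at or above the empirical quantile are assumptions, and the paper's market clearing model is not reproduced.

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Mean of the losses at or above the empirical alpha-quantile (the VaR)."""
    losses = np.asarray(losses, dtype=float)
    value_at_risk = np.quantile(losses, alpha)   # default linear interpolation
    return losses[losses >= value_at_risk].mean()
```

In a reliability setting such as the one described, `losses` might be sampled shortfalls between load and available generation, with a larger `alpha` corresponding to a stricter confidence level.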

1802.08242 2026-06-04 stat.ME cs.NA cs.SY eess.SY math.NA stat.ML

Structured low-rank matrix completion for forecasting in time series analysis

结构低秩矩阵补全用于时间序列分析中的预测

Jonathan Gillard, Konstantin Usevich

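AI summary: Studies low-rank matrix completion applied to time series forecasting. Using Hankel matrix completion and a nuclear norm relaxation, theory and experiments show the importance of a proper weighting scheme for the known observations.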

详情
Comments
25 pages, 12 figures
AI中文摘要

本文考虑了具有特定应用到时间序列分析预测的低秩矩阵补全问题。简要而言,低秩矩阵补全问题是在秩约束下填补矩阵缺失值的问题。我们考虑了Hankel矩阵的补全问题,并基于核范数的凸松弛方法。基于新的理论结果和多个数值和实际例子,我们探讨了所提出方法在何时有效。我们的结果强调了选择适当加权方案的重要性。

英文摘要

In this paper we consider the low-rank matrix completion problem with specific application to forecasting in time series analysis. Briefly, the low-rank matrix completion problem is the problem of imputing missing values of a matrix under a rank constraint. We consider a matrix completion problem for Hankel matrices and a convex relaxation based on the nuclear norm. Based on new theoretical results and a number of numerical and real examples, we investigate the cases when the proposed approach can work. Our results highlight the importance of choosing a proper weighting scheme for the known observations.
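To make the link between forecasting and low-rank Hankel matrices concrete, the sketch below builds the Hankel matrix of a series and evaluates its nuclear norm, the convex surrogate for rank mentioned in this abstract. The helper names are hypothetical, and the completion step itself (choosing the unknown future values to minimize a weighted nuclear-norm objective) is not implemented.

```python
import numpy as np

def hankel_matrix(x, n_rows):
    # H[i, j] = x[i + j]; every anti-diagonal holds a single series value.
    x = np.asarray(x, dtype=float)
    n_cols = len(x) - n_rows + 1
    return np.array([x[i:i + n_cols] for i in range(n_rows)])

def nuclear_norm(H):
    # Sum of singular values, the convex relaxation of rank.
    return np.linalg.norm(H, ord="nuc")
```

A geometric series such as 1, 2, 4, 8, ... gives a rank-one Hankel matrix and a linear trend gives rank two, which is why forecasting can be posed as filling in the trailing entries while keeping the rank (or its nuclear-norm surrogate) small.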

1706.06491 2026-06-04 eess.SY cs.SY stat.ML

Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

基于概率模型预测控制的数据高效强化学习

Sanket Kamthe, Marc Peter Deisenroth

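AI summary: Proposes a model-based reinforcement learning framework built on probabilistic model predictive control, learning the transition model with Gaussian processes in order to handle constraints and achieve data-efficient control with first-order optimality guarantees.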

详情
Comments
Accepted at AISTATS 2018
英文摘要

Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but is also a principled way for RL in constrained environments.

1802.07346 2026-06-04 cs.RO cs.SY eess.SP eess.SY stat.AP

Cooperative Robot Localization Using Event-triggered Estimation

基于事件触发估计的协作机器人定位

Michael Ouimet, David Iglesias, Nisar Ahmed, Sonia Martinez

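AI summary: Proposes a cooperative localization algorithm with low communication overhead in which robots send measurements only when the expected innovation for state estimation is high, and a covariance intersection mechanism synchronizes estimates of the network state; simulations with linear and nonlinear models show nearly optimal estimation performance.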

详情
Comments
Revised submission in review with AIAA Journal of Aerospace Information Systems (JAIS), submitted February 17, 2018
AI中文摘要

本文描述了一种新颖的低通信开销协作定位算法,适用于移动无人机器人团队。利用基于事件的估计范式,机器人仅在状态估计创新度较高时向邻居发送测量。由于代理已知触发测量的条件,缺少的测量信息也被融合到状态估计中。机器人使用协方差交叠(CI)机制偶尔同步其对完整网络状态的局部估计。此外,启发式平衡动态确保在大直径网络中,局部误差协方差始终保持在预期范围内。线性和非线性动态/测量模型的仿真表明,事件触发方法在广泛的操作条件下实现了接近最优的状态估计性能,即使仅使用传统全数据共享所需通信开销的小部分。还检验了所提方法对丢包通信的鲁棒性以及网络拓扑与基于CI的同步需求之间的关系。

英文摘要

This paper describes a novel communication-sparse cooperative localization algorithm for a team of mobile unmanned robotic vehicles. Exploiting an event-based estimation paradigm, robots only send measurements to neighbors when the expected innovation for state estimation is high. Since agents know the event-triggering condition for measurements to be sent, the lack of a measurement is also informative and is fused into state estimates. The robots use a Covariance Intersection (CI) mechanism to occasionally synchronize their local estimates of the full network state. In addition, heuristic balancing dynamics on the robots' CI-triggering thresholds ensure that, in large-diameter networks, the local error covariances remain below desired bounds across the network. Simulations on both linear and nonlinear dynamics/measurement models show that the event-triggering approach achieves nearly optimal state estimation performance in a wide range of operating conditions, even when using only a fraction of the communication cost required by conventional full data sharing. The robustness of the proposed approach to lossy communications, as well as the relationship between network topology and CI-based synchronization requirements, are also examined.

1802.06820 2026-06-04 cs.SI cond-mat.stat-mech cs.NA math.NA physics.soc-ph stat.ML

Tools for higher-order network analysis

高阶网络分析工具

Austin R. Benson

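AI summary: Develops three tools that use higher-order connectivity patterns to analyze network data: motif-based clustering of nodes into modules, a generalization of the clustering coefficient to higher-order closure patterns, and a definition of motifs for temporal networks with fast counting algorithms.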

详情
Comments
Ph.D. Thesis, Stanford University, 2017
AI中文摘要

网络是复杂系统的基本模型,通常通过个体节点和边的低阶连接模式进行分析。然而,由小子图捕获的高阶连接模式,即网络动机,描述了控制许多复杂系统行为的基本结构。本文开发了三种网络分析工具:(1) 基于节点在网络动机中的联合参与进行模块聚类的框架;(2) 用于研究高阶闭合模式的聚类系数测量扩展;(3) 用于时间网络的网络动机定义及快速计数算法。使用这些工具,我们分析了生物学、生态学、经济学、神经科学、在线社交网络、科学合作、电信、交通以及万维网的数据。

英文摘要

Networks are a fundamental model of complex systems throughout the sciences, and network datasets are typically analyzed through lower-order connectivity patterns described at the level of individual nodes and edges. However, higher-order connectivity patterns captured by small subgraphs, also called network motifs, describe the fundamental structures that control and mediate the behavior of many complex systems. We develop three tools for network analysis that use higher-order connectivity patterns to gain new insights into network datasets: (1) a framework to cluster nodes into modules based on joint participation in network motifs; (2) a generalization of the clustering coefficient measurement to investigate higher-order closure patterns; and (3) a definition of network motifs for temporal networks and fast algorithms for counting them. Using these tools, we analyze data from biology, ecology, economics, neuroscience, online social networks, scientific collaborations, telecommunications, transportation, and the World Wide Web.

1705.08435 2026-06-04 cs.LG cs.CR cs.DC cs.SY eess.SY stat.ML

Personalized and Private Peer-to-Peer Machine Learning

个性化与隐私保护的点对点机器学习

Aurélien Bellet, Rachid Guerraoui, Mahsa Taziki, Marc Tommasi

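AI summary: Proposes an efficient, fully decentralized and asynchronous algorithm for personalized machine learning with a provable convergence rate under strong privacy requirements. Differential privacy protects personal data, and the trade-off between privacy and utility is analyzed formally. Experiments show large gains over previous work in the non-private case and significant improvement over models learned in isolation under privacy constraints.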

详情
Comments
20 pages, to appear in the Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018)
AI中文摘要

随着连接个人设备的兴起和隐私问题的出现,需要能够利用大量代理数据学习个性化模型的机器学习算法,同时满足严格的隐私要求。本文介绍了一种高效的算法,以完全去中心化(点对点)和异步方式解决上述问题,并具有可证明的收敛速度。我们展示了如何使算法具有差分隐私性,以保护个人数据集信息的泄露,并正式分析效用与隐私之间的权衡。我们的实验表明,在非隐私情况下,我们的方法显著优于先前工作,在隐私约束下,我们可以在孤立学习的模型上取得显著改进。

英文摘要

The rise of connected personal devices, together with privacy concerns, calls for machine learning algorithms capable of leveraging the data of a large number of agents to learn personalized models under strong privacy requirements. In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with a provable convergence rate. We show how to make the algorithm differentially private to protect against the disclosure of information about the personal datasets, and formally analyze the trade-off between utility and privacy. Our experiments show that our approach dramatically outperforms previous work in the non-private case, and that under privacy constraints, we can significantly improve over models learned in isolation.

1802.05828 2026-06-04 eess.SY cs.SY stat.AP

Improving Power Grid Resilience Through Predictive Outage Estimation

通过预测性停电估计提高电网韧性

Rozhin Eskandarpour, Amin Khodaei, Ali Arab

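AI summary: Proposes a machine learning model based on a multi-dimensional support vector machine that predicts the states of grid components during extreme events in order to improve power grid resilience. The model combines infrastructure quality with the time each component can withstand the event and is validated by cross-validation and numerical simulation.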

详情
Journal ref
Power Symposium (NAPS), 2017 North American
AI中文摘要

本文提出一种机器学习模型,旨在通过预测极端事件下电网组件状态来提高电网韧性。该模型基于多维支持向量机(SVM),考虑与韧性指数相关的因素,包括基础设施质量水平和组件能承受事件的时间长度,以及预测的事件路径和强度。模型输出将组件状态分为停电和运行两类,可用于预测性调度系统资源以最大化韧性。模型通过k折交叉验证和模型基准测试进行验证,其性能通过数值模拟和基于定义良好的性能指标进行测试。

英文摘要

In this paper, in an attempt to improve power grid resilience, a machine learning model is proposed to predictively estimate the component states in response to extreme events. The proposed model is based on a multi-dimensional Support Vector Machine (SVM) considering the associated resilience index, i.e., the infrastructure quality level and the time duration that each component can withstand the event, as well as the predicted path and intensity of the upcoming extreme event. The outcome of the proposed model is a classification of component states into two categories, outage and operational, which can be further used to schedule system resources in a predictive manner with the objective of maximizing resilience. The proposed model is validated using k-fold cross-validation and model benchmarking techniques. The performance of the model is tested through numerical simulations and based on a well-defined and commonly used performance measure.

1708.07827 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study

非凸机器学习中的二阶优化:一项实证研究

Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney

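AI summary: Empirically evaluates Newton-type methods on non-convex machine learning problems, showing generalization comparable to or better than SGD with momentum, strong robustness to hyper-parameter settings, and the ability to escape flat regions and saddle points.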

详情
Comments
21 pages, 11 figures. Restructure the paper and add experiments
AI中文摘要

尽管随机梯度下降(SGD)等一阶优化方法在机器学习(ML)中广泛应用,但它们存在收敛速度慢、超参数设置敏感、易陷入高训练误差和难以逃离平坦区域及鞍点等缺陷。在高度非凸设置(如神经网络中)尤为明显。受此启发,近期有研究关注二阶方法,旨在通过捕捉曲率信息缓解这些不足。本文报告了针对非凸ML问题的一类牛顿型方法——信任区域(TR)和自适应三次正则化(ARC)算法的子采样变体的详细实证评估。在此过程中,我们证明这些方法不仅在计算上与手工调优的SGD加动量方法具有竞争力,泛化性能可比或更优,而且对超参数设置具有高度鲁棒性。此外,与SGD加动量相比,这些牛顿型方法利用曲率信息的方式使其能够无缝逃离平坦区域和鞍点。

英文摘要

While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to the settings of hyper-parameters such as the learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems. In doing so, we demonstrate that these methods are not only computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also highly robust to hyper-parameter settings. Further, in contrast to SGD with momentum, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.

1605.09232 2026-06-04 math.NA cs.LG cs.NA cs.NE math.OC stat.ML

Tradeoffs between Convergence Speed and Reconstruction Accuracy in Inverse Problems

反问题中收敛速度与重建精度之间的权衡

Raja Giryes, Yonina C. Eldar, Alex M. Bronstein, Guillermo Sapiro

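AI summary: Investigates whether iterative algorithms for inverse problems can be modified to converge faster while keeping an acceptable reconstruction error; building on recovery techniques for low-dimensional sets, shows that a coarse estimate of the set can speed up convergence at the cost of additional reconstruction error.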

详情
Comments
To appear in IEEE Transactions on Signal Processing
AI中文摘要

使用迭代算法求解逆问题在大数据中很流行。由于时间限制,迭代次数通常有限,可能影响可实现的精度。给定可接受的误差范围,一个重要问题是是否可以通过修改原始迭代方法,获得更快收敛到达到允许误差的极小值点,而不显著增加每次迭代的计算成本。基于最近为某些低维集信号恢复开发的恢复技术,我们表明使用该集的粗略估计可能以额外的重建误差为代价加快收敛。我们的理论与稀疏恢复、压缩感知和深度学习的最新进展相关。特别是,它可能为神经网络通过层表示迭代来近似l1最小化解的成功提供了可能的解释,如在学习迭代收缩阈值算法(LISTA)中实践的那样。

英文摘要

Solving inverse problems with iterative algorithms is popular, especially for large data. Due to time constraints, the number of possible iterations is usually limited, potentially affecting the achievable accuracy. Given an error one is willing to tolerate, an important question is whether it is possible to modify the original iterations to obtain faster convergence to a minimizer achieving the allowed error without considerably increasing the computational cost of each iteration. Relying on recent recovery techniques developed for settings in which the desired signal belongs to some low-dimensional set, we show that using a coarse estimate of this set may lead to faster convergence at the cost of an additional reconstruction error related to the accuracy of the set approximation. Our theory ties to recent advances in sparse recovery, compressed sensing, and deep learning. In particular, it may provide a possible explanation for the successful approximation of the $\ell_1$-minimization solution by neural networks with layers representing iterations, as practiced in the learned iterative shrinkage-thresholding algorithm (LISTA).

1709.05885 2026-06-04 math.NA cs.NA stat.CO stat.ML

Variational Gaussian Approximation for Poisson Data

变分高斯近似用于泊松数据

Simon Arridge, Kazufumi Ito, Bangti Jin, Chen Zhang

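AI summary: Studies the variational Gaussian approximation for the Poisson model, derives the lower bound obtained by minimizing the Kullback-Leibler divergence, develops an efficient algorithm for the optimization problem, and discusses low-rank structure and sparsity for reducing the computational complexity.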

详情
Comments
26 pages
AI中文摘要

泊松模型常用于描述计数数据,但在贝叶斯背景下会导致后验概率分布无法解析求解。本文分析了泊松模型与高斯先验下的后验分布的变分高斯近似,通过寻找最优高斯分布以最小化KL散度或最大化模型证据的下界。推导了下界表达式,并证明了最优高斯近似的存在性和唯一性。下界函数可视为一种经典的Tikhonov正则化变种,同时惩罚协方差。进一步开发了高效的交替方向最大化算法,并分析其收敛性。讨论了通过前向算子的低秩结构和协方差的稀疏性降低计算复杂度的策略。此外,作为下界应用,讨论了层次贝叶斯建模以选择先验分布的超参数,并提出单调收敛算法确定超参数。通过大量数值实验展示了高斯近似和算法的效果。

英文摘要

The Poisson model is frequently employed to describe count data, but in a Bayesian context it leads to an analytically intractable posterior probability distribution. In this work, we analyze a variational Gaussian approximation to the posterior distribution arising from the Poisson model with a Gaussian prior. This is achieved by seeking an optimal Gaussian distribution minimizing the Kullback-Leibler divergence from the posterior distribution to the approximation, or equivalently maximizing the lower bound for the model evidence. We derive an explicit expression for the lower bound, and show the existence and uniqueness of the optimal Gaussian approximation. The lower bound functional can be viewed as a variant of classical Tikhonov regularization that penalizes also the covariance. Then we develop an efficient alternating direction maximization algorithm for solving the optimization problem, and analyze its convergence. We discuss strategies for reducing the computational complexity via low rank structure of the forward operator and the sparsity of the covariance. Further, as an application of the lower bound, we discuss hierarchical Bayesian modeling for selecting the hyperparameter in the prior distribution, and propose a monotonically convergent algorithm for determining the hyperparameter. We present extensive numerical experiments to illustrate the Gaussian approximation and the algorithms.

1802.04475 2026-06-04 cs.SI cs.NA math.NA math.OC stat.ML

Graph-Based Ascent Algorithms for Function Maximization

基于图的上升算法用于函数最大化

Muni Sreenivas Pydi, Varun Jog, Po-Ling Loh

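AI summary: Studies the problem of finding the maximum of a function on the nodes of a connected graph, proposes two local iterative graph-based algorithms with different transition kernels, derives their convergence rates, and compares their convergence in simulations across different levels of smoothness of the graph function.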

详情
AI中文摘要

我们研究了在连通图的节点上寻找函数最大值的问题。目标是识别一个函数取得最大值的节点。我们专注于局部迭代算法,这些算法沿着图的路径遍历节点,并且下一个迭代点从当前迭代点的邻居中以概率分布选择,该分布由当前迭代点及其邻居的函数值决定。我们研究了两种算法,对应于具有不同转移核的Metropolis-Hastings随机游走:(i) 第一种算法是受参数γ控制的指数加权随机游走。(ii) 第二种算法与图拉普拉斯矩阵和一个平滑参数k相关。我们推导了这两种算法在总变差距离和命中时间方面的收敛速率。我们还提供了模拟,展示了我们的算法与无偏随机游走相比的相对收敛速率,作为图函数平滑度的函数。我们的算法可以归类为一种新的“下降型”方法,用于在图节点上进行函数最大化。

英文摘要

We study the problem of finding the maximum of a function defined on the nodes of a connected graph. The goal is to identify a node where the function attains its maximum. We focus on local iterative algorithms, which traverse the nodes of the graph along a path and choose the next iterate from the neighbors of the current iterate according to a probability distribution determined by the function values at the current iterate and its neighbors. We study two algorithms corresponding to a Metropolis-Hastings random walk with different transition kernels: (i) The first algorithm is an exponentially weighted random walk governed by a parameter $γ$. (ii) The second algorithm is defined with respect to the graph Laplacian and a smoothness parameter $k$. We derive convergence rates for the two algorithms in terms of total variation distance and hitting times. We also provide simulations showing the relative convergence rates of our algorithms in comparison to an unbiased random walk, as a function of the smoothness of the graph function. Our algorithms may be categorized as a new class of "descent-based" methods for function maximization on the nodes of a graph.
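A minimal sketch of an exponentially weighted Metropolis-Hastings walk of the kind this abstract mentions is given below. The specific choices are assumptions for illustration and may differ from the paper's kernels: the target distribution is taken proportional to $\exp(γ f(v))$, the proposal is uniform over neighbors, and the function and argument names are hypothetical.

```python
import math
import random

def exponential_mh_walk(neighbors, f, start, gamma, n_steps, seed=0):
    """Random walk on a graph biased toward large f; returns (final node, best node seen)."""
    rng = random.Random(seed)
    v = start
    best = v
    for _ in range(n_steps):
        u = rng.choice(neighbors[v])          # propose a uniformly random neighbor
        # Log Metropolis-Hastings ratio for a target proportional to exp(gamma * f),
        # including the degree correction for the non-symmetric proposal.
        log_ratio = gamma * (f[u] - f[v]) + math.log(len(neighbors[v]) / len(neighbors[u]))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            v = u
        if f[v] > f[best]:
            best = v
    return v, best
```

Here `neighbors` maps each node to a list of adjacent nodes and `f` is indexable by node. Larger `gamma` concentrates the walk near maximizers of `f`, while `gamma = 0` leaves only the degree correction, so the walk targets the uniform distribution over nodes.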

1702.08435 2026-06-04 eess.SY cs.SY math.OC stat.ML

Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models

通过马尔可夫模型的复合假设检验进行统计异常检测

Jing Zhang, Ioannis Ch. Paschalidis

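AI summary: This paper proposes a new threshold estimator for the composite hypothesis Hoeffding test used in anomaly detection. Numerical experiments indicate that it reduces false alarms while maintaining detection probability, and the test is applied to communication networks and transportation networks.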

详情
Comments
Preprint submitted to the IEEE Transactions on Signal Processing
AI中文摘要

在马尔可夫假设下,我们利用检验统计量中经验测度的中心极限定理(CLT),以建立检验统计量的弱收敛结果,并由此推导出新的阈值估计器。我们通过广泛的数值实验展示了该估计器优于现有估计器的优势,发现其在控制误报的同时保持了满意的检测概率。随后,我们将该Hoeffding检验与我们的阈值估计器应用于两个不同的应用领域:通信网络和交通网络。前者旨在增强网络安全,后者旨在构建智能交通系统。

英文摘要

Under Markovian assumptions, we leverage a Central Limit Theorem (CLT) for the empirical measure in the test statistic of the composite hypothesis Hoeffding test so as to establish weak convergence results for the test statistic, and, thereby, derive a new estimator for the threshold needed by the test. We first show the advantages of our estimator over an existing estimator by conducting extensive numerical experiments. We find that our estimator better controls false alarms while maintaining satisfactory detection probabilities. We then apply the Hoeffding test with our threshold estimator to detecting anomalies in two distinct application domains: one in communication networks and the other in transportation networks. The former application seeks to enhance cyber security and the latter aims at building smarter transportation systems in cities.

1802.03981 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Spectral Filtering for General Linear Dynamical Systems

谱滤波用于通用线性动态系统

Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang

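AI summary: This paper gives a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification and without assumptions on the spectral radius of the transition matrix. The algorithm extends spectral filtering, using a novel convex relaxation to identify phases efficiently.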

详情
AI中文摘要

我们提出了一种多项式时间算法,用于学习隐状态线性动态系统,而无需系统识别,也不假设系统转移矩阵的谱半径。该算法扩展了最近引入的谱滤波技术,该技术先前仅应用于具有对称转移矩阵的系统,通过新的凸松弛方法允许高效识别相位。

英文摘要

We give a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification, and without assumptions on the spectral radius of the system's transition matrix. The algorithm extends the recently introduced technique of spectral filtering, previously applied only to systems with a symmetric transition matrix, using a novel convex relaxation to allow for the efficient identification of phases.

1709.06606 2026-06-04 math.ST cs.NA math.NA math.PR stat.TH

Optimal projection of observations in a Bayesian setting

贝叶斯设定下的观测最优投影

Loïc Giraldi, Olivier P. Le Maître, Ibrahim Hoteit, Omar M. Knio

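AI summary: This paper proposes information-theoretic optimal projections of observations for Bayesian inference in Gaussian linear models with overabundant data. Using criteria that minimize the Kullback-Leibler divergence or maximize the mutual information, it addresses the determination of optimal subspaces and the use of Riemannian optimization algorithms.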

详情
AI中文摘要

本文提出了一种用于高斯线性模型贝叶斯推断的最优降维方法,适用于存在过多数据的情况。基于信息论,提出了三种不同的观测最优投影:一种最小化原始模型与投影模型后验分布之间的KL散度,另一种最小化预期KL散度,第三种最大化参数与投影观测之间的互信息。前两个优化问题被建模为最优子空间的确定,因此使用Riemann优化算法在Grassmann流形上求解。对于互信息最大化问题,证明存在一个最优子空间,可最小化降维模型的后验分布熵;该子空间的基底可通过广义特征值问题求得;该特定解有先验误差估计;且保持输入与输出模型之间互信息的子空间维度小于待推断参数数量。通过线性和非线性模型的数值应用,评估了所提方法的效率,并展示了其相较于基于观测主成分分析的常规方法的优势。

英文摘要

Optimal dimensionality reduction methods are proposed for the Bayesian inference of a Gaussian linear model with additive noise in presence of overabundant data. Three different optimal projections of the observations are proposed based on information theory: the projection that minimizes the Kullback-Leibler divergence between the posterior distributions of the original and the projected models, the one that minimizes the expected Kullback-Leibler divergence between the same distributions, and the one that maximizes the mutual information between the parameter of interest and the projected observations. The first two optimization problems are formulated as the determination of an optimal subspace and therefore the solution is computed using Riemannian optimization algorithms on the Grassmann manifold. Regarding the maximization of the mutual information, it is shown that there exists an optimal subspace that minimizes the entropy of the posterior distribution of the reduced model; a basis of the subspace can be computed as the solution to a generalized eigenvalue problem; an a priori error estimate on the mutual information is available for this particular solution; and that the dimensionality of the subspace to exactly conserve the mutual information between the input and the output of the models is less than the number of parameters to be inferred. Numerical applications to linear and nonlinear models are used to assess the efficiency of the proposed approaches, and to highlight their advantages compared to standard approaches based on the principal component analysis of the observations.

1605.00031 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

Deep Convolutional Neural Networks on Cartoon Functions

深度卷积神经网络在卡通函数上的应用

Philipp Grohs, Thomas Wiatowski, Helmut Bölcskei

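AI summary: This paper studies the deformation stability of deep convolutional neural network-based feature extractors on cartoon functions, establishing a result that accounts for the structure of signals with sharp and curved discontinuities.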

详情
Journal ref
Proc. of IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, pp. 1163-1167, July 2016
Comments
This is a slightly updated version of the paper published in the ISIT proceedings. Specifically, we corrected errors in the arguments on the volume of tubes. Note that this correction does not affect the main statements of the paper
AI中文摘要

Wiatowski和Bölcskei, 2015证明了深度卷积神经网络基于的特征提取器的变形稳定性和垂直平移不变性由网络结构本身保证,而非特定卷积核和非线性。虽然平移不变性结果适用于平方可积函数,变形稳定性界仅适用于带限函数。然而,许多实际相关信号(如自然图像)表现出尖锐和弯曲的不连续性,因此不是带限的。本文的主要贡献是针对Donoho, 2001引入的卡通函数类建立变形稳定性界。

英文摘要

Wiatowski and Bölcskei, 2015, proved that deformation stability and vertical translation invariance of deep convolutional neural network-based feature extractors are guaranteed by the network structure per se rather than the specific convolution kernels and non-linearities. While the translation invariance result applies to square-integrable functions, the deformation stability bound holds for band-limited functions only. Many signals of practical relevance (such as natural images) exhibit, however, sharp and curved discontinuities and are, hence, not band-limited. The main contribution of this paper is a deformation stability result that takes these structural properties into account. Specifically, we establish deformation stability bounds for the class of cartoon functions introduced by Donoho, 2001.

1603.08057 2026-06-04 stat.ME cs.NA math.NA

Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations

通过骨架分解因子化实现快速空间高斯过程最大似然估计

Victor Minden, Anil Damle, Kenneth L. Ho, Lexing Ying

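AI summary: This paper presents a method that combines skeletonization factorizations with a matrix peeling algorithm to compute maximum likelihood estimates for spatial Gaussian processes efficiently, evaluating the log-likelihood and its gradient in $\tilde O(n^{3/2})$ time.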

详情
Comments
36 pages, 8 figures
AI中文摘要

在空间中给定观测数据的参数拟合的最大似然估计是一个计算需求高的任务,限制了此类方法在中等规模数据集中的应用。我们提出了一种适用于二维无结构观测的框架,能够在一定假设下以~O(n^{3/2})的时间复杂度计算对数似然及其梯度(即得分方程)。我们的方法基于Martinsson与Rokhlin描述的骨架化过程,形式为Ho和Ying的递归骨架分解因子化。结合Lin等人矩阵剥离算法的改编版本,用于构建黑箱运算的H矩阵表示,我们获得了一种可用于任何一阶优化程序的框架,以快速且准确地计算最大似然估计。

英文摘要

Maximum likelihood estimation for parameter-fitting given observations from a Gaussian process in space is a computationally-demanding task that restricts the use of such methods to moderately-sized datasets. We present a framework for unstructured observations in two spatial dimensions that allows for evaluation of the log-likelihood and its gradient (i.e., the score equations) in $\tilde O(n^{3/2})$ time under certain assumptions, where $n$ is the number of observations. Our method relies on the skeletonization procedure described by Martinsson & Rokhlin in the form of the recursive skeletonization factorization of Ho & Ying. Combining this with an adaptation of the matrix peeling algorithm of Lin et al. for constructing $\mathcal{H}$-matrix representations of black-box operators, we obtain a framework that can be used in the context of any first-order optimization routine to quickly and accurately compute maximum-likelihood estimates.

1710.06635 2026-06-04 math.OC cs.NA math.NA stat.ML

A Sinkhorn-Newton method for entropic optimal transport

带熵正则化的最优运输的Sinkhorn-Newton方法

Christoph Brauer, Christian Clason, Dirk Lorenz, Benedikt Wirth

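AI summary: This paper proposes solving entropically regularized optimal transport via a logarithmic Newton iteration, shows a quadratic convergence rate, and validates numerically that the method compares favorably with the Sinkhorn-Knopp algorithm for small regularization strength. It also examines the method's robustness with respect to the mesh size of the discretization.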

详情
AI中文摘要

本文考虑了离散最优运输的熵正则化,并提出通过对数牛顿迭代求解其最优性条件。我们证明了方法的二次收敛性,并验证在小正则化强度下,该方法优于常用的Sinkhorn-Knopp算法。同时,我们进一步数值研究了该方法对离散化网格大小等参数的鲁棒性。

英文摘要

We consider the entropic regularization of discretized optimal transport and propose to solve its optimality conditions via a logarithmic Newton iteration. We show a quadratic convergence rate and validate numerically that the method compares favorably with the more commonly used Sinkhorn--Knopp algorithm for small regularization strength. We further investigate numerically the robustness of the proposed method with respect to parameters such as the mesh size of the discretization.
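
For context, the baseline that the abstract compares against can be sketched in a few lines. The code below is a plain Sinkhorn-Knopp iteration for the entropically regularized discrete problem, not the logarithmic Newton method of the paper; the function name `sinkhorn`, the fixed iteration count, and the tiny test problem are assumptions chosen for illustration.

```python
import math


def sinkhorn(cost, mu, nu, eps, iters=500):
    """Sinkhorn-Knopp scaling for entropic optimal transport (baseline sketch).

    cost : n x m list of lists with transport costs
    mu   : source marginal (length n), nu : target marginal (length m)
    eps  : regularization strength
    Returns the transport plan P with P[i][j] = u[i] * K[i][j] * v[j].
    """
    n, m = len(mu), len(nu)
    # Gibbs kernel K = exp(-cost / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.3, 0.7], [0.6, 0.4], eps=0.5)
print([sum(row) for row in plan])  # row sums should be close to mu
```

The kernel entries exp(-cost/eps) approach zero as eps shrinks, and the fixed-point iteration then needs many more sweeps, which is the small-regularization regime where the abstract reports the Newton-type method comparing favorably.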

1702.08061 2026-06-04 stat.ME cs.SY eess.SY stat.CO

The Ensemble Kalman Filter: A Signal Processing Perspective

集合卡尔曼滤波:信号处理视角

Michael Roth, Gustaf Hendeby, Carsten Fritsche, Fredrik Gustafsson

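AI summary: This paper reviews the ensemble Kalman filter from a signal processing perspective, covering its use in high-dimensional, possibly nonlinear and non-Gaussian state estimation, its algorithmic challenges and extensions, and its relation to other filters, and it offers practical insights and research directions.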

详情
Journal ref
EURASIP J. Adv. Signal Process. (2017) 2017: 56
AI中文摘要

集合卡尔曼滤波(EnKF)是一种基于蒙特卡洛方法的卡尔曼滤波(KF)实现,用于解决极高维、可能非线性和非高斯的状态估计问题。其能够处理百万级状态维度的能力使其在不同地球科学领域流行。尽管信号处理领域同样需要可扩展算法来处理日益增长的传感器数据,但EnKF在该领域几乎未被讨论。本文旨在为信号处理研究者提供EnKF的全部知识,以开始使用该算法。算法在KF框架下推导,不使用常见的地球科学术语。提供了EnKF的算法挑战和所需扩展,以及与sigma点KF和粒子滤波的关系。总结了相关的EnKF文献,并通过详尽的调查和独特的仿真示例,包括流行的基准问题,补充理论与实用见解。信号处理视角突显了EnKF的新研究方向,并促进了该方法以及高维非线性和非高斯滤波的有益交流。

英文摘要

The ensemble Kalman filter (EnKF) is a Monte Carlo based implementation of the Kalman filter (KF) for extremely high-dimensional, possibly nonlinear and non-Gaussian state estimation problems. Its ability to handle state dimensions in the order of millions has made the EnKF a popular algorithm in different geoscientific disciplines. Despite a similarly vital need for scalable algorithms in signal processing, e.g., to make sense of the ever increasing amount of sensor data, the EnKF is hardly discussed in our field. This self-contained review paper is aimed at signal processing researchers and provides all the knowledge to get started with the EnKF. The algorithm is derived in a KF framework, without the often encountered geoscientific terminology. Algorithmic challenges and required extensions of the EnKF are provided, as well as relations to sigma-point KF and particle filters. The relevant EnKF literature is summarized in an extensive survey and unique simulation examples, including popular benchmark problems, complement the theory with practical insights. The signal processing perspective highlights new directions of research and facilitates the exchange of potentially beneficial ideas, both for the EnKF and high-dimensional nonlinear and non-Gaussian filtering in general.
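
To make the Monte Carlo idea concrete, here is a minimal sketch of one EnKF measurement update for a scalar state that is observed directly. It uses the perturbed-observation variant, in which the Kalman gain is computed from the ensemble variance rather than from a propagated covariance. The scalar setting, the identity observation model, and the names `enkf_update` and `obs_var` are simplifying assumptions for this example, not the paper's notation.

```python
import random
import statistics


def enkf_update(ensemble, y, obs_var, rng):
    """One stochastic EnKF measurement update for a directly observed scalar state.

    ensemble : list of prior state samples
    y        : the measurement
    obs_var  : measurement noise variance
    rng      : random.Random instance used to perturb the observation
    """
    # Sample variance of the ensemble replaces the Kalman filter covariance.
    prior_var = statistics.variance(ensemble)
    gain = prior_var / (prior_var + obs_var)
    # Each member is updated with its own perturbed copy of the measurement.
    return [
        x + gain * (y + rng.gauss(0.0, obs_var ** 0.5) - x)
        for x in ensemble
    ]


rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(1000)]
posterior = enkf_update(prior, y=2.0, obs_var=1.0, rng=rng)
print(statistics.mean(posterior), statistics.variance(posterior))
```

With a unit-variance prior and unit-variance noise the gain is about 0.5, so the posterior ensemble mean lands roughly halfway between the prior mean and the measurement.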

1602.00520 2026-06-04 math.NA cs.NA math.OC math.ST stat.TH

Large Noise in Variational Regularization

变分正则化中的大噪声

Martin Burger, Tapio Helin, Hanne Kekkonen

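AI summary: This paper studies variational regularization for inverse problems with large noise. It introduces a Banach space setting that handles more general noise, and derives an existence result and error estimates for the regularized problems, specialized to one- and p-homogeneous regularization functionals.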

详情
AI中文摘要

本文考虑了具有大噪声的逆问题的变分正则化方法,其中噪声通常在正向算子的像空间中是无界的。我们引入了一个Banach空间设置,允许在更大的空间中定义更一般的噪声的合理解概念,前提是正向算子具有足够的映射性质。关键观察是,这种一般噪声模型可以与近似源条件设置相同,而标准有界噪声模型直接与经典源条件相关。基于这一见解,我们获得了正则化变分问题的广泛存在性结果,并推导了以Bregman距离为术语的误差估计。后者专门针对一阶和p-齐次正则化函数的重要情况。作为自然的下一步,我们研究了随机噪声模型,特别是白噪声,并推导了以Bregman距离期望值为术语的误差估计。某些期望值的有限性导致了新的抽象光滑性条件,这些条件在Hilbert空间情况下易于解释。我们最后举例说明了该方法以及流行的正则化函数的条件,如平方范数、Besov范数和总变分。

英文摘要

In this paper we consider variational regularization methods for inverse problems with large noise that is in general unbounded in the image space of the forward operator. We introduce a Banach space setting that allows one to define a reasonable notion of solutions for more general noise in a larger space, provided one has sufficient mapping properties of the forward operators. A key observation, which guides us through the subsequent analysis, is that such a general noise model can be understood in the same setting as approximate source conditions (while a standard model of bounded noise is related directly to classical source conditions). Based on this insight we obtain a quite general existence result for regularized variational problems and derive error estimates in terms of Bregman distances. The latter are specialized for the particularly important cases of one- and p-homogeneous regularization functionals. As a natural further step we study stochastic noise models and in particular white noise, for which we derive error estimates in terms of the expectation of the Bregman distance. The finiteness of certain expectations leads to a novel class of abstract smoothness conditions on the forward operator, which can be easily interpreted in the Hilbert space case. We finally exemplify the approach and in particular the conditions for popular examples of regularization functionals given by squared norm, Besov norm and total variation, respectively.

1709.07089 2026-06-04 eess.SY cs.LG cs.SY stat.ML

On the Design of LQR Kernels for Efficient Controller Learning

关于为高效控制器学习设计LQR核

Alonso Marco, Philipp Hennig, Stefan Schaal, Sebastian Trimpe

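AI summary: This paper proposes two kernels that leverage the structure of the Linear Quadratic Regulator (LQR) to improve controller learning with Bayesian optimization. Simulations of uncertain linear and nonlinear systems indicate better learning performance than Gaussian processes with a common kernel choice.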

详情
Comments
8 pages, 5 figures, to appear in 56th IEEE Conference on Decision and Control (CDC 2017)
AI中文摘要

从数据中为非线性动态系统寻找最优反馈控制器是困难的。最近,贝叶斯优化(BO)被提出作为直接从实验试次中调整控制器的强大框架。为了选择下一个查询点并找到全局最优解,BO依赖于潜在目标函数的概率描述,通常为高斯过程(GP)。本文显示,使用常见核的GP在标准二次控制问题上可能导致学习效果差。对于一阶系统,我们构建了两种核,专门利用广为人知的线性二次调节器(LQR)的结构,同时保留贝叶斯非参数学习的灵活性。对不确定线性和非线性系统的模拟显示,LQR核在学习性能上优于传统GP方法。

英文摘要

Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.

1802.00852 2026-06-04 stat.ME cs.NA math.NA

Parameter and Uncertainty Estimation for Dynamical Systems Using Surrogate Stochastic Processes

利用代理随机过程对动态系统进行参数和不确定性估计

M. Chung, M. Binois, R. B. Gramacy, D. J. Moquin, A. P. Smith, A. M. Smith

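AI summary: This paper proposes a new method for learning parameterized dynamical systems from observational data. It first fits a surrogate stochastic process that incorporates prior knowledge, then uses samples of that process as surrogate data for point estimation; the approach is fully Bayesian and parallelizable.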

详情
Comments
24 pages, 9 figures
AI中文摘要

通过观测数据对动态系统中未知量进行推断对于提供有意义的见解、提供准确预测、实现稳健控制和建立未来实验设计至关重要。将数学理论与实证测量在统计上一致的方式结合是关键,但存在诸多挑战,例如参数估计问题的不恰当性、适当正则化和整合先验知识,以及对完整不确定性资格的计算限制。为了解决这些问题,我们提出了一种新的方法,从数据中学习参数化的动态系统。在许多方面,我们的方法颠倒了传统框架。我们首先拟合一个代理随机过程以适应观测数据,强制先验知识(例如光滑性),并处理具有挑战性的数据特征,如异方差性、厚尾和删失。然后,随机过程的样本被用作“代理数据”,通过普通点估计方法以模块化的方式计算点估计。这种方法的一个优点是完全贝叶斯且同时可并行化。我们在捕食者-猎物模拟研究和一个涉及宿主内流感病毒感染数据与病毒动力学模型的现实应用中展示了新方法的优势。

英文摘要

Inference on unknown quantities in dynamical systems via observational data is essential for providing meaningful insight, furnishing accurate predictions, enabling robust control, and establishing appropriate designs for future experiments. Merging mathematical theory with empirical measurements in a statistically coherent way is critical and challenges abound, e.g., ill-posedness of the parameter estimation problem, proper regularization and incorporation of prior knowledge, and computational limitations on full uncertainty quantification. To address these issues, we propose a new method for learning parameterized dynamical systems from data. In many ways, our proposal turns the canonical framework on its head. We first fit a surrogate stochastic process to observational data, enforcing prior knowledge (e.g., smoothness), and coping with challenging data features like heteroskedasticity, heavy tails and censoring. Then, samples of the stochastic process are used as "surrogate data" and point estimates are computed via ordinary point estimation methods in a modular fashion. An attractive feature of this approach is that it is fully Bayesian and simultaneously parallelizable. We demonstrate the advantages of our new approach on a predator-prey simulation study and on a real-world application involving within-host influenza virus infection data paired with a viral kinetic model.

1801.09238 2026-06-04 eess.SY cs.CV cs.SY stat.AP

Performance Analysis of Robust Stable PID Controllers Using Dominant Pole Placement for SOPTD Process Models

基于主导极点放置的鲁棒稳定PID控制器性能分析用于SOPTD过程模型

Saptarshi Das, Kaushik Halder, Amitava Gupta

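AI summary: This paper derives new dominant pole placement formulations for designing PID controllers for second-order processes with time delay. It uses a third-order Padé approximation to constrain the locations of the closed-loop dominant and non-dominant poles, and analyzes how the type of non-dominant poles affects the stability regions.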

详情
Comments
50 pages, 42 figures, Knowledge-Based Systems, 2018
AI中文摘要

本文推导了新的基于主导极点放置的PID控制器设计公式,用于处理具有时间延迟的二阶过程(SOPTD)。之前已尝试在无延迟系统中进行极点放置。时间延迟项在Pade近似中表现为具有可变数量交错极点和零点的高阶系统,这使得精确极点放置控制变得困难。本文报告了使用三阶Pade近似来约束闭环主导和非主导极点在复数s平面上的解析表达式。然而,通过增加Pade阶数验证了不同时间延迟近似对闭环性能的不变性,代表了更接近现实的高阶延迟动态。非主导极点的性质(如全部为复数、实数或组合)会影响特征方程并影响可实现的稳定性区域。不同类型的非主导极点及其对应的稳定性区域对九个测试台过程的影响被获得,这些过程表现出不同的开环阻尼比和滞后到延迟比。接下来,通过蒙特卡洛模拟研究不同表达式在设计参数空间中产生更宽稳定性区域的效果。随后,通过成千上万的蒙特卡洛模拟研究了各种时域和频域控制性能参数及其在不确定过程参数下的偏差,围绕每个测试台过程的鲁棒稳定解。

英文摘要

This paper derives new formulations for designing dominant pole placement based proportional-integral-derivative (PID) controllers to handle second order processes with time delays (SOPTD). Previously, similar attempts have been made for pole placement in delay-free systems. Upon Padé approximation, the time delay term manifests itself as a higher order system with a variable number of interlaced poles and zeros, which makes it difficult to achieve precise pole placement control. We report analytical expressions that constrain the closed loop dominant and non-dominant poles at the desired locations in the complex s-plane, using a third order Padé approximation for the delay term. Invariance of the closed loop performance under different time delay approximations has also been verified using increasing orders of Padé approximation, which represent higher order delay dynamics closer to reality. The choice of the nature of the non-dominant poles (e.g., all complex, all real, or a combination of both) modifies the characteristic equation and influences the achievable stability regions. The effect of different types of non-dominant poles and the corresponding stability regions are obtained for nine test-bench processes indicating different levels of open-loop damping and lag to delay ratio. Next, we investigate which expression yields a wider stability region in the design parameter space by using Monte Carlo simulations while uniformly sampling a chosen design parameter space. Various time and frequency domain control performance parameters are investigated next, as well as their deviations with uncertain process parameters, using thousands of Monte Carlo simulations, around the robust stable solution for each of the nine test-bench processes.

1801.09657 2026-06-04 math.NA cs.LG cs.NA stat.ME

Matrix Completion for Structured Observations

结构观测的矩阵补全

Denali Molitor, Deanna Needell

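AI summary: This paper proposes a modified nuclear norm minimization method for matrix completion that accounts for structural differences between observed and unobserved entries, and shows that it outperforms standard nuclear norm minimization in certain settings.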

详情
AI中文摘要

预测或填补缺失数据(即矩阵补全)是当今数据驱动世界中的常见挑战。以往策略通常假设观测与缺失条目之间无结构性差异。不幸的是,这一假设在许多应用中显得不现实。例如,在经典的Netflix挑战中,预测用户对未观看电影的评分时,观众未观看某部电影可能表明对该电影缺乏兴趣,从而建议评分低于预期。本文提出调整标准核范数最小化策略,通过正则化未观测条目的值来考虑此类结构性差异。我们证明在某些情况下,所提方法优于核范数最小化。

英文摘要

The need to predict or fill in missing data, often referred to as matrix completion, is a common challenge in today's data-driven world. Previous strategies typically assume that no structural difference between observed and missing entries exists. Unfortunately, this assumption is woefully unrealistic in many applications. For example, in the classic Netflix challenge, in which one hopes to predict user-movie ratings for unseen films, the fact that the viewer has not watched a given movie may indicate a lack of interest in that movie, thus suggesting a lower rating than otherwise expected. We propose adjusting the standard nuclear norm minimization strategy for matrix completion to account for such structural differences between observed and unobserved entries by regularizing the values of the unobserved entries. We show that the proposed method outperforms nuclear norm minimization in certain settings.

1611.09805 2026-06-04 math.OC cs.NA math.NA stat.ML

A new primal-dual algorithm for minimizing the sum of three functions with a linear operator

一种新的对偶算法用于最小化三个函数之和及其线性算子

Ming Yan

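AI summary: This paper proposes a new primal-dual algorithm for minimizing the sum of three functions, one of which is composed with a linear operator. The algorithm includes well-known primal-dual algorithms as special cases; the paper proves convergence with an $O(1/k)$ ergodic rate, and the algorithm admits a wider range of parameters and a smaller per-iteration cost.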

详情
Comments
v2 added the ergodic and nonergodic convergence rates for the primal-dual gap; v3 added the infimal convolution result and changed the title; v4 modifies the primal-dual gap and adds more recent references
AI中文摘要

在本文中,我们提出了一种新的对偶算法,用于最小化$f(x) + g(x) + h(Ax)$,其中$f$、$g$和$h$是恰当的下半连续凸函数,$f$是可微且梯度Lipschitz连续的,而$A$是一个有界线性算子。所提出的算法包含一些著名的对偶算法作为特例,例如当$f = 0$时退化为Chambolle-Pock算法,当$g = 0$时退化为近端交替预测-校正算法。对于一般的凸情况,我们通过证明迭代过程是一个非扩张算子,证明了该算法在距离固定点的收敛性。此外,我们证明了在对偶间隙上的$O(1/k)$的ergodic收敛率。在额外假设下,我们推导了在距离固定点的线性收敛率。与其他用于解决相同问题的对偶算法相比,该算法扩展了可接受参数的范围以确保收敛性,并且具有更小的每迭代成本。数值实验展示了该算法的效率。

英文摘要

In this paper, we propose a new primal-dual algorithm for minimizing $f(x) + g(x) + h(Ax)$, where $f$, $g$, and $h$ are proper lower semi-continuous convex functions, $f$ is differentiable with a Lipschitz continuous gradient, and $A$ is a bounded linear operator. The proposed algorithm has some famous primal-dual algorithms for minimizing the sum of two functions as special cases. For example, it reduces to the Chambolle-Pock algorithm when $f = 0$ and to the proximal alternating predictor-corrector when $g = 0$. For the general convex case, we prove the convergence of this new algorithm in terms of the distance to a fixed point by showing that the iteration is a nonexpansive operator. In addition, we prove the $O(1/k)$ ergodic convergence rate in the primal-dual gap. With additional assumptions, we derive the linear convergence rate in terms of the distance to the fixed point. Compared with other primal-dual algorithms for solving the same problem, this algorithm extends the range of acceptable parameters to ensure its convergence and has a smaller per-iteration cost. The numerical experiments show the efficiency of this algorithm.

1707.04723 2026-06-04 math.NA cs.NA stat.ME

Optimal Monte Carlo integration on closed manifolds

闭流形上的最优蒙特卡洛积分

Martin Ehler, Manuel Graef, Chris. J. Oates

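AI summary: This paper studies Monte Carlo integration in Sobolev spaces on closed manifolds, shows that re-weighting the random points improves the convergence rate, and reports numerical experiments on the unit sphere and the Grassmannian manifold.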

详情
AI中文摘要

标准蒙特卡洛方法在再生核希尔伯特空间中的最坏情况积分误差随n衰减为n^{-1/2}。然而,通过重新加权随机点有时可以改进收敛阶。本文为闭黎曼流形上的Sobolev空间提供了通用理论结果,证明重加权能提供最优近似速率(至对数因子)。我们还为单位球面和Grassmannian流形上的某些Sobolev空间提供了数值实验,验证理论结果。本文的理论发现也涵盖了更一般集上的函数空间,如单位球、立方体和单纯形。

英文摘要

The worst case integration error in reproducing kernel Hilbert spaces of standard Monte Carlo methods with n random points decays as $n^{-1/2}$. However, re-weighting of random points can sometimes be used to improve the convergence order. This paper contributes general theoretical results for Sobolev spaces on closed Riemannian manifolds, where we verify that such re-weighting yields optimal approximation rates up to a logarithmic factor. We also provide numerical experiments matching the theoretical results for some Sobolev spaces on the unit sphere and on the Grassmannian manifold. Our theoretical findings also cover function spaces on more general sets such as the unit ball, the cube, and the simplex.
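
The $n^{-1/2}$ baseline mentioned above is the plain, equally weighted Monte Carlo estimate. The sketch below shows that baseline on the unit sphere in three dimensions, drawing uniform points by normalizing Gaussian vectors; it does not implement the re-weighting analyzed in the paper. The test function $x_3^2$, whose mean over the sphere is 1/3, and the names `sphere_point` and `mc_sphere_mean` are assumptions for this example.

```python
import math
import random


def sphere_point(rng):
    """Draw a uniformly distributed point on the unit sphere in R^3."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            # A normalized standard Gaussian vector is uniform on the sphere.
            return [c / norm for c in g]


def mc_sphere_mean(f, n, seed=0):
    """Equally weighted Monte Carlo estimate of the mean of f over the sphere."""
    rng = random.Random(seed)
    return sum(f(sphere_point(rng)) for _ in range(n)) / n


estimate = mc_sphere_mean(lambda x: x[2] ** 2, 20000)
print(estimate, abs(estimate - 1.0 / 3.0))
```

Quadrupling `n` should roughly halve the typical error of this estimator, which is the rate that re-weighting is meant to improve on for smooth integrands.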

1512.07349 2026-06-04 cs.SI cs.NA math.NA stat.ML

Incremental Method for Spectral Clustering of Increasing Orders

逐步方法用于增加阶次的谱聚类

Pin-Yu Chen, Baichuan Zhang, Mohammad Al Hasan, Alfred O. Hero

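AI summary: This paper proposes an incremental method for constructing the eigenspectrum of the graph Laplacian matrix. Exploiting the Laplacian's eigenstructure, it obtains the $K$-th eigenpair from the $K-1$ smallest eigenpairs already known, turning the batch eigendecomposition problem into an efficient sequential leading-eigenpair computation.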

详情
Comments
in KDD workshop on mining and learning graph, 2016 http://www.mlgworkshop.org/2016/
AI中文摘要

图拉普拉斯矩阵的最小特征值及其对应的特征向量(即特征对)已被广泛用于谱聚类和社区检测。然而,在实际应用中,聚类数或社区数(例如K)通常是未知的。因此,现有的大多数方法要么启发式地选择K,要么用不同的K重复聚类方法并接受最佳聚类结果。第一种方法通常会产生次优结果,而第二种方法计算成本高。本文提出了一种逐步方法,用于构建图拉普拉斯矩阵的特征谱。该方法利用图拉普拉斯矩阵的特征结构,从所有K-1个最小特征对中获取拉普拉斯矩阵的K个特征对。我们提出的方法使拉普拉斯矩阵适应,从而将批量特征分解问题转化为高效的顺序主导特征对计算问题。作为实际应用,我们考虑了用户引导的谱聚类。具体来说,我们展示了用户可以利用所提出的逐步方法进行有效的特征对计算,并基于多种聚类度量确定所需的聚类数。

英文摘要

The smallest eigenvalues and the associated eigenvectors (i.e., eigenpairs) of a graph Laplacian matrix have been widely used for spectral clustering and community detection. However, in real-life applications the number of clusters or communities (say, $K$) is generally unknown a priori. Consequently, the majority of the existing methods either choose $K$ heuristically or repeat the clustering method with different choices of $K$ and accept the best clustering result. The first option often yields suboptimal results, while the second option is computationally expensive. In this work, we propose an incremental method for constructing the eigenspectrum of the graph Laplacian matrix. This method leverages the eigenstructure of the graph Laplacian matrix to obtain the $K$-th eigenpair of the Laplacian matrix given a collection of all the $K-1$ smallest eigenpairs. Our proposed method adapts the Laplacian matrix such that the batch eigenvalue decomposition problem transforms into an efficient sequential leading eigenpair computation problem. As a practical application, we consider user-guided spectral clustering. Specifically, we demonstrate that users can utilize the proposed incremental method for effective eigenpair computation and for determining the desired number of clusters based on multiple clustering metrics.

1801.07083 2026-06-04 cs.IT cs.NA math.IT math.NA math.PR math.ST stat.TH

Differential Message Importance Measure: A New Approach to the Required Sampling Number in Big Data Structure Characterization

差分信息重要性度量:大数据结构表征中所需采样数的新方法

Shanyun Liu, Rui She, Pingyi Fan

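AI summary: This paper proposes the differential message importance measure (DMIM) for determining the number of samples required in big data structure characterization. By analyzing the information collection process, it gives a distribution-free criterion and proves that the change in DMIM describes the gap between the sample distribution and a theoretical distribution.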

详情
Comments
11 pages, 6 figures
AI中文摘要

在大数据场景中,数据收集是一个基础问题,采样集的大小尤其重要,特别是在数据结构表征中。本文通过考虑信息重要性来考虑信息收集过程,并给出一个无分布假设的标准来确定大数据结构表征中所需的采样数。类似于微分熵,我们定义微分信息重要性度量(DMIM)作为连续随机变量的消息重要性度量。讨论了许多常见密度的DMIM,并给出了正态分布的高精度近似值。此外,证明了DMIM的变化可以描述样本集分布与理论分布之间的差距。事实上,DMIM的偏差等同于Kolmogorov-Smirnov统计量,但提供了一种新的分布拟合表征方法。数值结果展示了一些DMIM的基本性质和所提近似值的准确性。此外,还得到经验分布随着DMIM偏差减小而接近真实分布,这有助于实际系统中合适采样点的选择。

英文摘要

Data collection is a fundamental problem in big data scenarios, where the size of the sampling set plays a very important role, especially in the characterization of data structure. This paper considers the information collection process by taking message importance into account, and gives a distribution-free criterion to determine how many samples are required in big data structure characterization. By analogy with differential entropy, we define the differential message importance measure (DMIM) as a measure of message importance for continuous random variables. The DMIM for many common densities is discussed, and high-precision approximate values for the normal distribution are given. Moreover, it is proved that the change of DMIM can describe the gap between the distribution of a set of sample values and a theoretical distribution. In fact, the deviation of DMIM is equivalent to the Kolmogorov-Smirnov statistic, but it offers a new way to characterize distribution goodness-of-fit. Numerical results show some basic properties of DMIM and the accuracy of the proposed approximate values. Furthermore, it is shown that the empirical distribution approaches the real distribution as the DMIM deviation decreases, which helps in selecting suitable sampling points in actual systems.

1801.06637 2026-06-04 stat.ML cs.LG cs.NA math.AP math.NA

Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations

深度隐物理模型:深度学习非线性偏微分方程

Maziar Raissi

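AI summary: This paper puts forth a deep learning approach for discovering nonlinear partial differential equations from scattered and potentially noisy observations. Two deep neural networks approximate the unknown solution and the nonlinear dynamics, and the approach is tested on benchmark problems from several scientific domains.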

详情
AI中文摘要

在人工智能与应用数学的交汇处,长期存在的问题是设计出能够将观测数据转化为物理世界预测数学模型的算法。在数据丰富和机器学习能力先进的时代,自然的问题是:如何自动从高维实验数据中揭示物理定律?本文提出了一种深度学习方法,用于从空间和时间上散乱且可能含噪声的观测中发现非线性偏微分方程。具体而言,我们通过两个深度神经网络近似未知解及非线性动力学。第一个网络作为未知解的先验,本质上使我们能够避免本质上病态且不稳定的数值微分。第二个网络代表非线性动力学,帮助我们提炼支配给定时空数据集演化的机制。我们测试了该方法在多个科学领域基准问题中的有效性,并展示了所提框架如何帮助我们准确学习底层动力学并预测系统未来状态。特别是,我们研究了Burgers'、Korteweg-de Vries(KdV)、Kuramoto-Sivashinsky、非线性Schrödinger和Navier-Stokes方程。

英文摘要

A long-standing problem at the interface of artificial intelligence and applied mathematics is to devise an algorithm capable of achieving human level or even superhuman proficiency in transforming observed data into predictive mathematical models of the physical world. In the current era of abundance of data and advanced machine learning capabilities, the natural question arises: How can we automatically uncover the underlying laws of physics from high-dimensional data generated from experiments? In this work, we put forth a deep learning approach for discovering nonlinear partial differential equations from scattered and potentially noisy observations in space and time. Specifically, we approximate the unknown solution as well as the nonlinear dynamics by two deep neural networks. The first network acts as a prior on the unknown solution and essentially enables us to avoid numerical differentiations which are inherently ill-conditioned and unstable. The second network represents the nonlinear dynamics and helps us distill the mechanisms that govern the evolution of a given spatiotemporal data-set. We test the effectiveness of our approach for several benchmark problems spanning a number of scientific domains and demonstrate how the proposed framework can help us accurately learn the underlying dynamics and forecast future states of the system. In particular, we study the Burgers', Korteweg-de Vries (KdV), Kuramoto-Sivashinsky, nonlinear Schrödinger, and Navier-Stokes equations.

1801.05894 2026-06-04 math.HO cs.LG cs.NA math.NA stat.ML

Deep Learning: An Introduction for Applied Mathematicians

深度学习:应用数学家的入门指南

Catherine F. Higham, Desmond J. Higham

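AI summary: This article introduces the basic ideas of deep learning from an applied mathematics perspective for postgraduate and final-year undergraduate students in mathematics, illustrating how a network is set up and trained with a short MATLAB code and a large-scale image classification example.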

详情
AI中文摘要

多层人工神经网络正成为众多应用领域中的普遍工具。深度学习革命的核心概念来自应用数学和计算数学,包括微积分、逼近论、优化和线性代数。本文为应用数学家提供深度学习基础介绍。目标读者为数学专业研究生及大四本科生,也适用于希望在课堂中引入深度学习应用的数学教师。文章聚焦三个核心问题:什么是深度神经网络?如何训练网络?什么是随机梯度方法?通过MATLAB代码展示网络构建与训练,并展示在大规模图像分类问题中使用最新软件的应用。最后提供当前文献的参考文献。

英文摘要

Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably, in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: What is a deep neural network? How is a network trained? What is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large scale image classification problem. We finish with references to the current literature.

1604.00082 2026-06-04 stat.AP cs.SY eess.SY

A track-before-detect labelled multi-Bernoulli particle filter with label switching

具有标签切换的跟踪前检测标记多伯努利粒子滤波器

Ángel F. García-Fernández

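AI summary: This paper presents a multitarget tracking particle filter for general track-before-detect measurement models. It uses a labelled multi-Bernoulli approximation and adds a label switching improvement algorithm based on Markov chain Monte Carlo, intended to increase filter performance.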

详情
Journal ref
IEEE Transactions on Aerospace and Electronic Systems, vol. 52, no. 5, pp. 2123-2138, October 2016
Comments
Accepted for publication in IEEE Transactions on Aerospace and Electronic Systems
AI中文摘要

本文提出了一种适用于通用跟踪前检测测量模型的多目标跟踪粒子滤波器(PF)。该PF基于随机有限集框架,并采用标记多伯努利近似。我们还提出了一种基于马尔可夫链蒙特卡洛的标签切换改进算法,预计在目标长时间接近时能提高滤波性能。该PF在两个具有挑战性的数值例子中进行了测试。

英文摘要

This paper presents a multitarget tracking particle filter (PF) for general track-before-detect measurement models. The PF is presented in the random finite set framework and uses a labelled multi-Bernoulli approximation. We also present a label switching improvement algorithm based on Markov chain Monte Carlo that is expected to increase filter performance if targets get in close proximity for a sufficiently long time. The PF is tested in two challenging numerical examples.

1706.09993 2026-06-04 math.NA cs.IT cs.LG cs.NA math.IT math.PR math.ST stat.TH

Phase Retrieval via Randomized Kaczmarz: Theoretical Guarantees

通过随机Kaczmarz方法进行相位恢复:理论保证

Yan Shuo Tan, Roman Vershynin

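AI summary: This paper gives theoretical guarantees for the randomized Kaczmarz method in phase retrieval. It proves that a number of Gaussian measurements proportional to the dimension suffices for convergence, introduces a sufficient condition on measurement sets, and uses bounds on VC dimension and metric entropy to show that Gaussian sampling vectors satisfy it.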

详情
Comments
Revised after comments from referees
AI中文摘要

我们考虑相位恢复问题,即求解二次方程组的问题。最近提出了一种随机Kaczmarz方法的简单变种,并在数值上显示出比最先进的Wirtinger流方法更高效。在本文中,我们为相位恢复中的随机Kaczmarz方法提供了首次理论保证。我们证明仅需与维度成正比的高斯测量即可保证收敛。在此过程中,我们引入了一个关于测量集的充分条件,以保证随机Kaczmarz方法能够正常工作。我们证明高斯采样向量以高概率满足该性质;这通过链式论证结合VC维和度量熵的界来证明。

英文摘要

We consider the problem of phase retrieval, i.e. that of solving systems of quadratic equations. A simple variant of the randomized Kaczmarz method was recently proposed for phase retrieval, and it was shown numerically to have a computational edge over state-of-the-art Wirtinger flow methods. In this paper, we provide the first theoretical guarantee for the convergence of the randomized Kaczmarz method for phase retrieval. We show that it is sufficient to have as many Gaussian measurements as the dimension, up to a constant factor. Along the way, we introduce a sufficient condition on measurement sets for which the randomized Kaczmarz method is guaranteed to work. We show that Gaussian sampling vectors satisfy this property with high probability; this is proved using a chaining argument coupled with bounds on VC dimension and metric entropy.
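
The update analysed in the abstract can be sketched in a few lines. This is a minimal real-valued illustration under stated assumptions, not the authors' code: rows are sampled uniformly, the starting point is a small perturbation of the true signal (standing in for whatever initialization the paper uses), and the names `kaczmarz_phase_retrieval` and `dist_up_to_sign` are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kaczmarz_phase_retrieval(A, b, x0, iters, rng):
    """Randomized Kaczmarz for measurements b_i = |<a_i, x>| (real-valued sketch)."""
    x = list(x0)
    for _ in range(iters):
        i = rng.randrange(len(A))  # uniform row sampling (an assumption)
        a = A[i]
        inner = dot(a, x)
        sign = 1.0 if inner >= 0.0 else -1.0
        # Project x onto the hyperplane <a_i, x> = sign * b_i.
        step = (inner - sign * b[i]) / dot(a, a)
        x = [xj - step * aj for xj, aj in zip(x, a)]
    return x


def dist_up_to_sign(x, y):
    """Distance between x and y, ignoring the global sign ambiguity."""
    minus = math.sqrt(sum((a - c) ** 2 for a, c in zip(x, y)))
    plus = math.sqrt(sum((a + c) ** 2 for a, c in zip(x, y)))
    return min(minus, plus)


rng = random.Random(0)
n, m = 5, 60  # m is a constant multiple of the dimension n
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
b = [abs(dot(a, x_true)) for a in A]

# Hypothetical initialization: a 5% relative perturbation of the truth.
scale = 0.05 * math.sqrt(dot(x_true, x_true)) / math.sqrt(n)
x0 = [v + scale * rng.gauss(0.0, 1.0) for v in x_true]

x_hat = kaczmarz_phase_retrieval(A, b, x0, iters=3000, rng=rng)
error = dist_up_to_sign(x_hat, x_true)
```

The only change from the linear Kaczmarz method is the sign estimate taken from the current iterate, which is why a starting point reasonably close to the signal matters.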

1612.09123 2026-06-04 econ.GN econ.EM physics.ao-ph q-fin.EC stat.AP

Population and trends in the global mean temperature

全球平均温度的人口与趋势

Richard S. J. Tol

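AI summary: This paper applies the Fisher Ideal index to define a population-weighted temperature trend. It finds that the trend in the global area-weighted average surface air temperature differs from the population-weighted trend, and that extending the index to include urbanization and the urban heat island effect changes the trend further.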

详情
AI中文摘要

Fisher理想指数最初用于衡量价格通货膨胀,被应用于定义人口加权的温度趋势。该方法具有优势,即趋势能代表样本中的人口分布,而不混淆人口分布趋势和温度趋势。作者展示了全球面积加权平均地表气温趋势在关键细节上与人口加权趋势不同。该指数被扩展以包括城市化和城市热岛效应,这再次显著改变了趋势。进一步扩展该指数以包括国际移民,但对趋势的影响较小。

英文摘要

The Fisher Ideal index, developed to measure price inflation, is applied to define a population-weighted temperature trend. This method has the advantages that the trend is representative for the population distribution throughout the sample but without conflating the trend in the population distribution and the trend in the temperature. I show that the trend in the global area-weighted average surface air temperature is different in key details from the population-weighted trend. I extend the index to include urbanization and the urban heat island effect. This substantially changes the trend again. I further extend the index to include international migration, but this has a minor impact on the trend.
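
For readers unfamiliar with the index, the sketch below shows the generic Fisher Ideal construction, the geometric mean of the Laspeyres and Paasche indices, with regional temperatures in the role of prices and populations in the role of quantities. The two-region data are invented for illustration, and the exact way the paper maps temperatures and populations onto the index is an assumption here.

```python
import math


def laspeyres(p0, p1, q0):
    """Change from p0 to p1, weighted by the base-period quantities q0."""
    return sum(a * w for a, w in zip(p1, q0)) / sum(a * w for a, w in zip(p0, q0))


def paasche(p0, p1, q1):
    """Change from p0 to p1, weighted by the current-period quantities q1."""
    return sum(a * w for a, w in zip(p1, q1)) / sum(a * w for a, w in zip(p0, q1))


def fisher_ideal(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical data: two regions, temperature in kelvin, population in millions.
temp_0, temp_1 = [288.0, 298.0], [289.0, 298.5]
pop_0, pop_1 = [100.0, 50.0], [90.0, 80.0]

lasp = laspeyres(temp_0, temp_1, pop_0)
paas = paasche(temp_0, temp_1, pop_1)
index = fisher_ideal(temp_0, temp_1, pop_0, pop_1)

# Naive ratio of population-weighted means: it also picks up the shift of
# population toward the warmer region, which the Fisher index does not.
mean_0 = sum(t * w for t, w in zip(temp_0, pop_0)) / sum(pop_0)
mean_1 = sum(t * w for t, w in zip(temp_1, pop_1)) / sum(pop_1)
naive_ratio = mean_1 / mean_0
```

In this toy data the population moves toward the warmer region, so the naive ratio exceeds the Fisher index; that gap is the conflation of population trend and temperature trend that the abstract says the index avoids.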

1607.06247 2026-06-04 econ.GN econ.EM q-fin.EC stat.AP

Effects of Sea Level Rise on Economy of the United States

海平面上升对美国经济的影响

Monika Novackova, Richard S. J. Tol

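AI summary: This paper uses two econometric approaches to study the effect of sea level rise on the US economy. It finds the estimated effects insignificant in most specifications, which contradicts previous ex ante studies.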

详情
AI中文摘要

我们报告了首个事后研究海平面上升对美国经济影响的实证研究。我们应用两种计量方法估计过去海平面上升对美国经济的影响,即调整空间模式的巴罗型增长回归和匹配估计器。分析单位为美国3063个县。我们为13个时间期拟合增长回归,并对两种回归和匹配估计器进行了多种变体和稳健性测试。尽管有证据表明海平面上升对经济增长有正向影响,但大多数规范中估计的影响不显著。因此,我们得出结论,海平面上升对经济增长没有稳定且显著的影响。这一发现与先前前瞻性研究结果相矛盾。

英文摘要

We report the first ex post study of the economic impact of sea level rise. We apply two econometric approaches to estimate the past effects of sea level rise on the economy of the USA, viz. Barro-type growth regressions adjusted for spatial patterns and a matching estimator. The unit of analysis is the county; the sample covers 3063 counties of the USA. We fit growth regressions for 13 time periods and estimate numerous variants and robustness tests for both the growth regressions and the matching estimator. Although there is some evidence that sea level rise has a positive effect on economic growth, in most specifications the estimated effects are insignificant. We therefore conclude that there is no stable, significant effect of sea level rise on economic growth. This finding contradicts previous ex ante studies.

1801.01467 2026-06-04 eess.SY cs.SY stat.AP stat.ML

Deep Reinforcement Learning based Optimal Control of Hot Water Systems

基于深度强化学习的热水系统最优控制

Hussain Kazmi, Fahad Mehmood, Stefan Lodeweyckx, Johan Driesen

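AI summary: This paper proposes a deep reinforcement learning method for optimizing hot water systems that needs no detailed dynamics model or human domain knowledge. It learns occupant preferences on the fly, reduces energy consumption by roughly 20%, and applies to a range of domestic hot water systems.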

详情
Journal ref
Energy, Volume 144, 2018
AI中文摘要

热水生产能耗是高效率建筑中的主要能耗来源。通常从热力学角度优化,与住户影响脱节。优化通常假设存在详细的热水系统动态模型,这些假设导致现实中的次优能效。本文提出一种基于深度强化学习的新方法,优化热水生产。该方法具有通用性,无需离线步骤或人类领域知识构建热水罐或加热元件模型。住户偏好也在实时学习。该系统应用于荷兰32个住宅,使热水生产能耗降低约20%,无舒适度损失。此性能可复制到任何住宅热水系统和优化目标,只要满足相对简单的传感器数据要求。全球数以百万计的热水系统运行,该框架有望在未来几年减少现有和新系统的能耗,达到多吉瓦时的规模。

英文摘要

Energy consumption for hot water production is a major draw in high efficiency buildings. Optimizing this has typically been approached from a thermodynamics perspective, decoupled from occupant influence. Furthermore, optimization usually presupposes the existence of a detailed dynamics model for the hot water system. These assumptions lead to suboptimal energy efficiency in the real world. In this paper, we present a novel reinforcement learning based methodology which optimizes hot water production. The proposed methodology is completely generalizable, and does not require an offline step or human domain knowledge to build a model for the hot water vessel or the heating element. Occupant preferences too are learnt on the fly. The proposed system is applied to a set of 32 houses in the Netherlands where it reduces energy consumption for hot water production by roughly 20% with no loss of occupant comfort. Extrapolating, this translates to absolute savings of roughly 200 kWh for a single household on an annual basis. This performance can be replicated on any domestic hot water system and optimization objective, given that the fairly minimal requirements on sensor data are met. With millions of hot water systems operational worldwide, the proposed framework has the potential to reduce energy consumption in existing and new systems on a multi-gigawatt-hour scale in the years to come.

1801.01236 2026-06-04 math.DS cs.NA math.NA nlin.CD physics.comp-ph stat.ML

Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems

多步神经网络用于数据驱动发现非线性动力系统

Maziar Raissi, Paris Perdikaris, George Em Karniadakis

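AI summary: This paper combines multi-step time-stepping schemes with deep neural networks to discover the mechanisms of nonlinear dynamical systems from data, and validates the approach on several benchmark problems for forecasting future states and identifying basins of attraction.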

详情
AI中文摘要

将数值分析中的多步时间推进方案与深度神经网络结合,用于从数据中发现非线性动力系统的机制。通过多个基准问题验证该方法在准确学习动力学、预测未来状态和识别吸引盆地方面的有效性,包括洛伦兹系统、圆柱后流体流动、霍普夫分岔和糖酵解振荡器等复杂非线性动力学问题。

英文摘要

The process of transforming observed data into predictive mathematical models of the physical world has always been paramount in science and engineering. Although data is currently being collected at an ever-increasing pace, devising meaningful models out of such observations in an automated fashion still remains an open problem. In this work, we put forth a machine learning approach for identifying nonlinear dynamical systems from data. Specifically, we blend classical tools from numerical analysis, namely the multi-step time-stepping schemes, with powerful nonlinear function approximators, namely deep neural networks, to distill the mechanisms that govern the evolution of a given data-set. We test the effectiveness of our approach for several benchmark problems involving the identification of complex, nonlinear and chaotic dynamics, and we demonstrate how this allows us to accurately learn the dynamics, forecast future states, and identify basins of attraction. In particular, we study the Lorenz system, the fluid flow behind a cylinder, the Hopf bifurcation, and the glycolytic oscillator model as an example of complicated nonlinear dynamics typical of biological systems.
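
The idea of fitting dynamics through a multi-step residual can be illustrated without a neural network. In the sketch below, which is only an illustration, the unknown right-hand side is a one-parameter model f(y) = theta * y instead of a deep network, the scheme is the trapezoidal rule (the one-step Adams-Moulton method), and the data come from dy/dt = -y. The name `fit_decay_rate` and all of these choices are assumptions made for the example.

```python
import math


def fit_decay_rate(ys, dt):
    """Least-squares theta for f(y) = theta * y under the trapezoidal residual
    e_n = y_n - y_{n-1} - (dt / 2) * (f(y_n) + f(y_{n-1}))."""
    num = 0.0
    den = 0.0
    for y_prev, y_next in zip(ys[:-1], ys[1:]):
        d = y_next - y_prev              # observed increment
        s = 0.5 * dt * (y_next + y_prev)  # coefficient multiplying theta
        num += d * s
        den += s * s
    return num / den  # minimizer of sum_n (d_n - theta * s_n)^2


dt = 0.1
ys = [math.exp(-dt * k) for k in range(50)]  # samples of y(t) = exp(-t)
theta = fit_decay_rate(ys, dt)
```

The recovered value is close to the true rate -1 but not equal to it: the fit absorbs the discretization error of the time-stepping scheme, and here equals -(2/dt) * tanh(dt/2) up to rounding.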

1402.6964 2026-06-04 cs.LG cs.DC cs.NA math.NA stat.ML

Scalable methods for nonnegative matrix factorizations of near-separable tall-and-skinny matrices

可扩展的非负矩阵分解方法用于近可分离的高瘦矩阵

Austin R. Benson, Jason D. Lee, Bartek Rajwa, David F. Gleich

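AI summary: This paper makes near-separable nonnegative matrix factorization algorithms efficient for tall-and-skinny matrices, using an orthogonal transformation that preserves separability. The resulting methods suit streaming data and distributed computing environments.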

详情
Journal ref
Proceedings of Neural Information Processing Systems, 2014
AI中文摘要

许多算法在假设矩阵近可分离的情况下用于非负矩阵分解。本文展示如何使这些算法高效处理行数远多于列数的高瘦矩阵。改进方法的关键是保持NMF问题分离性的正交矩阵变换。最终方法只需单次遍历数据矩阵,适用于流式、多核和MapReduce架构。我们在TB级合成矩阵和科学计算、生物信息学的真实数据上验证了算法的有效性。

英文摘要

Numerous algorithms are used for nonnegative matrix factorization under the assumption that the matrix is nearly separable. In this paper, we show how to make these algorithms efficient for data matrices that have many more rows than columns, so-called "tall-and-skinny matrices". One key component to these improved methods is an orthogonal matrix transformation that preserves the separability of the NMF problem. Our final methods need a single pass over the data matrix and are suitable for streaming, multi-core, and MapReduce architectures. We demonstrate the efficacy of these algorithms on terabyte-sized synthetic matrices and real-world matrices from scientific computing and bioinformatics.

1708.00842 2026-06-04 eess.SY cs.IT cs.MA cs.SY math.IT stat.AP stat.CO

Latent Parameter Estimation in Fusion Networks Using Separable Likelihoods

在融合网络中使用可分离似然进行潜在参数估计

Murat Uney, Bernard Mulgrew, Daniel E Clark

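AI summary: This paper proposes a more accurate separable pseudo-likelihood for estimating latent parameters in multi-sensor state space models, and develops an empirical Bayesian perspective for evaluating it when there are many objects and the association of measurements with objects is ambiguous.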

详情
Comments
accepted with minor revisions, IEEE Transactions on Signal and Information Processing Over Networks
AI中文摘要

多传感器状态空间模型是传感器网络融合应用的基础。在这些模型中估计潜在参数，有望提供网络自校准等非常理想的能力。由于评估参数似然时涉及联合多传感器滤波，传统方法难以随传感器数量扩展。本文提出一种可分离伪似然，在典型工作条件下，它比先前提出的替代方案更为准确。此外，我们考虑在存在大量目标、且测量与其来源目标的关联存在模糊性的情况下使用可分离似然。为此，我们使用基于假设的参数化状态空间模型，并发展了一种经验贝叶斯视角，以便在该模型上利用局部滤波评估可分离似然。基于该似然的贝叶斯推断通过在相应的成对马尔可夫随机场上进行置信传播来完成。我们针对线性高斯状态空间模型给出了一种用于潜在参数估计的粒子算法，并利用非合作目标的测量数据，通过与替代方法的比较展示了其在网络自校准中的有效性。

英文摘要

Multi-sensor state space models underpin fusion applications in networks of sensors. Estimation of latent parameters in these models has the potential to provide highly desirable capabilities such as network self-calibration. Conventional solutions to the problem pose difficulties in scaling with the number of sensors due to the joint multi-sensor filtering involved when evaluating the parameter likelihood. In this article, we propose a separable pseudo-likelihood which is a more accurate approximation compared to a previously proposed alternative under typical operating conditions. In addition, we consider using separable likelihoods in the presence of many objects and ambiguity in associating measurements with objects that originated them. To this end, we use a state space model with a hypothesis based parameterisation, and, develop an empirical Bayesian perspective in order to evaluate separable likelihoods on this model using local filtering. Bayesian inference with this likelihood is carried out using belief propagation on the associated pairwise Markov random field. We specify a particle algorithm for latent parameter estimation in a linear Gaussian state space model and demonstrate its efficacy for network self-calibration using measurements from non-cooperative targets in comparison with alternatives.

1608.03928 2026-06-04 math.OC cs.NA math.NA stat.ML

Hybrid Jacobian and Gauss-Seidel proximal block coordinate update methods for linearly constrained convex programming

混合雅可比和高斯-塞德尔近似块坐标更新方法用于线性约束凸规划

Yangyang Xu

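AI summary: This paper proposes a hybrid Jacobian and Gauss-Seidel block coordinate update method for linearly constrained multi-block structured convex programming. With a suitably chosen affine mixing matrix the method has a theoretical convergence guarantee, and numerically it outperforms a randomized primal-dual block coordinate update method.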

详情
Comments
Accepted in SIAM Journal on Optimization
AI中文摘要

近年来,块坐标更新(BCU)方法在处理大规模数据和/或变量的问题中迅速发展。在优化中,BCU最初以坐标下降法出现,适用于光滑问题或具有分离的非光滑项和/或分离约束的问题。当存在非分离约束时,BCU可以在对偶设置下应用。文献表明,对于弱凸问题和非分离线性约束,BCU在完全高斯-塞德尔更新规则下可能无法收敛,而在完全雅可比规则下只能亚线性收敛。然而,经验上雅可比更新方法通常比高斯-塞德尔规则更慢。为保持其优点,本文提出了一种混合雅可比和高斯-塞德尔BCU方法,用于解决线性约束多块结构凸规划问题,其中目标函数可能具有非分离的二次项和可分离的非光滑项。在每个原变量块更新时,该方法近似了增广拉格朗日函数在前两次迭代的仿射组合上,并通过求解半正定规划选择具有期望良好性质的仿射混合矩阵。我们证明了混合方法具有与雅可比BCU相同的理论收敛性。此外,我们数值证明该方法在性能上与高斯-塞德尔方法相当,并优于最近提出的随机对偶BCU方法。

英文摘要

Recent years have witnessed the rapid development of block coordinate update (BCU) methods, which are particularly suitable for problems involving large-sized data and/or variables. In optimization, BCU first appeared as the coordinate descent method, which works well for smooth problems or those with separable nonsmooth terms and/or separable constraints. When nonseparable constraints exist, BCU can be applied under primal-dual settings. In the literature, it has been shown that for weakly convex problems with a nonseparable linear constraint, BCU with a fully Gauss-Seidel updating rule may fail to converge, while BCU with a fully Jacobian rule can converge sublinearly. However, empirically the method with the Jacobian update is usually slower than that with the Gauss-Seidel rule. To maintain the advantages of both, we propose a hybrid Jacobian and Gauss-Seidel BCU method for solving linearly constrained multi-block structured convex programming, where the objective may have a nonseparable quadratic term and separable nonsmooth terms. At each primal block variable update, the method approximates the augmented Lagrangian function at an affine combination of the previous two iterates, and an affine mixing matrix with the desired properties can be chosen by solving a semidefinite program. We show that the hybrid method enjoys the same theoretical convergence guarantee as Jacobian BCU. In addition, we numerically demonstrate that the method can perform as well as the Gauss-Seidel method and better than a recently proposed randomized primal-dual BCU method.

1801.00410 2026-06-04 math.OC cs.SY eess.SY stat.OT

Enhanced ${q}$-Least Mean Square

增强的 q-最小均方算法

Shujaat Khan, Alishba Sadiq, Imran Naseem, Roberto Togneri, Mohammed Bennamoun

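AI summary: This paper proposes a new stochastic gradient algorithm based on $q$-calculus. By introducing a time-varying $q$ parameter it improves on the $q$-LMS algorithm, achieving high convergence, stability, and low steady-state error.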

详情
AI中文摘要

本文提出了一种基于 q-微积分的新类随机梯度算法。与现有 q-LMS 算法不同,所提出的方法充分利用 q-微积分的概念,通过引入时间变化的 q 参数。所提出的增强 q-LMS (Eq-LMS) 算法利用新颖的误差相关能量概念和信号归一化,以确保高收敛性、稳定性和低稳态误差。所提出算法自动根据误差调整学习率。为了评估目的,考虑了系统识别问题。大量实验表明,所提出的 Eq-LMS 算法相比标准 q-LMS 方法具有更好的性能。

英文摘要

In this work, a new class of stochastic gradient algorithm is developed based on $q$-calculus. Unlike the existing $q$-LMS algorithm, the proposed approach fully utilizes the concept of $q$-calculus by incorporating a time-varying $q$ parameter. The proposed enhanced $q$-LMS ($Eq$-LMS) algorithm utilizes a novel, parameterless concept of error-correlation energy and normalization of the signal to ensure high convergence, stability and low steady-state error. The proposed algorithm automatically adapts the learning rate with respect to the error. For evaluation, the system identification problem is considered. Extensive experiments show better performance of the proposed $Eq$-LMS algorithm compared to the standard $q$-LMS approach.
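
The abstract relies on $q$-calculus without defining it. As background, the sketch below implements the standard Jackson $q$-derivative, D_q f(x) = (f(qx) - f(x)) / ((q - 1) x), and evaluates it on a quadratic. It does not reproduce the $Eq$-LMS update or its time-varying rule for $q$, which the abstract does not specify; the function names are hypothetical.

```python
def q_derivative(f, x, q):
    """Jackson q-derivative of f at x, defined for x != 0 and q != 1."""
    if x == 0 or q == 1:
        raise ValueError("the q-derivative needs x != 0 and q != 1")
    return (f(q * x) - f(x)) / ((q - 1.0) * x)


def square(x):
    return x * x


# For f(x) = x^2 the q-derivative is (q + 1) * x, versus 2 * x for the
# ordinary derivative, so q > 1 gives a steeper slope on this quadratic.
slope_q2 = q_derivative(square, 3.0, 2.0)               # (2 + 1) * 3
slope_near_1 = q_derivative(square, 3.0, 1.0 + 1e-6)    # approaches 2 * 3
```

As $q$ tends to 1 the $q$-derivative reduces to the ordinary derivative, which is why $q$ acts as a tunable knob on the effective step taken by a $q$-gradient method.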

1710.09668 2026-06-04 math.NA cs.LG cs.NA cs.NE stat.ML

PDE-Net: Learning PDEs from Data

PDE-Net:从数据中学习偏微分方程

Zichao Long, Yiping Lu, Xianzhong Ma, Bin Dong

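AI summary: This paper proposes PDE-Net, which learns differential operators by learning convolution kernels while approximating the unknown nonlinear response, giving a flexible way to uncover the dynamics and hidden PDE models of complex systems.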

详情
Comments
15 pages, 13 figures
AI中文摘要

本文提出了一种新的前馈深度网络PDE-Net，旨在同时准确预测复杂系统的动力学并揭示隐藏的PDE模型。PDE-Net通过学习卷积核（滤波器）来学习微分算子，并利用神经网络或其他机器学习方法近似未知的非线性响应。与现有方法相比，我们的方法同时学习微分算子和非线性响应，因而具有最大的灵活性。PDE-Net的特殊之处在于所有滤波器都受到适当约束，这使我们能够轻松识别控制方程（PDE）模型，同时保持网络的表达力和预测能力。这些约束是通过充分利用微分算子的阶数与滤波器和规则（sum rules，源自小波理论的重要概念）的阶数之间的关系精心设计的。我们还讨论了PDE-Net与计算机视觉中一些现有网络（如Network-In-Network (NIN) 和 Residual Neural Network (ResNet)）的关系。数值实验表明，PDE-Net有潜力揭示观测动态背后隐藏的PDE，并且即使在噪声环境中也能预测相对较长时间内的动态行为。

英文摘要

In this paper, we present an initial attempt to learn evolution PDEs from data. Inspired by the latest development of neural network designs in deep learning, we propose a new feed-forward deep network, called PDE-Net, to fulfill two objectives at the same time: to accurately predict dynamics of complex systems and to uncover the underlying hidden PDE models. The basic idea of the proposed PDE-Net is to learn differential operators by learning convolution kernels (filters), and apply neural networks or other machine learning methods to approximate the unknown nonlinear responses. Compared with existing approaches, which either assume the form of the nonlinear response is known or fix certain finite difference approximations of differential operators, our approach has the most flexibility by learning both differential operators and the nonlinear responses. A special feature of the proposed PDE-Net is that all filters are properly constrained, which enables us to easily identify the governing PDE models while still maintaining the expressive and predictive power of the network. These constraints are carefully designed by fully exploiting the relation between the orders of differential operators and the orders of sum rules of filters (an important concept originating from wavelet theory). We also discuss relations of the PDE-Net with some existing networks in computer vision such as Network-In-Network (NIN) and Residual Neural Network (ResNet). Numerical experiments show that the PDE-Net has the potential to uncover the hidden PDE of the observed dynamics, and predict the dynamical behavior for a relatively long time, even in a noisy environment.

1609.00048 2026-06-04 math.NA cs.DS cs.NA stat.CO stat.ML

Practical sketching algorithms for low-rank matrix approximation

实用的低秩矩阵近似 sketching 算法

Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher

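AI summary: This paper presents a suite of algorithms that construct low-rank approximations of an input matrix from a sketch, a random linear image of the matrix. The methods preserve structural properties of the matrix, produce approximations of a specified rank, and are numerically stable and provably correct.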

详情
Journal ref
SIAM J. Matrix Analysis and Applications, Vol. 38, num. 4, pp. 1454-1485, Dec. 2017
AI中文摘要

本文描述了一套用于构造输入矩阵低秩近似的算法,这些算法通过随机线性图像生成 sketch 来实现。这些方法可以保留输入矩阵的结构特性,如正定性,并能生成用户指定秩的近似。这些算法简单、准确、数值稳定且具有理论保证。此外,每种方法都配有信息丰富的误差界,允许用户在事前选择参数以达到所需的近似质量。这些声明通过真实和合成数据的数值实验得到支持。

英文摘要

This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.

1405.6397 2026-06-04 stat.CO cs.NA cs.SY eess.SY math.NA

Efficient Evaluation of the Probability Density Function of a Wrapped Normal Distribution

高效评估包裹正态分布的概率密度函数

Gerhard Kurz, Igor Gilitschenski, Uwe D. Hanebeck

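AI summary: This paper studies truncated series representations of the probability density function of the wrapped normal distribution and finds that a small number of summands gives high accuracy across the full range of uncertainties.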

详情
AI中文摘要

当一维正态分布的密度绕圆周无限多次缠绕时，便得到包裹正态分布。乍看之下，由于涉及无穷级数，其概率密度函数的计算显得繁琐。本文研究两种截断级数表示的计算。由于一种表示在不确定性较小时表现良好，另一种在不确定性较大时表现良好，我们证明在所有情况下只需少量求和项即可达到高精度。

英文摘要

The wrapped normal distribution arises when the density of a one-dimensional normal distribution is wrapped around the circle infinitely many times. At first look, evaluation of its probability density function appears tedious as an infinite series is involved. In this paper, we investigate the evaluation of two truncated series representations. As one representation performs well for small uncertainties whereas the other performs well for large uncertainties, we show that in all cases a small number of summands is sufficient to achieve high accuracy.
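
The two series can be written down directly. The sketch below uses the standard textbook forms, a sum of shifted Gaussians and a Fourier (cosine) series with coefficients exp(-n^2 sigma^2 / 2); that these are exactly the two representations studied in the paper is an assumption, and the function names and truncation limits are chosen for illustration.

```python
import math


def wn_pdf_wrapped(theta, mu, sigma, k_max):
    """Sum of shifted Gaussians; few terms suffice when sigma is small."""
    total = 0.0
    for k in range(-k_max, k_max + 1):
        z = theta - mu + 2.0 * math.pi * k
        total += math.exp(-z * z / (2.0 * sigma * sigma))
    return total / (sigma * math.sqrt(2.0 * math.pi))


def wn_pdf_fourier(theta, mu, sigma, n_max):
    """Cosine series; few terms suffice when sigma is large."""
    total = 1.0
    for n in range(1, n_max + 1):
        coeff = math.exp(-0.5 * n * n * sigma * sigma)
        total += 2.0 * coeff * math.cos(n * (theta - mu))
    return total / (2.0 * math.pi)


# Moderate uncertainty: both series converge quickly and agree.
p_wrapped = wn_pdf_wrapped(0.7, 0.2, 1.0, k_max=5)
p_fourier = wn_pdf_fourier(0.7, 0.2, 1.0, n_max=12)

# Small uncertainty: three Gaussian terms match a 200-term Fourier sum.
p_small_wrapped = wn_pdf_wrapped(0.7, 0.2, 0.1, k_max=1)
p_small_fourier = wn_pdf_fourier(0.7, 0.2, 0.1, n_max=200)
```

The small-uncertainty case shows the trade-off the abstract describes: the Gaussian sum needs only three terms there, while the cosine series needs on the order of a hundred before its coefficients become negligible.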

1612.03974 2026-06-04 stat.ME cs.NA math.NA

A self-calibrating method for heavy tailed data modelling. Application in neuroscience and finance

针对重尾数据建模的自校准方法。应用于神经科学和金融领域

Nehla Debbabi, Marie Kratz, Mamadou Mboup

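AI summary: This paper proposes a self-calibrating method for modelling multi-component data with heavy tails, using a hybrid distribution and an unsupervised algorithm, and validates it on data from neuroscience and finance.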

详情
Comments
30 pages, 9 figures, 11 tables
AI中文摘要

对非均匀和多成分数据建模是挑战科学研究人员的难题。通常无法找到简单的闭合形式概率模型来描述此类数据。因此,人们常采用非参数方法。然而,当多个成分可分离时,参数建模变得可行。本文提出了一种自校准方法,用于建模具有重尾特性的多成分数据。我们引入了一个三成分混合分布:高斯分布通过指数分布与广义帕累托分布连接,从而弥合均值和尾部行为之间的差距。然后开发了一个无监督算法来估计该模型的参数。我们分析和数值研究了其收敛性。在模拟数据上测试了自校准方法的有效性,随后应用于神经科学和金融领域的实际数据。与其他标准极值理论方法的比较证实了该新方法的相关性和实用性。

英文摘要

Modelling non-homogeneous and multi-component data is a problem that challenges scientific researchers in several fields. In general, it is not possible to find a simple and closed form probabilistic model to describe such data. That is why one often resorts to non-parametric approaches. However, when the multiple components are separable, parametric modelling becomes again tractable. In this study, we propose a self-calibrating method to model multi-component data that exhibit heavy tails. We introduce a three-component hybrid distribution: a Gaussian distribution is linked to a Generalized Pareto one via an exponential distribution that bridges the gap between mean and tail behaviors. An unsupervised algorithm is then developed for estimating the parameters of this model. We study analytically and numerically its convergence. The effectiveness of the self-calibrating method is tested on simulated data, before applying it to real data from neuroscience and finance, respectively. A comparison with other standard Extreme Value Theory approaches confirms the relevance and the practical advantage of this new method.

1710.10737 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

Linearly convergent stochastic heavy ball method for minimizing generalization error

用于最小化泛化误差的线性收敛随机重球方法

Nicolas Loizou, Peter Richtárik

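AI summary: This paper proves the first linear convergence result for the stochastic heavy ball method, which takes SGD steps with a fixed stepsize plus a heavy ball momentum term. The analysis targets minimization of the expected loss rather than finite-sum minimization.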

详情
Comments
NIPS 2017, Workshop on Optimization for Machine Learning (camera ready version)
AI中文摘要

本文首次建立了随机重球方法的线性收敛性结果。该方法执行固定步长的SGD步骤，并辅以重球动量项。在分析中，我们专注于最小化期望损失，而非通常更困难的有限和最小化问题。尽管分析中我们限制在二次损失下，但总体目标不一定是强凸的。

英文摘要

In this work we establish the first linear convergence result for the stochastic heavy ball method. The method performs SGD steps with a fixed stepsize, amended by a heavy ball momentum term. In the analysis, we focus on minimizing the expected loss and not on finite-sum minimization, which is typically a much harder problem. While in the analysis we constrain ourselves to quadratic loss, the overall objective is not necessarily strongly convex.
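
The iteration described in the abstract, an SGD step with a fixed stepsize plus a heavy ball momentum term, can be sketched on a small consistent least-squares problem. The per-sample loss f_i(x) = 0.5 * (<a_i, x> - b_i)^2 / ||a_i||^2, the stepsize 0.5, and the momentum 0.3 are illustrative assumptions, not values taken from the paper.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def stochastic_heavy_ball(A, b, x0, stepsize, momentum, iters, rng):
    """x_{k+1} = x_k - stepsize * grad f_i(x_k) + momentum * (x_k - x_{k-1})."""
    x_prev = list(x0)
    x = list(x0)
    for _ in range(iters):
        i = rng.randrange(len(A))
        a = A[i]
        # Gradient of f_i(x) = 0.5 * (<a_i, x> - b_i)^2 / ||a_i||^2 is g * a_i.
        g = (dot(a, x) - b[i]) / dot(a, a)
        x_next = [
            xj - stepsize * g * aj + momentum * (xj - pj)
            for xj, aj, pj in zip(x, a, x_prev)
        ]
        x_prev, x = x, x_next
    return x


rng = random.Random(0)
n, m = 3, 20
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
b = [dot(a, x_true) for a in A]  # consistent system: every f_i vanishes at x_true

x_hat = stochastic_heavy_ball(
    A, b, [0.0] * n, stepsize=0.5, momentum=0.3, iters=3000, rng=rng
)
max_error = max(abs(u - v) for u, v in zip(x_hat, x_true))
```

Because every sample loss is minimized at the same point, the stochastic gradients vanish at the solution, so a fixed stepsize does not leave a noise floor in this toy problem.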

1707.06837 2026-06-04 stat.ME econ.GN q-fin.EC q-fin.ST

An Alternative Estimation Method of a Time-Varying Parameter Model

时间变化参数模型的一种替代估计方法

Mikio Ito, Akihiko Noda, Tatsuma Wada

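AI summary: This paper proposes a non-Bayesian, regression-based or generalized least squares approach to estimating time-varying AR parameter models. It needs no Kalman filtering and can handle stochastic volatility models and models with non-Gaussian errors.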

详情
Comments
35 pages, 6 tables, 4 figures
AI中文摘要

本文正式提出了一种非贝叶斯的、基于回归或广义最小二乘（GLS）的方法，用于估计一类时变AR参数模型。该方法曾被Ito等人（2014, 2016a,b）部分使用，并被证明是高效的，因为它不像传统方法那样需要卡尔曼滤波和平滑过程，却能给出与卡尔曼平滑估计完全相同的平滑估计。与最大似然估计量不同，该方法出现堆积（pile-up）问题的可能性可以忽略。此外，该方法使我们能够处理随机波动模型、具有时变方差-协方差矩阵的模型，以及具有非高斯误差的模型，后者可用于处理时变参数中的突变或结构性断点。

英文摘要

A non-Bayesian, regression-based or generalized least squares (GLS)-based approach is formally proposed to estimate a class of time-varying AR parameter models. This approach has partly been used by Ito et al. (2014, 2016a,b), and is proven to be efficient because, unlike conventional methods, it does not require Kalman filtering and smoothing procedures, but yields a smoothed estimate that is identical to the Kalman-smoothed estimate. Unlike the maximum likelihood estimator, the possibility of the pile-up problem is negligible. In addition, this approach enables us to deal with stochastic volatility models, models with a time-dependent variance-covariance matrix, and models with non-Gaussian errors that allow us to deal with abrupt changes or structural breaks in time-varying parameters.

1712.06947 2026-06-04 physics.chem-ph cs.NA math.NA stat.CO

Methodological and computational aspects of parallel tempering methods in the infinite swapping limit

并行回火方法在无限交换极限下的方法论与计算方面

Jianfeng Lu, Eric Vanden-Eijnden

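AI summary: This paper proposes a variant of parallel tempering formulated as a stochastic switching process, analyzes its convergence by large deviation theory, and shows that operating in the infinite swapping limit maximizes sampling efficiency. It also discusses how to choose the temperature ladder and how to simulate with many temperatures.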

详情
AI中文摘要

本文以随机切换过程的形式提出了并行回火方法的一种变体，用于描述副本构型与温度置换的耦合动力学。该表述便于用大偏差理论分析并行回火的收敛性质，分析表明应在无限交换极限下运行该方法以最大化采样效率。在该极限下，仅关于副本的有效方程只需将原始势能替换为混合势能。对该势能几何性质的分析，为如何选择温度阶梯以及为何通常需要引入多个温度来提高采样效率提供了新的视角。本文还展示了如何在多温度情形下使用多尺度积分器模拟有效方程。最后，类似的思想也被用于讨论将无限交换极限推广到模拟回火（simulated tempering）技术。

英文摘要

A variant of the parallel tempering method is proposed in terms of a stochastic switching process for the coupled dynamics of replica configuration and temperature permutation. This formulation is shown to facilitate the analysis of the convergence properties of parallel tempering by large deviation theory, which indicates that the method should be operated in the infinite swapping limit to maximize sampling efficiency. The effective equation for the replica alone that arises in this infinite swapping limit simply involves replacing the original potential by a mixture potential. The analysis of the geometric properties of this potential offers a new perspective on the issues of how to choose the temperature ladder, and why many temperatures should typically be introduced to boost the sampling efficiency. It is also shown how to simulate the effective equation in this many-temperature regime using multiscale integrators. Finally, similar ideas are also used to discuss extensions of the infinite swapping limit to the technique of simulated tempering.

1507.01729 2026-06-04 stat.ME econ.GN q-fin.EC q-fin.ST

Measuring the frequency dynamics of financial connectedness and systemic risk

测量金融关联性与系统性风险的频率动态

Jozef Barunik, Tomas Krehlik

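AI summary: This paper proposes a new framework for measuring connectedness among financial variables that arises from heterogeneous frequency responses to shocks. It estimates connectedness in short-, medium-, and long-term financial cycles through the spectral representation of variance decompositions, and documents rich time-frequency dynamics of volatility connectedness in US financial institutions.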

详情
AI中文摘要

我们提出了一种新的框架,用于测量由于对冲击的异质频率响应而产生的金融变量之间的关联性。为了估计短期、中期和长期金融周期中的关联性,我们引入基于方差分解频谱表示的框架。在实证应用中,我们记录了美国金融机构波动性关联性的丰富时频动态。经济上,高频率产生关联性的时期表明股票市场似乎快速而冷静地处理信息,且系统中一个资产的冲击主要在短期内产生影响。当关联性在较低频率下产生时,这表明冲击是持久的,并且会被更长时间地传递。

英文摘要

We propose a new framework for measuring connectedness among financial variables that arises due to heterogeneous frequency responses to shocks. To estimate connectedness in short-, medium-, and long-term financial cycles, we introduce a framework based on the spectral representation of variance decompositions. In an empirical application, we document the rich time-frequency dynamics of volatility connectedness in US financial institutions. Economically, periods in which connectedness is created at high frequencies are periods when stock markets seem to process information rapidly and calmly, and a shock to one asset in the system will have an impact mainly in the short term. When the connectedness is created at lower frequencies, it suggests that shocks are persistent and are being transmitted for longer periods.

1712.02860 2026-06-04 stat.AP cs.CE econ.GN math.OC math.PR q-fin.EC

Remarks on Bayesian Control Charts

关于贝叶斯控制图的评论

Amir Ahmadi-Javid, Mohsen Ebadi

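AI summary: This note points out that existing Bayesian control charts cannot be optimal in general, because Taylor (1965) provided an analytical counterexample many years ago, which challenges claims of their economic optimality.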

详情
AI中文摘要

在单变量和多变量过程中,大量研究关注利用贝叶斯控制图检测从良好质量分布到劣质质量分布的转移。普遍认为贝叶斯控制图在经济上最优,如Calabrese(1995)和Makis(2008)的研究。部分研究者将基于后验概率的控制优化扩展到部分可观测马尔可夫决策过程。本文指出现有贝叶斯控制图不能普遍最优,因早年Taylor(1965)已提供反例。

英文摘要

There is a considerable amount of ongoing research on the use of Bayesian control charts for detecting a shift from a good quality distribution to a bad quality distribution in univariate and multivariate processes. It is widely claimed that Bayesian control charts are economically optimal; see, for example, Calabrese (1995) [Bayesian process control for attributes. Management Science, DOI: 10.1287/mnsc.41.4.637] and Makis (2008) [Multivariate Bayesian control chart. Operations Research, DOI: 10.1287/opre.1070.0495]. Some researchers also generalize the optimality of controls defined based on posterior probabilities to the class of partially observable Markov decision processes. This note points out that the existing Bayesian control charts cannot generally be optimal because many years ago an analytical counterexample was provided by Taylor (1965) [Markovian sequential replacement processes. The Annals of Mathematical Statistics, DOI: 10.1214/aoms/1177699796].

1712.02441 2026-06-04 eess.SY cs.SY stat.ML

A Novel Model for Arbitration between Planning and Habitual Control Systems

一种用于规划与习惯控制系统的仲裁新模型

Farzaneh S. Fard, Thomas P. Trappenberg

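AI summary: This paper proposes an arbitration model that combines planning and habitual control, and demonstrates on a simulated two-joint robotic arm task that it is efficient and robust in changing environments.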

详情
AI中文摘要

已知人类决策和工具控制使用多种系统,其中一些使用习惯性动作选择,而另一些则需要有意识的规划。有意识的规划系统利用内部环境模型预测动作结果,而习惯性动作选择系统通过重复先前奖励的动作来学习自动化。习惯性控制计算效率高但可能在变化环境中不够灵活。相反,有意识的规划可能计算成本高但更灵活。本文提出了一种包含两种控制范式的通用架构,通过引入仲裁器来控制何时使用哪个子系统。该系统在目标到达任务中实现了监督内部模型和深度强化学习。通过排列目标到达条件,我们证明了所提出的模型能够快速学习系统的动力学,且在环境奖励和动力学变化以及视觉遮挡条件下具有鲁棒性。仲裁器模型与仅使用内部模型的有意识规划和仅使用习惯性控制的实例进行了比较。结果表明,此类模型能够结合两种系统的优点,在可靠情况下做出快速决策,在变化环境中优化性能。此外,所提出的模型学习速度非常快。最后,包含内部模型的系统能够在视觉遮挡下达到目标,而纯习惯性系统在这样的条件下无法有效运行。

英文摘要

It is well established that human decision making and instrumental control use multiple systems, some of which use habitual action selection and some of which require deliberate planning. Deliberate planning systems predict action outcomes using an internal model of the agent's environment, while habitual action selection systems learn to automate by repeating previously rewarded actions. Habitual control is computationally efficient but may be inflexible in changing environments. Conversely, deliberate planning may be computationally expensive, but flexible in dynamic environments. This paper proposes a general architecture comprising both control paradigms by introducing an arbitrator that controls which subsystem is used at any time. This system is implemented for a target-reaching task with a simulated two-joint robotic arm and comprises a supervised internal model and deep reinforcement learning. Through permutation of target-reaching conditions, we demonstrate that the proposed model is capable of rapidly learning the kinematics of the system without a priori knowledge, and is robust to (A) changing environmental reward and kinematics, and (B) occluded vision. The arbitrator model is compared to exclusive deliberate planning with the internal model and exclusive habitual control instances of the model. The results show how such a model can harness the benefits of both systems, using fast decisions in reliable circumstances while optimizing performance in changing environments. In addition, the proposed model learns very fast. Finally, the system which includes internal models is able to reach the target under visual occlusion, while the pure habitual system is unable to operate sufficiently under such conditions.

1712.02083 2026-06-04 cs.IT cs.NA math.IT math.NA math.OC stat.ML

A Local Analysis of Block Coordinate Descent for Gaussian Phase Retrieval

块坐标下降法在高斯相位恢复中的局部分析

David Barmherzig, Ju Sun

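AI summary: This paper studies the convergence of block coordinate descent on a nonconvex problem. Using the special structure of Gaussian phase retrieval, it proves that the method converges to the global minimizer at a linear rate from a deterministically achievable initialization.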

详情
Comments
10th NIPS Workshop on Optimization for Machine Learning (NIPS 2017)
AI中文摘要

尽管交替方向乘子法(ADMM)在凸问题上的收敛性已被广泛研究,但其在非凸问题上的收敛性仅部分理解。本文考虑了高斯相位恢复问题,将其形式化为具有双凸目标的线性约束优化问题。该问题的特殊结构允许ADMM的新应用。可以证明,在全局极小值处对偶变量为零。这促使对块坐标下降算法的分析,该算法等价于对偶变量固定为零的ADMM。我们证明,当从可确定性初始化点开始时,块坐标下降算法以线性速率收敛到全局极小值。

英文摘要

While convergence of the Alternating Direction Method of Multipliers (ADMM) on convex problems is well studied, convergence on nonconvex problems is only partially understood. In this paper, we consider the Gaussian phase retrieval problem, formulated as a linearly constrained optimization problem with a biconvex objective. The particular structure allows for a novel application of the ADMM. It can be shown that the dual variable is zero at the global minimizer. This motivates the analysis of a block coordinate descent algorithm, which is equivalent to the ADMM with the dual variable fixed to be zero. We show that the block coordinate descent algorithm converges to the global minimizer at a linear rate, when starting from a deterministically achievable initialization point.

1705.08656 2026-06-04 stat.CO cs.NA math.NA stat.ME

Efficient Covariance Approximations for Large Sparse Precision Matrices

大规模稀疏精度矩阵的高效协方差近似

Per Sidén, Finn Lindgren, David Bolin, Mattias Villani

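AI summary: This paper proposes a fast Rao-Blackwellized Monte Carlo sampling method for efficiently approximating selected elements of the covariance matrix, plus a method that iterates over subdomains to reduce the error further, for joint inference in high-dimensional models.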

详情
AI中文摘要

稀疏精度矩阵（逆协方差矩阵）之所以流行，是因为它们使高维模型中的联合推断可以采用高效算法。许多应用需要计算协方差矩阵的某些元素，例如边际方差，而在维数很高时这些元素可能难以获得。本文介绍了一种基于快速Rao-Blackwell化蒙特卡罗采样的方法，用于高效近似协方差矩阵的选定元素。近似值的方差和置信界可以在不增加计算成本的情况下精确估计。此外，本文提出了一种在子域上迭代的方法，并在功能性磁共振成像数据的应用中表明，该方法能进一步将近似误差降低到实际可忽略的水平。两种方法的内存需求都很低，而内存通常正是同类直接方法的瓶颈。

英文摘要

The use of sparse precision (inverse covariance) matrices has become popular because they allow for efficient algorithms for joint inference in high-dimensional models. Many applications require the computation of certain elements of the covariance matrix, such as the marginal variances, which may be non-trivial to obtain when the dimension is large. This paper introduces a fast Rao-Blackwellized Monte Carlo sampling based method for efficiently approximating selected elements of the covariance matrix. The variance and confidence bounds of the approximations can be precisely estimated without additional computational costs. Furthermore, a method that iterates over subdomains is introduced, and is shown to additionally reduce the approximation errors to practically negligible levels in an application on functional magnetic resonance imaging data. Both methods have low memory requirements, which is typically the bottleneck for competing direct methods.
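
A Rao-Blackwellized estimate of marginal variances can be sketched from the law of total variance: for a zero-mean Gaussian with precision matrix Q, Var(x_i) = 1/Q_ii + Var(E[x_i | x_-i]), where E[x_i | x_-i] = -(1/Q_ii) * sum over j != i of Q_ij x_j. The code below estimates the second term from samples. It is a dense toy version written for illustration; the 3x3 matrix and function names are assumptions, and the paper's actual estimator and sparse solvers may differ.

```python
import math
import random


def cholesky(Q):
    """Lower-triangular L with Q = L L^T (dense; real codes use sparse Cholesky)."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(Q[i][i] - s)
            else:
                L[i][j] = (Q[i][j] - s) / L[j][j]
    return L


def sample_gaussian(L, rng):
    """Draw x ~ N(0, Q^{-1}) by solving L^T x = z with z ~ N(0, I)."""
    n = len(L)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / L[i][i]
    return x


def rb_marginal_variances(Q, n_samples, rng):
    """Estimate diag(Q^{-1}) as 1/Q_ii plus the sample mean of E[x_i | x_-i]^2."""
    n = len(Q)
    L = cholesky(Q)
    acc = [0.0] * n
    for _ in range(n_samples):
        x = sample_gaussian(L, rng)
        for i in range(n):
            cond_mean = -sum(Q[i][j] * x[j] for j in range(n) if j != i) / Q[i][i]
            acc[i] += cond_mean ** 2
    return [1.0 / Q[i][i] + acc[i] / n_samples for i in range(n)]


# Tridiagonal toy precision matrix; the exact diagonal of its inverse is (0.75, 1.0, 0.75).
Q = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
variances = rb_marginal_variances(Q, n_samples=5000, rng=random.Random(1))
```

Only the second term is estimated by sampling, and with a sparse Q each conditional mean touches only the neighbours of node i, which is where the low cost and low memory use come from.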

1705.00170 2026-06-04 math.PR cs.NA math-ph math.MP math.NA math.ST stat.TH

Using Perturbed Underdamped Langevin Dynamics to Efficiently Sample from Probability Distributions

利用扰动的非平衡兰格朗日动力学高效采样概率分布

A. B. Duncan, N. Nuesken, G. A. Pavliotis

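AI summary: This paper introduces and analyzes samplers built from perturbations of the standard underdamped Langevin dynamics. Suitable choices of the perturbation reduce the asymptotic variance. The paper gives a detailed analysis for Gaussian target distributions and supports the theory with numerical experiments on non-Gaussian targets.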

详情
Comments
45 pages, 4 figures
AI中文摘要

在本文中,我们引入并分析了由标准非平衡兰格朗日动力学扰动构成的兰格朗日采样器。扰动动力学的不变测度与未扰动动力学相同。我们展示适当选择扰动可使采样器具有改进的性质,至少在减少渐近方差方面。我们对新的兰格朗日采样器在高斯目标分布上的进行了详细分析。我们的理论结果通过非高斯目标测度的数值实验得到支持。

英文摘要

In this paper we introduce and analyse Langevin samplers that consist of perturbations of the standard underdamped Langevin dynamics. The perturbed dynamics is such that its invariant measure is the same as that of the unperturbed dynamics. We show that appropriate choices of the perturbations can lead to samplers that have improved properties, at least in terms of reducing the asymptotic variance. We present a detailed analysis of the new Langevin sampler for Gaussian target distributions. Our theoretical results are supported by numerical experiments with non-Gaussian target measures.

1712.00602 2026-06-04 econ.GN q-fin.EC stat.AP

An Inverse Problem Study: Credit Risk Ratings as a Determinant of Corporate Governance and Capital Structure in Emerging Markets: Evidence from Chinese Listed Companies

逆问题研究:信用风险评级对企业治理和资本结构的影响:以中国上市公司为证据

ManYing Kang, Marcel Ausloos

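AI summary: This paper studies the effect of credit ratings on corporate governance and capital structure using data on Chinese listed companies. Credit rating is positively related to outside directors, firm size, and other factors, is only weakly associated with leverage, and the non-debt tax shield is significantly correlated with leverage.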

详情
Journal ref
Economies 2017, 5, 41
Comments
25 pages, 6 tables
AI中文摘要

信用风险评级被证明是估计良好企业治理和自我优化资本结构的相关决定因素。结论基于对上海证券交易所和深圳证券交易所182家上市公司(2010-2015年)的研究,这些公司使用上海 Brilliance 信用评级与投资者服务公司的评估标准。实际上,研究了三个债务比率与11个特征变量的关系。任何信用评级与企业治理之间的关系可能是一个有趣的发现。信用评级与杠杆率的关系不如其他国家的研究明显,但与外部董事、企业规模、实物资产和企业年龄、CEO和董事长兼任正相关。然而,杠杆率与董事会规模、盈利能力、增长机会和非债务税盾负相关。信用评级与杠杆率正相关,但相关性较弱。CEO-董事会主席兼任与杠杆率无显著关系。非债务税盾与杠杆率显著相关。CEO兼任与审计师的相关系数为正但弱显著,似乎与预期不一致。最后,盈利能力可能是一个有趣的发现。确实,盈利能力与总债务呈负相关(注意,结果支持啄序理论)。结论是,信用评级对所列大型中国公司的影响比其他国家小。然而,通过相关机构评估信用风险评级的视角无疑是一个推荐的时间依赖性杠杆决定因素。

英文摘要

Credit risk rating is shown to be a relevant determinant for estimating good corporate governance and for self-optimizing capital structure. The conclusion is drawn from a study of a selected (and justified) sample of 182 companies listed on the Shanghai Stock Exchange and the Shenzhen Stock Exchange, all of which use the same Shanghai Brilliance Credit Rating & Investors Service Company assessment criteria for their credit ratings, from 2010 to 2015. In practice, 3 debt ratios are examined in terms of 11 characteristic variables. Moreover, any relationship between credit rating and corporate governance can be regarded as an interesting finding. The relationship between credit rating and leverage is not as evident as that found by other researchers from different countries; it is significantly positively related to the outside director, firm size, tangible assets and firm age, and CEO and chairman office plurality. However, leverage is found to be negatively correlated with board size, profitability, growth opportunity, and non-debt tax shield. Credit rating is positively associated with leverage, but in a less significant way. CEO-Board chairship duality is insignificantly related to leverage. The non-debt tax shield is significantly correlated with leverage. The correlation coefficient between CEO duality and auditor is positive but weakly significant, which seems inconsistent with expectations. Finally, the role of profitability can be regarded as an interesting finding: there is an inverse correlation between profitability and total debt (note that this result supports the pecking order theory). In conclusion, it appears that credit rating has less effect on the listed large Chinese companies than in other countries. Nevertheless, assessing credit risk rating through relevant agencies is indubitably a recommended time-dependent determinant of leverage.

1711.02213 2026-06-04 cs.LG cs.NA math.NA stat.ML

Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks

Flexpoint:一种适应性数值格式,用于高效训练深度神经网络

Urs Köster, Tristan J. Webb, Xin Wang, Marcel Nassar, Arjun K. Bansal, William H. Constable, Oğuz H. Elibol, Scott Gray, Stewart Hall, Luke Hornof, Amir Khosrowshahi, Carey Kloss, Ruby J. Pai, Naveen Rao

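AI summary: Flexpoint is an adaptive numerical format for efficient training of deep neural networks. A shared exponent is adjusted dynamically to minimize overflows and maximize the available dynamic range, and in experiments 16-bit Flexpoint closely matches 32-bit floating point when training AlexNet, a deep residual network, and a generative adversarial network.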

详情
Comments
14 pages, 5 figures, accepted in Neural Information Processing Systems 2017
AI中文摘要

深度神经网络通常在32位浮点格式下开发和训练。通过在训练和推理中使用优化于深度学习的数值格式,可以实现性能和能效的显著提升。尽管近年来在有限精度推理方面取得了进展,但以低比特宽度训练神经网络仍是一个挑战。本文提出了Flexpoint数据格式,旨在完全取代32位浮点格式的训练和推理,支持现代深度网络拓扑而不需修改。Flexpoint张量具有共享的指数,该指数动态调整以最小化溢出并最大化可用动态范围。我们通过使用neon深度学习框架实现的模拟器验证了Flexpoint,证明在训练AlexNet、深度残差网络和生成对抗网络时,16位Flexpoint在不需调整模型超参数的情况下,性能接近32位浮点数。我们的结果表明,Flexpoint是一种有前途的数值格式,适用于未来用于训练和推理的硬件。

英文摘要

Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the Flexpoint data format, aiming at a complete replacement of 32-bit floating point format training and inference, designed to support modern deep network topologies without modifications. Flexpoint tensors have a shared exponent that is dynamically adjusted to minimize overflows and maximize available dynamic range. We validate Flexpoint by training AlexNet, a deep residual network and a generative adversarial network, using a simulator implemented with the neon deep learning framework. We demonstrate that 16-bit Flexpoint closely matches 32-bit floating point in training all three models, without any need for tuning of model hyperparameters. Our results suggest Flexpoint as a promising numerical format for future hardware for training and inference.
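
The shared-exponent idea described in the abstract above can be illustrated with a minimal sketch. It is an assumption-laden toy, not the Flexpoint format itself: the functions `flex_encode` and `flex_decode` are hypothetical names, and the exponent is chosen from the current maximum magnitude of the values, whereas the abstract only states that the exponent is adjusted dynamically.

```python
import math


def flex_encode(values, mantissa_bits=16):
    """Quantize floats to signed integer mantissas that share one exponent."""
    max_abs = max(abs(v) for v in values)
    max_int = 2 ** (mantissa_bits - 1) - 1  # largest signed mantissa
    if max_abs == 0.0:
        return [0] * len(values), 0
    # Smallest exponent e such that max_abs / 2**e fits in the mantissa range:
    # this avoids overflow while using as much of the range as possible.
    exponent = math.ceil(math.log2(max_abs / max_int))
    scale = 2.0 ** exponent
    # Round to the nearest mantissa and clamp as a guard against rounding at the edge.
    mantissas = [max(-max_int, min(max_int, round(v / scale))) for v in values]
    return mantissas, exponent


def flex_decode(mantissas, exponent):
    """Map integer mantissas back to floats using the shared exponent."""
    scale = 2.0 ** exponent
    return [m * scale for m in mantissas]
```

With `values = [0.5, -1.25, 3.0, 0.001]` and 16 mantissa bits, the shared exponent is -13, so every decoded value lies within half a quantization step (2**-14) of the original. Small entries such as 0.001 lose relative precision because the exponent is set by the largest entry, which is the trade-off a shared exponent makes.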

1610.04056 2026-06-04 cs.IT cs.NA math.IT math.NA math.ST stat.TH

Estimation of linear operators from scattered impulse responses

从散射冲击响应估计线性算子

Jérémie Bigot, Paul Escande, Pierre Weiss

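AI summary: This paper proposes a new estimator, based on smoothing in reproducing kernel Hilbert spaces, for integral operators with smooth kernels. A suitable regularization term accounts for the smoothness of the operator, and the paper analyzes the estimator's robustness to noise and its approximation properties with respect to the size and geometry of the dataset.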

详情
AI中文摘要

我们提供了一种新的积分算子估计器,该估计器基于一组散射且噪声的冲击响应。所提出的方法依赖于再生核希尔伯特空间中的平滑理论,并选择适当的正则化项以考虑算子的光滑性。该方法在非常大的维度上具有数值可行性。我们研究了估计器对噪声的鲁棒性,并分析了其在数据集大小和几何结构方面的逼近性质。此外,我们还展示了所提出估计器的最小最大最优性。

英文摘要

We provide a new estimator of integral operators with smooth kernels, obtained from a set of scattered and noisy impulse responses. The proposed approach relies on the formalism of smoothing in reproducing kernel Hilbert spaces and on the choice of an appropriate regularization term that takes the smoothness of the operator into account. It is numerically tractable in very large dimensions. We study the estimator's robustness to noise and analyze its approximation properties with respect to the size and the geometry of the dataset. In addition, we show minimax optimality of the proposed estimator.

1711.10765 2026-06-04 stat.CO cs.SY eess.SY

Learning nonlinear state-space models using smooth particle-filter-based likelihood approximations

利用平滑粒子滤波似然近似学习非线性状态空间模型

Andreas Svensson, Fredrik Lindsten, Thomas B. Schön

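AI summary: This paper uses particle-filter output to build a smooth, locally deterministic approximation of the likelihood, maximizes it with a standard optimization routine, and iterates the procedure to estimate the parameters of nonlinear state-space models.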

详情
AI中文摘要

当经典粒子滤波算法用于非线性状态空间模型的最大似然参数估计时,似然函数及其导数的估计固有噪声是一个关键挑战。本文的核心思想是运行一个基于当前参数估计的粒子滤波器,然后利用该滤波器的输出重新评估其他参数值的似然函数近似。这导致了一个局部确定性似然近似,任何标准优化程序均可用于找到该局部近似的最大值。通过迭代此过程,最终获得参数估计。

英文摘要

When classical particle filtering algorithms are used for maximum likelihood parameter estimation in nonlinear state-space models, a key challenge is that estimates of the likelihood function and its derivatives are inherently noisy. The key idea in this paper is to run a particle filter based on a current parameter estimate, but then use the output from this particle filter to re-evaluate the likelihood function approximation also for other parameter values. This results in a (local) deterministic approximation of the likelihood and any standard optimization routine can be applied to find the maximum of this local approximation. By iterating this procedure we eventually arrive at a final parameter estimate.

1711.10566 2026-06-04 cs.AI cs.LG cs.NA math.AP math.NA stat.ML

Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations

物理指导深度学习(第二部分):数据驱动发现非线性偏微分方程

Maziar Raissi, Paris Perdikaris, George Em Karniadakis

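AI summary: This paper introduces physics informed neural networks, which solve supervised learning tasks while respecting given laws of physics. Part II focuses on data-driven discovery of partial differential equations, distinguishes continuous time and discrete time models, and demonstrates the approach on several benchmark problems in mathematical physics.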

详情
AI中文摘要

我们介绍了一种物理指导的神经网络——一种在解决监督学习任务时尊重由一般非线性偏微分方程描述的物理定律的神经网络。在本文第二部分中,我们专注于偏微分方程的数据驱动发现问题。根据可用数据在时空中的分布是散乱还是固定时间快照,我们引入了两种主要算法类别,即连续时间和离散时间模型。通过数学物理中的广泛基准问题,包括守恒定律、不可压缩流体流动和非线性浅水波传播,展示了我们方法的有效性。

英文摘要

We introduce physics informed neural networks -- neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this second part of our two-part treatise, we focus on the problem of data-driven discovery of partial differential equations. Depending on whether the available data is scattered in space-time or arranged in fixed temporal snapshots, we introduce two main classes of algorithms, namely continuous time and discrete time models. The effectiveness of our approach is demonstrated using a wide range of benchmark problems in mathematical physics, including conservation laws, incompressible fluid flow, and the propagation of nonlinear shallow-water waves.

1711.10561 2026-06-04 cs.AI cs.LG cs.NA math.DS math.NA stat.ML

Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations

物理引导的深度学习(第一部分):非线性偏微分方程的数据驱动求解

Maziar Raissi, Paris Perdikaris, George Em Karniadakis

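AI summary: This paper introduces physics informed neural networks, which solve supervised learning problems while respecting given laws of physics. Part I shows how these networks can be used to infer solutions of partial differential equations and to build fully differentiable physics-informed surrogate models.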

详情
AI中文摘要

我们引入了物理引导的神经网络——一种在解决监督学习任务时尊重由一般非线性偏微分方程描述的物理定律的神经网络。在本两部分论述中,我们围绕解决两类主要问题展开:数据驱动求解和数据驱动发现偏微分方程。根据可用数据的性质和安排,我们设计了两种不同的算法类别,即连续时间和离散时间模型。所得到的神经网络形成了一种新的数据高效通用函数逼近器类别,能够自然地将任何底层物理定律作为先验信息编码。在本第一部分中,我们展示了这些网络如何用于推断偏微分方程的解,并获得完全可微的物理引导替代模型,该模型对所有输入坐标和自由参数均可微分。

英文摘要

We introduce physics informed neural networks -- neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this two part treatise, we present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct classes of algorithms, namely continuous time and discrete time models. The resulting neural networks form a new class of data-efficient universal function approximators that naturally encode any underlying physical laws as prior information. In this first part, we demonstrate how these networks can be used to infer solutions to partial differential equations, and obtain physics-informed surrogate models that are fully differentiable with respect to all input coordinates and free parameters.

1603.07775 2026-06-04 eess.SY cs.SY stat.AP

The Impact of Operators' Performance in the Reliability of Cyber-Physical Power Distribution Systems

运营商性能对电网可靠性的冲击

Michel Bessani, Rodrigo Z. Fanucchi, Alexandre C. C. Delbem, Carlos D. Maciel

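AI summary: This paper studies how operator response time affects the reliability indices of cyber-physical power distribution systems, presenting a Sequential Monte Carlo Simulation methodology that incorporates the response time into the estimation of those indices.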

详情
AI中文摘要

本文研究了运营商响应时间对电网可靠性指标的影响,提出了一种基于序贯蒙特卡洛模拟的方法,评估了运营商性能对可靠性指标的作用。

英文摘要

Cyber-Physical Systems are the result of integrating information and communication technologies into physical systems. One particular case is Cyber-Physical Power Systems (CPPS), which use communication technologies to perform real-time monitoring and operation. These kinds of systems have become more complex, which affects system characteristics such as reliability. In addition, it is already known that, in terms of the reliability of Cyber-Physical Power Distribution Systems (CPPDS), failures of the communication network are just as relevant as failures of the electrical network. However, some aspects of operators' performance during CPPDS contingencies, such as response time and decision quality, have not been investigated yet. In this paper, we introduce a model of the operator response time, present a Sequential Monte Carlo Simulation methodology that incorporates the response time in the estimation of CPPDS reliability indices, and evaluate the impact of the operator response time on reliability indices. Our method is tested on a CPPDS using different values for the average response time of operators. The results show that the response time of the operators affects the reliability indices that are related to the duration of failures, indicating that a fast decision directly contributes to system performance. We conclude that improving CPPDS reliability depends not only on the electric and communication components, but also on operators' performance.

1711.08833 2026-06-04 cs.LG cs.NA math.NA stat.ML

Deep Learning for Real-Time Crime Forecasting and its Ternarization

深度学习在实时犯罪预测中的应用及其三元化

Bao Wang, Penghang Yin, Andrea L. Bertozzi, P. Jeffrey Brantingham, Stanley J. Osher, Jack Xin

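AI summary: This paper proposes a crime forecasting model based on a spatial-temporal residual network, shows that it is more accurate than several existing approaches, and uses a ternarization technique to address resource consumption in real-world deployment.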

详情
Comments
14 pages, 7 figures
AI中文摘要

实时犯罪预测至关重要。然而,准确预测下一次犯罪发生的时间和地点具有挑战性。目前尚无已知的物理模型能合理近似此类复杂系统。历史犯罪数据在空间和时间上都很稀疏,感兴趣的信号较弱。本文首先提出了一种恰当的犯罪数据表示方法。然后,我们适应了在充分表示的数据上时空残差网络,以预测洛杉矶每个邻里规模区域在小时级尺度上的犯罪分布。这些实验以及与几种现有预测方法的比较证明了所提模型在准确性上的优越性。最后,我们提出了一种三元化技术,以解决其在现实世界部署中的资源消耗问题。本文是对我们短期会议论文[ Wang et al, Arxiv 1707.03340 ]的扩展。

英文摘要

Real-time crime forecasting is important. However, accurate prediction of when and where the next crime will happen is difficult. No known physical model provides a reasonable approximation to such a complex system. Historical crime data are sparse in both space and time and the signal of interest is weak. In this work, we first present a proper representation of crime data. We then adapt the spatial-temporal residual network to the well-represented data to predict the distribution of crime in Los Angeles at the scale of hours in neighborhood-sized parcels. These experiments, as well as comparisons with several existing approaches to prediction, demonstrate the superiority of the proposed model in terms of accuracy. Finally, we present a ternarization technique to address the resource consumption issue for its deployment in the real world. This work is an extension of our short conference proceeding paper [Wang et al, Arxiv 1707.03340].

1701.03974 2026-06-04 eess.SY cs.LG cs.SY math.OC stat.ML

An Online Convex Optimization Approach to Dynamic Network Resource Allocation

一种面向动态网络资源分配的在线凸优化方法

Tianyi Chen, Qing Ling, Georgios B. Giannakis

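AI summary: This paper proposes the modified online saddle-point (MOSP) scheme for online convex optimization with adversarial losses and constraints. MOSP attains sub-linear dynamic regret and dynamic fit, and in dynamic network resource allocation it outperforms existing methods.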

详情
AI中文摘要

现有在线凸优化方法进行顺序单时段决策,导致可能的对抗性损失,其性能通过遗憾衡量。本文研究对抗性损失和约束的在线凸优化问题,约束在决策后揭示,可容忍瞬时违反但需长期满足。算法性能通过动态遗憾和动态适应性评估。本文提出改进的在线鞍点方案(MOSP),证明其在累积变化子线性增长时同时获得子线性动态遗憾和适应性。MOSP应用于动态网络资源分配任务,并与已知的随机对偶梯度方法比较。在各种场景中,数值实验展示了MOSP相对于现有方法的性能优势。

英文摘要

Existing approaches to online convex optimization (OCO) make sequential one-slot-ahead decisions, which lead to (possibly adversarial) losses that drive subsequent decision iterates. Their performance is evaluated by the so-called regret that measures the difference of losses between the online solution and the best yet fixed overall solution in hindsight. The present paper deals with online convex optimization involving adversarial loss functions and adversarial constraints, where the constraints are revealed after making decisions, and may be violated instantaneously but must be satisfied in the long term. Performance of an online algorithm in this setting is assessed by: i) the difference of its losses relative to the best dynamic solution with one-slot-ahead information of the loss function and the constraint (that is here termed dynamic regret); and, ii) the accumulated amount of constraint violations (that is here termed dynamic fit). In this context, a modified online saddle-point (MOSP) scheme is developed, and proved to simultaneously yield sub-linear dynamic regret and fit, provided that the accumulated variations of per-slot minimizers and constraints are sub-linearly growing with time. MOSP is also applied to the dynamic network resource allocation task, and it is compared with the well-known stochastic dual gradient method. Under various scenarios, numerical experiments demonstrate the performance gain of MOSP relative to the state-of-the-art.

1702.05423 2026-06-04 math.OC cs.NA math.NA stat.ML

Accelerated Primal-Dual Proximal Block Coordinate Updating Methods for Constrained Convex Optimization

加速的对偶 primal-dual 近似块坐标更新方法用于约束凸优化

Yangyang Xu, Shuzhong Zhang

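AI summary: This paper proposes an accelerated primal-dual block coordinate update method for linearly constrained convex programs with multi-block variables. The underlying randomized method has an O(1/t) convergence rate; the rate improves to O(1/t²) when the objective is strongly convex, and a modified algorithm converges linearly when one block variable is independent of the others in the objective.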

详情
Comments
Accepted to Computational Optimization and Applications
AI中文摘要

块坐标更新(BCU)方法因其每次更新计算复杂度低而受到青睐,因为每次只更新一个或几个块变量。本文提出了一种求解线性约束凸程序的对偶 primal-dual BCU 方法,该方法是作者提出的一种对偶 primal-dual 算法的加速版本,通过随机选择块变量来更新,并在弱凸性假设下建立 O(1/t) 收敛率。我们展示了在目标函数为强凸时,收敛率可以加速到 O(1/t²)。此外,如果目标函数中一个块变量与其他变量无关,则可以修改算法以实现线性收敛率。数值实验表明,加速方法在单组参数下表现稳定,而原始方法需要为不同数据集调整参数以达到可比的性能水平。

英文摘要

Block Coordinate Update (BCU) methods enjoy low per-update computational complexity because each time only one or a few block variables need to be updated among possibly a large number of blocks. They are also easily parallelized and thus have been particularly popular for solving problems involving large-scale datasets and/or variables. In this paper, we propose a primal-dual BCU method for solving linearly constrained convex programs in multi-block variables. The method is an accelerated version of a primal-dual algorithm proposed by the authors, which applies randomization in selecting block variables to update and establishes an $O(1/t)$ convergence rate under a weak convexity assumption. We show that the rate can be accelerated to $O(1/t^2)$ if the objective is strongly convex. In addition, if one block variable is independent of the others in the objective, we show that the algorithm can be modified to achieve a linear rate of convergence. The numerical experiments show that the accelerated method performs stably with a single set of parameters, while the original method needs its parameters tuned for different datasets in order to achieve a comparable level of performance.

1711.06656 2026-06-04 math.OC cs.NA math.NA stat.ML

A Parallelizable Acceleration Framework for Packing Linear Programs

一种可并行加速的线性规划打包框架

Palma London, Shai Vardi, Adam Wierman, Hanling Yi

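AI summary: This paper proposes a parallelizable acceleration framework for packing linear programs that greatly speeds up solvers by shrinking the problem they run on. It applies to exact and approximate solvers and to integer linear programs.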

详情
AI中文摘要

本文提出了一种用于解决数据量有限的线性规划问题的加速框架,其中约束数量m远小于变量维度n。该框架可作为黑盒工具,通过在实验中提升求解速度两个数量级。我们提供了最坏情况下的解质量保证和加速效果保证,表明该框架在运行原始求解器于更小问题上提供近似最优解。该框架可用于加速精确求解器、近似求解器以及并行/分布式求解器,同时适用于线性规划和整数线性规划问题。

英文摘要

This paper presents an acceleration framework for packing linear programming problems where the amount of data available is limited, i.e., where the number of constraints m is small compared to the variable dimension n. The framework can be used as a black box to speed up linear programming solvers dramatically, by two orders of magnitude in our experiments. We present worst-case guarantees on the quality of the solution and the speedup provided by the algorithm, showing that the framework provides an approximately optimal solution while running the original solver on a much smaller problem. The framework can be used to accelerate exact solvers, approximate solvers, and parallel/distributed solvers. Further, it can be used for both linear programs and integer linear programs.

1710.11298 2026-06-04 stat.ME cs.IT cs.NA math.IT math.NA stat.ML

Effective Tensor Sketching via Sparsification

通过稀疏化实现有效的张量Sketching

Dong Xia, Ming Yuan

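AI summary: This paper proposes a new tensor sparsification algorithm that judiciously retains a subset of a tensor's entries and attains a given spectral-norm approximation accuracy with much lower sample complexity. For a $k$th order $d\times\cdots\times d$ cubic tensor of stable rank $r_s$, the sample size needed for relative error $\varepsilon$ is, up to a logarithmic factor, of order $r_s^{1/2} d^{k/2}/\varepsilon$ when $\varepsilon$ is relatively large and $r_s d/\varepsilon^2$ when $\varepsilon$ is sufficiently small, the latter being independent of $k$.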

详情
AI中文摘要

本文研究了通过稀疏化实现高维多线性数组或张量的有效Sketching方案。我们提出了一种新的张量稀疏化算法,通过合理保留张量部分元素,证明其在张量谱范数近似精度方面,相较于现有方法具有显著更小的样本复杂度。特别是,对于一个k阶d×…×d立方张量,其稳定秩为r_s时,当相对误差ε较大时,样本需求量为r_s^{1/2}d^{k/2}/ε(忽略对数因子),当ε较小时,样本需求量为r_s d/ε²且几乎是最优的。值得注意的是,高精度下的样本需求量与k无关。此外,我们还研究了如何通过稀疏化高效近似大规模张量的高阶奇异值分解(HOSVD)。

英文摘要

In this paper, we investigate effective sketching schemes via sparsification for high dimensional multilinear arrays or tensors. More specifically, we propose a novel tensor sparsification algorithm that retains a subset of the entries of a tensor in a judicious way, and prove that it can attain a given level of approximation accuracy in terms of tensor spectral norm with a much smaller sample complexity when compared with existing approaches. In particular, we show that for a $k$th order $d\times\cdots\times d$ cubic tensor of stable rank $r_s$, the sample size requirement for achieving a relative error $\varepsilon$ is, up to a logarithmic factor, of the order $r_s^{1/2} d^{k/2} /\varepsilon$ when $\varepsilon$ is relatively large, and $r_s d /\varepsilon^2$ and essentially optimal when $\varepsilon$ is sufficiently small. It is especially noteworthy that the sample size requirement for achieving a high accuracy is of an order independent of $k$. To further demonstrate the utility of our techniques, we also study how higher order singular value decomposition (HOSVD) of large tensors can be efficiently approximated via sparsification.

1711.04683 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Tensor Decompositions for Modeling Inverse Dynamics

张量分解用于逆动力学建模

Stephan Baier, Volker Tresp

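AI summary: This paper models inverse dynamics with a tensor decomposition that exploits the three-way interaction of positions, velocities, and accelerations to approximate highly non-linear functions, and reports superior performance on a SARCOS robot arm dataset.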

详情
AI中文摘要

建模逆动力学对于精确的前馈机器人控制至关重要。该模型计算所需的关节扭矩,以执行预期的运动。高度非线性的动态系统逆函数可以通过回归技术近似。我们提出了一种回归方法,即利用位置x速度x加速度的三重交互的张量分解模型。大多数张量分解工作都解决了密集张量的分解问题。本文在稀疏张量的分解基础上进行扩展,仅包含少量非零条目。稀疏张量的分解已成功应用于关系学习,例如大规模知识图谱的建模。最近,该方法已扩展到多类分类问题,涉及离散输入变量。在高维稀疏张量中表示数据可以近似复杂的高非线性函数。本文展示了稀疏张量分解如何应用于回归问题。此外,我们通过学习从连续输入到张量分解的潜在表示的映射,利用基函数将方法扩展到连续输入。我们在具有七自由度SARCOS机械臂轨迹的数据集上评估了所提出的模型。实验结果表明,所提出的功能张量模型相比挑战性的最新方法具有优越的性能。

英文摘要

Modeling inverse dynamics is crucial for accurate feedforward robot control. The model computes the joint torques necessary to perform a desired movement. The highly non-linear inverse function of the dynamical system can be approximated using regression techniques. As a regression method, we propose a tensor decomposition model that exploits the inherent three-way interaction of positions × velocities × accelerations. Most work in tensor factorization has addressed the decomposition of dense tensors. In this paper, we build upon the decomposition of sparse tensors, which have only a small number of nonzero entries. The decomposition of sparse tensors has successfully been used in relational learning, e.g., the modeling of large knowledge graphs. Recently, the approach has been extended to multi-class classification with discrete input variables. Representing the data in high dimensional sparse tensors enables the approximation of complex highly non-linear functions. In this paper we show how the decomposition of sparse tensors can be applied to regression problems. Furthermore, we extend the method to continuous inputs, by learning a mapping from the continuous inputs to the latent representations of the tensor decomposition, using basis functions. We evaluate our proposed model on a dataset with trajectories from a seven degrees of freedom SARCOS robot arm. Our experimental results show superior performance of the proposed functional tensor model, compared to challenging state-of-the-art methods.

1705.08551 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY

Safe Model-based Reinforcement Learning with Stability Guarantees

具有稳定性保证的安全模型基于强化学习

Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause

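AI summary: This paper presents a reinforcement learning algorithm that explicitly considers safety. It extends Lyapunov stability verification and uses statistical models of the dynamics to obtain high-performance control policies with provable stability certificates, and it safely optimizes a neural network policy on a simulated inverted pendulum.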

详情
Comments
Proc. of Neural Information Processing Systems (NIPS), 2017
AI中文摘要

强化学习是一种从实验数据中学习最优策略的强大范式。然而,为了找到最优策略,大多数强化学习算法会探索所有可能的动作,这可能对现实系统有害。因此,学习算法在现实世界中很少应用于安全关键系统。在本文中,我们提出了一种明确考虑安全性的学习算法,定义为稳定性保证。具体来说,我们扩展了控制理论中关于Lyapunov稳定性验证的结果,并展示了如何利用动态的统计模型来获得具有证明稳定性的高性能控制策略。此外,在额外的正则性假设条件下,我们证明了可以有效地、安全地收集数据以学习动态特性,从而提高控制性能并扩大状态空间的安全区域。在我们的实验中,我们展示了所得到的算法如何在模拟倒立摆上安全地优化神经网络策略,而摆杆从未倒下。

英文摘要

Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.

1608.01958 2026-06-04 math.NA cs.NA stat.CO

Iterative importance sampling algorithms for parameter estimation

迭代重要性采样算法用于参数估计

Matthias Morzfeld, Marcus S. Day, Ray W. Grout, George Shu Heng Pau, Stefan A. Finsterle, John B. Bell

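AI summary: This paper investigates iterative importance sampling algorithms on test problems in subsurface flow and combustion modeling, using massively parallel computers for parameter estimation.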

详情
AI中文摘要

在参数估计问题中,需计算由先验分布、模型和噪声数据共同定义的后验分布。马尔可夫链蒙特卡洛(MCMC)常用于此类问题的数值求解。重要性采样作为替代方法,因其在高性能计算系统上能近似完美地扩展到核心数量。然而,找到合适的提议分布是一个挑战。近年来提出了一些迭代构建提议分布的采样算法。本文通过两个现实且具有挑战性的测试问题,即地下流和燃烧建模,探讨了这些算法的适用性。具体而言,我们实现了迭代计算高斯或多元t分布均值和协方差矩阵的重要性采样算法。我们的实现利用了大规模并行计算机,并提出了使用“粗略”MCMC运行或高斯混合模型初始化迭代的策略。

英文摘要

In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov Chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicability of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using "coarse" MCMC runs or Gaussian mixture models.
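
The iteration over the mean and covariance of a Gaussian proposal, as described in the abstract above, can be sketched in one dimension. This is a hedged toy example, not the authors' algorithm or code: the target density in `log_target` (a Gaussian with mean 3 and standard deviation 0.5), the initial proposal, and all function names are assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a hypothetical target, N(3, 0.5^2)."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2


def log_normal_pdf(x, mean, std):
    """Log-density of the Gaussian proposal N(mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def iterative_importance_sampling(n_iter=10, n_samples=2000, seed=0):
    """Repeatedly refit a Gaussian proposal to self-normalized weighted samples."""
    rng = random.Random(seed)
    mean, std = 0.0, 5.0  # deliberately poor initial proposal
    for _ in range(n_iter):
        # Samples are drawn independently, so this step parallelizes trivially.
        xs = [rng.gauss(mean, std) for _ in range(n_samples)]
        log_w = [log_target(x) - log_normal_pdf(x, mean, std) for x in xs]
        shift = max(log_w)  # subtract the maximum for numerical stability
        w = [math.exp(lw - shift) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # Update the proposal with the weighted mean and variance.
        mean = sum(wi * x for wi, x in zip(w, xs))
        var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs))
        std = max(math.sqrt(var), 1e-6)  # guard against a collapsed proposal
    return mean, std
```

After a few iterations the proposal should sit close to the target, so the importance weights become nearly uniform. In higher dimensions the same update applies to a mean vector and covariance matrix, and a heavier-tailed multivariate t proposal is a common choice when the weights degenerate.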

1711.03026 2026-06-04 eess.SY cs.AI cs.SY stat.ML

Intelligent Fault Analysis in Electrical Power Grids

电力电网的智能故障分析

Biswarup Bhattacharya, Abhishek Sinha

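AI summary: This paper proposes an artificially intelligent system that uses formal models and machine learning to determine the health of a power grid, with the aim of improving grid stability and security.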

详情
Comments
In proceedings of the 29th IEEE International Conference on Tools with Artificial Intelligence (ICTAI) 2017 (full paper); 6 pages; 13 figures
AI中文摘要

电力电网是当今世界基础设施中最重要的一部分。每个国家都依赖自身电网的安全性和稳定性来为家庭和工业提供电力。即使电网的某一小部分出现故障,也可能导致生产力、收入损失,甚至在某些情况下导致生命危险。因此,设计一个能够检测电网健康状况并在严重异常发生前采取保护措施的系统至关重要。为此,我们致力于创建一个智能系统,能够随时分析电网信息,并通过使用复杂的正式模型和新颖的机器学习技术如循环神经网络来确定电网的健康状况。我们的系统使用西门子PSS/E软件模拟电网条件,包括故障、发电机输出波动和负载波动等刺激,并使用SVM、LSTM等分类器对数据进行训练和测试。结果非常出色,我们的方法在数据上表现出很高的准确性。该模型可以轻松扩展以处理更大、更复杂的电网架构。

英文摘要

Power grids are one of the most important components of infrastructure in today's world. Every nation is dependent on the security and stability of its own power grid to provide electricity to households and industries. A malfunction of even a small part of a power grid can cause loss of productivity, revenue and in some cases even life. Thus, it is imperative to design a system which can detect the health of the power grid and take protective measures accordingly even before a serious anomaly takes place. To achieve this objective, we have set out to create an artificially intelligent system which can analyze the grid information at any given time and determine the health of the grid through the usage of sophisticated formal models and novel machine learning techniques like recurrent neural networks. Our system simulates grid conditions, including stimuli like faults, generator output fluctuations, and load fluctuations, using Siemens PSS/E software. Various classifiers such as SVM and LSTM are trained on this data and subsequently tested. The results are excellent, with our methods giving very high accuracy on the data. This model can easily be scaled to handle larger and more complex grid architectures.

1711.02857 2026-06-04 cs.LG cs.AI cs.CV cs.NA math.NA stat.ML

Learning Sparse Visual Representations with Leaky Capped Norm Regularizers

通过泄漏受限范数正则化器学习稀疏视觉表示

Jianqiao Wangni, Dahua Lin

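AI summary: This paper proposes the leaky capped norm regularization (LCNR) for learning over-complete visual representations, reports that it outperforms $\ell_1$ and other non-convex regularizations, and proves a global convergence rate on the 3D recovery problem.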

详情
AI中文摘要

诱导稀疏性的正则化是学习过完备视觉表示的重要组成部分。尽管ℓ1正则化广受欢迎,本文研究了非凸正则化在该问题中的应用。我们的贡献包括三个部分:首先,我们提出了泄漏受限范数正则化器(LCNR),允许模型权重低于一定阈值的部分被更强地正则化,从而实现强稀疏性,仅引入可控的估计偏差。我们提出了一种主要化-最小化算法来优化联合目标函数。其次,我们的研究显示,在单目3D形状恢复和神经网络中,LCNR优于ℓ1和其他非凸正则化方法,实现了最先进的性能和更快的收敛速度。第三,我们证明了在3D恢复问题上的理论全局收敛速度。到目前为止,这是首次对3D恢复问题的收敛性分析。

英文摘要

Sparsity inducing regularization is an important part of learning over-complete visual representations. Despite the popularity of $\ell_1$ regularization, in this paper, we investigate the usage of non-convex regularizations in this problem. Our contribution consists of three parts. First, we propose the leaky capped norm regularization (LCNR), which allows model weights below a certain threshold to be regularized more strongly than those above it, and therefore imposes strong sparsity while introducing only controllable estimation bias. We propose a majorization-minimization algorithm to optimize the joint objective function. Second, in our study of monocular 3D shape recovery and neural networks, LCNR outperforms $\ell_1$ and other non-convex regularizations, achieving state-of-the-art performance and faster convergence. Third, we prove a theoretical global convergence speed on the 3D recovery problem. To the best of our knowledge, this is the first convergence analysis of the 3D recovery problem.

1705.04374 2026-06-04 cs.CE cs.DC cs.NA math.NA stat.CO

Optimal fidelity multi-level Monte Carlo for quantification of uncertainty in simulations of cloud cavitation collapse

最优保真多级蒙特卡洛方法用于云腔崩溃模拟中不确定性量化

Jonas Šukys, Ursula Rasthofer, Fabian Wermelinger, Panagiotis Hadjidoukas, Petros Koumoutsakos

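AI summary: This paper proposes an optimal fidelity multi-level Monte Carlo method for quantifying uncertainties in the location and magnitude of extreme pressure spots in simulations of cloud cavitation collapse. Novel optimal control variate coefficients enhance the variance reduction, giving more than two orders of magnitude speedup over standard Monte Carlo.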

详情
AI中文摘要

我们量化了从大规模多相流模拟中揭示出的云腔崩溃中极端压力斑点位置和强度的不确定性。我们研究包含500个气泡的云,并量化其初始空间排列相关的不确定性。所得到的2000维空间使用非侵入性和计算高效的多级蒙特卡洛(MLMC)方法进行采样。我们引入了新的最优控制变量化系数以提高MLMC中的方差减少。所提出的最优保真MLMC方法相比传统蒙特卡洛方法实现了超过两个数量级的加速。我们识别出峰值压力脉冲的位置和强度存在较大不确定性,并展示了其统计相关性和与云几何特征的联合概率密度函数。空间云结构的特征属性被识别为产生显著不确定性的潜在原因。

英文摘要

We quantify uncertainties in the location and magnitude of extreme pressure spots revealed from large scale multi-phase flow simulations of cloud cavitation collapse. We examine clouds containing 500 cavities and quantify uncertainties related to their initial spatial arrangement. The resulting 2000-dimensional space is sampled using a non-intrusive and computationally efficient Multi-Level Monte Carlo (MLMC) methodology. We introduce novel optimal control variate coefficients to enhance the variance reduction in MLMC. The proposed optimal fidelity MLMC leads to more than two orders of magnitude speedup when compared to standard Monte Carlo methods. We identify large uncertainties in the location and magnitude of the peak pressure pulse and present its statistical correlations and joint probability density functions with the geometrical characteristics of the cloud. Characteristic properties of spatial cloud structure are identified as potential causes of significant uncertainties in exerted collapse pressures.

1711.00946 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Learning Linear Dynamical Systems via Spectral Filtering

通过谱滤波学习线性动力系统

Elad Hazan, Karan Singh, Cyril Zhang

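AI summary: This paper presents an efficient algorithm for online prediction of linear dynamical systems that overparameterizes the model class and uses a spectral filtering technique to obtain a near-optimal regret guarantee.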

详情
Comments
Published as a conference paper at NIPS 2017
AI中文摘要

我们提出了一种高效且实用的算法,用于在线预测具有对称转移矩阵的离散时间线性动力系统。我们通过不当学习避免非凸优化问题:通过多项式对数因子过度参数化LDS类,在换取损失函数的凸性。由此产生一个具有近优 regret 保证的多项式时间算法,具有类似的一般学习样本复杂度界。我们的算法基于一种新颖的过滤技术,可能具有独立兴趣:我们将时间序列与某个Hankel矩阵的特征向量进行卷积。

英文摘要

We present an efficient and practical algorithm for the online prediction of discrete-time linear dynamical systems with a symmetric transition matrix. We circumvent the non-convex optimization problem using improper learning: we carefully overparameterize the class of LDSs by a polylogarithmic factor, in exchange for convexity of the loss functions. From this arises a polynomial-time algorithm with a near-optimal regret guarantee, with an analogous sample complexity bound for agnostic learning. Our algorithm is based on a novel filtering technique, which may be of independent interest: we convolve the time series with the eigenvectors of a certain Hankel matrix.
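The filtering step, convolving a time series with the top eigenvectors of a Hankel matrix, can be sketched as follows. The abstract does not specify the matrix; the entries 2/((i+j)^3 - (i+j)) used here are an assumption made for illustration, and the names `hankel_filters` and `filter_features` are hypothetical.

```python
import numpy as np

def hankel_filters(T, k):
    """Top-k eigenpairs of a T x T Hankel matrix (entries are an assumed choice)."""
    idx = np.arange(1, T + 1)
    s = idx[:, None] + idx[None, :]          # entry depends only on i + j: Hankel
    Z = 2.0 / (s**3 - s)
    eigvals, eigvecs = np.linalg.eigh(Z)     # ascending order
    order = np.argsort(eigvals)[::-1][:k]    # keep the k largest
    return eigvals[order], eigvecs[:, order]

def filter_features(x, filters):
    """feats[t, j] = sum_{u=1..T} filters[u-1, j] * x[t-u] (causal convolution)."""
    T, k = filters.shape
    feats = np.zeros((len(x), k))
    for t in range(len(x)):
        for u in range(1, min(T, t) + 1):
            feats[t] += filters[u - 1] * x[t - u]
    return feats

if __name__ == "__main__":
    vals, filt = hankel_filters(T=8, k=3)
    x = np.sin(0.3 * np.arange(50))
    print(filter_features(x, filt).shape)    # one k-dimensional feature per time step
```

A linear predictor trained on such features has a convex loss, which is the point of the improper-learning reformulation described in the abstract.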

1702.06861 2026-06-04 stat.ML cs.LG cs.NA math.NA

On the Power of Truncated SVD for General High-rank Matrix Estimation Problems

关于截断SVD在一般高秩矩阵估计问题中的功效

Simon S. Du, Yining Wang, Aarti Singh

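AI summary: This paper shows that, given an estimate close in spectral norm to a high-rank positive semi-definite matrix, truncated SVD yields a multiplicative approximation in Frobenius norm, with applications to high-rank matrix completion, denoising, and high-dimensional covariance estimation.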

详情
Comments
Accepted by NIPS 2017. Add gap-dependent bounds
AI中文摘要

本文证明,给定一个在谱范数下接近一般高秩正半定矩阵A的估计值Ã(即‖Ã-A‖₂ ≤ δ),对Ã进行简单的截断SVD可以得到A在Frobenius范数下的乘法近似。这一观察导致了许多关于一般高秩矩阵估计问题的有趣结果,我们简要总结如下(A是一个n×n的高秩正半定矩阵,A_k是A的最佳秩-k近似):(1)高秩矩阵补全:通过观测Ω(nmax{ε⁻⁴,k²}μ₀²‖A‖_F²logn/σ_{k+1}(A)²)个A的元素,其中σ_{k+1}(A)是A的第(k+1)个奇异值,μ₀是不相干性,对零填充矩阵进行截断SVD可以满足‖Ã_k -A‖_F ≤ (1+O(ε))‖A -A_k‖_F,以高概率成立。(2)高秩矩阵去噪:令Ã=A+E,其中E是一个高斯随机噪声矩阵,具有零均值和每个元素方差为ν²/n。则Ã的截断SVD满足‖Ã_k -A‖_F ≤ (1+O(√(ν/σ_{k+1}(A))))‖A -A_k‖_F + O(√kν)。(3)高维协方差的低秩估计:给定N个i.i.d.样本X₁,…,X_N ~ N_n(0,A),能否用相对误差Frobenius范数界来估计A?我们证明如果N=Ω(nmax{ε⁻⁴,k²}γ_k(A)²logN),其中γ_k(A)=σ₁(A)/σ_{k+1}(A),则‖Ã_k -A‖_F ≤ (1+O(ε))‖A -A_k‖_F,以高概率成立,其中Ã=1/N∑_{i=1}^N X_iX_i^T是样本协方差。

英文摘要

We show that given an estimate $\widehat{A}$ that is close to a general high-rank positive semi-definite (PSD) matrix $A$ in spectral norm (i.e., $\|\widehat{A}-A\|_2 \leq δ$), the simple truncated SVD of $\widehat{A}$ produces a multiplicative approximation of $A$ in Frobenius norm. This observation leads to many interesting results on general high-rank matrix estimation problems, which we briefly summarize below ($A$ is an $n\times n$ high-rank PSD matrix and $A_k$ is the best rank-$k$ approximation of $A$): (1) High-rank matrix completion: By observing $Ω(\frac{n\max\{ε^{-4},k^2\}μ_0^2\|A\|_F^2\log n}{σ_{k+1}(A)^2})$ elements of $A$ where $σ_{k+1}\left(A\right)$ is the $\left(k+1\right)$-th singular value of $A$ and $μ_0$ is the incoherence, the truncated SVD on a zero-filled matrix satisfies $\|\widehat{A}_k-A\|_F \leq (1+O(ε))\|A-A_k\|_F$ with high probability. (2) High-rank matrix de-noising: Let $\widehat{A}=A+E$ where $E$ is a Gaussian random noise matrix with zero mean and $ν^2/n$ variance on each entry. Then the truncated SVD of $\widehat{A}$ satisfies $\|\widehat{A}_k-A\|_F \leq (1+O(\sqrt{ν/σ_{k+1}(A)}))\|A-A_k\|_F + O(\sqrt{k}ν)$. (3) Low-rank estimation of high-dimensional covariance: Given $N$ i.i.d. samples $X_1,\cdots,X_N\sim\mathcal N_n(0,A)$, can we estimate $A$ with a relative-error Frobenius norm bound? We show that if $N = Ω\left(n\max\{ε^{-4},k^2\}γ_k(A)^2\log N\right)$ for $γ_k(A)=σ_1(A)/σ_{k+1}(A)$, then $\|\widehat{A}_k-A\|_F \leq (1+O(ε))\|A-A_k\|_F$ with high probability, where $\widehat{A}=\frac{1}{N}\sum_{i=1}^N{X_iX_i^\top}$ is the sample covariance.
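The denoising setting can be tried numerically with a small sketch. The matrix size, spectrum, and noise scale below are arbitrary assumptions chosen for illustration; the script only compares $\|\widehat{A}_k-A\|_F$ against the best achievable $\|A-A_k\|_F$ and does not verify the constants in the stated bounds.

```python
import numpy as np

def truncated_svd(M, k):
    """Best rank-k approximation of M in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
n, k = 60, 5

# High-rank PSD matrix with slowly decaying eigenvalues 1, 1/2, ..., 1/n (assumed).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = 1.0 / np.arange(1, n + 1)
A = (Q * spectrum) @ Q.T

# Symmetric Gaussian perturbation with small spectral norm (assumed noise model).
G = rng.standard_normal((n, n))
E = 0.01 * (G + G.T) / np.sqrt(2 * n)
A_hat = A + E

A_k = truncated_svd(A, k)          # oracle rank-k approximation of A
A_hat_k = truncated_svd(A_hat, k)  # what we can compute from the estimate

err_best = np.linalg.norm(A - A_k, "fro")
err_hat = np.linalg.norm(A_hat_k - A, "fro")
print("ratio:", err_hat / err_best)
```

The ratio is always at least 1, since no rank-$k$ matrix can beat $A_k$; the results in the abstract concern how close to 1 it stays.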

1711.01206 2026-06-04 math.NA cs.NA stat.CO

Robust Decoding from 1-Bit Compressive Sampling with Least Squares

从1位压缩采样中鲁棒解码:最小二乘法

Jian Huang, Yuling Jiao, Xiliang Lu, Liping Zhu

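AI summary: This paper studies signal recovery in 1-bit compressive sensing via least squares in the over-determined and under-determined settings, introduces the PDAS algorithm and a new regularization parameter selection rule, and proves that sparse signals can be recovered under stated conditions.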

详情
AI中文摘要

在1位压缩感知中,目标信号被编码为二进制测量,旨在从噪声和量化样本中恢复信号。数学上,1位压缩感知模型表示为:y = η⊙sign(Ψx* + ε),其中x*∈R^n,y∈R^m,Ψ∈R^{m×n},ε是量化前的随机误差,η∈R^n是表示符号翻转的随机向量。由于非线性、噪声和符号翻转的存在,从1位压缩感知中解码极具挑战性。本文考虑了在过定和欠定情况下使用最小二乘方法。对于m>n的情况,我们证明,只要m≥~O(n/δ²),则最小二乘解x_ls可近似x*的精度为δ。对于m<n的情况,我们证明,在m≥O(slogn/δ²)且x*的稀疏度s < m的条件下,ℓ1-正则化最小二乘解x_ℓ1位于以x*为中心、半径为δ的球内。我们引入了一种牛顿型方法,即所谓的对偶和原问题活动集(PDAS)算法,用于求解非光滑优化问题。PDAS具有一步收敛的性质。它只需在活动集上求解一个小的最小二乘问题。因此,PDAS对于通过连续方法恢复稀疏信号非常高效。我们提出了一种新的正则化参数选择规则,该规则不引入额外的计算开销。广泛数值实验展示了所提模型的鲁棒性和算法的效率。

英文摘要

In 1-bit compressive sensing (1-bit CS), where the target signal is coded into a binary measurement, one goal is to recover the signal from noisy and quantized samples. Mathematically, the 1-bit CS model reads: $y = η\odot\textrm{sign} (Ψx^* + ε)$, where $x^{*}\in \mathcal{R}^{n}, y\in \mathcal{R}^{m}$, $Ψ\in \mathcal{R}^{m\times n}$, $ε$ is the random error before quantization, and $η\in \mathcal{R}^{m}$ is a random vector modeling the sign flips. Due to the presence of nonlinearity, noise and sign flips, it is quite challenging to decode from the 1-bit CS. In this paper, we consider the least squares approach under the over-determined and under-determined settings. For $m>n$, we show that, up to a constant $c$, with high probability, the least squares solution $x_{\textrm{ls}}$ approximates $ x^*$ with precision $δ$ as long as $m \geq\widetilde{\mathcal{O}}(\frac{n}{δ^2})$. For $m< n$, we prove that, up to a constant $c$, with high probability, the $\ell_1$-regularized least-squares solution $x_{\ell_1}$ lies in the ball with center $x^*$ and radius $δ$ provided that $m \geq \mathcal{O}( \frac{s\log n}{δ^2})$ and $\|x^*\|_0 := s < m$. We introduce a Newton type method, the so-called primal and dual active set (PDAS) algorithm, to solve the nonsmooth optimization problem. The PDAS possesses the property of one-step convergence. It only requires solving a small least squares problem on the active set. Therefore, the PDAS is extremely efficient for recovering sparse signals through continuation. We propose a novel regularization parameter selection rule which does not introduce any extra computational overhead. Extensive numerical experiments are presented to illustrate the robustness of our proposed model and the efficiency of our algorithm.
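The over-determined case can be illustrated with a short simulation of the model $y = η\odot\textrm{sign}(Ψx^* + ε)$ followed by ordinary least squares. This is a sketch under assumptions not stated in the abstract: a Gaussian measurement matrix, a 5% flip probability, and noise level 0.1. Since recovery holds only up to a constant $c$, the check compares directions through cosine similarity; the PDAS algorithm itself is not implemented here.

```python
import numpy as np

def simulate_one_bit(m, n, flip_prob=0.05, noise_std=0.1, seed=0):
    """Draw (Psi, y, x_true) from the 1-bit model with sign flips (assumed settings)."""
    rng = np.random.default_rng(seed)
    x_true = rng.standard_normal(n)
    x_true /= np.linalg.norm(x_true)
    Psi = rng.standard_normal((m, n))
    eps = noise_std * rng.standard_normal(m)                 # noise before quantization
    eta = np.where(rng.random(m) < flip_prob, -1.0, 1.0)     # random sign flips, length m
    y = eta * np.sign(Psi @ x_true + eps)
    return Psi, y, x_true

def least_squares_decode(Psi, y):
    """Plain least squares fit of the binary measurements."""
    x_ls, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    return x_ls

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    Psi, y, x_true = simulate_one_bit(m=2000, n=20)
    x_ls = least_squares_decode(Psi, y)
    print("cosine similarity:", cosine(x_ls, x_true))
```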

1711.00708 2026-06-04 econ.GN math.ST q-fin.EC stat.AP stat.TH

On Game-Theoretic Risk Management (Part Three) - Modeling and Applications

关于博弈论风险管理(第三部分)——建模与应用

Stefan Rass

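AI summary: Building on the theory of the two precursor reports, this paper discusses how to integrate game-theoretic risk management into risk management processes, covering the construction of non-parametric loss models and use within the ISO 27000 process, with examples on advanced persistent threats and social engineering.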

详情
AI中文摘要

前两篇报告中提出的博弈论风险管理框架在此被总结,讨论如何将已发展理论整合到风险管理流程中。我们讨论如何从数据构建损失模型(主要但不局限于非参数模型)。此外,提供如何建立有意义的博弈论模型的提示,并展示其在ISO 27000风险管理各阶段的应用。举例说明高级持续威胁和社会工程的相关问题。最后讨论混合纳什均衡在风险管理中的意义和实际应用。

英文摘要

The game-theoretic risk management framework put forth in the precursor reports "Towards a Theory of Games with Payoffs that are Probability-Distributions" (arXiv:1506.07368 [q-fin.EC]) and "Algorithms to Compute Nash-Equilibria in Games with Distributions as Payoffs" (arXiv:1511.08591v1 [q-fin.EC]) is herein concluded by discussing how to integrate the previously developed theory into risk management processes. To this end, we discuss how loss models (primarily but not exclusively non-parametric) can be constructed from data. Furthermore, hints are given on how a meaningful game theoretic model can be set up, and how it can be used in various stages of the ISO 27000 risk management process. Examples related to advanced persistent threats and social engineering are given. We conclude with a discussion on the meaning and practical use of (mixed) Nash equilibria for risk management.

1710.11200 2026-06-04 cs.AR cs.DS cs.MM cs.NA math.NA stat.ME

VLSI Computational Architectures for the Arithmetic Cosine Transform

用于算术余弦变换的VLSI计算架构

N. Rajapaksha, A. Madanayake, R. J. Cintra, J. Adikari, V. S. Dimitrov

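AI summary: This paper proposes a hardware architecture for computing the null mean ACT and extends the ACT to non-null mean signals. All circuits are implemented and tested on a Xilinx XC6VLX240T FPGA and synthesized for a 45 nm TSMC standard-cell library for performance assessment.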

详情
Journal ref
IEEE Transactions on Computers, vol. 64, no. 9, Sep 2015
Comments
8 pages, 2 figures, 6 tables
AI中文摘要

离散余弦变换(DCT)是一种广泛使用且重要的信号处理工具,应用于各种领域。典型的快速算法需要浮点运算,乘法密集且累积舍入误差。最近提出的快速算法算术余弦变换(ACT)仅使用加法和整数常数乘法即可精确计算DCT,具有极低的面积复杂度,适用于零均值输入序列。ACT也可用于任何输入序列的非精确计算,具有低面积复杂度和低功耗,利用所描述的新型架构。然而,作为权衡,ACT算法需要10个非均匀采样的数据点来计算8点DCT。这种要求可以通过在图像传感器和生物医学传感器阵列等处理空间信号的应用中将传感器元件布置在非均匀网格中来轻松满足。本文提出了一种用于计算零均值ACT的硬件架构,随后提出了一种扩展ACT以适用于非零均值信号的新架构。所有电路均在Xilinx XC6VLX240T FPGA设备上实现并测试,并针对45nm TSMC标准单元库进行综合以评估性能。

英文摘要

The discrete cosine transform (DCT) is a widely-used and important signal processing tool employed in a plethora of applications. Typical fast algorithms for nearly-exact computation of the DCT require floating point arithmetic, are multiplier intensive, and accumulate round-off errors. The recently proposed fast algorithm arithmetic cosine transform (ACT) calculates the DCT exactly using only additions and integer constant multiplications, with very low area complexity, for null mean input sequences. The ACT can also be computed non-exactly for any input sequence, with low area complexity and low power consumption, utilizing the novel architecture described. However, as a trade-off, the ACT algorithm requires 10 non-uniformly sampled data points to calculate the 8-point DCT. This requirement can easily be satisfied for applications dealing with spatial signals such as image sensors and biomedical sensor arrays, by placing sensor elements in a non-uniform grid. In this work, a hardware architecture for the computation of the null mean ACT is proposed, followed by novel architectures that extend the ACT for non-null mean signals. All circuits are physically implemented and tested using the Xilinx XC6VLX240T FPGA device and synthesized for 45 nm TSMC standard-cell library for performance assessment.

1710.10021 2026-06-04 eess.SY cs.SY math.OC physics.soc-ph stat.ML

Online Learning of Power Transmission Dynamics

电力传输动态的在线学习

Andrey Y. Lokhov, Marc Vuffray, Dmitry Shemetov, Deepjyoti Deka, Michael Chertkov

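AI summary: This paper proposes a maximum likelihood based approach that reconstructs the dynamic state matrix of power grids from time-stamped PMU measurements, yielding data-driven convex estimators suited to near real-time use with small amounts of data.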

详情
Comments
7 pages, 4 figures
AI中文摘要

我们考虑从时间戳PMU测量中重建传输电网动态状态矩阵的问题,在环境波动条件下。使用基于最大似然的方法,我们构建了一族凸估计器,根据可用的先验信息适应问题结构。所提方法完全数据驱动,不假设系统参数的知识。它可以近实时实施,并且只需要少量数据。我们的学习算法可用于模型验证和校准,也可应用于系统稳定性、强制振荡检测、发电再调度以及系统状态估计等相关问题。

英文摘要

We consider the problem of reconstructing the dynamic state matrix of transmission power grids from time-stamped PMU measurements in the regime of ambient fluctuations. Using a maximum likelihood based approach, we construct a family of convex estimators that adapt to the structure of the problem depending on the available prior information. The proposed method is fully data-driven and does not assume any knowledge of system parameters. It can be implemented in near real-time and requires a small amount of data. Our learning algorithms can be used for model validation and calibration, and can also be applied to related problems of system stability, detection of forced oscillations, generation re-dispatch, as well as to the estimation of the system state.

1710.09854 2026-06-04 cs.LG cs.NA math.NA stat.ML

Gradient Sparsification for Communication-Efficient Distributed Optimization

梯度稀疏化用于通信高效的分布式优化

Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang

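AI summary: This paper proposes a convex optimization formulation that reduces the communication cost of gradients, designs efficient algorithms for gradient sparsification, and validates them on logistic regression, support vector machines, and convolutional neural networks.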

详情
AI中文摘要

现代大规模机器学习应用需要在分布式计算架构上实现随机优化算法。关键瓶颈是不同工作者之间交换信息(如随机梯度)的通信开销。本文提出了一种凸优化公式,以最小化随机梯度的编码长度。为高效求解最优稀疏化,提出了几种简单快速的算法用于近似解,具有理论保证的稀疏性。在ℓ2正则化逻辑回归、支持向量机和卷积神经网络上的实验验证了我们的稀疏化方法。

英文摘要

Modern large scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead for exchanging information such as stochastic gradients among different workers. In this paper, to reduce the communication cost we propose a convex optimization formulation to minimize the coding length of stochastic gradients. To solve the optimal sparsification efficiently, several simple and fast algorithms are proposed for approximate solutions, with theoretical guarantees for sparseness. Experiments on $\ell_2$ regularized logistic regression, support vector machines, and convolutional neural networks validate our sparsification approaches.
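One way to see how a sparsified gradient can remain usable for stochastic optimization is unbiased random dropping: keep coordinate $i$ with probability $p_i$ and rescale it by $1/p_i$. The sketch below is illustrative only. The rule $p_i = \min(1, \text{scale}\cdot|g_i|)$ and the name `sparsify` are assumptions, and the abstract's convex formulation for choosing the probabilities is not reproduced.

```python
import numpy as np

def sparsify(grad, scale, rng):
    """Randomly zero coordinates while keeping the result unbiased.

    Coordinate i survives with probability p_i = min(1, scale * |g_i|)
    (an assumed rule) and is rescaled to g_i / p_i, so E[output] = grad.
    """
    grad = np.asarray(grad, dtype=float)
    probs = np.minimum(1.0, scale * np.abs(grad))
    keep = rng.random(grad.shape) < probs        # never true where probs == 0
    out = np.zeros_like(grad)
    out[keep] = grad[keep] / probs[keep]
    return out, probs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = np.array([1.0, -0.5, 0.1, 0.0])
    sparse_g, probs = sparsify(g, scale=2.0, rng=rng)
    print(sparse_g, "expected nonzeros:", probs.sum())
```

Large coordinates are always transmitted and small ones only occasionally, so the expected number of nonzeros (the sum of the probabilities) drops while the estimate stays unbiased at the price of higher variance.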

1710.09657 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Segment Parameter Labelling in MCMC Mean-Shift Change Detection

MCMC均值迁移中的分段参数标记

Alireza Ahrabian, Shirin Enshaeifar, Clive Cheong-Took, Payam Barnaghi

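AI summary: This paper proposes a Bayesian mean-shift change point detection algorithm that exploits repetition in segment parameters to improve performance.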

详情
AI中文摘要

本文解决时间序列数据在贝叶斯模型中关于感兴趣统计参数的分段问题。通常假设每个分段内的参数是不同的,因此许多贝叶斯变化点检测模型未利用分段参数模式,这可能提高性能。本文提出了一种贝叶斯均值迁移变化点检测算法,通过引入利用狄利克雷过程先验的分段类别标签来利用分段参数的重复性。所提出方法在合成和真实数据上的性能评估表明,使用参数标记可提高性能。

英文摘要

This work addresses the problem of segmentation in time series data with respect to a statistical parameter of interest in Bayesian models. It is common to assume that the parameters are distinct within each segment. As such, many Bayesian change point detection models do not exploit the segment parameter patterns, which can improve performance. This work proposes a Bayesian mean-shift change point detection algorithm that makes use of repetition in segment parameters, by introducing segment class labels that utilise a Dirichlet process prior. The performance of the proposed approach was assessed on both synthetic and real world data, highlighting the enhanced performance when using parameter labelling.

1706.04059 2026-06-04 math.ST cs.IT cs.NA math.IT math.NA stat.CO stat.ME stat.TH

Approximate Optimal Designs for Multivariate Polynomial Regression

多变量多项式回归的近似最优设计

Yohann De Castro, Fabrice Gamboa, Didier Henrion, Roxana Hess, Jean-Bernard Lasserre

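AI summary: This paper uses the moment-sum-of-squares hierarchy of semidefinite programs to compute approximate optimal designs for multivariate polynomial regression, recovers the geometry of the design via semidefinite programming duality, proves that the hierarchy converges to the approximate optimal design as its order increases, and provides a dual certificate ensuring finite convergence.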

详情
Comments
30 Pages, 8 Figures. arXiv admin note: substantial text overlap with arXiv:1703.01777
AI中文摘要

我们介绍了一种新方法,旨在计算多变量多项式回归在紧致(半代数)设计空间上的近似最优设计。我们使用半定规划的矩-平方和层次结构来数值求解近似最优设计问题。通过半定规划对偶理论恢复设计的几何结构。本文证明了该层次结构随着阶数的增加收敛到近似最优设计。此外,我们提供了一个对偶证书,确保该层次结构的有限收敛性,并表明可以使用我们的方法数值计算近似最优设计。作为副产品,我们重新审视了实验设计理论的等价定理:它与Christoffel多项式相关,并且描述了矩-平方和层次结构的有限收敛性。

英文摘要

We introduce a new approach aiming at computing approximate optimal designs for multivariate polynomial regressions on compact (semi-algebraic) design spaces. We use the moment-sum-of-squares hierarchy of semidefinite programming problems to solve numerically the approximate optimal design problem. The geometry of the design is recovered via semidefinite programming duality theory. This article shows that the hierarchy converges to the approximate optimal design as the order of the hierarchy increases. Furthermore, we provide a dual certificate ensuring finite convergence of the hierarchy and showing that the approximate optimal design can be computed numerically with our method. As a byproduct, we revisit the equivalence theorem of the experimental design theory: it is linked to the Christoffel polynomial and it characterizes finite convergence of the moment-sum-of-squares hierarchies.

1710.08530 2026-06-04 math.OC cs.LG cs.SY eess.SY stat.ML

Stability Analysis of Optimal Adaptive Control using Value Iteration with Approximation Errors

基于价值迭代的最优自适应控制稳定性分析

Ali Heydari

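AI summary: This paper analyzes the stability of adaptive optimal control using value iteration while accounting for approximation errors, and provides estimates of the region of attraction such that trajectories starting inside the region remain within it.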

详情
Comments
A part of this paper is based on preliminary results presented in arXiv:1412.5675
AI中文摘要

本文对基于价值迭代的自适应最优控制进行了理论分析,研究了学习阶段系统稳定性,不忽略近似误差的影响。分析包括使用任何单个/常数控制策略或演化的/时间变化控制策略时的系统运行。所提结果的一个特点是提供吸引域的估计,如果初始条件在该域内,整个轨迹将保持在其中,从而保证函数近似结果的有效性。

英文摘要

Adaptive optimal control using value iteration initiated from a stabilizing control policy is theoretically analyzed in terms of stability of the system during the learning stage without ignoring the effects of approximation errors. This analysis includes the system operated using any single/constant resulting control policy and also using an evolving/time-varying control policy. A feature of the presented results is providing estimations of the \textit{region of attraction} so that if the initial condition is within the region, the whole trajectory will remain inside it and hence, the function approximation results remain valid.

1710.07855 2026-06-04 q-bio.QM cs.SY eess.SY stat.ML

Insulin Regimen ML-based control for T2DM patients

基于机器学习的胰岛素方案控制2型糖尿病患者

Mark Shifrin, Hava Siegelmann

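AI summary: This paper models the blood glucose level of T2DM patients as a Markov decision process and uses reinforcement learning to optimize the insulin regimen so as to maximize the long-run reward.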

详情
AI中文摘要

我们通过随机过程建模2型糖尿病患者血糖水平(BGL),该过程主要由药物治疗方案(如胰岛素注射)决定,但并非仅此而已。BGL状态的变化还受到各种生理触发因素的影响,使该过程呈现出随机、统计上未知,但假设为准稳态的性质。为了表达处于理想健康BGL的激励,我们启发式地定义了一个奖励函数,该函数对 desirable BG 水平返回正值,对 undesirable BG 水平返回负值。状态空间包含足够多的状态以允许无记忆假设。这反过来允许构建马尔可夫决策过程(MDP),其目标是最大化长期总奖励。概率分布通过基于模型的强化学习(RL)找到,最优胰岛素治疗策略从MDP解中获取。

英文摘要

We model the blood glucose level (BGL) of an individual T2DM patient as a stochastic process with a discrete number of states, governed mainly but not solely by the medication regimen (e.g. insulin injections). BGL states otherwise change according to various physiological triggers, which render the process stochastic and statistically unknown, yet assumed to be quasi-stationary. In order to express the incentive for being in a desired healthy BGL, we heuristically define a reward function which returns positive values for desirable BG levels and negative values for undesirable BG levels. The state space consists of a sufficient number of states to allow for the memoryless assumption. This, in turn, allows us to formulate a Markov Decision Process (MDP) with the objective of maximizing the total reward, summed over a long run. The probability law is found by model-based reinforcement learning (RL), and the optimal insulin treatment policy is retrieved from the MDP solution.
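Retrieving a policy from an MDP solution can be sketched with value iteration on a toy model. Everything numeric below is hypothetical: three BGL states (low, healthy, high), two actions (no dose, dose), invented transition probabilities, and a discount factor of 0.95 that stands in for the long-run objective. In the paper's setting the transition law would come from model-based RL rather than being written by hand.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Solve a finite MDP.

    P[a, s, s2]: probability of moving from state s to s2 under action a.
    R[s]:        reward for being in state s.
    Returns the value function and a greedy policy (one action per state).
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        Q = R[None, :] + gamma * (P @ V)     # shape (n_actions, n_states)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=0)

# Hypothetical toy model. States: 0 = low, 1 = healthy, 2 = high BGL.
R = np.array([-1.0, 1.0, -1.0])
P = np.array([
    # action 0: no dose (BGL tends to drift upward)
    [[0.2, 0.8, 0.0], [0.0, 0.6, 0.4], [0.0, 0.1, 0.9]],
    # action 1: dose (BGL tends to drift downward)
    [[0.9, 0.1, 0.0], [0.4, 0.6, 0.0], [0.0, 0.8, 0.2]],
])

V, policy = value_iteration(P, R)
print("values:", V, "policy:", policy)
```

In this toy model the greedy policy withholds the dose in the low state and doses in the high state, which is the qualitative behavior one would expect from the reward design.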

1607.03081 2026-06-04 math.NA cs.LG cs.NA math.OC stat.ML

Proximal Quasi-Newton Methods for Regularized Convex Optimization with Linear and Accelerated Sublinear Convergence Rates

近似拟牛顿方法在正则化凸优化中的应用:线性和加速次线性收敛率

Hiva Ghanbari, Katya Scheinberg

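AI summary: This paper studies the convergence of proximal quasi-Newton methods for regularized convex optimization, analyzes the exact and inexact settings in the strongly convex case, and examines the practicality and performance of an accelerated variant.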

详情
AI中文摘要

在[19]中,提出了一种通用的、不精确的、高效的近似拟牛顿算法用于复合优化问题,并建立了次线性全局收敛率。本文分析了该方法在精确和不精确设置下的收敛性质,当目标函数为强凸时。我们还研究了该方法的一个实用变种,通过建立一个简单的子问题优化停止准则。此外,我们考虑了基于FISTA[1]的加速变体,针对近似拟牛顿算法。类似加速方法在[7]中被考虑,但其收敛性分析依赖于非常强但不实际的假设。我们提出了一个修改后的分析,放松了这些假设,并对加速的近似拟牛顿算法和常规方法进行了实际比较。我们的分析和计算结果表明,在拟牛顿设置中加速可能不会带来任何好处。

英文摘要

In [19], a general, inexact, efficient proximal quasi-Newton algorithm for composite optimization problems has been proposed and a sublinear global convergence rate has been established. In this paper, we analyze the convergence properties of this method, both in the exact and inexact setting, in the case when the objective function is strongly convex. We also investigate a practical variant of this method by establishing a simple stopping criterion for the subproblem optimization. Furthermore, we consider an accelerated variant, based on FISTA [1], to the proximal quasi-Newton algorithm. A similar accelerated method has been considered in [7], where the convergence rate analysis relies on very strong impractical assumptions. We present a modified analysis while relaxing these assumptions and perform a practical comparison of the accelerated proximal quasi-Newton algorithm and the regular one. Our analysis and computational results show that acceleration may not bring any benefit in the quasi-Newton setting.

1710.05610 2026-06-04 math.PR cs.NA math.NA math.ST stat.TH

Well-posedness of Bayesian inverse problems in quasi-Banach spaces with stable priors

在具有稳定先验的准巴拿赫空间中贝叶斯逆问题的适定性

T. J. Sullivan

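AI summary: This paper studies the well-posedness of Bayesian inverse problems in infinite-dimensional parameter spaces, covering priors from the family of stable distributions and the continuity of the posterior measure.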

详情
Comments
To appear in the proceedings of the 88th Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM), Weimar 2017. This preprint differs from the final published version in pagination and typographical detail
AI中文摘要

从贝叶斯视角看逆问题近年来吸引了大量数学关注。特别关注的是参数位于无限维空间中的贝叶斯逆问题(BIPs),典型例子是通过常微分方程或偏微分方程与观测数据耦合的标量或张量场。本文介绍了Stuart(Acta Numer. 19:451--559, 2010)等人倡导的在无限维参数空间中适定BIPs的框架。该框架的优势在于能够确保无论使用何种有限维离散化方法进行数值求解,推断问题都保持一致的适定性。最近,该框架被扩展到重尾先验测度的情况,如无限维的柯西分布,其中多项式矩可能为无穷或未定义。证明了类似于平方可积随机变量的Karhunen--Loève展开式可以用于在准巴拿赫空间上采样此类测度。此外,在比目前所用的正则性假设更弱的条件下,贝叶斯后验测度在Hellinger和总变分度量下对失配函数和观测数据的扰动表现出Lipschitz连续性。

英文摘要

The Bayesian perspective on inverse problems has attracted much mathematical attention in recent years. Particular attention has been paid to Bayesian inverse problems (BIPs) in which the parameter to be inferred lies in an infinite-dimensional space, a typical example being a scalar or tensor field coupled to some observed data via an ODE or PDE. This article gives an introduction to the framework of well-posed BIPs in infinite-dimensional parameter spaces, as advocated by Stuart (Acta Numer. 19:451--559, 2010) and others. This framework has the advantage of ensuring uniformly well-posed inference problems independently of the finite-dimensional discretisation used for numerical solution. Recently, this framework has been extended to the case of a heavy-tailed prior measure in the family of stable distributions, such as an infinite-dimensional Cauchy distribution, for which polynomial moments are infinite or undefined. It is shown that analogues of the Karhunen--Loève expansion for square-integrable random variables can be used to sample such measures on quasi-Banach spaces. Furthermore, under weaker regularity assumptions than those used to date, the Bayesian posterior measure is shown to depend Lipschitz continuously in the Hellinger and total variation metrics upon perturbations of the misfit function and observed data.

1710.05513 2026-06-04 stat.ML cs.NA math.NA q-fin.ST stat.AP stat.CO

Robust Maximum Likelihood Estimation of Sparse Vector Error Correction Model

稳健的最大似然估计:稀疏向量误差修正模型

Ziping Zhao, Daniel P. Palomar

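AI summary: This paper proposes a robust estimation method based on the Cauchy distribution to handle heavy-tailed data and outliers in finance and econometrics, and uses sparse cointegration relations for feature selection and dimension reduction.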

详情
Comments
5 pages, 3 figures, to appear in Proc. of the 2017 5th IEEE Global Conference on Signal and Information Processing (GlobalSIP)
AI中文摘要

在计量经济学和金融领域,向量误差修正模型(VECM)是用于协整分析的重要时间序列模型,用于估计长期均衡变量关系。传统分析和估计方法假设底层分布为高斯分布,但在实践中,厚尾数据和异常值可能导致这些方法不适用。本文提出基于Cauchy分布的稳健模型估计方法以解决此问题。此外,考虑稀疏协整关系以实现特征选择和降维。基于主要化-最小化(MM)方法的高效算法被应用于解决所提出的非凸问题。通过数值模拟展示了该算法的性能。

英文摘要

In econometrics and finance, the vector error correction model (VECM) is an important time series model for cointegration analysis, which is used to estimate the long-run equilibrium variable relationships. The traditional analysis and estimation methodologies assume an underlying Gaussian distribution but, in practice, heavy-tailed data and outliers can make these methods inapplicable. In this paper, we propose a robust model estimation method based on the Cauchy distribution to tackle this issue. In addition, sparse cointegration relations are considered to realize feature selection and dimension reduction. An efficient algorithm based on the majorization-minimization (MM) method is applied to solve the proposed nonconvex problem. The performance of this algorithm is shown through numerical simulations.

1610.01952 2026-06-04 math.NA cs.NA math.PR math.ST stat.TH

A Machine Learning Approach to Optimal Tikhonov Regularisation I: Affine Manifolds

基于机器学习的最优Tikhonov正则化方法 I:仿射流形

Ernesto De Vito, Massimo Fornasier, Valeriya Naumova

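AI summary: This paper proposes a supervised learning method that approximates the high-dimensional function mapping noisy data to a good approximation of the optimal Tikhonov regularization parameter, assuming that solutions of the inverse problem concentrate on lower-dimensional linear subspaces and that the noise is sub-gaussian.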

详情
AI中文摘要

尽管有多种可用技术,但正确选择逆问题正则化参数仍是一个最大的挑战。主要困难在于构造一个规则,允许从给定的噪声数据中计算参数,而无需依赖先验知识或噪声水平。本文提出了一种基于监督机器学习的新方法,用于近似高维函数,将噪声数据映射到最优Tikhonov正则化参数的近似值。我们的假设是逆问题的解在低维线性子空间上集中分布,且噪声为亚高斯。其中一个令人惊讶的事实是,用于监督学习最优参数映射的先前观察示例数量最多与解子空间的维度线性相关。我们还提供了对近似参数和相应正则化解精度的显式误差界。尽管结果更偏向理论性质,我们提供了一种实际实现该方法的配方,并提供了数值实验以验证理论结果。我们还概述了未来研究的有趣方向,并提供了一些初步结果,确认其可行性。

英文摘要

Despite a variety of available techniques the issue of the proper regularization parameter choice for inverse problems still remains one of the biggest challenges. The main difficulty lies in constructing a rule, allowing to compute the parameter from given noisy data without relying either on a priori knowledge of the solution or on the noise level. In this paper we propose a novel method based on supervised machine learning to approximate the high-dimensional function, mapping noisy data into a good approximation to the optimal Tikhonov regularization parameter. Our assumptions are that solutions of the inverse problem are statistically distributed in a concentrated manner on (lower-dimensional) linear subspaces and the noise is sub-gaussian. One of the surprising facts is that the number of previously observed examples for the supervised learning of the optimal parameter mapping scales at most linearly with the dimension of the solution subspace. We also provide explicit error bounds on the accuracy of the approximated parameter and the corresponding regularization solution. Even though the results are more of theoretical nature, we present a recipe for the practical implementation of the approach and provide numerical experiments confirming the theoretical results. We also outline interesting directions for future research with some preliminary results, confirming their feasibility.

1710.03971 2026-06-04 stat.ML cs.LG cs.NA math.NA

Adaptive multi-penalty regularization based on a generalized Lasso path

基于广义Lasso路径的自适应多罚项正则化

Markus Grasmair, Timo Klock, Valeriya Naumova

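AI summary: This paper proposes a framework for adaptive parameter choice in multi-penalty regularization that constructs regions of structurally similar solutions to achieve correct support recovery, and combines them with a model selection criterion for data-adaptive parameter choice; it is compared against single-penalty algorithms on compressed sensing problems.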

详情
AI中文摘要

对于许多算法,参数调节仍是一个具有挑战性和关键性的任务,尤其是在多参数设置中变得繁琐且不可行。多罚项正则化,成功用于解决混合型不定稀疏回归问题,其中信号和噪声是加法混合的,是此类例子之一。本文提出了一种新的算法框架,用于多罚项正则化的自适应参数选择,重点在于正确支持恢复。基于正则化路径理论和单罚项函数的算法理论,我们通过提供一种高效的构造包含结构相似解的区域的程序,将这些想法扩展到多罚项框架中,即在参数范围内的整个范围内,构造具有相同稀疏性和符号模式的解。结合这一方法与模型选择准则,可以以数据自适应的方式选择正则化参数。我们算法的另一个优势是,它提供了整个参数范围内解稳定性概述。这可以进一步用于获得对感兴趣问题的额外见解。我们对我们的方法进行了数值分析,并将其与压缩感知问题中的最新单罚项算法进行比较,以展示所提算法的鲁棒性和强大性。

英文摘要

For many algorithms, parameter tuning remains a challenging and critical task, which becomes tedious and infeasible in a multi-parameter setting. Multi-penalty regularization, successfully used for solving underdetermined sparse regression problems of unmixing type where signal and noise are additively mixed, is one such example. In this paper, we propose a novel algorithmic framework for an adaptive parameter choice in multi-penalty regularization with a focus on the correct support recovery. Building upon the theory of regularization paths and algorithms for single-penalty functionals, we extend these ideas to a multi-penalty framework by providing an efficient procedure for the construction of regions containing structurally similar solutions, i.e., solutions with the same sparsity and sign pattern, over the whole range of parameters. Combining this with a model selection criterion, we can choose regularization parameters in a data-adaptive manner. Another advantage of our algorithm is that it provides an overview of the solution stability over the whole range of parameters. This can be further exploited to obtain additional insights into the problem of interest. We provide a numerical analysis of our method and compare it to the state-of-the-art single-penalty algorithms for compressed sensing problems in order to demonstrate the robustness and power of the proposed algorithm.

1701.04970 2026-06-04 math.ST cs.NA math.NA math.OC stat.TH

Risk Estimators for Choosing Regularization Parameters in Ill-Posed Problems - Properties and Limitations

在病态问题中选择正则化参数的风险估计器——性质与局限性

Felix Lucka, Katharina Proksch, Christoph Brune, Nicolai Bissantz, Martin Burger, Holger Dette, Frank Wübbeling

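AI summary: This paper studies the properties and limitations of risk estimators for choosing regularization parameters in ill-posed problems, analyzes SURE and GSURE for finite-dimensional linear Tikhonov regularization, and examines their asymptotic behavior as the problem dimension grows.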

详情
AI中文摘要

本文讨论了近期提出的一些用于选择病态问题中正则化参数的风险估计器的性质。一种简单的做法是Stein无偏风险估计器(SURE),它估计数据空间中的风险,而一种近期的改进方法(GSURE)则估计未知变量空间中的风险。似乎直观的是后者更适合病态问题,因为数据空间中的性质并不能很好地反映重建质量。我们对线性Tikhonov正则化在有限维设置中的两种估计器进行了理论研究,并估计了风险估计器的质量,这还导致了当问题维度趋于无穷时的渐近收敛结果。与之前研究图像处理问题的论文不同,我们关注了随着病态性增加时风险估计器的行为。有趣的是,我们的理论结果表明,对于病态问题,GSURE风险的质量会渐近恶化,这通过详细的数值研究得到了验证。后者显示,在许多情况下,GSURE估计器会导致极小的正则化参数,这显然无法稳定重建。与之相似但不那么严重的问题也出现在SURE估计器上,与相对保守的不一致原理相比,得出基于无偏风险估计的正则化参数选择并非病态问题的可靠方法。对稀疏正则化的相似数值研究显示,相同问题也出现在非线性变分正则化方法中。

英文摘要

This paper discusses the properties of certain risk estimators recently proposed to choose regularization parameters in ill-posed problems. A simple approach is Stein's unbiased risk estimator (SURE), which estimates the risk in the data space, while a recent modification (GSURE) estimates the risk in the space of the unknown variable. It seems intuitive that the latter is more appropriate for ill-posed problems, since the properties in the data space do not tell much about the quality of the reconstruction. We provide theoretical studies of both estimators for linear Tikhonov regularization in a finite dimensional setting and estimate the quality of the risk estimators, which also leads to asymptotic convergence results as the dimension of the problem tends to infinity. Unlike previous papers, which studied image processing problems with a very low degree of ill-posedness, we are interested in the behavior of the risk estimators for increasing ill-posedness. Interestingly, our theoretical results indicate that the quality of the GSURE risk can deteriorate asymptotically for ill-posed problems, which is confirmed by a detailed numerical study. The latter shows that in many cases the GSURE estimator leads to extremely small regularization parameters, which obviously cannot stabilize the reconstruction. Similar but less severe issues with respect to robustness also appear for the SURE estimator, which in comparison to the rather conservative discrepancy principle leads to the conclusion that regularization parameter choice based on unbiased risk estimation is not a reliable procedure for ill-posed problems. A similar numerical study for sparsity regularization demonstrates that the same issue appears in nonlinear variational regularization approaches.
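For linear Tikhonov regularization, the data-space SURE criterion has a closed form that is easy to evaluate on a grid of parameters. The sketch below assumes Gaussian noise with known standard deviation `sigma` and the penalty $α\|x\|^2$; the function names are hypothetical, and GSURE is not implemented.

```python
import numpy as np

def tikhonov(A, y, alpha):
    """x_alpha = argmin ||A x - y||^2 + alpha ||x||^2 via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

def sure(A, y, alpha, sigma):
    """SURE = ||A x_alpha - y||^2 - m sigma^2 + 2 sigma^2 * trace(hat matrix)."""
    m = A.shape[0]
    s = np.linalg.svd(A, compute_uv=False)
    dof = np.sum(s**2 / (s**2 + alpha))      # trace of A (A^T A + alpha I)^{-1} A^T
    resid = A @ tikhonov(A, y, alpha) - y
    return resid @ resid - m * sigma**2 + 2 * sigma**2 * dof

def select_alpha(A, y, alphas, sigma):
    """Pick the grid value with the smallest SURE score."""
    scores = [sure(A, y, a, sigma) for a in alphas]
    return alphas[int(np.argmin(scores))]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((30, 10))
    x_true = rng.standard_normal(10)
    y = A @ x_true + 0.5 * rng.standard_normal(30)
    print(select_alpha(A, y, [1e-3, 1e-2, 1e-1, 1.0, 10.0], sigma=0.5))
```

As the abstract cautions, a minimizer chosen this way may be unreliable when the problem is strongly ill-posed, so the selected value is best cross-checked against a more conservative rule such as the discrepancy principle.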

1502.02860 2026-06-04 stat.ML cs.LG cs.RO cs.SY eess.SY

Gaussian Processes for Data-Efficient Learning in Robotics and Control

高斯过程在机器人和控制中的数据高效学习

Marc Peter Deisenroth, Dieter Fox, Carl Edward Rasmussen

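AI summary: This paper learns a non-parametric Gaussian process transition model that extracts more information from data to speed up learning and reduce the effect of model errors, enabling data-efficient autonomous learning.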

详情
Journal ref
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, issue no 2, pages 408-423, February 2015
Comments
20 pages, 29 figures; fixed a typo in equation on page 8
AI中文摘要

自主学习在控制和机器人领域已持续十多年,数据驱动学习可减少工程知识需求。然而,自主强化学习通常需要大量系统交互,这在实际系统中(如机器人)不现实。本文提出通过高斯过程转移模型提取更多数据信息,显式纳入模型不确定性以减少误差影响,相比现有RL方法,模型基于策略搜索方法实现了前所未有的学习速度,并在真实机器人和控制任务中展示了应用价值。

英文摘要

Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning allows reducing the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this article, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
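The role of a probabilistic Gaussian process model, a prediction together with its uncertainty, can be sketched with standard GP regression. This is a generic illustration under assumptions: a squared-exponential kernel with fixed, hand-picked hyperparameters and a one-dimensional toy target. It does not reproduce the article's long-term planning or policy search.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between row-wise inputs (assumed kernel choice)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X, y, X_star, noise_var=1e-2):
    """Posterior mean and variance of the latent function at X_star."""
    K = rbf(X, X) + noise_var * np.eye(len(X))
    K_s = rbf(X, X_star)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(X_star, X_star)) - np.sum(v**2, axis=0)
    return mean, var

if __name__ == "__main__":
    # Toy stand-in for a transition model: input = state, target = next state.
    X = np.linspace(-3, 3, 20)[:, None]
    y = np.sin(X[:, 0])
    mean, var = gp_predict(X, y, np.array([[0.1], [10.0]]), noise_var=1e-4)
    print(mean, var)   # low variance near the data, prior variance far from it
```

The predictive variance grows away from the training inputs, which is the signal a planner can use to avoid trusting the model where it has seen no data.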

1710.02901 2026-06-04 math.OC cs.DS cs.SY eess.SY stat.ML

Response to "Counterexample to global convergence of DSOS and SDSOS hierarchies"

对“DSOS和SDSOS层次结构全局收敛性的反例”的回应

Amir Ali Ahmadi, Anirudha Majumdar

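AI summary: This paper responds to a claimed counterexample to the global convergence of DSOS and SDSOS hierarchies, noting that the authors' earlier work never defined the hierarchies as the note does and that convergent hierarchies based on DSOS and SDSOS optimization exist.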

详情
AI中文摘要

在最近的一篇笔记[8]中,作者提供了针对多项式优化问题(POPs)中DSOS和SDSOS层次结构全局收敛性的反例,并声称这推翻了我们扩展摘要[4]和幻灯片[3]中的主张。本文旨在澄清,[4]、[3]以及显然我们的完整论文[5]从未将DSOS或SDSOS层次结构定义为[8]中所描述的方式。显然,[8]中层次结构的收敛性声明从未被提出过。[4,3]中所陈述的是完全不同的:我们声明存在基于DSOS和SDSOS优化的层次结构能够收敛。这确实如我们在本文回应中所讨论的那样是正确的。我们还强调,我们清楚地意识到某些(S)DSOS层次结构即使其自然SOS对应物能够收敛,它们也可能不收敛。这可以通过我们先前工作[5]中的一个例子直接推导出来,这使得[8]中的反例变得多余。最后,我们提供了具体的反论点,以反驳[8]中旨在挑战DSOS和SDSOS优化相对于SOS优化的可扩展性改进的主张。

英文摘要

In a recent note [8], the author provides a counterexample to the global convergence of what his work refers to as "the DSOS and SDSOS hierarchies" for polynomial optimization problems (POPs) and purports that this refutes claims in our extended abstract [4] and slides in [3]. The goal of this paper is to clarify that neither [4], nor [3], and certainly not our full paper [5], ever defined DSOS or SDSOS hierarchies as it is done in [8]. It goes without saying that no claims about convergence properties of the hierarchies in [8] were ever made as a consequence. What was stated in [4,3] was completely different: we stated that there exist hierarchies based on DSOS and SDSOS optimization that converge. This is indeed true as we discuss in this response. We also emphasize that we were well aware that some (S)DSOS hierarchies do not converge even if their natural SOS counterparts do. This is readily implied by an example in our prior work [5], which makes the counterexample in [8] superfluous. Finally, we provide concrete counterarguments to claims made in [8] that aim to challenge the scalability improvements obtained by DSOS and SDSOS optimization as compared to sum of squares (SOS) optimization.

[3] A. A. Ahmadi and A. Majumdar. DSOS and SDSOS: More tractable alternatives to SOS. Slides at the meeting on Geometry and Algebra of Linear Matrix Inequalities, CIRM, Marseille, 2013.
[4] A. A. Ahmadi and A. Majumdar. DSOS and SDSOS optimization: LP and SOCP-based alternatives to sum of squares optimization. In proceedings of the 48th annual IEEE Conference on Information Sciences and Systems, 2014.
[5] A. A. Ahmadi and A. Majumdar. DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization. arXiv:1706.02586, 2017.
[8] C. Josz. Counterexample to global convergence of DSOS and SDSOS hierarchies. arXiv:1707.02964, 2017.

1701.05609 2026-06-04 stat.CO cs.NA math.NA

Confidence Intervals for Finite Difference Solutions

有限差分解的置信区间

Majnu John, Yihren Wu

AI总结 本文基于回归框架,采用贝叶斯方法为有限差分解构建置信区间,探讨其与截断误差的关系及应用价值。

详情
AI中文摘要

尽管之前已有统计学家考虑过贝叶斯分析在数值求积问题中的应用,但最近才有人关注统计学与微分方程数值分析之间的联系。本文基于这一最新趋势,展示如何将常用的有限差分方案用于常微分和偏微分方程的数值解的回归设置中。聚焦于这一回归框架,我们应用简单的贝叶斯策略来获得有限差分解的置信区间。我们通过多个示例应用此框架,展示置信区间如何与截断误差相关联,并说明这些置信区间在所考虑示例中的实用性。

英文摘要

Although applications of Bayesian analysis for numerical quadrature problems have been considered before, it's only very recently that statisticians have focused on the connections between statistics and numerical analysis of differential equations. In line with this very recent trend, we show how certain commonly used finite difference schemes for numerical solutions of ordinary and partial differential equations can be considered in a regression setting. Focusing on this regression framework, we apply a simple Bayesian strategy to obtain confidence intervals for the finite difference solutions. We apply this framework on several examples to show how the confidence intervals are related to truncation error and illustrate the utility of the confidence intervals for the examples considered.

1709.10276 2026-06-04 math.NA cs.NA stat.ML

Fast online low-rank tensor subspace tracking by CP decomposition using recursive least squares from incomplete observations

基于不完整观测、采用CP分解与递归最小二乘法的快速在线低秩张量子空间跟踪

Hiroyuki Kasai

AI总结 本文提出基于CP分解的OLSTEC算法,用于在线跟踪动态变化的低维子空间,通过递归最小二乘法提升收敛速度。

详情
Comments
Extended version of arXiv:1602.07067 (IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016))
AI中文摘要

我们考虑了在线子空间跟踪的问题,其中数据流部分观测且受噪声干扰,假设数据位于低维线性子空间中。将此问题转化为在线低秩张量补全问题。我们提出了一种基于CP分解的在线张量子空间跟踪算法OLSTEC,特别针对动态变化的子空间。通过递归最小二乘法(RLS)构建算法,数值实验表明OLSTEC在收敛速度方面优于现有方法。

英文摘要

We consider the problem of online subspace tracking of a partially observed high-dimensional data stream corrupted by noise, where we assume that the data lie in a low-dimensional linear subspace. This problem is cast as an online low-rank tensor completion problem. We propose a novel online tensor subspace tracking algorithm based on the CANDECOMP/PARAFAC (CP) decomposition, dubbed OnLine Low-rank Subspace tracking by TEnsor CP Decomposition (OLSTEC). The proposed algorithm especially addresses the case in which the subspace of interest is dynamically time-varying. To this end, we build up our proposed algorithm exploiting the recursive least squares (RLS), which is the second-order gradient algorithm. Numerical evaluations on synthetic datasets and real-world datasets such as communication network traffic, environmental data, and surveillance videos, show that the proposed OLSTEC algorithm outperforms state-of-the-art online algorithms in terms of the convergence rate per iteration.
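
The abstract names recursive least squares (RLS) as the building block of OLSTEC. As a rough illustration only, the sketch below runs a textbook RLS update with a forgetting factor on a two-parameter linear model. It is not the OLSTEC tensor algorithm; the function names, the forgetting factor `lam`, the initial matrix `P`, and the synthetic noise-free data are all assumptions made for this example.

```python
import random


def rls_step(w, P, x, y, lam=0.98):
    """One textbook RLS update with forgetting factor lam (hypothetical helper)."""
    n = len(w)
    # Px = P @ x
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [Px[i] / denom for i in range(n)]
    # A priori prediction error on the new sample.
    err = y - sum(w[i] * x[i] for i in range(n))
    w = [w[i] + gain[i] * err for i in range(n)]
    # P <- (P - gain * (P x)^T) / lam, using the symmetry of P.
    P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return w, P


def run_rls(true_w=(2.0, -1.0), steps=200, seed=0):
    """Track a fixed weight vector from streaming noise-free samples (assumed setup)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    P = [[1000.0, 0.0], [0.0, 1000.0]]  # large initial P: weak prior on w
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = true_w[0] * x[0] + true_w[1] * x[1]
        w, P = rls_step(w, P, x, y)
    return w


estimate = run_rls()
```

Each update costs a fixed amount of work per sample, which is what makes RLS attractive for streaming data; a forgetting factor below 1 discounts old samples so the estimate can follow a slowly changing target.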

1709.10142 2026-06-04 eess.SY cs.SY math.OC stat.ML

Resilient Learning-Based Control for Synchronization of Passive Multi-Agent Systems under Attack

抗攻击的基于学习的控制:被动多智能体系统同步研究

Arash Rahnama, Panos J. Antsaklis

AI总结 本文提出分布式事件触发控制框架,用于在带宽受限网络中实现被动多智能体系统的同步并减少通信负载,同时分析了拜占庭攻击对同步的影响,并提出学习方法以缓解攻击影响。

详情
AI中文摘要

本文展示了一组输出被动智能体通过通信图实现共同目标的同步。我们提出了一种分布式事件触发控制框架,以保证同步并显著减少所需通信负载。我们定义了事件触发多智能体网络系统中的通用拜占庭攻击,并阐述其对同步的负面影响。拜占庭智能体能够智能地伪造数据并通过改变各自的控制反馈权重来操纵底层通信图。我们引入了去中心化的检测框架并分析其稳态和瞬态性能。我们提出了一种识别个别拜占庭邻居的方法以及一种基于学习的估计攻击参数的方法。最后,我们提出基于学习的控制方法以缓解对抗性攻击的负面影响。

英文摘要

In this paper, we show synchronization for a group of output passive agents that communicate with each other according to an underlying communication graph to achieve a common goal. We propose a distributed event-triggered control framework that will guarantee synchronization and considerably decrease the required communication load on the band-limited network. We define a general Byzantine attack on the event-triggered multi-agent network system and characterize its negative effects on synchronization. The Byzantine agents are capable of intelligently falsifying their data and manipulating the underlying communication graph by altering their respective control feedback weights. We introduce a decentralized detection framework and analyze its steady-state and transient performances. We propose a way of identifying individual Byzantine neighbors and a learning-based method of estimating the attack parameters. Lastly, we propose learning-based control approaches to mitigate the negative effects of the adversarial attack.

1504.03413 2026-06-04 eess.SY cs.DC cs.SY stat.AP stat.ML

Consensus based Detection in the Presence of Data Falsification Attacks

基于共识的检测在数据伪造攻击下的应用

Bhavya Kailkhura, Swastik Brahma, Pramod K. Varshney

AI总结 本文研究分布式网络中数据伪造攻击下的检测问题,提出一种鲁棒的分布式加权平均共识算法,通过本地计算全局检验统计量来提升检测性能。

详情
AI中文摘要

本文考虑了在存在数据伪造(拜占庭)攻击情况下分布式网络中的检测问题。所考虑的检测方法基于完全分布式共识算法,在没有融合中心的情况下,所有节点仅与邻居交换信息。在这样的网络中,我们刻画了拜占庭行为对传统共识检测算法稳态和瞬态检测性能的负面影响。为了解决这一问题,我们从网络设计者的角度出发。具体来说,我们首先提出了一种对拜占庭攻击具有鲁棒性的分布式加权平均共识算法。我们证明,在合理假设下,使用所提出的共识算法,每个节点可以本地计算全局检验统计量。我们利用节点数据的统计分布,设计技术以减轻数据伪造拜占庭对分布式检测系统的影响。由于某些节点数据统计分布的参数可能无法事先确定,我们提出基于学习的技术,以实现本地融合或更新规则的自适应设计。

英文摘要

This paper considers the problem of detection in distributed networks in the presence of data falsification (Byzantine) attacks. Detection approaches considered in the paper are based on fully distributed consensus algorithms, where all of the nodes exchange information only with their neighbors in the absence of a fusion center. In such networks, we characterize the negative effect of Byzantines on the steady-state and transient detection performance of the conventional consensus based detection algorithms. To address this issue, we study the problem from the network designer's perspective. More specifically, we first propose a distributed weighted average consensus algorithm that is robust to Byzantine attacks. We show that, under reasonable assumptions, the global test statistic for detection can be computed locally at each node using our proposed consensus algorithm. We exploit the statistical distribution of the nodes' data to devise techniques for mitigating the influence of data falsifying Byzantines on the distributed detection system. Since some parameters of the statistical distribution of the nodes' data might not be known a priori, we propose learning based techniques to enable an adaptive design of the local fusion or update rules.
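
To make the "conventional consensus" baseline concrete, here is a minimal sketch of plain average consensus on a small path graph, where each node repeatedly moves toward its neighbors' values. It is not the robust weighted algorithm proposed in the paper; the graph, the step size, and the sensor readings are assumptions chosen for illustration.

```python
def consensus(values, neighbors, step=0.3, iters=200):
    """Synchronous average consensus: x_i <- x_i + step * sum_j (x_j - x_i)."""
    x = list(values)
    for _ in range(iters):
        # Build the new state from the old one so all nodes update simultaneously.
        x = [
            xi + step * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)
        ]
    return x


# Hypothetical 5-node path graph 0 - 1 - 2 - 3 - 4 (no fusion center).
path_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
readings = [1.0, 4.0, 2.0, 8.0, 5.0]  # made-up local measurements, mean 4.0

result = consensus(readings, path_neighbors)
```

Because the update is symmetric, the sum of the node values is preserved and every node converges to the network average. A node that reports a falsified reading changes that preserved sum, and therefore the value every honest node converges to, which is the vulnerability the abstract describes.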

1709.07032 2026-06-04 cs.RO cs.MA cs.SY eess.SY stat.AP

Data-Driven Model Predictive Control of Autonomous Mobility-on-Demand Systems

数据驱动的自动驾驶按需出行系统模型预测控制

Ramon Iglesias, Federico Rossi, Kevin Wang, David Hallac, Jure Leskovec, Marco Pavone

AI总结 本文提出一种端到端的数据驱动框架,用于控制自动驾驶按需出行系统,通过时间扩展网络建模并设计MPC算法,利用历史数据预测短期需求,减少乘客等待时间达89.6%。

详情
Comments
Submitted to the International Conference on Robotics and Automation 2018
AI中文摘要

本文旨在提出一种端到端的数据驱动框架,用于控制自动驾驶按需出行系统(AMoD,即自动驾驶车队)。我们首先使用时间扩展网络建模AMoD系统,并提出一种计算最优再平衡策略(即预置重新定位)和给定旅行需求的最小可行车队规模的公式。然后,我们适应此公式,设计出一种模型预测控制(MPC)算法,利用基于历史数据的短期需求预测来计算再平衡策略。我们使用最先进的LSTM神经网络和滴滴出行的真实客户数据测试该控制器的端到端性能,证明该方法在大规模系统中表现优异(MPC算法的计算复杂度不依赖于系统中的客户和车辆数量),并且在减少平均乘客等待时间方面优于现有再平衡策略,最高可减少89.6%。

英文摘要

The goal of this paper is to present an end-to-end, data-driven framework to control Autonomous Mobility-on-Demand systems (AMoD, i.e., fleets of self-driving vehicles). We first model the AMoD system using a time-expanded network, and present a formulation that computes the optimal rebalancing strategy (i.e., preemptive repositioning) and the minimum feasible fleet size for a given travel demand. Then, we adapt this formulation to devise a Model Predictive Control (MPC) algorithm that leverages short-term demand forecasts based on historical data to compute rebalancing strategies. We test the end-to-end performance of this controller with a state-of-the-art LSTM neural network to predict customer demand and real customer data from DiDi Chuxing: we show that this approach scales very well for large systems (indeed, the computational complexity of the MPC algorithm does not depend on the number of customers and of vehicles in the system) and outperforms state-of-the-art rebalancing strategies by reducing the mean customer wait time by up to 89.6%.

1702.08446 2026-06-04 math.NA cond-mat.stat-mech cs.NA stat.CO

Monte Carlo on manifolds: sampling densities and integrating functions

流形上的蒙特卡洛:采样密度与积分函数

Emilio Zappa, Miranda Holmes-Cerfon, Jonathan Goodman

AI总结 本文提出在欧几里得空间中定义的流形上基于马尔可夫链蒙特卡洛的方法,用于采样概率分布和计算积分,并通过实验验证了其在硬球系统熵计算中的有效性。

详情
Comments
New version. 32 pages, 11 figures
AI中文摘要

我们描述并分析了在欧几里得空间中由等式和不等式约束定义的流形上的某些蒙特卡洛方法。首先,我们为在这些流形上定义的概率分布提供了一个MCMC采样器,该采样器使用特定的正交投影到表面,仅需关于流形切空间的信息,可通过约束函数的一阶导数获得,从而避免了曲率信息或二阶导数的需要。其次,我们利用该采样器开发了一种多阶段算法来计算这些流形上的积分。我们提供了单次运行误差估计,避免了需要多个独立运行。在各种测试问题上的计算实验显示,算法和误差估计在实践中有效。该方法应用于计算不同粘性硬球系统的熵,这些预测了硬粘性球在何种温度或相互作用能下环状结构比链状结构更优。

英文摘要

We describe and analyze some Monte Carlo methods for manifolds in Euclidean space defined by equality and inequality constraints. First, we give an MCMC sampler for probability distributions defined by un-normalized densities on such manifolds. The sampler uses a specific orthogonal projection to the surface that requires only information about the tangent space to the manifold, obtainable from first derivatives of the constraint functions, hence avoiding the need for curvature information or second derivatives. Second, we use the sampler to develop a multi-stage algorithm to compute integrals over such manifolds. We provide single-run error estimates that avoid the need for multiple independent runs. Computational experiments on various test problems show that the algorithms and error estimates work in practice. The method is applied to compute the entropies of different sticky hard sphere systems. These predict the temperature or interaction energy at which loops of hard sticky spheres become preferable to chains.

1709.06011 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY stat.ML

Guided Deep Reinforcement Learning for Swarm Systems

引导式深度强化学习用于群体系统

Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

AI总结 本文研究如何通过有限感知能力的协作代理(如机器人群)学习控制方法,提出引导式强化学习框架,利用中央 critic 获取全局状态以简化策略评估,通过深度强化学习近似 Q 函数和策略。

详情
Comments
15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots and Multirobot Systems (ARMS) Workshop
AI中文摘要

本文研究如何学习控制具有有限感知能力的协作代理群体(如机器人群)。代理仅具备基本传感器能力,但通过协作可完成复杂任务,如分布式装配或搜索救援。学习群体代理的策略因分布式部分可观测性而困难。本文采用引导式方法,其中 critic 在学习过程中拥有全局状态的中央访问,从而从强化学习角度简化策略评估问题。例如,通过摄像头图像获取所有机器人位置,但该图像仅供 critic 使用,不供机器人控制策略。本文采用 actor-critic 方法,其中 actor 仅基于本地感知信息做决策,而 critic 基于真实全局状态进行学习。算法使用深度强化学习近似 Q 函数和策略。算法性能在两个简单模拟 2D 代理任务上进行评估:1) 找到并维持一定距离;2) 定位目标。

英文摘要

In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities, such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target.

1604.00151 2026-06-04 eess.SY cs.SY stat.ML

Analysis of gradient descent methods with non-diminishing, bounded errors

带有不消失的有界误差的梯度下降方法分析

Arunselvan Ramaswamy, Shalabh Bhatnagar

AI总结 本文分析了在梯度误差不必然消失的情况下梯度下降算法的稳定性与收敛性,提出充分条件并展示算法收敛于最小值集的邻域,同时扩展了现有文献中的相关研究。

详情
Comments
arXiv admin note: text overlap with arXiv:1502.01953, IEEE Transactions on Automatic Control, 2017
AI中文摘要

本文的主要目的是分析梯度误差不一定渐近消失的梯度下降(GD)算法。特别地,本文针对具有有界、(可能)不消失梯度误差的GD,给出了保证其稳定性(迭代序列几乎必然有界)和收敛性的充分条件。此外,该算法被证明收敛于最小值集的一个小邻域,该邻域依赖于梯度误差。值得注意的是,本文的主要结果可用于证明具有渐近消失误差的GD算法确实收敛于最小值集。本文的结果比以前的结果更为一般,且据我们所知,我们对具有误差的GD算法的分析在文献中是全新的。本文的工作扩展了Mangasarian & Solodov、Bertsekas & Tsitsiklis以及Tadic & Doucet的研究成果。使用我们的框架,我们提出了一种简单而有效的GD实现方法,利用同时扰动随机逼近(SPSA),并采用常数敏感度参数。另一个重要的改进是,没有对步长施加“额外的”限制。在机器学习应用中,步长与学习率相关,我们的假设不像其他论文那样影响这些学习率。最后,我们通过实验结果来验证我们的理论。

英文摘要

The main aim of this paper is to provide an analysis of gradient descent (GD) algorithms with gradient errors that do not necessarily vanish asymptotically. In particular, sufficient conditions are presented for both stability (almost sure boundedness of the iterates) and convergence of GD with bounded, (possibly) non-diminishing gradient errors. In addition to ensuring stability, such an algorithm is shown to converge to a small neighborhood of the minimum set, which depends on the gradient errors. It is worth noting that the main result of this paper can be used to show that GD with asymptotically vanishing errors indeed converges to the minimum set. The results presented herein are not only more general when compared to previous results, but our analysis of GD with errors is new to the literature to the best of our knowledge. Our work extends the contributions of Mangasarian & Solodov, Bertsekas & Tsitsiklis and Tadic & Doucet. Using our framework, a simple yet effective implementation of GD using simultaneous perturbation stochastic approximations (SPSA), with constant sensitivity parameters, is presented. Another important improvement over many previous results is that there are no 'additional' restrictions imposed on the step-sizes. In machine learning applications where step-sizes are related to learning rates, our assumptions, unlike those of other papers, do not affect these learning rates. Finally, we present experimental results to validate our theory.
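
The claim that GD with bounded, non-vanishing gradient errors settles in a neighborhood of the minimum set can be seen on a one-dimensional toy problem. The sketch below is an assumption-laden illustration, not the paper's analysis: the objective f(x) = x²/2, the step sizes a_n = 1/(n+2), the error bound `eps`, and the uniform error model are all choices made for this example.

```python
import random


def gd_with_bounded_errors(x0=5.0, eps=0.1, steps=10000, seed=0):
    """GD on f(x) = x^2 / 2 where each gradient is corrupted by an error in [-eps, eps]."""
    rng = random.Random(seed)
    x = x0
    for n in range(steps):
        a_n = 1.0 / (n + 2)               # diminishing, non-summable step size
        error = rng.uniform(-eps, eps)    # bounded error that never vanishes
        x -= a_n * (x + error)            # exact gradient of f is x
    return x


x_final = gd_with_bounded_errors()
```

For this toy problem one can check by induction that |x_n| ≤ eps + (|x_0| − eps)/(n+1) for any error sequence bounded by `eps`, so the iterates approach the interval [−eps, eps] around the minimizer 0. The size of that neighborhood shrinks with the error bound, matching the qualitative statement in the abstract.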

1602.04436 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Autoregressive Moving Average Graph Filtering

自回归移动平均图滤波器

Elvin Isufi, Andreas Loukas, Andrea Simonetto, Geert Leus

AI总结 本文提出了一种自回归移动平均图滤波器,能够近似任意图频响应并实现信号去噪和插值。该方法适用于静态和时变场景,通过二维滤波处理时变图信号。

详情
Journal ref
IEEE Transactions on Signal Processing, vol. 67 (2), pages 274 - 288, 2017
AI中文摘要

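图滤波器是图信号处理领域的基石之一,它是经典滤波器的直接类比,但面向定义在图上的信号。本文为分布式图滤波问题带来了新的见解。我们设计了一族自回归移动平均(ARMA)递归,它们(i)能够近似任意期望的图频率响应,(ii)能为图信号去噪和插值等任务给出精确解。其设计思想允许独立于底层图来设计ARMA系数,这使得ARMA图滤波器适用于静态场景,尤其适用于时变场景,即图信号和/或图随时间变化的情形。我们证明,对于时变图信号,该方法可自然扩展为同时在图域和常规时间域上工作的二维滤波器。我们还推导了图和信号均时变时滤波器稳定的充分条件。本文的解析和数值结果表明,正如理论推导所预测的,ARMA图滤波器在静态和时变场景中都具有实用价值。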

英文摘要

Graph filters, direct analogues of classical filters intended for signals defined on graphs, are one of the cornerstones of the field of signal processing on graphs. This work brings forth new insights on the distributed graph filtering problem. We design a family of autoregressive moving average (ARMA) recursions, which (i) are able to approximate any desired graph frequency response, and (ii) give exact solutions for tasks such as graph signal denoising and interpolation. The design philosophy, which allows us to design the ARMA coefficients independently from the underlying graph, renders the ARMA graph filters suitable in static and, particularly, time-varying settings. The latter occur when the graph signal and/or graph are changing over time. We show that in case of a time-varying graph signal our approach extends naturally to a two-dimensional filter, operating concurrently in the graph and regular time domains. We also derive sufficient conditions for filter stability when the graph and signal are time-varying. The analytical and numerical results presented in this paper illustrate that ARMA graph filters are practically appealing for static and time-varying settings, as predicted by theoretical derivations.
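
A first-order recursion of the kind the abstract alludes to can be sketched as y ← ψ·M·y + φ·x, where M is a graph matrix and x the input signal. The example below is a minimal illustration under stated assumptions: M is taken to be the Laplacian of a 4-node path graph, and the coefficients `psi` and `phi`, the signal, and the helper names are hypothetical rather than taken from the paper.

```python
def laplacian_path(n):
    """Combinatorial Laplacian of a path graph with n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L


def matvec(M, v):
    """Dense matrix-vector product; on a real network each row is a local exchange."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def arma1_filter(M, x, psi, phi, iters=100):
    """Iterate y <- psi * M y + phi * x starting from y = 0."""
    y = [0.0] * len(x)
    for _ in range(iters):
        My = matvec(M, y)
        y = [psi * my + phi * xi for my, xi in zip(My, x)]
    return y


L = laplacian_path(4)
signal = [1.0, -2.0, 0.5, 3.0]
filtered = arma1_filter(L, signal, psi=0.2, phi=0.5)
constant_out = arma1_filter(L, [1.0] * 4, psi=0.2, phi=0.5)
```

When |ψ| times the spectral radius of M is below 1, the recursion converges to φ(I − ψM)⁻¹x, so a component of x along an eigenvector of M with eigenvalue μ is scaled by the rational gain φ/(1 − ψμ). A constant signal lies at graph frequency μ = 0 and is therefore simply scaled by φ.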

1709.04574 2026-06-04 cs.HC cs.AI cs.SY eess.SY stat.ML

Towards personalized human AI interaction - adapting the behavior of AI agents using neural signatures of subjective interest

迈向个性化的人工智能交互 - 利用主观兴趣的神经签名来适应AI代理的行为

Victor Shih, David C Jangraw, Paul Sajda, Sameer Saproo

AI总结 本文提出通过神经签名检测用户兴趣,使深度强化学习AI代理适应个性化人类偏好,首次展示hBCI在虚拟环境中隐式强化AI控制系统的应用。

详情
Comments
11 pages, 9 figures, 1 table, Submitted to IEEE Trans. on Neural Networks and Learning Systems
AI中文摘要

强化学习AI通常使用环境中的客观奖励/惩罚信号(如游戏得分、完成时间等)来学习最优任务策略。然而,此类AI代理的人机交互应包含隐式且主观的强化信号(如人类对特定AI行为的偏好),以适应个体化的人类偏好。这种适应会模仿自然发生的增强信任和舒适度的社会互动过程。本文展示如何利用混合脑机接口(hBCI)检测个体在虚拟环境中的兴趣水平,以适应控制虚拟自动驾驶车辆的深度强化学习AI代理。具体而言,我们展示AI学习了一种保持与前车安全距离的驾驶策略,并最值得注意的是,当车辆乘客遇到感兴趣物体时,优先减速。这种适应使主观有趣物体的观看时间增加了20%。这是首次展示如何利用hBCI以包含用户偏好的方式向AI代理提供隐式强化。

英文摘要

Reinforcement Learning AI commonly uses reward/penalty signals that are objective and explicit in an environment -- e.g. game score, completion time, etc. -- in order to learn the optimal strategy for task performance. However, Human-AI interaction for such AI agents should include additional reinforcement that is implicit and subjective -- e.g. human preferences for certain AI behavior -- in order to adapt the AI behavior to idiosyncratic human preferences. Such adaptations would mirror naturally occurring processes that increase trust and comfort during social interactions. Here, we show how a hybrid brain-computer-interface (hBCI), which detects an individual's level of interest in objects/events in a virtual environment, can be used to adapt the behavior of a Deep Reinforcement Learning AI agent that is controlling a virtual autonomous vehicle. Specifically, we show that the AI learns a driving strategy that maintains a safe distance from a lead vehicle, and most novelly, preferentially slows the vehicle when the human passengers of the vehicle encounter objects of interest. This adaptation affords an additional 20% viewing time for subjectively interesting objects. This is the first demonstration of how an hBCI can be used to provide implicit reinforcement to an AI agent in a way that incorporates user preferences into the control system.

1709.04073 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging

线性随机逼近:固定步长和迭代平均

Chandrashekar Lakshminarayanan, Csaba Szepesvári

AI总结 本文研究了固定步长和Polyak-Ruppert平均的线性随机逼近算法,分析了其均方误差随迭代次数的变化,并探讨了在不同数据分布下固定步长的选择条件及启发式调整方法。

详情
Comments
16 pages, 2 figures, was submitted to NIPS 2017
AI中文摘要

本文研究了具有固定步长和Polyak-Ruppert(PR)迭代平均的$d$维线性随机逼近算法(LSAs)。LSAs广泛应用于机器学习和强化学习(RL)中,其目标是利用噪声数据和每个迭代$O(d)$次更新来计算合适的$θ_*∈\mathbb{R}^d$(即最优解或固定点)。本文受RL中从经验回放中评估策略的问题启发,探讨了属于时间差分(TD)类学习算法的LSAs。对于具有固定步长和PR平均的LSAs,我们提供了$t$次迭代后的均方误差(MSE)的界限。我们假设数据是独立同分布且具有有限方差(底层分布为$P$)且期望动力学是Hurwitz的。对于给定的LSA与PR平均,以及满足上述假设的数据分布$P$,我们证明存在一个常数步长范围,使得其MSE衰减为$O(1/t)$。我们还探讨了在数据分布$\mathcal{P}$中选择统一常数步长的条件,并证明并非所有数据分布都允许这样的统一常数步长。此外,我们建议一种启发式步长调整算法,用于为给定的数据分布$P$选择LSA的常数步长。我们还比较了我们的结果与相关工作,并讨论了我们的结果在TD算法作为LSAs的上下文中的意义。

英文摘要

We consider $d$-dimensional linear stochastic approximation algorithms (LSAs) with a constant step-size and the so-called Polyak-Ruppert (PR) averaging of iterates. LSAs are widely applied in machine learning and reinforcement learning (RL), where the aim is to compute an appropriate $θ_{*} \in \mathbb{R}^d$ (that is an optimum or a fixed point) using noisy data and $O(d)$ updates per iteration. In this paper, we are motivated by the problem (in RL) of policy evaluation from experience replay using the temporal difference (TD) class of learning algorithms that are also LSAs. For LSAs with a constant step-size, and PR averaging, we provide bounds for the mean squared error (MSE) after $t$ iterations. We assume that data is i.i.d. with finite variance (underlying distribution being $P$) and that the expected dynamics is Hurwitz. For a given LSA with PR averaging, and data distribution $P$ satisfying the said assumptions, we show that there exists a range of constant step-sizes such that its MSE decays as $O(\frac{1}{t})$. We examine the conditions under which a constant step-size can be chosen uniformly for a class of data distributions $\mathcal{P}$, and show that not all data distributions 'admit' such a uniform constant step-size. We also suggest a heuristic step-size tuning algorithm to choose a constant step-size of a given LSA for a given data distribution $P$. We compare our results with related work and also discuss the implication of our results in the context of TD algorithms that are LSAs.
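
To illustrate what constant step-size LSA with Polyak-Ruppert averaging looks like in practice, the sketch below runs the scalar recursion θ ← θ + α(b_t − A_t·θ) on i.i.d. noisy data and keeps a running average of the iterates. Everything concrete here is an assumption for illustration: the scalar setting, the uniform noise, the means E[A_t] = 1 and E[b_t] = 2 (so the target is θ* = 2), and the step size α = 0.1.

```python
import random


def lsa_pr_average(alpha=0.1, steps=20000, seed=0):
    """Scalar LSA with constant step size alpha and Polyak-Ruppert averaging."""
    rng = random.Random(seed)
    theta = 0.0
    running_sum = 0.0
    for _ in range(steps):
        A_t = 1.0 + rng.uniform(-0.5, 0.5)   # i.i.d. sample, E[A_t] = 1
        b_t = 2.0 + rng.uniform(-0.5, 0.5)   # i.i.d. sample, E[b_t] = 2
        theta += alpha * (b_t - A_t * theta)
        running_sum += theta
    return theta, running_sum / steps


last_iterate, averaged_iterate = lsa_pr_average()
```

With a constant step size the last iterate keeps fluctuating around θ* at a scale set by α, while the averaged iterate smooths those fluctuations out; the abstract's O(1/t) MSE bound concerns the averaged iterate.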

1701.02440 2026-06-04 cs.LG cs.NA math.NA stat.ML

Machine Learning of Linear Differential Equations using Gaussian Processes

利用高斯过程学习线性微分方程

Maziar Raissi, George Em. Karniadakis

AI总结 本文利用概率机器学习最新进展,通过高斯过程先验发现参数化的线性守恒律方程,包括常微分、偏微分、积分微分及分数阶算子。

详情
AI中文摘要

本工作利用概率机器学习最新进展,发现由参数化线性方程表达的守恒律。此类方程包括但不限于常微分、偏微分、积分微分和分数阶算子。此处,根据此类算子的特定形式修改高斯过程先验,并用于从稀疏且可能含噪声的观测中推断线性方程的参数。此类观测可能来自实验或“黑箱”计算机模拟。

英文摘要

This work leverages recent advances in probabilistic machine learning to discover conservation laws expressed by parametric linear equations. Such equations involve, but are not limited to, ordinary and partial differential, integro-differential, and fractional order operators. Here, Gaussian process priors are modified according to the particular form of such operators and are employed to infer parameters of the linear equations from scarce and possibly noisy observations. Such observations may come from experiments or "black-box" computer simulations.

1607.03428 2026-06-04 cs.LG cs.SY eess.SY quant-ph stat.ML

Learning in Quantum Control: High-Dimensional Global Optimization for Noisy Quantum Dynamics

量子控制中的学习:用于噪声量子动力学的高维全局优化

Pantita Palittapongarnpim, Peter Wittek, Ehsan Zahedinejad, Shakib Vedaie, Barry C. Sanders

AI总结 本文提出使用差分进化算法解决高维量子系统中非凸优化问题,通过改进控制保真度和引入启发式方法提升计算效率,展示在量子相位估计和量子门设计中的优越性能。

详情
Journal ref
Neurocomputing 268 (2017) 116-126
Comments
32 pages, 4 figures, extension of proceedings in ESANN 2016 conference submitted to Neurocomputing
AI中文摘要

量子控制在多种量子技术中具有重要价值,如通用量子计算中的高保真门、自适应量子增强计量和超冷原子操控。尽管监督学习和强化学习广泛用于优化经典系统的控制参数,但量子参数优化主要通过基于梯度的贪心算法进行。虽然量子适应度景观通常与贪心算法兼容,但在高维量子系统中贪心算法可能表现不佳。本文采用差分进化算法克服非凸优化的停滞问题,通过平均目标函数提升噪声系统中的量子控制保真度。为减少计算成本,引入了运行终止的启发式方法和自适应搜索子空间选择。我们的实现是大规模并行和向量化的,以进一步减少运行时间。通过量子相位估计和量子门设计两个示例,我们展示了在保真度和可扩展性方面优于贪心算法的结果。

英文摘要

Quantum control is valuable for various quantum technologies such as high-fidelity gates for universal quantum computing, adaptive quantum-enhanced metrology, and ultra-cold atom manipulation. Although supervised machine learning and reinforcement learning are widely used for optimizing control parameters in classical systems, quantum control for parameter optimization is mainly pursued via gradient-based greedy algorithms. Although the quantum fitness landscape is often compatible with greedy algorithms, sometimes greedy algorithms yield poor results, especially for large-dimensional quantum systems. We employ differential evolution algorithms to circumvent the stagnation problem of non-convex optimization. We improve quantum control fidelity for noisy systems by averaging over the objective function. To reduce computational cost, we introduce heuristics for early termination of runs and for adaptive selection of search subspaces. Our implementation is massively parallel and vectorized to reduce run time even further. We demonstrate our methods with two examples, namely quantum phase estimation and quantum gate design, for which we achieve fidelity and scalability superior to those obtained using greedy algorithms.
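
For readers unfamiliar with differential evolution, the sketch below shows the classic DE/rand/1/bin scheme on a standard non-convex test function. It is a generic illustration, not the authors' parallel implementation: the Rastrigin objective, the population size, the parameters `F` and `CR`, and the bounds are assumptions, and the noise averaging, early-termination heuristics, and subspace selection mentioned in the abstract are not included.

```python
import math
import random


def rastrigin(x):
    """Standard multimodal benchmark with global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def differential_evolution(f, dim=2, pop_size=20, gens=200,
                           F=0.5, CR=0.9, bound=5.12, seed=0):
    """Classic DE/rand/1/bin with greedy one-to-one selection."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(p) for p in pop]
    initial_best = min(fit)
    for _ in range(gens):
        for i in range(pop_size):
            # Three distinct population members, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, -bound), bound)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            f_trial = f(trial)
            if f_trial <= fit[i]:  # keep the trial only if it is no worse
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], initial_best


best_x, best_f, initial_best = differential_evolution(rastrigin)
```

The method uses only objective values, never gradients, and the difference vectors between population members set the search scale automatically. That is the property that lets it escape local optima where a greedy gradient-based search would stagnate.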

1708.08552 2026-06-04 cs.LG cs.NA math.NA stat.ML

An inexact subsampled proximal Newton-type method for large-scale machine learning

一种用于大规模机器学习的近似子采样近端牛顿型方法

Xuanqing Liu, Cho-Jui Hsieh, Jason D. Lee, Yuekai Sun

AI总结 本文提出一种快速近端牛顿型算法,通过子采样构造牛顿子问题,提升大规模优化效率,实验验证其在ℓ₁正则化逻辑回归中的优越性。

详情
AI中文摘要

我们提出了一种快速的近端牛顿型算法,用于最小化带有正则化的有限和。该算法能够在$\tilde{\mathcal{O}}(d(n + \sqrt{κd})\log(\frac{1}ε))$ FLOPS内返回一个$ε$-次优解,其中$n$是样本数,$d$是特征维度,$κ$是条件数。只要$n > d$,该方法比最先进的加速随机一阶方法更高效,后者需要$\tilde{\mathcal{O}}(d(n + \sqrt{κn})\log(\frac{1}ε))$ FLOPS。关键思想是通过子采样构造牛顿子问题,以保持目标函数的有限和结构,从而利用最近的随机一阶方法进展来求解子问题。实验结果验证了所提算法在真实数据集上的ℓ₁正则化逻辑回归任务中优于先前算法。

英文摘要

We propose a fast proximal Newton-type algorithm for minimizing regularized finite sums that returns an $ε$-suboptimal point in $\tilde{\mathcal{O}}(d(n + \sqrt{κd})\log(\frac{1}ε))$ FLOPS, where $n$ is the number of samples, $d$ is the feature dimension, and $κ$ is the condition number. As long as $n > d$, the proposed method is more efficient than state-of-the-art accelerated stochastic first-order methods for non-smooth regularizers, which require $\tilde{\mathcal{O}}(d(n + \sqrt{κn})\log(\frac{1}ε))$ FLOPS. The key idea is to form the subsampled Newton subproblem in a way that preserves the finite sum structure of the objective, thereby allowing us to leverage recent developments in stochastic first-order methods to solve the subproblem. Experimental results verify that the proposed algorithm outperforms previous algorithms for $\ell_1$-regularized logistic regression on real datasets.
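
Proximal methods handle a non-smooth regularizer such as the $\ell_1$ norm through its proximal operator, which for $\ell_1$ has the closed form known as soft-thresholding. The snippet below shows only that building block, as a hedged illustration; it is not the paper's subsampled proximal Newton method, and the function names are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| at v: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_l1(vec, t):
    """Proximal operator of t * ||.||_1, applied coordinate-wise."""
    return [soft_threshold(v, t) for v in vec]


shrunk = prox_l1([3.0, -0.5, -3.0], 1.0)
```

Coordinates with magnitude below the threshold are set exactly to zero, which is how $\ell_1$ regularization produces sparse solutions. In a proximal Newton step the quadratic model is scaled by a Hessian approximation, so the subproblem generally has no such closed form and must be solved iteratively, which is the subproblem the abstract refers to.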

1708.08354 2026-06-04 math.NA cs.NA stat.CO

Recent implementations, applications, and extensions of the Locally Optimal Block Preconditioned Conjugate Gradient method (LOBPCG)

最近的LOBPCG方法实现、应用及扩展

Andrew Knyazev

AI总结 本文综述了LOBPCG方法的最新实现、应用及其在标准特征值问题外的扩展,涵盖力学、材料科学和数据科学领域。

详情
Comments
4 pages. Householder Symposium on Numerical Linear Algebra, June 2017
AI中文摘要

自引入以来[1]和高效并行实现[2],LOBPCG已被广泛应用于力学、材料科学和数据科学等领域。本文回顾了其最新实现和应用,以及局部最优性思想在标准特征值问题之外的扩展。

英文摘要

Since its introduction [A. Knyazev, Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method, SISC (2001) DOI:10.1137/S1064827500366124] and efficient parallel implementation [A. Knyazev et al., Block locally optimal preconditioned eigenvalue xolvers (BLOPEX) in HYPRE and PETSc, SISC (2007) DOI:10.1137/060661624], LOBPCG has been used in a wide range of applications in mechanics, material sciences, and data sciences. We review its recent implementations and applications, as well as extensions of the local optimality idea beyond standard eigenvalue problems.

1708.05136 2026-06-04 math.OC cs.DC cs.NA math.NA stat.CO

More Iterations per Second, Same Quality -- Why Asynchronous Algorithms may Drastically Outperform Traditional Ones

更多每秒迭代次数,相同质量——为什么异步算法可能大幅超越传统算法

Robert Hannah, Wotao Yin

AI总结 本文研究了异步并行算法ARock的收敛性,证明在大规模问题中,异步算法所需的额外迭代次数可忽略,从而证明其在分布式计算中的潜力。

详情
Comments
29 pages
AI中文摘要

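本文研究了一种非常通用的异步并行算法ARock的收敛性,该算法包含了许多已知的异步算法作为特例(如梯度下降、近端梯度、Douglas Rachford、ADMM等)。在异步并行算法中,计算节点使用最新可用信息而非等待所有节点的完整更新。这使得节点无需浪费时间等待信息,而等待可能是一个主要瓶颈,在分布式系统中尤其如此。当系统有p个节点时,异步算法在给定时间内可能比同步算法多完成Θ(ln(p))的迭代(“每秒更多迭代”)。尽管异步算法每秒可以计算更多迭代,但使用过时信息会带来误差。为弥补这一误差总共还需要多少额外迭代,仍是一个悬而未决的问题。本文的主要结果旨在回答这一问题。我们大致证明了:随着问题规模变大,异步算法所需的额外迭代次数相对于总迭代次数变得可以忽略(迭代“质量相同”)。综合这些事实,我们的结果为异步算法大幅加速某些分布式计算的潜力提供了有力证据。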

英文摘要

In this paper, we consider the convergence of a very general asynchronous-parallel algorithm called ARock, that takes many well-known asynchronous algorithms as special cases (gradient descent, proximal gradient, Douglas Rachford, ADMM, etc.). In asynchronous-parallel algorithms, the computing nodes simply use the most recent information that they have access to, instead of waiting for a full update from all nodes in the system. This means that nodes do not have to waste time waiting for information, which can be a major bottleneck, especially in distributed systems. When the system has $p$ nodes, asynchronous algorithms may complete $Θ(\ln(p))$ more iterations than synchronous algorithms in a given time period ("more iterations per second"). Although asynchronous algorithms may compute more iterations per second, there is error associated with using outdated information. How many more iterations in total are needed to compensate for this error is still an open question. The main results of this paper aim to answer this question. We prove, loosely, that as the size of the problem becomes large, the number of additional iterations that asynchronous algorithms need becomes negligible compared to the total number ("same quality" of the iterations). Taking these facts together, our results provide solid evidence of the potential of asynchronous algorithms to vastly speed up certain distributed computations.

1708.06714 2026-06-04 math.OC cs.NA math.NA stat.ML

A Deterministic Nonsmooth Frank Wolfe Algorithm with Coreset Guarantees

一种具有压缩集保证的确定性非光滑Frank-Wolfe算法

Sathya N. Ravi, Maxwell D. Collins, Vikas Singh

AI总结 本文提出一种适用于非光滑凸优化问题的Frank-Wolfe算法,提供收敛性界并展示其在多个机器学习问题中的压缩集结果,算法在不依赖输入规模的情况下提供近似解。

详情
AI中文摘要

我们提出了一种新的Frank-Wolfe(FW)类型算法,适用于具有非光滑凸目标的最优化问题。我们提供了收敛性界,并展示了该方案在各种机器学习问题中(如1-中位数、平衡发展、稀疏PCA、图割以及ℓ1-范数正则化的支持向量机等)提供了所谓的压缩集结果。这意味着该算法在时间复杂度界上提供这些问题的近似解,而无需依赖输入问题的规模。我们的框架受到大量关于各种数据分析问题的子线性算法研究的启发,是完全确定性的,并且不使用平滑或近端算子。除了这些理论结果外,我们还通过实验展示了该算法的实用性,并在某些情况下为大规模问题实例提供了显著的计算优势。我们提供了一个开源实现,可以适应其他符合整体结构的问题。

英文摘要

We present a new Frank-Wolfe (FW) type algorithm that is applicable to minimization problems with a nonsmooth convex objective. We provide convergence bounds and show that the scheme yields so-called coreset results for various Machine Learning problems including 1-median, Balanced Development, Sparse PCA, Graph Cuts, and the $\ell_1$-norm-regularized Support Vector Machine (SVM) among others. This means that the algorithm provides approximate solutions to these problems in time complexity bounds that are not dependent on the size of the input problem. Our framework, motivated by a growing body of work on sublinear algorithms for various data analysis problems, is entirely deterministic and makes no use of smoothing or proximal operators. Apart from these theoretical results, we show experimentally that the algorithm is very practical and in some cases also offers significant computational advantages on large problem instances. We provide an open source implementation that can be adapted for other problems that fit the overall structure.

1601.08068 2026-06-04 stat.ML cs.LG cs.SY eess.SY

System Identification through Online Sparse Gaussian Process Regression with Input Noise

通过含输入噪声的在线稀疏高斯过程回归进行系统辨识

Hildo Bijl, Thomas B. Schön, Jan-Willem van Wingerden, Michel Verhaegen

AI总结 本文提出一种在线稀疏高斯过程回归算法,解决高斯过程回归在计算效率、在线更新和处理噪声输入方面的不足,实验表明其在非线性黑盒系统建模中性能优异。

详情
AI中文摘要

近年来,高斯过程(GP)回归等非参数回归方法在系统辨识中受到越来越多的关注。传统的GP回归有三个重要缺点:(1)计算量大;(2)无法高效地在线纳入新获得的测量值;(3)无法处理随机(含噪声)的输入点。本文提出一种同时解决这三个问题的算法。所得到的稀疏在线噪声输入GP(SONIG)回归算法能够以常数运行时间纳入新的含噪测量值。比较表明,它比现有的同类回归算法更准确。在应用于非线性黑箱系统建模时,其性能可与现有的非线性ARX模型相竞争。

英文摘要

There has been a growing interest in using non-parametric regression methods like Gaussian Process (GP) regression for system identification. GP regression does traditionally have three important downsides: (1) it is computationally intensive, (2) it cannot efficiently implement newly obtained measurements online, and (3) it cannot deal with stochastic (noisy) input points. In this paper we present an algorithm tackling all these three issues simultaneously. The resulting Sparse Online Noisy Input GP (SONIG) regression algorithm can incorporate new noisy measurements in constant runtime. A comparison has shown that it is more accurate than similar existing regression algorithms. When applied to non-linear black-box system modeling, its performance is competitive with existing non-linear ARX models.

1708.04303 2026-06-04 math.NA cs.NA stat.ME

Data-driven dimensional analysis: algorithms for unique and relevant dimensionless groups

数据驱动的量纲分析:用于求唯一且相关的无量纲群的算法

Paul G. Constantine, Zachary del Rosario, Gianluca Iaccarino

AI总结 本文提出两种算法,通过实验数据补充经典量纲分析,生成唯一且相关的无量纲群,并应用于粘性管道流动研究。

详情
AI中文摘要

经典量纲分析有两个局限性:(i)计算得到的无量纲群不唯一;(ii)该分析无法衡量各无量纲群的相对重要性。我们提出两种用于估计唯一且相关的无量纲群的算法,假设实验者能够控制系统的自变量并评估相应的因变量;例如,计算机实验就提供了这样的设置。第一种算法基于由一组实验构建的响应面。第二种算法利用大量实验来估计自变量取值范围内的有限差分。两种算法都是半经验的,因为它们使用实验数据来补充量纲分析。我们将经典半经验建模与主动子空间(active subspaces)相结合来推导这些算法;在给定自变量概率密度的情况下,主动子空间可给出唯一且相关的无量纲群。主动子空间与量纲分析之间的联系还表明,所有经验模型都是岭函数,即在其定义域内沿低维子空间保持恒定的函数。我们在被广泛研究的粘性管道流动算例(包括湍流和层流两种情况)上演示了所提出的算法。结果包括一组由两个无量纲群组成的湍流管道流动新无量纲群,它们按与系统的相关性排序;相关性的精确概念与Sobol'和Kucherenko提出的基于导数的全局灵敏度度量密切相关。

英文摘要

Classical dimensional analysis has two limitations: (i) the computed dimensionless groups are not unique, and (ii) the analysis does not measure the relative importance of the dimensionless groups. We propose two algorithms for estimating unique and relevant dimensionless groups assuming the experimenter can control the system's independent variables and evaluate the corresponding dependent variable; e.g., computer experiments provide such a setting. The first algorithm is based on a response surface constructed from a set of experiments. The second algorithm uses many experiments to estimate finite differences over a range of the independent variables. Both algorithms are semi-empirical because they use experimental data to complement the dimensional analysis. We derive the algorithms by combining classical semi-empirical modeling with active subspaces, which---given a probability density on the independent variables---yield unique and relevant dimensionless groups. The connection between active subspaces and dimensional analysis also reveals that all empirical models are ridge functions, which are functions that are constant along low-dimensional subspaces in their domain. We demonstrate the proposed algorithms on the well-studied example of viscous pipe flow---both turbulent and laminar cases. The results include a new set of two dimensionless groups for turbulent pipe flow that are ordered by relevance to the system; the precise notion of relevance is closely tied to the derivative-based global sensitivity metric from Sobol' and Kucherenko.

1610.05261 2026-06-04 math.NA cs.LG cs.NA stat.ML

A probabilistic model for the numerical solution of initial value problems

初值问题数值解的概率模型

Michael Schober, Simo Särkkä, Philipp Hennig

AI总结 本文提出将初值问题解法视为对潜在路径的推断,连接了广义线性方法、Runge-Kutta方法和Nordsieck方法,揭示了经典方法的隐含假设和不确定性处理。

详情
Comments
23 pages, 11 figures
AI中文摘要

与许多数值方法类似,常微分方程初值问题(IVP)的求解器以可计算结果作为输入,来估计一个解析上难以处理的量。本文将IVP的求解表述为对一条潜在路径的推断,该路径是高斯过程概率测度的一个样本,并展示了这类算法中的某些成员与广义线性方法、若干Runge-Kutta方法以及Nordsieck方法之间的精确联系。这种概率表述在分析上凸显了隐含的先验假设,在实践中则为经典求解器引入不确定性和先验信息提供了“对接点”(docking points)。

英文摘要

Like many numerical methods, solvers for initial value problems (IVPs) on ordinary differential equations estimate an analytically intractable quantity, using the results of tractable computations as inputs. This structure is closely connected to the notion of inference on latent variables in statistics. We describe a class of algorithms that formulate the solution to an IVP as inference on a latent path that is a draw from a Gaussian process probability measure (or equivalently, the solution of a linear stochastic differential equation). We then show that certain members of this class are connected precisely to generalized linear methods for ODEs, a number of Runge--Kutta methods, and Nordsieck methods. This probabilistic formulation of classic methods is valuable in two ways: analytically, it highlights implicit prior assumptions favoring certain approximate solutions to the IVP over others, and gives a precise meaning to the old observation that these methods act like filters. Practically, it endows the classic solvers with `docking points' for notions of uncertainty and prior information about the initial value, the value of the ODE itself, and the solution of the problem.

1605.01278 2026-06-04 stat.ML cs.LG cs.SY eess.SY math.DS math.PR

A Bayesian Approach to Policy Recognition and State Representation Learning

基于贝叶斯方法的策略识别与状态表示学习

Adrian Šošić, Abdelhak M. Zoubir, Heinz Koeppl

AI总结 本文提出一种贝叶斯方法,用于在不假设专家行为最优的情况下,学习任意随机专家策略,并推断专家使用的状态表示复杂度及任务相关的状态空间划分。

详情
Comments
17 pages, 8 figures; ### Version 4 ### to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence
AI中文摘要

示范学习(LfD)是指根据专家提供的示范构建任务行为模型的过程。通过将专家示范泛化到此前未遇到的情形,这些模型可用于系统控制等用途。然而,大多数LfD方法对专家行为作了很强的假设,例如假设存在确定性的最优真实(ground truth)策略,或要求直接监测专家的控制量,这限制了它们作为通用系统辨识框架一部分的实际应用。本文在更一般的设定下考虑LfD问题:允许任意的随机专家策略,且不对示范的最优性作推断。遵循贝叶斯方法,我们对能够解释所给示范数据的可能专家控制器的完整后验分布进行建模。此外,我们还表明该方法可用于非参数情形,以推断专家所用状态表示的复杂度,并学习与任务相适应的系统状态空间划分。

英文摘要

Learning from demonstration (LfD) is the process of building behavioral models of a task from demonstrations provided by an expert. These models can be used e.g. for system control by generalizing the expert demonstrations to previously unencountered situations. Most LfD methods, however, make strong assumptions about the expert behavior, e.g. they assume the existence of a deterministic optimal ground truth policy or require direct monitoring of the expert's controls, which limits their practical use as part of a general system identification framework. In this work, we consider the LfD problem in a more general setting where we allow for arbitrary stochastic expert policies, without reasoning about the optimality of the demonstrations. Following a Bayesian methodology, we model the full posterior distribution of possible expert controllers that explain the provided demonstration data. Moreover, we show that our methodology can be applied in a nonparametric context to infer the complexity of the state representation used by the expert, and to learn task-appropriate partitionings of the system state space.

1707.03746 2026-06-04 q-fin.CP econ.GN q-fin.EC q-fin.ST stat.AP

Modeling the price of Bitcoin with geometric fractional Brownian motion: a Monte Carlo approach

用几何分数布朗运动建模比特币价格:一种蒙特卡洛方法

Mariusz Tarnopolski

AI总结 本文利用几何分数布朗运动和蒙特卡洛模拟预测比特币价格,通过历史数据扩展预测2018年初比特币价格最可能为6358美元。

详情
Comments
5 pages, 3 figures
AI中文摘要

比特币(BTC)的长期依赖性,通过Hurst指数H>0.5体现出来,被用来预测未来的BTC/USD价格。通过进行10^4次几何分数布朗运动实现的蒙特卡洛模拟作为历史数据的扩展。统计推断的准确性为10%。2018年初最可能的比特币价格为6358美元。

英文摘要

The long-term dependence of Bitcoin (BTC), manifesting itself through a Hurst exponent $H>0.5$, is exploited in order to predict future BTC/USD price. A Monte Carlo simulation with $10^4$ geometric fractional Brownian motion realisations is performed as extensions of historical data. The accuracy of statistical inferences is 10\%. The most probable Bitcoin price at the beginning of 2018 is 6358 USD.
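As a rough illustration of the simulation step, the sketch below draws geometric fractional Brownian motion paths by Cholesky factorization of the fBm covariance $\tfrac12(s^{2H}+t^{2H}-|t-s|^{2H})$. The function names, the drift correction $-\tfrac12\sigma^2 t^{2H}$, and all parameter values are assumptions for illustration; the abstract does not specify the paper's simulation scheme or calibration.

```python
import numpy as np

def fbm_covariance(times, hurst):
    """Covariance matrix of fractional Brownian motion at the given times."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (s ** (2 * hurst) + u ** (2 * hurst) - np.abs(s - u) ** (2 * hurst))

def simulate_gfbm(s0, mu, sigma, hurst, times, n_paths, seed=0):
    """Monte Carlo paths S_t = s0 * exp(mu*t - 0.5*sigma^2*t^(2H) + sigma*B_H(t)).

    `times` must be strictly positive and distinct so the covariance is
    positive definite. The drift convention is an assumed, common choice.
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(times, dtype=float)
    chol = np.linalg.cholesky(fbm_covariance(t, hurst))
    z = rng.standard_normal((n_paths, t.size))
    b_h = z @ chol.T  # each row is one fBm path sampled at `times`
    return s0 * np.exp(mu * t - 0.5 * sigma**2 * t ** (2 * hurst) + sigma * b_h)

# Hypothetical parameters, not taken from the paper.
paths = simulate_gfbm(s0=100.0, mu=0.0, sigma=0.2, hurst=0.7,
                      times=np.linspace(0.1, 1.0, 10), n_paths=5)
```

For $H=0.5$ the covariance reduces to $\min(s,t)$, recovering ordinary geometric Brownian motion. The Cholesky approach costs cubic time in the number of time points, so it only suits short grids.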

1602.00078 2026-06-04 physics.data-an cs.DS cs.NA math.NA stat.ML

Latent common manifold learning with alternating diffusion: analysis and applications

潜在共同流形学习与交替扩散:分析与应用

Ronen Talmon, Hau-tieng Wu

AI总结 本文提出基于交替扩散的潜在共同流形模型,用于多模态数据融合,分析其理论基础并展示在多个应用中的实验结果。

详情
AI中文摘要

多传感器数据集的分析近年来引起了广泛关注。传统方法,包括基于核的方法,通常无法捕捉非线性几何结构。本文引入了一个潜在共同流形模型,用于多传感器观测的多模态数据融合。提出了一种基于交替扩散的方法并进行了分析;在潜在共同流形模型下对方法进行了理论分析。为了展示所提框架的威力,报告了几个应用中的实验结果。

英文摘要

The analysis of data sets arising from multiple sensors has drawn significant research attention over the years. Traditional methods, including kernel-based methods, are typically incapable of capturing nonlinear geometric structures. We introduce a latent common manifold model underlying multiple sensor observations for the purpose of multimodal data fusion. A method based on alternating diffusion is presented and analyzed; we provide theoretical analysis of the method under the latent common manifold model. To exemplify the power of the proposed framework, experimental results in several applications are reported.

1706.03369 2026-06-04 stat.ML cs.LG cs.NA math.NA stat.CO

On the Sampling Problem for Kernel Quadrature

关于核求积(Kernel Quadrature)的采样问题

Francois-Xavier Briol, Chris J. Oates, Jon Cockayne, Wilson Ye Chen, Mark Girolami

AI总结 本文探讨了核求积中采样分布对速率常数的影响,提出基于自适应回火和序贯蒙特卡罗的自动方法,显著降低积分误差。

详情
Journal ref
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:586-595, 2017
Comments
To appear at Thirty-fourth International Conference on Machine Learning (ICML 2017)
AI中文摘要

已知在随机点集下用于数值积分的标准核求积方法(也称为贝叶斯蒙特卡罗)的均方根误差收敛速率由比值$s/d$决定,其中$s$和$d$分别刻画被积函数的光滑性和维度。然而,实证研究表明,速率常数$C$对随机点的分布高度敏感。标准蒙特卡罗积分的最优重要性抽样已被充分理解,与之不同,使核求积的$C$最小的采样分布没有闭式表达。本文认为,采样分布的实际选择是一个重要的开放问题。文中考虑了一种解决方案:基于自适应回火(adaptive tempering)和序贯蒙特卡罗的新型自动方法。实证结果表明,所提方法可使积分误差显著降低,最多可达4个数量级。

英文摘要

The standard Kernel Quadrature method for numerical integration with random point sets (also called Bayesian Monte Carlo) is known to converge in root mean square error at a rate determined by the ratio $s/d$, where $s$ and $d$ encode the smoothness and dimension of the integrand. However, an empirical investigation reveals that the rate constant $C$ is highly sensitive to the distribution of the random points. In contrast to standard Monte Carlo integration, for which optimal importance sampling is well understood, the sampling distribution that minimises $C$ for Kernel Quadrature does not admit a closed form. This paper argues that the practical choice of sampling distribution is an important open problem. One solution is considered: a novel automatic approach based on adaptive tempering and sequential Monte Carlo. Empirical results demonstrate that a dramatic reduction in integration error, of up to 4 orders of magnitude, can be achieved with the proposed method.

1610.06752 2026-06-04 stat.CO cs.NA math.NA

Comments on "Bayesian Solution Uncertainty Quantification for Differential Equations" by Chkrebtii, Campbell, Calderhead & Girolami

对Chkrebtii等人论文《关于微分方程的贝叶斯解不确定性量化》的评论

Francois-Xavier Briol, Jon Cockayne, Onur Teymur

AI总结 本文讨论了概率数值方法中先验建模的未来研究方向,强调了在不确定性量化中需深入考虑的先验模型问题。

详情
Journal ref
Bayesian Analysis, Vol 11, Num 4, pp1285-1293, 2016
AI中文摘要

我们赞赏作者们这篇令人振奋的论文,它对新兴的概率数值(PN)领域作出了重要贡献。下面我们讨论先验建模中需要在未来工作中深入考虑的若干方面。

英文摘要

We commend the authors for an exciting paper which provides a strong contribution to the emerging field of probabilistic numerics (PN). Below, we discuss aspects of prior modelling which need to be considered thoroughly in future work.

1610.03819 2026-06-04 math.NA cs.CV cs.NA math.ST stat.TH

Recursive Diffeomorphism-Based Regression for Shape Functions

基于递归微分同胚的形状函数回归

Jieren Xu, Haizhao Yang, Ingrid Daubechies

AI总结 本文提出一种基于递归微分同胚的回归方法,用于一维广义模式分解问题,旨在从叠加信号中提取广义模式。首先应用一维同步压缩变换估计瞬时信息,然后提出基于微分同胚和非参数回归的新方法估计波形函数。

详情
AI中文摘要

本文提出一种基于递归微分同胚的回归方法,用于一维广义模式分解问题,其目标是从叠加$\sum_{k=1}^K α_k(t)s_k(2πN_kϕ_k(t))$中提取广义模式$α_k(t)s_k(2πN_kϕ_k(t))$。首先,应用一维同步压缩变换估计瞬时信息,例如$α_k(t)$和$N_kϕ_k(t)$。其次,提出一种基于微分同胚和非参数回归的新方法来估计波形函数$s_k(t)$。这两种方法共同构成了在弱良分离条件下求解广义模式分解问题的框架。文中给出了合成数据和真实数据的数值示例,以展示这些方法的丰富应用。

英文摘要

This paper proposes a recursive diffeomorphism-based regression method for the one-dimensional generalized mode decomposition problem, which aims at extracting generalized modes $α_k(t)s_k(2πN_kϕ_k(t))$ from their superposition $\sum_{k=1}^K α_k(t)s_k(2πN_kϕ_k(t))$. First, a one-dimensional synchrosqueezed transform is applied to estimate instantaneous information, e.g., $α_k(t)$ and $N_kϕ_k(t)$. Second, a novel approach based on diffeomorphisms and nonparametric regression is proposed to estimate wave shape functions $s_k(t)$. These two methods lead to a framework for the generalized mode decomposition problem under a weak well-separation condition. Numerical examples of synthetic and real data are provided to demonstrate the fruitful applications of these methods.

1608.01282 2026-06-04 stat.ML cs.IT cs.NA math.DS math.IT math.NA

A Multivariate Hawkes Process with Gaps in Observations

具有观测间隙的多元Hawkes过程

Triet M Le

AI总结 本文研究了在观测存在间隙的情况下,利用多元Hawkes过程学习隐藏的定向关系。通过引入边界条件,解决了缺失事件的处理问题,并展示了在稀疏观测下MHPG仍能实现稳健恢复。

详情
AI中文摘要

给定网络中的一组实体及其间断观测活动,一个重要问题是学习隐藏的边以描述这些实体之间的方向关系。本文研究了通过多元Hawkes过程实现的因果关系(激发)。多元Hawkes过程(MHP)及其变体(时空点过程)已被用于研究地震、犯罪、神经尖峰活动、股票和外汇市场中的传染现象。本文考虑了具有观测间隙的多元Hawkes过程(MHPG)。我们提出了一种变分问题,用于检测稀疏隐藏关系,该过程考虑了每个实体的间隙。通过引入少量未知边界条件,我们绕过了处理大量缺失事件的问题。在观测稀疏(例如10%到30%)的情况下,我们通过数值模拟表明,即使观测间隔长度较小但选择恰当,MHPG仍能实现稳健恢复。数值结果还显示,了解间隙并施加正确的边界条件在发现潜在模式和隐藏关系中至关重要。

英文摘要

Given a collection of entities (or nodes) in a network and our intermittent observations of activities from each entity, an important problem is to learn the hidden edges depicting directional relationships among these entities. Here, we study causal relationships (excitations) that are realized by a multivariate Hawkes process. The multivariate Hawkes process (MHP) and its variations (spatio-temporal point processes) have been used to study contagion in earthquakes, crimes, neural spiking activities, the stock and foreign exchange markets, etc. In this paper, we consider the multivariate Hawkes process with gaps in observations (MHPG). We propose a variational problem for detecting sparsely hidden relationships with a multivariate Hawkes process that takes into account the gaps from each entity. We bypass the problem of dealing with a large number of missing events by introducing a small number of unknown boundary conditions. In the case where our observations are sparse (e.g. from 10% to 30%), we show through numerical simulations that robust recovery with MHPG is still possible even if the observed intervals are short, provided they are chosen appropriately. The numerical results also show that knowledge of the gaps and imposing the right boundary conditions are crucial in discovering the underlying patterns and hidden relationships.
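To make the excitation mechanism concrete, here is a small sketch of the conditional intensity of a multivariate Hawkes process with an exponential kernel, $\lambda_i(t)=\mu_i+\sum_j \alpha_{ij}\sum_{t_k^j<t}\beta e^{-\beta(t-t_k^j)}$. The exponential kernel, the function name `hawkes_intensity`, and the parameter values are assumptions; the abstract does not state the kernel used, and the gap-handling boundary conditions of MHPG are not modeled here.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity of each node at time t for an exponential-kernel Hawkes process.

    events[j] holds the event times of node j; alpha[i, j] is the
    excitation that node j exerts on node i (the hidden directed edge).
    """
    alpha = np.asarray(alpha, dtype=float)
    lam = np.array(mu, dtype=float)  # baseline rates (copied, not modified in place)
    for j, times in enumerate(events):
        past = np.asarray(times, dtype=float)
        past = past[past < t]  # only strictly earlier events excite
        lam += alpha[:, j] * np.sum(beta * np.exp(-beta * (t - past)))
    return lam

# Hypothetical two-node network: node 1 excites node 0.
lam = hawkes_intensity(
    t=2.0,
    events=[[], [1.0]],
    mu=[0.1, 0.2],
    alpha=[[0.0, 0.5], [0.0, 0.0]],
    beta=1.0,
)
```

A nonzero entry `alpha[i, j]` corresponds to a directed edge from node j to node i, which is the quantity a sparse-recovery formulation would try to estimate.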

1707.09319 2026-06-04 stat.OT cs.LG cs.NA math.NA

A Fourier-invariant method for locating point-masses and computing their attributes

一种用于定位点质量并计算其属性的傅里叶不变方法

Charles K. Chui, Hrushikesh N. Mhaskar

AI总结 本文提出一种有效方法,用于对点质量计数、确定其空间位置并计算其属性;该方法基于具有傅里叶不变性的埃尔米特(Hermite)矩的计算,适用于处理任意维度的空间数据和傅里叶数据。

详情
AI中文摘要

出于观察癌细胞在正常活细胞中的生长以及探索星系和恒星真实形成过程的兴趣,本文旨在介绍一种严谨而有效的方法,用于对点质量计数、确定其空间位置并计算其属性。我们的方法基于具有傅里叶不变性的埃尔米特(Hermite)矩的计算,便于处理任意维度的空间数据和傅里叶数据。

英文摘要

Motivated by the interest of observing the growth of cancer cells among normal living cells and exploring how galaxies and stars are truly formed, the objective of this paper is to introduce a rigorous and effective method for counting point-masses, determining their spatial locations, and computing their attributes. Based on computation of Hermite moments that are Fourier-invariant, our approach facilitates the processing of both spatial and Fourier data in any dimension.

1411.6529 2026-06-04 math.NA cs.NA stat.AP stat.ME

The Optimal Arbitrary-Proportional Finite-Set-Partitioning

最优任意比例有限集合划分

Tiancheng Li

AI总结 本文研究了任意比例有限集合划分问题,通过定义度量函数最小化偏差,解决集合划分中的整数约束问题,并通过理论证明和仿真验证方案的最优性。

详情
Journal ref
Frontiers of Information Technology & Electronic Engineering, Volume 16, Issue 11, pp 969-984 (2015)
AI中文摘要

本文考虑了任意比例有限集合划分问题,即根据任意非负比例将有限集合划分为多个子集。这是许多基础问题的核心,例如确定不同权重个体的配额或从离散加权样本集采样以获得新的同分布但非加权样本集(例如粒子滤波所需的重采样)。挑战在于每个子集的大小必须为整数,而其无偏期望通常不是。为了解决这个问题,定义了一个度量(成本函数)在其偏差上,并相应地提出了解决方案以确定每个子集的大小,从而获得最小的偏差。理论证明和仿真演示展示了在所提出度量意义下的方案最优性。

英文摘要

This paper considers the arbitrary-proportional finite-set-partitioning problem, which involves partitioning a finite set into multiple subsets with respect to arbitrary nonnegative proportions. This is at the core of many fundamental problems, such as determining quotas for individuals of different weights or sampling from a discrete-valued weighted sample set to get a new identically distributed but non-weighted sample set (e.g. the resampling needed in the particle filter). The challenge arises because the size of each subset must be an integer while its unbiased expectation is often not. To solve this problem, a metric (cost function) is defined on their discrepancies, and correspondingly a solution is proposed to determine the size of each subset so as to attain the minimal bias. A theoretical proof and simulations are provided to demonstrate the optimality of the scheme in the sense of the proposed metric.
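The integer-size constraint can be illustrated with the classical largest-remainder rounding rule, sketched below. This is a well-known baseline, not necessarily the scheme or metric proposed in the paper; the function name `largest_remainder_sizes` is hypothetical.

```python
import math

def largest_remainder_sizes(weights, total):
    """Split `total` items into integer subset sizes proportional to `weights`."""
    weight_sum = sum(weights)
    quotas = [total * w / weight_sum for w in weights]  # real-valued expectations
    sizes = [math.floor(q) for q in quotas]             # guaranteed integer part
    leftover = total - sum(sizes)
    # Hand the remaining items to the subsets with the largest fractional parts.
    order = sorted(range(len(weights)),
                   key=lambda i: quotas[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes

sizes = largest_remainder_sizes([5, 3, 2], total=7)
```

Here the quotas are 3.5, 2.1, and 1.4, so the floors account for 6 items and the single leftover item goes to the first subset. The floor-plus-remainder structure is the same one used by residual resampling in particle filters, except that there the leftover is assigned at random.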

1707.07976 2026-06-04 stat.ML cs.NA math.NA

Scaled Nuclear Norm Minimization for Low-Rank Tensor Completion

缩放核范数最小化用于低秩张量补全

Morteza Ashraphijuo, Xiaodong Wang

AI总结 本文提出通过最小化张量展开的加权核范数之和来恢复低TT秩张量,证明该方法相比传统核范数求和方法更高效,需更少样本即可恢复原始张量。

详情
AI中文摘要

最小化矩阵的核范数已被证明在重建低秩采样矩阵中非常有效。此外,最小化张量矩阵化核范数之和已被证明在恢复低Tucker秩采样张量中非常有效。本文提出通过最小化张量展开的加权核范数之和来恢复低TT秩采样张量。我们提供数值结果表明,与单纯最小化核范数之和相比,所提方法需要显著更少的样本即可恢复到原始张量,因为TT张量模型中的展开结构与Tucker张量模型中的矩阵化结构本质上不同。

英文摘要

Minimizing the nuclear norm of a matrix has been shown to be very efficient in reconstructing a low-rank sampled matrix. Furthermore, minimizing the sum of nuclear norms of matricizations of a tensor has been shown to be very efficient in recovering a low-Tucker-rank sampled tensor. In this paper, we propose to recover a low-TT-rank sampled tensor by minimizing a weighted sum of nuclear norms of unfoldings of the tensor. We provide numerical results to show that our proposed method requires significantly fewer samples to recover the original tensor than simply minimizing the sum of nuclear norms, since the structure of the unfoldings in the TT tensor model is fundamentally different from that of matricizations in the Tucker tensor model.

1508.03332 2026-06-04 math.NA cs.LG cs.MA cs.NA math.DS stat.ML

Dimensionality Reduction of Collective Motion by Principal Manifolds

通过主流形进行集体运动的降维

Kelum Gajamannage, Sachit Butail, Maurizio Porfiri, Erik M. Bollt

AI总结 本文提出基于立方平滑样条构建二维主流形的方法,用于降维分析集体运动数据,保留原始结构并优于现有非线性降维方法。

详情
Journal ref
Physica-D : Nonlinear Phenomena, Volume 291, 15 January 2015, Pages 62-73
Comments
19 pages, 13 figures, journal article
AI中文摘要

尽管已证明集体运动模式中存在低维嵌入流形,但现有非线性降维方法无法有效分析此类流形,主要原因是谱分解步骤限制了高维空间到嵌入空间的映射控制。本文提出一种替代方法,要求二维嵌入以拓扑方式总结高维数据。具体而言,我们直接在高维空间中使用立方平滑样条构建二维主流形,并用测地距离定义嵌入坐标。通过代表性示例,我们展示与现有非线性降维方法相比,主流形在噪声和稀疏数据集上仍能保留原始结构。主流形寻找算法应用于多个代理动态系统模拟复杂机动(捕食者围攻)得到的配置,并将所得二维嵌入与已建立的非线性降维方法进行比较。

英文摘要

While the existence of low-dimensional embedding manifolds has been shown in patterns of collective motion, the current battery of nonlinear dimensionality reduction methods is not amenable to the analysis of such manifolds. This is mainly due to the necessary spectral decomposition step, which limits control over the mapping from the original high-dimensional space to the embedding space. Here, we propose an alternative approach that demands a two-dimensional embedding which topologically summarizes the high-dimensional data. In this sense, our approach is closely related to the construction of one-dimensional principal curves that minimize orthogonal error to data points subject to smoothness constraints. Specifically, we construct a two-dimensional principal manifold directly in the high-dimensional space using cubic smoothing splines, and define the embedding coordinates in terms of geodesic distances. Thus, the mapping from the high-dimensional data to the manifold is defined in terms of local coordinates. Through representative examples, we show that compared to existing nonlinear dimensionality reduction methods, the principal manifold retains the original structure even in noisy and sparse datasets. The principal manifold finding algorithm is applied to configurations obtained from a dynamical system of multiple agents simulating a complex maneuver called predator mobbing, and the resulting two-dimensional embedding is compared with that of a well-established nonlinear dimensionality reduction method.

1707.01596 2026-06-04 math.OC cs.SY eess.SY stat.ML

Topology Estimation in Bulk Power Grids: Guarantees on Exact Recovery

大规模电力网络拓扑估计:关于精确恢复的保证

Deepjyoti Deka, Saurav Talukdar, Michael Chertkov, Murti Salapaka

AI总结 本文提出了一种基于电压测量的图模型框架,用于估计大规模电力网络拓扑,通过阈值或邻域计数方案提取操作边,并在无三节点环的电网中保证精确恢复。

详情
Comments
10 pages, 8 figures. A version of this paper will appear in IREP 2017
AI中文摘要

电网拓扑影响其动态运行和电力市场结算。实时拓扑辨识可以在线路故障等紧急情形发生后更快地采取控制措施。本文讨论了一种图模型框架,利用在电网节点采集的电压测量值来估计大电网(包括含环的输电网和辐射状的配电网)的拓扑。我们证明,在线性潮流模型下,节点电压概率分布的图模型除包含真实电网中的运行边外,还包含额外的边。所提出的估计算法首先学习图模型,随后通过阈值法或邻域计数方案提取运行边。对于不含三节点环的电网拓扑(两条母线不共享公共邻居),我们证明理论上可以保证精确提取运行拓扑。这涵盖了大多数具有辐射状拓扑的配电网。对于包含长度为三的环的电网,我们给出了保证存在精确重建算法的充分条件。特别地,对于单位长度阻抗恒定且注入协方差均匀的电网,这一结论可转化为关于母线地理位置的条件。算法性能在测试算例仿真中得到了验证。

英文摘要

The topology of a power grid affects its dynamic operation and settlement in the electricity market. Real-time topology identification can enable faster control action following an emergency scenario like failure of a line. This article discusses a graphical model framework for topology estimation in bulk power grids (both loopy transmission and radial distribution) using measurements of voltage collected from the grid nodes. The graphical model for the probability distribution of nodal voltages in linear power flow models is shown to include additional edges along with the operational edges in the true grid. Our proposed estimation algorithms first learn the graphical model and subsequently extract the operational edges using either thresholding or a neighborhood counting scheme. For grid topologies containing no three-node cycles (two buses do not share a common neighbor), we prove that an exact extraction of the operational topology is theoretically guaranteed. This includes a majority of distribution grids that have radial topologies. For grids that include cycles of length three, we provide sufficient conditions that ensure existence of algorithms for exact reconstruction. In particular, for grids with constant impedance per unit length and uniform injection covariances, this observation leads to conditions on geographical placement of the buses. The performance of algorithms is demonstrated in test case simulations.

1707.04531 2026-06-04 math.NA cs.NA stat.AP

A Convex Reconstruction Model for X-ray Tomographic Imaging with Uncertain Flat-fields

X射线断层成像中带有不确定平坦场的凸重建模型

Hari Om Aggrawal, Martin Skovgaard Andersen, Sean Rose, Emil Y. Sidky

AI总结 本文提出一种新的凸模型,用于处理X射线断层成像中平坦场的不确定性,以提高重建质量,尤其在动态CT等对时间和剂量敏感的应用中减少环状伪影。

详情
Comments
Accepted at IEEE Transactions on Computational Imaging
AI中文摘要

经典X射线计算机断层成像方法基于假设X射线源强度已知,但实际上该强度是测量得到的,因此存在不确定性。在正常工作条件下,当曝光时间足够高时,这种不确定性通常对重建质量影响很小。然而,在时间或剂量受限的应用中,如动态CT,这种不确定性可能导致严重的系统性伪影,即环状伪影。通过仔细建模测量过程并考虑不确定性,我们推导出一个新的凸模型,即使在低质量测量情况下也能实现改进的重建。我们基于模拟和真实数据集展示了该方法的有效性。

英文摘要

Classical methods for X-ray computed tomography are based on the assumption that the X-ray source intensity is known, but in practice, the intensity is measured and hence uncertain. Under normal operating conditions, when the exposure time is sufficiently high, this kind of uncertainty typically has a negligible effect on the reconstruction quality. However, in time- or dose-limited applications such as dynamic CT, this uncertainty may cause severe and systematic artifacts known as ring artifacts. By carefully modeling the measurement process and by taking uncertainties into account, we derive a new convex model that leads to improved reconstructions despite poor quality measurements. We demonstrate the effectiveness of the methodology based on simulated and real data sets.

1707.03340 2026-06-04 math.NA cs.LG cs.NA stat.ML

Deep Learning for Real Time Crime Forecasting

深度学习用于实时犯罪预测

Bao Wang, Duo Zhang, Duanhao Zhang, P. Jeffery Brantingham, Andrea L. Bertozzi

AI总结 本文提出基于深度学习的时空预测模型,通过空间时间正则化和残差卷积结构,提升对洛杉矶犯罪分布的预测精度。

详情
Comments
4 pages, 6 figures, NOLTA, 2017
AI中文摘要

准确的实时犯罪预测是公共安全的一个基本问题,但对科学界而言仍是一项挑战。犯罪的发生取决于许多复杂因素。与许多可预测的事件相比,犯罪是稀疏的。在不同的时空尺度上,犯罪分布呈现出截然不同的模式,并且这些分布在空间和时间上的正则性都很低。本文改造了最先进的深度学习时空预测器ST-ResNet [Zhang et al, AAAI, 2017],用于整体预测洛杉矶地区的犯罪分布。我们的模型分为两个阶段:首先对原始犯罪数据进行预处理,包括在空间和时间上进行正则化以增强可预测信号;其次改造残差卷积单元的层次结构,以训练多因素犯罪预测模型。在洛杉矶为期半年的实验表明,我们的模型具有很高的预测精度。

英文摘要

Accurate real time crime prediction is a fundamental issue for public safety, but remains a challenging problem for the scientific community. Crime occurrences depend on many complex factors. Compared to many predictable events, crime is sparse. At different spatio-temporal scales, crime distributions display dramatically different patterns. These distributions are of very low regularity in both space and time. In this work, we adapt the state-of-the-art deep learning spatio-temporal predictor, ST-ResNet [Zhang et al, AAAI, 2017], to collectively predict crime distribution over the Los Angeles area. Our models are two staged. First, we preprocess the raw crime data. This includes regularization in both space and time to enhance predictable signals. Second, we adapt hierarchical structures of residual convolutional units to train multi-factor crime prediction models. Experiments over a half year period in Los Angeles reveal highly accurate predictive power of our models.

1707.02670 2026-06-04 math.OC cs.DS cs.LG cs.NA math.NA stat.ML

Accelerated Stochastic Power Iteration

加速随机幂迭代

Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu

AI总结 本文提出一种带有动量项的幂迭代变种,实现了最优的样本和迭代复杂度,适用于在线和离线设置的随机PCA算法,加速了迭代复杂度至O(1/√Δ)。

详情
Comments
37 pages, 5 figures
AI中文摘要

主成分分析(PCA)是机器学习中最强大的工具之一。最简单的PCA方法,即幂迭代,需要O(1/Δ)次全数据遍历才能恢复特征间隙为Δ的矩阵的主成分。Lanczos方法要复杂得多,但可达到加速的O(1/√Δ)遍历次数。然而,现代应用需要只读取部分可用数据的方法,即所谓随机设置。在在线随机设置中,Oja迭代等简单算法可达到最优样本复杂度O(σ²/Δ²)。遗憾的是,它们完全是串行的,并且同样需要O(σ²/Δ²)次迭代,远不及Lanczos的O(1/√Δ)速率。本文提出一种带有动量项的幂迭代简单变体,可同时达到最优的样本复杂度和迭代复杂度。在全遍历设置中,标准分析表明动量可达到加速的O(1/√Δ)速率。我们通过实验表明,将动量直接套用于随机方法并不能带来加速。我们进行了新颖而紧致的方差分析,揭示了一个“临界点方差”(breaking-point variance),超过该方差后加速不再发生。将这一认识与现代方差缩减技术相结合,我们构建了适用于在线和离线设置的随机PCA算法,其迭代复杂度达到加速的O(1/√Δ)。由于我们的方法具有易并行(embarrassingly parallel)的特性,在并行环境中部署时,这种加速可直接转化为实际运行时间上的加速。我们的方法非常通用,适用于许多非凸优化问题,这些问题现在都可以用同样的技术加速。

英文摘要

Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires $\mathcal O(1/Δ)$ full-data passes to recover the principal component of a matrix with eigen-gap $Δ$. Lanczos, a significantly more complex method, achieves an accelerated rate of $\mathcal O(1/\sqrtΔ)$ passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity $\mathcal O(σ^2/Δ^2)$. Unfortunately, they are fully sequential, and also require $\mathcal O(σ^2/Δ^2)$ iterations, far from the $\mathcal O(1/\sqrtΔ)$ rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, $\mathcal O(1/\sqrtΔ)$. We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity $\mathcal O(1/\sqrtΔ)$. Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.
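A minimal sketch of the deterministic (full-pass) power iteration with a heavy-ball momentum term, $w_{t+1} = A w_t - \beta w_{t-1}$, may help fix ideas. The function name, the joint rescaling of both iterates, and the choice $\beta = \lambda_2^2/4$ in the example are assumptions made for illustration; the stochastic and variance-reduced variants discussed in the abstract are not shown.

```python
import numpy as np

def power_iteration_momentum(A, beta, iters=200, seed=0):
    """Power iteration with momentum: w_next = A @ w - beta * w_prev."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    w_prev = np.zeros(n)
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    for _ in range(iters):
        w_next = A @ w - beta * w_prev
        scale = np.linalg.norm(w_next)
        # Rescale both iterates by the same factor so the two-term
        # recurrence is preserved up to a global constant.
        w_prev = w / scale
        w = w_next / scale
    return w

# Hypothetical test matrix with eigenvalues 1.0, 0.9, 0.5 (eigen-gap 0.1).
A = np.diag([1.0, 0.9, 0.5])
w = power_iteration_momentum(A, beta=0.9**2 / 4)
```

Setting `beta=0` recovers the plain power iteration. In this example the momentum recurrence contracts the unwanted directions by a factor of roughly 0.63 per step, compared with 0.9 without momentum.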

1707.01945 2026-06-04 cs.LG cs.NA math.NA stat.ML

Simple Classification using Binary Data

基于二进制数据的简单分类

Deanna Needell, Rayan Saab, Tina Woolf

AI总结 本文研究了从二进制数据进行分类的问题,提出了一种计算和资源消耗低的框架,并通过实验和理论分析验证其有效性。

详情
AI中文摘要

数据的二值(或称一比特)表示在许多应用中自然出现,并且在硬件实现和算法设计两方面都很有吸引力。本文研究了基于二值数据的数据分类问题,并提出了一种计算和资源开销都很低的框架。我们通过程式化的和贴近实际的数值实验展示了所提方法的实用性,并针对一种简单情形给出了理论分析。我们希望我们的框架和分析能为研究类似方法奠定基础。

英文摘要

Binary, or one-bit, representations of data arise naturally in many applications, and are appealing in both hardware implementations and algorithm design. In this work, we study the problem of data classification from binary data and propose a framework with low computation and resource costs. We illustrate the utility of the proposed approach through stylized and realistic numerical experiments, and provide a theoretical analysis for a simple case. We hope that our framework and analysis will serve as a foundation for studying similar types of approaches.

1511.07304 2026-06-04 math.PR cs.NA math.NA stat.CO

Convergence Results for a Class of Time-Varying Simulated Annealing Algorithms

一类时变模拟退火算法的收敛性结果

Mathieu Gerber, Luke Bornn

AI总结 本文研究了基于时变马尔可夫核的一类模拟退火算法的几乎必然收敛性,给出了保证收敛的条件,这些条件在实践中易于验证。

详情
Comments
25 pages (final version)
AI中文摘要

我们给出了一组条件,保证在有界集$\mathcal{X}\subset\mathbb{R}^d$上基于时变马尔可夫核的一类模拟退火算法几乎必然收敛。本文考虑的算法类涵盖了Belisle (1992)和Yang (2000)所研究的算法,以及Gerber和Bornn (2016)最近提出的其去随机化版本。据我们所知,本文得到的结果是基于时变核的模拟退火算法的首批几乎必然收敛性结果。此外,对马尔可夫核和冷却方案所作的假设在实践中很容易验证。

英文摘要

We provide a set of conditions which ensure the almost sure convergence of a class of simulated annealing algorithms on a bounded set $\mathcal{X}\subset\mathbb{R}^d$ based on a time-varying Markov kernel. The class of algorithms considered in this work encompasses the one studied in Belisle (1992) and Yang (2000) as well as its derandomized version recently proposed by Gerber and Bornn (2016). To the best of our knowledge, the results we derive are the first examples of almost sure convergence results for simulated annealing based on a time-varying kernel. In addition, the assumptions on the Markov kernel and on the cooling schedule have the advantage of being trivial to verify in practice.
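The setting can be pictured with a one-dimensional sketch of simulated annealing on a bounded interval in which the proposal kernel changes with the iteration index. The cooling schedule $T_t = 1/t$, the shrinking Gaussian proposal scale, and the clipping to the interval are illustrative assumptions only; they are not claimed to satisfy the convergence conditions established in the paper.

```python
import numpy as np

def simulated_annealing(f, x0, lower, upper, iters=5000, seed=0):
    """Simulated annealing on [lower, upper] with a time-varying proposal kernel."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    fx = f(x)
    best_x, best_f = x, fx
    for t in range(1, iters + 1):
        temperature = 1.0 / t  # assumed cooling schedule
        # Time-varying kernel: the proposal scale shrinks with t, with a floor.
        step = max(0.5 / np.sqrt(t), 0.01) * (upper - lower)
        y = float(np.clip(x + step * rng.standard_normal(), lower, upper))
        fy = f(y)
        # Metropolis acceptance: always accept improvements, sometimes accept worse moves.
        if fy <= fx or rng.random() < np.exp(-(fy - fx) / temperature):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

# Hypothetical objective with minimizer 0.3 on [0, 1].
best_x, best_f = simulated_annealing(lambda x: (x - 0.3) ** 2, 0.9, 0.0, 1.0)
```

The function returns the best point visited rather than the final state of the chain, which is a common practical choice but differs from statements about convergence of the chain itself.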

1611.04537 2026-06-04 stat.ME cs.NA math.NA math.OC math.ST stat.AP stat.TH

Multiscale scanning in inverse problems

逆问题中的多尺度扫描

Katharina Proksch, Frank Werner, Axel Munk

AI总结 本文提出多尺度扫描方法,用于确定逆回归模型中量f相对于字典U的活跃成分。通过给出一致的置信陈述,在控制族错误率的同时识别活跃成分。方法基于高斯近似和适应问题病态性的尺度惩罚,涵盖断层成像和反卷积等情形,并在超分辨率显微成像中展示应用。

详情
Comments
55 pages, 10 figures, 1 table
AI中文摘要

本文提出一种多尺度扫描方法,用于在带有线性算子T和一般随机误差ξ的逆回归模型Y=Tf+ξ中,根据观测Y确定量f相对于字典U的活跃成分。为此,我们在(T*)^{-1}(U)为小波型的假设下,为系数⟨φ,f⟩,φ∈U给出一致的置信陈述。基于此,我们得到一个多重检验,可在受控的族错误率(family-wise error rate)下识别U中的活跃成分,即⟨f,φ⟩≠0,φ∈U。我们的结果依赖于对底层多尺度统计量的高斯近似,并结合一种适应问题病态性的新型尺度惩罚。在合理的假设下,该尺度惩罚还保证统计量的分布弱收敛到Gumbel极限。我们详细讨论了断层成像和反卷积这两个重要特例。此外,还包括了T=id且字典由不同大小(尺度)的移动窗口组成的回归情形,推广了该设定下此前的结果。我们证明了我们的方法满足oracle最优性,即它能达到与正确尺度下的单尺度检验程序相同的渐近功效。模拟结果支持我们的理论,我们还展示了该方法作为成像推断工具的潜力。作为一个具体应用,我们讨论了超分辨率显微成像,并分析了实验STED数据以定位单个DNA折纸结构。

英文摘要

In this paper we propose a multiscale scanning method to determine active components of a quantity $f$ w.r.t. a dictionary $\mathcal{U}$ from observations $Y$ in an inverse regression model $Y=Tf+ξ$ with linear operator $T$ and general random error $ξ$. To this end, we provide uniform confidence statements for the coefficients $\langle φ, f\rangle$, $φ\in \mathcal U$, under the assumption that $(T^*)^{-1} \left(\mathcal U\right)$ is of wavelet-type. Based on this we obtain a multiple test that makes it possible to identify the active components of $\mathcal{U}$, i.e. $\left\langle f, φ\right\rangle \neq 0$, $φ\in \mathcal U$, at a controlled family-wise error rate. Our results rely on a Gaussian approximation of the underlying multiscale statistic with a novel scale penalty adapted to the ill-posedness of the problem. The scale penalty furthermore ensures weak convergence of the statistic's distribution towards a Gumbel limit under reasonable assumptions. The important special cases of tomography and deconvolution are discussed in detail. Further, the regression case, when $T = \text{id}$ and the dictionary consists of moving windows of various sizes (scales), is included, generalizing previous results for this setting. We show that our method obeys an oracle optimality, i.e. it attains the same asymptotic power as a single-scale testing procedure at the correct scale. Simulations support our theory and we illustrate the potential of the method as an inferential tool for imaging. As a particular application we discuss super-resolution microscopy and analyze experimental STED data to locate single DNA origami.

1706.07584 2026-06-04 math.PR cs.NA math.NA math.OC math.SP math.ST stat.TH

Global algorithms for maximal eigenpair

最大特征对的全局算法

Mu-Fa Chen

AI总结 本文提出两种全局算法,用于在广泛设定中计算最大特征对,包括实数和复数矩阵,扩展了之前对三对角矩阵和不可约矩阵的研究。

详情
Comments
20 pages
AI中文摘要

本文是\ct{cmf16}的续篇。该文首先针对三对角矩阵提出了计算最大特征对的高效算法,随后将其推广到非对角元素非负的不可约矩阵。本文提出两种全局算法,可在相当一般的设定下计算最大特征对,甚至涵盖一类实矩阵(部分非对角元素为负)或复矩阵。

英文摘要

This paper is a continuation of \ct{cmf16} where an efficient algorithm for computing the maximal eigenpair was introduced first for tridiagonal matrices and then extended to the irreducible matrices with nonnegative off-diagonal elements. This paper introduces two global algorithms for computing the maximal eigenpair in a rather general setup, including even a class of real (with some negative off-diagonal elements) or complex matrices.

1511.01437 2026-06-04 math.PR cs.NA math.NA math.ST physics.data-an stat.TH

The sample size required in importance sampling

重要性抽样所需样本量

Sourav Chatterjee, Persi Diaconis

AI总结 研究提出重要性抽样中样本量需约exp(D(ν||μ)),并应用于指数族分布的样本量公式推导。

详情
Comments
37 pages, 4 figures. To appear in Ann. App. Probab
AI中文摘要

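重要性抽样的目标是:利用从另一个概率测度μ中抽取的大小为n的随机样本,估计给定函数关于概率测度ν的期望值。若μ和ν几乎相互奇异(实践中常常如此),则准确估计所需的样本量很大。本文证明,在相当一般的设定下,重要性抽样实现准确估计所需的样本量约为exp(D(ν||μ)),这一量级既必要又充分,其中D(ν||μ)为Kullback-Leibler散度。特别地,所需样本量在对数尺度上呈现某种截断(cut-off)现象。该理论被用于推导单参数指数族(Gibbs测度)重要性抽样所需样本量的一般公式。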

英文摘要

The goal of importance sampling is to estimate the expected value of a given function with respect to a probability measure $ν$ using a random sample of size $n$ drawn from a different probability measure $μ$. If the two measures $μ$ and $ν$ are nearly singular with respect to each other, which is often the case in practice, the sample size required for accurate estimation is large. In this article it is shown that in a fairly general setting, a sample of size approximately $\exp(D(ν||μ))$ is necessary and sufficient for accurate estimation by importance sampling, where $D(ν||μ)$ is the Kullback-Leibler divergence of $μ$ from $ν$. In particular, the required sample size exhibits a kind of cut-off in the logarithmic scale. The theory is applied to obtain a general formula for the sample size required in importance sampling for one-parameter exponential families (Gibbs measures).
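To make the $\exp(D(ν||μ))$ rule of thumb concrete, here is a minimal sketch for one illustrative pair of measures that is not taken from the paper: the target $ν = N(θ, 1)$ and the sampling measure $μ = N(0, 1)$, for which $D(ν||μ) = θ^2/2$. The function names are hypothetical, and the code only illustrates the scaling; it does not reproduce the paper's bounds.

```python
import numpy as np

def kl_gaussian_shift(theta):
    """D(nu || mu) for nu = N(theta, 1) and mu = N(0, 1) (assumed example)."""
    return 0.5 * theta ** 2

def importance_estimate(f, theta, n, rng):
    """Estimate E_nu[f] from n draws of mu, reweighted by the ratio dnu/dmu."""
    x = rng.standard_normal(n)                       # sample from mu
    weights = np.exp(theta * x - 0.5 * theta ** 2)   # density ratio dnu/dmu at x
    return float(np.mean(f(x) * weights))

theta = 3.0
# Sample size suggested by the exp(KL) heuristic for this pair of measures.
suggested_n = int(np.ceil(np.exp(kl_gaussian_shift(theta))))

rng = np.random.default_rng(0)
for n in (suggested_n // 10, suggested_n, 100 * suggested_n):
    # f = 1, so the exact answer is 1; estimates far below the suggested
    # size are expected to be unreliable.
    est = importance_estimate(lambda x: np.ones_like(x), theta, n, rng)
    print(n, est)
```

Running the loop with other values of `theta` shows how quickly the suggested sample size grows as the two measures move apart.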

1610.00470 2026-06-04 eess.SY cs.SY stat.ML

A new kernel-based approach to system identification with quantized output data

量化输出数据下基于核的系统辨识新方法

Giulio Bottegal, Håkan Hjalmarsson, Gianluigi Pillonetto

AI总结 本文提出一种针对量化输出数据的基于核的系统辨识新方法:以稳定样条核对脉冲响应建模,用马尔可夫链蒙特卡洛方法估计系统,并借助期望最大化方法最大化边际似然来估计核超参数。

详情
Comments
10 pages, 4 figures
AI中文摘要

本文介绍了一种新颖的线性系统辨识方法,用于处理量化输出数据。我们把脉冲响应建模为均值为零的高斯过程,其协方差(核)由最近提出的稳定样条核给出,该核编码了正则性和指数稳定性信息。这为将系统辨识问题纳入贝叶斯框架提供了起点。我们采用马尔可夫链蒙特卡洛方法来估计系统。特别是,我们设计了两种基于吉布斯采样器的方法,它们还可以借助期望最大化方法最大化边际似然,从而估计核超参数。数值模拟表明,在量化数据的系统辨识中,所提出的方法比现有最先进的核方法更有效。

英文摘要

In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo methods to provide an estimate of the system. In particular, we design two methods based on the so-called Gibbs sampler that also allow the kernel hyperparameters to be estimated by marginal likelihood maximization via the expectation-maximization method. Numerical simulations show the effectiveness of the proposed scheme compared with state-of-the-art kernel-based methods when these are employed in system identification with quantized data.

1706.05736 2026-06-04 math.NA cs.DS cs.NA stat.ML

Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data

基于流式数据的半正定矩阵固定秩近似

Joel A. Tropp, Alp Yurtsever, Madeleine Udell, Volkan Cevher

AI总结 本文提出一种结合Nystrom近似与新的秩截断机制的算法,用于根据草图(sketch)对以数据流形式给出的半正定矩阵做固定秩近似。理论分析表明其在Schatten 1范数下可达到任意给定的相对误差,计算实验表明其在多种示例中优于其他方法。

详情
AI中文摘要

流式PCA和半定规划等若干重要应用都涉及一个以线性更新序列形式给出的大规模半正定(psd)矩阵。由于存储限制,可能只能保留该半正定矩阵的一个草图(sketch)。本文提出了一种根据草图进行固定秩半正定近似的新算法。该方法将Nystrom近似与一种新的秩截断机制相结合。理论分析证明,所提出的方法可以在Schatten 1范数下达到任意给定的相对误差,并能利用输入矩阵的谱衰减。计算实验表明,在广泛的示例中,所提出的方法均优于其他固定秩半正定矩阵近似技术。

英文摘要

Several important applications, such as streaming PCA and semidefinite programming, involve a large-scale positive-semidefinite (psd) matrix that is presented as a sequence of linear updates. Because of storage limitations, it may only be possible to retain a sketch of the psd matrix. This paper develops a new algorithm for fixed-rank psd approximation from a sketch. The approach combines the Nystrom approximation with a novel mechanism for rank truncation. Theoretical analysis establishes that the proposed method can achieve any prescribed relative error in the Schatten 1-norm and that it exploits the spectral decay of the input matrix. Computer experiments show that the proposed method dominates alternative techniques for fixed-rank psd matrix approximation across a wide range of examples.
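The idea of maintaining a sketch under linear updates and then truncating a Nystrom approximation can be sketched as follows. This is a simplified illustration, not the authors' numerically stabilized procedure: the function names, the rank-one update form $A \leftarrow A + hh^T$, and the pseudo-inverse tolerance `tol` are all assumptions made for the example.

```python
import numpy as np

def update_sketch(Y, h, Omega):
    """Apply the linear update A <- A + h h^T to the sketch Y = A @ Omega."""
    return Y + np.outer(h, h @ Omega)

def fixed_rank_from_sketch(Y, Omega, r, tol=1e-10):
    """Rank-r psd approximation U diag(lam) U^T built from the sketch only."""
    B = Omega.T @ Y                  # core matrix Omega^T A Omega
    B = (B + B.T) / 2                # symmetrize against round-off
    w, V = np.linalg.eigh(B)
    keep = w > tol * w.max()         # drop numerically null directions
    # Nystrom approximation Y B^+ Y^T, written as F F^T.
    F = Y @ (V[:, keep] / np.sqrt(w[keep]))
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    # Truncate the Nystrom approximation itself to rank r.
    return U[:, :r], s[:r] ** 2

rng = np.random.default_rng(0)
n, k, r = 50, 8, 3
Omega = rng.standard_normal((n, k))  # random test matrix, fixed in advance
G = rng.standard_normal((n, r))      # A = G G^T arrives as r rank-one updates
Y = np.zeros((n, k))
for j in range(r):
    Y = update_sketch(Y, G[:, j], Omega)

U, lam = fixed_rank_from_sketch(Y, Omega, r)
A = G @ G.T                          # formed here only to measure the error
rel_err = np.linalg.norm(A - (U * lam) @ U.T, "nuc") / np.linalg.norm(A, "nuc")
print(rel_err)
```

The error is measured in the nuclear norm, which is the Schatten 1-norm used in the abstract. Only `Y` and `Omega` (size $n \times k$) are stored during the stream, never the full matrix.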

1602.07764 2026-06-04 cs.AI cs.LG cs.NA math.NA math.OC stat.ML

Reinforcement Learning of POMDPs using Spectral Methods

使用谱方法进行POMDP的强化学习

Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

AI总结 本文提出基于谱分解方法的POMDP强化学习算法:按回合从轨迹中学习模型参数,并由优化预言机(oracle)给出最优无记忆策略。文中证明了相对于最优无记忆策略的阶最优遗憾(regret)界,以及对观测和动作空间维度的高效扩展性。

详情
Journal ref
29th Annual Conference on Learning Theory, PMLR 49:193-256, 2016
AI中文摘要

我们提出了一种基于谱分解方法的部分可观测马尔可夫决策过程(POMDP)强化学习新算法。尽管谱方法此前已被用于隐马尔可夫模型等(被动)潜变量模型的相合学习,但POMDP更具挑战性,因为学习者与环境交互,并可能在此过程中改变未来的观测。我们设计了一种按回合运行的学习算法:在每个回合中,利用谱技术从由固定策略生成的轨迹中学习POMDP参数。回合结束时,优化预言机(oracle)基于估计出的POMDP模型返回使期望奖励最大的最优无记忆规划策略。我们证明了相对于最优无记忆策略的阶最优遗憾(regret)界,以及算法对观测空间和动作空间维度的高效扩展性。

英文摘要

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through episodes; in each episode we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the episode, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.

1706.04097 2026-06-04 cs.LG cs.DS cs.NA math.NA stat.ML

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations

强相关情形下非负矩阵分解的可证明交替梯度下降法

Yuanzhi Li, Yingyu Liang

AI总结 本文提出了一种简单的交替梯度下降算法,证明在强相关性下能有效恢复真实特征矩阵,并展示了其在噪声下的鲁棒性。

详情
Comments
Accepted to the International Conference on Machine Learning (ICML), 2017
AI中文摘要

非负矩阵分解是在非负约束下将数据分解为特征矩阵和权重矩阵的基本工具,实践中通常在交替最小化框架下求解。然而,当不同特征的权重高度相关时(这在应用中很常见),此类算法能否恢复真实的特征矩阵尚不清楚。本文提出了一种简单自然的基于交替梯度下降的算法,并证明在温和的初始化条件下,即使存在强相关性,该算法也能可证明地恢复真实矩阵。在大多数有意义的情形下,相关性可以达到与最高可能值相同的量级。我们的分析还揭示了该算法的若干优良性质,包括对噪声的鲁棒性。我们在半合成数据集上进行了实证研究以补充理论结果,表明它在恢复真实矩阵方面优于几种流行方法。

英文摘要

Non-negative matrix factorization is a basic tool for decomposing data into the feature and weight matrices under non-negativity constraints, and in practice is often solved in the alternating minimization framework. However, it is unclear whether such algorithms can recover the ground-truth feature matrix when the weights for different features are highly correlated, which is common in applications. This paper proposes a simple and natural alternating gradient descent based algorithm, and shows that with a mild initialization it provably recovers the ground-truth in the presence of strong correlations. In most interesting cases, the correlation can be in the same order as the highest possible. Our analysis also reveals its several favorable features including robustness to noise. We complement our theoretical results with empirical studies on semi-synthetic datasets, demonstrating its advantage over several popular methods in recovering the ground-truth.

1402.2703 2026-06-04 math.ST cs.NA math.NA math.OC stat.TH

Taking all positive eigenvectors is suboptimal in classical multidimensional scaling

在经典多维尺度分析中,取所有正特征向量是次优的

Jeffrey Tsang, Rajesh Pereira

AI总结 本文指出,在经典多维尺度分析中,当存在负特征值时,把所有正特征向量都取为维度是次优的:舍弃部分正特征向量反而可以降低误差。文中给出了可证明更优的处理方法。

详情
Journal ref
SIAM Journal on Optimization 26(4):2080-2090, 2016
Comments
13 pages, 1 figure, 1 table, 1 supplementary file
AI中文摘要

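多维尺度分析作为一种分析技术,在各门科学中的重要性怎么强调都不为过。经典(Torgerson)多维尺度分析是其主要变体之一,其优点是具有闭式解析解。然而,该解当且仅当距离为欧氏距离时才是精确的。对于出现负特征值时应如何处理,相关讨论相对较少:直观的做法是把每个正特征向量都取为一个维度,这在最小二乘意义下初看是合理的。我们证明,这种做法最小化的是相对于中心化距离而非真实距离的最小二乘误差,因而是次优的:即使投影到更少的维度,舍弃部分正特征向量也可以降低误差。针对这一常见情形,我们给出了可证明更优的处理方法。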

英文摘要

It is hard to overstate the importance of multidimensional scaling as an analysis technique in the broad sciences. Classical, or Torgerson multidimensional scaling is one of the main variants, with the advantage that it has a closed-form analytic solution. However, this solution is exact if and only if the distances are Euclidean. Conversely, there has been comparatively little discussion on what to do in the presence of negative eigenvalues: the intuitive solution, prima facie justifiable in least-squares terms, is to take every positive eigenvector as a dimension. We show that this, minimizing least-squares to the centred distances instead of the true distances, is suboptimal - throwing away positive eigenvectors can decrease the error even as we project to fewer dimensions. We provide provably better methods for handling this common case.
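For reference, the closed-form classical (Torgerson) solution and the "take every positive eigenvector" default that the abstract critiques can be sketched as below. The function names and the tolerance `tol` are illustrative assumptions; the paper's improved methods are not reproduced here.

```python
import numpy as np

def classical_mds(D, n_dims=None, tol=1e-9):
    """Classical MDS embedding from a matrix D of pairwise distances.

    With n_dims=None every positive eigenvector is used as a dimension
    (the intuitive default); a smaller n_dims keeps only the leading ones.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # doubly centred Gram matrix
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1]              # eigenvalues, largest first
    w, V = w[order], V[:, order]
    n_pos = int(np.sum(w > tol * np.abs(w).max()))
    k = n_pos if n_dims is None else min(n_dims, n_pos)
    return V[:, :k] * np.sqrt(w[:k])

def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def distance_error(D, X):
    """Squared error between the given distances and those of the embedding."""
    return float(np.sum((D - pairwise_distances(X)) ** 2))

rng = np.random.default_rng(0)
P = rng.standard_normal((12, 2))             # points that really are Euclidean
D = pairwise_distances(P)
X = classical_mds(D)
print(X.shape, distance_error(D, X))
```

For a non-Euclidean `D` (one whose centred matrix has negative eigenvalues), evaluating `distance_error(D, classical_mds(D, n_dims=k))` for several `k` is a direct way to check the abstract's claim on a given data set.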

1702.07944 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.OC stat.ML

Stochastic Variance Reduction Methods for Policy Evaluation

用于策略评估的随机方差缩减方法

Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

AI总结 本文研究固定数据集上基于线性函数逼近的策略评估:将经验策略评估问题转化为二次凸-凹鞍点问题,提出一种原始-对偶批量梯度方法和两种随机方差缩减方法。这些算法的计算量随样本量和特征维度线性增长,并具有线性收敛性。

详情
Comments
Accepted by ICML 2017
AI中文摘要

策略评估是许多强化学习过程中的关键步骤,其目的是估计价值函数,以预测给定策略下各状态的长期价值。本文聚焦于固定数据集上使用线性函数逼近的策略评估。我们首先将经验策略评估问题转化为(二次)凸-凹鞍点问题,然后提出一种原始-对偶批量梯度方法以及两种随机方差缩减方法来求解该问题。这些算法的计算量随样本量和特征维度均线性增长。此外,即使鞍点问题仅关于对偶变量强凹、而关于原始变量不强凸,它们仍能实现线性收敛。基准问题上的数值实验验证了方法的有效性。

英文摘要

Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy. In this paper, we focus on policy evaluation with linear function approximation over a fixed dataset. We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods for solving the problem. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem has only strong concavity in the dual variables but no strong convexity in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
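A saddle-point reformulation of this kind rests on a standard conjugate identity for quadratics, sketched here in assumed notation: the symbols $A$, $b$, $C$, $θ$ and $w$ are illustrative and do not come from the abstract, and $C$ is assumed symmetric positive definite. For a weighted least-squares objective,

\begin{equation*} \frac12 (Aθ-b)^\top C^{-1} (Aθ-b) = \max_{w} \left\{ w^\top (b-Aθ) - \frac12 w^\top C w \right\}, \end{equation*}

because the right-hand side is maximized at $w^\star = C^{-1}(b-Aθ)$, where it equals $(b-Aθ)^\top C^{-1}(b-Aθ) - \frac12 (b-Aθ)^\top C^{-1}(b-Aθ)$. Minimizing over $θ$ therefore gives

\begin{equation*} \min_{θ} \max_{w} \left\{ w^\top (b-Aθ) - \frac12 w^\top C w \right\}, \end{equation*}

which is strongly concave in the dual variable $w$ but only linear in the primal variable $θ$. This matches the regime described in the abstract, where strong convexity in the primal variables is not available.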

1706.02692 2026-06-04 stat.ME cs.NA math.NA

The True Cost of Stochastic Gradient Langevin Dynamics

随机梯度朗之万动力学的真实成本

Tigran Nagapetyan, Andrew B. Duncan, Leonard Hasenclever, Sebastian J. Vollmer, Lukasz Szpruch, Konstantinos Zygalakis

AI总结 本文研究了强对数凹模型中SGLD在大数据集下的均方误差,指出为控制偏差必须选择极小的步长,而采用控制变量法可大幅降低计算成本。

详情
Comments
6 Figures
AI中文摘要

后验推断问题是贝叶斯统计的核心,已有大量马尔可夫链蒙特卡洛(MCMC)方法被提出,用于从后验中获得渐近正确的样本。随着应用中的数据集越来越大,可扩展性已成为MCMC方法的核心问题。随机梯度朗之万动力学(SGLD)及相关的随机梯度MCMC方法通过在模拟动力学的每一步中使用随机梯度来实现可扩展性。尽管在步长以适当方式递减时这些方法是渐近无偏的,但实践中通常使用固定步长,这会引入常被忽略的偏差。本文研究了在数据独立同分布且数据集规模不断增大的强对数凹模型中Lipschitz泛函的均方误差,并表明:给定批量大小,为控制SGLD的偏差,步长必须选得极小,以至于达到目标精度的计算成本对所有批量大小大致相同。采用控制变量法可大幅降低该成本。分析中将这些算法视为朗之万随机微分方程的带噪离散化,在使用完整数据集时即对应欧拉方法。一个重要发现是:若要求精度足以给出相合的可信区间,则步长的尺度由稳定性准则决定。实验结果验证了我们的理论发现。

英文摘要

The problem of posterior inference is central to Bayesian statistics and a wealth of Markov Chain Monte Carlo (MCMC) methods have been proposed to obtain asymptotically correct samples from the posterior. As datasets in applications grow larger and larger, scalability has emerged as a central problem for MCMC methods. Stochastic Gradient Langevin Dynamics (SGLD) and related stochastic gradient Markov Chain Monte Carlo methods offer scalability by using stochastic gradients in each step of the simulated dynamics. While these methods are asymptotically unbiased if the stepsizes are reduced in an appropriate fashion, in practice constant stepsizes are used. This introduces a bias that is often ignored. In this paper we study the mean squared error of Lipschitz functionals in strongly log-concave models with i.i.d. data of growing data set size and show that, given a batchsize, to control the bias of SGLD the stepsize has to be chosen so small that the computational cost of reaching a target accuracy is roughly the same for all batchsizes. Using a control variate approach, the cost can be reduced dramatically. The analysis is performed by considering the algorithms as noisy discretisations of the Langevin SDE which correspond to the Euler method if the full data set is used. An important observation is that the scale of the step size is determined by the stability criterion if the accuracy is required for consistent credible intervals. Experimental results confirm our theoretical findings.
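A constant-stepsize SGLD chain can be sketched on a toy model that is an assumption of this example rather than the paper's setting: data $x_i \sim N(θ, 1)$ with a flat prior, so the posterior is $N(\bar{x}, 1/N)$. The function name and all parameter values are hypothetical. With `batch_size` equal to the data set size, the update reduces to the Euler discretisation of the Langevin SDE mentioned in the abstract.

```python
import numpy as np

def sgld_gaussian_mean(x, batch_size, step, n_steps, rng):
    """Constant-stepsize SGLD chain for the mean of N(theta, 1) data (flat prior)."""
    n = len(x)
    theta = 0.0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        batch = rng.choice(x, size=batch_size, replace=False)
        # Unbiased minibatch estimate of the gradient of the log-posterior.
        grad = (n / batch_size) * np.sum(batch - theta)
        # Langevin step: half-step along the gradient plus Gaussian noise.
        theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal()
        chain[t] = theta
    return chain

rng = np.random.default_rng(0)
x = rng.standard_normal(100) + 2.0
full = sgld_gaussian_mean(x, batch_size=100, step=1e-3, n_steps=20000, rng=rng)
mini = sgld_gaussian_mean(x, batch_size=10, step=1e-3, n_steps=20000, rng=rng)
# Compare chain variances after burn-in with the exact posterior variance 1/N.
print(full[2000:].var(), mini[2000:].var(), 1.0 / len(x))
```

Comparing the two printed chain variances with the exact value gives a feel for the stepsize-dependent bias that the abstract says is often ignored.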

1706.02495 2026-06-04 stat.ML cs.SY eess.SY

The Generalized Cross Validation Filter

广义交叉验证滤波器

Giulio Bottegal, Gianluigi Pillonetto

AI总结 本文提出广义交叉验证滤波器,通过扩展经典卡尔曼滤波方程,实现在线应用中高效传播GCV分数,解决了传统方法在在线应用中的计算瓶颈。

详情
Comments
10 pages, 9 figures
AI中文摘要

广义交叉验证(GCV)是用于逆问题和正则化技术中估计参数的重要方法。在状态空间模型(如样条情况)中,存在高效算法计算GCV分数,其复杂度与数据集大小线性相关。然而,这些方法不适用于在线应用,因为它们依赖于前向和后向递归。本文通过推导新的GCV滤波器,将更新成本降低到O(1),从而实现GCV的在线应用。该滤波器扩展了经典卡尔曼滤波方程,以高效传播GCV分数。我们还展示了新滤波器在状态估计和在线正则化线性系统识别中的应用。

英文摘要

Generalized cross validation (GCV) is one of the most important approaches used to estimate parameters in the context of inverse problems and regularization techniques. A notable example is the determination of the smoothness parameter in splines. When the data are generated by a state space model, like in the spline case, efficient algorithms are available to evaluate the GCV score with complexity that scales linearly in the data set size. However, these methods are not amenable to on-line applications since they rely on forward and backward recursions. Hence, if the objective has been evaluated at time $t-1$ and new data arrive at time $t$, then $O(t)$ operations are needed to update the GCV score. In this paper we instead show that the update cost is $O(1)$, thus paving the way to the on-line use of GCV. This result is obtained by deriving the novel GCV filter which extends the classical Kalman filter equations to efficiently propagate the GCV score over time. We also illustrate applications of the new filter in the context of state estimation and on-line regularized linear system identification.

1603.03516 2026-06-04 math.ST cs.NA math.NA stat.TH

An $\ell_{\infty}$ Eigenvector Perturbation Bound and Its Application to Robust Covariance Estimation

一个ℓ∞特征向量扰动界及其在鲁棒协方差估计中的应用

Jianqing Fan, Weichen Wang, Yiqiao Zhong

AI总结 本文提出一个基于ℓ∞范数的特征向量扰动界,用于鲁棒协方差估计,尤其适用于数据具有重尾分布的情况,通过理论分析和数值实验验证了其有效性。

详情
Comments
48 pages, 5 figures
AI中文摘要

在统计和机器学习中,人们通常对某些矩阵(如协方差矩阵、数据矩阵等)的特征向量(或奇异向量)感兴趣。然而,这些矩阵通常受到噪声或统计误差的扰动,无论是来自随机采样还是结构性模式。通常使用Davis-Kahan sinθ定理来以ℓ2范数形式界定矩阵A和扰动矩阵A~ = A + E的特征向量之间的差异。本文证明,当A是一个低秩且不相干的矩阵时,奇异向量(或对称情况下的特征向量)的ℓ∞范数扰动界比传统方法小一个因子√d1或√d2,其中d1和d2是矩阵的维度。该新扰动结果的威力在鲁棒协方差估计中得到展示,特别是在随机变量具有重尾分布的情况下。在那里,我们提出了新的鲁棒协方差估计器,并利用新发展的扰动界建立了其渐近性质。我们的理论结果通过广泛的数值实验得到了验证。

英文摘要

In statistics and machine learning, people are often interested in the eigenvectors (or singular vectors) of certain matrices (e.g. covariance matrices, data matrices, etc). However, those matrices are usually perturbed by noise or statistical errors, either from random sampling or structural patterns. One usually employs the Davis-Kahan $\sin θ$ theorem to bound the difference between the eigenvectors of a matrix $A$ and those of a perturbed matrix $\widetilde{A} = A + E$, in terms of the $\ell_2$ norm. In this paper, we prove that when $A$ is a low-rank and incoherent matrix, the $\ell_{\infty}$ norm perturbation bound of singular vectors (or eigenvectors in the symmetric case) is smaller by a factor of $\sqrt{d_1}$ or $\sqrt{d_2}$ for left and right vectors, where $d_1$ and $d_2$ are the matrix dimensions. The power of this new perturbation result is shown in robust covariance estimation, particularly when random variables have heavy tails. There, we propose new robust covariance estimators and establish their asymptotic properties using the newly developed perturbation bound. Our theoretical results are verified through extensive numerical experiments.

1706.00241 2026-06-04 cs.LG cs.NA math.NA stat.ML

Krylov Subspace Recycling for Fast Iterative Least-Squares in Machine Learning

Krylov子空间回收用于机器学习中的快速迭代最小二乘法

Filip de Roos, Philipp Hennig

AI总结 本文研究了利用Krylov子空间回收方法提高机器学习中对称正定线性问题求解效率,通过迭代优化低秩近似以平衡计算成本与数值精度。

详情
AI中文摘要

求解对称正定线性问题是机器学习中的基础计算任务。众所周知,精确求解的计算成本随矩阵规模呈立方增长。为缓解这一问题,人们提出了若干线性时间的近似方法,如谱方法和诱导点方法,它们现已被广泛使用。这些方法属于低秩近似,其低秩空间是事先选定的,不会随时间迭代改进。这虽然使计算成本随数据集规模线性增长,但也带来了有限且无法修正的近似误差。数值线性代数领域的研究者探索了迭代改进此类低秩近似的方法,其代价仅为少量矩阵-向量乘法。在机器学习中,常常需要求解一系列相关的对称正定线性问题,此时这一思路尤其值得关注。从机器学习的角度看,此类收缩(deflation)方法可以被解释为低秩近似在一系列数值任务之间的迁移学习。我们研究了此类方法在本领域中的应用。实验结果表明,在中等规模的回归和分类问题上,这种方法可以在低计算成本与数值精度之间取得折中。

英文摘要

Solving symmetric positive definite linear problems is a fundamental computational task in machine learning. The exact solution, famously, is cubically expensive in the size of the matrix. To alleviate this problem, several linear-time approximations, such as spectral and inducing-point methods, have been suggested and are now in wide use. These are low-rank approximations that choose the low-rank space a priori and do not refine it over time. While this allows linear cost in the data-set size, it also causes a finite, uncorrected approximation error. Authors from numerical linear algebra have explored ways to iteratively refine such low-rank approximations, at a cost of a small number of matrix-vector multiplications. This idea is particularly interesting in the many situations in machine learning where one has to solve a sequence of related symmetric positive definite linear problems. From the machine learning perspective, such deflation methods can be interpreted as transfer learning of a low-rank approximation across a time-series of numerical tasks. We study the use of such methods for our field. Our empirical results show that, on regression and classification problems of intermediate size, this approach can interpolate between low computational cost and numerical precision.

1705.10757 2026-06-04 eess.SY cs.SY stat.ML

A Multi-Layer K-means Approach for Multi-Sensor Data Pattern Recognition in Multi-Target Localization

用于多目标定位中多传感器数据模式识别的多层K-means方法

Samuel Silva, Rengan Suresh, Feng Tao, Johnathan Votion, Yongcan Cao

AI总结 本文提出多层K-means方法,结合K-means、K-means++和深度学习的优势,用于多传感器数据关联以提高多目标定位精度。

详情
AI中文摘要

数据-目标关联是智能无人系统在搜索救援、交通管理及监控等应用中多目标定位的重要步骤。本文提出一种创新的数据关联学习方法,名为多层K-means(MLKM),基于利用现有机器学习方法的优势,包括K-means、K-means++和深度神经网络。为了实现不同传感器的准确数据关联以实现高效的定位,MLKM依赖于K-means++的聚类能力,结构化为多层框架,并具有受深度学习研究中著名的反向传播启发的错误校正功能。为了展示MLKM方法的有效性,进行了大量模拟实验,将其性能与K-means、K-means++和深度神经网络进行比较。

英文摘要

Data-target association is an important step in multi-target localization for the intelligent operation of unmanned systems in numerous applications such as search and rescue, traffic management and surveillance. The objective of this paper is to present an innovative data association learning approach named multi-layer K-means (MLKM) based on leveraging the advantages of some existing machine learning approaches, including K-means, K-means++, and deep neural networks. To enable the accurate data association from different sensors for efficient target localization, MLKM relies on the clustering capabilities of K-means++ structured in a multi-layer framework with the error correction feature that is motivated by the backpropagation that is well-known in deep learning research. To show the effectiveness of the MLKM method, numerous simulation examples are conducted to compare its performance with K-means, K-means++, and deep neural networks.

1703.10761 2026-06-04 math.NA cs.NA math.AP stat.ML

Universal Scalable Robust Solvers from Computational Information Games and fast eigenspace adapted Multiresolution Analysis

源自计算信息博弈的通用、可扩展、稳健求解器与快速特征空间自适应多分辨率分析

Houman Owhadi, Clint Scovel

AI总结 本文将为任意有界线性算子寻找稳健、可扩展数值求解器的问题转化为博弈论问题,提出了一种基于高斯场的求解方法,并引入快速gamblet变换(FGT)实现多分辨率分析。

详情
Comments
142 pages. 14 Figures. Presented at AFOSR (Aug 2016), DARPA (Sep 2016), IPAM (Apr 3, 2017), Hausdorff (April 13, 2017) and ICERM (June 5, 2017)
AI中文摘要

我们展示了如何把为任意有界线性算子寻找稳健、可扩展数值求解器的过程自动化为一个博弈论问题:把利用部分信息和有限资源进行计算的过程,重新表述为进行一系列分层的对抗信息博弈。当解空间是赋有二次范数‖·‖的巴拿赫空间B时,此类博弈(例如,给定部分测量[ϕ_i, u],其中ϕ_i∈B^*,以‖·‖范数下的相对误差为损失,对抗性地恢复u∈B)的最优测度(混合策略)是一个仅由范数‖·‖决定的中心化高斯场ξ,对其关于测量取条件即得到最优赌注。当测量是分层的时,对该高斯场取条件的过程会产生一个基本赌注(gamblet)的层次结构。这些gamblet推广了小波和Wannier函数的概念:它们适应于范数‖·‖,并诱导出B的多分辨率分解,该分解适应于定义范数‖·‖的算子的特征子空间。当算子是局部化的时,我们证明所得到的gamblet在空间和频率上都是局部化的,并引入了具有严格精度估计和(近线性)复杂度估计的快速gamblet变换(FGT)。正如FFT可用于求解并对角化任意常系数PDE,FGT可用于将一大类连续线性算子(包括从H^s_0到H^{-s}或到L^2的任意连续线性双射)分解为一系列条件数一致有界的相互独立的线性系统,从而得到O(N polylog N)的求解器和特征空间自适应的多分辨率分析(可在近线性复杂度内逼近所有特征子空间)。

英文摘要

We show how the discovery of robust scalable numerical solvers for arbitrary bounded linear operators can be automated as a Game Theory problem by reformulating the process of computing with partial information and limited resources as that of playing underlying hierarchies of adversarial information games. When the solution space is a Banach space $B$ endowed with a quadratic norm $\|\cdot\|$, the optimal measure (mixed strategy) for such games (e.g. the adversarial recovery of $u\in B$, given partial measurements $[ϕ_i, u]$ with $ϕ_i\in B^*$, using relative error in $\|\cdot\|$-norm as a loss) is a centered Gaussian field $ξ$ solely determined by the norm $\|\cdot\|$, whose conditioning (on measurements) produces optimal bets. When measurements are hierarchical, the process of conditioning this Gaussian field produces a hierarchy of elementary bets (gamblets). These gamblets generalize the notion of Wavelets and Wannier functions in the sense that they are adapted to the norm $\|\cdot\|$ and induce a multi-resolution decomposition of $B$ that is adapted to the eigensubspaces of the operator defining the norm $\|\cdot\|$. When the operator is localized, we show that the resulting gamblets are localized both in space and frequency and introduce the Fast Gamblet Transform (FGT) with rigorous accuracy and (near-linear) complexity estimates. As the FFT can be used to solve and diagonalize arbitrary PDEs with constant coefficients, the FGT can be used to decompose a wide range of continuous linear operators (including arbitrary continuous linear bijections from $H^s_0$ to $H^{-s}$ or to $L^2$) into a sequence of independent linear systems with uniformly bounded condition numbers and leads to $\mathcal{O}(N \operatorname{polylog} N)$ solvers and eigenspace adapted Multiresolution Analysis (resulting in near linear complexity approximation of all eigensubspaces).

1403.7588 2026-06-04 math.OC cs.CV cs.NA math.NA stat.ML

Scalable Robust Matrix Recovery: Frank-Wolfe Meets Proximal Methods

可扩展的鲁棒矩阵恢复:Frank-Wolfe与近端方法的结合

Cun Mu, Yuqian Zhang, John Wright, Donald Goldfarb

AI总结 本文提出了一种可扩展且高效的鲁棒矩阵恢复方法,结合Frank-Wolfe和近端方法,以(本质上)线性的每次迭代成本求解压缩主成分追踪(CPCP)问题:用秩一SVD更新低秩部分,用近端步骤处理稀疏项,并在视觉数据上验证了方法的可扩展性。

详情
Journal ref
SIAM Journal on Scientific Computing, 2016, Vol. 38, No. 5 : pp. A3291-A3317
AI中文摘要

矩阵从压缩和严重损坏的观测中恢复是稳健统计中的基本问题,广泛应用于计算机视觉和机器学习。理论上,在某些条件下,该问题可以通过自然的凸松弛,即压缩主成分追踪(CPCP)在多项式时间内解决。然而,所有现有的可证明算法对于CPCP都面临每迭代超线性的成本,这严重限制了它们在大规模问题中的应用。在本文中,我们提出了一种可证明、可扩展和高效的解决CPCP的方法,具有(本质上)线性每迭代成本。我们的方法结合了Frank-Wolfe和近端方法的经典思想。在每次迭代中,我们主要利用Frank-Wolfe来使用秩一SVD更新低秩部分,并利用近端步骤处理稀疏项。还讨论了收敛结果和实现细节。我们通过在视觉数据上的有希望的数值实验展示了所提出方法的可扩展性。

英文摘要

Recovering matrices from compressive and grossly corrupted observations is a fundamental problem in robust statistics, with rich applications in computer vision and machine learning. In theory, under certain conditions, this problem can be solved in polynomial time via a natural convex relaxation, known as Compressive Principal Component Pursuit (CPCP). However, all existing provable algorithms for CPCP suffer from superlinear per-iteration cost, which severely limits their applicability to large scale problems. In this paper, we propose provable, scalable and efficient methods to solve CPCP with (essentially) linear per-iteration cost. Our method combines classical ideas from Frank-Wolfe and proximal methods. In each iteration, we mainly exploit Frank-Wolfe to update the low-rank component with rank-one SVD and exploit the proximal step for the sparse term. Convergence results and implementation details are also discussed. We demonstrate the scalability of the proposed approach with promising numerical experiments on visual data.
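The reason a Frank-Wolfe step only needs a rank-one SVD can be sketched as follows: over a nuclear-norm ball of radius $τ$, the linear minimization oracle for a gradient $G$ is $-τ u_1 v_1^T$, built from the leading singular vectors of $G$. The code below is a generic illustration with hypothetical names and a plain power iteration. It is not the authors' CPCP solver and it omits the proximal step for the sparse term.

```python
import numpy as np

def top_singular_pair(G, iters=200, seed=0):
    """Leading singular triple (u, sigma, v) of G by power iteration.

    Assumes G is nonzero; each iteration costs two matrix-vector products.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(G.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = G @ v
        u /= np.linalg.norm(u)
        v = G.T @ u
        v /= np.linalg.norm(v)
    return u, float(u @ G @ v), v

def frank_wolfe_nuclear_step(L, grad, tau, step):
    """One Frank-Wolfe update of L over the ball {||L||_* <= tau}."""
    u, _, v = top_singular_pair(grad)
    S = -tau * np.outer(u, v)      # minimizes <grad, S> over the ball
    return L + step * (S - L)      # convex combination stays feasible

rng = np.random.default_rng(1)
# Test gradient with a clear spectral gap so the power iteration converges fast.
G = 5.0 * np.outer(rng.standard_normal(20), rng.standard_normal(15))
G += 0.1 * rng.standard_normal((20, 15))
u, sigma, v = top_singular_pair(G)
L1 = frank_wolfe_nuclear_step(np.zeros((20, 15)), G, tau=2.0, step=1.0)
print(sigma, np.linalg.norm(L1, "nuc"))
```

The power iteration touches `G` only through matrix-vector products, which is what keeps the per-iteration cost essentially linear in the size of the data when `G` is large.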

1705.10015 2026-06-04 math.NA cs.NA math.OC stat.ML

Learning the Sparse and Low Rank PARAFAC Decomposition via the Elastic Net

通过弹性网学习稀疏和低秩PARAFAC分解

Songting Shi, Xiang Li, Arkadiusz Sitek, Quanzheng Li

AI总结 本文提出一种贝叶斯模型,通过弹性网学习存在缺失值的张量的稀疏和低秩PARAFAC分解,能鲁棒地找到真实秩和稀疏因子矩阵,并提出高效的块坐标下降算法和admax随机块坐标下降算法以解决大规模问题。

详情
AI中文摘要

本文提出了一种贝叶斯模型,通过弹性网学习存在缺失值的张量的稀疏和低秩PARAFAC分解,具有找到真实秩和稀疏因子矩阵并鲁棒于噪声的性质。我们提出了高效的块坐标下降算法和admax随机块坐标下降算法来解决这个问题,可以用于解决大规模问题。为了选择PARAFAC分解中的适当秩和稀疏性,我们通过逐步增加正则化来增加稀疏性和减少秩。当找到因子矩阵的稀疏结构后,可以固定稀疏结构,使用小正则化来减少恢复误差,并可以从解路径中选择适当的分解,其中具有足够稀疏因子矩阵和低恢复误差。我们通过模拟数据和真实数据测试了算法的性能,结果表明其具有强大能力。

英文摘要

In this article, we derive a Bayesian model for learning the sparse and low-rank PARAFAC decomposition of an observed tensor with missing values via the elastic net. The model is designed to find the true rank and sparse factor matrices and to be robust to noise. We formulate an efficient block coordinate descent algorithm and an admax stochastic block coordinate descent algorithm to solve it, which can be applied to large-scale problems. To choose the appropriate rank and sparsity in the PARAFAC decomposition, we compute a solution path by gradually increasing the regularization, which increases the sparsity and decreases the rank. Once the sparse structure of the factor matrices has been found, we can fix that structure and use a small regularization to decrease the recovery error. One can then choose from the solution path a decomposition with sufficiently sparse factor matrices and low recovery error. We test our algorithm on simulated and real data, and the results show that it is powerful.

1705.08815 2026-06-04 stat.ML cs.SY eess.SY stat.AP

Power Systems Data Fusion based on Belief Propagation

基于信念传播的电力系统数据融合

Francesco Fusco, Seshu Tirupathi, Robert Gormally

AI总结 本文提出基于概率图模型的电力系统数据融合框架,通过整合异构数据源与经典状态估计节点,扩展电网状态定义并实现高效分布式推断算法,应用于分布式太阳能能量量化。

详情
Comments
Version as accepted for publication at the 7th IEEE International Conference on Innovative Smart Grid Technologies (ISGT) Europe 2017
AI中文摘要

随着分布式资源渗透率增加和互联分布式计量设备普及,电力系统需要新的工具提供统一一致的系统视图。本文提出一种基于概率图模型的电力系统数据融合计算框架,能够结合异构数据源与经典状态估计节点及其他定制计算节点。该框架允许灵活扩展电网状态概念,超越传统支路-节点模型中的流和注入视图,并能推导出高效的自然分布式推断算法。通过基于标准IEEE 14节点测试案例的半合成模拟数值示例,提出数据融合模型在量化分布式太阳能能量方面的应用。

英文摘要

The increasing complexity of the power grid, due to higher penetration of distributed resources and the growing availability of interconnected, distributed metering devices, requires novel tools for providing a unified and consistent view of the system. A computational framework for power systems data fusion, based on probabilistic graphical models, capable of combining heterogeneous data sources with classical state estimation nodes and other customised computational nodes, is proposed. The framework allows flexible extension of the notion of grid state beyond the view of flows and injection in bus-branch models, and an efficient, naturally distributed inference algorithm can be derived. An application of the data fusion model to the quantification of distributed solar energy is proposed through numerical examples based on semi-synthetic simulations of the standard IEEE 14-bus test case.

1612.07002 2026-06-04 math.NA cs.NA stat.CO

A subset multicanonical Monte Carlo method for simulating rare failure events

一种用于模拟罕见故障事件的子集多正则(multicanonical)蒙特卡洛方法

Xinjuan Chen, Jinglai Li

AI总结 本文提出一种结合子集模拟和多正则(multicanonical)蒙特卡洛方法的算法,用于高效估计极小故障概率,提升采样效率并提供更多信息。

详情
AI中文摘要

估计工程系统的故障概率是许多工程领域中的重要问题。本文考虑故障概率极小(例如≤10^{-10})的情形,此时标准蒙特卡洛方法因所需样本数极多而不可行。为此,我们提出一种算法,结合了两种非常有效的故障概率估计方法的核心思想:子集模拟(SS)和多正则蒙特卡洛(multicanonical Monte Carlo, MMC)方法。标准MMC在每次迭代中都在输入参数的整个定义域内采样,与之不同,所提出的子集MMC算法自适应地在状态空间的子集上进行MMC模拟,从而提高了采样效率。数值算例表明,所提出的方法明显比SS和MMC方法都更高效。此外,所提出的算法可以重建感兴趣参数的完整分布函数,因而能提供比单纯的故障概率更多的信息。

英文摘要

Estimating failure probabilities of engineering systems is an important problem in many engineering fields. In this work we consider such problems where the failure probability is extremely small (e.g. $\leq 10^{-10}$). In this case, standard Monte Carlo methods are not feasible due to the extraordinarily large number of samples required. To address these problems, we propose an algorithm that combines the main ideas of two very powerful failure probability estimation approaches: the subset simulation (SS) and the multicanonical Monte Carlo (MMC) methods. Unlike the standard MMC which samples in the entire domain of the input parameter in each iteration, the proposed subset MMC algorithm adaptively performs MMC simulations in a subset of the state space and thus improves the sampling efficiency. With numerical examples we demonstrate that the proposed method is significantly more efficient than both the SS and the MMC methods. Moreover, the proposed algorithm can reconstruct the complete distribution function of the parameter of interest and thus can provide more information than just the failure probabilities of the systems.
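A back-of-the-envelope calculation shows why plain Monte Carlo is infeasible at this scale. The estimator of a probability $p$ from $n$ independent samples has relative standard deviation $\sqrt{(1-p)/(np)}$, so the sample count for a given relative accuracy grows like $1/p$. The function names and the 10% accuracy target are assumptions chosen for illustration.

```python
import math

def mc_relative_std(p, n):
    """Relative standard deviation of the plain Monte Carlo estimate of p."""
    return math.sqrt((1.0 - p) / (n * p))

def mc_samples_needed(p, rel_std):
    """Smallest n (up to rounding) for which the relative std is at most rel_std."""
    return math.ceil((1.0 - p) / (p * rel_std ** 2))

p_fail = 1e-10
n_required = mc_samples_needed(p_fail, rel_std=0.1)
# On the order of 1e12 model evaluations for 10% relative accuracy.
print(n_required, mc_relative_std(p_fail, n_required))
```

Each sample corresponds to one evaluation of the engineering model, which is why variance-reduction schemes such as those combined in the abstract are needed.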

1606.05578 2026-06-04 cs.MA cs.SY eess.SY stat.CO

Proximity Without Consensus in Online Multi-Agent Optimization

在线多智能体优化中的无共识接近

Alec Koppel, Brian M. Sadler, Alejandro Ribeiro

AI总结 本文提出一种在线多智能体优化方法,在无需共识约束的情况下,通过网络接近约束优化全局目标,证明了时间平均解的收敛性。

详情
AI中文摘要

我们考虑多智能体设置中的随机优化问题,其中网络中的智能体旨在学习最优参数,同时优先考虑本地观测的流式信息。为此,我们偏离传统去中心化优化框架,转而提出一种问题,其中每个智能体最小化全局目标同时施加网络接近约束。该方法包含在线共识优化作为特殊情况,但允许更一般的假设,即网络中存在数据异质性。为了解决这个问题,我们提出了一种受Arrow和Hurwicz启发的随机鞍点算法。该方法产生了一种处理每个节点依次接收到的观测的去中心化算法。使用拉格朗日乘数惩罚它们之间的差异,只有相邻节点交换模型信息。我们证明在恒定步长制度下,时间平均次优性和约束违反被限制在一个邻域内,该邻域半径随着迭代次数的增加而减小。因此,我们证明时间平均的原始向量收敛到最优目标,同时满足网络接近约束。我们将其应用于顺序估计相关随机场在传感器网络中的问题,以及在线源定位问题,这两个问题都展示了上述收敛结果的实证有效性。

英文摘要

We consider stochastic optimization problems in multi-agent settings, where a network of agents aims to learn parameters which are optimal in terms of a global objective, while giving preference to locally observed streaming information. To do so, we depart from the canonical decentralized optimization framework where agreement constraints are enforced, and instead formulate a problem where each agent minimizes a global objective while enforcing network proximity constraints. This formulation includes online consensus optimization as a special case, but allows for the more general hypothesis that there is data heterogeneity across the network. To solve this problem, we propose using a stochastic saddle point algorithm inspired by Arrow and Hurwicz. This method yields a decentralized algorithm for processing observations sequentially received at each node of the network. Using Lagrange multipliers to penalize the discrepancy between them, only neighboring nodes exchange model information. We establish that under a constant step-size regime the time-average suboptimality and constraint violation are contained in a neighborhood whose radius vanishes with increasing number of iterations. As a consequence, we prove that the time-average primal vectors converge to the optimal objective while satisfying the network proximity constraints. We apply this method to the problem of sequentially estimating a correlated random field in a sensor network, as well as an online source localization problem, both of which demonstrate the empirical validity of the aforementioned convergence results.

1607.01231 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization

非凸随机优化的随机拟牛顿方法

Xiao Wang, Shiqian Ma, Donald Goldfarb, Wei Liu

AI总结 本文研究非凸随机优化中的随机拟牛顿方法,提出了一种框架并证明了收敛性,分析了最坏情况下的迭代复杂度,提出了一种随机阻尼L-BFGS方法,并结合SVRG技术,展示了在非凸二分类和多分类问题中的数值结果。

详情
Comments
published in SIAM Journal on Optimization
AI中文摘要

本文研究了非凸随机优化中的随机拟牛顿方法,假设可以通过随机一阶预言机(SFO)获取目标函数梯度的噪声信息。我们为此类方法提出了一种通用框架,证明了其几乎必然收敛到驻点,并分析了最坏情况下的迭代复杂度。当随机选择的迭代点作为算法输出时,我们证明在最坏情况下,为确保梯度平方范数的期望小于给定的精度容限 $ε$,SFO调用复杂度为 $O(ε^{-2})$。我们还提出了一种具体的算法,即随机阻尼L-BFGS(SdLBFGS)方法,该方法属于所提出的框架。此外,我们将SVRG方差缩减技术纳入所提出的SdLBFGS方法中,并分析了其SFO调用复杂度。文中报告了使用SVM的非凸二分类问题以及使用神经网络的多分类问题上的数值结果。

英文摘要

In this paper we study stochastic quasi-Newton methods for nonconvex stochastic optimization, where we assume that noisy information about the gradients of the objective function is available via a stochastic first-order oracle (SFO). We propose a general framework for such methods, for which we prove almost sure convergence to stationary points and analyze its worst-case iteration complexity. When a randomly chosen iterate is returned as the output of such an algorithm, we prove that in the worst-case, the SFO-calls complexity is $O(ε^{-2})$ to ensure that the expectation of the squared norm of the gradient is smaller than the given accuracy tolerance $ε$. We also propose a specific algorithm, namely a stochastic damped L-BFGS (SdLBFGS) method, that falls under the proposed framework. Moreover, we incorporate the SVRG variance reduction technique into the proposed SdLBFGS method, and analyze its SFO-calls complexity. Numerical results on a nonconvex binary classification problem using SVM, and a multiclass classification problem using neural networks are reported.
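For readers unfamiliar with the SVRG variance reduction technique named above, the standard finite-sum form of its gradient estimator is sketched below. The symbols are assumptions for illustration: $F = \frac{1}{n}\sum_j f_j$ is the objective, $\tilde{x}$ is a snapshot point at which the full gradient was computed, and $i_t$ is an index drawn uniformly at random. The paper's own variant (for example, its mini-batching or how the estimator enters the SdLBFGS update) may differ.

```latex
\begin{equation*}
v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(\tilde{x}) + \frac{1}{n}\sum_{j=1}^{n}\nabla f_j(\tilde{x}),
\qquad
\mathbb{E}_{i_t}[v_t] = \nabla F(x_t).
\end{equation*}
```

The estimator is unbiased, and its variance shrinks as $x_t$ and $\tilde{x}$ both approach a stationary point, which is the property a quasi-Newton step can then exploit.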

1705.05804 2026-06-04 cs.CV cs.NA math.NA stat.ML

The Incremental Multiresolution Matrix Factorization Algorithm

增量多分辨率矩阵分解算法

Vamsi K. Ithapu, Risi Kondor, Sterling C. Johnson, Vikas Singh

AI总结 本文提出增量多分辨率矩阵分解算法,用于揭示对称矩阵的层次块结构,通过逐特征分析提升大规模矩阵处理能力,并在医学影像回归任务中验证其有效性。

详情
Comments
Computer Vision and Pattern Recognition (CVPR) 2017, 10 pages
AI中文摘要

多分辨率分析和矩阵分解是计算机视觉的基础工具。本文研究了这两个不同领域的交汇,并获得揭示对称矩阵层次块结构的技术,这对许多视觉问题的成功至关重要。我们的新算法,增量多分辨率矩阵分解,逐特征揭示此类结构,因此能有效扩展至大规模矩阵。我们描述了这种多尺度分析比直接全局分解能识别的更多。我们通过医学影像数据评估所得到的分解在回归任务中的有效性。我们还利用该分解在由流行深度网络学习的表示上进行操作,提供证据表明这些网络即使未显式训练以执行此类推断,也能推断语义关系。我们展示了该算法可作为探索工具来改进网络架构,并在视觉的众多其他设置中使用。

英文摘要

Multiresolution analysis and matrix factorization are foundational tools in computer vision. In this work, we study the interface between these two distinct topics and obtain techniques to uncover hierarchical block structure in symmetric matrices -- an important aspect in the success of many vision problems. Our new algorithm, the incremental multiresolution matrix factorization, uncovers such structure one feature at a time, and hence scales well to large matrices. We describe how this multiscale analysis goes much farther than what a direct global factorization of the data can identify. We evaluate the efficacy of the resulting factorizations for relative leveraging within regression tasks using medical imaging data. We also use the factorization on representations learned by popular deep networks, providing evidence of their ability to infer semantic relationships even when they are not explicitly trained to do so. We show that this algorithm can be used as an exploratory tool to improve the network architecture, and within numerous other settings in vision.

1604.00169 2026-06-04 stat.ML cs.SY eess.SY

A sequential Monte Carlo approach to Thompson sampling for Bayesian optimization

基于序贯蒙特卡洛方法的贝叶斯优化中的汤普森采样

Hildo Bijl, Thomas B. Schön, Jan-Willem van Wingerden, Michel Verhaegen

AI总结 本文提出一种基于序贯蒙特卡洛方法的算法,用于近似高斯过程最大值的分布,从而在连续输入空间中应用汤普森采样进行贝叶斯优化,无需优化非线性获取函数。

详情
AI中文摘要

通过高斯过程回归进行的贝叶斯优化是一种有效的优化方法,用于优化未知函数,其中每次测量都很昂贵。该方法近似目标函数并推荐新的测量点。此推荐通常通过优化给定的获取函数来选择。在足够多的测量后,会做出最大值的推荐。然而,关键认识是高斯过程的最大值不是一个确定点,而是一个具有自身分布的随机变量。此分布无法解析计算。我们的主要贡献是一种受序贯蒙特卡洛采样器启发的算法,用于近似此最大值分布。随后,通过从该分布中采样,我们使汤普森采样能够应用于具有连续输入空间的(多臂老虎机)优化问题。所有这些都在不需要优化非线性获取函数的情况下完成。实验表明,所得到的优化方法在保持累积遗憾有限方面具有竞争力。

英文摘要

Bayesian optimization through Gaussian process regression is an effective method of optimizing an unknown function for which every measurement is expensive. It approximates the objective function and then recommends a new measurement point to try out. This recommendation is usually selected by optimizing a given acquisition function. After a sufficient number of measurements, a recommendation about the maximum is made. However, a key realization is that the maximum of a Gaussian process is not a deterministic point, but a random variable with a distribution of its own. This distribution cannot be calculated analytically. Our main contribution is an algorithm, inspired by sequential Monte Carlo samplers, that approximates this maximum distribution. Subsequently, by taking samples from this distribution, we enable Thompson sampling to be applied to (armed-bandit) optimization problems with a continuous input space. All this is done without requiring the optimization of a nonlinear acquisition function. Experiments have shown that the resulting optimization method has a competitive performance at keeping the cumulative regret limited.

1308.1883 2026-06-04 stat.CO cs.NA math.NA math.PR

Nested particle filters for online parameter estimation in discrete-time state-space Markov models

嵌套粒子滤波器用于离散时间状态空间马尔可夫模型中的在线参数估计

Dan Crisan, Joaquin Miguez

AI总结 本文提出一种嵌套粒子滤波器方法,用于离散时间状态空间马尔可夫模型中的在线参数估计,通过递归结构实现高效近似,证明了在L_p范数下渐近收敛性。

详情
Comments
Just a format update compared to the previous version. This is the final manuscript accepted for publication in the Bernoulli journal
AI中文摘要

我们通过序贯蒙特卡洛方法解决状态空间动态系统固定参数后验概率分布的近似问题。所提出的方法依赖于嵌套结构,使用两层粒子滤波器近似系统静态参数和动态状态变量的后验概率测度,类似于最近的"序贯蒙特卡洛平方"(SMC$^2$)算法。然而,与SMC$^2$方案不同,所提出的技术完全递归。特别地,本文方法的递归步骤的计算复杂度随时间恒定。我们分析了通过所提出方案计算系统参数后验分布下有界实函数积分的近似。结果证明,在正则性假设下,近似误差在 $L_p$($p \ge 1$)范数下渐近消失,收敛速度与 $\frac{1}{\sqrt{N}} + \frac{1}{\sqrt{M}}$ 成比例,其中 $N$ 是参数空间中的蒙特卡洛样本数,$N\times M$ 是状态空间中的样本数。这一结果也适用于参数和状态变量联合后验分布的近似。我们讨论了SMC$^2$算法与新递归方法之间的关系,并通过简单示例和计算机模拟来说明一些理论发现。

英文摘要

We address the problem of approximating the posterior probability distribution of the fixed parameters of a state-space dynamical system using a sequential Monte Carlo method. The proposed approach relies on a nested structure that employs two layers of particle filters to approximate the posterior probability measure of the static parameters and the dynamic state variables of the system of interest, in a vein similar to the recent "sequential Monte Carlo square" (SMC$^2$) algorithm. However, unlike the SMC$^2$ scheme, the proposed technique operates in a purely recursive manner. In particular, the computational complexity of the recursive steps of the method introduced herein is constant over time. We analyse the approximation of integrals of real bounded functions with respect to the posterior distribution of the system parameters computed via the proposed scheme. As a result, we prove, under regularity assumptions, that the approximation errors vanish asymptotically in $L_p$ ($p \ge 1$) with convergence rate proportional to $\frac{1}{\sqrt{N}} + \frac{1}{\sqrt{M}}$, where $N$ is the number of Monte Carlo samples in the parameter space and $N\times M$ is the number of samples in the state space. This result also holds for the approximation of the joint posterior distribution of the parameters and the state variables. We discuss the relationship between the SMC$^2$ algorithm and the new recursive method and present a simple example in order to illustrate some of the theoretical findings with computer simulations.

1702.04077 2026-06-04 cs.LG cs.NA math.NA stat.ML

Mutual Kernel Matrix Completion

互核矩阵补全

Tsuyoshi Kato, Rachelle Rivero

AI总结 本文提出互核矩阵补全算法,通过融合数据与核矩阵补全方法,提升生物数据分类任务中缺失核矩阵的补全效果。

详情
Comments
10 pages, 4 figures
AI中文摘要

随着各种数据的大量涌入,从其中提取知识已成为数据科学家的一项有趣但繁琐的任务,特别是当数据形式异构且存在缺失信息时。许多数据补全技术已被引入,尤其是在核方法出现后。然而,在现有文献中,关于同时补全多个不完整核矩阵的研究却很少受到关注。本文提出了一种新的方法,称为互核矩阵补全(MKMC)算法,通过结合数据融合和核矩阵补全的概念,应用于生物数据集以用于分类任务。我们首先引入了一个目标函数,通过利用EM算法进行最小化,从而得到涉及的核矩阵中缺失条目的估计。补全后的核矩阵随后被结合以生成一个模型矩阵,可用于进一步改进获得的估计。我们的研究结果表明,E步和M步以闭合形式给出,使我们的算法在时间和内存方面都高效。完成补全后,补全的核矩阵用于训练SVM分类器,以测试数据点之间关系的保持程度。我们的实证结果表明,所提出的算法在保持数据点之间关系和准确恢复缺失核矩阵条目方面优于传统补全技术。目前,MKMC为多个相关不完整核矩阵的相互估计问题提供了一个有前途的解决方案。

英文摘要

With the huge influx of various data nowadays, extracting knowledge from them has become an interesting but tedious task among data scientists, particularly when the data come in heterogeneous form and have missing information. Many data completion techniques have been introduced, especially since the advent of kernel methods. However, among the many data completion techniques available in the literature, studies about mutually completing several incomplete kernel matrices have not yet been given much attention. In this paper, we present a new method, called the Mutual Kernel Matrix Completion (MKMC) algorithm, that tackles this problem of mutually inferring the missing entries of multiple kernel matrices by combining the notions of data fusion and kernel matrix completion, applied to biological data sets to be used for a classification task. We first introduce an objective function that is minimized by exploiting the EM algorithm, which in turn results in an estimate of the missing entries of the kernel matrices involved. The completed kernel matrices are then combined to produce a model matrix that can be used to further improve the obtained estimates. An interesting result of our study is that the E-step and the M-step are given in closed form, which makes our algorithm efficient in terms of time and memory. After completion, the (completed) kernel matrices are then used to train an SVM classifier to test how well the relationships among the entries are preserved. Our empirical results show that the proposed algorithm outperformed the traditional completion techniques in preserving the relationships among the data points, and in accurately recovering the missing kernel matrix entries. By far, MKMC offers a promising solution to the problem of mutual estimation of a number of relevant incomplete kernel matrices.

1608.05002 2026-06-04 math.ST econ.GN q-fin.EC stat.TH

Bayesian Posteriors For Arbitrarily Rare Events

贝叶斯后验用于任意稀有事件

Drew Fudenberg, Kevin He, Lorens Imhof

AI总结 研究贝叶斯观察者在两个极稀事件下所需数据量以正确推断其相对概率,方法基于概率分布和边界行为分析,贡献为证明数据量条件的最优性。

详情
Journal ref
Proceedings of the National Academy of Sciences 114(19):4925-4929, May 2017
AI中文摘要

我们研究当两个事件都任意稀有时,贝叶斯观察者需要多少数据才能正确推断它们的相对可能性。每个时期投掷一枚蓝色或红色骰子,两枚骰子掷出第 $1$ 面的概率分别为未知的 $p_1$ 和 $q_1$,二者可以任意小。给定满足 $p_1\ge c q_1$ 的数据生成过程,我们关注需要多少数据才能保证观察者对 $p_1$ 的贝叶斯后验均值以高概率超过其对 $q_1$ 的后验均值的 $(1-δ)c$ 倍。如果两枚骰子的先验密度在参数空间内部为正,并在边界处表现得像幂函数,则对于每个 $ε>0$,存在有限的 $N$,使得只要 $np_1\ge N$,观察者在 $n$ 个时期后就能以至少 $1-ε$ 的概率得到这样的推断。关于 $n$ 和 $p_1$ 的这一条件是最优的。如果其中一个先验密度在边界处以指数速度趋于零,则该结果可能不成立。

英文摘要

We study how much data a Bayesian observer needs to correctly infer the relative likelihoods of two events when both events are arbitrarily rare. Each period, either a blue die or a red die is tossed. The two dice land on side $1$ with unknown probabilities $p_1$ and $q_1$, which can be arbitrarily low. Given a data-generating process where $p_1\ge c q_1$, we are interested in how much data is required to guarantee that with high probability the observer's Bayesian posterior mean for $p_1$ exceeds $(1-δ)c$ times that for $q_1$. If the prior densities for the two dice are positive on the interior of the parameter space and behave like power functions at the boundary, then for every $ε>0,$ there exists a finite $N$ so that the observer obtains such an inference after $n$ periods with probability at least $1-ε$ whenever $np_1\ge N$. The condition on $n$ and $p_1$ is the best possible. The result can fail if one of the prior densities converges to zero exponentially fast at the boundary.

1705.02891 2026-06-04 stat.CO cs.LG cs.NA hep-lat math.NA stat.ML

Geometry and Dynamics for Markov Chain Monte Carlo

马尔可夫链蒙特卡洛的几何与动力学

Alessandro Barp, Francois-Xavier Briol, Anthony D. Kennedy, Mark Girolami

AI总结 本文综述了Hamiltonian Monte Carlo中使用的几何工具,为统计学家和机器学习者提供基础理解,并讨论了该领域最新进展。

详情
Comments
Submitted to "Annual Review of Statistics and Its Applications"
AI中文摘要

马尔可夫链蒙特卡洛方法已革新了数学计算,并在许多以前无法处理的模型中实现了统计推断。在此背景下,哈密顿动力学被提出作为高效构建链的方法,以高效探索概率密度。该方法源自物理和几何,并且这些联系已通过一系列作者三十年的研究被广泛研究。然而,目前用户对方法的直觉和知识与我们对这些理论基础的深入理解之间存在差距。本文的目的是为统计学家、机器学习者及其他方法使用者提供一个全面的介绍,使他们能够在仅具备基本蒙特卡洛方法知识的情况下理解这些几何工具。这将通过讨论该领域最近期的进展来补充,我们相信这些进展将对应用科学家越来越相关。

英文摘要

Markov Chain Monte Carlo methods have revolutionised mathematical computation and enabled statistical inference within many previously intractable models. In this context, Hamiltonian dynamics have been proposed as an efficient way of building chains which can explore probability densities efficiently. The method emerges from physics and geometry and these links have been extensively studied by a series of authors through the last thirty years. However, there is currently a gap between the intuitions and knowledge of users of the methodology and our deep understanding of these theoretical foundations. The aim of this review is to provide a comprehensive introduction to the geometric tools used in Hamiltonian Monte Carlo at a level accessible to statisticians, machine learners and other users of the methodology with only a basic understanding of Monte Carlo methods. This will be complemented with some discussion of the most recent advances in the field which we believe will become increasingly relevant to applied scientists.

1605.00609 2026-06-04 cs.LG cs.IT cs.NA math.IT math.NA stat.ML

Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions

高维空间中包含交互项的稀疏加法模型的学习算法

Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause

AI总结 本文提出了一种在高维空间中学习包含稀疏交互项的加法模型的算法,通过压缩感知方法有效恢复模型结构并保证误差界。

详情
Comments
To appear in Information and Inference: A Journal of the IMA. Made following changes after review process: (a) Corrected typos throughout the text. (b) Corrected choice of sampling distribution in Section 5, see eqs. (5.2), (5.3). (c) More detailed comparison with existing work in Section 8. (d) Added Section B in appendix on roots of cubic equation
AI中文摘要

一个函数$f: \mathbb{R}^d \rightarrow \mathbb{R}$是稀疏加法模型(SPAM),如果其形式为$f(\mathbf{x}) = \sum_{l \in \mathcal{S}}ϕ_{l}(x_l)$,其中$\mathcal{S} \subset [d]$,且$|\mathcal{S}| \ll d$。假设$ϕ$和$\mathcal{S}$未知,已有大量工作致力于从样本中估计$f$。本文考虑了一种广义的SPAMs,允许存在少量的二次交互项。对于某些$\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$,其中$|\mathcal{S}_1| \ll d, |\mathcal{S}_2| \ll d^2$,函数$f$现在被假设为形式:$\sum_{p \in \mathcal{S}_1}ϕ_{p} (x_p) + \sum_{(l,l^{\prime}) \in \mathcal{S}_2}ϕ_{(l,l^{\prime})} (x_l,x_{l^{\prime}})$。假设我们能够任意查询$f$的域内任意点,我们推导出高效的算法,能够以有限样本界证明恢复$\mathcal{S}_1,\mathcal{S}_2$。我们的分析涵盖了无噪声设置,即获得精确的$f$样本,也扩展到有噪声设置,其中查询被噪声污染。特别是对于有噪声设置,我们考虑了两种噪声模型:独立同分布高斯噪声和任意但有界的噪声。我们的主要方法依赖于稀疏Hessian矩阵的估计,为此我们提供了两种新的压缩感知方案。一旦$\mathcal{S}_1, \mathcal{S}_2$已知,我们展示了如何通过额外的$f$查询估计个体组件$ϕ_p$, $ϕ_{(l,l^{\prime})}$,并保证均匀误差界。最后,我们通过合成数据的模拟结果验证了我们的理论发现。

英文摘要

A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}ϕ_{l}(x_l)$ where $\mathcal{S} \subset [d]$, $|\mathcal{S}| \ll d$. Assuming $ϕ$'s, $\mathcal{S}$ to be unknown, there exists extensive work for estimating $f$ from its samples. In this work, we consider a generalized version of SPAMs, that also allows for the presence of a sparse number of second order interaction terms. For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, with $|\mathcal{S}_1| \ll d, |\mathcal{S}_2| \ll d^2$, the function $f$ is now assumed to be of the form: $\sum_{p \in \mathcal{S}_1}ϕ_{p} (x_p) + \sum_{(l,l^{\prime}) \in \mathcal{S}_2}ϕ_{(l,l^{\prime})} (x_l,x_{l^{\prime}})$. Assuming we have the freedom to query $f$ anywhere in its domain, we derive efficient algorithms that provably recover $\mathcal{S}_1,\mathcal{S}_2$ with finite sample bounds. Our analysis covers the noiseless setting where exact samples of $f$ are obtained, and also extends to the noisy setting where the queries are corrupted with noise. For the noisy setting in particular, we consider two noise models namely: i.i.d Gaussian noise and arbitrary but bounded noise. Our main methods for identification of $\mathcal{S}_2$ essentially rely on estimation of sparse Hessian matrices, for which we provide two novel compressed sensing based schemes. Once $\mathcal{S}_1, \mathcal{S}_2$ are known, we show how the individual components $ϕ_p$, $ϕ_{(l,l^{\prime})}$ can be estimated via additional queries of $f$, with uniform error bounds. Lastly, we provide simulation results on synthetic data that validate our theoretical findings.

1610.00362 2026-06-04 eess.SY cs.SY math.OC stat.ML

An Optimal Treatment Assignment Strategy to Evaluate Demand Response Effect

一种最优的需求响应效果评估治疗分配策略

Pan Li, Baosen Zhang

AI总结 本文提出一种策略性分配方法,用于评估需求响应信号对客户消费的影响,通过多变量线性模型和实验设计方法,优化估计方差,验证了在高维数据下更高效的分配策略。

详情
Comments
A shorter version appeared in Proceedings of the 2016 Allerton Conference
AI中文摘要

需求响应旨在激励电力客户在关键时段调整负荷。准确估计需求响应信号对客户用电量的影响是任何成功项目的核心。在实践中,学习这些响应并非易事,因为运营商只能发送有限数量的信号。此外,客户行为还依赖于大量的外生协变量。这两个特征导致了观测数量有限的高维推断问题。本文使用多变量线性模型对该问题进行建模,并采用实验设计方法来估计需求响应信号的影响。我们证明,广泛用于估计平均处理效应的随机分配在存在大量协变量时,在降低估计量方差方面并不高效。相比之下,我们提出了一种易于处理的算法,有策略地将需求响应信号分配给客户。该算法实现了估计方差的最优缩减,且与协变量的数量无关。结果通过合成数据上的模拟得到验证。

英文摘要

Demand response is designed to motivate electricity customers to modify their loads at critical time periods. Accurate estimation of the impact of demand response signals on customers' consumption is central to any successful program. In practice, learning these responses is nontrivial because operators can only send a limited number of signals. In addition, customer behavior also depends on a large number of exogenous covariates. These two features lead to a high-dimensional inference problem with a limited number of observations. In this paper, we formulate this problem by using a multivariate linear model and adopt an experimental design approach to estimate the impact of demand response signals. We show that randomized assignment, which is widely used to estimate the average treatment effect, is not efficient in reducing the variance of the estimator when a large number of covariates is present. In contrast, we present a tractable algorithm that strategically assigns demand response signals to customers. This algorithm achieves the optimal reduction in estimation variance, independent of the number of covariates. The results are validated by simulations on synthetic data.

1608.04773 2026-06-04 stat.ML cs.DS cs.LG cs.NA math.NA math.OC

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation

更快的主成分回归与稳定的矩阵切比雪夫逼近

Zeyuan Allen-Zhu, Yuanzhi Li

AI总结 本文提出了一种通过减少黑盒调用次数来实现主成分回归的算法,其精度为1+γ,且无需显式构造主成分,适用于大规模数据。同时,开发了稳定的矩阵切比雪夫多项式递推公式和最优多项式逼近矩阵符号函数的方法。

详情
Comments
title changed and minor revisions
AI中文摘要

我们通过将问题转化为至多$\tilde{O}(γ^{-1})$次黑盒调用的岭回归来解决主成分回归(PCR),从而在乘法精度$1+γ$内达到目标。因此,我们的算法不需要显式构造顶部主成分,适用于大规模PCR实例。相比之下,先前结果需要$\tilde{O}(γ^{-2})$次这样的黑盒调用。我们通过开发一个通用的稳定递推公式用于矩阵切比雪夫多项式,以及一个最优次数的多项式逼近矩阵符号函数来获得这一结果。我们的技术可能具有独立的兴趣,尤其是在设计迭代方法时。

英文摘要

We solve principal component regression (PCR), up to a multiplicative accuracy $1+γ$, by reducing the problem to $\tilde{O}(γ^{-1})$ black-box calls of ridge regression. Therefore, our algorithm does not require any explicit construction of the top principal components, and is suitable for large-scale PCR instances. In contrast, previous results require $\tilde{O}(γ^{-2})$ such black-box calls. We obtain this result by developing a general stable recurrence formula for matrix Chebyshev polynomials, and a degree-optimal polynomial approximation to the matrix sign function. Our techniques may be of independent interest, especially when designing iterative methods.
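As background for the matrix Chebyshev polynomials mentioned above, here is a minimal sketch of the textbook three-term recurrence $T_{k+1}(A)v = 2A\,T_k(A)v - T_{k-1}(A)v$, applied to a vector so that only matrix-vector products are needed. This is the classical recurrence, not the stable variant the paper develops; the function names `matvec` and `chebyshev_apply` are hypothetical.

```python
import math


def matvec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def chebyshev_apply(A, v, n):
    """Return T_n(A) v via T_{k+1}(A) v = 2 A T_k(A) v - T_{k-1}(A) v."""
    if n == 0:
        return list(v)
    prev, curr = list(v), matvec(A, v)  # T_0(A) v and T_1(A) v
    for _ in range(n - 1):
        a_curr = matvec(A, curr)
        prev, curr = curr, [2.0 * a - p for a, p in zip(a_curr, prev)]
    return curr


# For a diagonal A with entries in [-1, 1], T_n acts entrywise as cos(n * arccos(a)).
A = [[0.5, 0.0], [0.0, -0.3]]
result = chebyshev_apply(A, [1.0, 2.0], 5)
expected = [math.cos(5 * math.acos(0.5)), 2.0 * math.cos(5 * math.acos(-0.3))]
```

The recurrence touches `A` only through `matvec`, which is what makes polynomial methods of this type compatible with black-box solvers.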

1704.07272 2026-06-04 stat.CO cs.NA math.NA stat.ME

Advanced Multilevel Monte Carlo Methods

高级多级蒙特卡洛方法

Ajay Jasra, Kody Law, Carina Suciu

AI总结 本文探讨了在多级蒙特卡洛方法中应用高级蒙特卡洛技术的问题,分析了在无法精确采样时如何利用马尔可夫链蒙特卡洛和顺序蒙特卡洛方法优化计算效率。

详情
AI中文摘要

本文综述了高级蒙特卡洛技术在多级蒙特卡洛(MLMC)背景下的应用。MLMC是一种用于计算期望的策略,这些期望在某种意义上可能是有偏的,例如由于对相关概率律进行了离散化。MLMC方法使用一个由有偏近似构成的层次结构,这些近似逐级变得更精确,成本也更高。通过对最精确的近似使用裂项(telescoping)求和表示,与直接从该近似进行独立同分布采样相比,该方法能够降低达到给定误差水平所需的计算成本。所有这些想法最初都针对可以从层次结构中的耦合对精确采样的情形。本文考虑目前无法进行这种精确采样的情形。我们考虑文献中已有的马尔可夫链蒙特卡洛和序贯蒙特卡洛方法,并介绍了便于在这些方法中应用MLMC的不同策略。

英文摘要

This article reviews the application of advanced Monte Carlo techniques in the context of Multilevel Monte Carlo (MLMC). MLMC is a strategy employed to compute expectations which can be biased in some sense, for instance, by using the discretization of an associated probability law. The MLMC approach works with a hierarchy of biased approximations which become progressively more accurate and more expensive. Using a telescoping representation of the most accurate approximation, the method is able to reduce the computational cost for a given level of error versus i.i.d. sampling from this latter approximation. All of these ideas originated for cases where exact sampling from couples in the hierarchy is possible. This article considers the case where such exact sampling is not currently possible. We consider Markov chain Monte Carlo and sequential Monte Carlo methods which have been introduced in the literature and we describe different strategies which facilitate the application of MLMC within these methods.
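The telescoping representation referred to above can be written out as follows. The notation is assumed for illustration: $P_l$ denotes the quantity of interest computed at discretization level $l$ (with $P_{-1}\equiv 0$), $L$ is the finest level, and $N_l$ is the number of coupled sample pairs drawn at level $l$.

```latex
\begin{equation*}
\mathbb{E}[P_L] = \mathbb{E}[P_0] + \sum_{l=1}^{L}\mathbb{E}[P_l - P_{l-1}],
\qquad
\hat{Y} = \sum_{l=0}^{L}\frac{1}{N_l}\sum_{i=1}^{N_l}\bigl(P_l^{(i)} - P_{l-1}^{(i)}\bigr).
\end{equation*}
```

The cost saving comes from the coupling: when $P_l^{(i)}$ and $P_{l-1}^{(i)}$ are sampled so that their difference has small variance, few samples are needed at the expensive fine levels. The difficulty the review addresses is how to obtain such coupled pairs when exact sampling is unavailable.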

1704.07223 2026-06-04 math.NA cs.IT cs.NA math.IT stat.CO stat.ML

Entropic Trace Estimates for Log Determinants

熵迹估计用于对数行列式

Jack Fitzsimons, Diego Granziol, Kurt Cutajar, Michael Osborne, Maurizio Filippone, Stephen Roberts

AI总结 本文基于最大熵框架,利用随机迹估计的矩约束来估算对数行列式,显著提升了多种稀疏矩阵的计算效率,并加速了大规模学习方法中的推断过程。

详情
Comments
16 pages, 4 figures, 2 tables, 2 algorithms
AI中文摘要

矩阵行列式的可扩展计算一直是许多机器学习方法广泛应用的瓶颈,如行列式点过程、高斯过程、广义马尔可夫随机场、图模型等。本文在最大熵框架下,利用随机迹估计的矩约束来估计对数行列式。实验表明,该方法在多种UFL稀疏矩阵上显著优于现有方法。以一般马尔可夫随机场为例,展示了该方法如何显著加速涉及对数行列式的大型学习方法的推断过程。

英文摘要

The scalable calculation of matrix determinants has been a bottleneck to the widespread application of many machine learning methods such as determinantal point processes, Gaussian processes, generalised Markov random fields, graph models and many others. In this work, we estimate log determinants under the framework of maximum entropy, given information in the form of moment constraints from stochastic trace estimation. The estimates demonstrate a significant improvement on state-of-the-art alternative methods, as shown on a wide variety of UFL sparse matrices. By taking the example of a general Markov random field, we also demonstrate how this approach can significantly accelerate inference in large-scale learning methods involving the log determinant.
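The moment constraints from stochastic trace estimation mentioned above can be illustrated with a Hutchinson-style estimator, which approximates $\mathrm{tr}(A^k)$ by averaging $z^\top A^k z$ over random sign vectors $z$. The sketch below is a generic version under stated assumptions (Rademacher probes, a user-supplied `matvec` callable); the function name `trace_moments` is hypothetical, and the maximum entropy step that turns the moments into a log-determinant estimate is not shown.

```python
import random


def trace_moments(matvec, dim, max_power, num_probes, rng):
    """Estimate tr(A^k) for k = 1..max_power with shared Rademacher probes."""
    sums = [0.0] * max_power
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        y = z
        for k in range(max_power):
            y = matvec(y)  # y now holds A^(k+1) z
            sums[k] += sum(zi * yi for zi, yi in zip(z, y))
    return [s / num_probes for s in sums]


# Diagonal test matrix: z^T A^k z equals tr(A^k) exactly for sign vectors.
diag = [0.5, 0.25, 0.125]
moments = trace_moments(
    lambda v: [d * x for d, x in zip(diag, v)],
    dim=3, max_power=2, num_probes=10, rng=random.Random(0),
)
```

Only matrix-vector products are required, so the cost scales with the number of nonzeros of a sparse matrix rather than with a full factorization.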

1603.04577 2026-06-04 physics.data-an cs.NA math.NA math.OC physics.geo-ph stat.CO

An Ensemble 4D Seismic History Matching Framework with Sparse Representation Based on Wavelet Multiresolution Analysis

基于小波多分辨率分析的稀疏表示的集成4D地震历史匹配框架

Xiaodong Luo, Tuhin Bhakta, Morten Jakobsen, Geir Nævdal

AI总结 本文提出一种集成4D地震历史匹配框架,通过小波多分辨率分析和稀疏表示减少数据规模,提升储层表征精度。

详情
Journal ref
SPE Journal, 2017, paper number SPE-180025-PA
Comments
SPE-180025-MS, SPE Bergen One Day Seminar
AI中文摘要

在本文中,我们提出了一种用于储层表征的集成4D地震历史匹配框架。与储层工程领域现有的类似框架相比,该框架包含了一些较新的成分,包括所选地震数据的类型、针对所选地震数据的小波多分辨率分析及相关数据噪声估计,以及最近开发的迭代集合历史匹配算法的使用。通常用于历史匹配的地震数据(如声阻抗)是反演量,而反演过程中可能产生额外的不确定性。在所提出的框架中,我们避免了此类中间反演过程。此外,我们还采用基于小波的稀疏表示来减少数据规模。具体而言,我们使用从振幅随角度变化(AVA)数据中导出的截距和梯度属性,对属性数据应用多级离散小波变换(DWT),并估计所得小波系数的噪声水平。然后,我们选择超过一定阈值的小波系数,并使用迭代集合平滑器对这些主导小波系数进行历史匹配。

英文摘要

In this work we propose an ensemble 4D seismic history matching framework for reservoir characterization. Compared to similar existing frameworks in reservoir engineering community, the proposed one consists of some relatively new ingredients, in terms of the type of seismic data in choice, wavelet multiresolution analysis for the chosen seismic data and related data noise estimation, and the use of recently developed iterative ensemble history matching algorithms. Typical seismic data used for history matching, such as acoustic impedance, are inverted quantities, whereas extra uncertainties may arise during the inversion processes. In the proposed framework we avoid such intermediate inversion processes. In addition, we also adopt wavelet-based sparse representation to reduce data size. Concretely, we use intercept and gradient attributes derived from amplitude versus angle (AVA) data, apply multilevel discrete wavelet transforms (DWT) to attribute data, and estimate noise level of resulting wavelet coefficients. We then select the wavelet coefficients above a certain threshold value, and history-match these leading wavelet coefficients using an iterative ensemble smoother. (The rest of the abstract is omitted for exceeding the limit of length)

1704.06803 2026-06-04 cs.LG cs.IR cs.NA math.NA stat.ML

Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks

基于循环多图神经网络的几何矩阵补全

Federico Monti, Michael M. Bronstein, Xavier Bresson

AI总结 本文提出利用几何深度学习改进矩阵补全,结合图卷积网络和循环神经网络,学习图结构模式和非线性扩散过程,以提升推荐系统性能,参数数量与矩阵规模无关。

详情
AI中文摘要

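矩阵补全模型是推荐系统最常见的建模形式之一。近期工作表明,以图的形式引入用户/物品之间的成对关系,并在这些图上施加平滑性先验,可以提升这类技术的性能。然而,这些技术没有充分利用用户/物品图的局部平稳结构,且需要学习的参数数量与用户和物品的数量呈线性关系。我们提出一种在图上使用几何深度学习的新方法来克服这些局限。我们的矩阵补全架构结合图卷积神经网络和循环神经网络,学习有意义的统计图结构模式以及生成已知评分的非线性扩散过程。该神经网络系统所需的参数数量恒定,与矩阵规模无关。我们在合成数据集和真实数据集上应用了该方法,结果表明其优于现有最先进的技术。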

英文摘要

Matrix completion models are among the most common formulations of recommender systems. Recent works have shown a boost in the performance of these techniques when introducing the pairwise relationships between users/items in the form of graphs, and imposing smoothness priors on these graphs. However, such techniques do not fully exploit the local stationarity structures of user/item graphs, and the number of parameters to learn is linear w.r.t. the number of users and items. We propose a novel approach to overcome these limitations by using geometric deep learning on graphs. Our matrix completion architecture combines graph convolutional neural networks and recurrent neural networks to learn meaningful statistical graph-structured patterns and the non-linear diffusion process that generates the known ratings. This neural network system requires a constant number of parameters independent of the matrix size. We apply our method on both synthetic and real datasets, showing that it outperforms state-of-the-art techniques.

1504.01418 2026-06-04 stat.CO cs.NA math.NA

Precomputing Strategy for Hamiltonian Monte Carlo Method Based on Regularity in Parameter Space

基于参数空间正则性的Hamilton蒙特卡洛方法预计算策略

Cheng Zhang, Babak Shahbaba, Hongkai Zhao

AI总结 本文提出利用参数空间的正则性提升MCMC算法效率,通过预计算几何信息和插值技术加速HMC采样,适用于高维问题。

详情
AI中文摘要

Markov链蒙特卡洛(MCMC)算法在处理难以处理的概率分布的统计推断中起重要作用。近年来,Hamiltonian蒙特卡洛(HMC)和Riemannian流形HMC等算法被提出以提供高接受率的远距离提案。然而,这些算法计算成本高,尤其在大数据问题中因重复评估数据依赖的函数和统计量而受限。本文提出一种新策略,利用参数空间的光滑性提升MCMC算法效率。当需要在参数空间某点评估函数或统计量时,通过插值预计算值或先前计算值。具体而言,我们专注于利用几何信息加速概率分布探索的HMC算法。所提方法基于在HMC迭代中每个步骤前在网格集上预计算所需几何信息,然后在附近网格上进行采样。对于高维问题使用稀疏网格插值方法。计算示例测试展示了该方法的优势。

英文摘要

Markov Chain Monte Carlo (MCMC) algorithms play an important role in statistical inference problems dealing with intractable probability distributions. Recently, many MCMC algorithms such as Hamiltonian Monte Carlo (HMC) and Riemannian Manifold HMC have been proposed to provide distant proposals with high acceptance rate. These algorithms, however, tend to be computationally intensive which could limit their usefulness, especially for big data problems due to repetitive evaluations of functions and statistical quantities that depend on the data. This issue occurs in many statistic computing problems. In this paper, we propose a novel strategy that exploits smoothness (regularity) of parameter space to improve computational efficiency of MCMC algorithms. When evaluation of functions or statistical quantities are needed at a point in parameter space, interpolation from precomputed values or previous computed values is used. More specifically, we focus on Hamiltonian Monte Carlo (HMC) algorithms that use geometric information for faster exploration of probability distributions. Our proposed method is based on precomputing the required geometric information on a set of grids before running sampling information at nearby grids at each iteration of HMC. Sparse grid interpolation method is used for high dimensional problems. Tests on computational examples are shown to illustrate the advantages of our method.

1607.07837 2026-06-04 math.OC cs.DS cs.LG cs.NA math.NA stat.ML

First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate

流式k-PCA的首次高效收敛:全局、无间隙且近最优速率

Zeyuan Allen-Zhu, Yuanzhi Li

AI总结 本文研究流式PCA,提出改进的Oja算法变体Oja++,在O(dk)空间内实现全局收敛和无间隙收敛,匹配信息理论下限。

详情
Comments
REMARK: v4 adds discussions and polishes writing; v3 contains a stronger Theorem 2, a new lower bound Theorem 6, as well as new Oja++ results Theorem 4 and Theorem 5
AI中文摘要

我们研究流式主成分分析(PCA),即利用从协方差矩阵为 $Σ$ 的分布中在线抽取的向量,在 $O(dk)$ 空间内找到 $d\times d$ 隐藏矩阵 $Σ$ 的前 $k$ 个特征向量。我们为Oja算法给出了全局收敛性,该算法在实践中被广泛使用,但在 $k>1$ 时缺乏理论理解。我们还提出了一个改进的变体Oja++,其运行速度比Oja算法更快。我们的结果在对误差、特征间隙、秩 $k$ 和维度 $d$ 的依赖关系上与信息论下界相匹配,至多相差多对数因子。此外,我们的收敛速率可以做到无间隙,即与近似误差成正比,而不依赖于特征间隙。相比之下,对于一般的秩 $k$,在我们的工作之前,(1)设计任何具有高效全局收敛速率的算法是一个公开问题;(2)在 $O(dk)$ 空间内设计任何具有(即使是局部的)无间隙收敛速率的算法也是一个公开问题。

英文摘要

We study streaming principal component analysis (PCA), that is to find, in $O(dk)$ space, the top $k$ eigenvectors of a $d\times d$ hidden matrix $\bf Σ$ with online vectors drawn from covariance matrix $\bf Σ$. We provide $\textit{global}$ convergence for Oja's algorithm which is popularly used in practice but lacks theoretical understanding for $k>1$. We also provide a modified variant $\mathsf{Oja}^{++}$ that runs $\textit{even faster}$ than Oja's. Our results match the information theoretic lower bound in terms of dependency on error, on eigengap, on rank $k$, and on dimension $d$, up to poly-log factors. In addition, our convergence rate can be made gap-free, that is proportional to the approximation error and independent of the eigengap. In contrast, for general rank $k$, before our work (1) it was open to design any algorithm with efficient global convergence rate; and (2) it was open to design any algorithm with (even local) gap-free convergence rate in $O(dk)$ space.
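To make the streaming setting concrete, below is a minimal sketch of the classical rank-one Oja iteration, $w \leftarrow \mathrm{normalize}(w + \eta\, x\,(x^\top w))$, which keeps only a single $d$-dimensional vector in memory. It covers the $k=1$ case only and uses a constant step size chosen for illustration; it is not the paper's $\mathsf{Oja}^{++}$ variant, and the function name `oja_top_eigenvector` is hypothetical.

```python
import math
import random


def oja_top_eigenvector(samples, step_size, rng):
    """Rank-one Oja iteration: w <- normalize(w + eta * x * (x . w))."""
    dim = len(samples[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    for x in samples:
        proj = sum(xi * wi for xi, wi in zip(x, w))  # inner product x . w
        w = [wi + step_size * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]  # project back onto the unit sphere
    return w


# Synthetic stream whose covariance is diag(9, 0.09, 0.09); top eigenvector is e_1.
rng = random.Random(0)
scales = [3.0, 0.3, 0.3]
stream = [[s * rng.gauss(0.0, 1.0) for s in scales] for _ in range(3000)]
w = oja_top_eigenvector(stream, step_size=0.05, rng=rng)
```

For $k>1$ the usual extension replaces the vector by a $d\times k$ matrix and the normalization by an orthonormalization step, which is where the $O(dk)$ space bound comes from.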

1701.06731 2026-06-04 math.OC cs.SY eess.SY stat.ML

Weak Adaptive Submodularity and Group-Based Active Diagnosis with Applications to State Estimation with Persistent Sensor Faults

弱自适应次模性与基于群体的主动诊断及其在存在持续传感器故障的状态估计中的应用

Sze Zheng Yong, Lingyun Gao, Necmiye Ozay

AI总结 本文研究了随机状态估计中部分观测的自适应决策问题,提出弱自适应次模性概念,并展示基于群体的主动诊断的奖励函数具有该性质,实验表明自适应贪心策略在持续传感器故障的状态估计中表现优异。

详情
Comments
To appear in 2017 IEEE American Control Conference
AI中文摘要

本文考虑了部分观测下随机状态估计的自适应决策问题。首先引入弱自适应次模性概念,它是自适应次模性的一种推广,而后者已在解决具有挑战性的自适应状态估计问题中取得巨大成功。然后针对主动诊断问题,即通过主动传感进行离散状态估计,证明当奖励函数具有该性质时,自适应贪心策略具有近最优性能保证。我们进一步证明,基于群体的主动诊断(出现于医疗诊断和存在持续传感器故障的状态估计等应用中)的奖励函数也是弱自适应次模的。最后,在具有持续传感器故障的飞机电气系统状态估计实验中,观察到自适应贪心策略与穷举搜索表现相当。

英文摘要

In this paper, we consider adaptive decision-making problems for stochastic state estimation with partial observations. First, we introduce the concept of weak adaptive submodularity, a generalization of adaptive submodularity, which has found great success in solving challenging adaptive state estimation problems. Then, for the problem of active diagnosis, i.e., discrete state estimation via active sensing, we show that an adaptive greedy policy has a near-optimal performance guarantee when the reward function possesses this property. We further show that the reward function for group-based active diagnosis, which arises in applications such as medical diagnosis and state estimation with persistent sensor faults, is also weakly adaptive submodular. Finally, in experiments of state estimation for an aircraft electrical system with persistent sensor faults, we observe that an adaptive greedy policy performs equally well as an exhaustive search.

1703.03722 2026-06-04 math.NA cs.NA stat.ML

Recovery of Sparse and Low Rank Components of Matrices Using Iterative Method with Adaptive Thresholding

利用自适应阈值的迭代方法恢复矩阵的稀疏和低秩成分

Nematollah Zarmehi, Farokh Marvasti

AI总结 本文提出一种利用自适应阈值迭代方法恢复矩阵稀疏和低秩成分的算法,通过阈值操作获取低秩和稀疏成分,具有高效性和易实现性,并在非稀疏噪声应用中表现出良好的性能。

详情
AI中文摘要

本文提出了一种用于恢复矩阵稀疏和低秩成分的算法,该算法采用迭代方法结合自适应阈值。在每次迭代中,通过阈值操作获取低秩和稀疏成分。该算法运行快速且易于实现。我们将其与一种常见的快速方法进行比较,该方法通过ℓ1范数近似秩和稀疏性。我们还将其应用于一些实际应用,其中噪声并非稀疏。仿真结果表明,该方法具有良好的性能,且运行时间较低。

英文摘要

In this letter, we propose an algorithm for recovery of sparse and low rank components of matrices using an iterative method with adaptive thresholding. In each iteration, the low rank and sparse components are obtained using a thresholding operator. This algorithm is fast and can be implemented easily. We compare it with one of the most common fast methods in which the rank and sparsity are approximated by $\ell_1$ norm. We also apply it to some real applications where the noise is not so sparse. The simulation results show that it has a suitable performance with low run-time.

1605.07367 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

Riemannian stochastic variance reduced gradient on Grassmann manifold

格拉斯曼流形上的黎曼随机方差缩减梯度算法

Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra

AI总结 本文将欧几里得随机方差缩减梯度算法推广到紧流形搜索空间(R-SVRG),以格拉斯曼流形为例,利用对数映射和平行移动解决多个梯度的平均、加法和减法问题,并给出衰减步长下的全局收敛性分析和固定步长下的局部收敛率分析。

详情
AI中文摘要

随机方差缩减算法近年来在最小化大量但有限个损失函数的平均值方面变得流行。本文提出了欧几里得随机方差缩减梯度算法到紧流形搜索空间的一种新的黎曼推广(R-SVRG),并以格拉斯曼流形为例加以展开。多个梯度的平均、加法和减法是其中的关键难点,我们借助格拉斯曼流形上的对数映射和向量平行移动来解决。我们给出了所提算法在衰减步长下的全局收敛性分析,以及在某些自然假设下固定步长的局部收敛率分析。所提算法被应用于格拉斯曼流形上的多个问题,如主成分分析、低秩矩阵补全和Karcher均值计算。在所有这些情况下,所提算法都优于标准的黎曼随机梯度下降算法。

英文摘要

Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite, number of loss functions. In this paper, we propose a novel Riemannian extension of the Euclidean stochastic variance reduced gradient algorithm (R-SVRG) to a compact manifold search space. To this end, we show the developments on the Grassmann manifold. The key challenges of averaging, addition, and subtraction of multiple gradients are addressed with notions like logarithm mapping and parallel translation of vectors on the Grassmann manifold. We present a global convergence analysis of the proposed algorithm with decay step-sizes and a local convergence rate analysis under fixed step-size with some natural assumptions. The proposed algorithm is applied on a number of problems on the Grassmann manifold like principal components analysis, low-rank matrix completion, and the Karcher mean computation. In all these cases, the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm.

1610.05792 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

Big Batch SGD: Automated Inference using Adaptive Batch Sizes

大批次SGD:利用自适应批次大小进行自动化推断

Soham De, Abhay Yadav, David Jacobs, Tom Goldstein

AI总结 本文提出大批次SGD方法,通过自适应增大批次大小使梯度近似的信噪比保持近似恒定,收敛速度与经典SGD相近,不要求目标函数为凸,并支持自动学习率选择。

详情
Comments
A preliminary version of this paper appears in AISTATS 2017 (International Conference on Artificial Intelligence and Statistics)
AI中文摘要

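经典随机梯度优化方法依赖带噪声的梯度近似,随着迭代点接近解,这些近似的精度逐渐下降。所得梯度噪声大、信号小,因而难以用于自适应步长选择和自动停止。本文提出替代的“大批次”SGD方案,随时间自适应地增大批次大小,使梯度近似的信噪比保持近似恒定。所得方法的收敛速度与经典SGD相近,且不要求目标函数为凸。高保真度的梯度使学习率可以自动选择,且无需步长衰减。因此大批次方法易于自动化,几乎无需人工监督即可运行。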

英文摘要

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative "big batch" SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The resulting methods have similar convergence rates to classical SGD, and do not require convexity of the objective. The high fidelity gradients enable automated learning rate selection and do not require stepsize decay. Big batch methods are thus easily automated and can run with little or no oversight.

1603.01566 2026-06-04 cs.IT cs.NA math.IT math.NA stat.ML

Identifiability of an X-rank decomposition of polynomial maps

多项式映射X-秩分解的可识别性

Pierre Comon, Yang Qi, Konstantin Usevich

AI总结 研究多项式分解模型的可识别性,证明其为X-秩分解的特例,并探讨其通用秩和可识别性。

详情
Comments
26 pages
AI中文摘要

本文研究了出现在系统辨识、信号处理和机器学习中的多项式分解模型。我们证明该分解是X-秩分解的特例,这是一种在代数几何中强有力的新型概念,扩展了张量CP分解。我们证明了该多项式分解模型的通用秩和最大秩以及可识别性的新结果。在本文中,我们试图使结果和基本工具对一般读者易于理解(假设没有代数几何或其前提知识)。

英文摘要

In this paper, we study a polynomial decomposition model that arises in problems of system identification, signal processing and machine learning. We show that this decomposition is a special case of the X-rank decomposition --- a powerful novel concept in algebraic geometry that generalizes the tensor CP decomposition. We prove new results on generic/maximal rank and on identifiability of a particular polynomial decomposition model. In the paper, we try to make results and basic tools accessible for general audience (assuming no knowledge of algebraic geometry or its prerequisites).

1704.01445 2026-06-04 stat.ML cs.NA math.NA stat.CO

Bayesian Inference of Log Determinants

对数行列式的贝叶斯推断

Jack Fitzsimons, Kurt Cutajar, Michael Osborne, Stephen Roberts, Maurizio Filippone

AI总结 本文提出将计算对数行列式视为贝叶斯推断问题,结合矩阵理论的界和随机迹估计的证据,以概率方式估计对数行列式及其不确定性,方法在性能上与现有方法相当并量化了预算受限的不确定性。

详情
Comments
12 pages, 3 figures
AI中文摘要

核矩阵的对数行列式出现在多种机器学习问题中,从行列式点过程和广义马尔可夫随机场到高斯过程的训练。当核矩阵大小超过几千时,精确计算这一项通常是不可行的。在概率数值学的框架下,我们将计算对数行列式的问题重新解释为贝叶斯推断问题。具体来说,我们结合矩阵理论中的界和随机迹估计得到的证据,以在给定计算预算内获得对数行列式及其相关不确定性的概率估计。除了其新颖性和理论吸引力外,我们方法的性能与现有近似对数行列式的方法相竞争,同时量化了由于预算受限的证据而导致的不确定性。

英文摘要

The log-determinant of a kernel matrix appears in a variety of machine learning problems, ranging from determinantal point processes and generalized Markov random fields, through to the training of Gaussian processes. Exact calculation of this term is often intractable when the size of the kernel matrix exceeds a few thousand. In the spirit of probabilistic numerics, we reinterpret the problem of computing the log-determinant as a Bayesian inference problem. In particular, we combine prior knowledge in the form of bounds from matrix theory and evidence derived from stochastic trace estimation to obtain probabilistic estimates for the log-determinant and its associated uncertainty within a given computational budget. Beyond its novelty and theoretic appeal, the performance of our proposal is competitive with state-of-the-art approaches to approximating the log-determinant, while also quantifying the uncertainty due to budget-constrained evidence.
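The abstract above draws evidence from stochastic trace estimation, which is relevant because $\log\det K = \operatorname{tr}(\log K)$ for a symmetric positive definite $K$. The abstract does not say which estimator the authors use. As a minimal sketch, the block below shows the standard Hutchinson estimator with Rademacher probe vectors, applied to a plain matrix `A`. The function names, the example matrix, and the probe count are illustrative assumptions, not taken from the paper.

```python
import random
from itertools import product


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def hutchinson_trace(A, probes):
    """Average of z^T A z over the supplied probe vectors z."""
    total = 0.0
    for z in probes:
        Az = matvec(A, z)
        total += sum(zi * yi for zi, yi in zip(z, Az))
    return total / len(probes)


def rademacher_probes(n, count, seed=0):
    """Random probe vectors with independent +1/-1 entries (seeded)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(count)]


# Hypothetical 3x3 symmetric example; its exact trace is 4 + 3 + 2 = 9.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]

# Monte Carlo estimate from 200 random probes.
estimate = hutchinson_trace(A, rademacher_probes(3, 200))

# Averaging over all 2^3 sign vectors removes the off-diagonal terms exactly,
# which illustrates that the estimator is unbiased.
all_signs = [list(z) for z in product((-1.0, 1.0), repeat=3)]
exact_average = hutchinson_trace(A, all_signs)
```

Each probe costs one matrix-vector product, which is why estimators of this kind suit kernel matrices too large for an exact factorization.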

1607.02538 2026-06-04 math.ST cs.NA math.NA stat.TH

A data-driven method for improving the correlation estimation in serial ensemble Kalman filters

一种用于改进序列集合卡尔曼滤波器相关性估计的数据驱动方法

Michèle De La Chevrotière, John Harlim

AI总结 本文提出一种数据驱动方法,通过线性映射改进序列集合卡尔曼滤波器中相关性估计,尤其在小规模集合时效果显著。

详情
Comments
12 figures
AI中文摘要

本文介绍了一种用于改进序列集合卡尔曼滤波器中相关性估计的数据驱动方法。该方法在每次同化循环中寻找一个线性映射,将估计不佳的样本相关性转换为改进的相关性。该映射通过离线训练过程获得,作为线性回归问题的解,该问题利用从历史同化产品中获得的适当样本相关性统计信息。在理想化的OSSE中使用洛伦兹-96模型,并针对线性和非线性观测模型的不同情况,所提出的方法提高了滤波估计,尤其是在集合大小相对于状态空间维度较小时效果尤为明显。

英文摘要

A data-driven method for improving the correlation estimation in serial ensemble Kalman filters is introduced. The method finds a linear map that transforms, at each assimilation cycle, the poorly estimated sample correlation into an improved correlation. This map is obtained from an offline training procedure without any tuning as the solution of a linear regression problem that uses appropriate sample correlation statistics obtained from historical data assimilation products. In an idealized OSSE with the Lorenz-96 model and for a range of cases of linear and nonlinear observation models, the proposed scheme improves the filter estimates, especially when the ensemble size is small relative to the dimension of the state space.

1603.05486 2026-06-04 stat.CO cs.SY eess.SY stat.ML

A flexible state space model for learning nonlinear dynamical systems

一种灵活的状态空间模型用于学习非线性动力学系统

Andreas Svensson, Thomas B. Schön

AI总结 本文提出一种基于基函数展开的非线性状态空间模型,通过学习基函数系数来建模动态系统,并利用高斯过程开发先验分布以防止过拟合,提升模型泛化能力。

详情
Journal ref
Automatica 80(2017), page 189-199
AI中文摘要

我们考虑了一种非线性状态空间模型,其中状态转移和观测函数以基函数展开形式表示。基函数展开中的系数从数据中学习。通过与高斯过程的联系,我们开发了对系数的先验分布,用于调节模型灵活性并防止数据过拟合,类似于高斯过程状态空间模型。这些先验可以视为一种正则化,有助于在不牺牲基函数展开丰富性的情况下推广数据。为了高效学习系数和其他未知参数,我们设计了一个利用最新序列蒙特卡洛方法的算法,该算法在学习上有理论保证。在经典基准和真实数据上的评估表明,我们的方法显示出有前景的结果。

英文摘要

We consider a nonlinear state-space model with the state transition and observation functions expressed as basis function expansions. The coefficients in the basis function expansions are learned from data. Using a connection to Gaussian processes we also develop priors on the coefficients, for tuning the model flexibility and to prevent overfitting to data, akin to a Gaussian process state-space model. The priors can alternatively be seen as a regularization, and help the model generalize from the data without sacrificing the richness offered by the basis function expansion. To learn the coefficients and other unknown parameters efficiently, we tailor an algorithm using state-of-the-art sequential Monte Carlo methods, which comes with theoretical guarantees on the learning. Our approach indicates promising results when evaluated on a classical benchmark as well as real data.

1703.08254 2026-06-04 eess.SY cs.SY stat.AP

Improved NN-JPDAF for Joint Multiple Target Tracking and Feature Extraction

改进的NN-JPDAF用于联合多目标跟踪与特征提取

Le Zheng, Xiaodong Wang

AI总结 本文提出改进的NN-JPDAF算法,结合稀疏表示和ADMM方法,用于密集目标环境中多目标跟踪与特征提取的联合处理。

详情
AI中文摘要

特征辅助跟踪通常能提升标准多目标跟踪算法的性能,但目标特征信号由稀疏傅里叶域信号组成,且在时域快速非线性变化,特征测量受漏检和误关联影响。本文开发了特征辅助最近邻联合概率数据关联滤波器(NN-JPDAF),利用原子范数约束表征特征信号的稀疏性,用ℓ1-范数表征误关联引起的稀疏性。通过求解半正定规划(SDP)估计快速变化的特征信号,并通过交替方向乘子法(ADMM)迭代求解。利用估计的特征信号重新过滤以估计目标运动状态,关联利用运动和特征信息。仿真结果展示了算法在雷达应用中的性能。

英文摘要

Feature aided tracking can often yield improved tracking performance over the standard multiple target tracking (MTT) algorithms with only kinematic measurements. However, in many applications, the feature signal of the targets consists of sparse Fourier-domain signals. It changes quickly and nonlinearly in the time domain, and the feature measurements are corrupted by missed detections and mis-associations. These two factors make it hard to extract the feature information to be used in MTT. In this paper, we develop a feature-aided nearest neighbour joint probabilistic data association filter (NN-JPDAF) for joint MTT and feature extraction in dense target environments. To estimate the rapidly varying feature signal from incomplete and corrupted measurements, we use the atomic norm constraint to formulate the sparsity of the feature signal and use the $\ell_1$-norm to formulate the sparsity of the corruption induced by mis-associations. Based on the sparse representation, the feature signal is estimated by solving a semidefinite program (SDP), which is convex. We also provide an iterative method for solving this SDP via the alternating direction method of multipliers (ADMM) where each iteration involves closed-form computation. With the estimated feature signal, re-filtering is performed to estimate the kinematic states of the targets, where the association makes use of both kinematic and feature information. Simulation results are presented to illustrate the performance of the proposed algorithm in a radar application.

1602.05450 2026-06-04 stat.ML cs.AI cs.MA cs.SY eess.SY

Inverse Reinforcement Learning in Swarm Systems

群体系统中的逆强化学习

Adrian Šošić, Wasiur R. KhudaBukhsh, Abdelhak M. Zoubir, Heinz Koeppl

AI总结 本文提出swarMDP框架,通过将群体特性融入去中心化部分可观测马尔可夫决策过程,将多智能体逆强化学习问题转化为单智能体问题,并提出适用于群体设置的异构学习方案,验证了该框架能生成有意义的局部奖励模型以复制全局系统动态。

详情
Comments
9 pages, 8 figures; version 2, accepted at AAMAS 2017
AI中文摘要

逆强化学习(IRL)已成为从演示数据中学习行为模型的有用工具。然而,IRL在多智能体系统中的应用仍大多未被探索。本文受自然系统集体群集行为的启发,展示了如何将IRL的原理扩展到同质的大规模问题。具体而言,我们做出了以下贡献:1)我们引入了swarMDP框架,它是去中心化部分可观测马尔可夫决策过程中带有群体特征刻画的一个子类。2)利用该框架固有的同质性,我们证明该模型中各智能体的价值函数相同,从而将所得的多智能体IRL问题归约为单智能体问题。3)为解决相应的控制问题,我们提出了一种专门针对群体设置的新型异构学习方案。在两个示例系统上的结果表明,我们的框架能够生成有意义的局部奖励模型,并据此复现观察到的全局系统动态。

英文摘要

Inverse reinforcement learning (IRL) has become a useful tool for learning behavioral models from demonstration data. However, IRL remains mostly unexplored for multi-agent systems. In this paper, we show how the principle of IRL can be extended to homogeneous large-scale problems, inspired by the collective swarming behavior of natural systems. In particular, we make the following contributions to the field: 1) We introduce the swarMDP framework, a sub-class of decentralized partially observable Markov decision processes endowed with a swarm characterization. 2) Exploiting the inherent homogeneity of this framework, we reduce the resulting multi-agent IRL problem to a single-agent one by proving that the agent-specific value functions in this model coincide. 3) To solve the corresponding control problem, we propose a novel heterogeneous learning scheme that is particularly tailored to the swarm setting. Results on two example systems demonstrate that our framework is able to produce meaningful local reward models from which we can replicate the observed global system dynamics.

1701.00573 2026-06-04 math.NA cs.LG cs.NA stat.ML

Robust method for finding sparse solutions to linear inverse problems using an L2 regularization

使用L2正则化求解稀疏解的稳健方法

Gonzalo H Otazu

AI总结 本文提出了一种基于生物启发算法的稳健方法,通过L2正则化在过完备字典中寻找稀疏解,具有对噪声的强鲁棒性。

详情
Comments
13 pages, 6 figures. Code available
AI中文摘要

我们分析了一种名为校正投影算法(CPA)的生物启发算法在以下情形中的性能:需要稀疏性约束,才能用过完备字典中的原子无歧义地重建观测信号。通过改变估计问题的几何结构,CPA利用L2正则项给出了一个二元变量的解析表达式,该变量指示某个字典原子是否存在。正则化解可通过高效的实时卡尔曼滤波类算法实现。CPA较为平滑的L2正则化使其对噪声非常鲁棒,并且当信号中存在较强的新原子时,CPA在识别已知原子方面优于其他方法。

英文摘要

We analyzed the performance of a biologically inspired algorithm called the Corrected Projections Algorithm (CPA) when a sparseness constraint is required to unambiguously reconstruct an observed signal using atoms from an overcomplete dictionary. By changing the geometry of the estimation problem, CPA gives an analytical expression for a binary variable that indicates the presence or absence of a dictionary atom using an L2 regularizer. The regularized solution can be implemented using an efficient real-time Kalman-filter type of algorithm. The smoother L2 regularization of CPA makes it very robust to noise, and CPA outperforms other methods in identifying known atoms in the presence of strong novel atoms in the signal.

1507.03955 2026-06-04 cs.NE cs.IT cs.SY eess.SY math.IT math.OC stat.AP

Robust Estimation of Self-Exciting Generalized Linear Models with Application to Neuronal Modeling

鲁棒估计自激发广义线性模型及其在神经元建模中的应用

Abbas Kazemipour, Min Wu, Behtash Babadi

AI总结 本文研究了从有限二元观测中估计自激发广义线性模型的问题,分析了ℓ1正则化最大似然和贪心估计器在典型自激发过程中的性能,探讨了非渐近情形下的采样权衡,并将压缩感知在独立同分布协变量下的结果推广到高度相互依赖的协变量。

详情
AI中文摘要

我们考虑从有限的二元观测中估计自激发广义线性模型的问题,其中过程的历史作为协变量。针对一个典型的自激发过程,我们分析了两类估计器的性能,即ℓ1正则化最大似然估计器和贪心估计器,并刻画了非渐近情形下稳定恢复所需的采样权衡。我们的结果将压缩感知中针对独立同分布协变量的线性和广义线性模型的结果推广到协变量高度相互依赖的情形。我们还提供了仿真研究,以及在小鼠外侧膝状体和雪貂视网膜神经节细胞的真实放电数据上的应用,其结果与我们的理论预测一致。

英文摘要

We consider the problem of estimating self-exciting generalized linear models from limited binary observations, where the history of the process serves as the covariate. We analyze the performance of two classes of estimators, namely the $\ell_1$-regularized maximum likelihood and greedy estimators, for a canonical self-exciting process and characterize the sampling tradeoffs required for stable recovery in the non-asymptotic regime. Our results extend those of compressed sensing for linear and generalized linear models with i.i.d. covariates to those with highly inter-dependent covariates. We further provide simulation studies as well as application to real spiking data from the mouse's lateral geniculate nucleus and the ferret's retinal ganglion cells which agree with our theoretical predictions.

1610.02080 2026-06-04 math.ST cs.NA math.NA stat.TH

On Shapley value for measuring importance of dependent inputs

关于用Shapley值衡量依赖输入重要性的研究

Art B. Owen, Clémentine Prieur

AI总结 本文论证了使用Shapley值量化随机输入变量对函数重要性的有效性,对比了基于ANOVA分解的替代方法在输入变量依赖时的理论与计算问题,展示了Shapley值在解决这些问题上的优势。

详情
AI中文摘要

本文主张使用Shapley值来量化随机输入变量对函数的重要性。基于ANOVA分解的替代方法在输入变量依赖时可能面临概念和计算问题。本文的主要目标是证明Shapley值可以消除这些概念问题。我们通过一些简单的例子来展示,Shapley值能够得出直观合理且接近闭式表达的结果。

英文摘要

This paper makes the case for using Shapley value to quantify the importance of random input variables to a function. Alternatives based on the ANOVA decomposition can run into conceptual and computational problems when the input variables are dependent. Our main goal here is to show that Shapley value removes the conceptual problems. We do this with some simple examples where Shapley value leads to intuitively reasonable nearly closed form values.
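As a small worked illustration of the idea in the abstract above, the sketch below computes exact Shapley values for a two-input example with dependent inputs. It takes $f(x) = x_1 + x_2$ with unit-variance Gaussian inputs of correlation $\rho$, and uses the value function $\mathrm{val}(u) = \mathrm{Var}(\mathbb{E}[f(x) \mid x_u])$. Both the example and this choice of value function are assumptions made for illustration, since the abstract does not spell them out. The function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(d, val):
    """Exact Shapley values of a set function `val` over players 0..d-1."""
    phi = [0.0] * d
    for j in range(d):
        others = [i for i in range(d) if i != j]
        for k in range(d):
            # Weight of a coalition of size k that does not contain player j.
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            for u in combinations(others, k):
                s = frozenset(u)
                phi[j] += weight * (val(s | {j}) - val(s))
    return phi


def make_val(rho):
    """Assumed value function Var(E[f | x_u]) for f = x1 + x2 with correlation rho."""
    def val(u):
        if len(u) == 0:
            return 0.0
        if len(u) == 1:
            # E[f | x_i] = (1 + rho) * x_i, so the variance is (1 + rho)^2.
            return (1.0 + rho) ** 2
        # Full set: Var(x1 + x2) = 2 + 2 * rho.
        return 2.0 + 2.0 * rho
    return val


phi = shapley_values(2, make_val(0.5))
```

In this symmetric example each input receives $1 + \rho$, half of the total variance $2 + 2\rho$, so the two importances sum to $\mathrm{Var}(f)$ even though the inputs are dependent.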

1703.06327 2026-06-04 stat.ML cs.DS cs.LG cs.NA math.NA

Spectrum Estimation from a Few Entries

从少量条目中估计谱

Ashish Khetan, Sewoong Oh

AI总结 本文研究从矩阵部分条目中恢复谱性质的问题,提出通过估计Schatten范数和Chebyshev逼近或Wasserstein距离匹配来高效恢复奇异值,理论分析显示其比低秩矩阵恢复需要更少样本。

详情
Comments
52 pages; 15 figures
AI中文摘要

矩阵数据的奇异值提供了数据结构、有效维数和高阶数据分析工具超参数选择的见解。然而,在协同过滤和网络分析等实际应用中,我们只能获取部分观测。本文考虑从矩阵条目采样中恢复底层矩阵谱性质的基本问题。我们特别关注直接恢复奇异值集合以及样本高效恢复谱总和函数的方法。首先估计矩阵的Schatten k-范数,然后应用Chebyshev逼近谱总和函数或在Wasserstein距离中进行矩匹配以恢复奇异值。主要技术挑战是准确估计Schatten范数。我们引入基于图中小结构计数的无偏估计器,并提供与实测性能相匹配的保证。理论分析表明,Schatten范数可以从比恢复低秩矩阵所需更少的样本中准确恢复。数值实验表明,我们显著优于使用矩阵补全方法的竞争对手方法。

英文摘要

Singular values of data in matrix form provide insight into the structure of the data, the effective dimensionality, and the choice of hyper-parameters for higher-level data analysis tools. However, in many practical applications such as collaborative filtering and network analysis, we only get a partial observation. Under such scenarios, we consider the fundamental problem of recovering spectral properties of the underlying matrix from a sampling of its entries. We are particularly interested in directly recovering the spectrum, which is the set of singular values, and also in sample-efficient approaches for recovering a spectral sum function, which is an aggregate sum of the same function applied to each of the singular values. We propose first estimating the Schatten $k$-norms of a matrix, and then applying Chebyshev approximation to the spectral sum function or applying moment matching in Wasserstein distance to recover the singular values. The main technical challenge is in accurately estimating the Schatten norms from a sampling of a matrix. We introduce a novel unbiased estimator based on counting small structures in a graph and provide guarantees that match its empirical performance. Our theoretical analysis shows that Schatten norms can be recovered accurately from a strictly smaller number of samples than is needed to recover the underlying low-rank matrix. Numerical experiments suggest that we significantly improve upon a competing approach of using matrix completion methods.

1502.00182 2026-06-04 math.NA cs.DS cs.LG cs.NA stat.ML

High Dimensional Low Rank plus Sparse Matrix Decomposition

高维低秩加稀疏矩阵分解

Mostafa Rahmani, George Atia

AI总结 本文提出一种可扩展的子空间追踪方法,将矩阵分解问题转化为子空间学习问题,利用由采样列/行构成的小型数据草图完成分解,并提出自适应采样算法以提升处理结构化数据的效率。

详情
Comments
IEEE Transactions on Signal Processing
AI中文摘要

本文关注大数据中的低秩加稀疏矩阵分解问题。传统矩阵分解算法使用全部数据提取低秩和稀疏成分,所依赖的优化问题的复杂度随数据维度增长,限制了可扩展性。此外,现有随机化方法大多依赖均匀随机采样,对于许多具有额外结构(如聚类)的真实数据矩阵而言效率很低。本文提出一种可扩展的子空间追踪方法,将分解问题转化为子空间学习问题。分解利用由采样列/行构成的小型数据草图完成。即使数据是均匀随机采样的,也可证明所需采样列/行数约为O(rμ),其中μ是相干参数,r是低秩成分的秩。此外,本文提出自适应采样算法,以解决结构化数据的列/行采样问题。我们对采用自适应采样的方法进行了分析,证明自适应采样使所需采样列/行数不随数据分布而变。所提方法便于在线实现,并给出了一种在线方案。

英文摘要

This paper is concerned with the problem of low rank plus sparse matrix decomposition for big data. Conventional algorithms for matrix decomposition use the entire data to extract the low-rank and sparse components, and are based on optimization problems with complexity that scales with the dimension of the data, which limits their scalability. Furthermore, existing randomized approaches mostly rely on uniform random sampling, which is quite inefficient for many real world data matrices that exhibit additional structures (e.g. clustering). In this paper, a scalable subspace-pursuit approach that transforms the decomposition problem to a subspace learning problem is proposed. The decomposition is carried out using a small data sketch formed from sampled columns/rows. Even when the data is sampled uniformly at random, it is shown that the sufficient number of sampled columns/rows is roughly O(rμ), where μ is the coherency parameter and r the rank of the low rank component. In addition, adaptive sampling algorithms are proposed to address the problem of column/row sampling from structured data. We provide an analysis of the proposed method with adaptive sampling and show that adaptive sampling makes the required number of sampled columns/rows invariant to the distribution of the data. The proposed approach is amenable to online implementation and an online scheme is proposed.

1607.01881 2026-06-04 stat.ME cs.NA math.NA stat.CO

Goal-oriented optimal approximations of Bayesian linear inverse problems

面向目标的贝叶斯线性反问题最优近似

Alessio Spantini, Tiangang Cui, Karen Willcox, Luis Tenorio, Youssef Marzouk

AI总结 本文提出面向目标的线性高斯反问题最优降维方法,通过低秩负更新近似后验协方差,并证明其在对称正定矩阵流形上的最优性,同时避免显式计算参数后验分布。

详情
AI中文摘要

我们提出了解决面向目标的线性高斯反问题的最优降维技术,其中感兴趣的量(QoI)是反演参数的函数。这些近似适用于大规模应用。特别地,我们研究了QoI后验协方差作为其先验协方差的低秩负更新近似,并证明了该更新在对称正定矩阵流形上的自然测地距离下的最优性。假设QoI的后验均值已知,最优结果可推广到以Kullback-Leibler散度和Hellinger距离为度量的分布最优性。我们还提出将QoI的后验均值近似为数据的低秩线性函数,并证明该近似在加权贝叶斯风险下的最优性。这两种最优近似避免显式计算参数的完整后验分布,而是聚焦于由目标反问题各组成部分(先验信息、正向模型、测量噪声和最终目标)平衡得出的方向。我们通过热传导的高维反问题示例展示了理论。

英文摘要

We propose optimal dimensionality reduction techniques for the solution of goal-oriented linear-Gaussian inverse problems, where the quantity of interest (QoI) is a function of the inversion parameters. These approximations are suitable for large-scale applications. In particular, we study the approximation of the posterior covariance of the QoI as a low-rank negative update of its prior covariance, and prove optimality of this update with respect to the natural geodesic distance on the manifold of symmetric positive definite matrices. Assuming exact knowledge of the posterior mean of the QoI, the optimality results extend to optimality in distribution with respect to the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose approximation of the posterior mean of the QoI as a low-rank linear function of the data, and prove optimality of this approximation with respect to a weighted Bayes risk. Both of these optimal approximations avoid the explicit computation of the full posterior distribution of the parameters and instead focus on directions that are well informed by the data and relevant to the QoI. These directions stem from a balance among all the components of the goal-oriented inverse problem: prior information, forward model, measurement noise, and ultimate goals. We illustrate the theory using a high-dimensional inverse problem in heat transfer.

1503.06675 2026-06-04 stat.ME cs.IT cs.NA math.IT math.NA

The Fourier Decomposition Method for nonlinear and nonstationary time series analysis

非线性和非平稳时间序列分析的傅里叶分解方法

Pushpendra Singh, Shiv Dutt Joshi, Rakesh Kumar Patney, Kaushik Saha

AI总结 本文提出傅里叶分解方法(FDM),用于分析非线性和非平稳时间序列,通过分解数据为少量傅里叶内在带函数(FIBFs),并展示其在多变量非线性和非平稳时间序列分析中的有效性。

详情
Journal ref
Proceedings of the Royal Society of London A; March 2017, Volume 473, issue 2199
Comments
14 Pages, 18 Figures
AI中文摘要

长期以来,文献中普遍认为傅里叶方法不适合非线性和非平稳数据的分析。本文提出傅里叶分解方法(FDM),并通过示例证明其在分析非线性和非平稳时间序列中的有效性。FDM将任何数据分解为少量傅里叶内在带函数(FIBFs)。FDM通过傅里叶方法本身,提供了一种具有可变振幅和频率的广义傅里叶展开。我们提出了一种基于零相位滤波器组的多变量FDM(MFDM)算法,用于分析多变量非线性和非平稳时间序列。我们还提出了一种算法来获取MFDM的截止频率。MFDM算法生成有限数量的带限多变量FIBFs(MFIBFs)。MFDM保留了多变量数据的一些内在物理特性,如尺度对齐、趋势和瞬时频率。所提出的方法在时频能分布中产生结果,揭示数据的内在结构。进行了模拟并与其他经验模分解(EMD)方法在各种模拟和实际时间序列分析中的比较,结果表明所提出的方法是分析和获取任何数据时频能表示的强大工具。

英文摘要

For many decades, there has been a general perception in the literature that Fourier methods are not suitable for the analysis of nonlinear and nonstationary data. In this paper, we propose a Fourier Decomposition Method (FDM) and demonstrate its efficacy for the analysis of nonlinear (i.e. data generated by nonlinear systems) and nonstationary time series. The proposed FDM decomposes any data into a small number of `Fourier intrinsic band functions' (FIBFs). The FDM presents a generalized Fourier expansion with variable amplitudes and frequencies of a time series by the Fourier method itself. Building on the FDM, we propose a zero-phase filter bank based multivariate FDM (MFDM) algorithm for the analysis of multivariate nonlinear and nonstationary time series. We also present an algorithm to obtain cutoff frequencies for MFDM. The MFDM algorithm generates a finite number of band-limited multivariate FIBFs (MFIBFs). The MFDM preserves some intrinsic physical properties of the multivariate data, such as scale alignment, trend and instantaneous frequency. The proposed methods produce a time-frequency-energy distribution that reveals the intrinsic structure of the data. Simulations have been carried out and comparisons made with the Empirical Mode Decomposition (EMD) methods in the analysis of various simulated as well as real-life time series, and the results show that the proposed methods are powerful tools for analyzing and obtaining the time-frequency-energy representation of any data.

1703.02899 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

基于模型的策略搜索用于多变量PID控制器的自动调优

Andreas Doerr, Duy Nguyen-Tuong, Alonso Marco, Stefan Schaal, Sebastian Trimpe

AI总结 本文提出基于模型的策略搜索框架,用于自动调优多变量PID控制器,通过数据驱动的方法解决复杂系统的控制器调优问题。

详情
Comments
Accepted final version to appear in 2017 IEEE International Conference on Robotics and Automation (ICRA)
AI中文摘要

PID控制架构在工业应用中被广泛使用。尽管其待定参数数量较少,但在实践中调优多个相互耦合的PID控制器可能十分繁琐。本文扩展了基于模型的策略搜索框架PILCO,使其仅依据在未知系统上观测到的数据自动调优多变量PID控制器。通过适当扩展系统状态,可将PID策略表述为静态状态反馈策略,从而无需更多先验知识即可把PID调优作为有限时域最优控制问题来求解。该框架被应用于在七自由度机械臂上平衡倒立摆的任务,展示了其即使在复杂的现实问题上也能快速且数据高效地学习策略。

英文摘要

PID control architectures are widely used in industrial applications. Despite their low number of open parameters, tuning multiple, coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers purely based on data observed on an otherwise unknown system. The system's state is extended appropriately to frame the PID policy as a static state feedback policy. This renders PID tuning possible as the solution of a finite horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast and data-efficient policy learning, even on complex real world problems.

1703.02428 2026-06-04 stat.ME cs.SY eess.SY stat.CO

Robust Bayesian Filtering and Smoothing Using Student's t Distribution

基于学生t分布的鲁棒贝叶斯滤波与平滑

Michael Roth, Tohid Ardeshiri, Emre Özkan, Fredrik Gustafsson

AI总结 本文提出基于学生t分布的鲁棒滤波与平滑算法,以应对厚尾过程和测量噪声,改进卡尔曼滤波在非高斯环境下的性能。

详情
AI中文摘要

在厚尾过程和测量噪声状态下进行状态估计是一个重要挑战,尤其是在具有敏捷目标和异常值污染的跟踪场景中。由于卡尔曼滤波与高斯分布密切相关,其性能会下降。本文讨论了学生t分布的应用,分析了线性状态空间模型中t噪声的精确滤波。通过中间近似步骤,得到与卡尔曼滤波和Rauch-Tung-Striebel平滑器相似但具有非线性测量依赖矩阵更新的滤波和平滑算法。讨论了所需的近似方法,并揭示了矩匹配在t密度中的不利行为。提出了一种基于最小化Kullback-Leibler散度的有利近似方法。由于其与卡尔曼滤波的关系,t滤波继承了一些属性和算法扩展。有指导性的仿真示例展示了新算法的性能和鲁棒性。

英文摘要

State estimation in heavy-tailed process and measurement noise is an important challenge that must be addressed in, e.g., tracking scenarios with agile targets and outlier-corrupted measurements. The performance of the Kalman filter (KF) can deteriorate in such applications because of the close relation to the Gaussian distribution. Therefore, this paper describes the use of Student's t distribution to develop robust, scalable, and simple filtering and smoothing algorithms. After a discussion of Student's t distribution, exact filtering in linear state-space models with t noise is analyzed. Intermediate approximation steps are used to arrive at filtering and smoothing algorithms that closely resemble the KF and the Rauch-Tung-Striebel (RTS) smoother except for a nonlinear measurement-dependent matrix update. The required approximations are discussed and an undesirable behavior of moment matching for t densities is revealed. A favorable approximation based on minimization of the Kullback-Leibler divergence is presented. Because of its relation to the KF, some properties and algorithmic extensions are inherited by the t filter. Instructive simulation examples demonstrate the performance and robustness of the novel algorithms.

1605.06432 2026-06-04 stat.ML cs.LG cs.SY eess.SY

Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data

深度变分贝叶斯滤波器:从原始数据中无监督学习状态空间模型

Maximilian Karl, Maximilian Soelch, Justin Bayer, Patrick van der Smagt

AI总结 本文提出深度变分贝叶斯滤波器,通过变分推断处理非解析性推理,实现从原始数据中无监督学习状态空间模型,提升长期预测能力。

详情
Comments
Published as a conference paper at ICLR 2017
AI中文摘要

我们介绍了深度变分贝叶斯滤波器(DVBF),一种用于无监督学习和辨识潜在马尔可夫状态空间模型的新方法。借助随机梯度变分贝叶斯的最新进展,DVBF可以通过变分推断克服难以处理的推断分布。因此,它无需领域知识即可处理具有时间和空间依赖性的高度非线性输入数据,如图像序列。我们的实验表明,允许通过状态转移进行反向传播可以强化状态空间假设,并显著提高潜在嵌入的信息含量。这也使逼真的长期预测成为可能。

英文摘要

We introduce Deep Variational Bayes Filters (DVBF), a new method for unsupervised learning and identification of latent Markovian state space models. Leveraging recent advances in Stochastic Gradient Variational Bayes, DVBF can overcome intractable inference distributions via variational inference. Thus, it can handle highly nonlinear input data with temporal and spatial dependencies such as image sequences without domain knowledge. Our experiments show that enabling backpropagation through transitions enforces state space assumptions and significantly improves information content of the latent embedding. This also enables realistic long-term prediction.

1703.00663 2026-06-04 math.NA cs.CV cs.LG cs.NA math.OC stat.ML

Introduction to Nonnegative Matrix Factorization

非负矩阵因子分解简介

Nicolas Gillis

AI总结 本文介绍非负矩阵因子分解的应用、解的几何性质与唯一性、复杂度及算法,并探讨其与多面体扩展形式的联系。

详情
Journal ref
SIAG/OPT Views and News 25 (1), pp. 7-16 (2017)
Comments
18 pages, 4 figures
AI中文摘要

本文介绍了非负矩阵因子分解(NMF)的概念,并提供了简要概述。讨论了NMF在高光谱成像中的应用、解的几何性质与唯一性、复杂度、算法及其与多面体扩展形式的联系。为将NMF置于更广泛的问题框架中,首先简要介绍了受限低秩矩阵近似问题的更一般问题类别。

英文摘要

In this paper, we introduce and provide a short overview of nonnegative matrix factorization (NMF). Several aspects of NMF are discussed, namely, the application in hyperspectral imaging, geometry and uniqueness of NMF solutions, complexity, algorithms, and its link with extended formulations of polyhedra. In order to put NMF into perspective, the more general problem class of constrained low-rank matrix approximation problems is first briefly introduced.

1609.06757 2026-06-04 stat.AP cs.SY eess.SY math.ST stat.TH

Quickest Change Detection Approach to Optimal Control in Markov Decision Processes with Model Changes

基于最快变化检测的含模型变化马尔可夫决策过程最优控制方法

Taposh Banerjee, Miao Liu, Jonathan P. How

AI总结 本文提出了一种基于最快变化检测的方法,用于解决非平稳马尔可夫决策过程中的最优控制问题,通过双阈值切换策略提升长期奖励表现。

详情
Comments
In Proceedings of American Control Conference 2017, 7 pages
AI中文摘要

非平稳马尔可夫决策过程(MDP)中的最优控制是一个具有挑战性的问题。在这样的控制问题中,目标是在过渡动态或奖励函数随时间变化时最大化长期折扣奖励。当已知变化统计信息时,标准的贝叶斯方法是将问题重新表述为部分可观测马尔可夫决策过程(POMDP)并使用近似POMDP求解器解决,这通常计算上很耗费资源。在本文中,从最快变化检测(QCD)的视角分析该问题,QCD是一组用于检测随机变量序列分布变化的工具。当前应用QCD到此类问题的方法仅通过遵循预设策略被动检测变化,而没有优化长期性能的行动选择。本文证明忽略奖励-检测权衡可能导致长期奖励显著损失,并提出双阈值切换策略来解决这一问题。还提出了一个非贝叶斯问题形式化,用于无法定义贝叶斯形式化的场景。通过在非平稳MDP任务上的数值分析检验所提双阈值策略的性能,该策略在贝叶斯和非贝叶斯设置中均优于现有最先进QCD方法。

英文摘要

Optimal control in non-stationary Markov decision processes (MDP) is a challenging problem. The aim in such a control problem is to maximize the long-term discounted reward when the transition dynamics or the reward function can change over time. When a prior knowledge of change statistics is available, the standard Bayesian approach to this problem is to reformulate it as a partially observable MDP (POMDP) and solve it using approximate POMDP solvers, which are typically computationally demanding. In this paper, the problem is analyzed through the viewpoint of quickest change detection (QCD), a set of tools for detecting a change in the distribution of a sequence of random variables. Current methods applying QCD to such problems only passively detect changes by following prescribed policies, without optimizing the choice of actions for long term performance. We demonstrate that ignoring the reward-detection trade-off can cause a significant loss in long term rewards, and propose a two threshold switching strategy to solve the issue. A non-Bayesian problem formulation is also proposed for scenarios where a Bayesian formulation cannot be defined. The performance of the proposed two threshold strategy is examined through numerical analysis on a non-stationary MDP task, and the strategy outperforms the state-of-the-art QCD methods in both Bayesian and non-Bayesian settings.

1703.00084 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Multi-Sensor Data Pattern Recognition for Multi-Target Localization: A Machine Learning Approach

多传感器数据模式识别用于多目标定位:一种机器学习方法

Kasthurirengan Suresh, Samuel Silva, Johnathan Votion, Yongcan Cao

AI总结 本文提出了一种创新的目标定位学习方法,利用聚类和SVM等算法处理多传感器数据,以提高多目标定位的准确性。

详情
Comments
submitted for conference publication
AI中文摘要

数据-目标配对是实现无人系统智能操作多目标定位的重要步骤。目标定位在许多应用中至关重要,如搜索、救援、交通管理和监控。本文旨在提出一种创新的目标位置学习方法,其中使用了多种机器学习方法,包括K均值聚类和支持向量机(SVM),以学习跨多个空间分布传感器的数据模式。为了实现不同传感器之间的准确数据关联以实现准确的目标定位,适当的数据预处理至关重要,随后应用不同的机器学习算法对不同传感器的数据进行适当分组,以实现多个目标的准确定位。通过模拟示例,对这些机器学习算法的性能进行了量化和比较。

英文摘要

Data-target pairing is an important step towards multi-target localization for the intelligent operation of unmanned systems. Target localization plays a crucial role in numerous applications, such as search and rescue missions, traffic management, and surveillance. The objective of this paper is to present an innovative target location learning approach, where several machine learning approaches, including K-means clustering and support vector machines (SVMs), are used to learn the data pattern across a list of spatially distributed sensors. To enable accurate data association across different sensors for accurate target localization, appropriate data pre-processing is essential, followed by the application of different machine learning algorithms to appropriately group data from different sensors for the accurate localization of multiple targets. Through simulation examples, the performance of these machine learning algorithms is quantified and compared.
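
To make the data-target pairing idea more concrete, here is a minimal K-means sketch that groups 2-D position readings so that each cluster can be attributed to one target. The readings, the initial centers, and the function name `kmeans` are hypothetical; the paper's actual pre-processing and feature choices are not described in the abstract.

```python
def kmeans(points, centers, iters=10):
    """Plain Lloyd's algorithm on 2-D points with caller-supplied initial centers."""
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each reading to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its assigned readings.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        centers = new_centers
    return labels, centers


# Hypothetical readings from several sensors observing two well-separated targets.
readings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(readings, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each final center can then serve as a position estimate for the target its cluster was paired with.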

1702.07834 2026-06-04 math.NA cs.LG cs.NA stat.ML

Efficient coordinate-wise leading eigenvector computation

高效的逐坐标主特征向量计算

Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro

AI总结 本文提出并分析了高效的逐坐标(coordinate-wise)方法来寻找主特征向量,每一步仅涉及向量-向量乘积。该方法具有全局收敛性,运行时间保证不逊于Lanczos方法,并在谱衰减较慢时更优。

详情
AI中文摘要

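我们开发并分析了高效的“逐坐标”方法来寻找主特征向量,其中每一步仅涉及一次向量-向量乘积。我们建立了全局收敛性,其总体运行时间保证至少与Lanczos方法一样好,并在谱衰减缓慢时优于后者。我们的方法基于将移位求逆(shift-and-invert)方法与线性回归的逐坐标算法相结合。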

英文摘要

We develop and analyze efficient "coordinate-wise" methods for finding the leading eigenvector, where each step involves only a vector-vector product. We establish global convergence with overall runtime guarantees that are at least as good as Lanczos's method and dominate it for slowly decaying spectrum. Our methods are based on combining a shift-and-invert approach with coordinate-wise algorithms for linear regression.
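
The combination described in this abstract can be sketched as a shift-and-invert power iteration whose inner linear systems are solved by cyclic coordinate descent, so that every coordinate step costs one vector-vector product. This is an illustrative toy, not the paper's algorithm: the shift is assumed to be supplied by the caller and larger than the top eigenvalue, and the function names are hypothetical.

```python
import math


def coordinate_descent_solve(M, b, sweeps=200):
    """Approximately solve M x = b for symmetric positive definite M by
    cyclic exact coordinate minimization of 0.5 * x'Mx - b'x."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # One coordinate step needs a single vector-vector product (row i of M with x).
            residual = b[i] - sum(M[i][j] * x[j] for j in range(n))
            x[i] += residual / M[i][i]
    return x


def leading_eigenvector(A, shift, iters=30):
    """Shift-and-invert power iteration: w <- (shift * I - A)^{-1} w, then normalize.
    Assumes A is symmetric and shift exceeds its largest eigenvalue."""
    n = len(A)
    M = [[(shift if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    w = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = coordinate_descent_solve(M, w)
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


A = [[2.0, 1.0], [1.0, 3.0]]
v = leading_eigenvector(A, shift=4.0)
```

Choosing the shift close to the top eigenvalue makes the outer iteration converge quickly, at the price of a harder (worse-conditioned) inner system.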

1702.06166 2026-06-04 stat.ML cs.LG cs.NA math.NA q-bio.GN q-bio.QM stat.ME

Bayesian Boolean Matrix Factorisation

贝叶斯布尔矩阵分解

Tammo Rukat, Chris C. Holmes, Michalis K. Titsias, Christopher Yau

AI总结 本文提出一种基于概率生成模型的布尔矩阵分解方法,通过Metropolised Gibbs采样实现高效后验推断,并在真实和模拟数据中优于现有方法,提升解释性与应用价值。

详情
AI中文摘要

布尔矩阵分解旨在将二进制数据矩阵分解为两个低秩二进制矩阵的近似布尔乘积:一个包含有意义的模式,另一个量化如何将观测表示为这些模式的组合。本文引入OrMachine,一种概率生成模型,推导出Metropolised Gibbs采样器以实现高效的后验推断。在真实和模拟数据上,我们的方法优于现有方法,首次提供完整的后验推断,适用于协作过滤中的假阳性控制,并提升推断模式的可解释性。所提算法可扩展至大规模数据集,如通过分析11,000个基因在130万只小鼠脑细胞中的单细胞基因表达数据,在商用硬件上实现。

英文摘要

Boolean matrix factorisation aims to decompose a binary data matrix into an approximate Boolean product of two low-rank binary matrices: one containing meaningful patterns, the other quantifying how the observations can be expressed as a combination of these patterns. We introduce the OrMachine, a probabilistic generative model for Boolean matrix factorisation, and derive a Metropolised Gibbs sampler that facilitates efficient parallel posterior inference. On real-world and simulated data, our method outperforms all currently existing approaches for Boolean matrix factorisation and completion. This is the first method to provide full posterior inference for Boolean matrix factorisation, which is relevant in applications, e.g. for controlling false positive rates in collaborative filtering, and, crucially, improves the interpretability of the inferred patterns. The proposed algorithm scales to large datasets, as we demonstrate by analysing single-cell gene expression data in 1.3 million mouse brain cells across 11 thousand genes on commodity hardware.
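
For readers unfamiliar with the Boolean product, the sketch below shows how a binary matrix is rebuilt from a code matrix and a pattern matrix using OR in place of addition. It only illustrates the deterministic product, not the OrMachine sampler; the matrices and the name `boolean_product` are made up for the example.

```python
def boolean_product(codes, patterns):
    """Boolean matrix product: out[i][j] = OR over l of (codes[i][l] AND patterns[l][j])."""
    n, latent, d = len(codes), len(patterns), len(patterns[0])
    return [
        [int(any(codes[i][l] and patterns[l][j] for l in range(latent))) for j in range(d)]
        for i in range(n)
    ]


# Two hypothetical patterns over four features.
patterns = [[1, 1, 0, 0],
            [0, 0, 1, 1]]
# Three observations: pattern 0 only, pattern 1 only, and both.
codes = [[1, 0],
         [0, 1],
         [1, 1]]
X = boolean_product(codes, patterns)
```

Unlike an ordinary matrix product, overlapping patterns never push an entry above 1, which is what keeps the reconstruction binary.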

1602.07046 2026-06-04 stat.ML cs.LG cs.NA math.NA

An Improved Gap-Dependency Analysis of the Noisy Power Method

改进的噪声幂方法的间隙依赖性分析

Maria Florina Balcan, Simon S. Du, Yining Wang, Adams Wei Yu

AI总结 本文改进了噪声幂方法对谱间隙的依赖性,通过引入中间参数q,提升了样本复杂度和噪声容忍度的界限,应用于分布式隐私PCA和内存高效流PCA。

详情
AI中文摘要

我们考虑了在机器学习和统计中广泛应用的噪声幂方法,尤其是在资源受限下的主成分分析(PCA)中。现有分析显示噪声幂方法对输入数据矩阵的连续谱间隙(σ_k-σ_{k+1})存在不满意的依赖性,这可能非常小,从而限制了算法的应用。本文提出了一种新的噪声幂方法分析,实现了样本复杂度和噪声容忍度界限的改进依赖性。具体而言,我们将对(σ_k-σ_{k+1})的依赖性改进为对(σ_k-σ_{q+1})的依赖性,其中q是一个中间算法参数,可能远大于目标秩k。我们的证明基于对两个子空间接近性的新特征化,这不同于之前工作中分析的canonical angle特征化。最后,我们将改进的界限应用于分布式隐私PCA和内存高效的流PCA,并获得了优于现有文献结果的界限。

英文摘要

We consider the noisy power method algorithm, which has wide applications in machine learning and statistics, especially those related to principal component analysis (PCA) under resource (communication, memory or privacy) constraints. Existing analysis of the noisy power method shows an unsatisfactory dependency on the "consecutive" spectral gap $(σ_k-σ_{k+1})$ of an input data matrix, which could be very small and hence limits the algorithm's applicability. In this paper, we present a new analysis of the noisy power method that achieves improved gap dependency for both sample complexity and noise tolerance bounds. More specifically, we improve the dependency on $(σ_k-σ_{k+1})$ to a dependency on $(σ_k-σ_{q+1})$, where $q$ is an intermediate algorithm parameter and could be much larger than the target rank $k$. Our proofs are built upon a novel characterization of proximity between two subspaces that differs from the canonical angle characterizations analyzed in previous works. Finally, we apply our improved bounds to distributed private PCA and memory-efficient streaming PCA and obtain bounds that are superior to existing results in the literature.
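
A minimal sketch of a noisy power iteration, assuming the usual form in which $q \ge k$ columns are multiplied by the matrix, perturbed, and re-orthonormalized at every step. The Gaussian noise model, the noise scale, the test matrix, and the helper names are assumptions for illustration and are not taken from the paper.

```python
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def orthonormalize(X):
    """Modified Gram-Schmidt on the columns of X (stored as a list of rows)."""
    basis = []
    for c in (list(col) for col in zip(*X)):
        for u in basis:
            proj = sum(ci * ui for ci, ui in zip(c, u))
            c = [ci - proj * ui for ci, ui in zip(c, u)]
        norm = sum(ci * ci for ci in c) ** 0.5
        basis.append([ci / norm for ci in c])
    return [list(row) for row in zip(*basis)]


def noisy_power_method(A, q, iters, noise_scale, rng):
    """Iterate X <- orthonormalize(A X + G) with fresh Gaussian noise G each step."""
    n = len(A)
    X = orthonormalize([[rng.gauss(0, 1) for _ in range(q)] for _ in range(n)])
    for _ in range(iters):
        Y = matmul(A, X)
        Y = [[y + noise_scale * rng.gauss(0, 1) for y in row] for row in Y]
        X = orthonormalize(Y)
    return X


# Target rank k = 1, but iterate with q = 2 columns.
A = [[5.0, 0.0, 0.0, 0.0],
     [0.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.5]]
X = noisy_power_method(A, q=2, iters=50, noise_scale=1e-3, rng=random.Random(0))
# Squared norm of the projection of the top eigenvector e1 onto span(X).
captured = sum(x * x for x in X[0])
```

In this toy the consecutive gap between the first and second eigenvalues is 1, while the gap between the first and third is 4; iterating with the extra column lets the recovery of the top direction depend on the larger gap.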

1604.02575 2026-06-04 math.PR cs.NA math.NA math.ST stat.TH

Well-posed Bayesian Inverse Problems: Priors with Exponential Tails

良好定义的贝叶斯反问题:具有指数尾部的先验

Bamdad Hosseini, Nilima Nigam

AI总结 研究在先验测度具有指数尾部时贝叶斯反问题的良好定义性,探讨凸(对数凹)概率测度的条件,保证后验测度的存在性、唯一性和稳定性,并提出后验近似方法及构造Banach空间上凸先验的通用方法。

详情
AI中文摘要

我们考虑当先验测度具有指数尾部时贝叶斯反问题的良好定义性。特别是,我们考虑包含高斯和贝索夫测度以及某些分层先验的凸(对数凹)概率测度类。我们确定了保证后验测度存在性、唯一性和数据扰动下稳定性所需的似然分布和先验测度的条件。我们还考虑了后验的相容近似,如通过投影的离散化。最后,我们提出一个在Banach空间上构造凸先验的通用方法,这在实际应用中常涉及如$ L^2$或连续函数空间。

英文摘要

We consider the well-posedness of Bayesian inverse problems when the prior measure has exponential tails. In particular, we consider the class of convex (log-concave) probability measures, which includes the Gaussian and Besov measures as well as certain classes of hierarchical priors. We identify appropriate conditions on the likelihood distribution and the prior measure which guarantee existence, uniqueness and stability of the posterior measure with respect to perturbations of the data. We also consider consistent approximations of the posterior such as discretization by projection. Finally, we present a general recipe for the construction of convex priors on Banach spaces, which will be of interest in practical applications where one often works with spaces such as $L^2$ or the continuous functions.

1503.04123 2026-06-04 stat.CO cs.NA math.NA math.PR stat.ME

Perturbation theory for Markov chains via Wasserstein distance

通过Wasserstein距离的马尔可夫链扰动理论

Daniel Rudolf, Nikolaus Schweizer

AI总结 本文研究了马尔可夫链扰动理论,通过Wasserstein距离给出灵活的分布距离界,适用于几何遍历的马尔可夫链,并指出这些界在自回归模型中一般无法再改进。

详情
Comments
31 pages, accepted at Bernoulli Journal
AI中文摘要

马尔可夫链的扰动理论探讨了转移的微小差异如何反映为分布之间的差异。我们证明了当其中一个马尔可夫链满足Wasserstein遍历性条件时,两个马尔可夫链第n步分布之间距离的有力且灵活的界。我们的工作受到近来大数据集分析中对近似马尔可夫链蒙特卡洛(MCMC)方法的兴趣的启发。通过基于Lyapunov函数的方法,我们在弱假设下为几何遍历马尔可夫链提供估计。在自回归模型中,我们的界一般无法改进。我们通过给出两种著名MCMC算法(Metropolis-Hastings算法和随机Langevin算法)的近似版本的定量估计来说明我们的理论。

英文摘要

Perturbation theory for Markov chains addresses the question of how small differences in the transitions of Markov chains are reflected in differences between their distributions. We prove powerful and flexible bounds on the distance of the $n$th step distributions of two Markov chains when one of them satisfies a Wasserstein ergodicity condition. Our work is motivated by the recent interest in approximate Markov chain Monte Carlo (MCMC) methods in the analysis of big data sets. By using an approach based on Lyapunov functions, we provide estimates for geometrically ergodic Markov chains under weak assumptions. In an autoregressive model, our bounds cannot be improved in general. We illustrate our theory by showing quantitative estimates for approximate versions of two prominent MCMC algorithms, the Metropolis-Hastings and stochastic Langevin algorithms.

1611.03993 2026-06-04 stat.ML cs.LG cs.NA math.NA

Riemannian Tensor Completion with Side Information

Riemannian张量补全与侧信息

Tengfei Zhou, Hui Qian, Zebang Shen, Congfu Xu

AI总结 本文提出一种新的Riemannian模型,整合原始模型与侧信息以提升低秩张量补全效率与准确性。

详情
AI中文摘要

通过将迭代限制在非线性流形上,最近提出的Riemannian优化方法在低秩张量补全问题中证明了其高效和有效性。然而,现有方法由于格式不匹配而无法利用容易获取的侧信息。因此,这些方法仍有改进的空间。为填补这一空白,本文提出了一种新的Riemannian模型,通过克服其不一致性来有机整合原始模型和侧信息。针对此模型,基于一种新的度量方法设计了高效的Riemannian共轭梯度下降求解器。数值实验表明,我们的求解器在不牺牲效率的情况下比最先进的方法更准确。

英文摘要

By restricting the iterates to a nonlinear manifold, the recently proposed Riemannian optimization methods prove to be both efficient and effective in low-rank tensor completion problems. However, existing methods fail to exploit the easily accessible side information, due to their format mismatch. Consequently, there is still room for improvement in such methods. To fill the gap, in this paper, a novel Riemannian model is proposed to organically integrate the original model and the side information by overcoming their inconsistency. For this particular model, an efficient Riemannian conjugate gradient descent solver is devised based on a new metric that captures the curvature of the objective. Numerical experiments suggest that our solver is more accurate than the state-of-the-art without compromising the efficiency.

1609.07532 2026-06-04 math.PR cs.NA math.NA math.ST stat.TH

Well-posed Bayesian Inverse Problems with Infinitely-Divisible and Heavy-Tailed Prior Measures

具有无限可分且厚尾先验测度的良好设定贝叶斯逆问题

Bamdad Hosseini

AI总结 本文提出基于广义伽马分布的新的先验测度,用于p∈(0,1)的ℓp正则化技术,展示其厚尾、非凸和无限可分性质,并探讨无限可分先验测度的尾行为与莱维测度的联系,定义新的函数空间先验测度,解决贝叶斯逆问题的良定性。

详情
AI中文摘要

我们提出了一种新的先验测度,与p∈(0,1)的ℓp正则化技术相关,基于广义伽马分布。我们证明所得到的先验测度是厚尾、非凸和无限可分的。受此启发,我们讨论无限可分先验测度的类别,并建立其尾行为与莱维测度尾行为之间的联系。接着,我们利用纯跳莱维过程的定律,定义新的先验测度,这些测度集中在具有有界变分的函数空间上。这些先验测度作为经典总变分先验的替代品,导致良定义的逆问题。然后,我们在一般设定下研究贝叶斯逆问题的良定性,证明良定性依赖于对数似然函数增长与先验尾行为的平衡,并将结果应用于加性噪声模型和线性问题等特殊情况。最后,我们讨论贝叶斯逆问题的一些实际方面,如其一致近似,并给出三个具体的良定义贝叶斯逆问题实例,这些实例使用厚尾或随机过程先验测度。

英文摘要

We present a new class of prior measures in connection to $\ell_p$ regularization techniques when $p \in(0,1)$ which is based on the generalized Gamma distribution. We show that the resulting prior measure is heavy-tailed, non-convex and infinitely divisible. Motivated by this observation we discuss the class of infinitely divisible prior measures and draw a connection between their tail behavior and the tail behavior of their Lévy measures. Next, we use the laws of pure jump Lévy processes in order to define new classes of prior measures that are concentrated on the space of functions with bounded variation. These priors serve as an alternative to the classic total variation prior and result in well-defined inverse problems. We then study the well-posedness of Bayesian inverse problems in a general enough setting that encompasses the above-mentioned classes of prior measures. We establish that well-posedness relies on a balance between the growth of the log-likelihood function and the tail behavior of the prior and apply our results to special cases such as additive noise models and linear problems. Finally, we discuss some of the practical aspects of Bayesian inverse problems such as their consistent approximation and present three concrete examples of well-posed Bayesian inverse problems with heavy-tailed or stochastic process prior measures.

1511.01853 2026-06-04 stat.ML cs.SY eess.SY math.OC

A Sparse Linear Model and Significance Test for Individual Consumption Prediction

一种稀疏线性模型及其个体消费预测显著性检验

Pan Li, Baosen Zhang, Yang Weng, Ram Rajagopal

AI总结 本文提出一种利用历史数据稀疏性提升个体消费预测精度的方法,通过用户间预测关系的显著性检验,实现高效且精确的预测。

详情
Comments
36 pages, 7 figures
AI中文摘要

准确预测用户消费是理解消费者灵活性和行为模式的关键,也是设计稳健高效的节能方案的重要部分。现有预测方法通常相对误差较高,可达30%以上,且难以处理个体用户之间的异质性。本文提出了一种方法,通过自适应探索历史数据中的稀疏性,并利用不同用户间的预测关系来提高个体用户的预测精度。稀疏性通过流行的最小绝对收缩和选择估计器来捕捉,而用户选择则被表述为一个最优假设检验问题,并通过协方差检验来解决。使用来自PG&E的真实世界数据,我们将所提方法与支持向量机、主成分分析结合线性回归以及随机森林等已知技术进行了广泛的模拟对比验证。结果表明,由于其线性性质,所提出的方法在操作上是高效的,并实现了最优的预测性能。

英文摘要

Accurate prediction of user consumption is a key part not only in understanding consumer flexibility and behavior patterns, but in the design of robust and efficient energy saving programs as well. Existing prediction methods usually have high relative errors that can be larger than 30% and have difficulties accounting for heterogeneity between individual users. In this paper, we propose a method to improve prediction accuracy of individual users by adaptively exploring sparsity in historical data and leveraging predictive relationships between different users. Sparsity is captured by the popular least absolute shrinkage and selection estimator, while user selection is formulated as an optimal hypothesis testing problem and solved via a covariance test. Using real-world data from PG&E, we provide extensive simulation validation of the proposed method against well-known techniques such as support vector machine, principal component analysis combined with linear regression, and random forest. The results demonstrate that our proposed methods are operationally efficient because of their linear nature, and achieve optimal prediction performance.

1503.03467 2026-06-04 math.NA cs.AI cs.NA math.ST stat.ML stat.TH

Multigrid with rough coefficients and Multiresolution operator decomposition from Hierarchical Information Games

多重网格与粗糙系数的多重分辨率算子分解来自层次信息博弈

Houman Owhadi

AI总结 本文提出了一种近线性复杂度的多重网格/多重分辨率方法,用于处理具有粗糙系数的偏微分方程,通过信息博弈理论框架实现精确的先验精度和性能估计。

详情
Journal ref
SIAM Rev. 59-1, pp. 99-149 (2017)
Comments
Presented at SIAM CSE 15. Final (published) version. http://epubs.siam.org/doi/abs/10.1137/15M1013894
AI中文摘要

我们介绍了一种近线性复杂度(几何和无网格/代数)的多重网格/多重分辨率方法,用于具有粗糙(L^∞)系数的偏微分方程,具有严格的先验准确性和性能估计。该方法通过决策/博弈理论框架发现,解决三个问题:(1)识别限制和插值算子;(2)基于线性算子图像的范数约束恢复信号;(3)基于解的层次嵌套测量的赌注。所得基本赌注形成一个层次的(确定性)基函数集合H^1_0(Ω)(赌注函数),这些函数(1)在子尺度/子带之间关于由偏微分方程能量范数诱导的标量积正交;(2)在H^1_0(Ω)中实现解空间的稀疏压缩;(3)诱导一个正交的多重分辨率算子分解。多重网格方法的操作图是一个倒置金字塔,其中赌注函数局部计算(由于指数衰减),层次化(从细到粗尺度),并分解为具有均匀有界条件数的独立线性系统。所得算法在空间(通过局部化)和带宽/子尺度(子尺度可以独立计算)上均可并行化。尽管该方法是确定性的,但其在信息博弈框架下具有自然的贝叶斯解释,且多重分辨率逼近相对于由嵌套测量层次诱导的滤波器形成一个鞅。

英文摘要

We introduce a near-linear complexity (geometric and meshless/algebraic) multigrid/multiresolution method for PDEs with rough ($L^\infty$) coefficients with rigorous a-priori accuracy and performance estimates. The method is discovered through a decision/game theory formulation of the problems of (1) identifying restriction and interpolation operators, (2) recovering a signal from incomplete measurements based on norm constraints on its image under a linear operator, and (3) gambling on the value of the solution of the PDE based on a hierarchy of nested measurements of its solution or source term. The resulting elementary gambles form a hierarchy of (deterministic) basis functions of $H^1_0(Ω)$ (gamblets) that (1) are orthogonal across subscales/subbands with respect to the scalar product induced by the energy norm of the PDE, (2) enable sparse compression of the solution space in $H^1_0(Ω)$, and (3) induce an orthogonal multiresolution operator decomposition. The operating diagram of the multigrid method is that of an inverted pyramid in which gamblets are computed locally (by virtue of their exponential decay), hierarchically (from fine to coarse scales) and the PDE is decomposed into a hierarchy of independent linear systems with uniformly bounded condition numbers. The resulting algorithm is parallelizable both in space (via localization) and in bandwidth/subscale (subscales can be computed independently from each other). Although the method is deterministic, it has a natural Bayesian interpretation under the measure of probability emerging (as a mixed strategy) from the information game formulation, and multiresolution approximations form a martingale with respect to the filtration induced by the hierarchy of nested measurements.

1702.01890 2026-06-04 eess.SY cs.SY stat.AP

Graphical Models and Belief Propagation-hierarchy for Optimal Physics-Constrained Network Flows

图模型与信念传播层级用于最优物理约束网络流

Michael Chertkov, Sidhant Misra, Marc Vuffray, Dvijotham Krishnamurty, Pascal Van Hentenryck

AI总结 本文探讨了图模型方法在带物理约束的网络流优化中的应用,通过电力系统和天然气传输案例展示核心方法与贡献。

详情
Comments
28 pages, 1 figure
AI中文摘要

本文回顾了图模型方法在带物理约束的网络流优化中的新思想和初步结果,该方法起源于统计物理、信息论、计算机科学和机器学习。我们通过电力系统和天然气传输(大陆规模)及分配(区域规模)系统的多个示例展示了通用概念。

英文摘要

In this manuscript we review new ideas and first results on application of the Graphical Models approach, originated from Statistical Physics, Information Theory, Computer Science and Machine Learning, to optimization problems of network flow type with additional constraints related to the physics of the flow. We illustrate the general concepts on a number of enabling examples from power system and natural gas transmission (continental scale) and distribution (district scale) systems.

1702.01236 2026-06-04 math.NA cs.NA stat.ME

Improved Probabilistic Principal Component Analysis for Application to Reduced Order Modeling

改进的概率主成分分析用于降阶建模应用

Indika Udagedara, Brian Helenbrook, Aaron Luttman, Jared Catenacci

AI总结 本文改进概率主成分分析用于降阶建模,通过估计训练数据中的噪声并优化基向量,提高在噪声数据下的预测准确性。

详情
Comments
21 pages, 7 figures
AI中文摘要

在先前的工作中,为随机系统构建了降阶模型(ROM),其中噪声数据被投影到主成分分析(PCA)导出的基向量上以获得无噪声数据的准确重建。该工作使用了为确定性数据设计的技术,PCA用于生成基函数,$L_2$投影用于创建重建。在本工作中,采用了概率方法。概率主成分分析(PPCA)用于生成基向量,从而允许估计训练数据中的噪声。PPCA还被改进,使得导出的基向量是正交的,并且可以估计训练数据集上基展开系数的方差。标准方法假设这些系数具有单位方差。基于PPCA的结果,应用模型选择标准来自动选择ROM的维度。在先前的工作中,使用启发式方法来选择维度。最后,使用新的统计方法进行投影步骤,其中从改进的PPCA获得的方差信息用作先验以改进投影。这在投影数据存在噪声时比$L_2$投影提供了更高的准确性。此外,投影数据的噪声统计不假设与训练数据相同,而是在投影过程中进行估计。整个方法提供了一种完全随机的方法,用于从噪声训练数据中计算ROM,确定理想的模型选择,并投影噪声测试数据,从而能够从数据中准确预测无噪声数据。

英文摘要

In our previous work, a reduced order model (ROM) for a stochastic system was made, where noisy data was projected onto principal component analysis (PCA)-derived basis vectors to obtain an accurate reconstruction of the noise-free data. That work used techniques designed for deterministic data: PCA was used for the basis function generation and $L_2$ projection was used to create the reconstructions. In this work, probabilistic approaches are used. Probabilistic PCA (PPCA) is used to generate the basis, which then allows the noise in the training data to be estimated. PPCA has also been improved so that the derived basis vectors are orthonormal and the variance of the basis expansion coefficients over the training data set can be estimated. The standard approach assumes a unit variance for these coefficients. Based on the results of the PPCA, model selection criteria are applied to automatically choose the dimension of the ROM. In our previous work, a heuristic approach was used to pick the dimension. Lastly, a new statistical approach is used for the projection step where the variance information obtained from the improved PPCA is used as a prior to improve the projection. This gives improved accuracy over $L_2$ projection when the projected data is noisy. In addition, the noise statistics for the projected data are not assumed to be the same as those of the training data, but are estimated in the projection process. The entire approach gives a fully stochastic method for computing a ROM from noisy training data, determining ideal model selection, and projecting noisy test data, thus enabling accurate predictions of noise-free data from data that is dominated by noise.

1701.08789 2026-06-04 stat.ML econ.GN q-fin.EC

Understanding food inflation in India: A Machine Learning approach

理解印度食品通胀:一种机器学习方法

Akash Malhotra, Mayank Maloo

AI总结 本文利用机器学习技术分析印度食品通胀原因,发现供应与需求失衡是主因,MSP和农场工资对食品价格影响显著,国际食品价格对国内价格影响有限。

详情
AI中文摘要

过去十年中,印度经济的快速增长受到持续高通胀,尤其是食品价格的挑战。食品通胀顽固的主要原因是供应与需求不匹配,国内农业生产未能跟上上升的需求,这归因于多种近因因素。通过梯度提升回归树(BRT)分析这些因素对食品价格变化的重要性,结果显示所有预测变量在解释食品价格变化中均较为显著,其中MSP和农场工资比其他变量更重要。国际食品价格在解释国内食品价格变化中的相关性有限。确保日益增长的印度人口在收入上升时实现粮食和营养安全,需要通过坚决的政策改革来应对。

英文摘要

Over the past decade, the stellar growth of the Indian economy has been challenged by persistently high levels of inflation, particularly in food prices. The primary reason behind this stubborn food inflation is a mismatch between supply and demand, as domestic agricultural production has failed to keep up with rising demand owing to a number of proximate factors. The relative significance of these factors in determining the change in food prices has been analysed using gradient boosted regression trees (BRT), a machine learning technique. The results from BRT indicate that all predictor variables are fairly significant in explaining the change in food prices, with MSP and farm wages being relatively more important than others. International food prices were found to have limited relevance in explaining the variation in domestic food prices. The challenge of ensuring food and nutritional security for a growing Indian population with rising incomes needs to be addressed through resolute policy reforms.

1701.08757 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Bayesian Learning of Consumer Preferences for Residential Demand Response

贝叶斯学习消费者对住宅需求响应的偏好

Mikhail V. Goubko, Sergey O. Kuznetsov, Alexey A. Neznanov, Dmitry I. Ignatov

AI总结 本文提出一种贝叶斯学习算法,用于估计消费者舒适度函数,通过历史家电使用数据实现能源节约,优于传统回归分析方法,可扩展至控制供暖和制冷系统。

详情
Journal ref
IFAC-PapersOnLine, 49(32), 2016, p. 24-29, ISSN 2405-8963
AI中文摘要

在未来几年,住宅消费者将面临实时电价,能源价格每日变化,有效的节能需要自动化——一个推荐系统,通过学习消费者的行为来了解其偏好。消费者选择家电使用场景以平衡舒适度和电费。本文提出一种贝叶斯学习算法,从家电使用历史中估计舒适度函数。在由模拟模型生成的数据集上进行数值实验时,该算法优于流行的回归分析工具。我们的方法可扩展至控制负责家庭能源费用一半的供暖和制冷系统。

英文摘要

In coming years residential consumers will face real-time electricity tariffs with energy prices varying day to day, and effective energy saving will require automation - a recommender system, which learns consumer's preferences from her actions. A consumer chooses a scenario of home appliance use to balance her comfort level and the energy bill. We propose a Bayesian learning algorithm to estimate the comfort level function from the history of appliance use. In numeric experiments with datasets generated from a simulation model of a consumer interacting with small home appliances the algorithm outperforms popular regression analysis tools. Our approach can be extended to control an air heating and conditioning system, which is responsible for up to half of a household's energy bill.

1605.05969 2026-06-04 math.OC cs.NA math.NA stat.ML

Randomized Primal-Dual Proximal Block Coordinate Updates

随机原始-对偶邻近块坐标更新

Xiang Gao, Yangyang Xu, Shuzhong Zhang

AI总结 本文提出了一种随机原始-对偶邻近块坐标更新框架,用于求解一般多块凸优化问题,证明了在目标函数耦合且带线性约束时的O(1/t)收敛率,并恢复或加强了若干现有算法的收敛性质。

详情
Comments
convergence rate results are presented in a more explicit way; numerical results are added
AI中文摘要

本文提出了一种随机原始-对偶邻近块坐标更新框架,用于求解目标函数耦合且带线性约束的一般多块凸优化模型。在仅假设凸性的情况下,我们建立了关于目标值和可行性度量的O(1/t)收敛率。该框架将若干现有算法作为特例包含在内,如用于双线性鞍点问题的原始-对偶方法(PD-S)、邻近雅可比ADMM(Prox-JADMM)以及多块凸优化的随机ADMM变体。我们的分析恢复并(或)加强了若干现有算法的收敛性质。例如,对于PD-S,我们的结果给出同阶的收敛率,而无需此前假设的约束集有界性条件;对于Prox-JADMM,新结果给出了关于目标值和可行性违反量的收敛率。众所周知,原始ADMM在块数超过两个时可能不收敛。我们的结果表明,如果采用适当的随机化过程来选择更新块,则无需假设任何强凸性即可保证多块ADMM在期望意义下的次线性收敛率。新方法还被推广到仅能获得目标函数(次)梯度的随机近似的问题,并为求解随机规划建立了O(1/√t)收敛率。

英文摘要

In this paper we propose a randomized primal-dual proximal block coordinate updating framework for a general multi-block convex optimization model with coupled objective function and linear constraints. Assuming mere convexity, we establish its $O(1/t)$ convergence rate in terms of the objective value and feasibility measure. The framework includes several existing algorithms as special cases such as a primal-dual method for bilinear saddle-point problems (PD-S), the proximal Jacobian ADMM (Prox-JADMM) and a randomized variant of the ADMM method for multi-block convex optimization. Our analysis recovers and/or strengthens the convergence properties of several existing algorithms. For example, for PD-S our result leads to the same order of convergence rate without the previously assumed boundedness condition on the constraint sets, and for Prox-JADMM the new result provides convergence rate in terms of the objective value and the feasibility violation. It is well known that the original ADMM may fail to converge when the number of blocks exceeds two. Our result shows that if an appropriate randomization procedure is invoked to select the updating blocks, then a sublinear rate of convergence in expectation can be guaranteed for multi-block ADMM, without assuming any strong convexity. The new approach is also extended to solve problems where only a stochastic approximation of the (sub-)gradient of the objective is available, and we establish an $O(1/\sqrt{t})$ convergence rate of the extended approach for solving stochastic programming.

1607.03463 2026-06-04 math.NA cs.DS cs.LG cs.NA math.OC stat.ML

LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain

LazySVD:更快的SVD分解,且无需痛苦折磨

Zeyuan Allen-Zhu, Yuanzhi Li

AI总结 本文提出LazySVD框架,改进了k-SVD的突破性方法,实现了更快的无间隙方法,以及首个加速和随机方法,在特定参数范围内优于现有算法。

详情
Comments
first circulated on May 20, 2016; this newer version improves writing
AI中文摘要

本文研究了$k$-SVD,旨在获得矩阵$A$的第一个$k$个奇异向量。最近,$k$-SVD上发现了一些突破:Musco和Musco[1]利用块Krylov方法证明了首个无间隙收敛结果,Shamir[2]发现了首个方差缩减随机方法,Bhojanapalli等人[3]利用交替最小化提供了最快的$O(\mathsf{nnz}(A) + \mathsf{poly}(1/\varepsilon))$-时间算法。本文提出了一种新的简单LazySVD框架以改进上述突破。该框架导致了一个更快的无间隙方法,优于[1],并且首个加速和随机方法,优于[2]。在$O(\mathsf{nnz}(A) + \mathsf{poly}(1/\varepsilon))$运行时间范围内,LazySVD在某些参数范围内优于[3],甚至不使用交替最小化。

英文摘要

We study $k$-SVD, that is, obtaining the first $k$ singular vectors of a matrix $A$. Recently, a few breakthroughs have been discovered on $k$-SVD: Musco and Musco [1] proved the first gap-free convergence result using the block Krylov method, Shamir [2] discovered the first variance-reduction stochastic method, and Bhojanapalli et al. [3] provided the fastest $O(\mathsf{nnz}(A) + \mathsf{poly}(1/\varepsilon))$-time algorithm using alternating minimization. In this paper, we put forward a new and simple LazySVD framework to improve the above breakthroughs. This framework leads to a faster gap-free method outperforming [1], and the first accelerated and stochastic method outperforming [2]. In the $O(\mathsf{nnz}(A) + \mathsf{poly}(1/\varepsilon))$ running-time regime, LazySVD outperforms [3] in certain parameter regimes without even using alternating minimization.
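
One simple way to obtain the first $k$ singular vectors is to find them one at a time, projecting out the directions already found before each new search. The sketch below does this with plain power iteration on a symmetric positive semidefinite matrix $M$ (for a general $A$ one would take $M = A^\top A$). Power iteration is swapped in here purely for illustration; the abstract does not say which inner solver the LazySVD framework uses, and the function names are hypothetical.

```python
def project_out(x, basis):
    """Remove from x its components along the orthonormal vectors in basis."""
    for u in basis:
        coeff = sum(xi * ui for xi, ui in zip(x, u))
        x = [xi - coeff * ui for xi, ui in zip(x, u)]
    return x


def top_k_eigenvectors(M, k, iters=200):
    """Find k leading eigenvectors of a symmetric PSD matrix M, one at a time."""
    n = len(M)
    found = []
    for _ in range(k):
        x = [1.0] * n  # fixed start vector, assumed not orthogonal to the target
        for _ in range(iters):
            # Power step on the deflated operator (I - VV') M (I - VV').
            x = project_out(x, found)
            x = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
            x = project_out(x, found)
            norm = sum(xi * xi for xi in x) ** 0.5
            x = [xi / norm for xi in x]
        found.append(x)
    return found


M = [[9.0, 0.0, 0.0],
     [0.0, 4.0, 0.0],
     [0.0, 0.0, 1.0]]
vectors = top_k_eigenvectors(M, k=2)
```

The inner power iteration slows down when consecutive eigenvalues are close, which is the kind of gap dependence that gap-free analyses aim to avoid.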

1701.05632 2026-06-04 econ.GN cs.CY cs.SI physics.soc-ph q-fin.EC stat.ML

The Internet as Quantitative Social Science Platform: Insights from a Trillion Observations

互联网作为定量社会科学平台:来自万亿观测的洞察

Klaus Ackermann, Simon D Angus, Paul A Raschky

AI总结 研究利用万亿级互联网观测数据,分析互联网扩展、睡眠时长及经济成果的关系,揭示互联网作为社会科学研究平台的潜力。

详情
Comments
40 pages, including 4 main figures, and appendix
AI中文摘要

随着互联网的大规模普及,人类首次通过单一的开放通信平台相互连接。本文基于超过万亿次(1.5×10¹²)的终端用户互联网连接观测数据,分析了互联网的扩张、睡眠时长及经济成果。我们发现,人均唯一IP数量在约每三人一个IP时达到饱和,平均需要16.1年;利用日内互联网活动特征对传统夜间睡眠观测进行扩展,得到645个城市7年间夜间睡眠时长的首个全球估计;同时估计了互联网集中度与经济成果之间的关系,发现互联网扩张对生产率的影响可正可负,强烈依赖于行业因素。据我们所知,本文首次利用整个互联网的在线/离线活动来推断社会科学洞察,展示了互联网作为社会数据科学平台的巨大潜力。

英文摘要

With the large-scale penetration of the internet, for the first time, humanity has become linked by a single, open, communications platform. Harnessing this fact, we report insights arising from a unified internet activity and location dataset of an unparalleled scope and accuracy drawn from over a trillion (1.5$\times 10^{12}$) observations of end-user internet connections, with temporal resolution of just 15 min over 2006-2012. We first apply this dataset to the expansion of the internet itself over 1,647 urban agglomerations globally. We find that unique IP per capita counts reach saturation at approximately one IP per three people, and take, on average, 16.1 years to achieve, eclipsing the estimated 100- and 60-year saturation times for steam-power and electrification respectively. Next, we use intra-diurnal internet activity features to up-scale traditional over-night sleep observations, producing the first global estimate of over-night sleep duration in 645 cities over 7 years. We find statistically significant variation between continental, national and regional sleep durations including some evidence of global sleep duration convergence. Finally, we estimate the relationship between internet concentration and economic outcomes in 411 OECD regions and find that the internet's expansion is associated with negative or positive productivity gains, depending strongly on sectoral considerations. To our knowledge, our study is the first of its kind to use online/offline activity of the entire internet to infer social science insights, demonstrating the unparalleled potential of the internet as a social data-science platform.

1612.05614 2026-06-04 math.OC cs.NA math.NA stat.CO stat.ML

An MM Algorithm for Split Feasibility Problems

关于分裂可行性问题的MM算法

Jason Xu, Eric C. Chi, Meng Yang, Kenneth Lange

AI总结 本文提出一种MM算法解决非线性分裂可行性问题,通过改进的近似函数方法,允许非线性映射,并提供收敛性保证,应用于放疗优化。

详情
Comments
31 pages, 5 figures, 1 table
AI中文摘要

经典的多集分裂可行性问题寻求一个点,该点位于有限多个闭凸定义域约束的交集中,且其在线性映射下的像也位于有限多个闭凸值域约束的交集中。分裂可行性推广了重要的逆问题,包括凸可行性、线性互补和带约束集的回归。当可行点不存在时,可以通过最小化邻近函数(proximity function)的方法得到问题的最优近似解。本文提出了邻近函数方法的一种扩展,将线性分裂可行性问题推广到允许非线性映射的情形。我们的算法基于优化-最小化(majorization-minimization,MM)原理,可以采用拟牛顿加速,并在温和假设下具有收敛性保证。此外,我们证明非线性分裂可行性问题的邻近函数中的欧几里得范数可以替换为任意Bregman散度。我们探讨了若干例子,说明非线性表述相对于线性情形的优势,重点是调强放射治疗的优化。

英文摘要

The classical multi-set split feasibility problem seeks a point in the intersection of finitely many closed convex domain constraints, whose image under a linear mapping also lies in the intersection of finitely many closed convex range constraints. Split feasibility generalizes important inverse problems including convex feasibility, linear complementarity, and regression with constraint sets. When a feasible point does not exist, solution methods that proceed by minimizing a proximity function can be used to obtain optimal approximate solutions to the problem. We present an extension of the proximity function approach that generalizes the linear split feasibility problem to allow for non-linear mappings. Our algorithm is based on the principle of majorization-minimization, is amenable to quasi-Newton acceleration, and comes complete with convergence guarantees under mild assumptions. Furthermore, we show that the Euclidean norm appearing in the proximity function of the non-linear split feasibility problem can be replaced by arbitrary Bregman divergences. We explore several examples illustrating the merits of non-linear formulations over the linear case, with a focus on optimization for intensity-modulated radiation therapy.
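
To make the proximity-function idea concrete, the block below writes one common form of the objective together with a majorization that an MM step can use. The weights $v_i, w_j$, the sets $C_i, Q_j$, the map $h$, and the projection notation $P_C$ are assumptions for illustration; the paper's exact formulation may differ.

```latex
\begin{align*}
f(x) &= \frac{1}{2}\sum_i v_i\, \operatorname{dist}(x, C_i)^2
      + \frac{1}{2}\sum_j w_j\, \operatorname{dist}\bigl(h(x), Q_j\bigr)^2, \\
g(x \mid x_k) &= \frac{1}{2}\sum_i v_i\, \bigl\| x - P_{C_i}(x_k) \bigr\|^2
      + \frac{1}{2}\sum_j w_j\, \bigl\| h(x) - P_{Q_j}\bigl(h(x_k)\bigr) \bigr\|^2 .
\end{align*}
```

Because $\operatorname{dist}(y, C) \le \|y - c\|$ for every $c \in C$, the surrogate $g(\cdot \mid x_k)$ lies above $f$ and touches it at $x_k$, so any step that decreases $g$ also decreases $f$.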

1701.02522 2026-06-04 math.NA cs.NA math.PR stat.CO

Magnus expansions and pseudospectra of Master Equations

主方程的Magnus展开与伪谱

Arieh Iserles, Shev MacNamara

AI总结 本文通过示例展示主方程研究的新方向,探讨了Magnus展开、时间变化速率和伪谱,发现精确本征值并对比了标准数值方法的误差,以异构化为例展示了化学动力学应用。

详情
AI中文摘要

通过示例展示主方程研究的新方向。Magnus展开、时间变化速率和伪谱被突出强调。精确本征值被发现并与某些情况下标准数值方法产生的大误差进行对比。异构化提供了一个运行示例和化学动力学的说明应用。我们还给出了完全不对称排除过程的简要示例。

英文摘要

New directions in research on master equations are showcased by example. Magnus expansions, time-varying rates, and pseudospectra are highlighted. Exact eigenvalues are found and contrasted with the large errors produced by standard numerical methods in some cases. Isomerisation provides a running example and an illustrative application to chemical kinetics. We also give a brief example of the totally asymmetric exclusion process.
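
As a reminder of what a Magnus expansion is, the block below writes out its first two terms for a linear system $u'(t) = A(t)\,u(t)$, such as a master equation with time-varying rates. This is the textbook form; the symbols $A$, $u$, and $\Omega$ are notation assumed here rather than taken from the paper.

```latex
\begin{align*}
u(t) &= \exp\bigl(\Omega(t)\bigr)\, u(0), \qquad
\Omega(t) = \Omega_1(t) + \Omega_2(t) + \cdots, \\
\Omega_1(t) &= \int_0^t A(s)\,\mathrm{d}s, \\
\Omega_2(t) &= \frac{1}{2}\int_0^t \int_0^{s_1} \bigl[A(s_1), A(s_2)\bigr]\,\mathrm{d}s_2\,\mathrm{d}s_1,
\qquad [X, Y] = XY - YX .
\end{align*}
```

When $A$ is constant every commutator vanishes and the expansion reduces to $u(t) = \exp(tA)\,u(0)$.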

1701.00757 2026-06-04 stat.ML cs.LG cs.NA math.NA

Clustering Signed Networks with the Geometric Mean of Laplacians

利用拉普拉斯矩阵几何均值对带符号网络进行聚类

Pedro Mercado, Francesco Tudisco, Matthias Hein

AI总结 本文提出利用拉普拉斯矩阵几何均值改进谱聚类,解决传统算术均值方法在无噪声正负网络结构中无法准确聚类的问题。

详情
Journal ref
Advances in Neural Information Processing Systems 29, pp.4421--4429, 2016
Comments
14 pages, 5 figures. Accepted in Neural Information Processing Systems (NIPS), 2016
AI中文摘要

带符号网络允许建模正负关系。我们分析了现有谱聚类方法在带符号网络中的扩展,发现现有方法在某些情况下无法恢复真实聚类,因为其采用正负部分拉普拉斯矩阵的算术均值。本文提出使用几何均值,并证明其优于现有方法。尽管矩阵几何均值计算成本高,但通过高效计算几何均值的特征向量,提出了适用于稀疏矩阵的数值方案。

英文摘要

Signed networks allow one to model positive and negative relationships. We analyze existing extensions of spectral clustering to signed networks. It turns out that existing approaches do not recover the ground truth clustering in several situations where either the positive or the negative network structures contain no noise. Our analysis shows that these problems arise as existing approaches take some form of arithmetic mean of the Laplacians of the positive and negative part. As a solution we propose to use the geometric mean of the Laplacians of the positive and negative part and show that it outperforms the existing approaches. While the geometric mean of matrices is computationally expensive, we show that eigenvectors of the geometric mean can be computed efficiently, leading to a numerical scheme for sparse matrices which is of independent interest.
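
For reference, the matrix geometric mean has a standard closed form for symmetric positive definite matrices $A$ and $B$. The block below states that textbook definition; how the paper handles Laplacians that are only positive semidefinite (for example by a small diagonal shift) is not given in the abstract, so treat the positive definite setup as an assumption.

```latex
% Geometric mean of symmetric positive definite matrices A and B.
\[
A \# B \;=\; A^{1/2}\,\bigl(A^{-1/2}\, B\, A^{-1/2}\bigr)^{1/2} A^{1/2},
\qquad A \# B = B \# A .
\]
% When A and B commute this reduces to (AB)^{1/2}, the matrix analogue of \sqrt{ab}.
```

By contrast, the arithmetic mean is simply $(A + B)/2$, which is cheap to form but is the construction the abstract identifies as the source of the clustering failures.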

1612.09158 2026-06-04 eess.SY cs.LG cs.SY stat.ML

The interplay between system identification and machine learning

系统辨识与机器学习之间的相互作用

Gianluigi Pillonetto

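AI summary: This paper examines the connection between system identification and machine learning, introduces RKHSs of dynamic systems, simplifies the derivation of stability conditions, and proves that the regularized estimator converges to the optimal predictor.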

详情
AI中文摘要

从例子中学习是科学和工程中的关键问题,涉及从有限直接和噪声样本中重建函数。在 reproducing kernel Hilbert spaces (RKHSs) 中的正则化被广泛用于解决此任务,包括强大的估计器如正则化网络。最近的成就包括证明这些基于内核的方法的统计一致性。同时,许多不同的系统辨识技术已被开发,但与机器学习的互动仍不强烈。原因之一是机器学习中通常使用的RKHS不嵌入动态系统的信息,例如BIBO稳定性。此外,在系统辨识中,机器学习中通常采用的独立数据假设在实践中并不成立。本文提供了新的结果,加强系统辨识与机器学习之间的联系。我们的起点是引入动态系统的RKHS。它们包含在系统输入定义的空间上的函数,允许将系统辨识解释为从例子中学习。在线性和非线性设置中,证明这种视角允许以相对简单的方式推导RKHS稳定性条件(即只包含BIBO稳定系统或预测器的性质),也促进了系统辨识新内核的设计。此外,我们证明在动态系统典型条件下,正则化估计器收敛于最优预测器。

英文摘要

Learning from examples is one of the key problems in science and engineering. It deals with function reconstruction from a finite set of direct and noisy samples. Regularization in reproducing kernel Hilbert spaces (RKHSs) is widely used to solve this task and includes powerful estimators such as regularization networks. Recent achievements include the proof of the statistical consistency of these kernel-based approaches. Parallel to this, many different system identification techniques have been developed, but the interaction with machine learning does not yet appear to be strong. One reason is that the RKHSs usually employed in machine learning do not embed the information available on dynamic systems, e.g. BIBO stability. In addition, in system identification the independent data assumptions routinely adopted in machine learning are never satisfied in practice. This paper provides new results which strengthen the connection between system identification and machine learning. Our starting point is the introduction of RKHSs of dynamic systems. They contain functionals over spaces defined by system inputs and allow system identification to be interpreted as learning from examples. In both linear and nonlinear settings, it is shown that this perspective permits a relatively simple derivation of conditions for RKHS stability (i.e. the property of containing only BIBO stable systems or predictors), and also facilitates the design of new kernels for system identification. Furthermore, we prove the convergence of the regularized estimator to the optimal predictor under conditions typical of dynamic systems.

1308.4757 2026-06-04 math.NA cs.LG cs.NA stat.ML

Online and stochastic Douglas-Rachford splitting method for large scale machine learning

在线和随机Douglas-Rachford分裂方法用于大规模机器学习

Ziqiang Shi, Rujie Liu

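AI summary: This paper proposes online and stochastic Douglas-Rachford splitting methods for large-scale optimization, proves their convergence in the online and stochastic settings, and validates them with numerical experiments.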

详情
AI中文摘要

在线和随机学习已成为大规模优化中的强大工具。本文将Douglas-Rachford分裂(DRs)方法推广到在线和随机设置(据我们所知,这是首次将DRs方法推广到顺序版本)。我们首先建立了批量DRs方法的O(1/√T) regret界。然后证明在线DRs分裂方法具有O(1)的regret界,而随机DRs分裂方法的收敛率为O(1/√T)。证明过程简单直观,结果和技术可为利用DRs方法进行大规模机器学习研究提供基础。所提方法的数值实验验证了在线和随机更新规则的有效性,并进一步确认了我们的regret和收敛性分析。

英文摘要

Online and stochastic learning have emerged as powerful tools in large-scale optimization. In this work, we generalize the Douglas-Rachford splitting (DRs) method for minimizing composite functions to online and stochastic settings (to the best of our knowledge, this is the first time DRs has been generalized to a sequential version). We first establish an $O(1/\sqrt{T})$ regret bound for the batch DRs method. Then we prove that the online DRs splitting method enjoys an $O(1)$ regret bound and that stochastic DRs splitting has a convergence rate of $O(1/\sqrt{T})$. The proof is simple and intuitive, and the results and technique can serve as a starting point for research on large-scale machine learning that employs the DRs method. Numerical experiments with the proposed method demonstrate the effectiveness of the online and stochastic update rules, and further confirm our regret and convergence analysis.
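
As a point of reference for the method being generalized, the following is a minimal sketch of the classical (batch) Douglas-Rachford iteration on a one-dimensional composite problem. The test problem $\tfrac12(x-a)^2+\lambda|x|$, the function names, and the step size `gamma` are illustrative assumptions; this is not the online or stochastic variant proposed in the paper.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |x|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def douglas_rachford_1d(a, lam, gamma=1.0, iters=200):
    """Minimize 0.5 * (x - a)**2 + lam * |x| with the batch DR iteration."""
    z = 0.0  # governing sequence
    x = 0.0
    for _ in range(iters):
        # prox of gamma * f, where f(x) = 0.5 * (x - a)**2
        x = (z + gamma * a) / (1.0 + gamma)
        # prox of gamma * g, where g(x) = lam * |x|, at the reflected point
        y = soft_threshold(2.0 * x - z, gamma * lam)
        # Douglas-Rachford update of the governing sequence
        z = z + y - x
    return x


if __name__ == "__main__":
    print(douglas_rachford_1d(3.0, 1.0))
```

The minimizer of this toy problem is known in closed form (soft-thresholding of `a` by `lam`), which makes it a convenient check that the iteration is wired correctly.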

1510.07476 2026-06-04 stat.ME cs.NA math.NA

Polynomial Chaos-based Bayesian Inference of K-Profile Parametrization in a General Circulation Model of the Tropical Pacific

基于多项式混沌的MIT热带太平洋通用环流模型中K-廓线参数化不确定性量化方法

Ihab Sraj, Sarah E. Zedler, Omar M. Knio, Charles S. Jackson, Ibrahim Hoteit

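AI summary: This paper presents a polynomial chaos-based Bayesian inference method for quantifying the uncertainty of the K-Profile Parametrization in the MIT General Circulation Model, combining Markov chain Monte Carlo sampling with a compressed-sensing surrogate model to keep the computation tractable.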

详情
AI中文摘要

作者提出了一种基于多项式混沌(PC)的贝叶斯推断方法,用于量化MIT通用环流模型(MITgcm)中K-廓线参数化(KPP)的不确定性。参数推断基于马尔可夫链蒙特卡洛(MCMC)方案,利用新设计的检验统计量,考虑了不同时间尺度上湍流混合结构的组件以及数据质量,并过滤参数扰动与风变化的影响。为避免在每次MCMC迭代中积分MITgcm模型带来的高昂计算成本,我们使用PC方法构建了检验统计量的替代模型。为过滤模型预测中的噪声并避免相关收敛问题,我们采用基 Pursuit-去噪(BPDN)压缩感知方法确定代表替代模型的PC系数。PC替代模型随后用于评估MCMC步骤中的检验统计量,以采样不确定参数的后验分布。结果表明,后验分布与KPP模型中两个参数(临界体积分数和梯度里兹数)的默认值有良好一致性;而其余参数的后验分布几乎无信息。

英文摘要

The authors present a Polynomial Chaos (PC)-based Bayesian inference method for quantifying the uncertainties of the K-Profile Parametrization (KPP) within the MIT General Circulation Model (MITgcm) of the tropical Pacific. The inference of the uncertain parameters is based on a Markov Chain Monte Carlo (MCMC) scheme that utilizes a newly formulated test statistic, which takes into account the different components representing the structures of turbulent mixing on both daily and seasonal timescales in addition to the data quality, and filters for the effects of parameter perturbations over those due to changes in the wind. To avoid the prohibitive computational cost of integrating the MITgcm model at each MCMC iteration, we build a surrogate model for the test statistic using the PC method. To filter out the noise in the model predictions and avoid related convergence issues, we resort to a Basis-Pursuit-DeNoising (BPDN) compressed sensing approach to determine the PC coefficients of a representative surrogate model. The PC surrogate is then used to evaluate the test statistic in the MCMC step for sampling the posterior of the uncertain parameters. Results of the posteriors indicate good agreement with the default values for two parameters of the KPP model, namely the critical bulk and gradient Richardson numbers, while the posteriors of the remaining parameters were barely informative.

1612.06451 2026-06-04 cs.NI cs.CY econ.GN q-fin.EC stat.AP

Panel dataset description for econometric analysis of the ISP-OTT relationship in the years 2008-2013

用于2008-2013年间ISP-OTT关系计量分析的面板数据集描述

Chiara Perillo, Angelos Antonopoulos, Christos Verikoukis

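AI summary: This paper provides financial and statistical data on ISPs and OTT providers for 2008-2013, intended for econometric models of their relationship, covering three large OTT providers and ten major ISPs.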

详情
Comments
34 pages
AI中文摘要

电信领域最新技术进展(如移动设备普及、5G无线通信引入等)带来了新参与者,特别是Over-the-Top(OTT)服务提供商,其服务通过现有电信网络提供。新参与者与传统主导ISP的动态变化产生了冲突,促使需要新的分析研究。然而,缺乏整合数据的数据库阻碍了研究。本文提供2008-2013年间详细数据总结,涵盖ISP收入和资本支出(CAPEX)、OTT收入、互联网普及率和国内生产总值(GDP),考虑三大OTT提供商(Facebook、Skype、WhatsApp)和十家主要ISP,覆盖七个国家。

英文摘要

The latest technological advancements in the telecommunications domain (e.g., widespread adoption of mobile devices, introduction of 5G wireless communications, etc.) have brought new stakeholders into the spotlight. More specifically, Over-the-Top (OTT) providers have recently appeared, offering their services over the existing deployed telecommunication networks. The entry of the new players has changed the dynamics in the domain, as it creates conflicting situations with the Internet Service Providers (ISPs), who traditionally dominate the area, motivating the necessity for novel analytical studies for this relationship. However, despite the importance of accessing real observational data, there is no database with the aggregate information that can serve as a solid base for this research. To that end, this document provides a detailed summary report for financial and statistic data for the period 2008-2013 that can be exploited for realistic econometric models that will provide useful insights on this topic. The document summarizes data from various sources with regard to the ISP revenues and Capital Expenditures (CAPEX), the OTT revenues, the Internet penetration and the Gross Domestic Product (GDP), taking into account three big OTT providers (i.e., Facebook, Skype, WhatsApp) and ten major ISPs that operate in seven different countries.

1601.00129 2026-06-04 math.NA cs.NA stat.AP stat.CO

The Reduced-Order Hybrid Monte Carlo Sampling Smoother

降阶混合蒙特卡洛采样平滑器

Ahmed Attia, Razvan Stefanescu, Adrian Sandu

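AI summary: This paper proposes computationally efficient Hybrid Monte Carlo sampling smoothers based on reduced-order approximations for four-dimensional data assimilation, and tests their accuracy and speed on the shallow-water equations.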

详情
Comments
32 pages, 2 figures
AI中文摘要

混合蒙特卡洛(HMC)采样平滑器是一种完全非高斯的四维数据同化算法,通过在贝叶斯框架下直接采样后验分布。其原始形式由于需要反复运行正向和伴随模型而计算成本高。本文提出基于底层模型动力学降阶近似的高效HMC采样平滑器版本。所开发的方案通过笛卡尔坐标系下的浅水方程模型进行数值测试。结果表明,降阶版本的平滑器能够准确捕捉后验概率密度,同时显著快于原始全阶形式。

英文摘要

Hybrid Monte-Carlo (HMC) sampling smoother is a fully non-Gaussian four-dimensional data assimilation algorithm that works by directly sampling the posterior distribution formulated in the Bayesian framework. The smoother in its original formulation is computationally expensive due to the intrinsic requirement of running the forward and adjoint models repeatedly. Here we present computationally efficient versions of the HMC sampling smoother based on reduced-order approximations of the underlying model dynamics. The schemes developed herein are tested numerically using the shallow-water equations model on Cartesian coordinates. The results reveal that the reduced-order versions of the smoother are capable of accurately capturing the posterior probability density, while being significantly faster than the original full order formulation.

1612.06176 2026-06-04 cs.CV cs.NA math.NA stat.ML

An extended Perona-Malik model based on probabilistic models

基于概率模型扩展的Perona-Malik模型

Lars M. Mescheder, Dirk A. Lorenz

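AI summary: This paper extends the Perona-Malik model using Gaussian scale mixtures, derives the lagged-diffusivity algorithm from the EM algorithm, modifies it to better capture uncertainty in the restoration, and proposes computationally feasible relaxations; experiments show the modified algorithm often restores textured areas and fuzzy edges better.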

详情
AI中文摘要

Perona-Malik模型在从噪声输入中恢复图像方面非常成功。本文将该模型重新诠释为高斯尺度混合物的语言,并推导出一些扩展。具体来说,我们展示了将EM算法应用于高斯尺度混合物导致滞后扩散算法用于计算Perona-Malik扩散方程的稳态点。此外,我们展示了这些高斯尺度混合物的均场近似如何导致一种改进的滞后扩散算法,更准确地捕捉恢复中的不确定性。由于这种改进在实践中难以计算,我们提出对均场目标进行放松以使算法计算可行。我们的数值实验表明,这种改进的滞后扩散算法在恢复纹理区域和模糊边缘方面通常比未改进的算法表现更好。作为高斯尺度混合框架的第二个应用,我们展示了如何通过高效采样过程获得概率模型,使计算条件均值和其他期望在算法上可行。同样,所得到的算法与滞后扩散算法有很强的相似性。最后,我们展示了在相同框架下,通过离散边缘先验可以得到概率版本的Mumford-Shah分割模型。

英文摘要

The Perona-Malik model has been very successful at restoring images from noisy input. In this paper, we reinterpret the Perona-Malik model in the language of Gaussian scale mixtures and derive some extensions of the model. Specifically, we show that the expectation-maximization (EM) algorithm applied to Gaussian scale mixtures leads to the lagged-diffusivity algorithm for computing stationary points of the Perona-Malik diffusion equations. Moreover, we show how mean field approximations to these Gaussian scale mixtures lead to a modification of the lagged-diffusivity algorithm that better captures the uncertainties in the restoration. Since this modification can be hard to compute in practice, we propose relaxations to the mean field objective to make the algorithm computationally feasible. Our numerical experiments show that this modified lagged-diffusivity algorithm often performs better at restoring textured areas and fuzzy edges than the unmodified algorithm. As a second application of the Gaussian scale mixture framework, we show how an efficient sampling procedure can be obtained for the probabilistic model, making the computation of the conditional mean and other expectations algorithmically feasible. Again, the resulting algorithm has a strong resemblance to the lagged-diffusivity algorithm. Finally, we show that a probabilistic version of the Mumford-Shah segmentation model can be obtained in the same framework with a discrete edge-prior.

1612.04933 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY

Dynamical Kinds and their Discovery

动力学种类及其发现

Benjamin C. Jantzen

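AI summary: This paper presents an algorithm that classifies causal systems sharing a common structure without constructing an explicit dynamical model or relying on prior knowledge of the system dynamics, with applications to the development and validation of dynamical models.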

详情
Comments
Accepted for the proceedings of the Causation: Foundation to Application Workshop, UAI 2016
AI中文摘要

我们展示了将因果系统分类为共享相同结构的种类的可能性,无需首先构建显式动态模型或使用系统动力学的先验知识。该算法能够确定任意系统是否由相同形式的因果关系支配,具有在动态模型开发和验证中的重要应用价值。从理论上看,这也是科学推理中从实证数据中推导定律的关键阶段。所提出的算法基于动态对称性方法来处理动态种类。时间对称性是指对系统的一个或多个变量进行干预,该干预与系统的时间演化过程可交换。动态种类是共享一组动态对称性的系统类。所提出的算法通过直接比较系统展示的对称性来分类确定性、时间依赖性的因果系统。使用来自多种非线性系统的模拟、噪声数据,我们证明该算法能够正确地将系统分类为动态种类。该算法在显著的采样误差下具有鲁棒性,对采样误差的非正态性不敏感,并在动态相似性增加时表现良好。所展示的算法是首个针对自动化科学发现这一方面的算法。

英文摘要

We demonstrate the possibility of classifying causal systems into kinds that share a common structure without first constructing an explicit dynamical model or using prior knowledge of the system dynamics. The algorithmic ability to determine whether arbitrary systems are governed by causal relations of the same form offers significant practical applications in the development and validation of dynamical models. It is also of theoretical interest as an essential stage in the scientific inference of laws from empirical data. The algorithm presented is based on the dynamical symmetry approach to dynamical kinds. A dynamical symmetry with respect to time is an intervention on one or more variables of a system that commutes with the time evolution of the system. A dynamical kind is a class of systems sharing a set of dynamical symmetries. The algorithm presented classifies deterministic, time-dependent causal systems by directly comparing their exhibited symmetries. Using simulated, noisy data from a variety of nonlinear systems, we show that this algorithm correctly sorts systems into dynamical kinds. It is robust under significant sampling error, is immune to violations of normality in sampling error, and fails gracefully with increasing dynamical similarity. The algorithm we demonstrate is the first to address this aspect of automated scientific discovery.

1612.03772 2026-06-04 cs.MS cs.NA math.NA stat.ML

SimTensor: A synthetic tensor data generator

SimTensor:一个合成张量数据生成器

Hadi Fanaee-T, Joao Gama

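AI summary: SimTensor is a multi-platform, open-source tool for generating artificial tensor data with CP/PARAFAC or Tucker structure for reproducible research on tensor factorization algorithms. It can generate tensors in a variety of configurations and export them to universal formats such as CSV and HDF5.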

详情
AI中文摘要

SimTensor是一种基于MATLAB的独立应用程序,用于生成人工张量数据(具有CP/PARAFAC或Tucker结构)以用于可重复研究张量分解算法。它提供广泛的功能来生成具有各种配置的张量数据,并配备用户友好的图形用户界面,使用户能够轻松生成复杂设置的张量。它还具有将生成的数据导出到通用格式(如CSV和HDF5)的功能,这些格式可通过多种编程语言(C、C++、Java、R、Fortran、MATLAB、Perl、Python等)导入。SimTensor最创新的功能是能够生成具有周期波、季节效应和流结构的时间张量。它还能够应用非负性和不同类型的稀疏性等约束。SimTensor还提供模拟不同类型突变点和注入各种类型异常的功能。SimTensor的源代码和二进制版本可在http://www.simtensor.org下载。

英文摘要

SimTensor is a multi-platform, open-source software package for generating artificial tensor data (with either CP/PARAFAC or Tucker structure) for reproducible research on tensor factorization algorithms. SimTensor is a stand-alone application based on MATLAB. It provides a wide range of facilities for generating tensor data with various configurations. It comes with a user-friendly graphical user interface, which enables the user to generate tensors with complicated settings in an easy way. It can also export generated data to universal formats such as CSV and HDF5, which can be imported via a wide range of programming languages (C, C++, Java, R, Fortran, MATLAB, Perl, Python, and many more). The most innovative part of SimTensor is its ability to generate temporal tensors with periodic waves, seasonal effects and streaming structure. It can also apply constraints such as non-negativity and different kinds of sparsity to the data. SimTensor also provides facilities to simulate different kinds of change-points and inject various types of anomalies. The source code and binary versions of SimTensor are available for download at http://www.simtensor.org.

1612.01600 2026-06-04 math.OC cs.LG cs.MA cs.SY eess.SY stat.ML

Distributed Gaussian Learning over Time-varying Directed Graphs

时变有向图上的分布式高斯学习

Angelia Nedić, Alex Olshevsky, César A. Uribe

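AI summary: This paper proposes a distributed non-Bayesian learning algorithm for parameter estimation with Gaussian noise, expressed as explicit updates of the Gaussian belief parameters, and proves a convergence rate and almost sure convergence.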

详情
AI中文摘要

我们提出了一种分布式(非贝叶斯)学习算法,用于参数估计问题中的高斯噪声。该算法以高斯信念的参数(即均值和精度)的显式更新形式表达。我们展示了收敛率为O(1/k),常数项依赖于代理数量和网络拓扑。此外,我们还证明了在一般时变有向图情况下,算法几乎必然收敛到估计问题的最优解。

英文摘要

We present a distributed (non-Bayesian) learning algorithm for the problem of parameter estimation with Gaussian noise. The algorithm is expressed as explicit updates on the parameters of the Gaussian beliefs (i.e. means and precision). We show a convergence rate of $O(1/k)$ with the constant term depending on the number of agents and the topology of the network. Moreover, we show almost sure convergence to the optimal solution of the estimation problem for the general case of time-varying directed graphs.

1611.09766 2026-06-04 stat.AP cs.SY eess.SY physics.bio-ph

Drift Removal in Plant Electrical Signals via IIR Filtering Using Wavelet Energy

通过IIR滤波去除植物电信号中的漂移

Saptarshi Das, Barry Juans Ajiwibawa, Shre Kumar Chatterjee, Sanmitra Ghosh, Koushik Maharatna, Srinandan Dasmahapatra, Andrea Vitaletti, Elisa Masi, Stefano Mancuso

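AI summary: This paper designs an optimum IIR filter, tuned with an energy criterion from the wavelet packet transform, that removes low-frequency drift from plant electrical signals while preserving the spectrum of their random component, and may help establish plant signal processing as a separate field.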

详情
Journal ref
Computers and Electronics in Agriculture, Volume 118, October 2015, Pages 15-23
Comments
12 pages, 9 figures, 1 table
AI中文摘要

植物电信号常包含低频漂移,无论是否施加外部刺激。由于缺乏实际生物响应中的关键频率信息,难以以刺激特异性方式量化植物信号的随机性。本文设计了最优无限脉冲响应(IIR)滤波器,通过将未知伪影和漂移引起的偏差降至最低,去除低频漂移并保留未刺激植物信号的随机成分频谱。利用小波包变换的能量标准对IIR滤波器参数进行优化调优。此类最优滤波器使不同实验中预刺激部分的能量分布几乎重叠,但在不同刺激下能量分布发生变化。本研究可能推动植物信号处理成为独立领域,同时与其他传统生物电信号处理范式相结合。

英文摘要

Plant electrical signals often contain low frequency drifts, with or without the application of external stimuli. Quantification of the randomness in plant signals in a stimulus-specific way is hindered because the vital frequency information in the actual biological response is not yet known. Here we design an optimum Infinite Impulse Response (IIR) filter which removes the low frequency drifts and preserves the frequency spectrum corresponding to the random component of the unstimulated plant signals by bringing the bias due to unknown artifacts and drifts to a minimum. We use energy criteria of the wavelet packet transform (WPT) for optimization-based tuning of the IIR filter parameters. Such an optimum filter ensures that the energy distributions of the pre-stimulus parts in different experiments almost overlap, whereas under different stimuli the energy distributions change. The reported research may popularize plant signal processing as a separate field, alongside other conventional bioelectrical signal processing paradigms.

1601.06103 2026-06-04 math.ST cs.SI cs.SY eess.SY math.OC math.PR stat.TH

Bayesian Learning without Recall

贝叶斯学习无需回忆

M. Amin Rahimian, Ali Jadbabaie

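AI summary: This paper studies a model of Bayesian learning in social networks in which agents neither recall their past observations nor reason about how other agents' beliefs are formed, and uses it both to analyze the behavior of rational agents and to provide a foundation for non-Bayesian update rules.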

详情
AI中文摘要

我们分析了一个学习和信念形成的网络模型,其中代理遵循贝叶斯规则,但不回忆过去的观察历史,也无法推断其他代理信念的形成方式。他们通过理性推断自己的观察来实现,这些观察包括一系列独立同分布的私人信号以及每个时间点邻居代理的行为。对整个历史观察的连续贝叶斯规则应用导致推断复杂性急剧增加:由于缺乏对全局网络结构的了解,私人观察的不可用性,以及每个决策前第三方互动的缺失。这些困难使得贝叶斯更新信念作为社交学习的机制显得不可行。为了解决这些复杂性,我们考虑了一个无回忆的贝叶斯推理模型。一方面,该模型为分析社交网络中理性代理的行为提供了一个可处理的框架。另一方面,该模型也为文献中各种非贝叶斯更新规则提供了行为基础。我们讨论了行动空间结构和效用函数的不同选择对代理的影响,并研究了学习、收敛性和共识在特殊案例中的性质。

英文摘要

We analyze a model of learning and belief formation in networks in which agents follow Bayes rule yet they do not recall their history of past observations and cannot reason about how other agents' beliefs are formed. They do so by making rational inferences about their observations which include a sequence of independent and identically distributed private signals as well as the actions of their neighboring agents at each time. Successive applications of Bayes rule to the entire history of past observations lead to forebodingly complex inferences: due to lack of knowledge about the global network structure, and unavailability of private observations, as well as third party interactions preceding every decision. Such difficulties make Bayesian updating of beliefs an implausible mechanism for social learning. To address these complexities, we consider a Bayesian without Recall model of inference. On the one hand, this model provides a tractable framework for analyzing the behavior of rational agents in social networks. On the other hand, this model also provides a behavioral foundation for the variety of non-Bayesian update rules in the literature. We present the implications of various choices for the structure of the action space and utility functions for such agents and investigate the properties of learning, convergence, and consensus in special cases.

1611.08372 2026-06-04 stat.ML cs.LG cs.NA math.NA math.OC

A Unified Convex Surrogate for the Schatten-$p$ Norm

一种统一的凸替代项用于Schatten-p范数

Chen Xu, Zhouchen Lin, Hongbin Zha

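AI summary: This paper proposes a unified convex surrogate for the Schatten-$p$ norm: an equivalence with norms of factor matrices allows the factor norms to be convex, which improves performance on matrix completion.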

详情
Comments
The paper is accepted by AAAI-17. We show that multi-factor matrix factorization enjoys superiority over the traditional two-factor case
AI中文摘要

Schatten-p范数(0<p<1)已被广泛用于替代核范数以更好地近似秩函数。然而,现有方法要么由于每次迭代依赖奇异值分解(SVD)而不适用于大规模问题,要么局限于特定的p值,如1/2和2/3。本文表明,对于任何p、p1和p2>0满足1/p=1/p1+1/p2,单矩阵的Schatten-p范数与两个因子矩阵的Schatten-p1和Schatten-p2范数之间存在等价性。我们进一步将等价性扩展到多个因子矩阵,并证明所有因子范数对于任何p>0均可凸和光滑。相比之下,原始Schatten-p范数对于0<p<1是非凸和非光滑的。作为示例,我们进行了矩阵补全实验。为了利用因子矩阵范数的凸性,我们采用了加速近端交替线性化最小化算法,并建立了其序列收敛性。在合成和真实数据集上的实验显示其优于现有方法,速度也极具竞争力。

英文摘要

The Schatten-$p$ norm ($0<p<1$) has been widely used to replace the nuclear norm for better approximating the rank function. However, existing methods are either 1) not scalable for large scale problems due to relying on singular value decomposition (SVD) in every iteration, or 2) specific to some $p$ values, e.g., $1/2$, and $2/3$. In this paper, we show that for any $p$, $p_1$, and $p_2 >0$ satisfying $1/p=1/p_1+1/p_2$, there is an equivalence between the Schatten-$p$ norm of one matrix and the Schatten-$p_1$ and the Schatten-$p_2$ norms of its two factor matrices. We further extend the equivalence to multiple factor matrices and show that all the factor norms can be convex and smooth for any $p>0$. In contrast, the original Schatten-$p$ norm for $0<p<1$ is non-convex and non-smooth. As an example we conduct experiments on matrix completion. To utilize the convexity of the factor matrix norms, we adopt the accelerated proximal alternating linearized minimization algorithm and establish its sequence convergence. Experiments on both synthetic and real datasets exhibit its superior performance over the state-of-the-art methods. Its speed is also highly competitive.

1411.5915 2026-06-04 eess.SY cs.SY stat.ML

Robust EM kernel-based methods for linear system identification

鲁棒的EM核方法用于线性系统辨识

Giulio Bottegal, Aleksandr Y. Aravkin, Håkan Hjalmarsson, Gianluigi Pillonetto

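AI summary: This paper proposes a robust kernel-based method for linear system identification that models the noise with heavy-tailed distributions and estimates the hyperparameters with an EM-based scheme, improving robustness to outliers.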

详情
Comments
Accepted for publication in Automatica
AI中文摘要

近年来,系统辨识领域的发展引起了对正则化核方法的关注。此类方法已被证明在经典参数方法中表现优异。然而,当前的公式对异常值不具有鲁棒性。本文提出了一种新的方法来使核系统辨识方法更加鲁棒。为此,我们使用具有重尾概率密度函数(pdf)的随机变量建模输出测量噪声,重点研究拉普拉斯分布和学生t分布。利用这些pdf作为高斯混合物的表示,我们将系统辨识问题转化为高斯过程回归框架,该框架需要估计与数据规模成比例的超参数数量。为克服这一困难,我们设计了一个新的最大后验(MAP)估计器来估计超参数,并使用基于期望最大化(EM)方法的新型迭代方案解决相关优化问题。在存在异常值的情况下,模拟数据和真实系统的测试显示,与当前使用的核方法相比,性能有显著提升。

英文摘要

Recent developments in system identification have brought attention to regularized kernel-based methods. This type of approach has been proven to compare favorably with classic parametric methods. However, current formulations are not robust with respect to outliers. In this paper, we introduce a novel method to robustify kernel-based system identification methods. To this end, we model the output measurement noise using random variables with heavy-tailed probability density functions (pdfs), focusing on the Laplacian and the Student's t distributions. Exploiting the representation of these pdfs as scale mixtures of Gaussians, we cast our system identification problem into a Gaussian process regression framework, which requires estimating a number of hyperparameters of the data size order. To overcome this difficulty, we design a new maximum a posteriori (MAP) estimator of the hyperparameters, and solve the related optimization problem with a novel iterative scheme based on the Expectation-Maximization (EM) method. In presence of outliers, tests on simulated data and on a real system show a substantial performance improvement compared to currently used kernel-based methods for linear system identification.

1611.07555 2026-06-04 cs.DC cs.NA math.NA stat.ML

Randomized Distributed Mean Estimation: Accuracy vs Communication

随机分布式均值估计:精度与通信

Jakub Konečný, Peter Richtárik

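AI summary: This paper studies distributed estimation of the arithmetic average of a collection of vectors under a communication budget, proposes a flexible family of randomized algorithms that trade communication cost against estimation error, and obtains improved error bounds.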

详情
Comments
19 pages, 1 figure
AI中文摘要

我们考虑在分布式计算节点中存储的有限实向量集合的算术平均估计问题,受通信预算约束。分析不依赖向量来源的统计假设。该问题出现在分布式和联邦优化与学习算法的reduce-all操作中。我们提出了一类灵活的随机算法,探索通信成本与估计误差之间的权衡。该家族包含全通信和零误差方法的一端,以及ε位通信和O(1/(εn))误差方法的另一端。在通信预期为每个向量单个比特的情况下,我们改进现有结果,得到O(r/n)误差界,其中r是表示浮点值所用的位数。

英文摘要

We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any statistical assumptions about the source of the vectors. This problem arises as a subproblem in many applications, including reduce-all operations within algorithms for distributed and federated optimization and learning. We propose a flexible family of randomized algorithms exploring the trade-off between expected communication cost and estimation error. Our family contains the full-communication and zero-error method on one extreme, and an $ε$-bit communication and ${\cal O}\left(1/(εn)\right)$ error method on the opposite extreme. In the special case where we communicate, in expectation, a single bit per coordinate of each vector, we improve upon existing results by obtaining $\mathcal{O}(r/n)$ error, where $r$ is the number of bits used to represent a floating point value.
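
To make the one-bit-per-coordinate regime concrete, here is a minimal sketch of a well-known unbiased stochastic rounding scheme: each node rounds every coordinate to its vector's minimum or maximum with probabilities chosen so that the decoded value is correct in expectation. The function names are hypothetical, and this is an illustrative encoder under stated assumptions, not necessarily the family of algorithms analyzed in the paper. Note that each node also sends the two floats `lo` and `hi` in addition to the bits.

```python
import random


def encode_one_bit(vec, rng):
    """Round each coordinate to the vector's min or max so that E[decoded] = vec."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        # constant vector: every coordinate already equals lo
        return lo, hi, [0] * len(vec)
    # bit = 1 (meaning hi) with probability (x - lo) / (hi - lo)
    bits = [1 if rng.random() < (x - lo) / (hi - lo) else 0 for x in vec]
    return lo, hi, bits


def decode(lo, hi, bits):
    """Map each received bit back to the value it stands for."""
    return [hi if b else lo for b in bits]


def estimate_mean(vectors, rng):
    """Average the decoded one-bit messages received from all nodes."""
    n, d = len(vectors), len(vectors[0])
    total = [0.0] * d
    for vec in vectors:
        decoded = decode(*encode_one_bit(vec, rng))
        for j in range(d):
            total[j] += decoded[j]
    return [t / n for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(1000)]
    print(estimate_mean(nodes, rng))
```

Because each node's decoded vector is unbiased and the nodes randomize independently, the error of the averaged estimate shrinks as the number of nodes grows, which is the kind of accuracy-versus-communication trade-off the abstract describes.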

1504.03461 2026-06-04 stat.CO cs.NA math.NA math.PR

On a generalization of the preconditioned Crank-Nicolson Metropolis algorithm

关于预条件Crank-Nicolson Metropolis算法的推广

Daniel Rudolf, Björn Sprungk

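AI summary: This paper introduces a generalization of the preconditioned Crank-Nicolson Metropolis algorithm for approximate sampling of probability measures on infinite-dimensional Hilbert spaces. A numerical simulation of a Bayesian inverse problem indicates performance that is independent of the state-space dimension and the observational noise variance, and the generalization is shown to inherit geometric ergodicity.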

详情
Comments
40 pages, 3 Figures
AI中文摘要

本文考虑了用于无限维Hilbert空间概率测度近似采样的Metropolis算法,并引入了预条件Crank-Nicolson(pCN)提议的推广。新的提议能够结合目标测度的信息。一个贝叶斯反问题的数值模拟表明,具有此类提议的Metropolis算法在状态空间维度和观测噪声方差方面表现独立。此外,通过谱间隙的比较论证提供了定性收敛结果。特别是,证明了该推广继承了pCN提议的Metropolis算法的几何混合性。

英文摘要

Metropolis algorithms for approximate sampling of probability measures on infinite dimensional Hilbert spaces are considered and a generalization of the preconditioned Crank-Nicolson (pCN) proposal is introduced. The new proposal is able to incorporate information of the measure of interest. A numerical simulation of a Bayesian inverse problem indicates that a Metropolis algorithm with such a proposal performs independent of the state space dimension and the variance of the observational noise. Moreover, a qualitative convergence result is provided by a comparison argument for spectral gaps. In particular, it is shown that the generalization inherits geometric ergodicity from the Metropolis algorithm with pCN proposal.
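
For readers unfamiliar with the baseline, the following is a minimal finite-dimensional sketch of the standard pCN Metropolis step that the paper generalizes. It assumes a Gaussian reference measure $N(0, I)$ and a target with density proportional to $\exp(-\Phi)$ relative to it; the function name `pcn_step`, the step parameter `s`, and the identity covariance are illustrative assumptions, and the generalized proposal of the paper is not reproduced here.

```python
import math
import random


def pcn_step(x, phi, s, rng):
    """One pCN Metropolis step for a target with density exp(-phi) relative to N(0, I)."""
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    # pCN proposal: y = sqrt(1 - s^2) * x + s * noise, which leaves N(0, I) invariant
    scale = math.sqrt(1.0 - s * s)
    y = [scale * xi + s * ni for xi, ni in zip(x, noise)]
    # accept with probability min(1, exp(phi(x) - phi(y)))
    log_alpha = min(0.0, phi(x) - phi(y))
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    if math.log(u) <= log_alpha:
        return y, True
    return x, False


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.0, 0.0]
    # hypothetical potential: a quadratic misfit around the point (1, 1)
    potential = lambda v: 0.5 * sum((vi - 1.0) ** 2 for vi in v)
    for _ in range(1000):
        state, _ = pcn_step(state, potential, 0.3, rng)
    print(state)
```

The acceptance ratio involves only the potential $\Phi$ and not the Gaussian reference density, which is the usual explanation for why pCN-type proposals remain well defined as the dimension grows.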

1611.06094 2026-06-04 stat.ML cs.NA math.NA

Generalizing diffuse interface methods on graphs: non-smooth potentials and hypergraphs

将扩散界面方法推广到图上:非光滑势函数和超图

Jessica Bosch, Steffen Klamt, Martin Stoll

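AI summary: This paper generalizes diffuse interface methods on graphs with a non-smooth potential, applies them to the segmentation of hypergraph data, and shows that in almost all cases the graph Laplacian is derived from hypergraph information.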

详情
Comments
13 pages, 11 figures
AI中文摘要

扩散界面方法近期被引入半监督学习领域。其底层模型在材料科学中已知,但通过吉因堡-兰道函数和图拉普拉斯算子扩展到图上。本文通过非光滑势函数推广原有模型,并展示扩散界面方法可用于超图数据分割。我们证明图拉普拉斯矩阵在几乎所有情况下均源于超图信息。此外,我们展示由放松优化问题导出的超图拉普拉斯算子适合用于扩散界面方法。我们为图和超图拉普拉斯矩阵进行了计算实验。

英文摘要

Diffuse interface methods have recently been introduced for the task of semi-supervised learning. The underlying model is well-known in materials science but was extended to graphs using a Ginzburg--Landau functional and the graph Laplacian. We here generalize the previously proposed model by a non-smooth potential function. Additionally, we show that the diffuse interface method can be used for the segmentation of data coming from hypergraphs. For this we show that the graph Laplacian in almost all cases is derived from hypergraph information. Additionally, we show that the formerly introduced hypergraph Laplacian coming from a relaxed optimization problem is well suited to be used within the diffuse interface method. We present computational experiments for graph and hypergraph Laplacians.

1611.05977 2026-06-04 cs.LG cs.NA math.NA stat.AP stat.ML

Robust and Scalable Column/Row Sampling from Corrupted Big Data

鲁棒且可扩展的列/行采样从受腐蚀的大数据

Mostafa Rahmani, George Atia

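AI summary: This paper proposes new sampling algorithms that locate the informative columns under severe data corruption, develops scalable randomized designs for them, and shows robustness to both sparse corruption and outliers; experiments show they outperform existing robust sampling algorithms.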

详情
AI中文摘要

传统采样技术在数据严重腐蚀时无法生成数据描述性草图,因为此类腐蚀破坏了其所需的低秩结构。本文提出新的采样算法,可在存在严重数据腐蚀时定位信息列,并开发新的可扩展随机化设计。所提方法同时对稀疏腐蚀和异常值具有鲁棒性,并通过真实和合成数据的实验表明显著优于现有鲁棒采样算法。

英文摘要

Conventional sampling techniques fall short of drawing descriptive sketches of the data when the data is grossly corrupted as such corruptions break the low rank structure required for them to perform satisfactorily. In this paper, we present new sampling algorithms which can locate the informative columns in presence of severe data corruptions. In addition, we develop new scalable randomized designs of the proposed algorithms. The proposed approach is simultaneously robust to sparse corruption and outliers and substantially outperforms the state-of-the-art robust sampling algorithms as demonstrated by experiments conducted using both real and synthetic data.

1610.04272 2026-06-04 math.NA cs.NA stat.CO

Tensor Computation: A New Framework for High-Dimensional Problems in EDA

张量计算:EDA中高维问题的新框架

Zheng Zhang, Kim Batselier, Haotian Liu, Luca Daniel, Ngai Wong

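AI summary: This paper presents tensor computation as an alternative framework for efficient EDA algorithms, addressing the computational bottleneck of high-dimensional problems, with example applications in nonlinear circuit modeling and uncertainty quantification.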

详情
Comments
14 figures. Accepted by IEEE Trans. CAD of Integrated Circuits and Systems
AI中文摘要

许多关键EDA问题受到维度灾难的影响,即由大量参数和/或未知变量导致的计算负担迅速增长。本文提出张量计算作为高效EDA算法开发的替代框架,张量是矩阵和向量的高维推广,适用于存储和高效求解高维EDA问题。本文提供张量基础教程,展示非线性电路建模和高维不确定性量化等应用案例,并提出进一步的开放EDA问题,探讨张量计算的潜在优势。

英文摘要

Many critical EDA problems suffer from the curse of dimensionality, i.e. the very fast-scaling computational burden produced by large number of parameters and/or unknown variables. This phenomenon may be caused by multiple spatial or temporal factors (e.g. 3-D field solvers discretizations and multi-rate circuit simulation), nonlinearity of devices and circuits, large number of design or optimization parameters (e.g. full-chip routing/placement and circuit sizing), or extensive process variations (e.g. variability/reliability analysis and design for manufacturability). The computational challenges generated by such high dimensional problems are generally hard to handle efficiently with traditional EDA core algorithms that are based on matrix and vector computation. This paper presents "tensor computation" as an alternative general framework for the development of efficient EDA algorithms and tools. A tensor is a high-dimensional generalization of a matrix and a vector, and is a natural choice for both storing and solving efficiently high-dimensional EDA problems. This paper gives a basic tutorial on tensors, demonstrates some recent examples of EDA applications (e.g., nonlinear circuit modeling and high-dimensional uncertainty quantification), and suggests further open EDA problems where the use of tensor computation could be of advantage.

1507.05144 2026-06-04 eess.SY cs.SY math.OC stat.AP

Centralized Adaptation for Parameter Estimation over Wireless Sensor Networks

无线传感器网络中参数估计的集中式适应方法

Reza Abdolee, Benoit Champagne

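AI summary: This paper studies the performance of centralized least mean-squares algorithms in wireless sensor networks and proposes a new algorithm that uses refined transmitted data, a link failure alarm strategy, and a bias-elimination scheme to reduce the effect of channel impairments and lower the steady-state mean-square error.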

详情
Comments
IEEE Communication Letter, 2015
AI中文摘要

我们研究了无线传感器网络中集中式最小均方(CLMS)算法的性能,其中节点通过衰减信道将数据传输到中央处理单元(如融合中心或集群头)以进行参数估计。无线信道退化,包括衰减和路径损耗,会扭曲传输数据,导致链路失效并降低自适应解决方案的性能。为了解决这个问题,我们提出了一种新的CLMS算法,该算法使用改进的传输数据,并利用链路故障警报策略来丢弃严重失真数据。此外,为了消除通信噪声引起的估计偏差,我们引入了一个消除偏差的方案,这也会导致稳态均方误差降低。我们的理论发现得到了数值模拟结果的支持。

英文摘要

We study the performance of centralized least mean-squares (CLMS) algorithms in wireless sensor networks where nodes transmit their data over fading channels to a central processing unit (e.g., fusion center or cluster head), for parameter estimation. Wireless channel impairments, including fading and path loss, distort the transmitted data, cause link failure and degrade the performance of the adaptive solutions. To address this problem, we propose a novel CLMS algorithm that uses a refined version of the transmitted data and benefits from a link failure alarm strategy to discard severely distorted data. Furthermore, to remove the bias due to communication noise from the estimate, we introduce a bias-elimination scheme that also leads to a lower steady-state mean-square error. Our theoretical findings are supported by numerical simulation results.
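
As background, the plain least mean-squares (LMS) recursion that a fusion centre could run on collected data can be sketched as follows. This is only the textbook LMS update under the assumption of perfectly received data; the function name `clms` and the step size `mu` are illustrative, and the fading channels, link failure alarm, and bias elimination described in the abstract are not modeled.

```python
def clms(samples, dim, mu=0.05):
    """Run a plain LMS recursion over (regressor, measurement) pairs."""
    w = [0.0] * dim
    for u, d in samples:
        # a priori estimation error for this sample
        err = d - sum(ui * wi for ui, wi in zip(u, w))
        # stochastic-gradient step toward the least-squares solution
        w = [wi + mu * err * ui for wi, ui in zip(w, u)]
    return w


if __name__ == "__main__":
    # noise-free toy data generated by the hypothetical parameter (1, -2)
    data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -2.0)] * 200
    print(clms(data, 2, mu=0.5))
```

With noisy or distorted regressors the same recursion generally converges to a biased estimate, which is the issue the bias-elimination scheme in the abstract is meant to address.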

1506.06040 2026-06-04 stat.ME cs.NA math.NA stat.AP stat.ML

Tensor Analysis and Fusion of Multimodal Brain Images

张量分析与多模态脑图像融合

Esin Karahan, Pedro A. Rojas-Lopez, Maria L. Bringas-Vega, Pedro A. Valdes-Hernandez, Pedro A. Valdes-Sosa

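AI summary: Proposes analyzing multimodal neuroimaging data through tensor structure, introduces Markov-Penrose diagrams for modeling, and shows for the first time that Granger causal analysis of brain networks is a tensor regression problem, enabling an atomic decomposition of brain networks.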

详情
Comments
23 pages, 15 figures, submitted to Proceedings of the IEEE
AI中文摘要

当前高通量数据采集技术通过不同成像模态探测动态系统,生成不同空间和时间分辨率的数据集,对多模态数据融合提出挑战。本文强调神经影像数据的多模态、多尺度特性可通过张量结构反映,通过少量组件或

英文摘要

Current high-throughput data acquisition technologies probe dynamical systems with different imaging modalities, generating massive data sets at different spatial and temporal resolutions posing challenging problems in multimodal data fusion. A case in point is the attempt to parse out the brain structures and networks that underpin human cognitive processes by analysis of different neuroimaging modalities (functional MRI, EEG, NIRS etc.). We emphasize that the multimodal, multi-scale nature of neuroimaging data is well reflected by a multi-way (tensor) structure where the underlying processes can be summarized by a relatively small number of components or "atoms". We introduce Markov-Penrose diagrams - an integration of Bayesian DAG and tensor network notation in order to analyze these models. These diagrams not only clarify matrix and tensor EEG and fMRI time/frequency analysis and inverse problems, but also help understand multimodal fusion via Multiway Partial Least Squares and Coupled Matrix-Tensor Factorization. We show here, for the first time, that Granger causal analysis of brain networks is a tensor regression problem, thus allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI recordings shows the potential of the methods and suggests their use in other scientific domains.

1506.04775 2026-06-04 math.PR cs.NA cs.SY eess.SY math.AP math.NA math.ST stat.TH

A stochastic density matrix approach to approximation of probability distributions and its application to nonlinear systems

基于随机密度矩阵的概率分布近似方法及其在非线性系统中的应用

Igor G. Vladimirov

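AI summary: Proposes approximating probability density functions by quadratic forms of weighted orthonormal basis functions with positive semi-definite Hermitian matrices of unit trace; the approximation is normalized and nonnegative, and is applied to Fokker-Planck-Kolmogorov dynamics of nonlinear systems.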

详情
Comments
12 pages, 3 figures. A brief version of this paper will appear in the proceedings of the IEEE Multi-Conference on Systems and Control, 21-23 September 2015, Sydney, Australia
AI中文摘要

本文概述了一种通过加权正交基函数的二次型和半正定酉矩阵来近似概率密度函数的方法。这些矩阵被称为随机密度矩阵,以反映与量子力学密度矩阵的类比。SDM近似满足归一化条件且处处非负,不同于截断的Gram-Charlier和Edgeworth展开。对于具有代数结构的基函数,如Hermite多项式和傅里叶基,SDM近似可以满足给定的矩规范并利用二次接近准则进行优化。我们将SDM方法应用于由非线性随机微分方程驱动的马尔可夫扩散过程的Fokker-Planck-Kolmogorov PDF动力学,从而得到近似PDF的SDM动力学的常微分方程。作为例子,我们考虑了在多维环面上的Smoluchowski SDE。

英文摘要

This paper outlines an approach to the approximation of probability density functions by quadratic forms of weighted orthonormal basis functions with positive semi-definite Hermitian matrices of unit trace. Such matrices are called stochastic density matrices in order to reflect an analogy with the quantum mechanical density matrices. The SDM approximation of a PDF satisfies the normalization condition and is nonnegative everywhere in contrast to the truncated Gram-Charlier and Edgeworth expansions. For bases with an algebraic structure, such as the Hermite polynomial and Fourier bases, the SDM approximation can be chosen so as to satisfy given moment specifications and can be optimized using a quadratic proximity criterion. We apply the SDM approach to the Fokker-Planck-Kolmogorov PDF dynamics of Markov diffusion processes governed by nonlinear stochastic differential equations. This leads to an ordinary differential equation for the SDM dynamics of the approximating PDF. As an example, we consider the Smoluchowski SDE on a multidimensional torus.
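
The normalization and nonnegativity claims can be checked in one line under assumed notation. The symbols $\nu$, $\varphi_k$, $S$ and $N$ below are illustrative and are not taken from the abstract: let $\varphi_1,\dots,\varphi_N$ satisfy $\int \nu\,\overline{\varphi_j}\,\varphi_k\,dx=\delta_{jk}$ for a weight $\nu\ge 0$, and let $S$ be a positive semi-definite Hermitian matrix with $\operatorname{Tr}S=1$. Then \begin{equation*} p(x)=\nu(x)\sum_{j,k=1}^{N}\overline{\varphi_j(x)}\,S_{jk}\,\varphi_k(x)\ \ge\ 0, \qquad \int p(x)\,dx=\sum_{j,k=1}^{N}S_{jk}\int \nu\,\overline{\varphi_j}\,\varphi_k\,dx=\sum_{k=1}^{N}S_{kk}=\operatorname{Tr}S=1. \end{equation*} Nonnegativity follows from $S\succeq 0$ and $\nu\ge 0$, and the unit-trace condition gives the normalization, which is what a truncated Gram-Charlier or Edgeworth expansion cannot guarantee.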

1511.05261 2026-06-04 cs.CV cs.LG cs.NA math.NA stat.ML

Robust PCA via Nonconvex Rank Approximation

通过非凸秩近似实现鲁棒PCA

Zhao Kang, Chong Peng, Qiang Cheng

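AI summary: Proposes a nonconvex rank approximation to overcome the limitations of the nuclear norm in robust PCA, with an efficient algorithm that improves accuracy and efficiency.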

详情
Comments
IEEE International Conference on Data Mining
AI中文摘要

在数据挖掘和机器学习中,许多应用需要恢复低秩矩阵。鲁棒主成分分析(RPCA)是处理此类问题的通用框架。RPCA中核范数作为秩函数的凸替代物被广泛研究。在某些假设下,它可以以高概率恢复底层低秩矩阵。然而,这些假设可能在实际应用中不成立。由于核范数通过将所有奇异值相加来近似秩,即本质上是奇异值的ℓ1范数,因此产生的近似误差并不 trivial,导致最终的矩阵估计器可能有显著偏差。为寻求更接近的近似并缓解核范数的上述限制,我们提出了一种非凸秩近似。这种对矩阵秩的近似比核范数更紧密。为了解决相关的非凸最小化问题,我们开发了高效的增广拉格朗日乘子优化算法。实验结果表明,我们的方法在准确性和效率上均优于当前最先进的算法。

英文摘要

Numerous applications in data mining and machine learning require recovering a matrix of minimal rank. Robust principal component analysis (RPCA) is a general framework for handling this kind of problem. The nuclear-norm-based convex surrogate of the rank function in RPCA has been widely investigated. Under certain assumptions, it can recover the underlying true low-rank matrix with high probability. However, those assumptions may not hold in real-world applications. Since the nuclear norm approximates the rank by adding all singular values together, which is essentially an $\ell_1$-norm of the singular values, the resulting approximation error is not trivial and thus the resulting matrix estimator can be significantly biased. To seek a closer approximation and to alleviate the above-mentioned limitations of the nuclear norm, we propose a nonconvex rank approximation. This approximation to the matrix rank is tighter than the nuclear norm. To solve the associated nonconvex minimization problem, we develop an efficient augmented Lagrange multiplier based optimization algorithm. Experimental results demonstrate that our method outperforms current state-of-the-art algorithms in both accuracy and efficiency.
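
As a small worked illustration of the bias described above (the numbers are made up for illustration and do not come from the paper), consider a matrix $X$ with singular values $\sigma(X)=(10,\,1,\,0.1)$: \begin{equation*} \operatorname{rank}(X)=3, \qquad \|X\|_*=\sum_i \sigma_i(X)=10+1+0.1=11.1. \end{equation*} Rescaling to $2X$ leaves the rank at 3 but doubles the nuclear norm to $22.2$, and the largest singular value alone accounts for about 90% of the penalty. A nuclear-norm penalty therefore shrinks large singular values much more than the rank function would, which is the gap a tighter nonconvex surrogate is meant to narrow.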

1510.00563 2026-06-04 stat.CO cs.SY eess.SY

Nonlinear State Space Model Identification Using a Regularized Basis Function Expansion

基于正则化的基函数扩展的非线性状态空间模型识别

Andreas Svensson, Thomas B. Schön, Arno Solin, Simo Särkkä

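AI summary: Uses a basis function expansion for black-box identification of nonlinear state space models, applies expectation maximization to iteratively update states and parameters, and adds a regularization scheme against overfitting; the method is evaluated on simulated and real data.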

详情
Comments
Accepted to the 6th IEEE international workshop on computational advances in multi-sensor adaptive processing (CAMSAP), Cancun, Mexico, December 2015
AI中文摘要

本文关注非线性状态空间模型的黑箱识别问题。通过在状态空间模型中引入基函数扩展,获得灵活的结构。采用期望最大化方法进行模型识别,通过迭代更新状态和参数以获得最大似然估计。使用具有良好理论性质的粒子方法推断状态,而参数可通过利用模型对参数的线性性得到闭式表达式。为防止灵活模型过拟合,我们提出一种正则化方案,无需增加计算负担。重要的是,这为非线性状态空间模型的系统使用正则化打开了途径。最后,通过一个仿真示例和两个实际数据问题评估所提方法。

英文摘要

This paper is concerned with black-box identification of nonlinear state space models. By using a basis function expansion within the state space model, we obtain a flexible structure. The model is identified using an expectation maximization approach, where the states and the parameters are updated iteratively in such a way that a maximum likelihood estimate is obtained. We use recent particle methods with sound theoretical properties to infer the states, whereas the model parameters can be updated using closed-form expressions by exploiting the fact that our model is linear in the parameters. To avoid over-fitting the flexible model to the data, we also propose a regularization scheme that does not increase the computational burden. Importantly, this opens the way to systematic use of regularization in nonlinear state space models. We conclude by evaluating our proposed approach on one simulation example and two real-data problems.

1407.1065 2026-06-04 cs.IT cs.NA math.FA math.IT math.NA math.OC math.ST stat.TH

Phase Retrieval via Wirtinger Flow: Theory and Algorithms

通过Wirtinger流进行相位恢复:理论与算法

Emmanuel Candes, Xiaodong Li, Mahdi Soltanolkotabi

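AI summary: Proposes a nonconvex phase retrieval algorithm that starts from a spectral initialization and refines it iteratively, recovering phase information from a nearly minimal number of random measurements.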

详情
Comments
IEEE Transactions on Information Theory, Vol. 64 (4), Feb. 2015
AI中文摘要

本文研究了从幅度测量中恢复相位的问题;具体来说,我们希望重建一个复值信号x∈C^n,我们有形式为y_r = |<a_r, x>|^2的相位less样本(知道这些样本的相位将得到一个线性系统)。本文提出了一种非凸的相位恢复问题的公式化,并提出了一个具体的求解算法。简而言之,该算法首先通过谱方法获得一个仔细的初始化,然后通过迭代应用新的更新规则来改进这个初始估计,这些更新规则具有低计算复杂度,类似于梯度下降方案。主要贡献是证明该算法能够严格地从几乎最小数量的随机测量中精确恢复相位信息。确实,连续迭代的序列被证明以几何速率收敛到解,因此该方案在计算和数据资源方面都是高效的。在理论上,这种方案的变种导致了一种近线性时间算法,基于编码衍射图案的物理可实现模型。我们通过各种图像数据实验展示了我们方法的有效性。我们的分析背后有对非凸优化方案的见解,这些见解可能对计算问题有更广泛的影响。

英文摘要

We study the problem of recovering the phase from magnitude measurements; specifically, we wish to reconstruct a complex-valued signal x of C^n about which we have phaseless samples of the form y_r = |< a_r,x >|^2, r = 1,2,...,m (knowledge of the phase of these samples would yield a linear system). This paper develops a non-convex formulation of the phase retrieval problem as well as a concrete solution algorithm. In a nutshell, this algorithm starts with a careful initialization obtained by means of a spectral method, and then refines this initial estimate by iteratively applying novel update rules, which have low computational complexity, much like in a gradient descent scheme. The main contribution is that this algorithm is shown to rigorously allow the exact retrieval of phase information from a nearly minimal number of random measurements. Indeed, the sequence of successive iterates provably converges to the solution at a geometric rate so that the proposed scheme is efficient both in terms of computational and data resources. In theory, a variation on this scheme leads to a near-linear time algorithm for a physically realizable model based on coded diffraction patterns. We illustrate the effectiveness of our methods with various experiments on image data. Underlying our analysis are insights for the analysis of non-convex optimization schemes that may have implications for computational problems beyond phase retrieval.
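
The two stages described above (spectral initialization, then gradient-like updates) can be sketched as follows. This is a minimal illustration, not the authors' reference implementation: it assumes complex Gaussian measurement vectors, a constant step size `mu` rather than a tuned schedule, and a least-squares loss on the intensities; all function names and problem sizes are hypothetical.

```python
import math
import random


def inner(a, z):
    """Hermitian inner product <a, z> = a^* z."""
    return sum(ai.conjugate() * zi for ai, zi in zip(a, z))


def wirtinger_flow(A, y, iters=1000, mu=0.1, power_iters=100, seed=1):
    m, n = len(A), len(A[0])
    rng = random.Random(seed)

    # Spectral initialization: power iteration for the leading eigenvector
    # of Y = (1/m) * sum_r y_r a_r a_r^*.
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    for _ in range(power_iters):
        w = [0j] * n
        for a, yr in zip(A, y):
            c = yr * inner(a, v) / m
            for i in range(n):
                w[i] += c * a[i]
        norm = math.sqrt(sum(abs(wi) ** 2 for wi in w))
        v = [wi / norm for wi in w]

    # Scale the unit eigenvector to an estimate of the signal norm.
    lam = math.sqrt(n * sum(y) / sum(abs(ai) ** 2 for a in A for ai in a))
    z = [lam * vi for vi in v]

    # Gradient-like updates on f(z) = (1/2m) * sum_r (|<a_r, z>|^2 - y_r)^2.
    step = mu / lam ** 2
    for _ in range(iters):
        g = [0j] * n
        for a, yr in zip(A, y):
            p = inner(a, z)
            c = (abs(p) ** 2 - yr) * p / m
            for i in range(n):
                g[i] += c * a[i]
        z = [zi - step * gi for zi, gi in zip(z, g)]
    return z


def relative_error(z, x):
    """Distance between z and x after removing the unrecoverable global phase."""
    phase = inner(x, z)
    phase /= abs(phase)
    diff = math.sqrt(sum(abs(zi - phase * xi) ** 2 for zi, xi in zip(z, x)))
    return diff / math.sqrt(sum(abs(xi) ** 2 for xi in x))


rng = random.Random(0)
n, m = 4, 80
s = math.sqrt(0.5)
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
A = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)] for _ in range(m)]
y = [abs(inner(a, x)) ** 2 for a in A]
z = wirtinger_flow(A, y)
err = relative_error(z, x)
print(err)
```

The error is measured up to a global phase because the samples $y_r$ are unchanged when $x$ is multiplied by a unit-modulus constant. The toy sizes (`n = 4`, `m = 80`) are chosen only to keep a pure-Python run fast.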

1305.5601 2026-06-04 stat.AP cs.SY eess.SY

Optimal Periodic Sensor Scheduling in Networks of Dynamical Systems

动态系统网络中最优周期传感器调度

Sijia Liu, Makan Fardad, Engin Masazade, Pramod K. Varshney

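AI summary: Studies periodic sensor scheduling in networks of dynamical systems, balancing estimation accuracy against total sensor activations; relates active sensors to the nonzero columns of the estimator gain and uses ADMM to find locally optimal solutions of the resulting combinatorial problem.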

详情
Comments
Accepted in IEEE Transactions on Signal Processing
AI中文摘要

我们考虑寻找用于估计离散时间动态系统状态的最优时间周期性传感器调度问题。我们假设已部署了多个传感器,且这些传感器受资源限制,限制了每个传感器在一个周期内被激活的次数。我们寻求一个在估计精度和总传感器激活次数之间取得平衡的算法。我们建立主动传感器与估计器增益非零列之间的对应关系。我们提出一个优化问题,其中在最小化误差协方差迹的同时,同时惩罚估计器增益的非零列数。这个问题本质上是组合的,我们采用交替方向乘子法(ADMM)来找到其局部最优解。数值结果和与其他文献中传感器调度算法的比较用于说明所提方法的有效性。

英文摘要

We consider the problem of finding optimal time-periodic sensor schedules for estimating the state of discrete-time dynamical systems. We assume that multiple sensors have been deployed and that the sensors are subject to resource constraints, which limit the number of times each can be activated over one period of the periodic schedule. We seek an algorithm that strikes a balance between estimation accuracy and total sensor activations over one period. We make a correspondence between active sensors and the nonzero columns of the estimator gain. We formulate an optimization problem in which we minimize the trace of the error covariance with respect to the estimator gain while simultaneously penalizing the number of nonzero columns of the estimator gain. This optimization problem is combinatorial in nature, and we employ the alternating direction method of multipliers (ADMM) to find its locally optimal solutions. Numerical results and comparisons with other sensor scheduling algorithms in the literature are provided to illustrate the effectiveness of our proposed method.

1511.03144 2026-06-04 cs.MA cs.IT cs.SY eess.SY math.IT stat.ML

Asynchronous Decentralized 20 Questions for Adaptive Search

异步去中心化20个问题用于自适应搜索

Theodoros Tsiligkaridis

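AI summary: Studies adaptive search for an unknown target by multiple agents connected through a time-varying network topology, proposes a decentralized collaborative algorithm that combines the 20 questions approach with social learning, and proves convergence to correct consensus under time-varying network dynamics.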

详情
Comments
19 pages, Submitted. arXiv admin note: substantial text overlap with arXiv:1312.7847
英文摘要

This paper considers the problem of adaptively searching for an unknown target using multiple agents connected through a time-varying network topology. Agents are equipped with sensors capable of fast information processing, and we propose a decentralized collaborative algorithm for controlling their search given noisy observations. Specifically, we propose decentralized extensions of the adaptive query-based search strategy that combines elements from the 20 questions approach and social learning. Under standard assumptions on the time-varying network dynamics, we prove convergence to correct consensus on the value of the parameter as the number of iterations goes to infinity. The convergence analysis takes a novel approach using martingale-based techniques combined with spectral graph theory. Our results establish that stability and consistency can be maintained even with one-way updating and randomized pairwise averaging, thus providing a scalable low complexity method with performance guarantees. We illustrate the effectiveness of our algorithm for random network topologies.
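
The randomized pairwise averaging mentioned above can be illustrated with a generic gossip sketch. This shows only the averaging primitive on scalar values, not the paper's belief updates or its 20 questions queries; the ring topology, function name, and initial values are hypothetical.

```python
import random


def randomized_pairwise_averaging(values, edges, steps, seed=0):
    """At each step one random edge wakes up and its two endpoints average their values."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(steps):
        i, j = rng.choice(edges)
        avg = 0.5 * (x[i] + x[j])
        x[i] = x[j] = avg
    return x


# Ring of 6 agents; only one link is active per step, so the active topology varies in time.
edges = [(k, (k + 1) % 6) for k in range(6)]
initial = [0.0, 1.0, 2.0, 3.0, 4.0, 11.0]
final = randomized_pairwise_averaging(initial, edges, steps=2000)
spread = max(final) - min(final)
mean_final = sum(final) / len(final)
print(mean_final, spread)
```

Each pairwise step preserves the sum of the values, so agreement can only be reached at the initial average (3.5 here), which is why such updates are a natural building block for consensus.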

1412.2817 2026-06-04 math.ST cs.SY eess.SY stat.TH

Diffusion Estimation Over Cooperative Multi-Agent Networks With Missing Data

在存在缺失数据的情况下,合作多智能体网络中的扩散估计

Mohammad Reza Gholami, Magnus Jansson, Erik G. Ström, Ali H. Sayed

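AI summary: Studies how a multi-agent network can cooperate through local interactions to estimate model parameters when data are missing, adjusts the distributed diffusion strategy through (de)regularization to remove the bias introduced by the incomplete model, and illustrates the results on a mental health survey and a household consumption survey.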

详情
Comments
To appear in IEEE Transactions on Signal and Information Processing Over Networks
AI中文摘要

在许多领域,尤其是医学和社会科学以及推荐系统中,数据是通过临床研究或定向调查收集的。参与者通常不愿意回答所有问题,或者可能缺乏足够的信息来回答某些问题。这些研究收集的数据往往导致线性回归模型,其中回归向量只部分已知:其中一些条目要么完全缺失,要么被随机噪声值替代。在本文中,假设缺失位置被噪声值替代,我们研究了一个连接的智能体网络如何通过局部交互合作估计底层模型参数。我们解释了如何通过(去)正则化调整分布式扩散以消除不完整模型引入的偏差。我们还提出了一种递归估计(去)正则化参数的技术,并检验了所提出策略的性能。我们通过考虑两个应用:一个处理心理健康调查,另一个处理家庭消费调查来展示结果。

英文摘要

In many fields, and especially in the medical and social sciences and in recommender systems, data are gathered through clinical studies or targeted surveys. Participants are generally reluctant to respond to all questions in a survey or they may lack information to respond adequately to some questions. The data collected from these studies tend to lead to linear regression models where the regression vectors are only known partially: some of their entries are either missing completely or replaced randomly by noisy values. In this work, assuming missing positions are replaced by noisy values, we examine how a connected network of agents, with each one of them subjected to a stream of data with incomplete regression information, can cooperate with each other through local interactions to estimate the underlying model parameters in the presence of missing data. We explain how to adjust the distributed diffusion through (de)regularization in order to eliminate the bias introduced by the incomplete model. We also propose a technique to recursively estimate the (de)regularization parameter and examine the performance of the resulting strategy. We illustrate the results by considering two applications: one dealing with a mental health survey and the other dealing with a household consumption survey.

1411.6719 2026-06-04 math.ST cs.SY eess.SY math.OC stat.AP stat.ME stat.TH

Asymptotically Optimal Discrete Time Nonlinear Filters From Stochastically Convergent State Process Approximations

渐近最优离散时间非线性滤波器:从随机收敛状态过程近似出发

Dionysios S. Kalogerias, Athina P. Petropulu

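AI summary: Studies the approximation of MMSE-optimal nonlinear filters in discrete time using stochastically convergent state process approximations, constructs an approximate filtering operator that converges in a strong sense, and gives a quantitative justification of Egoroff's Theorem for this problem.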

详情
Comments
EXTENDED version of an original paper published in the IEEE Transactions on Signal Processing; 37 pages
AI中文摘要

我们考虑了在离散时间设置下,以最小均方误差(MMSE)意义下逼近最优非线性滤波器的问题,利用随机收敛状态过程近似性质。具体而言,我们考虑由一个(可能非平稳)隐藏随机过程(状态)和另一个条件高斯随机过程(观测)组成的非线性、部分可观测随机系统。在一般假设下,我们证明,给定一个在每个时间步随机收敛到状态过程的近似过程,可以定义一个近似滤波算子,该算子在强且明确的意义下收敛到状态的真实最优非线性滤波器。特别是,收敛性在时间上是紧致的,并在完全 characterize 的测度几乎为1的可测集合上是均匀的,也为此问题提供了Egoroff定理的纯粹量化证明。本文的结果可以作为分析和表征多种启发式近似最优非线性滤波器方法(如基于近似的网格技术)的共同基础,这些方法在各种应用中表现良好。

英文摘要

We consider the problem of approximating nonlinear filters that are optimal in the Minimum Mean Squared Error (MMSE) sense in a discrete time setting, exploiting properties of stochastically convergent state process approximations. More specifically, we consider a class of nonlinear, partially observable stochastic systems, comprising a (possibly nonstationary) hidden stochastic process (the state), observed through another conditionally Gaussian stochastic process (the observations). Under general assumptions, we show that, given an approximating process which, for each time step, is stochastically convergent to the state process, an approximate filtering operator can be defined, which converges to the true optimal nonlinear filter of the state in a strong and well defined sense. In particular, the convergence is compact in time and uniform in a completely characterized measurable set of probability measure almost unity, also providing a purely quantitative justification of Egoroff's Theorem for the problem at hand. The results presented in this paper can form a common basis for the analysis and characterization of a number of heuristic approaches for approximating optimal nonlinear filters, such as approximate grid based techniques, known to perform well in a variety of applications.

1509.08660 2026-06-04 eess.SY cs.MA cs.SY math.OC stat.ML

Censoring Diffusion for Harvesting WSNs

截断扩散用于无线传感器网络的能量采集

Jesus Fernandez-Bes, Rocío Arroyo-Valles, Jerónimo Arenas-García, Jesús Cid-Sueiro

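AI summary: Analyzes energy-harvesting adaptive diffusion networks for distributed estimation and proposes applying a censoring algorithm jointly with the diffusion strategy to use the available energy more efficiently.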

详情
Comments
Accepted in 2015 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2015)
AI中文摘要

本文分析了用于分布式估计问题的能量采集自适应扩散网络。为有效管理可用能量资源,我们提出了一种方案,其中截断算法与扩散策略联合应用。采用一种能量感知的扩散算法变体,并提出了一种新的方法来衡量扩散网络中估计的相关性,以便应用后续的截断机制。仿真结果展示了在能量受限的扩散网络中整合截断方案的潜在优势。

英文摘要

In this paper, we analyze energy-harvesting adaptive diffusion networks for a distributed estimation problem. In order to wisely manage the available energy resources, we propose a scheme where a censoring algorithm is jointly applied over the diffusion strategy. An energy-aware variation of a diffusion algorithm is used, and a new way of measuring the relevance of the estimates in diffusion networks is proposed in order to apply a subsequent censoring mechanism. Simulation results show the potential benefit of integrating censoring schemes in energy-constrained diffusion networks.

1410.7057 2026-06-04 cs.LG cs.DC cs.SY eess.SY stat.ML

Sparse Distributed Learning via Heterogeneous Diffusion Adaptive Networks

稀疏分布式学习 via 异质扩散自适应网络

Bijit Kumar Das, Mrityunjoy Chakraborty, Jerónimo Arenas-García

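AI summary: Proposes distributed estimation of sparse parameter vectors over heterogeneous diffusion adaptive networks, applying convex regularization only at selected nodes to reduce computational cost while retaining the same optimum behavior.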

详情
Comments
4 pages, 1 figure, conference, submitted to IEEE ISCAS 2015, Lisbon, Portugal
AI中文摘要

近年来,关于通过扩散LMS策略在网内进行稀疏参数向量分布式估计的研究已有所涉及。在所有现有工作中,每个网络节点都使用了一些凸正则化方法,以实现优于简单扩散LMS的整体网络性能,尽管这导致了计算开销的增加。本文提供了分析和实验结果,表明凸正则化可以仅应用于某些选定的节点,其余节点保持稀疏性无感知,同时仍能实现与在所有节点上部署凸正则化相同最优行为。由于在部分节点中采用无正则化学习,所提出的方法需要更少的计算成本。我们还提供了一条选择稀疏感知节点的指南和最优正则化参数的闭式表达式。

英文摘要

In-network distributed estimation of sparse parameter vectors via diffusion LMS strategies has been studied and investigated in recent years. In all the existing works, some convex regularization approach has been used at each node of the network in order to achieve an overall network performance superior to that of the simple diffusion LMS, albeit at the cost of increased computational overhead. In this paper, we provide analytical as well as experimental results which show that the convex regularization can be selectively applied only to some chosen nodes, keeping the rest of the nodes sparsity-agnostic, while still enjoying the same optimum behavior as can be realized by deploying the convex regularization at all the nodes. Due to the incorporation of unregularized learning at a subset of nodes, less computational cost is needed in the proposed approach. We also provide a guideline for selection of the sparsity-aware nodes and a closed-form expression for the optimum regularization parameter.

1611.03328 2026-06-04 stat.AP cs.SI cs.SY eess.SY math.ST stat.ML stat.TH

Distributed Estimation and Learning over Heterogeneous Networks

分布式估计与学习在异构网络中的应用

M. Amin Rahimian, Ali Jadbabaie

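AI summary: Studies estimation and learning problems that networked agents face when deciding under uncertainty about an unknown variable, handling heterogeneity in the data and over time with aggregation schemes that combine distributed observations and converge to the globally efficient estimators.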

详情
Comments
In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, 2016
英文摘要

We consider several estimation and learning problems that networked agents face when making decisions given their uncertainty about an unknown variable. Our methods are designed to efficiently deal with heterogeneity in both size and quality of the observed data, as well as heterogeneity over time (intermittence). The goal of the studied aggregation schemes is to efficiently combine the observed data that is spread over time and across several network nodes, accounting for all the network heterogeneities. Moreover, we require no form of coordination beyond the local neighborhood of every network agent or sensor node. The three problems that we consider are (i) maximum likelihood estimation of the unknown given initial data sets, (ii) learning the true model parameter from streams of data that the agents receive intermittently over time, and (iii) minimum variance estimation of a complete sufficient statistic from several data points that the networked agents collect over time. In each case we rely on an aggregation scheme to combine the observations of all agents; moreover, when the agents receive streams of data over time, we modify the update rules to accommodate the most recent observations. In every case, we demonstrate the efficiency of our algorithms by proving convergence to the globally efficient estimators given the observations of all agents. We supplement these results by investigating the rate of convergence and providing finite-time performance guarantees.

1611.02256 2026-06-04 cs.CE cs.NA math.NA stat.CO

A Big-Data Approach to Handle Many Process Variations: Tensor Recovery and Applications

处理大量工艺变化的大数据方法:张量恢复与应用

Zheng Zhang, Tsui-Wei Weng, Luca Daniel

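AI summary: Presents a high-dimensional uncertainty quantification algorithm from a big-data perspective, using tensor recovery to reduce the number of simulation samples and to handle IC, MEMS and photonic problems with over 50 random parameters efficiently.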

详情
Journal ref
IEEE Transactions on Component, Packaging and Manufacturing Technology, 2017
Comments
8 figures
AI中文摘要

制造工艺变化是集成电路、微机电系统和光子电路纳米级设计中产量下降的主要原因。随机谱方法是一种量化工艺变化引起的不确定性的有前途的技术。尽管这些算法在许多设计案例中比蒙特卡洛方法更高效,但它们却受到维度诅咒的限制;即,随着随机参数数量的增加,计算成本增长非常快。为了解决这个具有挑战性的问题,本文提出了一种从大数据视角出发的高维不确定性量化算法。具体来说,我们展示了标准随机格点方法中巨大的模拟样本数(例如,1.5×10^27)可以通过利用高维数据数组的某些隐藏结构减少到非常小的数目(例如,500)。这一想法被形式化为具有稀疏性和低秩约束的张量恢复问题,并通过交替最小化方法进行求解。数值结果表明,我们的方法可以高效地模拟一些IC、MEMS和光子问题,这些问题包含超过50个独立的随机参数,而传统算法只能处理几个随机参数。

英文摘要

Fabrication process variations are a major source of yield degradation in the nano-scale design of integrated circuits (IC), microelectromechanical systems (MEMS) and photonic circuits. Stochastic spectral methods are a promising technique to quantify the uncertainties caused by process variations. Despite their superior efficiency over Monte Carlo for many design cases, these algorithms suffer from the curse of dimensionality; i.e., their computational cost grows very fast as the number of random parameters increases. In order to solve this challenging problem, this paper presents a high-dimensional uncertainty quantification algorithm from a big-data perspective. Specifically, we show that the huge number of (e.g., $1.5 \times 10^{27}$) simulation samples in standard stochastic collocation can be reduced to a very small one (e.g., $500$) by exploiting some hidden structures of a high-dimensional data array. This idea is formulated as a tensor recovery problem with sparse and low-rank constraints; and it is solved with an alternating minimization approach. Numerical results show that our approach can simulate efficiently some ICs, as well as MEMS and photonic problems with over 50 independent random parameters, whereas the traditional algorithm can only handle several random parameters.

1407.1537 2026-06-04 cs.DS cs.LG cs.NA math.NA math.OC stat.ML

Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent

线性耦合:梯度下降与镜像下降的终极统一

Zeyuan Allen-Zhu, Lorenzo Orecchia

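AI summary: Proposes linear coupling, which combines gradient descent and mirror descent, to unify the two methods, reinterpret Nesterov's accelerated gradient methods, and extend to settings where Nesterov's methods cannot be applied.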

详情
Comments
A new section added; polished writing
AI中文摘要

首先阶方法在大规模机器学习中起核心作用。尽管存在多种变体,每种适用于特定问题,几乎所有此类方法本质上都依赖于两种算法步骤:梯度下降,产生原始进展,和镜像下降,产生对偶进展。我们观察到梯度和镜像下降的性能互补,因此通过线性耦合这两种方法可以设计出更快的算法。我们展示了如何通过线性耦合重构Nesterov加速梯度方法,这比Nesterov原始证明提供了更清晰的解释。我们还通过将线性耦合扩展到Nesterov方法无法应用的其他场景,讨论了其威力。

英文摘要

First-order methods play a central role in large-scale machine learning. Even though many variations exist, each suited to a particular problem, almost all such methods fundamentally rely on two types of algorithmic steps: gradient descent, which yields primal progress, and mirror descent, which yields dual progress. We observe that the performances of gradient and mirror descent are complementary, so that faster algorithms can be designed by LINEARLY COUPLING the two. We show how to reconstruct Nesterov's accelerated gradient methods using linear coupling, which gives a cleaner interpretation than Nesterov's original proofs. We also discuss the power of linear coupling by extending it to many other settings that Nesterov's methods cannot apply to.
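
A minimal sketch of the coupling idea, under simplifying assumptions that are not stated in the abstract: the problem is unconstrained, the mirror step uses the Euclidean distance-generating function (so it reduces to a gradient step with a larger step size), and the parameter schedule `alpha = (k + 2) / (2L)`, `tau = 1 / (alpha * L)` is one standard choice. The test function and all names are hypothetical.

```python
def linear_coupling(grad, x0, L, steps):
    """Couple a gradient step (sequence y) with a Euclidean mirror step (sequence z)."""
    y = list(x0)  # gradient-descent iterate: primal progress
    z = list(x0)  # mirror-descent iterate: dual progress
    for k in range(steps):
        alpha = (k + 2) / (2.0 * L)   # mirror-descent step size
        tau = 1.0 / (alpha * L)       # coupling weight
        # The query point is a convex (linear) combination of the two iterates.
        x = [tau * zi + (1.0 - tau) * yi for zi, yi in zip(z, y)]
        g = grad(x)
        y = [xi - gi / L for xi, gi in zip(x, g)]
        z = [zi - alpha * gi for zi, gi in zip(z, g)]
    return y


# Convex quadratic f(x) = sum(0.5 * d_i * x_i^2 - b_i * x_i), smooth with L = max(d_i).
diag = [1.0, 10.0]
b = [1.0, 10.0]


def f(x):
    return sum(0.5 * d * xi * xi - bi * xi for d, xi, bi in zip(diag, x, b))


def grad(x):
    return [d * xi - bi for d, xi, bi in zip(diag, x, b)]


y = linear_coupling(grad, [0.0, 0.0], L=10.0, steps=200)
gap = f(y) - f([1.0, 1.0])  # the minimizer of this quadratic is (1, 1)
print(gap)
```

The long mirror step keeps making progress when gradients are small, while the short gradient step makes progress when they are large, which is the complementarity the abstract refers to.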

1611.01767 2026-06-04 econ.GN math.OC q-fin.EC stat.ML

EM Algorithm and Stochastic Control in Economics

EM算法与经济学中的随机控制

Steven Kou, Xianhua Peng, Xingbo Xu

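AI summary: Proposes the EM-Control algorithm for multi-period stochastic control problems, updating control policies by forward-backward Monte Carlo simulation with monotone performance improvement, and applies it to monopoly pricing of perishable assets and to real business cycle studies.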

详情
Comments
46 pages, 9 figures
AI中文摘要

在经典EM算法广泛用于计算最大似然估计的基础上,本文提出EM-Control (EM-C)算法用于求解多期有限时间 horizon 的随机控制问题。该算法在每个时间周期内通过正向-反向方向的蒙特卡洛模拟依次更新控制策略;换句话说,算法在每个迭代中正向模拟并反向优化。与EM算法类似,EM-C算法在每次迭代中具有性能改进的单调性,从而具有良好的收敛性质。通过求解 perishable assets 的垄断定价和 real business cycle 研究中的随机控制问题,展示了该算法的有效性。

英文摘要

Generalising the idea of the classical EM algorithm that is widely used for computing maximum likelihood estimates, we propose an EM-Control (EM-C) algorithm for solving multi-period finite time horizon stochastic control problems. The new algorithm sequentially updates the control policies in each time period using Monte Carlo simulation in a forward-backward manner; in other words, the algorithm goes forward in simulation and backward in optimization in each iteration. Similar to the EM algorithm, the EM-C algorithm has the monotonicity of performance improvement in each iteration, leading to good convergence properties. We demonstrate the effectiveness of the algorithm by solving stochastic control problems in the monopoly pricing of perishable assets and in the study of real business cycles.

1506.00059 2026-06-04 math.NA cs.NA stat.ML

Saddle-free Hessian-free Optimization

无鞍点的无Hessian优化

Martin Arjovsky

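AI summary: Proposes a new algorithm that addresses saddle point proliferation in nonconvex optimization, a step toward bringing the advantages of Newton's method to high-dimensional nonconvex optimization.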

详情
Comments
NIPS 2016 Workshop on Nonconvex Optimization for Machine Learning: Theory and Practice
AI中文摘要

非凸优化问题(如深度神经网络训练)面临鞍点繁衍现象,即损失函数中存在大量高误差鞍点。二阶方法在凸优化中非常成功,但在深度学习中应用受限,原因包括计算复杂性和方法趋向高误差鞍点。本文提出一种新算法,专门解决这两个问题,为将牛顿法的优势引入非凸优化社区,特别是在高维设置中,提供了关键的第一步。

英文摘要

Nonconvex optimization problems such as the ones in training deep neural networks suffer from a phenomenon called saddle point proliferation. This means that there are a vast number of high error saddle points present in the loss function. Second order methods have been tremendously successful and widely adopted in the convex optimization community, while their usefulness in deep learning remains limited. This is due to two problems: computational complexity and the methods being driven towards the high error saddle points. We introduce a novel algorithm specially designed to solve these two issues, providing a crucial first step to take the widely known advantages of Newton's method to the nonconvex optimization community, especially in high dimensional settings.

1611.01006 2026-06-04 cs.MA cs.SI cs.SY eess.SY math.ST stat.AP stat.TH

Bayesian Heuristics for Group Decisions

基于贝叶斯方法的群体决策启发式模型

M. Amin Rahimian, Ali Jadbabaie

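AI summary: Proposes a heuristic model of group decision-making rooted in the Bayes rule, analyzes how the information structure and probability distributions shape the efficiency of group decisions, and shows that repeated interaction leads to overconfident beliefs and shifts toward extreme choices.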

详情
AI中文摘要

我们提出了一种推理和群体启发式决策模型,该模型基于贝叶斯规则,但避免了部分观测环境下不完整信息的理性推断复杂性,这在群体互动中是典型的特征。我们的模型也符合双过程心理理论:群体成员在相互互动初期行为理性(慢而审慎的模式);在随后的决策阶段,他们依赖于复制第一阶段经验的启发式方法(快而自动的模式)。我们专门将此模型应用于一个群体决策场景,其中私人观察在开始时接收,而代理人旨在根据所有群体成员的总体观察采取最佳行动。我们研究了信息结构和决定概率分布的属性,这些属性决定了所谓的“贝叶斯启发式”结构,即模型中代理遵循的结构。我们还分析了两类线性行动更新和对数线性信念更新的群体决策结果,并展示了由于个体之间的重复互动导致的许多低效现象,导致过度自信的信念以及向极端选择的转变。然而,平衡的常规结构在聚合个体初始信息方面表现出一定的效率。这些结果不仅验证了关于群体决策的一些已知见解,还通过揭示群体衰减过程的额外机理解释以及关于群体互动模型的心理和认知直觉来补充这些见解。

英文摘要

We propose a model of inference and heuristic decision-making in groups that is rooted in the Bayes rule but avoids the complexities of rational inference in partially observed environments with incomplete information, which are characteristic of group interactions. Our model is also consistent with a dual-process psychological theory of thinking: the group members behave rationally at the initiation of their interactions with each other (the slow and deliberative mode); however, in the ensuing decision epochs, they rely on a heuristic that replicates their experiences from the first stage (the fast automatic mode). We specialize this model to a group decision scenario where private observations are received at the beginning, and agents aim to take the best action given the aggregate observations of all group members. We study the implications of the information structure together with the properties of the probability distributions which determine the structure of the so-called "Bayesian heuristics" that the agents follow in our model. We also analyze the group decision outcomes in two classes of linear action updates and log-linear belief updates and show that many inefficiencies arise in group decisions as a result of repeated interactions between individuals, leading to overconfident beliefs as well as choice-shifts toward extremes. Nevertheless, balanced regular structures demonstrate a measure of efficiency in terms of aggregating the initial information of individuals. These results not only verify some well-known insights about group decision-making but also complement these insights by revealing additional mechanistic interpretations for the group declension-process, as well as psychological and cognitive intuitions about the group interaction model.

1602.08595 2026-06-04 math.NA cs.NA math.PR math.ST stat.TH

Fast Gibbs sampling for high-dimensional Bayesian inversion

高维贝叶斯反演中的快速吉布斯采样

Felix Lucka

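AI summary: Extends a Gibbs-type sampler for high-dimensional Bayesian inversion to a wide range of priors, such as general $\ell_p^q$ priors with hard constraints, and demonstrates it on computed examples including the inversion of CT data.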

详情
Comments
submitted to "Inverse Problems"
AI中文摘要

通过贝叶斯推理解决病态反问题近年来引起了广泛关注。与确定性方法相比,后验分布对解的表示可以用于探索和量化其不确定性。在应用中,反解需进一步分析时,这种方法具有显著优势。除了理论进展,各种新计算技术允许对高维后验分布进行采样:在[Lucka2012]中,为线性反问题开发了马尔可夫链蒙特卡罗(MCMC)后验采样器,使用ℓ_1型先验。本文将这种单组件吉布斯型采样器扩展到更广泛的先验分布,如一般的ℓ_𝑝^𝑞先验和附加的硬约束。除了单组件密度的显式参数化形式的快速计算,从这些一维密度中快速、稳健和精确地采样是获得高效算法的关键。我们证明了一种切片采样的推广可以利用其特定结构完成此任务,并通过不同的计算示例展示了所得到的切片-吉布斯采样器的性能。这些新采样器使我们能够首次在具有某些先验的高维场景中进行基于样本的贝叶斯推断,包括使用流行各向同性总变分(TV)先验的CT数据反演。

英文摘要

Solving ill-posed inverse problems by Bayesian inference has recently attracted considerable attention. Compared to deterministic approaches, the probabilistic representation of the solution by the posterior distribution can be exploited to explore and quantify its uncertainties. In applications where the inverse solution is subject to further analysis procedures, this can be a significant advantage. Alongside theoretical progress, various new computational techniques make it possible to sample from very high-dimensional posterior distributions: in [Lucka2012], a Markov chain Monte Carlo (MCMC) posterior sampler was developed for linear inverse problems with $\ell_1$-type priors. In this article, we extend this single component Gibbs-type sampler to a wide range of priors used in Bayesian inversion, such as general $\ell_p^q$ priors with additional hard constraints. Besides a fast computation of the conditional, single component densities in an explicit, parameterized form, a fast, robust and exact sampling from these one-dimensional densities is key to obtain an efficient algorithm. We demonstrate that a generalization of slice sampling can utilize their specific structure for this task and illustrate the performance of the resulting slice-within-Gibbs samplers by different computed examples. These new samplers allow us to perform sample-based Bayesian inference in high-dimensional scenarios with certain priors for the first time, including the inversion of computed tomography (CT) data with the popular isotropic total variation (TV) prior.

1606.01111 2026-06-04 eess.SY cs.SY stat.ML

Property-driven State-Space Coarsening for Continuous Time Markov Chains

基于属性的连续时间马尔可夫链状态空间粗化

Michalis Michaelides, Dimitrios Milios, Jane Hillston, Guido Sanguinetti

AI总结 本文提出基于属性的状态空间粗化方法,通过高斯过程模拟和多维缩放技术,保留轨迹满足逻辑规范的最优状态空间,展示在非平凡示例中的高效性能。

详情
Journal ref
Lecture Notes in Computer Science 9826 (2016) 3-18
Comments
16 pages, 6 figures, 1 table
AI中文摘要

具有大状态空间的动态系统通常难以通过实验充分探索,代价高昂。粗化方法旨在定义更易于分析和探索的简化系统;然而,当前方法大多基于转移率的相似性进行先验状态聚合,而这未必对应轨迹层面的相似行为。本文提出一种粗化系统状态空间的方法,可最优地保持系统轨迹对一组逻辑规范的满足情况。该方法基于高斯过程仿真和多维缩放(一种在非欧空间中最优保持距离的降维技术)。我们展示了如何从属性满足的角度获得系统状态空间的低维可视化,以及如何定义相对于这些规范行为一致的宏状态。该方法在一个非平凡的示例上加以说明,表现出良好的性能和较高的计算效率。

英文摘要

Dynamical systems with large state-spaces are often expensive to thoroughly explore experimentally. Coarse-graining methods aim to define simpler systems which are more amenable to analysis and exploration; most current methods, however, focus on a priori state aggregation based on similarities in transition rates, which is not necessarily reflected in similar behaviours at the level of trajectories. We propose a way to coarsen the state-space of a system which optimally preserves the satisfaction of a set of logical specifications about the system's trajectories. Our approach is based on Gaussian Process emulation and Multi-Dimensional Scaling, a dimensionality reduction technique which optimally preserves distances in non-Euclidean spaces. We show how to obtain low-dimensional visualisations of the system's state-space from the perspective of properties' satisfaction, and how to define macro-states which behave coherently with respect to the specifications. Our approach is illustrated on a non-trivial running example, showing promising performance and high computational efficiency.

1401.0869 2026-06-04 math.OC cs.LG cs.NA math.NA stat.CO stat.ML

Schatten-$p$ Quasi-Norm Regularized Matrix Optimization via Iterative Reweighted Singular Value Minimization

通过迭代加权奇异值最小化进行Schatten-p准范数正则化的矩阵优化

Zhaosong Lu, Yong Zhang

AI总结 本文研究了Schatten-p准范数正则化的矩阵最小化问题,提出了一种迭代加权奇异值最小化方法,证明了其收敛性并展示了其在解质量和速度上的优势。

详情
Comments
This paper has been withdrawn by the author due to major revision and corrections
AI中文摘要

本文研究一般的Schatten-$p$准范数(SPQN)正则化矩阵最小化问题。首先,我们为这类问题引入了一类一阶稳定点,并证明文献[11]中针对SPQN正则化向量最小化问题引入的一阶稳定点,等价于其SPQN正则化矩阵最小化重构形式的一阶稳定点。我们还证明了SPQN正则化矩阵最小化问题的任何局部极小点必为一阶稳定点。此外,我们推导了一阶稳定点的非零奇异值的下界,从而也得到了局部极小点的非零奇异值的下界。随后,我们提出了迭代加权奇异值最小化(IRSVM)方法来求解这些问题,并证明其子问题具有闭式解。与针对SPQN正则化向量最小化问题的类似方法相比,这些方法的收敛性分析困难得多。我们提出了一种新的途径来证明这些方法的收敛性,它利用子问题某个特定解的表达式,避免了求子问题目标函数的Clarke次微分显式表达式这一棘手问题。特别地,我们证明了IRSVM方法生成的序列的任何聚点都是问题的一阶稳定点。计算结果表明,IRSVM方法在解的质量和/或速度上通常优于一些最近提出的先进方法。

英文摘要

In this paper we study general Schatten-$p$ quasi-norm (SPQN) regularized matrix minimization problems. In particular, we first introduce a class of first-order stationary points for them, and show that the first-order stationary points introduced in [11] for an SPQN regularized $vector$ minimization problem are equivalent to those of an SPQN regularized $matrix$ minimization reformulation. We also show that any local minimizer of the SPQN regularized matrix minimization problems must be a first-order stationary point. Moreover, we derive lower bounds for nonzero singular values of the first-order stationary points and hence also of the local minimizers of the SPQN regularized matrix minimization problems. The iterative reweighted singular value minimization (IRSVM) methods are then proposed to solve these problems, whose subproblems are shown to have a closed-form solution. In contrast to the analogous methods for the SPQN regularized $vector$ minimization problems, the convergence analysis of these methods is significantly more challenging. We develop a novel approach to establishing the convergence of these methods, which makes use of the expression of a specific solution of their subproblems and avoids the intricate issue of finding the explicit expression for the Clarke subdifferential of the objective of their subproblems. In particular, we show that any accumulation point of the sequence generated by the IRSVM methods is a first-order stationary point of the problems. Our computational results demonstrate that the IRSVM methods generally outperform some recently developed state-of-the-art methods in terms of solution quality and/or speed.

1610.08127 2026-06-04 cs.LG cs.AI cs.NA math.NA stat.ML

Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation

快速的贝叶斯非负矩阵分解与三因子分解

Thomas Brouwer, Jes Frellsen, Pietro Lio'

AI总结 本文提出一种快速变分贝叶斯算法,用于非负矩阵分解和三因子分解,相比Gibbs采样和非概率方法,该方法在迭代和时间步收敛速度更快,且无需额外样本估计后验。

详情
Comments
NIPS 2016 Workshop on Advances in Approximate Bayesian Inference
AI中文摘要

我们提出了一种快速的变分贝叶斯算法,用于执行非负矩阵分解和三因子分解。我们证明了我们的方法在每次迭代和时间步(墙钟时间)上的收敛速度比Gibbs采样和非概率方法更快,并且不需要额外的样本来估计后验。我们特别展示了对于矩阵三因子分解,收敛具有挑战性,但我们的变分贝叶斯方法提供了一种快速的解决方案,使三因子分解方法能够更有效地使用。

英文摘要

We present a fast variational Bayesian algorithm for performing non-negative matrix factorisation and tri-factorisation. We show that our approach achieves faster convergence per iteration and timestep (wall-clock) than Gibbs sampling and non-probabilistic approaches, and does not require additional samples to estimate the posterior. We show that convergence is particularly difficult for matrix tri-factorisation, but our variational Bayesian approach offers a fast solution, allowing the tri-factorisation approach to be used more effectively.

1606.00119 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Contextual Bandits with Latent Confounders: An NMF Approach

具有潜在混杂因素的上下文老虎机:一种NMF方法

Rajat Sen, Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sanjay Shakkottai

AI总结 本文提出基于NMF的ε-贪心老虎机算法,在学习低维结构与选择最优臂之间取得平衡,给出了秩大于一时带老虎机反馈的在线矩阵补全的首个遗憾保证。

详情
Comments
37 pages, 2 figures
AI中文摘要

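受在线推荐和广告系统的启发,本文考虑一个带有潜在低维混杂因子的随机上下文老虎机因果模型。模型中有$L$个可观测上下文和$K$个臂。观测到的上下文通过一个基数为$m$($m \ll L,K$)的潜在混杂变量影响所获得的奖励。臂的选择和潜在混杂因子共同因果地决定奖励,而观测到的上下文与混杂因子相关。在该模型下,$L \times K$的平均奖励矩阵$\mathbf{U}$可分解为非负因子$\mathbf{A}$($L \times m$)和$\mathbf{W}$($m \times K$)。基于这一点,本文提出ε-贪心NMF-Bandit算法,通过设计一系列干预(选择特定的臂),在学习该低维结构与选择最优臂以最小化遗憾之间取得平衡。假设在每个上下文中最优臂与其余臂之间存在常数间隔,算法在时刻$T$的遗憾为$\mathcal{O}(L\,\mathrm{poly}(m,\log K)\log T)$,而传统上下文老虎机的遗憾为$\mathcal{O}(LK\log T)$。这些保证是在对因子的温和充分条件下得到的,这些条件弱于著名的统计RIP条件。此外,本文提出了一类满足上述充分条件的生成模型,并推导出$\mathcal{O}(Km\log T)$的下界。这是秩大于一时带老虎机反馈的在线矩阵补全的首个遗憾保证。最后,我们在合成数据集和真实数据集上将所提算法与现有最优方法进行了比较。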

英文摘要

Motivated by online recommendation and advertising systems, we consider a causal model for stochastic contextual bandits with a latent low-dimensional confounder. In our model, there are $L$ observed contexts and $K$ arms of the bandit. The observed context influences the reward obtained through a latent confounder variable with cardinality $m$ ($m \ll L,K$). The arm choice and the latent confounder causally determine the reward, while the observed context is correlated with the confounder. Under this model, the $L \times K$ mean reward matrix $\mathbf{U}$ (for each context in $[L]$ and each arm in $[K]$) factorizes into non-negative factors $\mathbf{A}$ ($L \times m$) and $\mathbf{W}$ ($m \times K$). This insight enables us to propose an $ε$-greedy NMF-Bandit algorithm that designs a sequence of interventions (selecting specific arms) that achieves a balance between learning this low-dimensional structure and selecting the best arm to minimize regret. Our algorithm achieves a regret of $\mathcal{O}\left(L\mathrm{poly}(m, \log K) \log T \right)$ at time $T$, as compared to $\mathcal{O}(LK\log T)$ for conventional contextual bandits, assuming a constant gap between the best arm and the rest for each context. These guarantees are obtained under mild sufficiency conditions on the factors that are weaker versions of the well-known Statistical RIP condition. We further propose a class of generative models that satisfy our sufficient conditions, and derive a lower bound of $\mathcal{O}\left(Km\log T\right)$. These are the first regret guarantees for online matrix completion with bandit feedback when the rank is greater than one. We further compare the performance of our algorithm with the state of the art on synthetic and real-world data sets.

1610.08363 2026-06-04 stat.ME cs.NA math.NA stat.CO

Comments on "Bayesian Solution Uncertainty Quantification for Differential Equations" by Chkrebtii, Campbell, Calderhead & Girolami

对Chkrebtii等人论文《关于微分方程的贝叶斯解不确定性量化》的评论

Jon Cockayne

AI总结 本文是对Chkrebtii等人关于ODE和PDE概率求解器的论文的评论,作者对该论文的趣味性和清晰表述表示肯定。

详情
AI中文摘要

我感谢作者们发表的这篇有趣且清晰的论文,讨论了用于ODE和PDE的概率求解器。

英文摘要

I would like to thank the authors for their interesting and very clearly presented paper discussing probabilistic solvers for ODEs and PDEs.

1508.00506 2026-06-04 math.OC cs.LG cs.SY eess.SY math.PR math.ST stat.TH

A variational approach to path estimation and parameter inference of hidden diffusion processes

隐扩散过程路径估计与参数推断的变分方法

Tobias Sutter, Arnab Ganguly, Heinz Koeppl

AI总结 本文提出一种变分方法,用于估计隐扩散过程的路径并推断参数,通过高效推理方案提升对随机微分方程参数的估计精度。

详情
Journal ref
JMLR, volume 17, number 190, year 2016
Comments
37 pages, 2 figures, revised
AI中文摘要

本文考虑一个隐马尔可夫模型,其中信号过程由扩散过程给出,只能通过带噪声的测量间接观测。文章提出了一种变分方法,用于在给定全部观测数据的情况下近似信号过程的隐藏状态。这尤其带来了对信号过程平滑密度的系统性近似。随后,论文展示了如何基于这种隐藏状态近似的变分方法设计高效的推断方案,以估计随机微分方程的未知参数。文末的两个例子展示了所提方法的有效性和准确性。

英文摘要

We consider a hidden Markov model, where the signal process, given by a diffusion, is only indirectly observed through some noisy measurements. The article develops a variational method for approximating the hidden states of the signal process given the full set of observations. This, in particular, leads to systematic approximations of the smoothing densities of the signal process. The paper then demonstrates how an efficient inference scheme, based on this variational approach to the approximation of the hidden states, can be designed to estimate the unknown parameters of stochastic differential equations. Two examples at the end illustrate the efficacy and the accuracy of the presented method.

1610.06506 2026-06-04 math.OC cs.NA math.NA stat.ME

ASTRO-DF: A Class of Adaptive Sampling Trust-Region Algorithms for Derivative-Free Stochastic Optimization

ASTRO-DF:一类用于无导数随机优化的自适应采样信赖域算法

Sara Shashaani, Fatemeh Hashemi, Raghu Pasupathy

AI总结 本文提出ASTRO-DF算法,通过自适应采样平衡采样误差与模型偏差,使无导数随机优化问题的迭代点几乎必然收敛到一阶临界点。

详情
Comments
27 pages, no figures
AI中文摘要

我们考虑无约束优化问题,其中目标函数只能通过蒙特卡洛预言机(oracle)的重复观测得到随机估计,并且假设该预言机不提供函数梯度的直接观测。我们提出ASTRO-DF,即一类无导数信赖域算法,其中随机局部插值模型被迭代地构造、优化和更新。ASTRO-DF中的函数估计和模型构建是自适应的,即蒙特卡洛采样量通过持续监控并平衡采样误差(或方差)与结构误差(或模型偏差)的度量来确定。这种误差平衡旨在使ASTRO-DF中的蒙特卡洛计算量对算法轨迹敏感:当推断迭代点接近临界点时增加采样,远离时减少采样。我们证明,当使用线性或二次随机插值模型时,ASTRO-DF的迭代点几乎必然收敛到一阶临界点。将更复杂的模型(如回归或随机克里金法)与自适应采样相结合的问题值得进一步研究,并可借鉴本文的证明方法。我们推测ASTRO-DF的迭代点能达到典型的蒙特卡洛收敛速率,但目前尚无证明。

英文摘要

We consider unconstrained optimization problems where only "stochastic" estimates of the objective function are observable as replicates from a Monte Carlo oracle. The Monte Carlo oracle is assumed to provide no direct observations of the function gradient. We present ASTRO-DF, a class of derivative-free trust-region algorithms in which a stochastic local interpolation model is constructed, optimized, and updated iteratively. Function estimation and model construction within ASTRO-DF are adaptive in the sense that the extent of Monte Carlo sampling is determined by continuously monitoring and balancing metrics of sampling error (or variance) and structural error (or model bias) within ASTRO-DF. Such balancing of errors is designed to ensure that Monte Carlo effort within ASTRO-DF is sensitive to algorithm trajectory, sampling more whenever an iterate is inferred to be close to a critical point and less when far away. We demonstrate the almost-sure convergence of ASTRO-DF's iterates to a first-order critical point when using linear or quadratic stochastic interpolation models. The question of using more complicated models, e.g., regression or stochastic kriging, in combination with adaptive sampling is worth further investigation and will benefit from the methods of proof presented here. We speculate that ASTRO-DF's iterates achieve the canonical Monte Carlo convergence rate, although a proof remains elusive.

1610.05448 2026-06-04 stat.ML econ.GN math.ST q-fin.EC stat.TH

Generalization error minimization: a new approach to model evaluation and selection with an application to penalized regression

泛化误差最小化:一种新的模型评估和选择方法及其在正则化回归中的应用

Ning Xu, Jian Hong, Timothy C. G. Fisher

AI总结 本文从泛化能力角度研究模型评估与选择,提出泛化误差最小化框架,统一了正则化回归估计器,并探讨了其在模型选择中的应用。

详情
Comments
The theoretical generalization and extension of arXiv:1606.00142 and arXiv:1609.03344
AI中文摘要

我们从泛化能力(GA)的角度研究模型评估和模型选择,泛化能力是指模型对来自同一总体的新样本的结果进行预测的能力。我们认为GA是正式处理模型外部有效性问题的一种途径。在某个样本上估计得到的模型,其GA可以用经验样本外误差来度量,称为泛化误差(GE)。我们推导了GE的上界,这些上界依赖于样本量、模型复杂度和损失函数的分布,可用于事前评估模型的GA。我们提出将泛化误差最小化(GEM)作为模型选择的框架。利用GEM,我们能够在同一组假设下统一一大类惩罚回归估计量,包括lasso、ridge和bridge。我们建立了GEM估计量在$n \geqslant p$和$n < p$两种情形下的有限样本性质和渐近性质(包括$\mathcal{L}_2$一致性),并推导了惩罚回归估计与相应无惩罚回归估计之间的$\mathcal{L}_2$距离。在实践中,GEM可通过验证或交叉验证实现。我们证明GE上界可用于选择$K$折交叉验证的最优折数。我们提出$R^2$的一个变体$GR^2$作为GA的度量,它同时考虑样本内和样本外的拟合优度。我们用模拟实验展示了主要结果。

英文摘要

We study model evaluation and model selection from the perspective of generalization ability (GA): the ability of a model to predict outcomes in new samples from the same population. We believe that GA is one way to formally address concerns about the external validity of a model. The GA of a model estimated on a sample can be measured by its empirical out-of-sample errors, called the generalization errors (GE). We derive upper bounds for the GE, which depend on sample sizes, model complexity and the distribution of the loss function. The upper bounds can be used to evaluate the GA of a model ex ante. We propose using generalization error minimization (GEM) as a framework for model selection. Using GEM, we are able to unify a large class of penalized regression estimators, including lasso, ridge and bridge, under the same set of assumptions. We establish finite-sample and asymptotic properties (including $\mathcal{L}_2$-consistency) of the GEM estimator for both the $n \geqslant p$ and the $n < p$ cases. We also derive the $\mathcal{L}_2$-distance between the penalized and corresponding unpenalized regression estimates. In practice, GEM can be implemented by validation or cross-validation. We show that the GE bounds can be used for selecting the optimal number of folds in $K$-fold cross-validation. We propose a variant of $R^2$, the $GR^2$, as a measure of GA, which considers both in-sample and out-of-sample goodness of fit. Simulations are used to demonstrate our key results.

1509.05009 2026-06-04 cs.NE cs.LG cs.NA math.NA stat.ML

On the Expressive Power of Deep Learning: A Tensor Analysis

深度学习表达能力的分析:张量视角

Nadav Cohen, Or Sharir, Amnon Shashua

AI总结 本文通过张量分解理论分析深度学习的表达能力,证明深度网络在多项式规模下实现的函数需浅层网络指数规模才能近似。

详情
Journal ref
29th Annual Conference on Learning Theory, pp. 698-728, 2016
AI中文摘要

长期以来,人们推测适用于具有组合性质的数据(如文本或图像)的假设空间,用深度分层网络表示可能比用浅层网络更高效。尽管有大量实证证据支持这一观点,但迄今为止的理论依据仍然有限。特别是,它们没有考虑卷积网络(迄今最成功的深度学习架构)的局部性、共享和池化结构。本文基于算术电路推导出一种深度网络架构,它天然地采用局部性、共享和池化。我们建立了这类网络与分层张量分解之间的等价性,并表明浅层网络对应于CP(秩-1)分解,而深层网络对应于分层Tucker分解。利用测度论和矩阵代数工具,我们证明除一个可忽略的集合外,所有可由多项式规模的深度网络实现的函数,都需要指数规模的浅层网络才能实现(甚至近似)。由于对数空间计算可将我们的网络转化为SimNets,该结果直接适用于一种展现出良好实证性能的深度学习架构。本文提出的构造和理论为深度学习社区采用的各种实践和想法提供了新的见解。

英文摘要

It has long been conjectured that hypothesis spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical networks than with shallow ones. Despite the vast empirical evidence supporting this belief, theoretical justifications to date are limited. In particular, they do not account for the locality, sharing and pooling constructs of convolutional networks, the most successful deep learning architecture to date. In this work we derive a deep network architecture based on arithmetic circuits that inherently employs locality, sharing and pooling. An equivalence between the networks and hierarchical tensor factorizations is established. We show that a shallow network corresponds to CP (rank-1) decomposition, whereas a deep network corresponds to Hierarchical Tucker decomposition. Using tools from measure theory and matrix algebra, we prove that, apart from a negligible set, all functions that can be implemented by a deep network of polynomial size require exponential size in order to be realized (or even approximated) by a shallow network. Since log-space computation transforms our networks into SimNets, the result applies directly to a deep learning architecture demonstrating promising empirical performance. The construction and theory developed in this paper shed new light on various practices and ideas employed by the deep learning community.

1610.02106 2026-06-04 stat.CO cs.NA math.NA

Numerical approximation of the Frobenius-Perron operator using the finite volume method

使用有限体积法对弗罗贝尼乌斯-佩龙算子进行数值近似

Richard A. Norton, Colin Fox, Malcolm E. Morrison

AI总结 本文将有限体积法应用于概率演化的连续性方程,得到弗罗贝尼乌斯-佩龙算子的有限维近似,利用Courant-Friedrichs-Lewy条件保证马尔可夫性质,并借助有限体积法的收敛理论保证离散算子收敛到连续算子。

详情
Comments
15 pages, 2 figures
AI中文摘要

我们将有限体积法应用于描述概率演化的连续性方程,得到弗罗贝尼乌斯-佩龙算子的有限维近似。Courant-Friedrichs-Lewy条件确保该近似满足马尔可夫性质,而有限体积法已有的收敛理论保证当网格尺寸趋于零时离散算子收敛到连续算子。我们通过一个计算示例展示了该近似的性质:在观测导致多峰分布的情形下,对低维机械系统的状态进行序贯推断。

英文摘要

We develop a finite-dimensional approximation of the Frobenius-Perron operator using the finite volume method applied to the continuity equation for the evolution of probability. A Courant-Friedrichs-Lewy condition ensures that the approximation satisfies the Markov property, while existing convergence theory for the finite volume method guarantees convergence of the discrete operator to the continuous operator as mesh size tends to zero. Properties of the approximation are demonstrated in a computed example of sequential inference for the state of a low-dimensional mechanical system when observations give rise to multi-modal distributions.
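The link between the Courant-Friedrichs-Lewy (CFL) condition and the Markov property can be illustrated on the simplest possible case. The sketch below is not the authors' scheme: it assumes a 1D periodic domain, a constant positive velocity, and a first-order upwind flux, and the function names are hypothetical. Under those assumptions the one-step update matrix is column-stochastic exactly when the Courant number is at most 1.

```python
def upwind_transition_matrix(n_cells, velocity, dt, dx):
    """One first-order upwind finite volume step on a periodic 1D grid.

    Returns P such that rho_new[i] = sum_j P[i][j] * rho[j].
    Assumes a constant velocity > 0 (an illustrative simplification).
    """
    c = velocity * dt / dx  # Courant number
    P = [[0.0] * n_cells for _ in range(n_cells)]
    for j in range(n_cells):
        P[j][j] += 1.0 - c              # fraction of mass staying in cell j
        P[(j + 1) % n_cells][j] += c    # fraction moving to the downstream cell
    return P


def is_markov(P, tol=1e-12):
    """Check that P has non-negative entries and columns summing to one."""
    n = len(P)
    nonneg = all(P[i][j] >= -tol for i in range(n) for j in range(n))
    col_sums_ok = all(
        abs(sum(P[i][j] for i in range(n)) - 1.0) <= tol for j in range(n)
    )
    return nonneg and col_sums_ok


def apply(P, rho):
    """Advance the cell-averaged density rho by one step."""
    n = len(P)
    return [sum(P[i][j] * rho[j] for j in range(n)) for i in range(n)]


# Courant number 0.5 satisfies the CFL condition; Courant number 2 violates it.
P_ok = upwind_transition_matrix(8, velocity=1.0, dt=0.05, dx=0.1)
P_bad = upwind_transition_matrix(8, velocity=1.0, dt=0.2, dx=0.1)
rho0 = [1.0] + [0.0] * 7
rho1 = apply(P_ok, rho0)
```

Both matrices conserve total probability, but only `P_ok` keeps every entry non-negative, which is what makes it a valid Markov transition matrix.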

1509.01404 2026-06-04 math.NA cs.CV cs.LG cs.NA math.OC stat.ML

Coordinate Descent Methods for Symmetric Nonnegative Matrix Factorization

对称非负矩阵分解的坐标下降方法

Arnaud Vandaele, Nicolas Gillis, Qi Lei, Kai Zhong, Inderjit Dhillon

AI总结 本文提出高效的坐标下降方法用于对称非负矩阵分解,适用于大规模稀疏矩阵,通过实验证明其在合成和实际数据集上的有效性。

详情
Journal ref
IEEE Transactions on Signal Processing 64 (21), pp. 5571-5584, 2016
Comments
25 pages, 5 figures, 7 tables. Main changes: comparison with another symNMF algorithm (namely, BetaSNMF), and correction of an error in the convergence proof
AI中文摘要

给定一个对称非负矩阵$A$,对称非负矩阵分解(symNMF)是指寻找一个非负矩阵$H$(其列数通常远少于$A$),使得$A \approx HH^T$。symNMF可用于数据分析,尤其是各种聚类任务。本文提出简单且非常高效的坐标下降方案来求解该问题,能够处理大规模稀疏输入矩阵。我们在合成数据集和真实数据集上展示了所提方法的有效性,结果表明其性能优于近期的先进方法。

英文摘要

Given a symmetric nonnegative matrix $A$, symmetric nonnegative matrix factorization (symNMF) is the problem of finding a nonnegative matrix $H$, usually with far fewer columns than $A$, such that $A \approx HH^T$. SymNMF can be used for data analysis and in particular for various clustering tasks. In this paper, we propose simple and very efficient coordinate descent schemes that solve this problem and can handle large and sparse input matrices. The effectiveness of our methods is illustrated on synthetic and real-world data sets, and we show that they perform favorably compared to recent state-of-the-art methods.

1411.7245 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

Heuristics for Exact Nonnegative Matrix Factorization

精确非负矩阵分解的启发式方法

Arnaud Vandaele, Nicolas Gillis, François Glineur, Daniel Tuyttens

AI总结 本文提出两种启发式方法用于精确非负矩阵分解,通过模拟退火和贪心随机自适应搜索启发式方法,展示了其在多种非负矩阵类别的应用优势,并探讨了非负秩的行为特性。

详情
Journal ref
Journal of Global Optimization 65 (2), pp 369-400, 2016
Comments
32 pages, 2 figures, 16 tables
AI中文摘要

精确非负矩阵分解(精确NMF)问题如下:给定一个$m \times n$的非负矩阵$X$和分解秩$r$,如果可能,寻找$m \times r$的非负矩阵$W$和$r \times n$的非负矩阵$H$,使得$X = WH$。本文提出两种用于精确NMF的启发式方法,一种受模拟退火启发,另一种受贪心随机自适应搜索过程启发。我们表明这两种启发式方法能够为若干类非负矩阵(即线性欧氏距离矩阵、松弛矩阵、唯一不相交矩阵和随机生成矩阵)计算精确非负分解,从而显示出它们相对于标准多起点策略的优势。我们还考虑了两种启发式方法的混合,以结合二者的优点。最后,我们讨论如何利用这些启发式方法来理解非负秩(即存在精确NMF的最小分解秩)的性质。特别地,我们否定了一个关于Kronecker积非负秩的猜想,给出了一般$n$边形扩展复杂度的新上界,并对(i)正$n$边形的扩展复杂度和(ii)相关多面体松弛矩阵的某个子矩阵的非负秩的精确值提出了猜想。

英文摘要

The exact nonnegative matrix factorization (exact NMF) problem is the following: given an $m$-by-$n$ nonnegative matrix $X$ and a factorization rank $r$, find, if possible, an $m$-by-$r$ nonnegative matrix $W$ and an $r$-by-$n$ nonnegative matrix $H$ such that $X = WH$. In this paper, we propose two heuristics for exact NMF, one inspired by simulated annealing and the other by the greedy randomized adaptive search procedure. We show that these two heuristics are able to compute exact nonnegative factorizations for several classes of nonnegative matrices (namely, linear Euclidean distance matrices, slack matrices, unique-disjointness matrices, and randomly generated matrices) and as such demonstrate their superiority over standard multi-start strategies. We also consider a hybridization between these two heuristics that allows us to combine the advantages of both methods. Finally, we discuss the use of these heuristics to gain insight into the behavior of the nonnegative rank, i.e., the minimum factorization rank such that an exact NMF exists. In particular, we disprove a conjecture on the nonnegative rank of a Kronecker product, propose a new upper bound on the extension complexity of generic $n$-gons and conjecture the exact value of (i) the extension complexity of regular $n$-gons and (ii) the nonnegative rank of a submatrix of the slack matrix of the correlation polytope.

1610.01230 2026-06-04 math.NA cs.NA math.ST stat.TH

A Local Inverse Formula and a Factorization

局部逆公式与因子分解

Gilbert Strang, Shev MacNamara

AI总结 研究局部逆公式及其在稀疏矩阵中的应用,通过矩阵因子分解解释其在机器学习和图模型中的广泛用途。

详情
AI中文摘要

当矩阵具有带状逆时,存在一个显著的公式可快速计算该逆,仅需原矩阵的局部信息。此局部逆公式更广泛地适用于具有弦图或完美消去模式稀疏结构的矩阵。该公式历史悠久,可追溯至协方差矩阵缺失数据的补全问题。最大熵估计、对数行列式、秩条件、空性定理和小波均与该公式密切相关,并在机器学习和图模型中广泛应用。本文描述了该局部逆公式,并解释其作为矩阵因子分解的理解。

英文摘要

When a matrix has a banded inverse there is a remarkable formula that quickly computes that inverse, using only local information in the original matrix. This local inverse formula holds more generally, for matrices with sparsity patterns that are examples of chordal graphs or perfect eliminators. The formula has a long history going back at least as far as the completion problem for covariance matrices with missing data. Maximum entropy estimates, log-determinants, rank conditions, the Nullity Theorem and wavelets are all closely related, and the formula has found wide applications in machine learning and graphical models. We describe that local inverse and explain how it can be understood as a matrix factorization.
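For the simplest banded case, a tridiagonal inverse, the local formula can be checked by hand: add the inverses of the overlapping 2-by-2 diagonal blocks of the matrix and subtract the inverses of the 1-by-1 overlaps. The sketch below is a minimal illustration under that assumption. The test matrix with entries min(i, j) is chosen here because its inverse is known to be tridiagonal, and the function names are hypothetical. Exact rational arithmetic is used so the check involves no rounding.

```python
from fractions import Fraction


def inv2(block):
    """Exact inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = block
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def local_inverse_tridiagonal(A):
    """Assemble the inverse of A from local blocks.

    Valid under the assumption that the inverse of A is tridiagonal.
    """
    n = len(A)
    inv = [[Fraction(0)] * n for _ in range(n)]
    # Add the inverses of the overlapping 2x2 diagonal blocks.
    for k in range(n - 1):
        b = inv2([[A[k][k], A[k][k + 1]], [A[k + 1][k], A[k + 1][k + 1]]])
        for i in range(2):
            for j in range(2):
                inv[k + i][k + j] += b[i][j]
    # Subtract the inverses of the 1x1 overlaps (interior diagonal entries).
    for k in range(1, n - 1):
        inv[k][k] -= 1 / A[k][k]
    return inv


def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


n = 5
# A[i][j] = min(i, j) with 1-based indices; its inverse is tridiagonal.
A = [[Fraction(min(i, j) + 1) for j in range(n)] for i in range(n)]
A_inv = local_inverse_tridiagonal(A)
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
```

Only entries of `A` within one step of the diagonal are ever read, which is the sense in which the formula is local.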

1606.01316 2026-06-04 stat.ML cs.DS cs.IT cs.NA math.IT math.NA math.OC

Provable Burer-Monteiro factorization for a class of norm-constrained matrix problems

一类范数约束矩阵问题的可证明Burer-Monteiro分解

Dohyung Park, Anastasios Kyrillidis, Srinadh Bhojanapalli, Constantine Caramanis, Sujay Sanghavi

AI总结 本文研究强凸目标下低秩矩阵问题的投影梯度下降法,采用Burer-Monteiro分解隐式约束低秩性,并在半正定约束和特定矩阵范数约束下证明了局部线性收敛性。

详情
Comments
28 pages
AI中文摘要

我们研究目标函数强凸的低秩矩阵问题上的投影梯度下降方法。我们采用Burer-Monteiro分解隐式地强制低秩性;这种分解使目标函数变为非凸。我们关注同时包含半正定(PSD)约束和特定矩阵范数约束的约束集。此类约束出现在量子态层析和相位恢复应用中。我们证明非凸投影梯度下降在因子空间中具有局部线性收敛性。我们的理论建立在一个新的下降引理之上,该引理非平凡地推广了近期关于无约束问题的结果。所得算法称为投影因子梯度下降(ProjFGD),在量子态层析和稀疏相位恢复应用中表现出优于现有先进方法的性能。

英文摘要

We study the projected gradient descent method on low-rank matrix problems with a strongly convex objective. We use the Burer-Monteiro factorization approach to implicitly enforce low-rankness; such factorization introduces non-convexity in the objective. We focus on constraint sets that include both positive semi-definite (PSD) constraints and specific matrix norm-constraints. Such criteria appear in quantum state tomography and phase retrieval applications. We show that non-convex projected gradient descent enjoys local linear convergence in the factored space. We build our theory on a novel descent lemma that non-trivially extends recent results on the unconstrained problem. The resulting algorithm is Projected Factored Gradient Descent, abbreviated as ProjFGD, and shows superior performance compared to the state of the art on quantum state tomography and sparse phase retrieval applications.

1609.09660 2026-06-04 eess.SY cs.LG cs.SY stat.ML

On Identification of Sparse Multivariable ARX Model: A Sparse Bayesian Learning Approach

关于稀疏多变量ARX模型识别:一种稀疏贝叶斯学习方法

J. Jin, Y. Yuan, W. Pan, D. L. T. Pham, C. J. Tomlin, A. Webb, J. Goncalves

AI总结 本文提出一种基于稀疏贝叶斯学习的方法,用于识别稀疏多变量ARX模型的布尔结构和节点间动态,无需先验知识,通过最大后验估计结合复杂性和组稀疏性惩罚。

详情
AI中文摘要

本文首先考虑由多变量ARX模型描述的稀疏线性时不变网络的辨识问题。此类模型结构相对简单,因此被用作基准以推动进一步研究。在网络可辨识性得到保证的前提下,本文提出一种辨识方法,可同时推断网络的布尔结构和节点间的内部动态。辨识直接基于数据进行,不需要关于系统的任何先验知识,包括其阶数。所提方法利用最大后验估计(MAP)求解辨识问题,但对复杂度采用不可分离的惩罚项,同时涵盖元素稀疏性(非零连接的阶数)和组稀疏性(网络拓扑)。这类方法在压缩感知(CS)中应用广泛,被称为稀疏贝叶斯学习(SBL)。随后,本文提出一种结合稀疏贝叶斯与组稀疏贝叶斯的新方案,以高效求解该问题。所得算法在形式上与标准的稀疏组Lasso(SGL)相似,而当噪声方差已知时,它简化为精确的重加权SGL。该方法及所开发的工具箱可用于推断众多领域中的网络,包括系统生物学中的信号网络和基因调控网络。

英文摘要

This paper begins by considering the identification of sparse linear time-invariant networks described by multivariable ARX models. Such models possess a relatively simple structure and are thus used as a benchmark to promote further research. With identifiability of the network guaranteed, this paper presents an identification method that infers both the Boolean structure of the network and the internal dynamics between nodes. Identification is performed directly from data without any prior knowledge of the system, including its order. The proposed method solves the identification problem using maximum a posteriori (MAP) estimation but with inseparable penalties for complexity, both in terms of element sparsity (order of nonzero connections) and group sparsity (network topology). Such an approach is widely applied in Compressive Sensing (CS) and known as Sparse Bayesian Learning (SBL). We then propose a novel scheme that combines sparse Bayesian and group sparse Bayesian learning to efficiently solve the problem. The resulting algorithm has a form similar to the standard Sparse Group Lasso (SGL), while with known noise variance it simplifies to exact re-weighted SGL. The method and the developed toolbox can be applied to infer networks from a wide range of fields, including systems biology applications such as signaling and genetic regulatory networks.

1609.09154 2026-06-04 cs.DC cs.NA math.NA stat.ML

MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

MPI-FAUN: 一种基于MPI的交替更新非负矩阵分解框架

Ramakrishnan Kannan, Grey Ballard, Haesun Park

AI总结 本文提出MPI-FAUN框架,用于高效解决大规模非负矩阵分解问题,通过交替求解非负最小二乘子问题,实现并行计算,支持多种算法并能处理大规模密集和稀疏数据集。

详情
Comments
arXiv admin note: text overlap with arXiv:1509.09313
AI中文摘要

非负矩阵分解(NMF)是指对给定的输入矩阵$A$,确定两个非负低秩因子$W$和$H$,使得$A \approx WH$。NMF在许多领域都是有用的工具,例如文本挖掘中的主题建模、视频分析中的背景分离以及社交网络中的社区检测。尽管NMF在数据挖掘领域很受欢迎,但目前缺乏求解大数据集问题的高效并行算法。本文的主要贡献是为一大类NMF算法提出一个新的高性能并行计算框架,这类算法迭代地交替求解关于$W$和$H$的非负最小二乘(NLS)子问题。该框架将数据矩阵和因子矩阵保存在内存中(分布在各处理器上),使用MPI进行处理器间通信,并且在稠密情形下(在温和假设下)可证明地最小化通信开销。该框架十分灵活,能够利用多种NMF和NLS算法,包括乘法更新、分层交替最小二乘和块主元旋转法。我们的实现使我们能够在规模从数亿到数十亿的大型稠密和稀疏数据矩阵上对不同算法进行基准测试和比较。我们展示了算法的可扩展性,并与基线实现进行比较,显示出显著的性能提升。实验所用的代码和数据集可在网上获取。

英文摘要

Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors $W$ and $H$, for the given input matrix $A$, such that $A \approx W H$. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solve alternating non-negative least squares (NLS) subproblems for $W$ and $H$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices with sizes ranging from a few hundred million to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.
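To make the alternating-update pattern concrete, here is a small serial sketch of the Multiplicative Update rule, one of the algorithms the abstract names. It is not the MPI-FAUN implementation: there is no distribution or communication, the matrices are plain nested lists, and the function names, the test matrix, and the `eps` safeguard in the denominators are assumptions made for illustration.

```python
import random


def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def transpose(X):
    return [list(row) for row in zip(*X)]


def frobenius_error(A, W, H):
    """Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum(
        (A[i][j] - WH[i][j]) ** 2 for i in range(len(A)) for j in range(len(A[0]))
    ) ** 0.5


def nmf_multiplicative(A, rank, n_iter=200, seed=0, eps=1e-12):
    """Alternate multiplicative updates of H and W for A ~ W H."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    errors = [frobenius_error(A, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T A) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(rank)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(m)]
        errors.append(frobenius_error(A, W, H))
    return W, H, errors


# A small non-negative matrix of rank 2 (hypothetical test data).
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 3.0, 3.0], [2.0, 5.0, 3.0]]
W, H, errors = nmf_multiplicative(A, rank=2)
```

Each half-step only needs the small products `W^T A`, `W^T W`, `A H^T` and `H H^T`, which is the kind of computation a parallel framework would distribute across processors.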

1508.02865 2026-06-04 eess.SY cs.SY stat.ML

Maximum Entropy Vector Kernels for MIMO system identification

最大熵向量核用于MIMO系统识别

Giulia Prando, Gianluigi Pillonetto, Alessandro Chiuso

AI总结 本文提出基于最大熵原理的向量核方法,用于MIMO系统识别,通过Hankel矩阵结构控制模型复杂度、稳定性与平滑度,采用SGP算法优化边际似然,验证了方法的有效性。

详情
AI中文摘要

近期的研究将线性系统辨识表述为非参数正则化反问题。这些方法依赖$\ell_2$型正则化来刻画待估计脉冲响应的稳定性和平滑性,已被证明相对于经典参数方法具有竞争力。本文采用最大熵论证,推导出由向量值核导出的新的$\ell_2$惩罚项;为此我们利用Hankel矩阵的结构,从而同时控制所辨识模型的复杂度(以McMillan阶数度量)、稳定性和平滑性。作为特殊情况,我们恢复了平方块Hankel矩阵上的核范数惩罚。与以往关于重加权核范数惩罚的文献不同,我们的核由少量超参数描述,这些超参数通过边际似然最大化迭代更新;对核结构的约束起到(超)正则化器的作用,有助于控制估计量的有效自由度。为优化边际似然,我们改造了缩放梯度投影(SGP)算法,并证明其计算开销显著低于其他现成的一阶和二阶优化方法。本文还在多个蒙特卡洛研究中与许多先进方法进行了广泛比较,验证了所提方法的有效性。

英文摘要

Recent contributions have framed linear system identification as a nonparametric regularized inverse problem. Relying on $\ell_2$-type regularization which accounts for the stability and smoothness of the impulse response to be estimated, these approaches have been shown to be competitive with respect to classical parametric methods. In this paper, adopting Maximum Entropy arguments, we derive a new $\ell_2$ penalty from a vector-valued kernel; to do so we exploit the structure of the Hankel matrix, thus controlling at the same time the complexity (measured by the McMillan degree), stability and smoothness of the identified models. As a special case we recover the nuclear norm penalty on the squared block Hankel matrix. In contrast with previous literature on reweighted nuclear norm penalties, our kernel is described by a small number of hyper-parameters, which are iteratively updated through marginal likelihood maximization; constraining the structure of the kernel acts as a (hyper)regularizer which helps control the effective degrees of freedom of our estimator. To optimize the marginal likelihood we adapt a Scaled Gradient Projection (SGP) algorithm which is proved to be significantly computationally cheaper than other first and second order off-the-shelf optimization methods. The paper also contains an extensive comparison with many state-of-the-art methods on several Monte-Carlo studies, which confirms the effectiveness of our procedure.

1502.01956 2026-06-04 eess.SY cs.SY math.DS stat.ML

Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem

双时间尺度随机递归包含及其在拉格朗日对偶问题中的应用

Arunselvan Ramaswamy, Shalabh Bhatnagar

AI总结 本文提出分析双时间尺度随机逼近算法渐近行为的框架,包括具有多值均值场的算法,并应用于优化理论中的拉格朗日对偶问题分析。

详情
Journal ref
Stochastics 2016
AI中文摘要

本文提出一个框架,用于分析包含多值均值场的双时间尺度随机逼近算法的渐近行为。该框架基于Borkar和Perkins及Leslie的工作,比Perkins & Leslie的同步双时间尺度框架更一般,但所涉及的假设易于验证。作为应用,本文使用该框架分析优化理论中拉格朗日对偶问题对应的双时间尺度随机逼近算法。

英文摘要

In this paper we present a framework to analyze the asymptotic behavior of two timescale stochastic approximation algorithms, including those with set-valued mean fields. This paper builds on the works of Borkar and Perkins & Leslie. The framework presented herein is more general than the synchronous two timescale framework of Perkins & Leslie; however, the assumptions involved are easily verifiable. As an application, we use this framework to analyze the two timescale stochastic approximation algorithm corresponding to the Lagrangian dual problem in optimization theory.
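A toy instance may help fix the idea of a two timescale scheme for a Lagrangian dual problem. The sketch below is not taken from the paper: the problem (minimize x^2 subject to 1 - x <= 0), the step-size exponents, the noise level and the function name are all assumptions chosen for illustration. The primal variable moves on the fast timescale and the multiplier on the slow one, so the multiplier effectively sees a primal variable that has already converged.

```python
import random


def two_timescale_dual(n_steps=50000, noise_std=0.1, seed=0):
    """Noisy primal-dual iteration for: minimize x^2 subject to 1 - x <= 0.

    Lagrangian: L(x, lam) = x^2 + lam * (1 - x); saddle point at x = 1, lam = 2.
    """
    rng = random.Random(seed)
    x, lam = 0.0, 0.0
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n ** 0.6  # fast step size (primal variable)
        b_n = 1.0 / n         # slow step size (multiplier); b_n / a_n -> 0
        # Fast timescale: noisy gradient descent on L in x.
        grad_x = 2.0 * x - lam + rng.gauss(0.0, noise_std)
        x -= a_n * grad_x
        # Slow timescale: noisy projected gradient ascent on L in lam.
        constraint = 1.0 - x + rng.gauss(0.0, noise_std)
        lam = max(0.0, lam + b_n * constraint)
    return x, lam


x_final, lam_final = two_timescale_dual()
```

With these settings the iterates should settle near the saddle point (x, lam) = (1, 2). The separation of step sizes, with `b_n / a_n` tending to zero, is what distinguishes the scheme from a single timescale primal-dual update.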

1502.01953 2026-06-04 eess.SY cs.SY math.DS stat.ML

A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions

Borkar-Meyn定理对随机递归包含的推广

Arunselvan Ramaswamy, Shalabh Bhatnagar

AI总结 本文扩展了Borkar-Meyn定理,以涵盖均场为微分包含的情况,提出了两种充分条件保证递归包含的稳定性和收敛性,并讨论了近似漂移问题的解法。

详情
AI中文摘要

本文将Borkar和Meyn的稳定性定理扩展到均场为微分包含的情况。提出了两种不同的充分条件,保证了随机递归包含的稳定性和收敛性。本文的工作借鉴了Benaim、Hofbauer、Sorin以及Borkar和Meyn的研究。作为主要定理的一个推论,Borkar和Meyn定理的自然推广得以得出。此外,Borkar和Meyn的原始定理在略微放松的假设下也成立。最后,作为主要定理的应用,我们讨论了近似漂移问题的解法。

英文摘要

In this paper the stability theorem of Borkar and Meyn is extended to include the case when the mean field is a differential inclusion. Two different sets of sufficient conditions are presented that guarantee the stability and convergence of stochastic recursive inclusions. Our work builds on the works of Benaim, Hofbauer and Sorin as well as Borkar and Meyn. As a corollary to one of the main theorems, a natural generalization of the Borkar and Meyn Theorem follows. In addition, the original theorem of Borkar and Meyn is shown to hold under slightly relaxed assumptions. Finally, as an application to one of the main theorems we discuss a solution to the approximate drift problem.

1609.03240 2026-06-04 stat.ML cs.IT cs.LG cs.NA math.IT math.NA math.OC

Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach

非正方形矩阵感知:通过Burer-Monteiro方法避免虚假局部极小值

Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

AI总结 本文在受限等距性质假设下研究非正方形矩阵感知问题,通过非凸方法证明矩阵分解在RIP条件下不引入虚假局部极小值。

详情
Comments
14 pages, no figures
AI中文摘要

我们考虑在受限等距性质(RIP)假设下的非正方形矩阵感知问题。我们聚焦于非凸形式,其中任何秩为r的矩阵X∈R^{m×n}表示为UV^T,其中U∈R^{m×r}和V∈R^{n×r}。在本文中,我们补充了最近关于类似PSD设置的非凸几何的发现[5],并证明在RIP条件下,矩阵分解不会引入任何虚假局部极小值。

英文摘要

We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions. We focus on the non-convex formulation, where any rank-$r$ matrix $X \in \mathbb{R}^{m \times n}$ is represented as $UV^\top$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$. In this paper, we complement recent findings on the non-convex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP.

1609.06942 2026-06-04 stat.ML cs.LG cs.SY eess.SY math.PR math.ST stat.TH

Randomized Independent Component Analysis

随机独立成分分析

Matan Sela, Ron Kimmel

AI总结 本文提出基于随机特征的随机广义方差和随机典型相关作为替代措施,以降低计算复杂度并提高ICA分解效率。

详情
Comments
Accepted to ICSEE 2016
AI中文摘要

独立成分分析(ICA)是一种从未知线性组合的源信号观测中恢复统计独立信号的方法。一些最准确的ICA分解方法需要搜索最小化不同互信息近似值的逆变换,互信息是随机向量统计独立性的度量。两种这样的近似是核广义方差或核典型相关,已被证明能达到ICA方法的最高性能。然而,仅计算这些度量所需的计算努力与样本大小成立方关系。因此,优化它们在空间和时间上都变得更加计算密集。在此,我们提出了一种基于样本随机特征的替代新度量——随机广义方差和随机典型相关。所提出的替代措施的计算复杂度与样本大小成线性关系,并提供了可控的核非随机版本的近似。我们还展示了优化所提出的统计特性可以在数量级上比核方法更快地达到可比的分离误差。

英文摘要

Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance and the Kernel Canonical Correlation, which have been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples: the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size, and they provide a controllable approximation of their Kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error an order of magnitude faster than Kernel-based measures.

1602.02164 2026-06-04 stat.ML cs.LG cs.NA math.NA

A Note on Alternating Minimization Algorithm for the Matrix Completion Problem

关于矩阵补全问题交替最小化算法的注记

David Gamarnik, Sidhant Misra

AI总结 本文分析了两种交替最小化算法变体在低秩矩阵补全问题中的性能,证明当矩阵秩为1且满足特定条件时,算法能在多项式时间内近似重建矩阵,并通过模拟结果表明第二种基于消息传递更新的算法表现更优。

详情
Comments
8 pages, 2 figures
AI中文摘要

我们考虑从矩阵部分条目重建低秩矩阵的问题,并分析了两种交替最小化算法的变体。我们证明当底层矩阵秩为1,具有正有界的条目,并且底层揭示条目的图$\mathcal{G}$具有有界度数和直径不超过矩阵规模对数时,两种算法都能在多项式时间内从任意初始化开始近似重建矩阵。我们进一步提供了模拟结果,表明基于消息传递类型更新的第二种算法表现更优。

英文摘要

We consider the problem of reconstructing a low rank matrix from a subset of its entries and analyze two variants of the so-called Alternating Minimization algorithm, which has been proposed in the past. We establish that when the underlying matrix has rank $r=1$, has positive bounded entries, and the graph $\mathcal{G}$ underlying the revealed entries has bounded degree and diameter which is at most logarithmic in the size of the matrix, both algorithms succeed in reconstructing the matrix approximately in polynomial time starting from an arbitrary initialization. We further provide simulation results which suggest that the second algorithm which is based on the message passing type updates, performs significantly better.
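The alternating idea in this abstract can be illustrated on a toy rank-one instance. The following is a minimal sketch of plain alternating least squares, not either of the two variants analyzed in the paper; the function name, the toy matrix, and the cyclic pattern of revealed entries are all assumptions made for illustration.

```python
def alternating_minimization_rank1(observed, n_rows, n_cols, n_iters=500):
    """Fit M[i][j] ~ u[i] * v[j] using only the revealed entries.

    `observed` maps (row, col) -> revealed value. Hypothetical helper,
    written for illustration; it is not taken from the paper.
    """
    u = [1.0] * n_rows  # arbitrary positive initialization
    v = [1.0] * n_cols
    for _ in range(n_iters):
        # Fix v and solve the scalar least-squares problem for each u[i].
        for i in range(n_rows):
            num = sum(val * v[j] for (r, j), val in observed.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in observed if r == i)
            if den > 0.0:
                u[i] = num / den
        # Fix u and solve the scalar least-squares problem for each v[j].
        for j in range(n_cols):
            num = sum(val * u[i] for (i, c), val in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j)
            if den > 0.0:
                v[j] = num / den
    return u, v


# Toy rank-one matrix a b^T with positive entries. Each row reveals two
# columns, so the bipartite graph of revealed entries is one connected cycle.
a = [1.0, 2.0, 1.5, 1.0]
b = [1.0, 1.5, 2.0, 1.0]
n = len(a)
observed = {(i, j): a[i] * b[j] for i in range(n) for j in (i, (i + 1) % n)}

u, v = alternating_minimization_rank1(observed, n, n)
# Largest reconstruction error over ALL entries, including unrevealed ones.
max_error = max(abs(u[i] * v[j] - a[i] * b[j]) for i in range(n) for j in range(n))
print(max_error)
```

Only the products `u[i] * v[j]` are identifiable; `u` and `v` individually are determined up to a common scale factor, which is why the check compares products.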

1510.01993 2026-06-04 eess.SY cs.SY stat.OT

Sensor Selection for Target Tracking in Wireless Sensor Networks with Uncertainty

无线传感器网络中带有不确定性的目标跟踪传感器选择

Nianxia Cao, Sora Choi, Engin Masazade, Pramod K. Varshney

AI总结 本文提出了一种多目标优化框架,用于解决不确定无线传感器网络中的传感器选择问题;所提出的基于互信息上界(MIUB)的方案计算复杂度低,估计性能与基于互信息的方案相近,并揭示了所选传感器数量与性能差距这两个冲突目标之间的权衡。

详情
AI中文摘要

本文提出了一种多目标优化框架,用于解决不确定无线传感器网络中的传感器选择问题。WSN的不确定性导致一组传感器观测值包含关于目标的不充分信息。我们提出了一种基于互信息上界(MIUB)的新型传感器选择方案,其计算复杂度与基于Fisher信息(FI)的方案相同,并且在估计性能上与基于互信息(MI)的方案相似。无需事先知道要选择的传感器数量,多目标优化问题(MOP)提供了一组传感器选择策略,揭示了两个冲突目标之间的不同权衡:所选传感器数量的最小化和在所有传感器传输测量与仅选定传感器传输测量时性能度量(MIUB和FI)之间差距的最小化。提供了具有价值见解的示例数值结果。

英文摘要

In this paper, we propose a multiobjective optimization framework for the sensor selection problem in uncertain Wireless Sensor Networks (WSNs). The uncertainties of the WSNs result in a set of sensor observations with insufficient information about the target. We propose a novel mutual information upper bound (MIUB) based sensor selection scheme, which has low computational complexity, same as the Fisher information (FI) based sensor selection scheme, and gives estimation performance similar to the mutual information (MI) based sensor selection scheme. Without knowing the number of sensors to be selected a priori, the multiobjective optimization problem (MOP) gives a set of sensor selection strategies that reveal different trade-offs between two conflicting objectives: minimization of the number of selected sensors and minimization of the gap between the performance metric (MIUB and FI) when all the sensors transmit measurements and when only the selected sensors transmit their measurements based on the sensor selection strategy. Illustrative numerical results that provide valuable insights are presented.

1509.08451 2026-06-04 cs.IT cs.NA math.IT math.NA math.OC math.ST stat.TH

Phase Retrieval Using Feasible Point Pursuit: Algorithms and Cramér-Rao Bound

利用可行点追求的相位恢复:算法与Cramér-Rao界

Cheng Qian, Nicholas D. Sidiropoulos, Kejun Huang, Lei Huang, H. C. So

AI总结 本文提出两种基于非凸二次约束二次规划的相位恢复算法,并推导了Cramér-Rao界,通过仿真显示LS-FPP在性能上优于现有算法并接近Cramér-Rao界。

详情
Comments
13 pages, 13 figures
AI中文摘要

从平方线性(秩一二次)测量中重建信号是一个具有挑战性的问题,在光学和成像中有重要应用,在这些领域中被称为相位恢复。本文基于非凸二次约束二次规划(QCQP)公式,并结合一种最近提出的近似技术,即可行点追求(FPP),提出了两种新的相位恢复算法。第一种适用于均匀分布的有界测量误差,如高率量化产生的误差(B-FPP)。第二种适用于高斯测量误差,采用最小二乘准则(LS-FPP)。其性能通过与现有最先进算法和Cramér-Rao界(CRB)进行比较来衡量,CRB也在本文中推导。仿真显示LS-FPP在性能上优于现有算法并接近CRB。通过在各种特殊情况下显式计算CRB,获得了紧凑的CRB表达式、性质和见解,其中包括感兴趣的信号具有稀疏参数化的情况,并以谐波恢复为例。

英文摘要

Reconstructing a signal from squared linear (rank-one quadratic) measurements is a challenging problem with important applications in optics and imaging, where it is known as phase retrieval. This paper proposes two new phase retrieval algorithms based on non-convex quadratically constrained quadratic programming (QCQP) formulations, and a recently proposed approximation technique dubbed feasible point pursuit (FPP). The first is designed for uniformly distributed bounded measurement errors, such as those arising from high-rate quantization (B-FPP). The second is designed for Gaussian measurement errors, using a least squares criterion (LS-FPP). Their performance is measured against state-of-the-art algorithms and the Cramér-Rao bound (CRB), which is also derived here. Simulations show that LS-FPP outperforms the state of the art and operates close to the CRB. Compact CRB expressions, properties, and insights are obtained by explicitly computing the CRB in various special cases, including when the signal of interest admits a sparse parametrization, using harmonic retrieval as an example.

1609.05877 2026-06-04 math.OC cs.SY eess.SY stat.ML

Geometrically Convergent Distributed Optimization with Uncoordinated Step-Sizes

具有非协调步长的几何收敛分布式优化

Angelia Nedić, Alex Olshevsky, Wei Shi, César A. Uribe

AI总结 本文研究了在非协调步长下,ATC变体的DIGing算法的收敛速率,证明其在不同步长下仍能几何快速收敛,并表明ATC结构比原始DIGing算法中的DGD结构更高效。

详情
Comments
arXiv admin note: text overlap with arXiv:1607.03218
AI中文摘要

最近,一类分布式优化算法DIGing已被证明在时变无向/有向图上具有几何收敛性。然而,该类算法要求所有代理使用相同的步长。在本文中,我们研究了在非协调步长下,DIGing算法的Adapt-Then-Combine(ATC)变体的收敛速率。我们证明即使代理间的步长不同,DIGing算法的ATC变体仍能几何快速收敛。此外,我们的分析表明,与原始DIGing算法中使用的分布式梯度下降(DGD)结构相比,ATC结构可以加速收敛。

英文摘要

A recent algorithmic family for distributed optimization, DIGing, has been shown to have geometric convergence over time-varying undirected/directed graphs. Nevertheless, an identical step-size for all agents is needed. In this paper, we study the convergence rates of the Adapt-Then-Combine (ATC) variation of the DIGing algorithm under uncoordinated step-sizes. We show that the ATC variation of the DIGing algorithm converges geometrically fast even if the step-sizes are different among the agents. In addition, our analysis implies that the ATC structure can accelerate convergence compared to the distributed gradient descent (DGD) structure which has been used in the original DIGing algorithm.
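To make the adapt-then-combine idea with per-agent step-sizes concrete, here is a minimal sketch on a scalar toy problem. The update written below is one common way of stating an ATC gradient-tracking iteration and is an assumption, since the abstract does not spell out the recursion; the function name, the fixed mixing matrix `W`, the quadratic local costs, and the step-sizes are likewise illustrative choices.

```python
def atc_gradient_tracking(grads, W, steps, x0, n_iters):
    """Adapt-then-combine gradient tracking with uncoordinated step-sizes.

    grads[i] is agent i's local gradient, W a doubly stochastic mixing
    matrix, steps[i] agent i's own step-size. Hypothetical helper.
    """
    n = len(grads)
    x = list(x0)
    g = [grads[i](x[i]) for i in range(n)]
    y = list(g)  # gradient trackers start at the local gradients
    for _ in range(n_iters):
        # Adapt: each agent takes a local step along its tracked gradient.
        z = [x[i] - steps[i] * y[i] for i in range(n)]
        # Combine: average the adapted iterates with the neighbours.
        x_new = [sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]
        g_new = [grads[i](x_new[i]) for i in range(n)]
        # Track: add the local gradient change, then combine the trackers.
        t = [y[j] + g_new[j] - g[j] for j in range(n)]
        y = [sum(W[i][j] * t[j] for j in range(n)) for i in range(n)]
        x, g = x_new, g_new
    return x


# Three agents, local costs 0.5 * (x - a_i)^2; the minimizer of the sum is mean(a).
targets = [1.0, 2.0, 6.0]
grads = [lambda x, a=a: x - a for a in targets]
W = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
steps = [0.10, 0.15, 0.20]  # deliberately different for each agent

x_final = atc_gradient_tracking(grads, W, steps, [0.0, 0.0, 0.0], 500)
print(x_final)
```

In this toy run every agent should end up near the average of `targets` even though no two agents share a step-size; the sketch uses a fixed graph, whereas the paper treats time-varying graphs.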

1609.04623 2026-06-04 stat.ML cs.SY eess.SY

Distributed Estimation of the Operating State of a Single-Bus DC MicroGrid without an External Communication Interface

无外部通信接口的单母线直流微电网运行状态分布式估计

Marko Angjelichinoski, Anna Scaglione, Petar Popovski, Cedomir Stefanovic

AI总结 本文提出一种去中心化最大似然方法,用于估计单母线直流微电网中随机可再生能源发电和需求,通过可控电压扰动实现分布式训练,避免外部通信接口,有效估计稳态电压。

详情
Comments
Accepted to GlobalSIP 2016
AI中文摘要

我们提出了一种去中心化的最大似然解,用于估计单母线直流微电网中随机可再生能源发电和需求,其中存在大量基于下垂控制的电力电子转换器。该解决方案依赖于主控参数与本地发电机发电状态一致的事实。因此,稳态电压本质上依赖于发电容量和负载,通过非线性参数模型进行估计。为获得良好的估计问题条件,我们的解决方案避免使用外部通信接口,并利用可控电压扰动进行分布式训练。使用该工具,我们开发了高效的去中心化最大似然估计器(MLE),并制定了全局最优解存在的充分条件。数值结果展示了我们MLE算法的有希望性能。

英文摘要

We propose a decentralized Maximum Likelihood solution for estimating the stochastic renewable power generation and demand in single bus Direct Current (DC) MicroGrids (MGs), with high penetration of droop controlled power electronic converters. The solution relies on the fact that the primary control parameters are set in accordance with the local power generation status of the generators. Therefore, the steady state voltage is inherently dependent on the generation capacities and the load, through a non-linear parametric model, which can be estimated. To have a well conditioned estimation problem, our solution avoids the use of an external communication interface and utilizes controlled voltage disturbances to perform distributed training. Using this tool, we develop an efficient, decentralized Maximum Likelihood Estimator (MLE) and formulate the sufficient condition for the existence of the globally optimal solution. The numerical results illustrate the promising performance of our MLE algorithm.

1609.03344 2026-06-04 stat.ML cs.LG econ.GN math.ST q-fin.EC stat.CO stat.TH

Finite-sample and asymptotic analysis of generalization ability with an application to penalized regression

有限样本与渐近分析中的泛化能力研究:以正则化回归应用为例

Ning Xu, Jian Hong, Timothy C. G. Fisher

AI总结 本文从泛化能力角度研究极值估计器性能,推导了泛化误差上界,并探讨了交叉验证中K值对偏差方差权衡的影响,证明了正则化回归估计在高维数据下的L2一致性。

详情
Comments
The theoretical generalization and extension of arXiv:1606.00142
AI中文摘要

本文从泛化能力(GA)角度研究极值估计器的性能:即模型在新样本上预测结果的能力。通过适应经典集中不等式,我们推导了经验外样本预测误差的上界,作为样本内误差、样本量、误差分布尾部重性及模型复杂度的函数。我们证明误差界可用于调节关键估计超参数,如交叉验证中K值。我们还展示了K值如何影响交叉验证的偏差方差权衡。最后,我们证明所有正则化回归估计在n≥p和n<p情况下均为L2一致。通过模拟验证了关键结果。

英文摘要

In this paper, we study the performance of extremum estimators from the perspective of generalization ability (GA): the ability of a model to predict outcomes in new samples from the same population. By adapting the classical concentration inequalities, we derive upper bounds on the empirical out-of-sample prediction errors as a function of the in-sample errors, in-sample data size, heaviness in the tails of the error distribution, and model complexity. We show that the error bounds may be used for tuning key estimation hyper-parameters, such as the number of folds $K$ in cross-validation. We also show how $K$ affects the bias-variance trade-off for cross-validation. We demonstrate that the $\mathcal{L}_2$-norm difference between penalized and the corresponding un-penalized regression estimates is directly explained by the GA of the estimates and the GA of empirical moment conditions. Lastly, we prove that all penalized regression estimates are $L_2$-consistent for both the $n \geqslant p$ and the $n < p$ cases. Simulations are used to demonstrate key results. Keywords: generalization ability, upper bound of generalization error, penalized regression, cross-validation, bias-variance trade-off, $\mathcal{L}_2$ difference between penalized and unpenalized regression, lasso, high-dimensional data.

1606.00602 2026-06-04 stat.ML cs.LG cs.NA math.NA

Variance-Reduced Proximal Stochastic Gradient Descent for Non-convex Composite optimization

方差缩减的非凸复合优化的近端随机梯度下降

Xiyu Yu, Dacheng Tao

AI总结 本文提出非凸复合优化的方差缩减近端随机梯度下降方法,证明其在非凸情况下能以O(1/ε)次迭代收敛至驻点,优于随机梯度下降。

详情
Comments
This paper has been withdrawn by the author due to an error in the proof of the convergence rate. They will modify this proof as soon as possible
AI中文摘要

本文研究非凸复合优化问题:目标函数的第一部分是光滑但非凸函数的有限和,第二部分是具有简单近端映射的一般函数。大多数关于复合优化随机方法的研究假设每个函数是凸的或强凸的。本文通过方差缩减技术(如prox-SVRG和prox-SAGA)将问题扩展到非凸设置。证明在固定步长下,prox-SVRG和prox-SAGA适用于非凸复合优化,并能在$O(1/ε)$次迭代内收敛至驻点。这与最先进的RSAG方法的收敛速度相近,并且快于随机梯度下降。本文的分析还扩展到小批量(mini-batch)设置,可线性加速收敛。据我们所知,这是首个关于非凸复合优化中方差缩减近端随机梯度方法收敛率的分析。

英文摘要

Here we study non-convex composite optimization: first, a finite sum of smooth but non-convex functions, and second, a general function that admits a simple proximal mapping. Most research on stochastic methods for composite optimization assumes convexity or strong convexity of each function. In this paper, we extend this problem into the non-convex setting using variance reduction techniques, such as prox-SVRG and prox-SAGA. We prove that, with a constant step size, both prox-SVRG and prox-SAGA are suitable for non-convex composite optimization, and converge to a stationary point within $O(1/ε)$ iterations. That is similar to the convergence rate seen with the state-of-the-art RSAG method and faster than stochastic gradient descent. Our analysis is also extended into the mini-batch setting, which linearly accelerates the convergence. To the best of our knowledge, this is the first analysis of convergence rate of variance-reduced proximal stochastic gradient for non-convex composite optimization.
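The structure of a prox-SVRG iteration (snapshot gradient, variance-reduced stochastic gradient, proximal step) can be sketched as follows. This is a generic textbook-style sketch on a convex one-dimensional toy problem, not the paper's non-convex analysis; using the last inner iterate as the next snapshot, the function names, and the toy data are assumptions.

```python
import random


def soft_threshold(z, t):
    """Proximal map of t * |x| evaluated at z."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def prox_svrg(grads, prox, x0, step, n_epochs, inner_steps, rng):
    """Minimize (1/n) * sum_i f_i(x) + h(x); grads[i] is f_i', prox(z, s) is prox of s*h."""
    n = len(grads)
    x_snap = x0
    for _ in range(n_epochs):
        # Full gradient of the smooth part at the snapshot point.
        full_grad = sum(g(x_snap) for g in grads) / n
        x = x_snap
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced stochastic gradient estimate.
            v = grads[i](x) - grads[i](x_snap) + full_grad
            # Proximal step handles the non-smooth term h.
            x = prox(x - step * v, step)
        x_snap = x  # assumption: last iterate becomes the next snapshot
    return x


# Toy problem: f_i(x) = 0.5 * (x - a_i)^2 and h(x) = lam * |x|.
data = [1.0, 2.0, 3.0, 6.0]
lam = 1.0
grads = [lambda x, a=a: x - a for a in data]
prox = lambda z, s: soft_threshold(z, s * lam)

x_hat = prox_svrg(grads, prox, x0=0.0, step=0.5, n_epochs=10, inner_steps=20,
                  rng=random.Random(0))
print(x_hat)
```

For this toy problem the minimizer is the soft-thresholded sample mean, `soft_threshold(3.0, 1.0)`, which gives a simple reference value to compare `x_hat` against.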

1510.00024 2026-06-04 math.NA cs.NA stat.CO

Accelerating MCMC with active subspaces

用主动子空间加速MCMC

Paul G. Constantine, Carson Kent, Tan Bui-Thanh

AI总结 本文提出通过主动子空间降低维度以加速MCMC,减少高维空间探索的计算负担,并在两个例子中验证了该方法的有效性。

详情
AI中文摘要

本文提出通过主动子空间降低维度以加速MCMC,减少高维空间探索的计算负担,并在两个例子中验证了该方法的有效性。

英文摘要

The Markov chain Monte Carlo (MCMC) method is the computational workhorse for Bayesian inverse problems. However, MCMC struggles in high-dimensional parameter spaces, since its iterates must sequentially explore the high-dimensional space. This struggle is compounded in physical applications when the nonlinear forward model is computationally expensive. One approach to accelerate MCMC is to reduce the dimension of the state space. Active subspaces are part of an emerging set of tools for subspace-based dimension reduction. An active subspace in a given inverse problem indicates a separation between a low-dimensional subspace that is informed by the data and its orthogonal complement that is constrained by the prior. With this information, one can run the sequential MCMC on the active variables while sampling independently according to the prior on the inactive variables. However, this approach to increase efficiency may introduce bias. We provide a bound on the Hellinger distance between the true posterior and its active subspace-exploiting approximation. We also demonstrate the active subspace-accelerated MCMC on two computational examples: (i) a two-dimensional parameter space with a quadratic forward model and one-dimensional active subspace and (ii) a 100-dimensional parameter space with a PDE-based forward model and a two-dimensional active subspace.

1505.03173 2026-06-04 stat.CO cs.NA math.NA math.OC

Improving Simulated Annealing through Derandomization

通过去随机化改进模拟退火

Mathieu Gerber, Luke Bornn

AI总结 本文提出基于(t,s)_R序列的连续状态空间模拟退火方法,证明其在矩形域内几乎必然收敛于全局最优解,且在单变量情况下具有完全确定性。

详情
Comments
33 pages, 4 figures (final version)
AI中文摘要

我们提出并研究了一种基于$(t,s)_R$序列的连续状态空间模拟退火方法。参数$R\in\bar{\mathbb{N}}$调节输入序列的随机性,$R=0$对应于独立同分布的均匀随机数,$R=\infty$对应于$(t,s)$序列。我们的主要结果是在矩形域内证明,所得到的优化方法,我们称为QMC-SA,在任何$R\in\mathbb{N}$的情况下几乎必然收敛于目标函数$φ$的全局最优解。当$φ$为单变量时,我们还能证明QMC-SA的完全确定性版本是收敛的。这些结果的一个关键性质是它们不需要对冷却计划有依赖于目标函数的条件。作为我们理论分析的推论,我们提供了一个新的几乎必然收敛结果,该结果在对$φ$的最小假设下具有这一性质。我们进一步解释了我们的结果实际上适用于更广泛的优化方法,包括例如阈值接受,目前对此方法尚无收敛结果。我们最后通过数值研究展示了QMC-SA在性能上的优越性。

英文摘要

We propose and study a version of simulated annealing (SA) on continuous state spaces based on $(t,s)_R$-sequences. The parameter $R\in\bar{\mathbb{N}}$ regulates the degree of randomness of the input sequence, with the case $R=0$ corresponding to IID uniform random numbers and the limiting case $R=\infty$ to $(t,s)$-sequences. Our main result, obtained for rectangular domains, shows that the resulting optimization method, which we refer to as QMC-SA, converges almost surely to the global optimum of the objective function $φ$ for any $R\in\mathbb{N}$. When $φ$ is univariate, we are in addition able to show that the completely deterministic version of QMC-SA is convergent. A key property of these results is that they do not require objective-dependent conditions on the cooling schedule. As a corollary of our theoretical analysis, we provide a new almost sure convergence result for SA which shares this property under minimal assumptions on $φ$. We further explain how our results in fact apply to a broader class of optimization methods including for example threshold accepting, for which to our knowledge no convergence results currently exist. We finally illustrate the superiority of QMC-SA over SA algorithms in a numerical study.

1607.03390 2026-06-04 stat.CO cs.NA math.NA

Information Splitting for Big Data Analytics

大数据分析中的信息分割

Shengxin Zhu, Tongxiang Gu, Xiaowen Xu, Zeyao Mo

AI总结 本文提出信息分割技术,简化大数据集的计算,使线性混合模型适用于大规模数据集,通过矩阵代数变换实现更高效的计算。

详情
Comments
arXiv admin note: text overlap with arXiv:1605.07646
AI中文摘要

许多统计模型需要估计未知(协)方差参数,通常通过最大化对数似然函数来获得,涉及对数行列式项。理论上,需要观测信息(负Hessian矩阵或对数似然的二阶导数)以获得准确的最大似然估计。当使用Fisher信息(观测信息的期望值)时,可以得到比牛顿法更简单的Fisher评分算法。随着生物科学、推荐系统和社会网络中高通量技术的发展,数据集的规模以及相应的统计模型突然增加了多个数量级。观测信息和Fisher信息对于这些大数据集难以获得。本文引入信息分割技术以简化计算。在分割观测信息和Fisher信息的均值后,可以得到一个更简单的对数似然的近似Hessian矩阵。这种近似Hessian矩阵可以显著减少计算量,并使线性混合模型适用于大数据集。这种分割和更简单的公式严重依赖于矩阵代数变换,并适用于大规模育种模型和基因组广义关联分析。

英文摘要

Many statistical models require the estimation of unknown (co)-variance parameter(s). The estimate is usually obtained by maximizing a log-likelihood which involves log-determinant terms. In principle, one requires the observed information (the negative Hessian matrix, or the second derivative of the log-likelihood) to obtain an accurate maximum likelihood estimator according to the Newton method. When one uses the Fisher information, the expected value of the observed information, a simpler algorithm than the Newton method is obtained: the Fisher scoring algorithm. With the advance in high-throughput technologies in the biological sciences, recommendation systems and social networks, the sizes of data sets, and of the corresponding statistical models, have suddenly increased by several orders of magnitude. Neither the observed information nor the Fisher information is easy to obtain for these big data sets. This paper introduces an information splitting technique to simplify the computation. After splitting the mean of the observed information and the Fisher information, a simpler approximate Hessian matrix for the log-likelihood can be obtained. This approximate Hessian matrix can significantly reduce computations, and makes the linear mixed model applicable to big data sets. Such a splitting and the simpler formulas depend heavily on matrix algebra transforms, and are applicable to large-scale breeding models and genome-wide association analysis.

1609.00051 2026-06-04 eess.SY cs.SY stat.AP

Estimation and Control of Quality of Service in Demand Dispatch

需求调度中的服务质量估计与控制

Yue Chen, Ana Bušić, Sean Meyn

AI总结 本文提出通过统计技术估计服务质量的均值和方差,以提高电网级的跟踪性能,同时通过局部控制截断服务质量直方图,确保服务质量的严格界限。

详情
Comments
Submitted for publication, August 2016. arXiv admin note: text overlap with arXiv:1409.6941
AI中文摘要

如今已知,能源消费的灵活性可以用于电网级的辅助服务。特别是,通过分布式控制一组负载,可以准确跟踪平衡机构的调节信号,同时确保每个负载的服务质量(QoS)在平均上是可接受的。本文认为QoS的直方图近似高斯分布,因此最终每个负载将收到差的服务。开发了统计技术来估计QoS的均值和方差,作为调节信号功率谱密度函数的函数。还证明了额外的本地控制可以消除风险:通过这种本地控制截断QoS的直方图,从而保证服务质量的严格界限。尽管电网级跟踪性能(容量和精度)与对QoS施加的界限之间存在权衡,但发现在典型情况下容量的损失是轻微的。

英文摘要

It is now well known that flexibility of energy consumption can be harnessed for the purposes of grid-level ancillary services. In particular, through distributed control of a collection of loads, a balancing authority regulation signal can be tracked accurately, while ensuring that the quality of service (QoS) for each load is acceptable on average. In this paper it is argued that a histogram of QoS is approximately Gaussian, and consequently each load will eventually receive poor service. Statistical techniques are developed to estimate the mean and variance of QoS as a function of the power spectral density of the regulation signal. It is also shown that additional local control can eliminate risk: the histogram of QoS is truncated through this local control, so that strict bounds on service quality are guaranteed. While there is a tradeoff between the grid-level tracking performance (capacity and accuracy) and the bounds imposed on QoS, it is found that the loss of capacity is minor in typical cases.

1608.08285 2026-06-04 math.NA cs.NA math.ST stat.TH

Integrating multiple random sketches for singular value decomposition

将多种随机草图整合用于奇异值分解

Ting-Li Chen, Dawei D. Chang, Su-Yun Huang, Hung Chen, Chienyao Lin, Weichung Wang

AI总结 本文提出基于多种随机草图的蒙特卡洛类型整合SVD算法,通过优化问题与矩阵Stiefel流形约束,提高SVD精度与稳定性,理论分析证明其一致性,数值结果表明其有效性。

详情
AI中文摘要

大规模矩阵的奇异值分解(SVD)是数据科学和科学计算中的关键工具。随着矩阵规模的快速增长,开发高效的大规模SVD算法需求增加。基于一次随机草图的随机化SVD已被研究,并展示了计算低秩SVD的潜力。本文提出一种基于多种随机草图的蒙特卡洛类型整合SVD算法。该算法通过整合多个随机草图子空间的结果,从而实现更高的精度和更低的随机波动。整合的核心是一个具有矩阵Stiefel流形约束的优化问题。该优化问题通过Kolmogorov-Nagumo型平均解决。理论分析表明,通过总体平均可以诱导奇异向量,并确保计算和真实子空间及奇异向量之间的一致性。统计分析进一步证明了大数定律,并通过中心极限定理给出了收敛速率。初步的数值结果表明,所提出的整合SVD算法具有前景。

英文摘要

The singular value decomposition (SVD) of large-scale matrices is a key tool in data analytics and scientific computing. The rapid growth in the size of matrices further increases the need for developing efficient large-scale SVD algorithms. Randomized SVD based on one-time sketching has been studied, and its potential has been demonstrated for computing a low-rank SVD. Instead of exploring different single random sketching techniques, we propose a Monte Carlo type integrated SVD algorithm based on multiple random sketches. The proposed integration algorithm takes multiple random sketches and then integrates the results obtained from the multiple sketched subspaces, so that the integrated SVD can achieve higher accuracy and lower stochastic variations. The main component of the integration is an optimization problem with a matrix Stiefel manifold constraint. The optimization problem is solved using Kolmogorov-Nagumo-type averages. Our theoretical analyses show that the singular vectors can be induced by population averaging and ensure the consistencies between the computed and true subspaces and singular vectors. Statistical analysis further proves a strong Law of Large Numbers and gives a rate of convergence by the Central Limit Theorem. Preliminary numerical results suggest that the proposed integrated SVD algorithm is promising.

1608.07207 2026-06-04 math.NA cs.NA stat.CO

Computing log-likelihood and its derivatives for restricted maximum likelihood methods

计算限制最大似然方法中的对数似然及其导数

Shengxin Zhu

AI总结 本文提出通过串行矩阵变换和平均信息分割技术,高效计算线性混合模型中限制最大似然方法的对数似然及其导数,以提高大规模基因组关联分析的统计效率。

详情
Comments
11
AI中文摘要

近年来,大规模基因组关联分析涉及大规模线性混合模型。在混合模型中使用限制最大似然方法量化(协)方差参数会得到一个作为对数似然一阶导数的得分函数。为了获得方差参数的统计高效估计,需要通过牛顿方法找到得分函数的根。大多数雅可比矩阵的元素涉及四个参数矩阵-矩阵乘法的迹。对于大规模数据集,这在计算上是不可行的。通过串行矩阵变换和平均信息分割技术,可以得到一个近似的雅可比矩阵,该矩阵通过将雅可比矩阵的平均值与其期望值分开来获得。在近似的雅可比矩阵中,其元素仅涉及四个矩阵向量乘法,可以通过求解稀疏线性系统的方法用多前导因子分解法高效评估。

英文摘要

Recent large scale genome wide association analysis involves large scale linear mixed models. Quantifying (co)-variance parameters in the mixed models with a restricted maximum likelihood method results in a score function which is the first derivative of a log-likelihood. To obtain a statistically efficient estimate of the variance parameters, one needs to find the root of the score function via the Newton method. Most elements of the Jacobian matrix of the score involve a trace term of four parametric matrix-matrix multiplications. This is computationally prohibitive for large scale data sets. By a series of matrix transforms and an averaged information splitting technique, an approximate Jacobian matrix can be obtained by splitting the average of the Jacobian matrix and its expected value. In the approximate Jacobian, the elements only involve four matrix-vector multiplications, which can be efficiently evaluated by solving sparse linear systems with the multi-frontal factorization method.

1508.02473 2026-06-04 math.ST econ.GN q-fin.EC stat.ML stat.TH

Bridging AIC and BIC: a new criterion for autoregression

连接AIC和BIC:用于自回归模型的新的准则

Jie Ding, Vahid Tarokh, Yuhong Yang

AI总结 本文提出一种新的准则,结合AIC和BIC的优点,用于确定时间序列自回归模型的阶数,适应不同真实模型需求。

详情
AI中文摘要

本文提出了一种新的准则,结合AIC和BIC的优点,用于确定时间序列自回归模型的阶数,适应不同真实模型需求。该准则在数据由有限阶自回归生成时具有一致性,在真实阶数为无穷或与样本量适当高时具有效率性。与经典准则不同,所提准则能够根据底层真实模型自适应地实现一致性或效率。在实际应用中,当观察到的时间序列没有关于模型规范的先验信息时,所提的阶数选择准则比经典方法更加灵活和稳健。数值结果展示了该技术在应用于各种数据集时的适应性。

英文摘要

We introduce a new criterion to determine the order of an autoregressive model fitted to time series data. It has the benefits of the two well-known model selection techniques, the Akaike information criterion and the Bayesian information criterion. When the data is generated from a finite order autoregression, the Bayesian information criterion is known to be consistent, and so is the new criterion. When the true order is infinity or suitably high with respect to the sample size, the Akaike information criterion is known to be efficient in the sense that its prediction performance is asymptotically equivalent to the best offered by the candidate models; in this case, the new criterion behaves in a similar manner. Different from the two classical criteria, the proposed criterion adaptively achieves either consistency or efficiency depending on the underlying true model. In practice where the observed time series is given without any prior information about the model specification, the proposed order selection criterion is more flexible and robust compared with classical approaches. Numerical results are presented demonstrating the adaptivity of the proposed technique when applied to various datasets.
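The two classical criteria that the abstract contrasts differ only in the penalty attached to the model order. The sketch below uses the standard Gaussian forms, n·log(σ̂²_k) + 2k for AIC and n·log(σ̂²_k) + k·log(n) for BIC; it does not implement the paper's new criterion, and the residual variances are made-up numbers for illustration.

```python
import math


def select_order(residual_variances, n, penalty):
    """Return the order k minimizing n * log(sigma2_k) + penalty(k, n).

    residual_variances maps a candidate order k to the estimated innovation
    variance of the fitted AR(k) model. Hypothetical helper for illustration.
    """
    def score(k):
        return n * math.log(residual_variances[k]) + penalty(k, n)
    return min(residual_variances, key=score)


def aic_penalty(k, n):
    return 2.0 * k


def bic_penalty(k, n):
    return k * math.log(n)


# Made-up residual variances for AR(1), AR(2), AR(3) fitted to n = 100 points.
variances = {1: 1.0, 2: 0.9, 3: 0.87}
n = 100

print(select_order(variances, n, aic_penalty))  # 3
print(select_order(variances, n, bic_penalty))  # 2
```

Here the small variance reduction from order 2 to order 3 is enough to offset the AIC penalty of 2 but not the BIC penalty of log(100), which is roughly 4.6, so the two criteria disagree.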

1511.07492 2026-06-04 math.NA cs.NA stat.CO

Polynomial meta-models with canonical low-rank approximations: numerical insights and comparison to sparse polynomial chaos expansions

多项式元模型与规范低秩近似:数值洞察与与稀疏多项式混沌展开的比较

Katerina Konakli, Bruno Sudret

AI总结 本文探讨了规范低秩近似在高维问题中构建多项式元模型的效率,展示了其在小样本情况下优于稀疏多项式混沌展开的性能。

详情
AI中文摘要

随着复杂计算模型不确定性分析需求的增长,元模型在工程和科学中的应用日益广泛。元模型技术的效率依赖于其能够基于少量原始模型评估提供统计等效的解析表示。多项式混沌展开(PCE)已被证明是开发元模型的强大工具;其核心思想是将模型响应展开到由多元多项式构成的基上,这些多项式作为适当单变量多项式的张量积获得。经典的PCE方法面临“维度诅咒”,即输入维度增加时基大小呈指数增长。为解决这一限制,提出了稀疏PCE技术,其中仅在少量相关基项上进行展开,这些基项由合适的算法自动选择。另一种在高维问题中开发多项式函数元模型的方法是新兴的低秩近似(LRA)方法。通过利用多元基的张量积结构,LRA可以提供高度压缩的多项式表示。通过广泛的数值研究,本文首次揭示了规范LRA构造中涉及特定贪心算法的问题,该算法包含沿不同维度依次更新多项式系数的过程。在可用模型评估数量相对于输入维度较小时,规范LRA表现出比稀疏PCE更小的误差。通过引入条件泛化误差,我们进一步证明规范LRA在预测极值模型响应方面优于稀疏PCE。

英文摘要

The growing need for uncertainty analysis of complex computational models has led to an expanding use of meta-models across engineering and sciences. The efficiency of meta-modeling techniques relies on their ability to provide statistically-equivalent analytical representations based on relatively few evaluations of the original model. Polynomial chaos expansions (PCE) have proven a powerful tool for developing meta-models in a wide range of applications; the key idea thereof is to expand the model response onto a basis made of multivariate polynomials obtained as tensor products of appropriate univariate polynomials. The classical PCE approach nevertheless faces the "curse of dimensionality", namely the exponential increase of the basis size with increasing input dimension. To address this limitation, the sparse PCE technique has been proposed, in which the expansion is carried out on only a few relevant basis terms that are automatically selected by a suitable algorithm. An alternative for developing meta-models with polynomial functions in high-dimensional problems is offered by the newly emerged low-rank approximations (LRA) approach. By exploiting the tensor-product structure of the multivariate basis, LRA can provide polynomial representations in highly compressed formats. Through extensive numerical investigations, we herein first shed light on issues relating to the construction of canonical LRA with a particular greedy algorithm involving a sequential updating of the polynomial coefficients along separate dimensions. Canonical LRA exhibit smaller errors than sparse PCE in cases when the number of available model evaluations is small with respect to the input dimension. By introducing the conditional generalization error, we further demonstrate that canonical LRA tend to outperform sparse PCE in the prediction of extreme model responses.

1608.05338 2026-06-04 stat.CO cs.CE cs.NA math.NA

The multi-level Monte Carlo method for simulations of turbulent flows

多级蒙特卡洛方法在湍流流动模拟中的应用

Qingsha Chen, Ju Ming

AI总结 本文研究了多级蒙特卡洛方法在具有不确定参数的湍流流动数值模拟中的应用,探讨了多种设置策略及其优缺点,并通过南极环流实验展示该方法能提高效率而不影响不确定性评估的准确性。

详情
Comments
4 figures
AI中文摘要

本文探讨了多级蒙特卡洛(MLMC)方法在具有不确定参数的湍流流动数值模拟中的应用。提出了几种设置MLMC方法的策略,并讨论了每种策略的优缺点。使用具有不确定的小尺度海底地形特征的南极绕极流(ACC)进行了数值实验。结果表明,与逐点解不同,平均体积输运在不同网格分辨率之间是相关的,且MLMC方法能够在不损失不确定性评估准确性的情况下提高模拟效率。

英文摘要

In this paper the application of the multi-level Monte Carlo (MLMC) method on numerical simulations of turbulent flows with uncertain parameters is investigated. Several strategies for setting up the MLMC method are presented, and the advantages and disadvantages of each strategy are also discussed. A numerical experiment is carried out using the Antarctic Circumpolar Current (ACC) with uncertain, small-scale bottom topographic features. It is demonstrated that, unlike the pointwise solutions, the averaged volume transports are correlated across grid resolutions, and the MLMC method could increase simulation efficiency without losing accuracy in uncertainty assessment.
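The role of cross-resolution correlation can be seen in the basic MLMC telescoping estimator, which sums a coarse estimate and a sequence of fine-minus-coarse corrections, each computed from the same random input. The sketch below is a generic illustration on a toy quadrature problem, not one of the set-up strategies from the paper; `mlmc_estimate`, `toy_pair`, and the sample counts are assumptions.

```python
import math
import random


def mlmc_estimate(sample_pair, samples_per_level, rng):
    """Telescoping estimate of E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}].

    sample_pair(level, rng) returns (fine, coarse) evaluated on the SAME
    random input; coarse must be 0.0 at level 0. Hypothetical helper.
    """
    total = 0.0
    for level, n_samples in enumerate(samples_per_level):
        acc = 0.0
        for _ in range(n_samples):
            fine, coarse = sample_pair(level, rng)
            acc += fine - coarse
        total += acc / n_samples
    return total


def toy_pair(level, rng):
    """Midpoint-rule approximations of the integral of exp(omega * x) over [0, 1]."""
    omega = rng.random()  # the uncertain parameter, shared by both resolutions

    def approx(l):
        cells = 2 ** l
        h = 1.0 / cells
        return h * sum(math.exp(omega * (k + 0.5) * h) for k in range(cells))

    fine = approx(level)
    coarse = approx(level - 1) if level > 0 else 0.0
    return fine, coarse


# Many cheap coarse samples, few expensive fine ones.
estimate = mlmc_estimate(toy_pair, [2000, 500, 100], random.Random(0))
print(estimate)
```

Because the fine and coarse values share the same `omega`, their difference has small variance, so the correction levels need far fewer samples than the coarsest level.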

1509.02314 2026-06-04 math.NA cs.NA math.OC stat.ML

A Scalable and Extensible Framework for Superposition-Structured Models

面向叠加结构模型的可伸缩且可扩展的框架

Shenjian Zhao, Cong Xie, Zhihua Zhang

AI总结 本文提出一种可伸缩且可扩展的框架,用于求解叠加结构模型,通过近端牛顿型方法实现高效求解,并在多个数据集上展示了其强大的性能和超线性收敛速度。

详情
Journal ref
AAAI 2016: 2372-2378
AI中文摘要

在许多学习任务中,结构模型通常能带来更好的可解释性和更高的泛化性能。然而,近年来Lasso等简单的结构模型常被证明是不足的。因此,有很多工作致力于“叠加结构”模型,其中施加了多个结构约束。为了高效求解这些“叠加结构”统计模型,我们开发了一种基于近端牛顿型方法的框架。采用平滑锥对偶方法与LBFGS更新公式,我们提出了一种可伸缩且可扩展的近端拟牛顿(SEP-QN)框架。在各种数据集上的实证分析表明,我们的框架具有潜力,并在优化一些流行的“叠加结构”统计模型(如融合稀疏组Lasso)时实现了超线性收敛速度。

英文摘要

In many learning tasks, structural models usually lead to better interpretability and higher generalization performance. In recent years, however, the simple structural models such as lasso are frequently proved to be insufficient. Accordingly, there has been a lot of work on "superposition-structured" models where multiple structural constraints are imposed. To efficiently solve these "superposition-structured" statistical models, we develop a framework based on a proximal Newton-type method. Employing the smoothed conic dual approach with the LBFGS updating formula, we propose a scalable and extensible proximal quasi-Newton (SEP-QN) framework. Empirical analysis on various datasets shows that our framework is potentially powerful, and achieves super-linear convergence rate for optimizing some popular "superposition-structured" statistical models such as the fused sparse group lasso.

1607.03592 2026-06-04 stat.CO cs.NA math.NA stat.AP

Cluster Sampling Filters for Non-Gaussian Data Assimilation

非高斯数据同化中的聚类采样滤波器

Ahmed Attia, Azam Moosavi, Adrian Sandu

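AI summary: Presents a fully non-Gaussian Hamiltonian Monte Carlo sampling filter that adds a clustering step and estimates the prior density with a Gaussian mixture model, allowing flexible treatment of non-Gaussian priors; the approach is tested on a quasi-geostrophic model.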

详情
AI中文摘要

本文提出了一种完全非高斯版本的Hamilton蒙特卡罗(HMC)采样滤波器。在原始HMC滤波器中对高斯先验假设的放松是关键。具体而言,在滤波的预报阶段之后引入聚类步骤,并通过拟合高斯混合模型(GMM)到先验集合来估计先验密度函数。利用数据似然函数,后验密度被公式化为混合密度,并通过HMC方法(或其他能够高维子空间中采样多模态密度的方案)进行采样。本文开发的主要滤波器称为“聚类HMC采样滤波器”(ClHMC)。还提出了ClHMC的多链版本,即MC-ClHMC,以确保样本从后验概率模式的所有邻域中被采样。新方法在双涡旋风强迫和双谐波摩擦的准地转模型中进行了测试。数值结果展示了使用GMM放松HMC滤波范式中高斯先验假设的有用性。

英文摘要

This paper presents a fully non-Gaussian version of the Hamiltonian Monte Carlo (HMC) sampling filter. The Gaussian prior assumption in the original HMC filter is relaxed. Specifically, a clustering step is introduced after the forecast phase of the filter, and the prior density function is estimated by fitting a Gaussian Mixture Model (GMM) to the prior ensemble. Using the data likelihood function, the posterior density is then formulated as a mixture density, and is sampled using an HMC approach (or any other scheme capable of sampling multimodal densities in high-dimensional subspaces). The main filter developed herein is named "cluster HMC sampling filter" (ClHMC). A multi-chain version of the ClHMC filter, namely MC-ClHMC, is also proposed to guarantee that samples are taken from the vicinities of all probability modes of the formulated posterior. The new methodologies are tested using a quasi-geostrophic (QG) model with double-gyre wind forcing and bi-harmonic friction. Numerical results demonstrate the usefulness of using GMMs to relax the Gaussian prior assumption in the HMC filtering paradigm.

1608.00075 2026-06-04 stat.ML cs.IT cs.NA math.IT math.NA math.OC

Online Nonnegative Matrix Factorization with General Divergences

在线非负矩阵分解与通用分歧度

Renbo Zhao, Vincent Y. F. Tan, Huan Xu

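AI summary: Proposes a unified framework for online nonnegative matrix factorization under a wide range of divergences, proves that the learned dictionaries converge almost surely to the set of critical points of the expected loss, and verifies computational efficiency and dictionary quality on synthetic and real datasets.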

详情
AI中文摘要

我们开发了一个统一且系统的框架,用于在多种重要分歧度下执行在线非负矩阵分解。我们的算法具有在线特性,特别适合大规模数据。我们通过利用随机近似理论和投影动力系统理论,证明了学习字典序列几乎必然收敛到期望损失函数的临界点。这一结果大大扩展了之前仅针对平方$\ell_2$损失的结果。此外,我们的分析中涉及的新技术为类似的矩阵分解问题提供了新的分析途径。我们在合成和真实数据集上验证了算法的计算效率和学习字典的质量。特别是,在主题学习、阴影去除和图像去噪任务中,我们的算法在质量和运行时间之间实现了优于批量和其他在线NMF算法的权衡。

英文摘要

We develop a unified and systematic framework for performing online nonnegative matrix factorization under a wide variety of important divergences. The online nature of our algorithm makes it particularly amenable to large-scale data. We prove that the sequence of learned dictionaries converges almost surely to the set of critical points of the expected loss function. We do so by leveraging the theory of stochastic approximations and projected dynamical systems. This result substantially generalizes the previous results obtained only for the squared-$\ell_2$ loss. Moreover, the novel techniques involved in our analysis open new avenues for analyzing similar matrix factorization problems. The computational efficiency and the quality of the learned dictionary of our algorithm are verified empirically on both synthetic and real datasets. In particular, on the tasks of topic learning, shadow removal and image denoising, our algorithm achieves superior trade-offs between the quality of learned dictionary and running time over the batch and other online NMF algorithms.

1601.00863 2026-06-04 math.OC cs.CE cs.DC cs.NA math.NA stat.ML

Coordinate Friendly Structures, Algorithms and Applications

坐标友好结构、算法与应用

Zhimin Peng, Tianyu Wu, Yangyang Xu, Ming Yan, Wotao Yin

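AI summary: Derives new coordinate update algorithms for large-scale problems in machine learning, image processing, and optimization, and scales them to large instances through parallel and asynchronous computing.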

详情
Journal ref
Annals of Mathematical Sciences and Applications, 1 (2016), 57-119
AI中文摘要

本文聚焦于坐标更新方法,这些方法在处理大规模或高维数据集的问题时尤为有效。它们将问题分解为简单的子问题,每个子问题更新一个或一小块变量,同时固定其他变量。这些方法可以处理线性和非线性映射、光滑和非光滑函数,以及凸和非凸问题。此外,它们易于并行化。坐标更新方法的高性能依赖于解决简单的子问题。为了为几种新的应用类别导出简单的子问题,本文系统地研究了能够执行低成本坐标更新的坐标友好操作符。基于发现的坐标友好操作符以及操作符拆分技术,我们为机器学习、图像处理以及优化的子领域获得了新的坐标更新算法。几个问题首次采用坐标更新方法进行处理。通过并行和甚至异步计算,所获得的算法能够扩展到大规模实例。我们通过数值示例展示了这些算法的有效性。

英文摘要

This paper focuses on coordinate update methods, which are useful for solving problems involving large or high-dimensional datasets. They decompose a problem into simple subproblems, where each updates one, or a small block of, variables while fixing others. These methods can deal with linear and nonlinear mappings, smooth and nonsmooth functions, as well as convex and nonconvex problems. In addition, they are easy to parallelize. The great performance of coordinate update methods depends on solving simple sub-problems. To derive simple subproblems for several new classes of applications, this paper systematically studies coordinate-friendly operators that perform low-cost coordinate updates. Based on the discovered coordinate friendly operators, as well as operator splitting techniques, we obtain new coordinate update algorithms for a variety of problems in machine learning, image processing, as well as sub-areas of optimization. Several problems are treated with coordinate update for the first time in history. The obtained algorithms are scalable to large instances through parallel and even asynchronous computing. We present numerical examples to illustrate how effective these algorithms are.

1510.03591 2026-06-04 stat.ML cs.SY eess.SY math.OC

Dual Control for Approximate Bayesian Reinforcement Learning

双重控制用于近似贝叶斯强化学习

Edgar D. Klenske, Philipp Hennig

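AI summary: Revisits dual control as an approximation for the exploration-exploitation trade-off in systems with uncertain dynamics, extends it to modern regression methods, and gives examples of its use with Gaussian process regression and feedforward neural networks.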

详情
Comments
30 pages, 7 figures
AI中文摘要

对具有不确定动态的非回合、有限时间动态系统进行控制是一个探索与利用权衡的基本问题。贝叶斯强化学习通过考虑动作和未来观测的影响,提供了系统性的解决方案,但计算上不可行。本文回顾并扩展了控制理论中的旧近似方法——双重控制——在现代回归方法中的应用,特别是广义线性回归。实验表明,该框架为贝叶斯强化学习的不可行部分提供了有用的近似,产生了一种结构化的探索策略,不同于标准强化学习方法。本文还提供了该框架在(近似)高斯过程回归和前馈神经网络中的应用示例。

英文摘要

Control of non-episodic, finite-horizon dynamical systems with uncertain dynamics poses a tough and elementary case of the exploration-exploitation trade-off. Bayesian reinforcement learning, reasoning about the effect of actions and future observations, offers a principled solution, but is intractable. We review, then extend an old approximate approach from control theory---where the problem is known as dual control---in the context of modern regression methods, specifically generalized linear regression. Experiments on simulated systems show that this framework offers a useful approximation to the intractable aspects of Bayesian RL, producing structured exploration strategies that differ from standard RL approaches. We provide simple examples for the use of this framework in (approximate) Gaussian process regression and feedforward neural networks for the control of exploration.

1410.5939 2026-06-04 math.ST cs.NA math.NA stat.TH

Statistical Analysis of Synchrosqueezed Transforms

同步压缩变换的统计分析

Haizhao Yang

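AI summary: Studies the statistical properties of compactly supported synchrosqueezed transforms for wave-like components embedded in a generalized Gaussian random process in multidimensional spaces, and proposes new numerical implementations that reduce noise fluctuations.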

详情
AI中文摘要

同步压缩变换是非线性过程,用于获得波形成分的锐化时频表示。它们是高效工具,用于从其叠加中识别和分析波形成分。本文关注的是在多维空间中嵌入波形成分的广义高斯随机过程中的紧凑支持同步压缩变换的统计特性。在这些性质的理论分析指导下,提出了新的数值实现,以减少这些变换在噪声数据上的噪声波动。提供的MATLAB包SynLab以及几个高度噪声的示例支持这些理论主张。

英文摘要

Synchrosqueezed transforms are non-linear processes for a sharpened time-frequency representation of wave-like components. They are efficient tools for identifying and analyzing wave-like components from their superposition. This paper is concerned with the statistical properties of compactly supported synchrosqueezed transforms for wave-like components embedded in a generalized Gaussian random process in multidimensional spaces. Guided by the theoretical analysis of these properties, new numerical implementations are proposed to reduce the noise fluctuations of these transforms on noisy data. A MATLAB package SynLab together with several heavily noisy examples is provided to support these theoretical claims.

1608.02221 2026-06-04 stat.AP cs.NA math.NA

Quantile based global sensitivity measures

基于分位数的全局敏感性度量

Sergei Kucherenko, Shufang Song

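AI summary: Introduces global sensitivity measures based on output quantiles, for problems in which quantiles are the functions of interest and for identifying the variables that drive extreme outputs; proves a link to Sobol main effect indices, compares two Monte Carlo estimators, and illustrates the method on case studies related to structural safety.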

详情
Comments
25 pages, 9 figures
AI中文摘要

本文引入基于输出分位数的新全局敏感性度量方法。这些度量可用于分析量化为研究函数的问题,并识别在模型输出极值中最重要的变量。证明了所引入的度量与Sobol主效应敏感性指数之间存在联系。考虑了两种不同的蒙特卡洛估计器,显示双循环重新排序方法比暴力估计器更高效。利用多个测试案例和实际案例研究来说明所开发的方法。数值计算结果展示了所提技术的有效性。

英文摘要

New global sensitivity measures based on quantiles of the output are introduced. Such measures can be used for global sensitivity analysis of problems in which quantiles are explicitly the functions of interest and for identification of variables which are the most important in achieving extreme values of the model output. It is proven that there is a link between introduced measures and Sobol main effect sensitivity indices. Two different Monte Carlo estimators are considered. It is shown that the double loop reordering approach is much more efficient than the brute force estimator. Several test cases and practical case studies related to structural safety are used to illustrate the developed method. Results of numerical calculations show the efficiency of the presented technique.

1603.09157 2026-06-04 stat.CO cs.SY eess.SY

Linear System Identification via EM with Latent Disturbances and Lagrangian Relaxation

通过EM算法和拉格朗日松弛进行线性系统辨识:考虑潜在扰动和拉格朗日松弛

Jack Umenberger, Johan Wågberg, Ian R. Manchester, Thomas B. Schön

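AI summary: Proposes treating system disturbances as the latent variables when identifying dynamical systems with the EM algorithm, which handles singular state space models and, under certain circumstances, converges in fewer iterations; the resulting nonconvex problems are addressed with Lagrangian relaxation and semidefinite programming.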

详情
Comments
21 pages, 4 figures
AI中文摘要

在将期望最大化算法应用于动态系统辨识时,通常将内部状态作为潜在变量以简化问题。本文提出了一种不同的潜在变量选择,即系统扰动。这种 formulation 精巧地处理了奇异状态空间模型的问题,并在某些情况下提高了对似然界精度的估计,从而在更少的迭代次数内实现收敛。为了利用这些好处,我们开发了非凸优化问题的拉格朗日松弛,并通过半定规划进行求解。

英文摘要

In the application of the Expectation Maximization algorithm to identification of dynamical systems, internal states are typically chosen as latent variables, for simplicity. In this work, we propose a different choice of latent variables, namely, system disturbances. Such a formulation elegantly handles the problematic case of singular state space models, and is shown, under certain circumstances, to improve the fidelity of bounds on the likelihood, leading to convergence in fewer iterations. To access these benefits we develop a Lagrangian relaxation of the nonconvex optimization problems that arise in the latent disturbances formulation, and proceed via semidefinite programming.

1608.00512 2026-06-04 math.NA cs.NA math.PR math.ST stat.TH

Optimal weighted least-squares methods

最优加权最小二乘方法

Albert Cohen, Giovanni Migliorati

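AI summary: Studies weighted least-squares methods for reconstructing an unknown bounded function from noiseless or noisy samples, and gives conditions for stability and optimal accuracy in general approximation spaces.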

详情
AI中文摘要

本文研究了从噪声样本中重建未知有界函数$u$的最优加权最小二乘方法,探讨了在一般逼近空间中的稳定性与最优精度条件。

英文摘要

We consider the problem of reconstructing an unknown bounded function $u$ defined on a domain $X\subset \mathbb{R}^d$ from noiseless or noisy samples of $u$ at $n$ points $(x^i)_{i=1,\dots,n}$. We measure the reconstruction error in a norm $L^2(X,dρ)$ for some given probability measure $dρ$. Given a linear space $V_m$ with ${\rm dim}(V_m)=m\leq n$, we study in general terms the weighted least-squares approximations from the spaces $V_m$ based on independent random samples. The contribution of the present paper is twofold. From the theoretical perspective, we establish results in expectation and in probability for weighted least squares in general approximation spaces $V_m$. These results show that for an optimal choice of sampling measure $dμ$ and weight $w$, which depends on the space $V_m$ and on the measure $dρ$, stability and optimal accuracy are achieved under the mild condition that $n$ scales linearly with $m$ up to an additional logarithmic factor. The present analysis covers also cases where the function $u$ and its approximants from $V_m$ are unbounded, which might occur for instance in the relevant case where $X=\mathbb{R}^d$ and $dρ$ is the Gaussian measure. From the numerical perspective, we propose a sampling method which allows one to generate independent and identically distributed samples from the optimal measure $dμ$. This method becomes of interest in the multivariate setting where $dμ$ is generally not of tensor product type. We illustrate this for particular examples of approximation spaces $V_m$ of polynomial type, where the domain $X$ is allowed to be unbounded and high or even infinite dimensional, motivated by certain applications to parametric and stochastic PDEs.

1607.08712 2026-06-04 cs.IT cs.NA math.IT math.NA stat.ME

Signal Recovery in Uncorrelated and Correlated Dictionaries Using Orthogonal Least Squares

利用正交最小二乘法在不相关和相关字典中进行信号恢复

Samrat Mukhopadhyay, Prateek Vashishtha, Mrityunjoy Chakraborty

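AI summary: Studies Orthogonal Least Squares (OLS) for signal recovery from compressed measurements, showing that it is competitive with greedy algorithms such as OMP in uncorrelated dictionaries and outperforms OMP in correlated ones.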

详情
Comments
18 Pages, 8 figures
AI中文摘要

尽管最小二乘法在信号处理问题中已使用多年,但在压缩测量稀疏恢复领域并未得到广泛关注。本文展示了一种属于最小二乘法家族的方法,即正交最小二乘法(OLS),在压缩恢复问题中具有竞争力的恢复性能和计算复杂度,使其成为OMP等流行贪心方法的合适替代品。我们证明,通过轻微修改,OLS可以在M=O(K log(N/K))次测量中精确恢复嵌入在N维空间中的K-稀疏信号(K<<N),在高斯字典中。我们还证明,OLS可以以类似OMP的方式实现,仅需O(KMN)次浮点运算。本文还研究了在相关字典中的OLS性能,其中OMP无法表现出良好的恢复性能。我们研究了在特定称为广义混合字典的字典中OLS的恢复性能,该字典已被证明是相关字典,并通过数值实验证明,OLS在这些字典中比OMP在恢复性能上更优。最后,我们提供分析论证来支持数值示例中的发现。

英文摘要

Though the method of least squares has been used for a long time in solving signal processing problems, in the recent field of sparse recovery from compressed measurements, this method has not been given much attention. In this paper we show that a method in the least squares family, known in the literature as Orthogonal Least Squares (OLS), adapted for compressed recovery problems, has competitive recovery performance and computational complexity, which makes it a suitable alternative to popular greedy methods like Orthogonal Matching Pursuit (OMP). We show that with a slight modification, OLS can exactly recover a $K$-sparse signal embedded in an $N$-dimensional space ($K\ll N$) from $M=\mathcal{O}(K\log (N/K))$ measurements with Gaussian dictionaries. We also show that OLS can be easily implemented in such a way that it requires $\mathcal{O}(KMN)$ floating point operations, similar to OMP. In this paper the performance of OLS is also studied for sensing matrices with correlated dictionaries, for which algorithms like OMP do not exhibit good recovery performance. We study the recovery performance of OLS in a specific dictionary called \emph{generalized hybrid dictionary}, which is shown to be a correlated dictionary, and show numerically that OLS is far superior to OMP in this kind of dictionary in terms of recovery performance. Finally we provide analytical justifications that corroborate the findings in the numerical illustrations.

1605.03017 2026-06-04 math.ST cs.SY eess.SY stat.TH

Marginalized Particle Filtering and Related Filtering Techniques as Message Passing

边缘粒子滤波及相关滤波技术作为信息传递

Giorgio M. Vitetta, Emilio Sirignano, Francesco Montorsi, Matteo Sola

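AI summary: Uses a factor graph approach to study recursive filtering for mixed linear/nonlinear state-space models, shows that the factor graph is not cycle free, and shows that for conditionally linear Gaussian systems different message-passing schedules yield both known and novel filters, including marginalized particle filtering and turbo filtering.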

详情
AI中文摘要

本文采用因子图方法研究混合线性/非线性状态空间模型的递归滤波问题。研究发现,所考虑的滤波问题的因子图非环特性,并证明在条件线性高斯系统中,应用求和-乘积规则结合不同消息传递调度策略可得到已知和新颖的滤波技术。具体而言,特定前向消息传递调度策略可自然导出边缘粒子滤波;而迭代消息传递策略可开发出概念上类似turbo解码方法的turbo滤波技术。

英文摘要

In this manuscript a factor graph approach is employed to investigate the recursive filtering problem for a mixed linear/nonlinear state-space model, i.e. for a model whose state vector can be partitioned into a linear state variable (characterized by conditionally linear dynamics) and a nonlinear state variable. Our approach allows us to show that: a) the factor graph characterizing the considered filtering problem is not cycle free; b) in the case of conditionally linear Gaussian systems, applying the sum-product rule, together with different scheduling procedures for message passing, to this graph results in both known and novel filtering techniques. In particular, it is proved that, on the one hand, adopting a specific message scheduling for forward-only message passing leads to marginalized particle filtering in a natural fashion; on the other hand, if iterative strategies for message passing are employed, novel filtering methods, dubbed turbo filters for their conceptual resemblance to the turbo decoding methods devised for concatenated channel codes, can be developed.

1605.05588 2026-06-04 eess.SY cs.SY stat.AP stat.ML

A Distributed Quaternion Kalman Filter With Applications to Fly-by-Wire Systems

一种分布式四元数卡尔曼滤波器及其在飞控系统中的应用

Sayed Pouria Talebi

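AI summary: Proposes a distributed quaternion Kalman filtering algorithm for tracking the orientation and rotation of an aircraft in three-dimensional space; computation is distributed among the sensors for robustness to sensor and link failure, and the quaternion-valued model avoids gimbal lock.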

详情
Comments
It had to be noted that the assumption was made that all sensors have access to all observations and state estimate vectors. In addition, the summations in the DAQKF Algorithm are on all sensors, not just the neighbouring sensors
AI中文摘要

自动飞行控制系统和管理系统的引入使飞机设计能够牺牲空气动力学稳定性以融入隐身技术,从而更高效地操作并具有高度机动性。因此,现代飞行管理系统依赖多个冗余传感器来监控和控制飞机的旋转。为此,开发了一种新的分布式四元数卡尔曼滤波算法,用于跟踪飞机在三维空间中的旋转和姿态。该算法旨在将计算分布式到传感器中,使它们同意唯一解,同时对传感器和链路故障具有鲁棒性,这是飞行管理系统所需的特点。此外,基于四元数的状态空间模型可以避免与万向节锁相关的问题。通过仿真验证了所开发算法的性能。

英文摘要

The introduction of automated flight control and management systems has made possible aircraft designs that sacrifice aerodynamic stability in order to incorporate stealth technology into their shape, operate more efficiently, and be highly maneuverable. Therefore, modern flight management systems are reliant on multiple redundant sensors to monitor and control the rotations of the aircraft. To this end, a novel distributed quaternion Kalman filtering algorithm is developed for tracking the rotation and orientation of an aircraft in three-dimensional space. The algorithm is developed to distribute computation among the sensors in a manner that forces them to consent to a unique solution while being robust to sensor and link failure, a desirable characteristic for flight management systems. In addition, the underlying quaternion-valued state space model avoids the problems associated with gimbal lock. The performance of the developed algorithm is verified through simulations.
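As background on why a quaternion-valued state sidesteps gimbal lock, the sketch below rotates a vector with unit quaternions. It is only an illustration of quaternion rotation, not the distributed Kalman filter of the paper; the function names `quat_mul`, `quat_from_axis_angle`, and `rotate` are hypothetical. Rotations compose by the Hamilton product, so no Euler-angle sequence (and no singular pitch angle) is involved.

```python
import math


def quat_mul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def quat_from_axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about a unit `axis`."""
    half = 0.5 * angle
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q via q * (0, v) * conj(q)."""
    conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), conj)
    return (x, y, z)


if __name__ == "__main__":
    yaw = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
    pitch = quat_from_axis_angle((0.0, 1.0, 0.0), math.pi / 2)
    # Apply yaw first, then pitch: the composite is a single quaternion.
    combined = quat_mul(pitch, yaw)
    print(rotate(yaw, (1.0, 0.0, 0.0)))
    print(rotate(combined, (1.0, 0.0, 0.0)))
```

A 90 degree pitch is exactly the configuration where a yaw-pitch-roll parametrization loses a degree of freedom; the composite quaternion above remains well defined there.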

1607.05073 2026-06-04 math.NA cs.NA stat.ML

Higher-Order Block Term Decomposition for Spatially Folded fMRI Data

高阶块项分解用于空间折叠的fMRI数据

Christos Chatzichristos, Eleftherios Kofidis, Giannis Kopsinis, Sergios Theodoridis

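AI summary: Applies a higher-order Block Term Decomposition (BTD) to fMRI data, modeling the signal as a 4th- or 5th-order tensor to better exploit the spatial dimension; effectiveness is demonstrated through simulations.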

详情
AI中文摘要

随着神经影像技术的广泛应用,生物医学数据量激增,具有高维特性。基于张量的脑影像数据分析在利用其多向性质方面已被证明非常有效。张量方法相对于矩阵方法在功能磁共振成像(fMRI)数据的特征化中也已得到验证,其中空间(体素)维度通常被分组(展开)为3阶数组的单一方式/模式,其他两个方式对应时间与受试者。然而,此类方法在更严苛的场景下表现不佳,如强噪声或激活区域显著重叠的情况。本文旨在通过更充分地利用空间维度,通过4或5阶张量建模来探索可能的增益。在此背景下,为增加建模过程的自由度,首次在fMRI分析中应用了高阶块项分解(BTD)。其有效性通过广泛的模拟结果得到验证。

英文摘要

The growing use of neuroimaging technologies generates a massive amount of biomedical data that exhibit high dimensionality. Tensor-based analysis of brain imaging data has proved quite effective in exploiting their multiway nature. The advantages of tensorial methods over matrix-based approaches have also been demonstrated in the characterization of functional magnetic resonance imaging (fMRI) data, where the spatial (voxel) dimensions are commonly grouped (unfolded) as a single way/mode of the third-order array, the other two ways corresponding to time and subjects. However, such methods are known to be ineffective in more demanding scenarios, such as the ones with strong noise and/or significant overlapping of activated regions. This paper aims at investigating the possible gains from a better exploitation of the spatial dimension, through a higher- (4 or 5) order tensor modeling of the fMRI signal. In this context, and in order to increase the degrees of freedom of the modeling process, a higher-order Block Term Decomposition (BTD) is applied, for the first time in fMRI analysis. Its effectiveness is demonstrated via extensive simulation results.

1508.06700 2026-06-04 math.NA cs.NA stat.CO

A surrogate accelerated multicanonical Monte Carlo method for uncertainty quantification

一种加速的多-canonical 蒙特卡罗方法用于不确定性量化

Keyi Wu, Jinglai Li

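AI summary: Proposes a surrogate-accelerated multicanonical Monte Carlo method that builds local Gaussian process surrogates to speed up estimation of the probability density function of a scalar system performance parameter.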

详情
AI中文摘要

在本文中,我们考虑了一类不确定性量化问题,其中系统性能或可靠性由标量参数y描述。由于系统中存在多种不确定性源,性能参数y是随机的,我们的目标是估计y的概率密度函数(PDF)。我们提出使用多-canonical 蒙特卡罗(MMC)方法,一种特殊的自适应重要性抽样算法,来计算感兴趣的PDF。此外,我们开发了一种自适应算法来构建局部高斯过程代理模型,以进一步加速MMC迭代。通过数值示例,我们证明所提出的方法在标准蒙特卡罗方法上可以实现多个数量级的加速。

英文摘要

In this work we consider a class of uncertainty quantification problems where the system performance or reliability is characterized by a scalar parameter $y$. The performance parameter $y$ is random due to the presence of various sources of uncertainty in the system, and our goal is to estimate the probability density function (PDF) of $y$. We propose to use the multicanonical Monte Carlo (MMC) method, a special type of adaptive importance sampling algorithm, to compute the PDF of interest. Moreover, we develop an adaptive algorithm to construct local Gaussian process surrogates to further accelerate the MMC iterations. With numerical examples we demonstrate that the proposed method can achieve several orders of magnitude of speedup over the standard Monte Carlo method.

1502.00592 2026-06-04 stat.ME cs.CV cs.MM cs.NA math.NA stat.AP

A Class of DCT Approximations Based on the Feig-Winograd Algorithm

基于Feig-Winograd算法的一类DCT近似方法

C. J. Tablada, F. M. Bayer, R. J. Cintra

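AI summary: Proposes a parametrized class of matrices based on the Feig-Winograd factorization of the 8-point DCT and uses multicriteria optimization to find new DCT approximations with low multiplierless complexity, orthogonality or near orthogonality, low-complexity invertibility, and performance close to the exact DCT.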

详情
Journal ref
Signal Processing, vol. 113, pp. 38-51, August 2015
Comments
26 pages, 4 figures, 5 tables, fixed arithmetic complexity in Table IV
AI中文摘要

本文提出了一种基于Feig-Winograd因子化8点DCT的参数化矩阵类。此类参数化诱导出一个矩阵子空间,统一了多种现有的DCT近似方法。通过求解一个综合的多目标优化问题,我们识别出几种新的DCT近似方法。获得的解旨在具备以下特性:(i) 低乘法无计算复杂度,(ii) 正交或近正交性,(iii) 低复杂度可逆性,(iv) 接近精确DCT的接近性和性能。所提出的方法在接近DCT、编码性能及图像压缩适用性方面进行了评估。考虑到帕累托效率,某些新提出的近似方法可能在文献中已有的各种现有方法上表现更优。

英文摘要

A new class of matrices based on a parametrization of the Feig-Winograd factorization of 8-point DCT is proposed. Such parametrization induces a matrix subspace, which unifies a number of existing methods for DCT approximation. By solving a comprehensive multicriteria optimization problem, we identified several new DCT approximations. Obtained solutions were sought to possess the following properties: (i) low multiplierless computational complexity, (ii) orthogonality or near orthogonality, (iii) low complexity invertibility, and (iv) close proximity and performance to the exact DCT. Proposed approximations were submitted to assessment in terms of proximity to the DCT, coding performance, and suitability for image compression. Considering Pareto efficiency, particular new proposed approximations could outperform various existing methods archived in literature.

1402.1298 2026-06-04 math.NA cond-mat.stat-mech cs.IT cs.LG cs.NA math.IT stat.ML

Phase transitions and sample complexity in Bayes-optimal matrix factorization

贝叶斯最优矩阵分解中的相变与样本复杂性

Yoshiyuki Kabashima, Florent Krzakala, Marc Mézard, Ayaka Sakata, Lenka Zdeborová

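AI summary: Studies phase transitions and sample complexity in Bayes-optimal matrix factorization, using statistical mechanics methods to analyze achievability and computational tractability, and computes both the minimal mean-squared error and the error reached by an efficient approximate message passing algorithm.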

详情
Journal ref
IEEE Transactions on Information Theory (Volume:62 , Issue: 7, Pages: 4228 - 4265) 2016
Comments
50 pages, 10 figures
AI中文摘要

我们分析了矩阵分解问题。给定两个矩阵乘积的噪声测量,问题在于恢复原始矩阵。它在许多应用中出现,如字典学习、盲矩阵校准、稀疏主成分分析、盲源分离、低秩矩阵补全、鲁棒主成分分析或因子分析。它在机器学习中也很重要:无监督表示学习往往可以通过矩阵分解研究。我们使用统计力学工具——空腔和副本方法——来分析贝叶斯最优推断设置中推断问题的可行性和计算可处理性,即假设两个矩阵具有随机独立元素,由某些已知分布生成,并且该信息可供推断算法使用。在此设置中,我们计算了在任何计算时间内理论上可实现的最小均方误差,以及高效近似消息传递算法可达到的误差。计算基于算法的渐进状态演变分析。我们的分析预测的性能,无论是就达到的均方误差而言,还是就样本复杂性而言,都非常有希望,值得进一步发展该算法。

英文摘要

We analyse the matrix factorization problem. Given a noisy measurement of a product of two matrices, the problem is to estimate back the original matrices. It arises in many applications such as dictionary learning, blind matrix calibration, sparse principal component analysis, blind source separation, low rank matrix completion, robust principal component analysis or factor analysis. It is also important in machine learning: unsupervised representation learning can often be studied through matrix factorization. We use the tools of statistical mechanics - the cavity and replica methods - to analyze the achievability and computational tractability of the inference problems in the setting of Bayes-optimal inference, which amounts to assuming that the two matrices have random independent elements generated from some known distribution, and this information is available to the inference algorithm. In this setting, we compute the minimal mean-squared-error achievable in principle in any computational time, and the error that can be achieved by an efficient approximate message passing algorithm. The computation is based on the asymptotic state-evolution analysis of the algorithm. The performance that our analysis predicts, both in terms of the achieved mean-squared-error, and in terms of sample complexity, is extremely promising and motivating for a further development of the algorithm.

1607.03954 2026-06-04 stat.ME cs.NA math.NA stat.CO stat.ML

Ensemble preconditioning for Markov chain Monte Carlo simulation

马尔可夫链蒙特卡洛模拟中的集合预条件

Charles Matthews, Jonathan Weare, Benedict Leimkuhler

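AI summary: Describes parallel Markov chain Monte Carlo methods that propagate an ensemble of paths using local covariance information computed from neighboring replicas; the collective dynamics eliminate multiplicative noise and stabilize the dynamics for anisotropic sampling problems in high dimensions.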

详情
AI中文摘要

我们描述了并行马尔可夫链蒙特卡洛方法,这些方法传播一个集体路径集合,其中局部协方差信息是从邻近副本计算的。集体动力学的使用消除了乘性噪声并稳定了动态,从而提供了解决高维各向异性采样问题的实用方法。数值实验表明,与各种替代方案相比,可以实现显著的加速效果。

英文摘要

We describe parallel Markov chain Monte Carlo methods that propagate a collective ensemble of paths, with local covariance information calculated from neighboring replicas. The use of collective dynamics eliminates multiplicative noise and stabilizes the dynamics, thus providing a practical approach to difficult anisotropic sampling problems in high dimensions. Numerical experiments with model problems demonstrate that dramatic potential speedups, compared to various alternative schemes, are attainable.

1601.05257 2026-06-04 eess.SY cs.RO cs.SY stat.AP

Magnetometer calibration using inertial sensors

使用惯性传感器的磁力计校准

Manon Kok, Thomas B. Schön

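AI summary: Presents a practical algorithm that calibrates a magnetometer for magnetic disturbances and sensor errors and corrects misalignment with the inertial sensor axes, so that the two sets of measurements can be combined for orientation estimation.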

详情
Journal ref
IEEE Sensors Journal, Volume 16, Issue 14, Pages 5679--5689, 2016
Comments
19 pages, 8 figures
AI中文摘要

本文提出了一种实用算法,用于校准磁力计以应对磁干扰和传感器误差。该算法还校正磁力计与惯性传感器轴之间的偏移,将磁力计测量与惯性测量结合用于姿态估计。校准算法被建模为最大似然问题的解决方案,并在离线环境中进行计算。使用两个不同商用传感器单元的数据验证了该算法的有效性。将校准后的磁力计测量与惯性传感器结合用于确定传感器姿态,显著提高了航向估计的精度。

英文摘要

In this work we present a practical algorithm for calibrating a magnetometer for the presence of magnetic disturbances and for magnetometer sensor errors. To allow for combining the magnetometer measurements with inertial measurements for orientation estimation, the algorithm also corrects for misalignment between the magnetometer and the inertial sensor axes. The calibration algorithm is formulated as the solution to a maximum likelihood problem and the computations are performed offline. The algorithm is shown to give good results using data from two different commercially available sensor units. Using the calibrated magnetometer measurements in combination with the inertial sensors to determine the sensor's orientation is shown to lead to significantly improved heading estimates.

1506.06669 2026-06-04 econ.GN q-fin.EC stat.AP

Understanding the Impact of Microcredit Expansions: A Bayesian Hierarchical Analysis of 7 Randomised Experiments

理解微贷扩展的影响:7项随机实验的贝叶斯分层分析

Rachael Meager

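AI summary: Applies a Bayesian hierarchical model to 7 randomized experiments on microcredit expansion; the effect on household outcomes is likely positive but small, a negative impact cannot be ruled out, and common pooled meta-analytic methods produce misleading "statistically significant" results.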

详情
Comments
This draft is preliminary and incomplete; future versions of this paper will contain substantive additional results
AI中文摘要

贝叶斯分层模型是一种用于聚合和综合异质环境数据的方法,广泛应用于统计学和其他领域。本文将该框架应用于7项扩展微贷访问权限的随机实验数据,以评估干预对家庭结果的一般影响以及不同地点的异质性影响。结果表明,微贷的影响可能为正但相对控制组平均水平较小,无法排除负面影响的可能性。相比之下,传统元分析方法在不评估异质性的情况下,导致6个家庭结果中的2个出现误导性的统计显著性结果。标准汇总指标显示研究的平均汇总率为60%,表明地点特定效应在外部有效性上是合理的,因此对彼此和一般情况具有信息价值。跨研究异质性几乎完全由27%之前从事商业的家庭异质效应产生,尽管该群体可能整体影响更大。通过岭回归程序评估地点特定协变量与治疗效应之间的相关性,表明剩余异质性强烈相关于经济变量差异,但与研究设计协议差异无关。平均利率和平均贷款规模与治疗效应相关性最强,且均为负相关。

英文摘要

Bayesian hierarchical models are a methodology for aggregation and synthesis of data from heterogeneous settings, used widely in statistics and other disciplines. I apply this framework to the evidence from 7 randomized experiments of expanding access to microcredit to assess the general impact of the intervention on household outcomes and the heterogeneity in this impact across sites. The results suggest that the effect of microcredit is likely to be positive but small relative to control group average levels, and the possibility of a negative impact cannot be ruled out. By contrast, common meta-analytic methods that pool all the data without assessing the heterogeneity misleadingly produce "statistically significant" results in 2 of the 6 household outcomes. Standard pooling metrics for the studies indicate on average 60% pooling on the treatment effects, suggesting that the site-specific effects are reasonably externally valid, and thus informative for each other and for the general case. The cross-study heterogeneity is almost entirely generated by heterogeneous effects for the 27% of households who operated businesses before microcredit expansion, although this group is likely to see much larger impacts overall. A Ridge regression procedure to assess the correlations between site-specific covariates and treatment effects indicates that the remaining heterogeneity is strongly correlated with differences in economic variables, but not with differences in study design protocols. The average interest rate and the average loan size have the strongest correlation with the treatment effects, and both are negative.

1307.4847 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

在确定性系统中通过价值函数泛化实现高效的强化学习

Zheng Wen, Benjamin Van Roy

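AI summary: Proposes optimistic constraint propagation (OCP), which combines efficient exploration with value function generalization in finite-horizon deterministic systems; when the true value function lies in the hypothesis class, OCP selects optimal actions in all but at most K episodes, where K is the eluder dimension of the class, and further efficiency and asymptotic performance guarantees are given.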

详情
AI中文摘要

我们考虑在有限时间 horizon 确定性系统中进行强化学习的问题,并提出乐观约束传播(OCP)算法,该算法旨在合成高效的探索和价值函数泛化。我们证明当真实价值函数位于给定的假设类中时,OCP在最多K个episode中选择最优动作,其中K是给定假设类的eluder维度。我们进一步建立了效率和渐进行为保证,即使真实价值函数不位于给定的假设类中,对于假设类是预指定指示函数在不相交集合上的张量的特殊情况。我们还讨论了OCP的计算复杂性,并展示了两个示例的计算结果。

英文摘要

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function lies within a given hypothesis class, OCP selects optimal actions over all but at most K episodes, where K is the eluder dimension of the given hypothesis class. We establish further efficiency and asymptotic performance guarantees that apply even if the true value function does not lie in the given hypothesis class, for the special case where the hypothesis class is the span of pre-specified indicator functions over disjoint sets. We also discuss the computational complexity of OCP and present computational results involving two illustrative examples.

1607.01458 2026-06-04 stat.CO cs.NA math.NA

A hybrid adaptive MCMC algorithm in function spaces

函数空间中的混合自适应MCMC算法

Qingping Zhou, Zixi Hu, Zhewei Yao, Jinglai Li

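AI summary: Proposes a hybrid adaptive MCMC algorithm that runs an adaptive Metropolis scheme in a chosen finite-dimensional subspace and standard pCN in its complement, improving sampling efficiency in function spaces while satisfying certain ergodicity conditions.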

详情
Comments
arXiv admin note: text overlap with arXiv:1511.05838
AI中文摘要

预条件Crank-Nicolson(pCN)方法是一种马尔可夫链蒙特卡洛(MCMC)方案,专门设计用于函数空间中的贝叶斯推断。与许多标准MCMC算法不同,pCN方法可以在网格细化下保持采样效率,这一特性称为维度无关性。本文考虑了一种自适应策略以进一步提高pCN的效率。具体而言,我们开发了一种混合自适应MCMC方法:该算法在选定的有限维子空间中执行自适应Metropolis方案,并在选定子空间的补空间中执行标准pCN算法。我们证明了所提出算法满足某些重要的遍历性条件。最后,通过数值示例我们展示了所提出方法与现有自适应算法具有竞争性的性能。

英文摘要

The preconditioned Crank-Nicolson (pCN) method is a Markov Chain Monte Carlo (MCMC) scheme, specifically designed to perform Bayesian inferences in function spaces. Unlike many standard MCMC algorithms, the pCN method can preserve the sampling efficiency under the mesh refinement, a property referred to as being dimension independent. In this work we consider an adaptive strategy to further improve the efficiency of pCN. In particular we develop a hybrid adaptive MCMC method: the algorithm performs an adaptive Metropolis scheme in a chosen finite dimensional subspace, and a standard pCN algorithm in the complement space of the chosen subspace. We show that the proposed algorithm satisfies certain important ergodicity conditions. Finally with numerical examples we demonstrate that the proposed method has competitive performance with existing adaptive algorithms.
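For orientation, the standard pCN step that the hybrid scheme builds on can be sketched as follows. This is a minimal finite-dimensional illustration assuming a standard Gaussian prior N(0, I) and a user-supplied negative log-likelihood `potential`; it does not include the adaptive Metropolis part proposed in the paper, and the name `pcn_sample` and its parameters are hypothetical. The proposal preserves the prior, so the acceptance probability depends only on the change in the potential.

```python
import math
import random


def pcn_sample(potential, dim, n_steps, beta=0.5, seed=0):
    """pCN chain for a posterior proportional to exp(-potential(u)) * N(u; 0, I)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - beta * beta)
    u = [0.0] * dim
    phi_u = potential(u)
    chain, accepted = [], 0
    for _ in range(n_steps):
        # Prior-preserving proposal: v = sqrt(1 - beta^2) * u + beta * xi, xi ~ N(0, I).
        v = [scale * ui + beta * rng.gauss(0.0, 1.0) for ui in u]
        phi_v = potential(v)
        # Accept with probability min(1, exp(potential(u) - potential(v))).
        if rng.random() < math.exp(min(0.0, phi_u - phi_v)):
            u, phi_u = v, phi_v
            accepted += 1
        chain.append(list(u))
    return chain, accepted / n_steps


if __name__ == "__main__":
    # Toy problem (an assumption): one observation y = 1 with unit noise,
    # so the exact posterior is N(0.5, 0.5).
    chain, rate = pcn_sample(lambda u: 0.5 * (1.0 - u[0]) ** 2, 1, 20000)
    kept = [s[0] for s in chain[2000:]]
    print(sum(kept) / len(kept), rate)
```

Because the prior never enters the acceptance ratio, the same code works unchanged as `dim` grows, which is the dimension-independence property described in the abstract.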

1607.00345 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

Convergence Rate of Frank-Wolfe for Non-Convex Objectives

非凸目标函数下Frank-Wolfe算法的收敛速度

Simon Lacoste-Julien

AI总结 本文证明Frank-Wolfe算法在梯度Lipschitz连续的非凸目标函数上以O(1/√t)的速率收敛到平稳点,分析具有仿射不变性,并首次得到与投影梯度方法相近的收敛速率(所用的平稳性度量略有不同)。

详情
Comments
6 pages
AI中文摘要

我们给出一个简单的证明:对于梯度Lipschitz连续的非凸目标函数,Frank-Wolfe算法以$O(1/\sqrt{t})$的速率得到平稳点。我们的分析是仿射不变的,并且据我们所知,首次给出了与投影梯度方法已有结果相近的速率(尽管所用的平稳性度量略有不同)。

英文摘要

We give a simple proof that the Frank-Wolfe algorithm obtains a stationary point at a rate of $O(1/\sqrt{t})$ on non-convex objectives with a Lipschitz continuous gradient. Our analysis is affine invariant and is the first, to the best of our knowledge, giving a similar rate to what was already proven for projected gradient methods (though on slightly different measures of stationarity).
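
As a rough illustration of the setting, the sketch below runs Frank-Wolfe on the probability simplex for a non-convex objective and records the Frank-Wolfe gap, the stationarity measure such analyses typically track. Everything here is an assumption made for the example: the function name `frank_wolfe_simplex`, the choice of the simplex as the feasible set, and the step rule `min(gap / (L * ||d||^2), 1)` based on a known gradient Lipschitz constant `L`. The paper's own step size and constants may differ.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, lipschitz, n_iters=100, tol=1e-10):
    """Frank-Wolfe over the probability simplex; returns iterate and FW gaps."""
    x = np.asarray(x0, dtype=float).copy()
    gaps = []
    for _ in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # linear minimization oracle: best vertex
        d = s - x
        gap = float(-(g @ d))          # FW gap, nonnegative on the simplex
        gaps.append(gap)
        if gap <= tol:                 # approximately stationary
            break
        gamma = min(gap / (lipschitz * float(d @ d)), 1.0)
        x = x + gamma * d
    return x, gaps

# Concave (hence non-convex) toy objective f(x) = -0.5 * ||x||^2, gradient -x.
x, gaps = frank_wolfe_simplex(lambda x: -x, [0.5, 0.3, 0.2], lipschitz=1.0)
print(x, gaps[-1])
```

A zero gap certifies a stationary point, not a global minimizer, which matches the kind of guarantee stated in the abstract.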

1607.00514 2026-06-04 math.NA cs.LG cs.NA stat.ML

Approximate Joint Matrix Triangularization

近似联合矩阵三角化

Nicolo Colombo, Nikos Vlassis

AI总结 本文研究含噪声的可联合对角化矩阵的近似联合三角化问题,给出基于真实矩阵的先验扰动界和基于可观测量的后验扰动界,并讨论其在张量分解中的应用。

详情
Comments
19 pages
AI中文摘要

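我们研究一组含噪声的可联合对角化实矩阵的近似联合三角化问题。近似联合三角化子常用于估计一组矩阵的联合特征结构,在信号处理、线性代数和张量分解中均有应用。假设输入矩阵是无噪声、可同时对角化的真实矩阵的扰动,则近似联合三角化子应当是真实矩阵精确联合三角化子的扰动。我们给出了近似联合三角化子与其精确对应物之间“距离”的先验和后验扰动界。先验界是涉及真实矩阵和噪声矩阵的函数的理论不等式,后验界则由可从输入矩阵计算出的可观测量给出。从实际角度看,寻找一组含噪声矩阵的最佳近似联合三角化子等价于求解一个非凸优化问题。我们证明,当输入矩阵的噪声水平满足一定条件时,可以找到一个良好的初始三角化子,使得任何局部下降型算法得到的解都具有一定的全局保证。最后,我们讨论了近似联合矩阵三角化在典范张量分解中的应用,并推导了新的估计误差界。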

英文摘要

We consider the problem of approximate joint triangularization of a set of noisy jointly diagonalizable real matrices. Approximate joint triangularizers are commonly used in the estimation of the joint eigenstructure of a set of matrices, with applications in signal processing, linear algebra, and tensor decomposition. By assuming the input matrices to be perturbations of noise-free, simultaneously diagonalizable ground-truth matrices, the approximate joint triangularizers are expected to be perturbations of the exact joint triangularizers of the ground-truth matrices. We provide a priori and a posteriori perturbation bounds on the `distance' between an approximate joint triangularizer and its exact counterpart. The a priori bounds are theoretical inequalities that involve functions of the ground-truth matrices and noise matrices, whereas the a posteriori bounds are given in terms of observable quantities that can be computed from the input matrices. From a practical perspective, the problem of finding the best approximate joint triangularizer of a set of noisy matrices amounts to solving a nonconvex optimization problem. We show that, under a condition on the noise level of the input matrices, it is possible to find a good initial triangularizer such that the solution obtained by any local descent-type algorithm has certain global guarantees. Finally, we discuss the application of approximate joint matrix triangularization to canonical tensor decomposition and we derive novel estimation error bounds.

1602.04621 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Deep Exploration via Bootstrapped DQN

通过Bootstrap DQN进行深度探索

Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy

AI总结 本文提出Bootstrap DQN算法,通过随机价值函数实现高效探索,提升复杂环境中的学习速度和性能,尤其在Atari游戏中表现优异。

详情
AI中文摘要

复杂环境中的高效探索仍是强化学习的主要挑战。我们提出了Bootstrap DQN,一种简单算法,通过使用随机化价值函数,以计算和统计上都高效的方式进行探索。与epsilon-greedy等抖动式(dithering)策略不同,Bootstrap DQN执行时间上延展的(即深度)探索,这可使学习速度呈指数级提升。我们在复杂随机MDP和大规模Arcade Learning Environment中展示了这些优势。Bootstrap DQN在大多数Atari游戏中显著缩短了学习时间并提升了性能。

英文摘要

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.

1607.00315 2026-06-04 math.NA cs.NA math.OC stat.ML

A multilevel framework for sparse optimization with application to inverse covariance estimation and logistic regression

一种用于稀疏优化的多级框架及其在逆协方差估计和逻辑回归中的应用

Eran Treister, Javier S. Turek, Irad Yavneh

AI总结 本文提出一种多级框架高效解决l1正则化稀疏优化问题,应用于逆协方差估计和逻辑回归,通过分层问题加速优化过程,提升大规模数据处理效率。

详情
Comments
To appear in the SISC journal
AI中文摘要

求解l1正则化优化问题在计算生物学、信号处理和机器学习中很常见。此类l1正则化用于寻找凸函数的稀疏极小值。著名的例子是LASSO问题,其中l1范数正则化二次函数。本文提出一种多级框架高效解决此类l1正则化稀疏优化问题。利用解的稀疏性,创建相似类型的问题层次结构,按顺序遍历以加速优化过程。该框架应用于两个问题:(1) 估计多元正态分布未知协方差矩阵的逆,假设其稀疏;(2) 在逻辑回归分类目标中加入l1正则化以减少数据过拟合并获得稀疏模型。数值实验展示了该多级框架在加速现有迭代求解器方面的效率。

英文摘要

Solving l1 regularized optimization problems is common in the fields of computational biology, signal processing and machine learning. Such l1 regularization is utilized to find sparse minimizers of convex functions. A well-known example is the LASSO problem, where the l1 norm regularizes a quadratic function. A multilevel framework is presented for solving such l1 regularized sparse optimization problems efficiently. We take advantage of the expected sparseness of the solution, and create a hierarchy of problems of similar type, which is traversed in order to accelerate the optimization process. This framework is applied for solving two problems: (1) the sparse inverse covariance estimation problem, and (2) l1-regularized logistic regression. In the first problem, the inverse of an unknown covariance matrix of a multivariate normal distribution is estimated, under the assumption that it is sparse. To this end, an l1 regularized log-determinant optimization problem needs to be solved. This task is challenging especially for large-scale datasets, due to time and memory limitations. In the second problem, the l1-regularization is added to the logistic regression classification objective to reduce overfitting to the data and obtain a sparse model. Numerical experiments demonstrate the efficiency of the multilevel framework in accelerating existing iterative solvers for both of these problems.
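
For readers unfamiliar with the LASSO problem mentioned above, the following is a minimal sketch of a plain proximal-gradient (ISTA) solver, the kind of single-level iterative method that a multilevel framework would aim to accelerate. It is not the multilevel method of the paper; the function names `soft_threshold` and `ista_lasso` and the objective scaling 0.5 * ||Ax - b||^2 + lam * ||x||_1 are assumptions made for this example.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_lasso(A, b, lam, n_iters=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with ISTA."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)             # gradient of the smooth quadratic part
        x = soft_threshold(x - step * grad, step * lam)
    return x

x_hat = ista_lasso(np.eye(3), [3.0, -0.5, 1.0], lam=1.0)
print(x_hat)
```

With an identity design matrix the minimizer is simply the soft-thresholded data vector, which shows how the l1 term zeroes out small coefficients and produces the sparsity the framework exploits.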

1602.06604 2026-06-04 eess.SY cs.SI cs.SY physics.data-an physics.soc-ph stat.AP

Detection of Cyber-Physical Faults and Intrusions from Physical Correlations

从物理相关性检测网络物理故障和入侵

Andrey Y. Lokhov, Nathan Lemons, Thomas C. McAndrew, Aric Hagberg, Scott Backhaus

AI总结 研究通过物理信号相关性高效识别故障,无需先验知识,验证于建筑自动化系统数据。

详情
Comments
10 pages, 9 figures
AI中文摘要

网络物理系统是关键基础设施,对资源可靠交付和自动控制架构稳定运行至关重要。这些系统由相互依赖的物理、控制和通信网络组成,由不同的数学模型描述,带来超出单个网络建模和分析的科学挑战。网络物理防御的关键挑战是快速在线检测和定位故障和入侵,而无需事先了解故障类型。我们描述了一套技术,通过物理信号中的相关性高效识别故障,仅假设少量可用系统信息。我们的检测方法性能在大型建筑自动化系统收集的数据上进行了说明。

英文摘要

Cyber-physical systems are critical infrastructures that are crucial both to the reliable delivery of resources such as energy, and to the stable functioning of automatic and control architectures. These systems are composed of interdependent physical, control and communications networks described by disparate mathematical models creating scientific challenges that go well beyond the modeling and analysis of the individual networks. A key challenge in cyber-physical defense is a fast online detection and localization of faults and intrusions without prior knowledge of the failure type. We describe a set of techniques for the efficient identification of faults from correlations in physical signals, assuming only a minimal amount of available system information. The performance of our detection method is illustrated on data collected from a large building automation system.

1402.1673 2026-06-04 math.NA cs.NA stat.OT

Non-Orthogonal Tensor Diagonalization

非正交张量对角化

Petr Tichavsky, Anh Huy Phan, Andrzej Cichocki

AI总结 本文提出非正交张量对角化算法,用于处理对称和非对称张量的对角化问题,并改进了张量分解方法。

详情
Comments
The manuscript was revised deeply, but the main idea is the same. The algorithm has changed significantly
AI中文摘要

张量对角化是指沿张量的选定维度乘以非正交可逆矩阵,将给定张量变换为精确或近似的对角形式。它是一组矩阵的近似联合对角化(AJD)的推广。具体而言,我们推导出(1)一种新的对称AJD算法,称为三阶张量的双侧对称对角化,(2)一种类似的非对称AJD算法,也称为三阶张量的一般双侧对角化,以及(3)一种用于三阶或四阶张量三侧对角化的算法。后两种算法可用于canonical polyadic(CP)张量分解,并且在张量秩不超过张量多线性秩的限制下,其计算速度可以优于其他CP张量分解方法。最后,我们提出(4)类似的张量块对角化算法,它与张量块项分解有关。

英文摘要

Tensor diagonalization means transforming a given tensor to an exactly or nearly diagonal form by multiplying the tensor by non-orthogonal invertible matrices along selected dimensions of the tensor. It is a generalization of approximate joint diagonalization (AJD) of a set of matrices. In particular, we derive (1) a new algorithm for symmetric AJD, which is called two-sided symmetric diagonalization of an order-3 tensor, (2) a similar algorithm for non-symmetric AJD, also called general two-sided diagonalization of an order-3 tensor, and (3) an algorithm for three-sided diagonalization of order-3 or order-4 tensors. The latter two algorithms may serve for canonical polyadic (CP) tensor decomposition, and they can outperform other CP tensor decomposition methods in terms of computational speed under the restriction that the tensor rank does not exceed the tensor multilinear rank. Finally, we propose (4) similar algorithms for tensor block diagonalization, which is related to the tensor block-term decomposition.

1606.09155 2026-06-04 math.OC cs.NA math.NA stat.ML

Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming

线性约束复合凸规划的加速一阶原始-对偶邻近方法

Yangyang Xu

AI总结 本文提出两种加速方法解决线性约束的复合凸优化问题,通过线性化和近似项简化子问题,实现O(1/t)或O(1/t²)收敛性,优于现有方法。

详情
AI中文摘要

受大数据应用推动,一阶方法近年来非常流行。然而,朴素的梯度方法通常收敛缓慢,因此人们为加速各种一阶方法付出了大量努力。本文针对目标函数为复合凸函数的结构化线性约束凸规划问题,提出两种加速方法。第一种方法是加速线性化增广拉格朗日方法(LALM)。在每次更新原始变量时,它允许对可微函数和增广项进行线性化,从而使子问题易于求解。仅假设弱凸性,我们证明LALM在所有迭代中参数固定时具有O(1/t)的收敛速率,若参数自适应调整则可加速到O(1/t²),其中t是总迭代次数。第二种方法是加速线性化交替方向乘子法(LADMM)。除了复合凸性外,它还假设目标函数具有两块结构。不同于经典ADMM,我们的方法允许对目标函数和增广项进行线性化,从而使更新简单。假设其中一个块变量具有强凸性,我们证明LADMM在自适应参数下同样具有O(1/t²)的收敛速率。这一结果显著改进了[Goldstein等, SIIMS'14]中的结果,后者要求两个块变量均强凸,且不对目标函数或增广项进行线性化。数值实验在二次规划、图像去噪和支持向量机上进行。所提加速方法与非加速方法及现有加速方法进行了比较。结果证明了加速的有效性,并展示了所提方法相对于现有方法的优越性能。

英文摘要

Motivated by big data applications, first-order methods have been extremely popular in recent years. However, naive gradient methods generally converge slowly. Hence, much effort has been made to accelerate various first-order methods. This paper proposes two accelerated methods for solving structured linearly constrained convex programming, for which we assume a composite convex objective. The first method is the accelerated linearized augmented Lagrangian method (LALM). At each update to the primal variable, it allows linearization of the differentiable function and also of the augmented term, and thus it yields easy subproblems. Assuming merely weak convexity, we show that LALM achieves $O(1/t)$ convergence if parameters are kept fixed during all the iterations and can be accelerated to $O(1/t^2)$ if the parameters are adapted, where $t$ is the total number of iterations. The second method is the accelerated linearized alternating direction method of multipliers (LADMM). In addition to the composite convexity, it further assumes a two-block structure on the objective. Different from classic ADMM, our method allows linearization of the objective and also of the augmented term to make the update simple. Assuming strong convexity on one block variable, we show that LADMM also enjoys $O(1/t^2)$ convergence with adaptive parameters. This result is a significant improvement over that in [Goldstein et al., SIIMS'14], which requires strong convexity on both block variables and no linearization of the objective or augmented term. Numerical experiments are performed on quadratic programming, image denoising, and support vector machine. The proposed accelerated methods are compared to nonaccelerated ones and also existing accelerated methods. The results demonstrate the validity of acceleration and superior performance of the proposed methods over existing ones.

1502.06069 2026-06-04 math.NA cs.NA math.PR stat.CO

Multilevel ensemble Kalman filtering

多级集合卡尔曼滤波

Håkon Hoel, Kody J. H. Law, Raul Tempone

AI总结 本文将多级蒙特卡洛采样策略引入集合卡尔曼滤波的蒙特卡洛步骤中,针对有限维信号演变和噪声离散时间观测,通过多级时间网格实现SDE的多级数值积分,证明多级EnKF在计算成本与近似精度方面优于传统EnKF。

详情
Journal ref
SIAM J. Numer. Anal., 54(3), 1813-1839 (2016)
AI中文摘要

本文将多级蒙特卡洛采样策略嵌入到集合卡尔曼滤波(EnKF)的蒙特卡洛步骤中,应用于有限维信号演变和噪声离散时间观测的场景。信号动力学假设由随机微分方程(SDE)支配,引入多级时间网格用于该SDE的多级数值积分。结果表明,多级EnKF在计算成本与近似精度方面渐近优于传统EnKF,理论结果通过数值实验加以验证。

英文摘要

This work embeds a multilevel Monte Carlo sampling strategy into the Monte Carlo step of the ensemble Kalman filter (EnKF) in the setting of finite dimensional signal evolution and noisy discrete-time observations. The signal dynamics is assumed to be governed by a stochastic differential equation (SDE), and a hierarchy of time grids is introduced for multilevel numerical integration of that SDE. The resulting multilevel EnKF is proved to asymptotically outperform EnKF in terms of computational cost versus approximation accuracy. The theoretical results are illustrated numerically.

1506.07405 2026-06-04 math.NA cs.NA stat.ML

Global Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation

格拉斯曼梯度下降算法在子空间估计中的全局收敛性

Dejiao Zhang, Laura Balzano

AI总结 本文提出一种约束于格拉斯曼流形的梯度下降算法,用于估计流数据矩阵的子空间,证明在无噪声情况下算法能从任意初始化收敛到全局最优,且在有噪声情况下提供预期收敛率。

详情
Comments
23 pages, 10 figures
AI中文摘要

在各种情境中已观察到,尽管相关问题的表述是非凸的,梯度下降方法在求解低秩矩阵分解问题时仍非常成功。我们研究这一情形的一个特例,即寻求由流数据矩阵张成的 $d$ 维子空间。我们采用自然的一阶增量梯度下降方法,并将梯度方法约束在格拉斯曼流形上。本文提出一种自适应步长方案:在无噪声情况下该方案是贪婪的,即在每个数据索引 $t$ 处最大化收敛度量的改进;在有噪声情况下则给出期望意义下的改进。我们证明,对于无噪声数据,该方法从任意随机初始化出发都收敛到问题的全局最小值。对于有噪声数据,我们给出了所提算法每次迭代的期望收敛速率。

英文摘要

It has been observed in a variety of contexts that gradient descent methods have great success in solving low-rank matrix factorization problems, despite the relevant problem formulation being non-convex. We tackle a particular instance of this scenario, where we seek the $d$-dimensional subspace spanned by a streaming data matrix. We apply the natural first order incremental gradient descent method, constraining the gradient method to the Grassmannian. In this paper, we propose an adaptive step size scheme that is greedy for the noiseless case, that maximizes the improvement of our metric of convergence at each data index $t$, and yields an expected improvement for the noisy case. We show that, with noise-free data, this method converges from any random initialization to the global minimum of the problem. For noisy data, we provide the expected convergence rate of the proposed algorithm per iteration.

1503.06171 2026-06-04 stat.AP cs.SY eess.SY

Probabilistic Forecast of Real-Time LMP and Network Congestion

实时LMP和网络拥堵的概率预报

Yuting Ji, Robert J. Thomas, Lang Tong

AI总结 本文提出一种基于多参数规划的概率预报方法,用于实时LMP和网络拥堵的短期预测,通过离线计算降低在线计算成本,并动态生成关键区域以提高计算效率。

详情
AI中文摘要

本文从系统运营商的角度研究实时节点边际电价(locational marginal price, LMP)和网络拥堵的短期预测。提出了一种基于多参数规划的新型概率预报技术,将不确定性参数空间划分为若干关键区域,并由此得到实时LMP/拥堵的条件概率分布。所提方法整合了负荷/发电预测、时变运行约束和预想事故(contingency)模型。通过将多参数规划的计算成本转移到离线阶段,显著降低了在线计算成本。还提出了一种动态生成关键区域的在线仿真技术,其计算成本比标准蒙特卡洛方法改进了几个数量级。

英文摘要

The short-term forecasting of real-time locational marginal price (LMP) and network congestion is considered from a system operator perspective. A new probabilistic forecasting technique is proposed based on a multiparametric programming formulation that partitions the uncertainty parameter space into critical regions from which the conditional probability distribution of the real-time LMP/congestion is obtained. The proposed method incorporates load/generation forecast, time varying operation constraints, and contingency models. By shifting the computation cost associated with multiparametric programs offline, the online computation cost is significantly reduced. An online simulation technique by generating critical regions dynamically is also proposed, which results in several orders of magnitude improvement in the computational cost over standard Monte Carlo methods.

1606.07414 2026-06-04 cs.CV cs.MM cs.NA math.NA stat.ME

Multiplierless 16-point DCT Approximation for Low-complexity Image and Video Coding

无乘法器16点DCT近似用于低复杂度图像和视频编码

T. L. T. Silveira, R. S. Oliveira, F. M. Bayer, R. J. Cintra, A. Madanayake

AI总结 本文提出一种无需乘法和位移操作的16点近似DCT变换,通过矩阵分解快速算法仅需44次加法,实现了最低的算术成本,并在图像和视频编码中表现出最佳的成本效益比。

详情
Comments
12 pages, 5 figures, 3 tables
AI中文摘要

本文介绍了一种正交的16点近似离散余弦变换(DCT)。所提出的变换不需要乘法或位移操作。引入了一种基于矩阵分解的快速算法,仅需44次加法,这是文献中最低的算术成本。为了评估所提出的变换,计算了计算复杂度、与精确DCT的相似性以及编码性能指标。经典和最先进的16点低复杂度变换在比较分析中被使用。在图像压缩中,所提出的近似通过PSNR和SSIM测量评估,获得了最佳的成本效益比。对于视频编码,所提出的近似被嵌入到HEVC参考软件中,直接与原始HEVC标准进行比较。通过FPGA硬件实现和测试,所提出的变换在与文献中最佳竞争变换相比时,面积-时间和面积-时间平方VLSI指标分别提高了35%和37%。

英文摘要

An orthogonal 16-point approximate discrete cosine transform (DCT) is introduced. The proposed transform requires neither multiplications nor bit-shifting operations. A fast algorithm based on matrix factorization is introduced, requiring only 44 additions, the lowest arithmetic cost in the literature. To assess the introduced transform, computational complexity, similarity with the exact DCT, and coding performance measures are computed. Classical and state-of-the-art 16-point low-complexity transforms were used in a comparative analysis. In the context of image compression, the proposed approximation was evaluated via PSNR and SSIM measurements, attaining the best cost-benefit ratio among the competitors. For video encoding, the proposed approximation was embedded into HEVC reference software for direct comparison with the original HEVC standard. Physically realized and tested using FPGA hardware, the proposed transform showed 35% and 37% improvements in the area-time and area-time-squared VLSI metrics, respectively, when compared to the best competing transform in the literature.

1602.08861 2026-06-04 stat.AP cs.NA math.NA

Bayesian inference for age-structured population model of infectious disease with application to varicella in Poland

传染病年龄结构人口模型的贝叶斯推断及其在波兰水痘中的应用

Piotr Gwiazda, Błażej Miasojedow, Magdalena Rosińska

AI总结 本文探讨了基于年龄结构的传染病模型的贝叶斯数据同化方法,通过水痘在波兰的实证研究,展示了如何利用观测数据校准模型参数,获得无偏的后验分布。

详情
AI中文摘要

传染病传播的动力学通常需要考虑基于特定特征(如年龄或免疫水平)的人口结构。此类模型的实用价值取决于与观测数据的适当校准。本文讨论了针对两状态年龄结构模型进行数据同化的贝叶斯方法。此类模型常用于基于多个时间点收集的患病率数据描述疾病动态(即感染力)。我们证明,当模型方程的显式解已知时,在贝叶斯框架中考虑数据收集过程可获得确定感染力的参数的无偏后验分布。我们进一步通过分析和数值测试表明,这些参数的后验分布对解的队列近似(Escalator Boxcar Train)是稳定的。最后,我们将该技术应用于基于波兰观测的水痘血清流行率数据校准模型。

英文摘要

Dynamics of infectious disease transmission are often best understood by taking into account the structure of the population with respect to specific features, for example age or immunity level. The practical utility of such models depends on appropriate calibration with the observed data. Here, we discuss the Bayesian approach to data assimilation in the case of a two-state age-structured model. Models of this kind are frequently used to describe the disease dynamics (i.e. force of infection) based on prevalence data collected at several time points. We demonstrate that, when the explicit solution to the model equation is known, accounting for the data collection process in the Bayesian framework makes it possible to obtain an unbiased posterior distribution for the parameters determining the force of infection. We further show analytically and through numerical tests that the posterior distribution of these parameters is stable with respect to the cohort approximation (Escalator Boxcar Train) to the solution. Finally, we apply the technique to calibrate the model based on observed sero-prevalence of varicella in Poland.

1606.05562 2026-06-04 cs.IT cs.AR cs.NA math.IT math.NA stat.ME

An Orthogonal 16-point Approximate DCT for Image and Video Compression

一种用于图像和视频压缩的正交16点近似DCT

T. L. T. da Silveira, F. M. Bayer, R. J. Cintra, S. Kulasekera, A. Madanayake, A. J. Kozakevicius

AI总结 本文提出了一种低复杂度的正交无乘法器近似DCT,采用矩阵分解快速算法,仅需60次加法运算,实现高效图像和视频压缩,并在FPGA上物理实现,嵌入HEVC参考软件验证效果。

详情
Journal ref
Multidimensional Systems and Signal Processing, vol. 27, no. 1, pp. 87-104, 2016
Comments
18 pages, 7 figures, 6 tables
AI中文摘要

本文介绍了一种低复杂度的正交无乘法器近似16点离散余弦变换(DCT)。所提方法的设计目标是具有极低的计算成本。提出了一种基于矩阵分解的快速算法,仅需60次加法运算。将所提架构作为图像和视频压缩工具进行评估时,它优于经典算法和最先进算法。还提出了数字VLSI硬件实现,既在FPGA上物理实现,也以45 nm工艺实现到综合与布局布线级别。此外,所提方法被嵌入到高效率视频编码(HEVC)参考软件中进行概念验证。所得结果显示,与HEVC中的Chen DCT算法相比,视频质量下降可忽略不计。

英文摘要

A low-complexity orthogonal multiplierless approximation for the 16-point discrete cosine transform (DCT) was introduced. The proposed method was designed to possess a very low computational cost. A fast algorithm based on matrix factorization was proposed, requiring only 60 additions. The proposed architecture outperforms classical and state-of-the-art algorithms when assessed as a tool for image and video compression. Digital VLSI hardware implementations were also proposed, being physically realized in FPGA technology and implemented in 45 nm up to synthesis and place-route levels. Additionally, the proposed method was embedded into high efficiency video coding (HEVC) reference software as a proof of concept. The results obtained show negligible video degradation when compared to the Chen DCT algorithm in HEVC.

1606.05560 2026-06-04 stat.ML cs.NA math.NA

Estimation of matrix trace using machine learning

利用机器学习估计矩阵迹

Boram Yoon

AI总结 本文提出一种新的矩阵迹估计方法,通过机器学习选择少量探测向量替代传统随机噪声向量,实现高精度估计。

详情
Comments
10 pages
AI中文摘要

我们提出了一种新的矩阵迹估计器,适用于矩阵的显式形式未给出、但可计算其与向量乘积的情形。该估计器的形式类似于Hutchinson随机迹估计器,但用机器学习确定的少量探测向量替代Hutchinson估计器中的随机噪声向量。文中讨论了估计质量的评估和偏差校正,并提出了一个用于计算迹的函数的期望值的无偏估计器。在随机矩阵的数值实验中,使用由机器学习确定的$\mathcal{O}(10)$个探测向量的迹估计精度,与使用$\mathcal{O}(10000)$个随机噪声向量的精度相近。

英文摘要

We present a new trace estimator for a matrix whose explicit form is not given but whose product with a vector is available. The form of the estimator is similar to the Hutchinson stochastic trace estimator, but instead of the random noise vectors of the Hutchinson estimator, we use a small number of probing vectors determined by machine learning. Evaluation of the quality of estimates and bias correction are discussed. An unbiased estimator is proposed for the calculation of the expectation value of a function of traces. In the numerical experiments with random matrices, it is shown that the precision of trace estimates with $\mathcal{O}(10)$ probing vectors determined by the machine learning is similar to that with $\mathcal{O}(10000)$ random noise vectors.
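
The baseline that the abstract compares against, the stochastic trace estimator with random noise vectors, can be sketched in a few lines. This is the classical random-probe estimator only; the machine-learned probing vectors of the paper are not reproduced here. The function name `hutchinson_trace` and the use of Rademacher (plus or minus one) noise are assumptions made for this example.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_probes=100, seed=0):
    """Estimate trace(A) using only matrix-vector products z -> A z."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=dim)   # Rademacher noise vector
        total += z @ matvec(z)                  # z^T A z has expectation trace(A)
    return total / n_probes

rng = np.random.default_rng(1)
B = rng.standard_normal((50, 50))
A = B @ B.T                                     # symmetric test matrix
estimate = hutchinson_trace(lambda z: A @ z, dim=50, n_probes=1000)
print(estimate, np.trace(A))
```

The error of this estimator decays like one over the square root of the number of probes, which is why replacing thousands of noise vectors by a handful of well-chosen probing vectors, as the abstract reports, is attractive.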

1505.07519 2026-06-04 stat.CO cs.NA math.NA stat.ML

A Bounded $p$-norm Approximation of Max-Convolution for Sub-Quadratic Bayesian Inference on Additive Factors

用于加性因子上亚二次贝叶斯推断的 max-卷积有界 $p$-范数近似

Julianus Pfeuffer, Oliver Serang

AI总结 本文提出两种基于 p-范数快速卷积的数值稳定方法,用于近似 max-卷积,从而将隐马尔可夫模型中 Toeplitz 转移矩阵的 Viterbi 路径近似时间从 O(nk²) 降低到 O(nk logk)。

详情
Journal ref
Journal of Machine Learning Research 17 (2016) 1-39
AI中文摘要

Max-convolution 是一个重要的问题,与标准卷积非常相似,在许多领域中频繁出现。本文扩展了已知最坏情况运行时间最快的方法,该方法通过数值近似 Chebyshev 范数 ||·||_∞ 而适用于非负向量。基于通过快速卷积计算 p-范数的思路,本文提出两种数值稳定的方法:第一种方法的运行时间为 O(k logk log logk)(对于任何可实际实现的向量,这小于 18k logk),使用 p-范数直接近似 Chebyshev 范数。第二种方法的运行时间为 O(k logk)(尽管在实践中两者表现相似),使用一种新的零空间投影方法,从一系列 p-范数中提取信息以估计向量中的最大值(这相当于通过查询有界支撑分布的少量矩来估计最大值)。文中对这些 p-范数方法进行了相互比较,并展示了它们可用于在转移矩阵为 Toeplitz 矩阵的隐马尔可夫模型中计算 Viterbi 路径的近似;因此,近似 Viterbi 路径的运行时间从 O(nk²) 步减少到 O(nk logk) 步,并通过从 S&P 500 股票指数推断美国失业率加以演示。

英文摘要

Max-convolution is an important problem closely resembling standard convolution; as such, max-convolution occurs frequently across many fields. Here we extend the method with fastest known worst-case runtime, which can be applied to nonnegative vectors by numerically approximating the Chebyshev norm $\| \cdot \|_\infty$, and use this approach to derive two numerically stable methods based on the idea of computing $p$-norms via fast convolution: The first method proposed, with runtime in $O( k \log(k) \log(\log(k)) )$ (which is less than $18 k \log(k)$ for any vectors that can be practically realized), uses the $p$-norm as a direct approximation of the Chebyshev norm. The second approach proposed, with runtime in $O( k \log(k) )$ (although in practice both perform similarly), uses a novel null space projection method, which extracts information from a sequence of $p$-norms to estimate the maximum value in the vector (this is equivalent to querying a small number of moments from a distribution of bounded support in order to estimate the maximum). The $p$-norm approaches are compared to one another and are shown to compute an approximation of the Viterbi path in a hidden Markov model where the transition matrix is a Toeplitz matrix; the runtime of approximating the Viterbi path is thus reduced from $O( n k^2 )$ steps to $O( n k \log(k) )$ steps in practice, and is demonstrated by inferring the U.S. unemployment rate from the S&P 500 stock index.
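
The basic idea of approximating max-convolution by a $p$-norm computed with fast convolution can be sketched as follows. This shows only the naive fixed-$p$ approximation for nonnegative vectors, not the two numerically stable methods of the paper; the function names and the choice `p=8` are assumptions made for this example. For each output index, the maximum of the products is bounded above by their $p$-norm, which is itself a standard convolution of the $p$-th powers.

```python
import numpy as np

def max_convolve_exact(x, y):
    """Brute-force max-convolution: out[k] = max_i x[i] * y[k - i] (nonnegative inputs)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.zeros(len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        seg = out[i:i + len(y)]                 # view into out
        np.maximum(seg, xi * y, out=seg)        # running elementwise maximum
    return out

def max_convolve_pnorm(x, y, p=8):
    """Approximate max-convolution by the p-norm of the products, via FFT convolution."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) + len(y) - 1
    spec = np.fft.rfft(x ** p, n) * np.fft.rfft(y ** p, n)
    conv = np.fft.irfft(spec, n)                # sum_i (x[i] * y[k - i]) ** p
    return np.clip(conv, 0.0, None) ** (1.0 / p)  # clip tiny negative round-off

x = [0.2, 0.9, 0.5]
y = [0.7, 0.1, 0.6]
exact = max_convolve_exact(x, y)
approx = max_convolve_pnorm(x, y, p=8)
print(exact)
print(approx)
```

Larger `p` tightens the approximation but drives small entries toward underflow, which is the numerical stability issue that motivates the more careful schemes described in the abstract.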

1501.01579 2026-06-04 eess.SY cs.SY stat.CO

Consensus Labeled Random Finite Set Filtering for Distributed Multi-Object Tracking

基于一致性标注随机有限集滤波的分布式多目标跟踪

C. Fantacci, B. -N. Vo, B. -T. Vo, G. Battistelli, L. Chisci

AI总结 本文提出基于标注随机有限集和动态贝叶斯推断的分布式多目标估计方法,设计了两种新型一致性跟踪滤波器,实现多目标跟踪的分布式、可扩展和高效解决方案。

详情
AI中文摘要

本文研究由具备感知、通信和处理能力的异构且地理分散的节点构成的网络上的分布式多目标跟踪问题。主要贡献是一种基于标注随机有限集(RFS)和动态贝叶斯推断的分布式多目标估计方法,并据此设计了两种新型一致性跟踪滤波器,即一致性边缘化δ-广义标签多伯努利滤波器和一致性标签多伯努利跟踪滤波器。所提算法为多目标跟踪提供了完全分布式、可扩展且计算高效的解决方案。通过高斯混合实现的仿真实验验证了所提方法在挑战性场景中的有效性。

英文摘要

This paper addresses distributed multi-object tracking over a network of heterogeneous and geographically dispersed nodes with sensing, communication and processing capabilities. The main contribution is an approach to distributed multi-object estimation based on labeled Random Finite Sets (RFSs) and dynamic Bayesian inference, which enables the development of two novel consensus tracking filters, namely a Consensus Marginalized $δ$-Generalized Labeled Multi-Bernoulli and Consensus Labeled Multi-Bernoulli tracking filter. The proposed algorithms provide fully distributed, scalable and computationally efficient solutions for multi-object tracking. Simulation experiments via Gaussian mixture implementations confirm the effectiveness of the proposed approach on challenging scenarios.

1411.1552 2026-06-04 stat.CO cs.NA math.NA

A Block Circulant Embedding Method for Simulation of Stationary Gaussian Random Fields on Block-regular Grids

用于块正则网格上平稳高斯随机场模拟的块循环嵌入方法

M. Park, M. V. Tretyakov

AI总结 本文提出一种新的方法,用于在非正则但具有正则块结构的网格上采样平稳高斯随机场,该方法在典型模型问题中优于传统循环嵌入方法。

详情
Journal ref
Int. J. Uncertainty Quantification, V. 5, No. 6 (2015), pp. 527-544
Comments
[17 pages, 8 figures] We added Remarks 2.1, 3.1, 3.2, and Example 1.3 and removed the Appendix which is now summarized in Remark 2.1
AI中文摘要

我们提出了一种新的方法,用于在非正则但具有正则块结构的网格上采样平稳高斯随机场。所引入的块循环嵌入方法(BCEM)在应用前不需要对不规则网格进行正则化,从而优于传统循环嵌入方法(CEM)。BCEM与CEM在一些典型模型问题上进行了比较。

英文摘要

We propose a new method for sampling from a stationary Gaussian random field on a grid which is not regular but has a regular block structure, which is often the case in applications. The introduced block circulant embedding method (BCEM) can outperform the classical circulant embedding method (CEM), which requires a regularization of the irregular grid before its application. A comparison of BCEM and CEM is performed on some typical model problems.

1606.01245 2026-06-04 math.NA cs.AI cs.NA math.OC stat.ML

Scalable Algorithms for Tractable Schatten Quasi-Norm Minimization

可扩展算法用于可计算的Schatten准范数最小化

Fanhua Shang, Yuanyuan Liu, James Cheng

AI总结 本文提出两种可计算的Schatten准范数,设计高效算法以加速大规模问题解决,并通过实验验证其精度和速度优势。

详情
Comments
16 pages, 5 figures, Appears in Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona, USA, pp. 2016--2022, 2016
AI中文摘要

Schatten-p准范数(0<p<1)通常用于替代标准核范数以更精确地近似秩函数。然而,现有Schatten-p准范数最小化算法在每次迭代中均涉及奇异值分解(SVD)或特征值分解(EVD),因此对于大规模问题可能变得非常缓慢且不实用。本文首先定义了两种可计算的Schatten准范数,即Frobenius/核混合准范数和双核准范数,并证明它们本质上是Schatten-2/3和1/2准范数,分别导致仅需更新两个较小因子矩阵的高效算法。我们还为解决代表性矩阵补全问题设计了两种高效的近端交替线性化最小化算法。最后,我们提供了算法的全局收敛性和性能保证,其收敛性质优于现有算法。在合成和真实数据上的实验结果表明,我们的算法比现有最先进方法更准确,并且快了多个数量级。

英文摘要

The Schatten-p quasi-norm $(0<p<1)$ is usually used to replace the standard nuclear norm in order to approximate the rank function more accurately. However, existing Schatten-p quasi-norm minimization algorithms involve singular value decomposition (SVD) or eigenvalue decomposition (EVD) in each iteration, and thus may become very slow and impractical for large-scale problems. In this paper, we first define two tractable Schatten quasi-norms, i.e., the Frobenius/nuclear hybrid and bi-nuclear quasi-norms, and then prove that they are in essence the Schatten-2/3 and 1/2 quasi-norms, respectively, which lead to the design of very efficient algorithms that only need to update two much smaller factor matrices. We also design two efficient proximal alternating linearized minimization algorithms for solving representative matrix completion problems. Finally, we provide the global convergence and performance guarantees for our algorithms, which have better convergence properties than existing algorithms. Experimental results on synthetic and real-world data show that our algorithms are more accurate than the state-of-the-art methods, and are orders of magnitude faster.

1310.1826 2026-06-04 stat.ML cs.NA math.NA

Learning Non-Parametric Basis Independent Models from Point Queries via Low-Rank Methods

通过低秩方法从点查询中学习非参数基无关模型

Hemant Tyagi, Volkan Cevher

AI总结 本文提出一种随机多项式复杂度采样方案,用于估计多 ridge 函数,通过低秩矩阵恢复技术实现多项式时间估计和统一逼近保证。

详情
Comments
27 pages, minor corrections in the proof of Proposition 2 (appendix H), modified the statement of Proposition 2, typos corrected in appendix E
AI中文摘要

我们考虑从 f 的点评估中学习多 ridge 函数 f(x) = g(Ax) 的问题。假设 f 定义在 R^d 的 l_2 球上,g 几乎处处两次连续可微,A ∈ R^{k×d} 是秩为 k 的矩阵,其中 k << d。我们提出了一种随机、多项式复杂度的采样方案来估计此类函数。我们的理论发展利用了低秩矩阵恢复的最新技术,使我们能够推导出函数 f 的多项式时间估计器以及一致逼近保证。我们证明该方案也可用于学习形式为 f(x) = ∑_{i=1}^{k} g_i(a_i^T x) 的函数,前提是 f 在原点附近满足某些光滑性条件。我们还刻画了该方案的噪声鲁棒性。最后,我们通过数值示例来展示理论界限的实际应用。

英文摘要

We consider the problem of learning multi-ridge functions of the form $f(x) = g(Ax)$ from point evaluations of $f$. We assume that the function $f$ is defined on an $\ell_2$-ball in $\mathbb{R}^d$, $g$ is twice continuously differentiable almost everywhere, and $A \in \mathbb{R}^{k \times d}$ is a rank $k$ matrix, where $k \ll d$. We propose a randomized, polynomial-complexity sampling scheme for estimating such functions. Our theoretical developments leverage recent techniques from low rank matrix recovery, which enables us to derive a polynomial time estimator of the function $f$ along with uniform approximation guarantees. We prove that our scheme can also be applied for learning functions of the form $f(x) = \sum_{i=1}^{k} g_i(a_i^T x)$, provided $f$ satisfies certain smoothness conditions in a neighborhood around the origin. We also characterize the noise robustness of the scheme. Finally, we present numerical examples to illustrate the theoretical bounds in action.

1606.00770 2026-06-04 math.NA cs.NA stat.OT

Different numerical estimators for main effect global sensitivity indices

不同主效应全局灵敏度指数的数值估计方法

Sergei Kucherenko, Shufang Song

AI总结 本文比较了四种直接公式在具有解析解的测试问题上的表现,发现双重循环重排方法在独立和依赖变量模型中均表现最佳。

详情
AI中文摘要

基于Sobol灵敏度指数的方差方法因易于解释而广受欢迎。然而,对于复杂问题计算Sobol指数通常需要大量函数评估以达到合理收敛。本文比较了四种直接公式在具有解析解的测试问题上的表现,这些公式基于高维积分,使用MC和QMC技术进行评估。此外,还比较了基于所谓双重循环重排公式的不同方法。发现双重循环重排(DLR)方法在独立和依赖变量模型中均表现最佳。

英文摘要

The variance-based method of global sensitivity analysis based on Sobol sensitivity indices has become very popular among practitioners due to its ease of interpretation. For complex practical problems, computation of Sobol indices generally requires a large number of function evaluations to achieve reasonable convergence. Four different direct formulas for computing Sobol main effect sensitivity indices are compared on a set of test problems for which there are analytical results. These formulas are based on high-dimensional integrals which are evaluated using MC and QMC techniques. Direct formulas are also compared with a different approach based on the so-called double loop reordering formula. It is found that the double loop reordering (DLR) approach shows superior performance among all methods, both for models with independent and dependent variables.
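
To make the notion of a direct Monte Carlo formula for main effect indices concrete, here is a minimal sketch of one widely used pick-freeze estimator for independent uniform inputs. Whether it coincides with any of the four formulas compared in the paper is not stated in the abstract, so it should be read as an illustration only; the function name `sobol_main_effects`, the sample size, and the test function are assumptions made for this example.

```python
import numpy as np

def sobol_main_effects(f, dim, n_samples=200_000, seed=0):
    """Pick-freeze Monte Carlo estimate of first-order Sobol indices.

    f maps an (n, dim) array of inputs in [0, 1]^dim to an (n,) array of outputs.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, dim))
    B = rng.random((n_samples, dim))
    fA, fB = f(A), f(B)
    f0 = 0.5 * (fA.mean() + fB.mean())            # estimate of the mean output
    var = np.var(np.concatenate([fA, fB]))        # estimate of the total variance
    indices = np.empty(dim)
    for i in range(dim):
        AB = A.copy()
        AB[:, i] = B[:, i]                        # share only coordinate i with B
        indices[i] = np.mean((fB - f0) * (f(AB) - fA)) / var
    return indices

# Additive test function with known indices S_1 = 0.2 and S_2 = 0.8.
S = sobol_main_effects(lambda X: X[:, 0] + 2.0 * X[:, 1], dim=2)
print(S)
```

Each index costs one extra batch of function evaluations, so the total is (dim + 2) * n_samples evaluations, which illustrates why the abstract stresses the evaluation budget needed for reasonable convergence.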

1606.00142 2026-06-04 stat.ML econ.GN q-fin.EC stat.CO

Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso

从泛化能力与VC理论视角进行模型选择一致性研究及其在Lasso中的应用

Ning Xu, Jian Hong, Timothy C. G. Fisher

AI总结 本文从泛化能力视角探讨模型选择一致性,结合SRM与VC理论,证明Lasso在类似OLS假设下具有L2一致性,并提出新的过拟合度量GR2,通过模拟验证CV-Lasso在模型选择与过拟合控制中的有效性。

详情
AI中文摘要

模型选择分析在理论和实践中都具有重要性,尤其在高维数据分析中。最近,最小绝对收缩和选择算子(Lasso)已被应用于统计和计量经济学文献。本文从泛化能力视角出发,结合结构风险最小化(SRM)和Vapnik-Chervonenkis(VC)理论,研究模型选择一致性。方法强调样本内和样本外拟合的平衡,通过交叉验证选择模型复杂度的惩罚项。证明模型泛化能力与模型选择一致性之间存在精确关系。通过实施SRM和VC不等式,证明在类似OLS假设下Lasso具有L2一致性。进一步推导了惩罚极值估计量与无惩罚极值估计量之间距离的概率界,该界受过拟合主导。还提出基于泛化能力的新过拟合度量GR2,若模型选择一致则收敛于零。通过模拟验证所提CV-Lasso算法在模型选择和过拟合控制中的有效性。

英文摘要

Model selection is difficult to analyse yet theoretically and empirically important, especially for high-dimensional data analysis. Recently the least absolute shrinkage and selection operator (Lasso) has been applied in the statistical and econometric literature. Consistency of Lasso has been established under various conditions, some of which are difficult to verify in practice. In this paper, we study model selection from the perspective of generalization ability, under the framework of structural risk minimization (SRM) and Vapnik-Chervonenkis (VC) theory. The approach emphasizes the balance between the in-sample and out-of-sample fit, which can be achieved by using cross-validation to select a penalty on model complexity. We show that an exact relationship exists between the generalization ability of a model and model selection consistency. By implementing SRM and the VC inequality, we show that Lasso is L2-consistent for model selection under assumptions similar to those imposed on OLS. Furthermore, we derive a probabilistic bound for the distance between the penalized extremum estimator and the extremum estimator without penalty, which is dominated by overfitting. We also propose a new measurement of overfitting, GR2, based on generalization ability, that converges to zero if model selection is consistent. Using simulations, we demonstrate that the proposed CV-Lasso algorithm performs well in terms of model selection and overfitting control.

1506.07868 2026-06-04 math.NA cs.NA math.OC stat.ML

The local convexity of solving systems of quadratic equations

求解二次方程组的局部凸性

Chris D. White, Sujay Sanghavi, Rachel Ward

AI总结 本文研究了从二次测量中恢复秩r半正定矩阵的问题,证明了在特定样本条件下,配合谱初始化的梯度下降法能线性收敛至正确解(至多相差一个正交变换)。

详情
Comments
36 pages, 3 figures
AI中文摘要

本文研究了从二次测量中恢复秩r半正定矩阵的问题,证明了在特定样本条件下,配合谱初始化的梯度下降法能线性收敛至正确解(至多相差一个正交变换)。

英文摘要

This paper considers the recovery of a rank $r$ positive semidefinite matrix $X X^T\in\mathbb{R}^{n\times n}$ from $m$ scalar measurements of the form $y_i := a_i^T X X^T a_i$ (i.e., quadratic measurements of $X$). Such problems arise in a variety of applications, including covariance sketching of high-dimensional data streams, quadratic regression, quantum state tomography, among others. A natural approach to this problem is to minimize the loss function $f(U) = \sum_i (y_i - a_i^TUU^Ta_i)^2$ which has an entire manifold of solutions given by $\{XO\}_{O\in\mathcal{O}_r}$ where $\mathcal{O}_r$ is the orthogonal group of $r\times r$ orthogonal matrices; this is {\it non-convex} in the $n\times r$ matrix $U$, but methods like gradient descent are simple and easy to implement (as compared to semidefinite relaxation approaches). In this paper we show that once we have $m \geq C nr \log^2(n)$ samples from isotropic gaussian $a_i$, with high probability {\em (a)} this function admits a dimension-independent region of {\em local strong convexity} on lines perpendicular to the solution manifold, and {\em (b)} with an additional polynomial factor of $r$ samples, a simple spectral initialization will land within the region of convexity with high probability. Together, this implies that gradient descent with initialization (but no re-sampling) will converge linearly to the correct $X$, up to an orthogonal transformation. We believe that this general technique (local convexity reachable by spectral initialization) should prove applicable to a broader class of nonconvex optimization problems.

1512.03720 2026-06-04 math.NA cs.NA stat.CO

What the collapse of the ensemble Kalman filter tells us about particle filters

集合卡尔曼滤波的崩溃对粒子滤波器的启示

Matthias Morzfeld, Daniel Hodyss, Chris Snyder

AI总结 研究探讨了集合卡尔曼滤波崩溃与高维粒子滤波器的关系,指出粒子滤波器在高维问题中崩溃,支持气象学中对粒子滤波器进行局部化的努力。

详情
AI中文摘要

集合卡尔曼滤波(EnKF)是高维气象问题中可靠的数据同化工具。另一方面,EnKF可以被解释为粒子滤波器,而粒子滤波器在高维问题中会崩溃。我们解释了这些看似矛盾的陈述如何为粒子滤波器在某些高维问题中的工作方式提供见解,特别是支持了气象学中近期对粒子滤波器进行“局部化”的努力,即将观测的影响限制在其邻域内。

英文摘要

The ensemble Kalman filter (EnKF) is a reliable data assimilation tool for high-dimensional meteorological problems. On the other hand, the EnKF can be interpreted as a particle filter, and particle filters collapse in high-dimensional problems. We explain that these seemingly contradictory statements offer insights about how particle filters function in certain high-dimensional problems, and in particular support recent efforts in meteorology to "localize" particle filters, i.e., to restrict the influence of an observation to its neighborhood.
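The collapse mentioned above can be illustrated with a small experiment on importance weights. The sketch below is not taken from the paper: it assumes a standard normal prior, an observation at the origin with unit Gaussian noise, and uses the effective sample size $(\sum_i w_i)^2 / \sum_i w_i^2$ as the collapse diagnostic.

```python
import math
import random

def effective_sample_size(dim, n_particles, seed=0):
    """ESS of importance weights for a Gaussian toy problem of dimension `dim`."""
    rng = random.Random(seed)
    log_w = []
    for _ in range(n_particles):
        # Particle drawn from the assumed standard normal prior.
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Log-likelihood of an observation y = 0 with unit noise, up to a constant.
        log_w.append(-0.5 * sum(v * v for v in x))
    top = max(log_w)
    # Shift by the maximum before exponentiating to avoid underflow.
    w = [math.exp(lw - top) for lw in log_w]
    return sum(w) ** 2 / sum(v * v for v in w)

ess_low_dim = effective_sample_size(dim=1, n_particles=200)
ess_high_dim = effective_sample_size(dim=100, n_particles=200)
```

In this toy setting most of the 200 particles carry useful weight when the dimension is 1, while in dimension 100 nearly all of the weight concentrates on a handful of particles, which is the behaviour that localization is meant to counteract.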

1605.09049 2026-06-04 cs.LG cs.NA math.NA stat.ML

Recycling Randomness with Structure for Sublinear time Kernel Expansions

利用结构回收随机性以实现子线性时间核展开

Krzysztof Choromanski, Vikas Sindhwani

AI总结 本文提出通过结构矩阵近似各种核函数的方法,推广了Fastfood构造,并通过理论分析和实验验证了结构化矩阵在提升核方法性能中的有效性。

详情
AI中文摘要

我们提出了一种方案,通过将高斯随机向量回收到结构化矩阵中,在子线性时间内近似各种核函数。我们的框架包括Fastfood构造作为特殊情况,但也可扩展到循环、托普利茨和汉克尔矩阵,以及更广泛的具有低位移秩特性的结构化矩阵。我们引入了相干性和图论结构常数等概念来控制近似质量,并证明了框架内随机特征映射的无偏性和低方差性质。对于低位移矩阵,我们展示了如何通过控制结构和随机性的程度来减少统计方差,尽管这会增加计算和存储需求。实验结果强烈支持我们的理论,并证明了使用更广泛的结构化矩阵可以提升核方法的扩展性。

英文摘要

We propose a scheme for recycling Gaussian random vectors into structured matrices to approximate various kernel functions in sublinear time via random embeddings. Our framework includes the Fastfood construction as a special case, but also extends to Circulant, Toeplitz and Hankel matrices, and the broader family of structured matrices that are characterized by the concept of low-displacement rank. We introduce notions of coherence and graph-theoretic structural constants that control the approximation quality, and prove unbiasedness and low-variance properties of random feature maps that arise within our framework. For the case of low-displacement matrices, we show how the degree of structure and randomness can be controlled to reduce statistical variance at the cost of increased computation and storage requirements. Empirical results strongly support our theory and justify the use of a broader family of structured matrices for scaling up kernel methods using random features.

1512.01904 2026-06-04 stat.ML cs.NA math.NA

Gauss quadrature for matrix inverse forms with applications

高斯求积用于矩阵逆形式及其应用

Chengtao Li, Suvrit Sra, Stefanie Jegelka

AI总结 本文提出加速需计算双线性逆形式u^T A^{-1}u的机器学习算法框架,基于高斯型求积方法,适用于大规模稀疏矩阵,并能计算上下界以加速算法,证明这些界以几何速率收敛。

详情
AI中文摘要

我们提出一个框架,用于加速需要计算双线性逆形式u^T A^{-1}u的机器学习算法集合,其中A是正定矩阵,u是给定向量。我们的框架基于高斯型求积方法,能够轻松扩展到大规模稀疏矩阵。进一步,它允许回顾性计算u^T A^{-1}u的上下界,从而加速多种算法。我们证明这些界在迭代中收紧并以线性(几何)速率收敛。据我们所知,这是首次展示高斯型求积的关键性质的工作,这是一个经典且深入研究的主题。我们通过使用求积来加速涉及行列式点过程和子模优化的机器学习任务,观察到在多个实例中显著的加速效果。

英文摘要

We present a framework for accelerating a spectrum of machine learning algorithms that require computation of bilinear inverse forms $u^\top A^{-1}u$, where $A$ is a positive definite matrix and $u$ a given vector. Our framework is built on Gauss-type quadrature and easily scales to large, sparse matrices. Further, it allows retrospective computation of lower and upper bounds on $u^\top A^{-1}u$, which in turn accelerates several algorithms. We prove that these bounds tighten iteratively and converge at a linear (geometric) rate. To our knowledge, ours is the first work to demonstrate these key properties of Gauss-type quadrature, which is a classical and deeply studied topic. We illustrate empirical consequences of our results by using quadrature to accelerate machine learning tasks involving determinantal point processes and submodular optimization, and observe tremendous speedups in several instances.
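As a rough illustration of how iteratively tightening lower bounds on $u^\top A^{-1}u$ can be obtained, the sketch below uses the classical equivalence between conjugate gradients and Gauss quadrature: with start vector $x_0 = 0$, the partial sums $\sum_j \gamma_j \|r_j\|^2$ are nondecreasing lower bounds. This is a swapped-in textbook technique, not the authors' implementation; the function name and the small test matrix are assumptions, and upper bounds (for example from Gauss-Radau rules) are not covered.

```python
def gauss_lower_bounds(a, u, max_iter=None, tol=1e-12):
    """Lower bounds on u^T A^{-1} u for a symmetric positive definite matrix A."""
    n = len(u)
    max_iter = n if max_iter is None else max_iter

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(x, y):
        return sum(p * q for p, q in zip(x, y))

    r = list(u)  # residual of A x = u for the start vector x0 = 0
    p = list(u)  # first search direction
    rr = dot(r, r)
    estimate, bounds = 0.0, []
    for _ in range(max_iter):
        if rr <= tol:
            break
        ap = matvec(p)
        gamma = rr / dot(p, ap)
        estimate += gamma * rr  # adds gamma_j * ||r_j||^2 to the running bound
        bounds.append(estimate)
        r = [ri - gamma * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return bounds

# Hypothetical 3x3 example; the exact value of u^T A^{-1} u is 43/9.
matrix = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
vector = [1.0, 2.0, 3.0]
bounds = gauss_lower_bounds(matrix, vector)
```

Each iteration costs one matrix-vector product, which is why approaches of this kind scale to large sparse matrices, and an algorithm that only needs to compare the quantity against a threshold can stop as soon as the bound decides the comparison.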

1509.04072 2026-06-04 stat.ML cs.SY eess.SY

Robust Gaussian Filtering using a Pseudo Measurement

使用伪测量的鲁棒高斯滤波

Manuel Wüthrich, Cristina Garcia Cifuentes, Sebastian Trimpe, Franziska Meier, Jeannette Bohg, Jan Issac, Stefan Schaal

AI总结 本文提出通过引入伪测量改进高斯滤波以应对厚尾传感器模型,通过特征函数优化实现鲁棒滤波,适用于线性和非线性系统。

详情
AI中文摘要

许多传感器,如测距、声纳、雷达、GPS和视觉设备,产生的测量值常受异常值污染。此问题可通过使用厚尾传感器模型解决,但所有属于高斯滤波家族的估计算法(如广泛使用的扩展卡尔曼滤波和无迹卡尔曼滤波)本质上无法与之兼容。本文的贡献是证明通过简单改变即可使任何高斯滤波器与厚尾传感器模型兼容:即不再使用物理测量进行滤波,而是使用对物理测量应用特征函数得到的伪测量进行滤波。我们推导出在某些条件下最优的特征函数。仿真结果表明,所提方法能有效处理测量异常值,并在线性和非线性系统中实现鲁棒滤波。

英文摘要

Many sensors, such as range, sonar, radar, GPS and visual devices, produce measurements which are contaminated by outliers. This problem can be addressed by using fat-tailed sensor models, which account for the possibility of outliers. Unfortunately, all estimation algorithms belonging to the family of Gaussian filters (such as the widely-used extended Kalman filter and unscented Kalman filter) are inherently incompatible with such fat-tailed sensor models. The contribution of this paper is to show that any Gaussian filter can be made compatible with fat-tailed sensor models by applying one simple change: Instead of filtering with the physical measurement, we propose to filter with a pseudo measurement obtained by applying a feature function to the physical measurement. We derive such a feature function which is optimal under some conditions. Simulation results show that the proposed method can effectively handle measurement outliers and allows for robust filtering in both linear and nonlinear systems.

1512.09103 2026-06-04 math.OC cs.DS cs.NA math.NA stat.ML

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling

更快的加速坐标下降法:非均匀采样

Zeyuan Allen-Zhu, Zheng Qu, Peter Richtárik, Yang Yuan

AI总结 本文通过非均匀采样将加速坐标下降法的已知最优运行时间最多改进$\sqrt{n}$倍,方法按与坐标光滑性参数平方根成正比的概率采样各坐标,适用于经验风险最小化和线性系统求解。

详情
Comments
same result, but polished writing
AI中文摘要

加速坐标下降法因其低迭代成本和可扩展性被广泛用于优化。本文通过非均匀采样将其已知最优运行时间最多改进$\sqrt{n}$倍,该方法按与坐标光滑性参数平方根成正比的概率选取各坐标,证明技术也不同于以往的估计序列方法,适用于经验风险最小化和线性系统求解,理论和实践均有效。

英文摘要

Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems. Up to a primal-dual transformation, it is also the same as accelerated stochastic gradient descent that is one of the central methods used in machine learning. In this paper, we improve the best known running time of accelerated coordinate descent by a factor up to $\sqrt{n}$. Our improvement is based on a clean, novel non-uniform sampling that selects each coordinate with a probability proportional to the square root of its smoothness parameter. Our proof technique also deviates from the classical estimation sequence technique used in prior work. Our speed-up applies to important problems such as empirical risk minimization and solving linear systems, both in theory and in practice.
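The sampling rule described above is simple to write down: coordinate $i$ is chosen with probability $p_i = \sqrt{L_i} / \sum_j \sqrt{L_j}$, where $L_i$ is its smoothness parameter. The sketch below only computes and samples from this distribution; the function names and example values are assumptions, and the accelerated update itself is not shown.

```python
import math
import random

def sqrt_smoothness_probabilities(smoothness):
    """Probabilities proportional to the square root of each smoothness parameter."""
    roots = [math.sqrt(value) for value in smoothness]
    total = sum(roots)
    return [root / total for root in roots]

def sample_coordinate(probabilities, rng):
    """Draw one coordinate index according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]

# Hypothetical smoothness parameters L = (1, 4, 9) give p = (1/6, 1/3, 1/2).
probs = sqrt_smoothness_probabilities([1.0, 4.0, 9.0])
coordinate = sample_coordinate(probs, random.Random(0))
```

Compared with sampling proportionally to $L_i$ itself, the square root flattens the distribution, so coordinates with small smoothness parameters are still visited reasonably often.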

1605.08257 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

Low-rank tensor completion: a Riemannian manifold preconditioning approach

低秩张量补全:黎曼流形预条件方法

Hiroyuki Kasai, Bamdev Mishra

AI总结 本文提出了一种基于黎曼流形预条件的方法用于具有秩约束的张量补全问题,通过引入新的黎曼度量利用最小二乘结构和Tucker分解的对称性,开发出预条件非线性共轭梯度和随机梯度下降算法,实验表明其在不同数据集上优于现有方法。

详情
Comments
The 33rd International Conference on Machine Learning (ICML 2016). arXiv admin note: substantial text overlap with arXiv:1506.02159
AI中文摘要

我们提出了一种新颖的黎曼流形预条件方法用于具有秩约束的张量补全问题。提出了一种新的黎曼度量或内积,利用成本函数的最小二乘结构,并考虑Tucker分解中存在的结构对称性。特定的度量允许利用黎曼优化在商流形框架中开发预条件非线性共轭梯度和随机梯度下降算法,分别用于批量和在线设置。列举了各种优化相关成分的矩阵表示。数值比较表明,所提出算法在不同合成和实际数据集上稳健地优于现有方法。

英文摘要

We propose a novel Riemannian manifold preconditioning approach for the tensor completion problem with rank constraint. A novel Riemannian metric or inner product is proposed that exploits the least-squares structure of the cost function and takes into account the structured symmetry that exists in Tucker decomposition. The specific metric allows to use the versatile framework of Riemannian optimization on quotient manifolds to develop preconditioned nonlinear conjugate gradient and stochastic gradient descent algorithms for batch and online setups, respectively. Concrete matrix representations of various optimization-related ingredients are listed. Numerical comparisons suggest that our proposed algorithms robustly outperform state-of-the-art algorithms across different synthetic and real-world datasets.

1511.03722 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ME stat.ML

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

强化学习中的双重鲁棒离策略价值评估

Nan Jiang, Lihong Li

AI总结 本文提出一种双重鲁棒估计器,用于离策略价值评估,兼顾无偏性和低方差性,并在基准问题中验证其有效性。

详情
Comments
14 pages; 4 figures; ICML 2016
AI中文摘要

本文研究了强化学习(RL)中的离策略价值评估问题,其中目标是基于由不同策略收集的数据来估计新策略的价值。这一问题通常是将RL应用于现实世界问题时的关键步骤。尽管其重要性,现有的通用方法要么存在不可控的偏差,要么方差较高。在本文中,我们将用于赌博机(bandit)问题的双重鲁棒估计器扩展到顺序决策问题,实现了两全其美:它保证无偏,并且其方差可以比流行的重要性采样估计器低得多。我们展示了估计器在多个基准问题中的准确性,并展示了其作为安全策略改进子程序的用途。我们还提供了关于问题难度的理论结果,并证明在某些情况下,我们的估计器可以达到下界。

英文摘要

We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL in real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer high variance. In this work, we extend the doubly robust estimator for bandits to sequential decision-making problems, which gets the best of both worlds: it is guaranteed to be unbiased and can have a much lower variance than the popular importance sampling estimators. We demonstrate the estimator's accuracy in several benchmark problems, and illustrate its use as a subroutine in safe policy improvement. We also provide theoretical results on the hardness of the problem, and show that our estimator can match the lower bound in certain scenarios.
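A minimal sketch of a doubly robust estimate for a single trajectory may help fix ideas. It uses the backward recursion $V_{DR} \leftarrow \hat V(s_t) + \rho_t\,(r_t + \gamma V_{DR} - \hat Q(s_t, a_t))$, where $\rho_t$ is the per-step importance ratio between the evaluated and the data-collecting policy. The trajectory format, the function names, and the toy models below are assumptions for illustration, not the authors' code.

```python
def doubly_robust_value(trajectory, q_hat, v_hat, gamma=1.0):
    """Doubly robust value estimate for one trajectory.

    trajectory: list of (state, action, reward, rho) tuples in time order,
                where rho is the importance ratio pi_new(a|s) / pi_old(a|s).
    q_hat, v_hat: approximate action-value and state-value models.
    """
    value = 0.0
    # Work backwards so each step corrects the model with the observed return.
    for state, action, reward, rho in reversed(trajectory):
        value = v_hat(state) + rho * (reward + gamma * value - q_hat(state, action))
    return value

# Hypothetical two-step trajectory.
steps = [("s0", "a", 1.0, 0.5), ("s1", "a", 2.0, 2.0)]

# With a zero model the recursion reduces to step-wise importance sampling.
is_value = doubly_robust_value(steps, lambda s, a: 0.0, lambda s: 0.0)

# With a model that matches the observed rewards, the correction terms vanish.
q_table = {("s0", "a"): 3.0, ("s1", "a"): 2.0}
v_table = {"s0": 3.0, "s1": 2.0}
model_value = doubly_robust_value(
    steps, lambda s, a: q_table[(s, a)], lambda s: v_table[s]
)
```

The two calls show the two extremes: a useless model leaves pure importance sampling, while an accurate model removes the importance-weighted correction and with it most of the variance.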

1605.07830 2026-06-04 math.ST cs.NA math.NA math.PR stat.TH

Derivative-based global sensitivity measures and their link with Sobol sensitivity indices

基于导数的全局敏感性度量及其与Sobol敏感性指数的联系

S. Kucherenko, S. Song

AI总结 本文探讨了基于导数的全局敏感性分析方法(DGSM)及其与Sobol指数的联系,介绍了DGSM的高效计算方法及Sobol指数的新的上下界结果,并通过实例展示了其应用。

详情
Comments
Monte Carlo and Quasi-Monte Carlo Methods 2014, conference, Monte Carlo and Quasi-Monte Carlo Methods, R. Cools and D. Nuyens (eds.), Springer Proceedings in Mathematics & Statistics 163, Springer International Publishing Switzerland 2016
AI中文摘要

Sobol敏感性指数的方差法因高效和易于解释而广受欢迎,但在高维模型中直接应用效率低下且成本高。作为替代方法,基于导数的全局敏感性分析方法(DGSM)近年来受到关注,其与Morris筛选法和Sobol指数相关。DGSM易于实现和计算,计算时间通常低于Sobol指数估计。本文综述了DGSM的最新进展,并提出了Sobol总敏感性指数的新上下界结果。通过这些边界,可以大多数情况下获得Sobol总敏感性指数的良好实用估计。多个示例用于展示DGSM的应用。

英文摘要

The variance-based method of Sobol sensitivity indices is very popular among practitioners due to its efficiency and easiness of interpretation. However, for high-dimensional models the direct application of this method can be very time consuming and prohibitively expensive to use. One of the alternative global sensitivity analysis methods known as the method of derivative based global sensitivity measures (DGSM) has recently become popular among practitioners. It has a link with the Morris screening method and Sobol sensitivity indices. DGSM are very easy to implement and evaluate numerically. The computational time required for numerical evaluation of DGSM is generally much lower than that for estimation of Sobol sensitivity indices. We present a survey of recent advances in DGSM and new results concerning new lower and upper bounds on the values of Sobol total sensitivity indices. Using these bounds it is possible in most cases to get a good practical estimation of the values of Sobol total sensitivity indices. Several examples are used to illustrate an application of DGSM.

1504.08196 2026-06-04 eess.SY cs.SY stat.ML

On the estimation of initial conditions in kernel-based system identification

基于核的系统辨识中初始条件的估计

Riccardo Sven Risuleo, Giulio Bottegal, Håkan Hjalmarsson

AI总结 本文提出三种方法,利用不同类型的先验信息估计初始条件,并通过最大似然-后验估计器调整稳定样条核的超参数,通过期望最大化方法解决优化问题,提升了系统脉冲响应重建的准确性。

详情
Comments
16 pages, accepted for publication at IEEE Conference on Decision and Control 2015
AI中文摘要

近年来,系统辨识领域的发展引起了对正则化核方法的关注,其中采用最近引入的稳定样条核,强制施加对未知过程的先验信息。这降低了估计的方差,使得核方法在输入输出数据样本较少时特别有吸引力。然而,在这种情况下,系统初始条件的影响可能对输出动态产生显著影响。本文专门针对这一点,提出三种方法,利用不同类型的先验信息估计初始条件,并通过混合最大似然-后验估计器调整稳定样条核的超参数。为了解决相关优化问题,我们采用期望最大化方法,证明了解可以通过在简单更新步骤之间迭代获得。数值实验显示,与不考虑初始条件影响的其他核方法相比,所提策略在重建系统脉冲响应的准确性方面具有优势。

英文摘要

Recent developments in system identification have brought attention to regularized kernel-based methods, where, adopting the recently introduced stable spline kernel, prior information on the unknown process is enforced. This reduces the variance of the estimates and thus makes kernel-based methods particularly attractive when few input-output data samples are available. In such cases, however, the influence of the system initial conditions may have a significant impact on the output dynamics. In this paper, we specifically address this point. We propose three methods that deal with the estimation of initial conditions using different types of information. The methods consist in various mixed maximum likelihood--a posteriori estimators which estimate the initial conditions and tune the hyperparameters characterizing the stable spline kernel. To solve the related optimization problems, we resort to the expectation-maximization method, showing that the solutions can be attained by iterating among simple update steps. Numerical experiments show the advantages, in terms of accuracy in reconstructing the system impulse response, of the proposed strategies, compared to other kernel-based schemes not accounting for the effect of initial conditions.

1504.08190 2026-06-04 eess.SY cs.SY stat.ML

A new kernel-based approach for overparameterized Hammerstein system identification

针对过参数化Hammerstein系统辨识的一种新核方法

Riccardo Sven Risuleo, Giulio Bottegal, Håkan Hjalmarsson

AI总结 本文提出了一种新的Hammerstein系统辨识方案,通过过参数化向量估计非线性系数和线性系统脉冲响应,结合正则化核方法提升估计精度。

详情
Comments
17 pages, submitted to IEEE Conference on Decision and Control 2015
AI中文摘要

本文提出了一种新的Hammerstein系统辨识方案,Hammerstein系统是由静态非线性部分和线性时不变动态系统级联构成的动态系统。我们假设非线性函数可以表示为p个基函数的线性组合。通过估计一个包含所有未知变量组合的np维过参数化向量,重建非线性部分的p个系数和线性系统前n个脉冲响应样本。为避免估计中的高方差,我们采用正则化核方法,并引入一种针对Hammerstein系统辨识的新核。我们证明所提出的方案能够唯一地将过参数化向量分解为脉冲响应和静态非线性部分的p个系数的组合。通过多个数值实验,我们还展示了所提方法在Hammerstein系统辨识中优于两种标准方法。

英文摘要

In this paper we propose a new identification scheme for Hammerstein systems, which are dynamic systems consisting of a static nonlinearity and a linear time-invariant dynamic system in cascade. We assume that the nonlinear function can be described as a linear combination of $p$ basis functions. We reconstruct the $p$ coefficients of the nonlinearity together with the first $n$ samples of the impulse response of the linear system by estimating an $np$-dimensional overparameterized vector, which contains all the combinations of the unknown variables. To avoid high variance in these estimates, we adopt a regularized kernel-based approach and, in particular, we introduce a new kernel tailored for Hammerstein system identification. We show that the resulting scheme provides an estimate of the overparameterized vector that can be uniquely decomposed as the combination of an impulse response and $p$ coefficients of the static nonlinearity. We also show, through several numerical experiments, that the proposed method compares very favorably with two standard methods for Hammerstein system identification.

1412.4056 2026-06-04 eess.SY cs.SY stat.ML

Blind system identification using kernel-based methods

基于核方法的盲系统辨识

Giulio Bottegal, Riccardo S. Risuleo, Håkan Hjalmarsson

AI总结 本文提出了一种盲系统辨识新方法,利用高斯回归框架将未知线性系统的脉冲响应建模为高斯过程的实现,采用稳定样条核结构并估计超参数和噪声方差,通过经验贝叶斯方法和改进的EM算法实现高效优化。

详情
Comments
15 pages; accepted for publication at IFAC Sysid 2015
AI中文摘要

我们提出了一种新的盲系统辨识方法。借助高斯回归框架,我们将未知线性系统的脉冲响应建模为高斯过程的实现。此类过程的协方差矩阵(或核)结构由稳定样条核给出,该核近期已被引入用于系统辨识,并依赖于一个未知的超参数。我们假设输入可以由少数参数线性描述。我们使用经验贝叶斯方法估计这些参数、核超参数和噪声方差。相关的优化问题通过基于期望最大化方法的新型迭代方案高效求解。特别是,我们展示每次迭代由一组简单的更新规则组成。通过一些数值实验,我们展示了所提方法非常有前途的性能。

英文摘要

We propose a new method for blind system identification. Resorting to a Gaussian regression framework, we model the impulse response of the unknown linear system as a realization of a Gaussian process. The structure of the covariance matrix (or kernel) of such a process is given by the stable spline kernel, which has been recently introduced for system identification purposes and depends on an unknown hyperparameter. We assume that the input can be linearly described by few parameters. We estimate these parameters, together with the kernel hyperparameter and the noise variance, using an empirical Bayes approach. The related optimization problem is efficiently solved with a novel iterative scheme based on the Expectation-Maximization method. In particular, we show that each iteration consists of a set of simple update rules. We show, through some numerical experiments, very promising performance of the proposed method.

1605.05069 2026-06-04 math.ST cs.NA math.NA stat.OT stat.TH

Sobol' indices for problems defined in non-rectangular domains

非矩形域上问题的Sobol'指数

S. Kucherenko, O. V. Klymenko, N. Shah

AI总结 本文提出一种新方法,用于估计输入受限于非矩形域的模型的Sobol指数,通过数值求积法和MC/QMC估计器进行数值计算,并用约束测试函数验证方法的有效性。

详情
Comments
24 pages, 16 figures
AI中文摘要

本文开发了一种新的理论和数值框架,用于估计输入受限于非矩形域(例如存在不等式约束)的模型的Sobol敏感性指数。提出了两种数值方法:数值求积法,对低维和中等维问题可能非常高效,以及基于接受-拒绝抽样方法的MC/QMC估计器,用于Sobol敏感性指数的数值估计。考虑了多个具有约束的模型测试函数,其中找到了解析解作为基准来验证数值估计。该方法被证明是通用且高效的。

英文摘要

A novel theoretical and numerical framework for the estimation of Sobol sensitivity indices for models in which inputs are confined to a non-rectangular domain (e.g., in presence of inequality constraints) is developed. Two numerical methods, namely the quadrature integration method which may be very efficient for problems of low and medium dimensionality and the MC/QMC estimators based on the acceptance-rejection sampling method are proposed for the numerical estimation of Sobol sensitivity indices. Several model test functions with constraints are considered for which analytical solutions for Sobol sensitivity indices were found. These solutions were used as benchmarks for verifying numerical estimates. The method is shown to be general and efficient.
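The acceptance-rejection step underlying the MC/QMC estimators can be sketched as follows: points are proposed uniformly in the enclosing hypercube and kept only if they satisfy the constraints. The constraint $x_1 + x_2 \le 1$, the function name, and the use of pseudo-random rather than quasi-random points are assumptions for illustration; the paper's estimators of the indices themselves are not reproduced here.

```python
import random

def sample_constrained(n, dim, constraint, seed=0):
    """Draw n points uniformly from {x in [0, 1]^dim : constraint(x)}."""
    rng = random.Random(seed)
    accepted = []
    proposed = 0
    while len(accepted) < n:
        x = [rng.random() for _ in range(dim)]
        proposed += 1
        if constraint(x):  # keep only points inside the non-rectangular domain
            accepted.append(x)
    return accepted, n / proposed

# Hypothetical triangular domain: x1 + x2 <= 1 inside the unit square.
points, acceptance_rate = sample_constrained(5000, 2, lambda x: x[0] + x[1] <= 1.0)
```

The acceptance rate estimates the volume fraction of the feasible region (one half for this triangle), which indicates how many model evaluations are wasted when the domain is a small part of its bounding box.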

1605.05042 2026-06-04 math.NA cs.NA math.ST stat.TH

Computational issues and numerical experiments for Linear Multistep Method Particle Filtering

线性多步法粒子滤波的计算问题与数值实验

Daniela Calvetti, Salvatore Cuomo, Monica Pragliola, Erkki Somersalo, Gerardo Toraldo

AI总结 本文研究了线性多步法粒子滤波在估计进化系统参数中的有效性,通过骨骼肌代谢问题验证了该方法的计算效率及误差来源,分析了线性多步法与Runge-Kutta方法的对比效果。

详情
AI中文摘要

线性多步法粒子滤波(LMM PF)是一种用于预测由微分方程系统支配的进化系统时间演化的算法。如果某些参数未知,可以通过计算来估计这些参数。本文假设所有未知参数均被视为随机变量,其随机性表示其值的不确定性而非内在属性。因此,系统状态和参数通过概率密度描述,通常以代表性样本形式呈现。该方法在参数估计反问题中特别有吸引力,因为统计方法自然提供了通过分布扩散来评估解不确定性的手段。底层采样技术的计算效率对方法的成功至关重要,因为解的准确性取决于能否从未知参数分布中生成代表性样本。本文在骨骼肌代谢问题上测试了LMM PF,该问题此前在集合卡尔曼滤波框架中处理。本文利用数值证据突显了主要误差源与所采用线性多步法的影响之间的关联。最后,我们分析了将LMM替换为Runge-Kutta类积分方法对支持粒子滤波技术的影响。

英文摘要

The Linear Multistep Method Particle Filter (LMM PF) is a method for predicting the evolution in time of an evolutionary system governed by a system of differential equations. If some of the parameters of the governing equations are unknown, it is possible to organize the calculations so as to estimate them while following the evolution of the system in time. The underlying assumption in the approach that we present is that all unknowns are modelled as random variables, where the randomness is an indication of the uncertainty of their values rather than an intrinsic property of the quantities. Consequently, the states of the system and the parameters are described in probabilistic terms by their density, often in the form of representative samples. This approach is particularly attractive in the context of parameter estimation inverse problems, because the statistical formulation naturally provides a means of assessing the uncertainty in the solution via the spread of the distribution. The computational efficiency of the underlying sampling technique is crucial for the success of the method, because the accuracy of the solution depends on the ability to produce representative samples from the distribution of the unknown parameters. In this paper LMM PF is tested on a skeletal muscle metabolism problem, which was previously treated within the Ensemble Kalman filtering framework. Here numerical evidence is used to highlight the correlation between the main sources of errors and the influence of the linear multistep method adopted. Finally, we analyze the effect of replacing LMM with Runge-Kutta class integration methods for supporting the PF technique.

1604.01067 2026-06-04 math.NA cs.CC cs.IT cs.NA math.IT stat.CO

Sparse matrices for weighted sparse recovery

稀疏矩阵用于加权稀疏恢复

Bubacarr Bah

AI总结 本文首次提出加权ℓ1最小化在稀疏随机矩阵和加权稀疏信号中的稀疏恢复保证,通过加权零空间性质推导,并展示其计算效率优于密集矩阵。

详情
Comments
19 pages, 4 figures
AI中文摘要

我们推导了使用稀疏随机矩阵和加权稀疏信号的加权ℓ1最小化的首个稀疏恢复保证,利用加权的零空间性质推导这些保证。这些稀疏矩阵来自扩展图(expander graph),可以快速应用且具有优于其密集对应物的计算复杂性。此外,我们展示使用此类稀疏矩阵,通过加权ℓ1最小化进行的加权稀疏恢复,其样本复杂性与信号加权稀疏性线性相关,且这些采样率可能小于标准稀疏恢复。此外,这些结果可退化为已知的标准稀疏恢复和具有先验信息的稀疏恢复结果,并得到数值实验的支持。

英文摘要

We derive the first sparse recovery guarantees for weighted $\ell_1$ minimization with sparse random matrices and the class of weighted sparse signals, using a weighted version of the null space property to derive these guarantees. These sparse matrices from expander graphs can be applied very fast and have better computational complexity than their dense counterparts. In addition we show that, using such sparse matrices, weighted sparse recovery with weighted $\ell_1$ minimization leads to sample complexities that are linear in the weighted sparsity of the signal and these sampling rates can be smaller than those of standard sparse recovery. Moreover, these results reduce to known results in standard sparse recovery and sparse recovery with prior information and the results are supported by numerical experiments.

1507.06736 2026-06-04 math.NA cs.CC cs.IT cs.NA math.FA math.IT stat.CO

The sample complexity of weighted sparse approximation

加权稀疏逼近的样本复杂性

Bubacarr Bah, Rachel Ward

AI总结 研究了加权稀疏恢复的样本复杂性,证明了在特定权重条件下,通过加权ℓ1最小化可实现线性样本复杂性,且结果适用于未知支持集和不同噪声情况。

详情
Comments
21 pages, 12 figures
AI中文摘要

对于高斯采样矩阵,我们提供了在给定稀疏支持集先验模型与真实支持集匹配程度下,实现鲁棒加权稀疏恢复保证所需的最小测量数m的界限。我们的主要贡献是,对于一个支持于未知集合S⊂{1,…,N}的稀疏向量x∈R^N,其中|S|≤k,若S具有加权基数ω(S):=∑_{j∈S}ω_j^2,并且S^c上的权重呈现温和增长,即对于j∈S^c,ω_j^2≥γlog(j/ω(S)),γ>0,则通过使用权重ω_j的加权ℓ1最小化进行稀疏恢复的样本复杂性与加权稀疏性水平线性相关,且m=O(ω(S)/γ)。此主要结果是标准稀疏恢复情况、已知支持集情况以及具有先验信息的稀疏恢复情况的推广。我们进一步将结果扩展到加性噪声情况。我们的结果是非均匀的,即适用于固定但未知的支持集,且S上的权重不必全部小于S^c上的权重即可保证恢复结果。

英文摘要

For Gaussian sampling matrices, we provide bounds on the minimal number of measurements $m$ required to achieve robust weighted sparse recovery guarantees in terms of how well a given prior model for the sparsity support aligns with the true underlying support. Our main contribution is that for a sparse vector ${\bf x} \in \mathbb{R}^N$ supported on an unknown set $\mathcal{S} \subset \{1, \dots, N\}$ with $|\mathcal{S}|\leq k$, if $\mathcal{S}$ has \emph{weighted cardinality} $ω(\mathcal{S}) := \sum_{j \in \mathcal{S}} ω_j^2$, and if the weights on $\mathcal{S}^c$ exhibit mild growth, $ω_j^2 \geq γ\log(j/ω(\mathcal{S}))$ for $j\in\mathcal{S}^c$ and $γ> 0$, then the sample complexity for sparse recovery via weighted $\ell_1$-minimization using weights $ω_j$ is linear in the weighted sparsity level, and $m = \mathcal{O}(ω(\mathcal{S})/γ)$. This main result is a generalization of special cases including a) the standard sparse recovery setting where all weights $ω_j \equiv 1$, and $m = \mathcal{O}\left(k\log\left(N/k\right)\right)$; b) the setting where the support is known a priori, and $m = \mathcal{O}(k)$; and c) the setting of sparse recovery with prior information, and $m$ depends on how well the weights are aligned with the support set $\mathcal{S}$. We further extend the results in case c) to the setting of additive noise. Our results are {\em nonuniform}, that is, they apply for a fixed support, unknown a priori, and the weights on $\mathcal{S}$ do not all have to be smaller than the weights on $\mathcal{S}^c$ for our recovery results to hold.

1510.06053 2026-06-04 stat.CO cs.NA math.NA stat.ME

Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction

通过似然主导的参数和状态缩减实现大规模贝叶斯反问题后验近似

Tiangang Cui, Youssef M. Marzouk, Karen E. Willcox

AI总结 本文提出通过似然主导的参数和状态缩减方法,加速大规模贝叶斯反问题求解,通过构建低维子空间提高后验近似效率,适用于高维观测和高维状态空间的反问题。

详情
Journal ref
Journal of Computational Physics, Volume 315, 15 June 2016, Pages 363-387
Comments
35 pages, 12 figures
AI中文摘要

解决大规模贝叶斯反问题的主要瓶颈是后验采样算法在高维参数空间中的可扩展性和正向模型评估的计算成本。然而,不完整或噪声数据、状态变化和参数依赖性以及先验中的相关性提供了有用的结构,可用于此设置中的降维。为此,我们展示了如何共同构建参数空间和状态空间的低维子空间,以加速反问题的贝叶斯求解。作为状态降维的副产品,我们还展示了如何在高维观测问题中识别数据的低维子空间。这些子空间使后验近似为两个因子的乘积:(i) 后验在低维参数子空间上的投影,其中原始似然被减少模型的近似替代;(ii) 高维补集参数子空间的边缘先验分布。我们提出了几种使用有限正向和伴随模型模拟构建这些子空间的策略。所得到的后验近似可通过标准采样技术快速表征,例如马尔可夫链蒙特卡罗。两个数值例子展示了本文方法的准确性和效率:大气遥感积分方程的反演,其中数据维度非常高;以及地下水系统中异质传导率场的推断,涉及高维状态和参数的偏微分方程正向模型。

英文摘要

Two major bottlenecks to the solution of large-scale Bayesian inverse problems are the scaling of posterior sampling algorithms to high-dimensional parameter spaces and the computational cost of forward model evaluations. Yet incomplete or noisy data, the state variation and parameter dependence of the forward model, and correlations in the prior collectively provide useful structure that can be exploited for dimension reduction in this setting--both in the parameter space of the inverse problem and in the state space of the forward model. To this end, we show how to jointly construct low-dimensional subspaces of the parameter space and the state space in order to accelerate the Bayesian solution of the inverse problem. As a byproduct of state dimension reduction, we also show how to identify low-dimensional subspaces of the data in problems with high-dimensional observations. These subspaces enable approximation of the posterior as a product of two factors: (i) a projection of the posterior onto a low-dimensional parameter subspace, wherein the original likelihood is replaced by an approximation involving a reduced model; and (ii) the marginal prior distribution on the high-dimensional complement of the parameter subspace. We present and compare several strategies for constructing these subspaces using only a limited number of forward and adjoint model simulations. The resulting posterior approximations can rapidly be characterized using standard sampling techniques, e.g., Markov chain Monte Carlo. Two numerical examples demonstrate the accuracy and efficiency of our approach: inversion of an integral equation in atmospheric remote sensing, where the data dimension is very high; and the inference of a heterogeneous transmissivity field in a groundwater system, which involves a partial differential equation forward model with high dimensional state and parameters.

1411.3688 2026-06-04 stat.CO cs.NA math.NA stat.ME

Dimension-independent likelihood-informed MCMC

与维度无关的似然信息MCMC

Tiangang Cui, Kody J. H. Law, Youssef M. Marzouk

AI总结 本文提出了一种与维度无关的似然信息MCMC方法,通过引入算子加权提议分布和局部Hessian信息,实现对高维后验分布的有效采样,用于解决非高斯结构的逆问题。

详情
Journal ref
Journal of Computational Physics, 304, 109-137 (2016)
AI中文摘要

许多贝叶斯推断问题需要探索高维参数的后验分布,这些参数代表底层函数的离散化。本文介绍了一种MCMC采样器家族,能够适应函数上后验分布的特定结构。首先,我们引入了一类算子加权提议分布,它们在函数空间上有良好定义,使得采样器性能与函数离散化无关。其次,通过利用先验到后验分布变化中的局部Hessian信息和相关低维结构,我们为Langevin随机微分方程开发了一种非齐次离散化方案,从而得到适应后验非高斯结构的算子加权提议。所得到的与维度无关、似然信息(DILI)MCMC采样器可能对一大类高维问题有用,其中目标概率测度相对于高斯参考测度具有密度。两个非线性逆问题用于展示这些DILI采样器的效率:椭圆PDE系数逆问题和条件扩散中的路径重建。

英文摘要

Many Bayesian inference problems require exploring the posterior distribution of high-dimensional parameters that represent the discretization of an underlying function. This work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. Two distinct lines of research intersect in the methods developed here. First, we introduce a general class of operator-weighted proposal distributions that are well defined on function space, such that the performance of the resulting MCMC samplers is independent of the discretization of the function. Second, by exploiting local Hessian information and any associated low-dimensional structure in the change from prior to posterior distributions, we develop an inhomogeneous discretization scheme for the Langevin stochastic differential equation that yields operator-weighted proposals adapted to the non-Gaussian structure of the posterior. The resulting dimension-independent, likelihood-informed (DILI) MCMC samplers may be useful for a large class of high-dimensional problems where the target probability measure has a density with respect to a Gaussian reference measure. Two nonlinear inverse problems are used to demonstrate the efficiency of these DILI samplers: an elliptic PDE coefficient inverse problem and path reconstruction in a conditioned diffusion.

1403.4680 2026-06-04 stat.CO cs.NA math.NA stat.ME

Likelihood-informed dimension reduction for nonlinear inverse problems

基于似然信息的非线性反问题降维方法

Tiangang Cui, James Martin, Youssef M. Marzouk, Antti Solonen, Alessio Spantini

AI总结 本文提出一种基于似然信息的降维方法,用于非线性反问题中的参数空间降维,通过识别似然主导子空间提高贝叶斯推断效率。

详情
Journal ref
Inverse Problems, 30, 114015 (2014)
AI中文摘要

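反问题的固有维度受先验信息、观测的精度和数量以及正向算子的平滑性质影响。从贝叶斯视角来看,在许多问题中,从先验到后验的变化可能局限于参数空间中一个相对低维的子空间。本文提出一种降维方法,通过刻画先验和似然在后验分布支撑集上的相对影响,定义并识别这样的子空间,称为“似然信息子空间”(LIS)。这一识别为具有非线性正向模型和高斯先验的贝叶斯推断带来了新的、更高效的计算方法。具体而言,我们将后验分布近似为定义在LIS上的低维后验与边缘化到补子空间上的先验分布的乘积。于是马尔可夫链蒙特卡罗采样可以在更低的维度中进行,计算效率显著提高。我们还引入了一种Rao-Blackwell化策略,对后验期望的蒙特卡罗估计进行去随机化,以进一步降低方差。我们通过两个数值算例展示方法的效率:由椭圆PDE控制的地下水系统中的渗透率推断,以及基于全球臭氧监测系统(GOMOS)观测的大气遥感问题。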

英文摘要

The intrinsic dimensionality of an inverse problem is affected by prior information, the accuracy and number of observations, and the smoothing properties of the forward operator. From a Bayesian perspective, changes from the prior to the posterior may, in many problems, be confined to a relatively low-dimensional subspace of the parameter space. We present a dimension reduction approach that defines and identifies such a subspace, called the "likelihood-informed subspace" (LIS), by characterizing the relative influences of the prior and the likelihood over the support of the posterior distribution. This identification enables new and more efficient computational methods for Bayesian inference with nonlinear forward models and Gaussian priors. In particular, we approximate the posterior distribution as the product of a lower-dimensional posterior defined on the LIS and the prior distribution marginalized onto the complementary subspace. Markov chain Monte Carlo sampling can then proceed in lower dimensions, with significant gains in computational efficiency. We also introduce a Rao-Blackwellization strategy that de-randomizes Monte Carlo estimates of posterior expectations for additional variance reduction. We demonstrate the efficiency of our methods using two numerical examples: inference of permeability in a groundwater system governed by an elliptic PDE, and an atmospheric remote sensing problem based on Global Ozone Monitoring System (GOMOS) observations.

1403.4290 2026-06-04 stat.CO cs.NA math.NA stat.ME

Data-Driven Model Reduction for the Bayesian Solution of Inverse Problems

数据驱动的模型降阶用于贝叶斯逆问题求解

Tiangang Cui, Youssef M. Marzouk, Karen E. Willcox

AI总结 本文提出数据驱动的降阶方法,用于降低贝叶斯逆问题求解的计算成本,通过自适应生成后验分布快照,同时进行后验探索和降阶,提升采样效率。

详情
Journal ref
International Journal for Numerical Methods in Engineering, 102 (5), 966-990 (2015)
AI中文摘要

在求解由偏微分方程(PDE)控制的贝叶斯逆问题时,重复评估数值PDE模型的计算成本是一个主要挑战。本文提出了一种数据驱动的、基于投影的模型降阶技术,通过从后验分布中自适应地计算快照来构建降阶模型,从而同时进行后验探索和模型降阶。该方法在MCMC算法中将全尺度模型与降阶模型耦合,在保持推断准确性的同时降低了计算成本。在考虑多孔介质中稳态流动的数值实验中,数据驱动的降阶模型比用经典方法构建的降阶模型更准确,并且与标准MCMC方法相比,后验采样效率提高了几个数量级。

英文摘要

One of the major challenges in the Bayesian solution of inverse problems governed by partial differential equations (PDEs) is the computational cost of repeatedly evaluating numerical PDE models, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. This paper proposes a data-driven projection-based model reduction technique to reduce this computational cost. The proposed technique has two distinctive features. First, the model reduction strategy is tailored to inverse problems: the snapshots used to construct the reduced-order model are computed adaptively from the posterior distribution. Posterior exploration and model reduction are thus pursued simultaneously. Second, to avoid repeated evaluations of the full-scale numerical model as in a standard MCMC method, we couple the full-scale model and the reduced-order model together in the MCMC algorithm. This maintains accurate inference while reducing its overall computational cost. In numerical experiments considering steady-state flow in a porous medium, the data-driven reduced-order model achieves better accuracy than a reduced-order model constructed using the classical approach. It also improves posterior sampling efficiency by several orders of magnitude compared to a standard MCMC method.

1512.00984 2026-06-04 math.NA cs.LG cs.NA stat.ML

Fast Low-Rank Matrix Learning with Nonconvex Regularization

基于非凸正则化的快速低秩矩阵学习

Quanming Yao, James T. Kwok, Wenliang Zhong

AI总结 本文提出一种利用非凸正则化快速学习低秩矩阵的方法,通过截断奇异值和幂方法提升效率,实现更准确的矩阵恢复。

详情
Comments
Long version of conference paper appeared ICDM 2015
AI中文摘要

低秩建模在机器学习、计算机视觉和社会网络分析中有广泛应用。尽管常用凸的核范数来近似矩阵秩,但非凸低秩正则化器在恢复性能上更优。然而,由此产生的优化问题更具挑战性。最近的最先进方法基于近端梯度算法,但每个近端步骤都需要进行昂贵的完整SVD。本文表明,对于许多常用的非凸低秩正则化器,可以推导出一个截断值,自动对近端算子得到的奇异值进行阈值处理。这使得可以用幂方法高效地近似SVD。此外,近端算子可以简化为对投影到该主子空间上的一个小得多的矩阵求近端算子。可以保证O(1/T)的收敛率,其中T是迭代次数。我们在矩阵补全和鲁棒主成分分析上进行了大量实验。所提方法相比最先进方法实现了显著加速。此外,得到的矩阵解比传统核范数正则化器的解更准确且秩更低。

英文摘要

Low-rank modeling has many important applications in machine learning, computer vision and social network analysis. While the matrix rank is often approximated by the convex nuclear norm, the use of nonconvex low-rank regularizers has demonstrated better recovery performance. However, the resultant optimization problem is much more challenging. A very recent state-of-the-art method is based on the proximal gradient algorithm. However, it requires an expensive full SVD in each proximal step. In this paper, we show that for many commonly-used nonconvex low-rank regularizers, a cutoff can be derived to automatically threshold the singular values obtained from the proximal operator. This allows the use of the power method to approximate the SVD efficiently. Besides, the proximal operator can be reduced to that of a much smaller matrix projected onto this leading subspace. Convergence, with a rate of O(1/T) where T is the number of iterations, can be guaranteed. Extensive experiments are performed on matrix completion and robust principal component analysis. The proposed method achieves significant speedup over the state-of-the-art. Moreover, the matrix solution obtained is more accurate and has a lower rank than that of the traditional nuclear norm regularizer.
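The abstract above relies on the power method to approximate leading singular components instead of computing a full SVD. As a minimal sketch of that building block only (plain power iteration on A^T A, not the authors' algorithm), the snippet below estimates the largest singular value and its singular vectors. The function name `top_singular_triplet`, the fixed iteration count, and the all-ones starting vector are illustrative assumptions.

```python
import math


def top_singular_triplet(A, n_iters=100):
    """Estimate the leading singular value and vectors of A (list of rows).

    Hypothetical helper: plain power iteration on A^T A. It can fail if the
    start vector is orthogonal to the leading right singular vector.
    """
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # assumed starting vector
    for _ in range(n_iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # A is zero along v; nothing to normalize
            break
        v = [x / norm for x in w]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av] if sigma > 0.0 else Av
    return sigma, u, v


sigma, u, v = top_singular_triplet([[3.0, 0.0], [0.0, 1.0]])
```

In a thresholding scheme of the kind the abstract describes, only the singular values above the derived cutoff would need to be computed this way; how the cutoff is obtained is specific to the paper and is not reproduced here.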

1503.00021 2026-06-04 math.NA cs.NA stat.CO

Mercer kernels and integrated variance experimental design: connections between Gaussian process regression and polynomial approximation

Mercer核与积分方差实验设计:高斯过程回归与多项式逼近之间的联系

Alex A. Gorodetsky, Youssef M. Marzouk

AI总结 本文探讨了用于开发计算模型替代方案的实验设计方法,分析了实验设计与逼近算法之间的相互作用。研究了高斯过程回归和非侵入性多项式逼近两种常用方法,并提出通过最小化后验积分方差设计准则来优化高斯过程回归的算法,同时发现积分方差最优设计在准确性上优于其他设计方法。

详情
AI中文摘要

本文研究了用于开发计算模型替代方案的实验设计方法,探讨了实验设计与逼近算法之间的相互作用。我们关注两种广泛使用的逼近方法,即高斯过程(GP)回归和非侵入性多项式逼近。首先,我们介绍了用于最小化后验积分方差(IVAR)设计准则的算法,该方法将设计视为连续优化问题,可以在复杂输入域上使用基于梯度的方法解决,而无需使用贪心近似。我们证明,通过这种方式最小化IVAR可以得到具有良好插值性质的点集,并且能够比基于熵最小化或互信息最大化的设计方法提供更准确的高斯过程回归。第二,通过使用Mercer核/特征函数视角分析高斯过程回归,我们确定了在什么条件下高斯过程回归与伪谱多项式逼近一致。偏离这些条件可以理解为对核或实验设计本身的改变。然后我们展示了IVAR最优设计在牺牲核特征函数的离散正交性的情况下,可以比正交化点集产生更低的逼近误差。最后,我们比较了自适应高斯过程回归和自适应伪谱逼近在多种目标函数类别中的性能,识别了对GP+IVAR方法有利的特征。

英文摘要

This paper examines experimental design procedures used to develop surrogates of computational models, exploring the interplay between experimental designs and approximation algorithms. We focus on two widely used approximation approaches, Gaussian process (GP) regression and non-intrusive polynomial approximation. First, we introduce algorithms for minimizing a posterior integrated variance (IVAR) design criterion for GP regression. Our formulation treats design as a continuous optimization problem that can be solved with gradient-based methods on complex input domains, without resorting to greedy approximations. We show that minimizing IVAR in this way yields point sets with good interpolation properties, and that it enables more accurate GP regression than designs based on entropy minimization or mutual information maximization. Second, using a Mercer kernel/eigenfunction perspective on GP regression, we identify conditions under which GP regression coincides with pseudospectral polynomial approximation. Departures from these conditions can be understood as changes either to the kernel or to the experimental design itself. We then show how IVAR-optimal designs, while sacrificing discrete orthogonality of the kernel eigenfunctions, can yield lower approximation error than orthogonalizing point sets. Finally, we compare the performance of adaptive Gaussian process regression and adaptive pseudospectral approximation for several classes of target functions, identifying features that are favorable to the GP + IVAR approach.

1601.05842 2026-06-04 math.ST cs.NA math.NA stat.CO stat.TH

Asymptotic Normality of Scrambled Geometric Net Quadrature

加扰几何网求积的渐近正态性

Kinjal Basu, Rajarshi Mukherjee

AI总结 本文基于Loh(2003)的工作,证明对于某些光滑函数,加扰(scrambled)几何网估计具有渐近正态分布,推进了Basu和Owen(2015)关于该求积方法的方差分析。

详情
Comments
41 pages, 6 figures
AI中文摘要

在最近的一项工作中,Basu和Owen(2015)提出,当积分域是s个维数为d且满足一定划分约束的任意空间的乘积时,使用加扰(scrambled)几何网进行数值积分。已经证明,对于一类光滑函数,加扰几何网的积分估计方差为O(n^{-1-2/d}(log n)^{s-1}),而普通蒙特卡罗方法为O(n^{-1})。本文的主要思想是在Loh(2003)工作的基础上,证明对于定义在R^d的合适子集的乘积上的某些光滑函数,加扰几何网估计具有渐近正态分布。

英文摘要

In a very recent work, Basu and Owen (2015) propose the use of scrambled geometric nets in numerical integration when the domain is a product of $s$ arbitrary spaces of dimension $d$ having a certain partitioning constraint. It was shown that for a class of smooth functions, the integral estimate has variance $O( n^{-1 -2/d} (\log n)^{s-1})$ for scrambled geometric nets, compared to $O(n^{-1})$ for ordinary Monte Carlo. The main idea of this paper is to build on the work by Loh (2003), to show that the scrambled geometric net estimate has an asymptotic normal distribution for certain smooth functions defined on products of suitable subsets of $\mathbb{R}^d$.

1507.05366 2026-06-04 math.ST cs.NA math.NA stat.AP stat.ME stat.TH

ConceFT: Concentration of Frequency and Time via a multitapered synchrosqueezed transform

ConceFT:通过多锥窗同步压缩变换实现频率和时间的集中

Ingrid Daubechies, Yi Wang, Hau-tieng Wu

AI总结 本文提出一种新方法,用于确定由多个振荡成分组成的时变信号的时间-频率内容,通过数值实验和理论分析验证其有效性。

详情
AI中文摘要

本文提出了一种新方法,用于确定由多个振荡成分组成的时变信号的时间-频率内容,其中每个成分具有时间变化的振幅和瞬时频率。数值实验以及理论分析被呈现以评估其有效性。

英文摘要

A new method is proposed to determine the time-frequency content of time-dependent signals consisting of multiple oscillatory components, with time-varying amplitudes and instantaneous frequencies. Numerical experiments as well as a theoretical analysis are presented to assess its effectiveness.

1509.04613 2026-06-04 stat.CO cs.NA math.NA math.PR

Gaussian process surrogates for failure detection: a Bayesian experimental design approach

高斯过程代理用于故障检测:一种贝叶斯实验设计方法

Hongqiao Wang, Guang Lin, Jinglai Li

AI总结 本文提出利用高斯过程代理进行故障检测和概率估计,针对高成本计算机模型,采用贝叶斯实验设计优化采样点,提高故障边界推断效率与准确性。

详情
AI中文摘要

不确定性量化的重要任务是识别由各种不确定性源引起的不利事件(特别是系统故障)的概率。本文考虑构建高斯过程代理用于故障检测和故障概率估计。特别地,我们考虑底层计算机模型极其昂贵的情况,在此设置下,确定状态空间中的采样点至关重要。我们将问题表述为贝叶斯推断极限状态(即故障边界)的最优实验设计,并提出高效的数值方案来解决由此产生的优化问题。特别是,所提出的极限状态推断方法能够一次确定多个采样点,因此适用于可以并行进行多个计算机模拟的问题。所提出方法的准确性和性能通过学术和实际例子得到验证。

英文摘要

An important task of uncertainty quantification is to identify the probability of undesired events, in particular, system failures, caused by various sources of uncertainties. In this work we consider the construction of Gaussian process surrogates for failure detection and failure probability estimation. In particular, we consider the situation that the underlying computer models are extremely expensive, and in this setting, determining the sampling points in the state space is of essential importance. We formulate the problem as an optimal experimental design for Bayesian inferences of the limit state (i.e., the failure boundary) and propose an efficient numerical scheme to solve the resulting optimization problem. In particular, the proposed limit-state inference method is capable of determining multiple sampling points at a time, and thus it is well suited for problems where multiple computer simulations can be performed in parallel. The accuracy and performance of the proposed method are demonstrated by both academic and practical examples.

1508.05214 2026-06-04 stat.ME cs.NA math.NA

IGS: an IsoGeometric approach for Smoothing on surfaces

IGS:用于曲面平滑的等几何方法

Matthieu Wilhelm, Luca Dedè, Laura M. Sangalli, Pierre Wilhelm

AI总结 本文提出了一种基于等几何分析的曲面平滑方法,通过求解四阶偏微分方程,从带噪声的离散测量数据中估计曲面上的函数,并应用于SOAR航天飞机翼梢小翼上的压力系数和气动力估计。

详情
AI中文摘要

我们提出了一种用于曲面平滑的等几何方法,即从带噪声的离散测量中估计函数。更具体地说,我们旨在估计位于由NURBS表示的曲面上的函数,NURBS是工业应用中常用的几何表示。估计基于最小化一个惩罚最小二乘泛函,这等价于求解一个四阶偏微分方程(PDE)。在此背景下,我们使用等几何分析(IGA)对此类曲面PDE进行数值近似,从而得到一种用于拟合曲面上空间分布数据的等几何平滑(IGS)方法。事实上,IGA有助于在分析中封装曲面的精确几何表示,同时允许使用至少全局C^1连续的NURBS基函数,借助这些基函数可以用标准伽辽金方法求解四阶PDE。我们通过数值模拟展示了所提IGS方法的性能,并将其应用于SOAR航天飞机翼梢小翼上的压力系数及相关气动力的估计。

英文摘要

We propose an Isogeometric approach for smoothing on surfaces, namely estimating a function starting from noisy and discrete measurements. More precisely, we aim at estimating functions lying on a surface represented by NURBS, which are geometrical representations commonly used in industrial applications. The estimation is based on the minimization of a penalized least-square functional. The latter is equivalent to solving a 4th-order Partial Differential Equation (PDE). In this context, we use Isogeometric Analysis (IGA) for the numerical approximation of such surface PDE, leading to an IsoGeometric Smoothing (IGS) method for fitting data spatially distributed on a surface. Indeed, IGA facilitates encapsulating the exact geometrical representation of the surface in the analysis and also allows the use of at least globally $C^1$-continuous NURBS basis functions for which the 4th-order PDE can be solved using the standard Galerkin method. We show the performance of the proposed IGS method by means of numerical simulations and we apply it to the estimation of the pressure coefficient, and associated aerodynamic force on a winglet of the SOAR space shuttle.

1507.04727 2026-06-04 cs.NE cs.SY eess.SY math.OC stat.AP stat.CO

Recursive Sparse Point Process Regression with Application to Spectrotemporal Receptive Field Plasticity Analysis

递归稀疏点过程回归及其在频谱时间感受野可塑性分析中的应用

Alireza Sheikhattar, Jonathan B. Fritz, Shihab A. Shamma, Behtash Babadi

AI总结 本文提出递归稀疏点过程回归方法,通过引入遗忘因子和ℓ1正则化,实现在线估计稀疏时变参数向量,用于分析听觉神经元的频谱时间感受野可塑性。

详情
AI中文摘要

我们考虑以在线方式估计点过程模型的稀疏时变参数向量的问题,其中观测和输入分别由二值时间序列和连续时间序列组成。我们通过在点过程对数似然中引入遗忘因子机制来构建新的目标函数以实现自适应性,并利用ℓ1正则化捕捉稀疏性。我们对目标函数的最大化点进行了严格分析,将压缩感知的保证扩展到我们的设置。我们基于近端优化技术构建了两个用于在线估计参数向量的递归滤波器,以及一个用于递归计算统计置信区域的新滤波器。模拟研究显示,我们的算法在跟踪能力、拟合优度和均方误差方面优于若干现有点过程滤波器。最后,我们将滤波算法应用于在点击率辨别任务的注意行为期间从雪貂初级听觉皮层实验记录的尖峰数据。我们的分析为听觉神经元频谱时间感受野可塑性的时间进程提供了新的见解。

英文摘要

We consider the problem of estimating the sparse time-varying parameter vectors of a point process model in an online fashion, where the observations and inputs respectively consist of binary and continuous time series. We construct a novel objective function by incorporating a forgetting factor mechanism into the point process log-likelihood to enforce adaptivity and employ $\ell_1$-regularization to capture the sparsity. We provide a rigorous analysis of the maximizers of the objective function, which extends the guarantees of compressed sensing to our setting. We construct two recursive filters for online estimation of the parameter vectors based on proximal optimization techniques, as well as a novel filter for recursive computation of statistical confidence regions. Simulation studies reveal that our algorithms outperform several existing point process filters in terms of trackability, goodness-of-fit and mean square error. We finally apply our filtering algorithms to experimentally recorded spiking data from the ferret primary auditory cortex during attentive behavior in a click rate discrimination task. Our analysis provides new insights into the time-course of the spectrotemporal receptive field plasticity of the auditory neurons.

1509.03917 2026-06-04 stat.ML cs.DS cs.IT cs.LG cs.NA math.IT math.NA math.OC

Dropping Convexity for Faster Semi-definite Optimization

放弃凸性以加快半定规划优化

Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi

AI总结 本文研究了在半正定矩阵集上最小化凸函数的问题,通过因子梯度下降法(FGD)在非凸情况下实现更快收敛,提供了步长选择规则和初始化方法,适用于一般凸函数的收敛性保证。

详情
Comments
40 pages
AI中文摘要

我们研究在n×n半正定矩阵集上最小化凸函数f(X)的问题,并将其改写为min_U g(U) := f(UU^T),其中U∈R^{n×r}且r≤n。我们在对原函数f的标准假设下研究了梯度下降在g上的性能,并称之为因子梯度下降(FGD)。我们提供了一个选择步长的规则,并证明在该选择下,FGD的局部收敛率与标准梯度下降在原始f上的收敛率相同:即经过k步后,对于光滑的f误差为O(1/k),当f是(受限)强凸时,误差关于k呈指数级减小。此外,我们提供了一种初始化FGD的程序,适用于(受限)强凸目标函数以及只能通过一阶oracle访问f的情形;对于多个问题实例,这种适当的初始化可带来全局收敛保证。FGD和类似程序在实践中广泛用于可表述为矩阵分解的问题。据我们所知,这是首篇在标准凸性假设下为一般凸函数提供精确收敛率保证的论文。

英文摘要

We study the minimization of a convex function $f(X)$ over the set of $n\times n$ positive semi-definite matrices, but when the problem is recast as $\min_U g(U) := f(UU^\top)$, with $U \in \mathbb{R}^{n \times r}$ and $r \leq n$. We study the performance of gradient descent on $g$---which we refer to as Factored Gradient Descent (FGD)---under standard assumptions on the original function $f$. We provide a rule for selecting the step size and, with this choice, show that the local convergence rate of FGD mirrors that of standard gradient descent on the original $f$: i.e., after $k$ steps, the error is $O(1/k)$ for smooth $f$, and exponentially small in $k$ when $f$ is (restricted) strongly convex. In addition, we provide a procedure to initialize FGD for (restricted) strongly convex objectives and when one only has access to $f$ via a first-order oracle; for several problem instances, such proper initialization leads to global convergence guarantees. FGD and similar procedures are widely used in practice for problems that can be posed as matrix factorization. To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.
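The recasting $g(U) = f(UU^\top)$ described above has gradient $\nabla g(U) = (\nabla f(X) + \nabla f(X)^\top)\,U$ with $X = UU^\top$, by the chain rule. The following is a minimal sketch of gradient descent on the factor, assuming a fixed step size chosen by hand; the paper's step-size rule and initialization procedure are not reproduced. The helper names and the example objective $f(X) = \tfrac12\|X - M\|_F^2$ are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def factored_gradient_descent(grad_f, U0, step, n_iters):
    """Gradient descent on g(U) = f(U U^T) with a fixed, hand-picked step."""
    U = [row[:] for row in U0]
    n = len(U)
    for _ in range(n_iters):
        X = matmul(U, transpose(U))
        G = grad_f(X)
        # Chain rule: grad g(U) = (grad f(X) + grad f(X)^T) U
        G_sym = [[G[i][j] + G[j][i] for j in range(n)] for i in range(n)]
        D = matmul(G_sym, U)
        U = [[u - step * d for u, d in zip(u_row, d_row)] for u_row, d_row in zip(U, D)]
    return U


# Example (assumed): recover a rank-1 PSD matrix M = v v^T with v = (1, 2).
M = [[1.0, 2.0], [2.0, 4.0]]
grad_f = lambda X: [[x - m for x, m in zip(xr, mr)] for xr, mr in zip(X, M)]
U = factored_gradient_descent(grad_f, [[0.5], [0.5]], step=0.02, n_iters=1000)
X_hat = matmul(U, transpose(U))
```

Working with the $n \times r$ factor avoids any projection onto the PSD cone, since $UU^\top$ is positive semi-definite by construction; the price is that $g$ is non-convex in $U$ even when $f$ is convex in $X$.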

1508.02087 2026-06-04 math.OC cs.LG cs.NA math.NA stat.CO stat.ML

A Linearly-Convergent Stochastic L-BFGS Algorithm

一种线性收敛的随机L-BFGS算法

Philipp Moritz, Robert Nishihara, Michael I. Jordan

AI总结 本文提出一种新的随机L-BFGS算法,证明了其在强凸和光滑函数上的线性收敛性,并展示了其在大规模凸优化问题中的高效性能。

详情
Comments
10 pages, 3 figures in International Conference on Artificial Intelligence and Statistics, 2016
AI中文摘要

我们提出了一种新的随机L-BFGS算法,并证明了其在强凸和光滑函数上的线性收敛性。我们的算法借鉴了Byrd等人(2014)最近提出的随机L-BFGS变体以及Johnson和Zhang(2013)最近提出的随机梯度下降方差减少方法。我们通过实验表明,该算法在大规模凸和非凸优化问题中表现良好,展现出线性收敛性和快速求解能力。此外,我们还展示了该算法在广泛步长范围内的良好表现,步长通常相差几个数量级。

英文摘要

We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and Zhang (2013). We demonstrate experimentally that our algorithm performs well on large-scale convex and non-convex optimization problems, exhibiting linear convergence and rapidly solving the optimization problems to high levels of precision. Furthermore, we show that our algorithm performs well for a wide range of step sizes, often differing by several orders of magnitude.

1604.03912 2026-06-04 cs.AI cs.LG cs.SY eess.SY stat.ML

Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics

逆强化学习与奖励和动态的同时估计

Michael Herman, Tobias Gindele, Jörg Wagner, Felix Schmitt, Wolfram Burgard

AI总结 本文提出一种基于梯度的逆强化学习方法,同时估计系统动态和奖励函数,提升了样本效率和估计准确性。

详情
Comments
accepted to appear in AISTATS 2016
AI中文摘要

逆强化学习(IRL)描述了从观察到的智能体行为中学习未知马尔可夫决策过程(MDP)奖励函数的问题。由于智能体的行为源于其策略,而MDP策略依赖于随机系统动态和奖励函数,逆问题的解决方案受到两者显著影响。当前的IRL方法假设如果转移模型未知,可以获取额外的系统动态样本,或观察行为提供足够的系统动态样本以准确求解逆问题。这些假设往往不成立。为克服这一问题,我们提出了一种基于梯度的IRL方法,同时估计系统的动态。通过求解联合优化问题,我们的方法考虑了演示的偏差,这种偏差源于生成策略。在合成MDP和迁移学习任务上的评估显示,该方法在样本效率以及估计的奖励函数和转移模型的准确性方面有所改进。

英文摘要

Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from observed behavior of an agent. Since the agent's behavior originates in its policy and MDP policies depend on both the stochastic system dynamics as well as the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that if the transition model is unknown, additional samples from the system's dynamics are accessible, or the observed behavior provides enough samples of the system's dynamics to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system's dynamics. By solving the combined optimization problem, our approach takes into account the bias of the demonstrations, which stems from the generating policy. The evaluation on a synthetic MDP and a transfer learning task shows improvements regarding the sample efficiency as well as the accuracy of the estimated reward functions and transition models.

1508.03283 2026-06-04 math.NA cs.NA math.ST stat.CO stat.TH

An adaptive independence sampler MCMC algorithm for infinite dimensional Bayesian inferences

一种自适应独立采样MCMC算法用于无限维贝叶斯推断

Zhe Feng, Jinglai Li

AI总结 本文提出一种自适应独立采样MCMC算法,用于处理无限维贝叶斯推断问题,通过混合有限个特殊参数化的高斯测度作为建议分布,实现维度无关的高效采样,且在多模态后验分布中表现稳健。

详情
AI中文摘要

许多科学和工程问题需要在函数空间中进行贝叶斯推断,其中未知参数为无限维。在这些问题中,许多标准马尔可夫链蒙特卡罗(MCMC)算法在网格细化时变得任意缓慢,这被称为维度依赖性。本文开发了一种基于独立采样的MCMC方法用于无限维贝叶斯推断。我们表示建议分布为有限个特殊参数化的高斯测度的混合。我们证明在所选参数化下,所得MCMC算法是维度无关的。我们还设计了高效的自适应算法来调整混合物的参数值,基于之前的样本。最后,我们提供了数值示例,以展示所提出方法的效率和鲁棒性,即使对于具有多模态后验分布的问题也是如此。

英文摘要

Many scientific and engineering problems require performing Bayesian inference in function spaces, in which the unknowns are of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrarily slow under mesh refinement, which is referred to as being dimension dependent. In this work we develop an independence sampler based MCMC method for infinite dimensional Bayesian inference. We represent the proposal distribution as a mixture of a finite number of specially parametrized Gaussian measures. We show that under the chosen parametrization, the resulting MCMC algorithm is dimension independent. We also design an efficient adaptive algorithm to adjust the parameter values of the mixtures from the previous samples. Finally we provide numerical examples to demonstrate the efficiency and robustness of the proposed method, even for problems with multimodal posterior distributions.

1501.06929 2026-06-04 stat.ML cs.SY eess.SY stat.AP

A Probabilistic Least-Mean-Squares Filter

基于概率的最小均方滤波器

Jesus Fernandez-Bes, Víctor Elvira, Steven Van Vaerenbergh

AI总结 本文提出一种概率方法改进LMS滤波器,通过高效近似实现可变步长算法并提供估计不确定性度量,保持线性复杂度,实验显示优于传统LMS和同类先进算法。

详情
AI中文摘要

我们介绍了一种概率方法用于LMS滤波器。通过高效的近似方法,该方法提供了一种可变步长的LMS算法,并提供了关于估计的不确定性度量。此外,所提出的近似方法保持了标准LMS的线性复杂度。数值结果表明,该算法在性能上优于标准LMS和具有类似复杂度的先进算法。因此,本文的目标是为将更多的贝叶斯机器学习技术引入自适应滤波领域打开大门。

英文摘要

We introduce a probabilistic approach to the LMS filter. By means of an efficient approximation, this approach provides an adaptable step-size LMS algorithm together with a measure of uncertainty about the estimation. In addition, the proposed approximation preserves the linear complexity of the standard LMS. Numerical results show the improved performance of the algorithm with respect to standard LMS and state-of-the-art algorithms with similar complexity. The goal of this work, therefore, is to open the door to bring some more Bayesian machine learning techniques to adaptive filtering.
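For reference, the standard LMS filter that the abstract takes as its baseline updates the weights by `w <- w + mu * e * x`, where `e` is the prediction error. The sketch below shows only this classical fixed-step update on an assumed system-identification task; it is not the probabilistic, adaptable step-size variant proposed in the paper. The function name, the step size `mu = 0.05`, and the synthetic data are illustrative assumptions.

```python
import random


def lms_identify(inputs, desired, n_taps, mu):
    """Classical fixed-step LMS; returns final weights and the error sequence."""
    w = [0.0] * n_taps
    errors = []
    for x, d in zip(inputs, desired):
        y = sum(wi * xi for wi, xi in zip(w, x))  # filter output
        e = d - y                                  # a priori error
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
        errors.append(e)
    return w, errors


# Assumed example: identify a noiseless 2-tap linear system from random inputs.
rng = random.Random(0)
w_true = [0.5, -0.3]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2000)]
desired = [sum(wt * xi for wt, xi in zip(w_true, x)) for x in inputs]
w_hat, errors = lms_identify(inputs, desired, n_taps=2, mu=0.05)
```

Each update costs a number of operations linear in the number of taps, which is the complexity the abstract says the proposed approximation preserves.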

1511.05838 2026-06-04 stat.CO cs.NA math.NA

On an adaptive preconditioned Crank-Nicolson MCMC algorithm for infinite dimensional Bayesian inferences

关于一种适应性预条件Crank-Nicolson MCMC算法用于无限维贝叶斯推断

Zixi Hu, Zhewei Yao, Jinglai Li

AI总结 本文提出一种适应性预条件Crank-Nicolson MCMC算法,通过调整提议分布的协方差算子提升效率,并证明其满足遍历性条件。

详情
AI中文摘要

许多科学和工程问题需要对无限维未知参数进行贝叶斯推断。在这些问题中,许多标准马尔可夫链蒙特卡罗(MCMC)算法在网格细化时变得任意缓慢,即称为维度依赖。为此,提出了一种称为预条件Crank-Nicolson(pCN)方法的维度独立MCMC算法,用于采样无限维参数。本文开发了一种pCN算法的适应性版本,其中提议分布的协方差算子根据采样历史进行调整以提高模拟效率。我们证明,在某些温和假设下,所提出的算法满足一个重要遍历性条件。最后,我们提供了数值示例以展示所提方法的性能。

英文摘要

Many scientific and engineering problems require performing Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrarily slow under mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimension-independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, were proposed to sample the infinite dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally we provide numerical examples to demonstrate the performance of the proposed method.
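As background for the abstract above, the basic (non-adaptive) pCN step proposes `v = sqrt(1 - beta^2) * u + beta * xi` with `xi` drawn from the Gaussian prior, and accepts with probability `min(1, exp(Phi(u) - Phi(v)))`, where `Phi` is the negative log-likelihood. The sketch below assumes a zero-mean prior with diagonal covariance in a finite-dimensional discretization; the adaptive covariance adjustment that is the paper's contribution is not reproduced. The function name `pcn_sample` and all parameter values are illustrative assumptions.

```python
import math
import random


def pcn_sample(phi, prior_std, n_steps, beta=0.2, seed=0):
    """Basic pCN sampler for a N(0, diag(prior_std^2)) prior and potential phi."""
    rng = random.Random(seed)
    u = [0.0] * len(prior_std)
    phi_u = phi(u)
    shrink = math.sqrt(1.0 - beta * beta)
    chain, accepted = [], 0
    for _ in range(n_steps):
        xi = [rng.gauss(0.0, s) for s in prior_std]          # draw from the prior
        v = [shrink * ui + beta * x for ui, x in zip(u, xi)]  # pCN proposal
        phi_v = phi(v)
        log_alpha = phi_u - phi_v  # the prior terms cancel in the pCN ratio
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            u, phi_u = v, phi_v
            accepted += 1
        chain.append(list(u))
    return chain, accepted / n_steps


# Assumed toy problem: one unknown, one observation y = 1 with unit noise variance.
chain, rate = pcn_sample(lambda u: 0.5 * (u[0] - 1.0) ** 2, [1.0], n_steps=5000, beta=0.5)
```

Because the acceptance ratio involves only the likelihood potential, the proposal is reversible with respect to the prior, which is why the acceptance rate does not collapse as the discretization is refined.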

1507.04331 2026-06-04 q-bio.QM cs.NA math.AG math.NA q-bio.MN stat.ME

Numerical algebraic geometry for model selection and its application to the life sciences

数值代数几何在模型选择中的应用及其在生命科学中的应用

Elizabeth Gross, Brent Davis, Kenneth L. Ho, Daniel J. Bates, Heather A. Harrington

AI总结 本文利用数值代数几何方法解决数学模型中的参数估计、验证和选择问题,通过概率一多项式连续方法计算目标函数的所有临界点,并过滤以恢复全局最优解,应用于细胞信号传导、合成生物学和流行病学等领域。

详情
Comments
References added, additional clarifications
AI中文摘要

从事数学模型研究的人员常常面临参数估计、模型验证和模型选择等相关的挑战问题。这些问题都是优化问题,由于非线性、非凸性和多个局部极值而具有挑战性。此外,当只有部分数据可用时,这些挑战会进一步加剧。本文考虑多项式模型(例如,稳态质量作用化学反应网络)并基于数值代数几何的优化方法描述其分析框架。具体来说,我们使用概率一多项式连续方法计算目标函数的所有临界点,然后过滤以恢复全局极值。我们的方法利用模型与数据之间的几何结构,并在细胞信号传导、合成生物学和流行病学中的示例中展示了其效用。

英文摘要

Researchers working with mathematical models are often confronted by the related problems of parameter estimation, model validation, and model selection. These are all optimization problems, well-known to be challenging due to non-linearity, non-convexity and multiple local optima. Furthermore, the challenges are compounded when only partial data is available. Here, we consider polynomial models (e.g., mass-action chemical reaction networks at steady state) and describe a framework for their analysis based on optimization using numerical algebraic geometry. Specifically, we use probability-one polynomial homotopy continuation methods to compute all critical points of the objective function, then filter to recover the global optima. Our approach exploits the geometric structures relating models and data, and we demonstrate its utility on examples from cell signaling, synthetic biology, and epidemiology.

1603.08355 2026-06-04 eess.SY cs.SY stat.ME

Multi-Sensor Control for Multi-Target Tracking Using Cauchy-Schwarz Divergence

基于Cauchy-Schwarz散度的多传感器控制多目标跟踪

Meng Jiang, Wei Yi, Lingjiang Kong

AI总结 本文提出基于Cauchy-Schwarz散度的多传感器控制方法,用于多目标跟踪,通过标签随机有限集在传感器网络中实现最优和次优的联合与独立决策方法。

详情
AI中文摘要

本文研究传感器网络系统中基于标签随机有限集(RFS)的多传感器控制多目标跟踪问题。基于一种信息论散度度量,即对GLMB密度具有闭式解的Cauchy-Schwarz(CS)散度,我们在广义协方差交叉(GCI)框架下提出两种新的多传感器控制方法。第一种联合决策(JDM)方法是最优的,能够取得良好的整体性能;第二种独立决策(IDM)方法是次优的,但作为一种快速实现,计算量更小。通过具有挑战性的场景下的仿真验证了两种方法的有效性。

英文摘要

The paper addresses the problem of multi-sensor control for multi-target tracking via labelled random finite sets (RFS) in the sensor network systems. Based on an information theoretic divergence measure, namely Cauchy-Schwarz (CS) divergence which admits a closed form solution for GLMB densities, we propose two novel multi-sensor control approaches in the framework of generalized Covariance Intersection (GCI). The first joint decision making (JDM) method is optimal and can achieve overall good performance, while the second independent decision making (IDM) method is suboptimal as a fast realization with smaller amount of computations. Simulation in challenging situation is presented to verify the effectiveness of the two proposed approaches.

1603.07653 2026-06-04 eess.SY cs.SY stat.AP

A Quaternion Frequency and Phasor Estimator for Three-Phase Power Distribution Networks

三相电力分配网络的四元数频率和相量估计器

Sayed Pouria Talebi, Professor Danilo P. Mandic

AI总结 本文提出基于四元数的实时频率估计方法,利用四元数特性统一表征三相电力系统,并通过HR-calculus结合四元数扩展卡尔曼滤波实现相量自适应估计,仿真验证其优于复数方法。

详情
AI中文摘要

首次将四元数用于实时频率估计,利用四元数的多维特性对三相电力系统进行完整表征。通过四元数整合三相电压测量,结合HR-calculus推导基于四元数扩展卡尔曼滤波的状态空间估计器,设计的状态空间向量组件用于系统相量自适应估计。最终通过合成和实际数据仿真验证,表明所开发的四元数频率估计器优于复数方法。

英文摘要

For the first time quaternions have been used for real-time frequency estimation, where the multi-dimensional nature of quaternions allows for the full characterization of three-phase power systems. This is achieved through the use of quaternions to provide a unified framework for incorporating voltage measurements from all the phases of a three-phase system and then employing the recently introduced HR-calculus to derive a state space estimator based on the quaternion extended Kalman filter (QEKF). The components of the state space vector are designed such that they can be deployed for adaptive estimation of the system phasors. Finally, the proposed algorithm is validated through simulations using both synthetic and real-world data, which indicate that the developed quaternion frequency estimator can outperform its complex-valued counterparts.

1503.08019 2026-06-04 cs.DS cs.SY eess.SY stat.OT

Optimality of Fast Matching Algorithms for Random Networks with Applications to Structural Controllability

快速匹配算法在随机网络中的最优性及其在结构可控性中的应用

Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

AI总结 研究快速可扩展算法在随机网络中寻找最大匹配的最优性,分析Karp和Sipser的启发式方法及其简化版本的渐近最优性,探讨最大匹配大小的渐近结果。

详情
AI中文摘要

网络控制指一类庞大而多样的问题,其中包括线性时不变动力系统的可控性,其目标是选择适当的输入将网络引导至期望状态。可控性有多种概念,其中之一是结构可控性,它与在底层网络拓扑上寻找最大匹配密切相关。本文研究了在一大类随机网络上寻找最大匹配的快速、可扩展算法。首先,我们说明就结构可控性而言,度分布随机网络是真实网络的合理模型。随后,我们分析了Karp和Sipser提出的一种流行、快速且实用的启发式方法及其一个简化版本。对这两种启发式方法,我们建立了渐近最优性,并给出了关于一大类随机网络最大匹配渐近大小的结果。

英文摘要

Network control refers to a very large and diverse set of problems including controllability of linear time-invariant dynamical systems, where the objective is to select an appropriate input to steer the network to a desired state. There are many notions of controllability, one of them being structural controllability, which is intimately connected to finding maximum matchings on the underlying network topology. In this work, we study fast, scalable algorithms for finding maximum matchings for a large class of random networks. First, we illustrate that degree distribution random networks are realistic models for real networks in terms of structural controllability. Subsequently, we analyze a popular, fast and practical heuristic due to Karp and Sipser as well as a simplification of it. For both heuristics, we establish asymptotic optimality and provide results concerning the asymptotic size of maximum matchings for an extensive class of random networks.
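The Karp and Sipser heuristic mentioned above builds a matching greedily: while some vertex has degree one, match it to its only neighbor (a step that never hurts optimality); otherwise match the endpoints of a uniformly random edge; in both cases delete the matched vertices. The following is a minimal sketch on an undirected graph given as a symmetric adjacency dictionary. It rescans the graph every round, so it does not reflect the fast running time the abstract refers to, and the function name and graph encoding are illustrative assumptions.

```python
import random


def karp_sipser(adj, seed=0):
    """Greedy matching; adj maps each vertex to its neighbors (must be symmetric)."""
    rng = random.Random(seed)
    nbrs = {v: set(ns) for v, ns in adj.items()}
    matching = []

    def remove(v):
        # Delete v and every edge incident to it.
        for w in nbrs.pop(v):
            nbrs[w].discard(v)

    while True:
        live = sorted(v for v in nbrs if nbrs[v])  # vertices that still have edges
        if not live:
            return matching
        leaves = [v for v in live if len(nbrs[v]) == 1]
        if leaves:
            u = rng.choice(leaves)                 # degree-one rule
            w = next(iter(nbrs[u]))
        else:
            edges = [(a, b) for a in live for b in sorted(nbrs[a]) if a < b]
            u, w = rng.choice(edges)               # random-edge rule
        matching.append((u, w))
        remove(u)
        remove(w)


# Assumed example: a path on six vertices, 0-1-2-3-4-5.
path6 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
matching = karp_sipser(path6)
```

On trees and forests only the degree-one rule ever fires, so the result is a maximum matching there; on general graphs the random-edge rule can make suboptimal choices, which is why the asymptotic analysis in the paper is needed.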

1603.07279 2026-06-04 stat.AP cs.NA math.NA

Spatial Global Sensitivity Analysis of High Resolution classified topographic data use in 2D urban flood modelling

高分辨率分类地形数据在二维城市洪水建模中的空间全局敏感性分析

M Abily, N. Bertrand, O Delestre, P Gourbesville, C. -M. Duluc

AI总结 本文提出了一种基于二维浅水方程的高分辨率洪水模型中的空间全局敏感性分析方法,通过Sobol指数估计生成敏感性地图,分析高分辨率地形数据输入参数对洪水模型输出的影响,揭示建模者选择的重要性及计算成本限制。

详情
Journal ref
Environmental Modelling and Software, Elsevier, 2016, 77, pp.183-195
AI中文摘要

本文介绍了一种基于二维浅水方程的高分辨率洪水模型中的空间全局敏感性分析(GSA)方法。空间GSA的目标是生成基于Sobol指数估计的敏感性地图。该方法允许对不确定的高分辨率(HR)地形数据输入参数对洪水模型输出的影响进行排序。研究了三个参数的影响:测量误差、地上元素表示的细节水平和空间离散化分辨率。为引入不确定性,应用了概率密度函数和离散空间方法生成2,000个数字高程模型(DEMs)。基于二维城市洪水河流事件建模,生成的敏感性地图突显了在使用高分辨率地形数据时,建模者选择的影响大于高分辨率数据集的精度影响,以及排名的空间变化性。亮点包括:空间GSA允许生成Sobol指数地图,增强了每个不确定参数对计算输出参数变化的相对权重;Sobol指数地图展示了在使用高分辨率地形数据进行二维水力模型建模时,建模者选择的主要影响,相对于高分辨率数据集精度的影响;该方法为建模者更好地理解其模型的局限性提供了价值;该方法的要求和限制与选择的主观性和计算成本有关。

英文摘要

This paper presents a spatial Global Sensitivity Analysis (GSA) approach in a 2D shallow water equations based High Resolution (HR) flood model. The aim of a spatial GSA is to produce sensitivity maps which are based on Sobol index estimations. Such an approach allows to rank the effects of uncertain HR topographic data input parameters on flood model output. The influence of the three following parameters has been studied: the measurement error, the level of details of above-ground elements representation and the spatial discretization resolution. To introduce uncertainty, a Probability Density Function and discrete spatial approach have been applied to generate 2,000 DEMs. Based on a 2D urban flood river event modelling, the produced sensitivity maps highlight the major influence of modeller choices compared to HR measurement errors when HR topographic data are used, and the spatial variability of the ranking.
Highlights:
• Spatial GSA allowed the production of Sobol index maps, enhancing the relative weight of each uncertain parameter on the variability of calculated output parameter of interest.
• The Sobol index maps illustrate the major influence of the modeller choices, when using the HR topographic data in 2D hydraulic models with respect to the influence of HR dataset accuracy.
• Added value is for modeller to better understand limits of his model.
• Requirements and limits for this approach are related to subjectivity of choices and to computational cost.

1603.03450 2026-06-04 stat.ME cs.SY eess.SY

Multisensor--Multitarget Bearing--Only Sensor Registration

多传感器-多目标仅测方位传感器注册

Ehsan Taghavi, R. Tharmarasa, T. Kirubarajan, Mike McDonald

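AI summary: This paper studies the modeling of offset biases in bearing-only sensors and their compensation in multitarget tracking, proposes a multisensor algorithm, and derives the Cramér-Rao lower bound to assess the theoretical accuracy of the method.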

详情
Journal ref
IEEE Transactions on Aerospace and Electronics Systems, 52 (4), 2016
AI中文摘要

仅测方位估计是目标跟踪中的基本且具有挑战性的问题。与雷达跟踪类似,存在偏移或位置偏差会加剧仅测方位估计的难度。建模各种传感器偏差并非易事,尤其在仅测方位跟踪方面文献较少。本文针对仅测方位传感器中的偏移偏差建模及其多目标跟踪中的偏差补偿问题,提出了一种基于多传感器的算法,能够有效处理监控区域内时间变化的目标数量。所提出的算法导致最大似然偏差估计器,并推导了Cramér-Rao下界以量化所提方法或任何其他算法可达到的理论精度。最后,通过不同分布式跟踪场景的仿真结果展示所提方法的能力。为展示所提方法即使在存在假警和漏检的情况下仍能工作,还展示了在集中式跟踪场景中本地传感器发送所有测量值(而非AMR或本地跟踪)的仿真结果。

英文摘要

Bearing--only estimation is one of the fundamental and challenging problems in target tracking. As in the case of radar tracking, the presence of offset or position biases can exacerbate the challenges in bearing--only estimation. Modeling various sensor biases is not a trivial task and not much has been done in the literature specifically for bearing--only tracking. This paper addresses the modeling of offset biases in bearing--only sensors and the ensuing multitarget tracking with bias compensation. Bias estimation is handled at the fusion node to which individual sensors report their local tracks in the form of associated measurement reports (AMR) or angle-only tracks. The modeling is based on a multisensor approach that can effectively handle a time--varying number of targets in the surveillance region. The proposed algorithm leads to a maximum likelihood bias estimator. The corresponding Cramér--Rao Lower Bound to quantify the theoretical accuracy that can be achieved by the proposed method or any other algorithm is also derived. Finally, simulation results on different distributed tracking scenarios are presented to demonstrate the capabilities of the proposed approach. In order to show that the proposed method can work even with false alarms and missed detections, simulation results on a centralized tracking scenario where the local sensors send all their measurements (not AMRs or local tracks) are also presented.

1510.05439 2026-06-04 math.NA cs.NA math.ST stat.TH

Efficient estimators for likelihood ratio sensitivity indices of complex stochastic dynamics

复杂随机动力学likelihood比敏感度指数的高效估计器

Georgios Arampatzis, Markos A. Katsoulakis, Luc Rey-Bellet

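AI summary: This paper proposes efficient likelihood ratio estimators for sensitivity indices of complex stochastic dynamics. Their variance is low and constant in time, which suits long-time and steady-state analysis. The method rests on a new covariance formulation that contains a Fisher information matrix and allows fast screening of insensitive parameters.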

详情
Comments
Revision of the paper. Added a new estimator
AI中文摘要

我们证明,对于复杂随机动力学的敏感度指数,中心化likelihood比估计器具有高度效率,其方差低且时间恒定,因此适用于长时间和稳态 regime 的敏感性分析。这些估计器依赖于新的likelihood比协方差公式,其中包含作为子矩阵的随机动力学的Fisher信息矩阵,也可用于快速筛选不敏感参数和参数组合。所提出的方法适用于广泛的随机动力学类别,如化学反应网络、Langevin型方程和金融随机模型,包括高维参数空间和/或不同可观察量之间不同时刻相关时间的系统。此外,它们易于实现,作为现有模拟算法中的标准可观察量,无需额外修改。

英文摘要

We demonstrate that centered likelihood ratio estimators for the sensitivity indices of complex stochastic dynamics are highly efficient with low, constant in time variance and consequently they are suitable for sensitivity analysis in long-time and steady-state regimes. These estimators rely on a new covariance formulation of the likelihood ratio that includes as a submatrix a Fisher Information Matrix for stochastic dynamics and can also be used for fast screening of insensitive parameters and parameter combinations. The proposed methods are applicable to broad classes of stochastic dynamics such as chemical reaction networks, Langevin-type equations and stochastic models in finance, including systems with a high dimensional parameter space and/or disparate decorrelation times between different observables. Furthermore, they are simple to implement as a standard observable in any existing simulation algorithms without additional modifications.

1506.00343 2026-06-04 stat.CO cs.NA math.NA

On Polynomial Chaos Expansion via Gradient-enhanced $\ell_1$-minimization

通过梯度增强的ℓ1最小化进行多项式混沌展开

Ji Peng, Jerrad Hampton, Alireza Doostan

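AI summary: This paper studies gradient-enhanced $\ell_1$-minimization for accelerating the identification of polynomial chaos expansion coefficients. Theoretical analysis and numerical experiments show that derivative information improves the stability and convergence of the solution and lowers computational cost.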

详情
AI中文摘要

梯度增强的不确定性量化(UQ)近期受到关注,其中利用Quantity of Interest(QoI)对不确定参数的导数来改进代理近似。多项式混沌展开(PCE)常用于UQ,当QoI可由稀疏PCE表示时,ℓ1最小化可利用少量样本识别PCE系数。本文研究了梯度增强的ℓ1最小化方法,其中导数信息用于加速PCE系数的识别。对于该方法,稳定性与收敛性分析缺乏,因此本文通过概率结果进行了探讨。特别是,通过适当归一化,我们证明导数信息的引入几乎必然导致更好的条件,例如与测量矩阵的空域和相干性相关,从而实现成功的解恢复。进一步,我们通过三个数值示例验证了分析:制造的PCE、具有随机输入的椭圆偏微分方程以及具有随机边界的平面泊肃叶流。这些示例均表明,引入导数信息可在降低计算成本的同时实现解恢复。

英文摘要

Gradient-enhanced Uncertainty Quantification (UQ) has received recent attention, in which the derivatives of a Quantity of Interest (QoI) with respect to the uncertain parameters are utilized to improve the surrogate approximation. Polynomial chaos expansions (PCEs) are often employed in UQ, and when the QoI can be represented by a sparse PCE, $\ell_1$-minimization can identify the PCE coefficients with a relatively small number of samples. In this work, we investigate a gradient-enhanced $\ell_1$-minimization, where derivative information is computed to accelerate the identification of the PCE coefficients. For this approach, stability and convergence analysis are lacking, and thus we address these here with a probabilistic result. In particular, with an appropriate normalization, we show the inclusion of derivative information will almost-surely lead to improved conditions, e.g. related to the null-space and coherence of the measurement matrix, for a successful solution recovery. Further, we demonstrate our analysis empirically via three numerical examples: a manufactured PCE, an elliptic partial differential equation with random inputs, and a plane Poiseuille flow with random boundaries. These examples all suggest that including derivative information admits solution recovery at reduced computational cost.

1505.05114 2026-06-04 cs.IT cs.LG cs.NA math.IT math.NA math.ST stat.ML stat.TH

Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems

求解随机二次方程组与求解线性方程组几乎同样容易

Yuxin Chen, Emmanuel J. Candes

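AI summary: This paper proposes a new method for solving random quadratic systems of equations: a spectral method supplies the initial guess, followed by minimization of a nonconvex functional. Under certain models the algorithm is proved to return the correct solution in linear time, and in the noisy setting it achieves nearly un-improvable statistical accuracy.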

详情
Comments
accepted to Communications on Pure and Applied Mathematics (CPAM)
AI中文摘要

我们考虑了在n个变量中求解二次方程组的基本问题,其中y_i = |⟨a_i, x⟩|²,i = 1, ..., m,且x ∈ ℝ^n未知。我们提出了一种新方法,从通过谱方法计算的初始猜测开始,通过类似于Wirtinger流方法的非凸函数最小化进行处理。关键特征包括不同的目标函数和新的更新规则,这些规则以自适应方式操作,并丢弃对搜索方向影响过大的项。这些精心选择的规则提供了更精确的初始猜测、更好的下降方向,从而提升了实际性能。在理论方面,我们证明对于某些无结构的二次方程组模型,我们的算法在m/n比率超过固定数值常数时可在线性时间内得到正确解。我们扩展了理论以处理噪声系统,其中我们只有y_i ≈ |⟨a_i, x⟩|²,并证明我们的算法达到几乎不可改进的统计精度。我们通过数值示例补充了理论研究,显示求解随机二次方程组在计算和统计上并不比求解相同规模的线性方程组更困难——因此本文的标题。例如,我们证明算法的计算成本大约是相同规模最小二乘问题的四倍。

英文摘要

We consider the fundamental problem of solving quadratic systems of equations in $n$ variables, where $y_i = |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$, $i = 1, \ldots, m$ and $\boldsymbol{x} \in \mathbb{R}^n$ is unknown. We propose a novel method, which starting with an initial guess computed by means of a spectral method, proceeds by minimizing a nonconvex functional as in the Wirtinger flow approach. There are several key distinguishing features, most notably, a distinct objective functional and novel update rules, which operate in an adaptive fashion and drop terms bearing too much influence on the search direction. These careful selection rules provide a tighter initial guess, better descent directions, and thus enhanced practical performance. On the theoretical side, we prove that for certain unstructured models of quadratic systems, our algorithms return the correct solution in linear time, i.e. in time proportional to reading the data $\{\boldsymbol{a}_i\}$ and $\{y_i\}$ as soon as the ratio $m/n$ between the number of equations and unknowns exceeds a fixed numerical constant. We extend the theory to deal with noisy systems in which we only have $y_i \approx |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$ and prove that our algorithms achieve a statistical accuracy, which is nearly un-improvable. We complement our theoretical study with numerical examples showing that solving random quadratic systems is both computationally and statistically not much harder than solving linear systems of the same size---hence the title of this paper. For instance, we demonstrate empirically that the computational cost of our algorithm is about four times that of solving a least-squares problem of the same size.
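The two-stage idea in this abstract (spectral initial guess, then descent on a nonconvex objective) can be sketched in a few lines. This is a plain Wirtinger-flow-style gradient descent on a least-squares objective, not the authors' algorithm: their distinct objective and adaptive truncation rules are omitted, and the problem size, step size, and iteration counts are illustrative assumptions.

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def spectral_init(a_rows, y, power_iters=200):
    """Leading eigenvector of (1/m) sum_i y_i a_i a_i^T, scaled to the estimated norm of x."""
    m, n = len(a_rows), len(a_rows[0])
    z = [1.0 / math.sqrt(n)] * n
    for _ in range(power_iters):
        # Apply the matrix to z without forming it explicitly (power iteration).
        w = [0.0] * n
        for a, yi in zip(a_rows, y):
            c = yi * dot(a, z) / m
            for k in range(n):
                w[k] += c * a[k]
        norm = math.sqrt(dot(w, w))
        z = [wk / norm for wk in w]
    scale = math.sqrt(sum(y) / m)  # E[y_i] = ||x||^2 for standard Gaussian a_i
    return [scale * zk for zk in z]

def wirtinger_flow(a_rows, y, step=0.1, iters=500):
    """Gradient descent on f(z) = (1/(4m)) sum_i (<a_i, z>^2 - y_i)^2."""
    m, n = len(a_rows), len(a_rows[0])
    z = spectral_init(a_rows, y)
    mu = step / dot(z, z)  # step size normalized by the initial squared norm
    for _ in range(iters):
        grad = [0.0] * n
        for a, yi in zip(a_rows, y):
            p = dot(a, z)
            c = (p * p - yi) * p / m
            for k in range(n):
                grad[k] += c * a[k]
        z = [zk - mu * gk for zk, gk in zip(z, grad)]
    return z

rng = random.Random(0)
n, m = 5, 400  # illustrative sizes; m/n is deliberately generous
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
a_rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [dot(a, x_true) ** 2 for a in a_rows]

x_hat = wirtinger_flow(a_rows, y)
# The sign of x is not identifiable from squared measurements.
err = min(
    math.sqrt(sum((p - q) ** 2 for p, q in zip(x_hat, x_true))),
    math.sqrt(sum((p + q) ** 2 for p, q in zip(x_hat, x_true))),
)
rel_err = err / math.sqrt(dot(x_true, x_true))
```

Each gradient evaluation costs one pass over the data, which is the sense in which such methods are comparable to iterative least-squares solvers.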

1412.7890 2026-06-04 cs.IT cs.NA math.FA math.IT math.NA math.OC math.ST stat.TH

Compressive Deconvolution in Random Mask Imaging

压缩反卷积在随机掩码成像中的应用

Sohail Bahmani, Justin Romberg

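AI summary: The paper studies signal reconstruction with randomly coded masks, proves that stable deconvolution is possible from few measurements, and shows that sparse image recovery is feasible.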

详情
Journal ref
IEEE Transactions on Computational Imaging 1(4):236--246, 2015
AI中文摘要

本文研究了从子采样卷积后的调制版本和已知滤波器中重建信号的问题。该问题应用于依赖于随机编码

英文摘要

We investigate the problem of reconstructing signals from a subsampled convolution of their modulated versions and a known filter. The problem is studied as applies to specific imaging systems relying on spatial phase modulation by randomly coded "masks." The diversity induced by the random masks is deemed to improve the conditioning of the deconvolution problem while maintaining sampling efficiency. We analyze a linear model of the system, where the joint effect of the spatial modulation, blurring, and spatial subsampling is represented by a measurement matrix. We provide a bound on the conditioning of this measurement matrix in terms of the number of masks, the dimension of the image, and certain characteristics of the blurring kernel and subsampling operator. The derived bound shows that stable deconvolution is possible with high probability even if the total number of (scalar) measurements is within a logarithmic factor of the image size. Furthermore, beyond a critical number of masks determined by the extent of blurring and subsampling, every additional mask improves the conditioning of the measurement matrix. We also consider a more interesting scenario where the target image is sparse. We show that under mild conditions on the blurring kernel, with high probability the measurement matrix is a restricted isometry when the number of masks is within a logarithmic factor of the sparsity of the image. Therefore, the image can be reconstructed using many sparse recovery algorithms such as the basis pursuit. The bound on the required number of masks is linear in sparsity of the image but it is logarithmic in its dimension. The bound provides a quantitative view of the effect of the blurring and subsampling on the required number of masks, which is critical for designing efficient imaging systems.

1603.06349 2026-06-04 stat.ME cs.SY eess.SY

Distributed Multi-Sensor Fusion Using Generalized Multi-Bernoulli Densities

分布式多传感器融合使用广义多伯努利密度

Meng Jiang, Wei Yi, Reza Hoseinnezhad, Lingjiang Kong

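AI summary: This paper proposes a distributed multi-target tracking method based on generalized multi-Bernoulli densities, improves GCI fusion through a second-order approximation, and verifies its effectiveness in a multistatic radar system.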

详情
AI中文摘要

本文提出基于广义多伯努利密度的分布式多目标跟踪方法,通过第二阶近似改进GCI融合,验证了其在多静态雷达系统中的有效性。

英文摘要

The paper addresses distributed multi-target tracking in the framework of generalized Covariance Intersection (GCI) over a multistatic radar system. The proposed method is based on the unlabeled version of the generalized labeled multi-Bernoulli (GLMB) family obtained by discarding the labels, referred to as the generalized multi-Bernoulli (GMB) family. However, GCI fusion with the GMB family does not admit a closed-form solution. To solve this challenging problem, we first propose an efficient approximation to the GMB family that preserves both the probability hypothesis density (PHD) and the cardinality distribution, named the second-order approximation of GMB (SO-GMB) density. Then, we derive an explicit expression for GCI fusion with the SO-GMB density. Finally, we compare the first-order approximation of GMB (FO-GMB) density with the SO-GMB density in two scenarios and analyze the advantages of the second-order approximation. Simulation results are presented to verify the proposed approach.

1603.06216 2026-06-04 eess.SY cs.SY stat.CO

Skew-t inference with improved covariance matrix approximation

具有改进协方差矩阵近似的斜分布推断

Henri Nurminen, Tohid Ardeshiri, Robert Piche, Fredrik Gustafsson

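AI summary: This paper proposes improved skew-t filters and smoothers that use a variational Bayes approximation to obtain a more accurate posterior covariance matrix and that outperform conventional methods in accuracy and speed.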

详情
AI中文摘要

本文提出了用于线性离散时间状态空间模型的滤波和平滑算法,其中测量噪声服从斜分布。所提出的算法通过使用均场变分贝叶斯近似将后验分布近似为斜分布似然和正态先验,改进了先前提出的滤波器和平滑器。我们的仿真显示,所提出的变分贝叶斯近似在后验协方差矩阵的近似精度上优于先前方法。此外,新的滤波器和平滑器在精度和速度上均优于先前方法和传统低复杂度替代方案。

英文摘要

Filtering and smoothing algorithms for linear discrete-time state-space models with skew-t distributed measurement noise are presented. The proposed algorithms improve upon our earlier proposed filter and smoother using the mean field variational Bayes approximation of the posterior distribution to a skew-t likelihood and normal prior. Our simulations show that the proposed variational Bayes approximation gives a more accurate approximation of the posterior covariance matrix than our earlier proposed method. Furthermore, the novel filter and smoother outperform our earlier proposed methods and conventional low complexity alternatives in accuracy and speed.

1603.04834 2026-06-04 eess.SY cs.IT cs.SY math.IT math.OC stat.AP

Mobile Beamforming & Spatially Controlled Relay Communications

移动波束成形与空间控制中继通信

Dionysios S. Kalogerias, Athina P. Petropulu

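AI summary: This paper studies stochastic motion planning in single-source single-destination robotic relay networks and proposes a two-stage stochastic programming formulation that optimizes the relay positions to maximize the expected reciprocal of the total beamforming power, with decisions based on random causal channel state information.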

详情
Comments
41st International Conference on Acoustics, Speech & Signal Processing (ICASSP 2016) Presentation Available: http://sigport.org/831
AI中文摘要

我们考虑了在合作波束成形框架下,单源单目标机器人中继网络中的随机运动规划问题。假设通信介质构成一个时空随机场,我们提出了一种两阶段随机规划方法,用于确定中继的位置,以最大化其总波束成形功率的期望倒数。随机决策基于随机因果信道状态信息。认识到原始问题的不可行性,我们提出一个下界松弛,导致一个非平凡的中继位置优化问题,等价于一个小的简单可解子问题集。我们的方法导致具有预测特性的空间控制器;在每个时间槽中,新的中继位置应使得下一个时间槽的期望功率倒数最大化。很有趣的是,松弛问题的最优控制策略是纯粹选择性的;在某种意义上,只有最佳中继应移动。

英文摘要

We consider stochastic motion planning in single-source single-destination robotic relay networks, under a cooperative beamforming framework. Assuming that the communication medium constitutes a spatiotemporal stochastic field, we propose a 2-stage stochastic programming formulation of the problem of specifying the positions of the relays, such that the expected reciprocal of their total beamforming power is maximized. Stochastic decisions are made on the basis of random causal CSI. Recognizing the intractability of the original problem, we propose a lower bound relaxation, resulting in a nontrivial optimization problem with respect to the relay locations, which is equivalent to a small set of simple, tractable subproblems. Our formulation results in spatial controllers with a predictive character; at each time slot, the new relay positions should be such that the expected power reciprocal at the next time slot is maximized. Quite interestingly, the optimal control policy for the relaxed problem is purely selective; in a certain sense, only the best relay should move.

1503.07591 2026-06-04 math.NA cs.NA stat.ME

Convex Optimization approach to signals with fast varying instantaneous frequency

用凸优化方法分析具有快速变化瞬时频率的信号

Matthieu Kowalski, Adrien Meynard, Hau-tieng Wu

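AI summary: This paper proposes a convex optimization approach to multicomponent signals with fast-varying instantaneous frequency. Based on an adaptive harmonic model, the time-frequency representation is obtained by directly minimizing a functional, solved efficiently with FISTA; numerical results support the effectiveness of the Tycoon algorithm.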

详情
AI中文摘要

受多组分快速变化瞬时频率信号分析局限的启发,本文通过优化方法解决时频分析问题。基于提出的自适应谐波模型,通过直接最小化包含信号重构和集中时频表示等性质的函数获得信号的时频表示。采用FISTA算法实现函数的高效数值近似,将该算法命名为{\it Time-frequency bY COnvex OptimizatioN} (Tycoon)。数值结果证实了Tycoon算法的潜力。

英文摘要

Motivated by the limitation of analyzing oscillatory signals composed of multiple components with fast-varying instantaneous frequency, we approach the time-frequency analysis problem by optimization. Based on the proposed adaptive harmonic model, the time-frequency representation of a signal is obtained by directly minimizing a functional, which involves a few properties that an "ideal time-frequency representation" should satisfy, for example, signal reconstruction and a concentrated time-frequency representation. FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) is applied to achieve an efficient numerical approximation of the functional. We name the algorithm {\it Time-frequency bY COnvex OptimizatioN} (Tycoon). The numerical results confirm the potential of the Tycoon algorithm.
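FISTA itself is easiest to see on a small problem. The sketch below is not the Tycoon functional: it applies FISTA to a hypothetical separable $\ell_1$-regularized least-squares problem with a diagonal operator, chosen only because its minimizer has a closed form that the iterates can be compared against. The values of `d`, `b`, and `lam` are illustrative assumptions.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fista_diag_lasso(d, b, lam, iters=2000):
    """FISTA for min_x 0.5 * sum_i (d_i x_i - b_i)^2 + lam * sum_i |x_i|."""
    lipschitz = max(di * di for di in d)  # Lipschitz constant of the smooth gradient
    x = [0.0] * len(d)
    yk = x[:]
    t = 1.0
    for _ in range(iters):
        # Gradient step on the smooth part, then the l1 proximal step.
        grad = [di * (di * yi - bi) for di, yi, bi in zip(d, yk, b)]
        x_new = [soft_threshold(yi - g / lipschitz, lam / lipschitz)
                 for yi, g in zip(yk, grad)]
        # Momentum (extrapolation) update that distinguishes FISTA from ISTA.
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        yk = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

d = [1.0, 0.5, 2.0]
b = [2.0, 2.0, 0.01]
x_fista = fista_diag_lasso(d, b, lam=0.1)
# Closed-form minimizer of this separable problem (valid for d_i > 0), for comparison.
x_exact = [soft_threshold(di * bi, 0.1) / (di * di) for di, bi in zip(d, b)]
```

The third coordinate shows the thresholding effect: its data term is too weak to overcome the penalty, so the solution is exactly zero there.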

1602.04003 2026-06-04 math.NA cs.NA stat.AP stat.ME

Bayesian smoothing of dipoles in Magneto-/Electro-encephalography

脑电图中磁/电偶极子的贝叶斯平滑

Valentina Vivaldi, Alberto Sorrentino

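AI summary: This paper proposes a new particle-filter-based method for dynamic estimation of multi-dipole states. A Monte Carlo algorithm approximates the smoothing distribution and improves source localization accuracy, which is particularly relevant for source modeling in epileptic patients.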

详情
Journal ref
Inverse Problems 32 (2016) 045007
Comments
16 pages, 5 figures
AI中文摘要

我们描述了一种新颖的方法,用于从磁/电脑电图(M/EEG)时间序列中动态估计多偶极子状态。该新方法基于最近针对M/EEG开发的粒子滤波算法;这些算法通过样本和权重近似后验分布,即在时间t给定数据到时间t的神经源分布。然而,对于离线推断目的,更倾向于使用平滑分布,即神经源在时间t条件下整个时间序列的分布。在本研究中,我们使用蒙特卡洛算法近似时间变化的电流偶极子的平滑分布。通过数值模拟,我们展示出平滑分布提供的估计比滤波分布更准确,特别是在源出现时。我们使用记录自癫痫患者的实验数据集验证了所提算法。改进的源定位可能对癫痫患者源建模特别相关,因为源出现提供了关于癫痫灶的信息。

英文摘要

We describe a novel method for dynamic estimation of multi-dipole states from Magneto/Electro-encephalography (M/EEG) time series. The new approach builds on the recent development of particle filters for M/EEG; these algorithms approximate, with samples and weights, the posterior distribution of the neural sources at time t given the data up to time t. However, for off-line inference purposes it is preferable to work with the smoothing distribution, i.e. the distribution for the neural sources at time t conditioned on the whole time series. In this study, we use a Monte Carlo algorithm to approximate the smoothing distribution for a time-varying set of current dipoles. We show, using numerical simulations, that the estimates provided by the smoothing distribution are more accurate than those provided by the filtering distribution, particularly at the appearance of the source. We validate the proposed algorithm using an experimental dataset recorded from an epileptic patient. Improved localization of the source onset can be particularly relevant in source modeling of epileptic patients, where the source onset brings information on the epileptogenic zone.

1506.09016 2026-06-04 cs.LG cs.CV cs.NA math.NA math.OC stat.ML

Online Learning to Sample

在线学习采样

Guillaume Bouchard, Théo Trouillon, Julien Perez, Adrien Gaidon

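AI summary: This paper proposes the AW-SGD algorithm, which learns the sampling strategy online to speed up online optimization, with applications to image classification, matrix factorization, and reinforcement learning.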

详情
Comments
Update: removed convergence theorem and proof as there is an error. Submitted to UAI 2016
AI中文摘要

随机梯度下降(SGD)是机器学习中用于在线优化最广泛使用的技术之一。在本工作中,我们通过适应性地学习如何在每个时间步选择最有用的训练示例来加速SGD。首先,我们证明SGD可以用于学习重要采样估计器的最佳可能采样分布。其次,我们证明SGD算法的采样分布可以通过逐步最小化梯度的方差来在线估计。所得到的算法——自适应加权SGD(AW-SGD)——维护一组用于优化的参数,以及一组用于采样学习示例的参数。我们证明AWSGD在三个不同的应用中实现了更快的收敛:(i)使用深度特征的图像分类,其中图像的采样取决于其标签,(ii)矩阵分解,其中行和列不是均匀采样的,以及(iii)强化学习,其中优化和探索策略同时被估计,其中我们的方法对应于一个off-policy梯度算法。

英文摘要

Stochastic Gradient Descent (SGD) is one of the most widely used techniques for online optimization in machine learning. In this work, we accelerate SGD by adaptively learning how to sample the most useful training examples at each time step. First, we show that SGD can be used to learn the best possible sampling distribution of an importance sampling estimator. Second, we show that the sampling distribution of an SGD algorithm can be estimated online by incrementally minimizing the variance of the gradient. The resulting algorithm, called Adaptive Weighted SGD (AW-SGD), maintains a set of parameters to optimize, as well as a set of parameters to sample learning examples. We show that AW-SGD yields faster convergence in three different applications: (i) image classification with deep features, where the sampling of images depends on their labels, (ii) matrix factorization, where rows and columns are not sampled uniformly, and (iii) reinforcement learning, where the optimized and exploration policies are estimated at the same time and our approach corresponds to an off-policy gradient algorithm.

1603.04565 2026-06-04 stat.ME cs.SY eess.SY

A Generalized Labeled Multi-Bernoulli Filter for Maneuvering Targets

一种用于机动目标的广义标记多伯努利滤波器

Yuthika Punchihewa, Ba-Ngu Vo, Ba-Tuong Vo

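AI summary: This paper proposes a generalized labeled multi-Bernoulli filter for tracking maneuvering targets that can be modeled as a jump Markov system (JMS), and validates it on linear and nonlinear examples.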

详情
AI中文摘要

一个多重机动目标系统可以被视为跳马尔可夫系统(JMS),因为目标运动可以使用不同的运动模型建模,其中特定目标在运动模型之间的转换遵循马尔可夫链概率规则。本文描述了一种用于跟踪可建模为JMS的机动目标的广义标记多伯努利(GLMB)滤波器。所提出的滤波器通过两个线性和非线性机动目标跟踪示例进行了验证。

英文摘要

A multiple maneuvering target system can be viewed as a Jump Markov System (JMS) in the sense that the target movement can be modeled using different motion models where the transition between the motion models by a particular target follows a Markov chain probability rule. This paper describes a Generalized Labelled Multi-Bernoulli (GLMB) filter for tracking maneuvering targets whose movement can be modeled via such a JMS. The proposed filter is validated with two linear and nonlinear maneuvering target tracking examples.

1510.06083 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

Regularization vs. Relaxation: A conic optimization perspective of statistical variable selection

正则化与松弛:从锥优化视角看统计变量选择

Hongbo Dong, Kun Chen, Jeff Linderoth

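AI summary: This paper examines variable selection from a conic optimization perspective, shows that the MCP and reverse Huber penalties are special cases of the perspective relaxation, solves the optimal relaxation via a semidefinite relaxation, and obtains approximate solutions with Goemans-Williamson rounding.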

详情
Comments
Also available on optimization online {http://www.optimization-online.org/DB_HTML/2015/05/4932.html}
AI中文摘要

变量选择是统计数据分析中的基本任务。稀疏诱导正则化方法同时执行变量选择和模型估计,核心问题是一个带有l0范数惩罚的二次优化问题。精确执行l0范数惩罚对大规模问题计算不可行,因此引入了近似l0范数的稀疏诱导惩罚函数。本文表明从凸松弛视角分析问题提供新见解。特别是,我们证明了流行的稀疏诱导凹惩罚函数Minimax Concave Penalty(MCP)和反Huber惩罚函数(由Pilanci等人最近提出)均可视为一种称为视角松弛的提升凸松弛的特例。最优视角松弛是一个相关的minimax问题,平衡整体凸性和对l0范数的逼近紧密性。我们证明其可通过半定松弛解决。此外,半定松弛的概率解释揭示了与组合优化中的布尔二次多面体的联系。最后,通过将l0范数惩罚问题重新表述为两级问题,其中内层为Max-Cut问题,我们的所提半定松弛可通过将内层问题替换为其由Goemans和Williamson研究的半定松弛来实现。此解释表明使用Goemans-Williamson的舍入过程可找到l0范数惩罚问题的近似解。数值实验展示了我们所提半定松弛的紧密性,以及通过Goemans-Williamson舍入找到近似解的有效性。

英文摘要

Variable selection is a fundamental task in statistical data analysis. Sparsity-inducing regularization methods are a popular class of methods that simultaneously perform variable selection and model estimation. The central problem is a quadratic optimization problem with an l0-norm penalty. Exactly enforcing the l0-norm penalty is computationally intractable for larger scale problems, so different sparsity-inducing penalty functions that approximate the l0-norm have been introduced. In this paper, we show that viewing the problem from a convex relaxation perspective offers new insights. In particular, we show that a popular sparsity-inducing concave penalty function known as the Minimax Concave Penalty (MCP), and the reverse Huber penalty derived in a recent work by Pilanci, Wainwright and Ghaoui, can both be derived as special cases of a lifted convex relaxation called the perspective relaxation. The optimal perspective relaxation is a related minimax problem that balances the overall convexity and tightness of approximation to the l0 norm. We show it can be solved by a semidefinite relaxation. Moreover, a probabilistic interpretation of the semidefinite relaxation reveals connections with the boolean quadric polytope in combinatorial optimization. Finally by reformulating the l0-norm penalized problem as a two-level problem, with the inner level being a Max-Cut problem, our proposed semidefinite relaxation can be realized by replacing the inner level problem with its semidefinite relaxation studied by Goemans and Williamson. This interpretation suggests using the Goemans-Williamson rounding procedure to find approximate solutions to the l0-norm penalized problem. Numerical experiments demonstrate the tightness of our proposed semidefinite relaxation, and the effectiveness of finding approximate solutions by Goemans-Williamson rounding.

1506.04344 2026-06-04 math.ST cs.NA math.NA math.OC stat.TH

Enhancing Sparsity of Hermite Polynomial Expansions by Iterative Rotations

通过迭代旋转增强Hermite多项式展开的稀疏性

Xiu Yang, Huan Lei, Nathan A. Baker, Guang Lin

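AI summary: This paper proposes determining linear mappings by iterative rotations to increase the sparsity of Hermite polynomial expansions, which improves the efficiency and accuracy of compressive sensing for uncertainty quantification, with applications to stochastic partial differential equations and high-dimensional problems.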

详情
AI中文摘要

压缩感知近年来已成为不确定性量化中的强大补充。本文通过线性映射识别随机变量的新基底,使得目标量的表示在新基底下更稀疏,从而提高压缩感知基于不确定性量化的效率和准确性。具体而言,我们考虑基于旋转的线性映射,这些映射通过迭代确定Hermite多项式展开。我们通过求解随机偏微分方程和高维(O(100))问题的应用,展示了新方法的有效性。

英文摘要

Compressive sensing has become a powerful addition to uncertainty quantification in recent years. This paper identifies new bases for random variables through linear mappings such that the representation of the quantity of interest is more sparse with new basis functions associated with the new random variables. This sparsity increases both the efficiency and accuracy of the compressive sensing-based uncertainty quantification method. Specifically, we consider rotation-based linear mappings which are determined iteratively for Hermite polynomial expansions. We demonstrate the effectiveness of the new method with applications in solving stochastic partial differential equations and high-dimensional ($\mathcal{O}(100)$) problems.

1603.01136 2026-06-04 stat.CO cs.NA math.NA

Multilevel Sequential Monte Carlo Samplers for Normalizing Constants

多级顺序蒙特卡洛采样器用于归一化常数

Pierre Del Moral, Ajay Jasra, Kody Law, Yan Zhou

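AI summary: This paper studies sequential Monte Carlo approximation of ratios of normalizing constants of posterior distributions, addresses the balance between Monte Carlo and discretization errors for continuum models, uses a multilevel strategy to reduce computational cost, and illustrates the theory on a permeability identification example.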

详情
AI中文摘要

本文考虑了顺序蒙特卡洛(SMC)近似用于后验分布归一化常数比的问题,这些归一化常数本质上依赖于连续模型。因此,蒙特卡洛估计误差和离散近似误差必须平衡。采用多级策略显著降低了获得给定误差水平的近似成本,相比标准估计器。考虑了两种估计器并给出了相对方差界。理论结果通过识别参数渗透率在椭圆方程中给定点观测压力的示例进行数值验证。

英文摘要

This article considers the sequential Monte Carlo (SMC) approximation of ratios of normalizing constants associated to posterior distributions which in principle rely on continuum models. Therefore, the Monte Carlo estimation error and the discrete approximation error must be balanced. A multilevel strategy is utilized to substantially reduce the cost to obtain a given error level in the approximation as compared to standard estimators. Two estimators are considered and relative variance bounds are given. The theoretical results are numerically illustrated for the example of identifying a parametrized permeability in an elliptic equation given point-wise observations of the pressure.

1602.08712 2026-06-04 math.NA cs.NA math.FA stat.ML

On the entropy numbers of the mixed smoothness function classes

关于混合光滑函数类的熵数

V. Temlyakov

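AI summary: This paper studies the behavior of the entropy numbers of classes of multivariate functions with mixed smoothness, develops a new method based on nonlinear approximation that derives upper bounds through a two-step strategy, and proves lower bounds with the volume estimates method.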

详情
AI中文摘要

本文研究了具有混合光滑性的多变量函数类的熵数行为。这个问题有长期历史,该领域仍存在一些根本性问题尚未解决。本文的主要目标是开发一种新的方法,用于证明熵数的上界。该方法基于非线性逼近的最新发展,特别是贪心逼近。该方法由以下两个步骤策略组成。第一步是获得相对于字典的最佳m项近似界。第二步是利用将熵数与最佳m项近似联系起来的一般不等式。对于下界,我们使用体积估计法,这是一种已知的强有力的证明熵数下界的常用方法。它已在许多先前论文中使用。

英文摘要

The behavior of the entropy numbers of classes of multivariate functions with mixed smoothness is studied here. This problem has a long history and some fundamental problems in the area are still open. The main goal of this paper is to develop a new method of proving upper bounds for the entropy numbers. This method is based on recent developments in nonlinear approximation, in particular, on greedy approximation. It follows a two-step strategy. In the first step we obtain bounds on the best m-term approximations with respect to a dictionary. In the second step we use general inequalities relating the entropy numbers to the best m-term approximations. For the lower bounds we use the volume estimates method, a well-known and powerful method for proving lower bounds for the entropy numbers that was used in a number of previous papers.

1602.04434 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Frequency Analysis of Temporal Graph Signals

时序图信号的频谱分析

Andreas Loukas, Damien Foucard

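AI summary: This paper introduces frequency analysis for temporal graph signals, unifying time- and graph-frequency analysis, and uses a joint temporal and graph Fourier transform to design distributed filters for interference cancellation.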

详情
Comments
5 pages, 3 figures
AI中文摘要

本文扩展了图频谱的概念,研究时序图信号的频谱特性。通过构建联合时频变换,设计分布式滤波器实现干扰消除,算法具有线性复杂度且能近似任意联合滤波目标。

英文摘要

This letter extends the concept of graph-frequency to graph signals that evolve with time. Our goal is to generalize and, in fact, unify the familiar concepts from time- and graph-frequency analysis. To this end, we study a joint temporal and graph Fourier transform (JFT) and demonstrate its attractive properties. We build on our results to create filters which act on the joint (temporal and graph) frequency domain, and show how these can be used to perform interference cancellation. The proposed algorithms are distributed, have linear complexity, and can approximate any desired joint filtering objective.

1506.01326 2026-06-04 math.NA cs.AI cs.LG cs.NA stat.CO stat.ML

Probabilistic Numerics and Uncertainty in Computations

概率数值计算与计算中的不确定性

Philipp Hennig, Michael A Osborne, Mark Girolami

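AI summary: This paper calls for probabilistic numerical methods, which improve algorithms for linear algebra, integration, optimization, and differential equations by returning uncertainties in their calculations, and highlights their value in fields such as climate science and astronomy.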

详情
Comments
Author Generated Postprint. 17 pages, 4 Figures, 1 Table
AI中文摘要

我们呼吁采用概率数值方法:即在数值任务中返回不确定性的算法,包括线性代数、积分、优化和求解微分方程。这些不确定性源于数值计算中由于时间和硬件限制导致的精度损失,对现代科学和工业至关重要。在诸如气候科学和天文学等应用中,基于大规模复杂数据的计算需求促使重新关注数值不确定性的管理。我们描述了几种经典数值方法如何自然地被解释为概率推断。然后展示概率观点如何提出新的算法,能够灵活适应应用需求,并提供改进的实证性能。我们提供了天文学和天文成像等实际科学问题中概率数值算法的实例,同时指出这些新算法存在的开放问题。最后,我们描述了概率数值方法如何为结合数值算法(如数值优化器和微分方程求解器)的计算提供一致的框架,可能允许诊断(和控制)计算中的误差源。

英文摘要

We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data has led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimisers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.

1507.06759 2026-06-04 stat.CO cs.NA math.NA stat.ML

Variational Bayesian strategies for high-dimensional, stochastic design problems

变分贝叶斯策略用于高维、随机设计问题

Phaedon-Stelios Koutsourelakis

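AI summary: This paper proposes a variational Bayesian approach to high-dimensional optimization and design problems under uncertainty, improving computational efficiency by identifying low-dimensional directions in the design space.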

详情
AI中文摘要

本文探讨了在基于模型的不确定性量化(UQ)背景下,优化/设计/控制在不确定性下的问题。此类问题的解决不仅面临通常UQ任务的困难(如每个正向模拟的高计算成本、大量随机变量),还涉及需要求解包含大量设计变量和潜在约束的非线性优化问题。我们提出了一种适用于广泛问题类别的框架,基于将问题重新表述为概率推断任务的思路。为此,我们提出了一种变分贝叶斯(VB)公式和迭代VB-期望-最大化方案,该方案还能识别设计空间中的低维方向,其中目标函数表现出最大的灵敏度。我们通过两个涉及$\mathcal{O}(10^3)$随机和设计变量的数值示例验证了所提方法的有效性。所有情况下,正向模型的计算成本在调用次数上为$\mathcal{O}(10^2)$。近似精度通过适当的信息论度量进行评估。

英文摘要

This paper is concerned with a lesser-studied problem in the context of model-based, uncertainty quantification (UQ), that of optimization/design/control under uncertainty. The solution of such problems is hindered not only by the usual difficulties encountered in UQ tasks (e.g. the high computational cost of each forward simulation, the large number of random variables) but also by the need to solve a nonlinear optimization problem involving large numbers of design variables and potentially constraints. We propose a framework that is suitable for a large class of such problems and is based on the idea of recasting them as probabilistic inference tasks. To that end, we propose a Variational Bayesian (VB) formulation and an iterative VB-Expectation-Maximization scheme that is also capable of identifying a low-dimensional set of directions in the design space, along which, the objective exhibits the largest sensitivity. We demonstrate the validity of the proposed approach in the context of two numerical examples involving $\mathcal{O}(10^3)$ random and design variables. In all cases considered the cost of the computations in terms of calls to the forward model was of the order $\mathcal{O}(10^2)$. The accuracy of the approximations provided is assessed by appropriate information-theoretic metrics.

1503.05968 2026-06-04 math.OC cs.SY eess.SY math.ST stat.TH

Geometric methods for optimal sensor design

几何方法用于最优传感器设计

M. -A. Belabbas

AI总结 本文提出通过几何方法设计最优传感器,针对卡尔曼滤波器的最优传感器特性,推导出正定算子的交换条件,并提供梯度流算法以保证收敛性,从而实现最小估计误差。

详情
AI中文摘要

观察器是从噪声传感器测量中估计动态系统状态的估计器。观察器的需求无处不在,应用于工程、生物学和经济学等领域。最常用的观察器是卡尔曼滤波器,它在噪声为加性高斯噪声时是最佳估计器。由于其性能受限于配对的传感器,因此自然寻求卡尔曼滤波器的最优传感器。然而,该问题非凸,因此多年来采用了许多即兴方法来设计传感器。本文展示了如何表征并获得卡尔曼滤波器的最优传感器。具体而言,我们展示了最优传感器必须与正定算子交换的算子。我们还提供了一种梯度流来寻找最优传感器,并证明了在广泛的应用中该梯度流收敛到唯一的极小值。该最优传感器在固定信噪比下产生最低可能的估计误差。本文所呈现的结果也适用于最优执行器设计的对偶问题。

英文摘要

An observer is an estimator of the state of a dynamical system from noisy sensor measurements. The need for observers is ubiquitous, with applications in fields ranging from engineering to biology to economics. The most widely used observer is the Kalman filter, which is known to be the optimal estimator of the state when the noise is additive and Gaussian. Because its performance is limited by the sensors to which it is paired, it is natural to seek an optimal sensor for the Kalman filter. The problem is however not convex and, as a consequence, many ad hoc methods have been used over the years to design sensors. We show in this paper how to characterize and obtain the optimal sensor for the Kalman filter. Precisely, we exhibit a positive definite operator which optimal sensors have to commute with. We furthermore provide a gradient flow to find optimal sensors, and prove the convergence of this gradient flow to the unique minimum in a broad range of applications. This optimal sensor yields the lowest possible estimation error for measurements with a fixed signal to noise ratio. The results presented here also apply to the dual problem of optimal actuator design.

1402.0635 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY

Generalization and Exploration via Randomized Value Functions

通过随机价值函数实现泛化与探索

Ian Osband, Benjamin Van Roy, Zheng Wen

AI总结 本文提出随机最小二乘价值迭代算法(RLSVI),通过线性参数化价值函数实现高效的探索与泛化,证明其在无先验知识学习中的近优性能。

详情
Comments
arXiv admin note: text overlap with arXiv:1307.4847
AI中文摘要

我们提出了随机最小二乘价值迭代(RLSVI)——一种新的强化学习算法,旨在通过线性参数化价值函数实现高效的探索与泛化。我们解释了使用玻尔兹曼或epsilon-贪婪探索的最小二乘价值迭代版本为何效率低下,并通过计算结果展示了RLSVI带来的显著效率提升。进一步,我们建立了RLSVI预期遗憾的上界,证明其在无先验知识学习中的近最优性。更广泛地说,我们的结果表明,随机价值函数为解决强化学习中的关键挑战——合成高效探索与有效泛化——提供了一种有前景的方法。

英文摘要

We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or epsilon-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency gains enjoyed by RLSVI. Further, we establish an upper bound on the expected regret of RLSVI that demonstrates near-optimality in a tabula rasa learning context. More broadly, our results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.

1511.07837 2026-06-04 math.OC cs.LG cs.NA math.NA stat.CO stat.ML

Generalized Conjugate Gradient Methods for $\ell_1$ Regularized Convex Quadratic Programming with Finite Convergence

针对ℓ₁正则化凸二次规划的广义共轭梯度方法及其有限收敛性

Zhaosong Lu, Xiaojun Chen

AI总结 本文提出了一种广义共轭梯度方法,用于求解带有ℓ₁正则化的凸二次规划问题,在有限次迭代内达到最优解。方法通过比较子梯度的分量大小选择步骤类型,并结合精确线搜索和共轭梯度子程序,具有较低的计算复杂度。

详情
Comments
36 pages, 2 tables
AI中文摘要

共轭梯度(CG)方法是求解大规模强凸二次规划(QP)的有效迭代方法。本文提出了一些广义CG(GCG)方法,用于求解带有ℓ₁正则化的(可能不强凸)QP问题,可在有限次迭代内终止于最优解。在每次迭代中,我们的方法首先确定某个卦限的一个面,然后要么沿负的投影最小范数子梯度方向进行精确线搜索,要么执行一个CG子程序,直到CG迭代点越过该面的边界,或找到目标函数在该面或其子面上的近似极小点。我们通过比较最小范数子梯度的某些分量与其余分量的大小来确定应采取哪种步骤类型。我们的有限收敛性分析利用了误差界结果和上述精确线搜索和CG子程序的一些关键性质。我们还展示了所提出的方法能够通过允许CG子程序执行的不精确性来找到问题的近似解。我们的GCG方法找到ε-最优解的总体算术运算成本对ε的依赖为O(log(1/ε)),优于依赖为O(1/√ε)的加速近端梯度方法[2,23]。此外,我们的GCG方法可以直接推广到求解盒约束凸QP,并保持有限收敛性。数值结果表明,我们的方法对于求解病态问题非常有效。

英文摘要

The conjugate gradient (CG) method is an efficient iterative method for solving large-scale strongly convex quadratic programming (QP). In this paper we propose some generalized CG (GCG) methods for solving the $\ell_1$-regularized (possibly not strongly) convex QP that terminate at an optimal solution in a finite number of iterations. At each iteration, our methods first identify a face of an orthant and then either perform an exact line search along the direction of the negative projected minimum-norm subgradient of the objective function or execute a CG subroutine that conducts a sequence of CG iterations until a CG iterate crosses the boundary of this face or an approximate minimizer of the objective function over this face or a subface is found. We determine which type of step should be taken by comparing the magnitude of some components of the minimum-norm subgradient of the objective function to that of its remaining components. Our analysis of the finite convergence of these methods makes use of an error bound result and some key properties of the aforementioned exact line search and the CG subroutine. We also show that the proposed methods are capable of finding an approximate solution of the problem by allowing some inexactness in the execution of the CG subroutine. The overall arithmetic operation cost of our GCG methods for finding an $ε$-optimal solution depends on $ε$ in $O(\log(1/ε))$, which is superior to the accelerated proximal gradient method [2,23] that depends on $ε$ in $O(1/\sqrt{ε})$. In addition, our GCG methods can be extended straightforwardly to solve box-constrained convex QP with finite convergence. Numerical results demonstrate that our methods are very favorable for solving ill-conditioned problems.
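As a rough illustration of the quantity that drives the step selection described above, the minimum-norm subgradient of an $\ell_1$-regularized quadratic can be computed componentwise. The sketch below assumes the objective has the form F(x) = 0.5 x^T A x - b^T x + tau ||x||_1; the function name and the symbols A, b and tau are illustrative choices, not taken from the paper.

```python
def min_norm_subgradient(A, b, x, tau):
    """Minimum-norm element of the subdifferential of
    F(x) = 0.5 * x^T A x - b^T x + tau * ||x||_1 (assumed form)."""
    n = len(x)
    # Gradient of the smooth quadratic part: A x - b.
    grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    g = []
    for gi, xi in zip(grad, x):
        if xi > 0:
            g.append(gi + tau)
        elif xi < 0:
            g.append(gi - tau)
        else:
            # At x_i = 0 the subdifferential is [gi - tau, gi + tau];
            # its smallest-magnitude point is the soft-threshold of gi.
            mag = max(abs(gi) - tau, 0.0)
            g.append(mag if gi > 0 else -mag)
    return g


# Hypothetical 2x2 example.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [1.0, 0.5]
g = min_norm_subgradient(A, b, [1.0, 0.0], tau=1.0)
```

A point is optimal for this objective exactly when the returned vector is zero, which is why its components are a natural guide for choosing the next step.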

1602.02523 2026-06-04 stat.ML cs.LG cs.SY eess.SY

Data-Efficient Reinforcement Learning in Continuous-State POMDPs

连续状态POMDPs中的数据高效强化学习

Rowan McAllister, Carl Edward Rasmussen

AI总结 本文提出一种抗观测噪声的数据高效强化学习算法,通过扩展PILCO算法至POMDPs,利用过滤过程提升策略评估性能,实现在Cartpole摆动任务中更优的非线性控制效果。

详情
AI中文摘要

我们提出了一种抗观测噪声的数据高效强化学习算法。我们的方法通过在策略评估过程中考虑过滤过程,将高度数据高效的PILCO算法(Deisenroth & Rasmussen, 2011)扩展至部分观测马尔可夫决策过程(POMDPs)。PILCO进行策略搜索,通过首先预测可能系统轨迹的解析分布来评估每个策略。我们还预测轨迹相对于过滤过程,从而在结合过滤器与由原始(未过滤)框架优化的策略时实现了显著更高的性能。我们的测试设置是带有传感器噪声的Cartpole摆动任务,该任务涉及非线性动态并需要非线性控制。

英文摘要

We present a data-efficient reinforcement learning algorithm resistant to observation noise. Our method extends the highly data-efficient PILCO algorithm (Deisenroth & Rasmussen, 2011) into partially observed Markov decision processes (POMDPs) by considering the filtering process during policy evaluation. PILCO conducts policy search, evaluating each policy by first predicting an analytic distribution of possible system trajectories. We additionally predict trajectories w.r.t. a filtering process, achieving significantly higher performance than combining a filter with a policy optimised by the original (unfiltered) framework. Our test setup is the cartpole swing-up task with sensor noise, which involves nonlinear dynamics and requires nonlinear control.

1406.5311 2026-06-04 math.OC cs.AI cs.LG cs.NA math.NA stat.ML

Towards A Deeper Geometric, Analytic and Algorithmic Understanding of Margins

迈向更深入的几何、分析和算法对边界的理解

Aaditya Ramdas, Javier Peña

AI总结 本文研究了矩阵A的边界条件度量,探讨了线性可行性问题的难度,通过几何、分析和算法方法扩展了边界理论,并证明了感知机收敛率与边界的关联。

详情
Journal ref
Optimization Methods and Software, Volume 31, Issue 2, Pages 377-391, 2016
Comments
18 pages, 3 figures
AI中文摘要

给定一个矩阵A,线性可行性问题(线性分类是其特例)旨在求解原问题w: A^Tw > 0或证明对偶问题的证书,即概率分布p: Ap = 0。受

英文摘要

Given a matrix $A$, a linear feasibility problem (of which linear classification is a special case) aims to find a solution to a primal problem $w: A^Tw > \textbf{0}$ or a certificate for the dual problem which is a probability distribution $p: Ap = \textbf{0}$. Inspired by the continued importance of "large-margin classifiers" in machine learning, this paper studies a condition measure of $A$ called its \textit{margin} that determines the difficulty of both the above problems. To aid geometrical intuition, we first establish new characterizations of the margin in terms of relevant balls, cones and hulls. Our second contribution is analytical, where we present generalizations of Gordan's theorem, and variants of Hoffman's theorems, both using margins. We end by proving some new results on a classical iterative scheme, the Perceptron, whose convergence rate famously depends on the margin. Our results are relevant for a deeper understanding of margin-based learning and proving convergence rates of iterative schemes, apart from providing a unifying perspective on this vast topic.
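To make the primal problem concrete, here is a minimal sketch of the classical Perceptron applied to $A^Tw > 0$. It assumes the columns of $A$ are passed as plain Python lists and normalizes them internally; the function name and the `max_updates` safeguard are illustrative. For unit-norm columns with margin $\rho > 0$, the textbook bound on the number of updates is $1/\rho^2$.

```python
def perceptron(columns, max_updates=10_000):
    """Search for w with a_i . w > 0 for every column a_i of A."""
    def normalize(a):
        norm = sum(v * v for v in a) ** 0.5
        return [v / norm for v in a]

    cols = [normalize(a) for a in columns]
    w = [0.0] * len(cols[0])
    updates = 0
    while updates < max_updates:
        # Pick any column whose constraint is currently violated.
        violated = next(
            (a for a in cols if sum(ai * wi for ai, wi in zip(a, w)) <= 0),
            None,
        )
        if violated is None:
            return w, updates
        # Perceptron update: move w toward the violated column.
        w = [wi + ai for wi, ai in zip(w, violated)]
        updates += 1
    raise RuntimeError("no feasible w found within max_updates")


# Hypothetical feasible instance: three columns in the plane.
columns = [[1.0, 0.2], [0.8, 1.0], [1.0, -0.5]]
w, updates = perceptron(columns)
```

When the primal problem is infeasible the loop never terminates on its own, which is why the sketch caps the number of updates instead of producing the dual certificate.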

1506.00438 2026-06-04 cs.LG cs.DM cs.SY eess.SY stat.ME

Network Topology Identification using PCA and its Graph Theoretic Interpretations

利用PCA进行网络拓扑识别及其图论解释

Aravind Rajeswaran, Shankar Narasimhan

AI总结 本文通过PCA估计线性关系,利用f-cut集和f-环路实现网络拓扑识别,展示了从稳态数据中识别网络结构的方法及图论意义。

详情
Comments
Structure of paper is changed to improve presentation. Methods and results are unchanged. A more detailed literature survey has been added
AI中文摘要

我们解决了从稳态网络测量中识别(重建)网络拓扑的问题。具体来说,给定一个数据矩阵X,其中X_{ij}对应配置(稳态)j中边i的流量,我们希望找到一个网络结构,使得所有节点的流量守恒成立。这模型了许多涉及守恒量的网络问题,如水、电力和代谢网络。我们证明了识别等同于学习一个模型A_n,该模型捕捉了X中不同变量之间的近似线性关系(即形式为A_n X ≈ 0),使得A_n满秩(最高可能)且与网络节点-边 incidence 结构一致。该问题通过一系列步骤解决,包括使用PCA估计近似线性关系、从这些近似关系中获得f-cut集,以及从f-cut集(或等价地f-环路)中实现图结构。每一步和整个过程都是多项式时间。该方法通过识别水分布网络的拓扑结构进行示例说明。我们还研究了从稳态数据中识别的可识别性范围。

英文摘要

We solve the problem of identifying (reconstructing) network topology from steady state network measurements. Concretely, given only a data matrix $\mathbf{X}$ where the $X_{ij}$ entry corresponds to flow in edge $i$ in configuration (steady-state) $j$, we wish to find a network structure for which flow conservation is obeyed at all the nodes. This models many network problems involving conserved quantities like water, power, and metabolic networks. We show that identification is equivalent to learning a model $\mathbf{A_n}$ which captures the approximate linear relationships between the different variables comprising $\mathbf{X}$ (i.e. of the form $\mathbf{A_n X \approx 0}$) such that $\mathbf{A_n}$ is full rank (highest possible) and consistent with a network node-edge incidence structure. The problem is solved through a sequence of steps like estimating approximate linear relationships using Principal Component Analysis, obtaining f-cut-sets from these approximate relationships, and graph realization from f-cut-sets (or equivalently f-circuits). Each step and the overall process is polynomial time. The method is illustrated by identifying topology of a water distribution network. We also study the extent of identifiability from steady-state data.

1404.6853 2026-06-04 stat.ML cs.NA math.NA math.OC math.PR

One-bit compressive sensing with norm estimation

一位压缩感知与范数估计

Karin Knudson, Rayan Saab, Rachel Ward

AI总结 本文研究了一位压缩感知中通过量化仿射测量进行范数估计的问题,提出通过适当选择偏移量实现范数恢复,并展示如何通过单次逆高斯误差函数评估来估计任意固定向量的范数。

详情
Comments
20 pages, 2 figures
AI中文摘要

考虑从量化线性测量中恢复未知信号x的问题。在一位压缩感知设置中,通常假设x是稀疏的,且测量形式为sign(⟨a_i,x⟩)∈{±1}。由于此类测量不提供x范数的信息,恢复方法通常假设‖x‖_2=1。本文表明,如果允许更一般的量化仿射测量形式sign(⟨a_i,x⟩+b_i),并且向量a_i是随机的,适当选择偏移量b_i可将范数恢复纳入现有的一位压缩感知方法中。此外,本文还表明,对于环形区域r≤‖x‖_2≤R中任意固定的x,可由m≳R^4 r^{-2} δ^{-2}个这样的二进制测量,通过单次逆高斯误差函数求值,将范数‖x‖_2估计到加性误差δ以内。最后,所有恢复保证都可以对稀疏向量具有普适性,即以高概率,同一组测量和阈值可成功估计已知半径的欧几里得球内的所有稀疏向量x。

英文摘要

Consider the recovery of an unknown signal ${x}$ from quantized linear measurements. In the one-bit compressive sensing setting, one typically assumes that ${x}$ is sparse, and that the measurements are of the form $\operatorname{sign}(\langle {a}_i, {x} \rangle) \in \{\pm1\}$. Since such measurements give no information on the norm of ${x}$, recovery methods from such measurements typically assume that $\| {x} \|_2=1$. We show that if one allows more generally for quantized affine measurements of the form $\operatorname{sign}(\langle {a}_i, {x} \rangle + b_i)$, and if the vectors ${a}_i$ are random, an appropriate choice of the affine shifts $b_i$ allows norm recovery to be easily incorporated into existing methods for one-bit compressive sensing. Additionally, we show that for arbitrary fixed ${x}$ in the annulus $r \leq \| {x} \|_2 \leq R$, one may estimate the norm $\| {x} \|_2$ up to additive error $δ$ from $m \gtrsim R^4 r^{-2} δ^{-2}$ such binary measurements through a single evaluation of the inverse Gaussian error function. Finally, all of our recovery guarantees can be made universal over sparse vectors, in the sense that with high probability, one set of measurements and thresholds can successfully estimate all sparse vectors ${x}$ within a Euclidean ball of known radius.
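The norm-estimation idea can be sketched in a few lines under simplifying assumptions: standard Gaussian vectors $a_i$ and one constant shift $b_i = \tau$ for every measurement. Then $\langle a_i, x \rangle \sim N(0, \|x\|_2^2)$, so the fraction of $+1$ bits estimates $\Phi(\tau / \|x\|_2)$, and a single inverse normal CDF evaluation recovers the norm. The constant shift, the function name and the parameter values are assumptions for illustration; the paper's choice of shifts may differ.

```python
import random
from statistics import NormalDist


def estimate_norm(x, tau, m, seed=0):
    """Estimate ||x||_2 from m one-bit measurements sign(<a_i, x> + tau)."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in x]
        if sum(ai * xi for ai, xi in zip(a, x)) + tau > 0:
            positives += 1
    p_hat = positives / m
    # <a, x> ~ N(0, ||x||^2), hence P(bit = +1) = Phi(tau / ||x||).
    # Inverting the normal CDF once gives the norm estimate.
    return tau / NormalDist().inv_cdf(p_hat)


x = [1.2, -1.6, 0.0]  # true Euclidean norm is 2.0
norm_hat = estimate_norm(x, tau=2.0, m=20_000)
```

The estimate degrades when the shift is far from the scale of the norm, since the bits are then almost all equal; this is consistent with the abstract's requirement that the norm lie in a known annulus.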

1601.04251 2026-06-04 eess.SY cs.LG cs.SY stat.AP stat.ML

On-line Bayesian System Identification

在线贝叶斯系统辨识

Diego Romeres, Giulia Prando, Gianluigi Pillonetto, Alessandro Chiuso

AI总结 本文提出一种在线贝叶斯系统辨识方法,通过边际似然最大化更新超参数,仅需一次迭代优化算法,实验验证其有效性。

详情
AI中文摘要

我们考虑一种在线系统辨识设置,在给定的时间步骤中新数据逐步出现。为满足实时估计要求,我们提出一种定制的贝叶斯系统辨识程序,其中超参数仍通过边际似然最大化更新,但仅需一次合适的迭代优化算法迭代。考虑了梯度方法和EM算法用于边际似然优化。我们比较了这种'1步'程序与标准程序,后者优化方法运行直至收敛到局部最小值。我们进行的实验确认了所提方法的有效性。

英文摘要

We consider an on-line system identification setting, in which new data become available at given time steps. In order to meet real-time estimation requirements, we propose a tailored Bayesian system identification procedure, in which the hyper-parameters are still updated through Marginal Likelihood maximization, but after only one iteration of a suitable iterative optimization algorithm. Both gradient methods and the EM algorithm are considered for the Marginal Likelihood optimization. We compare this "1-step" procedure with the standard one, in which the optimization method is run until convergence to a local minimum. The experiments we perform confirm the effectiveness of the approach we propose.

1601.02712 2026-06-04 cs.DS cs.ET cs.IT cs.NA math.IT math.NA math.OC stat.ML

IRLS and Slime Mold: Equivalence and Convergence

IRLS与黏菌:等价性与收敛性

Damian Straszak, Nisheeth K. Vishnoi

AI总结 本文探讨了信号处理中的IRLS算法与生物中的黏菌动力学的等价性,证明了两者为同一高维动力系统的投影,并通过阻尼IRLS算法得出收敛性与复杂度界。

详情

英文摘要

In this paper we present a connection between two dynamical systems arising in entirely different contexts: one in signal processing and the other in biology. The first is the famous Iteratively Reweighted Least Squares (IRLS) algorithm used in compressed sensing and sparse recovery while the second is the dynamics of a slime mold (Physarum polycephalum). Both of these dynamics are geared towards finding a minimum l1-norm solution in an affine subspace. Despite its simplicity the convergence of the IRLS method has been shown only for a certain regularization of it and remains an important open problem. Our first result shows that the two dynamics are projections of the same dynamical system in higher dimensions. As a consequence, and building on the recent work on Physarum dynamics, we are able to prove convergence and obtain complexity bounds for a damped version of the IRLS algorithm.
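For readers unfamiliar with IRLS, the following sketch shows the iteration in the simplest possible setting: minimizing the l1-norm subject to a single linear equation, where each weighted least-squares step has a closed form. The small constant `eps` that keeps the weights positive is an assumption of this sketch (a regularized variant), and it is not the damped scheme analyzed in the paper.

```python
def irls_l1_single_constraint(a, b, iterations=60, eps=1e-12):
    """IRLS for: minimize ||x||_1 subject to sum(a_i * x_i) = b."""
    w = [1.0] * len(a)
    x = [0.0] * len(a)
    for _ in range(iterations):
        # Weighted least squares: minimize sum(x_i^2 / w_i) s.t. a . x = b.
        # Its closed form is x_i = w_i * a_i * b / sum_j(w_j * a_j^2).
        denom = sum(wj * aj * aj for wj, aj in zip(w, a))
        x = [wi * ai * b / denom for wi, ai in zip(w, a)]
        # Reweight with the current magnitudes (plus eps to stay positive).
        w = [abs(xi) + eps for xi in x]
    return x


# Hypothetical instance: the l1-minimal solution puts all mass on the
# coordinate with the largest |a_i|, here x = (0, 0, 1).
x = irls_l1_single_constraint([1.0, 2.0, 4.0], b=4.0)
```

With several constraints the closed form is replaced by a linear solve with the matrix A W A^T, but the reweighting step is unchanged.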

1601.02640 2026-06-04 math.NA cs.NA math.OC math.ST stat.TH

High-Dimensional Stochastic Design Optimization by Adaptive-Sparse Polynomial Dimensional Decomposition

高维随机设计优化的自适应稀疏多项式维度分解

Sharif Rahman, Xuchun Ren, Vaibhav Yadav

AI总结 本文提出了一种自适应稀疏多项式维度分解方法,用于复杂系统的随机设计优化,通过结合自适应稀疏PDD近似与分数函数,提高了统计矩和可靠性分析的效率,展示了在工程问题中的应用潜力。

详情
Comments
18 pages, 2 figures, to appear in Sparse Grids and Applications--Stuttgart 2014, Lecture Notes in Computational Science and Engineering 109, edited by J. Garcke and D. Pflüger, Springer International Publishing, 2016
AI中文摘要

本文提出了一种新颖的自适应稀疏多项式维度分解(PDD)方法,用于复杂系统的随机设计优化。该方法包括对高维随机响应的自适应稀疏PDD近似,用于统计矩和可靠性分析;将自适应稀疏PDD近似与分数函数相结合,用于估计统计矩和失效概率的一阶设计灵敏度;以及标准梯度优化算法。新分析公式用于同时确定统计矩或失效概率的设计灵敏度。数学函数的数值结果表明,新方法比现有方法更具计算效率。最后,对具有79个变量的喷气发动机支架进行了随机形状优化,展示了该方法在解决实际工程问题中的强大能力。

英文摘要

This paper presents a novel adaptive-sparse polynomial dimensional decomposition (PDD) method for stochastic design optimization of complex systems. The method entails an adaptive-sparse PDD approximation of a high-dimensional stochastic response for statistical moment and reliability analyses; a novel integration of the adaptive-sparse PDD approximation and score functions for estimating the first-order design sensitivities of the statistical moments and failure probability; and standard gradient-based optimization algorithms. New analytical formulae are presented for the design sensitivities that are simultaneously determined along with the moments or the failure probability. Numerical results stemming from mathematical functions indicate that the new method provides more computationally efficient design solutions than the existing methods. Finally, stochastic shape optimization of a jet engine bracket with 79 variables was performed, demonstrating the power of the new method to tackle practical engineering problems.

1512.06789 2026-06-04 stat.ML cs.AI cs.SY eess.SY math.OC

Information-Theoretic Bounded Rationality

信息论有界理性

Pedro A. Ortega, Daniel A. Braun, Justin Dyer, Kee-Eung Kim, Naftali Tishby

AI总结 本文基于信息论提出有界理性的理论,通过自由能函数描述决策,具备控制解空间、精确蒙特卡洛规划及捕捉模型不确定性的特性,并扩展至序列决策。

详情
Comments
47 pages, 19 figures
AI中文摘要

有界理性,即在资源限制下进行决策和规划,被认为是人工智能、强化学习、计算神经科学和经济学中的重要开放问题。本文提供了一个基于信息论的有界理性的理论综述。我们为使用自由能功能作为有界理性决策的客观函数提供了概念论证。该功能具有三个关键特性:它控制了解空间的大小;它具有精确的蒙特卡洛规划器,却无需穷尽搜索;它捕捉到缺乏证据或与其他具有未知意图的智能体交互时产生的模型不确定性。我们讨论了单步决策情况,并展示如何通过等价变换扩展到序列决策。这种扩展产生了一种非常一般的决策问题类,涵盖了经典决策规则(如EXPECTIMAX和MINIMAX)作为极限情况,以及信任和风险敏感的规划。

英文摘要

Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning.
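The limit cases mentioned above can be checked numerically with a small sketch. It assumes the commonly used discrete form of the free energy, $(1/\beta)\log\sum_x q(x)e^{\beta U(x)}$, with prior $q$, utility $U$ and inverse temperature $\beta$; the function names are illustrative and the paper's exact formulation may differ.

```python
import math


def free_energy(utilities, prior, beta):
    """Certainty equivalent (1/beta) * log sum_x q(x) * exp(beta * U(x))."""
    shift = max(beta * u for u in utilities)  # log-sum-exp stabilization
    total = sum(q * math.exp(beta * u - shift) for u, q in zip(utilities, prior))
    return (shift + math.log(total)) / beta


def bounded_rational_policy(utilities, prior, beta):
    """Choice distribution p(x) proportional to q(x) * exp(beta * U(x))."""
    shift = max(beta * u for u in utilities)
    weights = [q * math.exp(beta * u - shift) for u, q in zip(utilities, prior)]
    z = sum(weights)
    return [w / z for w in weights]


utilities = [1.0, 2.0, 3.0]
prior = [1.0 / 3.0] * 3
near_expectation = free_energy(utilities, prior, beta=1e-6)  # close to the prior mean
near_maximum = free_energy(utilities, prior, beta=50.0)      # close to max U
near_minimum = free_energy(utilities, prior, beta=-50.0)     # close to min U
policy = bounded_rational_policy(utilities, prior, beta=50.0)
```

Small $|\beta|$ recovers the expected utility under the prior, large positive $\beta$ approaches maximization, and large negative $\beta$ approaches the worst case, which is how expectation, max and min nodes arise as limits.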

1507.05372 2026-06-04 stat.AP cs.NA math.NA stat.ME

When interpolation-induced reflection artifact meets time-frequency analysis

当插值引起的反射伪影与时频分析相遇

Yu-Ting Lin, Patrick Flandrin, Hau-tieng Wu

AI总结 研究探讨了在时频分析中插值方案引起的反射伪影问题,通过理论分析和麻醉深度分析实例,揭示了伪影对提取信号时变动态的影响,并提出选择合适插值方案以避免伪影的必要性。

详情
AI中文摘要

在基于时频分析提取时间动态特征时,如再分配和同步压缩变换,日益受到生物医学数据分析的关注。然而,应注意由插值方案产生的伪影,特别是当采样率不显著高于所关注振荡成分的频率时。本文提出了称为反射效应的问题,并提供了理论依据。我们还展示了在麻醉深度分析中的示例,其中存在明显但不期望的伪影。结果表明,与反射效应相关的伪影不仅在理论上存在,而且在实践中也存在。其影响在应用时频分析提取信号内部的时间变化动态时尤为显著。结论是,必须通过选择适当的插值方案来谨慎处理与反射效应相关的伪影。

英文摘要

Extracting temporal dynamical features with time-frequency analyses, such as the reassignment and synchrosqueezing transforms, is attracting growing interest in biomedical data analysis. However, care is needed regarding artifacts generated by interpolation schemes, in particular when the sampling rate is not significantly higher than the frequency of the oscillatory component of interest. In this study, we formulate the problem, called the reflection effect, and provide a theoretical justification for it. We also show examples from anesthetic depth analysis with clear but undesirable artifacts. The results show that the artifact associated with the reflection effect exists not only in theory but also in practice. Its influence is pronounced when time-frequency analyses are applied to extract the time-varying dynamics hidden inside the signal. In conclusion, the artifact associated with the reflection effect must be handled carefully by choosing a proper interpolation scheme.

1512.04468 2026-06-04 stat.CO cs.NA math.NA

A Method to Calculate the Exit Time in Stochastic Simulations

一种计算随机模拟中退出时间的方法

Basil S. Bayati

AI总结 本文提出一种基于随机变量序列和卷积定理计算随机模拟退出时间的方法,通过频域求解并转换回实域,减少随机数需求,同时分析误差和加速效果。

详情
AI中文摘要

本文提出了一种新的方法,用于计算随机模拟算法中的退出时间。该方法基于随机变量序列的添加,并利用卷积定理推导出最终分布。最终分布通过频域求解并近似,然后转换回实域,可在模拟中采样。结果是一种需要更少随机变量的经典随机模拟算法的近似方法。本文还分析了与传统随机模拟算法相比的误差和加速效果。

英文摘要

A novel method is presented to compute the exit time for the stochastic simulation algorithm. The method is based on the addition of a series of random variables and is derived using the convolution theorem. The final distribution is derived and approximated in the frequency domain. The distribution for the final time is transformed back to the real domain and can be sampled from in a simulation. The result is an approximation of the classical stochastic simulation algorithm that requires fewer random variates. An analysis of the error and speedup compared to the stochastic simulation algorithm is presented.
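The idea of replacing many random variates by one can be illustrated in a textbook special case, which is not the frequency-domain approximation of the paper: if the $k$ waiting times until exit are independent exponentials with a common rate, their sum is Gamma (Erlang) distributed and can be sampled with a single draw. The function names and parameter values below are assumptions for illustration.

```python
import random


def exit_time_stepwise(k, rate, rng):
    """Exit time as the sum of k exponential waiting times (k variates)."""
    return sum(rng.expovariate(rate) for _ in range(k))


def exit_time_single_draw(k, rate, rng):
    """Same distribution from one variate: the sum of k i.i.d. Exp(rate)
    variables is Gamma(shape=k, scale=1/rate)."""
    return rng.gammavariate(k, 1.0 / rate)


rng = random.Random(1)
n, k, rate = 20_000, 10, 2.0
mean_stepwise = sum(exit_time_stepwise(k, rate, rng) for _ in range(n)) / n
mean_single = sum(exit_time_single_draw(k, rate, rng) for _ in range(n)) / n
```

Both sample means should be near $k/\text{rate} = 5$. When the rates differ from step to step the sum no longer has this simple closed form, which is the situation where a characteristic-function treatment becomes useful.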

1512.03518 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

A Unified Approach to Error Bounds for Structured Convex Optimization Problems

结构凸优化问题误差界的一种统一方法

Zirui Zhou, Anthony Man-Cho So

AI总结 本文提出一种统一框架,用于建立结构凸优化问题的误差界,涵盖一般约束最小化问题和机器学习中的正则化损失最小化问题,并通过核范数正则化损失问题展示了新误差界的应用。

详情
Comments
32 pages
AI中文摘要

误差界是指通过残差函数将测试集向量距离给定集合的距离进行限制的不等式,已被证明在分析迭代方法的收敛速度方面非常有用。本文提出了一种新的框架,用于建立一类结构凸优化问题的误差界,其中目标函数是光滑凸函数和一般闭合正凸函数的和。此类问题不仅涵盖广泛的一般约束最小化问题,还涵盖各种正则化损失最小化公式。使用我们的框架,我们证明了现有误差界结果可以以统一和透明的方式恢复。为进一步展示我们框架的威力,我们将其应用于核范数正则化损失最小化问题,并在严格互补型正则性条件下建立了此类问题的新误差界。然后,我们通过构造一个例子来证明,在没有正则性条件的情况下,所述误差界可能失效。因此,我们得到了对Tseng提出的问题的较为完整的回答。我们相信,我们的方法将在结构凸优化问题的误差界研究中找到进一步的应用。

英文摘要

Error bounds, which refer to inequalities that bound the distance of vectors in a test set to a given set by a residual function, have proven to be extremely useful in analyzing the convergence rates of a host of iterative methods for solving optimization problems. In this paper, we present a new framework for establishing error bounds for a class of structured convex optimization problems, in which the objective function is the sum of a smooth convex function and a general closed proper convex function. Such a class encapsulates not only fairly general constrained minimization problems but also various regularized loss minimization formulations in machine learning, signal processing, and statistics. Using our framework, we show that a number of existing error bound results can be recovered in a unified and transparent manner. To further demonstrate the power of our framework, we apply it to a class of nuclear-norm regularized loss minimization problems and establish a new error bound for this class under a strict complementarity-type regularity condition. We then complement this result by constructing an example to show that the said error bound could fail to hold without the regularity condition. Consequently, we obtain a rather complete answer to a question raised by Tseng. We believe that our approach will find further applications in the study of error bounds for structured convex optimization problems.

1512.03251 2026-06-04 math.NA cs.NA stat.CO

Histogram Arithmetic under Uncertainty of Probability Density Function

概率密度函数不确定性下的直方图运算

V. N. Petrushin, E. V. Nikulchev, D. A. Korolev

AI总结 本文提出在变量分布未知时进行算术运算的方法,通过选择代数方程及系统解的概率区间,处理随机参数经验密度分布的直方图评估结果。

详情
Journal ref
Applied Mathematical Sciences 9(2015) 7043-7052
Comments
10 pages
AI中文摘要

在本文中,我们提出了一种在变量分布未知时进行算术运算的方法。该方法对算术运算的结果评估可以选取代数方程及其系统解、微分方程及其系统解的概率区间,以处理随机参数经验密度分布的直方图评估结果。

英文摘要

In this article we propose a method of performing arithmetic operations on variables with unknown distribution. The approach to evaluating the results of arithmetic operations makes it possible to select probability intervals for the solutions of algebraic equations and their systems, and of differential equations and their systems, when the empirical density distributions of the random parameters are estimated by histograms.
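One elementary instance of arithmetic on histograms is the sum of two independent variables, sketched below. It assumes both histograms use the same bin width and treats each bin as a point mass, so the result is a discrete convolution of the bin probabilities; this is a generic illustration, not the authors' procedure.

```python
def add_histograms(p, q):
    """Bin probabilities of X + Y for independent X and Y.

    p[i] and q[j] are bin probabilities on grids with a common bin width;
    result bin k collects every pair of bins with i + j == k.
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


# Two hypothetical two-bin histograms with equal mass in each bin.
total = add_histograms([0.5, 0.5], [0.5, 0.5])
```

A probability interval for the sum can then be read off from the cumulative sums of the resulting bin probabilities.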

1512.02713 2026-06-04 stat.CO cs.NA math.NA

Transformations and Hardy-Krause variation

变换与Hardy-Krause变分

Kinjal Basu, Art B. Owen

AI总结 本文研究变换τ的条件,使得f∘τ在Hardy-Krause意义下有界变分,同时探讨其对Scrambled Net采样精度的影响。

详情
AI中文摘要

利用多变量Faa di Bruno公式,本文给出了变换τ:[0,1]^m→X的条件,其中X是R^d中的闭合有界子集,使得对于所有f∈C^d(X),f∘τ在Hardy-Krause意义下有界变分。我们还给出了f∘τ足够光滑以使Scrambled Net采样达到O(n^{-3/2+ε})精度的类似条件。一些流行的对称变换到单纯形和球面被证明不满足这些条件。Fang和Wang(1993)提出的某些变换满足第一个但不满足第二个条件。我们还提供了使f∘τ足够光滑以充分利用Scrambled Net采样所有广义多项式类的单纯形变换。此外,我们还找到了R^2中Rosenblatt-Hlawka-Mück变换和重要性采样在Hardy-Krause意义下有界变分的充分条件。

英文摘要

Using a multivariable Faà di Bruno formula we give conditions on transformations $τ:[0,1]^m\to\mathcal{X}$ where $\mathcal{X}$ is a closed and bounded subset of $\mathbb{R}^d$ such that $f\circτ$ is of bounded variation in the sense of Hardy and Krause for all $f\in C^d(\mathcal{X})$. We give similar conditions for $f\circτ$ to be smooth enough for scrambled net sampling to attain $O(n^{-3/2+ε})$ accuracy. Some popular symmetric transformations to the simplex and sphere are shown to satisfy neither condition. Some other transformations due to Fang and Wang (1993) satisfy the first but not the second condition. We provide transformations for the simplex that make $f\circτ$ smooth enough to fully benefit from scrambled net sampling for all $f$ in a class of generalized polynomials. We also find sufficient conditions for the Rosenblatt-Hlawka-Mück transformation in $\mathbb{R}^2$ and for importance sampling to be of bounded variation in the sense of Hardy and Krause.

1512.02303 2026-06-04 math.NA cs.NA math.ST stat.TH

The $f$-Sensitivity Index

$f$-敏感性指数

Sharif Rahman

AI总结 本文提出了一种通用的多变量$f$-敏感性指数,基于随机响应的无条件与条件概率测度间的$f$-散度,用于全局敏感性分析。该指数适用于随机输入遵循依赖或独立概率分布的情况,并展示了多种$f$-敏感性指数的可能形式,包括互信息、平方损失互信息和Borgonovo重要性度量等。

详情
Comments
32 pages, 5 figures, accepted by SIAM/ASA Journal on Uncertainty Quantification, 2015
AI中文摘要

本文提出了一种通用的多变量$f$-敏感性指数,基于随机响应的无条件与条件概率测度间的$f$-散度,用于全局敏感性分析。与方差法的Sobol指数不同,$f$-敏感性指数适用于随机输入遵循依赖或独立概率分布的情况。由于$f$-散度类支持广泛的散度或距离度量,因此可能产生多种$f$-敏感性指数,为敏感性分析提供多样选择。常用的敏感性指数或度量,如互信息、平方损失互信息和Borgonovo的重要性度量,均被证明是所提敏感性指数的特例。新的理论结果揭示了$f$-敏感性指数的基本性质,并建立了重要的不等式。提出了三种新的近似方法,取决于如何确定随机响应的概率密度,以估计敏感性指数。四个数值例子,包括计算上昂贵的随机边值问题,展示了这些方法并解释了何时一种方法比其他方法更相关。

英文摘要

This article presents a general multivariate $f$-sensitivity index, rooted in the $f$-divergence between the unconditional and conditional probability measures of a stochastic response, for global sensitivity analysis. Unlike the variance-based Sobol index, the $f$-sensitivity index is applicable to random input following dependent as well as independent probability distributions. Since the class of $f$-divergences supports a wide variety of divergence or distance measures, a plethora of $f$-sensitivity indices are possible, affording diverse choices to sensitivity analysis. Commonly used sensitivity indices or measures, such as mutual information, squared-loss mutual information, and Borgonovo's importance measure, are shown to be special cases of the proposed sensitivity index. New theoretical results, revealing fundamental properties of the $f$-sensitivity index and establishing important inequalities, are presented. Three new approximate methods, depending on how the probability densities of a stochastic response are determined, are proposed to estimate the sensitivity index. Four numerical examples, including a computationally intensive stochastic boundary-value problem, illustrate these methods and explain when one method is more relevant than the others.
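For a discrete input and output, the construction can be sketched as the expected $f$-divergence between the conditional and unconditional output distributions. The direction of the divergence, the function names and the joint-table representation are assumptions of this sketch; with the generator $f(t) = t\log t$ it reduces to mutual information, one of the special cases named above.

```python
import math


def f_sensitivity(joint, f):
    """E_X[ sum_y p(y) * f(p(y|x) / p(y)) ] for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(row[j] for row in joint) for j in range(len(joint[0]))]
    total = 0.0
    for i, row in enumerate(joint):
        if px[i] == 0:
            continue
        for j, pxy in enumerate(row):
            if py[j] > 0:
                # Ratio p(y|x) / p(y) written as p(x, y) / (p(x) * p(y)).
                total += px[i] * py[j] * f(pxy / px[i] / py[j])
    return total


def kl_generator(t):
    """f(t) = t log t with f(0) = 0; yields mutual information."""
    return t * math.log(t) if t > 0 else 0.0


independent = f_sensitivity([[0.25, 0.25], [0.25, 0.25]], kl_generator)
dependent = f_sensitivity([[0.5, 0.0], [0.0, 0.5]], kl_generator)
```

The index is zero when the output is independent of the input and equals $\log 2$ for the perfectly dependent binary example, matching the mutual information in both cases.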

1305.0030 2026-06-04 math.NA cs.NA stat.ML

A least-squares method for sparse low rank approximation of multivariate functions

一种基于离散最小二乘的稀疏低秩多变量函数近似方法

Mathilde Chevreuil, Régis Lebrun, Anthony Nouy, Prashant Rai

AI总结 本文提出一种基于离散最小二乘的低秩近似方法,用于从随机无噪声观测中近似多变量函数,通过引入稀疏诱导正则化技术,利用交叉验证选择最优正则化参数和秩,验证该算法在少次数评估下近似多变量函数的能力。

详情
Journal ref
SIAM/ASA Journal on Uncertainty Quantification, 3(1):897--921, 2015
AI中文摘要

在本文中,我们提出了一种基于离散最小二乘的低秩近似方法,用于从随机、无噪声的观测中近似多变量函数。通过在经典低秩近似算法中引入稀疏诱导正则化技术,以利用低秩近似可能的稀疏性。稀疏低秩近似通过一种稳健的更新贪心算法构建,该算法包括使用交叉验证技术进行最优正则化参数和近似秩的选择。数值示例展示了即使在非常有限的函数评估可用时,也能近似多变量函数的能力,从而证明了所提算法在通过复杂计算模型传播不确定性时的实用性。

英文摘要

In this paper, we propose a low-rank approximation method based on discrete least-squares for the approximation of a multivariate function from random, noise-free observations. Sparsity-inducing regularization techniques are used within classical algorithms for low-rank approximation in order to exploit the possible sparsity of low-rank approximations. Sparse low-rank approximations are constructed with a robust updated greedy algorithm which includes an optimal selection of regularization parameters and approximation ranks using cross-validation techniques. Numerical examples demonstrate the capability of approximating functions of many variables even when very few function evaluations are available, thus demonstrating the value of the proposed algorithm for the propagation of uncertainties through complex computational models.

1509.08581 2026-06-04 math.OC cs.LG cs.NA math.NA stat.CO stat.ML

Optimization over Sparse Symmetric Sets via a Nonmonotone Projected Gradient Method

通过非单调投影梯度法优化稀疏对称集

Zhaosong Lu

AI总结 本文提出非单调投影梯度法用于优化稀疏对称集,引入更强的最优条件并证明其全局或局部最优性。

详情
Comments
30 pages
AI中文摘要

我们考虑在稀疏对称集上最小化Lipschitz可微函数的问题,该问题在工程和科学中有广泛应用。已知经典投影梯度(PG)方法常数步长1/L的任何聚点满足L-站定最优条件。本文引入更强的最优条件,并提出非单调投影梯度(NPG)方法,结合支持变化和坐标交换策略。证明NPG的任何聚点满足新条件且为坐标站定点。在合适假设下,其为全局或局部极小值。数值实验显示NPG在解质量上优于PG,且在速度上至少可比甚至优于PG。

英文摘要

We consider the problem of minimizing a Lipschitz differentiable function over a class of sparse symmetric sets that has wide applications in engineering and science. For this problem, it is known that any accumulation point of the classical projected gradient (PG) method with a constant stepsize $1/L$ satisfies the $L$-stationarity optimality condition that was introduced in [3]. In this paper we introduce a new optimality condition that is stronger than the $L$-stationarity optimality condition. We also propose a nonmonotone projected gradient (NPG) method for this problem by incorporating some support-changing and coordinate-swapping strategies into a projected gradient method with variable stepsizes. It is shown that any accumulation point of NPG satisfies the new optimality condition and moreover it is a coordinatewise stationary point. Under some suitable assumptions, we further show that it is a global or a local minimizer of the problem. Numerical experiments are conducted to compare the performance of PG and NPG. The computational results demonstrate that NPG has substantially better solution quality than PG, and moreover, it is at least comparable to, but sometimes can be much faster than PG in terms of speed.
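The classical PG baseline mentioned above can be sketched for the simplest feasible set, assumed here to be the plain cardinality constraint $\{x : \|x\|_0 \le s\}$, whose Euclidean projection keeps the $s$ entries of largest magnitude. The function names are illustrative, ties are broken arbitrarily, and the support-changing and coordinate-swapping strategies of NPG are not reproduced.

```python
def project_sparse(x, s):
    """Euclidean projection onto {x : ||x||_0 <= s}."""
    # Indices of the s entries with largest magnitude.
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def projected_gradient_step(x, grad, L, s):
    """One PG step with the constant stepsize 1/L, followed by projection."""
    return project_sparse([xi - gi / L for xi, gi in zip(x, grad)], s)


projected = project_sparse([3.0, -5.0, 1.0, 0.5], 2)
stepped = projected_gradient_step([0.0, 0.0, 0.0], [-1.0, -4.0, 2.0], L=2.0, s=1)
```

Because the feasible set is nonconvex, fixed points of this iteration need not be global minimizers, which motivates looking for stronger optimality conditions.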

1507.08504 2026-06-04 q-bio.CB cs.NA math.NA q-bio.QM stat.CO

A global sensitivity analysis approach for morphogenesis models

一种用于形态发生模型的全局敏感性分析方法

Sonja E. M. Boas, Maria I. Navarro Jimenez, Roeland M. H. Merks, Joke G. Blom

AI总结 本文提出一种全局敏感性分析方法,用于研究形态发生模型中单参数及参数交互对输出的影响,通过血管形态发生模型验证,识别了主导参数并为模型简化提供依据。

详情
Journal ref
Boas, S. E. M., Navarro Jimenez, M. I., Merks, R. M. H., & Blom, J. G. (2015). A global sensitivity analysis approach for morphogenesis models. BMC Systems Biology, 9(1), 85
Comments
31 pages, 7 figures
AI中文摘要

形态发生是细胞组织成特定形状和模式的发育过程。复杂多因素模型常用于研究形态发生,但理解输入不确定性与输出之间的关系困难,因此需要敏感性分析工具。本文介绍了一种全局敏感性分析方法的工作流程,用于研究形态发生模型中单参数及参数交互对输出的影响。通过已发表的血管形态发生模型进行演示,模型参数代表驱动血管生成机制的细胞特性。全局敏感性分析正确识别了模型中的主导参数,与先前研究一致。此外,分析提供了单参数和参数交互的相对影响信息。模型输出的不确定性主要由单参数引起,参数交互尽管影响较小,但提供了新的计算模型生成机制的见解。最后,分析表明模型可减少一个参数。我们提出全局敏感性分析作为研究和验证形态发生机制的替代方法。模型参数影响排名与实验数据知识的比较以及对操纵实验的验证有助于反驳模型并发现形态发生中的作用机制。该工作流程适用于所有'黑箱'模型,包括高通量体外模型,其中输出指标受一组实验扰动影响。

英文摘要

Morphogenesis is a developmental process in which cells organize into shapes and patterns. Complex, multi-factorial models are commonly used to study morphogenesis. It is difficult to understand the relation between the uncertainty in the input and the output of such `black-box' models, giving rise to the need for sensitivity analysis tools. In this paper, we introduce a workflow for a global sensitivity analysis approach to study the impact of single parameters and the interactions between them on the output of morphogenesis models. To demonstrate the workflow, we used a published, well-studied model of vascular morphogenesis. The parameters of the model represent cell properties and behaviors that drive the mechanisms of angiogenic sprouting. The global sensitivity analysis correctly identified the dominant parameters in the model, consistent with previous studies. Additionally, the analysis provides information on the relative impact of single parameters and of interactions between them. The uncertainty in the output of the model was largely caused by single parameters. The parameter interactions, although of low impact, provided new insights in the mechanisms of \emph{in silico} sprouting. Finally, the analysis indicated that the model could be reduced by one parameter. We propose global sensitivity analysis as an alternative approach to study and validate the mechanisms of morphogenesis. Comparison of the ranking of the impact of the model parameters to knowledge derived from experimental data and validation of manipulation experiments can help to falsify models and to find the operand mechanisms in morphogenesis. The workflow is applicable to all `black-box' models, including high-throughput \emph{in vitro} models in which an output measure is affected by a set of experimental perturbations.

1505.00274 2026-06-04 cs.AI cs.SY eess.SY stat.ML

Stick-Breaking Policy Learning in Dec-POMDPs

在Dec-POMDPs中采用Stick-Breaking策略的学习

Miao Liu, Christopher Amato, Xuejun Liao, Lawrence Carin, Jonathan P. How

AI总结 本文提出了一种变大小状态控制器的Dec-SBPR框架,通过Stick-Breaking先验构建局部策略,无需假设Dec-POMDP模型即可学习控制器参数,有效提升大规模问题的性能。

详情
AI中文摘要

期望最大化(EM)最近已被证明是学习大规模分布式部分可观测马尔可夫决策过程(Dec-POMDPs)中有限状态控制器(FSCs)的高效算法。然而,当前方法使用固定大小的FSCs,通常收敛于远离最优的极值。本文考虑使用可变大小的FSCs来表示每个智能体的局部策略。这些可变大小的FSCs通过Stick-Breaking先验构建,导致一种新的框架,称为去中心化Stick-Breaking策略表示(Dec-SBPR)。该方法通过变分贝叶斯算法学习控制器参数,而无需假设Dec-POMDP模型可用。Dec-SBPR在多个基准问题上的性能表明,该算法能够扩展到大规模问题,同时优于其他最先进的方法。

英文摘要

Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
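
The stick-breaking construction named in the abstract can be illustrated with a minimal Python sketch. The function name and the truncation to a finite list of break fractions are assumptions made here for illustration; the abstract does not specify how Dec-SBPR places this prior over controller nodes.

```python
def stick_breaking_weights(fractions):
    """Turn break fractions v_1..v_n in [0, 1] into weights pi_i = v_i * prod_{j<i}(1 - v_j)."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for v in fractions:
        if not 0.0 <= v <= 1.0:
            raise ValueError("each break fraction must lie in [0, 1]")
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


weights, leftover = stick_breaking_weights([0.5, 0.5, 0.5])
print(weights, leftover)
```

In a Bayesian setting the fractions would typically be random draws, for example from `random.betavariate`, so that the number of components carrying non-negligible weight is learned rather than fixed in advance.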

1501.02627 2026-06-04 math.NA cs.NA stat.CO stat.ME stat.ML

A fast numerical method for max-convolution and the application to efficient max-product inference in Bayesian networks

一种快速的数值方法用于最大卷积及其在贝叶斯网络中高效最大乘积推断中的应用

Oliver Serang

AI总结 本文提出了一种O(k log k)的快速数值方法,用于估计两个非负向量的最大卷积,从而在贝叶斯网络中实现高效的max-product推断,将Viterbi路径计算时间从O(nk²)降至O(nk log k)。

详情
Journal ref
Journal of Computational Biology. August 2015, 22(8): 770-783
AI中文摘要

依赖于随机变量之和的观测在许多领域都很常见;然而,对于一般离散分布之和,目前尚无高效的max-product推断方法(max-product推断可用于获得最大后验估计)。max-product推断的限制步骤是max-convolution问题(有时以对数变换形式呈现,称为“infimal convolution”、“min-convolution”或“tropical semiring上的卷积”),该问题目前尚无O(k log k)的方法。本文提出了一种O(k log k)的数值方法,用于估计两个非负向量(例如两个概率质量函数)的最大卷积,其中k是较长向量的长度。随后通过在卷积树上执行快速max-product推断来演示该数值最大卷积方法;卷积树是一种数据结构,在给定n个离散随机变量之和的信息时,可在O(nk log(nk) log(n))步内完成快速推断(其中每个随机变量在k个连续可能状态上有任意先验分布)。该数值最大卷积方法可应用于特定类别的隐马尔可夫模型,将计算Viterbi路径的运行时间从O(nk²)降至O(nk log k),并有望应用于全点对最短路径问题。

英文摘要

Observations depending on sums of random variables are common throughout many fields; however, no efficient solution is currently known for performing max-product inference on these sums of general discrete distributions (max-product inference can be used to obtain maximum a posteriori estimates). The limiting step to max-product inference is the max-convolution problem (sometimes presented in log-transformed form and denoted as "infimal convolution", "min-convolution", or "convolution on the tropical semiring"), for which no O(k log(k)) method is currently known. Here I present an O(k log(k)) numerical method for estimating the max-convolution of two nonnegative vectors (e.g., two probability mass functions), where k is the length of the larger vector. This numerical max-convolution method is then demonstrated by performing fast max-product inference on a convolution tree, a data structure for performing fast inference given information on the sum of n discrete random variables in O(n k log(n k) log(n)) steps (where each random variable has an arbitrary prior distribution on k contiguous possible states). The numerical max-convolution method can be applied to specialized classes of hidden Markov models to reduce the runtime of computing the Viterbi path from O(n k^2) to O(n k log(k)), and has potential application to the all-pairs shortest paths problem.
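
To make the max-convolution problem concrete, the following Python sketch computes it exactly by brute force. This is only the quadratic-time baseline that defines the quantity being estimated, not the O(k log(k)) numerical method of the paper; the function name is hypothetical.

```python
def max_convolve(u, v):
    """Naive max-convolution: out[m] = max over i + j == m of u[i] * v[j]."""
    if not u or not v:
        return []
    # Entries are assumed nonnegative, so 0.0 is a safe starting value.
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] = max(out[i + j], ui * vj)
    return out


print(max_convolve([1, 2], [3, 4]))
```

Replacing `max` with a sum in the inner loop gives ordinary convolution, which is why the problem is natural to compare against FFT-based running times.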

1404.0651 2026-06-04 stat.CO cs.NA math.NA q-fin.ST

On parameter identification in stochastic differential equations by penalized maximum likelihood

关于通过惩罚最大似然法在随机微分方程中进行参数识别

Fabian Dunker, Thorsten Hohage

AI总结 本文提出非参数估计器用于估计随机微分方程的系数,通过Fokker-Planck方程描述确定性前向算子,推导惩罚最大似然估计器的风险收敛速率,并通过蒙特卡洛模拟展示对数似然相较于二次数据保真项的优势。

详情
Journal ref
Inverse Problems, 2014, 30, 095001
AI中文摘要

本文提出非参数估计器用于估计随机微分方程的系数,当数据由独立同分布随机变量描述时。该问题被公式化为非线性病态算子方程,其中确定性前向算子由Fokker-Planck方程描述。我们推导了具有凸惩罚项的惩罚最大似然估计器的风险收敛速率以及牛顿型方法的收敛速率。我们的通用收敛结果的假设已被验证用于漂移系数的估计。通过蒙特卡洛模拟展示了对数似然相较于二次数据保真项的优势。

英文摘要

In this paper we present nonparametric estimators for coefficients in stochastic differential equations when the data are described by independent, identically distributed random variables. The problem is formulated as a nonlinear ill-posed operator equation with a deterministic forward operator described by the Fokker-Planck equation. We derive convergence rates of the risk for penalized maximum likelihood estimators with convex penalty terms and for Newton-type methods. The assumptions of our general convergence results are verified for estimation of the drift coefficient. The advantages of log-likelihood compared to quadratic data fidelity terms are demonstrated in Monte-Carlo simulations.

1407.0753 2026-06-04 math.OC cs.LG cs.NA math.NA stat.ML

Global convergence of splitting methods for nonconvex composite optimization

非凸复合优化中分裂方法的全局收敛性

Guoyin Li, Ting Kei Pong

AI总结 本文研究了非凸复合优化问题,分析了交替方向乘子法和近端梯度算法的收敛性,证明了在特定条件下所生成序列的聚点是驻点,并给出了保证序列有界的充分条件。

详情
Comments
To appear in SIOPT
AI中文摘要

我们考虑最小化一个具有有界海森矩阵的光滑函数 h 与一个非光滑函数之和的问题。我们假设后者是一个正常闭函数 P 与一个满射线性映射 M 的复合,且 τP(τ>0)的近端映射易于计算。该问题一般是非凸的,并涵盖工程和机器学习中的许多重要应用。本文研究了求解该非凸优化问题的两类分裂方法:交替方向乘子法和近端梯度算法。对于交替方向乘子法的直接推广,我们证明:如果惩罚参数取得足够大且生成的序列有聚点,则该聚点是非凸问题的驻点。在函数 h 和 P 为半代数函数的附加假设下,我们还建立了整个序列的收敛性。此外,我们给出了保证生成序列有界的简单充分条件。这些条件在广泛的应用中都能满足,包括带有 ℓ_{1/2} 正则化的最小二乘问题。最后,当 M 是恒等映射、从而可以高效应用近端梯度算法时,我们证明在一种常数步长规则下任何聚点都是驻点,该规则比文献中针对非凸 h 的已知规则略为灵活。

英文摘要

We consider the problem of minimizing the sum of a smooth function $h$ with a bounded Hessian, and a nonsmooth function. We assume that the latter function is a composition of a proper closed function $P$ and a surjective linear map $\mathcal{M}$, with the proximal mappings of $τP$, $τ> 0$, simple to compute. This problem is nonconvex in general and encompasses many important applications in engineering and machine learning. In this paper, we examine two types of splitting methods for solving this nonconvex optimization problem: alternating direction method of multipliers and proximal gradient algorithm. For the direct adaptation of the alternating direction method of multipliers, we show that, if the penalty parameter is chosen sufficiently large and the sequence generated has a cluster point, then it gives a stationary point of the nonconvex problem. We also establish convergence of the whole sequence under an additional assumption that the functions $h$ and $P$ are semi-algebraic. Furthermore, we give simple sufficient conditions to guarantee boundedness of the sequence generated. These conditions can be satisfied for a wide range of applications including the least squares problem with the $\ell_{1/2}$ regularization. Finally, when $\mathcal{M}$ is the identity so that the proximal gradient algorithm can be efficiently applied, we show that any cluster point is stationary under a slightly more flexible constant step-size rule than what is known in the literature for a nonconvex $h$.
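
The proximal gradient iteration for the case where $\mathcal{M}$ is the identity can be sketched in a few lines of Python. As an assumption made purely for illustration, the nonsmooth term here is the convex $\ell_1$ norm (whose proximal mapping is soft thresholding) rather than the $\ell_{1/2}$ regularizer mentioned in the abstract, and the smooth part is $h(x) = \frac12\|x-b\|^2$; the function names and step size are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal mapping of t * |.| applied to a scalar z."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def proximal_gradient(b, lam, tau=0.5, iters=200):
    """Minimize 0.5 * ||x - b||^2 + lam * ||x||_1 by proximal gradient steps."""
    x = [0.0] * len(b)
    for _ in range(iters):
        # Gradient step on the smooth part h(x) = 0.5 * ||x - b||^2.
        y = [xi - tau * (xi - bi) for xi, bi in zip(x, b)]
        # Proximal step on the nonsmooth part tau * lam * ||.||_1.
        x = [soft_threshold(yi, tau * lam) for yi in y]
    return x


print(proximal_gradient([3.0, -0.5, 1.5], lam=1.0))
```

For this toy objective the minimizer is the soft-thresholded vector, so the iterates can be checked against a closed form; the nonconvex analysis in the paper concerns what survives when such a closed form is not available.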

1405.4980 2026-06-04 math.OC cs.CC cs.LG cs.NA math.NA stat.ML

Convex Optimization: Algorithms and Complexity

凸优化:算法与复杂性

Sébastien Bubeck

AI总结 本文探讨了凸优化中的复杂性定理及其算法,涵盖黑盒优化、结构优化和随机优化的理论与方法,重点介绍FISTA、对偶平均和内点法等核心算法。

详情
Journal ref
In Foundations and Trends in Machine Learning, Vol. 8: No. 3-4, pp 231-357, 2015
Comments
A previous version of the manuscript was titled "Theory of Convex Optimization for Machine Learning"
AI中文摘要

本文阐述了凸优化中的复杂性定理及其相应算法。从黑盒优化的基本理论开始,内容逐步推进到结构优化和随机优化的最新进展。黑盒优化部分深受Nesterov的开创性著作和Nemirovski的讲义影响,涵盖了切割平面方法以及(加速)梯度下降方案的分析。我们特别关注非欧几里得设置(相关算法包括Frank-Wolfe、镜像下降和对偶平均)并讨论其在机器学习中的相关性。我们为结构优化提供了简要介绍,包括FISTA(用于优化光滑与简单非光滑项之和)、鞍点镜像近似(Nemirovski对Nesterov平滑方法的替代)以及内点法的简要描述。在随机优化中,我们讨论了随机梯度下降、小批量、随机坐标下降和亚线性算法。我们还简要提及了凸松弛的组合优化问题以及利用随机性来近似解的方法,以及基于随机游走的方法。

英文摘要

This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms. Starting from the fundamental theory of black-box optimization, the material progresses towards recent advances in structural optimization and stochastic optimization. Our presentation of black-box optimization, strongly influenced by Nesterov's seminal book and Nemirovski's lecture notes, includes the analysis of cutting plane methods, as well as (accelerated) gradient descent schemes. We also pay special attention to non-Euclidean settings (relevant algorithms include Frank-Wolfe, mirror descent, and dual averaging) and discuss their relevance in machine learning. We provide a gentle introduction to structural optimization with FISTA (to optimize a sum of a smooth and a simple non-smooth term), saddle-point mirror prox (Nemirovski's alternative to Nesterov's smoothing), and a concise description of interior point methods. In stochastic optimization we discuss stochastic gradient descent, mini-batches, random coordinate descent, and sublinear algorithms. We also briefly touch upon convex relaxation of combinatorial problems and the use of randomness to round solutions, as well as random walks based methods.

1511.03122 2026-06-04 stat.ME cs.NA math.NA

A Coherent Integration Method Based on Radon Non-uniform FRFT for Random Pulse Repetition Interval (RPRI) Radar

基于Radon非均匀FRFT的相干整合方法用于随机脉冲重复间隔(RPRI)雷达

Jing Tian, Xiang-Gen Xia, Gang Yang, Wei Cui, Si-Liang Wu

AI总结 本文提出基于Radon非均匀FRFT的相干整合方法,用于解决随机脉冲重复间隔雷达中目标运动引起的距离单元迁移和频谱扩展问题,通过运动参数空间搜索和非均匀FRFT实现目标检测,优于MTD、RFT和RFRFT方法。

详情
Comments
7 pages, 3 figures
AI中文摘要

为了解决由目标运动引起的距离单元迁移(RCM)和整合时间内频谱扩展问题,本文提出了一种基于Radon非均匀FRFT(NUFRFT)的新相干整合方法,适用于随机脉冲重复间隔(RPRI)雷达。在此方法中,通过在运动参数空间中搜索来消除RCM,利用NUFRFT来解决频谱扩展问题。与其他流行方法,即移动目标检测(MTD)、Radon-Fourier变换(RFT)和Radon-分数阶Fourier变换(RFRFT)进行了比较。仿真结果表明,所提出的方法能够在低信噪比场景中检测到移动目标,并优于其他两种方法。

英文摘要

To address range cell migration (RCM) and spectrum spread during the integration time, both induced by the motion of a target, this paper proposes a new coherent integration method based on Radon non-uniform FRFT (NUFRFT) for random pulse repetition interval (RPRI) radar. In this method, RCM is eliminated by searching the motion-parameter space, and the spectrum spread is resolved by using NUFRFT. Comparisons with other popular methods, namely moving target detection (MTD), Radon-Fourier transform (RFT), and Radon-Fractional Fourier Transform (RFRFT), are performed. The simulation results demonstrate that the proposed method can detect the moving target even in low-SNR scenarios and is superior to the other two methods.

1502.06800 2026-06-04 cs.LG cs.NA math.NA stat.ML

On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions

核积分规则与随机特征展开的等价性

Francis Bach

AI总结 研究揭示核积分规则是随机特征展开的特例,通过理论分析得出样本数与积分算子特征值的关系,扩展至函数逼近问题并改进随机特征学习的泛化保证。

详情
AI中文摘要

我们证明,用于计算积分的基于核的求积规则可以视为正定核随机特征展开的特例,其对应的特定分解对这类核总是存在。我们对达到给定逼近误差所需的样本数进行了理论分析,得到仅依赖于相应积分算子特征值的上界和下界,二者在相差对数项的意义下一致。特别地,我们证明上界可由来自某个特定非均匀分布的独立同分布样本达到,而下界对任何点集都成立。尽管我们的结果相当一般,将其应用于基于核的求积时,我们恢复了Sobolev空间这一特殊情形下已知的上下界。此外,我们的结果可推广到更一般的完整函数逼近问题(而不仅仅是计算积分),所得的L2-和L∞-范数结果与特殊情形下的已知结果一致。将结果应用于随机特征时,我们证明,在保持Lipschitz连续损失学习的泛化保证的前提下,所需的随机特征数量可以减少。

英文摘要

We show that kernel-based quadrature rules for computing integrals can be seen as a special case of random feature expansions for positive definite kernels, for a particular decomposition that always exists for such kernels. We provide a theoretical analysis of the number of required samples for a given approximation error, leading to both upper and lower bounds that are based solely on the eigenvalues of the associated integral operator and match up to logarithmic terms. In particular, we show that the upper bound may be obtained from independent and identically distributed samples from a specific non-uniform distribution, while the lower bound is valid for any set of points. Although our results are fairly general, applying them to kernel-based quadrature recovers known upper and lower bounds for the special case of Sobolev spaces. Moreover, our results extend to the more general problem of full function approximations (beyond simply computing an integral), with results in $L_2$- and $L_\infty$-norm that match known results for special cases. Applying our results to random features, we show an improvement of the number of random features needed to preserve the generalization guarantees for learning with Lipschitz-continuous losses.
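
As a hedged illustration of what a random feature expansion of a positive definite kernel looks like, the Python sketch below approximates the one-dimensional Gaussian kernel $k(x,y)=\exp(-(x-y)^2/2)$ with random cosine features. This is the standard random Fourier feature construction, chosen here as an assumption for concreteness; it is not the particular decomposition or the non-uniform sampling distribution analyzed in the paper, and the function names are hypothetical.

```python
import math
import random


def random_fourier_features(num_features, seed=0):
    """Draw (w, b) pairs with w ~ N(0, 1) and b ~ Uniform[0, 2*pi]."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(num_features)]


def approx_gaussian_kernel(x, y, features):
    """Monte Carlo estimate of exp(-(x - y)^2 / 2) from random cosine features."""
    total = sum(2.0 * math.cos(w * x + b) * math.cos(w * y + b) for w, b in features)
    return total / len(features)


features = random_fourier_features(5000)
exact = math.exp(-((0.3 - 1.0) ** 2) / 2.0)
print(approx_gaussian_kernel(0.3, 1.0, features), exact)
```

The estimate is an average over sampled features, so its error shrinks as more features are drawn; the abstract's bounds concern how many such samples are needed for a given approximation error.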

1412.6049 2026-06-04 stat.ME cs.MA cs.SY eess.SY physics.data-an

Distributed Detection via Bayesian Updates and Consensus

通过贝叶斯更新和共识进行分布式检测

Qipeng Liu, Jiuhua Zhao, Xiaofan Wang

AI总结 本文研究了基于贝叶斯定律的分布式检测算法,分析了共识协议与贝叶斯更新的结合方式,发现几何平均共识更高效,通信延迟显著影响收敛速度,不同信号结构的通信提升收敛率。

详情
Comments
6 pages, 3 figures. This paper has been submitted to Chinese Control Conference 2015 at Hangzhou, People's Republic of China
AI中文摘要

本文讨论了一类分布式检测算法,可视为分布式环境下贝叶斯定律的实现。部分算法最近被提出,部分在此文中首次提出。这些算法的共同特点是结合(i)某种共识协议与(ii)贝叶斯更新。它们的主要区别在于共识协议的类型和两个操作的顺序。在讨论其相似性和差异性后,通过数值示例比较这些分布式算法。我们关注这些算法检测对象真实状态的速度。发现(a)几何平均共识算法比算术平均共识更高效;(b)共识聚合与贝叶斯更新的顺序对性能影响不明显;(c)通信延迟显著降低收敛速度;(d)不同信号结构的代理之间更多通信提高收敛速度。

英文摘要

In this paper, we discuss a class of distributed detection algorithms which can be viewed as implementations of Bayes' law in distributed settings. Some of the algorithms were proposed recently in the literature, and others are first developed in this paper. The common feature of these algorithms is that they all combine (i) certain kinds of consensus protocols with (ii) Bayesian updates. They differ mainly in the type of consensus protocol and the order of the two operations. After discussing their similarities and differences, we compare these distributed algorithms by numerical examples. We focus on the rate at which these algorithms detect the underlying true state of an object. We find that (a) algorithms with consensus via geometric averaging are more efficient than those via arithmetic averaging; (b) the order of consensus aggregation and Bayesian update does not apparently influence the performance of the algorithms; (c) the existence of communication delay dramatically slows down the rate of convergence; (d) more communication between agents with different signal structures improves the rate of convergence.
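
The two building blocks the abstract combines, a Bayesian update and a consensus step via geometric averaging, can be sketched as follows. The function names, the use of fixed scalar weights, and the requirement that all beliefs be strictly positive are assumptions made for this illustration; the specific protocols compared in the paper may differ.

```python
import math


def bayes_update(belief, likelihood):
    """Multiply a belief over hypotheses by the signal likelihood and renormalize."""
    unnormalized = [p * l for p, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def geometric_consensus(beliefs, weights):
    """Weighted geometric average of strictly positive beliefs, renormalized to sum to one."""
    num_hypotheses = len(beliefs[0])
    pooled = [
        math.exp(sum(w * math.log(b[h]) for w, b in zip(weights, beliefs)))
        for h in range(num_hypotheses)
    ]
    total = sum(pooled)
    return [p / total for p in pooled]


local = bayes_update([0.5, 0.5], [0.8, 0.2])
print(local, geometric_consensus([local, [0.2, 0.8]], [0.5, 0.5]))
```

Swapping the order of the two calls gives the other family of algorithms the abstract mentions, in which consensus is applied before the Bayesian update.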

1511.01846 2026-06-04 math.NA cs.NA stat.ML

Sparse approximation by greedy algorithms

由贪心算法进行稀疏逼近

Vladimir Temlyakov

AI总结 本文综述了构造性稀疏逼近的最新成果,探讨了贪心算法在特殊字典中的Lebesgue型不等式、三角函数系的构造性稀疏逼近以及张量积结构字典的稀疏逼近方法,并提供了构造性方法和证明。

详情
Comments
arXiv admin note: substantial text overlap with arXiv:1303.6811, arXiv:1303.3595
AI中文摘要

本文是对构造性稀疏逼近最新研究成果的综述。讨论了三个方向:(1) 针对特殊字典的贪心算法的Lebesgue型不等式;(2) 针对三角函数系的构造性稀疏逼近;(3) 针对具有张量积结构的字典的稀疏逼近。在所有三种情况下都提供了构造性方法。所用技术基于贪心逼近理论的基本结果。特别是方向(1)的结果基于最近在压缩感知中发展出的深刻方法。我们详细介绍了这些结果及其证明。

英文摘要

This is a survey of recent results in constructive sparse approximation. Three directions are discussed here: (1) Lebesgue-type inequalities for greedy algorithms with respect to a special class of dictionaries, (2) constructive sparse approximation with respect to the trigonometric system, (3) sparse approximation with respect to dictionaries with tensor product structure. In all three cases constructive ways are provided for sparse approximation. The technique used is based on fundamental results from the theory of greedy approximation. In particular, results in direction (1) are based on deep methods developed recently in compressed sensing. We present some of these results with detailed proofs.
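
For readers unfamiliar with greedy approximation, a minimal Python sketch of the basic pure greedy (matching pursuit) step is given below. It assumes a finite dictionary of unit-norm vectors and is only the textbook scheme; the survey's specific algorithms and dictionaries are not reproduced here, and the function names are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pure_greedy(target, dictionary, steps):
    """Matching-pursuit style greedy approximation with unit-norm atoms."""
    residual = list(target)
    coefficients = {}
    for _ in range(steps):
        # Pick the atom most correlated with the current residual.
        best = max(range(len(dictionary)), key=lambda k: abs(dot(residual, dictionary[k])))
        c = dot(residual, dictionary[best])
        if c == 0.0:
            break  # residual is orthogonal to every atom
        coefficients[best] = coefficients.get(best, 0.0) + c
        residual = [r - c * g for r, g in zip(residual, dictionary[best])]
    return coefficients, residual


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(pure_greedy([3.0, 0.0, -1.0], basis, steps=2))
```

With an orthonormal dictionary, as in the usage line, m greedy steps simply pick the m largest coefficients; the interesting theory concerns redundant dictionaries, where this is no longer true.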

1511.01543 2026-06-04 eess.SY cs.SY stat.ML

Regularization and Bayesian Learning in Dynamical Systems: Past, Present and Future

在动态系统中正则化与贝叶斯学习:过去、现在与未来

A. Chiuso

AI总结 本文回顾动态系统中正则化与贝叶斯方法的发展历程,探讨其在系统辨识中的演变及核心问题,强调历史基础与早期贡献,以澄清与现代研究的联系。

详情
Comments
Plenary Presentation at the IFAC SYSID 2015. Submitted to Annual Reviews in Control
AI中文摘要

正则化和贝叶斯方法在系统辨识中的应用近年来再次受到关注,并被证明相对于经典参数方法具有竞争力。本文旨在阐述正则化在系统辨识中的应用如何随时间演变,从自动控制以及计量经济学和统计学文献中的早期贡献讲起。我们将特别讨论一些基本问题,如复合估计问题和可交换性,它们在正则化和贝叶斯方法中起重要作用,这一点在早期统计学文献中已有体现。历史和基础性问题将得到更多强调(和篇幅),而近期发展则仅简要讨论。这样选择的主要原因是,尽管近期文献很容易获得,且已有相关综述发表,但在作者看来,与过去工作的明确联系尚未完全澄清。

英文摘要

Regularization and Bayesian methods for system identification have been repopularized in recent years, and have proved to be competitive with respect to classical parametric approaches. In this paper we shall make an attempt to illustrate how the use of regularization in system identification has evolved over the years, starting from the early contributions in both the Automatic Control and the Econometrics and Statistics literature. In particular we shall discuss some fundamental issues such as compound estimation problems and exchangeability, which play an important role in regularization and Bayesian approaches, as also illustrated in early publications in Statistics. The historical and foundational issues will be given more emphasis (and space), at the expense of the more recent developments, which are only briefly discussed. The main reason for such a choice is that, while the recent literature is readily available, and surveys have already been published on the subject, in the author's opinion a clear link with past work has not been completely clarified.

1511.01304 2026-06-04 stat.ML cs.NA math.NA

Dictionary descent in optimization

字典下降在优化中的应用

Vladimir Temlyakov

AI总结 本文研究了凸优化问题,探讨了字典在优化算法中的应用,通过减少维度提升收敛速度,分析了字典特性对贪心算法收敛率的影响。

详情
Comments
arXiv admin note: text overlap with arXiv:1206.0392
AI中文摘要

本文研究凸优化问题。在凸优化中,最小化通常是在d维域上进行的,优化算法的收敛速度往往依赖于维度d。本文研究的算法使用字典来代替坐标下降算法中常用的标准基。我们展示了这种方法如何降低问题的维数。此外,我们还探讨了字典的哪些性质有利于典型贪心型算法的收敛速度。

英文摘要

The problem of convex optimization is studied. Usually in convex optimization the minimization is over a d-dimensional domain. Very often the convergence rate of an optimization algorithm depends on the dimension d. The algorithms studied in this paper utilize dictionaries instead of a canonical basis used in the coordinate descent algorithms. We show how this approach allows us to reduce dimensionality of the problem. Also, we investigate which properties of a dictionary are beneficial for the convergence rate of typical greedy-type algorithms.

1410.1221 2026-06-04 math.OC cs.NA math.NA stat.CO stat.ME

Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet

可扩展和高效的算法用于从数据通过推理到预测的不确定性传播,用于大规模问题,应用于南极冰盖流动

Tobin Isaac, Noemi Petra, Georg Stadler, Omar Ghattas

AI总结 本文提出高效算法处理大规模模型中的不确定性推断与传播,应用于南极冰盖流动预测,通过低秩近似实现维度无关性。

详情
AI中文摘要

计算科学与工程中大多数高效可扩展算法的研究集中在正向问题:给定参数输入,求解控制方程以确定感兴趣的输出量。相反,本文考虑更广泛的问题:给定包含不确定参数的大规模模型、可能有噪声的观测数据和感兴趣的预测量,如何构建高效可扩展的算法来(1)从数据推断模型参数(确定性反问题),(2)量化推断参数的不确定性(贝叶斯推断问题),以及(3)将结果不确定参数通过模型传播以发布具有量化不确定性的预测(正向不确定性传播问题)。本文在高斯近似下,针对南极冰盖流动及其对海平面的影响,提出了这种端到端的数据到预测过程的高效可扩展算法。冰被建模为粘性、不可压缩、缓慢流动、剪切稀化的流体。观测数据来自InSAR卫星测量的冰面流动速度,需要推断的不确定参数场是基底滑动参数。感兴趣的预测量是当今从南极大陆向海洋的冰质量通量。本文表明,执行此数据到预测过程所需的工作量与状态维度、参数维度、数据维度和处理器核心数量无关。实现此维度无关性的关键是利用观测数据通常仅提供模型参数的稀疏信息。这一性质可以用来构造线性化参数到可观察量映射的低秩近似。

英文摘要

The majority of research on efficient and scalable algorithms in computational science and engineering has focused on the forward problem: given parameter inputs, solve the governing equations to determine output quantities of interest. In contrast, here we consider the broader question: given a (large-scale) model containing uncertain parameters, (possibly) noisy observational data, and a prediction quantity of interest, how do we construct efficient and scalable algorithms to (1) infer the model parameters from the data (the deterministic inverse problem), (2) quantify the uncertainty in the inferred parameters (the Bayesian inference problem), and (3) propagate the resulting uncertain parameters through the model to issue predictions with quantified uncertainties (the forward uncertainty propagation problem)? We present efficient and scalable algorithms for this end-to-end, data-to-prediction process under the Gaussian approximation and in the context of modeling the flow of the Antarctic ice sheet and its effect on sea level. The ice is modeled as a viscous, incompressible, creeping, shear-thinning fluid. The observational data come from InSAR satellite measurements of surface ice flow velocity, and the uncertain parameter field to be inferred is the basal sliding parameter. The prediction quantity of interest is the present-day ice mass flux from the Antarctic continent to the ocean. We show that the work required for executing this data-to-prediction process is independent of the state dimension, parameter dimension, data dimension, and number of processor cores. The key to achieving this dimension independence is to exploit the fact that the observational data typically provide only sparse information on model parameters. This property can be exploited to construct a low rank approximation of the linearized parameter-to-observable map.

1412.0473 2026-06-04 stat.AP cs.NA math.NA physics.comp-ph stat.ML

Sparse Variational Bayesian Approximations for Nonlinear Inverse Problems: applications in nonlinear elastography

稀疏变分贝叶斯近似用于非线性反问题:非线性弹性成像应用

Isabell M. Franck, P. S. Koutsourelakis

AI总结 本文提出高效的贝叶斯框架解决非线性高维模型校准问题,通过变分贝叶斯方法近似后验分布,实现参数低维表示和高效后验密度计算,应用于非线性弹性成像中的生物材料机械性质识别。

详情
AI中文摘要

本文提出了一种高效的贝叶斯框架,用于解决非线性、高维模型校准问题。该框架基于变分贝叶斯方法,通过在适当选择的分布族上求解优化问题来近似精确后验。目标是双重的:首先,找到能够尽可能捕捉相关后验密度的未知参数向量的低维表示;其次,使后验密度的近似计算尽可能少地调用正向模型。我们讨论如何通过完全贝叶斯论证和将边际似然或证据作为最终模型验证度量来实现这些目标。我们演示了所提出方法在非线性弹性成像中的性能,其中生物材料的机械性质识别可为非侵入性医学诊断提供信息。最后,采用重要性采样方案来验证结果并评估所提供的近似方法的有效性。

英文摘要

This paper presents an efficient Bayesian framework for solving nonlinear, high-dimensional model calibration problems. It is based on a Variational Bayesian formulation that aims at approximating the exact posterior by means of solving an optimization problem over an appropriately selected family of distributions. The goal is two-fold. Firstly, to find lower-dimensional representations of the unknown parameter vector that capture as much as possible of the associated posterior density, and secondly to enable the computation of the approximate posterior density with as few forward calls as possible. We discuss how these objectives can be achieved by using a fully Bayesian argumentation and employing the marginal likelihood or evidence as the ultimate model validation metric for any proposed dimensionality reduction. We demonstrate the performance of the proposed methodology for problems in nonlinear elastography where the identification of the mechanical properties of biological materials can inform non-invasive, medical diagnosis. An Importance Sampling scheme is finally employed in order to validate the results and assess the efficacy of the approximations provided.

1507.08254 2026-06-04 cs.IT cs.NA math.IT math.NA math.OC math.ST stat.TH

Efficient Compressive Phase Retrieval with Constrained Sensing Vectors

高效压缩相位恢复与受限传感向量

Sohail Bahmani, Justin Romberg

AI总结 本文提出一种高效的压缩相位恢复方法,通过受限传感向量和两阶段重构方法,解决从线性测量的幅度重建稀疏向量的问题,证明在O(k log(d/k))测量下可有效恢复目标信号。

详情
Comments
Accepted for the 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015
AI中文摘要

我们提出了一种稳健且高效的压缩相位恢复方法,旨在从多个线性测量的幅度中重建稀疏向量。所提出的框架依赖于受限传感向量和一个两阶段重构方法,该方法由两个标准的凸优化问题依次求解。近年来,尽管提出了多种压缩相位恢复方法,但它们在样本复杂度上仍不理想或缺乏鲁棒性保证。主要障碍在于目标结构类型没有直接的凸松弛方法。给定一组欠定测量,存在标准框架用于恢复稀疏矩阵和低秩矩阵,但恢复联合稀疏和低秩矩阵的一般高效方法仍难以获得。与通用测量模型不同,本文证明如果从不相干子空间中随机选择传感向量,则目标信号的低秩和稀疏结构可以有效解耦。我们证明,由低秩恢复阶段后接稀疏恢复阶段的恢复算法,在测量数为O(k log(d/k))时能准确估计目标信号。我们还通过数值模拟评估了该算法。

英文摘要

We propose a robust and efficient approach to the problem of compressive phase retrieval in which the goal is to reconstruct a sparse vector from the magnitude of a number of its linear measurements. The proposed framework relies on constrained sensing vectors and a two-stage reconstruction method that consists of two standard convex programs that are solved sequentially. In recent years, various methods have been proposed for compressive phase retrieval, but they have suboptimal sample complexity or lack robustness guarantees. The main obstacle has been that there are no straightforward convex relaxations for the type of structure in the target. Given a set of underdetermined measurements, there is a standard framework for recovering a sparse matrix, and a standard framework for recovering a low-rank matrix. However, a general, efficient method for recovering a jointly sparse and low-rank matrix has remained elusive. Deviating from the models with generic measurements, in this paper we show that if the sensing vectors are chosen at random from an incoherent subspace, then the low-rank and sparse structures of the target signal can be effectively decoupled. We show that a recovery algorithm that consists of a low-rank recovery stage followed by a sparse recovery stage will produce an accurate estimate of the target when the number of measurements is $\mathsf{O}(k\,\log\frac{d}{k})$, where $k$ and $d$ denote the sparsity level and the dimension of the input signal. We also evaluate the algorithm through numerical simulation.

1501.00046 2026-06-04 cs.IT cs.NA math.FA math.IT math.NA math.OC math.ST stat.TH

Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation

在随机掩码成像中的盲反卷积提升:可识别性与凸松弛

Sohail Bahmani, Justin Romberg

AI总结 本文分析了在编码成像系统中图像与未知模糊的盲反卷积问题,通过提升方法将测量转换为矩阵外积的线性方程组,证明在特定条件下通过核范数最小化可实现高概率恢复。

详情
Journal ref
SIAM J. Imaging Sci. 8(4):2203--2238, 2015
AI中文摘要

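本文分析了在编码成像系统中图像与未知模糊的盲反卷积问题。测量由未知模糊核与图像的多个随机二进制调制(编码掩码)的子采样卷积组成。为了进行反卷积,我们考虑了图像和模糊核的标准提升,将测量转换为关于二者外积所形成矩阵的线性方程组。该方程组的任何秩为一的解都给出一个有效的图像和模糊对。我们首先给出了在某些附加假设(均匀子采样和编码掩码数量无限制)下秩为一解唯一的充分必要条件,这些条件是矩阵补全问题中已建立的可识别性结果的特殊情况。我们还刻画了一个足以保证可识别性的模糊核低维子空间模型,其中包括有趣的“带通”模糊核情形。接下来,在模糊核满足带通模型的假设下,我们证明可以通过核范数最小化求得图像和模糊核。主要结果表明,当掩码数量达到 $μ\log^{2}L\,\log\frac{Le}{μ}\,\log\log\left(N+1\right)$ 的量级时,可以(以高概率)实现恢复,其中 $μ$ 是模糊的相干性,$L$ 是图像的维数,$N$ 是每个掩码的测量样本数。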

英文摘要

In this paper we analyze the blind deconvolution of an image and an unknown blur in a coded imaging system. The measurements consist of subsampled convolution of an unknown blurring kernel with multiple random binary modulations (coded masks) of the image. To perform the deconvolution, we consider a standard lifting of the image and the blurring kernel that transforms the measurements into a set of linear equations of the matrix formed by their outer product. Any rank-one solution to this system of equations provides a valid pair of an image and a blur. We first express the necessary and sufficient conditions for the uniqueness of a rank-one solution under some additional assumptions (uniform subsampling and no limit on the number of coded masks). These conditions are a special case of a previously established result regarding identifiability in the matrix completion problem. We also characterize a low-dimensional subspace model for the blur kernel that is sufficient to guarantee identifiability, including the interesting instance of "bandpass" blur kernels. Next, assuming the bandpass model for the blur kernel, we show that the image and the blur kernel can be found using nuclear norm minimization. Our main results show that recovery is achieved (with high probability) when the number of masks is on the order of $μ\log^{2}L\,\log\frac{Le}{μ}\,\log\log\left(N+1\right)$ where $μ$ is the coherence of the blur, $L$ is the dimension of the image, and $N$ is the number of measured samples per mask.

1410.4062 2026-06-04 stat.ML cs.LG cs.NA math.NA math.OC

Complexity Issues and Randomization Strategies in Frank-Wolfe Algorithms for Machine Learning

面向机器学习的Frank-Wolfe算法中的复杂性问题与随机化策略

Emanuele Frandi, Ricardo Nanculef, Johan Suykens

AI总结 本文研究了Frank-Wolfe算法在大规模数据集中的有效性,分析了随机采样策略的替代方案,并提供了一些指导原则。

详情
AI中文摘要

Frank-Wolfe算法用于凸优化最近在优化和机器学习社区中引起了广泛关注,因其特性使其成为各种应用中的合适选择。然而,由于每次迭代都需要优化线性模型,巧妙的实现对于使这些算法在大规模数据集上可行至关重要。为此,几位研究者提出了基于随机采样的近似策略。在本文中,我们对这些技术的有效性进行了实验研究,分析了可能的替代方案,并根据我们的结果提供了一些指导原则。

英文摘要

Frank-Wolfe algorithms for convex minimization have recently gained considerable attention from the Optimization and Machine Learning communities, as their properties make them a suitable choice in a variety of applications. However, as each iteration requires optimizing a linear model, a clever implementation is crucial to make such algorithms viable on large-scale datasets. For this purpose, approximation strategies based on random sampling have been proposed by several researchers. In this work, we perform an experimental study on the effectiveness of these techniques, analyze possible alternatives and provide some guidelines based on our results.
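
The linear subproblem solved at each Frank-Wolfe iteration can be seen in the following minimal Python sketch. The toy objective (squared distance to a point `b`, minimized over the probability simplex), the step size 2/(t+2), and the function name are assumptions chosen for illustration; they are not taken from the paper's experiments.

```python
def frank_wolfe_simplex(b, iters=2000):
    """Minimize ||x - b||^2 over the probability simplex with Frank-Wolfe steps."""
    n = len(b)
    x = [1.0] + [0.0] * (n - 1)  # start at a vertex of the simplex
    for t in range(iters):
        grad = [2.0 * (xi - bi) for xi, bi in zip(x, b)]
        # Linear minimization oracle: over the simplex, a linear function is
        # minimized at the vertex with the smallest gradient entry.
        s = min(range(n), key=lambda i: grad[i])
        gamma = 2.0 / (t + 2.0)
        # Move toward the chosen vertex; the iterate stays in the simplex.
        x = [(1.0 - gamma) * xi for xi in x]
        x[s] += gamma
    return x


print(frank_wolfe_simplex([0.2, 0.3, 0.5]))
```

Here the oracle scans every coordinate of the gradient. A randomized variant of the kind the abstract alludes to would scan only a random subset of coordinates, trading exactness of the linear step for a cheaper iteration.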

1510.04977 2026-06-04 stat.CO cs.NA math.NA math.PR math.ST stat.TH

Multilevel particle filter

多级粒子滤波器

Ajay Jasra, Kengo Kamatani, Kody J. H. Law, Yan Zhou

AI总结 本文研究了部分观测扩散过程的过滤问题,提出多级估计器以优化均方误差,适用于金融数据估计。

详情
AI中文摘要

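本文考虑具有离散时间观测的部分观测扩散过程的滤波问题。假设只能得到扩散过程的有偏近似,其精度由以 $l$ 为索引的参数控制。本文提出一种多级估计器,它由与各相邻层级相关联的增量估计器的伸缩和构成。结果表明,使多级估计器与滤波分布下的平均值之间的均方误差达到 $\mathcal{O}(\varepsilon^2)$ 所需的计算量具有最优的标度,例如当底层扩散近似具有最优收敛速率时为 $\mathcal{O}(\varepsilon^{-2})$。该方法在若干玩具示例以及基于真实标普500股票价格数据的利率估计中得到了展示。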

英文摘要

In this paper the filtering of partially observed diffusions, with discrete-time observations, is considered. It is assumed that only biased approximations of the diffusion can be obtained, with accuracy controlled by a parameter indexed by $l$. A multilevel estimator is proposed, consisting of a telescopic sum of increment estimators associated with the successive levels. The work required to achieve $\mathcal{O}(\varepsilon^2)$ mean-square error between the multilevel estimator and the average with respect to the filtering distribution is shown to scale optimally, for example as $\mathcal{O}(\varepsilon^{-2})$ for optimal rates of convergence of the underlying diffusion approximation. The method is illustrated on some toy examples as well as estimation of an interest rate based on real S&P 500 stock price data.
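
The telescoping structure behind the multilevel estimator can be written out explicitly. The notation is an assumption introduced here for illustration: $\eta^{l}$ denotes the filtering distribution under the level-$l$ approximation of the diffusion, $\varphi$ a test function, and $L$ the finest level.

```latex
\eta^{L}(\varphi) \;=\; \eta^{0}(\varphi) \;+\; \sum_{l=1}^{L} \bigl[\eta^{l}(\varphi) - \eta^{l-1}(\varphi)\bigr]
```

Each term on the right-hand side is estimated separately, which is what "a telescopic sum of increment estimators" refers to: the coarse level is cheap to sample, and the increments between successive levels are small when the approximations converge.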

1510.04841 2026-06-04 q-fin.ST econ.GN q-fin.EC q-fin.GN stat.ME

How to (Not) Estimate Gini Coefficients for Fat Tailed Variables

如何(不)估算厚尾变量的基尼系数

Nassim Nicholas Taleb

AI总结 传统方法估算基尼系数不准确,因超加性导致无法比较不同规模单位及跨时期分析。尾部指数最大似然估计方法更可靠,能有效降低误差。

详情
AI中文摘要

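用传统算术方法直接计算基尼系数是一种很差的估计,即使(看似矛盾地)包含了全部人口也是如此:由于超可加性,它无法用于不同规模单位之间的比较,跨期分析也会因人口变化而失效。聚合单位A和B的基尼系数高于对A和B分别计算的值,尾部越厚,这一效应越显著。当样本量小于总体时,误差极高。关于基尼系数的传统文献不可信,比较不同规模的国家没有意义;基于传统指标宣称“不平等程度发生变化”同样没有意义。我们将标准方法与通过尾指数最大似然估计得到的间接方法进行比较,后一种尾部方法是无偏的,且误差率低得多。我们还考虑了尾指数的测量误差,并提出了一种简单而高效的基尼系数计算方法。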

英文摘要

Direct measurements of Gini coefficients by conventional arithmetic calculations are a poor estimator even if, paradoxically, they include the entire population: because of super-additivity they cannot lend themselves to comparisons between units of different size, and intertemporal analyses are vitiated by population changes. The Gini of aggregated units A and B will be higher than those of A and B computed separately. This effect becomes more acute with fatness of tails. When the sample size is smaller than the entire population, the error is extremely high. The conventional literature on Gini coefficients cannot be trusted and comparing countries of different sizes makes no sense; nor does it make sense to make claims of "changes in inequality" based on conventional measures. We compare the standard methodologies to the indirect methods via maximum likelihood estimation of the tail exponent. We compare to the tail method, which is unbiased, with a considerably lower error rate. We also consider measurement errors of the tail exponent and suggest a simple but efficient methodology to calculate Gini coefficients.
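
The contrast between the direct estimate and an indirect, tail-based estimate can be sketched in Python as follows. The sketch assumes, as an illustration only, that the data follow an exact Pareto law with known scale `x_min`, for which the Gini coefficient is $1/(2\alpha-1)$ when $\alpha>1$; the abstract does not specify the distributional model, and the function names are hypothetical.

```python
import math


def gini_empirical(values):
    """Sample Gini coefficient from sorted values (the conventional direct estimate)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    ranked = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * ranked / (n * total) - (n + 1.0) / n


def pareto_tail_mle(values, x_min):
    """Maximum likelihood tail exponent for a Pareto law with known scale x_min."""
    return len(values) / sum(math.log(x / x_min) for x in values)


def gini_pareto(alpha):
    """Gini coefficient of a Pareto distribution with tail exponent alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the mean is infinite for alpha <= 1")
    return 1.0 / (2.0 * alpha - 1.0)


print(gini_empirical([0, 0, 0, 1]), gini_pareto(2.0))
```

The indirect route first estimates the tail exponent with `pareto_tail_mle` and then maps it to a Gini value with `gini_pareto`, instead of averaging over the sample directly.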

1510.04346 2026-06-04 stat.ME econ.GN q-fin.EC

Explicit solutions to a vector time series model and its induced model for business cycles

向量时间序列模型及其诱导模型的显式解

Xiongzhi Chen

AI总结 本文给出了一种描述异质 agent 在不确定性下根据凯恩斯原理交互的向量时间序列模型的显式解,并通过 agent 增长率的加权平均诱导出商业周期模型,从而更深入理解模型的数学性质和经济计量特性。

详情
Comments
16 pages, 1 figure
AI中文摘要

本文给出了一个一般向量时间序列模型的显式解,该模型描述了在不确定性下根据凯恩斯原理运作的异质agent之间的交互。由此,通过模型中agent增长率的加权平均诱导出商业周期模型。显式解使得时间序列的直接模拟和增长速率的联合行为理解成为可能。此外,商业周期模型及其解也明确给出并进行了分析。显式解提供了对这些模型数学性质和试图纳入的经济计量特性的更好理解。

英文摘要

This article gives the explicit solution to a general vector time series model that describes interacting, heterogeneous agents that operate under uncertainties but according to Keynesian principles. From this model, a model for business cycles is induced by a weighted average of the growth rates of the agents. The explicit solution enables a direct simulation of the time series defined by the model and a better understanding of the joint behavior of the growth rates. In addition, the induced model for business cycles and its solutions are explicitly given and analyzed. The explicit solutions provide a better understanding of the mathematics of these models and the econometric properties they try to incorporate.

1510.01670 2026-06-04 cs.IT cs.NA math.IT math.NA math.ST stat.TH

Sketching for Simultaneously Sparse and Low-Rank Covariance Matrices

面向同时稀疏且低秩协方差矩阵的草图(sketching)方法

Sohail Bahmani, Justin Romberg

AI总结 本文提出一种基于草图(sketching)技术估计结构化协方差矩阵的方法:利用观测向量与预选向量的内积形成估计;当草图向量具有特殊结构时,可使用简单的两阶段算法,且当草图数量与秩乘以显著行/列数的最大值成正比时,估计是准确的。

详情
Comments
Accepted in 2015 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2015)
AI中文摘要

我们介绍了一种技术,用于从经过草图压缩的随机向量观测中估计结构化协方差矩阵。每个观测随机向量x_t通过与若干预选向量a_ℓ之一做内积而被缩减为单个数值。这些观测用于形成对协方差矩阵Σ的线性观测的估计,其中假设Σ同时稀疏且低秩。我们证明,如果草图向量a_ℓ具有特殊结构,则可以使用利用该结构的简单两阶段算法。我们还证明,当草图数量与Σ的秩乘以显著行/列数的最大值成正比时,估计是准确的。此外,我们的算法仅处理远小于原始协方差矩阵的矩阵,从而直接利用Σ的低秩结构。

英文摘要

We introduce a technique for estimating a structured covariance matrix from observations of a random vector which have been sketched. Each observed random vector $\boldsymbol{x}_t$ is reduced to a single number by taking its inner product against one of a number of pre-selected vectors $\boldsymbol{a}_\ell$. These observations are used to form estimates of linear observations of the covariance matrix $\boldsymbol{\varSigma}$, which is assumed to be simultaneously sparse and low-rank. We show that if the sketching vectors $\boldsymbol{a}_\ell$ have a special structure, then we can use a straightforward two-stage algorithm that exploits this structure. We show that the estimate is accurate when the number of sketches is proportional to the maximum of the rank times the number of significant rows/columns of $\boldsymbol{\varSigma}$. Moreover, our algorithm takes direct advantage of the low-rank structure of $\boldsymbol{\varSigma}$ by only manipulating matrices that are far smaller than the original covariance matrix.
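
The first step described above, turning scalar sketches into a linear observation of the covariance, can be illustrated as follows. This is only a sketch of the identity E[(aᵀx)²] = aᵀΣa for zero-mean x; it does not implement the paper's two-stage recovery, and the covariance, the sketching vector and the sample size are hypothetical.

```python
import random


def quadratic_form(a, sigma):
    """Return a^T Sigma a for a vector a and a square matrix Sigma (nested lists)."""
    n = len(a)
    return sum(a[i] * sigma[i][j] * a[j] for i in range(n) for j in range(n))


def sketch_energy(samples, a):
    """Average of the squared inner products <a, x_t>^2 over the samples."""
    total = 0.0
    for x in samples:
        inner = sum(ai * xi for ai, xi in zip(a, x))
        total += inner * inner
    return total / len(samples)


rng = random.Random(1)
# Hypothetical rank-one, sparse covariance: Sigma = v v^T with v = (2, 0, 1).
v = [2.0, 0.0, 1.0]
sigma = [[vi * vj for vj in v] for vi in v]

# Zero-mean samples with covariance Sigma: x_t = g_t * v with g_t ~ N(0, 1).
samples = []
for _ in range(50000):
    g = rng.gauss(0.0, 1.0)
    samples.append([g * vi for vi in v])

a = [1.0, -1.0, 0.5]                  # one pre-selected sketching vector
estimate = sketch_energy(samples, a)  # uses only the scalars <a, x_t>
exact = quadratic_form(a, sigma)      # the linear observation a^T Sigma a
print(estimate, exact)
```

Each sample contributes only one number, which is why many sketching vectors are needed before the matrix itself can be recovered.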

1507.08566 2026-06-04 cs.DC cs.SY eess.SY math.ST stat.ML stat.TH

Diffusion Adaptation Over Clustered Multitask Networks Based on the Affine Projection Algorithm

基于仿射投影算法的集群多任务网络扩散适应

Vinay Chakravarthi Gogineni, Mrityunjoy Chakraborty

AI总结 本文提出基于仿射投影算法的多任务扩散策略,通过利用相关输入的鲁棒性,提升协同估计性能,并通过仿真验证了改进算法的收敛速度和稳态EMSE表现。

详情
Comments
Under Communication. arXiv admin note: substantial text overlap with arXiv:1311.4894 by other authors
AI中文摘要

分布式自适应网络通过利用时域和空域多样性,在消耗少量资源的同时实现更好的估计性能。近期研究集中于单任务分布式估计问题,其中节点协同估计单个最优参数向量。然而,许多重要应用需要同时估计多个向量,且以协同方式完成。本文提出基于仿射投影算法(APA)的多任务扩散策略,利用APA使算法对相关输入具有鲁棒性。所提出的多任务扩散APA算法在均值和均方意义上进行了性能分析,并提出了一种改进的多任务扩散策略,以在收敛速度和稳态EMSE方面提升性能。通过仿真验证了分析结果。

英文摘要

Distributed adaptive networks achieve better estimation performance by exploiting temporal as well as spatial diversity while consuming few resources. Recent works have studied the single-task distributed estimation problem, in which the nodes estimate a single optimum parameter vector collaboratively. However, there are many important applications where multiple vectors have to be estimated simultaneously, in a collaborative manner. This paper presents multi-task diffusion strategies based on the Affine Projection Algorithm (APA); using the APA makes the algorithm robust against correlated input. The performance of the proposed multi-task diffusion APA algorithm is analyzed in the mean and mean-square sense. A modified multi-task diffusion strategy is also proposed that improves performance in terms of both convergence rate and steady-state EMSE. Simulations are conducted to verify the analytical results.

1509.08990 2026-06-04 cs.SI cs.LG cs.SY eess.SY math.OC stat.ML

Learning without Recall: A Case for Log-Linear Learning

无回忆学习:对数线性学习的论证

Mohammad Amin Rahimian, Ali Jadbabaie

AI总结 本文研究了在无回忆条件下,理性代理人如何通过合理推断进行学习和信念形成,探讨了时间变化先验对学习和学习速率的影响。

详情
Comments
in 5th IFAC Workshop on Distributed Estimation and Control in Networked Systems, (NecSys 2015)
AI中文摘要

我们分析了一个网络中学习与信念形成的模型,其中代理人遵循贝叶斯规则,但不回忆过去的观察历史,也无法推断其他代理人的信念是如何形成的。他们对自己的观察做出理性推断,这些观察包括一系列独立同分布的私人信号以及每个时刻邻居的信念。完全理性的代理人会对全部观察历史依次应用贝叶斯规则。由于对产生这些观察的全局网络结构缺乏了解,这会导致极其复杂的推断。为了解决这些复杂性,我们考虑了无回忆学习模型,该模型不仅为分析社会网络中理性代理人的行为提供了可处理的框架,还能为文献中各种非贝叶斯更新规则提供行为基础。我们阐述了此类代理人时变先验的各种选择的含义,以及这种选择如何影响学习及其速率。

英文摘要

We analyze a model of learning and belief formation in networks in which agents follow Bayes rule yet they do not recall their history of past observations and cannot reason about how other agents' beliefs are formed. They do so by making rational inferences about their observations which include a sequence of independent and identically distributed private signals as well as the beliefs of their neighboring agents at each time. Fully rational agents would successively apply Bayes rule to the entire history of observations. This leads to forebodingly complex inferences due to lack of knowledge about the global network structure that causes those observations. To address these complexities, we consider a Learning without Recall model, which in addition to providing a tractable framework for analyzing the behavior of rational agents in social networks, can also provide a behavioral foundation for the variety of non-Bayesian update rules in the literature. We present the implications of various choices for time-varying priors of such agents and how this choice affects learning and its rate.

1408.3115 2026-06-04 math.NA cs.LG cs.NA stat.ML

On Data Preconditioning for Regularized Loss Minimization

关于正则化损失最小化的数据预处理

Tianbao Yang, Rong Jin, Shenghuo Zhu, Qihang Lin

AI总结 研究通过数据预处理技术提升一阶方法在正则化损失最小化中的收敛速度,分析了问题条件数对收敛的影响,并提出随机采样方法实现高效预处理。

详情
AI中文摘要

在本文中,我们研究了数据预处理技术,这是一种已知且长期存在的技术,用于提升一阶方法在正则化损失最小化中的收敛速度。众所周知,问题的条件数,即Lipschitz常数与强凸模量的比值,对一阶优化方法的收敛性有显著影响。因此,最小化一个小的正则化损失以获得良好的泛化性能,导致产生一个病态的问题,成为大数据问题的瓶颈。我们为正则化损失最小化提供了数据预处理的理论。特别是,我们的分析给出了一个适当的数据预处理器,并刻画了损失函数和数据需要满足的条件,在这些条件下数据预处理可以降低条件数,从而加速最小化正则化损失的收敛。为了使数据预处理在实践中有用,我们采用并分析了一种随机采样方法,以高效计算预处理后的数据。初步实验验证了我们的理论。

英文摘要

In this work, we study data preconditioning, a well-known and long-existing technique, for boosting the convergence of first-order methods for regularized loss minimization. It is well understood that the condition number of the problem, i.e., the ratio of the Lipschitz constant to the strong convexity modulus, has a harsh effect on the convergence of the first-order optimization methods. Therefore, minimizing a small regularized loss for achieving good generalization performance, yielding an ill conditioned problem, becomes the bottleneck for big data problems. We provide a theory on data preconditioning for regularized loss minimization. In particular, our analysis exhibits an appropriate data preconditioner and characterizes the conditions on the loss function and on the data under which data preconditioning can reduce the condition number and therefore boost the convergence for minimizing the regularized loss. To make the data preconditioning practically useful, we endeavor to employ and analyze a random sampling approach to efficiently compute the preconditioned data. The preliminary experiments validate our theory.

1509.06890 2026-06-04 astro-ph.SR cs.NA math.NA stat.CO

Numerical methods for solution of the stochastic differential equations equivalent to the non-stationary Parker's transport equation

求解等价于非稳态帕克传输方程的随机微分方程的数值方法

A. Wawrzynczak, R. Modzelewska, M. Kluczek

AI总结 本文提出求解非稳态帕克传输方程对应的随机微分方程的数值方法,包括欧拉-丸山、米尔斯坦和随机龙格-库塔方法,讨论其在提高解精度方面的优缺点。

详情
Journal ref
IOP Publishing Ltd., Journal of Physics: Conference Series, Volume 633, conference 1, 012058, 2015
Comments
4 pages, 2 figures, presented at the 4th International Conference on Mathematical Modeling in Physical Sciences, 2015
AI中文摘要

我们推导了对应于非稳态帕克传输方程(PTE)的随机微分方程(SDE)组的强阶积分数值格式。PTE是一个5维(3个空间坐标、粒子能量和时间)福克-普朗克型方程,描述了银河宇宙射线(GCR)粒子在日球层中的非稳态传输。我们给出了在完整三维扩散张量情况下,由维纳过程驱动的所得SDE组的数值解公式。我们介绍了应用强阶欧拉-丸山、米尔斯坦和随机龙格-库塔方法的解法,并讨论了这些数值方法在提高PTE解精度方面的优缺点。

英文摘要

We derive numerical schemes for the strong-order integration of the set of stochastic differential equations (SDEs) corresponding to the non-stationary Parker transport equation (PTE). The PTE is a 5-dimensional (3 spatial coordinates, particle energy and time) Fokker-Planck-type equation describing the non-stationary transport of galactic cosmic ray (GCR) particles in the heliosphere. We present the formulas for the numerical solution of the obtained set of SDEs driven by a Wiener process in the case of the full three-dimensional diffusion tensor. We introduce the solution applying the strong-order Euler-Maruyama, Milstein and stochastic Runge-Kutta methods. We discuss the advantages and disadvantages of the presented numerical methods in the context of increasing the accuracy of the solution of the PTE.
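
To make the difference between two of the named schemes concrete, here is a minimal sketch on a scalar test equation rather than on the Parker SDEs themselves. The choice of geometric Brownian motion, the coefficients and the step count are assumptions made only because that equation has a closed-form solution to compare against; the stochastic Runge-Kutta scheme is not shown.

```python
import math
import random


def simulate_gbm(mu, sigma, x0, t_end, n_steps, rng):
    """Integrate dX = mu*X dt + sigma*X dW with two schemes on one Brownian path.

    Returns the Euler-Maruyama, Milstein and exact values of X at time t_end.
    """
    dt = t_end / n_steps
    x_em = x_mil = x0
    w = 0.0  # running value of the Wiener process
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        w += dw
        # Euler-Maruyama step (strong order 0.5).
        x_em += mu * x_em * dt + sigma * x_em * dw
        # Milstein adds 0.5*b*b'*(dW^2 - dt); here b(x) = sigma*x, b'(x) = sigma.
        x_mil += (mu * x_mil * dt + sigma * x_mil * dw
                  + 0.5 * sigma * sigma * x_mil * (dw * dw - dt))
    exact = x0 * math.exp((mu - 0.5 * sigma * sigma) * t_end + sigma * w)
    return x_em, x_mil, exact


def mean_strong_errors(n_paths=200, seed=0):
    """Average absolute end-point error of each scheme over many paths."""
    rng = random.Random(seed)
    err_em = err_mil = 0.0
    for _ in range(n_paths):
        x_em, x_mil, exact = simulate_gbm(0.1, 0.5, 1.0, 1.0, 100, rng)
        err_em += abs(x_em - exact)
        err_mil += abs(x_mil - exact)
    return err_em / n_paths, err_mil / n_paths


err_em, err_mil = mean_strong_errors()
print(f"Euler-Maruyama error {err_em:.4f}, Milstein error {err_mil:.4f}")
```

The extra Milstein term costs one derivative of the diffusion coefficient per step, which is the kind of accuracy-versus-effort trade-off the abstract refers to.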

1509.06523 2026-06-04 astro-ph.SR cs.NA math.NA physics.plasm-ph physics.space-ph stat.CO

Stochastic approach to the numerical solution of the non-stationary Parker's transport equation

非稳态帕克传输方程数值解的随机方法

A. Wawrzynczak, R. Modzelewska, A. Gil

AI总结 本文提出基于随机微分方程求解非稳态帕克传输方程的方法,采用球坐标系推导等效方程,并验证了该模型与Forbush下降实验数据的一致性。

详情
Journal ref
IOP Publishing, Journal of Physics: Conference Series, 574, 012078, 2015, (Web of Science)
Comments
4 pages, 2 figures, presented at the International Conference on Mathematical Modeling in Physical Sciences, 2014
AI中文摘要

我们介绍了新发展的银河宇宙射线(GCR)粒子在日球层中传输的随机模型。数学上,描述湍流介质中带电粒子非稳态传输的帕克传输方程(PTE)属于福克-普朗克型,是一个二阶抛物型、含时的四维偏微分方程(3个空间坐标和粒子能量/刚度)。值得注意的是,若假设稳态情况,它仍是关于粒子刚度R的三维抛物型问题;若固定能量,它仍是关于时间的三维抛物型问题。所提出的数值解法基于求解与帕克传输方程等价的随机微分方程(SDEs)组。我们介绍了在日心球坐标系中针对逆向方法从PTE推导等价SDEs的方法。所得的GCR强度Forbush下降的随机模型与实验数据一致。讨论了PTE正向和逆向求解的优缺点。

英文摘要

We present a newly developed stochastic model of the transport of galactic cosmic ray (GCR) particles in the heliosphere. Mathematically, the Parker transport equation (PTE), which describes the non-stationary transport of charged particles in a turbulent medium, is of the Fokker-Planck type. It is a second-order parabolic time-dependent 4-dimensional (3 spatial coordinates and particle energy/rigidity) partial differential equation. It is worth mentioning that, if we assume the stationary case, it remains a 3-D parabolic-type problem with respect to the particle rigidity R. If we fix the energy, it still remains a 3-D parabolic-type problem with respect to time. The proposed method of numerical solution is based on solving the system of stochastic differential equations (SDEs) equivalent to the Parker transport equation. We present the method of deriving from the PTE the equivalent SDEs in the heliocentric spherical coordinate system for the backward approach. The obtained stochastic model of the Forbush decrease of the GCR intensity is in agreement with the experimental data. The advantages and disadvantages of the forward and the backward solution of the PTE are discussed.

1509.06519 2026-06-04 astro-ph.SR cs.NA math.NA physics.space-ph stat.CO

A stochastic method of solution of the Parker transport equation

求解帕克传输方程的随机方法

A. Wawrzynczak, R. Modzelewska, A. Gil

AI总结 本文提出了一种基于帕克传输方程的随机模型,用于描述银河宇宙射线在日球层内的传输,通过求解随机微分方程来模拟福布什下降和27天变化现象,与有限差分法结果一致。

详情
Journal ref
IOP Publishing, Journal of Physics: Conference Series, 2015, 1742-6596, 632, 012084, (Web of Science)
Comments
8 pages, 7 figures, presented at the 24th European Cosmic Ray Symposium 2014
AI中文摘要

我们提出了银河宇宙射线(GCR)粒子在日球层内传输的随机模型。基于帕克传输方程的解,我们开发了描述GCR强度短期变化的模型,即福布什下降(Fd)和27天变化。帕克传输方程作为福克-普朗克型方程,描述了在湍流介质中带电粒子的非稳态传输。所提出的数值解法基于求解一组等价的随机微分方程(SDEs)。我们展示了如何在日心球坐标系中,针对反向方法,从帕克传输方程推导出对应的SDEs。强调了反向方法相较于正向方法的优势。我们比较了福布什下降和27天变化的随机模型结果与之前由有限差分法建立的模型结果。两种模型都与实验数据一致。

英文摘要

We present a stochastic model of the transport of galactic cosmic ray (GCR) particles in the heliosphere. Based on the solution of the Parker transport equation, we developed models of the short-time variations of the GCR intensity, i.e. the Forbush decrease (Fd) and the 27-day variation of the GCR intensity. The Parker transport equation, being a Fokker-Planck-type equation, describes the non-stationary transport of charged particles in a turbulent medium. The presented numerical approach is based on solving the set of equivalent stochastic differential equations (SDEs). We demonstrate the method of deriving from the Parker transport equation the corresponding SDEs in the heliocentric spherical coordinate system for the backward approach. Features indicating the advantages of the backward approach over the forward one are stressed. We compare the outcomes of the stochastic model of the Fd and the 27-day variation of the GCR intensity with our former models established by the finite difference method. Both models are in agreement with the experimental data.

1509.05722 2026-06-04 stat.ML cs.AI cs.MA cs.SY eess.SY

Energy saving in smart homes based on consumer behaviour: A case study

基于消费者行为的智能家庭节能:一个案例研究

Michael Zehnder, Holger Wache, Hans-Friedrich Witschel, Danilo Zanatta, Miguel Rodriguez

AI总结 本文提出一个节能推荐系统,通过分析消费者行为数据,利用机器学习建议减少家庭能耗,同时保持居住舒适度。

详情
Comments
To be presented at the IEEE International Smart Cities Conference 2015
AI中文摘要

本文介绍了一个推荐系统案例,该系统可帮助智能家庭在不降低居住舒适度的情况下节省能源。系统利用消费者行为数据,通过机器学习建议居民采取节能行动。系统从Digitalstrom家庭自动化系统提供的事件数据中挖掘频繁和周期性模式,将这些模式转换为关联规则,并与居民当前行为进行比较。如果系统检测到可以在不降低舒适度的情况下节能的机会,它会向居民发送推荐。

英文摘要

This paper presents a case study of a recommender system that can be used to save energy in smart homes without lowering the comfort of the inhabitants. We present an algorithm that relies only on consumer behavior data and uses machine learning to suggest actions for inhabitants to reduce the energy consumption of their homes. The system mines for frequent and periodic patterns in the event data provided by the Digitalstrom home automation system. These patterns are converted into association rules, prioritized and compared with the current behavior of the inhabitants. If the system detects an opportunity to save energy without decreasing the comfort level, it sends a recommendation to the residents.

1502.03697 2026-06-04 stat.CO cs.SY eess.SY math.OC

Nonlinear state space smoothing using the conditional particle filter

非线性状态空间平滑中的条件粒子滤波

Andreas Svensson, Thomas B. Schön, Manon Kok

AI总结 本文提出一种基于条件粒子滤波的非线性状态空间平滑方法,通过马尔可夫链蒙特卡洛方式迭代计算,具有渐近收敛性,应用于超宽带与加速度计/陀螺仪传感器融合的室内定位问题。

详情
Comments
Accepted for the 17th IFAC Symposium on System Identification (SYSID), Beijing, China, October 2015
AI中文摘要

为了估计非线性状态空间模型中的平滑分布,我们应用带有祖先采样技术的条件粒子滤波。这提供了一种马尔可夫链蒙特卡洛风格的迭代算法,具有渐近收敛性结果。分析了计算复杂性,并成功将所提出算法应用于超宽带与加速度计/陀螺仪测量之间的传感器融合难题。它似乎成为现有非线性平滑算法,特别是前向过滤-后向模拟平滑器的一种有竞争力的替代方案。

英文摘要

To estimate the smoothing distribution in a nonlinear state space model, we apply the conditional particle filter with ancestor sampling. This gives an iterative algorithm in a Markov chain Monte Carlo fashion, with asymptotic convergence results. The computational complexity is analyzed, and our proposed algorithm is successfully applied to the challenging problem of sensor fusion between ultra-wideband and accelerometer/gyroscope measurements for indoor positioning. It appears to be a competitive alternative to existing nonlinear smoothing algorithms, in particular the forward filtering-backward simulation smoother.

1509.04580 2026-06-04 stat.ML cs.SY eess.SY

Maximum Correntropy Kalman Filter

最大相关熵卡尔曼滤波器

Badong Chen, Xi Liu, Haiquan Zhao, José C. Príncipe

AI总结 本文提出最大相关熵卡尔曼滤波器,用于改善传统卡尔曼滤波在非高斯噪声下的鲁棒性,通过最大相关熵准则替代最小均方误差准则,提供更稳定的估计。

详情
Comments
11 pages, 11 figures, 7 tables
AI中文摘要

传统卡尔曼滤波(KF)是在著名的最小均方误差(MMSE)准则下推导的,这在高斯假设下最优。然而,当信号非高斯,尤其是系统受重尾脉冲噪声干扰时,KF性能会严重下降。为了提高KF对脉冲噪声的鲁棒性,本文提出新的卡尔曼滤波器,称为最大相关熵卡尔曼滤波器(MCKF),采用稳健的最大相关熵准则(MCC)作为最优准则,而非MMSE。与传统KF类似,MCKF使用状态均值和协方差矩阵传播方程来提供状态和协方差矩阵的先验估计。然后采用新的不动点算法更新后验估计。给出了保证不动点算法收敛的充分条件。通过示例演示了新算法的有效性和鲁棒性。

英文摘要

The traditional Kalman filter (KF) is derived under the well-known minimum mean square error (MMSE) criterion, which is optimal under the Gaussian assumption. However, when the signals are non-Gaussian, especially when the system is disturbed by heavy-tailed impulsive noise, the performance of the KF deteriorates seriously. To improve the robustness of the KF against impulsive noise, we propose in this work a new Kalman filter, called the maximum correntropy Kalman filter (MCKF), which adopts the robust maximum correntropy criterion (MCC) as the optimality criterion, instead of the MMSE. As in the traditional KF, the state mean and covariance matrix propagation equations are used to give prior estimates of the state and covariance matrix in the MCKF. A novel fixed-point algorithm is then used to update the posterior estimates. A sufficient condition that guarantees the convergence of the fixed-point algorithm is given. Illustrative examples are presented to demonstrate the effectiveness and robustness of the new algorithm.
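
The contrast between the MMSE and correntropy criteria can be seen on a much simpler problem than filtering. The sketch below is not the MCKF: it estimates a single constant from noisy readings by a fixed-point iteration on the Gaussian-kernel correntropy objective, with a hypothetical kernel width and data set chosen for illustration.

```python
import math
import statistics


def mcc_location(samples, kernel_width=1.0, tol=1e-10, max_iter=100):
    """Fixed-point iteration for the location that maximizes correntropy.

    Maximizes sum_i exp(-(y_i - x)^2 / (2 * kernel_width^2)) over x.
    Setting the derivative to zero gives x = sum(w_i * y_i) / sum(w_i),
    where the weights w_i depend on x, hence the iteration.
    """
    x = statistics.median(samples)  # robust starting point
    for _ in range(max_iter):
        weights = [math.exp(-((y - x) ** 2) / (2.0 * kernel_width ** 2))
                   for y in samples]
        x_new = sum(w * y for w, y in zip(weights, samples)) / sum(weights)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


readings = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]   # one impulsive outlier
mmse_estimate = sum(readings) / len(readings)   # least-squares answer: the mean
mcc_estimate = mcc_location(readings)
print(f"mean {mmse_estimate:.3f}, correntropy estimate {mcc_estimate:.3f}")
```

The outlier receives a kernel weight that is numerically zero, so it no longer drags the estimate, whereas the mean is pulled far from the bulk of the data.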

1509.04332 2026-06-04 eess.SY cs.SY math.OC stat.ML

Learning without Recall by Random Walks on Directed Graphs

通过在有向图上随机游走实现无回忆学习

Mohammad Amin Rahimian, Shahin Shahrampour, Ali Jadbabaie

AI总结 本文提出一种基于随机游走的无回忆学习模型,通过局部观察和邻居随机选择,实现快速准确的状态学习。

详情
Comments
6 pages, To Appear in Conference on Decision and Control 2015
AI中文摘要

我们考虑一个由智能体组成的网络,其目标是通过私人观察和信念交换学习未知的世界状态。每个时间点,智能体观察到的私人信号基于真实未知状态生成。某些智能体可能无法仅凭私人观察区分真实状态,当其他状态在观察上等价于真实状态时。为克服这一缺陷,智能体必须相互交流以利用局部观察。我们提出了一种模型,其中每个智能体在每个时间点随机选择一个邻居,然后利用其私人信号和该邻居的先验信息来细化其观点。该规则可视为一个无法回忆用于推断的贝叶斯智能体。这种无回忆学习方法保留了贝叶斯推断的一些方面,同时具有计算可行性。通过建立与网络图上随机游走的对应关系,我们证明在所描述的协议下,智能体以指数速度几乎确定性地学习真相。渐近速率表示为每个智能体信号结构的相对熵之和,加权以随机游走的平稳分布。

英文摘要

We consider a network of agents that aim to learn some unknown state of the world using private observations and exchange of beliefs. At each time, agents observe private signals generated based on the true unknown state. Each agent might not be able to distinguish the true state based only on her private observations. This occurs when some other states are observationally equivalent to the true state from the agent's perspective. To overcome this shortcoming, agents must communicate with each other to benefit from local observations. We propose a model where each agent selects one of her neighbors randomly at each time. Then, she refines her opinion using her private signal and the prior of that particular neighbor. The proposed rule can be thought of as a Bayesian agent who cannot recall the priors based on which other agents make inferences. This learning without recall approach preserves some aspects of the Bayesian inference while being computationally tractable. By establishing a correspondence with a random walk on the network graph, we prove that under the described protocol, agents learn the truth exponentially fast in the almost sure sense. The asymptotic rate is expressed as the sum of the relative entropies between the signal structures of every agent weighted by the stationary distribution of the random walk.
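
A toy simulation of the update rule described above, under assumptions that are not from the paper: three agents on a complete graph, two candidate states, binary signals, and only agent 0 receiving informative signals. Each agent applies Bayes rule to her own signal, using the current belief of one randomly chosen neighbor as the prior.

```python
import random

# Hypothetical setup: 3 agents, 2 candidate states, state 0 is the truth.
TRUE_STATE = 0
# LIKELIHOOD[i][theta] = P(signal = 1 | theta) for agent i.
# Agents 1 and 2 cannot tell the states apart from their own signals.
LIKELIHOOD = [(0.7, 0.3), (0.5, 0.5), (0.5, 0.5)]
NEIGHBORS = {0: [1, 2], 1: [0, 2], 2: [0, 1]}


def signal_probability(agent, signal, theta):
    p_one = LIKELIHOOD[agent][theta]
    return p_one if signal == 1 else 1.0 - p_one


def run(n_steps=1000, seed=0):
    rng = random.Random(seed)
    beliefs = [[0.5, 0.5] for _ in LIKELIHOOD]  # uniform initial beliefs
    for _ in range(n_steps):
        updated = []
        for agent in range(len(LIKELIHOOD)):
            signal = 1 if rng.random() < LIKELIHOOD[agent][TRUE_STATE] else 0
            neighbor = rng.choice(NEIGHBORS[agent])
            # Bayes rule with the neighbor's current belief used as the prior.
            posterior = [signal_probability(agent, signal, theta)
                         * beliefs[neighbor][theta] for theta in (0, 1)]
            norm = sum(posterior)
            updated.append([p / norm for p in posterior])
        beliefs = updated  # synchronous update of all agents
    return beliefs


final_beliefs = run()
print([round(b[TRUE_STATE], 4) for b in final_beliefs])
```

Tracing one agent's belief backwards in time follows a random walk over the graph, and every visit to agent 0 adds an informative log-likelihood ratio, which is how the uninformed agents also end up learning the state.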

1509.00728 2026-06-04 math.OC cs.CV cs.MA cs.NA math.NA stat.ML

On Transitive Consistency for Linear Invertible Transformations between Euclidean Coordinate Systems

关于欧几里得坐标系统之间线性可逆变换的传递一致性

Johan Thunberg, Florian Bernard, Jorge Goncalves

AI总结 本文研究了如何同步非传递一致的线性可逆变换,提出两种同步方法及迭代Gauss-Newton方法,适用于不同图拓扑,并通过仿真验证了方法的有效性。

详情
Comments
25 pages
AI中文摘要

传递一致性是欧几里得坐标框架之间线性可逆变换集合的内在属性。在实践中,当变换由数据估计时,这一属性往往缺失。本文解决如何同步非传递一致的变换的问题。一旦变换被同步,它们将满足传递一致性条件:从框架A到框架C的变换等于先从A到B再从B到C的复合变换。坐标框架对应图中的节点,变换对应图中的边。针对不同的图拓扑提出了两种直接(集中式)同步方法,分别适用于准强连通图和连通图。作为第二种方法的扩展,提出了迭代Gauss-Newton方法,并将其推广到仿射变换和欧几里得变换的情况。还提出了适用于正交矩阵的两种分布式同步方法,它们可以看作上述两种直接(集中式)方法的分布式版本,性质上类似于用于分布式平均的标准共识协议。当变换为正交矩阵时,可以计算最优性间隙的上界。仿真显示,即使在噪声幅度较大的情况下,该界也几乎是紧的。本文还在理论层面给出了传递一致变换所满足的线性代数关系。所提出方法的一个优点是简单:只使用基本的线性代数方法,例如奇异值分解(SVD)。这些方法在广泛的参数设置范围内得到了数值验证。

英文摘要

Transitive consistency is an intrinsic property for collections of linear invertible transformations between Euclidean coordinate frames. In practice, when the transformations are estimated from data, this property is lacking. This work addresses the problem of synchronizing transformations that are not transitively consistent. Once the transformations have been synchronized, they satisfy the transitive consistency condition: a transformation from frame $A$ to frame $C$ is equal to the composite transformation of first transforming $A$ to $B$ and then transforming $B$ to $C$. The coordinate frames correspond to nodes in a graph and the transformations correspond to edges in the same graph. Two direct or centralized synchronization methods are presented for different graph topologies; the first one for quasi-strongly connected graphs, and the second one for connected graphs. As an extension of the second method, an iterative Gauss-Newton method is presented, which is later adapted to the case of affine and Euclidean transformations. Two distributed synchronization methods are also presented for orthogonal matrices, which can be seen as distributed versions of the two direct or centralized methods; they are similar in nature to standard consensus protocols used for distributed averaging. When the transformations are orthogonal matrices, a bound on the optimality gap can be computed. Simulations show that the gap is almost tight, even for noise large in magnitude. This work also contributes on a theoretical level by providing linear algebraic relationships for transitively consistent transformations. One of the benefits of the proposed methods is their simplicity: basic linear algebraic methods are used, e.g., the Singular Value Decomposition (SVD). For a wide range of parameter settings, the methods are numerically validated.

1406.2082 2026-06-04 stat.ML cs.LG cs.NA math.NA math.OC stat.AP

Fast and Flexible ADMM Algorithms for Trend Filtering

快速且灵活的ADMM算法用于趋势过滤

Aaditya Ramdas, Ryan J. Tibshirani

AI总结 本文提出一种快速稳健的算法用于趋势过滤,解决其在大规模数据下的计算问题,并展示其在稀疏趋势过滤和保序趋势过滤中的扩展性。

详情
Comments
22 pages, 10 figures; published in Journal of Computational and Graphical Statistics, 2015
AI中文摘要

本文提出了一种快速且稳健的算法用于趋势过滤,一种最近发展的非参数回归工具。已证明,对于导数为有界变差的函数的估计,趋势过滤达到极小极大最优误差率,而平滑样条和核方法等其他流行方法无法达到。然而,阻碍其更广泛实际应用的是缺乏可扩展且数值稳定的算法来拟合趋势过滤估计。本文提出了一种高效的专用ADMM程序用于趋势过滤。我们的算法可与当前使用的专用内点方法相媲美,但数值上稳健得多。此外,所提出的ADMM实现非常简单,而且重要的是,它足够灵活,可以扩展到许多有趣的相关问题,如稀疏趋势过滤和保序趋势过滤。我们的方法的软件以C和R语言免费提供。

英文摘要

This paper presents a fast and robust algorithm for trend filtering, a recently developed nonparametric regression tool. It has been shown that, for estimating functions whose derivatives are of bounded variation, trend filtering achieves the minimax optimal error rate, while other popular methods like smoothing splines and kernels do not. Standing in the way of a more widespread practical adoption, however, is a lack of scalable and numerically stable algorithms for fitting trend filtering estimates. This paper presents a highly efficient, specialized ADMM routine for trend filtering. Our algorithm is competitive with the specialized interior point methods that are currently in use, and yet is far more numerically robust. Furthermore, the proposed ADMM implementation is very simple, and importantly, it is flexible enough to extend to many interesting related problems, such as sparse trend filtering and isotonic trend filtering. Software for our method is freely available, in both the C and R languages.

1507.03194 2026-06-04 stat.ML cs.LG cs.NA math.NA

A Review of Nonnegative Matrix Factorization Methods for Clustering

聚类分析中非负矩阵因子化方法综述

Ali Caner Türkmen

AI总结 本文综述了非负矩阵因子化方法在聚类中的应用,探讨了多种变体及其聚类解释。

详情
AI中文摘要

非负矩阵因子化(NMF)最初作为低秩矩阵近似技术引入,已广泛应用于多个领域。尽管NMF乍看似乎与聚类问题无关,但研究表明二者紧密相关。本文首先对聚类和NMF作了入门介绍,随后回顾了二者之间的理论关系,然后探讨了多种NMF变体,即稀疏NMF、投影NMF、非负谱聚类和Cluster-NMF,以及它们的聚类解释。

英文摘要

Nonnegative Matrix Factorization (NMF) was first introduced as a low-rank matrix approximation technique, and has enjoyed a wide area of applications. Although NMF does not seem related to the clustering problem at first, it was shown that they are closely linked. In this report, we provide a gentle introduction to clustering and NMF before reviewing the theoretical relationship between them. We then explore several NMF variants, namely Sparse NMF, Projective NMF, Nonnegative Spectral Clustering and Cluster-NMF, along with their clustering interpretations.

1508.05514 2026-06-04 stat.ML cs.CV cs.LG cs.RO cs.SY eess.SY

Gaussian Mixture Reduction Using Reverse Kullback-Leibler Divergence

基于反向Kullback-Leibler散度的高斯混合减少

Tohid Ardeshiri, Umut Orguner, Emre Özkan

AI总结 本文提出一种贪心混合减少算法,基于Kullback-Leibler散度进行混合成分的剪枝与合并,通过分析近似方法提高计算效率,并在模拟和实际数据中验证其性能优于现有方法。

详情
AI中文摘要

我们提出了一种贪心的混合减少算法,能够基于Kullback-Leibler散度(KLD)剪枝和合并混合成分。该算法不同于已知的Runnalls基于KLD的方法,因为它不限于合并操作。剪枝能力(除合并外)使算法在减少过程中能够保留原始混合的峰值。通过分析近似方法来避免KLD的计算不可行性,从而得到一个计算高效的算法。所提出的算法在两个数值示例中与Runnalls和Williams的方法进行比较,使用模拟和实际数据。结果表明,所提出的方法在性能和计算复杂度方面使其成为现有混合减少方法的高效替代方案。

英文摘要

We propose a greedy mixture reduction algorithm which is capable of pruning mixture components as well as merging them based on the Kullback-Leibler divergence (KLD). The algorithm is distinct from the well-known Runnalls' KLD based method since it is not restricted to merging operations. The capability of pruning (in addition to merging) gives the algorithm the ability of preserving the peaks of the original mixture during the reduction. Analytical approximations are derived to circumvent the computational intractability of the KLD which results in a computationally efficient method. The proposed algorithm is compared with Runnalls' and Williams' methods in two numerical examples, using both simulated and real world data. The results indicate that the performance and computational complexity of the proposed approach make it an efficient alternative to existing mixture reduction methods.
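
The two elementary operations the abstract distinguishes, merging and pruning, can be sketched for a one-dimensional Gaussian mixture as below. The merge shown is the standard moment-preserving merge; the rule for deciding which operation to apply (the paper's reverse-KLD criterion and its approximations) is not reproduced here, and the example mixture is hypothetical.

```python
def merge(c1, c2):
    """Moment-preserving merge of two 1-D Gaussian components (weight, mean, var)."""
    w1, m1, v1 = c1
    w2, m2, v2 = c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    # Merged variance = within-component variance + spread of the two means.
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def prune(mixture, index):
    """Drop one component and renormalize the remaining weights."""
    kept = [c for i, c in enumerate(mixture) if i != index]
    total = sum(w for w, _, _ in kept)
    return [(w / total, m, v) for w, m, v in kept]


mixture = [(0.45, 0.0, 1.0), (0.45, 2.0, 1.0), (0.10, 10.0, 0.5)]
merged = merge(mixture[0], mixture[1])   # blends the two nearby modes into one
pruned = prune(mixture, 2)               # keeps both peaks, drops the far one
print(merged)
print(pruned)
```

Merging replaces two peaks with one broader component, while pruning leaves the surviving peaks at their original locations, which is the peak-preserving behavior the abstract attributes to pruning.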

1508.05406 2026-06-04 stat.ME cs.NA math.NA

A Note on Spherical Needlets

关于球面Needlets的注记

Minjie Fan

AI总结 本文对比传统球面谐波,介绍具有双域局部化的球面Needlets,能高效稀疏表示小尺度特征函数,并提供Matlab实现及示例验证其性质。

详情
Comments
12 pages, 7 figures, technical report
AI中文摘要

相较于传统球面谐波,球面Needlets是一种新的球面小波,具有若干吸引人的特性。它们在空间和频率域中的双局部化特性使其能够高效且稀疏地表示具有小尺度特征函数。本文分为两部分。首先,回顾球面谐波并讨论其在表示具有小尺度特征函数时的局限性。为克服这些局限性,本文介绍了球面Needlets及其吸引人的特性。在本文的第二部分中,呈现了一个用于球面Needlets的Matlab工具包。通过该工具包,用多个示例展示了球面Needlets的特性。

英文摘要

Compared with the traditional spherical harmonics, the spherical needlets are a new generation of spherical wavelets that possess several attractive properties. Their double localization in both spatial and frequency domains empowers them to easily and sparsely represent functions with small spatial scale features. This paper is divided into two parts. First, it reviews the spherical harmonics and discusses their limitations in representing functions with small spatial scale features. To overcome the limitations, it introduces the spherical needlets and their attractive properties. In the second part of the paper, a Matlab package for the spherical needlets is presented. The properties of the spherical needlets are demonstrated by several examples using the package.

1501.02513 2026-06-04 math.PR econ.GN math.ST q-fin.EC stat.TH

The 20-60-20 Rule

20-60-20 规则

Piotr Jaworski, Marcin Pitera

AI总结 本文探讨了20-60-20规则,指出在任意基准标准下,将人群分为三组时,该比例能实现平衡,有助于高效管理。通过多变量正态分布数学证明,该比例在考虑分散性和线性依赖时导致全局均衡状态。

详情
AI中文摘要

本文讨论了一个经验现象,称为20-60-20规则。它指出,若根据某种任意基准标准将人群分为三组,则此比例暗示某种平衡。从实际角度看,这一特征常导致高效的管理或控制。我们提供了一个数学示例,证明该规则在许多现实情况中出现。我们展示,对于任何可以用多变量正态向量描述的人群,当考虑分散性和线性依赖测量时,此固定比例会导致全局均衡状态。

英文摘要

In this paper we discuss an empirical phenomenon known as the 20-60-20 rule. It says that if we split the population into three groups, according to some arbitrary benchmark criterion, then this particular ratio implies some sort of balance. From a practical point of view, this feature often leads to efficient management or control. We provide a mathematical illustration, justifying the occurrence of this rule in many real-world situations. We show that for any population that can be described using a multivariate normal vector, this fixed ratio leads to a global equilibrium state when dispersion and linear dependence measurements are considered.
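
One way to see where a ratio near 20-60-20 can come from is a univariate calculation. The sketch below rests on an assumption that the abstract does not spell out: "balance" is read as equal conditional variance in the lower, middle and upper groups of a standard normal benchmark split at −q and q. It uses the textbook truncated-normal variance formulas and solves for q by bisection.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def middle_variance(q):
    """Variance of a standard normal conditioned on |X| < q."""
    return 1.0 - 2.0 * q * phi(q) / (2.0 * cdf(q) - 1.0)


def tail_variance(q):
    """Variance of a standard normal conditioned on X > q (same for X < -q)."""
    hazard = phi(q) / (1.0 - cdf(q))
    return 1.0 + q * hazard - hazard * hazard


def balanced_cutoff(lo=0.1, hi=3.0, iterations=100):
    """Bisection for the q at which middle and tail variances coincide."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if middle_variance(mid) < tail_variance(mid):
            lo = mid   # middle group still too concentrated: widen it
        else:
            hi = mid
    return 0.5 * (lo + hi)


q_star = balanced_cutoff()
tail_share = 1.0 - cdf(q_star)
middle_share = 1.0 - 2.0 * tail_share
print(f"q = {q_star:.4f}, shares = {tail_share:.3f} / {middle_share:.3f} / {tail_share:.3f}")
```

The middle variance grows with q while the tail variance shrinks, so the two curves cross exactly once and bisection is sufficient.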

1412.2291 2026-06-04 stat.CO cs.CG cs.CV cs.NA math.NA

Adjusted least squares fitting of algebraic hypersurfaces

修正的最小二乘法拟合代数超曲面

Konstantin Usevich, Ivan Markovsky

AI总结 本文提出修正的最小二乘法用于拟合欧几里得空间中的点集,通过构造偏倚修正的矩矩阵解决普通最小二乘法的偏倚问题,并改进了计算算法。

详情
Comments
30 pages, 10 figures
AI中文摘要

我们考虑用代数超曲面拟合欧几里得空间中点集的问题。假设真实超曲面由多项式方程描述,其上的点受均值为零的独立高斯噪声干扰,我们估计真实多项式方程的系数。修正的最小二乘估计器考虑了普通最小二乘估计器中的偏倚。该估计器基于构造一个准Hankel矩阵,这是一个偏倚修正的矩矩阵。对于未知噪声方差的情况,估计器定义为多项式特征值问题的解。本文提出了关于修正最小二乘估计器不变性性质的新结果,并改进了计算估计器的算法,适用于任意多项式方程中的单项式集。

英文摘要

We consider the problem of fitting a set of points in Euclidean space by an algebraic hypersurface. We assume that points on a true hypersurface, described by a polynomial equation, are corrupted by zero mean independent Gaussian noise, and we estimate the coefficients of the true polynomial equation. The adjusted least squares estimator accounts for the bias present in the ordinary least squares estimator. The adjusted least squares estimator is based on constructing a quasi-Hankel matrix, which is a bias-corrected matrix of moments. For the case of unknown noise variance, the estimator is defined as a solution of a polynomial eigenvalue problem. In this paper, we present new results on invariance properties of the adjusted least squares estimator and an improved algorithm for computing the estimator for an arbitrary set of monomials in the polynomial equation.

1508.04467 2026-06-04 cs.CV cs.IT cs.LG cs.NA math.IT math.NA stat.ML

Robust Subspace Clustering via Smoothed Rank Approximation

通过平滑秩近似实现鲁棒子空间聚类

Zhao Kang, Chong Peng, Qiang Cheng

AI总结 本文提出基于对数-行列式秩近似的方法,用于子空间聚类,以提高精度并有效处理误差和噪声。

详情
Journal ref
IEEE Signal Processing Letters, 22(2015)2088-2092
Comments
Journal, code is available
AI中文摘要

受仿射约束的矩阵秩最小化出现在从信号处理到机器学习的许多应用领域中。核范数是该问题的凸松弛,可以在某些受限且理论有趣的条件下精确恢复秩。然而,对于许多现实应用,用核范数近似秩函数只能得到远离最优解的结果。为了寻求比核范数更准确的解,本文提出基于对数-行列式的秩近似,并将其应用于子空间聚类。我们的框架可以建模不同类型的误差和噪声。我们开发了有效的优化策略,并在理论上保证其收敛到驻点。与最先进的子空间聚类算法相比,所提出的方法在人脸聚类和运动分割任务上取得了有希望的结果。

英文摘要

Matrix rank minimization subject to affine constraints arises in many application areas, ranging from signal processing to machine learning. The nuclear norm is a convex relaxation for this problem which can recover the rank exactly under some restricted and theoretically interesting conditions. However, for many real-world applications, the nuclear norm approximation to the rank function can only produce a result far from the optimum. To seek a solution of higher accuracy than the nuclear norm, in this paper we propose a rank approximation based on the logarithm-determinant. We consider using this rank approximation for subspace clustering. Our framework can model different kinds of errors and noise. An effective optimization strategy is developed with a theoretical guarantee of convergence to a stationary point. The proposed method gives promising results on face clustering and motion segmentation tasks compared to the state-of-the-art subspace clustering algorithms.

1508.04158 2026-06-04 stat.ME cs.SY eess.SY

Distributed multi-object tracking over sensor networks: a random finite set approach

传感器网络上的分布式多目标跟踪:一种随机有限集方法

Claudio Fantacci

AI总结 本文提出基于随机有限集的分布式多目标跟踪方法,通过信息论数据融合和共识算法实现多节点协同跟踪,解决异构地理分散节点的多目标跟踪问题。

详情
Comments
Ph.D. thesis of Claudio Fantacci, Università di Firenze, Dipartimento di Ingegneria dell'Informazione (DINFO), Florence, Italy. Successfully defended on 5 March 2015
AI中文摘要

本文旨在解决异构且地理分散的节点网络中的分布式跟踪问题。跟踪在贝叶斯框架下进行,通过信息论方法的数据融合,利用共识算法和概率密度函数的Kullback-Leibler平均(KLA)扩展到分布式场景。首先考虑单个移动目标,各节点通过共识算法传播信息以实现跟踪。接着研究多个移动目标的分布式估计,采用随机有限集(RFS)方法以处理可能变化的多目标数量。最后,通过带标签的RFS框架实现具有唯一目标身份的分布式滤波,发展出高效且可扩展的多目标跟踪滤波器。

英文摘要

The aim of the present dissertation is to address distributed tracking over a network of heterogeneous and geographically dispersed nodes (or agents) with sensing, communication and processing capabilities. Tracking is carried out in the Bayesian framework and its extension to a distributed context is made possible via an information-theoretic approach to data fusion which exploits consensus algorithms and the notion of Kullback-Leibler Average (KLA) of the Probability Density Functions (PDFs) to be fused. The first step toward distributed tracking considers a single moving object. Consensus takes place in each agent for spreading information over the network so that each node can track the object. To achieve such a goal, consensus is carried out on the local single-object posterior distribution, which is the result of local data processing, in the Bayesian setting, exploiting the last available measurement about the object. The next step is in the direction of distributed estimation of multiple moving objects. In order to model, in a rigorous and elegant way, a possibly time-varying number of objects present in a given area of interest, the Random Finite Set (RFS) formulation is adopted, since it provides the notion of probability density for multi-object states, which allows existing tools in distributed estimation to be extended directly to multi-object tracking. The last theoretical step of the present dissertation is toward distributed filtering with the further requirement of unique object identities. To this end the labeled RFS framework is adopted, as it provides a tractable approach to the multi-object Bayesian recursion. A generalization of the KLA to the labeled RFS framework enables the development of novel consensus multi-object tracking filters which are fully distributed, scalable and computationally efficient.
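
For the single-object case, the fusion step can be sketched with one-dimensional Gaussian posteriors. The sketch relies on the known property that the Kullback-Leibler average of Gaussians is linear in information form (precisions and precision-weighted means are averaged); the uniform consensus weights, the three-node line graph and the local posteriors are assumptions for illustration and are not taken from the thesis.

```python
def kl_average(components, weights):
    """Weighted Kullback-Leibler average of 1-D Gaussians given as (mean, var).

    The weights are assumed to be non-negative and to sum to one.
    """
    # In information form the average is linear: precisions and
    # precision-weighted means are combined with the same weights.
    precision = sum(w / var for w, (_, var) in zip(weights, components))
    info_mean = sum(w * mean / var for w, (mean, var) in zip(weights, components))
    return info_mean / precision, 1.0 / precision


def consensus(local_posteriors, neighbors, n_rounds):
    """Each node repeatedly replaces its density by the KLA over itself and its neighbors."""
    current = list(local_posteriors)
    for _ in range(n_rounds):
        nxt = []
        for node in range(len(current)):
            group = [node] + neighbors[node]
            share = [1.0 / len(group)] * len(group)  # uniform weights (assumption)
            nxt.append(kl_average([current[j] for j in group], share))
        current = nxt
    return current


# Hypothetical 3-node line network 0 - 1 - 2 with local posteriors (mean, var).
posteriors = [(0.0, 1.0), (2.0, 4.0), (1.0, 2.0)]
links = {0: [1], 1: [0, 2], 2: [1]}
fused = consensus(posteriors, links, n_rounds=200)
print(fused)
```

Only neighbor-to-neighbor exchanges are used, yet after enough rounds every node holds the same fused density, which is what makes the scheme fully distributed.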

1412.8464 2026-06-04 math.NA cs.NA stat.ML

Alternating Minimization Algorithm with Automatic Relevance Determination for Transmission Tomography under Poisson Noise

交替最小化算法与自动相关性确定在泊松噪声下的传输断层成像

Yan Kaganovsky, Shaobo Han, Soysal Degirmenci, David G. Politte, David J. Brady, Joseph A. O'Sullivan, Lawrence Carin

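AI summary: The paper proposes a globally convergent alternating minimization algorithm for image reconstruction in transmission tomography under Poisson noise. It introduces additional latent variables to promote sparsity in the pixel/voxel-differences domain and reduces the problem to parallel 1D optimization tasks, which keeps it feasible for large-scale problems.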

详情
Comments
Changes relative to the previous version: (1) Minor changes in abstract; (2) The results for simulated data contain another comparison to a previous method; (3) Corrected Eq. (6.3); (4) Additional discussions in Secs.1,7,8
AI中文摘要

我们提出了一种全局收敛的交替最小化(AM)算法,用于传输断层成像中的图像重建,该算法将自动相关性确定(ARD)扩展到具有比尔定律的泊松噪声模型中。该算法通过引入额外的隐变量(每个像素/体素一个)来促进像素/体素差域的稀疏性,然后使用分层贝叶斯模型从数据中学习这些变量。重要的是,所提出的AM算法无需任何调参,图像质量可与标准惩罚似然方法相媲美。我们的算法利用优化转移原理,将问题转化为并行的1D优化任务(每个像素/体素一个),使其在大规模问题中可行。这种方法显著减少了ARD与后验方差相关的计算瓶颈。传输断层成像问题固有的正性约束也得到了强制执行。我们使用合成和实际数据集展示了所提算法在X射线计算机断层成像中的性能。该算法在高光子通量下比基于近似高斯噪声模型的先前ARD算法表现更好。

英文摘要

We propose a globally convergent alternating minimization (AM) algorithm for image reconstruction in transmission tomography, which extends automatic relevance determination (ARD) to Poisson noise models with Beer's law. The algorithm promotes solutions that are sparse in the pixel/voxel-differences domain by introducing additional latent variables, one for each pixel/voxel, and then learning these variables from the data using a hierarchical Bayesian model. Importantly, the proposed AM algorithm is free of any tuning parameters with image quality comparable to standard penalized likelihood methods. Our algorithm exploits optimization transfer principles which reduce the problem into parallel 1D optimization tasks (one for each pixel/voxel), making the algorithm feasible for large-scale problems. This approach considerably reduces the computational bottleneck of ARD associated with the posterior variances. Positivity constraints inherent in transmission tomography problems are also enforced. We demonstrate the performance of the proposed algorithm for x-ray computed tomography using synthetic and real-world datasets. The algorithm is shown to have much better performance than prior ARD algorithms based on approximate Gaussian noise models, even for high photon flux.

1412.8293 2026-06-04 stat.ML cs.LG cs.NA math.NA stat.CO

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

准蒙特卡洛特征映射用于移不变核

Haim Avron, Vikas Sindhwani, Jiyan Yang, Michael Mahoney

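AI summary: The paper proposes Quasi-Monte Carlo approximations to improve randomized Fourier feature maps and thereby speed up training and testing of kernel methods on large datasets, using low-discrepancy sequences to reduce the integration error.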

详情
Comments
A short version of this paper has been presented in ICML 2014
AI中文摘要

我们考虑如何提高随机傅里叶特征映射的效率,以加速核方法在大规模数据集上的训练和测试速度。这些近似特征映射作为蒙特卡洛近似积分表示的移不变核函数(如高斯核)的近似。本文提出使用准蒙特卡洛(QMC)近似,其中相关的积分被评估在低差异点序列上,而不是蒙特卡洛方法中的随机点集。我们推导了一个新的差异度量,称为箱差异,基于对给定序列的积分误差的理论特征。然后我们提出基于显式的箱差异最小化来学习适应我们设置的QMC序列。我们的理论分析辅以实验证明经典和自适应QMC技术在该问题上的有效性。

英文摘要

We consider the problem of improving the efficiency of randomized Fourier feature maps to accelerate training and testing speed of kernel methods on large datasets. These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g., Gaussian kernel). In this paper, we propose to use Quasi-Monte Carlo (QMC) approximations instead, where the relevant integrands are evaluated on a low-discrepancy sequence of points as opposed to random point sets as in the Monte Carlo approach. We derive a new discrepancy measure called box discrepancy based on theoretical characterizations of the integration error with respect to a given sequence. We then propose to learn QMC sequences adapted to our setting based on explicit box discrepancy minimization. Our theoretical analyses are complemented with empirical results that demonstrate the effectiveness of classical and adaptive QMC techniques for this problem.
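
The contrast between Monte Carlo and Quasi-Monte Carlo feature maps can be sketched for the Gaussian kernel, whose integral representation is an expectation of `cos(w . (x - y))` over Gaussian frequencies `w`. The example below is a minimal illustration only: it assumes a Halton sequence as the low-discrepancy sequence, mapped through the inverse normal CDF, and it does not implement the box discrepancy or the adaptive sequences of the paper. All function names and test points are hypothetical.

```python
import math
import random
from statistics import NormalDist

def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_points(count, primes):
    """First `count` Halton points (indices 1..count), one prime base per dimension."""
    return [[radical_inverse(i, p) for p in primes] for i in range(1, count + 1)]

def gaussian_frequencies(uniform_points, sigma):
    """Map points in (0,1)^d to N(0, sigma^-2 I) frequencies via the inverse CDF."""
    inv = NormalDist().inv_cdf
    return [[inv(u) / sigma for u in point] for point in uniform_points]

def kernel_estimate(x, y, frequencies):
    """Average of cos(w . (x - y)) over the given frequencies."""
    diff = [a - b for a, b in zip(x, y)]
    total = sum(math.cos(sum(w_k * d_k for w_k, d_k in zip(w, diff))) for w in frequencies)
    return total / len(frequencies)

def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

sigma = 1.0
x, y = [0.3, -0.2, 0.5], [-0.1, 0.4, 0.1]
n_features = 512

# QMC frequencies from a Halton sequence; MC frequencies from a seeded generator.
qmc_freqs = gaussian_frequencies(halton_points(n_features, [2, 3, 5]), sigma)
rng = random.Random(0)
mc_freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(3)] for _ in range(n_features)]

exact = gaussian_kernel(x, y, sigma)
qmc_error = abs(kernel_estimate(x, y, qmc_freqs) - exact)
mc_error = abs(kernel_estimate(x, y, mc_freqs) - exact)
```

Comparing `qmc_error` and `mc_error` for growing `n_features` gives a rough feel for the two approximations; a single pair of points is not evidence about which is better in general.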

1508.01071 2026-06-04 eess.SY cs.SY stat.ML

A MAP approach for $\ell_q$-norm regularized sparse parameter estimation using the EM algorithm

基于EM算法的$\ell_q$-范数正则化稀疏参数估计的MAP方法

Rodrigo Carvajal, Juan C. Agüero, Boris I. Godoy, Dimitrios Katselis

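AI summary: The paper revisits Bayesian parameter estimation under the Maximum A Posteriori (MAP) criterion from the viewpoint of the EM algorithm. A suitable prior distribution, described by variance-mean Gaussian mixtures (VMGM), introduces a sparsity-promoting penalty, and the EM formulation handles nonlinearities and hidden variables. A Coordinate Descent algorithm is also developed for comparison, with results reported from simulations.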

详情
Comments
Accepted to IEEE Machine Learning for Signal Processing Conference 2015
AI中文摘要

本文重新考虑了通过最大后验(MAP)准则进行贝叶斯参数估计的问题,通过在估计问题的成本函数中引入稀疏性促进惩罚项,利用合适先验分布来解决相应优化问题。为此,我们利用方差-均值高斯混合(VMGM)来描述先验分布,并将这些混合物的许多优点融入到估计问题中。对应的MAP估计问题完全以EM算法的形式表达,这允许处理传统方法难以处理的非线性和隐变量。为了比较目的,我们还为$\ell_q$-范数惩罚问题开发了坐标下降算法,并通过仿真展示了性能结果。

英文摘要

In this paper, Bayesian parameter estimation through the consideration of the Maximum A Posteriori (MAP) criterion is revisited under the prism of the Expectation-Maximization (EM) algorithm. By incorporating a sparsity-promoting penalty term in the cost function of the estimation problem through the use of an appropriate prior distribution, we show how the EM algorithm can be used to efficiently solve the corresponding optimization problem. To this end, we rely on variance-mean Gaussian mixtures (VMGM) to describe the prior distribution, while we incorporate many nice features of these mixtures into our estimation problem. The corresponding MAP estimation problem is completely expressed in terms of the EM algorithm, which allows for handling nonlinearities and hidden variables that cannot be easily handled with traditional methods. For comparison purposes, we also develop a Coordinate Descent algorithm for the $\ell_q$-norm penalized problem and present the performance results via simulations.

1409.2848 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate

具有指数收敛速率的随机PCA和SVD算法

Ohad Shamir

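AI summary: Proposes VR-PCA, an algorithm that uses computationally cheap stochastic iterations yet converges exponentially fast, addressing the slow convergence or computationally intensive iterations of existing methods.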

详情
Comments
Fixed a minor bug in the proof of lemma 1 (which does not affect the result)
AI中文摘要

我们描述并分析了一个简单的主成分分析和奇异值分解算法VR-PCA,该算法使用计算成本低的随机迭代,却能以指数速度收敛到最优解。与现有算法相比,现有方法要么收敛速度慢,要么迭代计算强度大,运行时间随数据规模增长。该算法基于最近的方差减少随机梯度技术,此前该技术用于强凸优化分析,而此处应用于本质上非凸的问题,采用了非常不同的分析方法。

英文摘要

We describe and analyze a simple algorithm for principal component analysis and singular value decomposition, VR-PCA, which uses computationally cheap stochastic iterations, yet converges exponentially fast to the optimal solution. In contrast, existing algorithms suffer either from slow convergence, or computationally intensive iterations whose runtime scales with the data size. The algorithm builds on a recent variance-reduced stochastic gradient technique, which was previously analyzed for strongly convex optimization, whereas here we apply it to an inherently non-convex problem, using a very different analysis.
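
The combination of cheap stochastic iterations with variance reduction can be sketched as follows. This is a minimal illustration of a variance-reduced, Oja-style update for the leading eigenvector of an empirical covariance matrix. The exact update rule, the epoch length (one pass per epoch), the step-size heuristic, and the synthetic data are assumptions of this sketch rather than details stated in the abstract.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]

def vr_pca(data, epochs, step, rng):
    """Leading eigenvector of (1/n) sum_i x_i x_i^T via variance-reduced stochastic updates."""
    n, d = len(data), len(data[0])
    snapshot = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    for _ in range(epochs):
        # Full pass, once per epoch: the exact product of the covariance with the snapshot.
        full = [0.0] * d
        for x in data:
            coeff = dot(x, snapshot) / n
            for k in range(d):
                full[k] += coeff * x[k]
        w = snapshot[:]
        for _ in range(n):  # epoch length n is an illustrative choice
            x = data[rng.randrange(n)]
            # Single-sample term corrected by the snapshot: x x^T (w - snapshot) + full.
            coeff = dot(x, w) - dot(x, snapshot)
            w = normalize([w[k] + step * (coeff * x[k] + full[k]) for k in range(d)])
        snapshot = w
    return snapshot

def power_iteration(data, iterations):
    """Deterministic reference solution that uses a full pass per iteration."""
    d = len(data[0])
    w = normalize([1.0] * d)
    for _ in range(iterations):
        w = normalize([sum(dot(x, w) * x[k] for x in data) for k in range(d)])
    return w

rng = random.Random(0)
scales = [3.0, 1.0, 0.5, 0.5]  # synthetic data with one dominant direction
data = [[s * rng.gauss(0.0, 1.0) for s in scales] for _ in range(200)]
avg_sq_norm = sum(dot(x, x) for x in data) / len(data)
step = 1.0 / (avg_sq_norm * math.sqrt(len(data)))  # step-size heuristic, an assumption

w_vr = vr_pca(data, epochs=10, step=step, rng=rng)
w_ref = power_iteration(data, iterations=200)
alignment = abs(dot(w_vr, w_ref))  # close to 1 when the two directions agree
```

Each inner iteration touches a single data point, while the full pass is amortized over an epoch; that is the trade-off the abstract contrasts with methods whose every iteration scales with the data size.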

1410.0989 2026-06-04 cs.IT cs.NA math.IT math.NA math.OC stat.ME

On the Effective Measure of Dimension in the Analysis Cosparse Model

在分析稀疏模型中有效维度的度量

Raja Giryes, Yaniv Plan, Roman Vershynin

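AI summary: The study asks whether, in the presence of noise, a cosparse signal can be accurately recovered from a number of measurements proportional to its manifold dimension. It proves that no such algorithm exists and shows through numerical simulations that convex relaxations fail when the number of measurements is comparable to the manifold dimension.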

详情
Comments
19 pages, 6 figures
AI中文摘要

近年来,低维模型在许多应用中带来了显著效益。许多信号尽管高维,但本质上低维,这使得它们可以从相对少量的测量中稳定恢复。例如,在压缩感知和矩阵补全中,所需测量数与信号的流形维度成比例(最多乘以对数因子)。最近,一种新的自然低维信号模型被提出:分析稀疏先验。在无噪声情况下,可以使用组合搜索从与信号流形维度成比例的测量中恢复信号。然而,如果要求对噪声的稳定性或高效的(多项式复杂度)求解器,现有结果要求的测量数远大于流形维度。因此,自然地提出问题:这种差距是理论和求解器的缺陷,还是仅靠流形维度恢复稀疏信号存在实际障碍?是否存在在存在噪声情况下,能从与流形维度成比例的测量中准确恢复稀疏信号的算法?本文证明不存在此类算法。进一步,通过数值模拟显示,即使在无噪声情况下,当测量数与流形维度相当时,凸松弛也失效。这为基于流形维度的压缩获取信号文献提供了实际反例。

英文摘要

Many applications have benefited remarkably from low-dimensional models in the recent decade. The fact that many signals, though high dimensional, are intrinsically low dimensional has given the possibility to recover them stably from a relatively small number of their measurements. For example, in compressed sensing with the standard (synthesis) sparsity prior and in matrix completion, the number of measurements needed is proportional (up to a logarithmic factor) to the signal's manifold dimension. Recently, a new natural low-dimensional signal model has been proposed: the cosparse analysis prior. In the noiseless case, it is possible to recover signals from this model, using a combinatorial search, from a number of measurements proportional to the signal's manifold dimension. However, if we ask for stability to noise or an efficient (polynomial complexity) solver, all the existing results demand a number of measurements which is far removed from the manifold dimension, sometimes far greater. Thus, it is natural to ask whether this gap is a deficiency of the theory and the solvers, or if there exists a real barrier in recovering the cosparse signals by relying only on their manifold dimension. Is there an algorithm which, in the presence of noise, can accurately recover a cosparse signal from a number of measurements proportional to the manifold dimension? In this work, we prove that there is no such algorithm. Further, we show through numerical simulations that even in the noiseless case convex relaxations fail when the number of measurements is comparable to the manifold dimension. This gives a practical counter-example to the growing literature on compressed acquisition of signals based on manifold dimension.

1502.03032 2026-06-04 cs.DC cs.DS cs.NA math.NA stat.ML

Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments

在并行和分布式环境中实现随机矩阵算法

Jiyan Yang, Xiangrui Meng, Michael W. Mahoney

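AI summary: The paper reviews the theory and practice of implementing randomized matrix algorithms in large-scale parallel and distributed environments, focusing on random projection and random sampling algorithms for very large $\ell_1$ and $\ell_2$ regression problems.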

详情
AI中文摘要

在大规模数据时代,基于 commodity 硬件集群的分布式系统提供了廉价可靠的数据存储和可扩展的处理能力。本文回顾了近年来在大规模并行和分布式环境中开发和实现随机矩阵算法的研究。随机算法在矩阵问题中近年来受到广泛关注,通常在理论、机器学习应用或单机实现中。我们的主要关注点是随机投影和随机采样算法在非常大且非常超定(即超约束)ℓ1和ℓ2回归问题中的底层理论和实际实现。随机化可以以两种相关的方式使用:要么构造可精确或近似求解的传统数值方法子采样问题;要么构造预处理的原始完整问题版本,以便更易用传统迭代算法求解。理论结果表明,在接近输入稀疏时间且仅需几次数据遍历的情况下,可以以高概率获得非常强的相对误差近似解。实证结果强调了各种权衡的重要性(例如,嵌入构造时间与嵌入条件质量之间的权衡,计算与通信相对重要性的权衡等),并展示了ℓ1和ℓ2回归问题可以在现有分布式系统上处理至低、中或高精度,处理数据规模可达数TB级。

英文摘要

In this era of large-scale data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable processing of massive data. Here, we review recent work on developing and implementing randomized matrix algorithms in large-scale parallel and distributed environments. Randomized algorithms for matrix problems have received a great deal of attention in recent years, thus far typically either in theory or in machine learning applications or with implementations on a single machine. Our main focus is on the underlying theory and practical implementation of random projection and random sampling algorithms for very large very overdetermined (i.e., overconstrained) $\ell_1$ and $\ell_2$ regression problems. Randomization can be used in one of two related ways: either to construct sub-sampled problems that can be solved, exactly or approximately, with traditional numerical methods; or to construct preconditioned versions of the original full problem that are easier to solve with traditional iterative algorithms. Theoretical results demonstrate that in near input-sparsity time and with only a few passes through the data one can obtain very strong relative-error approximate solutions, with high probability. Empirical results highlight the importance of various trade-offs (e.g., between the time to construct an embedding and the conditioning quality of the embedding, between the relative importance of computation versus communication, etc.) and demonstrate that $\ell_1$ and $\ell_2$ regression problems can be solved to low, medium, or high precision in existing distributed systems on up to terabyte-sized data.

1507.04396 2026-06-04 math.NA cs.LG cs.NA stat.ML

Parallel MMF: a Multiresolution Approach to Matrix Computation

并行MMF:矩阵计算的多分辨率方法

Risi Kondor, Nedelina Teneva, Pramod K. Mudrakarta

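AI summary: The paper derives pMMF, a parallel algorithm for computing the Multiresolution Matrix Factorization, and presents experiments on using it to compress matrices and to precondition large sparse linear systems.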

详情
AI中文摘要

多分辨率矩阵分解(MMF)最近被引入为一种寻找多尺度结构并定义图/矩阵上的小波的方法。在本文中,我们推导出pMMF,一种用于计算MMF分解的并行算法。经验上,pMMF的运行时间与稀疏矩阵的维度成线性关系。我们认为这使pMMF成为一种有价值的计算原语,并展示了将其用于两种不同目的的实验:压缩矩阵和预处理大型稀疏线性系统。

英文摘要

Multiresolution Matrix Factorization (MMF) was recently introduced as a method for finding multiscale structure and defining wavelets on graphs/matrices. In this paper we derive pMMF, a parallel algorithm for computing the MMF factorization. Empirically, the running time of pMMF scales linearly in the dimension for sparse matrices. We argue that this makes pMMF a valuable new computational primitive in its own right, and present experiments on using pMMF for two distinct purposes: compressing matrices and preconditioning large sparse linear systems.

1406.1102 2026-06-04 math.NA cs.LG cs.NA stat.CO stat.ML

Linear Convergence of Variance-Reduced Stochastic Gradient without Strong Convexity

方差减少随机梯度算法在无强凸性下的线性收敛性

Pinghua Gong, Jieping Ye

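AI summary: The paper studies Prox-SVRG and its projected variant VRPSG and proves that both achieve a linear convergence rate on constrained and regularized problems without strong convexity. The key ingredient of the proof is a Semi-Strongly Convex (SSC) inequality.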

详情
Comments
18 pages
AI中文摘要

随机梯度算法通过仅使用一个或几个样本估计梯度,具有低的每迭代计算成本。它们在大规模优化问题中被广泛应用。然而,由于梯度计算中的固有方差,随机梯度算法通常收敛缓慢,具有亚线性收敛速率。为加速收敛,一些方差减少随机梯度算法,如近端随机方差减少梯度(Prox-SVRG)算法,最近被提出以解决强凸问题。在强凸条件下,这些方差减少随机梯度算法实现线性收敛速率。然而,许多机器学习问题是凸但非强凸的。在本文中,我们引入Prox-SVRG及其投影变种称为方差减少投影随机梯度(VRPSG)算法,以解决广泛用于机器学习的一类非强凸优化问题。作为本文的主要技术贡献,我们证明了VRPSG和Prox-SVRG在无强凸性条件下实现线性收敛速率。证明中的关键成分是一个半强凸(SSC)不等式,这是首次严格证明用于一类非强凸问题的约束和正则化设置中的不等式。此外,SSC不等式与算法无关,可能用于分析其他随机梯度算法,这可能具有独立价值。据我们所知,这是首次在无强凸性条件下建立方差减少随机梯度算法在解决约束和正则化问题中的线性收敛速率的工作。

英文摘要

Stochastic gradient algorithms estimate the gradient based on only one or a few samples and enjoy low computational cost per iteration. They have been widely used in large-scale optimization problems. However, stochastic gradient algorithms are usually slow to converge and achieve sub-linear convergence rates, due to the inherent variance in the gradient computation. To accelerate the convergence, some variance-reduced stochastic gradient algorithms, e.g., proximal stochastic variance-reduced gradient (Prox-SVRG) algorithm, have recently been proposed to solve strongly convex problems. Under the strongly convex condition, these variance-reduced stochastic gradient algorithms achieve a linear convergence rate. However, many machine learning problems are convex but not strongly convex. In this paper, we introduce Prox-SVRG and its projected variant called Variance-Reduced Projected Stochastic Gradient (VRPSG) to solve a class of non-strongly convex optimization problems widely used in machine learning. As the main technical contribution of this paper, we show that both VRPSG and Prox-SVRG achieve a linear convergence rate without strong convexity. A key ingredient in our proof is a Semi-Strongly Convex (SSC) inequality which is the first to be rigorously proved for a class of non-strongly convex problems in both constrained and regularized settings. Moreover, the SSC inequality is independent of algorithms and may be applied to analyze other stochastic gradient algorithms besides VRPSG and Prox-SVRG, which may be of independent interest. To the best of our knowledge, this is the first work that establishes the linear convergence rate for the variance-reduced stochastic gradient algorithms on solving both constrained and regularized problems without strong convexity.

1407.3463 2026-06-04 math.NA cs.NA stat.CO stat.ME

Optimal low-rank approximations of Bayesian linear inverse problems

贝叶斯线性反问题的最优低秩近似

Alessio Spantini, Antti Solonen, Tiangang Cui, James Martin, Luis Tenorio, Youssef Marzouk

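AI summary: The paper studies approximating the posterior distribution of Bayesian linear inverse problems using a low-dimensional subspace. It proves optimality of a low-rank update of the prior covariance and proposes fast posterior mean approximations suited to repeated evaluations over multiple data sets.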

详情
AI中文摘要

在贝叶斯反问题中,数据通常仅在参数空间的低维子空间中提供信息。通过将后验协方差矩阵视为先验协方差矩阵的低秩更新,本文证明了基于Hessian矩阵和先验精度定义的矩阵 pencils 的主导特征方向的更新在广泛损失函数下最优。同时提出了两种快速后验均值近似,并证明其在加权贝叶斯风险下最优。这些方法通过离线-在线方式实现,适用于多次多数据集计算。通过X射线断层成像和反热传导问题的数值实验,验证了低维结构的有效利用。

英文摘要

In the Bayesian approach to inverse problems, data are often informative, relative to the prior, only on a low-dimensional subspace of the parameter space. Significant computational savings can be achieved by using this subspace to characterize and approximate the posterior distribution of the parameters. We first investigate approximation of the posterior covariance matrix as a low-rank update of the prior covariance matrix. We prove optimality of a particular update, based on the leading eigendirections of the matrix pencil defined by the Hessian of the negative log-likelihood and the prior precision, for a broad class of loss functions. This class includes the Förstner metric for symmetric positive definite matrices, as well as the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose two fast approximations of the posterior mean and prove their optimality with respect to a weighted Bayes risk under squared-error loss. These approximations are deployed in an offline-online manner, where a more costly but data-independent offline calculation is followed by fast online evaluations. As a result, these approximations are particularly useful when repeated posterior mean evaluations are required for multiple data sets. We demonstrate our theoretical results with several numerical examples, including high-dimensional X-ray tomography and an inverse heat conduction problem. In both of these examples, the intrinsic low-dimensional structure of the inference problem can be exploited while producing results that are essentially indistinguishable from solutions computed in the full space.

1312.1254 2026-06-04 math.NA cs.NA stat.CO

Parallel matrix factorization for low-rank tensor completion

低秩张量补全的并行矩阵分解

Yangyang Xu, Ruru Hao, Wotao Yin, Zhixun Su

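AI summary: The paper proposes recovering a low-rank tensor by simultaneously applying low-rank matrix factorizations to all mode matricizations of the tensor. An alternating minimization algorithm with adaptive rank-adjusting strategies recovers synthetic low-rank tensors from fewer samples than the compared methods and shows similar advantages on real-world data.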

详情
Journal ref
Inverse Problems and Imaging. Volume 9, No.2, 601-624, 2015
Comments
25 pages, 12 figures
AI中文摘要

高阶低秩张量在许多应用中自然出现,包括超光谱数据恢复、视频修复、地震数据重建等。我们提出了一种新的模型,通过同时对底层张量的所有模式矩阵化进行低秩矩阵分解来恢复低秩张量。应用交替最小化算法求解该模型,并在确切秩未知时采用两种自适应秩调整策略。相变图显示,我们的算法能够从显著少于对比方法的样本中恢复多种合成低秩张量,包括应用于张量恢复的矩阵完成方法和两种最先进的张量补全方法。进一步在真实数据上的测试显示相似优势。尽管我们的模型是非凸的,但我们的算法在测试中表现一致,并优于一些基于凸模型的方法。此外,我们的算法的全局收敛性可以建立在梯度趋于零的意义上。

英文摘要

Higher-order low-rank tensors naturally arise in many applications including hyperspectral data recovery, video inpainting, seismic data reconstruction, and so on. We propose a new model to recover a low-rank tensor by simultaneously performing low-rank matrix factorizations to the all-mode matricizations of the underlying tensor. An alternating minimization algorithm is applied to solve the model, along with two adaptive rank-adjusting strategies when the exact rank is not known. Phase transition plots reveal that our algorithm can recover a variety of synthetic low-rank tensors from significantly fewer samples than the compared methods, which include a matrix completion method applied to tensor recovery and two state-of-the-art tensor completion methods. Further tests on real-world data show similar advantages. Although our model is non-convex, our algorithm performs consistently throughout the tests and gives better results than the compared methods, some of which are based on convex models. In addition, the global convergence of our algorithm can be established in the sense that the gradient of the Lagrangian function converges to zero.

1507.00438 2026-06-04 cs.LG cs.NA math.NA stat.ML

DC Proximal Newton for Non-Convex Optimization Problems

非凸优化问题的DC近端牛顿法

Alain Rakotomamonjy, Remi Flamary, Gilles Gasso

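AI summary: The paper proposes a proximal Newton algorithm for learning problems whose loss function and regularizer are both non-convex difference-of-convex (DC) functions. A theoretical analysis shows that the limit points of the iterates are stationary points of the DC objective, and experiments, including a high-dimensional transductive learning problem, illustrate its efficiency.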

详情
AI中文摘要

我们介绍了一种新的算法,用于解决学习问题,其中损失函数和正则器均为非凸但属于差分凸(DC)函数类。我们的贡献是一种通用的近端牛顿算法,能够处理此类情况。算法通过近似损失函数获得下降方向,并通过线搜索确保充分下降。理论分析表明,所提出算法的迭代点的极限点是DC目标函数的 stationary points。数值实验显示,我们的方法在具有凸损失函数和非凸正则化函数的问题上比现有方法更高效。我们还展示了该算法在高维转导学习问题中的优势,其中损失函数和正则化器均为非凸的。

英文摘要

We introduce a novel algorithm for solving learning problems where both the loss function and the regularizer are non-convex but belong to the class of difference of convex (DC) functions. Our contribution is a new general-purpose proximal Newton algorithm that is able to deal with such a situation. The algorithm consists of obtaining a descent direction from an approximation of the loss function and then performing a line search to ensure sufficient descent. A theoretical analysis is provided showing that the iterates of the proposed algorithm admit as limit points stationary points of the DC objective function. Numerical experiments show that our approach is more efficient than the current state of the art for a problem with a convex loss function and a non-convex regularizer. We have also illustrated the benefit of our algorithm in a high-dimensional transductive learning problem where both the loss function and the regularizer are non-convex.

1507.00421 2026-06-04 math.NA cs.LG cs.NA math.ST stat.ML stat.TH

Categorical Matrix Completion

分类矩阵补全

Yang Cao, Yao Xie

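AI summary: The paper extends one-bit matrix completion to matrices with categorical-valued entries. It maximizes the likelihood ratio under a nuclear-norm constraint, establishes theoretical upper and lower bounds on the recovery error, and demonstrates the method's advantage on the MovieLens dataset.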

详情
Comments
Submitted
AI中文摘要

我们考虑从不完整观测中补全具有类别值的矩阵问题,通过扩展一位矩阵补全的公式和理论实现。通过最大化似然比并约束X的核范数来恢复低秩矩阵X,观测通过多个链接函数映射自X的条目。我们建立了恢复误差的理论上界和下界,达到常数因子O(K^{3/2}),其中K是固定类别数。上界依赖于类别数通过最大化涉及链接函数平滑度的项。与一位矩阵补全相比,我们的边界在类别数平方根的阶数上是最佳的,这与类别数增加时问题变难的直觉一致。通过在MovieLens数据集上比较我们的方法与传统矩阵补全方法的性能,我们展示了方法的优势。

英文摘要

We consider the problem of completing a matrix with categorical-valued entries from partial observations. This is achieved by extending the formulation and theory of one-bit matrix completion. We recover a low-rank matrix $X$ by maximizing the likelihood ratio with a constraint on the nuclear norm of $X$, and the observations are mapped from entries of $X$ through multiple link functions. We establish theoretical upper and lower bounds on the recovery error, which meet up to a constant factor $\mathcal{O}(K^{3/2})$ where $K$ is the fixed number of categories. The upper bound in our case depends on the number of categories implicitly through a maximization of terms that involve the smoothness of the link functions. In contrast to one-bit matrix completion, our bounds for categorical matrix completion are optimal up to a factor on the order of the square root of the number of categories, which is consistent with an intuition that the problem becomes harder when the number of categories increases. By comparing the performance of our method with the conventional matrix completion method on the MovieLens dataset, we demonstrate the advantage of our method.

1503.06606 2026-06-04 eess.SY cs.SY stat.CO

Robust Inference for State-Space Models with Skewed Measurement Noise

具有偏斜测量噪声的状态空间模型鲁棒推断

Henri Nurminen, Tohid Ardeshiri, Robert Piché, Fredrik Gustafsson

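AI summary: The paper presents filtering and smoothing algorithms for linear discrete-time state-space models with skewed and heavy-tailed measurement noise. The algorithms use a variational Bayes approximation of the posterior distribution, and simulations show better accuracy than conventional low-complexity alternatives.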

详情
Journal ref
IEEE Signal Processing Letters 22(11) (2015) 1898-1902
Comments
5 pages, 7 figures. Accepted for publication in IEEE Signal Processing Letters
AI中文摘要

本文提出了针对线性离散时间状态空间模型的滤波和平滑算法,该算法使用变分贝叶斯方法近似具有正态先验和偏斜t分布测量噪声的后验分布。在模拟的伪距定位场景中,所提出的方法在精度上优于传统低复杂度方法,滤波器的计算复杂度约为卡尔曼滤波的5到10倍。

英文摘要

Filtering and smoothing algorithms for linear discrete-time state-space models with skewed and heavy-tailed measurement noise are presented. The algorithms use a variational Bayes approximation of the posterior distribution of models that have normal prior and skew-t-distributed measurement noise. The proposed filter and smoother are compared with conventional low-complexity alternatives in a simulated pseudorange positioning scenario. In the simulations the proposed methods achieve better accuracy than the alternative methods, the computational complexity of the filter being roughly 5 to 10 times that of the Kalman filter.

1506.07540 2026-06-04 math.NA cs.LG cs.NA stat.ML

Global Optimality in Tensor Factorization, Deep Learning, and Beyond

张量分解、深度学习及其他中的全局最优性

Benjamin D. Haeffele, Rene Vidal

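AI summary: The paper presents a general framework for analyzing non-convex factorization problems, derives sufficient conditions under which a local minimum is a global minimum, and offers guidance on deep network architectures and regularization strategies that facilitate efficient optimization.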

详情
AI中文摘要

涉及分解的技术在广泛的应用中取得显著实证成功,但大多数问题的优化问题通常由于多线性形式或其他破坏凸性的转换而非凸。本文基于矩阵分解的凸松弛思想,提出一个通用框架,分析包括矩阵分解、张量分解和深度神经网络训练在内的非凸分解问题。我们推导出保证非凸优化问题局部最小值为全局最小值的充分条件,并证明如果分解变量的规模足够大,则从任何初始化出发,使用纯局部下降算法可以找到全局最小值。该框架还部分理论上解释了深度神经网络中ReLU的广泛应用,并提供指导以促进高效优化。

英文摘要

Techniques involving factorization are found in a wide range of applications and have enjoyed significant empirical success in many fields. However, common to a vast majority of these problems is the significant disadvantage that the associated optimization problems are typically non-convex due to a multilinear form or other convexity destroying transformation. Here we build on ideas from convex relaxations of matrix factorizations and present a very general framework which allows for the analysis of a wide range of non-convex factorization problems - including matrix factorization, tensor factorization, and deep neural network training formulations. We derive sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum and show that if the size of the factorized variables is large enough then from any initialization it is possible to find a global minimizer using a purely local descent algorithm. Our framework also provides a partial theoretical justification for the increasingly common use of Rectified Linear Units (ReLUs) in deep neural networks and offers guidance on deep network architectures and regularization strategies to facilitate efficient optimization.

1502.06777 2026-06-04 stat.CO cs.NA math.NA

Statistical efficiency of structured CPD estimation applied to Wiener-Hammerstein modeling

结构化CPD估计的统计效率及其在Wiener-Hammerstein建模中的应用

José Henrique De Morais Goulart, Maxime Boizard, Rémy Boyer, Gérard Favier, Pierre Comon

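AI summary: The paper studies the statistical efficiency of structured CPD estimation for nonlinear system identification with a Wiener-Hammerstein model, starting from a previously estimated high-order Volterra kernel. It derives the Cramer-Rao bound (CRB) and assesses the proposed estimators against it in Monte Carlo simulations.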

详情
Comments
Accepted for publication in the Proceedings of the European Signal Processing Conference (EUSIPCO), Aug 2015, Nice, France
AI中文摘要

结构化 canonical polyadic decomposition (CPD) 的计算对解决现实应用中的重要建模问题具有帮助。本文考虑通过 Wiener-Hammerstein 模型识别非线性系统,假设该系统的高阶 Volterra 核已预先估计。此类核视为张量,允许具有带状循环因子的 CPD,这些因子包含模型参数。为估计这些参数,我们基于最近提出的结构化 CPD 计算算法制定专用估计器。然后,考虑加性白高斯噪声的存在,我们推导出与此估计问题相关的 Cramer-Rao bound (CRB) 的闭式表达式。最后,通过蒙特卡洛模拟评估所提估计器的统计性能,通过将其均方误差与 CRB 进行比较来验证。

英文摘要

The computation of a structured canonical polyadic decomposition (CPD) is useful to address several important modeling problems in real-world applications. In this paper, we consider the identification of a nonlinear system by means of a Wiener-Hammerstein model, assuming a high-order Volterra kernel of that system has been previously estimated. Such a kernel, viewed as a tensor, admits a CPD with banded circulant factors which comprise the model parameters. To estimate them, we formulate specialized estimators based on recently proposed algorithms for the computation of structured CPDs. Then, considering the presence of additive white Gaussian noise, we derive a closed-form expression for the Cramer-Rao bound (CRB) associated with this estimation problem. Finally, we assess the statistical performance of the proposed estimators via Monte Carlo simulations, by comparing their mean-square error with the CRB.

1408.4236 2026-06-04 physics.ao-ph cs.NA math.NA nlin.CD physics.data-an stat.CO

Ensemble Kalman filtering with a divided state-space strategy for coupled data assimilation problems

基于分块状态空间策略的集合卡尔曼滤波用于耦合数据同化问题

Xiaodong Luo, Ibrahim Hoteit

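AI summary: The paper proposes a divided state-space strategy for data assimilation in coupled systems. Assimilation is carried out for each sub-system separately, yielding several variants of the ensemble Kalman filter, together with extensions that trade off efficiency against accuracy.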

详情
Comments
To appear in Monthly Weather Review; Please note that there is a supplementary file associated with the paper
AI中文摘要

本文研究了耦合系统中的数据同化问题,该系统由两个相互作用的子系统组成。一种直接的方法是将子系统的状态拼接成一个扩展状态向量,从而直接应用标准集合卡尔曼滤波(EnKF)。本文提出了一种分块状态空间估计策略,其中数据同化针对每个子系统进行,涉及该子系统自身的量和与其他耦合子系统的相关量。在分块状态空间估计策略的基础上,还考虑了单独运行子系统的可能性。结合这两种思想,推导出几种EnKF的变体。这些变体的引入主要受当前耦合数据同化问题现状和挑战的启发,因此可能在实际应用中有参考价值。使用多尺度洛伦茨96模型进行数值实验,评估这些变体相对于传统EnKF的性能。此外,针对耦合数据同化问题,还开发了两种方法的扩展原型,以在效率和准确性之间取得平衡。

英文摘要

This study considers the data assimilation problem in coupled systems that consist of two components (sub-systems) interacting with each other through certain coupling terms. A straightforward way to tackle the assimilation problem in such systems is to concatenate the states of the sub-systems into one augmented state vector, so that a standard ensemble Kalman filter (EnKF) can be directly applied. In this work we present a divided state-space estimation strategy, in which data assimilation is carried out with respect to each individual sub-system, involving quantities from the sub-system itself and correlated quantities from other coupled sub-systems. On top of the divided state-space estimation strategy, we also consider the possibility of running the sub-systems separately. Combining these two ideas, a few variants of the EnKF are derived. The introduction of these variants is mainly inspired by the current status and challenges in coupled data assimilation problems, and thus might be of interest from a practical point of view. Numerical experiments with a multi-scale Lorenz 96 model are conducted to evaluate the performance of these variants against that of the conventional EnKF. In addition, specifically for coupled data assimilation problems, two prototypes of extensions of the presented methods are also developed in order to achieve a trade-off between efficiency and accuracy.

1407.1697 2026-06-04 cs.IT cs.SY eess.SY math.IT math.OC stat.CO

L1 Control Theoretic Smoothing Splines

L1控制理论平滑样条

Masaaki Nagahara, Clyde F. Martin

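AI summary: The paper proposes control theoretic smoothing splines with L1 optimality, which reduce the number of parameters describing the fitted curve and remove outlier data. The spline is generated as the output of a linear dynamical system, and the L1 norm is used for the regularization term and/or the empirical risk term, which improves robustness to outliers.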

详情
Comments
Accepted for publication in IEEE Signal Processing Letters. 4 pages (twocolumn), 5 figures
AI中文摘要

本文提出一种基于L1最优性的控制理论平滑样条,旨在减少描述拟合曲线的参数数量并去除异常数据。控制理论样条是通过给定线性动态系统生成的平滑样条。传统设计需要与数据数量相同的基函数,结果对异常值不鲁棒。为解决这些问题,本文提出使用L1最优性,即使用L1范数作为正则化项和/或经验风险项。优化过程描述为凸优化问题,可通过数值优化软件高效求解。数值示例展示了所提方法的有效性。

英文摘要

In this paper, we propose control theoretic smoothing splines with L1 optimality for reducing the number of parameters that describe the fitted curve as well as removing outlier data. A control theoretic spline is a smoothing spline that is generated as an output of a given linear dynamical system. Conventional design requires exactly the same number of basis functions as given data points, and the result is not robust against outliers. To solve these problems, we propose to use L1 optimality, that is, we use the L1 norm for the regularization term and/or the empirical risk term. The design is formulated as a convex optimization problem, which can be efficiently solved with numerical optimization software. A numerical example shows the effectiveness of the proposed method.

1404.7188 2026-06-04 math.NA cs.NA stat.CO

Limitations of polynomial chaos expansions in the Bayesian solution of inverse problems

多项式混沌展开在贝叶斯反问题求解中的局限性

Fei Lu, Matthias Morzfeld, Xuemin Tu, Alexandre J. Chorin

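AI summary: The study shows that polynomial chaos expansions give inaccurate posterior estimates when the data contain significant information beyond what the prior assumes. Adaptively increasing the polynomial order improves accuracy, but the cost may grow too fast for this to be worthwhile compared to Monte Carlo sampling without a surrogate.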

详情
AI中文摘要

多项式混沌展开用于降低贝叶斯反问题求解的计算成本,通过创建可快速评估的近似后验。分析与实例表明,当数据包含超出先验假设的信息时,近似后验可能与真实后验差异显著,导致估计不准确。可通过适应性增加多项式阶数提高精度,但成本可能过高,不如无近似后验的蒙特卡洛采样有效。

英文摘要

Polynomial chaos expansions are used to reduce the computational cost in the Bayesian solutions of inverse problems by creating a surrogate posterior that can be evaluated inexpensively. We show, by analysis and example, that when the data contain significant information beyond what is assumed in the prior, the surrogate posterior can be very different from the posterior, and the resulting estimates become inaccurate. One can improve the accuracy by adaptively increasing the order of the polynomial chaos, but the cost may increase too fast for this to be cost effective compared to Monte Carlo sampling without a surrogate posterior.

1502.02251 2026-06-04 stat.ML cs.LG cs.RO cs.SY eess.SY

From Pixels to Torques: Policy Learning with Deep Dynamical Models

从像素到扭矩:基于深度动态模型的策略学习

Niklas Wahlström, Thomas B. Schön, Marc Peter Deisenroth

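AI summary: The paper proposes a data-efficient, model-based reinforcement learning algorithm that uses a deep dynamical model to learn a closed-loop control policy directly from pixel information, addressing data-efficient learning in continuous state-action spaces with very high-dimensional observations.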

详情
Comments
9 pages
AI中文摘要

在开发完全自主系统中,利用非常高的维数观测进行数据高效学习连续状态-动作空间仍是一个关键挑战。本文考虑这一挑战的一个实例,即像素到扭矩问题,其中智能体必须仅从像素信息学习闭环控制策略。我们引入了一种数据高效、基于模型的强化学习算法,该算法直接从像素信息学习此类闭环策略。关键成分是深度动态模型,该模型使用深度自编码器学习图像的低维嵌入,并在该低维特征空间中学习预测模型。联合学习确保不仅静态属性,而且动态属性都被考虑在内。这对于长期预测至关重要,而长期预测是适应性模型预测控制策略的核心。与最先进的连续状态和动作强化学习方法相比,我们的方法学习速度快,可扩展到高维状态空间,并是向完全自主学习从像素到扭矩的重要一步。

英文摘要

Data-efficient learning in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. In this paper, we consider one instance of this challenge, the pixels to torques problem, where an agent must learn a closed-loop control policy from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that uses deep auto-encoders to learn a low-dimensional embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning ensures that not only static but also dynamic properties of the data are accounted for. This is crucial for long-term predictions, which lie at the core of the adaptive model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art reinforcement learning methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces and is an important step toward fully autonomous learning from pixels to torques.

1402.5297 2026-06-04 math.ST cs.NA math.NA math.PR stat.TH

Maximum-A-Posteriori Estimates in Linear Inverse Problems with Log-concave Priors are Proper Bayes Estimators

线性反问题中具有对数凹先验的最大后验估计是恰当的贝叶斯估计器

Martin Burger, Felix Lucka

AI总结 本文挑战传统观点,证明在高维稀疏性促进的贝叶斯反问题中,最大后验估计是恰当的贝叶斯估计器,通过引入新的凸贝叶斯成本函数。

详情
AI中文摘要

贝叶斯反演中经常争论的问题是,最大后验估计(MAP)和条件均值估计(CM)中哪一个更优。由于MAP估计对应变分正则化技术的解,这也成为两个研究领域争论的焦点。通过贝叶斯成本形式化论点,CM估计因是均方误差成本的贝叶斯估计器而被传统偏好,而MAP估计因仅在渐近情况下是均匀成本函数的贝叶斯估计器而被传统否定。本文提出近期的理论和计算观察,挑战这一观点,特别是在高维稀疏性促进的贝叶斯反问题中。通过Bregman距离,我们提出新的、恰当的凸贝叶斯成本函数,其中MAP估计器是贝叶斯估计器。我们进一步通过纠正关于MAP估计的常见误解来补充这一发现。总体而言,我们旨在恢复在具有对数凹先验的线性反问题中MAP估计作为恰当贝叶斯估计器的地位。

英文摘要

A frequent matter of debate in Bayesian inversion is the question of which of the two principal point estimators, the maximum-a-posteriori (MAP) or the conditional mean (CM) estimate, is to be preferred. As the MAP estimate corresponds to the solution given by variational regularization techniques, this is also a constant matter of debate between the two research areas. Following a theoretical argument - the Bayes cost formalism - the CM estimate is classically preferred for being the Bayes estimator for the mean squared error cost while the MAP estimate is classically discredited for being only asymptotically the Bayes estimator for the uniform cost function. In this article we present recent theoretical and computational observations that challenge this point of view, in particular for high-dimensional sparsity-promoting Bayesian inversion. Using Bregman distances, we present new, proper convex Bayes cost functions for which the MAP estimator is the Bayes estimator. We complement this finding by results that correct further common misconceptions about MAP estimates. In total, we aim to rehabilitate MAP estimates in linear inverse problems with log-concave priors as proper Bayes estimators.

1312.3378 2026-06-04 math.NA cs.NA math.ST stat.TH

Expectation Propagation for Nonlinear Inverse Problems -- with an Application to Electrical Impedance Tomography

期望传播用于非线性反问题——以电阻抗断层成像应用为例

Matthias Gehre, Bangti Jin

AI总结 本文提出基于期望传播的快速近似推断方法,用于求解非线性反问题的后验概率分布,通过高效估计后验均值和协方差,提供反演解及不确定性量化,应用于电阻抗断层成像并对比马尔可夫链蒙特卡洛方法。

详情
Comments
Journal of Computational Physics, to appear
AI中文摘要

本文研究了一种基于期望传播的快速近似推断方法,用于探索非线性反问题的后验概率分布。该方法能够高效地提供可靠的后验均值和协方差估计,从而提供反演解及量化不确定性。讨论了迭代算法的某些理论性质,并描述了重要投影类问题的有效实现。通过一个典型的非线性反问题——具有完整电极模型的电阻抗断层成像,在稀疏约束下进行演示。展示了真实实验数据的数值结果,并与马尔可夫链蒙特卡洛方法的结果进行比较。结果表明,该方法准确且计算效率非常高。

英文摘要

In this paper, we study a fast approximate inference method based on expectation propagation for exploring the posterior probability distribution arising from the Bayesian formulation of nonlinear inverse problems. It is capable of efficiently delivering reliable estimates of the posterior mean and covariance, thereby providing an inverse solution together with quantified uncertainties. Some theoretical properties of the iterative algorithm are discussed, and the efficient implementation for an important class of problems of projection type is described. The method is illustrated with one typical nonlinear inverse problem, electrical impedance tomography with complete electrode model, under sparsity constraints. Numerical results for real experimental data are presented and compared with those obtained by Markov chain Monte Carlo. The results indicate that the method is accurate and computationally very efficient.

1310.0865 2026-06-04 stat.ML cs.LG cs.SY eess.SY

Electricity Market Forecasting via Low-Rank Multi-Kernel Learning

通过低秩多核学习进行电力市场预测

Vassilis Kekatos, Yu Zhang, Georgios B. Giannakis

AI总结 本文将日前电力市场价格预测表述为低秩多核学习问题,利用基于核范数(nuclear norm)的正则化在定价节点和小时之间选择核,提高预测精度和计算效率。

详情
Comments
10 pages
AI中文摘要

智能电网愿景需要借助先进的信息技术和数据分析,以提高电网基础设施的效率、可持续性和经济性。为此,本文利用现代统计学习工具进行电力市场推断。日前价格预测被表述为一个低秩核学习问题。通过独特地利用市场出清过程,阻塞模式被建模为时空变化价格矩阵中的秩一分量。借助一种新的基于核范数(nuclear norm)的正则化,可以系统地选择跨定价节点和跨小时的核。尽管从学习角度看全市场范围的预测是有益的,但它需要处理高维市场数据。为此,本文设计了一种块坐标下降算法来求解其中的非凸优化问题。该算法利用了块稀疏向量恢复的结果,并保证收敛到驻点。在中西部 ISO(MISO)市场真实数据上的数值测试表明,所提方法在预测精度、计算效率和可解释性方面优于现有方法。

英文摘要

The smart grid vision entails advanced information technology and data analytics to enhance the efficiency, sustainability, and economics of the power grid infrastructure. Aligned to this end, modern statistical learning tools are leveraged here for electricity market inference. Day-ahead price forecasting is cast as a low-rank kernel learning problem. Uniquely exploiting the market clearing process, congestion patterns are modeled as rank-one components in the matrix of spatio-temporally varying prices. Through a novel nuclear norm-based regularization, kernels across pricing nodes and hours can be systematically selected. Even though market-wide forecasting is beneficial from a learning perspective, it involves processing high-dimensional market data. The latter becomes possible after devising a block-coordinate descent algorithm for solving the non-convex optimization problem involved. The algorithm utilizes results from block-sparse vector recovery and is guaranteed to converge to a stationary point. Numerical tests on real data from the Midwest ISO (MISO) market corroborate the prediction accuracy, computational efficiency, and the interpretative merits of the developed approach over existing alternatives.

1506.03382 2026-06-04 math.ST cs.IT cs.NA math.IT math.NA stat.ML stat.TH

Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow

噪声稀疏相位恢复的最优收敛率:基于阈值Wirtinger Flow

T. Tony Cai, Xiaodong Li, Zongming Ma

AI总结 本文研究噪声稀疏相位恢复问题,提出阈值梯度下降算法,在稀疏度范围内实现最优收敛率。

详情
Comments
28 pages, 4 figures
AI中文摘要

本文考虑噪声稀疏相位恢复问题:从含噪声的二次测量$y_j = (a_j' x )^2 + ε_j$,$j=1, \ldots, m$中恢复稀疏信号$x \in \mathbb{R}^p$,其中噪声$ε_j$相互独立且服从次指数分布。目标是理解$x$的稀疏性对估计精度的影响,并构造计算上可行的估计器以达到最优收敛率。受针对无噪声、非稀疏相位恢复提出的Wirtinger Flow [12]启发,本文提出一种新的阈值梯度下降算法,并证明当$a_j$为独立标准高斯随机向量、且样本量相对于$x$的稀疏度足够大时,该算法能在较宽的稀疏度范围内自适应地达到极小极大最优收敛率。

英文摘要

This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $x \in \mathbb{R}^p$ from noisy quadratic measurements $y_j = (a_j' x )^2 + ε_j$, $j=1, \ldots, m$, with independent sub-exponential noise $ε_j$. The goals are to understand the effect of the sparsity of $x$ on the estimation precision and to construct a computationally feasible estimator to achieve the optimal rates. Inspired by the Wirtinger Flow [12] proposed for noiseless and non-sparse phase retrieval, a novel thresholded gradient descent algorithm is proposed and it is shown to adaptively achieve the minimax optimal rates of convergence over a wide range of sparsity levels when the $a_j$'s are independent standard Gaussian random vectors, provided that the sample size is sufficiently large compared to the sparsity of $x$.
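
The measurement model and the idea of a thresholded gradient step can be made concrete with a small sketch. The abstract does not spell out the loss, the step size, or the thresholding rule, so everything below is an assumption chosen for illustration: a quartic least-squares loss, a fixed step, and a keep-the-k-largest rule in place of whatever adaptive threshold the paper uses. The function names are hypothetical.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def loss(x, A, y):
    """Assumed loss: (1/(4m)) * sum_j ((a_j' x)^2 - y_j)^2."""
    m = len(A)
    return sum((dot(a, x) ** 2 - yj) ** 2 for a, yj in zip(A, y)) / (4 * m)


def gradient(x, A, y):
    """Gradient of the loss: (1/m) * sum_j ((a_j' x)^2 - y_j) * (a_j' x) * a_j."""
    m = len(A)
    g = [0.0] * len(x)
    for a, yj in zip(A, y):
        ax = dot(a, x)
        weight = (ax * ax - yj) * ax / m
        for i, ai in enumerate(a):
            g[i] += weight * ai
    return g


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def thresholded_step(x, A, y, step, k):
    """One gradient step followed by hard thresholding (illustrative rule)."""
    g = gradient(x, A, y)
    return hard_threshold([xi - step * gi for xi, gi in zip(x, g)], k)


# Toy instance: Gaussian sensing vectors, a 2-sparse signal, noiseless data.
rng = random.Random(0)
p, m, k = 8, 40, 2
x_true = [1.0, -0.5] + [0.0] * (p - 2)
A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(m)]
y = [dot(a, x_true) ** 2 for a in A]
x_next = thresholded_step([0.8, -0.3] + [0.1] * (p - 2), A, y, step=0.05, k=k)
```

Because the measurements are quadratic, `x_true` and `-x_true` give the same data, so any such iteration can recover the signal only up to a global sign.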

1408.2773 2026-06-04 stat.CO cs.NA math.NA

On Integration Methods Based on Scrambled Nets of Arbitrary Size

基于任意规模置乱网的积分方法

Mathieu Gerber

AI总结 本文研究基于置乱网(scrambled nets)的积分方法的收敛性,推导了对任意N成立的方差界,并证明序贯拟蒙特卡洛方法在任意N下均可达到o_P(N^{-1/2})的收敛率。

详情
Comments
27 pages, 2 figures (final version, to appear in The Journal of Complexity)
AI中文摘要

我们考虑对函数$φ\in L^2[0,1)^{s}$计算$I(φ):=\int_{[0,1)^s}φ(x) dx$的问题。当$I(φ)$可由形如$N^{-1}\sum_{n=0}^{N-1}φ(x^n)$的估计近似(其中$\{x^n\}_{n=0}^{N-1}$是$[0,1)^s$中的点集)时,众所周知,取基$b\geq 2$的置乱$(t,s)$-序列的前$N=λb^m$个点($λ\in\{1,\dots,b-1\}$)可以改进蒙特卡洛的$O_P(N^{-1/2})$收敛率。本文对置乱网求积公式的方差推导了一个阶为$o(N^{-1})$的界,且对$N$不加任何限制。作为推论,对于依赖于求积规模$N$的函数,该界使我们能够给出简单的条件,使得对任意取法的$N$,积分误差均为$o_P(N^{-1/2})$。特别地,我们证明序贯拟蒙特卡洛方法(M. Gerber and N. Chopin, 2015)对任意$N$值都能达到$o_P(N^{-1/2})$的收敛率。数值研究表明,对置乱网求积公式,当被积函数$φ$不连续时,可以放宽对$N$的约束而不损失效率;而对序贯拟蒙特卡洛方法,取$N=λb^m$可能只带来有限的收益。

英文摘要

We consider the problem of evaluating $I(φ):=\int_{[0,1)^s}φ(x) dx$ for a function $φ\in L^2[0,1)^{s}$. In situations where $I(φ)$ can be approximated by an estimate of the form $N^{-1}\sum_{n=0}^{N-1}φ(x^n)$, with $\{x^n\}_{n=0}^{N-1}$ a point set in $[0,1)^s$, it is now well known that the $O_P(N^{-1/2})$ Monte Carlo convergence rate can be improved by taking for $\{x^n\}_{n=0}^{N-1}$ the first $N=λb^m$ points, $λ\in\{1,\dots,b-1\}$, of a scrambled $(t,s)$-sequence in base $b\geq 2$. In this paper we derive a bound for the variance of scrambled net quadrature rules which is of order $o(N^{-1})$ without any restriction on $N$. As a corollary, this bound allows us to provide simple conditions to get, for any pattern of $N$, an integration error of size $o_P(N^{-1/2})$ for functions that depend on the quadrature size $N$. Notably, we establish that sequential quasi-Monte Carlo (M. Gerber and N. Chopin, 2015, J. R. Statist. Soc. B, to appear) reaches the $o_P(N^{-1/2})$ convergence rate for any values of $N$. In a numerical study, we show that for scrambled net quadrature rules we can relax the constraint on $N$ without any loss of efficiency when the integrand $φ$ is a discontinuous function while, for sequential quasi-Monte Carlo, taking $N=λb^m$ may only provide moderate gains.

1406.6603 2026-06-04 math.NA cs.LG cs.NA stat.ML

A scaled gradient projection method for Bayesian learning in dynamical systems

一种用于动态系统贝叶斯学习的缩放梯度投影方法

Silvia Bonettini, Alessandro Chiuso, Marco Prato

AI总结 本文提出一种缩放梯度投影算法,用于解决贝叶斯学习中的非凸优化问题,通过有效设计缩放矩阵和步长参数,实现高效求解。

详情
Journal ref
SIAM Journal on Scientific Computing 37 (2015), A1297-A1318
AI中文摘要

系统辨识问题中的一项关键任务是选择最合适的模型类,传统上通过交叉验证或渐近论证来解决。近期文献提出可在贝叶斯框架中处理这一问题:模型复杂度由少量超参数调节,而这些超参数可通过边际似然最大化来估计。因此,设计有效的优化方法来求解相应的优化问题至关重要。若将未知脉冲响应建模为具有合适核的高斯过程,边际似然最大化会导出一个具有挑战性的非凸优化问题,需要稳定而有效的求解策略。本文采用缩放梯度投影算法求解该问题,其中缩放矩阵和步长参数对于在与二阶方法相当的计算时间内得到有意义的解起着关键作用。特别地,我们提出了分裂梯度方法的一种推广,用于在存在盒约束时设计缩放矩阵,并给出了梯度和目标函数的高效实现。在多个测试问题上的大量数值实验表明,该方法能在十分之几秒内给出精度与最先进方法相当的解。此外,该策略的灵活性使其易于推广到机器学习、信号处理和系统辨识等不同领域中更广泛的问题。

英文摘要

A crucial task in system identification problems is the selection of the most appropriate model class, which is classically addressed by resorting to cross-validation or using asymptotic arguments. As recently suggested in the literature, this can be addressed in a Bayesian framework, where model complexity is regulated by a few hyperparameters, which can be estimated via marginal likelihood maximization. It is thus of primary importance to design effective optimization methods to solve the corresponding optimization problem. If the unknown impulse response is modeled as a Gaussian process with a suitable kernel, the maximization of the marginal likelihood leads to a challenging nonconvex optimization problem, which requires a stable and effective solution strategy. In this paper we address this problem by means of a scaled gradient projection algorithm, in which the scaling matrix and the steplength parameter play a crucial role in providing a meaningful solution in a computational time comparable with second order methods. In particular, we propose both a generalization of the split gradient approach to design the scaling matrix in the presence of box constraints, and an effective implementation of the gradient and objective function. The extensive numerical experiments carried out on several test problems show that our method is very effective in providing, in a few tenths of a second, solutions of the problems with accuracy comparable with state-of-the-art approaches. Moreover, the flexibility of the proposed strategy makes it easily adaptable to a wider range of problems arising in different areas of machine learning, signal processing and system identification.

1306.1587 2026-06-04 math.NA cs.NA math.ST stat.ME stat.ML stat.TH

Spectral Convergence of the connection Laplacian from random samples

从随机样本中连接拉普拉斯算子的谱收敛

Amit Singer, Hau-tieng Wu

AI总结 本文提出统一框架,利用流形的主丛结构近似流形上的其他联络拉普拉斯算子,并证明其在无限样本极限下的谱收敛性,适用于非均匀分布和有/无边界的流形。

详情
AI中文摘要

基于离散图拉普拉斯算子的特征向量和特征值的谱方法(如扩散映射和拉普拉斯特征映射)常用于流形学习和非线性降维。Belkin 和 Niyogi 此前证明,当数据点从流形上的均匀分布中独立采样且数量趋于无穷时,图拉普拉斯算子的特征向量和特征值收敛到流形上拉普拉斯-贝尔特拉米算子的特征函数和特征值。最近,我们引入了向量扩散映射,并证明流形切丛的联络拉普拉斯算子可由随机样本近似。本文提出一个统一框架,通过考虑流形的主丛结构来近似流形上的其他联络拉普拉斯算子。我们证明,在独立随机样本数量趋于无穷的极限下,这些拉普拉斯算子的特征向量和特征值收敛。我们还将谱收敛结果推广到数据点采样自非均匀分布的情形,以及有边界和无边界的流形。

英文摘要

Spectral methods that are based on eigenvectors and eigenvalues of discrete graph Laplacians, such as Diffusion Maps and Laplacian Eigenmaps, are often used for manifold learning and non-linear dimensionality reduction. It was previously shown by Belkin and Niyogi (2007) that the eigenvectors and eigenvalues of the graph Laplacian converge to the eigenfunctions and eigenvalues of the Laplace-Beltrami operator of the manifold in the limit of infinitely many data points sampled independently from the uniform distribution over the manifold. Recently, we introduced Vector Diffusion Maps and showed that the connection Laplacian of the tangent bundle of the manifold can be approximated from random samples. In this paper, we present a unified framework for approximating other connection Laplacians over the manifold by considering its principal bundle structure. We prove that the eigenvectors and eigenvalues of these Laplacians converge in the limit of infinitely many independent random samples. We generalize the spectral convergence results to the case where the data points are sampled from a non-uniform distribution, and for manifolds with and without boundary.

1406.5286 2026-06-04 stat.ML cs.LG cs.NA math.NA math.OC

Enhancing Pure-Pixel Identification Performance via Preconditioning

通过预条件化增强纯像素识别性能

Nicolas Gillis, Wing-Kin Ma

AI总结 本文分析了不同预条件化方法以提升纯像素搜索算法的鲁棒性,针对SPA算法提出近似解的鲁棒性分析,并探讨了预白化和基于SPA的预条件化方法的鲁棒性与效率。

详情
Journal ref
SIAM J. on Imaging Sciences 8 (2), pp. 1161-1186, 2015
Comments
25 pages, 3 figures
AI中文摘要

本文分析了多种旨在增强纯像素搜索算法鲁棒性的预条件化方法;这类算法用于盲高光谱解混,并等价于近可分非负矩阵分解算法。我们的分析聚焦于逐次投影算法(successive projection algorithm, SPA),它是纯像素算法中一种简单、高效且可证明鲁棒的算法。最近,Gillis和Vavasis(arXiv:1310.2273)提出了一种可证明鲁棒的预条件化方法,该方法需要求解一个半正定规划(SDP),以找到包含数据点的最小体积椭球。由于高精度求解SDP可能很耗时,我们将鲁棒性分析推广到SDP的近似解,即目标函数值与最优值相差某个乘法因子的解。结果表明,高精度解对鲁棒性并不关键,这为更快的预条件化方法(例如基于一阶优化方法的)铺平了道路。这第一项贡献还使我们能够对另外两种预条件化方法给出鲁棒性分析。第一种是预白化,可解释为同一SDP在附加约束下的最优解。我们分析了预白化的鲁棒性,从而刻画了它与基于SDP的预条件化方法表现相当的情形。第二种基于SPA本身,可解释为该SDP的一个松弛问题的最优解。它速度极快,同时在多个合成数据集上可与基于SDP的预条件化方法相媲美。

英文摘要

In this paper, we analyze different preconditionings designed to enhance robustness of pure-pixel search algorithms, which are used for blind hyperspectral unmixing and which are equivalent to near-separable nonnegative matrix factorization algorithms. Our analysis focuses on the successive projection algorithm (SPA), a simple, efficient and provably robust algorithm in the pure-pixel algorithm class. Recently, a provably robust preconditioning was proposed by Gillis and Vavasis (arXiv:1310.2273) which requires the resolution of a semidefinite program (SDP) to find a minimum volume ellipsoid enclosing the data points. Since solving the SDP to high precision can be time-consuming, we generalize the robustness analysis to approximate solutions of the SDP, that is, solutions whose objective function values are some multiplicative factors away from the optimal value. It is shown that a high accuracy solution is not crucial for robustness, which paves the way for faster preconditionings (e.g., based on first-order optimization methods). This first contribution also allows us to provide a robustness analysis for two other preconditionings. The first one is pre-whitening, which can be interpreted as an optimal solution of the same SDP with additional constraints. We analyze robustness of pre-whitening, which allows us to characterize situations in which it performs competitively with the SDP-based preconditioning. The second one is based on SPA itself and can be interpreted as an optimal solution of a relaxation of the SDP. It is extremely fast while competing with the SDP-based preconditioning on several synthetic data sets.
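
For readers unfamiliar with SPA, the following is a minimal sketch of the textbook successive projection step: repeatedly pick the column of largest norm, then project every column onto the orthogonal complement of the picked one. It covers only plain SPA without any of the preconditionings analyzed in the paper, and the function name `spa` and the toy data are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def spa(columns, r):
    """Return the indices of r columns selected by successive projection.

    columns: list of data points (each a list of floats).
    r: number of pure pixels (extreme columns) to extract.
    """
    residual = [list(c) for c in columns]
    chosen = []
    for _ in range(r):
        norms = [dot(c, c) for c in residual]
        j = max(range(len(residual)), key=norms.__getitem__)
        if norms[j] == 0.0:
            break  # nothing left to extract
        chosen.append(j)
        u, uu = residual[j], norms[j]
        # Project every column onto the orthogonal complement of u.
        residual = [
            [ci - (dot(c, u) / uu) * ui for ci, ui in zip(c, u)]
            for c in residual
        ]
    return chosen


# Toy separable data: two pure columns w1, w2 and two convex mixtures of them.
w1 = [1.0, 0.0, 0.0]
w2 = [0.0, 2.0, 0.0]
mix_a = [0.5 * a + 0.5 * b for a, b in zip(w1, w2)]
mix_b = [0.3 * a + 0.7 * b for a, b in zip(w1, w2)]
data = [mix_a, w1, mix_b, w2]
picked = spa(data, 2)
```

On this noiseless example the two selected indices are those of `w1` and `w2`; the robustness questions studied in the paper concern what happens when the columns are perturbed by noise.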

1402.5662 2026-06-04 math.ST cs.NA math.NA stat.TH

Non-uniform spline recovery from small degree polynomial approximation

非均匀样条从低次多项式逼近中恢复

Yohann De Castro, Guillaume Mijoule

AI总结 研究通过TV范数正则化方法在多项式空间中解决稀疏尖峰反卷积问题,在Chebyshev型最小分离条件下给出支撑集恢复和振幅误差的定量界,并利用半正定规划无网格恢复非均匀样条的节点。

详情
AI中文摘要

我们研究代数多项式空间上的稀疏尖峰反卷积问题。该框架涵盖了由无噪声与含噪声矩测量的组合重建测度的问题。我们研究了一种TV范数正则化方法,用于在该框架中定位目标离散测度的支撑集并估计其权重。此外,在支撑集满足Chebyshev型最小分离条件下,我们导出了支撑集恢复误差和振幅误差的定量界。作为副产品,我们研究了非均匀样条的节点定位问题:观测到的是样条与已知多项式基的内积的高斯扰动(即已知一个低次多项式逼近),且边界条件已知。我们证明可以使用半正定规划以无网格方式恢复节点。

英文摘要

We investigate the sparse spikes deconvolution problem onto spaces of algebraic polynomials. Our framework encompasses the measure reconstruction problem from a combination of noiseless and noisy moment measurements. We study a TV-norm regularization procedure to localize the support and estimate the weights of a target discrete measure in this frame. Furthermore, we derive quantitative bounds on the support recovery and the amplitude errors under a Chebyshev-type minimal separation condition on its support. Incidentally, we study the localization of the knots of non-uniform splines when a Gaussian perturbation of their inner products with a known polynomial basis is observed (i.e. a small degree polynomial approximation is known) and the boundary conditions are known. We prove that the knots can be recovered in a grid-free manner using semidefinite programming.

1505.07170 2026-06-04 q-bio.QM cs.NA math.NA stat.ME

Sparse multiway decomposition for analysis and modeling of diffusion imaging and tractography

稀疏多向分解在扩散成像与纤维束成像分析与建模中的应用

Cesar F. Caiafa, Franco Pestilli

AI总结 本文提出利用稀疏多向分解方法对线性化神经影像模型进行处理,展示分解模型在保持准确性的同时显著降低内存和计算需求,适用于白质连通性评估等神经影像分析。

详情
Comments
19 pages, 1 table, 9 figures
AI中文摘要

神经影像数据集的公开数量正在快速增长。随着数据可用性和分辨率的提高,需要现代信号处理方法进行数据分析和结果验证。我们引入稀疏多向分解方法(Caiafa和Cichocki,2012)应用于线性化神经影像模型。我们证明分解模型更加紧凑但与完整模型一样准确,并可用于快速数据分析。以最近用于白质连通性评估的模型(Pestilli等,2014)为例,我们展示多向分解模型在准确性上与完整模型相当,同时仅需极小的内存和计算时间。该方法对使用线性近似测量信号的大多数神经影像方法具有重要意义。

英文摘要

The number of publicly available neuroimaging data sets is growing at a fast rate. The increase in availability and resolution of neuroimaging data requires modern approaches to signal processing for data analysis and results validation. We introduce the application of sparse multiway decomposition methods (Caiafa and Cichocki, 2012) to linearized neuroimaging models. We show that decomposed models are more compact but as accurate as full models and can be successfully used for fast data analysis. As an example, we focus on a recent model for the evaluation of white matter connectomes (Pestilli et al, 2014). We show that the multiway decomposed model achieves accuracy comparable to the full model, while requiring only a small fraction of the memory and compute time. The approach has implications for a majority of neuroimaging methods using linear approximations to measured signals.

1505.05571 2026-06-04 math.NA cs.DC cs.NA stat.CO

Fast exact summation using small and large superaccumulators

利用小和大超累加器实现快速精确求和

Radford M. Neal

AI总结 本文提出两种新方法,通过精确求和并正确四舍五入,提高浮点数求和精度,适用于数据均值计算。方法利用超累加器概念,通过小和大超累加器组合实现高效精确求和,比简单求和更快且结果一致。

详情
AI中文摘要

我提出了两种新的方法,用于精确求和一组浮点数,并正确四舍五入到最近的浮点数。在许多应用中,比简单求和(每次加法后四舍五入)更高的精度是重要的,例如计算数据的样本均值。精确求和也保证了并行和串行实现的结果相同,因为精确的总和与顺序无关。新的方法使用超累加器概念的变种——超累加器是一种大固定点数,可以精确表示任何合理数量的浮点数之和。一种方法使用一个具有67个64位块的小超累加器,每个块与下一个块有32位的重叠,允许进位传播很少发生。当求和少量项时,单独使用小超累加器。对于大规模求和,还使用了大超累加器。它由4096个64位块组成,每个块对应可能的指数位和符号位的每一个组合,加上需要转移到小超累加器的块的计数。将项加到大超累加器时,只需更新一个块及其相关的计数,如果仔细实现,这需要非常少的指令。在现代64位处理器上,使用这种大和小超累加器组合精确求和一个大型数组的时间不到简单、不精确、有序求和时间的两倍,具有串行实现。使用少量处理器核心的并行实现可以预期能够以达到内存带宽限制的速度进行大型数组的精确求和。因此,一些试图在不精确的情况下提高精度的常见方法可能在至少对于大规模求和来说是无用的,因为它们比计算精确的总和要慢。

英文摘要

I present two new methods for exactly summing a set of floating-point numbers, and then correctly rounding to the nearest floating-point number. Higher accuracy than simple summation (rounding after each addition) is important in many applications, such as finding the sample mean of data. Exact summation also guarantees identical results with parallel and serial implementations, since the exact sum is independent of order. The new methods use variations on the concept of a "superaccumulator" - a large fixed-point number that can exactly represent the sum of any reasonable number of floating-point values. One method uses a "small" superaccumulator with sixty-seven 64-bit chunks, each with 32-bit overlap with the next chunk, allowing carry propagation to be done infrequently. The small superaccumulator is used alone when summing a small number of terms. For big summations, a "large" superaccumulator is used as well. It consists of 4096 64-bit chunks, one for every possible combination of exponent bits and sign bit, plus counts of when each chunk needs to be transferred to the small superaccumulator. To add a term to the large superaccumulator, only a single chunk and its associated count need to be updated, which takes very few instructions if carefully implemented. On modern 64-bit processors, exactly summing a large array using this combination of large and small superaccumulators takes less than twice the time of simple, inexact, ordered summation, with a serial implementation. A parallel implementation using a small number of processor cores can be expected to perform exact summation of large arrays at a speed that reaches the limit imposed by memory bandwidth. Some common methods that attempt to improve accuracy without being exact may therefore be pointless, at least for large summations, since they are slower than computing the sum exactly.
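
The superaccumulator idea (one wide fixed-point number that absorbs every term without rounding) can be illustrated with a toy version that leans on Python's arbitrary-precision integers. This is not the chunked 64-bit layout the abstract describes and makes no claim about speed; it only shows why an exact fixed-point accumulator gives an order-independent, correctly rounded result. The names `exact_sum` and `naive_sum` are hypothetical, and the sketch assumes finite inputs and relies on CPython rounding integer true division correctly.

```python
import math

SCALE_BITS = 1074  # the smallest positive double is 2**-1074


def exact_sum(values):
    """Sum finite floats exactly, then round once to the nearest double."""
    acc = 0  # big integer holding the sum in units of 2**-SCALE_BITS
    for v in values:
        n, d = float(v).as_integer_ratio()  # v == n / d, d a power of two
        acc += n * ((1 << SCALE_BITS) // d)
    return acc / (1 << SCALE_BITS)  # single rounding at the very end


def naive_sum(values):
    """Left-to-right summation, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


data = [1e100, 1.0, -1e100, 1e-20]
forward, backward = naive_sum(data), naive_sum(data[::-1])
exact_forward, exact_backward = exact_sum(data), exact_sum(data[::-1])
```

Here `forward` is `1e-20` and `backward` is `0.0`, while both exact results equal `1.0` and agree with `math.fsum(data)`, which also returns a correctly rounded sum.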

1505.05216 2026-06-04 eess.SY cs.SY math.OC stat.ML

Convergence Analysis of Policy Iteration

策略迭代的收敛性分析

Ali Heydari

AI总结 本文研究了确定性已知动态系统的非线性自适应最优控制问题,分析了基于稳定初始控制的策略迭代方案的收敛性及极限函数的最优性,并比较了策略迭代与价值迭代的收敛速度。

详情
AI中文摘要

本文研究了确定性已知动态系统的非线性自适应最优控制问题,分析了基于稳定初始控制的策略迭代方案的收敛性及极限函数的最优性,并比较了策略迭代与价值迭代的收敛速度。此外,还理论上将策略迭代的收敛结果扩展到了多步前瞻策略迭代的情况。

英文摘要

Adaptive optimal control of nonlinear dynamic systems with deterministic and known dynamics under a known undiscounted infinite-horizon cost function is investigated. A policy iteration scheme initiated using a stabilizing initial control is analyzed for solving the problem. The convergence of the iterations and the optimality of the limit functions, which follows from the established uniqueness of the solution to the Bellman equation, are the main results of this study. Furthermore, a theoretical comparison between the speed of convergence of policy iteration versus value iteration is presented. Finally, the convergence results are extended to the case of multi-step look-ahead policy iteration.
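
The two schemes compared in the abstract can be tried on a toy problem. The sketch below is an assumption-laden stand-in: it uses a four-state deterministic shortest-path problem with undiscounted stage costs rather than the continuous nonlinear systems the paper treats, and the transition table `MOVES` and all function names are hypothetical. Policy iteration starts from a stabilizing policy (every state jumps straight to the goal), while value iteration starts from the zero function.

```python
GOAL = 0
# MOVES[state] = list of (next_state, stage_cost); the goal is absorbing and free.
MOVES = {
    1: [(0, 5.0), (2, 1.0)],
    2: [(0, 2.0), (3, 1.0)],
    3: [(0, 4.5), (1, 1.0)],
}


def evaluate(policy, max_steps=100):
    """Cost-to-go of a policy, found by rolling it out from every state."""
    values = {GOAL: 0.0}
    for start in MOVES:
        state, total, steps = start, 0.0, 0
        while state != GOAL:
            if steps >= max_steps:
                raise ValueError("policy is not stabilizing (never reaches the goal)")
            state, cost = MOVES[state][policy[state]]
            total += cost
            steps += 1
        values[start] = total
    return values


def improve(values):
    """Greedy policy with respect to the given value function."""
    return {
        s: min(range(len(opts)), key=lambda a: opts[a][1] + values[opts[a][0]])
        for s, opts in MOVES.items()
    }


def policy_iteration(policy):
    while True:
        values = evaluate(policy)
        new_policy = improve(values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


def value_iteration(max_sweeps=100):
    values = {s: 0.0 for s in list(MOVES) + [GOAL]}
    for _ in range(max_sweeps):
        new = {GOAL: 0.0}
        for s, opts in MOVES.items():
            new[s] = min(cost + values[nxt] for nxt, cost in opts)
        if new == values:
            return values
        values = new
    raise RuntimeError("value iteration did not converge")


pi_policy, pi_values = policy_iteration({1: 0, 2: 0, 3: 0})
vi_values = value_iteration()
```

Both schemes end at the same cost-to-go on this example. Every intermediate policy in the policy iteration run reaches the goal, which mirrors the role of the stabilizing initial control in the abstract.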

1505.04824 2026-06-04 math.OC cs.SY eess.SY stat.ML

An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization

一种异步小批量算法用于正则化随机优化

Hamid Reza Feyzmahdavian, Arda Aytekin, Mikael Johansson

AI总结 本文提出一种异步小批量算法,用于解决具有平滑损失函数的正则化随机优化问题,通过合理选择步长值,实现O(1/√T)和O(1/T)的收敛速度,理论结果在分布式计算环境中得到验证。

详情
AI中文摘要

小批量优化已被证明是大规模学习的强大范式。然而,目前最先进的并行小批量算法假设同步操作或循环更新顺序。当工作节点异质(由于不同的计算能力或不同的通信延迟)时,同步和循环操作效率低下,因为它们会留下工人空闲等待较慢的节点完成计算。在本文中,我们提出了一种异步小批量算法,用于正则化随机优化问题,消除了空闲等待并允许工人以最大更新速率运行。我们证明,通过合适选择步长值,该算法在一般凸正则化函数情况下达到O(1/√T)的收敛速度,在强凸正则化函数情况下达到O(1/T)的收敛速度,其中T是迭代次数。在两种情况下,异步性对算法收敛速度的影响在渐近上可以忽略不计,可以期望在工人数量上的近线性加速。理论结果在分布式计算基础设施上的实际实现中得到证实。

英文摘要

Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state of the art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient since they will leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order $O(1/\sqrt{T})$ for general convex regularization functions, and the rate $O(1/T)$ for strongly convex regularization functions, where $T$ is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speedup in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.

1505.04724 2026-06-04 math.NA cs.NA stat.CO

A Hybrid Monte-Carlo Sampling Smoother for Four Dimensional Data Assimilation

一种用于四维数据同化的混合蒙特卡罗采样平滑器

Ahmed Attia, Vishwas Rao, Adrian Sandu

AI总结 本文提出一种基于集合的采样平滑器,用于四维数据同化,结合混合/汉密尔顿蒙特卡罗方法,有效采样初始时间的后验概率密度,适用于非高斯误差和非线性动力学。

详情
Comments
33 Pages
AI中文摘要

本文构造了一种基于集合的采样平滑器,用于四维数据同化,采用混合/汉密尔顿蒙特卡罗方法。该平滑器高效地从解的初始时间的后验概率密度中采样。与著名的集合卡尔曼平滑器不同,后者仅在线性高斯情况下最优,而本文的方法自然适应非高斯误差和非线性模型动态和观测算子。与四维变分方法不同,后者仅找到后验分布的一个模式,该平滑器提供后验不确定性的估计。可以使用集合均值作为状态的最小方差估计,或结合变分方法估计后续同化窗口的背景误差。数值结果展示了本文方法相较于传统变分和集合平滑方法的优势。

英文摘要

This paper constructs an ensemble-based sampling smoother for four-dimensional data assimilation using a Hybrid/Hamiltonian Monte-Carlo approach. The smoother samples efficiently from the posterior probability density of the solution at the initial time. Unlike the well-known ensemble Kalman smoother, which is optimal only in the linear Gaussian case, the proposed methodology naturally accommodates non-Gaussian errors and non-linear model dynamics and observation operators. Unlike the four-dimensional variational method, which only finds a mode of the posterior distribution, the smoother provides an estimate of the posterior uncertainty. One can use the ensemble mean as the minimum variance estimate of the state, or can use the ensemble in conjunction with the variational approach to estimate the background errors for subsequent assimilation windows. Numerical results demonstrate the advantages of the proposed method compared to the traditional variational and ensemble-based smoothing methods.
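
As background on the sampling engine named in the abstract, here is a generic Hamiltonian Monte Carlo sketch on a one-dimensional standard Gaussian target. It is not the smoother itself: in the data assimilation setting the potential would be the negative log posterior of the initial state and its gradient would come from the model adjoint, neither of which is shown here. The target, step size, trajectory length, and function names are all assumptions for illustration.

```python
import math
import random


def grad_potential(q):
    """Gradient of the assumed potential U(q) = q**2 / 2 (standard Gaussian)."""
    return q


def energy(q, p):
    """Total energy: potential plus kinetic, with unit mass."""
    return 0.5 * q * q + 0.5 * p * p


def leapfrog(q, p, step, n_steps):
    """Leapfrog integration: half kick, alternating drifts and kicks, half kick."""
    p -= 0.5 * step * grad_potential(q)
    for i in range(n_steps):
        q += step * p
        if i < n_steps - 1:
            p -= step * grad_potential(q)
    p -= 0.5 * step * grad_potential(q)
    return q, p


def hmc_sample(n_samples, step=0.2, n_steps=10, seed=0):
    """Draw samples with a Metropolis correction for the integration error."""
    rng = random.Random(seed)
    q, samples = 0.0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # fresh momentum each iteration
        q_new, p_new = leapfrog(q, p, step, n_steps)
        log_accept = energy(q, p) - energy(q_new, p_new)
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            q = q_new
        samples.append(q)
    return samples


ensemble = hmc_sample(200)
```

The resulting list plays the role of an ensemble: its mean estimates the state and its spread estimates the posterior uncertainty, the quantity a mode-finding variational method does not provide.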

1412.6095 2026-06-04 eess.SY cs.LG cs.SY math.OC stat.ML

Theoretical and Numerical Analysis of Approximate Dynamic Programming with Approximation Errors

近似动态规划中近似误差的理论与数值分析

Ali Heydari

AI总结 本文研究近似动态规划迭代中误差对最终结果的影响,分析确定性非线性最优控制问题中价值迭代方案的收敛性,并推导稳定性和吸引区域的充分条件。

详情
Comments
This study is the counterpart of another work of the author (arXiv:1412.5675) which was for value iterations with initial stabilizing guess (with overlaps on Theorem 1 and Lemma 1). As for the revision on this work, some steps of proofs are updated and an explanation about the approximation error is included. Initial submission date: 12/18/2014
AI中文摘要

本文旨在回答近似动态规划(ADP)每次迭代中的近似误差如何影响最终结果的问题。研究了在考虑每次迭代中的误差影响下,确定性非线性最优控制问题中价值迭代方案的收敛性。通过已知的一般最优控制问题中的量和可验证的假设,获得了围绕最优解的有界性。此外,由于近似误差导致结果偏离最优性,推导了在有限次价值迭代后获得的结果所操作系统的稳定性充分条件,以及其吸引区域的估计。最后,通过轨道机动问题的实现过程验证了理论发展的假设,并应用充分条件以保证稳定性和近优性。

英文摘要

This study is aimed at answering the famous question of how the approximation errors at each iteration of Approximate Dynamic Programming (ADP) affect the quality of the final results, considering the fact that errors at each iteration affect the next iteration. To this end, convergence of the Value Iteration scheme of ADP for deterministic nonlinear optimal control problems with undiscounted cost functions is investigated while considering the errors existing in approximating respective functions. The boundedness of the results around the optimal solution is obtained based on quantities which are known in a general optimal control problem and assumptions which are verifiable. Moreover, since the presence of the approximation errors leads to the deviation of the results from optimality, sufficient conditions for stability of the system operated by the result obtained after a finite number of value iterations, along with an estimation of its region of attraction, are derived in terms of a calculable upper bound of the control approximation error. Finally, the process of implementation of the method on an orbital maneuver problem is investigated through which the assumptions made in the theoretical developments are verified and the sufficient conditions are applied for guaranteeing stability and near optimality.

1412.5675 2026-06-04 eess.SY cs.SY math.OC stat.ML

Stabilizing Value Iteration with and without Approximation Errors

有无近似误差情形下的稳定化价值迭代

Ali Heydari

AI总结 本文分析了基于稳定策略启动的价值迭代在连续性、系统稳定性、算法收敛性和最优性等方面的表现,并探讨了近似误差对近似价值迭代的有界性和系统稳定性的影响。

详情
Comments
In this revision the proof of Lemma 5 is updated. Initial submission date: 12/17/2014. (This study has overlaps on Theorem 6 and Lemma 5 with another work of the author available at arXiv:1412.6095)
AI中文摘要

本文对使用价值迭代(VI)进行自适应最优控制进行了理论分析,包括结果的连续性、使用任何单一/常数控制策略操作系统的稳定性、使用演进/时间变化控制策略操作系统的稳定性、算法的收敛性以及极限函数的最优性。随后,考虑了涉及函数近似过程中的近似误差影响,并推导出另一组关于近似VI的有界性和在应用单一策略或演进策略时系统稳定性的结果。所提出的成果的一个特点是提供了吸引域的估计,使得如果初始条件在该区域内,整个轨迹将保持在其中,从而确保函数近似结果的可靠性。

英文摘要

Adaptive optimal control using value iteration (VI) initiated from a stabilizing policy is theoretically analyzed in various aspects including the continuity of the result, the stability of the system operated using any single/constant resulting control policy, the stability of the system operated using the evolving/time-varying control policy, the convergence of the algorithm, and the optimality of the limit function. Afterwards, the effect of presence of approximation errors in the involved function approximation processes is incorporated and another set of results for boundedness of the approximate VI as well as stability of the system operated under the results for both cases of applying a single policy or an evolving policy are derived. A feature of the presented results is providing estimations of the region of attraction so that if the initial condition is within the region, the whole trajectory will remain inside it and hence, the function approximation results will be reliable.

1312.7651 2026-06-04 stat.ML cs.LG cs.SY eess.SY

Petuum: A New Platform for Distributed Machine Learning on Big Data

Petuum:一种用于大数据上分布式机器学习的新平台

Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu

AI总结 本文提出一种通用框架,系统解决大规模机器学习中的数据和模型并行挑战,通过观察许多机器学习程序本质上是优化导向的,并允许容错、迭代收敛的算法解决方案,从而实现高效的系统设计。

详情
Comments
15 pages, 10 figures, final version in KDD 2015 under the same title
AI中文摘要

如何系统化地将各类先进的机器学习程序高效应用于工业级问题,即在大数据(多达太字节或拍字节)上使用大模型(多达数千亿参数)?现代并行化策略采用细粒度的操作和调度,超越了由MapReduce推广的经典批量同步处理范式,甚至超越了依赖机器学习程序图表示的专用图执行方式。方法的多样性往往将系统设计和算法设计引向不同方向,因此仍然难以找到一个适用于大规模、广泛机器学习程序的通用平台。我们提出一种通用框架,系统地应对大规模机器学习中数据并行和模型并行的挑战;其依据是,许多机器学习程序本质上以优化为中心,并允许容错的、迭代收敛的算法解。这为一体化系统设计提供了独特机会,例如有界误差的网络同步和基于机器学习程序结构的动态调度。我们展示了这些系统设计相对于现代机器学习算法知名实现的有效性:即使在规模不大的计算集群上,机器学习程序也能以少得多的时间、在大得多的模型规模下运行。

英文摘要

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions. This presents unique opportunities for an integrative system design, such as bounded-error network synchronization and dynamic scheduling based on ML program structure. We demonstrate the efficacy of these system designs versus well-known implementations of modern ML algorithms, allowing ML programs to run in much less time and at considerably larger model sizes, even on modestly-sized compute clusters.

1505.02343 2026-06-04 cs.LG cs.NA math.NA stat.ML

Bayesian Sparse Tucker Models for Dimension Reduction and Tensor Completion

基于贝叶斯稀疏Tucker模型的降维与张量补全

Qibin Zhao, Liqing Zhang, Andrzej Cichocki

AI总结 本文提出一种概率生成Tucker模型,通过结构稀疏性在多线性潜在空间中实现张量降维与补全,自动适应模型复杂度并提升泛化性能。

详情
AI中文摘要

Tucker分解是现代张量数据分析机器学习的核心,广泛应用于多向特征提取、压缩感知和张量补全。最具有挑战性的问题是确定模型复杂度(即多线性秩),尤其是在存在噪声和缺失数据时。此外,现有方法无法考虑潜在因子的不确定性信息,导致泛化性能低下。为了解决这些问题,我们提出了一类概率生成Tucker模型,用于张量分解与补全,具有结构稀疏性。为了利用结构稀疏建模,我们引入了两种组稀疏诱导先验,通过Laplace和学生t分布的分层表示,从而实现完全后验推断。对于模型学习,我们推导了所有模型(超)参数上的变分贝叶斯推断,并开发了基于多线性操作的高效可扩展算法。我们的方法可以自动适应模型复杂度,并通过模型证据的最大下界原理推断最优多线性秩。在合成、化学计量学和神经影像数据上的实验结果和比较显示,我们的模型在恢复多线性秩和缺失条目方面表现出色。

英文摘要

Tucker decomposition is the cornerstone of modern machine learning on tensorial data analysis, which has attracted considerable attention for multiway feature extraction, compressive sensing, and tensor completion. The most challenging problem is related to determination of model complexity (i.e., multilinear rank), especially when noise and missing data are present. In addition, existing methods cannot take into account uncertainty information of latent factors, resulting in low generalization performance. To address these issues, we present a class of probabilistic generative Tucker models for tensor decomposition and completion with structural sparsity over multilinear latent space. To exploit structural sparse modeling, we introduce two group sparsity inducing priors by hierarchical representation of Laplace and Student-t distributions, which facilitates full posterior inference. For model learning, we derive variational Bayesian inferences over all model (hyper)parameters, and develop efficient and scalable algorithms based on multilinear operations. Our methods can automatically adapt model complexity and infer an optimal multilinear rank by the principle of maximum lower bound of model evidence. Experimental results and comparisons on synthetic, chemometrics and neuroimaging data demonstrate remarkable performance of our models for recovering ground-truth of multilinear rank and missing entries.

1406.6668 2026-06-04 math.NA cs.NA math.ST stat.TH

Bayesian Numerical Homogenization

贝叶斯数值均质化

Houman Owhadi

AI总结 本文提出将数值均质化问题转化为贝叶斯推断问题,通过噪声激励和有限观测估计解的值,发现最优基函数具有最优恢复性质。

详情
Comments
22 pages. To appear in SIAM Multiscale Modeling and Simulation
AI中文摘要

数值均质化,即通过任意粗糙系数的PDE解空间的有限维近似,需要识别准确的基元素。这些基元素通常是在繁琐的科学探究和猜测过程中找到的。这个问题能否被简化?是否存在一个通用的配方/决策框架来指导基元素的设计?我们建议基于将数值均质化重新表述为贝叶斯推断问题,其中给定的PDE(或多尺度算子)被噪声(随机右端项/源项)激励,并试图基于有限观测估计解在给定点的值。我们将这种重新表述应用于任意积分-微分方程的数值均质化基元素识别,并展示这些基元素具有最优恢复性质。特别是我们展示了Rough Polyharmonic Splines如何作为高斯滤波问题的最优解而被重新发现。

英文摘要

Numerical homogenization, i.e. the finite-dimensional approximation of solution spaces of PDEs with arbitrary rough coefficients, requires the identification of accurate basis elements. These basis elements are oftentimes found after a laborious process of scientific investigation and plain guesswork. Can this identification problem be facilitated? Is there a general recipe/decision framework for guiding the design of basis elements? We suggest that the answer to the above questions could be positive based on the reformulation of numerical homogenization as a Bayesian Inference problem in which a given PDE with rough coefficients (or multi-scale operator) is excited with noise (random right hand side/source term) and one tries to estimate the value of the solution at a given point based on a finite number of observations. We apply this reformulation to the identification of bases for the numerical homogenization of arbitrary integro-differential equations and show that these bases have optimal recovery properties. In particular we show how Rough Polyharmonic Splines can be re-discovered as the optimal solution of a Gaussian filtering problem.

1505.00965 2026-06-04 math.NA cs.CE cs.NA physics.data-an q-fin.CP stat.CO

An Introduction to Multilevel Monte Carlo for Option Valuation

多级蒙特卡洛方法简介:期权定价应用

Desmond J. Higham

AI总结 本文介绍了多级蒙特卡洛方法,用于期权定价,总结了近期研究成果及该方法在提升计算效率方面的贡献。

详情
Comments
Submitted to International Journal of Computer Mathematics, special issue on Computational Methods in Finance
AI中文摘要

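Monte Carlo is a simple and flexible tool that is widely used in computational finance. In this setting, the quantity of interest is typically the expected value of a random variable defined by a stochastic differential equation. In 2008, Giles proposed a remarkable improvement on the approach of discretizing with a numerical method and then applying standard Monte Carlo. His multilevel Monte Carlo method delivers a speed-up of the order of the inverse of epsilon, where epsilon is the required accuracy, so a computation can run 100 times faster when two digits of accuracy are needed. The multilevel idea has since been taken up by many researchers, producing a wealth of practically significant results, most of which have not yet reached the expository literature. This paper gives a brief, accessible introduction to multilevel Monte Carlo and summarizes recent results relevant to option valuation.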

英文摘要

Monte Carlo is a simple and flexible tool that is widely used in computational finance. In this context, it is common for the quantity of interest to be the expected value of a random variable defined via a stochastic differential equation. In 2008, Giles proposed a remarkable improvement to the approach of discretizing with a numerical method and applying standard Monte Carlo. His multilevel Monte Carlo method offers an order of speed up given by the inverse of epsilon, where epsilon is the required accuracy. So computations can run 100 times more quickly when two digits of accuracy are required. The multilevel philosophy has since been adopted by a range of researchers and a wealth of practically significant results has arisen, most of which have yet to make their way into the expository literature. In this work, we give a brief, accessible, introduction to multilevel Monte Carlo and summarize recent results applicable to the task of option evaluation.
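The hundredfold figure quoted above can be checked with a short cost comparison. The following is a minimal sketch, not taken from the paper: it assumes the commonly cited cost orders (epsilon^-3 for standard Monte Carlo with a first-order weak discretization, epsilon^-2 for multilevel Monte Carlo) and ignores constants and logarithmic factors. The function names are hypothetical.

```python
def standard_mc_cost(eps: float) -> float:
    """Assumed cost order of standard Monte Carlo for accuracy eps.

    About eps**-2 samples are needed to control the statistical error,
    and each sample uses about eps**-1 time steps to control the bias.
    """
    return eps ** -3


def multilevel_mc_cost(eps: float) -> float:
    """Assumed cost order of multilevel Monte Carlo (log factors dropped)."""
    return eps ** -2


def speedup(eps: float) -> float:
    """Ratio of the two assumed costs; it scales like 1 / eps."""
    return standard_mc_cost(eps) / multilevel_mc_cost(eps)


if __name__ == "__main__":
    # Two digits of accuracy corresponds to eps = 0.01.
    print(round(speedup(0.01)))
```

Under these assumptions the ratio is 1/epsilon, which is the source of the factor of 100 at two digits of accuracy.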

1505.00526 2026-06-04 math.NA cs.NA stat.ML

An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection

列子集选择中的显式采样依赖谱误差界

Tianbao Yang, Lijun Zhang, Rong Jin, Shenghuo Zhu

AI总结 本文提出一种新的随机算法谱范数重构分析,建立显式依赖采样概率的误差界,揭示采样概率与重构误差的权衡,优于现有基于特定概率分布的误差界,并推导出更优的采样分布。

详情
AI中文摘要

本文研究列子集选择问题,对简单随机算法的谱范数重构进行了新分析,建立了显式依赖采样概率的误差界。该误差界允许更深入理解采样概率与重构误差的权衡,比现有利用特定概率分布的误差界具有更多洞察,并推导出更优的采样分布。特别地,我们证明了采样概率与统计杠杆得分平方根成正比的分布优于均匀采样,并在统计杠杆得分非常非均匀时优于基于杠杆的采样。通过解决与误差界相关的约束优化问题,利用高效的二分搜索方法,我们能够实现优于使用杠杆分布或与统计杠杆得分平方根成正比的分布的性能。数值模拟展示了新采样分布对低秩矩阵近似和最小二乘近似相比现有最先进算法的优势。

英文摘要

In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction for a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction error due to sampling probabilities, (ii) exhibits more insights than existing error bounds that exploit specific probability distributions, and (iii) implies better sampling distributions. In particular, we show that a sampling distribution with probabilities proportional to the square root of the statistical leverage scores is always better than uniform sampling and is better than leverage-based sampling when the statistical leverage scores are very nonuniform. By solving a constrained optimization problem related to the error bound with an efficient bisection search, we are able to achieve better performance than using either the leverage-based distribution or that proportional to the square root of the statistical leverage scores. Numerical simulations demonstrate the benefits of the new sampling distributions for low-rank matrix approximation and least-squares approximation compared to state-of-the-art algorithms.
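The three sampling distributions compared in the abstract (uniform, leverage-based, and square-root-leverage) can be written down directly. The sketch below is illustrative only: it assumes rank-k leverage scores computed from the top-k right singular vectors and sampling with replacement, and it does not reproduce the paper's algorithm, its rescaling, or its bisection-optimized distribution. All function names are hypothetical.

```python
import numpy as np


def leverage_scores(A: np.ndarray, k: int) -> np.ndarray:
    """Rank-k statistical leverage score of each column of A."""
    # Rows of vt are right singular vectors; keep the top k.
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(vt[:k, :] ** 2, axis=0)  # one score per column; sums to k


def sampling_distributions(A: np.ndarray, k: int) -> dict:
    """Uniform, leverage-based, and sqrt-leverage column distributions."""
    scores = leverage_scores(A, k)
    n = A.shape[1]
    root = np.sqrt(scores)
    return {
        "uniform": np.full(n, 1.0 / n),
        "leverage": scores / scores.sum(),
        "sqrt_leverage": root / root.sum(),
    }


def sample_columns(A, probs, num, rng):
    """Draw `num` column indices with replacement (an assumption here)."""
    idx = rng.choice(A.shape[1], size=num, replace=True, p=probs)
    return idx, A[:, idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))
    dists = sampling_distributions(A, k=3)
    idx, C = sample_columns(A, dists["sqrt_leverage"], num=10, rng=rng)
    print(C.shape)
```

Taking the square root flattens the leverage-based distribution toward uniform, which is one way to read the abstract's claim that it sits between the two.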

1505.00475 2026-06-04 stat.ME econ.GN q-fin.EC

On the Forecast Combination Puzzle

关于预测组合难题

Wei Qian, Craig A. Rolling, Gang Cheng, Yuhong Yang

AI总结 本文探讨了预测组合难题的成因,提出多级AFTER策略以解决该问题,通过模拟和实证分析揭示了预测组合场景的重要性。

详情
AI中文摘要

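The forecast combination literature often reports that a simple average of candidate forecasts is more robust than sophisticated combining methods. This phenomenon is usually called the "forecast combination puzzle". Motivated by this puzzle, we explore possible explanations, including estimation error, invalid weighting formulas, and model screening. We show that the existing understanding of the puzzle should be complemented by distinguishing two forecast combination scenarios: combining for adaptation and combining for improvement. Applying a combining method without regard to the underlying scenario can itself cause the puzzle. Building on this new understanding, we use simulations and real data evaluations to illustrate the causes of the puzzle. We further propose a multi-level AFTER strategy that integrates the strengths of different combining methods and adapts intelligently to the underlying scenario. In particular, by treating the simple average as a candidate forecast, the proposed strategy is shown to avoid the heavy cost of estimation error and, to a large extent, to solve the forecast combination puzzle.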

英文摘要

It is often reported in forecast combination literature that a simple average of candidate forecasts is more robust than sophisticated combining methods. This phenomenon is usually referred to as the "forecast combination puzzle". Motivated by this puzzle, we explore its possible explanations including estimation error, invalid weighting formulas and model screening. We show that existing understanding of the puzzle should be complemented by the distinction of different forecast combination scenarios known as combining for adaptation and combining for improvement. Applying combining methods without consideration of the underlying scenario can itself cause the puzzle. Based on our new understandings, both simulations and real data evaluations are conducted to illustrate the causes of the puzzle. We further propose a multi-level AFTER strategy that can integrate the strengths of different combining methods and adapt intelligently to the underlying scenario. In particular, by treating the simple average as a candidate forecast, the proposed strategy is shown to avoid the heavy cost of estimation error and, to a large extent, solve the forecast combination puzzle.

1505.00314 2026-06-04 cs.LG cs.SY eess.SY stat.ME

Deconstructing Principal Component Analysis Using a Data Reconciliation Perspective

从数据协调视角解构主成分分析

Shankar Narasimhan, Nirav Bhatt

AI总结 本文从数据协调视角探讨主成分分析,揭示两者紧密关联,构建统一框架并展示其协同处理数据的方法。

详情
Journal ref
Computers and Chemical Engineering 77 (2015) 74-84
AI中文摘要

数据协调(DR)和主成分分析(PCA)是过程工业中两种流行的数据分析技术。数据协调用于从错误测量中获得准确且一致的变量和参数估计。PCA主要用于减少高维数据的维度并作为去噪预处理技术。这两种技术曾被独立开发和部署。本文的主要目的是阐明这两种看似不同的技术之间的密切关系。这导致了PCA和DR的统一框架。进一步,我们展示了如何将这两种技术以协作和一致的方式应用于数据处理。该框架已扩展以处理部分测量系统,并纳入关于过程模型的部分知识。

英文摘要

Data reconciliation (DR) and Principal Component Analysis (PCA) are two popular data analysis techniques in process industries. Data reconciliation is used to obtain accurate and consistent estimates of variables and parameters from erroneous measurements. PCA is primarily used as a method for reducing the dimensionality of high dimensional data and as a preprocessing technique for denoising measurements. These techniques have been developed and deployed independently of each other. The primary purpose of this article is to elucidate the close relationship between these two seemingly disparate techniques. This leads to a unified framework for applying PCA and DR. Further, we show how the two techniques can be deployed together in a collaborative and consistent manner to process data. The framework has been extended to deal with partially measured systems and to incorporate partial knowledge available about the process model.

1504.06877 2026-06-04 eess.SY cs.SY stat.ML

Bayesian kernel-based system identification with quantized output data

基于量化输出数据的贝叶斯核系统辨识

Giulio Bottegal, Gianluigi Pillonetto, Håkan Hjalmarsson

AI总结 本文提出一种新的线性系统辨识方法,利用稳定样条核作为高斯过程的协方差函数,通过马尔可夫链蒙特卡洛方法估计系统参数,显著提升了量化数据下的辨识精度。

详情
Comments
Submitted to IFAC SysId 2015
AI中文摘要

本文介绍了一种新的线性系统辨识方法,针对量化输出数据。我们把脉冲响应建模为均值为零的高斯过程,其协方差(核)由最近提出的稳定样条核给出,该核编码了正则性和指数稳定性信息。这为将系统辨识问题置于贝叶斯框架中提供了基础。我们采用马尔可夫链蒙特卡洛(MCMC)方法来估计系统。特别地,我们展示了如何设计一个快速收敛于目标分布的吉布斯采样器。数值模拟显示,当应用于具有量化数据的系统辨识时,该方法在估计精度上显著优于现有基于核的方法。

英文摘要

In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo (MCMC) methods to provide an estimate of the system. In particular, we show how to design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods when employed in identification of systems with quantized data.

1504.05723 2026-06-04 stat.CO cs.RO cs.SY eess.SY q-fin.CP

Noise Robust Online Inference for Linear Dynamic Systems

针对线性动态系统的噪声鲁棒在线推断

Saikat Saha

AI总结 本文提出一种新的噪声自适应 Rao-Blackwellized 粒子滤波器,通过分层高斯模型近似非高斯噪声密度,以提高鲁棒性和适应性,同时保持可扩展性和易实现性。

详情
AI中文摘要

我们重新审视了在非高斯环境下线性动态系统(LDS)的贝叶斯在线推断问题。噪声可以自然非高斯(偏斜和/或重尾)或为了处理虚假观测,噪声可以建模为重尾分布。然而,这种噪声鲁棒性可能在没有虚假观测时导致性能下降。因此,任何推断引擎不仅要鲁棒于噪声异常,还应适应潜在未知且随时间变化的噪声参数;同时应具有可扩展性和易实现性。为了解决这些问题,本文提出了一种新的噪声自适应 Rao-Blackwellized 粒子滤波器(RBPF),通过分层高斯模型作为任何非高斯(过程或测量)噪声密度的代理。这导致了可处理的条件线性高斯模型(CLGM)。然而,该框架需要一个有效的转移核作为粒子滤波(PF)的目标状态。这通常是未知的。我们概述了如何通过辅助潜在变量方法构建这样的核,至少对于包含许多常见非高斯噪声的某些类别。通过数值研究验证了该 RBPF 算法的有效性。

英文摘要

We revisit the Bayesian online inference problems for linear dynamic systems (LDS) in a non-Gaussian environment. The noises can naturally be non-Gaussian (skewed and/or heavy tailed), or they can be modeled as heavy tailed to accommodate spurious observations. However, at the cost of such noise robustness, the performance may degrade when such spurious observations are absent. Therefore, any inference engine should not only be robust to noise outliers, but also be adaptive to potentially unknown and time-varying noise parameters; yet it should be scalable and easy to implement. To address these requirements, we envisage here a new noise adaptive Rao-Blackwellized particle filter (RBPF), by leveraging a hierarchically Gaussian model as a proxy for any non-Gaussian (process or measurement) noise density. This leads to a conditionally linear Gaussian model (CLGM) that is tractable. However, this framework requires a valid transition kernel for the intractable state, targeted by the particle filter (PF). This is typically unknown. We outline how such a kernel can be constructed provably, at least for certain classes encompassing many commonly occurring non-Gaussian noises, using an auxiliary latent variable approach. The efficacy of this RBPF algorithm is demonstrated through numerical studies.

1412.4044 2026-06-04 stat.ML cs.CV cs.NA math.NA math.OC

Adaptive Stochastic Gradient Descent on the Grassmannian for Robust Low-Rank Subspace Recovery and Clustering

在Grassmannian上进行自适应随机梯度下降用于鲁棒低秩子空间恢复与聚类

Jun He, Yue Zhang

AI总结 本文提出GASG21算法,通过在Grassmann流形上进行自适应随机梯度下降,实现从大矩阵中鲁棒地恢复低秩子空间,并通过K子空间扩展实现对受损数据的聚类。

详情
Comments
13 pages, 12 figures and 6 tables
AI中文摘要

在本文中,我们提出了GASG21(Grassmannian Adaptive Stochastic Gradient for $L_{2,1}$ norm minimization),一种自适应随机梯度算法,用于从大规模矩阵中鲁棒地恢复低秩子空间。在存在列异常值的情况下,我们将批量模式矩阵$L_{2,1}$范数最小化问题(带有秩约束)重新公式化为受Grassmann流形约束的随机优化方法。对于每个观测数据向量,低秩子空间$\mathcal{S}$通过沿着Grassmannian的测地线进行梯度步长更新。为了加速随机梯度方法的收敛速度,我们选择通过利用连续梯度来自适应调整常数步长。此外,我们证明了在适当初始化的情况下,K子空间扩展K-GASG21可以将大量受损数据向量鲁棒地聚类到子空间的并集。在合成和真实数据上的数值实验展示了所提出算法在重柱异常值腐蚀下的效率和准确性。

英文摘要

In this paper, we present GASG21 (Grassmannian Adaptive Stochastic Gradient for $L_{2,1}$ norm minimization), an adaptive stochastic gradient algorithm to robustly recover the low-rank subspace from a large matrix. In the presence of column outliers, we reformulate the batch-mode matrix $L_{2,1}$ norm minimization problem with a rank constraint as a stochastic optimization problem constrained to the Grassmann manifold. For each observed data vector, the low-rank subspace $\mathcal{S}$ is updated by taking a gradient step along a geodesic of the Grassmannian. In order to accelerate the convergence rate of the stochastic gradient method, we choose to adaptively tune the constant step-size by leveraging the consecutive gradients. Furthermore, we demonstrate that with proper initialization, the K-subspaces extension, K-GASG21, can robustly cluster a large number of corrupted data vectors into a union of subspaces. Numerical experiments on synthetic and real data demonstrate the efficiency and accuracy of the proposed algorithms even under heavy corruption by column outliers.

1405.5170 2026-06-04 math.NA cs.NA stat.ML

The ROMES method for statistical modeling of reduced-order-model error

ROMES方法用于减少阶模型误差的统计建模

Martin Drohmann, Kevin Carlberg

AI总结 ROMES方法利用高斯过程回归,通过低成本的误差指标构建真实误差分布,有效建模规范误差和一般输出误差,提升预测精度。

详情
Journal ref
SIAM/ASA Journal on Uncertainty Quantification, Vol. 3, No. 1, p. 116-145 (2015)
AI中文摘要

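This work presents a technique for statistically modeling the errors introduced by reduced-order models. The method uses Gaussian-process regression to construct a mapping from a small number of computationally inexpensive "error indicators" to a distribution over the true error. The variance of this distribution can be interpreted as the (epistemic) uncertainty introduced by the reduced-order model. To model normed errors, the method uses existing rigorous error bounds and residual norms as indicators; numerical experiments show that it achieves a near-optimal expected effectivity, in contrast to typical error bounds. To model errors in general outputs, the method uses dual-weighted residuals, which are amenable to uncertainty control, as indicators. Experiments show that correcting the reduced-order-model output with this surrogate can improve prediction accuracy by an order of magnitude; this contrasts with existing "multifidelity correction" approaches, which often fail for reduced-order models and suffer from the curse of dimensionality. The proposed error surrogates also lead to a notion of "probabilistic rigor", in which the surrogate bounds the error with a specified probability.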

英文摘要

This work presents a technique for statistically modeling errors introduced by reduced-order models. The method employs Gaussian-process regression to construct a mapping from a small number of computationally inexpensive `error indicators' to a distribution over the true error. The variance of this distribution can be interpreted as the (epistemic) uncertainty introduced by the reduced-order model. To model normed errors, the method employs existing rigorous error bounds and residual norms as indicators; numerical experiments show that the method leads to a near-optimal expected effectivity in contrast to typical error bounds. To model errors in general outputs, the method uses dual-weighted residuals---which are amenable to uncertainty control---as indicators. Experiments illustrate that correcting the reduced-order-model output with this surrogate can improve prediction accuracy by an order of magnitude; this contrasts with existing `multifidelity correction' approaches, which often fail for reduced-order models and suffer from the curse of dimensionality. The proposed error surrogates also lead to a notion of `probabilistic rigor', i.e., the surrogate bounds the error with specified probability.

1403.5045 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Matroid Bandits: Fast Combinatorial Optimization with Learning

Matroid Bandits: 快速组合优化中的学习

Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, Brian Eriksson

AI总结 本文提出matroid bandits,结合bandits和matroids,通过Optimistic Matroid Maximization算法解决在matroid上最大化随机模函数的问题,并给出两种 regret 上界。

详情
AI中文摘要

Matroid 是组合优化中独立性的概念,与计算效率密切相关。本文将bandits与matroids结合,提出matroid bandits,目标是学习在matroid上最大化随机模函数。我们提出实用算法Optimistic Matroid Maximization (OMM),并证明两种regret上界,均为亚线性时间,且在其他量上至多线性。gap-dependent上界是紧的,并证明了partition matroid bandit的匹配下界。最后在三个实际问题上评估了该方法,证明其实用性。

英文摘要

A matroid is a notion of independence in combinatorial optimization which is closely related to computational efficiency. In particular, it is well known that the maximum of a constrained modular function can be found greedily if and only if the constraints are associated with a matroid. In this paper, we bring together the ideas of bandits and matroids, and propose a new class of combinatorial bandits, matroid bandits. The objective in these problems is to learn how to maximize a modular function on a matroid. This function is stochastic and initially unknown. We propose a practical algorithm for solving our problem, Optimistic Matroid Maximization (OMM); and prove two upper bounds, gap-dependent and gap-free, on its regret. Both bounds are sublinear in time and at most linear in all other quantities of interest. The gap-dependent upper bound is tight and we prove a matching lower bound on a partition matroid bandit. Finally, we evaluate our method on three real-world problems and show that it is practical.
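The greedy property cited in the abstract can be illustrated on a small partition matroid with known weights. This is a minimal sketch under stated assumptions: the weights are nonnegative and already known, so there is no learning step, and it is not the paper's OMM algorithm. The names `partition_oracle` and `greedy_max` are hypothetical.

```python
from collections import Counter


def partition_oracle(group_of, capacity=1):
    """Independence oracle for a partition matroid.

    A set is independent if it holds at most `capacity` items per group.
    """
    def is_independent(items):
        counts = Counter(group_of[i] for i in items)
        return all(c <= capacity for c in counts.values())
    return is_independent


def greedy_max(weights, is_independent):
    """Greedily build a maximum-weight independent set.

    Assumes nonnegative weights. Items are scanned from heaviest to
    lightest and kept whenever the set stays independent.
    """
    chosen = []
    for item in sorted(weights, key=weights.get, reverse=True):
        if is_independent(chosen + [item]):
            chosen.append(item)
    return chosen


if __name__ == "__main__":
    weights = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4}
    group_of = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(greedy_max(weights, partition_oracle(group_of)))  # ['a', 'c']
```

In the bandit setting described above the weights are stochastic and initially unknown, so a learning algorithm has to work with estimates in place of the fixed dictionary used here.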

1504.02914 2026-06-04 stat.CO cs.MS cs.NA math.NA

Representing numeric data in 32 bits while preserving 64-bit precision

在32位中表示数值数据的同时保持64位精度

Radford M. Neal

AI总结 本文提出了一种在32位中精确表示64位浮点数的方法,通过保留高位mantissa部分和指数,并利用表查找补全低位,从而实现高效压缩。

详情
AI中文摘要

数据文件常包含仅有少数有效十进制数字的数值,其信息量足以用32位存储。然而,我们可能需要对这些数值进行64位浮点精度的算术运算,这不允许简单地将数据表示为32位浮点数。十进制浮点数提供紧凑且精确的表示,但需要在使用前进行缓慢的除法转换。本文展示,64位浮点数的有趣子集可以由32位表示,包含符号、指数和高位mantissa部分,低位mantissa部分通过表查找补全,索引由保留mantissa部分的位和可能的指数位提供。例如,具有4位或更少小数点左边数字和2位或更少小数点右边数字的十进制数据,可通过保留mantissa部分的低位5位作为索引进行表示。数据包含6位十进制数,小数点在任一数字前或后的位置也可通过这种方式表示,并使用mantissa和指数的19位作为索引进行解码。编码只需复制64位值的一半,必要时通过检查是否能正确解码进行验证。解码只需提取索引位并进行表查找。小型表查找通常会命中缓存;即使使用更大的表,解码速度仍快于十进制浮点数的除法转换。本文讨论了此类方案在最新计算机系统中的性能,以及如何用于自动压缩解释性语言如R中的大型数组。

英文摘要

Data files often consist of numbers having only a few significant decimal digits, whose information content would allow storage in only 32 bits. However, we may require that arithmetic operations involving these numbers be done with 64-bit floating-point precision, which precludes simply representing the data as 32-bit floating-point values. Decimal floating point gives a compact and exact representation, but requires conversion with a slow division operation before it can be used. Here, I show that interesting subsets of 64-bit floating-point values can be compactly and exactly represented by the 32 bits consisting of the sign, exponent, and high-order part of the mantissa, with the lower-order 32 bits of the mantissa filled in by table lookup, indexed by bits from the part of the mantissa retained, and possibly from the exponent. For example, decimal data with 4 or fewer digits to the left of the decimal point and 2 or fewer digits to the right of the decimal point can be represented in this way using the lower-order 5 bits of the retained part of the mantissa as the index. Data consisting of 6 decimal digits with the decimal point in any of the 7 positions before or after one of the digits can also be represented this way, and decoded using 19 bits from the mantissa and exponent as the index. Encoding with such a scheme is a simple copy of half the 64-bit value, followed if necessary by verification that the value can be represented, by checking that it decodes correctly. Decoding requires only extraction of index bits and a table lookup. Lookup in a small table will usually reference cache; even with larger tables, decoding is still faster than conversion from decimal floating point with a division operation. I discuss how such schemes perform on recent computer systems, and how they might be used to automatically compress large arrays in interpretive languages such as R.
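The first example in the abstract (up to 4 digits before and 2 after the decimal point, indexed by the low 5 bits of the retained mantissa) can be sketched as follows. This is an illustrative reconstruction, not the author's code: it assumes IEEE 754 binary64 floats and builds the lookup table by enumerating values of the form n/100. All function names are hypothetical.

```python
import struct


def split_words(x: float) -> tuple:
    """Return the (upper 32 bits, lower 32 bits) of a 64-bit float."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 32, bits & 0xFFFFFFFF


def build_table(n_values: int = 1_000_000, index_bits: int = 5) -> dict:
    """Map index bits to the lower 32 mantissa bits for values n / 100.

    Raises ValueError if two values share an index but differ in their
    lower word, i.e. if the set is not representable with this index.
    """
    mask = (1 << index_bits) - 1
    table = {}
    for n in range(n_values):
        hi, lo = split_words(n / 100)
        if table.setdefault(hi & mask, lo) != lo:
            raise ValueError(f"value {n / 100} conflicts with the table")
    return table


def encode(x: float) -> int:
    """Encoding keeps the sign, exponent, and top 20 mantissa bits."""
    return split_words(x)[0]


def decode(hi: int, table: dict, index_bits: int = 5) -> float:
    """Fill in the lower 32 mantissa bits by table lookup."""
    lo = table[hi & ((1 << index_bits) - 1)]
    return struct.unpack("<d", struct.pack("<Q", (hi << 32) | lo))[0]


if __name__ == "__main__":
    table = build_table()
    print(decode(encode(1234.56), table) == 1234.56)
```

As the abstract notes, an encoder can verify that a value is representable by checking that it decodes back to itself; values outside the intended set would generally fail that check.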

1403.6015 2026-06-04 math.NA astro-ph.IM cs.NA math.ST stat.TH

Fast Direct Methods for Gaussian Processes

高斯过程的快速直接方法

Sivaram Ambikasaran, Daniel Foreman-Mackey, Leslie Greengard, David W. Hogg, Michael O'Neil

AI总结 本文提出高斯过程中高维协方差矩阵的快速计算方法,通过分层分解降低计算复杂度,实现高维概率计算。

详情
AI中文摘要

概率与统计中的许多问题可通过多元正态分布解决。在一维情况下,给定均值和方差只需计算相应高斯密度。在n维情况下,需求解n×n协方差矩阵C及其行列式det(C)。在高斯过程回归中,C通常为σ²I+K,其中K由数据和超参数决定。C通常密集,导致标准直接方法需O(n³)工作量,难以处理大规模问题。本文证明,对于常用协方差函数,C可分解为块低秩更新的乘积,得到O(nlog²n)算法。此分解使det(C)计算成为可能,允许在较宽松的核假设下进行高维概率计算。该快速算法使边缘化和超参数适应在单CPU内核上变得可行。近最优的规模扩展与高性能计算资源结合,将使之前难以处理的问题建模成为可能。本文在标准协方差核上展示了该方法的性能。

英文摘要

A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the $n$-dimensional setting, however, it requires the inversion of an $n \times n$ covariance matrix, $C$, as well as the evaluation of its determinant, $\det(C)$. In many cases, such as regression using Gaussian processes, the covariance matrix is of the form $C = σ^2 I + K$, where $K$ is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix $C$ is typically dense, causing standard direct methods for inversion and determinant evaluation to require $\mathcal O(n^3)$ work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal O (n\log^2 n) $ algorithm for inversion. More importantly, we show that this factorization enables the evaluation of the determinant $\det(C)$, permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining $K$. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels.
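For reference, the standard dense computation that the abstract describes as costing $\mathcal O(n^3)$ can be sketched with a Cholesky factorization. This is the baseline only, not the hierarchical factorization proposed in the paper; the squared-exponential kernel, its parameters, and the function names are assumptions chosen for illustration.

```python
import numpy as np


def sq_exp_kernel(x: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Squared-exponential covariance matrix K for 1-D inputs x."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_log_likelihood(x, y, noise_var=0.1, length_scale=1.0):
    """Zero-mean Gaussian log-likelihood with C = noise_var * I + K.

    Returns (log_likelihood, log_det_C). The Cholesky factorization is
    the O(n^3) step that dominates the cost for dense C.
    """
    n = x.size
    C = noise_var * np.eye(n) + sq_exp_kernel(x, length_scale)
    L = np.linalg.cholesky(C)                            # C = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # C^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))           # log det(C)
    ll = -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2 * np.pi)
    return ll, log_det


if __name__ == "__main__":
    x = np.linspace(0.0, 5.0, 30)
    y = np.sin(x)
    print(gp_log_likelihood(x, y))
```

Both the solve and the determinant come from the same factorization, which is why a faster factorization of $C$ speeds up the whole likelihood evaluation.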

1503.08855 2026-06-04 math.OC cs.IT cs.LG cs.MA cs.SY eess.SY math.IT stat.ML

Decentralized learning for wireless communications and networking

无线通信与网络中的去中心化学习

Georgios B. Giannakis, Qing Ling, Gonzalo Mateos, Ioannis D. Schizas, Hao Zhu

AI总结 本文提出了一种去中心化学习算法,用于图数据的网络内处理,通过交替方向乘子法实现分布式优化,适用于无线通信和网络任务的案例研究。

详情
Comments
Contributed chapter to appear in Splitting Methods in Communication and Imaging, Science and Engineering, R. Glowinski, S. Osher, and W. Yin, Editors, New York, Springer, 2015
AI中文摘要

本章讨论了用于图数据网络内处理的去中心化学习算法。提出了一个通用学习问题,并将其转换为可分离形式,通过交替方向乘子法(ADMM)迭代最小化以实现所需程度的并行化。在不交换分布式训练集元素且保持节点间通信成本可控的情况下,本地学习器同意全球推断得到的量,即若整个训练数据集集中可用时所获得的量。通过包括使用无线传感器网络的目标跟踪、揭示互联网流量异常、电力系统状态估计以及无线认知无线电网络的频谱制图等案例研究,展示了去中心化学习框架对当代无线通信和网络任务的影响。

英文摘要

This chapter deals with decentralized learning algorithms for in-network processing of graph-valued data. A generic learning problem is formulated and recast into a separable form, which is iteratively minimized using the alternating-direction method of multipliers (ADMM) so as to gain the desired degree of parallelization. Without exchanging elements from the distributed training sets and keeping inter-node communications at affordable levels, the local (per-node) learners consent to the desired quantity inferred globally, meaning the one obtained if the entire training data set were centrally available. The impact of the decentralized learning framework on contemporary wireless communications and networking tasks is illustrated through case studies including target tracking using wireless sensor networks, unveiling Internet traffic anomalies, power system state estimation, as well as spectrum cartography for wireless cognitive radio networks.
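The idea of local learners agreeing on a globally inferred quantity can be illustrated with the simplest consensus problem, averaging. The sketch below uses the textbook global-variable consensus form of ADMM, in which a shared variable `z` is updated by averaging; the chapter itself targets fully decentralized, neighbor-to-neighbor exchanges, which this toy example does not reproduce. The local costs $f_i(x) = \tfrac12 (x - a_i)^2$ and the function name are assumptions.

```python
import numpy as np


def consensus_admm_mean(a, rho: float = 1.0, iters: int = 100):
    """Consensus ADMM for min sum_i 0.5 * (x_i - a_i)**2 s.t. x_i = z.

    Each node i holds a private value a_i, a local estimate x_i, and a
    multiplier y_i. The consensus value z converges to mean(a).
    """
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(a)
    y = np.zeros_like(a)
    z = 0.0
    for _ in range(iters):
        # Local step: closed-form minimizer of the augmented Lagrangian.
        x = (a + rho * z - y) / (1.0 + rho)
        # Consensus step: average of the shifted local estimates.
        z = float(np.mean(x + y / rho))
        # Multiplier step: penalize disagreement with the consensus.
        y = y + rho * (x - z)
    return z, x


if __name__ == "__main__":
    z, x = consensus_admm_mean([1.0, 2.0, 6.0])
    print(z, x)
```

Only the local estimates and multipliers enter the consensus step, so the private values `a` are never exchanged, which mirrors the abstract's point about not sharing training data.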

1307.6701 2026-06-04 math.NA cs.NA math.ST stat.CO stat.ME stat.TH

Iterative Estimation of Solutions to Noisy Nonlinear Operator Equations in Nonparametric Instrumental Regression

迭代估计噪声非线性算子方程的解在非参数工具回归中

Fabian Dunker, Jean-Pierre Florens, Thorsten Hohage, Jan Johannes, Enno Mammen

AI总结 本文提出一种正则化牛顿型迭代方法,用于解决噪声积分核下的非线性积分方程,特别强调在工具回归模型中使用更强的独立性假设,以正确估计非参数回归函数。

详情
Journal ref
Journal of Econometrics, 2014, 178, 444-455
AI中文摘要

本文讨论了非参数工具回归中出现的非线性积分方程的解法,其中积分核含噪声。我们提出了一种正则化牛顿型迭代方法,并建立了收敛性和收敛速度的结果。特别强调了在工具回归模型中,通常的条件均值假设被更强的独立性假设所替代。我们展示了在二元工具情况下,我们的方法能够正确估计在标准模型中不可识别的回归函数,并通过计算示例和模拟数据进行了说明。

英文摘要

This paper discusses the solution of nonlinear integral equations with noisy integral kernels as they appear in nonparametric instrumental regression. We propose a regularized Newton-type iteration and establish convergence and convergence rate results. A particular emphasis is on instrumental regression models where the usual conditional mean assumption is replaced by a stronger independence assumption. We demonstrate for the case of a binary instrument that our approach allows the correct estimation of regression functions which are not identifiable with the standard model. This is illustrated in computed examples with simulated data.

1404.2006 2026-06-04 math.DG cs.IT cs.SY eess.SY math.IT math.ST stat.TH

Kählerian information geometry for signal processing

Kähler信息几何在信号处理中的应用

Jaehyung Choi, Andrew P. Mullhaupt

AI总结 本文研究了信号滤波器的信息几何与Kähler流形的对应关系,证明了最小相位线性系统的信息几何为Kähler流形,并探讨了其在信号处理中的应用与优势。

详情
Journal ref
Entropy 17(4), 1581-1605 (2015)
Comments
24 pages, published version
AI中文摘要

我们证明了信号滤波器的信息几何与Kähler流形之间的对应关系。最小相位线性系统的信息几何是一个Kähler流形,信号滤波器的复频谱范数的平方对应于Kähler势。Kähler流形的Hermite结构仅在脉冲响应函数中最高次$z$的项在模型参数中为常数时才会显式出现。Kähler信息几何利用了更高效的度量张量和里奇张量计算步骤。此外,几何张量的$α$-泛化在$α$上是线性的。它还对寻找贝叶斯预测先验,如超谐波先验,具有鲁棒性,因为Kähler流形上的拉普拉斯-贝尔特里算子比非Kähler流形上的形式简单得多。几种时间序列模型在Kähler信息几何中被研究。

英文摘要

We prove the correspondence between the information geometry of a signal filter and a Kähler manifold. The information geometry of a minimum-phase linear system with a finite complex cepstrum norm is a Kähler manifold. The square of the complex cepstrum norm of the signal filter corresponds to the Kähler potential. The Hermitian structure of the Kähler manifold is explicitly emergent if and only if the impulse response function of the highest degree in $z$ is constant in model parameters. The Kählerian information geometry takes advantage of more efficient calculation steps for the metric tensor and the Ricci tensor. Moreover, $α$-generalization on the geometric tensors is linear in $α$. It is also robust to find Bayesian predictive priors, such as superharmonic priors, because Laplace-Beltrami operators on Kähler manifolds are in much simpler forms than those of the non-Kähler manifolds. Several time series models are studied in the Kählerian information geometry.

1410.6558 2026-06-04 cs.IT cs.NA math.IT math.NA stat.ME

Sampling in the Analysis Transform Domain

分析域中的采样

Raja Giryes

AI总结 本文提出了一种新的采样方案,使现有合成方法可用于分析模型信号的恢复。

详情
Comments
13 Pages, 2 figures
AI中文摘要

许多信号和图像处理应用受益于底层信号处于低维子空间的事实。其中一种低维性模型是稀疏性。在该框架下,稀疏建模有两种主要方法:合成方法和分析方法。在合成方法中,信号被假设在给定字典下具有稀疏表示。而分析方法中,稀疏性则在应用特定变换后的信号系数中测量。尽管已有多种算法和理论,但合成方法的算法数量更多。鉴于分析字典可能是帧或二维有限差分算子,本文提出了一种新的采样方案,允许使用现有合成方法从分析模型信号的样本中恢复信号。这种新采样策略的优势在于,它使现有的合成方法及其理论也可用于分析框架中的信号。

英文摘要

Many signal and image processing applications have benefited remarkably from the fact that the underlying signals reside in a low dimensional subspace. One of the main models for such a low dimensionality is the sparsity one. Within this framework there are two main options for the sparse modeling: the synthesis and the analysis ones, where the first is considered the standard paradigm, to which much more research has been dedicated. In it the signals are assumed to have a sparse representation under a given dictionary. On the other hand, in the analysis approach the sparsity is measured in the coefficients of the signal after applying a certain transformation, the analysis dictionary, to it. Though several algorithms with some theory have been developed for this framework, they are outnumbered by the ones proposed for the synthesis methodology. Given that the analysis dictionary is either a frame or the two dimensional finite difference operator, we propose a new sampling scheme for signals from the analysis model that allows them to be recovered from their samples using any existing algorithm from the synthesis model. The advantage of this new sampling strategy is that it makes the existing synthesis methods with their theory also available for signals from the analysis framework.

1310.0807 2026-06-04 cs.IT cs.LG cs.NA math.IT math.NA math.ST stat.ML stat.TH

Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming

通过凸优化从二次采样中获得精确且稳定的协方差估计

Yuxin Chen, Yuejie Chi, Andrea Goldsmith

AI总结 本文研究了通过凸优化从二次采样中提取高维数据协方差结构的方法,探讨了低秩、Toeplitz低秩、稀疏性等结构假设,并展示了在无噪声情况下能实现准确的协方差估计。

详情
Comments
accepted to IEEE Transactions on Information Theory, 2015
AI中文摘要

高维数据的统计推断和信息处理往往需要高效且准确的二阶统计量估计。在数据快速变化、处理能力和存储有限的情况下,提取协方差结构需要单次数据遍历和少量存储测量。本文探讨了一种二次(或秩一)测量模型,该模型在采样过程中具有最小的内存需求和低计算复杂性,并在保持各种低维协方差结构方面被证明是最优的。具体而言,研究了四种流行的协方差矩阵结构假设,即低秩、Toeplitz低秩、稀疏性以及联合秩一和稀疏结构,通过针对相应结构的凸松弛范式实现恢复。所提出的二次采样框架具有多种潜在应用,包括流数据处理、高频无线通信、相空间断层扫描和光学相位恢复,以及非相干子空间检测。我们的方法在无噪声情况下,只要测量数量超过信息论极限即可实现普遍准确的协方差估计。我们还展示了该方法在噪声和不完美结构假设下的鲁棒性。我们的分析基于一种新的概念,称为混合范数限制等距性质(RIP-ℓ₂/ℓ₁),以及传统的RIP-ℓ₂/ℓ₂用于近各向同性和有界测量。此外,我们的结果在使用PhaseLift进行相位恢复(包括密集和稀疏信号)的已知最佳保证方面,采用了一种显著更简单的方法。

英文摘要

Statistical inference and information processing of high-dimensional data often require efficient and accurate estimation of their second-order statistics. With rapidly changing data, limited processing power and storage at the acquisition devices, it is desirable to extract the covariance structure from a single pass over the data and a small number of stored measurements. In this paper, we explore a quadratic (or rank-one) measurement model which imposes minimal memory requirements and low computational complexity during the sampling process, and is shown to be optimal in preserving various low-dimensional covariance structures. Specifically, four popular structural assumptions of covariance matrices, namely low rank, Toeplitz low rank, sparsity, jointly rank-one and sparse structure, are investigated, while recovery is achieved via convex relaxation paradigms for the respective structure. The proposed quadratic sampling framework has a variety of potential applications including streaming data processing, high-frequency wireless communication, phase space tomography and phase retrieval in optics, and non-coherent subspace detection. Our method admits universally accurate covariance estimation in the absence of noise, as soon as the number of measurements exceeds the information theoretic limits. We also demonstrate the robustness of this approach against noise and imperfect structural assumptions. Our analysis is established upon a novel notion called the mixed-norm restricted isometry property (RIP-$\ell_{2}/\ell_{1}$), as well as the conventional RIP-$\ell_{2}/\ell_{2}$ for near-isotropic and bounded measurements. In addition, our results improve upon the best-known phase retrieval (including both dense and sparse signals) guarantees using PhaseLift with a significantly simpler approach.

1503.05567 2026-06-04 stat.ML cs.IT cs.SY eess.SY math.IT

The Knowledge Gradient Policy Using A Sparse Additive Belief Model

基于稀疏加性信念模型的知识梯度策略

Yan Li, Han Liu, Warren Powell

AI总结 本文提出一种用于高维稀疏信念函数的序贯学习策略,结合贝叶斯R&S与频率学学习,通过B-样条基扩展和非参数加性模型实现稀疏线性模型参数估计。

详情
AI中文摘要

我们提出了一种用于噪声离散全局优化和排序选择问题的序贯学习策略,其中高维稀疏信念函数包含数百甚至数千个特征,但仅有少量特征具有解释力。我们推导了带有组Lasso惩罚的稀疏线性模型的知识梯度策略(KGSpLin),该策略是贝叶斯R&S与频率学学习的混合方法。特别地,我们的方法自然结合了B-样条基扩展,并推广到非参数加性模型(KGSpAM)和函数ANOV模型。理论上,我们提供了后验均值估计和函数估计的误差界。受控实验表明,该算法即使在嵌入数百个虚参数的模型中也能高效学习正确非零参数集,并优于线性模型的知识梯度方法。

英文摘要

We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R\&S) problems with high dimensional sparse belief functions, where there are hundreds or even thousands of features, but only a small portion of these features contain explanatory power. We aim to identify the sparsity pattern and select the best alternative before the finite budget is exhausted. We derive a knowledge gradient policy for sparse linear models (KGSpLin) with group Lasso penalty. This policy is a unique and novel hybrid of Bayesian R\&S with frequentist learning. Particularly, our method naturally combines B-spline basis expansion and generalizes to the nonparametric additive model (KGSpAM) and functional ANOVA model. Theoretically, we provide the estimation error bounds of the posterior mean estimate and the functional estimate. Controlled experiments show that the algorithm efficiently learns the correct set of nonzero parameters even when the model is imbedded with hundreds of dummy parameters. Also it outperforms the knowledge gradient for a linear model.

1503.03903 2026-06-04 cs.LG cs.IT cs.NA math.IT math.NA stat.ML

Approximating Sparse PCA from Incomplete Data

从不完整数据近似稀疏PCA

Abhisek Kundu, Petros Drineas, Malik Magdon-Ismail

AI总结 研究如何利用少量数据元素形成的草图恢复数据矩阵的稀疏主成分,证明草图接近原矩阵时可获得近优解,提出稀疏PCA算法并展示其在多领域数据上的有效性,提升运行效率。

详情
AI中文摘要

我们研究如何利用少量数据元素形成的草图来恢复数据矩阵的稀疏主成分。我们证明,对于广泛的一类优化问题,如果草图在谱范数上接近原始数据矩阵,则可通过草图获得优化问题的近优解。特别是,我们使用此方法获得稀疏主成分,并证明对于$n$维空间中的$m$个数据点,$O(ε^{-2}\tilde{k}\max\{m,n\})$个元素可提供稀疏PCA问题的$ε$-近似解($\tilde{k}$是数据矩阵的稳定秩)。我们广泛在图像、文本、生物和金融数据上展示了我们的算法。结果表明,不仅能够从不完整数据中恢复稀疏主成分,而且通过使用稀疏草图,运行时间可减少五倍或更多。

英文摘要

We study how well one can recover sparse principal components of a data matrix using a sketch formed from a few of its elements. We show that for a wide class of optimization problems, if the sketch is close (in the spectral norm) to the original data matrix, then one can recover a near optimal solution to the optimization problem by using the sketch. In particular, we use this approach to obtain sparse principal components and show that for $m$ data points in $n$ dimensions, $O(ε^{-2}\tilde k\max\{m,n\})$ elements give an $ε$-additive approximation to the sparse PCA problem ($\tilde k$ is the stable rank of the data matrix). We demonstrate our algorithms extensively on image, text, biological and financial data. The results show that not only are we able to recover the sparse PCAs from the incomplete data, but by using our sparse sketch, the running time drops by a factor of five or more.

1503.03355 2026-06-04 stat.ML cs.LG cs.NA math.NA stat.AP

Automatic Unsupervised Tensor Mining with Quality Assessment

自动无监督张量挖掘与质量评估

Evangelos E. Papalexakis

AI总结 本文提出AutoTen算法,通过改进启发式方法实现自动无监督张量挖掘,通过合成数据和真实数据验证其性能,为自动化张量挖掘提供新方法。

详情
AI中文摘要

张量分解是无监督建模和挖掘多方面数据的常用工具。在探索性设置中,当没有标签或地面真实信息时,如何自动决定提取多少组件?如何评估结果质量,以便领域专家在解释结果时考虑此质量度量?本文介绍AutoTen,一种新型自动无监督张量挖掘算法,最小化用户干预,利用并改进了评估结果质量的启发式方法。我们对合成数据进行了广泛评估,优于现有基线方法。最后,我们将在各种真实数据集上应用AutoTen,提供见解和发现。我们将这项工作视为迈向完全自动化的无监督张量挖掘工具的一步,该工具可以被学术界和工业界的专业人员轻松采用。

英文摘要

A popular tool for unsupervised modelling and mining multi-aspect data is tensor decomposition. In an exploratory setting, where no labels or ground truth are available, how can we automatically decide how many components to extract? How can we assess the quality of our results, so that a domain expert can factor this quality measure into the interpretation of our results? In this paper, we introduce AutoTen, a novel automatic unsupervised tensor mining algorithm with minimal user intervention, which leverages and improves upon heuristics that assess the result quality. We extensively evaluate AutoTen's performance on synthetic data, outperforming existing baselines on this very hard problem. Finally, we apply AutoTen on a variety of real datasets, providing insights and discoveries. We view this work as a step towards a fully automated, unsupervised tensor mining tool that can be easily adopted by practitioners in academia and industry.

1503.02893 2026-06-04 cs.IT cs.NA math.IT math.NA math.OC stat.ML

Robust recovery of complex exponential signals from random Gaussian projections via low rank Hankel matrix reconstruction

通过低秩Hankel矩阵重建实现从随机高斯投影中稳健恢复复指数信号

Jian-Feng Cai, Xiaobo Qu, Weiyu Xu, Gui-Bo Ye

AI总结 本文研究了从少量随机高斯投影中稳健恢复多个复指数函数的信号,通过低秩Hankel矩阵重建,理论证明在投影数量超过O(Rln²N)时可实现稳健恢复,无需不相干或分离条件,适用于频谱压缩感知。

详情
Comments
17 pages
AI中文摘要

本文探讨了从少量随机高斯投影中稳健恢复多个复指数函数的信号。我们假设信号为2N-1维且R<<2N-1,该框架覆盖生物、自动化、成像科学等领域的广泛应用。通过最小化Hankel矩阵的核范数并满足采样数据一致性来重建信号。理论结果表明,只要投影数量超过O(Rln²N),即可实现稳健恢复。无需不相干或分离条件。该方法可应用于频谱压缩感知,相比现有结果,无需频率分离条件,且在测量数量上有更好的或可比的界。此外,该方法为非均匀采样NMR光谱中的采样数量提供了理论指导。通过数值实验进一步验证了算法性能。

英文摘要

This paper explores robust recovery of a superposition of $R$ distinct complex exponential functions from a few random Gaussian projections. We assume that the signal of interest is $(2N-1)$-dimensional and $R \ll 2N-1$. This framework covers a large class of signals arising from real applications in biology, automation, imaging science, etc. To reconstruct such a signal, our algorithm seeks a low-rank Hankel matrix of the signal by minimizing its nuclear norm subject to consistency with the sampled data. Our theoretical results show that a robust recovery is possible as long as the number of projections exceeds $O(R\ln^2N)$. No incoherence or separation condition is required in our proof. Our method can be applied to spectral compressed sensing where the signal of interest is a superposition of $R$ complex sinusoids. Compared to existing results, our result here does not need any separation condition on the frequencies, while achieving better or comparable bounds on the number of measurements. Furthermore, our method provides theoretical guidance on how many samples are required in the state-of-the-art non-uniform sampling in NMR spectroscopy. The performance of our algorithm is further demonstrated by numerical experiments.
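
The low-rank Hankel structure that the recovery relies on can be checked numerically. The following is a minimal sketch, assuming illustrative frequencies and amplitudes and the hypothetical helper names `exponential_signal` and `hankel_from_signal`; it does not implement the nuclear-norm minimization or the Gaussian projections. It only shows that a length-$(2N-1)$ superposition of $R$ complex exponentials gives an $N \times N$ Hankel matrix of rank $R$, even when two frequencies are close together.

```python
import numpy as np

def exponential_signal(freqs, amps, length):
    """x[t] = sum_k amps[k] * exp(2*pi*i*freqs[k]*t), for t = 0..length-1."""
    t = np.arange(length)
    return sum(a * np.exp(2j * np.pi * f * t) for f, a in zip(freqs, amps))

def hankel_from_signal(x):
    """Arrange a length-(2N-1) signal into the N x N Hankel matrix H[i, j] = x[i + j]."""
    N = (len(x) + 1) // 2
    idx = np.add.outer(np.arange(N), np.arange(N))
    return x[idx]

N = 16
freqs = [0.10, 0.12, 0.37]        # illustrative values; two of them are close
amps = [1.0, 0.5 - 0.3j, 2.0]     # illustrative complex amplitudes
x = exponential_signal(freqs, amps, 2 * N - 1)
H = hankel_from_signal(x)

# Numerical rank with an explicit tolerance; it equals the number of exponentials.
rank = np.linalg.matrix_rank(H, tol=1e-8)
```

The rank equals $R = 3$ because $H$ factors as a Vandermonde matrix times a diagonal matrix of amplitudes times the transposed Vandermonde matrix.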

1503.02737 2026-06-04 math.NA cs.NA stat.CO

Scrambled geometric net integration over general product spaces

一般乘积空间上的加扰几何网积分

K. Basu, A. B. Owen

AI总结 本文提出了一种在一般乘积空间上进行数值积分的点集构造方法,通过递归几何划分将随机化(t,m,s)-网转换为积分点集,提升了QMC在光滑积分函数上的精度。

详情
Comments
29 pages; 5 figures
AI中文摘要

准蒙特卡罗(QMC)采样已被开发用于在$[0,1]^s$上进行积分,对于有界变差的被积函数,其精度优于蒙特卡罗(MC)方法。加扰网求积法允许基于重复的误差估计,其精度至少与普通QMC相同,对于足够光滑的被积函数甚至更好。在三角形、球面、圆盘及这些空间的笛卡尔积上进行积分对QMC来说更加困难,因为在单位立方体上诱导出的被积函数可能不具备所需的正则性。在本文中,我们提出了一种点集构造方法,用于在$s$个$d$维空间的笛卡尔积上进行数值积分,其中三角形($d=2$)尤其受关注。这些点集是通过递归几何划分对随机化$(t,m,s)$-网进行变换得到的。所得到的积分估计是无偏的,并且对于乘积空间上的任何$L^2$被积函数,其方差为$o(1/n)$。在被积函数的光滑性假设下,对于$d$维区域的$s$重笛卡尔积上的积分,我们的随机化QMC算法的方差为$O(n^{-1 - 2/d} (\log n)^{s-1})$,而普通蒙特卡罗为$O(n^{-1})$。

英文摘要

Quasi-Monte Carlo (QMC) sampling has been developed for integration over $[0,1]^s$ where it has superior accuracy to Monte Carlo (MC) for integrands of bounded variation. Scrambled net quadrature allows replication-based error estimation for QMC with at least the same accuracy and, for smooth enough integrands, even better accuracy than plain QMC. Integration over triangles, spheres, disks and Cartesian products of such spaces is more difficult for QMC because the induced integrand on a unit cube may fail to have the desired regularity. In this paper, we present a construction of point sets for numerical integration over Cartesian products of $s$ spaces of dimension $d$, with triangles ($d=2$) being of special interest. The point sets are transformations of randomized $(t,m,s)$-nets using recursive geometric partitions. The resulting integral estimates are unbiased and their variance is $o(1/n)$ for any integrand in $L^2$ of the product space. Under smoothness assumptions on the integrand, our randomized QMC algorithm has variance $O(n^{-1 - 2/d} (\log n)^{s-1})$, for integration over $s$-fold Cartesian products of $d$-dimensional domains, compared to $O(n^{-1})$ for ordinary Monte Carlo.

1503.01210 2026-06-04 eess.SY cs.SY stat.ML

Low-dimensional Models in Spatio-Temporal Wind Speed Forecasting

时空风速预测中的低维模型

Borhan M. Sanandaji, Akin Tascikaraoglu, Kameshwar Poolla, Pravin Varaiya

AI总结 本文提出一种结合时间序列数据和周围站点数据的时空风速预测算法,利用压缩感知和结构稀疏恢复理论,通过低维结构恢复提升短期风速预测精度。

详情
Comments
Initially submitted for review to the 2015 American Control Conference on September 22, 2014; Accepted for publication on January 22, 2015
AI中文摘要

将风能并入电网具有挑战性,因其随机性。准确的短期风功率预测有助于集成。本文提出一种时空风速预测算法,结合目标站点的时间序列数据和周围站点数据。受压缩感知(CS)和结构稀疏恢复算法启发,我们主张存在内在低维结构支配大量站点,应被利用。将预测问题转化为从线性方程集合中恢复块稀疏信号x的问题,提出新的结构稀疏恢复算法。东海岸案例研究结果表明,所提压缩时空风速预测(CST-WSF)算法相比广泛使用的基准模型显著提升了短期预测精度。

英文摘要

Integrating wind power into the grid is challenging because of its random nature. Integration is facilitated with accurate short-term forecasts of wind power. The paper presents a spatio-temporal wind speed forecasting algorithm that incorporates the time series data of a target station and data of surrounding stations. Inspired by Compressive Sensing (CS) and structured-sparse recovery algorithms, we claim that there usually exists an intrinsic low-dimensional structure governing a large collection of stations that should be exploited. We cast the forecasting problem as recovery of a block-sparse signal $\boldsymbol{x}$ from a set of linear equations $\boldsymbol{b} = A\boldsymbol{x}$ for which we propose novel structured-sparse recovery algorithms. Results of a case study on the east coast show that the proposed Compressive Spatio-Temporal Wind Speed Forecasting (CST-WSF) algorithm significantly improves the short-term forecasts compared to a set of widely-used benchmark models.

1402.5180 2026-06-04 cs.LG cs.NA math.NA stat.ML

Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-$1$ Updates

通过交替秩-1更新保证非正交张量分解

Animashree Anandkumar, Rong Ge, Majid Janzamin

AI总结 本文提供了一种保证CP张量分解的局部和全局收敛性,通过交替秩-1更新方法,适用于非对称张量,证明了在特定秩条件下可恢复过完备分解。

详情
Comments
We have added an additional sub-algorithm to remove the (approximate) residual error left after the tensor power iteration
AI中文摘要

在本文中,我们为恢复CP(Candecomp/Parafac)张量分解提供了局部和全局收敛性保证。所提出算法的主要步骤是一个简单的交替秩-1更新,这是针对非对称张量的张量幂迭代的交替版本。对于秩为k的第三阶张量,在d维空间中,当k=o(d^{1.5})且张量组件不相干时,建立了局部收敛性保证,从而可以恢复过完备张量分解。我们还通过简单的初始化过程,强化了结果,通过使用随机张量切片的顶部奇异向量进行初始化,从而在更严格的秩条件下(k ≤ βd,对于任意常数β>1)实现了全局收敛性保证。此外,还提供了p阶张量的近似局部收敛性保证,条件为k=o(d^{p/2})。这些保证还包括在存在噪声张量时的紧扰动分析。

英文摘要

In this paper, we provide local and global convergence guarantees for recovering CP (Candecomp/Parafac) tensor decomposition. The main step of the proposed algorithm is a simple alternating rank-$1$ update which is the alternating version of the tensor power iteration adapted for asymmetric tensors. Local convergence guarantees are established for third order tensors of rank $k$ in $d$ dimensions, when $k=o \bigl( d^{1.5} \bigr)$ and the tensor components are incoherent. Thus, we can recover overcomplete tensor decomposition. We also strengthen the results to global convergence guarantees under stricter rank condition $k \le βd$ (for arbitrary constant $β> 1$) through a simple initialization procedure where the algorithm is initialized by top singular vectors of random tensor slices. Furthermore, the approximate local convergence guarantees for $p$-th order tensors are also provided under rank condition $k=o \bigl( d^{p/2} \bigr)$. The guarantees also include tight perturbation analysis given noisy tensor.
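
The basic alternating rank-$1$ update can be sketched on an easy case. This is a minimal illustration only: it assumes a third-order tensor with two orthonormal components and a random start, uses the hypothetical function name `alternating_rank1`, and omits the paper's SVD-based initialization, the residual-removal step, and the incoherent overcomplete setting.

```python
import numpy as np

def alternating_rank1(T, iters=50, seed=0):
    """Alternate normalized contractions of T over two modes to update the third."""
    rng = np.random.default_rng(seed)
    _, d2, d3 = T.shape
    b = rng.standard_normal(d2)
    b /= np.linalg.norm(b)
    c = rng.standard_normal(d3)
    c /= np.linalg.norm(c)
    for _ in range(iters):
        a = np.einsum("ijk,j,k->i", T, b, c)
        a /= np.linalg.norm(a)
        b = np.einsum("ijk,i,k->j", T, a, c)
        b /= np.linalg.norm(b)
        c = np.einsum("ijk,i,j->k", T, a, b)
        c /= np.linalg.norm(c)
    w = np.einsum("ijk,i,j,k->", T, a, b, c)   # weight of the recovered component
    return w, a, b, c

# Toy tensor: T = 2 * a1 (x) b1 (x) c1 + 1 * a2 (x) b2 (x) c2 with orthonormal factors.
rng = np.random.default_rng(1)
d = 8
A_true = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :2]
B_true = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :2]
C_true = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :2]
weights = np.array([2.0, 1.0])
T = np.einsum("r,ir,jr,kr->ijk", weights, A_true, B_true, C_true)

w, a, b, c = alternating_rank1(T)
overlap = np.abs(A_true.T @ a).max()   # alignment with the closest true component
```

In this orthogonal setting the iteration locks onto one of the two components, and which one it finds depends on the random start; recovering all components would need deflation or multiple restarts.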

1503.00282 2026-06-04 math.NA cs.NA stat.ML

Constructive sparse trigonometric approximation for functions with small mixed smoothness

构造性稀疏三角逼近用于具有小混合光滑度的函数

V. N. Temlyakov

AI总结 本文提出基于贪心算法的构造性方法,为具有小混合光滑度的函数提供最优的三角逼近误差。

详情
AI中文摘要

本文给出了基于贪心算法的构造性方法,为具有小混合光滑度的函数类提供了在三角系下$ m $-项逼近的最佳可能误差顺序。

英文摘要

The paper gives a constructive method, based on greedy algorithms, that provides, for classes of functions with small mixed smoothness, the best possible order of approximation error for $m$-term approximation with respect to the trigonometric system.

1502.04689 2026-06-04 cs.LG cs.NA math.NA stat.ML

Exact tensor completion using t-SVD

利用t-SVD进行精确张量补全

Zemin Zhang, Shuchin Aeron

AI总结 本文基于t-SVD提出张量补全方法,通过凸优化最小化张量核范数以保证恢复概率,验证了在随机采样下张量补全的最优性。

详情
Comments
16 pages, 5 figures, 2 tables
AI中文摘要

本文聚焦于从有限采样中补全多维数组(张量)的问题。我们的方法基于最近提出的张量奇异值分解(t-SVD)。利用该分解可以得到称为张量管秩的张量秩概念,其最优性性质类似于由SVD得到的矩阵秩。如[2]所示,某些多维数据,如平移视频序列,表现出低张量管秩,我们考虑在数据立方体的随机采样下补全此类数据的问题。我们证明,通过求解一个凸优化问题(该问题最小化作为张量管秩凸松弛的张量核范数),只要观察到的样本数与t-SVD中的自由度成比例,就能以压倒性概率保证恢复。从这个意义上说,我们的结果在阶的意义下是最优的。该结果成立的条件非常类似于矩阵补全的不相干条件,尽管我们是在t-SVD的代数框架下定义不相干性。我们还在一些真实数据集上展示了算法的性能,并将其与其他基于张量展平和Tucker分解的方法进行了比较。

英文摘要

In this paper we focus on the problem of completion of multidimensional arrays (also referred to as tensors) from limited sampling. Our approach is based on a recently proposed tensor-Singular Value Decomposition (t-SVD) [1]. Using this factorization one can derive a notion of tensor rank, referred to as the tensor tubal rank, which has optimality properties similar to those of matrix rank derived from SVD. As shown in [2], some multidimensional data, such as panning video sequences, exhibit low tensor tubal rank, and we look at the problem of completing such data under random sampling of the data cube. We show that by solving a convex optimization problem, which minimizes the tensor nuclear norm obtained as the convex relaxation of tensor tubal rank, one can guarantee recovery with overwhelming probability as long as samples in proportion to the degrees of freedom in t-SVD are observed. In this sense our results are order-wise optimal. The conditions under which this result holds are very similar to the incoherency conditions for matrix completion, although we define incoherency under the algebraic set-up of t-SVD. We show the performance of the algorithm on some real data sets and compare it with other existing approaches based on tensor flattening and Tucker decomposition.

1502.05786 2026-06-04 cs.DC cs.PF cs.SY eess.SY math.PR stat.AP

Randomized Assignment of Jobs to Servers in Heterogeneous Clusters of Shared Servers for Low Delay

在异构服务器集群中随机分配作业以降低延迟

Arpan Mukhopadhyay, A. Karthik, Ravi R. Mazumdar

AI总结 本文研究了在异构服务器集群中通过随机分配作业以降低延迟的问题,提出两种方案以减少作业的平均停留时间,分析了系统在大规模极限下的稳定性及平稳解的存在性。

详情
AI中文摘要

我们考虑了一个由N个并行处理器共享服务器组成的多服务器系统,这些服务器分为M(<<N)种类型,根据处理能力和速度进行分类。随机大小的作业按照速率Nλ的泊松过程到达系统。每次到达时,从每种类型中随机采样少量服务器。然后根据选择规则将作业分配给其中一个采样的服务器。我们提出了两种方案,每种方案对应特定的选择规则,旨在减少作业在系统中的平均停留时间。我们首先证明两种方法都能达到最大稳定性区域。然后分析系统在N→∞时的运行情况,对应于均场。我们的结果表明,即使M是有限的,服务器之间也存在渐近独立性,且交换性仅在同类型服务器内成立。我们进一步建立了均场平稳解的存在性和唯一性,并证明每个服务器类型的服务器占用尾分布以双指数衰减。当到达率估计不可用时,所提出的方案提供了实现较低作业平均停留时间的更简单替代方法,如我们的数值研究所示。

英文摘要

We consider the job assignment problem in a multi-server system consisting of $N$ parallel processor sharing servers, categorized into $M$ ($\ll N$) different types according to their processing capacity or speed. Jobs of random sizes arrive at the system according to a Poisson process with rate $N λ$. Upon each arrival, a small number of servers from each type is sampled uniformly at random. The job is then assigned to one of the sampled servers based on a selection rule. We propose two schemes, each corresponding to a specific selection rule that aims at reducing the mean sojourn time of jobs in the system. We first show that both methods achieve the maximal stability region. We then analyze the system operating under the proposed schemes as $N \to \infty$ which corresponds to the mean field. Our results show that asymptotic independence among servers holds even when $M$ is finite and exchangeability holds only within servers of the same type. We further establish the existence and uniqueness of stationary solution of the mean field and show that the tail distribution of server occupancy decays doubly exponentially for each server type. When the estimates of arrival rates are not available, the proposed schemes offer simpler alternatives to achieving lower mean sojourn time of jobs, as shown by our numerical studies.

1502.05571 2026-06-04 math.NA cs.NA stat.ML

Finding Dantzig selectors with a proximity operator based fixed-point algorithm

利用基于邻近算子的不动点算法求解Dantzig选择器

Ashley Prater, Lixin Shen, Bruce W. Suter

AI总结 本文提出一种简单迭代方法寻找Dantzig选择器,通过固定点公式近似解Dantzig选择器问题,并通过数据在近似Dantzig选择器支持集上的回归构造新估计器,数值模拟显示该方法在速度上显著优于交替方向方法。

详情
Comments
15 pages, 5 figures. Submitted to Computational Statistics and Data Analysis
AI中文摘要

本文研究了一种简单迭代方法用于寻找Dantzig选择器,该方法旨在解决线性回归问题。该方法包含两个主要阶段。第一阶段是通过固定点公式近似求解Dantzig选择器问题。第二阶段是通过将数据投影到近似Dantzig选择器的支持集上来构造新的估计器。我们比较了该方法与交替方向方法,并在合成和真实数据集上使用这两种方法进行了数值模拟。数值模拟显示,两种方法产生相似质量的结果,但所提方法在速度上显著更快。

英文摘要

In this paper, we study a simple iterative method for finding the Dantzig selector, which was designed for linear regression problems. The method consists of two main stages. The first stage is to approximate the Dantzig selector through a fixed-point formulation of solutions to the Dantzig selector problem. The second stage is to construct a new estimator by regressing data onto the support of the approximated Dantzig selector. We compare our method to an alternating direction method, and present the results of numerical simulations using both the proposed method and the alternating direction method on synthetic and real data sets. The numerical simulations demonstrate that the two methods produce results of similar quality; however, the proposed method tends to be significantly faster.

1502.01377 2026-06-04 math.NA cs.NA stat.AP stat.CO stat.ME

The Arithmetic Cosine Transform: Exact and Approximate Algorithms

算术余弦变换:精确与近似算法

R. J. Cintra, V. S. Dimitrov

AI总结 本文介绍了一种新的变换方法——算术余弦变换(ACT),探讨其数学性质及在信号处理中的应用潜力。

详情
Journal ref
IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3076-3085, June 2010
Comments
17 pages, 3 figures
AI中文摘要

本文介绍了一种新的变换方法——算术余弦变换(ACT)。我们提供了ACT的核心数学性质,这对于设计高效且精确的实现方法至关重要。本文的关键数学工具来自解析数论,特别是黎曼ζ函数的性质。此外,我们证明了对于任何块长度,都可以实现精确的信号插值。同时,也考虑了近似计算。提供的数值示例展示了ACT在各种数字信号处理应用中的潜力。

英文摘要

In this paper, we introduce a new class of transform methods: the arithmetic cosine transform (ACT). We provide the central mathematical properties of the ACT, necessary in designing efficient and accurate implementations of the new transform method. The key mathematical tools used in the paper come from analytic number theory, in particular the properties of the Riemann zeta function. Additionally, we demonstrate that an exact signal interpolation is achievable for any block-length. Approximate calculations are also considered. The numerical examples provided show the potential of the ACT for various digital signal processing applications.

1502.01038 2026-06-04 math.NA cs.NA stat.ME

A Factorization Scheme for Some Discrete Hartley Transform Matrices

一些离散Hartley变换矩阵的因子化方案

H. M. de Oliveira, R. J. Cintra, R. M. Campello de Souza

AI总结 本文提出了一种高效算法,通过Hartley变换矩阵的因子化方法,达到了DFT/DHT的理论下界乘法复杂度,并给出了短块长的算法。

详情
Comments
10 pages, 4 figures, 2 tables, International Conference on System Engineering, Communications and Information Technologies, 2001, Punta Arenas. ICSECIT 2001 Proceedings. Punta Arenas: Universidad de Magallanes, 2001
AI中文摘要

离散变换如离散傅里叶变换(DFT)和离散Hartley变换(DHT)是数值分析中的重要工具。变换技术的成功应用依赖于存在高效的快速变换。本文推导了一些快速算法。DFT/DHT的乘法复杂度理论下界得以实现。该方法基于DHT矩阵的因子化。对于短块长如$N \in \{3, 5, 6, 12, 24 \}$的算法被提出。

英文摘要

Discrete transforms such as the discrete Fourier transform (DFT) and the discrete Hartley transform (DHT) are important tools in numerical analysis. The successful application of transform techniques relies on the existence of efficient fast transforms. In this paper some fast algorithms are derived. The theoretical lower bound on the multiplicative complexity for the DFT/DHT is achieved. The approach is based on the factorization of DHT matrices. Algorithms for short blocklengths such as $N \in \{3, 5, 6, 12, 24 \}$ are presented.
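
For readers unfamiliar with the DHT, the relation between the DHT and the DFT mentioned above can be sketched as follows. This is a minimal illustration using the standard definition $H_k = \sum_n x_n\,\mathrm{cas}(2\pi nk/N)$ with $\mathrm{cas}\,\theta = \cos\theta + \sin\theta$; the function names are hypothetical, and the code does not reproduce the paper's matrix factorization or its multiplication counts.

```python
import numpy as np

def dht_matrix(N):
    """Dense DHT matrix with entries cas(2*pi*n*k/N) = cos(.) + sin(.)."""
    n = np.arange(N)
    arg = 2 * np.pi * np.outer(n, n) / N
    return np.cos(arg) + np.sin(arg)

def dht_via_fft(x):
    """DHT of a real vector from its DFT: H = Re(X) - Im(X)."""
    X = np.fft.fft(x)
    return X.real - X.imag

# Check both routes on the short blocklengths listed in the abstract.
rng = np.random.default_rng(0)
max_err = 0.0
for N in (3, 5, 6, 12, 24):
    x = rng.standard_normal(N)
    direct = dht_matrix(N) @ x
    fast = dht_via_fft(x)
    max_err = max(max_err, float(np.max(np.abs(direct - fast))))
    # The DHT is its own inverse up to the factor 1/N.
    max_err = max(max_err, float(np.max(np.abs(dht_via_fft(fast) / N - x))))
```

The identity follows because the DFT kernel is $\cos\theta - i\sin\theta$, so subtracting the imaginary part from the real part restores $\cos\theta + \sin\theta$.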

1502.00277 2026-06-04 math.NA cs.NA math.NT stat.CO

Fast Finite Field Hartley Transforms Based on Hadamard Decomposition

基于Hadamard分解的快速有限域Hartley变换

H. M. de Oliveira, R. G. F. Távora, R. J. Cintra, R. M. Campello de Souza

AI总结 本文提出基于Hadamard分解的快速有限域Hartley变换算法,利用其对称性优化计算复杂度,达到理论下限,适用于多址接入系统设计。

详情
Comments
6 pages, 3 tables, fixed typos, submitted to the Sixth International Symposium on Communication Theory and Applications (ISCTA'01), 2001
AI中文摘要

一种新的有限域Hartley变换(FFHT)最近被引入,并在多址接入系统和多级扩频序列设计中提出了有前景的应用。FFHT表现出有趣的对称性,通过Hadamard-Walsh变换(HWT)进行递归分解来推导定制化的快速变换算法。所提出的快速算法通过HWT分解FFHT,所有研究案例均达到乘法复杂度的下限。新算法的复杂度与传统算法进行了比较。

英文摘要

A new transform over finite fields, the finite field Hartley transform (FFHT), was recently introduced and a number of promising applications on the design of efficient multiple access systems and multilevel spread spectrum sequences were proposed. The FFHT exhibits interesting symmetries, which are exploited to derive tailored fast transform algorithms. The proposed fast algorithms are based on successive decompositions of the FFHT by means of Hadamard-Walsh transforms (HWT). The introduced decompositions meet the lower bound on the multiplicative complexity for all the cases investigated. The complexity of the new algorithms is compared with that of traditional algorithms.

1502.00950 2026-06-04 math.NA cs.NA stat.ME

Compactly Supported Wavelets Derived From Legendre Polynomials: Spherical Harmonic Wavelets

基于Legendre多项式的紧支撑小波:球面谐波小波

M. M. S. Lira, H. M. de Oliveira, M. A. Carvalho, R. M. Campello de Souza

AI总结 本文提出一种基于Legendre多项式的新小波家族,通过多分辨率滤波器构造,具有紧支撑特性,用于球面谐波分析。

详情
Comments
6 pages, 6 figures, 1 table. In: Computational Methods in Circuits and Systems Applications, WSEAS press, pp.211-215, 2003. ISBN: 960-8052-88-2
AI中文摘要

本文介绍了一种新的小波家族,与Legendre多项式相关联。这些被称为球面谐波或Legendre小波的小波具有紧支撑特性。小波构造方法源自普通二阶微分方程与多分辨率滤波器的关联。与Legendre多分辨率分析相关的低通滤波器是一种线性相位有限脉冲响应滤波器(FIR)。

英文摘要

A new family of wavelets is introduced, which is associated with Legendre polynomials. These wavelets, termed spherical harmonic or Legendre wavelets, possess compact support. The method for the wavelet construction is derived from the association of ordinary second order differential equations with multiresolution filters. The low-pass filter associated with Legendre multiresolution analysis is a linear phase finite impulse response filter (FIR).

1502.00555 2026-06-04 stat.ME cs.CV cs.MM cs.NA math.NA stat.CO

A Discrete Tchebichef Transform Approximation for Image and Video Coding

一种用于图像和视频编码的离散切比雪夫变换近似

P. A. M. Oliveira, R. J. Cintra, F. M. Bayer, S. Kulasekera, A. Madanayake

AI总结 本文提出了一种低复杂度的离散切比雪夫变换近似方法,通过减少乘法和加法运算提升编码效率,并在FPGA上实现时降低功耗和面积。

详情
Journal ref
IEEE Signal Processing Letters, vol. 22, issue 8, pp. 1137-1141, 2015
Comments
13 pages, 5 figures, 2 tables
AI中文摘要

本文介绍了一种低复杂度的离散切比雪夫变换(DTT)近似方法。所提出的正反向变换无需乘法,仅需较少的加法和移位运算。数值压缩模拟展示了该变换在图像和视频编码中的效率。此外,基于Xilinx Virtex-6 FPGA的硬件实现表明,与文献相比,动态功耗降低了44.9%,面积减少了64.7%。

英文摘要

In this paper, we introduce a low-complexity approximation for the discrete Tchebichef transform (DTT). The proposed forward and inverse transforms are multiplication-free and require a reduced number of additions and bit-shifting operations. Numerical compression simulations demonstrate the efficiency of the proposed transform for image and video coding. Furthermore, Xilinx Virtex-6 FPGA based hardware realization shows 44.9% reduction in dynamic power consumption and 64.7% lower area when compared to the literature.

1501.07091 2026-06-04 math.ST cs.NA math.NA stat.TH

Forward-reverse EM algorithm for Markov chains: convergence and numerical analysis

马尔可夫链的前向-后向EM算法:收敛性与数值分析

Christian Bayer, Hilmar Mai, John Schoenmakers

AI总结 本文提出前向-后向EM算法用于估计马尔可夫链动力学参数,证明其几乎处处收敛,并分析算法复杂度,通过两个应用示例展示其在连续和离散时间过程中的适用性。

详情
AI中文摘要

我们开发了用于估计确定离散时间马尔可夫链动态参数的前向-后向EM(FREM)算法。作为FREM方法构建的关键工具,我们开发了条件于特定终端状态的马尔可夫链的前向-后向表示。这些表示可视为Bayer和Schoenmakers [2013] 关于条件扩散的先前工作的扩展。我们证明了对于具有曲面指数族结构的马尔可夫链模型,该算法几乎处处收敛。在数值方面,我们通过推导其期望成本来分析前向-后向算法的复杂度。两个应用示例用于展示其在从基于连续时间过程到离散时间马尔可夫链模型的广泛应用范围。

英文摘要

We develop a forward-reverse EM (FREM) algorithm for estimating parameters that determine the dynamics of a discrete time Markov chain evolving through a certain measurable state space. As a key tool for the construction of the FREM method we develop forward-reverse representations for Markov chains conditioned on a certain terminal state. These representations may be considered as an extension of the earlier work of Bayer and Schoenmakers [2013] on conditional diffusions. We prove almost sure convergence of our algorithm for a Markov chain model with curved exponential family structure. On the numerical side we give a complexity analysis of the forward-reverse algorithm by deriving its expected cost. Two application examples are discussed to demonstrate the scope of possible applications ranging from models based on continuous time processes to discrete time Markov chain models.

1501.07047 2026-06-04 math.NA cs.NA stat.ME

Preprocessing of centred logratio transformed density functions using smoothing splines

利用平滑样条对中心对数比变换后的密度函数进行预处理

Jitka Machalova, Karel Hron, Gianna Serafina Monti

AI总结 本文提出利用最优平滑样条处理中心对数比变换的密度函数,以考虑其特定特征,为合理预处理离散分布观测提供简洁方法,通过实际数据集展示理论发展。

详情
Comments
13 pages
AI中文摘要

随着大规模数据库系统的发展,由概率分布形成的统计分析在探索性数据分析中变得重要。然而,由于密度函数的特定性质,其合理的统计处理仍然是功能数据分析中的挑战。通常的L2度量无法充分考虑密度函数携带的信息的相对特性;相反,它们的几何特征遵循测度的贝叶斯空间。在L2空间中表达密度函数的最简单方法是使用中心对数比变换,但会导致积分常数约束的功能数据,需要在进一步分析中考虑。虽然贝叶斯空间本身已提供了合理分析密度函数的理论背景,但预处理问题仍需发展。本文旨在引入针对中心对数比变换密度函数的最优平滑样条,以考虑所有特定特征,并提供合理预处理原始(离散化)分布观测的简洁方法。理论发展通过官方统计数据集中的实际数据集进行说明。

英文摘要

With large-scale database systems, statistical analysis of data formed by probability distributions becomes an important task in explorative data analysis. Nevertheless, due to specific properties of density functions, their proper statistical treatment still represents a challenging task in functional data analysis. Namely, the usual L2 metric does not fully account for the relative character of information carried by density functions; instead, their geometrical features are captured by Bayes spaces of measures. The easiest way of expressing density functions in L2 space is to use the centred logratio transformation; however, this results in functional data with a constant integral constraint that needs to be taken into account in further analysis. While the theoretical background for reasonable analysis of density functions is already provided comprehensively by Bayes spaces themselves, preprocessing issues still need to be developed. The aim of this paper is to introduce optimal smoothing splines for centred logratio transformed density functions that take all their specific features into account and provide a concise methodology for reasonable preprocessing of raw (discretized) distributional observations. Theoretical developments are illustrated with a real-world data set from official statistics.
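
The centred logratio (clr) transformation and its constant integral constraint can be illustrated with a short sketch. It assumes a strictly positive density on a bounded interval, discretized on a grid and integrated with the trapezoidal rule; the helper names `trapezoid`, `clr` and `inverse_clr` are hypothetical, and the smoothing-spline step proposed in the paper is not shown.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def clr(density, t):
    """clr(f)(t) = log f(t) - (1 / (b - a)) * integral of log f over [a, b]."""
    log_f = np.log(density)
    mean_log = trapezoid(log_f, t) / (t[-1] - t[0])
    return log_f - mean_log

def inverse_clr(g, t):
    """Map a clr-transformed function back to a density that integrates to one."""
    f = np.exp(g)
    return f / trapezoid(f, t)

# Illustrative density on [0, 1]: a bump centred at 0.3, normalized numerically.
t = np.linspace(0.0, 1.0, 201)
f = np.exp(-((t - 0.3) ** 2) / 0.02)
f /= trapezoid(f, t)

g = clr(f, t)
zero_integral = trapezoid(g, t)   # the constraint: clr functions integrate to zero
f_back = inverse_clr(g, t)
```

The zero-integral property is the constraint that any smoother applied to `g` has to preserve, which is why an unconstrained spline fit would not be appropriate here.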

1312.4344 2026-06-04 math.ST cs.NA math.NA math.PR stat.TH

Error bounds of MCMC for functions with unbounded stationary variance

MCMC方法计算具有无界平稳方差函数的误差界

Daniel Rudolf, Nikolaus Schweizer

AI总结 本文研究了MCMC方法在计算具有无界平稳方差函数期望值时的误差界,通过L_p范数有限性假设,推导出一致遍历马尔可夫链的最优收敛阶误差界,并提供预烧期选择方法。

详情
Journal ref
Statistics & Probability Letters, Volume 99, April 2015, Pages 6-12
Comments
13 pages
AI中文摘要

我们证明了马尔可夫链蒙特卡罗(MCMC)方法计算具有无界平稳方差函数期望值的显式误差界。我们假设存在$p\in(1,2)$,使得函数具有有限的$L_p$范数。对于一致遍历马尔可夫链,我们得到具有最优收敛阶$n^{1/p-1}$的误差界;如果存在谱间隙,则几乎达到最优阶。此外,我们考虑了预烧期(burn-in),并给出了选择预烧期的方法。

英文摘要

We prove explicit error bounds for Markov chain Monte Carlo (MCMC) methods to compute expectations of functions with unbounded stationary variance. We assume that there is a $p\in(1,2)$ so that the functions have finite $L_p$-norm. For uniformly ergodic Markov chains we obtain error bounds with the optimal order of convergence $n^{1/p-1}$ and if there exists a spectral gap we almost get the optimal order. Further, a burn-in period is taken into account and a recipe for choosing the burn-in is provided.

1501.05740 2026-06-04 stat.ML cs.LG cs.NA math.NA

Bayesian Learning for Low-Rank matrix reconstruction

基于贝叶斯学习的低秩矩阵重建

Martin Sundin, Cristian R. Rojas, Magnus Jansson, Saikat Chatterjee

AI总结 本文提出基于潜在变量模型的贝叶斯学习方法,用于从线性测量中完成和重建低秩矩阵,通过证据近似和期望最大化学习模型参数,验证了在未知秩和噪声功率时的重建能力。

详情
Comments
Submitted to IEEE Transactions on Signal Processing
AI中文摘要

我们开发了基于潜在变量模型的贝叶斯学习方法,用于从线性测量中完成和重建低秩矩阵。对于欠定系统,所开发的方法在未知秩和噪声功率的情况下能够重建低秩矩阵。我们推导了潜在变量模型与几种低秩促进惩罚函数之间的关系。这些关系证明了在高斯先验下使用克罗内克结构协方差矩阵的合理性。在方法中,我们使用证据近似和期望最大化来学习模型参数。通过广泛的数值模拟评估了方法的性能。

英文摘要

We develop latent variable models for Bayesian learning based low-rank matrix completion and reconstruction from linear measurements. For under-determined systems, the developed methods are shown to reconstruct low-rank matrices when neither the rank nor the noise power is known a-priori. We derive relations between the latent variable models and several low-rank promoting penalty functions. The relations justify the use of Kronecker structured covariance matrices in a Gaussian based prior. In the methods, we use evidence approximation and expectation-maximization to learn the model parameters. The performance of the methods is evaluated through extensive numerical simulations.

1501.04819 2026-06-04 math.NA cs.NA stat.ML

Separation of undersampled composite signals using the Dantzig selector with overcomplete dictionaries

利用Dantzig选择器与过完备字典分离欠采样复合信号

Ashley Prater, Lixin Shen

AI总结 本文提出利用Dantzig选择器与过完备字典分离欠采样复合信号,通过改进算法在压缩感知框架下实现信号分离,实验显示其在速度和质量上优于交替方向法。

详情
Comments
18 pages, 4 figures, preprint of a paper accepted by IET Signal Processing
AI中文摘要

在许多应用中,可能会获取由多个信号组成的复合信号,其可能被噪声干扰,可靠分离各成分而不损失重要细节具有挑战性。在压缩感知框架中,仅能获得复合信号的欠采样线性投影。本文提出利用Dantzig选择器模型结合过完备字典来分离噪声下的欠采样复合信号,并提出一种高效的算法来解决该模型。Dantzig选择器是一种统计方法,通过最小化候选系数向量的$\ell_1$范数并在残差范围内约束来解决噪声线性回归问题。如果底层系数向量是稀疏的,则Dantzig选择器在恢复和分离未知复合信号方面表现良好。随后,我们提出基于接近操作符的算法,通过Dantzig选择器恢复和分离未知的噪声欠采样复合信号。我们通过数值模拟比较所提算法与竞争的交替方向法,并发现所提算法在速度上更快,同时产生相似质量的结果。此外,我们通过在各种领域应用中应用该算法来展示其效用,包括复值系数向量的恢复、从平滑信号中去除脉冲噪声以及手写数字组合的分离和分类。

英文摘要

In many applications one may acquire a composition of several signals that may be corrupted by noise, and it is a challenging problem to reliably separate the components from one another without sacrificing significant details. Adding to the challenge, in a compressive sensing framework, one is given only an undersampled set of linear projections of the composite signal. In this paper, we propose using the Dantzig selector model incorporating an overcomplete dictionary to separate a noisy undersampled collection of composite signals, and present an algorithm to efficiently solve the model. The Dantzig selector is a statistical approach to finding a solution to a noisy linear regression problem by minimizing the $\ell_1$ norm of candidate coefficient vectors while constraining the scope of the residuals. If the underlying coefficient vector is sparse, then the Dantzig selector performs well in the recovery and separation of the unknown composite signal. In the following, we propose a proximity operator based algorithm to recover and separate unknown noisy undersampled composite signals through the Dantzig selector. We present numerical simulations comparing the proposed algorithm with the competing Alternating Direction Method, and the proposed algorithm is found to be faster, while producing similar quality results. Additionally, we demonstrate the utility of the proposed algorithm in several experiments by applying it in various domain applications including the recovery of complex-valued coefficient vectors, the removal of impulse noise from smooth signals, and the separation and classification of a composition of handwritten digits.

1310.5715 2026-06-04 math.NA cs.CV cs.LG cs.NA math.OC stat.ML

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

随机梯度下降、加权采样与随机化Kaczmarz算法

Deanna Needell, Nathan Srebro, Rachel Ward

AI总结 本文改进了随机梯度下降在光滑强凸目标下的线性收敛保证,从二次依赖于条件数转换为线性依赖,同时探讨了加权采样对收敛性的影响,并将随机化Kaczmarz算法与SGD联系起来,证明其在加权最小二乘问题中的指数收敛性。

详情
Comments
22 pages, 6 figures
AI中文摘要

我们针对光滑且强凸的目标函数,得到了随机梯度下降线性收敛的改进有限样本保证,将对条件数的依赖从二次的$(L/μ)^2$(其中$L$是光滑性常数的界,$μ$是强凸性常数的界)改进为对$L/μ$的线性依赖。此外,我们说明了为进一步改善收敛性,有必要对采样分布重新加权(即重要性采样),并得到对平均光滑性的线性依赖,优于先前结果。我们还更广泛地讨论了SGD中的重要性采样,并说明它在其他场景中同样可以改善收敛性。我们的结果基于我们在SGD与随机化Kaczmarz算法之间建立的联系,这使我们能够在分别研究这两种方法的文献之间相互借鉴思想。特别地,我们将随机化Kaczmarz算法重新表述为SGD的一个实例,并应用我们的结果证明其指数收敛性,但收敛到的是加权最小二乘问题的解,而非原始最小二乘问题的解。随后,我们提出了一种采用部分偏置采样的改进Kaczmarz算法,它能以相同的指数收敛速率收敛到原始最小二乘解。

英文摘要

We obtain an improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives, improving from a quadratic dependence on the conditioning $(L/μ)^2$ (where $L$ is a bound on the smoothness and $μ$ on the strong convexity) to a linear dependence on $L/μ$. Furthermore, we show how reweighting the sampling distribution (i.e. importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence in the average smoothness, dominating previous results. We also discuss importance sampling for SGD more broadly and show how it can improve convergence also in other scenarios. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods. In particular, we recast the randomized Kaczmarz algorithm as an instance of SGD, and apply our results to prove its exponential convergence, but to the solution of a weighted least squares problem rather than the original least squares problem. We then present a modified Kaczmarz algorithm with partially biased sampling which does converge to the original least squares solution with the same exponential convergence rate.
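
As a rough illustration of the link between weighted sampling and the randomized Kaczmarz iteration, here is a minimal sketch in plain Python. It assumes the classical rule of sampling row $i$ with probability proportional to $\|a_i\|^2$, which the abstract does not spell out, and it is not the authors' code. The function name `randomized_kaczmarz` and the small test system are hypothetical.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximately solve the consistent linear system A x = b.

    A is a list of rows (lists of floats); b is a list of floats.
    Rows are sampled with probability proportional to their squared norm
    (an assumption of this sketch, not a detail taken from the abstract).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * n_cols
    for _ in range(iters):
        # Weighted sampling: row i is picked with probability ||a_i||^2 / ||A||_F^2.
        i = rng.choices(range(n_rows), weights=row_norms_sq, k=1)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        # Project the iterate onto the hyperplane {z : <a_i, z> = b_i}.
        step = residual / row_norms_sq[i]
        x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Hypothetical 3x2 consistent system whose exact solution is (1, -2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [0.0, -5.0, 3.0]
x_hat = randomized_kaczmarz(A, b)
```

Each update can be read as a stochastic gradient step on the single-row loss $\tfrac12(\langle a_i, x\rangle - b_i)^2$ with step size $1/\|a_i\|^2$, which is the SGD viewpoint the abstract refers to.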

1501.02995 2026-06-04 cs.MM cs.CV cs.NA math.NA stat.ME

Improved 8-point Approximate DCT for Image and Video Compression Requiring Only 14 Additions

改进的8点近似DCT用于图像和视频压缩,仅需14次加法

U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, S. Kulasekera, A. Edirisuriya

AI总结 本文提出一种仅需14次加法的8点DCT近似方法,具有低计算复杂度,相比现有方法在算法复杂度和信噪比上表现更优,适用于HEVC等可重构视频标准。

详情
Journal ref
Circuits and Systems I: Regular Papers, IEEE Transactions on, Volume 61, Issue 6, June 2014, 1727--1740
Comments
30 pages, 7 figures, 5 tables
AI中文摘要

HEVC等视频处理系统需要满足多媒体市场对低能耗的要求,这推动了用于高效近似2-D DCT变换的快速算法的广泛发展。由于具有显著的能量压缩特性,DCT被广泛应用于多种压缩标准。已有研究提出了无乘法器的近似DCT变换,可在极低的电路复杂度下提供优异的压缩性能。此类近似仅使用加法和减法即可在数字VLSI硬件中实现,与传统DCT和整数变换相比可显著降低芯片面积和功耗。本文提出一种新的8点DCT近似方法,仅需14次加法运算且无需乘法。该变换具有低计算复杂度,并在算法复杂度和峰值信噪比两方面与现有最先进的DCT近似方法进行了比较。所提出的DCT近似方法是HEVC等可重构视频标准的候选方案。所提出的变换及其他几种DCT近似方法被映射到脉动阵列数字架构,使用FPGA技术物理实现为数字原型电路,并映射到45 nm CMOS工艺。

英文摘要

The low energy consumption that video processing systems such as HEVC require to serve the multimedia market has led to extensive development in fast algorithms for the efficient approximation of 2-D DCT transforms. The DCT is employed in a multitude of compression standards due to its remarkable energy compaction properties. Multiplier-free approximate DCT transforms have been proposed that offer superior compression performance at very low circuit complexity. Such approximations can be realized in digital VLSI hardware using additions and subtractions only, leading to significant reductions in chip area and power consumption compared to conventional DCTs and integer transforms. In this paper, we introduce a novel 8-point DCT approximation that requires only 14 addition operations and no multiplications. The proposed transform possesses low computational complexity and is compared to state-of-the-art DCT approximations in terms of both algorithm complexity and peak signal-to-noise ratio. The proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC. The proposed transform and several other DCT approximations are mapped to systolic-array digital architectures and physically realized as digital prototype circuits using FPGA technology and mapped to 45 nm CMOS technology.

1501.01946 2026-06-04 stat.ME cs.NA math.NA

Multi-Beam RF Aperture Using Multiplierless FFT Approximation

基于无乘法器FFT近似的多波束射频孔径

D. Suarez, R. J. Cintra, F. M. Bayer, A. Sengupta, S. Kulasekera, A. Madanayake

AI总结 本文提出一种低复杂度无乘法8点FFT近似算法,用于实现多束射频波束成形,通过26次加法运算生成八束波束,适用于低功耗射频多波束接收器。

详情
Journal ref
Electronics Letters, volume 50, issue 24, pages 1788-1790, 2014
Comments
8 pages, 3 figures, 2 tables, sfg corrected
AI中文摘要

多独立射频(RF)波束在通信、射电天文、雷达和微波成像中有广泛应用。将N点FFT在接收天线阵列上空间应用可提供N个独立RF波束,其乘法复杂度为N/2 log₂N。本文提出了一种低复杂度无乘法8点FFT近似算法用于射频波束成形,仅使用26次加法。该算法提供八束波束,其特性接近传统基于FFT的波束成形器的天线阵列模式,但不使用乘法器。所提出的FFT-like算法适用于低功耗射频多波束接收器;在45 nm CMOS工艺下1.1 V供电下合成,并在Xilinx Virtex-6 Lx240T FPGA器件上验证。CMOS仿真和FPGA实现分别表明每个独立接收模式RF波束的带宽为588 MHz和369 MHz。

英文摘要

Multiple independent radio frequency (RF) beams find applications in communications, radio astronomy, radar, and microwave imaging. An $N$-point FFT applied spatially across an array of receiver antennas provides $N$-independent RF beams at $\frac{N}{2}\log_2N$ multiplier complexity. Here, a low-complexity multiplierless approximation for the 8-point FFT is presented for RF beamforming, using only 26 additions. The algorithm provides eight beams that closely resemble the antenna array patterns of the traditional FFT-based beamformer albeit without using multipliers. The proposed FFT-like algorithm is useful for low-power RF multi-beam receivers; being synthesized in 45 nm CMOS technology at 1.1 V supply, and verified on-chip using a Xilinx Virtex-6 Lx240T FPGA device. The CMOS simulation and FPGA implementation indicate bandwidths of 588 MHz and 369 MHz, respectively, for each of the independent receive-mode RF beams.

1501.01571 2026-06-04 math.PR cs.DS cs.IT cs.NA math.IT math.NA stat.ML

An Introduction to Matrix Concentration Inequalities

矩阵集中不等式简介

Joel A. Tropp

AI总结 本文介绍矩阵集中不等式的核心方法与应用,通过实例展示其在解决复杂问题中的有效性。

详情
Comments
163 pages. To appear in Foundations and Trends in Machine Learning
AI中文摘要

近年来,随机矩阵在计算数学中发挥了重要作用,但随机矩阵理论的大多数经典领域仍是专家的领地。过去十年中,随着矩阵集中不等式的出现,研究已进展到只需一两页的演算即可解决许多(以前)具有挑战性的问题。本专著旨在描述该领域最成功的方法,并给出这些技术能够阐明的一些有趣示例。

英文摘要

In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with a page or two of arithmetic. The aim of this monograph is to describe the most successful methods from this area along with some interesting examples that these techniques can illuminate.

1304.4610 2026-06-04 cs.IT cs.LG cs.NA math.IT math.NA stat.ML

Spectral Compressed Sensing via Structured Matrix Completion

通过结构矩阵补全的谱压缩感知

Yuxin Chen, Yuejie Chi

AI总结 本文提出基于结构矩阵补全的增强矩阵补全算法,用于从少量时域样本中恢复谱稀疏对象,通过核范数最小化实现完美恢复,且在信息理论极限附近具有鲁棒性和超分辨率应用能力。

详情
Journal ref
Journal of Machine Learning Research, W&CP 28 (3) :414-422, 2013
Comments
accepted to International Conference on Machine Learning (ICML 2013)
AI中文摘要

本文研究从少量时域样本中恢复谱稀疏对象的问题。具体而言,假设环境维度为 $n$ 的目标对象是 $r$ 个复多维正弦波的混合,而底层频率可在单位盘内任意取值。传统压缩感知范式在对傅里叶表示施加离散字典时会遇到基失配(basis mismatch)问题。为解决此问题,我们开发了一种基于结构矩阵补全的新型非参数算法,称为增强矩阵补全(EMaC)。该算法首先将数据排列成具有多重Hankel结构的低秩增强形式,然后通过核范数最小化尝试恢复。在温和的不相干条件下,只要样本数量超过 $\mathcal{O}(r\log^{2} n)$ 的量级,EMaC即可实现完美恢复。我们还证明,在许多情形下,当观测条目数与信息论极限成比例时(相差一个对数因子),可以准确补全低秩多重Hankel矩阵。数值实验进一步展示了EMaC对有界噪声的鲁棒性及其在超分辨率中的适用性。

英文摘要

The paper studies the problem of recovering a spectrally sparse object from a small number of time domain samples. Specifically, the object of interest with ambient dimension $n$ is assumed to be a mixture of $r$ complex multi-dimensional sinusoids, while the underlying frequencies can assume any value in the unit disk. Conventional compressed sensing paradigms suffer from the {\em basis mismatch} issue when imposing a discrete dictionary on the Fourier representation. To address this problem, we develop a novel nonparametric algorithm, called enhanced matrix completion (EMaC), based on structured matrix completion. The algorithm starts by arranging the data into a low-rank enhanced form with multi-fold Hankel structure, then attempts recovery via nuclear norm minimization. Under mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the order of $\mathcal{O}(r\log^{2} n)$. We also show that, in many instances, accurate completion of a low-rank multi-fold Hankel matrix is possible when the number of observed entries is proportional to the information theoretical limits (except for a logarithmic gap). The robustness of EMaC against bounded noise and its applicability to super resolution are further demonstrated by numerical experiments.
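
To make the "low-rank enhanced form" concrete, the sketch below treats only the one-dimensional special case, where the multi-fold Hankel structure reduces to an ordinary Hankel matrix. It builds that matrix from samples of a sum of $r$ complex sinusoids and checks numerically that its rank is $r$. The frequencies, amplitudes and helper names (`hankel`, `numerical_rank`) are hypothetical, and the nuclear norm minimization step of EMaC is not implemented here.

```python
import cmath


def hankel(samples, n_rows):
    """Arrange a 1-D sample sequence into a Hankel matrix with H[i][j] = samples[i + j]."""
    n_cols = len(samples) - n_rows + 1
    return [[samples[i + j] for j in range(n_cols)] for i in range(n_rows)]


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (complex entries allowed)."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * p for a, p in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical signal: r = 2 complex sinusoids with off-grid frequencies.
freqs = [0.12, 0.37]
amps = [1.0, 0.5 - 0.3j]
samples = [
    sum(d * cmath.exp(2j * cmath.pi * f * t) for d, f in zip(amps, freqs))
    for t in range(9)
]
H = hankel(samples, 5)        # 5 x 5 Hankel matrix
rank_H = numerical_rank(H)    # equals the number of sinusoids in this example
```

The rank does not depend on whether the frequencies lie on a discrete grid, which is why working with this structure sidesteps the basis mismatch issue mentioned in the abstract.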

1304.8126 2026-06-04 cs.IT cs.NA cs.SY eess.SY math.IT math.NA stat.ML

Robust Spectral Compressed Sensing via Structured Matrix Completion

通过结构化矩阵补全实现鲁棒的频谱压缩感知

Yuxin Chen, Yuejie Chi

AI总结 本文提出一种无需先验模型阶数的增强矩阵补全算法,用于恢复频谱稀疏信号,证明在满足一定不相干条件时,算法能在样本数超过r log⁴n时实现完美恢复,并对噪声和部分样本损坏具有鲁棒性。

详情
Journal ref
IEEE Transactions on Information Theory, Vol. 60, No. 10, pp. 6576 - 6601, October 2014
Comments
accepted to IEEE Transactions on Information Theory
AI中文摘要

本文探讨了频谱压缩感知问题,旨在从信号的 $n$ 个时域样本的一个小的随机子集中恢复频谱稀疏信号。信号假设为r个多维复正弦波的叠加,其频率可为归一化频率域中的任意连续值。传统压缩感知方法在对傅里叶表示施加离散字典时面临基失配(basis mismatch)问题。为此,我们开发了一种基于结构化矩阵补全、无需预先知道模型阶数的新型算法,称为增强矩阵补全(EMaC)。该算法首先将数据排列成呈现多重Hankel结构的低秩增强形式,然后通过核范数最小化尝试恢复。在温和的不相干条件下,只要样本数超过r log⁴n的量级,EMaC即可实现完美恢复,并且对有界噪声稳定。即使常数比例的样本被任意幅度的误差损坏,只要样本复杂度超过r² log³n的量级,EMaC仍可实现精确恢复。此外,我们的结果展示了凸松弛在从最少观测条目补全低秩多重Hankel或Toeplitz矩阵方面的强大能力。数值实验进一步验证了该算法的性能及其在超分辨率中的适用性。

英文摘要

The paper explores the problem of \emph{spectral compressed sensing}, which aims to recover a spectrally sparse signal from a small random subset of its $n$ time domain samples. The signal of interest is assumed to be a superposition of $r$ multi-dimensional complex sinusoids, while the underlying frequencies can assume any \emph{continuous} values in the normalized frequency domain. Conventional compressed sensing paradigms suffer from the basis mismatch issue when imposing a discrete dictionary on the Fourier representation. To address this issue, we develop a novel algorithm, called \emph{Enhanced Matrix Completion (EMaC)}, based on structured matrix completion that does not require prior knowledge of the model order. The algorithm starts by arranging the data into a low-rank enhanced form exhibiting multi-fold Hankel structure, and then attempts recovery via nuclear norm minimization. Under mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the order of $r\log^{4}n$, and is stable against bounded noise. Even if a constant portion of samples are corrupted with arbitrary magnitude, EMaC still allows exact recovery, provided that the sample complexity exceeds the order of $r^{2}\log^{3}n$. Along the way, our results demonstrate the power of convex relaxation in completing a low-rank multi-fold Hankel or Toeplitz matrix from minimal observed entries. The performance of our algorithm and its applicability to super resolution are further validated by numerical experiments.

1412.8604 2026-06-04 math.ST cs.NA math.NA stat.TH

A note on the Karhunen-Loève expansions for infinite-dimensional Bayesian inverse problems

关于无限维贝叶斯逆问题的Karhunen-Loève展开的注记

Jinglai Li

AI总结 本文探讨了截断Karhunen-Loève展开在无限维逆问题中的应用,证明在特定条件下,无需知道解本身即可估计误差界。

详情
AI中文摘要

在本文中,我们考虑用截断Karhunen-Loève展开来近似无限维逆问题的解。我们证明,在某些条件下,解与其有限维近似之间的误差界可以在不知道解本身的情况下估计。

英文摘要

In this note, we consider the truncated Karhunen-Loève expansion for approximating solutions to infinite dimensional inverse problems. We show that, under certain conditions, the bound of the error between a solution and its finite-dimensional approximation can be estimated without the knowledge of the solution.

1309.5524 2026-06-04 stat.CO cs.NA math.NA

Adaptive construction of surrogates for the Bayesian solution of inverse problems

自适应构造代理模型用于贝叶斯逆问题求解

Jinglai Li, Youssef M. Marzouk

AI总结 本文提出自适应构造多项式近似模型的方法,用于加速贝叶斯逆问题求解,通过随机优化在数据驱动的度量下构建准确代理模型,提升效率和精度。

详情
Journal ref
SIAM Journal on Scientific Computing, 36: A1163--A1186 (2014)
Comments
23 pages, 10 figures
AI中文摘要

贝叶斯逆问题通常依赖后验抽样方法,如马尔可夫链蒙特卡洛方法,其中每个样本的生成需要一次或多次参数到观测映射或正向模型的评估。当这些评估计算成本高时,正向模型的近似变得至关重要,以加速基于样本的推断。然而,为非线性正向模型构建全局准确的近似可能计算上是不可行的,实际上也不必要,因为后验分布通常集中在先验分布支持的极小部分。本文提出了一种新的方法,利用随机优化在一系列自适应确定的数据度量下构造多项式近似,最终集中在后验分布上。该方法在逆问题求解中表现出比基于先验的代理模型更高的效率和精度。

英文摘要

The Bayesian approach to inverse problems typically relies on posterior sampling approaches, such as Markov chain Monte Carlo, for which the generation of each sample requires one or more evaluations of the parameter-to-observable map or forward model. When these evaluations are computationally intensive, approximations of the forward model are essential to accelerating sample-based inference. Yet the construction of globally accurate approximations for nonlinear forward models can be computationally prohibitive and in fact unnecessary, as the posterior distribution typically concentrates on a small fraction of the support of the prior distribution. We present a new approach that uses stochastic optimization to construct polynomial approximations over a sequence of measures adaptively determined from the data, eventually concentrating on the posterior distribution. The approach yields substantial gains in efficiency and accuracy over prior-based surrogates, as demonstrated via application to inverse problems in partial differential equations.

1412.5676 2026-06-04 eess.SY cs.SY math.OC stat.ML

Optimal Triggering of Networked Control Systems

网络化控制系统最优触发问题

Ali Heydari

AI总结 研究非线性网络化控制系统的资源分配问题,提出动态规划方法解决固定终时问题并扩展至无限时域问题,分析不同情况下的触发策略及未知动态下的模型自由学习方案。

详情
AI中文摘要

非线性网络化控制系统资源分配问题的研究,不同于传统稳定性触发问题,目标为最优触发。开发了近似动态规划方法解决固定终时问题,并扩展至无限时域问题。研究了零阶保持、广义零阶保持和随机网络等不同情况。随后,将发展扩展到未知动态问题,并提出模型自由方案学习近似最优解。通过详细分析收敛性、最优性和稳定性,通过不同数值示例展示方法性能。

英文摘要

The problem of resource allocation of nonlinear networked control systems is investigated, where, unlike the well discussed case of triggering for stability, the objective is optimal triggering. An approximate dynamic programming approach is developed for solving problems with fixed final times initially and then it is extended to infinite horizon problems. Different cases including Zero-Order-Hold, Generalized Zero-Order-Hold, and stochastic networks are investigated. Afterwards, the developments are extended to the case of problems with unknown dynamics and a model-free scheme is presented for learning the (approximate) optimal solution. After detailed analyses of convergence, optimality, and stability of the results, the performance of the method is demonstrated through different numerical examples.

1412.1392 2026-06-04 stat.ME cs.NA math.AG math.NA

An algebraic method for constructing stable and consistent autoregressive filters

构造稳定且一致自回归滤波器的代数方法

John Harlim, Hoon Hong, Jacob L. Robbins

AI总结 本文提出一种代数方法,通过长期平均统计构建低阶稳定且一致的自回归模型,用于预测非线性湍流信号,相比传统回归方法更准确。

详情
Journal ref
J. Comput. Phys. 283, 241-257, 2015
Comments
10 figures
AI中文摘要

本文介绍了一种代数方法,用于构造低阶、稳定且一致的单变量自回归(AR)模型,以对具有记忆深度的非线性湍流信号进行滤波和预测。所谓稳定,是指AR模型的经典稳定性条件;所谓一致,是指二阶Adams-Bashforth方法的经典一致性约束。该代数方法的一个吸引人之处在于,与许多基于回归的标准参数化方法不同,模型参数无需直接使用任何训练数据集即可获得,只需长期平均统计量作为输入。所提方法给出一个离散化时间步长区间,保证稳定且一致的AR模型存在,并同时给出AR模型的参数。在针对两个具有不同衰减时间尺度特征的混沌时间序列的数值示例中,我们发现与基于线性回归的AR模型相比,所提AR模型的短期预测能力显著更准确,滤波能力相当。这些结果在广泛的离散化时间、观测时间和观测噪声方差范围内均保持稳健。最后,我们还发现,在预测刻画Madden-Julian Oscillation(一种主要的热带大气波动模态)变率的数据集时,所提模型的短期预测优于基于线性回归的AR模型。

英文摘要

In this paper, we introduce an algebraic method to construct stable and consistent univariate autoregressive (AR) models of low order for filtering and predicting nonlinear turbulent signals with memory depth. By stable, we refer to the classical stability condition for the AR model. By consistent, we refer to the classical consistency constraints of Adams-Bashforth methods of order-two. One attractive feature of this algebraic method is that the model parameters can be obtained without directly knowing any training data set as opposed to many standard, regression-based parameterization methods. It takes only long-time average statistics as inputs. The proposed method provides a discretization time step interval which guarantees the existence of stable and consistent AR model and simultaneously produces the parameters for the AR models. In our numerical examples with two chaotic time series with different characteristics of decaying time scales, we find that the proposed AR models produce significantly more accurate short-term predictive skill and comparable filtering skill relative to the linear regression-based AR models. These encouraging results are robust across wide ranges of discretization times, observation times, and observation noise variances. Finally, we also find that the proposed model produces an improved short-time prediction relative to the linear regression-based AR-models in forecasting a data set that characterizes the variability of the Madden-Julian Oscillation, a dominant tropical atmospheric wave pattern.

1412.4384 2026-06-04 math.NA cs.NA stat.ME

Bayesian Hierarchical Model of Total Variation Regularisation for Image Deblurring

用于图像去模糊的总变分正则化贝叶斯分层模型

Marko Järvenpää, Robert Piché

AI总结 本文提出基于总变分正则化的贝叶斯分层模型,通过拉普拉斯密度先验和变分贝叶斯方法实现逆问题参数的同时估计,用于图像去模糊,结果展示其在边缘保持方面的应用潜力。

详情
Comments
21 pages, 5 figures
AI中文摘要

本文提出了一种总变分正则化的贝叶斯分层模型。逆问题的所有参数,包括“正则化参数”,均在模型中由数据同时估计。该模型基于将拉普拉斯密度先验表征为高斯尺度混合。对混合变量取不同的先验,还可得到其他类总变分的正则化,例如与t分布相关的先验。采用变分贝叶斯方法求得所得后验均值的近似。此外,还提出了一种用于计算最大后验估计的迭代交替序列算法。通过图像去模糊示例对这些方法进行了说明。结果表明,在图像去模糊的情形下,所提模型可用于自动的边缘保持反演。尽管结果令人鼓舞,但模型仍存在一些困难,有待未来工作解决。

英文摘要

A Bayesian hierarchical model for total variation regularisation is presented in this paper. All the parameters of an inverse problem, including the "regularisation parameter", are estimated simultaneously from the data in the model. The model is based on the characterisation of the Laplace density prior as a scale mixture of Gaussians. With different priors on the mixture variable, other total variation like regularisations e.g. a prior that is related to t-distribution, are also obtained. An approximation of the resulting posterior mean is found using a variational Bayes method. In addition, an iterative alternating sequential algorithm for computing the maximum a posteriori estimate is presented. The methods are illustrated with examples of image deblurring. Results show that the proposed model can be used for automatic edge-preserving inversion in the case of image deblurring. Despite promising results, some difficulties with the model were encountered and are subject to future work.

1412.3297 2026-06-04 stat.ML cs.NA math.NA

Convergence and rate of convergence of some greedy algorithms in convex optimization

凸优化中某些贪心算法的收敛性及收敛速率

Vladimir Temlyakov

AI总结 本文系统研究了凸优化中广泛应用的三种贪心型算法的近似版本,分析了在评估存在误差时的收敛性及收敛速率。

详情
AI中文摘要

本文系统研究了凸优化中广泛使用的三种贪心型算法的近似版本。所谓近似版本,是指其中部分求值带有误差的版本。已有文献强调了此类贪心型算法近似版本在凸优化和逼近论中的重要性。

英文摘要

The paper gives a systematic study of the approximate versions of three greedy-type algorithms that are widely used in convex optimization. By approximate version we mean the one where some of evaluations are made with an error. Importance of such versions of greedy-type algorithms in convex optimization and in approximation theory was emphasized in previous literature.

1311.1899 2026-06-04 math.ST cs.NA math.NA math.PR stat.TH

Computation of expectations by Markov chain Monte Carlo methods

通过马尔可夫链蒙特卡洛方法计算期望

Erich Novak, Daniel Rudolf

AI总结 本文探讨了马尔可夫链蒙特卡洛方法在计算积分和期望中的误差界、烧入期选择规则、高维问题及可处理性与维度灾难的对比。

详情
Journal ref
Extraction of Quantifiable Information from Complex Systems, Lecture Notes in Computational Science and Engineering, Volume 102, 2014, pp 397-411
Comments
14 pages. In: "Extraction of quantifiable information from complex systems", S. Dahlke et al. (eds.), Springer, 2014
AI中文摘要

马尔可夫链蒙特卡洛(MCMC)方法是一种非常灵活且广泛使用的工具,用于计算积分和期望。在本文简短的综述中,我们聚焦于误差界、烧入期选择规则、高维问题以及可处理性与维度灾难之间的对比。

英文摘要

Markov chain Monte Carlo (MCMC) methods are a very versatile and widely used tool to compute integrals and expectations. In this short survey we focus on error bounds, rules for choosing the burn in, high dimensional problems and tractability versus curse of dimension.

1311.1890 2026-06-04 stat.CO cs.NA math.NA math.ST stat.TH

Discrepancy estimates for variance bounding Markov chain quasi-Monte Carlo

方差限制马尔可夫链准蒙特卡洛的偏差估计

Josef Dick, Daniel Rudolf

AI总结 本文研究了由确定性序列驱动的方差限制马尔可夫链准蒙特卡洛方法的效率,定义了驱动序列的回溯偏差,并证明了存在确定性驱动序列使偏差几乎以蒙特卡洛速率$n^{-1/2}$衰减,同时提供了误差估计示例。

详情
Journal ref
Electron. J. Probab. 19 (2014), no. 105, 1-24
Comments
24 pages
AI中文摘要

马尔可夫链蒙特卡洛(MCMC)模拟被建模为由真正的随机数驱动。我们考虑由确定性数列驱动的方差限制马尔可夫链。星偏差是此类马尔可夫链准蒙特卡洛方法效率的一种度量。我们定义了驱动序列的回溯偏差,并阐述了它与马尔可夫链准蒙特卡洛样本星偏差之间的紧密关系。我们证明存在一个确定性驱动序列,使得偏差几乎以蒙特卡洛速率$n^{-1/2}$衰减。与MCMC模拟一样,马尔可夫链准蒙特卡洛也可以引入烧入期,以减小初始状态的影响。特别地,我们的偏差界给出了期望计算的误差估计。为了说明我们的理论,我们给出了一个基于球漫步的Metropolis算法示例。此外,在附加假设下,我们证明存在一个驱动序列,使得相应的确定性马尔可夫链样本的偏差对每个$δ>0$都以阶$n^{-1+δ}$衰减。

英文摘要

Markov chain Monte Carlo (MCMC) simulations are modeled as driven by true random numbers. We consider variance bounding Markov chains driven by a deterministic sequence of numbers. The star-discrepancy provides a measure of efficiency of such Markov chain quasi-Monte Carlo methods. We define a pull-back discrepancy of the driver sequence and state a close relation to the star-discrepancy of the Markov chain quasi-Monte Carlo samples. We prove that there exists a deterministic driver sequence such that the discrepancies decrease almost with the Monte Carlo rate $n^{-1/2}$. As for MCMC simulations, a burn-in period can also be taken into account for Markov chain quasi-Monte Carlo to reduce the influence of the initial state. In particular, our discrepancy bound leads to an estimate of the error for the computation of expectations. To illustrate our theory we provide an example for the Metropolis algorithm based on a ball walk. Furthermore, under additional assumptions we prove the existence of a driver sequence such that the discrepancy of the corresponding deterministic Markov chain sample decreases with order $n^{-1+δ}$ for every $δ>0$.

1412.0607 2026-06-04 math.ST cs.SY eess.SY stat.ML stat.TH

How to monitor and mitigate stair-casing in l1 trend filtering

如何监控和缓解l1趋势过滤中的阶梯效应

Cristian R. Rojas, Bo Wahlberg

AI总结 本文研究了使用l1趋势过滤估计时间序列中的变化趋势,指出该方法存在阶梯效应问题,并讨论了监控和缓解该问题的方法。

详情
AI中文摘要

在本文中,我们研究了使用ℓ1趋势过滤估计时间序列中的变化趋势。该方法将用于检测均值阶跃变化的一维全变分(TV)去噪推广到趋势变化的检测,并依赖于一个凸优化问题,该问题存在非常高效的数值算法。已知TV去噪会受到所谓阶梯效应的影响,导致检测到虚假的变点。本文旨在说明ℓ1趋势过滤同样存在某种阶梯效应问题。分析基于将该方法中优化问题的对偶变量解释为积分随机游走(integrated random walk)。我们讨论了ℓ1趋势过滤的一致性条件、如何监控这些条件是否满足,以及如何修改算法以避免阶梯效应带来的虚假检测问题。

英文摘要

In this paper we study the estimation of changing trends in time-series using $\ell_1$ trend filtering. This method generalizes 1D Total Variation (TV) denoising for detection of step changes in means to detecting changes in trends, and it relies on a convex optimization problem for which there are very efficient numerical algorithms. It is known that TV denoising suffers from the so-called stair-case effect, which leads to detecting false change points. The objective of this paper is to show that $\ell_1$ trend filtering also suffers from a certain stair-case problem. The analysis is based on an interpretation of the dual variables of the optimization problem in the method as integrated random walk. We discuss consistency conditions for $\ell_1$ trend filtering, how to monitor their fulfillment, and how to modify the algorithm to avoid the stair-case false detection problem.

1411.6081 2026-06-04 cs.LG cs.NA math.NA stat.ML

PU Learning for Matrix Completion

矩阵补全的PU学习

Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S. Dhillon

AI总结 本文研究了在仅观测到二进制测量值的情况下,如何通过PU学习方法进行矩阵补全,提出了两种方法并给出了误差界和样本复杂度。

详情
AI中文摘要

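本文考虑了当观测值为某个基础矩阵M的一位测量值时的矩阵补全问题,特别是观测样本仅包含1而不含0的情况。此问题源于推荐系统和社交网络等现代应用,其中仅能观察到“喜欢”或“好友关系”。仅从正例和未标注样本中学习的问题称为PU(positive-unlabeled)学习,已在二分类的背景下得到研究。我们考虑PU矩阵补全问题:先对基础实值矩阵M进行量化以产生一位观测,然后揭示其中一部分正条目。在M的核范数有界的假设下,我们为两种不同的观测模型给出恢复保证:1)M参数化一个生成二值矩阵的分布;2)对M取阈值得到二值矩阵。对于第一种情形,我们提出“平移矩阵补全”方法,仅使用对应于1的部分索引来恢复M;对于第二种情形,我们提出“有偏矩阵补全”方法,用于恢复(取阈值后的)二值矩阵。两种方法都给出了较强的误差界:若M为n×n矩阵,则Frobenius误差的界为O(1/((1-rho)n)),其中1-rho表示被观测到的1所占的比例。这意味着当M稠密且n很大时,达到小误差所需的样本复杂度为O(n log n)个1。我们将方法和保证推广到归纳矩阵补全问题,其中M的行和列带有相关特征。我们为两种方法提供了高效且可扩展的优化过程,并在链接预测(在包含超过200万个节点和9000万条链接的真实网络上)和半监督聚类任务中展示了所提方法的有效性。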

英文摘要

In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. This problem is motivated by modern applications such as recommender systems and social networks where only "likes" or "friendships" are observed. The problem of learning from only positive and unlabeled examples, called PU (positive-unlabeled) learning, has been studied in the context of binary classification. We consider the PU matrix completion problem, where an underlying real-valued matrix M is first quantized to generate one-bit observations and then a subset of positive entries is revealed. Under the assumption that M has bounded nuclear norm, we provide recovery guarantees for two different observation models: 1) M parameterizes a distribution that generates a binary matrix, 2) M is thresholded to obtain a binary matrix. For the first case, we propose a "shifted matrix completion" method that recovers M using only a subset of indices corresponding to ones, while for the second case, we propose a "biased matrix completion" method that recovers the (thresholded) binary matrix. Both methods yield strong error bounds: if M is n by n, the Frobenius error is bounded as O(1/((1-rho)n)), where 1-rho denotes the fraction of ones observed. This implies a sample complexity of O(n log n) ones to achieve a small error, when M is dense and n is large. We extend our methods and guarantees to the inductive matrix completion problem, where rows and columns of M have associated features. We provide efficient and scalable optimization procedures for both the methods and demonstrate the effectiveness of the proposed methods for link prediction (on real-world networks consisting of over 2 million nodes and 90 million links) and semi-supervised clustering tasks.

1411.4695 2026-06-04 eess.SY cs.SY math.OC stat.ML

Feedback Solution to Optimal Switching Problems with Switching Cost

反馈解法用于具有切换成本的最优切换问题

Ali Heydari

AI总结 本文研究了在非线性自主子系统间切换的最优问题,通过引入切换成本项,提出反馈解法以近似最优解,并通过数值例子验证了方法的有效性。

详情
AI中文摘要

本文研究了在非线性自治子系统之间进行最优切换的问题,其目标不仅是使状态接近期望点,还包括调整切换模式,即惩罚切换的发生,并对不同模态的使用赋予不同的偏好。模态序列未预先指定,代价函数中包含切换代价项以惩罚每次切换。研究表明,一旦引入切换代价,最优剩余代价(cost-to-go)函数就依赖于当前已激活的子系统,即上一时间步所使用的子系统。随后,开发了一种基于近似动态规划的方法,以反馈形式、针对不同初始条件给出问题最优解的近似。最后,通过数值算例分析了该方法的性能。

英文摘要

The problem of optimal switching between nonlinear autonomous subsystems is investigated in this study where the objective is not only bringing the states to close to the desired point, but also adjusting the switching pattern, in the sense of penalizing switching occurrences and assigning different preferences to utilization of different modes. The mode sequence is unspecified and a switching cost term is used in the cost function for penalizing each switching. It is shown that once a switching cost is incorporated, the optimal cost-to-go function depends on the already active subsystem, i.e., the subsystem which was engaged in the previous time step. Afterwards, an approximate dynamic programming based method is developed which provides an approximation of the optimal solution to the problem in a feedback form and for different initial conditions. Finally, the performance of the method is analyzed through numerical examples.

1411.3954 2026-06-04 stat.CO cs.NA math.NA

Optimal mixture weights in multiple importance sampling

多重重要性采样中最优混合权重

Hera Y. He, Art B. Owen

AI总结 本文研究了多重重要性采样中混合权重的优化问题,通过凸优化方法提升效率,并提出一种序列重要性采样算法来估计最优混合比例。

详情
Comments
23 pages, 0 figures
AI中文摘要

在多重重要性采样中,我们将来自有限个建议分布的样本组合起来。当这些建议分布被用来构造控制变量时,可以(Owen和Zhou,2000)给出所得方差与列表中未知的最佳建议分布的方差之比的界。取各建议分布的均匀混合可得到极小极大遗憾,但当组件很多时这一做法偏保守。本文通过优化各混合组件的采样率来进一步提高效率。我们证明,带控制变量的混合重要性采样的采样方差关于混合概率和控制变量回归系数是联合凸的。我们还给出了一种序列重要性采样算法,用于从样本数据中估计最优混合。

英文摘要

In multiple importance sampling we combine samples from a finite list of proposal distributions. When those proposal distributions are used to create control variates, it is possible (Owen and Zhou, 2000) to bound the ratio of the resulting variance to that of the unknown best proposal distribution in our list. The minimax regret arises by taking a uniform mixture of proposals, but that is conservative when there are many components. In this paper we optimize the mixture component sampling rates to gain further efficiency. We show that the sampling variance of mixture importance sampling with control variates is jointly convex in the mixture probabilities and control variate regression coefficients. We also give a sequential importance sampling algorithm to estimate the optimal mixture from the sample data.
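
For readers unfamiliar with the setup, the following minimal sketch shows the plain mixture importance sampling estimator that the paper takes as its starting point. It does not include the control variates or the optimization of the mixture weights described in the abstract. The Gaussian target, the two Gaussian proposals, the fixed uniform weights and the function names are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_importance_sampling(f, target_pdf, proposals, weights, n, seed=0):
    """Estimate E_target[f(X)] by sampling from a mixture of normal proposals.

    proposals is a list of (mean, sd) pairs; weights are mixture
    probabilities that sum to one. Every draw is weighted by
    target_pdf(x) / q_mix(x), where q_mix is the full mixture density.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Pick a mixture component, then draw from it.
        mean, sd = rng.choices(proposals, weights=weights, k=1)[0]
        x = rng.gauss(mean, sd)
        q_mix = sum(w * normal_pdf(x, m, s) for w, (m, s) in zip(weights, proposals))
        total += f(x) * target_pdf(x) / q_mix
    return total / n


# Hypothetical example: E[X^2] under N(0, 1), whose exact value is 1.
proposals = [(-1.0, 1.5), (1.0, 1.5)]
weights = [0.5, 0.5]  # uniform mixture, the conservative default mentioned in the abstract
estimate = mixture_importance_sampling(
    lambda x: x * x,
    lambda x: normal_pdf(x, 0.0, 1.0),
    proposals,
    weights,
    n=20000,
)
```

In the paper's setting the entries of `weights` become decision variables; the convexity result in the abstract is what makes optimizing them tractable.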

1411.2389 2026-06-04 stat.ME cs.NA math.NA

On Filter Banks and Wavelets Based on Chebyshev Polynomials

基于切比雪夫多项式的滤波器组与小波

R. J. Cintra, H. M. de Oliveira, L. R. Soares

AI总结 本文提出切比雪夫小波,基于切比雪夫多项式,研究其滤波器组性质,证明级联算法收敛性,并展示其在信号去噪中的应用。

详情
Journal ref
Cintra, R. J. ; Oliveira, H. M. ; Soares, L. R. "On Filter Banks and Wavelets Based on Chebyshev Polynomials". In: 7th WSEAS International Conference on Circuits, 2003, Corfu Island, Greece, p. 195-200
Comments
18 pages, 6 figures
AI中文摘要

本文介绍了一种新的小波族,称为切比雪夫小波,它由传统的第一类和第二类切比雪夫多项式导出。研究了切比雪夫滤波器组的性质,包括正交性和完美重建条件。切比雪夫小波具有紧支撑,其滤波器具有良好的选择性,但不正交。利用马尔可夫链的性质证明了切比雪夫小波级联算法的收敛性。给出了这些小波的计算实现和一些明确的应用。所提出的小波适用于信号去噪。

英文摘要

In this paper we introduce a new family of wavelets, named Chebyshev wavelets, which are derived from conventional first and second kind Chebyshev polynomials. Properties of Chebyshev filter banks are investigated, including orthogonality and perfect reconstruction conditions. Chebyshev wavelets have compact support, their filters possess good selectivity, but they are not orthogonal. The convergence of the cascade algorithm of Chebyshev wavelets is proved by using properties of Markov chains. Computational implementation of these wavelets and some clear-cut applications are presented. Proposed wavelets are suitable for signal denoising.

1411.1151 2026-06-04 math.NA cs.NA stat.ME

Guaranteed Monte Carlo Methods for Bernoulli Random Variables

伯努利随机变量的保证蒙特卡罗方法

Lan Jiang, Fred J. Hickernell

AI总结 本文提出一种保证自动积分库(GAIL)中的算法,利用霍夫丁不等式自动确定样本数量,以在高置信度下达到指定的绝对误差容忍度。

详情
AI中文摘要

简单的蒙特卡罗方法是一种通用的计算方法,具有O(n^{-1/2})的收敛率。它可以用于估计分布未知的随机变量的均值。伯努利随机变量Y广泛用于建模复杂系统的成功(失败)。这里Y=1表示成功(失败),p=E(Y)表示成功(失败)的概率。伯努利随机变量的另一个应用是Y=1_R(X),此时p是X落在区域R中的概率。本文探讨如何在高置信度1-α下,以指定的绝对误差容忍度ε估计p。所提出的算法通过使用霍夫丁不等式自动确定所需Y样本数量,以达到指定的误差容忍度和置信水平。本文描述的算法已用MATLAB实现,并是保证自动积分库(GAIL)的一部分。

英文摘要

Simple Monte Carlo is a versatile computational method with a convergence rate of $O(n^{-1/2})$. It can be used to estimate the means of random variables whose distributions are unknown. Bernoulli random variables, $Y$, are widely used to model success (failure) of complex systems. Here $Y=1$ denotes a success (failure), and $p=\mathbb{E}(Y)$ denotes the probability of that success (failure). Another application of Bernoulli random variables is $Y=\mathbb{1}_{R}(\boldsymbol{X})$, where then $p$ is the probability of $\boldsymbol{X}$ lying in the region $R$. This article explores how to estimate $p$ to a prescribed absolute error tolerance, $\varepsilon$, with a high level of confidence, $1-α$. The proposed algorithm automatically determines the number of samples of $Y$ needed to reach the prescribed error tolerance with the specified confidence level by using Hoeffding's inequality. The algorithm described here has been implemented in MATLAB and is part of the Guaranteed Automatic Integration Library (GAIL).
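
The sample-size rule implied by Hoeffding's inequality can be sketched in a few lines. For i.i.d. Bernoulli samples the two-sided bound reads $\Pr(|\hat p_n - p| \ge \varepsilon) \le 2\exp(-2n\varepsilon^2)$, so any $n \ge \ln(2/α)/(2\varepsilon^2)$ meets tolerance $\varepsilon$ with confidence $1-α$. The Python function below is a hypothetical illustration of that textbook bound, not the GAIL MATLAB implementation.

```python
import math


def hoeffding_sample_size(eps, alpha):
    """Smallest n satisfying 2 * exp(-2 * n * eps**2) <= alpha.

    eps is the absolute error tolerance, alpha the allowed failure
    probability (so the confidence level is 1 - alpha).
    """
    if not (0.0 < eps < 1.0) or not (0.0 < alpha < 1.0):
        raise ValueError("eps and alpha must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / alpha) / (2.0 * eps ** 2))


# Hypothetical settings: tolerance 0.01 at 95% confidence.
n_required = hoeffding_sample_size(0.01, 0.05)
```

The required sample size depends only on $\varepsilon$ and $α$, not on the unknown $p$, so it can be fixed before any samples are drawn.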

1411.1087 2026-06-04 math.NA cs.DS cs.IT cs.LG cs.NA math.IT stat.ML

Fast Exact Matrix Completion with Finite Samples

快速精确矩阵补全与有限样本

Prateek Jain, Praneeth Netrapalli

AI总结 本文提出一种快速迭代算法,通过观察O(nr^5 log^3 n)个样本实现精确矩阵补全,运行时间为O(nr^7 log^3 n log 1/ε),首次实现近线性时间且样本复杂度独立于精度的补全方法。

详情
AI中文摘要

矩阵补全是通过观测少量矩阵条目恢复低秩矩阵的问题。近期多项工作提出了快速非凸优化迭代算法,但样本复杂度在秩、条件数和所需精度的依赖上仍不最优。本文提出一种快速迭代算法,通过观察O(nr^5 log^3 n)个条目解决矩阵补全问题,运行时间为O(nr^7 log^3 n log 1/ε),近线性于矩阵维度。本文算法基于已知的投影梯度下降方法,投影到非凸的低秩矩阵集合。两个关键思想:1) 使用ℓ∞范数势函数而非谱范数,提供新的扰动界方法;2) 扩展Davis-Kahan定理以获得具有良好特征间隙矩阵的最佳低秩近似扰动界。这些思想可能具有独立价值。

英文摘要

Matrix completion is the problem of recovering a low rank matrix by observing a small fraction of its entries. A series of recent works [KOM12,JNS13,HW14] have proposed fast non-convex optimization based iterative algorithms to solve this problem. However, the sample complexity in all these results is sub-optimal in its dependence on the rank, condition number and the desired accuracy. In this paper, we present a fast iterative algorithm that solves the matrix completion problem by observing $O(nr^5 \log^3 n)$ entries, which is independent of the condition number and the desired accuracy. The run time of our algorithm is $O(nr^7\log^3 n\log 1/ε)$ which is near linear in the dimension of the matrix. To the best of our knowledge, this is the first near linear time algorithm for exact matrix completion with finite sample complexity (i.e. independent of $ε$). Our algorithm is based on a well known projected gradient descent method, where the projection is onto the (non-convex) set of low rank matrices. There are two key ideas in our result: 1) our argument is based on a $\ell_{\infty}$ norm potential function (as opposed to the spectral norm) and provides a novel way to obtain perturbation bounds for it. 2) we prove and use a natural extension of the Davis-Kahan theorem to obtain perturbation bounds on the best low rank approximation of matrices with good eigen-gap. Both of these ideas may be of independent interest.

1409.8109 2026-06-04 stat.AP cs.NA math.NA

Sequential Monte Carlo samplers for semilinear inverse problems and application to magnetoencephalography

用于半线性反问题的序列蒙特卡罗采样及其在脑磁图中的应用

Sara Sommariva, Alberto Sorrentino

AI总结 本文讨论利用序列蒙特卡罗方法求解半线性反问题:对线性变量作解析边缘化,仅对非线性变量进行蒙特卡罗采样,从而降低蒙特卡罗方差和/或计算成本,并应用于脑磁图中多偶极子模型的源数目和位置估计。

详情
Journal ref
Inverse Problems 30 (2014) 114020
Comments
26 pages, 6 figures
AI中文摘要

我们讨论了一类新近的序列蒙特卡罗方法在求解具有半线性结构的反问题中的应用,即数据对一部分变量线性依赖,对其余变量非线性依赖。在适当的高斯假设下,可以对线性变量进行边缘化。这意味着蒙特卡罗过程只需应用于非线性变量,而线性变量可解析处理;其结果是蒙特卡罗方差和/或计算成本降低。我们使用这种方法求解脑磁图的反问题,采用多偶极子模型描述源。数据对源的数目和位置非线性依赖,对其电流矢量线性依赖。半解析方法使我们能够从完整时间序列(而非单个时间点)估计偶极子的数目和位置,同时保持较低的计算成本。

英文摘要

We discuss the use of a recent class of sequential Monte Carlo methods for solving inverse problems characterized by a semi-linear structure, i.e. where the data depend linearly on a subset of variables and nonlinearly on the remaining ones. In this type of problems, under proper Gaussian assumptions one can marginalize the linear variables. This means that the Monte Carlo procedure needs only to be applied to the nonlinear variables, while the linear ones can be treated analytically; as a result, the Monte Carlo variance and/or the computational cost decrease. We use this approach to solve the inverse problem of magnetoencephalography, with a multi-dipole model for the sources. Here, data depend nonlinearly on the number of sources and their locations, and depend linearly on their current vectors. The semi-analytic approach enables us to estimate the number of dipoles and their location from a whole time-series, rather than a single time point, while keeping a low computational cost.

1411.0622 2026-06-04 math.NA cs.IT cs.NA math.IT stat.AP

A Subspace Method for Array Covariance Matrix Estimation

一种用于阵列协方差矩阵估计的子空间方法

Mostafa Rahmani, George Atia

AI总结 本文提出一种基于子空间的协方差矩阵估计方法,通过解决半定规划问题获得近似解,提升估计精度。

详情
Comments
5 pages, 4 figures
AI中文摘要

本文介绍了一种用于估计阵列协方差矩阵的子空间方法。当接收到的信号不相关时,真实的阵列协方差矩阵位于一个特定的子空间中,其维度通常远小于全空间维度。基于此思想,提出了一种基于子空间的协方差矩阵估计器。该估计器是半定凸优化问题的解。虽然该优化问题没有闭式解,但提出了一种近似闭式解,使其易于实现。与传统方法相比,所提方法由于消除了不位于真实协方差矩阵子空间中的估计误差,从而提高了估计精度。数值实验表明,所提的协方差矩阵估计器可以显著提高协方差矩阵的估计质量。

英文摘要

This paper introduces a subspace method for the estimation of an array covariance matrix. It is shown that when the received signals are uncorrelated, the true array covariance matrices lie in a specific subspace whose dimension is typically much smaller than the dimension of the full space. Based on this idea, a subspace based covariance matrix estimator is proposed. The estimator is obtained as a solution to a semi-definite convex optimization problem. While the optimization problem has no closed-form solution, a nearly optimal closed-form solution is proposed making it easy to implement. In comparison to the conventional approaches, the proposed method yields higher estimation accuracy because it eliminates the estimation error which does not lie in the subspace of the true covariance matrices. The numerical examples indicate that the proposed covariance matrix estimator can significantly improve the estimation quality of the covariance matrix.

1411.0024 2026-06-04 math.OC cs.LG cs.SY eess.SY stat.ML

Robust sketching for multiple square-root LASSO problems

多重平方根LASSO问题的鲁棒草图(sketching)方法

Vu Pham, Laurent El Ghaoui, Arturo Fernandez

AI总结 本文提出一种鲁棒框架,通过低秩近似对多个相似问题进行高效求解,减少计算量并提升统计性能。

详情
AI中文摘要

许多学习任务,如交叉验证、参数搜索或留一分析,涉及多个相似问题实例,每个实例与其他实例共享大量学习数据。我们介绍了一种用于求解多个平方根LASSO问题的鲁棒框架,该框架基于利用低秩近似得到的学习数据草图(sketch)。我们的方法可大幅减少计算工作量,相当于将观测数从m(初始观测数)减少到k(低秩模型中保留的奇异值个数),同时不牺牲、有时甚至提升统计性能。理论分析以及在合成和真实数据上的数值实验展示了该方法在大规模应用中的效率。

英文摘要

Many learning tasks, such as cross-validation, parameter search, or leave-one-out analysis, involve multiple instances of similar problems, each instance sharing a large part of learning data with the others. We introduce a robust framework for solving multiple square-root LASSO problems, based on a sketch of the learning data that uses low-rank approximations. Our approach allows a dramatic reduction in computational effort, in effect reducing the number of observations from $m$ (the number of observations to start with) to $k$ (the number of singular values retained in the low-rank model), while not sacrificing---sometimes even improving---the statistical performance. Theoretical analysis, as well as numerical experiments on both synthetic and real data, illustrate the efficiency of the method in large scale applications.

1406.4905 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Variational Gaussian Process State-Space Models

变分高斯过程状态空间模型

Roger Frigola, Yutian Chen, Carl E. Rasmussen

AI总结 本文提出基于稀疏高斯过程的变分贝叶斯学习方法,用于高效学习非线性状态空间模型,实现对非线性动力系统后验的可计算性,相比传统参数模型,能灵活平衡模型容量与计算成本,避免过拟合。

详情
Journal ref
R. Frigola, Y. Chen and C. E. Rasmussen. Variational Gaussian Process State-Space Models, in Advances in Neural Information Processing Systems (NIPS), 2014
AI中文摘要

状态空间模型在科学和工程的不同领域中已成功应用超过五十年。我们提出了一种基于稀疏高斯过程的高效变分贝叶斯学习非线性状态空间模型的程序。学习结果是对非线性动力系统可计算的后验。与传统参数模型相比,我们提供了在避免过拟合的同时,可以方便地权衡模型容量和计算成本的可能性。我们的主要算法使用了结合变分贝叶斯和顺序蒙特卡洛的混合推断方法。我们还提出了随机变分推断和在线学习方法,以实现对长时间序列的快速学习。

英文摘要

State-space models have been successfully used for more than fifty years in different areas of science and engineering. We present a procedure for efficient variational Bayesian learning of nonlinear state-space models based on sparse Gaussian processes. The result of learning is a tractable posterior over nonlinear dynamical systems. In comparison to conventional parametric models, we offer the possibility to straightforwardly trade off model capacity and computational cost whilst avoiding overfitting. Our main algorithm uses a hybrid inference approach combining variational Bayes and sequential Monte Carlo. We also present stochastic variational inference and online learning approaches for fast learning with long time series.

1410.7550 2026-06-04 stat.ML cs.LG cs.NE cs.SY eess.SY

Learning deep dynamical models from image pixels

从图像像素学习深度动态模型

Niklas Wahlström, Thomas B. Schön, Marc Peter Deisenroth

AI总结 本文提出通过深度学习与系统识别结合,从高维图像像素学习非线性动态系统的嵌入表示和预测转移模型。

详情
Comments
10 pages, 11 figures
AI中文摘要

建模动态系统在许多领域都很重要,例如控制、机器人学或神经技术。通常这些系统的状态无法直接观测,只能通过噪声且可能高维的观测获得。在这种情况下,系统识别,即在潜在空间中找到测量映射和转移映射(系统动态)具有挑战性。对于线性系统动态和测量映射,有高效的系统识别解决方案。然而在实际应用中,线性假设不成立,需要非线性系统识别技术。如果观测是高维的(例如图像),则非线性系统识别本质上困难。为了解决从高维观测中进行非线性系统识别的问题,我们结合了深度学习和系统识别的最新进展。特别是,我们通过深度自编码器联合学习观测的低维嵌入,并在该低维空间中学习预测转移模型。我们证明我们的模型能够仅从像素信息学习良好的动态系统预测模型。

英文摘要

Modeling dynamical systems is important in many disciplines, e.g., control, robotics, or neurotechnology. Commonly the state of these systems is not directly observed, but only available through noisy and potentially high-dimensional observations. In these cases, system identification, i.e., finding the measurement mapping and the transition mapping (system dynamics) in latent space, can be challenging. For linear system dynamics and measurement mappings, efficient solutions for system identification are available. However, in practical applications, the linearity assumptions do not hold, requiring non-linear system identification techniques. If additionally the observations are high-dimensional (e.g., images), non-linear system identification is inherently hard. To address the problem of non-linear system identification from high-dimensional observations, we combine recent advances in deep learning and system identification. In particular, we jointly learn a low-dimensional embedding of the observation by means of deep auto-encoders and a predictive transition model in this low-dimensional space. We demonstrate that our model enables learning good predictive models of dynamical systems from pixel information only.

1410.6151 2026-06-04 math.NA cs.NA stat.CO

Small-noise analysis and symmetrization of implicit Monte Carlo samplers

隐式蒙特卡罗采样器的小噪声分析与对称化

Jonathan Goodman, Kevin K. Lin, Matthias Morzfeld

AI总结 本文通过拉普拉斯渐近展开分析小噪声下的隐式采样器,提出对称化改进方案,验证了其在小噪声采样中的有效性。

详情
AI中文摘要

隐式采样器是用于从多变量概率分布中生成独立加权样本的算法,常用于贝叶斯数据同化。本文利用拉普拉斯渐近展开分析两种隐式采样器在小噪声情形下的行为。分析结果提示了一种对算法的对称化改进,能以较小的额外成本得到更好的(隐式)采样方案。计算实验验证了理论结果,并表明对称化对小噪声采样问题有效。

英文摘要

Implicit samplers are algorithms for producing independent, weighted samples from multi-variate probability distributions. These are often applied in Bayesian data assimilation algorithms. We use Laplace asymptotic expansions to analyze two implicit samplers in the small noise regime. Our analysis suggests a symmetrization of the algorithms that leads to improved (implicit) sampling schemes at a relatively small additional cost. Computational experiments confirm the theory and show that symmetrization is effective for small noise sampling problems.

1410.4622 2026-06-04 cs.RO cs.SY eess.SY math.AT stat.ML

Robust Topological Feature Extraction for Mapping of Environments using Bio-Inspired Sensor Networks

鲁棒拓扑特征提取用于生物启发式传感器网络环境映射

Alireza Dirafzoon, Edgar Lobaton

AI总结 本文利用生物启发式传感器网络收集的最小感知信息,通过概率运动模型提取弱接触信息,构建环境拓扑表示。采用拓扑数据分析提取主导特征,结合密度基子采样算法提高鲁棒性,并提出鲁棒尺度不变分类算法以量化特征。

详情
Comments
14 pages, 7 figures
AI中文摘要

本文提出了一种利用生物启发式传感器网络进行未知环境探索和映射的方法。通过受蟑螂运动特性启发的概率运动模型,提取弱接触信息以构建环境拓扑表示。利用节点间交互生成点云,代表环境流形的空间特征。通过拓扑数据分析生成持续区间用于拓扑映射。为提高采样数据对异常值的鲁棒性,采用密度基子采样算法。此外,提出一种鲁棒尺度不变分类算法用于持久图,以提供数据中所需特征的定量表示。同时,提出多种定义接触度量的策略,以提高拓扑方法的估计和分类性能。

英文摘要

In this paper, we exploit minimal sensing information gathered from biologically inspired sensor networks to perform exploration and mapping in an unknown environment. A probabilistic motion model of mobile sensing nodes, inspired by motion characteristics of cockroaches, is utilized to extract weak encounter information in order to build a topological representation of the environment. Neighbor to neighbor interactions among the nodes are exploited to build point clouds representing spatial features of the manifold characterizing the environment based on the sampled data. To extract dominant features from sampled data, topological data analysis is used to produce persistence intervals for features, to be used for topological mapping. In order to improve robustness characteristics of the sampled data with respect to outliers, density based subsampling algorithms are employed. Moreover, a robust scale-invariant classification algorithm for persistence diagrams is proposed to provide a quantitative representation of desired features in the data. Furthermore, various strategies for defining encounter metrics with different degrees of information regarding agents' motion are suggested to enhance the precision of the estimation and classification performance of the topological method.

1410.2954 2026-06-04 eess.SY cs.SY stat.ML

Q-learning for Optimal Control of Continuous-time Systems

用于连续时间系统的最优控制的Q学习

Biao Luo, Derong Liu, Tingwen Huang

AI总结 本文提出PIQL和VIQL算法,通过连续时间系统的Q函数解决非线性连续时间系统的模型无关最优控制问题,证明其Q函数序列非递增并收敛,利用实际系统数据学习最优控制策略。

详情
Comments
Submitted for Review
AI中文摘要

本文提出两种Q学习(QL)方法,并为其建立收敛理论,以解决一般非线性连续时间系统的无模型最优控制问题。通过引入连续时间系统的Q函数,提出了基于策略迭代的Q学习(PIQL)和基于值迭代的Q学习(VIQL)算法,从实际系统数据中学习最优控制策略,而不是使用数学系统模型。证明PIQL和VIQL方法均生成非递增的Q函数序列,并收敛到最优Q函数。在实现QL算法时,应用加权残差法推导参数更新规则。所开发的PIQL和VIQL算法本质上是离策略(off-policy)强化学习方法,系统数据可以任意收集,从而增强探索能力。利用从实际系统收集的数据,QL方法离线学习最优控制策略,然后将收敛的控制策略应用于实际系统。通过计算机仿真验证了所开发的QL算法的有效性。

英文摘要

In this paper, two Q-learning (QL) methods are proposed and their convergence theories are established for addressing the model-free optimal control problem of general nonlinear continuous-time systems. By introducing the Q-function for continuous-time systems, policy iteration based QL (PIQL) and value iteration based QL (VIQL) algorithms are proposed for learning the optimal control policy from real system data rather than using a mathematical system model. It is proved that both PIQL and VIQL methods generate a nonincreasing Q-function sequence, which converges to the optimal Q-function. For implementation of the QL algorithms, the method of weighted residuals is applied to derive the parameter update rule. The developed PIQL and VIQL algorithms are essentially off-policy reinforcement learning approaches, where the system data can be collected arbitrarily and thus the exploration ability is increased. With the data collected from the real system, the QL methods learn the optimal control policy offline, and then the convergent control policy is applied to the real system. The effectiveness of the developed QL algorithms is verified through computer simulation.

1410.0719 2026-06-04 math.NA cs.CV cs.IT cs.LG cs.NA math.IT math.OC math.ST stat.TH

Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

第二届“稀疏模型与技术相互作用国际巡回研讨会”(iTWIST'14)论文集

L. Jacques, C. De Vleeschouwer, Y. Boursier, P. Sudhakar, C. De Mol, A. Pizurica, S. Anthoine, P. Vandergheynst, P. Frossard, C. Bilen, S. Kitic, N. Bertin, R. Gribonval, N. Boumal, B. Mishra, P. -A. Absil, R. Sepulchre, S. Bundervoet, C. Schretter, A. Dooms, P. Schelkens, O. Chabiron, F. Malgouyres, J. -Y. Tourneret, N. Dobigeon, P. Chainais, C. Richard, B. Cornelis, I. Daubechies, D. Dunson, M. Dankova, P. Rajmic, K. Degraux, V. Cambareri, B. Geelen, G. Lafruit, G. Setti, J. -F. Determe, J. Louveaux, F. Horlin, A. Drémeau, P. Heas, C. Herzet, V. Duval, G. Peyré, A. Fawzi, M. Davies, N. Gillis, S. A. Vavasis, C. Soussen, L. Le Magoarou, J. Liang, J. Fadili, A. Liutkus, D. Martina, S. Gigan, L. Daudet, M. Maggioni, S. Minsker, N. Strawn, C. Mory, F. Ngole, J. -L. Starck, I. Loris, S. Vaiter, M. Golbabaee, D. Vukobratovic

AI总结 iTWIST'14聚焦稀疏范式理论与应用,通过演讲、海报和讨论促进国际协作,涵盖稀疏数据传感、子空间联合、非线性逆问题等主题。

详情
Comments
69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist14
AI中文摘要

iTWIST研讨会旨在通过口头报告、海报和自由讨论促进国际科学团队合作。第二届iTWIST'14于2014年8月27日至29日在比利时纳穆尔举行,吸引了70名国际参与者,包含9场特邀讲座、10场口头报告和14个海报,主题涵盖稀疏范式的理论、应用与推广,包括稀疏数据传感、低维子空间联合、非线性逆问题等。

英文摘要

The implicit objective of the biennial "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday August 27th till Friday August 29th, 2014. The workshop was conveniently located in "The Arsenal" building within walking distance of both hotels and town center. iTWIST'14 has gathered about 70 international participants and has featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference.

1409.4857 2026-06-04 econ.GN math.DS math.ST q-fin.EC stat.TH

A simple dynamical model leading to Pareto wealth distribution and stability

一种简单的动力学模型导致帕累托财富分布和稳定性

Ricardo Pérez-Marco

AI总结 本文提出一个简单的财富演变动力学模型,其不变分布为帕累托类型,并在动态上稳定,符合帕累托的猜想。

详情
Comments
10 pages. Formulas corrected from version 1. Results unchanged
AI中文摘要

我们提出一个简单的财富演变动力学模型。其不变分布为帕累托类型,并且在动态上稳定,正如帕累托所推测的那样。

英文摘要

We propose a simple dynamical model of wealth evolution. The invariant distributions are of Pareto type and are dynamically stable as conjectured by Pareto.

1409.8327 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Bayesian and regularization approaches to multivariable linear system identification: the role of rank penalties

基于贝叶斯和正则化方法的多变量线性系统辨识:秩惩罚的作用

Giulia Prando, Alessandro Chiuso, Gianluigi Pillonetto

AI总结 本文提出一种基于ℓ2正则化和秩惩罚的脉冲响应估计器,用于处理多输入多输出系统中输入输出通道的耦合问题;超参数通过优化边际似然估计,估计完成后脉冲响应估计具有闭式解。

详情
Comments
to appear in IEEE Conference on Decision and Control, 2014
AI中文摘要

线性系统辨识的近期进展提出使用依赖正则化策略的非参数方法来处理偏差/方差权衡。本文引入一种脉冲响应估计器,采用ℓ2型正则化,其中包含一个秩惩罚项,该惩罚项由log-det启发式导出,作为秩函数的平滑近似。这使得可以兼顾所估计脉冲响应的不同性质(如平滑性和稳定性),同时惩罚高复杂度模型,还可以考虑并强制多输入多输出(MIMO)系统中不同输入输出通道之间的耦合。按照贝叶斯范式,决定两个正则化项相对权重以及秩惩罚结构的参数通过优化边际似然来估计。一旦这些超参数被估计出来,脉冲响应估计即可用闭式形式表示。实验表明,所提方法优于仅依赖经典ℓ2正则化的估计器以及基于原子范数和核范数的估计器。

英文摘要

Recent developments in linear system identification have proposed the use of non-parametric methods, relying on regularization strategies, to handle the so-called bias/variance trade-off. This paper introduces an impulse response estimator which relies on an $\ell_2$-type regularization including a rank-penalty derived using the log-det heuristic as a smooth approximation to the rank function. This allows one to account for different properties of the estimated impulse response (e.g. smoothness and stability) while also penalizing high-complexity models. This also allows one to account for and enforce coupling between different input-output channels in MIMO systems. According to the Bayesian paradigm, the parameters defining the relative weight of the two regularization terms as well as the structure of the rank penalty are estimated by optimizing the marginal likelihood. Once these hyperparameters have been estimated, the impulse response estimate is available in closed form. Experiments show that the proposed method is superior to the estimator relying on the "classic" $\ell_2$-regularization alone as well as those based on atomic and nuclear norms.

1409.8276 2026-06-04 cs.LG cs.NA math.NA stat.ML

A Bayesian Tensor Factorization Model via Variational Inference for Link Prediction

通过变分推断的贝叶斯张量分解模型用于链接预测

Beyza Ermis, A. Taylan Cemgil

AI总结 本文提出基于变分贝叶斯推断的张量分解模型,用于解决链接预测问题,相比最大似然方法在大规模数据集上表现更优。

详情
Comments
arXiv admin note: substantial text overlap with arXiv:1409.8083
AI中文摘要

概率性张量分解方法旨在通过设定低秩约束从不完整数据中提取有意义的结构。最近,变分贝叶斯(VB)推断技术已成功应用于大规模模型。本文提出了通过VB进行完整贝叶斯推断的单张量和耦合张量分解模型。我们的方法即使在非常大的模型上也能运行,并且易于实现。在多个现实世界数据集上,它在缺失链接预测问题上的预测性能优于基于最大似然的方法。

英文摘要

Probabilistic approaches for tensor factorization aim to extract meaningful structure from incomplete data by postulating low rank constraints. Recently, variational Bayesian (VB) inference techniques have successfully been applied to large scale models. This paper presents full Bayesian inference via VB on both single and coupled tensor factorization models. Our method can be run even for very large models and is easily implemented. It exhibits better prediction performance than existing approaches based on maximum likelihood on several real-world datasets for the missing link prediction problem.

1409.8083 2026-06-04 stat.CO cs.NA math.NA

Variational Inference For Probabilistic Latent Tensor Factorization with KL Divergence

变分推断用于具有KL散度的概率潜在张量分解

Beyza Ermis, Y. Kenan Yılmaz, A. Taylan Cemgil, Evrim Acar

AI总结 本文提出基于变分贝叶斯的完整贝叶斯推断方法,用于概率潜在张量分解框架,实现更强大的建模和复杂的推断,应用于模型阶选择和链接预测。

详情
AI中文摘要

概率潜在张量分解(PLTF)是一种最近提出的概率框架,用于建模多向数据。不仅常见的张量分解模型,任何任意的张量分解结构都可以通过PLTF框架实现。本文提出了基于变分贝叶斯的完整贝叶斯推断,以促进更强大的建模,并允许在PLTF框架上进行更复杂的推断。我们通过模型阶选择和链接预测来说明我们的方法。

英文摘要

Probabilistic Latent Tensor Factorization (PLTF) is a recently proposed probabilistic framework for modelling multi-way data. Not only the common tensor factorization models but also any arbitrary tensor factorization structure can be realized by the PLTF framework. This paper presents full Bayesian inference via variational Bayes that facilitates more powerful modelling and allows more sophisticated inference on the PLTF framework. We illustrate our approach on model order selection and link prediction.

1409.1620 2026-06-04 math.ST econ.GN q-fin.EC stat.AP stat.TH

Orthogonal Polynomials for Seminonparametric Instrumental Variables Model

正交多项式用于半非参数工具变量模型

Yevgeniy Kovchegov, Nese Yildiz

AI总结 本文提出解决半非参数工具变量模型中多项式基问题的方法,通过构造正交多项式基来估计结构函数,适用于连续和离散内生协变量的模型。

详情
Comments
18 pages
AI中文摘要

我们开发了一种方法,解决了一类具有离散内生协变量的模型中的多项式基问题,该方法也适用于Newey和Powell(2003)考虑的内生协变量连续的一类计量经济模型。假设X是d维的内生随机变量,Z₁和Z₂是工具变量(向量),Z=(Z₁;Z₂)。现在假设X给定Z的条件分布满足足以解决识别问题的条件,如Newey和Powell(2003)或本文中的命题1.1。即,对于像空间中的函数π(z),在定义域空间中几乎必然存在唯一的函数g(x,z₁),使得E[g(X,Z₁)|Z]=π(Z)关于Z几乎必然成立。在本文中,对于一类条件分布X|Z,我们构造一个正交多项式基Q_j(x,z₁),使得对于几乎每个Z₁=z₁、所有j∈Z_+^d以及某个μ(Z),有P_j(μ(Z))=E[Q_j(X,Z₁)|Z],其中P_j是次数为j的多项式。这就是我们所说的解决多项式基问题。假设已知X|Z并已得到π(z)的推断,我们的方法提供了一种估计所关注的结构函数g(x,z₁)的自然方式。我们的多项式基方法可自然推广到类Pearson和类Ord分布族。

英文摘要

We develop an approach that resolves a {\it polynomial basis problem} for a class of models with discrete endogenous covariate, and for a class of econometric models considered in the work of Newey and Powell (2003), where the endogenous covariate is continuous. Suppose $X$ is a $d$-dimensional endogenous random variable, $Z_1$ and $Z_2$ are the instrumental variables (vectors), and $Z=\left(\begin{array}{c}Z_1 \\Z_2\end{array}\right)$. Now, assume that the conditional distributions of $X$ given $Z$ satisfy the conditions sufficient for solving the identification problem as in Newey and Powell (2003) or as in Proposition 1.1 of the current paper. That is, for a function $π(z)$ in the image space there is a.s. a unique function $g(x,z_1)$ in the domain space such that $$E[g(X,Z_1)~|~Z]=π(Z) \qquad Z-a.s.$$ In this paper, for a class of conditional distributions $X|Z$, we produce an orthogonal polynomial basis $Q_j(x,z_1)$ such that for a.e. $Z_1=z_1$, and for all $j \in \mathbb{Z}_+^d$, and a certain $μ(Z)$, $$P_j(μ(Z))=E[Q_j(X, Z_1)~|~Z ],$$ where $P_j$ is a polynomial of degree $j$. This is what we call solving the {\it polynomial basis problem}. Assuming the knowledge of $X|Z$ and an inference of $π(z)$, our approach provides a natural way of estimating the structural function of interest $g(x,z_1)$. Our polynomial basis approach is naturally extended to Pearson-like and Ord-like families of distributions.

1409.1403 2026-06-04 stat.ML cs.NA math.NA

Nonlinear tensor product approximation of functions

非线性张量积函数的近似

D. Bazarkhanov, V. Temlyakov

AI总结 研究多变量函数的非线性张量积近似,探讨在L_p空间中混合光滑性假设下的最佳多线性近似方法及误差衰减率。

详情
AI中文摘要

我们关注通过线性组合的乘积u^1(x_1)...u^d(x_d)近似多变量函数f(x_1,...,x_d)的问题。在d=2时,这是经典的双线性近似问题。在L_2空间中,双线性近似问题与相应积分算子的奇异值分解(也称为Schmidt展开)密切相关。已知在不同光滑性假设下,双线性近似的误差衰减率结果。在d≥3时,多线性近似(非线性张量积近似)问题更为复杂且研究较少。本文将讨论在混合光滑性假设下,多线性近似在L_p空间中的最佳结果。

英文摘要

We are interested in approximation of a multivariate function $f(x_1,\dots,x_d)$ by linear combinations of products $u^1(x_1)\cdots u^d(x_d)$ of univariate functions $u^i(x_i)$, $i=1,\dots,d$. In the case $d=2$ it is a classical problem of bilinear approximation. In the case of approximation in the $L_2$ space the bilinear approximation problem is closely related to the problem of singular value decomposition (also called Schmidt expansion) of the corresponding integral operator with the kernel $f(x_1,x_2)$. There are known results on the rate of decay of errors of best bilinear approximation in $L_p$ under different smoothness assumptions on $f$. The problem of multilinear approximation (nonlinear tensor product approximation) in the case $d\ge 3$ is more difficult and much less studied than the bilinear approximation problem. We will present results on best multilinear approximation in $L_p$ under mixed smoothness assumption on $f$.
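
The link between bilinear approximation in $L_2$ and the Schmidt expansion mentioned in the abstract can be written out explicitly. The display below is a standard statement, given here as an illustration under the assumption (not stated in the abstract) that $f\in L_2([0,1]^2)$; the symbols $s_j$, $\phi_j$, $\psi_j$ and $M$ are notation introduced for this sketch. With singular numbers $s_1\ge s_2\ge\dots\ge 0$ of the integral operator with kernel $f$, and orthonormal systems $\{\phi_j\}$, $\{\psi_j\}$, \begin{equation*} f(x_1,x_2)=\sum_{j\ge1}s_j\,\phi_j(x_1)\,\psi_j(x_2), \qquad \inf_{u_j^1,\,u_j^2}\Bigl\|f-\sum_{j=1}^{M}u_j^1(x_1)\,u_j^2(x_2)\Bigr\|_{L_2}=\Bigl(\sum_{j>M}s_j^2\Bigr)^{1/2}, \end{equation*} so in $L_2$ the best $M$-term bilinear approximation is the truncated Schmidt expansion. No equally explicit formula is available for $d\ge3$, which is consistent with the abstract's remark that the multilinear case is harder.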

1408.6043 2026-06-04 math.NA cs.NA math.OC q-fin.CP stat.CO

A Framework of Conjugate Direction Methods for Symmetric Linear Systems in Optimization

优化中求解对称线性系统的共轭方向法框架

Giovanni Fasano

AI总结 本文提出一种参数依赖的 Krylov 类方法 CD,用于求解对称线性系统,通过生成共轭方向序列扩展标准共轭梯度法的性质,实现共轭性保持,并证明了 CD 算法的有限收敛性及误差分析,同时引入预处理以保持预处理 CG 的误差界。

详情
Comments
31 pages
AI中文摘要

本文介绍了一种参数依赖的基于 Krylov 的方法 CD,用于求解对称线性系统。我们证明在我们的提议中,我们生成共轭方向序列,扩展了标准共轭梯度 (CG) 方法的一些性质,以保持共轭性。对于我们框架中的特定参数值,我们得到与 CG 和缩放 CG 等价的方案。我们还证明了 CD 算法的有限收敛性,并提供了某些误差分析。最后,引入了 CD 的预处理,并证明标准预处理 CG 的误差界也适用于预处理 CD。

英文摘要

In this paper we introduce a parameter dependent class of Krylov-based methods, namely CD, for the solution of symmetric linear systems. We give evidence that in our proposal we generate sequences of conjugate directions, extending some properties of the standard Conjugate Gradient (CG) method, in order to preserve the conjugacy. For specific values of the parameters in our framework we obtain schemes equivalent to both the CG and the scaled-CG. We also prove the finite convergence of the algorithms in CD, and we provide some error analysis. Finally, preconditioning is introduced for CD, and we show that standard error bounds for the preconditioned CG also hold for the preconditioned CD.
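
For reference, the standard Conjugate Gradient (CG) method that the abstract extends can be sketched as follows. This is the textbook CG iteration for a symmetric positive-definite matrix, not the parameter-dependent CD class of the paper. The function name `conjugate_gradient` and the list-of-rows matrix format are assumptions made for this illustration.

```python
def conjugate_gradient(a, b, tol=1e-24):
    """Solve a x = b for a symmetric positive-definite matrix a (list of rows)."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)           # residual b - a x for the starting point x = 0
    p = list(r)           # first search direction
    rs_old = dot(r, r)
    for _ in range(n):    # at most n steps in exact arithmetic
        if rs_old <= tol:
            break
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)          # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction, conjugate to the previous ones with respect to a.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For this 2-by-2 system the exact solution is (1/11, 7/11), and CG reaches it in two steps up to rounding, which illustrates the finite-convergence property that the abstract proves for the CD class as well.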

1408.2054 2026-06-04 cs.LG cs.NA math.NA stat.ML

Non-Convex Rank Minimization via an Empirical Bayesian Approach

通过经验贝叶斯方法实现非凸秩最小化

David Wipf

AI总结 本文提出基于变分近似的经验贝叶斯方法,用于解决非凸秩最小化问题,该方法在保留全局最优估计的同时,通过边际化处理克服了传统凸松弛方法的局限性,尤其在鲁棒主成分分析中表现出色。

详情
Comments
Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)
AI中文摘要

在需要最小秩矩阵解的应用中,底层成本函数非凸,导致难以处理的NP难优化问题。因此,凸的核范数常被用作矩阵秩的替代惩罚项。然而,在许多实际场景中,除理论上的特殊情形外,无法再保证能正确估计所关注的生成性低秩矩阵。为此,本文提出了一种基于变分近似的经验贝叶斯方法,与核范数不同,该方法在许多有用约束下保留与秩函数相同的全局最小点估计。同时,局部极小解在很大程度上通过边缘化被平滑掉,使算法在标准凸松弛完全失败时仍能成功。虽然该方法适用于广泛的低秩应用,但本文聚焦于鲁棒主成分分析问题(RPCA),即估计带有未知稀疏损坏的未知低秩矩阵。理论和实证证据表明,本文方法可能优于相关的基于MAP的方法,其中凸主成分追踪(PCP)算法(Candes等,2011)可视为特例。

英文摘要

In many applications that require matrix solutions of minimal rank, the underlying cost function is non-convex leading to an intractable, NP-hard optimization problem. Consequently, the convex nuclear norm is frequently used as a surrogate penalty term for matrix rank. The problem is that in many practical scenarios there is no longer any guarantee that we can correctly estimate generative low-rank matrices of interest, theoretical special cases notwithstanding. Consequently, this paper proposes an alternative empirical Bayesian procedure built upon a variational approximation that, unlike the nuclear norm, retains the same globally minimizing point estimate as the rank function under many useful constraints. However, locally minimizing solutions are largely smoothed away via marginalization, allowing the algorithm to succeed when standard convex relaxations completely fail. While the proposed methodology is generally applicable to a wide range of low-rank applications, we focus our attention on the robust principal component analysis problem (RPCA), which involves estimating an unknown low-rank matrix with unknown sparse corruptions. Theoretical and empirical evidence are presented to show that our method is potentially superior to related MAP-based approaches, for which the convex principal component pursuit (PCP) algorithm (Candes et al., 2011) can be viewed as a special case.

1408.1742 2026-06-04 stat.CO cs.NA math.NA

Discrepancy Estimates for Acceptance-Rejection Samplers Using Stratified Inputs

基于分层输入的接受-拒绝采样器的偏差估计

Houying Zhu, Josef Dick

AI总结 本文提出以分层输入作为驱动序列的接受-拒绝采样器,估计生成点的偏差,给出了星偏差的上界及L_q-偏差q阶矩的上界,同时改进了确定性算法的收敛速率。

详情
AI中文摘要

在本文中,我们提出了一种以分层输入作为驱动序列的接受-拒绝采样器,并估计由此算法生成的点的偏差。首先,我们证明了星偏差的上界为$N^{-1/2-1/(2s)}$阶。进一步,我们证明了对于$2\le q\le \infty$,$L_q$-偏差的$q$阶矩$(\mathbb{E}[N^{q}L^{q}_{q,N}])^{1/q}$的上界为$N^{(1-1/s)(1-1/q)}$阶。我们还给出了使用$(t,m,s)$-网作为驱动序列的确定性接受-拒绝算法的改进收敛速率。

英文摘要

In this paper we propose an acceptance-rejection sampler using stratified inputs as driver sequence. We estimate the discrepancy of the points generated by this algorithm. First we show an upper bound on the star discrepancy of order $N^{-1/2-1/(2s)}$. Further we prove an upper bound on the $q$-th moment of the $L_q$-discrepancy $(\mathbb{E}[N^{q}L^{q}_{q,N}])^{1/q}$ for $2\le q\le \infty$, which is of order $N^{(1-1/s)(1-1/q)}$. We also present an improved convergence rate for a deterministic acceptance-rejection algorithm using $(t,m,s)$-nets as driver sequence.
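
A minimal sketch of an acceptance-rejection sampler driven by stratified inputs is given below. It assumes a one-dimensional unnormalized target density `psi` on [0, 1) bounded by `psi_max`, and uses one jittered point per cell of an m-by-m grid of the unit square as the driver sequence. The function name, the parameters and the jittered-grid construction are illustrative assumptions, not the exact construction analyzed in the paper.

```python
import random


def stratified_acceptance_rejection(psi, psi_max, m, seed=0):
    """Accept-reject using one uniformly jittered point in each cell of an m x m grid."""
    rng = random.Random(seed)
    accepted = []
    for i in range(m):
        for j in range(m):
            # One point drawn uniformly inside cell (i, j) of the unit square.
            x = (i + rng.random()) / m
            y = (j + rng.random()) / m
            # Keep x when the point lies under the scaled graph of psi.
            if psi_max * y <= psi(x):
                accepted.append(x)
    return accepted


# Target density proportional to 2x on [0, 1), bounded by psi_max = 2.
samples = stratified_acceptance_rejection(lambda x: 2.0 * x, 2.0, 32)
```

In this example only the 32 diagonal cells straddle the acceptance boundary y = x, so the number of accepted points always lies between 496 and 528 out of 1024. Plain i.i.d. inputs would give a count that fluctuates on the order of the square root of N, and this reduced variability is the intuition behind using stratified inputs.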

1408.0838 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

Estimating Maximally Probable Constrained Relations by Mathematical Programming

通过数学规划估计最大可能的约束关系

Lizhen Qu, Bjoern Andres

AI总结 本文提出了一种概率测度家族,用于联合抽象多标签分类、相关聚类和排序问题,通过数学规划方法解决半监督学习。

详情
Comments
16 pages
AI中文摘要

估计约束关系是机器学习中的基本问题。特殊情形包括分类、聚类和排序。本文贡献了一种在两个有限非空集合之间所有关系的概率测度家族,提供多标签分类、相关聚类和排序的联合抽象。给定相关和不相关配对的训练集,估计最大可能测度是一个凸优化问题。给定测度估计最大可能关系是一个01线性规划问题,对于映射可在线性时间内解决,而对于等价关系和线性顺序则是NP难问题。实验显示了所有三种情况的实用解决方案。最后,联合估计最大可能测度和关系被提出为混合整数非线性规划问题。此方法为半监督学习提供了数学规划的途径。

英文摘要

Estimating a constrained relation is a fundamental problem in machine learning. Special cases are classification (the problem of estimating a map from a set of to-be-classified elements to a set of labels), clustering (the problem of estimating an equivalence relation on a set) and ranking (the problem of estimating a linear order on a set). We contribute a family of probability measures on the set of all relations between two finite, non-empty sets, which offers a joint abstraction of multi-label classification, correlation clustering and ranking by linear ordering. Estimating (learning) a maximally probable measure, given (a training set of) related and unrelated pairs, is a convex optimization problem. Estimating (inferring) a maximally probable relation, given a measure, is a 01-linear program. It is solved in linear time for maps. It is NP-hard for equivalence relations and linear orders. Practical solutions for all three cases are shown in experiments with real data. Finally, estimating a maximally probable measure and relation jointly is posed as a mixed-integer nonlinear program. This formulation suggests a mathematical programming approach to semi-supervised learning.

1305.3312 2026-06-04 stat.ME cs.NA math.NA

Stable Estimation of a Covariance Matrix Guided by Nuclear Norm Penalties

由核范数惩罚引导的协方差矩阵稳定估计

Eric C. Chi, Kenneth Lange

AI总结 本文提出一种稳定估计协方差矩阵的方法,通过核范数惩罚引导经典样本协方差估计器,证明其一致性和渐近有效性,适用于判别分析和EM聚类等场景。

详情
Journal ref
Computational Statistics & Data Analysis 80:117-128, 2014
Comments
25 pages, 3 figures
AI中文摘要

协方差矩阵或其逆的估计在许多统计方法中起核心作用。为确保方法可靠,估计的矩阵不仅需要可逆,还应良好条件数。本文提出一种直观先验,将经典样本协方差估计器收缩到稳定目标。证明所提估计器一致且渐近有效。随着样本数相对于协变量数增长,估计器优雅地过渡到样本协方差矩阵。此外,本文在样本数被协变量数主导或相当时,展示了该估计器在判别分析和EM聚类中的实用性。

英文摘要

Estimation of covariance matrices or their inverses plays a central role in many statistical methods. For these methods to work reliably, estimated matrices must not only be invertible but also well-conditioned. In this paper we present an intuitive prior that shrinks the classic sample covariance estimator towards a stable target. We prove that our estimator is consistent and asymptotically efficient. Thus, it gracefully transitions towards the sample covariance matrix as the number of samples grows relative to the number of covariates. We also demonstrate the utility of our estimator in two standard situations -- discriminant analysis and EM clustering -- when the number of samples is dominated by or comparable to the number of covariates.
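
The general idea of shrinking a sample covariance matrix toward a stable target can be illustrated with plain linear shrinkage toward a scaled identity. This is a generic textbook device, used here only as an assumption-laden stand-in: it is not the nuclear-norm-guided estimator of the paper, and the function names and the fixed `weight` parameter are hypothetical.

```python
def sample_covariance(rows):
    """Unbiased sample covariance of observations given as a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def shrink_to_identity(cov, weight):
    """Convex combination of cov and (mean variance) * identity, 0 <= weight <= 1."""
    p = len(cov)
    target = sum(cov[k][k] for k in range(p)) / p
    return [[(1.0 - weight) * cov[a][b] + (weight * target if a == b else 0.0)
             for b in range(p)] for a in range(p)]


# Perfectly collinear data: the sample covariance is singular.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
s = sample_covariance(data)            # [[1, 2], [2, 4]], determinant 0
s_shrunk = shrink_to_identity(s, 0.5)  # [[1.75, 1], [1, 3.25]], determinant > 0
```

Even this crude version turns a singular estimate into an invertible and better-conditioned one, which is the property the abstract asks of a reliable covariance estimate.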

1408.0722 2026-06-04 math.NA cs.NA math.ST stat.TH

A Generalized ANOVA Dimensional Decomposition for Dependent Probability Measures

一种适用于依赖概率测度的广义ANOVA维度分解

Sharif Rahman

AI总结 本文提出一种广义ANOVA维度分解方法,通过弱化消去条件揭示组件函数具有零均值和层次正交性,利用此特性推导耦合方程,并提出基于测度一致多变量正交多项式的新方法,扩展了有效维度定义并验证了依赖性对全局敏感性指数的影响。

详情
Comments
27 pages, 2 figures, accepted SIAM/ASA Journal on Uncertainty Quantification, 2014
AI中文摘要

本文探讨了适用于依赖随机变量的广义方差分析或ANOVA维度分解(ADD)方法。两个源自弱化消去条件的性质揭示了广义ADD的组件函数具有零均值且层次正交。通过利用这些性质,本文提出了一种简单替代方法,推导出广义ADD组件函数所满足的耦合方程。这些方程包含经典ADD作为特殊情况。为确定广义ADD的组件函数,本文提出了一种新的构造方法,利用测度一致的多变量正交多项式作为基函数,并通过求解线性代数方程计算涉及的展开系数。新提出的广义公式包括依赖概率分布的二阶矩特征,包括三元组全局敏感性指数。此外,广义ADD导致了有效维度的扩展定义,这些定义在当前文献中已为经典ADD所报告。数值结果表明,随机变量的相关结构可以显著改变组件函数的组成,产生广泛变化的全局敏感性指数,从而产生随机变量的不同排序。对随机本征值分析的应用展示了所提近似方法的实用性。

英文摘要

This article explores the generalized analysis-of-variance or ANOVA dimensional decomposition (ADD) for multivariate functions of dependent random variables. Two notable properties, stemming from weakened annihilating conditions, reveal that the component functions of the generalized ADD have \emph{zero} means and are hierarchically orthogonal. By exploiting these properties, a simple, alternative approach is presented to derive a coupled system of equations that the generalized ADD component functions satisfy. The coupled equations, which subsume as a special case the classical ADD, reproduce the component functions for independent probability measures. To determine the component functions of the generalized ADD, a new constructive method is proposed by employing measure-consistent, multivariate orthogonal polynomials as bases and calculating the expansion coefficients involved from the solution of linear algebraic equations. New generalized formulae are presented for the second-moment characteristics, including triplets of global sensitivity indices, for dependent probability distributions. Furthermore, the generalized ADD leads to extended definitions of effective dimensions, reported in the current literature for the classical ADD. Numerical results demonstrate that the correlation structure of random variables can significantly alter the composition of component functions, producing widely varying global sensitivity indices and, therefore, distinct rankings of random variables. An application to random eigenvalue analysis demonstrates the usefulness of the proposed approximation.

1407.7738 2026-06-04 stat.ME econ.GN q-fin.CP q-fin.EC

Multivariate Self-Exciting Threshold Autoregressive Models with eXogenous Input

具有外生输入的多元自激发阈值自回归模型

Peter Martey Addo

AI总结 本文提出了一种具有外生输入的多元自激发阈值自回归模型,并给出了参数估计方法,探讨了模型的平稳性条件及参数估计算法的效率。

详情
Comments
This is a preliminary version of the paper-- please do not quote
AI中文摘要

本文定义了具有外生输入的多元自激发阈值自回归模型(MSETARX),并提出了参数估计程序。提供了非线性MSETARX模型的平稳性条件。通过模拟,探讨了该类模型中自适应参数估计算法和最小二乘估计(LSE)算法的效率。

英文摘要

This study defines a multivariate Self-Exciting Threshold Autoregressive model with eXogenous input (MSETARX) and presents an estimation procedure for its parameters. Conditions for stationarity of the nonlinear MSETARX models are provided. In particular, the efficiency of an adaptive parameter estimation algorithm and of the least squares estimate (LSE) algorithm for this class of models is examined via simulations.

1407.7299 2026-06-04 math.NA cs.LG cs.NA stat.ML

Algorithms, Initializations, and Convergence for the Nonnegative Matrix Factorization

非负矩阵分解的算法、初始化和收敛性

Amy N. Langville, Carl D. Meyer, Russell Albright, James Cox, David Duling

AI总结 本文研究了非负矩阵分解算法的初始化对收敛速度和精度的影响,提出两种新的交替最小二乘算法,并讨论了收敛准则的选择问题。

详情
AI中文摘要

非负矩阵分解算法的初始化对收敛速度和精度有显著影响。许多非负矩阵分解算法对W或H的初始化敏感,尤其是交替最小二乘算法。本文提出了两种新的交替最小二乘算法,并比较了六种初始化方法(两种标准和四种新方法)的结果。最后,讨论了选择合适收敛准则的实践问题。

英文摘要

It is well known that good initializations can improve the speed and accuracy of the solutions of many nonnegative matrix factorization (NMF) algorithms. Many NMF algorithms are sensitive with respect to the initialization of W or H or both. This is especially true of algorithms of the alternating least squares (ALS) type, including the two new ALS algorithms that we present in this paper. We compare the results of six initialization procedures (two standard and four new) on our ALS algorithms. Lastly, we discuss the practical issue of choosing an appropriate convergence criterion.

1407.0286 2026-06-04 math.NA cs.LG cs.NA stat.ML

DC approximation approaches for sparse optimization

基于DC近似方法的稀疏优化

Hoai An Le Thi, Tao Pham Dinh, Hoai Minh Le, Xuan Thanh Vo

AI总结 本文从DC框架出发,研究了稀疏优化的非凸近似方法,分析了近似问题与原问题的解的一致性,并开发了四种DCA方案,用于解决零范数和稀疏优化问题。

详情
Comments
35 pages
AI中文摘要

稀疏优化是指目标或约束中包含零范数的优化问题。本文从DC(差分凸函数)编程框架出发,研究了稀疏优化的非凸近似方法。考虑了包含所有标准稀疏诱导惩罚函数的零范数的常见DC近似,研究了近似问题与原问题的全局最小值(或局部最小值)的一致性。证明了在某些情况下,近似问题的某些全局最小值(或局部最小值)也是原问题的。利用DC编程中的精确惩罚技术,证明了某些特定近似方法在合适参数下与原问题等价。对几种稀疏诱导惩罚函数的效率进行了全面分析。开发了四种DCA(DC算法)方案,涵盖了非凸稀疏近似方法中的所有标准算法作为特殊版本。这些算法可以视为ℓ₁扰动算法/加权ℓ₁算法。本文提供了一种统一的非凸近似方法,结合了坚实的理论工具和基于DC编程和DCA的高效算法,以解决零范数和稀疏优化问题。作为应用,我们实现了我们的方法用于SVM(支持向量机)问题的特征选择,并在各种近似函数上进行了实证比较数值实验。

英文摘要

Sparse optimization refers to an optimization problem involving the zero-norm in the objective or constraints. In this paper, nonconvex approximation approaches for sparse optimization are studied from a unifying point of view in the DC (Difference of Convex functions) programming framework. Considering a common DC approximation of the zero-norm that includes all standard sparsity-inducing penalty functions, we study the consistency between global minima (resp. local minima) of the approximate and original problems. We show that, in several cases, some global minimizers (resp. local minimizers) of the approximate problem are also those of the original problem. Using exact penalty techniques in DC programming, we prove stronger results for some particular approximations, namely, the approximate problem, with suitable parameters, is equivalent to the original problem. The efficiency of several sparsity-inducing penalty functions has been fully analyzed. Four DCA (DC Algorithm) schemes are developed that cover all standard algorithms in nonconvex sparse approximation approaches as special versions. They can be viewed as an $\ell_{1}$-perturbed algorithm / reweighted-$\ell_{1}$ algorithm / reweighted-$\ell_{1}$ algorithm. We offer a unifying nonconvex approximation approach, with solid theoretical tools as well as efficient algorithms based on DC programming and DCA, to tackle the zero-norm and sparse optimization. As an application, we implemented our methods for feature selection in the SVM (Support Vector Machine) problem and performed empirical comparative numerical experiments on the proposed algorithms with various approximation functions.

1407.5820 2026-06-04 eess.SY cs.SY math.OC stat.ML

Approximate Regularization Path for Nuclear Norm Based H2 Model Reduction

核范数基于H2模型降阶的近似正则化路径

Niclas Blomberg, Cristian R. Rojas, Bo Wahlberg

AI总结 本文研究了利用Hankel矩阵核范数进行动态系统降阶,通过正则化路径方法平衡模型拟合与复杂度,提出基于对偶间隙上界确定何时精确计算最优解的策略,高效计算正则化路径。

详情
英文摘要

This paper concerns model reduction of dynamical systems using the nuclear norm of the Hankel matrix to make a trade-off between model fit and model complexity. This results in a convex optimization problem where this trade-off is determined by one crucial design parameter. The main contribution is a methodology to approximately calculate all solutions up to a certain tolerance to the model reduction problem as a function of the design parameter. This is called the regularization path in sparse estimation and is a very important tool in order to find the appropriate balance between fit and complexity. We extend this to the more complicated nuclear norm case. The key idea is to determine when to exactly calculate the optimal solution using an upper bound based on the so-called duality gap. Hence, by solving a fixed number of optimization problems the whole regularization path up to a given tolerance can be efficiently computed. We illustrate this approach on some numerical examples.

1407.5807 2026-06-04 cs.MA cs.SY eess.SY stat.ML

Multi-agents adaptive estimation and coverage control using Gaussian regression

多智能体自适应估计与覆盖控制使用高斯回归

Andrea Carron, Marco Todescato, Ruggero Carli, Luca Schenato, Gianluigi Pillonetto

AI总结 本文研究多智能体在未知传感函数下实现最优覆盖控制的问题,通过高斯回归框架实现自适应估计与覆盖的平衡,实验验证了方法的有效性。

详情
AI中文摘要

我们考虑一组智能体旨在根据传感函数进行最优区域覆盖的场景。特别是需要计算质心Voronoi划分。该任务的困难在于传感函数未知,必须在线从噪声测量中重建。因此,估计和覆盖需要同时进行。我们将问题置于贝叶斯回归框架中,将传感函数视为高斯随机场。然后,我们设计了一组控制输入,试图在覆盖和估计之间取得良好平衡,并讨论算法的收敛性。数值实验展示了新方法的有效性。

英文摘要

We consider a scenario where the aim of a group of agents is to perform the optimal coverage of a region according to a sensory function. In particular, centroidal Voronoi partitions have to be computed. The difficulty of the task is that the sensory function is unknown and has to be reconstructed online from noisy measurements. Hence, estimation and coverage need to be performed at the same time. We cast the problem in a Bayesian regression framework, where the sensory function is seen as a Gaussian random field. Then, we design a set of control inputs that try to balance coverage and estimation well, and we discuss convergence properties of the algorithm. Numerical experiments show the effectiveness of the new approach.

1310.8185 2026-06-04 stat.AP cs.SY eess.SY math.DS physics.soc-ph

Dynamics of popstar record sales on phonographic market -- stochastic model

流行音乐唱片销售动态——随机模型

Andrzej Jarynowski, Andrzej Buda

AI总结 本文研究了全球最流行30位艺术家的周唱片销量,揭示了其非平凡的记忆特性,并通过MRGJD和MRS模型解释了销量波动规律及集体行为现象。

详情
Journal ref
Acta Physica Polonica B (PS) No 2, Vol. 7 2014
Comments
Summer Solstice 2013 International Conference on Discrete Models of Complex Systems, Warsaw, Poland
英文摘要

We investigate weekly record sales of the world's 30 most popular artists (2003-2013). The time series of sales have a non-trivial kind of memory (anticorrelations, strong seasonality and constant autocorrelation decay within 120 weeks). An artist's record sales are usually highest in the first week after the premiere of a brand new record and then decrease to fluctuate around zero until the next album release. We model such behavior by a discrete mean-reverting geometric jump diffusion (MRGJD) and a Markov regime switching mechanism (MRS) between the base and the promotion regimes. Through such a toy model we can build up evidence that quantifies linear and nonlinear dynamical components (with stationary and nonstationary parameter sets) and measure the local divergence of the system with collective behavior phenomena. We find a special kind of disagreement between model and data at Christmas time due to unusual shopping behavior. Analogies to earthquakes, product life-cycles, and energy markets are also discussed.

1407.2676 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML

A New Optimal Stepsize For Approximate Dynamic Programming

近似动态规划的一种新最优步长

Ilya O. Ryzhov, Peter I. Frazier, Warren B. Powell

AI总结 本文提出一种新的最优步长规则,通过优化预测误差提升近似动态规划算法的短期性能,仅需一个敏感度较低的可调参数,适应问题噪声水平,加快数值实验中的收敛速度。

详情
Comments
Matlab files are included with the paper source
AI中文摘要

近似动态规划(ADP)已在大规模交通运输问题、医疗保健、收益管理以及能源系统等广泛领域中得到了应用。设计有效的ADP算法有许多维度,但一个关键因素是用于更新价值函数近似值的步长规则。许多运筹学应用计算上都很耗费资源,因此快速获得良好结果非常重要。此外,最流行的步长公式使用可调参数,如果调节不当,可能会产生非常差的结果。我们推导出一种新的步长规则,以优化预测误差,从而提高ADP算法的短期性能。仅需一个相对不敏感的可调参数,新的规则能够适应问题中的噪声水平,并在数值实验中产生更快的收敛速度。

英文摘要

Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. With only one, relatively insensitive tunable parameter, the new rule adapts to the level of noise in the problem and produces faster convergence in numerical experiments.

1404.6635 2026-06-04 math.OC cs.SY eess.SY stat.CO

Greedy Block Coordinate Descent (GBCD) Method for High Dimensional Quadratic Programs

用于高维二次规划的贪心块坐标下降(GBCD)方法

Gugan Thoppe, Vivek S. Borkar, Dinesh Garg

AI总结 本文提出一种贪心块坐标下降方法,用于解决高维二次规划问题,通过适应可用计算资源,提升求解效率,实验表明其在高维问题中表现优异。

详情
Comments
29 pages, 3 figures, New references added
AI中文摘要

高维无约束二次规划(UQPs)在诸如网络、社交网络等领域中变得常见。除非有与这些数据集相匹配的计算资源,否则使用经典UQP方法求解此类问题非常困难。本文讨论了替代方法。我们首先定义了高维兼容(HDC)方法,即能够通过适应可用计算资源来解决高维UQPs的方法。然后我们展示,块Kaczmarz和块坐标下降(BCD)是唯一可以成为HDC的现有方法。作为BCD方法中最佳的可能答案,我们提出了一种新的贪心BCD(GBCD)方法,包括串行、并行和分布式变种。收敛率和数值测试证实,GBCD确实是解决高维UQPs的有效方法。事实上,它有时甚至优于共轭梯度法。

英文摘要

High dimensional unconstrained quadratic programs (UQPs) involving massive datasets are now common in application areas such as web, social networks, etc. Unless computational resources that match up to these datasets are available, solving such problems using classical UQP methods is very difficult. This paper discusses alternatives. We first define high dimensional compliant (HDC) methods for UQPs---methods that can solve high dimensional UQPs by adapting to available computational resources. We then show that the class of block Kaczmarz and block coordinate descent (BCD) are the only existing methods that can be made HDC. As a possible answer to the question of the `best' amongst BCD methods for UQP, we propose a novel greedy BCD (GBCD) method with serial, parallel and distributed variants. Convergence rates and numerical tests confirm that the GBCD is indeed an effective method to solve high dimensional UQPs. In fact, it sometimes beats even the conjugate gradient.
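The block-coordinate idea in the abstract above can be illustrated with a minimal sketch. It assumes the UQP has the form min 0.5 x^T P x - q^T x with P symmetric positive definite, and it uses a generic greedy rule (pick the block with the largest gradient norm, then minimize exactly over that block). The function name `greedy_bcd`, the block partition, and the selection rule are illustrative assumptions, not necessarily the rule the authors propose.

```python
import numpy as np

def greedy_bcd(P, q, block_size, iters=2000, tol=1e-10):
    """Greedy block coordinate descent for min 0.5 x^T P x - q^T x (P SPD).

    Assumption: the greedy rule picks the block whose gradient has the
    largest Euclidean norm. The paper's own rule may differ.
    """
    n = q.size
    blocks = [np.arange(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    x = np.zeros(n)
    for _ in range(iters):
        # The full gradient is recomputed here for clarity only; a
        # high-dimensional implementation would update it incrementally.
        grad = P @ x - q
        norms = [np.linalg.norm(grad[b]) for b in blocks]
        j = int(np.argmax(norms))
        if norms[j] < tol:
            break
        b = blocks[j]
        # Exact minimization over the chosen block: solve P_bb * step = grad_b.
        x[b] -= np.linalg.solve(P[np.ix_(b, b)], grad[b])
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((12, 12))
P = M @ M.T + 12.0 * np.eye(12)   # symmetric positive definite test matrix
q = rng.standard_normal(12)
x_hat = greedy_bcd(P, q, block_size=3)
x_ref = np.linalg.solve(P, q)     # direct solve, for comparison only
```

Only one small `block_size` by `block_size` system is solved per iteration, which is what lets a method of this kind adapt to limited memory.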

1407.0449 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Classification-based Approximate Policy Iteration: Experiments and Extended Discussions

基于分类的近似策略迭代:实验与扩展讨论

Amir-massoud Farahmand, Doina Precup, André M. S. Barreto, Mohammad Ghavamzadeh

AI总结 本文提出基于分类的近似策略迭代框架,通过价值函数和策略空间的规律性来提升算法性能,并在HIV控制等任务中验证了其有效性。

详情
英文摘要

Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities, or intrinsic structure, of the problem in hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework, which encompasses a large class of algorithms that can exploit regularities of both the value function and the policy space, depending on what is advantageous. This framework has two main components: a generic value function estimator and a classifier that learns a policy based on the estimated value function. We establish theoretical guarantees for the sample complexity of CAPI-style algorithms, which allow the policy evaluation step to be performed by a wide variety of algorithms (including temporal-difference-style methods), and can handle nonparametric representations of policies. Our bounds on the estimation error of the performance loss are tighter than existing results. We also illustrate this approach empirically on several problems, including a large HIV control task.

1407.0013 2026-06-04 math.NA cs.LG cs.NA math.ST stat.TH

Relevance Singular Vector Machine for low-rank matrix sensing

相关性奇异向量机用于低秩矩阵感知

Martin Sundin, Saikat Chatterjee, Magnus Jansson, Cristian R. Rojas

AI总结 本文提出了一种新的贝叶斯推断方法,用于低秩矩阵重建,即相关性奇异向量机(RSVM),通过在基础矩阵的奇异向量上定义合适先验来促进低秩性,并通过数值高效近似加速计算。

详情
Comments
International Conference on Signal Processing and Communications (SPCOM), 5 pages
AI中文摘要

本文提出了一种新的贝叶斯推断方法用于低秩矩阵重建。我们称之为相关性奇异向量机(RSVM),其中在基础矩阵的奇异向量上定义了适当的先验以促进低秩性。为了加速计算,开发了一种数值高效的近似方法。所提出算法应用于矩阵补全和矩阵重建问题,并通过数值方法研究了其性能。

英文摘要

In this paper we develop a new Bayesian inference method for low rank matrix reconstruction. We call the new method the Relevance Singular Vector Machine (RSVM) where appropriate priors are defined on the singular vectors of the underlying matrix to promote low rank. To accelerate computations, a numerically efficient approximation is developed. The proposed algorithms are applied to matrix completion and matrix reconstruction problems and their performance is studied numerically.

1310.7529 2026-06-04 stat.ML cs.LG cs.NA math.NA math.OC

Successive Nonnegative Projection Algorithm for Robust Nonnegative Blind Source Separation

递进非负投影算法用于鲁棒非负盲源分离

Nicolas Gillis

AI总结 本文提出一种快速且鲁棒的递归算法,用于近可分离的非负矩阵分解问题。该算法称为递进非负投影算法(SNPA),利用非负约束提升鲁棒性,适用于更广泛的非负矩阵。

详情
Journal ref
SIAM J. on Imaging Sciences 7 (2), pp. 1420-1450, 2014
Comments
31 pages, 7 figures, 4 tables. Main changes: new numerical experiments on column-rank-deficient matrices, typos corrected, discussion on the comparison with XRAY
AI中文摘要

本文提出了一种新的快速且鲁棒的递归算法,用于近可分离的非负矩阵分解,这是非负盲源分离问题的一种特定情况。该算法称为递进非负投影算法(SNPA),与流行的递进投影算法(SPA)密切相关,但利用分解中的非负约束。我们证明SNPA比SPA更鲁棒,可应用于更广泛的非负矩阵。这在一些合成数据集和真实世界超光谱图像上得到了验证。

英文摘要

In this paper, we propose a new fast and robust recursive algorithm for near-separable nonnegative matrix factorization, a particular nonnegative blind source separation problem. This algorithm, which we refer to as the successive nonnegative projection algorithm (SNPA), is closely related to the popular successive projection algorithm (SPA), but takes advantage of the nonnegativity constraint in the decomposition. We prove that SNPA is more robust than SPA and can be applied to a broader class of nonnegative matrices. This is illustrated on some synthetic data sets, and on a real-world hyperspectral image.

1401.5375 2026-06-04 math.NA cs.NA stat.AP

Iterative regularization for ensemble data assimilation in reservoir models

迭代正则化在油藏模型中的集合数据同化

Marco A. Iglesias

AI总结 本文提出利用迭代正则化方法开发集合方法解决贝叶斯反问题,设计了IR-enLM和IR-ES方法,通过正则化Levenberg-Marquardt方案提升后验估计的鲁棒性。

详情
AI中文摘要

我们提出将迭代正则化应用于开发用于求解贝叶斯反问题的集合方法。具体而言,我们构建了(i)变分迭代正则化集合Levenberg-Marquardt方法(IR-enLM)和(ii)无导数迭代集合卡尔曼平滑器(IR-ES)。这些方法的目标是提供贝叶斯后验的稳健集合近似。所提出的方法基于迭代正则化方法的基本思想,这些方法已被广泛用于确定性反问题的求解[21]。在本工作中,我们关注所提出集合方法在油藏建模应用中出现的贝叶斯反问题的求解。所提出的集合方法利用了Hanke[16]开发的正则化Levenberg-Marquardt方案的关键方面,我们最近在[18]中将其应用于历史匹配。在正向算子线性且先验高斯的情况下,我们证明所提出的IR-enLM和IR-ES与标准随机最大似然(RML)和集合平滑器(ES)一致。对于一般非线性情况,我们开发了数值框架来评估所提出集合方法在捕捉后验方面的性能。该框架包括使用先进的MCMC方法从合成实验中解析贝叶斯后验。解析后的后验通过MCMC提供了一个金标准,用于比较所提出的IR-enLM和IR-ES。我们证明,通过精心选择正则化参数,可以在均值和方差方面获得稳健的后验近似。数值实验展示了使用迭代正则化在获得更稳健和稳定的后验近似方面优于标准无正则化方法的优势。

英文摘要

We propose the application of iterative regularization for the development of ensemble methods for solving Bayesian inverse problems. Concretely, we construct (i) a variational iterative regularizing ensemble Levenberg-Marquardt method (IR-enLM) and (ii) a derivative-free iterative ensemble Kalman smoother (IR-ES). The aim of these methods is to provide a robust ensemble approximation of the Bayesian posterior. The proposed methods are based on fundamental ideas from iterative regularization methods that have been widely used for the solution of deterministic inverse problems [21]. In this work we are interested in the application of the proposed ensemble methods to the solution of Bayesian inverse problems that arise in reservoir modeling applications. The proposed ensemble methods use key aspects of the regularizing Levenberg-Marquardt scheme developed by Hanke [16], which we recently applied to history matching in [18]. In the case where the forward operator is linear and the prior is Gaussian, we show that the proposed IR-enLM and IR-ES coincide with standard randomized maximum likelihood (RML) and the ensemble smoother (ES), respectively. For the general nonlinear case, we develop a numerical framework to assess the performance of the proposed ensemble methods at capturing the posterior. This framework consists of using a state-of-the-art MCMC method for resolving the Bayesian posterior from synthetic experiments. The posterior resolved via MCMC then provides a gold standard against which to compare the proposed IR-enLM and IR-ES. We show that with a careful selection of regularization parameters, robust approximations of the posterior can be accomplished in terms of mean and variance. Our numerical experiments showcase the advantage of using iterative regularization for obtaining more robust and stable approximations of the posterior than standard unregularized methods.

1406.4549 2026-06-04 stat.ME cs.CC cs.NA math.NA

Extensible grids: uniform sampling on a space-filling curve

可扩展网格:在空间填充曲线上的均匀采样

Zhijian He, Art B. Owen

AI总结 研究空间填充曲线生成的点在[0,1]^d中的性质,发现确定性采样具有O(n^{-1/d})的偏差,随机分层采样和scrambled van der Corput点在积分Lipschitz连续函数时有O(n^{-1-2/d})的均方误差,且优于IID采样。

详情
Comments
22 pages, 6 figures
AI中文摘要

我们研究了在[0,1]^d中由Hilbert空间填充曲线生成的点的性质。对于确定性采样,我们得到d≥2时偏差为O(n^{-1/d})。对于随机分层采样和scrambled van der Corput点,当d≥3时,积分Lipschitz连续积分函数的均方误差为O(n^{-1-2/d})。这些速率与在d维网格上的采样相同,但随着d的增加而恶化。对于Lipschitz函数,该速率在该平滑度下是最佳的,优于普通IID采样。不同于网格,空间填充曲线采样可提供任意所需的样本量,且van der Corput版本可扩展n。此外,我们证明某些具有无限变分(按Hardy和Krause意义)的不连续函数可被积分,其均方误差为O(n^{-1-1/d})。此前仅知该速率为o(n^{-1})。其他空间填充曲线,如Sierpinski和Peano曲线,也达到这些速率,而Lebesgue曲线的上界稍差,如同维度为log₂(3)倍。

英文摘要

We study the properties of points in $[0,1]^d$ generated by applying Hilbert's space-filling curve to uniformly distributed points in $[0,1]$. For deterministic sampling we obtain a discrepancy of $O(n^{-1/d})$ for $d\ge2$. For random stratified sampling, and scrambled van der Corput points, we get a mean squared error of $O(n^{-1-2/d})$ for integration of Lipschitz continuous integrands, when $d\ge3$. These rates are the same as one gets by sampling on $d$ dimensional grids and they show a deterioration with increasing $d$. The rate for Lipschitz functions is however best possible at that level of smoothness and is better than plain IID sampling. Unlike grids, space-filling curve sampling provides points at any desired sample size, and the van der Corput version is extensible in $n$. Additionally we show that certain discontinuous functions with infinite variation in the sense of Hardy and Krause can be integrated with a mean squared error of $O(n^{-1-1/d})$. It was previously known only that the rate was $o(n^{-1})$. Other space-filling curves, such as those due to Sierpinski and Peano, also attain these rates, while upper bounds for the Lebesgue curve are somewhat worse, as if the dimension were $\log_2(3)$ times as high.
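The construction described above, one-dimensional points pushed through a space-filling curve, can be sketched for $d=2$. This is a discretized illustration under stated assumptions: the continuous Hilbert curve is replaced by the standard index-to-cell map on a `side` by `side` grid, each point is placed at a cell center, and the van der Corput points are unscrambled. The helper names are hypothetical and the code does not reproduce the paper's analysis.

```python
def van_der_corput(i, base=2):
    """Radical inverse of the non-negative integer i in the given base."""
    q, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        i, r = divmod(i, base)
        q += r / denom
    return q

def hilbert_d2xy(side, d):
    """Map index d in [0, side*side) to a cell (x, y) on a discrete Hilbert
    curve over a side-by-side grid; side must be a power of two."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant so consecutive cells stay adjacent.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_points(n_points, side):
    """Push the first n_points van der Corput values through the discrete
    Hilbert map and return cell centers in [0, 1]^2."""
    pts = []
    for i in range(n_points):
        u = van_der_corput(i)
        d = int(u * side * side)        # position along the curve
        x, y = hilbert_d2xy(side, d)
        pts.append(((x + 0.5) / side, (y + 0.5) / side))
    return pts

points = hilbert_points(16, side=4)
```

With `side=4`, the first 16 base-2 van der Corput values hit each of the 16 curve positions exactly once, so every grid cell receives one point. Asking for more points than cells reuses cells in this discretized version, so `side` has to grow with the sample size.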

1404.1530 2026-06-04 cs.DS cs.IT cs.NA math.IT math.NA math.ST stat.ML stat.TH

Provable Deterministic Leverage Score Sampling

可证明的确定性杠杆评分采样

Dimitris Papailiopoulos, Anastasios Kyrillidis, Christos Boutsidis

AI总结 本文研究确定性杠杆评分采样在矩阵近似中的有效性,证明其在幂律衰减条件下与随机采样等效,并通过实验证明其性能优于现有方法。

详情
Comments
20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
AI中文摘要

我们从理论上解释了一个有趣的实证现象:"通过确定性地选择具有相应最大杠杆评分的矩阵列子集来近似矩阵,可以得到一个良好的低秩矩阵替代品"。为获得可证明的保证,先前的工作要求以杠杆评分比例随机采样列。在本文中,我们提供了确定性杠杆评分采样的新理论分析。我们证明,如果杠杆评分遵循适度陡峭的幂律衰减,这种确定性采样可以证明与随机采样同样准确。我们通过提供实证证据支持这种幂律假设,即这种衰减定律在现实世界数据集中普遍存在。然后,我们通过实验证明确定性杠杆评分采样的性能,其在许多情况下与或优于现有最先进技术。

英文摘要

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". To obtain provable guarantees, previous work requires randomized sampling of the columns with probabilities proportional to their leverage scores. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such deterministic sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption by providing empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which many times matches or outperforms the state-of-the-art techniques.
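The selection rule quoted in the abstract above, keeping the columns with the largest leverage scores, can be sketched as follows. The sketch assumes the rank-$k$ leverage score of column $j$ is the squared norm of the $j$-th column of $V_k^T$ (the top $k$ right singular vectors). The function name and the choice of `k` and `c` are illustrative assumptions; the paper's rule for choosing the number of columns is not reproduced.

```python
import numpy as np

def top_leverage_columns(A, k, c):
    """Return indices of the c columns of A with the largest rank-k
    leverage scores, together with all the scores."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k, :] ** 2, axis=0)   # one score per column of A
    idx = np.argsort(scores)[::-1][:c]        # deterministic: largest first
    return np.sort(idx), scores

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 50))  # exactly rank 3
cols, scores = top_leverage_columns(A, k=3, c=6)

# Project A onto the span of the selected columns to judge the surrogate.
C = A[:, cols]
A_proj = C @ np.linalg.lstsq(C, A, rcond=None)[0]
```

Because the rows of `Vt[:k]` are orthonormal, the scores always sum to `k`, which gives a quick sanity check on the computation.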

1406.0214 2026-06-04 eess.SY cs.SY math.AT stat.ML

Topological and Statistical Behavior Classifiers for Tracking Applications

拓扑与统计行为分类器用于跟踪应用

Paul Bendich, Sang Chin, Jesse Clarke, Jonathan deSena, John Harer, Elizabeth Munch, Andrew Newman, David Porter, David Rouse, Nate Strawn, Adam Watkins

AI总结 本文提出基于多假设跟踪、拓扑数据分析和机器学习的统一理论,通过拓扑特征编码行为信息,利用统计模型拟合拓扑特征分布,并结合目标类型分类方法提升跟踪性能。

详情
AI中文摘要

我们介绍了一种基于多假设跟踪、拓扑数据分析和机器学习的统一理论,用于目标跟踪。我们的创新包括:1)利用鲁棒的拓扑特征编码行为信息;2)对这些拓扑特征的分布拟合统计模型;3)采用Wigren和Bar Shalom等人的目标类型分类方法,利用所得的拓扑特征似然值提升跟踪过程。为证明我们方法的有效性,我们在由Simulation of Urban Mobility包生成的合成车辆数据上进行了测试。

英文摘要

We introduce the first unified theory for target tracking using Multiple Hypothesis Tracking, Topological Data Analysis, and machine learning. Our string of innovations is as follows: 1) robust topological features are used to encode behavioral information, 2) statistical models are fitted to distributions over these topological features, and 3) the target type classification methods of Wigren and Bar Shalom et al. are employed to exploit the resulting likelihoods for topological features inside the tracking procedure. To demonstrate the efficacy of our approach, we test our procedure on synthetic vehicular data generated by the Simulation of Urban Mobility package.

1310.5035 2026-06-04 math.NA cs.LG cs.NA math.OC stat.ML

Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning

线性化交替方向法与并行分裂及自适应惩罚用于机器学习中的可分离凸程序

Zhouchen Lin, Risheng Liu, Huan Li

AI总结 本文提出LADMPSAP方法,用于高效求解多块可分离凸程序,通过并行分裂和自适应惩罚改进传统ADM方法,实现更强的收敛性和更快的收敛速度,适用于稀疏表示和低秩恢复问题。

详情
Comments
Preliminary version published on Asian Conference on Machine Learning 2013
AI中文摘要

许多机器学习和其他领域的问题可以重新公式化为具有线性约束的可分离凸程序。在大多数情况下,存在多个变量块。然而,传统的交替方向法(ADM)及其线性化版本(LADM,通过线性化二次惩罚项获得)仅适用于两块情况,无法简单推广到多块情况。因此,扩展基于ADM的方法以处理多块情况有巨大需求。本文提出LADMPSAP以高效求解多块可分离凸程序。当所有组件目标函数具有有界子梯度时,我们获得了比ADM和LADM更强的收敛结果,例如允许惩罚参数无界,并证明了全局收敛的充分必要条件。我们进一步提出一个简单的最优性度量,并揭示了LADMPSAP在测度意义上的收敛速度。对于具有额外凸集约束的程序,通过精细的参数估计,我们设计了一个实用的LADMPSAP变种以加快收敛速度。最后,我们通过线性化部分目标函数来推广LADMPSAP,以处理更困难的目标函数程序。LADMPSAP特别适用于稀疏表示和低秩恢复问题,因为其子问题有闭合形式解,迭代过程中的稀疏性和低秩性可以得到保持。它也高度并行化,因此适合并行或分布式计算。数值实验验证了LADMPSAP在速度和数值精度方面的优势。

英文摘要

Many problems in machine learning and other fields can be (re)formulated as linearly constrained separable convex programs. In most cases, there are multiple blocks of variables. However, the traditional alternating direction method (ADM) and its linearized version (LADM, obtained by linearizing the quadratic penalty term) are for the two-block case and cannot be naively generalized to solve the multi-block case. So there is great demand for extending ADM-based methods to the multi-block case. In this paper, we propose LADM with parallel splitting and adaptive penalty (LADMPSAP) to solve multi-block separable convex programs efficiently. When all the component objective functions have bounded subgradients, we obtain convergence results that are stronger than those of ADM and LADM, e.g., allowing the penalty parameter to be unbounded and proving the sufficient and necessary conditions for global convergence. We further propose a simple optimality measure and reveal the convergence rate of LADMPSAP in an ergodic sense. For programs with extra convex set constraints, with refined parameter estimation we devise a practical version of LADMPSAP for faster convergence. Finally, we generalize LADMPSAP to handle programs with more difficult objective functions by linearizing part of the objective function as well. LADMPSAP is particularly suitable for sparse representation and low-rank recovery problems because its subproblems have closed-form solutions and the sparsity and low-rankness of the iterates can be preserved during the iteration. It is also highly parallelizable and hence well suited to parallel or distributed computing. Numerical experiments testify to the advantages of LADMPSAP in speed and numerical accuracy.

1308.4084 2026-06-04 stat.CO cs.NA math.NA math.OC stat.ME

A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized $\ell_0$-sparsification

带正则化ℓ0-稀疏化的无限维贝叶斯线性逆问题的A-最优实验设计

Alen Alexanderian, Noemi Petra, Georg Stadler, Omar Ghattas

AI总结 本文提出了一种高效方法,用于计算由偏微分方程支配的无限维贝叶斯线性逆问题的A-最优实验设计,通过正则化ℓ0-稀疏化技术减少参数估计的不确定性。

详情
Comments
27 pages, accepted for publication in SIAM Journal on Scientific Computing
AI中文摘要

我们提出了一种高效方法,用于计算由偏微分方程(PDEs)支配的无限维贝叶斯线性逆问题的A-最优实验设计。具体来说,我们解决了一个问题,即优化传感器的位置(用于收集观测数据),以最小化通过求解逆问题估计的参数的不确定性,其中不确定性由后验协方差的迹表示。计算最优实验设计(OEDs)对于由计算昂贵的PDE模型支配的逆问题,尤其是具有无限维(或离散化后的高维)参数的问题,尤其具有挑战性。为了减轻计算成本,我们利用问题结构,构建了一个低秩近似参数到观测映射,预条件化为先验协方差算子的平方根。这使我们的方法在评估最优实验设计目标函数及其导数时免于昂贵的PDE求解。此外,我们采用随机迹估计器来高效评估OED目标函数。我们通过使用一系列惩罚函数来控制传感器配置的稀疏性,这些惩罚函数依次逼近ℓ0-“范数”;这导致了二进制设计,以表征最优传感器位置。我们展示了对于从时空观测推断初始条件的逆问题,在二维和三维空间中,时间依赖的对流-扩散问题的数值结果。我们发现,最优设计的计算成本(以正向PDE求解的次数衡量)与参数和传感器维度无关。我们还证明了通过连续方法获得的ℓ0-稀疏化实验设计在数值上优于ℓ1-稀疏化设计。

英文摘要

We present an efficient method for computing A-optimal experimental designs for infinite-dimensional Bayesian linear inverse problems governed by partial differential equations (PDEs). Specifically, we address the problem of optimizing the location of sensors (at which observational data are collected) to minimize the uncertainty in the parameters estimated by solving the inverse problem, where the uncertainty is expressed by the trace of the posterior covariance. Computing optimal experimental designs (OEDs) is particularly challenging for inverse problems governed by computationally expensive PDE models with infinite-dimensional (or, after discretization, high-dimensional) parameters. To alleviate the computational cost, we exploit the problem structure and build a low-rank approximation of the parameter-to-observable map, preconditioned with the square root of the prior covariance operator. This relieves our method from expensive PDE solves when evaluating the optimal experimental design objective function and its derivatives. Moreover, we employ a randomized trace estimator for efficient evaluation of the OED objective function. We control the sparsity of the sensor configuration by employing a sequence of penalty functions that successively approximate the $\ell_0$-"norm"; this results in binary designs that characterize optimal sensor locations. We present numerical results for inference of the initial condition from spatio-temporal observations in a time-dependent advection-diffusion problem in two and three space dimensions. We find that an optimal design can be computed at a cost, measured in number of forward PDE solves, that is independent of the parameter and sensor dimensions. We demonstrate numerically that $\ell_0$-sparsified experimental designs obtained via a continuation method outperform $\ell_1$-sparsified designs.

1402.2864 2026-06-04 eess.SY cs.SY stat.ML

Sparse Estimation From Noisy Observations of an Overdetermined Linear System

从受扰线性超定系统中稀疏估计

Liang Dai, Kristiaan Pelckmans

AI总结 本文提出一种高效估计有限未知参数的方法,通过线性规划恢复支持集并利用LSE进行去偏差,证明在样本量足够时估计结果等于真实参数支持集的LSE。

详情
Comments
This paper is provisionally accepted by Automatica
AI中文摘要

本文研究了一种从受高斯噪声扰动的线性方程中高效估计有限未知参数的方法。当未知参数仅有少数非零元素时,所提估计器比传统方法更高效。该方法包含三个步骤:(1) 经典的最小二乘估计(LSE),(2) 通过线性规划优化问题恢复支持集,该问题可通过软阈值化步骤计算,(3) 使用LSE在估计的支持集上进行去偏差。本文的主要贡献是正式推导了最终估计的关联ORACLE性质。即当样本量足够大时,估计结果被证明等于基于真实参数支持集的LSE。

英文摘要

This note studies a method for the efficient estimation of a finite number of unknown parameters from linear equations that are perturbed by Gaussian noise. In case the unknown parameters have only a few nonzero entries, the proposed estimator performs more efficiently than a traditional approach. The method consists of three steps: (1) a classical Least Squares Estimate (LSE), (2) recovery of the support through a Linear Programming (LP) optimization problem, which can be computed using a soft-thresholding step, (3) a de-biasing step using an LSE on the estimated support set. The main contribution of this note is a formal derivation of an associated ORACLE property of the final estimate. That is, when the number of samples is large enough, the estimate is shown to equal the LSE based on the support of the true parameters.
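The three steps listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the LP of step (2) is replaced by the soft-thresholding computation that the abstract says can be used for it, and the threshold `lam`, the function names, and the synthetic data are hypothetical choices, not values from the paper.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink each entry of v toward zero by lam, zeroing small entries."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def three_step_estimate(A, y, lam):
    # Step 1: classical least squares estimate on all parameters.
    theta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Step 2: support recovery; entries that survive soft-thresholding
    # form the estimated support (lam is an assumed tuning parameter).
    support = np.flatnonzero(soft_threshold(theta_ls, lam))
    # Step 3: de-biasing, i.e. least squares restricted to the support.
    theta = np.zeros(A.shape[1])
    if support.size:
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        theta[support] = coef
    return theta, support

rng = np.random.default_rng(0)
n, p = 400, 10
theta_true = np.zeros(p)
theta_true[[1, 4, 7]] = [2.0, -1.5, 1.0]
A = rng.standard_normal((n, p))                    # overdetermined: n > p
y = A @ theta_true + 0.1 * rng.standard_normal(n)  # Gaussian noise
theta_hat, support = three_step_estimate(A, y, lam=0.3)
```

When the recovered support matches the true one, as in this synthetic setup, step 3 is exactly the least squares fit on the true support, which is the situation the ORACLE property describes.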

1402.0562 2026-06-04 stat.ML cs.LG cs.SY eess.SY

Online Stochastic Optimization under Correlated Bandit Feedback

相关老虎机反馈下的在线随机优化

Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

AI总结 本文提出HCT算法,解决局部光滑函数的在线随机优化问题,处理相关奖励挑战,改进内存需求和光滑性假设,应用于强化学习策略搜索。

详情
AI中文摘要

本文考虑在多臂反馈下局部光滑函数的在线随机优化问题。我们引入高置信度树(HCT)算法,一种新型的任何时间$\mathcal{X}$-臂多臂算法,并推导出与现有最先进方法在步骤数和光滑性因子依赖性上匹配的遗憾界。HCT的主要优势在于处理相关奖励的挑战,而现有方法要求每个臂的奖励生成过程是独立同分布(iid)的随机过程。HCT还改进了现有方法在内存需求和对均奖励函数光滑性假设的弱化方面。最后,我们讨论了HCT在强化学习策略搜索问题中的应用,并报告了初步的实证结果。

英文摘要

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime $\mathcal{X}$-armed bandit algorithm, and derive regret bounds matching the performance of the existing state of the art in terms of dependency on the number of steps and the smoothness factor. The main advantage of HCT is that it handles the challenging case of correlated rewards, whereas existing methods require that the reward-generating process of each arm be an independent and identically distributed (iid) random process. HCT also improves on the state of the art in terms of its memory requirement, and it requires a weaker smoothness assumption on the mean-reward function compared to previous anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.

1310.1502 2026-06-04 math.NA cs.LG cs.NA stat.ML

Randomized Approximation of the Gram Matrix: Exact Computation and Probabilistic Bounds

随机近似Gram矩阵:精确计算与概率界

John T. Holodnak, Ilse C. F. Ipsen

AI总结 研究通过随机化方法近似Gram矩阵,提出基于稳定秩的概率误差界,适用于小维度矩阵和高成功概率场景。

详情
Comments
Update to title in third version. Major revisions in second version including new bounds and a more detailed experimental section. Submitted to SIMAX
AI中文摘要

给定一个具有n列的实矩阵A,问题在于通过c<<n个加权外积近似Gram积AA^T。精确计算AA^T(在精确算术中)所需的条件取决于A的右奇异向量矩阵。对于Drineas等人提出的蒙特卡洛矩阵乘法算法,我们给出了由于随机化导致的2范数相对误差的概率界。这些界依赖于稳定秩或A的秩,而不是矩阵维度。数值实验表明,这些界在严格成功概率和小维度矩阵情况下仍具信息量。我们还推导了通过从正交矩阵中采样行所获得的矩阵的最小奇异值和条件数的界。

英文摘要

Given a real matrix A with n columns, the problem is to approximate the Gram product AA^T by c << n weighted outer products of columns of A. Necessary and sufficient conditions for the exact computation of AA^T (in exact arithmetic) from c >= rank(A) columns depend on the right singular vector matrix of A. For a Monte-Carlo matrix multiplication algorithm by Drineas et al. that samples outer products, we present probabilistic bounds for the 2-norm relative error due to randomization. The bounds depend on the stable rank or the rank of A, but not on the matrix dimensions. Numerical experiments illustrate that the bounds are informative, even for stringent success probabilities and matrices of small dimension. We also derive bounds for the smallest singular value and the condition number of matrices obtained by sampling rows from orthonormal matrices.
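The sampling idea in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: columns are drawn with replacement with probability proportional to their squared norms (a common choice for Monte Carlo matrix products; the abstract does not fix the distribution), and the function name `sampled_gram` is hypothetical.

```python
import random


def sampled_gram(A, c, rng):
    """Approximate A A^T by c sampled, rescaled outer products of columns.

    A is a list of m rows, each a list of n floats.
    Assumption: column j is drawn with probability ||a_j||^2 / ||A||_F^2.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    sq_norms = [sum(v * v for v in col) for col in cols]
    total = sum(sq_norms)                      # squared Frobenius norm of A
    probs = [s / total for s in sq_norms]
    picks = rng.choices(range(n), weights=probs, k=c)
    G = [[0.0] * m for _ in range(m)]
    for j in picks:
        w = 1.0 / (c * probs[j])               # rescaling keeps the estimate unbiased
        col = cols[j]
        for r in range(m):
            for s in range(m):
                G[r][s] += w * col[r] * col[s]
    return G
```

With this particular distribution every sampled term has trace `total / c`, so the trace of the estimate equals the squared Frobenius norm of A regardless of which columns are drawn. That makes a convenient sanity check for an implementation.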

1311.6107 2026-06-04 eess.SY cs.LG cs.SY math.OC stat.ML

Off-policy reinforcement learning for $ H_\infty $ control design

为H∞控制设计的非策略强化学习

Biao Luo, Huai-Ning Wu, Tingwen Huang

AI总结 本文提出非策略强化学习方法解决未知内部模型非线性系统H∞控制问题,通过实时系统数据学习HJI方程解,证明收敛性并应用于F16飞机和旋转/平移执行器系统。

详情
Comments
Accepted by IEEE Transactions on Cybernetics. IEEE Transactions on Cybernetics, Online Available, 2014
AI中文摘要

本文研究具有未知内部系统模型的非线性系统的H∞控制设计问题。已知非线性H∞控制问题可以转化为求解所谓的哈密顿-雅可比-伊萨克斯(HJI)方程,该方程是一种通常无法解析求解的非线性偏微分方程。更糟糕的是,当准确系统模型不可用或获取成本高时,基于模型的方法无法用于近似求解HJI方程。为克服这些困难,本文引入了一种非策略强化学习(RL)方法,从真实系统数据而不是数学系统模型中学习HJI方程的解,并证明其收敛性。在非策略RL方法中,系统数据可以使用任意策略生成,而不必使用被评估的策略,这对于实际系统至关重要且具有前景。出于实现目的,采用基于神经网络(NN)的actor-critic结构,并基于加权残差法推导出最小二乘NN权重更新算法。最后,所开发的基于NN的非策略RL方法在线性F16飞机被控对象上进行测试,并进一步应用于旋转/平移执行器系统。

英文摘要

The $H_\infty$ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear $H_\infty$ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used to approximately solve the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the evaluating policy, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN) based actor-critic structure is employed and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system.

1405.1773 2026-06-04 stat.ML cs.IT cs.NA math.IT math.NA math.OC math.PR

On Tensor Completion via Nuclear Norm Minimization

通过核范数最小化进行张量补全

Ming Yuan, Cun-Hui Zhang

AI总结 本文提出通过直接最小化张量核范数进行张量补全,证明该方法能提升采样需求,发展了张量核范数的子微分刻画和张量鞅集中不等式等技术。

详情
AI中文摘要

许多问题可以表述为恢复低秩张量。尽管成为越来越常见的任务,张量恢复仍是一个具有挑战性的问题,因为涉及高阶张量分解的复杂性。为克服这些困难,现有方法通常通过将张量展开为矩阵,然后应用矩阵补全技术。本文显示,此类矩阵化无法利用张量结构,可能导致次优过程。更具体地说,本文研究了一种通过直接最小化张量核范数的凸优化方法进行张量补全,并证明这会导致更优的采样需求。为建立我们的结果,我们开发了一系列代数和概率技术,如张量核范数的子微分刻画和张量鞅集中不等式,这些技术可能具有独立兴趣,并可用于其他张量相关问题。

英文摘要

Many problems can be formulated as recovering a low-rank tensor. Although an increasingly common task, tensor recovery remains a challenging problem because of the delicacy associated with the decomposition of higher-order tensors. To overcome these difficulties, existing approaches often proceed by unfolding tensors into matrices and then applying techniques for matrix completion. We show here that such matricization fails to exploit the tensor structure and may lead to suboptimal procedures. More specifically, we investigate a convex optimization approach to tensor completion by directly minimizing a tensor nuclear norm and prove that this leads to an improved sample size requirement. To establish our results, we develop a series of algebraic and probabilistic techniques, such as a characterization of the subdifferential of the tensor nuclear norm and concentration inequalities for tensor martingales, which may be of independent interest and could be useful in other tensor-related problems.

1405.1250 2026-06-04 stat.CO cs.NA math.NA

New tight approximations for Fisher's exact test

Fisher精确检验的新紧近似方法

Wilhelmiina Hämäläinen

AI总结 本文提出了一种快速计算的Fisher精确检验近似方法,能够准确逼近p值,且不受数据规模和分布影响,适用于大数据集中的统计依赖性分析。

详情
AI中文摘要

Fisher精确检验常被用于估计统计依赖性的显著性。然而,在大数据集中,该检验通常过于繁琐而难以应用,尤其是在穷举搜索(数据挖掘)中。传统的解决方案是用$χ^2$度量来近似显著性,但其准确性往往不可接受。为此,我们引入了一类上界,这些上界计算快速且能准确逼近Fisher的$p$-值。此外,与基于$χ^2$的近似不同,新的近似方法对数据规模、分布或最小期望计数不敏感。根据理论和实验分析,新的近似方法对所有足够强的依赖性都能产生准确的结果。基本形式的近似在弱依赖性情况下可能失效,但上界的通用形式可以调整到任意精度。

英文摘要

Fisher's exact test is often a preferred method to estimate the significance of statistical dependence. However, in large data sets the test is usually too laborious to be applied, especially in an exhaustive search (data mining). The traditional solution is to approximate the significance with the $χ^2$-measure, but the accuracy is often unacceptable. As a solution, we introduce a family of upper bounds, which are fast to calculate and approximate Fisher's $p$-value accurately. In addition, unlike the $χ^2$-based approximation, the new approximations are not sensitive to the data size, distribution, or smallest expected counts. According to both theoretical and experimental analysis, the new approximations produce accurate results for all sufficiently strong dependencies. The basic form of the approximation can fail with weak dependencies, but the general form of the upper bounds can be adjusted to be arbitrarily accurate.
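For orientation, the quantity that these upper bounds approximate can be computed directly for small tables. The sketch below is the textbook one-sided Fisher $p$-value for a 2x2 table, not the paper's bounds; the function name `fisher_one_sided` and the argument layout `[[a, b], [c, d]]` are assumptions made for illustration.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins whose top-left cell is at least a.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    hi = min(row1, col1)                       # largest feasible top-left cell
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, hi + 1))
    return tail / comb(n, col1)
```

Each call evaluates up to `min(row1, col1) - a + 1` products of binomial coefficients over big integers, which is the cost that becomes prohibitive when the test is repeated across an exhaustive search over large data.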

1404.1377 2026-06-04 cs.LG cs.NA math.NA stat.ML

Orthogonal Rank-One Matrix Pursuit for Low Rank Matrix Completion

正交秩一矩阵追迹法用于低秩矩阵补全

Zheng Wang, Ming-Jun Lai, Zhaosong Lu, Wei Fan, Hasan Davulcu, Jieping Ye

AI总结 本文提出一种高效可扩展的低秩矩阵补全算法,通过将正交匹配追踪方法扩展到矩阵领域,并引入新的权重更新规则降低计算和存储复杂度,具有线性收敛速度和单一可调参数,适用于大规模学习问题。

详情
AI中文摘要

本文提出了一种高效的低秩矩阵补全算法,其核心思想是将正交匹配追踪方法从向量扩展到矩阵领域。我们进一步提出了一种经济版本的算法,通过引入新的权重更新规则来降低时间和存储复杂度。两种版本在每次矩阵追迹迭代中计算成本低廉,且在几次迭代中就能获得满意的结果。我们提出的算法的另一个优点是仅有一个可调参数,即秩,这使得用户易于理解和使用,特别是在大规模学习问题中尤为重要。此外,我们严格证明了两种版本都实现了线性收敛速度,这比之前已知的结果显著更好。我们还通过多个真实世界数据集,包括大规模推荐数据集Netflix以及MovieLens数据集,经验性地比较了所提算法与几种最先进的矩阵补全算法。数值结果表明,所提算法在效率上优于竞争算法,同时在预测性能上达到相似或更好的水平。

英文摘要

In this paper, we propose an efficient and scalable low rank matrix completion algorithm. The key idea is to extend orthogonal matching pursuit method from the vector case to the matrix case. We further propose an economic version of our algorithm by introducing a novel weight updating rule to reduce the time and storage complexity. Both versions are computationally inexpensive for each matrix pursuit iteration, and find satisfactory results in a few iterations. Another advantage of our proposed algorithm is that it has only one tunable parameter, which is the rank. It is easy to understand and to use by the user. This becomes especially important in large-scale learning problems. In addition, we rigorously show that both versions achieve a linear convergence rate, which is significantly better than the previous known results. We also empirically compare the proposed algorithms with several state-of-the-art matrix completion algorithms on many real-world datasets, including the large-scale recommendation dataset Netflix as well as the MovieLens datasets. Numerical results show that our proposed algorithm is more efficient than competing algorithms while achieving similar or better prediction performance.

1308.6221 2026-06-04 stat.ME cs.NA math.NA math.OC math.ST stat.CO stat.TH

A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet flow inverse problems

无限维贝叶斯反问题计算框架:第二部分。随机牛顿MCMC方法及其在冰盖流动反问题中的应用

Noemi Petra, James Martin, Georg Stadler, Omar Ghattas

AI总结 本文提出一种解决无限维反问题的计算框架,采用随机牛顿MCMC方法,通过低秩近似Hessian来加速采样,应用于冰盖流动反问题,展示了方法在收敛速度和计算成本上的优势。

详情
Comments
31 pages
AI中文摘要

我们研究贝叶斯推断框架下无限维反问题的数值求解。在本文的姊妹篇第一部分(arXiv.org:1308.1313)中,我们考虑了线性化的无限维反问题。在本文(第二部分)中,我们放松线性化假设,采用马尔可夫链蒙特卡洛(MCMC)采样方法处理完全非线性的无限维反问题。为应对由偏微分方程控制的贝叶斯反问题带来的高维概率分布采样挑战,我们以随机牛顿MCMC方法为基础。该方法利用问题结构,将后验分布的局部高斯近似作为提议密度,其构造通过对Hessian的数据失配项进行低秩近似来实现。本文引入了一种随机牛顿提议的近似方法,其中仅在MAP点计算低秩Hessian,并在每个MCMC步骤中重用该Hessian。我们比较了所提方法与原始随机牛顿MCMC方法和独立采样器的性能。三种方法的比较在合成冰盖反问题上进行。对于该问题,采用基于MAP的Hessian的随机牛顿MCMC方法至少与原始随机牛顿MCMC方法收敛得同样快,但成本低得多,因为它避免在每个步骤重新计算Hessian。另一方面,其每样本成本高于独立采样器;然而,其收敛速度显著更快,因此总体成本低得多。最后,我们对后验分布进行了广泛分析和解释,并根据参数空间中各方向受先验或观测约束的程度对其进行了分类。

英文摘要

We address the numerical solution of infinite-dimensional inverse problems in the framework of Bayesian inference. In the Part I companion to this paper (arXiv.org:1308.1313), we considered the linearized infinite-dimensional inverse problem. Here in Part II, we relax the linearization assumption and consider the fully nonlinear infinite-dimensional inverse problem using a Markov chain Monte Carlo (MCMC) sampling method. To address the challenges of sampling high-dimensional pdfs arising from Bayesian inverse problems governed by PDEs, we build on the stochastic Newton MCMC method. This method exploits problem structure by taking as a proposal density a local Gaussian approximation of the posterior pdf, whose construction is made tractable by invoking a low-rank approximation of its data misfit component of the Hessian. Here we introduce an approximation of the stochastic Newton proposal in which we compute the low-rank-based Hessian at just the MAP point, and then reuse this Hessian at each MCMC step. We compare the performance of the proposed method to the original stochastic Newton MCMC method and to an independence sampler. The comparison of the three methods is conducted on a synthetic ice sheet inverse problem. For this problem, the stochastic Newton MCMC method with a MAP-based Hessian converges at least as rapidly as the original stochastic Newton MCMC method, but is far cheaper since it avoids recomputing the Hessian at each step. On the other hand, it is more expensive per sample than the independence sampler; however, its convergence is significantly more rapid, and thus overall it is much cheaper. Finally, we present extensive analysis and interpretation of the posterior distribution, and classify directions in parameter space based on the extent to which they are informed by the prior or the observations.

1403.7737 2026-06-04 cs.LG cs.NA math.NA stat.ML

Sharpened Error Bounds for Random Sampling Based $\ell_2$ Regression

随机采样基于ℓ₂回归的误差界改进

Shusen Wang

AI总结 本文提出两种随机采样方法改进ℓ₂回归效率,改进误差界至O(d log d + d/ε)以实现1+ε精度,同时证明均匀采样在特定条件下可获得2+ε的界。

详情
Comments
unpublished manuscript
AI中文摘要

给定数据矩阵X∈R^{n×d}和响应向量y∈R^n,当n>d时,求解最小二乘回归(LSR)问题需要O(n d²)时间与O(n d)空间。当n和d均较大时,精确求解非常昂贵。当n>>d时,将y和X的所有列随机嵌入到较小的子空间R^c中,可将LSR问题的行数减少,从而以O(c d²)时间和O(c d)空间求解。本文讨论了两种随机采样方法以更高效地求解LSR。先前工作表明基于杠杆分数采样的LSR在c≥O(d ε^{-2} log d)时可达到1+ε精度。本文改进此误差界,证明当c=O(d log d + d/ε)时即可实现1+ε精度。此外,当c≥O(μd ε^{-2} log d)时,均匀采样基于LSR以正概率达到2+ε的界。

英文摘要

Given a data matrix $X \in R^{n\times d}$ and a response vector $y \in R^{n}$ with $n>d$, it costs $O(n d^2)$ time and $O(n d)$ space to solve the least squares regression (LSR) problem. When $n$ and $d$ are both large, exactly solving the LSR problem is very expensive. When $n \gg d$, one feasible approach to speeding up LSR is to randomly embed $y$ and all columns of $X$ into a smaller subspace $R^c$; the induced LSR problem has the same number of columns but far fewer rows, and it can be solved in $O(c d^2)$ time and $O(c d)$ space. We discuss in this paper two random-sampling-based methods for solving LSR more efficiently. Previous work showed that LSR based on leverage-score sampling achieves $1+ε$ accuracy when $c \geq O(d ε^{-2} \log d)$. In this paper we sharpen this error bound, showing that $c = O(d \log d + d ε^{-1})$ is enough for achieving $1+ε$ accuracy. We also show that when $c \geq O(μd ε^{-2} \log d)$, LSR based on uniform sampling attains a $2+ε$ bound with positive probability.

1311.4468 2026-06-04 cs.LG cs.SY eess.SY physics.data-an stat.ML

Stochastic processes and feedback-linearisation for online identification and Bayesian adaptive control of fully-actuated mechanical systems

随机过程与反馈线性化用于完全驱动机械系统的在线识别和贝叶斯自适应控制

Jan-Peter Calliess, Antonis Papachristodoulou, Stephen J. Roberts

AI总结 本文提出了一种新的方法,结合概率识别与控制,利用随机过程先验条件和拉格朗日力学结构知识,通过反馈线性化实现对完全驱动机械系统的灵活非参数贝叶斯学习。

详情
AI中文摘要

本文提出了一种新的方法,用于同时进行可观察完全驱动机械系统的概率识别和控制。通过将随机过程先验条件于配置观测和噪声配置导数估计上,实现识别。与以往利用随机过程进行识别的工作不同,我们利用拉格朗日力学提供的结构知识,分别学习控制-affine系统的漂移和控制输入矩阵函数。利用反馈线性化将不确定的非线性控制问题在期望上转化为易于调节的问题。因此,本文的方法结合了非参数贝叶斯学习的灵活性与对闭环轨迹期望的元认知保证。在扭矩驱动摆的背景下,通过正态和对数正态过程的结合来学习动力学。

英文摘要

This work proposes a new method for simultaneous probabilistic identification and control of an observable, fully-actuated mechanical system. Identification is achieved by conditioning stochastic process priors on observations of configurations and noisy estimates of configuration derivatives. In contrast to previous work that has used stochastic processes for identification, we leverage the structural knowledge afforded by Lagrangian mechanics and learn the drift and control input matrix functions of the control-affine system separately. We utilise feedback-linearisation to reduce, in expectation, the uncertain nonlinear control problem to one that is easy to regulate in a desired manner. Thereby, our method combines the flexibility of nonparametric Bayesian learning with epistemological guarantees on the expected closed-loop trajectory. We illustrate our method in the context of torque-actuated pendula where the dynamics are learned with a combination of normal and log-normal processes.

1403.2073 2026-06-04 math.NA cs.NA stat.ML

Generalized Canonical Correlation Analysis and Its Application to Blind Source Separation Based on a Dual-Linear Predictor Structure

广义典型相关分析及其在基于双线性预测器结构的盲源分离中的应用

Wei Liu

AI总结 本文提出广义典型相关分析用于处理带噪声的盲源分离问题,采用双线性预测器结构提升分离性能。

详情
Comments
7 pages and 5 figures. The main aim is to show the inherent relationship between generalised canonical correlation analysis and the dual-linear predictor approach presented in two separate conference papers (references [15] and [16])
AI中文摘要

本文提出广义典型相关分析用于处理带噪声的盲源分离问题,采用双线性预测器结构提升分离性能。

英文摘要

Blind source separation (BSS) is one of the most important and established research topics in signal processing and many algorithms have been proposed based on different statistical properties of the source signals. For second-order statistics (SOS) based methods, canonical correlation analysis (CCA) has been proved to be an effective solution to the problem. In this work, the CCA approach is generalized to accommodate the case with added white noise and it is then applied to the BSS problem for noisy mixtures. In this approach, the noise component is assumed to be spatially and temporally white, but the variance information of noise is not required. An adaptive blind source extraction algorithm is derived based on this idea and a further extension is proposed by employing a dual-linear predictor structure for blind source extraction (BSE).

1309.1369 2026-06-04 stat.ML cs.LG cs.NA math.NA stat.CO

Semistochastic Quadratic Bound Methods

半随机二次界方法

Aleksandr Y. Aravkin, Anna Choromanska, Tony Jebara, Dimitri Kanevsky

AI总结 本文提出半随机二次界方法用于最大似然推断,通过优化分区函数,证明了在弱假设下全局收敛性和在强假设下线性收敛性,同时通过不精确子问题求解和批量大小选择方案提升效率与稳定性。

详情
Comments
11 pages, 1 figure
AI中文摘要

本文提出半随机二次界方法用于最大似然推断,通过优化分区函数,证明了在弱假设下全局收敛性和在强假设下线性收敛性,同时通过不精确子问题求解和批量大小选择方案提升效率与稳定性。

英文摘要

Partition functions arise in a variety of settings, including conditional random fields, logistic regression, and latent gaussian models. In this paper, we consider semistochastic quadratic bound (SQB) methods for maximum likelihood inference based on partition function optimization. Batch methods based on the quadratic bound were recently proposed for this class of problems, and performed favorably in comparison to state-of-the-art techniques. Semistochastic methods fall in between batch algorithms, which use all the data, and stochastic gradient type methods, which use small random selections at each iteration. We build semistochastic quadratic bound-based methods, and prove both global convergence (to a stationary point) under very weak assumptions, and linear convergence rate under stronger assumptions on the objective. To make the proposed methods faster and more stable, we consider inexact subproblem minimization and batch-size selection schemes. The efficacy of SQB methods is demonstrated via comparison with several state-of-the-art techniques on commonly used datasets.

1312.0516 2026-06-04 cs.LG cs.SY eess.SY stat.AP stat.ML

Grid Topology Identification using Electricity Prices

利用电价识别电网拓扑

Vassilis Kekatos, Georgios B. Giannakis, Ross Baldick

AI总结 本文研究通过公开市场数据恢复电网拓扑的潜力,提出基于LMP的正则化最大似然估计器,利用低秩和稀疏结构恢复电网拉普拉斯矩阵,通过IEEE 14节点基准数据验证了方法的有效性。

详情
Comments
PES General Meeting 2014 submission
AI中文摘要

本文探讨了仅使用公开市场数据恢复电网拓扑的潜力。在当代批发电力市场中,实时电价通常通过求解受网络约束的经济调度问题确定。在线性直流模型下,位置边际价格(LMP)对应于所涉及线性规划的拉格朗日乘数。有趣的是,时空变化的LMP矩阵具有以下性质:一旦与加权电网拉普拉斯矩阵相乘,就会得到低秩且稀疏的矩阵。利用这一丰富结构,开发了一种正则化的最大似然估计器(MLE)来从LMP中恢复电网拉普拉斯矩阵。所提出的凸优化问题包含促进低秩和稀疏性的正则化项,并通过可扩展的算法求解。在为IEEE 14节点基准生成的价格上进行的数值测试提供了令人鼓舞的拓扑恢复结果。

英文摘要

The potential of recovering the topology of a grid using solely publicly available market data is explored here. In contemporary wholesale electricity markets, real-time prices are typically determined by solving the network-constrained economic dispatch problem. Under a linear DC model, locational marginal prices (LMPs) correspond to the Lagrange multipliers of the linear program involved. The interesting observation here is that the matrix of spatiotemporally varying LMPs exhibits the following property: once premultiplied by the weighted grid Laplacian, it yields a low-rank and sparse matrix. Leveraging this rich structure, a regularized maximum likelihood estimator (MLE) is developed to recover the grid Laplacian from the LMPs. The convex optimization problem formulated includes low-rank- and sparsity-promoting regularizers, and it is solved using a scalable algorithm. Numerical tests on prices generated for the IEEE 14-bus benchmark provide encouraging topology recovery results.

1306.3343 2026-06-04 cs.LG cs.NA math.NA stat.ML

Relaxed Sparse Eigenvalue Conditions for Sparse Estimation via Non-convex Regularized Regression

松弛的稀疏特征值条件用于非凸正则化回归中的稀疏估计

Zheng Pan, Changshui Zhang

AI总结 本文研究了非凸正则化回归中稀疏估计的松弛特征值条件,证明了非凸正则化在稀疏估计中的有效性,并展示了坐标下降法在获得近似全局解中的应用。

详情
AI中文摘要

非凸正则化器通常在实践中提高了稀疏估计的性能。为了证明这一点,我们研究了尖锐凹正则化器的稀疏估计条件,这是一类包含许多现有正则化器的一般非凸正则化器。对于正则化回归的全局解,我们基于稀疏特征值的条件比L1正则化在参数估计和稀疏性估计中所需的条件更弱。对于近似全局且近似驻点(AGAS)解,几乎相同的条件也足够。我们证明了通过基于坐标下降(CD)的方法可以得到所需的AGAS解。最后,我们进行了一些实验,以展示CD方法在获得AGAS解方面的性能,以及尖锐凹正则化器所需估计条件的宽松程度。

英文摘要

Non-convex regularizers usually improve the performance of sparse estimation in practice. To prove this fact, we study the conditions for sparse estimation with sharp concave regularizers, a general family of non-convex regularizers that includes many existing regularizers. For the global solutions of the regularized regression, our sparse-eigenvalue-based conditions are weaker than those of L1-regularization for parameter estimation and sparseness estimation. For the approximate global and approximate stationary (AGAS) solutions, almost the same conditions are also enough. We show that the desired AGAS solutions can be obtained by coordinate descent (CD) based methods. Finally, we perform some experiments to show the performance of CD methods in producing AGAS solutions and the degree of weakness of the estimation conditions required by the sharp concave regularizers.

1306.0308 2026-06-04 stat.ML cs.LG cs.NA math.NA

Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics

微分方程的概率解及其在黎曼统计学中的应用

Philipp Hennig, Søren Hauberg

AI总结 本文提出一种概率数值方法,用于求解初值和边值问题,返回解的高斯过程后验。该方法在黎曼流形统计中具有应用价值,能处理非解析常微分方程,通过不确定性边际化提升统计鲁棒性,提出新的黎曼算法和主测地线分析方法。

详情
Journal ref
Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. Journal of Machine Learning Research: W&CP volume 33
Comments
11 pages (9-page conference paper, plus supplements)
AI中文摘要

我们研究了一种概率数值方法,用于求解边值问题和初值问题,该方法返回解的联合高斯过程后验。此类方法在黎曼流形统计中具有实际价值,因为几乎所有计算都涉及非解析常微分方程。概率公式允许对数值解的不确定性进行边际化,使统计结果对数值误差不那么敏感。这导出了用于均值计算和主测地线分析的新黎曼算法。边际化还意味着结果的精度可以低于点估计,从而相对于现有最先进方法实现明显加速。我们的方法佐证了一个更一般的观点,即数值计算引起的不确定性应在机器学习算法的整个流程中进行跟踪。

英文摘要

We study a probabilistic numerical method for the solution of both boundary and initial value problems that returns a joint Gaussian process posterior over the solution. Such methods have concrete value in the statistics on Riemannian manifolds, where non-analytic ordinary differential equations are involved in virtually all computations. The probabilistic formulation permits marginalising the uncertainty of the numerical solution such that statistics are less sensitive to inaccuracies. This leads to new Riemannian algorithms for mean value computations and principal geodesic analysis. Marginalisation also means results can be less precise than point estimates, enabling a noticeable speed-up over the state of the art. Our approach is an argument for a wider point that uncertainty caused by numerical calculations should be tracked throughout the pipeline of machine learning algorithms.

1309.6290 2026-06-04 math.NA cs.NA stat.CO

Coefficient Matrices Computation of Structural Vector Autoregressive Model

结构向量自回归模型系数矩阵的计算

Aravindh Krishnamoorthy

AI总结 本文提出高效的Large Inverse Cholesky方法,用于计算结构向量自回归模型的系数矩阵。

详情
Comments
2pp; pre-publication
AI中文摘要

在本文中,我们提出了Large Inverse Cholesky(LIC)方法,一种用于计算结构向量自回归(SVAR)模型系数矩阵的高效方法。

英文摘要

In this paper we present the Large Inverse Cholesky (LIC) method, an efficient method for computing the coefficient matrices of a Structural Vector Autoregressive (SVAR) model.

1401.2288 2026-06-04 math.NA cs.LG cs.NA stat.ML

Extension of Sparse Randomized Kaczmarz Algorithm for Multiple Measurement Vectors

稀疏随机Kaczmarz算法的扩展:多测量向量问题

Hemant Kumar Aggarwal, Angshul Majumdar

AI总结 本文提出改进的随机Kaczmarz算法解决多测量向量问题,通过共同稀疏支持实现更高效的恢复与收敛。

详情
AI中文摘要

Kaczmarz算法因迭代求解超定线性方程组而流行。传统算法在几次遍历中可近似解,但随机版本能指数收敛且与方程数量无关。最近提出基于加权随机Kaczmarz算法的稀疏解算法,但仅适用于单测量向量问题。本文通过修改随机Kaczmarz算法解决多测量向量问题,将视频人脸识别建模为该问题并应用所提技术。在真实和合成数据集上,所提算法在公平约束下优于状态最新型谱投影梯度算法,蒙特卡洛模拟证实其恢复和收敛速率更优。

英文摘要

The Kaczmarz algorithm is popular for iteratively solving an overdetermined system of linear equations. The traditional Kaczmarz algorithm can approximate the solution in a few sweeps through the equations, but a randomized version of the Kaczmarz algorithm was shown to converge exponentially and independently of the number of equations. Recently, an algorithm for finding a sparse solution to a linear system of equations has been proposed based on the weighted randomized Kaczmarz algorithm. These algorithms solve the single measurement vector problem; however, there are applications where multiple measurements are available. In this work, the objective is to solve a multiple measurement vector problem with common sparse support by modifying the randomized Kaczmarz algorithm. We have also modeled the problem of face recognition from video as a multiple measurement vector problem and solved it using our proposed technique. We have compared the proposed algorithm with the state-of-the-art spectral projected gradient algorithm for multiple measurement vectors on both real and synthetic datasets. The Monte Carlo simulations confirm that our proposed algorithm has better recovery and convergence rates than the MMV version of the spectral projected gradient algorithm under fairness constraints.
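As background for the abstract above, here is a minimal sketch of the basic randomized Kaczmarz iteration for a consistent system $Ax=b$. It is not the sparse multiple-measurement-vector extension the paper proposes. Rows are assumed to be drawn with probability proportional to their squared norms, which is the usual choice for the randomized variant, and the function name `randomized_kaczmarz` is hypothetical.

```python
import random


def randomized_kaczmarz(A, b, iters, rng):
    """Iteratively project onto randomly chosen equations of A x = b.

    A is a list of rows (lists of floats), b a list of floats.
    Assumption: row i is picked with probability ||a_i||^2 / ||A||_F^2.
    """
    n = len(A[0])
    sq_norms = [sum(v * v for v in row) for row in A]
    picks = rng.choices(range(len(A)), weights=sq_norms, k=iters)
    x = [0.0] * n
    for i in picks:
        row = A[i]
        resid = b[i] - sum(r * xj for r, xj in zip(row, x))
        step = resid / sq_norms[i]
        # Orthogonal projection of x onto the hyperplane <row, x> = b[i].
        x = [xj + step * r for xj, r in zip(x, row)]
    return x
```

Each step touches a single row, so the per-iteration cost is proportional to the number of unknowns and does not depend on the number of equations.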

1401.1842 2026-06-04 stat.ML cs.IT cs.LG cs.NA math.IT math.NA

Robust Large Scale Non-negative Matrix Factorization using Proximal Point Algorithm

鲁棒大规模非负矩阵分解的近点算法

Jason Gejie Liu, Shuchin Aeron

AI总结 本文提出一种鲁棒算法用于大规模非负矩阵分解,通过引入减少的约束条件改进线性规划算法,无需预先知道分解秩,适用于极端射线或主题数量远大于数据向量维度的情况。

详情
Comments
Appeared in IEEE GlobalSIP, 2013, TX, Austin
AI中文摘要

本文提出了一种用于处理大规模数据的鲁棒非负矩阵分解(NMF)算法,其中分离性假设成立。具体来说,我们通过引入减少的约束条件修改了[9]中的线性规划(LP)算法,以实现精确的NMF。与以往方法不同,所提出的算法不需要知道分解秩(极端射线[3]或主题[7])。受代谢网络分析中类似问题的启发,我们考虑了一种完全不同的情形,即极端射线或主题的数量可以远大于数据向量的维度。不同合成数据集上算法的性能也得到了提供。

英文摘要

A robust algorithm for non-negative matrix factorization (NMF) is presented in this paper for dealing with large-scale data where the separability assumption is satisfied. In particular, we modify the Linear Programming (LP) algorithm of [9] by introducing a reduced set of constraints for exact NMF. In contrast to previous approaches, the proposed algorithm does not require knowledge of the factorization rank (extreme rays [3] or topics [7]). Furthermore, motivated by a similar problem arising in the context of metabolic network analysis [13], we consider an entirely different regime where the number of extreme rays or topics can be much larger than the dimension of the data vectors. The performance of the algorithm on different synthetic data sets is reported.

1305.0087 2026-06-04 cs.DS cs.DC cs.NA math.NA stat.ML

Quantile Regression for Large-scale Applications

分位数回归在大规模应用中的方法

Jiyan Yang, Xiangrui Meng, Michael W. Mahoney

AI总结 本文提出一种近线性时间复杂度的随机算法,用于解决大规模分位数回归问题,通过低失真子空间保持嵌入提升计算效率,适用于超大规模数据集。

详情
Comments
35 pages; long version of a paper appearing in the 2013 ICML. Version to appear in the SIAM Journal on Scientific Computing
AI中文摘要

分位数回归是一种估计响应变量条件分布分位数的方法,比最小二乘或最小绝对偏差回归能更准确地描述响应变量与观测协变量的关系。它可以表示为线性规划,并通过适当预处理使用内点法解决中等规模问题。处理非常大规模问题(例如,涉及数据量达太字节级别)仍具挑战性。本文提出了一种随机算法,可在输入规模下近线性时间内以常数概率计算任意分位数回归问题的(1+ε)近似解。关键步骤是计算相对于分位数回归损失函数的低失真子空间保持嵌入。实验证明,该算法在小到中等规模问题上与最佳先前工作竞争,并且可以实现于MapReduce类环境中,应用于太字节级问题。

英文摘要

Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed covariates than methods such as Least-squares or Least Absolute Deviations regression. It can be expressed as a linear program, and, with appropriate preprocessing, interior-point methods can be used to find a solution for moderately large problems. Dealing with very large problems, \emph{e.g.}, involving data up to and beyond the terabyte regime, remains a challenge. Here, we present a randomized algorithm that runs in nearly linear time in the size of the input and that, with constant probability, computes a $(1+ε)$ approximate solution to an arbitrary quantile regression problem. As a key step, our algorithm computes a low-distortion subspace-preserving embedding with respect to the loss function of quantile regression. Our empirical evaluation illustrates that our algorithm is competitive with the best previous work on small to medium-sized problems, and that in addition it can be implemented in MapReduce-like environments and applied to terabyte-sized problems.
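The loss function of quantile regression mentioned above is usually taken to be the pinball (check) loss. The sketch below illustrates it in the simplest setting, a constant predictor, where minimizing the summed loss returns a sample quantile. It is only an illustration of the loss, not the randomized embedding algorithm from the paper, and the helper names `pinball_loss` and `best_constant_quantile` are hypothetical.

```python
def pinball_loss(residual, tau):
    """Pinball loss: weight tau on positive residuals, 1 - tau on negative ones."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def best_constant_quantile(ys, tau):
    """Return the data point q that minimizes the summed pinball loss of y - q.

    Searching over the data points suffices because the objective is
    piecewise linear with breakpoints at the observations.
    """
    return min(ys, key=lambda q: sum(pinball_loss(y - q, tau) for y in ys))
```

With `tau = 0.5` the loss is half the absolute deviation, so the minimizer is a median; larger values of `tau` penalize under-prediction more and push the minimizer toward higher quantiles.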

1312.6182 2026-06-04 cs.MS cs.LG cs.NA math.NA stat.ML

Large-Scale Paralleled Sparse Principal Component Analysis

大规模并行稀疏主成分分析

W. Liu, H. Zhang, D. Tao, Y. Wang, K. Lu

AI总结 本文提出基于GPU的高效并行稀疏主成分分析方法,通过并行实现通用幂方法的四种优化形式,显著提升计算效率,实验证明其在实际数据集中的实用性。

详情
Comments
submitted to Multimedia Tools and Applications
AI中文摘要

主成分分析(PCA)是一种用于多变量数据分析的统计技术,但其主成分(PCs)作为原始变量的线性组合,难以解释。稀疏PCA(SPCA)通过近似稀疏PCs来平衡统计保真度和可解释性。本文提出一种高效的GPU并行方法,用于实现SPCA,特别是通用幂方法的SPCA(GP-SPCA)。该方法在GPU上使用CUBLAS实现,比CPU上的CBLAS实现快11倍,比Matlab实现快107倍。在多个真实数据集上的广泛比较实验验证了SPCA的实用性。

英文摘要

Principal component analysis (PCA) is a statistical technique commonly used in multivariate data analysis. However, PCA can be difficult to interpret and explain since the principal components (PCs) are linear combinations of the original variables. Sparse PCA (SPCA) aims to balance statistical fidelity and interpretability by approximating sparse PCs whose projections capture the maximal variance of original data. In this paper we present an efficient and paralleled method of SPCA using graphics processing units (GPUs), which can process large blocks of data in parallel. Specifically, we construct parallel implementations of the four optimization formulations of the generalized power method of SPCA (GP-SPCA), one of the most efficient and effective SPCA approaches, on a GPU. The parallel GPU implementation of GP-SPCA (using CUBLAS) is up to eleven times faster than the corresponding CPU implementation (using CBLAS), and up to 107 times faster than a MatLab implementation. Extensive comparative experiments in several real-world datasets confirm that SPCA offers a practical advantage.

1312.4852 2026-06-04 stat.ML cs.SY eess.SY

Identification of Gaussian Process State-Space Models with Particle Stochastic Approximation EM

基于粒子随机近似EM算法的高斯过程状态空间模型识别

Roger Frigola, Fredrik Lindsten, Thomas B. Schön, Carl E. Rasmussen

AI总结 本文提出基于粒子随机近似EM算法的高斯过程状态空间模型参数识别方法,通过有效利用粒子MCMC技术实现非参数动态描述的系统参数估计。

详情
AI中文摘要

高斯过程状态空间模型(GP-SSMs)是一类非常灵活的非线性动态系统建模方法,包含对系统动态的贝叶斯非参数表示以及调控该表示性质的附加(超)参数。贝叶斯框架使得系统动态的不确定性可以系统性地进行推理。本文提出了一种在保留完整非参数动态描述的同时,对GP-SSMs参数进行最大似然识别的方法,该方法基于随机近似版本的EM算法,并利用最近的粒子马尔可夫链蒙特卡洛发展实现高效识别。

英文摘要

Gaussian process state-space models (GP-SSMs) are a very flexible family of models of nonlinear dynamical systems. They comprise a Bayesian nonparametric representation of the dynamics of the system and additional (hyper-)parameters governing the properties of this nonparametric representation. The Bayesian formalism enables systematic reasoning about the uncertainty in the system dynamics. We present an approach to maximum likelihood identification of the parameters in GP-SSMs, while retaining the full nonparametric description of the dynamics. The method is based on a stochastic approximation version of the EM algorithm that employs recent developments in particle Markov chain Monte Carlo for efficient identification.

1306.2861 2026-06-04 stat.ML cs.LG cs.SY eess.SY

Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

贝叶斯推断与学习在高斯过程状态空间模型中的粒子MCMC应用

Roger Frigola, Fredrik Lindsten, Thomas B. Schön, Carl E. Rasmussen

AI总结 本文提出一种全贝叶斯方法,用于非线性非参数状态空间模型中的推断与学习,通过高斯过程先验建模状态转移动态,并利用粒子MCMC进行高效推断。

详情
Journal ref
Published in NIPS 2013, Advances in Neural Information Processing Systems 26, pp. 3156--3164
AI中文摘要

本文提出了一种全贝叶斯方法,用于非线性非参数状态空间模型中的推断与学习,通过高斯过程先验建模状态转移动态,并利用粒子MCMC进行高效推断。

英文摘要

State-space models are successfully used in many areas of science, engineering and economics to model time series and dynamical systems. We present a fully Bayesian approach to inference \emph{and learning} (i.e. state estimation and system identification) in nonlinear nonparametric state-space models. We place a Gaussian process prior over the state transition dynamics, resulting in a flexible model able to capture complex dynamical phenomena. To enable efficient inference, we marginalize over the transition dynamics function and infer directly the joint smoothing distribution using specially tailored Particle Markov Chain Monte Carlo samplers. Once a sample from the smoothing distribution is computed, the state transition predictive distribution can be formulated analytically. Our approach preserves the full nonparametric expressivity of the model and can make use of sparse Gaussian processes to greatly reduce computational complexity.

1310.3556 2026-06-04 math.NA cs.LG cs.NA stat.ML

Identifying Influential Entries in a Matrix

识别矩阵中的关键条目

Abhisek Kundu, Srinivas Nambirajan, Petros Drineas

AI总结 本文提出一种概率分布,用于识别矩阵中最关键的条目,并通过理论证明在采样少量条目后可精确重建矩阵,且无需假设矩阵的无相干性。

详情
Comments
There is a bug in the proof of Lemma 5, which we are currently working to fix
AI中文摘要

对于任意一个大小为m×n、秩为ρ的矩阵A,我们提出了一种针对矩阵条目的概率分布(方程(2)中的元素杠杆分数),以揭示矩阵中最关键的条目。从理论角度看,我们证明了在采样至多s=O((m+n)ρ²ln(m+n))个条目(见方程(3)中s的精确值)并基于这些分数进行采样后,解核范数最小化问题可精确重建A。据我们所知,这些是目前在不假设矩阵A无相干性的情况下,对矩阵补全问题最强的理论保证。从实验角度看,我们显示高元素杠杆分数对应的条目揭示了数据矩阵的结构特性,这些特性对领域科学家具有研究价值。

英文摘要

For any matrix A in R^(m x n) of rank ρ, we present a probability distribution over the entries of A (the element-wise leverage scores of equation (2)) that reveals the most influential entries in the matrix. From a theoretical perspective, we prove that sampling at most s = O ((m + n) ρ^2 ln (m + n)) entries of the matrix (see eqn. (3) for the precise value of s) with respect to these scores and solving the nuclear norm minimization problem on the sampled entries, reconstructs A exactly. To the best of our knowledge, these are the strongest theoretical guarantees on matrix completion without any incoherence assumptions on the matrix A. From an experimental perspective, we show that entries corresponding to high element-wise leverage scores reveal structural properties of the data matrix that are of interest to domain scientists.

1312.2132 2026-06-04 eess.SY cs.LG cs.SY stat.ML

Robust Subspace System Identification via Weighted Nuclear Norm Optimization

通过加权核范数优化实现鲁棒子空间系统辨识

Dorsa Sadigh, Henrik Ohlsson, S. Shankar Sastry, Sanjit A. Seshia

AI总结 本文提出一种基于加权核范数优化的鲁棒子空间系统辨识方法,通过在拟合、秩和稀疏性之间进行权衡,有效处理异常值问题。

详情
Comments
Submitted to the IFAC World Congress 2014
AI中文摘要

子空间辨识是系统辨识中的经典且广泛研究的问题。该问题最近被提出为凸优化问题,通过核范数松弛。受鲁棒PCA的启发,我们扩展了这一框架以处理异常值。所提出的框架形式为一个凸优化问题,其目标函数在拟合、秩和稀疏性之间进行权衡。与鲁棒PCA类似,找到合适的正则化参数可能会有问题。我们展示了如何将合适参数的搜索空间限制在二维参数空间的有界开集内。在实践中,这非常有用,因为它限制了需要调查的参数空间。

英文摘要

Subspace identification is a classical and very well studied problem in system identification. The problem was recently posed as a convex optimization problem via the nuclear norm relaxation. Inspired by robust PCA, we extend this framework to handle outliers. The proposed framework takes the form of a convex optimization problem with an objective that trades off fit, rank and sparsity. As in robust PCA, it can be problematic to find a suitable regularization parameter. We show how the space in which a suitable parameter should be sought can be limited to a bounded open set of the two-dimensional parameter space. In practice, this is very useful since it restricts the parameter space that needs to be surveyed.
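
The rank and sparsity terms mentioned in the abstract are commonly handled through two proximal operators. The following is a minimal sketch of those generic building blocks only, not the authors' weighted formulation; the function names and the threshold `tau` are illustrative assumptions.

```python
import numpy as np

def singular_value_threshold(M, tau):
    """Proximal operator of tau * (nuclear norm): soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale each left singular vector by its shrunken singular value.
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(M, tau):
    """Proximal operator of tau * (entrywise l1 norm), the usual sparsity counterpart."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

low_rank_part = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
sparse_part = soft_threshold(np.array([[3.0, -0.5]]), 1.0)
```

Larger values of `tau` push the first operator toward lower rank and the second toward sparser output, which is the trade-off the regularization parameters control.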

1312.1613 2026-06-04 stat.ML cs.LG cs.NA math.NA

Max-Min Distance Nonnegative Matrix Factorization

最大-最小距离非负矩阵分解

Jim Jing-Yan Wang

AI总结 本文提出一种监督非负矩阵分解算法,通过利用类别标签将数据对分为同类和异类对,旨在最小化同类对在新空间中的最大距离,同时最大化异类对的最小距离,提升表示的判别能力。

详情
AI中文摘要

非负矩阵分解(NMF)已成为模式分类问题中流行的表示方法。它试图将数据样本的非负矩阵分解为非负基本矩阵和非负系数矩阵的乘积,系数矩阵用作新的表示。然而,传统NMF方法忽略了数据样本的类别标签。在本文中,我们提出了一种监督的新型NMF算法,以提高新表示的判别能力。利用类别标签,我们将所有数据样本对分为同类对和异类对。为了提高新NMF表示的判别能力,我们希望在新NMF空间中同类对的最大距离被最小化,同时异类对的最小距离被最大化。基于此准则,我们构建了一个目标函数,并通过交替优化基本矩阵、系数矩阵和松弛变量来优化该函数,从而得到一个迭代算法。

英文摘要

Nonnegative Matrix Factorization (NMF) has been a popular representation method for pattern classification problems. It tries to decompose a nonnegative matrix of data samples as the product of a nonnegative basis matrix and a nonnegative coefficient matrix, and the coefficient matrix is used as the new representation. However, traditional NMF methods ignore the class labels of the data samples. In this paper, we propose a novel supervised NMF algorithm to improve the discriminative ability of the new representation. Using the class labels, we separate all the data sample pairs into within-class pairs and between-class pairs. To improve the discriminative ability of the new NMF representations, we require that the maximum distance of the within-class pairs in the new NMF space be minimized, while the minimum distance of the between-class pairs be maximized. With this criterion, we construct an objective function and optimize it with respect to the basis matrix, the coefficient matrix, and the slack variables alternately, resulting in an iterative algorithm.

1312.1476 2026-06-04 stat.CO cs.NA math.NA

Scalable iterative methods for sampling from massive Gaussian random vectors

大规模高斯随机向量采样中的可扩展迭代方法

Daniel P. Simpson, Ian W. Turner, Christopher M. Strickland, Anthony N. Pettitt

AI总结 本文提出利用高斯马尔可夫随机场的近似方法加速Krylov子空间采样,同时应用于大规模多元高斯分布的归一化常数计算,提出O(n log n)的采样方案。

详情
Comments
17 Pages, 4 Figures
AI中文摘要

从高斯马尔可夫随机场(GMRF)采样,即由协方差矩阵逆参数化的多元高斯随机向量,是计算统计学中的基础问题。本文展示如何利用任意精确的GMRF近似来加速Krylov子空间采样方法。我们还展示这些方法可用于计算大规模多元高斯分布的归一化常数,这对于任何基于似然的推断方法都是必需的。我们推导的方法也适用于其他结构化的高斯随机向量,特别是当精度矩阵是(块)循环矩阵的扰动时,仍可以推导出O(n log n)的采样方案。

英文摘要

Sampling from Gaussian Markov random fields (GMRFs), that is, multivariate Gaussian random vectors that are parameterised by the inverse of their covariance matrix, is a fundamental problem in computational statistics. In this paper, we show how we can exploit arbitrarily accurate approximations to a GMRF to speed up Krylov subspace sampling methods. We also show that these methods can be used when computing the normalising constant of a large multivariate Gaussian distribution, which is needed for any likelihood-based inference method. The method we derive is also applicable to other structured Gaussian random vectors and, in particular, we show that when the precision matrix is a perturbation of a (block) circulant matrix, it is still possible to derive O(n log n) sampling schemes.
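
For orientation, the direct baseline that iterative samplers aim to improve on can be sketched as follows. This is a dense Cholesky sampler for a precision-parameterised Gaussian, not the Krylov method of the paper; the toy tridiagonal precision matrix and the parameter `kappa` are assumptions made for illustration.

```python
import numpy as np

def tridiagonal_precision(n, kappa=0.5):
    """Hypothetical toy precision matrix: tridiag(-1, 2 + kappa, -1), positive definite."""
    Q = (2.0 + kappa) * np.eye(n)
    Q -= np.eye(n, k=1) + np.eye(n, k=-1)
    return Q

def sample_gmrf(Q, rng):
    """Draw x ~ N(0, Q^{-1}) from a symmetric positive definite precision matrix Q."""
    L = np.linalg.cholesky(Q)               # Q = L @ L.T with L lower triangular
    z = rng.standard_normal(Q.shape[0])     # z ~ N(0, I)
    # Solving L.T @ x = z gives Cov(x) = L^{-T} L^{-1} = Q^{-1}.
    x = np.linalg.solve(L.T, z)
    return x, z

Q = tridiagonal_precision(50)
x, z = sample_gmrf(Q, np.random.default_rng(0))
```

The dense factorisation costs O(n^3) here, which is the kind of cost that motivates iterative and structure-exploiting schemes for large n.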

1311.7525 2026-06-04 math.NA cs.NA stat.CO

Polynomial regression using trapezoidal rule for computing Legendre coefficients

用梯形法则计算Legendre系数的多项式回归

Demetris T. Christopoulos

AI总结 本文提出利用梯形法则计算多项式回归的傅里叶系数,采用正交的Legendre多项式作为基函数,结果比Forsythe方法更准确稳定。

详情
Comments
13 pages, 2 figures, 4 tables
AI中文摘要

我们提出了一种方法,通过使用数值积分的梯形法则来计算给定多项式回归的傅里叶系数。作为函数基底,我们使用正交的Legendre多项式。结果在准确性和稳定性上优于Forsythe的方法。

英文摘要

We present a method for computing the Fourier coefficients of a given polynomial regression by using the trapezoidal rule for numerical integration. As the function basis we use the orthogonal Legendre polynomials. The results are accurate and stable compared to Forsythe's method.
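
The idea can be illustrated on a known function. The sketch below assumes the standard Legendre expansion on [-1, 1], with coefficients c_k = (2k+1)/2 times the integral of f(x) P_k(x), and approximates each integral by the composite trapezoidal rule. It is not the paper's regression procedure; the grid size and function names are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

def trapezoid(y, x):
    """Composite trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def legendre_coefficients(f, degree, n_points=2001):
    """Approximate c_k = (2k + 1) / 2 * integral_{-1}^{1} f(x) P_k(x) dx for k = 0..degree."""
    x = np.linspace(-1.0, 1.0, n_points)
    fx = f(x)
    coeffs = np.empty(degree + 1)
    for k in range(degree + 1):
        unit = np.zeros(k + 1)
        unit[k] = 1.0                          # coefficient vector that selects P_k
        pk = legendre.legval(x, unit)
        coeffs[k] = (2 * k + 1) / 2.0 * trapezoid(fx * pk, x)
    return coeffs

coeffs = legendre_coefficients(lambda x: x**2, degree=3)
```

Since x^2 = (1/3) P_0 + (2/3) P_2, the computed coefficients should be close to 1/3, 0, 2/3, 0, up to the quadrature error of the grid.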

1311.5750 2026-06-04 cs.LG cs.NA math.NA stat.ML

Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization

梯度硬阈值化追求用于稀疏约束优化

Xiao-Tong Yuan, Ping Li, Tong Zhang

AI总结 本文将硬阈值化追求算法推广到稀疏约束凸优化问题,通过梯度下降与硬阈值化步骤交替,证明其在收敛速度和参数估计精度上的强保证,并在稀疏逻辑回归和稀疏精度矩阵估计中优于现有方法。

详情
AI中文摘要

硬阈值化追求(Hard Thresholding Pursuit,HTP)是一种迭代的贪心选择方法,用于寻找欠定线性系统的稀疏解。该方法已被证明具有强大的理论保证和出色的数值性能。本文将HTP从压缩感知推广到一般性的稀疏约束凸优化问题。所提出的算法在标准梯度下降步骤和硬阈值化步骤之间交替,可带或不带去偏处理。我们证明了该方法在收敛速度和参数估计精度方面具有与HTP类似的强保证。数值结果表明,该方法在稀疏逻辑回归和稀疏精度矩阵估计任务中优于现有最先进的贪心选择方法。

英文摘要

Hard Thresholding Pursuit (HTP) is an iterative greedy selection procedure for finding sparse solutions of underdetermined linear systems. This method has been shown to have strong theoretical guarantees and impressive numerical performance. In this paper, we generalize HTP from compressive sensing to a generic problem setup of sparsity-constrained convex optimization. The proposed algorithm iterates between a standard gradient descent step and a hard thresholding step with or without debiasing. We prove that our method enjoys strong guarantees analogous to those of HTP in terms of rate of convergence and parameter estimation accuracy. Numerical evidence shows that our method is superior to the state-of-the-art greedy selection methods in sparse logistic regression and sparse precision matrix estimation tasks.
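
The alternation described above (gradient step, hard thresholding, optional debiasing) can be sketched for one concrete objective. The example assumes the least-squares loss f(x) = 0.5 * ||Ax - y||^2 and a fixed step size equal to the inverse Lipschitz constant of its gradient. These choices and all names are illustrative assumptions, not the paper's exact algorithm or settings.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    if k > 0:
        keep = np.argsort(np.abs(v))[-k:]
        out[keep] = v[keep]
    return out

def gradient_hard_thresholding(A, y, k, n_iter=100, debias=True):
    """Sketch for min 0.5 * ||A x - y||^2 subject to ||x||_0 <= k."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = hard_threshold(x - step * grad, k)
        if debias:
            support = np.flatnonzero(x)
            if support.size:
                # Debiasing: re-fit the retained coordinates by least squares.
                sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
                x = np.zeros_like(x)
                x[support] = sol
    return x

A = np.eye(5)
y = np.array([0.1, 3.0, 0.0, -2.0, 0.2])
x_hat = gradient_hard_thresholding(A, y, k=2)
```

With the identity as `A`, the method simply keeps the two largest entries of `y`, which makes the effect of the thresholding step easy to inspect.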

1311.4643 2026-06-04 cs.LG cs.IT cs.NA math.IT math.NA stat.ML

Near-Optimal Entrywise Sampling for Data Matrices

近最优数据矩阵的逐元素采样

Dimitris Achlioptas, Zohar Karnin, Edo Liberty

AI总结 本文提出一种近最优的数据矩阵逐元素采样方法,通过四条性质保证了高效性和压缩性,同时在流式模型中能有效竞争最优分布。

详情
Comments
14 pages, to appear in NIPS' 13
AI中文摘要

我们考虑了选择矩阵$A$的非零元素以生成稀疏草稿$B$,使其最小化$\|A-B\|_2$的问题。对于大规模$m \times n$矩阵,当$n \gg m$时,我们给出了一种采样分布,具有四个重要性质:首先,它们有闭合形式,可从最少的关于$A$的信息计算;其次,允许以任意顺序流式输入非零元素进行草稿生成,每个非零元素的计算为$O(1)$;第三,生成的草稿矩阵不仅稀疏,其非零元素高度可压缩;最后,且最重要的是,在温和假设下,我们的分布能与最优离线分布竞争。需要注意的是,最优离线分布中的概率可能是所有矩阵元素的复杂函数,因此,无论计算复杂度如何,最优分布可能在流式模型中无法计算。

英文摘要

We consider the problem of selecting non-zero entries of a matrix $A$ in order to produce a sparse sketch of it, $B$, that minimizes $\|A-B\|_2$. For large $m \times n$ matrices, such that $n \gg m$ (for example, representing $n$ observations over $m$ attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding $A$. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with $O(1)$ computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model.

1311.0396 2026-06-04 eess.SY cs.SY math.OC stat.ML

Data-based approximate policy iteration for nonlinear continuous-time optimal control design

基于数据的近似策略迭代法用于非线性连续时间最优控制设计

Biao Luo, Huai-Ning Wu, Tingwen Huang, Derong Liu

AI总结 本文提出基于数据的近似策略迭代方法,用于解决非线性最优控制问题,通过演员-评论家结构神经网络近似控制策略和成本函数,无需系统数学模型即可学习HJB方程解。

详情
Comments
22 pages, 21 figures, submitted for Peer Review
AI中文摘要

本文针对无模型非线性最优问题,开发了一种基于数据的强化学习技术。已知非线性最优控制问题依赖于哈密顿-雅可比-贝尔曼(HJB)方程的解,该方程通常是无法解析求解的非线性偏微分方程。更糟糕的是,大多数实际系统过于复杂,无法建立准确的数学模型。为克服这些困难,本文提出了一种基于数据的近似策略迭代(API)方法,利用实际系统数据而非系统模型。首先,推导了一个用于约束最优控制问题的无模型策略迭代算法,并证明了其收敛性,该算法可以学习HJB方程和最优控制策略的解,而无需任何系统数学模型的知识。该算法的实现基于演员-评论家结构的思想,其中演员和评论家神经网络(NNs)分别用于近似控制策略和成本函数。为了更新演员和评论家NNs的权重,开发了一种基于残差加权方法的最小二乘方法。整个基于数据的API方法包括两部分,其中第一部分在线实施以收集实际系统信息,第二部分进行离线策略迭代以学习HJB方程和控制策略的解。然后,将该基于数据的API算法简化以解决非线性和线性系统的无约束最优控制问题。最后,在简单的非线性系统上测试了该基于数据的API控制设计方法的效率,并进一步应用于旋转/平移执行器系统。仿真结果证明了所提方法的有效性。

英文摘要

This paper addresses the model-free nonlinear optimal control problem with a generalized cost functional, and a data-based reinforcement learning technique is developed. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, most practical systems are too complicated for an accurate mathematical model to be established. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method that uses real system data rather than a system model. Firstly, a model-free policy iteration algorithm is derived for the constrained optimal control problem and its convergence is proved; the algorithm can learn the solution of the HJB equation and the optimal control policy without requiring any knowledge of the system's mathematical model. The implementation of the algorithm is based on the actor-critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of the actor and critic NNs, a least-squares approach is developed based on the method of weighted residuals. The whole data-based API method includes two parts: the first part is implemented online to collect real system information, and the second part conducts offline policy iteration to learn the solution of the HJB equation and the control policy. Then, the data-based API algorithm is simplified for solving the unconstrained optimal control problem of nonlinear and linear systems. Finally, we test the efficiency of the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.

1311.3830 2026-06-04 math.NA cs.NA math.NT stat.CO

Applications of geometric discrepancy in numerical analysis and statistics

几何偏差在数值分析与统计中的应用

Josef Dick

AI总结 本文探讨了几何偏差度量与数值分析、统计学应用之间的联系,包括球面上点分布、接受-拒绝算法及特定马尔可夫链蒙特卡罗算法。

详情
AI中文摘要

本文讨论了几何偏差度量,如凸集偏差(特别是具有光滑边界凸集的偏差)与数值分析和统计学应用之间的各种联系,包括球面上的点分布、接受-拒绝算法以及某些马尔可夫链蒙特卡罗算法。

英文摘要

In this paper we discuss various connections between geometric discrepancy measures, such as discrepancy with respect to convex sets (and convex sets with smooth boundary in particular), and applications to numerical analysis and statistics, like point distributions on the sphere, the acceptance-rejection algorithm and certain Markov chain Monte Carlo algorithms.

1310.3240 2026-06-04 cs.IT cs.NA math.FA math.IT math.NA math.OC math.ST stat.TH

Phase Retrieval from Coded Diffraction Patterns

从编码衍射图案中恢复相位

Emmanuel Candes, Xiaodong Li, Mahdi Soltanolkotabi

AI总结 本文研究了从仅强度测量中恢复物体相位的问题,提出通过随机调制产生编码衍射图案,并证明PhaseLift算法能以对数次数的调制精确恢复相位信息。

详情
AI中文摘要

本文考虑了从仅强度测量中恢复物体相位的问题,该问题自然出现在X射线晶体学等领域。我们研究了一个物理现实的设置,其中可以调制感兴趣的信号,然后收集其衍射图案的强度,每次调制都会产生一种编码衍射图案。我们证明PhaseLift,一种近期的凸优化技术,可以从随机调制的少量次数中精确恢复相位信息,该次数与未知数的数量呈多项对数关系。使用无噪声和有噪声的数据的数值实验补充了我们的理论分析并展示了我们的方法。

英文摘要

This paper considers the question of recovering the phase of an object from intensity-only measurements, a problem which naturally appears in X-ray crystallography and related disciplines. We study a physically realistic setup where one can modulate the signal of interest and then collect the intensity of its diffraction pattern, each modulation thereby producing a sort of coded diffraction pattern. We show that PhaseLift, a recent convex programming technique, recovers the phase information exactly from a number of random modulations, which is polylogarithmic in the number of unknowns. Numerical experiments with noiseless and noisy data complement our theoretical analysis and illustrate our approach.
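
The measurement model described above can be sketched in a few lines: each modulation multiplies the signal entrywise by a mask, and only the squared magnitude of the Fourier transform is recorded. The mask distribution (i.i.d. entries from {1, -1, i, -i}) and the sizes below are assumptions for illustration. The sketch generates data only and does not implement PhaseLift.

```python
import numpy as np

def coded_diffraction_patterns(x, masks):
    """Intensity measurements |FFT(mask * x)|^2, one row per modulation."""
    return np.abs(np.fft.fft(masks * x, axis=1)) ** 2

rng = np.random.default_rng(0)
n, n_masks = 16, 6
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
# Hypothetical modulation: i.i.d. entries drawn from {1, -1, 1j, -1j}.
masks = rng.choice(np.array([1, -1, 1j, -1j]), size=(n_masks, n))

y = coded_diffraction_patterns(x, masks)
# A global phase factor leaves every intensity unchanged.
y_rotated = coded_diffraction_patterns(np.exp(1j * 0.7) * x, masks)
```

Because `y` and `y_rotated` coincide, recovery from such data can only be exact up to a global phase.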

1310.7679 2026-06-04 eess.SY cs.SY stat.ML

Structured Optimal Transmission Control in Network-coded Two-way Relay Channels

网络编码双向中继信道中的结构最优传输控制

Ni Ding, Parastoo Sadeghi, Rodney A. Kennedy

AI总结 本文研究网络编码双向中继信道中的传输控制问题,通过马尔可夫决策过程建模,提出基于动态规划的最优策略,证明在特定条件下最优策略随队列状态或信道状态非递减,以降低复杂度并实现实时控制。

详情
Comments
32 pages
AI中文摘要

本文考虑网络编码双向中继信道(NC-TWRC)中的传输控制问题,其中中继器缓冲来自两个用户的随机符号到达,并假设信道存在衰落。该问题通过折扣无限 horizon 马尔可夫决策过程(MDP)建模。目标是找到一个传输控制策略,以最小化符号延迟、缓冲溢出、传输功率消耗和误码率。通过子模性、多模性和 L-自然凸性等概念,研究动态规划(DP)算法搜索的最优策略结构。证明在某些条件下,最优传输策略随队列占用或/且信道状态非递减。本文结果可用于降低 DP 的复杂度并促进实时控制。

英文摘要

This paper considers a transmission control problem in network-coded two-way relay channels (NC-TWRC), where the relay buffers random symbol arrivals from two users, and the channels are assumed to be fading. The problem is modeled by a discounted infinite horizon Markov decision process (MDP). The objective is to find a transmission control policy that simultaneously minimizes the symbol delay, buffer overflow, transmission power consumption and error rate in the long run. By using the concepts of submodularity, multimodularity and L-natural convexity, we study the structure of the optimal policy found by the dynamic programming (DP) algorithm. We show that the optimal transmission policy is nondecreasing in queue occupancies and/or channel states under certain conditions, such as the chosen values of parameters in the MDP model, the channel modeling method, the modulation scheme and the preservation of stochastic dominance in the transitions of system states. The results derived in this paper can be used to relieve the high complexity of DP and facilitate real-time control.

1310.3697 2026-06-04 stat.ML cs.LG cs.SY eess.SY

Variance Adjusted Actor Critic Algorithms

方差调整的actor-critic算法

Aviv Tamar, Shie Mannor

AI总结 本文提出了一种针对MDP的actor-critic框架,目标为方差调整的预期回报。通过线性函数逼近和扩展兼容特征概念,提出了一种分回合算法,并证明其几乎必然收敛到目标函数的局部最优解。

详情
AI中文摘要

我们提出了一种适用于马尔可夫决策过程的actor-critic框架,其目标为方差调整的预期回报。我们的 critic 使用线性函数逼近,并将兼容特征的概念扩展到方差调整的设定。我们提出了一种分回合的actor-critic算法,并证明该算法几乎必然收敛到目标函数的局部最优解。

英文摘要

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return. Our critic uses linear function approximation, and we extend the concept of compatible features to the variance-adjusted setting. We present an episodic actor-critic algorithm and show that it converges almost surely to a locally optimal point of the objective function.

1309.1913 2026-06-04 math.OC cs.SY eess.SY math.ST stat.TH

Dynamic Team Theory of Stochastic Differential Decision Systems with Decentralized Noisy Information Structures via Girsanov's Measure Transformation

动态团队理论:通过Girsanov测度变换的随机微分决策系统的去中心化噪声信息结构

Charalambos D. Charalambous, Nasir U. Ahmed

AI总结 本文提出两种方法,将静态团队理论推广到动态团队理论,通过连续时间随机非线性微分去中心化决策系统,利用Girsanov测度变换获得等价动态团队问题,研究团队最优条件和放松策略的存在性。

详情
Comments
50 pages
AI中文摘要

本文提出两种方法,将静态团队理论推广到动态团队理论,研究连续时间随机非线性微分去中心化决策系统。通过Girsanov测度变换,将观测和信息结构与团队决策分离。第一种方法基于函数空间积分与Wiener测度的乘积,推广Witsenhausen对离散时间静态和动态团队问题等价性的定义。第二种方法基于随机Pontryagin极大原理,团队最优条件由包含正向和反向随机微分方程的

英文摘要

In this paper, we present two methods which generalize static team theory to dynamic team theory, in the context of continuous-time stochastic nonlinear differential decentralized decision systems, with relaxed strategies, which are measurable with respect to different noisy information structures. For both methods we apply Girsanov's measure transformation to obtain an equivalent dynamic team problem under a reference probability measure, so that the observations and information structures available for decisions are not affected by any of the team decisions. The first method is based on function space integration with respect to products of Wiener measures, and generalizes Witsenhausen's [1] definition of equivalence between discrete-time static and dynamic team problems. The second method is based on the stochastic Pontryagin maximum principle. The team optimality conditions are given by a "Hamiltonian System" consisting of forward and backward stochastic differential equations, and a conditional variational Hamiltonian with respect to the information structure of each team member, expressed under the initial and a reference probability space via Girsanov's measure transformation. Under global convexity conditions, we show that person-by-person (PbP) optimality implies team optimality. In addition, we also show existence of team and PbP optimal relaxed decentralized strategies (conditional distributions), in the weak$^*$ sense, without imposing convexity on the action spaces of the team members. Moreover, using the embedding of regular strategies into relaxed strategies, we also obtain team and PbP optimality conditions for regular team strategies, which are measurable functions of decentralized information structures, and we use the Krein-Milman theorem to show realizability of relaxed strategies by regular strategies.

1309.2375 2026-06-04 stat.ML cs.LG cs.NA math.NA stat.CO

Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization

用于正则化损失最小化的加速近端随机对偶坐标上升法

Shai Shalev-Shwartz, Tong Zhang

AI总结 本文提出一种正则化损失最小化问题的加速随机对偶坐标上升方法,并通过内-外迭代流程提升其效率,改进了支持向量机、逻辑回归等关键机器学习优化问题的理论结果。

详情
AI中文摘要

我们介绍了一种随机对偶坐标上升方法的近端版本,并展示了如何通过内-外迭代流程加速该方法。我们分析了该框架的运行时间并获得了改进现有最先进结果的速率,适用于支持向量机、逻辑回归、岭回归、Lasso和多类支持向量机等多种关键机器学习优化问题。实验验证了我们的理论发现。

英文摘要

We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. We analyze the runtime of the framework and obtain rates that improve state-of-the-art results for various key machine learning optimization problems including SVM, logistic regression, ridge regression, Lasso, and multiclass SVM. Experiments validate our theoretical findings.
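
As a point of reference, plain (non-accelerated, non-proximal) stochastic dual coordinate ascent has a closed-form coordinate update for ridge regression. The sketch below assumes the primal objective P(w) = (1/n) * sum_i 0.5 * (x_i . w - y_i)^2 + (lam/2) * ||w||^2 and is meant only to show the basic dual update that the paper builds on. The data, `lam`, and the epoch count are illustrative assumptions.

```python
import numpy as np

def sdca_ridge(X, y, lam, n_epochs=200, seed=0):
    """Plain SDCA for ridge regression with loss 0.5 * (x_i @ w - y_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                    # dual variables, one per example
    w = np.zeros(d)                        # maintained as X.T @ alpha / (lam * n)
    sq_norms = np.sum(X ** 2, axis=1)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            # Closed-form maximizer of the dual objective along coordinate i.
            delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + sq_norms[i] / (lam * n))
            alpha[i] += delta
            w += delta * X[i] / (lam * n)
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(50)
lam = 0.1
w_sdca = sdca_ridge(X, y, lam)
# Closed-form ridge solution for comparison.
w_exact = np.linalg.solve(X.T @ X / len(y) + lam * np.eye(5), X.T @ y / len(y))
```

Comparing `w_sdca` with `w_exact` gives a quick sanity check that the dual updates drive the primal iterate to the ridge solution.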

1309.1864 2026-06-04 eess.SY cs.SY stat.AP

Timing estimation in distributed sensor and control systems with central processing

具有中央处理的分布式传感器和控制系统中的定时估计

John-Olof Nilsson, Peter Händel

AI总结 本文研究了在具有中央处理的分布式传感器和控制系统中直接估计测量和执行时间的方法,提出了一种构造系统时间的启发式方法,并展示了外围单元时间戳如何减少抖动,最后通过数值示例展示了现代系统设计。

详情
AI中文摘要

我们考虑了在具有中央处理的分布式传感器和控制系统中估计测量和执行时间的问题。重点是直接时间估计,适用于无法或不希望进行时钟同步的场景。时间模型和中央及外围时间戳模型源于底层时钟和通信延迟的定义和模型。提出了构造系统时间的启发式方法,并概述了如何处理联合时间和植物状态估计。对于一组简单的底层时钟和通信延迟模型,展示了外围单元时间戳如何减少抖动,并论证了在一般情况下会显著减少抖动。最后,给出了一个现代系统设计的数值示例。

英文摘要

We consider the problem of estimating the timing of measurements and actuation in distributed sensor and control systems with central processing. The focus is on direct timing estimation for scenarios where clock synchronization is not feasible or desirable. Models of the timing and of the central and peripheral time stamps are motivated and derived from underlying clock and communication delay definitions and models. Heuristics for constructing a system time are presented, and it is outlined how the joint timing and plant state estimation can be handled. For a simple set of underlying clock and communication delay models, inclusion of peripheral unit time stamps is shown to reduce jitter, and it is argued that in general it will give significant jitter reduction. Finally, a numerical example is given of a contemporary system design.

1308.6774 2026-06-04 math.OC cs.DC cs.NA math.NA stat.ML

Separable Approximations and Decomposition Methods for the Augmented Lagrangian

可分离近似与分解方法用于增广拉格朗日法

Rachael Tappenden, Peter Richtarik, Burak Buke

AI总结 本文研究了基于可分离近似的分解方法用于最小化增广拉格朗日函数,比较了Mulvey和Ruszczyński的DQAM与Richtárik和Takáč的PCDM,证明两者在可行性问题中等价,且在强凸性下改进了PCDM的复杂度界。

详情
Comments
28 pages, 6 algorithms, 2 figures
AI中文摘要

本文研究了基于可分离近似的分解方法用于最小化增广拉格朗日函数。特别地,我们研究并比较了Mulvey和Ruszczyński的对角二次近似方法(DQAM)和Richtárik与Takáč的并行坐标下降方法(PCDM)。我们证明,这两种方法在可行性问题中,仅在步长参数的选择上有所不同即等价。此外,我们证明了在强凸性条件下,PCDM的复杂度界得到改进,且该界至少比DQAM的最佳已知界好$8(L'/\bar{L})(ω-1)^2$倍,其中$ω$是部分可分离度,$L'$和$\bar{L}$是增广拉格朗日函数中二次惩罚项梯度的块Lipschitz常数的最大值和平均值。

英文摘要

In this paper we study decomposition methods based on separable approximations for minimizing the augmented Lagrangian. In particular, we study and compare the Diagonal Quadratic Approximation Method (DQAM) of Mulvey and Ruszczyński and the Parallel Coordinate Descent Method (PCDM) of Richtárik and Takáč. We show that the two methods are equivalent for feasibility problems up to the selection of a single step-size parameter. Furthermore, we prove an improved complexity bound for PCDM under strong convexity, and show that this bound is at least $8(L'/\bar{L})(ω-1)^2$ times better than the best known bound for DQAM, where $ω$ is the degree of partial separability and $L'$ and $\bar{L}$ are the maximum and average of the block Lipschitz constants of the gradient of the quadratic penalty appearing in the augmented Lagrangian.

1308.5697 2026-06-04 math.NA cs.NA stat.CO

Randomized algorithms for low-rank matrix factorizations: sharp performance bounds

随机算法用于低秩矩阵分解:精确性能界限

Rafi Witten, Emmanuel Candes

AI总结 本文研究了用于降维的随机算法,提出了一种新的分析方法,推导出精确的误差估计,并通过数值实验验证了理论结果的紧致性。

详情
AI中文摘要

随机算法在数值线性代数中的发展,例如用于计算近似QR和SVD分解,已成为研究热点。本文研究了文献中最常讨论的降维算法之一,即近似输入矩阵的低秩元素。我们介绍了Martinsson等人(2008)算法的一种新颖且直观的分析方法,使我们能够推导出精确的估计并提供关于其性能的新见解。这种分析提供了关于近似误差的理论保证,同时给出了性能的极限(下界),表明我们的上界是紧致的。数值实验补充了我们的研究,展示了我们的预测与经验观察相比的紧致性。

英文摘要

The development of randomized algorithms for numerical linear algebra, e.g. for computing approximate QR and SVD factorizations, has recently become an intense area of research. This paper studies one of the most frequently discussed algorithms in the literature for dimensionality reduction---specifically for approximating an input matrix with a low-rank element. We introduce a novel and rather intuitive analysis of the algorithm in Martinsson et al. (2008), which allows us to derive sharp estimates and give new insights about its performance. This analysis yields theoretical guarantees about the approximation error and at the same time, ultimate limits of performance (lower bounds) showing that our upper bounds are tight. Numerical experiments complement our study and show the tightness of our predictions compared with empirical observations.
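
The family of algorithms discussed here is often described as a randomized range finder: multiply the input by a random test matrix, orthonormalize the result, and project. The following is a minimal sketch under that generic description, assuming a Gaussian test matrix. It is not claimed to match the analyzed algorithm in every detail, and the sizes are illustrative.

```python
import numpy as np

def randomized_low_rank(A, k, rng):
    """Return Q (orthonormal columns) and B = Q.T @ A so that A is approximated by Q @ B."""
    omega = rng.standard_normal((A.shape[1], k))   # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ omega)                 # orthonormal basis for the sampled range
    return Q, Q.T @ A

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))   # exactly rank 3
Q, B = randomized_low_rank(A, k=5, rng=rng)
rel_err = np.linalg.norm(A - Q @ B) / np.linalg.norm(A)
```

For an input of exact rank 3 and `k = 5`, the sampled range captures the column space almost surely, so `rel_err` sits at the level of floating-point round-off. For general matrices the error depends on the decay of the singular values, which is what the sharp bounds quantify.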

1308.5576 2026-06-04 stat.ML cs.IT cs.SY eess.SY math.IT

A Comparison of Algorithms for Learning Hidden Variables in Normal Graphs

正常图中学习隐藏变量算法的比较

Francesco A. N. Palmieri

AI总结 本文比较了正常图中学习隐藏变量的算法,通过约束最大似然和KL散度准则推导局部适应规则,并在合成数据集上验证了不同算法的性能。

详情
Comments
Submitted for journal publication
AI中文摘要

一个贝叶斯因子图简化为正常形式,由分流单元(或等约束单元)和单输入/单输出(SISO)块的互连组成。在该框架中,通过约束最大似然公式和最小KL散度准则,利用KKT条件显式推导出局部适应规则。学习算法与基于维特比似然和变分近似两种更新方程的算法进行比较。在各种架构的合成数据集上验证了各种算法的性能。本文的目标是为程序员提供明确的算法,以便快速部署贝叶斯图到应用中。

英文摘要

A Bayesian factor graph reduced to normal form consists of the interconnection of diverter units (or equal constraint units) and Single-Input/Single-Output (SISO) blocks. In this framework, localized adaptation rules are explicitly derived from a constrained maximum likelihood (ML) formulation and from a minimum KL-divergence criterion using KKT conditions. The learning algorithms are compared with two other updating equations, based on a Viterbi-like approximation and on a variational approximation, respectively. The performance of the various algorithms is verified on synthetic data sets for various architectures. The objective of this paper is to provide the programmer with explicit algorithms for rapid deployment of Bayesian graphs in applications.

1308.3015 2026-06-04 cs.RO cs.SY eess.SY stat.CO stat.ME

On Generalized Bayesian Data Fusion with Complex Models in Large Scale Networks

在大规模网络中使用复杂模型的广义贝叶斯数据融合

Nisar Ahmed, Tsung-Lin Yang, Mark Campbell

AI总结 本文提出新的广义贝叶斯分布式数据融合算法,用于处理动态网络拓扑和复杂信念模型,通过混合pdf和条件因子提升多机器人目标搜索中的融合效果。

详情
Comments
Revised version of paper submitted to 2013 Workshop on Wireless Intelligent Sensor Networks (WISeNET 2013) at Duke University, June 5, 2013
AI中文摘要

近年来,通信、移动计算和人工智能的进步大大扩展了智能分布式传感器网络的应用空间。这反过来推动了开发广义贝叶斯分布式数据融合(DDF)算法,以在自主代理之间实现鲁棒且高效的资源共享,使用概率信念模型。然而,DDF在需要使用动态/即需网络拓扑和复杂信念模型(如高斯混合或混合贝叶斯网络)的一般现实应用中显著难以实施。为了解决这些问题,我们首先讨论了关于精确DDF和保守近似DDF的一些新关键数学见解。这些见解随后被用来开发基于混合pdf和条件因子的新型广义DDF算法。受多机器人目标搜索启发的数值示例显示,我们的方法导致显著更好的融合结果,因此有潜力增强传感器网络中的分布式智能推理。

英文摘要

Recent advances in communications, mobile computing, and artificial intelligence have greatly expanded the application space of intelligent distributed sensor networks. This in turn motivates the development of generalized Bayesian decentralized data fusion (DDF) algorithms for robust and efficient information sharing among autonomous agents using probabilistic belief models. However, DDF is significantly challenging to implement for general real-world applications requiring the use of dynamic/ad hoc network topologies and complex belief models, such as Gaussian mixtures or hybrid Bayesian networks. To tackle these issues, we first discuss some new key mathematical insights about exact DDF and conservative approximations to DDF. These insights are then used to develop novel generalized DDF algorithms for complex beliefs based on mixture pdfs and conditional factors. Numerical examples motivated by multi-robot target search demonstrate that our methods lead to significantly better fusion results, and thus have great potential to enhance distributed intelligent reasoning in sensor networks.

1308.2853 2026-06-04 cs.LG cs.IR cs.NA math.NA math.ST stat.ML stat.TH

When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

何时过完备主题模型是可识别的?具有结构稀疏性的张量Tucker分解的唯一性

Animashree Anandkumar, Daniel Hsu, Majid Janzamin, Sham Kakade

AI总结 本文研究了过完备主题模型在特定阶可观察矩下的可识别性,提出通过结构稀疏性约束实现张量Tucker分解的唯一性。

详情
AI中文摘要

过完备潜在表示近年来在无监督特征学习中非常流行。本文指明哪些过完备模型在特定阶的可观察矩下可识别。我们考虑在过完备 regime 中的概率混合或主题模型,其中潜在主题数量远超观察词汇量。尽管一般过完备主题模型不可识别,但通过引入称为主题持续性的约束,我们建立了通用可识别性条件。这些条件涉及对主题-词汇矩阵或模型总体结构的新“高阶”展开条件。这些高阶展开条件允许过完备模型,并要求存在从潜在主题到高阶观察词汇的完美匹配。我们证明在过完备 regime 中,随机结构主题模型以高概率可识别。我们的可识别性结果允许一般(非退化)分布建模主题比例,从而在框架中处理任意相关的主题。我们的可识别性结果暗示了一类具有结构稀疏性的张量分解的唯一性,该类包含在Tucker分解中,但比Candecomp/Parafac(CP)分解更一般。

英文摘要

Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of "higher order" expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allows for overcomplete models, and requires the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable w.h.p. in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity which is contained in the class of Tucker decompositions, but is more general than the Candecomp/Parafac (CP) decomposition.

1306.2665 2026-06-04 cs.IT cs.LG cs.SY eess.SY math.IT math.OC stat.ML

Precisely Verifying the Null Space Conditions in Compressed Sensing: A Sandwiching Algorithm

在压缩感知中精确验证零空间条件:一种 Sandwiching 算法

Myung Cho, Weiyu Xu

AI总结 本文提出新算法验证压缩感知中的零空间条件,通过高效计算α_k,改进了传统方法的复杂度和精度。

详情
Comments
30 pages
AI中文摘要

本文提出新的高效算法来验证压缩感知(CS)中的零空间条件。给定一个$(n-m) \times n$($m>0$)的CS矩阵$A$和正数$k$,我们感兴趣于计算$\displaystyle α_k = \max_{\{z: Az=0,z\neq 0\}}\max_{\{K: |K|\leq k\}} \frac{\|z_K\|_{1}}{\|z\|_{1}}$,其中$K$代表$\{1,2,...,n\}$的子集,$|K|$是$K$的基数。特别地,我们关注找到使得$α_k < \frac{1}{2}$的最大$k$。然而,计算$α_k$被认为极具挑战性。本文首先提出一系列新的多项式时间算法来计算$α_k$的上界。基于这些新算法,我们进一步设计了一种新的Sandwiching算法,以大大降低的复杂度计算精确的$α_k$。当需要时,这种新的Sandwiching算法还能在计算复杂度和结果精度之间实现平滑的权衡。实验结果表明我们的算法相比现有方法在性能上有所改进;并且我们的算法输出精确的$α_k$值,其复杂度远低于穷举搜索。

英文摘要

In this paper, we propose new efficient algorithms to verify the null space condition in compressed sensing (CS). Given an $(n-m) \times n$ ($m>0$) CS matrix $A$ and a positive $k$, we are interested in computing $\displaystyle α_k = \max_{\{z: Az=0,z\neq 0\}}\max_{\{K: |K|\leq k\}} \frac{\|z_K \|_{1}}{\|z\|_{1}}$, where $K$ represents subsets of $\{1,2,...,n\}$, and $|K|$ is the cardinality of $K$. In particular, we are interested in finding the maximum $k$ such that $α_k < \frac{1}{2}$. However, computing $α_k$ is known to be extremely challenging. In this paper, we first propose a series of new polynomial-time algorithms to compute upper bounds on $α_k$. Based on these new polynomial-time algorithms, we further design a new sandwiching algorithm, to compute the \emph{exact} $α_k$ with greatly reduced complexity. When needed, this new sandwiching algorithm also achieves a smooth tradeoff between computational complexity and result accuracy. Empirical results show the performance improvements of our algorithm over existing known methods; and our algorithm outputs precise values of $α_k$, with much lower complexity than exhaustive search.
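
To make the quantity concrete: for a fixed null-space vector z, the inner maximization over index sets K is attained by the k largest magnitudes of z. The sketch below computes that inner ratio and, as a separate illustration, a Monte Carlo lower bound on the outer maximum obtained from random null-space vectors. It is neither the upper-bound algorithms nor the sandwiching algorithm of the paper; the sampling scheme, tolerance, and names are assumptions, and the second function assumes A has a nontrivial null space.

```python
import numpy as np

def top_k_l1_ratio(z, k):
    """max over |K| <= k of ||z_K||_1 / ||z||_1, attained by the k largest |z_i|."""
    mags = np.sort(np.abs(z))[::-1]
    return float(mags[:k].sum() / mags.sum())

def alpha_k_lower_bound(A, k, n_samples=1000, seed=0):
    """Monte Carlo lower bound on alpha_k from random null-space vectors (not the exact value)."""
    rng = np.random.default_rng(seed)
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10 * s.max()))
    null_basis = vt[rank:].T                   # columns span {z : A z = 0}
    coeffs = rng.standard_normal((null_basis.shape[1], n_samples))
    return max(top_k_l1_ratio(z, k) for z in (null_basis @ coeffs).T)

ratio = top_k_l1_ratio(np.array([3.0, -1.0, 0.5, 0.5]), k=1)
bound = alpha_k_lower_bound(np.array([[1.0, -1.0, 0.0]]), k=1)
```

For the vector (3, -1, 0.5, 0.5) and k = 1 the ratio is 3/5. Random sampling can only certify a lower bound, which is why certified upper bounds are needed to pin down the exact value.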

1308.1509 2026-06-04 eess.SY cs.IT cs.SY math.IT math.OC stat.AP

Monotone Smoothing Splines Using General Linear Systems

使用一般线性系统的单调平滑样条

Masaaki Nagahara, Clyde F. Martin

AI总结 本文提出了一种使用一般线性系统求解单调平滑样条的方法,解决了曲线生成器不是二阶积分器时保持单调性的难题,通过半无限二次规划和离散化技术求解,并证明了收敛性。

详情
Journal ref
Asian Journal of Control, Vol. 5, No. 2, pp. 461-468, Mar. 2013
AI中文摘要

本文提出了一种使用一般线性系统求解单调平滑样条问题的方法。该问题,也称为单调控制理论样条,仅在曲线生成器由二阶积分器建模时得到解决,而其他情况下尚未解决。问题的难点在于单调性约束必须在具有连续统基数的区间上满足。为此,首先将问题形式化为半无限二次规划问题,然后采用离散化技术得到有限维二次规划问题。证明了有限维问题的解始终满足无限维的单调性约束。同时证明了近似解随着离散化网格尺寸趋于零时收敛到精确解。通过一个例子展示了所提方法的有效性。

英文摘要

In this paper, a method is proposed to solve the problem of monotone smoothing splines using general linear systems. This problem, also called monotone control theoretic splines, has been solved only when the curve generator is modeled by the second-order integrator, but not for other cases. The difficulty in the problem is that the monotonicity constraint should be satisfied over an interval which has the cardinality of the continuum. To solve this problem, we first formulate it as a semi-infinite quadratic programming problem, and then we adopt a discretization technique to obtain a finite-dimensional quadratic programming problem. It is shown that the solution of the finite-dimensional problem always satisfies the infinite-dimensional monotonicity constraint. It is also proved that the approximated solution converges to the exact solution as the discretization grid-size tends to zero. An example is presented to show the effectiveness of the proposed method.

1308.1313 2026-06-04 math.NA cs.NA math.OC stat.CO stat.ME

A computational framework for infinite-dimensional Bayesian inverse problems. Part I: The linearized case, with application to global seismic inversion

无限维贝叶斯反问题计算框架。第一部分:线性化情况,及其在全球地震反演中的应用

Tan Bui-Thanh, Omar Ghattas, James Martin, Georg Stadler

AI总结 本文提出一种计算框架,用于估计线性化无限维统计反问题数值解的不确定性,通过贝叶斯推断方法解决参数场的后验分布问题,并通过低秩近似和高效算法实现高维参数的可扩展计算。

详情
Comments
30 pages; to appear in SIAM Journal on Scientific Computing
AI中文摘要

我们提出了一种计算框架,用于估计线性化无限维统计反问题数值解的不确定性。我们采用贝叶斯推断方法:给定观测数据及其不确定性,正向问题及其不确定性,以及描述参数场不确定性的先验概率分布,找到参数场上的后验概率分布。先验必须适当选择以保证无限维反问题的适定性,并促进后验计算。此外,直接离散化可能无法导致无限维问题的收敛近似。最后,通过显式构造协方差矩阵来求解离散反问题是不可行的,因为需要求解正向问题的次数与参数数量相同。我们的计算框架基于Stuart提出的无限维公式,并结合了多个组件以确保无限维反问题的收敛离散化。该框架还结合了处理先验、构造数据驱动的后验协方差算子的低秩近似以及探索后验的方法,确保整个框架能够扩展到非常高的参数维度。我们在此计算框架上展示了在具有数十万参数的3D全球地震波传播反问题中的贝叶斯解法。

英文摘要

We present a computational framework for estimating the uncertainty in the numerical solution of linearized infinite-dimensional statistical inverse problems. We adopt the Bayesian inference formulation: given observational data and their uncertainty, the governing forward problem and its uncertainty, and a prior probability distribution describing uncertainty in the parameter field, find the posterior probability distribution over the parameter field. The prior must be chosen appropriately in order to guarantee well-posedness of the infinite-dimensional inverse problem and facilitate computation of the posterior. Furthermore, straightforward discretizations may not lead to convergent approximations of the infinite-dimensional problem. And finally, solution of the discretized inverse problem via explicit construction of the covariance matrix is prohibitive due to the need to solve the forward problem as many times as there are parameters. Our computational framework builds on the infinite-dimensional formulation proposed by Stuart (A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numerica, 19 (2010), pp. 451-559), and incorporates a number of components aimed at ensuring a convergent discretization of the underlying infinite-dimensional inverse problem. The framework additionally incorporates algorithms for manipulating the prior, constructing a low rank approximation of the data-informed component of the posterior covariance operator, and exploring the posterior that together ensure scalability of the entire framework to very high parameter dimensions. We demonstrate this computational framework on the Bayesian solution of an inverse problem in 3D global seismic wave propagation with hundreds of thousands of parameters.

1308.0384 2026-06-04 eess.SY cs.IT cs.SY math.IT math.OC stat.AP

L1-Optimal Splines for Outlier Rejection

L1最优样条用于异常值剔除

Masaaki Nagahara, Clyde F. Martin

AI总结 本文提出基于L1优化的控制理论样条,用于数据中的异常值剔除。通过L1优化提升鲁棒性,克服L2优化对异常值的敏感性。

详情
Comments
Submitted to the 59th World Statistics Congress (WSC), Aug. 2013
AI中文摘要

本文考虑了用于数据异常值剔除的控制理论样条与L1优化。控制理论样条根据成本函数和线性微分方程约束分为插值或平滑样条。由于估计基于L2优化,控制理论样条在高斯噪声下有效。然而,实践中数据中可能存在异常值,这在高斯噪声假设下可能概率极低,而L2优化样条对此敏感。为提高鲁棒性,本文提出使用L1最优性,其也用于支持向量回归。数值示例展示了所提方法的有效性。

英文摘要

In this article, we consider control theoretic splines with L1 optimization for rejecting outliers in data. Control theoretic splines are either interpolating or smoothing splines, depending on a cost function with a constraint defined by linear differential equations. Control theoretic splines are effective for Gaussian noise in data since the estimation is based on L2 optimization. However, in practice, there may be outliers in data, which may occur with vanishingly small probability under the Gaussian assumption of noise, to which L2-optimized spline regression may be very sensitive. To achieve robustness against outliers, we propose to use L1 optimality, which is also used in support vector regression. A numerical example shows the effectiveness of the proposed method.
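The sensitivity of L2 fitting to outliers, and the robustness of L1 fitting, can be seen in the simplest possible setting: fitting a constant. This is only an illustrative sketch with made-up data, not the control theoretic spline of the paper. The L2-optimal constant is the mean and the L1-optimal constant is a median.

```python
import statistics

# Hypothetical measurements: a constant signal near 1.0 plus one gross outlier.
samples = [0.9, 1.1, 1.0, 0.95, 1.05, 50.0]

# The L2-optimal constant minimizes sum (y - c)^2, which is the mean.
l2_fit = statistics.mean(samples)
# The L1-optimal constant minimizes sum |y - c|, which is a median.
l1_fit = statistics.median(samples)


def l1_cost(c):
    """Sum of absolute residuals for the constant fit c."""
    return sum(abs(y - c) for y in samples)


def l2_cost(c):
    """Sum of squared residuals for the constant fit c."""
    return sum((y - c) ** 2 for y in samples)


print("L2 fit:", l2_fit, "L1 fit:", l1_fit)
```

The single outlier drags the L2 fit far away from the bulk of the data, while the L1 fit stays close to 1.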

1307.6127 2026-06-04 stat.CO cs.NA math.NA

Sequential Monte Carlo Methods for High-Dimensional Inverse Problems: A case study for the Navier-Stokes equations

高维逆问题中的序贯蒙特卡洛方法:纳维-斯托克斯方程的案例研究

Nikolas Kantas, Alexandros Beskos, Ajay Jasra

AI总结 本文提出了一种用于高维逆问题的通用序贯蒙特卡洛(SMC)采样方法,该方法建立在马尔可夫链蒙特卡洛(MCMC)技术之上,在数值实验中达到与MCMC相同的精度,但效率高得多。

详情
Comments
31 pages, 14 figures
AI中文摘要

我们考虑估计偏微分方程初始条件的逆问题,该方程仅能通过离散时刻的含噪测量被观测。特别地,我们关注如下情形:欧拉测量取自随时间和空间演化的向量场,其演化服从定义在环面上的二维纳维-斯托克斯方程。这一背景与数值天气预报和数据同化领域尤为相关。我们采用由特定正则化得到的贝叶斯公式,该正则化保证问题是适定的。在基于蒙特卡洛的推断中,从初始条件的高维后验分布中获取样本是一项具有挑战性的任务。在实际的数据同化应用中,计算方法通常依赖启发式方法和高斯近似;在动力学和观测为非线性的情况下,由此得到的推断是有偏的且缺乏充分依据。另一方面,蒙特卡洛方法可以以有原则的方式同化数据,但由于问题的高维性而常被认为效率低下。本文提出了一种用于高维逆问题的通用序贯蒙特卡洛(SMC)采样方法,以克服这些困难。该方法建立在马尔可夫链蒙特卡洛(MCMC)技术之上,后者目前被视为评估实际使用的数据同化算法的基准。在我们的数值算例中,所提出的SMC方法达到了与MCMC相同的精度,但效率高得多。

英文摘要

We consider the inverse problem of estimating the initial condition of a partial differential equation, which is only observed through noisy measurements at discrete time intervals. In particular, we focus on the case where Eulerian measurements are obtained from the time and space evolving vector field, whose evolution obeys the two-dimensional Navier-Stokes equations defined on a torus. This context is particularly relevant to the area of numerical weather forecasting and data assimilation. We will adopt a Bayesian formulation resulting from a particular regularization that ensures the problem is well posed. In the context of Monte Carlo based inference, it is a challenging task to obtain samples from the resulting high dimensional posterior on the initial condition. In real data assimilation applications it is common for computational methods to invoke the use of heuristics and Gaussian approximations. The resulting inferences are biased and not well-justified in the presence of non-linear dynamics and observations. On the other hand, Monte Carlo methods can be used to assimilate data in a principled manner, but are often perceived as inefficient in this context due to the high-dimensionality of the problem. In this work we will propose a generic Sequential Monte Carlo (SMC) sampling approach for high dimensional inverse problems that overcomes these difficulties. The method builds upon Markov chain Monte Carlo (MCMC) techniques, which are currently considered as benchmarks for evaluating data assimilation algorithms used in practice. In our numerical examples, the proposed SMC approach achieves the same accuracy as MCMC but in a much more efficient manner.

1307.5494 2026-06-04 math.NA cs.LG cs.NA stat.ML

On GROUSE and Incremental SVD

关于GROUSE和增量SVD

Laura Balzano, Stephen J. Wright

AI总结 本文改进增量SVD以处理缺失数据,并证明其与特定参数下的GROUSE等价,探讨了增量算法在子空间估计中的应用。

详情
AI中文摘要

GROUSE(Grassmannian Rank-One Update Subspace Estimation)是一种增量算法,用于从位于某子空间中的向量序列识别$\mathbb{R}^n$的该子空间,其中每次迭代仅揭示每个向量的部分分量。近期分析表明,在某些假设下,GROUSE以期望线性速率局部收敛。GROUSE与增量奇异值分解算法有相似之处,后者在添加单列后更新矩阵的SVD。本文改进增量SVD方法以处理缺失数据,并证明在某一算法参数的特定选择下,该改进方法等价于GROUSE。

英文摘要

GROUSE (Grassmannian Rank-One Update Subspace Estimation) is an incremental algorithm for identifying a subspace of $\mathbb{R}^n$ from a sequence of vectors in this subspace, where only a subset of components of each vector is revealed at each iteration. Recent analysis has shown that GROUSE converges locally at an expected linear rate, under certain assumptions. GROUSE has a similar flavor to the incremental singular value decomposition algorithm, which updates the SVD of a matrix following addition of a single column. In this paper, we modify the incremental SVD approach to handle missing data, and demonstrate that this modified approach is equivalent to GROUSE, for a certain choice of an algorithmic parameter.

1307.1223 2026-06-04 math.NA cs.NA math.PR math.ST stat.TH

Fast inverse transform sampling in one and two dimensions

一维和二维中的快速逆变换采样

Sheehan Olver, Alex Townsend

AI总结 本文提出一种高效稳健的算法,用于在一维和二维中生成广泛光滑概率分布的伪随机样本,采用Chebyshev多项式近似方案,结合Chebyshev网格和低秩函数近似。

详情
Comments
10 pages
AI中文摘要

我们开发了一种计算高效且稳健的算法,用于在一维和二维中生成广泛光滑概率分布的伪随机样本。该算法基于逆变换采样,结合多项式近似方案,使用Chebyshev多项式、Chebyshev网格和低秩函数近似。数值实验表明,我们的算法在性能上优于现有方法。

英文摘要

We develop a computationally efficient and robust algorithm for generating pseudo-random samples from a broad class of smooth probability distributions in one and two dimensions. The algorithm is based on inverse transform sampling with a polynomial approximation scheme using Chebyshev polynomials, Chebyshev grids, and low rank function approximation. Numerical experiments demonstrate that our algorithm outperforms existing approaches.
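Inverse transform sampling itself is easy to sketch: build the CDF, draw a uniform number, and invert. The example below is a deliberately simple stand-in that swaps the paper's Chebyshev approximation for a trapezoid-rule CDF on a uniform grid with piecewise-linear inversion. The function name `make_sampler`, the grid size, and the test density are all assumptions made for illustration.

```python
import bisect
import random


def make_sampler(pdf, a, b, n_grid=201):
    """Build an inverse-transform sampler for an unnormalized pdf on [a, b]."""
    xs = [a + (b - a) * i / (n_grid - 1) for i in range(n_grid)]
    ps = [pdf(x) for x in xs]
    # Cumulative trapezoid rule gives the unnormalized CDF on the grid.
    cdf = [0.0]
    for i in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * (ps[i - 1] + ps[i]) * (xs[i] - xs[i - 1]))
    total = cdf[-1]
    cdf = [c / total for c in cdf]

    def sample(rng):
        u = rng.random()
        # Locate the grid cell with cdf[j-1] <= u < cdf[j], then invert linearly.
        j = bisect.bisect_right(cdf, u)
        j = min(max(j, 1), n_grid - 1)
        lo, hi = cdf[j - 1], cdf[j]
        t = 0.0 if hi == lo else (u - lo) / (hi - lo)
        return xs[j - 1] + t * (xs[j] - xs[j - 1])

    return sample


# Example: density proportional to x on [0, 1], whose exact mean is 2/3.
rng = random.Random(0)
sample = make_sampler(lambda x: x, 0.0, 1.0)
draws = [sample(rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print("sample mean:", sample_mean)
```

The piecewise-linear inverse is only as accurate as the grid allows; the polynomial approximation described in the abstract is aimed at exactly this accuracy and cost trade-off for smooth densities.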

1306.5793 2026-06-04 eess.SY cs.NI cs.SY stat.AP stat.ML

A State-Space Approach for Optimal Traffic Monitoring via Network Flow Sampling

基于状态空间的最优网络流采样交通监控方法

Michael Kallitsis, Stilian Stoev, George Michailidis

AI总结 本文提出基于状态空间的优化交通监控方法,通过利用网络流的空间和时间关系,在资源受限条件下实现最优的流量采样策略,提升网络流量估计的准确性。

详情
Comments
preliminary work, short paper
AI中文摘要

IP网络的鲁棒性和完整性需要高效的流量监控与分析工具,这些工具应能随流量规模和网络规模良好扩展。我们研究资源受限条件下计算机网络的最优大规模流监控问题。我们提出了一种随机优化框架,利用流量流之间的空间(跨网络链路)和时间关系进行流量测量。具体而言,给定网络拓扑、网络流的状态空间表征以及每个监控站的采样约束,我们寻求一种最优的分组采样策略,使网络中所有流的流量估计达到最优。最优采样设计由一个凹函数最小化问题给出;随后使用卡尔曼滤波为每个网络流生成一系列流量估计。我们使用真实的Internet2数据评估了我们的算法。

英文摘要

The robustness and integrity of IP networks require efficient tools for traffic monitoring and analysis, which scale well with traffic volume and network size. We address the problem of optimal large-scale flow monitoring of computer networks under resource constraints. We propose a stochastic optimization framework where traffic measurements are done by exploiting the spatial (across network links) and temporal relationship of traffic flows. Specifically, given the network topology, the state-space characterization of network flows and sampling constraints at each monitoring station, we seek an optimal packet sampling strategy that yields the best traffic volume estimation for all flows of the network. The optimal sampling design is the result of a concave minimization problem; then, Kalman filtering is employed to yield a sequence of traffic estimates for each network flow. We evaluate our algorithm using real-world Internet2 data.
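The Kalman filtering step mentioned in the abstract can be illustrated in one dimension. The sketch below assumes a scalar random-walk model for a single flow's volume, with hypothetical process variance `q` and measurement variance `r`. It is not the authors' state-space model and it does not include their sampling design.

```python
def kalman_1d(observations, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, z_t = x_t + v_t.

    q is the variance of w_t and r the variance of v_t (both assumed known).
    Returns a list of (estimate, variance) pairs, one per observation.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: a random walk keeps the mean and inflates the variance by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append((x, p))
    return estimates


# Hypothetical noiseless measurements of a flow with constant volume 10.
track = kalman_1d([10.0] * 50, q=0.01, r=1.0)
final_estimate, final_variance = track[-1]
print(final_estimate, final_variance)
```

In this toy setting the estimate moves from the initial guess toward the measured level while the posterior variance shrinks; a smaller measurement variance `r`, for example from a higher sampling rate, would make the filter trust each measurement more.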

1306.1716 2026-06-04 cs.LG cs.DS cs.NA math.NA stat.ML

Fast greedy algorithm for subspace clustering from corrupted and incomplete data

面向受损和不完整数据的子空间聚类快速贪心算法

Alexander Petukhov, Inna Kozlov

AI总结 本文提出快速贪心稀疏子空间聚类算法FGSSC,能够处理具有高擦除率和错误的含噪数据,其聚类能力优于SSC和GSSC,而计算成本仅略高于SSC。

详情
Comments
arXiv admin note: substantial text overlap with arXiv:1304.4282
AI中文摘要

我们描述了快速贪心稀疏子空间聚类(FGSSC)算法,它为属于若干低维线性或仿射子空间的数据提供了一种高效的聚类方法。我们的算法与以往算法的主要区别在于能够处理同时具有高擦除率(坐标已知的缺失条目)和错误(坐标未知的受损条目)的含噪数据。我们讨论了如何以最高效率实现贪心算法的快速版本,其贪心策略被整合到基础算法的迭代中。我们提供的数值证据表明,在子空间聚类能力上,该快速贪心算法不仅优于作者用作基础算法的现有最先进SSC算法,也优于最近的GSSC算法。同时,其计算成本仅略高于SSC。算法的显著优势在若干合成模型以及Extended Yale B人脸图像数据集上得到了展示。特别是,人脸识别的误分类率比SSC算法低6-20倍。我们还提供了数值证据,表明即使子空间维数之和显著超过环境空间的维数,FGSSC算法也能高效地对受损数据进行聚类。

英文摘要

We describe the Fast Greedy Sparse Subspace Clustering (FGSSC) algorithm, which provides an efficient method for clustering data belonging to a few low-dimensional linear or affine subspaces. The main difference between our algorithm and its predecessors is its ability to work with noisy data having a high rate of erasures (missing entries with known coordinates) and errors (corrupted entries with unknown coordinates). We discuss here how to implement, with maximum efficiency, the fast version of the greedy algorithm, whose greedy strategy is incorporated into the iterations of the basic algorithm. We provide numerical evidence that, in subspace clustering capability, the fast greedy algorithm outperforms not only the existing state-of-the-art SSC algorithm, taken by the authors as the basic algorithm, but also the recent GSSC algorithm. At the same time, its computational cost is only slightly higher than the cost of SSC. Numerical evidence of the algorithm's significant advantage is presented for a few synthetic models as well as for the Extended Yale B dataset of facial images. In particular, the face recognition misclassification rate turned out to be 6-20 times lower than for the SSC algorithm. We also provide numerical evidence that the FGSSC algorithm is able to cluster corrupted data efficiently even when the sum of subspace dimensions significantly exceeds the dimension of the ambient space.

1305.0395 2026-06-04 math.NA cs.LG cs.NA q-bio.NC stat.ML

Tensor Decompositions: A New Concept in Brain Data Analysis?

张量分解:脑数据处理中的新概念?

Andrzej Cichocki

AI总结 本文综述了张量分解在多向BSS/ICA、特征提取、分类和多向PLS回归中的新模型与方法,涵盖约束Tucker和CP模型及惩罚张量分解。

详情
Journal ref
Control Measurement, and System Integration (SICE), special issue; Measurement of Brain Functions and Bio-Signals, 7, 507-517, (2011)
AI中文摘要

矩阵分解及其向张量分解的推广已成为线性和多线性盲源分离(BSS)的重要技术,尤其是多向独立成分分析(ICA)、非负矩阵和张量分解(NMF/NTF)、平滑成分分析(SmoCA)和稀疏成分分析(SCA)。此外,张量分解在多线性BSS之外还有许多潜在应用,尤其是特征提取、分类、降维和多向聚类。本文简要综述了张量分解应用于组与关联(linked)多向BSS/ICA、特征提取、分类以及多向偏最小二乘(MPLS)回归问题的新兴模型和方法。关键词:多线性BSS,关联多向BSS/ICA,张量分解,约束Tucker和CP模型,惩罚张量分解(PTD),特征提取,分类,多向PLS和CCA。

英文摘要

Matrix factorizations and their extensions to tensor factorizations and decompositions have become prominent techniques for linear and multilinear blind source separation (BSS), especially multiway Independent Component Analysis (ICA), Nonnegative Matrix and Tensor Factorization (NMF/NTF), Smooth Component Analysis (SmoCA) and Sparse Component Analysis (SCA). Moreover, tensor decompositions have many other potential applications beyond multilinear BSS, especially feature extraction, classification, dimensionality reduction and multiway clustering. In this paper, we briefly overview new and emerging models and approaches for tensor decompositions in applications to group and linked multiway BSS/ICA, feature extraction, classification and Multiway Partial Least Squares (MPLS) regression problems. Keywords: Multilinear BSS, linked multiway BSS/ICA, tensor factorizations and decompositions, constrained Tucker and CP models, Penalized Tensor Decompositions (PTD), feature extraction, classification, multiway PLS and CCA.

1304.3877 2026-06-04 eess.SY cs.SY math.OC math.ST stat.TH

Linear models based on noisy data and the Frisch scheme

基于噪声数据的线性模型与弗里希方案

Lipeng Ning, Tryphon T. Georgiou, Allen Tannenbaum, Stephen P. Boyd

AI总结 本文研究基于噪声测量识别变量间线性关系的问题,回顾弗里希方案及Frisch-Kalman准则对应的秩最小化问题,讨论迹最小化启发式、凸松弛与全局最优性证书,并指出多种正则化方案。

详情
Comments
26 pages
AI中文摘要

我们解决基于噪声测量识别变量间线性关系的问题。该问题在

英文摘要

We address the problem of identifying linear relations among variables based on noisy measurements. This is, of course, a central question in problems involving "Big Data." Often a key assumption is that measurement errors in each variable are independent. This precise formulation has its roots in the work of Charles Spearman in 1904 and of Ragnar Frisch in the 1930's. Various topics such as errors-in-variables, factor analysis, and instrumental variables, all refer to alternative formulations of the problem of how to account for the anticipated way that noise enters in the data. In the present paper we begin by describing the basic theory and provide alternative modern proofs to some key results. We then go on to consider certain generalizations of the theory as well as applying certain novel numerical techniques to the problem. A central role is played by the Frisch-Kalman dictum, which aims at a noise contribution that allows a maximal set of simultaneous linear relations among the noise-free variables, which is a rank minimization problem. In the years since Frisch's original formulation, there have been several insights including trace minimization as a convenient heuristic to replace rank minimization. We discuss convex relaxations and certificates guaranteeing global optimality. A complementary point of view to the Frisch-Kalman dictum is introduced in which models lead to a min-max quadratic estimation error for the error-free variables. Points of contact between the two formalisms are discussed and various alternative regularization schemes are indicated.