Sparse Tree-Based Aggregation for Time Series Regressions
Marie Corillon, Stephan Smeekes, Ines Wilms
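AI summary: Proposes StarTime, which arranges lags hierarchically in a temporal tree and uses a convex penalty to aggregate and sparsely select coefficients, reducing the dimensionality of high-dimensional time series regressions and improving estimation accuracy.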
High-dimensional time series regressions are often regularized to produce sparse coefficients. We show that temporal aggregation provides a powerful alternative to reduce dimensionality in high-order autoregressions and mixed-frequency regressions. To this end, we propose StarTime (Sparse Tree-based Aggregation for Time Series), a convex penalization method that uses a temporal tree to arrange lags hierarchically from high to low frequency. StarTime then flexibly selects coefficients to be aggregated at possibly varying frequencies, to be sparse, or a combination of both. We provide new error bounds for StarTime, demonstrate improved estimation accuracy and recovery of aggregation and sparsity in simulations relative to benchmarks, and illustrate StarTime's relevance for financial and macroeconomic applications.
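The contrast between sparsity and aggregation can be illustrated with the parameter structure alone. The sketch below does not implement the StarTime penalty or its tree construction. It assumes one hypothetical level of a temporal tree that groups lag 1, lags 2-5 and lags 6-22. Sharing one coefficient within each group turns 22 lag coefficients into 3 free parameters, and setting a group to zero makes it sparse. All function names here are illustrative.

```python
def expand_aggregated(blocks, block_coefs):
    """Expand one shared coefficient per block of consecutive lags into per-lag coefficients."""
    beta = []
    for (start, end), c in zip(blocks, block_coefs):
        beta.extend([c] * (end - start + 1))
    return beta


def lag_prediction(x, t, beta):
    """Unrestricted form: sum_k beta_k * x[t - k] for lags k = 1..len(beta)."""
    return sum(b * x[t - k] for k, b in enumerate(beta, start=1))


def block_prediction(x, t, blocks, block_coefs):
    """Aggregated form: one regressor per block, namely the sum of the lags in it."""
    return sum(c * sum(x[t - k] for k in range(s, e + 1))
               for (s, e), c in zip(blocks, block_coefs))


# One hypothetical tree level: lag 1, lags 2-5, lags 6-22 (a HAR-like grouping).
blocks = [(1, 1), (2, 5), (6, 22)]
coefs = [0.4, 0.05, 0.0]          # the last block is zeroed out (sparse)
beta = expand_aggregated(blocks, coefs)
x = [float(i % 7) for i in range(40)]
print(len(beta), lag_prediction(x, 30, beta), block_prediction(x, 30, blocks, coefs))
```

The two prediction functions agree up to floating-point rounding. The aggregated form therefore needs only one regressor per block rather than one per lag.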
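Recovering the Direct Price Effects of Environmental Amenities in Housing Markets: An Empirical Monte Carlo Evaluation of Regression and Causal Machine Learning Models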
Zhenshan Chen, Klaus Moeltner, Matthew Mair
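AI summary: Uses an empirical Monte Carlo simulation to evaluate traditional regression and causal machine learning methods for estimating the direct unmediated price effect (DUET) of environmental amenities on property values. It finds that generalized difference-in-differences performs robustly and that causal forests show clear advantages in larger samples.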
Hedonic price models are widely used to assess how environmental amenities affect property values, yet methodological guidance for estimating direct price effects remains sparse. We conduct an empirical Monte Carlo simulation to evaluate the performance of traditional and causal machine learning approaches for estimating the direct unmediated price effect of spatially delineated amenities on treated properties (DUET), a conservative lower-bound approximation for welfare changes with direct applications to benefit-cost analysis. Where previous simulations rely on parametric assumptions, we retain the actual data-generating process underlying over 1 million property transactions from upstate New York (1990-2024). By randomly assigning "treatment locations" across iterations, we establish a "ground truth" that allows us to precisely measure estimation error. Our results demonstrate that generalized difference-in-differences (DID) regression consistently outperforms baseline DID and two-way fixed effects models across all scenarios. Causal machine learning (CML) methods, particularly causal forest DID, achieve comparable performance to generalized DID in most scenarios. In larger samples (above 3,000 treated units), which are increasingly common in contemporary hedonic studies, CML approaches offer substantial advantages when properly specified. Based on empirical simulation results, we provide a set of method-specific best practice recommendations for both traditional regression and causal machine learning approaches.
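The "baseline DID" named above can be made concrete with a minimal 2x2 sketch on hypothetical sale prices. It has no covariates, no fixed effects, and none of the generalized-DID or causal-forest machinery evaluated in the paper. It only shows the contrast those methods build on: the price change for treated properties minus the price change for controls.

```python
from statistics import mean


def did_2x2(records):
    """Baseline 2x2 difference-in-differences on (treated, post, price) records:
    (treated post - treated pre) - (control post - control pre)."""
    cells = {}
    for treated, post, price in records:
        cells.setdefault((treated, post), []).append(price)
    m = {key: mean(values) for key, values in cells.items()}
    return (m[True, True] - m[True, False]) - (m[False, True] - m[False, False])


# Hypothetical sale prices (in thousands): control houses rise by 10,
# houses near the amenity rise by only 5, so the DID estimate is -5.
sales = [
    (False, False, 100.0), (False, False, 102.0),
    (False, True, 110.0), (False, True, 112.0),
    (True, False, 120.0), (True, False, 122.0),
    (True, True, 125.0), (True, True, 127.0),
]
print(did_2x2(sales))
```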
A Neural Estimation Framework for Aggregated Relational Data with Intractable Likelihoods
Rowland G Seymour, Joseph Marsh
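AI summary: Proposes a simulation-based neural estimation framework that trains a permutation-invariant Bayes estimator. The estimator handles cross-population dependence in aggregated relational data arising from homophily, latent-space clustering, and imperfect recall. The framework is applied to generative models that the network scale-up method cannot readily handle.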
Aggregated relational data (ARD) consists of survey responses to questions of the form "How many people do you know who $X$?" and is widely used in survey statistics for indirect inference about populations and social networks. The dominant ARD inference target is hidden-population size estimation via the Network Scale-Up Method (NSUM), but ARD is also used for personal-network-size estimation, mixing-pattern recovery, and inference about latent network structure. Bayesian inference for ARD almost universally assumes that, conditional on a respondent's degree, the counts reported for different subpopulations are independent. There are, however, reasons to question this assumption, as homophily, latent-space clustering, and imperfect recall may all induce cross-population dependence. We develop a simulation-based neural estimation framework for ARD which requires only a simulator, so it can be applied to generative models whose likelihood cannot be written down or efficiently evaluated. The framework trains a permutation-invariant neural Bayes estimator that returns, for each marginal parameter, a posterior median and a 95% credible interval, by minimising a multi-quantile pinball loss with a cumulative-gap construction that rules out quantile crossing by design. We demonstrate the framework on three structurally distinct intractable extensions of NSUM-style ARD inference: a stochastic block model, a latent-space model, and a recall-subset model. We apply the framework to ARD from a household survey collected in Rwanda. The framework provides inference on any new survey drawn from the training distribution, and extends the reach of ARD modelling to network-structure and cognitive-process assumptions beyond those currently accessible to likelihood-based inference.
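The multi-quantile loss with a cumulative-gap construction can be sketched as follows. The sketch assumes that, for one parameter, a network outputs an unconstrained lowest quantile and unconstrained raw gaps. A softplus transform makes each gap positive, and each higher quantile is built as the previous one plus its gap, so the quantiles cannot cross. The softplus choice and the function names are assumptions, and the neural network itself is not shown.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)); positive in exact arithmetic."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ordered_quantiles(lowest, raw_gaps):
    """Cumulative-gap construction: each quantile is the previous one plus a
    positive gap, so the returned quantiles never cross."""
    qs = [lowest]
    for g in raw_gaps:
        qs.append(qs[-1] + softplus(g))
    return qs


def pinball(y, q, tau):
    """Pinball (check) loss of quantile prediction q at level tau."""
    u = y - q
    return max(tau * u, (tau - 1.0) * u)


def multi_quantile_loss(y, lowest, raw_gaps, taus=(0.025, 0.5, 0.975)):
    """Sum of pinball losses for the 2.5%, 50% and 97.5% quantiles."""
    qs = ordered_quantiles(lowest, raw_gaps)
    return sum(pinball(y, q, tau) for q, tau in zip(qs, taus))


# Hypothetical raw network outputs for one parameter, including a negative gap.
qs = ordered_quantiles(-1.0, [0.3, -2.0])
print(qs, multi_quantile_loss(0.2, -1.0, [0.3, -2.0]))
```

The middle quantile serves as the posterior median, and the outer pair gives the 95% interval.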
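Principal Component Decomposition of the Fraction of Variance Explained in Strongly Correlated High-Dimensional Linear Models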
Man Luo, Chun Chieh Fan, David Azriel, Armin Schwartzman
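AI summary: Conventional FVE estimators fail when the predictors in high-dimensional linear models are strongly correlated. The paper proposes a framework that decomposes the FVE into a low-dimensional, strongly correlated component and a high-dimensional, weakly correlated component. It combines a principal component decomposition with GWASH or LMM-REML to reduce bias, and validates the approach on ABCD brain imaging data.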
The fraction of variance explained (FVE) in a linear model quantifies the extent to which predictors account for outcome variability. In high-dimensional settings, where traditional FVE estimators do not apply, modern FVE estimators such as GWASH or the linear mixed-effects model estimated by restricted maximum likelihood (LMM-REML) struggle with strong correlation among predictors, which is common, for example, in brain imaging data. We propose a decomposition framework that partitions the FVE into two components: a low-dimensional component capturing the strong correlation, estimable by low-dimensional methods, and a high-dimensional component with the remaining weak correlation, estimable by high-dimensional methods. Simulations demonstrate that separating out the dominant principal components (PCs) and estimating the high-dimensional FVE with GWASH or LMM-REML reduces bias compared with directly applying standard approaches such as GWASH and LMM-REML. Our method shows consistent performance asymptotically as both the number of predictors and the number of samples increase. We illustrate the method in an analysis of the Adolescent Brain Cognitive Development (ABCD) brain imaging dataset, capturing nuanced heritability signals in the FVE of cognitive measures predicted by high-resolution brain imaging data.
Privacy-Robust Incrementality Measurement for Advertising Systems under Signal Loss
Prashant Shekhar, Caroline Howard
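AI summary: Addresses signal loss caused by privacy-preserving reporting systems with a robust causal decision framework. The framework projects observation-compatible experimental worlds onto the incrementality functional and yields a sharp decision frontier. It then returns certified, rejected, or unresolved incrementality decisions.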
Advertising platforms use randomized lift tests to measure incrementality, but privacy-preserving reporting systems degrade the observed signal through match-rate loss, linkability loss, attribution-window loss, aggregation-threshold suppression, randomized reporting noise, and segment-heterogeneous signal loss. This paper formulates privacy-constrained advertising measurement as a robust causal decision problem under these signal losses. Given a randomized experiment and an ambiguity set for privacy-induced degradation, the framework projects the observation-compatible fiber of clean/unfiltered experimental worlds onto the incrementality functional and returns certified, rejected, and unresolved decisions. The main result gives a sharp decision frontier. Reports outside the frontier support uniformly valid certification or rejection, whereas reports inside it contain too little information for any method to uniformly distinguish above-threshold incrementality from non-incrementality. Supporting results give finite-sample certification, sample-complexity guarantees, a minimax lower bound showing that signal loss reduces effective information, and a reporting-granularity tradeoff. On 2.0M rows of the Criteo Uplift dataset and the 64K-row Hillstrom email experiment, clean conversion lift is positive in both datasets, with lifts of 0.00112 and 0.00495, respectively. Population certification survives mild degradation in Criteo and severe degradation in Hillstrom, while all considered finite-sample stress settings in both datasets remain unresolved after simultaneous uncertainty and reporting noise are included. Overall, the research contributes a decision-theoretic layer for privacy-aware incrementality measurement whose output is the strongest causal claim justified by degraded ads signals.
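The three-way output can be illustrated with a deliberately simplified toy. The sketch assumes a single degradation channel: the observed lift equals an unknown match rate $m$ times the clean lift, with $m$ in a known interval. This ambiguity set and every number below are hypothetical, not the paper's model or data, and finite-sample uncertainty is ignored. First the range of clean lifts compatible with the report is computed. The decision is "certified" if the whole range exceeds the threshold, "rejected" if none of it does, and "unresolved" otherwise.

```python
def lift_bounds_under_match_loss(observed_lift, m_lo, m_hi):
    """Toy ambiguity set: observed lift = m * clean lift with an unknown match
    rate m in [m_lo, m_hi], 0 < m_lo <= m_hi <= 1. Returns the range of clean
    lifts compatible with the observation."""
    candidates = (observed_lift / m_lo, observed_lift / m_hi)
    return min(candidates), max(candidates)


def lift_decision(lower, upper, threshold):
    """Three-way decision from bounds on the incrementality functional."""
    if lower > threshold:
        return "certified"
    if upper <= threshold:
        return "rejected"
    return "unresolved"


lo, hi = lift_bounds_under_match_loss(0.001, 0.5, 1.0)   # hypothetical numbers
for thr in (0.0005, 0.0015, 0.003):
    print(thr, lift_decision(lo, hi, thr))
```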
Network Time Series Models for Multivariate Volatility Forecasting
Chiara Boetti, Matthew A. Nunes
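AI summary: Proposes the generalized network heterogeneous autoregressive (GNHAR) model. It incorporates cross-sectional spillovers through a directed graph inferred from Granger-causality tests or connectedness indices, yielding parsimonious multivariate volatility forecasts.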
Realized volatility has become a standard tool for measuring latent variation in financial assets, and its forecasting is crucial for a wide range of financial applications. We propose a network-based model for forecasting a vector of realized variance processes through the heterogeneous autoregressive (HAR) approach. The generalised network HAR (GNHAR) model incorporates cross-sectional spillovers through a directed graph inferred from Granger-causality tests or connectedness indices, yielding a parsimonious multivariate time series model specification. In an application to ten equities over tranquil and crisis regimes, the proposed GNHAR model improves upon common HAR model benchmarks under both short- and long-term forecasting. We also compare the network-based specification when the jump-continuous decomposition or node-specific option-implied variances are considered. Finally, unlike overparameterised models, our approach yields a concise set of parameters that track the strengthening or weakening of cross-market dependencies, providing a time-varying quantitative assessment of market stability.
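The regressors of a HAR-type forecast can be sketched together with a network spillover term. HAR models use daily, weekly (5-day) and monthly (22-day) averages of realized variance. The sketch appends the same three components averaged over a node's parents in a directed graph. The equal-weight averaging over parents and the toy graph are assumptions. GNHAR's exact specification, graph inference and estimation are not reproduced.

```python
from statistics import mean


def har_components(rv, t):
    """Daily, weekly (5-day) and monthly (22-day) averages of realized variance
    up to day t (requires t >= 21)."""
    return rv[t], mean(rv[t - 4 : t + 1]), mean(rv[t - 21 : t + 1])


def network_har_features(rv_by_node, parents, node, t):
    """Own HAR components followed by the averaged HAR components of the node's
    parents in a directed graph (parents: dict node -> list of nodes)."""
    own = har_components(rv_by_node[node], t)
    neighbours = parents.get(node, [])
    if not neighbours:
        return own + (0.0, 0.0, 0.0)
    comps = [har_components(rv_by_node[p], t) for p in neighbours]
    spill = tuple(mean(c[i] for c in comps) for i in range(3))
    return own + spill


# Hypothetical data: two stocks, with an edge B -> A (B's volatility spills over to A).
rv = {"A": [2.0] * 30, "B": [3.0] * 30}
graph = {"A": ["B"]}
print(network_har_features(rv, graph, "A", 25))
```

A linear regression of next-day realized variance on these six features would then give a network-augmented HAR forecast.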
A Quantitative Approximation Framework for Flow Distillation in Diffusion Models
Weiguo Gao, Ming Li, Lei Shi, Hanfei Zhou
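AI summary: Proposes a quantitative approximation framework for flow distillation in diffusion models that treats few-step sampling as error propagation under compositions of learned flow maps. Theory and experiments show that a stability-balanced non-uniform time grid substantially reduces end-to-end relative MSE.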
We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, we show that local approximation errors can be strongly amplified in low-noise multimodal regimes, where the underlying dynamics become stiff. In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck setting, we separate two core difficulties: approximating the time-dependent score field and controlling the dynamical amplification governed by the time-integrated Jacobian bound of the probability-flow ODE. On the approximation side, we prove constructive $L^p(p_t)$ guarantees showing that ReLU-ReQU networks approximate the Gaussian-mixture score uniformly over time, with depth and width scaling polylogarithmically in the target accuracy and explicitly with the mixture geometry. On the stability side, we derive an explicit bound $L(t)$ for the spatial Lipschitz constant of the probability-flow velocity and convert it into a flow-map stability estimate governed by $\int_s^t L(u)\,du$, making late-time amplification in stiff regimes computable. Building on these estimates, we prove that deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and identify a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable. The resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate. Experiments support the prediction and reduce end-to-end relative MSE by up to 51.9% with 8 segments compared with uniform grids.
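The "uniform partitioning in the cumulative stability coordinate" can be sketched numerically. The cumulative coordinate is $\Lambda(t) = \int_{t_{\min}}^{t} L(u)\,du$. The sketch computes it with the trapezoid rule, inverts it by linear interpolation, and places grid points at equal increments of $\Lambda$. The bound $L(t) = 1/t$ is a stand-in chosen only because it blows up at small $t$ (low noise); it is not the paper's explicit bound. For this stand-in the grid is geometric, $t_k = t_{\min}(t_{\max}/t_{\min})^{k/n}$, so segments are short where the dynamics are stiff.

```python
def stability_balanced_grid(lipschitz, t_min, t_max, n_segments, n_fine=10_000):
    """Split [t_min, t_max] into n_segments pieces of equal cumulative stability
    Lambda(t) = integral_{t_min}^{t} L(u) du, using the trapezoid rule on a fine
    grid and linear interpolation to invert Lambda. `lipschitz` must be positive."""
    h = (t_max - t_min) / n_fine
    ts = [t_min + i * h for i in range(n_fine + 1)]
    cum = [0.0]
    for a, b in zip(ts[:-1], ts[1:]):
        cum.append(cum[-1] + 0.5 * (lipschitz(a) + lipschitz(b)) * (b - a))
    grid, j = [t_min], 0
    for k in range(1, n_segments):
        target = cum[-1] * k / n_segments
        while cum[j + 1] < target:       # find the fine cell containing the target
            j += 1
        frac = (target - cum[j]) / (cum[j + 1] - cum[j])
        grid.append(ts[j] + frac * (ts[j + 1] - ts[j]))
    grid.append(t_max)
    return grid


# Illustrative bound only: L(t) = 1 / t blows up at low noise (small t).
grid = stability_balanced_grid(lambda t: 1.0 / t, 0.01, 1.0, 8)
print([round(g, 4) for g in grid])
```

With a constant bound the same function returns a uniform grid. This shows that the non-uniformity comes entirely from the variation of $L(t)$.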
Regularization of Paired Comparison Models via Pseudo-Games and Phantom Players
Mark E. Glickman
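AI summary: Maximum likelihood estimation in paired comparison models can be unstable. The paper proposes two data-augmentation regularization methods, adding pseudo-games and introducing a phantom player. Both yield finite shrunken estimates and resolve location nonidentifiability, and in the Bradley-Terry model they perform comparably to ridge regularization.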
Paired comparison models are useful for estimating latent abilities or preferences from binary outcomes, but maximum likelihood estimation can be unstable or fail when the comparison graph is disconnected or nearly separated. Ridge regularization addresses these difficulties by shrinking ability parameters toward a common center, but it can obscure the simple likelihood interpretation that makes Bradley-Terry and Thurstone-Mosteller models attractive to practitioners. This paper describes two data-augmentation perspectives on regularization. The first adds fractional pseudo-games between every pair of competitors. The second adds a fixed-strength phantom player and gives each real competitor a weighted pseudo-win and pseudo-loss against that player. Both approaches yield finite, shrunken estimates; the phantom-player construction also resolves the usual location nonidentifiability without an explicit linear constraint. For the Bradley-Terry model, the two augmentations lead to transparent penalty functions that can be compared directly with ridge penalties. An application to the 2025 Major League Baseball regular season illustrates that tuned pseudo-game and phantom-player regularization can closely reproduce ridge-regularized strength estimates while retaining an intuitive augmented-data representation.
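The phantom-player idea can be sketched with the standard minorization-maximization (MM) update for Bradley-Terry strengths. Each real player receives one pseudo-win and one pseudo-loss of weight $w$ against a phantom whose strength is fixed at 1. Fixing the phantom also pins the location. In log-strength $\theta_i = \log \pi_i$, these two pseudo-games add $w[\theta_i - 2\log(1 + e^{\theta_i})]$ to the log-likelihood. This term peaks at $\theta_i = 0$ and behaves like $-w\theta_i^2/4$ plus a constant near zero, which explains the ridge-like shrinkage. The weight $w = 0.5$ and the toy data are hypothetical, and the paper's tuning procedure is not shown.

```python
def bradley_terry_phantom(wins, players, w=0.5, max_iter=10_000, tol=1e-12):
    """Bradley-Terry strengths via MM updates (pi_i <- W_i / sum_j n_ij / (pi_i + pi_j)).
    Each real player gets a pseudo-win and a pseudo-loss of weight w against a
    phantom player whose strength is fixed at 1.
    wins[(i, j)] = number of games i won against j."""
    games, total_wins = {}, {p: w for p in players}
    for (i, j), k in wins.items():
        games[i, j] = games.get((i, j), 0) + k
        games[j, i] = games.get((j, i), 0) + k
        total_wins[i] += k
    pi = {p: 1.0 for p in players}
    for _ in range(max_iter):
        new = {}
        for i in players:
            denom = 2 * w / (pi[i] + 1.0)   # the two pseudo-games against the phantom
            for j in players:
                if j != i and (i, j) in games:
                    denom += games[i, j] / (pi[i] + pi[j])
            new[i] = total_wins[i] / denom
        change = max(abs(new[p] - pi[p]) for p in players)
        pi = new
        if change < tol:
            break
    return pi


# Hypothetical season: A beat B three times and never lost, so the unregularized
# MLE would send A's strength to infinity; the phantom keeps it finite.
print(bradley_terry_phantom({("A", "B"): 3}, ["A", "B"]))
```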
Extending TCLUST to Higher Dimensions
Lucía Trapote Reglero, Luis Ángel García Escudero, Agustín Mayo Íscar
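AI summary: The parameters of the robust clustering method TCLUST are difficult to estimate in high-dimensional data. The paper proposes tHHDC, which combines the HDDC framework with trimming to integrate robust clustering and dimensionality reduction.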
Outliers are known to significantly distort the results of many commonly used clustering methods, often leading to unreliable partitions. To address this issue, several robust clustering approaches have been developed that not only reduce their influence but also facilitate the detection of meaningful outliers. This presentation focuses on robust clustering methods based on trimming, especially TCLUST, which extends the type of trimming used by MCD in one-population problems to the more general case of multiple and unknown clusters. While TCLUST performs well on low-dimensional data, it struggles with high-dimensional datasets due to the complexity of estimating a large number of parameters. The Robust Linear Grouping (RLG) method offers an alternative by assuming clusters lie near lower-dimensional subspaces, thereby combining clustering with dimensionality reduction. However, RLG has limitations when subspaces intersect and assumes overly simplistic isotropic orthogonal errors. A robust clustering method extending TCLUST will be presented, building on the High Dimensional Data Clustering (HDDC) approach by incorporating trimming and eigenvalue constraints. This new approach, called tHHDC, combines TCLUST and RLG, requiring careful modification and integration of both methodologies within the HDDC framework. A study of the theoretical properties of this approach, together with a feasible algorithm for its implementation, will be presented. The interest of the proposed methodology, along with the issue of selecting input parameters, will be illustrated through a simulation study and a real-data example.
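The trimming step shared by this family of methods can be sketched with trimmed k-means. With equal spherical scatter and equal cluster weights, TCLUST essentially reduces to trimmed k-means. Each iteration assigns points to their nearest centre, discards the fraction $\alpha$ of points farthest from their centre, and recomputes centres from the retained points. This sketch contains no subspace modelling or eigenvalue constraints, so it is not tHHDC. Initial centres are passed explicitly to keep the example deterministic; practical implementations use many random starts.

```python
import math


def _assign(points, centres):
    """Nearest-centre assignment, sorted by distance (closest first)."""
    out = []
    for p in points:
        d = [math.dist(p, c) for c in centres]
        j = min(range(len(centres)), key=d.__getitem__)
        out.append((d[j], j, p))
    out.sort(key=lambda a: a[0])
    return out


def trimmed_kmeans(points, init_centres, alpha, max_iter=100):
    """Trimmed k-means: discard the ceil(alpha * n) points farthest from their
    nearest centre, recompute centres from the rest, repeat until stable."""
    n_keep = len(points) - math.ceil(alpha * len(points))
    centres = list(init_centres)
    for _ in range(max_iter):
        kept = _assign(points, centres)[:n_keep]
        new = []
        for j, c in enumerate(centres):
            members = [p for _, jj, p in kept if jj == j]
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else c)
        if new == centres:
            break
        centres = new
    trimmed = [p for _, _, p in _assign(points, centres)[n_keep:]]
    return centres, trimmed


# Two hypothetical clusters plus one gross outlier.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (100, -100)]
print(trimmed_kmeans(pts, [(0.0, 0.0), (10.0, 10.0)], alpha=0.1))
```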
Adaptive Inference for Sequential Pricing under Resource Constraints
Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi
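AI summary: Resource constraints can make fixed-price inference infeasible. The paper proposes a target-aware pricing controller that certifies feasible target bands and logs continuous local densities, enabling studentized intervals via localized debiasing. It also analyzes the resulting regret-information accounting.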
Resource-constrained pricing controllers can make fixed-price inference impossible: the controller's resource state may remove the target price neighborhood from the feasible set, even when every realized action has a known positive density. We formalize this support-exclusion failure through a local non-identification result and a realized information clock. We then design a target-aware pricing controller that certifies feasible target bands and logs continuous local densities. Localized debiasing gives studentized intervals whose width is governed by this clock. The resulting regret-information accounting, stated up to pilot re-solving error, shows that cheap exploration can be insufficient for inference: polynomial target mass gives polynomial rates, while a pure $1/t$ target branch does not yield shrinking fixed-target intervals without additional local movement. Experiments show calibration in certified bands and diagnostic abstention when the resource state collapses target support.
Dynamic Mini-Max Design and Sequential Hierarchical Bayes Inference for Repeated Surveys
Siu-Ming Tam
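AI summary: Proposes a dynamic mini-max (DMM) framework that jointly optimizes sample size and wave overlap. It reduces cost subject to precision constraints, respondent burden, and budget, while enabling coherent inference on levels and movements.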
This paper develops a Dynamic Mini-Max (DMM) framework for repeated surveys comprising a Dynamic Mini-Max Design and a Sequential Hierarchical Bayes Update (SHBU). The DMM jointly optimizes sample size and wave overlap subject to simultaneous precision constraints for levels and movements, a respondent burden limit, and a fieldwork budget. The methods are illustrated using 2021 Australian Census data (t = 1) and simulated waves t = 2, 3, 4. Both the DMM and the classical design start from the same 5% proportional allocation of n_A = 42,018 units. The DMM reduces this to n* = 40,251 while meeting all precision constraints, achieving a cost saving of approximately 6.3%. Level coverage is comparable between the two designs (maximum absolute relative error (MARE) ratio 0.844-1.263). Movement coverage diverges markedly: the DMM achieves 100% across all 27 domain-variable cells, while the classical design achieves only 82%-96% (87.5%-95.0% nationally). The classical confidence interval understates movement uncertainty because it addresses sampling variance only and does not account for the model variance component $\hat{V}_{\mathrm{mod}}$. Additional benefits of the DMM framework, including coherent joint inference for levels and movements, sequential updating without ad hoc composite-estimator chaining, and small area estimation, are outlined in the paper.
Projection Diagnostics for Directional Asymmetry and Tail-Ratio Departure in Multivariate Data
Sayantan Banerjee, Soudeep Deb
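AI summary: Proposes projection-based diagnostics that use directional skewness and quantile tail ratios to classify data into four regimes, avoiding the instability of higher-order moments, and establishes their theoretical properties.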
We study projection-based diagnostics for distinguishing directional asymmetry from tail-ratio departure in multivariate data. The procedure reduces the problem to one-dimensional projections and computes two quantile-based summaries: a directional skewness measure evaluated over several quantile levels, and an interquantile tail-ratio evaluated relative to a chosen benchmark. The two summaries lead to a four-regime classification: symmetric benchmark-tail, symmetric tail-departed, skewed benchmark-tail, and skewed tail-departed. The quantile formulation avoids relying on third and fourth moments, which can be unstable in heavy-tailed settings. We establish population properties under central symmetry and ellipticity, uniform finite-sample bounds over the searched directions, and consistency of the threshold classifier under separated regimes. A sparse rank-one calculation is also used to show why coordinate directions can complement random directions in high dimensions. The resulting diagnostic is meant to guide subsequent modelling choices, for example whether a symmetric, skewed, tail-departed, or combined multivariate model is appropriate.
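The two quantile-based summaries can be sketched for a single direction. The data are first projected onto the direction. The sketch then averages a Bowley-type quantile skewness $(q_{1-p} + q_p - 2q_{0.5})/(q_{1-p} - q_p)$ over several levels $p$. It also divides the interquantile ratio $(q_{0.975} - q_{0.025})/(q_{0.75} - q_{0.25})$ by its Gaussian value. These are standard quantile summaries chosen for illustration. The paper's exact definitions, benchmark, direction search and thresholds may differ.

```python
from statistics import NormalDist


def quantile(xs, p):
    """Sample quantile with linear interpolation between order statistics."""
    s = sorted(xs)
    h = (len(s) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def directional_summaries(data, direction, levels=(0.05, 0.1, 0.25), p_out=0.025, p_in=0.25):
    """Project points onto `direction`; return (mean Bowley-type quantile skewness
    over `levels`, interquantile tail ratio relative to the Gaussian benchmark)."""
    proj = [sum(a * b for a, b in zip(x, direction)) for x in data]
    med = quantile(proj, 0.5)
    skews = []
    for p in levels:
        lo, hi = quantile(proj, p), quantile(proj, 1 - p)
        skews.append((hi + lo - 2 * med) / (hi - lo))

    def tail_ratio(q):
        return (q(1 - p_out) - q(p_out)) / (q(1 - p_in) - q(p_in))

    sample = tail_ratio(lambda p: quantile(proj, p))
    gaussian = tail_ratio(NormalDist().inv_cdf)
    return sum(skews) / len(skews), sample / gaussian


# Evenly spaced data along the first axis: symmetric and lighter-tailed than Gaussian.
data = [(float(i), 0.0) for i in range(-50, 51)]
print(directional_summaries(data, (1.0, 0.0)))
```

Skewness near zero combined with a tail ratio below one corresponds to the "symmetric" regime with lighter-than-benchmark tails.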
Set-Preserving Calibration from Conformal p-Values to e-Values
Nabil Alami, Jad Zakharia, Souhaib Ben Taieb
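AI summary: Addresses the limitations of converting p-values to e-values in conformal prediction by proposing a set-preserving P2E calibrator that performs the conversion efficiently without changing the prediction set. The approach achieves the desired coverage with improved efficiency in cross-conformal prediction and conformal aggregation.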
Standard conformal prediction (CP) procedures are typically formulated in terms of p-values, but reliance on p-values alone limits flexibility, for example, when combining dependent evidence across models or data splits. Recent work has explored e-value formulations for conformal inference, yet a direct connection between p- and e-value formulations in CP has been missing, especially regarding their statistical efficiency. We first identify limitations of classical p-to-e calibrators in the CP setting, showing that they are not set-preserving and can lead to overly conservative prediction sets. To address this, we propose a novel P2E calibrator that converts conformal p-values into e-values without altering the prediction set induced by the original conformal p-value. We establish both theoretically and empirically that our calibrator can yield significant efficiency gains over existing p-to-e calibrators. This e-value formulation enables principled use of recent advances in e-value merging and randomization, and we demonstrate its impact in two applications: cross-conformal prediction (CCP), whose variants typically provide only approximate $1-2\alpha$ coverage, and conformal aggregation (CA). In both cases, our e-value-based methods satisfy the desired $1-\alpha$ coverage guarantee while improving efficiency over standard baselines. More broadly, our approach expands the flexibility of CP and opens new directions for efficient, distribution-free uncertainty quantification.
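Set preservation can be illustrated with a split-conformal p-value and two calibrators. A candidate label is excluded when $p \le \alpha$, or on the e-value scale when $e \ge 1/\alpha$. The classical calibrator $f(p) = \kappa p^{\kappa-1}$ only reaches $1/\alpha$ for much smaller $p$, so it keeps labels that the p-value excludes and produces larger sets. The indicator calibrator $f(p) = \alpha^{-1}\mathbf{1}\{p \le \alpha\}$ is the simplest set-preserving choice at a fixed $\alpha$. It is shown only to illustrate the property. It is not the paper's P2E calibrator, which the abstract does not specify.

```python
def conformal_p_value(cal_scores, test_score):
    """Split-conformal p-value (1 + #{calibration scores >= test score}) / (n + 1)."""
    return (1 + sum(s >= test_score for s in cal_scores)) / (len(cal_scores) + 1)


def kappa_calibrator(p, kappa=0.5):
    """Classical p-to-e calibrator f(p) = kappa * p ** (kappa - 1), 0 < kappa < 1."""
    return kappa * p ** (kappa - 1)


def indicator_calibrator(p, alpha):
    """f(p) = 1/alpha if p <= alpha else 0. It integrates to 1 over [0, 1], so it
    is a valid calibrator, and {e < 1/alpha} is exactly {p > alpha}."""
    return 1 / alpha if p <= alpha else 0.0


alpha = 0.1
p = 0.05   # hypothetical p-value for a candidate label
# A label is excluded when p <= alpha, or equivalently when e >= 1/alpha.
print(p <= alpha, kappa_calibrator(p) >= 1 / alpha, indicator_calibrator(p, alpha) >= 1 / alpha)
# -> True False True
```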
Few-Shot Prediction of Pulsar Noise with Long Short-Term Memory Networks
Qingye Tang, Dechao An, Haoran Peng, Yuqi Ouyang
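AI summary: To address the scarcity of pulsar timing data, proposes an LSTM network optimized with model-agnostic meta-learning that adapts quickly to new frequency domains from only a few real timing residuals. Particle swarm optimization tunes the hyperparameters automatically. The method achieves accurate predictions on the IPTA dataset using 10% of the data.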
This work proposes a novel solution for predicting pulsar timing residuals with limited data, addressing the critical challenge of data scarcity across spin-frequency subgroups of millisecond pulsars in PTA datasets. The proposed solution applies a Long Short-Term Memory (LSTM) network optimized with the model-agnostic meta-learning algorithm, enabling rapid adaptation to new frequency domains by fine-tuning the LSTM network on only a few ground-truth timing residuals. A particle swarm optimization algorithm is also used for automatic hyperparameter optimization, improving prediction accuracy. Evaluated on the second data release of the International Pulsar Timing Array (IPTA), our solution generalizes robustly, with accurate predictions on three metrics across high-frequency test frequency domains, while requiring only 10% of the timing residuals from these domains for model fine-tuning. Furthermore, our lightweight structure requires only 16.86 MB of CPU memory and 18 ms for a single-step residual prediction. These characteristics make our solution well suited to real-world applications where effective, real-time prediction of pulsar timing residuals is essential, particularly in resource-constrained environments with limited computational power, memory, or energy.
A Robust Optimization Approach to Sparse Principal Component Analysis
David Vävinggren, Francis Bach, André M. H. Teixeira, Dave Zachariah, Antônio H. Ribeiro
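AI summary: Proposes AdvPCA, which achieves sparsity through robust optimization by introducing worst-case latent-space perturbations into the reconstruction objective, and derives a closed-form reduction and an iterative algorithm.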
While principal component analysis (PCA) is a fundamental tool for dimensionality reduction, its dense representations make it ill-suited for high-dimensional data. Existing methods address this by promoting sparsity through explicit $\ell_1$-penalties, but these are not obvious to tune due to the unsupervised nature of the task. In contrast, we propose Adversarial PCA (AdvPCA), which leverages robust optimization to achieve sparsity by optimizing the reconstruction objective against bounded, worst-case latent space perturbations. We show that this formulation admits a closed-form reduction, leading to a practical iterative algorithm that alternates between adversarial linear regression-style updates for the sparse encoder and orthogonal updates for the decoder. By theoretically characterizing the solution, we derive a data-adaptive parameterization that allows the algorithm to perform effectively out of the box. We validate these claims through numerical experiments on synthetic and real-world genomics data.
Surrogate-Assisted Optimal Sampling for Risk Prediction under Measurement Constraints
Sunhyun Park, Seong-ho Lee
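AI summary: True responses are costly to measure and available only for a limited subset. The paper proposes a surrogate-assisted optimal sampling framework that allocates the measurement budget by minimizing the leading term of the expected cross-entropy loss. It builds an inverse-probability-weighted estimator that improves predictive performance and robustness.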
In many risk prediction problems, covariates and a response surrogate are routinely available for a large target population, whereas the true response is costly to ascertain and is observed only for a limited subset. This creates a design problem: one must decide which observations should receive response measurement in order to build a prediction model under a fixed measurement budget. We propose a surrogate-assisted optimal sampling framework for risk prediction under measurement constraints. In the target setting, the surrogate identifies confirmed positive cases, while responses for surrogate-negative observations remain unobserved and can be selectively measured, and thus the sampling design determines how the response measurement budget is allocated. Our framework constructs an optimal sampling design minimizing the leading term of the expected out-of-sample cross-entropy loss and incorporates the resulting design into an inverse-probability-weighted cross-entropy estimator. The proposed design depends only on covariates, the surrogate, and a preliminary estimator, and therefore does not require responses from unlabeled observations at the design stage. We establish consistency, asymptotic normality, and leading-order prediction optimality of the resulting estimator. Extensive simulation studies and two real data applications demonstrate that the proposed design improves prediction performance and exhibits robustness under surrogate misspecification and rare outcome settings.
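The inverse-probability-weighted cross-entropy can be sketched in its simplest form. Every unit with a known response contributes its log loss divided by its probability of having been measured, and the sum is divided by the population size. Surrogate-confirmed positives enter with $\pi = 1$. Sampled surrogate-negatives enter with their design inclusion probability. The sampling probabilities below are hypothetical; the paper's optimal design for choosing them is not reproduced.

```python
import math


def ipw_cross_entropy(observed, n_total, predict):
    """Inverse-probability-weighted cross-entropy over a target population of size n_total.
    observed: (x, y, pi) for every unit whose response is known. Surrogate-confirmed
    positives enter with y = 1 and pi = 1; sampled surrogate-negatives enter with their
    design inclusion probability pi. predict(x) returns P(Y = 1 | x) in (0, 1)."""
    total = 0.0
    for x, y, pi in observed:
        p = predict(x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p)) / pi
    return total / n_total


# Hypothetical population of 6: two confirmed positives, four surrogate-negatives
# of which two were measured, each with inclusion probability 0.5.
observed = [(0.2, 1, 1.0), (1.5, 1, 1.0), (-0.3, 0, 0.5), (-1.0, 0, 0.5)]
print(ipw_cross_entropy(observed, 6, lambda x: 1 / (1 + math.exp(-x))))
```

Weighting by $1/\pi$ lets the two measured negatives stand in for all four surrogate-negatives, which is what makes the loss estimate design-unbiased.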
Modeling Discrete Data with Higher-Order Vector Potts Models
Aaron De Clercq, Merijn Moody, Clélia de Mulatier
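AI summary: Extends the maximum entropy framework from binary to discrete data by introducing q-state spin models, a family of higher-order vector Potts models. It uses a loop expansion of the partition function and gauge transformations to characterize their statistical properties, and focuses on minimally complex models for fast model selection.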
Modeling high-dimensional data is challenging, yet essential to understanding many complex systems. Maximum entropy models such as Ising and Potts models have been used extensively to capture pairwise interactions from correlation patterns in data, allowing one to infer graphical representations of complex systems from observations (e.g., from protein sequences or neural population activity). Recently, there has been growing interest in modeling higher-order correlation patterns involving three or more variables simultaneously. While progress has been made in binary data with high-order Ising models, we extend this framework to the more general case of discrete data. We introduce q-state spin models, a complete family of maximum entropy models that generalize the vector Potts model to include long-range and arbitrary high-order interactions. In the pairwise case, our models allow for more diverse interaction types compared to the standard vector Potts model. We discuss their statistical interpretation with examples and relate them to discrete Fourier analysis. Using a loop expansion of the partition function, we show that the statistical properties of spin models are fully captured by the algebraic structure of their interactions. We define gauge transformations under which this structure, and thus the partition function, remains invariant. Models equivalent under gauge transformations can be seen as different representations of the same abstract statistical model, despite generally having interactions of different orders, extending results from the binary case. For practical application to data analysis, we focus on a subset of models known in the binary case as Minimally Complex Models, generalizing them to discrete data. We obtain a closed-form expression for the marginal likelihood of these models, enabling fast model selection. We illustrate their use with simple real-world examples.
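For reference, the sketch below computes the partition function of the standard pairwise vector (clock) Potts model that these q-state spin models generalize, by brute force. The pairwise term $\cos(2\pi(s_i - s_j)/q)$ is the real part of $\omega^{s_i}\omega^{-s_j}$ with $\omega = e^{2\pi i/q}$, which is where the link to discrete Fourier analysis comes from. The example has no higher-order interactions and no gauge transformations. Because it enumerates all $q^n$ states, it is only practical for tiny systems.

```python
import itertools
import math


def vector_potts_partition(q, n, couplings, beta=1.0):
    """Brute-force partition function of a pairwise vector (clock) Potts model,
    E(s) = -sum_{(i, j)} J_ij * cos(2 * pi * (s_i - s_j) / q), s_i in {0, ..., q-1}.
    Enumerates all q**n states, so it is only for tiny systems."""
    z = 0.0
    for s in itertools.product(range(q), repeat=n):
        energy = -sum(J * math.cos(2 * math.pi * (s[i] - s[j]) / q)
                      for (i, j), J in couplings.items())
        z += math.exp(-beta * energy)
    return z


print(vector_potts_partition(3, 2, {}))              # no interactions: 3**2 states
print(vector_potts_partition(2, 2, {(0, 1): 1.0}))   # q = 2 is an Ising pair
```

For $q = 2$ the cosine takes only the values $\pm 1$, so the model reduces to the Ising model. This is the binary case that the paper's results extend.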
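Rehearsal-Based Class-Incremental Time Series Classification Combining Statistical Features and Deep Encodings
Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca Pilar Díaz-Redondo
AI summary: Proposes a dual-stream feature extraction pipeline (combining deep temporal embedding features from a pre-trained, frozen foundation model with statistical features) for class-incremental continual learning on multivariate time series, achieving competitive average accuracy and low forgetting rates on five benchmark datasets.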
Many systems used in real-world environments must add new categories and incorporate new information without forgetting what the classification model has previously learnt. This is known as class-incremental continual learning, and for multivariate time series it is further complicated by the temporal structure of the data. In this paper, we present a novel approach to class-incremental continual learning for multivariate time series classification, based on a dual-stream feature extraction pipeline that combines deep temporal embedding features generated by a pre-trained, frozen foundation model with statistical features. Evaluated on five benchmark datasets, the proposed system achieves competitive average accuracy across all datasets while maintaining low forgetting rates across all experimental configurations.
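The statistical stream of such a dual-stream pipeline can be sketched in a few lines. This is a minimal illustration under our own assumptions: the five per-channel summaries (mean, standard deviation, minimum, maximum, linear-trend slope) are a hypothetical choice, and `embedding` stands in for the output of the frozen foundation model, which is not modelled here; the rehearsal (replay) component is also not shown.

```python
import statistics
from typing import List, Sequence


def channel_features(series: Sequence[float]) -> List[float]:
    """Hypothetical per-channel summaries: mean, population std, min, max, OLS slope on time.

    Requires at least two time points so that the slope is defined.
    """
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / sxx
    return [y_mean, statistics.pstdev(series), min(series), max(series), slope]


def statistical_stream(channels: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the summaries of every channel of one multivariate series."""
    features: List[float] = []
    for channel in channels:
        features.extend(channel_features(channel))
    return features


def dual_stream(channels: Sequence[Sequence[float]], embedding: Sequence[float]) -> List[float]:
    """Join a precomputed deep embedding with the statistical features."""
    return list(embedding) + statistical_stream(channels)


# Usage: two channels of length 4 and a placeholder 4-dimensional embedding.
sample = [[0.0, 1.0, 2.0, 3.0], [5.0, 5.0, 5.0, 5.0]]
fake_embedding = [0.1, -0.2, 0.3, 0.0]  # placeholder, not a real model output
vector = dual_stream(sample, fake_embedding)
```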
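The Hierarchy of Calibration: Merging Classification and Regression
Johannes Resin, Lu Yang, Tilmann Gneiting
AI summary: Reviews, extends, and bridges notions of calibration for classification and regression tasks, with emphasis on the hierarchical relations among them; introduces modal calibration and distinguishes full, partial, and average calibration.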
Concepts of calibration formalize the compatibility between probabilistic predictions and the respective outcomes. In a nutshell, the outcomes ought to be indistinguishable from random draws from the predictive distributions. In this paper, we review, extend, and bridge notions of calibration that have been proposed for classification and regression tasks. Particular emphasis is given to hierarchical relations between the various notions, as they apply to general real-valued data, continuous outcomes, count data, nominal classes, and binary outcomes. To highlight a number of contributions, we introduce the notion of modal calibration for nominal outcomes, we distinguish full, partial, and average calibration in this setting, and we show that double probability integral transform (PIT) calibration is logically independent of previously proposed concepts of calibration for discrete outcomes. Furthermore, we generalize extant results on concepts of calibration that are expressed in terms of properties or functionals of the predictive distributions, such as means, quantiles, or event probabilities. Throughout the paper, we illustrate the concepts and their hierarchical relations in worked examples, and we provide algorithmic tools that support the construction of instructive examples and counterexamples.
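To make the discrete case concrete, here is a minimal sketch of the randomized PIT for count outcomes, one standard way of extending the PIT to discrete predictive distributions: the PIT value is drawn uniformly between F(y - 1) and F(y). It assumes Poisson forecasts, the function names are ours, and double PIT calibration, which the abstract discusses, is not implemented.

```python
import math
import random


def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam); 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= lam / j  # P(Y = j) from P(Y = j - 1)
        total += term
    return min(total, 1.0)


def randomized_pit(y, lam, rng):
    """Uniform draw between F(y - 1) and F(y); Uniform(0, 1) under a calibrated forecast."""
    lower = poisson_cdf(y - 1, lam)
    upper = poisson_cdf(y, lam)
    return lower + rng.random() * (upper - lower)


def sample_poisson(lam, rng):
    """Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


# Usage: outcomes drawn from the forecast itself, i.e. an ideal forecaster.
rng = random.Random(0)
pits = []
for _ in range(20000):
    lam = rng.uniform(0.5, 5.0)
    pits.append(randomized_pit(sample_poisson(lam, rng), lam, rng))
mean_pit = sum(pits) / len(pits)
share_below_quarter = sum(p < 0.25 for p in pits) / len(pits)
```

For an ideal forecaster the PIT values should look uniform; a histogram of `pits` is the usual visual check.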
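Predictively Oriented Kalman Filtering
Zheyang Shen, Gerardo Duran-Martin, Chris J. Oates
AI summary: To address over-confident inference caused by model misspecification in nonlinear state-space models, proposes EKF-PrO, a fast approximate linear-Gaussian update based on predictively oriented posteriors, with computational cost comparable to existing filtering methods.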
This paper presents a post-Bayesian approach to online filtering in nonlinear state-space models, capable of avoiding over-confident inferences in settings where either the dynamical model, the measurement model, or both, could be misspecified. This is addressed using predictively oriented (PrO) posteriors, an emerging paradigm in which learning (i.e., posterior concentration) occurs if and only if the overall model is well-specified, without strict adherence to Bayes' theorem. As the characterisation of PrO posteriors is challenging, our main technical contribution is a fast approximate linear-Gaussian update procedure, analogous to an (iterated) extended Kalman filter. The methodology, which we call EKF-PrO, has no tunable hyper-parameters and has a computational cost comparable to that of existing filtering methods. Performance is empirically assessed on a range of linear and non-linear applications, in which the state-space model is systematically misspecified.
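Asymptotic Theory of Chain-of-Thought in In-Context Learning
Kaito Takanami, Cengiz Pehlevan
AI summary: Using high-dimensional random matrix theory, derives an exact formula for the generalization error of chain-of-thought reasoning in in-context linear regression, revealing phase transitions governed by reasoning depth, amount of pretraining data, and context length.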
Chain-of-thought (CoT) reasoning has become a widely used mechanism for eliciting multi-step reasoning in large language models by generating intermediate reasoning steps at inference time. Yet the scaling behavior of generalization with CoT depth remains poorly understood. To address this question, we study a theoretically solvable model of CoT for in-context weight prediction in linear regression, where test-time reasoning is represented as an iterative refinement of the weight-parameter estimate. Using tools from random matrix theory under high-dimensional asymptotics, we derive an exact formula for the generalization error as a function of reasoning depth, pretraining data amount, and context length. Our analysis reveals sharp phase transitions separating regimes of exponential improvement, polynomial improvement, saturation, and overthinking, and characterizes how the optimal reasoning depth scales. We further show that deeper reasoning is most effective with sufficiently rich pretraining and in-context information, whereas limited pretraining or context makes longer reasoning prone to error amplification or saturation. We also validate these predictions through experiments on fully learned linear attention and softmax attention models. Our results provide a unified theoretical account of how test-time CoT depth affects generalization.
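One simple way to picture "iterative refinement of the weight-parameter estimate" is to treat each reasoning step as one gradient step on the in-context least-squares loss. The sketch below makes exactly that assumption (zero initialization, Gaussian data, a step size `eta` chosen small enough for stability) and is not the model analysed in the paper; it only shows the estimation error falling as the number of steps grows, not the phase transitions or overthinking.

```python
import math
import random


def gd_refinement(X, y, steps, eta):
    """Weight estimates after each of `steps` gradient steps on (1/2n)||y - Xw||^2, from w = 0."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    path = [w[:]]
    for _ in range(steps):
        resid = [y[i] - sum(X[i][j] * w[j] for j in range(d)) for i in range(n)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(d)]
        w = [w[j] - eta * grad[j] for j in range(d)]
        path.append(w[:])
    return path


# Usage: 40 in-context examples in d = 3 with small label noise.
rng = random.Random(1)
w_true = [1.0, -2.0, 0.5]
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(40)]
y = [sum(xj * wj for xj, wj in zip(row, w_true)) + rng.gauss(0.0, 0.1) for row in X]
path = gd_refinement(X, y, steps=100, eta=0.2)
errors = [math.dist(w, w_true) for w in path]  # error after 0, 1, ..., 100 "reasoning steps"
```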
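Optimized Labeling-Resource Allocation for Prediction-Powered Inference via OPAL
Virginia L. Ma, Emmanuel J. Candès
AI summary: Proposes OPAL, which allocates labeling resources through a learnable smooth policy that minimizes estimator variance, enabling efficient labeling and statistical inference in prediction-powered inference.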
Active Statistical Inference is a new framework for making precise claims about population parameters with provable statistical guarantees. It uses a predictive "black-box" machine learning (ML) model to strategically decide which data points to label, roughly prioritizing samples whose labels the ML model is most uncertain about. A major issue is that the framework can be brittle when uncertainty estimates are noisy. This paper introduces OPAL (Optimized Policy for Allocation of Labels), which learns a labeling strategy within a tractable class of smooth policies to yield estimators with the lowest variance. In effect, OPAL is an end-to-end pipeline that turns a black-box model's uncertainty scores into a data-adaptive labeling strategy and then performs inference on the collected samples. We evaluate OPAL on real datasets spanning medical imaging, computational social science, and proteomics. As a concrete example, we consider predicting breast cancer subtype from histopathology images and use OPAL to form valid confidence intervals for odds ratios across demographic groups. We show that OPAL achieves nominal coverage in finite samples and has the accuracy one would expect from methods that use far more labeled samples.
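For a population mean, the estimator behind active inference can be sketched as follows: every unit contributes its prediction, and each labeled unit adds an inverse-probability-weighted correction, which keeps the estimate unbiased for any labeling probabilities bounded away from zero. The smooth policy `policy` below is a fixed, hypothetical choice, and the uncertainty scores are simulated rather than produced by a model; OPAL's contribution is to learn the policy within a smooth class so as to minimise variance, which this sketch does not do.

```python
import math
import random


def policy(uncertainty, scale=0.5, floor=0.05):
    """Hypothetical smooth labeling policy: probability grows with uncertainty, clipped to [floor, 1]."""
    return min(1.0, max(floor, scale * uncertainty))


def active_mean_estimate(preds, labels, uncertainties, rng):
    """Prediction-corrected, inverse-probability-weighted mean with a CLT 95% interval.

    labels[i] is only used when unit i is selected for labeling.
    """
    terms, n_labeled = [], 0
    for f, y, u in zip(preds, labels, uncertainties):
        pi = policy(u)
        if rng.random() < pi:  # xi_i ~ Bernoulli(pi_i): unit i gets labeled
            n_labeled += 1
            terms.append(f + (y - f) / pi)
        else:
            terms.append(f)
    n = len(terms)
    est = sum(terms) / n
    sd = math.sqrt(sum((t - est) ** 2 for t in terms) / (n - 1))
    half = 1.96 * sd / math.sqrt(n)
    return est, (est - half, est + half), n_labeled


# Usage: true E[Y] = 1, but the predictor is biased upwards by 0.3.
rng = random.Random(7)
xs = [rng.random() for _ in range(20000)]
noise_sd = [0.1 + x for x in xs]  # stands in for the model's uncertainty scores
ys = [2 * x + rng.gauss(0, s) for x, s in zip(xs, noise_sd)]
preds = [2 * x + 0.3 for x in xs]
estimate, ci, n_labeled = active_mean_estimate(preds, ys, noise_sd, rng)
naive = sum(preds) / len(preds)  # ignores labels entirely
```

The naive prediction-only mean inherits the predictor's bias, whereas the corrected estimate does not, while labeling only about 30% of the units.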
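Efficient Federated Estimation and Inference for High-Dimensional Tail Index Regression
Haoyu Geng, Liuhua Peng, Changliang Zou, Xiaolong Cui
AI summary: For heterogeneous federated data, proposes a high-dimensional tail index regression method combining sparsity regularization with nonconcave fusion penalties to perform coefficient estimation, variable selection, and group recovery, and develops a debiased federated inference procedure.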
Tail index regression studies how covariates affect tail heaviness in heavy-tailed data. In many applications, data are distributed across heterogeneous sources, where direct pooling is infeasible due to privacy or regulatory constraints. Existing methods mainly focus on single-dataset analysis and do not address heterogeneous federated settings. We develop a personalized federated framework for high-dimensional tail index regression that accommodates client heterogeneity while exploiting latent similarities across clients. The proposed estimator combines sparsity regularization with nonconcave fusion penalties to perform coefficient estimation, variable selection, and group recovery. We establish non-asymptotic convergence rates and show that the estimator enjoys an oracle property by consistently recovering the underlying grouping structure. For computation, we develop an ADMM-based federated algorithm with adaptive gradient updates and establish its convergence guarantees. We further propose a debiased federated inference procedure based on adaptive weighted aggregation across related clients, yielding valid confidence intervals and hypothesis tests with improved efficiency over target-only inference. Simulation studies and real-data analysis demonstrate the effectiveness of the proposed methods.
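For a single client, the building block of tail index regression is a Pareto-type likelihood for exceedances above a threshold, with the tail index linked to covariates through alpha(x) = exp(x'beta). The sketch below is our own minimal illustration under that log-link assumption; the fusion penalties, sparsity regularisation, ADMM federation, and debiasing described above are not shown. With an intercept only, the maximiser has the closed (Hill-type) form used in `hill_estimate`.

```python
import math
import random


def hill_estimate(sample, threshold):
    """Pareto-tail MLE (Hill-type) of the tail index alpha from exceedances over threshold."""
    logs = [math.log(y / threshold) for y in sample if y > threshold]
    return len(logs) / sum(logs)


def tail_index_nll(beta, covariates, sample, threshold):
    """Negative log-likelihood of single-client tail index regression, alpha(x) = exp(x . beta).

    Assumes the exceedance density alpha * y**(-alpha - 1) * threshold**alpha for y > threshold.
    """
    nll = 0.0
    for x, y in zip(covariates, sample):
        if y <= threshold:
            continue
        alpha = math.exp(sum(b * xj for b, xj in zip(beta, x)))
        nll -= math.log(alpha) - alpha * math.log(y / threshold) - math.log(y)
    return nll


# Usage: Pareto data with alpha = 2 above omega = 1, drawn by inverse CDF.
rng = random.Random(3)
true_alpha, omega = 2.0, 1.0
ys = [omega * (1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(20000)]
alpha_hat = hill_estimate(ys, omega)
ones = [[1.0] for _ in ys]  # intercept-only design
nll_true = tail_index_nll([math.log(2.0)], ones, ys, omega)
nll_wrong = tail_index_nll([math.log(4.0)], ones, ys, omega)
```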
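TransGAN-WT: An Anomaly Detection Model for Wind Turbine Time Series Data Based on Feature Extraction and Interactive Learning
Jingzhe Kang
AI summary: Proposes TransGAN-WT, an anomaly detection model that fuses a Transformer with a generative adversarial network; by amplifying reconstruction error, extracting multimodal features autoregressively, and learning interactions between temporal features, it reaches an average F1 score of 96.10% with a false positive rate of only 0.06% on real wind turbine datasets.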
With the growing scale and number of wind farms, the daily operation and maintenance (O&M) costs of wind turbines are rising. To reduce O&M costs and improve the reliability of wind turbines and their operating data before catastrophic failures occur, it is crucial to monitor the operating status of the equipment and detect faults at an early stage. Using working-condition data to assess abnormalities in the operating status of wind turbines, and thereby monitor that status, is therefore of great practical significance. However, existing anomaly detection methods can neither model relationships effectively in data containing large amounts of redundant information nor make reasonable use of the valuable anomaly data. To address this, this paper proposes TransGAN-WT, an anomaly detection model that fuses a Transformer and a generative adversarial network. First, it reduces the missed-detection rate for anomalies with small deviations by amplifying the reconstruction error. Second, it uses autoregressive inference to extract multimodal features, improving the stability and generalization of training. Finally, a temporal feature extraction module is constructed to promote interactive learning between features at different time scales and to reduce temporal redundancy. Multiple sets of experiments on real wind turbine generator (WTG) datasets show that TransGAN-WT achieves an average F1 score of 96.10% across multiple wind turbine datasets, 5.84% and 2.89% higher than other state-of-the-art baseline methods. It also achieves a false positive rate (FPR) of 0.06%, and a Wilcoxon signed-rank test confirms that its performance improvement over the state-of-the-art baselines is statistically significant, helping to ensure the stable operation of wind turbines.
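Marginalised Poisson Hurdle Model for Cross-Sectional Count Data with Excess Zeros
Fred Fosu Agyarko, Edward Acheampong, Issah Seidu, Samuel Iddi
AI summary: For count data with excess zeros, proposes the marginalised Poisson hurdle model (MPHM), which reparametrises the count component to model the marginal mean directly, resolving the non-constant incidence density ratio (IDR) of the standard Poisson hurdle model, and establishes the asymptotic properties of the estimator.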
Count data with excess zeros arise frequently in health economics and epidemiology. The standard Poisson Hurdle Model (PHM) parametrises the underlying Poisson rate directly, so its count-component coefficients are log-rate ratios rather than log-ratios of the marginal mean. Consequently, the incidence density ratio (IDR) from the PHM is neither exact nor constant across covariate profiles, complicating applied reporting. We propose the Marginalised Poisson Hurdle Model (MPHM), which reparametrises the count component so that the coefficient vector β directly governs the marginal mean E[Y]. A nonlinear connector equation links the structural Poisson rate to this parametrised mean. We prove existence and uniqueness of the connector solution, develop a vectorised Brent's-method solver, derive the score equations and block-diagonal Fisher information, establish asymptotic normality, and prove that exp(β) is exactly constant across all covariate values. A simulation study with n ∈ {100, 250, 500, 1000}, zero proportion π ∈ {0.2, 0.4, 0.6, 0.8}, and R = 200 replications confirms consistency, near-zero bias, and 95% Wald coverage of 0.905–0.975 across all 16 scenarios. Applied to the NMES1988 physician visit data (n = 4,406), the MPHM yields IDR = 1.163 (95% CI: 1.150–1.177) per additional chronic condition, an exact, population-wide effect not derivable from the PHM. The MPHM resolves the non-constant IDR problem by directly parametrising E[Y]. The resulting IDR holds for every individual and the whole population without further marginalisation, substantially simplifying the reporting of covariate effects in health utilisation research.
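The connector step can be sketched under the standard hurdle structure, in which P(Y > 0) = p and positive counts follow a zero-truncated Poisson(lambda), so that E[Y] = p * lambda / (1 - exp(-lambda)). We assume this form for illustration; the paper's exact parametrisation may differ. The sketch solves for lambda by bisection (a simpler stand-in for the vectorised Brent's-method solver described above) and checks numerically that exp(beta_1) is the ratio of marginal means at x + 1 versus x, however the hurdle probability varies with x. All coefficient values are hypothetical.

```python
import math


def zt_mean_factor(lam: float) -> float:
    """lam / (1 - exp(-lam)): the mean of a zero-truncated Poisson(lam)."""
    return lam / -math.expm1(-lam)


def connector_rate(mu: float, p_positive: float, tol: float = 1e-12) -> float:
    """Solve p * lam / (1 - exp(-lam)) = mu for the structural Poisson rate lam.

    A solution with lam > 0 exists only if mu / p > 1, since a positive count is at least 1.
    Bisection is used as a simple stand-in for Brent's method.
    """
    target = mu / p_positive
    if target <= 1.0:
        raise ValueError("need mu / p > 1 for a solution with lam > 0")
    # lam < lam / (1 - exp(-lam)) < lam + 1, so the root lies in [target - 1, target].
    lo, hi = max(target - 1.0, 1e-12), target
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if zt_mean_factor(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def marginal_mean(x, beta0=0.2, beta1=0.15):
    """Hypothetical marginal mean model: log E[Y] = beta0 + beta1 * x."""
    return math.exp(beta0 + beta1 * x)


def hurdle_prob(x):
    """Hypothetical logistic model for P(Y > 0)."""
    return 1.0 / (1.0 + math.exp(-(-0.5 + 0.4 * x)))


ratios = []
for x in [0.0, 1.0, 2.0, 3.0]:
    means = []
    for xx in (x, x + 1.0):
        p = hurdle_prob(xx)
        lam = connector_rate(marginal_mean(xx), p)
        means.append(p * zt_mean_factor(lam))  # E[Y] implied by (p, lam)
    ratios.append(means[1] / means[0])  # IDR for a one-unit increase in x
```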
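Fast Screening Methods for High-Dimensional Outcomes and High-Dimensional Predictors
Hongju Park, Zhenyao Ye, Shuo Chen
AI summary: Proposes the Graph Independence Dual Screening (GIDS) framework, which reduces the dimensionality of responses and predictors simultaneously, to ease the computational burden and improve interpretability in high-dimensional cross-modal analysis.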
Modeling interactions among multimodal, high-dimensional data is intrinsically challenging due to ultra-high dimensionality, complex dependence structures, and high noise levels. Screening methods are effective for reducing dimensionality, but most existing approaches shrink only the predictor space while retaining all outcomes. In cross-modal analyses, different outcomes often select different predictor subsets, so the union remains large and the response dimension is unchanged, limiting the practical benefit of screening. This leads to heavy computational burdens and poor interpretability. To address these limitations, we propose a new screening framework, Graph Independence Dual Screening (GIDS), which simultaneously reduces the dimensionality of response variables and predictors. We design computationally efficient algorithms that facilitate downstream selection procedures, improving accuracy and scalability, and establish supporting theoretical results. Extensive simulation studies demonstrate that GIDS outperforms existing methods that screen only predictors. To illustrate its utility, we apply GIDS to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, analyzing interactions between 865,353 genome-wide DNA methylation variables and 49,386 transcriptomic variables. GIDS reduces the feature space to approximately 9,000 CpGs and 2,000 transcripts, uncovering blockwise interaction structures: clusters of CpG sites and gene transcripts with strong associations. These findings not only improve computational tractability but also yield interpretable biological insights, highlighting coordinated regulatory mechanisms underlying Alzheimer's disease.
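To see why reducing both sides helps, consider a deliberately simplified dual screen that keeps only the predictors and responses taking part in at least one strong marginal association. This is our own illustrative stand-in based on Pearson correlation with a hypothetical threshold; it is not the graph-independence statistic used by GIDS.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def dual_marginal_screen(predictors, responses, threshold):
    """Keep predictor/response indices that appear in any pair with |corr| > threshold.

    A simplified marginal-correlation stand-in, not the graph-based GIDS procedure.
    """
    kept_x, kept_y = set(), set()
    for j, xcol in enumerate(predictors):
        for k, ycol in enumerate(responses):
            if abs(pearson(xcol, ycol)) > threshold:
                kept_x.add(j)
                kept_y.add(k)
    return sorted(kept_x), sorted(kept_y)


# Usage: 50 predictors, 30 responses; only X0 -> Y0 and X1 -> Y1 carry signal.
rng = random.Random(11)
n = 200
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(50)]
Y = [[X[0][i] + 0.3 * rng.gauss(0, 1) for i in range(n)],
     [-X[1][i] + 0.3 * rng.gauss(0, 1) for i in range(n)]]
Y += [[rng.gauss(0, 1) for _ in range(n)] for _ in range(28)]
kept_predictors, kept_responses = dual_marginal_screen(X, Y, threshold=0.5)
```

Both the predictor set and the response set shrink, which is the point the abstract makes about screening only predictors.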
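Powerful Switchback Experiments? Or Not?
Sergei Pankratev
AI summary: Derives a closed-form, multi-level asymptotic variance formula for the individual-level OLS estimator in switchback experiments, reveals a structural floor on statistical power, and studies three methodological applications.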
Switchback experiments -- in which treatment is assigned at the level of a cluster crossed with a time period -- are widely used in marketplace and platform settings, yet no closed-form power formula exists for them. We fill this gap by deriving a closed-form, multi-level asymptotic variance approximation for the individual-level OLS estimator, facilitating power budgeting. Using this formula, we reveal a structural floor on statistical power: while idiosyncratic noise vanishes with observation density, macro-level shocks are multiplicatively penalized by cluster size imbalance. We confirm through analytical derivations and Monte Carlo simulations that the formula is exact across typical parameters and serves as a mathematically conservative upper bound in extreme boundary regimes. We study three methodological applications. First, we prove that advanced assignment designs like stratification only partially eliminate the penalty of cluster size imbalance on power. Second, we demonstrate that variance reduction techniques targeting macro-level shocks yield disproportionately greater efficiency gains than those targeting residual noise. Third, we formalize the finite-sample power trade-offs between individual-level and cell-level estimators.
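Computing the Final Epidemic Size Distribution of Multi-Type Galton–Watson Processes
Yuta Okada, Hiroshi Nishiura
AI summary: Proposes a method based on the choice of the Cauchy integral contour for computing the final size distribution of multi-type Galton–Watson processes, and applies it to simulated data and to real data on Middle East respiratory syndrome.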
The Galton–Watson process (GWP) is a discrete-time branching process model that provides a powerful tool for analyzing epidemic data and estimating key epidemiological parameters such as the basic reproduction number. When used with surveillance-based cluster size data, the GWP can also elicit information about the extent of transmission heterogeneity, even when each transmission process is not directly observable. When cluster size distribution data are available, the parameters that govern the transmission can be statistically inferred by using the probability mass function that corresponds to the observed cluster size data. For multi-type GWPs, however, real-world applications remain limited, possibly because of the absence of conceptually and practically straightforward approaches for deriving the closed-form solution of the final size distribution. In the present study, we propose a framework for computing the final size distribution of multi-type GWPs, using a method for the choice of the Cauchy integral contour. We provide examples of how our framework can be applied to both simulated data and real-world data of Middle East respiratory syndrome, and discuss potential pitfalls surrounding the identifiability of parameters for statistical inference when using likelihoods that are not conditioned on extinction.
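For a single-type process started by one case, the total outbreak size N satisfies the Lagrange-inversion identity P(N = n) = (1/n) [s^(n-1)] G(s)^n, where G is the offspring probability generating function, and the coefficient can be extracted numerically with a Cauchy integral over a circle |s| = r, discretised by the trapezoidal rule. The sketch below is our own single-type illustration (the paper treats the multi-type case and a specific rule for choosing the contour); here r = 1 is an assumption, and Poisson offspring are used because the exact answer, the Borel distribution, is known in closed form.

```python
import cmath
import math

R0 = 0.8  # hypothetical mean offspring number (subcritical)


def poisson_pgf(s):
    """Offspring PGF for Poisson(R0) offspring."""
    return cmath.exp(R0 * (s - 1))


def total_size_pmf(pgf, n, radius=1.0, points=128):
    """P(total size = n) via (1/n) [s^(n-1)] pgf(s)^n, coefficient from the trapezoidal
    rule on the circle |s| = radius: [s^k] f = (1/M) sum_j f(s_j) / s_j^k."""
    k = n - 1
    acc = 0j
    for j in range(points):
        s = radius * cmath.exp(2j * math.pi * j / points)
        acc += pgf(s) ** n / s ** k
    return (acc / points).real / n


def borel_pmf(n, r0):
    """Closed-form total-size distribution for Poisson(r0) offspring."""
    return math.exp(-r0 * n) * (r0 * n) ** (n - 1) / math.factorial(n)


contour = [total_size_pmf(poisson_pgf, n) for n in range(1, 7)]
exact = [borel_pmf(n, R0) for n in range(1, 7)]
```

For large n the coefficient becomes tiny relative to the terms being summed, so the radius matters for numerical accuracy; that is where a principled contour choice, as in the paper, becomes important.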
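Scalable Derivative Gaussian Processes via Exact Gradient Reduction
Hyunseok Seung, Matthias Katzfuss
AI summary: Proposes TERA, which uses exact gradient reduction to cut the cost of derivative Gaussian processes from O(n^3 d^3) to O(d m^2 + m^6) time per target, enabling scalable inference in high-dimensional spaces.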
Gradient observations can substantially improve Gaussian process (GP) surrogates, particularly in high-dimensional settings where function evaluations are expensive. However, exact inference with $n$ function values and $n$ full gradients in $d$ dimensions scales cubically in the joint state size, imposing an intractable $\mathcal{O}(n^3 d^3)$ computational bottleneck. We introduce TERA, a highly scalable derivative GP method based on target-specific exact gradient reduction. We prove that for stationary kernels, the gradient components orthogonal to the directions connecting the target and conditioning points are conditionally independent of the target function value; consequently, the exact conditional density is fully characterized by at most $m^2$ directional derivatives once a conditioning set of size $m$ is specified. By using these reduced, dimension-free conditionals as local factors in a Vecchia approximation, TERA effectively decouples $n$ and $d$ from the dense matrix inversion. This reduces the per-target evaluation cost to $\mathcal{O}(dm^2 + m^6)$ time and $\mathcal{O}(dm^2 + m^4)$ memory, leaving the underlying derivative GP model mathematically unchanged. Empirical evaluations demonstrate that TERA achieves state-of-the-art predictive accuracy while operating orders of magnitude faster than standard derivative GPs. Crucially, both computation time and peak GPU memory remain essentially flat with respect to $d$, enabling highly scalable inference in high-dimensional spaces.
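The geometric fact behind the reduction can be checked numerically for a squared-exponential kernel: the covariance between f(x*) and the derivative of f at x_i along a direction v is k(x*, x_i) v.(x* - x_i) / l^2, which vanishes whenever v is orthogonal to x* - x_i. The sketch verifies this pairwise covariance identity against a finite difference; the kernel choice and hyperparameters are assumptions, and neither the full conditional-independence result nor the Vecchia construction is reproduced.

```python
import math


def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return variance * math.exp(-sq / (2.0 * lengthscale ** 2))


def cov_value_dirderiv(x_star, x_i, v, lengthscale=1.0, variance=1.0):
    """Cov(f(x_star), v . grad f(x_i)) = k(x_star, x_i) * v . (x_star - x_i) / lengthscale**2."""
    k = se_kernel(x_star, x_i, lengthscale, variance)
    proj = sum(vj * (s - t) for vj, s, t in zip(v, x_star, x_i))
    return k * proj / lengthscale ** 2


def finite_difference(x_star, x_i, v, h=1e-6, lengthscale=1.0, variance=1.0):
    """Central difference of t -> k(x_star, x_i + t v) at t = 0, as an independent check."""
    plus = [t + h * vj for t, vj in zip(x_i, v)]
    minus = [t - h * vj for t, vj in zip(x_i, v)]
    return (se_kernel(x_star, plus, lengthscale, variance)
            - se_kernel(x_star, minus, lengthscale, variance)) / (2.0 * h)


x_star = [1.0, 0.0, 2.0]
x_i = [0.0, 0.5, 1.0]
along = [s - t for s, t in zip(x_star, x_i)]  # [1.0, -0.5, 1.0]
orthogonal = [0.5, 1.0, 0.0]                   # dot product with `along` is 0
cov_along = cov_value_dirderiv(x_star, x_i, along)
cov_orth = cov_value_dirderiv(x_star, x_i, orthogonal)
fd_along = finite_difference(x_star, x_i, along)
fd_orth = finite_difference(x_star, x_i, orthogonal)
```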
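Neural Posterior Estimation for Stochastic Epidemic Models from Final Outcome Data
Theodore Kypraios
AI summary: Applies neural posterior estimation (NPE), for the first time, to stochastic SIR epidemic models observed through final outcome data; a log-normal posterior approximation parameterised by a feed-forward neural network accurately recovers reference posteriors and extends to joint inference on global and local transmission rates in the household model.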
Neural posterior estimation (NPE) is a simulation-based approach to Bayesian inference that trains a neural network to approximate the posterior distribution from simulated parameter–data pairs, bypassing likelihood evaluation. To our knowledge, we are the first to apply NPE to stochastic susceptible-infectious-removed (SIR) epidemic models observed through final outcome data, considering both homogeneously mixing and household-structured populations. Such data arise naturally in retrospective outbreak investigations and household transmission studies, yet inference is computationally challenging: data-augmentation Markov chain Monte Carlo (MCMC) can be slow to mix in large populations and difficult to implement, while Approximate Bayesian Computation (ABC) suffers from low acceptance rates, particularly for large populations or unlikely outcomes. The discrete, low-dimensional nature of such observations makes this setting particularly well suited to NPE. We show that a log-normal posterior approximation, parameterised by a feed-forward neural network, accurately recovers reference posteriors across a range of population sizes and transmission regimes, and extends naturally to joint inference on global and local transmission rates in the household model. Once trained, the network produces approximate posterior distributions in seconds and generalises reliably to population sizes and structures not seen during training. Performance on both synthetic and real outbreak datasets is consistently strong, with results in close agreement with published analyses.
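NPE needs only a simulator of (parameter, data) pairs. For the homogeneously mixing Markov SIR model, the final size can be simulated from the embedded jump chain without tracking event times, since with exponential infectious periods only the ratio r0 = beta/gamma affects the final outcome. The sketch below covers that simulator and the generation of training pairs under a hypothetical uniform prior; the neural network, the log-normal posterior head, and the household model are not shown.

```python
import random


def simulate_final_size(n_susceptible, n_initial, r0, rng):
    """Final number of infections among the initial susceptibles in a Markov SIR epidemic.

    Embedded jump chain: from (s, i) the next event is an infection with probability
    (r0 * s / N) / (r0 * s / N + 1), otherwise a removal, so only r0 matters.
    """
    population = n_susceptible + n_initial
    s, i = n_susceptible, n_initial
    while i > 0:
        infection_rate = r0 * s / population  # per infective, in units of the removal rate
        if rng.random() < infection_rate / (infection_rate + 1.0):
            s -= 1
            i += 1
        else:
            i -= 1
    return n_susceptible - s


def training_pairs(n_pairs, n_susceptible, n_initial, rng, prior=(0.5, 3.0)):
    """(r0, final size) pairs for NPE training, with r0 from a hypothetical uniform prior."""
    pairs = []
    for _ in range(n_pairs):
        r0 = rng.uniform(*prior)
        pairs.append((r0, simulate_final_size(n_susceptible, n_initial, r0, rng)))
    return pairs


# Usage: a population of 100 with one initial infective.
rng = random.Random(5)
pairs = training_pairs(1000, 99, 1, rng)
low_mean = sum(simulate_final_size(99, 1, 0.1, rng) for _ in range(500)) / 500
high_mean = sum(simulate_final_size(99, 1, 3.0, rng) for _ in range(500)) / 500
```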
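Identification, Estimation, and Inference for Sequential Causal Mediation Pathways
Ritoban Kundu, Canyi Chen, Peter X. K. Song
AI summary: Establishes a general framework for sequentially ordered mediators that decomposes the total effect into path-specific effects, and proposes an inference method based on a studentized statistic and data splitting that provides valid Type I error control under the composite null hypothesis.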
Mediation analysis plays an essential role in uncovering the mechanisms by which an exposure influences an outcome through intermediate pathways. While methodological advances for single-mediator settings are well established, rigorous tools for handling multiple, sequentially ordered mediators remain underdeveloped. Such settings are common in applications like longitudinal cohort studies, where exposures operate through complex chains of mediators over time. In this paper, we establish a general framework for sequentially ordered mediators that enables the identification and formal decomposition of the total effect into component path-specific effects. We also develop estimation procedures for mediation estimands with both continuous and categorical outcomes. Furthermore, we introduce a new testing strategy to conduct inference using a studentized statistic combined with data-splitting. This approach achieves valid Type I error control under the composite null across diverse data-generating mechanisms. Through extensive simulations and applications to two large-scale empirical studies, we demonstrate that the proposed methodology provides reliable estimation, valid inference, and improved power for discovering novel mediation pathways.
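In the simplest special case, a linear structural equation model with two ordered mediators and no interactions, the path-specific effects reduce to products of coefficients along each path and sum to the total effect. The sketch below, with hypothetical coefficient values, illustrates that decomposition only; the paper's framework is more general (it also covers categorical outcomes), and its studentized, data-splitting test is not shown.

```python
def path_specific_effects(a1, a2, d, b1, b2, c):
    """Path-specific effects of a unit change in X in the linear SEM
        M1 = a1*X + e1,  M2 = a2*X + d*M1 + e2,  Y = c*X + b1*M1 + b2*M2 + e3.
    """
    return {
        "X->Y": c,
        "X->M1->Y": a1 * b1,
        "X->M2->Y": a2 * b2,
        "X->M1->M2->Y": a1 * d * b2,
    }


def total_effect(a1, a2, d, b1, b2, c):
    """Propagate a unit change in X through the structural equations in order."""
    dm1 = a1
    dm2 = a2 + d * dm1
    return c + b1 * dm1 + b2 * dm2


coefs = dict(a1=0.5, a2=0.2, d=0.8, b1=0.3, b2=0.6, c=0.1)  # hypothetical values
effects = path_specific_effects(**coefs)
total = total_effect(**coefs)
```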
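Emulators for Large-Scale Computer Experiments with Quantitative and Qualitative Inputs
Anita Shahrokhian, Youngdeok Hwang, C. Devon Lin
AI summary: Proposes a scalable framework based on additive Gaussian processes and the Vecchia approximation for emulating large-scale computer experiments with mixed inputs.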
Computer experiments with both quantitative and qualitative inputs have become common across various areas. However, constructing accurate and computationally efficient emulators for such experiments at large scales remains a significant challenge. We propose a novel, scalable framework for emulating computer experiments with mixed inputs. Our approach is based on a new covariance function integrating additive Gaussian Processes (GPs) to handle the mixed inputs, with Vecchia approximation for scalability. We demonstrate that methods for large-scale computer experiments can be effectively extended when paired with our proposed modeling framework.
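A covariance of this general kind can be sketched as the sum of a squared-exponential kernel on the quantitative inputs and an exchangeable-correlation kernel on a single qualitative factor, together with the neighbour-based conditioning sets used by Vecchia approximations. The specific kernels, the single-factor setting, the correlation-based neighbour rule, and all hyperparameters are our assumptions, not the paper's exact covariance function.

```python
import math


def mixed_kernel(a, b, lengthscale=1.0, var_quant=1.0, var_qual=0.5, rho=0.3):
    """Additive covariance for inputs (x, z): x is a tuple of reals, z a category label.

    Squared-exponential term on x plus an exchangeable-correlation term on z
    (correlation 1 for equal levels, rho otherwise; rho must exceed -1/(levels - 1)).
    """
    (x1, z1), (x2, z2) = a, b
    sq = sum((u - v) ** 2 for u, v in zip(x1, x2))
    quant = var_quant * math.exp(-sq / (2 * lengthscale ** 2))
    qual = var_qual * (1.0 if z1 == z2 else rho)
    return quant + qual


def cholesky(matrix, jitter=1e-10):
    """Lower-triangular Cholesky factor of matrix + jitter * I; ValueError if not PD."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s + jitter <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s + jitter)
            else:
                low[i][j] = s / low[j][j]
    return low


def vecchia_neighbors(points, i, m, kernel=mixed_kernel):
    """Indices of up to m earlier points most correlated with point i (one common Vecchia choice)."""
    return sorted(range(i), key=lambda j: -kernel(points[i], points[j]))[:m]


points = [((0.0,), "A"), ((0.5,), "B"), ((1.0,), "A"), ((1.5,), "C"), ((2.0,), "B")]
gram = [[mixed_kernel(p, q) for q in points] for p in points]
factor = cholesky(gram)  # succeeds, so the Gram matrix is positive definite
nbrs = vecchia_neighbors(points, 4, 2)
```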
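ScoreStop: Gradient-Based Early Stopping Using Functional Score Tests
Oliver J. Hines, Christian L. Hines
AI summary: Proposes ScoreStop, which at each iteration uses a functional score test to check whether the current predictor is the population risk minimizer, giving a gradient-based early-stopping rule for gradient boosted decision trees that avoids overfitting.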
Gradient boosted decision trees require a stopping rule to avoid overfitting. The standard rule monitors a validation loss and stops if the loss fails to improve for a fixed patience period. However, the patience parameter has no interpretable scale, and validation losses can be noisy or only implicitly defined by a user-specified gradient. We propose ScoreStop, a gradient-based early-stopping rule that casts the stopping decision at each iteration as a test of the null hypothesis that the current predictor is the population risk minimizer. We use a functional score test computed on validation data; its statistic is scale-invariant in the update direction and has a known asymptotic distribution under the null. Because our test uses gradients rather than loss values, the same construction applies to implicit losses such as LambdaRank and, via influence functions, to data-dependent losses such as Cox regression. In synthetic experiments and real-data benchmarks, we show that ScoreStop is competitive with loss-based methods.
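Under squared-error loss the idea can be sketched as follows: with validation gradients g_i = f(x_i) - y_i and a candidate update direction h_i (the next tree's output), the current predictor can be a risk minimiser only if the directional derivative E[g h] is zero, so a studentised mean of g_i h_i serves as a score statistic. This is our own minimal reading of a functional score test, assuming squared loss, the update f <- f + nu * h, and a one-sided normal approximation; it is not the authors' implementation. Multiplying h by any positive constant leaves the statistic unchanged, mirroring the scale invariance mentioned above.

```python
import math
import random
import statistics
from statistics import NormalDist


def score_statistic(grad, direction):
    """Studentised mean of g_i * h_i; approximately N(0, 1) under the null E[g h] = 0."""
    prods = [g * h for g, h in zip(grad, direction)]
    return math.sqrt(len(prods)) * statistics.fmean(prods) / statistics.stdev(prods)


def should_stop(grad, direction, alpha=0.05):
    """Stop unless a step along `direction` significantly lowers the validation risk.

    With f <- f + nu * h the risk decreases when E[g h] < 0, so boosting continues
    only if the one-sided test rejects in that direction.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    return score_statistic(grad, direction) > -z


rng = random.Random(2)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [x + rng.gauss(0, 0.5) for x in xs]

# Early stage: f = 0 and the candidate direction h = x is still informative.
grad_early = [0.0 - y for y in ys]  # d/df of 0.5 * (y - f)^2 is f - y
t_early = score_statistic(grad_early, xs)

# Later stage: f equals the true regression function; candidate direction h = x^2.
grad_late = [x - y for x, y in zip(xs, ys)]
t_late = score_statistic(grad_late, [x * x for x in xs])
t_late_scaled = score_statistic(grad_late, [3.0 * x * x for x in xs])
```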
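Diagnostic Tools for Extreme Value Regression Models
Ed Mackay, Jordan Richards, Philip Jonathan
AI summary: Proposes two visual diagnostics for extreme value regression models, the standardised tail plot and the normalised residual plot, which use asymptotic distributions to enable global and regional goodness-of-fit comparisons in support of model selection and improvement.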
Visual and quantitative goodness-of-fit diagnostics are important tools in the practitioner's toolbox. The need for convincing and reliable diagnostics is particularly clear when fitting extreme value regression models, which are used for extrapolation far beyond the observable range of the response variable, and often evaluated at unobserved covariate values. Despite this, few diagnostics have been developed for extreme value regression models, and those available often suffer in terms of interpretability or scalability on low-dimensional or non-Euclidean covariate domains, often encountered in modern applications. Moreover, existing methods tend to offer a global perspective on model fit; that is, they quantify goodness-of-fit across the entire dataset, without offering insight into regions of the covariate space where the model fit may be poor. We propose two novel visual diagnostics for extreme value regression models: the standardised tail plot and the normalised residual plot. By considering the asymptotic distribution of normalised exceedance probabilities, we show that uncertainty bounds for our plots are approximately independent of the sample size used in their construction. This allows us to propose visual diagnostics which can efficiently and consistently compare goodness-of-fit at both a global and regional level, despite varying sample sizes over regions of the covariate domain. Following a discussion of summary statistics for global and regional goodness-of-fit, we provide two applications of extreme value regression models that illustrate how our diagnostics can be used to perform model comparison (across thousands of candidate models) and provide actionable findings that support model design.
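A common starting point for such diagnostics is to push each exceedance through its fitted generalised Pareto distribution (GPD) onto a standard exponential scale, where a correctly specified model gives Exp(1) values regardless of the covariates. The sketch below does that and builds exponential QQ pairs. It is our own illustration under an assumed GPD exceedance model with a hypothetical covariate-dependent scale; it is not the standardised tail plot or normalised residual plot proposed in the paper, whose construction and uncertainty bounds are not reproduced here.

```python
import math
import random


def gpd_to_exponential(excess, scale, shape):
    """Map an exceedance to the standard exponential scale under a fitted GPD(scale, shape)."""
    if abs(shape) < 1e-12:
        return excess / scale
    return math.log1p(shape * excess / scale) / shape


def exponential_qq(values):
    """(theoretical, empirical) quantile pairs against Exp(1) plotting positions."""
    ordered = sorted(values)
    n = len(ordered)
    return [(-math.log(1.0 - (k + 1) / (n + 1)), v) for k, v in enumerate(ordered)]


rng = random.Random(4)
shape = 0.1
transformed = []
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    scale = math.exp(0.5 + x)  # hypothetical covariate-dependent scale
    u = 1.0 - rng.random()  # in (0, 1]
    excess = scale * (u ** (-shape) - 1.0) / shape  # GPD draw via inverse survival function
    transformed.append(gpd_to_exponential(excess, scale, shape))
qq = exponential_qq(transformed)
mean_transformed = sum(transformed) / len(transformed)
```

Plotting `qq` gives an exponential QQ plot; systematic departures from the diagonal, overall or within a covariate region, point to misfit.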
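State-Coupled Volatility in Latent Dynamical Systems: Recovery under Partial Observation
Imani Beckett
AI summary: Proposes a state-coupled stochastic volatility framework and uses a particle expectation-maximization algorithm to estimate, under partial observation, how latent process variance depends on displacement from equilibrium; simulations assess recovery and detection performance.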
Latent state-space models are widely used to study partially observed dynamical systems, yet most formulations assume that process variability is independent of latent-state position. In many biological, behavioral, and physiological systems, however, variability may depend systematically on the underlying dynamical state, producing structured stochasticity that is not captured by constant-variance models. We introduce a state-coupled stochastic volatility framework in which latent process variance depends on displacement from a latent equilibrium. To estimate this relationship under partial observation, we develop a particle expectation-maximization procedure combining bootstrap particle filtering and backward trajectory smoothing. The model includes a coupling parameter, $γ$, that quantifies the strength of association between latent-state position and process variability. A large-scale simulation benchmark evaluated recovery and detection performance across varying coupling strengths, observation noise levels, trajectory lengths, and persistence regimes. The proposed framework consistently reduced recovery bias relative to an observed-state heteroskedastic proxy, with the largest improvements occurring under strong coupling. Recovery performance improved with increasing latent persistence, while detection performance remained competitive across a broad range of conditions and became increasingly advantageous as observation noise increased. Taken together, the results demonstrate that state-coupled volatility can be identified and estimated under partial observation when latent-state structure is explicitly modeled. The framework provides a practical methodological foundation for studying state-dependent variability and evaluating whether structured stochasticity contributes information about system dynamics beyond that contained in mean-state trajectories alone.
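The estimation setting can be made concrete with a small simulation. The sketch below simulates one possible state-coupled model and runs a bootstrap particle filter on it; this filter is the forward pass that a particle EM scheme would repeat. The following choices are ours, for illustration only, and are not the paper's specification: the transition variance $\sigma_0^2 \exp(\gamma\,|x_{t-1}-\mu|)$, the AR(1) mean dynamics around the equilibrium $\mu$, and the Gaussian observation noise. The backward trajectory smoothing and the M-step are omitted.

```python
import math
import random


def transition_sd(x_prev, mu, sigma0, gamma):
    """Hypothetical state-coupled s.d.: variance sigma0^2 * exp(gamma * |x_prev - mu|)."""
    return sigma0 * math.exp(0.5 * gamma * abs(x_prev - mu))


def simulate(T, mu, phi, sigma0, gamma, tau, rng):
    """Latent AR(1) around equilibrium mu with state-coupled noise, observed with N(0, tau^2) error."""
    x, xs, ys = mu, [], []
    for _ in range(T):
        x = mu + phi * (x - mu) + transition_sd(x, mu, sigma0, gamma) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(x + tau * rng.gauss(0.0, 1.0))
    return xs, ys


def bootstrap_filter(ys, mu, phi, sigma0, gamma, tau, n_particles, rng):
    """Bootstrap particle filter: returns a log-likelihood estimate and filtered means."""
    particles = [mu] * n_particles  # the simulated chain also starts at mu
    loglik, means = 0.0, []
    for y in ys:
        # Propagate with the transition itself (the "bootstrap" proposal).
        particles = [mu + phi * (p - mu) + transition_sd(p, mu, sigma0, gamma) * rng.gauss(0.0, 1.0)
                     for p in particles]
        # Weight by the Gaussian observation density of y given each particle.
        logw = [-0.5 * ((y - p) / tau) ** 2 - math.log(tau * math.sqrt(2.0 * math.pi))
                for p in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        loglik += m + math.log(total / n_particles)
        means.append(sum(wi * p for wi, p in zip(w, particles)) / total)
        # Multinomial resampling.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik, means
```

Comparing the log-likelihood estimate across a grid of `gamma` values gives a crude profile for the coupling parameter. A particle EM procedure instead updates the parameters from smoothed trajectories.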
Target Updates Can Stabilize Linear Q-Learning: Periodic and Soft Dynamics
Donghwan Lee
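AI summary: Uses exact switched linear system dynamics and joint spectral radius analysis to prove that, under specific spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee that linear Q-learning converges to the exact projected Q-Bellman solution.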
Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically well-established stabilization mechanisms, but their precise theoretical explanation is still incomplete. This paper gives a rigorous and exact analysis of these mechanisms for Q-learning with linear function approximation (linear Q-learning) using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint spectral radius (JSR) of the resulting switching matrix families. Although linear Q-learning can fail to converge in general, we prove that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee convergence to the exact projected Q-Bellman solution. The main analysis is carried out for deterministic linear Q-learning, where the target-update mechanism is most transparent. Once the corresponding JSR certificate is established for the mean recursion, the stochastic reinforcement-learning setting can be treated by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery
Tyler H. McCormick
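AI summary: Argues that modern machine learning suffers from generic underdetermination in high-dimensional proxy regimes. It proposes concrete criteria for "mechanistic ML" so that LLM-centered workflows genuinely support science rather than simulate it.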
Modern Machine Learning (ML) and Artificial Intelligence (AI) models, especially large language models (LLMs), are increasingly used to generate scientific hypotheses and mechanistic explanations from observational data. This position paper argues that in the high-dimensional proxy regimes where modern ML excels, mechanistic learning is generically underdetermined: many incompatible mechanisms induce essentially the same observational relationships on the support of the data, so predictive success and coherent explanations are insufficient evidence of mechanism discovery. This underdetermination becomes uniquely hazardous with LLMs, which tend to collapse large equivalence classes of explanations into a single fluent narrative. This paper proposes concrete standards for "mechanistic ML" and argues that these norms are necessary if LLM-centered workflows are to support science rather than merely simulate it.
Tracking Urban Air Pollutants with Sentinel-5P Satellite Data
Alice Gomez-Cantos, Henry O. Velesaca
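AI summary: Proposes a framework based on Sentinel-5P/TROPOMI tropospheric column observations for characterizing background and extreme urban NO2 pollution at the canton scale in Guayas Province, Ecuador. It combines distributional metrics such as the median and high percentiles with K-means clustering, giving an interpretable and scalable air-quality assessment tool for data-scarce regions.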
Urban nitrogen dioxide ($NO_2$) is a key indicator of combustion-related air pollution and exhibits strong spatial and temporal variability in cities. This study presents a satellite-based framework for tracking urban $NO_2$ pollution using tropospheric column observations from Sentinel-5P/TROPOMI over Guayas Province, Ecuador. Rather than estimating surface concentrations, the methodology emphasizes robust distributional metrics, including the median and upper-tail percentiles ($P_{90}$, $P_{95}$, and $P_{99}$), to characterize background conditions and localized pollution extremes at the canton scale. Multi-year satellite observations are aggregated annually and analyzed using unsupervised K-means clustering to identify characteristic pollution regimes without predefined thresholds. Results show that highly urbanized cantons consistently exhibit elevated extreme $NO_2$ values and greater variability, while less urbanized areas display lower and more homogeneous patterns. The proposed approach provides an interpretable and scalable tool for urban air-quality assessment in data-scarce regions using satellite observations alone. The implementation is publicly available on GitHub at https://hvelesaca.github.io/sentinel-5P-clustering/.
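The per-canton metrics and the clustering step can be sketched as follows. This sketch assumes that each canton's annual $NO_2$ column values are available as a plain list. The percentile interpolation rule, the choice of k, and the unscaled Euclidean distance are our assumptions, not details taken from the study; the linked repository holds the authors' implementation.

```python
import math
import random


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def canton_features(columns):
    """Map canton -> [median, P90, P95, P99] of its NO2 column values for one year."""
    return {c: [percentile(v, q) for q in (50, 90, 95, 99)] for c, v in columns.items()}


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with random initial centers drawn from the data."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Because upper-tail percentiles are usually much larger than medians, standardizing each feature before clustering is a reasonable variation if the median should carry equal weight.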
Rashomon-Seeded Annealing for Robust Bayesian Inference in Factorial Designs
Yiyang Fan, Soumyakanti Pan, Tyler H. McCormick
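AI summary: Model uncertainty in factorial designs causes posterior multimodality and MCMC convergence problems. To address this, the paper proposes Rashomon-seeded annealing, which uses Rashomon sets as the initial distribution for annealed importance sampling and achieves full posterior inference without exhaustively enumerating the model space.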
Integrating over model uncertainty in factorial designs via Bayesian model averaging is hindered by the combinatorial explosion of interpretable interaction effects, often yielding a multimodal posterior, where standard Markov chain Monte Carlo algorithms encounter significant convergence issues. We propose a general computational framework that repurposes Rashomon sets, collections of high-performing models traditionally valued for prediction and interpretability, as a strategic "warm start" for estimating the full posterior. Our method, Rashomon-seeded annealing, initializes annealed importance sampling (AIS) by anchoring the starting density within these pre-identified, high-evidence regions while preserving global support over the entire model space. Rather than restricting inference to the Rashomon set and understating uncertainty, the AIS correction restores full posterior inference, turning the Rashomon certificate from an inferential truncation into a proposal mechanism. We demonstrate this approach using Rashomon Partition Sets (RPS) as a rigorous, certified seed constructor for factorial designs. The resulting algorithm yields consistent self-normalized posterior summaries, such as model-averaged cell means, credible intervals, and uncertainty summaries, without exhaustive enumeration of the complete model space. This bridges the gap between high-evidence model discovery and rigorous Bayesian inference, and outlines a general strategy in which any high-posterior seed set can provide computational leverage for AIS-based model averaging.
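The AIS correction that turns a seed density into full posterior inference can be illustrated on a toy continuous target. In this sketch, a broad Gaussian plays the role of the seed and a bimodal function plays the role of the unnormalised posterior. Both of these, as well as the linear temperature schedule and the random-walk Metropolis kernel, are our assumptions. The paper works over a discrete model space with a Rashomon Partition Set seed, which is not reproduced here.

```python
import math
import random


def log_target(x):
    """Unnormalised bimodal target with modes at -2 and 2; normaliser is 2*sqrt(2*pi)."""
    a, b = -0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_seed(x, s=3.0):
    """Normalised N(0, s^2) starting density, a stand-in for a Rashomon-based seed."""
    return -0.5 * (x / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))


def ais_log_normaliser(n_particles=400, n_temps=100, mh_steps=5, step=1.0, seed=0):
    """Annealed importance sampling along f_b = seed^(1-b) * target^b, b from 0 to 1."""
    rng = random.Random(seed)
    betas = [t / n_temps for t in range(n_temps + 1)]
    log_ws = []
    for _ in range(n_particles):
        x = rng.gauss(0.0, 3.0)  # exact draw from the seed density
        log_w = 0.0
        for b_prev, b in zip(betas[:-1], betas[1:]):
            # Incremental weight f_b(x) / f_{b_prev}(x).
            log_w += (b - b_prev) * (log_target(x) - log_seed(x))
            # Random-walk Metropolis moves that leave f_b invariant.
            cur = (1 - b) * log_seed(x) + b * log_target(x)
            for _ in range(mh_steps):
                prop = x + step * rng.gauss(0.0, 1.0)
                new = (1 - b) * log_seed(prop) + b * log_target(prop)
                if new >= cur or rng.random() < math.exp(new - cur):
                    x, cur = prop, new
        log_ws.append(log_w)
    m = max(log_ws)
    # Log of the mean weight estimates log Z_target - log Z_seed (= log Z_target here).
    return m + math.log(sum(math.exp(lw - m) for lw in log_ws) / n_particles)
```

The same log-weights, once self-normalised, reweight the final particles to give posterior summaries. In this sense the seed acts as a proposal rather than as a truncation of the posterior.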
Evolution as a Process of Causal Inference
Jacopo Iacovacci
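AI summary: Proposes that natural selection be understood as a process of causal inference. It formalizes mutations as natural experiments within the Neyman-Rubin potential-outcomes framework and proves that the intergenerational change in mean fitness decomposes into a selection term and a mutation term.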
Recently, the mapping of the replicator equation onto Bayes' theorem has been recognised, leading to an analogy between evolutionary dynamics and Bayesian learning. However, this analogy holds only for pure selection in infinite populations and breaks down when mutations -- a central mechanism of evolution -- are introduced. Here I propose that evolution by natural selection, at least for populations of haploid replicators in static environments, is best understood not as a learning process but as a process of causal inference. Each mutation event constitutes a natural experiment in which the parent serves as the control and the mutant offspring as the treated unit. Natural selection screens the causal effect of the mutation on fitness, retaining mutations with non-negative effects. I formalise this view within the Neyman-Rubin potential-outcomes framework. I first develop the general theory using a generic fitness outcome and show how the core identification assumptions in causal inference (Stable Unit Treatment Value Assumption, Consistency, Unconfoundedness, Positivity) map onto evolutionary biology. Using the unnormalised quasispecies equation, I prove that the intergenerational change in mean fitness decomposes exactly into a selection term -- recovering Fisher's Fundamental Theorem -- plus a mutation term that corresponds to a fitness-weighted average of the cumulative effect of all mutations over all parental genotypes. I show that this decomposition extends, under suitable assumptions, to the generalised replicator-mutator equation and that the frequencies of matched parent-offspring populations update in proportion to the average causal effect of mutations on fitness.
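A frequency-normalised version of the selection/mutation split can be checked numerically. Write the quasispecies update as $x_i' = \sum_j Q_{ij} f_j x_j / \bar f$, where the columns of $Q$ sum to one. A short calculation then gives $\bar f' - \bar f = \mathrm{Var}_x(f)/\bar f + \sum_j (f_j x_j/\bar f)\sum_i Q_{ij}(f_i - f_j)$. The first term is a Fisher-type selection term. The second is a fitness-weighted average, over parents $j$, of the expected fitness effect of mutating away from $j$. This derivation is our own sketch in normalised frequencies. The paper states its result for the unnormalised quasispecies equation within a potential-outcomes framework, so its exact terms may differ.

```python
def quasispecies_step(x, f, Q):
    """One generation of x'_i = sum_j Q[i][j] f_j x_j / fbar (columns of Q sum to 1)."""
    fbar = sum(fj * xj for fj, xj in zip(f, x))
    n = len(x)
    return [sum(Q[i][j] * f[j] * x[j] for j in range(n)) / fbar for i in range(n)]


def fitness_change_terms(x, f, Q):
    """Return (selection, mutation) terms whose sum is the one-step change in mean fitness."""
    n = len(x)
    fbar = sum(fj * xj for fj, xj in zip(f, x))
    # Var_x(f) / fbar: the Fisher-type selection term.
    selection = sum(xj * (fj - fbar) ** 2 for fj, xj in zip(f, x)) / fbar
    # Parent j's fitness weight times the expected fitness change of its offspring.
    mutation = sum(
        (f[j] * x[j] / fbar) * sum(Q[i][j] * (f[i] - f[j]) for i in range(n))
        for j in range(n)
    )
    return selection, mutation
```

Without mutation ($Q$ equal to the identity), the mutation term vanishes and the change in mean fitness equals $\mathrm{Var}_x(f)/\bar f \ge 0$.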
Online Learning with Gradient-Variation Interval Regret
Yan-Feng Xie, Shuche Wang, Peng Zhao, Zhi-Hua Zhou
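AI summary: Proposes the first online learning algorithm whose interval regret bound scales with gradient variation. It uses a two-layer online ensemble structure that adapts to multiple problem-dependent quantities while attaining the minimax-optimal rate, and it introduces a Lipschitz- and smoothness-agnostic variant.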
This paper investigates non-stationary online learning using the metric of interval regret, which requires an online algorithm to perform well over every time interval. We propose the first online learning algorithm that achieves an interval regret bound scaling with gradient variation, a fundamental measure of the cumulative change in online function gradients, which relates to various problem-dependent quantities and is closely connected to stochastic optimization and other problems. Our method employs a simple and efficient two-layer online ensemble structure that achieves strong theoretical guarantees. Specifically, it enjoys a regret bound that simultaneously adapts to various problem-dependent quantities while also preserving the minimax-optimal rate in the worst case. Moreover, recognizing the challenge of hyperparameter tuning, we introduce a Lipschitz- and smoothness-agnostic variant that automatically adapts to these potentially unknown constants. This is primarily enabled by a novel Lipschitz-adaptive meta algorithm, which may be of independent interest. Beyond interval regret, our method also yields broader implications: it provides versatile bounds for interval dynamic regret, a stronger measure that competes with changing comparators over any interval, and yields the first piecewise characterization for stochastic extended adversarial optimization. Theoretical findings are validated by experiments.
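The two quantities in such a bound can be made concrete on one-dimensional quadratic losses $f_t(x) = \tfrac12 (x - c_t)^2$ over $[-1, 1]$. For these losses the gradient difference $c_{t-1} - c_t$ does not depend on $x$, so the gradient variation is $\sum_t (c_t - c_{t-1})^2$. The sketch below computes this quantity, runs plain projected online gradient descent, and brute-forces the interval regret over every contiguous interval. The loss family and step size are our assumptions, and projected gradient descent stands in for the paper's two-layer ensemble, which is not reproduced here.

```python
def clip(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))


def ogd(cs, eta):
    """Projected online gradient descent on f_t(x) = 0.5 * (x - c_t)^2 over [-1, 1]."""
    x, xs = 0.0, []
    for c in cs:
        xs.append(x)                 # play x_t before seeing c_t
        x = clip(x - eta * (x - c))  # gradient of f_t at x_t is x_t - c_t
    return xs


def gradient_variation(cs):
    """sum_t sup_x |grad f_t(x) - grad f_{t-1}(x)|^2, which equals sum_t (c_t - c_{t-1})^2 here."""
    return sum((b - a) ** 2 for a, b in zip(cs, cs[1:]))


def max_interval_regret(cs, xs):
    """Largest regret over all contiguous intervals [r, s], by brute force."""
    worst = float("-inf")
    for r in range(len(cs)):
        loss, csum = 0.0, 0.0
        for s in range(r, len(cs)):
            loss += 0.5 * (xs[s] - cs[s]) ** 2
            csum += cs[s]
            u = clip(csum / (s - r + 1))  # best fixed point in hindsight on [r, s]
            best = sum(0.5 * (u - cs[t]) ** 2 for t in range(r, s + 1))
            worst = max(worst, loss - best)
    return worst
```

A sequence with a single abrupt switch has small gradient variation but can still produce large regret on the interval right after the switch. Guarantees over every interval, rather than only the full horizon, are designed to capture exactly this situation.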
Conformal Language Modeling via Posterior Sampling
Nicolas Emmenegger, Theo X. Olausson, Armando Solar-Lezama, Chara Podimata
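AI summary: Proposes replacing post-hoc filtering with approximate sampling from the LLM posterior conditioned on a calibrated high-scoring region, which achieves target risk control and improves downstream utility.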
Large Language Models remain plagued by hallucinations. Recent work has sought to tame their prevalence using statistical techniques based on conformal prediction, with both theoretical and empirical success. However, these methods operate in a post-hoc fashion, treating the sampling procedure itself as atomic and then surgically altering samples to remove hallucinated claims. This disconnect between filtering and generation can result in samples that are incoherent, inconsistent, or simply unlikely under the model itself. Moreover, post-hoc surgery is unable to shift probability mass towards more useful and helpful responses. To address these issues, we propose to instead sample from approximations to an LLM posterior, where the conditioning event corresponds to a calibrated, high-scoring region. We develop a calibration procedure tailored to the setting of conditional sequential generation that effectively identifies this region and achieves target risk control. Empirically, we apply our method to case studies focused on open-ended biography generation and mathematical problem solving; compared to prior work, we obtain the same statistical guarantees, with higher downstream utility.
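Calibration in conformal approaches of this kind usually reduces to an order statistic of held-out scores. The sketch below shows only that generic split-conformal quantile step. It does not reproduce how the paper scores generations, defines the high-scoring region, or samples from the conditioned posterior, and the function name is hypothetical.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score.

    If calibration and test scores are exchangeable, a fresh test score is <= the
    returned value with probability at least 1 - alpha. Returns inf when n is too
    small to certify level alpha.
    """
    n = len(cal_scores)
    # A small tolerance keeps float error from pushing an exact integer product up by one.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if k > n:
        return math.inf
    return sorted(cal_scores)[max(k, 1) - 1]
```

With 10 calibration scores, level $\alpha = 0.1$ needs the largest score, and $\alpha = 0.05$ cannot be certified at all. This explains why such procedures need calibration sets that are not too small.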
AugMask: Training Diffusion Models on Incomplete Tabular Data via Stochastic Augmentation and Masking
Jungkyu Kim, Taeyoung Park, Kibok Lee
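AI summary: Proposes the AugMask training framework, which adapts standard diffusion models to tabular data with missing values through conditional stochastic augmentation and denoising on observed coordinates only. It connects this training rule to a Rao-Blackwellized objective that yields a variance-weighted penalty, and it outperforms specialized missing-aware baselines.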
Score-based diffusion models have emerged as prominent deep generative models; however, their application to tabular data remains challenging because their backbones assume fully specified inputs, whereas real-world tabular data often contain missing values. We propose AugMask, a plug-and-play training framework that adapts missing-unaware backbones to incomplete data by separating conditioning from supervision. AugMask 1) constructs numeric inputs via conditional stochastic augmentation using lightweight auxiliary models, and 2) applies denoising supervision only to observed coordinates. In effect, augmented missing entries serve as uncertain conditioning context rather than training targets. We connect this training rule to a Rao-Blackwellized objective and show that marginalizing missing entries yields a variance-weighted sensitivity penalty, discouraging over-reliance on uncertain completions. Across diverse datasets and missingness regimes, AugMask enables standard diffusion-based tabular generators to outperform specialized missing-aware baselines.
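Separating conditioning from supervision can be sketched in a few lines. Missing entries are filled stochastically so that the backbone always sees a complete row, but the denoising loss is averaged only over observed coordinates. Here the auxiliary model is a per-column Gaussian fitted to the observed values, which is our simplification of the paper's conditional auxiliary models. `None` marks a missing value, and the noise prediction would come from whatever diffusion backbone is being trained.

```python
import random
import statistics


def augment(rows, mask, rng):
    """Fill missing entries (mask False, value None) with draws from per-column Gaussians
    fitted to observed values; a simplified stand-in for conditional auxiliary models."""
    params = []
    for j in range(len(rows[0])):
        obs = [r[j] for r, m in zip(rows, mask) if m[j]]
        params.append((statistics.fmean(obs), statistics.pstdev(obs)))
    return [[v if m else rng.gauss(*params[j]) for j, (v, m) in enumerate(zip(r, mr))]
            for r, mr in zip(rows, mask)]


def masked_denoising_loss(eps_true, eps_pred, mask):
    """Mean squared noise-prediction error over observed coordinates only."""
    num, cnt = 0.0, 0
    for et, ep, mr in zip(eps_true, eps_pred, mask):
        for a, b, m in zip(et, ep, mr):
            if m:  # augmented (missing) coordinates are context, never targets
                num += (a - b) ** 2
                cnt += 1
    return num / cnt
```

Because a fresh augmentation is drawn at each training step, the backbone sees varying completions of the same row. This is the informal source of the averaging effect that the Rao-Blackwellized view makes precise.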
Do Real-World Datasets Contain Natural Experiments? An Empirical Study Based on Causal Feature Selection
Gautam Gare, John Galeotti, Michael Mozer, Deva Ramanan, Nan Rosemary Ke
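AI summary: Uses causal discovery and feature selection to detect natural experiments in real-world datasets, and improves model performance by treating the data as interventional.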
In nature, events that affect some individuals or groups but not others constitute an implicit intervention and are known as natural experiments. For example, the COVID-19 pandemic was an intervention by the coronavirus on the sub-population infected with COVID. We ask: do natural experiments occur in existing real-world datasets? If so, how should we treat them? To detect natural experiments in data, we use causal discovery to recover the underlying causal graph and perform feature selection based on causal links. If downstream performance improves when the data are treated as interventional rather than observational, we argue that this suggests the dataset contains natural experiments. We first validate this hypothesis by simulating datasets with and without natural experiments using synthetic graphs. We then perform a systematic empirical evaluation on a large suite of real-world datasets. Our results indicate that real-world datasets do contain natural experiments and that we can take advantage of them to improve model performance using causal inference. Our work represents an initial foray into this area, offering a preliminary exploration within a limited scope.
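A common way to turn a recovered causal graph into a feature set is to keep the target's Markov blanket, which consists of its parents, its children, and the children's other parents. The abstract does not state the exact selection rule, so the sketch below is only one assumption about what feature selection based on causal links could look like. The graph is given as a hypothetical mapping from each node to its set of parents.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set_of_parents}."""
    children = {n for n, ps in parents.items() if target in ps}
    co_parents = {p for c in children for p in parents[c]} - {target}
    return set(parents.get(target, set())) | children | co_parents


def select_features(parents, target, columns):
    """Keep only the columns in the target's Markov blanket (order preserved)."""
    blanket = markov_blanket(parents, target)
    return [c for c in columns if c in blanket]
```

In a faithful DAG, the target is independent of every other variable given its Markov blanket. This is why the blanket is a natural minimal feature set.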
Neural Networks Provably Learn Spectral Representations of Group Composition
Jianliang He, Leda Wang, Fengzhuo Zhang, Siyu Chen, Zhuoran Yang
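AI summary: Lifts the projected gradient flow to the Fourier domain to prove that two-layer neural networks trained on group composition tasks almost surely converge to single irreducible representations. The analysis reveals feature learning and low-rank compression phenomena from a representation-theoretic perspective.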
Understanding how internal structure emerges during neural network training is central to the study of deep learning. We investigate this phenomenon through the group composition task, where a two-layer neural network is trained to predict $g_1 \star g_2$ for elements of a finite group $G$. By lifting the projected gradient flow to the Fourier domain, we demonstrate that the training dynamics are governed by a Riemannian gradient ascent on a representation-theoretic energy functional. We prove that, under random initialization, this flow drives each neuron to converge almost surely toward a single irreducible representation, while the cross-layer Fourier coefficients achieve a rotational rank-one alignment. This framework provides a representation-theoretic account of feature learning and characterizes a novel low-rank compression phenomenon for matrix-valued group representations. Moreover, for Abelian groups, we provide a complete population-level description: random initialization promotes uniform diversification across nontrivial representations and induces Haar-uniform phases, jointly approximating the indicator via a majority-vote mechanism. We further prove that both phase alignment and representation competition emerge with exponential convergence rates.
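For an Abelian group such as $\mathbb{Z}_n$, the irreducible representations are the characters $e^{2\pi i k g / n}$. The indicator of $c = a + b$ can be rebuilt by averaging $\cos(2\pi k(a + b - c)/n)$ over all frequencies $k$. The sketch below computes these scores from per-input cosine and sine features via the angle-addition identity, which shows how frequency-wise terms can jointly "vote" for the correct product. It is a hand-built construction for illustration, not a trained network and not the paper's training dynamics.

```python
import math


def composition_logits(a, b, n):
    """Score every candidate c in Z_n for the product a + b (mod n) using only
    per-input Fourier features: the average over k of cos(theta_k * (a + b - c))."""
    logits = []
    for c in range(n):
        total = 0.0
        for k in range(n):
            t = 2 * math.pi * k / n
            ca, sa = math.cos(t * a), math.sin(t * a)
            cb, sb = math.cos(t * b), math.sin(t * b)
            # Angle addition: cosine and sine features of a + b from those of a and b.
            cab, sab = ca * cb - sa * sb, sa * cb + ca * sb
            total += cab * math.cos(t * c) + sab * math.sin(t * c)
        logits.append(total / n)
    return logits
```

Each single frequency $k$ alone gives an ambiguous score. Only the sum over frequencies concentrates all mass on $c = a + b$, which loosely mirrors the role of diversification across representations.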
Semi-Algebraic Sets of Value Functions in Partially Observable Markov Decision Processes
Ryan A. Anderson, Guido Montufar
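AI summary: Characterizes the geometry of feasible value functions in infinite-horizon partially observable Markov decision processes (POMDPs) under memoryless stochastic policies as a semi-algebraic set defined by polynomial inequalities. The characterization reveals the nonlinear constraints and isolated local maxima induced by partial observability.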
We study the geometry of feasible value functions in infinite-horizon partially observable Markov decision processes (POMDPs) under memoryless stochastic policies. Our main contribution is a characterization of the feasible set of value functions as a semi-algebraic set, defined by explicit polynomial inequalities determined by the transition dynamics, observation kernel, and reward structure of the POMDP. This result extends prior work for fully observable Markov decision processes, where the feasible set is known to be a polytope, to the substantially more intricate partially observable setting. In contrast to the polyhedral structure arising in MDPs, partial observability induces fundamentally nonlinear constraints, leading to a richer and more complex geometric structure. Our geometric characterization provides new insight into the landscape of policy optimization in both MDPs and POMDPs, and reveals qualitative phenomena unique to partial observability, including the emergence of isolated local maximizers of the long-term reward and their dependence on the initial state distribution.
Local and Global Contraction Principles for MCMC Mixing
Alireza Daeijavad, Shahab Asoodeh
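AI summary: Proposes a framework of global and local contraction coefficients based on the $\mathsf E_γ$-divergence and uses it to prove exponential convergence of projected Langevin Monte Carlo. It also introduces local contraction coefficients for the independent Metropolis-Hastings algorithm to handle heavy-tailed distributions.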
We develop a contraction-based framework for proving mixing-time bounds for Markov chain Monte Carlo algorithms. The framework is built around global and local contraction coefficients of Markov kernels under the $\mathsf E_γ$-divergence with $γ\ge1$. For projected Langevin Monte Carlo on a compact convex domain, we show that Gaussian smoothing yields an explicit global contraction coefficient for the $\mathsf E_γ$-divergence. This gives a direct proof of exponential convergence to the discretized stationary distribution for general smooth, possibly non-convex potentials. The rate is explicit, accommodates arbitrary random-batch sampling schemes, and yields convergence guarantees for several divergences, including KL, $χ^2$, and Rényi divergences. For independent Metropolis--Hastings with target $π$, proposal $q$, and unbounded importance weight $w=dπ/dq$, global contraction coefficients are typically trivial. We therefore introduce a local contraction coefficient on the core $C_R=\{w\le R\}$ and prove that it controls the rejection profile on the core. This yields warm-start convergence bounds governed by the local contraction coefficient and the tail profile $H_R=π(w>R)$, recovering sharp existing moment-based convergence rates when $\mathbb E_q[w^p]<\infty$ for some $p>1$, while remaining effective in heavy-tailed regimes where no finite moment of order $p>1$ exists.
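For reference, the independent Metropolis-Hastings kernel discussed here draws $y \sim q$ and accepts it with probability $\min\{1, w(y)/w(x)\}$, where $w = d\pi/dq$. The sketch below implements this kernel with log-densities known up to constants. The test case pairs a standard Gaussian target with a wider Gaussian proposal, which gives bounded weights, so it does not exercise the heavy-tailed regime that the local contraction analysis targets.

```python
import math
import random


def imh(log_target, log_proposal, sample_proposal, n_steps, x0, rng):
    """Independent Metropolis-Hastings with log-densities known up to constants."""
    def log_w(z):  # log importance weight, log pi(z) - log q(z), up to a constant
        return log_target(z) - log_proposal(z)

    x, lw_x = x0, log_w(x0)
    chain, accepts = [], 0
    for _ in range(n_steps):
        y = sample_proposal(rng)  # the proposal ignores the current state
        lw_y = log_w(y)
        # Accept with probability min(1, w(y) / w(x)).
        if lw_y >= lw_x or rng.random() < math.exp(lw_y - lw_x):
            x, lw_x = y, lw_y
            accepts += 1
        chain.append(x)
    return chain, accepts / n_steps
```

If the proposal's tails are lighter than the target's, $w$ is unbounded. The chain can then stay stuck for long periods at a state with a very large weight, and these are the rejection events that a tail profile like $H_R = π(w > R)$ is meant to quantify.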
Distinguishing Conviction from Conformity: A Bayesian Ideal Point Model of Voting Behaviour in Online Debates
Elena Candellone
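AI summary: Proposes a Bayesian logistic regression model that distinguishes belief-based conviction from peer-influence-based conformity in online debates. On the Debate.org dataset, it finds that the dominant mechanism differs across topics.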
Online debate platforms offer a unique window into the mechanisms driving opinion formation: they capture both explicit political preferences and the peer environment in which those preferences are expressed. In this work, I develop a Bayesian logistic regression model, inspired by ideal point models from political science, to disentangle two competing mechanisms of voting behaviour in online debates: conviction, driven by prior ideological beliefs, and conformity, driven by peer influence. I apply this framework to the Debate.org dataset, comprising approximately 341k votes across 78k debates on 48 socio-political topics. As the debate platform does not provide predefined topic labels for each debate, I infer the topic and stance from the debate text using large language models, and, with a Bayesian approach, I quantify the relative contribution of each mechanism. I find substantial heterogeneity across topics: conviction dominates on issues tied to personal freedoms and lifestyle choices, such as drug legalisation and legalised prostitution, while conformity dominates on several topics widely regarded as paradigmatic cases of moral conviction, including abortion, gun rights, and global warming. These results have implications for the stability of online political discourse and the design of deliberative platforms.
A Multimodal Transformer-Based Generic Mixture Density Network for Estimating Fast Radio Burst Scattering Timescales
Bikash Kharel, Emmanuel Fonseca, Srinjoy Das, Mason Ng, Paul Scholz, Mawson W. Simmons, Lordrick Kahinga, Afrokk Khan
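AI summary: Proposes a multimodal transformer-based generic mixture density network (MT-GMDN) that fuses features of FRB dynamic spectra and time series through parallel encoders. A mixture-density formulation then estimates the scattering timescale $τ$ and its distribution, reaching a 94% coefficient of determination and 90% recall on CHIME/FRB data.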
The discovery rate of fast radio bursts (FRBs) continues to increase with the advent of new radio facilities, yet extracting their astrophysical parameters, such as the scattering timescale ($τ$), remains a significant bottleneck. Current $τ$ measurement approaches, such as fitting analytic template models and scattering-aware deconvolution, are accurate but slow, sensitive to initialization, limited at low signal-to-noise ratio, and often require manual supervision. These limitations inspired us to explore fast, robust, and scalable machine learning methods to estimate this astrophysical parameter. We present a deep learning approach named Multimodal Transformer Based Generic Mixture Density Network (MT-GMDN). It ingests an FRB dynamic spectrum and its corresponding time-series profile through parallel transformer encoders, fuses their latent representations, and predicts the distribution of $τ$ with a probabilistic output derived from a generic mixture-density formulation. This formulation not only estimates the value of $τ$ but also captures the zero-inflated nature of FRB populations, in which a significant fraction of bursts exhibit unresolvable scattering. We trained MT-GMDN on $\sim3500$ FRBs from the second CHIME/FRB catalog, holding out a fraction of FRBs for validation during training and for testing after training completes. On the test data set, the model achieves a coefficient of determination ($R^2$) of $94\%$ on the expected value of $τ$ for events with measurable scattering, with a recall of $90\%$. The model also incorporates heteroskedastic errors, enabling the construction of confidence intervals for the predictions.
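A zero-inflated mixture output yields both a point estimate and a spread for $τ$. To see how, consider a predictive distribution that puts probability $\pi_0$ on unresolvable scattering ($τ = 0$) and otherwise follows a mixture of lognormal components. The lognormal choice and the parameter names are our assumptions; the abstract specifies only a generic mixture-density formulation. The mean follows from $E[τ] = (1-\pi_0)\sum_k w_k \exp(\mu_k + \sigma_k^2/2)$, and the negative of the log-likelihood below would serve as a training loss.

```python
import math


def zi_mixture_mean(pi0, weights, mus, sigmas):
    """E[tau] when tau = 0 w.p. pi0, else tau ~ sum_k w_k LogNormal(mu_k, sigma_k^2)."""
    return (1.0 - pi0) * sum(w * math.exp(m + 0.5 * s * s)
                             for w, m, s in zip(weights, mus, sigmas))


def zi_mixture_loglik(tau, pi0, weights, mus, sigmas):
    """Log-likelihood of one burst: point mass at tau = 0, lognormal-mixture density for tau > 0."""
    if tau == 0.0:  # unresolvable scattering
        return math.log(pi0)
    lt = math.log(tau)
    dens = sum(w * math.exp(-0.5 * ((lt - m) / s) ** 2) / (tau * s * math.sqrt(2.0 * math.pi))
               for w, m, s in zip(weights, mus, sigmas))
    return math.log(1.0 - pi0) + math.log(dens)
```

Keeping the point mass separate from the continuous part lets the same output head express both "no measurable scattering" and a full distribution over nonzero $τ$.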
Sample Complexity and Decision-Theoretic Guarantees for Bayesian Model Averaging over Decision Trees with a Catalan-Exponential Prior
Livija Jakaite, Vitaly Schetinin
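AI summary: Establishes a complete non-asymptotic theory of rational commitment thresholds for Bayesian decision trees with Dirichlet-Multinomial leaf models and a Catalan-exponential tree-size prior. The theory answers when Bayesian model averaging weights carry enough epistemic information to justify committed exploitation of the averaging distribution.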
We ask: when do Bayesian model averaging (BMA) weights over decision trees carry sufficient epistemic information to justify committed exploitation of the averaging distribution? We answer this question in closed form for Bayesian decision trees (BDTs) with Dirichlet-Multinomial leaf models and a Catalan-exponential tree-size prior (Schetinin & Jakaite, 2025), establishing a complete non-asymptotic theory of rational commitment thresholds.
Topological Ignorability for Structural Causal Effects Beyond the Mean
Usef Faghihi
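AI summary: Proposes topological-geometrical causal metrics, such as density-superlevel Betti summaries, Euler signatures, and persistent-homology summaries, to quantify structural differences between interventional distributions. It introduces a topological ignorability assumption that identifies structural causal effects without requiring the full counterfactual distribution.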
Many interventions alter the structure of an outcome distribution rather than its mean: they can split a population into disconnected regimes, create loops or holes, generate branches, or reorganize an outcome cloud while leaving the average response nearly unchanged. In such settings, mean-based causal estimands such as the average treatment effect may miss important structural effects. We introduce topological-geometrical causal metrics based on summaries of interventional outcome laws, including density-superlevel Betti summaries, Euler signatures, and persistent-homology summaries. These metrics quantify structural differences between treated and untreated outcome laws beyond averages. We also study the assumptions needed for causal interpretation. We introduce topological ignorability, a topological analogue of conditional ignorability that requires invariance of the chosen structural feature rather than the full counterfactual distribution. When the chosen summary is injective, this condition coincides with weak ignorability; for noninjective summaries, it can identify the structural feature of interest without identifying the full interventional law. We define a covariate-standardized topological-geometrical causal effect and develop practical estimators. We validate the framework in two hidden-confounding benchmarks: a fully synthetic exact benchmark and a real-covariate semi-synthetic benchmark using Wisconsin breast-cancer covariates. In both, weak ignorability fails and balancing observed covariates nearly eliminates standardized mean differences, yet the coordinate-mean average treatment effect remains biased. By contrast, selected finite density-superlevel Betti and Euler contrasts remain stable across oracle, observational, and weighted analyses.
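In one dimension, a density superlevel set $\{y : \hat p(y) \ge t\}$ is a union of intervals. Its Betti-0 number, which here also equals its Euler characteristic, is therefore just the number of intervals. The sketch below estimates a Gaussian kernel density on a grid and counts those intervals for two samples that have the same mean but different Betti-0. The bandwidth, grid, and threshold are arbitrary choices of ours. The paper's summaries, including persistent homology and multivariate outcomes, are richer than this illustration.

```python
import math


def kde_on_grid(sample, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    c = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in sample) for g in grid]


def betti0_superlevel(density, threshold):
    """Number of connected runs of grid points with density >= threshold (Betti-0 in 1D)."""
    count, inside = 0, False
    for d in density:
        if d >= threshold and not inside:
            count += 1  # a new interval of the superlevel set starts here
        inside = d >= threshold
    return count
```

A mean contrast between these two samples is zero, while the Betti-0 contrast equals one. This is the kind of structural effect that a mean-based estimand misses.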
Self-Certifying Transport MCMC via Dual Spectral-Gap Certificates
Jun Hu
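AI summary: Proposes the CerT-MCMC framework, which uses normalizing flows to give transport MCMC automatic, rigorous convergence certification. Spectral-gap bounds are supplied by a covering certificate and a quantile-core certificate.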
We propose CerT-MCMC, a framework that equips learned-transport Markov chain Monte Carlo with automatic, rigorous convergence certificates. A normalising flow maps a Gaussian reference to an approximation of the target posterior; the same flow then serves as both the independence Metropolis-Hastings proposal and the basis for a computable spectral-gap bound. We develop two complementary certificates. The covering certificate bounds the weight-ratio oscillation over the full proposal support via finite-sample covering arguments, yielding full-support spectral-gap bounds when a conservative gradient bound is available; its correction term scales as O(n^{-1/D}), so the bound weakens rapidly and eventually becomes vacuous as the dimension increases. We prove a matching Omega(n^{-1/D}) lower bound, establishing that this barrier is intrinsic to pointwise Lipschitz certification. The quantile-core certificate restricts attention to a high-probability residual core on which the oscillation is controlled by one-dimensional empirical quantiles, with a finite-sample probability slack of O(n^{-1/2}), independent of the ambient dimension. On synthetic targets (D=2-20), structural-engineering posteriors (D=6,8), real-data logistic regression on the Heart Disease data set (D=13), and synthetic Bayesian logistic regression (D=20), the quantile-core certificate delivers non-vacuous spectral-gap bounds where the covering certificate is vacuous, and its spectral-gap proxy tracks empirical effective sample sizes within 7%. A negative control experiment confirms that the certificate discriminates flow quality by a factor exceeding 10x, whereas acceptance rates differ by only 1.15x. To our knowledge, the dual-certificate framework is the first to provide automatic, dimension-aware convergence certificates for learned-transport MCMC, distinguishing genuine transport failure from proof-technique limitations.
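Spectral-gap bounds for independence samplers rest on a classical fact: if the normalised weight $w = \pi/q$ satisfies $w \le W^*$ everywhere, the spectral gap is at least $1/W^*$. Because $\mathbb{E}_q[w] = 1$ forces $\inf w \le 1$, a log-weight oscillation $\Delta = \sup \log w - \inf \log w$ gives $W^* \le e^{\Delta}$. The hypothetical helper below evaluates the analogous quantity $e^{-\Delta}$ on a central empirical-quantile core of sampled log-weights. It is a heuristic diagnostic based on our own choice of core. It is not the paper's certificate, which adds finite-sample slack and a rigorous treatment of the residual core.

```python
import math


def lower_quantile(sorted_vals, p):
    """Empirical lower p-quantile: the ceil(p * n)-th smallest value."""
    n = len(sorted_vals)
    idx = min(n - 1, max(0, math.ceil(p * n) - 1))
    return sorted_vals[idx]


def core_gap_proxy(log_weights, delta):
    """exp(-oscillation) of log-weights on the central (1 - delta) empirical core.

    Log-weights may be known only up to an additive constant; the oscillation
    q_{1 - delta/2} - q_{delta/2} is unaffected by that constant.
    """
    s = sorted(log_weights)
    osc = lower_quantile(s, 1 - delta / 2) - lower_quantile(s, delta / 2)
    return math.exp(-osc)
```

Only one-dimensional order statistics of the log-weights enter the computation, whatever the dimension of the target. This gives some intuition for why a quantile-based core can avoid the dimension dependence of covering arguments.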
A Bayesian Hierarchical Generalization of Empirical Bayes for Crash-Rate Estimation with Missing Traffic Volumes
Lars Skaug
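AI summary: This paper addresses the limitations of the Empirical Bayes method when traffic volumes are missing. It proposes a fully Bayesian hierarchical model that jointly imputes missing ADT and estimates segment crash rates, and it substantially improves predictive accuracy by relaxing assumptions on the exposure structure.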
The Empirical Bayes (EB) procedure of Hauer et al. (2002) is the workhorse of highway safety analysis: it combines a Safety Performance Function with observed crash counts to produce shrinkage estimates of segment-level crash rates. EB delivers practicality by holding several quantities fixed at calibration: SPF coefficients, per-type overdispersion, observed ADT, and a fixed exposure exponent. These assumptions strain when ADT is missing on a majority of segments. We present a fully Bayesian hierarchical model that moves beyond EB by relaxing each of these assumptions in a single joint inference. Fit on Ohio's road inventory (408,304 segments, 2.9 million crashes, 2013-2025), the model jointly imputes missing ADT and estimates per-segment crash rates with uncertainty. Posterior predictive checks of an initial fixed-exposure model expose a tail misfit; relaxing the exposure structure to a per-functional-class exposure exponent and an estimated length exponent, in place of a single scalar and a fixed offset, resolves it and improves out-of-sample predictive accuracy (PSIS-LOO $Δ\mathrm{elpd}$ = 9,394, SE 238). Crash count is sublinear in traffic in every class (exposure exponents 0.49-0.70, all $<1$, the safety-in-numbers effect) and sublinear in segment length ($β_{\mathrm{len}} = 0.69$). Partial pooling substantially improves out-of-sample predictive accuracy over complete pooling (PSIS-LOO $Δ\mathrm{elpd}$ = 4,780, SE 225). The Bayesian ADT submodel attains $R^2_{\log} = 0.756$ by encoding county and functional class as hierarchical priors, versus $0.653$ for a LightGBM restricted to the same continuous predictors. The output is a posterior crash rate distribution per segment, replacing the median-by-type point estimates used in our prior risk-aware routing framework.
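For reference, the Hauer-style EB step that the hierarchical model generalizes fits in a few lines. This is a simplified illustration, not the paper's model. It assumes negative-binomial counts with variance μ + kμ², which makes the EB weight w = 1/(1 + kμ), and it uses a hypothetical SPF of the form μ = exp(b0) · ADT^b_adt · length^b_len. Setting b_len = 1 recovers a fixed length offset, while a freely estimated b_len mirrors the relaxed exposure structure described above. Note that this step requires an observed ADT for every segment. All coefficients in the usage lines are illustrative.

```python
import math


def spf_mean(adt, length_mi, b0, b_adt, b_len=1.0):
    """Hypothetical SPF: mu = exp(b0) * ADT**b_adt * length**b_len.
    b_len = 1 corresponds to a fixed length offset."""
    return math.exp(b0) * adt ** b_adt * length_mi ** b_len


def eb_estimate(mu, observed, k):
    """EB shrinkage of an observed crash count toward the SPF prediction mu,
    assuming negative-binomial counts with Var = mu + k * mu**2."""
    w = 1.0 / (1.0 + k * mu)
    return w * mu + (1.0 - w) * observed


# Hypothetical segment and coefficients, for illustration only.
mu = spf_mean(adt=5000.0, length_mi=1.2, b0=-7.0, b_adt=0.6, b_len=0.7)
estimate = eb_estimate(mu, observed=3, k=0.4)
```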
Wasserstein Contraction of Coordinate Ascent Variational Inference
Rocco Caprio, Adrien Corenflos, Sam Power
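AI summary: This paper studies the contraction of the coordinate ascent variational inference algorithm in Wasserstein distance. It gives local convergence guarantees via a transport-information inequality at the fixed points and a functional smoothness condition, with applications to Bayesian Gaussian mixture models, high-dimensional Bayesian probit regression, and Pólya-Gamma logistic regression.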
We study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The results are general and sharp, allow for local convergence guarantees, and hold for general smooth manifolds as well as some non-smooth spaces. We consider applications to Bayesian Gaussian mixture models, high-dimensional Bayesian probit regression, and logistic regression with Pólya-Gamma random variables (i.e., the Jaakkola-Jordan algorithm).
Approximate Full Conformal Prediction: Distribution-Free Guarantees via Tournament Corrections
Aabesh Bhattacharyya, Boxuan Zhang, Rina Foygel Barber
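AI summary: This paper proposes a new class of tournament-based approximations that reduce computational cost while guaranteeing marginal coverage of 1-2α. Under stability conditions, the guarantee tightens to approximately 1-α.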
Conformal prediction is a framework for providing prediction intervals with distribution-free validity, guaranteeing predictive coverage for data drawn from any distribution. Its two main variants are full conformal prediction and split conformal prediction (also called transductive and inductive). Full conformal prediction is widely considered to be statistically more efficient (since split conformal prediction requires data splitting, and therefore can lead to wider prediction intervals due to the resulting loss in sample size), but its implementation is computationally prohibitive, as it requires the underlying model to be refit for every candidate value in the response space. Existing computational shortcuts, such as using a discrete grid of values to approximate the full conformal prediction construction, frequently lack theoretical guarantees on marginal coverage and can fail in practice. To address this limitation, we introduce a novel class of approximations to the full conformal prediction method, based on the idea of \emph{tournaments}, which enables the construction of prediction sets with a rigorous marginal coverage guarantee of $1-2α$. Under stability conditions, the theoretical coverage guarantee tightens to approximately $1-α$. This new framework generalizes the existing method of leave-one-out cross-conformal prediction, while allowing for flexible use of various existing approximation strategies.
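To make the cost of full conformal prediction concrete, the sketch below implements the grid approximation mentioned above for the simplest possible "model": a sample mean with absolute-residual scores, chosen for brevity. Every candidate value y requires refitting on the augmented data, and with a real regression model that refit is the expensive step. This is the baseline grid construction, not the tournament method, and the names are hypothetical.

```python
def full_conformal_grid(y_train, grid, alpha):
    """Grid approximation to full conformal prediction for a location-only model.

    For each candidate y, the 'model' (here the sample mean) is refit on the
    augmented data. That per-candidate refit is what makes full conformal costly."""
    n = len(y_train)
    kept = []
    for y in grid:
        augmented = list(y_train) + [y]
        center = sum(augmented) / (n + 1)          # refit on augmented data
        scores = [abs(v - center) for v in augmented]
        s_new = scores[-1]
        # Conformal p-value: fraction of scores at least as large as the candidate's.
        p_value = sum(s >= s_new for s in scores) / (n + 1)
        if p_value > alpha:
            kept.append(y)
    return kept


# Usage with hypothetical data: candidates near the bulk are kept, outliers dropped.
prediction_set = full_conformal_grid(list(range(10)), [4.5, 100.0], alpha=0.1)
```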
Improved Scaling Laws for Weak-to-Strong Generalization in Random Feature Ridge Regression
Diyuan Wu, Lehan Chen, Theodor Misiakiewicz, Marco Mondelli
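AI summary: Using a deterministic-equivalent analysis of random feature ridge regression, this paper shows that a strong student trained by a weak teacher can improve the scaling law in both bias-dominated and variance-dominated regimes, and can even attain the minimax optimal rate.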
It is increasingly common in machine learning to use learned models to label data and then employ such data to train more capable models. The phenomenon of weak-to-strong generalization exemplifies the advantage of this two-stage procedure: a strong student is trained on imperfect labels obtained from a weak teacher, and yet the strong student outperforms the weak teacher. In this paper, we show that the potential improvement is substantial, in the sense that it affects the scaling law followed by the test error. Specifically, we consider students and teachers trained via random feature ridge regression (RFRR). Our main technical contribution is to derive a deterministic equivalent for the excess test error of the student trained on labels obtained via the teacher. Via this deterministic equivalent, we then identify regimes in which the scaling law of the student improves upon that of the teacher, unveiling that the improvement can be achieved both in bias-dominated and variance-dominated settings. Strikingly, the student may attain the minimax optimal rate regardless of the scaling law of the teacher; in fact, this holds even when the test error of the teacher does not decay with the sample size.
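The two-stage pipeline can be sketched with NumPy. The ReLU feature map, target function, feature counts, and ridge penalties below are all hypothetical choices for illustration. The sketch only shows the mechanics (teacher fit, pseudo-labelling, student fit). It does not reproduce the deterministic-equivalent analysis, and it makes no claim about which model wins in a given regime.

```python
import numpy as np


def random_features(X, W):
    """ReLU random features phi(x) = max(W x, 0) / sqrt(p) for p = W.shape[0]."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])


def ridge_fit(Phi, y, lam):
    """Ridge coefficients a = (Phi^T Phi + lam I)^{-1} Phi^T y."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)


def target(X):
    # Hypothetical ground-truth function used only for this illustration.
    return np.sin(X[:, 0]) + 0.5 * X[:, 1]


rng = np.random.default_rng(0)
d, n_teacher, n_student = 5, 50, 400
X_t = rng.standard_normal((n_teacher, d))
y_t = target(X_t) + 0.1 * rng.standard_normal(n_teacher)

# Stage 1: a weak teacher (few random features) is fit on the labeled data.
W_teacher = rng.standard_normal((20, d))
a_teacher = ridge_fit(random_features(X_t, W_teacher), y_t, lam=1e-2)

# Stage 2: the teacher labels fresh inputs, and a strong student (many features) fits them.
X_s = rng.standard_normal((n_student, d))
pseudo_labels = random_features(X_s, W_teacher) @ a_teacher
W_student = rng.standard_normal((300, d))
Phi_s = random_features(X_s, W_student)
lam_s = 1e-2
a_student = ridge_fit(Phi_s, pseudo_labels, lam=lam_s)
```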
TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design
Haonan Zhu, Elad Hirsch, Alexandria Minetti, Allison Nulty, Purvanshi Mehta
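AI summary: Existing preference datasets provide only a single overall judgment. This paper builds TASTE, a multi-dimensional preference dataset in which two cohorts of professional designers rank outputs from four text-to-image models on nine criteria. It also proposes a criterion-agnostic signal-validation framework and benchmarks preference models on the dataset.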
Text-to-image models now generate graphic design at production scale, yet their supervision still comes primarily from photo-style preference datasets with a single overall verdict per comparison. Designers evaluate designs along several distinct axes (e.g., typography, layout, color harmony) that a single preference label collapses. We release \emph{TASTE} \textit{(Typography, Aesthetics, Spatial, Tone, Etc.)}, a multi-dimensional preference dataset in which two disjoint cohorts of five professional designers each ranked outputs from four current text-to-image models across nine criteria along with per-image hallucination flags. We pair the dataset with two contributions. First, a criterion-agnostic signal-validation framework based on Kendall's $τ$, majority-vote probability, and Condorcet cycles against exact iid-uniform nulls; the analysis reveals significant but moderate designer agreement, with every TASTE criterion rejecting the random-rater null. Second, we benchmark preference models on TASTE and find that off-the-shelf VLM judges and dedicated T2I scorers fail to reach majority agreement with the designer panel, while a small MLP head trained directly on TASTE substantially narrows the gap to the single-rater ceiling, setting a baseline for future TASTE-trained preference models.
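The Kendall's τ component of the validation framework can be illustrated with an exact permutation null for a single pair of rankings. The standard-library sketch below uses hypothetical names. It computes τ between two raters' rankings of four outputs, then the exact probability of a τ at least that large when one ranking is uniformly random over all 4! orderings. The paper's framework also covers majority-vote probabilities and Condorcet cycles, which are not shown here.

```python
from fractions import Fraction
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Kendall's tau between two rankings given as rank lists (no ties)."""
    n = len(r1)
    s = 0
    for i, j in combinations(range(n), 2):
        # +1 for a concordant pair, -1 for a discordant pair.
        s += 1 if (r1[i] - r1[j]) * (r2[i] - r2[j]) > 0 else -1
    return Fraction(s, n * (n - 1) // 2)


def exact_upper_p_value(tau_obs, n_items):
    """P(tau >= tau_obs) when one ranking is uniform over all n! permutations."""
    ref = list(range(n_items))
    taus = [kendall_tau(ref, list(p)) for p in permutations(ref)]
    return Fraction(sum(t >= tau_obs for t in taus), len(taus))


# Usage: two hypothetical raters who disagree only on the middle pair.
tau_obs = kendall_tau([0, 1, 2, 3], [0, 2, 1, 3])
p_value = exact_upper_p_value(tau_obs, 4)
```

Here τ = 2/3, and 4 of the 24 orderings reach it, so the exact p-value is 1/6.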
Latent Laplace Diffusion for Irregular Multivariate Time Series
Zinuo You, Jin Zheng, John Cartlidge
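AI summary: This paper proposes Latent Laplace Diffusion (LLapDiff), a generative framework that performs long-horizon forecasting and missing-value imputation for irregular time series using low-dimensional latent trajectories and a Laplace-domain parameterization.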
Irregular multivariate time series impose a trade-off for long-horizon forecasting: discrete methods can distort temporal structure via re-gridding, while continuous-time models often require sequential solvers prone to drift. To bridge this gap, we present Latent Laplace Diffusion (LLapDiff), a generative framework that models the target as a low-dimensional latent trajectory, enabling horizon-wide generation without step-by-step integration over physical time. We guide the reverse process utilizing a stable modal parameterization motivated by stochastic port-Hamiltonian dynamics, and parameterize its mean evolution in the Laplace domain via learnable complex-conjugate poles, enabling direct evaluation over irregular timestamps. We also link continuous dynamics to irregular observations through renewal-averaging analysis, which maps sampling gaps to effective event-domain poles and motivates a gap-aware history summarizer. Extensive experiments show that LLapDiff improves over baselines in long-horizon forecasting, and its continuous-time generative nature supports missing-value imputation by querying the same model at historical timestamps. Code is available at https://github.com/pixelhero98/LLapDiffusion.
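A Laplace-domain (modal) parameterization avoids step-by-step integration because a mean path built from stable complex-conjugate pole pairs has a closed form. That closed form can be evaluated at any timestamp directly. The NumPy sketch below illustrates that idea only. The pole values, residues, and function name are assumptions, not the LLapDiff parameterization.

```python
import numpy as np


def modal_mean(t, poles, residues):
    """Evaluate x(t) = sum_k 2 * Re(c_k * exp(s_k * t)) at arbitrary timestamps.

    Each pole s_k = -a_k + i * w_k (a_k > 0) is implicitly paired with its complex
    conjugate, which makes x(t) real and decaying (a stable mode)."""
    t = np.asarray(t, dtype=float)[:, None]
    s = np.asarray(poles, dtype=complex)[None, :]
    c = np.asarray(residues, dtype=complex)[None, :]
    return 2.0 * np.real(c * np.exp(s * t)).sum(axis=1)


poles = [-0.5 + 2.0j, -0.1 + 0.3j]    # hypothetical stable poles
residues = [1.0 + 0.0j, 0.5 - 0.2j]   # hypothetical residues
irregular_times = [0.0, 0.37, 1.9, 4.05, 200.0]
values = modal_mean(irregular_times, poles, residues)  # no solver, no re-gridding
```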
A Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
Tim Tsz-Kit Lau, Weijie Su
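AI summary: Modern neural-network parameter spaces have symmetries that coordinate-wise optimizers do not respect. This paper proposes a symmetry-compatible principle for optimizer design and derives update rules for special parameter blocks such as embedding matrices, LM heads, SwiGLU MLP projections, and MoE routers. Experiments show improved validation loss, load balancing, and training stability.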
A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, rendering them unable to respect the equivariance structures of the parameter space. We address this disparity by introducing a symmetry-compatible principle for optimizer design: the gradient update rule should be equivariant under the symmetry group acting on the corresponding weight block. Following this principle, we first provide a unified perspective on bi-orthogonally equivariant updates for general matrix layers, as employed by stochastic spectral descent, Muon, Scion, and polar gradient methods. More importantly, by moving from orthogonal groups to permutation and shared-shift symmetries, we derive symmetry-compatible optimizers for parameter blocks whose symmetries differ from those of general matrix layers: embedding and LM head matrices, SwiGLU MLP projections, and MoE router matrices. These constructions include one-sided spectral, row-norm, hybrid row-norm/spectral, row-aware, column-aware, centered row-norm, and left-spectral updates. They yield an end-to-end layerwise optimizer stack in which each major matrix-valued parameter class is assigned an update whose equivariance matches its symmetry group. We corroborate this principle through pre-training experiments on dense and sparse MoE language models, including Qwen3-0.6B-style, Gemma 3 1B-style, OLMoE-1B-7B-style, and downsized gpt-oss architectures. Across these experiments, symmetry-compatible update rules consistently improve final validation loss, reduce load imbalance in sparse MoE models, and in several cases improve training stability over the corresponding AdamW updates.
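The equivariance property behind the principle can be checked numerically. The NumPy sketch below makes two simplifying choices. First, it computes the orthogonalized (polar) update with an exact SVD rather than the Newton-Schulz iteration that Muon-style optimizers use. Second, it uses plain row normalization as one simple example of an update suited to row permutations. Function names are hypothetical.

```python
import numpy as np


def polar_update(G):
    """Orthogonalized update U V^T from the thin SVD G = U S V^T (computed exactly
    here; Muon-style optimizers approximate it with Newton-Schulz iterations)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def row_norm_update(G, eps=1e-12):
    """Normalize each row of G to unit Euclidean norm."""
    return G / (np.linalg.norm(G, axis=1, keepdims=True) + eps)


def random_orthogonal(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))
```

The checks in the usage below confirm three properties. The polar update satisfies polar(PGQ) = P polar(G) Q for orthogonal P and Q. Row normalization commutes with row permutations. A coordinate-wise sign update, used here as a stand-in for Adam-style updates, does not commute with rotations.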
Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty
Haoran Hu, Xingce Wang
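AI summary: This paper proposes a probabilistic partial least squares framework based on exact optimization on the Stiefel manifold. The framework combines noise pre-estimation, constrained likelihood optimization, and prediction calibration, and it delivers closed-form updates, error bounds, and calibrated uncertainty.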
Probabilistic partial least squares (PPLS) is a central likelihood-based model for two-view learning when one needs both interpretable latent factors and calibrated uncertainty. Building on the identifiable parameterization of Bouhaddani et al. (2018), existing fitting pipelines still face two practical bottlenecks: noise-signal coupling under joint EM/ECM updates and nontrivial handling of orthogonality constraints. Following the fixed-noise scalar-likelihood protocol, we develop an end-to-end framework that combines noise pre-estimation, constrained likelihood optimization, and prediction calibration in one pipeline. We estimate the observation noise from the low-eigenvalue noise subspace and enforce orthogonality through exact Stiefel-manifold optimization. The noise-subspace estimator attains a signal-strength-independent leading finite-sample rate and matches a minimax lower bound, whereas a full-spectrum noise estimator carries a deterministic bias under the same model. We further extend the framework to sub-Gaussian settings via optional Gaussianization and provide closed-form standard errors through a block-structured Fisher analysis. Across synthetic high-noise settings and two multi-omics benchmarks (TCGA-BRCA and PBMC CITE-seq), the method achieves near-nominal coverage without post-hoc recalibration, reaches Ridge-level point accuracy on TCGA-BRCA at rank $r=3$, matches or exceeds PO2PLS on cross-view prediction while providing native calibrated uncertainty, and improves stability of parameter recovery.
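The idea of estimating noise from the low-eigenvalue subspace can be illustrated under an assumed isotropic model in which the covariance is a rank-r signal plus σ²I. Averaging the p - r smallest sample eigenvalues targets σ². Averaging the whole spectrum (trace / p) also absorbs the signal variance, which is the kind of deterministic bias described above. This NumPy sketch uses hypothetical dimensions and loadings. It is not the paper's estimator or its rate analysis.

```python
import numpy as np


def noise_variance_estimates(X, r):
    """Two estimates of isotropic noise variance sigma^2 for data whose
    covariance is (rank-r signal) + sigma^2 * I."""
    Xc = X - X.mean(axis=0)
    evals = np.linalg.eigvalsh(Xc.T @ Xc / (X.shape[0] - 1))  # ascending order
    p = X.shape[1]
    noise_subspace = evals[: p - r].mean()  # average of the p - r smallest eigenvalues
    full_spectrum = evals.mean()            # trace / p: includes the signal variance too
    return noise_subspace, full_spectrum


rng = np.random.default_rng(1)
n, p, r, sigma2 = 5000, 20, 2, 0.5
loadings = 2.0 * rng.standard_normal((p, r))  # hypothetical signal loadings
Z = rng.standard_normal((n, r))
X = Z @ loadings.T + np.sqrt(sigma2) * rng.standard_normal((n, p))
ns_est, fs_est = noise_variance_estimates(X, r)
```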
A Unified Theory of Conditional Coverage in Conformal Prediction, with Applications
Yinjie Min, Liuhua Peng, Changliang Zou
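AI summary: This paper proposes a unified framework that decomposes conditional miscoverage into three interpretable components. It uses the decomposition to guide model selection, develop localized methods, and extend the analysis to structured data.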
Conformal prediction provides prediction sets with finite-sample marginal coverage, but many applications require coverage guarantees that adapt to individual test points, a subpopulation, or a structural component of the data. Existing methods targeting conditional coverage are largely analyzed case by case, leaving limited general theory for understanding where conditional miscoverage comes from, how different procedures should be compared, and how such guarantees can be extended beyond i.i.d. data. We address these gaps through a unified framework and theory for conformal methods targeting conditional coverage. Our central contribution is a non-asymptotic decomposition of conditional miscoverage into three interpretable components: score-estimation error, finite-sample calibration error, and intrinsic conditional-mismatch error. This decomposition clarifies the mechanisms behind asymptotic conditional validity and places existing methods within a common analytical lens. Building on this framework, we derive principled guidance for conditional-coverage-oriented model selection, and develop localized methods with asymptotic conditional guarantees under covariate shift. Finally, we extend the framework to structured data, with concrete applications to graph-structured and hierarchical settings. Numerical experiments corroborate the theory and demonstrate the effectiveness of the proposed procedures.
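For comparison, group-conditional ("Mondrian") split conformal calibration is one standard way to target coverage within subpopulations. It computes a separate conformal threshold per group, which yields coverage of at least 1 - α within each group when data are exchangeable within groups. The sketch below is that textbook construction with hypothetical names, not the localized methods developed in the paper. It also shows a finite-sample effect: a group with too few calibration points receives an infinite threshold.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score,
    or +inf when that index exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha) - 1e-12)  # small guard against float round-up
    return float("inf") if k > n else sorted(scores)[k - 1]


def group_conditional_thresholds(scores, groups, alpha):
    """Mondrian calibration: one conformal threshold per group."""
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    return {g: conformal_quantile(v, alpha) for g, v in by_group.items()}


# Usage with hypothetical calibration scores for a large and a small group.
thresholds = group_conditional_thresholds(
    scores=list(range(1, 10)) + [1, 2, 3, 4],
    groups=["a"] * 9 + ["b"] * 4,
    alpha=0.1,
)
```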
Spherical Flows for Sampling Categorical Data
Jannis Chemseddine, Gregor Kornhardt, Gabriele Steidl
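AI summary: This paper proposes generative modeling of discrete sequences on the sphere using the von Mises-Fisher distribution. It uses radial symmetry to reduce the continuity equation to a scalar ODE, and it combines posterior-weighted tangent sums with predictor-corrector sampling for efficient sampling.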
We study the problem of learning generative models for discrete sequences in a continuous embedding space. Whereas prior approaches typically operate in Euclidean space or on the probability simplex, we instead work on the sphere $\mathbb S^{d-1}$. There the von Mises-Fisher (vMF) distribution induces a natural noise process and admits a closed-form conditional score. The conditional velocity is in general intractable. Exploiting the radial symmetry of the vMF density we reduce the continuity equation on $\mathbb S^{d-1}$ to a scalar ODE in the cosine similarity, whose unique bounded solution determines the velocity. The marginal velocity and marginal score on $(\mathbb S^{d-1})^L$ both decompose into posterior-weighted tangent sums that differ only by per-token scalar weights. This gives access to both ODE and predictor-corrector (PC) sampling. The posterior is the only learned object, trained by a cross-entropy loss. Experiments compare the vMF path against geodesic and Euclidean alternatives. The combination of vMF and PC sampling significantly improves results on Sudoku and language modeling.
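The score of a vMF density on the sphere is the Euclidean gradient κμ projected onto the tangent space at x. It depends on x only through μ, x, and the cosine similarity μᵀx. The NumPy sketch below (hypothetical names) computes this Riemannian score. Note that it is the static vMF score, not the time-dependent conditional score along the paper's noising path.

```python
import numpy as np


def vmf_log_density_unnorm(x, mu, kappa):
    """Unnormalized vMF log-density on the unit sphere: kappa * <mu, x>."""
    return kappa * float(np.dot(mu, x))


def vmf_riemannian_score(x, mu, kappa):
    """Riemannian gradient of the vMF log-density at a unit vector x:
    the Euclidean gradient kappa * mu minus its component along x."""
    g = kappa * np.asarray(mu, dtype=float)
    cos_sim = float(np.dot(mu, x))  # cosine similarity between mu and x
    return g - kappa * cos_sim * np.asarray(x, dtype=float)
```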
Continuum-Marginal Optimal Transport: A Mesh-Free Kernel Method
Yumiharu Nakano
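AI summary: For the continuum-marginal optimal transport problem, this paper proposes a mesh-free solver based on reproducing kernel Hilbert spaces. The solver embeds the weak continuity equation and uses a linearly parameterized velocity field, achieving accurate drift recovery and marginal consistency.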
In this paper we study continuum-marginal optimal transport. Given a time-continuous family of probability marginals, the problem is to recover the minimum-energy velocity field whose flow reproduces every marginal. This problem is the continuum limit of the classical two-marginal Benamou-Brenier formulation, and also the deterministic limit of the Nelson problem of stochastic optimal transport. We propose a practical mesh-free solver for this problem. The weak continuity equation is embedded in a reproducing kernel Hilbert space, yielding a sample-only objective that requires no spatial discretization. The velocity is parametrized by any linear-in-parameters dictionary or neural network, and is optimized by mini-batch stochastic methods. Synthetic experiments confirm that the method achieves accurate drift recovery and marginal consistency. The same computational framework also applies to the stochastic Nelson problem.
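A one-dimensional Gaussian path gives a worked check of the continuity equation. In one dimension, with the flux vanishing at minus infinity, the continuity equation determines the velocity uniquely, so that velocity is automatically the minimum-energy field. For ρ_t = N(m(t), s(t)²), it is the affine map v(t, x) = m'(t) + (s'(t)/s(t))(x - m(t)). The sketch below verifies ∂_t ρ + ∂_x(ρ v) = 0 by finite differences for the hypothetical path m(t) = t, s(t) = 1 + t. It is a sanity check of the quantity being recovered, not the paper's kernel-based solver.

```python
import math


def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical marginal path rho_t = N(m(t), s(t)^2) with m(t) = t and s(t) = 1 + t.
def m(t):
    return t


def s(t):
    return 1.0 + t


def dm(t):
    return 1.0


def ds(t):
    return 1.0


def velocity(t, x):
    """Affine velocity v(t, x) = m'(t) + (s'(t) / s(t)) * (x - m(t))."""
    return dm(t) + ds(t) / s(t) * (x - m(t))


def continuity_residual(t, x, h=1e-5):
    """Central-difference estimate of d/dt rho + d/dx (rho * v) at (t, x)."""
    def rho(tt, xx):
        return gaussian_pdf(xx, m(tt), s(tt))

    def flux(xx):
        return rho(t, xx) * velocity(t, xx)

    d_rho_dt = (rho(t + h, x) - rho(t - h, x)) / (2.0 * h)
    d_flux_dx = (flux(x + h) - flux(x - h)) / (2.0 * h)
    return d_rho_dt + d_flux_dx
```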
ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation
Yizheng Huang, Wenjun Zeng, Aditi Kumaresan, Zi Wang
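AI summary: This paper proposes ProEval, a framework that uses pre-trained Gaussian processes for Bayesian quadrature and superlevel set sampling to enable efficient performance estimation and proactive failure discovery. On reasoning, safety-alignment, and classification benchmarks, it reaches estimates within 1% error with 8-65x fewer samples.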
Evaluating generative AI models is increasingly resource-intensive due to slow inference, expensive raters, and a rapidly growing landscape of models and benchmarks. We propose ProEval, a proactive evaluation framework that leverages transfer learning to efficiently estimate performance and identify failure cases. ProEval employs pre-trained Gaussian Processes (GPs) as surrogates for the performance score function, mapping model inputs to metrics such as the severity of errors or safety violations. By framing performance estimation as Bayesian quadrature (BQ) and failure discovery as superlevel set sampling, we develop uncertainty-aware decision strategies that actively select or synthesize highly informative inputs for testing. Theoretically, we prove that our pre-trained GP-based BQ estimator is unbiased and bounded. Empirically, extensive experiments on reasoning, safety alignment, and classification benchmarks demonstrate that ProEval is significantly more efficient than competitive baselines. It requires 8-65x fewer samples to achieve estimates within 1% of the ground truth, while simultaneously revealing more diverse failure cases under a stricter evaluation budget.
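A finite-pool version of GP-based Bayesian quadrature can be sketched in NumPy. The estimate of the average score over the evaluation pool is the average GP posterior mean, and a simple active rule queries the unobserved input with the largest posterior variance. The sketch assumes an RBF kernel with fixed hyperparameters and uses a constant prior mean as a stand-in for the pre-trained GP. Function names are hypothetical, and the superlevel-set sampling used for failure discovery is not shown.

```python
import numpy as np


def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def gp_posterior(X_pool, obs_idx, y_obs, prior_mean=0.0, noise=1e-6, lengthscale=1.0):
    """GP posterior mean and variance at every pool point, given scores y_obs at
    pool indices obs_idx. prior_mean stands in for a (hypothetical) pre-trained prior."""
    X_obs = X_pool[obs_idx]
    K = rbf(X_obs, X_obs, lengthscale) + noise * np.eye(len(obs_idx))
    K_star = rbf(X_pool, X_obs, lengthscale)                  # (n_pool, n_obs)
    mean = prior_mean + K_star @ np.linalg.solve(K, np.asarray(y_obs) - prior_mean)
    var = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    return mean, np.maximum(var, 0.0)


def bq_estimate_and_next(X_pool, obs_idx, y_obs, **kw):
    """Finite-pool Bayesian quadrature: the estimate of the average score is the
    average posterior mean; the next query is the most uncertain unobserved point."""
    mean, var = gp_posterior(X_pool, obs_idx, y_obs, **kw)
    var[obs_idx] = -np.inf  # never re-query an observed input
    return float(mean.mean()), int(np.argmax(var))
```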
Factor-Based Conditional Diffusion Models for Contextual Portfolio Optimization
Xuefeng Gao, Mengying He, Xuedong He, Jiale Zha
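AI summary: This paper proposes a conditional diffusion model that uses a Diffusion Transformer architecture to learn the conditional distribution of stock returns. It performs mean-variance and mean-CVaR optimization on generated samples and outperforms a range of benchmarks in the Chinese A-share market.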
We propose a novel conditional diffusion model for contextual portfolio optimization that learns the cross-sectional distribution of next-day stock returns conditioned on high-dimensional asset-specific factors. Our model leverages a Diffusion Transformer architecture with token-wise conditioning, which enables linking each asset's return to its own factor vector while capturing complex cross-asset dependencies. By drawing generative samples from the learned conditional return distribution, we perform daily mean-variance and mean-CVaR optimization, incorporating transaction costs and realistic constraints. Using data from the Chinese A-share market, we demonstrate that our approach consistently outperforms various standard benchmarks across multiple risk-adjusted performance metrics. Furthermore, we establish a 2-Wasserstein error bound for the conditional diffusion model and quantify how its distributional approximation errors propagate to the downstream portfolio optimization task. Our results demonstrate the potential of generative diffusion models for high-dimensional, risk-sensitive contextual stochastic optimization and financial decision making.
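Once scenarios are drawn from a generative model, the downstream risk quantities are simple sample statistics. The NumPy sketch below (hypothetical names) computes the sample mean return and an empirical CVaR of losses for fixed weights. It assumes n(1 - α) is close to an integer and takes the plain average of the worst losses. An optimizer would then search over weights subject to transaction costs and constraints, which is not shown.

```python
import numpy as np


def portfolio_mean_cvar(return_samples, weights, alpha=0.95):
    """Sample mean return and empirical CVaR_alpha of losses for fixed weights.

    return_samples has shape (S, n_assets), e.g. draws from a generative model."""
    port = np.asarray(return_samples) @ np.asarray(weights)
    losses = -port
    k = max(1, int(round(len(losses) * (1 - alpha))))  # number of tail scenarios
    worst = np.sort(losses)[::-1][:k]                   # the k largest losses
    return float(port.mean()), float(worst.mean())


# Usage with hypothetical scenarios for two assets.
rng = np.random.default_rng(0)
scenarios = rng.normal(0.0005, 0.02, size=(10_000, 2))
mean_ret, cvar_95 = portfolio_mean_cvar(scenarios, np.array([0.6, 0.4]), alpha=0.95)
```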
Integrative Learning of Individualized Treatment Rules from Multiple Studies with Partially Overlapping Treatments
Yuan Bian, Donglin Zeng, Hyun-Joon Yang, Leanne M. Williams, Yuanjia Wang
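AI summary: This paper addresses incomplete overlap of treatments across multiple randomized controlled trials (RCTs). It proposes an integrative learning framework that uses a regularized weighted misclassification risk function to adaptively combine studies that share a common comparator but differ in their alternative treatment arms, improving estimation of individualized treatment rules (ITRs).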
An individualized treatment rule (ITR) tailors treatments to a patient's specific characteristics. However, randomized controlled trials (RCTs) are often underpowered to detect the treatment effect heterogeneity needed for reliable ITR estimation. To address this limitation, there is growing interest in leveraging information from multiple studies to improve statistical power and support individualized decision-making. A key challenge in this context is that available RCTs may not evaluate the same set of treatments. In this paper, we propose an integrative learning framework that synthesizes evidence across multiple RCTs that share a common comparator but differ in their alternative treatment arms. Our method integrates information through a regularized weighted misclassification risk function and adaptively determines the contribution of each study to the ITRs of the others. We rigorously study the excess risk of the resulting estimator. Simulation studies demonstrate that the proposed approaches improve the estimation of both value and benefit functions. We illustrate the utility of our methodology using data from two landmark studies of major depressive disorder: the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care study and the International Study to Predict Optimized Treatment in Depression study, both of which include a selective serotonin reuptake inhibitor as a common treatment arm. We find that the separate learning method outperforms one-size-fits-all methods, and our integrative methods further improve performance.
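The value and the weighted misclassification risk that this line of work builds on can be computed directly in an RCT with known randomization probabilities. The NumPy sketch below uses hypothetical names and toy data. It computes the inverse-probability-weighted value of a rule and the corresponding weighted misclassification risk. Their sum does not depend on the rule, so maximizing one is the same as minimizing the other. The study-specific weights and regularization of the integrative method are not shown.

```python
import numpy as np


def ipw_value(y, a, d, propensity):
    """IPW value of rule d: mean(Y * 1{A = d(X)} / P(A | X))."""
    y, a, d, pi = (np.asarray(v, dtype=float) for v in (y, a, d, propensity))
    return float(np.mean(y * (a == d) / pi))


def weighted_misclassification(y, a, d, propensity):
    """Weighted misclassification risk: mean(Y / P(A | X) * 1{A != d(X)})."""
    y, a, d, pi = (np.asarray(v, dtype=float) for v in (y, a, d, propensity))
    return float(np.mean(y / pi * (a != d)))


# Toy RCT with 1:1 randomization (hypothetical data).
x = np.array([-1.0, -1.0, 1.0, 1.0])
a = np.array([0, 1, 1, 0])
y = np.array([2.0, 0.0, 3.0, 1.0])
pi = np.full(4, 0.5)
d = (x > 0).astype(int)  # rule: treat when x > 0
```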
Learning to Bet for Horizon-Aware Anytime-Valid Testing under Strict Deadlines
Ege Onur Taga, Samet Oymak, Shubhanshu Shekhar
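AI summary: This paper formulates horizon-aware betting as a finite-horizon optimal control problem and uses deep reinforcement learning to learn a universal policy. The result is anytime-valid tests and confidence sequences for bounded means under a strict deadline.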
We develop horizon-aware anytime-valid tests and confidence sequences for bounded means under a strict deadline $N$. Using the betting/e-process framework, we cast horizon-aware betting as a finite-horizon optimal control problem with state space $(t, \log W_t)$, where $t$ is the time and $W_t$ is the test martingale value. We first show that in certain interior regions of the state space, policies that deviate significantly from Kelly betting are provably suboptimal, while Kelly betting reaches the threshold with high probability. We then identify sufficient conditions showing that outside this region, more aggressive betting than Kelly can be better if the bettor is behind schedule, and less aggressive betting can be better if the bettor is ahead. Taken together, these results suggest a simple phase diagram in the $(t, \log W_t)$ plane, delineating regions where Kelly, fractional Kelly, and aggressive betting may be preferable. Guided by this phase diagram, we introduce a Deep Reinforcement Learning approach based on a universal Deep Q-Network (DQN) agent that learns a single policy from synthetic experience and maps simple statistics of past observations to bets across horizons and null values. In limited-horizon experiments, the learned DQN policy yields state-of-the-art results.
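The betting construction can be sketched in plain Python for observations in [0, 1]. Wealth is a product of bets on x_t - m0, which is a nonnegative martingale under the null. Stopping when wealth first exceeds 1/α therefore gives an anytime-valid test by Ville's inequality. The plug-in bet below is a second-order approximation to the Kelly fraction, clipped to keep wealth positive. It is horizon-unaware, and neither it nor the function names are the paper's learned DQN policy.

```python
def betting_test(xs, m0, alpha, bet_fn):
    """Test H0: E[X] = m0 for X in [0, 1] by betting. The wealth
    W_t = prod_{s<=t} (1 + lambda_s * (x_s - m0)) is a test martingale under H0,
    so rejecting when W_t >= 1/alpha is anytime-valid (Ville's inequality).
    Returns the rejection time, or None if the test never rejects."""
    wealth, history = 1.0, []
    for t, x in enumerate(xs, start=1):
        lam = bet_fn(history, m0)  # predictable: uses only past observations
        wealth *= 1.0 + lam * (x - m0)
        history.append(x)
        if wealth >= 1.0 / alpha:
            return t
    return None


def plug_in_kelly(history, m0, c=0.5):
    """Approximate Kelly bet (mu_hat - m0) / (var_hat + (mu_hat - m0)^2), clipped so
    that 1 + lam * (x - m0) stays positive for every x in [0, 1]."""
    if not history:
        return 0.0
    mu = sum(history) / len(history)
    var = sum((v - mu) ** 2 for v in history) / len(history)
    denom = var + (mu - m0) ** 2
    lam = 0.0 if denom == 0 else (mu - m0) / denom
    return max(-c / (1.0 - m0), min(c / m0, lam))


# Usage on a hypothetical stream of maximal observations.
t_fixed = betting_test([1.0] * 20, m0=0.5, alpha=0.05, bet_fn=lambda h, m0: 1.0)
t_plugin = betting_test([1.0] * 20, m0=0.5, alpha=0.05, bet_fn=plug_in_kelly)
```

The constant bet multiplies wealth by 1.5 per step and crosses 1/α = 20 at step 8. The plug-in bet abstains on the first step and then bets at its clip, so it crosses one step later.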
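On the Interplay Between the Prior Weight and the Variance of the Robustification Component in Robust Mixture Prior Approaches to Bayesian Dynamic Borrowing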
Marco Ratta, Gaelle Saint-Hilary, Mauro Gasparini, Pavel Mozgunov
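AI summary: This paper studies how the joint choice of the prior weight and the variance of the robustification component in robust mixture priors affects posterior inference, Type I error control, and robustness. It also proposes a new hyperparameter elicitation routine.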
Robust Mixture Prior (RMP) is a popular Bayesian dynamic borrowing method, which combines an informative historical distribution with a less informative component (referred to as the robustification component) in a mixture prior to enhance the efficiency of hybrid-control randomized trials. Current practice typically focuses solely on the selection of the prior weight that governs the relative influence of these two components, often fixing the variance of the robustification component to that of a single observation. In this study we demonstrate that the performance of RMPs critically depends on the joint selection of the prior weight and the variance of the robustification component. In particular, we show that a wide range of weight-variance pairs can yield practically identical posterior inferences (in particular regions of the parameter space) and that large-variance robustification components may be employed without incurring the so-called Lindley's paradox. We further show that the use of large-variance robustification components leads to improved asymptotic Type I error control and enhanced robustness of the RMP to the specification of the location parameter of the robustification component. Finally, we leverage these theoretical results to propose a novel and practical hyper-parameter elicitation routine.
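Under a normal-normal model, the roles of the weight and the robustification variance are explicit. Each prior component is updated conjugately, and its posterior weight is proportional to its prior weight times the prior predictive density of the data under that component. The plain-Python sketch below (hypothetical names, illustrative numbers) shows that calculation. It is a standard conjugate mixture update, not the paper's elicitation routine.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def rmp_posterior(ybar, se2, w, m_h, v_h, m_r, v_r):
    """Normal-normal update of a two-component robust mixture prior.

    Prior: w * N(m_h, v_h) + (1 - w) * N(m_r, v_r); data: ybar ~ N(theta, se2).
    Returns the posterior weight of the informative component and the
    posterior mean of theta."""
    comps = []
    for weight, m, v in ((w, m_h, v_h), (1.0 - w, m_r, v_r)):
        marginal = normal_pdf(ybar, m, v + se2)  # prior predictive density of ybar
        post_var = 1.0 / (1.0 / v + 1.0 / se2)
        post_mean = post_var * (m / v + ybar / se2)
        comps.append((weight * marginal, post_mean))
    total = comps[0][0] + comps[1][0]
    w_post = comps[0][0] / total
    mean = w_post * comps[0][1] + (1.0 - w_post) * comps[1][1]
    return w_post, mean


# Usage: agreement versus conflict with the historical mean (illustrative values).
w_agree, _ = rmp_posterior(ybar=0.0, se2=0.1, w=0.5, m_h=0.0, v_h=0.05, m_r=0.0, v_r=10.0)
w_conflict, mean_conflict = rmp_posterior(ybar=3.0, se2=0.1, w=0.5, m_h=0.0, v_h=0.05, m_r=0.0, v_r=10.0)
```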
FlowSN: Neural Simulation-Based Inference Accounting for Realistic Selection Effects in Supernova Cosmology
Benjamin M. Boyd, Kaisey S. Mandel, Matthew Grayling, Ayan Mitra, Richard Kessler, Maximilian Autenrieth, Aaron Do, Madeleine Ginolin, Lisa Kelsey, Gautham Narayan, Matthew O'Callaghan, Nikhil Sarin, Stephen Thorp
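AI summary: This paper proposes FlowSN, a framework that uses normalizing flows for simulation-based inference to correct for selection effects in observational astronomy. Validated for the first time on LSST-like simulations, it substantially reduces bias in cosmological parameter estimates.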
We present FlowSN, a statistical framework using simulation-based inference (SBI) with normalising flows to account for selection effects in observational astronomy. Failure to account for selection effects can lead to biased inference on global parameters. An example is Malmquist bias, where detection limits result in a sample skewed towards brighter objects. In Type Ia supernova (SN Ia) cosmology, these selection effects can systematically shift the inferred posterior distributions of cosmological parameters, necessitating the development of robust statistical frameworks to account for the biases. SBI enables us to implicitly learn probability distributions that are analytically intractable to calculate. In this work, we introduce a novel approach that employs a normalising flow to learn the non-analytic selected SN likelihood for a given survey from forward simulations, independent of the assumed cosmological model. The resulting likelihood approximation is incorporated into a hierarchical Bayesian framework and posterior sampling is performed using Hamiltonian Monte Carlo to obtain constraints on cosmological parameters conditioned on the observed data. The modular learnt likelihood approximation can be reused without retraining to evaluate different cosmological models, providing a key advantage over other SBI approaches. We demonstrate the performance of this methodology by training and testing the SBI technique using realistic LSST-like SNANA simulations for the first time. Our FlowSN approach yields accurate posterior estimates on cosmological parameters, including the dark energy equation of state $w_0$, that are an order of magnitude less biased than those obtained with conventional techniques and also exhibit improved frequentist calibration.
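A small simulation shows the Malmquist bias that motivates FlowSN. The population parameters, distance range, and detection limit below are hypothetical. Selecting on apparent magnitude keeps the intrinsically brighter supernovae at large distances, so the mean absolute magnitude of the detected sample is biased bright (more negative). This NumPy sketch only illustrates the bias; it does not implement the flow-based likelihood.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
M_mean, M_scatter = -19.3, 0.15             # hypothetical SN Ia absolute-magnitude population
dist_mod = rng.uniform(38.0, 40.0, size=n)   # hypothetical distance moduli
M = rng.normal(M_mean, M_scatter, size=n)    # intrinsic absolute magnitudes
m_app = M + dist_mod                         # apparent magnitudes
m_lim = 20.0                                 # hypothetical detection limit

detected = m_app < m_lim                     # objects fainter than the limit are lost
selected_fraction = detected.mean()
bias = M[detected].mean() - M_mean           # negative: the detected sample is too bright
```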
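Towards Causal Market Simulators
Dennis Thumm, Luis Ontaneda Mijares
AI summary: Proposes the Time-series Neural Causal Model VAE (TNCM-VAE), which combines a variational autoencoder with a structural causal model to generate counterfactual financial time series that preserve temporal dependencies and causal relationships, achieving L1 distances as low as 0.03-0.10 on synthetic data.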
Market generators using deep generative models have shown promise for synthetic financial data generation, but existing approaches lack causal reasoning capabilities essential for counterfactual analysis and risk assessment. We propose a Time-series Neural Causal Model VAE (TNCM-VAE) that combines variational autoencoders with structural causal models to generate counterfactual financial time series while preserving both temporal dependencies and causal relationships. Our approach enforces causal constraints through directed acyclic graphs in the decoder architecture and employs the causal Wasserstein distance for training. We validate our method on synthetic autoregressive models inspired by the Ornstein-Uhlenbeck process, demonstrating superior performance in counterfactual probability estimation with L1 distances as low as 0.03-0.10 compared to ground truth. The model enables financial stress testing, scenario analysis, and enhanced backtesting by generating plausible counterfactual market trajectories that respect underlying causal mechanisms.
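The abduction-action-prediction recipe behind counterfactual generation can be shown on a linear AR(1) structural causal model, which is a discretised analogue of an Ornstein-Uhlenbeck process. This is a hand-written sketch with a known coefficient `phi`. TNCM-VAE learns the causal mechanisms with a VAE instead, so the model and the function name here are assumptions.

```python
def counterfactual_ar1(observed, phi, new_x0):
    """Counterfactual path for the toy SCM x_t = phi * x_{t-1} + u_t (hypothetical)."""
    # Abduction: recover the exogenous noise implied by the observed path.
    noises = [observed[t] - phi * observed[t - 1] for t in range(1, len(observed))]
    # Action: intervene on the initial state, do(x_0 = new_x0).
    cf = [new_x0]
    # Prediction: replay the same noise through the structural equation.
    for u in noises:
        cf.append(phi * cf[-1] + u)
    return cf


obs = [0.0, 0.3, 0.1, -0.2, 0.05]
cf = counterfactual_ar1(obs, phi=0.5, new_x0=1.0)
# The counterfactual minus the factual path is the intervention shock decaying as phi**t.
print([round(c - o, 4) for c, o in zip(cf, obs)])
```

Because the noise is held fixed, the counterfactual path differs from the observed one only through the propagated intervention. This is the "respect underlying causal mechanisms" property in its simplest form.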
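Tests for Most Influential Sets
Lucas D. Konrad, Nikolas Kuschnig
AI summary: Addresses the problem that a small fraction of data points can exert excessive influence on model conclusions. Derives an exact influence formula for linear least squares, identifies the extreme value distributions of maximal influence, and proposes a hypothesis-testing framework for detecting excessive influence.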
Small influential data subsets can dramatically impact model conclusions, with a few data points overturning key findings. While recent work identifies these most influential sets, there is no formal way to tell when maximum influence is excessive rather than expected under natural random sampling variation. We address this gap by developing a principled framework for most influential sets. Focusing on linear least squares, we derive a convenient exact influence formula and identify the extreme value distributions of maximal influence: the heavy-tailed Fréchet for constant-size sets and heavy-tailed data, and the well-behaved Gumbel for growing sets or light tails. This allows us to conduct rigorous hypothesis tests for excessive influence. We demonstrate the framework through applications across economics, biology, and machine learning benchmarks, resolving contested findings and replacing ad-hoc heuristics with rigorous inference.
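A regression with one regressor and no intercept is the simplest case. There, deleting a set $S$ changes the least-squares slope by exactly $\hat\beta-\hat\beta_{-S}=\sum_{i\in S}x_i e_i \big/ \bigl(\sum_i x_i^2-\sum_{i\in S}x_i^2\bigr)$, where $e_i$ are the full-sample residuals. With this closed form, a brute-force search for the most influential set is easy to write. The sketch below is a minimal illustration under that assumption. It is not the authors' general formula or their test, and the function names are hypothetical.

```python
from itertools import combinations


def slope(xs, ys):
    """Least-squares slope of y on x without an intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def set_influence(xs, ys, subset):
    """Exact change beta - beta_{-S} from deleting the indices in `subset`.

    Uses residuals of the full fit: sum_S x_i e_i / (sum x^2 - sum_S x_i^2).
    """
    beta = slope(xs, ys)
    num = sum(xs[i] * (ys[i] - beta * xs[i]) for i in subset)
    den = sum(x * x for x in xs) - sum(xs[i] ** 2 for i in subset)
    return num / den


def most_influential_set(xs, ys, k):
    """Brute-force search over all size-k subsets (feasible only for tiny n)."""
    return max(combinations(range(len(xs)), k),
               key=lambda s: abs(set_influence(xs, ys, s)))


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]   # the last point looks like an outlier
best = most_influential_set(xs, ys, k=1)
print(best, set_influence(xs, ys, best))
```

The closed form avoids refitting for every candidate set. The number of candidate sets still grows combinatorially, which is one reason a distributional theory of maximal influence is useful.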
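Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making
Emmanuelle Claeys, Elena Kerjean, Jean-Michel Loubes
AI summary: Proposes MAYA, a sequential imitation learning model based on multi-armed bandits that models bees' limited memory through a temporal window τ. It outperforms baseline models on real, simulated, and complementary datasets while offering interpretability and trajectory inference.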
This work introduces MAYA, a sequential imitation learning model based on multi-armed bandits, designed to reproduce and predict individual bees' decisions in contextualized foraging tasks. The model accounts for bees' limited memory through a temporal window $τ$, whose optimal value is around 7 trials, with a slight dependence on weather conditions. Experimental results on real, simulated, and complementary (mice) datasets show that MAYA (particularly with the Wasserstein distance) outperforms imitation baselines and classical statistical models, while providing interpretability of individual learning strategies and enabling the inference of realistic trajectories for prospective ecological applications.
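The limited-memory idea can be sketched as a sliding-window bandit. The agent estimates each arm's value only from its last τ trials and picks greedily, with occasional exploration. The epsilon-greedy rule, the toy environment, and the class name are illustrative assumptions, not MAYA's imitation-learning model. τ = 7 simply mirrors the value reported above.

```python
import random
from collections import deque


class SlidingWindowAgent:
    """Epsilon-greedy bandit that only remembers its last `tau` trials (hypothetical)."""

    def __init__(self, n_arms, tau=7, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.memory = deque(maxlen=tau)   # (arm, reward) pairs; older trials are forgotten
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def value(self, arm):
        rewards = [r for a, r in self.memory if a == arm]
        return sum(rewards) / len(rewards) if rewards else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        values = [self.value(a) for a in range(self.n_arms)]
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def update(self, arm, reward):
        self.memory.append((arm, reward))


# Toy environment: arm 1 pays off for the first 50 trials, then arm 0 does.
agent = SlidingWindowAgent(n_arms=2, tau=7)
choices = []
for t in range(100):
    arm = agent.choose()
    good = 1 if t < 50 else 0
    agent.update(arm, 1.0 if arm == good else 0.0)
    choices.append(arm)
print(sum(c == 0 for c in choices[80:]), "of the last 20 choices picked arm 0")
```

A short window lets the agent "forget" the old reward pattern and switch quickly after the environment changes. A longer window would average in stale rewards.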
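Honesty in Causal Forests: When It Helps and When It Hurts
Yanfang Hou, Carlos Fernández-Loría
AI summary: Using a bias-variance trade-off analysis, finds that honest estimation (splitting the data between defining subgroups and estimating effects) can reduce the accuracy of individual treatment effect estimates when heterogeneity is strong and data are plentiful. Recommends treating honesty as a form of regularization rather than a default choice.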
Causal forests estimate how treatment effects vary across individuals, guiding personalized interventions in areas like marketing, operations, and public policy. A standard practice is honest estimation: dividing the data into two samples, one to define subgroups and another to estimate treatment effects within them. This is intended to reduce overfitting and is the default in many software packages. But is it the right choice? We show that honest estimation can reduce the accuracy of estimates of individual treatment effects, especially when effect heterogeneity is substantial and datasets are large enough to detect it. The reason is a bias-variance trade-off: honesty lowers the risk of overfitting but increases the risk of underfitting by limiting the data available to detect and model heterogeneity. Across more than 7,000 benchmark datasets, we find that the cost of using honesty by default can be as high as requiring 27% more data to match the performance of models trained without it. Honesty is best understood as a form of regularization. Whether to adopt it should depend on the goals of the application and its empirical performance, not on reflexive default use.
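The honest split can be sketched for a single-split "tree". One half of the data chooses the split point, and the other half estimates the treatment effect within each resulting subgroup. This toy stump, its difference-in-means leaf estimator, and the `(x, treated, y)` data layout are assumptions for illustration. This is not the causal-forest implementation in any particular package.

```python
def leaf_effect(rows):
    """Difference in mean outcomes between treated and control rows.

    Assumes the leaf contains at least one treated and one control row.
    """
    t = [y for _, w, y in rows if w == 1]
    c = [y for _, w, y in rows if w == 0]
    return sum(t) / len(t) - sum(c) / len(c)


def honest_stump(rows, candidates):
    """Honest estimation: choose the split on half A, estimate leaf effects on half B."""
    half_a, half_b = rows[0::2], rows[1::2]   # deterministic split for reproducibility

    def gap(s, data):
        left = [r for r in data if r[0] < s]
        right = [r for r in data if r[0] >= s]
        return abs(leaf_effect(left) - leaf_effect(right))

    # The structure is chosen using half A only ...
    split = max(candidates, key=lambda s: gap(s, half_a))
    # ... and the effects are estimated using half B only.
    left_b = [r for r in half_b if r[0] < split]
    right_b = [r for r in half_b if r[0] >= split]
    return split, leaf_effect(left_b), leaf_effect(right_b)


# Noiseless toy data: effect 1 for x < 0.5 and 3 for x >= 0.5; control outcome is 0.
rows = [(i / 40, (i // 2) % 2, ((i // 2) % 2) * (1.0 if i < 20 else 3.0)) for i in range(40)]
print(honest_stump(rows, candidates=[0.25, 0.5, 0.75]))
```

The split and the leaf estimates never see the same observations. This removes adaptive overfitting but leaves each stage with only half the data, which is the trade-off the abstract describes.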
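Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning
Harin Lee, Kevin Jamieson
AI summary: For reinforcement learning with delayed state observations, proposes a strategy that combines the augmentation method with an upper confidence bound algorithm and achieves a minimax optimal regret bound on tabular MDPs.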
We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces, $H$ is the time horizon, $K$ is the number of episodes, and $D_{\max}$ is the maximum length of the delay. We also provide a matching lower bound up to logarithmic factors, showing the optimality of our approach. Our analytical framework formulates this problem as a special case of a broader class of MDPs whose transition dynamics decompose into a known component and an unknown but structured component. We establish general results for this abstract setting, which may be of independent interest.
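The augmentation method turns a delayed-observation problem into a Markovian one. It treats the last observed state, together with the actions taken since then, as the agent's state. Below is a minimal sketch with a fixed delay `d`; the paper allows random delays up to $D_{\max}$. The class and function names are hypothetical.

```python
from collections import deque


class AugmentedState:
    """Augmented state (s_{t-d}, a_{t-d}, ..., a_{t-1}) under a fixed observation delay."""

    def __init__(self, initial_state, delay):
        self.delay = delay
        self.last_observed = initial_state
        self.pending = deque()   # actions whose resulting states are not yet observed

    def act(self, action):
        if len(self.pending) >= self.delay:
            raise RuntimeError("observe the delayed state before acting again")
        self.pending.append(action)

    def observe(self, state):
        """Receive the delayed state produced by the oldest pending action."""
        self.last_observed = state
        self.pending.popleft()

    def key(self):
        return (self.last_observed, *self.pending)


def augmented_space_size(n_states, n_actions, delay):
    # Each augmented state pairs one observed state with `delay` pending actions.
    return n_states * n_actions ** delay


aug = AugmentedState("s0", delay=2)
aug.act("L")
aug.act("R")
print(aug.key())                      # ('s0', 'L', 'R')
aug.observe("s1")                     # state reached after taking "L" in s0
aug.act("L")
print(aug.key(), augmented_space_size(10, 4, 3))
```

The augmented space grows as $S\cdot A^{d}$. A naive tabular algorithm run directly on it would pay for that exponential size, whereas the regret bound stated above grows only with $\sqrt{D_{\max}}$.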
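Rex: A Family of Reversible Exponential (Stochastic) Runge-Kutta Solvers
Zander W. Blasingame, Chen Liu
AI summary: Proposes the Rex family of solvers, which uses Lawson methods to convert explicit (stochastic) Runge-Kutta schemes into algebraically reversible ones for diffusion ODEs and SDEs. Rex achieves near-machine-precision reconstruction and improves the performance of flow and diffusion models.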
Deep generative models based on neural differential equations have become state-of-the-art for many generation tasks. These models rely on ODE/SDE solvers that integrate from a prior distribution to the data distribution; in many applications it is also highly desirable to integrate in the inverse direction. Standard solvers, however, accumulate discretization errors that prohibit exact inversion, an inaccuracy that is unacceptable in precision-critical applications. Existing inversion methods suffer from poor stability and low order of convergence, and are strictly limited to the ODE setting. In this work, we propose Rex, a family of reversible exponential (stochastic) Runge-Kutta solvers obtained by applying Lawson methods to convert any explicit (stochastic) Runge-Kutta scheme into an algebraically reversible one for both diffusion ODEs and SDEs. Beyond a rigorous theoretical analysis, which establishes arbitrary-order convergence and a non-zero region of linear stability, we empirically demonstrate that Rex achieves near-machine-precision reconstruction and improves Boltzmann sampling with flow models as well as image generation and editing with diffusion models.
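Algebraic reversibility can be illustrated with a generic construction that couples two copies of the state. Each step then has a closed-form inverse, so a trajectory can be reconstructed backwards up to floating-point round-off, whatever one-step method is plugged in. The sketch below wraps an explicit Euler step and omits the exponential (Lawson) transformation and stochastic terms. It shows only the reversibility mechanism, not Rex. The coupling parameter `lam`, the toy ODE, and the function names are assumptions.

```python
import math


def euler(f, t, y, h):
    """One explicit Euler increment; any explicit one-step method could be used here."""
    return h * f(t, y)


def reversible_step(f, t, y, z, h, lam=0.999):
    """Forward step of a coupled two-state scheme that can be inverted algebraically."""
    y_next = lam * y + (1 - lam) * z + euler(f, t, z, h)
    z_next = z - euler(f, t + h, y_next, -h)
    return y_next, z_next


def reverse_step(f, t, y_next, z_next, h, lam=0.999):
    """Closed-form inverse of `reversible_step`: recovers (y, z) at time t."""
    z = z_next + euler(f, t + h, y_next, -h)
    y = (y_next - (1 - lam) * z - euler(f, t, z, h)) / lam
    return y, z


def rhs(t, y):
    return -y + math.sin(t)   # toy ODE dy/dt = -y + sin(t)


h, n = 0.01, 500
y = z = 1.0
for k in range(n):
    y, z = reversible_step(rhs, k * h, y, z, h)
for k in reversed(range(n)):
    y, z = reverse_step(rhs, k * h, y, z, h)
print(abs(y - 1.0), abs(z - 1.0))   # reconstruction error at round-off level
```

A standard solver run backwards would only approximately retrace its forward path. Here each inverse step undoes the forward step exactly in exact arithmetic, so only round-off accumulates.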
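Delayed Acceptance Markov Chain Monte Carlo for Robust Bayesian Analysis
Masahiro Tanaka
AI summary: Proposes a delayed acceptance MCMC algorithm that uses two-stage screening to reduce costly computations with large covariance matrices, roughly doubling efficiency in quasi-Bayesian inference.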
This study introduces a computationally efficient algorithm, delayed acceptance Markov chain Monte Carlo (DA-MCMC), designed to improve posterior simulation in quasi-Bayesian inference. Quasi-Bayesian methods, which do not require fully specifying a probabilistic model, are often computationally expensive owing to the need to evaluate the inverse and determinant of large covariance matrices. DA-MCMC addresses this challenge by employing a two-stage process: In the first stage, proposals are screened using an approximate posterior, whereas a final acceptance or rejection decision is made in the second stage based on the exact target posterior. This reduces the need for costly matrix computations, thereby improving efficiency without sacrificing accuracy. We demonstrate the effectiveness of DA-MCMC through applications to both synthetic and real data. The results demonstrate that, although DA-MCMC slightly reduces the effective sample size per iteration compared with the standard MCMC, it achieves substantial improvement in terms of effective sample size per second, approximately doubling the efficiency. This makes DA-MCMC particularly useful for cases where posterior simulation is computationally intensive. Thus, the DA-MCMC algorithm offers a significant advancement in computational efficiency for quasi-Bayesian inference, making it a valuable tool for robust Bayesian analysis.
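The two-stage procedure can be sketched as a delayed-acceptance Metropolis-Hastings sampler with a symmetric random-walk proposal. A cheap Gaussian surrogate stands in for the approximate posterior, and a slightly different Gaussian stands in for the expensive exact posterior. Both targets, the step size, and the function names are illustrative assumptions, not the quasi-Bayesian models used in the study.

```python
import math
import random


def da_mh(log_post, log_approx, theta0, n_iter=20_000, step=1.0, seed=1):
    """Delayed-acceptance Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = theta0
    lp, la = log_post(theta), log_approx(theta)
    samples, expensive_calls = [], 0
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        la_prop = log_approx(prop)
        # Stage 1: screen the proposal with the cheap approximation only.
        if math.log(1.0 - rng.random()) < la_prop - la:
            # Stage 2: correct with the exact posterior so the chain still targets it.
            lp_prop = log_post(prop)
            expensive_calls += 1
            if math.log(1.0 - rng.random()) < (lp_prop - lp) - (la_prop - la):
                theta, lp, la = prop, lp_prop, la_prop
        samples.append(theta)
    return samples, expensive_calls


def log_exact(x):    # stand-in for an expensive posterior: N(0.5, 1)
    return -0.5 * (x - 0.5) ** 2


def log_cheap(x):    # stand-in for a cheap approximation: N(0, 1.2^2)
    return -0.5 * (x / 1.2) ** 2


samples, calls = da_mh(log_exact, log_cheap, theta0=0.0)
kept = samples[2_000:]
print(sum(kept) / len(kept), calls, "exact evaluations out of", len(samples))
```

Proposals rejected in stage 1 never touch the expensive density. The stage-2 ratio divides out the surrogate, so the chain remains exact for the target.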
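Optimal Initialization of Deep Networks: Lyapunov Initialization and Limit Theorems for Deep Leaky ReLU Networks
Constantin Kogler, Tassilo Schwarz, Samuel Kittle
AI summary: Through a rigorous probabilistic analysis of random deep Leaky ReLU networks, proposes Lyapunov initialization, which sets the Lyapunov exponent to zero to ensure activation stability and thereby improve learning.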
Effective initialization in deep networks requires an understanding of random neural networks. In this work, a rigorous probabilistic analysis of deep bias-free random Leaky ReLU networks is provided. We prove a Law of Large Numbers and a Central Limit Theorem for the logarithm of the norm of network activations, establishing that, as the number of layers increases, their growth is governed by a parameter called the Lyapunov exponent. This parameter characterizes a sharp phase transition between vanishing and exploding activations, and we calculate the Lyapunov exponent explicitly for Gaussian or orthogonal weight matrices. Our results reveal that standard methods, such as He initialization or orthogonal initialization, do not guarantee activation stability for deep networks of low width. Based on these theoretical insights, we propose a novel initialization method, referred to as Lyapunov initialization, which sets the Lyapunov exponent to zero and thereby ensures that the neural network is as stable as possible, leading empirically to improved learning.
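A bias-free Leaky ReLU network is positively homogeneous, so scaling every weight matrix by $c>0$ shifts the Lyapunov exponent by exactly $\log c$. This suggests a Monte Carlo sketch: estimate the exponent of a random network with unit-gain Gaussian weights, then rescale the gain so that the estimate becomes zero. The paper computes the exponent explicitly. The simulation, width, depth, and slope below are assumptions for illustration.

```python
import math
import random


def leaky_relu(v, slope):
    return v if v > 0 else slope * v


def lyapunov_exponent(width, depth, gain, slope=0.01, seed=0):
    """Monte Carlo estimate of (1/L) log ||x_L|| for a bias-free Leaky ReLU network.

    Weights are i.i.d. N(0, gain^2 / width). The activation is renormalised after
    each layer to avoid overflow, while the log-norm growth is accumulated.
    """
    rng = random.Random(seed)
    x = [1.0 / math.sqrt(width)] * width   # unit-norm input
    std = gain / math.sqrt(width)
    total = 0.0
    for _ in range(depth):
        pre = [sum(rng.gauss(0.0, std) * xj for xj in x) for _ in range(width)]
        h = [leaky_relu(p, slope) for p in pre]
        norm = math.sqrt(sum(v * v for v in h))
        total += math.log(norm)
        x = [v / norm for v in h]
    return total / depth


lam1 = lyapunov_exponent(width=16, depth=200, gain=1.0)
gain_star = math.exp(-lam1)                      # rescale so the estimated exponent is zero
he_gain = math.sqrt(2.0 / (1.0 + 0.01 ** 2))     # He gain for Leaky ReLU, for comparison
print(gain_star, he_gain, lyapunov_exponent(16, 200, he_gain))
```

He scaling preserves the mean squared activation norm, not the mean log-norm. At low width, one would therefore expect its estimated exponent to sit somewhat below zero, consistent with the abstract's claim that He initialization does not guarantee stability there.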
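Adapting Noise to Data: Generative Flows from 1D Processes
Jannis Chemseddine, Gregor Kornhardt, Richard Duong, Gabriele Steidl
AI summary: Proposes a general framework that learns data-adaptive parametric prior distributions (latent noise) through one-dimensional quantile functions. The priors are optimized with the Wasserstein distance between noise and data, which improves how generative flow models learn distributions such as heavy-tailed ones.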
The default Gaussian latent in flow-based generative models poses challenges when learning certain distributions such as heavy-tailed ones. We introduce a general framework for learning data-adaptive parametric prior distributions (latent noise) using one-dimensional quantile functions, optimized via the Wasserstein distance between noise and data. The quantile-based prior parameterization naturally adapts to both heavy-tailed and compactly supported distributions and shortens transport paths. Numerical results on heavy-tailed weather and image datasets confirm the method's flexibility and effectiveness, at negligible computational overhead.
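In one dimension, the Wasserstein distance is the $L^p$ distance between quantile functions: $W_p^p(\mu,\nu)=\int_0^1 |F_\mu^{-1}(u)-F_\nu^{-1}(u)|^p\,du$. This is what makes quantile-based priors convenient to optimise. For equal-size samples, the empirical quantile functions are just the sorted samples. The sketch below uses that fact. The Student-t example and the function name are assumptions, not the paper's parameterisation.

```python
import math
import random


def wasserstein_1d(a, b, p=1):
    """Empirical W_p between equal-size 1-D samples via sorted values (quantile coupling)."""
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    qa, qb = sorted(a), sorted(b)
    return (sum(abs(x - y) ** p for x, y in zip(qa, qb)) / len(a)) ** (1 / p)


def student_t3(rng):
    # Student-t with 3 degrees of freedom: Z / sqrt(V / 3) with V ~ chi^2_3 = Gamma(1.5, scale 2).
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3)


rng = random.Random(0)
n = 5_000
data = [student_t3(rng) for _ in range(n)]            # heavy-tailed "data"
gauss_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # default Gaussian latent
t_noise = [student_t3(rng) for _ in range(n)]          # noise adapted to the tails
print(wasserstein_1d(data, gauss_noise), wasserstein_1d(data, t_noise))
```

Because the distance depends only on sorted values, fitting a parametric quantile function to the data reduces to a one-dimensional regression between quantiles. This is cheap relative to training the flow.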
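Multiple Choice Learning of Low-Rank Adapters for Language Modeling
Victor Letzelter, Hugo Malard, Mathieu Fontaine, Gaël Richard, Slim Essid, Andrei Bursuc, Patrick Pérez
AI summary: Proposes LoRA-MCL, a training scheme that extends next-token prediction in language models through multiple choice learning and low-rank adaptation, so that diverse and plausible sentence continuations can be decoded at inference time.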
We propose LoRA-MCL, a training scheme that extends next-token prediction in language models with a method designed to decode diverse, plausible sentence continuations at inference time. Traditional language modeling is an intrinsically ill-posed problem: given a context, multiple futures may be equally plausible. Our approach leverages Multiple Choice Learning (MCL) and the winner-takes-all loss to efficiently handle ambiguity through Low-Rank Adaptation. We provide a theoretical interpretation of applying MCL to language modeling, assuming the data is generated from a mixture of distributions. We illustrate the proposed approach using mixtures of Markov chains. We then demonstrate with experiments on audio and visual captioning, as well as machine translation, that our method achieves high diversity and relevance in generated outputs. We release the code for applying LoRA-MCL to a wide range of language models.
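The winner-takes-all idea behind multiple choice learning can be seen on a toy problem. K hypotheses try to predict a target drawn from a two-mode distribution, and for each sample only the hypothesis with the lowest loss receives a gradient update. In this sketch each hypothesis is a single scalar rather than a LoRA adapter. The learning rate and the data are assumptions.

```python
import random


def wta_fit(samples, hypotheses, lr=0.05, epochs=20):
    """Winner-takes-all training with squared loss: only the closest hypothesis moves."""
    hyps = list(hypotheses)
    for _ in range(epochs):
        for y in samples:
            losses = [(h - y) ** 2 for h in hyps]
            k = losses.index(min(losses))          # the winning hypothesis
            hyps[k] -= lr * 2 * (hyps[k] - y)      # gradient step on the winner only
    return hyps


rng = random.Random(0)
# Ambiguous target: each context has two equally plausible "continuations", near -2 or +2.
data = [rng.choice([-2.0, 2.0]) + rng.gauss(0.0, 0.1) for _ in range(1_000)]
hyps = sorted(wta_fit(data, [-0.1, 0.1]))
print(hyps)   # the two hypotheses settle near different modes
```

A single hypothesis trained with squared loss would converge to the average of the two modes, near 0, which is not a plausible output. Winner-takes-all training instead lets each hypothesis specialise on one mode.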
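Data-Dependent and Variance-Dependent Regret Bounds for Online Tabular MDPs
Mingyi Li, Taira Tsuchiya, Kenji Yamanishi
AI summary: For online tabular Markov decision processes with known transitions, proposes best-of-both-worlds algorithms that achieve data-dependent regret bounds in the adversarial regime and variance-dependent regret bounds in the stochastic regime. Also shows that the global-optimization approach is nearly optimal.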
This work studies online episodic tabular Markov decision processes (MDPs) with known transitions and develops best-of-both-worlds algorithms that achieve refined data-dependent regret bounds in the adversarial regime and variance-dependent regret bounds in the stochastic regime. We quantify MDP complexity using a first-order quantity and several new data-dependent measures for the adversarial regime, including a second-order quantity and a path-length measure, as well as variance-based measures for the stochastic regime. To adapt to these measures, we develop algorithms based on global optimization and policy optimization, both built on optimistic follow-the-regularized-leader with log-barrier regularization. For global optimization, our algorithms achieve first-order, second-order, and path-length regret bounds in the adversarial regime, and in the stochastic regime, they achieve a variance-aware gap-independent bound and a variance-aware gap-dependent bound that is polylogarithmic in the number of episodes. For policy optimization, our algorithms achieve the same data- and variance-dependent adaptivity, up to a factor of the episode horizon, by exploiting a new optimistic $Q$-function estimator. Finally, we establish regret lower bounds in terms of data-dependent complexity measures for the adversarial regime and a variance measure for the stochastic regime, implying that the regret upper bounds achieved by the global-optimization approach are nearly optimal.
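Causal Preference Elicitation
Edwin V. Bonilla, He Zhao, Daniel M. Steinberg
AI summary: Proposes a Bayesian framework for expert-in-the-loop causal discovery that actively queries local edge relations to concentrate the posterior distribution over directed acyclic graphs.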
We propose causal preference elicitation, a Bayesian framework for expert-in-the-loop causal discovery that actively queries local edge relations to concentrate a posterior over directed acyclic graphs (DAGs). From any black-box observational posterior, we model noisy expert judgments with a three-way likelihood over edge existence and direction. Posterior inference uses a flexible particle approximation, and queries are selected by an efficient expected information gain criterion on the expert's categorical response. Experiments on synthetic graphs, protein signaling data, and a human gene perturbation benchmark show faster posterior concentration and improved recovery of directed effects under tight query budgets.
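Query selection by expected information gain can be sketched with a weighted particle set of candidate DAGs and a symmetric noise model. In that model, the expert reports the true relation for a pair with probability $1-\varepsilon$ and each of the two other answers with probability $\varepsilon/2$. This three-way likelihood, the value of $\varepsilon$, and the helper names are assumptions; the paper's likelihood and particle scheme may differ.

```python
import math

RESPONSES = ("i->j", "j->i", "none")


def relation(graph, i, j):
    """True relation of pair (i, j) in a graph given as a set of directed edges."""
    if (i, j) in graph:
        return "i->j"
    if (j, i) in graph:
        return "j->i"
    return "none"


def likelihood(response, truth, eps=0.2):
    return 1 - eps if response == truth else eps / 2


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_information_gain(particles, i, j, eps=0.2):
    """Mutual information between the expert's answer and the graph under the particles."""
    predictive = [sum(w * likelihood(r, relation(g, i, j), eps) for g, w in particles)
                  for r in RESPONSES]
    noise_entropy = entropy([1 - eps, eps / 2, eps / 2])   # identical for every graph
    return entropy(predictive) - noise_entropy


def update(particles, i, j, response, eps=0.2):
    """Bayesian reweighting of the particles after one expert answer."""
    new = [(g, w * likelihood(response, relation(g, i, j), eps)) for g, w in particles]
    z = sum(w for _, w in new)
    return [(g, w / z) for g, w in new]


particles = [({(0, 1), (1, 2)}, 0.4), ({(1, 0), (1, 2)}, 0.4), ({(0, 1)}, 0.2)]
pairs = [(0, 1), (1, 2), (0, 2)]
best = max(pairs, key=lambda ij: expected_information_gain(particles, *ij))
posterior = update(particles, *best, response="i->j")
print(best, [round(w, 3) for _, w in posterior])
```

The pair (0, 2) has the same relation in every particle, so asking about it carries no information. EIG picks the pair on which the particles disagree most.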
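Weak Diffusion Priors Can Still Achieve Strong Inverse-Problem Performance
Jing Jia, Wei Yuan, Sifan Liu, Liyue Shen, Guanyang Wang
AI summary: Studies the robustness of weak diffusion priors in inverse problems, using Bayesian consistency theory and local correlation analysis to explain why they remain effective when measurements are highly informative.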
Can a diffusion model trained on bedrooms recover human faces? Diffusion models are widely used as priors for inverse problems, but standard approaches usually assume a high-fidelity model trained on data that closely match the unknown signal. In practice, one often must use a mismatched or low-fidelity diffusion prior. Surprisingly, these weak priors often perform nearly as well as full-strength, in-domain baselines. We study when and why inverse solvers are robust to weak diffusion priors. Through extensive experiments, we find that weak priors succeed when measurements are highly informative (e.g., many observed pixels), and we identify regimes where they fail. To explain this behavior, we combine Bayesian-consistency theory with local-correlation analysis: the theory gives conditions under which high-dimensional measurements make the posterior concentrate near the true signal, while the correlation analysis shows that weak and stronger natural-image priors can share similar local spatial structure. These results provide a principled justification for when weak diffusion priors can be used reliably. Code is available at https://github.com/jjia131/weak-diffusion-priors-inverse-problem.
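Semi-Supervised Inference with Unlabeled Summary Statistics
Facheng Yu, Zhen Qi, Yuqian Zhang
AI summary: Considers a constrained semi-supervised setting where only unlabeled summary statistics (such as sample means and covariances) are available. Proposes methods that use these summaries to improve the efficiency of mean estimation and to correct selection bias, and extends them to average treatment effect estimation.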
Semi-supervised inference assumes access to a labeled dataset together with a large unlabeled dataset in which the outcome variable is missing, and it is widely used to improve statistical efficiency and support generalizability across populations. In many modern applications, however, individual-level unlabeled data may not be directly accessible due to privacy restrictions, data-sharing limits, or storage constraints, while summary statistics such as sample means and covariances from the unlabeled population are often available. In this work, we study this constrained semi-supervised setting where, in addition to labeled data with individualized observations, auxiliary information from the unlabeled population is available only through summary statistics. We propose new semi-supervised inference methods for mean estimation under both covariate-independent and covariate-dependent labeling and show that unlabeled summaries can still improve efficiency and help correct selection bias. The proposed methods apply in high dimensions and are robust to model misspecification. Valid inference is obtained under sparsity conditions comparable to those required by semi-supervised methods that assume access to individual-level unlabeled samples. Our approach relies on a specialized cross-fitting procedure, where sample splitting is applied only to the labeled data, which removes the need for individualized unlabeled covariates. We further extend this framework to average treatment effect estimation, enabling generalizability and transportability of causal conclusions in this constrained semi-supervised setting.
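The regression-adjusted mean estimator is a basic way to see how an unlabeled summary statistic helps. Fit $y$ on $x$ in the labeled data, then shift the labeled mean of $y$ by the fitted slope times the gap between the labeled and unlabeled covariate means. The sketch uses one covariate and no cross-fitting, so it is a simplified relative of the methods described, not the authors' estimator. Its only unlabeled input is the mean `x_bar_unlabeled`.

```python
def ss_mean(x_lab, y_lab, x_bar_unlabeled):
    """Regression-adjusted estimate of E[Y] using only the unlabeled mean of X."""
    n = len(x_lab)
    x_bar = sum(x_lab) / n
    y_bar = sum(y_lab) / n
    sxx = sum((x - x_bar) ** 2 for x in x_lab)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_lab, y_lab))
    beta = sxy / sxx
    # Shift the labeled mean towards the unlabeled population via the fitted slope.
    return y_bar + beta * (x_bar_unlabeled - x_bar)


# Labeled units over-represent small x (selection bias); the toy model is y = 1 + 2x.
x_lab = [0.0, 0.5, 1.0, 1.5, 2.0]
y_lab = [1 + 2 * x for x in x_lab]
print(ss_mean(x_lab, y_lab, x_bar_unlabeled=3.0))   # 7.0, versus a labeled mean of 3.0
```

Only a covariate mean crosses the data-sharing boundary. Under a correctly specified linear model, that is enough to remove the selection bias in the labeled mean.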
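Improving the Coverage of Combined Prediction Sets via Weighted p-Values
Gina Wong, Drew Prinster, Suchi Saria, Rama Chellappa, Anqi Liu
AI summary: Proposes a framework for weighted aggregation of prediction sets that assigns a weight to each set, giving flexible control of coverage between $1-2α$ and $1-α$. The framework generalizes to data-dependent weights and maintains finite-sample validity in settings such as mixture-of-experts models.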
Conformal prediction quantifies the uncertainty of machine learning models by augmenting point predictions with valid prediction sets. For complex scenarios involving multiple trials, models, or data sources, conformal prediction sets can be aggregated to create a prediction set that captures the overall uncertainty, often improving precision. However, aggregating multiple prediction sets with individual $1-α$ coverage inevitably weakens the overall guarantee, typically resulting in $1-2α$ worst-case coverage. In this work, we propose a framework for the weighted aggregation of prediction sets, where each prediction set is assigned a weight based on its contribution. Our framework offers flexible control over how the sets are aggregated, achieving tighter coverage bounds that interpolate between the $1-2α$ guarantee of the combined models and the $1-α$ guarantee of an individual model depending on the distribution of weights. Importantly, our framework generalizes to data-dependent weights, as we derive a procedure for weighted aggregation that maintains finite-sample validity even when the weights depend on the data. This extension makes our framework broadly applicable to settings where weights are learned, such as mixture-of-experts (MoE), and we demonstrate through experiments in the MoE setting that our methods achieve adaptive coverage.
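A simple baseline for combining conformal predictors through p-values works as follows. Compute a split-conformal p-value for each candidate label under each model, then average the p-values with weights. Twice a weighted average of valid p-values is again a valid p-value. So keeping the labels whose weighted-average p-value exceeds $\alpha$ gives at least $1-2\alpha$ coverage. The sketch illustrates that baseline only. The paper's refined procedure, which achieves tighter coverage, is not reproduced here, and the scores, labels, and function names are assumptions.

```python
def conformal_p_value(cal_scores, test_score):
    """Split-conformal p-value: rank of the test score among the calibration scores."""
    n = len(cal_scores)
    return (1 + sum(s >= test_score for s in cal_scores)) / (n + 1)


def weighted_aggregate_set(labels, cal_scores_per_model, test_scores_per_model, weights, alpha):
    """Keep labels whose weighted-average p-value exceeds alpha (coverage >= 1 - 2*alpha)."""
    total = sum(weights)
    kept = []
    for y in labels:
        p_bar = sum(w * conformal_p_value(cal, test[y])
                    for w, cal, test in zip(weights, cal_scores_per_model,
                                            test_scores_per_model)) / total
        if p_bar > alpha:
            kept.append(y)
    return kept


cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]   # nonconformity scores, both models
test_model_1 = {"cat": 0.15, "dog": 0.95, "fox": 0.55}
test_model_2 = {"cat": 0.05, "dog": 0.85, "fox": 0.95}
result = weighted_aggregate_set(["cat", "dog", "fox"], [cal, cal],
                                [test_model_1, test_model_2], weights=[0.7, 0.3], alpha=0.2)
print(result)   # ['cat', 'fox']
```

Raising a model's weight pulls the aggregate set towards that model's individual set. Data-dependent weights require the extra care that the abstract describes.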
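Statistics of Min-Max Normalized Eigenvalues in Random Matrices
Hyakka Nakada, Shu Tanaka
AI summary: Studies the statistical properties of min-max normalized eigenvalues of random matrices, applying a previously proposed effective distribution to derive a scaling law for the cumulative distribution and the residual error of matrix factorization.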
Random matrix theory has played an important role in various areas of pure mathematics, mathematical physics, and machine learning. From the practical perspective of data science, input data are usually normalized prior to processing. This study therefore investigates the statistical properties of min-max normalized eigenvalues in random matrices. An effective distribution for such normalized eigenvalues has been proposed in previous work; here, we apply it to evaluate a scaling law of the cumulative distribution. Furthermore, we derive the residual error that arises during matrix factorization of random matrices, and we conduct numerical experiments to verify these theoretical predictions.
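Bidirectional Causal Effect Estimation via Large-Scale Online Kernel Learning
Masahiro Tanaka
AI summary: Proposes a scalable online kernel learning framework that combines heteroskedasticity-based identification with quasi-maximum likelihood estimation. The framework estimates bidirectional causal effects in systems with mutual dependence and heteroskedasticity and uses random Fourier features and adaptive online gradient descent for efficient computation.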
In this study, a scalable online kernel learning framework is proposed for estimating bidirectional causal effects in systems characterized by mutual dependence and heteroskedasticity. Traditional causal inference often focuses on unidirectional effects, overlooking the bidirectional relationships common in real-world phenomena. Building on heteroskedasticity-based identification, the proposed method integrates a quasi-maximum likelihood estimator for simultaneous equation models with large-scale online kernel learning. It employs random Fourier feature approximations to flexibly model nonlinear conditional means and variances, while an adaptive online gradient descent algorithm ensures computational efficiency for streaming and high-dimensional data. Results from extensive simulations demonstrate that the proposed method achieves higher accuracy and stability than single-equation and polynomial-approximation baselines, exhibiting lower bias and root mean squared error across various data-generating processes. These results confirm that the proposed approach effectively captures complex bidirectional causal effects with near-linear computational scaling. By combining econometric identification with modern machine learning techniques, the proposed framework offers a practical, scalable, and theoretically grounded solution for large-scale causal inference in the natural and social sciences, policy making, business, and industrial applications.
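Random Fourier features replace a shift-invariant kernel with an explicit finite-dimensional feature map, which is what allows online updates in near-linear time. Take the Gaussian kernel $k(x,y)=\exp(-(x-y)^2/(2\ell^2))$. Drawing frequencies $\omega\sim\mathcal N(0,\ell^{-2})$ and phases $b\sim\mathrm{U}[0,2\pi]$ gives $z(x)=\sqrt{2/D}\,\cos(\omega x+b)$ with $z(x)^\top z(y)\approx k(x,y)$. The sketch below checks this in one dimension. The feature count, length scale, and function names are assumptions. The quasi-likelihood and online gradient steps of the proposed method are not shown.

```python
import math
import random


def make_rff(n_features, length_scale=1.0, seed=0):
    """Return a random Fourier feature map approximating the 1-D Gaussian (RBF) kernel."""
    rng = random.Random(seed)
    omegas = [rng.gauss(0.0, 1.0 / length_scale) for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(w * x + b) for w, b in zip(omegas, phases)]

    return features


def approx_kernel(feature_map, x, y):
    return sum(a * b for a, b in zip(feature_map(x), feature_map(y)))


z = make_rff(n_features=4_000)
for x, y in [(0.0, 0.0), (0.0, 0.5), (0.0, 1.5)]:
    exact = math.exp(-((x - y) ** 2) / 2)
    print(f"k({x}, {y}): approx {approx_kernel(z, x, y):.3f}, exact {exact:.3f}")
```

Once inputs are mapped to $D$ fixed features, a kernel regression becomes a linear model in those features. Each streaming observation then costs $O(D)$ to process, instead of growing with the number of past observations.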
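Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models
Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, James D. Pearce, Donghui Li, Aly A. Khan, Theofanis Karaletsos, Jakub M. Tomczak
AI summary: Proposes scLDM, a scalable generative method combining a variational autoencoder with a latent diffusion model, which uses permutation-invariant/equivariant architectures and Diffusion Transformers to generate high-quality single-cell gene expression data.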
Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression data and complex latent dependencies among genes. Existing generative models often impose artificial gene orderings or rely on shallow neural network architectures. We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM, that respects the fundamental exchangeability property of the data. Our VAE uses fixed-size latent variables leveraging a unified Multi-head Cross-Attention Block (MCAB) architecture, which serves dual roles: permutation-invariant pooling in the encoder and permutation-equivariant unpooling in the decoder. We enhance this framework by replacing the Gaussian prior with a latent diffusion model using Diffusion Transformers and linear interpolants, enabling high-quality generation with multi-conditional classifier-free guidance. We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.
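Cross-attention pooling is permutation-invariant because a fixed set of queries attends over the set elements, and the attention-weighted sum does not depend on the order of those elements. The number of queries also fixes the size of the latent, however many genes there are. The single-head sketch below uses hand-picked query and token vectors with no learned projections. It only illustrates the invariance property and is not the scLDM architecture.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_pool(queries, tokens):
    """Single-head cross-attention: each query returns a weighted average of the tokens."""
    d = len(tokens[0])
    pooled = []
    for q in queries:
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d) for t in tokens]
        weights = softmax(scores)
        pooled.append([sum(w * t[k] for w, t in zip(weights, tokens)) for k in range(d)])
    return pooled


# Hypothetical per-gene embeddings for one cell; the gene order is arbitrary.
genes = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5], [-1.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]   # fixed-size latent: two query vectors
a = cross_attention_pool(queries, genes)
b = cross_attention_pool(queries, genes[::-1])
print(max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb)))   # ~0 up to round-off
```

Swapping the roles, so that per-gene queries attend to the pooled latents, yields one output per gene in the same order as the input. This gives the permutation-equivariant unpooling direction used by the decoder.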
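Enhanced Renewable Energy Forecasting via Context-Aware Conformal Prediction
Alireza Moradi, Mathieu Tanneau, Reza Zandehshahvar, Pascal Van Hentenryck
AI summary: Proposes the Context-Aware Conformal Prediction (CACP) framework, which calibrates prediction intervals by weighting historical observations, without retraining the forecasting model, to improve the reliability and efficiency of renewable energy forecasts.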
Artificial intelligence (AI) is increasingly used to support renewable energy forecasting and grid operations. As renewable penetration grows, reliable probabilistic forecasting is becoming essential for managing uncertainty and supporting risk-aware operational decision-making. However, these forecasts often suffer from miscalibration due to temporal variability, changing weather conditions, and heterogeneous operating regimes. In many real-world settings, renewable energy forecasts are provided by external sources, vendors, or independently trained systems, making retraining infeasible because of limited model access or computational constraints. This creates a need for efficient, model-agnostic methods that can improve the reliability of forecasts after they are produced. This paper presents Context-Aware Conformal Prediction (CACP), a framework for calibrating renewable energy forecasts. The proposed method relies on a weighting mechanism in the calibration procedure that assigns higher weights to historical observations more similar to the target forecasting condition. This enables adaptive prediction intervals that reflect local uncertainty regimes without requiring access to, or retraining of, the underlying forecasting model. Experiments are performed on a large-scale day-ahead solar forecasting dataset from the National Renewable Energy Laboratory (NREL), covering multiple systems including MISO, ERCOT, and SPP. The results show that CACP improves the reliability-efficiency tradeoff at both site and system levels compared to NREL's base forecasting model and other conformal prediction baselines. These results suggest that CACP can serve as a practical reliability-enhancement layer for trustworthy AI-enabled renewable energy forecasting and operational decision support.
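Weighted split conformal prediction replaces the usual empirical quantile of calibration residuals with a weighted quantile, and the test point contributes a point mass at $+\infty$. In this sketch the weights come from a Gaussian kernel on a scalar context feature. The kernel, bandwidth, context variable, and function names are assumptions that illustrate how similarity-based weights change the interval. CACP's actual weighting scheme may differ.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of sum_i w_i * delta(score_i) + w_test * delta(+inf), normalised."""
    total = sum(weights) + test_weight
    cumulative = 0.0
    for s, w in sorted(zip(scores, weights)):
        cumulative += w
        if cumulative >= (1 - alpha) * total - 1e-12:   # small guard against round-off
            return s
    return math.inf   # not enough calibration mass: the interval is unbounded


def context_weighted_interval(point_forecast, context, cal_residuals, cal_contexts,
                              alpha=0.1, bandwidth=1.0):
    """Interval point_forecast +/- q, weighting residuals by context similarity."""
    weights = [math.exp(-((c - context) ** 2) / (2 * bandwidth ** 2)) for c in cal_contexts]
    q = weighted_conformal_quantile(cal_residuals, weights, test_weight=1.0, alpha=alpha)
    return point_forecast - q, point_forecast + q


# Hypothetical calibration data: small errors in calm conditions (context 0),
# large errors in volatile conditions (context 10).
residuals = [0.5] * 10 + [5.0] * 10
contexts = [0.0] * 10 + [10.0] * 10
print(context_weighted_interval(100.0, 0.0, residuals, contexts))    # narrow interval
print(context_weighted_interval(100.0, 10.0, residuals, contexts))   # wide interval
```

With uniform weights this reduces to ordinary split conformal prediction. With similarity weights, the interval width tracks the error level seen under comparable conditions, and the forecasting model itself is never touched.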
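Estimating Treatment Effects under Nonstationarity via Truncated Policy Gradient Estimators
Ramesh Johari, Tianyi Peng, Wenqian Xing
AI summary: For treatment effect estimation in nonstationary dynamic systems, proposes the Truncated Policy Gradient (TPG) estimator, which replaces instantaneous outcomes with short-horizon outcome trajectories. The estimator provably reduces bias and variance in nonstationary Markovian settings and comes with a central limit theorem and a consistent variance estimator.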
Randomized experiments (or A/B tests) are widely used to evaluate interventions in dynamic systems such as recommendation platforms, marketplaces, and digital health. In these settings, interventions affect both current and future system states, so estimating the global average treatment effect (GATE) requires accounting for temporal dynamics, which is especially challenging in the presence of nonstationarity; existing approaches suffer from high bias, high variance, or both. In this paper, we address this challenge via the novel Truncated Policy Gradient (TPG) estimator, which replaces instantaneous outcomes with short-horizon outcome trajectories. The estimator admits a policy gradient interpretation: it is a truncation of the first-order approximation to the GATE, yielding provable reductions in bias and variance in nonstationary Markovian settings. We further establish a central limit theorem for the TPG estimator and develop a consistent variance estimator that remains valid under nonstationarity with single-trajectory data. We validate our theory with two real-world case studies. The results show that relative to existing approaches, a well-calibrated TPG estimator can achieve a favorable balance between bias and variance in nonstationary settings, highlighting the value of the policy-gradient perspective for designing effective estimators under complex dynamics.
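Consider a single trajectory in which each period is independently treated with probability $p$. The score of that Bernoulli draw with respect to $p$ is $(Z_t-p)/(p(1-p))$. Multiplying it by the outcomes of the current and next $h$ periods gives a truncated, Horvitz-Thompson-style estimator, and with $h=0$ it reduces to the usual inverse-probability-weighted difference in means. This is an illustrative reading of the policy-gradient interpretation, not the paper's exact TPG estimator. The carryover model, the stationary toy data, and the function name are assumptions.

```python
import random


def truncated_pg_estimate(z, y, p, h):
    """Score-weighted outcomes over a window of h future periods (illustrative).

    z: 0/1 treatment indicators drawn i.i.d. Bernoulli(p); y: observed outcomes.
    """
    T = len(y)
    total = 0.0
    for t in range(T):
        score = (z[t] - p) / (p * (1 - p))      # d/dp log P(Z_t = z_t) for Bernoulli(p)
        window = sum(y[t:min(t + h + 1, T)])    # outcomes from t through t + h
        total += score * window
    return total / T


rng = random.Random(0)
p, tau, gamma, T = 0.5, 1.0, 0.5, 200_000
z = [1 if rng.random() < p else 0 for _ in range(T)]
# Toy carryover: treatment at t - 1 still affects the outcome at t, so GATE = tau + gamma.
y = [tau * z[t] + gamma * (z[t - 1] if t > 0 else 0) + rng.gauss(0.0, 0.1) for t in range(T)]
print(truncated_pg_estimate(z, y, p, h=0), truncated_pg_estimate(z, y, p, h=1))
```

In this additive toy model, $h=0$ recovers only the immediate effect $\tau$, while $h=1$ also captures the one-period carryover and targets $\tau+\gamma$. Longer windows pick up more of the dynamics at the cost of higher variance.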
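Comparison of Higher-Order Spacings and Spacing Ratios in Superposed Spectra of Random Matrices, with Applications to Complex Systems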
Sashmita Rout, Udaysinh T. Bhosale
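AI summary: Numerically studies higher-order spacing statistics in m superposed spectra of circular random matrices of the same class, conjectures that the sequence of modified Dyson indices is unique, and applies the results to the spectral fluctuations of the quantum chaotic kicked top model and the intermediate map.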
Higher-order spacing statistics in $m$ superposed spectra of circular random matrices of the same class are studied numerically. We conjecture that, for given $m$ (or order $k$) and $β$, the sequence of modified Dyson indices $β'(k)$ (or $β'(m)$) obtained using the method based on the sum of absolute differences between cumulative distribution functions (denoted $D(β')$) is unique. Moreover, for a given $k$, the distribution tends to the corresponding $k$-th order Poisson statistics in the limit $m\rightarrow \infty$. The quantum chaotic kicked top model is studied for various Hilbert space dimensions and is found to satisfy our conjecture; this includes a numerical verification of the $m=2$ case of the COE results. Our results can be used as a tool to characterize a system and to determine its symmetry structure without desymmetrizing the spectra. Additionally, we numerically compare the higher-order spacing and ratio distributions for both the $m=1$ and $m=2$ cases of COE and GOE, within and across these ensembles, using the $D(β')$ method. This comparison is carried out both by varying the dimension at a fixed number of realizations and vice versa. The same asymptotic higher-order statistics are observed across COE and GOE for a given spectral fluctuation measure. However, within a given ensemble (COE or GOE), the higher-order spacing and ratio distributions agree with each other only up to some small $k$, beyond which they start to deviate. Further, the spectral fluctuations of the intermediate map are studied for various dimensions. Various important observations from our extensive numerical computations are presented and discussed.
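The $D(β')$ method can be sketched as a grid search that minimises the summed absolute difference between an empirical spacing CDF and a reference CDF. As an assumption, the reference here is the Wigner-surmise form $P_{β'}(s) \propto s^{β'} e^{-b s^2}$ normalised to unit mean spacing, whose CDF is a regularised incomplete gamma function; the evaluation grid, the $β'$ grid, and the function names are illustrative, and the spacings are assumed to be unfolded already.

```python
import numpy as np
from scipy.special import gammainc, gammaln

def surmise_b(beta):
    # b = (Gamma((beta+2)/2) / Gamma((beta+1)/2))^2 makes the mean spacing equal to 1
    return np.exp(2.0 * (gammaln((beta + 2) / 2) - gammaln((beta + 1) / 2)))

def surmise_cdf(s, beta):
    # If S has density proportional to s^beta exp(-b s^2), then b S^2 ~ Gamma((beta+1)/2, 1)
    return gammainc((beta + 1) / 2, surmise_b(beta) * s ** 2)

def d_statistic(spacings, beta, grid):
    """Sum over the grid of |empirical CDF - reference CDF|."""
    s = np.sort(spacings)
    ecdf = np.searchsorted(s, grid, side="right") / len(s)
    return np.sum(np.abs(ecdf - surmise_cdf(grid, beta)))

def fit_modified_dyson_index(spacings, betas, grid):
    """Return the beta' on the candidate grid that minimises D(beta')."""
    scores = [d_statistic(spacings, b, grid) for b in betas]
    return betas[int(np.argmin(scores))]
```

As a sanity check, spacings drawn from the reference with a known index should be assigned that index (up to the grid resolution).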
SplineSketch: More Accurate Quantile Estimation with Error Guarantees
Aleksander Łukasiewicz, Jakub Tětek, Pavel Veselý
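AI summary: Proposes SplineSketch, a quantile summary that dynamically subdivides the input range into buckets and uses monotone cubic spline interpolation, guaranteeing uniformly bounded rank error while outperforming t-digest by a factor of 2-20.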
Space-efficient streaming estimation of quantiles in massive datasets is a fundamental problem with numerous applications in data monitoring and analysis. While theoretical research led to optimal algorithms, such as the Greenwald-Khanna algorithm or the KLL sketch, practitioners often use other sketches that perform significantly better in practice but lack theoretical guarantees. Most notably, the widely used $t$-digest has unbounded worst-case error. In this paper, we seek to get the best of both worlds. We present a new quantile summary, SplineSketch, for numeric data, offering near-optimal theoretical guarantees, namely uniformly bounded rank error, and outperforming $t$-digest by a factor of 2-20 on a range of synthetic and real-world datasets. To achieve such performance, we develop a novel approach that maintains a dynamic subdivision of the input range into buckets while fitting the input distribution using monotone cubic spline interpolation. The core challenge is implementing this method in a space-efficient manner while ensuring strong worst-case guarantees.
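The query side of the idea (fit a monotone cubic interpolant to the CDF at bucket boundaries, then invert it) can be sketched with SciPy's `PchipInterpolator`, a monotone cubic Hermite scheme. This is an assumption-based illustration only: it omits SplineSketch's bucket splitting and merging and its error guarantees, and the paper's spline may differ from PCHIP.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def spline_cdf(boundaries, counts):
    """Monotone cubic (PCHIP) interpolant of the empirical CDF at bucket boundaries."""
    cum = np.concatenate([[0.0], np.cumsum(counts)]) / np.sum(counts)
    return PchipInterpolator(boundaries, cum)

def spline_quantile(cdf, boundaries, q, tol=1e-9):
    """Invert the monotone CDF by bisection over the covered range."""
    lo, hi = float(boundaries[0]), float(boundaries[-1])
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the interpolant is monotone, bisection is guaranteed to find a consistent quantile, and quantile estimates never cross.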
AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formulaic Alpha Mining
Hongjun Ding, Binqi Chen, Jinsheng Huang, Taian Guo, Zhengyang Mao, Guoyi Shao, Lutong Zou, Luchen Liu, Ming Zhang
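AI summary: Proposes AlphaEval, a unified, parallelizable, backtest-free framework that evaluates automated alpha mining models along five dimensions (predictive power, stability, robustness, financial logic, diversity), achieving evaluation consistency comparable to backtesting with higher efficiency.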
Formulaic alpha mining, which generates predictive signals from financial data, is critical for quantitative investment. Although various algorithmic approaches (such as genetic programming, reinforcement learning, and large language models) have significantly expanded the capacity for alpha discovery, systematic evaluation remains a key challenge. Existing evaluation methods are predominantly backtesting and correlation-based measures. Backtesting is computationally intensive, inherently sequential, and sensitive to specific strategy parameters. Correlation-based metrics, though efficient, assess only predictive ability and overlook other crucial properties such as temporal stability, robustness, diversity, and interpretability. Additionally, the closed-source nature of most existing alpha mining models hinders reproducibility and slows progress in this field. To address these issues, we propose AlphaEval, a unified, parallelizable, and backtest-free evaluation framework for automated alpha mining models. AlphaEval assesses the overall quality of generated alphas along five complementary dimensions: predictive power, stability, robustness to market perturbations, financial logic, and diversity. Extensive experiments across representative alpha mining algorithms demonstrate that AlphaEval achieves evaluation consistency comparable to comprehensive backtesting, while providing more comprehensive insights and higher efficiency. Furthermore, AlphaEval identifies superior alphas more effectively than traditional single-metric screening approaches. All implementations and evaluation tools are open-sourced to promote reproducibility and community engagement.
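A common correlation-based proxy for predictive power is the daily cross-sectional rank information coefficient (IC), and the ratio of its mean to its standard deviation (often called ICIR) is a common proxy for stability. The sketch below computes both; these are standard practitioner metrics assumed here for illustration, not necessarily the definitions used in AlphaEval.

```python
import numpy as np
from scipy.stats import spearmanr

def daily_rank_ic(alpha, fwd_returns):
    """Spearman correlation between alpha values and next-period returns, one value per day.

    alpha, fwd_returns: arrays of shape (n_days, n_assets).
    """
    ics = []
    for a, r in zip(alpha, fwd_returns):
        rho, _ = spearmanr(a, r)
        ics.append(rho)
    return np.asarray(ics)

def ic_summary(ics):
    """Mean IC (predictive power) and IC information ratio (higher means more stable over time)."""
    mean_ic = float(np.mean(ics))
    icir = mean_ic / float(np.std(ics, ddof=1))
    return mean_ic, icir
```

Both metrics need only one pass over the signal panel and are embarrassingly parallel across days, unlike a sequential backtest.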
Fast Data Inversion for High-Dimensional Ornstein-Uhlenbeck Processes from Noisy Measurements
Yizi Lin, Xubo Liu, Paul Segall, Mengyang Gu
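AI summary: Proposes a scalable latent factor model approach that uses an orthogonal factor loading matrix to avoid inverting the posterior covariance matrix in the Kalman filter and derives closed-form expressions in an expectation-maximization algorithm to reduce computational complexity; applications include noise filtering of high-dimensional time series, estimation of nonseparable covariance structures, and inversion of real-world physical processes.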
In this work, we develop a scalable approach for a flexible latent factor model for high-dimensional dynamical systems. Each latent factor process has its own correlation and variance parameters, and the orthogonal factor loading matrix can be either fixed or estimated. We use an orthogonal factor loading matrix, which avoids inverting the posterior covariance matrix at each time step of the Kalman filter, and derive closed-form expressions in an expectation-maximization algorithm for parameter estimation; together, these substantially reduce the computational complexity without approximation. Our approach has several applications, including noise filtering for high-dimensional time series, estimating nonseparable covariance structure between different time series, and estimating latent physical processes from real-world measurements. Extensive simulation studies illustrate the higher accuracy and scalability of our approach compared to alternatives. Furthermore, when we apply our method to geodetic measurements to estimate slow slip events in the Cascadia region, our estimated slip agrees better with independently measured seismic data of tremor events. The substantial acceleration provided by our method enables the use of massive noisy data for geological hazard quantification and other applications.
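The computational benefit of an orthogonal loading matrix can be seen in a simplified setting. If $A^\top A = I$ and the noise is i.i.d. Gaussian with variance σ², projecting the observations onto $A$ leaves $d$ independent univariate series, so each latent factor can be filtered with a scalar Kalman filter and no matrix inverse. The sketch assumes equally spaced times, so each Ornstein-Uhlenbeck factor becomes an AR(1) process with known parameters; it omits the paper's EM parameter estimation and smoothing.

```python
import numpy as np

def filter_ar1(y, rho, q, r):
    """Scalar Kalman filter for x_t = rho x_{t-1} + w_t (var q), y_t = x_t + v_t (var r)."""
    m_pred, P_pred = 0.0, q / (1.0 - rho ** 2)   # stationary prior for the first state
    means = np.empty(len(y))
    for t, yt in enumerate(y):
        gain = P_pred / (P_pred + r)             # scalar gain: no matrix inversion needed
        m = m_pred + gain * (yt - m_pred)
        P = (1.0 - gain) * P_pred
        means[t] = m
        m_pred, P_pred = rho * m, rho ** 2 * P + q
    return means

def filter_orthogonal_factors(Y, A, rhos, qs, sigma2):
    """Y: (T, n_obs) observations of A z_t + noise; A: (n_obs, d) with orthonormal columns."""
    Z = Y @ A   # projection; the noise A^T eps stays N(0, sigma2 I), so the d factors decouple
    return np.column_stack([filter_ar1(Z[:, j], rhos[j], qs[j], sigma2)
                            for j in range(A.shape[1])])
```

The cost is linear in the number of time points and factors once the projection `Y @ A` is computed, which is what makes very long, high-dimensional records tractable.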
Greed Is Good: A Unifying Perspective on Guided Generation
Zander W. Blasingame, Chen Liu
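AI summary: Unifies two families of gradient-based guidance by viewing posterior guidance as a greedy strategy of end-to-end guidance, proposes a method that interpolates between them to trade off compute against accuracy, and validates it on inverse image problems and molecular generation tasks.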
Training-free guided generation is a widely used and powerful technique that allows the end user to exert further control over the generative process of flow/diffusion models. Generally speaking, two families of techniques have emerged for gradient-based guidance: posterior guidance (i.e., guidance via projecting the current sample to the target distribution via the target prediction model) and end-to-end guidance (i.e., guidance by performing backpropagation throughout the entire ODE solve). In this work, we show that these two seemingly separate families can actually be unified by viewing posterior guidance as a greedy strategy of end-to-end guidance. We explore the theoretical connections between these two families and provide an in-depth theoretical analysis of both techniques relative to the continuous ideal gradients. Motivated by this analysis, we then present a method for interpolating between these two families, enabling a trade-off between compute and accuracy of the guidance gradients. Finally, we validate this work on several inverse image problems and property-guided molecular generation.
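A toy illustration of the spectrum between the two families, under strong simplifying assumptions: the "flow model" is the hypothetical linear field $f(x) = -x$ integrated with explicit Euler steps, the guidance loss is $(\hat{x}_1 - y)^2$ on the terminal state, and posterior-style guidance is modelled as jumping to the terminal time with one large Euler step and differentiating through that jump. Unrolling $k$ exact steps before the jump interpolates between greedy ($k = 0$) and end-to-end ($k = N - i$) gradients. This is not the paper's algorithm.

```python
import numpy as np

H = 0.1   # Euler step size of the toy solver
N = 10    # number of Euler steps from t = 0 to t = 1

def f(x):
    return -x                     # hypothetical linear velocity field standing in for a flow model

def df(x):
    return -np.ones_like(x)       # its derivative

def guidance_grad(x_i, i, y, k):
    """d/dx_i of (x_hat - y)^2, where x_hat is reached by k exact Euler steps plus one greedy jump.

    k = 0 gives posterior-style (greedy) guidance; k >= N - i gives full end-to-end backpropagation.
    """
    x = np.asarray(x_i, dtype=float)
    jac = np.ones_like(x)
    steps = min(k, N - i)
    for _ in range(steps):
        jac = jac * (1.0 + H * df(x))         # chain rule through one Euler step
        x = x + H * f(x)
    remaining = (N - i - steps) * H
    jac = jac * (1.0 + remaining * df(x))     # greedy jump straight to t = 1
    x_hat = x + remaining * f(x)
    return 2.0 * (x_hat - y) * jac
```

Each extra unrolled step costs one more backpropagation step and moves the gradient closer to the end-to-end value, which is the compute-accuracy trade-off in miniature.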
A Sparse Bayesian Learning Algorithm for Estimating Interaction Kernels in the Motsch-Tadmor Model
Jinchao Feng, Sui Tang
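AI summary: For estimating asymmetric interaction kernels in the Motsch-Tadmor model, proposes an algorithm based on a variational framework and sparse Bayesian learning that achieves robust kernel identification and uncertainty quantification.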
In this paper, we investigate the data-driven identification of asymmetric interaction kernels in the Motsch-Tadmor model based on observed trajectory data. The model under consideration is governed by a class of semilinear evolution equations, where the interaction kernel defines a normalized, state-dependent Laplacian operator that governs collective dynamics. To address the resulting nonlinear inverse problem, we propose a variational framework that reformulates kernel identification using the implicit form of the governing equations, reducing it to a subspace identification problem. We establish an identifiability result that characterizes conditions under which the interaction kernel can be uniquely recovered up to scale. To solve the inverse problem robustly, we develop a sparse Bayesian learning algorithm that incorporates informative priors for regularization, quantifies uncertainty, and enables principled model selection. Extensive numerical experiments on representative interacting particle systems demonstrate the accuracy, robustness, and interpretability of the proposed framework across a range of noise levels and data regimes.
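The normalized, state-dependent Laplacian can be written out directly. The sketch below evaluates the standard Motsch-Tadmor alignment term $\dot v_i = \sum_j φ(|x_j - x_i|)(v_j - v_i) / \sum_k φ(|x_k - x_i|)$ for a given kernel φ, with the coupling strength assumed to be 1. The kernel in the check is an arbitrary example, and the sketch does not perform the paper's kernel identification.

```python
import numpy as np

def mt_acceleration(x, v, phi):
    """Motsch-Tadmor alignment with unit coupling strength.

    x, v: arrays of shape (N, dim); phi: vectorised kernel of the pairwise distance.
    """
    r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)   # (N, N) distances
    W = phi(r)                                                   # weights, including j = i
    P = W / W.sum(axis=1, keepdims=True)                         # D^{-1} W: rows sum to 1, not symmetric
    return P @ v - v                                             # = -(I - D^{-1} W) v
```

The row normalisation is what makes the operator asymmetric, which is why the identification problem targets asymmetric kernels and why the kernel is recoverable only up to scale.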
A New Approach to Evaluating the G-Wishart Normalizing Constant via Fourier Analysis
Ching Wong, Giusi Moffa, Jack Kuipers
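AI summary: Addressing the difficulty of computing the normalizing constant of the G-Wishart distribution in Bayesian Gaussian graphical models, derives new exact results suited to numerical computation for classes of non-chordal graphs using random matrix theory and Fourier analysis, and develops a Monte Carlo scheme that is orders of magnitude more efficient than existing methods.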
The G-Wishart distribution is a core component for the Bayesian analysis of Gaussian graphical models as the conjugate prior for the precision matrix. Evaluating the marginal likelihood of such models usually requires computing high-dimensional integrals to determine the G-Wishart normalising constant. Closed-form results are known for decomposable or chordal graphs, while an explicit representation as a formal series expansion has been derived recently for general graphs. The nested infinite sums, however, do not lend themselves to computation, remaining of limited practical value. Borrowing techniques from random matrix theory and Fourier analysis, we provide novel exact results well suited to the numerical evaluation of the normalising constant for classes of graphs beyond chordal graphs. We additionally develop a Monte Carlo scheme for general graphs, which can be orders of magnitude more efficient than current approaches.
The Canonical Decomposition of Factor Models: Weak Factors Are Everywhere
Philipp Gersing, Matteo Barigozzi, Christoph Rust, Manfred Deistler
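AI summary: Proposes a canonical decomposition of factor models that introduces the weak common component (the difference between the dynamic and static common components), and shows theoretically and empirically that this component is not negligible and that accounting for it yields more plausible impulse response functions.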
We derive a novel canonical decomposition of factor models encompassing both the static factor model, where factors are loaded only contemporaneously, and the Generalised Dynamic Factor Model, where factors are loaded with lags. This decomposition features a new term: the weak common component, defined as the difference between the dynamic and static common components. It is driven by (possibly infinitely many) non-pervasive weak factors that belong to the dynamically common space. Through theoretical and empirical examples (on U.S. macroeconomic indicators and on global financial volatilities), we show that, in general, the weak common component should not be neglected. Furthermore, we show that by accounting for weak common components, we are likely to obtain Impulse Response Functions with more plausible shapes than those obtained from purely static approaches. In addition, we provide consistent estimators for all terms of the canonical decomposition and for the weak factors.
Markov Chain Monte Carlo without Evaluating the Target: An Auxiliary Variable Approach
Wei Yuan, Guanyang Wang
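AI summary: For sampling problems where the target distribution is hard to evaluate, proposes a unified auxiliary-variable MCMC framework that uses estimated gradients to guide proposal moves, substantially improving performance.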
In sampling tasks, it is common for target distributions to be known only up to a normalizing constant. However, in many situations, even evaluating the unnormalized distribution can be costly or infeasible. This issue arises in scenarios such as sampling from the Bayesian posterior for tall datasets and from "doubly intractable" distributions. In this paper, we begin by observing that seemingly different Markov chain Monte Carlo (MCMC) algorithms, such as the exchange algorithm, PoissonMH, and TunaMH, can be unified under a simple common procedure. We then extend this procedure into a novel framework that allows the use of auxiliary variables in both the proposal and the acceptance-rejection step. Several new MCMC algorithms that use estimated gradients to guide the proposal moves emerge from this framework. They perform significantly better than existing methods on both synthetic and real datasets. We also develop theory for the new framework and use it to simplify and extend results for existing algorithms. The code to reproduce the experimental results can be found at https://github.com/ywwes26/Auxiliary-MCMC.
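The exchange algorithm, one of the methods unified above, can be sketched on a toy model where the normalizing constant is known but deliberately not used: i.i.d. exponential data with unnormalized likelihood $\exp(-θ \sum_i y_i)$ and a Gamma(2, 1) prior, so the exact posterior is Gamma$(2 + n, 1 + \sum_i y_i)$ and the sampler can be checked. The key assumption is that exact draws from $p(\cdot \mid θ)$ are available. This is the classical exchange algorithm, not the paper's new gradient-guided variants.

```python
import numpy as np

def log_f(y, theta):
    # unnormalised likelihood exp(-theta * sum(y)); Z(theta) is treated as unknown
    return -theta * np.sum(y)

def log_prior(theta, a=2.0, b=1.0):
    # Gamma(shape a, rate b) prior, up to a constant
    return (a - 1) * np.log(theta) - b * theta if theta > 0 else -np.inf

def exchange_sampler(y, n_iter=20000, step=0.3, theta0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta, draws = theta0, np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()                   # symmetric random-walk proposal
        if prop > 0:                                         # prior is zero otherwise: reject
            y_aux = rng.exponential(1.0 / prop, size=len(y)) # exact auxiliary draw from p(. | prop)
            # Z(theta) and Z(prop) cancel between the data and auxiliary terms
            log_ratio = (log_prior(prop) + log_f(y, prop) + log_f(y_aux, theta)
                         - log_prior(theta) - log_f(y, theta) - log_f(y_aux, prop))
            if np.log(rng.uniform()) < log_ratio:
                theta = prop
        draws[t] = theta
    return draws
```

The auxiliary dataset `y_aux` replaces the intractable ratio $Z(θ)/Z(θ')$ with an unbiased single-sample surrogate, which is the common structure the paper builds on.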
Partial Recovery and Weak Consistency in the Non-Uniform Hypergraph Stochastic Block Model
Ioana Dumitriu, Hai-Xiao Wang, Yizhe Zhu
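AI summary: For the non-uniform hypergraph stochastic block model, proposes a spectral algorithm consisting of hyperedge selection, spectral partitioning, and correction-and-merging steps that achieves partial recovery and weak consistency in sparse random hypergraphs.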
We consider the community detection problem in sparse random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), a general model of random networks with community structure and higher-order interactions. When the random hypergraph has bounded expected degrees, we provide a spectral algorithm that outputs a partition with at least a $γ$ fraction of the vertices classified correctly, where $γ\in (0.5,1)$ depends on the signal-to-noise ratio (SNR) of the model. When the SNR grows slowly as the number of vertices goes to infinity, our algorithm achieves weak consistency, improving on the previous results of Ghoshdastidar and Dukkipati (2017) for non-uniform HSBMs. Our spectral algorithm consists of three major steps: (1) Hyperedge selection: select hyperedges of certain sizes to provide the maximal signal-to-noise ratio for the induced sub-hypergraph; (2) Spectral partition: construct a regularized adjacency matrix and obtain an approximate partition based on singular vectors; (3) Correction and merging: incorporate the hyperedge information from adjacency tensors to improve the error rate guarantee. The theoretical analysis of our algorithm relies on the concentration and regularization of the adjacency matrix for sparse non-uniform random hypergraphs, which can be of independent interest.
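Step (2), spectral partitioning of a regularized adjacency matrix, can be sketched for a toy two-community, 3-uniform hypergraph. Assumptions: hyperedges are turned into a weighted graph by clique expansion, "regularization" means zeroing the rows and columns of vertices whose degree exceeds a hypothetical multiple `trim` of the mean degree, and the two communities are read off the sign of the second eigenvector. Hyperedge selection and correction-and-merging are omitted, and the toy generator is not exactly the HSBM.

```python
import numpy as np
from itertools import combinations

def sample_toy_hypergraph(n, n_trials, p_in, p_out, rng):
    """Toy 3-uniform, two-community hypergraph (not exactly the HSBM of the paper)."""
    labels = np.repeat([0, 1], n // 2)
    edges = []
    for _ in range(n_trials):
        e = rng.choice(n, size=3, replace=False)
        same = labels[e[0]] == labels[e[1]] == labels[e[2]]
        if rng.uniform() < (p_in if same else p_out):
            edges.append(e)
    return labels, edges

def spectral_partition(n, edges, trim=10.0):
    A = np.zeros((n, n))
    for e in edges:
        for u, v in combinations(e, 2):        # clique expansion of each hyperedge
            A[u, v] += 1.0
            A[v, u] += 1.0
    deg = A.sum(axis=1)
    high = deg > trim * deg.mean()             # degree-based regularization (hypothetical threshold)
    A[high, :] = 0.0
    A[:, high] = 0.0
    _, vecs = np.linalg.eigh(A)                # eigenvalues in ascending order
    return (vecs[:, -2] > 0).astype(int)       # sign of the eigenvector of the second-largest eigenvalue
```

In the bounded-degree regime the trimming step matters, because a few very high-degree vertices can otherwise dominate the leading eigenvectors.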
Generating Rectifiable Measures through Neural Networks
Erwin Riegler, Alex Bühler, Yang Pan, Helmut Bölcskei
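AI summary: Proves that countably m-rectifiable measures can be obtained by pushing forward the one-dimensional Lebesgue measure on [0,1] through ReLU neural networks with arbitrarily small approximation error in Wasserstein distance, with the number of networks required bounded by 2^{O(ε^{-m} log^2 ε)}, a rate equal to the rectifiability parameter m.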
We derive universal approximation results for the class of (countably) $m$-rectifiable measures. Specifically, we prove that $m$-rectifiable measures can be approximated as push-forwards of the one-dimensional Lebesgue measure on $[0,1]$ using ReLU neural networks, with arbitrarily small approximation error in terms of Wasserstein distance. Moreover, the weights in the networks under consideration are quantized and bounded, and the number of ReLU neural networks required to achieve an approximation error of $\varepsilon$ is no larger than $2^{b(\varepsilon)}$ with $b(\varepsilon)=\mathcal{O}(\varepsilon^{-m}\log^2(\varepsilon))$. This result improves Lemma IX.4 in Perekrestenko et al., as it shows that the rate at which $b(\varepsilon)$ tends to infinity as $\varepsilon$ tends to zero equals the rectifiability parameter $m$, which can be much smaller than the ambient dimension. We extend this result to countably $m$-rectifiable measures and show that this rate still equals the rectifiability parameter $m$ provided that, among other technical assumptions, the measure decays exponentially on the individual components of the countably $m$-rectifiable support set.
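The push-forward construction can be illustrated in the simplest case $m = 1$: a one-hidden-layer ReLU network that linearly interpolates a constant-speed parametrization γ of the unit circle at $K+1$ knots maps the uniform measure on $[0,1]$ close to the uniform measure on the circle. Because γ itself pushes Lebesgue measure on $[0,1]$ forward to the uniform measure on the circle, the coupling $t \mapsto (f(t), γ(t))$ bounds any Wasserstein distance by $\sup_t |f(t) - γ(t)|$. This sketch uses unquantized weights and does not reproduce the paper's quantization or network-counting argument.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_interpolant(knots_t, knots_y):
    """One-hidden-layer ReLU network f: [0, 1] -> R^dim interpolating (knots_t, knots_y) linearly."""
    slopes = np.diff(knots_y, axis=0) / np.diff(knots_t)[:, None]   # slope on each segment
    bias = knots_y[0] - slopes[0] * knots_t[0]
    jumps = np.diff(slopes, axis=0)      # output weights: change of slope at interior knots
    inner = knots_t[1:-1]                # hidden-unit thresholds
    def f(t):
        t = np.asarray(t, dtype=float)[:, None]
        return bias + slopes[0] * t + relu(t - inner[None, :]) @ jumps
    return f

def circle(t):
    """Constant-speed parametrization of the unit circle."""
    return np.column_stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
```

Pushing forward uniform samples `t` through `f` gives points whose distance to the exact push-forward shrinks like $1/K^2$, so the network size needed for accuracy ε depends on the intrinsic dimension 1 rather than the ambient dimension 2.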
Generalized Posterior Calibration via Sequential Monte Carlo Samplers
Masahiro Tanaka
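AI summary: To address the high computational cost of selecting the learning rate in generalized posterior inference, proposes an efficient calibration algorithm that incorporates sequential Monte Carlo samplers and substantially reduces computational overhead.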
As the amount and complexity of available data increase, the need for robust statistical learning becomes more pressing. To enhance resilience against model misspecification, generalized posterior inference raises the likelihood to a power given by a learning rate, thereby tuning the dispersion of the posterior distribution. This study proposes a computationally efficient strategy for selecting an appropriate learning rate. The proposed approach builds on the generalized posterior calibration (GPC) algorithm, which is designed to select a learning rate that ensures nominal frequentist coverage. GPC evaluates the coverage probability using bootstrap samples and is computationally costly because it requires repeated posterior simulations for those samples. To address this limitation, the study proposes an algorithm that combines elements of the GPC algorithm with the sequential Monte Carlo (SMC) sampler. By leveraging the similarity between the learning rate in generalized posterior inference and the inverse temperature in SMC sampling, the proposed algorithm calibrates the posterior distribution at a reduced computational cost. For demonstration, the proposed algorithm was applied to several statistical learning models and shown to be significantly faster than the original GPC.
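The link between the learning rate η and the SMC inverse temperature can be sketched as follows: particles targeting $π(θ) L(θ)^{η_k}$ are reweighted by $L(θ)^{η_{k+1} - η_k}$, resampled, and moved with Metropolis-Hastings steps that leave the new tempered target invariant, so a single run yields approximate generalized posteriors along a whole grid of η. This minimal sketch assumes a scalar parameter and random-walk moves; the bootstrap coverage evaluation that GPC uses to choose η is not included.

```python
import numpy as np

def tempered_smc(log_prior, log_lik, sample_prior, etas, n_particles=4000,
                 n_moves=5, step=0.3, seed=0):
    """Particle approximations of prior * likelihood**eta for each eta in increasing `etas`."""
    rng = np.random.default_rng(seed)
    theta = sample_prior(rng, n_particles)
    ll = log_lik(theta)
    prev, out = 0.0, {}
    for eta in etas:
        logw = (eta - prev) * ll                          # incremental weight L^(eta - prev)
        w = np.exp(logw - logw.max())
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())   # multinomial resampling
        theta, ll = theta[idx], ll[idx]
        for _ in range(n_moves):                          # random-walk MH targeting prior * L^eta
            prop = theta + step * rng.normal(size=n_particles)
            ll_prop = log_lik(prop)
            log_acc = log_prior(prop) + eta * ll_prop - log_prior(theta) - eta * ll
            accept = np.log(rng.uniform(size=n_particles)) < log_acc
            theta = np.where(accept, prop, theta)
            ll = np.where(accept, ll_prop, ll)
        out[eta] = theta.copy()
        prev = eta
    return out
```

For example, with a N(0, 1) prior and a N(θ, 1) likelihood, the tempered posterior at learning rate η is normal with mean $η \sum_i y_i / (1 + η n)$, which gives a closed-form check.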
Optimal Exact Recovery in General Non-Uniform Hypergraph Stochastic Block Models
Ioana Dumitriu, Hai-Xiao Wang
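AI summary: For community detection in the non-uniform hypergraph stochastic block model, establishes for the first time a sharp threshold for exact recovery and proposes two efficient algorithms that achieve optimal performance.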
Consider the community detection problem in random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), where each hyperedge appears independently with some given probability depending only on the labels of its vertices. We establish, for the first time in the literature, a sharp threshold for exact recovery in this non-uniform case, subject to minor constraints; in particular, we consider the model with multiple communities. One crucial point here is that by aggregating information from all the uniform layers, we may obtain exact recovery even in cases where it may appear impossible if each layer were considered alone. Besides that, we prove a wide-ranging, information-theoretic lower bound on the number of misclassified vertices for any algorithm, depending on a generalized Chernoff-Hellinger divergence involving the model parameters. We provide two efficient algorithms that achieve exact recovery above the threshold and, when exact recovery is impossible, attain the lowest possible mismatch ratio, which we prove to be optimal. The theoretical analysis of our algorithms relies on the concentration and regularization of the adjacency matrix for non-uniform random hypergraphs, which could be of independent interest. We also address some open problems regarding parameter knowledge and estimation.
A Discrete Adapted Hierarchical Basis Solver for Radial Basis Function Interpolation
Julio Enrique Castrillon-Candas, Jun Li, Victor Eijkhout
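AI summary: Develops a discrete hierarchical basis (HB) method that, through orthogonalization adapted to the kernel function and node distribution, decouples variable-order polynomial radial basis function interpolation into a two-step solution computed efficiently with GMRES iterations and preconditioners.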
In this paper we develop a discrete Hierarchical Basis (HB) to efficiently solve the Radial Basis Function (RBF) interpolation problem with variable polynomial order. The HB forms an orthogonal set and is adapted to the kernel seed function and the placement of the interpolation nodes. Moreover, this basis is orthogonal to a set of polynomials up to a given order defined on the interpolating nodes. We are thus able to decouple the RBF interpolation problem for any order of the polynomial interpolation and solve it in two steps: (1) The polynomial-orthogonal RBF interpolation problem is efficiently solved in the transformed HB basis using GMRES with a diagonal or block SSOR preconditioner. (2) The residual is then projected onto an orthonormal polynomial basis. We apply our approach to several test cases to study its effectiveness, including an application to the Best Linear Unbiased Estimator regression problem.
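The two-step decoupling can be illustrated with a plain orthogonal (null-space) basis in place of the paper's multilevel hierarchical basis: take an orthonormal basis $Q_2$ of the vectors orthogonal to the polynomial matrix $P$, solve the reduced system $(Q_2^\top K Q_2) w = Q_2^\top f$ with preconditioned GMRES, set $c = Q_2 w$, and recover the polynomial coefficients from the residual $f - Kc$, which lies in the range of $P$. The cubic kernel $r^3$ with linear polynomials in 2D, the dense QR factorization, and the Jacobi preconditioner are assumptions made for this sketch.

```python
import numpy as np
from scipy.sparse.linalg import gmres

def rbf_poly_fit(X, f):
    """Cubic RBF interpolation with linear polynomial terms, solved in two decoupled steps."""
    n = len(X)
    K = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) ** 3   # r^3: conditionally PD of order 2
    P = np.column_stack([np.ones(n), X])                              # polynomials 1, x, y
    Q, _ = np.linalg.qr(P, mode="complete")
    Q2 = Q[:, P.shape[1]:]               # orthonormal basis of vectors orthogonal to the polynomials
    A = Q2.T @ K @ Q2                    # positive definite on that subspace
    b = Q2.T @ f
    M = np.diag(1.0 / np.diag(A))        # diagonal (Jacobi) preconditioner
    w, info = gmres(A, b, M=M, restart=len(b))   # step 1: polynomial-orthogonal problem
    c = Q2 @ w                           # RBF coefficients; P.T @ c = 0 by construction
    d, *_ = np.linalg.lstsq(P, f - K @ c, rcond=None)   # step 2: project the residual onto polynomials
    return c, d, info

def rbf_poly_eval(X, c, d, Y):
    """Evaluate the fitted interpolant at points Y."""
    r = np.linalg.norm(Y[:, None, :] - X[None, :, :], axis=-1)
    return r ** 3 @ c + np.column_stack([np.ones(len(Y)), Y]) @ d
```

Working in a polynomial-orthogonal basis turns the indefinite saddle-point system into a positive definite one, which is what makes Krylov solvers with simple preconditioners effective here.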
Nonlinear Permutation Granger Causality
Noah D. Gade, Jordan Rodu
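AI summary: Addressing the challenges of Granger causal inference for nonlinear data, proposes an explicit functional connectivity measure based on permutations of the covariate set, uses artificial neural networks to approximate arbitrary nonlinear relationships, proves consistent estimation of the permutation variance under certain conditions, and validates the method on simulations and neural data.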
Granger causal inference is a contentious but widespread method used in fields ranging from economics to neuroscience. The original definition addresses the notion of causality in time series by establishing functional dependence conditional on a specified model. Adapting Granger causality to nonlinear data remains challenging, and many methods apply in-sample tests that do not incorporate out-of-sample predictability, raising concerns about model overfitting. To allow for out-of-sample comparison, a measure of functional connectivity is explicitly defined using permutations of the covariate set. Artificial neural networks serve as featurizers of the data to approximate any arbitrary, nonlinear relationship, and consistent estimation of the variance for each permutation is shown under certain conditions on the featurization process and the model residuals. The performance of the permutation method is compared to penalized variable selection, naive replacement, and omission techniques via simulation, and the method is applied to neuronal responses to acoustic stimuli in the auditory cortex of anesthetized rats. Targeted use of the Granger causal framework, when prior knowledge of the causal mechanisms in a dataset is limited, can help reveal potential predictive relationships between sets of variables that warrant further study.
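The permutation idea can be sketched with an ordinary least-squares model standing in for the neural-network featurizer described above: fit on a training window, then compare the out-of-sample mean squared error before and after jointly permuting the lags of the candidate covariate in the test window. The lag order, the train/test split, and the ratio summary are assumptions; the paper's variance estimator and inference are not reproduced.

```python
import numpy as np

def lag_matrix(series, lags):
    """Columns series[t-1], ..., series[t-lags] for targets t = lags, ..., T-1."""
    T = len(series)
    return np.column_stack([series[lags - k:T - k] for k in range(1, lags + 1)])

def granger_permutation_ratio(y, x, lags=2, train_frac=0.7, seed=0):
    """Out-of-sample MSE after permuting the lags of x, divided by the unpermuted MSE."""
    rng = np.random.default_rng(seed)
    target = y[lags:]
    cand = lag_matrix(x, lags)
    design = np.column_stack([np.ones(len(target)), lag_matrix(y, lags), cand])
    n_train = int(train_frac * len(target))
    beta, *_ = np.linalg.lstsq(design[:n_train], target[:n_train], rcond=None)
    test_X, test_y = design[n_train:], target[n_train:]
    mse = np.mean((test_y - test_X @ beta) ** 2)
    permuted = test_X.copy()
    perm = rng.permutation(len(test_y))
    permuted[:, 1 + lags:] = cand[n_train:][perm]   # permute the candidate covariate set jointly
    mse_perm = np.mean((test_y - permuted @ beta) ** 2)
    return mse_perm / mse
```

A ratio well above 1 indicates that the candidate's past carries out-of-sample predictive information about the target; a ratio near 1 indicates it does not.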
Assessing the Impact of COVID-19 Vaccination in the UK: A Gaussian Process Approach
Gianluca Giudice, Sara Geneletti, Konstantinos Kalogeropoulos
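AI summary: Uses multi-output Gaussian process models, combining interrupted time series analysis with synthetic control methods, to assess the impact of the UK's accelerated vaccination rollout and policy transition on COVID-19 mortality and transmission dynamics, finding a substantial reduction in mortality but little change in transmission rates.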
The rapid rollout of COVID-19 vaccines in the United Kingdom in early 2021 differed markedly from that of many other European countries, providing a natural setting to assess the impact of vaccination speed on public health outcomes. We evaluate the impact of the accelerated UK vaccination rollout and associated policy transition on COVID-19 mortality and transmission dynamics by constructing a probabilistic reference trajectory for the UK under a slower vaccination and reopening trajectory. The proposed framework combines ideas from interrupted time series analysis and synthetic control methods with flexible probabilistic modelling based on multi-output Gaussian processes. These models capture non-linear and heterogeneous dependence structures across countries and over time, while providing uncertainty quantification through predictive distributions. A central feature of the methodology is a design-consistent validation strategy based on predictive performance in held-out pre-intervention periods, which is used both to guide model specification and to assess the plausibility of the reconstructed reference trajectory. The empirical results indicate a substantial reduction in COVID-19 mortality associated with the accelerated vaccination-policy transition, with little evidence of an effect on transmission rates. More generally, the framework illustrates how flexible probabilistic models and predictive validation can support causal and policy evaluation in complex time series settings.
Sequential Convex Programming Methods for a Class of Structured Nonlinear Programming
Zhaosong Lu
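AI summary: Studies a class of structured nonlinear programming problems, proposes sequential convex programming methods, and proves that any accumulation point of the generated sequence is a KKT point.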
In this paper we study a broad class of structured nonlinear programming (SNLP) problems. We first establish first-order optimality conditions for them. We then propose sequential convex programming (SCP) methods for solving them, in which each iterate is obtained by solving a convex programming problem. Under suitable assumptions, we establish that any accumulation point of the sequence generated by the methods is a KKT point of the SNLP problem. In addition, we propose a variant of the SCP method for SNLP that uses a nonmonotone scheme and "local" Lipschitz constants of the associated functions. A convergence result similar to the one above is established for this variant.
On the Sign of the Real Part of the Riemann Zeta Function
Juan Arias de Reyna, Richard P. Brent, Jan van de Lune
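AI summary: Using classical results of Bohr and Jessen, gives explicit expressions, in terms of the characteristic function ψ_σ(x), for the density d(σ) associated with the distribution of argζ(σ+it) on fixed lines σ>1/2 and for the density d_-(σ) of the set where Reζ(σ+it)<0, and designs a practical algorithm for computing them numerically.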
We consider the distribution of $\argζ(σ+it)$ on fixed lines $σ> \frac12$, and in particular the density \[d(σ) = \lim_{T \rightarrow +\infty} \frac{1}{2T} |\{t \in [-T,+T]: |\argζ(σ+it)| > π/2\}|\,,\] and the closely related density \[d_{-}(σ) = \lim_{T \rightarrow +\infty} \frac{1}{2T} |\{t \in [-T,+T]: \Reζ(σ+it) < 0\}|\,.\] Using classical results of Bohr and Jessen, we obtain an explicit expression for the characteristic function $ψ_σ(x)$ associated with $\argζ(σ+it)$. We give explicit expressions for $d(σ)$ and $d_{-}(σ)$ in terms of $ψ_σ(x)$. Finally, we give a practical algorithm for evaluating these expressions to obtain accurate numerical values of $d(σ)$ and $d_{-}(σ)$.
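A crude Monte Carlo cross-check, not the algorithm of the paper (which works through the characteristic function $ψ_σ(x)$): by Bohr and Jessen, for $σ > \frac12$ the values of $\log ζ(σ+it)$ are distributed like $\sum_p -\log(1 - p^{-σ} e^{iθ_p})$ with independent uniform angles $θ_p$, so sampling the imaginary part of a truncated version of this sum gives rough estimates of $d(σ)$ and $d_{-}(σ)$. Truncating at primes up to `p_max` is an assumption whose error grows as σ approaches 1/2, so the numbers are only indicative.

```python
import numpy as np

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = np.ones(n + 1, dtype=bool)
    sieve[:2] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = False
    return np.nonzero(sieve)[0]

def bohr_jessen_densities(sigma, n_samples=20000, p_max=1000, seed=0):
    """Monte Carlo estimates of d(sigma) and d_-(sigma) from a truncated random Euler product."""
    rng = np.random.default_rng(seed)
    p = primes_up_to(p_max).astype(float)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, len(p)))
    z = p ** (-sigma) * np.exp(1j * theta)
    # Im of sum_p -log(1 - z_p); each term is a principal angle in (-pi/2, pi/2) since |z_p| < 1
    arg = -np.angle(1.0 - z).sum(axis=1)
    d = np.mean(np.abs(arg) > np.pi / 2)
    d_minus = np.mean(np.cos(arg) < 0)        # Re zeta < 0 exactly when cos(arg zeta) < 0
    return d, d_minus
```

Since $\cos(\arg ζ) < 0$ forces $|\arg ζ| > π/2$, the estimates always satisfy $d_{-}(σ) \le d(σ)$, matching the definitions above.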
Distributional Properties of Nearest-Site Angular Distances on the Sphere
Hongjun Li, Jiatong Sui, Shengpeng Mu, Xing Qiu
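AI summary: Studies the distributional and computational properties of the minimal great-circle angular distance from a uniformly random point on a sphere to a set of prespecified sites, derives formulas for its cumulative distribution function, probability density function, and moments, and verifies the computational efficiency of the method through Monte Carlo simulations.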
Nearest-site distances arise in many applications involving spherical or directional domains, including global geospatial analysis, wireless communications, spherical clustering, and cosine-similarity-based data analysis. In this paper, we study the distributional and computational properties of $L_2$, the minimal angular great-circle distance from a uniformly distributed random point on a sphere to a set of prespecified sites on the same sphere. We first derive the cumulative distribution function (CDF) and probability density function (PDF) of $L_0$, the angular great-circle distance from a fixed vertex of a spherical triangle to a random point uniformly distributed within that triangle. We then extend these triangle-level results to convex spherical polygons and use spherical Voronoi diagrams, triangulations of Voronoi cells, and numerical integration to obtain computable distributional and moment formulas for $L_2$. In addition, we derive explicit formulas for selected moments of $\cos(L_2)$, which are relevant to cosine similarity and spherical data analysis. Extensive Monte Carlo simulations validate the proposed CDF, PDF, and moment formulas and demonstrate the computational efficiency of our method relative to generic numerical integration and simulation-based alternatives.
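The simulation-based alternative is easy to write down, and it gives a simple check against exact formulas: with a single site, $P(L_2 \le x) = (1 - \cos x)/2$, the normalized area of a spherical cap. The sketch samples uniform points by normalizing Gaussian vectors; it is a Monte Carlo baseline, not the Voronoi-based method of the paper.

```python
import numpy as np

def sample_uniform_sphere(n, rng):
    """Uniform points on the unit sphere S^2 via normalised standard Gaussian vectors."""
    z = rng.normal(size=(n, 3))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def nearest_site_angle(points, sites):
    """Minimal great-circle angle from each point to the set of sites (all on the unit sphere)."""
    sites = sites / np.linalg.norm(sites, axis=1, keepdims=True)
    cos_max = np.clip((points @ sites.T).max(axis=1), -1.0, 1.0)   # nearest site = largest cosine
    return np.arccos(cos_max)
```

Note that `cos_max` is exactly $\cos(L_2)$, so moments of $\cos(L_2)$ can be estimated from the same simulation.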
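Generalized Methods and Solvers for Noise Removal from Piecewise Constant Signals
Max A. Little, Nick S. Jones
AI summary: Proposes a generalized functional framework for piecewise constant signal denoising that covers many existing methods, introduces new methods that combine global mean shift clustering with local total variation smoothing, and validates them through comparisons on synthetic data.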
Removing noise from piecewise constant (PWC) signals is a challenging signal processing problem arising in many practical contexts. For example, in exploration geosciences, noisy drill hole records need to be separated into stratigraphic zones, and in biophysics, jumps between molecular dwell states need to be extracted from noisy fluorescence microscopy signals. Many PWC denoising methods exist, including total variation regularization, mean shift clustering, stepwise jump placement, running medians, convex clustering shrinkage and bilateral filtering; conventional linear signal processing methods, however, are fundamentally unsuited. This paper shows that most of these methods are associated with a special case of a generalized functional, minimized to achieve PWC denoising. The minimizer can be obtained by diverse solver algorithms, including stepwise jump placement, convex programming, finite differences, iterated running medians, least angle regression, regularization path following, and coordinate descent. We introduce novel PWC denoising methods, which, for example, combine global mean shift clustering with local total variation smoothing. Head-to-head comparisons between these methods are performed on synthetic data, revealing that our new methods have a useful role to play. Finally, overlaps between the methods of this paper and others such as wavelet shrinkage, hidden Markov models, and piecewise smooth filtering are briefly discussed.
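Of the methods listed, the iterated running median is the simplest to show. The Python sketch below repeats a running median until the signal stops changing; truncating the window at the signal ends is an assumption of this sketch. A single-sample spike disappears while a genuine jump survives.

```python
import statistics


def running_median(signal, half_width):
    """One pass of a running median with window 2*half_width + 1.

    Near the ends the window is simply truncated.
    """
    n = len(signal)
    return [
        statistics.median(signal[max(0, i - half_width) : min(n, i + half_width + 1)])
        for i in range(n)
    ]


def iterated_running_median(signal, half_width, max_iter=100):
    """Apply the running median until the output no longer changes (a 'root' signal)."""
    current = list(signal)
    for _ in range(max_iter):
        updated = running_median(current, half_width)
        if updated == current:
            break
        current = updated
    return current
```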
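Tracking Target Signal Strengths on a Grid Using Sparsity
Shahrokh Farahmand, Georgios B. Giannakis, Geert Leus, Zhi Tian
AI summary: Proposes a grid-based model with linear state and measurement equations that enables multi-target tracking through sparsity-aware Kalman filtering, bypassing data association and reducing complexity.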
Multi-target tracking is mainly challenged by the nonlinearity present in the measurement equation, and the difficulty in fast and accurate data association. To overcome these challenges, the present paper introduces a grid-based model in which the state captures target signal strengths on a known spatial grid (TSSG). This model leads to \emph{linear} state and measurement equations, which bypass data association and can afford state estimation via sparsity-aware Kalman filtering (KF). Leveraging the grid-induced sparsity of the novel model, two types of sparsity-cognizant TSSG-KF trackers are developed: one effects sparsity through $\ell_1$-norm regularization, and the other invokes sparsity as an extra measurement. Iterative extended KF and Gauss-Newton algorithms are developed for reduced-complexity tracking, along with accurate error covariance updates for assessing performance of the resultant sparsity-aware state estimators. Based on TSSG state estimates, more informative target position and track estimates can be obtained in a follow-up step, ensuring that track association and position estimation errors do not propagate back into TSSG state estimates. The novel TSSG trackers do not require knowing the number of targets or their signal strengths, and exhibit considerably lower complexity than the benchmark hidden Markov model filter, especially for a large number of targets. Numerical simulations demonstrate that sparsity-cognizant trackers enjoy improved root mean-square error performance at reduced complexity when compared to their sparsity-agnostic counterparts.
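Doubly Robust Smoothing of Dynamical Processes via Outlier Sparsity Constraints
Shahrokh Farahmand, Georgios B. Giannakis, Daniele Angelosante
AI summary: Proposes a doubly robust smoothing algorithm based on ℓ1-regularized least squares that jointly estimates the states and auxiliary outlier variables via coordinate descent and ADMM, handling outliers in both the measurements and the state dynamics.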
Coping with outliers contaminating dynamical processes is of major importance in various applications because mismatches from nominal models are not uncommon in practice. In this context, the present paper develops novel fixed-lag and fixed-interval smoothing algorithms that are robust to outliers simultaneously present in the measurements {\it and} in the state dynamics. Outliers are handled through auxiliary unknown variables that are jointly estimated along with the state based on the least-squares criterion that is regularized with the $\ell_1$-norm of the outliers in order to effect sparsity control. The resultant iterative estimators rely on coordinate descent and the alternating direction method of multipliers, are expressed in closed form per iteration, and are provably convergent. Additional attractive features of the novel doubly robust smoother include: i) ability to handle both types of outliers; ii) universality to unknown nominal noise and outlier distributions; iii) flexibility to encompass maximum a posteriori optimal estimators with reliable performance under nominal conditions; and iv) improved performance relative to competing alternatives at comparable complexity, as corroborated via simulated tests.
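The core device, auxiliary outlier variables penalized by an $\ell_1$ norm and updated by coordinate descent, can be seen in a deliberately reduced setting: a single static level $x$ observed with outliers, rather than the paper's fixed-lag or fixed-interval smoothers. Each update has a closed form (soft-thresholding for the outliers, an average for the level). The function names and the value of `lam` are illustrative assumptions.

```python
def soft_threshold(r, t):
    """Proximal operator of t*|.|: shrink r toward zero by t."""
    if r > t:
        return r - t
    if r < -t:
        return r + t
    return 0.0


def robust_level(y, lam, iters=200):
    """Coordinate descent on sum_i (y_i - x - o_i)^2 + lam * sum_i |o_i|.

    x is a scalar static level (illustration only) and o_i are auxiliary
    outlier variables; the l1 penalty keeps most o_i exactly zero.
    """
    x = sum(y) / len(y)  # start from the ordinary least-squares estimate
    o = [0.0] * len(y)
    for _ in range(iters):
        # Outlier step: minimiser of (r_i - o_i)^2 + lam*|o_i| is soft(r_i, lam/2).
        o = [soft_threshold(yi - x, lam / 2.0) for yi in y]
        # Level step: least squares for x given the current outliers.
        x = sum(yi - oi for yi, oi in zip(y, o)) / len(y)
    return x, o
```

With `lam = 2.0` and data `[1, 1, 1, 1, 10]`, the four inliers end with $o_i = 0$, the gross error is absorbed by its outlier variable, and the level settles at 1.25 instead of the plain mean 2.8.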
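Recursive Numerical Evaluation of the Cumulative Bivariate Normal Distribution
Christian Meyer
AI summary: Building on Marsaglia's ideas for evaluating the cumulative univariate normal distribution, proposes a recursive algorithm for the cumulative bivariate normal distribution that is mathematically transparent, performs competitively, and extends easily to high precision.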
We propose an algorithm for evaluation of the cumulative bivariate normal distribution, building upon Marsaglia's ideas for evaluation of the cumulative univariate normal distribution. The algorithm is mathematically transparent, delivers competitive performance and can easily be extended to arbitrary precision.
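For readers who want a reference value to compare against, the Python sketch below evaluates the same quantity by a different, standard route, not the recursive algorithm of the paper: Plackett's identity, which writes the bivariate CDF as $Φ(h)Φ(k)$ plus a one-dimensional integral of the bivariate density over the correlation, evaluated here with composite Simpson's rule. The function names and the default number of intervals are assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bivariate_normal_cdf(h, k, rho, intervals=2000):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho.

    Plackett's identity:
        Phi2(h, k; rho) = Phi(h) Phi(k)
            + 1/(2 pi) * int_0^rho exp(-(h^2 - 2 r h k + k^2) / (2 (1 - r^2))) / sqrt(1 - r^2) dr,
    integrated with composite Simpson's rule. Requires |rho| < 1.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")

    def integrand(r):
        one_minus = 1.0 - r * r
        return math.exp(-(h * h - 2.0 * r * h * k + k * k) / (2.0 * one_minus)) / math.sqrt(one_minus)

    n = intervals if intervals % 2 == 0 else intervals + 1  # Simpson needs an even count
    step = rho / n
    total = integrand(0.0) + integrand(rho)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * step)
    return norm_cdf(h) * norm_cdf(k) + (total * step / 3.0) / (2.0 * math.pi)
```

At $h = k = 0$ the exact value is $\tfrac14 + \arcsin(ρ)/(2π)$, e.g. $1/3$ for $ρ = 0.5$, which the sketch reproduces to high accuracy.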
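Hybrid Deterministic-Stochastic Methods for Data Fitting
Michael P. Friedlander, Mark Schmidt
AI summary: Proposes a hybrid incremental gradient algorithm that retains the steady convergence rate of full-gradient methods by controlling the sample size, and presents a practical implementation based on a quasi-Newton method.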
Many structured data-fitting applications require the solution of an optimization problem involving a sum over a potentially large number of measurements. Incremental gradient algorithms offer inexpensive iterations by sampling a subset of the terms in the sum. These methods can make great progress initially, but often slow as they approach a solution. In contrast, full-gradient methods achieve steady convergence at the expense of evaluating the full objective and gradient on each iteration. We explore hybrid methods that exhibit the benefits of both approaches. Rate-of-convergence analysis shows that by controlling the sample size in an incremental gradient algorithm, it is possible to maintain the steady convergence rates of full-gradient methods. We detail a practical quasi-Newton implementation based on this approach. Numerical experiments illustrate its potential benefits.
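The sample-size control idea can be sketched on a toy least-squares objective: sampled gradients with a geometrically growing batch, which eventually becomes the full gradient. This Python version uses plain gradient steps rather than the quasi-Newton implementation the abstract describes; `growth`, `lr`, and the objective are assumptions chosen for illustration.

```python
import random


def growing_batch_gradient(data, steps=60, lr=0.5, growth=1.5, seed=0):
    """Minimise f(x) = mean_i (x - a_i)^2 / 2 with a geometrically growing sample.

    Early iterations use cheap sampled gradients; once the batch covers all
    terms, the method reduces to ordinary full-gradient descent.
    """
    rng = random.Random(seed)
    m = len(data)
    x = 0.0
    batch = 1.0
    for _ in range(steps):
        size = min(m, int(batch))
        sample = data if size == m else rng.sample(data, size)
        grad = sum(x - a for a in sample) / size  # sampled estimate of f'(x)
        x -= lr * grad
        batch *= growth
    return x
```

Once the batch reaches all 1000 terms (after about 18 steps with `growth=1.5`), each step halves the error, so the result matches the full-data minimiser, the sample mean.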
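Uncertainty Quantification in Complex Systems Using Approximate Solvers
Phaedon-Stelios Koutsourelakis
AI summary: For computationally expensive complex systems with a large vector of non-Gaussian uncertainties, proposes an uncertainty quantification framework that combines advanced Monte Carlo sampling with Bayesian formulations; by rigorously exploiting inexpensive approximate computational models (coarser discretizations, larger time steps, fewer solver iterations, or lower-order models), it substantially reduces computational effort while still estimating statistics accurately and providing confidence bounds.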
This paper proposes a novel uncertainty quantification framework for computationally demanding systems characterized by a large vector of non-Gaussian uncertainties. It combines state-of-the-art techniques in advanced Monte Carlo sampling with Bayesian formulations. The key departure from existing works is the use of inexpensive, approximate computational models in a rigorous manner. Such models can readily be derived by coarsening the discretization size in the solution of the governing PDEs, increasing the time step when integration of ODEs is performed, using fewer iterations if a non-linear solver is employed or making use of lower order models. It is shown that even in cases where the inexact models provide very poor approximations of the exact response, statistics of the latter can be quantified accurately with significant reductions in the computational effort. Multiple approximate models can be used and rigorous confidence bounds of the estimates produced are provided at all stages.
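Distances and Riemannian Metrics for Multivariate Spectral Densities
Xianhua Jiang, Lipeng Ning, Tryphon T. Georgiou
AI summary: Introduces divergence measures between power spectral density matrices by comparing models in terms of optimal prediction, derives the induced Riemannian metrics with explicit formulas for geodesics and geodesic distances in a particular case, and notes the connection with the Fisher-Rao metric.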
We first introduce a class of divergence measures between power spectral density matrices. These are derived by comparing the suitability of different models in the context of optimal prediction. Distances between "infinitesimally close" power spectra are quadratic, and hence, they induce a differential-geometric structure. We study the corresponding Riemannian metrics and, for a particular case, provide explicit formulae for the corresponding geodesics and geodesic distances. The close connection between the geometry of power spectra and the geometry of the Fisher-Rao metric is noted.
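To make "comparing models through optimal prediction" concrete in the simplest case, consider scalar spectra: if the optimal one-step predictor for $f_2$ is applied to a process with spectrum $f_1$, the log of the resulting degradation in prediction-error variance is $\log\int \frac{f_1}{f_2}\frac{dθ}{2π} - \int \log\frac{f_1}{f_2}\frac{dθ}{2π}$, which is nonnegative by Jensen's inequality. This scalar form is an assumption made for illustration; the paper works with matrix-valued spectral densities. A Python sketch on a uniform frequency grid (the function name and grid size are hypothetical):

```python
import math


def prediction_divergence(f1, f2, grid=4096):
    """log(mean(f1/f2)) - mean(log(f1/f2)) over theta in [-pi, pi).

    Grid means approximate (1/2pi) * integral d(theta); the rectangle rule is
    very accurate for smooth periodic spectra. Both spectra must be positive.
    """
    thetas = (-math.pi + 2.0 * math.pi * j / grid for j in range(grid))
    ratios = [f1(t) / f2(t) for t in thetas]
    arithmetic = sum(ratios) / grid
    log_geometric = sum(math.log(r) for r in ratios) / grid
    return math.log(arithmetic) - log_geometric
```

For an AR(1) spectrum $f_1(θ) = 1/|1 - a e^{iθ}|^2$ against white noise $f_2 = 1$ the value is $\log\frac{1}{1-a^2}$, and rescaling either spectrum leaves the divergence unchanged.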
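Input Design for System Identification via Convex Relaxation
Ian R. Manchester
AI summary: Proposes a convex-relaxation framework that maximizes the Fisher information matrix under time-domain amplitude constraints to optimize excitation inputs for system identification, with a guarantee of near-optimality.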
This paper proposes a new framework for the optimization of excitation inputs for system identification. The optimization problem considered is to maximize a reduced Fisher information matrix in any of the classical D-, E-, or A-optimal senses. In contrast to the majority of published work on this topic, we consider the problem in the time domain and subject to constraints on the amplitude of the input signal. This optimization problem is nonconvex. The main result of the paper is a convex relaxation that gives an upper bound accurate to within $2/π$ of the true maximum. A randomized algorithm is presented for finding a feasible solution which, in a certain sense, is expected to be at least $2/π$ as informative as the globally optimal input signal. In the case of a single constraint on input power, the proposed approach recovers the true global optimum exactly. Extensions to situations with both power and amplitude constraints on both inputs and outputs are given. A simple simulation example illustrates the technique.
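Greedy Feature Selection for Subspace Clustering
Eva L. Dyer, Aswin C. Sankaranarayanan, Richard G. Baraniuk
AI summary: Studies exact feature selection for subspace clustering with a greedy method (orthogonal matching pursuit) and shows that it outperforms nearest-neighbor methods, especially when the subspaces are sparsely sampled.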
Unions of subspaces provide a powerful generalization to linear subspace models for collections of high-dimensional data. To learn a union of subspaces from a collection of data, sets of signals in the collection that belong to the same subspace must be identified in order to obtain accurate estimates of the subspace structures present in the data. Recently, sparse recovery methods have been shown to provide a provable and robust strategy for exact feature selection (EFS)--recovering subsets of points from the ensemble that live in the same subspace. In parallel with recent studies of EFS with L1-minimization, in this paper, we develop sufficient conditions for EFS with a greedy method for sparse signal recovery known as orthogonal matching pursuit (OMP). Following our analysis, we provide an empirical study of feature selection strategies for signals living on unions of subspaces and characterize the gap between sparse recovery methods and nearest neighbor (NN)-based approaches. In particular, we demonstrate that sparse recovery methods provide significant advantages over NN methods and the gap between the two approaches is particularly pronounced when the sampling of subspaces in the dataset is sparse. Our results suggest that OMP may be employed to reliably recover exact feature sets in a number of regimes where NN approaches fail to reveal the subspace membership of points in the ensemble.
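Orthogonal matching pursuit itself is short. The Python sketch below uses only the standard library, with a small normal-equations solver that is adequate only for tiny, well-conditioned problems, and returns the selected support. In the exact-feature-selection sense, the selection is "exact" when every chosen atom lies in the same subspace as the query. All names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_least_squares(columns, y):
    """Coefficients c minimising ||y - sum_j c_j columns[j]|| via the normal
    equations and Gaussian elimination with partial pivoting."""
    k = len(columns)
    aug = [[dot(columns[i], columns[j]) for j in range(k)] + [dot(columns[i], y)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * k
    for r in reversed(range(k)):  # back substitution
        coeffs[r] = (aug[r][k] - sum(aug[r][c] * coeffs[c] for c in range(r + 1, k))) / aug[r][r]
    return coeffs


def omp(dictionary, y, max_atoms, tol=1e-10):
    """Greedily add the atom most correlated with the residual, then re-fit y
    on all selected atoms by least squares."""
    support, residual = [], list(y)
    for _ in range(max_atoms):
        if math.sqrt(dot(residual, residual)) <= tol or len(support) == len(dictionary):
            break
        scores = [abs(dot(atom, residual)) / math.sqrt(dot(atom, atom)) for atom in dictionary]
        best = max((j for j in range(len(dictionary)) if j not in support), key=lambda j: scores[j])
        support.append(best)
        coeffs = solve_least_squares([dictionary[j] for j in support], y)
        fitted = [sum(c * dictionary[j][i] for c, j in zip(coeffs, support)) for i in range(len(y))]
        residual = [yi - fi for yi, fi in zip(y, fitted)]
    return support
```

In the check below, atoms 0 to 2 span the plane $z = 0$ and atoms 3 and 4 lie on the $z$-axis; for a query in the plane, OMP picks atoms 2 and 0, both from the query's own subspace.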
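Expectation Propagation in Gaussian Process Dynamical Systems: Extended Version
Marc Peter Deisenroth, Shakir Mohamed
AI summary: Proposes a message passing algorithm based on expectation propagation for approximate inference in Gaussian process dynamical systems; iterating forward-backward smoothing yields more accurate posteriors over latent structures and better predictive performance, and the framework unifies existing GPDS smoothers as special cases.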
Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos or neural recordings, are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems (GPDS) as a rich model class that is appropriate for such analysis. In particular, we present a message passing algorithm for approximate inference in GPDSs based on expectation propagation. By posing inference as a general message passing problem, we iterate forward-backward smoothing. Thus, we obtain more accurate posterior distributions over latent structures, resulting in improved predictive performance compared to state-of-the-art GPDS smoothers, which are special cases of our general message passing algorithm. Hence, we provide a unifying approach within which to contextualize message passing in GPDSs.
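Optimally-Weighted Herding is Bayesian Quadrature
Ferenc Huszár, David Duvenaud
AI summary: Shows that the sample-selection criterion of kernel herding is equivalent to the posterior variance in Bayesian quadrature, and presents sequential Bayesian quadrature as an improved weighted version of herding that empirically converges faster than O(1/N).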
Herding and kernel herding are deterministic methods of choosing samples which summarise a probability distribution. A related task is choosing samples for estimating integrals using Bayesian quadrature. We show that the criterion minimised when selecting samples in kernel herding is equivalent to the posterior variance in Bayesian quadrature. We then show that sequential Bayesian quadrature can be viewed as a weighted version of kernel herding which achieves performance superior to any other weighted herding method. We demonstrate empirically a rate of convergence faster than O(1/N). Our results also imply an upper bound on the empirical error of the Bayesian quadrature estimate.
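The quantity that sequential Bayesian quadrature minimizes can be computed in closed form for a Gaussian kernel and a standard normal $p$. The Python sketch below greedily picks, from a candidate grid, the point that most reduces the posterior variance of $\int f\,p$, and forms the weighted estimate $\sum_i w_i f(x_i)$ with weights $w = K^{-1}z$. The length-scale `ELL`, the candidate grid, and all names are illustrative assumptions.

```python
import math

ELL = 1.0  # kernel length-scale (illustrative choice)
PRIOR_VAR = ELL / math.sqrt(ELL**2 + 2.0)  # E_{x,y ~ N(0,1)}[k(x, y)]


def kernel(x, y):
    return math.exp(-((x - y) ** 2) / (2.0 * ELL**2))


def kernel_mean(x):
    """z(x) = E_{y ~ N(0,1)}[k(x, y)], closed form for the Gaussian kernel."""
    s2 = ELL**2 + 1.0
    return ELL / math.sqrt(s2) * math.exp(-x * x / (2.0 * s2))


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def bq_weights(points):
    """Bayesian quadrature weights w = K^{-1} z."""
    gram = [[kernel(a, b) for b in points] for a in points]
    return solve(gram, [kernel_mean(a) for a in points])


def posterior_variance(points):
    """Posterior variance of E_p[f] under a zero-mean GP prior on f."""
    if not points:
        return PRIOR_VAR
    w = bq_weights(points)
    return PRIOR_VAR - sum(wi * kernel_mean(a) for wi, a in zip(w, points))


def bq_estimate(points, f):
    return sum(wi * f(a) for wi, a in zip(bq_weights(points), points))


def sequential_bq(candidates, n_points):
    """Greedily add the candidate that most reduces the posterior variance."""
    points, history = [], [PRIOR_VAR]
    for _ in range(n_points):
        best = min((c for c in candidates if c not in points), key=lambda c: posterior_variance(points + [c]))
        points.append(best)
        history.append(posterior_variance(points))
    return points, history
```

The first point chosen is 0, where the kernel mean embedding is largest, and the variance history is nonincreasing, as it must be in exact arithmetic.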
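Hodge Theory on Metric Spaces
Laurent Bartholdi, Thomas Schick, Nat Smale, Steve Smale, Anthony W. Baker
AI summary: Develops Hodge theory on metric spaces equipped with a probability measure, with the aim of providing geometric analysis tools for the spaces of images that arise in vision and pattern recognition.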
Hodge theory is a beautiful synthesis of geometry, topology, and analysis, which has been developed in the setting of Riemannian manifolds. On the other hand, spaces of images, which are important in the mathematical foundations of vision and pattern recognition, do not fit this framework. This motivates us to develop a version of Hodge theory on metric spaces with a probability measure. We believe that this constitutes a step towards understanding the geometry of vision. The appendix by Anthony Baker provides a separable, compact metric space with infinite dimensional α-scale homology.
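Optimal Uncertainty Quantification for Legacy Data Observations of Lipschitz Functions
T. J. Sullivan, M. McKerns, D. Meyer, F. Theil, H. Owhadi, M. Ortiz
AI summary: For partially observed functions, proposes an optimization framework for optimal uncertainty quantification (UQ) in which optimal bounds on the quantities of interest are obtained by solving optimization problems, and uses an analogue of the simplex algorithm from linear programming to handle high-dimensional systems with high-cardinality legacy data efficiently.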
We consider the problem of providing optimal uncertainty quantification (UQ) --- and hence rigorous certification --- for partially-observed functions. We present a UQ framework within which the observations may be small or large in number, and need not carry information about the probability distribution of the system in operation. The UQ objectives are posed as optimization problems, the solutions of which are optimal bounds on the quantities of interest; we consider two typical settings, namely parameter sensitivities (McDiarmid diameters) and output deviation (or failure) probabilities. The solutions of these optimization problems depend non-trivially (even non-monotonically and discontinuously) upon the specified legacy data. Furthermore, the extreme values are often determined by only a few members of the data set; in our principal physically-motivated example, the bounds are determined by just 2 out of 32 data points, and the remainder carry no information and could be neglected without changing the final answer. We propose an analogue of the simplex algorithm from linear programming that uses these observations to offer efficient and rigorous UQ for high-dimensional systems with high-cardinality legacy data. These findings suggest natural methods for selecting optimal (maximally informative) next experiments.
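A basic building block behind statements like "the bounds are determined by just 2 out of 32 data points" is the Lipschitz envelope: among all functions with Lipschitz constant $L$ that match the legacy data, the attainable values at a point $x$ form an interval whose ends are each set by a single observation. The Python sketch below computes this in one dimension; it illustrates the deterministic envelope only, not the paper's optimization over probability measures, and the function name is hypothetical.

```python
def lipschitz_bounds(x, data, lipschitz):
    """Tightest bounds on f(x) over all f with |f(u) - f(v)| <= L * |u - v|
    that interpolate the legacy data [(x_i, y_i), ...] (1-D, absolute-value metric).

    Returns (lower, upper, lower_idx, upper_idx); the indices identify the
    single observation that is active in each bound.
    """
    uppers = [y + lipschitz * abs(x - xi) for xi, y in data]
    lowers = [y - lipschitz * abs(x - xi) for xi, y in data]
    upper_idx = min(range(len(data)), key=uppers.__getitem__)
    lower_idx = max(range(len(data)), key=lowers.__getitem__)
    return lowers[lower_idx], uppers[upper_idx], lower_idx, upper_idx
```

In the check below, the observation at $x = 3$ is active in neither bound at $x = 0.5$ and could be dropped without changing the answer.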
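Kernel Methods for the Approximation of Nonlinear Systems
Jake Bouvrie, Boumediene Hamzi
AI summary: Proposes a data-driven order reduction method that uses the kernel trick to lift a nonlinear system into a high-dimensional feature space where balanced truncation is performed implicitly, yielding a closed reduced-order model that preserves the essential input-output characteristics.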
We introduce a data-driven order reduction method for nonlinear control systems, drawing on recent progress in machine learning and statistical dimensionality reduction. The method rests on the assumption that the nonlinear system behaves linearly when lifted into a high (or infinite) dimensional feature space where balanced truncation may be carried out implicitly. This leads to a nonlinear reduction map which can be combined with a representation of the system belonging to a reproducing kernel Hilbert space to give a closed, reduced order dynamical system which captures the essential input-output characteristics of the original model. Empirical simulations illustrating the approach are also provided.
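Discrepancy Bounds for Uniformly Ergodic Markov Chain Quasi-Monte Carlo
Josef Dick, Daniel Rudolf, Houying Zhu
AI summary: For uniformly ergodic Markov chains driven by a deterministic sequence instead of independent uniform random variables, proves upper bounds on the discrepancy between the empirical distribution of the samples and the target distribution, and shows that there exist driver sequences for which this discrepancy converges at (almost) the Monte Carlo rate $n^{-1/2}$.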
Markov chains can be used to generate samples whose distribution approximates a given target distribution. The quality of the samples of such Markov chains can be measured by the discrepancy between the empirical distribution of the samples and the target distribution. We prove upper bounds on this discrepancy under the assumption that the Markov chain is uniformly ergodic and the driver sequence is deterministic rather than independent $U(0,1)$ random variables. In particular, we show the existence of driver sequences for which the discrepancy of the Markov chain from the target distribution with respect to certain test sets converges with (almost) the usual Monte Carlo rate of $n^{-1/2}$.
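The setting can be reproduced in miniature. The Python sketch below runs an independent Metropolis sampler on $[0,1]$ targeting the density $2x$ with a uniform proposal (uniformly ergodic, since the density ratio to the proposal is at most 2), drives it with 2-D Halton points instead of independent uniforms, and measures the discrepancy with respect to anchored intervals $[0,t]$. The target, proposal, driver, and names are all assumptions for illustration; nothing here claims that Halton points are among the driver sequences covered by the paper's bounds.

```python
def radical_inverse(i, base):
    """van der Corput radical inverse of the integer i in the given base."""
    result, scale = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        result += digit * scale
        scale /= base
    return result


def independence_sampler(n, driver, x0=0.5):
    """Independent Metropolis sampler for target density 2x on [0, 1] with a
    uniform proposal; driver(k) supplies the (proposal, acceptance) pair in
    [0, 1)^2 that replaces two independent U(0,1) draws at step k."""
    x, out = x0, []
    for k in range(1, n + 1):
        proposal, u = driver(k)
        if u * x < proposal:  # accept with probability min(1, proposal / x)
            x = proposal
        out.append(x)
    return out


def star_discrepancy(samples, cdf):
    """sup_t |F_n(t) - F(t)| over anchored intervals [0, t] (1-D Kolmogorov distance)."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def halton_driver(k):
    """2-D Halton point number k (bases 2 and 3)."""
    return radical_inverse(k, 2), radical_inverse(k, 3)
```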
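Splitting Methods for Convex Clustering
Eric C. Chi, Kenneth Lange
AI summary: Proposes two splitting methods (ADMM and AMA) for solving the convex clustering problem; complexity analysis and numerical experiments show AMA to be the more efficient of the two.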
Clustering is a fundamental problem in many scientific applications. Standard methods such as $k$-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of $k$-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. In this work we present two splitting methods for solving the convex clustering problem. The first is an instance of the alternating direction method of multipliers (ADMM); the second is an instance of the alternating minimization algorithm (AMA). In contrast to previously considered algorithms, our ADMM and AMA formulations provide simple and unified frameworks for solving the convex clustering problem under the previously studied norms and open the door to potentially novel norms. We demonstrate the performance of our algorithms on both simulated and real data examples. While the differences between the two algorithms appear to be minor on the surface, complexity analysis and numerical experiments show AMA to be significantly more efficient.
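AMA is compact in the simplest case. The Python sketch below assumes one-dimensional data and unit weights on all pairs, i.e. it minimizes $\frac12\sum_i (x_i - u_i)^2 + γ\sum_{i<j}|u_i - u_j|$. It alternates a closed-form primal update with a projected gradient step on the dual variables, which are clipped to $[-γ, γ]$. The step size $ν = 1/n$ is an assumption chosen to stay below $2/n$; for unit weights on the complete graph the dual gradient is $n$-Lipschitz.

```python
def convex_clustering_ama(x, gamma, iters=500):
    """AMA for 1-D convex clustering with unit weights on every pair (i, j)."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    lam = {pair: 0.0 for pair in pairs}
    nu = 1.0 / n

    def primal(lam):
        # u_i = x_i + sum of lam over pairs (i, .) - sum of lam over pairs (., i)
        u = list(x)
        for (i, j), value in lam.items():
            u[i] += value
            u[j] -= value
        return u

    u = primal(lam)
    for _ in range(iters):
        for i, j in pairs:
            # Dual gradient step on u_i - u_j, then projection onto [-gamma, gamma].
            step = lam[(i, j)] - nu * (u[i] - u[j])
            lam[(i, j)] = max(-gamma, min(gamma, step))
        u = primal(lam)
    return u
```

For data $\{0, 0.1, 5, 5.1\}$ with $γ = 0.1$ the sketch returns two fused clusters at 0.25 and 4.85, and for very large $γ$ every point collapses to the mean 2.55.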
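A Rank-Corrected Procedure for Matrix Completion with Fixed Basis Coefficients
Weimin Miao, Shaohua Pan, Defeng Sun
AI summary: Addressing the limitations of the nuclear norm in matrix completion with fixed basis coefficients, proposes a rank-corrected procedure based on a nuclear semi-norm, shows that its non-asymptotic recovery error bound can be reduced by about 50%, and gives necessary and sufficient conditions for rank consistency.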
For low-rank matrix completion problems, the efficiency of the widely used nuclear norm technique may be challenged in many circumstances, especially when certain basis coefficients are fixed, for example, in low-rank correlation matrix completion in fields such as financial markets and in low-rank density matrix completion for quantum state tomography. To seek a solution of high recovery quality beyond the reach of the nuclear norm, in this paper, we propose a rank-corrected procedure using a nuclear semi-norm to generate a new estimator. For this new estimator, we establish a non-asymptotic recovery error bound. More importantly, we quantify the reduction of the recovery error bound for this rank-corrected procedure. Compared with the one obtained for the nuclear norm penalized least squares estimator, this reduction can be substantial (around 50%). We also provide necessary and sufficient conditions for rank consistency in the sense of Bach (2008). Very interestingly, these conditions are highly related to the concept of constraint nondegeneracy in matrix optimization. As a byproduct, our results provide a theoretical foundation for the majorized penalty method of Gao and Sun (2010) and Gao (2010) for structured low-rank matrix optimization problems. Extensive numerical experiments demonstrate that our proposed rank-corrected procedure can simultaneously achieve a high recovery accuracy and capture the low-rank structure.
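Compressive Shift Retrieval
Henrik Ohlsson, Yonina C. Eldar, Allen Y. Yang, S. Shankar Sastry
AI summary: For the classical shift retrieval problem, proposes estimating the shift directly from compressed signals; when working with Fourier coefficients, a single coefficient suffices to recover the true shift under mild conditions.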
The classical shift retrieval problem considers two signals in vector form that are related by a shift. The problem is of great importance in many applications and is typically solved by maximizing the cross-correlation between the two signals. Inspired by compressive sensing, in this paper, we seek to estimate the shift directly from compressed signals. We show that under certain conditions, the shift can be recovered using fewer samples and less computation compared to the classical setup. Of particular interest is shift estimation from Fourier coefficients. We show that under rather mild conditions only one Fourier coefficient suffices to recover the true shift.
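In an idealized noise-free case with an integer circular shift, the one-coefficient claim follows from the DFT shift theorem: if $y[t] = x[(t-s) \bmod n]$ then $Y[1] = X[1]e^{-2πis/n}$, so the phase of $Y[1]/X[1]$ determines $s$ whenever $X[1] \neq 0$. The Python sketch below shows this; noise, compression operators, and non-circular shifts are outside this sketch, and the function names are hypothetical.

```python
import cmath
import math


def dft_coefficient(signal, k):
    """k-th DFT coefficient X[k] = sum_t x[t] exp(-2 pi i k t / n)."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))


def shift_from_one_coefficient(reference_k1, shifted_k1, n):
    """Recover an integer circular shift s from the first DFT coefficients.

    Since Y[1] = X[1] * exp(-2 pi i s / n), the phase of Y[1] / X[1] equals
    -2 pi s / n modulo 2 pi; this requires X[1] != 0.
    """
    phase = cmath.phase(shifted_k1 / reference_k1)
    return round(-phase * n / (2.0 * math.pi)) % n
```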
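A Note on the SPICE Method
Cristian R. Rojas, Dimitrios Katselis, Håkan Hjalmarsson
AI summary: Analyzes the SPICE method and establishes its connections with standard sparse estimation methods such as the Lasso and the LAD-Lasso, positioning SPICE as an efficient technique for computing Lasso-type estimators; the connection is then used to establish asymptotic properties of SPICE and to propose modifications.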
In this article, we analyze the SPICE method developed in [1], and establish its connections with other standard sparse estimation methods such as the Lasso and the LAD-Lasso. This result positions SPICE as a computationally efficient technique for the calculation of Lasso-type estimators. Conversely, this connection is very useful for establishing the asymptotic properties of SPICE under several problem scenarios and for suggesting suitable modifications in cases where the naive version of SPICE would not work.
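Nonparametric Instrumental Regression with Non-Convex Constraints
Markus Grasmair, Otmar Scherzer, Anne Vanhems
AI summary: For regression with endogenous explanatory variables, proposes estimating a nonparametric constrained regression function with instrumental variables via Tikhonov regularization, and derives convergence rates under non-convex constraints.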
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, like integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
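Estimating Nuisance Parameters in Inverse Problems
Aleksandr Y. Aravkin, Tristan van Leeuwen
AI summary: Proposes projecting nuisance parameters out of the estimation of the primary parameters for a broad class of maximum likelihood and maximum a posteriori problems, and demonstrates improved recovery of primary parameters in large-scale inverse problems through numerical experiments.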
Many inverse problems include nuisance parameters which, while not of direct interest, are required to recover primary parameters. Structure present in these problems allows efficient optimization strategies; a well-known example is variable projection, where nonlinear least squares problems which are linear in some parameters can be very efficiently optimized. In this paper, we extend the idea of projecting out a subset of the variables to a broad class of maximum likelihood (ML) and maximum a posteriori (MAP) problems with nuisance parameters, such as variance or degrees of freedom. As a result, we are able to incorporate nuisance parameter estimation into large-scale constrained and unconstrained inverse problem formulations. We apply the approach to a variety of problems, including estimation of unknown variance parameters in the Gaussian model, degree of freedom (d.o.f.) parameter estimation in the context of robust inverse problems, automatic calibration, and optimal experimental design. Using numerical examples, we demonstrate improvement in recovery of primary parameters for several large-scale inverse problems. The proposed approach is compatible with a wide variety of algorithms and formulations, and its implementation requires only minor modifications to existing algorithms.
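Variable projection, the classical example cited above, can be sketched on a separable model $y \approx c\,e^{-a t}$: for fixed $a$ the linear coefficient $c$ has a closed form, so only $a$ is searched. The Python sketch below uses golden-section search and assumes the reduced objective is unimodal on the bracket. An unknown Gaussian noise variance could be eliminated the same way, since for fixed model parameters its maximum-likelihood value is the residual sum of squares divided by the number of observations. Names and the bracket are illustrative assumptions.

```python
import math


def projected_residual(rate, t, y):
    """Squared residual of y ~ c * exp(-rate * t) after eliminating c in closed
    form (c = <phi, y> / <phi, phi>); returns (residual, c)."""
    phi = [math.exp(-rate * ti) for ti in t]
    c = sum(p * yi for p, yi in zip(phi, y)) / sum(p * p for p in phi)
    return sum((yi - c * p) ** 2 for p, yi in zip(phi, y)), c


def fit_exponential(t, y, lo=0.01, hi=10.0, iters=200):
    """Golden-section search over the nonlinear parameter only."""
    inv_golden = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        m1 = b - inv_golden * (b - a)
        m2 = a + inv_golden * (b - a)
        if projected_residual(m1, t, y)[0] < projected_residual(m2, t, y)[0]:
            b = m2  # minimum lies in [a, m2]
        else:
            a = m1  # minimum lies in [m1, b]
    rate = 0.5 * (a + b)
    return rate, projected_residual(rate, t, y)[1]
```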
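Distributed Maximum Likelihood for Simultaneous Self-Localization and Tracking in Sensor Networks
Nikolas Kantas, Sumeetpal S. Singh, Arnaud Doucet
AI summary: Casts sensor self-localization as a static parameter estimation problem for hidden Markov models and implements fully decentralized versions of recursive maximum likelihood and online expectation-maximization to localize the sensor network simultaneously with target tracking.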
We show that the sensor self-localization problem can be cast as a static parameter estimation problem for Hidden Markov Models and we implement fully decentralized versions of the Recursive Maximum Likelihood and on-line Expectation-Maximization algorithms to localize the sensor network simultaneously with target tracking. For linear Gaussian models, our algorithms can be implemented exactly using a distributed version of the Kalman filter and a novel message passing algorithm. The latter allows each node to compute the local derivatives of the likelihood or the sufficient statistics needed for Expectation-Maximization. In the non-linear case, a solution based on local linearization in the spirit of the Extended Kalman Filter is proposed. In numerical examples we demonstrate that the developed algorithms are able to learn the localization parameters.
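Coupling Control Variates for Markov Chain Monte Carlo
Jonathan B. Goodman, Kevin K. Lin
AI summary: Proposes using Markov couplings as control variates to improve the accuracy of MCMC computations when the steady-state distribution is not explicitly known, illustrated with two nonequilibrium transport models.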
We show that Markov couplings can be used to improve the accuracy of Markov chain Monte Carlo calculations in some situations where the steady-state probability distribution is not explicitly known. The technique generalizes the notion of control variates from classical Monte Carlo integration. We illustrate it using two models of nonequilibrium transport.
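The control-variate idea can be seen in a toy linear example chosen so that everything is checkable, not one of the paper's transport models: run the chain of interest $X$ together with a coupled chain $Y$ that shares its random numbers and whose stationary mean is known, then subtract the error of $Y$'s time average. In this example $X_k - Y_k$ is deterministic, so the controlled estimate keeps only the start-up bias. The parameters and names are assumptions.

```python
import random


def coupled_estimate(n, a=0.5, b=1.0, seed=0):
    """Estimate E[X] for X_{k+1} = a X_k + b + xi_k using the coupled chain
    Y_{k+1} = a Y_k + xi_k (same noise xi_k), whose stationary mean 0 is known.

    Returns (plain_estimate, control_variate_estimate).
    """
    rng = random.Random(seed)
    x = y = 0.0
    sum_x = sum_y = 0.0
    for _ in range(n):
        noise = rng.gauss(0.0, 1.0)  # shared randomness couples the two chains
        x = a * x + b + noise
        y = a * y + noise
        sum_x += x
        sum_y += y
    plain = sum_x / n
    known_mean_y = 0.0
    return plain, plain - (sum_y / n - known_mean_y)
```

With $a = 0.5$ and $b = 1$ the true mean is $b/(1-a) = 2$; the controlled estimate is within about $2/n$ of it, while the plain average fluctuates at the usual $O(n^{-1/2})$ Monte Carlo scale.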
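Uncertainty Bounds for Spectral Estimation
Johan Karlsson, Tryphon T. Georgiou
AI summary: Studies metrics for the uncertainty of power spectra estimated from finite second-order statistics, bounds the uncertainty set through its diameter in the weak topology, and derives a priori uncertainty bounds for filter-bank-based methods.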
The purpose of this paper is to study metrics suitable for assessing uncertainty of power spectra when these are based on finite second-order statistics. The family of power spectra which is consistent with a given range of values for the estimated statistics represents the uncertainty set about the "true" power spectrum. Our aim is to quantify the size of this uncertainty set using suitable notions of distance, and in particular, to compute the diameter of the set since this represents an upper bound on the distance between any choice of a nominal element in the set and the "true" power spectrum. Since the uncertainty set may contain power spectra with lines and discontinuities, it is natural to quantify distances in the weak topology---the topology defined by continuity of moments. We provide examples of such weakly-continuous metrics and focus on particular metrics for which we can explicitly quantify spectral uncertainty. We then consider certain high resolution techniques which utilize filter-banks for pre-processing, and compute worst-case a priori uncertainty bounds solely on the basis of the filter dynamics. This allows the a priori tuning of the filter-banks for improved resolution over selected frequency bands.
Compressed Matrix Multiplication
Rasmus Pagh
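AI summary: Motivated by sample covariance computation and by transforming vectors to a sparse basis, proposes an approximate matrix multiplication algorithm based on hashing and polynomial compression. It runs in subquadratic time via the FFT and uses error-correcting codes to recover significant entries.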
Motivated by the problems of computing sample covariance matrices, and of transforming a collection of vectors to a basis where they are sparse, we present a simple algorithm that computes an approximation of the product of two n-by-n real matrices A and B. Let ||AB||_F denote the Frobenius norm of AB, and let b be a parameter determining the time/accuracy trade-off. Given 2-wise independent hash functions h_1,h_2: [n] -> [b] and s_1,s_2: [n] -> {-1,+1}, the algorithm works by first "compressing" the matrix product into the polynomial p(x) = sum_{k=1}^n (sum_{i=1}^n A_{ik} s_1(i) x^{h_1(i)}) (sum_{j=1}^n B_{kj} s_2(j) x^{h_2(j)}). Using FFT for polynomial multiplication, we can compute c_0,...,c_{b-1} such that sum_i c_i x^i = (p(x) mod x^b) + (p(x) div x^b) in time Õ(n^2 + nb). An unbiased estimator of (AB)_{ij} with variance at most ||AB||_F^2 / b can then be computed as C_{ij} = s_1(i) s_2(j) c_{(h_1(i)+h_2(j)) mod b}. Our approach also leads to an algorithm for computing AB exactly, with high probability, in time Õ(N + nb) when A and B have at most N nonzero entries and AB has at most b nonzero entries. We also use error-correcting codes in a novel way to recover significant entries of AB in near-linear time.
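A minimal NumPy sketch of the compression step described above. It assumes 0-based buckets `h1`, `h2` with values in {0, ..., b-1}, with the hash and sign arrays passed in explicitly; the function name `compressed_product` is hypothetical. For each k the sketch builds the two small polynomials and accumulates their product in the frequency domain. It then folds p(x) into c_0, ..., c_{b-1} and reads off C_{ij}. It omits the exact sparse-recovery and error-correcting-code variants.

```python
import numpy as np


def compressed_product(A, B, b, h1, h2, s1, s2):
    """Estimate A @ B by compressing it into the polynomial p(x).

    h1, h2: integer arrays of length n with 0-based bucket values in [0, b)
    s1, s2: integer arrays of length n with values in {-1, +1}
    """
    n = A.shape[0]
    size = 2 * b  # FFT length; p(x) has degree at most 2b - 2
    acc = np.zeros(size // 2 + 1, dtype=complex)
    for k in range(n):
        # Coefficients of sum_i A_ik s1(i) x^h1(i) and sum_j B_kj s2(j) x^h2(j)
        pa = np.zeros(b)
        pb = np.zeros(b)
        np.add.at(pa, h1, s1 * A[:, k])
        np.add.at(pb, h2, s2 * B[k, :])
        # Polynomial product = pointwise product of zero-padded FFTs
        acc += np.fft.rfft(pa, size) * np.fft.rfft(pb, size)
    p = np.fft.irfft(acc, size)[: 2 * b - 1]  # coefficients of p(x)
    # c = (p mod x^b) + (p div x^b): fold the high-degree part onto the low one
    c = p[:b].copy()
    c[: b - 1] += p[b:]
    # C_ij = s1(i) s2(j) c_{(h1(i) + h2(j)) mod b}
    idx = (h1[:, None] + h2[None, :]) % b
    return s1[:, None] * s2[None, :] * c[idx]
```

For a random instance, draw `h1, h2` with `rng.integers(0, b, n)` and `s1, s2` with `rng.choice([-1, 1], n)`. Fully random draws are stronger than the 2-wise independence the analysis needs. Averaging several independent estimates reduces the variance further.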
Censored Truncated Sequential Spectrum Sensing for Cognitive Radio Networks
Sina Maleki, Geert Leus
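AI summary: Proposes a censored truncated sequential spectrum sensing technique that minimizes the maximum energy consumption per sensor, subject to constraints on the detection probability and the false alarm rate. This balances energy savings against sensing performance.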
Reliable spectrum sensing is a key functionality of a cognitive radio network. Cooperative spectrum sensing improves the detection reliability of a cognitive radio system but also increases the system energy consumption, which is a critical factor particularly for low-power wireless technologies. A censored truncated sequential spectrum sensing technique is considered as an energy-saving approach. To design the underlying sensing parameters, the maximum energy consumption per sensor is minimized subject to a lower bounded global probability of detection and an upper bounded false alarm rate. This way, both the interference to the primary user caused by missed detections and the network throughput, which depends on a low false alarm rate, are controlled. We compare the performance of the proposed scheme with a fixed sample size censoring scheme under different scenarios. It is shown that as the sensing cost of the cognitive radios increases, the energy efficiency of the censored truncated sequential approach grows significantly.
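The sequential part can be made concrete with a single-sensor truncated SPRT. This sketch assumes a hypothetical Gaussian mean-shift model: an idle channel gives N(mu0, sigma^2) and an occupied channel gives N(mu1, sigma^2). The paper's actual observation model, thresholds and censoring rule may differ. The sensor stops as soon as the accumulated log-likelihood ratio leaves the interval (lower, upper). If it reaches the truncation point N first, it stays silent and returns `None`, which is one way censoring saves transmission energy.

```python
def truncated_sprt(samples, mu0, mu1, sigma, lower, upper, N):
    """Truncated SPRT for a Gaussian mean shift (hypothetical sensing model).

    Returns (decision, n_used): decision is 1 (H1, primary user present),
    0 (H0, channel idle) or None if no threshold is crossed within N samples
    (the sensor then censors, i.e. sends nothing).
    """
    llr = 0.0
    n_used = 0
    for x in samples[:N]:
        n_used += 1
        # log N(x; mu1, sigma^2) - log N(x; mu0, sigma^2)
        llr += (mu1 - mu0) / sigma**2 * (x - 0.5 * (mu0 + mu1))
        if llr >= upper:
            return 1, n_used
        if llr <= lower:
            return 0, n_used
    return None, n_used
```

Classical Wald approximations set `upper` ≈ log((1 - β)/α) and `lower` ≈ log(β/(1 - α)), for target false alarm probability α and missed-detection probability β. The paper instead designs the sensing parameters by minimizing the maximum energy consumption per sensor under its detection constraints.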
Lassoing Line Outages in the Smart Power Grid
Hao Zhu, Georgios B. Giannakis
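AI summary: For fast identification of multiple line outages in the smart grid, proposes a Lasso-based quadratic programming algorithm. Solving it with block coordinate descent iterations enables near real-time identification.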
Fast and accurate unveiling of power line outages is of paramount importance not only for preventing faults that may lead to blackouts, but also for routine monitoring and control tasks of the smart grid, including state estimation and optimal power flow. Existing approaches are either challenged by the \emph{combinatorial complexity} issues involved, and are thus limited to identifying single- and double-line outages; or, they invoke less pragmatic assumptions such as \emph{conditionally independent} phasor angle measurements available across the grid. Using only a subset of voltage phasor angle data, the present paper develops a near real-time algorithm for identifying multiple line outages at the affordable complexity of solving a quadratic program via block coordinate descent iterations. The novel approach relies on reformulating the DC linear power flow model as a \emph{sparse} overcomplete expansion, and leveraging contemporary advances in compressive sampling and variable selection using the least-absolute shrinkage and selection operator (Lasso). Analysis and simulated tests on the standard IEEE 118-bus system confirm the effectiveness of lassoing line changes in the smart power grid.
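The Lasso problem at the core of this approach can be illustrated with a generic cyclic coordinate-descent solver for min over beta of 0.5 * ||y - X beta||^2 + lam * ||beta||_1. This is a standard textbook sketch, not the paper's block coordinate descent on the grid model. In the outage setting, X would presumably hold the columns of the overcomplete expansion (not specified here), and the nonzeros of beta would flag candidate outaged lines.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for 0.5 * ||y - X beta||^2 + lam * ||beta||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = np.asarray(y, dtype=float) - X @ beta  # current residual
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * beta[j]  # remove coordinate j from the fit
            beta[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * beta[j]  # add it back with its updated value
    return beta
```

With an orthonormal X, a single sweep already returns the soft-thresholded coefficients, which makes the role of `lam` as a sparsity knob easy to see.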
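Sufficient Condition for the Monotonic Increase of the Number of Nonzero Entries in the Optimizer of the L1-Norm Penalized Least-Squares Problem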
J. Duan, Charles Soussen, David Brie, Jerome Idier, Y. -P. Wang
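AI summary: For the L1-norm penalized least-squares problem (LASSO), gives a sufficient condition under which the number of nonzero entries in the optimizer increases monotonically as the hyperparameter decreases. The result is extended to the total variation case.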
The $\ell$-1 norm based optimization is widely used in signal processing, especially in recent compressed sensing theory. This paper studies the solution path of the $\ell$-1 norm penalized least-squares problem, whose constrained form is known as the Least Absolute Shrinkage and Selection Operator (LASSO). A solution path is the set of all the optimizers with respect to the evolution of the hyperparameter (Lagrange multiplier). The study of the solution path is of great significance in viewing and understanding the profile of the tradeoff between the approximation and regularization terms. If the solution path of a given problem is known, it can help us to find the optimal hyperparameter under a given criterion such as the Akaike Information Criterion. In this paper we present a sufficient condition for the $\ell$-1 norm penalized least-squares problem. Under this sufficient condition, the number of nonzero entries in the optimizer or solution vector increases monotonically when the hyperparameter decreases. We also generalize the result to the often used total variation case, where the $\ell$-1 norm is taken over the first order derivative of the solution vector. We prove that the proposed condition has intrinsic connections with the condition given by Donoho et al. \cite{Donoho08} and the positive cone condition by Efron {\it et al.} \cite{Efron04}. However, the proposed condition does not need to assume the sparsity level of the signal, as required by Donoho et al.'s condition, and is easier to verify than Efron et al.'s positive cone condition when used in practical applications.
Distinguishing and Integrating Aleatoric and Epistemic Variation in Uncertainty Quantification
Kamaljit Chowdhary, Paul Dupuis
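AI summary: Combines the duality between risk-sensitive integrals and relative entropy with polynomial chaos expansions to quantify system uncertainty. Variable distributions may be known, approximately known, or not modeled as random at all. The method gives explicit bounds on variances and exceedance probabilities.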
Much of uncertainty quantification to date has focused on determining the effect of variables modeled probabilistically, and with a known distribution, on some physical or engineering system. We develop methods to obtain information on the system when the distributions of some variables are known exactly, others are known only approximately, and perhaps others are not modeled as random variables at all. The main tool used is the duality between risk-sensitive integrals and relative entropy, and we obtain explicit bounds on standard performance measures (variances, exceedance probabilities) over families of distributions whose distance from a nominal distribution is measured by relative entropy. The evaluation of the risk-sensitive expectations is based on polynomial chaos expansions, which help keep the computational aspects tractable.
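The duality mentioned above gives the following bound for every $c > 0$ and every $Q$ with relative entropy $R(Q \| P) \le \eta$:

$E_Q[f] - E_P[f] \le \frac{1}{c}\log E_P[e^{c(f - E_P[f])}] + \frac{\eta}{c}$

This minimal sketch assumes only Monte Carlo samples of f under the nominal distribution P are available; the paper evaluates these expectations with polynomial chaos expansions instead. It minimizes the bound over a grid of c values (the function names are hypothetical).

```python
import numpy as np


def log_mean_exp(v):
    """Numerically stable log(mean(exp(v)))."""
    m = v.max()
    return m + np.log(np.mean(np.exp(v - m)))


def relative_entropy_upper_bound(f_samples, eta, cs):
    """Upper bound on E_Q[f] over all Q with R(Q || P) <= eta.

    f_samples: samples of f(X) with X drawn from the nominal distribution P
    cs: grid of positive values of the risk-sensitivity parameter c
    """
    mean = f_samples.mean()
    centered = f_samples - mean
    # (1/c) log E_P[exp(c (f - E_P f))] + eta / c, minimized over the grid
    gaps = [log_mean_exp(c * centered) / c + eta / c for c in cs]
    return mean + min(gaps)
```

For f(X) = X with X ~ N(0, 1), the exact worst case is sqrt(2 eta), attained by N(sqrt(2 eta), 1). This gives a quick sanity check of the Monte Carlo version.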
Statistical Multiresolution Dantzig Estimation in Imaging: Fundamental Concepts and Algorithmic Framework
Klaus Frick, Philipp Marnitz, Axel Munk
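AI summary: For function estimation in the "signal + noise" model, proposes a class of statistical multiresolution estimators. They are computed with a framework based on the alternating direction method of multipliers and Dykstra's algorithm, and illustrated with imaging and signal detection examples.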
In this paper we are concerned with fully automatic and locally adaptive estimation of functions in a "signal + noise"-model where the regression function may additionally be blurred by a linear operator, e.g. by a convolution. To this end, we introduce a general class of statistical multiresolution estimators and develop an algorithmic framework for computing those. By this we mean estimators that are defined as solutions of convex optimization problems with supremum-type constraints. We employ a combination of the alternating direction method of multipliers with Dykstra's algorithm for computing orthogonal projections onto intersections of convex sets and prove numerical convergence. The capability of the proposed method is illustrated by various examples from imaging and signal detection.
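Dykstra's algorithm makes the projection onto an intersection exact; plain alternating projections only find some point of the intersection. This minimal sketch uses two simple sets, a box and a half-space, and assumes each individual projection is available in closed form. In the multiresolution setting each set would be one of the supremum-type constraints, which are not reproduced here.

```python
import numpy as np


def project_box(z, lo, hi):
    """Projection onto the box [lo, hi]^d."""
    return np.clip(z, lo, hi)


def project_halfspace(z, a, b):
    """Projection onto the half-space {x : a @ x <= b}."""
    excess = a @ z - b
    return z if excess <= 0 else z - excess / (a @ a) * a


def dykstra(z0, projections, n_iter=500):
    """Dykstra's algorithm: projection of z0 onto the intersection of convex sets."""
    x = np.asarray(z0, dtype=float).copy()
    increments = [np.zeros_like(x) for _ in projections]
    for _ in range(n_iter):
        for i, proj in enumerate(projections):
            y = proj(x + increments[i])
            increments[i] = x + increments[i] - y  # correction term for set i
            x = y
    return x
```

Starting from (3, 0.5), the sketch converges to (1, 0), the true projection onto the triangle {x >= 0, y >= 0, x + y <= 1}. Plain alternating projections from the same point stop at (0.75, 0.25). That point lies in the intersection but is not the nearest one.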
Simultaneous Sequential Detection of Multiple Interacting Faults
Ram Rajagopal, XuanLong Nguyen, Sinem Coleri Ergen, Pravin Varaiya
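AI summary: For multiple interacting faults in distributed systems, proposes a sequential detection algorithm based on composite stopping rules. The algorithm is shown to be asymptotically optimal in a Bayesian framework.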
Single fault sequential change point problems have become important in modeling various phenomena in large distributed systems, such as sensor networks. But such systems in many situations present multiple interacting faults. For example, individual sensors in a network may fail and detection is performed by comparing measurements between sensors, resulting in statistical dependency among faults. We present a new formulation for multiple interacting faults in a distributed system. The formulation includes specifications of how individual subsystems composing the large system may fail, the information that can be shared among these subsystems and the interaction pattern between faults. We then specify a new sequential algorithm for detecting these faults. The main feature of the algorithm is that it uses composite stopping rules for a subsystem that depend on the decisions of other subsystems. We provide asymptotic false alarm and detection delay analysis for this algorithm in the Bayesian setting and show that under certain conditions the algorithm is optimal. The analysis methodology relies on novel detailed comparison techniques between stopping times. We validate the approach with some simulations.
Nonsmooth Formulation of the Support Vector Machine for a Neural Decoding Problem
Cary Humber, Kazufumi Ito, Chad Bouton
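AI summary: Proposes a generalized classification algorithm for decoding neural activity in the brain. A nonsmooth support vector machine formulation is applied to neural firing data to identify the neurons associated with different imagined movements and the changes in their firing patterns.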
This paper formulates a generalized classification algorithm with an application to classifying (or `decoding') neural activity in the brain. Medical doctors and researchers have long been interested in how brain activity correlates to body movement. Experiments have been conducted on patients who are unable to move, in order to gain insight into how thinking about movements might generate discernible neural activity. Given neural firing data, researchers are tasked with determining which neurons are responsible for different imagined movements and how their firing behavior changes. For instance, imagined movements may include wrist flexion, elbow extension, or closing the hand. This is just one of many applications of data classification. Though this article deals with an application in neuroscience, the generalized algorithm proposed in this article has applications in scientific areas ranging from neuroscience to acoustic and medical imaging.
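The abstract does not spell out the nonsmooth formulation, so here is a generic illustration instead: a stochastic subgradient method in the style of Pegasos. It solves the nonsmooth hinge-loss SVM without a bias term, $\min_w \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_i \max(0, 1 - y_i w^\top x_i)$. In a decoding task, the rows of X would be hypothetical firing-rate feature vectors and y would label two imagined movements. More movements would need a one-vs-rest or similar extension.

```python
import numpy as np


def pegasos(X, y, lam=0.1, n_iter=2000, seed=0):
    """Stochastic subgradient descent for the hinge-loss SVM (no bias term).

    X: (m, d) feature matrix; y: labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(y))           # pick one training example
        eta = 1.0 / (lam * t)              # decreasing step size
        margin = y[i] * (w @ X[i])         # evaluated at the current w
        w *= (1 - eta * lam)               # subgradient of the regularizer
        if margin < 1:                     # hinge loss active: add its subgradient
            w += eta * y[i] * X[i]
    return w
```

The nondifferentiable point of the hinge at margin 1 is handled by choosing one valid subgradient, which is the basic device of nonsmooth SVM solvers.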
Estimation of Low-Rank Tensors via Convex Optimization
Ryota Tomioka, Kohei Hayashi, Hisashi Kashima
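AI summary: For estimating the Tucker decomposition of partially observed multi-way arrays (tensors), proposes three convex optimization methods that estimate the rank automatically through trace norm regularization. They outperform conventional methods in predictive performance, speed, and recovery of a known multilinear structure.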
In this paper, we propose three approaches for the estimation of the Tucker decomposition of multi-way arrays (tensors) from partial observations. All approaches are formulated as convex minimization problems. Therefore, the minimum is guaranteed to be unique. The proposed approaches can automatically estimate the number of factors (rank) through the optimization. Thus, there is no need to specify the rank beforehand. The key technique we employ is the trace norm regularization, which is a popular approach for the estimation of low-rank matrices. In addition, we propose a simple heuristic to improve the interpretability of the obtained factorization. The advantages and disadvantages of the three proposed approaches are demonstrated through numerical experiments on both synthetic and real world datasets. We show that the proposed convex optimization based approaches are more accurate in predictive performance, faster, and more reliable in recovering a known multilinear structure than conventional approaches.
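The trace-norm building blocks fit in a few lines: mode-k unfolding of a tensor, its inverse, and singular value thresholding, which is the proximal operator of the trace norm. This sketch covers the ingredients only; the paper's three estimators combine such steps inside full optimization algorithms that are not reproduced here. The unfolding below orders columns differently from some references (e.g. Kolda and Bader), which changes neither ranks nor singular values.

```python
import numpy as np


def unfold(T, mode):
    """Mode-k unfolding: rows are indexed by the k-th index of T."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given shape."""
    moved = [shape[mode]] + [s for k, s in enumerate(shape) if k != mode]
    return np.moveaxis(M.reshape(moved), 0, mode)


def svt(M, tau):
    """Singular value thresholding: prox of tau * (trace norm) at M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

A tensor of low Tucker rank has low-rank unfoldings. For example, every unfolding of a rank-one tensor `np.einsum('i,j,k->ijk', a, b, c)` has matrix rank 1, which is why penalizing the trace norms of the unfoldings promotes low multilinear rank.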
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
John Duchi, Alekh Agarwal, Martin Wainwright
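AI summary: For distributed optimization over networks, proposes algorithms based on dual averaging of subgradients, with sharp bounds on the convergence rate as a function of network size and topology. The number of iterations is shown to scale inversely with the spectral gap of the network.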
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multi-agent co-ordination, estimation in sensor networks, and large-scale optimization in machine learning. We develop and analyze distributed algorithms based on dual averaging of subgradients, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our method of analysis allows for a clear separation between the convergence of the optimization algorithm itself and the effects of communication constraints arising from the network structure. In particular, we show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. Our approach includes both the cases of deterministic optimization and communication, as well as problems with stochastic optimization and/or communication.
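A minimal sketch of distributed dual averaging on a toy problem. It assumes each node i holds $f_i(x) = |x - a_i|$ on the interval [-R, R] and communicates over a ring with a doubly stochastic weight matrix P, so the global minimizer is the median of the a_i. Each node mixes its neighbors' dual variables, adds its own subgradient, and takes a proximal step with $\psi(x) = x^2/2$ and step size $1/\sqrt{t}$. The outputs are the running averages $\hat{x}_i(T)$, as is usual for dual averaging. The helper names are hypothetical.

```python
import numpy as np


def ring_matrix(n):
    """Doubly stochastic weights: each node averages itself and its two ring neighbors."""
    I = np.eye(n)
    return (I + np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1)) / 3.0


def distributed_dual_averaging(a, P, T=20000, R=20.0):
    """Minimize sum_i |x - a_i| over [-R, R], node i knowing only a_i.

    Returns the running averages x_hat_i(T) of all nodes.
    """
    n = len(a)
    z = np.zeros(n)      # dual variables (mixed, accumulated subgradients)
    x = np.zeros(n)      # primal iterates
    x_sum = np.zeros(n)
    for t in range(1, T + 1):
        g = np.sign(x - a)                 # local subgradients of |x_i - a_i|
        z = P @ z + g                      # consensus step plus local subgradient
        alpha = 1.0 / np.sqrt(t)           # step size ~ 1 / sqrt(t)
        x = np.clip(-alpha * z, -R, R)     # prox step with psi(x) = x^2 / 2
        x_sum += x
    return x_sum / T
```

Replacing `ring_matrix` by a better-connected graph, with a larger spectral gap $1 - \lambda_2(P)$, should shrink the disagreement between nodes. That is in line with the network scaling described above.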
Extreme Value Analysis of Empirical Frame Coefficients and Implications for Denoising by Soft-Thresholding
Markus Haltmeier, Axel Munk
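AI summary: Derives the asymptotic distribution of the maximum of the noise coefficients for redundant frames. This gives a theoretical basis for universal extreme value thresholding, which statistically balances noise removal against signal preservation.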
Denoising by frame thresholding is one of the most basic and efficient methods for recovering a discrete signal or image from data that are corrupted by additive Gaussian white noise. The basic idea is to select a frame of analyzing elements that separates the data into a few large coefficients due to the signal and many small coefficients mainly due to the noise $\varepsilon_n$. Removing all data coefficients whose magnitude is below a certain threshold yields a reconstruction of the original signal. In order to properly balance the amount of noise to be removed and the relevant signal features to be kept, a precise understanding of the statistical properties of thresholding is important. For that purpose we derive the asymptotic distribution of $\max_{\omega\in\Omega_n} |\langle \phi_\omega^n, \varepsilon_n\rangle|$ for a wide class of redundant frames $(\phi_\omega^n : \omega\in\Omega_n)$. Based on our theoretical results we give a rationale for universal extreme value thresholding techniques yielding asymptotically sharp confidence regions and smoothness estimates corresponding to prescribed significance levels. The results cover many frames used in imaging and signal recovery applications, such as redundant wavelet systems, curvelet frames, or unions of bases. We show that `generically' a standard Gumbel law results, as is known from the case of orthonormal wavelet bases. However, for specific highly redundant frames other limiting laws may occur. We indeed verify that the translation invariant wavelet transform shows a different asymptotic behaviour.
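The threshold level can be calibrated by simulating the quantity studied above, $\max_\omega |\langle \phi_\omega^n, \varepsilon_n\rangle|$, directly. This minimal sketch assumes the frame is given as a matrix `Phi` whose rows are unit-norm analysing elements and that the noise level sigma is known. It estimates the $(1-\alpha)$-quantile of the maximum and pairs it with soft thresholding. It is a brute-force stand-in for the asymptotic (e.g. Gumbel) quantiles derived in the paper; the function names are hypothetical.

```python
import numpy as np


def soft_threshold(c, tau):
    """Componentwise soft thresholding of frame coefficients."""
    return np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)


def mc_max_threshold(Phi, sigma, alpha, n_sims=20000, seed=0):
    """Simulated (1 - alpha)-quantile of max_w |<phi_w, eps>|, eps ~ N(0, sigma^2 I).

    Rows of Phi are the analysing elements phi_w.
    """
    rng = np.random.default_rng(seed)
    eps = sigma * rng.standard_normal((n_sims, Phi.shape[1]))
    maxima = np.abs(eps @ Phi.T).max(axis=1)  # one maximum per noise draw
    return np.quantile(maxima, 1.0 - alpha)
```

For an orthonormal basis of size n, this quantile grows like the classical universal threshold $\sigma\sqrt{2\log n}$ to leading order. For a redundant frame such as a translation invariant wavelet transform, `Phi` simply has more rows than columns.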
Prediction and Modularity in Dynamical Systems
Artemy Kolchinsky, Luis M. Rocha
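AI summary: Approaches modularity from the perspective of statistical modeling and prediction. The trade-off between model simplicity and predictive accuracy is used to obtain optimal multiscale decompositions of dynamical networks into weakly coupled simple modules, with state-dependent and causal variants.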
Identifying and understanding modular organizations is centrally important in the study of complex systems. Several approaches to this problem have been advanced, many framed in information-theoretic terms. Our treatment starts from the complementary point of view of statistical modeling and prediction of dynamical systems. It is known that for finite amounts of training data, simpler models can have greater predictive power than more complex ones. We use the trade-off between model simplicity and predictive accuracy to generate optimal multiscale decompositions of dynamical networks into weakly-coupled, simple modules. State-dependent and causal versions of our method are also proposed.
Solving Support Vector Machines in Reproducing Kernel Banach Spaces with Positive Definite Functions
Gregory E. Fasshauer, Fred J. Hickernell, Qi Ye
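AI summary: Solves support vector machines explicitly in reproducing kernel Banach spaces using the orthogonality of semi-inner-products and Fourier transform techniques. The optimal solution is expressed in terms of positive definite functions, with coefficients computed by fixed point iteration.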
In this paper we solve support vector machines in reproducing kernel Banach spaces with reproducing kernels defined on nonsymmetric domains, rather than with the traditional methods in reproducing kernel Hilbert spaces. Using the orthogonality of semi-inner-products, we can obtain the explicit representations of the dual (normalized-duality-mapping) elements of support vector machine solutions. In addition, we can introduce the reproduction property in a generalized native space by Fourier transform techniques such that it becomes a reproducing kernel Banach space, which can be even embedded into Sobolev spaces, and its reproducing kernel is set up by the related positive definite function. The representations of the optimal solutions of support vector machines (regularized empirical risks) in these reproducing kernel Banach spaces are formulated explicitly in terms of positive definite functions, and their finite numbers of coefficients can be computed by fixed point iteration. We also give some typical examples of reproducing kernel Banach spaces induced by Matérn functions (Sobolev splines) so that their support vector machine solutions are as readily computable as those of the classical algorithms. Moreover, each of their reproducing bases includes information from multiple training data points. The concept of reproducing kernel Banach spaces offers us a new numerical tool for solving support vector machines.
Guaranteed Conservative Fixed Width Confidence Intervals via Monte Carlo Sampling
Fred J. Hickernell, Lan Jiang, Yuewei Liu, Art Owen
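AI summary: Proposes a two-stage algorithm based on a kurtosis upper bound and a Berry-Esseen inequality. It estimates the variance from an initial sample and determines the required sample size, yielding a conservative confidence interval with a prescribed half-width.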
Monte Carlo methods are used to approximate the means, $\mu$, of random variables $Y$, whose distributions are not known explicitly. The key idea is that the average of a random sample, $Y_1, \ldots, Y_n$, tends to $\mu$ as $n$ tends to infinity. This article explores how one can reliably construct a confidence interval for $\mu$ with a prescribed half-width (or error tolerance) $\varepsilon$. Our proposed two-stage algorithm assumes that the kurtosis of $Y$ does not exceed some user-specified bound. An initial independent and identically distributed (IID) sample is used to confidently estimate the variance of $Y$. A Berry-Esseen inequality then makes it possible to determine the size of the IID sample required to construct the desired confidence interval for $\mu$. We discuss the important case where $Y=f(\mathbf{X})$ and $\mathbf{X}$ is a random $d$-vector with probability density function $\rho$. In this case $\mu$ can be interpreted as the integral $\int_{\mathbb{R}^d} f(\mathbf{x}) \rho(\mathbf{x}) \,\mathrm{d}\mathbf{x}$, and the Monte Carlo method becomes a method for multidimensional cubature.
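A simplified two-stage sketch. It assumes a user-chosen variance inflation factor; in the paper this factor is tied to the kurtosis bound. The second stage uses Chebyshev's inequality instead of the paper's Berry-Esseen argument, which makes the sample size more conservative. The stated level only holds on the event that the inflated pilot variance really bounds the true variance. The callable `sample(n, rng)` is a hypothetical user-supplied generator of n IID draws of Y.

```python
import math

import numpy as np


def two_stage_mc(sample, eps, alpha=0.05, n0=1000, inflate=1.2, seed=0):
    """Estimate E[Y] to within half-width eps (simplified two-stage scheme).

    Returns (estimate, n) where n is the second-stage sample size.
    """
    rng = np.random.default_rng(seed)
    # Stage 1: pilot sample -> inflated, hopefully conservative, variance bound
    sigma2_up = inflate**2 * sample(n0, rng).var(ddof=1)
    # Stage 2: Chebyshev, P(|mean - mu| >= eps) <= sigma^2 / (n eps^2) <= alpha
    n = max(1, math.ceil(sigma2_up / (alpha * eps**2)))
    return sample(n, rng).mean(), n
```

For cubature, `sample` can return f evaluated at uniform points. For example, `lambda n, rng: f(rng.random((n, d)))` works for a vectorized f on $[0,1]^d$.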
Singular Vector Perturbation under Gaussian Noise
Rongrong Wang
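AI summary: Gives a non-asymptotic analysis of the distribution of singular vectors under Gaussian noise. Sufficient conditions are provided for the leading singular vectors of a matrix to be approximately normally distributed, which is useful for error analysis in linear dimension reduction.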
We perform a non-asymptotic analysis on the singular vector distribution under Gaussian noise. In particular, we provide sufficient conditions on a matrix for its first few singular vectors to have near normal distribution. Our result can be used to facilitate the error analysis in linear dimension reduction.
Fast Markov Chain Monte Carlo Sampling for Sparse Bayesian Inference in High-Dimensional Inverse Problems Using L1-Type Priors
Felix Lucka
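AI summary: For sparse Bayesian inference in high-dimensional inverse problems, proposes a single-component Gibbs MCMC sampler for L1-type priors. Its efficiency improves as the sparsity or the dimension of the unknowns increases, overcoming the infeasibility of standard Metropolis-Hastings samplers in high-dimensional sparse settings.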
Sparsity has become a key concept for solving high-dimensional inverse problems using variational regularization techniques. Recently, using similar sparsity constraints in the Bayesian framework for inverse problems by encoding them in the prior distribution has attracted attention. Important questions about the relation between regularization theory and Bayesian inference still need to be addressed when using sparsity promoting inversion. A practical obstacle for these examinations is the lack of fast posterior sampling algorithms for sparse, high-dimensional Bayesian inversion: Accessing the full range of Bayesian inference methods requires being able to draw samples from the posterior probability distribution in a fast and efficient way. This is usually done using Markov chain Monte Carlo (MCMC) sampling algorithms. In this article, we develop and examine a new implementation of a single component Gibbs MCMC sampler for sparse priors relying on L1-norms. We demonstrate that the efficiency of our Gibbs sampler increases when the level of sparsity or the dimension of the unknowns is increased. This property is contrary to the properties of the most commonly applied Metropolis-Hastings (MH) sampling schemes: We demonstrate that the efficiency of MH schemes for L1-type priors dramatically decreases when the level of sparsity or the dimension of the unknowns is increased. Practically, Bayesian inversion for L1-type priors using MH samplers is not feasible at all. As this is commonly believed to be an intrinsic feature of MCMC sampling, the performance of our Gibbs sampler also challenges common beliefs about the applicability of sample based Bayesian inference.
Tensor Decompositions for Learning Latent Variable Models
Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky
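AI summary: Exploits the tensor structure of low-order observable moments to estimate the parameters of latent variable models such as Gaussian mixture models and hidden Markov models. Estimation goes through a symmetric tensor decomposition, a generalization of the matrix SVD, and the paper gives a detailed analysis of a robust tensor power method.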
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
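A minimal sketch of a tensor power method with random restarts and deflation, in the spirit of the robust method described above. It applies to a symmetric third-order tensor $T \approx \sum_i w_i\, v_i \otimes v_i \otimes v_i$ with orthonormal $v_i$. In the moment-based setting such a tensor is obtained after a whitening step, which is omitted here. The update $u \leftarrow T(I, u, u)/\|T(I, u, u)\|$ is the tensor analogue of matrix power iteration. The restart and iteration counts are illustrative defaults, not the values from the paper's analysis.

```python
import numpy as np


def tensor_power_method(T, k, n_starts=10, n_iter=100, seed=0):
    """Greedy power iterations with deflation for a symmetric 3rd-order tensor.

    Returns (weights, vectors) estimating T ~= sum_i w_i v_i (x) v_i (x) v_i.
    """
    rng = np.random.default_rng(seed)
    T = T.copy()
    d = T.shape[0]
    weights, vectors = [], []
    for _ in range(k):
        best_w, best_u = -np.inf, None
        for _ in range(n_starts):
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            for _ in range(n_iter):
                u = np.einsum('ijk,j,k->i', T, u, u)   # u <- T(I, u, u)
                u /= np.linalg.norm(u)
            w = np.einsum('ijk,i,j,k->', T, u, u, u)   # T(u, u, u)
            if w > best_w:                             # keep the best restart
                best_w, best_u = w, u
        weights.append(best_w)
        vectors.append(best_u)
        # Deflate: remove the recovered rank-one component
        T -= best_w * np.einsum('i,j,k->ijk', best_u, best_u, best_u)
    return np.array(weights), np.array(vectors)
```

For an orthogonally decomposable tensor the iteration converges quadratically to one component. Keeping the restart with the largest $T(u,u,u)$ picks out the component with the largest weight first.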
Competitive Comparison of Optimal Designs of Experiments for Sampling-Based Sensitivity Analysis
Eliska Janouchova, Anna Kucerova
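AI summary: Reviews and compares quality criteria for designs of experiments used in sampling-based sensitivity analysis, with the aim of improving the accuracy of sensitivity predictions.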
Nowadays, numerical models of real-world structures are more precise, more complex and, of course, more time-consuming. Despite the growth in computational power, exploring model behaviour remains a complex task. Sensitivity analysis is a basic tool for investigating the sensitivity of a model to its inputs. One widely used strategy to assess the sensitivity is based on a finite set of simulations for given sets of input parameters, i.e. points in the design space. An estimate of the sensitivity can then be obtained by computing correlations between the input parameters and the chosen response of the model. The accuracy of the sensitivity prediction depends on the choice of design points, called the design of experiments. The aim of the presented paper is to review and compare available criteria determining the quality of designs of experiments suitable for sampling-based sensitivity analysis.
Confidence-Based Optimisation for the Newsvendor Problem
Roberto Rossi, Steven Prestwich, S. Armagan Tarim, Brahim Hnich
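AI summary: Addresses demand estimation in single-item, single-period stochastic inventory optimisation by combining confidence interval analysis with inventory optimisation. With a prescribed confidence probability, this identifies a range of candidate order quantities containing the true optimal order quantity, together with upper and lower cost bounds.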
We introduce a novel strategy to address the issue of demand estimation in single-item single-period stochastic inventory optimisation problems. Our strategy analytically combines confidence interval analysis and inventory optimisation. We assume that the decision maker is given a set of past demand samples and we employ confidence interval analysis in order to identify a range of candidate order quantities that, with prescribed confidence probability, includes the real optimal order quantity for the underlying stochastic demand process with unknown stationary parameter(s). In addition, for each candidate order quantity that is identified, our approach can produce an upper and a lower bound for the associated cost. We apply our novel approach to three demand distributions in the exponential family: binomial, Poisson, and exponential. For two of these distributions we also discuss the extension to the case of unobserved lost sales. Numerical examples are presented in which we show how our approach complements existing frequentist (e.g. based on maximum likelihood estimators) or Bayesian strategies.
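A minimal sketch of the candidate-range idea for Poisson demand. It uses an approximate (Wald) confidence interval for the unknown rate rather than the exact intervals a careful implementation would use, and it ignores the cost bounds. Poisson quantiles are nondecreasing in the rate. Plugging the interval endpoints into the newsvendor critical fractile $c_u/(c_u+c_o)$ therefore gives a range of order quantities. The names `cu` and `co` (underage and overage costs) and both helper functions are introduced here for illustration.

```python
import math


def poisson_quantile(p, lam):
    """Smallest k with P(D <= k) >= p for D ~ Poisson(lam), 0 < p < 1.

    Fine for moderate lam (exp(-lam) underflows for lam above roughly 700).
    """
    k = 0
    term = math.exp(-lam)  # P(D = 0)
    cdf = term
    while cdf < p:
        k += 1
        term *= lam / k    # P(D = k) from P(D = k - 1)
        cdf += term
    return k


def candidate_order_range(demands, cu, co, z=1.96):
    """Range of newsvendor order quantities for Poisson demand with unknown rate."""
    n = len(demands)
    lam_hat = sum(demands) / n
    half = z * math.sqrt(lam_hat / n)           # approximate (Wald) half-width
    lam_lo, lam_hi = max(lam_hat - half, 1e-12), lam_hat + half
    ratio = cu / (cu + co)                      # critical fractile
    return poisson_quantile(ratio, lam_lo), poisson_quantile(ratio, lam_hi)
```

For example, `candidate_order_range([4, 6, 5, 5, 3, 7, 5, 5], cu=3, co=1)` returns a pair (low, high) of integer order quantities. A larger sample narrows the range.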
The Algebraic Combinatorial Approach for Low-Rank Matrix Completion
Franz J. Király, Louis Theran, Ryota Tomioka
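AI summary: Proposes an algebraic combinatorial view of low-rank matrix completion, based on algebraic geometry and matroid theory, that studies relations among a few entries. It gives probability-one algorithms to decide whether a given entry can be completed, to complete it from a few other entries, and to estimate the completion error.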
We present a novel algebraic combinatorial view on low-rank matrix completion based on studying relations between a few entries with tools from algebraic geometry and matroid theory. The intrinsic locality of the approach allows for the treatment of single entries in a closed theoretical and practical framework. More specifically, apart from introducing an algebraic combinatorial theory of low-rank matrix completion, we present probability-one algorithms to decide whether a particular entry of the matrix can be completed. We also describe methods to complete that entry from a few others, and to estimate the error which is incurred by any method completing that entry. Furthermore, we show how known results on matrix completion and their sampling assumptions can be related to our new perspective and interpreted in terms of a completability phase transition.
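The simplest instance of a relation among a few entries is the rank-one case. Every 2x2 minor of a rank-one matrix vanishes, so an entry can be completed from three observed entries that close a 2x2 pattern with a nonzero pivot. This is a minimal sketch with a hypothetical helper name. General ranks need the larger minors and the matroid machinery described above.

```python
def complete_rank1_entry(M, i, j):
    """Complete entry (i, j) of a partially observed rank-one matrix.

    M: list of lists with None marking missing entries. Uses observed
    M[i][l], M[k][j], M[k][l] (with M[k][l] != 0) and the vanishing 2x2 minor
    M[i][j] * M[k][l] - M[i][l] * M[k][j] = 0.
    Returns None if no such closed 2x2 pattern is observed.
    """
    for k in range(len(M)):
        for l in range(len(M[0])):
            if k == i or l == j:
                continue
            a, b, c = M[i][l], M[k][j], M[k][l]
            if a is not None and b is not None and c not in (None, 0):
                return a * b / c
    return None
```

Returning `None` when no closed pattern is observed mirrors, in the smallest possible case, the distinction between completable and non-completable entries.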
Developing Methods for Identifying the Inflection Point of a Convex/Concave Curve
Demetris T. Christopoulos
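AI summary: Proposes two methods that use geometric properties, expressed through relations among chords, to identify the true inflection point of data with or without error. They are validated on sigmoid curves and cubic polynomials.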
We introduce two methods for revealing the true inflection point of data with or without error. The starting point is a set of geometrical properties that follow from the existence of an inflection point p for a smooth function. These properties connect the convexity/concavity before and after p, respectively, with three properly defined chords. Finally, a set of experiments is presented for the class of sigmoid curves and for third order polynomials.
Goal-Oriented Error Estimation for the Reduced Basis Method, with Application to Sensitivity Analysis
Alexandre Janon, Maëlle Nodet, Clémentine Prieur
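AI summary: Proposes a new probabilistic error bound for linear functionals of the solution in the reduced basis method. The bound is sharper than existing ones, efficiently computable, and applied to sensitivity analysis.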
The reduced basis method is a powerful model reduction technique designed to speed up the computation of multiple numerical solutions of parametrized partial differential equations. We consider a quantity of interest, which is a linear functional of the PDE solution. A new probabilistic error bound for the reduced model is proposed. It is efficiently and explicitly computable, and we show on different examples that this error bound is sharper than existing ones. We include application of our work to sensitivity analysis studies.
Missing Entries Matrix Approximation and Completion
Gil Shabat, Yaniv Shmueli, Amir Averbuch
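AI summary: For matrices with only some entries known, proposes a family of algorithms for matrix completion and approximation under various constraints such as low rank, nuclear norm and spectral norm. The summary notes that global convergence is proved in the convex case, that no parameters are required, and that the methods apply to image reconstruction and to recovering data from partial differential equations.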
We describe several algorithms for matrix completion and matrix approximation when only some of the entries of the matrix are known. The approximation constraint can be any constraint whose approximate solution is known for the full matrix. For low-rank approximation, similar algorithms have recently appeared in the literature under different names. In this work, we introduce new theorems for matrix approximation and show that these algorithms can be extended to handle constraints other than low rank, such as nuclear norm, spectral norm and orthogonality constraints. Since the algorithms can be viewed from an optimization point of view, we discuss their convergence to a global solution in the convex case. We also discuss the optimal step size and show that it is fixed in each iteration. In addition, the derived matrix completion flow is robust and does not require any parameters. This matrix completion flow applies to various spectral minimization problems and to problems in physics, mathematics and electrical engineering. Examples include reconstructing image data and data arising from PDEs such as the Helmholtz equation for electromagnetic waves.
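Here is a minimal sketch of the alternating idea, assuming the constraint is rank one. Each pass approximates the current filled-in matrix under the constraint, using the best rank-one approximation computed by power iteration. It then restores the known entries and repeats. The helper names `best_rank1` and `complete_rank1` are hypothetical. This plain alternating-projection loop is not claimed to be the paper's flow, and it does not implement the paper's optimal step size.

```python
import math

def best_rank1(X, v, iters=30):
    """Best rank-one approximation of X via power iteration on X^T X, warm-started at v."""
    m, n = len(X), len(X[0])
    for _ in range(iters):
        Xv = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(X[i][j] * Xv[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]  # unit-norm estimate of the top right singular vector
    Xv = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]
    return [[Xv[i] * v[j] for j in range(n)] for i in range(m)], v

def complete_rank1(M, outer=200):
    """Fill the None entries of M by alternating rank-one approximation and data consistency."""
    m, n = len(M), len(M[0])
    known = [[M[i][j] is not None for j in range(n)] for i in range(m)]
    X = [[M[i][j] if known[i][j] else 0.0 for j in range(n)] for i in range(m)]
    v = [1.0] * n
    for _ in range(outer):
        A, v = best_rank1(X, v)
        # Keep the observed entries; take the low-rank estimate everywhere else.
        X = [[M[i][j] if known[i][j] else A[i][j] for j in range(n)] for i in range(m)]
    return X

# Rank-one matrix u v^T with u = (1, 2, 3) and v = (1, 2, 3, 4); two entries are hidden.
M = [[1, 2, 3, None],
     [2, 4, 6, 8],
     [None, 6, 9, 12]]
X = complete_rank1(M)
print(round(X[0][3], 4), round(X[2][0], 4))
```

Swapping `best_rank1` for another projection changes the constraint being enforced. For example, a projection onto matrices with bounded spectral norm would enforce that constraint instead.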
Accuracy Guarantees for L1-Recovery
Anatoli Iouditski, Arkadii S. Nemirovski
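AI summary: Proposes two new methods for sparse signal recovery based on $\ell_1$-minimization. Their verifiable performance guarantees, once optimized, yield estimators with better statistical properties than commonly used ones. Accuracy bounds are computed with a non-Euclidean basis pursuit algorithm.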
We discuss two new methods for recovering sparse signals from noisy observations based on $\ell_1$-minimization. They are closely related to well-known techniques such as the Lasso and the Dantzig Selector. However, these estimators come with efficiently verifiable performance guarantees. By optimizing these bounds with respect to the method parameters, we are able to construct estimators with better statistical properties than the commonly used ones. We also show how these techniques provide efficiently computable accuracy bounds for the Lasso and the Dantzig Selector. We link our performance estimates to well-known results in compressive sensing. We justify the proposed approach with an oracle inequality that relates the properties of the recovery algorithms to the best estimation performance achievable when the signal support is known. We demonstrate how the estimates can be computed using the Non-Euclidean Basis Pursuit algorithm.
Proximal Newton-Type Methods for Minimizing Composite Functions
Jason D. Lee, Yuekai Sun, Michael A. Saunders
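AI summary: For minimizing the sum of a smooth function and a nonsmooth convex function, proposes proximal Newton-type methods. It proves that they retain the convergence behavior of Newton-type methods even with inexact search directions, and it unifies several popular methods from bioinformatics, signal processing and statistical learning.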
We generalize Newton-type methods for minimizing smooth functions to handle a sum of two convex functions: a smooth function and a nonsmooth function with a simple proximal mapping. We show that the resulting proximal Newton-type methods inherit the desirable convergence behavior of Newton-type methods for minimizing smooth functions, even when search directions are computed inexactly. Many popular methods tailored to problems arising in bioinformatics, signal processing, and statistical learning are special cases of proximal Newton-type methods, and our analysis yields new convergence results for some of these methods.
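Replacing the Hessian approximation by a scaled identity $I/\text{step}$ turns a proximal Newton-type step into a proximal gradient step. Plain ISTA is therefore the simplest member of this family. The sketch below applies it to a small Lasso problem, where the proximal mapping of $\lambda\|x\|_1$ is soft thresholding. The function names and the fixed step size are assumptions. A genuine proximal Newton-type method would instead solve a scaled subproblem using (quasi-)Newton curvature, possibly inexactly.

```python
import math

def soft_threshold(z, tau):
    """Proximal mapping of tau*||x||_1: elementwise soft thresholding."""
    return [math.copysign(max(abs(zi) - tau, 0.0), zi) for zi in z]

def ista_lasso(A, b, lam, step, iters=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 with proximal gradient steps.

    This is the proximal Newton-type step with the Hessian replaced by I/step;
    `step` should not exceed 1/L, where L is the largest eigenvalue of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step on the smooth part, then the prox of the nonsmooth part.
        x = soft_threshold([x[j] - step * grad[j] for j in range(n)], step * lam)
    return x

A = [[2.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
b = [2.0, 1.0, 0.0]
print(ista_lasso(A, b, lam=0.3, step=0.18, iters=2000))
```

Here $A^\top A$ has largest eigenvalue about 5.3, so a step of 0.18 stays below $1/L$.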
Derivation of the Maximum a Posteriori Estimate for Discrete-Time Descriptor Systems
Ali Al-Matouq
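AI summary: Uses the Kronecker canonical transformation to derive the MAP state estimation objective for general discrete-time descriptor systems. For index-1 causal descriptor systems, the MAP estimate requires no model transformation and can be computed recursively. For causal systems of index 2 or higher, it can also be computed recursively once causality is accounted for in the design of the stochastic model.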
In this report, we present a derivation of the MAP state estimator objective function for general (possibly non-square) discrete-time causal/non-causal descriptor systems. The derivation uses the Kronecker Canonical Transformation to extract the prior distribution of the descriptor state vector, so that maximum a posteriori (MAP) point estimation can be applied. The analysis indicates that the MAP estimate for index-1 causal descriptor systems does not require any model transformation and can be computed recursively. Furthermore, suppose the descriptor system is of index 2 or higher and the noise-free system is causal. Then the MAP estimate can also be computed recursively without model transformations, provided that model causality is accounted for in designing the stochastic model.
Self-Similar Prior and Wavelet Bases for Hidden Incompressible Turbulent Motion
Patrick Héas, Frédéric Lavancier, Souleymane Kadri-Harouna
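AI summary: For the ill-posed inverse problem of estimating turbulence from image sequences, proposes a self-similar prior based on divergence-free isotropic fractional Brownian motion and uses wavelet bases to solve the problem efficiently.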
This work is concerned with the ill-posed inverse problem of estimating turbulent flows from an observed image sequence. From a Bayesian perspective, a divergence-free isotropic fractional Brownian motion (fBm) is chosen as the prior model for instantaneous turbulent velocity fields. This self-similar prior accurately characterizes the second-order statistics of velocity fields in incompressible isotropic turbulence. However, the associated maximum a posteriori estimate involves a fractional Laplacian operator, which is delicate to implement in practice. To deal with this issue, we propose decomposing the divergence-free fBm on well-chosen wavelet bases. As a first alternative, we design wavelets as whitening filters and show that these filters are fractional Laplacian wavelets composed with the Leray projector. As a second alternative, we use a divergence-free wavelet basis, which implicitly accounts for the incompressibility constraint arising from physics. Although the latter decomposition involves correlated wavelet coefficients, we are able to handle this dependence in practice. Based on these two wavelet decompositions, we finally provide effective and efficient algorithms to approximate the maximum a posteriori estimate. An extensive numerical evaluation demonstrates the relevance of the proposed wavelet-based self-similar priors.
Quantification of Uncertainty from High-Dimensional Scattered Data via Polynomial Approximation
Lionel Mathelin
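AI summary: Proposes a method for building a functional representation of a random process from a limited number of high-dimensional scattered samples. Basis functions are selected using the High-Dimensional Model Representation format combined with a modified Least Angle Regression. Coefficients are estimated by alternating least squares or alternating weighted total least squares. The resulting approximations remain accurate even with very few samples (about 3 per dimension) and in the presence of noise.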
This paper discusses a methodology for determining a functional representation of a random process from a collection of scattered pointwise samples. The present work focuses specifically on random quantities lying in a high-dimensional stochastic space when only a limited amount of information is available. The proposed approach involves a procedure for selecting an approximation basis and evaluating the associated coefficients. The selection of the approximation basis relies on an a priori choice of the High-Dimensional Model Representation format combined with a modified Least Angle Regression technique. The resulting basis then provides the structure for the actual approximation basis. That basis may use different functions and is more parsimonious and nonlinear in its coefficients. To evaluate the coefficients, both an alternating least squares method and an alternating weighted total least squares method are employed. Examples are provided for the approximation of a random variable in a high-dimensional space and for the estimation of a random field. Stochastic dimensions up to 100 are considered, with as little information as about 3 samples per dimension, and the approximation is shown to be robust to noise in the dataset. The computational cost of the solution method is shown to scale only linearly with the cardinality of the a priori basis. It scales as $(N_q)^s$, with $2 \le s \le 3$, in the number $N_q$ of samples in the dataset. The numerical experiments illustrate the ability of the present approach to derive an accurate approximation from scarce scattered data, even in the presence of noise.
Non-Asymptotic Analysis of Tangent Space Perturbation
Daniel N. Kaslovsky, Francois G. Meyer
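AI summary: Uses non-asymptotic random matrix theory to study the stability of tangent spaces estimated by local principal component analysis, and gives perturbation bounds that can be used in practice.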
Constructing an efficient parameterization of a large, noisy data set of points lying close to a smooth manifold in high dimension remains a fundamental problem. One approach consists of recovering a local parameterization using the local tangent plane. Principal component analysis (PCA) is often the tool of choice, as it returns an optimal basis in the case of noise-free samples from a linear subspace. To process noisy data samples from a nonlinear manifold, PCA must be applied locally. The scale must be small enough that the manifold is approximately linear, but large enough that structure can be discerned from noise. Using eigenspace perturbation theory and non-asymptotic random matrix theory, we study the stability of the subspace estimated by PCA as a function of scale, and bound (with high probability) the angle it forms with the true tangent space. By adaptively selecting the scale that minimizes this bound, our analysis reveals an appropriate scale for local tangent plane recovery. We also introduce a geometric uncertainty principle quantifying the limits of noise-curvature perturbation for stable recovery. To provide perturbation bounds that can be used in practice, we propose plug-in estimates that make it possible to apply the theoretical results directly to real data sets.
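A two-dimensional toy version of local PCA tangent estimation makes the procedure concrete. It samples noisy points near the unit circle and keeps those within radius $r$ of a reference point. The leading principal axis of their covariance is then the tangent estimate. For a 2-by-2 covariance with entries $a, b, c$ (variances $a, c$, covariance $b$), that axis has angle $\tfrac12\operatorname{atan2}(2b, a-c)$. The noise level, radius, sample size and the helper name `local_tangent_angle` are assumptions for illustration. The sketch does not compute the paper's bound or its adaptive scale selection.

```python
import math
import random

def local_tangent_angle(points, center, radius):
    """Angle (radians) of the leading principal axis of the points within `radius` of `center`."""
    local = [(x, y) for x, y in points if math.hypot(x - center[0], y - center[1]) <= radius]
    if len(local) < 2:
        raise ValueError("too few points at this scale")
    mx = sum(x for x, _ in local) / len(local)
    my = sum(y for _, y in local) / len(local)
    a = sum((x - mx) ** 2 for x, _ in local) / len(local)
    c = sum((y - my) ** 2 for _, y in local) / len(local)
    b = sum((x - mx) * (y - my) for x, y in local) / len(local)
    # Principal axis of the 2x2 covariance [[a, b], [b, c]].
    return 0.5 * math.atan2(2 * b, a - c)

rng = random.Random(1)
noise = 0.02
points = []
for _ in range(4000):
    phi = rng.uniform(0, 2 * math.pi)
    points.append((math.cos(phi) + rng.gauss(0, noise), math.sin(phi) + rng.gauss(0, noise)))

# The true tangent at (1, 0) is vertical, i.e. at angle pi/2 (equivalently -pi/2).
est = local_tangent_angle(points, (1.0, 0.0), radius=0.3)
print(est)
```

The radius is fixed by hand here. Choosing it well, neither so small that noise dominates nor so large that curvature distorts the estimate, is the scale question the abstract addresses.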
Exact Methods for Multistage Estimation of a Binomial Proportion
Zhengjia Chen, Xinjia Chen
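AI summary: Reviews existing sequential methods for estimating a binomial proportion. It then proposes a new family of group sequential sampling schemes that, for a given margin of error and confidence level, achieve uniform control of the coverage probability and asymptotic optimality, and derives analytic bounds on the sample number.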
We first review existing sequential methods for estimating a binomial proportion. We then propose a new family of group sequential sampling schemes for estimating a binomial proportion with a prescribed margin of error and confidence level. In particular, we establish the uniform controllability of the coverage probability and the asymptotic optimality of this family of sampling schemes. Our theoretical results show that the parameters of these sampling schemes can be determined so that the prescribed confidence level is guaranteed with little waste of samples. Analytic bounds for the cumulative distribution functions and the expectations of sample numbers are derived. Moreover, we discuss the inherent connections among various sampling schemes. Numerical issues are addressed to improve the accuracy and efficiency of computation. Computational experiments are conducted to compare sampling schemes, and illustrative examples are given for applications in clinical trials.
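For orientation, a conservative fixed-sample baseline can be computed in closed form. Assume the Chernoff-Hoeffding bound $\Pr(|\hat p - p| \ge \varepsilon) \le 2e^{-2n\varepsilon^2}$. Then a margin of error $\varepsilon$ at confidence level $1-\delta$ is guaranteed once $n \ge \ln(2/\delta)/(2\varepsilon^2)$. This is a standard bound shown for comparison only. It is not the group sequential scheme of the paper, and the function name `hoeffding_sample_size` is hypothetical.

```python
import math

def hoeffding_sample_size(margin, delta):
    """Smallest n with 2*exp(-2*n*margin**2) <= delta (fixed sample size, conservative)."""
    if not (0 < margin < 1 and 0 < delta < 1):
        raise ValueError("margin and delta must lie in (0, 1)")
    return math.ceil(math.log(2 / delta) / (2 * margin ** 2))

# Margin of error 0.05 at 95% confidence.
print(hoeffding_sample_size(0.05, 0.05))
```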
Robust Estimation of Mean Values
Xinjia Chen
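AI summary: Proposes a computational approach based on a sample reuse technique that efficiently computes upper and lower bounds of the mean under mild assumptions, and shows that its computational complexity follows a Poisson distribution.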
In this paper, we develop a computational approach for estimating the mean value of a quantity in the presence of uncertainty. We demonstrate that, under some mild assumptions, the upper and lower bounds of the mean value can be computed efficiently via a sample reuse technique, whose computational complexity is shown to follow a Poisson distribution.
Risk Analysis in Robust Control: Making the Case for Probabilistic Robust Control
Xinjia Chen, Jorge Aravena, Kemin Zhou
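AI summary: Critiques the "worst-case" approach in robust control design. Using case analysis, it argues that probabilistic approaches may control risk better, and it quantifies the high risk of worst-case design.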
This paper offers a critical view of the "worst-case" approach that is the cornerstone of robust control design. It is our contention that a blind acceptance of worst-case scenarios may lead to designs that are actually more dangerous than designs based on probabilistic techniques with a built-in risk factor. The real issue is one of modeling. If one accepts that no mathematical model of uncertainties is perfect, then a probabilistic approach can lead to more reliable control, even if it cannot guarantee stability for all possible cases. Our presentation is based on case analysis. We first establish that the worst case is not necessarily "all-encompassing." In fact, we show that for some uncertain control problems, a conventional robust control solution exists only under assumptions that leave out some feasible cases. Having established this point, we argue that the risk from unaccounted cases in worst-case design is not uncommonly greater than the risk accepted in a probabilistic approach. Using an example, we quantify the risks and show that worst-case design can be significantly riskier. Finally, we combine our analysis with existing results on computational complexity and probabilistic robustness to argue that deterministic worst-case analysis is not necessarily the better tool.
MM Algorithms for Geometric and Signomial Programming
Kenneth Lange, Hua Zhou
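AI summary: Within the MM algorithm framework, uses the geometric-arithmetic mean inequality and a supporting hyperplane inequality to reduce unconstrained signomial programming to a sequence of one-dimensional minimization problems. The approach extends to constrained problems and is especially well suited to quadratic programming.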
This paper derives new algorithms for signomial programming, a generalization of geometric programming. The algorithms are based on a generic principle for optimization called the MM algorithm. In this setting, one can apply the geometric-arithmetic mean inequality and a supporting hyperplane inequality to create a surrogate function with parameters separated. Thus, unconstrained signomial programming reduces to a sequence of one-dimensional minimization problems. Simple examples demonstrate that the derived MM algorithm can converge to a boundary point or to one point of a continuum of minimum points. For geometric programming, we prove conditions under which the minimum point is unique or lies in the interior of the parameter space. Convergence to an interior point occurs at a linear rate. Finally, the MM framework easily accommodates equality and inequality constraints of signomial type. For the most important special case, constrained quadratic programming, the MM algorithm involves very simple updates.
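A toy posynomial shows how the mean inequality separates parameters. Consider minimizing $f(x,y) = xy + 1/x + 1/y$ over $x, y > 0$. At the current iterate $(x_n, y_n)$, the coupled term is majorized by $xy \le \tfrac{x_n y_n}{2}\big[(x/x_n)^2 + (y/y_n)^2\big]$, with equality at the iterate. The surrogate then splits into two one-dimensional problems. Their minimizers are $x_{n+1} = (x_n/y_n)^{1/3}$ and $y_{n+1} = (y_n/x_n)^{1/3}$. The objective and the starting point are our own choices for illustration. Because of the MM property, the objective should never increase, and the iterates should approach the minimizer $(1, 1)$, where $f = 3$.

```python
def objective(x, y):
    """Toy posynomial f(x, y) = x*y + 1/x + 1/y on the positive quadrant."""
    return x * y + 1 / x + 1 / y

def mm_step(x, y):
    """Minimize the AM-GM surrogate, which is separable in x and y, in closed form."""
    return (x / y) ** (1 / 3), (y / x) ** (1 / 3)

x, y = 4.0, 0.25
history = [objective(x, y)]
for _ in range(40):
    x, y = mm_step(x, y)
    history.append(objective(x, y))
print(x, y, history[-1])
```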
Recovering Non-Negative and Combined Sparse Representations
Karthikeyan Natesan Ramamurthy, Jayaraman J. Thiagarajan, Andreas Spanias
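AI summary: Based on polytope theory, studies conditions under which underdetermined linear systems have unique non-negative solutions. It proposes the combined sparse representation paradigm and a corresponding combined orthogonal matching pursuit algorithm for recovering the unique sparsest coefficient vector.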
The non-negative solution to an underdetermined linear system can sometimes be uniquely recovered, even without imposing any additional sparsity constraints. In this paper, we derive conditions, based on the theory of polytopes, under which such a system has a unique non-negative solution. Furthermore, we develop the paradigm of combined sparse representations, where only part of the coefficient vector is constrained to be non-negative and the rest is unconstrained (general). We analyze the recovery of the unique sparsest solution for combined representations under three cases of coefficient support knowledge:
(a) the non-zero supports of both the non-negative and the general coefficients are known;
(b) only the non-zero support of the general coefficients is known;
(c) neither non-zero support is known.
For case (c), we propose the combined orthogonal matching pursuit algorithm for coefficient recovery. We also derive the deterministic sparsity threshold below which the unique sparsest coefficient vector can be recovered. We quantify the order complexity of the algorithms and examine their performance in exact and approximate recovery of coefficients under various noise conditions. We also obtain their empirical phase transition characteristics. We show that the basis pursuit algorithm with partial non-negative constraints and the proposed greedy algorithm recover the unique sparse representation better than their unconstrained counterparts. Finally, we demonstrate the utility of the proposed methods in recovering images corrupted by saturation noise.
Integrated Pre-Processing for Bayesian Nonlinear System Identification with Gaussian Processes
Roger Frigola, Carl Edward Rasmussen
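AI summary: Proposes the GP-FNARX model, which integrates data pre-processing with sparse Gaussian process regression into an automated pipeline from raw data to an identified model. Pre-processing parameters and hyperparameters are tuned jointly by maximizing the marginal likelihood, yielding a Bayesian dynamics model that reports its uncertainty.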
We introduce GP-FNARX, a new model for nonlinear system identification. It is based on a nonlinear autoregressive exogenous model (NARX) with filtered regressors (F), and the nonlinear regression problem is tackled using sparse Gaussian processes (GP). We integrate data pre-processing with system identification into a fully automated procedure that goes from raw data to an identified model. Both the pre-processing parameters and the GP hyper-parameters are tuned by maximizing the marginal likelihood of the probabilistic model. We obtain a Bayesian model of the system's dynamics that is able to report its uncertainty in regions where data are scarce. The automated approach, the modeling of uncertainty and the relatively low computational cost make GP-FNARX a good candidate for applications in robotics and adaptive control.
The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond
Aurélien Garivier, Olivier Cappé
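AI summary: Presents the KL-UCB algorithm and proves through finite-time analysis that it improves on UCB and UCB2 for bounded rewards and attains the Lai and Robbins lower bound for Bernoulli rewards. The algorithm extends to exponential-family distributions, and numerical experiments show it to be efficient and stable.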
This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results. First, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB or UCB2. Second, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins. Furthermore, we show that simple adaptations of the KL-UCB algorithm are also optimal for specific classes of (possibly unbounded) rewards, including those generated from exponential families of distributions. A large-scale numerical study compares KL-UCB with its main competitors (UCB, UCB2, UCB-Tuned, UCB-V, DMED). It shows that KL-UCB is remarkably efficient and stable, including for short time horizons. KL-UCB is also the only method that always performs better than the basic UCB policy. Our regret bounds rely on deviation results of independent interest, which are stated and proved in the Appendix. As a by-product, we also obtain an improved regret bound for the standard UCB algorithm.
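For Bernoulli rewards, the KL-UCB index of an arm is defined from its empirical mean $\hat\mu$ after $N$ pulls. It is the largest $q \in [\hat\mu, 1]$ with $N\,\mathrm{kl}(\hat\mu, q) \le \log t + c\log\log t$, where kl is the Bernoulli Kullback-Leibler divergence. Because kl is increasing in $q$ on that interval, bisection finds the index. The sketch below rests on several assumptions: $c = 0$, a fixed bisection tolerance, and a small simulated three-arm problem. The function names are hypothetical.

```python
import math
import random

def bernoulli_kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-9):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log t + c log log t (bisection)."""
    level = math.log(t) + (c * math.log(math.log(t)) if t > math.e else 0.0)
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

def run_kl_ucb(probs, horizon, seed=0):
    """Simulate KL-UCB on Bernoulli arms and return how often each arm was played."""
    rng = random.Random(seed)
    k = len(probs)
    pulls, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialize
        else:
            arm = max(range(k), key=lambda a: kl_ucb_index(sums[a] / pulls[a], pulls[a], t))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls

print(run_kl_ucb([0.9, 0.5, 0.4], horizon=2000))
```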
Robustness Analysis of Hottopixx, a Linear Programming Model for Factoring Nonnegative Matrices
Nicolas Gillis
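AI summary: Carries out a robustness analysis of the Hottopixx linear programming model and proposes a post-processing strategy that makes it more robust to duplicates and near duplicates in the dataset.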
Although nonnegative matrix factorization (NMF) is NP-hard in general, it has recently been shown to be tractable under the assumption that the input nonnegative data matrix is close to being separable. (Separability requires that all columns of the input matrix belong to the cone spanned by a small subset of these columns.) Since then, several algorithms have been designed to handle this subclass of NMF problems. In particular, Bittorf, Recht, Ré and Tropp ('Factoring nonnegative matrices with linear programs', NIPS 2012) proposed a linear programming model, referred to as Hottopixx. In this paper, we provide a new and more general robustness analysis of their method. In particular, we design a provably more robust variant using a post-processing strategy that allows us to deal with duplicates and near duplicates in the dataset.
Greedy Sparsity-Constrained Optimization
Sohail Bahmani, Bhiksha Raj, Petros Boufounos
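AI summary: Proposes GraSP, a greedy algorithm for approximating sparse minima of cost functions of arbitrary form. Convergence to within a bounded distance of the true sparse optimum is guaranteed when the cost function has a Stable Restricted Hessian or a Stable Restricted Linearization.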
Sparsity-constrained optimization has wide applicability in machine learning, statistics, and signal processing problems such as feature selection and compressive sensing. A vast body of work has studied sparsity-constrained optimization from theoretical, algorithmic, and application perspectives in the context of sparse estimation in linear models, where the fidelity of the estimate is measured by the squared error. In contrast, relatively little effort has been devoted to sparsity-constrained optimization when nonlinear models are involved or the cost function is not quadratic. In this paper, we propose a greedy algorithm, Gradient Support Pursuit (GraSP), to approximate sparse minima of cost functions of arbitrary form. This paper introduces two conditions, the Stable Restricted Hessian (SRH) and the Stable Restricted Linearization (SRL). If a cost function satisfies either one, our algorithm is guaranteed to produce a sparse vector within a bounded distance of the true sparse optimum. Our approach generalizes known results for the quadratic cost functions that arise in sparse linear regression and compressive sensing. We also evaluate the performance of GraSP through numerical simulations on synthetic data, where the algorithm is employed for sparse logistic regression with and without $\ell_2$-regularization.
Complexity Analysis of Accelerated MCMC Methods for Bayesian Inversion
Viet Ha Hoang, Christoph Schwab, Andrew M. Stuart
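AI summary: For Bayesian inversion of a model elliptic PDE, analyzes the computational complexity of several MCMC methods and proposes two acceleration strategies, a sparse polynomial chaos surrogate and multilevel MCMC, to reduce that complexity.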
We study Bayesian inversion for a model elliptic PDE with an unknown diffusion coefficient. We provide complexity analyses of several Markov chain Monte Carlo (MCMC) methods for the efficient numerical evaluation of expectations under the Bayesian posterior distribution, given data $\delta$. Particular attention is given to bounds on the overall work required to achieve a prescribed error level $\varepsilon$. Specifically, we first bound the computational complexity of "plain" MCMC, which combines MCMC sampling with linear-complexity multilevel solvers for elliptic PDEs. Our new work-versus-accuracy bounds show that the complexity of this approach can be quite prohibitive. Two strategies for reducing the computational complexity are then proposed and analyzed. The first is a sparse, parametric and deterministic generalized polynomial chaos (gpc) "surrogate" representation of the forward response map of the PDE over the entire parameter space. The second is a novel Multi-Level Markov Chain Monte Carlo (MLMCMC) strategy that samples from a multilevel discretization of the posterior and of the forward PDE. For both strategies, we derive asymptotic bounds on work versus accuracy, and hence asymptotic bounds on the computational complexity of the algorithms. In particular, we give sufficient conditions on the regularity of the unknown PDE coefficients and on the approximation methods used. Under these conditions, the MCMC accelerations from these strategies reduce complexity relative to "plain" MCMC algorithms for Bayesian inversion of PDEs.
Convex Tensor Decomposition via Structured Schatten Norm Regularization
Ryota Tomioka, Taiji Suzuki
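AI summary: Studies structured Schatten norms for tensor decomposition via convex optimization. It shows theoretically that the "latent" approach outperforms the "overlapped" approach, and establishes duality, consistency and identifiability results.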
We discuss structured Schatten norms for tensor decomposition, which include two recently proposed norms ("overlapped" and "latent") for convex-optimization-based tensor decomposition, and connect tensor decomposition with the wider literature on structured sparsity. Based on the properties of the structured Schatten norms, we mathematically analyze the performance of the "latent" approach to tensor decomposition, which was empirically found to perform better than the "overlapped" approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific mode, this approach performs as well as if the mode with the smallest rank were known. Along the way, we show a novel duality result for structured Schatten norms, establish consistency, and discuss the identifiability of this approach. We confirm through numerical simulations that our theory precisely predicts the scaling behavior of the mean squared error.
Robust and Trend-Following Student's t Kalman Smoothers
Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto
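AI summary: Proposes a Kalman smoothing framework that models errors with the heavy-tailed Student's t distribution. The framework includes algorithms, convergence theory and an open-source implementation, and it is applied to robust smoothing and to tracking sudden changes in the state.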
We present a Kalman smoothing framework based on modeling errors with the heavy-tailed Student's t distribution, along with algorithms, convergence theory, a general open-source implementation, and several important applications. The computational effort per iteration grows linearly with the length of the time series, and all smoothers allow nonlinear process and measurement models. Robust smoothers form an important subclass of smoothers within this framework. These smoothers work in situations where measurements are highly contaminated by noise or include data unexplained by the forward model. Highly robust smoothers are developed by modeling measurement errors with the Student's t distribution. They outperform the recently proposed L1-Laplace smoother in extreme situations where 20% or more of the data are outliers. A second application, which we consider in detail, models process noise with the Student's t distribution; the resulting smoother can track sudden changes in the state. These features can be used separately or in tandem. We present a general smoother algorithm and open-source implementation, together with a convergence analysis that covers a wide range of smoothers. A key ingredient of our approach is a technique for dealing with the non-convexity of the Student's t loss function. Numerical results for linear and nonlinear models illustrate the performance of the new smoothers in robust and tracking applications, as well as in mixed problems that have both types of features.
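The robustness mechanism can be seen in a one-dimensional toy problem. Under a Student's t error model with $\nu$ degrees of freedom and scale $\sigma$, the negative log-likelihood is $\tfrac{\nu+1}{2}\log\big(1 + r^2/(\nu\sigma^2)\big)$. It grows only logarithmically in the residual $r$. In an iteratively reweighted (EM-type) update, a gross outlier therefore receives a small weight $w = (\nu+1)/(\nu + r^2/\sigma^2)$. The sketch estimates a constant level this way. It is a toy location problem, not the paper's Kalman smoother. The values $\nu = 4$ and $\sigma = 1$, the median start and the function name are assumptions.

```python
import statistics

def student_t_location(data, nu=4.0, sigma=1.0, iters=50):
    """Location estimate under a Student's t error model via iteratively reweighted means."""
    mu = statistics.median(data)  # robust start, since the t loss is non-convex
    for _ in range(iters):
        # Large residuals get small weights, so outliers barely move the estimate.
        w = [(nu + 1) / (nu + ((x - mu) / sigma) ** 2) for x in data]
        mu = sum(wi * x for wi, x in zip(w, data)) / sum(w)
    return mu

data = [0.0, 0.1, -0.1, 0.05, 100.0]  # one gross outlier
print(statistics.mean(data), student_t_location(data))
```

The plain mean is pulled to about 20 by the single outlier, while the Student's t estimate stays near the bulk of the data.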
A General Iterative Shrinkage and Thresholding Algorithm for Non-Convex Regularized Optimization Problems
Pinghua Gong, Changshui Zhang, Zhaosong Lu, Jianhua Huang, Jieping Ye
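AI summary: For optimization problems with non-convex sparsity-inducing penalties, proposes a General Iterative Shrinkage and Thresholding (GIST) algorithm. It solves these problems efficiently using closed-form proximal operator solutions and a line search initialized by the BB rule, and it comes with a convergence analysis.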
Non-convex sparsity-inducing penalties have recently received considerable attention in sparse learning. Recent theoretical investigations have demonstrated their superiority over their convex counterparts in several sparse learning settings. However, solving the non-convex optimization problems associated with non-convex penalties remains a major challenge. A commonly used approach is multi-stage (MS) convex relaxation (or DC programming), which relaxes the original non-convex problem into a sequence of convex problems. This approach is usually not very practical for large-scale problems because its computational cost is a multiple of the cost of solving a single convex problem. In this paper, we propose a General Iterative Shrinkage and Thresholding (GIST) algorithm to solve the non-convex optimization problem for a large class of non-convex penalties. The GIST algorithm iteratively solves a proximal operator problem, which in turn has a closed-form solution for many commonly used penalties. At each outer iteration of the algorithm, we use a line search initialized by the Barzilai-Borwein (BB) rule, which allows an appropriate step size to be found quickly. The paper also presents a detailed convergence analysis of the GIST algorithm. The efficiency of the proposed algorithm is demonstrated by extensive experiments on large-scale data sets.
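The sketch below shows GIST-style iterations for least squares with the capped-$\ell_1$ penalty $r(w) = \lambda\sum_i \min(|w_i|, \theta)$. This is one of the non-convex penalties whose proximal operator has a closed form. For each coordinate, the operator compares the best point with $|w_i| \ge \theta$ against the soft-thresholded point clipped to $|w_i| \le \theta$. The step parameter $t$ is initialized by the Barzilai-Borwein rule and increased until a sufficient-decrease test holds. This monotone variant, the constants and all names are assumptions made for illustration. It is not a reproduction of the paper's implementation.

```python
import math

def capped_l1_prox(u, lam, theta):
    """Closed-form argmin_x 0.5*(x - u)**2 + lam*min(|x|, theta)."""
    def h(x):
        return 0.5 * (x - u) ** 2 + lam * min(abs(x), theta)
    x_out = math.copysign(max(theta, abs(u)), u)                 # best point with |x| >= theta
    x_in = math.copysign(min(theta, max(abs(u) - lam, 0.0)), u)  # best point with |x| <= theta
    return x_out if h(x_out) <= h(x_in) else x_in

def gist_capped_l1(A, b, lam, theta, iters=200, sigma=1e-5, eta=2.0):
    """Monotone GIST-style iterations for 0.5*||Aw - b||^2 + lam*sum(min(|w_i|, theta))."""
    m, n = len(A), len(A[0])

    def residual(w):
        return [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(m)]

    def gradient(w):
        r = residual(w)
        return [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]

    def objective(w):
        r = residual(w)
        return 0.5 * sum(ri * ri for ri in r) + lam * sum(min(abs(wj), theta) for wj in w)

    w = [0.0] * n
    g, f, t = gradient(w), objective(w), 1.0
    history = [f]
    for _ in range(iters):
        for _attempt in range(100):  # line search: increase t until sufficient decrease
            w_new = [capped_l1_prox(w[j] - g[j] / t, lam / t, theta) for j in range(n)]
            f_new = objective(w_new)
            if f_new <= f - 0.5 * sigma * t * sum((p - q) ** 2 for p, q in zip(w_new, w)):
                break
            t *= eta
        else:
            w_new, f_new = w, f  # no acceptable step found: keep the current point
        g_new = gradient(w_new)
        s = [p - q for p, q in zip(w_new, w)]
        y = [p - q for p, q in zip(g_new, g)]
        ss = sum(si * si for si in s)
        if ss > 0:
            # Barzilai-Borwein initialization of t for the next iteration, kept in a safe range.
            t = min(max(sum(si * yi for si, yi in zip(s, y)) / ss, 1e-8), 1e8)
        w, g, f = w_new, g_new, f_new
        history.append(f)
    return w, history

A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
b = [3.0, 0.0, 0.0, 3.0]
w, history = gist_capped_l1(A, b, lam=0.5, theta=1.0)
print(w, history[-1])
```

The sufficient-decrease test makes the recorded objective values non-increasing. The paper also discusses a non-monotone line search, which this sketch omits.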
Nonparametric and Adaptive Modeling of Dynamic Seasonality and Trend with Heteroscedastic and Dependent Errors
Yu-Chun Chen, Ming-Yen Cheng, Hau-tieng Wu
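AI summary: Proposes a nonparametric model describing the dynamics of multi-component seasonality. It uses the synchrosqueezing transform to extract these features in the presence of a trend and heteroscedastic, dependent errors, with theoretical justification of adaptivity and robustness.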
Seasonality (or periodicity) and trend are features describing an observed sequence, and extracting these features is an important issue in many scientific fields. However, it is not easy for existing methods to analyze the trend and the dynamics of the seasonality (such as time-varying frequency and amplitude) simultaneously. Moreover, neither the adaptivity of the analysis to such dynamics nor its robustness to heteroscedastic, dependent errors is guaranteed. These tasks become even more challenging when there are multiple seasonal components. We propose a nonparametric model to describe the dynamics of multi-component seasonality. We then investigate the recently developed Synchrosqueezing transform (SST) for extracting these features in the presence of a trend and heteroscedastic, dependent errors. The identifiability problem of the nonparametric seasonality model is studied, and the adaptivity and robustness properties of the SST are theoretically justified in both discrete- and continuous-time settings. Consequently, we have a new technique for decoupling the trend, the seasonality and the heteroscedastic, dependent error process in a general nonparametric setup. Results of a series of simulations are provided. We also analyze the incidence time series of varicella and herpes zoster in Taiwan and respiratory signals observed in a sleep study.
Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
Sattar Vakili, Keqin Liu, Qing Zhao
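AI summary: Proposes policies based on a deterministic sequencing of exploration and exploitation (DSEE). These policies achieve optimal logarithmic regret for light-tailed distributions and suboptimal regret for heavy-tailed distributions, and extend to several MAB variants.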
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret. Here, regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T^{1/p}) regret when the moments of the reward distributions exist up to the pth order for 1<p<=2, and O(T^{1/(1+p/2)}) regret for p>2. With knowledge of an upper bound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, DSEE has a clearly defined tunable parameter, the cardinality of the exploration sequence, which makes it easily extendable to variations of MAB. These include MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, and MAB with unknown Markov dynamics. They also include combinatorial MAB with dependent arms, which often arises in network optimization problems under unknown random weights, such as the shortest path, minimum spanning tree and dominating set problems.
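The following is a simplified sketch of the DSEE idea for Bernoulli arms. Exploration happens at deterministically scheduled times: whenever the running exploration count falls below K·⌈w·log(t+1)⌉, the player explores, cycling through the arms in round-robin order. At all other times it exploits the best sample mean. The constant w, the t+1 offset (which keeps the budget positive at t = 1) and the Bernoulli rewards are illustrative assumptions. The paper's exploration-sequence cardinality and tuning may differ.

```python
import math
import random


def dsee(means, horizon, w=10.0, seed=0):
    """Play Bernoulli arms with success probabilities `means` using a simplified DSEE rule."""
    rng = random.Random(seed)
    k = len(means)
    totals = [0.0] * k  # summed rewards per arm
    counts = [0] * k    # number of plays per arm
    explorations = 0
    for t in range(1, horizon + 1):
        # Deterministic exploration budget: K * ceil(w * log(t + 1)) explorations by time t.
        if explorations < k * math.ceil(w * math.log(t + 1)):
            arm = explorations % k  # explore the arms in round-robin order
            explorations += 1
        else:
            # Exploit the largest sample mean (every arm has been explored at least once).
            arm = max(range(k), key=lambda a: totals[a] / counts[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        totals[arm] += reward
        counts[arm] += 1
    return counts, explorations
```

Because the exploration schedule does not depend on observed rewards, the number of exploration steps is known in advance and grows only logarithmically in the horizon.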
A Unified Analysis of Projected Gradient Descent for $\ell_p$-Constrained Least Squares
Sohail Bahmani, Bhiksha Raj
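AI summary: Uses the restricted isometry property to analyze the convergence of projected gradient descent for $\ell_p$-constrained least squares problems in compressed sensing. The analysis unifies and extends results for the iterative hard and soft thresholding algorithms, and shows that as $p$ increases, the conditions become stricter and robustness decreases.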
In this paper, we study the performance of the Projected Gradient Descent (PGD) algorithm for $\ell_{p}$-constrained least squares problems that arise in the framework of Compressed Sensing. Relying on the Restricted Isometry Property, we provide convergence guarantees for this algorithm over the entire range $0\leq p\leq1$. These guarantees include and generalize the existing results for the Iterative Hard Thresholding algorithm and, as special cases, provide a new accuracy guarantee for the Iterative Soft Thresholding algorithm. Our results suggest that in this group of algorithms, as $p$ increases from zero to one, the conditions required to guarantee accuracy become stricter and robustness to noise deteriorates.
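At the p = 0 end of the range, projecting onto the constraint set {x : ‖x‖₀ ≤ s} amounts to hard thresholding, and PGD reduces to Iterative Hard Thresholding. Below is a minimal pure-Python sketch of that special case. It assumes a fixed step size small enough for convergence; the names `hard_threshold` and `iht` are ours.

```python
def hard_threshold(x, s):
    """Euclidean projection onto s-sparse vectors: keep the s entries of largest magnitude."""
    keep = set(sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)[:s])
    return [x[j] if j in keep else 0.0 for j in range(len(x))]


def iht(A, y, s, step=1.0, iters=100):
    """Projected gradient descent for min ||y - A x||^2 subject to ||x||_0 <= s."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step on the least-squares loss, then projection onto the l0 "ball".
        grad_step = [x[j] + step * sum(A[i][j] * residual[i] for i in range(m))
                     for j in range(n)]
        x = hard_threshold(grad_step, s)
    return x
```

Replacing `hard_threshold` with a projection onto an ℓp ball for 0 < p ≤ 1 gives the other members of the family the paper analyzes.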
A Smirnov-Bickel-Rosenblatt Theorem for Compactly Supported Wavelets
Adam D. Bull
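AI summary: Proves that the sup norm of the variance term of compactly supported wavelet estimators in nonparametric statistics is asymptotically Gumbel distributed. Verifies the required assumption for bases such as Daubechies wavelets and symlets.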
In nonparametric statistical problems, we wish to find an estimator of an unknown function f. We can split its error into bias and variance terms; Smirnov, Bickel and Rosenblatt have shown that, for a histogram or kernel estimate, the supremum norm of the variance term is asymptotically distributed as a Gumbel random variable. In the following, we prove a version of this result for estimators using compactly-supported wavelets, a popular tool in nonparametric statistics. Our result relies on an assumption on the nature of the wavelet, which must be verified by provably-good numerical approximations. We verify our assumption for Daubechies wavelets and symlets, with N = 6, ..., 20 vanishing moments; larger values of N, and other wavelet bases, are easily checked, and we conjecture that our assumption holds also in those cases.
Nonparametric Estimation of the Division Rate of a Size-Structured Population
Marie Doumic Jauffret, Marc Hoffmann, Patricia Reynaud-Bouret, Vincent Rivoirard
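AI summary: Proposes a statistical inference approach to estimating the division rate of a size-structured population, based on the eigenproblem of the transport-fragmentation equation. The estimator uses kernel methods with automatic bandwidth selection and attains the same optimal error bound as the deterministic inverse problem.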
We consider the problem of estimating the division rate of a size-structured population in a nonparametric setting. The size of the system evolves according to a transport-fragmentation equation. Each individual grows with a given transport rate and splits into two offspring of the same size, following a binary fragmentation process with an unknown division rate that depends on its size. In contrast to a deterministic inverse problem approach, as in (Perthame, Zubelli, 2007) and (Doumic, Perthame, Zubelli, 2009), in this paper we take the perspective of statistical inference. Our data consist of a large sample of individual sizes, collected when the evolution of the system is close to its time-asymptotic behavior, so that it can be related to the eigenproblem of the considered transport-fragmentation equation (see \cite{PR} for instance). By statistically estimating each term of the eigenvalue problem and suitably inverting a certain linear operator (see the previously quoted articles), we construct a more realistic estimator of the division rate. This estimator achieves the same optimal error bound as in related deterministic inverse problems. Our procedure relies on kernel methods with automatic bandwidth selection. It is inspired by model selection and recent results of Goldenschluger and Lepski.
A Remark on Covering
Vladimir Temlyakov
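AI summary: Uses incoherent dictionaries to construct efficient coverings of the unit ball of a finite-dimensional Banach space, then extends the construction iteratively to arbitrary radii.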
We discuss the construction of coverings of the unit ball of a finite-dimensional Banach space. The well-known technique of comparing volumes gives upper and lower bounds on covering numbers, but it does not provide a construction of good coverings. Here we apply incoherent dictionaries to construct good coverings. We use the following strategy. First, we build a good covering by balls with a radius close to one. Second, we iterate this construction to obtain a good covering for any radius. We mostly concentrate on the first step of this strategy.
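For reference, here are the standard volume-comparison bounds mentioned above. Let X be a real d-dimensional Banach space with unit ball B_X, and let N(ε, B_X) be the smallest number of ε-balls covering B_X. The lower bound compares the total volume of N balls of radius ε with vol(B_X). The upper bound uses a maximal ε-separated set: it is an ε-net, and its ε/2-balls are disjoint and lie inside (1 + ε/2)B_X. This is textbook background, not a result of the paper.

```latex
\left(\frac{1}{\varepsilon}\right)^{d} \;\le\; N(\varepsilon, B_X) \;\le\; \left(1+\frac{2}{\varepsilon}\right)^{d},
\qquad 0<\varepsilon\le 1 .
```

Both bounds are tight only up to constants in the base of the exponent. As the text notes, neither yields an explicit covering.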
A Geometric Blind Source Separation Method Based on Facet Component Analysis
P. Yin, Y. Sun, J. Xin
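AI summary: Proposes a geometric blind source separation method based on facet component analysis (FCA), which identifies the facets rather than the vertices of the data's cone structure. The method separates nonnegative linear mixtures blindly by combining data classification with linear regression, and handles noisy data with denoising techniques.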
Given a set of mixtures, blind source separation attempts to retrieve the source signals with little or no information about the mixing process. We present a geometric approach for blind separation of nonnegative linear mixtures, termed facet component analysis (FCA). The approach is based on identifying the facets of the underlying cone structure of the data. Earlier works recover the cone by locating its vertices (vertex component analysis, or VCA). They rely on a mutual sparsity condition, which requires each source signal to possess a stand-alone peak in its spectrum. We formulate alternative conditions under which enough data points fall on the facets of the cone instead of accumulating around the vertices. To find a regime of unique solvability, we make use of both the geometric and the density properties of the data points. We then develop an efficient facet identification method by combining data classification and linear regression. For noisy data, we show that denoising methods may be employed, such as the total variation technique from image processing and principal component analysis. We show computational results on nuclear magnetic resonance spectroscopic data to substantiate our method.
Variational Optimization
Joe Staines, David Barber
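AI summary: Proposes a general technique for optimizing non-differentiable or discrete objective functions by constructing differentiable bounds, with applications to sparse learning and support vector classification.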
We discuss a general technique that can be used to form a differentiable bound on the optima of non-differentiable or discrete objective functions. We form a unified description of these methods and consider under which circumstances the bound is concave. In particular we consider two concrete applications of the method, namely sparse learning and support vector classification.
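The generic form of such a bound can be written as follows, as we understand the technique. Any distribution p(x | θ) gives an upper bound on the minimum. The gradient of this bound can be written as an expectation even when f is non-differentiable or x is discrete. The Gaussian case in the last line is an illustrative assumption, not necessarily a choice made in the paper.

```latex
\min_{x} f(x) \;\le\; U(\theta) := \mathbb{E}_{x\sim p(x\mid\theta)}\!\left[f(x)\right],
\qquad
\nabla_{\theta} U(\theta) = \mathbb{E}_{x\sim p(x\mid\theta)}\!\left[f(x)\,\nabla_{\theta}\log p(x\mid\theta)\right],
\\[4pt]
p(x\mid\mu) = \mathcal{N}(x;\mu,\sigma^{2}I)
\;\Longrightarrow\;
\nabla_{\mu} U(\mu) = \frac{1}{\sigma^{2}}\,\mathbb{E}\!\left[f(x)\,(x-\mu)\right].
```

Minimizing U over θ then gives a smooth surrogate problem whose optimum upper-bounds the original one.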
Compartmental Analysis of Renal Physiology Using Nuclear Medicine Data and Statistical Optimization
Sara Garbarino, Giacomo Caviglia, Massimo Brignone, Michela Massollo, Gianmario Sambuceti, Michele Piana
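AI summary: Proposes a general method for compartmental modeling of nuclear data based on spectral analysis and statistical optimization. Renal physiology serves as the test case, and the method is validated on synthetic data and on real measurements from micro-PET experiments with murine models.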
This paper describes a general approach to the compartmental modeling of nuclear data based on spectral analysis and statistical optimization. We use renal physiology as a test case and validate the method against both synthetic data and real measurements acquired during two micro-PET experiments with murine models.
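Stochastic Load Fluctuations and the Collapse Probability of Power Systems Operating near a Codimension-1 Saddle-Node Bifurcation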
Dmitry Podolsky, Konstantin Turitsyn
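AI summary: Studies how random load fluctuations affect voltage stability in power systems operating near their transfer limits. Explains critical slowing-down by computing the autocorrelation function of the state vector, and constructs a new indicator function to estimate the collapse probability and mean clearing time. Also proposes control strategies that minimize the collapse probability.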
For a power system operating near the power transfer limit of its transmission system, the effect of stochastic fluctuations in power loads can become critical. A sufficiently strong fluctuation may activate voltage instability and lead to a large-scale collapse of the system. We consider the effect of these stochastic fluctuations near a codimension-1 saddle-node bifurcation and explicitly calculate the autocorrelation function of the state vector. Its behavior explains the phenomenon of critical slowing-down often observed in power systems on the threshold of blackout. We also estimate the collapse probability/mean clearing time for the power system and construct a new indicator function signaling the proximity to a large-scale collapse. The new indicator function is easy to estimate in real time using PMU data feeds as well as SCADA information about fluctuations of power load at the nodes of the power grid. We discuss control strategies that minimize the collapse probability.
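The following toy example illustrates critical slowing-down. It assumes a one-dimensional linearized mode dx = −κx dt + noise rather than the paper's power-grid model. Near a saddle-node bifurcation with normal form ẋ = μ − x², the stable equilibrium has decay rate κ = 2√μ, which vanishes as μ → 0. The lag-τ autocorrelation exp(−κτ) of the fluctuations therefore approaches one. The sketch simulates the mode exactly on a time grid and estimates the lag-1 autocorrelation; parameter values are hypothetical.

```python
import math
import random


def lag1_autocorrelation(kappa, dt=0.1, n=20000, seed=1):
    """Simulate dx = -kappa * x dt + noise exactly on a grid; estimate the lag-1 autocorrelation."""
    rng = random.Random(seed)
    a = math.exp(-kappa * dt)          # exact one-step decay of the linearized mode
    noise_sd = math.sqrt(1.0 - a * a)  # keeps the stationary variance equal to 1
    x = rng.gauss(0.0, 1.0)
    xs = []
    for _ in range(n):
        x = a * x + noise_sd * rng.gauss(0.0, 1.0)
        xs.append(x)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in xs)
    return num / den
```

A rising sample autocorrelation as κ shrinks is the kind of early-warning signal that an indicator built from PMU/SCADA fluctuation data could track.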
Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing
Nicolas Gillis
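AI summary: Proposes a data-preprocessing approach to nonnegative matrix factorization. Using M-matrix theory and a geometric interpretation, it yields sparse and unique optimal solutions under the separability assumption.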
Nonnegative matrix factorization (NMF) has become a very popular technique in machine learning because it automatically extracts meaningful features through a sparse and part-based representation. However, NMF has the drawback of being highly ill-posed, that is, there typically exist many different but equivalent factorizations. In this paper, we introduce a completely new way of obtaining more well-posed NMF problems whose solutions are sparser. Our technique is based on the preprocessing of the nonnegative input data matrix, and relies on the theory of M-matrices and the geometric interpretation of NMF. This approach provably leads to optimal and sparse solutions under the separability assumption of Donoho and Stodden (NIPS, 2003). For rank-three matrices, it also makes the number of exact factorizations finite. We illustrate the effectiveness of our technique on several image datasets.
Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning
Jake Bouvrie, Mauro Maggioni
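AI summary: Proposes a fast algorithm for multiscale compression of Markov decision processes. It automatically builds a hierarchy, decouples subtasks and accelerates convergence, while enabling policy transfer across problems.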
Many problems in sequential decision making and stochastic control have a natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), in which a hierarchy of sub-problems at different scales is determined automatically. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other. This can substantially improve convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems: solutions of sub-tasks at different levels of the hierarchy may be transferable to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous state spaces.
Approximate Rank-Detecting Factorization of Low-Rank Tensors
Franz J. Király, Andreas Ziehe
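AI summary: Proposes the AROFAC2 algorithm, which detects the CP rank of a third-order tensor and decomposes the tensor into rank-one components. It intrinsically detects the true rank, avoids spurious components, and is robust to outliers and non-Gaussian noise.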
We present an algorithm, AROFAC2, which detects the (CP-)rank of a degree-3 tensor and calculates its factorization into rank-one components. We provide generative conditions under which the algorithm works. On both synthetic and real-world data, we demonstrate that AROFAC2 is a potentially better-performing alternative to the gold-standard PARAFAC. Its advantages are that it can intrinsically detect the true rank, avoids spurious components, and is stable with respect to outliers and non-Gaussian noise.
Analysis of a Randomized Approximation Scheme for Matrix Multiplication
Daniel Hsu, Sham M. Kakade, Tong Zhang
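AI summary: Gives a simple analysis of the randomized matrix multiplication scheme of Sarlos (2006), which is based on a random rotation followed by uniform column sampling. The analysis uses the matrix Bernstein inequality and a tail inequality for quadratic forms in subgaussian random vectors.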
This note gives a simple analysis of a randomized approximation scheme for matrix multiplication proposed by Sarlos (2006) based on a random rotation followed by uniform column sampling. The result follows from a matrix version of Bernstein's inequality and a tail inequality for quadratic forms in subgaussian random vectors.
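The pure-Python sketch below shows the scheme's two ingredients. We assume the random rotation is a randomized Walsh-Hadamard transform Θ = HD/√n, a common choice that may differ from Sarlos's exact construction. Θ is applied along the inner dimension of A B, followed by uniform sampling of c inner indices rescaled by n/c. Because Θ is orthogonal, (AΘᵀ)(ΘB) = AB exactly; only the sampling step introduces error.

```python
import math
import random


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(len(v))
    return [x * scale for x in v]


def rotate(A, B, seed=0):
    """Apply one random orthogonal map Theta = H D / sqrt(n) along the shared inner dimension."""
    n = len(B)
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # diagonal of D

    def theta(vec):
        return fwht([s * x for s, x in zip(signs, vec)])

    A_rot = [theta(row) for row in A]  # rows of A Theta^T
    cols = [theta([B[k][j] for k in range(n)]) for j in range(len(B[0]))]
    B_rot = [[col[k] for col in cols] for k in range(n)]  # Theta B
    return A_rot, B_rot


def sampled_product(A_rot, B_rot, idx):
    """Estimate A B from the inner indices in idx, rescaled by n / len(idx)."""
    n, c = len(B_rot), len(idx)
    return [[(n / c) * sum(A_rot[i][k] * B_rot[k][j] for k in idx)
             for j in range(len(B_rot[0]))] for i in range(len(A_rot))]
```

In practice, one would draw c indices uniformly with replacement, for example `[rng.randrange(n) for _ in range(c)]` for some `random.Random` instance, with c much smaller than n. Passing every index once, as a sanity check, recovers AB up to rounding.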
A Mathematical Random Number Generator (MRNG)
Osvaldo Skliar, Ricardo E. Monge, Sherry Gapper, Guillermo Oviedo
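AI summary: Proposes a random number generator (MRNG) based on a mathematical procedure rather than a physical phenomenon. It produces binary strings that pass statistical tests, as well as random numbers in base 10, and is applied to the simulation of probabilistic systems and to random sampling.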
A novel Mathematical Random Number Generator (MRNG) is presented here. In this case, "mathematical" means that constructing the generator requires no physical phenomenon, such as the thermal noise of an electronic device, but only a mathematical procedure. The MRNG generates binary strings (in principle, as long as desired). These strings may be considered genuinely random in the sense that they pass the statistical tests currently accepted for evaluating the randomness of such strings. From those strings, the MRNG also generates random numbers expressed in base 10. An MRNG has been installed as a facility on the following web page: http://www.appliedmathgroup.org. This generator may be used in applications such as a) computational simulation of probabilistic-type systems and b) random selection of samples from different populations. Users interested in cryptographic applications can build another MRNG. However, they would have to withhold the information specified in Section 5 from anyone who is not authorized to decode messages encrypted using that resource.
Localization from Incomplete Noisy Distance Measurements
Adel Javanmard, Andrea Montanari
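AI summary: Addresses localizing a point cloud in Euclidean space from noisy partial distance measurements. Proposes a semidefinite-programming-based algorithm and characterizes its performance bounds under a random geometric graph model.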
We consider the problem of positioning a cloud of points in the Euclidean space $\mathbb{R}^d$, using noisy measurements of a subset of pairwise distances. This task has applications in various areas, such as sensor network localization and reconstruction of protein conformations from NMR measurements. Also, it is closely related to dimensionality reduction problems and manifold learning, where the goal is to learn the underlying global geometry of a data set using local (or partial) metric information. Here we propose a reconstruction algorithm based on semidefinite programming. For a random geometric graph model and uniformly bounded noise, we provide a precise characterization of the algorithm's performance: In the noiseless case, we find a radius $r_0$ beyond which the algorithm reconstructs the exact positions (up to rigid transformations). In the presence of noise, we obtain upper and lower bounds on the reconstruction error that match up to a factor that depends only on the dimension $d$, and the average degree of the nodes in the graph.
Higher-Order Scrambled Digital Nets Achieve the Optimal Rate of the Root Mean Square Error for Smooth Integrands
Josef Dick
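AI summary: Proposes a new randomized quasi-Monte Carlo algorithm based on higher-order scrambled digital nets. The algorithm achieves an RMSE convergence rate of $N^{-\alpha-1/2+\varepsilon}$ for smooth integrands, and the paper shows that this rate cannot be improved under the given smoothness conditions.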
We study a random sampling technique to approximate integrals $\int_{[0,1]^s}f(\mathbf{x})\,\mathrm{d}\mathbf{x}$ by averaging the function at some sampling points. We focus on cases where the integrand is smooth, a problem that arises in statistics. The convergence rate of the approximation error depends on the smoothness of the function $f$ and on the sampling technique. For instance, Monte Carlo (MC) sampling yields a root mean square error (RMSE) of order $N^{-1/2}$ (where $N$ is the number of samples) for functions $f$ with finite variance. Randomized QMC (RQMC), a combination of MC and quasi-Monte Carlo (QMC), achieves an RMSE of order $N^{-3/2+\varepsilon}$ under the stronger assumption that the integrand has bounded variation. Combining RQMC with local antithetic sampling achieves an RMSE of order $N^{-3/2-1/s+\varepsilon}$ (where $s\ge1$ is the dimension) for functions with mixed partial derivatives up to order two. In general, additional smoothness of the integrand does not improve the rate of convergence of these algorithms. On the other hand, it is known that the convergence rate cannot be improved without additional smoothness of the integrand. This paper introduces a new RQMC algorithm and proves that it achieves an RMSE of order $N^{-\alpha-1/2+\varepsilon}$. This requires the strong assumption that the integrand has square-integrable mixed partial derivatives up to order $\alpha>1$ in each variable. Known lower bounds on the RMSE show that this rate of convergence cannot be improved in general for integrands with this smoothness. We provide numerical examples for which the RMSE converges approximately with order $N^{-5/2}$ and $N^{-7/2}$, in accordance with the theoretical upper bound.
Hyperparameter and Instrument Parameter Estimation in Regularized Inversion: Illustration for SPIRE/Herschel Map Making
F. Orieux, J. -F. Giovannelli, T. Rodet, A. Abergel
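AI summary: Proposes a Gibbs sampling method in a Bayesian framework for jointly estimating hyperparameters and instrument parameters in regularized image reconstruction. The method is validated on simulated and real SPIRE/Herschel observations.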
We describe regularized methods for image reconstruction and focus on the question of hyperparameter and instrument parameter estimation, i.e. unsupervised and myopic problems. We develop a Bayesian framework based on the posterior density of all unknown quantities given the observations. This density is explored by a Markov chain Monte Carlo sampling technique based on a Gibbs loop that includes a Metropolis-Hastings step. The numerical evaluation relies on the SPIRE instrument of the Herschel observatory. Using simulated and real observations, we show that the hyperparameters and instrument parameters are correctly estimated, which opens up many perspectives for imaging in astrophysics.
Spectral Clustering: An Empirical Study of Approximation Algorithms and Its Application to the Attrition Problem
B. Cung, T. Jin, J. Ramirez, A. Thompson, C. Boutsidis, D. Needell
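AI summary: Experimentally evaluates several approximation methods for spectral clustering and applies them to employee attrition prediction. Shows that approximate spectral clustering reduces computational cost while maintaining classification accuracy.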
Clustering is the problem of separating a set of objects into groups (called clusters) so that objects within the same cluster are more similar to each other than to those in different clusters. Spectral clustering is a now well-known clustering method that uses the spectrum of the data similarity matrix to perform this separation. Since the method relies on solving an eigenvector problem, it is computationally expensive for large datasets. To overcome this constraint, approximation methods have been developed that aim to reduce running time while maintaining accurate classification. In this article, we summarize and experimentally evaluate several approximation methods for spectral clustering. From an applications standpoint, we employ spectral clustering to solve the so-called attrition problem. The aim is to distinguish, within a set of employees, those who are likely to leave the company voluntarily from those who are not. Our study sheds light on the empirical performance of existing approximate spectral clustering methods and shows that these methods apply to an important business optimization problem.
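The sketch below shows two-cluster spectral clustering on a weighted graph. Power iteration, with deflation of the known top eigenvector, serves as a cheap approximate eigensolver. Power iteration is our illustrative choice and not necessarily one of the approximation methods evaluated in the article. Shifting by the identity keeps all eigenvalues non-negative, so the iteration targets the second-largest eigenvalue.

```python
import math


def spectral_bipartition(W, iters=500):
    """Two-way spectral clustering of a weighted graph, with power iteration as eigensolver."""
    n = len(W)
    d = [sum(row) for row in W]
    # Normalized affinity D^{-1/2} W D^{-1/2}, shifted by I so all eigenvalues lie in [0, 2].
    M = [[W[i][j] / math.sqrt(d[i] * d[j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # The top eigenvector is known in closed form: proportional to sqrt(degree).
    norm = math.sqrt(sum(d))
    v1 = [math.sqrt(di) / norm for di in d]
    x = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        x = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        proj = sum(a * b for a, b in zip(v1, x))
        x = [a - proj * b for a, b in zip(x, v1)]  # deflate the trivial direction
        s = math.sqrt(sum(a * a for a in x))
        x = [a / s for a in x]
    # Map back to the random-walk embedding D^{-1/2} x and split by sign.
    return [0 if x[i] / math.sqrt(d[i]) >= 0 else 1 for i in range(n)]
```

For more than two clusters, one would keep several eigenvectors and cluster the embedded rows, for example with k-means. This is the step where the approximation methods trade accuracy for speed.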
Effectiveness of a Sparse Bayesian Algorithm for MVAR Coefficient Estimation in MEG/EEG Source-Space Causality Analysis
Kensuke Sekihara, Hagai Attias, Julia P. Owen, Srikantan S. Nagarajan
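AI summary: Uses computer experiments to evaluate how well a sparse Bayesian algorithm estimates multivariate autoregressive coefficients under strong background interference. Finds that the algorithm is insensitive to interference but less able to detect true causal relationships.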
This paper examines the effectiveness of a sparse Bayesian algorithm for estimating multivariate autoregressive coefficients when a large amount of background interference exists. It uses computer experiments to compare two methods for source-space causality analysis: the conventional least-squares method and a sparse Bayesian method. Results of our computer experiments show that the interference affects the least-squares method very severely: it produces large numbers of false positives unless the signal-to-interference ratio is very high. On the other hand, the sparse Bayesian method is relatively insensitive to the existence of interference. However, this robustness comes at the sacrifice of the detectability of true causal relationships. Our experiments also show that the surrogate-data bootstrapping method tends to give a statistical threshold that is too low for the sparse method. The permutation-test-based method gives a higher (more conservative) threshold, and it should be used with the sparse Bayesian method whenever a control period is available.
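For orientation, the conventional least-squares baseline can be sketched for a bivariate VAR(1) model x_t = A x_{t-1} + e_t. The estimate is Â = (Σ x_t x_{t-1}ᵀ)(Σ x_{t-1} x_{t-1}ᵀ)⁻¹. A non-zero A[i][j] with i ≠ j indicates that series j helps predict series i. This sketch models no interference and uses no sparse Bayesian prior; the coefficients and sample size are hypothetical.

```python
import random


def simulate_var1(A, n, seed=0):
    """Simulate x_t = A x_{t-1} + e_t with standard normal e_t, starting from zero."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    xs = [x]
    for _ in range(n):
        x = [A[0][0] * x[0] + A[0][1] * x[1] + rng.gauss(0.0, 1.0),
             A[1][0] * x[0] + A[1][1] * x[1] + rng.gauss(0.0, 1.0)]
        xs.append(x)
    return xs


def fit_var1_least_squares(xs):
    """Least-squares estimate A_hat = S10 @ inv(S00) for a bivariate VAR(1)."""
    s00 = [[0.0, 0.0], [0.0, 0.0]]  # sum of x_{t-1} x_{t-1}^T
    s10 = [[0.0, 0.0], [0.0, 0.0]]  # sum of x_t x_{t-1}^T
    for prev, cur in zip(xs[:-1], xs[1:]):
        for i in range(2):
            for j in range(2):
                s00[i][j] += prev[i] * prev[j]
                s10[i][j] += cur[i] * prev[j]
    det = s00[0][0] * s00[1][1] - s00[0][1] * s00[1][0]
    inv = [[s00[1][1] / det, -s00[0][1] / det],
           [-s00[1][0] / det, s00[0][0] / det]]
    return [[sum(s10[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

Adding a strong common interference signal to both series would bias the off-diagonal estimates away from zero. This is the false-positive mechanism the abstract describes for the least-squares method.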
Simulation of Stochastic Systems via Polynomial Chaos Expansions and Convex Optimization
Lorenzo Fagiano, Mustafa Khammash
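AI summary: Proposes a computationally efficient, convex-optimization-based method for computing polynomial chaos expansion coefficients through regularization with a specific weighting matrix. The method avoids model manipulation and works with many random variables and a small number of Monte Carlo simulations.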
Polynomial Chaos Expansions are a powerful tool for simulating stochastic models of dynamical systems. Yet, deriving the expansion's coefficients for complex systems might require a significant and non-trivial manipulation of the model, or the computation of large numbers of simulation runs. This renders the approach too time-consuming and impracticable for applications with more than a handful of random variables. We introduce a novel, computationally tractable technique for computing the coefficients of polynomial chaos expansions. The approach exploits a regularization technique with a particular choice of weighting matrices, which allows the specific features of Polynomial Chaos expansions to be taken into account. The method is based entirely on convex optimization. It can be applied to problems with a large number of random variables, uses a modest number of Monte Carlo simulations, and avoids model manipulations. Additional information on the stochastic process, when available, can also be incorporated into the approach by means of convex constraints. We show the effectiveness of the proposed technique in three applications in diverse fields: the analysis of a nonlinear electric circuit, a chaotic model of organizational behavior, and a chemical oscillator.
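The following one-dimensional sketch fits PCE coefficients by regularized least squares. It uses a standard normal input, probabilists' Hermite polynomials He_k as the orthogonal basis, and Monte Carlo samples of the model output. The objective is convex and quadratic, with a diagonal weighting Σ w_k c_k². The diagonal weights stand in for the paper's weighting matrices, whose exact form we do not reproduce; all names are hypothetical.

```python
import random


def hermite_basis(x, degree):
    """Probabilists' Hermite polynomials He_0..He_degree at x (orthogonal under N(0, 1))."""
    he = [1.0, x]
    for k in range(1, degree):
        he.append(x * he[k] - k * he[k - 1])  # He_{k+1} = x He_k - k He_{k-1}
    return he[: degree + 1]


def solve(M, v):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    out = [0.0] * n
    for r in range(n - 1, -1, -1):
        out[r] = (aug[r][n] - sum(aug[r][k] * out[k] for k in range(r + 1, n))) / aug[r][r]
    return out


def pce_coefficients(model, degree, n_samples=200, weights=None, lam=1e-8, seed=0):
    """Weighted ridge fit: minimize sum (y - Phi c)^2 + lam * sum_k w_k c_k^2."""
    rng = random.Random(seed)
    p = degree + 1
    w = weights or [1.0] * p
    G = [[0.0] * p for _ in range(p)]  # accumulates Phi^T Phi
    r = [0.0] * p                      # accumulates Phi^T y
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)        # Monte Carlo sample of the random input
        phi = hermite_basis(x, degree)
        y = model(x)
        for i in range(p):
            r[i] += phi[i] * y
            for j in range(p):
                G[i][j] += phi[i] * phi[j]
    for k in range(p):
        G[k][k] += lam * w[k]          # diagonal weighting (assumed form)
    return solve(G, r)
```

Since x² = He_0(x) + He_2(x), fitting the model x ↦ x² should return coefficients close to [1, 0, 1, 0] for degree 3. Larger weights on high-order terms would push the fit toward smoother expansions.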
Distribution of Demmel and Related Condition Numbers
Prathapasinghe Dharmawansa, Matthew McKay, Yang Chen
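AI summary: Studies the exact probability densities and asymptotic behavior of the Demmel condition number and related condition numbers of random matrices, generalizing Edelman's work.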
Consider a random matrix $\mathbf{A}\in\mathbb{C}^{m\times n}$ ($m \geq n$) containing independent complex Gaussian entries with zero mean and unit variance. Let $0<\lambda_1\leq \lambda_{2}\leq ...\leq \lambda_n<\infty$ denote the eigenvalues of $\mathbf{A}^{*}\mathbf{A}$, where $(\cdot)^*$ represents the conjugate transpose. This paper investigates the distribution of the random variables $\frac{\sum_{j=1}^n \lambda_j}{\lambda_k}$ for $k = 1$ and $k = 2$. These two variables are related to certain condition number metrics, including the so-called Demmel condition number, which have been shown to arise in a variety of applications. For both cases, we derive new exact expressions for the probability densities and establish the asymptotic behavior as the matrix dimensions grow large. In particular, it is shown that as $n$ and $m$ tend to infinity with their difference fixed, both densities scale on the order of $n^3$. After suitable transformations, we establish exact expressions for the asymptotic densities, obtaining simple closed-form expressions in some cases. Our results generalize the work of Edelman on the Demmel condition number for the case $m = n$.
Iterative Hard Thresholding Methods for $l_0$ Regularized Convex Cone Programming
Zhaosong Lu
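AI summary: Proposes iterative hard thresholding methods and their variants for $l_0$ regularized convex cone programming. Proves convergence to a local minimizer and establishes the iteration complexity.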
In this paper we consider $l_0$ regularized convex cone programming problems. In particular, we first propose an iterative hard thresholding (IHT) method and its variant for solving $l_0$ regularized box constrained convex programming. We show that the sequence generated by these methods converges to a local minimizer. Also, we establish the iteration complexity of the IHT method for finding an $\varepsilon$-local-optimal solution. We then propose a method for solving $l_0$ regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation, and establish its iteration complexity for finding an $\varepsilon$-approximate local minimizer. Finally, we propose a variant of this method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of the problem.
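In the box-constrained case, the IHT step has a closed form per coordinate. If each lower bound is ≤ 0 and each upper bound is ≥ 0, the minimizer of (L/2)(x − v)² + λ·[x ≠ 0] over the box is whichever costs less: 0, or v clipped to the box. The minimal sketch below assumes a fixed step 1/L, where L is a Lipschitz constant of ∇f. The stopping rule and names are ours.

```python
def l0_box_prox(v, lam, L, lo, hi):
    """argmin over lo <= x <= hi (with lo <= 0 <= hi) of (L/2)(x - v)^2 + lam * [x != 0]."""
    clipped = min(max(v, lo), hi)                      # best nonzero candidate inside the box
    cost_nonzero = 0.5 * L * (clipped - v) ** 2 + lam
    cost_zero = 0.5 * L * v ** 2
    return clipped if cost_nonzero < cost_zero else 0.0


def iht_l0_box(grad, x0, lam, L, lo, hi, iters=100):
    """IHT for min f(x) + lam * ||x||_0 subject to lo <= x <= hi (coordinate-wise bounds)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Gradient step with step size 1/L, then the closed-form l0-box proximal map.
        x_new = [l0_box_prox(xi - gi / L, lam, L, a, b)
                 for xi, gi, a, b in zip(x, g, lo, hi)]
        if x_new == x:
            break  # fixed point reached
        x = x_new
    return x
```

For f(x) = ½‖x − b‖², one step already returns the exact minimizer. For example, with b = (3, 0.5, −5), λ = 0.5 and box [−2, 2], the result is (2, 0, −2).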
Estimators for Archimedean Copulas in High Dimensions
Marius Hofert, Martin Maechler, Alexander J. McNeil
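AI summary: Studies the performance of known and new parameter estimators for Archimedean copulas in high dimensions and under numerical difficulties. Compares many estimators in a large-scale simulation that considers up to 100 dimensions for the first time, and addresses the resulting computational issues in detail.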
The performance of known and new parametric estimators for Archimedean copulas is investigated, with special focus on large dimensions and numerical difficulties. In particular, method-of-moments-like estimators based on pairwise Kendall's tau, a multivariate extension of Blomqvist's beta, minimum distance estimators, the maximum-likelihood estimator, a simulated maximum-likelihood estimator, and a maximum-likelihood estimator based on the copula diagonal are studied. Their performance is compared in a large-scale simulation study both under known and unknown margins (pseudo-observations), in small and high dimensions, under small and large dependencies, various different Archimedean families and sample sizes. High dimensions up to one hundred are considered for the first time and computational problems arising from such large dimensions are addressed in detail. All methods are implemented in the open source R package copula and can thus be easily accessed and studied.
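The method-of-moments-like idea can be sketched in three steps. First, estimate Kendall's tau for every pair of columns. Second, average those estimates. Third, invert the family's relation between tau and the parameter. The sketch assumes data without ties and an average tau in $[0,1)$. It uses the standard relations $\tau=\theta/(\theta+2)$ for Clayton and $\tau=1-1/\theta$ for Gumbel. This is plain Python, not the copula package's API, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for two equal-length samples, assuming no ties."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 for a concordant pair, -1 for a discordant pair
    return s / (n * (n - 1) / 2)


def itau_estimate(data, family):
    """Average pairwise Kendall's tau over all column pairs, then invert tau(theta)."""
    d = len(data[0])
    cols = [[row[k] for row in data] for k in range(d)]
    taus = [kendall_tau(cols[a], cols[b]) for a, b in combinations(range(d), 2)]
    tau = sum(taus) / len(taus)
    if family == "clayton":  # tau = theta / (theta + 2)
        return 2 * tau / (1 - tau)
    if family == "gumbel":   # tau = 1 - 1 / theta
        return 1 / (1 - tau)
    raise ValueError(f"unsupported family: {family}")
```

With $d$ columns, the averaging step uses all $d(d-1)/2$ pairs. This is one reason the computational cost grows with dimension.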
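A Model-Based Approach to Rounding in Spectral Clustering
Leonard K. M. Poon, April H. Liu, Tengfei Liu, Nevin Lianwen Zhang
AI summary: Proposes a rounding method for spectral clustering based on latent tree models. It handles three subproblems at once: selecting eigenvectors, determining the number of clusters, and partitioning the data.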
In spectral clustering, one defines a similarity matrix for a collection of data points, transforms the matrix to get the Laplacian matrix, finds the eigenvectors of the Laplacian matrix, and obtains a partition of the data using the leading eigenvectors. The last step is sometimes referred to as rounding, where one needs to decide how many leading eigenvectors to use, to determine the number of clusters, and to partition the data points. In this paper, we propose a novel method for rounding. The method differs from previous methods in three ways. First, we relax the assumption that the number of clusters equals the number of eigenvectors used. Second, when deciding the number of leading eigenvectors to use, we not only rely on information contained in the leading eigenvectors themselves, but also use subsequent eigenvectors. Third, our method is model-based and solves all the three subproblems of rounding using a class of graphical models called latent tree models. We evaluate our method on both synthetic and real-world data. The results show that our method works correctly in the ideal case where between-clusters similarity is 0, and degrades gracefully as one moves away from the ideal case.
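Rank/Norm Regularization with Closed-Form Solutions: Application to Subspace Clustering
Yao-Liang Yu, Dale Schuurmans
AI summary: Generalizes the Eckart-Young-Mirsky theorem to all unitarily invariant norms to obtain closed-form solutions of rank/norm regularized problems. It applies these solutions to subspace clustering, yielding new theoretical insights and promising experimental results.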
When data is sampled from an unknown subspace, principal component analysis (PCA) provides an effective way to estimate the subspace and hence reduce the dimension of the data. At the heart of PCA is the Eckart-Young-Mirsky theorem, which characterizes the best rank-$k$ approximation of a matrix. In this paper, we prove a generalization of the Eckart-Young-Mirsky theorem under all unitarily invariant norms. Using this result, we obtain closed-form solutions for a set of rank/norm regularized problems, and derive closed-form solutions for a general class of subspace clustering problems (where data is modelled by unions of unknown subspaces). From these results we obtain new theoretical insights and promising experimental results.
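For orientation, the classical Eckart-Young-Mirsky result states that truncating the SVD after $k$ terms gives the best rank-$k$ approximation. Its Frobenius error is $\sqrt{\sum_{i>k}\sigma_i^2}$. The sketch below illustrates only the rank-1 case. It finds the top singular triplet by power iteration on $A^\top A$, and it assumes a real, nonzero matrix with a simple top singular value. It does not implement the paper's generalization, and the function names are hypothetical.

```python
import math


def rank1_approx(A, iters=200):
    """Best rank-1 approximation of a real matrix via power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = [1.0] * n  # assumed not orthogonal to the top right singular vector
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]  # A^T A v
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(t * t for t in Av))  # top singular value
    u = [t / sigma for t in Av]                # top left singular vector
    return [[sigma * u[i] * v[j] for j in range(n)] for i in range(m)], sigma


def frobenius_error(A, B):
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))
```

For $A=\mathrm{diag}(3,1)$, the best rank-1 approximation keeps $\sigma_1=3$. The Frobenius error is then $\sigma_2=1$, as the theorem predicts.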
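Designing Various Component Analysis at Will
Akisato Kimura, Masashi Sugiyama, Sakano Hitoshi, Hirokazu Kameoka
AI summary: Proposes a general component analysis framework based on the Generalized Pairwise Expression (GPE). The framework covers standard methods, regularization, weighting, clustering, and semi-supervised extensions, and the paper gives a simple strategy for designing new methods by combining templates.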
This paper provides a generic framework of component analysis (CA) methods introducing a new expression for scatter matrices and Gram matrices, called Generalized Pairwise Expression (GPE). This expression is quite compact but highly powerful: The framework includes not only (1) the standard CA methods but also (2) several regularization techniques, (3) weighted extensions, (4) some clustering methods, and (5) their semi-supervised extensions. This paper also presents quite a simple methodology for designing a desired CA method from the proposed framework: Adopting the known GPEs as templates, and generating a new method by combining these templates appropriately.
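On the Convergence of a Nash Seeking Algorithm with Stochastic State-Dependent Payoffs
A. F. Hanif, H. Tembine, M. Assaad, D. Zeghlache
AI summary: Proposes a distributed Nash equilibrium seeking algorithm under stochastic state-dependent payoffs. It proves that the algorithm converges to a limiting trajectory defined by an ordinary differential equation and gives an error bound.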
Distributed strategic learning has been getting attention in recent years. As systems become distributed, finding Nash equilibria in a distributed fashion is becoming more important for various applications. In this paper, we develop a distributed strategic learning framework for seeking Nash equilibria under stochastic state-dependent payoff functions. We extend the work of Krstic et al. in [1] to the case of stochastic state-dependent payoff functions. We develop an iterative distributed algorithm for Nash seeking and examine its convergence to a limiting trajectory defined by an ordinary differential equation (ODE). We show convergence of our proposed algorithm for vanishing step size and provide an error bound for fixed step size. Finally, we conduct a stability analysis and apply the proposed scheme to a generic wireless network. We also present numerical results that corroborate our claims.
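A Black Box Method for Solving the Complex Exponentials Approximation Problem
Piero Barone
AI summary: Addresses the estimation of the number of damped sinusoids and their parameters from noisy data. It proposes a stochastic perturbation method that outperforms maximum likelihood methods at low signal-to-noise ratio and whose hyperparameters can be fixed once, yielding a black box method.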
A common problem, arising in many different applied contexts, consists of estimating the number of exponentially damped sinusoids whose weighted sum best fits a finite set of noisy data, and of estimating their parameters. Many different methods exist for this purpose. The best of them are based on approximate maximum likelihood estimators. These assume that the number of damped sinusoids is known, and that number can then be estimated by an order selection procedure. As the problem can be severely ill-posed, a stochastic perturbation method is proposed which provides better results than maximum-likelihood-based methods when the signal-to-noise ratio is low. The method depends on some hyperparameters which turn out to be essentially independent of the application. Therefore they can be fixed once and for all, giving rise to a black box method.
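To make the signal model concrete, the data are modeled as $y_k=\sum_j c_j z_j^k+\text{noise}$ with $z_j=e^{-\alpha_j+i\omega_j}$. The sketch fits a single component using the classical least-squares relation $y_{k+1}\approx z\,y_k$. It assumes the number of components is known to be one and does no order selection. It is therefore a toy baseline on noise-free data, not the stochastic perturbation method proposed here, and the function name is hypothetical.

```python
import cmath
import math


def fit_single_damped_exponential(y):
    """Fit y_k = c * z**k with one complex exponential (noise-free toy setting)."""
    # Least squares for y[k+1] ~ z * y[k].
    num = sum(y[k + 1] * y[k].conjugate() for k in range(len(y) - 1))
    den = sum(abs(y[k]) ** 2 for k in range(len(y) - 1))
    z = num / den
    # Least squares for the complex amplitude c given z.
    powers = [z ** k for k in range(len(y))]
    c = sum(p.conjugate() * yk for p, yk in zip(powers, y)) / sum(abs(p) ** 2 for p in powers)
    damping = -math.log(abs(z))  # alpha in z = exp(-alpha + i*omega)
    frequency = cmath.phase(z)   # omega, in radians per sample
    return c, z, damping, frequency
```

With noise, and with several components whose frequencies are close, this simple fit degrades quickly. That ill-posedness is the situation the abstract targets.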
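An Improved Bound for the Nyström Method under a Large Eigengap
Mehrdad Mahdavi, Tianbao Yang, Rong Jin
AI summary: Considers kernel matrices whose spectrum has a large eigengap. Using concentration inequalities for integral operators and matrix perturbation theory, it improves the Frobenius-norm approximation error of the Nyström method from $O(N/m^{1/4})$ to $O(N/m^{1/2})$.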
We develop an improved bound for the approximation error of the Nyström method under the assumption that there is a large eigengap in the spectrum of the kernel matrix. This is based on the empirical observation that the eigengap has a significant impact on the approximation error of the Nyström method. Our approach is based on concentration inequalities for integral operators and the theory of matrix perturbation. Our analysis shows that when there is a large eigengap, we can improve the approximation error of the Nyström method from $O(N/m^{1/4})$ to $O(N/m^{1/2})$ when measured in Frobenius norm, where $N$ is the size of the kernel matrix, and $m$ is the number of sampled columns.
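As a reminder of what is being bounded: the Nyström method approximates an $N\times N$ kernel matrix $K$ from $m$ sampled columns $C$ and their intersection block $W$, as $K\approx C W^{+} C^\top$. The sketch assumes $W$ is invertible, so $W^{-1}$ stands in for the pseudo-inverse. It uses a small hand-written Gauss-Jordan solver to stay dependency-free. It only illustrates the approximation itself, not the paper's analysis, and the function names are hypothetical.

```python
def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting (M invertible)."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [float(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def nystrom(K, idx):
    """Nystrom approximation K ~ C W^{-1} C^T from the sampled column indices idx."""
    C = [[row[j] for j in idx] for row in K]      # N x m sampled columns
    W = [[K[i][j] for j in idx] for i in idx]     # m x m intersection block
    X = solve(W, [list(col) for col in zip(*C)])  # W^{-1} C^T, shape m x N
    m, N = len(idx), len(K)
    return [[sum(C[i][k] * X[k][j] for k in range(m)) for j in range(N)] for i in range(N)]
```

When $K$ has rank $m$ and the sampled columns span its column space, the approximation is exact. The error bounds in the abstract describe how far from exact it is in the general case.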
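Bayesian Inference with Optimal Maps
Tarek A. El Moselhy, Youssef M. Marzouk
AI summary: Proposes a Bayesian inference method that avoids Markov chain simulation. It uses optimal transport theory to construct a measure-preserving map from the prior to the posterior, and computes the map by optimization, which enables efficient posterior sampling and moment computation.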
We present a new approach to Bayesian inference that entirely avoids Markov chain simulation, by constructing a map that pushes forward the prior measure to the posterior measure. Existence and uniqueness of a suitable measure-preserving map is established by formulating the problem in the context of optimal transport theory. We discuss various means of explicitly parameterizing the map and computing it efficiently through solution of an optimization problem, exploiting gradient information from the forward model when possible. The resulting algorithm overcomes many of the computational bottlenecks associated with Markov chain Monte Carlo. Advantages of a map-based representation of the posterior include analytical expressions for posterior moments and the ability to generate arbitrary numbers of independent posterior samples without additional likelihood evaluations or forward solves. The optimization approach also provides clear convergence criteria for posterior approximation and facilitates model selection through automatic evaluation of the marginal likelihood. We demonstrate the accuracy and efficiency of the approach on nonlinear inverse problems of varying dimension, involving the inference of parameters appearing in ordinary and partial differential equations.
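In one dimension the idea can be checked by hand. Take a conjugate Gaussian model with prior $N(0,\sigma_0^2)$ and observation $y\sim N(\theta,\sigma^2)$. The monotone map from prior to posterior is then affine, $T(x)=m+(s/\sigma_0)\,x$, where $s^2$ and $m$ are the posterior variance and mean. The sketch uses this closed form in place of the paper's optimization-based construction, which is what nonlinear forward models would require. Pushing prior draws through $T$ yields independent posterior samples without further likelihood evaluations. The function name is hypothetical.

```python
import math
import random


def gaussian_posterior_map(prior_sd, noise_sd, y):
    """Monotone map pushing N(0, prior_sd^2) to the posterior of theta given y ~ N(theta, noise_sd^2)."""
    post_var = 1.0 / (1.0 / prior_sd ** 2 + 1.0 / noise_sd ** 2)
    post_mean = post_var * y / noise_sd ** 2
    scale = math.sqrt(post_var) / prior_sd

    def T(x):
        return post_mean + scale * x

    return T, post_mean, post_var


T, m, v = gaussian_posterior_map(1.0, 1.0, 2.0)
rng = random.Random(0)
samples = [T(rng.gauss(0.0, 1.0)) for _ in range(20000)]  # independent posterior draws
```

The posterior moments are read off the map analytically, here $m=1$ and $s^2=0.5$, without averaging any samples.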
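Robust Filtering and Smoothing with Gaussian Processes
Marc Peter Deisenroth, Ryan Turner, Marco F. Huber, Uwe D. Hanebeck, Carl Edward Rasmussen
AI summary: Proposes a robust Bayesian filtering and smoothing algorithm for nonlinear stochastic dynamic systems based on nonparametric Gaussian process models, achieving robustness through analytic smoothing. Numerical experiments show that it remains robust where other state-of-the-art methods fail.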
We propose a principled algorithm for robust Bayesian filtering and smoothing in nonlinear stochastic dynamic systems when both the transition function and the measurement function are described by non-parametric Gaussian process (GP) models. GPs are gaining increasing importance in signal processing, machine learning, robotics, and control for representing unknown system functions by posterior probability distributions. This modern way of "system identification" is more robust than finding point estimates of a parametric function representation. In this article, we present a principled algorithm for robust analytic smoothing in GP dynamic systems, which are increasingly used in robotics and control. Our numerical evaluations demonstrate the robustness of the proposed approach in situations where other state-of-the-art Gaussian filters and smoothers can fail.
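Distributed Linear Parameter Estimation: Asymptotically Efficient Adaptive Strategies
Soummya Kar, José M. F. Moura, H. Vincent Poor
AI summary: Studies distributed adaptive linear parameter estimation in multi-agent inference networks. It proposes a mixed time-scale stochastic procedure that performs distributed learning and estimation simultaneously, so that the agents' estimates asymptotically attain the efficiency of the centralized estimator.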
The paper considers the problem of distributed adaptive linear parameter estimation in multi-agent inference networks. Local sensing model information is only partially available at the agents and inter-agent communication is assumed to be unpredictable. The paper develops a generic mixed time-scale stochastic procedure consisting of simultaneous distributed learning and estimation, in which the agents adaptively assess their relative observation quality over time and fuse the innovations accordingly. Under rather weak assumptions on the statistical model and the inter-agent communication, it is shown that, by properly tuning the consensus potential with respect to the innovation potential, the asymptotic information rate loss incurred in the learning process may be made negligible. As such, it is shown that the agent estimates are asymptotically efficient, in that their asymptotic covariance coincides with that of a centralized estimator (the inverse of the centralized Fisher information rate for Gaussian systems) with perfect global model information and having access to all observations at all times. The proof techniques are mainly based on convergence arguments for non-Markovian mixed time scale stochastic approximation procedures. Several approximation results developed in the process are of independent interest.
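Provably Safe and Robust Learning-Based Model Predictive Control
Anil Aswani, Humberto Gonzalez, S. Shankar Sastry, Claire Tomlin
AI summary: Proposes a learning-based model predictive control (LBMPC) scheme that decouples safety from performance, using statistical learning to improve performance while guaranteeing robustness.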
Controller design faces a trade-off between robustness and performance, and the reliability of linear controllers has caused many practitioners to focus on the former. However, there is renewed interest in improving system performance to deal with growing energy constraints. This paper describes a learning-based model predictive control (LBMPC) scheme that provides deterministic guarantees on robustness, while statistical identification tools are used to identify richer models of the system in order to improve performance; the benefits of this framework are that it handles state and input constraints, optimizes system performance with respect to a cost function, and can be designed to use a wide variety of parametric or nonparametric statistical tools. The main insight of LBMPC is that safety and performance can be decoupled under reasonable conditions in an optimization framework by maintaining two models of the system. The first is an approximate model with bounds on its uncertainty, and the second model is updated by statistical methods. LBMPC improves performance by choosing inputs that minimize a cost subject to the learned dynamics, and it ensures safety and robustness by checking whether these same inputs keep the approximate model stable when it is subject to uncertainty. Furthermore, we show that if the system is sufficiently excited, then the LBMPC control action probabilistically converges to that of an MPC computed using the true dynamics.
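Local Linear Regression on Manifolds and Its Geometric Interpretation
Ming-Yen Cheng, Hau-tieng Wu
AI summary: Considers high-dimensional data lying on an unknown low-dimensional nonlinear manifold. It proposes reducing the dimension directly to the manifold's intrinsic dimension and performing local linear regression (LLR) on a tangent plane estimate, analyzes the geometric meaning of the estimator, and uses it for manifold learning.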
High-dimensional data analysis has been an active area, and the main focuses have been variable selection and dimension reduction. In practice, it occurs often that the variables are located on an unknown, lower-dimensional nonlinear manifold. Under this manifold assumption, one purpose of this paper is regression and gradient estimation on the manifold, and another is developing a new tool for manifold learning. To the first aim, we suggest directly reducing the dimensionality to the intrinsic dimension $d$ of the manifold, and performing the popular local linear regression (LLR) on a tangent plane estimate. An immediate consequence is a dramatic reduction in the computation time when the ambient space dimension $p\gg d$. We provide rigorous theoretical justification of the convergence of the proposed regression and gradient estimators by carefully analyzing the curvature, boundary, and non-uniform sampling effects. A bandwidth selector that can handle heteroscedastic errors is proposed. To the second aim, we analyze carefully the behavior of our regression estimator both in the interior and near the boundary of the manifold, and make explicit its relationship with manifold learning, in particular estimating the Laplace-Beltrami operator of the manifold. In this context, we also make clear that it is important to use a smaller bandwidth in the tangent plane estimation than in the LLR. Simulation studies and the Isomap face data example are used to illustrate the computational speed and estimation accuracy of our methods.
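The regression step itself is standard local linear regression. The sketch shows it in one coordinate and assumes the tangent-plane coordinates have already been computed; here $d=1$ and the inputs are used directly. It also assumes a Gaussian kernel. The intercept estimates the regression function at $x_0$ and the slope estimates its gradient. Neither the tangent-plane estimation nor the paper's bandwidth selector is implemented, and the function name is hypothetical.

```python
import math


def local_linear(xs, ys, x0, h):
    """Local linear regression at x0 with a Gaussian kernel of bandwidth h (1-D inputs).

    Solves min_{a,b} sum_i w_i (y_i - a - b (x_i - x0))^2 and returns (a, b):
    a estimates m(x0) and b estimates m'(x0).
    """
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = s0 * s2 - s1 * s1  # determinant of the 2x2 normal equations
    a = (s2 * t0 - s1 * t1) / det
    b = (s0 * t1 - s1 * t0) / det
    return a, b
```

Local linear regression reproduces linear functions exactly. On data from $y=2x+1$, it returns value $2x_0+1$ and slope 2 at any $x_0$ and any bandwidth.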
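Instantaneous Frequency and Wave Shape Functions (I)
Hau-tieng Wu
AI summary: Considers a class of wave shape functions that repeat nearly periodically and gives a rigorous definition of their instantaneous frequency. It proves that the synchrosqueezing transform can determine this frequency even for non-harmonic waveforms, generalizing earlier results for cosine wave functions.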
Although one can formulate an intuitive notion of instantaneous frequency, generalizing "frequency" as we understand it in e.g. the Fourier transform, a rigorous mathematical definition is lacking. In this paper, we consider a class of functions composed of waveforms that repeat nearly periodically, and for which the instantaneous frequency can be given a rigorous meaning. We show that Synchrosqueezing can be used to determine the instantaneous frequency of functions in this class, even if the waveform is not harmonic, thus generalizing earlier results for cosine wave functions. We also provide real-life examples and discuss the advantages, for these examples, of considering such non-harmonic waveforms.
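MahNMF: Manhattan Non-negative Matrix Factorization
Naiyang Guan, Dacheng Tao, Zhigang Luo, John Shawe-Taylor
AI summary: Proposes MahNMF, a Manhattan-distance-based model for handling heavy-tailed noise and outliers. It develops two fast optimization algorithms for it: rank-one residual iteration and Nesterov's smoothing.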
Non-negative matrix factorization (NMF) approximates a non-negative matrix $X$ by a product of two non-negative low-rank factor matrices $W$ and $H$. NMF and its extensions minimize either the Kullback-Leibler divergence or the Euclidean distance between $X$ and $W^T H$ to model the Poisson noise or the Gaussian noise. In practice, when the noise distribution is heavy tailed, they cannot perform well. This paper presents Manhattan NMF (MahNMF) which minimizes the Manhattan distance between $X$ and $W^T H$ for modeling the heavy tailed Laplacian noise. Similar to sparse and low-rank matrix decompositions, MahNMF robustly estimates the low-rank part and the sparse part of a non-negative matrix and thus performs effectively when data are contaminated by outliers. We extend MahNMF for various practical applications by developing box-constrained MahNMF, manifold regularized MahNMF, group sparse MahNMF, elastic net inducing MahNMF, and symmetric MahNMF. The major contribution of this paper lies in two fast optimization algorithms for MahNMF and its extensions: the rank-one residual iteration (RRI) method and Nesterov's smoothing method. In particular, by approximating the residual matrix by the outer product of one row of $W$ and one row of $H$ in MahNMF, we develop an RRI method to iteratively update each variable of $W$ and $H$ in a closed form solution. Although RRI is efficient for small scale MahNMF and some of its extensions, it is neither scalable to large scale matrices nor flexible enough to optimize all MahNMF extensions. Since the objective functions of MahNMF and its extensions are neither convex nor smooth, we apply Nesterov's smoothing method to recursively optimize one factor matrix with another matrix fixed. By setting the smoothing parameter inversely proportional to the iteration number, we improve the approximation accuracy iteratively for both MahNMF and its extensions.
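Under a rank-one residual update, the Manhattan loss splits into scalar problems of the form $\min_{w\ge 0}\sum_j |r_j - w h_j|$. We assume this is how the closed-form RRI update arises; the abstract does not spell it out. Under that assumption, the minimizer is the weighted median of the ratios $r_j/h_j$ with weights $h_j$, clipped at zero, which is a standard fact about one-dimensional $\ell_1$ regression. The sketch computes that scalar update. The function names are hypothetical, and entries with $h_j=0$ are skipped because they do not depend on $w$.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]


def l1_nonneg_scalar_update(r, h):
    """argmin_{w >= 0} sum_j |r_j - w * h_j| for non-negative h (one RRI-style entry update)."""
    pairs = [(rj / hj, hj) for rj, hj in zip(r, h) if hj > 0]
    if not pairs:
        return 0.0  # the objective does not depend on w
    w = weighted_median([p[0] for p in pairs], [p[1] for p in pairs])
    return max(0.0, w)  # a 1-D convex problem: project the minimizer onto w >= 0


def manhattan_distance(X, Y):
    return sum(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))
```

A median, unlike a mean, is not pulled toward a single large outlier. This is the robustness the abstract attributes to the Manhattan loss.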
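Agnostic System Identification for Model-Based Reinforcement Learning
Stephane Ross, J. Andrew Bagnell
AI summary: Addresses the agnostic case, where the model class may not contain the true system. It proposes an iterative method that uses no-regret online learning algorithms to obtain near-optimal policies and validates it on discrete and continuous domains.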
A fundamental problem in control is to learn a model of a system from observations that is useful for controller synthesis. To provide good performance guarantees, existing methods must assume that the real system is in the class of models considered during learning. We present an iterative method with strong guarantees even in the agnostic case where the system is not in the class. In particular, we show that any no-regret online learning algorithm can be used to obtain a near-optimal policy, provided that some model achieves low training error and that a good exploration distribution is available. Our approach applies to both discrete and continuous domains. We demonstrate its efficacy and scalability on a challenging helicopter domain from the literature.
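A Geometric Allocation Approach for Markov Chain Transition Kernels
Hidemaro Suwa, Synge Todo
AI summary: Proposes a geometric allocation method for constructing Markov chain transition kernels. The method minimizes the average rejection rate, achieves zero rejection in many cases, supports irreversible kernels, and substantially reduces the autocorrelation time.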
We introduce a new geometric approach that constructs a transition kernel of a Markov chain. Our method always minimizes the average rejection rate and even reduces it to zero in many relevant cases, which cannot be achieved by conventional methods such as the Metropolis-Hastings algorithm or the heat bath algorithm (Gibbs sampler). Moreover, the geometric approach makes it possible to find not only a reversible but also an irreversible solution of rejection-free transition probabilities. This is the first versatile method that can construct an irreversible transition kernel in general cases. We demonstrate that the autocorrelation time (asymptotic variance) of the Potts model becomes more than 6 times shorter than that of the conventional Metropolis-Hastings algorithm. Our algorithms are applicable to almost all kinds of Markov chain Monte Carlo methods and will improve their efficiency.
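One concrete geometric allocation can be sketched as follows. This is our own reading of the box picture, so treat the details as an assumption.

- Lay boxes of length $w_i$ end to end on a circle of circumference $\sum_i w_i$, starting with the largest weight $w_1$.
- Shift every box by $w_1$.
- Let the flow $v_{ij}$ be the overlap of shifted box $i$ with box $j$.

With $S_i=\sum_{k\le i}w_k$, $S_0\equiv S_n$ and $\Delta_{ij}=S_i-S_{j-1}+w_1$, the overlap is $v_{ij}=\max(0,\min(\Delta_{ij},\,w_i+w_j-\Delta_{ij},\,w_i,\,w_j))$, and $p_{ij}=v_{ij}/w_i$. The shifted boxes and the target boxes tile the same interval. Rows therefore sum to one and $\sum_i w_i p_{ij}=w_j$ holds (global balance), without detailed balance being imposed. The function name is hypothetical.

```python
def shifted_allocation(weights):
    """Transition matrix from a shifted box allocation (sketch; our reading of the geometry).

    weights: positive target weights (unnormalized probabilities) of the candidate states.
    Returns P with P[i][j] = probability of moving from state i to state j.
    """
    n = len(weights)
    order = sorted(range(n), key=lambda i: -weights[i])  # largest weight goes first
    w = [float(weights[i]) for i in order]
    S = [0.0] * (n + 1)                                   # S[i] = w_1 + ... + w_i
    for i in range(1, n + 1):
        S[i] = S[i - 1] + w[i - 1]
    P = [[0.0] * n for _ in range(n)]
    for a in range(1, n + 1):                             # 1-based positions in sorted order
        for b in range(1, n + 1):
            start_b = S[n] if b == 1 else S[b - 1]        # S_0 is identified with S_n
            delta = S[a] - start_b + w[0]                 # Delta_ab
            v = max(0.0, min(delta, w[a - 1] + w[b - 1] - delta, w[a - 1], w[b - 1]))
            P[order[a - 1]][order[b - 1]] = v / w[a - 1]
    return P
```

With three equal weights, the result is the cycle 1 → 2 → 3 → 1. It has zero rejection and satisfies global balance, but it is irreversible: the flow from 1 to 2 is not matched by a flow back.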
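Chi-Square Simulation of the CIR Process and the Heston Model
Simon J. A. Malham, Anke Wiese
AI summary: Proposes a new representation of the central chi-square density based on sums of powers of generalized Gaussian variables. It combines an extension of Marsaglia's polar method with a direct inversion method based on Beasley-Springer-Moro to achieve accurate, robust, and efficient chi-square sampling, and applies this to non-central chi-square variance simulation in the Heston model.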
The transition probability of a Cox-Ingersoll-Ross process can be represented by a non-central chi-square density. First we prove a new representation for the central chi-square density based on sums of powers of generalized Gaussian random variables. Second we prove Marsaglia's polar method extends to this distribution, providing a simple, exact, robust and efficient acceptance-rejection method for generalized Gaussian sampling and thus central chi-square sampling. Third we derive a simple, high-accuracy, robust and efficient direct inversion method for generalized Gaussian sampling based on the Beasley-Springer-Moro method. Indeed the accuracy of the approximation to the inverse cumulative distribution function is to the tenth decimal place. We then apply our methods to non-central chi-square variance sampling in the Heston model. We focus on the case when the number of degrees of freedom is small and the zero boundary is attracting and attainable, typical in foreign exchange markets. Using the additivity property of the chi-square distribution, our methods apply in all parameter regimes.
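For reference, the classical polar method that the paper extends generates pairs of standard Gaussians by rejection sampling from the unit disk. The sketch uses it to draw central chi-square variates with an integer number of degrees of freedom $k$, as a sum of $k$ squared normals. It reproduces neither the generalized Gaussian extension needed for non-integer degrees of freedom nor the Beasley-Springer-Moro inversion. The function names are hypothetical.

```python
import math
import random


def polar_normal_pair(rng):
    """Marsaglia's polar method: returns two independent N(0, 1) draws."""
    while True:
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        s = u * u + v * v
        if 0.0 < s < 1.0:  # accept points strictly inside the unit disk
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor


def central_chi_square(k, rng):
    """Central chi-square with integer k degrees of freedom: sum of k squared normals."""
    total, spare = 0.0, []
    for _ in range(k):
        if not spare:
            spare = list(polar_normal_pair(rng))
        z = spare.pop()
        total += z * z
    return total
```

With a small number of degrees of freedom, as in the foreign exchange case the abstract mentions, $k$ is typically not an integer. That is why a generalized construction is needed.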
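Faster Gaussian Summation: Theory and Experiment
Dongryeol Lee, Alexander G. Gray
AI summary: Addresses the Gaussian summation problem common in machine learning. Within an adaptive hierarchical data structure framework, it proposes two new extensions that yield faster algorithms: an $O(D^p)$ Taylor expansion with rigorous error bounds, and a new error control scheme that integrates arbitrary approximation methods. Experiments on optimal bandwidth selection for kernel density estimation reveal, for the first time, the strengths and weaknesses of state-of-the-art methods.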
We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine learning methods. We develop two new extensions within the best discrete-algorithmic framework using adaptive hierarchical data structures: an $O(D^p)$ Taylor expansion for the Gaussian kernel with rigorous error bounds, and a new error control scheme integrating any arbitrary approximation method. We rigorously evaluate these techniques empirically in the context of optimal bandwidth selection in kernel density estimation, revealing the strengths and weaknesses of current state-of-the-art approaches for the first time. Our results demonstrate that the new error control scheme yields improved performance, whereas the series expansion approach is only effective in low dimensions (five or less).
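The quantity being accelerated is $G(q)=\sum_r \exp(-\|q-r\|^2/(2h^2))$, computed for every query $q$. The sketch gives two things. The first is the brute-force baseline. The second is a deliberately simple error-controlled approximation that drops every term below a tolerance $\tau$, so the absolute error per query is at most (number of dropped terms) $\times\,\tau$. It still visits every pair and saves only the exponentials. It illustrates trading accuracy for work under a guaranteed bound, not the paper's dual-tree algorithm, Taylor expansion, or error control scheme. The function names are hypothetical.

```python
import math


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def gauss_sum_exact(queries, refs, h):
    """Brute force: G(q) = sum_r exp(-||q - r||^2 / (2 h^2)) for each query."""
    return [sum(math.exp(-sq_dist(q, r) / (2 * h * h)) for r in refs) for q in queries]


def gauss_sum_cutoff(queries, refs, h, tau):
    """Skip terms smaller than tau (0 < tau < 1); returns sums and per-query error bounds."""
    cutoff_sq = -2 * h * h * math.log(tau)  # exp(-d2 / (2 h^2)) < tau  iff  d2 > cutoff_sq
    sums, bounds = [], []
    for q in queries:
        total, dropped = 0.0, 0
        for r in refs:
            d2 = sq_dist(q, r)
            if d2 > cutoff_sq:
                dropped += 1          # each skipped term is below tau
            else:
                total += math.exp(-d2 / (2 * h * h))
        sums.append(total)
        bounds.append(dropped * tau)  # absolute error is at most this
    return sums, bounds
```

A tree-based method gets its speed from skipping whole groups of far-away reference points at once, instead of testing each pair as this sketch does.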
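Matrix Tile Analysis
Inmar Givoni, Vincent Cheung, Brendan J. Frey
AI summary: Introduces the matrix tile analysis (MTA) problem, which decomposes a matrix into non-overlapping tiles. It designs an approximate iterative algorithm and a sum-product relaxation, and validates them on synthetic data and yeast gene knockout data.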
Many tasks require finding groups of elements in a matrix of numbers, symbols or class likelihoods. One approach is to use efficient bi- or tri-linear factorization techniques including PCA, ICA, sparse matrix factorization and plaid analysis. These techniques are not appropriate when addition and multiplication of matrix elements are not sensibly defined. More directly, methods like bi-clustering can be used to classify matrix elements, but these methods make the overly-restrictive assumption that the class of each element is a function of a row class and a column class. We introduce a general computational problem, "matrix tile analysis" (MTA), which consists of decomposing a matrix into a set of non-overlapping tiles, each of which is defined by a subset of usually nonadjacent rows and columns. MTA does not require an algebra for combining tiles, but must search over discrete combinations of tile assignments. Exact MTA is a computationally intractable integer programming problem, but we describe an approximate iterative technique and a computationally efficient sum-product relaxation of the integer program. We compare the effectiveness of these methods to PCA and plaid on hundreds of randomly generated tasks. Using double-gene-knockout data, we show that MTA finds groups of interacting yeast genes that have biologically-related functions.
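Estimation of Simultaneously Sparse and Low Rank Matrices
Emile Richard, Pierre-Andre Savalle, Nicolas Vayatis
AI summary: Proposes a convex mixed penalty that uses the $\ell_1$ norm and the trace norm together to estimate matrices that are simultaneously sparse and low-rank. It derives an oracle inequality and a generalization error bound for link prediction, and solves the problem efficiently with proximal descent.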
The paper introduces a penalized matrix estimation procedure aiming at solutions which are sparse and low-rank at the same time. Such structures arise in the context of social networks or protein interactions where underlying graphs have adjacency matrices which are block-diagonal in the appropriate basis. We introduce a convex mixed penalty which involves $\ell_1$-norm and trace norm simultaneously. We obtain an oracle inequality which indicates how the two effects interact according to the nature of the target matrix. We bound generalization error in the link prediction problem. We also develop proximal descent strategies to solve the optimization problem efficiently and evaluate performance on synthetic and real data sets.
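A Combinatorial Algebraic Approach for the Identifiability of Low-Rank Matrix Completion
Franz Kiraly, Ryota Tomioka
AI summary: Uses a combinatorial algebraic approach to give the first necessary and sufficient combinatorial conditions for a matrix of arbitrary rank to be identifiable from a set of its entries, and proposes new algorithms.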
In this paper, we review the problem of matrix completion and expose its intimate relations with algebraic geometry, combinatorics and graph theory. We present the first necessary and sufficient combinatorial conditions for matrices of arbitrary rank to be identifiable from a set of matrix entries, yielding theoretical constraints and new algorithms for the problem of matrix completion. We conclude by algorithmically evaluating the tightness of the given conditions and algorithms for practically relevant matrix sizes, showing that the algebraic-combinatoric approach can lead to improvements over state-of-the-art matrix completion methods.
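Directed Time Series Regression for Control
Yi-Hao Kao, Benjamin Van Roy
AI summary: Proposes directed time series regression for estimating time series model parameters used in certainty equivalent model predictive control. The approach combines the merits of least squares regression and empirical optimization, and it significantly improves controller performance on a stochastic inverted pendulum balancing problem.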
We propose directed time series regression, a new approach to estimating parameters of time-series models for use in certainty equivalent model predictive control. The approach combines merits of least squares regression and empirical optimization. Through a computational study involving a stochastic version of a well known inverted pendulum balancing problem, we demonstrate that directed time series regression can generate significant improvements in controller performance over either of the aforementioned alternatives.
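Robust Power System State Estimation for the Nonlinear AC Power Flow Model
Hao Zhu, Georgios B. Giannakis
AI summary: Addresses the nonconvexity and data integrity issues of state estimation under the nonlinear AC power flow model. It proposes a robust state estimation method that exploits the sparsity of an overcomplete outlier vector and uses convex semidefinite relaxation to solve the problem efficiently, and validates it on the IEEE 30-bus system.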
An important monitoring task for power systems is accurate estimation of the system operation state. Under the nonlinear AC power flow model, the state estimation (SE) problem is inherently nonconvex giving rise to many local optima. In addition to nonconvexity, SE is challenged by data integrity and cyber-security issues. Unfortunately, existing robust (R-) SE schemes employed routinely in practice rely on iterative solvers, which are sensitive to initialization and cannot ensure global optimality. A novel R-SE approach is formulated here by capitalizing on the sparsity of an overcomplete outlier vector model. Observability and identifiability issues of this model are investigated, and neat links are established between R-SE and error control coding. The convex semidefinite relaxation (SDR) technique is further pursued to render the nonconvex R-SE problem efficiently solvable. The resultant algorithm markedly outperforms existing iterative alternatives, as corroborated through numerical tests on the standard IEEE 30-bus system.
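Clustering by Low-Rank Doubly Stochastic Matrix Decomposition
Zhirong Yang, Erkki Oja
AI summary: Proposes a low-rank learning method that goes beyond matrix factorization by approximating cluster assignment probabilities through a two-step bipartite random walk. Minimizing the KL divergence is equivalent to maximum likelihood estimation of a discriminative model. Optimization uses a relaxed MM algorithm, and the method markedly improves clustering purity on large-scale manifold data.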
Clustering analysis by nonnegative low-rank approximations has achieved remarkable progress in the past decade. However, most approximation approaches in this direction are still restricted to matrix factorization. We propose a new low-rank learning method that goes beyond matrix factorization to improve clustering performance. The approximation is based on a two-step bipartite random walk through virtual cluster nodes, where the approximation is formed only from cluster assignment probabilities. Minimizing the approximation error measured by Kullback-Leibler divergence is equivalent to maximizing the likelihood of a discriminative model, which endows our method with a solid probabilistic interpretation. The optimization is implemented by a relaxed Majorization-Minimization algorithm that is advantageous in finding good local minima. Furthermore, we point out that the regularized algorithm with Dirichlet prior only serves as initialization. Experimental results show that the new method has strong performance in clustering purity for various datasets, especially for large-scale manifold data.
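Ensemble Methods for Convex Regression with Applications to Geometric Programming Based Circuit Design
Lauren Hannah, David Dunson
AI summary: Proposes ensemble methods (such as bagging, smearing, and random partitioning) to improve the stability of piecewise linear convex regression. The methods are applied to device modeling and constraint approximation in geometric-programming-based circuit design.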
Convex regression is a promising area for bridging statistical estimation and deterministic convex optimization. New piecewise linear convex regression methods are fast and scalable, but can be unstable when used to approximate constraints or objective functions for optimization. Ensemble methods, such as bagging, smearing and random partitioning, can alleviate this problem and maintain the theoretical properties of the underlying estimator. We empirically examine the performance of ensemble methods for prediction and optimization, and then apply them to device modeling and constraint approximation for geometric programming based circuit design.
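Stability of Matrix Factorization for Collaborative Filtering
Yu-Xiang Wang, Huan Xu
AI summary: Studies the stability of matrix factorization algorithms for matrix completion under adversarial noise, using error bounds, subspace analysis, and per-user prediction error analysis. The results provide guidance for collaborative filtering system design.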
We study the stability of matrix factorization algorithms for matrix completion vis-à-vis adversarial noise. In particular, our results include: (I) we bound the gap between the solution matrix of the factorization method and the ground truth in terms of root mean square error; (II) we treat matrix factorization as a subspace fitting problem and analyze the difference between the solution subspace and the ground truth; (III) we analyze the prediction error of individual users based on the subspace stability. We apply these results to the problem of collaborative filtering under manipulator attack, which leads to useful insights and guidelines for collaborative filtering system design.
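A Hybrid Algorithm for Convex Semidefinite Optimization
Soeren Laue
AI summary: Proposes a hybrid algorithm for optimizing convex smooth functions over the cone of positive semidefinite matrices. The algorithm converges to the global optimum and can solve large-scale semidefinite programs, and it outperforms existing methods on matrix completion, metric learning, and sparse PCA.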
We present a hybrid algorithm for optimizing a convex, smooth function over the cone of positive semidefinite matrices. Our algorithm converges to the global optimal solution and can be used to solve general large-scale semidefinite programs and hence can be readily applied to a variety of machine learning problems. We show experimental results on three machine learning problems (matrix completion, metric learning, and sparse PCA). Our approach outperforms state-of-the-art algorithms.
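Quasi-Newton Methods: A New Direction
Philipp Hennig, Martin Kiefel
AI summary: Interprets quasi-Newton methods as approximations of Bayesian linear regression, which reveals shortcomings of classical algorithms. Building on this, proposes a new nonparametric quasi-Newton method that uses available information more efficiently at similar computational cost.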
Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.
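The sketch below makes the reading of quasi-Newton methods as learners of a local quadratic model concrete. It runs the classical BFGS inverse-Hessian update, not the nonparametric method proposed in the paper, on a hypothetical two-dimensional quadratic. It assumes exact line search, which has a closed form for quadratics. The function name `bfgs_quadratic` and the test problem are illustrative choices.

```python
def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bfgs_quadratic(A, b, x0, max_iter=20, tol=1e-12):
    """Minimise f(x) = 0.5 x^T A x - b^T x with BFGS and exact line search.

    H is the running approximation of the inverse Hessian A^{-1}; each secant
    update folds in one observed (step, gradient-change) pair.
    """
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient A x - b
        if dot(g, g) ** 0.5 < tol:
            break
        p = [-v for v in matvec(H, g)]                      # quasi-Newton direction
        alpha = -dot(g, p) / dot(p, matvec(A, p))           # exact minimiser along p
        s = [alpha * v for v in p]
        x = [xi + si for xi, si in zip(x, s)]
        g_new = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        y = [gn - go for gn, go in zip(g_new, g)]
        rho = 1.0 / dot(y, s)
        # BFGS: H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T
        E = [[(1.0 if i == j else 0.0) - rho * s[i] * y[j] for j in range(n)]
             for i in range(n)]
        EH = [[sum(E[i][k] * H[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        H = [[sum(EH[i][k] * E[j][k] for k in range(n)) + rho * s[i] * s[j]
              for j in range(n)] for i in range(n)]
    return x, H


A = [[3.0, 1.0], [1.0, 2.0]]   # hypothetical positive definite Hessian
b = [1.0, 1.0]
x, H = bfgs_quadratic(A, b, [0.0, 0.0])
```

For a strictly convex quadratic with exact line search, BFGS is known to reach the minimiser in at most n steps. Here `x` should therefore land on A^{-1} b = (0.2, 0.4) after two updates.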
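Complex Orthogonal Matching Pursuit and Its Exact Recovery Conditions
Rong Fan, Qun Wan, Yipeng Liu, Hui Chen, Xiao Zhang
AI summary: Treats the complex case (complex measurement vector, complex dictionary, and complex additive white Gaussian noise). Gives conditions under which orthogonal matching pursuit (OMP) exactly recovers sparse signals in noiseless and bounded Gaussian noise settings, and derives the exact recovery condition (ERC) for the complex case.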
In this paper, we present new results on using orthogonal matching pursuit (OMP) to solve the sparse approximation problem over redundant dictionaries in the complex case (i.e., complex measurement vector, complex dictionary and complex additive white Gaussian noise (CAWGN)). We propose a sufficient condition under which OMP recovers the optimal representation of an exactly sparse signal in the complex case, in both noiseless and bounded Gaussian noise settings. Extending the exact recovery condition (ERC) results known for the real case, we derive the corresponding ERC for the complex case. We then use this theory to show that OMP succeeds in recovering k-sparse signals for a class of complex dictionaries. In addition, an application to the geometrical theory of diffraction (GTD) model is presented for the complex case. Finally, simulation experiments illustrate the validity of the theoretical analysis.
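Below is a minimal pure-Python sketch of complex OMP, assuming unit-norm atoms and noiseless measurements. The dictionary (length-16 identity atoms plus normalised DFT atoms) and the helper names `inner`, `solve`, and `omp` are hypothetical choices for illustration, not the paper's experimental setup. Complex data enter only through the Hermitian inner product, which is used both for atom selection and for the least-squares update.

```python
import cmath
import math


def inner(u, v):
    """Hermitian inner product <u, v> = sum(conj(u_i) * v_i)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def solve(M, rhs):
    """Solve the square complex system M c = rhs (Gaussian elimination, partial pivoting)."""
    n = len(M)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0j] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (aug[r][n] - sum(aug[r][j] * sol[j] for j in range(r + 1, n))) / aug[r][r]
    return sol


def omp(D, y, k):
    """Complex OMP: D is a list of unit-norm atoms (columns), y the measurement, k the sparsity."""
    support, coef, residual = [], [], list(y)
    for _ in range(k):
        # select the unused atom whose correlation with the residual has the largest modulus
        j = max((i for i in range(len(D)) if i not in support),
                key=lambda i: abs(inner(D[i], residual)))
        support.append(j)
        # least squares on the support via the normal equations D_S^H D_S c = D_S^H y
        G = [[inner(D[a], D[b]) for b in support] for a in support]
        coef = solve(G, [inner(D[a], y) for a in support])
        residual = [y[t] - sum(c * D[s][t] for c, s in zip(coef, support))
                    for t in range(len(y))]
    return support, coef


# Hypothetical test dictionary: 16 identity atoms followed by 16 unit-norm DFT atoms.
m = 16
identity = [[1.0 + 0j if t == i else 0j for t in range(m)] for i in range(m)]
dft = [[cmath.exp(2j * math.pi * f * t / m) / math.sqrt(m) for t in range(m)]
       for f in range(m)]
D = identity + dft
y = [(1 + 2j) * D[3][t] + (-0.5 + 1j) * D[m + 5][t] for t in range(m)]
support, coef = omp(D, y, 2)
```

The identity-plus-DFT dictionary has mutual coherence 1/4. The classical coherence condition k < (1 + 1/μ)/2 should therefore guarantee that two greedy steps pick indices 3 and 21 and recover both complex coefficients.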
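Multi-Sparse Signal Recovery for Compressive Sensing
Yipeng Liu, Ivan Gligorijevic, Vladimir Matic, Maarten De Vos, Sabine Van Huffel
AI summary: For signals that are sparse in multiple domains, proposes a convex programming model that combines multiple sparsity constraints with a linear measurement fitting constraint to improve recovery performance. The method is validated on EMG signals.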
Signal recovery is one of the key techniques of compressive sensing (CS). It reconstructs the original signal from linear sub-Nyquist measurements. Classical methods exploit sparsity in one domain to formulate an L0-norm optimization. Recent investigations show that some signals are sparse in multiple domains. To further improve signal reconstruction performance, we exploit this multi-sparsity to formulate a new convex programming model, built from multiple sparsity constraints in multiple domains together with the linear measurement fitting constraint. The additional a priori information improves signal recovery performance. Since some EMG signals exhibit sparsity in both the time and frequency domains, we use them as examples in numerical experiments. Results show that the newly proposed method achieves better performance for multi-sparse signals.
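Sensor Management: Past, Present, and Future
Alfred O. Hero, Douglas Cochran
AI summary: Surveys the theory, algorithms, and applications of sensor management, covering its development and current state, and looks ahead to future directions.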
Sensor systems typically operate under resource constraints that prevent the simultaneous use of all resources all of the time. Sensor management becomes relevant when the sensing system has the capability of actively managing these resources; i.e., changing its operating configuration during deployment in reaction to previous measurements. Examples of systems in which sensor management is currently used or is likely to be used in the near future include autonomous robots, surveillance and reconnaissance networks, and waveform-agile radars. This paper provides an overview of the theory, algorithms, and applications of sensor management as it has developed over the past decades and as it stands today.
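Optimal Direction Gibbs Sampling
J. Andrés Christen, Colin Fox, Diego Andrés Pérez-Ruiz, Mario Santana-Cibrian
AI summary: Studies how to choose optimal directions in a directional random-scan Gibbs sampler by minimizing the mutual information for a truncated normal objective function. The result is generalized to the case where a multivariate normal local approximation is available.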
Generalized Gibbs kernels are those that may move in any direction, not necessarily along the coordinate axes of the parameters of the objective function. We study how to choose such directions optimally in a directional random-scan Gibbs sampler. The optimal direction is chosen by minimizing the mutual information (Kullback-Leibler divergence) between two steps of the MCMC for a truncated normal objective function. The result is generalized for use when a multivariate normal (local) approximation of the objective function is available. Three Gibbs direction distributions are tested on highly skewed non-normal objective functions.
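The sketch below illustrates what a directional (non-axis-aligned) Gibbs move is. It draws directions uniformly on the unit circle rather than from the optimal direction distribution studied in the paper. The target is a hypothetical zero-mean bivariate normal with precision matrix `P`. Along any line x + t e, its conditional is normal with mean -(x^T P e)/(e^T P e) and variance 1/(e^T P e), so each move can be sampled exactly. The name `directional_gibbs` is our own.

```python
import math
import random


def directional_gibbs(P, n_steps, seed=0):
    """Random-direction Gibbs sampler for a zero-mean bivariate normal with precision P.

    Each step draws a direction e uniformly on the unit circle and samples the
    step length t exactly from the target restricted to the line x + t e.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    samples = []
    for _ in range(n_steps):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        e = [math.cos(theta), math.sin(theta)]
        Pe = [P[0][0] * e[0] + P[0][1] * e[1], P[1][0] * e[0] + P[1][1] * e[1]]
        a = e[0] * Pe[0] + e[1] * Pe[1]              # e^T P e
        b = x[0] * Pe[0] + x[1] * Pe[1]              # x^T P e
        t = rng.gauss(-b / a, 1.0 / math.sqrt(a))    # conditional along the line
        x = [x[0] + t * e[0], x[1] + t * e[1]]
        samples.append(x)
    return samples


# Hypothetical target: unit variances, correlation 0.8.
corr = 0.8
det = 1.0 - corr ** 2
P = [[1.0 / det, -corr / det], [-corr / det, 1.0 / det]]
samples = directional_gibbs(P, 50000)
n = len(samples)
var0 = sum(s[0] ** 2 for s in samples) / n
cov01 = sum(s[0] * s[1] for s in samples) / n
```

The sample variance and covariance should come out close to 1 and 0.8. Replacing the uniform direction draw with a tuned direction distribution is exactly the design choice the paper optimises.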
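Free Energy and the Generalized Optimality Equations for Sequential Decision Making
Pedro A. Ortega, Daniel A. Braun
AI summary: Applies the free energy principle to general decision trees that contain both adversarial and stochastic environments. This yields generalized sequential optimality equations that include the Bellman optimality equations as a limit case and give rise to decision rules such as Expectimax, Minimax, and Expectiminimax, with a resource parameter assigned to each node to express computational cost.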
The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized sequential optimality equations that not only include the Bellman optimality equations as a limit case, but also lead to well-known decision rules such as Expectimax, Minimax and Expectiminimax. We show how these decision rules can be derived from a single free energy principle that assigns a resource parameter to each node in the decision tree. These resource parameters express a concrete computational cost that can be measured as the number of samples needed from the distribution that belongs to each node. The free energy principle therefore provides the normative basis for generalized optimality equations that account for both adversarial and stochastic environments.
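The limit cases above can be seen from the log-sum-exp form of a free-energy value, V = (1/β) log Σ_x p(x) exp(β U(x)). We assume this form here as the per-node operator, with β playing the role of the resource parameter. Large positive β approaches a max node, large negative β approaches a min node, and β → 0 gives an expectation. This is a sketch under that assumption. The tree, the β values, and the names `free_energy` and `value` are hypothetical.

```python
import math


def free_energy(utilities, probs, beta):
    """Free-energy value (1/beta) * log sum_x p(x) exp(beta * U(x)).

    beta -> +inf tends to max U, beta -> -inf tends to min U, and
    beta = 0 is taken as the limit, the expectation under p.
    """
    if beta == 0:
        return sum(p * u for p, u in zip(probs, utilities))
    m = max(beta * u for u in utilities)             # log-sum-exp shift for stability
    s = sum(p * math.exp(beta * u - m) for p, u in zip(probs, utilities))
    return (m + math.log(s)) / beta


def value(node):
    """Evaluate a tree: a leaf is a float utility, an inner node is (beta, [(prob, child), ...])."""
    if isinstance(node, (int, float)):
        return float(node)
    beta, children = node
    probs = [p for p, _ in children]
    utils = [value(child) for _, child in children]
    return free_energy(utils, probs, beta)


# Hypothetical tree: a near-max agent node over two near-min adversary nodes.
tree = (1e3, [(0.5, (-1e3, [(0.5, 4.0), (0.5, 1.0)])),
              (0.5, (-1e3, [(0.5, 3.0), (0.5, 2.0)]))])
v = value(tree)
```

With these β values the tree value should be close to the Minimax value 2. Setting a node's β to 0 turns it into a chance (expectation) node instead.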
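New Inference Strategies for Solving Markov Decision Processes Using Reversible Jump MCMC
Matthias Hoffman, Hendrik Kueck, Nando de Freitas, Arnaud Doucet
AI summary: Proposes improved inference strategies based on reversible jump MCMC for estimating optimal policies in high-dimensional spaces. The strategies introduce a new target distribution and break the correlation between policy parameters and sampled trajectories.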
In this paper we build on previous work that uses inference techniques, in particular Markov chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectories in order to sample more freely. Finally, we show how to incorporate these techniques in a principled manner to obtain estimates of the optimal policy.
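Penalty Decomposition Methods for L0-Norm Minimization
Zhaosong Lu, Yong Zhang
AI summary: Proposes penalty decomposition methods for optimization problems involving the L0-norm. The problems are reformulated as rank minimization problems and all computations are reduced to vector operations, and the methods outperform existing approaches in applications such as compressed sensing.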
In this paper we consider general l0-norm minimization problems, that is, problems with the l0-norm appearing in either the objective function or the constraints. In particular, we first reformulate the l0-norm constrained problem as an equivalent rank minimization problem and then apply the penalty decomposition (PD) method proposed in [33] to solve the latter problem. By utilizing the special structure, we then transform all matrix operations of this method into vector operations and obtain a PD method that only involves vector operations. Under suitable assumptions, we establish that any accumulation point of the sequence generated by the PD method satisfies a first-order optimality condition that is generally stronger than a natural optimality condition. We further extend the PD method to solve the problem with the l0-norm appearing in the objective function. Finally, we test the performance of our PD methods by applying them to compressed sensing, sparse logistic regression and sparse inverse covariance selection. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.
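Variance Components and Generalized Sobol' Indices
Art B. Owen
AI summary: Introduces generalized Sobol' indices, compares strategies for estimating them, and systematically searches for efficient estimators. Particular attention goes to contrasts, sums of squares, and indices of bilinear form, which reduce the number of function evaluations.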
This paper introduces generalized Sobol' indices, compares strategies for their estimation, and makes a systematic search for efficient estimators. Of particular interest are contrasts, sums of squares and indices of bilinear form which allow a reduced number of function evaluations compared to alternatives. The bilinear framework includes some efficient estimators from Saltelli (2002) and Mauntz (2002) as well as some new estimators for specific variance components and mean dimensions. This paper also provides a bias corrected version of the estimator of Janon et al. (2012) and extends the bias correction to generalized Sobol' indices. Some numerical comparisons are given.
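A Diffusion Equation for the Density of the Ratio of Two Jointly Gaussian Variables and the Numerical Inversion of the Laplace Transform
Piero Barone
AI summary: Proves that the density of the ratio of two jointly Gaussian variables with equal variance satisfies a non-stationary diffusion equation. Discusses what this implies for kernel density estimation of the condensed density of the generalized eigenvalues of a random matrix pencil, as used in numerical inversion of the Laplace transform.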
It is shown that the density of the ratio of two random variables with the same variance and a joint Gaussian density satisfies a non-stationary diffusion equation. Implications of this result are discussed for kernel density estimation of the condensed density of the generalized eigenvalues of a random matrix pencil, which is useful for the numerical inversion of the Laplace transform.
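Computational Aspects and Applications of a New Transform for Solving the Complex Exponentials Approximation Problem
Piero Barone
AI summary: For the complex exponentials approximation problem, develops an algorithm for a new moment-based transform. The algorithm is applied to nuclear magnetic resonance spectral analysis, time series interpolation and extrapolation, and shape reconstruction from moments.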
Many real-life problems can be reduced to the solution of a complex exponentials approximation problem, which is usually ill-posed. Recently a new transform for solving this problem, formulated as a specific moments problem in the plane, has been proposed in a theoretical framework. In this work, some computational issues are addressed to make this new tool useful in practice. An algorithm is developed and used to solve a nuclear magnetic resonance spectrometry problem, two time series interpolation and extrapolation problems, and a shape-from-moments problem.
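A New Transform for Solving the Noisy Complex Exponentials Approximation Problem
Piero Barone
AI summary: Addresses the estimation of a complex measure from complex moments corrupted by additive Gaussian noise. Proposes a new discrete transform for estimating the unknown measure and checks the quality of the approximation by simulation.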
We consider the problem of estimating a complex measure, made up of a linear combination of Dirac distributions centered on points of the complex plane, from a finite number of its complex moments affected by additive i.i.d. Gaussian noise. A random measure is defined whose expectation approximates the unknown measure under suitable conditions. An estimator of the approximating measure is then proposed, as well as a new discrete transform of the noisy moments that allows one to compute an estimate of the unknown measure. A small simulation study is also performed to check the quality of the approximations experimentally.
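Better Estimation of Small Sobol' Sensitivity Indices
Art B. Owen
AI summary: Proposes a new method that uses three independent input vectors instead of the usual two to estimate small Sobol' indices more accurately. When the target index is small, it even outperforms oracle methods that use the true mean of the function.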
A new method for estimating Sobol' indices is proposed. The new method makes use of 3 independent input vectors rather than the usual 2. It attains much greater accuracy on problems where the target Sobol' index is small, even outperforming some oracles which adjust using the true but unknown mean of the function. When the target Sobol' index is quite large, the oracles do better than the new method.
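Here is a sketch of one three-vector estimator of the unnormalised index τ_u². It takes the mean of (f(x) - f(y_u : x_{-u}))(f(x_u : z_{-u}) - f(z)) with x, y, z independent. Expanding the product shows that its expectation is exactly τ_u². Each factor is a difference of function values, so the unknown mean of f cancels. This is our reconstruction of the idea, and the paper's estimator may differ in detail. The function names and test problem are hypothetical.

```python
import random


def small_index_estimate(f, d, u, n, seed=0):
    """Three-vector estimate of the unnormalised first-order index tau_u^2, inputs uniform on [0, 1]^d.

    Averages (f(x) - f(y_u : x_{-u})) * (f(x_u : z_{-u}) - f(z)); only the first and
    third evaluations share coordinates (x_u), so the expectation is tau_u^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        y = [rng.random() for _ in range(d)]
        z = [rng.random() for _ in range(d)]
        y_u_x = [y[j] if j in u else x[j] for j in range(d)]   # y on u, x elsewhere
        x_u_z = [x[j] if j in u else z[j] for j in range(d)]   # x on u, z elsewhere
        total += (f(x) - f(y_u_x)) * (f(x_u_z) - f(z))
    return total / n


def weak_second_input(v):
    """Hypothetical test function whose second input matters little: tau_1^2 = 0.01 / 12."""
    return v[0] + 0.1 * v[1]


tau2 = small_index_estimate(weak_second_input, 2, {1}, 100000, seed=3)
```

In this example each factor reduces to 0.1 times a difference of the second input, so the estimator's spread scales with the small index itself rather than with the mean or total variance of f.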
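An Adaptive Gaussian Mixture Filter Based on Statistical Linearization
Marco F. Huber
AI summary: Proposes an adaptive Gaussian mixture filter based on statistical linearization that dynamically splits Gaussian components to balance computational complexity against estimation accuracy. A measure of the locally induced linearization error guides the splitting, and simulations show the filter outperforms related methods.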
Gaussian mixtures are a common density representation in nonlinear, non-Gaussian Bayesian state estimation. Selecting an appropriate number of Gaussian components, however, is difficult, as one has to trade off computational complexity against estimation accuracy. In this paper, an adaptive Gaussian mixture filter based on statistical linearization is proposed. Depending on the nonlinearity of the considered estimation problem, this filter dynamically increases the number of components via splitting. For this purpose, a measure is introduced that quantifies the linearization error induced locally at each Gaussian mixture component. The deviation between the nonlinear and the linearized state space model is evaluated to determine the splitting direction. The proposed approach is not restricted to a specific statistical linearization method. Simulations show superior estimation performance compared with related approaches and common filtering algorithms.
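Efficient Regularized Isotonic Regression with Application to Gene-Gene Interaction Search
Ronny Luss, Saharon Rosset, Moni Shahar
AI summary: Proposes IRP, a regularized isotonic regression algorithm based on recursive partitioning, which controls model complexity to address overfitting in high dimensions. IRP is applied to gene-gene interaction search.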
Isotonic regression is a nonparametric approach for fitting monotonic models to data that has been widely studied from both theoretical and practical perspectives. However, this approach encounters computational and statistical overfitting issues in higher dimensions. To address both concerns, we present an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic regression based on recursively partitioning the covariate space through solution of progressively smaller "best cut" subproblems. This creates a regularized sequence of isotonic models of increasing model complexity that converges to the global isotonic regression solution. The models along the sequence are often more accurate than the unregularized isotonic regression model because of the complexity control they offer. We quantify this complexity control through estimation of degrees of freedom along the path. Success of the regularized models in prediction and IRP's favorable computational properties are demonstrated through a series of simulated and real data experiments. We discuss application of IRP to the problem of searching for gene-gene interactions and epistasis, and demonstrate it on data from genome-wide association studies of three common diseases.
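IRP itself partitions a multivariate covariate space. As a point of reference, the global isotonic fit in one dimension is classically computed by the pool-adjacent-violators algorithm (PAVA), sketched below. This is a standard baseline, not the IRP algorithm, and the function name `pava` is our own.

```python
def pava(y, w=None):
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit to the sequence y."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


print(pava([1.0, 3.0, 2.0, 4.0]))   # the violating pair (3, 2) is pooled to 2.5
```

Each merge pools a violating pair into its weighted mean, and the result is the exact one-dimensional isotonic least-squares solution.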
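A Simple Flood Forecasting Scheme Using Wireless Sensor Networks
Victor Seal, Arnab Raha, Shovan Maity, Souvik Kr Mitra, Amitava Mukherjee, Mrinal Kanti Naskar
AI summary: Proposes a flood forecasting model based on wireless sensor networks and multiple-variable robust linear regression that delivers real-time predictions through simple, fast computations. Its improvements are shown by comparison with another algorithm.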
This paper presents a forecasting model designed using WSNs (Wireless Sensor Networks) to predict river floods using simple and fast calculations, in order to provide real-time results and save the lives of people who may be affected by the flood. Our prediction model uses multiple-variable robust linear regression, which is easy to understand, simple and cost-effective to implement, speed-efficient, and low in resource utilization, yet provides real-time predictions with reliable accuracy, thus having features that are desirable in any real-world algorithm. Our prediction model is independent of the number of parameters, i.e., any number of parameters may be added or removed based on on-site requirements. When the water level rises, we represent it using a polynomial whose nature is used to determine whether the water level may exceed the flood line in the near future. We compare our work with a contemporary algorithm to demonstrate our improvements over it. We then present simulation results comparing the predicted water level with the actual water level.
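Cross-Entropy Optimisation of Importance Sampling Parameters for Statistical Model Checking
Cyrille Jégourel, Axel Legay, Sean Sedwards
AI summary: Proposes a cross-entropy-based algorithm that optimizes a low-dimensional parameter vector of the importance sampling distribution for statistical model checking. This avoids an explicit representation of the state space and markedly improves simulation efficiency.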
Statistical model checking avoids the exponential growth of states associated with probabilistic model checking by estimating properties from multiple executions of a system and by giving results within confidence bounds. Rare properties are often very important but pose a particular challenge for simulation-based approaches, hence a key objective under these circumstances is to reduce the number and length of simulations necessary to produce a given level of confidence. Importance sampling is a well-established technique that achieves this; however, to maintain the advantages of statistical model checking, it is necessary to find good importance sampling distributions without considering the entire state space. Motivated by the above, we present a simple algorithm that uses the notion of cross-entropy to find the optimal parameters for an importance sampling distribution. In contrast to previous work, our algorithm uses a low-dimensional vector of parameters to define this distribution and thus avoids the often intractable explicit representation of a transition matrix. We show that our parametrisation leads to a unique optimum and can produce many orders of magnitude improvement in simulation efficiency. We demonstrate the efficacy of our methodology by applying it to models from reliability engineering and biochemistry.
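The sketch below shows the cross-entropy idea on about the simplest rare-event problem available: estimating P(X >= γ) for X ~ Exp(1). The importance-sampling family is Exp(mean v), described by a single parameter. This is a generic illustration of cross-entropy parameter tuning, not the paper's parametrisation of a system's transition probabilities. The function name, sample size, and quantile ρ = 0.1 are assumptions.

```python
import math
import random


def ce_rare_event(gamma, n=100000, rho=0.1, seed=0, max_iter=50):
    """Cross-entropy tuning of an Exp(mean v) proposal for P(X >= gamma), X ~ Exp(mean 1).

    Returns the importance-sampling estimate and the tuned parameter v.
    """
    rng = random.Random(seed)

    def weight(x, v):
        # likelihood ratio f(x; 1) / f(x; v) for exponential densities with means 1 and v
        return v * math.exp(-x * (1.0 - 1.0 / v))

    v = 1.0
    for _ in range(max_iter):
        xs = sorted(rng.expovariate(1.0 / v) for _ in range(n))
        level = min(gamma, xs[int((1.0 - rho) * n)])   # intermediate rarity level
        hits = [x for x in xs if x >= level]
        ws = [weight(x, v) for x in hits]
        v = sum(w * x for w, x in zip(ws, hits)) / sum(ws)  # CE update of the mean
        if level >= gamma:
            break
    # final importance-sampling estimate with the tuned parameter
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(1.0 / v)
        if x >= gamma:
            total += weight(x, v)
    return total / n, v


estimate, v_tuned = ce_rare_event(20.0)
```

For γ = 20 the exact probability is e^{-20}, about 2.1 × 10^{-9}, far too rare for crude Monte Carlo with 10^5 samples. The tuned mean should settle near γ + 1, the conditional mean of X given X >= γ.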
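HodgeRank Is the Limit of Perron Rank
Ngoc Mai Tran
AI summary: Studies the map taking an elementwise positive matrix to the k-th root of the principal eigenvector of its k-th Hadamard power, and proves that the row geometric mean vector is recovered as k tends to 0. It follows that HodgeRank is the limit of Perron Rank, which establishes a mathematical link between the two pairwise ranking methods.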
We study the map which takes an elementwise positive matrix to the k-th root of the principal eigenvector of its k-th Hadamard power. We show that as $k$ tends to 0 one recovers the row geometric mean vector and discuss the geometric significance of this convergence. In the context of pairwise comparison ranking, our result states that HodgeRank is the limit of Perron Rank, thereby providing a novel mathematical link between two important pairwise ranking methods.
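A small numerical check of the limit follows. It assumes a hypothetical 3 × 3 reciprocal comparison matrix and, as the abstract indicates, takes HodgeRank on a complete comparison matrix to be the normalised row geometric mean. The k-th root is taken in log space, because for small k the eigenvector entries raised to the power 1/k would otherwise underflow. The function names are our own.

```python
import math


def perron_rank(A, k, iters=200):
    """k-th root of the principal eigenvector of the k-th Hadamard power of A, normalised to sum 1."""
    n = len(A)
    Ak = [[a ** k for a in row] for row in A]      # Hadamard (elementwise) power
    v = [1.0 / n] * n
    for _ in range(iters):                          # power iteration
        w = [sum(Ak[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        v = [x / s for x in w]
    # k-th root in log space, shifted by the maximum to avoid underflow
    logs = [math.log(x) / k for x in v]
    top = max(logs)
    r = [math.exp(lg - top) for lg in logs]
    total = sum(r)
    return [x / total for x in r]


def hodge_rank(A):
    """Row geometric means of A, normalised to sum 1."""
    n = len(A)
    g = [math.exp(sum(math.log(a) for a in row) / n) for row in A]
    total = sum(g)
    return [x / total for x in g]


# Hypothetical reciprocal pairwise-comparison matrix (A[j][i] = 1 / A[i][j]).
A = [[1.0, 2.0, 4.0], [0.5, 1.0, 3.0], [0.25, 1.0 / 3.0, 1.0]]
perron = perron_rank(A, 1.0)      # classical Perron ranking
near_limit = perron_rank(A, 1e-5)
hodge = hodge_rank(A)
```

For k = 1 this is the usual Perron ranking. As k shrinks, the output should approach the normalised row geometric means, which is the convergence the abstract states.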
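Path Following in the Exact Penalty Method of Convex Programming
Hua Zhou, Kenneth Lange
AI summary: Proposes a path-following strategy, consistent with the exact penalty method, that traces the solution from the unconstrained solution to the constrained one by continuously varying the penalty constant. The strategy is applied to a range of convex programming problems.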
Classical penalty methods solve a sequence of unconstrained problems that put greater and greater stress on meeting the constraints. In the limit as the penalty constant tends to $\infty$, one recovers the constrained solution. In the exact penalty method, squared penalties are replaced by absolute value penalties, and the solution is recovered for a finite value of the penalty constant. In practice, the kinks in the penalty and the unknown magnitude of the penalty constant prevent wide application of the exact penalty method in nonlinear programming. In this article, we examine a strategy of path following consistent with the exact penalty method. Instead of performing optimization at a single penalty constant, we trace the solution as a continuous function of the penalty constant. Thus, path following starts at the unconstrained solution and follows the solution path as the penalty constant increases. In the process, the solution path hits, slides along, and exits from the various constraints. For quadratic programming, the solution path is piecewise linear and takes large jumps from constraint to constraint. For a general convex program, the solution path is piecewise smooth, and path following operates by numerically solving an ordinary differential equation segment by segment. Our diverse applications to a) projection onto a convex set, b) nonnegative least squares, c) quadratically constrained quadratic programming, d) geometric programming, and e) semidefinite programming illustrate the mechanics and potential of path following. The final detour to image denoising demonstrates the relevance of path following to regularized estimation in inverse problems. In regularized estimation, one follows the solution path as the penalty constant decreases from a large value.
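A one-dimensional toy problem makes the contrast between the two penalties concrete. The problem is to minimise (x - 2)^2 subject to x <= 1, whose constrained solution is x = 1. The problem and function names are hypothetical. The closed forms come from setting the (sub)derivative of each penalised objective to zero.

```python
def exact_penalty_path(rho):
    """Minimiser of (x - 2)^2 + rho * max(0, x - 1): absolute-value (exact) penalty for x <= 1."""
    # While the minimiser is infeasible (x > 1), 2(x - 2) + rho = 0 gives x = 2 - rho/2.
    # For rho >= 2 the subdifferential at the kink x = 1, [-2, -2 + rho], contains 0.
    return max(1.0, 2.0 - rho / 2.0)


def quadratic_penalty_path(rho):
    """Minimiser of (x - 2)^2 + rho * max(0, x - 1)^2: classical squared penalty."""
    # 2(x - 2) + 2 rho (x - 1) = 0 gives x = (2 + rho) / (1 + rho), which stays > 1 for finite rho.
    return (2.0 + rho) / (1.0 + rho)


for rho in [0.0, 1.0, 2.0, 4.0, 100.0]:
    print(rho, exact_penalty_path(rho), quadratic_penalty_path(rho))
```

With the absolute-value penalty, the path starts at the unconstrained solution x = 2, moves linearly as 2 - ρ/2, and hits the constraint at the finite value ρ = 2. This piecewise linear shape is what the abstract describes for quadratic programs. The squared penalty only approaches x = 1 as ρ → ∞.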
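Learning Functions of Few Arbitrary Linear Parameters in High Dimensions
Massimo Fornasier, Karin Schnass, Jan Vybiral
AI summary: For functions on a high-dimensional space that depend on only a few linear parameters, proposes approximation algorithms based on random sampling and compressed sensing. The algorithms run in polynomial time and achieve a uniform approximation with high probability.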
Let us assume that $f$ is a continuous function defined on the unit ball of $\mathbb R^d$, of the form $f(x) = g (A x)$, where $A$ is a $k \times d$ matrix and $g$ is a function of $k$ variables for $k \ll d$. We are given a budget $m \in \mathbb N$ of possible point evaluations $f(x_i)$, $i=1,...,m$, of $f$, which we are allowed to query in order to construct a uniform approximating function. Under certain smoothness and variation assumptions on the function $g$, and an arbitrary choice of the matrix $A$, we present in this paper (1) a sampling choice of the points $\{x_i\}$ drawn at random for each function approximation; (2) algorithms (Algorithm 1 and Algorithm 2) for computing the approximating function, whose complexity is at most polynomial in the dimension $d$ and in the number $m$ of points. Due to the arbitrariness of $A$, the choice of the sampling points will be according to suitable random distributions and our results hold with overwhelming probability. Our approach uses tools taken from the compressed sensing framework, recent Chernoff bounds for sums of positive-semidefinite matrices, and classical stability bounds for invariant subspaces of singular value decompositions.
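Average Best $m$-Term Approximation
Jan Vybíral
AI summary: Introduces average best $m$-term approximation widths for probability measures on the unit ball of $\ell_p^n$, and estimates them for the embedding $id:\ell_p^n\to\ell_q^n$ ($0<p\le q\le \infty$) under the normalized cone and surface measures. For certain tensor product weights, shows that typical vectors have a strongly compressible (i.e., nearly sparse) structure.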
We introduce the concept of average best $m$-term approximation widths with respect to a probability measure on the unit ball of $\ell_p^n$. We estimate these quantities for the embedding $id:\ell_p^n\to\ell_q^n$ with $0<p\le q\le \infty$ for the normalized cone and surface measure. Furthermore, we consider certain tensor product weights and show that a typical vector with respect to such a measure exhibits a strongly compressible (i.e., nearly sparse) structure.
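For a single vector, the best $m$-term approximation error in $\ell_q$ is computed by keeping the $m$ largest entries in modulus and taking the $\ell_q$ norm of the rest. The widths studied in the paper average this quantity over random vectors from the unit ball of $\ell_p^n$. The sketch below computes only the per-vector error, and the function name is ours.

```python
def best_m_term_error(x, m, q=2.0):
    """l_q error of the best m-term approximation of x: keep the m largest |x_i|, drop the rest."""
    tail = sorted((abs(v) for v in x), reverse=True)[m:]
    if q == float("inf"):
        return max(tail, default=0.0)
    return sum(t ** q for t in tail) ** (1.0 / q)


print(best_m_term_error([3.0, -1.0, 0.5, 2.0], 2))   # tail (1.0, 0.5) -> sqrt(1.25)
```

A vector is strongly compressible when this error decays quickly in $m$ relative to its norm, which is the sense of "nearly sparse" in the abstract.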
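An Algorithm for the Principal Component Analysis of Large Data Sets
Nathan Halko, Per-Gunnar Martinsson, Yoel Shkolnisky, Mark Tygert
AI summary: For data sets too large to fit in memory, proposes an efficient principal component analysis algorithm based on randomized methods. The algorithm achieves nearly optimal accuracy and supports out-of-core computation.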
Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy --- even on parallel processors --- unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently "out-of-core.") We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
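Templates for Convex Cone Problems with Applications to Sparse Signal Recovery
Stephen R. Becker, Emmanuel J. Candès, Michael Grant
AI summary: Presents a general framework for solving convex cone problems in signal processing, machine learning, and other fields through conic formulation, duality, smoothing, and first-order optimization methods, and applies it to sparse signal recovery problems such as compressed sensing. The paper also introduces new techniques such as a continuation scheme and step-size control, which yield efficient and stable algorithms.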
This paper develops a general framework for solving a variety of convex cone problems that frequently arise in signal processing, machine learning, statistics, and other fields. The approach works as follows: first, determine a conic formulation of the problem; second, determine its dual; third, apply smoothing; and fourth, solve using an optimal first-order method. A merit of this approach is its flexibility: for example, all compressed sensing problems can be solved via this approach. These include models with objective functionals such as the total-variation norm, ||Wx||_1 where W is arbitrary, or a combination thereof. In addition, the paper introduces a number of technical contributions, such as a novel continuation scheme, a novel approach for controlling the step size, and some new results showing that the smooth and unsmoothed problems are sometimes formally equivalent. Combined with our framework, these lead to novel, stable and computationally efficient algorithms. For instance, our general implementation is competitive with state-of-the-art methods for solving intensively studied problems such as the LASSO. Further, numerical experiments show that one can solve the Dantzig selector problem, for which no efficient large-scale solvers exist, in a few hundred iterations. Finally, the paper is accompanied by a software release. This software is not a single, monolithic solver; rather, it is a suite of programs and routines designed to serve as building blocks for constructing complete algorithms.
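The Angler's Fishing Problem
Anna Karpowicz, Krzysztof Szajowski
AI summary: Builds a marked renewal-reward process to study the optimal stopping problem of an angler who fishes for a fixed time using two techniques (at most two rods), with the aim of maximizing his satisfaction (a utility function minus a cost function).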
The model considered here is formulated as "the fishing problem", even though its other applications are much more obvious. The angler goes fishing. He uses various techniques and has at most two fishing rods. He buys a fishing ticket for a fixed time. Fish are caught with the different methods according to renewal processes. The values of the fish and the inter-arrival times are given by sequences of independent, identically distributed (i.i.d.) random variables with known distribution functions. Together they form a marked renewal-reward process. The angler's measure of satisfaction is given by the difference between a utility function, depending on the value of the fish caught, and a cost function connected with the time spent fishing. In this way, the angler's relative opinion about the fishing methods is modelled. The angler's aim is to obtain as much satisfaction as possible, and additionally he has to leave the lake before a fixed moment. Therefore his goal is to find two optimal stopping times that maximize his satisfaction. At the first of these times, he changes his fishing technique, e.g. by excluding one rod and concentrating on the rest. Next, he decides when he should stop the expedition. These stopping times have to be shorter than the fixed fishing time. Dynamic programming methods are used to find these two optimal stopping times and to specify the expected satisfaction of the angler at these times.
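Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization
Alekh Agarwal, Peter L. Bartlett, Pradeep Ravikumar, Martin J. Wainwright
AI summary: Uses information-theoretic methods to study the complexity of stochastic convex optimization in an oracle model of computation, improving known results and obtaining tight minimax complexity estimates for several function classes.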
Relative to the large literature on upper bounds on the complexity of convex optimization, less attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes.
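Geometric Methods for Structured Covariance Estimation
Lipeng Ning, Xianhua Jiang, Tryphon Georgiou
AI summary: Estimates Toeplitz-structured covariance matrices with a geometric approach based on the Wasserstein/Bures/Hellinger distance, computed efficiently by solving a linear matrix inequality, and compares it with the maximum likelihood and Burg methods for estimating power spectra with spectral lines.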
We consider problems of estimating structured covariance matrices, in particular matrices with a Toeplitz structure. We follow a geometric viewpoint based on a suitable notion of distance. To this end, we review and compare several alternative metrics and divergence measures. We advocate a specific one, which represents the Wasserstein distance between the corresponding Gaussian distributions, and show that it also coincides with the so-called Bures/Hellinger distance between covariance matrices. Most importantly, besides its physically appealing interpretation, computation of the metric requires solving a linear matrix inequality (LMI). As a consequence, computations scale nicely for problems involving large covariance matrices, and linear prior constraints on the covariance structure are easy to handle. We compare this transportation/Bures/Hellinger metric with the maximum likelihood and Burg methods in terms of their performance in estimating power spectra with spectral lines on a representative case study from the literature.
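For two given Gaussians, the Wasserstein distance advocated above has a well-known closed form, W2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2(S1^{1/2} S2 S1^{1/2})^{1/2}), whose covariance part is the squared Bures distance. The sketch below evaluates it for 2x2 covariances with the standard library only. It does not attempt the LMI formulation, which, per the abstract, is what makes linear structural constraints such as Toeplitz structure easy to handle. The function names are our own.

```python
import math


def sqrtm_2x2(M):
    """Principal square root of a nonzero 2x2 symmetric positive semidefinite matrix.

    By Cayley-Hamilton, sqrt(M) = (M + s*I) / sqrt(tr M + 2*s) with s = sqrt(det M).
    """
    (a, b), (c, d) = M
    s = math.sqrt(max(a * d - b * c, 0.0))
    t = math.sqrt(a + d + 2.0 * s)
    return [[(a + s) / t, b / t], [c / t, (d + s) / t]]


def matmul_2x2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def wasserstein2_sq(S1, S2, m1=(0.0, 0.0), m2=(0.0, 0.0)):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2) in two dimensions.

    The covariance part tr(S1 + S2 - 2*(S1^1/2 S2 S1^1/2)^1/2) is the squared Bures distance.
    """
    r1 = sqrtm_2x2(S1)
    cross = sqrtm_2x2(matmul_2x2(matmul_2x2(r1, S2), r1))
    bures_sq = (S1[0][0] + S1[1][1] + S2[0][0] + S2[1][1]
                - 2.0 * (cross[0][0] + cross[1][1]))
    mean_sq = sum((u - v) ** 2 for u, v in zip(m1, m2))
    return mean_sq + bures_sq


# Commuting (diagonal) case: reduces to sum_i (sqrt(a_i) - sqrt(b_i))^2.
print(wasserstein2_sq([[4.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

For diagonal covariances the formula collapses to a sum of squared differences of standard deviations, which gives an easy hand check; the general expression is symmetric in the two covariances even when they do not commute.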
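Online Robust Subspace Tracking from Partial Information
Jun He, Laura Balzano, John C. S. Lui
AI summary: Proposes GRASTA, which uses a robust $l^1$-norm cost to track subspaces online from highly incomplete data. Applied to robust matrix completion and real-time background/foreground separation in video, it reaches 57 frames per second on a benchmark video.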
This paper presents GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm), an efficient and robust online algorithm for tracking subspaces from highly incomplete information. The algorithm uses a robust $l^1$-norm cost function in order to estimate and track non-stationary subspaces when the streaming data vectors are corrupted with outliers. We apply GRASTA to the problems of robust matrix completion and real-time separation of background from foreground in video. In this second application, we show that GRASTA performs high-quality separation of moving objects from background at exceptional speeds: In one popular benchmark video example, GRASTA achieves a rate of 57 frames per second, even when run in MATLAB on a personal laptop.
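On the Relationship Between the Coefficients of Linear Regression Models of Different Dimensions
V. G. Panov
AI summary: Proves that the coefficients of two linear regression models of a given response, one on a full predictor set and one on a subset of it, are linearly related, and discusses corollaries of this theorem.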
We consider two linear regression models of a given response variable: one on a set of predictors and one on a subset of it. It is shown that there is a linear relationship between the coefficients of these models. Some corollaries of the proved theorem are considered.
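One familiar instance of such a linear relationship is the omitted-variable identity for ordinary least squares. If y is regressed on x1 and x2 with coefficients b1, b2, and on x1 alone with coefficient b_sub, then b_sub = b1 + (x1'x1)^{-1} x1'x2 * b2, because the full-model residual is orthogonal to x1. The sketch checks this numerically for two predictors without an intercept; we assume, but have not verified, that it is a special case of the paper's theorem. The data and names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ols_two(x1, x2, y):
    """No-intercept OLS of y on (x1, x2), solved from the 2x2 normal equations."""
    s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    r1, r2 = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det


def ols_one(x1, y):
    """No-intercept OLS of y on x1 alone (the submodel)."""
    return dot(x1, y) / dot(x1, x1)


x1, x2, y = [1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 2.0], [1.0, 3.0, 2.0, 7.0]
b1, b2 = ols_two(x1, x2, y)
# Omitted-variable identity: submodel coefficient = b1 + (x1'x2 / x1'x1) * b2.
print(ols_one(x1, y), b1 + dot(x1, x2) / dot(x1, x1) * b2)
```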
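Sparse Bayesian Methods for Low-Rank Matrix Estimation
S. Derin Babacan, Martin Luessi, Rafael Molina, Aggelos K. Katsaggelos
AI summary: Proposes matrix completion and robust principal component analysis algorithms based on sparse Bayesian learning that determine the rank automatically through a sparsity constraint while achieving high recovery performance.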
Recovery of low-rank matrices has recently seen significant activity in many areas of science and engineering, motivated by recent theoretical results for exact reconstruction guarantees and interesting practical applications. A number of methods have been developed for this recovery problem. However, a principled method for choosing the unknown target rank is generally not provided. In this paper, we present novel recovery algorithms for estimating low-rank matrices in matrix completion and robust principal component analysis based on sparse Bayesian learning (SBL) principles. Starting from a matrix factorization formulation and enforcing the low-rank constraint in the estimates as a sparsity constraint, we develop an approach that is very effective in determining the correct rank while providing high recovery performance. We provide connections with existing methods in other similar problems and empirical results and comparisons with current state-of-the-art methods that illustrate the effectiveness of this approach.
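Dynamic Policy Programming
Mohammad Gheshlaghi Azar, Vicenc Gomez, Hilbert J. Kappen
AI summary: Proposes dynamic policy programming (DPP). Its performance-loss bounds, expressed via the $l_\infty$-norm of the average accumulated error, suggest that under approximation error it improves on standard approximate value iteration and approximate policy iteration, and it substantially outperforms existing reinforcement learning methods on several problem domains.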
In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. We prove finite-iteration and asymptotic $l_\infty$-norm performance-loss bounds for DPP in the presence of approximation/estimation error. The bounds are expressed in terms of the $l_\infty$-norm of the average accumulated error, as opposed to the $l_\infty$-norm of the error in the case of standard approximate value iteration (AVI) and approximate policy iteration (API). This suggests that DPP can achieve better performance than AVI and API, since it averages out the simulation noise caused by Monte Carlo sampling throughout the learning process. We examine these theoretical results numerically by comparing the performance of approximate variants of DPP with existing reinforcement learning (RL) methods on different problem domains. Our results show that, in all cases, DPP-based algorithms outperform other RL methods by a wide margin.
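A sketch of the exact, model-based DPP recursion on a small finite MDP, written from our reading of the method rather than from this abstract (which gives no formulas): Psi_{k+1}(x,a) = Psi_k(x,a) + r(x,a) + gamma * sum_y P(y|x,a) M Psi_k(y) - M Psi_k(x), where M Psi(x) is the Boltzmann-weighted average of Psi(x, .) with inverse temperature eta, and the policy is pi(a|x) proportional to exp(eta * Psi(x,a)). Treat the update as an assumption to check against the paper; the experiments described above use approximate, sampling-based variants instead.

```python
import math


def boltzmann_average(values, eta):
    """M_eta: average of values weighted by softmax(eta * values)."""
    top = max(values)
    weights = [math.exp(eta * (v - top)) for v in values]  # shift by the max to avoid overflow
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def dpp(P, R, gamma, eta, iters):
    """Exact DPP iteration on a finite MDP.

    P[x][a] is a list of (probability, next_state) pairs and R[x][a] the reward.
    Returns the action preferences Psi[x][a].
    """
    psi = [[0.0] * len(R[x]) for x in range(len(R))]
    for _ in range(iters):
        m = [boltzmann_average(row, eta) for row in psi]
        # The right-hand side reads the previous psi before it is rebound.
        psi = [[psi[x][a] + R[x][a] + gamma * sum(p * m[y] for p, y in P[x][a]) - m[x]
                for a in range(len(R[x]))]
               for x in range(len(R))]
    return psi


def policy(row, eta):
    """pi(a | x) proportional to exp(eta * Psi(x, a))."""
    top = max(row)
    weights = [math.exp(eta * (v - top)) for v in row]
    total = sum(weights)
    return [w / total for w in weights]


# One state, two actions with rewards 1 and 0; both actions return to the same state.
psi = dpp(P=[[[(1.0, 0)], [(1.0, 0)]]], R=[[1.0, 0.0]], gamma=0.9, eta=1.0, iters=200)
print(policy(psi[0], 1.0), boltzmann_average(psi[0], 1.0))
```

In this one-state example the preference gap between the two actions grows by one unit per iteration, so the policy concentrates on the better action and the Boltzmann average approaches the optimal value 1/(1 - gamma) = 10.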
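Using Supervised Learning to Improve Monte Carlo Integral Estimation
Brendan Tracey, David Wolpert, Juan J. Alonso
AI summary: Proposes Stacked Monte Carlo (StackMC), which post-processes existing MC samples with the supervised-learning techniques of function fitting and cross-validation to reduce the variance of integral estimates without adding bias.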
Monte Carlo (MC) techniques are often used to estimate integrals of a multivariate function using randomly generated samples of the function. In light of the increasing interest in uncertainty quantification and robust design applications in aerospace engineering, the calculation of expected values of such functions (e.g. performance measures) becomes important. However, MC techniques often suffer from high variance and slow convergence as the number of samples increases. In this paper we present Stacked Monte Carlo (StackMC), a new method for post-processing an existing set of MC samples to improve the associated integral estimate. StackMC is based on the supervised learning techniques of fitting functions and cross validation. It should reduce the variance of any type of Monte Carlo integral estimate (simple sampling, importance sampling, quasi-Monte Carlo, MCMC, etc.) without adding bias. We report on an extensive set of experiments confirming that the StackMC estimate of an integral is more accurate than both the associated unprocessed Monte Carlo estimate and an estimate based on a functional fit to the MC samples. These experiments run over a wide variety of integration spaces, numbers of sample points, dimensions, and fitting functions. In particular, we apply StackMC in estimating the expected value of the fuel burn metric of future commercial aircraft and in estimating sonic boom loudness measures. We compare the efficiency of StackMC with that of more standard methods and show that for negligible additional computational cost significant increases in accuracy are gained.
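A minimal sketch of the stacking idea on [0, 1]: for each cross-validation fold, fit a simple function g on the other folds, integrate g exactly, and add the held-out Monte Carlo average of f - g. Because g never sees its held-out points, each fold's estimate stays unbiased for i.i.d. uniform samples. The straight-line fit, the fold assignment and all names are our assumptions; the published StackMC estimator may weight the fit further (for example with a control-variate coefficient), which is omitted here.

```python
import random


def fit_line(xs, ys):
    """Least-squares line g(x) = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def stacked_mc(f, xs, folds=5):
    """Cross-validated fit-plus-correction estimate of the integral of f over [0, 1]."""
    ys = [f(x) for x in xs]
    estimates = []
    for k in range(folds):
        train = [i for i in range(len(xs)) if i % folds != k]
        held_out = [i for i in range(len(xs)) if i % folds == k]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        integral_g = a + b / 2.0  # exact integral of a + b*x over [0, 1]
        correction = sum(ys[i] - (a + b * xs[i]) for i in held_out) / len(held_out)
        estimates.append(integral_g + correction)
    return sum(estimates) / folds


def uniform_samples(n, seed=0):
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


xs = uniform_samples(200)
print(stacked_mc(lambda x: x * x, xs), sum(x * x for x in xs) / len(xs))  # StackMC vs plain MC
```

For a linear integrand the fit is exact and the estimate reproduces the integral up to rounding; for curved integrands the fitted line acts as a cross-validated control variate.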
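Total Variation, Adaptive Total Variation and Nonconvex Smoothly Clipped Absolute Deviation Penalty for Denoising Blocky Images
Aditya Chopra, Heng Lian
AI summary: To address the bias of the total variation model, proposes a nonconvex penalty inspired by high-dimensional variable selection and solves it efficiently with an MM algorithm; experiments show it outperforms conventional methods for denoising blocky images.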
The total variation-based image denoising model has been generalized and extended in numerous ways, improving its performance in different contexts. We propose a new penalty function motivated by the recent progress in the statistical literature on high-dimensional variable selection. Using a particular instantiation of the majorization-minimization algorithm, the optimization problem can be efficiently solved and the computational procedure realized is similar to the spatially adaptive total variation model. Our two-pixel image model shows theoretically that the new penalty function solves the bias problem inherent in the total variation model. The superior performance of the new penalty is demonstrated through several experiments. Our investigation is limited to "blocky" images which have small total variation.
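The bias argument can be illustrated in one dimension, roughly the scalar analogue of estimating the jump between two pixels. An l1 (total-variation-type) penalty leads to soft thresholding, which shrinks every large jump by lam; the SCAD penalty from the variable-selection literature leaves jumps larger than a*lam untouched. The sketch compares the two rules and runs a local linear approximation, one majorization-minimization scheme for SCAD. It is not the authors' image-denoising algorithm; a = 3.7 and all names are our choices.

```python
import math

A_SCAD = 3.7  # customary choice of the SCAD shape parameter a


def soft(z, lam):
    """l1/TV-type rule: minimizer of 0.5*(z - t)^2 + lam*|t|."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def scad_threshold(z, lam, a=A_SCAD):
    """Minimizer of 0.5*(z - t)^2 + SCAD_lam(|t|) in closed form."""
    if abs(z) <= 2.0 * lam:
        return soft(z, lam)
    if abs(z) <= a * lam:
        return ((a - 1.0) * z - math.copysign(a * lam, z)) / (a - 2.0)
    return z  # large jumps are left untouched: no shrinkage bias


def scad_derivative(t, lam, a=A_SCAD):
    """p'_lam(t) for t >= 0."""
    return lam if t <= lam else max(a * lam - t, 0.0) / (a - 1.0)


def mm_scad(z, lam, iters=100):
    """Local linear approximation (an MM scheme): repeat a reweighted soft threshold."""
    t = z
    for _ in range(iters):
        t = soft(z, scad_derivative(abs(t), lam))
    return t


for z in (0.5, 2.5, 5.0):
    print(z, soft(z, 1.0), scad_threshold(z, 1.0), mm_scad(z, 1.0))
```

For a jump of 5 with lam = 1, soft thresholding returns 4 while SCAD returns 5, which is the bias difference the abstract refers to; in the middle range the MM iteration converges to the closed-form SCAD value.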
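Adaptive Sampling for Linear State Estimation
Maben Rabi, George V. Moustakides, John S. Baras
AI summary: For a sensor that sends measurements over a rate-limited network so that a supervisor can estimate the state, studies the optimal design of causal sampling policies by recasting it as the choice of a sequence of optimal stopping times, shows that Delta sampling performs poorly, and gives analytical solutions for the Brownian motion case.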
When a sensor has continuous measurements but sends limited messages over a data network to a supervisor which estimates the state, the available packet rate fixes the achievable quality of state estimation. When such rate limits become stringent, the sensor's messaging policy should be designed anew. What are good causal messaging policies? What should message packets contain? What is the lowest possible distortion in a causal estimate at the supervisor? Is Delta sampling better than periodic sampling? We answer these questions under an idealized model of the network and the assumption of perfect measurements at the sensor. For a scalar, linear diffusion process, we study the problem of choosing the causal sampling times that give the lowest aggregate squared error distortion. We restrict attention to finite horizons and impose a hard upper bound on the number of allowed samples. We cast the design as a problem of choosing an optimal sequence of stopping times, and reduce this to a nested sequence of problems, each asking for a single optimal stopping time. Under an unproven but natural assumption about the least-squares estimate at the supervisor, each of these single stopping problems is of standard form. The optimal stopping times are random times at which the estimation error exceeds designed envelopes. For the case where the state is a Brownian motion, we give analytically the shape of the optimal sampling envelopes, the shape of the envelopes under optimal Delta sampling, and their performances. Surprisingly, we find that Delta sampling performs badly. Hence, when the rate constraint is a hard limit on the number of samples over a finite horizon, Delta sampling should not be used.
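A small simulation sketch of this setting, assuming a standard Brownian motion started at a known value 0, a supervisor that holds the last sampled value, and a budget of N samples on [0, T]. For N equally spaced interior samples, each of the N + 1 gaps of length h = T/(N + 1) contributes the integral of s over [0, h], i.e. h^2/2, to the expected aggregate squared error, giving T^2 / (2(N + 1)); the Monte Carlo average should land close to that. The capped Delta-sampling rule uses an arbitrary threshold, so comparing the two numbers settles nothing; the paper's conclusion rests on analytically optimized envelopes, which this sketch does not reproduce. All names and parameter values are ours.

```python
import math
import random


def brownian_path(steps, horizon, rng):
    """Standard Brownian motion on [0, horizon], started at 0, on a uniform grid."""
    dt = horizon / steps
    path, w = [0.0], 0.0
    for _ in range(steps):
        w += rng.gauss(0.0, math.sqrt(dt))
        path.append(w)
    return path


def periodic_distortion(path, horizon, n_samples):
    """Aggregate squared error with n_samples equally spaced interior samples.

    Between samples the supervisor holds the last sampled value (initially the known value 0).
    """
    steps = len(path) - 1
    dt = horizon / steps
    sample_at = {round(k * steps / (n_samples + 1)) for k in range(1, n_samples + 1)}
    estimate, total = 0.0, 0.0
    for j in range(steps):
        if j in sample_at:
            estimate = path[j]
        total += (path[j] - estimate) ** 2 * dt
    return total


def delta_distortion(path, horizon, n_samples, delta):
    """Same cost, but sample whenever |error| >= delta, until the sample budget runs out."""
    steps = len(path) - 1
    dt = horizon / steps
    estimate, total, used = 0.0, 0.0, 0
    for j in range(steps):
        if used < n_samples and abs(path[j] - estimate) >= delta:
            estimate, used = path[j], used + 1
        total += (path[j] - estimate) ** 2 * dt
    return total, used


def mean_distortions(n_paths=1000, steps=500, horizon=1.0, n_samples=4, delta=0.3, seed=0):
    """Monte Carlo averages of both costs, plus the largest number of Delta samples used."""
    rng = random.Random(seed)
    periodic, delta_cost, max_used = 0.0, 0.0, 0
    for _ in range(n_paths):
        path = brownian_path(steps, horizon, rng)
        periodic += periodic_distortion(path, horizon, n_samples)
        cost, used = delta_distortion(path, horizon, n_samples, delta)
        delta_cost += cost
        max_used = max(max_used, used)
    return periodic / n_paths, delta_cost / n_paths, max_used
```

With T = 1 and N = 4, the periodic average from `mean_distortions()` should be near 1/10 (slightly below, because of the discrete grid); varying `delta` shows how sensitive the capped Delta rule is to its threshold.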
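Epidemic Spread in Human Networks
Faryad Darabi Sahneh, Caterina Scoglio
AI summary: Adds an alert state to the classic SIS model to study how human response affects epidemic spread, and uses mean-field approximation and algebraic graph theory to reveal two thresholds and the effect of alertness on the spreading dynamics.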
Epidemic spreading is one of the popular dynamics studied on complex networks. An epidemic model describes how infections spread throughout a network. Among the compartmental models used to describe epidemics, the Susceptible-Infected-Susceptible (SIS) model has been widely used. In the SIS model, each node can be susceptible, become infected with a given infection rate, and become susceptible again with a given curing rate. In this paper, we add a new compartment to the classic SIS model to account for human response to epidemic spread. Each individual can be infected, susceptible, or alert. Susceptible individuals can become alert with an alerting rate if infected individuals exist in their neighborhood. An individual in the alert state is less likely to become infected than an individual in the susceptible state, due to newly adopted cautious behavior. The problem is formulated as a continuous-time Markov process on a general static graph and then modeled as a set of ordinary differential equations using the mean-field approximation method and the corresponding Kolmogorov forward equations. The model is then studied using results from algebraic graph theory and the center manifold theorem. We show analytically that our model exhibits two distinct thresholds in the dynamics of epidemic spread. Below the first threshold, infection dies out exponentially. Beyond the second threshold, infection persists in the steady state. Between the two thresholds, the infection spreads at the first stage but then dies out asymptotically as a result of increased alertness in the network. Finally, simulations are provided to support our findings. Our results suggest that alertness can be considered a strategy for controlling epidemics, which points to multiple potential areas of application, from infectious disease mitigation to malware impact reduction.
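To show the flavor of a mean-field model with an alert compartment, here is a deliberately simplified, well-mixed (homogeneous) toy rather than the authors' graph-based formulation, so it cannot reproduce their graph-dependent thresholds. The rates are our assumptions: susceptible and alert individuals are infected at rates beta*i and beta_alert*i with beta_alert < beta, susceptible individuals become alert at rate kappa*i, and infected individuals recover to susceptible at rate delta. Forward Euler integration keeps s + i + a = 1 because the flows cancel.

```python
def simulate_sais(beta, beta_alert, delta, kappa, i0, horizon, dt=0.01):
    """Forward-Euler integration of a well-mixed SIS model with an alert state (toy).

    s, i, a are population fractions; all rates are illustrative assumptions.
    """
    s, i, a = 1.0 - i0, i0, 0.0
    for _ in range(round(horizon / dt)):
        infected_from_s = beta * s * i
        infected_from_a = beta_alert * a * i
        alerted = kappa * s * i          # alerting driven by infection prevalence
        cured = delta * i                # recovery returns individuals to susceptible
        s += dt * (cured - infected_from_s - alerted)
        i += dt * (infected_from_s + infected_from_a - cured)
        a += dt * (alerted - infected_from_a)
    return s, i, a


print(simulate_sais(beta=3.0, beta_alert=0.5, delta=1.0, kappa=2.0, i0=0.01, horizon=30.0))
```

Without alerting (kappa = 0) the toy reduces to the classic well-mixed SIS model, whose endemic level is 1 - delta/beta when beta > delta; with beta < delta infection dies out regardless of alertness.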
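Online Identification and Tracking of Subspaces from Highly Incomplete Information
Laura Balzano, Robert Nowak, Benjamin Recht
AI summary: Proposes GROUSE, which uses incremental gradient descent on the Grassmann manifold to track subspaces online from highly incomplete observations; it can also be used for low-rank matrix completion.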
This work presents GROUSE (Grassmannian Rank-One Update Subspace Estimation), an efficient online algorithm for tracking subspaces from highly incomplete observations. GROUSE requires only basic linear algebraic manipulations at each iteration, and each subspace update can be performed in linear time in the dimension of the subspace. The algorithm is derived by analyzing incremental gradient descent on the Grassmannian manifold of subspaces. With a slight modification, GROUSE can also be used as an online incremental algorithm for the matrix completion problem of imputing missing entries of a low-rank matrix. GROUSE performs exceptionally well in practice both in tracking subspaces and as an online algorithm for matrix completion.
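A minimal sketch of a single GROUSE update in the rank-one case (d = 1), written from the update rule as we recall it from the GROUSE literature rather than from this abstract. It fits the weight w by least squares on the observed entries, forms the residual r (zero off the observed set), and rotates the basis vector along a Grassmannian geodesic by the angle eta*||r||*||p|| with p = u*w. The constant step eta and the function name are our assumptions; the general algorithm uses an n x d orthonormal basis and a weight vector.

```python
import math


def grouse_step_rank1(u, idx, v_obs, eta):
    """One GROUSE update of a unit vector u spanning a 1-D subspace, from entries v[idx]."""
    u_obs = [u[i] for i in idx]
    denom = sum(x * x for x in u_obs)
    if denom == 0.0:
        return u  # u is invisible on these coordinates; skip the update
    w = sum(x * y for x, y in zip(u_obs, v_obs)) / denom  # least-squares weight
    r = [0.0] * len(u)                                     # residual, zero-filled off idx
    for i, x, y in zip(idx, u_obs, v_obs):
        r[i] = y - w * x
    r_norm = math.sqrt(sum(x * x for x in r))
    p_norm = abs(w)                                        # ||p|| with p = u*w and ||u|| = 1
    if r_norm == 0.0 or p_norm == 0.0:
        return u  # nothing to correct
    theta = eta * r_norm * p_norm                          # rotation angle sigma*eta
    sign = 1.0 if w > 0 else -1.0
    # Geodesic step: rotate u toward the (orthogonal) residual direction.
    return [math.cos(theta) * ui + math.sin(theta) * sign * ri / r_norm
            for ui, ri in zip(u, r)]


u = [1.0, 0.0, 0.0]
u = grouse_step_rank1(u, [0, 2], [0.6, 0.8], eta=1.0)  # observe entries 0 and 2 only
print(u)
```

Because r is orthogonal to u, the update keeps u at unit norm; with every entry observed, repeated steps align u with the direction of the data vector.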
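The Entropy Functional, the Information Path Functional's Essentials and Their Connections to Kolmogorov's Entropy, Complexity and Physics
Vladimir S. Lerner
AI summary: Presents recent results on the entropy functional on trajectories of a controlled diffusion process and on the information path functional (IPF), analyzes their relations to Kolmogorov's entropy, complexity and Lyapunov characteristics, and studies the singularities of the IPF extremal equations and the resulting invariant relations.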
The paper introduces recent results related to an entropy functional on trajectories of a controlled diffusion process and to the information path functional (IPF), analyzing their connections to Kolmogorov's entropy, complexity and Lyapunov characteristics. Considering the IPF's essentials and specifics, the paper studies the singularities of the IPF extremal equations and the invariant relations they create, both of which are useful for solving important mathematical and applied problems. Keywords: Additive functional; Entropy; Singularities; Natural Border Problem; Invariant
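Sparse Signal Recovery Resistant to Measurement Matrix Uncertainty in Compressive Sensing
Yipeng Liu, Qun Wan, Fei Wen, Jia Xu, Yingning Peng
AI summary: To handle measurement matrix uncertainty, proposes an anti-uncertainty constraint in the form of a mixed L2 and L1 norm and combines it with a sparsity constraint into an anti-uncertainty sparse signal recovery operator; numerical simulations show better reconstruction than conventional methods.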
Compressive sensing (CS) is a technique for estimating a sparse signal from random measurements and the measurement matrix. Traditional sparse signal recovery methods degrade seriously under measurement matrix uncertainty (MMU). Here the MMU is modeled as a bounded additive error. An anti-uncertainty constraint in the form of a mixed L2 and L1 norm is derived from the sparse signal model with MMU. We then combine the sparsity constraint with the anti-uncertainty constraint to obtain an anti-uncertainty sparse signal recovery operator. Numerical simulations demonstrate that the proposed operator achieves better reconstruction performance under MMU than traditional methods.
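Omni-Tomography / Multi-Tomography: Integrating Multiple Modalities for Simultaneous Imaging
Ge Wang, Jie Zhang, Hao Gao, Victor Weir, Hengyong Yu, Wenxiang Cong, Xiaochen Xu, Haiou Shen, James Bennett, Yue Wang, Michael Vannier
AI summary: Proposes the concept of omni-tomography, which integrates imaging mechanisms such as CT, MRI, PET, SPECT, ultrasound and optical imaging to achieve truly simultaneous localized reconstruction, overcoming the intrinsic limitations of current modality-fusion approaches caused by registration errors and physical constraints.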
Current tomographic imaging systems need major improvements, especially when multi-dimensional, multi-scale, multi-temporal and multi-parametric phenomena are under investigation. Both preclinical and clinical imaging now depend on in vivo tomography, often requiring separate evaluations by different imaging modalities to define morphologic details, delineate interval changes due to disease or interventions, and study physiological functions that have interconnected aspects. Over the past decade, fusion of multimodality images has emerged with two different approaches: post-hoc image registration and combined acquisition on PET-CT, PET-MRI and other hybrid scanners. There are intrinsic limitations for both the post-hoc image analysis and dual/triple modality approaches defined by registration errors and physical constraints in the acquisition chain. We envision that tomography will evolve beyond current modality fusion and towards grand fusion, a large scale fusion of all or many imaging modalities, which may be referred to as omni-tomography or multi-tomography. Unlike modality fusion, grand fusion is here proposed for truly simultaneous but often localized reconstruction in terms of all or many relevant imaging mechanisms such as CT, MRI, PET, SPECT, US, optical, and possibly more. In this paper, the technical basis for omni-tomography is introduced and illustrated with a top-level design of a next generation scanner, interior tomographic reconstructions of representative modalities, and anticipated applications of omni-tomography.
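A Probabilistic Perspective on Gaussian Filtering and Smoothing
Marc Peter Deisenroth, Henrik Ohlsson
AI summary: Unifies Gaussian filtering and smoothing from a probabilistic perspective, showing that the methods differ only in how they compute or approximate the means and covariances of joint probabilities, and uses this insight to derive the cubature Kalman smoother and a robust filtering and smoothing algorithm based on Gibbs sampling.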
We present a general probabilistic perspective on Gaussian filtering and smoothing. This allows us to show that common approaches to Gaussian filtering/smoothing can be distinguished solely by their methods of computing/approximating the means and covariances of joint probabilities. This implies that novel filters and smoothers can be derived straightforwardly by providing methods for computing these moments. Based on this insight, we derive the cubature Kalman smoother and propose a novel robust filtering and smoothing algorithm based on Gibbs sampling.
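To make the "only the moment computation differs" point concrete, here is a scalar sketch in which the filter step is written once and the moment routine is passed in. Using the two-point cubature rule (points m +/- sqrt(P), equal weights) gives a cubature Kalman filter step, and for linear models it reproduces the ordinary Kalman filter exactly. The names and the scalar restriction are ours; the paper's cubature Kalman smoother adds a backward pass that is not shown.

```python
import math


def cubature_moments(m, P, g):
    """Mean and variance of g(x), and cov(x, g(x)), for x ~ N(m, P).

    Approximated with the scalar cubature rule: points m +/- sqrt(P), weights 1/2.
    """
    pts = [m + math.sqrt(P), m - math.sqrt(P)]
    vals = [g(p) for p in pts]
    mean = sum(vals) / 2.0
    var = sum((v - mean) ** 2 for v in vals) / 2.0
    cov = sum((p - m) * (v - mean) for p, v in zip(pts, vals)) / 2.0
    return mean, var, cov


def gaussian_filter_step(m, P, y, f, h, Q, R, moments=cubature_moments):
    """One predict/update step for x' = f(x) + w, y = h(x') + v, w ~ N(0, Q), v ~ N(0, R).

    Only `moments` decides which Gaussian filter this is; the conditioning is shared.
    """
    m_pred, P_pred, _ = moments(m, P, f)
    P_pred += Q
    y_mean, S, C = moments(m_pred, P_pred, h)  # joint Gaussian of (x', y) ...
    S += R
    K = C / S                                  # ... then condition on the observed y
    return m_pred + K * (y - y_mean), P_pred - K * C


# Linear model: the result should match the textbook Kalman filter.
print(gaussian_filter_step(1.0, 2.0, 1.5, lambda x: 0.9 * x, lambda x: 2.0 * x, 0.1, 0.5))
```

Swapping `cubature_moments` for another routine (for example Monte Carlo or a linearization) yields a different Gaussian filter without touching the conditioning step.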
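Reconstruction of Fractional Brownian Motion Signals from Sparse Samples Based on Compressive Sampling
Andriyan Bayu Suksmono
AI summary: Proposes a compressive-sampling-based interpolation/reconstruction method for fractional Brownian motion signals that exploits their spectral sparsity to recover the signal from partial samples without knowing the Hurst parameter.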
This paper proposes a new fBm (fractional Brownian motion) interpolation/reconstruction method from partially known samples based on CS (Compressive Sampling). Since the 1/f property implies power-law decay of the fBm spectrum, fBm signals should be sparse in the frequency domain. This property motivates the adoption of CS in the development of the reconstruction method. The Hurst parameter H that appears in the power law determines the sparsity level; therefore, the CS reconstruction quality of an fBm signal for a given number of known subsamples depends on H. However, the proposed method does not require knowledge of H to reconstruct the fBm signal from its partial samples. The method employs the DFT (Discrete Fourier Transform) as the sparsity basis and a random matrix derived from the known sample positions as the projection basis. Simulated fBm signals with various values of H are used to show the relationship between the Hurst parameter and the reconstruction quality. Additionally, a time series of monthly values of the US-DJIA (Dow Jones Industrial Average) stock index is used to show the applicability of the proposed method to reconstructing real-world data.
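A small sketch of DFT-sparsity reconstruction from a subset of samples. Keeping only the rows of the inverse-DFT matrix at the known positions plays the role of the projection matrix described above. Our assumptions: orthogonal matching pursuit (OMP) as the recovery solver, since the abstract does not name one; an exactly 2-sparse test signal (a pure cosine) rather than fBm, which is only approximately sparse; and dense sampling (26 of 32 samples), which guarantees that this toy recovery succeeds for any choice of known positions. All names are ours.

```python
import cmath
import math
import random


def basis_column(k, n_total, positions):
    """k-th inverse-DFT basis vector exp(2*pi*i*k*n/N), kept only at the known positions."""
    return [cmath.exp(2j * math.pi * k * n / n_total) for n in positions]


def least_squares(A, b):
    """Solve min ||A x - b|| via the normal equations and Gaussian elimination."""
    rows, cols = len(A), len(A[0])
    G = [[sum(A[r][i].conjugate() * A[r][j] for r in range(rows)) for j in range(cols)]
         for i in range(cols)]
    rhs = [sum(A[r][i].conjugate() * b[r] for r in range(rows)) for i in range(cols)]
    for p in range(cols):
        piv = max(range(p, cols), key=lambda q: abs(G[q][p]))  # partial pivoting
        G[p], G[piv], rhs[p], rhs[piv] = G[piv], G[p], rhs[piv], rhs[p]
        for q in range(p + 1, cols):
            factor = G[q][p] / G[p][p]
            G[q] = [x - factor * y for x, y in zip(G[q], G[p])]
            rhs[q] -= factor * rhs[p]
    sol = [0j] * cols
    for p in reversed(range(cols)):
        sol[p] = (rhs[p] - sum(G[p][j] * sol[j] for j in range(p + 1, cols))) / G[p][p]
    return sol


def omp_reconstruct(samples, positions, n_total, sparsity):
    """Orthogonal matching pursuit in the DFT basis; returns the full-length real signal."""
    columns = [basis_column(k, n_total, positions) for k in range(n_total)]
    support, coef, residual = [], [], list(samples)
    for _ in range(sparsity):
        scores = [abs(sum(c.conjugate() * r for c, r in zip(col, residual))) for col in columns]
        for k in support:
            scores[k] = -1.0  # never pick the same atom twice
        support.append(max(range(n_total), key=lambda k: scores[k]))
        A = [[columns[k][i] for k in support] for i in range(len(positions))]
        coef = least_squares(A, samples)
        residual = [s - sum(a * c for a, c in zip(row, coef)) for s, row in zip(samples, A)]
    return [sum(c * cmath.exp(2j * math.pi * k * n / n_total)
                for k, c in zip(support, coef)).real for n in range(n_total)]


def demo(seed=1):
    """Recover a cosine (two nonzero DFT coefficients) from 26 of its 32 samples."""
    n_total = 32
    signal = [math.cos(2 * math.pi * 3 * n / n_total) for n in range(n_total)]
    known = sorted(random.Random(seed).sample(range(n_total), 26))
    rec = omp_reconstruct([signal[n] for n in known], known, n_total, sparsity=2)
    return max(abs(a - b) for a, b in zip(rec, signal))


print(demo())  # maximum reconstruction error
```

Real fBm spectra are compressible rather than exactly sparse, so in practice the sparsity level becomes a tuning choice and the reconstruction is approximate, which is consistent with the abstract's point that quality depends on H.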
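Parameter Estimation in High-Dimensional Gaussian Distributions
Erlend Aune, Daniel P. Simpson
AI summary: To overcome the memory bottleneck of log-likelihood computation for high-dimensional spatial Gaussian models, proposes an iterative method based on matrix functions, Krylov subspaces and probing vectors that computes the log-likelihood whenever matrix-vector products with the precision matrix are fast.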
In order to compute the log-likelihood for high dimensional spatial Gaussian models, it is necessary to compute the determinant of the large, sparse, symmetric positive definite precision matrix, Q. Traditional methods for evaluating the log-likelihood for very large models may fail due to the massive memory requirements. We present a novel approach for evaluating such likelihoods when the matrix-vector product, Qv, is fast to compute. In this approach we utilise matrix functions, Krylov subspaces, and probing vectors to construct an iterative method for computing the log-likelihood.
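A sketch of computing log det Q = tr(log Q) with matrix-vector products only. To stay short it swaps in a truncated power series, log Q = log(c) I - sum_{k>=1} (I - Q/c)^k / k, valid when the eigenvalues of Q lie in (0, 2c), instead of the Krylov-subspace method the abstract describes. It estimates the trace with probing vectors that add up unit vectors of the same color (index modulo the number of colors), a common choice for banded matrices; with as many colors as rows the trace is exact up to truncation. The tridiagonal test matrix and all names are ours.

```python
import math


def matvec_tridiag(diag, off, v):
    """Q v for a symmetric tridiagonal Q with constant diagonal `diag` and off-diagonal `off`."""
    n = len(v)
    return [diag * v[i]
            + off * ((v[i - 1] if i > 0 else 0.0) + (v[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]


def logdet_probing(matvec, n, scale, n_colors, terms=200):
    """Estimate log det Q = tr(log Q) using only products Q v.

    log Q = log(scale) I - sum_{k>=1} (I - Q/scale)^k / k, valid if eig(Q) lie in (0, 2*scale).
    The trace is probed with vectors v_c that sum the unit vectors e_i with i % n_colors == c.
    """
    total = n * math.log(scale)
    for c in range(n_colors):
        v = [1.0 if i % n_colors == c else 0.0 for i in range(n)]
        w = v[:]
        for k in range(1, terms + 1):
            qw = matvec(w)
            w = [wi - qi / scale for wi, qi in zip(w, qw)]  # w <- (I - Q/scale) w
            total -= sum(a * b for a, b in zip(v, w)) / k   # subtract v^T (I - Q/scale)^k v / k
    return total


def example_matvec(v):
    """Q = tridiag(-1, 4, -1); its eigenvalues lie in (2, 6), so scale = 6 is safe."""
    return matvec_tridiag(4.0, -1.0, v)


print(logdet_probing(example_matvec, 6, 6.0, 6), logdet_probing(example_matvec, 6, 6.0, 3))
```

For this 6 x 6 matrix the determinant is 2911 (from the recurrence D_n = 4 D_{n-1} - D_{n-2}), so six colors should reproduce log 2911, while three colors give a cheaper approximation that ignores entries of log Q linking same-colored indices.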
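Behavior of Graph Laplacians on Manifolds with Boundary
Xueyuan Zhou, Mikhail Belkin
AI summary: Analyzes the behavior of graph Laplacians near the boundary of a manifold with boundary, revealing scaling properties that differ from those in the interior and have global effects, and gives convergence rates and numerical results.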
In manifold learning, algorithms based on graph Laplacians constructed from data have received considerable attention both in practical applications and in theoretical analysis. In particular, the convergence of graph Laplacians obtained from sampled data to certain continuous operators has recently become an active research topic. Most of the existing work has been done under the assumption that the data are sampled from a manifold without boundary or that the functions of interest are evaluated at a point away from the boundary. However, the question of boundary behavior is of considerable practical and theoretical interest. In this paper we provide an analysis of the behavior of graph Laplacians at a point near or on the boundary, discuss their convergence rates and the implications of those rates, and provide some numerical results. It turns out that while points near the boundary occupy only a small part of the total volume of a manifold, the behavior of the graph Laplacian there has different scaling properties from its behavior elsewhere on the manifold, with global effects on the whole manifold, an observation with potentially important implications for the general problem of learning on manifolds.
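A one-dimensional sketch of the boundary effect, under our own normalization L_h f(x) = 1/(n h^3) * sum_j exp(-(x - x_j)^2 / h^2) * (f(x_j) - f(x)) with grid data x_j = j/n on [0, 1] (the paper's kernel and scaling may differ). For f(x) = x the interior value vanishes by symmetry, whereas at the boundary point x = 0 the sum approximates (1/h^3) times the integral of t*exp(-t^2/h^2) over t >= 0, which equals 1/(2h) and grows as h shrinks: a first-derivative term survives at the boundary that cancels in the interior.

```python
import math


def graph_laplacian_at(x, f, n, h):
    """Gaussian-kernel graph Laplacian of f at x for grid data x_j = j/n, j = 0..n, on [0, 1].

    L_h f(x) = 1/(n h^3) * sum_j exp(-(x - x_j)^2 / h^2) * (f(x_j) - f(x)); h^3 = h^(d+2), d = 1.
    """
    total = 0.0
    for j in range(n + 1):
        xj = j / n
        total += math.exp(-((x - xj) ** 2) / h ** 2) * (f(xj) - f(x))
    return total / (n * h ** 3)


for h in (0.1, 0.05, 0.025):
    boundary = graph_laplacian_at(0.0, lambda t: t, 2000, h)
    interior = graph_laplacian_at(0.5, lambda t: t, 2000, h)
    print(f"h={h}: boundary {boundary:.3f} (1/(2h) = {1 / (2 * h):.3f}), interior {interior:.1e}")
```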
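A Class of Backward Stochastic Partial Differential Equations with Jumps and Applications
Wanyang Dai
AI summary: Introduces a class of high-order vector backward stochastic partial differential equations with jumps, proves the existence and uniqueness of adapted solutions under Lipschitz and linear growth conditions, and applies the results in finance.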
We formulate a new class of stochastic partial differential equations (SPDEs), named high-order vector backward SPDEs (B-SPDEs) with jumps, which allow high-order integral-partial differential operators in both the drift and diffusion coefficients. Under certain Lipschitz and linear growth conditions, we develop a method to prove the existence and uniqueness of an adapted solution to these B-SPDEs with jumps. Compared with existing discussions of conventional backward stochastic (ordinary) differential equations (BSDEs), we need to handle the differentiability of the adapted triplet solution to the B-SPDEs with jumps, which is a subtle part of justifying our main results, due to the inconsistency of differential orders on the two sides of the B-SPDEs and the partial differential operator appearing in the diffusion coefficient. In addition, we address B-SPDEs under a certain Markovian random environment and employ a B-SPDE with a strongly nonlinear partial differential operator in the drift coefficient to illustrate the use of our main results in finance.
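Numerical Studies of the Metamodel Fitting and Validation Processes
Bertrand Iooss, Loïc Boussouf, Vincent Feuillard, Amandine Marrel
AI summary: Numerically compares space-filling designs within the class of optimal Latin hypercube designs, finding that samples with minimal wrap-around discrepancy are particularly well suited to Gaussian process metamodel fitting, and proposes an algorithm that optimizes the distance between validation points and learning points so that metamodel predictivity can be estimated with as few validation points as possible.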
Complex computer codes, for instance those simulating physical phenomena, are often too time-consuming to be used directly for uncertainty, sensitivity, optimization and robustness analyses. A widely accepted way to circumvent this problem is to replace computationally expensive computer models with computationally inexpensive mathematical functions, called metamodels. In this paper, we focus on the Gaussian process metamodel and two essential steps of its definition phase. First, the initial design of the computer code input variables (used to fit the metamodel) has to satisfy adequate space-filling properties. We adopt a numerical approach to compare the performance of different types of space-filling designs, in the class of optimal Latin hypercube samples, in terms of the predictivity of the subsequently fitted metamodel. We conclude that such samples with minimal wrap-around discrepancy are particularly well suited for Gaussian process metamodel fitting. Second, the metamodel validation process consists of evaluating the metamodel predictivity with respect to the initial computer code. We propose and test an algorithm which optimizes the distance between the validation points and the metamodel learning points in order to estimate the true metamodel predictivity with a minimum number of validation points. Comparisons with classical validation algorithms and an application to a nuclear safety computer code show the relevance of this new sequential validation design.
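A sketch of the criterion used above to rank designs. The squared wrap-around L2 discrepancy of n points in [0, 1)^d has Hickernell's closed form WD^2 = -(4/3)^d + (1/n^2) * sum_{k,j} prod_i [3/2 - |x_ki - x_ji| * (1 - |x_ki - x_ji|)]. The design search is a crude best-of-random-draws over centered Latin hypercubes, our own simplification of the optimized Latin hypercube samples the abstract compares. SciPy users can also compute a wrap-around discrepancy with `scipy.stats.qmc.discrepancy(sample, method="WD")`; check its documentation for the exact normalization.

```python
import random


def wrap_around_sq(points):
    """Squared wrap-around L2 discrepancy of points in [0, 1)^d (Hickernell's closed form)."""
    n, d = len(points), len(points[0])
    total = 0.0
    for p in points:
        for q in points:
            prod = 1.0
            for a, b in zip(p, q):
                t = abs(a - b)
                prod *= 1.5 - t * (1.0 - t)
            total += prod
    return -((4.0 / 3.0) ** d) + total / n ** 2


def centered_lhs(n, d, rng):
    """Latin hypercube with points at stratum centers: one point per stratum in each coordinate."""
    columns = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)
        columns.append([(i + 0.5) / n for i in perm])
    return [list(point) for point in zip(*columns)]


def best_lhs(n, d, tries, seed=0):
    """Keep the lowest-discrepancy design among `tries` random Latin hypercubes."""
    rng = random.Random(seed)
    return min((centered_lhs(n, d, rng) for _ in range(tries)), key=wrap_around_sq)


design = best_lhs(8, 2, tries=50)
print(wrap_around_sq(design))
```

Because the criterion is invariant to shifts modulo 1, it does not favor points placed away from the edges of the unit cube; a real optimizer would replace the random search with local moves such as swapping coordinates between points.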
X-Armed Bandits
Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvari
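AI summary: For stochastic bandits in which the arm set is a general measurable space and the mean-payoff function is locally Lipschitz with respect to a known dissimilarity, proposes the HOO algorithm, which achieves regret bounds that are dimension-independent under suitable conditions, and proves its minimax optimality.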
We consider a generalization of stochastic bandits where the set of arms, $\mathcal{X}$, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if $\mathcal{X}$ is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm when the dissimilarity is a metric. Our basic strategy has quadratic computational complexity as a function of the number of time steps and does not rely on the doubling trick. We also introduce a modified strategy, which relies on the doubling trick but runs in linearithmic time. Both results are improvements with respect to previous approaches.
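Scandalously Parallelizable Mesh Generation
David Bortz, Andrew Christlieb
AI summary: Proposes a random-sampling-based method for generating non-uniform meshes, in which the statistical moments of coarse-mesh solutions are used to construct a mapping that improves the accuracy of numerical solutions of PDE boundary value problems, with near-perfect parallelism.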
We propose a novel approach which employs random sampling to generate an accurate non-uniform mesh for numerically solving Partial Differential Equation Boundary Value Problems (PDE-BVPs). From a uniform probability distribution U over a 1D domain, we sample M discretizations of size N where M>>N. The statistical moments of the solutions to a given BVP on each of the M ultra-sparse meshes provide insight into identifying highly accurate non-uniform meshes. Essentially, we use the pointwise mean and variance of the coarse-grid solutions to construct a mapping Q(x) from uniformly to non-uniformly spaced mesh-points. For a certain class of BVPs, the error convergence properties of the approximate solution to the PDE-BVP on the non-uniform mesh are superior to those on a uniform mesh. In particular, the method works well for BVPs with locally non-smooth solutions. We present a framework for studying the sampled sparse-mesh solutions and provide numerical evidence for the utility of this approach as applied to a set of example BVPs. We conclude with a discussion of how the near-perfect parallelizability of our approach suggests that these strategies have the potential for highly efficient utilization of massively parallel multi-core technologies such as General Purpose Graphics Processing Units (GPGPUs). We believe that the proposed algorithm is beyond embarrassingly parallel; implementing it on anything but a massively multi-core architecture would be scandalous.
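Pairwise Ranking: Choice of Method Can Produce Arbitrarily Different Rank Order
Ngoc Mai Tran
AI summary: Studies three pairwise-comparison ranking methods (Principal Eigenvector, HodgeRank and Tropical Eigenvector), proves that the choice of method can lead to arbitrarily different rank orders, and discusses practical implications, geometric properties and open problems.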
We examine three methods for ranking by pairwise comparison: Principal Eigenvector, HodgeRank and Tropical Eigenvector. It is shown that the choice of method can produce arbitrarily different rank orders. To be precise, for any two of the three methods, and for any pair of rankings of at least four items, there exists a comparison matrix for the items such that the rankings found by the two methods are the prescribed ones. We discuss the implications of this result in practice, study the geometry of the methods, and state some open problems.
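To make two of the three methods concrete, here is a small plain-Python sketch. It assumes a positive reciprocal comparison matrix with K[i][j] = 1/K[j][i], where K[i][j] > 1 means item i is preferred to item j; the tropical eigenvector is omitted. On a perfectly consistent matrix such as the hypothetical one below, both methods return the same order; the result above concerns inconsistent matrices, where they can disagree arbitrarily.

```python
import math

def principal_eigenvector(K, iters=200):
    """Perron eigenvector of a positive matrix by power iteration, normalized to sum 1."""
    n = len(K)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]
    return v

def hodgerank(K):
    """HodgeRank on a complete comparison graph: the least-squares fit of
    s_i - s_j to log K[i][j] (with sum(s) = 0) is the row mean of log K."""
    n = len(K)
    return [sum(math.log(K[i][j]) for j in range(n)) / n for i in range(n)]

def order(scores):
    """Item indices from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

# Hypothetical consistent comparison matrix K[i][j] = w[i] / w[j].
w = [4.0, 1.0, 3.0, 2.0]
K = [[wi / wj for wj in w] for wi in w]
print(order(principal_eigenvector(K)))  # [0, 2, 3, 1]
print(order(hodgerank(K)))              # [0, 2, 3, 1]
```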
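Robust Matrix Decomposition with Outliers
Daniel Hsu, Sham M. Kakade, Tong Zhang
AI summary: Studies conditions under which a low-rank matrix and a sparse matrix of outliers can be recovered from an observed matrix via ℓ1-norm and trace-norm minimization, giving stronger recovery guarantees than previous work without assuming that the spatial pattern of the outliers is random.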
Suppose a given observation matrix can be decomposed as the sum of a low-rank matrix and a sparse matrix (outliers), and the goal is to recover these individual components from the observed sum. Such additive decompositions have applications in a variety of numerical problems including system identification, latent variable graphical modeling, and principal components analysis. We study conditions under which recovering such a decomposition is possible via a combination of $\ell_1$ norm and trace norm minimization. We are specifically interested in the question of how many outliers are allowed so that convex programming can still achieve accurate recovery, and we obtain stronger recovery guarantees than previous studies. Moreover, we do not assume that the spatial pattern of outliers is random, which stands in contrast to related analyses under such assumptions via matrix completion.
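ML(n)BiCGStab: Reformulation, Analysis and Implementation
Man-Chung Yeung
AI summary: Re-derives the ML(n)BiCGStab algorithm systematically via index functions, proposes a variant with lower storage requirements, analyzes its connection to the Lanczos-based BiCGStab and the Arnoldi-based FOM, and analyzes breakdowns from a probabilistic point of view.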
With the aid of index functions, we re-derive the ML(n)BiCGStab algorithm of Yeung and Chan (1999) in a more systematic way. It turns out that there are n ways to define the ML(n)BiCGStab residual vector, and each definition leads to a different ML(n)BiCGStab algorithm. We demonstrate this by presenting a second algorithm which requires less storage. In theory, this second algorithm serves as a bridge connecting the Lanczos-based BiCGStab and the Arnoldi-based FOM, while ML(n)BiCG is a bridge connecting BiCG and FOM. We also analyze the breakdown situations from a probabilistic point of view and summarize some useful properties of ML(n)BiCGStab. Implementation issues are also addressed.
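A Probabilistic Algorithm Approximating Solutions of a Singular PDE of Porous Media Type
Nadia Belaribi, François Cuvelier, Francesco Russo
AI summary: For the one-dimensional generalized porous media equation (possibly with a discontinuous coefficient β), proposes and implements a stochastic particle algorithm to approximate its solution and predict the long-time behavior of the solution.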
The object of this paper is a one-dimensional generalized porous media equation (PDE) with a possibly discontinuous coefficient $β$, which is well-posed as an evolution problem in $L^1(\mathbb{R})$. In recent papers by Blanchard et al. and Barbu et al., the solution was represented by the solution of a non-linear stochastic differential equation in law, provided the initial condition is a bounded integrable function. We first extend this result, at least when $β$ is continuous and the initial condition is only integrable, under a supplementary technical assumption. The main purpose of the article is to introduce and implement a stochastic particle algorithm that approximates the solution of (PDE), also when $β$ is possibly irregular, to use it to predict some long-time behavior of the solution, and to compare it with some recent deterministic numerical techniques.
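Performance Analysis of Spectral Clustering on Compressed, Incomplete and Inaccurate Measurements
Blake Hunter, Thomas Strohmer
AI summary: Combines the distance-preserving measurements of compressed sensing and matrix completion with robust spectral clustering, analyzes how small errors in the affinity matrix affect the spectral coordinates and clusterability, and extends perturbation results for two-class spectral clustering to multi-class clustering.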
Spectral clustering is one of the most widely used techniques for extracting the underlying global structure of a data set. Compressed sensing and matrix completion have emerged as prevailing methods for efficiently recovering sparse and partially observed signals respectively. We combine the distance preserving measurements of compressed sensing and matrix completion with the power of robust spectral clustering. Our analysis provides rigorous bounds on how small errors in the affinity matrix can affect the spectral coordinates and clusterability. This work generalizes the current perturbation results of two-class spectral clustering to incorporate multi-class clustering with k eigenvectors. We thoroughly track how small perturbations arising from compressed sensing and matrix completion affect the affinity matrix and, in turn, the spectral coordinates. These perturbation results for multi-class clustering require an eigengap between the kth and (k+1)th eigenvalues of the affinity matrix, which naturally occurs in data with k well-defined clusters. Our theoretical guarantees are complemented with numerical results along with a number of examples of the unsupervised organization and clustering of image data.
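Making Tensor Factorizations Robust to Non-Gaussian Noise
Eric C. Chi, Tamara G. Kolda
AI summary: Because the least-squares fit of the CP tensor decomposition is sensitive to non-Gaussian noise, proposes a 1-norm-based loss function and an alternating majorization-minimization algorithm for fitting it.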
Tensors are multi-way arrays, and the Candecomp/Parafac (CP) tensor factorization has found application in many different domains. The CP model is typically fit using a least squares objective function, which is a maximum likelihood estimate under the assumption of i.i.d. Gaussian noise. We demonstrate that this loss function can actually be highly sensitive to non-Gaussian noise. Therefore, we propose a loss function based on the 1-norm because it can accommodate both Gaussian and grossly non-Gaussian perturbations. We also present an alternating majorization-minimization algorithm for fitting a CP model using our proposed loss function.
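Nonasymptotic Mixing of the MALA Algorithm
Nawaf Bou-Rabee, Martin Hairer, Eric Vanden-Eijnden
AI summary: Studies the convergence of the Metropolis-adjusted Langevin algorithm (MALA) for stochastic differential equations (SDEs) with non-globally Lipschitz drift coefficients, proving that it converges to equilibrium at an exponential rate up to error terms that are exponentially small in the time step size.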
The Metropolis-Adjusted Langevin Algorithm (MALA), originally introduced to sample exactly the invariant measure of certain stochastic differential equations (SDEs) on infinitely long time intervals, can also be used to approximate pathwise the solution of these SDEs on finite time intervals. However, when applied to an SDE with a nonglobally Lipschitz drift coefficient, the algorithm may not have a spectral gap even when the SDE does. This paper reconciles MALA's lack of a spectral gap with its ergodicity with respect to the invariant measure of the SDE and with its finite-time accuracy. In particular, the paper shows that its convergence to equilibrium happens at an exponential rate, up to terms exponentially small in the time step size. This quantification relies on MALA's ability to exactly preserve the SDE's invariant measure and to accurately represent the SDE's transition probability on finite time intervals.
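A minimal one-dimensional MALA sketch follows, with an assumed target pi(x) proportional to exp(-x^4/4), whose drift -x^3 is not globally Lipschitz. It only illustrates the Langevin proposal and the Metropolis-Hastings correction; it does not reproduce the paper's mixing estimates, and the step size h = 0.1 and the seed are arbitrary choices.

```python
import math
import random

def mala(log_pi, grad_log_pi, x0, h, n_steps, rng):
    """One-dimensional Metropolis-adjusted Langevin algorithm.

    Proposal: one Euler-Maruyama step of dX = grad log pi(X) dt + sqrt(2) dW,
        y = x + h * grad_log_pi(x) + sqrt(2h) * xi,   xi ~ N(0, 1),
    followed by a Metropolis-Hastings accept/reject step, which makes pi invariant.
    """
    def log_q(to, frm):
        # log proposal density of `to` given `frm`, up to an additive constant
        mean = frm + h * grad_log_pi(frm)
        return -((to - mean) ** 2) / (4.0 * h)

    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        y = x + h * grad_log_pi(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        log_alpha = log_pi(y) - log_pi(x) + log_q(x, y) - log_q(y, x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted / n_steps

# Assumed target pi(x) ∝ exp(-x**4 / 4); its drift -x**3 is not globally Lipschitz.
rng = random.Random(0)
samples, acc_rate = mala(lambda x: -x**4 / 4, lambda x: -x**3, 0.0, 0.1, 200_000, rng)
second_moment = sum(s * s for s in samples) / len(samples)
exact = 2.0 * math.gamma(0.75) / math.gamma(0.25)  # E[x^2] under pi, about 0.676
print(acc_rate, second_moment, exact)
```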
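Temperature and Friction Accelerated Sampling of Boltzmann-Gibbs Distribution
Molei Tao, Houman Owhadi, Jerrold E. Marsden
AI summary: Achieves fast sampling from the canonical ensemble by tuning friction and temperature in Langevin dynamics; near-optimal acceleration is obtained by choosing the friction so that the local quadratic approximation of the Hamiltonian is a critically damped oscillator, combined with an overheat-then-cool strategy.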
This paper is concerned with tuning friction and temperature in Langevin dynamics for fast sampling from the canonical ensemble. We show that near-optimal acceleration is achieved by choosing the friction so that the local quadratic approximation of the Hamiltonian is a critically damped oscillator. The system is also overheated and then cooled down to its final temperature. The performance of different cooling schedules is analyzed as a function of the total simulation time.
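For a single quadratic mode the critical-damping choice can be worked out directly. Assuming unit mass, a local harmonic approximation $H(q,p)\approx p^2/2 + \omega^2 q^2/2$, and Langevin dynamics with friction coefficient $\gamma$ and temperature $T$ (this convention is an assumption, not taken from the paper), the mean of $q$ obeys a damped-oscillator equation:

```latex
\begin{aligned}
dq &= p\,dt, \qquad dp = -\omega^2 q\,dt - \gamma p\,dt + \sqrt{2\gamma T}\,dW_t,\\
\ddot{m} + \gamma \dot{m} + \omega^2 m &= 0, \qquad m(t) := \mathbb{E}[q(t)],\\
\lambda_\pm &= \frac{-\gamma \pm \sqrt{\gamma^2 - 4\omega^2}}{2},\\
\text{critical damping: } \gamma^2 = 4\omega^2
  &\;\Longrightarrow\; \gamma = 2\omega, \qquad \lambda_+ = \lambda_- = -\omega .
\end{aligned}
```

For $\gamma < 2\omega$ the decay rate of the mean is $\gamma/2 < \omega$ (with oscillations), and for $\gamma > 2\omega$ the slower root $\lambda_+$ tends to $0$ as $\gamma$ grows, so $\gamma = 2\omega$ gives the fastest decay rate $\omega$ for this mode. How the paper handles several local frequencies at once is not reproduced here.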
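A General Method for Debiasing a Monte Carlo Estimator
Don McLeish
AI summary: Proposes a general method for unbiased estimation of the limiting value of processes such as numerical integration, Monte Carlo approximation or Newton-Raphson iteration, and applies it to option pricing in the Heston stochastic volatility model.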
Consider a process, stochastic or deterministic, obtained by using a numerical integration scheme, from Monte Carlo methods involving an approximation to an integral, or from a Newton-Raphson iteration approximating the root of an equation. We assume that we can sample from the distribution of the process from time 0 to a finite time n. We propose a scheme for unbiased estimation of the limiting value of the process, together with estimates of its standard error, and apply it to examples including numerical integrals, root-finding and option pricing in a Heston stochastic volatility model. This yields unbiased estimators in place of biased ones in many potential applications.
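The sketch below illustrates a generic randomized-truncation construction of an unbiased estimator of a limit, assuming a geometric truncation level. It is not necessarily the exact scheme of the paper, whose choice of truncation distribution and standard-error estimates may differ. The "process" here is the deterministic sequence of partial sums of the series for e, so the lack of bias can be checked against a known limit.

```python
import math
import random

def debiased_draw(x, q, rng):
    """One draw of a randomized-truncation estimator of lim_{n->inf} x(n).

    The truncation level N satisfies P(N >= n) = q**n (geometric), and
        Y = x(0) + sum_{n=1}^{N} (x(n) - x(n-1)) / q**n,
    so E[Y] = x(0) + sum_n (x(n) - x(n-1)) = lim x(n) whenever the increments
    shrink fast enough for that sum and the variance of Y to be finite.
    """
    y, n = x(0), 0
    while rng.random() < q:  # move on to level n + 1 with probability q
        n += 1
        y += (x(n) - x(n - 1)) / q**n
    return y

def partial_e(n):
    """x(n) = sum_{k=0}^{n} 1/k!, a deterministic sequence converging to e."""
    return sum(1.0 / math.factorial(k) for k in range(n + 1))

rng = random.Random(1)
draws = [debiased_draw(partial_e, 0.5, rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
std_err = math.sqrt(sum((d - mean) ** 2 for d in draws) / (len(draws) - 1) / len(draws))
print(mean, std_err)  # mean should lie within a few standard errors of e
```

Each draw uses only finitely many terms of the sequence, yet the average is unbiased for the limit; the price is extra variance, which the choice of q trades against cost.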
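Perturbation Expansions of Signal Subspaces for Long Signals
Vladimir Nekrutkin
AI summary: For subspace methods such as singular spectrum analysis, analyzes the main principal angle between the signal subspaces via perturbation expansions of orthogonal projectors, derives upper bounds, and studies convergence, convergence rates and the main terms of proximity as the length of the time series tends to infinity.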
Singular Spectrum Analysis and many other subspace-based methods of signal processing implicitly rely on the assumption that the unperturbed and perturbed signal subspaces, extracted by the Singular Value Decomposition of special "signal" and "perturbed signal" matrices, are close to each other. In this paper, the main principal angle between these subspaces is analyzed in terms of the perturbation expansions of the corresponding orthogonal projectors. Applicable upper bounds are derived. The main attention is paid to the asymptotic case in which the length of the time series tends to infinity. Results concerning conditions for convergence, the rate of convergence, and the main terms of proximity are presented.
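Error Analysis of Approximated Posterior Cramér-Rao Lower Bounds for Nonlinear Dynamics
Ming Lei, Pierre Del Moral, Christophe Baehr
AI summary: For nonlinear filtering, proposes two Gaussian-assumption-based methods for recursively approximating the posterior Cramér-Rao lower bound, analyzes the difference between them, and shows in simulations with the particle filter and the unscented Kalman filter that the mean-covariance model performs better.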
In practical nonlinear filtering, assessing the achievable filtering performance is important. In this paper, we focus on the problem of efficiently approximating the posterior Cramér-Rao lower bound (CRLB) in a recursive manner. Under Gaussian assumptions, two approximations for calculating the CRLB are derived: an exact model using the state estimate, and a Taylor-series-expanded model using both the state estimate and its error covariance. Moreover, the difference between the two approximated CRLBs is formulated analytically. Using the particle filter (PF) and the unscented Kalman filter (UKF) for the computations, simulation results reveal that the approximated CRLB based on the mean-covariance model outperforms the one based on the mean-based exact model. It is also shown that the theoretical difference between the estimated CRLBs can be improved by using an improved filtering method.
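The approximations in the abstract are needed only because the model is nonlinear. As an assumed baseline for checking such approximations, the sketch below implements the standard recursive posterior CRLB information recursion (Tichavský, Muravchik and Nehorai) in the scalar linear-Gaussian case, where the bound is exact and coincides with the Kalman filter's posterior variance. It is not either of the approximations proposed in the paper; the parameter values are hypothetical.

```python
def pcrlb_information(J0, F, Q, H, R, steps):
    """Scalar posterior-CRLB information recursion for
        x_{k+1} = F x_k + w_k,  w_k ~ N(0, Q);   z_k = H x_k + v_k,  v_k ~ N(0, R):
        J_{k+1} = D22 - D21 (J_k + D11)^{-1} D12.
    Any estimator of x_k then has mean-square error >= 1 / J_k."""
    D11 = F * F / Q
    D12 = -F / Q  # equals D21 in the scalar case
    D22 = 1.0 / Q + H * H / R
    J, out = J0, []
    for _ in range(steps):
        J = D22 - D12 * D12 / (J + D11)
        out.append(J)
    return out

def kalman_variances(P0, F, Q, H, R, steps):
    """Posterior variances P_{k|k} of the scalar Kalman filter."""
    P, out = P0, []
    for _ in range(steps):
        P_pred = F * F * P + Q
        P = 1.0 / (1.0 / P_pred + H * H / R)
        out.append(P)
    return out

# Hypothetical parameters; with J0 = 1 / P0 the two sequences agree (1/J_k = P_k).
J = pcrlb_information(1.0, 0.9, 0.5, 1.0, 2.0, 20)
P = kalman_variances(1.0, 0.9, 0.5, 1.0, 2.0, 20)
print(max(abs(1.0 / j - p) for j, p in zip(J, P)))  # agreement up to rounding
```

In the linear-Gaussian case the two recursions are algebraically identical, which makes this a convenient sanity check before moving to the nonlinear approximations.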
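George Forsythe's Last Paper
Richard P. Brent
AI summary: Describes von Neumann's idea for sampling from the exponential distribution, Forsythe's generalization, and the author's refinement of these ideas into an algorithm for generating normally distributed pseudo-random numbers.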
We describe von Neumann's elegant idea for sampling from the exponential distribution, Forsythe's generalization for sampling from a probability distribution whose density has the form exp(-G(x)), where G(x) is easy to compute (e.g. a polynomial), and my refinement of these ideas to give an efficient algorithm for generating pseudo-random numbers with a normal distribution. Later developments are also mentioned.
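Von Neumann's idea fits in a few lines. The sketch below (plain Python, standard library only) follows the classical comparison-only scheme for Exp(1); it does not include Forsythe's generalization to densities exp(-G(x)) or the normal-distribution refinement described in the abstract.

```python
import random

def von_neumann_exponential(rng):
    """Exp(1) variate from uniform comparisons only (von Neumann's method).

    For a candidate x = U_0, keep drawing U_1, U_2, ... while the run stays
    decreasing (x > U_1 > U_2 > ...).  Given x, the run length is odd with
    probability exp(-x), so accepting odd-length runs yields a density
    proportional to exp(-x) on [0, 1).  A rejection (probability exp(-1))
    adds 1 to the integer part, using the memorylessness of Exp(1).
    """
    k = 0  # integer part accumulated from rejected rounds
    while True:
        x = rng.random()
        prev, length = x, 1  # length = number of terms in the decreasing run
        while True:
            u = rng.random()
            if u >= prev:  # the run is broken
                break
            prev, length = u, length + 1
        if length % 2 == 1:
            return k + x
        k += 1

rng = random.Random(2)
xs = [von_neumann_exponential(rng) for _ in range(100_000)]
print(sum(xs) / len(xs))  # close to 1, the mean of Exp(1)
```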
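Fast Normal Random Number Generators for Vector Processors
Richard P. Brent
AI summary: Presents vectorised implementations of the Box-Muller and Polar methods, demonstrates their high performance on the Fujitsu VP2200, and analyzes why other methods are unlikely to compete with the Polar method on vector processors.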
We consider pseudo-random number generators suitable for vector processors. In particular, we describe vectorised implementations of the Box-Muller and Polar methods, and show that they give good performance on the Fujitsu VP2200. We also consider some other popular methods, e.g. the Ratio method of Kinderman and Monahan (1977) (as improved by Leva (1992)), and the method of Von Neumann and Forsythe, and show why they are unlikely to be competitive with the Polar method on vector processors.
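For reference, scalar sketches of the two transforms are given below. They are not vectorised and say nothing about performance on the VP2200. The Polar method avoids the sine and cosine of Box-Muller by rejecting points outside the unit disc, which happens with probability 1 - pi/4.

```python
import math
import random

def polar_normal_pair(rng):
    """Two independent N(0, 1) variates by the Polar (Marsaglia) method."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:  # accept only points strictly inside the unit disc
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

def box_muller_pair(rng):
    """Two independent N(0, 1) variates by the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

def mean_var(zs):
    """Sample mean and unbiased sample variance."""
    m = sum(zs) / len(zs)
    return m, sum((z - m) ** 2 for z in zs) / (len(zs) - 1)

rng = random.Random(4)
polar = [z for _ in range(50_000) for z in polar_normal_pair(rng)]
boxm = [z for _ in range(50_000) for z in box_muller_pair(rng)]
print(mean_var(polar), mean_var(boxm))  # both close to (0, 1)
```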
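A Fast Vectorised Implementation of Wallace's Normal Random Number Generator
Richard P. Brent
AI summary: Presents a vectorised implementation of Wallace's normal pseudo-random number generator, whose inner loop is a matrix-vector multiplication, running more than three times faster than the Polar and Box-Muller methods on the Fujitsu VP2200 and VPP300.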
Wallace has proposed a new class of pseudo-random generators for normal variates. These generators do not require a stream of uniform pseudo-random numbers, except for initialisation. The inner loops are essentially matrix-vector multiplications and are very suitable for implementation on vector processors or vector/parallel processors such as the Fujitsu VPP300. In this report we outline Wallace's idea, consider some variations on it, and describe a vectorised implementation RANN4 which is more than three times faster than its best competitors (the Polar and Box-Muller methods) on the Fujitsu VP2200 and VPP300.
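Wallace's core idea, transforming a pool of normal variates by orthogonal matrices so that the output is again normal, can be sketched as follows. This is a simplified illustration under stated assumptions, not RANN4: it uses a fixed 4x4 scaled Hadamard matrix, it shuffles the pool with Python's uniform generator (whereas the abstract notes that Wallace's generators need uniform numbers only for initialisation), and it does nothing about the fact that an orthogonal transform leaves the pool's sum of squares unchanged, which practical versions correct for.

```python
import random

# 4x4 orthogonal matrix (a scaled Hadamard matrix): A @ A^T = I
A = [[0.5,  0.5,  0.5,  0.5],
     [0.5, -0.5,  0.5, -0.5],
     [0.5,  0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5,  0.5]]

def wallace_pass(pool, rng):
    """One pass of a simplified Wallace-style generator.

    The pool is shuffled, split into groups of four, and each group is
    multiplied by the orthogonal matrix A.  Orthogonal maps send i.i.d.
    N(0, 1) vectors to i.i.d. N(0, 1) vectors and preserve the sum of
    squares.  The new pool doubles as the next batch of output.
    """
    assert len(pool) % 4 == 0
    idx = list(range(len(pool)))
    rng.shuffle(idx)
    new_pool = []
    for g in range(0, len(idx), 4):
        x = [pool[i] for i in idx[g:g + 4]]
        new_pool.extend(sum(A[r][c] * x[c] for c in range(4)) for r in range(4))
    return new_pool

rng = random.Random(3)
pool = [rng.gauss(0.0, 1.0) for _ in range(1024)]  # initialisation only
ss0 = sum(x * x for x in pool)
for _ in range(10):
    pool = wallace_pass(pool, rng)
print(ss0, sum(x * x for x in pool))  # equal up to rounding
```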
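Singular Vectors under Random Perturbation
Van Vu
AI summary: Addresses the effect of noise on computing the singular vectors of large matrices; using high-dimensional geometry, proves that when the perturbation is random and the matrix has low rank, the change in the singular vectors admits better estimates than the classical worst-case bounds.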
Computing the first few singular vectors of a large matrix is a problem that frequently comes up in statistics and numerical analysis. Given the presence of noise, exact calculation is hard to achieve, and the following question is important: \textit{how much does a small perturbation of the matrix change its singular vectors?} Answering this question, classical theorems, such as those of Davis-Kahan and Wedin, give tight estimates for the worst-case scenario. In this paper, we show that if the perturbation (noise) is random and the matrix has low rank, then better estimates can be obtained. Our method relies on high-dimensional geometry and is different from those used in earlier papers.
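Adaptive Optimal Regularization of Linear Ill-Posed Integral Equations
E. Ostrovsky, L. Sirota
AI summary: For linear integral equations of the first kind, proposes an adaptive estimator of the solution that is asymptotically optimal in order in the L2 norm, and constructs adaptive confidence regions.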
We construct an adaptive estimator of the solution of a linear integral equation of the first kind, and of the energy of this solution, that is asymptotically optimal in order in the $ L(2) $ sense, and we build a confidence region that is also adaptive.
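A New Approximation to the Normal Distribution Quantile Function
Paul M. Voutier
AI summary: Proposes a new approximation, similar in form to the Beasley-Springer approximation, that trades some accuracy (maximum absolute error below 2.5×10⁻⁵) for higher speed, suited to speed-critical settings such as finance.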
We present a new approximation to the normal distribution quantile function. It has a similar form to the approximation of Beasley and Springer [3], providing a maximum absolute error of less than $2.5 \cdot 10^{-5}$. This is less accurate than [3], but still sufficient for many applications. However it is faster than [3]. This is its primary benefit, which can be crucial to many applications, including in financial markets.
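Efficient Monte Carlo Sampling by Parallel Marginalization
Jonathan Weare
AI summary: Overcomes long autocorrelation times by exchanging states between the full Markov chain and rapidly equilibrating coarse-grained chains that sample marginal distributions, with numerical tests on bridge sampling and filtering/smoothing for a stochastic differential equation.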
Markov chain Monte Carlo sampling methods often suffer from long correlation times. Consequently, these methods must be run for many steps to generate an independent sample. In this paper a method is proposed to overcome this difficulty. The method utilizes information from rapidly equilibrating coarse Markov chains that sample marginal distributions of the full system. This is accomplished through exchanges between the full chain and the auxiliary coarse chains. Results of numerical tests on the bridge sampling and filtering/smoothing problems for a stochastic differential equation are presented.
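Regularization Independent of the Noise Level: An Analysis of Quasi-Optimality
Frank Bauer, Markus Reiss
AI summary: Analyzes the quasi-optimality criterion on average for spectral cut-off estimators, proves that it selects estimators that are rate-optimal on average, and illustrates its practical performance with a calibration problem from mathematical finance.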
The quasi-optimality criterion chooses the regularization parameter in inverse problems without taking into account the noise level. This rule works remarkably well in practice, although Bakushinskii has shown that there are always counterexamples with very poor performance. We propose an average case analysis of quasi-optimality for spectral cut-off estimators and we prove that the quasi-optimality criterion determines estimators which are rate-optimal {\em on average}. Its practical performance is illustrated with a calibration problem from mathematical finance.
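A toy sketch of the rule for spectral cut-off follows, in sequence space with a diagonal operator and deterministic "noise". It assumes cut-off levels indexed by the number of retained singular components; the paper works with random noise, its precise grid of regularization parameters may differ, and its average-case analysis is not reproduced. All data below are hypothetical.

```python
def cutoff_estimate(y, sigma, n):
    """Spectral cut-off estimate in the singular basis: keep the first n
    coefficients y_i / sigma_i and set the remaining ones to zero."""
    return [y[i] / sigma[i] if i < n else 0.0 for i in range(len(y))]

def quasi_optimal_cutoff(y, sigma):
    """Quasi-optimality rule: pick n minimizing ||x_{n+1} - x_n||.  Consecutive
    cut-off estimates differ in a single coefficient (0-based index n), so the
    difference norm is |y[n] / sigma[n]|; the noise level is never used."""
    diffs = {n: abs(y[n] / sigma[n]) for n in range(1, len(y))}
    return min(diffs, key=diffs.get)

def sq_error(est, x):
    return sum((a - b) ** 2 for a, b in zip(est, x))

# Hypothetical diagonal problem y_i = sigma_i * x_i + e_i, i = 1..50, with
# sigma_i = 1/i, x_i = 1/i**2 and deterministic "noise" e_i = 0.001 * (-1)**i.
m = 50
sigma = [1.0 / i for i in range(1, m + 1)]
x_true = [1.0 / i**2 for i in range(1, m + 1)]
y = [s * xt + 0.001 * (-1) ** i for i, (s, xt) in enumerate(zip(sigma, x_true), start=1)]

n_hat = quasi_optimal_cutoff(y, sigma)
print(n_hat)  # 10
print(sq_error(cutoff_estimate(y, sigma, n_hat), x_true),  # regularized
      sq_error(cutoff_estimate(y, sigma, m), x_true))      # no regularization
```

The rule stops where consecutive estimates barely change. In this toy example that happens where the amplified noise starts to dominate the decaying signal coefficients, but an accidentally small coefficient can also mislead it, in line with the counterexamples mentioned above.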
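Numerical Methods for the Stochastic Landau-Lifshitz Navier-Stokes Equations
John B. Bell, Alejandro L. Garcia, Sarah A. Williams
AI summary: Studies explicit Eulerian discretizations of the stochastic Landau-Lifshitz Navier-Stokes equations, proposes a conservative centered scheme with a third-order Runge-Kutta time integrator that accurately reproduces density fluctuations, and compares the results of various numerical tests with theory and with DSMC simulations.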
The Landau-Lifshitz Navier-Stokes (LLNS) equations incorporate thermal fluctuations into macroscopic hydrodynamics by using stochastic fluxes. This paper examines explicit Eulerian discretizations of the full LLNS equations. Several CFD approaches are considered (including MacCormack's two-step Lax-Wendroff scheme and the Piecewise Parabolic Method) and are found to give good results (about 10% error) for the variances of momentum and energy fluctuations. However, neither of these schemes accurately reproduces the density fluctuations. We introduce a conservative centered scheme with a third-order Runge-Kutta temporal integrator that does accurately produce density fluctuations. A variety of numerical tests, including the random walk of a standing shock wave, are considered and results from the stochastic LLNS PDE solver are compared with theory, when available, and with molecular simulations using a Direct Simulation Monte Carlo (DSMC) algorithm.
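A Comparative Study of Two Stochastic Mode Reduction Methods
Panagiotis Stinis
AI summary: Compares two dimension-reduction methods for systems of ordinary differential equations with time-scale separation; both produce stochastic differential equations and allow higher-order terms in the reduced system, and theoretical conditions and numerical experiments show that the two methods have similar predictive ability.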
We present a comparative study of two methods for the reduction of the dimensionality of a system of ordinary differential equations that exhibits time-scale separation. Both methods lead to a reduced system of stochastic differential equations. The novel feature of these methods is that they allow the use, in the reduced system, of higher order terms in the resolved variables. The first method, proposed by Majda, Timofeyev and Vanden-Eijnden, is based on an asymptotic strategy developed by Kurtz. The second method is a short-memory approximation of the Mori-Zwanzig projection formalism of irreversible statistical mechanics, as proposed by Chorin, Hald and Kupferman. We present conditions under which the reduced models arising from the two methods should have similar predictive ability. We apply the two methods to test cases that satisfy these conditions. The form of the reduced models and the numerical simulations show that the two methods have similar predictive ability as expected.
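A Maximum Likelihood Algorithm for the Estimation and Renormalization of Exponential Densities
Panagiotis Stinis
AI summary: Proposes a maximum-likelihood-based algorithm that solves the resulting moment-matching problem with Levenberg-Marquardt optimization to estimate and renormalize exponential densities, and applies it to compute the critical temperature of the two-dimensional Ising model.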
We present an algorithm based on maximum likelihood for the estimation and renormalization (marginalization) of exponential densities. The moment-matching problem resulting from the maximization of the likelihood is solved as an optimization problem using the Levenberg-Marquardt algorithm. In the case of renormalization, the moments needed to set up the moment-matching problem are evaluated using Swendsen's renormalization method. We focus on the renormalization version of the algorithm, where we demonstrate its use by computing the critical temperature of the two-dimensional Ising model. Possible applications of the algorithm are discussed.
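Resolution of the Finite Markov Moment Problem
Laurent Gosse, Olof Runborg
AI summary: Presents a constructive method for solving the finite Markov moment problem, with proofs based on the theory of Toeplitz matrices and the classical Newton relations.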
We present in full detail a constructive procedure for inverting the so-called "finite Markov moment problem". The proofs rely on the general theory of Toeplitz matrices together with the classical Newton relations.
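As an illustration of one ingredient only (the Newton relations, not the full inversion procedure), the sketch below turns power sums, i.e. the moments of equally weighted point masses, into elementary symmetric polynomials, which are the coefficients of the polynomial whose roots are those points. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def elementary_from_power_sums(p):
    """Newton's identities: given power sums p[k-1] = sum_j x_j**k for
    k = 1..n, return e_1..e_n using
        k * e_k = sum_{i=1}^{k} (-1)**(i-1) * e_{k-i} * p_i,   e_0 = 1."""
    n = len(p)
    e = [Fraction(1)]  # e_0
    for k in range(1, n + 1):
        s = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1))
        e.append(Fraction(s) / k)
    return e[1:]

# Power sums of the points {1, 2, 3}: p_k = 1**k + 2**k + 3**k
print(elementary_from_power_sums([6, 14, 36]))
# [Fraction(6, 1), Fraction(11, 1), Fraction(6, 1)]
```

The output gives the polynomial x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3), so the points are recovered as its roots.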
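Existence, Uniqueness and a Constructive Solution Algorithm for a Class of Finite Markov Moment Problems
Laurent Gosse, Olof Runborg
AI summary: For finite Markov moment problems with an arbitrary number of positive and negative branches, gives criteria for the existence and uniqueness of solutions, characterizes the non-unique solution families, and proposes a constructive numerical solution algorithm.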
We consider a class of finite Markov moment problems with arbitrary number of positive and negative branches. We show criteria for the existence and uniqueness of solutions, and we characterize in detail the non-unique solution families. Moreover, we present a constructive algorithm to solve the moment problems numerically and prove that the algorithm computes the right solution.
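Interpolation and Iteration for Nonlinear Filters
Alexandre J. Chorin, Xuemin Tu
AI summary: Presents a general form of the iteration and interpolation processes used in implicit particle filters, which focus particle paths through a pseudo-Gaussian representation of posterior densities in order to reduce the number of particles needed in nonlinear data assimilation.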
We present a general form of the iteration and interpolation process used in implicit particle filters. Implicit filters are based on a pseudo-Gaussian representation of posterior densities, and are designed to focus the particle paths so as to reduce the number of particles needed in nonlinear data assimilation. Examples are given.
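Model Selection for Poisson Processes
Lucien Birgé
AI summary: Building on model selection with T-estimators, introduces a Hellinger-type distance as the loss function and builds tests for selecting models of the mean measure of a Poisson process, with applications to adaptive intensity estimation and to the aggregation of estimators.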
Our purpose in this paper is to apply the general methodology for model selection based on T-estimators developed in Birgé [Ann. Inst. H. Poincaré Probab. Statist. 42 (2006) 273--325] to the particular situation of the estimation of the unknown mean measure of a Poisson process. We introduce a Hellinger type distance between finite positive measures to serve as our loss function and we build suitable tests between balls (with respect to this distance) in the set of mean measures. As a consequence of the existence of such tests, given a suitable family of approximating models, we can build T-estimators for the mean measure based on this family of models and analyze their performances. We provide a number of applications to adaptive intensity estimation when the square root of the intensity belongs to various smoothness classes. We also give a method for aggregation of preliminary estimators.
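Translation and Modern Interpretation of Laplace's Théorie Analytique des Probabilités, Pages 505-512, 516-520
Julien Langou
AI summary: Translates Laplace's 1820 French text and interprets it in modern terms, reading its two algorithms as reverse square-root-free modified Gram-Schmidt by rows on the regression matrix and as reverse square-root-free Cholesky, respectively.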
The text of Laplace, \textit{Sur l'application du calcul des probabilités à la philosophie naturelle,} (Théorie Analytique des Probabilités. Troisième Édition. Premier Supplément), 1820, is quoted in the context of the Gram-Schmidt algorithm. We provide an English translation of Laplace's manuscript (originally in French) and interpret the algorithms of Laplace in a contemporary context. The two algorithms given by Laplace compute the mean and the variance of two components of the solution of a linear statistical model. The first algorithm can be interpreted as the {\em reverse square-root-free modified Gram-Schmidt by row} algorithm on the regression matrix. The second algorithm can be interpreted as the {\em reverse square-root-free Cholesky} algorithm.
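To make the term "square-root-free Cholesky" concrete, the sketch below shows the standard forward factorization A = L D L^T of a symmetric positive-definite matrix, which avoids square roots by keeping the pivots in a diagonal D. The "reverse" ordering attributed to Laplace in the abstract is not reproduced, and the matrix is a hypothetical example.

```python
def ldl(A):
    """Square-root-free Cholesky factorization A = L D L^T of a symmetric
    positive-definite matrix (L unit lower triangular, D diagonal)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # pivot: what remains of A[j][j] after removing earlier columns
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d

A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L, d = ldl(A)
print(d)  # [4.0, 4.0, 4.0]
```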
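Non-Bayesian Particle Filters
Alexandre J. Chorin, Xuemin Tu
AI summary: For data assimilation in nonlinear problems, proposes a non-Bayesian particle filter that samples the relevant probability density functions directly by iteration, reducing the number of particles required.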
Particle filters for data assimilation in nonlinear problems use "particles" (replicas of the underlying system) to generate a sequence of probability density functions (pdfs) through a Bayesian process. This can be expensive because a significant number of particles has to be used to maintain accuracy. We offer here an alternative, in which the relevant pdfs are sampled directly by an iteration. An example is discussed in detail.
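Dimension Reduction in Data Representation
A. G. Ramm
AI summary: For high-dimensional data, proposes an algorithm, different from principal component analysis (PCA), for finding low-dimensional sets (of dimension much smaller than the ambient dimension) near which most of the data points lie.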
Suppose the data consist of a set $S$ of points $x_j$, $1\leq j \leq J$, distributed in a bounded domain $D\subset R^N$, where $N$ is a large number. An algorithm is given for finding the sets $L_k$ of dimension $k\ll N$, $k=1,2,\dots,K$, in a neighborhood of which a maximal number of the points $x_j\in S$ lie. The algorithm is different from PCA (principal component analysis).
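Fast Computation of the Expected Loss of a Loan Portfolio Tranche in the Gaussian Factor Model: Using Hermite Expansions for Higher Accuracy
P. Okunev
AI summary: For the Gaussian factor model, proposes a fast Hermite-expansion-based algorithm for computing the expected loss of loan portfolio tranches that is more accurate than the earlier method on small portfolios.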
We propose a fast algorithm for computing the expected tranche loss in the Gaussian factor model. We test it on portfolios ranging in size from 25 (the size of DJ iTraxx Australia) to 100 (the size of DJCDX.NA.HY) with a single-factor Gaussian model and show that the algorithm gives accurate results. The algorithm proposed here is an extension of the algorithm proposed in \cite{PO}. The advantage of the new algorithm is that it works well for portfolios of smaller size, for which the normal approximation proposed in \cite{PO} is not sufficiently accurate. The algorithm is intended as an alternative to the much slower Fourier-transform-based methods \cite{MD}.
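A Fast Algorithm for Computing Expected Loan Portfolio Tranche Loss in the Gaussian Factor Model
Pavel Okunev
AI summary: For the Gaussian factor model, proposes a fast algorithm for computing the expected tranche loss, validates its accuracy on a 125-name portfolio, and offers it as an alternative to the slower Moody's FT method.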
We propose a fast algorithm for computing the expected tranche loss in the Gaussian factor model. We test it on a 125-name portfolio with a single-factor Gaussian model and show that the algorithm gives accurate results. We choose a 125-name portfolio for our tests because this is the size of the standard DJCDX.NA.HY portfolio. The algorithm proposed here is intended as an alternative to the much slower Moody's FT method.
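As an assumed slow but transparent baseline (not Okunev's fast algorithm, and not the Fourier-transform or Moody's FT methods mentioned in these abstracts), the sketch below computes the expected tranche loss for a homogeneous portfolio (equal notionals, default probabilities and loss given default, all assumptions) in the one-factor Gaussian model. It conditions on the common factor, uses the exact conditional binomial loss distribution, and integrates the factor out with the trapezoidal rule. The parameter values are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf, inv_cdf, pdf

def expected_tranche_loss(n, p, rho, lgd, a, d, grid=2001, m_max=8.0):
    """Expected loss of the tranche [a, d], as a fraction of the tranche notional,
    for a homogeneous portfolio of n names in the one-factor Gaussian model.

    Name i defaults if sqrt(rho) * M + sqrt(1 - rho) * Z_i < Phi^{-1}(p), with
    M, Z_i i.i.d. N(0, 1).  Given M the number of defaults is binomial, so the
    conditional expected tranche loss is exact; M is integrated out with the
    trapezoidal rule on [-m_max, m_max].
    """
    c = STD_NORMAL.inv_cdf(p)
    step = 2.0 * m_max / (grid - 1)
    total = 0.0
    for g in range(grid):
        m = -m_max + g * step
        pm = STD_NORMAL.cdf((c - sqrt(rho) * m) / sqrt(1.0 - rho))  # conditional PD
        cond = 0.0
        for k in range(n + 1):
            portfolio_loss = lgd * k / n
            tranche_loss = min(max(portfolio_loss - a, 0.0), d - a)
            cond += comb(n, k) * pm**k * (1.0 - pm) ** (n - k) * tranche_loss
        weight = 0.5 if g in (0, grid - 1) else 1.0
        total += weight * step * STD_NORMAL.pdf(m) * cond
    return total / (d - a)

# Hypothetical 25-name homogeneous portfolio: PD 2%, correlation 0.3, LGD 60%.
params = dict(n=25, p=0.02, rho=0.3, lgd=0.6)
print(expected_tranche_loss(a=0.0, d=1.0, **params))    # should be close to p * lgd = 0.012
print(expected_tranche_loss(a=0.0, d=0.03, **params))   # equity tranche
print(expected_tranche_loss(a=0.03, d=0.07, **params))  # mezzanine tranche
```

The cost grows with the number of names times the number of quadrature nodes, which is the kind of expense the fast algorithms above are designed to avoid.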
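Consistent Estimation of Pricing Kernels from Noisy Price Data
Vladislav Kargin
AI summary: For the inverse problem of finding a non-negative pricing kernel, proposes constrained least squares and relaxed maximization of the relative entropy, both of which give consistent estimates.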
If pricing kernels are assumed non-negative then the inverse problem of finding the pricing kernel is well-posed. The constrained least squares method provides a consistent estimate of the pricing kernel. When the data are limited, a new method is suggested: relaxed maximization of the relative entropy. This estimator is also consistent. Keywords: $ε$-entropy, non-parametric estimation, pricing kernel, inverse problems.
On the Non-Existence of a One-Factor Interest Rate Model for Volatility-Averaged Generalized Fong-Vasicek Term Structures
B. Stehlikova, D. Sevcovic
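AI summary: Studies the generalized Fong-Vasicek two-factor interest rate model with stochastic volatility and, by analyzing averaged bond prices, proves that no one-factor model can reproduce these averaged prices.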
The purpose of this paper is to study the generalized Fong--Vasicek two-factor interest rate model with stochastic volatility. In this model the dispersion of the stochastic short rate (the square of volatility) is assumed to be stochastic as well, and it follows a non-negative process with volatility proportional to the square root of the dispersion. The drift of the stochastic process for the dispersion is assumed to have a rather general form. In particular, this includes a linear function with one root (yielding the original Fong--Vasicek model) and a cubic-like function with three roots (yielding a generalized Fong--Vasicek model for describing volatility clustering). We consider bond prices averaged with respect to the limiting distribution of the stochastic dispersion. The averaged bond prices depend on time and on the current level of the short rate, as is the case in many popular one-factor interest rate models, in particular the Vasicek and Cox--Ingersoll--Ross models. However, as the main result of this paper, we show that no such one-factor model yields the same bond prices as the averaged values described above.
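For orientation, the model described above can be written as the following system. The notation is ours and is a hedged reading of the abstract rather than the paper's exact parametrization:
- $r$ is the short rate and $y$ the dispersion;
- $\kappa, \theta, \omega$ are constants and $\alpha$ is the dispersion drift;
- $W_1, W_2$ are Brownian motions whose correlation is left unspecified here;
- $g$ is the limiting density of $y$.

```latex
\begin{aligned}
dr &= \kappa(\theta - r)\,dt + \sqrt{y}\,dW_1, \\
dy &= \alpha(y)\,dt + \omega\sqrt{y}\,dW_2, \\
\bar P(t, r) &= \int_0^\infty P(t, r, y)\, g(y)\, dy .
\end{aligned}
```

A linear $\alpha$ with one root corresponds to the original Fong--Vasicek case, and a cubic-like $\alpha$ with three roots to the generalized case. The main result then says that no one-factor model for $r$ alone has bond prices equal to $\bar P(t, r)$.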
Isospin Asymmetry in Nuclei and the Nuclear Symmetry Energy
Tapan Mukhopadhyay, D. N. Basu
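AI summary: Uses maximum likelihood estimation to extract the volume and surface parts of the nuclear symmetry energy from atomic masses, thereby constraining the nuclear symmetry energy. It also obtains the neutron skin thickness of finite nuclei from the volume and surface symmetry terms and the Coulomb energy coefficient.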
The volume and surface symmetry parts of the nuclear symmetry energy and other coefficients of the liquid droplet model are determined from the measured atomic masses by the maximum likelihood estimator. The volume symmetry energy coefficient extracted from finite nuclei provides a constraint on the nuclear symmetry energy. This approach also yields the neutron skin of a finite nucleus through its relationship with the volume and surface symmetry terms and the Coulomb energy coefficient. The description of nuclear matter obtained from the isoscalar and isovector components of the density-dependent M3Y effective interaction provides a value of the symmetry energy that is consistent with the empirical value extracted from measured atomic masses and with other modern theoretical descriptions of nuclear matter.
Variance Reduction for Particle Filters of Systems with Time-Scale Separation
Dror Givon, Panagiotis Stinis, Jonathan Weare
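AI summary: For systems with time-scale separation, constructs a particle filter that uses the averaging principle for dimension reduction and Rao-Blackwellization via a factorized transition probability. Implemented within the coarse projective integration framework, it achieves variance reduction and a speed-up.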
We present a particle filter construction for a system that exhibits time-scale separation. The separation of time scales allows two simplifications that we exploit: (i) the use of the averaging principle for the dimensional reduction of the system that must be solved for each particle, and (ii) the factorization of the transition probability, which allows the Rao-Blackwellization of the filtering step. Both simplifications can be implemented using the coarse projective integration framework. The resulting particle filter is faster and has smaller variance than the particle filter based on the original system. The method is tested on a multiscale stochastic differential equation and on a multiscale pure jump diffusion motivated by chemical reactions.
State Space Description of National Economies: The V4 Countries
Ivo Petras, Igor Podlubny
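AI summary: Adopts a state space approach with GDP, inflation, and unemployment as state variables. Using orthogonal regression, it analyzes the economic trajectories of the V4 countries (Slovakia, Czech Republic, Hungary, Poland), finds that each phase trajectory lies approximately in a plane, and proposes new economic indicators.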
We present a new approach to describing national economies. For this we use the state space viewpoint, which is used mostly in the theory of dynamical systems and in control theory. Gross domestic product, inflation, and unemployment rates are taken as state variables. We demonstrate that, for the period considered, the phase trajectory of each of the V4 countries (Slovak Republic, Czech Republic, Hungary, and Poland) lies approximately in one plane. The economic development of each country can therefore be associated with a corresponding plane in the state space. The suggested approach opens a way to a new set of economic indicators (for example, normal vectors of national economies, various plane slopes, 2D angles between the planes corresponding to different economies, etc.). The tool used for computations is orthogonal regression (also known as orthogonal distance regression or the total least squares method). We also give general arguments for using orthogonal regression instead of classical regression based on the least squares method. A MATLAB routine for fitting 3D data to lines and planes in 3D is provided.
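The orthogonal-regression plane fit can be sketched in Python with a singular value decomposition. The paper itself provides a MATLAB routine, and this sketch is an independent illustration, not a port of it. After the points are centered, the right singular vector with the smallest singular value is the plane's unit normal. That normal minimizes the sum of squared perpendicular distances.

The function names are hypothetical. The dihedral angle between two fitted planes is shown as one possible reading of the "angles between planes" indicators mentioned above.

```python
import numpy as np


def fit_plane_tls(points):
    """Fit a plane to 3D points by orthogonal (total least squares) regression.

    Returns (centroid, unit_normal); the plane is {x : unit_normal @ (x - centroid) = 0}.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by decreasing spread of the data.
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]  # direction of least spread = plane normal


def angle_between_planes(n1, n2):
    """Dihedral angle (radians, in [0, pi/2]) between planes with normals n1 and n2."""
    c = abs(np.dot(n1, n2)) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.arccos(np.clip(c, 0.0, 1.0))
```

Here each row of `points` would hold one period's (GDP, inflation, unemployment) values for a country.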
Computing the Conditioning of the Components of a Linear Least Squares Solution
Marc Baboulin, Jack Dongarra, Serge Gratton, Julien Langou
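AI summary: For the overdetermined full-rank linear least squares problem, defines computable condition number estimates based on the theoretical results of Arioli et al. and interprets them as statistical quantities. It also provides LAPACK-based code with a cost analysis, and presents a historical example and experiments from the space industry.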
In this paper, we address the accuracy of the results for the overdetermined full-rank linear least squares problem. We recall theoretical results obtained in Arioli, Baboulin and Gratton, SIMAX 29(2):413--433, 2007, on the conditioning of the least squares solution and of its components when the matrix perturbations are measured in the Frobenius or spectral norm. Then we define computable estimates for these condition numbers and interpret them in terms of statistical quantities. In particular, we show that, in the classical linear statistical model, the ratio of the variance of one component of the solution to the variance of the right-hand side is exactly the condition number of this solution component when perturbations of the right-hand side are considered. We also provide code fragments using LAPACK routines to compute the variance-covariance matrix and the least squares conditioning, and we give the corresponding computational cost. Finally, we present a small historical numerical example that Laplace used in Théorie Analytique des Probabilités (1820) to compute the mass of Jupiter, together with experiments on real physical data from the space industry.
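A small NumPy sketch of the statistical quantity above follows. In the model $b = Ax + e$ with $\operatorname{Var}(e) = \sigma^2 I$, the least squares estimate satisfies $\operatorname{Var}(\hat x_i) = \sigma^2 [(A^T A)^{-1}]_{ii}$. The variance ratio is therefore the $i$-th diagonal entry of $(A^T A)^{-1}$.

The sketch reads these entries off the R factor of a QR factorization, using $(A^T A)^{-1} = R^{-1} R^{-T}$. It relies on NumPy rather than the paper's LAPACK fragments, and the function name is hypothetical. How the ratio maps onto the paper's condition number depends on the paper's normalization, which is not reproduced here.

```python
import numpy as np


def component_variance_ratios(A):
    """Return diag((A^T A)^{-1}) for a full-column-rank A, computed via QR.

    In the model b = A x + e with Var(e) = sigma^2 I, entry i equals
    Var(x_hat_i) / sigma^2 for the least squares estimate x_hat.
    """
    _, R = np.linalg.qr(A)                          # reduced QR: A = Q R, R upper triangular
    R_inv = np.linalg.solve(R, np.eye(R.shape[0]))  # explicit R^{-1} (fine for small n)
    # (A^T A)^{-1} = R^{-1} R^{-T}, so its diagonal holds the squared row norms of R^{-1}.
    return np.sum(R_inv ** 2, axis=1)
```

Working from $R$ avoids explicitly forming $A^T A$, which squares the condition number of $A$.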
On Rate Optimality for Ill-Posed Inverse Problems in Econometrics
Xiaohong Chen, Markus Reiss
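AI summary: Clarifies the relationship between the regularity conditions for nonparametric indirect regression and nonparametric instrumental variables models, and establishes minimax risk lower bounds. It then shows that a projection estimator and a sieve minimum distance estimator attain these bounds, achieving rate optimality.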
In this paper, we clarify the relations between the existing sets of regularity conditions for convergence rates of nonparametric indirect regression (NPIR) and nonparametric instrumental variables (NPIV) regression models. We establish minimax risk lower bounds in mean integrated squared error loss for the NPIR and the NPIV models under two basic regularity conditions that allow for both mildly ill-posed and severely ill-posed cases. We show that both a simple projection estimator for the NPIR model, and a sieve minimum distance estimator for the NPIV model, can achieve the minimax risk lower bounds, and are rate-optimal uniformly over a large class of structure functions, allowing for mildly ill-posed and severely ill-posed cases.
Practical Wavelet Design on the Sphere
Frédéric Guilloux, Gilles Fay, Jean-François Cardoso
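AI summary: For analyzing data on the sphere (e.g., the cosmic microwave background), proposes a method for designing isotropic analysis functions that are perfectly band-limited in the spectral domain and optimally localized in the spatial domain. The functions build on the localized frames of Narcowich et al. (2006) and are optimized and compared numerically.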
We address the question of designing isotropic analysis functions on the sphere which are perfectly limited in the spectral domain and optimally localized in the spatial domain. This work is motivated by the need for localized analysis tools in domains where the data lie on the sphere, e.g., the science of the Cosmic Microwave Background. Our construction is derived from the localized frames introduced by Narcowich, Petrushev, and Ward (2006). The analysis frames are optimized for given applications and compared numerically using various criteria.
A Newton-Like Algorithm for Likelihood Maximization: The Robust-Variance Scoring Algorithm
Daniel Commenges, Helene Jacqmin-Gadda, Cecile Proust, Jeremie Guedj
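AI summary: Proposes the robust-variance scoring (RVS) algorithm, which replaces the Hessian in the Newton-Raphson algorithm with an estimate of the variance of the score to reduce computation. It also designs a convergence criterion based on the ratio of approximation error to statistical error. Simulations show that RVS is faster than the Marquardt algorithm, with satisfactory coverage rates.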
This article studies a Newton-like method that has already been used by several authors but has not yet been thoroughly studied. We call it the robust-variance scoring (RVS) algorithm. The main version that we consider replaces the negative Hessian of the log-likelihood used in the Newton-Raphson algorithm with a matrix $G$, an estimate of the variance of the score under the true probability that uses only the individual scores. Thus an iteration of this algorithm requires far less computation than an iteration of the Newton-Raphson algorithm. Moreover, this estimate of the variance of the score estimates the information matrix at the maximum. We have also studied a convergence criterion which has the nice interpretation of estimating the ratio of the approximation error to the statistical error; thus it can be used to stop the iterative process whatever the problem. A simulation study confirms that the RVS algorithm is faster than the Marquardt algorithm (a robust version of the Newton-Raphson algorithm). This happens because the RVS algorithm needs barely more iterations than the Marquardt algorithm, while each iteration takes much less computation time. Also, the coverage rates using the matrix $G$ are satisfactory.
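A minimal sketch of the RVS iteration for a logistic regression log-likelihood, in Python with NumPy. Following the description above, $G$ is built only from the individual scores $U_i$, and the update is $\theta \leftarrow \theta + G^{-1} U$.

The following details are assumptions made for illustration:
- **Choice of model:** logistic regression is used purely as an example.
- **Form of $G$:** the empirical variance $\sum_i U_i U_i^T - U U^T / n$ is assumed as the exact variant.
- **Stopping rule:** $U^T G^{-1} U / p$ stands in for the convergence criterion described in the abstract.
- **Naming and safeguards:** all names are illustrative, and no step-length safeguard is included.

```python
import numpy as np


def rvs_logistic(X, y, tol=1e-12, max_iter=100):
    """Maximize a logistic log-likelihood by robust-variance scoring (sketch).

    Returns (theta, n_iter). No line search: assumes a well-behaved problem.
    """
    n, p = X.shape
    theta = np.zeros(p)
    for it in range(1, max_iter + 1):
        prob = 1.0 / (1.0 + np.exp(-X @ theta))
        scores = (y - prob)[:, None] * X             # individual scores U_i, shape (n, p)
        U = scores.sum(axis=0)                       # total score
        G = scores.T @ scores - np.outer(U, U) / n   # empirical variance of the scores
        step = np.linalg.solve(G, U)
        crit = U @ step / p                          # stand-in convergence criterion
        if crit < tol:
            break
        theta = theta + step
    return theta, it
```

Each iteration needs only the $n \times p$ matrix of individual scores and one $p \times p$ solve, with no second derivatives. This is the source of the per-iteration savings described in the abstract.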
Counting Faces of Randomly Projected Polytopes When the Projection Radically Lowers Dimension
David L. Donoho, Jared Tanner
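AI summary: Develops asymptotic methods for counting faces of random high-dimensional polytopes and reveals unexpected applications in statistics, probability, information theory, and signal processing. These include convex hulls of Gaussian point clouds, signal recovery from random projections, and the number of gross errors that Gaussian error-correcting codes can correct.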
This paper develops asymptotic methods to count faces of random high-dimensional polytopes. Beyond their intrinsic interest, our conclusions have surprising implications in statistics, probability, information theory, and signal processing, with potential impacts in practical subjects like medical imaging and digital communications. Three such implications concern convex hulls of Gaussian point clouds, signal recovery from random projections, and how many gross errors can be efficiently corrected from Gaussian error-correcting codes.
A Conjecture on Error Boundedness in a New Hermite Interpolation Problem via Splines of Odd Degree
Fadoua Balabdaoui, Jon A. Wellner
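AI summary: Proposes a new Hermite interpolation problem via odd-degree splines and conjectures that the interpolation error in the supremum norm is bounded independently of the knot locations. It provides strong numerical evidence for $k = 3, \ldots, 10$.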
We present a Hermite interpolation problem via splines of odd degree which, to the best of the authors' knowledge, has not been considered in the literature on interpolation via odd-degree splines. In this new interpolation problem, we conjecture that the interpolation error is bounded in the supremum norm independently of the locations of the knots. Given an integer $k \ge 3$, our spline interpolant is of degree $2k-1$ and has $2k-4$ (interior) knots. Simulations were performed to check the validity of the conjecture. We present strong numerical evidence in support of the conjecture for $k = 3, \ldots, 10$ when the interpolated function belongs to $C^{(2k)}[0,1]$, the class of $2k$-times continuously differentiable functions on $[0,1]$. In this case, the worst interpolation error is proved to be attained by the perfect spline of degree $2k$ with the same knots as the spline interpolant. This interpolation problem arises naturally in nonparametric estimation of a multiply monotone density via least squares and maximum likelihood methods.
On a Fast, Robust Estimator of the Mode: Comparisons to Other Robust Estimators with Applications
David R. Bickel, Rudolf Fruehwirth
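AI summary: Studies the half-sample mode, a fast and robust estimator of the mode of a continuous distribution, and shows its high breakdown point and insensitivity to outliers. It also demonstrates the estimator's use in iterative robust location estimation and in finding proton-proton collision points.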
Advances in computing power enable more widespread use of the mode, which is a natural measure of central tendency since, as the most probable value, it is not influenced by the tails in the distribution. The properties of the half-sample mode, which is a simple and fast estimator of the mode of a continuous distribution, are studied. The half-sample mode is less sensitive to outliers than most other estimators of location, including many other low-bias estimators of the mode. Its breakdown point is one half, equal to that of the median. However, because of its finite rejection point, the half-sample mode is much less sensitive to outliers that are all either greater or less than the other values of the sample. This is confirmed by applying the mode estimator and the median to samples drawn from normal, lognormal, and Pareto distributions contaminated by outliers. It is also shown that the half-sample mode, in combination with a robust scale estimator, is a highly robust starting point for iterative robust location estimators such as Huber's M-estimator. The half-sample mode can easily be generalized to modal intervals containing more or less than half of the sample. An application of such an estimator to the finding of collision points in high-energy proton-proton interactions is presented.
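The half-sample mode can be computed by repeatedly keeping the shortest interval that contains half of the sorted sample, until at most three points remain. The Python sketch below follows that scheme.

The following details are our assumptions, since the abstract does not specify them:
- the window size $\lceil n/2 \rceil$;
- taking the first shortest window when several tie;
- the three-point rule.

```python
import math


def half_sample_mode(values):
    """Half-sample mode of a 1D sample (sketch)."""
    x = sorted(values)
    if not x:
        raise ValueError("empty sample")
    while len(x) > 3:
        h = math.ceil(len(x) / 2)  # size of the half-sample window
        # Start index of the shortest window of h consecutive order statistics.
        i = min(range(len(x) - h + 1), key=lambda j: x[j + h - 1] - x[j])
        x = x[i:i + h]
    if len(x) == 1:
        return float(x[0])
    if len(x) == 2:
        return (x[0] + x[1]) / 2.0
    # Three points: average the closer pair, or return the middle point on a tie.
    left, right = x[1] - x[0], x[2] - x[1]
    if left < right:
        return (x[0] + x[1]) / 2.0
    if right < left:
        return (x[1] + x[2]) / 2.0
    return float(x[1])


print(half_sample_mode([1, 2, 2, 2, 3, 10, 20]))  # prints 2.0
```

After sorting, each pass takes linear time and halves the sample, which is why the estimator is fast. Outliers lying entirely on one side never fall inside the shortest half-sample window, which illustrates the insensitivity discussed above.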
Recovery of Edges from Spectral Data with Noise: A New Perspective
Shlomo Engelberg, Eitan Tadmor
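AI summary: Studies edge detection in piecewise smooth functions from noisy $N$-degree spectral content. Concentration factors are adapted to the noise variance so that $O(1)$ jump edges can be detected when the noise scale is much larger than $1/N$.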
We consider the problem of detecting edges in piecewise smooth functions from their $N$-degree spectral content, which is assumed to be corrupted by noise. There are three scales involved: the "smoothness" scale of order $1/N$, the noise scale of order $\eta$, and the $O(1)$ scale of the jump discontinuities. We use concentration factors which are adjusted to the noise variance, $\eta \gg 1/N$, in order to detect the underlying $O(1)$ edges, which are separated from the noise scale, $\eta \ll 1$.
A Dynamic Algorithm for Blind Separation of Convolutive Sound Mixtures
Jie Liu, Jack Xin, Yingyong Qi
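AI summary: Proposes a dynamic blind source separation algorithm that updates statistics in the frequency domain and minimizes the support of time-domain demixing filters. It resolves permutation and scaling ambiguities by optimizing the $l^1 \times l^\infty$ norm of cross-correlation coefficients at multiple time lags, giving adaptive separation without iterations.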
We study an efficient dynamic blind source separation algorithm for convolutive sound mixtures, based on updating statistical information in the frequency domain and minimizing the support of time-domain demixing filters by a weighted least squares method. The permutation and scaling indeterminacies of separation, and the concatenation of signals in adjacent time frames, are resolved by optimizing the $l^1 \times l^\infty$ norm of cross-correlation coefficients at multiple time lags. The algorithm is a direct method without iterations and is adaptive to the environment. Computations on recorded and synthetic mixtures of speech and music signals show excellent performance.