Even More Guarantees for Variational Inference in the Presence of Symmetries
Lena Zellinger, Antonio Vergari
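AI summary: This paper extends the theory of robust variational inference under target symmetries. It shows that, with the forward KL divergence and α-divergences, the target mean and correlation matrix can be recovered exactly even under model misspecification, and it relaxes the log-concavity assumption so that the guarantees also cover multi-modal distributions.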
When approximating an intractable density via variational inference (VI), the variational family is typically chosen as a simple parametric family that very likely does not contain the target. This raises the question: under which conditions can we recover characteristics of the target despite misspecification? In this work, we extend previous theoretical results on robust VI with location-scale families under target symmetries in two substantial ways: (1) We open them up to a wider range of divergences by providing sufficient conditions for exact recovery of the target mean and correlation matrix when using the forward Kullback-Leibler divergence and $\alpha$-divergences. (2) By doing so, we find that we can drop the restrictive assumption of a log-concave target made in previous work, which allows us to give guarantees for a wider range of targets, including multi-modal ones. In our experiments, we show how our guarantees can serve as guidelines for choosing the variational family and the value of $\alpha$, and we illustrate on a diverse set of examples how and why optimization can fail in the absence of our sufficient conditions.
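To make the recovery claim concrete, here is a minimal sketch of its simplest special case. It is not taken from the paper's experiments. It assumes a full-covariance Gaussian as the variational family (one location-scale family) and a hypothetical target: an equal-weight two-component Gaussian mixture that is symmetric about a chosen center. It relies on the standard fact that minimizing the sample estimate of the forward KL divergence over Gaussians is maximum likelihood, which reduces to moment matching. The function names `sample_symmetric_mixture` and `fit_gaussian_forward_kl` are illustrative, and the sketch does not cover $\alpha$-divergences.

```python
import numpy as np


def sample_symmetric_mixture(n, center, offset, scale, rng):
    """Draw n samples from an equal-weight two-component Gaussian mixture
    with modes at center - offset and center + offset, so the density is
    symmetric about `center` (an assumed toy target, bimodal and not log-concave)."""
    signs = rng.choice([-1.0, 1.0], size=(n, 1))           # pick a mode per sample
    noise = rng.normal(scale=scale, size=(n, center.shape[0]))
    return center + signs * offset + noise


def fit_gaussian_forward_kl(samples):
    """Return the mean and covariance of the Gaussian that minimizes the
    sample estimate of KL(p || q), i.e. the maximum-likelihood Gaussian."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False, bias=True)          # ML (biased) covariance
    return mean, cov


rng = np.random.default_rng(0)
center = np.array([1.0, -2.0])   # symmetry point of the toy target
offset = np.array([3.0, 1.0])    # half the distance between the two modes
samples = sample_symmetric_mixture(200_000, center, offset, 0.5, rng)

mean, cov = fit_gaussian_forward_kl(samples)
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)  # correlation matrix of the fitted Gaussian

# Closed-form target moments for comparison:
# covariance = offset offset^T + scale^2 * I
true_cov = np.outer(offset, offset) + 0.5**2 * np.eye(2)
true_std = np.sqrt(np.diag(true_cov))
true_corr = true_cov / np.outer(true_std, true_std)

print("fitted mean:", mean, "target mean:", center)
print("fitted correlation:", corr[0, 1], "target correlation:", true_corr[0, 1])
```

The fitted Gaussian is unimodal and so cannot represent the target's two modes, yet its mean and correlation matrix agree with the target's up to Monte Carlo error. Moment matching would give this for the Gaussian family regardless of symmetry; the abstract's contribution concerns when such recovery extends to other location-scale families and to $\alpha$-divergences, which this sketch does not test.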