A Biconvex Formulation for Stable Transport of Mixture Models with a Unique Solution
Yeganeh Marghi, Kelly Jin, Uygar Sümbül
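AI summary: Proposes the Optimal Mixture Transport (OMT) framework, which achieves stable transport of subpopulation mixtures through strictly biconvex optimization, with theoretical stability guarantees and a computational complexity that depends only on the number of mixture components.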
Details
Optimal transport (OT) provides a principled framework for mapping between probability distributions. Despite extensive progress, applying OT to large-scale data remains computationally demanding, and the resulting pointwise transport plans are often difficult to interpret. We introduce Optimal Mixture Transport (OMT), a scalable framework that shifts the transport paradigm from individual samples to mixtures of subpopulations, reformulating the transport problem as a strictly biconvex optimization with a unique global minimizer. We further establish theoretical guarantees on the stability of the OMT map, showing that bounded perturbations of the underlying distributions lead to bounded changes in the transport plan. By formulating subpopulations as exponential-family distributions, OMT decouples computational complexity from the sample size, scaling solely with the number of mixture components. We demonstrate the effectiveness and practicality of OMT on a wide range of synthetic benchmarks and real-world datasets, including image data and large-scale single-cell RNA sequencing measurements.
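To make the scaling claim concrete, the sketch below transports mass between the components of two small mixtures rather than between samples. It is not the OMT objective from the paper, whose biconvex formulation is not given in the abstract. As stand-in assumptions it uses one-dimensional Gaussian components (one member of the exponential family), the closed-form squared 2-Wasserstein distance between Gaussians as the component-level cost, and a plain entropic Sinkhorn iteration as the solver. The mixture parameters and the names `w2_squared_gaussian_1d` and `sinkhorn` are hypothetical.

```python
import math


def w2_squared_gaussian_1d(m1, s1, m2, s2):
    """Closed-form squared 2-Wasserstein distance between two 1-D Gaussians."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


def sinkhorn(a, b, cost, reg=0.5, n_iter=500):
    """Entropic-regularized OT between weight vectors a and b (a stand-in solver)."""
    n, m = len(a), len(b)
    # Gibbs kernel derived from the component-level cost matrix.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternate scalings so that row sums match a, then column sums match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical mixtures, one (weight, mean, std) tuple per component.
source = [(0.5, 0.0, 1.0), (0.5, 4.0, 0.5)]
target = [(0.3, 0.5, 1.0), (0.7, 5.0, 0.8)]

# The cost matrix is K_source x K_target; no individual samples are involved.
cost = [
    [w2_squared_gaussian_1d(ms, ss, mt, st) for (_, mt, st) in target]
    for (_, ms, ss) in source
]
a = [w for (w, _, _) in source]
b = [w for (w, _, _) in target]

plan = sinkhorn(a, b, cost)
for row in plan:
    print([round(p, 4) for p in row])
```

Each entry `plan[i][j]` is the mass moved from source component `i` to target component `j`. The problem size is set by the number of components (2 x 2 here) regardless of how many samples were used to fit the mixtures, which is the kind of decoupling from sample size that the abstract describes.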