Near-Optimal Regret in Adversarial Kernel Bandits
Yu-Jie Zhang, Hao Qiu, Jonathan Scarlett, Kevin Jamieson
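AI summary: For the adversarial kernel bandit problem, the paper proposes an exponential-weights algorithm based on a regularized importance-weighted loss estimator, with an explicit correction term that removes the resulting bias, and obtains a regret bound that matches, up to logarithmic factors, the known optimal rate for stochastic kernel bandits.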
Details
We study the adversarial kernel bandit problem, in which the loss at each round is induced by an arbitrary bounded element of a reproducing kernel Hilbert space (RKHS). We propose an exponential-weights algorithm built on a regularized importance-weighted loss estimator, together with an explicit correction term that cancels the bias introduced by the regularization. Our main result bounds the regret by $\widetilde{O}\big(\sqrt{T\, d_*(λ)\,\log|{X}|}\big)$, where $d_*(λ)$ is a widely adopted notion of effective dimension that captures the complexity of the kernel. Up to logarithmic factors, this matches the known rate achieved in the related stochastic kernel bandit problem. A notable application is the Matérn$(ν,d)$ kernel with smoothness parameter $ν$ on $\mathbb{R}^d$, for which our bound specializes to $\widetilde{O}\big(T^{(ν+d)/(2ν+d)}\big)$, improving over the best-known prior rate of Chatterji et al. [2019] while simultaneously removing the rank-one adversary assumption required by their analysis. Moreover, this rate is the same as the known optimal rate for stochastic kernel bandits, and also matches a lower bound from concurrent work up to a $\log T$ factor.
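As a rough check of how the general bound could specialize to the Matérn rate, the following sketch makes two assumptions that the abstract does not state: that for a suitable choice of $λ$ the effective dimension grows as $d_*(λ) = \widetilde{O}\big(T^{d/(2ν+d)}\big)$, in line with known information-gain bounds for Matérn kernels, and that $\log|{X}|$ contributes at most polylogarithmic factors in $T$. Under these assumptions,

$$
\sqrt{T\, d_*(λ)\,\log|{X}|}
= \widetilde{O}\Big(\sqrt{T \cdot T^{d/(2ν+d)}}\Big)
= \widetilde{O}\Big(T^{\frac{1}{2}\left(1 + \frac{d}{2ν+d}\right)}\Big)
= \widetilde{O}\Big(T^{\frac{ν+d}{2ν+d}}\Big),
$$

where the last step uses $1 + \frac{d}{2ν+d} = \frac{2ν+2d}{2ν+d}$. For example, the exponent is $3/4$ when $ν = d/2$, and it approaches $1/2$ as $ν \to \infty$ with $d$ fixed.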