Multi-Label Test-Time Adaptation with Bayesian Conditional Priors
Qiru Li, Ao Zhou, Zhiwei Jiang, Zifeng Cheng, Cong Wang, Yafeng Yin, Qing Gu
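AI summary: Proposes Bayesian Conditional Priors (BCP) estimation, a gradient-free test-time adaptation method that injects label dependency by estimating anchor-conditioned priors online, improving the robustness of frozen vision-language models to distribution shift in multi-label recognition.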
Details
- Comments: accepted by ICML 2026
Multi-label recognition with frozen Vision-Language Models (VLMs) is brittle under distribution shift: standard zero-shot inference scores labels independently, ignoring co-occurrence structure and producing incoherent label sets where dominant concepts suppress weaker but compatible labels. We introduce Bayesian Conditional Priors (BCP) Estimation, a gradient-free test-time adaptation method that injects label dependency without tuning the backbone. BCP views zero-shot logits as a proxy for marginal posteriors under a fixed image-text likelihood and attributes shift-induced errors mainly to a mismatched label prior. For each test image, it selects a high-confidence anchor label and applies an anchor-conditioned Bayesian refinement. This update is closed-form in logit space and admits a pointwise mutual information (PMI) interpretation, explicitly promoting compatible labels and suppressing incompatible ones. BCP operates without target annotations by estimating anchor-conditioned priors online from the unlabeled test stream via lightweight second-order co-occurrence statistics, adding negligible overhead beyond a single forward pass. Across standard multi-label benchmarks and multiple CLIP backbones, BCP consistently outperforms strong TTA baselines, e.g., improving RN50 average mAP from 57.31 to 69.22 and ViT-B/16 from 62.61 to 71.79.
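To make the anchor-conditioned update more concrete, the following is a minimal sketch of one way such a refinement could look, not the authors' implementation. It assumes that per-label zero-shot logits can be mapped to probabilities with a sigmoid, that the anchor is simply the highest-scoring label, that co-occurrence statistics are accumulated as soft counts with additive smoothing, and that the correction added to each non-anchor logit is the estimated PMI with the anchor, log P(j | a) - log P(j). The names `AnchorPriorTracker`, `refine`, `smoothing`, and `strength` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class AnchorPriorTracker:
    """Running second-order co-occurrence statistics over an unlabeled stream.

    Hypothetical helper: soft counts with additive smoothing are an assumption.
    """

    def __init__(self, num_labels, smoothing=1.0):
        self.smoothing = smoothing
        self.n = 0.0                                 # images seen so far
        self.marginal = [0.0] * num_labels           # soft count of each label
        self.anchor_n = [0.0] * num_labels           # times each label was the anchor
        self.cooc = [[0.0] * num_labels for _ in range(num_labels)]

    def update(self, probs, anchor):
        # Accumulate marginal and anchor-conditioned soft counts for one image.
        self.n += 1.0
        self.anchor_n[anchor] += 1.0
        for j, p in enumerate(probs):
            self.marginal[j] += p
            self.cooc[anchor][j] += p

    def pmi(self, anchor, j):
        # Estimated log P(j | anchor) - log P(j), with smoothed estimates.
        s = self.smoothing
        p_j = (self.marginal[j] + s) / (self.n + 2.0 * s)
        p_j_given_a = (self.cooc[anchor][j] + s) / (self.anchor_n[anchor] + 2.0 * s)
        return math.log(p_j_given_a / p_j)


def refine(logits, tracker, strength=1.0):
    """Update the statistics with one image, then shift non-anchor logits by PMI."""
    probs = [sigmoid(z) for z in logits]
    anchor = max(range(len(logits)), key=lambda j: logits[j])
    tracker.update(probs, anchor)
    refined = [
        z if j == anchor else z + strength * tracker.pmi(anchor, j)
        for j, z in enumerate(logits)
    ]
    return anchor, refined


# Toy stream: label 0 tends to appear with label 1 and without label 2.
tracker = AnchorPriorTracker(num_labels=3)
for _ in range(20):
    refine([4.0, 2.0, -3.0], tracker)   # images dominated by label 0
    refine([-3.0, -3.0, 4.0], tracker)  # images dominated by label 2

# An ambiguous image: labels 1 and 2 both start at logit 0.
anchor, refined = refine([4.0, 0.0, 0.0], tracker)
```

In this toy stream, the ambiguous image's compatible label (index 1) is pushed up and the incompatible one (index 2) is pushed down, while the anchor logit is left unchanged. The correction needs only counter updates and a logarithm per label, which is consistent with the abstract's claim of negligible overhead beyond the forward pass.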