AdapShot: Adaptive Many-Shot In-Context Learning with Semantic-Aware KV Cache Reuse
Jie Ou, Jinyu Guo, Shiyao Guo, Yuang Li, Ruiqi Wu, Zhaokun Wang, Wenyi Li, Wenhong Tian
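AI summary: Proposes AdapShot, which dynamically optimizes the number of shots through an entropy-based probing mechanism and combines it with a semantics-aware KV cache reuse strategy for efficient many-shot in-context learning, reporting a performance gain of around 10% and a 4.64x speedup over DBSA.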
Many-Shot In-Context Learning (ICL) has emerged as a promising paradigm that leverages a large number of examples to unlock the reasoning potential of Large Language Models (LLMs). However, existing methods typically rely on a predetermined, fixed number of shots. This static approach often fails to adapt to the varying difficulty of different queries, leading to either insufficient context or interference from noise. Furthermore, the prohibitive computational and memory costs of long contexts severely limit the feasibility of Many-Shot ICL. To address these limitations, we propose AdapShot, which dynamically optimizes shot counts and leverages KV cache reuse for efficient inference. Specifically, we design a probe-based evaluation mechanism that uses output entropy to determine the optimal number of shots. To bypass redundant prefilling computation during both the probing and inference phases, we incorporate a semantics-aware KV cache reuse strategy. Within this reuse strategy, to address positional encoding incompatibilities, we introduce a decoupling and re-encoding method that enables flexible reordering of cached key-value pairs. Extensive experiments demonstrate that AdapShot achieves an average performance gain of around 10% and a 4.64x speedup compared to the state-of-the-art DBSA.
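The probe-based selection of shot counts can be illustrated with a minimal sketch. The abstract states only that output entropy is used to determine the number of shots. The rule below (pick the candidate with the lowest entropy, breaking ties toward fewer shots) is an assumption for illustration, not the authors' procedure. The names `shannon_entropy`, `pick_shot_count`, and `toy_probe` are hypothetical, and the probability tables stand in for a real model's output distribution.

```python
import math
from typing import Callable, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy (in nats) of a probability distribution; zero entries add nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pick_shot_count(
    candidates: Sequence[int],
    probe: Callable[[int], Sequence[float]],
) -> int:
    """Return the candidate shot count whose probe output has the lowest entropy.

    Assumed rule: lower output entropy means a more confident prediction.
    Candidates are visited in ascending order and only a strictly lower
    entropy replaces the current best, so ties go to the smaller count.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    best_k, best_h = None, float("inf")
    for k in sorted(candidates):
        h = shannon_entropy(probe(k))
        if h < best_h:
            best_k, best_h = k, h
    return best_k


def toy_probe(k: int) -> Sequence[float]:
    """Hypothetical stand-in for a model call: output distribution given k shots."""
    table = {
        4: [0.4, 0.3, 0.3],      # too little context: uncertain
        16: [0.9, 0.05, 0.05],   # confident prediction
        64: [0.6, 0.2, 0.2],     # extra shots add noise
    }
    return table[k]


if __name__ == "__main__":
    print(pick_shot_count([4, 16, 64], toy_probe))
```

In a real system the probe would call the model once per candidate shot count, which is where reusing cached key-value pairs instead of re-running prefill would save the most work.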
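To see why cached keys can be reordered at all, consider a sketch that assumes the model uses rotary position embeddings (RoPE). The abstract does not name the positional encoding, so this is an assumption, and the code is not the authors' implementation. Under RoPE, a key is rotated by an angle proportional to its position, so a cached key can be "decoupled" by applying the inverse rotation and then "re-encoded" at a new position. The helper names `rope_rotate` and `reencode_key` are hypothetical.

```python
import math
from typing import List


def rope_rotate(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Apply a RoPE-style rotation for `position` to consecutive (even, odd) pairs."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out: List[float] = []
    for i in range(0, dim, 2):
        # Pair j = i / 2 rotates with frequency base^(-2j / dim) = base^(-i / dim).
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def reencode_key(
    cached_key: List[float], old_pos: int, new_pos: int, base: float = 10000.0
) -> List[float]:
    """Move a cached key from old_pos to new_pos without recomputing it."""
    # Decouple: rotating by -old_pos undoes the original positional rotation.
    position_free = rope_rotate(cached_key, -old_pos, base)
    # Re-encode: apply the rotation for the new position.
    return rope_rotate(position_free, new_pos, base)


if __name__ == "__main__":
    raw_key = [0.5, -1.0, 2.0, 0.25]
    cached = rope_rotate(raw_key, 7)       # key as stored in the cache at position 7
    moved = reencode_key(cached, 7, 42)    # same key relocated to position 42
    direct = rope_rotate(raw_key, 42)      # reference: encoded at 42 from scratch
    print(all(abs(a - b) < 1e-9 for a, b in zip(moved, direct)))
```

Because rotations compose, the two steps are equivalent to a single rotation by `new_pos - old_pos`. In standard RoPE only queries and keys are rotated, so cached values would need no adjustment under this assumption.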