Fix the Structural Bottleneck: Context Compression via Explicit Information Transmission
Jiangnan Ye, Hanqi Yan, Zhenyi Shen, Heng Chang, Ye Mao, Yulan He
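AI summary: This paper revisits context compression from a structural perspective, identifies two key bottlenecks in standard LLM-based compression methods, and proposes the ComprExIT framework, which improves compression through explicit information transmission. Experiments show strong results across multiple datasets, with higher F1 scores and lower computational cost.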
Details
Long-context LLM agents often struggle with growing token, memory, and latency costs, making efficient context compression essential for practical deployment. Existing LLM-as-a-compressor methods remain noticeably inferior to using the full context. We find that this gap partly stems from their inability to preserve contextual information effectively. In this work, we revisit context compression from a structural perspective and identify two key bottlenecks in standard LLM-based compressors: limited coordination among compression tokens during information aggregation, and layerwise dilution that weakens useful signals from intermediate hidden states. To address these limitations, we propose ComprExIT, a new context compression framework based on explicit information transmission. ComprExIT adaptively selects features across frozen LLM layers, then allocates information from anchors to compression slots through a globally coordinated transport plan. Experiments on 12 datasets show that ComprExIT consistently outperforms strong soft-compression baselines, improving average F1 by up to 18.5%, while adding only ~1% trainable parameters and achieving more than 2x faster compression than the fastest baselines. The code will be released upon acceptance.
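The two steps named in the abstract (selecting features across layers, then allocating information from anchors to compression slots through a transport plan) can be illustrated with a toy sketch. The abstract does not say how either step is computed, so everything below is an assumption made for illustration, not the authors' method: softmax weights over layers stand in for adaptive layer selection, dot-product scores against hypothetical `slot_queries` stand in for the anchor-to-slot affinity, and the plan is computed with Sinkhorn iterations under uniform marginals. All function and variable names are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mix_layers(layer_states, layer_logits):
    """Combine per-layer anchor features with softmax weights (assumed gating).

    layer_states[l][i] is the feature vector of anchor i at layer l.
    """
    weights = softmax(layer_logits)
    n_anchors, dim = len(layer_states[0]), len(layer_states[0][0])
    mixed = [[0.0] * dim for _ in range(n_anchors)]
    for w, layer in zip(weights, layer_states):
        for i, vec in enumerate(layer):
            for d, value in enumerate(vec):
                mixed[i][d] += w * value
    return mixed


def sinkhorn_plan(scores, n_iters=200, temperature=1.0):
    """Entropic transport plan with uniform marginals (assumed formulation).

    Each anchor (row) sends out mass 1/n and each slot (column) receives
    mass 1/m, so no slot is ignored and no anchor is dropped.
    """
    n, m = len(scores), len(scores[0])
    peak = max(max(row) for row in scores)
    kernel = [[math.exp((s - peak) / temperature) for s in row] for row in scores]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def fill_slots(anchors, plan):
    """Each slot becomes the plan-weighted average of the anchor features."""
    n, m, dim = len(plan), len(plan[0]), len(anchors[0])
    slots = []
    for j in range(m):
        mass = sum(plan[i][j] for i in range(n))
        slots.append(
            [sum(plan[i][j] * anchors[i][d] for i in range(n)) / mass for d in range(dim)]
        )
    return slots


# Toy data: 3 layers, 6 anchors, 2 compression slots, 4-dimensional features.
random.seed(0)
n_layers, n_anchors, n_slots, dim = 3, 6, 2, 4
layer_states = [
    [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_anchors)]
    for _ in range(n_layers)
]
layer_logits = [0.0, 0.5, 1.0]  # hypothetical learned layer scores
slot_queries = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_slots)]

anchors = mix_layers(layer_states, layer_logits)
scores = [[sum(a * q for a, q in zip(anchor, query)) for query in slot_queries]
          for anchor in anchors]
plan = sinkhorn_plan(scores)
slots = fill_slots(anchors, plan)
```

The point of the sketch is the contrast with independent attention: the marginal constraints couple all slots, so mass one slot takes from an anchor is no longer available to the others. That is one plausible reading of "globally coordinated". A very small `temperature` can underflow the kernel to zero in this naive version; a log-domain variant would avoid that.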