arXivDaily: arXiv daily academic digest, updated Monday to Friday

AI Large Models

Large Language Models / LLM

Large language models, pre-training, instruction tuning, post-training, and language model applications.

Collected today (current date): 97 papers. Sources: cs.CL, cs.AI, cs.LG

1. Post-training (5 papers)

2606.19549 2026-06-19 cs.LG New submission 80%

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

Lin Tang, Wei Zhang, Jing Li, Hongyu Chen, Ming Zhao, Yuxuan Wang

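Affiliations: Sichuan University; University of Electronic Science and Technology of China

Topic match (Post-training): predicting the mergeability of LoRA adapters, which involves model fine-tuning

AI summary: Proposes MergeProbe, which predicts the mergeability of LoRA adapters from early-training signals and achieves the best average and worst-case retention on the MERGE-PEFT benchmark.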

Abstract

Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This late feedback is costly: adapters that are strong in isolation can interfere destructively once their updates are combined. We ask whether this outcome can be anticipated. We formalize adapter mergeability as the degree to which an adapter preserves its single-task utility after merging, and show that it can be forecast from signals measured in the first few percent of training: chiefly how the low-rank updates and their gradients align across tasks and how much they disturb shared representations. We package these signals into MergeProbe, a lightweight predictor that estimates pairwise and set-level retention and turns the estimate into a concrete decision: merge directly, reweight, prune, or route. On MERGE-PEFT, a five-domain benchmark spanning math, code, science, instruction following, and safety, MergeProbe attains the best average and worst-case retention among strong interference-aware merge baselines while adding far less deployment overhead than full task routing. This turns LoRA merging from a post-hoc engineering step into an anticipatory measurement problem.
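The abstract's main early signal is how the low-rank updates of different tasks align. The sketch below illustrates one such signal and is our own assumption, not MergeProbe itself. It forms each adapter's dense update dW = B·A, flattens it, compares two adapters by cosine similarity, and performs a plain weighted-sum (task-arithmetic style) merge. All helper names are hypothetical.

```python
import math

def matmul(b, a):
    """Dense product of B (d x r) and A (r x k), i.e. the LoRA update dW = B @ A."""
    return [[sum(b[i][t] * a[t][j] for t in range(len(a))) for j in range(len(a[0]))]
            for i in range(len(b))]

def flatten(m):
    return [x for row in m for x in row]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def update_alignment(adapter_1, adapter_2):
    """Cosine similarity between two adapters' flattened dense updates (hypothetical signal).

    Each adapter is a (B, A) pair of nested lists.
    """
    return cosine(flatten(matmul(*adapter_1)), flatten(matmul(*adapter_2)))

def average_merge(adapters, weights):
    """Weighted sum of dense updates: a simple task-arithmetic style merge."""
    updates = [matmul(b, a) for b, a in adapters]
    rows, cols = len(updates[0]), len(updates[0][0])
    return [[sum(w * u[i][j] for w, u in zip(weights, updates)) for j in range(cols)]
            for i in range(rows)]

adapter_math = ([[1.0], [0.0]], [[1.0, 0.0]])   # (B, A), rank 1
adapter_code = ([[0.0], [1.0]], [[0.0, 1.0]])
print(update_alignment(adapter_math, adapter_code))  # → 0.0 (orthogonal updates)
```

A negative cosine means the two updates partly cancel when summed. A value near zero means they touch different directions. Whether either predicts retention is exactly what the paper measures empirically.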

2606.19542 2026-06-19 cs.LG New submission 80%

Tracking Representation Dynamics in Large Language Models with Persistent Homology

Naman Malhotra, Jay Ambadkar, Abhinav Gupta, Kushal Kasivel, Abbas Schwarz, Kamillo Ferry, Anthea Monod

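Affiliations: Imperial College London

Topic match (Post-training): analyzes how the topology of LLM internal representations changes during alignment

AI summary: Analyzes the topology of activation spaces with persistent homology. Finds that topological reorganization during alignment happens mainly early in training, and that different alignment objectives produce distinguishable topological trajectories.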

Comments 29 pages

Abstract

Large language models are commonly aligned through supervised fine-tuning, yet little is known about how their internal representations evolve during this process. We study alignment dynamics using persistent homology by tracking the topology of activation spaces throughout fine-tuning. Across four transformer language models ranging from 1B to 7B parameters and three alignment objectives corresponding to helpful, harmless, and mixed training data, we find that the majority of topological reorganization occurs during the earliest stages of training. A dense checkpoint analysis reveals a transient peak in topological activity followed by rapid stabilization. We further show that different alignment objectives induce distinguishable topological trajectories, while instruction-tuned and pretrained models exhibit qualitatively different patterns of evolution. Our results suggest that persistent homology provides a complementary perspective on alignment, revealing representation-level changes that are not apparent from behavioral metrics alone.
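Persistent homology tracks how topological features (connected components, loops, and so on) appear and disappear as a distance threshold grows. The sketch below is minimal and covers only dimension 0 (connected components) of a Vietoris-Rips filtration on a small point cloud, such as a batch of activations. The paper's actual pipeline and homology dimensions are not reproduced. In dimension 0, every point is born at 0, and a component dies when the shortest edge merging it with another appears. A union-find over sorted edges (Kruskal's algorithm) computes these death times exactly.

```python
import itertools
import math

def h0_persistence(points):
    """Finite death times of H0 bars for a Vietoris-Rips filtration (all births are 0)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two components, so one bar dies here
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 finite bars; one component persists forever

def total_persistence(points):
    """A simple scalar summary that could be compared across fine-tuning checkpoints."""
    return sum(h0_persistence(points))

# Two tight clusters far apart: two short bars inside the clusters, one long bar between them.
cloud = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(h0_persistence(cloud))  # → [1.0, 1.0, 10.0]
```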

2606.19946 2026-06-19 cs.CL cs.LG New submission 75%

GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs

Yu Deng

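Topic match (Post-training): proposes GEMS, a training-free activation intervention method for multiple semantic directions

AI summary: Proposes GEMS, which tackles distributional deviation and directional interference in training-free multi-direction activation steering with geometric constraints. Norm-preserving weighted superposition and targeted attention-pathway injection address the former; real-time orthogonalization addresses the latter. Accuracy on GSM8K stays at 98%.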

Comments 30 pages, 5 figures, 20 tables. Code and logs are available at: https://github.com/LuLu663939/gems-multi-semantic-steering

Abstract

Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining. Existing methods handle only single-direction injection; when multiple semantic directions are superposed without constraints, the model collapses. We show that this collapse decomposes into two independently acting sources: distributional deviation, where additive perturbations accumulate in norm across layers and drive activations outside the training distribution, and directional interference, where non-orthogonal semantic vectors mutually dampen when superposed. These two sources define the design constraints that any training-free multi-directional intervention must address. As one instantiation of these principles, we propose GEMS, a training-free method that maps each source to a corresponding geometric constraint: norm-preserving weighted superposition and targeted attention-pathway injection for distributional deviation, and real-time orthogonalization for directional interference. On GSM8K, injecting three concurrent non-mathematical directions preserves accuracy at 98% (baseline 92%), while unconstrained addition collapses to 4%; on Wikitext-2, the same injection incurs only 2.2% PPL increase. Component ablation isolates the causal role of each constraint, and layer-level probes confirm that orthogonalized signals survive the FFN pathway and reach the output distribution with semantic specificity. Qualitative steering effects transfer across architectures from 3B to 31B.
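Two of the constraints named above can be illustrated in isolation: orthogonalizing steering directions before superposing them, and rescaling the steered hidden state back to its original norm. The sketch below rests on our own assumptions: Gram-Schmidt for orthogonalization, a single hidden vector, and hypothetical weights. It omits the attention-pathway injection and is not the GEMS implementation.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def gram_schmidt(directions, eps=1e-12):
    """Orthonormalize steering directions in order; (near-)dependent ones are dropped."""
    basis = []
    for d in directions:
        v = list(d)
        for b in basis:
            proj = dot(v, b)
            v = [x - proj * y for x, y in zip(v, b)]
        n = norm(v)
        if n > eps:
            basis.append([x / n for x in v])
    return basis

def steer(hidden, directions, weights):
    """Add a weighted sum of orthonormalized directions, then restore the original norm.

    Weights are paired with the surviving basis vectors in order; a dropped
    direction therefore shifts the pairing (acceptable for this sketch).
    """
    steered = list(hidden)
    for w, b in zip(weights, gram_schmidt(directions)):
        steered = [h + w * x for h, x in zip(steered, b)]
    scale = norm(hidden) / norm(steered)  # norm preservation keeps activations in range
    return [x * scale for x in steered]

h = [3.0, 4.0, 0.0]
dirs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # deliberately non-orthogonal
print(round(norm(steer(h, dirs, [1.0, 1.0])), 6))  # → 5.0, same norm as h
```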

2606.20508 2026-06-19 cs.AI cs.LG New submission 70%

What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?

Sihui Dai, Mann Patel

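Topic match (Post-training): examines how the preference-optimization training stage affects safety

AI summary: Mixes benign and harmful compliance demonstrations to study how demonstration composition drives harmful compliance. Finds that demonstration content, ordering, and training methodology all shape what the model extracts.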

Abstract

Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different types of compliance demonstrations. We study this by mixing benign compliance demonstrations (non-harmful request, helpful response) with harmful compliance demonstrations (harmful request, helpful response) and testing three hypotheses about how demonstration composition drives harmful compliance. Across four models, we find that benign and harmful demonstrations are not interchangeable: benign demonstrations can either reduce or increase harmful compliance depending on the model. We further show that preference optimization is the critical training stage that prevents benign demonstrations from increasing harmful compliance, that demonstration ordering exhibits strong recency bias, and that models differ in how refusal interacts with in-context learning: some adopt demonstrated formatting even when refusing, while others override all in-context signals upon refusal. Taken together, this work moves beyond showing that demonstration-based jailbreaking works to characterizing how it works: what models extract from compliance demonstrations depends on demonstration content, ordering, and training methodology.

2606.20482 2026-06-19 cs.CL cs.HC cs.LG New submission 70%

Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users

Haw-Shiuan Chang, Jeffrey Gomez, Mehul Patwari, Aryan Sajith, Hamed Zamani

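Affiliations: University of Massachusetts Amherst; York University

Topic match (Post-training): trains a reward model for DPO alignment

AI summary: Addresses the scarcity of explicit feedback by training reward models on implicit feedback such as mouse trajectories and eye-tracking data. This raises reward-model accuracy from 55% (text only) to 64% and substantially improves response quality after DPO alignment.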

Abstract

To align a Large Language Model (LLM), most existing methods collect explicit human feedback and train a reward model to predict human preference from the response text. These methods have two key limitations. First, users rarely provide explicit feedback on LLM responses, which makes high-quality preference annotations expensive to collect. Second, the methods do not leverage implicit human feedback, which has proven vital to the economic moats of Internet giants. To quantify the value of implicit feedback, we build a new dataset called IFLLM. It collects 1336 multi-turn questions from 59 Mechanical Turk workers, together with their mouse trajectories and webcam-recorded eye-gaze points on the LLMs' responses. IFLLM shows that users exhibit very diverse gazing behavior and mouse trajectories. Our reward model based on implicit user feedback boosts accuracy from 55% (text-based reward model) to 64%. It also nearly triples the relative response-quality improvement after applying DPO to eight LLMs, demonstrating the value of implicit feedback in the wild. Our data collection website, dataset, and code can be found at https://github.com/themehulpatwari/llm-implicit-feedback/.
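The preference data in this work is ultimately used for DPO. For reference, the standard DPO loss for one preference pair (chosen response y_w, rejected response y_l) is

−log σ(β·[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]).

The sketch below evaluates this loss from summed sequence log-probabilities. It does not show how the implicit-feedback reward model selects the chosen and rejected responses, and the numbers are made up.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, from summed log-probabilities of each response."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(x)) == softplus(-x)
    return softplus(-beta * margin)

print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # zero margin gives log(2)
print(dpo_loss(-5.0, -20.0, -10.0, -10.0))   # policy already prefers y_w: smaller loss
```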

2. Pre-training (2 papers)

2606.19528 2026-06-19 cs.LG cs.AI New submission 80%

Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices

Hassan Dbouk, Matthias Reisser, Prathamesh Mandke, Likhita Arun Navali, Christos Louizos

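Affiliations: GitHub

Topic match (Pre-training): techniques for reducing peak memory in LoRA fine-tuning of LLMs

AI summary: Targets the memory bottleneck of LoRA fine-tuning of LLMs on edge devices with four complementary techniques: quantization, checkpointing, softmax approximation, and logits masking. Achieves up to 26x and 28x peak-memory reduction on Llama-3.2 3B and Qwen-2.5 3B.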

Comments Hassan Dbouk and Matthias Reisser contributed equally to this work

Abstract

Fine-tuning of Large Language Models (LLMs) using Low-Rank Adaptation (LoRA) on an end-user's data offers personalized experiences while keeping data private, but faces severe memory constraints on consumer hardware. Peak memory during fine-tuning often exceeds device limits, especially for models with billions of parameters and long-context training data. This paper introduces a suite of complementary techniques to reduce memory footprint without sacrificing model quality: (1) base model quantization with on-the-fly dequantization, (2) memory-efficient checkpointing combining selective activation caching and disk offloading, (3) softmax approximation using semantically relevant token subsets, and (4) logits masking. Experiments on Llama-3.2 3B and Qwen-2.5 3B demonstrate up to 26× and 28× reduction in peak memory, enabling fine-tuning on resource-constrained devices.
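Items (3) and (4) both avoid working with the full vocabulary-sized logits. The sketch below rests on our own assumptions: it computes a cross-entropy with a softmax over a candidate token subset that always contains the target. How the paper selects "semantically relevant" tokens is not reproduced, and the candidate list is hypothetical. The second helper shows the arithmetic behind the problem: a full fp32 logits tensor costs seq_len × vocab_size × 4 bytes.

```python
import math

def subset_cross_entropy(logits, target, candidates):
    """Cross-entropy of `target` using a softmax over `candidates` only.

    logits: sequence indexable by token id. The target is always included so
    the loss stays defined; restricting the sum makes it a lower bound on the full loss.
    """
    ids = set(candidates) | {target}
    m = max(logits[i] for i in ids)  # shift for a stable log-sum-exp
    log_z = m + math.log(sum(math.exp(logits[i] - m) for i in ids))
    return log_z - logits[target]

def logits_bytes(seq_len, vocab_size, bytes_per_value=4):
    """Memory of one full logits tensor for batch size 1."""
    return seq_len * vocab_size * bytes_per_value

logits = [2.0, 1.0, 0.0, -1.0]
full = subset_cross_entropy(logits, 0, range(len(logits)))
approx = subset_cross_entropy(logits, 0, [1])
print(full, approx)
```

For scale, assuming a 128,256-token vocabulary, `logits_bytes(4096, 128256)` is about 2.1 GB for a single 4,096-token sequence in fp32.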

2606.19625 2026-06-19 cs.CL cs.LG New submission 75%

Where Does Social Reasoning Come From? Capability Provenance in Language Models

Glenn Matlin, Chandreyi Chakraborty, Saehee Eom, Mika Okamoto, Rayan Castilla, Louis Jaburi, Alvin Deng, Taywon Min, Lucia Quirke, Stella Biderman, Mark Riedl

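Affiliations: Georgia Institute of Technology, College of Computing; MATS Program; EleutherAI; KAIST AI; Georgia Tech AI Safety Initiative

Topic match (Pre-training): uses training-data attribution to trace where social and STEM reasoning come from

AI summary: Uses training-data attribution to show that social reasoning and STEM reasoning in OLMo3-7B draw on different regions of the pretraining corpus. The difference is more pronounced at the reasoning level than at the knowledge level.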

Comments Under review at COLM 2026 (Conference)

Abstract

We use training-data attribution as an interpretable tool for capability discovery, mapping which regions of the pretraining corpus support social-reasoning versus STEM-reasoning in OLMo3-7B. Training-data attribution measures how strongly each training document influences a model's predictions on a benchmark, but document-level scores are too noisy to identify which corpus regions support which capabilities, and prior work has emphasized factual knowledge rather than reasoning. We compute gradient-based attribution (TrackStar via Bergson) over a working set drawn from the de-duplicated Dolma3 mix, aggregate influence across WebOrganizer's 24-format x 24-topic taxonomy (576 bins), and contrast benchmark pairs in a 2x2 design that varies domain (social vs. STEM) and capability type (reasoning vs. knowledge): SocialIQA and MMLU Social Sciences against ARC-Challenge and MMLU STEM. Social and STEM reasoning draw on qualitatively distinct corpus regions, and the contrast is sharper at the reasoning level than at the knowledge level. Targeted machine unlearning provides partial causal validation: forgetting high-attribution topic bins (e.g., Literature for SocialIQA) degrades the aligned benchmark more than within-bin random baselines, and we open-source all code, sampling manifests, the bin-level influence matrix, and unlearning checkpoints.
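The aggregation step described above turns noisy per-document scores into format × topic bins. It can be sketched as a grouped mean. The records and labels below are hypothetical, and TrackStar and Bergson are not called. We use a mean here; a sum is an equally plausible choice, and the paper's exact aggregation may differ.

```python
from collections import defaultdict

def bin_influence(docs):
    """Mean attribution score per (format, topic) bin.

    docs: iterable of (format_label, topic_label, influence_score) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fmt, topic, score in docs:
        totals[(fmt, topic)] += score
        counts[(fmt, topic)] += 1
    return {key: totals[key] / counts[key] for key in totals}

def top_bins(binned, k=3):
    """Bins with the highest mean influence, e.g. candidates for targeted unlearning."""
    return sorted(binned, key=binned.get, reverse=True)[:k]

docs = [
    ("Article", "Literature", 0.9),
    ("Article", "Literature", 0.7),
    ("Tutorial", "Science", 0.1),
    ("Tutorial", "Science", 0.3),
]
print(top_bins(bin_influence(docs), k=1))
```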

3. Domain-specific LLMs (3 papers)

2606.19376 2026-06-19 cs.LG cs.AI cs.IR New submission 80%

Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

Herbert Woisetschläger, Arastun Mammadli, Ryan Zhang, Shiqiang Wang

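Affiliations: Technical University of Munich; University of Exeter; Horace Greeley High School

Topic match (Domain-specific LLMs): studies LLM routing to optimize cost and quality of service

AI summary: Addresses the tension between LLM inference cost and service quality with SLARouter, an online routing algorithm that learns a cost-optimal policy from sparse, one-sided user feedback. It comes with theoretical guarantees of cost optimality and SLA compliance, and experiments show cost reductions of up to 2.2x.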

Comments Preprint. Under review

Abstract

Inference costs for large language model (LLM) applications are rapidly growing, driven by surging demand and rising infrastructure cost. Users expect high-quality responses, and in commercial settings this is formally codified in Service Level Agreements (SLAs), creating a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on complete feedback signals, offline training, extensive per-workload tuning, and most lack SLA guarantees or inference-time adaptivity. We introduce SLARouter, an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. SLARouter provides theoretical guarantees for both cost optimality and strict SLA compliance. Experiments across a wide range of LLM benchmarks show that SLARouter satisfies SLA constraints without the need for per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.
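To make the cost/SLA trade-off concrete, here is a generic sketch. It is not SLARouter, whose algorithm the abstract does not describe. It assumes one-sided feedback means users only ever report dissatisfaction, so each request without a complaint counts as satisfied. The router picks the cheapest model whose Hoeffding lower confidence bound on satisfaction meets the SLA target, and otherwise falls back to the most expensive model. Model names, costs, and the confidence level are hypothetical, and exploration of unseen models is not handled.

```python
import math
from dataclasses import dataclass

@dataclass
class ModelStats:
    name: str
    cost: float          # cost per request (hypothetical units)
    requests: int = 0
    complaints: int = 0  # one-sided feedback: only dissatisfaction is reported

    def satisfaction_lcb(self, delta):
        """Hoeffding lower confidence bound on the satisfaction rate."""
        if self.requests == 0:
            return 0.0
        p_hat = 1.0 - self.complaints / self.requests
        return p_hat - math.sqrt(math.log(1.0 / delta) / (2.0 * self.requests))

def route(models, sla_target, delta=0.05):
    """Cheapest model whose satisfaction LCB meets the SLA, else the priciest model."""
    eligible = [m for m in models if m.satisfaction_lcb(delta) >= sla_target]
    if eligible:
        return min(eligible, key=lambda m: m.cost)
    return max(models, key=lambda m: m.cost)

def record(model, complained):
    """Update counts after serving one request."""
    model.requests += 1
    model.complaints += int(complained)

small = ModelStats("small", 1.0, requests=1000, complaints=20)
large = ModelStats("large", 10.0, requests=1000, complaints=5)
print(route([small, large], sla_target=0.90).name)  # → small
```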

2606.19387 2026-06-19 cs.SE cs.AI New submission 75%

Interpretable and Verifiable Hardware Generation with LLM-Driven Stepwise Refinement

You Li, Samuel Mandell, David Z. Pan

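Affiliations: The University of Texas at Austin; Fudan University; USA

Topic match (Domain-specific LLMs): applies LLMs to hardware design rather than general-purpose language modeling

AI summary: Proposes a hardware generation framework that combines the creativity of LLMs with the interpretability of formal methods. Design specifications are converted into RTL programs with guaranteed correctness by iteratively applying transformation rules.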

Abstract

Large language models (LLMs) have achieved remarkable success in software development. However, they are susceptible to hallucinations, meaning that they can introduce subtle semantic and logical errors. Due to the high stakes in chip design and manufacturing, hardware engineers are still reluctant to rely on LLMs for register-transfer level (RTL) generation. In this paper, we propose a hardware generation framework that combines the creativity and broad knowledge of LLMs with the explainability and mathematical rigor of formal methods. Specifically, we devise a set of transformation rules that cover various design decisions and hardware features. By iteratively applying these rules, an LLM agent can convert a design specification into an RTL program with guaranteed correctness. Experimental results demonstrate the effectiveness and efficiency of the framework.
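The central idea is that each transformation rule preserves correctness, so any chain of rule applications does too. The toy sketch below illustrates this under our own assumptions:

- small Python tuples stand in for RTL expressions;
- a hypothetical rule rewrites multiplication by a power of two into a left shift;
- the rule is checked by exhaustive simulation over all 8-bit inputs.

The paper's actual rule set and proof method are not reproduced.

```python
# Toy expression language: ("var", name), ("const", c), ("mul", a, b), ("shl", a, k)
WIDTH = 8
MASK = (1 << WIDTH) - 1

def evaluate(expr, env):
    """Evaluate an expression with WIDTH-bit wrap-around, like a fixed-width datapath."""
    op = expr[0]
    if op == "var":
        return env[expr[1]] & MASK
    if op == "const":
        return expr[1] & MASK
    if op == "mul":
        return (evaluate(expr[1], env) * evaluate(expr[2], env)) & MASK
    if op == "shl":
        return (evaluate(expr[1], env) << expr[2]) & MASK
    raise ValueError(f"unknown op {op}")

def mul_pow2_to_shift(expr):
    """Hypothetical rule: x * 2**k -> x << k (returns None if the rule does not apply)."""
    if expr[0] == "mul" and expr[2][0] == "const":
        c = expr[2][1]
        if c > 0 and c & (c - 1) == 0:
            return ("shl", expr[1], c.bit_length() - 1)
    return None

def equivalent(a, b, var="x"):
    """Exhaustive check over every WIDTH-bit value of a single input variable."""
    return all(evaluate(a, {var: v}) == evaluate(b, {var: v}) for v in range(1 << WIDTH))

spec = ("mul", ("var", "x"), ("const", 8))
impl = mul_pow2_to_shift(spec)
print(impl, equivalent(spec, impl))  # → ('shl', ('var', 'x'), 3) True
```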

2606.19364 2026-06-19 cs.LG New submission 75%

Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference

Abhinit Sen, Ajeet Kumar, Manaranjan Pradhan

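Affiliations: Indian School of Business

Topic match (Domain-specific LLMs): edge-side prompt compression for cloud LLM inference

AI summary: Targets the energy cost of the prompt prefill stage in cloud LLM inference with SPSD, an edge-side pipeline that compresses user prompts using a 4-bit quantized small language model. It saves 99.9 input tokens per distilled call on average and an estimated 70-270 µWh of net energy per call, while response quality remains non-inferior.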

Comments 19 pages, 7 tables, 1 figure, includes appendix

Abstract

The prefill stage of Large Language Model (LLM) inference is a growing contributor to cloud-scale energy cost. Many consumer-support and conversational prompts contain social scaffolding: politeness markers, apologetic preamble, repetition, and rapport-building language that is important for human communication but carries low marginal information for machine reasoning. We call this discrepancy the Social-Semantic Gap. We present SPSD (Sentiment Preserving Semantic Distillation), an edge-based pipeline that compresses user prompts using a 4-bit quantised Small Language Model before transmission to a cloud-deployed LLM. Evaluation on a 248-prompt corpus using Gemma-2-2B-Instruct (Q4_K_M) as the SLM and Llama-3.1-8B-Instruct as the cloud evaluation model yields a mean input token saving of 99.9 tokens per distilled call, with all 146 distilled calls yielding positive savings. Response quality, assessed by blind LLM-as-judge scoring across 121 pairs, is non-inferior to the raw path within a pre-specified 1-point margin on a 15-point rubric; the judge awarded 43 percent ties, 28 percent distilled wins, and 29 percent raw wins. Cosine similarity is mixed: mean 0.682, median 0.712, with 54.1 percent of pairs above the 0.70 reference threshold. Safety-critical domains are conservatively routed to passthrough via rule-based gates. Per-call net energy saving is estimated at 70-270 µWh under stated assumptions. SPSD shows that on-device prompt distillation can reduce cloud LLM input-token cost while preserving response quality within a practical non-inferiority margin.
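The quality claim above is a non-inferiority test. Distilled responses may score slightly lower than raw ones, as long as the shortfall is confidently within the 1-point margin. The sketch below assumes paired judge scores and a one-sided normal-approximation 95% bound (z = 1.645). The paper's exact statistical procedure is not stated in the abstract, and the scores are made up.

```python
import math
import statistics

def non_inferior(distilled, raw, margin=1.0, z=1.645):
    """Paired non-inferiority check: is the lower bound of mean(distilled - raw) above -margin?"""
    diffs = [d - r for d, r in zip(distilled, raw)]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # needs at least two pairs
    lower = mean - z * se
    return lower > -margin, lower

distilled = [12, 13, 11, 14, 12, 13]  # made-up scores on a 15-point rubric
raw = [13, 13, 12, 13, 12, 14]
ok, lower = non_inferior(distilled, raw)
print(ok, round(lower, 3))
```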

4. Other LLM topics (18 papers)

2606.19353 2026-06-19 cs.CL cs.LG New submission 80%

Quantifying Aleatoric Uncertainty of In-Context Learning for Robust Measure of LLM Prediction Confidence

Jinseok Chung, Minkyoung Song, Hyunji Jung, Namhoon Lee

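Affiliations: POSTECH

Topic match (Other LLM topics): quantifies uncertainty in in-context learning to make confidence estimates more robust

AI summary: Addresses the sensitivity of in-context learning (ICL) predictions to prompt design with self-function vectors, grounded in a Bayesian view and mechanistic interpretability, that estimate aleatoric uncertainty directly. Also designs a rigorous evaluation protocol. Validates the method's reliability on synthetic and real datasets and its usefulness for applications such as hallucination detection.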

Comments Accepted to ACL 2026

Abstract

In-Context Learning (ICL) allows LLMs to adapt to new tasks from a few demonstrations, but its reliability remains a concern: predictions are highly sensitive to both prompt design and the model's ability to understand the context, obscuring whether failures arise from data properties or model limitations. Uncertainty decomposition, which separates aleatoric from epistemic sources, is particularly crucial in this setting, yet existing methods, designed for standard generation tasks, fail to capture the unique dynamics of ICL. To address this, we introduce the concept of self-function vectors, built upon Bayesian views and the mechanistic interpretability of ICL. These vectors leverage internal model representations to model the latent concept learned during in-context prompting. This enables a direct estimation of aleatoric uncertainty within a Bayesian framework and avoids relying on brittle input or decoding manipulations. Given the lack of established benchmarks and suitable evaluation protocols, we also propose the first rigorous evaluation protocol, in which data is manipulated in controlled ways so as to quantify aleatoric uncertainty precisely and separately from epistemic uncertainty. This evaluation framework is first grounded in synthetic tasks for conceptual development and then extended to real-world datasets. With it, we show that our methodology measures the uncertainty of LLM predictions made under ICL more reliably than existing alternatives. Moreover, we show it can be used as a practical tool for trustworthiness-related applications, such as hallucination detection. Our findings open a new direction for connecting the quantitative view of uncertainty with the mechanistic understanding of model behavior.
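For readers new to the aleatoric/epistemic split, the textbook entropy-based decomposition works over an ensemble (or posterior samples) of predictive distributions:

- total uncertainty is the entropy of the mean prediction;
- aleatoric uncertainty is the mean of the per-sample entropies;
- epistemic uncertainty is their difference (the mutual information).

This is the standard decomposition, not the self-function-vector estimator proposed in the paper, and the sample distributions are made up.

```python
import math

def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def decompose(samples):
    """Split predictive uncertainty over posterior samples of class distributions."""
    k = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(s) for s in samples) / len(samples)
    return {"total": total, "aleatoric": aleatoric, "epistemic": total - aleatoric}

# Samples agree on a 50/50 answer: the noise is in the data (aleatoric).
print(decompose([[0.5, 0.5], [0.5, 0.5]]))
# Samples confidently disagree: the uncertainty is about the model (epistemic).
print(decompose([[1.0, 0.0], [0.0, 1.0]]))
```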

2606.19349 2026-06-19 cs.CL cs.AI New submission 80%

Where to Place the Query? Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics

Zhengheng Li, Panrui Li, Xuyang Liu, Puzhi Xia

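Affiliations: Southeast University

Topic match (Other LLM topics): studies positional bias of in-context learning in diffusion LLMs

AI summary: Systematically analyzes how query position affects generation quality in diffusion large language models and finds it to be as important as the semantic quality of the examples. Proposes Auto-ICL, a training-free adaptive routing strategy based on average confidence, to optimize query placement.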

Comments 9 figures, 4 tables

Abstract

While In-Context Learning (ICL) is extensively studied in Autoregressive (AR) LLMs, its mechanism within Diffusion Large Language Models (dLLMs) remains largely unexplored. Unlike AR models restricted by unidirectional causal masking, dLLMs intrinsically utilize bidirectional attention, offering extensive spatial flexibility for query placement. Unfortunately, current practices conventionally inherit AR-style trailing-query templates, often overlooking the structural paradigm shift. This paper presents a comprehensive analysis unveiling that query position is actually a first-order variable in dLLMs. Through empirical decoupling, we demonstrate that positional variance impacts generation quality on par with example semantic quality. Internally, this positional sensitivity stems from a spatial "Recency Effect" in attention flow and task-dependent shifts in decoding trajectories. To mitigate this instability without ground-truth labels, we reveal that traditional single-step confidence ($C_{decoded}$) fails in dLLMs. Instead, we propose Average Confidence ($\overline{C}$), a novel metric tracking the iterative decoding process. By establishing the foundational spatial ICL baselines, we introduce Auto-ICL, a training-free adaptive routing strategy that dynamically optimizes query placement, robustly approaching oracle performance across heterogeneous reasoning and perception tasks.
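The contrast between single-step confidence and Average Confidence can be sketched as follows. We assume, for each candidate query position, a record of the model's mean token confidence at every denoising step. We read C_decoded as the confidence at a single step (here, the final one), while the average tracks the whole trajectory. An Auto-ICL-style router then picks the position with the highest average. The traces are hypothetical, and the paper's exact definitions may differ.

```python
def final_confidence(trace):
    """Single-step confidence: only the last denoising step (our reading of C_decoded)."""
    return trace[-1]

def average_confidence(trace):
    """Average confidence over all denoising steps (C-bar)."""
    return sum(trace) / len(trace)

def choose_query_position(traces):
    """traces: {position_name: [confidence at step 1, ..., step T]}."""
    return max(traces, key=lambda pos: average_confidence(traces[pos]))

traces = {
    "query_first":  [0.60, 0.75, 0.85, 0.95],
    "query_middle": [0.40, 0.55, 0.80, 0.97],
    "query_last":   [0.30, 0.50, 0.70, 0.98],
}
print(choose_query_position(traces))                               # → query_first
print(max(traces, key=lambda p: final_confidence(traces[p])))      # → query_last
```

The two criteria disagree here: the final step alone favors the trailing query, while the trajectory average favors the position that was confident throughout decoding.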

2606.19346 2026-06-19 cs.CL cs.AI New submission 80%

Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer

Ahmed Haj Ahmed, Ruochen Zhang, Alvin Grissom

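Affiliations: Haverford College; Brown University

Topic match (Other LLM topics): disentangles task alignment from linguistic relatedness in cross-lingual transfer

AI summary: Fine-tunes LLMs on Arabic and evaluates zero-shot reading comprehension on Semitic and non-Semitic languages. Finds that cross-lingual transfer mainly improves task-format alignment rather than language-specific knowledge.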

Abstract

We study cross-lingual transfer by fine-tuning seven large language models (4B-671B parameters) on Arabic and evaluating zero-shot reading comprehension on Semitic languages and non-Semitic controls. Across dense and Mixture-of-Experts architectures, we find no evidence of Semitic-specific transfer: models with weak baselines improve dramatically across all languages, while strong-baseline models show only marginal gains regardless of language family. A chain-of-thought ablation reinforces this finding: the same models that benefit most from fine-tuning benefit equally from inference-time reasoning, suggesting both mechanisms address task-format alignment rather than cross-lingual knowledge transfer.

2606.20560 2026-06-19 cs.LG cs.AI New submission 75%

How Transparent is DiffusionGemma?

Joshua Engels, Callum McDougall, Bilal Chughtai, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue, João Gabriel Lopes de Oliveira, Rohin Shah, Neel Nanda

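Affiliations: Google

Topic match (Other LLM topics): studies the reasoning transparency of DiffusionGemma

AI summary: Studies the reasoning transparency of DiffusionGemma, which computes in a continuous latent space, by decomposing transparency into variable and algorithmic transparency. Finds that an interpretable token bottleneck reduces the opaque serial depth to 1.1x that of Gemma 4, and uncovers diffusion-specific phenomena.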

Comments 20 main text pages and 6 pages of references and appendices

Abstract

LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computation in a continuous latent space; does this make its reasoning less transparent? We study this question by decomposing transparency into two components: variable transparency, whether we understand intermediate snapshots of a model's computational state; and algorithmic transparency, whether we can use these snapshots to reconstruct the process by which the model arrived at its outputs. Naively, DiffusionGemma has poor variable transparency: its opaque serial depth, the amount of serial computation that occurs in between interpretable model states, seems at first 28.6X higher than the corresponding autoregressive Gemma 4 model. However, we show that we can map the information flowing between denoising steps through an interpretable token bottleneck with no decrease in downstream performance. Treating these intermediate states as interpretable reduces the opaque serial depth to just 1.1X that of Gemma 4. Algorithmic transparency is harder for diffusion models than for autoregressive models because all token predictions in the canvas can change at every denoising step, giving the model the power to implement complicated distributed algorithms during the denoising process. To begin bridging this gap, we conduct a suite of interpretability case studies, uncovering initial evidence of novel diffusion-specific phenomena such as non-chronological reasoning, token and sequence smearing, and intermediate-context reasoning. Finally, we test monitorability, a key application of transparency that measures whether model outputs are useful for downstream tasks. We find that DiffusionGemma is similarly monitorable to Gemma 4.

2606.20400 2026-06-19 cs.LG New submission 75%

The Significance of Style Diversity in Annotation-Free Synthetic Data Generation

Zahra Abbasiantaeb, Zeno Belligoli, Omar Essam, Mohammad Aliannejadi

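Affiliations: University of Amsterdam

Topic match (Other LLM topics): uses LLMs to generate synthetic dialogue data for better intent classification

AI summary: Proposes an annotation-free dialogue generation framework that uses topic and style attributes to increase diversity, plus two post-hoc stylization models. Experiments show that style diversity matters more than topic diversity, and that performance reaches up to 93.3% of that obtained with human-annotated data.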

Abstract

Generating high-utility synthetic data for intent classification typically requires human-annotated seed data, which is often unavailable in fast-paced industrial settings. In this paper, we propose a framework for synthetic dialogue generation that works entirely without human-annotated data, relying solely on intent definitions. Our proposed dialogue generation framework utilizes two different types of topic and style attributes to improve data diversity. Also, we propose two novel post-hoc stylization models called Univ and Exam to transform synthetic LLM-generated utterances into more varied, human-like linguistic styles. To enhance data quality, we utilize an LLM-as-a-judge filtering process. Experimental results on both industrial and public datasets demonstrate that the proposed approach achieves up to 93.3% of the performance obtained using human-annotated training data. Crucially, the findings reveal that style diversity is more critical than topic diversity for synthetic data utility, as it prevents models from learning spurious stylistic correlations. Furthermore, the study shows that incorporating style attributes during the generation process is more effective than post-hoc style adaptation.
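Attribute-conditioned generation of this kind typically enumerates or samples combinations of attributes and fills a prompt template for the generator LLM. The sketch below is minimal. The intent, topic and style values and the template wording are hypothetical, and the LLM call and the LLM-as-a-judge filter are left out.

```python
import itertools
import random

TEMPLATE = (
    "Write a customer message expressing the intent '{intent}' ({definition}). "
    "Topic: {topic}. Style: {style}."
)

def build_prompts(intent, definition, topics, styles, n=None, seed=0):
    """One generation prompt per (topic, style) pair, optionally subsampled to n pairs."""
    combos = list(itertools.product(topics, styles))
    if n is not None:
        combos = random.Random(seed).sample(combos, n)
    return [TEMPLATE.format(intent=intent, definition=definition, topic=t, style=s)
            for t, s in combos]

prompts = build_prompts(
    "cancel_booking",
    "the user wants to cancel a reservation",
    topics=["hotel", "flight"],
    styles=["terse", "polite", "typo-heavy"],
)
print(len(prompts))  # → 6
print(prompts[0])
```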

2606.19831 2026-06-19 cs.CL cs.LG New submission 75%

Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models

Hongliang Liu

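Affiliations: Palo Alto Networks

Topic match (Other LLM): studies a control-window theory for single-neuron interventions in language models.

AI summary: Proposes a budget-normalized control-window framework. A coherence budget, defined as the ratio of the residual norm to the write norm, is used to predict when a single-neuron intervention yields coherent behavioral control. The predictions are validated on 15 neurons.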

Abstract

Aligned language models gate behaviors such as refusal and language routing through sparse feed-forward neurons, yet no theory predicts when a single-neuron intervention controls a behavior coherently rather than collapsing the output. We develop a budget-normalized control-window framework for single-neuron steering. A dose along one write direction reduces to one control coordinate: the alignment between the residual stream and the write, driven along a universal saturation curve in units of a coherence budget set by the residual norm divided by the write norm. Coherent control exists when a behavior trigger lies below the collapse ceiling. The same coordinate governs benign mode switches and refusal; the ceiling follows from weights and one generic forward pass, while triggers are measured at rollout. On fifteen held-out neurons, the predicted ceiling has mean absolute error 0.14 (about 0.07 in bulk layers), and the committed open-or-closed verdict holds on eleven, against ten of fifteen for a majority baseline. Closed cases expose three failure modes rather than violations: collapse before the trigger, too little depth to propagate, or a normalization that caps how far one neuron can push. The law explains why local gradient attribution anti-predicts control: true controllers write off the readout axis and carry a near-zero first-order gradient. A forward-only contrastive screen, made precise by the window, recovers controllers that attribution misses. On refusal, the hardest case, intervention success is typed, not scalar: coherent bypass and strict actionable reach separate, so a neuron can flip refusal in fluent, on-task text with no actionable content, and genuine actionable reach appears only for three of six audited Llama pivots and only at later rollout horizons. Single-neuron steering is therefore a budgeted, typed audit of controllability rather than a fixed-dose anecdote.
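The following is an illustrative reading of why a dose reduces to one coordinate on a universal curve; it is not the paper's derivation. Assume the dose adds α·w to a residual r that is orthogonal to w. Then the cosine between r + αw and w equals t/√(1+t²), with t = α/B and B = ‖r‖/‖w‖. Once the dose is expressed in budget units, every neuron traces the same saturating curve. The orthogonality assumption and the toy vectors are ours.

```python
import math


def coherence_budget(resid_norm, write_norm):
    """Budget B = ||r|| / ||w||: residual-stream norm over the neuron's write norm."""
    return resid_norm / write_norm


def alignment_after_dose(r, w, alpha):
    """Cosine between the steered residual r + alpha * w and the write direction w."""
    steered = [ri + alpha * wi for ri, wi in zip(r, w)]
    dot = sum(si * wi for si, wi in zip(steered, w))
    norm_s = math.sqrt(sum(si * si for si in steered))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return dot / (norm_s * norm_w)


def saturation_curve(t):
    """Closed form when r is orthogonal to w, with the dose in budget units t = alpha / B."""
    return t / math.sqrt(1.0 + t * t)


r = [4.0, 0.0]  # toy residual stream
w = [0.0, 2.0]  # toy write direction, orthogonal to r
B = coherence_budget(4.0, 2.0)
for alpha in (0.5, 2.0, 6.0):
    print(alpha / B, alignment_after_dose(r, w, alpha), saturation_curve(alpha / B))
```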

2606.18933 2026-06-19 cs.LG cs.IR stat.ME New submission 75%

Zero-Shot Active Feature Acquisition via LLM-Elicitation

Binyamin Perets, Natalie Mendelson, Shiran Vainberg, Yehuda Chowers, Shai Shen-Orr, Shie Mannor

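Affiliations: Faculty of EE, Technion; Faculty of Medicine, Technion; CytoReason; NVIDIA

Topic match (Other LLM): LLM elicitation for active feature acquisition.

AI summary: Proposes a zero-shot active feature acquisition framework that elicits the sufficient statistics of a Markov random field from an LLM, addressing the shortage of labeled data. It outperforms existing methods in diagnosing IBD patients.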

Abstract

Active feature acquisition (AFA) sequentially selects which features to observe to reach a classification or ranking decision. Its central limitation is reliance on large amounts of labeled data to fit probabilistic models guiding acquisition. Large language models (LLMs) supply unsupervised domain knowledge, but are poor sequential planners. Asking one to both know and decide conflates capabilities best kept separate. Here, we develop a framework for zero-shot AFA through disciplined elicitation: asking the LLM only for what it can be trusted to return, the unary deviations and pairwise co-variations that are the sufficient statistics of a Markov random field (MRF). We apply our framework to two settings: binary classification and top-$k$ identification. In practice, the LLM reliably returns only discriminative statistics, what distinguishes the classes rather than each class in isolation, which precludes classical AFA. We apply a maximum-entropy closure that resolves this gauge ambiguity. We evaluate on a cohort of Inflammatory Bowel Disease (IBD) patients, an active clinical setting where diagnostic ambiguity and patient heterogeneity obstruct stable treatment strategies. Our framework outperforms the LLM both on real labels and on its own extracted beliefs. Where it matters most, on the hardest patients, our top-$k$ acquisition policy markedly outperforms all existing methods.
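To make the idea of sufficient statistics guiding acquisition concrete, here is a generic greedy acquisition step over a tiny binary pairwise MRF. Node 0 is the label, and `h` and `J` hold the unary and pairwise terms. In the paper's setting these would be the LLM-elicited statistics after the maximum-entropy closure; that elicitation and closure are not reproduced here. Exact enumeration limits the sketch to a handful of features. The expected-entropy criterion is a common textbook choice, not necessarily the authors' policy.

```python
import itertools
import math


def log_potential(state, h, J):
    """Unnormalized log-probability of a binary state under a pairwise MRF:
    sum_i h[i] * x_i + sum_(i,j) J[(i, j)] * x_i * x_j."""
    s = sum(h_i * x_i for h_i, x_i in zip(h, state))
    s += sum(w * state[i] * state[j] for (i, j), w in J.items())
    return s


def prob_one(h, J, node, observed):
    """P(x_node = 1 | observed) by brute-force enumeration (small models only)."""
    free = [k for k in range(len(h)) if k not in observed]
    num = den = 0.0
    for values in itertools.product((0, 1), repeat=len(free)):
        state = [0] * len(h)
        for k, v in observed.items():
            state[k] = v
        for k, v in zip(free, values):
            state[k] = v
        p = math.exp(log_potential(state, h, J))
        den += p
        num += p * state[node]
    return num / den


def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -(p * math.log(p) + (1 - p) * math.log(1 - p))


def next_feature(h, J, observed, label=0):
    """Greedy AFA step: pick the unobserved feature whose answer minimizes
    the expected entropy of the label node."""
    best, best_score = None, math.inf
    for i in range(len(h)):
        if i == label or i in observed:
            continue
        p1 = prob_one(h, J, i, observed)
        score = sum(
            p_v * binary_entropy(prob_one(h, J, label, {**observed, i: v}))
            for v, p_v in ((1, p1), (0, 1 - p1))
        )
        if score < best_score:
            best, best_score = i, score
    return best


# Node 0 is the label; feature 1 is coupled to it, feature 2 is not.
h = [0.0, 0.0, 0.0]
J = {(0, 1): 3.0}
print(next_feature(h, J, observed={}))
```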

2606.17832 2026-06-19 cs.LG New submission 75%

From Drift to Coherence: Stabilizing Beliefs in LLMs

SongEun Kim, Seungyoo Lee, Edwin Fong, Hyungi Lee, Juho Lee

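Affiliations: Department of Statistics, Seoul National University; Korea Advanced Institute of Science & Technology; Department of AI, Kookmin University; University of Hong Kong

Topic match (Other LLM): studies belief stability in LLMs and proposes a predictive resampling method.

AI summary: Studies belief drift in LLMs on multiple-choice question answering and proposes prompted predictive resampling (PPR). It finds that the belief process self-stabilizes and converges. Building on this, it proposes a seed-answer prompting strategy and a self-consistency loss that accelerate stabilization and improve predictive coherence.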

Abstract

Large language models (LLMs) are often hypothesized to perform implicit Bayesian inference, yet a key coherence condition, the martingale property of predictive beliefs, has been shown to fail in controlled synthetic in-context learning settings. We revisit this question in a more typical usage regime: generic multiple-choice question answering. Exploiting the discrete answer space, we compute exact predictive distributions and study belief dynamics induced by autoregressive answer resampling. We introduce prompted predictive resampling (PPR), where an LLM generates a sequence of answers to the same question. Empirically, PPR reveals early-stage belief drift, indicating martingale violations. However, after sufficient resampling steps, the belief process self-stabilizes and converges to a coherent predictive distribution. Based on this observation, we further propose (i) a seed-answer prompting strategy to accelerate stabilization, and (ii) a self-consistency loss that amortizes early-stage drift into the model via fine-tuning. Experiments on multiple-choice QA benchmarks show that our methods substantially reduce belief drift and improve predictive coherence without sacrificing accuracy.
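Because the answer space is discrete, the martingale condition can be checked exactly. Suppose the next answer a is drawn from the current predictive p_t, and the predictive is then updated to p_{t+1}(·|a). Coherence requires that the average of p_{t+1} over a equals p_t. The sketch below computes that gap for two toy update rules. A Pólya-urn rule is a martingale; an artificially sharpened rule drifts. Neither is an LLM predictive. In PPR the update would come from re-prompting the model with its own sampled answers.

```python
from fractions import Fraction


def expected_next(p, update):
    """E[p_{t+1}] when the next answer a is drawn from p and p_{t+1} = update(a)."""
    out = [Fraction(0)] * len(p)
    for a, p_a in enumerate(p):
        for j, q in enumerate(update(a)):
            out[j] += p_a * q
    return out


def martingale_gap(p, update):
    """Largest |E[p_{t+1}] - p_t| over answers; zero means the martingale property holds."""
    return max(abs(e - q) for e, q in zip(expected_next(p, update), p))


def polya_update(counts):
    """Polya-urn predictive: appending answer a adds one count to a."""
    n = sum(counts)

    def update(a):
        c = list(counts)
        c[a] += 1
        return [Fraction(ci, n + 1) for ci in c]

    return update


def sharpened_update(counts):
    """A drifting rule: Polya update followed by squaring and renormalizing."""
    base = polya_update(counts)

    def update(a):
        sq = [q * q for q in base(a)]
        total = sum(sq)
        return [q / total for q in sq]

    return update


counts = [2, 1, 1]  # answers seen so far for options A, B, C
p = [Fraction(c, sum(counts)) for c in counts]
print(martingale_gap(p, polya_update(counts)))      # coherent rule
print(martingale_gap(p, sharpened_update(counts)))  # drifting rule
```

Exact `Fraction` arithmetic keeps the martingale check free of floating-point noise.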

2606.20544 2026-06-19 cs.AI cs.LG New submission 70%

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Gina Wong, Drew Prinster, Suchi Saria, Rama Chellappa, Anqi Liu

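Affiliations: Johns Hopkins University

Topic match (Other LLM): studies calibration of mixture-of-experts models (LLM applications).

AI summary: Studies the calibration of mixture-of-experts models under distribution shift. Proposes an adversarial reweighting method that reduces the calibration error of the routed aggregate and improves the accuracy-calibration tradeoff.

Journal ref: ICML 2026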

Abstract

Calibration aligns a model's predictive uncertainty with the frequencies of its empirical outcomes and is important for understanding and trusting reported probabilities. Recent work shows that enforcing calibration at the level of individual predictors can improve ensemble accuracy and calibration, with mixture-of-experts (MoE) models showing strong empirical improvements in particular; however, the conditions under which calibration helps MoE are not well understood. In this work, we study how MoE models behave under distribution shift, focusing on how routing mechanisms interact with expert-level calibration. We show that expert calibration is sufficient to ensure calibration of the overall model under a broad class of distribution shifts in hard-routed models, but is insufficient for calibrating soft-routed models. To address this, we propose an adversarial reweighting that penalizes calibration errors of the routed aggregate under distribution shift, and we demonstrate that it improves the accuracy-calibration tradeoff both on average and on difficult subsets of the data, across model classes, prediction tasks, and distribution shifts.
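Calibration error is commonly measured with a binned expected calibration error (ECE). The sketch below computes binned ECE and a soft-routed aggregate probability, taken as a gate-weighted average of expert probabilities. The equal-width binning and the bin count are common defaults, not the paper's settings. The adversarial reweighting itself is not shown.

```python
def soft_routed_prob(gates, expert_probs):
    """Soft-routed aggregate: gate-weighted average of expert probabilities."""
    return sum(g * p for g, p in zip(gates, expert_probs))


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(mean_conf - accuracy)
    return ece


conf = [0.9] * 10
hits = [1] * 9 + [0]  # 90% confident and 90% correct: well calibrated
print(expected_calibration_error(conf, hits))
print(soft_routed_prob([0.5, 0.5], [0.8, 0.6]))
```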

2606.20538 2026-06-19 cs.LG New submission 70%

Multi-Task Bayesian In-Context Learning

Qingyang Zhu, Eric Karl Oermann, Kyunghyun Cho

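Affiliations: New York University

Topic match (Other LLM): proposes a multi-task in-context learning framework (an LLM method).

AI summary: Proposes a multi-task in-context learning framework that represents prior information as a prefix of in-context datasets. A transformer trained this way performs hierarchical Bayesian predictive inference. It matches oracle Bayesian performance under various distribution shifts while being orders of magnitude faster.

Comments: ICML 2026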

Abstract

Bayesian predictive inference provides a principled framework for uncertainty quantification, data efficiency, and robust generalization. However, exact inference is often intractable, and scalable approximations may remain computationally expensive or require restrictive modeling assumptions that degrade predictive performance. Prior-Data Fitted and in-context models have recently emerged as an amortized alternative by learning to map datasets directly to predictive distributions, but existing approaches are tightly coupled to the support of the training prior and lack explicit mechanisms for adapting to new priors at test time, resulting in limited robustness under distribution shift. We introduce a multi-task in-context learning framework for amortized hierarchical Bayesian predictive inference that explicitly represents prior information as a prefix of in-context datasets. A transformer trained on sequences of prior and target tasks learns to adapt its predictions across families of priors. On a suite of evaluations with increasing difficulty, including out-of-meta-distribution priors and priors with high-dimensional latent structures, our method matches oracle Bayesian predictors while being orders of magnitude faster. We further demonstrate its practical relevance on a real-world spatiotemporal temperature prediction benchmark. Code is available at https://github.com/martianmartina/multi-task-bayesian-icl/.
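The sketch below shows what an oracle Bayesian predictor computes in the simplest conjugate case, and how prior tasks placed in a prefix can inform it. It fits a Normal prior from per-task means in the prefix and returns the closed-form posterior predictive for a new observation of the target task. The Normal-Normal model, the method-of-moments prior fit and the known noise variance are illustrative assumptions. The paper's transformer learns this mapping rather than computing it in closed form.

```python
import statistics


def prior_from_prefix(prior_task_means):
    """Crude empirical-Bayes prior: mean and sample variance of per-task means in the prefix."""
    return statistics.mean(prior_task_means), statistics.variance(prior_task_means)


def normal_predictive(ys, m0, s0_sq, noise_sq):
    """Posterior predictive (mean, variance) for a new y when
    mu ~ N(m0, s0_sq) and y_i ~ N(mu, noise_sq) with known noise_sq."""
    precision = 1.0 / s0_sq + len(ys) / noise_sq
    post_mean = (m0 / s0_sq + sum(ys) / noise_sq) / precision
    return post_mean, 1.0 / precision + noise_sq


m0, s0_sq = prior_from_prefix([-1.0, 0.0, 1.0])  # three prior tasks in the prefix
print(normal_predictive([1.0, 1.0, 1.0, 1.0], m0, s0_sq, noise_sq=1.0))
```

With no target observations, the predictive falls back to the prior mean with variance `s0_sq + noise_sq`. Each observation shrinks the posterior term of the variance.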

2606.20502 2026-06-19 cs.CR cs.AI cs.SE New submission 70%

Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software

Arastoo Zibaeirad, Marco Vieira

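Affiliations: University of North Carolina at Charlotte

Topic match (Other LLM): diagnoses the limits of fine-tuned LLMs for vulnerability detection.

AI summary: Proposes the CWE-Trace framework, which evaluates LLM vulnerability detection with 834 Linux kernel samples and two diagnostic metrics (DFI and HDD). It finds that data contamination gives no substantive advantage and that fine-tuning shifts the output threshold rather than the decision policy. The models lack genuine security reasoning.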

Abstract

Whether LLMs scoring well on vulnerability benchmarks genuinely reason about security or merely pattern-match on contaminated data remains unresolved. We present CWE-Trace, a framework for LLM vulnerability detection built from 834 manually curated Linux kernel samples spanning 74 CWEs. The framework enforces a strict temporal split (pre-2025 historical set / post-cutoff leakage-free set), preserves context-aware vulnerable-patched pairs, and introduces two diagnostic metrics: the Directional Failure Index (DFI) and Hierarchical Distance and Direction (HDD). We evaluate eight vanilla LLMs and 15 LoRA fine-tuned variants across non-targeted detection, targeted detection, and CWE classification. Our analysis yields two key results. First, data contamination provides no measurable advantage. Function-level analysis shows that 84% of nominally contaminated samples carry no usable memorization signal: vulnerable functions are absent or cross-mapped across datasets, and ~31% of contaminated samples carry CWE misclassification. Second, backbone directional priors dominate fine-tuning. Models exhibit stable, systematic failure modes (DFI ranging from -85.5 to +94.8 pp) that persist from historical to post-cutoff data and resist correction. Fine-tuning shifts the output threshold without changing the decision policy. This is calibration without comprehension: output distributions adapt to training data while the underlying security reasoning remains absent. The weakest backbone at binary detection (DeepSeek-R1) gains the most in coarse CWE classification, revealing that detection and understanding are decoupled capabilities. The best detection score reaches only 52.1% (+2.1 pp above chance); exact CWE ranking remains below 1.3% Top-1 accuracy, confirming that current LLMs lack reliable security reasoning for systems software, regardless of fine-tuning strategy.
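Below is a minimal sketch of two bookkeeping steps behind these numbers. The first is a temporal split into a pre-2025 historical set and a leakage-free set dated after a model's training cutoff. The second converts accuracy into points above chance: assuming a balanced binary task with 50% chance, 52.1% gives the +2.1 pp quoted above. The `date` field, the per-model cutoff argument and dropping samples that fall between the two boundaries are assumptions. DFI and HDD are not reconstructed because the abstract does not define them.

```python
from datetime import date

HISTORICAL_END = date(2025, 1, 1)  # "pre-2025" boundary from the abstract


def temporal_split(samples, model_cutoff):
    """Split samples into a pre-2025 historical set and a post-cutoff set.
    Samples dated between the two boundaries are dropped (an assumption),
    so nothing the model could have seen leaks into the leakage-free set."""
    historical = [s for s in samples if s["date"] < HISTORICAL_END]
    leakage_free = [s for s in samples if s["date"] > model_cutoff]
    return historical, leakage_free


def points_above_chance(accuracy, n_classes=2):
    """Accuracy gain over uniform guessing, in percentage points."""
    return 100.0 * (accuracy - 1.0 / n_classes)


samples = [
    {"id": "a", "date": date(2024, 6, 1)},
    {"id": "b", "date": date(2025, 3, 1)},
    {"id": "c", "date": date(2025, 11, 1)},
]
hist, fresh = temporal_split(samples, model_cutoff=date(2025, 6, 30))
print([s["id"] for s in hist], [s["id"] for s in fresh], points_above_chance(0.521))
```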

2606.20436 2026-06-19 cs.CR cs.AI New submission 70%

Multi-View Decompilation for LLM-Based Malware Classification

Bercan Turkmen, Vyas Raina

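Affiliations: Independent Researcher; SPARK

Topic match (Other LLM): uses LLMs to classify decompiled code (an LLM application).

AI summary: Proposes giving the LLM multiple decompiler views to improve malware classification. Complementary pseudo-C from Ghidra and RetDec raises recall and F1.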

Abstract

Malware analysts often inspect compiled binaries through decompiled pseudo-C when source code is unavailable. Recent work suggests that large language models (LLMs) can assist this process by classifying decompiled code as benign or malicious, but existing pipelines typically rely on a single decompiler view. We argue that this assumption is fragile: decompilers are lossy heuristic tools, and different decompilers can expose different artefacts of the same binary. We curate a benchmark of benign utilities and malicious programs spanning a range of threat behaviors. Each sample is compiled and decompiled with both Ghidra and RetDec, yielding matched pseudo-C views. Across a range of LLMs from major model families, we find that providing both decompiler views improves malicious-class F1, mainly by increasing recall on malicious samples. Agreement analyses further show that Ghidra and RetDec make partially different errors, supporting the view that decompiler outputs provide complementary evidence. Our results suggest that multi-decompiler prompting is a simple, training-free way to improve LLM-based malware triage in practical settings.
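The multi-view idea can be sketched as a prompt that carries both pseudo-C views of one binary, plus the malicious-class F1 used to score the answers. The prompt wording is hypothetical and no particular model API is assumed. Producing the Ghidra and RetDec output is left to those tools.

```python
def build_prompt(ghidra_c, retdec_c):
    """Put both decompiler views of the same binary into one classification prompt."""
    return (
        "Classify the program as BENIGN or MALICIOUS.\n\n"
        "Decompiled with Ghidra:\n" + ghidra_c + "\n\n"
        "Decompiled with RetDec:\n" + retdec_c + "\n\n"
        "Answer with one word."
    )


def f1_for_class(y_true, y_pred, positive="MALICIOUS"):
    """F1 for one class (here the malicious class) from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


prompt = build_prompt("int main(void) { return 0; }", "int32_t main(void) { return 0; }")
print(f1_for_class(["MALICIOUS", "MALICIOUS", "BENIGN", "BENIGN"],
                   ["MALICIOUS", "BENIGN", "BENIGN", "MALICIOUS"]))
```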

2606.20295 2026-06-19 cs.SE cs.CL New submission 70%

Token-Operations-Oriented Inference Optimization Techniques for Large Models

Shiguo Lian, Kai Wang, Zhaoxiang Liu, Wen Liu, Minjie Hua, Yutong Liu, Jiangze Yan, Xin Wang, Cong Wang, Yilin Zhang, Yi Shen, Jieyun Huang, Fang Zhao, Huanlin Gao, Ping Chen, Xinyu Yang, Kaikai Zhao, Yao Zhao, Xinggang Wang, Huishuai Zhang, Dongyan Zhao, Junping Du, Tao Chen, Xiang Gao, Qinghuai Ma

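Affiliations: China's National Data Administration

Topic match (Other LLM): a survey of inference optimization techniques for large models.

AI summary: Proposes a four-layer technical architecture: multi-model fusion, model optimization, compute-model fusion, and compute-network-model fusion. It systematically reviews the key technologies and industry status of each layer. The aim is to lower token costs, improve service efficiency, ensure supply stability, and move large model services from callable to operable.

Comments: 62 pages, 36 figures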

Abstract

Large model inference optimization serves as a key foundation for supporting the scalable, low-cost, and highly stable operation of large model services. Centered on token-oriented inference optimization technology, this paper proposes for the first time a four-layer technical architecture consisting of Multi-model Fusion, Model Optimization, Compute-Model Fusion, and Compute-Network-Model Fusion. It systematically reviews the key technologies and current industry status across these four levels and analyzes the application value of related technologies in real-world business scenarios. This paper provides a practical technical path for reducing token production costs, improving token service efficiency, ensuring the stability of token supply, and driving the transition of large model services from being merely callable to being operable.

2606.20287 2026-06-19 cs.CL New submission 70%

PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback

Wei Xia, Jin Wu, Haoran Shi, Xiangyu Wang, Chanjin Zheng

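Affiliations: Department of Educational Psychology, East China Normal University; Shanghai Institute of Artificial Intelligence for Education, East China Normal University; School of Computer Science and Technology, East China Normal University

Topic match (Other LLM): uses LLMs to generate adaptive feedback.

AI summary: Proposes the PsyScore framework, which integrates diagnostic assessment with instructional scaffolding through a shared latent ability representation. It comprises a trait-adaptive neural IRT scorer, a ZPD-scaffolded feedback generator, and a multi-perspective feedback evaluation strategy. On the ASAP++ dataset it achieves competitive scoring performance while providing more pedagogically aligned feedback.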

Abstract

Effective Automated Essay Scoring (AES) is expected to support both reliable assessment and actionable instructional feedback. However, existing approaches often treat scoring and feedback as separate components: neural scoring models provide limited interpretability, while Large Language Model (LLM)-based feedback is typically insensitive to learners' proficiency levels. To address this fragmentation, this work proposes PsyScore, a psychometrically-aware framework that integrates diagnostic assessment with instructional scaffolding through a shared latent ability representation. PsyScore comprises three key modules: a Trait-Adaptive Neural IRT Scorer that incorporates the Graded Partial Credit Model (GPCM) into a neural architecture, enabling precise estimation of student ability while maintaining psychometric interpretability; a ZPD-Scaffolded Feedback Generator, which conditions multi-agent feedback strategies on the diagnosed ability parameter to adapt instructional focus across proficiency levels; and a Multi-Perspective Feedback Evaluation Strategy that assesses feedback quality via pairwise preference judgements and student revision simulations. Experiments on the ASAP++ dataset demonstrate that PsyScore achieves competitive scoring performance while providing more pedagogically aligned feedback.
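For reference, the usual partial credit form with a discrimination parameter assigns score category k a probability proportional to exp(Σ_{v≤k} a(θ − b_v)). This is the form used in the generalized partial credit model. The sketch below evaluates it for one trait. We assume PsyScore's GPCM layer follows this standard form, with θ produced by the neural encoder. The rubric and parameter values are invented for illustration.

```python
import math


def gpcm_probs(theta, a, b):
    """Category probabilities P(X = k | theta), k = 0..len(b), under a partial
    credit model with discrimination a and step difficulties b_1..b_m."""
    logits = [0.0]
    for b_v in b:
        logits.append(logits[-1] + a * (theta - b_v))  # cumulative step terms
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(theta, a, b):
    """Expected rubric score under the category probabilities."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, b)))


# A hypothetical 0-3 trait rubric with three step difficulties.
steps = [-1.0, 0.0, 1.5]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_score(theta, a=1.2, b=steps), 3))
```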

2606.19941 2026-06-19 cs.LG New submission 70%

Compositionality Emerges in a Narrow Depth-Connectivity Regime: Architecture Constraints and Solution Manifolds

Dat H. Do, Rushi Shah, Duc V. Le, Dianbo Liu

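Affiliations: National University of Singapore; University of Twente

Topic match (Other LLM): studies how compositionality emerges in sparse networks.

AI summary: Finds that compositionality emerges only in specific sparse networks and within a specific depth range. Proposes similarity-based pruning and a depth predictor, and explains the phenomenon with a theoretical framework.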

Abstract

Compositionality is believed to be the foundation for generalization, enabling models to reuse meaningful primitives in novel combinations. Yet, models trained with standard gradient-based optimization rarely, and often only weakly, exhibit compositional internal structure, and it remains unclear how or why such compositionality forms. In this work, we show that compositionality emerges in a narrow connectivity-depth sweet spot. Along the connectivity axis, compositionality appears only in certain sparse networks and depends heavily on which connections remain rather than on weight sparsity alone. Along the depth axis, compositionality emerges within a narrow, target-dependent regime, peaking at specific depths, while both shallower and deeper networks fail. When either the depth or connectivity condition is violated, gradient descent silently converges to fractured solutions rather than compositional ones. To discover and exploit this emergence, we introduce (i) similarity-based pruning (SP) to recover compositional connectivity and (ii) a heuristic depth predictor to estimate where compositionality is most likely to appear. Finally, we support these empirical findings with a theoretical framework based on compositional sparsity, volume-ratio arguments, and feature-interference bounds, explaining why compositional solutions are reachable only in a narrow depth-connectivity regime.

2606.19864 2026-06-19 cs.CL New submission 70%

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

Serge Sharoff

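Affiliations: Centre on Participatory and Deliberative Democracy

Topic match (Other LLM): using LLMs to scale up democratic deliberation.

AI summary: Examines, through the lens of Systemic-Functional Linguistics, how large language models can scale up democratic deliberation, increase inclusivity, and empower marginalized groups. It cautions against both overclaiming and underclaiming.

Comments: Published in /Handbook of Democracy in the Era of Artificial Intelligence/ edited by Evangelos Pournaras, Srijoni Majumdar, Carina Ines Hausladen, and Dirk Helbing. 2026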

Abstract

The increasing prominence of Large Language Models (LLMs) in public discourse presents both opportunities and challenges for democratic deliberation. While red teaming strategies help mitigate specific risks, broader concerns persist regarding linguistic constraints, biases, and the sycophantic tendencies of LLMs. This chapter explores how LLMs can be used to significantly scale up and democratise deliberation, particularly in fostering inclusivity and empowering traditionally marginalised groups. Drawing on concepts from Systemic-Functional Linguistics, the chapter examines how variations across language users (for example, with respect to socio-demographic groups) and across language use (for example, with respect to communicative functions) shape participation in AI-supported deliberation. The chapter presents AI-driven deliberation studies and assesses their potential to scaffold argumentation, enhance access, and reduce the influence of exclusionary linguistic norms and biases which are embedded in prestigious registers. At the same time, the chapter cautions against both overclaiming, which leads to unrealistic expectations, and underclaiming, which risks missed opportunities for AI-assisted engagement. The chapter concludes by identifying future research directions to maximise the democratic potential of AI-assisted participation while embedding ethical safeguards to counteract the reproduction of linguistic inequalities.

2606.19857 2026-06-19 cs.CL cs.AI New submission 70%

Large Language Models Do Not Always Need Readable Language

Jiayi Zhu, Haoxuan Peng, Junxi Wang, Liang Ke, Chen Zhang, Linfeng Zhang

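Affiliations: Shanghai Jiao Tong University; The University of Sydney; Hefei University of Technology; Xi'an Jiaotong University; Nanjing University

Topic match (Other LLM): explores non-standard textual representations that reduce context overhead.

AI summary: Proposes BabelTele, a representation that encodes semantics as compact, non-standard text, sacrificing human readability while remaining recoverable by LLMs. Experiments show it can compress text to 27.9% of its original length while keeping 99.5% semantic fidelity, reducing context overhead.

Comments: 23 pages, 10 figures. Preprint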

Abstract

Large language models (LLMs) are commonly prompted and interfaced with human-readable natural language, even when the intended reader is another model. This paper investigates whether semantic information can be encoded in compact, non-standard textual forms that sacrifice human readability while remaining recoverable by LLMs. We refer to this class of model-centric textual representations as BabelTele, approached here not as a fixed protocol but as an empirical probe into LLMs' capacity to generate and interpret such representations. Through readability diagnostics, model likelihood measures, human questionnaires, and downstream task evaluations, we find that BabelTele can substantially depart from ordinary natural language while preserving core semantics for instruction-tuned LLMs. As a task-agnostic representational paradigm, BabelTele demonstrates high information density, maintaining 99.5% semantic fidelity even when the text volume is condensed to 27.9% of its original length. We further evaluate its semantic robustness in cross-model transfer, agent memory, and multi-agent communication. Results suggest that BabelTele can reduce context overhead while generally maintaining reliable downstream performance, although its effectiveness depends on the compressor-reader pair and task setting. These findings indicate that human readability, natural-language typicality, and model-side semantic recoverability can be partially decoupled, opening a path toward model-native representations in future exploration of LLM systems.

2606.19826 2026-06-19 cs.CR cs.MA New submission 70%

Heterogeneous LLM Debate Under Adversarial Peers: Honest Gains, Replacement Costs, and Resilience

Prashanti Nilayam, Kiran Kumar Ramanna, Prashil Tumbade, Sankalp Nayak

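Topic match (Other LLM): studies how honest and adversarial peers influence heterogeneous LLM debate.

AI summary: Studies how honest and adversarial peers affect revision behavior in heterogeneous LLM debate. An honest peer lowers the harmful-revision rate, while an adversarial peer reverses this effect. Heterogeneity also acts as a defense when an adversary is already present.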

Abstract

Heterogeneous LLM debate is motivated by the promise that diverse peers correct one another, but the same exchange that carries correction also carries adversarial influence. We measure which dominates by tracking how a heterogeneous peer changes the honest agents' revision behavior: how often they change their answer, and whether the change is corrective or harmful. We compare matched panels (homogeneous baseline, honest-mixed, and adversarial-mixed) and contaminated panels in which a malicious same-family peer is already present, spanning four model families and three reasoning benchmarks. An honest heterogeneous peer sharply lowers harmful revision, and an adversarial one reverses it. For Llama-3.1-70B defenders on MATH-hard, the honest-slot harmful-revision rate falls from 89% in the homogeneous panel to 35% with an honest peer, and an adversarial peer returns it to 90%. The conditional rate hides this damage on weak defenders, but the end-of-debate flip rate exposes it. The pattern keeps its sign across families and benchmarks while its magnitude varies with the defender-benchmark regime. We also measure the effects when an adversarial same-family peer is already present: an honest heterogeneous peer lowers both harmful revision and the rate at which initially-correct answers are lost. On the same Llama-3.1-70B setting, the added honest peer cuts the flip rate on initially-correct items from 31% under a same-family adversary to 6%. Heterogeneity is therefore not only an attack surface but, when an adversary is already present, also a defense.
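One plausible way to operationalize the two metrics contrasted above is sketched below. The first is a conditional harmful-revision rate: harmful changes among all answer changes. The second is an end-of-debate flip rate on initially-correct items. These definitions are our assumption, inferred from the abstract's wording. Because the first rate is normalized by the number of revisions and the second by the number of initially-correct items, the two can move differently.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    initially_correct: bool
    finally_correct: bool
    changed: bool  # did the honest agent change its answer during the debate?


def harmful_revision_rate(turns):
    """Share of answer changes that went from correct to incorrect."""
    revised = [t for t in turns if t.changed]
    harmful = [t for t in revised if t.initially_correct and not t.finally_correct]
    return len(harmful) / len(revised) if revised else 0.0


def flip_rate_on_correct(turns):
    """Share of initially-correct items whose final answer is wrong."""
    initially_right = [t for t in turns if t.initially_correct]
    lost = [t for t in initially_right if not t.finally_correct]
    return len(lost) / len(initially_right) if initially_right else 0.0


log = [
    Turn(True, False, True),   # harmful revision
    Turn(False, True, True),   # corrective revision
    Turn(True, True, False),
    Turn(True, True, False),
]
print(harmful_revision_rate(log), flip_rate_on_correct(log))
```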

5. Instruction Tuning (1 paper)

2606.19710 2026-06-19 cs.CL cs.AI New submission 75%

FineREX: Fine-Tuned NER-RE for Human Smuggling Knowledge Graphs

Elijah Feldman, Dipak Meher, Carlotta Domeniconi

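Affiliations: Thomas Jefferson High School for Science and Technology

Topic match (Instruction Tuning): fine-tunes an LLM to improve domain-specific information extraction.

AI summary: Proposes FineREX, a pipeline built on a fine-tuned LLM that extracts entities and relations from legal documents to build knowledge graphs. It achieves absolute F1 improvements of 15.50% for entities and 31.46% for relations, and cuts end-to-end processing time by 50%.

Comments: Code available at https://github.com/ElijahFeldman7/FineREX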

Abstract

Court proceedings contain valuable evidence about human smuggling networks, but this information is often buried within unstructured, jargon-heavy legal documents. While large language models (LLMs) can support knowledge graph construction through automated information extraction, existing approaches rely on general-purpose models that are not tailored to the entity and relationship definitions required in this domain. We introduce FineREX, a streamlined knowledge graph construction pipeline built around a fine-tuned LLM for named entity recognition and relationship extraction (NER-RE). Using a manually annotated dataset of 512 text chunks, FineREX achieves absolute improvements of 15.50% and 31.46% in entity and relationship F1-score, respectively, compared to a larger general-purpose baseline. These gains translate into higher-quality knowledge graphs, reducing legal noise by nearly half and lowering node duplication on long documents from 17.78% to 11.17%. By eliminating document rewriting and redundant extraction stages, FineREX also reduces end-to-end processing time by 50.0%. Our results demonstrate that domain-specific fine-tuning can substantially outperform larger general-purpose models while improving both the quality and efficiency of knowledge graph construction for illicit network analysis.
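Here is a minimal sketch of how extraction and graph-quality numbers of this kind can be computed. It shows set-level F1 over extracted entities or (head, relation, tail) triples, and a node-duplication rate after lowercasing and whitespace normalization. Exact-match scoring and this normalization are assumptions; the paper may use different matching rules.

```python
def set_f1(gold, pred):
    """F1 between gold and predicted sets of entities or (head, relation, tail) triples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def duplication_rate(node_names):
    """Share of graph nodes that duplicate another node after simple normalization."""
    normalized = [" ".join(name.lower().split()) for name in node_names]
    return 1 - len(set(normalized)) / len(normalized)


gold = {("Smith", "transported", "migrants"), ("Smith", "paid_by", "Doe")}
pred = {("Smith", "transported", "migrants"), ("Doe", "recruited", "Smith")}
print(set_f1(gold, pred), duplication_rate(["John Doe", "john  doe", "Acme LLC"]))
```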

6. Long Context (1 paper)

2606.20474 2026-06-19 cs.LG cs.AI cs.PF New submission 70%

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

Inesh Chakrabarti, David Limpus, Aditi Ghai Rana, Bowen Bao, Spandan Tiwari, Thiago Crepaldi, Ashish Sirasao

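Affiliations: Advanced Micro Devices; University of California, Los Angeles; Purdue University

Topic match (Long Context): optimizes the KV cache for long-context scenarios to reduce latency.

AI summary: Targets context-heavy agent workloads with UltraQuant, which combines 4-bit KV-cache compression, rotation, and codebook quantization with AMD GPU optimizations. On long-context multi-turn tasks it cuts P50 time-to-first-token by 3.47x in the cache-pressured late rounds (2.3x across all rounds) and raises output throughput by 1.63x over an FP8 KV baseline.

Comments: 11 pages, 9 figures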

Abstract

Context-heavy agents place unusual pressure on the key-value (KV) cache: long prefixes are reused across many short turns, while concurrency determines whether the serving system can keep GPUs utilized. We study 4-bit KV-cache compression for this setting, using TurboQuant-style rotation and codebook quantization as a quality anchor and vLLM FP8 KV caching as the deployment anchor. We report three contributions. First, we frame 4-bit KV caching around multi-round agent workloads where task quality, cache residency, and serving throughput must be measured jointly. Second, we describe the practical design choices needed to make the 4-bit path robust, including asymmetric K/V treatment, Walsh-Hadamard rotation, QJL removal, and block-scale variants. Third, we present serving optimizations on AMD GPUs, including optimized decode-attention kernels and UltraQuant, an FP4 approximation path that uses FP8 queries, FP4 KV tensors, UE8M0 group scales, and native scaled-MFMA support on CDNA4. On a long-context, multi-turn agentic workload, UltraQuant cuts P50 time-to-first-token by 3.47x in the cache-pressured late rounds (2.3x across all rounds) and raises output throughput by 1.63x over the FP8 KV baseline.
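The pure-Python sketch below makes one plausible 4-bit path concrete. It uses an orthonormal Walsh-Hadamard rotation to spread outliers, FP4 (E2M1) codes, and one power-of-two scale per group. A power of two is exactly what a UE8M0 scale (an 8-bit exponent with no mantissa) can represent. The group size of 32, round-to-nearest with ties toward the smaller magnitude, and the test vector are assumptions. This is not UltraQuant's kernel; it omits the asymmetric K/V treatment, QJL handling and codebook variants.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1 (the sign is stored separately).
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two.
    The orthonormal transform is its own inverse."""
    x = list(x)
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return [v / math.sqrt(n) for v in x]


def quantize_group(values):
    """Quantize one group to E2M1 codes with a shared power-of-two (UE8M0-style) scale."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0.0] * len(values)
    exponent = math.ceil(math.log2(amax / 6.0))  # smallest 2**e with amax / 2**e <= 6
    scale = 2.0 ** exponent
    codes = [math.copysign(min(FP4_E2M1, key=lambda q: abs(abs(v) / scale - q)), v)
             for v in values]
    return exponent, codes


def quantize_roundtrip(x, group=32):
    """Rotate, quantize per group, dequantize, and rotate back."""
    rotated = fwht(x)
    restored = []
    for g in range(0, len(rotated), group):
        exponent, codes = quantize_group(rotated[g:g + group])
        restored.extend(c * 2.0 ** exponent for c in codes)
    return fwht(restored)


x = [math.sin(0.37 * i) * (1 + i % 5) for i in range(64)]
x_hat = quantize_roundtrip(x)
rel_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / sum(a * a for a in x))
print(round(rel_err, 4))
```

Because the rotation is orthonormal, it preserves the quantization error norm. What it changes is how evenly magnitudes are spread within each group before the shared scale is chosen.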