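arXivDaily: a daily academic digest that syncs the full arXiv feed and adds AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.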

2603.04474 2026-05-12 cs.MA cs.AI

From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration

Yizhe Xie, Congcong Zhu, Xinyue Zhang, Tianqing Zhu, Dayong Ye, Minfeng Qi, Huajie Chen, Wanlei Zhou

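AI summary: This paper studies how errors in LLM-based multi-agent systems (LLM-MAS) spread step by step and lead to system-wide consensus bias. It proposes a propagation-dynamics model for analyzing and identifying the risk of error amplification. Through experiments, the authors identify three main system vulnerabilities and design a genealogy-graph-based governance layer that works as a message-layer plugin to suppress the propagation of both internal and external errors. Experiments show that the method significantly reduces error cascades across multiple operating modes.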

2602.10666 2026-05-12 eess.AS cs.LG cs.SD

From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks

Riccardo Miccini, Clément Laroche, Tobias Piechowiak, Xenofon Fafoutis, Luca Pezzarossa

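AI summary: This paper studies how the internal pruning masks produced by dynamic channel pruning (DynCP) in speech enhancement networks can be used to estimate auxiliary signal properties, such as voice activity detection (VAD), noise classification, and fundamental frequency (F0) estimation, removing the need to deploy additional models. Using simple, interpretable predictors, the method achieves high accuracy on several tasks with minimal computational overhead. The study also sheds light on what DynCP models learn with respect to downstream tasks and proposes the approach as a unified solution for efficient speech enhancement with joint estimation of signal properties.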

Comments Accepted for publication at the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

2601.22143 2026-05-12 cs.GR cs.CV

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Anthony Chen, Naomi Ken Korem, Gal Zeevi, Tavi Halperin, Matan Ben Yosef, Urska Jelercic, Ofir Bibi, Or Patashnik, Daniel Cohen-Or

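AI summary: This paper presents JUST-DUB-IT, a video dubbing method built on an audio-visual diffusion model. A lightweight LoRA adapter lets the model generate, from an input video, dubbed audio in the target language together with synchronized facial motion. The generative model itself synthesizes paired multilingual videos as training data: the language is switched within a single clip, and the face and audio are then inpainted. The result is high-quality dubbing that preserves speaker identity and lip synchronization and is more robust to complex motion and real-world scenes.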

Comments Project webpage available at https://justdubit.github.io

2601.20898 2026-05-12 eess.AS cs.CL cs.LG

Reducing Prompt Sensitivity in LLM-based Speech Recognition Through Learnable Projection

Sergio Burdisso, Esaú Villatoro-Tello, Shashi Kumar, Srikanth Madikeri, Andrés Carofilis, Pradeep Rangappa, Manjunath K E, Kadri Hacioglu, Petr Motlicek, Andreas Stolcke

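AI summary: This paper studies how prompt design affects the performance of LLM-based speech recognition systems and shows that fixed, hand-written prompts perform inconsistently across settings. The authors propose a learnable prompt projection module that maps prompt embeddings into more effective regions of the LLM input space without modifying the underlying model architecture. Experiments show that the method improves recognition performance on multiple datasets and reduces the variability of results.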

Comments Paper accepted at ICASSP 2026

2512.16875 2026-05-12 cs.DS cs.LG math.ST stat.ML stat.TH

Learning Confidence Ellipsoids and Applications to Robust Subspace Recovery

Chao Gao, Liren Shan, Vaidehi Srinivas, Aravindan Vijayaraghavan

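AI summary: This paper studies the problem of finding confidence ellipsoids for arbitrary distributions in high dimensions: given a confidence parameter α, find the minimum-volume ellipsoid that contains at least 1−α of the probability mass. Because traditional methods cannot approximate this efficiently in high dimensions, the authors propose a polynomial-time algorithm that guarantees coverage of sufficient probability mass with a volume approximation factor that depends polynomially on the ellipsoid's condition number β, and they give corresponding computational hardness lower bounds. The method builds on the dual structure of the minimum-volume enclosing ellipsoid and the geometric Brascamp-Lieb inequality, and it yields the first polynomial-time algorithm with worst-case approximation guarantees for robust subspace recovery.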

2511.05476 2026-05-12 cs.SE cs.LG

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?

Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy

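AI summary: This study examines knowledge distillation for language models of code from a metamorphic testing perspective. Although student models score well on conventional accuracy metrics, their behavior can differ substantially from that of the teacher. The authors propose MetaCompress, a framework that systematically evaluates a student's behavioral fidelity using behavior-preserving metamorphic relations. Experiments show that it identifies up to 62% behavioral discrepancies in student models, underscoring the importance of evaluating behavioral consistency in the distillation pipeline.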

Comments This paper has been accepted for publication in the Journal of Systems and Software (JSS)

Abstract

Transformer-based language models of code have achieved state-of-the-art performance across a wide range of software analytics tasks, but their practical deployment remains limited due to high computational costs, slow inference speeds, and significant environmental impact. To address these challenges, recent research has increasingly explored knowledge distillation as a method for compressing a large language model of code (the teacher) into a smaller model (the student) while maintaining performance. However, the degree to which a student model deeply mimics the predictive behavior and internal representations of its teacher remains largely unexplored, as current accuracy-based evaluation provides only a surface-level view of model quality and often fails to capture more profound discrepancies in behavioral fidelity between the teacher and student models. To address this gap, we empirically show that the student model often fails to deeply mimic the teacher model, resulting in up to 285% greater performance drop under adversarial attacks, which is not captured by traditional accuracy-based evaluation. Therefore, we propose MetaCompress, a metamorphic testing framework that systematically evaluates behavioral fidelity by comparing the outputs of teacher and student models under a set of behavior-preserving metamorphic relations. We evaluate MetaCompress on two widely studied tasks, using compressed versions of popular language models of code, obtained via three different knowledge distillation techniques: Compressor, AVATAR, and MORPH. The results show that MetaCompress identifies up to 62% behavioral discrepancies in student models, underscoring the need for behavioral fidelity evaluation within the knowledge distillation pipeline and establishing MetaCompress as a practical framework for testing compressed language models of code derived through knowledge distillation.
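
The teacher-versus-student comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the MetaCompress implementation: the toy models `toy_teacher` and `toy_student`, the relations `rename_variable` and `append_comment`, and the discrepancy definition (the fraction of transformed inputs on which the two models disagree) are all assumptions made for this example.

```python
import re
from typing import Callable, Sequence

Model = Callable[[str], str]
Relation = Callable[[str], str]


def rename_variable(code: str) -> str:
    """Hypothetical relation: rename identifier `x` to `value` (semantics unchanged)."""
    return re.sub(r"\bx\b", "value", code)


def append_comment(code: str) -> str:
    """Hypothetical relation: add a trailing comment (semantics unchanged)."""
    return code + "  # no-op"


def behavioral_discrepancy(
    teacher: Model,
    student: Model,
    inputs: Sequence[str],
    relations: Sequence[Relation],
) -> float:
    """Fraction of (input, relation) pairs where teacher and student disagree."""
    disagreements = 0
    total = 0
    for source in inputs:
        for relate in relations:
            follow_up = relate(source)
            total += 1
            if teacher(follow_up) != student(follow_up):
                disagreements += 1
    return disagreements / total if total else 0.0


def toy_teacher(code: str) -> str:
    """Stand-in teacher: flags any division by a literal zero."""
    return "buggy" if "/ 0" in code else "clean"


def toy_student(code: str) -> str:
    """Stand-in student: learned a shortcut tied to the variable name `x`."""
    return "buggy" if "x / 0" in code else "clean"


samples = ["y = x / 0", "y = x + 1"]
rate = behavioral_discrepancy(
    toy_teacher, toy_student, samples, [rename_variable, append_comment]
)
```

On the untransformed `samples` the two toy models agree on every input, so an accuracy-only comparison would not separate them; the disagreement only appears once the variable is renamed.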

2511.01292 2026-05-12 stat.ML cs.LG

Optimal Attention Temperature Improves the Robustness of In-Context Learning under Distribution Shift in High Dimensions

Samet Demir, Zafer Dogan

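AI summary: This study examines how tuning the attention temperature can make in-context learning (ICL) in pretrained Transformers more robust to distribution shift. Within a high-dimensional linear regression framework, the authors analyze a Transformer with approximate softmax attention and derive a closed-form expression for the ICL generalization error under distribution shift, showing that an optimal attention temperature minimizes this error. Experiments indicate that tuning the attention temperature not only improves performance in the theoretical setting but also makes real pretrained large language models more robust to noisy in-context examples.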

Comments ICML 2026, 24 pages, 7 figures
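
For the attention-temperature entry above, the knob being tuned can be illustrated with a few lines of plain Python. This is only a sketch of standard temperature-scaled softmax; the paper analyzes an approximate softmax attention and derives the optimal temperature in closed form, neither of which is reproduced here. The function name `attention_weights` and the example scores are assumptions.

```python
import math
from typing import List, Sequence


def attention_weights(scores: Sequence[float], temperature: float) -> List[float]:
    """Softmax over query-key scores divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.0]
sharp = attention_weights(scores, temperature=0.5)  # concentrates on the top score
flat = attention_weights(scores, temperature=5.0)   # spreads weight more evenly
```

A lower temperature concentrates attention on the highest-scoring context example, while a higher one averages over more of the context, which is the trade-off an optimal temperature has to balance when some context examples are noisy.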

2510.15995 2026-05-12 q-fin.TR cs.GT cs.LG

The Invisible Handshake: Persistent Overpricing by Adaptive Market Agents

Luigi Foscari, Emanuele Guidotti, Nicolò Cesa-Bianchi, Tatjana Chavdarova, Alfio Ferrara

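AI summary: This paper studies persistent overpricing in a repeated game between market makers and traders. By analyzing the endogenous price impact of trades and exogenous shocks, the authors define overpricing relative to a counterfactual price path without price impact and characterize the strategy profiles that produce persistent overpricing. They find that decentralized learning schemes, such as projected stochastic gradient ascent, can reach the overpricing region in finite time, showing how adaptive learning by market participants can lead to persistent overpricing in financial markets.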

2510.03761 2026-05-12 cs.CR cs.AI

You Have Been LaTeXpOsEd: A Systematic Analysis of Information Leakage in Preprint Archives Using Large Language Models

Richard A. Dubniczky, Bertalan Borsos, Tamas Bisztray, Norbert Tihanyi

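AI summary: This study systematically analyzes information leakage on preprint platforms such as arXiv, showing that, without sanitization, submitted LaTeX sources, code, images, and comments can expose sensitive information. It introduces the LaTeXpOsEd framework, which combines pattern matching, logical filtering, and large language models. Applied to 100,000 arXiv submissions totaling more than 1.2 TB, it uncovers large amounts of personal information, cloud storage links, conference submission credentials, and other sensitive content, revealing serious security risks on preprint platforms and calling on the research community and platform operators to act.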

2509.06172 2026-05-12 stat.AP cs.LG

Robust Analysis for Resilient AI System

Yu Wang, Ran Jin, Lulu Kang

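AI summary: To address data anomalies caused by operational risks in Manufacturing Industrial Internet (MII) systems, this paper proposes DPD-Lasso, a new robust regression method that combines density power divergence with Lasso regularization to handle contaminated data from AI resilience experiments. An efficient iterative algorithm overcomes the computational bottleneck, and an MII testbed for aerosol jet printing is used to validate the method's reliability and stability on both clean data and data with outliers, making it a useful tool for building and validating resilient industrial AI systems.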

Comments 10 pages, 3 figures

2507.23511 2026-05-12 eess.AS cs.AI cs.CL cs.SD

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

Yadong Niu, Tianzi Wang, Heinrich Dinkel, Xingwei Sun, Jiahao Zhou, Gang Li, Jizhong Liu, Xunying Liu, Junbo Zhang, Jian Luan

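AI summary: This paper presents MECAT, a multi-expert constructed benchmark for fine-grained audio understanding, aimed at the shortcomings of current audio-language models in fine-grained understanding. The benchmark combines analysis from specialized expert models with chain-of-thought reasoning by large language models to generate multi-perspective, fine-grained captions and open question-answer pairs. It also introduces a new evaluation metric, DATE, that better discriminates the level of detail in model outputs. Experiments show that MECAT more accurately assesses the performance and limitations of existing audio models on fine-grained understanding tasks.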

Comments Accepted to ICML 2026

2507.07871 2026-05-12 cs.CR cs.AI cs.LG

Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas

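AI summary: This paper studies how randomized key selection can prevent watermark forgery attacks on generative models. Existing defenses embed multiple watermark keys in content to resist forgery, but they can degrade model performance and remain vulnerable once an attacker has collected enough watermarked samples. The paper proposes a new defense that randomly selects a watermark key for each query and accepts content as genuine only if exactly one key detects a watermark. This bounds the attacker's success probability without degrading model performance; in experiments, the attack success rate drops from nearly 100% to 2%.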
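
The acceptance rule in this entry (one random key per query, accept only when exactly one key fires) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's scheme: `embed` and `detect` use a visible text tag in place of a real imperceptible watermark, the key names are made up, and the forged sample simply assumes that a forger who pooled outputs produced under different keys ends up carrying traces of more than one key.

```python
import random
from typing import Callable, Sequence

Detector = Callable[[str, str], bool]


def embed(content: str, key: str) -> str:
    """Toy watermark: append a visible tag (real watermarks are imperceptible)."""
    return f"{content}<wm:{key}>"


def detect(content: str, key: str) -> bool:
    """Toy detector: report whether the tag for `key` is present."""
    return f"<wm:{key}>" in content


def generate(prompt: str, keys: Sequence[str], rng: random.Random) -> str:
    """Provider side: pick one key uniformly at random for this query."""
    return embed(prompt, rng.choice(keys))


def is_authentic(content: str, keys: Sequence[str], detector: Detector = detect) -> bool:
    """Verifier side: accept only if exactly one key detects a watermark."""
    hits = sum(1 for key in keys if detector(content, key))
    return hits == 1


keys = ["k0", "k1", "k2", "k3"]
rng = random.Random(0)
genuine = generate("a photo of a cat", keys, rng)
forged = embed(embed("a photo of a dog", "k1"), "k2")  # carries two keys
```

Content with no watermark and content that triggers several keys are both rejected, so only output that looks like a single genuine query passes the check.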

2506.20928 2026-05-12 stat.ML cs.LG

Active Learning for Manifold Gaussian Process Regression

Yuanxing Cheng, Lulu Kang, Yiwei Wang, Chun Liu

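AI summary: This paper proposes an active learning framework for manifold Gaussian process regression that combines manifold learning with strategic data selection to improve predictive accuracy in high-dimensional spaces. The method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the latent space, guided by an active learning criterion that minimizes global prediction error. Experiments on synthetic data show that the framework outperforms random sequential learning and handles complex, discontinuous functions efficiently while remaining computationally tractable, with applications in science and engineering.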

Comments 13 pages, 6 figures

2504.02373 2026-05-12 eess.IV cs.CV

HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement

Hantang Li, Qiang Zhu, Xiandong Meng, Lei Xiong, Shuyuan Zhu, Xiaopeng Fan

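AI summary: In practice, low-light images are often compressed for efficient storage and transmission, yet most existing methods either ignore the removal of compression artifacts or fail to establish a unified enhancement framework. This paper proposes a hybrid priors-guided network (HPGN) that combines a compression prior with an illumination prior, using the JPEG quality factor and the DCT quantization matrix to guide module design, to jointly enhance low-light images across different compression qualities. Experimental results show clear gains in image quality.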

Comments 5 pages, 3 figures

2410.14927 2026-05-12 q-fin.TR cs.CE cs.LG

Hierarchical Reinforced Trader (HRT): A Bi-Level Approach for Optimizing Stock Selection and Execution

Zijie Zhao, Roy E. Welsch

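AI summary: This paper proposes the Hierarchical Reinforced Trader (HRT), an automated stock trading system built on a bi-level reinforcement learning framework for text-aware portfolio management in multi-asset equity markets. HRT splits trading decisions into two levels: a high-level controller extracts sparse directional signals (buy, sell, or hold) from market and text signals, and a low-level controller turns those directions into feasible portfolio weight adjustments while accounting for transaction costs, drawdown, and text-derived risk. Experiments show that HRT achieves the best risk-return profile among several benchmarks, raising the Sharpe ratio and lowering turnover, which supports its effectiveness in combining market prediction with text-based risk signals.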

2409.19379 2026-05-12 math.CO cs.AI

Automated conjecturing with \emph{TxGraffiti}

Randy Davila

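AI summary: This paper introduces *TxGraffiti*, an automated mathematical conjecturing program that uses data-driven and heuristic methods to generate conjectures across mathematical domains. *TxGraffiti* descends from the earlier *Graffiti* system, has already led to several research results in graph theory and other areas, and now offers a newly developed web interface that improves user interaction. The article details its data collection, conjecture generation, and filtering mechanisms and demonstrates its contributions to mathematical research.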

Comments Annals of Mathematics and Artificial Intelligence (2026)

2006.02666 2026-05-12 eess.IV cs.CV

Deep Sequential Feature Learning in Clinical Image Classification of Infectious Keratitis

Yesheng Xu, Ming Kong, Wenjia Xie, Runping Duan, Zhengqing Fang, Yuxiao Lin, Qiang Zhu, Siliang Tang, Fei Wu, Yu-Feng Yao

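AI summary: This paper addresses clinical image classification of infectious keratitis with a sequence-level deep learning model designed to distinguish subtle differences among infectious corneal lesions. The method uses a mechanism that preserves the spatial structure of clinical images and extracts key features, which markedly improves classification performance. On 120 test images, the model reaches a diagnostic accuracy of 80.00%, well above the 49.27% average achieved by 421 ophthalmologists, indicating strong potential for assisted diagnosis.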

Comments Accepted by Engineering

2004.06443 2026-05-12 stat.ML cs.LG

Particle-based Energetic Variational Inference

Yiwei Wang, Jiuhai Chen, Chun Liu, Lulu Kang

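AI summary: This paper proposes energetic variational inference (EVI), a new variational inference framework based on an energy-dissipation law that unifies and derives many existing particle-based variational inference methods, such as Stein variational gradient descent (SVGD). Within this framework, the authors also propose a new particle-based EVI scheme that follows an "approximation-then-variation" strategy and significantly decreases the KL divergence at every iteration. Numerical experiments show that it outperforms existing methods in fidelity to the target distribution.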

Comments 17 pages, 7 figures

2605.10084 2026-05-12 eess.AS cs.AI cs.LG cs.SD

PoDAR: Power-Disentangled Audio Representation for Generative Modeling

Alejandro Luebs, Mithilesh Vaidya, Ishaan Kumar, Sumukh Badam, Stephen W. Bailey, Matthew Bendel, Jose Sotelo, Xingzhe He

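AI summary: This paper presents PoDAR, an audio representation that explicitly disentangles signal power from semantic content, which markedly improves how well the audio latent space can be modeled. The method uses random power augmentation and a latent consistency objective, leading generative models to converge faster and produce higher-quality output. Experiments show that PoDAR outperforms baselines on several metrics and broadens the applicability of conditional generation.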

Comments 9 pages, 3 figures

2605.10076 2026-05-12 eess.IV cs.LG

A Stability Benchmark of Generative Regularizers for Inverse Problems

Alexander Denker, Johannes Hertrich, Sebastian Neumayer

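AI summary: This paper studies the stability of generative regularizers for inverse problems, focusing on convergence under imperfect conditions, robustness to out-of-distribution data, and sensitivity to errors in the forward operator and noise model. Through numerical experiments, the authors compare generative models with modern variational optimization-based methods, revealing the strengths and limitations of generative priors in different application scenarios and providing guidance for choosing a suitable reconstruction method.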

2605.10036 2026-05-12 cs.NI cs.AI

Bridging the Cognitive Gap: A Unified Memory Paradigm for 6G Agentic AI-RAN

Xijun Wang, Zhaoyang Liu, Chenyuan Feng, Xiang Chen, Howard H. Yang, Tony Q. S. Quek

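AI summary: As 6G develops, radio access networks need to move beyond traditional automation toward agentic AI that can perceive, reason, and evolve. Current decoupled architectures suffer from a cognitive gap: the physical layer is forced to compress high-dimensional state into low-dimensional metrics, which limits an agent's semantic understanding. This paper proposes a unified memory paradigm that maps the biological memory hierarchy onto a heterogeneous computing architecture, breaking the boundary between perception and reasoning. It uses emerging coherent interconnect technology to share state across time scales, bridging real-time response and long-term context to enable truly autonomous 6G networks.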

Comments This work has been submitted to the IEEE for possible publication

2605.10015 2026-05-12 stat.ML cs.CR cs.LG

Differentially Private Sampling from Distributions via Wasserstein Projection

Shokichi Takakura, Seng Pei Liew, Satoshi Hasegawa

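AI summary: This paper studies sampling from distributions under differential privacy constraints. Unlike earlier utility measures based on density ratios, it uses the Wasserstein distance as the utility metric, which overcomes the inability of traditional approaches to capture the geometry of a distribution's support and to handle distributions with different supports. The authors propose a minimax-optimal mechanism based on Wasserstein projection (WPM) and design an efficient approximation algorithm with convergence guarantees, giving a new theoretical framework and a practical method for differentially private sampling.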

2605.10008 2026-05-12 physics.optics cs.CV cs.ET

Measurement-Adapted Eigentask Representations for Photon-Limited Optical Readout

Tianyang Chen, Mandar M. Sohoni, Saeed A. Khan, Jérémie Laydevant, Shi-Yuan Ma, Tianyu Wang, Peter L. McMahon, Hakan E. Türeci

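AI summary: In low-light conditions, optical readout is limited by photon noise, detector noise, and quantization error, which degrade the accuracy of downstream classification and decision-making. This paper proposes an eigentask representation, based on feature resolvability, that provides a noise-adapted feature representation of optical sensor outputs. Experiments show that the method significantly outperforms conventional approaches such as principal component analysis when the photon budget is limited, samples are scarce, and task complexity is high, improving both classification performance and learning efficiency.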

Comments 15+14 pages, 4+9 figures, 55 references

2605.09981 2026-05-12 q-bio.BM cs.AI

Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation

Nabin Giri, Steven Farrell, Kristofer E. Bouchard

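AI summary: This study presents Yeti, a compact protein structure tokenizer aimed at joint modeling of protein structure, sequence, and functional annotations in multimodal models. Yeti is based on lookup-free quantization and is trained end to end with a flow-matching objective, achieving strong generative capability while maintaining high reconstruction accuracy. Experiments show that, with far fewer parameters, Yeti matches or exceeds existing models in structure reconstruction and multimodal generation, making it a useful tool for efficiently training multimodal protein generative models.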

2605.09971 2026-05-12 cs.HC cs.AI

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

Jiahao Xiong, Fei Wang, Anran Xu, Pinzhi Huang, Tao Wen, Lijia Pan, Cai Chen

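AI summary: This paper presents HapticLDM, a latent diffusion model for text-to-vibrotactile generation that targets the core challenge of generating accurate, consistent, and complete vibration signals from natural language. By introducing a text processing strategy that emphasizes dynamic characteristics and a global denoising mechanism, the method improves the coherence and stability of the temporal envelope of the vibration signal. Experimental results show that HapticLDM outperforms existing methods in realism and semantic alignment and generates diverse, nuanced, and physically accurate haptic feedback, simplifying the haptic design workflow.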

2605.09960 2026-05-12 physics.geo-ph cs.LG cs.NA math.NA

Total Generalized Variation regularization closes the gap between neural-field and classical methods in seismic travel-time tomography

Isao Kurosawa

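AI summary: This paper presents MIMIR, a differentiable framework for seismic travel-time tomography based on second-order total generalized variation (TGV²) regularization. It represents the two-dimensional velocity field with a Fourier-feature neural network instead of a conventional gridded slowness vector, giving a continuous, infinitely differentiable velocity model. The method eliminates the inner Chambolle-Pock iterations of conventional TGV computation, which improves efficiency, and it clearly outperforms classical methods on several synthetic datasets, confirming the advantage of TGV² in recovering piecewise-affine structure. The study argues that, in physics-informed neural network inversion, the choice of regularization matters more than the network architecture.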

Comments 15 pages, 6 figures. Manuscript submitted to Geophysical Journal International

2605.09916 2026-05-12 math.MG cs.LG

The Observable Wasserstein Distance

Edivaldo Lopes dos Santos, Leandro Vicente Mauri, Washington Mio, Tom Needham

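AI summary: This paper introduces the observable Wasserstein distance, which gives lower bounds on the Wasserstein distance between probability measures on Polish metric spaces, to sidestep the difficulty of computing exact optimal transport on large non-Euclidean datasets. The method pushes the measures forward onto the real line through 1-Lipschitz observables and computes the Wasserstein distance between the projected distributions, which defines a hierarchy of pseudometrics obtained by restricting to subspaces. The central theoretical contribution is a uniqueness and recovery result tied to the metric covering dimension of the measures' supports, a metric-space analogue of the Cramér-Wold theorem for Euclidean distributions. Experiments show that the hierarchy offers a tunable trade-off between the tightness of the lower bound and computational cost.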
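
The projection idea in this entry can be sketched for small empirical measures. The example below is an illustration under several assumptions, not the paper's construction: it uses the order-1 Wasserstein distance, equally weighted samples of equal size, and distance-to-anchor functions as the 1-Lipschitz observables. The helper names (`wasserstein_1d`, `distance_observable`, `observable_lower_bound`) are invented for the sketch.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Observable = Callable[[Point], float]


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """W1 between two equally weighted samples of the same size on the real line."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def distance_observable(anchor: Point) -> Observable:
    """x -> d(x, anchor), which is 1-Lipschitz by the triangle inequality."""
    return lambda x: math.dist(x, anchor)


def observable_lower_bound(
    mu: Sequence[Point], nu: Sequence[Point], observables: Sequence[Observable]
) -> float:
    """Largest 1D distance over the chosen observables; never exceeds W1(mu, nu)."""
    projected: List[float] = [
        wasserstein_1d([f(x) for x in mu], [f(x) for x in nu]) for f in observables
    ]
    return max(projected)


mu = [(0.0, 0.0), (1.0, 0.0)]
nu = [(0.0, 3.0), (1.0, 3.0)]  # mu shifted by 3 along the second axis, so W1 = 3
anchors = [(0.0, 0.0), (0.0, -100.0)]
bound = observable_lower_bound(mu, nu, [distance_observable(p) for p in anchors])
```

Each observable costs only a sort, and adding more observables can only tighten the bound, which is the accuracy-versus-cost trade-off the summary mentions.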

2605.09890 2026-05-12 cs.CR cs.LG

Deep Learning under Fractional-Order Differential Privacy

Mohammad Partohaghighi, Roummel Marcia

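AI summary: This paper proposes fractional-order differentially private stochastic gradient descent (FO-DP-SGD). By incorporating a finite-window, power-law-weighted aggregate of past outputs, it adds memory to the privacy mechanism while keeping the standard sum-then-noise structure. With the privacy guarantee intact, the method reduces per-step sensitivity, improving model accuracy and the privacy-utility trade-off. Experiments show that FO-DP-SGD outperforms existing private optimization methods on several datasets.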

Abstract

Differentially private stochastic gradient descent (DP-SGD) is a standard approach to privacy-preserving learning based on per-example clipping, subsampling, Gaussian perturbation, and privacy accounting. Classical DP-SGD releases a noisy version of the current clipped subsampled gradient sum. We propose Fractional-Order Differentially Private Stochastic Gradient Descent (FO-DP-SGD), a mechanism-level extension that replaces this current-only query, before Gaussian noise is added, with a fractional recursive query combining the current clipped sum with a finite-window, power-law-weighted aggregation of previously released private sum-level outputs. This injects fractional memory into the release mechanism while preserving the standard sum-then-noise-then-divide structure. Under add/remove adjacency with Poisson subsampling, the current-step sensitivity analysis shows that the only newly data-dependent term is the scaled current clipped sum. Hence, conditioned on the private history, the effective $\ell_2$-sensitivity is at most $\beta C$, where $C$ is the clipping threshold and $\beta \in (0,1]$ controls the current-step contribution. Thus, FO-DP-SGD admits standard per-step Rényi differential privacy accounting via a Poisson-subsampled Gaussian mechanism with effective noise-to-sensitivity ratio $\sigma/\beta$, and composes to yield overall $(\varepsilon,\delta)$-differential privacy guarantees. FO-DP-SGD provides a framework for studying long-memory effects in private optimization. The fractional order, memory window, and mixing coefficient govern the trade-off among current-step sensitivity, signal retention, and private-history influence. Experiments on SVHN, CIFAR-10, and CIFAR-100 show improved test accuracy and privacy-utility performance over DP-SGD and private baselines including DP-Adam, DP-IS, SA-DP-SGD, ADP-AdamW, DP-SAT, and DP-Adam-AC.
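
The sensitivity argument in the abstract can be checked numerically on a toy query. The sketch below is not the authors' mechanism: the abstract does not give the exact recursion, so the mixing form (beta times the clipped sum plus (1 - beta) times a weighted average of past releases), the weight law in `power_law_weights`, and the noise scale sigma times the clipping threshold are all assumptions chosen to be consistent with the stated sensitivity bound and noise-to-sensitivity ratio.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def clip(grad: Sequence[float], c: float) -> Vector:
    """Scale a per-example gradient so its l2 norm is at most c."""
    norm = math.sqrt(sum(x * x for x in grad))
    scale = 1.0 if norm <= c else c / norm
    return [x * scale for x in grad]


def power_law_weights(window: int, order: float) -> List[float]:
    """Assumed memory weights: lag j gets weight proportional to j^-(1 + order)."""
    raw = [(lag + 1) ** -(1.0 + order) for lag in range(window)]
    total = sum(raw)
    return [r / total for r in raw]


def fractional_query(
    grads: Sequence[Sequence[float]],
    history: Sequence[Sequence[float]],
    dim: int,
    c: float,
    beta: float,
    order: float,
) -> Vector:
    """Assumed query: beta * clipped sum + (1 - beta) * weighted past releases."""
    query = [0.0] * dim
    for grad in grads:
        for k, x in enumerate(clip(grad, c)):
            query[k] += beta * x
    if history:
        weights = power_law_weights(len(history), order)
        # history[-1] is the most recent release and gets the largest weight.
        for weight, past in zip(weights, reversed(history)):
            for k in range(dim):
                query[k] += (1.0 - beta) * weight * past[k]
    return query


def release(query: Sequence[float], c: float, sigma: float, rng: random.Random) -> Vector:
    """Add Gaussian noise with std sigma * c after the sum (sum-then-noise)."""
    return [q + rng.gauss(0.0, sigma * c) for q in query]


history = [[1.0, 1.0], [0.5, -0.5]]  # previously released (already private) outputs
batch = [[3.0, 4.0], [0.1, 0.0]]
neighbor = batch + [[10.0, 0.0]]  # add/remove adjacency: one extra example
q_a = fractional_query(batch, history, dim=2, c=1.0, beta=0.5, order=0.5)
q_b = fractional_query(neighbor, history, dim=2, c=1.0, beta=0.5, order=0.5)
gap = math.dist(q_a, q_b)  # bounded by beta * c, since the history term cancels
noisy = release(q_a, c=1.0, sigma=1.0, rng=random.Random(0))
```

Because the history consists of already released outputs, it is identical for both neighboring batches and drops out of the difference; only the scaled clipped gradient of the extra example remains. The final divide step (normalizing the noisy sum by the batch size) is omitted here.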

2605.09881 2026-05-12 hep-ph cs.LG hep-ex

Dissecting Jet-Tagger Through Mechanistic Interpretability

Saurabh Rai, Sanmay Ganguly

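AI summary: This paper uses mechanistic interpretability to analyze a Particle Transformer model for top-quark tagging, aiming to uncover the internal computational circuit it uses for jet classification and the physics it represents. The study finds that a sparse circuit of six attention heads recovers most of the model's performance and has a clear source-relay-readout structure: early-layer heads act as causal sources, middle-layer heads attend selectively to jet substructure, and late-layer heads read out the signal. The results indicate that interpretability methods developed for natural language models carry over to jet classification, and that gradient descent may rediscover physically meaningful jet-tagging features without supervision.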

Comments 40 pages, 14 figures, 12 tables. Comments are welcome

2605.09863 2026-05-12 cs.CR cs.AI cs.CL cs.IR cs.LG

Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

Chunxiao Wang

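AI summary: This paper presents Nautilus Compass, a black-box method for detecting persona drift in production large language model agents, that is, agents deviating from user-specified constraints and earlier agreements over long interactions. The method needs no access to model weights: it detects drift using only the cosine similarity between user prompts and behavioral anchor texts, aggregated over BGE-m3 embeddings, so it applies to closed APIs such as Claude and GPT-4. Experiments show good drift detection performance on a dataset of real sessions at a low deployment cost.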

Comments 19 pages, 6 figures. MIT-licensed code + reproduction scripts at github.com/chunxiaoxx/nautilus-compass

Abstract

Production LLM coding agents drift over long sessions: they forget user-specified constraints, slip into mistakes the user already flagged, and confabulate prior agreements. White-box approaches such as persona vectors require model weights and so cannot be applied to closed APIs (Claude, GPT-4) that most users actually interact with. We present Nautilus Compass, a black-box persona drift detector and agent memory layer for production coding agents. The method operates entirely at the prompt-text layer: cosine similarity between user prompts and behavioral anchor texts, aggregated by a weighted top-k mean using BGE-m3 embeddings. Compass is, to our knowledge, the only public agent memory layer (among Mem0, Letta, Cognee, Zep, MemOS, smrti verified May 2026) that does not call an LLM at index time to extract facts or build a graph; raw conversation text is embedded directly. The system ships as a Claude Code plugin, an MCP 2024-11-05 A2A server (Cursor, Cline, Hermes), a CLI, and a REST API on one daemon, with a Merkle-chained audit log for tamper-evident anchor updates. On a held-out test set built from real Claude Code session traces and labeled by an independent LLM judge, Compass reaches ROC AUC 0.83 for drift detection. The embedded retrieval pipeline scores 56.6% on LongMemEval-S v0.8 and 44.4% on EverMemBench-Dynamic (n=500), topping the four published EverMemBench Table 4 baselines. LongMemEval-S 56.6% is ~30 points below recent white-box leaders (90+%); we treat that as the architectural ceiling of the no-extraction design. End-to-end reproduction cost is $3.50 (~14x cheaper than GPT-4o-judged stacks). A paired cross-vendor behavior A/B accompanies these numbers as preliminary system-level evidence. Code, anchors, frozen test data, and audit-log tooling are MIT-licensed at github.com/chunxiaoxx/nautilus-compass.
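
The scoring step named in the abstract (cosine similarity to anchor texts, aggregated by a weighted top-k mean) can be sketched without any embedding library. This is not the Compass code: the vectors stand in for BGE-m3 embeddings, the default rank weights (1, 1/2, 1/3, ...) are an assumption because the abstract does not state the weighting, and the abstract does not say how the aggregated score is turned into a drift decision, so no threshold is shown.

```python
import math
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_top_k_mean(
    scores: Sequence[float], k: int, weights: Optional[Sequence[float]] = None
) -> float:
    """Weighted mean of the k largest scores; weights default to 1 / rank."""
    top = sorted(scores, reverse=True)[:k]
    if weights is None:
        used: List[float] = [1.0 / (rank + 1) for rank in range(len(top))]
    else:
        used = list(weights[: len(top)])
    return sum(w * s for w, s in zip(used, top)) / sum(used)


def anchor_score(
    prompt_vec: Sequence[float], anchor_vecs: Sequence[Sequence[float]], k: int = 3
) -> float:
    """Aggregate similarity between one prompt embedding and all anchor embeddings."""
    return weighted_top_k_mean([cosine(prompt_vec, a) for a in anchor_vecs], k)


# Hand-written 2D vectors standing in for real prompt and anchor embeddings.
prompt = [1.0, 0.0]
anchors = [[1.0, 0.0], [0.0, 1.0]]
score = anchor_score(prompt, anchors, k=2)
```

Since only prompt text and anchor text are embedded, the whole computation runs outside the model, which is why the approach works with closed APIs.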
