arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.25304 2026-05-26 cs.LG cs.CR cs.CV

When Interpretability Becomes a Liability: Adversarial Attacks on CBM Concept Layers

Aditya Sridhar

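Affiliations: Independent Researcher

AI summary: This paper systematically studies the adversarial vulnerability of the concept layer in Concept Bottleneck Models (CBMs) and proposes SPECTRA, a stability regularization defense based on semantic perturbations, which substantially raises the minimal perturbation norm required for an attack while maintaining classification accuracy.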

Comments Accepted to CVPR 2026 (Findings). 9 pages, 6 figures

Abstract

Concept Bottleneck Models (CBMs) have emerged as a cornerstone approach for interpretable machine learning, providing human-understandable intermediate representations through explicit concept activations. However, this interpretability fundamentally introduces a critical, previously unexplored attack surface: the concept bottleneck layer itself. We present a comprehensive, systematic study of concept-level adversarial vulnerabilities in CBMs, revealing that targeted, minimal perturbations operating on input pixels can induce catastrophic misclassification by manipulating semantic representations. We develop a rigorous theoretical framework to quantify concept-space robustness, establishing novel metrics that expose the vulnerability landscape of these architectures. Our extensive analysis on the CUB-200-2011 dataset demonstrates that standard CBMs exhibit severe susceptibility to concept-level manipulation. To address this critical weakness, we introduce SPECTRA (Semantic Perturbation-based Concept Training for Robustness against Attacks), a principled stability regularization defense. SPECTRA effectively hardens the semantic representation space, increasing the minimal perturbation norm required for a successful attack from 0.46 to over 4,200, rendering targeted concept manipulation computationally prohibitive. Furthermore, SPECTRA preserves baseline classification accuracy to within 2.2%. By establishing concept-level attacks as a fundamentally distinct threat model, this work opens a new research frontier at the intersection of interpretable machine learning and adversarial robustness.

2605.25294 2026-05-26 cs.CV

Geometry-Aware Image Flow Matching

Junho Lee, Kwanseok Kim, Joonseok Lee

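Affiliations: Seoul National University, Seoul, Korea

AI summary: Finding that the semantic information of natural images is mainly encoded in directional components, this paper proposes two geometry-aware methods, Spherical Optimal Transport Flow Matching (SOT-CFM) and Spherical Flow Matching (SFM), which model images on a hypersphere and outperform Euclidean baselines.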

Abstract

Recent advances in generative models highlight the power of geometry-aware modeling in manifold-constrained settings. Yet, for natural images, the field remains confined to Euclidean assumptions, failing to exploit the potential of intrinsic geometric structures within the data. In this work, we investigate the geometry of natural images and observe that semantic information is predominantly encoded in directional components, while norm components can be approximated by the global average. This property holds across both RGB and latent spaces, suggesting that natural images can be effectively modeled on a hypersphere. Building on this finding, we introduce Spherical Optimal Transport Flow Matching (SOT-CFM), which utilizes angular distance, and Spherical Flow Matching (SFM), which constrains dynamics directly on the manifold. Our experiments demonstrate that these geometry-aware methods achieve superior performance against Euclidean baselines. Ultimately, this work provides a novel perspective that bridges the gap between Riemannian manifold-based modeling and natural image generation.
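
The hypersphere view described in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It splits a vector into its norm and unit direction, then interpolates two directions along the great-circle geodesic (spherical linear interpolation), the kind of path that dynamics constrained to the sphere would follow. The function names `split_norm_direction` and `slerp` are hypothetical.

```python
import math

def split_norm_direction(x):
    """Return (norm, unit direction) of a nonzero vector."""
    norm = math.sqrt(sum(v * v for v in x))
    return norm, [v / norm for v in x]

def slerp(u, v, t):
    """Interpolate between unit vectors u and v along the great circle.

    Antipodal inputs (angle of pi) have no unique geodesic and are not
    handled by this sketch.
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    theta = math.acos(dot)  # angular distance between u and v
    if theta < 1e-12:
        return list(u)  # directions coincide; nothing to interpolate
    s = math.sin(theta)
    w_u = math.sin((1.0 - t) * theta) / s
    w_v = math.sin(t * theta) / s
    return [w_u * a + w_v * b for a, b in zip(u, v)]
```

Every interpolated point keeps unit norm, so only the direction changes along the path; the norm can be handled separately, for example with a global average as the abstract suggests.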

2605.25293 2026-05-26 cs.CV cs.AI cs.RO

Neuromorphic LiDAR-based Bird's Eye View Object Detection using Energy-efficient Spiking Neural Networks

Sambit Mohapatra, Senthil Yogamani, Heinrich Gotzig, Patrick Mader

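Affiliations: Valeo, Germany; Valeo, Ireland; TU Ilmenau, Germany

AI summary: Proposes an end-to-end spiking encoder-decoder network for object detection in bird's eye view representations of LiDAR point clouds, trained with surrogate gradient backpropagation; it reaches high accuracy on the KITTI benchmark and a 3.33x reduction in synaptic operation energy.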

Abstract

Autonomous driving perception demands accurate and efficient processing of three-dimensional sensor data under strict power constraints. Traditional convolutional neural networks achieve strong detection accuracy but are computationally intensive, limiting their suitability for deployment on resource-constrained neuromorphic platforms. Spiking neural networks offer a compelling alternative through event-driven sparse computation, yet their application to complex real-world perception tasks such as three-dimensional object detection remains limited. In this work, we propose an end-to-end spiking encoder-decoder network for object detection in bird's eye view representations of LiDAR point clouds, trained using surrogate gradient backpropagation. We train two variants: a membrane potential variant that reads continuous neuron state at the output stage for maximum accuracy, achieving $92.05$/$87.04$/$86.51$ AP at $\mathrm{IoU}\!=\!0.5$ (Easy/Moderate/Hard), and a fully binary spiking variant that operates exclusively on spike trains at every layer for direct neuromorphic deployment. We evaluate four input spike encoding strategies and demonstrate that allowing the network to learn spike representations directly from data outperforms hand-crafted Poisson, latency, and z-axis encoding schemes on the KITTI benchmark, where sequential frames are unavailable and the BEV input is presented repeatedly across timesteps as a proxy for temporal streaming. A block-wise energy analysis shows a $3.33\times$ reduction in synaptic operation energy over an equivalent CNN under conservative loop-based operation. Together, these results demonstrate the viability of spiking neural networks for accurate and energy-efficient neuromorphic perception in autonomous driving.

2605.25284 2026-05-26 cs.CL

Knowing but Not Showing: LLMs Recognize Ambiguity but Rarely Ask Clarifying Questions

Jinyan Su, Claire Cardie

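Affiliations: Cornell University

AI summary: Studies the behavioral gap between large language models recognizing ambiguity in user queries and proactively asking clarifying questions; models can recognize ambiguity but default to answering directly, and retrieved context further reduces clarification behavior.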

Abstract

User queries are often underspecified and may admit multiple valid interpretations. Rather than silently making assumptions about the user's intent, a helpful assistant should surface such ambiguity by asking a clarifying question. Doing so requires two abilities: recognizing that a query is ambiguous, and acting on that recognition by seeking clarification instead of answering directly. To study these abilities, we evaluate models on ambiguous, unambiguous, and disambiguated questions in three settings: standard question answering, explicit ambiguity judgment, and behavioral analysis, where a judge model classifies responses as direct answers, refusals, or clarifying questions. We find a clear gap between recognition and behavior: models often identify ambiguity when explicitly asked to judge it, yet in the QA setting they overwhelmingly default to direct answers. Retrieved context further widens this gap by improving answerability while making models even less likely to ask clarifying questions.

2605.25279 2026-05-26 cs.RO

GreenSeg: Ground Segmentation Algorithm for Agricultural Robots in Mediterranean Greenhouses using RGB-D Point Clouds

Fernando Cañadas-Aránega, José C. Moreno, José L. Blanco-Claraco

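Affiliations: Department of Informatics, CIESOL, ceiA3, Universidad de Almería

AI summary: To address the narrow aisles, heterogeneous terrain, and optical interference of Mediterranean greenhouses, proposes GreenSeg, a dual-layer validation ground segmentation framework based on RGB-D sensing that uses global plane fitting, curvature filtering, and seed-point region growing for stable navigation; validation on the AGRICOBIOT I platform shows that it outperforms benchmark methods under changing illumination.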

Abstract

Greenhouse agriculture in the Mediterranean region faces significant automation challenges due to its unique structural and environmental constraints. These environments are characterized by extremely narrow aisles, heterogeneous terrains ranging from concrete to tilled soil, and severe optical interference caused by polyethylene covers, which induce specular reflections and "ghost points" in depth sensors. While autonomous navigation is essential for digitizing agricultural tasks, traditional solutions often rely on expensive 3D LiDAR systems that are economically unscalable for most facilities. To address this, this paper presents GreenSeg, a robust perception framework for autonomous navigation using RGB-D sensing. The proposed method introduces a dual-layer validation strategy: a robust global plane fitting combined with a surface curvature filter for terrain adaptability, and a seed-point-based Region Growing constraint to ensure the spatial continuity of the navigable plane. Experimental validation was conducted using the AGRICOBIOT I platform across four diurnal scenarios with varying solar elevations. The results show that GreenSeg consistently outperforms benchmark segmentation methods, achieving peak improvements of 11.58% in mean Recall and 19.24% in mIoU during critical rotational maneuvers at the end of corridors. These findings confirm that the proposed algorithm enables stable and safe autonomous navigation in unstructured, dynamic agricultural environments that are subject to budget constraints and sensitive to lighting conditions.

2605.25275 2026-05-26 cs.LG

Label-NTK Alignments and A Tighter Convergence Bound in the NTK Regime

Ruchirinkil Marreddy, Chaoyue Liu

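Affiliations: Elmore Family School of Electrical and Computer Engineering

AI summary: Uses the alignment between labels and the NTK eigen-spectrum to derive a tighter convergence bound that significantly improves on classical worst-case results.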

Abstract

The Neural Tangent Kernel (NTK) framework explains optimization in over-parameterized neural networks via approximately linearized dynamics, yielding exponential convergence guarantees. However, existing results are often overly pessimistic and do not match the fast training observed in practice, as they depend on the smallest NTK eigenvalue, which is typically extremely small. In this work, we develop sharper convergence guarantees by characterizing the interaction between data labels and the NTK eigen-spectrum. We identify two key phenomena, Label-NTK alignment and Residual-NTK alignment, showing that projections of labels and residuals onto NTK eigenvectors scale with the corresponding eigenvalues. We provide empirical evidence and theoretical justification under mild data assumptions. Exploiting these alignment properties, we derive a refined convergence bound that depends on the full spectrum and closely matches practical training dynamics, significantly improving over classical worst-case results. We further obtain improved generalization bounds. Experiments on MLPs and CNNs across multiple datasets validate our theory.
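
To see why a bound based on the full spectrum can be much tighter than one based on the smallest eigenvalue, here is a minimal sketch using the standard linearized gradient-descent identity for squared loss, in which the residual component along eigenvector $i$ shrinks by a factor $(1 - \eta\lambda_i)$ per step. This is not the paper's refined bound; the toy spectrum is made up, and the sketch assumes $\eta\lambda_{\max} \le 1$. The function names are hypothetical.

```python
def residual_sq_norm(eigvals, proj, lr, steps):
    """Squared residual norm under linearized gradient descent.

    eigvals: kernel eigenvalues; proj: projections of the initial residual
    onto the matching eigenvectors.
    """
    return sum(((1.0 - lr * lam) ** (2 * steps)) * (p * p)
               for lam, p in zip(eigvals, proj))

def worst_case_bound(eigvals, proj, lr, steps):
    """Classical bound that uses only the smallest eigenvalue."""
    lam_min = min(eigvals)
    return ((1.0 - lr * lam_min) ** (2 * steps)) * sum(p * p for p in proj)

# Toy spectrum (made up): projections proportional to the eigenvalues,
# mimicking the alignment described in the abstract.
eigvals = [1.0, 0.1, 0.001]
proj = list(eigvals)
exact = residual_sq_norm(eigvals, proj, lr=0.5, steps=100)
bound = worst_case_bound(eigvals, proj, lr=0.5, steps=100)
```

In this toy case the slowest direction carries almost none of the residual, so `exact` is several orders of magnitude below `bound`.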

2605.25272 2026-05-26 cs.AI cs.CY stat.AP

AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems

Michael Hardy, Anka Reuel, Lijin Zhang, Jodi M. Casabianca, Sang Truong, Yash Dave, Hansol Lee, Benjamin Domingue, Sanmi Koyejo

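Affiliations: Open LLM Leaderboard; HELM ICML

AI summary: To address measurement noise in leaderboard scores, proposes a framework based on Confirmatory Factor Analysis and Generalizability Theory that decomposes sources of ranking variance, reveals relationships between benchmarks, local dependence, and the influence of metadata, and compares the reliability of manifest and latent scaling laws.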

Abstract

While aggregate leaderboard scores drive AI development, they contain substantial measurement noise whose sources and magnitudes remain unquantified, making it unclear when rankings reflect genuine capability differences versus evaluation artifacts. We introduce a framework for measuring the latent landscape in AI benchmark ecosystems. Applying Confirmatory Factor Analysis (CFA) and Generalizability Theory to 4,000+ models from the Open LLM Leaderboard, we decompose sources of ranking variance and establish: (1) structures assumed in current reporting practice underestimate the strength of relationships between benchmarks; (2) evidence of local dependence among leaderboard items, undermining uses of benchmarks as measurement instruments under current scoring systems; (3) contributor metadata explains more rank-relevant variance ($\approx 9\%$) than architecture or deployment categories in this context; (4) a manifest-score "scaling law" slope has low reliability ($R_\beta=0.53$); by contrast, the latent general-factor size slope is highly stable across ecosystem controls ($R_g=0.97$). We are able to provide unique insights into benchmark dynamics, such as which benchmarks are a function of LLM size and which can be oppositely impacted by post-training practices. We provide actionable diagnostics to determine how benchmark rankings can be trusted and how benchmark design can be improved.

2605.25267 2026-05-26 cs.LG cs.AI

Latent Q-Barrier Shielding for Safe In-Context Reinforcement Learning

Minjae Kwon, Amir Moeini, Shangtong Zhang, Lu Feng

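Affiliations: University of Virginia

AI summary: Proposes a latent Q-Barrier shield that learns a context representation, latent dynamics, and an ensemble cost critic, and at deployment filters or softly reweights candidate actions using the remaining budget and predicted future cost without parameter updates, improving the reward-safety tradeoff of safe in-context reinforcement learning under out-of-distribution shifts.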

Abstract

Safe in-context reinforcement learning (ICRL) adapts online from interaction history without test-time parameter updates while controlling episode cost under a safety budget. Under out-of-distribution (OOD) deployment shifts, pretraining-only safe ICRL can give poor reward-safety tradeoffs because the remaining budget affects behavior only through frozen policy conditioning, not an explicit action-level check against predicted future cost. We propose a latent Q-Barrier shield that learns a context representation, latent dynamics, and an ensemble cost critic before deployment. Without parameter updates, the shield infers context from history and filters or softly reweights candidate actions using the remaining budget and predicted future cost. We prove a conditional, error-decomposed barrier-margin result: a Q-Barrier-satisfying action leaves the next latent-budget state with an approximately budget-safe continuation under the learned critic, up to Bellman and latent-prediction errors. Across five safe ICRL benchmarks, the shield improves deployment-time reward-safety tradeoffs over a strong safe-ICRL baseline: after a short context window, it achieves higher return in four of five benchmarks while matching or lowering average episode cost in all five.
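
The action-level check described in this abstract can be pictured with a minimal sketch of a hard filter. It is an illustration under stated assumptions, not the authors' shield: each candidate action comes with a reward estimate and one predicted future cost per ensemble member, the ensemble maximum serves as a conservative cost estimate, and the fallback rule when nothing fits the budget is invented for this example. The name `shield_action` is hypothetical.

```python
def shield_action(candidates, remaining_budget):
    """Pick an action index from (reward_estimate, cost_estimates) pairs.

    cost_estimates holds one predicted future cost per ensemble member; the
    maximum is used as a conservative estimate (an assumption of this sketch).
    """
    scored = [(reward, max(costs), idx)
              for idx, (reward, costs) in enumerate(candidates)]
    safe = [s for s in scored if s[1] <= remaining_budget]
    if safe:
        # Highest estimated reward among actions that fit the budget.
        return max(safe, key=lambda s: s[0])[2]
    # No action fits: fall back to the lowest predicted cost.
    return min(scored, key=lambda s: s[1])[2]
```

Because the filter only reads estimates and the remaining budget, it needs no parameter updates at deployment time, which matches the setting the abstract describes.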

2605.25266 2026-05-26 cs.CV

DeltaCam: Differential Intrinsic Camera Modeling for Video Generation

Debabrata Mandal, Zhihan Peng, Yujie Wang, Praneeth Chakravarthula

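Affiliations: UNC, Chapel Hill USA

AI summary: Proposes DeltaCam, a video diffusion framework whose differentially parameterized neural camera adaptors learn relative changes, enabling smooth, controllable video generation over intrinsics such as focal length, aperture, and ISO, with an extension to real-world footage.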

Abstract

Incorporating camera intrinsics into video generation models offers a principled way to control not only scene dynamics but also the imaging process that governs visual appearance. Prior work has primarily focused on extrinsic control, such as camera pose and motion, while treating intrinsic camera parameters as implicit or fixed. A key bottleneck is the lack of large-scale video datasets with accurate and diverse temporally varying camera metadata, which makes learning absolute camera parameterizations difficult. As a result, current models struggle to incorporate photographic camera behavior, including depth-of-field transitions, exposure variations, lens distortions, and color processing, in a controllable and temporally consistent manner. We introduce DeltaCam, a video diffusion framework that models camera behavior through $\Delta$-parameterized neural camera adaptors, operating on relative changes in camera motion and intrinsics instead of absolute states. By learning this differential formulation from synthetic video data, we mitigate reliance on precise real-world camera labels and enable smooth, consistent control over imaging factors such as focal length, aperture, ISO, color temperature, and lens distortion. We extend this framework to real-world footage through two mechanisms: finetuning the controls on real image-metadata pairs for precise shot matching, and extracting disentangled embeddings for implicit video-to-video style transfer without requiring explicit camera parameters. By effectively separating scene content from intrinsic imaging behavior, DeltaCam enables camera-consistent video generation and editing operations that are difficult to achieve with existing models. Ultimately, our results establish a practical and scalable approach for bridging synthetic control and real-world photographic emulation.

2605.25263 2026-05-26 cs.CL cs.AI

Mimir: Large-scale Multilingual Concept Modeling

Elio Musacchio, Lucia Siciliani, Pierpaolo Basile

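Affiliations: Department of Computer Science; University of Bari Aldo Moro

AI summary: Proposes Mimir, a 1.6B-parameter Large Concept Model that uses multilingual pre-training and instruction tuning for concept-level understanding and generation, in place of the conventional token prediction paradigm.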

Abstract

Current language modeling approaches are built around tokens. Text corpora are split into tokens, and models are trained by performing computations on these tokens, such as predicting the next token given the preceding ones as context. This paradigm has become the standard in modern language modeling, especially given the outstanding performance obtained by token-based architectures. However, recent works have not only begun to question how language models process and understand meaning from tokens, but also to question whether using higher levels of granularity could advance the research field. This led to the idea of Concept Modeling, that is, to directly train models for next-concept prediction rather than next-token prediction. The goal is to change the input from tokens to concepts, forcing the underlying language model to shift its granularity from fine-grained tokens to broad concepts. In this work, we introduce Mimir, a 1.6B Large Concept Model trained for multilingual concept understanding and generation. We leverage a large-scale multilingual pre-training corpus (38,883,987,240 sentences) spanning 46 languages and a large-scale multi-turn and multilingual instruction-tuning dataset (66,816,428 sentences) covering a total of 35 languages. We extensively evaluate model performance against a language model with a comparable number of parameters.

2605.25262 2026-05-26 cs.CV

Semantics-Guided Multimodal Masked Autoencoder Pretraining for 3D BEV Object Detection

Prabuddhi Wariyapperuma, Rajitha de Silva, Marc Hanheide, Thomas Bohné, Leonardo Guevara

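Affiliations: University of Lincoln, Lincoln Centre for Autonomous Systems; University of Cambridge, Institute for Manufacturing, Department of Engineering

AI summary: Proposes a semantics-guided multimodal masked autoencoder framework that injects semantic information during pretraining through semantics-guided LiDAR voxel masking and an auxiliary point-wise semantic decoder branch, improving 3D BEV object detection.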

Comments Accepted at the ICRA 2026 Workshop on Semantics for Reliable Robot Autonomy (SRRA) as a lightning talk and poster

Abstract

Accurate 3D bird's-eye view (BEV) object detection is essential for autonomous driving, and depends strongly on effective multimodal representations from complementary sensors such as cameras and LiDAR. Multimodal masked autoencoders have shown strong potential for learning such representations for downstream 3D BEV object detection. However, existing methods typically apply uniform random masking to camera and LiDAR inputs, treating all regions equally, and learn representations only through masked reconstruction. We propose a semantics-guided multimodal masked autoencoder framework that introduces semantic information during pretraining through two separate components: (i) semantics-guided LiDAR voxel masking, which preserves semantically important LiDAR regions more strongly, and (ii) an auxiliary point-wise LiDAR semantic decoder branch that injects semantic guidance in addition to reconstruction. On BEVFusion 3D object detection, our semantics-guided pretraining strategy improves performance on the nuScenes mini validation set compared to the standard UniM2AE baseline: semantics-guided LiDAR voxel masking yields +1.49% mean Average Precision (mAP) and +1.66% nuScenes Detection Score (NDS), while decoder-side point semantic supervision yields +1.39% mAP and +3.22% NDS over the baseline.

2605.25254 2026-05-26 cs.CV cs.AI

Guess the Unified Model: How Much Can We Recover from Generated Images?

Jasin Cekinmez, Ryo Mitsuhashi, Addison J. Wu, Yida Yin

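Affiliations: Princeton University

AI summary: Studies the separability of images generated by unified models; experiments on large image sets from seven models show that model attribution is highly feasible and that semantic content contributes to separability without being the dominant signal.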

Abstract

With unified model-generated images now widespread online, attributing their model of origin offers a path toward transparency and deeper insight into the characteristic behaviors of individual models. Prior work has explored provenance in LLM-generated text, diffusion model images, and datasets, but the separability of unified model-generated images remains an underexplored area. We address this gap by examining separability across corruption, domains, and prompt languages using images generated by seven unified models. We show that model attribution is highly feasible, as our model achieves near-perfect accuracy with around 20K images per model. Corruptions and structural perturbations have only a modest effect on attribution performance, and cross-domain generalization reveals that semantic content contributes to separability but is not the dominant signal. Finally, we observe that for most models, prompt language attribution is around chance levels, suggesting minimal language-specific visual signatures. These findings highlight consistent model-specific visual characteristics in unified models' outputs and open new directions for tracing and auditing generative image pipelines.

2605.23491 2026-05-26 cs.LG cs.AI cs.CL

CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

Zhangyi Hu, Chenhui Liu, Tian Huang, Jindong Li, Yang Yang, Jiemin Wu, Zining Zhong, Menglin Yang, Yutao Yue

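Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Institute of Deep Perception Technology, JITRI, Wuxi, China

AI summary: Proposes CoSPlay, a framework in which code and unit tests improve each other iteratively through cooperative self-play without ground-truth unit tests, significantly improving code generation performance.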

Comments Code is available at: https://github.com/sanae-ai/CosPlay | Data & log is available at: https://huggingface.co/datasets/yomi017/CosPlay

Abstract

Recently, Reinforcement Learning with Verifiable Rewards (RLVR) and Test-Time Scaling (TTS) have advanced LLM code generation through executable verification. Yet Ground-Truth Unit Tests (GT UTs) remain a bottleneck: SOTA RLVR methods require them for costly training, while existing TTS methods lose competitiveness without them. This motivates GT-free TTS, where existing methods directly use self-generated UTs to refine and select code candidates. Yet such UTs are often noisy or spuriously coupled with wrong code, and UT quality in turn cannot be validated without reliable code. The key challenge is therefore to jointly improve both. To this end, we present CoSPlay, a GT-free, training-free framework that jointly improves codes and UTs through cooperative self-play. It first explores diverse solution ideas and identifies their potential failure modes to produce discriminative UT ideas. It then uses bidirectional pass-count signals from the Code-UT execution matrix to iteratively prune or fix weak codes and refresh or replace unreliable UTs, letting the two pools co-evolve. Finally, when multiple codes remain tied at the highest pass count, it picks the final code from the largest output-consensus cluster, since correct codes agree on the same inputs while wrong codes diverge. Experiments on four challenging benchmarks show that CoSPlay on Qwen2.5-7B-Instruct improves average BoN from 22.1% to 33.2% and UT accuracy from 14.6% to 78.3%, matching or surpassing the RLVR model CURE-7B. When applied to CURE-7B, it further improves BoN by 5.7%. CoSPlay also generalizes across diverse backbones and outperforms GT-free TTS baselines under comparable token budgets, with continued gains as the budget scales up. These results suggest a scalable inference strategy for competitive code generation without any GT data.
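
The pass-count signals and the final tie-break described in this abstract can be sketched as follows. This is a simplified illustration, not the released CoSPlay code: the execution matrix is assumed to be a 0/1 table where entry `[i][j]` records whether code `i` passes unit test `j`, and each code's outputs on a shared set of inputs are assumed to be available as a tuple. The function names are hypothetical.

```python
from collections import defaultdict

def pass_counts(matrix):
    """Row sums (per code) and column sums (per unit test) of a 0/1 matrix."""
    code_counts = [sum(row) for row in matrix]
    test_counts = [sum(col) for col in zip(*matrix)]
    return code_counts, test_counts

def select_code(matrix, outputs):
    """Choose a code index: top pass count, ties broken by output consensus.

    outputs[i] is a tuple of code i's outputs on a shared set of inputs.
    """
    code_counts, _ = pass_counts(matrix)
    best = max(code_counts)
    tied = [i for i, c in enumerate(code_counts) if c == best]
    clusters = defaultdict(list)
    for i in tied:
        clusters[outputs[i]].append(i)
    # Largest cluster wins; its first member represents it.
    return max(clusters.values(), key=len)[0]
```

The column sums give the matching signal for unit tests, which an iterative loop could use to decide which tests to refresh or replace; that loop is omitted here.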

2605.23473 2026-05-26 cs.LG cs.AI

Automated Random Embedding for Practical Bayesian Optimization with Unknown Effective Dimension

Hong Qian, Xiang Shu, Xiang Xia, Xuhui Liu, Yangde Fu, Bei Liang, Huibin Wang, Liang Dou

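Affiliations: Shanghai Institute of AI for Education, and School of Computer Science and Technology, East China Normal University; Ant Group; Nanjing University

AI summary: Proposes Dynamic Shared Embedding Bayesian Optimization (DSEBO), which automatically adjusts the subspace dimension and shares queried solutions to balance approximation and optimization errors, significantly reducing regret and time cost in high-dimensional optimization.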

Comments This paper has been accepted by IJCAI 2026

Abstract

Bayesian optimization is widely employed for optimizing complex black-box functions but struggles with the curse of dimensionality. Random embedding, as a dimension reduction strategy, simplifies tasks that possess a low effective dimension by optimizing within a low-dimensional subspace. However, determining the effective dimension of a task in advance remains a significant challenge, which influences the selection of the subspace dimensionality and the optimization performance. Traditional methods use fixed subspace dimensions provided by experts or rely on resource-intensive trial and error to estimate subspace dimensions. To this end, this paper proposes an automated random embedding for high-dimensional Bayesian optimization with unknown effective dimension, called Dynamic Shared Embedding Bayesian Optimization (DSEBO). DSEBO starts with a low dimension and switches to a higher-dimensional subspace if the solutions in the current subspace show preliminary convergence. DSEBO dynamically determines the dimension of the next subspace based on the quality of the solutions in different subspaces and shares the queried solutions with the new subspace for a better initialization. Theoretically, we derive a regret bound for DSEBO and demonstrate that DSEBO can better balance approximation and optimization errors. Extensive experiments on functions with dimensionality of varying magnitudes and real-world tasks with unknown effective dimensions reveal that, compared with state-of-the-art methods, alternating optimization across different subspaces results in significant improvements in high-dimensional optimization, both in terms of optimization regret and time.
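
For readers unfamiliar with random embedding, here is a minimal sketch of the basic mapping it relies on: a point in a low-dimensional subspace is lifted into the original search box through a random Gaussian matrix and then clipped to the box. It shows only this generic building block under stated assumptions and not DSEBO's dimension-switching or solution-sharing logic. The function names and the default box bounds are hypothetical.

```python
import random

def make_embedding(high_dim, low_dim, seed=0):
    """Draw a high_dim x low_dim Gaussian matrix A."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(low_dim)]
            for _ in range(high_dim)]

def embed(A, y, lower=-1.0, upper=1.0):
    """Map a low-dimensional point y to x = clip(A y) in the original box."""
    x = [sum(a * v for a, v in zip(row, y)) for row in A]
    return [min(upper, max(lower, xi)) for xi in x]
```

An optimizer would then search over `y` and evaluate the black-box function at `embed(A, y)`; choosing `low_dim` is exactly the difficulty the abstract raises when the effective dimension is unknown.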

2605.23454 2026-05-26 cs.CL

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

ARES: 面向可扩展大语言模型强化学习的自动评分标准合成

Xiaoyuan Li, Keqin Bao, Moxin Li, Yubo Ma, Yichang Zhang, Wenjie Wang, Fuli Feng, Dayiheng Liu

发表机构 * University of Science and Technology of China(中国科学技术大学) Alibaba Group(阿里巴巴集团) National University of Singapore(新加坡国立大学)

AI总结 提出ARES框架,从原始预训练文档自动生成问答对和实例级加权评分标准,用于可扩展的基于评分标准的强化学习,在多个开放任务上超越持续预训练、监督微调和二元奖励强化学习。

Comments Under Review

详情
AI中文摘要

基于评分标准的奖励为将强化学习扩展到大型语言模型提供了一种有前景的方式,超越了具有自动可验证答案的任务。然而,扩展基于评分标准的强化学习仍然具有挑战性:现有方法通常依赖专家编写的评分标准和手动构建的问题集,而固定的任务级评分标准可能无法捕捉单个问题的评估需求。我们提出ARES(面向可扩展强化学习的自动评分标准合成),一个自动构建基于评分标准的强化学习数据的框架。从原始预训练文档开始,ARES将源知识转换为自包含的问答对,并共同生成特定问题的加权评分标准,从而为开放式回答提供实例级奖励监督。为了提高多样性和质量,ARES基于领域标签和人物角色信息生成,并应用验证过滤器以确保问题自包含性、答案忠实性和评分标准有效性。使用ARES,我们在十个领域构建了10万个评分标准标注的实例。在七个基准上的实验表明,使用ARES训练的基于评分标准的强化学习优于持续预训练、监督微调和二元奖励强化学习,在医疗和指令遵循等多维开放任务上提升最大。

英文摘要

Rubric-based rewards offer a promising way to extend reinforcement learning (RL) for large language models beyond tasks with automatically verifiable answers. However, scaling rubric-based RL remains challenging: existing approaches often rely on expert-written rubrics and manually constructed question sets, while fixed task-level rubrics may fail to capture the evaluation requirements of individual questions. We propose ARES (Automated Rubric synthEsis for Scalable RL), a framework for automatically constructing rubric-based RL data at scale. Starting from raw pretraining documents, ARES converts source knowledge into self-contained question-answer pairs and co-generates question-specific weighted rubrics, enabling instance-level reward supervision for open-ended responses. To improve diversity and quality, ARES conditions generation on domain labels and persona information, and applies validation filters for question self-containment, answer faithfulness, and rubric validity. Using ARES, we construct 100K rubric-annotated instances across ten domains. Experiments on seven benchmarks show that rubric-based RL trained with ARES outperforms continual pretraining, supervised fine-tuning, and binary-reward RL, with the largest gains on multi-dimensional open-ended tasks such as healthcare and instruction following.
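
To make the contrast with binary rewards concrete, here is a minimal sketch of how a question-specific weighted rubric could be turned into a scalar reward. The normalization (satisfied weight divided by total weight) and the function name `rubric_reward` are assumptions for illustration; the abstract does not specify the aggregation rule.

```python
def rubric_reward(weights, satisfied):
    """Return the fraction of rubric weight that a response satisfies (assumed rule).

    weights:   positive importance weights, one per rubric criterion
    satisfied: booleans from a judge, one per rubric criterion
    """
    if len(weights) != len(satisfied):
        raise ValueError("weights and satisfied must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    return sum(w for w, ok in zip(weights, satisfied) if ok) / total
```

A response that meets the heavily weighted criterion and one minor criterion out of three, for example `rubric_reward([3, 1, 1], [True, False, True])`, receives partial credit instead of the all-or-nothing signal a binary reward would give.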

2605.23395 2026-05-26 cs.LG

Convex Compositional Reasoning Models

凸组合推理模型

Meir Roketlishvili, Semyon Semenov, Maksim Bobrin, Viktor Kovalchuk, Albert Baichorov, Abduragim Shtanchaev, Fakhri Karray, Dmitry V. Dylov, Martin Takáč, Arip Asadulaev

发表机构 * Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学) Applied AI Institute(应用人工智能研究所) Computational Imaging Lab(计算成像实验室)

AI总结 针对组合推理中能量景观的非凸几何瓶颈,提出凸组合能量最小化框架,通过输入凸神经网络参数化因子并优化紧凸松弛,实现确定性投影一阶优化,在小问题上训练后可零样本迁移到大实例。

详情
AI中文摘要

组合能量模型可以通过在许多局部约束中重用学习到的因子能量,泛化到更大的组合推理问题。在本文中,我们表明组合推理的一个关键瓶颈不是组合本身,而是学习到的能量景观的非凸几何。为了解决这个问题,我们引入了凸组合能量最小化(CCEM),这是一个用输入凸神经网络参数化每个因子,并在可行集的紧凸松弛上优化组合能量的框架。由于凸性在求和下保持不变,全局松弛目标保持凸性,从而能够进行确定性投影一阶优化。CCEM分两个阶段训练:因子级对比学习以塑造局部能量盆地,然后通过展开的投影求解器进行端到端细化。我们的实验表明,在小子问题或单个问题规模上训练的模型可以无需重新训练地迁移到更大的实例。

英文摘要

Compositional energy-based models can generalize to larger combinatorial reasoning problems by reusing a learned factor energy across many local constraints. In our paper, we show that a key bottleneck in compositional reasoning is not composition itself, but the non-convex geometry of the learned energy landscape. To solve this problem, we introduce Convex Compositional Energy Minimization (CCEM), a framework that parameterizes each factor with an input-convex neural network and optimizes the composed energy over a tight convex relaxation of the feasible set. Because convexity is preserved under summation, the global relaxed objective remains convex, enabling deterministic projected first-order optimization. CCEM is trained in two stages: factor-level contrastive learning to shape local energy basins, followed by end-to-end refinement through an unrolled projected solver. Our experiments show that our models trained on small subproblems or a single problem size transfer to larger instances without retraining.
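
The argument that a sum of convex factors stays convex, so that deterministic projected first-order optimization applies, can be illustrated with a small sketch. The quadratic pairwise factor below is a hypothetical stand-in for a learned input-convex neural network, and the box [0, 1]^n stands in for the convex relaxation of the feasible set; neither is taken from the paper.

```python
def factor_energy(a, b):
    """Convex pairwise factor (stand-in for a learned input-convex network)."""
    return (a + b - 1.0) ** 2

def factor_grad(a, b):
    """Gradient of factor_energy with respect to (a, b)."""
    g = 2.0 * (a + b - 1.0)
    return g, g

def total_energy(x, pairs):
    """Composed energy: the same factor reused across many local constraints."""
    return sum(factor_energy(x[i], x[j]) for i, j in pairs)

def projected_gradient(x, pairs, lr=0.1, steps=200):
    """Gradient step on the summed energy, then projection onto the box [0, 1]^n."""
    x = list(x)
    for _ in range(steps):
        grad = [0.0] * len(x)
        for i, j in pairs:
            gi, gj = factor_grad(x[i], x[j])
            grad[i] += gi
            grad[j] += gj
        x = [min(1.0, max(0.0, v - lr * g)) for v, g in zip(x, grad)]
    return x
```

The same `factor_energy` is applied to every pair, so the solver works unchanged when `pairs` describes a larger instance, which mirrors the size-transfer setting described in the abstract.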

2605.23163 2026-05-26 cs.CL

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

Fast-dDrive:面向自动驾驶的高效块扩散视觉语言模型

Kewei Zhang, Jin Wang, Sensen Gao, Chengyue Wu, Yulong Cao, Songyang Han, Boris Ivanovic, Langechuan Liu, Marco Pavone, Song Han, Daquan Zhou, Enze Xie

发表机构 * Peking University(北京大学) NVIDIA The University of Hong Kong(香港大学) MIT(麻省理工学院)

AI总结 提出Fast-dDrive,一种块扩散视觉语言动作模型,通过语义单元内双向细化与跨单元因果约束,结合结构化令牌冻结、分段感知训练和推测解码,实现高保真轨迹规划与高效推理,在WOD-E2E和nuScenes上达到最优性能,推理速度提升12倍。

详情
AI中文摘要

通过视觉-语言-动作(VLA)模型实现的端到端自动驾驶需要在高保真轨迹规划与高效推理之间取得不稳定的平衡。现有范式通常存在不足:自回归(AR)VLA在边缘硬件上受限于内存带宽,且容易产生曝光偏差漂移;而全序列扩散模型无法复用KV缓存,并遭受违反基本感知-规划因果关系的“逻辑泄漏”。我们提出Fast-dDrive,一种块扩散VLA,它在语义单元内执行双向细化,同时强制跨单元严格因果排序。利用驾驶VLA通常输出结构化JSON式输出的观察,Fast-dDrive将结构令牌冻结为节支架,并采用节感知训练策略,优先考虑安全关键规划。我们进一步引入支架推测解码,以显著更高的吞吐量实现AR等效质量。最后,我们提出一种低开销的测试时缩放方案:通过从单个共享前缀KV缓存分叉出N个随机轨迹展开并取平均,以极小的计算成本有效抑制预测方差。实验结果表明,Fast-dDrive重新定义了驾驶智能体的速度-精度边界。在WOD-E2E测试集上,Fast-dDrive在3秒和5秒平均位移误差(ADE)上达到最优,同时在基于扩散的VLA中具有最高的RFS;在nuScenes上,它将平均L2误差降至0.32米(提升22%)。当与SGLang集成时,我们的框架相比AR基线实现了12倍的吞吐量提升,缩小了高容量VLA与实时车载部署效率需求之间的差距。

英文摘要

End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive (AR) VLAs are memory-bandwidth-bound on edge hardware and prone to exposure-bias drift, while full-sequence diffusion models preclude KV-cache reuse and suffer from "logical leakage" that violates the fundamental perceive-then-plan causality. We present Fast-dDrive, a block-diffusion VLA that performs bidirectional refinement within semantic units while enforcing strict causal ordering across them. Leveraging the observation that driving VLAs often emit structured JSON-like outputs, Fast-dDrive freezes structural tokens into a section scaffold and employs a section-aware training recipe that prioritizes safety-critical planning. We further introduce Scaffold Speculative Decoding to achieve AR-equivalent quality at significantly higher throughput. Finally, we propose a low-overhead test-time scaling scheme: by forking N stochastic trajectory rollouts from a single shared-prefix KV cache and averaging them, we effectively suppress prediction variance at a fractional computational cost. Empirical results demonstrate that Fast-dDrive redefines the speed-accuracy frontier for driving agents. On the WOD-E2E test set, Fast-dDrive achieves SOTA ADE@3s and ADE@5s, alongside the highest RFS among diffusion-based VLAs; on nuScenes, it reduces average L2 error to 0.32 m (a 22% improvement). When integrated with SGLang, our framework delivers a 12x throughput speedup over the AR baseline, narrowing the gap between high-capacity VLAs and the efficiency demands of real-time on-vehicle deployment.

2605.23148 2026-05-26 cs.CL cs.CY

When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

当症状不足时:大语言模型精神科筛查中的证据加权模式

Jianfeng Zhu, Megan Korhummel, Ruoming Jin, Karin G. Coifman

发表机构 * Departments of Computer Science and Psychological Science(计算机科学系和心理学系)

AI总结 本研究引入SCID锚定基准,评估五个大语言模型在精神科筛查中的表现,发现模型在焦虑症、抑郁症和创伤后应激障碍分类中,当存在功能保留或保护性背景时倾向于低估症状证据,导致假阴性错误。

Comments 25 pages, 7 figures

详情
AI中文摘要

随着心理健康护理需求超过临床医生提供的评估,对可扩展筛查工具的需求日益增加。大语言模型(LLMs)可能从患者叙述中识别精神科风险,但其在不同诊断、人口统计亚组和证据使用模式中的可靠性仍不确定。我们引入了一个基于SCID的基准,包含555个半结构化体验访谈,并配有焦虑症、重度抑郁症、创伤后应激障碍和任何当前心理健康障碍的诊断参考标签。使用零样本任务特定提示,我们评估了五个最先进的LLM,并检查假阴性错误是否反映了遗漏的精神科证据或对症状、功能损害和保护性背景线索的差异化加权。不同任务和模型的表现各异,准确率从0.49到0.86,马修斯相关系数从0.16到0.38。GPT-4.1 Mini和GPT-5 Mini显示出最一致的疾病特异性准确率。亚组分析发现,男性参与者的抑郁症分类准确率高于女性,没有一致的年龄相关模式,种族阶层间存在适度的非均匀变异。证据整合分析显示,假阴性的焦虑症和PTSD分类通常包含明确的症状证据,但伴有功能保留、应对能力或社会支持。功能损害证据使模型输出偏向阳性分类,而保护性背景证据则使输出偏离。这些发现表明,LLMs可能支持可扩展的精神科筛查,但它们在功能保留或保护性背景下低估症状证据的倾向需要在临床部署前进行仔细验证。

英文摘要

As demand for mental health care outpaces clinician-delivered assessment, scalable screening tools are increasingly needed. Large language models (LLMs) may identify psychiatric risk from patient narratives, but their reliability across diagnoses, demographic subgroups, and evidence-use patterns remains uncertain. We introduce a SCID-anchored benchmark of 555 semi-structured experiential interviews paired with diagnostic reference labels for anxiety disorder, major depressive disorder, post-traumatic stress disorder, and any current mental health disorder. Using zero-shot task-specific prompting, we evaluated five state-of-the-art LLMs and examined whether false-negative errors reflected missed psychiatric evidence or differential weighting of symptom, functional-impairment, and protective-context cues. Performance varied across tasks and models, with accuracy ranging from 0.49 to 0.86 and Matthews correlation coefficients from 0.16 to 0.38. GPT-4.1 Mini and GPT-5 Mini showed the most consistent disorder-specific accuracy. Subgroup analyses found higher depression-classification accuracy among male than female participants, no consistent age-related pattern, and modest non-uniform variation across race strata. Evidence-integration analyses showed that false-negative anxiety and PTSD classifications often contained explicit symptom evidence but were accompanied by preserved functioning, coping ability, or social support. Functional-impairment evidence shifted model outputs toward positive classifications, whereas protective-context evidence shifted outputs away. These findings suggest that LLMs may support scalable psychiatric screening, but their tendency to discount symptom evidence in the presence of preserved functioning or protective context requires careful validation before clinical deployment.
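
The gap between the reported accuracy range and the much lower Matthews correlation coefficients is easier to read with the two metrics side by side. The sketch below uses the standard definitions; the confusion-matrix counts in the usage note are invented for illustration and are not results from the paper.

```python
import math

def accuracy(tp, fp, fn, tn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom
```

With hypothetical counts `tp=5, fp=5, fn=45, tn=445`, accuracy is 0.9 while the MCC is only about 0.19, because most positive cases are missed. This is the kind of false-negative pattern that accuracy alone can hide when positives are rare.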

2605.22769 2026-05-26 cs.CL cs.AI

Understanding Data Temporality Impact on Large Language Models Pre-training

理解数据时间性对大型语言模型预训练的影响

Hippolyte Pilchen, Romain Fabre, Franck Signe Talla, Patrick Perez, Edouard Grave

发表机构 * Kyutai

AI总结 研究预训练数据顺序对大型语言模型获取时间敏感事实知识的影响,通过构建包含7000多个时间相关问题的基准并训练60亿参数模型,发现按时间顺序训练比随机打乱训练能产生更及时和精确的知识。

详情
AI中文摘要

大型语言模型(LLMs)通常在打乱顺序的语料库上进行训练,导致模型的知识在训练时被冻结,其时间基础仍然难以理解。在这项工作中,我们研究了预训练动态对获取时间敏感事实知识的影响,特别关注数据顺序。我们的主要贡献有两方面。首先,我们引入了一个包含7000多个时间基础问题的综合基准和一个评估协议,能够分析模型是否将事实与其对应的时间段正确关联。其次,我们在按时间顺序排列的Common Crawl快照上预训练了60亿参数的模型,并将其与标准的随机打乱预训练进行比较。我们的结果表明,按顺序训练的模型在通用语言理解和常识方面与随机打乱的基线相当,同时始终表现出更及时和精确的时间知识。按时间顺序的预训练提高了事实的新鲜度,而随机打乱的预训练在较旧的数据上表现更好,可能是由于事实重复增加。这些发现,连同我们在https://github.com/kyutai-labs/kairos 发布的代码、在https://huggingface.co/collections/kyutai/kairos 发布的检查点和数据集,为LLMs的持续学习未来研究提供了基础。

英文摘要

Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledge is frozen at train time and whose temporal grounding remains poorly understood. In this work, we study the impact of pre-training dynamics on the acquisition of time-sensitive factual knowledge, focusing specifically on data ordering. Our main contributions are twofold. First, we introduce a comprehensive benchmark of over 7,000 temporally grounded questions and an evaluation protocol that enables analysis of whether models correctly associate facts with their corresponding time periods. Second, we pretrain 6B-parameter models on temporally ordered Common Crawl snapshots and compare them against standard shuffled pre-training. Our results show that sequentially trained models match shuffled baselines on general language understanding and common knowledge while consistently exhibiting more up-to-date and temporally precise knowledge. Temporally ordered pre-training yields improved factual freshness, while shuffled pre-training peaks on older data, possibly due to increased factual repetition. These findings, together with the release of our code at https://github.com/kyutai-labs/kairos and our checkpoints and datasets at https://huggingface.co/collections/kyutai/kairos, provide a foundation for future research on continual learning for LLMs.

2605.22222 2026-05-26 cs.LG

ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models

ARC-STAR: 面向PDE基础模型的可审计事后修正

Chengze Li, Lingwei Wei, Li Sun, Hongbo Lv, Jie Yang, Hanrong Zhang, Kening Zheng, Wei-Chieh Huang, Enze Ma, Philip S. Yu

发表机构 * University of Illinois Chicago(伊利诺伊大学芝加哥分校) Beijing University of Posts and Telecommunications(北京邮电大学) North China Electric Power University(华北电力大学)

AI总结 针对PDE基础模型预测漂移且误差空间集中的问题,提出冻结求解器的事后修正框架ARC-STAR,通过全局修正、局部精炼和预算感知路由三阶段实现可审计、低误差的修正。

Comments 40 pages, including appendices

详情
AI中文摘要

偏微分方程(PDE)基础模型是预训练网络,能够从单一可重用求解器预测速度、压力等物理场的演化。在不熟悉的流场上,它们的预测会逐步漂移,误差集中在少数区域,然而重新训练会破坏网络稳定性,而统一的事后修正忽略了这种空间集中性。为解决此问题,我们提出了一种冻结求解器的事后修正框架——自适应风险校准空间分诊可审计精炼(ARC-STAR)。ARC-STAR将修正组织为三个阶段:全局修正器消除广泛的求解器偏差,块级局部精炼器清理全局后残差,在部署时,无标签分数在计算预算下将精炼路由到高风险块。该框架设计为:(i) 冻结宿主,保留预训练求解器无需微调;(ii) 可审计,全局和局部阶段分别训练和评估以实现可衡量的贡献;(iii) 预算感知,使用块级接口,要么精炼整个场,要么将有限计算路由到高风险区域。在跨越十个状态单元的五个流基准测试中,ARC-STAR是唯一在每个单元上将速度滚动误差比原始Poseidon降低至少36倍的方法。全局阶段将原始宿主误差降低91-99%,局部阶段进一步将剩余的全局后残差降低高达94.4%。

英文摘要

Partial differential equation (PDE) foundation models are pretrained networks that forecast how physical fields like velocity and pressure evolve from a single reusable solver. On unfamiliar flows, their predictions drift step by step and errors concentrate in a few regions; retraining, however, destabilizes the network, and uniform post-hoc correction overlooks this spatial concentration. To address this, we propose a frozen-solver post-hoc correction framework, Adaptive Risk-Calibrated Spatial Triage for Auditable Refinement (ARC-STAR). ARC-STAR organizes correction into three stages: a global corrector removes broad solver bias, a blockwise local refiner cleans the post-global residual, and, at deployment, a label-free score routes refinement to high-risk blocks under a compute budget. The framework is designed to be (i) frozen-host, preserving the pretrained solver without fine-tuning; (ii) auditable, with global and local stages trained and evaluated separately for measurable contributions; and (iii) budget-aware, using a blockwise interface that either refines the full field or routes limited compute to high-risk regions. Across five flow benchmarks spanning ten regime cells, ARC-STAR is the only method that cuts velocity rollout error by at least 36x over raw Poseidon on every cell. The global stage reduces raw host error by 91-99%, and the local stage further reduces the remaining post-global residual by up to 94.4%.
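
The budget-aware routing stage can be pictured as a top-k selection over per-block risk scores. The sketch below assumes that rule; the function names and the use of a plain callable as the local refiner are hypothetical, and the paper's actual score and routing policy may differ.

```python
def route_blocks(risk_scores, budget):
    """Return the indices of the `budget` highest-risk blocks, in spatial order."""
    order = sorted(range(len(risk_scores)), key=lambda i: risk_scores[i], reverse=True)
    return sorted(order[:max(0, budget)])

def refine_field(blocks, risk_scores, budget, refiner):
    """Apply the local refiner only to routed blocks; leave the rest untouched."""
    chosen = set(route_blocks(risk_scores, budget))
    return [refiner(block) if i in chosen else block for i, block in enumerate(blocks)]
```

Setting `budget` to the number of blocks refines the full field, while a smaller value spends the limited compute only where the label-free score is highest.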

2605.22137 2026-05-26 cs.CL

Cross-Lingual Consensus: Aligning Multilingual Cultural Knowledge via Multilingual Self-Consistency

跨语言共识:通过多语言自一致性对齐多语言文化知识

Andrew Ivan Soegeng, Patrick Sutanto, Tan Sang Nguyen

发表机构 * SAP School of Computing, National University of Singapore(新加坡国立大学计算机学院)

AI总结 提出一种自监督框架,利用多语言自一致性和自我批评机制,从本地语言表示中提取文化知识并迁移到英语,以缩小跨语言文化知识差距,在BLEnD基准上平均提升英语查询性能5.03%。

Comments Accepted to The 1st Workshop on Multilinguality in the Era of Large Language Models

详情
AI中文摘要

尽管大型语言模型(LLMs)在各种任务中展现出强大的能力,但它们在不同语言之间表现出显著的性能差异。虽然用英语提示LLMs通常能获得最高的通用性能,但这往往会导致以西方为中心的偏见,阻碍模型准确反映多样化的文化知识。我们假设LLMs已经拥有嵌入在本地语言表示中的丰富文化知识,但在用英语提示时无法检索到这些知识。为了弥合这一跨语言知识差距,我们提出了一种新颖的自监督框架。我们的方法利用多语言自一致性来识别跨语言中最可靠的文化响应,并结合自我批评机制将这些知识转移到较弱的语言中。在BLEnD基准上的评估表明,我们的方法显著改善了文化对齐——在英语查询上平均提升5.03%——完全依赖于自生成数据。最终,我们的工作表明,潜在的文化知识可以成功地在语言之间浮现和传播,从而实现更具文化公平性和一致性的LLMs。

英文摘要

Although Large Language Models (LLMs) demonstrate strong capabilities across various tasks, they exhibit significant performance discrepancies across languages. While prompting LLMs in English typically yields the highest general performance, it often induces a Western-centric bias, hindering the model's ability to accurately reflect diverse cultural knowledge. We hypothesize that LLMs already possess rich cultural knowledge embedded within local-language representations, but fail to retrieve it when prompted in English. To bridge this cross-lingual knowledge gap, we propose a novel self-supervised framework. Our method leverages multilingual self-consistency to identify the most reliable cultural responses across languages, combined with a self-critique mechanism to transfer this knowledge to the weaker language. Evaluations on the BLEnD benchmark demonstrate that our approach significantly improves cultural alignment, boosting performance on English queries by an average of 5.03%, while relying entirely on self-generated data. Ultimately, our work demonstrates that latent cultural knowledge can be successfully surfaced and propagated across languages, enabling more culturally equitable and consistent LLMs.

2605.22064 2026-05-26 cs.CL

Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

Hy-MT2:面向复杂真实场景的快速、高效且强大的多语言翻译模型系列

Mao Zheng, Zheng Li, Tao Chen, Bo Lv, Mingrui Sun, Mingyang Song, Jinlong Song, Hong Huang, Decheng Wu, Hai Wang, Yifan Song, Yanfeng Chen, Guanwei Zhang

发表机构 * Tencent Hunyuan Team(腾讯混元团队)

AI总结 本文提出Hy-MT2系列多语言翻译模型,通过三种规模(1.8B、7B、30B-A3B MoE)支持33种语言翻译,在通用、商业、领域和指令跟随任务上超越开源模型和商业API,并实现轻量级设备端部署。

详情
AI中文摘要

Hy-MT2是一系列面向复杂真实场景的快速思考多语言翻译模型。它包括三种模型规模:1.8B、7B和30B-A3B(MoE),均支持33种语言之间的翻译,并能有效遵循多种语言的翻译指令。多维度评估表明,Hy-MT2在通用、真实世界业务、领域特定和指令跟随翻译任务中均表现出色。7B和30B模型在快速思考模式下超越了DeepSeek-V4-Pro和Kimi K2.6等开源模型,而轻量级的1.8B模型在整体性能上也超越了微软、豆包等提供商的主流商业API。此外,当与AngelSlim的1.25位极端量化结合用于设备端部署时,轻量级1.8B模型仅需440 MB存储空间,并实现了1.5倍的推理加速。

英文摘要

Hy-MT2 is a family of fast-thinking multilingual translation models designed for complex real-world scenarios. It includes three model sizes: 1.8B, 7B, and 30B-A3B (MoE), all of which support translation among 33 languages and effectively follow translation instructions in multiple languages. Multi-dimensional evaluations show that Hy-MT2 delivers outstanding performance across general, real-world business, domain-specific, and instruction-following translation tasks. The 7B and 30B models outperform open-source models such as DeepSeek-V4-Pro and Kimi K2.6 in fast-thinking mode, while the lightweight 1.8B model also surpasses mainstream commercial APIs from providers such as Microsoft and Doubao overall. Moreover, when paired with AngelSlim's 1.25-bit extreme quantization for on-device deployment, the lightweight 1.8B model requires only 440 MB of storage and achieves a 1.5x inference speedup.

2605.21602 2026-05-26 cs.AI cs.SE

Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

基准测试与改进LLMs中的分布外对齐失败监控器

Dylan Feng, Pragya Srivastava, Anca Dragan, Cassidy Laidlaw

发表机构 * University of California, Berkeley, USA(加州大学伯克利分校) Haize Labs, New York, USA(Haize实验室) Google DeepMind, India(谷歌DeepMind)

AI总结 针对大语言模型在分布外情境下的安全与对齐失败问题,提出MOOD基准并证明结合守卫模型与OOD检测器可提升监控召回率。

详情
AI中文摘要

大语言模型(LLMs)的许多安全和对齐失败源于分布外(OOD)情境:模型开发者未预见到的异常提示或响应模式。我们通过引入名为Misalignment Out Of Distribution (MOOD)的基准,系统研究LLM监控流程能否检测这些OOD对齐失败。对于在大量安全数据集上训练的现成模型,很难找到真正OOD的失败。我们通过在MOOD中包含一个受限训练集(用于训练我们自己的监控器)以及七个具有不同对齐失败且超出训练分布的测试集来规避这一问题。利用MOOD,我们发现守卫模型(安全分类器)通常难以泛化到OOD。为解决此问题,我们提出将守卫模型与OOD检测器结合。我们测试了四种OOD检测器,发现将守卫模型与基于马氏距离和困惑度的OOD检测器结合,可将召回率从39%提升至45%。我们还建立了跨模型规模的监控器(结合守卫模型和OOD检测器)的正向扩展趋势;发现将OOD检测纳入监控比使用参数多20倍的守卫模型能获得更高的召回率增益。我们的工作表明,OOD检测应成为LLM监控的关键组成部分,并为这一重要问题的进一步研究奠定了基础。我们公开发布了实验代码和数据,相关链接见:https://github.com/Dylan102938/mood-bench。

英文摘要

Many safety and alignment failures of large language models (LLMs) occur due to out-of-distribution (OOD) situations: unusual prompt or response patterns that are unforeseen by model developers. We systematically study whether LLM monitoring pipelines can detect these OOD alignment failures by introducing a benchmark called Misalignment Out Of Distribution (MOOD). It is difficult to find failures that are truly OOD for off-the-shelf models trained on vast safety datasets. We sidestep this by including a restricted training set in MOOD that we use to train our own monitors, as well as seven test sets with diverse alignment failures that are outside the training distribution. Using MOOD, we find that guard models (safety classifiers) often fail to generalize OOD. To fix this, we propose combining guard models with OOD detectors. We test four types of OOD detectors and find that a combination of a guard model with Mahalanobis distance and perplexity-based OOD detectors can improve recall from 39% to 45%. We also establish positive scaling trends across model scales for monitors that combine a guard model and OOD detector; we find that incorporating OOD detection into monitoring achieves a higher recall gain than using a guard model with 20 times more parameters. Our work suggests that OOD detection should be a crucial component of LLM monitoring and provides a foundation for further work on this important problem. We release the code and data for our experiments publicly, and you can find the relevant links here: https://github.com/Dylan102938/mood-bench.
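
Combining a guard model with an OOD detector can be sketched as a simple OR over two thresholded scores. Everything below is an assumption made for illustration: the diagonal-covariance simplification of the Mahalanobis distance (chosen so the example needs only the standard library), the threshold values, and the function names. It is not the monitor evaluated in the paper.

```python
import math

def fit_diagonal_gaussian(embeddings):
    """Estimate per-dimension mean and variance from in-distribution embeddings."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[k] for e in embeddings) / n for k in range(dim)]
    var = [sum((e[k] - mean[k]) ** 2 for e in embeddings) / n + 1e-6 for k in range(dim)]
    return mean, var

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under a diagonal covariance (a simplification)."""
    return math.sqrt(sum((a - m) ** 2 / v for a, m, v in zip(x, mean, var)))

def monitor_flags(guard_score, ood_score, guard_threshold=0.5, ood_threshold=3.0):
    """Flag an input if either the guard model or the OOD detector fires."""
    return guard_score >= guard_threshold or ood_score >= ood_threshold
```

The point of the OR rule is that an input the guard model scores as harmless can still be flagged when its embedding lies far from the training distribution, which is where a classifier's score is least trustworthy.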

2605.20787 2026-05-26 cs.CV

Findings of the Counter Turing Test: AI-Generated Image Detection

反图灵测试结果:AI生成图像检测

Rajarshi Roy, Nasrin Imanpour, Ashhar Aziz, Shashwat Bajpai, Gurpreet Singh, Shwetangshu Biswas, Kapil Wanaskar, Parth Patwa, Subhankar Ghosh, Shreyas Dixit, Nilesh Ranjan Pal, Vipula Rawte, Ritvik Garimella, Amitava Das, Amit Sheth, Vasu Sharma, Aishwarya Naresh Reganti, Vinija Jain, Aman Chadha

发表机构 * Kalyani Government Engineering College(卡利尼政府工程学院) University of South Carolina(南卡罗来纳大学) IIIT Delhi(德里IIIT) BITS Pilani Hyderabad Campus(BITS Pilani海得拉巴校区) IIIT Guwahati(古瓦哈提IIIT) NIT Silchar(锡尔杰尔国立理工学院) San José State University(圣何塞州立大学) UCLA(加州大学洛杉矶分校) Washington State University(华盛顿州立大学) Vishwakarma Institute of Information Technology(维斯瓦卡马信息科技学院) Meta AI(Meta人工智能) Amazon AI(亚马逊AI) BITS Pilani Goa(BITS Pilani果阿校区)

AI总结 本文通过Defactify 4.0工作坊的反图灵测试竞赛,评估了多种检测方法在区分AI生成图像与真实图像及识别具体生成模型上的性能,发现检测准确率较高但模型识别仍具挑战。

Comments Defactify4 @AAAI 2025

详情
AI中文摘要

生成式AI技术(如Stable Diffusion、DALL-E和Midjourney)的快速发展显著改变了合成视觉内容的创建方式。虽然这些模型推动了各行各业的创新,但也带来了严重挑战,包括错误信息、虚假信息和有偏内容生成。AI生成图像日益逼真,使其检测成为研究人员、政策制定者和行业利益相关者关注的紧迫问题。 在本文中,我们介绍了Defactify 4.0工作坊的成果,该工作坊推出了用于AI生成图像检测的反图灵测试(CT2)。竞赛包含两个关键任务:(1)将图像二分类为AI生成或真实;(2)识别生成AI图像的具体生成模型。为支持这两个任务,我们采用了MS COCOAI数据集,该基准包含由五个最先进模型生成的96000张真实和合成图像,以及来自MS COCO的真实图像。 参与者采用了多种检测策略,包括卷积神经网络(CNN)、视觉Transformer(ViT)、基于频率的分析、对比学习和多模态技术。结果表明,虽然AI生成图像可以被高精度检测(F1分数>0.83),但准确识别具体模型仍然更具挑战性(最高F1分数:0.4986)。这些发现凸显了改进模型指纹识别、对抗鲁棒性和实时检测机制的必要性。

英文摘要

The rapid advancements in generative AI technologies, such as Stable Diffusion, DALL-E, and Midjourney, have significantly transformed the creation of synthetic visual content. While these models enable innovation across industries, they also pose serious challenges, including misinformation, disinformation, and biased content generation. The increasing realism of AI-generated images makes their detection a pressing concern for researchers, policymakers, and industry stakeholders. In this paper, we present the findings of the Defactify 4.0 workshop, which introduced the Counter Turing Test (CT2) for AI-Generated Image Detection. The competition consisted of two key tasks: (1) binary classification of images as either AI-generated or real and (2) identification of the specific generative model responsible for an AI-generated image. To support both tasks, we employed the MS COCOAI dataset, a benchmark of 96000 real and synthetic images generated by five state-of-the-art models alongside real images from MS COCO. Participants employed diverse detection strategies, including convolutional neural networks (CNNs), Vision Transformers (ViTs), frequency-based analysis, contrastive learning, and multimodal techniques. The results demonstrated that while AI-generated images can be detected with high accuracy (F1-score > 0.83), identifying the exact model used remains significantly more challenging (highest F1-score: 0.4986). These findings highlight the need for improved model fingerprinting, adversarial robustness, and real-time detection mechanisms.

2605.20772 2026-05-26 cs.CV

VIHD: Visual Intervention-based Hallucination Detection for Medical Visual Question Answering

VIHD: 基于视觉干预的医学视觉问答幻觉检测

Jiayi Chen, Benteng Ma, Zehui Liao, Winston Chong, Yasmeen George, Jianfei Cai

发表机构 * Department of Data Science & AI, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; Alfred Health Radiology, Alfred Health, Melbourne, VIC 3004, Australia; School of Translational Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC 3800, Australia; Hong Kong Polytechnic University, Hong Kong SAR, China

AI总结 提出VIHD方法,通过视觉依赖探测和视觉干预解码校准语义熵,有效检测医学多模态大语言模型中的幻觉响应。

Comments Early accepted by MICCAI 2026. This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections

详情
AI中文摘要

尽管医学多模态大语言模型(MLLMs)在辅助诊断方面展现出潜力,但它们仍然频繁生成在语言上看似合理但缺乏视觉证据的幻觉响应。这种幻觉对临床决策构成风险,因此需要有效的检测方法。现有的内省检测方法主要通过分析模型在原始或扰动输入条件下的响应来进行不确定性估计或逻辑验证。然而,这种外部扰动通常是启发式的且与上下文无关,忽略了解码过程中生成令牌与相关视觉令牌之间的内部跨模态依赖。为解决这一问题,我们提出了VIHD,一种基于视觉干预的幻觉检测方法,通过针对性的视觉令牌掩码校准语义熵,以实现更有效的幻觉检测。VIHD通过视觉依赖探测(VDP)定位视觉主导的解码器层,通过令牌掩码执行视觉干预解码(VID)以校准语义分布,并将得到的校准语义熵(CSE)量化为可靠的幻觉信号。在三个医学VQA基准测试和两个医学MLLM上的大量实验表明,VIHD始终优于最先进的方法,强调了细粒度视觉依赖对于幻觉检测的重要性。代码将发布在https://github.com/Jiayi-Chen-AU/VIHD。

英文摘要

While medical Multimodal Large Language Models (MLLMs) have shown promise in assisting diagnosis, they still frequently generate hallucinated responses that appear linguistically plausible but lack visual evidence. Such hallucinations pose risks to clinical decision-making and necessitate effective detection. Existing introspective detection methods primarily perform uncertainty estimation or logical verification by analyzing model responses conditioned on original or perturbed inputs. However, such external perturbations are often heuristic and context-agnostic, which overlooks the internal cross-modal dependency between generated tokens and related visual tokens during decoding. To address this issue, we propose VIHD, a Visual Intervention-based Hallucination Detection method that leverages targeted visual token masking to calibrate semantic entropy for more effective hallucination detection. VIHD locates visually dominant decoder layers via Visual Dependency Probing (VDP), executes Visual Intervention Decoding (VID) via token masking to calibrate the semantic distribution, and quantifies the resulting Calibrated Semantic Entropy (CSE) as a reliable hallucination signal. Extensive experiments on three medical VQA benchmarks with two medical MLLMs demonstrate that VIHD consistently outperforms state-of-the-art methods, underscoring the importance of fine-grained visual dependency for hallucination detection. The code will be available at https://github.com/Jiayi-Chen-AU/VIHD
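
Semantic entropy, the quantity that VIHD calibrates, can be sketched as the entropy of sampled answers after grouping them by meaning. In this illustration, case-insensitive string matching is a crude stand-in for a real semantic-equivalence check, and the visual-intervention calibration step itself is not reproduced, since the abstract does not give its formula.

```python
import math
from collections import Counter

def semantic_entropy(answers, canonicalize=lambda s: s.strip().lower()):
    """Entropy (in nats) over clusters of sampled answers that share a meaning.

    `canonicalize` maps an answer to a cluster key; string normalization is an
    assumed placeholder for an entailment-based equivalence test.
    """
    counts = Counter(canonicalize(a) for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

Answers that all agree give zero entropy, while answers split evenly between two meanings give log 2; a higher value signals that the model is less certain and the response more likely hallucinated.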

2605.20761 2026-05-26 cs.CL

Findings of the Counter Turing Test: AI-Generated Text Detection

反图灵测试的发现:AI生成文本检测

Rajarshi Roy, Gurpreet Singh, Ashhar Aziz, Shashwat Bajpai, Nasrin Imanpour, Shwetangshu Biswas, Kapil Wanaskar, Parth Patwa, Subhankar Ghosh, Shreyas Dixit, Nilesh Ranjan Pal, Vipula Rawte, Ritvik Garimella, Amitava Das, Amit Sheth, Vasu Sharma, Aishwarya Naresh Reganti, Vinija Jain, Aman Chadha

发表机构 * Kalyani Government Engineering College(卡利尼政府工程学院) IIIT Delhi(德里IIIT) BITS Pilani Hyderabad Campus(BITS Pilani海得拉巴校区) AI Institute, University of South Carolina(南卡罗来纳大学人工智能研究所) IIIT Guwahati(古瓦哈提IIIT) NIT Silchar(锡尔杰尔国立理工学院) San José State University(圣何塞州立大学) UCLA(加州大学洛杉矶分校) Washington State University(华盛顿州立大学) Vishwakarma Institute of Information Technology(维斯瓦卡马信息科技学院) Meta AI(Meta人工智能) Amazon AI(亚马逊人工智能) BITS Pilani Goa(BITS Pilani果阿校区)

AI总结 本文通过反图灵测试(CT2)共享任务,评估了AI生成文本检测技术的有效性,发现二分类任务表现优异(F1=1.0000),但模型归因任务更具挑战性(最佳F1=0.9531),并分析了微调Transformer、集成学习等方法的优劣。

Comments Defactify4 @AAAI 2025

详情
AI中文摘要

大型语言模型生成流畅、上下文连贯文本的能力不断增强,给负责确保数字内容真实性的系统和机构带来了越来越大的压力。先进的生成模型如GPT-4、Claude 3.5和Llama能够生成高度连贯且类似人类的文本,使得区分人类撰写和AI生成的内容变得越来越困难。虽然这些模型具有变革性的应用,但它们的滥用引发了关于错误信息、偏见叙事和安全威胁的担忧。 本文对最先进的AI生成文本检测技术进行了全面分析,并通过反图灵测试(CT2)共享任务评估了其有效性。任务A(二分类)要求参与者区分人类撰写和AI生成的文本,而任务B(模型归因)则专注于识别生成给定文本的具体语言模型。结果显示,二分类性能较高,最佳系统F1得分为1.0000,但模型归因得分显著较低,最佳系统仅为0.9531,凸显了该任务的复杂性。 表现最佳的团队利用了微调Transformer模型、集成学习和混合检测方法,其中基于DeBERTa和BART的方法表现出色。然而,任务B的较低得分强调了区分不同LLM输出的挑战,需要进一步研究对抗鲁棒性、特征提取和跨领域泛化。

英文摘要

The growing capability of large language models to produce fluent, contextually coherent text has created mounting pressure on the systems and institutions responsible for ensuring the authenticity of digital content. Advanced generative models such as GPT-4, Claude 3.5, and Llama can produce highly coherent and human-like text, making it increasingly difficult to differentiate between human-written and AI-generated content. While these models have transformative applications, their misuse has raised concerns about misinformation, biased narratives, and security threats. This paper provides a comprehensive analysis of state-of-the-art AI-generated text detection techniques and evaluates their effectiveness through the Counter Turing Test (CT2) shared tasks. Task A (Binary Classification) required participants to distinguish between human-written and AI-generated text, while Task B (Model Attribution) focused on identifying the specific language model responsible for generating a given text. The results demonstrated high performance in binary classification, with the top system achieving an F1 score of 1.0000, but significantly lower scores in model attribution, where the best system achieved 0.9531, highlighting the increased complexity of this task. The top-performing teams leveraged fine-tuned transformer models, ensemble learning, and hybrid detection approaches, with DeBERTa-based and BART-based methods demonstrating strong results. However, the lower scores in Task B underscore the challenges of distinguishing outputs from different LLMs, necessitating further research into adversarial robustness, feature extraction, and cross-domain generalization.

2605.20749 2026-05-26 cs.LG cs.AI

The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?

魔鬼在于条件数:为什么GLU优于非GLU结构?

Xingyu Lyu, Qianqian Xu, Zhiyong Yang, Peisong Wen, Qingming Huang

发表机构 * State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China(人工智能安全国家重点实验室,计算技术研究所,中国科学院,北京100190,中国) School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China(中国科学院大学计算机科学与技术学院,北京101408,中国) Beijing Academy of Artificial Intelligence (BAAI), Beijing, China(北京人工智能研究院(BAAI),北京,中国)

AI总结 通过神经正切核分析,发现门控线性单元(GLU)通过重塑核谱、减小条件数来加速优化收敛,而非主要降低泛化差距。

Comments Accepted by ICML 2026

详情
AI中文摘要

门控线性单元(GLU)及其变体被广泛应用于现代开源大语言模型架构中,并且始终优于其非门控对应物,然而这种优势的根本原因尚不清楚。在这项工作中,我们通过分析神经正切核(NTK)机制下的两层网络来研究GLU。我们的分析表明,GLU结构重塑了NTK谱,导致更小的条件数和更紧凑的特征值分布。基于这一发现,我们进一步分析了由此产生的训练动态,并展示了重塑后的谱如何导致GLU模型更快的收敛,包括在GLU和非GLU模型之间观察到的特征损失交叉现象。最后,我们通过实验观察到,GLU在缩小各种模型(包括ViT和GPT-2)的泛化差距方面影响有限,这表明其主要优势在于加速优化而非减少泛化差距。代码可在 https://github.com/Zemdalk/GLU-NTK 获取。

英文摘要

Gated Linear Units (GLU) and their variants are widely adopted in modern open-source large language model architectures and consistently outperform their non-gated counterparts, yet the underlying reasons for this advantage remain unclear. In this work, we study GLU by analyzing two-layer networks in the neural tangent kernel (NTK) regime. Our analysis reveals that the GLU structure reshapes the NTK spectrum, leading to a smaller condition number and a more compact eigenvalue distribution. Building on this finding, we further analyze the resulting training dynamics and show how the reshaped spectrum leads to faster convergence of GLU models, including a characteristic loss-crossing phenomenon observed between GLU and non-GLU models. Finally, we empirically observe that GLU has limited impact in reducing the generalization gap on various models, including ViT and GPT-2, suggesting that its primary benefit lies in accelerating optimization rather than reducing the generalization gap. The code is available at: https://github.com/Zemdalk/GLU-NTK.
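
Why a smaller condition number means faster convergence can be seen in a toy calculation. In the kernel regime, the training residual along each eigen-direction of the kernel shrinks by a factor of (1 - lr * eigenvalue) per gradient step, so the smallest eigenvalue sets the pace. The spectra used in the usage note are made up for illustration and are not measured NTK spectra of GLU or non-GLU networks.

```python
def condition_number(eigs):
    """Ratio of the largest to the smallest eigenvalue of a positive spectrum."""
    return max(eigs) / min(eigs)

def steps_to_tolerance(eigs, tol=1e-3, max_steps=100000):
    """Count gradient steps until every eigen-direction residual falls below tol."""
    lr = 1.0 / max(eigs)  # largest step size that keeps every direction stable
    residual = [1.0] * len(eigs)
    for step in range(1, max_steps + 1):
        residual = [r * (1.0 - lr * lam) for r, lam in zip(residual, eigs)]
        if max(abs(r) for r in residual) < tol:
            return step
    return max_steps
```

A compact spectrum such as `[1.0, 0.5]` (condition number 2) reaches the tolerance in 10 steps, whereas a spread-out spectrum such as `[1.0, 0.01]` (condition number 100) needs several hundred, which matches the direction of the argument made in the abstract.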

2605.19739 2026-05-26 cs.CV

FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models

FlowErase-RL:将概念擦除重新思考为流匹配模型中的奖励优化

Yi Sun, Zhiqi Zhang, Xinhao Zhong, Yimin Zhou, Shuoyang Sun, Bin Chen, Shu-Tao Xia, Ke Xu

发表机构 * Harbin Institute of Technology, Shenzhen(哈尔滨工业大学(深圳)) Tsinghua Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院) Jilin University(吉林大学) Peng Cheng Laboratory(鹏城实验室) Department of Computer Science and Technology, Tsinghua University(清华大学计算机科学与技术系)

AI总结 提出FlowErase-RL,首个基于GRPO的框架,通过动态双路径奖励机制将概念擦除转化为奖励优化问题,在抑制目标概念的同时保持生成保真度,实现最先进的擦除性能与鲁棒性。

详情
AI中文摘要

近期流匹配模型的进展显著提升了文本到图像生成的质量,但也因生成有害或不良内容而引入了日益增长的安全风险。现有的概念擦除方法要么是推理时干预,效果有限;要么依赖监督微调(SFT),后者需要精确对齐的数据,且在可扩展性和多概念场景中面临挑战。本文提出FlowErase-RL,首个基于GRPO的流匹配模型概念擦除框架。我们将概念擦除重新表述为奖励优化问题,并引入动态双路径奖励机制,联合优化(i)概念擦除(CE)奖励以抑制目标概念,以及(ii)非目标空间(NS)奖励以保持生成保真度。通过性能驱动的切换策略,在训练过程中自适应平衡两条奖励路径,无需显式监督即可实现稳定优化。在裸体、物体和艺术风格擦除上的大量实验表明,我们的方法在保持强大图像质量和语义对齐的同时,实现了最先进的擦除性能。此外,它对对抗攻击表现出鲁棒抵抗性,并能有效扩展到多概念场景。我们的结果为流匹配模型中的安全可控生成建立了新范式。

英文摘要

Recent advances in flow matching models have significantly improved text-to-image generation quality, but also introduce growing safety risks due to the generation of harmful or undesirable content. Existing concept erasure methods are either inference-time interventions with limited effectiveness or rely on supervised fine-tuning (SFT), which requires precisely aligned data and struggles with scalability and multi-concept settings. In this paper, we propose FlowErase-RL, the first GRPO-based framework for concept erasure in flow matching models. We reformulate concept erasure as a reward optimization problem and introduce a dynamic dual-path reward mechanism that jointly optimizes (i) a Concept Erasure (CE) reward to suppress target concepts and (ii) a Non-target Space (NS) reward to preserve generative fidelity. The two reward paths are adaptively balanced during training via a performance-driven switching strategy, enabling stable optimization without explicit supervision. Extensive experiments on nudity, object, and artistic style erasure demonstrate that our method achieves state-of-the-art erasure performance while maintaining strong image quality and semantic alignment. Moreover, it exhibits robust resistance to adversarial attacks and scales effectively to multi-concept scenarios. Our results establish a new paradigm for safe and controllable generation in flow matching models.

2605.18797 2026-05-26 cs.LG cs.AI

Simply Stabilizing the Loop via Fully Looped Transformer

通过全循环Transformer简单稳定循环

Rao Fu, Zixuan Yang, Jiankun Zhang, Jing Ma, Hechang Chen, Yu Li, Yi Chang

发表机构 * Hong Kong Baptist University(香港浸会大学) Jilin University(吉林大学)

AI总结 针对循环Transformer在迭代次数增加时出现的训练不稳定性,提出全循环Transformer,通过全循环架构和注意力注入两种无参数修改,稳定训练至12次循环,下游任务性能提升最高13.2%。

详情
AI中文摘要

扩展模型性能通常需要增加模型大小。循环Transformer通过迭代重用相同的Transformer块提供了一种引人注目的替代方案,用额外的计算换取性能提升,而不增加参数数量或上下文长度。由于推理时可以调整循环迭代次数,它还提供了一种平衡性能和测试时计算的自然机制。然而,当循环迭代次数增加时,循环Transformer仍然存在训练不稳定性。我们的分析表明,这种不稳定性源于两个来源:梯度振荡和残差爆炸。为了解决这两个问题,我们提出了全循环Transformer,它引入了两种无参数修改:(1)全循环架构,将循环间信号分布到所有层以缓解残差爆炸;(2)注意力注入,重用现有的注意力块以抑制梯度振荡。这些修改稳定了训练动态,使得全循环Transformer能够稳定训练多达12次循环迭代,而其他基线循环模型在这种情况下会崩溃。在循环Transformer不会崩溃的较温和设置中,全循环Transformer仍然将平均下游任务性能提升了高达13.2%。总体而言,我们的实验表明,全循环Transformer提高了训练稳定性,增强了下游性能,并通过在推理时改变循环迭代次数,提供了在不同测试时计算预算下的初步适应性。

英文摘要

Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusing the same Transformer blocks, trading additional computation for improved performance without increasing parameter count or context length. Because the number of loop iterations can be adjusted at inference, it also provides a natural mechanism for balancing performance and test-time compute. However, Looped Transformer still suffers from training instability when the number of loop iterations increases. Our analysis reveals that this instability stems from two sources: gradient oscillation and residual explosion. To address these two problems, we propose the Fully Looped Transformer, which introduces two parameter-free modifications: (1) Fully Looped Architecture, which distributes inter-loop signals across all layers to mitigate residual explosion; (2) Attention Injection, which reuses the existing attention block to suppress gradient oscillation. These modifications stabilize training dynamics, enabling the Fully Looped Transformer to be trained stably up to 12 loop iterations, whereas other baseline looped models collapse in this regime. In milder settings where Looped Transformer does not collapse, Fully Looped Transformer still improves average downstream-task performance by up to 13.2%. Overall, our experiments demonstrate that Fully Looped Transformer improves training stability, enhances downstream performance, and provides preliminary adaptability under different test-time compute budgets by varying loop iterations at inference.

2605.18746 2026-05-26 cs.CV cs.AI cs.CL cs.LG cs.RO

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

ESI-Bench: 迈向闭环感知-动作的具身空间智能

Yining Hong, Jiageng Liu, Han Yin, Manling Li, Leonidas Guibas, Li Fei-Fei, Jiajun Wu, Yejin Choi

发表机构 * Stanford University(斯坦福大学) UCLA(加州大学洛杉矶分校) Northwestern University(西北大学)

AI总结 提出ESI-BENCH基准,通过主动探索(感知、移动、操作)在OmniGibson环境中评估具身空间智能,发现主动探索显著优于被动方法,失败主因是动作盲视而非感知弱,且模型存在元认知差距。

Comments https://esi-bench.github.io/

详情
AI中文摘要

空间智能通过感知-动作循环展开:智能体通过行动获取观察,并推理观察如何随动作变化。它们不是被动处理所见,而是主动揭示未见——遮挡结构、动态、包含关系和功能,这些无法仅通过被动感知解决。我们超越先前假设神谕观察的空间智能表述,将观察者重新定义为行动者。我们引入ESI-BENCH,一个基于OmniGibson、扎根于Spelke核心知识系统的全面具身空间智能基准,涵盖10个任务类别和29个子类别。智能体必须决定部署哪些能力——感知、移动和操作——以及如何排序以主动积累任务相关证据。我们对最先进的MLLM进行大量实验,发现主动探索显著优于被动对应物,智能体自发发现涌现的空间策略而无需明确指令,而随机多视角往往增加噪声而非信号,尽管消耗更多图像。大多数失败并非源于感知弱,而是动作盲视:糟糕的动作选择导致糟糕的观察,进而引发级联错误。虽然显式3D基础稳定了深度敏感任务的推理,但不完美的3D表示通过扭曲空间关系证明比2D基线更有害。人类研究进一步揭示,与寻求证伪视角并在矛盾下修正信念的人类不同,模型无论证据质量如何都过早且高置信度地承诺,暴露了一个既不能通过更好感知也不能通过更多具身互动单独闭合的元认知差距。

英文摘要

Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen - occluded structure, dynamics, containment, and functionality that cannot be resolved from passive sensing alone. We move beyond prior formulations of spatial intelligence that assume oracle observations by recasting the observer as an actor. We introduce ESI-BENCH, a comprehensive benchmark for embodied spatial intelligence spanning 10 task categories and 29 subcategories built on OmniGibson, grounded in Spelke's core knowledge systems. Agents must decide what abilities to deploy - perception, locomotion, and manipulation - and how to sequence them to actively accumulate task-relevant evidence. We conduct extensive experiments on state-of-the-art MLLMs and find that active exploration substantially outperforms passive counterparts, with agents spontaneously discovering emergent spatial strategies without explicit instructions, while random multi-view often adds noise rather than signal despite consuming far more images. Most failures stem not from weak perception but from action blindness: poor action choices lead to poor observations, which in turn drive cascading errors. While explicit 3D grounding stabilizes reasoning on depth-sensitive tasks, imperfect 3D representation proves more harmful than 2D baselines by distorting spatial relations. Human studies further reveal that unlike humans who seek falsifying viewpoints and revise beliefs under contradiction, models commit prematurely with high confidence regardless of evidence quality, exposing a metacognitive gap that neither better perception nor more embodied interaction alone can close.