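arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.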

2312.07762 2026-06-08 cs.LG cs.NA math.NA stat.AP 版本更新

Interpretable factorization of clinical questionnaires to identify latent factors of psychopathology

临床问卷的可解释分解以识别精神病理学的潜在因素

Ka Chun Lam, Bridget W Mahony, Armin Raznahan, Francisco Pereira

发表机构 * Machine Learning Core, National Institute of Mental Health, National Institutes of Health（机器学习核心，国家心理健康研究所，国立卫生研究院）； Section on Developmental Neurogenomics, Human Genetics Branch, National Institute of Mental Health, National Institutes of Health（发育神经基因组学部门，人类遗传学分支，国家心理健康研究所，国立卫生研究院）； National Institute of Mental Health, National Institutes of Health（国家心理健康研究所，国立卫生研究院）

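AI summary: Proposes interpretability constrained questionnaire factorization (ICQF), a non-negative matrix factorization method that uses regularization to improve factor interpretability and solution stability and detects latent dimensionality automatically; on real questionnaire data it outperforms competing methods at smaller dataset sizes.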

AI中文摘要

精神病学研究旨在通过识别少量潜在因素来理解问卷数据中测量的行为精神病理学表现。虽然因子分析是传统工具，但所得因子可能不可解释，且可能受混杂变量影响。此外，缺失数据常见，通常需要显式插补。为克服这些限制，我们引入了可解释性约束问卷分解（ICQF），一种针对问卷数据正则化的非负矩阵分解方法。我们的方法旨在提高因子可解释性和解稳定性。我们提供了具有理论收敛保证的优化过程，以及自动准确检测潜在维度的程序。我们使用逼真的合成数据验证了这些程序。我们在两个独立数据集（健康大脑网络和青少年大脑认知发展研究）中展示了该方法在广泛使用的通用问卷中的有效性。具体而言，我们表明ICQF提高了领域专家定义的可解释性，同时保留了跨一系列障碍的诊断信息，并在较小数据集规模下优于竞争方法。这表明我们方法中的正则化与领域特征相匹配。ICQF的Python实现可在https://github.com/jefferykclam/ICQF获取。

英文摘要

Psychiatry research seeks to understand the manifestations of psychopathology in behavior, as measured in questionnaire data, by identifying a small number of latent factors that explain them. While factor analysis is the traditional tool for this purpose, the resulting factors may not be interpretable, and may also be subject to confounding variables. Moreover, missing data are common, and explicit imputation is often required. To overcome these limitations, we introduce interpretability constrained questionnaire factorization (ICQF), a non-negative matrix factorization method with regularization tailored for questionnaire data. Our method aims to promote factor interpretability and solution stability. We provide an optimization procedure with theoretical convergence guarantees, and an automated procedure to detect latent dimensionality accurately. We validate these procedures using realistic synthetic data. We demonstrate the effectiveness of our method in a widely used general-purpose questionnaire, in two independent datasets (the Healthy Brain Network and Adolescent Brain Cognitive Development studies). Specifically, we show that ICQF improves interpretability, as defined by domain experts, while preserving diagnostic information across a range of disorders, and outperforms competing methods for smaller dataset sizes. This suggests that the regularization in our method matches domain characteristics. The python implementation for ICQF is available at https://github.com/jefferykclam/ICQF.

2603.14014 2026-06-08 cs.LG cs.GT 版本更新

Aumann-SHAP: The Geometry of Counterfactual Interaction Explanations in Machine Learning

Aumann-SHAP: 机器学习中反事实交互解释的几何结构

Adam Belahcen, Stéphane Mussard

发表机构 * GitHub ； arXiv

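AI summary: Proposes the Aumann-SHAP framework, which decomposes counterfactual transitions over a local hypercube grid and uses micro-game Shapley and LES values for geometry-aware attribution; it removes the bias of equal-split Shapley on synthetic data, and on real data it corrects a sign error and improves edit efficiency.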

AI中文摘要

我们引入Aumann-SHAP，一个交互感知框架，通过将模型限制在连接基线和反事实特征的局部超立方体来分解反事实转移。每个超立方体被离散化为一个网格，以构建一个诱导的微玩家合作博弈，其中基本网格步移动成为玩家。该TU-微博弈上的Shapley和LES值产生几何感知的域内归因，在网格细化下收敛到对角Aumann-Shapley / Integrated Gradients极限，并将等分Shapley恢复为退化的$m=1$特例。精确的网格状态闭式公式为固定交互阶数提供了多项式时间计算。在具有已知真实值的合成基准上，等分Shapley带有不可约偏差，而Aumann-SHAP收敛到正确分解。在German Credit上，交互几何在$12.3\%$的实例中改变了特征优先级排序。在UCI Heart Disease上，等分错误地将胆固醇抑制因子归因为正贡献者，这是Aumann-SHAP纠正的符号错误。在MNIST上，博弈论归因达到目标置信度所需的编辑次数比基于幅度的排序少$3.5\times$，其中微博弈Shapley在所有预算下实现了最佳效率。

英文摘要

We introduce Aumann-SHAP, an interaction-aware framework that decomposes counterfactual transitions by restricting the model to a local hypercube connecting baseline and counterfactual features. Each hypercube is discretized into a grid to construct an induced micro-player cooperative game in which elementary grid-step moves become players. Shapley and LES values on this TU-micro-game yield geometry-aware within-pot attributions that converge to the diagonal Aumann--Shapley / Integrated Gradients limit under grid refinement, and recover equal-split Shapley as the degenerate $m=1$ special case. An exact grid-state closed form gives polynomial-time computation for fixed interaction order. On a synthetic benchmark with known ground truth, equal-split Shapley carries an irreducible bias while Aumann-SHAP converges to the correct decomposition. On German Credit, interaction geometry changes feature priority rankings in $12.3\%$ of instances. On UCI Heart Disease, equal-split misattributes a cholesterol suppressor as a positive contributor, which is a sign error Aumann-SHAP corrects. On MNIST, game-theoretic attribution reaches target confidence with $3.5\times$ fewer edits than magnitude-based ordering, with micro-game Shapley achieving the best efficiency across all budgets.
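
As a rough illustration of the two limiting cases named in the abstract, the sketch below compares equal-split Shapley (the $m=1$ case) with the diagonal Aumann-Shapley / Integrated Gradients attribution on a two-feature toy function. The function `f`, the baseline, and the target point are hypothetical choices made for this example, and the paper's micro-game construction itself is not implemented here.

```python
def f(x1, x2):
    # Hypothetical toy model with an interaction between the two features.
    return x1 * x2 ** 2


def equal_split_shapley(f, base, target):
    """Exact two-player Shapley values for moving from base to target."""
    (b1, b2), (t1, t2) = base, target
    phi1 = 0.5 * ((f(t1, b2) - f(b1, b2)) + (f(t1, t2) - f(b1, t2)))
    phi2 = 0.5 * ((f(b1, t2) - f(b1, b2)) + (f(t1, t2) - f(t1, b2)))
    return phi1, phi2


def diagonal_ig(f, base, target, steps=1000, eps=1e-6):
    """Integrated Gradients along the straight line from base to target,
    using the midpoint rule and central finite differences."""
    (b1, b2), (t1, t2) = base, target
    phi1 = phi2 = 0.0
    for k in range(steps):
        s = (k + 0.5) / steps
        x1 = b1 + s * (t1 - b1)
        x2 = b2 + s * (t2 - b2)
        g1 = (f(x1 + eps, x2) - f(x1 - eps, x2)) / (2 * eps)
        g2 = (f(x1, x2 + eps) - f(x1, x2 - eps)) / (2 * eps)
        phi1 += g1 * (t1 - b1) / steps
        phi2 += g2 * (t2 - b2) / steps
    return phi1, phi2


shapley = equal_split_shapley(f, (0.0, 0.0), (1.0, 1.0))
ig = diagonal_ig(f, (0.0, 0.0), (1.0, 1.0))
```

Both attributions sum to $f(1,1)-f(0,0)=1$, but they split it differently: equal-split Shapley assigns $0.5$ to each feature, while the diagonal path assigns roughly $1/3$ and $2/3$, because the second feature enters quadratically.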

2603.13546 2026-06-08 cs.LG 版本更新

Probabilistic Gaussian Homotopy: A Probability-Space Continuation Framework for Nonconvex Optimization

概率高斯同伦：非凸优化的概率空间延拓框架

Eshed Gal, Samy Wu Fung, Eldad Haber

发表机构 * University of British Columbia（不列颠哥伦比亚大学）； Colorado School of Mines（科罗拉多矿业学院）

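AI summary: Proposes the Probabilistic Gaussian Homotopy (PGH) framework, which performs probability-space continuation for nonconvex optimization by deforming the Boltzmann distribution and aggregating gradients with Boltzmann weights, and derives PGHO, a practical algorithm based on Monte Carlo gradient estimation.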

AI中文摘要

我们提出了概率高斯同伦（PGH），一种用于非凸优化的概率空间延拓框架。与经典的高斯同伦（平滑目标函数并均匀平均梯度）不同，PGH 变形相关的玻尔兹曼分布，并诱导扰动梯度的玻尔兹曼加权聚合，从而将下降方向指数地偏向低能量区域。我们证明 PGH 对应于一种 log-sum-exp（软最小）同伦，它在尺度 $λ>0$ 下平滑非凸目标函数，并在 $λ\to 0$ 时恢复原始目标函数，从而得到 Moreau 包络的后验均值推广，并且我们推导了沿着退火同伦路径控制极小值演化的动力系统。这建立了高斯延拓、贝叶斯去噪和扩散式平滑之间的原理性联系。我们进一步提出了概率高斯同伦优化（PGHO），一种基于蒙特卡洛梯度估计的实用随机算法，并在高维非凸基准测试和稀疏恢复问题上展示了强大的性能，而经典梯度方法和目标空间平滑在这些问题上经常失败。

英文摘要

We introduce Probabilistic Gaussian Homotopy (PGH), a probability-space continuation framework for nonconvex optimization. Unlike classical Gaussian homotopy, which smooths the objective and uniformly averages gradients, PGH deforms the associated Boltzmann distribution and induces Boltzmann-weighted aggregation of perturbed gradients, which exponentially biases descent directions toward low-energy regions. We show that PGH corresponds to a log-sum-exp (soft-min) homotopy that smooths a nonconvex objective at scale $λ>0$ and recovers the original objective as $λ\to 0$, yielding a posterior-mean generalization of the Moreau envelope, and we derive a dynamical system governing minimizer evolution along an annealed homotopy path. This establishes a principled connection between Gaussian continuation, Bayesian denoising, and diffusion-style smoothing. We further propose Probabilistic Gaussian Homotopy Optimization (PGHO), a practical stochastic algorithm based on Monte Carlo gradient estimation, and demonstrate strong performance on high-dimensional nonconvex benchmarks and sparse recovery problems where classical gradient methods and objective-space smoothing frequently fail.
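
The contrast between uniform averaging and Boltzmann-weighted aggregation of perturbed gradients can be sketched with a small Monte Carlo estimator. This is a minimal sketch under stated assumptions: the weights are taken proportional to $\exp(-f(x+σε)/λ)$, which is what differentiating a soft-min objective of the form $-λ\log\mathbb{E}[\exp(-f(x+σε)/λ)]$ would give, and the objective `f`, the noise scale `sigma`, and all function names are hypothetical. The paper's actual PGHO estimator may differ in detail.

```python
import math
import random


def f(x):
    # Hypothetical 1D nonconvex objective.
    return x ** 2 + 2.0 * math.sin(5.0 * x)


def grad_f(x):
    # Analytic derivative of the toy objective above.
    return 2.0 * x + 10.0 * math.cos(5.0 * x)


def boltzmann_weights(energies, lam):
    """Softmax of -energy / lam, computed stably by subtracting the minimum."""
    e_min = min(energies)
    raw = [math.exp(-(e - e_min) / lam) for e in energies]
    total = sum(raw)
    return [r / total for r in raw]


def perturbed_gradients(x, sigma, lam, n_samples, rng):
    """Return (uniform average, Boltzmann-weighted average) of gradients
    evaluated at Gaussian perturbations of x."""
    points = [x + sigma * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    grads = [grad_f(p) for p in points]
    weights = boltzmann_weights([f(p) for p in points], lam)
    uniform = sum(grads) / n_samples
    weighted = sum(w * g for w, g in zip(weights, grads))
    return uniform, weighted


rng = random.Random(0)
uniform_grad, weighted_grad = perturbed_gradients(
    x=1.0, sigma=0.5, lam=0.1, n_samples=256, rng=rng
)
```

With a small `lam`, the weights concentrate on the perturbed points with the lowest objective value, so the weighted direction is dominated by low-energy samples; as `lam` grows, the weights flatten toward the uniform average.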

2603.13092 2026-06-08 cs.LG cs.AR 版本更新

Breaking the Tuning Barrier: Zero-Hyperparameters Yield Multi-Corner Analysis Via Learned Priors

打破调优壁垒：通过先验学习实现零超参数的多角分析

Wei W. Xing, Kaiqi Huang, Jiazhan Liu, Hong Qiu, Shan Shen

发表机构 * School of Mathematical and Physical Science, University of Sheffield（谢菲尔德大学数学与物理科学学院）； SZU–UoS Joint Centre for Innovation and Entrepreneurship, College of Mechatronics and Control Engineering, Shenzhen University（深圳大学机电与控制工程学院深大-谢菲尔德大学联合创新创业中心）； Nanjing University of Science and Technology（南京理工大学）

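AI summary: Targets circuit multi-corner analysis, where simulation is costly and existing methods need extensive tuning; proposes an in-context learning approach built on a pre-trained foundation model that matches state-of-the-art accuracy without tuning and cuts validation cost by more than 10×.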

Comments: Accepted by DAC2026. Camera-ready Version

AI中文摘要

良率多角分析在25个以上的工艺-电压-温度角下验证电路，导致组合仿真成本为$O(K \times N)$，其中$K$表示角数，$N$每个角超过$10^4$个样本。现有方法面临基本权衡：简单模型实现自动化但在非线性电路上失败，而先进AI模型捕获复杂行为但每次设计迭代需要数小时的超参数调优，形成调优壁垒。我们通过用从数百万回归任务预训练的基础模型中学到的先验替代工程先验（即模型规范）来打破这一壁垒。该模型进行上下文学习，无需调优或重新训练即可即时适应每个电路。其注意力机制通过识别工作条件之间共享的电路物理特性，自动跨角传递知识。结合自动特征选择器（1152D到48D），我们的方法以零调优匹配最先进精度（平均MRE低至0.11%），将总验证成本降低10倍以上。

英文摘要

Yield Multi-Corner Analysis validates circuits across 25+ Process-Voltage-Temperature corners, resulting in a combinatorial simulation cost of $O(K \times N)$ where $K$ denotes corners and $N$ exceeds $10^4$ samples per corner. Existing methods face a fundamental trade-off: simple models achieve automation but fail on nonlinear circuits, while advanced AI models capture complex behaviors but require hours of hyperparameter tuning per design iteration, forming the Tuning Barrier. We break this barrier by replacing engineered priors (i.e., model specifications) with learned priors from a foundation model pre-trained on millions of regression tasks. This model performs in-context learning, instantly adapting to each circuit without tuning or retraining. Its attention mechanism automatically transfers knowledge across corners by identifying shared circuit physics between operating conditions. Combined with an automated feature selector (1152D to 48D), our method matches state-of-the-art accuracy (mean MREs as low as 0.11%) with zero tuning, reducing total validation cost by over $10\times$.

2603.13042 2026-06-08 cs.LG cs.AR 版本更新

OpenACMv2: An Accuracy-Constrained Co-Optimization Framework for Approximate DCiM

OpenACMv2：面向近似数字存内计算的精度约束协同优化框架

Yiqi Zhou, Yue Yuan, Yikai Wang, Bohao Liu, Qinxin Mei, Zhuohua Liu, Shan Shen, Wei Xing, Daying Sun, Li Li, Guozhu Liu

发表机构 * Nanjing University of Science and Technology, China（南京理工大学）； Shenzhen University, China（深圳大学）； Beihang University, China（北京航空航天大学）； University of Sheffield, UK（谢菲尔德大学）； The 58th Research Institute of China Electronics Technology Group Corporation, China（中国电子科技集团第五十八研究所）

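AI summary: Proposes the OpenACMv2 framework, which uses two-level optimization (architecture search and transistor sizing) to trade off power, performance, and area in digital compute-in-memory under an accuracy constraint; experiments show a power-delay product reduction of roughly 50% or more.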

Comments: Accepted by DAC2026. Camera-ready version

AI中文摘要

数字存内计算通过减少数据移动来加速神经网络。近似数字存内计算可以进一步改善功耗-性能-面积，但需要在耦合的架构和晶体管级选择中进行精度约束的协同优化。基于OpenYield，我们引入了精度约束协同优化，并提出了OpenACMv2，这是一个通过两级优化实现ACCO的开放框架：（1）基于快速GNN代理预测PPA和误差，进行压缩机组合和SRAM宏参数的精度约束架构搜索；（2）使用蒙特卡洛方法对标准单元和SRAM位单元进行考虑工艺偏差和PVT的晶体管尺寸调整。通过将ACCO解耦为架构级探索和电路级尺寸调整，OpenACMv2集成了经典的单目标和多目标优化器，以提供强大的PPA-精度权衡和稳健的收敛性。该工作流兼容FreePDK45和OpenROAD，支持可复现的评估和易于采用。实验表明，所提出的两级ACCO框架在Level-1通过架构探索实现了大部分精度约束的效率提升，功耗延迟积降低约50%以上，而Level-2晶体管级优化在保持精度的同时进一步提供了个位数的PDP改进，从而支持对近似DCiM进行快速的“假设”探索。该框架可在GitHub上获取（https://github.com/ShenShan123/OpenACM）。

英文摘要

Digital Compute-in-Memory (DCiM) accelerates neural networks by reducing data movement. Approximate DCiM can further improve power-performance-area (PPA), but demands accuracy-constrained co-optimization across coupled architecture and transistor-level choices. Building on OpenYield, we introduce Accuracy-Constrained Co-Optimization (ACCO) and present OpenACMv2, an open framework that operationalizes ACCO via two-level optimization: (1) accuracy-constrained architecture search of compressor combinations and SRAM macro parameters, driven by a fast GNN-based surrogate for PPA and error; and (2) variation- and PVT-aware transistor sizing for standard cells and SRAM bitcells using Monte Carlo. By decoupling ACCO into architecture-level exploration and circuit-level sizing, OpenACMv2 integrates classic single- and multi-objective optimizers to deliver strong PPA-accuracy tradeoffs and robust convergence. The workflow is compatible with FreePDK45 and OpenROAD, supporting reproducible evaluation and easy adoption. Experiments show that the proposed two-level ACCO framework achieves most of its accuracy-constrained efficiency gain at Level-1 through architecture exploration, delivering roughly 50%+ PDP reduction, while Level-2 transistor-level optimization provides a further single-digit PDP improvement while preserving accuracy, enabling rapid "what-if" exploration for approximate DCiM. The framework is available on GitHub (https://github.com/ShenShan123/OpenACM).

2603.12507 2026-06-08 cs.LG math.OC stat.CO stat.ML 版本更新

Adaptive Conditional Forest Sampling for Spectral Risk Optimisation under Decision-Dependent Uncertainty

自适应条件森林采样用于决策依赖不确定性下的谱风险优化

Marcell T. Kurbucz

发表机构 * Institute for Global Prosperity, The Bartlett, University College London（全球繁荣研究所，巴特利特学院，伦敦大学学院）

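AI summary: Proposes the ACFS framework, which combines Generalised Random Forests, CEM-guided global search, and rank-weighted focused augmentation to minimise spectral risk under decision-dependent distributions; it achieves the lowest median risk on the skewed log-normal benchmark and matches GP-BO on the Student-t benchmark with lower run-to-run dispersion.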

Comments: 18 pages, 3 figures, 10 tables

AI中文摘要

当不确定性分布依赖于决策时，最小化谱风险目标（定义为期望成本与条件风险价值（CVaR）的加权组合）具有挑战性，这使得代理建模和基于模拟的排序对尾部估计误差敏感。我们提出自适应条件森林采样（ACFS），一个四阶段模拟优化框架，集成了用于决策条件分布近似的广义随机森林、CEM引导的全局探索、秩加权聚焦增强以及代理到真实的两阶段重排序，然后进行多起点梯度优化。我们在两个结构不同的数据生成过程上评估ACFS：具有决策依赖学生t边际的高斯copula和具有对数正态边际的高斯copula，在三种惩罚权重配置和每种设置100次重复下，对每种方法可用的真实分布oracle抽取次数设置共同上限。在第二个基准测试中，ACFS在每个配置下均实现了最低的中位数oracle谱风险，中位数差距相对于GP-BO在8.6%到21.8%之间。在第一个基准测试中，ACFS和GP-BO在中位数目标上统计上无显著差异，但在较高惩罚权重下，ACFS相对于GP-BO将跨重复离散度降低了约1.9到2.5倍，在最低权重下接近持平，在第二个基准测试中整体降低了1.7到2.3倍，表明运行间可靠性显著提高。ACFS在几乎所有设置中也优于CEM-SO、SGD-CVaR和KDE-SO，而消融和敏感性分析支持设计的鲁棒性，并表明各组件贡献在偏斜的对数正态基准上最为显著。

英文摘要

Minimising a spectral risk objective, defined as a weighted combination of expected cost and Conditional Value-at-Risk (CVaR), is challenging when the uncertainty distribution is decision-dependent, making both surrogate modelling and simulation-based ranking sensitive to tail estimation error. We propose Adaptive Conditional Forest Sampling (ACFS), a four-phase simulation-optimisation framework that integrates Generalised Random Forests for decision-conditional distribution approximation, CEM-guided global exploration, rank-weighted focused augmentation, and surrogate-to-oracle two-stage reranking before multi-start gradient-based refinement. We evaluate ACFS on two structurally distinct data-generating processes: a Gaussian copula with decision-dependent Student-t marginals and a Gaussian copula with log-normal marginals, across three penalty-weight configurations and 100 replications per setting, under a common cap on the number of true-distribution oracle draws available to each method. ACFS achieves the lowest median oracle spectral risk on the second benchmark in every configuration, with median gaps over GP-BO ranging from 8.6% to 21.8%. On the first benchmark, ACFS and GP-BO are statistically indistinguishable in median objective, but ACFS reduces cross-replication dispersion relative to GP-BO by approximately 1.9 to 2.5 times at the higher penalty weights, with near-parity at the lowest, and by 1.7 to 2.3 times throughout on the second benchmark, indicating materially improved run-to-run reliability. ACFS also outperforms CEM-SO, SGD-CVaR, and KDE-SO in nearly all settings, while ablation and sensitivity analyses support the robustness of the design and indicate that component contributions are most pronounced on the skewed log-normal benchmark.
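
The objective described above, a weighted combination of expected cost and CVaR, can be written down for a finite sample of costs. This is a minimal sketch, assuming higher cost is worse, an empirical CVaR defined as the mean of the worst $\lceil(1-α)n\rceil$ costs, and a single mixing parameter `weight`; the paper's exact penalty-weight parameterisation is not given in the abstract, so these names and conventions are assumptions.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst ceil((1 - alpha) * n) costs (higher cost = worse)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs, reverse=True)
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k


def spectral_risk(costs, alpha, weight):
    """(1 - weight) * mean cost + weight * CVaR_alpha."""
    mean_cost = sum(costs) / len(costs)
    return (1.0 - weight) * mean_cost + weight * empirical_cvar(costs, alpha)


costs = [1.0, 2.0, 3.0, 10.0]
risk = spectral_risk(costs, alpha=0.5, weight=0.5)
```

For these four costs the mean is 4.0 and the CVaR at $α=0.5$ is the mean of the two largest costs, 6.5, so the combined risk is 5.25. Because the CVaR term depends only on the tail, a few mis-estimated extreme draws can move the objective noticeably, which is the sensitivity to tail estimation error that the abstract mentions.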

2603.11333 2026-06-08 cs.AI 版本更新

LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms

LLM增强的数字孪生用于短视频平台策略评估

Haoting Zhang, Yunduan Lin, Jinghai He, Denglin Jiang, Zuo-Jun Shen, Zeyu Zheng

发表机构 * University of California, Berkeley（加州大学伯克利分校）； The Chinese University of Hong Kong（香港中文大学）； New York University（纽约大学）； The University of Hong Kong（香港大学）

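AI summary: Proposes an LLM-augmented digital twin with four modules (User, Content, Interaction, Platform), an event-driven execution layer, and pluggable policy components, supporting reproducible simulation-based evaluation of platform policies, including AI-enabled policies, under closed-loop dynamics.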

AI中文摘要

短视频平台是闭环、人在回路中的生态系统，其中平台策略、创作者激励和用户行为共同演化。这种反馈结构使得在生产环境中进行反事实策略评估变得困难，尤其是对于长期和分布性结果。随着平台部署改变内容进入系统方式、代理适应方式以及平台运行方式的AI工具，这一挑战被放大。我们提出了一种大语言模型（LLM）增强的数字孪生用于短视频平台，具有模块化的四孪生架构（用户、内容、交互、平台）和一个支持可复现实验的事件驱动执行层。平台策略作为平台孪生中的可插拔组件实现，LLM作为可选的、模式约束的决策服务（例如，角色生成、内容字幕、活动规划、趋势预测）集成，并通过统一优化器路由。这种设计使得可扩展的仿真成为可能，在保留闭环动态的同时允许选择性采用LLM，从而能够在现实反馈和约束下研究平台策略，包括AI增强策略。

英文摘要

Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator incentives, and user behavior co-evolve. This feedback structure makes counterfactual policy evaluation difficult in production, especially for long-horizon and distributional outcomes. The challenge is amplified as platforms deploy AI tools that change what content enters the system, how agents adapt, and how the platform operates. We propose a large language model (LLM)-augmented digital twin for short-video platforms, with a modular four-twin architecture (User, Content, Interaction, Platform) and an event-driven execution layer that supports reproducible experimentation. Platform policies are implemented as pluggable components within the Platform Twin, and LLMs are integrated as optional, schema-constrained decision services (e.g., persona generation, content captioning, campaign planning, trend prediction) that are routed through a unified optimizer. This design enables scalable simulations that preserve closed-loop dynamics while allowing selective LLM adoption, enabling the study of platform policies, including AI-enabled policies, under realistic feedback and constraints.

2603.08683 2026-06-08 cs.SD cs.AI cs.LG eess.AS 版本更新

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

全保真音频无损压缩的语言建模基准测试

Phillip Long, Zachary Novack, Chris Donahue

发表机构 * University of California, San Diego, Computer Science and Engineering Department（加州大学圣地亚哥分校计算机科学与工程系）； Carnegie Mellon University, School of Computer Science（卡内基梅隆大学计算机科学学院）

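AI summary: Proposes Trilobyte, a byte-level tokenization scheme that reduces vocabulary size from exponential in the bit depth to constant, enabling the first tractable LM-based lossless compression of 24-bit audio and outperforming FLAC at 8-bit and 16-bit.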

Comments: Accepted at Interspeech 2026, 7 pages, 5 figures

AI中文摘要

在原始波形上训练的自回归“语言”模型（LM）可被重新用于无损音频压缩，但先前的工作仅限于8位音频，尚不清楚此类方法是否适用于实际场景（16/24位）以及能否与现有编解码器竞争。我们对基于LM的压缩在全保真音频上进行了基准测试，涵盖不同领域（音乐、语音、生物声学）、采样率（16kHz-48kHz）和位深度（8、16、24位）。标准的样本级分词在更高位深度下因词汇量过大（16位为65K；24位为16.7M）而变得不可行。我们提出了Trilobyte，一种用于全分辨率音频的字节级分词方案，将词汇量从$O(2^{b})$改进为$O(1)$，并首次实现了可行的24位基于LM的无损压缩。虽然LM在8位和16位下持续优于FLAC并达到最先进的压缩效果，但我们观察到，随着位深度超过8位，压缩增益变得更为有限。

英文摘要

Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work for practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16kHz-48kHz), and bit depths (8, 16, 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full resolution audio, improving vocabulary scaling from $O(2^{b})$ to $O(1)$ and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe that compression gains become more modest as bit depth increases beyond 8-bit.
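
The vocabulary argument above can be made concrete with a small sketch of byte-level tokenization. It assumes signed PCM samples split into big-endian two's-complement bytes; the byte order, the signedness handling, and the function names are illustrative assumptions, not the Trilobyte specification.

```python
def sample_to_bytes(sample, bit_depth):
    """Split a signed PCM sample into bit_depth // 8 byte tokens (big-endian)."""
    n_bytes = bit_depth // 8
    return list(sample.to_bytes(n_bytes, byteorder="big", signed=True))


def bytes_to_sample(tokens):
    """Inverse of sample_to_bytes: rebuild the signed integer sample."""
    return int.from_bytes(bytes(tokens), byteorder="big", signed=True)


def sample_level_vocab(bit_depth):
    # One token per possible sample value: 2 ** b.
    return 2 ** bit_depth


BYTE_VOCAB = 256  # constant regardless of bit depth

tokens = sample_to_bytes(-1234567, 24)
restored = bytes_to_sample(tokens)
```

A 24-bit sample becomes three tokens drawn from a fixed vocabulary of 256, instead of one token drawn from $2^{24}$ (about 16.7M) values. The round trip is exact, which is what lossless compression requires, and the cost is a sequence that is three times longer at 24-bit.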

2603.07704 2026-06-08 cs.CV 版本更新

PARSE: Part-Aware Relational Spatial Modeling

PARSE: 部件感知的关系空间建模

Yinuo Bai, Peijun Xu, Kuixiang Shao, Yuyang Jiao, Jingxuan Zhang, Kaixin Yao, Jiayuan Gu, Jingyi Yu

发表机构 * ShanghaiTech University（上海科技大学）； Deemos Technology（德莫斯科技）

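AI summary: Proposes the PARSE framework, which uses a Part-centric Assembly Graph (PAG) and a spatial configuration solver to assemble collision-free, physically valid scenes under geometric constraints, and builds the PARSE-10K dataset, improving 3D layout reasoning and the physical realism of generated scenes.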

Comments: Project Page: https://otanaaa.github.io/PARSE-project-page/

AI中文摘要

物体间关系是空间智能的基础，但现有表示（如语言介词或物体级场景图）过于粗糙，无法指定哪些区域实际支撑、包含或接触彼此，导致布局模糊且物理不一致。为解决这些歧义，需要部件级表示；因此，我们引入PARSE，一个显式建模物体部件如何交互以确定可行且空间接地场景配置的框架。PARSE的核心是部件中心装配图（PAG），它编码特定物体部件之间的几何关系，以及一个部件感知空间配置求解器，该求解器将这些关系转换为几何约束，以组装无碰撞、物理有效的场景。利用PARSE，我们构建了PARSE-10K数据集，包含10,000个3D室内场景，这些场景基于真实图像布局先验和精心标注的部件形状数据库构建，每个场景具有密集的接触结构和部件级接触图。借助这种结构化、空间接地的监督，在PARSE-10K上微调Qwen3-VL可产生更强的物体级布局推理和更准确的部件级关系理解；此外，在3D生成模型中利用PAG作为结构先验，可生成物理真实感和结构复杂性显著提升的场景。这些结果表明，PARSE显著推进了几何接地的空间推理，并支持生成物理一致的3D场景。

英文摘要

Inter-object relations underpin spatial intelligence, yet existing representations -- linguistic prepositions or object-level scene graphs -- are too coarse to specify which regions actually support, contain, or contact one another, leading to ambiguous and physically inconsistent layouts. To address these ambiguities, a part-level formulation is needed; therefore, we introduce PARSE, a framework that explicitly models how object parts interact to determine feasible and spatially grounded scene configurations. PARSE centers on the Part-centric Assembly Graph (PAG), which encodes geometric relations between specific object parts, and a Part-Aware Spatial Configuration Solver that converts these relations into geometric constraints to assemble collision-free, physically valid scenes. Using PARSE, we build PARSE-10K, a dataset of 10,000 3D indoor scenes constructed from real-image layout priors and a curated part-annotated shape database, each with dense contact structures and a part-level contact graph. With this structured, spatially grounded supervision, fine-tuning Qwen3-VL on PARSE-10K yields stronger object-level layout reasoning and more accurate part-level relation understanding; furthermore, leveraging PAGs as structural priors in 3D generation models leads to scenes with substantially improved physical realism and structural complexity. Together, these results show that PARSE significantly advances geometry-grounded spatial reasoning and supports the generation of physically consistent 3D scenes.

2603.06915 2026-06-08 cs.CL cs.LG 版本更新

A Dynamic Self-Evolving Extraction System

一种动态自演化抽取系统

Moin Amin-Naseri, Hannah Kim, Estevam Hruschka

发表机构 * Megagon Labs（Megagon实验室）

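AI summary: Proposes the DySECT system, which builds a knowledge base from LLM-extracted triples, enriches it with probabilistic knowledge and graph-based reasoning, and feeds it back to improve the extractor, forming a closed loop of continual improvement.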

AI中文摘要

从原始文本中抽取结构化信息是许多NLP应用（包括文档检索、排序和相关性估计）的基本组成部分。高质量的抽取通常需要领域特定的准确性、对专业分类法的最新理解，以及吸收新兴术语和罕见异常值的能力。在许多领域（如医疗、法律和人力资源），抽取模型还必须适应不断变化的术语，并受益于对结构化知识的显式推理。我们提出了DySECT，一个动态自演化抽取与策管工具包，它在使用过程中持续改进。该系统逐步用LLM抽取的三元组填充一个多功能、自扩展的知识库（KB）。KB通过整合概率知识和基于图的推理进一步丰富自身，逐步积累领域概念和关系。然后，丰富的KB通过提示调优、采样相关少样本示例或使用KB衍生的合成数据进行微调，反馈给LLM抽取器。结果，系统形成了一个共生的闭环循环，其中抽取持续改进知识，知识持续改进抽取。

英文摘要

The extraction of structured information from raw text is a fundamental component of many NLP applications, including document retrieval, ranking, and relevance estimation. High-quality extractions often require domain-specific accuracy, up-to-date understanding of specialized taxonomies, and the ability to incorporate emerging jargon and rare outliers. In many domains--such as medical, legal, and HR--the extraction model must also adapt to shifting terminology and benefit from explicit reasoning over structured knowledge. We propose DySECT, a Dynamic Self-Evolving Extraction and Curation Toolkit, which continually improves as it is used. The system incrementally populates a versatile, self-expanding knowledge base (KB) with triples extracted by the LLM. The KB further enriches itself through the integration of probabilistic knowledge and graph-based reasoning, gradually accumulating domain concepts and relationships. The enriched KB then feeds back into the LLM extractor via prompt tuning, sampling of relevant few-shot examples, or fine-tuning using KB-derived synthetic data. As a result, the system forms a symbiotic closed-loop cycle in which extraction continuously improves knowledge, and knowledge continuously improves extraction.

2603.06673 2026-06-08 cs.CV cs.LG 版本更新

Unmixing ATR-μFTIR spectroscopic images of cross-sections of historical oil paintings

历史油画横截面的ATR-μFTIR光谱图像解混

Shivam Pande, Nicolas Nadisic, Francisco Mederos-Henry, Aleksandra Pizurica

发表机构 * Belgian Federal Science Policy（比利时联邦科学政策）； FED-tWIN project（FED-tWIN项目）； Prf-2022-050 BALaTAI ； Prf-2021-002 MatCoRe

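AI summary: Proposes an unsupervised CNN autoencoder with a weighted spectral angle distance loss for unmixing ATR-μFTIR hyperspectral images; it estimates endmember spectra and abundance maps automatically and improves interpretability in contamination-prone spectral regions.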

Comments: 5 pages, accepted at EUSIPCO 2026

AI中文摘要

光谱成像已成为遗产科学的核心技术，因为它能够对文物中的材料进行非侵入性、空间分辨的表征。特别是，衰减全反射傅里叶变换红外显微镜（ATR-$μ$FTIR）被广泛用于分析绘画横截面，其中在每个像素处记录光谱以形成高光谱图像（HSI）。解释这些数据是困难的：光谱通常是异质、多层和退化样品中多种物质的混合物，而当前实践仍然严重依赖于与参考库的手动比较。这种工作流程缓慢、主观且难以扩展。我们提出了一种无监督CNN自编码器，用于盲解混ATR-$μ$FTIR HSI，通过基于块建模利用局部空间结构，估计端元光谱及其丰度图。为了减少对超过1500个波段的大气和采集伪影的敏感性，我们引入了一种加权光谱角距离（WSAD）损失，该损失具有从空间平坦度、邻域一致性和光谱粗糙度的稳健度量中自动导出的波段可靠性权重。与标准SAD训练相比，WSAD在易受污染的光谱区域提高了可解释性。我们在凡·艾克兄弟的根特祭坛画的ATR-$μ$FTIR横截面上演示了该方法。

英文摘要

Spectroscopic imaging (SI) has become central to heritage science because it enables non-invasive, spatially resolved characterisation of materials in artefacts. In particular, attenuated total reflection Fourier transform infrared microscopy (ATR-$μ$FTIR) is widely used to analyse painting cross-sections, where a spectrum is recorded at each pixel to form a hyperspectral image (HSI). Interpreting these data is difficult: spectra are often mixtures of several species in heterogeneous, multi-layered and degraded samples, and current practice still relies heavily on manual comparison with reference libraries. This workflow is slow, subjective and hard to scale. We propose an unsupervised CNN autoencoder for blind unmixing of ATR-$μ$FTIR HSIs, estimating endmember spectra and their abundance maps while exploiting local spatial structure through patch-based modelling. To reduce sensitivity to atmospheric and acquisition artefacts across more than 1500 bands, we introduce a weighted spectral angle distance (WSAD) loss with automatic band-reliability weights derived from robust measures of spatial flatness, neighbour agreement and spectral roughness. Compared with standard SAD training, WSAD improves interpretability in contamination-prone spectral regions. We demonstrate the method on an ATR-$μ$FTIR cross-section from the Ghent Altarpiece by the Van Eyck brothers.
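
A weighted spectral angle between two spectra can be sketched as follows. The abstract does not give the WSAD formula, so this sketch assumes the simplest form, the angle under a band-weighted inner product, and the spectra and weights are made-up values. The paper derives its band-reliability weights automatically, whereas here they are set by hand.

```python
import math


def weighted_sad(a, b, weights):
    """Spectral angle between spectra a and b under per-band weights."""
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    cosine = dot / (norm_a * norm_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


reference = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 2.0, 3.0, 9.0]      # last band is corrupted
uniform = [1.0, 1.0, 1.0, 1.0]       # plain SAD
downweighted = [1.0, 1.0, 1.0, 0.0]  # treat the last band as unreliable

angle_plain = weighted_sad(reference, observed, uniform)
angle_weighted = weighted_sad(reference, observed, downweighted)
```

With uniform weights the corrupted band produces a clearly non-zero angle, while giving that band zero weight brings the angle to approximately zero, since the remaining bands agree exactly. The function does not handle spectra whose weighted norm is zero.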

2512.14391 2026-06-08 cs.LG cs.AI cs.CL 版本更新

RePo: Language Models with Context Re-Positioning

RePo：具有上下文重定位的语言模型

Huayang Li, Tianyu Zhao, Deng Cai, Richard Sproat

发表机构 * University of Maryland（马里兰大学）

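AI summary: Proposes the RePo mechanism, which uses a differentiable module to reassign token positions and relieve the burden on attention layers, consistently improving performance on tasks with noisy contexts, structured data, and long contexts.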

Comments: Accepted to ICML 2026

AI中文摘要

上下文学习是现代大型语言模型（LLM）的基础；然而，主流架构通过分配线性或常数的位置索引来施加刚性且固定的上下文结构。刚性的位置信息将组织输入结构的全部负担强加给注意力层，从而减少了可用于更关键信息的注意力量。为了解决这个问题，我们提出了RePo，一种通过上下文重定位来减轻注意力层负担的新机制。与传统方法不同，RePo利用可微分模块$f_ϕ$来分配捕获上下文依赖关系的token位置，而不是依赖预定义的顺序。通过在OLMo-2 1B和7B模型上持续预训练，我们证明RePo在涉及噪声上下文、结构化数据和更长上下文长度的任务上持续提升性能，同时在一般短上下文任务上保持有竞争力的性能。分析表明，RePo成功地将更多注意力分配给遥远但相关的信息，在密集且非线性的空间中分配位置，并捕获输入上下文的内在结构。我们的代码位于https://github.com/SakanaAI/repo。

英文摘要

In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant positional indices. This rigid position information places the full burden of organizing the input structure on the attention layers, thus reducing the amount of attention that could be allocated to more critical information. To address this, we propose RePo, a novel mechanism that alleviates the burden on attention layers via context re-positioning. Unlike conventional approaches, RePo utilizes a differentiable module, $f_ϕ$, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined order. By continually pre-training the OLMo-2 1B & 7B models, we demonstrate that RePo consistently enhances performance on tasks involving noisy contexts, structured data, and longer context length, while maintaining competitive performance on general short-context tasks. Analysis reveals that RePo successfully allocates more attention mass to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context. Our code is at https://github.com/SakanaAI/repo.

2603.02970 2026-06-08 cs.LG math.OC 版本更新

LAGO: A Local-Global Optimization Framework Combining Trust Region Methods and Bayesian Optimization

LAGO：一种结合信赖域方法和贝叶斯优化的局部-全局优化框架

Eliott Van Dieren, Tommaso Vanzan, Fabio Nobile

发表机构 * Institute of Mathematics EPFL（瑞士联邦理工学院数学研究所）； Dipartimento di Scienze Matematiche Politecnico di Torino（都灵理工大学数学系）

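AI summary: Proposes the LAGO framework, which couples Bayesian optimization with gradient-based trust-region local refinement through an adaptive competition mechanism, for smooth, expensive objective functions with available gradients; global exploration and local refinement are separated at the proposal level.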

Comments: 21 pages, 12 figures

AI中文摘要

我们提出LAGO，一种局部-全局优化框架，通过自适应竞争机制耦合贝叶斯优化（BO）和基于梯度的信赖域局部细化，用于光滑且梯度可用的昂贵目标函数优化。在每次迭代中，全局和局部优化策略独立提议候选点，并根据预测改进选择下一个评估点。LAGO在提议层面分离全局探索与局部细化：BO采集函数在活跃信赖域外优化，而局部候选点在信赖域内提议。仅当满足基于长度尺度的最小距离准则时，接受局部步附近的点才被纳入全局GP数据集，从而降低局部利用期间数值不稳定的风险。LAGO在到达有希望区域时通过高效局部细化增强BO，并在局部步不具竞争力时恢复探索行为。

英文摘要

We introduce LAGO, a LocAl-Global Optimization framework coupling Bayesian Optimization (BO) and gradient-based trust region local refinement through an adaptive competition mechanism for smooth expensive-to-evaluate objective functions with available gradients. At each iteration, global and local optimization strategies independently propose candidate points, and the next evaluation is selected based on predicted improvement. LAGO separates global exploration from local refinement at the proposal level: the BO acquisition function is optimized outside the active trust region, while local candidates are proposed within the trust region. Points in the vicinity of the accepted local step are incorporated in the global GP dataset only when satisfying a lengthscale-based minimum-distance criterion, hence reducing the risk of numerical instability during local exploitation. LAGO enhances BO with efficient local refinement when reaching promising regions, and reverts to exploratory behavior when local steps are not competitive.

2603.02220 2026-06-08 cs.LG cs.AI cs.CV 版本更新

Forecasting as Rendering: A 2D Gaussian Splatting Framework for Time Series Forecasting

预测即渲染：面向时间序列预测的2D高斯泼溅框架

Yixin Wang, Yifan Hu, Peiyuan Liu, Naiqi Li, Tao Dai, Shu-Tao Xia

发表机构 * Tsinghua Shenzhen International Graduate School, Tsinghua University（清华大学深圳国际研究生院）； College of Computer Science and Software Engineering, Shenzhen University（深圳大学计算机科学与软件工程学院）

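AI summary: Proposes the TimeGS framework, which recasts time series forecasting as 2D Gaussian splatting generative rendering, using anisotropic Gaussian kernels and chronologically continuous rasterization to model intra-period and inter-period structure; it reaches state-of-the-art or competitive performance.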

AI中文摘要

时间序列预测仍然是一个具有挑战性的问题，因为周期内波动和周期间趋势的复杂纠缠。尽管最近的进展试图将一维序列重塑为二维周期-相位表示，但它们存在两个主要局限性。首先，将重塑后的张量视为静态图像会导致拓扑不匹配，因为标准空间算子在网格边界处切断了时间连续性。其次，依赖统一的固定大小表示会低效地分配建模能力，并且无法为可压缩的非平稳时间模式提供所需的自适应分辨率。为了解决这些局限性，我们引入了TimeGS，这是一个新颖的框架，从根本上将预测范式从回归转变为二维生成渲染。通过将未来序列重新概念化为潜在的二维时间表面，TimeGS利用高斯核的固有各向异性，以灵活的几何对齐自适应地建模复杂变化。为了实现这一点，我们引入了多基高斯核生成（MB-GKG）块，该块从固定字典中合成核以稳定优化，以及多周期时间连续光栅化（MP-CCR）块，该块在周期边界上强制执行严格的时间连续性。在标准基准数据集上的全面实验表明，TimeGS达到了最先进或具有竞争力的性能。代码位于https://github.com/yixinwang1/TimeGS。

英文摘要

Time series forecasting remains a challenging problem due to the intricate entanglement of intra-period fluctuations and inter-period trends. While recent advances have attempted to reshape 1D sequences into 2D period-phase representations, they suffer from two principal limitations. Firstly, treating reshaped tensors as static images results in a topological mismatch, as standard spatial operators sever chronological continuity at grid boundaries. Secondly, relying on uniform fixed-size representations allocates modeling capacity inefficiently and fails to provide the adaptive resolution required for compressible, non-stationary temporal patterns. To address these limitations, we introduce TimeGS, a novel framework that fundamentally shifts the forecasting paradigm from regression to 2D generative rendering. By reconceptualizing the future sequence as a latent 2D temporal surface, TimeGS utilizes the inherent anisotropy of Gaussian kernels to adaptively model complex variations with flexible geometric alignment. To realize this, we introduce a Multi-Basis Gaussian Kernel Generation (MB-GKG) block that synthesizes kernels from a fixed dictionary to stabilize optimization, and a Multi-Period Chronologically Continuous Rasterization (MP-CCR) block that enforces strict temporal continuity across periodic boundaries. Comprehensive experiments on standard benchmark datasets demonstrate that TimeGS attains state-of-the-art or competitive performance. The code is at https://github.com/yixinwang1/TimeGS.

2602.19213 2026-06-08 cs.CV 版本更新

SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation

SegMoTE: 用于医学图像分割的令牌级混合专家模型

Yujie Lu, Jingwen Li, Sibo Ju, Yanzhou Su, he yao, Yisong Liu, Min Zhu, Junlong Cheng

发表机构 * Sichuan University（四川大学）； Xinjiang University（新疆大学）； Fuzhou University（福州大学）； Alibaba DAMO Academy（阿里巴巴达摩院）

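AI summary: Proposes the SegMoTE framework, which uses a token-level mixture-of-experts mechanism and progressive prompt tokenization to achieve cross-modality adaptation and SOTA performance in medical image segmentation at very low annotation cost.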

Abstract

Medical image segmentation is vital for clinical diagnosis and quantitative analysis, yet remains challenging due to the heterogeneity of imaging modalities and the high cost of pixel-level annotations. Although general interactive segmentation models like SAM have achieved remarkable progress, their transfer to medical imaging still faces two key bottlenecks: (i) the lack of adaptive mechanisms for modality- and anatomy-specific tasks, which limits generalization in out-of-distribution medical scenarios; and (ii) current medical adaptation methods fine-tune on large, heterogeneous datasets without selection, leading to noisy supervision, higher cost, and negative transfer. To address these issues, we propose SegMoTE, an efficient and adaptive framework for medical image segmentation. SegMoTE preserves SAM's original prompt interface, efficient inference, and zero-shot generalization while introducing only a small number of learnable parameters to dynamically adapt across modalities and tasks. In addition, we design a progressive prompt tokenization mechanism that enables fully automatic segmentation, significantly reducing annotation dependence. Trained on MedSeg-HQ, a curated dataset less than 1% the size of existing large-scale datasets, SegMoTE achieves SOTA performance across diverse imaging modalities and anatomical tasks. It represents the first efficient, robust, and scalable adaptation of general segmentation models to the medical domain under extremely low annotation cost, advancing the practical deployment of foundation vision models in clinical applications.

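2511.18945 2026-06-08 cs.LG cs.IT math.IT (version update)

MIST: Mutual Information Estimation Via Supervised Training

German Gritsai, Megan Richards, Maxime Méloux, Kyunghyun Cho, Maxime Peyrard

Affiliations: Université Grenoble Alpes; CNRS; Grenoble INP; LIG

AI summary: Proposes MIST, a fully data-driven, neural-network-based mutual information estimator trained on a large synthetic meta-dataset. It uses a two-dimensional attention scheme to handle variable-size samples and quantile regression to quantify uncertainty; experiments show it outperforms classical methods with fast inference.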

Abstract

We propose a fully data-driven approach to designing mutual information (MI) estimators. Since any MI estimator is a function of the observed sample from two random variables, we parameterize this function with a neural network (MIST) and train it end-to-end to predict MI values. Training is performed on a large meta-dataset of 625,000 synthetic joint distributions with known ground-truth MI. To handle variable sample sizes and dimensions, we employ a two-dimensional attention scheme ensuring permutation invariance across input samples. To quantify uncertainty, we optimize a quantile regression loss, enabling the estimator to approximate the sampling distribution of MI rather than return a single point estimate. This research program departs from prior work by taking a fully empirical route, trading universal theoretical guarantees for flexibility and efficiency. Empirically, the learned estimators largely outperform classical baselines across sample sizes and dimensions, including on joint distributions unseen during training. The resulting quantile-based intervals are well-calibrated and more reliable than bootstrap-based confidence intervals, while inference is orders of magnitude faster than existing neural baselines. Beyond immediate empirical gains, this framework yields trainable, fully differentiable estimators that can be embedded into larger learning pipelines. Moreover, exploiting MI's invariance to invertible transformations, meta-datasets can be adapted to arbitrary data modalities via normalizing flows, enabling flexible training for diverse target meta-distributions.
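The quantile regression objective mentioned in the abstract can be illustrated with the standard pinball loss. This is a minimal sketch, not the authors' training code: the function names `pinball_loss` and `mean_quantile_loss` and the quantile levels used are assumptions made for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for one quantile level tau in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return max(tau * diff, (tau - 1.0) * diff)


def mean_quantile_loss(y_true, preds_by_tau):
    """Average pinball loss over several quantile levels.

    `preds_by_tau` maps each quantile level to the predicted quantile
    (a hypothetical output format for a quantile-regression estimator).
    """
    losses = [pinball_loss(y_true, q, tau) for tau, q in preds_by_tau.items()]
    return sum(losses) / len(losses)


# Hypothetical example: true MI of 1.0 and three predicted quantiles.
example = mean_quantile_loss(1.0, {0.1: 0.8, 0.5: 1.0, 0.9: 1.3})
```

Minimizing this loss at several levels at once yields an interval (for example, between the 0.1 and 0.9 quantiles) rather than a single point estimate.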

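2602.18905 2026-06-08 cs.LG cs.AI cs.CL (version update)

TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning

Yujiao Yang

Affiliations: Dalian University of Technology

AI summary: Proposes the TRUE framework, which provides multi-level verifiable explanations of LLM reasoning at the instance, local-structure, and class levels through executable reasoning verification, feasible-region DAG modeling, and causal failure mode analysis.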

Abstract

Large language models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, yet their decision-making processes remain difficult to interpret. Existing explanation methods often lack trustworthy structural insight and are limited to single-instance analysis, failing to reveal reasoning stability and systematic failure mechanisms. To address these limitations, we propose the Trustworthy Unified Explanation Framework (TRUE), which integrates executable reasoning verification, feasible-region directed acyclic graph (DAG) modeling, and causal failure mode analysis. At the instance level, we redefine reasoning traces as executable process specifications and introduce blind execution verification to assess operational validity. At the local structural level, we construct feasible-region DAGs via structure-consistent perturbations, enabling explicit characterization of reasoning stability and the executable region in the local input space. At the class level, we introduce a causal failure mode analysis method that identifies recurring structural failure patterns and quantifies their causal influence using Shapley values. Extensive experiments across multiple reasoning benchmarks demonstrate that the proposed framework provides multi-level, verifiable explanations, including executable reasoning structures for individual instances, feasible-region representations for neighboring inputs, and interpretable failure modes with quantified importance at the class level. These results establish a unified and principled paradigm for improving the interpretability and reliability of LLM reasoning systems.
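Shapley values, which the abstract uses to quantify the influence of failure patterns, can be computed exactly for a small set of players by averaging marginal contributions over all orderings. The sketch below is generic and is not the paper's pipeline; the pattern names and the additive `value` function are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating every ordering of the players.

    `value` maps a frozenset of players to a real number. The cost grows
    factorially, so this is only practical for a handful of players.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical failure patterns with an additive effect on error rate.
weights = {"pattern_a": 1.0, "pattern_b": 2.0, "pattern_c": 3.0}
scores = shapley_values(list(weights), lambda s: sum(weights[p] for p in s))
```

For an additive value function each player's Shapley value equals its own weight, which makes this a convenient sanity check.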

2602.16864 2026-06-08 cs.LG cs.AI math.DS (version update)

Position: A Dynamical Systems Perspective is Needed to Advance Time Series Modeling

Daniel Durstewitz, Christoph Jürgen Hemmer, Florian Hess, Charlotte Ricarda Doll, Lukas Eisenmann

Affiliations: University of Tübingen

AI summary: Argues that time series modeling needs a dynamical systems perspective, in which reconstructing the underlying DS enables better forecasting, and discusses the theoretical advantages along with specific suggestions.

Abstract

Time series (TS) modeling has come a long way from early statistical, mainly linear, approaches to the current trend in TS foundation models. With a lot of hype and industrial demand in this field, it is not always clear how much progress there really is. To advance TS forecasting and analysis to the next level, here we argue that the field needs a dynamical systems (DS) perspective. TS of observations from natural or engineered systems almost always originate from some underlying DS, and arguably access to its governing equations would yield theoretically optimal forecasts. This is the promise of DS reconstruction (DSR), a class of ML/AI approaches that aim to infer surrogate models of the underlying DS from data. But models based on DS principles offer other profound advantages: Beyond short-term forecasts, they make it possible to predict the long-term statistics of an observed system, which in many practical scenarios may be the more relevant quantities. DS theory furthermore provides domain-independent theoretical insight into mechanisms underlying TS generation, and thereby will inform us, e.g., about upper bounds on performance of any TS model, generalization into unseen regimes as in tipping points, or potential control strategies. After reviewing some of the central concepts, methods, measures, and models in DS theory and DSR, we will discuss how insights from this field can advance TS modeling in crucial ways, enabling better forecasting with much lower computational and memory footprints. We conclude with a number of specific suggestions for translating insights from DSR into TS modeling.

2602.16073 2026-06-08 cs.RO cs.AI cs.LO cs.SY eess.SY (version update)

ScenicRules: An Autonomous Driving Benchmark with Multi-Objective Specifications and Abstract Scenarios

Kevin Kai-Chun Chang, Ekin Beyazit, Alberto Sangiovanni-Vincentelli, Tichakorn Wongpiromsarn, Sanjit A. Seshia

Affiliations: University of California, Berkeley; Massachusetts Institute of Technology

AI summary: Proposes the ScenicRules benchmark, which uses a hierarchical rulebook framework and formal scenario models to evaluate how well autonomous driving systems satisfy prioritized multi-objective specifications in stochastic environments.

Comments: v2: Minor numerical corrections for Table V. 16 pages, 14 figures, 7 tables. Extended version of paper accepted to 2026 IEEE Intelligent Vehicles Symposium (IV 2026). ScenicRules benchmark available at https://github.com/BerkeleyLearnVerify/ScenicRules

Abstract

Developing autonomous driving systems for complex traffic environments requires balancing multiple objectives, such as avoiding collisions, obeying traffic rules, and making efficient progress. In many situations, these objectives cannot be satisfied simultaneously, and explicit priority relations naturally arise. Also, driving rules require context, so it is important to formally model the environment scenarios within which such rules apply. Existing benchmarks for evaluating autonomous vehicles lack such combinations of multi-objective prioritized rules and formal environment models. In this work, we introduce ScenicRules, a benchmark for evaluating autonomous driving systems in stochastic environments under prioritized multi-objective specifications. We first formalize a diverse set of objectives to serve as quantitative evaluation metrics. Next, we design a Hierarchical Rulebook framework that encodes multiple objectives and their priority relations in an interpretable and adaptable manner. We then construct a compact yet representative collection of scenarios spanning diverse driving contexts and near-accident situations, formally modeled in the Scenic language. Experimental results show that our formalized objectives and Hierarchical Rulebooks align well with human driving judgments and that our benchmark effectively exposes agent failures with respect to the prioritized objectives. Our benchmark can be accessed at https://github.com/BerkeleyLearnVerify/ScenicRules/.

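2502.00225 2026-06-08 cs.LG cs.AI cs.CL (version update)

Should You Use Your Large Language Model to Explore or Exploit?

Keegan Harris, Aleksandrs Slivkins

Affiliations: UC Berkeley; Microsoft Research

AI summary: Studies the decision-making ability of current large language models under the exploration-exploitation tradeoff by evaluating exploration and exploitation tasks separately. Reasoning models show the most promise on exploitation tasks but are costly. Non-reasoning models improve on medium-difficulty tasks with tool use and in-context summarization, yet all LLMs studied perform worse than simple linear regression on all tasks. LLMs do help when exploring large action spaces with inherent semantics.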

Comments: Accepted to UAI 2026

Abstract

We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. While previous work has largely studied the ability of LLMs to solve combined exploration-exploitation tasks, we take a more systematic approach and use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that reasoning models show the most promise for solving exploitation tasks, although they are still too expensive or too slow to be used in many practical settings. Motivated by this, we study tool use and in-context summarization using non-reasoning models. We find that these mitigations may be used to substantially improve performance on medium-difficulty tasks; however, even then, all LLMs we study perform worse than a simple linear regression, even in non-linear settings. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore.

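2602.15287 2026-06-08 cs.CV (version update)

Consistency-Preserving Diverse Video Generation

Xinshuang Liu, Runfa Blark Li, Truong Nguyen

Affiliations: University of California, Berkeley

AI summary: Proposes a joint-sampling framework that improves the diversity of videos within a batch in text-to-video generation while preserving temporal consistency, using lightweight latent-space models to avoid video decoding and decoder backpropagation.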

Abstract

Text-to-video generation is expensive, so only a few samples are typically produced per prompt. In this low-sample regime, maximizing the value of each batch requires high cross-video diversity. Recent methods improve diversity for image generation, but for videos they often degrade within-video temporal consistency and require costly backpropagation through a video decoder. We propose a joint-sampling framework for flow-matching video generators that improves batch diversity while preserving temporal consistency. Our approach applies diversity-driven updates and then removes only the components that would decrease a temporal-consistency objective. To avoid image-space gradients, we compute both objectives with lightweight latent-space models, avoiding video decoding and decoder backpropagation. Experiments on a state-of-the-art text-to-video flow-matching model show diversity close to strong joint-sampling baselines while substantially improving temporal consistency and color naturalness. Our code is available at https://github.com/XinshuangL/Diverse-Video.
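The idea of applying a diversity-driven update and then removing only the components that would decrease a temporal-consistency objective can be sketched as a gradient projection. This is an illustrative first-order reading of the abstract, not the released code: the function name `consistency_safe_update`, the plain-list vectors, and the use of a single consistency gradient are all assumptions.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def consistency_safe_update(diversity_step, consistency_grad, eps=1e-12):
    """Drop the part of `diversity_step` that opposes `consistency_grad`.

    To first order, a step lowers the consistency objective when its inner
    product with the objective's gradient is negative. In that case the
    component along the gradient is projected out; otherwise the step is
    returned unchanged.
    """
    inner = dot(diversity_step, consistency_grad)
    if inner >= 0.0:
        return list(diversity_step)
    scale = inner / (dot(consistency_grad, consistency_grad) + eps)
    return [d - scale * g for d, g in zip(diversity_step, consistency_grad)]


# Hypothetical 2-D latent update: the second coordinate conflicts.
conflicting = consistency_safe_update([1.0, -1.0], [0.0, 1.0])
harmless = consistency_safe_update([1.0, 1.0], [0.0, 1.0])
```

In the conflicting case only the second coordinate is zeroed out, so the diversity gain along the first coordinate is kept.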

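2602.14209 2026-06-08 cs.LG cs.CL (version update)

MAGE: All-[MASK] Block Already Knows Where to Look in Block Diffusion LLM

Omin Kwon, Yeonjae Kim, Doyeon Kim, Minseo Kim, Yeonhong Park, Jae W. Lee

Affiliations: Seoul National University; Meta

AI summary: Targets the memory bottleneck caused by the KV cache in long-context inference for block diffusion LLMs. Proposes MAGE, a training-free method that exploits an alignment property of the block-diffusion training objective to determine the KV subset for the entire trajectory at the first step, achieving near-lossless accuracy and substantial speedup.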

Abstract

Block diffusion LLMs are an emerging paradigm for parallel language generation, but their KV caching makes memory access the dominant bottleneck in long-context inference. Sparse attention, which attends only to a small KV subset per query, can reduce this latency with minimal accuracy loss. In block diffusion, however, the B tokens of each block must share a single KV subset, and we show this per-block constraint degrades existing sparse KV estimators by up to 25% in recall. We address this challenge by exploiting a property that emerges from the block-diffusion training objective: it aligns the block-average query across denoising steps, so the All-[MASK] block at the first step already reveals the per-block KV subset for the entire trajectory. We exploit this in MAGE ([MASK]-Guided Sparse Attention), a training-free method that runs one exact attention pass at the first step and reuses its top-k index sets for all remaining steps within the block. Across three block-diffusion families on LongBench, MAGE matches Exact Attention at k=512 with near-lossless accuracy, achieves up to 6.82x end-to-end speedup at 128K context, and runs up to 3.35x and 2.28x faster than Quest and SparseD, designed for AR LLMs and fully bidirectional diffusion LLMs, respectively.
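The select-once, reuse-later pattern described in the abstract can be sketched in plain Python. This is a toy illustration under stated assumptions, not the MAGE implementation: scoring keys with the block-averaged query, the helper names, and the tiny example tensors are all hypothetical.

```python
import math


def block_average(queries):
    """Average the query vectors of one block, component by component."""
    n = len(queries)
    return [sum(q[j] for q in queries) / n for j in range(len(queries[0]))]


def topk_indices(query, keys, k):
    """Indices of the k keys with the largest dot product against `query`."""
    scores = [sum(a * b for a, b in zip(query, key)) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, keep):
    """Softmax attention restricted to the key/value rows listed in `keep`."""
    scale = math.sqrt(len(query))
    logits = [sum(a * b for a, b in zip(query, keys[i])) / scale for i in keep]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [
        sum(w * values[i][j] for w, i in zip(weights, keep)) / total
        for j in range(len(values[0]))
    ]


# First step: choose the shared index set once from the All-[MASK] block.
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0], [0.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
mask_block_queries = [[1.0, 0.0], [1.0, 0.0]]
keep = topk_indices(block_average(mask_block_queries), keys, k=2)

# Later denoising steps: reuse `keep` instead of re-scoring every key.
output = sparse_attention([0.9, 0.1], keys, values, keep)
```

The saving comes from the later steps reading only the `k` retained rows of the KV cache rather than the full context.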

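2602.12360 2026-06-08 cs.RO (version update)

Predicting Dynamic Map States from Limited Field-of-View Sensor Data

Knut Peterson, David Han

Affiliations: iMaPLe Research Lab, Drexel University

AI summary: To address sensors' limited field of view, proposes encoding spatial and temporal information into a single-image format so that existing image-to-image learning models can predict dynamic map states with high accuracy.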

Comments: 6 pages, 4 figures. Accepted to the 2026 International Conference on Advanced Visual and Signal-Based Systems (AVSS)

Abstract

When autonomous systems are deployed in real-world scenarios, sensors are often subject to limited field-of-view (FOV) constraints, either naturally through system design, or through unexpected occlusions or sensor failures. In conditions where a large FOV is unavailable, it is important to be able to infer information about the environment and predict the state of nearby surroundings based on available data to maintain safe and accurate operation. In this work, we explore the effectiveness of deep learning for dynamic map state prediction based on limited FOV time series data. We show that by representing dynamic sensor data in a simple single-image format that captures both spatial and temporal information, we can effectively use a wide variety of existing image-to-image learning models to predict map states with high accuracy in a diverse set of sensing scenarios.

2602.11201 2026-06-08 cs.CL (version update)

Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning

Donald Ye, Max Loffgren, Om Kotadia, Linus Wong, Jonas Rohweder

Affiliations: Fordham University; Algoverse AI Research; Rice University; UC San Diego; Santa Clara University; LMU Munich

AI summary: Proposes the Normalized Logit Difference Decay (NLDD) metric, which corrupts reasoning steps and measures the drop in model confidence. It finds that chain-of-thought tokens beyond 70-85% of chain length contribute little or negatively to the final answer, revealing a faithfulness decay phenomenon.

Comments: 16 pages, 16 figures. Accepted to ICLR LIT workshop. Code: https://github.com/donald-ye/NLDD

Abstract

Chain-of-Thought (CoT) explanations are widely used to interpret how language models solve complex problems, yet it remains unclear whether these step-by-step explanations reflect how the model actually reaches its answer, or are merely post-hoc justifications. We propose Normalized Logit Difference Decay (NLDD), a metric that measures whether individual reasoning steps are faithful to the model's decision-making process. Our approach corrupts individual reasoning steps from the explanation and measures how much the model's confidence in its answer drops, to determine if a step is truly important. By standardizing these measurements, NLDD enables rigorous cross-model comparison across different architectures. Testing three model families across syntactic, logical, and arithmetic tasks, we discover a consistent Reasoning Horizon (k*) at 70-85% of chain length, beyond which reasoning tokens have little or negative effect on the final answer. We also find that models can encode correct internal representations while completely failing the task. These results show that accuracy alone does not reveal whether a model actually reasons through its chain. NLDD offers a way to measure when CoT matters.

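2512.20963 2026-06-08 cs.LG cs.CV (version update)

Generalization of Diffusion Models Arises with a Balanced Representation Space

Zekai Zhang, Xiao Li, Xiang Li, Lianghe Shi, Meng Wu, Molei Tao, Qing Qu

Affiliations: University of Science and Technology of China

AI summary: By analyzing a two-layer ReLU denoising autoencoder, proves that memorization yields localized spiky representations while generalization yields balanced representations, validates this on real-world diffusion models, and proposes representation-based detection and editing methods.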

Comments: Accepted at ICLR 2026. 40 pages, 19 figures. The first two authors contributed equally

Abstract

Diffusion models excel at generating high-quality, diverse samples, yet they risk memorizing training data when overfit to the training objective. We analyze the distinctions between memorization and generalization in diffusion models through the lens of representation learning. By investigating a two-layer ReLU denoising autoencoder (DAE), we prove that (i) memorization corresponds to the model storing raw training samples in the learned weights for encoding and decoding, yielding localized spiky representations, whereas (ii) generalization arises when the model captures local data statistics, producing balanced representations. Furthermore, we validate these theoretical findings on real-world unconditional and text-to-image diffusion models, demonstrating that the same representation structures emerge in deep generative models with significant practical implications. Building on these insights, we propose a representation-based method for detecting memorization and a training-free editing technique that allows precise control via representation steering. Together, our results highlight that learning good representations is central to novel and meaningful generative modeling.

2602.02600 2026-06-08 cs.LG cs.AI (version update)

Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

Eliron Rahimi, Elad Hirshel, Rom Himelstein, Amit LeVi, Avi Mendelson, Chaim Baskin

Affiliations: Department of Computer Science, Technion – Israel Institute of Technology; INSIGHT Lab, School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Israel; Computer Science Department, University of Haifa, Haifa, Israel

AI summary: Studies how diffusion language models (DLMs) and autoregressive (AR) models differ in refusing harmful generations, finds that diffusion remasking can promote recovery, proposes the Step-Wise Refusal Internal Dynamics (SRI) signal, and builds on it a jailbreak detector that does not modify inference.

Comments: Preprint

Abstract

Diffusion language models (DLMs) have recently emerged as a competitive alternative to autoregressive (AR) models, offering parallel decoding, competitive generation quality, and initial evidence of improved jailbreak robustness. Despite this progress, the role of sampling mechanisms in shaping refusal behavior remains poorly understood. To address this gap, we present a comprehensive study of step-wise refusal dynamics. We show that diffusion remasking can promote recovery from harmful intermediate generations, provide evidence that this behavior is tied to the sampling mechanism, and demonstrate that switching from AR to diffusion sampling improves jailbreak robustness, including under fixed model weights. To capture generation dynamics not observable at the text level, we propose the Step-Wise Refusal Internal Dynamics (SRI) signal. Consistent with our text-level findings, SRI shows that recovery fails primarily under AR sampling, with these failures often appearing anomalous relative to harmless generations in the SRI space. Based on this observation, we show that SRI enables a simple jailbreak detector that does not modify inference and generalizes to unseen attacks by training only on benign SRI signals. Our evaluation shows that this detector matches or outperforms existing jailbreak detection baselines while adding negligible overhead.

2602.07025 2026-06-08 cs.CV cs.AI (version update)

The Geometry of Representational Failures in Vision Language Models

Daniele Savietto, Declan Campbell, André Panisson, Marco Nurisso, Giovanni Petri, Jonathan D. Cohen, Alan Perotti

Affiliations: Dipartimento di Fisica, Università di Torino; Princeton Neuroscience Institute and AI Lab, Princeton University; Intesa Sanpaolo AI Research; Dipartimento di Scienze Matematiche, Politecnico di Torino; Network Science Institute, Northeastern University London, UK

AI summary: By analyzing the geometric overlap of concept vectors in open-weight vision-language models, reveals a link between errors such as hallucination in multi-object visual tasks and cognitive constraints, and proposes an intervention-based validation method.

Abstract

Vision-Language Models (VLMs) exhibit puzzling failures in multi-object visual tasks, such as hallucinating non-existent elements or failing to identify the most similar objects among distractions. While these errors mirror human cognitive constraints, such as the "Binding Problem", the internal mechanisms driving them in artificial systems remain poorly understood. Here, we propose a mechanistic insight by analyzing the representational geometry of open-weight VLMs (Qwen, InternVL, Gemma), comparing methodologies to distill "concept vectors", latent directions encoding visual concepts. We validate our concept vectors via steering interventions that reliably manipulate model behavior in both simplified and naturalistic vision tasks (e.g., forcing the model to perceive a red flower as blue). We observe that the geometric overlap between these vectors strongly correlates with specific error patterns, offering a grounded quantitative framework to understand how internal representations shape model behavior and drive visual failures.

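2602.01740 2026-06-08 cs.AI cs.CV cs.LG (version update)

MACD: Model-Aware Contrastive Decoding via Counterfactual Data

Qixin Xiao, Kun Zhou

Affiliations: arXiv.org; cs.AI

AI summary: Proposes MACD, which uses the video language model's own feedback to identify the object regions responsible for hallucination, generates object-level counterfactual inputs, and combines them with contrastive decoding to reduce hallucination and improve accuracy across multiple models in complex scenes.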

Abstract

Video language models (Video-LLMs) are prone to hallucinations, generating plausible but ungrounded content when visual evidence is weak, ambiguous, or biased. Existing methods, such as contrastive decoding (CD), rely on random perturbations to construct contrastive data for hallucination mitigation, but often fail to target the visual cues that drive hallucination or align with model weaknesses. We propose Model-Aware Counterfactual Data based Contrastive Decoding (MACD), an inference strategy that combines model-guided counterfactual construction with contrastive decoding. MACD uses the Video-LLM's own feedback to identify object regions most responsible for hallucination, generating targeted object-level counterfactual inputs rather than arbitrary frame or temporal modifications. These counterfactual inputs are integrated into CD to enforce evidence-grounded token selection during decoding. Experiments on EventHallusion, MVBench, Perception-test, and Video-MME show that MACD consistently reduces hallucination while maintaining or improving task accuracy across diverse Video-LLMs, including Qwen and InternVL, with especially strong gains in scenarios involving small, occluded, or co-occurring objects.
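Contrastive decoding, the component the abstract builds on, is commonly written as a weighted difference between the logits for the real input and the logits for a contrast input. The sketch below uses that common form as an assumption; the abstract does not state the exact formula, and the names `contrastive_logits`, `alpha`, and the example logits are hypothetical.

```python
def contrastive_logits(original, counterfactual, alpha=1.0):
    """Amplify evidence that is present only for the original input.

    Computes (1 + alpha) * original - alpha * counterfactual per token.
    Tokens that score equally well with and without the visual evidence
    get no boost, while tokens that depend on the evidence are promoted.
    """
    return [(1.0 + alpha) * o - alpha * c for o, c in zip(original, counterfactual)]


def greedy_token(logits):
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Token 0 is favored regardless of the object; token 1 needs the object.
original = [2.0, 1.5]        # hypothetical logits given the real video
counterfactual = [2.0, 0.0]  # hypothetical logits with the object altered
adjusted = contrastive_logits(original, counterfactual, alpha=1.0)
```

Here plain greedy decoding picks token 0, whereas the adjusted logits favor token 1, the one whose score actually depends on the visual evidence.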

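2602.06941 2026-06-08 cs.LG cs.AI cs.CL (version update)

Endogenous Resistance to Activation Steering in Language Models

Alex McKenzie, Keenan Pepper, Stijn Servaes, Martin Leitgab, Murat Cubuktepe, Mike Vaiana, Diogo de Lucena, Judd Rosenblatt, Michael S. A. Graziano

Affiliations: University of Washington

AI summary: Finds that large language models can endogenously resist task-misaligned activation steering, recovering correct generation through explicit restarts, and identifies the associated sparse autoencoder latents, which can be used to strengthen or weaken this resistance.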

Abstract

Large language models can recover mid-generation from task-misaligned activation steering, producing explicit verbal restarts (e.g., "wait, that's not right") and continuing on-topic even while the steering perturbation remains active. We term this Endogenous Steering Resistance (ESR). Using sparse autoencoder (SAE) latents to steer model activations, we find that Llama-3.3-70B exhibits explicit ESR at \llamaseventyEsrRate%, with smaller models from the Llama-3 and Gemma-2 families showing the explicit form less frequently. Two controls dissociate ESR into a detection event and a sustained-resistance component that conditioning on recent on-topic tokens does not fully explain. We identify \numOtdLatents{} SAE latents through contrastive on-topic/off-topic search; zero-ablating them reduces the multi-attempt rate by \multiAttemptReductionPct%, with random-latent and held-out-prompt controls supporting specificity. ESR can also be deliberately enhanced through both meta-prompting and fine-tuning on synthetic self-correction examples. ESR has dual implications for safety: it could harden models against adversarial activation-space manipulation, but may equally interfere with beneficial steering-based interventions, since the model has no way to distinguish the two. Code is available at https://github.com/agencyenterprise/endogenous-steering-resistance.

2512.17058 2026-06-08 cs.LG Version update

Universal consistency of the $k$-NN rule in metric spaces and Nagata dimension. III

Vladimir G. Pestov

Affiliations: Department of Mathematics and Statistics, University of Ottawa; Departamento de Matemática, Universidade Federal de Santa Catarina

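AI summary: This paper proves that, in a complete separable metric space, the $k$-nearest neighbour classifier is universally consistent if and only if the space has the strong Lebesgue--Besicovitch differentiation property, or equivalently is $\sigma$-finite dimensional in the sense of Nagata, supplying the last missing link.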

Details

Comments: 22 pages, LaTeX with ESAIM P&S macros; a second revision requested by the referee, with more accurate and detailed proofs. In particular, the referee pointed out the correct value of the Nagata dimension of R^2, which is 4.

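AI abstract (translated from the Chinese)

We establish the last missing link needed to describe, both in the combinatorial terms of dimension theory and through a fundamental property of real analysis, those complete separable metric spaces $X$ in which the $k$-nearest neighbour classifier is universally consistent. The following conditions are equivalent: (1) the $k$-nearest neighbour classifier is universally consistent in $X$; (2) the strong Lebesgue--Besicovitch differentiation property holds in $X$ for every locally finite Borel measure; (3) $X$ is $\sigma$-finite dimensional in the sense of Jun-Iti Nagata. The equivalence (2)$\iff$(3) was announced by Preiss (1983), while a detailed proof of (3)$\Rightarrow$(2) appeared only in Assouad and Quentin de Gromard (2006). The implication (2)$\Rightarrow$(1) was established by Cérou and Guyader (2006). We prove (1)$\Rightarrow$(3). We further show that the weak (rather than strong) Lebesgue--Besicovitch property is not sufficient for the consistency of the $k$-NN rule; the Heisenberg group, for example, is a counterexample (here we correct a wrong claim made in the earlier article (Kumari and Pestov 2024)). Somewhat counter-intuitively, there is a metric on the real line that is uniformly equivalent to the usual distance but under which the $k$-NN classifier fails. Finally, a further equivalent condition that can be added to the list above is the Cover--Hart property: (4) the error of the $1$-nearest neighbour classifier is asymptotically at most twice the Bayes error.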

English abstract

We establish the last missing link allowing us to describe those complete separable metric spaces $X$ in which the $k$-nearest neighbour classifier is universally consistent, both in combinatorial terms of dimension theory and via a fundamental property of real analysis. The following are equivalent: (1) the $k$-nearest neighbour classifier is universally consistent in $X$; (2) the strong Lebesgue--Besicovitch differentiation property holds in $X$ for every locally finite Borel measure; (3) $X$ is $\sigma$-finite dimensional in the sense of Jun-Iti Nagata. The equivalence (2)$\iff$(3) was announced by Preiss (1983), while a detailed proof of the implication (3)$\Rightarrow$(2) has only appeared in Assouad and Quentin de Gromard (2006). The implication (2)$\Rightarrow$(1) was established by Cérou and Guyader (2006). We prove the implication (1)$\Rightarrow$(3). We further show that the weak (instead of strong) Lebesgue--Besicovitch property is insufficient for the consistency of the $k$-NN rule, as witnessed, for example, by the Heisenberg group (here we correct a wrong claim made in the previous article (Kumari and Pestov 2024)). Somewhat counter-intuitively, there is a metric on the real line that is uniformly equivalent to the usual distance but under which the $k$-NN classifier fails. Finally, another equivalent condition that can be added to the above is the Cover--Hart property: (4) the error of the $1$-nearest neighbour classifier is asymptotically at most twice the Bayes error.
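Since the result concerns the $k$-NN rule in an arbitrary metric space, it may help to see that the rule uses nothing but the distance function. The following is a minimal sketch with several assumptions: the function name `knn_classify`, the toy data, and the tie-breaking convention (distance ties go to the earlier training point, vote ties to the label seen first among the nearest neighbours) are all illustrative choices, not taken from the paper. The truncated metric `capped` only shows that the metric is a parameter; it is not the counterexample constructed by the author.

```python
from collections import Counter


def knn_classify(train, query, k, metric):
    """Majority vote among the k training points closest to `query`.

    train  : list of (point, label) pairs
    metric : callable d(x, y) returning a non-negative distance
    """
    # sorted() is stable, so equidistant points keep their training order.
    neighbours = sorted(train, key=lambda pair: metric(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common() lists equal counts in first-seen order.
    return votes.most_common(1)[0][0]


def usual(x, y):
    """Usual distance on the real line."""
    return abs(x - y)


def capped(x, y):
    """Truncated distance min(|x - y|, 1); it produces many distance ties."""
    return min(abs(x - y), 1.0)


train = [(0.0, "a"), (1.0, "a"), (5.0, "b"), (6.0, "b"), (7.0, "b")]

print(knn_classify(train, 0.4, 3, usual))   # a
print(knn_classify(train, 5.5, 1, usual))   # b
print(knn_classify(train, 0.4, 3, capped))  # a
```

Under `capped`, every training point farther than 1 from the query sits at the same distance, so the outcome depends on how ties are broken. How ties are handled is a modelling choice that the sketch fixes arbitrarily.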
