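arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.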

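2606.17165 2026-06-19 stat.ME cs.AI econ.EM math.ST stat.TH New submission

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

Joel Persson, Mårten Schultzberg, Sebastian Ankargren

Affiliations: Spotify USA, Inc.

AI summary: Proposes a surrogate-endpoint framework, proves that calibrating LLM outputs identifies the average treatment effect under conditions weaker than distributional equivalence, and analyzes the bias and variance introduced by LLM stochasticity.

AI abstract (translated from Chinese)

Organizations and researchers are increasingly interested in using large language models (LLMs) in place of human participants in A/B tests, in the hope of running experiments faster and at lower cost. We study when a treatment effect estimated on LLM outcomes can recover the effect measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid, but it is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs. The framework shows that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. When these conditions fail, the effect of interest is only partially identified, and we provide diagnostics that can falsify surrogacy on historical experiments, together with bounds on the worst-case bias under limited overlap. We further show that the stochasticity inherent to LLMs introduces bias and variance, but that using the average of multiple draws as the surrogate mitigates both. We demonstrate the methods and theory in simulations and in an application to A/B tests of Upworthy headlines. A central conclusion is that the validity of LLM outcomes as a surrogate can only be falsified for past treatments and cannot be verified for new ones, so human experiments remain indispensable for novel interventions. We discuss the role of LLM choice, prompting, and temperature as design variables, and how to size human experiments for validation.

English abstract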

Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect estimated on LLM outcomes can recover the effect that would have been measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid but is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs, showing that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. We present a falsification test for surrogacy and a bound on the worst-case bias from limited overlap between the LLM and human samples. We further show that the stochasticity inherent to LLMs can weaken surrogacy for identification while also introducing bias and variance during estimation, but that using an average over multiple LLM draws per unit as the surrogate mitigates these issues. Simulations validate the results, and an empirical application to A/B tests on Upworthy headlines shows that raw LLM predictions recover only 39% of the human treatment effect while nonparametric calibration closes the gap. A central takeaway is that A/B testing on LLMs yields correct results only by assumption, whereas A/B testing on humans is correct by design, and that the required assumptions are hardest to justify precisely where A/B testing on LLMs promises the greatest benefit. We discuss the role of LLM choice, prompting, and temperature as design variables, the compounded challenge posed by long-term outcomes, and how to size human pilot studies for validation.
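
The two ideas in this abstract, averaging several LLM draws per unit and calibrating the surrogate to human outcomes, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' estimator: the binned calibration, the bin edge, the function names, and all numbers are hypothetical.

```python
from statistics import mean

def average_draws(draws_per_unit):
    """Collapse several stochastic LLM draws per unit into one surrogate value."""
    return [mean(draws) for draws in draws_per_unit]

def fit_binned_calibration(surrogate, human, edges):
    """Nonparametric calibration: mean human outcome within each surrogate bin."""
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for s, y in zip(surrogate, human):
        b = sum(s >= e for e in edges)  # index of the bin containing s
        sums[b] += y
        counts[b] += 1
    overall = mean(human)
    # Empty bins fall back to the overall human mean (an arbitrary choice).
    return [sums[b] / counts[b] if counts[b] else overall for b in range(len(sums))]

def calibrated_ate(treated, control, edges, table):
    """Difference in mean calibrated outcomes between the two arms."""
    def predict(s):
        return table[sum(s >= e for e in edges)]
    return mean(predict(s) for s in treated) - mean(predict(s) for s in control)

# Historical experiment with both LLM surrogates and human outcomes (toy numbers).
hist_surrogate = average_draws([[0.1, 0.3], [0.2, 0.2], [0.7, 0.9], [0.8, 0.6]])
hist_human = [0.0, 0.0, 1.0, 0.5]
edges = [0.5]  # two bins: below 0.5, and at or above 0.5
table = fit_binned_calibration(hist_surrogate, hist_human, edges)

# New experiment where only LLM draws are available.
treated = average_draws([[0.6, 0.8], [0.9, 0.7], [0.1, 0.3]])
control = average_draws([[0.2, 0.4], [0.1, 0.1], [0.7, 0.5]])
raw_ate = mean(treated) - mean(control)                # effect on the LLM scale
ate = calibrated_ate(treated, control, edges, table)   # effect on the human scale
```

The calibrated estimate is only as good as the assumption that the historical surrogate-to-human relationship carries over to the new treatment, which is the assumption the abstract says cannot be verified for novel interventions.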

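2606.16326 2026-06-19 cs.GT cs.AI q-fin.RM New submission

Gaming-Resistant Insurance Contracts for Autonomous AI Agents: Strategy-Proof Toll Mechanism Design

Hao-Hsuan Chen

Affiliations: Hao-Hsuan Chen

AI summary: Extends the time-consistent actuarial runtime framework by making the operator strategic, characterizes a five-attack space for autonomous AI-agent insurance contracts, proves the gaming resistance of the actuarial runtime, and achieves incentive compatibility through new contract clauses.

Comments: 29 pages. Companion to arXiv:2605.26508 (Paper A, foundations) and arXiv:2605.25632 (Paper B, empirical)

AI abstract (translated from Chinese)

Paper A defines a time-consistent actuarial runtime that prices every side-effect-bearing action against a contractually fixed safe default and gates execution against a reserve budget. It treats the operator as passive. This paper makes the operator strategic. We characterize a five-attack space for autonomous AI-agent insurance contracts and prove when the actuarial runtime is gaming-resistant. Two attack surfaces, post-toll safe-default selection and within-boundary action splitting, are closed by Paper A's minimal-authority and no-splitting clauses. The remaining three require new contract clauses. First, common-control aggregation prevents cross-boundary re-routing from lowering the toll below the boundary potential applied to total exposure. Second, interface failures such as invalid JSON are contract-relevant events rather than safety wins: treating them as zero-toll safe defaults can reward unreliable models, whereas escalation fees reverse the incentive. We validate this interface-compliance theorem on cross-model traces from the companion empirical paper. Third, a model-identity menu with a componentwise-minimum penalty schedule makes truthful reporting of the deployed model a weakly dominant strategy. We then compose these clauses with Paper A's runtime guarantees to obtain joint incentive compatibility over the five-attack space. Finally, a two-parameter premium family satisfies operator individual rationality and weak budget balance at the truthful equilibrium. The result is an incentive-compatibility layer for actuarial control of autonomous-agent side effects.

English abstract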

Paper A defines a time-consistent actuarial runtime that prices each side-effect-bearing action against a contractually fixed safe default and gates execution against a reserve budget. It treats the operator as passive. This paper makes the operator strategic. We characterise a five-attack space for autonomous AI-agent insurance contracts and prove when the actuarial runtime is gaming-resistant. Two attack surfaces -- post-toll safe-default selection and within-boundary action splitting -- are closed by Paper A's minimal-authority and no-splitting clauses. The remaining three require new contract clauses. First, common-control aggregation prevents cross-boundary re-routing from reducing toll below the boundary potential applied to total exposure. Second, interface failures such as invalid JSON are contract-relevant events, not safety wins: treating them as zero-toll safe defaults can reward unreliable models, while escalation fees reverse the incentive. We validate this interface-compliance theorem on committed cross-model traces from the companion empirical paper. Third, a model-identity menu with a componentwise-minimum penalty schedule makes truthful reporting of the deployed model weakly dominant. We then compose these clauses with Paper A's runtime guarantees to obtain joint incentive compatibility over the five-attack space. Finally, a two-parameter premium family discharges operator individual rationality and weak budget balance at the truthful equilibrium. The result is an incentive-compatibility layer for actuarial control of autonomous-agent side effects.
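
The claim about interface failures can be illustrated with a toy expected-cost calculation. This is only a sketch under assumed numbers: the linear cost model, the toll, the fee, and the failure rates are hypothetical and are not taken from the paper.

```python
def expected_cost(failure_rate, toll, failure_fee):
    """Expected per-action charge when a fraction of calls fail at the interface."""
    return (1.0 - failure_rate) * toll + failure_rate * failure_fee

toll = 1.0                         # hypothetical toll for a well-formed action
reliable, unreliable = 0.01, 0.30  # hypothetical interface failure rates
models = [("reliable", reliable), ("unreliable", unreliable)]

# Rule 1: failures are treated as zero-toll safe defaults.
zero_toll = {name: expected_cost(f, toll, 0.0) for name, f in models}

# Rule 2: failures trigger an escalation fee above the toll.
escalated = {name: expected_cost(f, toll, 2.0) for name, f in models}
```

Under the first rule the unreliable model is cheaper to run (0.70 versus 0.99 per action), and under the second it is more expensive (1.30 versus 1.01), which is the incentive reversal the abstract describes.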

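2606.13794 2026-06-19 eess.SY cs.AI cs.RO cs.SY New submission

An integrated interpretable control effectiveness learning and nonlinear control allocation methodology for overactuated aircrafts

Umut Demir, Aamir Ahmad, Walter Fichter

Affiliations: University of Stuttgart, Faculty of Aerospace Engineering and Geodesy, Institute of Flight Mechanics and Control (iFR)

AI summary: Proposes learning the control effectiveness mapping with Sparse Identification of Nonlinear Dynamics, combined with an online adaptation mechanism, to achieve efficient nonlinear control allocation for overactuated aircraft that is both interpretable and computationally cheap.

AI abstract (translated from Chinese)

Nonlinear dynamics and the strong couplings that arise between multiple effectors weaken the assumptions behind conventional linear control allocation techniques. When flight enters regimes dominated by nonlinear effects, linear allocators lose accuracy because of growing model mismatch, which in turn degrades the performance and robustness of the flight control system. High-fidelity onboard models and black-box data-driven methods can restore accuracy across the flight envelope, but the former impose a computational burden that real-time allocation cannot afford, and the latter sacrifice the interpretability needed for verification and fault diagnosis. This paper addresses these limitations by using Sparse Identification of Nonlinear Dynamics to learn an explicit, physics-constrained analytical model of the control effectiveness mapping from representative flight data. The resulting mapping is compact, interpretable, and admits analytical derivatives, enabling efficient computation within nonlinear solvers that additionally incorporate actuator dynamics, with no onboard model required. An online adaptation mechanism monitors prediction residuals and refreshes the model when a significant plant change is detected, providing graceful reconfiguration under actuator failures and varying operating conditions. The method is evaluated on a high-fidelity nonlinear benchmark aircraft over a series of aggressive maneuvers, reaching accuracy comparable to a full nonlinear onboard model while substantially reducing computational cost relative to established baselines.

English abstract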

Nonlinear dynamics and the strong couplings that arise between multiple effectors undermine the assumptions behind conventional, linear control allocation techniques. When flight enters regimes where nonlinear effects dominate, linear allocators exhibit reduced accuracy due to increased model mismatch, which subsequently degrades performance and robustness of the flight control system. High fidelity onboard models and black box data driven approaches can recover accuracy across the flight envelope, but respectively impose computational burdens prohibitive for real time allocation and sacrifice the interpretability required for verification and fault diagnosis. This paper addresses these limitations by learning an explicit, physics constrained analytical model of the control effectiveness mapping from representative flight data using Sparse Identification of Nonlinear Dynamics. The resulting mapping is compact, interpretable, and admits analytical derivatives, enabling efficient computation within nonlinear solvers that additionally incorporate actuator dynamics, without requiring an onboard model. An online adaptation mechanism monitors prediction residuals and refreshes the model when significant plant changes are detected, providing graceful reconfiguration under actuator failures and varying operating conditions. The methodology is evaluated on a high fidelity nonlinear benchmark aircraft across a range of aggressive maneuvers, achieving accuracy comparable to a full nonlinear onboard model while substantially reducing computational cost relative to established baselines.
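
The online adaptation step, monitoring prediction residuals and refreshing the model on a significant change, might look roughly like the sketch below. The abstract does not say how "significant" is detected, so the windowed RMS test, the class name, the window length, and the threshold are all assumptions made for illustration.

```python
from collections import deque
from math import sqrt

class ResidualMonitor:
    """Flags a model refresh when the windowed RMS prediction residual grows too large."""

    def __init__(self, window, threshold):
        self.residuals = deque(maxlen=window)  # keeps only the latest residuals
        self.threshold = threshold

    def update(self, predicted, measured):
        """Record one residual; return True when a refresh should be triggered."""
        self.residuals.append(measured - predicted)
        if len(self.residuals) < self.residuals.maxlen:
            return False  # not enough samples yet
        rms = sqrt(sum(r * r for r in self.residuals) / len(self.residuals))
        return rms > self.threshold

monitor = ResidualMonitor(window=3, threshold=0.5)
# Nominal samples followed by a simulated loss of effectiveness (toy numbers).
stream = [(1.0, 1.05), (1.0, 0.95), (1.0, 1.02), (1.0, 0.2), (1.0, 0.1), (1.0, 0.15)]
flags = [monitor.update(pred, meas) for pred, meas in stream]
```

The window trades detection delay against false alarms: here the first faulty sample alone does not trip the threshold, and the refresh is flagged one sample later.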

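2606.11673 2026-06-19 quant-ph cs.LG New submission

Higher-Order Token Interactions via Quantum Attention

Jian Xu, Chao Li, Delu Zeng, John Paisley, Qibin Zhao

Affiliations: RIKEN iTHEMS; RIKEN AIP; South China University of Technology; Columbia University

AI summary: Proposes Quantum Higher-Order Attention (QHA), which synthesizes token interactions of arbitrary order in a shallow circuit through data re-uploading and a non-Clifford entangler, proves that its expressivity exceeds that of classical self-attention and that it comes with a trainability guarantee, and detects higher-order interactions efficiently in genetic epistasis, learning parity with noise, and graph triangle detection.

AI abstract (translated from Chinese)

Standard dot-product self-attention computes only pairwise (order-2) interactions between tokens in a single layer; representing a generic order-$k$ interaction is known to require either super-quadratic resources in one layer or composition across depth. We introduce Quantum Higher-Order Attention (QHA), a shallow, hardware-realizable quantum attention head that synthesizes order-$k$ token interactions inside the circuit through data re-uploading and an all-to-all non-Clifford entangler, and exposes them through a local single-qubit read-out. We prove (i) an expressivity separation: any single standard self-attention layer with embedding dimension $m$, $H$ heads, and $p$-bit precision satisfying $mHp=o(N/\log\log N)$ cannot represent the order-$k$ correlation family that one QHA head represents with circuit depth $O(\log k)$ ($O(k)$ two-qubit gates); and (ii) a trainability guarantee for its local-design instantiation: with a local read-out and $O(\log n)$ depth, the gradient variance is $\Omega(1/\mathrm{poly}(n))$ (no barren plateau), which we confirm experimentally, while making explicit that the more expressive all-to-all instantiation we benchmark is trained empirically and shows exponentially decaying gradients. Experimentally, with a $6.5\times$ smaller parameter budget, QHA generalizes hidden-subset parity of every order $k\le6$ from disjoint inputs, whereas the larger classical attention head collapses beyond order 2; consistent with theory, the size of the advantage tracks the Fourier degree of the target, being largest for parity and shrinking when low-order structure is present. As an application, QHA serves as a compact high-order interaction detector in three domains (genetic epistasis, learning parity with noise, and graph triangle detection), reaching the noise ceiling at the smallest parameter budget, where field-standard linear methods fail.

English abstract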

Standard dot-product self-attention computes, in a single layer, only pairwise (order-2) interactions between tokens; representing a generic order-$k$ interaction is known to require either super-quadratic resources in one layer or composition across depth. We introduce Quantum Higher-Order Attention (QHA), a shallow, hardware-realizable quantum attention head that, via data re-uploading and an all-to-all non-Clifford entangler, synthesizes order-$k$ token interactions inside the circuit and exposes them through a local single-qubit read-out. We prove (i) an expressivity separation: any single standard self-attention layer with embedding dimension $m$, $H$ heads and $p$-bit precision satisfying $mHp=o(N/\log\log N)$ cannot represent the order-$k$ correlation family that one QHA head represents with circuit depth $O(\log k)$ ($O(k)$ two-qubit gates); and (ii) a trainability guarantee for its local-design instantiation: with a local read-out and $O(\log n)$ depth the gradient variance is $\Omega(1/\mathrm{poly}(n))$ (no barren plateau), which we confirm empirically -- while being explicit that the more expressive all-to-all instantiation we benchmark is trained empirically and shows exponentially decaying gradients. Empirically, at a $6.5\times$ smaller parameter budget, QHA generalizes hidden-subset parity of every order $k\le6$ from disjoint inputs, whereas the larger classical attention head collapses past order 2; consistent with theory, the size of the advantage tracks the target's Fourier degree - largest for parity and shrinking when low-order structure is present. As an application, QHA serves as a compact high-order interaction detector across three domains - genetic epistasis, learning-parity-with-noise, and graph triangle detection - reaching the noise ceiling at the smallest parameter budget where field-standard linear methods fail.
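
Why hidden-subset parity is a pure order-$k$ target can be checked by brute force: in the ±1 convention it is the product of the $k$ selected bits, so it is uncorrelated with every lower-order monomial. The sketch below is a classical enumeration for intuition only, not a QHA implementation; the sizes and the hidden subset are hypothetical.

```python
from itertools import combinations, product

def parity(x, subset):
    """Hidden-subset parity in the +1/-1 convention: product of the selected bits."""
    value = 1
    for i in subset:
        value *= x[i]
    return value

def correlation(n, subset, monomial):
    """Average of parity(x) times a monomial over all 2^n inputs in {-1, +1}^n."""
    total = 0
    for x in product((-1, 1), repeat=n):
        total += parity(x, subset) * parity(x, monomial)
    return total / 2 ** n

n, hidden = 5, (0, 2, 3)  # an order-3 target on 5 tokens (toy sizes)
# Correlation with every single bit and every pairwise product.
low_order = [correlation(n, hidden, m)
             for k in (1, 2) for m in combinations(range(n), k)]
# Correlation with the matching order-3 monomial.
exact = correlation(n, hidden, hidden)
```

Every entry of `low_order` is zero while `exact` is one, so a model restricted to pairwise statistics sees no signal in this target.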

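2606.10686 2026-06-19 physics.comp-ph astro-ph.IM cs.LG New submission

An adaptive framework for the axisymmetric pulsar magnetosphere using physics-informed Kolmogorov-Arnold networks

Spyros Rigas, Ioannis Contopoulos, Georgios Alexandridis, Antonios Nathanail

Affiliations: Department of Digital Industry Technologies, School of Science, National and Kapodistrian University of Athens; Research Center for Astronomy and Applied Mathematics, Academy of Athens

AI summary: Proposes an adaptive framework based on Kolmogorov-Arnold networks, combined with an automated training pipeline and a physics-based convergence criterion, that brings the mean squared error of the PDE residuals down to O(1e-6) in double precision, shortens convergence to under 20 minutes, and reliably resolves stellar radii reduced by 80%.

Comments: 25 pages, 10 figures

AI abstract (translated from Chinese)

The pulsar magnetosphere has only recently been studied with Physics-Informed Neural Networks (PINNs), using a domain-decomposition approach and treating the separatrix and the equatorial current sheet as infinitesimally thin discontinuities. However, this baseline requires extensive manual hyperparameter tuning, reaches limited final accuracy, and needs several hours of training. We improve the framework by introducing domain-specific neural architectures based on Kolmogorov-Arnold networks, an automated adaptive training pipeline, and a physics-based convergence criterion, which removes the need for manual calibration. The proposed method delivers self-consistent axisymmetric magnetosphere solutions whose PDE residuals have a mean squared error of order O(1e-6) in double precision, two orders of magnitude better than the baseline, while converging in under 20 minutes in single precision. Importantly, the method reliably resolves stellar radii reduced by up to 80% relative to the baseline, overcoming the severe disparity of spatial scales that also challenges traditional solvers. In addition, by varying the magnetic flux that opens to infinity, we provide a correction to the equation that relates it to the position of the equatorial T-point. The complete framework is released as the open-source library PulsarX.

English abstract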

The pulsar magnetosphere has only recently been addressed using Physics-Informed Neural Networks (PINNs), by deploying a domain-decomposition approach and treating the separatrix and equatorial current sheet as infinitesimally thin discontinuities. However, this baseline requires extensive manual hyperparameter tuning, achieves limited final accuracy and demands several hours of training. We refine this framework by introducing domain-specific neural architectures based on Kolmogorov-Arnold networks, an automated adaptive training pipeline and a physics-based convergence criterion that eliminate the need for manual calibration. The proposed methodology delivers self-consistent axisymmetric magnetosphere solutions with mean squared errors of the PDE residuals at O(1e-6) in double precision - an improvement of two orders of magnitude over the baseline - while achieving convergence in under 20 minutes in single precision. Importantly, the method reliably resolves stellar radii reduced by up to 80% compared to the baseline, overcoming the severe spatial scale disparities that also challenge traditional solvers. Furthermore, by varying the flux that opens to infinity, we provide a correction to the equation that connects it to the equatorial T-point's position. The complete framework is released as the open-source library PulsarX.

2602.05416 2026-06-19 cs.CE cs.AI cs.LG physics.ao-ph physics.flu-dyn

Reduced-Order Surrogates for Forced Flexible Mesh Coastal-Ocean Models

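Freja Høgholm Petersen, Jesper Sandvig Mariegaard, Rocco Palmitessa, Allan P. Engsig-Karup

Affiliations: DTU

AI summary: Proposes a flexible Koopman autoencoder that incorporates meteorological forcings and boundary conditions, compares its performance with POD-based surrogates, and demonstrates an accurate and efficient reduced-order approach.

Comments: Submitted for peer-review in a journal. v2: revised version submitted to journal after minor revisions

DOI: 10.1016/j.ocemod.2026.102772

AI abstract (translated from Chinese)

Although surrogates based on proper orthogonal decomposition (POD) have been widely studied in hydrodynamic applications, the use of Koopman autoencoders in real-world coastal-ocean modelling remains limited. This paper introduces a flexible Koopman autoencoder formulation that incorporates meteorological forcings and boundary conditions, and systematically compares its performance with POD-based surrogates. The Koopman autoencoder uses a learned linear temporal operator in latent space and promotes temporal stability through eigenvalue regularization. This strategy is combined with temporal unrolling to achieve stable and accurate long-term predictions. The models are evaluated on three test cases covering distinct dynamical regimes, with prediction horizons of up to one year at a temporal resolution of 30 minutes. In all cases, the reduced-order surrogates with temporal unrolling achieve high accuracy, with relative root-mean-squared errors of 0.0068-0.14 and R² values of 0.61-0.995; prediction errors are largest for current velocities and smallest for water surface elevations. In two of the cases, the Koopman autoencoder is more accurate than the POD-based surrogate. Compared with in-situ observations, the water surface elevation prediction error of the surrogate increases by between -0.64% and 12% relative to that of the physics-based model. These error levels, corresponding to a few centimeters, are acceptable for many practical applications, while inference speed-ups of 300-1400x make workflows such as ensemble forecasting and long-term climate simulation possible.

English abstract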

While proper orthogonal decomposition (POD)-based surrogates are widely explored for hydrodynamic applications, the use of Koopman autoencoders for real-world coastal-ocean modelling remains relatively limited. This paper introduces a flexible Koopman autoencoder formulation that incorporates meteorological forcings and boundary conditions, and systematically compares its performance against POD-based surrogates. The Koopman autoencoder employs a learned linear temporal operator in latent space, enabling eigenvalue regularization to promote temporal stability. This strategy is evaluated alongside temporal unrolling techniques for achieving stable and accurate long-term predictions. The models are assessed on three test cases spanning distinct dynamical regimes, with prediction horizons up to one year at 30-minute temporal resolution. Across all cases, the reduced-order surrogates with temporal unrolling achieve high accuracy, with relative root-mean-squared errors of 0.0068-0.14 and $R^2$ values of 0.61-0.995, where prediction errors are largest for current velocities and smallest for water surface elevations. In two of the three cases, the Koopman autoencoder has higher accuracy than the POD-based surrogates. Compared with in-situ observations, the surrogate yields a -0.64% to 12% increase in water surface elevation prediction error relative to the prediction errors of the physics-based model. These error levels, corresponding to a few centimeters, are acceptable for many practical applications, while inference speed-ups of 300-1400x enable workflows such as ensemble forecasting and long climate simulations for coastal-ocean modelling.
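
Because the latent dynamics are linear, stability over a long rollout depends on the eigenvalues of the temporal operator, which is what makes an eigenvalue penalty possible. The sketch below uses a 2x2 operator so the eigenvalues have a closed form; the specific penalty (squared excess of the modulus over 1) and all names are assumptions, since the abstract does not give the regularizer.

```python
import cmath
from math import cos, sin

def eigenvalues_2x2(K):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = K
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def stability_penalty(K):
    """Sum of squared excesses of |eigenvalue| over 1; zero for a stable operator."""
    return sum(max(0.0, abs(lam) - 1.0) ** 2 for lam in eigenvalues_2x2(K))

def rollout(K, z0, steps):
    """Apply the linear latent operator repeatedly: z_{t+1} = K z_t."""
    z = list(z0)
    for _ in range(steps):
        z = [K[0][0] * z[0] + K[0][1] * z[1], K[1][0] * z[0] + K[1][1] * z[1]]
    return z

def scaled_rotation(scale, angle):
    """A 2x2 operator whose eigenvalues have modulus equal to `scale`."""
    return [[scale * cos(angle), -scale * sin(angle)],
            [scale * sin(angle), scale * cos(angle)]]

stable, unstable = scaled_rotation(0.9, 0.3), scaled_rotation(1.1, 0.3)
```

With these toy operators, `rollout(stable, [1.0, 0.0], 50)` decays toward zero while the unstable one grows by roughly two orders of magnitude, which is the failure mode a long autoregressive forecast needs to avoid.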

2601.12433 2026-06-19 eess.SP cs.LG

Temporal Data and Short-Time Averages Improve Multiphase Mass Flow Metering

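Amanda Nyholm, Yessica Arellano, Jinyu Liu, Damian Krakowiak, Pierluigi Salvo Rossi

Affiliations: Dept. Electronic Systems, Norwegian University of Science and Technology; Dept. Gas Technology, SINTEF Energy Research; Dept. Research and Development, KROHNE Ltd.

AI summary: Combines machine learning with a single-phase flowmeter and uses temporal data and short-time averages to improve multiphase flow measurement accuracy; a CNN performs best at 0.25 Hz, with errors clearly below those of conventional approaches.

Comments: 9 pages, 6 figures

Journal ref: IEEE Sensors Journal, vol. 26, no. 11, pp. 17252-17261, 1 June 2026

DOI: 10.1109/JSEN.2026.3683143

AI abstract (translated from Chinese)

Reliable flow measurement is essential in many industries, yet current instruments often struggle to estimate multiphase flows accurately. This paper combines machine learning algorithms with accurate single-phase flowmeters and shows that preserving temporal information substantially improves model performance. We compare a multilayer perceptron, a sliding-window multilayer perceptron, and a convolutional neural network (CNN) on data from 342 three-phase air-water-oil flow experiments. Unlike earlier work that compresses each experiment into a single averaged sample, we compute short-time averages within each experiment and train models that preserve temporal information. The CNN performs best at 0.25 Hz, with about 95% of relative errors below 13%, a normalized root mean squared error of 0.03, and a mean absolute percentage error of about 4.3%, clearly outperforming the best single-average model and showing that short-time averaging within individual experiments is preferable. The results are consistent across multiple data splits and random seeds, which indicates robustness.

English abstract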

Reliable flow measurements are essential in many industries, but current instruments often fail to accurately estimate multiphase flows, which are frequently encountered in real-world operations. Combining machine learning (ML) algorithms with accurate single-phase flowmeters has therefore received extensive research attention in recent years. The Coriolis mass flowmeter is a widely used single-phase meter that provides direct mass flow measurements, which ML models can be trained to correct, thereby reducing measurement errors in multiphase conditions. This paper demonstrates that preserving temporal information significantly improves model performance in such scenarios. We compare a multilayer perceptron, a windowed multilayer perceptron, and a convolutional neural network (CNN) on three-phase air-water-oil flow data from 342 experiments. Whereas prior work typically compresses each experiment into a single averaged sample, we instead compute short-time averages from within each experiment and train models that preserve temporal information at several downsampling intervals. The CNN performed best at 0.25 Hz with approximately 95 % of relative errors below 13 %, a normalized root mean squared error of 0.03, and a mean absolute percentage error of approximately 4.3 %, clearly outperforming the best single-averaged model and demonstrating that short-time averaging within individual experiments is preferable. Results are consistent across multiple data splits and random seeds, demonstrating robustness.
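
The contrast between one average per experiment and short-time averages within an experiment can be sketched as follows. The raw sampling rate, the non-overlapping windows, and the function names are assumptions for illustration; only the 0.25 Hz output rate comes from the abstract.

```python
def short_time_averages(samples, sample_rate_hz, output_rate_hz):
    """Average consecutive non-overlapping windows to reach the output rate."""
    window = round(sample_rate_hz / output_rate_hz)  # samples per window
    usable = len(samples) - len(samples) % window    # drop an incomplete tail
    return [sum(samples[i:i + window]) / window for i in range(0, usable, window)]

def mape(predicted, reference):
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(errors) / len(errors)

# A 1 Hz toy signal reduced to 0.25 Hz, i.e. one average per 4 s window.
signal = [2.0, 4.0, 6.0, 8.0, 10.0, 10.0, 10.0, 10.0, 1.0]
features = short_time_averages(signal, sample_rate_hz=1.0, output_rate_hz=0.25)
single_average = sum(signal) / len(signal)  # what whole-experiment averaging keeps
```

Here `features` retains the ramp followed by the plateau as two values, whereas `single_average` collapses the whole record into one number and loses that ordering.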

2506.23396 2026-06-19 stat.ML cs.LG

AICO: Feature Significance Tests for Supervised Learning

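Kay Giesecke, Enguerrand Horel, Chartsiri Jirachotkulthorn

Affiliations: Stanford University, Department of Management Science and Engineering and Institute for Computational and Mathematical Engineering; Upstart, Inc.; Stanford University, Institute for Computational and Mathematical Engineering

AI summary: AICO is an efficient statistical method that tests a feature's contribution to predictive performance by masking its information, providing a distribution-free interpretability tool for large-scale models.

AI abstract (translated from Chinese)

Machine learning is central to modern science, industry, and policy, but its predictive power often comes at the cost of transparency: we rarely know which input features truly drive a model's predictions. Existing tools for assessing feature influence are limited; most lack statistical guarantees, and many require costly retraining or surrogate models, which makes them hard to apply to large modern models. We introduce AICO, a broadly applicable framework that turns model interpretability into an efficient statistical exercise. AICO tests whether each feature genuinely improves predictive performance by masking its information and measuring the resulting change. Through a simple, non-asymptotic hypothesis testing procedure, the method provides exact finite-sample feature p-values and confidence intervals, with no retraining, surrogate models, or distributional assumptions, and is therefore suitable for large-scale algorithms. In controlled experiments and real applications, from credit scoring to mortgage-behavior prediction, AICO reliably identifies the variables that drive model behavior, offering a scalable and statistically sound path toward transparent and trustworthy machine learning.

English abstract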

Machine learning is central to modern science, industry, and policy, yet its predictive power often comes at the cost of transparency: we rarely know which input features truly drive a model's predictions. Without such understanding, researchers cannot draw reliable conclusions, practitioners cannot ensure fairness or accountability, and policymakers cannot trust or govern model-based decisions. Existing tools for assessing feature influence are limited; most lack statistical guarantees, and many require costly retraining or surrogate modeling, making them impractical for large modern models. We introduce AICO, a broadly applicable framework that turns model interpretability into an efficient statistical exercise. AICO tests whether each feature genuinely improves predictive performance by masking its information and measuring the resulting change. The method provides exact, finite-sample feature p-values and confidence intervals for feature importance through a simple, non-asymptotic hypothesis testing procedure. It requires no retraining, surrogate modeling, or distributional assumptions, making it feasible for large-scale algorithms. In both controlled experiments and real applications, from credit scoring to mortgage-behavior prediction, AICO reliably identifies the variables that drive model behavior, providing a scalable and statistically principled path toward transparent and trustworthy machine learning.
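
One way to obtain an exact, distribution-free p-value from masking is a sign test on per-sample performance changes. The sketch below assumes that construction; whether it matches the paper's exact test statistic is not stated in the abstract, and the loss values are made up.

```python
from math import comb

def sign_test_p_value(deltas):
    """Exact one-sided sign-test p-value for H0: median delta <= 0."""
    nonzero = [d for d in deltas if d != 0]
    n = len(nonzero)
    positives = sum(d > 0 for d in nonzero)
    # P(X >= positives) for X ~ Binomial(n, 1/2), computed exactly.
    return sum(comb(n, k) for k in range(positives, n + 1)) / 2 ** n

def masking_deltas(loss_masked, loss_full):
    """Per-sample loss increase when the feature is masked."""
    return [m - f for m, f in zip(loss_masked, loss_full)]

# Toy per-sample losses for one trained model, scored with and without masking.
loss_full = [0.20, 0.10, 0.30, 0.25, 0.15, 0.40, 0.22, 0.18, 0.35, 0.12]
loss_masked = [0.50, 0.30, 0.45, 0.40, 0.14, 0.60, 0.52, 0.28, 0.55, 0.33]
p_value = sign_test_p_value(masking_deltas(loss_masked, loss_full))
```

Masking hurts on 9 of 10 samples, so the p-value is 11/1024, about 0.011. The model is only evaluated, never retrained, which is what keeps the procedure cheap for large models.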

2602.14239 2026-06-19 cs.SI cs.AI cs.LG

A Hybrid TGN-SEAL Model for Dynamic Graph Link Prediction

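Nafiseh Sadat Sajadi, Behnam Bahrak, Mahdi Jafari Siavoshani

Affiliations: Department of Computer Engineering, Sharif University of Technology; Tehran Institute for Advanced Studies, Khatam University

AI summary: Proposes a hybrid TGN-SEAL model that extracts subgraphs around candidate links and jointly learns structural and temporal information, improving link prediction in sparse dynamic networks.

Journal ref: EPJ Data Science (2026)

DOI: 10.1140/epjds/s13688-026-00670-1

AI abstract (translated from Chinese)

Predicting links in sparse, continuously evolving networks is a central challenge in network science. Traditional heuristics and deep learning models, including graph neural networks (GNNs), are usually designed for static graphs and struggle to capture temporal dependencies. Snapshot-based techniques partly address this, but they often suffer from data sparsity and class imbalance in networks with transient interactions, such as telecommunication call detail records (CDRs). Temporal Graph Networks (TGNs) model dynamic graphs by updating node embeddings over time; however, their predictive accuracy remains limited under sparse conditions. In this study, we improve the TGN framework by extracting enclosing subgraphs around candidate links, which lets the model learn structural and temporal information jointly. Experiments on a sparse CDR dataset show that our approach raises average precision by 2.6% over standard TGNs, demonstrating the benefit of integrating local topology for robust link prediction in dynamic networks.

English abstract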

Predicting links in sparse, continuously evolving networks is a central challenge in network science. Conventional heuristic methods and deep learning models, including Graph Neural Networks (GNNs), are typically designed for static graphs and thus struggle to capture temporal dependencies. Snapshot-based techniques partially address this issue but often encounter data sparsity and class imbalance, particularly in networks with transient interactions such as telecommunication call detail records (CDRs). Temporal Graph Networks (TGNs) model dynamic graphs by updating node embeddings over time; however, their predictive accuracy under sparse conditions remains limited. In this study, we improve the TGN framework by extracting enclosing subgraphs around candidate links, enabling the model to jointly learn structural and temporal information. Experiments on a sparse CDR dataset show that our approach increases average precision by 2.6% over standard TGNs, demonstrating the advantages of integrating local topology for robust link prediction in dynamic networks.
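
Extracting an enclosing subgraph around a candidate link can be sketched with a breadth-first search from both endpoints. This is a static, untimed illustration: the hop count, the exclusion of the candidate link, and the toy edge list are assumptions, and the temporal memory of a TGN is not modeled.

```python
from collections import deque

def enclosing_subgraph(edges, u, v, hops=1):
    """Nodes within `hops` of either endpoint, plus the edges among those nodes."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    distance = {u: 0, v: 0}
    queue = deque([u, v])
    while queue:  # breadth-first search from both endpoints at once
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    nodes = set(distance)
    # The candidate link itself is excluded so it cannot leak into the features.
    kept = [(a, b) for a, b in edges
            if a in nodes and b in nodes and {a, b} != {u, v}]
    return nodes, kept

calls = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c"), ("e", "f")]
nodes, sub_edges = enclosing_subgraph(calls, "b", "d", hops=1)
```

For the candidate link between `b` and `d`, the one-hop subgraph keeps nodes `a` through `e` and drops `f`, so the shared neighbor `c` stays visible as local structure.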

2601.15119 2026-06-19 eess.IV cs.CV

Vision Models for Medical Imaging: A Hybrid Approach for PCOS Detection from Ultrasound Scans

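Md Mahmudul Hoque, Md Mehedi Hassain, Muntakimur Rahaman, Md. Towhidul Islam, Shaista Rani, Md Sharif Mollah

Affiliations: Department of CSE, CCN University of Science & Technology; Department of EEE, International Islamic University Chittagong; Faculty of Engineering, Multimedia University; Department of CSE, Stamford University of Bangladesh; Department of Biology, Lucknow University; Department of CSE, Bangladesh Army International University of Science & Technology

AI summary: Proposes two hybrid models that combine convolutional and transformer-based approaches for accurate detection of polycystic ovary syndrome in ultrasound images; the final model reaches 98.23% accuracy.

DOI: 10.1088/1742-6596/3191/1/012120

AI abstract (translated from Chinese)

Polycystic ovary syndrome (PCOS) is the most common endocrine disorder in women of reproductive age. Many Bangladeshi women develop PCOS at an older age. The aim of our research is to identify effective vision-based medical image analysis techniques and to evaluate hybrid models for accurate PCOS detection. We introduce two novel hybrid models that combine convolutional and transformer-based approaches. The training and test data are divided into two classes: "infected" (PCOS-positive) and "noninfected" (healthy ovaries). In the initial stage, our first hybrid model, "DenConST" (combining DenseNet121, Swin Transformer, and ConvNeXt), reached 85.69% accuracy. The final optimized model, "DenConREST" (combining Swin Transformer, ConvNeXt, DenseNet121, ResNet18, and EfficientNetV2), performed better, reaching 98.23% accuracy. Among all evaluated models, DenConREST performed best. This study offers an efficient solution for detecting PCOS from ultrasound images, substantially improving diagnostic accuracy and reducing detection errors.

English abstract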

Polycystic Ovary Syndrome (PCOS) is the most familiar endocrine illness in women of reproductive age. Many Bangladeshi women suffer from PCOS disease in their older age. The aim of our research is to identify effective vision-based medical image analysis techniques and evaluate hybrid models for the accurate detection of PCOS. We introduced two novel hybrid models combining convolutional and transformer-based approaches. The training and testing data were organized into two categories: "infected" (PCOS-positive) and "noninfected" (healthy ovaries). In the initial stage, our first hybrid model, 'DenConST' (integrating DenseNet121, Swin Transformer, and ConvNeXt), achieved 85.69% accuracy. The final optimized model, 'DenConREST' (incorporating Swin Transformer, ConvNeXt, DenseNet121, ResNet18, and EfficientNetV2), demonstrated superior performance with 98.23% accuracy. Among all evaluated models, DenConREST showed the best performance. This research highlights an efficient solution for PCOS detection from ultrasound images, significantly improving diagnostic accuracy while reducing detection errors.

2509.04390 2026-06-19 eess.AS cs.SD

Accelerated Interactive Auralization of Highly Reverberant Spaces using Graphics Hardware

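Hannes Rosseel, Toon van Waterschoot

Affiliations: KU Leuven, Dept. of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing

AI summary: Presents a GPU-based real-time multichannel loudspeaker auralization system that uses GPU acceleration to reduce computational latency, enabling real-time acoustic synthesis and feedback cancellation for highly reverberant spaces.

Comments: 9 pages, 6 figures, submitted to Journal of the Audio Engineering Society

DOI: 10.17743/jaes.2022.0267

AI abstract (translated from Chinese)

Interactive auralization lets users explore virtual acoustic environments in real time, and can recreate concert halls or historical worship spaces that are no longer accessible, have been acoustically altered, or are impractical to visit. Interactive acoustic synthesis requires convolving input signals in real time with a set of synthesis filters that model the space-time acoustic response. Because concert halls and historical worship spaces have long reverberation times, the synthesis filters contain many filter taps. The convolution can therefore be computationally demanding and introduce significant latency, which limits the real-time interactivity of the auralization system. This paper presents the implementation of a real-time multichannel loudspeaker-based auralization system. The system uses GPU acceleration to synthesize the acoustics of highly reverberant spaces in real time. Traditional CPU-based convolution is compared with GPU-accelerated convolution, showing that the latter achieves real-time performance with significantly lower latency. In addition, the system integrates acoustic synthesis and acoustic feedback cancellation on the GPU, creating a unified loudspeaker-based auralization framework that minimizes processing latency.

English abstract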

Interactive acoustic auralization allows users to explore virtual acoustic environments in real-time, enabling the acoustic recreation of concert hall or Historical Worship Spaces (HWS) that are either no longer accessible, acoustically altered, or impractical to visit. Interactive acoustic synthesis requires real-time convolution of input signals with a set of synthesis filters that model the space-time acoustic response of the space. The acoustics in concert halls and HWS are both characterized by a long reverberation time, resulting in synthesis filters containing many filter taps. As a result, the convolution process can be computationally demanding, introducing significant latency that limits the real-time interactivity of the auralization system. In this paper, the implementation of a real-time multichannel loudspeaker-based auralization system is presented. This system is capable of synthesizing the acoustics of highly reverberant spaces in real-time using GPU-acceleration. A comparison between traditional CPU-based convolution and GPU-accelerated convolution is presented, showing that the latter can achieve real-time performance with significantly lower latency. Additionally, the system integrates acoustic synthesis with acoustic feedback cancellation on the GPU, creating a unified loudspeaker-based auralization framework that minimizes processing latency.
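
Real-time convolution with long filters is normally done block by block. The sketch below shows overlap-add on the CPU in plain Python to make the structure concrete; it is not the paper's GPU implementation, and the block size, tap values, and function names are assumptions. Practical systems replace the inner direct convolution with FFT-based or partitioned convolution.

```python
def convolve(signal, taps):
    """Direct-form FIR convolution; cost grows with len(signal) * len(taps)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out

def overlap_add(signal, taps, block_size):
    """Convolve block by block, adding each block's tail into the next output region."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        for k, value in enumerate(convolve(block, taps)):
            out[start + k] += value
    return out

def block_latency_ms(block_size, sample_rate_hz):
    """Buffering delay of one block, a lower bound on block-based processing latency."""
    return 1000.0 * block_size / sample_rate_hz

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
taps = [0.5, 0.25, 0.125]  # a real reverberant response would have far more taps
streamed = overlap_add(signal, taps, block_size=3)
reference = convolve(signal, taps)
```

The streamed output equals the one-shot convolution. Smaller blocks reduce buffering delay (for example, 480 samples at 48 kHz is 10 ms) but leave less time to finish each block's arithmetic, which is where hardware acceleration helps.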

2510.05013 2026-06-19 stat.ML cs.LG

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

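Theodore Jerome Tinker, Kenji Doya, Jun Tani

Affiliations: Okinawa Institute of Science and Technology

AI summary: Using curiosity-driven robot self-exploration, with active inference implemented through Q-learning, the study reveals compositional generalization, faster learning, rote pairing before composition, and a U-shaped developmental pattern caused by exception handling, offering an explanation for efficient human language acquisition.

Comments: 27 pages, 22 pages of supplementary material

AI abstract (translated from Chinese)

Infants acquire language with generalization from very little experience, whereas large language models need billions of training tokens. What underlies efficient development in humans? We investigated this question through experiments in which robotic agents learn, via curiosity-driven self-exploration, to perform actions associated with imperative sentences (for example, push red cube). Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings that correspond to observations in developmental psychology. i) Generalization improves markedly as the number of compositional elements grows. ii) Curiosity-driven exploration speeds up learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception handling produces U-shaped developmental performance, a pattern resembling representational redescription in child language learning. These results suggest that curiosity-driven active inference explains how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

English abstract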

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

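2606.20532 2026-06-19 cs.AI New submission

How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech

Nityanand Mathur, Hamees Sayed, Wasim Madha, Apoorv Singh, Sameer Khurana, Akshat Mandloi, Sudarshan Kamath

AI summary: Proposes a cross-attention attribution method to analyze how individual words affect the acoustic output of style-captioned text-to-speech systems, finding that attention to style tokens peaks in early steps and deep layers and correlates with F0 and energy.

AI abstract (translated from Chinese)

Style-captioned text-to-speech systems use natural language to control voice characteristics, but how individual words affect the acoustic output remains unclear. Understanding this is essential for diagnosing failure modes and improving the controllability of expressive TTS. We adapt the DAAM framework to the speech domain for the first time, proposing cross-attention attribution for speech diffusion models, and apply it to CapSpeech-TTS. Our method extracts per-token heatmaps across 25 layers and 24 ODE steps. We analyze 3,600 (style caption, text transcript) combinations, made up of 120 style captions each conditioning the generation of 30 text transcripts, revealing how caption tokens shape waveforms. The results show: (1) style tokens have lower temporal variance than content and function tokens, confirming global conditioning; (2) style attention correlates with F0 and energy; (3) style conditioning peaks in early steps and deep layers; (4) attention entropy reaches its minimum at layer 17, coinciding with the peak in style importance, which indicates maximal network selectivity at the most style-critical stage. This is the first study of how natural language influences cross-attention in speech diffusion models.

English abstract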

Style-captioned text-to-speech systems use natural language to control voice characteristics, but how individual words influence acoustic output remains unclear. Understanding this is critical for diagnosing failure modes and improving controllability in expressive TTS. We propose cross-attention attribution for speech diffusion models, adapting the DAAM framework to the speech domain for the first time, and apply it to CapSpeech-TTS. Our method extracts per-token heatmaps across 25 layers and 24 ODE steps. We analyze 3,600 (style caption, text transcript) combinations comprising 120 style captions conditioning the generation of 30 text transcripts each, revealing how caption tokens shape waveforms. Results show: (1) style tokens have lower temporal variance than content/function tokens, confirming global conditioning; (2) style attention correlates with F0 and energy; (3) style conditioning peaks in early steps and deep layers; (4) attention entropy reaches its minimum at layer 17, co-occurring with the style importance peak, indicating maximal network selectivity at the most style-critical stage. This is the first study of how natural language influences cross-attention in speech diffusion models.
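
Two of the statistics named in the results, temporal variance of a token's heatmap and attention entropy, are simple to compute once per-token attention rows are available. The sketch below assumes such rows have already been extracted; the toy values and function names are hypothetical and do not come from CapSpeech-TTS.

```python
from math import log
from statistics import pvariance

def attention_entropy(weights):
    """Shannon entropy (nats) of attention weights after normalizing them to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * log(p) for p in probs)

def temporal_variance(heatmap_row):
    """Variance of one token's attention across output frames."""
    return pvariance(heatmap_row)

# Toy per-token rows: attention received by a token at each of four audio frames.
style_token = [0.25, 0.25, 0.25, 0.25]    # spread evenly over time
content_token = [0.90, 0.05, 0.03, 0.02]  # concentrated on a few frames

uniform_entropy = attention_entropy([1.0, 1.0, 1.0, 1.0])  # least selective case
peaked_entropy = attention_entropy([1.0, 0.0, 0.0, 0.0])   # most selective case
```

A token that conditions the whole utterance shows low variance over time, as `style_token` does here, and low entropy corresponds to attention concentrated on few tokens, which is how the abstract reads the minimum at layer 17.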

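2606.20508 2026-06-19 cs.AI cs.LG New submission

What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?

Sihui Dai, Mann Patel

AI summary: By mixing benign and harmful compliance demonstrations, the study investigates how demonstration composition drives harmful compliance, and finds that demonstration content, ordering, and training method affect what the model extracts.

AI abstract (translated from Chinese)

Prior work has shown that in-context demonstrations can jailbreak language models, but how models interpret different types of compliance demonstrations remains unclear. We study this by mixing benign compliance demonstrations (non-harmful request, helpful response) with harmful compliance demonstrations (harmful request, helpful response) and testing three hypotheses about how demonstration composition drives harmful compliance. Across four models, we find that benign and harmful demonstrations are not interchangeable: depending on the model, benign demonstrations can either reduce or increase harmful compliance. We further show that preference optimization is the critical training stage that prevents benign demonstrations from increasing harmful compliance, that demonstration ordering shows a strong recency bias, and that models differ in how refusal interacts with in-context learning: some adopt the demonstrated format even when refusing, while others override all in-context signals when they refuse. Taken together, this work goes beyond showing that demonstration-based jailbreaking works and characterizes how it works: what models extract from compliance demonstrations depends on demonstration content, ordering, and training method.

English abstract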

Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different types of compliance demonstrations. We study this by mixing benign compliance demonstrations (non-harmful request, helpful response) with harmful compliance demonstrations (harmful request, helpful response) and testing three hypotheses about how demonstration composition drives harmful compliance. Across four models, we find that benign and harmful demonstrations are not interchangeable: benign demonstrations can either reduce or increase harmful compliance depending on the model. We further show that preference optimization is the critical training stage that prevents benign demonstrations from increasing harmful compliance, that demonstration ordering exhibits strong recency bias, and that models differ in how refusal interacts with in-context learning: some adopt demonstrated formatting even when refusing, while others override all in-context signals upon refusal. Taken together, this work moves beyond showing that demonstration-based jailbreaking works to characterizing how it works: what models extract from compliance demonstrations depends on demonstration content, ordering, and training methodology.

2606.20428 2026-06-19 cs.RO New submission

ARC: Adaptive Robust Joint State and Covariance Estimation

Alexandre Hadji-Thomas, Andrew Stirling, James R. Forbes

AI summary: Proposes a unified block-coordinate descent framework that combines an adaptive robust loss, an iteratively reweighted least-squares state update, and a minimum weighted covariance determinant estimator to jointly and adaptively estimate state and covariance in the presence of outliers.

Comments Submitted to IEEE Robotics and Automation Letters (RA-L), June 2026. 8 pages, 7 figures, 1 table

Abstract

Sensor measurements are frequently corrupted by outliers and non-Gaussian noise. These imperfections in the sensor data can cause classical state estimators to generate biased and unreliable state and uncertainty estimates. Robust estimators reject or downweight outliers but do not perform measurement covariance estimation, whereas joint state and covariance estimators assume Gaussian residuals and fixed loss shape parameters. Integrating these two capabilities into a single framework is an opportunity to simultaneously estimate both state and covariance in the presence of outliers. This paper proposes a unified Block-Coordinate Descent framework that combines a norm-aware adaptive robust loss, an Iteratively Reweighted Least-Squares state update, and a Minimum Weighted Covariance Determinant covariance estimator, yielding a self-tuning joint state and covariance estimator. The framework is evaluated in a Monte-Carlo simulation and on real-world ultra-wideband localization experiments in cluttered non-line-of-sight environments. Results show that the proposed estimator consistently recovers the true inlier measurement covariance and matches or exceeds the state estimation accuracy of all baselines, without requiring any manual parameter tuning.

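2606.20411 2026-06-19 cs.LG New submission

Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning

Hsiao-Ru Pan, Bernhard Schölkopf

AI summary: To address the limitations of Direct Advantage Estimation (DAE) in partially observable domains and with high-dimensional observations, the paper extends its theoretical framework and introduces a discrete latent dynamics model to reduce computational complexity, validating DAE's scalability and sample efficiency on the Arcade Learning Environment.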

Comments Accepted at RLC2026

2606.20382 2026-06-19 cs.LG New submission

Towards Modality-imbalanced Federated Graph Learning: A Data Synthesis-based Approach

Zhengyu Wu, Hongchao Qin, Xunkai Li, Zekai Chen, Rong-Hua Li, Guoren Wang

AI summary: To address client-level and node-level modality imbalance in federated graph learning, proposes FedMGS, an implicit graph-aware latent semantic representation synthesis paradigm that recovers missing modal semantics through an availability-aware graph encoder, a prototype-guided semantic synthesizer, and a reliability-calibrated fusion mechanism, with gains of up to 17.41% across four tasks.

Abstract

MultiModal Federated Graph Learning (MM-FGL) offers a natural collaborative training paradigm, but its practical deployment is challenged by two granularities of modality imbalance. Client-level imbalance occurs when certain clients lack entire modalities, while node-level imbalance occurs when individual nodes exhibit missing visual or textual attributes. While several relevant studies exist, our investigation reveals that they predominantly target graph-agnostic or centralized scenarios, rendering them difficult to adapt directly. To address these challenges, we formalize modality-imbalanced MM-FGL as an implicit graph-aware latent semantic representation synthesis problem. This paradigm recovers missing modal semantics directly within the representation space, thereby maximizing alignment with the original data's semantic distribution and mitigating the high variance induced by missing modalities. To this end, we propose FedMGS (Federated Modality-aware Graph Synthesis), which integrates three core components. The availability-aware graph encoder prevents missing modalities from contaminating local structural propagation. The prototype-guided latent semantic synthesizer establishes cross-client semantic anchors for unavailable modalities. The reliability-calibrated semantic fusion mechanism regulates the impact of recovered latent representations prior to predictive readout. Extensive experiments on four tasks show that FedMGS consistently outperforms competitive baselines, with gains of up to 17.41% and the best efficiency-performance trade-off.

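2606.20357 2026-06-19 cs.LG New submission

On the Variance of Temporal Difference Learning and its Reduction Using Control Variates

Hsiao-Ru Pan, Bernhard Schölkopf

AI summary: Analyzes the variance of temporal difference learning in the phased setting with tabular representations, proves that its variance reduction comes from effectively aggregating more independent trajectories, and compares variance bounds for TD, MC, and DAE.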

Comments Accepted at RLC2026

2606.20323 2026-06-19 cs.AI New submission

Leveraging systems' non-linearity to tackle the scarcity of data in the design of Intelligent Fault Diagnosis Systems

Giancarlo Santamato, Andrea Mattia Garavagno, Massimiliano Solazzi, Antonio Frisoli

AI summary: Proposes a periodic multi-excitation-level method that exploits a system's intrinsic non-linearity, combined with data visualization and augmentation techniques, to enable vibration-based fault diagnosis with deep transfer learning under data scarcity, validated on a railway pantograph structure.

Journal ref Nonlinear Dynamics, vol. 112, pp. 16153-16166, 2024

DOI: 10.1007/s11071-024-09864-6

Abstract

Deep Transfer Learning (DTL) allows for the efficient building of Intelligent Fault Diagnosis Systems (IFDS). However, DTL methods still rely heavily on large amounts of labelled data, and obtaining that much data can be challenging when dealing with machine or structural faults. This paper proposes a novel approach to the design of vibration-based IFDS using DTL under conditions of severe data scarcity. A periodic multi-excitation-level procedure leveraging the intrinsic non-linearities of real-world systems is used to produce images that can be conveniently analysed by pre-trained Convolutional Neural Networks (CNNs) to diagnose faults. A new data visualization method and its augmentation technique are proposed to tackle the typical lack of data encountered during the design of IFDS. Experimental validation on a railway pantograph structure supports the proposed method.

2606.20312 2026-06-19 cs.CV New submission

Reliability-Aware Prototype Calibration for Frozen Pose-Flow Video Anomaly Detection

Ning Dong, Yingna Su, Xin Dong, Ziyun Jiao, Xinnian Guo, Zhuangzhuang Pan

AI summary: Proposes RPC, a post-hoc score calibration method that corrects the rankings of frozen pose-flow detectors using a standardized nearest-prototype deviation in the latent space, improving AUROC by an average of 2.03 percentage points across eight backbone-dataset pairs.

Comments 15 pages, 5 figures, 7 tables. Code available at https://github.com/iNing10/RPC

Abstract

Pose-flow video anomaly detectors are attractive for one-class surveillance because they provide likelihood-based rankings for tracked skeleton windows. However, a single likelihood score may hide multimodal normal behavior and be sensitive to pose-observation noise. We study a frozen-detector setting in which the pose-flow backbone, cached skeleton tracks, and evaluation pipeline are fixed. Reliability-Aware Prototype Calibration (RPC) is a post-hoc score calibration method for this setting. It adds a standardized nearest-prototype deviation in the frozen latent space to the standardized flow score, and uses keypoint confidence only to gate this added geometric evidence. Thus, RPC preserves the original density signal while correcting the ranking with empirical normal-mode structure under pose reliability. Across two frozen pose-flow backbones and four datasets, RPC improves frame-level AUROC in all eight backbone-dataset pairs, with gains ranging from 0.34 to 4.49 percentage points and averaging 2.03 points. Ablation and reliability analyses show that prototype deviation is the main corrective signal, while reliability gating is most useful when pose observations are less trustworthy. These results suggest that lightweight post-hoc calibration can strengthen cached pose-flow systems when retraining or reproducing the full pose pipeline is impractical.
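
The scoring rule described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the Euclidean distance, the linear confidence gate, the weight `lam`, and the convention that a higher flow score means "more anomalous" are all hypothetical choices made for the example.

```python
import numpy as np

def zscore(x, ref):
    """Standardize x using the mean and std of reference (normal-data) values."""
    return (x - ref.mean()) / (ref.std() + 1e-8)

def nearest_prototype_distance(latents, prototypes):
    """Euclidean distance from each latent vector to its closest prototype."""
    diffs = latents[:, None, :] - prototypes[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)

def calibrated_score(flow, latents, conf, prototypes,
                     train_flow, train_latents, lam=1.0):
    """Standardized flow score plus a confidence-gated prototype deviation.

    Assumption: the gate is simply the keypoint confidence clipped to [0, 1];
    the paper's actual gating function is not specified in the abstract.
    """
    dev = nearest_prototype_distance(latents, prototypes)
    train_dev = nearest_prototype_distance(train_latents, prototypes)
    gate = np.clip(conf, 0.0, 1.0)
    return zscore(flow, train_flow) + lam * gate * zscore(dev, train_dev)
```

With the gate at zero the function returns the standardized flow score unchanged, which mirrors the abstract's point that the original density signal is preserved and the prototype term only adds evidence.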

2606.20274 2026-06-19 cs.AI New submission

Lagrange: An Open-Vocabulary, Energy-Based Sparse Framework for Generalized End-to-End Driving

Shihao Ji, HongXi Li, Zihui Song, Mingyu Li

AI summary: Proposes the Lagrange framework, which uses masked latent fields and vision-language models for open-vocabulary, sparse computation and enforces kinematic constraints through Lagrangian action minimization; robustness and interpretability are evaluated on the nuScenes and CODA benchmarks.

Abstract

Scaling end-to-end autonomous driving to complex, open-world environments requires perceptual models that generalize to anomalous scenarios and planners that produce kinematically valid trajectories. Existing paradigms face a distinct dichotomy between representational efficiency and generalization capacity. Dense models (e.g., occupancy networks), while geometrically robust, incur critical computational bottlenecks and struggle with high-level semantic reasoning. Conversely, sparse, query-based planners are efficient but reliant on closed-set definitions, rendering them vulnerable to out-of-distribution (OOD) events. Although recent Vision-Language-Action (VLA) models offer open-vocabulary reasoning, their autoregressive, discrete token generation fundamentally conflicts with the continuous, high-frequency control requirements of vehicle dynamics. To address this, we propose Lagrange, an open-vocabulary, computationally sparse driving framework based on Masked Latent Fields (MLF). Rather than relying on dense volumetric reconstructions or closed-set query mechanisms, Lagrange exploits Vision-Language Models (VLMs) to encode class-agnostic object proposals into continuous semantic visual tokens. We introduce an intent-driven masked cross-attention module that temporally filters irrelevant entities, decoding the attended tokens into an implicit continuous energy field defined over spatial coordinates. By framing decision-making as a Lagrangian action minimization problem spanning this energy field, we enforce strict compliance with vehicle kinematics while executing collision avoidance. Extensive offline evaluations on both standard (nuScenes) and long-tail (CODA) benchmarks demonstrate that Lagrange establishes a promising framework for robust, interpretable, and kinematically feasible open-world autonomy.

2606.20255 2026-06-19 cs.CL cs.AI New submission

The Register Gap: A Meaning Intelligence Framework for Nigerian Public Discourse

Celestine Achi

AI summary: Proposes the nine-dimension Meaning Intelligence Framework (MIF), which separates surface sentiment from true communicative intent along dimensions such as register and true intent; on a Nigerian public discourse dataset, register classification accuracy rises by 40 percentage points and the composite Meaning Intelligence Score by 5.4 points.

Comments Preprint. 12 pages, 2 tables. Supplementary materials: MIF Master Specification v2.0, Annotation Guidelines v1.0, and 30-item public calibration set with gold labels available from the author

Abstract

We introduce the Meaning Intelligence Framework (MIF), a nine-dimension annotation and evaluation schema for Nigerian public discourse that separates surface sentiment from true communicative intent. Existing benchmarks for Nigerian languages, including NaijaSenti and AfriSenti, treat sentiment classification as a three-way polarity task (positive, negative, neutral). We argue that the dominant failure mode of AI systems on Nigerian discourse is not translation failure but context failure: the same utterance carries opposite pragmatic force depending on speaker, audience, and situation. The MIF operationalises this insight across nine scored dimensions: register, surface sentiment, true intent, irony, coded subtext, risk tier, annotator confidence, speaker emotion, and recommended communications action. We construct a 30-item calibration dataset spanning Standard English, Nigerian English, Nigerian Pidgin, and code-mixed registers, and evaluate a frontier language model (Gemini 2.5 Flash) under zero-shot and schema-informed prompting conditions. The headline finding is the Register Gap: zero-shot register classification accuracy is 33.3%, rising to 73.3% (+40 points) when the model receives the MIF schema in-context. The composite Meaning Intelligence Score increases by 5.4 points (73.2 to 78.6) under schema-informed prompting, with the largest practical gains in register identification, coded-subtext detection (+10 points), and strategic action recommendation (+10.3 points). We release the framework specification, annotation guidelines, and the 30-item public calibration set to support reproducibility, while retaining a private holdout corpus for contamination-protected evaluation.

2606.20208 2026-06-19 cs.AI cs.DB cs.NE New submission

Beyond Accuracy: Measuring Logical Compliance of Predictive Models

Guillaume Olivier Delplanque, Pierre Genevès, Nabil Layaïda, Zephirin Faure

AI summary: Proposes the Rule Violation Score (RVS), an evaluation metric independent of predictive accuracy that quantifies how well a predictive model adheres to logical rules, and shows experimentally that two models with similar accuracy can differ sharply in logical compliance.

Abstract

Machine learning models are predominantly evaluated through predictive performance metrics such as ranking quality, prediction error, or classification accuracy. While these metrics effectively quantify how closely predictions match the ground truth, they do not assess whether model outputs respect predefined logical or domain-specific constraints. In high-stakes applications, including healthcare, finance, and autonomous systems, logical consistency can be as critical as predictive accuracy, yet no standard metric captures this dimension. We introduce the Rule Violation Score (RVS), a complementary evaluation metric that quantifies the extent to which a predictive model respects a given set of logical rules, independently of predictive accuracy. RVS treats hard rules (strict constraints) and soft rules (statistical regularities) differently, can be evaluated on any dataset and on any predictive model expressed over a relational vocabulary, and can be computed using SQL queries that are automatically generated for Horn rules. Beyond evaluating models, RVS can also evaluate the logical consistency of training datasets and help identify poorly defined rules. We evaluate RVS on three benchmarks covering knowledge graph link prediction and relational regression, including rule-based, embedding-based, and neuro-symbolic predictive models. Our results demonstrate that two models achieving comparable predictive accuracy can exhibit substantially different levels of logical compliance, revealing differences in model behavior that standard metrics fail to capture.
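
To make the idea of checking a Horn rule with SQL concrete, here is a small sketch using Python's built-in `sqlite3` module. It is not the paper's RVS definition: the tables, the rule `born_in(x, c) AND city_of(c, n) => nationality(x, n)`, the toy data, and the simple "violated groundings / body groundings" ratio are all assumptions chosen for illustration. A missing prediction is counted as a violation here, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE born_in(person TEXT, city TEXT);
CREATE TABLE city_of(city TEXT, country TEXT);
CREATE TABLE nationality(person TEXT, country TEXT);  -- model predictions
INSERT INTO born_in VALUES ('ada', 'london'), ('bo', 'paris'), ('cy', 'lyon');
INSERT INTO city_of VALUES ('london', 'uk'), ('paris', 'fr'), ('lyon', 'fr');
INSERT INTO nationality VALUES ('ada', 'uk'), ('bo', 'de');
""")

# Groundings of the rule body: every (person, city, country) join result.
BODY = "FROM born_in b JOIN city_of c ON b.city = c.city"

total = conn.execute(f"SELECT COUNT(*) {BODY}").fetchone()[0]

# A grounding is violated when the body holds but the head fact is absent.
violated = conn.execute(f"""
    SELECT COUNT(*) {BODY}
    WHERE NOT EXISTS (
        SELECT 1 FROM nationality n
        WHERE n.person = b.person AND n.country = c.country)
""").fetchone()[0]

violation_rate = violated / total
print(f"{violated} of {total} rule groundings violated")
```

Because the check is a plain query over relational tables, the same pattern can be pointed at a table of model predictions or at the training data itself, which is the dual use the abstract mentions.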

2606.20183 2026-06-19 cs.LG New submission

Effective Dimension Governs Generalization in Quantum Kernel Vision Models

Jian Xu, Delu Zeng, John Paisley, Qibin Zhao

AI summary: Uses the effective dimension d_eff to explain why entanglement structure improves generalization and quantum noise raises test accuracy in quantum vision models, and presents a spectral decomposition of noise-shaped kernels and the associated regularization mechanism.

Abstract

Recent quantum vision models (quantum vision transformers and quantum convolutional networks) report two striking but unexplained empirical phenomena: (i) ansatze with more, or more uniformly distributed, entanglement generalize better, and (ii) injecting quantum noise can improve test accuracy rather than degrade it. These observations are currently treated as curiosities, discovered by grid search and explained, if at all, by hand. We show that both are manifestations of a single, measurable quantity: the \emph{effective dimension} $d_{\rm eff}$ of the (noise-shaped) quantum feature kernel. Working primarily with quantum-kernel vision models (a quantum feature map read out by a kernel classifier), we give a spectral account in which entanglement structure and quantum noise are two knobs that move $d_{\rm eff}$; in an overfitting regime, contracting $d_{\rm eff}$ acts as ridge-like regularization. We analyze the mechanism: an \emph{exact} decomposition of the depolarized kernel $K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$ with $d_{\rm eff}(K_p)\to1$, a contraction result (and its boundary) for amplitude damping, a kernel-machine capacity bound, and a capacity/alignment risk decomposition; the monotone contraction operative in our entangled experiments is verified empirically, not proven in general. Along the one-parameter depolarizing family the collapse is instead exact by construction; we use it only to confirm the kernel decomposition to machine precision and at up to $12$ qubits, not as evidence for $d_{\rm eff}$. Amplitude damping contracts $d_{\rm eff}$ and lifts test accuracy by up to $+13\%$ along an inverted-U sweet spot; the effect's sign flips between the over- and under-fitting regimes; noise injection matches an explicit spectral-filtering frontier. Our results organize two reported anecdotes into a single measurable principle for designing quantum-vision models.
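
The depolarized-kernel formula in the abstract is easy to check numerically. The sketch below builds $K_p$ exactly as written and tracks a spectral "effective dimension" as $p$ grows. One loud assumption: the abstract does not define $d_{\rm eff}$, so the participation ratio $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ is used as a stand-in. The toy kernel, the sample size, and $D = 2^3$ are also hypothetical choices for the example.

```python
import numpy as np

def depolarized_kernel(K, p, D):
    """K_p = (1 - p)^2 K + p (2 - p) / D * 1 1^T, as stated in the abstract."""
    n = K.shape[0]
    return (1.0 - p) ** 2 * K + (p * (2.0 - p) / D) * np.ones((n, n))

def participation_ratio(K):
    """Assumed stand-in for d_eff.

    For symmetric K, trace(K) is the sum of eigenvalues and sum(K * K) is the
    sum of squared eigenvalues, so no eigendecomposition is needed.
    """
    return np.trace(K) ** 2 / np.sum(K * K)

rng = np.random.default_rng(0)
features = rng.normal(size=(20, 8))
features /= np.linalg.norm(features, axis=1, keepdims=True)
K = (features @ features.T) ** 2  # toy fidelity-like kernel: PSD, unit diagonal
D = 2 ** 3                        # Hilbert-space dimension for 3 qubits (assumed)

d_eff_by_p = {p: participation_ratio(depolarized_kernel(K, p, D))
              for p in (0.0, 0.5, 1.0)}
for p, d in d_eff_by_p.items():
    print(f"p = {p:.1f}: participation ratio = {d:.3f}")
```

At $p = 1$ the kernel reduces to the rank-one matrix $\tfrac{1}{D}\mathbf{1}\mathbf{1}^\top$, so this proxy equals 1, consistent with the stated limit $d_{\rm eff}(K_p)\to1$.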

2606.20179 2026-06-19 cs.CL New submission

ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion

Maxim Melichov, Yakov Kolani, Morris Alper

AI summary: Proposes ReNikud, which uses audio supervision and a pseudo-vocalization architecture, with ASR pseudo-labels from unlabeled audio and character-level alignment, to address missing vowels and pronunciation ambiguity in Hebrew G2P conversion, achieving state-of-the-art results on several benchmarks.

Abstract

Grapheme-to-phoneme (G2P) conversion for Modern Hebrew is needed for applications like text-to-speech (TTS), but is challenging due to the language's abjad writing system, which leaves vowels largely unwritten, creating substantial ambiguity. Standard approaches first predict vowel diacritics (nikud) to produce International Phonetic Alphabet (IPA) transcriptions, but this is limited: vocalization data is scarce and laborious to produce, it does not specify features such as lexical stress, and it reflects formal grammatical rules rather than everyday spoken pronunciation. Direct sequence-to-sequence IPA prediction, meanwhile, struggles on limited data and fails to exploit the character-level alignment characteristic of abjads. Our method, ReNikud, overcomes these limitations with two key insights: (1) Weak audio supervision via a phoneme-based automatic speech recognition (ASR) pseudo-labeling pipeline on thousands of hours of unlabeled Hebrew audio, yielding phonemic transcriptions that reflect natural spoken norms without manual annotation. (2) A pseudo-vocalization architecture that predicts IPA phonemes at each character position, enforcing character-level alignment as an inductive bias. Results on existing Hebrew G2P benchmarks and the new targeted MILIM benchmark for spoken Hebrew show that ReNikud surpasses previous state-of-the-art methods. We will release our code and trained models to support further work on Hebrew TTS and speech technologies.

2606.20034 2026-06-19 cs.LG New submission

Exploring the potential of AlphaEarth and TESSERA embeddings for Fine-scale Local Climate Zone Mapping: A case study across five cities in Switzerland

Htet Yamin Ko Ko, Clement Atzberger

AI summary: Compares TESSERA and AlphaEarth embeddings with traditional Sentinel-1/2 data, using an attention U-Net to upscale coarse-resolution LCZ maps to 10 m; embedding-based models perform better in cross-city transfer and accuracy, but cross-year transfer remains a challenge.

Abstract

Understanding urban spatial morphology is critical for climate modeling, risk assessment, and sustainable urban design, and Local Climate Zone (LCZ) mapping provides the basic framework for this. However, many cities still use coarse ~100-m resolution LCZ records, which are unsuitable for fine-scale urban research. In this study, precomputed embeddings from TESSERA (Feng et al., 2025) and AlphaEarth (Brown et al., 2025) are compared to traditional Sentinel-1/2 (S1S2) composites in five Swiss cities to assess whether they can upscale coarse LCZ maps to 10-m resolution using an attention-based U-Net. Three experiments assess multi-city transferability, the impact of higher-resolution reference data, and temporal robustness to year-to-year phenology changes. We find that all datasets achieve strong performance, with test-data Intersection-over-Union (IoU) of 0.59-0.69 and 0.77-0.82 in the first and second experiments, respectively. TESSERA consistently outperforms both S1S2 and AlphaEarth across both settings. As expected, we find that the transfer of embedding-based models from one year to another remains an open challenge. Overall, however, our results demonstrate the promising potential of embeddings derived from EO foundation models to reduce time-consuming preprocessing and manual feature engineering tasks and to guide a universal deep learning-based LCZ mapping workflow. When combined with a simple location-aware attention U-Net architecture, the embeddings enhance regional transferability and scalability, supporting the development of comprehensive and reproducible fine-scale LCZ maps for global urban climate applications. Improving reference data quality remains the strongest lever for further accuracy gains.

2606.19987 2026-06-19 cs.SD eess.AS New submission

PolSeT: Polish Semantics of Timbre Dataset

Jan Jasiński

AI summary: Introduces the PolSeT dataset, which collects Polish semantic descriptors and timbre ratings through free-verbalization and semantic differential experiments, filling a gap in timbre research data and supporting cross-cultural psychoacoustics and MIR research.

Comments 8 pages, 7 figures. Data descriptor for the PolSeT dataset (Polish Semantics of Timbre), available at https://doi.org/10.5281/zenodo.17830609 under CC BY 4.0

Abstract

This data report introduces PolSeT (Polish Semantic Timbre), a dataset designed to facilitate research in psychoacoustics and Music Information Retrieval (MIR) in Polish and cross-cultural contexts. The dataset contains data from two sequential experiments. Experiment 1 (N=60) was a free-verbalization task aimed at creating a lexicon of Polish semantic descriptors. Using 11 stimuli, a total of 1901 descriptors (701 unique) were gathered. Experiment 2 (N=105) utilized this lexicon to conduct a semantic differential study, where participants rated 18 instrument sounds on 8 bipolar scales, with repeated trials for reliability analysis. The released dataset includes raw listener responses, comprehensive demographics (experience, gender, age), audio stimuli, and extracted acoustic features with Python extraction code. This dataset addresses a gap in open timbre research data, providing both the qualitative linguistic groundwork and the quantitative ratings necessary for psychoacoustic research and the training of multilingual semantic embedding models.

2606.19971 2026-06-19 cs.RO New submission

Evaluation of Augmented Reality-based Intuitive Interface for Robot-Assisted Transesophageal Echocardiography: A User Study

Xiu Zhang*, Matteo Di Mauro*, Sofia Breschi, Angela Peloso, Emiliano Votta, Arianna Menciassi, Elena De Momi

AI summary: Proposes and evaluates an augmented reality-based intuitive interface for robot-assisted transesophageal echocardiography; 3D visualization and tip-level control significantly improve spatial accuracy and reduce operating errors.

Abstract

TransEsophageal Echocardiography (TEE) is essential for diagnosing and guiding Structural Heart Disease (SHD) interventions. However, manual TEE manipulation demands significant operator expertise, is physically demanding, and exposes clinicians to radiation when performed alongside fluoroscopy. Robotic-assisted TEE systems have been introduced to improve probe handling and reduce operator fatigue, yet the design of intuitive and effective user interfaces remains an open challenge. This study presents and evaluates a model-enhanced, Augmented Reality (AR)-based intuitive interface for robot-assisted TEE, designed to improve spatial awareness and control intuitiveness. A robotic TEE platform integrated with electromagnetic tracking and a virtual simulator was used to compare three user interfaces differing in visualization and interaction modalities: 2D joint-level (2D-JI), 3D joint-level (3D-JI), and 3D tip-level (3D-TI). Thirty-six participants performed standardized navigation tasks to reproduce target echocardiographic views, with performance assessed via position and orientation errors, completion time, and NASA-TLX workload scores. Results show that 3D visualization significantly improved spatial accuracy, reducing median position error from 13 mm to 3 mm and halving the orientation error compared with the 2D interface. Tip-level interaction yielded a further 50% reduction in orientation error and reduced inter-user variability relative to joint-level control. Overall, the 3D-TI configuration, combining immersive visualization with direct tip-level control, proved the most effective and ergonomic interface, supporting the integration of AR-based visualization and intuitive control paradigms into next-generation robotic TEE systems to enhance operator performance and procedural safety.

2606.19946 2026-06-19 cs.CL cs.LG New submission

GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs

Yu Deng

AI summary: Proposes GEMS, which addresses distributional deviation and directional interference in training-free multi-directional activation steering through two geometric constraints (norm-preserving weighted superposition with targeted attention-pathway injection, and real-time orthogonalization), preserving 98% accuracy on GSM8K.

Comments 30 pages, 5 figures, 20 tables. Code and logs are available at: https://github.com/LuLu663939/gems-multi-semantic-steering

Abstract

Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining. Existing methods handle only single-direction injection; when multiple semantic directions are superposed without constraints, the model collapses. We show that this collapse decomposes into two independently acting sources: distributional deviation, where additive perturbations accumulate in norm across layers and drive activations outside the training distribution, and directional interference, where non-orthogonal semantic vectors mutually dampen when superposed. These two sources define the design constraints that any training-free multi-directional intervention must address. As one instantiation of these principles, we propose GEMS, a training-free method that maps each source to a corresponding geometric constraint: norm-preserving weighted superposition and targeted attention-pathway injection for distributional deviation, and real-time orthogonalization for directional interference. On GSM8K, injecting three concurrent non-mathematical directions preserves accuracy at 98% (baseline 92%), while unconstrained addition collapses to 4%; on Wikitext-2, the same injection incurs only 2.2% PPL increase. Component ablation isolates the causal role of each constraint, and layer-level probes confirm that orthogonalized signals survive the FFN pathway and reach the output distribution with semantic specificity. Qualitative steering effects transfer across architectures from 3B to 31B.
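
The two geometric constraints named in the abstract can be illustrated on plain vectors. The sketch below is not the GEMS implementation (the linked repository is the authoritative source): it assumes Gram-Schmidt as the orthogonalization step and a simple rescale-to-original-norm as the norm-preserving step, and it ignores where in the network the injection happens. All function names are hypothetical.

```python
import numpy as np

def orthogonalize(directions):
    """Gram-Schmidt: return unit vectors with mutual overlaps removed."""
    basis = []
    for v in directions:
        w = np.asarray(v, dtype=float).copy()
        for b in basis:
            w -= (w @ b) * b  # remove the component along an earlier direction
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            raise ValueError("directions must be linearly independent")
        basis.append(w / norm)
    return np.stack(basis)

def steer_norm_preserving(hidden, directions, weights):
    """Add weighted orthogonalized directions, then rescale to the input norm."""
    basis = orthogonalize(directions)
    steered = hidden + np.asarray(weights, dtype=float) @ basis
    return steered * (np.linalg.norm(hidden) / np.linalg.norm(steered))

rng = np.random.default_rng(1)
hidden = rng.normal(size=16)            # stand-in for one hidden state
directions = rng.normal(size=(3, 16))   # three hypothetical semantic directions
steered = steer_norm_preserving(hidden, directions, [0.5, 1.0, 2.0])
print(np.linalg.norm(hidden), np.linalg.norm(steered))
```

The rescaling keeps the activation norm fixed no matter how many directions are added, which is one simple way to avoid the norm accumulation the abstract calls distributional deviation, while the orthogonalization keeps one direction from cancelling another.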

2606.19769 2026-06-19 cs.RO cs.AI 新提交

Data Standards for Humanoid Robotics: The Missing Infrastructure for Physical AI

人形机器人数据标准：物理AI缺失的基础设施

Shaoshan Liu, Xiugong Qin, Xuan Wu, Xuan Xia, Ning Ding, Jialu Liu, Jie Tang

AI总结本文论证数据标准是人形机器人可扩展性的关键基础设施，通过提出ISO/WD 26264-1标准，解决数据非累积性问题，使具身经验可解释、可共享、可追溯和可复用。

详情

Abstract

The scalability of humanoid robots will depend not only on models and hardware, but also on whether physical experience can accumulate across robots, tasks, organizations, and time. Drawing on the authors' work in developing ISO/WD 26264-1, Humanoid robot datasets -- Part 1: General requirements, within ISO/TC 299/WG 16, this article argues that data standards are becoming foundational infrastructure for Physical AI. We develop three insights. First, humanoid robot data is embodied interaction data, not a collection of isolated digital samples; a useful dataset must preserve the relationship among robot body, action, task, scene, execution trace, and outcome. Second, its value depends on physical coherence: multimodal streams are reusable only when timing, coordinate frames, calibration, kinematics, units, and synchronization assumptions remain inspectable. Third, the main bottleneck is not only data scarcity, but non-cumulative data caused by high collection costs, data silos, and inconsistent evaluation. We argue that humanoid robot data standards address these bottlenecks by making embodied experience interpretable, shareable, traceable, and reusable. A general standard should provide horizontal infrastructure for lifecycle management, metadata, provenance, quality, versioning, and traceability, while capability-specific parts should define domain grammar for manipulation, locomotion, human-robot interaction, cognition, and future humanoid capabilities. As AI moves from screens into bodies, data standards must evolve from organizing digital information to structuring physical interaction.
