arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.12876 2026-06-12 cs.LG cs.CL cs.IT New submission

Multi-Bitwidth Quantization for LLMs Using Additive Codebooks

Liza Babaoglu, Shuangyi Chen, Ashish Khisti

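Affiliation: University of Toronto

AI summary: Proposes Drop-by-Drop, a framework grounded in information theory and successive refinement that uses additive codebooks and Matryoshka-style supervision to give a single model inference-time control over weight precision across multiple bitwidths, reducing storage overhead while maintaining performance.

Comments: 37 pages, 12 figures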

Abstract

As large language models (LLMs) are increasingly deployed across heterogeneous hardware with varying resource constraints, the ability to adaptively manage the trade-off between performance and efficiency without retraining is critical. We propose Drop-by-Drop, a novel multi-bitwidth post-training quantization framework that enables inference-time precision control over LLM weights from a single trained model. Our method is theoretically grounded in information theory and successive refinement. We establish that LLM weights, which commonly follow a Gaussian distribution, can be optimally reconstructed with increasing fidelity as additional bits are incorporated, under a weighted mean squared error distortion motivated by LLM loss functions. To realize this in practice, Drop-by-Drop incorporates Matryoshka-style supervision into the loss function, exploiting the structure of additive codebooks. Drop-by-Drop produces a single model where ordered subsets of codebooks yield accurate partial reconstructions at each precision level. This approach significantly reduces storage and memory overhead by allowing a single checkpoint to serve multiple bitwidths, while maintaining competitive perplexity and accuracy across major architectures, such as Qwen, LLaMA, Gemma, and Mistral.
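
The idea that ordered subsets of additive codebooks yield progressively better reconstructions can be illustrated with a minimal sketch. This is not the Drop-by-Drop method: the codebooks below are hand-picked scalar grids (an assumption), the encoding is plain greedy residual quantization, and no Matryoshka-style loss or learned codebook is involved. It only shows how one stored code can be decoded at several bitwidths.

```python
# Hypothetical, hand-picked codebooks: each level refines the previous one.
BASE = [-1.5, -0.5, 0.5, 1.5]  # 4 entries, so each codebook costs 2 bits
CODEBOOKS = [[c * 0.25 ** level for c in BASE] for level in range(3)]


def encode(weight):
    """Greedy residual encoding: pick one index per codebook, coarse to fine."""
    indices, residual = [], weight
    for codebook in CODEBOOKS:
        idx = min(range(len(codebook)), key=lambda i: abs(residual - codebook[i]))
        indices.append(idx)
        residual -= codebook[idx]
    return indices


def decode(indices, num_codebooks):
    """Additive reconstruction from the first `num_codebooks` codewords."""
    return sum(CODEBOOKS[m][indices[m]] for m in range(num_codebooks))


WEIGHTS = [0.83, -1.27, 0.05, 1.9]  # toy weights, not from any model
CODES = [encode(w) for w in WEIGHTS]  # stored once, reused at every precision


def mse_at(num_codebooks):
    """Mean squared error when only the first `num_codebooks` are used."""
    errors = [(w - decode(c, num_codebooks)) ** 2 for w, c in zip(WEIGHTS, CODES)]
    return sum(errors) / len(errors)


for m in range(1, len(CODEBOOKS) + 1):
    print(f"{2 * m} bits per weight: MSE = {mse_at(m):.6f}")
```

Dropping the last codebook index from each code lowers the bitwidth without re-encoding, which is the storage property the abstract describes for a single checkpoint.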

2606.12710 2026-06-12 cs.LG math.OC New submission

A Stabilized Path-Space Approach to Diffusion-Based Posterior Sampling

Evan Scope Crafts, Umberto Villa, Saviz Mowlavi, Yanting Ma, Hassan Mansour, Wael H. Ali

Affiliations: Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin; Mitsubishi Electric Research Laboratories (MERL); Department of Biomedical Engineering, The University of Texas at Austin

AI summary: Proposes a stabilized path-space framework that uses stochastic optimal control and trust-region optimization to achieve accurate and robust posterior sampling for nonlinear inverse problems.

Abstract

Diffusion models provide expressive data-driven priors for Bayesian inverse problems, but many diffusion posterior samplers rely on heuristic guidance approximations that can fail for nonlinear operators and multimodal posteriors. In this work, we develop a stabilized path-space framework for diffusion-based posterior sampling. Starting from a base diffusion process whose terminal marginal represents the prior, we define a likelihood-weighted target measure on trajectories and cast posterior sampling as learning a controlled stochastic process whose path measure matches this target. This formulation connects diffusion posterior sampling to stochastic optimal control while preserving the Bayesian structure needed for uncertainty quantification. We introduce a time reparameterization that makes the path-space control problem well posed by removing the bias induced by the unknown initial value function, without auxiliary training. We then learn the control via a trust-region path-space optimization method with log-variance objectives. The path-space perspective also unifies our learned control approach with existing guidance-based samplers, quantifies the sampling error induced by approximate controls, and yields importance sampling corrections for asymptotically exact posterior expectations. We evaluate the proposed framework on a suite of benchmark inverse problems with analytically characterized or high-quality reference posteriors, enabling principled assessment of sampling accuracy and uncertainty quantification. These experiments provide insight into the behavior of diffusion-based posterior samplers and demonstrate improved accuracy and robustness over leading approaches.

2606.12694 2026-06-12 cs.DS cs.LG math.PR stat.ML New submission

A unified complexity bound for logconcave sampling

Yunbum Kook, Santosh S. Vempala

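Affiliation: University of Texas at Austin

AI summary: Using the In-and-Out algorithm with exponential lifting, the paper gives a simple, unified, and nearly tight bound for sampling arbitrary logconcave distributions from a warm start; the main new ingredient is an improved bound on the Poincaré constant of the lifted distribution.

Comments: 5 pages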

Abstract

We give a simple, unified, and nearly tight bound for sampling arbitrary logconcave distributions from a warm start using the In-and-Out algorithm along with exponential lifting. The main new ingredient in the analysis is an improved bound on the Poincaré constant of a lifted distribution. As a consequence, the resulting convergence rate is nearly tight for both constrained settings (e.g., Gaussian restricted to a convex body) and well-conditioned settings (e.g., strongly logconcave and smooth densities).

2606.12646 2026-06-12 stat.ML cs.IT cs.LG New submission

Epistemic Uncertainty Is Not the Reducible Kind

Robin Young

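Affiliation: University of Cambridge

AI summary: Proves that the standard definition of epistemic uncertainty, as the part removable by more data, is extensionally inconsistent with the mutual-information measure, and proposes a three-part decomposition: aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty.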

Abstract

The standard taxonomy of predictive uncertainty defines epistemic uncertainty as the part removable by collecting more data, while the standard measure identifies it with a mutual-information term. We prove the definition and the measure are extensionally inconsistent. On an explicit construction, the measure assigns all uncertainty to the epistemic class, yet no quantity of training data reduces it. Reducibility is instead a property of the pair (uncertainty, acquisition class), and the dichotomy resolves into three parts: aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty. An exact identity for the value of an observation shows that in-distribution data never reduces mechanism-irreducible uncertainty and generically increases it. Ensemble disagreement, the deployed epistemic estimate, tracks the training procedure rather than the epistemic term. It collapses to zero beneath a positive truth under consistent training, and equals hyperparameter-scaled initialization noise under interpolation. A finite-sample falsification test and seed-swept experiments confirm the theory.
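
For readers unfamiliar with the "standard measure" mentioned above, the following sketch shows the usual entropy decomposition in which the mutual-information term is the gap between the entropy of the averaged prediction and the average entropy of the individual predictions. It assumes an ensemble stands in for the posterior over models, and the two toy ensembles are made up; it does not reproduce the paper's construction or its three-part decomposition.

```python
import math


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose(member_predictions):
    """Return (total, aleatoric, epistemic) for a list of class distributions.

    total     = entropy of the ensemble-averaged prediction
    aleatoric = average entropy of each member's prediction
    epistemic = total - aleatoric, the mutual-information term
    """
    n = len(member_predictions)
    mean_prediction = [sum(col) / n for col in zip(*member_predictions)]
    total = entropy(mean_prediction)
    aleatoric = sum(entropy(p) for p in member_predictions) / n
    return total, aleatoric, total - aleatoric


# Toy ensembles (assumed data): confident members that disagree,
# and uncertain members that agree.
disagreeing = [[1.0, 0.0], [0.0, 1.0]]
agreeing = [[0.5, 0.5], [0.5, 0.5]]

for name, ensemble in [("disagreeing", disagreeing), ("agreeing", agreeing)]:
    total, aleatoric, epistemic = decompose(ensemble)
    print(f"{name}: total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```

Both ensembles have the same total uncertainty, but the measure attributes it entirely to the epistemic term in one case and entirely to the aleatoric term in the other. Whether that epistemic term can actually be reduced by more data is the question the paper raises.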

2606.12611 2026-06-12 cs.LG cs.IT New submission

Evaluation of AutoML Frameworks for IDS under Imbalanced Data Conditions of the NSL-KDD Dataset

Wiliane Carolina Silva, Evandro César Vilas Boas, Felipe A. P. de Figueiredo

Affiliations: Cybersecurity and Artificial Intelligence Laboratory (CS&I Lab), National Institute of Telecommunications (Inatel); Wireless and Artificial Intelligence Laboratory (WAI Lab), National Institute of Telecommunications (Inatel)

AI summary: Studies how severe class imbalance in the NSL-KDD dataset affects AutoML frameworks for multiclass intrusion detection, finding that ensemble learning and imbalance-aware optimization improve minority-class detection; PyCaret performs best (66% macro-F1).

Abstract

This work investigates the impact of severe class imbalance on the performance of automated machine learning (AutoML) frameworks for multiclass network intrusion detection using the NSL-KDD dataset. Unlike previous studies that simplify the problem through binary classification or minority-class removal, we preserve the original five-class distribution, including highly underrepresented attacks such as R2L and U2R, enabling a realistic evaluation of imbalance-sensitive learning behavior. Nine open-source AutoML frameworks were analyzed under a unified and reproducible experimental protocol, considering differences in architectural design, ensemble strategies, validation procedures, hyperparameter optimization, and imbalance-handling mechanisms. The results demonstrate that frameworks incorporating ensemble learning and imbalance-aware optimization achieve better minority-class discrimination. PyCaret obtained the best overall performance, reaching 66% macro-F1, followed by AutoGluon with 55%, whereas frameworks lacking native balancing support exhibited significant degradation in minority-class detection capability. The analysis further shows that accuracy-oriented optimization alone is insufficient for highly imbalanced IDS scenarios, since high-weighted metrics may coexist with poor generalization on rare attack categories. As a contribution, this work establishes a standardized benchmark for AutoML-based intrusion detection under severe multiclass imbalance, highlighting current architectural limitations and the need for native integration of imbalance-aware optimization, resampling, and stratified evaluation strategies into automated learning pipelines. The source code is publicly available.
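
The point that accuracy and weighted metrics can hide poor minority-class performance is easy to reproduce on synthetic labels. The sketch below uses invented data (95 "normal" samples and 5 "U2R" samples, with a classifier that always predicts "normal"); it is not drawn from NSL-KDD or from the paper's results.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class, computed as 2TP / (2TP + FP + FN)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(y_true, y_pred):
    """Return accuracy, macro-F1, and support-weighted F1."""
    labels = sorted(set(y_true))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1s = {c: f1_for_class(y_true, y_pred, c) for c in labels}
    macro = sum(f1s.values()) / len(labels)
    weighted = sum(f1s[c] * y_true.count(c) for c in labels) / len(y_true)
    return accuracy, macro, weighted


# Synthetic, heavily imbalanced labels (assumption for illustration only).
y_true = ["normal"] * 95 + ["U2R"] * 5
y_pred = ["normal"] * 100  # a classifier that never detects the rare class

accuracy, macro_f1, weighted_f1 = summarize(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro-F1={macro_f1:.3f} weighted-F1={weighted_f1:.3f}")
```

Macro-F1 averages per-class scores with equal weight, so a class that is never detected pulls it down sharply even when accuracy stays high.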

2606.13605 2026-06-12 math.OC cs.LG eess.SY New submission

Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

Yashdeep Chaudhary, Roberto Armellin, Harry Holt, Marco Sagliano

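Affiliation: Auckland University

AI summary: Proposes a distribution-agnostic robust trajectory-optimization framework that handles uncertainty in initial conditions and process noise through chance-constrained reinforcement learning, combining an offline nominal trajectory with an affine closed-loop correction law; probabilistic feasibility and fuel efficiency are validated on two different trajectory design problems.

Comments: Preprint. 39 pages, 16 figures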

Abstract

This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.
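
Enforcing a chance constraint through rollout-based upper-tail quantiles can be sketched as follows. Everything here is a hypothetical toy: the 1-D dynamics, the scalar feedback gain, the tolerance, and the noise distributions are assumptions, and no reinforcement learning is performed. The sketch only shows the empirical check "the (1 - epsilon) quantile of the constraint value over sampled rollouts is non-positive", which needs nothing from the uncertainty except the ability to sample it.

```python
import math
import random


def upper_quantile(values, level):
    """Empirical quantile: smallest sample with at least `level` mass below it."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


def rollout_violation(gain, rng, tolerance=1.0, steps=5):
    """Toy rollout: a sampled initial error is damped by feedback each step.

    Returns g = |terminal error| - tolerance, so g <= 0 means feasible.
    """
    error = rng.gauss(0.0, 1.0)  # initial-condition uncertainty
    for _ in range(steps):
        error = (1.0 - gain) * error + rng.uniform(-0.1, 0.1)  # process noise
    return abs(error) - tolerance


def chance_constraint_satisfied(gain, epsilon=0.05, num_rollouts=2000, seed=0):
    """Check P(g <= 0) >= 1 - epsilon empirically via the upper-tail quantile."""
    rng = random.Random(seed)
    violations = [rollout_violation(gain, rng) for _ in range(num_rollouts)]
    quantile = upper_quantile(violations, 1.0 - epsilon)
    return quantile <= 0.0, quantile


for gain in (0.0, 0.5):
    feasible, quantile = chance_constraint_satisfied(gain)
    print(f"gain={gain}: feasible={feasible}, upper-tail quantile={quantile:.3f}")
```

In a learning setting the quantile would typically enter the objective as a penalty rather than a hard pass/fail check; that design choice is not specified in the abstract.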

2606.12858 2026-06-12 cs.IT cs.AI cs.CV New submission

JSCGC: Joint Source-Channel-Generation Coding for Wireless Generative Communications

Tong Wu, Zhiyong Chen, Guo Lu, Li Song, Feng Yang, Meixia Tao, Wenjun Zhang

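Affiliation: Cooperative Medianet Innovation Center, School of Information Science and Electronic Engineering, Shanghai Jiao Tong University

AI summary: Proposes Joint Source-Channel-Generation Coding (JSCGC), which replaces the conventional decoder with a generative model and recasts communication reconstruction as controlled generation under perceptual constraints; a joint training and stochastic sampling framework maximizes mutual information and improves feature-based, semantic-level, and distributional quality in latent-space image transmission.

Comments: Submitted to IEEE Journal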

Abstract

Conventional communication systems, including both separation-based coding and learning-based joint source-channel coding (JSCC), are typically designed under Shannon's rate-distortion theory. However, relying on generic distortion metrics fails to capture complex human visual perception, often resulting in blurred or unrealistic reconstructions. In this paper, we propose Joint Source-Channel-Generation Coding (JSCGC), a generative communication paradigm that replaces the conventional decoder with a generative model at the receiver. The received signal is treated as a condition that controls the sampling process into the learned conditional distribution, reformulating communication from deterministic reconstruction for distortion minimization to controlled generation for mutual information maximization under perceptual constraints. Based on this formulation, we develop a unified joint training and efficient stochastic sampling framework, and provide theoretical analysis of its effectiveness in both learning and inference stages. Extensive experiments on latent-space image transmission demonstrate that the JSCGC consistently improves feature-based, semantic-level, and distributional quality across diverse channel conditions, while exhibiting a distinct error behavior characterized by semantic inconsistency rather than distortion.

2606.12489 2026-06-12 cs.IT cs.LG New submission

Masked Neural Detection for Constrained Channel Coding in Molecular Communication

Melih Şahin, Ozgur B. Akan

Affiliations: Centre for neXt Communications (CXC), Department of Engineering, University of Cambridge; Centre for neXt Communications (CXC), Department of Electrical and Electronics Engineering, Koç University

AI summary: To address diffusion memory in molecular communication, combines RLIM constrained codes with SBRNN neural detection, beating the best uncoded receiver in 46 of 59 cases with a mean gain of 10.36× over those wins, and designs an RLIM-tailored training mask that further improves performance.

Comments: 5 pages, 2 figures, 4 tables

Abstract

Molecular communication (MC) suffers from severe diffusion memory because molecules released for one symbol may arrive during later symbols. Neural sequence detectors, especially sliding bidirectional recurrent neural networks (SBRNNs), can substantially outperform threshold detectors in such channels. This raises a central question for MC channel coding: does a code whose advantage was established under threshold detection retain it when both coded and uncoded transmission are evaluated with neural detection? This letter answers this question for run-length-limited ISI-mitigation (RLIM) codes, a class of constrained codes previously shown to provide large BER gains in MC. Across the tested operating points, the best RLIM-SBRNN receiver beats the best uncoded receiver, chosen between threshold and SBRNN detection, in $46$ of $59$ cases, with a mean gain of $10.36\times$ over those wins. We also propose an RLIM-tailored training mask for compact SBRNN detectors, improving the unmasked RLIM-SBRNN in $227$ of $236$ comparisons with $3.267\times$ mean gain when masking is beneficial. Finally, the compact masked RLIM-SBRNN is competitive with channel-state-aware MLSE despite using no channel knowledge.

2606.12816 2026-06-12 quant-ph cs.ET cs.LG New submission

Graph Reinforcement Learning for Calibration-Aware Quantum Circuit Routing

Yash Vardhan Tomar, Dheeraj Peddireddy, Vaneet Aggarwal

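Affiliations: University of California, Berkeley; National Institute of Standards and Technology

AI summary: Proposes a calibration-aware quantum circuit routing method based on graph reinforcement learning that selects SWAP operations using IBM Heron r2 calibration data, reaching a pooled mean fidelity of 0.727 on MQT Bench circuits versus 0.440 for SABRE-best20.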

Abstract

Quantum circuit routing is a key step in compiling programs for noisy intermediate-scale quantum processors. Routes that appear efficient by standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement-learning router that uses same-day IBM Heron r2 calibration data to choose hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it with exact simulated fidelity across nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. Fidelity gains come with higher routed two-qubit counts and are concentrated in the 5q and 8q circuit families; under the fixed tree action graph, all 10q families favor SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.

2606.12806 2026-06-12 quant-ph cs.LG New submission

Quantum Reservoir Computing for Short-Term Power Load Forecasting in Resource-Constrained Energy Systems

Mansi Od, Param Pathak, Nouhaila Innan, Muhammad Shafique

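Affiliation: University of Waterloo

AI summary: Proposes a hardware-efficient quantum reservoir computing framework with a fixed quantum reservoir and a compressed classical readout for short-term load forecasting under limited memory and hardware noise; 6-bit quantization preserves full-precision performance while reducing readout memory by 81.2%.

Comments: 11 pages, 9 figures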

Abstract

Short-term load forecasting is essential for reliable energy management, but practical deployment on edge devices requires models that remain accurate under limited memory, finite measurement budgets, and hardware noise. This work proposes a hardware-efficient Quantum Reservoir Computing (QRC) framework for energy load forecasting, where a fixed quantum reservoir transforms temporal input windows into high-dimensional features and only a classical Elastic Net readout is trained. To reduce deployment cost, the trained readout is compressed using post-training fixed-point quantization at bit widths from 8 to 2 bits. The framework is evaluated on the Tetouan and Spain energy load datasets under exact statevector simulation, 512-shot finite sampling, and realistic hardware-noise models from IBM FakeTorino and IBM FakeMarrakesh. Results show that 6-bit readout precision preserves full-precision forecasting performance while reducing readout memory by 81.2%. Below this point, degradation becomes dataset dependent, with Tetouan showing stronger sensitivity and Spain degrading more gradually. Hardware-noise validation further shows that the trained readout transfers to noisy reservoir states without retraining. These findings support quantized QRC as a resource-aware forecasting approach for near-term quantum time-series applications.
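
The reported 81.2% memory reduction at 6 bits is consistent with comparing against a 32-bit baseline, since 1 - 6/32 = 0.8125. The 32-bit baseline is an assumption here, as the abstract does not state it. The sketch below pairs that arithmetic with a generic symmetric fixed-point quantizer; the quantizer and the readout weights are hypothetical and not necessarily the scheme used in the paper.

```python
def quantize_fixed_point(weights, bits):
    """Symmetric uniform quantization to signed integer codes with one shared scale."""
    max_level = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / max_level or 1.0  # avoid a zero scale
    codes = [max(-max_level, min(max_level, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


def memory_reduction(bits, baseline_bits=32):
    """Fraction of weight storage saved relative to the baseline (scale overhead ignored)."""
    return 1.0 - bits / baseline_bits


readout = [0.42, -1.3, 0.07, 0.9]  # made-up readout weights
for bits in (8, 6, 4, 2):
    codes, scale = quantize_fixed_point(readout, bits)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(readout, restored))
    print(f"{bits}-bit: saves {memory_reduction(bits):.4f} of memory, max error {worst:.4f}")
```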

2606.13581 2026-06-12 cs.CY cs.CL cs.HC physics.soc-ph New submission

The Tone of Awareness: Topic, Sentiment, and Toxicity Maps During Mental Health Month on TikTok

Henrique Ferraz de Arruda, Andreia Sofia Teixeira, Pranay Gundala Reddy, Anindya Mondal, Kleber Andrade Oliveira, Filipi Nascimento Silva

Affiliations: Institute for Biocomputation and Physics of Complex Systems (BIFI); University of Zaragoza; ARAID Foundation; Network Science Institute; Northeastern University London; Kent Medway Medical School; LASIGE; Faculdade de Ciências da Universidade de Lisboa; Department of Psychology, University of Limerick; Observatory on Social Media, Indiana University; CSSI - Kellogg School of Management, Northwestern University

AI summary: Analyzes TikTok videos and comments from Mental Health Awareness Month in 2023 and 2024, extracting topics with BERTopic and quantifying sentiment and toxicity with XLM-T and Detoxify; sentiment in videos skews negative while comments are more mixed, and toxicity in comments is long-tailed and concentrated in specific topics.

Comments: 12 pages, 6 figures

Abstract

Despite raising concerns about the mental health effects associated with the usage of TikTok, little is known about how related content is framed by creators and received by audiences. We collect the content of 28,341 TikTok videos and 80,130 comments from Mental Health Awareness Month (May) in 2023 and 2024 via the TikTok Research API, and study how the tone of awareness varies across topics and years. We characterize "tone" as the emotional and interpersonal framing of mental health discourse, operationalized through sentiment and toxicity measures. We extract topics from video text using BERTopic and log-odds keywords, then quantify topic-conditioned sentiment (XLM-T) and toxicity (Detoxify) separately for video transcriptions and comments. Sentiment captures the affective valence of content, while toxicity reflects the presence of harmful or abusive language. We find a stable set of recurring themes across years, spanning clinical conditions, emotional disclosure, self-care, and campaign-oriented content, with engagement highly skewed toward a small subset of topics. All sentiment and toxicity analyses are computed separately for video content and comments, allowing us to distinguish between content production and audience reception. Sentiment in videos is often negative for emotionally charged topics, while comments tend to shift toward more mixed or positive polarity, especially for suicide prevention. Toxicity is low in median overall, but exhibits longer-tailed outliers in comments than in videos that are more pronounced in comments and concentrated in specific topics (e.g., "Duet", "Suicide Prevention", and "Psychisch"). Overall, our results provide a topic-level decomposition of mental health discourse on TikTok during awareness-month campaigns.

2606.13422 2026-06-12 quant-ph cs.LG physics.flu-dyn New submission

Foundations of Practical Quantum Advantage in Quantum-Informed Machine Learning for Predicting Chaos

Maida Wang, Xiao Xue, Minh Chung, Peter V. Coveney

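Affiliations: Centre for Computational Science, University College London; Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities; Centre for Advanced Research Computing, University College London

AI summary: Proposes a quantum-advantage mechanism based on higher-order quantum statistical priors, proves a two-stage advantage (representation and extraction) with a quantum-classical separation in copy-measurement complexity, and demonstrates it on turbulence and weather forecasting.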

Abstract

We develop theoretical foundations for a practical quantum-advantage mechanism in quantum-informed machine learning for chaotic dynamical systems. A family of k-indexed higher-order quantum statistical priors (Q-Priors) hosts the k-point marginal of the invariant measure on n_q = kq qubits, extending the single-site construction of prior work. We prove a two-stage advantage. In the representation stage, superposition and entanglement compactly store non-factorisable spatial correlations of the invariant measure on n_q qubits. In the extraction stage, joint Bell measurements on two copies estimate any post hoc Pauli functional with a copy-pair count independent of n_q, whereas any adaptive single-copy protocol for the corresponding full-Pauli read-out requires Omega(2^(n_q)) copies; this is a provable quantum-classical separation in copy-measurement complexity. The two-copy read-out is realised in simulation and on IQM superconducting processors. Two case studies instantiate the mechanism in workflows of independent scientific value: a turbulent channel-flow study in which the two-copy read-out yields a named non-diagonal correlator of the invariant measure (the velocity-direction coherence), and a medium-range weather forecasting workflow on the European Centre for Medium-Range Weather Forecasts ERA5 reanalysis in which the diagonal k <= 2 Q-Prior steers a Koopman rollout, improves anomaly-correlation skill by 10-39% across 48-240 h lead times, and reduces the long-horizon collapse of rollouts onto a static mean field. The two conditions of our practical-advantage definition are met at complementary levels, identifying a candidate route to practical quantum advantage before fault-tolerant hardware.

2606.12824 2026-06-12 eess.IV cs.AI cs.CV physics.med-ph New submission

Acquisition state behaves as a structured, measurable variable governing lung-nodule AI: kernel-driven measurement instability and noise-driven detection fragility, invisible to DICOM metadata

Daniel Soliman

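Affiliation: Daniel Soliman, M.S.

AI summary: Using a LUNA16-trained RetinaNet detector, shows that CT acquisition state (reconstruction kernel and noise) separately affects AI measurement and detection performance and cannot be recovered from DICOM metadata, and proposes an acquisition-aware input-validation layer.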

Abstract

AI governance for medical imaging is formalizing: the 2026 ACR-SIIM Practice Parameter recommends local acceptance testing and ongoing drift monitoring, and the ACR Assess-AI registry monitors AI outputs using DICOM metadata for context. We argue that a necessary, currently unmonitored layer sits beneath output metrics: whether incoming studies remain within the acquisition envelope a model was validated on. Using a LUNA16-trained MONAI RetinaNet lung-nodule detector, we test whether acquisition state behaves as a structured, measurable variable. On real paired CT differing only in reconstruction kernel (NLST B30f vs B80f), kernel alone shifted AI-measured diameter and flipped a Fleischner size category in 5.2% (8 of 155) of nodules at fixed patient and acquisition, while detection confidence was unchanged (Wilcoxon p=0.22). Under controlled LIDC-IDRI perturbations the effects dissociated by axis: the noise axis degraded detection confidence (p=5.9e-32, concentrated in nodules under 6 mm) but not measurement, while the frequency/kernel axis corrupted measurement (p=8.6e-13) but not detection. A 4-feature pixel fingerprint recovered reconstruction identity (patient-level AUC about 0.95 on real CT, 0.995 on a QIBA phantom) where the ConvolutionKernel DICOM tag was uninformative (identical labels across reconstructions). The kernel axis transported across four manufacturers (leave-one-vendor-out AUC 0.94-0.98, matching the within-vendor ceiling). Acquisition state thus maps to distinct AI failure modes, frequency content to measurement reliability and noise to detection sensitivity, and is not recoverable from metadata. Acquisition-aware, input-side validation is the missing layer for the acceptance-testing and drift-monitoring requirements now entering imaging-AI accreditation.

2606.12559 2026-06-12 physics.comp-ph cs.LG math.NA physics.flu-dyn New submission

Feature-preserving Latent-EnKF for Data Assimilation of Flows with Shocks

Hemanth Chandravamsi, Hangchuan Hu, Ponkrshnan Thiagarajan, Tamer A. Zaki

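Affiliation: Department of Mechanical Engineering, Johns Hopkins University

AI summary: To address the spurious oscillations that multimodal statistics cause in the EnKF for flows with shocks, performs the ensemble update in a learned low-dimensional latent space to preserve shock features and recovers the physical state through a shared decoder; numerical experiments show accurate feature recovery without spurious oscillations.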

Abstract

The ensemble Kalman filter (EnKF) is widely adopted for sequential data assimilation, but fails for solutions with discontinuities, such as shocks in compressible flows. Uncertainty in shock location induces multimodal ensemble statistics that violate the Gaussian assumptions underlying the EnKF, producing large-scale spurious oscillations in the analysis state. We introduce a feature-preserving latent-EnKF that performs the ensemble update in a learned low-dimensional latent space, where shock and flow features admit a smooth manifold representation, thereby preserving sharp features during EnKF analysis. The updated latent state is mapped back to physical state through a shared decoder for all ensemble members. The algorithm eliminates the member-specific ordered training and positivity flooring used in prior approaches. Numerical experiments on a Sod shock tube and Mach 2 shock interaction with a 2D cylinder, using sparse and noisy observations, show accurate feature recovery of shocks and contact discontinuities without spurious oscillations.
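
Why a latent-space update can keep shocks sharp can be seen in a deliberately simplified toy. Kalman-type analysis states are linear combinations of ensemble members, so the sketch below just compares averaging step profiles in physical space with averaging their shock positions and decoding afterwards. The hand-written `decode` function and the single scalar latent variable are assumptions standing in for the learned shared decoder; no actual EnKF update, observation model, or training is included.

```python
def decode(shock_position, num_cells=10):
    """Hypothetical decoder: latent scalar -> step profile (1 left of the shock, 0 right)."""
    return [1.0 if (i + 0.5) / num_cells < shock_position else 0.0
            for i in range(num_cells)]


# Ensemble members differ only in shock location (the latent variable).
latent_ensemble = [0.3, 0.5, 0.7]
physical_ensemble = [decode(s) for s in latent_ensemble]

# Linear combination in physical space: the shock is smeared into a staircase.
physical_mean = [sum(column) / len(column) for column in zip(*physical_ensemble)]

# Same linear combination in latent space, then decoded: the shock stays sharp.
latent_mean = sum(latent_ensemble) / len(latent_ensemble)
decoded_mean = decode(latent_mean)

print("physical-space mean:", [round(v, 2) for v in physical_mean])
print("latent-space mean:  ", decoded_mean)
```

The physical-space average contains intermediate values that no single member exhibits, while the decoded latent average is again a clean step.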

2606.12502 2026-06-12 physics.soc-ph cs.AI New submission

A Mathematical Theory of Value: a synthesis on goal-directed agency under resource constraints

Cheng Qian

Institutions * Cheng Qian

AI summary Proposes that value is the rate at which a goal-directed agent converts a resource into goal-progress under resource constraints, derives a logarithmic measure from a scale-invariance axiom, and derives a coding theorem of value that unifies value with information theory.

Details
Comments
Also available at this https URL (v5)
Abstract

We propose that value -- the quantity goal-directed agents create, destroy, and exchange -- is a lawful structural quantity in the same category as information. Following Shannon's method, we make one ruthless abstraction: value is the rate at which an agent converts a resource into goal-progress, relative to a frame fixed by its goal. A scale-invariance axiom forces a logarithmic measure, $V=\sum_i k_i \ln e_i$; compounding of a reinvested resource forces the same form via the ergodicity argument of Peters (2019). The two routes are kin rather than independent; their agreement is a consistency check, not an over-determination. We derive a coding theorem of value: $\Delta G \le I(X;Y)$, achieved by Bayes-proportional allocation; realized value decomposes as $G=D(q\|r)-D(q\|p)$, identifying misalignment with measurable waste. For populations, value is frame-relative while price is frame-independent; a fleet that pools its resource and fuses its perception inherits the ceiling $G_{\mathrm{fleet}} \le I(X;Y_{1:m}) \le H(X)$ (a corollary; an earlier sum-form claim was wrong and is corrected in v5). A dynamical layer yields an is/ought asymmetry from which alignment emerges as a control-stability condition with a closed-form residual. We test the single-frame laws on live language models in a pre-registered scale-up: perception mutual information tracks realized capability rather than parameter count (Spearman $\rho = 0.977$ pooled over 30 model$\times$domain points), out-of-sample $\Delta G$ tracks $I(X;Y)$, and over-confidence is measurable dissipation; a further pre-registered test shows the bridge is shape-invariant across four task shapes ($n=42$, slope 0.953). None of the mechanisms is individually new -- generalized Kelly, Armstrong & Mindermann (2018), classical control; the contribution is their unification and the governance mapping (incentive design over oversight) that follows.
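
The decomposition $G=D(q\|r)-D(q\|p)$ can be checked numerically in the classical Kelly horse-race setting, which is assumed here as a stand-in for the paper's more general framework: outcome $i$ occurs with probability `q[i]`, the agent stakes a fraction `p[i]` of its resource on it, and the payoff per unit staked is `1 / r[i]`. The function names are illustrative and do not come from the paper.

```python
import numpy as np

def kl(a, b):
    """Kullback-Leibler divergence D(a || b) in nats for strictly positive distributions."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(a * np.log(a / b)))

def log_growth(q, p, r):
    """Expected log-growth per round when stakes p meet outcomes drawn from q."""
    q, p, r = (np.asarray(v, dtype=float) for v in (q, p, r))
    return float(np.sum(q * np.log(p / r)))

q = [0.5, 0.3, 0.2]        # true outcome distribution
r = [1 / 3, 1 / 3, 1 / 3]  # reference distribution implied by the payoffs
p = [0.4, 0.4, 0.2]        # the agent's (misaligned) allocation

G = log_growth(q, p, r)
waste = kl(q, p)             # growth lost to misallocation
G_max = log_growth(q, q, r)  # proportional allocation p = q
```

Under these assumptions the realized growth equals `kl(q, r) - kl(q, p)`, so the second divergence is exactly the shortfall relative to proportional allocation.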

2606.13535 2026-06-12 hep-ex cs.AI hep-ph New submission

AgentRivet: an automated system for producing Rivet routines from journal publications

Antonio J. Costa, Caterina Doglioni, Christian Gütschow, Andrew D. Pilkington, Sukanya Sinha

Institutions * Department of Physics & Astronomy, University of Manchester; Centre for Advanced Research Computing, University College London

AI summary Proposes AgentRivet, an automated LLM-based workflow that extracts physics analysis information from papers and generates missing Rivet routines, with code and physics reviews for quality control. On ATLAS and CMS measurements it produces routines with few syntax errors and reasonable physics fidelity.

Details
Abstract

Particle physics collider experiments provide Rivet routines as part of the analysis preservation strategy for model-independent measurements. Rivet is a C++ toolkit that allows new theoretical models to be compared to the measurements, thus aiding the development and tuning of Monte Carlo event generators as well as searches for physics beyond the Standard Model. However, analysis coverage is known to be incomplete, with only 39% of measurements having documented and publicly available Rivet routines. In this article, we design and implement an automated workflow based on Large Language Models with the goal of providing the missing routines. This multi-step workflow, referred to as AgentRivet, extracts the physics analysis information from published papers and writes the missing Rivet routines, with intermediate code and physics reviews as part of an autonomous quality control. We report the results obtained using commercial Large Language Models, provided by OpenAI, Anthropic, and Google, for two recent measurements from the ATLAS and CMS experiments. We find that AgentRivet produces competent Rivet routines with few syntax errors. The physics fidelity of the routines is reasonable and follows the explanations given in the relevant publications. Nevertheless, physics-implementation issues do arise and are investigated using the artefacts produced by AgentRivet. The majority of physics-implementation issues arise from subtle-but-ambiguous definitions in the given publication, although some models struggle to implement complex observables even when clear definitions are given.

2606.13454 2026-06-12 physics.optics cond-mat.dis-nn cs.ET cs.LG New submission

Optical Implementation of Equilibrium Propagation Using Spatial Photonic Ising Machines

Dimitri Vanden Abeele, Daniele Veraldi, Davide Pierangeli, Claudio Conti, Serge Massar

Institutions * Laboratoire d’Information Quantique, Université Libre de Bruxelles (ULB); Dipartimento di Fisica, Sapienza Università di Roma

AI summary Proposes an optical implementation of Equilibrium Propagation using a Spatial Photonic Ising Machine, encoding neuron states and trainable patterns via the gauge transformation method. The system is evaluated experimentally on the Wine dataset and numerically on MNIST, supporting the feasibility of an energy-efficient physical implementation.

Details
Abstract

Equilibrium Propagation offers a compelling alternative to traditional machine learning for training energy-based networks. Here we demonstrate a hybrid optical-digital implementation of EP using a Spatial Photonic Ising Machine (SPIM). The SPIM exploits the gauge transformation method to optically encode both continuous neuron states and rank-1 binary trainable patterns as phase modulations via a spatial light modulator, with inference realized using a finite difference scheme. The experimental system is evaluated on the Wine classification dataset. The potential of this approach, including the use of continuous couplings and structured coupling matrices, is evaluated numerically on the more complex MNIST dataset. Our work provides a concrete pathway toward energy-efficient physical implementations of Equilibrium Propagation.

2606.12478 2026-06-12 cs.LG cond-mat.stat-mech quant-ph New submission

Boltzmann Attention: Learnable Ising Couplings for Cooperative Attention

Gilhan Kim, Daniel K. Park

Institutions * Yonsei University

AI summary Proposes Boltzmann attention, which augments the attention mechanism with learnable Ising couplings between positions. It improves over standard softmax attention on character-level language modeling and bracket matching, and can be trained with diabatic quantum annealing.

Details
Comments
19 pages, 5 figures
Abstract

Attention mechanisms are central to modern sequence models, yet standard attention computes relevance primarily through individual query--key similarities. Although softmax normalization introduces competition among positions, a standard attention layer does not explicitly parameterize learnable interactions between attention decisions. This limits its ability to directly model cooperative or antagonistic co-attention structure within the attention mechanism itself. We propose Boltzmann attention, an energy-based generalization in which attention patterns are governed by an interacting Ising model. The method augments the usual data-dependent local fields with learnable pairwise couplings, allowing the model to represent inter-position correlations beyond those captured by softmax or sigmoid attention. Experiments on character-level language modeling and synthetic bracket matching show that Boltzmann attention consistently improves over standard softmax attention within a standard Transformer architecture, with the advantage becoming more pronounced as sequence length increases. A four-way ablation confirms that the improvement arises from the learnable pairwise couplings. These results suggest that explicit inter-position interactions provide a principled enhancement for attention-based sequence modeling. Moreover, the Ising formulation opens a natural path toward quantum-computing-based sampling strategies: we demonstrate that diabatic quantum annealing provides a practical training method while maintaining competitive performance with exact Boltzmann computation.
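
To make the idea of coupled attention decisions concrete, the sketch below computes exact marginals of a small Ising-type distribution over binary attention variables by brute-force enumeration. The energy form, the use of $\{0,1\}$ variables, and the function names are assumptions for illustration; the paper's parameterization may differ. With all couplings set to zero the marginals reduce to independent sigmoid gates, and a positive coupling between two positions raises both of their marginals.

```python
import itertools
import numpy as np

def local_fields(query, keys):
    """Data-dependent fields h_i = q . k_i / sqrt(d), as in scaled dot-product attention."""
    return keys @ query / np.sqrt(len(query))

def ising_marginals(h, J):
    """Exact P(s_i = 1) under p(s) proportional to exp(h.s + 0.5 * s^T J s), s in {0,1}^n.

    J is a symmetric coupling matrix with zero diagonal. Enumeration costs 2^n,
    so this is only practical for short sequences.
    """
    n = len(h)
    states = np.array(list(itertools.product([0.0, 1.0], repeat=n)))
    log_w = states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    return (w / w.sum()) @ states

rng = np.random.default_rng(0)
query, keys = rng.standard_normal(4), rng.standard_normal((3, 4))
h = local_fields(query, keys)

J = np.zeros((3, 3))
free = ising_marginals(h, J)          # no couplings
J[0, 1] = J[1, 0] = 1.5               # positions 0 and 1 cooperate
coupled = ising_marginals(h, J)
```

For realistic sequence lengths the enumeration would have to be replaced by sampling, which is where the annealing-based strategies mentioned in the abstract would come in.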

2606.13045 2026-06-12 cond-mat.dis-nn cs.LG New submission

A solvable model for unsupervised federated learning

Giovanni Catania, Aurélien Decelle, Gianluca Manzan, Beatriz Seoane, Daniele Tantari

Institutions * Institute for Cross-disciplinary Physics and Complex Systems IFISC (CSIC-UIB); Departamento de Física Teórica, Universidad Complutense de Madrid; Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid; GISC - Grupo Interdisciplinar de Sistemas Complejos; Inria Saclay - Tau team; Department of Mathematics, University of Bologna

AI summary Proposes a theoretical framework that analyzes federated learning through a teacher and multiple interacting students, shows that interactions among students systematically improve learning performance, derives the Bayes-optimal conditions for teacher recovery, and maps the dynamics onto a Restricted Boltzmann Machine.

Details
Abstract

We introduce a theoretical framework for analyzing federated learning in a generative setting through a teacher-multiple interacting students scenario, in which each student receives a distinct realization of the data, either through a different noise corruption or by accessing a different subset, possibly of varying size. Using theoretical tools from equilibrium disordered systems, we analytically show that interactions among students systematically enhance learning performance: highly noisy students require fewer samples to recover the underlying pattern, while low-noise students achieve a larger overlap with the ground-truth signal. We derive the optimal Bayesian conditions for teacher recovery as functions of the sample complexity, noise level, and interaction strength, and validate these predictions through numerical simulations. The resulting dynamics can be mapped onto equilibrium sampling in a Restricted Boltzmann Machine with a structured hidden layer, providing a principled theoretical understanding of how interactions improve distributed generative modeling.

2606.12263 2026-06-12 cs.CV New submission

VOID: Defeating Unauthorized Mimicry in Latent Diffusion Models

Chunlin Qiu, Ang Li, Tianxiao Huang, Ruilin Gan, Yunjie Ge, Shenyi Zhang, Huayi Duan, Lingchen Zhao, Chao Shen, Qian Wang

Institutions * School of Cyber Science and Engineering, Wuhan University; School of Computer Science, Wuhan University; Institute for Math&AI, Wuhan University; The Hong Kong University of Science and Technology (Guangzhou); School of Cyber Science and Engineering, Xi’an Jiaotong University

AI summary Addresses the use of latent diffusion models for unauthorized mimicry with VOID, a defense framework that manipulates the model's intrinsic stochasticity. It amplifies latent encoding errors and counteracts target guidance signals to cause semantic corruption that blocks mimicry, while confining perturbations to human-imperceptible regions.

Details
Comments
Extended full version with more comprehensive experimental results. To appear in the 35th USENIX Security Symposium (USENIX Security 2026)
Abstract

While Latent Diffusion Models (LDMs) have revolutionized visual synthesis, they are increasingly exploited for unauthorized mimicry of individuals. Existing defenses inject deceptive perturbations to steer the generated images toward irrelevant targets. However, this approach hinges on an ungrounded assumption: subtle perturbations can maintain their deceptive efficacy throughout an LDM's extensive generation process. In reality, the model's innate restoration mechanism will remove such perturbations and cause individual identities to re-emerge in the images generated. We propose VOID, a defense framework that overcomes this conundrum by manipulating an LDM's intrinsic stochasticity. VOID perturbs the diffusion pipeline in two novel ways: 1) amplifying the latent encoding errors to shatter an image's semantic structure, and 2) counteracting the target guidance signals to suppress the model's restoration capabilities. This results in a semantic corruption that thwarts any unauthorized mimicry. Notably, the security gain does not come at the price of visual utility, as VOID simultaneously manages to confine perturbations to human-imperceptible regions of protected images. Our comprehensive evaluation of 24 state-of-the-art defenses against 10 mimicry attacks on 5 datasets demonstrates VOID's unprecedented protection power: it increases the average Frechet Inception Distance (FID) from 113 to 365, a 223% improvement over the strongest defense to date.

2606.12236 2026-06-12 cs.RO cs.CV New submission

DrivingAgent: Design and Scheduling Agents for Autonomous Driving Systems

Zhongyu Xia, Wenhao Chen, Yongtao Wang, Ming-Hsuan Yang

Institutions * Wangxuan Institute of Computer Technology, Peking University; University of California, Merced

AI summary Proposes DrivingAgent, a framework that automates module development (design phase) and uses a lightweight LLM trained with reinforcement learning for real-time scheduling (scheduling phase), addressing the challenges of integrating new models and meeting real-time constraints in autonomous driving systems. It achieves a better speed-accuracy trade-off on nuScenes and Bench2Drive.

Details
Abstract

Many autonomous driving systems are increasingly incorporating foundation models to improve generalization and handle long-tail scenarios. However, this trend introduces two key challenges: (i) the manual and labor-intensive process of designing and integrating new models, and (ii) the lack of intelligent, dynamic scheduling mechanisms to meet strict real-time constraints. While Large Language Model (LLM)-based agents offer a promising avenue for automation, existing frameworks are ill-suited for autonomous driving. Specifically, they fail to distinguish between the fundamentally different requirements of system design and real-time scheduling, treat modules as opaque black boxes, and are not designed for continuous operation. To address these limitations, we propose DrivingAgent, a novel agent framework tailored to the dual challenges of autonomous driving system design and scheduling. In the design phase, DrivingAgent automates module development by interpreting system architecture, generating code, and validating modules via super-network training. In the scheduling phase, it employs a lightweight LLM trained with reinforcement learning to dynamically orchestrate system modules in real time, supported by a structured memory that integrates long-term storage with timestamped short-term context. Experimental results demonstrate that DrivingAgent achieves a superior speed--accuracy trade-off on both the nuScenes and Bench2Drive benchmarks.

2606.12160 2026-06-12 cs.CL New submission

A Controlled Study of Decoding-Time Truthfulness Methods on Instruction-Tuned LLMs

Ao Sun

Institutions * Independent Researcher

AI summary Evaluates 15 decoding-time truthfulness methods on instruction-tuned LLMs (5 models from 1B to 70B, 3 benchmarks) under a six-control evaluation framework and finds that previously reported gains shrink substantially. On the full TruthfulQA benchmark no token-level method achieves a statistically significant improvement, while deliberative prompting such as chain-of-thought appears more robust.

Details
Abstract

Decoding-time truthfulness methods -- layer-contrast decoding, inference-time intervention, and learned logit adapters -- have demonstrated 10-30 point gains on TruthfulQA when applied to base language models. However, modern instruction-tuned LLMs already achieve substantially higher baselines (61-76%), raising the question of whether these methods remain effective in practice. We design a six-control evaluation framework -- out-of-distribution training, multi-judge validation, simple decoding baselines, confound controls, bootstrap confidence intervals, and seed variance -- and apply it across 5 models (1B-70B), 3 benchmarks, and 15 methods. We find that previously reported gains shrink substantially under strict controls: on the full TruthfulQA benchmark (N=817), no token-level method achieves statistically significant improvement, and the best learned adapter scores 2.0 points below greedy decoding (p=.23). We identify five evaluation sensitivities -- contamination, judge choice, missing baselines, confounds, and statistical noise -- that individually or jointly account for these discrepancies. Cross-benchmark validation on HaluEval QA and TriviaQA confirms that these patterns extend beyond TruthfulQA. Deliberative prompting methods (chain-of-thought, self-critique) appear more robust in the evaluated regime, with CoT achieving +5.6-19pp across benchmarks as a training-free, single-pass method. We release a seven-point evaluation checklist and discuss implications for future truthfulness research.
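
One of the six controls named above is bootstrap confidence intervals. A minimal sketch of a paired bootstrap over benchmark questions is shown below, assuming per-question 0/1 correctness vectors for two decoding methods scored on the same questions. The function name and defaults are hypothetical, and the study's exact resampling protocol is not specified in the abstract.

```python
import numpy as np

def paired_bootstrap_ci(correct_a, correct_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(a) - accuracy(b) on the same questions."""
    a = np.asarray(correct_a, dtype=float)
    b = np.asarray(correct_b, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample question indices with replacement, keeping the a/b pairing intact.
    idx = rng.integers(0, len(a), size=(n_boot, len(a)))
    diffs = (a[idx] - b[idx]).mean(axis=1)
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float((a - b).mean()), float(lower), float(upper)
```

Under this scheme a gain would be called significant at level `alpha` only if the returned interval excludes zero.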

2606.11930 2026-06-12 cs.HC cs.AI cs.CV New submission

Frozen Multimodal Embeddings for AI-Assisted Interview Assessment of Personality and Cognitive Ability

Kuo-En Hung, Hung-Yue Suen, Shih-Ching Yeh, Hsiang-Wen Wang

Institutions * Technology Application and Human Resource Development, National Taiwan Normal University; Computer Science and Information Engineering, National Central University; Institute of Photonic System, National Yang Ming Chiao Tung University

AI summary Treats asynchronous video interview assessment as a high-dimensional multimodal learning problem with limited labeled data. Uses frozen multimodal encoders (CLIP, Whisper, RoBERTa, and others) with low-capacity downstream models, achieving a 19.1% relative MSE reduction on personality prediction and identifying possible dataset shortcuts in cognitive ability prediction.

Details
Comments
9 pages, 1 figure, 5 tables
Abstract

Predicting psychological traits from asynchronous video interviews (AVIs) is a challenging problem in AI-assisted interview assessment because labeled datasets are limited while each response contains high-dimensional visual, acoustic, and verbal signals. This paper presents our solution for the ACM Multimedia AVI Challenge 2026, which evaluates two tasks: Track 1 predicts self-reported HEXACO personality traits from personality-related interview responses, and Track 2 classifies cognitive ability levels from structured AVI responses. We treat the problem as a small-sample representation learning task. Instead of fine-tuning large pretrained models, we use frozen multimodal encoders, including CLIP for visual features, Whisper for acoustic features and transcripts, and RoBERTa, E5, and DeBERTaV3 for textual representations, followed by low-capacity downstream models. For Track 1, our trait-specific regression and late-fusion system achieves an average validation MSE of 0.2696, improving over the official baseline of 0.3334. Ablation results show a three-step improvement from a global model (0.3189), to per-trait modeling (0.2871), to per-trait late fusion (0.2696), corresponding to a 19.1% relative MSE reduction over the official baseline. For Track 2, a compact subject-attribute baseline reaches 0.5781 accuracy, while our multimodal ensemble reaches 0.5313, both above the official baseline of 0.4062. We interpret this result as evidence of possible subject-attribute shortcuts in the validation split rather than robust cognitive inference from AVI content. Overall, our findings suggest that AVI-based psychological assessment benefits from trait-specific multimodal modeling, but cognitive ability prediction requires careful control of dataset shortcuts.
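
The relative reductions implied by the Track 1 numbers can be reproduced directly from the reported validation MSE values. The short Python check below uses only figures quoted in the abstract; the variable names are illustrative.

```python
baseline = 0.3334  # official baseline MSE
stages = {
    "global model": 0.3189,
    "per-trait modeling": 0.2871,
    "per-trait late fusion": 0.2696,
}

# Relative MSE reduction over the official baseline, in percent.
relative_reduction = {
    name: 100.0 * (baseline - mse) / baseline for name, mse in stages.items()
}

for name, pct in relative_reduction.items():
    print(f"{name}: {pct:.1f}% lower MSE than the official baseline")
```

The last entry matches the 19.1% figure quoted for per-trait late fusion.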

2606.11898 2026-06-12 cs.CL cs.LG New submission

GraspLLM: Towards Zero-Shot Generalization on Text-Attributed Graphs with LLMs

Hengyi Feng, Zeang Sheng, Meiyi Qiang, Li Yang, Wentao Zhang

Institutions * Peking University; National University of Singapore; University of California, Berkeley

AI summary Proposes GraspLLM, a framework that combines graph structural comprehension with the semantic ability of LLMs, using motif-aware contrastive learning and optimal contextual subgraph alignment to achieve zero-shot generalization across datasets and tasks.

Details
Abstract

Research on Text-Attributed Graphs (TAGs) has gained significant attention recently due to its broad applications across various real-world data scenarios, such as citation networks, e-commerce platforms, social media, and web pages. Inspired by the remarkable semantic understanding ability of Large Language Models (LLMs), there have been numerous attempts to integrate LLMs into TAGs. However, existing methods still struggle to generalize across diverse graphs and tasks, and their ability to capture transferable graph structural patterns remains limited. To address this, we introduce GraspLLM, a framework that combines Graph structural comprehension with the semantic understanding prowess of LLMs to enhance cross-dataset and cross-task generalizability. Specifically, we represent node texts from different graphs in a unified semantic space with a frozen general embedding model, on top of which we perform motif-aware contrastive learning across multiple motif-induced adjacency matrices to extract dataset-agnostic structural information. Then, with our proposed optimal contextual subgraph, we extract the most contextually relevant subgraph for each target node and align these subgraphs to the token space of the LLM via an alignment projector. Extensive experiments on TAG benchmark datasets spanning diverse domains reveal that GraspLLM consistently outperforms previous LLM-based methods for TAGs, especially in zero-shot scenarios, highlighting its strong generalizability across different datasets and tasks. Our code is available at this https URL.

2606.11894 2026-06-12 cs.CV New submission

Wild3R: Feed-Forward 3D Gaussian Splatting from Unconstrained Sparse Photo Collection

Yuto Furutani, Takashi Otonari, Kaede Shiohara, Toshihiko Yamasaki

Institutions * The University of Tokyo

AI summary Proposes Wild3R, a feed-forward 3D Gaussian Splatting method for unconstrained sparse photo collections. Introduces the WildCity dataset with diverse lighting and transient objects to learn cross-view appearance consistency and remove transient content, outperforming existing feed-forward methods and competing with per-scene optimization-based methods.

Details
Comments
Project page: this https URL
Abstract

Feed-forward 3D Gaussian Splatting (3DGS) removes the need for time-consuming per-scene optimization required by traditional 3DGS. However, existing feed-forward approaches struggle with real-world photo collections that include diverse lighting conditions and transient objects. In this paper, we present Wild3R, a feed-forward approach for unconstrained sparse photo collections. The main bottleneck is the lack of training data that provides multiple viewpoints, a variety of illuminations, and transient variations necessary for learning robust scene representations. To address this, we introduce the WildCity dataset, which comprises 200 scenes, 170 lighting conditions, and transient objects, resulting in 337,500 images in total. By leveraging the dataset, our model learns appearance consistency across viewpoints conditioned on reference views, while removing transient content. Extensive experiments demonstrate that our method outperforms existing feed-forward approaches and achieves results competitive with prior per-scene optimization-based methods.

2606.11836 2026-06-12 cs.SD cs.AI eess.AS New submission

Towards Data-free and Training-free Compression for Speech Foundation Models Using Parameter Clustering

Haoning Xu, Zhaoqing Li, Huimeng Wang, Youjun Chen, Chengxi Deng, Mengzhe Geng, Xunying Liu

Institutions * The Chinese University of Hong Kong; National Research Council Canada

AI summary Proposes a data-free and training-free compression method based on channelwise k-means clustering, with fine-grained mixed-sparsity pruning via a layer-varying number of parameter clusters. It reduces WER relative to magnitude-based pruning on HuBERT-large and Whisper-large-v3.

Details
Comments
Accepted by Interspeech 2026
Abstract

This paper presents a novel data-free and training-free compression approach for speech foundation models using channelwise clustering via k-means. More fine-grained, mixed-sparsity pruning using a layer-varying number of parameter clusters is also explored. Experiments conducted on the LibriSpeech dataset suggest that when operating with a pruning sparsity of 50% on HuBERT-large, consistent WER reductions of 27.73%/18.61% absolute (34.37%/21.91% relative) over magnitude-based pruning were obtained on the test-clean and test-other subsets before fine-tuning, and 0.19%/0.79% absolute (3.36%/4.62% relative) after fine-tuning with only 3 epochs. Similar WER reductions of 2.86%/5.02% absolute (59.21%/55.29% relative) were observed against magnitude-based pruning on Whisper-large-v3 at 10% sparsity, all with no significant WER increase relative to the uncompressed baseline.
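
As a rough sketch of channelwise clustering, assuming it means grouping the output channels (rows) of a weight matrix with k-means and sharing one centroid per group, the following NumPy code compresses a single linear layer without data or training. The helper names and the plain Lloyd iteration are illustrative choices, not the authors' implementation, and how cluster counts map to the reported sparsity levels is not stated in the abstract.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):              # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment, consistent with the returned centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

def cluster_channels(W, k):
    """Group the output channels (rows of W) into k clusters."""
    return kmeans(W, k)

def compressed_forward(centroids, labels, x):
    """Compute k centroid outputs once, then copy them to all channels."""
    return (centroids @ x)[labels]

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))          # 16 output channels, 8 inputs
x = rng.standard_normal(8)
centroids, labels = cluster_channels(W, k=8)
y_hat = compressed_forward(centroids, labels, x)
```

Storing `centroids` and `labels` instead of `W` halves the number of stored rows in this toy case, and the forward pass needs only `k` dot products followed by a gather.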

2606.11793 2026-06-12 cs.LG cs.AI physics.ao-ph New submission

Scalable Deep Learning Framework for Global High-Resolution Land Use Reconstruction

Amirpasha Mozaffari, Marina Castaño, Stefano Materia, Etienne Tourigny, Oscar Molina-Sedano, Jordi Varela-Agrelo, Dario Garcia-Gasulla, Miguel Castrillo Melguizo, Mario Acosta, Amanda Duarte

Institutions * Barcelona Supercomputing Center

AI summary Proposes the AI4Land framework, a two-phase U-Net approach that combines coarse-resolution scenario data with static geophysical features to reconstruct high-resolution annual land use and land cover, aiming to reduce terrestrial carbon cycle uncertainty and support climate simulations.

Details
Abstract

Uncertainty in the terrestrial carbon cycle remains a major constraint in climate projections, partly driven by the uncertainties affecting the land surface representation and variability in Earth system models. To address this limitation, we present AI4Land, a data-driven framework for generating high-resolution historical reconstructions and future projections of key land surface variables. The framework follows a two-phase approach using a U-Net architecture. In the first phase, which is the focus of this work, it reconstructs annual land use and land cover by integrating coarse-resolution scenario data with static geophysical features. In a planned second phase, the resulting high-resolution maps will be used to predict dynamic biophysical variables, particularly leaf area index, at finer temporal scales. Trained on Earth observation data, the models learn to reproduce spatially explicit and physically consistent land surface patterns, extending temporal coverage to periods lacking direct observations. AI4Land was developed and trained on MareNostrum5, demonstrating how GPU-accelerated HPC infrastructure enables global-scale climate AI pipelines. The final product is a suite of open-source emulators designed for real-time coupling with digital twin platforms, such as those developed under the Destination Earth initiative. By delivering realistic and evolving land surface conditions on demand, this work aims to reduce critical uncertainties and improve the predictive power of next-generation climate simulations.

2606.11792 2026-06-12 cs.CV cs.AI cs.CL New submission

MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models

Yuansheng Gao, Wenbin Xing, Jiahao Yuan, Kaiwen Zhou, Han Bao, Zonghui Wang, Wenzhi Chen

Institutions * Zhejiang University; Sun Yat-sen University; East China Normal University

AI summary Proposes MultiToP, a framework in which a lightweight Visual Token Patcher dynamically replaces unreliable visual tokens, trained with information-guided rank calibration and sparsity regularization. It reduces hallucinations in video large multimodal models without modifying the original model and improves F1 scores and question-answering accuracy.

Details
Comments
Preprint
Abstract

Video Large Multimodal Models have achieved remarkable progress in video understanding, yet they remain prone to hallucinations, where generated responses are not faithfully supported by the input video. In this paper, we propose MultiToP, a multimodal-context-aware visual token patching framework that mitigates hallucinations by refining unreliable visual tokens before language generation. MultiToP introduces a lightweight Visual Token Patcher to predict token-level replacement distributions and selectively substitute unreliable visual tokens with a dynamic global patch token. To train the patcher effectively, we further propose information-guided rank calibration, which uses answer-conditioned frame-level information cues derived from the backbone to guide token replacement. Combined with ground-truth answer supervision and sparsity regularization, MultiToP enables localized visual evidence refinement without modifying the original model. Extensive experiments demonstrate that MultiToP effectively reduces hallucinations on Vript-HAL with negligible inference overhead, improving the F1 scores of Qwen3-VL-4B-Instruct by 50.60% over the vanilla model. Meanwhile, MultiToP preserves general video understanding ability, yielding an 18.58% relative accuracy gain on ActivityNet-QA for Video-LLaVA-7B.

2606.11767 2026-06-12 cs.RO cs.AI New submission

Blind Dexterous Grasping via Real2Sim2Real Tactile Policy Learning

Shengcheng Luo, Xiyan Huang, Zhe Xu, Wanlin Li, Ziyuan Jiao, Chenxi Xiao

Affiliations: ShanghaiTech University, Beijing Institute for General Artificial Intelligence

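AI summary: Proposes a framework that combines Real2Sim tactile calibration, a layout-aware tactile encoder, and a tactile-conditioned Diffusion Policy to achieve tactile-only blind grasping with a dexterous hand, reaching a 27% success rate across 20 objects on a real robot.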

Details
Comments
23 pages, 6 figures
Abstract

Blind grasping with a dexterous hand is a crucial manipulation capability. Nevertheless, learning such tactile-only policies for real robots remains challenging due to the tactile sim-to-real gap and the limited expressiveness of sparse tactile signals. To bridge this gap, we propose a framework for tactile-only blind grasping that is deployable on a physical multi-fingered robotic hand. Our approach combines three key components. First, we introduce a Real2Sim tactile calibration pipeline that constructs a contact-calibrated digital-twin simulator capable of reproducing real tactile signals. Second, we improve the expressiveness of sparse tactile observations using a layout-aware tactile encoder, which incorporates sensor-geometry priors through self-supervised pretraining. Third, to improve generalization to unseen objects, we train object-specific reinforcement-learning experts in the calibrated simulator and aggregate their successful grasp trajectories into a tactile-conditioned Diffusion Policy. We evaluate our method on a physical LEAP Hand equipped with distributed tactile sensing across 10 seen and 10 unseen objects. The deployed policy achieves a 27% real-world grasp success rate across all 20 objects, without real-world grasping demonstrations or visual input. Simulation ablations show that layout-aware tactile pretraining improves grasping performance, while sensing-level evaluations confirm that Real2Sim calibration increases the consistency of tactile contact events between simulation and hardware. Together, these results suggest that contact-event calibration, geometry-aware tactile representation learning, and diffusion-based policy aggregation provide an effective path toward tactile-only blind grasping on real dexterous robotic hands. Project page: this http URL.

2606.11681 2026-06-12 cs.CL cs.SD New submission

UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

Sangmin Lee, Eekgyun Ahn, Woongjib Choi, Hong-Goo Kang

Affiliations: Dept. of Electronics and Electrical Engineering, Yonsei University

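AI summary: Proposes UR-BERT, a Romanized transcription-based TTS encoder that unifies writing systems into a Romanized representation and adds a speech token prediction objective, enabling efficient multilingual TTS across 495 languages, outperforming existing baselines and generalizing to unseen languages.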

Details
Comments
Accepted to Interspeech 2026, Github: this https URL
Abstract

We propose UR-BERT, a Romanized transcription-based text-to-speech (TTS) encoder for massively multilingual TTS systems. Conventional grapheme-to-phoneme (G2P)-based approaches are limited to around 100 languages due to the availability of reliable G2P resources. In contrast, UR-BERT scales to 495 languages by unifying diverse writing systems into a shared Romanization representation. To further enhance phonetic fidelity and text-speech alignment, we introduce a speech token prediction objective during training, which encourages the encoder to learn speech-aware phonetic representations in a data-efficient manner. Experiments show that TTS systems built on UR-BERT consistently outperform recent text encoder baselines across a wide range of languages and resource conditions, and demonstrate strong generalization to unseen languages.
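
The idea of mapping diverse writing systems onto one shared Latin-script representation can be shown with a toy sketch. The abstract does not name the romanization tool used, so everything below is an assumption: `TOY_TABLE` is a hypothetical hand-written table covering only a few Cyrillic and Greek letters, and diacritics are dropped through Unicode decomposition. A real universal romanizer covers far more scripts and rules.

```python
import unicodedata

# Hypothetical toy table: a handful of letters only, for illustration.
TOY_TABLE = {
    "п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t",  # Cyrillic
    "γ": "g", "ε": "e", "ι": "i", "α": "a",                      # Greek
}


def romanize(text: str) -> str:
    """Lowercase, strip combining marks, and map known letters to Latin.
    Characters missing from the toy table pass through unchanged."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents and other combining marks
        out.append(TOY_TABLE.get(ch, ch))
    return "".join(out)


russian = romanize("Привет")
greek = romanize("γεια")
french = romanize("Café")
```

After this step, text from all three scripts shares one small Latin alphabet, which is what would let a single encoder vocabulary serve many languages.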