arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.10454 2026-06-10 eess.AS cs.SD 新提交

Entropy-Aware Domain-Routed Mixture-of-Experts Speech-LLM Framework: A Case Study of Multi-Domain Child-Adult ASR

熵感知域路由混合专家语音-大语言模型框架:多领域儿童-成人ASR案例研究

Mohan Shi, Kaiyuan Zhang, Zilai Wang, Natarajan Balaji Shankar, Eray Eren, Abeer Alwan

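AI summary: Proposes a Mixture-of-Experts Speech-LLM that combines classifier-based domain routing, Mixture-of-Projectors and Mixture-of-LoRAs modules, and an entropy-aware routing mechanism to achieve unified child-adult ASR across diverse environments and age groups, with consistent improvements on public child corpora.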

详情
Comments
Accepted to Interspeech 2026
AI中文摘要

虽然语音大语言模型在成人自动语音识别上取得了强劲性能,但其对儿童语音的有效性仍未被充分探索,且单一模型往往难以同时处理多样化的成人和儿童年龄组。本文提出一种混合专家语音-大语言模型,用于跨不同环境和年龄组的统一成人及儿童语音ASR。该框架采用基于分类器的域路由,结合粗到细策略,并集成混合投影器和混合LoRA模块以建模域特定变化。为解决域边界附近的路由不确定性,引入熵感知路由机制以动态整合共享专家。在公共儿童语料库上的实验表明,该方法在保持成人ASR性能的同时,相比基线取得了一致改进。据我们所知,这是首个利用语音-大语言模型实现涵盖儿童和成人的统一多领域ASR的工作。

英文摘要

While Speech Large Language Models (Speech-LLMs) have achieved strong performance on adult Automatic Speech Recognition (ASR), their effectiveness on child speech remains under-explored, and single models often struggle to handle diverse adult and child age groups simultaneously. This paper proposes a Mixture-of-Experts (MoE) Speech-LLM for unified ASR across adult and child speech spanning diverse environments and age groups. The framework employs a Classifier-based Domain Router (C-DR) with a coarse-to-fine strategy and integrates both a Mixture-of-Projectors (MoP) and a Mixture-of-LoRAs (MoL) to model domain-specific variations. To address routing uncertainty near domain boundaries, an Entropy-Aware Routing (EAR) mechanism is introduced to dynamically incorporate a shared expert. Experiments on public child corpora demonstrate consistent improvements over baselines while preserving adult ASR performance. To our knowledge, this is the first work leveraging Speech-LLMs for unified, multi-domain ASR encompassing both children and adults.
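
The abstract does not say how the Entropy-Aware Routing (EAR) mechanism weights the shared expert. The following is a minimal sketch under stated assumptions, not the authors' formulation: the router is assumed to emit one logit per domain expert, and the shared expert's weight is assumed to equal the normalized entropy of the routing distribution. The names `normalized_entropy` and `entropy_aware_mix` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(K), so the value lies in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def entropy_aware_mix(router_logits, domain_outputs, shared_output):
    """Blend domain experts with a shared expert.

    Assumption (not from the paper): the shared expert's weight is the
    normalized routing entropy, so an uncertain router leans on the shared
    expert and a confident router leans on the domain experts.
    """
    probs = softmax(router_logits)
    alpha = normalized_entropy(probs)
    dim = len(shared_output)
    domain_mix = [
        sum(p * out[d] for p, out in zip(probs, domain_outputs))
        for d in range(dim)
    ]
    mixed = [(1.0 - alpha) * dm + alpha * s
             for dm, s in zip(domain_mix, shared_output)]
    return mixed, alpha
```

With this gate, a sample near a domain boundary (near-uniform routing probabilities) is handled almost entirely by the shared expert, while a confidently routed sample is left to its domain expert.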

2606.10361 2026-06-10 stat.ML cs.LG 新提交

Near-Exponential Convergence Rates for kNN Classification based on Boltzmann Margin

基于玻尔兹曼间隔的kNN分类近指数收敛速率

Luyuan Yang, Shayan Shafaei, Chao Lan

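AI summary: Introduces the Boltzmann margin condition, which lies between the Tsybakov and Massart margins, and establishes the first near-exponential convergence rates for kNN classifiers.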

详情
Comments
Conference on Uncertainty in Artificial Intelligence (UAI)
AI中文摘要

分类器的收敛速率分析通常在Tsybakov间隔或Massart间隔下进行。前者是相对较弱的条件,通常产生多项式速率,而后者更强,但能保证指数速率。本文引入一种新条件,称为玻尔兹曼间隔,它填补了这两种机制之间的空白。该条件弱于Massart间隔,通常强于Tsybakov间隔,并在适当条件下能蕴含它们的许多性质。我们将玻尔兹曼间隔应用于kNN分类器的分析,并建立了kNN分类的第一个近指数收敛速率。我们还给出了主要结果的扩展,并提供了支持主要理论结论的数值证据。

英文摘要

Convergence-rate analysis for classifiers is often conducted under either Tsybakov margin or Massart margin. The former is a relatively weak condition that typically yields polynomial rates, while the latter is substantially stronger but can guarantee exponential rates. In this paper, we introduce a new condition, called Boltzmann margin, that bridges the gap between these two regimes. It is weaker than Massart margin, generally stronger than Tsybakov margin, and can imply many of their properties under suitable conditions. We apply Boltzmann margin to the analysis of kNN classifiers and establish the first near-exponential convergence rates for kNN classification. We also present extensions of the main results and provide numerical evidence supporting the main theoretical implications.

2606.10317 2026-06-10 eess.AS cs.SD 新提交

SSL-GMMVC: Interpretable Voice Conversion via Locally Linear GMM Transforms in Self-Supervised Representation Space

SSL-GMMVC:自监督表示空间中通过局部线性GMM变换的可解释语音转换

Tomoya Tanabu, Hiroshi Nishijima, Daisuke Saito, Nobuaki Minematsu

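AI summary: Proposes SSL-GMMVC, which models paired source-target features with a Gaussian mixture model in self-supervised speech space and performs interpretable voice conversion via posterior-weighted affine transforms, improving speaker similarity while maintaining intelligibility and naturalness.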

详情
Comments
Accepted to Interspeech 2026
AI中文摘要

我们介绍了SSL-GMMVC,一种在自监督语音空间中可解释的语音转换方法。该方法使用高斯混合模型对配对的源-目标特征进行建模,并将转换表示为仿射变换的后验加权和。这产生了适应异质特征空间结构且保持解析可处理性的局部线性变换。通过客观和主观评估,我们表明SSL-GMMVC在保持相当可理解性和自然度的同时提高了说话人相似度,并且随着混合成分数量的增加,即使是受限协方差变体也超过了深度学习基线。进一步的分析将成分选择与语音结构联系起来,并揭示了学习变换中可解释的缩放和旋转。这些发现凸显了SSL-GMMVC作为一种有效且可分析的语音转换框架。

英文摘要

We introduce SSL-GMMVC, an interpretable voice conversion method in self-supervised speech space. The method models paired source-target features with a Gaussian mixture model and performs conversion as a posterior-weighted sum of affine transforms. This yields locally linear transformations that adapt to heterogeneous feature-space structure while remaining analytically tractable. Through objective and subjective evaluations, we show that SSL-GMMVC improves speaker similarity with comparable intelligibility and naturalness, and that even a constrained covariance variant surpasses a deep learning baseline as the number of mixture components increases. Further analyses link component selection to phonetic structure and reveal interpretable scaling and rotation in the learned transforms. These findings highlight SSL-GMMVC as an effective, analyzable framework for voice conversion.
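
To make "a posterior-weighted sum of affine transforms" concrete, here is a minimal scalar sketch of the textbook joint-density GMM regression, in which each component contributes the affine map given by its conditional mean. It is an illustration only: the abstract does not give the paper's parameterization, real SSL features are high-dimensional, and the dictionary keys (`weight`, `mu_x`, `mu_y`, `var_x`, `cov_yx`) are hypothetical names.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Map a source feature x to a target feature y.

    Each component k defines the affine map y = slope_k * x + offset_k,
    where slope_k = cov_yx / var_x and offset_k = mu_y - slope_k * mu_x
    (the conditional mean of a joint Gaussian). The output is the sum of
    these maps weighted by the posterior p(k | x).
    """
    likelihoods = [c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"])
                   for c in components]
    total = sum(likelihoods)
    posteriors = [l / total for l in likelihoods]
    y = 0.0
    for p, c in zip(posteriors, components):
        slope = c["cov_yx"] / c["var_x"]
        offset = c["mu_y"] - slope * c["mu_x"]
        y += p * (slope * x + offset)
    return y, posteriors
```

Because the posteriors change smoothly with x, the overall map is locally linear: near the centre of a component it behaves like that component's affine transform, which is what makes per-component slopes and offsets inspectable.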

2606.10280 2026-06-10 eess.IV cs.CV 新提交

Overlapped Wavelet Diffusion for Low-Light Image Enhancement

重叠小波扩散用于低光照图像增强

Fen Peng, Taizo Suzuki, Seisuke Kyochi

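AI summary: Proposes the overlapped wavelet diffusion framework OWDiff, which eliminates blocking artifacts with an overlapped wavelet transform and recovers detail with a low-frequency-guided high-frequency enhancement module, outperforming existing methods on the LOLv1 and LOLv2-real datasets.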

详情
Journal ref
IEICE Transactions on Information and Systems, Advance online publication, 2026
Comments
Advance published in IEICE Transactions on Information and Systems. DOI: 10.1587/transinf.2026PCP0006. Code: https://github.com/FinnPeg/Overlapped-Wavelet-Diffusion
AI中文摘要

在这项研究中,我们提出了一种用于低光照图像增强(LLIE)的重叠小波扩散框架,该框架包含两个互补组件,以实现无块伪影和细节保持的增强。尽管与传统方法相比,最近基于扩散的LLIE方法表现出显著性能,但DiffLL仍然遭受由Haar小波变换(WT)引起的块伪影以及由于其高频恢复模块(HFRM)的限制导致的边缘模糊或纹理过度平滑。为了克服这些问题,我们引入了重叠小波变换(OWT),它融合了相邻区域的相关性,从而在结构上防止块伪影。此外,我们集成了一个低频引导的高频增强模块(HFEBlock)来加强细节恢复,产生更清晰的边缘和更可靠的纹理。在LOLv1和LOLv2-real数据集上的大量实验表明,我们的框架(称为OWDiff)在定性和定量上均持续优于现有的LLIE方法,在保持计算效率的同时实现了卓越的视觉质量。OWDiff有效解决了Haar WT和HFRM的结构限制,与DiffLL相比,在LOLv1和LOLv2-real数据集上平均PSNR增益为0.58 dB,SSIM相对提高1.64%,LPIPS相对降低5.9%。

英文摘要

In this study, we propose an overlapped wavelet diffusion framework for Low-Light Image Enhancement (LLIE), which incorporates two complementary components to achieve blocking artifact-free and detail-preserving enhancement. Although recent diffusion-based LLIE methods have demonstrated remarkable performance compared with traditional approaches, DiffLL still suffers from blocking artifacts caused by the Haar Wavelet Transform (WT) and blurred edges or over-smoothed textures due to the limitations of its High-Frequency Restoration Module (HFRM). To overcome these issues, we introduce an Overlapped WT (OWT) that incorporates correlations across neighboring regions, thereby structurally preventing blocking artifacts. Furthermore, we integrate a low-frequency-guided High-Frequency Enhance Block (HFEBlock) to strengthen detail recovery, yielding sharper edges and more reliable textures. Extensive experiments on the LOLv1 and LOLv2-real datasets demonstrate that our framework, termed OWDiff, consistently outperforms existing LLIE methods both qualitatively and quantitatively, achieving superior visual quality while maintaining computational efficiency. OWDiff effectively addresses the structural limitations of the Haar WT and the HFRM, achieving an average PSNR gain of 0.58 dB, along with a 1.64% relative improvement in SSIM and a 5.9% relative reduction in LPIPS, compared to DiffLL across both the LOLv1 and LOLv2-real datasets.
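
As background for the blocking-artifact claim, the sketch below implements a standard one-level orthonormal 2-D Haar transform and its inverse. It is not the paper's OWT, and the subband names (`col_diff`, `row_diff`, `diag`) are chosen here for illustration. The point it shows is that every Haar coefficient depends on exactly one non-overlapping 2x2 block, so errors introduced in the coefficient domain stay confined to that block's grid cell.

```python
def haar2d_level1(image):
    """One-level orthonormal 2-D Haar transform of an even-sized image.

    Returns four subbands (ll, col_diff, row_diff, diag), each half the
    size of the input. Every coefficient uses one 2x2 block only.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    ll, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, cd_row, rd_row, dg_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((tl + tr + bl + br) / 2.0)
            cd_row.append((tl - tr + bl - br) / 2.0)
            rd_row.append((tl + tr - bl - br) / 2.0)
            dg_row.append((tl - tr - bl + br) / 2.0)
        ll.append(ll_row)
        col_diff.append(cd_row)
        row_diff.append(rd_row)
        diag.append(dg_row)
    return ll, col_diff, row_diff, diag


def inverse_haar2d_level1(ll, col_diff, row_diff, diag):
    """Exact inverse of haar2d_level1."""
    image = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            a, h, v, d = ll[i][j], col_diff[i][j], row_diff[i][j], diag[i][j]
            top.extend([(a + h + v + d) / 2.0, (a - h + v - d) / 2.0])
            bottom.extend([(a + h - v - d) / 2.0, (a - h - v + d) / 2.0])
        image.append(top)
        image.append(bottom)
    return image
```

An overlapped transform, by contrast, lets each coefficient draw on pixels from neighboring regions, which is the property the abstract credits with structurally preventing block boundaries.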

2606.10238 2026-06-10 q-bio.NC cs.AI 新提交

Hyperbolic Neural Population Geometry Benefits Computation

双曲神经群体几何结构有益于计算

Dennis Wu, Yi-Chun Hung, Braden Yuille, James E. Fitzgerald, Han Liu

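AI summary: Presents a theoretical framework in which hippocampal population activity induces hyperbolic geometry, shows that the Modern Hopfield Network update rule computes the minimum mean-squared-error estimator, and introduces a new associative memory model in hyperbolic space whose capacity is significantly larger than that of existing models.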

详情
Comments
Accepted at ICML 2026, 37 pages, 5 figures
AI中文摘要

神经群体几何结构影响下游计算。最近神经生物学的实验发现表明,海马体中的群体活动具有双曲结构。本文为这一现象提供了理论框架。首先,我们提出了一种海马体调谐曲线的合理构造,该构造在统计上诱导双曲几何。接着,我们通过证明现代Hopfield网络更新规则计算最小均方误差(MMSE)估计,建立了神经解码与联想记忆之间的联系。最后,我们引入了一个在双曲空间中定义的新型联想记忆模型,其容量显著大于领先模型。我们的结果表明,动物将空间信息编码为潜在的双曲认知地图,从而提高了记忆容量和解码精度。

英文摘要

Neural population geometry shapes downstream computation. Recent empirical findings in neurobiology suggest that a hyperbolic structure underlies population activity in the hippocampus. Here we provide a theoretical framework for this phenomenon. First, we propose a plausible construction of hippocampal tuning curves that statistically induces hyperbolic geometry. Next, we establish a connection between neural decoding and associative memory by demonstrating that the Modern Hopfield Network update rule computes the minimum mean-squared-error (MMSE) estimator. Finally, we introduce a novel associative memory model defined in hyperbolic space that yields significantly larger capacity than leading models. Our results suggest that animals encode spatial information as a latent hyperbolic cognitive map, improving both memory capacity and decoding accuracy.
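
For readers unfamiliar with the Modern Hopfield Network update, here is a minimal sketch of the usual softmax form: the new state is a softmax-weighted average of the stored patterns, with weights given by scaled dot products with the query. One elementary way to see the MMSE connection, offered here as an assumption about the setting rather than the paper's theorem: if the stored patterns are equally likely, have equal norm, and the query is a pattern plus isotropic Gaussian noise of variance 1/beta, then the posterior mean of the pattern is exactly this weighted average.

```python
import math


def hopfield_update(patterns, query, beta):
    """One Modern Hopfield update: sum_i softmax(beta * <x_i, q>)_i * x_i."""
    scores = [beta * sum(p_d * q_d for p_d, q_d in zip(p, query))
              for p in patterns]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    dim = len(query)
    return [sum(w * p[d] for w, p in zip(weights, patterns))
            for d in range(dim)]
```

A large `beta` (low assumed noise) makes the update snap to the nearest stored pattern; a small `beta` returns something close to the average of all patterns.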

2606.10233 2026-06-10 eess.AS cs.LG cs.SD 新提交

ANCHOR: Autoregressive Non-intrusive Chunk-Ordered Refinement for Joint Multi-Resolution Speech Quality Modeling

ANCHOR: 自回归非侵入式分块有序细化用于联合多分辨率语音质量建模

Zhuoyan Tao, Jiatong Shi, Hye-jin Shim, Shinji Watanabe

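AI summary: Proposes ANCHOR, which recasts incremental speech quality assessment as a multi-resolution autoregressive task, uses dual-resolution tokens and a resolution-aware hierarchy for coarse-to-fine refinement from chunk level to utterance level, substantially reduces error under partial input, and sheds light on how perceptual quality accumulates over time.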

详情
Comments
Accepted at Interspeech 2026
AI中文摘要

虽然语音质量通常是在完整话语上评估的,但流式和生成系统需要从部分音频中进行增量估计。现有的预测器假设完整的上下文,在受前缀约束的输入上性能下降。扩展ARECHO,我们提出ANCHOR,将增量评估重新表述为多分辨率自回归任务。它使用双分辨率令牌和分辨率感知层次结构在单个解码器中建模分块级和话语级质量,实现从粗到细的细化。实验表明,在部分输入下具有显著的鲁棒性,包括在2秒前缀上PLCMOS误差减少48%。收敛性分析揭示了4-6秒的有效感知上下文范围。压力测试进一步隔离了局部损坏下的结构化外推偏差。结果表明,层次监督改进了增量预测,并阐明了感知质量如何随时间累积。

英文摘要

While speech quality is typically assessed on complete utterances, streaming and generative systems require incremental estimation from partial audio. Existing predictors assume full context, degrading on prefix-constrained inputs. Extending ARECHO, we propose ANCHOR, reformulating incremental assessment as a multi-resolution autoregressive task. It models chunk- and utterance-level quality within a single decoder using dual-resolution tokens and a resolution-aware hierarchy for coarse-to-fine refinement. Experiments show substantial robustness under partial input, including a 48% PLCMOS error reduction on 2-second prefixes. Convergence analysis reveals a 4-6 s effective perceptual context horizon. A stress test further isolates structured extrapolation biases under localized corruption. Results demonstrate that hierarchical supervision improves incremental prediction and elucidates how perceptual quality accumulates over time.

2606.10187 2026-06-10 stat.ML cs.LG 新提交

Decision-Calibrated Conformal Uncertainty for Pacing Decisions in Streaming Advertising

面向流式广告中节奏控制的决策校准共形不确定性

Prashant Shekhar, Caroline Howard

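AI summary: Proposes a decision-calibrated conformal framework that calibrates uncertainty by the largest impact of forecast error on the policies that could actually be deployed, proves that the resulting score is the smallest valid uncertainty measure protecting all deployable pacing policies, and substantially reduces the uncertainty radius on public datasets.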

详情
AI中文摘要

我们开发了一个决策校准的共形框架,用于流式广告中的节奏控制决策。节奏控制依赖于不确定的未来库存、需求压力、增量响应和会员体验负载。该框架不是校准通用的预测残差,而是通过预测误差对实际可能部署的策略的最大影响来衡量预测误差。主要定理表明,所提出的分数是统一保护所有可部署节奏控制策略的最小有效不确定性度量。几何上,它是有符号策略敏感性集的支持函数。分裂共形校准为该分数提供了有限样本覆盖。一个高维分离定理表明,传统的残差校准可能因支付干扰库存维度而任意保守,而一个鲁棒的节奏控制结果结合了库存、响应和体验不确定性。在基于Criteo Uplift和KuaiRand数据集构建的公开数据校准节奏控制回放中,传统共形节奏控制仍然未解决,在Criteo上残差半径高达7236.7,在KuaiRand上为4629.4。采用所提出的决策校准方法,不确定性半径分别降至18.4和278.6,并为价值、交付、预算和会员负载设置了单独的边际。在Criteo上,所提出的方法证明了比点预测基线更不激进的节奏控制策略,并将保留的任何违规率从16.7%降至3.3%,且预算和会员负载违规为零。在KuaiRand上,选择仍未解决。简而言之,本文确立了预测、响应估计和会员体验模型应根据它们是否缩小节奏控制决策使用的不确定性来判断,因为这会导致自信且不过度保守的决策。

英文摘要

We develop a decision-calibrated conformal framework for pacing decisions in streaming advertising. Pacing depends on uncertain future inventory, demand pressure, incremental response, and member-experience load. Instead of calibrating a generic forecast residual, the framework measures forecast error by its largest impact on the policies that could actually be deployed. The main theorem shows that the proposed score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Geometrically, it is the support function of the signed policy sensitivity set. Split conformal calibration gives finite-sample coverage for this score. A high-dimensional separation theorem shows that traditional residual calibration can be arbitrarily more conservative by paying for nuisance inventory dimensions, and a robust pacing result combines inventory, response, and experience uncertainty. On public-data-calibrated pacing replays built from Criteo Uplift and KuaiRand datasets, traditional conformal pacing remains unresolved with high residual radii of 7236.7 on Criteo and 4629.4 on KuaiRand. With the proposed decision calibration approach, the uncertainty radii are reduced to 18.4 and 278.6 respectively, with separate margins for value, delivery, budget, and member load. On Criteo, the proposed method certifies a less aggressive pacing policy than the point-forecast baseline, and reduces held-out any-violation rate from 16.7% to 3.3%, with zero budget and member-load violations. On KuaiRand, the choice remains unresolved. In a nutshell, the paper establishes that forecasts, response estimates, and member-experience models should be judged by whether they shrink the uncertainty that the pacing decision uses, as this leads to confident decisions that are not overly conservative.
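
The two ingredients named in the abstract, a support-function score and split conformal calibration, can be sketched as follows. This is a simplified illustration, not the paper's construction: the policy sensitivity set is assumed to be a finite list of vectors, and the function names are hypothetical. The quantile rule is the standard split conformal one (the ceil((n + 1)(1 - alpha))-th smallest calibration score).

```python
import math


def support_score(error, sensitivities):
    """Support function of a finite sensitivity set evaluated at `error`:
    the largest inner product <s, error> over the vectors s in the set."""
    return max(sum(s_d * e_d for s_d, e_d in zip(s, error))
               for s in sensitivities)


def split_conformal_radius(cal_scores, alpha):
    """Standard split conformal quantile.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or infinity when n is too small for the requested coverage level.
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf
    return sorted(cal_scores)[rank - 1]
```

Note that a forecast-error dimension on which every sensitivity vector is zero contributes nothing to `support_score`, whereas a generic residual norm would still pay for it. This matches the abstract's remark that residual calibration can be far more conservative because of nuisance dimensions.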

2606.10125 2026-06-10 stat.ML cs.DB cs.LG 新提交

Robust Active Learning for Few-Shot Example Selection in Text-to-SQL

鲁棒主动学习用于文本到SQL中的少样本示例选择

Arash Pourhabib

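AI summary: For few-shot example selection in text-to-SQL, proposes a robust active learning method in which a stratified greedy algorithm maximizes a heteroscedastic mutual information objective, with a constant-factor approximation guarantee on the embedding manifold and a significant reduction in annotation cost.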

详情
Comments
31 pages, 4 figures, 5 tables
AI中文摘要

少样本示例检索是将大型语言模型(LLM)应用于特定领域文本到SQL系统的主要范式。然而,标注示例库的质量直接决定系统准确性,且专家标注成本高昂。我们将这些示例的主动选择形式化为一个在语义查询嵌入的内在低维流形上的约束实验设计问题。与标准主动学习框架不同,我们的设置引入了三个关键挑战:依赖于查询的可变标注可靠性(异方差性)、跨语义主题的空间多样性严格要求(划分拟阵约束),以及嵌入空间真实协方差结构未知的固有现实(模型误设)。为了解决这些问题,我们提出了一种分层贪婪算法,该算法最大化异方差互信息目标。我们证明该目标在内在流形上保持子模性和近似单调性,从而得到理论上的常数因子近似保证。我们建立了一个谱界,表明当假设的替代核与真实数据生成过程存在偏差时,该近似保证会优雅地退化,而非灾难性地崩溃。实验结果表明,所提出的策略显著减少了标注工作量,同时保持了较高的文本到SQL检索准确性。

英文摘要

Few-shot example retrieval is the dominant paradigm for grounding large language models (LLMs) in domain-specific text-to-SQL systems. However, the quality of the annotated example bank directly governs system accuracy, and expert annotation is prohibitively expensive. We formalize the active selection of these examples as a constrained experimental design problem over the intrinsic, low-dimensional manifold of semantic query embeddings. Unlike standard active learning frameworks, our setting introduces three critical challenges: varying, query-dependent annotation reliability (heteroscedasticity), strict requirements for spatial diversity across semantic topics (partition matroid constraints), and the inherent reality that the true covariance structure of the embedding space is unknown (misspecification). To address these, we propose a stratified greedy algorithm that maximizes a heteroscedastic mutual information objective. We prove that this objective remains submodular and approximately monotonic on the intrinsic manifold, yielding a theoretical constant-factor approximation guarantee. We establish a spectral bound demonstrating that this approximation guarantee degrades gracefully, rather than catastrophically, when the assumed surrogate kernel diverges from the true underlying data-generating process. Empirical results demonstrate that the proposed strategy significantly reduces labeling effort while maintaining high text-to-SQL retrieval accuracy.

2606.10010 2026-06-10 eess.AS cs.AI cs.MM cs.SD 新提交

DeRA-MOS: Optimizing Text-to-Music Evaluation via Decoupled Listwise Ranking and Modality Alignment

DeRA-MOS:通过解耦列表排序和模态对齐优化文本到音乐评估

Chien-Chun Wang, Hung-Shin Lee, Hsin-Min Wang, Berlin Chen

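AI summary: Proposes DeRA-MOS, a decoupled optimization framework that uses a batch-aware listwise ranking loss and a score-anchored modality alignment loss to optimize ranking metrics for music impression and text alignment respectively, substantially improving evaluation performance on MusicEval.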

详情
Comments
Accepted to IEEE Signal Processing Letters (SPL)
AI中文摘要

评估文本到音乐(TTM)系统仍然昂贵,因为音乐印象(MI)和文本对齐(TA)分数依赖于人类平均意见分数(MOS)。大多数自动MOS估计器采用逐点回归或分布分类训练。这些目标不直接优化基于排名的指标,并且为跨模态一致性提供较弱的几何约束。为了解决这些问题,我们提出了DeRA-MOS,一种用于TTM评估的解耦优化框架。对于MI,我们引入了一种批感知列表排序损失,该损失对每个小批量内的相对顺序进行建模,并更好地与基于Spearman秩相关系数(SRCC)的评估对齐。对于TA,我们引入了一种分数锚定的模态对齐损失,将人类分数映射到目标音频-文本相似度,并在融合前正则化潜在空间。通过有效缓解逐点训练不匹配和模态漂移,MusicEval上的实验表明,我们的解耦框架在MI和TA排名指标上均取得了显著改进,为大规模TTM评估建立了稳健的范式。

英文摘要

Evaluating text-to-music (TTM) systems remains expensive because music impression (MI) and text alignment (TA) scores rely on human mean opinion scores (MOS). Most automatic MOS estimators are trained with point-wise regression or distributional classification. These objectives do not directly optimize rank-based metrics and provide weak geometric constraints for cross-modal coherence. To address these gaps, we propose DeRA-MOS, a decoupled optimization framework for TTM evaluation. For MI, we introduce a batch-aware listwise ranking loss that models relative order within each mini-batch and better aligns with evaluation based on Spearman's rank correlation coefficient (SRCC). For TA, we introduce a score-anchored modality alignment loss that maps human scores to target audio-text similarity and regularizes the latent space before fusion. By effectively mitigating the point-wise training mismatch and modality drift, experiments on MusicEval demonstrate that our decoupled framework yields substantial improvements in both MI and TA ranking metrics, establishing a robust paradigm for large-scale TTM evaluation.
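
The abstract does not define the batch-aware listwise ranking loss. As a stand-in that shows the general idea of scoring relative order within a mini-batch rather than per-sample error, here is the well-known ListNet top-one loss: the cross-entropy between the softmax of the human scores and the softmax of the predictions. It is named plainly as a substitute and should not be read as the DeRA-MOS loss.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top1_loss(predicted, target):
    """Cross-entropy between softmax(target) and softmax(predicted),
    computed over one mini-batch of scores."""
    p = softmax(target)
    q = softmax(predicted)
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q))
```

The loss is unchanged when a constant is added to every prediction, so only how items compare within the batch matters, not their absolute level. Unlike point-wise regression, this is closer in spirit to a rank-based metric such as SRCC.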

2606.09953 2026-06-10 eess.IV cs.AI cs.LG 新提交

Deep Slice Interpolation for Reducing Through-Plane Anisotropy and Noise in Head CT

深度切片插值用于减少头部CT的穿平面各向异性和噪声

Luis Cortés Ferre, Miguel A. Gutiérrez-Naranjo, Marcin Balcerzyk

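AI summary: Proposes a deep learning system that synthesizes intermediate CT slices from pairs of neighboring axial slices, halving the effective through-plane spacing while denoising implicitly, and outperforming classical interpolation and video frame interpolation methods on structural measures.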

详情
AI中文摘要

头部计算机断层扫描(CT)通常使用亚毫米级的面内分辨率,但穿平面间距为2-5毫米,造成显著的各向异性,这会降低多平面重建、血肿体积估计等体积测量以及假设近似各向同性体素的后续算法的性能。我们提出一个深度学习系统,从相邻轴向切片对合成中间CT切片,将有效穿平面间距减半。该系统改善三维可视化,同时产生固有降噪的输出,在一次推理中实现两个互补优势。为构建可靠系统,我们系统评估像素级损失(均方误差MSE和平均绝对误差L1)、结构相似性损失(结构相似性指数SSIM及其多尺度变体MS-SSIM)以及混合组合。在保留测试集上,所有收敛模型在所有结构指标上均优于经典插值基线和预训练视频帧插值方法(RIFE、FILM),其中MS-SSIM+L1提供最强平衡性能。我们还记录了SSIM族损失中的训练不稳定性并识别部分补救措施:标准数值修复消除了主要失败模式,但在较小批量大小下留下残余发散。所有结果均报告患者级自助法置信区间和配对统计检验。作为示例,我们将系统应用于来自Virgen del Rocío大学医院的非分布头部CT序列:模型合成中间切片,并在真实切片上表现出我们理论分析预测的隐式降噪特征,支持在单个外部病例中插值质量和隐式降噪不局限于训练分布。

英文摘要

Head computed tomography (CT) typically uses sub-millimeter in-plane resolution but 2-5 mm through-plane spacing, creating substantial anisotropy that degrades multiplanar reconstructions, volumetric measurements such as hematoma volume estimation, and downstream algorithms that assume near-isotropic voxels. We present a deep learning system that synthesizes intermediate CT slices from pairs of neighboring axial slices, halving the effective through-plane spacing. The system improves three-dimensional visualization while simultaneously producing inherently denoised outputs, yielding two complementary benefits from a single inference pass. To build a reliable system, we systematically evaluate pixel-wise losses, namely mean squared error (MSE) and mean absolute error (L1); structural-similarity losses, namely the structural similarity index (SSIM) and its multi-scale variant (MS-SSIM); and hybrid combinations. On a held-out test set, all converged models outperform classical interpolation baselines and pretrained video frame interpolation methods (RIFE, FILM) on all structural measures, with MS-SSIM+L1 offering the strongest balanced profile. We also document training instability in SSIM-family losses and identify partial remedies: the standard numerical fixes eliminate the dominant failure mode but leave residual divergence at smaller batch sizes. All results are reported with patient-level bootstrap confidence intervals and paired statistical tests. As an illustration, we apply the system to an out-of-distribution head CT series from Hospital Universitario Virgen del Rocío: the model synthesizes intermediate slices and exhibits on the real slices the implicit-denoising signature predicted by our theoretical analysis, supporting in a single external case that interpolation quality and implicit denoising are not confined to the training distribution.
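
For orientation, the simplest classical baseline for this task is the pixel-wise average of the two neighboring slices. The sketch below shows that baseline together with an elementary statistical fact that gives some intuition for denoised-looking outputs: averaging two independent zero-mean noise samples of variance sigma^2 yields variance sigma^2 / 2. This is an illustration only; the paper's own theoretical analysis and learned model are not described in the abstract, and the function names are hypothetical.

```python
import random
import statistics


def linear_mid_slice(slice_a, slice_b):
    """Pixel-wise average of two equally sized 2-D slices (nested lists)."""
    return [[0.5 * (a + b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(slice_a, slice_b)]


def averaged_noise_variance(num_samples=20000, sigma=1.0, seed=0):
    """Empirical variance of the mean of two independent Gaussian noise
    samples; the theoretical value is sigma**2 / 2."""
    rng = random.Random(seed)
    averaged = [0.5 * (rng.gauss(0.0, sigma) + rng.gauss(0.0, sigma))
                for _ in range(num_samples)]
    return statistics.pvariance(averaged)
```

Linear averaging also blurs structures that move between slices, which is one reason a learned interpolator can score better on structural measures such as SSIM.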

2606.09944 2026-06-10 econ.GN cs.AI q-fin.EC 新提交

GAGI: A Gini-Adjusted GDP-per-Capita Index for Distribution-Aware Macroeconomic Welfare Monitoring

GAGI:一种用于分布感知宏观经济福利监测的基尼调整人均GDP指数

Sivasathivel Kandasamy

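AI summary: Proposes the GAGI index, which adjusts GDP per capita by the Gini coefficient and the price level to monitor distributional welfare effects; applied to the G7 countries, it shows welfare growth diverging persistently from GDP growth.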

详情
AI中文摘要

人均GDP是政府机构追踪经济繁荣和经济事件后果的默认视角,但它忽视了生活繁荣的两个首要决定因素:收入/财富分配和通胀影响。不平等调整的收入衡量指标本身并不新鲜,但宏观经济监测工具包中具体缺失的不是福利概念,而是一个可操作的监测触发指标:一个足够简洁、可每年从公开数据计算、无需建模假设即可审计、且标准化以便于理解年度间和国家间变化(监管机构需要据此采取行动)的统计量。我们构建了这样一个工具,即基尼调整人均GDP指数(GAGI):一种可复现、可公开计算的公式,通过不平等调整因子(1-G)和价格水平重新调整各国人均GDP,并以2010年为基准标准化。GAGI是一个通用福利指数,并非特定于AI自动化,适用于任何需要追踪福利调整后繁荣的场景。将GAGI应用于2010-2026年的G7经济体,我们发现福利调整后的繁荣与总体GDP增长持续且日益偏离,这种偏离在2022年后急剧扩大,时间上与COVID后遗症和生成式AI部署加速相吻合,尽管仅凭此证据尚不能证明因果关系。我们认为GAGI是基于GDP监测的必要补充:任何仅追踪总产出的宏观经济监测工具都会系统性地忽略自动化可能造成的分配损害,即使报告的增长依然强劲。

英文摘要

GDP per capita is the default lens through which governing bodies track economic prosperity and the consequences of economic events, yet it is blind to two first-order determinants of lived prosperity: income/wealth distribution and inflation impact. Inequality-adjusted income measures are themselves not new, but what is missing from the macroeconomic monitoring toolkit specifically is not a welfare concept but an operational monitoring trigger: a statistic minimal enough to compute annually from public data, transparent enough to audit without modelling assumptions, and normalised so that year-on-year, cross-country change (the quantity a regulator needs to act on) is legible. We assemble such an instrument, the Gini-Adjusted GDP per Capita Index (GAGI): a reproducible, publicly computable formulation that rescales each country's GDP per capita by its inequality-adjustment factor (1-G) and its price level, normalised to a 2010 baseline. GAGI is a general-purpose welfare index, not inherently specific to AI automation, applicable wherever welfare-adjusted prosperity needs tracking. Applying GAGI to the G7 economies over 2010-2026, we show that welfare-adjusted prosperity has diverged persistently and increasingly from headline GDP growth, and that the divergence widens sharply after 2022, temporally coincident with, though not on this evidence alone demonstrated to be caused by, the after-effects of COVID and the acceleration of generative-AI deployment. We argue that GAGI is a necessary complement to GDP-based monitoring: any macroeconomic monitoring instrument that tracks only aggregate output will systematically miss the distributional harm that automation can cause even while reported growth remains strong.
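
The verbal definition above can be turned into a small calculation. The sketch below is one possible reading, with assumptions labeled: "rescales by its price level" is taken to mean dividing by a price index, and the 2010 baseline is taken to mean the index equals 100 in 2010. The abstract fixes neither choice, and the function name and the numbers in the usage note are purely illustrative, not data from the paper.

```python
def gagi_series(gdp_per_capita, gini, price_index, base_year=2010):
    """Gini-adjusted GDP-per-capita index for one country.

    All three inputs are dicts keyed by year. Assumed formula (not
    confirmed by the abstract):
        raw_t  = gdp_per_capita_t * (1 - gini_t) / price_index_t
        GAGI_t = 100 * raw_t / raw_base_year
    """
    raw = {
        year: gdp_per_capita[year] * (1.0 - gini[year]) / price_index[year]
        for year in gdp_per_capita
    }
    base = raw[base_year]
    return {year: 100.0 * value / base for year, value in raw.items()}
```

For example, with made-up inputs where nominal GDP per capita rises 10% and prices rise 10% at a constant Gini coefficient, `gagi_series({2010: 40000, 2011: 44000}, {2010: 0.30, 2011: 0.30}, {2010: 100, 2011: 110})` stays flat at the baseline. A rise in the Gini coefficient alone would pull the index below it even if GDP per capita were unchanged.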

2606.09941 2026-06-10 stat.AP cs.LG stat.OT 新提交

Stochastic weather generators for high-frequency wind vector time series

高频风矢量时间序列的随机天气生成器

Mingshi Cui, Kevin Eng, Justin T. Greene, Zern Ke, Abolfazl Sodagartojgi, Zhiqiu Xia, Gemma E. Moran, Michael L. Stein

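AI summary: For minute-level wind vector time series, develops machine learning models based on time vector-quantized variational autoencoders that generate realistic series, capturing diurnal variation but falling short in matching the distribution of extreme wind speeds.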

详情
AI中文摘要

地表风速在分钟尺度上变化显著,因此有必要研究其在此精细时间尺度上的变化。为最小化季节性影响,本文限定于六月,基于俄克拉荷马州拉蒙特站点超过30年的分钟级高质量测量数据,开发了一系列用于生成真实地表风矢量时间序列的机器学习模型。此类生成器可作为多种学科模型的输入,特别是风能领域,同时也适用于野火蔓延和航空等。数据显示风速和风向均存在复杂的昼夜结构,标准时间序列模型难以捕捉,因此我们考虑多种机器学习方法,基于时间矢量量化变分自编码器构建随机风生成器。我们考虑一次生成一天的数据,以及基于前一天风况生成一天的风矢量。我们还研究了在生成器中纳入离散天气状态变量的方法。我们使用多种正式和非正式方法评估生成器。其中最佳生成器能够捕捉观测数据中的许多(但非全部)复杂特征。特别地,我们的最佳方法准确模拟了风波动性的昼夜变化,但在匹配观测到的极端风速分布方面存在困难。

英文摘要

Surface winds can vary substantially from one minute to the next, so there is scope for studying their variation on this fine time scale. Restricting to the month of June to minimize seasonality, this work develops a range of machine learning models for generating realistic time series of surface wind vectors at a site in Lamont, Oklahoma, based on more than 30 years of high-quality measurements at the minute time scale. Such a generator could be used as an input into models from a range of disciplines, notably for wind energy, but also wildfire spread and aviation, among others. The data show complex diurnal structures in both wind speed and direction that would be challenging to capture with standard time series models, so we consider a number of machine learning approaches to producing a stochastic wind generator based on time vector-quantized variational autoencoders. We consider generating a day's worth of data at a time and generating a day of wind vectors conditional on the previous day's winds. We also study methods for incorporating a discrete weather state variable in the generator. We evaluate the generators using a wide range of formal and informal methods. The best of these generators can capture many but not all of the complex features present in the observational data. In particular, the best of our approaches accurately mimic diurnal changes in wind volatility but struggle to match the observed distribution of extreme wind speeds.

2606.09893 2026-06-10 eess.IV cs.AI cs.LG 新提交

Tractogram foundation model

TractFM:纤维束图基础模型

Guikun Chen, Yuqian Chen, Yijie Li, Yogesh Rathi, Nikos Makris, Fan Zhang, Wenguan Wang, Lauren J. O'Donnell

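AI summary: Proposes TractFM, a foundation model that learns reusable representations directly from whole-brain streamline sets by combining a local streamline encoder with a permutation-equivariant tractogram encoder, pretrained on dense anatomical tract parcellation and transferring to both streamline-level and subject-level tasks.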

详情
AI中文摘要

扩散MRI(dMRI)纤维束成像是在活体人脑中绘制白质通路的唯一非侵入性方法。它将每个大脑表示为一个纤维束图:一个大型、无序的三维流线集合,包含局部流线几何和全脑解剖组织的信息。这种结构使纤维束图成为表示学习的自然但具有挑战性的目标。现有方法将流线分类和受试者级预测视为独立问题:流线分类器关注几何模式,而受试者级预测通常依赖于手工特征。因此,当前方法无法学习连接流线解剖与全脑受试者间变异的可复用表示。本文介绍TractFM,一个纤维束图基础模型,直接从全脑纤维束集学习可复用表示。TractFM结合了局部流线编码器和置换等变纤维束编码器,使得一个受试者的所有流线能够在单次前向传递中共同上下文化。在密集解剖束分割(即给单个流线分配解剖标签)上的预训练产生了两种互补表示:用于束分割的上下文化流线级嵌入和用于下游受试者表型预测的紧凑受试者级描述符。在三种纤维束成像算法和五个dMRI数据集上,TractFM迁移到流线级和受试者级任务。其冻结表示实现了准确的束分割,并在独立数据集上预测年龄和性别。这些结果表明,全脑几何上下文(一次性学习)可以泛化到纤维束成像流程、数据集和预测任务中。

英文摘要

Diffusion MRI (dMRI) tractography is the only noninvasive approach for mapping white-matter pathways in the living human brain. It represents each brain as a tractogram: a large, unordered set of three-dimensional streamlines that includes information about both local streamline geometry and whole-brain anatomical organization. This structure makes tractograms a natural but challenging target for representation learning. Existing methods treat streamline classification and subject-level prediction as separate problems: streamline classifiers focus on geometric patterns, whereas subject-level prediction often depends on hand-crafted features. As a result, current methods do not learn reusable representations that connect streamline anatomy with whole-brain inter-subject variation. Here we introduce TractFM, a tractogram foundation model that learns reusable representations directly from whole-brain streamline sets. TractFM combines a local streamline encoder with a permutation-equivariant tractogram encoder, allowing all streamlines from a subject to be contextualized jointly in a single forward pass. Pretraining on dense anatomical tract parcellation, i.e., assigning anatomical labels to individual streamlines, yields two complementary representations: contextualized streamline-level embeddings for tract parcellation and compact subject-level descriptors for downstream prediction of subject phenotypes. Across three tractography algorithms and five dMRI datasets, TractFM transfers to both streamline-level and subject-level tasks. Its frozen representations achieve accurate tract parcellation and predict age and sex across independent datasets. These results show that whole-brain geometric context, learned once, can generalize across tractography pipelines, datasets, and prediction tasks.

2606.11186 2026-06-10 cs.CV 新提交

AnyMod-LLVE: Low-Light Video Enhancement with Modality-Agnostic Inference

AnyMod-LLVE: 模态无关推理的低光照视频增强

Hangfeng Liang, Yutao Hu, Yanhan Hu, Xiaohan Wu, Wenqi Shao, Ying Fu

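AI summary: Proposes AMNet, a unified multimodal framework in which a Spatial-Spectral Dual-Gated Translator learns the correspondence between auxiliary modalities and RGB inputs, supporting arbitrary modality combinations at inference and addressing missing auxiliary modalities in low-light video enhancement.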

详情
Comments
Accepted at ICML 2026; Project page and code: https://lhfgghc.github.io/LLVE-AMNet
AI中文摘要

低光照视频增强(LLVE)由于低照度条件下严重的信息退化仍然是一项具有挑战性的任务。最近的多模态方法通过引入辅助模态(如事件流和红外图像)显著提升了增强性能。然而,这些方法通常假设推理时这些模态可用,这在现实场景中往往不可行。为了解决这个问题,在本工作中,我们提出了AMNet,一个统一的LLVE多模态框架,以支持灵活的模态无关推理,其中辅助模态可能不可用。为了解决模态缺失问题,我们引入了一个空间-频谱双门控转换器,学习辅助模态与RGB输入之间的对应关系,生成隐式辅助表示以支持鲁棒增强。此外,为了充分促进跨模态对应学习,我们基于仅RGB数据集和合成辅助模态进行了大规模多模态预训练。大量实验表明,AMNet能够处理任意推理时的模态组合,并在模态缺失条件下展现出优越的LLVE性能。代码和模型可在项目页面上获取。

英文摘要

Low-light video enhancement (LLVE) remains a challenging task due to severe information degradation under low-illumination conditions. Recent multimodal approaches have significantly improved enhancement performance by incorporating auxiliary modalities, such as event streams and infrared images. However, these methods typically assume the availability of these modalities at inference, which is often not feasible in real-world scenarios. To solve this problem, we propose AMNet, a unified multimodal framework for LLVE that supports flexible modality-agnostic inference, where auxiliary modalities may be unavailable. To address the issue of modality absence, we introduce a Spatial-Spectral Dual-Gated Translator that learns the correspondence between auxiliary modalities and RGB inputs, producing implicit auxiliary representations to support robust enhancement. Additionally, to facilitate the learning of cross-modal correspondence, we conduct large-scale multimodal pretraining based on an RGB-only dataset with synthetic auxiliary modalities. Extensive experiments demonstrate that AMNet can handle arbitrary inference-time modality combinations and exhibits superior LLVE performance when modalities are absent. Code and models are available on the project page.

2606.11169 2026-06-10 cs.DC cs.AI 新提交

Piper: A Programmable Distributed Training System

Piper: 可编程的分布式训练系统

Megan Frisella, Shubham Tiwari, Andy Ruan, Yi Pan, Parker Gustafson, Mat Jacob, Gilbert Bernstein, Stephanie Wang

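AI summary: Proposes Piper, a system that decouples strategy from runtime implementation, letting users declare a distributed training strategy with a small set of annotations and scheduling directives that are compiled into per-device execution plans; it supports common strategies and enables joint scheduling optimizations for composed strategies.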

详情
AI中文摘要

大规模模型训练日益依赖于组合多种并行策略(如数据、流水线和专家并行)以及内存节省优化(如ZeRO)。用于基础模型预训练的部署系统通常依赖人类专家手动设计高层并行策略,然后实现相应的低层执行策略,这使得系统难以适应新策略。同时,许多通用框架更加灵活,但其实现仍然局限于一组固定的常见并行策略,使得整合最新策略具有挑战性。我们提出Piper,一个用户可控的分布式训练系统,将策略与运行时实现解耦。Piper允许用户通过少量模型注解和调度指令声明全面的分布式训练策略。每条指令对Piper的中间表示(IR)应用变换,IR是一个统一的全局训练DAG,表示所有计算和通信。使用此IR,Piper编译每设备执行计划,并使用与策略无关的分布式运行时执行它们。我们表明,该组合系统在常见策略(如ZeRO)上保持性能一致,同时通过组合并行策略(如DeepSeek-V3的DualPipe)中计算和通信的联合调度,实现额外的性能和内存效率提升。

英文摘要

Large-scale model training increasingly relies on composing multiple parallelism strategies, such as data, pipeline, and expert parallelism, together with memory-saving optimizations like ZeRO. Deployed systems for foundation model pretraining often rely on human experts to manually design a high-level parallelism strategy then implement the corresponding low-level execution strategy, making it difficult to adapt the system to new strategies. Meanwhile, many general-purpose frameworks are more flexible but their implementations are still tied to a fixed set of common parallelism strategies, making it challenging to integrate state-of-the-art strategies. We present Piper, a user-controllable distributed training system that decouples the strategy from the runtime implementation. Piper allows users to declare a comprehensive distributed training strategy with a small set of model annotations and scheduling directives. Each directive applies a transformation on Piper's intermediate representation (IR), a unified global training DAG that represents all computation and communication. Using this IR, Piper compiles per-device execution plans and executes them with a distributed runtime agnostic to the strategy. We show that the combined system maintains performance parity on commonly available strategies such as ZeRO, while also enabling additional performance and memory efficiency gains through joint scheduling of compute and communication in composed parallelism strategies such as DeepSeek-V3's DualPipe.

2606.11155 2026-06-10 cs.CV 新提交

Mean Flow Distillation: Robust and Stable Distillation for Flow Matching Models

平均流蒸馏:面向流匹配模型的鲁棒稳定蒸馏方法

An Zhao, Shengyuan Zhang, Zhongjian Sun, Yixiang Zhou, Zejian Li, Ling Yang, Tianrun Chen, Lingyun Sun

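AI summary: Proposes the Mean Flow Distillation (MFD) framework, which suppresses optimization noise through temporal low-pass filtering and ensures trajectory consistency, enabling high-fidelity single-step generation for flow matching models.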

详情
AI中文摘要

流匹配模型在广泛的生成任务中展现出强大性能。然而,它们依赖于基于ODE的迭代采样,在推理中产生大量计算开销,限制了其在实时场景中的应用。虽然蒸馏是一种有前景的解决方案,但现有方法大多借鉴基于扩散的分数匹配,往往未能利用流的固有几何结构,并遭受训练不稳定、高方差和生成质量下降的问题。在本文中,我们提出平均流蒸馏(MFD),一种专为流匹配模型设计的新型蒸馏框架。我们从理论上证明,MFD充当时间低通滤波器,有效抑制变分分数蒸馏(VSD)中固有的高频优化噪声,同时确保全局轨迹一致性。我们进一步证明了平均流匹配定理,表明匹配期望平均速度足以实现严格的分布对齐。在实验上,在包括4D占用预测和文本到图像生成在内的高维流形挑战性任务中,MFD实现了最先进的性能,实现了高保真单步生成。

英文摘要

Flow Matching models have demonstrated strong performance across a wide range of generative tasks. However, their reliance on ODE-based iterative sampling incurs substantial computational overhead in inference, which limits their applicability in real-time scenes. While distillation is a promising solution, existing approaches largely borrow from diffusion-based score matching, often failing to exploit the intrinsic geometric structure of flows and suffering from training instability, high variance, and degraded generation quality. In this paper, we propose Mean Flow Distillation (MFD), a novel distillation framework tailored for flow matching models. We theoretically demonstrate that MFD acts as a temporal low-pass filter, effectively suppressing the high-frequency optimization noise inherent in variational score distillation (VSD) while ensuring global trajectory consistency. We further prove the Mean Flow Matching Theorem, establishing that matching expected average velocities is sufficient for strict distribution alignment. Empirically, on challenging tasks of high-dimensional manifolds including 4D occupancy forecasting and text-to-image generation, MFD achieves state-of-the-art performance, enabling high-fidelity single-step generation.

2606.11150 2026-06-10 cs.AI cs.CY 新提交

ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity

ABC-Bench:生物安全的主体生物能力基准

Andrew Bo Liu, Samira Nedungadi, Bryce Cai, Alex Kleinman, Harmon Bhasin, Seth Donoughe

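AI summary: Proposes the ABC-Bench benchmark, which evaluates LLM agents on biosecurity-relevant tasks, including programming liquid handling robots, designing DNA fragments, and evading synthesis screening; all tested agents outperformed the human expert baseline.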

详情
Comments
18 pages. To be published in ICML 2026
AI中文摘要

大型语言模型(LLM)正在迅速获得与生物研究相关的能力,从文献综合到实验数据解释。LLM主体也越来越能够执行以前需要经验丰富的人类生物学家才能完成的计算机生物学任务。这些新兴的AI能力为科学发现和生物医学进步提供了新的机会,但也改变了生物安全风险的格局。为了解决这个问题,我们引入了主体生物能力基准(ABC-Bench),这是一套用于衡量主体生物安全相关能力的任务。ABC-Bench在良性和双重用途生物学任务上评估LLM主体:编写代码操作液体处理机器人、设计用于体外组装的DNA片段以及规避DNA合成筛选。这些任务需要生物学和软件专业知识的结合。所有测试的LLM主体在所有三项任务上的表现都优于中位数专家人类基线。主体在依赖已发表知识和有良好文档记录协议的任务上表现优异,而在需要新颖生物信息学推理的任务上表现较弱。在三个湿实验室验证实验中,我们发现OpenAI的o4-mini-high生成的脚本在OpenTrons液体处理机器人上运行时,成功组装了具有预期序列的DNA。

英文摘要

Large language models (LLMs) are rapidly acquiring capabilities relevant to biological research, from literature synthesis to interpretation of experimental data. Increasingly, LLM agents can also perform in silico biology tasks that previously required experienced human biologists. These emerging AI capabilities offer new opportunities for scientific discovery and biomedical advances, but they also shift the landscape of biosecurity risks. To address this, we introduce the Agentic Bio-Capabilities Benchmark (ABC-Bench), a suite of tasks to measure agentic biosecurity-relevant capabilities. ABC-Bench evaluates LLM agents on both benign and dual-use biology tasks: writing code to operate liquid handling robots, designing DNA fragments for in vitro assembly, and evading DNA synthesis screening. These tasks require a combination of biology and software expertise. All tested LLM agents outperformed the median expert human baseliner on all three tasks. Agents performed highly on tasks drawing on published knowledge and well-documented protocols, and more weakly on a task requiring novel bioinformatics reasoning. In three wet-lab validation experiments, we found that OpenAI's o4-mini-high produced scripts that, when run on an OpenTrons liquid handling robot, successfully assembled DNA with expected sequences.

2606.11131 2026-06-10 cs.CV 新提交

UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors

UniPET:一种适用于不同剂量减少因子的高质量PET图像去噪通用网络

Zhiwen Yang, Yang Zhou, Haowei Chen, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

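AI summary: To address the performance degradation of existing PET denoising methods when the dose reduction factor (DRF) varies, proposes the UniPET network, which achieves high-quality denoising across DRFs through a style alignment network and a region-aware learning strategy, reaching state-of-the-art performance.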

详情
AI中文摘要

大多数现有的基于深度学习的PET图像去噪方法假设低剂量PET图像具有固定且已知的剂量减少因子(DRF)。然而,当DRF在实际应用中超出假设范围时,这些方法会遇到显著的性能下降。为了应对不同DRF带来的挑战,一些初步研究聚焦于通用PET图像去噪任务,旨在训练一个覆盖不同DRF低剂量数据的通用模型。尽管如此,这些通用模型常常难以处理不同DRF数据中存在的风格不匹配问题,导致出现显著的过度平滑效应,即“风格消除问题”。为了解决这个问题,我们创新性地将域泛化引入PET图像去噪,并提出了一种通用PET图像去噪网络(UniPET),以实现跨不同DRF的高质量PET图像去噪。UniPET包含两个主要创新:风格对齐网络(SAN)和区域感知学习策略(RALS)。具体而言,SAN利用源自域泛化的风格对齐技术来对齐和恢复不同DRF下的风格,确保模型在各种DRF下的泛化能力,同时有效保留风格。此外,为了增强风格恢复,RALS区分平坦区域和风格化区域,仅在后者上进行对抗学习,从而更有效地引导模型关注学习风格化区域。实验证明,我们提出的UniPET能够自适应地恢复不同DRF风格,并实现跨DRF的高质量PET图像去噪。全面的实验表明,UniPET在特定DRF下表现出与专用DRF模型相当的性能,并在定量、感知和临床评估中实现了通用PET图像去噪的最先进性能。

英文摘要

Most existing deep learning-based PET image denoising methods assume a fixed and known dose reduction factor (DRF) for low-dose PET images. However, these methods encounter significant performance degradation when the DRF departs from the assumed value in practical applications. To address the challenge posed by varied DRFs, several preliminary studies focus on the task of universal PET image denoising, aiming to train a universal model over low-dose data across DRFs. Nonetheless, these vanilla universal models often struggle with misaligned styles present in different DRF data, leading to the "style elimination issue" with a significant over-smoothing effect. To deal with this issue, we innovatively introduce domain generalization to PET image denoising and propose a universal PET image denoising network (UniPET) to achieve high-quality PET image denoising across diverse DRFs. UniPET comprises two primary innovations: a style alignment network (SAN) and a region-aware learning strategy (RALS). Specifically, SAN utilizes style alignment techniques derived from domain generalization to align and recover styles across different DRFs, ensuring the model's generalizability across various DRFs while effectively preserving styles. Furthermore, to enhance style recovery, RALS distinguishes between flat and stylized regions, exclusively conducting adversarial learning on the latter, thereby more effectively guiding the model's focus towards learning stylized regions. It is demonstrated that our proposed UniPET can adaptively recover different DRF styles and achieve high-quality PET image denoising across DRFs. Comprehensive experiments show that UniPET exhibits comparable performance to individual DRF-specific models at specific DRFs and realizes state-of-the-art performance in universal PET image denoising quantitatively, perceptually, and clinically.

2606.11117 2026-06-10 cs.AR cs.AI cs.PF 新提交

Towards Autonomous Accelerator Design: FPGA Accelerator Generation with SECDA

迈向自主加速器设计:基于SECDA的FPGA加速器生成

Vinamra Sharma, Xingjian Fu, Jude Haris, José Cano

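AI summary: Proposes the SECDA-DSE framework, which integrates large language models to guide design space exploration of FPGA accelerators and generates synthesizable accelerator designs through a structured explorer and LLM reasoning, reducing manual intervention.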

详情
Comments
Accepted to the Machine Learning for Architecture and Systems Workshop (MLArchSys), co-located with ISCA 2026
AI中文摘要

为现代人工智能工作负载设计基于FPGA的加速器需要探索庞大而复杂的硬件设计空间,涉及架构参数、数据流策略和内存层次结构,这使得过程非常耗时。虽然现有方法如SECDA通过SystemC仿真和FPGA执行实现了快速的硬件-软件协同设计,但识别高效的加速器配置仍然是一个主要需要广泛领域知识的手动过程。SECDA-DSE是一个将大语言模型(LLM)集成到SECDA生态系统中的框架,用于指导基于FPGA的加速器的设计空间探索(DSE)。它结合了用于生成候选架构的结构化DSE探索器,以及使用检索增强生成和思维链提示进行推理引导探索的LLM栈,并配有用于迭代和强化优化的反馈循环。基于我们之前介绍SECDA-DSE的工作,本文通过生成三种加速器设计(包括逐元素向量乘法、二维卷积和矩阵转置)并在FPGA硬件上执行端到端运行来扩展其评估。结果表明,SECDA-DSE能够生成符合SECDA标准的加速器设计,并成功在FPGA硬件上综合和执行。此外,生成的设计捕获了计算并行性和数据移动之间的内核特定权衡,突显了LLM引导探索在跨不同工作负载调整架构配置方面的潜力,同时减少了探索时间和大量人类专业知识的需求。

英文摘要

Designing FPGA-based accelerators for modern artificial intelligence workloads requires exploring a large and complex hardware design space that involves architectural parameters, data flow strategies, and memory hierarchies, making the process very time consuming. While existing methodologies such as SECDA enable rapid hardware-software co-design through SystemC simulation and FPGA execution, identifying efficient accelerator configurations remains a largely manual process requiring extensive domain knowledge. SECDA-DSE is a framework that integrates Large Language Models (LLMs) into the SECDA ecosystem to guide design space exploration (DSE) of FPGA-based accelerators. It combines a structured DSE Explorer for generating candidate architectures with an LLM Stack that performs reasoning-guided exploration using retrieval-augmented generation and chain-of-thought prompting, coupled with a feedback loop for iterative and reinforced refinement. Building on our previous work introducing SECDA-DSE, this paper extends its evaluation by generating three accelerator designs, including element-wise vector multiplication, 2D convolution, and matrix transpose, and performing end-to-end execution on FPGA hardware. The results show that SECDA-DSE can generate SECDA-compliant accelerator designs that are successfully synthesized and executed on FPGA hardware. Furthermore, the generated designs capture kernel-specific trade-offs between compute parallelism and data movement, highlighting the potential of LLM-guided exploration to adapt architectural configurations across diverse workloads while reducing exploration time and the need for extensive human expertise.

2606.11116 2026-06-10 cs.CY cs.AI cs.HC 新提交

Designed by Journalists, but Is It for Readers? Rethinking AI Disclosures and Transparency in News

由记者设计,但为读者而设?重新思考AI披露与新闻透明度

Pooja Prajod

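AI summary: Finds that detailed disclosures trigger a transparency dilemma that lowers trust, while brief disclosures create an information gap; readers prefer designs centered on user agency (such as detail on demand and AI-ratio visualizations), and the paper calls on the HCI community to redesign disclosure mechanisms.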

详情
Comments
Accepted to CHIWORK Workshop (Interrogating GenAI Augmentation for CHIworkers: Strategies for Professional Autonomy and Accountability)
AI中文摘要

随着新闻编辑室整合生成式AI,记者面临一个披露挑战:如何以维护读者信任的方式传达AI参与。当前实践提供两种方法:简短的一行标签或详细的披露,说明人工监督、编辑责任和错误报告机制。两者都未能实现记者通过透明度建立信任的目标。一项针对34名新闻读者的现有对照实验表明,详细披露会引发“透明度困境”,降低信任而非增加信任,并有可能引入暗黑模式,使读者在透明度的错觉下滚动忽略。一行披露避免了这种效应,但可能造成信息缺口,促使读者花费认知努力寻找披露所指示但未解释的AI参与迹象。然而,读者并非拒绝透明度,他们提出了以用户代理为中心的披露设计:按需详情交互、比例AI可视化、媒体级别信号和明确的“无AI”标签。我认为,从业者认为负责任的披露与用户实际需求之间的脱节是HCI社区的一个设计问题。

英文摘要

As newsrooms integrate generative AI, journalists face a disclosure challenge: how to communicate AI involvement in ways that maintain reader trust. Current practice offers two approaches: brief one-line labels or detailed disclosures specifying human oversight, editorial accountability, and error reporting mechanisms. Neither achieves journalists' goal of building trust through transparency. An existing controlled experiment with 34 news readers shows that detailed disclosures trigger a "transparency dilemma", reducing trust rather than increasing it, and risk introducing dark patterns that readers scroll past with the illusion of transparency. One-line disclosures avoid this effect but can create an information gap, prompting readers to expend cognitive effort searching for signs of AI involvement that the disclosure indicates but does not explain. Yet readers are not rejecting transparency: they proposed disclosure designs centered on user agency, namely detail-on-demand interactions, proportional AI-ratio visualizations, outlet-level signals, and explicit "no AI" labels. I argue that this disconnect between what practitioners believe is responsible disclosure and what users actually need is a design problem for the HCI community.

2606.11098 2026-06-10 cs.CR cs.LG 新提交

Do Transformers Actually Help Intrusion Detection? A Temporal Sequence Evaluation on CIC-IDS2017

Transformer 真的有助于入侵检测吗?基于 CIC-IDS2017 的时间序列评估

Zach Moczkodan, Hany Ragab

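AI summary: Reformulates CIC-IDS2017 as a temporal intrusion detection task and finds that padding convention, not architecture, determines Transformer performance, and that random splits and padding choices can overestimate model robustness.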

详情
Comments
11 pages, 9 figures, 9 tables. Preprint. Code: https://github.com/zachmocz/temporal-ids-bench
AI中文摘要

近年来,用于网络入侵检测的深度学习方法越来越多地采用时间架构,如循环网络和 Transformer,通常在 CIC-IDS2017 上报告接近完美的性能。然而,许多现有研究既没有为其时间模块提供真实的序列输入,也没有在现实、无泄漏的条件下进行评估,使得报告的性能提升是否源于真正的序列建模能力尚不清楚。在这项工作中,我们通过从网络对话中构建有序流序列,并在随机分割、两种无泄漏分割以及填充方案消融下对九种经典和深度学习架构进行基准测试,将 CIC-IDS2017 重新表述为时间入侵检测任务。核心发现是,填充惯例而非架构决定了 Transformer 的性能:在真正的序列(非填充)窗口上,Transformer 实现了实验中所有模型的最高 macro-F1(0.89);在零填充+掩码评估下,其性能显著下降(-0.24 macro-F1),而 LSTM、GRU 和 1D-CNN 保持稳定。在无泄漏组评估下,随机森林是最稳健的模型(+0.009),而 Transformer 的误报率从 0.04% 增长到 2.7%,增加了 67 倍,这在传统协议下是不可见的。这些发现表明,评估方法(特别是填充惯例和分割协议)对报告性能的影响大于架构选择,并且广泛使用的随机分割与重复最后填充可能高估模型鲁棒性高达 0.24 macro-F1。我们主张将无泄漏分割、显式填充披露和序列感知基准测试作为未来入侵检测研究的标准实践。代码和实现细节可在 https://github.com/zachmocz/temporal-ids-bench 获取。

英文摘要

Recent deep learning approaches for network intrusion detection increasingly incorporate temporal architectures such as recurrent networks and Transformers, often reporting near-perfect performance on CIC-IDS2017. However, many existing studies neither supply their temporal modules with genuine sequence inputs nor evaluate under realistic, leakage-free conditions, making it unclear whether reported gains arise from true sequence-modeling capability. In this work, we reformulate CIC-IDS2017 as a temporal intrusion-detection task by constructing ordered flow sequences from network conversations and benchmarking nine classical and deep learning architectures under a random split, two leakage-free splits, and a padding-scheme ablation. The central finding is that padding convention, not architecture, determines the Transformer's performance: on genuinely sequential (non-padded) windows the Transformer achieves the highest macro-F1 of any model in the experiment (0.89); under zero-pad+mask evaluation it drops markedly (-0.24 macro-F1), while LSTM, GRU, and 1D-CNN remain stable. Under leakage-free group evaluation the Random Forest is the most robust model (+0.009), while the Transformer's false-alarm rate grows from 0.04% to 2.7%, a 67-fold increase invisible under conventional protocols. These findings demonstrate that evaluation methodology (specifically padding convention and split protocol) has a larger effect on reported performance than architectural choice, and that widely used random splits with repeat-last padding can overestimate model robustness by up to 0.24 macro-F1. We advocate leakage-free splits, explicit padding disclosure, and sequence-aware benchmarking as standard practice in future IDS research. Code and implementation details are available at https://github.com/zachmocz/temporal-ids-bench.
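
The two padding conventions named in this abstract (repeat-last padding and zero-pad+mask) can be illustrated with a minimal sketch. It is not taken from the paper's repository: the function names, the window length, and the toy flow features are all assumptions made for illustration.

```python
from typing import List, Tuple

Flow = List[float]  # one flow = one feature vector (toy stand-in, assumed)


def pad_repeat_last(seq: List[Flow], window: int) -> List[Flow]:
    """Pad a short sequence to `window` flows by repeating its final flow."""
    if not seq or window < 1:
        raise ValueError("need at least one flow and a positive window")
    seq = seq[:window]  # truncate long sequences; slicing copies the list
    return seq + [list(seq[-1]) for _ in range(window - len(seq))]


def pad_zero_with_mask(seq: List[Flow], window: int) -> Tuple[List[Flow], List[bool]]:
    """Pad with all-zero flows and return a mask (True marks a real flow)."""
    if not seq or window < 1:
        raise ValueError("need at least one flow and a positive window")
    seq = seq[:window]
    n_pad = window - len(seq)
    width = len(seq[0])
    padded = seq + [[0.0] * width for _ in range(n_pad)]
    mask = [True] * len(seq) + [False] * n_pad
    return padded, mask


# Hypothetical conversation with two flows and two features per flow.
conversation = [[0.2, 1.0], [0.5, 3.0]]
repeat = pad_repeat_last(conversation, window=4)
zeros, mask = pad_zero_with_mask(conversation, window=4)

# Ratio of the false-alarm rates quoted in the abstract (2.7% vs 0.04%).
fold_increase = 2.7 / 0.04
```

With repeat-last padding the model sees the final flow several times, whereas the masked variant lets a sequence model ignore the padded positions. Using the rounded rates quoted in the abstract, the ratio comes to about 67.5, in line with the reported 67-fold increase.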

2606.11082 2026-06-10 cs.CL cs.CY 新提交

The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models

示播列效应:审计大型语言模型的跨语言分布偏斜

Hakan Mehmetcik

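AI summary: Using a multi-agent geopolitical wargame, finds that frontier LLMs show behavioral skew across languages, and that the effect depends on model architecture and training regime rather than being a universal property of Western-origin models.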

详情
Comments
25 pages, 2 figures, 6 tables, Research Article
AI中文摘要

本研究调查了前沿大型语言模型(LLMs)在持续对抗条件下遭受的跨语言分布偏斜(示播列效应)。我们开发了一个多智能体地缘政治兵棋推演——蔚蓝海危机,这是一个旨在模拟东地中海冲突结构动态的合成海洋领土争端。六个前沿模型(GPT-4o、Llama-4、Mistral-Large、Gemini-3.1-Pro、Qwen3.6-Plus和DeepSeek-R1)参与了一项组间实验(每组N=10局游戏,每局K=5轮),其中唯一的操作变量是游戏语言(英语与土耳其语),产生了586条有效陈述。一个零样本分类器沿两个连续维度评估行为倾向:让步率和强制修辞。结果是异质的。Llama-4在土耳其语下显示出经Holm校正的强制修辞显著增加(delta = +0.800,p = .002),而Gemini-3.1-Pro显示出同样大的下降(delta = -0.750,p = .005)。DeepSeek-R1表现出类似的负向偏移(delta = -0.860,p = .006),并提供了与缓冲机制一致的思维链证据。GPT-4o未显示出可检测效应(delta = +0.130,p = .614)。这些发现表明,跨语言行为偏斜取决于模型架构和训练机制,而非西方起源LLM的普遍属性。我们识别出两种不同的缓冲机制——思维链制度锚定和多语言RLHF对齐——并讨论了它们对将LLM安全集成到外交和危机管理环境中的启示。

英文摘要

This study investigates cross-lingual distributional skew (the Shibboleth Effect) in frontier large language models (LLMs) subjected to sustained adversarial conditions. We develop a multi-agent geopolitical wargame, the Cerulean Sea Crisis, a synthetic maritime territorial dispute designed to mirror the structural dynamics of Eastern Mediterranean conflicts. Six frontier models (GPT-4o, Llama-4, Mistral-Large, Gemini-3.1-Pro, Qwen3.6-Plus, and DeepSeek-R1) participate in a between-groups experiment (N = 10 games per arm, K = 5 rounds per game) in which the sole manipulation is the language of play (English versus Turkish), producing 586 validated statements. A zero-shot classifier assesses behavioral dispositions along two continuous dimensions: Concession Rate and Coercive Rhetoric. The results are heterogeneous. Llama-4 shows a substantial, Holm-corrected increase in coercive rhetoric under Turkish (delta = +0.800, p = .002), whereas Gemini-3.1-Pro displays an equally large decrease (delta = -0.750, p = .005). DeepSeek-R1 exhibits a similar negative shift (delta = -0.860, p = .006) and provides chain-of-thought evidence consistent with a buffering mechanism. GPT-4o shows no detectable effect (delta = +0.130, p = .614). These findings indicate that cross-lingual behavioral skew is contingent on model architecture and training regime rather than a universal property of Western-origin LLMs. We identify two distinct buffering mechanisms, chain-of-thought institutional anchoring and multilingual RLHF alignment, and discuss their implications for integrating LLMs safely into diplomatic and crisis-management settings.
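
The abstract reports Holm-corrected significance. As a reminder of what that correction does, here is a minimal sketch of the Holm step-down adjustment. The raw p-values are hypothetical and are not the study's data, and `holm_adjust` is an illustrative name rather than the authors' code.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw_p = [0.0005, 0.0100, 0.0400, 0.6000]  # hypothetical raw p-values
adj_p = holm_adjust(raw_p)
significant = [p < 0.05 for p in adj_p]
```

In this toy input the third test is nominally significant (0.04) but no longer passes the 0.05 threshold after adjustment, which is the behavior a family-wise correction is meant to provide.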

2606.11066 2026-06-10 cs.LG q-bio.NC 新提交

GRAFT: Gain-Recalibrated Adapters for Transformer-Based Neural Population Activity Modeling

GRAFT: 基于Transformer的神经群体活动建模中的增益重校准适配器

Xiangsheng Ge, Yang Xie

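AI summary: Proposes the GRAFT model, which separates reusable temporal dynamics from a recalibratable neuron interface, reaches state of the art on the MC Maze dataset, and achieves cross-day recalibration by updating only 9.21% of parameters.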

详情
AI中文摘要

神经群体活动模型可以从分箱的尖峰信号中恢复丰富的时间结构,但其读入和读出层通常与固定的记录神经元集合绑定。这种耦合限制了在长期脑机接口中的重用,因为记录神经元的身份、数量和响应统计可能每天变化。我们引入了GRAFT,一种基于Transformer的神经群体活动模型,它将可重用时间动态与可重校准的神经元接口分离。神经元接口控制记录神经元如何进入和离开共享骨干网络,辅助增益和位置机制支持Transformer内部的神经活动建模。在标准NLB'21协议下的MC Maze上,GRAFT作为集成模型达到0.3866 co-bps,在公共和报告的NLB'21结果中,在主要co-bps指标上创造了新的最先进水平。在从NLB'21 MC Maze数据集系列构建的跨天协议中,GRAFT通过仅更新9.21%的参数,从MC Maze重校准到缩放后的MC Maze数据集(Large/Medium/Small),在受限的目标天支持集下分别达到0.3749、0.3112和0.3152 co-bps。这些结果表明,相同的接口-骨干分离既支持强大的基于Transformer的神经群体活动建模,也支持数据高效的跨天重校准。

英文摘要

Neural population activity models can recover rich temporal structure from binned spikes, but their read-in and readout layers often remain tied to a fixed set of recorded neurons. This coupling limits reuse in long-term brain-computer interfaces, where recorded neuron identities, counts, and response statistics can change across days. We introduce GRAFT, a Transformer-based neural population activity model that separates reusable temporal dynamics from a recalibratable neuron interface. The neuron interface controls how recorded neurons enter and leave the shared backbone, and auxiliary gain and positional mechanisms support neural activity modeling inside the Transformer. On MC Maze under the standard NLB'21 protocol, GRAFT reaches 0.3866 co-bps as an ensemble, setting a new state of the art on the primary co-bps metric among public and reported NLB'21 results. In a cross-day protocol constructed from the NLB'21 MC Maze dataset series, GRAFT recalibrates from MC Maze to the scaled MC Maze datasets (Large/Medium/Small) by updating only 9.21% of parameters, reaching 0.3749, 0.3112, and 0.3152 co-bps with restricted target-day support sets. These results show that the same interface-backbone separation supports both strong Transformer-based neural population activity modeling and data-efficient cross-day recalibration.

2606.11057 2026-06-10 cs.LG q-bio.BM stat.ML 新提交

Flexible Kernels for Protein Property Prediction

用于蛋白质性质预测的灵活核函数

Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward, Hunter Nisonoff, James M. McFarland, Gevorg Grigoryan

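AI summary: Proposes sequence kernels that exploit evolutionary substitution matrices and local linearity, combined with Gaussian processes for data-efficient protein property prediction, and incorporates structural information for multi-task learning.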

详情
Comments
50 pages; to appear at ICML 2026
AI中文摘要

尽管对蛋白质设计应用至关重要,但从稀疏实验数据预测蛋白质性质(如结合亲和力和热稳定性)仍然是一个重大挑战。因此,我们引入了一类序列核函数,利用进化替代矩阵以及局部线性性,并证明由此产生的高斯过程为蛋白质性质景观提供了数据高效的模型,通常优于依赖基础模型嵌入的替代方法。此外,通过学习实际上是结构感知的替代矩阵,我们展示了我们的核函数可以轻松地整合来自基础模型的结构信息。我们证明了这些结构条件核函数非常适合跨多个蛋白质性质景观的多任务学习,并且可以显著优于局部监督学习方法。

英文摘要

Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence kernels that exploit evolutionary substitution matrices as well as local linearity and demonstrate that the resulting Gaussian processes provide data-efficient models of protein property landscapes, frequently outperforming alternatives that rely on foundation model embeddings. Furthermore, by learning what are in effect structure-aware substitution matrices, we show that our kernels can readily incorporate structural information from foundation models. We demonstrate that these structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes and can decisively outperform local supervised learning methods.

2606.11023 2026-06-10 cs.IR cs.CL cs.LG 新提交

Generative Archetype-Grounded Item Representations for Sequential Recommendation

生成式原型驱动的物品表示用于序列推荐

Yifan Li, Jiahong Liu, Xinni Zhang, Hao Chen, Yankai Chen, Wenhao Yu, Jianting Chen, Irwin King

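AI summary: Proposes the GenAIR framework, which uses large language models to generate item archetype descriptions and extract embeddings, and adds a behavioral calibration objective to bridge the gap between semantics and behavior, significantly improving sequential recommendation performance.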

详情
Comments
Accepted by WWW 2026 (Oral)
AI中文摘要

序列推荐旨在通过分析用户的历史行为来预测用户与物品的下一次交互。然而,物品表示的质量有限仍然是一个关键瓶颈。虽然预训练的大语言模型(LLM)可以提供丰富的语义表示,但现有方法仅依赖于固定属性的静态编码,忽视了目标受众在定义物品身份中的关键作用。此外,语义空间难以反映实际用户行为,导致语义表示与行为模式之间存在显著差距。为了解决这些局限性,我们提出了GenAIR,一个通用框架,通过生成式原型驱动的物品表示来增强序列推荐。具体来说,我们首先利用LLM分析物品元数据并推断原型的文本描述,该原型代表物品理想目标受众的概念轮廓。然后,我们在一次前向传播中提取相应的嵌入。此外,为了将这些生成式原型基于现实世界的行为,我们引入了一个行为校准目标,该目标明确地整合了来自实际交互的行为信号。该目标调整嵌入空间的结构以反映经验模式。GenAIR能够与大多数现有模型无缝集成,同时保持高效率。在三个真实世界数据集上进行的全面实验表明,GenAIR显著提高了各种序列推荐模型的性能,并始终优于最先进的基线方法。实现代码可在以下网址获取:https://github.com/AI-Santiago/GenAIR。

英文摘要

Sequential recommendation aims to predict users' next interaction with items by analyzing their historical behavior. However, the limited quality of item representations remains a critical bottleneck. While pre-trained large language models (LLMs) can provide rich semantic representations, existing approaches rely only on static encoding of fixed attributes, overlooking the crucial role of target audiences in defining item identity. Moreover, the semantic space struggles to reflect actual user behavior, resulting in a significant gap between semantic representations and behavioral patterns. To address these limitations, we propose GenAIR, a general framework that empowers sequential recommendation with Generative Archetype-grounded Item Representations. Specifically, we first leverage an LLM to analyze item metadata and infer a textual description of the Archetype, which represents the conceptual profile of the item's ideal target audience. We then extract the corresponding embeddings in a single forward pass. Further, to ground these generative archetypes in real-world behavior, we introduce a behavioral calibration objective, which explicitly incorporates behavioral signals from actual interactions. This objective adjusts the structure of the embedding space to reflect empirical patterns. GenAIR enables seamless integration with most existing models while maintaining high efficiency. Comprehensive experiments conducted on three real-world datasets demonstrate that GenAIR significantly improves the performance of various sequential recommendation models and consistently outperforms state-of-the-art baseline approaches. Implementation code is available at https://github.com/AI-Santiago/GenAIR.

2606.11009 2026-06-10 cs.CL cs.CY 新提交

Who Brought Easter Eggs to Eid? Auditing Cultural Translation of Math Word Problems Across Diverse Languages and Regions

谁把复活节彩蛋带到了开斋节?跨语言和地区数学应用题的文化翻译审计

Parisa Suchdev, Juniper Lovato

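AI summary: Audits how three large language models culturally adapt 60 English math word problems into 7 languages. The models agree on transformation type in 62.5% of cases but on specific substitutions in only 33.5%, and every language-model combination shows entropy collapse; the models prioritize changing surface markers while preserving deeper structure, leading to compressed cultural diversity and regional misattribution.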

详情
Comments
17 pages total with references and appendix, 9 figures, under review
AI中文摘要

大型语言模型越来越多地被用于大规模个性化学习中改编数学应用题,但这些改编是否跨模型一致、是否在规模上保留文化多样性、以及揭示模型认为哪些文化实体最显著,仍是未解决的问题。我们分析了Claude Opus 4、GPT-4.1和Gemini 2.5 Pro如何将60个英语数学应用题改编为孟加拉语、印地语、旁遮普语(印度)、乌尔都语、信德语(巴基斯坦)、意大利语和西西里语(意大利),这一语言集涵盖了从高资源语言(意大利语和印地语)到研究不足的语言(信德语、西西里语和旁遮普语)的完整资源谱系。我们标注了6,489个实体转换,编码模型是否保留、本地化、泛化、省略或更改名称、食物和地点等实体。模型在62.5%的案例中在转换类型上一致,在特定替换上仅33.5%一致,这意味着模型选择直接塑造了学生遇到的文化世界。所有21种语言-模型组合均出现熵塌缩,改编压缩而非扩展了文化多样性。模型优先处理表面标记(如名称、食物和货币),同时保留更深层的结构特征(如嵌入特定文化假设的年级系统)。尽管提示指定了目标国家,模型仍错误归因区域背景,例如对印度孟加拉语学生使用孟加拉国塔卡,并产生跨文化污染,例如将寻蛋活动改编为开斋节活动。某些失败在单个翻译中可见。其他失败,包括多样性塌缩、对表面标记的系统性偏好以及一致的区域误归因,仅通过语料库级分析才显现。使改编问题看起来正确的表面合理性,正是使深层失败容易被忽视的原因。

英文摘要

Large language models are increasingly used to adapt math word problems for personalized learning at scale, but it remains an open question whether those adaptations are consistent across models, preserve cultural diversity at scale, and reveal which cultural entities models treat as most salient. We analyze how Claude Opus 4, GPT-4.1, and Gemini 2.5 Pro adapt 60 English math word problems into Bengali, Hindi, Punjabi (India), Urdu, Sindhi (Pakistan), Italian, and Sicilian (Italy), a language set spanning the full resource spectrum, from high-resource Italian and Hindi to under-studied Sindhi, Sicilian, and Punjabi. We annotate 6,489 entity transformations, coding whether models preserve, localize, generalize, omit, or change entities such as names, foods, and places. Models agree on transformation type in 62.5% of cases and on specific substitutions in only 33.5%, meaning model choice directly shapes which cultural world students encounter. All 21 language-model combinations show entropy collapse, with adaptation compressing rather than expanding cultural diversity. Models prioritize surface markers such as names, foods, and currencies while preserving deeper structural features such as grade-level systems that embed culturally specific assumptions. Despite prompts specifying target countries, models misattribute regional context by using Bangladeshi taka for Indian Bengali students and produce cross-cultural contamination, such as adapting egg hunts as Eid activities. Some failures are visible in individual translations. Others, including diversity collapse, systematic preference for surface markers, and consistent regional misattribution, emerge only through corpus-level analysis. The surface plausibility that makes adapted problems look correct is precisely what makes deeper failures easy to overlook.
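
"Entropy collapse" can be made concrete with a small sketch that compares the Shannon entropy of an entity distribution before and after adaptation. The abstract does not state the exact entropy measure the authors use, so plain Shannon entropy over entity counts is an assumption here, and the name lists are invented for illustration.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(items: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical entity lists: four distinct names in the source problems,
# but the adapted problems reuse one name for three of the four slots.
source_names = ["Emma", "Liam", "Noah", "Olivia"]
adapted_names = ["Rahul", "Rahul", "Rahul", "Priya"]

h_source = shannon_entropy(source_names)
h_adapted = shannon_entropy(adapted_names)
collapsed = h_adapted < h_source  # lower entropy = less diverse entities
```

A drop in entropy of this kind is only visible when many adapted problems are pooled, which matches the abstract's point that diversity collapse emerges through corpus-level analysis rather than from any single translation.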

2606.11007 2026-06-10 cs.CR cs.AI cs.SE 新提交

Understanding and mitigating the risks of OpenClaw for non-technical users: A practical guide with Skill

理解并减轻非技术用户使用OpenClaw的风险:一份实用指南与Skill

Junchang Zheng, Junfeng Tan, Jialiang Lin

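AI summary: For non-technical users, identifies seven core risks of OpenClaw, explains them in plain language, provides actionable defensive strategies, and develops a Skill that automates security configuration, lowering the barrier to use.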

详情
Comments
Work in progress
AI中文摘要

OpenClaw已迅速成为一种变革性的人工智能(AI)智能体框架,其自主执行复杂多步任务的能力吸引了日益增长且多样化的用户群体。然而,这种能力伴随着显著的风险。虽然现有研究在描述这些威胁方面取得了重要进展,但此类工作主要面向技术娴熟的受众,对非技术用户而言仍然难以触及。这一群体如今在社区中占比越来越大且服务不足,而正是这些用户最迫切需要实用且直接的指导。为此,我们通过一系列相互关联的努力来弥合这一差距,旨在降低非技术OpenClaw用户的风险门槛。首先,我们识别并分类了OpenClaw用户在日常使用中可能遇到的七类核心风险,并用通俗语言解释,以便非技术用户能够轻松理解这些威胁的性质和潜在后果。其次,针对每种已识别的风险,我们将一套相应的防御策略提炼为清晰且可操作的具体步骤,易于遵循。第三,为使保护更加便捷,我们提供了一个配套的OpenClaw Skill,可自动执行关键安全配置,使用户能够以最少的手动干预保护其系统。通过这项工作,我们证明了防范智能体风险不必是安全专家的专属领域,非技术用户可以通过简单、实用的行动有意义地参与降低这些风险。

英文摘要

OpenClaw has rapidly emerged as a transformative artificial intelligence (AI) agent framework, and its ability to autonomously execute complex, multi-step tasks has attracted an ever-growing and diverse user base. However, this capability comes with significant risks. While existing research has made important strides in characterizing these threats, such work is predominantly directed at technically sophisticated audiences. It remains largely inaccessible to non-technical users. This demographic now makes up an increasingly large and underserved portion of the community, yet it is these very users who most urgently need practical and straightforward guidance. In response, we bridge this gap through a series of interconnected efforts designed to lower the risk barrier for non-technical OpenClaw users. First, we identify and categorize seven core risks that OpenClaw users may encounter in daily usage, explaining each in plain language so that non-technical users can readily grasp the nature and potential consequences of these threats. Second, for each identified risk, we distill a set of corresponding defensive strategies into clear and actionable operational steps that are easy to follow. Third, to make protection even easier, we provide a companion OpenClaw Skill that automates key security configurations, enabling users to safeguard their systems with minimal manual intervention. Through this work, we demonstrate that safeguarding against the risks of intelligent agents need not be the exclusive domain of security experts, and that non-technical users can meaningfully participate in reducing these risks through simple, practical actions.

2606.10986 2026-06-10 cs.RO cs.SY eess.SY 新提交

Multi-UAV Active Sensing with Information Gain-based Planning and Belief Fusion

基于信息增益规划与信念融合的多无人机主动感知

S. Habibi, L. Marques

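AI summary: Proposes a multi-UAV active sensing framework that uses information gain-based path planning and probabilistic belief fusion for binary terrain mapping, validated on synthetic and real agricultural imagery, reducing entropy and error relative to random walk and sweep coverage.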

详情
AI中文摘要

无人机越来越多地用于空间分布环境中的主动感知和信息收集。然而,其性能受到有限飞行时间、感知不确定性以及空间覆盖与观测精度之间权衡的制约。本文提出了一个多无人机主动感知框架的实际验证,用于概率二元地形映射,以精准农业作为应用案例。环境表示为概率信念图,其中空间依赖性通过因子图建模。无人机决策由基于信息增益的信息路径规划(IGbIPP)引导,并与随机游走和扫描覆盖路径规划基线在合成地形和真实无人机农业图像上进行比较。研究还评估了空间相关权重和几种用于多无人机信息共享的概率信念融合规则。结果表明,IGbIPP比基线更有效地降低了熵和映射误差,而更宽的视场提高了实际覆盖和地图精度。结果进一步表明,简单的相等或偏置空间权重比自适应权重更稳健,并且贝叶斯、对数几率与Dempster-Shafer融合实现了最佳协同映射性能。这些发现强调了不确定性驱动规划、感知几何、空间建模和概率融合对于实际无人机主动感知的重要性。

英文摘要

Unmanned aerial vehicles (UAVs) are increasingly used for active sensing and information gathering in spatially distributed environments. Their performance, however, is constrained by limited flight time, sensing uncertainty, and the trade-off between spatial coverage and observation accuracy. This paper presents a real-world validation of a multi-UAV active sensing framework for probabilistic binary terrain mapping, with precision agriculture used as the application case. The environment is represented as a probabilistic belief map, where spatial dependencies are modeled through a factor-graph formulation. UAV decision making is guided by Information Gain-based Informative Path Planning (IGbIPP), and the approach is compared with Random Walk and Sweep coverage path planning baselines using both synthetic terrains and real UAV-derived agricultural imagery. The study also evaluates spatial correlation weights and several probabilistic belief-fusion rules for multi-UAV information sharing. Results show that IGbIPP reduces entropy and mapping error more effectively than the baselines, while a wider field of view improves real-world coverage and map accuracy. The results further show that simple equal or biased spatial weights can be more robust than adaptive weights, and that Bayesian, log-odds, and Dempster-Shafer fusion achieve the best cooperative mapping performance. These findings highlight the importance of uncertainty-driven planning, sensing geometry, spatial modeling, and probabilistic fusion for real-world UAV-based active sensing.
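
For a single binary map cell, log-odds belief fusion and the resulting entropy reduction can be sketched as follows. The abstract names log-odds fusion but does not give its formula, so the standard independent-evidence form is assumed here; the function names and belief values are hypothetical, and the entropy drop shown is a realized reduction, not the planner's expected information gain.

```python
import math
from typing import Iterable


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_log_odds(beliefs: Iterable[float], prior: float = 0.5) -> float:
    """Fuse per-UAV beliefs about one cell, assuming independent evidence."""
    prior_l = logit(prior)
    total = prior_l + sum(logit(b) - prior_l for b in beliefs)
    return sigmoid(total)


def binary_entropy(p: float) -> float:
    """Entropy (bits) of a Bernoulli belief; 0 when the cell is certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


uav_beliefs = [0.7, 0.8]  # hypothetical beliefs that the cell is occupied
fused = fuse_log_odds(uav_beliefs)
entropy_drop = binary_entropy(0.5) - binary_entropy(fused)
```

Two moderately confident, agreeing observations yield a fused belief stronger than either alone, and the cell's entropy falls accordingly. A planner driven by information gain would favor viewpoints where the expected reduction of this kind is largest.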

2606.10942 2026-06-10 cs.NI cs.AI cs.LG 新提交

Generative Explainability for Next-Generation Networks: LLM-Augmented XAI with Mutual Feature Interactions

下一代网络的生成式可解释性:基于互特征交互的LLM增强XAI

Kiarash Rezaei, Omran Ayoub, Sebastian Troia, Francesco Lelli, Paolo Monti, Carlos Natalino

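AI summary: Proposes a framework that uses a large language model and mutual feature interaction data to generate natural language explanations; in an optical quality of transmission estimation use case, it improves explanation usefulness and scope by 12.2% and 6.2% over the baseline, with 97.5% correctness.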

详情
Journal ref
Proc. WiMob, Marrakesh, Morocco, 2025
Comments
7 pages, with one page for appendix. Accepted for publication at the 2025 21st International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
AI中文摘要

随着人工智能和机器学习模型成为网络运营的核心,其缺乏透明度对运营商信任构成重大障碍。现有的可解释人工智能技术往往无法为非专家弥合这一差距,产生的技术输出难以转化为可操作的见解。本文提出了一个专门解决这一缺陷的框架。它利用中等规模的大语言模型,并超越了SHapley Additive exPlanations特征影响值的标准用法。该框架采用结构化的提示,并辅以互特征交互数据,以生成人类可理解的自然语言解释。为了验证我们的框架,我们在光传输质量估计用例中进行了实证评估,并邀请了人类评估者。我们收集了专家的独立性能评估,显示出较高的评估者间一致性。与仅使用SHAP特征影响值进行简单提示的最先进基线相比,我们的方法将解释有用性和范围分别提高了12.2%和6.2%,同时实现了97.5%的正确性。

英文摘要

As artificial intelligence and machine learning (AI/ML) models become integral to network operations, their lack of transparency poses a significant barrier to operator trust. Existing explainable artificial intelligence (XAI) techniques often fail to bridge this gap for non-specialists, producing technical outputs that are difficult to translate into actionable insights. This paper presents a framework specifically designed to address this shortcoming. It leverages a moderately sized large language model (LLM) and extends beyond the standard use of SHapley Additive exPlanations (SHAP) feature influence values. The framework employs a structured prompt enriched with mutual feature interaction data to generate human-understandable natural language explanations. To validate our framework, we performed an empirical evaluation on an optical quality of transmission (QoT) estimation use case with human evaluators. We collected independent performance evaluations from specialists, which showed a high inter-evaluator agreement. Compared to a state-of-the-art baseline that uses only SHAP feature influence values in a straightforward prompt, our approach improves the explanation usefulness and scope by 12.2% and 6.2%, while achieving 97.5% correctness.

2606.10940 2026-06-10 cs.CV cs.AI cs.LG New submission

Democratising Camera Trap AI: An Open-Source Model for Detecting UK Mammals

Paul Fergus, Philip Stephens, Russell A. Hill, Lee Oliver, Katie Appleby, Sarah Beatham, Naomi Davies Walsh, Stuart Nixon, Naomi Matthews, Chris Sutherland, Kelly Hitchcock

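AI Summary: Releases an open-source object detection model covering 31 classes (including 28 common UK mammal and bird species). The model is a YOLO26x detector trained on 48,165 labelled instances, reaches an mAP@0.5 of 0.984, and aims to lower the barrier to using AI for ecologists.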

Details
Comments
15 pages, 4 figures
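AI Chinese Abstract (translated to English)

Camera traps have become a cornerstone of biodiversity monitoring, but the AI that turns large volumes of images into usable ecological data is often locked behind commercial platforms or trained on fauna that does not match that of the British Isles. To remove barriers and increase adoption, we release an open-source object detection model for 31 classes (28 common UK mammal and bird species, plus utility classes such as humans, calibration poles, and vehicles), based on a curated dataset of 48,165 labelled instances collected from multiple sites over a decade of operational deployment through Conservation AI and its successor project, Trap Tracker. The model is a YOLO26x detector trained and tested on an 80/10/10 class-stratified split. On the held-out validation set it reaches a mean Average Precision of 0.984 at IoU 0.5 (0.956 at IoU 0.5-0.95), with precision 0.988 and recall 0.965. On the unseen held-out test set, mean per-species confidence across the 31 classes ranges from 0.96 to 0.99, with a false-negative rate of 0.17% concentrated in difficult night-time, distant, or occluded images. These metrics come from data drawn from the same pool of sites and cameras as training, so performance at new sites is left to future work. We release the trained weights in ONNX format under a non-commercial licence, with support for local desktop use and real-time cameras, aimed explicitly at ecologists with no machine-learning experience. This release is a deliberate counterweight to the multiple paid models developed over the past decade.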

English Abstract

Camera traps have become a cornerstone of biodiversity monitoring, but the artificial intelligence that turns vast quantities of images into usable ecological data is often locked behind commercial platforms or trained on fauna that does not match that of the British Isles. In an attempt to remove barriers and increase uptake, we release an open-source object detection model for 31 classes (28 common UK mammal and bird species, plus utility classes for humans, calibration poles, and vehicles), drawn from a curated dataset of 48,165 labelled instances assembled from multiple sites over a decade of operational deployment through Conservation AI and its successor, Trap Tracker. The model, a YOLO26x detector trained and tested on an 80/10/10 class-stratified split, achieves a mean Average Precision of 0.984 at an Intersection over Union (IoU) of 0.5 (0.956 at IoU 0.5-0.95) on the held-out validation set, with precision 0.988 and recall 0.965. On an unseen held-out test split, mean per-species confidence ranged from 0.96 to 0.99 across the 31 classes, with a 0.17% false-negative rate concentrated in difficult night-time, distant, or occluded images. These metrics come from data drawn from the same pool of sites and cameras as training, so performance at entirely new sites is left to future work. We release the trained weights in ONNX format under a non-commercial licence, with local desktop and real-time camera support, aimed explicitly at ecologists with no machine-learning experience. This release is a deliberate counterweight to the multiple paid-for models that have been developed over the last decade.
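
The reported figures rest on two standard quantities: the IoU between a predicted and a ground-truth box, and precision and recall computed from match counts. The sketch below illustrates both under the usual definitions. It is not the authors' evaluation code; the box coordinates and counts are made-up values, and full mAP additionally averages precision over the recall curve and across classes, which is not shown here.

```python
from typing import Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative boxes: a prediction shifted 2 units right of the ground truth.
ground_truth: Box = (0.0, 0.0, 10.0, 10.0)
prediction: Box = (2.0, 0.0, 12.0, 10.0)
overlap = iou(ground_truth, prediction)   # 80 / 120
is_match_at_05 = overlap >= 0.5           # counts as a true positive at IoU 0.5

# Illustrative counts, not taken from the paper.
p, r = precision_recall(tp=8, fp=2, fn=2)
```

At the 0.5 threshold a prediction counts as correct once it overlaps the ground truth by at least half of their combined area, which is why the IoU 0.5-0.95 figure, averaged over stricter thresholds, is lower than the IoU 0.5 figure.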