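arXivDaily daily academic digest, synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.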

2605.21115 2026-06-18 cs.DC cs.LG Version update

Automated Byzantine-Resilient Clustered Decentralized Federated Learning for Battery Intelligence in Connected EVs

Mouhamed Amine Bouchiha, Abdelaziz Amara Korba, Yacine Ghamri-Doudane

Affiliations: SAMOVAR, Télécom SudParis; Department of Computer Science, German University of Technology in Oman (GUtech); L3i, La Rochelle University

AI summary: This paper proposes ABC-DFL, an automated Byzantine-resilient clustered decentralized federated learning framework for battery intelligence in connected EVs. It introduces a dynamic Quorum Byzantine Fault Tolerance protocol and an oracle-based aggregation layer to improve trust, security, and automation; its FLECA protocol filters malicious updates with an adaptive threshold, mitigating Byzantine attacks.

Comments 16 pages, 8 figures

Abstract

Federated learning (FL) has emerged as a promising paradigm for managing electric vehicle (EV) battery data in intelligent transportation systems (ITS), enabling privacy-preserving tasks such as anomaly detection and capacity estimation. However, most existing frameworks rely on centralized aggregation schemes, which pose critical limitations in terms of security and trust. To address these challenges, we propose ABC-DFL, an automated Byzantine-resilient clustered decentralized federated learning (C-DFL) framework for connected EVs. The proposed incentive-driven C-DFL system replaces the central server with an open-permissioned blockchain, featuring a new dynamic Quorum Byzantine Fault Tolerance (QBFT) protocol and an oracle-based aggregation layer, to enhance trust, security, and automation. At the core of ABC-DFL lies FLECA (Filtered Layered Enhanced Clustering Aggregation), a robust hierarchical aggregation protocol that mitigates Byzantine attacks by having each EV filter malicious updates using an adaptive threshold based on deviations from its reference model update. Oracle nodes, responsible for inter-group aggregation, employ robust clustering to isolate and aggregate model updates from trustworthy EV groups. Comprehensive experimental evaluations demonstrate that FLECA matches FedProx convergence under benign conditions and significantly outperforms existing defenses with attack impact scores below 0.10 in adaptive adversarial scenarios. Furthermore, several learning experiments with multitask models confirm the effectiveness and fairness of the incentive mechanism. Finally, on-chain and off-chain benchmarks validate the practicality of ABC-DFL.


2605.20726 2026-06-18 stat.ME cs.LG stat.ML Version update

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

Ziang Song, Ying Jin, Emmanuel J. Candès

Affiliations: Department of Statistics, Stanford University; Department of Statistics and Data Science, University of Pennsylvania; Department of Mathematics, Stanford University

AI summary: This paper derives everywhere-valid bounds on the false discovery proportion (FDP) in multiple testing problems, constructing a high-probability envelope so that the guarantee holds under any post hoc choice of threshold, and demonstrates the method on outlier detection and conformal selection.

Comments 34 pages, 12 figures. Code available at https://github.com/sza919/everywhere-valid-fdp-bounds-in-conformal-inference

Abstract

Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
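
As an illustration of the baseline this abstract contrasts with, the sketch below computes conformal p-values from calibration scores and applies the Benjamini-Hochberg procedure at a fixed level. It is a minimal sketch, not the authors' method: the function names, the toy scores, and the convention that a larger score means a more atypical sample are assumptions made here. The code gives only the usual expected-value control at a threshold fixed in advance, not the simultaneous FDP bounds the paper derives.

```python
def conformal_p_values(calib_scores, test_scores):
    """p_j = (1 + #{i : calib_i >= test_j}) / (n + 1).

    Assumes larger scores are more atypical and that calibration
    scores come from inliers (hypothetical setup for illustration).
    """
    n = len(calib_scores)
    return [(1 + sum(c >= t for c in calib_scores)) / (n + 1) for t in test_scores]


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices rejected by BH at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k = 0  # largest rank whose p-value passes its step-up threshold
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Toy data: nine calibration scores and four test scores.
calib = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
test = [0.05, 0.95, 0.55, 1.2]

p = conformal_p_values(calib, test)
rejected = benjamini_hochberg(p, alpha=0.25)
print(p)         # [1.0, 0.1, 0.5, 0.1]
print(rejected)  # [1, 3]
```

Note that `alpha` is fixed before looking at `p`. Choosing the threshold after inspecting the p-values is exactly the post hoc selection that, according to the abstract, breaks the guarantee of this kind of procedure.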


2605.12713 2026-06-18 quant-ph cs.AI Version update

Controllable Quantum Memory Capacity in Quantum Reservoir Networks with Tunable partial-SWAPs

Erik L. Connerty, Ethan N. Evans

Affiliations: University of South Carolina - Columbia; Qodex Quantum

AI summary: This paper proposes a tunable partial-SWAP mechanism for controlling the rate of memory decay in quantum reservoir networks, validated in simulation and on IBM QPUs and aimed at noisy intermediate-scale quantum (NISQ) processors.

Comments 14 pages, 9 figures

Abstract

In the field of quantum reservoir computing (QRC), many different computational models and architectures have been proposed. From these models, we identify feedback-based models -- which use a feedback mechanism to re-embed classical measurements from the QRC -- and recurrent models -- which use a multi-register approach with memory and readout qubits -- as the two major competing architectures that have been discussed and validated on hardware. In this paper, we advance upon the recurrent architectures, which employ a two-register approach to endow the QRC with a fading memory. While these approaches have been validated on hardware and have demonstrated great real-world performance on noisy intermediate-scale quantum (NISQ) quantum processing units (QPUs), the exact mechanism through which the memory capacity arises is not completely understood or fully controllable. We therefore augment the recurrent approaches and present a hardware-realizable mechanism, which we call a tunable partial-SWAP, that allows for direct control of the rate of memory dissipation from a quantum reservoir network (QRN) implemented on a gate-based QPU. The theory behind this mechanism is discussed in terms of a controlled amplitude-damping channel, and validation experiments using a randomized short-term memory capacity (STMC) recall benchmark and the NARMA-5 dataset are conducted using simulation and IBM QPUs, respectively.


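2603.15988 2026-06-18 eess.AS cs.AI cs.LG Version update

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson

Affiliations: University of Illinois Urbana-Champaign, IL, USA; Korea Advanced Institute of Science & Technology, KR

AI summary: Proposes a three-stage framework that uses unlabeled dysarthric speech and typical speech datasets: a teacher model generates pseudo-labels, followed by pretraining with label-aware contrastive learning and then fine-tuning. The full framework reaches an average SRCC of 0.761 on five unseen datasets, and its Whisper-based baseline significantly outperforms existing DSQA predictors.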

Comments Accepted to Interspeech 2026 Long Paper Track

Abstract

Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.


2604.14906 2026-06-18 physics.bio-ph cs.LG Version update

Unraveling the Mechanism of Drug Binding to SARS-CoV-2 RNA Pseudoknot with Thermodynamics-Driven Machine Learning

Mariia Ivonina, Jakub Rydzewski

Affiliations: Platform of Inter/Transdisciplinary Energy Research, Kyushu University; Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University

AI summary: This study uses a thermodynamics-driven machine learning method, spectral map, to learn collective variables from all-atom molecular dynamics trajectories. It shows that ligand binding destabilizes the SARS-CoV-2 RNA pseudoknot in a topology-selective way and identifies the protonation state as a key factor in modeling RNA-targeted drug action.

Abstract

The pseudoknot secondary structure in SARS-CoV-2 RNA is essential for regulating protein synthesis through $-1$ programmed ribosomal frameshifting ($-1$ PRF), a mechanism that allows the virus to generate both structural and non-structural proteins from overlapping reading frames. This pseudoknot exhibits both threaded and unthreaded long-lived topologies. The influence of ligand binding on its folding is a process critical for the development of $-1$ PRF small-molecule inhibitors. Understanding this process through unbiased molecular dynamics (MD) simulations can be facilitated by introducing collective variables (CVs) that capture the corresponding slowest dynamical modes. Here, we use spectral map (SM), a thermodynamics-driven machine learning technique, to learn such CVs directly from all-atom MD trajectories of the SARS-CoV-2 RNA pseudoknot in complex with the $-1$ PRF inhibitor merafloxacin and its two structural analogs in neutral and ionized forms. Free-energy landscapes (FELs) derived from the learned CVs indicate that ligand-induced destabilization is topology-selective. In the threaded pseudoknot, the inhibitors destabilize the S2 stem, while in the unthreaded pseudoknot, destabilization occurs in the S1 and S3 stems. Furthermore, the extent to which each ligand reshapes the FEL matches experimentally reported antiviral potency, whereas the protonation state qualitatively alters dynamics within the same RNA topology. Overall, our results show how pseudoknot topology, ligand type, and protonation state collectively influence the slow conformational dynamics of viral RNA and establish physiological protonation as a critical factor for modeling RNA-targeted drug action.


2604.06367 2026-06-18 cs.CR cs.AI cs.LG Version update

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz

Affiliations: University of Wisconsin-Madison

AI summary: Proposes the WebSP-Eval framework, which uses 200 task instances and an automated evaluator to test multimodal large language models on website security and privacy tasks, and finds that stateful UI elements such as toggles cause more than 45% task failure across many models.

Comments Accepted at PETS 2026. Project Page: https://wiscprivacy.com/webspeval/

Abstract

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully execute user-facing website security and privacy tasks, such as managing cookie preferences, configuring privacy-sensitive account settings, or revoking inactive sessions. To address this gap, we introduce WebSP-Eval, an evaluation framework for measuring web agent performance on website security and privacy tasks. WebSP-Eval comprises 1) a manually crafted task dataset of 200 task instances across 28 websites; 2) a robust agentic system supporting account and initial state management across runs using a custom Google Chrome extension; and 3) an automated evaluator. We evaluate a total of 8 web agent instantiations using state-of-the-art multimodal large language models, conducting a fine-grained analysis across websites, task categories, and UI elements. Our evaluation reveals that current models have limited autonomous exploration capabilities, which prevents them from reliably solving website security and privacy tasks, and that they struggle with specific task categories and websites. Crucially, we identify stateful UI elements as a primary reason for agent failure, with toggles causing more than 45% task failure across many models.


2508.11211 2026-06-18 eess.IV cs.CV Version update

Efficient Image-to-Image Schrödinger Bridge for CT Field of View Extension

Zhenhao Li, Song Ni, Long Yang, Xiaojie Yin, Haijun Yu, Jiazhou Wang, Hongbin Han, Weigang Hu, Yixing Huang

Affiliations: Institute of Medical Technology, Peking University Health Science Center; Shanghai Cancer Center, Fudan University; Department of Electrical and Computer Engineering, University of Massachusetts Lowell; Beijing Key Laboratory of Intelligent Neuromodulation and Brain Disorder Treatment

AI summary: Proposes a CT field-of-view extension framework based on the image-to-image Schrödinger Bridge (I²SB) diffusion model, which directly learns a stochastic mapping between limited-FOV and extended-FOV images and enables fast one-step inference, surpassing existing diffusion models in both accuracy and speed.

Comments 12 pages

Journal ref IEEE Transactions on Radiation and Plasma Medical Sciences 2026

Abstract

Computed tomography (CT) is a cornerstone imaging modality for non-invasive, high-resolution visualization of internal anatomical structures. However, when the scanned object exceeds the scanner's field of view (FOV), projection data are truncated, resulting in incomplete reconstructions and pronounced artifacts near FOV boundaries. Conventional reconstruction algorithms struggle to recover accurate anatomy from such data, limiting clinical reliability. Deep learning approaches have been explored for FOV extension, with diffusion generative models representing the latest advances in image synthesis. Yet, conventional diffusion models are computationally demanding and slow at inference due to their iterative sampling process. To address these limitations, we propose an efficient CT FOV extension framework based on the image-to-image Schrödinger Bridge (I$^2$SB) diffusion model. Unlike traditional diffusion models that synthesize images from pure Gaussian noise, I$^2$SB learns a direct stochastic mapping between paired limited-FOV and extended-FOV images. This direct correspondence yields a more interpretable and traceable generative process, enhancing anatomical consistency and structural fidelity in reconstructions. I$^2$SB achieves superior quantitative performance, with root-mean-square error (RMSE) values of 49.8 HU on simulated noisy data and 152.0 HU on real data, outperforming state-of-the-art diffusion models such as conditional denoising diffusion probabilistic models (cDDPM) and patch-based diffusion methods. Moreover, its one-step inference enables reconstruction in just 0.19 s per 2D slice, representing over a 700-fold speedup compared to cDDPM (135 s) and surpassing DiffusionGAN (0.58 s), the second fastest. This combination of accuracy and efficiency indicates that I$^2$SB has potential for real-time or clinical deployment.


2604.03275 2026-06-18 physics.ao-ph cs.AI cs.LG Version update

IPSL-AID: Generative Diffusion Models for Climate Downscaling from Global to Regional Scales

Kishanthan Kingston, Olivier Boucher, Freddy Bouchet, Pierre Chapel, Rosemary Eade, Jean-Francois Lamarque, Redouane Lguensat, Kazem Ardaneh

Affiliations: Climate Modeling Center; Sorbonne University; CNRS; IPSL; Paris; France

AI summary: Presents IPSL-AID, a tool based on denoising diffusion probabilistic models that uses ERA5 reanalysis data to generate 0.25° temperature, wind, and precipitation fields from coarse-resolution inputs. It models the probability distribution of fine-scale features to quantify uncertainty and accurately reconstructs statistical distributions, extreme events, and spatial structures.

Comments 17 pages, 12 figures, submitted to Climate Informatique 2026, to appear in Environmental Data Science

2604.00730 2026-06-18 cs.CY cs.AI cs.LG cs.SE Version update

A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch

Ricardo Hidalgo-Aragón, Jesús M. González-Barahona, Gregorio Robles

Affiliations: Universidad Rey Juan Carlos

AI summary: Proposes a CEFR-based assessment framework for Scratch projects that uses Fuzzy C-Means clustering to grade more than 2 million projects, identifies a B2 bottleneck, and introduces a classification-certainty metric to balance automated feedback with manual review.

Comments Best Paper Award CSEDU 2026 -Minor change FPC fix-

Abstract

Context: Schools, training platforms, and technology firms increasingly need to assess programming proficiency at scale with transparent, reproducible methods that support personalized learning pathways. Objective: This study introduces a pedagogical framework for Scratch project assessment, aligned with the Common European Framework of Reference (CEFR), providing universal competency levels for students and teachers alongside actionable insights for curriculum design. Method: We apply Fuzzy C-Means clustering to 2,008,246 Scratch projects evaluated via Dr.Scratch, implementing an ordinal criterion to map clusters to CEFR levels (A1-C2), and introducing enhanced classification metrics that identify transitional learners, enable continuous progress tracking, and quantify classification certainty to balance automated feedback with instructor review. Impact: The framework enables diagnosis of systemic curriculum gaps, notably a "B2 bottleneck" where only 13.3% of learners reside due to the cognitive load of integrating Logic, Synchronization, and Data Representation, while providing certainty-based triggers for human intervention.
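
To make the clustering step in this abstract concrete, the sketch below runs the standard Fuzzy C-Means updates on scalar data and reads off a per-item certainty. It is a minimal sketch under stated assumptions: the one-dimensional toy scores, the function names, and the use of the top membership as the certainty measure are illustrative choices made here, not the paper's features or metrics.

```python
def memberships(xs, centers, m=2.0):
    """Fuzzy membership of each point in each cluster (each row sums to 1)."""
    eps = 1e-12  # avoids division by zero when a point sits on a center
    power = 2.0 / (m - 1.0)
    rows = []
    for x in xs:
        d = [abs(x - c) + eps for c in centers]
        rows.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
    return rows


def fuzzy_c_means(xs, centers, m=2.0, iters=100):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(centers)
    for _ in range(iters):
        u = memberships(xs, centers, m)
        # Each center is the mean of the data weighted by membership ** m.
        centers = [
            sum(u[k][i] ** m * xs[k] for k in range(len(xs)))
            / sum(u[k][i] ** m for k in range(len(xs)))
            for i in range(len(centers))
        ]
    return centers, memberships(xs, centers, m)


# Hypothetical one-dimensional project scores (not Dr.Scratch data).
scores = [1.0, 1.2, 0.9, 5.0, 5.2, 4.8, 3.0]
centers, u = fuzzy_c_means(scores, centers=[0.0, 6.0])

# Assumed certainty measure: the largest membership of each item.
certainty = [max(row) for row in u]
print(centers)
print(certainty)
```

In this toy run the score 3.0 sits between the two clusters, so its memberships are nearly equal and its certainty is low. An item like that is the kind a certainty-based trigger could route to an instructor rather than to automated feedback.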


2603.05128 2026-06-18 eess.AS cs.SD Version update

PolyBench: A Benchmark for Compositional Reasoning in Polyphonic Audio

Yuanjian Chen, Yang Xiao, Han Yin, Xubo Liu, Jinjie Huang, Ting Dang

Affiliations: Harbin University of Science and Technology; The University of Melbourne; KAIST; University of Surrey

AI summary: To address the lack of evaluation of compositional reasoning in polyphonic audio, proposes the PolyBench benchmark, with five subsets covering counting, classification, detection, concurrency, and duration estimation. The evaluation finds that existing large audio language models degrade consistently in polyphonic scenarios.

Comments Accepted by INTERSPEECH 2026

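2603.06310 2026-06-18 eess.AS cs.CL cs.SD Version update

Continual Adaptation for Pacific Indigenous Speech Recognition

Yang Xiao, Aso Mahmudi, Nick Thieberger, Eliathamby Ambikairajah, Eun-Jung Holden, Ting Dang

Affiliations: The University of Melbourne; UNSW Sydney

AI summary: Addressing data scarcity and catastrophic forgetting for Pacific Indigenous languages, this work studies adaptation strategies for speech foundation models and finds that LoRA suffers catastrophic forgetting during sequential learning, which calls for tailored, robust adaptation methods.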

Comments Accepted by Interspeech 2026

Abstract

Speech foundation models struggle with low-resource Pacific Indigenous languages because of severe data scarcity. Furthermore, full fine-tuning risks catastrophic forgetting. To address this gap, we present an empirical study adapting models to real-world Pacific datasets. We investigate the impact of data volume, adaptation strategies, and representational drift on speech foundation models for various Pacific languages. Additionally, we analyze a continual learning framework for sequential language acquisition. Empirical results across three distinct Pacific Indigenous languages demonstrate that adapting to these linguistically distant languages induces severe internal representational drift. Consequently, these models face a strict plasticity and stability dilemma. While LoRA adapts well initially, it suffers from catastrophic forgetting during sequential learning. Ultimately, this study highlights the urgent need for robust adaptation strategies tailored to underrepresented languages.


2603.04895 2026-06-18 stat.ML cs.LG math.OC Version update

How Does the ReLU Activation Affect the Implicit Bias of Gradient Descent on High-dimensional Neural Network Regression?

Kuo-Wei Lai, Guanghui Wang, Molei Tao, Vidya Muthukumar

Affiliations: Georgia Institute of Technology

AI summary: Using a primal-dual analysis, this paper studies the implicit bias of gradient descent with the squared loss on a shallow ReLU model under high-dimensional random data, and proves that it approximates the minimum-ℓ2-norm solution with high probability, with a gap of order Θ(√(n/||λ||₁)).

Comments 66 pages

Abstract

Overparameterized ML models, including neural networks, typically induce underdetermined training objectives with multiple global minima. The implicit bias refers to the limiting global minimum that is attained by a common optimization algorithm, such as gradient descent (GD). In this paper, we characterize the implicit bias of GD for training a shallow ReLU model with the squared loss on high-dimensional random features. Prior work (Vardi and Shamir, 2021) showed that the implicit bias does not exist in the worst-case, or corresponds exactly to the minimum-$\ell_2$-norm interpolating solution under exactly orthogonal data (Boursier et al., 2022). Our work interpolates between these two extremes and shows that, for sufficiently high-dimensional random data, the implicit bias approximates the minimum-$\ell_2$-norm solution with high probability with a gap on the order $Θ(\sqrt{n/||λ||_1})$, where $n$ is the number of training examples and $λ$ denotes the spectrum of the data covariance matrix. Our results are obtained through a novel primal-dual analysis that carefully tracks the evolution of predictions, data-span coefficients, as well as their interactions, and show that the ReLU activation pattern quickly stabilizes with high probability over random data.


2511.14555 2026-06-18 q-bio.NC cs.AI Version update

DecNefSimulator: A Modular, Interpretable Framework for Decoded Neurofeedback Simulation Using Generative Models

Alexander Olza, Roberto Santana, David Soto

Affiliations: Intelligent Systems Group, University of the Basque Country (UPV/EHU); Consciousness Group, Basque Center on Cognition, Brain and Language (BCBL); Ikerbasque, Basque Foundation for Science

AI summary: Presents DecNefSimulator, a modular and interpretable simulation framework that formalizes decoded neurofeedback as a machine learning problem. It simulates participants with latent variable generative models, allowing direct observation of internal states and evaluation of how protocol design affects learning; it can reproduce empirical phenomena, identify failure conditions, and guide protocol design.

Abstract

Decoded Neurofeedback (DecNef) is a promising non-invasive approach to brain modulation with wide-ranging applications in neuromedicine and cognitive neuroscience. However, progress in DecNef research remains constrained by subject-dependent learning variability, reliance on indirect measures to quantify progress, and the high cost and time demands of experimentation. We present DecNefSimulator, a modular and interpretable simulation framework that formalizes DecNef as a machine learning problem. Beyond providing a virtual laboratory, DecNefSimulator enables researchers to model, analyze and understand neurofeedback dynamics. Using latent variable generative models as simulated participants, DecNefSimulator allows direct observation of internal cognitive states and systematic evaluation of how different protocol designs and subject characteristics influence learning. We demonstrate how this approach can (i) reproduce empirical phenomena of DecNef learning, (ii) identify conditions under which DecNef feedback fails to induce learning, and (iii) guide the design of more robust and reliable DecNef protocols in silico before human implementation. In summary, DecNefSimulator bridges computational modeling and cognitive neuroscience, offering a principled foundation for methodological innovation, robust protocol design, and ultimately, a deeper understanding of DecNef-based brain modulation.


2602.23006 2026-06-18 stat.ML cs.LG Version update

Regular Fourier Features for Nonstationary Gaussian Processes

Arsalan Jawaid, Abdullah Karatas, Jörg Seewig

Affiliations: Institute of Measurement and Sensor Technology, University of Kaiserslautern-Landau; Independent Researcher

AI summary: Proposes regular Fourier features, which discretize the spectral representation directly and so avoid probability assumptions, giving a low-rank approximation of nonstationary Gaussian processes that extends to kernel learning.

Comments 11 pages (9 main + 2 suppl.), 5 figures, 2 tables

Abstract

Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample locations. Spectral methods address this challenge by exploiting the Fourier representation and treating the spectral density as a probability distribution suitable for Monte Carlo approximation. Although this probabilistic interpretation is valid for stationary processes, it is overly restrictive for the nonstationary case, where spectral densities are generally not probability measures. We propose regular Fourier features for harmonizable processes to avoid this limitation. Our method discretizes the spectral representation directly, preserving the correlation structure among spectral weights without requiring probability assumptions. Under a finite-spectral-support assumption, this yields an efficient low-rank approximation that is consistent and positive semi-definite by construction. When the spectral density is unknown, the framework extends naturally to kernel learning from data. We demonstrate the method on locally stationary and harmonizable mixture kernels, the latter with a complex-valued spectral density, and apply the kernel-learning extension to real and synthetic data.


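2602.21160 2026-06-18 stat.ML cs.LG stat.AP stat.ME Version update

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

Mame Diarra Toure, David A. Stephens

Affiliations: Department of Mathematics and Statistics

AI summary: Epistemic uncertainty measures in safety-critical classification cannot distinguish between classes. This work decomposes mutual information into a per-class vector $C_k$ via a second-order Taylor expansion, with a $1/\mu_k$ weighting that corrects boundary suppression, and validates it on selective prediction for diabetic retinopathy, out-of-distribution detection, and a label-noise study.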

Comments 8 pages, 17 figures. Accepted at UAI 2026

Journal ref Forty-Second Annual Conference on Uncertainty in Artificial Intelligence, 2026. https://openreview.net/forum?id=cxuWscJmAr

Abstract

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=σ_k^{2}/(2μ_k)$, with $μ_k{=}\mathbb{E}[p_k]$ and $σ_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/μ_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7\% over MI and 56.2\% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
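
The per-class formula in this abstract can be checked numerically on a toy posterior. The sketch below computes $C_k = \sigma_k^2/(2\mu_k)$ from a handful of sampled class-probability vectors and compares $\sum_k C_k$ with the mutual information. It is an illustration only: the function names and the three sample vectors are assumptions made here, the variance is the population variance across samples, and every class is assumed to have a nonzero mean probability.

```python
import math


def entropy(p):
    """Shannon entropy in nats, treating 0 * log 0 as 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def per_class_contributions(samples):
    """Return (C, mu) with C_k = var_k / (2 * mu_k) across posterior samples."""
    s = len(samples)
    n_classes = len(samples[0])
    mu = [sum(p[k] for p in samples) / s for k in range(n_classes)]
    var = [sum((p[k] - mu[k]) ** 2 for p in samples) / s for k in range(n_classes)]
    return [var[k] / (2.0 * mu[k]) for k in range(n_classes)], mu


def mutual_information(samples, mu):
    """MI = entropy of the mean prediction minus the mean per-sample entropy."""
    return entropy(mu) - sum(entropy(p) for p in samples) / len(samples)


# Hypothetical posterior samples for one input with three classes.
samples = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]

c, mu = per_class_contributions(samples)
mi = mutual_information(samples, mu)
print(c, sum(c), mi)
```

On this input the first two classes have the same variance, but the rarer second class receives the larger $C_k$ because of the $1/\mu_k$ weighting, and the third class, on which all samples agree, contributes essentially nothing. The sum of the contributions stays close to the mutual information, as the second-order expansion suggests it should when the samples are not too spread out.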

2602.17187 2026-06-18 stat.ML cs.LG 版本更新

Anti-causal domain generalization: Leveraging unlabeled data

反因果域泛化：利用无标签数据

Sorawit Saengkyongam, Juan L. Gamella, Andrew C. Miller, Jonas Peters, Nicolai Meinshausen, Christina Heinze-Deml

发表机构 * Apple（苹果公司）； ETH Zürich（苏黎世联邦理工学院）

AI总结针对反因果设置下的域泛化问题，提出利用无标签数据估计环境扰动方向，通过惩罚模型对协变量均值和协方差变化的敏感性实现鲁棒性，并提供最坏情况最优性保证。

Comments Accepted at the International Conference on Machine Learning (ICML) 2026

AI中文摘要

域泛化问题关注的是学习在部署到新的、未见过的环境时对分布变化具有鲁棒性的预测模型。现有方法通常需要来自多个训练环境的标记数据，这在标记数据稀缺时限制了它们的适用性。在这项工作中，我们研究了反因果设置下的域泛化，其中结果导致观察到的协变量。在这种结构下，影响协变量的环境扰动不会传播到结果，这促使我们对模型对这些扰动的敏感性进行正则化。关键在于，估计这些扰动方向不需要标签，使我们能够利用来自多个环境的无标签数据。我们提出了两种方法，分别惩罚模型对跨环境协变量均值和协方差变化的敏感性，并证明这些方法在特定环境类别下具有最坏情况最优性保证。最后，我们在一个受控物理系统和一个生理信号数据集上展示了我们方法的实证性能。

英文摘要

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.
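
The idea of penalizing sensitivity to mean shifts can be sketched for a linear predictor. This is an illustrative ridge-style instance under stated assumptions, not the estimator proposed in the paper: the function names, the toy data, and the penalty form $\lambda \sum_e (w^\top d_e)^2$ (with $d_e$ the deviation of environment $e$'s covariate mean from the average of the environment means) are all hypothetical. The point it shows is that the shift directions are computed from unlabeled covariates only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def mean_shift_directions(unlabeled_envs):
    """Per-environment covariate mean minus the average of those means (no labels used)."""
    means = [[sum(col) / len(X) for col in zip(*X)] for X in unlabeled_envs]
    grand = [sum(col) / len(means) for col in zip(*means)]
    return [[m - g for m, g in zip(mu, grand)] for mu in means]

def fit_penalized(X, y, directions, lam):
    """Minimize mean squared error + lam * sum_e (w . d_e)^2 via the normal equations."""
    n, d = len(X), len(X[0])
    A = [[sum(x[i] * x[j] for x in X) / n for j in range(d)] for i in range(d)]
    b = [sum(x[i] * t for x, t in zip(X, y)) / n for i in range(d)]
    for v in directions:
        for i in range(d):
            for j in range(d):
                A[i][j] += lam * v[i] * v[j]
    return solve(A, b)

def sensitivity(w, directions):
    """Total squared change of the prediction along the estimated shift directions."""
    return sum(sum(wi * vi for wi, vi in zip(w, v)) ** 2 for v in directions)

# Toy labeled data from one environment (y = x1 + x2 exactly).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [3.0, 3.0, 7.0, 7.0]
# Toy unlabeled data from three environments; only the second feature's mean shifts.
envs = [
    [[0.0, 0.0], [2.0, 0.0]],
    [[0.0, 2.0], [2.0, 2.0]],
    [[0.0, 4.0], [2.0, 4.0]],
]
dirs = mean_shift_directions(envs)
w_plain = fit_penalized(X, y, dirs, lam=0.0)
w_robust = fit_penalized(X, y, dirs, lam=1e6)
print("unpenalized weights:", w_plain, "sensitivity:", sensitivity(w_plain, dirs))
print("penalized weights:", w_robust, "sensitivity:", sensitivity(w_robust, dirs))
```

With a large penalty the fitted weights stop relying on the feature whose mean moves across environments, at the cost of a worse in-distribution fit. Choosing `lam` trades off these two effects; the abstract's worst-case guarantees concern the authors' own estimators, not this sketch.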

2512.12850 2026-06-18 cs.AR cs.LG cs.SY eess.SY hep-ex 版本更新

KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation

KANELÉ：基于Kolmogorov-Arnold网络的高效LUT评估

Duc Hoang, Aarush Gupta, Philip Harris

发表机构 * Massachusetts Institute of Technology（麻省理工学院）

AI总结提出KANELÉ框架，利用Kolmogorov-Arnold网络（KAN）的独特性质，通过量化与剪枝协同优化，首次系统实现FPGA上的高效LUT映射，相比先前方法加速高达2700倍并节省大量资源。

Comments International Symposium on Field-Programmable Gate Arrays 2026 (ISFPGA'2026)

DOI: 10.1145/3748173.3779202

AI中文摘要

低延迟、资源高效的FPGA神经网络推理对于需要实时能力和低功耗的应用至关重要。基于查找表（LUT）的神经网络是一种常见解决方案，结合了强大的表示能力和高效的FPGA实现。在这项工作中，我们介绍了KANELÉ，一个利用Kolmogorov-Arnold网络（KAN）独特性质进行FPGA部署的框架。与传统的多层感知器（MLP）不同，KAN使用可学习的一维样条作为边缘激活函数，其域固定，这种结构天然适合离散化和高效的LUT映射。我们提出了第一个在FPGA上实现KAN的系统设计流程，通过量化与剪枝协同优化训练，以实现紧凑、高吞吐量和低延迟的KAN架构。我们的结果表明，与先前的KAN-on-FPGA方法相比，加速高达2700倍，并节省了数量级的资源。此外，KANELÉ在广泛使用的基准测试中匹配或超越了其他基于LUT的架构，特别是在涉及符号或物理公式的任务中，同时平衡了FPGA硬件上的资源使用。最后，我们通过将框架扩展到实时、高能效的控制系统，展示了其多功能性。

英文摘要

Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.
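
To see why a one-dimensional activation on a fixed domain maps naturally to a lookup table, consider the following minimal sketch. It is not the KANELÉ design flow: `math.tanh` stands in for a learned spline, and the bit widths, value ranges, and function names are assumptions chosen for illustration. The table is built once offline; evaluation is then a quantize-and-index operation, which is what an FPGA LUT implements.

```python
import math

def build_lut(fn, lo, hi, in_bits, out_bits, out_lo, out_hi):
    """Tabulate fn on 2**in_bits evenly spaced inputs in [lo, hi].

    Each output is clipped to [out_lo, out_hi] and stored as an out_bits integer code.
    """
    n = 1 << in_bits
    levels = (1 << out_bits) - 1
    table = []
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        y = min(max(fn(x), out_lo), out_hi)
        table.append(round((y - out_lo) / (out_hi - out_lo) * levels))
    return table

def lut_eval(table, x, lo, hi, out_bits, out_lo, out_hi):
    """Quantize x to the nearest table index and decode the stored integer code."""
    n = len(table)
    x = min(max(x, lo), hi)
    idx = round((x - lo) / (hi - lo) * (n - 1))
    levels = (1 << out_bits) - 1
    return out_lo + table[idx] / levels * (out_hi - out_lo)

# Stand-in for one learned edge activation on the fixed domain [-2, 2].
LO, HI, IN_BITS, OUT_BITS = -2.0, 2.0, 8, 8
table = build_lut(math.tanh, LO, HI, IN_BITS, OUT_BITS, -1.0, 1.0)

# Measure the worst-case approximation error on a dense grid.
grid = [LO + (HI - LO) * i / 1000 for i in range(1001)]
max_err = max(abs(lut_eval(table, x, LO, HI, OUT_BITS, -1.0, 1.0) - math.tanh(x)) for x in grid)
print("table entries:", len(table), "max abs error:", max_err)
```

Because the domain is fixed, the table size depends only on the chosen input bit width, so the accuracy/resource trade-off is controlled by two integers per activation. A multilayer perceptron neuron, by contrast, applies its nonlinearity to a weighted sum of many inputs, so a direct table would need to index on all of them jointly.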

2602.04796 2026-06-18 eess.AS cs.SD 版本更新

LALM-as-a-Judge: Benchmarking Large Audio-Language Models for Safety Evaluation in Multi-Turn Spoken Dialogues

LALM-as-a-Judge：用于多轮口语对话安全评估的大型音频语言模型基准测试

Amir Ivry, Shinji Watanabe

发表机构 * Computer Engineering, Technion – Israel Institute of Technology, Haifa, Israel（以色列理工学院计算机工程系，以色列海法）； Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA（卡内基梅隆大学语言技术研究所，美国匹兹堡）

AI总结针对口语对话中社会不安全内容评估仍以文本为中心、忽略韵律和转录失败的问题，提出包含24000个多轮口语对话的开放基准，评估6种大型音频语言模型在文本、音频和多模态设置下的敏感性、严重性顺序特异性和轮次位置偏差，发现音频提供非词汇证据，多模态增益非普遍且存在多种模式。

Comments Accepted to ICML 2026

AI中文摘要

对口语对话中社会不安全内容的评估仍然以文本为中心，忽略了韵律和转录失败。我们提出了LALM-as-a-Judge，其中包括一个包含24000个多轮口语对话的开放基准，每个对话包含一个局部不安全轮次，这些对话基于8个社会不安全类别和5个严重级别生成。我们评估了6种大型音频语言模型（LALMs）作为评判者，包括开源和闭源模型，在纯文本、纯音频和多模态设置下，针对对话中社会有害内容的敏感性、严重性顺序特异性和轮次位置偏差。结果表明，音频提供了超越转录语义的非词汇证据，并且多模态增益并非普遍存在，而是可以表现为文本锚定、平衡、保守和干扰，我们将这些归因于音频路径瓶颈和融合限制。我们将该基准定位为诊断工具，并为模型、模态和提示选择提供实践者指导。

英文摘要

Evaluation of socially unsafe content in spoken dialogues remains text-centric, missing prosody and transcription failures. We present LALM-as-a-Judge, which includes an open benchmark of 24,000 multi-turn spoken dialogues with one localized unsafe turn, generated out of 8 socially unsafe categories and 5 severity levels. We evaluate 6 large audio-language models (LALMs) as judges, open and closed-source, in text-only, audio-only, and multimodal setups by their sensitivity, severity-order specificity, and turn-position bias for socially harmful content in the dialogue. Results show that audio contributes non-lexical evidence beyond transcript semantics and that multimodal gains are not universal but can be text-anchored, balanced, conservative, and interfering, which we link to the audio pathway bottlenecks and fusion limits. We position the benchmark as diagnostic and derive practitioner guidance for model, modality, and prompts choices.

2602.02056 2026-06-18 cs.AR cs.LG cs.SY eess.SY stat.ML 版本更新

Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

基于Kolmogorov-Arnold网络中样条局部性的超快片上在线学习

Duc Hoang, Aarush Gupta, Philip Harris

发表机构 * MIT（麻省理工学院）

AI总结针对量子计算和核聚变控制等高频系统对亚微秒级在线学习的需求，提出利用Kolmogorov-Arnold网络的B样条局部性实现稀疏更新和定点量化鲁棒性，在FPGA上实现比MLP更高效、更具表达力的超快在线学习。

Comments Forty-Third International Conference on Machine Learning (ICML'26)

AI中文摘要

超快在线学习对于高频系统（如量子计算和核聚变控制）至关重要，这些系统中的自适应必须在亚微秒时间尺度内发生。满足这些需求需要在严格的内存约束下进行低延迟、固定精度的计算，而传统的多层感知器（MLP）在这种条件下既低效又数值不稳定。我们识别了Kolmogorov-Arnold网络（KAN）与这些约束相符的关键特性。具体来说，我们表明：（i）利用B样条局部性的KAN更新是稀疏的，从而实现优越的片上资源缩放；（ii）KAN对定点量化具有固有的鲁棒性。通过在现场可编程门阵列（FPGA）上实现定点在线训练（一种代表性的片上计算平台），我们证明基于KAN的在线学习器在一系列低延迟和资源受限的任务中比MLP显著更高效且更具表达力。据我们所知，这项工作首次展示了在亚微秒延迟下的无模型在线学习。

英文摘要

Ultrafast online learning is essential for high-frequency systems, such as controls for quantum computing and nuclear fusion, where adaptation must occur on sub-microsecond timescales. Meeting these requirements demands low-latency, fixed-precision computation under strict memory constraints, a regime in which conventional Multi-Layer Perceptrons (MLPs) are both inefficient and numerically unstable. We identify key properties of Kolmogorov-Arnold Networks (KANs) that align with these constraints. Specifically, we show that: (i) KAN updates exploiting B-spline locality are sparse, enabling superior on-chip resource scaling, and (ii) KANs are inherently robust to fixed-point quantization. By implementing fixed-point online training on Field-Programmable Gate Arrays (FPGAs), a representative platform for on-chip computation, we demonstrate that KAN-based online learners are significantly more efficient and expressive than MLPs across a range of low-latency and resource-constrained tasks. To our knowledge, this work is the first to demonstrate model-free online learning at sub-microsecond latencies.

2601.14288 2026-06-18 astro-ph.CO cs.AI cs.CE gr-qc hep-th 版本更新

DeepInflation: an AI agent for research and model discovery of inflation

DeepInflation：用于暴胀研究与模型发现的AI智能体

Ze-Yu Peng, Hao-Shi Yuan, Qi Lai, Jun-Qian Jiang, Gen Ye, Jun Zhang, Yun-Song Piao

发表机构 * School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China ； International Centre for Theoretical Physics Asia-Pacific, University of Chinese Academy of Sciences, 100190 Beijing, China ； Taiji Laboratory for Gravitational Wave Universe, University of Chinese Academy of Sciences, 100049 Beijing, China ； School of Fundamental Physics and Mathematical Sciences, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China ； Institute of Theoretical Physics, Chinese Academy of Sciences, P.O. Box 2735, Beijing 100190, China ； Département de Physique Théorique, Université de Genève, 24 quai Ernest-Ansermet, CH-1211 Genève 4, Switzerland

AI总结提出基于多智能体架构的AI智能体DeepInflation，集成大语言模型、符号回归引擎和检索增强生成知识库，自动发现与最新观测一致的单场慢滚暴胀势，并解释理论背景。

AI中文摘要

我们提出了DeepInflation，一个专为暴胀宇宙学中的研究和模型发现而设计的AI智能体。基于多智能体架构，DeepInflation将大语言模型（LLMs）与符号回归（SR）引擎以及检索增强生成（RAG）知识库相结合。该框架使智能体能够自动探索和验证广阔的暴胀势景观，同时将其输出建立在既定的理论文献基础上。我们证明，DeepInflation能够成功发现与最新观测（以ACT DR6结果为例）或任意给定的$n_s$和$r$一致的简单且可行的单场慢滚暴胀势，并为晦涩的暴胀场景提供准确的理论背景。DeepInflation作为宇宙学中新一代自主科学发现引擎的原型，使研究人员和非专家都能使用自然语言探索暴胀景观。该智能体可从此网址获取：https://github.com/pengzy-cosmo/DeepInflation。

英文摘要

We present DeepInflation, an AI agent designed for research and model discovery in inflationary cosmology. Built upon a multi-agent architecture, DeepInflation integrates Large Language Models (LLMs) with a symbolic regression (SR) engine and a retrieval-augmented generation (RAG) knowledge base. This framework enables the agent to automatically explore and verify the vast landscape of inflationary potentials while grounding its outputs in established theoretical literature. We demonstrate that DeepInflation can successfully discover simple and viable single-field slow-roll inflationary potentials consistent with the latest observations (with the ACT DR6 results taken as an example) or any given $n_s$ and $r$, and provide accurate theoretical context for obscure inflationary scenarios. DeepInflation serves as a prototype for a new generation of autonomous scientific discovery engines in cosmology, which enables researchers and non-experts alike to explore the inflationary landscape using natural language. This agent is available at https://github.com/pengzy-cosmo/DeepInflation.

2601.00567 2026-06-18 cs.IR cs.AI 版本更新

Improving Scientific Document Retrieval with Academic Concept Index

利用学术概念索引改进科学文献检索

Jeyun Lee, Junhyoung Lee, Wonbin Kweon, Bowen Jin, Yu Zhang, Susik Yoon, Dongha Lee, Hwanjo Yu, Jiawei Han, Seongku Kang

发表机构 * Korea University, Seoul, South Korea ； University of Illinois Urbana-Champaign, Champaign, United States ； Texas A&M University, College Station, United States ； Yonsei University, Seoul, South Korea ； Pohang University of Science and Technology

AI总结针对通用检索器在科学领域因词汇和需求不匹配而表现不佳的问题，提出基于学术概念索引的方法，通过概念覆盖查询生成和概念聚焦上下文扩展，提升查询质量和检索性能。

Comments Accepted for publication in ACM TIST, 2026

AI中文摘要

将通用领域的检索器适应到科学领域具有挑战性，原因在于缺乏大规模领域特定的相关性标注，以及词汇和信息需求的显著不匹配。最近的方法通过两个独立方向利用大型语言模型（LLMs）来解决这些问题：（1）生成合成查询以进行微调，（2）生成辅助上下文以支持相关性匹配。然而，这两个方向都忽略了科学文档中嵌入的多样化学术概念，常常产生冗余或概念狭窄的查询和上下文。为了解决这一限制，我们引入了一个学术概念索引，该索引从论文中提取关键概念，并在学术分类的指导下进行组织。这个结构化索引为改进这两个方向奠定了基础。首先，我们通过基于概念覆盖的查询生成（CCQGen）来增强合成查询生成，该方法自适应地以未覆盖的概念为条件，生成具有更广泛概念覆盖的互补查询。其次，我们通过概念聚焦的辅助上下文（CCExpand）来增强上下文增强，该方法利用一组文档片段作为对概念感知的CCQGen查询的简洁响应。大量实验表明，将学术概念索引纳入查询生成和上下文增强中，可以产生更高质量的查询、更好的概念对齐以及改进的检索性能。

英文摘要

Adapting general-domain retrievers to scientific domains is challenging due to the scarcity of large-scale domain-specific relevance annotations and the substantial mismatch in vocabulary and information needs. Recent approaches address these issues through two independent directions that leverage large language models (LLMs): (1) generating synthetic queries for fine-tuning, and (2) generating auxiliary contexts to support relevance matching. However, both directions overlook the diverse academic concepts embedded within scientific documents, often producing redundant or conceptually narrow queries and contexts. To address this limitation, we introduce an academic concept index, which extracts key concepts from papers and organizes them guided by an academic taxonomy. This structured index serves as a foundation for improving both directions. First, we enhance the synthetic query generation with concept coverage-based generation (CCQGen), which adaptively conditions LLMs on uncovered concepts to generate complementary queries with broader concept coverage. Second, we strengthen the context augmentation with concept-focused auxiliary contexts (CCExpand), which leverages a set of document snippets that serve as concise responses to the concept-aware CCQGen queries. Extensive experiments show that incorporating the academic concept index into both query generation and context augmentation leads to higher-quality queries, better conceptual alignment, and improved retrieval performance.

2505.15215 2026-06-18 stat.ML cs.LG stat.ME 版本更新

Clustering and Pruning in Causal Data Fusion

因果数据融合中的聚类与剪枝

Otto Tabell, Santtu Tikka, Juha Karvanen

发表机构 * Department of Mathematics and Statistics（数学与统计学系）

AI总结针对多数据源因果融合中变量增多导致计算复杂的问题，提出剪枝和聚类预处理方法，基于小图推断大图中因果效应的可识别性并给出识别函数。

AI中文摘要

数据融合，即结合观测数据和实验数据的过程，可以使得原本不可识别的因果效应变得可识别。尽管针对特定场景已经开发了识别算法，但do-calculus仍然是因果数据融合的唯一通用工具，特别是当某些变量存在于部分数据源而其他数据源中没有时。然而，基于do-calculus的方法可能随着变量数量增加和因果图复杂度增长而面临计算挑战。因此，有必要在保留必要特征的同时减小此类模型的规模。为此，我们提出将剪枝（移除不必要的变量）和聚类（合并变量）作为因果数据融合的预处理操作。我们将先前关于单一数据源的结果进行推广，并推导出在多数据源情况下应用剪枝和聚类的条件。我们给出了基于较小图推断较大图中因果效应可识别性或不可识别性的充分条件，并展示了如何为可识别的因果效应获得相应的识别函数。来自流行病学和社会科学的例子展示了这些结果的应用。

英文摘要

Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive conditions for applying pruning and clustering in the case of multiple data sources. We give sufficient conditions for inferring the identifiability or non-identifiability of a causal effect in a larger graph based on a smaller graph and show how to obtain the corresponding identifying functional for identifiable causal effects. Examples from epidemiology and social science demonstrate the use of the results.

2511.19468 2026-06-18 cs.DC cs.ET cs.LG physics.space-ph 版本更新

Towards a future space-based, highly scalable AI infrastructure system design

面向未来天基、高度可扩展的AI基础设施系统设计

Blaise Agüera y Arcas, Travis Beals, Maria Biggs, Jessica V. Bloom, Thomas Fischbacher, Konstantin Gromov, Urs Köster, Rishiraj Pravahan, James Manyika

发表机构 * Google（谷歌）

AI总结本文探索利用卫星集群、太阳能板、自由空间光通信和TPU芯片构建天基机器学习计算系统，并分析辐射测试、发射成本等可行性。

Comments 18 pages, 4 figures. v2: Cleaned up references. Improved rough estimates. Fixed typos. Re-ran radiation test with improved methods

AI中文摘要

如果AI是一种基础通用技术，我们应该预期对AI计算和能源的需求将持续增长。太阳是太阳系中最大的能源来源，因此值得考虑未来的AI基础设施如何最有效地利用这种能量。本文探索了用于太空机器学习的可扩展计算系统，该系统使用配备太阳能板的卫星群、自由空间光通信的星间链路以及谷歌张量处理单元（TPU）加速芯片。为了促进高带宽、低延迟的星间通信，卫星将近距离飞行。我们通过一个半径为1公里的81颗卫星集群说明了编队飞行的基本方法，并描述了一种使用基于高精度ML模型来控制大规模星座的方法。Trillium TPU经过了辐射测试。它们在总电离剂量相当于5年任务寿命的情况下存活，没有永久性故障，并针对位翻转错误进行了表征。发射成本是整体系统成本的关键部分；学习曲线分析表明，到2030年代中期，发射到近地轨道（LEO）的成本可能达到$\lesssim$200美元/公斤。

英文摘要

If AI is a foundational general-purpose technology, we should anticipate that demand for AI compute -- and energy -- will continue to grow. The Sun is by far the largest energy source in our solar system, and thus it warrants consideration how future AI infrastructure could most efficiently tap into that power. This work explores a scalable compute system for machine learning in space, using fleets of satellites equipped with solar arrays, inter-satellite links using free-space optics, and Google tensor processing unit (TPU) accelerator chips. To facilitate high-bandwidth, low-latency inter-satellite communication, the satellites would be flown in close proximity. We illustrate the basic approach to formation flight via an 81-satellite cluster of 1 km radius, and describe an approach for using high-precision ML-based models to control large-scale constellations. Trillium TPUs are radiation tested. They survive a total ionizing dose equivalent to a 5 year mission life without permanent failures, and are characterized for bit-flip errors. Launch costs are a critical part of overall system cost; a learning curve analysis suggests launch to low-Earth orbit (LEO) may reach $\lesssim$\$200/kg by the mid-2030s.

2509.03734 2026-06-18 cs.DS cs.LG 版本更新

How fast can you find a good hypothesis?

你能多快找到一个好的假设？

Anders Aamand, Maryam Aliakbarpour, Justin Y. Chen, Sandeep Silwal

发表机构 * BARC, University of Copenhagen（哥本哈根大学BARC）； Rice University（莱斯大学）； MIT（麻省理工学院）； University of Wisconsin-Madison（威斯康星大学麦迪逊分校）

AI总结研究假设选择问题，提出一种运行时间为poly(n)的混合输出算法，达到C=3-2/n的近似保证，并将正确算法的运行时间改进为Õ(n/(δε²))。

Comments Abstract abridged to meet arxiv requirements. This is the full version of a paper appearing at COLT 2026

AI中文摘要

在假设选择问题中，我们被给予对有限候选分布（假设）集合 $\mathcal{H} = \{H_1, \ldots, H_n\}$ 的样本和查询访问，以及来自未知分布 $P$ 的样本，两者都在域 $\mathcal{X}$ 上。目标是输出一个分布 $Q$，使其到 $P$ 的距离与 $\mathcal{H}$ 中最近假设的距离相当。具体来说，如果最小距离是 $\mathsf{OPT}$，我们旨在输出 $Q$，使得以至少 $1-\delta$ 的概率，其到 $P$ 的总变差距离至多为 $C \cdot \mathsf{OPT} + \varepsilon$。对于恰当（proper）算法（其中 $Q \in \mathcal{H}$），最优近似为 $C=3$，使用来自 $P$ 的 $\Theta(\log(n/\delta)/\varepsilon^2)$ 个样本；对于非恰当（improper）算法（其中 $Q$ 不一定在 $\mathcal{H}$ 中），最优近似为 $C=2$，使用来自 $P$ 的 $\tilde{\Theta}(\log(n/\delta)/\varepsilon^2)$ 个样本。在非恰当设置中，达到 $C=2$ 的算法 [Bousquet, Braverman, Kol, Efremenko, Moran, FOCS 2021] 的运行时间随 $|\mathcal{X}|$ 多项式增长——对于实值分布，它无法在有限时间内运行。改进运行时间的一个有希望的途径是考虑输出假设混合 $Q$ 的非恰当算法，因为这样的分布可以用 $n$ 个内存字表示。我们证明 (1) 一个下界：除非样本数量是 $|\mathcal{X}|$ 的多项式，否则任何输出混合的算法都无法实现比 $C = 3-2/n$ 更好的近似，以及 (2) 一个运行时间为 $\text{poly}(n)$ 并达到相同近似保证的算法。在恰当设置中，[Aliakbarpour, Bun, Smith, NeurIPS 2024] 提供了一个 $C=3$ 且运行时间为 $\tilde{O}(n/(\delta^3\varepsilon^3))$ 的算法。我们将时间复杂度改进为 $\tilde{O}(n/(\delta \varepsilon^2))$，显著减少了对置信度和误差参数的依赖。

英文摘要

In the hypothesis selection problem, we are given sample and query access to a finite set of candidate distributions (hypotheses), $\mathcal{H} = \{H_1, \ldots, H_n\}$, and samples from an unknown distribution $P$, both over a domain $\mathcal{X}$. The goal is to output a distribution $Q$ whose distance to $P$ is comparable to that of the nearest hypothesis in $\mathcal{H}$. Specifically, if the minimum distance is $\mathsf{OPT}$, we aim to output $Q$ such that, with probability at least $1-\delta$, its total variation distance to $P$ is at most $C \cdot \mathsf{OPT} + \varepsilon$. The optimal approximation for proper algorithms (where $Q \in \mathcal{H}$) is $C=3$ using $\Theta(\log(n/\delta)/\varepsilon^2)$ samples from $P$ and for improper algorithms (where $Q$ is not necessarily in $\mathcal{H}$) is $C=2$ using $\tilde{\Theta}(\log(n/\delta)/\varepsilon^2)$ samples from $P$. In the improper setting, the algorithm achieving $C=2$ [Bousquet, Braverman, Kol, Efremenko, Moran, FOCS 2021] runs in time which grows polynomially with $|\mathcal{X}|$ -- it does not run in finite time for real-valued distributions. A promising path towards improved runtime is to consider improper algorithms which output a mixture $Q$ of the hypotheses as such a distribution can be represented in $n$ words of memory. We show (1) a lower bound that no algorithm which outputs a mixture can achieve approximation better than $C = 3-2/n$ unless the number of samples is polynomial in $|\mathcal{X}|$, as well as (2) an algorithm which runs in time $\text{poly}(n)$ and achieves the same approximation guarantee. In the proper setting, [Aliakbarpour, Bun, Smith, NeurIPS 2024] provided an algorithm with $C=3$ running in $\tilde{O}(n/(\delta^3\varepsilon^3))$ time. We improve this time complexity to $\tilde{O}(n/(\delta\varepsilon^2))$, significantly reducing the dependence on the confidence and error parameters.

2511.00802 2026-06-18 cs.SE cs.CL cs.LG 版本更新

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

GrowthHacker: 使用代码修改型LLM代理的自动离线策略评估优化

Jie JW Wu, Ayanda Patrick Herlihy, Ahmad Saleem Mirza, Ali Afoud, Fatemeh Fard

发表机构 * Michigan Technological University, Houghton（密歇根理工大学，霍顿）； Birmingham City University（伯明翰城市大学）； University of British Columbia, Kelowna（不列颠哥伦比亚大学，基洛纳）

AI总结提出GrowthHacker基准，利用LLM代理自动迭代修改代码以优化离线策略评估（OPE）实现，在Open Bandit Pipeline和Scope-RL上评估多种框架，证明基于LLM的代理可作为自动增长黑客持续改进OPE系统。

Comments Accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM), 2026

DOI: 10.1145/3815588

AI中文摘要

随着数据驱动开发的广泛采用，在线A/B测试已成为衡量新技术效果的既定方法。然而，部署在线实验需要设计、实现和部署资源，并可能对用户产生负面影响（例如，不安全或不道德的结果），同时需要数周的数据收集。为了解决这一问题，离线策略评估（OPE）或离线A/B测试这一日益增长的研究领域，使用先前收集的日志数据离线评估新技术。OPE也是强化学习中的一个基本问题，在在线测试昂贵或风险高的领域（如医疗保健、推荐系统、教育和机器人技术）中非常重要。尽管代码生成大语言模型（LLM）和代理工作流取得了进展，但关于LLM和基于LLM的代理是否以及如何自动优化OPE实现，我们知之甚少。我们提出了GrowthHacker，这是一个基准测试，用于在大规模公共数据集上评估基线LLM和基于LLM的代理。GrowthHacker自主迭代修改代码，运行OPE，并使用指标指导后续优化。我们在Open Bandit Pipeline（OBP）和Scope-RL上评估方法，并开发了一个双代理框架，该框架解决了现有框架的局限性，同时降低了复杂性。在两个库中，双代理显示出最高的可靠性（98.1%-100%成功率）和正向结果率（78%），正向结果的中位改进为4.4%；CrewAI实现了最高的平均改进（37.9%），并且是唯一没有极端值失败的框架。AutoGen和Default各达到65%的正向结果率。这些结果证明了使用基于LLM的代理作为自动“增长黑客”持续改进OPE系统的可行性，对在手动优化成本高昂的情况下扩展数据驱动决策具有重要意义。

英文摘要

With data-driven development now widely adopted, online A/B testing is an established method for measuring the effects of new technologies. However, deploying online experiments demands resources for design, implementation, and deployment, and may negatively impact users (e.g., unsafe or unethical outcomes) while requiring weeks of data collection. To address this, the growing research area of off-policy evaluation (OPE), or offline A/B testing, assesses new technologies offline using previously collected logged data. OPE is also a fundamental problem in reinforcement learning and is important where online testing is expensive or risky, such as healthcare, recommender systems, education, and robotics. Despite advances in code-generation large language models (LLMs) and agentic workflows, little is known about whether and how LLMs and LLM-based agents can automatically optimize OPE implementations. We propose GrowthHacker, a benchmark that evaluates baseline LLMs and LLM-based agents on large-scale public datasets. GrowthHacker autonomously and iteratively modifies code, runs OPE, and uses the metrics to guide subsequent optimization. We evaluate methods on Open Bandit Pipeline (OBP) and Scope-RL, and develop a two_agent framework that addresses limitations of existing frameworks while reducing complexity. Across both libraries, two_agent shows the highest reliability (98.1%-100% success rate) and positive-outcome rate (78%), with a median improvement of 4.4% among positive outcomes; CrewAI achieves the highest average improvement (37.9%) and is the only framework with zero extreme-value failures. AutoGen and Default each reach 65% positive-outcome rates. These results establish the feasibility of using LLM-based agents as automated "growth hackers" to continuously improve OPE systems, with implications for scaling data-driven decision-making where manual optimization is expensive.

2511.00366 2026-06-18 stat.ML cs.CE cs.LG 版本更新

A Streaming Sparse Cholesky Method for Derivative-Informed Gaussian Process Surrogates Within Digital Twin Applications

面向数字孪生应用中导数信息高斯过程代理的流式稀疏Cholesky方法

Shridhar Vashishtha, Krishna Prasath Logakannan, Jacob Hochhalter, Shandian Zhe, Robert M. Kirby

发表机构 * Department of Mechanical Engineering, University of Utah, Salt Lake City, UT 84112, USA ； Kahlert School of Computing, University of Utah, Salt Lake City, UT 84112, USA ； Scientific Computing & Imaging Institute, University of Utah, Salt Lake City, UT 84112, USA

AI总结提出一种流式稀疏Cholesky方法，通过动态更新和导数信息增强高斯过程代理，降低协方差矩阵维度，实现数字孪生中飞机结构性能的实时预测。

AI中文摘要

数字孪生被开发用于模拟特定物理资产（或孪生体）的行为，它们可以由高保真基于物理的模型或代理组成。高精度代理通常优于多物理场模型，因为它们能够实时预测物理孪生体的未来状态。为了适应特定的物理孪生体，必须使用来自该物理孪生体的在役数据更新数字孪生模型。在本文中，我们结合并扩展了几项先前与代理相关的进展，旨在展示一个端到端的数字孪生（DT）解决方案，用于预测飞机结构（物理资产）的性能。为此，我们将高斯过程（GP）模型扩展到包含导数数据，以提高精度，并通过动态更新来吸收在役期间的物理孪生体数据。然而，包含导数数据会带来协方差矩阵维度增加的过高成本。我们通过改进的动态稀疏Cholesky线性系统求解器规避了这个问题。数值实验表明，导数增强的稀疏Cholesky GP方法在动态数据添加时产生了改进的模型预测精度。最后，我们在一个数字孪生框架内演示了所开发的算法，用于模拟航空航天飞行器中的疲劳裂纹扩展，从而通过我们组装的工程系统展示了数字孪生技术如何在实践中结合。

英文摘要

Digital twins are developed to model the behavior of a specific physical asset (or twin), and they can consist of high-fidelity physics-based models or surrogates. A highly accurate surrogate is often preferred over multi-physics models because it enables forecasting the physical twin's future state in real time. To adapt to a specific physical twin, the digital twin model must be updated using in-service data from that physical twin. In this paper, we combine and extend several previous surrogate-related advancements with the goal of demonstrating an end-to-end digital twin (DT) solution for predicting the performance of an aircraft structure (the physical asset). To this end, we extend Gaussian process (GP) models to include derivative data, for improved accuracy, with dynamic updating to ingest physical twin data during service. Including derivative data, however, comes at the prohibitive cost of an increased covariance matrix dimension. We circumvent this issue through our modified dynamic sparse Cholesky linear system solver. Numerical experiments demonstrate that the derivative-enhanced sparse Cholesky GP method yields improved prediction accuracy upon dynamic data additions. Lastly, we demonstrate the developed algorithm within a DT framework to model fatigue crack growth in an aerospace vehicle, thereby exhibiting through our assembled engineered system how digital twin technologies can be combined in practice.

2506.11139 2026-06-18 eess.IV cs.AI cs.CV 版本更新

Grids Often Outperform Implicit Neural Representations at Compressing Dense Signals

网格通常在压缩密集信号方面优于隐式神经表示

Namhoon Kim, Sara Fridovich-Keil

发表机构 * Department of Electrical and Computer Engineering（电气与计算机工程系）； Georgia Institute of Technology（佐治亚理工学院）

AI总结研究发现，对于密集信号任务，带插值的正则化网格在训练速度和重建质量上优于同等参数量的隐式神经表示，而INR仅在拟合二值信号（如形状轮廓）时表现更优。

Comments Our analysis is available at https://github.com/voilalab/INR-benchmark

AI中文摘要

隐式神经表示（INR）最近展示了令人印象深刻的结果，但其基本容量、隐式偏差和缩放行为仍知之甚少。我们研究了不同INR在一系列具有不同有效带宽的2D和3D真实及合成信号上的性能，以及包括断层扫描、超分辨率和去噪在内的过拟合和泛化任务。通过根据模型大小以及信号类型和带宽对性能进行分层，我们的结果揭示了不同INR和网格表示如何分配其容量。我们发现，对于许多涉及密集信号的任务，具有插值的简单正则化网格在训练速度和质量上优于或等同于具有相同参数数量的任何INR。我们还发现有限的情况——即拟合二值信号（如形状轮廓）——其中INR优于网格，以指导INR的未来开发和使用，使其应用于最有利的应用场景。

英文摘要

Implicit Neural Representations (INRs) have recently shown impressive results, but their fundamental capacity, implicit biases, and scaling behavior remain poorly understood. We investigate the performance of diverse INRs across a suite of 2D and 3D real and synthetic signals with varying effective bandwidth, as well as both overfitting and generalization tasks including tomography, super-resolution, and denoising. By stratifying performance according to model size as well as signal type and bandwidth, our results shed light on how different INR and grid representations allocate their capacity. We find that, for many tasks involving dense signals, a simple regularized grid with interpolation trains faster and to higher or comparable quality than any INR with the same number of parameters. We also find limited settings -- namely fitting binary signals such as shape contours -- where INRs outperform grids, to guide future development and use of INRs towards the most advantageous applications.

2502.02904 2026-06-18 cs.HC cs.CL q-bio.NC 版本更新

ScholaWrite: A Dataset of End-to-End Scholarly Writing Process

ScholaWrite: 端到端学术写作过程数据集

Khanh Chi Le, Linghe Wang, Minhwa Lee, Ross Volkov, Luan Tuyen Chau, Dongyeop Kang

发表机构 * University of Minnesota（明尼苏达大学）

AI总结提出ScholaWrite数据集，通过Chrome扩展记录Overleaf上的按键，捕捉从初稿到终稿的多月写作过程，包含5篇计算机科学预印本的近6.2万次文本修改及认知写作意图标注，揭示人类写作与LLM辅助之间的差距。

Comments Equal contribution: Khanh Chi Le, Linghe Wang, Minhwa Lee | project page: https://minnesotanlp.github.io/scholawrite/

AI中文摘要

写作是一项认知要求高的活动，需要持续决策、高度依赖工作记忆，并在不同目标的任务之间频繁切换。为了构建与作者认知真正一致的写作助手，我们必须捕捉并解码作者将想法转化为最终文本背后的完整思维过程。我们提出了ScholaWrite，这是第一个端到端学术写作数据集，追踪从初稿到最终手稿的多月历程。我们贡献了三个关键进展：（1）一个Chrome扩展，可无干扰地记录Overleaf上的按键，从而能够收集真实、现场写作数据；（2）一个新颖的完整学术手稿语料库，附有认知写作意图的细粒度标注。该数据集包含基于LaTeX的五篇计算机科学预印本的编辑，捕捉了四个月内近6.2万次文本更改；（3）对学术写作微观动态的分析和见解，突出了人类写作过程与大型语言模型（LLM）在提供有意义帮助方面的当前能力之间的差距。ScholaWrite强调了捕获端到端写作数据以开发未来写作助手的重要性，这些助手支持而非取代科学家的认知工作。

英文摘要

Writing is a cognitively demanding activity that requires constant decision-making, heavy reliance on working memory, and frequent shifts between tasks of different goals. To build writing assistants that truly align with writers' cognition, we must capture and decode the complete thought process behind how writers transform ideas into final texts. We present ScholaWrite, the first dataset of end-to-end scholarly writing, tracing the multi-month journey from initial drafts to final manuscripts. We contribute three key advances: (1) a Chrome extension that unobtrusively records keystrokes on Overleaf, enabling the collection of realistic, in-situ writing data; (2) a novel corpus of full scholarly manuscripts, enriched with fine-grained annotations of cognitive writing intentions. The dataset includes LaTeX-based edits from five computer science preprints, capturing nearly 62K text changes over four months; and (3) analyses and insights into the micro-dynamics of scholarly writing, highlighting gaps between human writing processes and the current capabilities of large language models (LLMs) in providing meaningful assistance. ScholaWrite underscores the value of capturing end-to-end writing data to develop future writing assistants that support, not replace, the cognitive work of scientists.

2508.10178 2026-06-18 q-bio.QM cs.LG 版本更新

Estimating carbon pools in the European Shelf sea environment: replacing reanalysis by model-informed machine learning?

估算欧洲陆架海环境中的碳库：用模型指导的机器学习替代再分析？

Jozef Skakala

发表机构 * Plymouth Marine Laboratory（普利茅斯海洋实验室）； National Centre for Earth Observation（国家地球观测中心）

AI总结提出用深度集成神经网络学习可观测变量与海洋碳库的关系，以低成本替代昂贵再分析，在西北欧陆架海实现高效碳库预测并提供不确定性。

Comments 37 pages, 9 figures (+ 3 in the appendix), v3 - published version

Journal ref JGR - Machine Learning and Computation 3 (2026)

DOI: 10.1029/2026JH001326

Abstract

Shelf seas are important for the economy and the carbon cycle, but shelf sea observations for carbon pools are often sparse, or highly uncertain. An alternative can be provided by carbon reanalyses (whether assimilating proxy variables, such as chlorophyll-$a$, or directly carbon), but these are often expensive to run. We propose to use a computationally cheap ensemble of neural networks (i.e. deep ensemble) to learn the relationship between the directly observable (atmospheric, riverine and ocean) variables and marine carbon pools from a coupled physics-biogeochemistry model. The deep ensemble was trained on a North-West European Shelf (NWES) physical-biogeochemistry model free run simulation. After training, the deep ensemble was run using inputs from the NWES reanalysis instead of the free run, demonstrating that it can efficiently predict several NWES carbon pools (e.g., detritus, zooplankton, heterotrophic bacteria) in much better agreement with the reanalysis than the free run, while also providing uncertainty information. We further show that the deep ensemble performs similarly well when it is driven directly by the observations assimilated into the reanalysis, with the limitation that carbon pools can then be predicted only at the observed locations and times. We focus on explainability of the results and demonstrate potential use of the deep ensembles for future climate what-if scenarios. We suggest that model-informed machine learning presents a viable alternative to expensive reanalyses and could complement observations, wherever they are missing and/or highly uncertain.
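
The deep-ensemble idea in the abstract, where several independently trained networks are combined and their disagreement serves as uncertainty information, can be illustrated with a minimal Python sketch. The member models below are hypothetical linear stand-ins rather than the paper's neural networks, and `ensemble_predict`, `make_member`, and the use of the standard deviation as the spread measure are assumptions made for illustration only.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

# A "member" is any callable that maps an input vector to a scalar prediction.
Predictor = Callable[[Sequence[float]], float]


def ensemble_predict(members: List[Predictor], x: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, spread) of the member predictions for one input.

    The mean is the ensemble prediction; the spread (population standard
    deviation across members) is a simple proxy for predictive uncertainty.
    """
    preds = [member(x) for member in members]
    return statistics.fmean(preds), statistics.pstdev(preds)


def make_member(weight: float, bias: float) -> Predictor:
    """Hypothetical stand-in for one trained network (a toy linear model)."""
    return lambda x: weight * sum(x) + bias


# Three toy members that differ slightly, as independently trained models would.
members = [make_member(1.0, 0.0), make_member(1.1, -0.1), make_member(0.9, 0.1)]
mean, spread = ensemble_predict(members, [1.0, 2.0])
print(f"prediction={mean:.3f}, spread={spread:.3f}")
```

A larger spread signals inputs on which the members disagree, which is where an ensemble prediction should be trusted less.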

2501.06348 2026-06-18 cs.HC cs.RO version update

Why Automate This? Exploring Correlations Between Desire for Robotic Automation, Invested Time and Well-Being

Ruchira Ray, Leona Pang, Sanjana Srivastava, Li Fei-Fei, Samantha Shorey, Roberto Martín-Martín

Affiliations: University of Texas at Austin; Stanford University; University of Pittsburgh

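AI summary: Using datasets including BEHAVIOR-1K, this study finds that time spent on an activity is not a strong predictor of automation preferences, whereas happiness and pain are the strongest indicators; it also reveals differences by gender and income level.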

Comments 26 pages, 14 figures

Abstract

Understanding the motivations underlying the human inclination to automate tasks is vital for developing robots that fit seamlessly into daily life. Accordingly, we ask: are individuals more inclined to automate activities based on the time they consume or the feelings experienced while performing them? This study explores these preferences and whether they vary across social groups, specifically gender category and income level. Leveraging data from the BEHAVIOR-1K dataset, the American Time-Use Survey, and the American Time-Use Survey Well-Being Module, we investigate the relationship between the desire for robot automation, time spent, and associated feelings: Happiness, Meaningfulness, Sadness, Painfulness, Stressfulness, or Tiredness. Our key findings show that, despite common assumptions, time spent on activities does not strongly predict automation preferences; instead, happiness and pain are the strongest indicators. We also identify differences by gender and economic level: Women prefer to automate stressful activities, whereas men prefer to automate those that make them unhappy; mid-income individuals prioritize automating less enjoyable and meaningful activities, while low and high-income show no significant correlations. We hope our research helps motivate the design of robots that align with user priorities, moving domestic robotics toward more socially relevant solutions. All data and an interactive tool are publicly available at https://robin-lab.cs.utexas.edu/why-automate-this/.
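
One common way to quantify a relationship like the one studied here is a rank correlation. The sketch below computes Spearman's coefficient in plain Python. The abstract does not say which statistic the authors used, so the choice of Spearman, the function names, and the toy numbers are all assumptions for illustration, not data or methods from the study.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_a * var_b) ** 0.5


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Toy values invented for illustration (one entry per hypothetical activity).
feeling_score = [5.1, 4.2, 3.0, 2.4, 1.8]
automation_desire = [1.0, 2.0, 2.5, 4.0, 3.5]
rho = spearman(feeling_score, automation_desire)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near -1 or 1 indicates a strong monotonic relationship, while a value near 0 indicates little monotonic association between the two measures.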
