arXivDaily: arXiv daily academic digest, updated Monday through Friday
2507.05658 2026-06-02 physics.ao-ph cs.LG

HRRRCast: a data-driven emulator for regional weather forecasting at convection allowing scales

Daniel Abdi, Isidora Jankov, Paul Madden, Vanderlei Vargas, Timothy A. Smith, Sergey Frolov, Montgomery Flora, Corey Potvin

Affiliations: Cooperative Institute for Research in Environmental Sciences; Cooperative Institute for Research in the Atmosphere; Cooperative Institute for Severe and High-Impact Weather Research and Operations; NOAA Global Systems Laboratory; NOAA Physical Sciences Laboratory; NOAA National Severe Storms Laboratory

AI summary: Proposes HRRRCast, a data-driven emulator with ResNet-based and GNN-based architectures; with multi-lead-time training and a greedy rollout strategy, it matches or outperforms the HRRR model on composite reflectivity forecasts over the CONUS domain.

Journal ref: Artificial Intelligence for the Earth Systems, Vol. 5, No. 2, 2026, Article 250061

Abstract

The High-Resolution Rapid Refresh (HRRR) model is a convection-allowing model used in operational weather forecasting across the contiguous United States (CONUS). To provide a computationally efficient alternative, we introduce HRRRCast, a data-driven emulator built with advanced machine learning techniques. HRRRCast includes two architectures: a ResNet-based model (ResHRRR) and a Graph Neural Network-based model (GraphHRRR). ResHRRR uses convolutional neural networks enhanced with squeeze-and-excitation blocks and Feature-wise Linear Modulation, and supports probabilistic forecasting via the Denoising Diffusion Implicit Model (DDIM). To better handle longer lead times, we train a single model to predict multiple lead times (1h, 3h, and 6h), then use a greedy rollout strategy during inference. When evaluated on composite reflectivity over the full CONUS domain using ensembles of 3 to 10 members, ResHRRR outperforms the HRRR forecast at the light-rainfall threshold (20 dBZ) and achieves competitive performance at the moderate threshold (30 dBZ). Our work advances the StormCast model of Pathak et al. [21] by: a) training on the full CONUS domain, b) using multiple lead times to improve long-range skill, c) training on analysis data instead of the +1h post-analysis data inadvertently used in StormCast, and d) incorporating future GFS states as inputs, enabling downscaling that improves long-lead accuracy. Grid-, neighborhood-, and object-based metrics confirm better storm placement, lower frequency bias, and higher success ratios than HRRR. HRRRCast ensemble forecasts also maintain sharper spatial detail, with power spectra more closely matching the HRRR analysis. While GraphHRRR underperforms in its current form, it lays groundwork for future graph-based forecasting. HRRRCast represents a step toward efficient, data-driven regional weather prediction with competitive accuracy and ensemble capability.
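
The greedy rollout mentioned in the abstract can be pictured with a small sketch. It assumes that "greedy" means always taking the largest trained lead time that still fits in the remaining forecast horizon; the abstract does not spell this out. The names `plan_rollout`, `rollout`, and `step_model` are hypothetical and are not part of HRRRCast.

```python
from typing import Callable, List, Sequence


def plan_rollout(target_hours: int, lead_times: Sequence[int] = (6, 3, 1)) -> List[int]:
    """Split a forecast horizon into trained lead times, largest first."""
    if target_hours < 0:
        raise ValueError("target_hours must be non-negative")
    plan: List[int] = []
    remaining = target_hours
    for lead in sorted(lead_times, reverse=True):
        while remaining >= lead:
            plan.append(lead)
            remaining -= lead
    if remaining:
        raise ValueError("horizon cannot be reached with the given lead times")
    return plan


def rollout(state, target_hours: int, step_model: Callable, lead_times=(6, 3, 1)):
    """Apply a (hypothetical) multi-lead-time model along the greedy plan."""
    for lead in plan_rollout(target_hours, lead_times):
        state = step_model(state, lead)  # one model call advances `lead` hours
    return state


# Toy stand-in for the emulator: the "state" is just the elapsed time in hours.
print(plan_rollout(10))                          # [6, 3, 1]
print(rollout(0, 14, lambda s, lead: s + lead))  # 14
```

Under this assumption a 10-hour forecast takes three model calls rather than ten 1-hour steps, so fewer intermediate predictions are fed back into the model.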

2507.02905 2026-06-02 cs.HC cs.AI cs.LG

Preference-Optimal Multi-Metric Weighting for Parallel Coordinate Plots

Chisa Mori, Shuhei Watanabe, Masaki Onishi, Takayuki Itoh

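Affiliations: Preferred Networks Inc.

AI summary: To address the difficulty of visualizing multiple metrics in parallel coordinate plots, proposes a principled formulation based on preference-optimal weighting, and uses radar charts on a UMAP-reduced 2D plane for intuitive preference selection, revealing distinct patterns of control parameter importance.

Comments: Accepted to International Conference Information Visualisation (iV2025)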

Abstract

Parallel coordinate plots (PCPs) are a prevalent method to interpret the relationship between the control parameters and metrics. PCPs deliver such an interpretation by color gradation based on a single metric. However, it is challenging to provide such a gradation when multiple metrics are present. Although a naive approach involves calculating a single metric by linearly weighting each metric, such weighting is unclear for users. To address this problem, we first propose a principled formulation for calculating the optimal weight based on a specific preferred metric combination. Although users can simply select their preference from a two-dimensional (2D) plane for bi-metric problems, multi-metric problems require intuitive visualization to allow them to select their preference. We achieved this using various radar charts to visualize the metric trade-offs on the 2D plane reduced by UMAP. In the analysis using pedestrian flow guidance planning, our method identified unique patterns of control parameter importance for each user preference, highlighting the effectiveness of our method.
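
The naive baseline the abstract starts from, collapsing several metrics into one value by linear weighting, can be sketched as follows. This is only that baseline, not the preference-optimal weighting the paper proposes. The function names are hypothetical, and the metrics are assumed to be pre-scaled to [0, 1] with higher values being better.

```python
from typing import List, Sequence


def weighted_scores(metrics: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Collapse several metrics per configuration into one score by linear weighting."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]
    return [sum(w * m for w, m in zip(normalized, row)) for row in metrics]


def to_color_scale(scores: Sequence[float]) -> List[float]:
    """Min-max rescale scores to [0, 1] so they can index a color map."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


# Three configurations, two metrics each (assumed pre-scaled to [0, 1]).
metrics = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = weighted_scores(metrics, weights=[3, 1])
print(scores)                  # [0.75, 0.25, 0.5]
print(to_color_scale(scores))  # [1.0, 0.0, 0.5]
```

The color assigned to each polyline depends entirely on the chosen weights, which is why an unclear weighting makes the gradation hard for users to interpret.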

2506.10677 2026-06-02 stat.ML cs.LG

Exploiting Similarities in A/B Testing with Off-Policy Estimation

Otmane Sakhi, Alexandre Gilotte, David Rohde

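Affiliations: Criteo AI Lab

AI summary: Uses off-policy estimation to build a family of A/B testing estimators that capture similarities in the decision propensities of the new and baseline systems, improving concentration properties and statistical efficiency while remaining unbiased.

Comments: KDD '26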

Abstract

We study A/B testing, the standard protocol for measuring the performance gain of a new decision system relative to a baseline. Traditional A/B testing treats both systems as black boxes, ignoring potential similarities between them. In practice, however, new and baseline systems are rarely radically different and often share significant structure, which can be captured by their propensities to make similar decisions. We show that in such cases, the commonly used difference-in-means estimator, though unbiased, is statistically suboptimal. Leveraging off-policy estimation, we introduce a family of A/B testing estimators that exploit the propensities of the tested systems to achieve improved concentration properties. This family is flexible enough to be tailored to practical decision-making. The resulting estimators are simple, robust to propensity misspecification, substantially more accurate when the tested systems exhibit similarities, and gracefully fall back to the difference-in-means estimator when such similarities are absent. Our theoretical analysis and empirical studies confirm their efficiency and practicality.
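
To illustrate why propensities help, the sketch below contrasts the difference-in-means estimator with a textbook mixture importance-weighting estimator. The second estimator is chosen only for illustration and is not claimed to belong to the paper's family. It assumes a 50/50 split with equal-sized arms and context-free propensities; all names are hypothetical.

```python
from typing import Dict, List, Tuple

Log = Tuple[str, float]  # (action taken, observed reward)


def difference_in_means(logs_a: List[Log], logs_b: List[Log]) -> float:
    """Classical A/B estimate: mean reward of B minus mean reward of A."""
    mean_a = sum(r for _, r in logs_a) / len(logs_a)
    mean_b = sum(r for _, r in logs_b) / len(logs_b)
    return mean_b - mean_a


def mixture_weighted_difference(
    logs_a: List[Log],
    logs_b: List[Log],
    prop_a: Dict[str, float],
    prop_b: Dict[str, float],
) -> float:
    """Importance-weighted estimate of V(B) - V(A) from the pooled traffic.

    Assumes equal-sized arms, so pooled actions follow the mixture
    0.5 * prop_a + 0.5 * prop_b, and context-free propensities.
    """
    pooled = logs_a + logs_b
    total = 0.0
    for action, reward in pooled:
        p_mix = 0.5 * prop_a[action] + 0.5 * prop_b[action]
        total += reward * (prop_b[action] - prop_a[action]) / p_mix
    return total / len(pooled)


# Two identical systems: any measured difference is pure sampling noise.
same = {"x": 0.5, "y": 0.5}
logs_a = [("x", 1.0), ("y", 0.0)]
logs_b = [("x", 1.0), ("x", 1.0)]
print(difference_in_means(logs_a, logs_b))                     # 0.5
print(mixture_weighted_difference(logs_a, logs_b, same, same))  # 0.0
```

When the two systems share the same propensities, every importance weight vanishes and the weighted estimate is exactly zero, whereas the difference in means still reports noise.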

2506.10858 2026-06-02 eess.IV cs.CV

Med-URWKV†: Toward Enhanced Pretrained Pure VRWKV Models for Medical Image Segmentation

Zhenhuan Zhou, Yining Li, Yanlin Wu, Haohan Zou, Yan Wang, Tao Li

Affiliations: College of Computer Science, Nankai University; Key Laboratory of Data and Intelligent System Security, Ministry of Education; School of Medicine, Nankai University; Nankai University Eye Institute, Nankai University; Tianjin Eye Hospital; Haihe Lab of ITAI

AI summary: Proposes the Med-URWKV models, which reuse pretrained VRWKV encoders and add the FAWA and MSCF modules, reaching performance comparable or superior to state-of-the-art methods on five datasets; Med-URWKV† achieves the highest average Dice of 88.00% with half the parameters of Med-URWKV-S.

Comments: Under review since 2026-1-22, 12 pages. Copyright: College of Computer Science, Nankai University. All rights reserved

Abstract

Medical image segmentation is a fundamental task in computer-aided diagnosis and treatment. Existing approaches based on CNNs, ViTs, Mamba, and hybrid models still suffer from limitations such as restricted receptive fields, high computational cost, or insufficient accuracy. Recently, Vision Receptive-field Weighted Key-Value (VRWKV) models have emerged as a promising alternative, delivering strong long-range dependency modeling for visual tasks. However, current studies on VRWKV-based medical image segmentation mainly focus on hybrid architectures trained from scratch, while the potential of large-scale pretrained pure VRWKV models remains unexplored. In this work, we systematically investigate the effectiveness of pure VRWKV architectures for medical image segmentation. We construct Med-URWKV-T and Med-URWKV-S by reusing pretrained VRWKV encoders at different scales and pairing them with pure VRWKV decoders, enabling a comprehensive evaluation of pretrained pure VRWKV models in this domain. To further enhance performance, we propose two VRWKV-compatible modules: a Frequency-Aware Wavelet Attention (FAWA) module, which exploits wavelet transforms to capture edge details and structural characteristics, and a Multi-Scale Channel Fusion (MSCF) module, which integrates multi-scale features to strengthen informative channel representations. By incorporating them into Med-URWKV-T, we obtain the enhanced model Med-URWKV†. Extensive experiments on five medical image segmentation datasets demonstrate that Med-URWKV achieves performance comparable to or superior to state-of-the-art methods and carefully designed hybrid VRWKV architectures. Moreover, Med-URWKV† further improves segmentation accuracy, surpassing Med-URWKV-S while using only half of its parameter count, and achieves the highest average Dice similarity coefficient of 88.00%. The code will be released.
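
For readers unfamiliar with the reported metric, the Dice similarity coefficient of two binary masks can be computed as in this minimal sketch. It works on flattened 0/1 masks; the function name and the convention for two empty masks are assumptions, and the paper's evaluation code may differ (for example in per-class averaging).

```python
from typing import Sequence


def dice_coefficient(prediction: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(prediction) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(prediction, target) if p and t)
    size_sum = sum(1 for p in prediction if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


pred = [1, 1, 0, 0]
truth = [1, 0, 0, 0]
print(round(dice_coefficient(pred, truth), 4))  # 0.6667
```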

2506.01226 2026-06-02 eess.SY cs.LG cs.SY

React to Surprises: Stable-by-Design Neural Feedback Control and the Youla-REN

Nicholas H. Barbara, Ruigang Wang, Alexandre Megretski, Ian R. Manchester

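Affiliations: Australian Centre for Robotics; School of Aerospace, Mechanical and Mechatronic Engineering; The University of Sydney; Laboratory for Information and Decision Systems; Dept. Electrical Engineering and Computer Science

AI summary: Proposes a structure combining a nonlinear Youla-Kucera parameterization with robust neural networks such as the recurrent equilibrium network (REN), enabling unconstrained optimization with guaranteed closed-loop stability, and analyzes its properties under nonlinear dynamics, partial observation, and incremental stability requirements.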

Abstract

We study parameterizations of stabilizing nonlinear policies for learning-based control. We propose a structure based on a nonlinear version of the Youla-Kucera parameterization combined with robust neural networks such as the recurrent equilibrium network (REN). The resulting parameterizations are unconstrained, and hence can be searched over with first-order optimization methods, while always ensuring closed-loop stability by construction. We study the combination of (a) nonlinear dynamics, (b) partial observation, and (c) incremental closed-loop stability requirements (contraction and Lipschitzness). We find that for the combination of (c) with either (a) or (b), a contracting and Lipschitz Youla parameter always leads to contracting and Lipschitz closed loops. However, if all three hold, then incremental stability can be lost with exogenous disturbances. Instead, a weaker condition is maintained, which we call d-tube contraction and Lipschitzness. We further obtain converse results showing that the proposed parameterization covers all contracting and Lipschitz closed loops for certain classes of nonlinear systems. Numerical experiments illustrate the utility of our parameterization when learning controllers with built-in stability certificates for: (i) "economic" rewards without stabilizing effects; (ii) short training horizons; and (iii) uncertain systems.

2505.19925 2026-06-02 stat.ME cs.LG

Cellwise and Casewise Robust Covariance in High Dimensions

Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

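Affiliations: Section of Statistics and Data Science, Department of Mathematics, KU Leuven, Belgium

AI summary: Proposes cellRCov, which combines a decomposition of the covariance on principal and orthogonal subspaces with ridge-type regularization to handle casewise outliers, cellwise outliers, and missing data simultaneously in high dimensions, and establishes its theoretical properties.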

Abstract

The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating cells (entries) of the data matrix. Recently some robust covariance estimators have been developed that can handle both types of outliers, but their computation is only feasible up to at most 20 dimensions. To remedy this we propose the cellRCov method, a robust covariance estimator that simultaneously handles casewise outliers, cellwise outliers, and missing data. It relies on a decomposition of the covariance on principal and orthogonal subspaces, leveraging recent work on robust PCA. It also employs a ridge-type regularization to stabilize the estimated covariance matrix. We establish some theoretical properties of cellRCov, including its casewise and cellwise influence functions as well as consistency and asymptotic normality. A simulation study demonstrates the superior performance of cellRCov in contaminated and missing data scenarios. Furthermore, its practical utility is illustrated in a real-world application to anomaly detection. We also construct and illustrate the cellRCCA method for robust and regularized canonical correlation analysis.
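
The sensitivity of the sample covariance to a single cellwise outlier, which motivates the work, is easy to reproduce. The sketch below only demonstrates the problem on made-up numbers; it does not implement cellRCov, and the function name is hypothetical.

```python
from typing import Sequence


def sample_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Unbiased sample covariance of two equally long columns."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two columns of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_ys = [2.0, 4.0, 6.0, 8.0, 10.0]
dirty_ys = [2.0, 4.0, 6.0, 8.0, 100.0]  # one contaminated cell in the last case

print(sample_covariance(xs, clean_ys))  # 5.0
print(sample_covariance(xs, dirty_ys))  # 50.0
```

One deviating cell out of ten multiplies the estimated covariance by ten in this toy data, even though the other four cases are untouched.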

2505.14725 2026-06-02 q-bio.GN cs.LG stat.AP

HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity

Xuejun Sun, Yiran Song, Xiaochen Zhou, Ruilie Cai, Yu Zhang, Xinyi Li, Rui Peng, Jialiu Xie, Yuanyuan Yan, Muyao Tang, Prem Lakshmanane, Baiming Zou, James S. Hagood, Raymond J. Pickles, Didong Li, Fei Zou, Xiaojing Zheng

Affiliations: Department of Biostatistics, University of North Carolina at Chapel Hill; Department of Epidemiology and Biostatistics, University of South Carolina; Department of Pediatrics, University of North Carolina at Chapel Hill; Department of Microbiology and Immunology, University of North Carolina at Chapel Hill

AI summary: To address respiratory viral infection transcriptomic data being scattered and inconsistently processed, builds the HR-VILAGE-3K3M dataset of 3,178 subjects across 66 studies, integrating bulk and single-cell transcriptomic data from vaccination, inoculation, and mixed exposures with unified preprocessing and quality control, to support biomarker discovery, immune mechanism research, and methodological development.

Abstract

Respiratory viral infections pose a global health burden, yet the cellular immune mechanisms underlying protection and pathology remain unclear. Natural infection cohorts often lack pre-exposure baselines and time-controlled sampling, whereas inoculation and vaccination trials generate well-structured longitudinal transcriptomic data. However, these datasets are scattered across repositories and processed inconsistently, hindering integrative and AI-driven analyses. To address these challenges, we developed the Human Respiratory Viral Immunization LongitudinAl Gene Expression (HR-VILAGE-3K3M) repository: an AI-ready resource integrating bulk and single-cell transcriptomic profiles from 3,178 subjects across 66 studies. The dataset spans vaccination, inoculation, and mixed exposures, with samples from blood and nasal swabs collected from public repositories including GEO, ImmPort, and ArrayExpress. We curated and harmonized subject-level metadata, standardized outcome measures, and applied unified preprocessing with rigorous quality control. We further provide benchmark analyses illustrating its utility. This resource supports discovery of biomarkers, immune mechanisms, and methodological development. As one of the largest longitudinal transcriptomic resources for human respiratory viral immunization, HR-VILAGE-3K3M enables reproducible and scalable analyses to accelerate vaccine and antiviral research.

2504.20238 2026-06-02 physics.ao-ph cs.LG

Atmospheric Predictability Beyond 30 Days with Machine Learning

P. Trent Vonich, Gregory J. Hakim

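Affiliations: University of Washington; Air Force Institute of Technology

AI summary: By optimizing initial conditions with the machine-learning model GraphCast, extends deterministic weather forecast skill from about two weeks to beyond 30 days, with an average error reduction of 86% at ten days.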

Abstract

Atmospheric predictability research has long held that rapid error growth at small spatial scales imposes an intrinsic limit of roughly two weeks on deterministic weather forecast skill. We challenge this limit using GraphCast, a machine-learning weather model, by optimizing initial conditions for twice-daily forecasts spanning 2020. This approach yields an average error reduction of 86% at ten days relative to control forecasts from reanalysis initial conditions, with skill lasting beyond 30 days. Mean optimal initial-condition perturbations reveal large-scale, spatially coherent corrections primarily reflecting an intensification of the Hadley circulation. Forecasts using GraphCast-optimal initial conditions in the Pangu-Weather model achieve a 21% error reduction, peaking at four days, indicating that analysis corrections reflect adjustments that target both model and analysis error. These results demonstrate the existence of initial conditions producing skillful deterministic forecasts far beyond two weeks. Whether such initial conditions can be identified in real-time for improving operational weather forecasts remains a topic of future research.
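
The idea of optimizing initial conditions against a later verifying state can be shown on a toy problem. The sketch below assumes plain gradient descent on the squared forecast error and replaces the weather model with a scalar linear map; it does not use GraphCast, and the abstract does not state which optimizer the authors used. All names and numbers are hypothetical.

```python
def forecast(x0: float, growth: float = 1.1, steps: int = 10) -> float:
    """Toy linear 'weather model': each step multiplies the state by `growth`."""
    state = x0
    for _ in range(steps):
        state = growth * state
    return state


def optimize_initial_condition(x0, target, growth=1.1, steps=10, lr=0.05, iters=50):
    """Gradient descent on the squared forecast error with respect to x0."""
    sensitivity = growth ** steps  # d(forecast)/d(x0) for this linear toy model
    for _ in range(iters):
        error = forecast(x0, growth, steps) - target
        x0 -= lr * 2.0 * error * sensitivity
    return x0


truth_x0, analysis_x0 = 1.05, 1.0          # the analysis misses the true state
target = forecast(truth_x0)                # verifying state at the final lead time
optimal_x0 = optimize_initial_condition(analysis_x0, target)

control_error = abs(forecast(analysis_x0) - target)
optimal_error = abs(forecast(optimal_x0) - target)
reduction = 100.0 * (1.0 - optimal_error / control_error)
print(f"optimal x0 = {optimal_x0:.4f}, error reduction = {reduction:.1f}%")
```

In this toy the model is perfect, so the optimized initial condition recovers the true state and removes essentially all of the error; with a real model the correction also absorbs model error, as the abstract notes.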

2504.16139 2026-06-02 cs.CY cs.AI

Enhancing Trust Through Standards: A Comparative Risk-Impact Framework for Aligning ISO AI Standards with Global Ethical and Regulatory Contexts

Sridharan Sankaran

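Affiliations: Research and Innovation Group; Tata Consultancy Services

AI summary: Proposes a Comparative Risk-Impact Assessment Framework that analyzes how well ISO AI standards cover ethical risks across regulatory environments, and recommends mandatory risk audits, region-specific annexes, and a privacy-focused module to strengthen their global applicability.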

Abstract

As artificial intelligence (AI) reshapes industries and societies, ensuring its trustworthiness by mitigating ethical risks like bias, opacity, and accountability deficits remains a global challenge. International Organization for Standardization (ISO) AI standards, such as ISO/IEC 24027 and 24368, aim to foster responsible development by embedding fairness, transparency, and risk management into AI systems. However, their effectiveness varies across diverse regulatory landscapes, from the EU's risk-based AI Act to China's stability-focused measures and the U.S.'s fragmented state-led initiatives. This paper introduces a novel Comparative Risk-Impact Assessment Framework to evaluate how well ISO standards address ethical risks within these contexts, proposing enhancements to strengthen their global applicability. By mapping ISO standards to the EU AI Act and surveying regulatory frameworks in ten regions, including the UK, Canada, India, Japan, Singapore, South Korea, and Brazil, we establish a baseline for ethical alignment. The framework, applied to case studies in the EU, US-Colorado, and China, reveals gaps: voluntary ISO standards falter in enforcement (e.g., Colorado) and undervalue region-specific risks like privacy (China). We recommend mandatory risk audits, region-specific annexes, and a privacy-focused module to enhance ISO's adaptability. This approach not only synthesizes global trends but also offers a replicable tool for aligning standardization with ethical imperatives, fostering interoperability and trust in AI worldwide. Policymakers and standards bodies can leverage these insights to evolve AI governance, ensuring it meets diverse societal needs as the technology advances.

2411.03163 2026-06-02 quant-ph cs.LG

Efficient Hamiltonian, structure and trace distance learning of Gaussian states

Marco Fanizza, Cambyse Rouzé, Daniel Stilck França

Affiliations: Inria; Télécom Paris - LTCI; Institut Polytechnique de Paris; University of Copenhagen; Universitat Autònoma de Barcelona; Univ Lyon; ENS Lyon; UCBL; CNRS; LIP

AI summary: For positive temperature bosonic Gaussian states, proposes efficient protocols based on heterodyne measurements that learn the parameters of the quadratic Hamiltonian and the interaction graph with sample complexity logarithmic in the number of modes, and gives the first results on learning Gaussian states in trace distance with sample complexity polynomial in the number of modes.

Comments: 54 pages, improvements in presentation and tighter analysis of the dependence on the precision in Hamiltonian and graph learning

Abstract

In this work, we initiate the study of Hamiltonian learning for positive temperature bosonic Gaussian states, the quantum generalization of the widely studied problem of learning Gaussian graphical models. We obtain efficient protocols, both in sample and computational complexity, for the task of inferring the parameters of their underlying quadratic Hamiltonian under the assumption of bounded temperature, squeezing, displacement and maximal degree of the interaction graph. Our protocol only requires heterodyne measurements, which are often experimentally feasible, and has a sample complexity that scales logarithmically with the number of modes. Furthermore, we show that it is possible to learn the underlying interaction graph in a similar setting and sample complexity. In addition, we use our techniques to obtain the first results on learning Gaussian states in trace distance with a quadratic scaling in precision and polynomial in the number of modes, albeit imposing certain restrictions on the Gaussian states. Our main technical innovations are several continuity bounds for the covariance and Hamiltonian matrix of a Gaussian state, which are of independent interest, combined with what we call the local inversion technique. In essence, the local inversion technique allows us to reliably infer the Hamiltonian of a Gaussian state by only estimating in parallel submatrices of the covariance matrix whose size scales with the desired precision, but not the number of modes. This way we bypass the need to obtain precise global estimates of the covariance matrix, controlling the sample complexity.
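
The classical problem that this work generalizes, learning a Gaussian graphical model, rests on the fact that the interaction graph is the sparsity pattern of the inverse covariance (precision) matrix. The sketch below shows only that classical fact, with an exactly known covariance and exact rational arithmetic; it does not implement the quantum protocol, heterodyne measurements, or the local inversion technique. The helper names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Matrix = List[List[Fraction]]


def invert(matrix: Matrix) -> Matrix:
    """Gauss-Jordan inverse in exact arithmetic (raises StopIteration if singular)."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def interaction_edges(covariance: Matrix) -> List[Tuple[int, int]]:
    """Edges are the nonzero off-diagonal entries of the precision matrix."""
    precision = invert(covariance)
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if precision[i][j] != 0]


# Covariance of a three-variable chain 0 - 1 - 2: every entry is nonzero,
# yet variables 0 and 2 do not interact directly.
F = Fraction
covariance = [[F(3, 4), F(1, 2), F(1, 4)],
              [F(1, 2), F(1), F(1, 2)],
              [F(1, 4), F(1, 2), F(3, 4)]]
print(interaction_edges(covariance))  # [(0, 1), (1, 2)]
```

With estimated rather than exact covariances the zero entries become small nonzero numbers, which is where continuity bounds of the kind described in the abstract come into play.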

2411.15076 2026-06-02 eess.IV cs.CV q-bio.QM

RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency

Wentao Huang, Meilong Xu, Xiaoling Hu, Shahira Abousamra, Aniruddha Ganguly, Saarthak Kapse, Alisa Yurovsky, Prateek Prasanna, Tahsin Kurc, Joel Saltz, Michael L. Miller, Chao Chen

Affiliations: Stony Brook University; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School; Department of Biomedical Data Science, Stanford University; Department of Pathology and Cell Biology, Columbia University

AI summary: Proposes a framework based on a ranking-based alignment loss, with self-supervised knowledge distillation in a teacher-student network, to align spatial transcriptomics data with histology images, performing well on gene expression prediction, slide-level classification, and survival analysis.

Comments: 18 pages, 9 figures

Abstract

Spatial transcriptomics (ST) provides essential spatial context by mapping gene expression within tissue, enabling detailed study of cellular heterogeneity and tissue organization. However, aligning ST data with histology images poses challenges due to inherent spatial distortions and modality-specific variations. Existing methods largely rely on direct alignment, which often fails to capture complex cross-modal relationships. To address these limitations, we propose a novel framework that aligns gene and image features using a ranking-based alignment loss, preserving relative similarity across modalities and enabling robust multi-scale alignment. To further enhance the alignment's stability, we employ self-supervised knowledge distillation with a teacher-student network architecture, effectively mitigating disruptions from high dimensionality, sparsity, and noise in gene expression data. Extensive experiments on seven public datasets that encompass gene expression prediction, slide-level classification, and survival analysis demonstrate the efficacy of our method, showing improved alignment and predictive performance over existing methods.
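
What "preserving relative similarity across modalities" means can be made concrete with a simple agreement score. The sketch below is a hypothetical, non-differentiable diagnostic, not the paper's alignment loss: for one anchor spot it checks how often two other spots are ordered the same way by image-feature similarity and by gene-feature similarity.

```python
from itertools import combinations
from typing import Sequence


def ranking_agreement(sim_image: Sequence[float], sim_gene: Sequence[float]) -> float:
    """Fraction of sample pairs ordered the same way in both modalities."""
    if len(sim_image) != len(sim_gene) or len(sim_image) < 2:
        raise ValueError("need two similarity lists of equal length >= 2")
    pairs = list(combinations(range(len(sim_image)), 2))
    agree = sum(
        1
        for i, j in pairs
        if (sim_image[i] - sim_image[j]) * (sim_gene[i] - sim_gene[j]) > 0
    )
    return agree / len(pairs)


# Similarities of three spots to the same anchor, in each modality (made-up values).
sim_image = [0.9, 0.5, 0.1]
sim_gene = [0.8, 0.2, 0.4]
print(round(ranking_agreement(sim_image, sim_gene), 4))  # 0.6667
```

A ranking-based objective only asks that such orderings match, which is a weaker requirement than forcing the two similarity values themselves to be equal.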

2306.15369 2026-06-02 cs.SE cs.LG

A Meta-analytical Comparison of Naive Bayes and Random Forest for Software Defect Prediction

Ch Muhammad Awais, Wei Gu, Gcinizwe Dlamini, Zamira Kholmatova, Giancarlo Succi

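Affiliations: Innopolis University; Università di Bologna

AI summary: Through a systematic literature review and meta-analysis, compares Naive Bayes and Random Forest on recall, F-measure, and precision for software defect prediction, and finds no statistically significant difference between them.

Comments: 11 pages, 8 figures, Conference Paper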

Journal ref: Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 716

Abstract

Is there a statistical difference between Naive Bayes and Random Forest in terms of recall, F-measure, and precision for predicting software defects? We answer this question through a systematic literature review and a meta-analysis. We conducted the systematic literature review by establishing criteria to search for and select papers, resulting in five studies. Then, using the metadata and forest plots of the five selected papers, we conducted a meta-analysis to compare the two models. The results show no statistically significant evidence that Naive Bayes performs differently from Random Forest in terms of recall, F-measure, and precision.
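
The three measures compared in the meta-analysis follow directly from confusion-matrix counts. The sketch below uses the standard definitions, taking F-measure to mean the balanced F1 score, which is an assumption since the abstract does not name the variant; the counts in the example are made up.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical defect predictor: 8 defects found, 2 false alarms, 4 defects missed.
print([round(v, 4) for v in precision_recall_f1(tp=8, fp=2, fn=4)])  # [0.8, 0.6667, 0.7273]
```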

2411.12438 2026-06-02 cs.DS cs.LG stat.ML

Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures

Prashanti Anderson, Mitali Bafna, Rares-Darius Buhai, Pravesh K. Kothari, David Steurer

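Affiliations: MIT; University of Washington; EPFL

AI summary: Proposes a dimension-reduction subroutine based on the sum-of-squares method for clustering non-spherical Gaussian mixtures, substantially reducing the dependence of sample and time complexity on the dimension.

Comments: 67 pages, updated to match camera-ready version at COLT 2026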

Abstract

We develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that, among several other applications, forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04]. As applications, we obtain algorithms to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ samples and time for clustering such mixtures. Our results may come as a surprise in the context of the $d^{\Omega(k)}$ statistical query and sum-of-squares lower bounds [DKS17, DKPP24] for clustering non-spherical Gaussian mixtures. While these results are usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.

2409.19310 2026-06-02 cs.CR cs.AI

Model X-Ray: Detection of Hidden Malware in AI Model Weights using Few Shot Learning

Daniel Gilkarov, Ran Dubin

发表机构 * Department of Computer Science and Ariel Cyber Innovation Center, Ariel University(计算机科学系和 Ariel 网络创新中心,阿里尔大学) Department of Computer and Software Engineering and Ariel Cyber Innovation Center, Ariel University(计算机与软件工程系和 Ariel 网络创新中心,阿里尔大学)

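AI summary: Proposes a few-shot learning method for detecting malware hidden in AI models. By converting model weights into an image representation, it needs only 6 training models and detects stealthy attacks at embedding rates as low as 25%, and in some cases 6%, while remaining robust to novel spread-spectrum steganography attacks.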

详情
AI中文摘要

随着人工智能(AI)的快速发展和Model Zoo等共享AI模型平台的广泛使用,AI模型被利用的潜在风险增加。攻击者可以通过隐写技术将恶意软件嵌入AI模型中,利用这些模型庞大的体积隐藏恶意数据并用于恶意目的,例如远程代码执行。确保AI模型的安全性是一个新兴的研究领域,对于保护依赖AI技术的众多组织和用户至关重要。本研究利用成熟的图像少样本学习技术,通过一种新颖的图像表示将AI模型转换到图像领域。在该领域应用少样本学习使我们能够创建实用的模型,这是先前工作所缺乏的。我们的方法解决了现有最先进检测技术中阻碍其实用性的关键限制。该方法将所需的训练数据集大小从40000个模型减少到仅6个。此外,我们的方法能够持续检测嵌入率低至25%甚至在某些情况下低至6%的隐蔽攻击,而先前的工作仅被证明对100%-50%的嵌入率有效。我们采用严格的评估策略,确保训练后的模型在各种因素下具有泛化能力。此外,我们展示了训练后的模型成功检测到新型扩频隐写攻击,仅通过学习一种攻击类型就证明了模型令人印象深刻的鲁棒性。我们开源代码以支持可重复性并促进这一新领域的研究。

英文摘要

The potential for exploitation of AI models has increased due to the rapid advancement of Artificial Intelligence (AI) and the widespread use of platforms like Model Zoo for sharing AI models. Attackers can embed malware within AI models through steganographic techniques, taking advantage of the substantial size of these models to conceal malicious data and use it for nefarious purposes, e.g., Remote Code Execution. Ensuring the security of AI models is a burgeoning area of research essential for safeguarding the multitude of organizations and users relying on AI technologies. This study leverages well-studied image few-shot learning techniques by transferring the AI models to the image field using a novel image representation. Applying few-shot learning in this field enables us to create practical models, something previous work has not achieved. Our method addresses critical limitations in state-of-the-art detection techniques that hinder their practicality. This approach reduces the required training dataset size from 40000 models to just 6. Furthermore, our methods consistently detect subtle attacks with embedding rates as low as 25%, and in some cases as low as 6%, while previous works were only shown to be effective for embedding rates of 50% to 100%. We employ a strict evaluation strategy to ensure the trained models generalize across various factors. In addition, we show that our trained models successfully detect novel spread-spectrum steganography attacks, demonstrating the models' impressive robustness just by learning one type of attack. We open-source our code to support reproducibility and advance research in this new field.

2401.17010 2026-06-02 cs.CR cs.AI cs.LG

Finetuning Large Language Models for Vulnerability Detection

微调大型语言模型用于漏洞检测

Alexey Shestov, Rodion Levichev, Ravil Mussabayev, Evgeny Maslov, Anton Cheshkov, Pavel Zadorozhny

发表机构 * Sber AI Lab(Sber AI实验室) Huawei Russian Research Institute(华为俄罗斯研究院) Satbayev University(萨特拜耶夫大学)

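AI summary: Finetunes WizardCoder with an optimized training procedure and class-imbalance handling, improving ROC AUC and F1 on vulnerability detection and showing the transfer-learning potential of pretrained LLMs for source code analysis.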

详情
AI中文摘要

本文介绍了微调大型语言模型(LLMs)用于检测源代码中漏洞的结果。我们利用WizardCoder(最新改进的先进LLM StarCoder),并通过进一步微调使其适应漏洞检测。为加速训练,我们修改了WizardCoder的训练过程,并研究了最优训练方案。针对负样本远多于正样本的不平衡数据集,我们还探索了不同技术以提升分类性能。微调后的WizardCoder模型在平衡和不平衡的漏洞数据集上,相比于CodeBERT类模型,在ROC AUC和F1指标上均有提升,证明了将预训练LLM用于源代码漏洞检测的有效性。关键贡献包括:微调先进的代码LLM WizardCoder、在不损害性能的前提下提高其训练速度、优化训练流程和方案、处理类别不平衡,以及在困难的漏洞检测数据集上提升性能。这展示了通过微调大型预训练语言模型进行专门源代码分析任务的迁移学习潜力。

英文摘要

This paper presents the results of finetuning large language models (LLMs) for the task of detecting vulnerabilities in source code. We leverage WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and adapt it for vulnerability detection through further finetuning. To accelerate training, we modify WizardCoder's training procedure and investigate optimal training regimes. For the imbalanced dataset with many more negative examples than positive, we also explore different techniques to improve classification performance. The finetuned WizardCoder model improves ROC AUC and F1 measures on balanced and imbalanced vulnerability datasets over a CodeBERT-like model, demonstrating the effectiveness of adapting pretrained LLMs for vulnerability detection in source code. The key contributions are finetuning the state-of-the-art code LLM, WizardCoder, increasing its training speed without harming performance, optimizing the training procedure and regimes, handling class imbalance, and improving performance on difficult vulnerability detection datasets. This demonstrates the potential for transfer learning by finetuning large pretrained language models for specialized source code analysis tasks.

2407.12014 2026-06-02 cs.HC cs.CY cs.RO

Surprising Performances of Students with Autism in Classroom with NAO Robot

自闭症学生在NAO机器人课堂中的惊人表现

Qin Yang, Huan Lu, Dandan Liang, Shengrong Gong, Huanghao Feng

发表机构 * School of Computer and Information Technology(计算机与信息学院) Northeast Petroleum University(东北石油大学) School of Computer Science and Engineering(计算机科学与工程学院) Changshu Institute of Technology(常熟理工学院) Changshu Special Education School(常熟特殊教育学校) School of Chinese Language and Literature(中文语言文学学院) Nanjing Normal University(南京师范大学)

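AI summary: In a group classroom experiment assisted by the NAO robot, students with autism spectrum disorder showed higher engagement and fewer stereotyped behaviors, suggesting that social robots can significantly improve classroom focus and may improve educational performance.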

详情
Journal ref
Frontiers of Digital Education 3(2), 2024
AI中文摘要

自闭症是一种在幼儿期出现并持续终生的发育障碍,深刻影响社交行为,并阻碍患者学习和社交技能的获取。随着技术进步,越来越多的技术被用于支持自闭症谱系障碍(ASD)学生的教育,旨在改善其教育成果和社交能力。许多关于自闭症干预的研究强调了社交机器人在行为治疗中的有效性。然而,关于将社交机器人融入自闭症儿童课堂环境的研究仍然很少。本文描述了在NAO机器人介导的集体课堂环境中进行的一项小组实验的设计与实施。实验由特殊教育教师和NAO机器人协作开展课堂活动,旨在通过教师、机器人和学生之间的互动营造动态学习环境。该实验在特殊教育学校进行,作为预期中扩展机器人辅助课堂的基础研究。实验数据表明,配备NAO机器人的课堂中的ASD学生表现明显优于普通课堂中的学生。NAO机器人的类人特征和肢体语言吸引了学生的注意力,特别是在才艺展示和指令任务中,学生表现出更高的参与度,并且减少了在常规环境中常见的刻板重复行为和不相关的小动作。我们的初步发现表明,NAO机器人显著提高了ASD学生的专注力和课堂参与度,可能改善教育表现并促进更好的社交行为。

英文摘要

Autism is a developmental disorder that manifests in early childhood and persists throughout life, profoundly affecting social behavior and hindering the acquisition of learning and social skills in those diagnosed. As technological advancements progress, an increasing array of technologies is being utilized to support the education of students with Autism Spectrum Disorder (ASD), aiming to improve their educational outcomes and social capabilities. Numerous studies on autism intervention have highlighted the effectiveness of social robots in behavioral treatments. However, research on the integration of social robots into classroom settings for children with autism remains sparse. This paper describes the design and implementation of a group experiment in a collective classroom setting mediated by the NAO robot. The experiment involved special education teachers and the NAO robot collaboratively conducting classroom activities, aiming to foster a dynamic learning environment through interactions among teachers, the robot, and students. Conducted in a special education school, this experiment served as a foundational study in anticipation of extended robot-assisted classroom sessions. Data from the experiment suggest that ASD students in classrooms equipped with the NAO robot exhibited notably better performance compared to those in regular classrooms. The humanoid features and body language of the NAO robot captivated the students' attention, particularly during talent shows and command tasks, where students demonstrated heightened engagement and a decrease in stereotypical repetitive behaviors and irrelevant minor movements commonly observed in regular settings. Our preliminary findings indicate that the NAO robot significantly enhances focus and classroom engagement among students with ASD, potentially improving educational performance and fostering better social behaviors.

2404.07373 2026-06-02 eess.SY cs.LG cs.SY

Synthesizing Neural Network Controllers with Closed-Loop Dissipativity Guarantees

具有闭环耗散性保证的神经网络控制器综合

Neelay Junnarkar, Murat Arcak, Peter Seiler

发表机构 * Department of Electrical Engineering and Computer Sciences, University of California, Berkeley(加州大学伯克利分校电子工程与计算机科学系) Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor(密歇根大学安娜堡分校电子工程与计算机科学系)

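AI summary: Proposes a method for synthesizing neural network controllers that maximize reward under the hard constraint that the closed-loop system be dissipative (certifying, e.g., stability and $L_2$ gain bounds). Plant uncertainty and activation functions are described with integral quadratic constraints, and synthesis uses a linear matrix inequality with projection-based training.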

Comments Accepted to the journal Automatica, 17 pages, 9 figures

详情
AI中文摘要

本文提出了一种综合神经网络控制器的方法,在反馈系统(包括被控对象和控制器)具有耗散性的硬约束下最大化奖励,从而保证稳定性和$L_2$增益界等性质。它考虑了非线性和不确定的被控对象,将其建模为线性时不变(LTI)系统与一个包含非线性的不确定性模块的互联。被控对象的不确定性和神经网络的激活函数均使用积分二次约束(IQCs)描述。首先,推导了不确定LTI系统的耗散性条件。其次,利用该条件构造了一个线性矩阵不等式(LMI),可用于综合神经网络控制器。最后,将该凸条件用于基于投影的训练方法,以综合具有耗散性保证的神经网络控制器。通过倒立摆和带柔性杆的小车上的数值示例,证明了该方法的有效性。

英文摘要

This paper presents a method to synthesize neural network controllers to maximize reward subject to the hard constraint that the feedback system of plant and controller be dissipative, certifying requirements such as stability and $L_2$ gain bounds. It considers nonlinear and uncertain plants, modeled as the interconnection of a linear time-invariant (LTI) system and an uncertainty block, which incorporates nonlinearities. The uncertainty of the plant and the activation functions of the neural network are both described using integral quadratic constraints (IQCs). First, a dissipativity condition is derived for uncertain LTI systems. Second, this condition is used to construct a linear matrix inequality (LMI) which can be used to synthesize neural network controllers. Finally, this convex condition is used in a projection-based training method to synthesize neural network controllers with dissipativity guarantees. Numerical examples on an inverted pendulum and a flexible rod on a cart are provided to demonstrate the effectiveness of this approach.
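
As a reminder of the textbook notions behind the constraint (this is not the paper's IQC-based condition), dissipativity and the $L_2$ gain bound it can certify may be written as below. The continuous-time form, the input $w$, output $z$, state $x$, storage function $V$, and supply rate $s$ are notation assumed here for illustration.

```latex
% Dissipativity with respect to a supply rate s(w, z):
% there exists a storage function V >= 0 such that, for all T >= 0,
V\bigl(x(T)\bigr) - V\bigl(x(0)\bigr) \le \int_0^T s\bigl(w(t), z(t)\bigr)\,dt .

% With s(w, z) = \gamma^2 \|w\|^2 - \|z\|^2, x(0) = 0 and V(0) = 0,
% nonnegativity of V gives the L_2 gain bound
\int_0^T \|z(t)\|^2\,dt \le \gamma^2 \int_0^T \|w(t)\|^2\,dt .
```

Other choices of supply rate yield other closed-loop properties, which is why a single dissipativity constraint can cover several requirements.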

2402.14031 2026-06-02 eess.SY cs.LG cs.SY

Discovering Nonlinear Static Relationships in Unlabeled Dataset using Autoencoder with Ordered Variance

使用有序方差自编码器发现未标注数据集中的非线性静态关系

Midhun T. Augustine, Parag Patil, Mani Bhushan, Sharad Bhartiya

发表机构 * Automation Lab, Department of Chemical Engineering(自动化实验室,化学工程系)

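AI summary: Proposes an autoencoder with ordered variance (AEO) and its residual-network extension (RAEO). A variance-based regularization term orders the latent variables, enabling unsupervised discovery of nonlinear relationships and static model extraction from unlabeled data.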

Comments 14 pages, 5 figures

详情
AI中文摘要

本文提出了一种有序方差自编码器(AEO),其中传统的重建损失通过基于方差的正则化项进行增强,该正则化项促进了潜空间内的有序结构。在这种结构中,潜变量根据其在训练数据上计算的方差进行排序,有助于系统地确定潜空间维度。AEO进一步通过残差网络扩展,产生了基于ResNet的AEO(RAEO)。AEO和RAEO均能发现未标注数据集中变量间的非线性关系,从而实现无监督静态模型提取。理论贡献包括对潜变量方差排序的形式化保证。通过将框架应用于非线性稳态模型的识别及其在实时优化中的使用,以连续搅拌釜反应器过程作为代表性案例研究,展示了该框架的实际效用。

英文摘要

This paper presents an autoencoder with ordered variance (AEO), in which the conventional reconstruction loss is augmented by a variance-based regularization term that promotes an ordered structure within the latent space. In this structure, the latent variables are ordered by their variance computed over the training data, facilitating systematic determination of the latent space dimensionality. The AEO is further extended using residual networks, resulting in a ResNet-based AEO (RAEO). Both AEO and RAEO lead to the discovery of nonlinear relationships among variables in unlabeled datasets, thereby enabling unsupervised static model extraction. Theoretical contributions include formal guarantees on the ordering of latent variances. The practical utility of the framework is demonstrated through its application to the identification of nonlinear steady-state models and their use in real-time optimization, with a continuous stirred tank reactor process serving as a representative case study.
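
The abstract does not give the exact form of the regularizer, so the following is only a sketch of one plausible shape: a reconstruction error plus a weighted sum of per-dimension latent variances, with weights that grow with the latent index so that variance is cheapest in the leading coordinates. The weighting scheme, the strength `lam`, the function names, and the toy numbers are all assumptions, not the paper's definition.

```python
from statistics import pvariance


def ordered_variance_penalty(latents, weights):
    """Weighted sum of per-dimension latent variances over a batch.
    Increasing weights make variance in later coordinates more expensive.
    (Assumed form for illustration; the paper's regularizer may differ.)"""
    dims = range(len(latents[0]))
    variances = [pvariance([z[i] for z in latents]) for i in dims]
    return sum(w * v for w, v in zip(weights, variances)), variances


def reconstruction_mse(inputs, outputs):
    """Mean squared reconstruction error over a batch of vectors."""
    total = sum((a - b) ** 2 for x, y in zip(inputs, outputs) for a, b in zip(x, y))
    return total / (len(inputs) * len(inputs[0]))


# Toy values, not produced by a trained model.
latents = [[2.0, 0.5, 0.0], [-2.0, -0.5, 0.0], [2.0, -0.5, 0.0], [-2.0, 0.5, 0.0]]
weights = [1.0, 2.0, 4.0]  # hypothetical increasing weights
penalty, variances = ordered_variance_penalty(latents, weights)

inputs = [[1.0, 2.0], [3.0, 4.0]]
outputs = [[1.0, 2.5], [2.5, 4.0]]
lam = 0.1  # hypothetical regularization strength
loss = reconstruction_mse(inputs, outputs) + lam * penalty
```

Once the variances come out ordered, coordinates with near-zero variance (the third one in the toy batch) can be dropped, which is one way to read the abstract's "systematic determination of the latent space dimensionality".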

2211.15223 2026-06-02 math.AP cs.LG math.OC

Gamma-convergence of a nonlocal perimeter arising in adversarial machine learning

对抗机器学习中出现的非局部周长的Gamma收敛

Leon Bungert, Kerrek Stinson

发表机构 * Hausdorff Center for Mathematics, University of Bonn(豪斯多夫数学中心,波恩大学)

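AI summary: Proves that a nonlocal perimeter of Minkowski type Gamma-converges to a local anisotropic perimeter. The nonlocal model describes the regularizing effect of adversarial training in binary classification, and the distributions are only assumed to have bounded $BV$ densities. Applications include Gamma-convergence of the associated total variations, the asymptotics of adversarial training, and graph discretizations of the nonlocal perimeter.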

Comments Fixed typos, added new isotropic-anisotropic decomposition formula for limit perimeter

详情
Journal ref
Calculus of Variations and Partial Differential Equations 63 (5), 114, 2024
AI中文摘要

在本文中,我们证明了Minkowski型非局部周长在Gamma收敛意义下趋于局部各向异性周长。该非局部模型描述了二分类中对抗训练的正则化效应。该能量本质上依赖于两个分布之间的相互作用,这些分布模拟了相关类别的似然。我们克服了分布典型的严格正则性假设,仅假设它们具有有界$BV$密度。在由紧性导出的自然拓扑中,我们证明了Gamma收敛到一个加权周长,其权重由两个密度的各向异性函数决定。尽管是局部的,这个尖锐界面极限反映了对抗扰动下的分类稳定性。我们进一步应用我们的结果来推导相关总变分的Gamma收敛,研究对抗训练的渐近性,并证明非局部周长的图离散化的Gamma收敛。

英文摘要

In this paper we prove Gamma-convergence of a nonlocal perimeter of Minkowski type to a local anisotropic perimeter. The nonlocal model describes the regularizing effect of adversarial training in binary classifications. The energy essentially depends on the interaction between two distributions modelling likelihoods for the associated classes. We overcome typical strict regularity assumptions for the distributions by only assuming that they have bounded $BV$ densities. In the natural topology coming from compactness, we prove Gamma-convergence to a weighted perimeter with weight determined by an anisotropic function of the two densities. Despite being local, this sharp interface limit reflects classification stability with respect to adversarial perturbations. We further apply our results to deduce Gamma-convergence of the associated total variations, to study the asymptotics of adversarial training, and to prove Gamma-convergence of graph discretizations for the nonlocal perimeter.

2606.02540 2026-06-02 cs.CL

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

SkillHarm: 通过自动化构建实现生命周期感知的基于技能的攻击

Yuting Ning, Zhehao Zhang, Yash Kumar Lal, Boyu Gou, Junyi Li, Weitong Ruan, Chentao Ye, Rahul Gupta, Diyi Yang, Yu Su, Huan Sun

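AI summary: Introduces the SkillHarm benchmark, which evaluates skill-based attacks across the skill-use lifecycle through two scenarios, Fixed-Payload Poisoning and Self-Mutating Poisoning, and builds AutoSkillHarm, an automated pipeline for generating attack samples at scale.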

Comments Work in Progress

详情
AI中文摘要

智能体技能在智能体工作流中占据特权地位,因为智能体被期望隐式地遵循并执行它们,这使得第三方技能成为易受攻击的表面。现有研究揭示了由基于技能的攻击引起的不安全智能体行为,但它们主要评估单个任务执行中的投毒技能,并通过临时风险列表枚举危害。为弥补这些不足,我们引入SkillHarm,这是一个跨越技能使用生命周期的基于技能的攻击基准,并配有一个系统化的技能相关风险分类。SkillHarm评估两种攻击场景:固定载荷投毒(FPP),其中固定的投毒技能包直接危害任何调用它的任务会话;以及自我变异投毒(SMP),其中最初良性的执行静默地变异持久的技能内容,将危害延迟到后续重用。它进一步根据危害针对的智能体工作流组件定义了12种风险类型:数据管道、系统环境和智能体自主性。为了大规模实例化这些攻击,我们构建了AutoSkillHarm,一个由自然语言驱动、编码智能体驱动的自动化构建管道。由此产生的基准包含71个技能的879个攻击样本。实验表明,当前智能体仍然易受攻击,FPP和SMP的攻击成功率分别高达86.3%和69.3%。我们的分析进一步揭示了一个潜在风险:许多明显的攻击失败源于智能体未能与被投毒的文件交互,而非真正的抵抗,而当前的防御措施仍然无法可靠地缓解这一威胁。

英文摘要

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they primarily evaluate poisoned skills within a single task execution and enumerate harms through ad-hoc risk lists. To bridge these gaps, we introduce SkillHarm, a benchmark of skill-based attacks across the skill-use lifecycle, paired with a systematic taxonomy of skill-relevant risks. SkillHarm evaluates two attack scenarios: Fixed-Payload Poisoning (FPP), where a fixed poisoned skill package directly compromises any task session that invokes it, and Self-Mutating Poisoning (SMP), where an initially benign execution silently mutates persistent skill content, deferring harm until a subsequent reuse. It further defines 12 risk types based on the agent workflow component targeted by the harm: data pipelines, system environments, and agent autonomy. To instantiate these attacks at scale, we build AutoSkillHarm, an automated construction pipeline with coding agents driven by natural-language harnesses. The resulting benchmark contains 879 attack samples across 71 skills. Experiments show that current agents remain vulnerable with attack success rates up to 86.3% in FPP and 69.3% in SMP. Our analysis further reveals a latent risk: many apparent attack failures stem from the agent failing to engage with the poisoned file rather than genuine resistance, and current defenses still fail to reliably mitigate the threat.

2606.02515 2026-06-02 cs.LG

A Biconvex Formulation for Stable Transport of Mixture Models with a Unique Solution

混合模型稳定传输的双凸形式与唯一解

Yeganeh Marghi, Kelly Jin, Uygar Sümbül

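AI summary: Proposes Optimal Mixture Transport (OMT), which transports mixtures of subpopulations via a strictly biconvex optimization with a unique global minimizer, comes with theoretical stability guarantees, and has a computational cost that scales only with the number of mixture components.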

详情
AI中文摘要

最优传输(OT)为概率分布之间的映射提供了原则性框架。尽管取得了广泛进展,将OT应用于大规模数据仍然计算密集,且得到的逐点传输计划往往难以解释。我们引入了最优混合传输(OMT),这是一个可扩展的框架,将传输范式从单个样本转移到子群体的混合,将传输问题重新表述为具有唯一全局最小值的严格双凸优化。我们进一步建立了OMT映射稳定性的理论保证,表明底层分布的有界扰动会导致传输计划的有界变化。通过将子群体表述为指数族分布,OMT将计算复杂度与样本量解耦,仅随混合成分数量扩展。我们在广泛的合成基准和真实世界数据集(包括图像数据和大规模单细胞RNA测序测量)上展示了OMT的有效性和实用性。

英文摘要

Optimal transport (OT) provides a principled framework for mapping between probability distributions. Despite extensive progress, applying OT to large-scale data remains computationally demanding, and the resulting pointwise transport plans are often difficult to interpret. We introduce Optimal Mixture Transport (OMT), a scalable framework that shifts the transport paradigm from individual samples to mixtures of subpopulations, reformulating the transport problem as a strictly biconvex optimization with a unique global minimizer. We further establish theoretical guarantees on the stability of the OMT map, showing that bounded perturbations of the underlying distributions lead to bounded changes in the transport plan. By formulating subpopulations as exponential-family distributions, OMT decouples computational complexity from the sample size, scaling solely with the number of mixture components. We demonstrate the effectiveness and practicality of OMT on a wide range of synthetic benchmarks and real-world datasets, including image data and large-scale single-cell RNA sequencing measurements.

2606.02497 2026-06-02 cs.AI

Bridging the Last Mile of Time Series Forecasting with LLM Agents

用LLM智能体弥合时间序列预测的最后一公里

Yuhua Liao, Zetian Wang, Qiangqiang Nie, Zhenhua Zhang

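AI summary: Proposes an LLM-agent framework that turns statistical forecasts into business-ready forecasts by retrieving contextual evidence and applying revisions under structural safety constraints.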

详情
AI中文摘要

时间序列预测发展迅速,特别是随着基础模型的出现,这些模型在数值外推上展现出强大的零样本性能。然而,在实际预测场景中,统计上合理的基线很少是实践中使用的最终预测。在预测成为决策就绪之前,通常需要使用弱结构化的业务背景进行修订,例如假日效应、活动计划、外部事件、历史类比和专家反馈。这一实际阶段在预测文献中仍未得到充分探索。在本文中,我们将这一阶段定义为最后一公里预测问题,并提出一个位于预测骨干之上的LLM智能体框架。我们的系统维护一个统一的预测工作空间,调用工具检索上下文证据,并在结构安全约束下将推理轨迹转化为明确的预测修订行动。它还通过map-reduce风格的分解支持长周期预测,并通过记忆库支持事后反思。最终的系统设计为可控和可审计的。通过实际案例研究,我们展示了LLM智能体如何弥合统计预测与业务就绪预测之间的差距。

英文摘要

Time series forecasting has advanced rapidly, especially with the emergence of foundation models that show strong zero-shot performance on numerical extrapolation. However, in real-world forecasting settings, a statistically plausible baseline is rarely the final forecast used in practice. Before a forecast becomes decision-ready, it often needs to be revised using weakly structured business context such as holiday effects, campaign plans, external events, historical analogs, and expert feedback. This practical stage remains underexplored in the forecasting literature. In this paper, we formulate this stage as the last-mile forecasting problem and present an LLM-agent framework that sits on top of a forecasting backbone. Our system maintains a unified forecast workspace, invokes tools to retrieve contextual evidence, and converts reasoning trajectories into explicit forecast revision actions under structural safety constraints. It also supports long-horizon forecasting through map-reduce-style decomposition and post-hoc reflection through a memory bank. The resulting system is designed to be controllable and auditable. Through real-world case studies, we show how LLM agents can bridge the gap between statistical prediction and business-ready forecasting.

2606.02481 2026-06-02 cs.CV

Places in the Wild: A Large, High-Resolution RAW Photograph Dataset for Ecologically Valid Vision Research

野外场景:一个用于生态有效视觉研究的大规模高分辨率RAW照片数据集

Michelle R. Greene

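AI summary: Presents a dataset of 67,574 high-resolution RAW photographs covering 260 scene categories with dense 360-degree viewpoint sampling, supporting research on viewpoint-dependent recognition, realistic scene understanding, and natural scene statistics.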

Comments 19 pages, 3 tables, 4 figures

详情
AI中文摘要

大规模图像数据集加速了认知神经科学和计算机视觉的进展。然而,大多数数据集是低分辨率、来自互联网的JPEG图像,其拍摄条件未知且空间上下文有限。野外场景数据集包含67,574张高分辨率照片,这些照片在810个物理位置现场采集,涵盖260个基本级场景类别,包括室内、城市和自然环境。在每个位置,安装在全景三脚架上的4500万像素佳能EOS R5相机以5度水平间隔拍摄72张图像,并在不同仰角拍摄12张图像,实现了密集的360度视点采样。所有图像同时记录为14位RAW(CR3)文件和压缩JPEG文件,保留了传感器级别的细节,用于分析亮度、对比度、颜色和其他图像统计信息。该数据集附有完整的EXIF元数据和一套图像质量指标。野外场景数据集支持人类和模型中视角依赖识别的研究、在真实条件下训练和评估场景理解系统、自然场景统计特征的刻画,以及需要近全视野视觉显示的实验。

英文摘要

Large image datasets have accelerated progress in cognitive neuroscience and computer vision. However, most datasets are low-resolution, internet-sourced JPEGs with unknown capture conditions and limited spatial context. Places in the Wild is a dataset of 67,574 high-resolution photographs collected in situ across 810 physical locations spanning 260 basic-level scene categories, including indoor, urban, and natural environments. At each location, a 45-megapixel Canon EOS R5 mounted on a panoramic tripod captured 72 images at 5-degree horizontal intervals plus 12 images at varying elevations, yielding dense 360-degree viewpoint sampling. All images were recorded simultaneously as 14-bit RAW (CR3) files and compressed JPEGs, preserving sensor-level detail for analyses of luminance, contrast, color, and other image statistics. The dataset is accompanied by complete EXIF metadata and a suite of image-quality metrics. Places in the Wild supports research on viewpoint-dependent recognition in humans and models, training and evaluation of scene-understanding systems under realistic conditions, characterization of natural scene statistics, and experiments requiring near-full-field visual displays.
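
The capture protocol in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative. It shows that 72 shots at 5-degree steps cover a full circle, and that the nominal protocol total is slightly larger than the reported dataset size. The abstract does not explain the difference, so no cause is assumed here.

```python
AZIMUTH_STEP_DEG = 5       # horizontal interval from the abstract
HORIZONTAL_SHOTS = 72      # images per horizontal sweep
ELEVATION_SHOTS = 12       # extra images at varying elevations
LOCATIONS = 810
REPORTED_TOTAL = 67_574

# Azimuths of one sweep: 0, 5, ..., 355 degrees.
azimuths = [i * AZIMUTH_STEP_DEG for i in range(HORIZONTAL_SHOTS)]
sweep_coverage_deg = HORIZONTAL_SHOTS * AZIMUTH_STEP_DEG

per_location = HORIZONTAL_SHOTS + ELEVATION_SHOTS
nominal_total = LOCATIONS * per_location
shortfall = nominal_total - REPORTED_TOTAL
```

Under these figures each location nominally contributes 84 images, for 68,040 in total, which is 466 more than the 67,574 photographs reported.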

2606.02458 2026-06-02 cs.AI

Beyond One-shot: AI Agents for Learning in Field Experiments

超越一次性:用于现场实验学习的AI智能体

Junjie Luo, Ritu Agarwal, Gordon Gao

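AI summary: Studies whether tool-augmented agentic AI can learn automatically from experimental data to generate new interventions. In field experiments on healthcare prescription messaging, it outperforms a Human + Chatbot method.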

详情
AI中文摘要

组织通常进行A/B测试实验,但一次实验产生的数据未被充分利用以指导后续干预设计。从先前实验数据中提取可操作知识以指导新干预存在重大障碍。我们研究工具增强的智能体AI能否自动从实验数据中学习,以在后续实验中生成新干预。通过医疗处方消息传递(693,139次患者就诊)的两阶段现场实验,我们比较了人类+聊天机器人方法(第一阶段:行为专家与对话式AI共同设计13种消息变体,444,691次患者就诊)与工具增强的智能体AI方法(第二阶段:AI自主从第一阶段数据中提取原则以生成17种新变体,248,448次患者就诊)。配备分析工具、结构化数据-信息-知识-智慧(DIKW)推理智能体和透明证据链的智能体AI方法产生了更优的干预:AI生成的最佳消息实现了69.8%的点击率(比基线高6.5个百分点)。关键的是,我们的结果表明价值来自特定领域的实验数据,而非通用推理能力:没有实验数据的前沿大语言模型无法预测哪些干预会成功。现场实验还揭示,用于干预设计的通用行为理论并不能统一适用于特定医疗环境,这激发了在实验规模上进行理论审计的智能体AI方法。我们的研究表明,工具增强的AI可以从实验数据中学习并生成改进的领域相关干预,将行为实验从一次性评估转变为可扩展的累积设计学习系统。

英文摘要

Organizations routinely run experiments for A/B testing, yet the data generated from one experiment is underutilized to inform subsequent intervention design. Significant barriers exist to extracting actionable knowledge from prior experimental data to inform new interventions. We study whether tool-augmented agentic AI can automatically learn from experimental data to generate new interventions in subsequent experiments. Through two-stage field experiments in healthcare prescription messaging (693,139 patient visits), we compare a Human + Chatbot method (Stage 1: behavioral experts with conversational AI co-designing 13 message variants, 444,691 patient visits) against a Tool-Augmented Agentic AI method (Stage 2: AI autonomously extracting principles from Stage 1 data to generate 17 new variants, 248,448 patient visits). The Agentic AI method, equipped with analytical tools, structured Data-Information-Knowledge-Wisdom (DIKW) reasoning agents, and transparent evidence chains, produces superior interventions: the best AI-generated message achieved a 69.8% CTR (+6.5 percentage points over baseline). Critically, our results suggest that the value comes from domain-specific experimental data, not from general reasoning ability: frontier LLMs operating without experimental data failed to predict which interventions would succeed. The field experiments also revealed that general-purpose behavioral theories used for intervention design do not extend uniformly to specific healthcare contexts, motivating an agentic AI approach to theory audits at field-experiment scale. Our research shows that tool-augmented AI can learn from experimental data and generate improved domain-relevant interventions, transforming behavioral experimentation from one-shot evaluation into a scalable system for cumulative design learning.
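
The headline numbers in this abstract are internally consistent, as the short check below illustrates. It uses only figures quoted above. The implied baseline CTR and the relative lift are derived values that the abstract does not state explicitly, and the variable names are illustrative.

```python
from fractions import Fraction

stage1_visits = 444_691   # Human + Chatbot stage
stage2_visits = 248_448   # Tool-Augmented Agentic AI stage
total_visits = stage1_visits + stage2_visits

best_ctr = Fraction(698, 10)   # 69.8 percent
lift_pp = Fraction(65, 10)     # +6.5 percentage points over baseline

# A percentage-point lift is an absolute difference, so the implied baseline is:
baseline_ctr = best_ctr - lift_pp
# The corresponding relative improvement over that baseline:
relative_lift = lift_pp / baseline_ctr
```

The two stages sum to the 693,139 visits reported, the implied baseline CTR is 63.3%, and a 6.5 percentage point gain on that baseline corresponds to a relative improvement of roughly 10%.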

2606.02438 2026-06-02 cs.AI

LLM-Evolved Pattern Generators for Optimal Classical Planning

LLM演化模式生成器用于最优经典规划

Windy Phung, Dominik Drexler, Arnaud Lequen, Jendrik Seipp

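AI summary: Presents the first method that learns admissible heuristics for optimal classical planning through LLM-driven evolutionary program synthesis. The generated patterns are combined via saturated cost partitioning, preserving the optimality guarantees of A* search.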

详情
AI中文摘要

学习到的启发式函数最近已成为满足规划中传统领域无关启发式函数的竞争性替代方案。然而,现有方法侧重于改进搜索引导而非保证可容许性,这使得它们不适用于最优经典规划。我们提出了第一种学习领域相关启发式函数的方法,这些启发式函数在设计上是可容许的,从而保留了A*搜索的最优性保证。我们不是学习从状态到启发式值的直接映射,而是学习构建可诱导可容许启发式函数的抽象。我们使用LLM驱动的进化程序合成框架,为每个领域获得一个程序,该程序为该领域中的任何任务生成模式集合,并通过饱和成本分区以可容许的方式组合所得模式。实验表明,学习到的程序编码了可解释的领域特定见解,在测试时以可忽略的开销运行,并在多个领域上产生了与最先进的领域无关基线相匹配的覆盖范围,同时每个状态的评估速度显著更快。

英文摘要

Learned heuristics have recently become a competitive alternative to traditional domain-independent heuristics for satisficing planning. Existing approaches, however, focus on improving search guidance rather than guaranteeing admissibility, which makes them unsuitable for optimal classical planning. We present the first method for learning domain-dependent heuristics that are admissible by design and thus preserve the optimality guarantees of A* search. Instead of learning a direct mapping from states to heuristic values, we learn to construct abstractions that induce admissible heuristics. We use an LLM-driven evolutionary program-synthesis framework to obtain, for each domain, a program that produces a pattern collection for any task in that domain, and we combine the resulting patterns admissibly via saturated cost partitioning. Empirically, the learned programs encode interpretable domain-specific insights, run with negligible overhead at test time and yield heuristics that match the coverage of state-of-the-art domain-independent baselines on several domains while evaluating each state substantially faster.

2606.02384 2026-06-02 cs.LG

TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks

TabPrep: 弥合表格基准测试中的特征工程差距

Andrej Tschalzev, Nick Erickson, Yuyang Wang, Huzefa Rangwala, Stefan Lüdtke, Heiner Stuckenschmidt, Christian Bartelt

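AI summary: Introduces TabPrep, a lightweight preprocessing pipeline whose feature generators target three specific structural data patterns. Systematic feature engineering with it consistently improves a range of models on tabular benchmarks.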

详情
AI中文摘要

表格机器学习的进展主要集中在日益复杂的模型架构上。同时,特征工程仍然是现实建模流程中关键但未被充分探索的组成部分,在现代基准测试中完全缺失,这造成了未量化的评估差距。在这项工作中,我们引入了TabPrep,一个轻量级预处理流程,由精心设计以针对三种特定结构数据模式的特征生成器组成。我们表明,许多广泛使用的模型类对这些模式表现出可预测的盲点,仅凭系统性的特征工程就能建立新的峰值性能。在TabArena基准测试中,将TabPrep集成到模型训练和调优中持续提升了基于树、神经网络、线性和基础模型的性能,通常超过仅通过以模型为中心的创新所获得的收益。TabPrep在性能、效率和跨数据集的适用性方面优于以前的自动化特征工程方法,使其能够集成到大规模基准测试中。通过发布TabPrep(见https://github.com/atschalz/tabprep),我们使研究人员能够将特征工程集成到他们的基准测试设置中,填补了表格评估中长期存在的空白。

英文摘要

Progress in tabular machine learning has largely focused on increasingly sophisticated model architectures. At the same time, feature engineering remains a critical yet underexplored component of real-world modeling pipelines that is entirely absent from modern benchmarks, which creates an unquantified evaluation gap. In this work, we introduce TabPrep, a lightweight preprocessing pipeline composed of feature generators that are carefully designed to target three specific structural data patterns. We show that many widely used model classes exhibit predictable blind spots to these patterns and that systematic feature engineering alone can establish new peak performance. Across the TabArena benchmark, integrating TabPrep into model training and tuning consistently improves performance for tree-based, neural, linear, and foundation models, often surpassing gains achieved by model-centric innovations alone. TabPrep outperforms previous automated feature engineering approaches in performance, efficiency, and applicability across datasets, enabling integration into large-scale benchmarks. By releasing TabPrep (see https://github.com/atschalz/tabprep), we enable researchers to integrate feature engineering into their benchmarking setup, filling a longstanding gap in tabular evaluations.

2606.02374 2026-06-02 cs.AI

Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models

超越像素的空间表示学习:统一栅格数据和向量语义以构建以人为中心的地理空间基础模型

Steffen Knoblauch, Hao Li, Gengchen Mai, Konstantin Klemmer, Song Gao, WenWen Li

AI总结 本文提出统一栅格感知与向量推理的联合空间表示学习范式,旨在解决当前地球观测基础模型仅依赖栅格模态、忽略向量数据中丰富结构化信息的局限性。

详情
AI中文摘要

地球观测(EO)从根本上改变了环境过程和人类活动的监测,达到了行星尺度。自监督学习的最新进展催生了地球观测基础模型(EOFMs),这些模型利用PB级未标记EO数据学习跨广泛下游地理空间任务的可迁移表示。尽管取得了这些进展,当前的EOFMs仍然局限于栅格模态,忽视了诸如OpenStreetMap和Overture等可公开访问的向量数据源中编码的丰富结构化信息。向量数据提供了地理实体的显式和紧凑表示,包括几何、拓扑和语义关系,提供了在图像中通常模糊或难以获取的关键上下文信号。因此,栅格和向量数据代表了地理空间的互补视图:栅格数据捕捉连续的物理和光谱模式,而向量数据编码离散对象及其关系结构,并且通常更多地代表人类系统而非物理系统(例如社会或人口数据)。然而,现有的地理空间表示学习范式孤立地处理这些模态,依赖于不完美且常有损的转换来桥接它们。这篇观点文章呼吁向联合空间表示学习(SRL)的范式转变,即在统一的嵌入空间中整合栅格感知与基于向量的推理。基于多模态地理空间学习的新兴努力,我们强调了对齐异构空间数据源的概念基础、技术挑战和有前景的方向。我们认为,这种整合对于开发能够更准确、可解释且语义扎实地理解地球的下一代地理空间AI系统至关重要。

英文摘要

Earth Observation (EO) has fundamentally transformed the monitoring of environmental processes and human activities up to planetary scale. Recent advances in self-supervised learning have given rise to Earth Observation Foundation Models (EOFMs), which leverage petabyte-scale unlabeled EO data to learn transferable representations across a wide range of downstream geospatial tasks. Despite these advances, current EOFMs remain largely confined to raster modalities, overlooking the rich, structured information encoded in openly accessible vector data sources such as OpenStreetMap and Overture. Vector data provides explicit and compact representations of geographic entities, including geometry, topology, and semantic relationships, offering critical contextual signals that are often ambiguous or inaccessible in imagery alone. Raster and vector data thus represent complementary views of geographic space: raster data captures continuous physical and spectral patterns, while vector data encodes discrete objects and their relational structure and often represents human rather than physical systems (e.g., social or demographic data). However, existing geospatial representation learning paradigms treat these modalities in isolation, relying on imperfect and often lossy transformations to bridge them. This perspective paper calls for a paradigm shift toward joint Spatial Representation Learning (SRL) in a unified embedding space that integrates raster perception with vector-based reasoning. Building on emerging efforts in multimodal geospatial learning, we highlight conceptual foundations, technical challenges, and promising directions for aligning heterogeneous spatial data sources. We contend that such integration is essential for developing next-generation geospatial AI systems capable of more accurate, interpretable, and semantically grounded understanding of the Earth.

2606.02373 2026-06-02 cs.AI cs.CL cs.IR

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Harness-1:基于状态外化马具的搜索智能体强化学习

Pengcheng Jiang, Zhiyi Shi, Kelly Hong, Xueqiang Xu, Jiashuo Sun, Jimeng Sun, Hammad Bashir, Jiawei Han

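AI summary: Introduces Harness-1, a 20B-parameter search agent trained with reinforcement learning inside a stateful search harness that externalizes routine state management to the environment. Across eight retrieval benchmarks it reaches 0.730 average curated recall, 11.4 points above the next strongest open search subagent.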

详情
AI中文摘要

搜索智能体通常被训练为基于不断增长的转录的策略:模型必须决定如何搜索,同时记住它看到了什么、哪些证据有用、哪些约束仍然开放、哪些声明已被检查。我们认为这种表述将过多的常规状态管理放在策略内部:强化学习被迫同时优化语义搜索决策和可恢复的簿记,而环境可以更可靠地维护这些簿记。我们引入Harness-1,一个20B参数的搜索智能体(检索子智能体),在有状态搜索马具内通过强化学习训练。该马具维护环境端的工作记忆,包括候选池、重要性标记的精选集、紧凑的证据链接、验证记录、压缩和去重的观察结果,以及预算感知的上下文渲染。策略保留语义决策:搜索什么、保留或丢弃哪些文档、验证什么以及何时停止。在涵盖网络、金融、专利和多跳问答的八个检索基准上,Harness-1实现了0.730的平均精选召回率,比次强的开源搜索子智能体高出11.4个百分点,并与更大的前沿模型搜索器保持竞争力。其优势在保留的迁移基准上尤为显著,表明基于显式搜索状态的强化学习可以产生超越训练领域的检索行为。我们的代码可在https://github.com/pat-jj/harness-1获取。

英文摘要

Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that the environment can maintain more reliably. We introduce Harness-1, a 20B search agent (retrieval subagent) trained with reinforcement learning inside a stateful search harness. The harness maintains environment-side working memory, including a candidate pool, an importance-tagged curated set, compact evidence links, verification records, compressed and deduplicated observations, and budget-aware context rendering. The policy retains the semantic decisions: what to search, which documents to keep or discard, what to verify, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves 0.730 average curated recall, outperforming the next strongest open search subagent by +11.4 points and remaining competitive with much larger frontier-model searchers. Its gains are especially strong on held-out transfer benchmarks, suggesting that reinforcement learning over explicit search state can produce retrieval behaviors that generalize beyond the training domains. Our code is available at https://github.com/pat-jj/harness-1.
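
The split between environment-side bookkeeping and policy-side decisions described above can be pictured with a small data structure. This is a hypothetical sketch, not the Harness-1 code from the linked repository: the class `SearchHarnessState`, its methods, the text-normalization rule for deduplication, and the definition of recall used here are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SearchHarnessState:
    """Hypothetical environment-side working memory for a search agent."""
    candidates: dict = field(default_factory=dict)  # doc_id -> text
    curated: dict = field(default_factory=dict)     # doc_id -> importance tag
    seen_texts: set = field(default_factory=set)    # normalized texts already observed

    def observe(self, doc_id, text):
        """Bookkeeping: add a retrieved document unless its text was already seen."""
        key = text.strip().lower()
        if key in self.seen_texts:
            return False
        self.seen_texts.add(key)
        self.candidates[doc_id] = text
        return True

    def keep(self, doc_id, importance):
        """Policy decision: promote a candidate to the curated set."""
        if doc_id not in self.candidates:
            raise KeyError(doc_id)
        self.curated[doc_id] = importance

    def discard(self, doc_id):
        """Policy decision: drop a document from both pools."""
        self.candidates.pop(doc_id, None)
        self.curated.pop(doc_id, None)


def curated_recall(state, relevant_ids):
    """Fraction of the relevant documents that ended up in the curated set."""
    relevant = set(relevant_ids)
    return len(relevant & set(state.curated)) / len(relevant)


state = SearchHarnessState()
state.observe("d1", "Quarterly revenue rose 4%.")
state.observe("d2", "quarterly revenue rose 4%.  ")  # duplicate after normalization
state.observe("d3", "Patent filed in 2019.")
state.keep("d1", "high")
state.keep("d3", "low")
state.discard("d3")
recall = curated_recall(state, ["d1", "d3"])
```

In this toy run the duplicate never enters the candidate pool, and discarding a relevant document halves the recall. The policy only issues `keep` and `discard` calls, while deduplication happens without it, which mirrors the division of labor the abstract argues for.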

2606.02370 2026-06-02 cs.RO

A Simulation Platform for Flapping-Wing Vehicles

扑翼飞行器仿真平台

Haichuan Li, Tomi Westerlund

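AI summary: To address the large simulation-to-reality gap for flapping-wing aerial vehicles, the paper proposes FWAV-Sim, a high-fidelity simulation platform that integrates a composite aerodynamic model, turbulence generation, and realistic sensor simulation to support the development of autonomy systems.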

AI Chinese abstract

扑翼飞行器(FWAVs)表现出卓越的敏捷性,但由于其对气动扰动高度敏感且传感器有效载荷能力有限,面临着巨大的自主性挑战。当前的仿真平台通常依赖于过度简化的层流假设和理想化的传感器模型,无法捕捉实际运行中遇到的复杂湍流模式和感知限制。这种仿真与现实的差距严重阻碍了FWAVs鲁棒自主系统的发展。我们引入了FWAV-Sim,一个基于Unity的高保真仿真框架,它集成了:(1)结合准稳态叶素理论和钝体阻力效应的复合气动模型,(2)通过分形噪声合成生成时空相关的湍流,以及(3)包括含噪声的IMU测量、LiDAR点云和RGB相机图像流的真实感传感器模拟。我们的平台能够可扩展地生成同步数据集,其中包含真值(ground-truth)飞行器状态、气动力、湍流风场和多模态传感器流。实验验证表明,在FWAV-Sim中开发的自主流水线(包括控制器和感知系统)表现出显著提高的仿真能力,从而推进了扑翼飞行系统基于仿真的开发。

English abstract

Flapping-wing aerial vehicles (FWAVs) demonstrate remarkable agility but face substantial autonomy challenges due to their high sensitivity to aerodynamic disturbances and limited sensor payload capacity. Current simulation platforms typically rely on oversimplified laminar flow assumptions and idealized sensor models, failing to capture the complex turbulence patterns and perceptual limitations encountered in real-world operation. This simulation-to-reality discrepancy significantly impedes the development of robust autonomy systems for FWAVs. We introduce FWAV-Sim, a high-fidelity Unity-based simulation framework that integrates: (1) a composite aerodynamic model combining quasi-steady blade-element theory with bluff-body drag effects, (2) spatiotemporally correlated turbulence generation through fractal noise synthesis, and (3) realistic sensor simulation including noisy IMU measurements, LiDAR point clouds, and RGB camera feeds. Our platform enables scalable generation of synchronized datasets containing ground-truth vehicle states, aerodynamic forces, turbulent wind fields, and multi-modal sensor streams. Experimental validation demonstrates that autonomy pipelines (including both controllers and perception systems) developed in FWAV-Sim exhibit significantly improved simulation capability, thereby advancing simulation-based development for flapping-wing aerial systems.

2606.02357 2026-06-02 cs.CV cs.AI

Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains

多模态智能体真的从工具使用中受益吗?能力增益的系统性研究

Garvin Guo, Donglei Yu, Yu Chen, Xiang Wang, Shuai Li, Xinpei Zhao, Huaxing Liu, Qinghao Wang, Minpeng Liao

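AI summary: Comparing tool-augmented multimodal agents with their tool-free counterparts across several tasks, the study finds that tool use brings no consistent performance gain; the agents mostly learn tool-calling patterns rather than actually using tools to extend their capabilities.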

AI Chinese abstract

工具增强的多模态智能体在基准测试中表现出显著提升,这常被视为智能体已学会使用工具的证据。我们认为这种解读可能为时过早:仅凭工具调用轨迹并不能证明工具提供了答案关键信息。我们研究了两种代表性的“用图像思考”智能体,Thyme 和 DeepEyesV2,在真实世界理解、OCR、图表理解和数学推理任务上的表现。每个智能体与其无工具版本以及从同一源池训练但不含工具调用轨迹的纯文本推理器进行比较。工具访问并未带来一致的总体改进,未能可靠地降低生成令牌成本,并且仅留下一个很小的仅工具解决集:DeepEyesV2 的 93% 工具解决问题和 Thyme 的 96% 也被至少一种无工具设置解决。机制消融进一步表明,完整的工具使用循环并不始终优于单独的工具调用格式或返回的执行结果。在我们研究的设置中,所分析的智能体似乎更可靠地学习了工具调用模式而非工具贡献的能力,这表明评估应区分工具的可用性与工具是否真正扩展了智能体可解决的问题。

English abstract

Tool-augmented multimodal agents show strong benchmark gains, often taken as evidence that agents have learned to use tools. We argue that this interpretation can be premature: a tool-call trace alone does not show whether the tool supplied answer-critical information. We study two representative "thinking with images" agents, Thyme and DeepEyesV2, across real-world understanding, OCR, chart understanding, and mathematical reasoning. Each agent is compared with its Tool-Free counterpart and with a Pure-Text Reasoner trained from the same source pool without tool-calling trajectories. Tool access yields little consistent aggregate improvement, does not reliably reduce generated-token cost, and leaves only a small tool-only solved set: 93% of DeepEyesV2's tool-solved problems and 96% of Thyme's are also solved by at least one non-tool setting. Mechanism ablations further show that the full tool-use loop does not consistently outperform either the tool-call format or the returned execution result alone. In the settings we study, the analyzed agents appear to learn tool-calling patterns more reliably than tool-contributed capabilities, suggesting that evaluation should distinguish tool availability from whether tools actually expand what agents can solve.