arXivDaily: arXiv daily academic digest, updated Monday through Friday
2512.04123 2026-06-08 cs.CY cs.AI cs.LG cs.SE (updated version)

Measuring Agents in Production

Melissa Z. Pan, Negar Arabzadeh, Riccardo Cogo, Yuxuan Zhu, Alexander Xiong, Lakshya A Agrawal, Huanzhi Mao, Emma Shen, Sid Pallerla, Liana Patel, Shu Liu, Tianneng Shi, Xiaoyuan Liu, Jared Quincy Davis, Emmanuele Lacavalla, Alessandro Basile, Shuyi Yang, Paul Castro, Daniel Kang, Koushik Sen, Dawn Song, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia, Marquita Ellis

Affiliations University of California at Berkeley; IBM Research; University of Illinois at Urbana-Champaign; Stanford University

AI summary Based on a survey of 86 deployed systems and 20 case studies, the paper finds that LLM agents in production are built mainly with simple, controllable methods, that reliability is the top challenge, and that teams rely on systems-level design and human evaluation.

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026) as Oral Presentation

Abstract

LLM-based agents already operate in production across many industries, yet we lack an understanding of what technical methods make deployments successful. We present the first systematic study of Measuring Agents in Production, MAP, using first-hand data from agent developers. We conducted 20 case studies via in-depth interviews and surveyed practitioners of 86 deployed systems across 26 domains. We investigate why organizations build agents, how they build them, how they evaluate them, and their top development challenges. Our study finds that production agents are built using simple, controllable approaches: 68% execute at most 10 steps before human intervention, 70% rely on prompting off-the-shelf models instead of weight tuning, and 74% depend primarily on human evaluation. Reliability (consistent correct behavior over time) remains the top development challenge, which practitioners currently address through systems-level design. MAP documents the current state of production agents, providing the research community with visibility into deployment realities and underexplored research avenues.

2601.12375 2026-06-08 cs.NI cs.LG (updated version)

LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G

Farhad Rezazadeh, Hatim Chergui, Amir Ashtari Gargari, Mehdi Bennis, Houbing Song, Lingjia Liu, Merouane Debbah

Affiliations i2CAT Foundation; University of Oulu; University of Maryland, Baltimore County (UMBC); Virginia Tech; Khalifa University of Science and Technology

AI summary Proposes LiQSS, a post-Transformer, quantum-inspired state-space tensor network that replaces self-attention with linear-complexity structured state-space kernels and combines them with Tensor Train factorization and lightweight gating; for Near-RT KPI forecasting in 6G O-RAN it achieves up to 155x fewer parameters and up to 2.74x faster inference without loss of accuracy.

Comments 13 pages, 4 figures, 5 tables

Abstract

Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Real-Time (Near-RT) latency and computational constraints. While Transformer-based models are effective for sequence modeling, their quadratic complexity limits scalability in Near-RT RAN Intelligent Controller (RIC) analytics. This paper investigates a post-Transformer design paradigm for efficient radio telemetry forecasting. We propose a quantum-inspired many-body state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels, enabling linear-time sequence modeling. Tensor-network factorizations in the form of Tensor Train (TT) / Matrix Product State (MPS) representations are employed to reduce parameterization and data movement in both input projections and prediction heads, while lightweight channel gating and mixing layers capture non-stationary cross-Key Performance Indicator (KPI) dependencies. The proposed model is instantiated as an agentic perceive-predict xApp and evaluated on a bespoke O-RAN KPI time-series dataset comprising 59,441 sliding windows across 13 KPIs, using Reference Signal Received Power (RSRP) forecasting as a representative use case. Our proposed Linear Quantum-Inspired State-Space (LiQSS) model is 10.8x-15.8x smaller and approximately 1.4x faster than prior structured state-space baselines. Relative to Transformer-based models, LiQSS achieves up to a 155x reduction in parameter count and up to 2.74x faster inference, without sacrificing forecasting accuracy.
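
Illustrative note: the abstract credits Tensor Train (TT) factorization with reducing parameterization. The sketch below only counts parameters for a TT-factorized linear map versus a dense one. The layer shape, mode factorization, and TT ranks are hypothetical values chosen for illustration; they are not taken from the paper, and `tt_param_count` is a helper name introduced here.

```python
from math import prod

def tt_param_count(in_modes, out_modes, ranks):
    """Parameters in a TT-matrix: core k has shape (r_k, in_k, out_k, r_{k+1})."""
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )

# Hypothetical 1024 x 1024 projection, factorized as (8*8*16) x (8*8*16).
in_modes = (8, 8, 16)
out_modes = (8, 8, 16)
ranks = (1, 4, 4, 1)  # assumed TT ranks; boundary ranks are always 1

dense_params = prod(in_modes) * prod(out_modes)
tt_params = tt_param_count(in_modes, out_modes, ranks)
```

With these assumed shapes the dense map holds 1,048,576 weights and the TT form holds 2,304. The actual savings in LiQSS depend on the ranks and layers the authors chose.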

2512.23128 2026-06-08 cs.HC cs.AI cs.MA (updated version)

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Piotr Błaszczyk, Will Howard, Lukas Aichberger, Chris Russell, Philip H. S. Torr, Adam Mahdi, Adel Bibi

Affiliations University of Cambridge

AI summary Introduces the TRAP benchmark to evaluate how susceptible LLM-powered web agents are to prompt injection in dynamic web pages; agents are redirected in 25% of tasks on average, revealing systemic, psychologically driven vulnerabilities.

Comments ICML 2026

Abstract

Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), a benchmark for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25% of tasks on average (13% for GPT-5 to 43% for DeepSeek-R1), with small interface or contextual changes often doubling success rates and revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled experiments on high-fidelity website clones, allowing for further benchmark expansion.

2511.04567 2026-06-08 physics.plasm-ph cs.CE cs.LG physics.comp-ph (updated version)

Machine Learning for Electron-Scale Turbulence Modeling in W7-X

Ionut-Gabriel Farcas, Don Lawrence Carl Agapito Fernando, Alejandro Banon Navarro, Gabriele Merlo, Frank Jenko

Affiliations Department of Mathematics and Division of Computational Modeling and Data Analytics, Academy of Data Science, Virginia Tech; Max Planck Institute for Plasma Physics

AI summary For electron temperature gradient turbulence in the Wendelstein 7-X stellarator, builds reduced models from physics-guided scaling laws fitted by regression with active learning, predicts heat fluxes, and evaluates interpolation and extrapolation performance.

Comments 15 pages, 7 tables, 14 figures

Journal ref Phys. Plasmas 33, 000000 (2026)

Abstract

Constructing reduced models for turbulent transport is essential for accelerating profile predictions and enabling many-query tasks such as parameter exploration and design optimization. This work investigates machine-learning-driven reduced models for Electron Temperature Gradient (ETG) turbulence in the Wendelstein 7-X (W7-X) stellarator. We develop physics-guided scaling laws to predict the ETG heat flux at seven radial locations as functions of three key plasma parameters: the normalized electron temperature gradient ($ω_{T_e}$), the ratio of normalized electron temperature and density gradients ($η_e$), and the electron-to-ion temperature ratio ($τ$). The model coefficients are determined through regression combined with an active learning strategy. The procedure initializes the scaling laws using low-cardinality sparse-grid training data and iteratively enriches the training set by selecting maximally informative samples from an existing simulation database. The predictive performance of the models is assessed using out-of-sample datasets comprising more than $393$ points per radial location. Using the coefficients identified at the seven training radial locations, we further derive regression-based parameterizations for the scaling-law coefficients as functions of radial position. The resulting models are then evaluated at three additional radial locations not used during training, including both interpolation and moderate extrapolation cases. Overall, our reduced models demonstrate good predictive performance and achieve accuracy comparable to the original reference simulations, including in interpolation and moderate extrapolation regimes. An important finding is that a single radius-independent model cannot adequately describe ETG transport across the W7-X core, suggesting the presence of geometry-dependent physics not captured by the present formulation.

2511.02748 2026-06-08 cs.NI cs.LG (updated version)

Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning

Farhad Rezazadeh, Amir Ashtari Gargari, Hatim Chergui, Sandra Lagen, Merouane Debbah, Houbing Song, Lingjia Liu

Affiliations BrainOmega and the Technical University of Catalonia (UPC); Centre Tecnologic de Telecomunicacions de Catalunya (CTTC/CERCA); i2CAT Foundation; Khalifa University of Science and Technology; University of Maryland, Baltimore County (UMBC); Virginia Tech

AI summary Proposes WM-MS3M, a world-modeling-based agentic framework that uses a generative state space for near-real-time counterfactual reasoning and resource allocation in 6G networks; on O-RAN data it lowers prediction error and speeds up inference.

Comments 13 Pages, 3 Figures, 4 Tables

Abstract

We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose -- to simulate future scenarios, weigh trade-offs, and act with calibrated uncertainty. We reframe open radio access network (O-RAN) near-real-time (Near-RT) control via counterfactual dynamics and a world modeling (WM) paradigm that learns an action-conditioned generative state space. This enables quantitative "what-if" forecasting beyond large language models (LLMs) as the primary modeling primitive. Actions such as physical resource blocks (PRBs) are treated as first-class control inputs in a causal world model, and both aleatoric and epistemic uncertainty are modeled for prediction and what-if analysis. An agentic, model predictive control (MPC)-based cross-entropy method (CEM) planner operates over short horizons, using prior-mean rollouts within data-driven PRB bounds to maximize a deterministic reward. The model couples multi-scale structured state-space mixtures (MS3M) with a compact stochastic latent to form WM-MS3M, summarizing key performance indicator (KPI) histories and predicting next-step KPIs under hypothetical PRB sequences. On realistic O-RAN traces, WM-MS3M cuts mean absolute error (MAE) by 1.69% versus MS3M with 32% fewer parameters and similar latency, and achieves 35-80% lower root mean squared error (RMSE) than attention/hybrid baselines with 2.3-4.1x faster inference, enabling rare-event simulation and offline policy screening.
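
Illustrative note: the abstract describes a cross-entropy method (CEM) planner that searches bounded action sequences to maximize a deterministic reward. The sketch below is a generic CEM loop in plain Python under stated assumptions, not the authors' planner: `toy_reward` is a made-up stand-in for rollouts through a learned world model, and the bounds, horizon, population size, and elite count are arbitrary.

```python
import random
import statistics

def cem_plan(reward_fn, horizon, low, high, iters=20, pop=64, elite=8, seed=0):
    """Cross-entropy method over action sequences bounded to [low, high]."""
    rng = random.Random(seed)
    mean = [(low + high) / 2.0] * horizon
    std = [(high - low) / 2.0] * horizon
    for _ in range(iters):
        # Sample candidate sequences, clipping each action to the allowed bounds.
        candidates = [
            [min(high, max(low, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(pop)
        ]
        # Keep the highest-reward sequences and refit the sampling distribution.
        candidates.sort(key=reward_fn, reverse=True)
        elites = candidates[:elite]
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-6 for col in zip(*elites)]
    return mean

# Hypothetical reward: peaks when every action in the sequence equals 30.
def toy_reward(actions):
    return -sum((a - 30.0) ** 2 for a in actions)

plan = cem_plan(toy_reward, horizon=3, low=0.0, high=100.0)
```

In a setting like the one the abstract describes, `reward_fn` would score a candidate PRB sequence by rolling it through the world model; the clipping step plays the role of the data-driven action bounds.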

2510.17004 2026-06-08 cs.MA cs.AI (updated version)

ReclAIm: A Multi-Agent Framework for Monitoring and Correcting Performance Decline in Medical Imaging AI

Eleftherios Tzanis, Michail E. Klontzas

Affiliations Artificial Intelligence and Translational Imaging (ATI) Lab, Department of Radiology, School of Medicine, University of Crete; Computational Biomedicine Laboratory, Institute of Computer Science Foundation for Research and Technology Hellas (ICS - FORTH), Heraklion, Crete, Greece; Division of Radiology, Department of Clinical Science, Intervention and Technology (CLINTEC), Karolinska Institute, Huddinge, Sweden

AI summary Proposes ReclAIm, an LLM-based multi-agent framework that automatically monitors medical image classification models for performance decline through natural language interaction and triggers fine-tuning, using data augmentation, class imbalance handling, and parameter-anchoring regularization; validated on multiple datasets.

Comments Published in Radiology: Artificial Intelligence (https://doi.org/10.1148/ryai.250923)

Abstract

Purpose: To develop and evaluate a multi-agent framework (ReclAIm) for automated monitoring, detection, and correction of performance decline in medical image classification models. Materials and Methods: ReclAIm is a large language model-based multi-agent system that operates through natural language interaction. A master agent coordinating three task-specific agents performed performance evaluation and triggered fine-tuning when substantial performance declines were detected. The fine-tuning workflow incorporated data augmentation, class imbalance handling, and a parameter-anchoring regularization strategy to limit catastrophic forgetting. The system was benchmarked using multiple imaging datasets, including brain MRI, chest CT, and chest radiography, partitioned into model development, inference (performance monitoring), and fine-tuning subsets (60%:20%:20%). Results: ReclAIm successfully orchestrated training, evaluation, and performance monitoring across all datasets. Performance discrepancies between test and inference data were detected in 8 of 18 models, prompting fine-tuning workflows that reduced performance gaps. In cases with declines of up to 40.6% (cardiomegaly dataset, InceptionV3), fine-tuning restored performance metrics to within 2% of baseline values. Conclusion: ReclAIm provides a prototype framework for automated monitoring and targeted fine-tuning of medical image classification models, with a natural language interface designed to support accessibility in research and potential clinical applications.

2509.22685 2026-06-08 eess.IV cs.CV cs.GR (updated version)

VIRTUS-FPP: Virtual Sensor Modeling for Fringe Projection Profilometry in NVIDIA Isaac Sim

Adam Haroon, Anush Lakshman, Badrinath Balasubramaniam, Beiwen Li

Affiliations Department of Mechanical Engineering, Iowa State University; College of Engineering, University of Georgia

AI summary Presents VIRTUS-FPP, the first end-to-end virtual sensor modeling framework for fringe projection profilometry implemented in NVIDIA Isaac Sim; it provides physically grounded simulation without requiring a pre-calibrated physical system and supports sub-millimeter reconstruction accuracy.

Comments 10 pages, 13 figures, accepted for publication in IEEE Sensors Journal

Abstract

Fringe projection profilometry (FPP) is a high-precision structured-light sensing technique for 3D surface reconstruction, yet its practical deployment is often constrained by complex calibration procedures, sensitivity to environmental conditions, and the high cost of physical experimentation. At the same time, robotics research increasingly relies on simulation platforms such as NVIDIA Isaac Sim for scalable development and validation, but accurate virtual representations of optical metrology sensors such as FPP are not currently available. In this work, we present VIRTUS-FPP, the first end-to-end virtual sensor modeling framework for fringe projection profilometry implemented in NVIDIA Isaac Sim, enabling physically grounded simulation of the complete FPP pipeline, including structured light projection, image formation, calibration, and 3D reconstruction, without dependence on pre-calibrated physical systems. The framework leverages an inverse camera model for projector representation, ensuring geometric and photometric fidelity consistent with structured-light principles. By bridging optical metrology and robotics simulation, VIRTUS-FPP enables high-fidelity synthetic data generation, systematic evaluation of sensing pipelines, and digital twin replication of real-world FPP systems. Experimental results demonstrate sub-millimeter reconstruction accuracy and strong correspondence between simulated and physical measurements, highlighting the framework's effectiveness and its potential to advance perception-driven robotics, simulation-to-reality transfer, and scalable optical sensor design.

2509.04991 2026-06-08 physics.ao-ph cs.AI cs.LG (updated version)

A Mechanism-Coupled Split Window Network for Medium- to High-Resolution Land Surface Temperature Retrieval

Tian Xie, Menghui Jiang, Chao Zeng, Huifang Li, Guanhao Zhang, Chan Li, Huanfeng Shen

Affiliations School of Resource and Environmental Sciences, Wuhan University; Key Laboratory of Geographic Information System of Ministry of Education, Wuhan; Key Laboratory of Digital Cartography and Land Information Application of the Ministry of Natural Resources, Wuhan

AI summary Proposes the Parallel Component Decoupled Neural Network (PCD-Net), which reformulates split-window retrieval as dynamic learning of physical component coefficients; component-level decoupled modeling and a residual branch are used to achieve accurate, robust, and globally generalizable land surface temperature retrieval under complex atmospheric and surface conditions.

Abstract

Land surface temperature (LST) is a fundamental physical variable in land-atmosphere interactions, surface energy budgets, and climate processes. LST derived from medium- to high-resolution thermal infrared (TIR) observations effectively reveals thermal environmental disparities across distinct landscape units. However, achieving accurate, robust, and globally generalizable LST retrieval remains challenging under complex atmospheric conditions and diverse land cover types. Traditional split window (SW) algorithms heavily rely on empirical parameterizations, whose fixed coefficients fail to adapt to complex scenarios such as high surface temperatures and high atmospheric water vapor content. Concurrently, conventional data-driven models exhibit limited generalizability to out-of-distribution (OOD) samples due to the absence of explicit physical structure constraints. To address these issues, this study proposes a Parallel Component Decoupled Neural Network (PCD-Net) framework, which reformulates SW retrieval as a dynamic learning problem of physical component coefficients. Using the SW equation as the physical backbone, the framework constructs parallel subnetworks to adaptively learn the dynamic coefficients corresponding to the constant, first-order, and second-order brightness temperature difference terms; meanwhile, a residual branch is incorporated to supplement the nonlinear coupling corrections induced by the joint effects of surface emissivity and atmospheric water vapor. Through this component-level decoupled modeling, PCD-Net explicitly characterizes the dynamic response relationships between land surface emissivity, atmospheric water vapor content, and different SW physical components.

2508.17693 2026-06-08 cs.DB cs.AI cs.CL (updated version)

Database Normalization via Dual-LLM Self-Refinement

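Eunjae Jo, Nakyung Lee, Gyuyeong Kim

Affiliations University of California, Berkeley

AI summary Proposes Miffie, a framework that automates database normalization with large language models through a dual-model self-refinement architecture, requiring no human effort while maintaining high accuracy.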

Comments 7 pages

Abstract

Database normalization is crucial to preserving data integrity. However, it is time-consuming and error-prone, as it is typically performed manually by data engineers. To this end, we present Miffie, a database normalization framework that leverages the capability of large language models. Miffie enables automated data normalization without human effort while preserving high accuracy. The core of Miffie is a dual-model self-refinement architecture that combines the best-performing models for normalized schema generation and verification, respectively. The generation module eliminates anomalies based on the feedback of the verification module until the output schema satisfies the requirement for normalization. We also carefully design task-specific zero-shot prompts to guide the models for achieving both high accuracy and cost efficiency. Experimental results show that Miffie can normalize complex database schemas while maintaining high accuracy.
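
Illustrative note: the abstract describes a generation module that revises a schema from a verification module's feedback until the output satisfies the normalization requirement. The loop below sketches that control flow only. It is an assumption-laden toy, not Miffie's implementation: `refine_schema`, `toy_generate`, and `toy_verify` are hypothetical names, the two "models" are hard-coded functions rather than LLM calls, and the round limit is arbitrary.

```python
from typing import Callable, Tuple

def refine_schema(
    schema: str,
    generate: Callable[[str, str], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[str, bool]:
    """Alternate generation and verification until the verifier accepts."""
    feedback = ""
    candidate = schema
    for _ in range(max_rounds):
        candidate = generate(candidate, feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate, True
    return candidate, False

# Toy stand-ins for the two models (hypothetical, no LLM calls).
def toy_generate(schema: str, feedback: str) -> str:
    # Split the table only after the verifier has reported a dependency problem.
    if "transitive dependency" in feedback:
        return "orders(order_id, customer_id); customers(customer_id, name)"
    return schema

def toy_verify(schema: str) -> Tuple[bool, str]:
    if "customers(" in schema:
        return True, ""
    return False, "transitive dependency: name depends on customer_id"

result, accepted = refine_schema(
    "orders(order_id, customer_id, name)", toy_generate, toy_verify
)
```

Here the first round is rejected with feedback and the second round is accepted. Separating the two roles lets each be served by whichever model performs best at it, which is the design the abstract describes.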

2503.00065 2026-06-08 cs.CR cs.LG (updated version)

ADAGE: Active Defenses Against GNN Extraction

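Jing Xu, Franziska Boenisch, Adam Dziedzic

Affiliations CISPA Helmholtz Center for Information Security

AI summary Proposes ADAGE, the first general active defense framework; by monitoring query diversity and progressively perturbing outputs, it prevents a range of GNN model stealing attacks while preserving downstream task performance.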

Comments Accepted at AsiaCCS 2026

Abstract

Graph Neural Networks (GNNs) achieve high performance in various real-world applications, such as drug discovery, traffic state prediction, and recommendation systems. The fact that building powerful GNNs requires a large amount of training data, powerful computing resources, and human expertise turns the models into lucrative targets for model stealing attacks. Prior work has revealed that the threat vector of stealing attacks against GNNs is large and diverse, as an attacker can leverage various heterogeneous signals ranging from node labels to high-dimensional node embeddings to create a local copy of the target GNN at a fraction of the original training costs. This diversity in the threat vector renders the design of effective and general defenses challenging, and existing defenses usually focus on one particular stealing setup. Additionally, they solely provide means to identify stolen model copies rather than preventing the attack. To close this gap, we propose the first general Active Defense Against GNN Extraction (ADAGE). ADAGE builds on the observation that stealing a model's full functionality requires highly diverse queries to leak its behavior across the input space. Our defense monitors this query diversity and progressively perturbs outputs as the accumulated leakage grows. In contrast to prior work, ADAGE can prevent stealing across all common attack setups. Our extensive experimental evaluation using six benchmark datasets, four GNN models, and three types of adaptive attackers shows that ADAGE penalizes attackers to the degree of rendering stealing impossible, whilst preserving predictive performance on downstream tasks. ADAGE thereby contributes towards securely sharing valuable GNNs in the future.
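
Illustrative note: the abstract says the defense monitors query diversity and perturbs outputs more strongly as accumulated leakage grows. The class below is a toy sketch of that idea under loud assumptions. The abstract does not specify how ADAGE measures diversity or perturbs outputs; here diversity is the fraction of pre-defined input regions a user has queried and the perturbation is uniform noise on class probabilities. All names are hypothetical.

```python
import random

class DiversityAwareDefense:
    """Toy per-user defense: noise grows with the share of regions queried."""

    def __init__(self, num_regions, max_noise=0.5, seed=0):
        self.num_regions = num_regions
        self.max_noise = max_noise
        self.seen = set()
        self.rng = random.Random(seed)

    def noise_scale(self):
        # Leakage proxy (assumed): fraction of input-space regions touched so far.
        return self.max_noise * len(self.seen) / self.num_regions

    def answer(self, region_id, probs):
        self.seen.add(region_id)
        scale = self.noise_scale()
        # Perturb each probability, keep it positive, then renormalize.
        noisy = [max(1e-9, p + self.rng.uniform(-scale, scale)) for p in probs]
        total = sum(noisy)
        return [p / total for p in noisy]

benign = DiversityAwareDefense(num_regions=10)
for _ in range(20):
    benign.answer(0, [0.7, 0.3])         # repeated queries in one region

attacker = DiversityAwareDefense(num_regions=10)
for region in range(10):
    attacker.answer(region, [0.7, 0.3])  # queries spread over every region
```

In this toy, the user who stays in one region ends with a noise scale of 0.05, while the user who sweeps all ten regions reaches the maximum of 0.5, so low-diversity use keeps nearly clean outputs.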

2506.11066 2026-06-08 cs.SE cs.AI (updated version)

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

Jiahui Geng, Fengyu Cai, Shaobo Cui, Qing Li, Liangwei Chen, Chenyang Lyu, Haonan Li, Derui Zhu, Walter Pretschner, Heinz Koeppl, Fakhri Karray

Affiliations Linköping University; MBZUAI; TU Darmstadt; Shanghai Jiao Tong University; EPFL; University of Groningen; Google Tokyo; Alibaba Group; TU Munich

AI summary Introduces CoQuIR, the first large-scale multilingual benchmark for quality-aware code retrieval, covering correctness, efficiency, security, and maintainability; with fine-grained annotations and two quality-centric metrics it evaluates 23 models, finds that top models often fail to distinguish defective code, and explores training methods that improve quality awareness.

Abstract

Code retrieval is essential in modern software development, as it boosts code reuse and accelerates debugging. However, current benchmarks primarily emphasize functional relevance while neglecting critical dimensions of software quality. Motivated by this gap, we introduce CoQuIR, the first large-scale, multilingual benchmark specifically designed to evaluate quality-aware code retrieval across four key dimensions: correctness, efficiency, security, and maintainability. CoQuIR provides fine-grained quality annotations for 42,725 queries and 134,907 code snippets in 11 programming languages, and is accompanied by two quality-centric evaluation metrics: Pairwise Preference Accuracy and Margin-based Ranking Score. Using CoQuIR, we benchmark 23 retrieval models, covering both open-source and proprietary systems, and find that even top-performing models frequently fail to distinguish buggy or insecure code from their more robust counterparts. Furthermore, we conduct preliminary investigations into training methods that explicitly encourage retrievers to recognize code quality. Using synthetic datasets, we demonstrate promising improvements in quality-aware metrics across various models, without sacrificing semantic relevance. Downstream code generation experiments further validate the effectiveness of our approach. Overall, our work highlights the importance of integrating quality signals into code retrieval systems, laying the groundwork for more trustworthy and robust software development tools.

2404.02141 2026-06-08 stat.ME cs.LG econ.EM stat.CO stat.ML (updated version)

Robustly estimating heterogeneity in factorial data using Rashomon Partitions

Aparajithan Venkateswaran, Anirudh Sankar, Arun G. Chandrasekhar, Tyler H. McCormick

Affiliations Department of Statistics, University of Washington, USA; Department of Economics, Stanford University, USA; J-PAL, NBER, USA; Department of Sociology, University of Washington, USA

AI summary Proposes Rashomon Partition Sets (RPS), a Bayesian framework that quantifies model uncertainty by enumerating all models whose posterior density is close to that of the maximum a posteriori model, enabling robust estimation of heterogeneity.

Abstract

In both observational data and randomized control trials, researchers select statistical models to articulate how the outcome of interest varies with combinations of observable covariates. Choosing a model that is too simple can obfuscate important heterogeneity in outcomes between covariate groups, while too much complexity risks identifying spurious patterns. In this paper, we propose a novel Bayesian framework for model uncertainty called Rashomon Partition Sets (RPSs). The RPS consists of all models that have posterior density close to the maximum a posteriori (MAP) model. We construct the RPS by enumeration, rather than sampling, which ensures that we explore all models with high evidence in the data, even if they offer dramatically different substantive explanations. We use an $\ell_0$ prior, which allows us to capture complex heterogeneity without imposing strong assumptions about the associations between effects, and we show that this prior is minimax optimal from an information-theoretic perspective. We characterize the approximation error of (functions of) parameters computed conditional on being in the RPS relative to the entire posterior. We propose an algorithm to enumerate the RPS from the class of models that are interpretable and unique, then provide bounds on the size of the RPS. We give simulation evidence along with three empirical examples: price effects on charitable giving, heterogeneity in chromosomal structure, and the introduction of microfinance.
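
Illustrative note: the abstract defines the RPS as all models whose posterior density is close to that of the MAP model, found by enumeration rather than sampling. The toy below enumerates every partition of three covariate groups and keeps those within a tolerance of the best score. It is a sketch under assumptions, not the paper's algorithm: the score (negative squared error minus a per-block penalty standing in for an $\ell_0$-style prior), the tolerance `epsilon`, and the data are invented, and the paper restricts enumeration to a structured class of partitions rather than all of them.

```python
def partitions(items):
    """Yield every set partition of a list."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block, or into a new block of its own.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def log_score(part, outcomes, penalty):
    """Toy unnormalized log posterior: fit term minus a penalty per block."""
    sse = 0.0
    for block in part:
        mean = sum(outcomes[k] for k in block) / len(block)
        sse += sum((outcomes[k] - mean) ** 2 for k in block)
    return -sse - penalty * len(part)

def rashomon_set(outcomes, penalty, epsilon):
    """Keep every partition scoring within `epsilon` of the best one."""
    scored = [(log_score(p, outcomes, penalty), p) for p in partitions(list(outcomes))]
    best = max(s for s, _ in scored)
    return [(s, p) for s, p in scored if s >= best - epsilon]

# Hypothetical mean outcomes for three covariate groups.
outcomes = {"a": 1.0, "b": 1.1, "c": 3.0}
rps = rashomon_set(outcomes, penalty=0.5, epsilon=0.6)
```

With these made-up numbers, two of the five partitions survive: pooling `a` with `b` (the best-scoring model) and keeping all three groups separate. Both explanations stay on the table instead of only the single top model.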

2507.01548 2026-06-08 cs.HC cs.AI cs.CL (updated version)

Telling stories, making Hanzi: AI-assisted co-creation with elderly migrants in urban China

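Yunfei Chen, Wen Zhan, Peiyue Lin, Ziqun Hua, Ying Hu

Affiliations School of Design, Hunan University; Royal College of Art; University of the Arts London, Central Saint Martins

AI summary Through co-creation workshops combining oral storytelling, AI assistance, and hand-making, elderly migrants create new Hanzi to record overlooked life stories; the study reveals participants' heterogeneity and adaptive capacity and shows AI acting as a creative initiator that lowers barriers to expression.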

Abstract

This paper explores how older migrants in urban China can record stories that everyday language and design often miss. We ran two co-creation workshops with 10 elders. Activities combined oral storytelling, facilitator-mediated AI assistance, and hand-making. Large language models proposed candidate glyphs through a facilitator. Participants crafted new Hanzi to hold their stories. The resulting characters served as memory anchors for later sharing and retelling. Our interpretive analysis shows heterogeneity and adaptive capacity among participants. Participants experienced AI as a creative initiator that lowered barriers to expression and making, especially for those with lower digital literacy. The work challenges homogenizing assumptions about older adults and the presumption of uniform capacities and needs. We contribute a workshop framework that positions AI as a backstage facilitator. We also offer insights on engaging older migrants as sources of community memory and situated cultural knowledge within inclusive urban systems.

2505.17739 2026-06-08 cs.MA cs.CY cs.HC cs.RO 版本更新

Feasible Action Space Reduction for Quantifying Causal Responsibility in Continuous Spatial Interactions

可行动作空间缩减用于量化连续空间交互中的因果责任

Ashwin George, Luciano Cavalcante Siebert, David A. Abbink, Arkady Zgonnikov

发表机构 * Delft University of Technology(代尔夫特理工大学)

AI总结 针对连续动作空间,提出FeAR度量的连续空间公式,用于量化空间交互中智能体的因果责任,并展示其在分析回溯责任和估计前瞻责任中的应用。

Comments In review

详情
AI中文摘要

理解一个智能体对另一个智能体的因果影响对于将自动化车辆和移动机器人等人工智能系统安全部署到人类居住环境中至关重要。现有的因果责任模型处理具有离散动作的场景的简化抽象,从而限制了在理解空间交互中的责任时的实际应用。基于空间交互的智能体嵌入场景中且必须在每个时刻执行一个动作的假设,提出了可行动作空间缩减(FeAR)作为离散动作的网格世界环境中因果责任的度量。由于现实世界的交互涉及连续动作空间,本文提出了用于测量空间连续交互中因果责任的FeAR度量的公式。我们展示了该度量在典型空间共享冲突中的效用,并展示了其在分析回溯责任和估计前瞻责任以指导智能体决策中的应用。我们的结果突显了FeAR度量在设计和工程化人工智能体以及评估人类周围智能体责任方面的潜力。

英文摘要

Understanding the causal influence of one agent on another agent is crucial for safely deploying artificially intelligent systems such as automated vehicles and mobile robots into human-inhabited environments. Existing models of causal responsibility deal with simplified abstractions of scenarios with discrete actions, thus limiting real-world use when understanding responsibility in spatial interactions. Based on the assumption that spatially interacting agents are embedded in a scene and must follow an action at each instant, Feasible Action-Space Reduction (FeAR) was proposed as a metric for causal responsibility in a grid-world setting with discrete actions. Since real-world interactions involve continuous action spaces, this paper proposes a formulation of the FeAR metric for measuring causal responsibility in space-continuous interactions. We illustrate the utility of the metric in prototypical space-sharing conflicts, and showcase its applications for analysing backward-looking responsibility and in estimating forward-looking responsibility to guide agent decision making. Our results highlight the potential of the FeAR metric for designing and engineering artificial agents, as well as for assessing the responsibility of agents around humans.

2209.00188 2026-06-08 cs.AR cs.LG 版本更新

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

Hermes: 通过基于感知器的片外负载预测加速长延迟负载请求

Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu

发表机构 * ETH Zürich(苏黎世联邦理工学院) Intel Processor Architecture Research Lab(英特尔处理器架构研究实验室) LIRMM, Univ. Montpellier, CNRS(蒙彼利埃大学LIRMM实验室,CNRS)

AI总结 提出Hermes技术,利用感知器预测片外负载请求,投机性地直接从主存获取数据,同时并行访问缓存层次,从而消除片外负载关键路径上的片上缓存访问延迟,显著提升处理器性能。

Comments To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

详情
AI中文摘要

长延迟负载请求持续限制高性能处理器的性能。为增加处理器的延迟容忍度,架构师主要依赖两种关键技术:复杂的数据预取器和大型片上缓存。在这项工作中,我们表明:1) 即使是最先进的复杂预取器,在广泛的工作负载中平均也只能预测一半的片外负载请求;2) 由于片上缓存的规模和复杂性不断增加,片外负载请求的大部分延迟都花费在访问片上缓存层次结构上。本工作的目标是通过从片外负载请求的关键路径中移除片上缓存访问延迟来加速它们。为此,我们提出了一种名为Hermes的新技术,其关键思想是:1) 准确预测哪些负载请求可能走向片外;2) 投机性地直接从主存获取预测的片外负载所需的数据,同时并发访问这些负载的缓存层次结构。为实现Hermes,我们开发了一种新的轻量级、基于感知器的片外负载预测技术,该技术学习使用多个程序特征(例如,程序计数器序列)来识别片外负载请求。对于每个负载请求,预测器观察一组程序特征以预测该负载是否会走向片外。如果预测负载将走向片外,Hermes在负载的物理地址生成后立即向内存控制器发出投机性请求。如果预测正确,负载最终会错过缓存层次结构,并等待正在进行的投机性请求完成,从而从片外负载的关键路径中隐藏片上缓存层次结构访问延迟。我们的评估表明,Hermes显著提升了最先进基线的性能。我们开源了Hermes。

英文摘要

Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average across a wide range of workloads, and 2) due to the increasing size and complexity of on-chip caches, a large fraction of the latency of an off-chip load request is spent accessing the on-chip cache hierarchy. The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads. To enable Hermes, we develop a new lightweight, perceptron-based off-chip load prediction technique that learns to identify off-chip load requests using multiple program features (e.g., sequence of program counters). For every load request, the predictor observes a set of program features to predict whether or not the load would go off-chip. If the load is predicted to go off-chip, Hermes issues a speculative request directly to the memory controller once the load's physical address is generated. If the prediction is correct, the load eventually misses the cache hierarchy and waits for the ongoing speculative request to finish, thus hiding the on-chip cache hierarchy access latency from the critical path of the off-chip load. Our evaluation shows that Hermes significantly improves performance of a state-of-the-art baseline. We open-source Hermes.
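The prediction step described above (observe program features for each load, predict off-chip or not, then learn from the outcome) can be sketched with a generic hashed-perceptron scheme. This is a toy illustration and not Hermes's actual design: the feature names `pc` and `page_offset`, the table size, the thresholds, and the weight saturation value are all assumptions.

```python
class OffChipPredictor:
    """Toy hashed-perceptron predictor (illustrative, not the Hermes hardware)."""

    def __init__(self, table_size=1024, act_threshold=0, train_threshold=8, w_max=15):
        self.table_size = table_size
        self.act_threshold = act_threshold      # predict off-chip if score exceeds this
        self.train_threshold = train_threshold  # keep training while confidence is low
        self.w_max = w_max                      # saturating weight magnitude
        self.tables = {}                        # feature name -> list of integer weights

    def _score(self, features):
        # Each feature value indexes its own weight table; the score is the sum.
        total = 0
        for name, value in features.items():
            table = self.tables.setdefault(name, [0] * self.table_size)
            total += table[value % self.table_size]
        return total

    def predict(self, features):
        return self._score(features) > self.act_threshold

    def train(self, features, went_off_chip):
        score = self._score(features)
        predicted = score > self.act_threshold
        # Update on a misprediction or when the score is not yet confident.
        if predicted != went_off_chip or abs(score) < self.train_threshold:
            step = 1 if went_off_chip else -1
            for name, value in features.items():
                table = self.tables[name]
                i = value % self.table_size
                table[i] = max(-self.w_max, min(self.w_max, table[i] + step))


predictor = OffChipPredictor()
miss_load = {"pc": 0x400A10, "page_offset": 0x1C0}  # hypothetical load that misses
hit_load = {"pc": 0x400B24, "page_offset": 0x040}   # hypothetical load that hits
for _ in range(10):
    predictor.train(miss_load, went_off_chip=True)
    predictor.train(hit_load, went_off_chip=False)
```

In the scheme the abstract describes, a positive prediction would trigger a speculative request to the memory controller in parallel with the normal cache lookup. The sketch covers only the predict and train logic.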

2203.07904 2026-06-08 eess.IV cs.CV cs.LG 版本更新

Unsupervised Learning Based Focal Stack Camera Depth Estimation

基于无监督学习的焦堆相机深度估计

Zhengyu Huang, Weizhi Du, Theodore B. Norris

发表机构 * Center for Ultrafast Optical Science, University of Michigan(超快光学科学中心,密歇根大学) University of Michigan(密歇根大学)

AI总结 提出一种基于无监督深度学习的方法,从焦堆相机图像估计深度,在NYU-v2数据集上相比单图像方法显著提高精度。

Journal ref in Conference on Lasers and Electro-Optics, Technical Digest Series (Optica Publishing Group, 2022), paper JW3A.5

详情
AI中文摘要

我们提出一种基于无监督深度学习的方法,从焦堆相机图像估计深度。在NYU-v2数据集上,我们的方法相比基于单图像的方法实现了更好的深度估计精度。

英文摘要

We propose an unsupervised deep learning based method to estimate depth from focal stack camera images. On the NYU-v2 dataset, our method achieves much better depth estimation accuracy compared to single-image based methods.

2602.04234 2026-06-08 cs.MA cs.AI

When Does Multi-Agent Collaboration Help? An Entropy Perspective

多智能体协作何时有效?熵的视角

Yuxuan Zhao, Sijia Chen, Ningxin Su

发表机构 * Yantai Research Institute of Harbin Engineering University(哈尔滨工程大学烟台研究院) The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州))

AI总结 本文从熵的角度探讨了多智能体协作的有效性,通过分析不同拓扑结构、六个推理基准和两个智能体任务中的熵转换,发现单个智能体在43.3%的情况下表现更优,并揭示了熵动态在第一轮交互中的决定性作用。研究提出了三个关键观察:确定性偏好、基础熵和任务意识,并引入了熵判别算法来提升智能体系统的性能。

Comments Project page: https://multiagent-entropy.github.io/

详情
AI中文摘要

多智能体系统(MAS)已逐渐成为利用大型语言模型(LLMs)解决复杂任务的主流范式。然而,基于公开可用LLMs构建的MAS的有效性机制,尤其是其成功或失败的内在原因,仍鲜有研究。本文从熵的角度重新审视MAS,通过研究问题解决过程中不同拓扑结构、六个推理基准和两个智能体任务中的熵转换,考虑了智能体内部和交互动态。通过分析245个跨越token级、agent级和轮次级的熵特征,我们意外发现,在约43.3%的情况下,单个智能体优于MAS。此外,我们发现熵动态主要在首次交互轮次中决定。我们提供了三个关键观察:1)确定性偏好:峰值熵直接损害MAS的正确性,稳定熵直接促进MAS的正确性;2)基础熵:基础模型在问题解决过程中具有较低熵会因果驱动MAS性能;3)任务意识:MAS的熵动态在不同任务中扮演不同角色。基于这些见解,我们引入了一个简单而有效的算法,即熵判别器,用于从MAS的pass@k结果中选择解决方案,从而在所有MAS配置和任务中均实现了稳定的准确率提升。我们的源代码可在https://github.com/AgenticFinLab/multiagent-entropy获取。

英文摘要

Multi-agent systems (MAS) have emerged as a prominent paradigm for leveraging large language models (LLMs) to tackle complex tasks. However, the mechanisms governing the effectiveness of MAS built upon publicly available LLMs, specifically the underlying rationales for their success or failure, remain largely unexplored. In this paper, we revisit MAS through the perspective of entropy, considering both intra- and inter-agent dynamics by investigating entropy transitions during problem-solving across various topologies, six reasoning benchmarks, and two agentic tasks. By analyzing 245 features spanning token-, agent-, and round-level entropy, we counterintuitively find that a single agent outperforms MAS in approximately 43.3% of cases, and that entropy dynamics are largely determined during the first round of interaction. Furthermore, we provide three key observations: 1) Certainty Preference: peak entropy directly harms and stable entropy directly benefits MAS correctness; 2) Base Entropy: base models with lower entropy during problem-solving causally drive MAS performance; and 3) Task Awareness: entropy dynamics of MAS play varying roles across different tasks. Building on these insights, we introduce a simple yet effective algorithm, the Entropy Judger, to select solutions from MAS's pass@k results, leading to consistent accuracy improvements across all MAS configurations and tasks. Our source code is available at https://github.com/AgenticFinLab/multiagent-entropy.
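The idea of selecting among pass@k candidates using entropy can be made concrete with a deliberately naive rule: compute the mean token-level Shannon entropy of each candidate and keep the most certain one. The abstract does not specify how the Entropy Judger works, so this sketch is an assumption-laden stand-in. The candidate labels and probability vectors are made up, and the paper's algorithm may use very different features.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_entropy(distributions):
    """Average token-level entropy over one generated answer."""
    return sum(token_entropy(d) for d in distributions) / len(distributions)


def pick_most_certain(candidates):
    """candidates: dict mapping an answer label to its per-token distributions."""
    return min(candidates, key=lambda label: mean_entropy(candidates[label]))


# Two hypothetical candidates, each with two tokens over a two-word vocabulary.
candidates = {
    "answer_a": [[0.9, 0.1], [0.8, 0.2]],
    "answer_b": [[0.5, 0.5], [0.6, 0.4]],
}
chosen = pick_most_certain(candidates)
```

The rule follows the Certainty Preference observation only loosely, since that observation concerns peak and stable entropy, not the mean used here.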

2409.13477 2026-06-08 eess.IV cs.CV physics.med-ph

A Plug-and-Play Method for Guided Multi-contrast MRI Reconstruction based on Content/Style Modeling

基于内容/风格建模的即插即用式引导多对比度MRI重建方法

Chinmay Rao, Matthias van Osch, Nicola Pezzotti, Jeroen de Bresser, Mark van Buchem, Laurens Beljaards, Jakob Meineke, Elwin de Weerdt, Huangling Lu, Mariya Doneva, Marius Staring

发表机构 * University of Amsterdam(阿姆斯特丹大学) Erasmus University Rotterdam(鹿特丹伊拉斯姆斯大学) Erasmus University Medical Center(伊拉斯姆斯大学医学中心) University of Utrecht(乌得勒支大学)

AI总结 提出一种无需k空间训练数据的模块化即插即用方法PnP-CoSMo,通过内容/风格解耦利用参考扫描引导欠采样对比度重建,在公共和内部数据集上达到或超越端到端方法,并实现更高加速比。

详情
AI中文摘要

由于同一解剖结构的不同MR对比度包含冗余信息,一种对比度可用于引导在同一会话中随后采集的另一种欠采样对比度的重建。为了解决这一利用多对比度侧信息的重建问题,已有多种端到端学习方法被提出。然而,一个关键挑战是需要包含原始k空间数据和配准参考图像的大型配对训练数据集。我们提出了一种模块化的即插即用方法,该方法不需要k空间训练数据,仅依赖于部分配对的图像域数据集。首先学习双对比度MR图像数据的内容/风格模型,随后在迭代重建中作为即插即用算子应用。内容与风格的解耦允许显式表示对比度无关和对比度特定的因素。因此,将先验信息融入重建简化为使用从参考扫描中导出的高质量内容替换估计图像的混叠内容的操作。将该操作与MR数据一致性步骤以及内容估计的校正过程相结合,形成迭代方案。我们将这种新方法命名为PnP-CoSMo。通过设计,它提供了跨对比度的泛化能力,并基于两个给定对比度下的共享和非共享生成因素提供了一个解释框架。我们通过仿真探索了包括可解释性和收敛性在内的多个方面。此外,在公共NYU fastMRI DICOM数据集上展示了其实用性,显示出与端到端方法相当或更优的质量以及更强的泛化能力。在两个内部多线圈数据集上,在给定SSIM下,PnP-CoSMo相比非引导重建实现了高达32.6%的加速。

英文摘要

Since the various MR contrasts of a given anatomy contain redundant information, one contrast can be used to guide the reconstruction of another undersampled contrast acquired subsequently in the same session. To solve this reconstruction problem leveraging multi-contrast side information, several end-to-end learning-based methods have been proposed. However, a key challenge is the requirement for large paired training datasets comprising raw k-space data and aligned reference images. We propose a modular plug-and-play method, which requires no k-space training data and relies solely on partially paired image-domain datasets. A content/style model of two-contrast MR image data is first learned and subsequently applied as a plug-and-play operator in iterative reconstruction. The disentanglement of content and style allows explicit representation of contrast-independent and contrast-specific factors. Consequently, incorporating prior information into the reconstruction reduces to a simple replacement operation on the aliased content of the estimated image using high-quality content derived from the reference scan. Combining this operation with an MR data consistency step, followed by a corrective procedure for the content estimate, yields an iterative scheme. We name this novel approach PnP-CoSMo. It offers, by design, cross-contrast generalizability and provides an explanatory framework based on the shared and non-shared generative factors underlying the two given contrasts. We explore various aspects, including interpretability and convergence, via simulations. Furthermore, its practicality is demonstrated on the public NYU fastMRI DICOM dataset, showing equivalent or superior quality and greater generalizability compared to end-to-end methods. On two in-house multi-coil datasets, PnP-CoSMo enabled up to 32.6% greater acceleration over non-guided reconstruction at given SSIM.

2507.12878 2026-06-08 eess.SP cs.LG stat.ML

Bayesian Modeling and Estimation of Linear Time-Varying Systems using Neural Networks and Gaussian Processes

基于神经网络和高斯过程的线性时变系统贝叶斯建模与估计

Yaniv Shulman

发表机构 * Shulman.info(Shulman信息)

AI总结 本文提出一种统一的贝叶斯框架,通过将系统脉冲响应建模为随机过程,利用变分推断和高斯过程,实现了对线性时变系统的鲁棒估计。

详情
AI中文摘要

本文提出了一种统一的贝叶斯框架,通过将系统脉冲响应建模为随机过程,利用变分推断和高斯过程,实现了对线性时变系统的鲁棒估计。

英文摘要

The identification of Linear Time-Varying (LTV) systems from input-output data is a fundamental yet challenging ill-posed inverse problem. This work introduces a unified Bayesian framework that models the system's impulse response, $h(t, τ)$, as a stochastic process. We decompose the response into a posterior mean and a random fluctuation term, a formulation that provides a principled approach for quantifying uncertainty, unifies intrinsic channel variability and epistemic uncertainty through a common posterior representation, and naturally defines a new, useful system class we term Linear Time-Invariant in Expectation (LTIE). To perform inference, we leverage modern machine learning techniques, including Bayesian neural networks and Gaussian Processes, using scalable variational inference. We demonstrate through a series of experiments that our framework can infer the properties of an LTI system from a single noisy input-output pair, including under deliberate additive-noise misspecification, achieve a lower overall error floor than the classical CCF stacking baseline in a simulated ambient noise tomography setting, and track a continuously varying LTV impulse response by using a structured Gaussian Process prior. This work provides a flexible and robust methodology for uncertainty-aware system identification in dynamic environments.

2505.18006 2026-06-08 cs.CY cs.AI cs.HC cs.IR

AI Literacy for Legal AI Systems: A practical approach

为法律AI系统设计的AI素养:一种实用方法

Gizem Gultekin-Varkonyi

发表机构 * University of Szeged, Faculty of Law and Political Sciences, International and Regional Studies Institute(塞格德大学法学院与政治科学学院,国际与区域研究学院)

AI总结 本文探讨了法律AI系统的AI素养,分析了其对法律和伦理发展的关键作用,并提出了一种实用的风险评估工具。

Comments Forthcoming in Iustum Aequum Salutare (2025) vol.21

Journal ref Iustum Aequum Salutare, 2025, 21 (4)

详情
AI中文摘要

法律AI系统正被全球司法和法律系统部署者和提供者越来越多地采用,以支持各种应用。尽管它们提供了减少偏见、提高效率和改善问责的潜在好处,但也带来了重大风险,需要在机会、法律和伦理发展和部署之间取得平衡。AI素养作为欧盟AI法案中的法律要求,以及部署者和提供者实现伦理AI的关键使能者,可以成为实现这一平衡的工具。本文引入了“法律AI系统”一词,然后分析了AI素养的概念及其与这些系统相关的利弊。这一分析与处理法律AI系统的组织的更广泛AI-L概念相关联。本文的成果是一份路线图问卷,作为实用工具,帮助开发者和提供者评估风险、益处和利益相关者的担忧,以满足社会和监管对法律AI的期望。

英文摘要

Legal AI systems are increasingly being adopted by judicial and legal system deployers and providers worldwide to support a range of applications. While they offer potential benefits such as reducing bias, increasing efficiency, and improving accountability, they also pose significant risks, requiring a careful balance between opportunities, and legal and ethical development and deployment. AI literacy, as a legal requirement under the EU AI Act and a critical enabler of ethical AI for deployers and providers, could be a tool to achieve this. The article introduces the term "legal AI systems" and then analyzes the concept of AI literacy and the benefits and risks associated with these systems. This analysis is linked to a broader AI-L concept for organizations that deal with legal AI systems. The outcome of the article, a roadmap questionnaire as a practical tool for developers and providers to assess risks, benefits, and stakeholder concerns, could be useful in meeting societal and regulatory expectations for legal AI.

2603.14573 2026-06-08 cond-mat.dis-nn cs.LG math.PR

Rigorous Asymptotics for First-Order Algorithms Through the Dynamical Cavity Method

通过动力学空腔方法严格推导一阶算法的渐近行为

Yatin Dandi, David Gamarnik, Francisco Pernice, Lenka Zdeborová

发表机构 * Statistical Physics of Computation Laboratory, École polytechnique fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL)) Sloan School of Management, Operations Research Center and Institute of Data, Systems and Society (IDSS), MIT(斯隆管理学院,运筹学中心和数据、系统与社会研究所(IDSS),麻省理工学院) CSAIL and LIDS, MIT(计算机科学与人工智能实验室(CSAIL)和信息与决策系统实验室(LIDS),麻省理工学院)

AI总结 本文通过严格形式化的动力学空腔方法,推导出一阶算法(如梯度下降和近似消息传递)的动力学主方程,为非严谨的传统方法提供数学基础。

Journal ref COLT 2026

详情
AI中文摘要

本文将动力学空腔方法严格形式化,并用它给出了广义一阶方法(包括梯度下降和近似消息传递等算法)的动力学平均场理论(DMFT)方程的新证明,为此前大体上非严格的空腔推导提供了数学基础。

英文摘要

Dynamical Mean Field Theory (DMFT) provides an asymptotic description of the dynamics of macroscopic observables in certain disordered systems. Originally pioneered in the context of spin glasses by Sompolinsky and Zippelius (1982), it has since been used to derive asymptotic dynamical equations for a wide range of models in physics, high-dimensional statistics and machine learning. One of the main tools used by physicists to obtain these equations is the dynamical cavity method, which has remained largely non-rigorous. In contrast, existing mathematical formalizations have relied on alternative approaches, including Gaussian conditioning, large deviations over paths, or Fourier analysis. In this work, we formalize the dynamical cavity method and use it to give a new proof of the DMFT equations for General First Order Methods, a broad class of dynamics encompassing algorithms such as Gradient Descent and Approximate Message Passing.

2602.10680 2026-06-08 stat.ML cond-mat.dis-nn cs.LG

A solvable high-dimensional model where nonlinear autoencoders learn structure invisible to PCA while test loss misaligns with generalization

一个可解的高维模型,其中非线性自编码器学习到结构对PCA不可见,而测试损失与泛化不一致

Vicente Conde Mendes, Lorenzo Bardone, Cédric Koller, Jorge Medina Moreira, Vittorio Erba, Emanuele Troiani, Lenka Zdeborová

发表机构 * Statistical Physics of Computation Laboratory, École polytechnique fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL))

AI总结 本文提出一个高维模型,展示非线性自编码器能学习线性方法如PCA无法捕捉的结构,尽管其测试损失与泛化性能不一致。

Journal ref ICML 2026

详情
AI中文摘要

许多现实世界的数据集包含隐藏的结构,这些结构无法通过输入特征间的简单线性相关性检测到。例如,潜在因子可能以协调的方式影响数据,尽管其影响对基于协方差的方法如PCA不可见。在实践中,非线性神经网络常在无监督和自监督学习中成功提取此类隐藏结构。然而,构建一个最小的高维模型,其中这种优势可以严格分析仍是一个开放的理论挑战。我们引入了一个可解的高维 spiked 模型,包含两个潜在因子:一个对协方差可见,另一个统计上相关但不相关,仅出现在高阶矩中。PCA 和线性自编码器无法恢复后者,而最小的非线性自编码器可以证明性地提取两者。我们分析了总体风险和经验风险最小化。我们的模型还提供了一个可解的例子,其中自监督测试损失与表征质量不一致:非线性自编码器恢复了线性方法无法捕捉的结构,尽管其重建损失更高。

英文摘要

Many real-world datasets contain hidden structure that cannot be detected by simple linear correlations between input features. For example, latent factors may influence the data in a coordinated way, even though their effect is invisible to covariance-based methods such as PCA. In practice, nonlinear neural networks often succeed in extracting such hidden structure in unsupervised and self-supervised learning. However, constructing a minimal high-dimensional model where this advantage can be rigorously analyzed has remained an open theoretical challenge. We introduce a tractable high-dimensional spiked model with two latent factors: one visible to covariance, and one statistically dependent yet uncorrelated, appearing only in higher-order moments. PCA and linear autoencoders fail to recover the latter, while a minimal nonlinear autoencoder provably extracts both. We analyze both the population risk, and empirical risk minimization. Our model also provides a tractable example where self-supervised test loss is poorly aligned with representation quality: nonlinear autoencoders recover latent structure that linear methods miss, even though their reconstruction loss is higher.
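The phrase "statistically dependent yet uncorrelated, appearing only in higher-order moments" can be checked on a tiny discrete example. This is a toy illustration and not the paper's spiked model: the three-point distribution and the derived variable `s` are assumptions chosen so that every moment can be computed exactly with rational arithmetic.

```python
from fractions import Fraction

# z is uniform on {-1, 0, 1}; exact fractions avoid floating-point noise.
support = [Fraction(-1), Fraction(0), Fraction(1)]


def expect(f):
    """Exact expectation of f(z) under the uniform distribution on `support`."""
    return sum(f(z) for z in support) / len(support)


mean_sq = expect(lambda z: z * z)  # E[z^2]


def s(z):
    """A second variable fully determined by z, centered to have mean zero."""
    return z * z - mean_sq


covariance = expect(lambda z: z * s(z))         # second-order statistic of (z, s)
higher_moment = expect(lambda z: z * z * s(z))  # E[z^2 s], which is Cov(z^2, s)
```

The covariance between `z` and `s` is exactly zero, so a purely covariance-based method sees no link between them, while the nonzero higher moment shows they are dependent. A nonlinear feature such as `z * z` is what exposes the relationship.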

2509.24914 2026-06-08 stat.ML cond-mat.dis-nn cs.IT cs.LG math.IT

Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws

高维中的单头注意力:泛化、权重谱和缩放定律的理论

Fabrizio Boncoraglio, Vittorio Erba, Emanuele Troiani, Yizhou Xu, Florent Krzakala, Lenka Zdeborová

发表机构 * Statistical Physics of Computation Laboratory, École polytechnique fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL)) Information, Learning and Physics Laboratory, École polytechnique fédérale de Lausanne (EPFL)(信息、学习与物理实验室,洛桑联邦理工学院(EPFL))

AI总结 本文研究了在高维序列任务上训练的注意力层的权重谱结构,通过随机矩阵理论等工具,揭示了训练误差、插值阈值及键、查询矩阵谱的高维特性,并预测了幂律缩放定律的出现。

Journal ref ICML 2026

详情
AI中文摘要

训练的注意力层表现出显著且可重复的权重谱结构,包括低秩坍塌、批量变形和孤立谱异常,但其起源及对泛化的影响尚不明确。本文通过在合成高维序列任务上训练单头绑定注意力层,利用随机矩阵理论、自旋玻璃理论和近似消息传递工具,获得训练和测试误差、插值和恢复阈值及键查询矩阵谱的高维表征。理论预测了训练查询-键映射的完整奇异值分布,包括低秩结构和孤立谱异常,与更现实的Transformer观察结果定性一致。最后,对于具有幂律谱的目标,显示学习通过序列谱恢复进行,导致幂律扩展定律的出现。

英文摘要

Trained attention layers exhibit striking and reproducible spectral structure of the weights, including low-rank collapse, bulk deformation, and isolated spectral outliers, yet the origin of these phenomena and their implications for generalization remain poorly understood. We study empirical risk minimization in a single-head tied-attention layer trained on synthetic high-dimensional sequence tasks generated from the attention-indexed model. Using tools from random matrix theory, spin-glass theory, and approximate message passing, we obtain an exact high-dimensional characterization of training and test error, interpolation and recovery thresholds, and the spectrum of the key and query matrices. Our theory predicts the full singular-value distribution of the trained query-key map, including low-rank structure and isolated spectral outliers, in qualitative agreement with observations in more realistic transformers. Finally, for targets with power-law spectra, we show that learning proceeds through sequential spectral recovery, leading to the emergence of power-law scaling laws.

2511.09568 2026-06-08 physics.chem-ph cs.AI cs.CV

VEDA: 3D Molecular Generation via Variance-Exploding Diffusion with Annealing

VEDA:通过带退火的方差爆炸扩散实现3D分子生成

Peining Zhang, Jinbo Bi, Minghu Song

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 VEDA结合带退火的方差爆炸扩散与SE(3)等变架构,高效生成构象准确的3D分子结构,兼顾高化学精度与计算效率。

详情
AI中文摘要

扩散模型在3D分子生成中展现出潜力,但面临采样效率与构象准确性之间的根本权衡。基于流的模型虽然速度快,但常产生几何上不准确的结构,因为它们难以捕捉分子构象的多模态分布。相比之下,去噪扩散模型更准确但采样缓慢,这一局限被归因于扩散动力学与SE(3)等变架构之间的整合欠佳。为此,我们提出了VEDA,一个统一的SE(3)等变框架,将方差爆炸(VE)扩散与退火相结合,以高效生成构象准确的3D分子结构。关键技术贡献包括:(1) 一种VE调度,使噪声注入在功能上类似于模拟退火,提高3D准确性并降低松弛能量;(2) 一种新的预条件方案,协调SE(3)等变网络的坐标预测性质与基于残差的扩散目标;(3) 一种新的基于arcsin的调度器,将采样集中在对数信噪比的关键区间。在QM9和GEOM-DRUGS数据集上,VEDA的采样效率与基于流的模型相当,仅用100个采样步骤就实现了最先进的价态稳定性与有效性。更重要的是,以GFN2-xTB优化过程中的松弛能量衡量,VEDA生成的结构非常稳定:能量变化中位数仅为1.72 kcal/mol,显著低于其架构基线SemlaFlow的32.3 kcal/mol。我们的框架表明,有原则地整合VE扩散与SE(3)等变架构可以同时实现高化学精度和计算效率。

英文摘要

Diffusion models show promise for 3D molecular generation, but face a fundamental trade-off between sampling efficiency and conformational accuracy. While flow-based models are fast, they often produce geometrically inaccurate structures, as they have difficulty capturing the multimodal distributions of molecular conformations. In contrast, denoising diffusion models are more accurate but suffer from slow sampling, a limitation attributed to sub-optimal integration between diffusion dynamics and SE(3)-equivariant architectures. To address this, we propose VEDA, a unified SE(3)-equivariant framework that combines variance-exploding diffusion with annealing to efficiently generate conformationally accurate 3D molecular structures. Specifically, our key technical contributions include: (1) a VE schedule that enables noise injection functionally analogous to simulated annealing, improving 3D accuracy and reducing relaxation energy; (2) a novel preconditioning scheme that reconciles the coordinate-predicting nature of SE(3)-equivariant networks with a residual-based diffusion objective, and (3) a new arcsin-based scheduler that concentrates sampling in critical intervals of the logarithmic signal-to-noise ratio. On the QM9 and GEOM-DRUGS datasets, VEDA matches the sampling efficiency of flow-based models, achieving state-of-the-art valency stability and validity with only 100 sampling steps. More importantly, VEDA's generated structures are remarkably stable, as measured by their relaxation energy during GFN2-xTB optimization. The median energy change is only 1.72 kcal/mol, significantly lower than the 32.3 kcal/mol from its architectural baseline, SemlaFlow. Our framework demonstrates that principled integration of VE diffusion with SE(3)-equivariant architectures can achieve both high chemical accuracy and computational efficiency.

2511.03898 2026-06-08 cs.CR cs.AI cs.CE cs.SE

Secure Code Generation at Scale with Reflexion

基于Reflexion的大规模安全代码生成

Arup Datta, Ahmed Aljohani, Hyunsook Do

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 研究评估了使用Instruct Prime和Reflexion提示方法提升代码安全性的效果,发现Reflexion提示提高了所有模型的安全性,其中第一轮的提升最大。

Comments Accepted for publication at the 2nd IEEE International Conference on AI-powered Software (AIware 2025)

详情
AI中文摘要

大型语言模型(LLMs)现已广泛用于起草和重构代码,但能运行的代码不一定安全。我们使用Instruct Prime(它消除了要求合规的提示和线索污染)评估安全代码生成,并通过零样本基线和三轮Reflexion提示方法评估五个指令微调的代码LLM。安全性通过不安全代码检测器(ICD)测量,结果以Repair、Regression和NetGain指标报告,并按编程语言和CWE家族细分。我们的发现表明,在第一轮中不安全代码仍然普遍存在:在零样本基线(t0)下,约25-33%的程序不安全。弱加密和依赖配置的缺陷最难避免,而XSS、代码注入和硬编码密钥等模板化缺陷则能更可靠地被处理。Python的安全率最高;C和C#最低,Java、JS、PHP和C++居中。Reflexion提示提升了所有模型的安全性,将平均准确率从t0的70.74%提升到t3的79.43%,最大的提升出现在第一轮,随后收益递减。Repair、Regression和NetGain指标的趋势表明,应用一到两轮即可获得大部分收益。复现包可在 https://doi.org/10.5281/zenodo.17065846 获取。

英文摘要

Large language models (LLMs) are now widely used to draft and refactor code, but code that works is not necessarily secure. We evaluate secure code generation using the Instruct Prime, which eliminated compliance-required prompts and cue contamination, and evaluate five instruction-tuned code LLMs using a zero-shot baseline and a three-round reflexion prompting approach. Security is measured using the Insecure Code Detector (ICD), and results are reported by measuring Repair, Regression, and NetGain metrics, considering the programming language and CWE family. Our findings show that insecurity remains common at the first round: roughly 25-33% of programs are insecure at a zero-shot baseline (t0). Weak cryptography/config-dependent bugs are the hardest to avoid while templated ones like XSS, code injection, and hard-coded secrets are handled more reliably. Python yields the highest secure rates; C and C# are the lowest, with Java, JS, PHP, and C++ in the middle. Reflexion prompting improves security for all models, improving average accuracy from 70.74% at t0 to 79.43% at t3, with the largest gains in the first round followed by diminishing returns. The trends with Repair, Regression, and NetGain metrics show that applying one to two rounds produces most of the benefits. A replication package is available at https://doi.org/10.5281/zenodo.17065846.
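The abstract names Repair, Regression, and NetGain without defining them. One plausible reading is sketched below under explicit assumptions: Repair is the share of programs that go from insecure to secure between two rounds, Regression is the share that go from secure to insecure, and NetGain is their difference. These definitions, the function name, and the sample verdicts are hypothetical and may not match the paper.

```python
def transition_metrics(before, after):
    """Compare per-program security verdicts from two prompting rounds.

    `before` and `after` are equal-length lists of booleans, where True means
    the detector flagged the program as secure. All three metric definitions
    are assumptions made for illustration.
    """
    n = len(before)
    repaired = sum(1 for b, a in zip(before, after) if not b and a)
    regressed = sum(1 for b, a in zip(before, after) if b and not a)
    return {
        "repair": repaired / n,
        "regression": regressed / n,
        "net_gain": (repaired - regressed) / n,
    }


# Hypothetical verdicts for eight programs at the baseline and after one round.
t0 = [True, False, False, True, True, False, True, True]
t1 = [True, True, False, True, False, True, True, True]
metrics = transition_metrics(t0, t1)
```

Tracking repairs and regressions separately matters because a round of reflexion can break previously secure code, and a rise in overall accuracy alone would hide that.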

2507.17799 2026-06-08 eess.AS cs.LG cs.SD

A Concept-based approach to Voice Disorder Detection

基于概念的方法用于声带疾病检测

Davide Ghia, Gabriele Ciravegna, Alkis Koudounas, Marco Fantini, Erika Crosetti, Giovanni Succo, Tania Cerquitelli

发表机构 * Politecnico di Torino(都灵理工大学) CENTAI Institute(CENTAI研究院) San Feliciano Hospital(San Feliciano医院) SCDU Otorinolaringoiatria, Head Neck Cancer Unit, Ospedale San Giovanni Bosco(SCDU耳鼻喉科,头颈癌症单元,San Giovanni Bosco医院) Dipartimento di Oncologia, Università degli Studi di Torino(肿瘤学系,都灵大学)

AI总结 本文提出基于概念的声带疾病检测方法,利用可解释AI提升模型透明度,与传统深度学习方法相比,实现更清晰的决策框架。

详情
AI中文摘要

声带疾病影响了大量人口,使用自动化非侵入性技术进行诊断将显著推动医疗进步,提高患者生活质量。近期研究表明,人工智能模型,特别是深度神经网络(DNNs),能有效解决此任务。然而,由于其复杂性,此类模型的决策过程常不透明,限制了其在临床中的可信度。本文探讨了基于可解释AI(XAI)的替代方法,旨在通过提供不同形式的解释来提高DNNs的可解释性。具体而言,本文聚焦于概念模型,如概念瓶颈模型(CBM)和概念嵌入模型(CEM),探讨它们如何在性能上与传统深度学习方法相媲美,同时提供更透明和可解释的决策框架。

英文摘要

Voice disorders affect a significant portion of the population, and the ability to diagnose them using automated, non-invasive techniques would represent a substantial advancement in healthcare, improving the quality of life of patients. Recent studies have demonstrated that artificial intelligence models, particularly Deep Neural Networks (DNNs), can effectively address this task. However, due to their complexity, the decision-making process of such models often remains opaque, limiting their trustworthiness in clinical contexts. This paper investigates an alternative approach based on Explainable AI (XAI), a field that aims to improve the interpretability of DNNs by providing different forms of explanations. Specifically, this work focuses on concept-based models such as the Concept Bottleneck Model (CBM) and the Concept Embedding Model (CEM) and how they can achieve performance comparable to traditional deep learning methods, while offering a more transparent and interpretable decision framework.

2506.12454 2026-06-08 stat.ML cond-mat.dis-nn cs.CR cs.LG

On the existence of consistent adversarial attacks in high-dimensional linear classification

论高维线性分类中一致对抗攻击的存在性

Matteo Vilucchio, Lenka Zdeborová, Bruno Loureiro

发表机构 * Information Learning and Physics Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)(信息、学习与物理实验室,洛桑联邦理工学院(EPFL)) Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL)) Département d’Informatique, École Normale Supérieure - PSL & CNRS, France(计算机系,巴黎高等师范学院(ENS-PSL)与法国国家科学研究中心(CNRS))

AI总结 本文研究高维二分类中对抗攻击与模型表达能力有限导致的误分类区别,提出新的误差度量标准,揭示模型对保持真实标签扰动的脆弱性,理论分析显示模型越过度参数化,对标签保持扰动的敏感性越高。

Journal ref ICML 2026

详情
AI中文摘要

本文研究高维二分类中对抗攻击与模型表达能力有限或数据有限导致的误分类的本质区别,提出新的误差度量,精确捕捉这一区别,量化模型对保持真实标签的扰动的脆弱性。我们的主要技术贡献是对这些度量在设定正确的模型和潜在空间模型中的渐近行为给出精确且严格的刻画,揭示与标准稳健误差度量不同的脆弱性模式。理论结果表明,随着模型变得越来越过参数化,其对保持标签的扰动的脆弱性增加,为理解模型对对抗攻击的敏感机制提供了理论见解。

英文摘要

What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where statistical effects due to limited data availability play a central role. We introduce new error metrics that precisely capture this distinction, quantifying model vulnerability to consistent adversarial attacks -- perturbations that preserve the ground-truth labels. Our main technical contribution is an exact and rigorous asymptotic characterization of these metrics in both well-specified models and latent space models, revealing different vulnerability patterns compared to standard robust error measures. The theoretical results demonstrate that as models become more overparameterized, their vulnerability to label-preserving perturbations grows, offering theoretical insight into the mechanisms underlying model sensitivity to adversarial attacks.

2407.15555 2026-06-08 eess.SP cs.LG

The Rlign Algorithm for Enhanced Electrocardiogram Analysis through R-Peak Alignment for Explainable Classification and Clustering

通过R峰对齐提升心电图分析的Rlign算法:用于可解释分类和聚类

Lucas Plagwitz, Lucas Bickmann, Michael Fujarski, Alexander Brenner, Warnes Gobalakrishnan, Lars Eckardt, Antonius Büscher, Julian Varghese

发表机构 * IMI Medical Systems GmbH(IMI医疗系统 GmbH) University of Freiburg(弗赖堡大学)

AI总结 本文提出Rlign算法,通过R峰对齐重构心电图信号,提升分类、聚类和可解释性,优于传统方法和CNN。

Journal ref European Heart Journal - Digital Health, Volume 7, Issue 5, June 2026, ztag067

详情
AI中文摘要

心电图(ECG)记录长期以来在诊断不同心脏状况中至关重要。最近,使用机器学习方法自动处理ECG的研究变得重要,主要通过在原始ECG信号上使用深度学习方法。像卷积神经网络(CNNs)这样的模型的优势在于能够有效处理生物医学影像或信号数据。然而,这种优势受到缺乏可解释性、需要大量训练数据以及适应于无监督聚类任务的复杂性等挑战的限制。为解决这些问题,我们旨在通过利用其半结构化、周期性形式重新引入浅层学习技术,包括支持向量机和主成分分析,到ECG信号处理中。为此,我们开发并评估了一种转换,能够有效将ECG信号重构为完全结构化的格式,从而后续使用浅层学习算法进行分析。在本研究中,我们提出了这种自适应转换方法,通过在数据集中对所有信号的R峰进行对齐,并在有无心跳率依赖的情况下重新采样R峰之间的段落。我们展示了这种转换在分类、聚类和可解释性方面的显著优势,优于商业软件的中位心拍转换和CNN方法。我们的方法在处理有限训练数据时,显示出浅层机器学习方法相对于CNNs的显著优势。此外,我们发布了一个经过充分测试且公开可访问的代码框架,提供了一个稳健的对齐管道以支持未来研究,网址为https://github.com/imi-ms/rlign。

英文摘要

Electrocardiogram (ECG) recordings have long been vital in diagnosing different cardiac conditions. Recently, research in the field of automatic ECG processing using machine learning methods has gained importance, mainly by utilizing deep learning methods on raw ECG signals. A major advantage of models like convolutional neural networks (CNNs) is their ability to effectively process biomedical imaging or signal data. However, this strength is tempered by challenges related to their lack of explainability, the need for a large amount of training data, and the complexities involved in adapting them for unsupervised clustering tasks. In addressing these tasks, we aim to reintroduce shallow learning techniques, including support vector machines and principal components analysis, into ECG signal processing by leveraging their semi-structured, cyclic form. To this end, we developed and evaluated a transformation that effectively restructures ECG signals into a fully structured format, facilitating their subsequent analysis using shallow learning algorithms. In this study, we present this adaptive transformative approach that aligns R-peaks across all signals in a dataset and resamples the segments between R-peaks, both with and without heart rate dependencies. We illustrate the substantial benefit of this transformation for traditional analysis techniques in the areas of classification, clustering, and explainability, outperforming commercial software for median beat transformation and CNN approaches. Our approach demonstrates a significant advantage for shallow machine learning methods over CNNs, especially when dealing with limited training data. Additionally, we release a fully tested and publicly accessible code framework, providing a robust alignment pipeline to support future research, available at https://github.com/imi-ms/rlign.
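The core transformation described above (align R-peaks across signals and resample the segments between them) can be sketched in a few lines. This is a simplified illustration and not the API of the `rlign` package: the function names, the use of linear interpolation, and the assumption that R-peak indices are already known are all choices made for this example. It corresponds roughly to resampling every beat to a fixed length regardless of heart rate.

```python
def resample_linear(segment, length):
    """Linearly interpolate `segment` onto `length` evenly spaced points."""
    if length == 1:
        return [segment[0]]
    scale = (len(segment) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def align_beats(signal, r_peaks, beat_length):
    """Cut `signal` at consecutive R-peaks and stretch each beat to `beat_length`.

    `r_peaks` holds sample indices of detected R-peaks (detection is assumed
    to happen elsewhere). Each returned beat starts and ends on an R-peak.
    """
    return [
        resample_linear(signal[start:end + 1], beat_length)
        for start, end in zip(r_peaks, r_peaks[1:])
    ]


# Tiny synthetic trace with R-peaks (value 1.0) at samples 0, 3 and 7.
signal = [1.0, 0.0, 0.2, 1.0, 0.1, 0.0, 0.3, 1.0]
beats = align_beats(signal, r_peaks=[0, 3, 7], beat_length=5)
```

After this step every beat has the same number of samples with R-peaks at fixed positions, so the beats can be stacked into a matrix and passed to shallow methods such as PCA or support vector machines.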

2302.00198 2026-06-08 cs.NE cs.AI cs.NA math.NA

A fuzzy adaptive metaheuristic algorithm for identifying sustainable, economical, lightweight, and earthquake-resistant reinforced concrete cantilever retaining walls

Farshid Keivanian, Raymond Chiong, Ali R. Kashani, Amir H. Gandomi

Affiliations * School of Information and Physical Sciences, The University of Newcastle; Department of Civil Engineering, University of Memphis; Faculty of Engineering and Information Technology, University of Technology Sydney

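AI summary This paper proposes a fuzzy adaptive metaheuristic algorithm for optimizing the design of earthquake-resistant reinforced concrete cantilever retaining walls, taking into account structural strength, geotechnical stability, and geometric variables to achieve lightweight, economical, and environmentally friendly seismic designs.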

Comments There are six figures, 51 pages, and 12 tables in the revised manuscript that has recently been resubmitted to the Journal of Computational Science

Journal ref Journal of Computational Science, Volume 70, Article 101978, 2023

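Details
AI-translated abstract

In earthquake-prone regions, the seismic performance of reinforced concrete cantilever retaining walls is critical. This study evaluates seismic performance using horizontal and vertical pseudo-static coefficients. To handle the reinforced concrete cantilever (RCC) weights and forces caused by the resulting earth pressures, 26 structural strength and geotechnical stability constraints and 12 geometric variables are associated with each design. These constraints and design variables form a constrained optimization problem with a twelve-dimensional solution space. To search effectively and produce sustainable, economical, lightweight RCC designs that withstand earthquake hazards, the paper proposes a novel adaptive fuzzy-based metaheuristic algorithm. The method divides the search space into sub-regions and builds exploration, information-sharing, and exploitation search capabilities on its novel search components. In addition, fuzzy inference systems are used to address parameterization and computational cost evaluation. The study finds that, compared with several classical and best-performing design optimizers, the proposed algorithm achieves low-cost, low-weight, and low-CO2-emission RCC designs under nine seismic conditions.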

English abstract

In earthquake-prone zones, the seismic performance of reinforced concrete cantilever (RCC) retaining walls is critical. In this study, the seismic performance was investigated using horizontal and vertical pseudo-static coefficients. To handle the RCC weights and forces resulting from the associated earth pressures, 26 constraints for structural strength and geotechnical stability, along with 12 geometric variables, are associated with each design. These constraints and design variables form a constrained optimization problem with a twelve-dimensional solution space. To search effectively and produce sustainable, economical, lightweight RCC designs that are robust against earthquake hazards, a novel adaptive fuzzy-based metaheuristic algorithm is applied. The proposed method divides the search space into sub-regions and establishes exploration, information sharing, and exploitation search capabilities based on its novel search components. Further, fuzzy inference systems were employed to address parameterization and computational cost evaluation issues. Compared with several classical and best-performing design optimizers, the proposed algorithm achieved low-cost, low-weight, and low-CO2-emission RCC designs under nine seismic conditions.

2606.07351 2026-06-08 cs.LG cs.AI New submission

SleepExplain: Explainable Non-Rapid Eye Movement and Rapid Eye Movement Sleep Stage Classification from EEG Signal

Rafsan Jany, Md. Hamjajul Ashmafee, Iqram Hussain, Md Azam Hossain

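AI summary Proposes the SleepExplain model, which uses ensemble learning (Random Forest, XGBoost, Gradient Boosting) to classify NREM and REM sleep stages, reaching 94.30% accuracy, and uses SHAP to provide explainability.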

Comments 6 pages, 7 figures, 2022 25th International Conference on Computer and Information Technology (ICCIT)

Journal ref 2022 25th International Conference on Computer and Information Technology (ICCIT), pp. 248-253, 2022

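Details
AI-translated abstract

Sleep stage classification is one of the most important diagnostic approaches for a variety of sleep-related disorders. Electroencephalography (EEG) is regarded as a powerful tool for examining the association between neurological effects and sleep stages because it correctly identifies sleep-related neurological changes. During the non-rapid eye movement (NREM) and rapid eye movement (REM) sleep stages, many nerve and bodily functions are affected, so these stages play an important role in how those functions operate. This study aims to classify NREM and REM sleep stages from sleep EEG data and proposes a novel SleepExplain model, an explainable NREM and REM sleep stage classifier that explains its predictions. Sleep stages are classified with Random Forest, XGBoost, and Gradient Boosting ensemble classification models. Overall, we obtained accuracies of 92.54% (Random Forest), 94.25% (Gradient Boosting), and 94.30% (XGBoost). For the explainable classification model, we use the game-theoretic approach SHAP (SHapley Additive exPlanations) to provide convincing explanations for the predictions.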

English abstract

Classification of sleep stages is one of the most important diagnostic approaches for a variety of sleep-related disorders. Electroencephalography (EEG) is regarded as a powerful tool for examining the association between neurological effects and sleep phases since it correctly identifies sleep-related neurological alterations. During the Non-Rapid Eye Movement (NREM) and Rapid Eye Movement (REM) sleep phases, a number of nerve and bodily functions are affected, so both phases play an important role in how these functions operate. This work aims to classify NREM and REM sleep stages from sleep EEG data and presents a novel SleepExplain model, an explainable NREM and REM sleep stage classifier that explains its predictions. In this work, sleep stages were classified using Random Forest, XGBoost, and Gradient Boosting ensemble classification models. Overall, we obtained an accuracy of 92.54% (Random Forest), 94.25% (Gradient Boosting), and 94.30% (XGBoost). For the explainable classification model, we utilized a game-theoretic approach, SHAP (SHapley Additive exPlanations), to offer a convincing explanation for the prediction.
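
The game-theoretic idea behind SHAP mentioned in the abstract above is the Shapley value: each feature is credited with its average marginal contribution to the prediction over all orderings in which features can be revealed. The sketch below computes exact Shapley values by brute force for a tiny toy model. It does not use the shap library or the paper's classifiers; the function `shapley_values`, the scoring function `toy_score`, and the all-zero baseline are hypothetical choices for illustration, and full enumeration is only practical for a handful of features.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features start at their baseline values and are switched to their
    values in `x` one at a time; each feature's marginal change in the
    prediction is averaged over every possible ordering.
    """
    n = len(x)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]              # reveal feature j
            updated = predict(current)
            totals[j] += updated - previous
            previous = updated
    return [t / len(orders) for t in totals]


def toy_score(v: Sequence[float]) -> float:
    """Hypothetical model: one additive feature plus an interaction term."""
    return 2 * v[0] + v[1] * v[2]


if __name__ == "__main__":
    phi = shapley_values(toy_score, x=[1, 1, 1], baseline=[0, 0, 0])
    print(phi)  # → [2.0, 0.5, 0.5]
```

The attributions sum to the difference between the prediction at `x` and at the baseline (here 3), which is the additivity property that the name "SHapley Additive exPlanations" refers to; the interaction term is split evenly between the two features that share it.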