arXivDaily: arXiv daily academic digest, updated Monday through Friday
2508.21531 2026-06-12 stat.ML cs.LG stat.CO Updated version

Adaptive generative moment matching networks for improved learning of dependence structures

Marius Hofert, Gan Yao

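Affiliations: Department of Statistics and Actuarial Science, The University of Hong Kong

AI summary: Proposes an adaptive bandwidth selection procedure for the MMD mixture kernel used to fit generative moment matching networks. Adding kernels during training and stopping early improve training performance, and the resulting models outperform conventional approaches in copula random number generation, high-dimensional convergence-rate studies, and dependence modeling of financial data.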

Abstract

An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and improved learning of copula random number generators is demonstrated. Based on the relative error of the training loss, the number of kernels is increased during training; additionally, the relative error of the validation loss is used as an early stopping criterion. While training time remains similar, adaptively training GMMNs (AGMMNs) significantly increases training performance, which is shown based on validation MMD trajectories, samples and validation MMD values. Superiority of AGMMNs over GMMNs and parametric copula models is also demonstrated in terms of three applications. First, convergence rates of estimators based on quasi-random versus pseudo-random samples from copulas are investigated in dimensions as large as 100 for the first time. Second, replicated validation MMDs, as well as Monte Carlo and quasi-Monte Carlo applications demonstrate the improved training of AGMMNs for a copula model implied by the 50 constituents of the S&P 500 index after deGARCHing. Last, both the latter dataset and 50 constituents of the FTSE 100 are used to demonstrate that the improved training of AGMMNs indeed translates to an improved model prediction.
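The training signal described in this abstract can be illustrated with a small sketch: a squared-MMD estimate under a mixture of Gaussian kernels, plus relative-error checks that decide when to add a kernel and when to stop. This is only an illustration under stated assumptions. The function names, the tolerances, and the rule of appending a bandwidth half as wide as the last one are hypothetical and are not taken from the paper.

```python
import math

def mixture_kernel(x, y, bandwidths):
    """Sum of Gaussian kernels, one per bandwidth in the mixture."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return sum(math.exp(-sq_dist / (2.0 * h * h)) for h in bandwidths)

def mmd2(xs, ys, bandwidths):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(mixture_kernel(p, q, bandwidths) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) - 2.0 * mean_kernel(xs, ys) + mean_kernel(ys, ys)

def relative_error(prev, curr, eps=1e-12):
    """Relative change between two consecutive loss values."""
    return abs(curr - prev) / max(abs(prev), eps)

def maybe_add_kernel(bandwidths, prev_loss, curr_loss, tol=1e-2, factor=0.5):
    """Assumed rule: when the training loss stalls, append a narrower kernel."""
    if relative_error(prev_loss, curr_loss) < tol:
        return bandwidths + [bandwidths[-1] * factor]
    return bandwidths

def should_stop(prev_val_loss, curr_val_loss, tol=1e-3):
    """Assumed early-stopping check on the validation loss."""
    return relative_error(prev_val_loss, curr_val_loss) < tol
```

In a training loop, `maybe_add_kernel` would be called once per epoch with the last two training losses, and `should_stop` with the last two validation losses.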

2402.01779 2026-06-12 eess.IV cs.CV cs.LG stat.ML Updated version

Plug-and-Play image restoration with Stochastic deNOising REgularization

Marien Renaud, Jean Prost, Arthur Leclaire, Nicolas Papadakis

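Affiliations: GitHub

AI summary: Proposes the SNORE framework, which applies the denoiser only to images with an adequate noise level and combines an explicit stochastic regularization with gradient descent to solve inverse problems; it is competitive with state-of-the-art methods on deblurring and inpainting.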

Abstract

Plug-and-Play (PnP) algorithms are a class of iterative algorithms that address image inverse problems by combining a physical model and a deep neural network for regularization. Even if they produce impressive image restoration results, these algorithms rely on a non-standard use of a denoiser on images that are less and less noisy along the iterations, which contrasts with recent algorithms based on Diffusion Models (DM), where the denoiser is applied only on re-noised images. We propose a new PnP framework, called Stochastic deNOising REgularization (SNORE), which applies the denoiser only on images with noise of the adequate level. It is based on an explicit stochastic regularization, which leads to a stochastic gradient descent algorithm to solve ill-posed inverse problems. A convergence analysis of this algorithm and its annealing extension is provided. Experimentally, we prove that SNORE is competitive with respect to state-of-the-art methods on deblurring and inpainting tasks, both quantitatively and qualitatively.
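The idea of applying the denoiser only to re-noised iterates can be sketched as a single stochastic gradient step. The sketch below is one plausible reading rather than the authors' implementation: it assumes an inpainting-style data term, uses the residual `(noisy - denoised) / sigma**2` as the stochastic regularization gradient (a Tweedie-type identity), and substitutes a toy Gaussian-prior shrinkage for the deep denoiser. All names and parameters are hypothetical.

```python
import random

def gaussian_prior_denoiser(z, sigma, tau=1.0):
    """Toy MMSE denoiser for an i.i.d. N(0, tau^2) prior (stand-in for a deep network)."""
    shrink = tau * tau / (tau * tau + sigma * sigma)
    return [shrink * v for v in z]

def snore_step(x, y, mask, sigma, alpha, step, obs_std, rng,
               denoiser=gaussian_prior_denoiser):
    """One stochastic gradient step on data fidelity plus stochastic regularization."""
    # Gradient of f(x) = ||mask * (x - y)||^2 / (2 * obs_std^2).
    grad_f = [m * (xi - yi) / obs_std ** 2 for m, xi, yi in zip(mask, x, y)]
    # Re-noise the iterate so the denoiser sees noise of level sigma.
    x_noisy = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    denoised = denoiser(x_noisy, sigma)
    # Stochastic estimate of the regularization gradient.
    grad_r = [(zn - dn) / sigma ** 2 for zn, dn in zip(x_noisy, denoised)]
    return [xi - step * (gf + alpha * gr) for xi, gf, gr in zip(x, grad_f, grad_r)]
```

An annealed variant would call `snore_step` repeatedly while decreasing `sigma` (and possibly `alpha`) on a schedule; the schedule itself is not specified here.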

2505.04021 2026-06-12 cs.DC cs.AI cs.LG cs.PF Updated version

Prism: Cost-Efficient Multi-LLM Serving via GPU Memory Ballooning

Shan Yu, Yifan Qiao, Mingyuan Ma, Yangmin Li, Shuo Yang, Xinyuan Tong, Yang Wang, Zhiqiang Xie, Yuwei An, Shiyi Cao, Ke Bao, Deepak Vij, Xiaoning Ding, Yichen Wang, Qingda Lu, Zhong Wang, Gao Gao, Harry Xu, Junyi Shu, Jiarong Xing, Ying Sheng

Affiliations: UCLA; UC Berkeley; Harvard University; CMU; University of Edinburgh; Intel; Stanford University; LMSYS; ByteDance; Alibaba Cloud; Tsinghua University; Novita AI; Rice University

AI summary: To address poor resource efficiency in multi-LLM serving, proposes Prism, a memory-centric LLM co-serving framework that uses memory ballooning to unify spatial and temporal sharing; its balloon driver, kvcached, is deployed in production across 10K+ GPUs.

Comments OSDI'26

Abstract

Inference providers must maintain availability for many LLMs, including low-volume but essential models, making resource efficiency increasingly important as token prices fall. Analysis of production traces reveals a dynamic bursty-group pattern in which sets of models become active together and shift over time; existing space- and time-sharing approaches lack principled mechanisms to adapt to this variability, forcing trade-offs between SLO adherence and efficiency. We observe that elastic memory allocation can unify spatial and temporal sharing. Based on this insight, we have developed Prism, a memory-centric LLM co-serving framework that applies memory ballooning to reclaim memory across models and support both forms of sharing under a single scheme. Prism's balloon driver, referred to as kvcached, has been open-sourced at https://github.com/ovg-project/kvcached, and deployed in production environments across 10K+ GPUs.

2401.08301 2026-06-12 eess.SP cs.LG cs.SY eess.SY Updated version

QoS Improvement in Multi User Cellular-Symbiotic Radio Network Assisted by Active-STAR-RIS

Rahman Saadat Yeganeh, Mohammad Javad Omidi, Farshad Zeinali, Mohammad Robat Mili, Mohammad Ghavami

Affiliations: Department of Electrical and Computer Engineering, Isfahan University of Technology; Department of Electronics and Communication Engineering, Kuwait College of Science and Technology; The Pasargad Institute for Advanced Innovative Solutions (PIAIS); Electrical and Electronic Engineering Department, London South Bank University

AI summary: Uses active simultaneously transmitting and reflecting reconfigurable intelligent surfaces (ASRIS) to improve quality of service in 6G cellular networks, optimizing beamforming, phase adjustments, and scheduling parameters with deep reinforcement learning to maximize throughput between symbiotic backscatter devices and users.

Comments This article will be submitted to the Transactions journal

Abstract

In this article, we employ active simultaneously transmitting and reflecting reconfigurable intelligent surfaces (ASRIS) to enhance the quality of 6G cellular network services. The network integrates commensal symbiotic radio (CSR) subsystems to facilitate communication between passive Internet of Things (IoT) users and active users, referred to as symbiotic backscatter devices (SBDs) and symbiotic user equipments (SUEs), respectively. Since the SBDs are passive, transmitting information to the SUEs poses significant challenges. To overcome this challenge, we harness the capabilities of massive multiple input multiple output (MIMO) antennas within the base station (BS) to relay the information transmitted by SBDs with greater power. This scheme uses the non-orthogonal multiple access (NOMA) technique for multiple access among all users, and potential interferences are eliminated using successive interference cancellation (SIC). The primary objective is to maximize the throughput between SBDs and SUEs. To achieve this, we formulate an optimization problem involving variables such as active beamforming coefficients at the BS and ASRIS, phase adjustments of ASRIS, and scheduling parameters between CSR and cellular networks. To solve this optimization problem, we used three deep reinforcement learning (DRL) methods: proximal policy optimization (PPO), twin delayed deep deterministic policy gradient (TD3), and asynchronous advantage actor critic (A3C). These methods were simulated, and the results demonstrate that A3C, TD3, and PPO have the best convergence speeds and achieve the highest increases in network throughput, respectively. Finally, the proposed scheme was evaluated using passive simultaneously transmitting and reflecting RIS (STAR-RIS), which demonstrated poorer performance compared to ASRIS.

2604.15372 2026-06-12 cs.CR cs.AI cs.MM

The Synthetic Media Shift: Tracking the Rise, Virality, and Detectability of AI-Generated Multimodal Misinformation

Zacharias Chrysidis, Stefanos-Iordanis Papadopoulos, Symeon Papadopoulos

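Affiliations: Centre for Research and Technology Hellas

AI summary: Presents the CONVEX dataset and studies the virality and consensus dynamics of multimodal misinformation; AI-generated content achieves disproportionate virality but relies on passive engagement, and detection performance declines as generative models evolve.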

Abstract

As generative AI advances, the distinction between authentic and synthetic media is increasingly blurred, challenging the integrity of online information. In this study, we present CONVEX, a large-scale dataset of multimodal misinformation involving miscaptioned, edited, and AI-generated visual content, comprising over 150K multimodal posts with associated notes and engagement metrics from X's Community Notes. We analyze how multimodal misinformation evolves in terms of virality, engagement, and consensus dynamics, with a focus on synthetic media. Our results show that while AI-generated content achieves disproportionate virality, its spread is driven primarily by passive engagement rather than active discourse. Despite slower initial reporting, AI-generated content reaches community consensus more quickly once flagged. Moreover, our evaluation of specialized detectors and vision-language models reveals a consistent decline in performance over time in distinguishing synthetic from authentic images as generative models evolve. These findings highlight the need for continuous monitoring and adaptive strategies in the rapidly evolving digital information environment.

2603.26705 2026-06-12 q-bio.BM cs.AI cs.LG

PI-Mamba: Linear-Time Protein Backbone Generation via Spectrally Initialized Flow Matching

Tianyu Wu, Lin Zhu

Affiliations: Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign; School of Information Science, University of Illinois Urbana-Champaign

AI summary: PI-Mamba combines spectral initialization with a flow-matching framework to enforce exact local covalent geometry while enabling linear-time inference, giving efficient, high-fidelity backbone generation.

Journal ref Bioinformatics (2026)

Abstract

Motivation: Generative models for protein backbone design have to simultaneously ensure geometric validity, sampling efficiency, and scalability to long sequences. However, most existing approaches rely on iterative refinement, quadratic attention mechanisms, or post-hoc geometry correction, leading to a persistent trade-off between computational efficiency and structural fidelity. Results: We present Physics-Informed Mamba (PI-Mamba), a generative model that enforces exact local covalent geometry by construction while enabling linear-time inference. PI-Mamba integrates a differentiable constraint-enforcement operator into a flow-matching framework and couples it with a Mamba-based state-space architecture. To improve optimisation stability and backbone realism, we introduce a spectral initialization derived from the Rouse polymer model and an auxiliary cis-proline awareness head. Across benchmark tasks, PI-Mamba achieves 0.0\% local geometry violations and high designability (scTM = $0.91\pm 0.03$, n = 100), while scaling to proteins exceeding 2,000 residues on a single A5000 GPU (24 GB).

2602.18072 2026-06-12 cs.AR cs.AI

HiAER-Spike Software-Hardware Reconfigurable Platform for Event-Driven Neuromorphic Computing at Scale

Gwenevere Frank, Gopabandhu Hota, Keli Wang, Christopher Deng, Krish Arora, Diana Vins, Abhinav Uppal, Omowuyi Olajide, Kenneth Yoshimoto, Qingbo Wang, Mari Yamaoka, Johannes Leugering, Stephen Deiss, Leif Gibb, Gert Cauwenberghs

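Affiliations: Institute for Neural Computation, UC San Diego; Fujitsu; Forschungszentrum Jülich; Qernel AI

AI summary: The HiAER-Spike platform executes large spiking neural networks with up to 160 million neurons and 40 billion synapses, uses a modular, reconfigurable architecture for efficient event-driven computing, and offers a Python interface that simplifies network configuration and execution.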

Comments Leif Gibb, Gert Cauwenberghs are equal authors. arXiv admin note: substantial text overlap with arXiv:2504.03671

Journal ref npj Unconventional Computing (2026)

Abstract

In this work, we present HiAER-Spike, a modular, reconfigurable, event-driven neuromorphic computing platform designed to execute large spiking neural networks with up to 160 million neurons and 40 billion synapses - roughly twice the neurons of a mouse brain at faster than real time. This system, assembled at the UC San Diego Supercomputer Center, comprises a co-designed hard- and software stack that is optimized for run-time massively parallel processing and hierarchical address-event routing (HiAER) of spikes while promoting memory-efficient network storage and execution. The architecture efficiently handles both sparse connectivity and sparse activity for robust and low-latency event-driven inference for both edge and cloud computing. A Python programming interface to HiAER-Spike, agnostic to hardware-level detail, shields the user from complexity in the configuration and execution of general spiking neural networks with minimal constraints in topology. The system is made easily available over a web portal for use by the wider community. In the following, we provide an overview of the hard- and software stack, explain the underlying design principles, demonstrate some of the system's capabilities and solicit feedback from the broader neuromorphic community. Examples are shown demonstrating HiAER-Spike's capabilities for event-driven vision on benchmark CIFAR-10, DVS event-based gesture, MNIST, and Pong tasks.

2411.02933 2026-06-12 cs.DB cs.LG cs.PF

P-MOSS: Scheduling Main-Memory Indexes Over NUMA Servers Using Next Token Prediction

Yeasir Rayhan, Walid G. Aref

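Affiliations: Purdue University, West Lafayette, IN, USA

AI summary: P-MOSS is a learned spatial scheduling framework that schedules query execution to specific logical cores on NUMA servers and co-locates the data, drawing on core principles of large language models; experiments show up to a 6x improvement in query throughput.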

Comments Accepted to SIGMOD'26

Abstract

Ever since Dennard scaling broke down in the early 2000s and CPU frequencies stalled, vendors have increased the core count in each CPU chip at the expense of introducing heterogeneity, thus ushering in the era of NUMA and Chiplet processors. Since then, the heterogeneity in the design space of hardware has only increased, to the point that DBMS performance may vary by up to an order of magnitude in modern servers. Important factors that affect performance include the location of the logical cores where the DBMS queries execute and the location where the data resides. This paper introduces P-MOSS, a learned spatial scheduling framework that schedules query execution to specific logical cores and co-locates data on the corresponding NUMA node. For cross-hardware and workload adaptability, P-MOSS leverages core principles from Large Language Models, such as Next Token prediction, Generative Pre-training, and Fine-tuning. In the spirit of hardware-software synergy, P-MOSS guides its scheduling decisions solely based on the low-level hardware statistics collected from the hardware Performance Monitoring Unit with the aid of a Decision Transformer. Experimental evaluation is performed in the context of the B$^+$-Tree index. Performance results demonstrate that P-MOSS offers an improvement of up to $6\times$ over traditional schedules in terms of query throughput.

2601.10885 2026-06-12 physics.plasm-ph cs.LG physics.comp-ph

Learning collision operators from plasma phase space data using differentiable simulators

Diogo D. Carvalho, Pablo J. Bilbao, Warren B. Mori, Luis O. Silva, E. Paulo Alves

Affiliations: GoLP/Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Universidade de Lisboa; Mani L. Bhaumik Institute for Theoretical Physics, University of California, Los Angeles; The Rudolf Peierls Centre for Theoretical Physics, University of Oxford; Department of Physics and Astronomy, University of California, Los Angeles

AI summary: Proposes a method that combines a differentiable Fokker-Planck solver with gradient-based optimization to infer collision operators from plasma phase space data, and validates its accuracy and computational efficiency on data from two-dimensional PIC simulations.

Comments accepted for publication in Journal of Plasma Physics, code available at https://github.com/diogodcarvalho/ml-pic-collision-operators

Journal ref J. Plasma Phys. (2026), vol. 92, E76

Abstract

We propose a methodology to infer collision operators from phase space data of plasma dynamics. Our approach combines a differentiable kinetic simulator, whose core component in this work is a differentiable Fokker-Planck solver, with a gradient-based optimisation method to learn the collision operators that best describe the phase space dynamics. We test our method using data from two-dimensional Particle-in-Cell simulations of spatially uniform thermal plasmas, and learn the collision operator that captures the self-consistent electromagnetic interaction between finite-size charged particles over a wide variety of simulation parameters. We demonstrate that the learned operators are more accurate than alternative estimates based on particle tracks, while making no prior assumptions about the relevant time scales of the processes and significantly reducing memory requirements. We find that the retrieved operators, obtained in the non-relativistic regime, are in excellent agreement with theoretical predictions derived for electrostatic scenarios. Our results show that differentiable simulators offer a powerful and computationally efficient approach to infer novel operators for a wide range of problems, such as electromagnetically dominated collisional dynamics and stochastic wave-particle interactions.

2510.03699 2026-06-12 q-bio.NC cs.AI cs.LG cs.NE cs.SY eess.SY

Dissecting Larval Zebrafish Hunting using Deep Reinforcement Learning Trained RNN Agents

Raaghav Malik, Satpreet H. Singh, Sonja Johnson-Yu, Nathan Wu, Roy Harpaz, Florian Engert, Kanaka Rajan

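Affiliations: California Institute of Technology; Harvard University

AI summary: Trains RNN agents with deep reinforcement learning to study larval zebrafish hunting and to show how ecological and energetic constraints shape adaptive behavior; a simple model reproduces real hunting behaviors, and virtual experiments probe how constraints and environment affect hunting dynamics.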

Journal ref Proceedings of the 9th Conference on Cognitive Computational Neuroscience (2026)

Abstract

Larval zebrafish hunting provides a tractable setting to study how ecological and energetic constraints shape adaptive behavior in both biological brains and artificial agents. Here we develop a minimal agent-based model, training recurrent policies with deep reinforcement learning in a bout-based zebrafish simulator. Despite its simplicity, the model reproduces hallmark hunting behaviors -- including eye vergence-linked pursuit, speed modulation, and stereotyped approach trajectories -- that closely match real larval zebrafish. Quantitative trajectory analyses show that pursuit bouts systematically reduce prey angle by roughly half before strike, consistent with measurements. Virtual experiments and parameter sweeps vary ecological and energetic constraints, bout kinematics (coupled vs. uncoupled turns and forward motion), and environmental factors such as food density, food speed, and vergence limits. These manipulations reveal how constraints and environments shape pursuit dynamics, strike success, and abort rates, yielding falsifiable predictions for neuroscience experiments. These sweeps identify a compact set of constraints -- binocular sensing, the coupling of forward speed and turning in bout kinematics, and modest energetic costs on locomotion and vergence -- that are sufficient for zebrafish-like hunting to emerge. Strikingly, these behaviors arise in minimal agents without detailed biomechanics, fluid dynamics, circuit realism, or imitation learning from real zebrafish data. Taken together, this work provides a normative account of zebrafish hunting as the optimal balance between energetic cost and sensory benefit, highlighting the trade-offs that structure vergence and trajectory dynamics. We establish a virtual lab that narrows the experimental search space and generates falsifiable predictions about behavior and neural coding.

2508.19273 2026-06-12 cs.CR cs.AI

MixGAN: A Hybrid Semi-Supervised and Generative Approach for DDoS Detection in Cloud-Integrated IoT Networks

Tongxi Wu, Chenwei Xu, Jin Yang

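Affiliations: College of Cyber Science and Engineering, Sichuan University; College of Information Science and Technology, Tibet University

AI summary: Proposes MixGAN, which combines conditional generation, semi-supervised learning, and robust feature extraction to address complex traffic dynamics, class imbalance, and data scarcity in DDoS detection for cloud-integrated IoT networks; experiments show gains over existing methods in accuracy, TPR, and TNR.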

Journal ref ECAI 2025, 28th European Conference on Artificial Intelligence

Abstract

The proliferation of cloud-integrated IoT systems has intensified exposure to Distributed Denial of Service (DDoS) attacks due to the expanded attack surface, heterogeneous device behaviors, and limited edge protection. However, DDoS detection in this context remains challenging because of complex traffic dynamics, severe class imbalance, and scarce labeled data. While recent methods have explored solutions to address class imbalance, many still struggle to generalize under limited supervision and dynamic traffic conditions. To overcome these challenges, we propose MixGAN, a hybrid detection method that integrates conditional generation, semi-supervised learning, and robust feature extraction. Specifically, to handle complex temporal traffic patterns, we design a 1-D WideResNet backbone composed of temporal convolutional layers with residual connections, which effectively capture local burst patterns in traffic sequences. To alleviate class imbalance and label scarcity, we use a pretrained CTGAN to generate synthetic minority-class (DDoS attack) samples that complement unlabeled data. Furthermore, to mitigate the effect of noisy pseudo-labels, we introduce a MixUp-Average-Sharpen (MAS) strategy that constructs smoothed and sharpened targets by averaging predictions over augmented views and reweighting them towards high-confidence classes. Experiments on NSL-KDD, BoT-IoT, and CICIoT2023 demonstrate that MixGAN achieves up to 2.5% higher accuracy and 4% improvement in both TPR and TNR compared to state-of-the-art methods, confirming its robustness in large-scale IoT-cloud environments. The source code is publicly available at https://github.com/0xCavaliers/MixGAN.
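The MixUp-Average-Sharpen (MAS) target construction mentioned in this abstract can be sketched in a few lines. This is a hedged illustration: the abstract only says that predictions are averaged over augmented views and reweighted towards high-confidence classes, so the temperature-based sharpening and the MixUp interpolation below follow the common MixMatch-style recipe and are assumptions, as are all function names.

```python
def average_predictions(view_probs):
    """Average class probabilities predicted for several augmented views."""
    n_views = len(view_probs)
    n_classes = len(view_probs[0])
    return [sum(p[c] for p in view_probs) / n_views for c in range(n_classes)]

def sharpen(probs, temperature=0.5):
    """Reweight towards high-confidence classes (assumed temperature sharpening)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

def mixup(x1, t1, x2, t2, lam):
    """Convex combination of two samples and of their soft targets."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    t = [lam * a + (1.0 - lam) * b for a, b in zip(t1, t2)]
    return x, t

def mas_target(view_probs, temperature=0.5):
    """Pseudo-label for one unlabeled sample: average, then sharpen."""
    return sharpen(average_predictions(view_probs), temperature)
```

For example, two views predicting `[0.6, 0.4]` and `[0.8, 0.2]` average to roughly `[0.7, 0.3]`, and sharpening with temperature 0.5 pushes the target further towards the first class.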

2104.11105 2026-06-12 cs.CR cs.LG cs.NE

Synchronization of Tree Parity Machines using non-binary input vectors

Miłosz Stypiński, Marcin Niemiec

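Affiliations: AGH University of Science and Technology

AI summary: Proposes using non-binary input vectors with a wider range of values to improve tree parity machine synchronization, shortening synchronization time and improving the security of neural cryptography.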

Comments This work has been submitted to the IEEE for possible publication

Journal ref IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 1, pp. 1423-1429, Jan. 2024

Abstract

Neural cryptography is the application of artificial neural networks in the subject of cryptography. The functionality of this solution is based on a tree parity machine. It uses artificial neural networks to perform secure key exchange between network entities. This article proposes improvements to the synchronization of two tree parity machines. The improvement is based on training the artificial neural networks with input vectors that have a wider range of values than binary ones. As a result, the duration of the synchronization process is reduced. Therefore, tree parity machines achieve common weights in a shorter time due to the reduction of necessary bit exchanges. This approach improves the security of neural cryptography.
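A minimal sketch of two tree parity machines synchronizing on shared non-binary inputs may help make the mechanism concrete. It uses the textbook Hebbian update, with inputs drawn from {-M, ..., -1, 1, ..., M} instead of {-1, 1}. The parameter names (`k`, `n`, `l`, `m`), the convention that the sign of 0 is -1, and the clipping of weights to [-l, l] are assumptions for illustration and may differ from the article's exact scheme.

```python
import random

def sign(v):
    """Sign with the assumed convention that 0 maps to -1."""
    return 1 if v > 0 else -1

class TreeParityMachine:
    def __init__(self, k, n, l, rng):
        self.k, self.n, self.l = k, n, l
        self.w = [[rng.randint(-l, l) for _ in range(n)] for _ in range(k)]
        self.sigma = [1] * k
        self.tau = 1

    def output(self, x):
        """Compute hidden-unit signs and their product tau for input x."""
        self.sigma = [sign(sum(w * v for w, v in zip(self.w[j], x[j])))
                      for j in range(self.k)]
        self.tau = 1
        for s in self.sigma:
            self.tau *= s
        return self.tau

    def update(self, x, other_tau):
        """Hebbian rule: learn only when both outputs agree."""
        if self.tau != other_tau:
            return
        for j in range(self.k):
            if self.sigma[j] == self.tau:
                for i in range(self.n):
                    v = self.w[j][i] + self.tau * x[j][i]
                    self.w[j][i] = max(-self.l, min(self.l, v))

def random_input(k, n, m, rng):
    """Non-binary input: each entry drawn from {-m, ..., -1, 1, ..., m}."""
    values = [v for v in range(-m, m + 1) if v != 0]
    return [[rng.choice(values) for _ in range(n)] for _ in range(k)]

def synchronize(a, b, m, rng, max_steps=100000):
    """Return the number of exchanged inputs until weights match, or None."""
    for step in range(max_steps):
        if a.w == b.w:
            return step
        x = random_input(a.k, a.n, m, rng)
        tau_a, tau_b = a.output(x), b.output(x)
        a.update(x, tau_b)
        b.update(x, tau_a)
    return None
```

Setting `m=1` recovers the binary-input case, so the same code can be used to compare step counts for binary and non-binary inputs.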

1710.03070 2026-06-12 cs.NE cs.LG q-bio.NC stat.ML

full-FORCE: A Target-Based Method for Training Recurrent Networks

Brian DePasquale, Christopher J. Cueva, Kanaka Rajan, G. Sean Escola, L. F. Abbott

Affiliations: Department of Neuroscience; Zuckerman Institute; Columbia University; Department of Physiology and Cellular Biophysics; Columbia University College of Physicians and Surgeons; Princeton Neuroscience Institute; Lewis-Sigler Institute for Integrative Genomics

AI summary: Proposes a target-based method for training recurrent networks in which a second network supplies the target dynamics, yielding networks that perform tasks with fewer neurons and greater noise robustness.

Comments 20 pages, 8 figures

Journal ref PLoS ONE (2018)

Abstract

Trained recurrent networks are powerful tools for modeling dynamic neural computations. We present a target-based method for modifying the full connectivity matrix of a recurrent network to train it to perform tasks involving temporally complex input/output transformations. The method introduces a second network during training to provide suitable "target" dynamics useful for performing the task. Because it exploits the full recurrent connectivity, the method produces networks that perform tasks with fewer neurons and greater noise robustness than traditional least-squares (FORCE) approaches. In addition, we show how introducing additional input signals into the target-generating network, which act as task hints, greatly extends the range of tasks that can be learned and provides control over the complexity and nature of the dynamics of the trained, task-performing network.

2606.13658 2026-06-12 cs.AI New submission

Before You Think: System 0, AI-Mediated Cognition and Cognitive Colonization

Marianna Bergamaschi Ganapini, Massimo Chiriatti, Enrico Panai, Giuseppe Riva

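AI summary: Compares three frameworks for AI-mediated cognition, argues that System 0 occupies a distinctive theoretical position, and introduces the concept of "cognitive colonization": AI systems can embed external interests within the architecture of the self, exerting influence that is hard to perceive.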

Abstract

This paper examines three recent frameworks for understanding the cognitive and epistemic consequences of artificial intelligence: Tri-System Theory, Thinkframes, and System 0. It argues that while the first two capture important dimensions of AI's influence on individual reasoning and collective epistemic practices, System 0 occupies a theoretically distinctive position that neither can fully replicate. The paper introduces the concept of cognitive colonization, according to which AI systems can embed external interests within the architecture of the self in ways that are difficult for users to perceive. Because such systems are already widely deployed, understanding these invisible forms of influence is an urgent philosophical and practical task.

2606.13028 2026-06-12 cs.RO cs.CV New submission

Comparing Commercial Depth Sensor Accuracy for Medical Applications

Pit Henrich, Maximilian Weiherer, Franziska Hansen, Bernhard Egger, Franziska Mathis-Ullrich

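AI summary: Benchmarks four depth sensors (stereo, structured-light, and time-of-flight) at a distance of about 50 cm on a porcine bone specimen, a porcine belly specimen, and a silicone kidney phantom against stylus-sampled references; the Zivid 2M+ 60 performed best across all objects and metrics.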

Comments 4 Pages

Abstract

Depth estimation has numerous medical and surgical applications. We benchmark four depth sensors on a porcine bone specimen, a porcine belly specimen, and a silicone kidney phantom using stylus-sampled references. These objects contain several real-world challenges, including homogeneous surfaces, specular surfaces, and subsurface scattering. The comparison includes stereo, structured-light, and time-of-flight sensors at a distance of approximately 50 cm. Specifically, the Intel RealSense D405 (Intel RealSense, United States), PMD Flexx2 (pmdtechnologies, Germany), Stereolabs ZED 2i (Stereolabs, France), and Zivid 2M+ 60 (Zivid, Norway) are compared. The Zivid 2M+ 60 performed best across all objects and metrics considered in this work. The ZED ranked second for real tissue, but last on the phantom.

2606.12917 2026-06-12 cs.LG 新提交

Where Computation Lives Inside TabPFN: Causal Localisation of Attention Head Function

计算在 TabPFN 中的位置:注意力头功能的因果定位

Atharva Gupta, Dhruv Kumar, Murari Mandal, Saurabh Deshpande

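AI Summary Using activation patching, ablation, and attention entropy, the paper finds that one attention head in TabPFN 2.5 is 2 to 5 times more causally necessary than the others at its peak layer, that its dominant layer shifts with task complexity, and that the remaining heads show symmetric late-layer profiles.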

Comments Accepted to Workshop FMSD @ ICML 2026

详情
AI中文摘要

我们首次对表格基础模型进行了因果机制分析,研究了 TabPFN 2.5 的逐特征注意力头如何跨层分布计算。使用两个合成回归数据集上的激活修补、消融和注意力熵,我们发现明确的时间特化:一个头的因果必要性在峰值层比其他头高2到5倍,其主导层随不同复杂度的任务而变化,而其余头表现出对称的后期层轮廓。注意力熵和修补为优势头的计算活跃层提供了收敛证据。我们还通过对比激活引导研究了推理时间的可操控性,发现它无法跨样本迁移。我们将这一结果归因于 TabPFN 的上下文学习机制,该机制通过上下文相关的注意力编码任务结构,而不是语言模型中使引导可行的稳定参数方向。

英文摘要

We present the first causal mechanistic analysis of a tabular foundation model, investigating how TabPFN 2.5's feature-wise attention heads distribute computation across layers. Using activation patching, ablation, and attention entropy across two synthetic regression datasets, we find clear temporal specialisation: one head's causal necessity exceeds that of the others by a factor of 2 to 5 at its peak layer, with its dominant layer shifting across tasks of different complexity, while the remaining heads exhibit symmetric late-layer profiles. Attention entropy and patching provide convergent evidence for the computationally active layers of the dominant head. We additionally investigate inference-time steerability via contrastive activation steering, which fails to transfer across samples. We attribute this result to TabPFN's in-context learning mechanism, which encodes task structure through context-dependent attention rather than the stable parametric directions that make steering tractable in language models.
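The attention-entropy diagnostic mentioned in the abstract can be illustrated with a minimal sketch. It assumes attention weights are available as a NumPy array whose last axis sums to 1; the function name `attention_entropy` and the toy inputs are hypothetical and are not taken from the paper or from the TabPFN codebase.

```python
import numpy as np

def attention_entropy(attn: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy (in nats) of each attention row.

    attn has shape (..., n_keys) and each row is assumed to sum to 1.
    """
    # Clip only inside the log so that zero weights contribute exactly zero.
    p = np.clip(attn, eps, 1.0)
    return -(attn * np.log(p)).sum(axis=-1)

# Toy rows (hypothetical): diffuse attention vs. attention on a single key.
uniform = np.full((1, 4), 0.25)
peaked = np.array([[1.0, 0.0, 0.0, 0.0]])

print(attention_entropy(uniform))  # equals log(4), the maximum for 4 keys
print(attention_entropy(peaked))   # equals 0, the minimum
```

Low entropy indicates a head that concentrates on few positions, while entropy near log(n_keys) indicates diffuse attention; how the authors aggregate this per layer is not specified in the abstract.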

2606.12900 2026-06-12 cs.AI cs.CL cs.LG 新提交

Zero-source LLM Hallucination Detection with Human-like Criteria Probing

零源大语言模型幻觉检测:类人类标准探测

Jiahao Yang, Shuhai Zhang, Hailong Kang, Feng Liu, Qi Chen, Mingkui Tan

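AI Summary Proposes the HCPD paradigm, which emulates the multi-faceted reasoning of human evaluators through a human-like criteria probing mechanism and combines reward-based alignment with multi-sample aggregation to achieve effective, interpretable hallucination detection under the zero-source constraint.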

Comments Accepted at ICML 2026

详情
AI中文摘要

大型语言模型(LLM)常因生成事实错误或不忠实的内容而产生幻觉,对其安全使用构成重大风险。在零源约束下,即无法获取模型内部信息或外部参考,检测必须仅依赖于文本查询-答案对,检测此类幻觉尤为困难。本文提出用于幻觉检测的类人类标准探测(HCPD)范式,该范式模拟人类评估者的多面推理。其核心是类人类标准探测(HCP)机制,其中LLM代理自适应地将其判断分解为一组可解释的加权标准,并将特定标准得分聚合为最终的真实性度量。为实现这种自适应能力,我们引入了一种基于奖励的对齐方案,仅使用来自语义一致性的弱监督。在推理时,我们采用多样本聚合策略,确保决策稳健的同时保持完全可解释性。我们进一步提供了支持我们方法可靠性的理论分析。大量实验表明,HCPD始终优于最先进的基线,为零源幻觉检测提供了一种有效且可解释的解决方案。代码可从 https://github.com/TRISKEL10N/HCPD 获取。

英文摘要

Large language models (LLMs) often hallucinate by generating factually incorrect or unfaithful content, posing significant risks to their safe use. Detecting such hallucinations is particularly challenging under the zero-source constraint, where no model internals or external references are available, and detection must rely solely on the textual query-answer pair. In this paper, we propose Human-like Criteria Probing for Hallucination Detection (HCPD), a paradigm that emulates the multi-faceted reasoning of human evaluators. Its core is a Human-like Criteria Probing (HCP) mechanism, in which an LLM agent adaptively decomposes its judgment into a weighted set of interpretable criteria and aggregates criterion-specific scores into a final truthfulness measure. To achieve this adaptive capability, we introduce a reward-based alignment scheme using only weak supervision from semantic consistency. At inference, we employ a multi-sampling aggregation strategy to ensure robust decisions while preserving full interpretability. We further provide theoretical analysis supporting the reliability of our approach. Extensive experiments show that HCPD consistently outperforms state-of-the-art baselines, offering an effective and explainable solution for zero-source hallucination detection. Code is available at https://github.com/TRISKEL10N/HCPD.
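The two aggregation steps described in the abstract (weighted criteria within one judgment, then aggregation across several sampled judgments) can be sketched as follows. This is a toy illustration under stated assumptions, not the HCPD implementation: the criterion names, the weighted-average rule, and the use of a plain mean across samples are all hypothetical choices.

```python
from statistics import mean

def aggregate_criteria(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of criterion scores in [0, 1]; weights need not sum to 1."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(weights[name] * scores[name] for name in scores) / total

def multi_sample_score(samples: list[tuple[dict[str, float], dict[str, float]]]) -> float:
    """Average the per-sample truthfulness scores over several judge samples."""
    return mean(aggregate_criteria(s, w) for s, w in samples)

# Two hypothetical judge samples, each with its own criteria scores and weights.
samples = [
    ({"factual_consistency": 0.9, "specificity": 0.5},
     {"factual_consistency": 3.0, "specificity": 1.0}),
    ({"factual_consistency": 0.7, "specificity": 0.5},
     {"factual_consistency": 1.0, "specificity": 1.0}),
]
truthfulness = multi_sample_score(samples)
print(round(truthfulness, 3))
```

Because every sample keeps its own criteria and weights, the final score can be traced back to individual criterion scores, which is the interpretability property the abstract emphasizes.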

2606.12896 2026-06-12 cs.LG cs.AI cs.CR 新提交

PolicyGuard: Towards Test-time and Step-level Adversary Defense for Reinforcement Learning Agent

PolicyGuard:面向强化学习智能体的测试时和步级对抗防御

Junfeng Guo, Heng Huang

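AI Summary Proposes PolicyGuard, a test-time, step-level backdoor defense based on Gaussian process posterior variance that computes single-step uncertainty by adapting pseudo trajectories, reaching average AUROCs of 0.856 and 0.859 across seven RL games.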

详情
AI中文摘要

尽管强化学习(RL)的实际应用日益普及,但RL系统的安全性值得更多关注和探索。特别是,最近的研究揭示了RL智能体容易受到后门攻击,即受害智能体在标准条件下表现正常,但在特定触发器被激活时执行恶意动作。现有的RL后门防御要么需要访问智能体的内部参数,要么仅在模型或轨迹级别操作,或者仅限于特定攻击类型。为了确保RL智能体的安全性,我们提出了PolicyGuard,一种测试时步级后门防御方法,它利用高斯过程(GP)后验方差并自适应伪轨迹以实现单个时间步的不确定性计算。此外,我们还提供了理论基础来解释GP后验方差的有效性。在七个RL游戏上的大量实验表明,PolicyGuard在大多数情况下实现了最先进的检测性能,对于基于扰动的攻击平均AUROC为0.856,对于对抗智能体攻击平均AUROC为0.859。

英文摘要

While real-world applications of reinforcement learning (RL) are becoming increasingly popular, the security of RL systems deserves more attention and exploration. In particular, recent work has revealed that RL agents are vulnerable to backdoor attacks, where a victim agent behaves normally under standard conditions but executes malicious actions when a specific trigger is activated. Existing backdoor defenses for RL either require access to the agent's internal parameters, operate only at the model or trajectory level, or are limited to specific attack types. To ensure the security of RL agents, we propose PolicyGuard, a test-time, step-level backdoor defense that leverages Gaussian process (GP) posterior variance and adapts pseudo trajectories to enable uncertainty computation for individual time steps. We also provide theoretical foundations to explain the efficacy of GP posterior variance. Extensive experiments across seven RL games demonstrate that PolicyGuard achieves state-of-the-art detection performance in most cases, with an average AUROC of 0.856 for perturbation-based attacks and 0.859 for adversary-agent attacks.
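For readers unfamiliar with the quantity PolicyGuard builds on, the following minimal sketch computes the standard GP posterior variance with an RBF kernel. It shows only the textbook formula; the kernel choice, the `noise` and `length_scale` values, and the function names are assumptions, and the pseudo-trajectory construction from the paper is not reproduced.

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """RBF kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / length_scale**2)

def gp_posterior_variance(x_train, x_query, noise=1e-3, length_scale=1.0):
    """Posterior variance k(x*, x*) - k*^T (K + noise * I)^{-1} k* per query point."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query, length_scale)  # shape (n_train, n_query)
    L = np.linalg.cholesky(K)
    v = np.linalg.solve(L, K_s)                       # v = L^{-1} k*
    prior_var = np.ones(len(x_query))                 # k(x, x) = 1 for the RBF kernel
    return prior_var - (v**2).sum(axis=0)

# Hypothetical 1-D "states": one query inside the training range, one far outside.
x_train = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
x_query = np.array([[0.5], [5.0]])
var = gp_posterior_variance(x_train, x_query)
print(var)  # small for the in-range query, close to 1 for the distant one
```

The variance stays near the prior value for inputs unlike anything seen before, which is why it can serve as a per-step anomaly score; thresholding it is left out here.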

2606.12882 2026-06-12 cs.AI 新提交

HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness

HarnessBridge: 用于LLM智能体框架的可学习双向控制器

Xiaoxuan Wang, Haixin Wang, Alexander Taylor, Jason Cong, Yizhou Sun, Wei Wang

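AI Summary Proposes HarnessBridge, a lightweight learnable harness controller that parameterizes the agent-environment interface as a bidirectional projection, reducing token usage and trajectory length and generalizing to larger models.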

详情
AI中文摘要

大型语言模型越来越多地被部署为用于长周期任务的智能体,但其性能不仅受模型能力和环境设计的影响,还受调节智能体-环境交互的框架的影响。现有的框架大多是手动设计的,随着轨迹变长和交互变得更加复杂,它们难以扩展。在这项工作中,我们探究框架是否可以通过一个可学习的即插即用模块生成,该模块可以以端到端的方式进行训练。我们引入了HarnessBridge,一种轻量级可学习框架控制器,它将智能体-环境接口参数化为双向投影。HarnessBridge学习两个双向投影:观测投影,将原始轨迹提炼为紧凑的、与决策相关的状态;以及动作投影,将提议的动作转换为可执行的转换或基于轨迹的拒绝。我们在框架监督数据集上通过统一指令调优训练HarnessBridge。在Terminal-Bench 2.0和SWE-bench Verified上,HarnessBridge匹配或超越了强大的专用框架,同时大幅减少了令牌使用和轨迹长度,并从较小的生成器泛化到较大的商业模型。

英文摘要

Large language models are increasingly deployed as agents for long-horizon tasks, yet their performance is shaped not only by model capability and environment design, but also by the harness that mediates agent-environment interaction. Existing harnesses are largely manually engineered, making them difficult to scale as trajectories grow longer and interactions become more complex. In this work, we ask whether a harness can be generated by a learnable plug-in module that can be trained in an end-to-end fashion. We introduce HarnessBridge, a lightweight learnable harness controller that parameterizes the agent-environment interface as a bidirectional projection. HarnessBridge learns two projections: an observation projection, which distills raw trajectories into compact, decision-relevant states, and an action projection, which converts proposed actions into executable transitions or trajectory-grounded rejections. We train HarnessBridge on a harness supervision dataset via unified instruction tuning. On Terminal-Bench 2.0 and SWE-bench Verified, HarnessBridge matches or surpasses strong specialized harnesses while substantially reducing token usage and trajectory length, and generalizes from smaller generators to larger commercial models.

2606.12840 2026-06-12 cs.LG 新提交

CLARITree: Cholesky and Lookahead Accelerations for Regression with Interpretable Piecewise Linear Trees

CLARITree: 基于Cholesky和前瞻加速的可解释分段线性树回归

Yixiao Wang, Hayden McTavish, Varun Babbar, Margo Seltzer, Cynthia Rudin

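AI Summary Proposes an algorithm that combines lookahead search with rank-one Cholesky updates to build near-optimal sparse piecewise linear regression trees, achieving a favorable trade-off between computational efficiency, predictive accuracy, and sparsity.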

Comments Accepted at ICML 2026

详情
AI中文摘要

回归树是机器学习中最具可解释性且表达能力最强的模型之一。历史上,贪心归纳一直是构建高性能回归树的主要方法。尽管存在基于动态规划和分支定界的最优方法,但对于一般的线性回归树,这些方法在计算上不可行,尽管它们通常比贪心方法取得更好的性能。最近的研究表明,专门的前瞻策略可以显著提高运行时间,同时保持接近最优的性能,主要是在分类设置中。在这项工作中,我们开发了一种新颖的算法,用于近最优、稀疏、分段线性回归树,该算法将前瞻式搜索策略与Gram矩阵的高效秩一Cholesky更新相结合。我们从理论和实验上证明,我们的方法在计算效率、预测精度和稀疏性之间实现了有利的权衡,并且比当前最先进的方法具有更好的可扩展性。

英文摘要

Regression trees are among the most interpretable yet expressive model classes in machine learning. Historically, greedy induction has been the dominant approach for constructing well-performing regression trees. While optimal methods based on dynamic programming and branch-and-bound exist, they are computationally prohibitive for general linear regression trees, despite often achieving substantially better performance than greedy approaches. Recent work has shown that specialized lookahead strategies can dramatically improve runtime while maintaining near-optimal performance, primarily in classification settings. In this work, we develop a novel algorithm for near-optimal, sparse, piecewise linear regression trees that combines a lookahead-style search strategy with efficient rank-one Cholesky updates of the Gram matrix. We demonstrate, both theoretically and empirically, that our method achieves a favorable trade-off between computational efficiency, predictive accuracy, and sparsity, and scales significantly better than the current state of the art.
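The rank-one Cholesky update named in the abstract is a standard numerical routine, sketched below. Given the lower factor of a Gram matrix G, it returns the factor of G + x xᵀ in O(n²) rather than refactorizing in O(n³). How CLARITree schedules these updates (for example, as samples move across a candidate split) is an assumption and is not shown; the function name and the small ridge term are hypothetical.

```python
import numpy as np

def chol_rank1_update(L: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Return the lower Cholesky factor of L @ L.T + outer(x, x) in O(n^2)."""
    L = np.array(L, dtype=float)   # work on copies so the inputs stay intact
    x = np.array(x, dtype=float)
    n = x.size
    for k in range(n):
        r = np.hypot(L[k, k], x[k])            # new diagonal entry
        c, s = r / L[k, k], x[k] / L[k, k]     # rotation-like coefficients
        L[k, k] = r
        if k + 1 < n:
            L[k + 1:, k] = (L[k + 1:, k] + s * x[k + 1:]) / c
            x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L

# Adding one sample row x_new to a design matrix X changes the Gram matrix
# X.T @ X by exactly outer(x_new, x_new).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
G = X.T @ X + 1e-6 * np.eye(4)     # tiny ridge keeps G positive definite
L = np.linalg.cholesky(G)
x_new = rng.normal(size=4)
L_new = chol_rank1_update(L, x_new)
print(np.allclose(L_new @ L_new.T, G + np.outer(x_new, x_new)))
```

With the updated factor, the least-squares coefficients of a leaf's linear model can be obtained by two triangular solves instead of a fresh factorization.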

2606.12828 2026-06-12 cs.AI 新提交

Topical Phase Transitions in Artificial Intelligence Research: Large-Scale Evidence and an Early-Warning Signature for Emerging Topics

人工智能研究中的主题相变:大规模证据与新兴主题的早期预警信号

Rasul Khanbayov, Hasan Kurban

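AI Summary Analyzing papers from five major AI conferences over 2017-2025, the paper finds that AI topics surge abruptly through "phase transitions" and uses an early-warning signature to identify topics to watch in the coming years.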

详情
AI中文摘要

人工智能的研究主题是逐渐增长,还是通过突然的、可检测的跳跃式发展?通过分析2017年至2025年期间五个顶级AI会议(ACL、CVPR、ICLR、ICML、NeurIPS)的80,814篇主会论文,我们发现主要AI主题通过主题相变推进:在多年间保持边缘地位,然后在一到三年内跨会议激增。到2025年,大型语言模型成为跨会议的主导主题,扩散模型以类似的突发性崛起,语言模型方法通过视觉语言模型进入计算机视觉领域,而强化学习则平滑累积,这区分了真正的相变与普通增长。这一结构是我们的主要贡献:对AI研究如何重组的大规模、跨会议特征描述。然后我们探究相变是否在达到顶峰前留下可检测的足迹。我们定义了一个早期预警信号,即基于2017-2021年数据冻结的四项出版动力学标准,并在2023-2025年的相变上进行样本外评估,在13.5%的基准率下获得了27%的精确率和63%的召回率。应用于2025年数据时,该信号将推理与测试时计算、智能体AI、多模态LLM、检索增强生成和世界模型标记为2026-2028年需监测的主题。源代码也在GitHub上公开,网址为 https://github.com/KurbanIntelligenceLab/ai-phase-transitions。

英文摘要

Do research topics in artificial intelligence grow gradually, or do they advance through abrupt, detectable jumps? Analyzing 80,814 accepted main-track papers from five premier AI conferences (ACL, CVPR, ICLR, ICML, NeurIPS) spanning 2017 to 2025, we show major AI topics advance through topical phase transitions: remaining marginal for years, then surging across venues within one to three years. Large language models became the dominant cross-venue topic by 2025, diffusion models rose with comparable abruptness, and language-model methods crossed into computer vision via vision-language models, whereas reinforcement learning compounded smoothly, distinguishing genuine phase transitions from ordinary growth. This structure is our primary contribution: a large-scale, cross-venue characterization of how AI research reorganizes. We then ask whether a transition leaves a detectable footprint before it peaks. We define an early-warning signature, four publication-dynamics criteria frozen on 2017-2021 data, and evaluate it out of sample on 2023-2025 transitions, obtaining a precision of 27% and recall of 63% against a 13.5% base rate. Applied to 2025 data, the signature flags reasoning and test-time compute, agentic AI, multimodal LLMs, retrieval-augmented generation, and world models as topics to monitor over 2026-2028. The source code is also publicly available on GitHub at https://github.com/KurbanIntelligenceLab/ai-phase-transitions.
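To put the reported detection numbers in context, the short calculation below derives two quantities from the figures stated in the abstract (precision 27%, recall 63%, base rate 13.5%). The lift and F1 values are derived here for illustration and are not reported by the authors.

```python
precision, recall, base_rate = 0.27, 0.63, 0.135

# Lift: how much more often a flagged topic is a true transition
# than a topic picked at random.
lift = precision / base_rate

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"lift over base rate: {lift:.2f}x")  # 2.00x
print(f"F1: {f1:.3f}")                      # 0.378
```

A lift of 2 means the signature doubles the hit rate relative to chance while still flagging many topics that do not transition, consistent with the authors framing it as a monitoring aid.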

2606.12754 2026-06-12 cs.CL cs.AI 新提交

LLMs Can Better Capture Human Judgments--With the Right Prompts

LLMs 能更好地捕捉人类判断——使用合适的提示

Danica Dillion, Chen Cecilia Liu, Baihui Wang, Daniele Barolo, Tanmay Rajore, Niket Tandon, Pranathi Ravikumar, Kurt Gray

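AI Summary With simple prompting strategies, LLMs can recover the full distribution of human responses and become less sensitive to wording variations, improving AI-human alignment.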

详情
AI中文摘要

大型语言模型(LLMs)在捕捉人类判断方面是否表现不佳?两个常被提及的限制是:LLMs 无法捕捉反应的全分布,以及它们的判断在措辞变化上不稳定。我们展示了缓解这些限制的简单提示策略。在两个数据集上——一个代表美国的 144 个道德情景集,以及国际社会调查项目“家庭与性别角色变化”模块涵盖 32 个国家的 38 个道德信念——我们展示了简单的启发式技术如何帮助改善 AI-人类对齐。首先,提示模型报告标准差和反应比例,比常见策略更好地恢复了人类反应的完整范围。其次,确保情景对人类参与者清晰——如人类困惑评分所反映——提升了模型对齐度,且 LLMs 可以跟踪人类困惑评分。同时,我们发现 LLMs 对自身误差的估计校准不佳,尽管它们能相对较好地预测人类变异性。这些结果表明,向 LLMs 提出更好的问题可以得到更好的答案。

英文摘要

Are large language models (LLMs) bad at capturing human judgment? Two commonly stated limitations are that LLMs fail to capture full distributions of responses, and that their judgments are unstable across wording variations. We demonstrate simple prompting strategies that mitigate these limitations. Across two datasets--a U.S.-representative set of 144 moral scenarios and 38 moral beliefs from the International Social Survey Programme's Family and Changing Gender Roles module covering 32 countries--we show how simple elicitation techniques help improve AI-human alignment. First, prompting models to report standard deviations and response proportions recovers the full range of human responses better than common strategies. Second, ensuring scenarios are clear to human participants--as reflected in human confusion ratings--boosts model alignment, and LLMs can track human confusion ratings. At the same time, we find that LLMs' estimates of their own error are poorly calibrated, though they can predict human variability relatively well. These results suggest that asking better questions to LLMs can yield better answers.

2606.12748 2026-06-12 cs.CL 新提交

Agent-based models for the evolution of morphological alternation patterns

基于智能体的形态交替模式演化模型

Aravinth Kulanthaivelu, Richard Sproat

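AI Summary Using multi-agent simulation, the paper studies how morphological alternations (such as go/went) emerge, and finds that scale-free social networks and random adoption policies yield more realistic morphological patterns.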

Comments 51 + 37 pages. 31 Figures

详情
AI中文摘要

为什么英语中“go”的过去式是看似无关的“went”?这种交替在语言中很常见。它们既无助于交流也不利于学习,却能持续存在数百年或数千年。我们提出了一个多智能体模拟,用于研究形态词干和屈折交替的涌现。交替形式源于语音变化,或者像“go/went”一样,来自与部分人群相关的词汇替代。当一个智能体“听到”另一个智能体对某个词形位(例如go的过去式)使用新形式时,它们会以一定概率采纳该形式,并可能将其使用扩展到共享相同原始形式的其他词形位。因此,替代形式可以在人群中传播,并固化为词干或屈折标记的交替形式。与许多先前的计算研究不同,我们的系统允许自然主义的词汇形式、现实的语音规则、包含数百或数千条目的词典,以及数十或数百个智能体的人群。它支持多种网络拓扑、扩散模式和智能体采纳策略。这类模拟的一个问题是评估:与真实语言相比,产生的形态有多真实?我们引入了AI历史语言学家,这是一个新颖的大型语言模型驱动系统,模拟两位历史语言学家之间的辩论。我们用它来比较一组真实语言的形态、伪装形态和实验演化形态。结果表明,有利于产生更合理形态的因素包括无标度社交网络和随机伯努利形式采纳。我们还提出了三个案例研究,模拟了有记载的历史变化,使我们能够测试如果历史不同会发生什么。所有代码和数据均已发布。

英文摘要

Why is the past tense of English "go" the apparently unrelated "went"? Such alternations are frequent in languages. They aid neither communication nor learnability, yet they can be persistent, surviving over centuries or millennia. We present a multi-agent simulation of the emergence of morphological stem and inflection alternations. Alternate forms arise by phonological changes or, as with "go/went", from lexical alternatives associated with a subset of the population. When an agent "hears" another agent use a novel form for a slot in the paradigm of a word (say, the past tense of "go"), it will with some probability adopt that form, possibly spreading its use to other slots in the paradigm that shared the same original form. Thus alternative forms can spread through the population and become entrenched as stem or inflectional marker alternants. Unlike many previous computational studies, our system allows for naturalistic lexical forms, realistic phonological rules, lexicons with hundreds or thousands of entries, and agent populations in the tens or hundreds. It supports several network topologies, diffusion patterns and agent adoption policies. One issue with such simulations is evaluation: how realistic is the resulting morphology compared to those of real languages? We introduce the AI Historical Linguist, a novel Large Language Model-driven system that models a debate between two historical linguists. We use this to compare a set of real language morphologies, disguised morphologies, and experimentally evolved morphologies. The results suggest that among the factors that favor more plausible morphologies are scale-free social networks and random Bernoulli adoption of forms. We also present three case studies modeling attested historical changes, allowing us to test what might have happened if history had been different. All code and data are released.

2606.12702 2026-06-12 cs.AI 新提交

Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System

以部署为中心的评估:预测临床大语言模型系统中的查询级拒绝风险

Alyssa Unell, Miguel Fuentes, Brenna Li, Bridget Lin, Meena Jagadeesan, Sanmi Koyejo, Nigam Shah

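AI Summary For a clinical LLM system, proposes a pre-response classifier that uses deployment context (such as provider type and department name) to predict the risk of user rejection, reaching an AUROC of 0.719, and demonstrates its utility for guardrail triggering and abstention.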

详情
AI中文摘要

大语言模型(LLMs)正越来越多地集成到临床系统中,因此评估这些系统的实际效用至关重要。然而,静态基准倾向于衡量正确性而非用户接受度,跨查询聚合性能,并需要密集标注的数据集——这导致评估临床系统时存在重大盲点。在这项工作中,我们对嵌入某学术医疗中心电子健康记录中的LLM系统进行了以部署为中心的评估,其中用户反馈稀疏但密切反映了部署条件。具体而言,我们训练了一个预响应分类器,该分类器基于查询内容和生成前可用的部署特定上下文,估计未来交互导致用户拒绝LLM响应的风险。我们对模型进行了4.5个月用户反馈的前瞻性分析,发现我们的预测模型达到了0.719的AUROC。此外,我们估计了此类预测在两个下游用例(触发护栏和弃权)中的益处。我们的关键概念洞察是,利用部署特定上下文(即提供者类型、科室名称、用于响应的语言模型),而不仅仅是查询内容,可以提高预测用户是否会拒绝系统输出的能力。总之,我们的实证案例研究证明了使用部署特定上下文预测用户拒绝的可行性,为定向护栏打开了大门。

英文摘要

Large language models (LLMs) are increasingly integrated into clinical systems, making it essential to evaluate the real-world utility of these systems. However, static benchmarks tend to measure correctness rather than user acceptance, aggregate performance across queries, and require densely annotated datasets -- leading to major blind spots for evaluating clinical systems. In this work, we perform a deployment-centered evaluation of an LLM system embedded within electronic health records at an academic medical center, where user feedback is sparse but closely reflects the deployment conditions. Specifically, we train a pre-response classifier that estimates the risk that a future interaction will result in the user rejecting the LLM response, based on query content and deployment-specific context available before generation. We conduct a prospective analysis of our model over 4.5 months of user feedback, finding that our prediction model achieves an AUROC of 0.719. Further, we estimate the benefit of such predictions in two downstream use cases (guardrail triggering and abstention). Our key conceptual insight is that making use of deployment-specific context (i.e., the provider type, department name, language model used for response), as opposed to only query content, improves the ability to predict whether the user will reject the system output. Altogether, our empirical case study demonstrates the feasibility of predicting user rejection using deployment-specific context, opening the door to targeted guardrails.
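The AUROC reported above has a direct probabilistic reading: it is the probability that a randomly chosen rejected interaction receives a higher predicted risk than a randomly chosen accepted one. The sketch below computes it from that definition; the scores and labels are made-up toy values, not data from the study.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg), which is
    fine for illustration but not for large datasets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted rejection risks and observed outcomes (1 = rejected).
risk = [0.1, 0.4, 0.35, 0.8]
rejected = [0, 0, 1, 1]
print(auroc(risk, rejected))  # 0.75
```

Under this reading, an AUROC of 0.719 means that for roughly 72% of rejected/accepted pairs the classifier assigns the rejected interaction the higher risk, compared with 50% for random scoring.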

2606.12674 2026-06-12 cs.AI 新提交

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Evoflux: 紧凑型智能体的可执行工具工作流的推理时演化

Kushal Raj Bhandari, Ling Yue, Ching-Yun Ko, Dhaval Patel, Shaowu Pan, Pin-Yu Chen, Jianxi Gao

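AI Summary Proposes Evoflux, an inference-time evolutionary search method that repairs the tool workflows of compact language models through structured edits and execution feedback, raising execution feasibility from about 3% to 17-24%; SFT baselines trained on the same data fall short, and ReAct reaches higher peaks but with higher variance and token cost.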

Comments Code is available at https://github.com/IBM/Evoflux

详情
AI中文摘要

紧凑型语言模型(LMs)降低了工具智能体的成本、延迟和部署风险。然而,MCP风格的工具使用不仅仅需要孤立的函数调用:智能体必须从实时目录中发现工具、满足模式、跨中间输出保留依赖关系,并在执行证据中基于最终响应。小型规划器通常生成看似合理的工作流图,但在工具解析、参数验证、依赖跟踪或执行中失败。我们认为,小语料蒸馏难以处理这种失败模式。几百个教师轨迹可以教授工作流格式,但很少涵盖修复失败计划所需的恢复行为。我们引入了Evoflux,一种推理时演化搜索方法,将紧凑工具使用视为可执行工具工作流的修复。它通过结构化编辑、执行反馈、自适应强度、元引导重设计和多样性剪枝来演化类型化工作流图。在涵盖实时MCP服务器和250个工具的保留MCP-Bench任务上,Evoflux将小型规划器的执行可行性从约3%提高到17-24%。相比之下,在相同搜索挖掘数据上的SFT和SFT+DPO匹配、表现不佳或崩溃至零样本性能以下;ReAct达到更高峰值,但方差和令牌成本更高。这些结果表明,在稀缺的教师轨迹预算下,基于执行的搜索更可靠。

英文摘要

Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use requires more than isolated function calling: an agent must discover tools from live catalogs, satisfy schemas, preserve dependencies across intermediate outputs, and ground final responses in executed evidence. Small planners often generate plausible workflow graphs that fail under tool resolution, parameter validation, dependency tracking, or execution. We argue that this failure mode is poorly handled by small-corpus distillation. A few hundred teacher traces can teach workflow format, but rarely cover the recovery behavior needed to repair failed plans over changing tool catalogs. We introduce Evoflux, an inference-time evolutionary search method that treats compact tool use as the repair of executable tool workflows. It evolves typed workflow graphs through structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning. On held-out MCP-Bench tasks spanning live MCP servers and 250 tools, Evoflux raises execution feasibility from roughly 3% to 17-24% across small planners. In contrast, SFT and SFT+DPO on the same search-mined data match, underperform, or collapse below zero-shot performance; ReAct reaches higher peaks, but with higher variance and token cost. These results show that execution-grounded search is more reliable under scarce teacher-trace budgets.

2606.12671 2026-06-12 cs.CV 新提交

SalArt-VQA: Diagnosing Whether VLMs Understand Salient Artifacts in Generated Images

SalArt-VQA: 诊断VLM是否理解生成图像中的显著伪影

Xiaoxiao Sun, Ruotian Zhang, Junzhe Huang, James Burgess, Serena Yeung-Levy

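AI Summary Proposes the SalArt-VQA benchmark, which uses 950 images and 3,681 multiple-choice questions to evaluate VLM understanding of artifacts in generated images along four axes (detection, localization, spatial grounding, and defect identification), revealing failure modes hidden behind high detection accuracy.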

Comments 23 pages, 7 figures, 7 tables. Dataset: https://huggingface.co/datasets/salartvqa/SalArt-VQA

详情
AI中文摘要

视觉语言模型(VLM)越来越多地被用于检测AI生成图像是否包含可见伪影,然而它们分析此类伪影的能力仍然知之甚少。正确的图像级决策仍可能隐藏重要失败:模型可能正确标记伪影,但依赖于错误的视觉线索、选择错误的区域,或描述图像中不存在的缺陷。为了直接评估这些行为,我们引入了SalArt-VQA,一个用于细粒度理解AI生成图像中显著伪影的诊断基准。SalArt-VQA包含950张图像和3,681道人工编写的多项选择题,涵盖伪影图像、匹配的真实参考图像和配对的生成参考图像。四种对齐的问题类型评估存在检测、语义定位、空间基础和证据基础的缺陷识别,而参考分割测试了当注释缺陷不存在时的校准和弃权能力。在20个VLM上,SalArt-VQA揭示了图像级检测准确率所隐藏的失败:最强的模型在伪影图像上达到99.37%的检测召回率,但仅在53.26%的图像上正确回答了所有四个伪影侧问题。比较伪影图像与无伪影参考揭示了灵敏度-校准权衡:敏感模型经常做出无根据的伪影声明,而保守模型主要通过遗漏真实伪影来避免误报。这些结果表明,高伪影检测准确率本身并不意味着有基础的伪影理解。SalArt-VQA暴露了这些隐藏的失败模式,并提供了对VLM伪影声明是否得到局部视觉证据支持的细粒度评估。

英文摘要

Vision-language models (VLMs) are increasingly used to detect whether AI-generated images contain visible artifacts, yet their ability to analyze such artifacts remains poorly understood. A correct image-level decision can still hide important failures: a model may correctly flag an artifact while relying on the wrong visual cue, selecting the wrong region, or describing a defect that the image does not support. To evaluate these behaviors directly, we introduce SalArt-VQA, a diagnostic benchmark for fine-grained SALient ARTifact understanding in AI-generated images. SalArt-VQA contains 950 images and 3,681 human-authored multiple-choice questions spanning artifact images, matched real reference images, and paired generated reference images. Four aligned question types evaluate presence detection, semantic localization, spatial grounding, and evidence-grounded defect identification, while the reference splits test calibration and abstention when the annotated defect is absent. Across 20 VLMs, SalArt-VQA reveals failures that image-level detection accuracy hides: the strongest model reaches 99.37% detection recall on artifact images but answers all four artifact-side questions correctly on only 53.26% of images. Comparing artifact images with artifact-free references reveals a sensitivity-calibration tradeoff: sensitive models often make unsupported artifact claims, while conservative models avoid false alarms largely by missing real artifacts. These results show that high artifact detection accuracy alone does not imply grounded artifact understanding. SalArt-VQA exposes these hidden failure modes and provides a fine-grained evaluation of whether VLM artifact claims are supported by local visual evidence.

2606.12599 2026-06-12 cs.CL 新提交

Constrained Semantic Decompression in LLMs through Persian Proverb-Conditioned Story Generation

通过波斯谚语条件故事生成实现LLM中的约束语义解压缩

Zahra Habibzadeh, Paria Khoshtab, Amir Mesbah, Yadollah Yaghoobzadeh

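AI Summary Proposes the task of constrained semantic decompression, which tests the abstraction-to-realization ability of large language models through Persian proverb-conditioned story generation; builds the PAND dataset, identifies a decompression gap, and shows that explicit reasoning and iterative refinement partially mitigate it.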

详情
AI中文摘要

将一个密集、抽象的谚语转化为引人入胜且道德忠实的故事需要深厚的文化理解和稳健的语义基础。我们将此问题定义为约束语义解压缩任务,并研究谚语条件故事生成作为大语言模型中抽象到实现的测试平台。聚焦波斯语,我们引入了谚语对齐叙事数据集(PAND),将谚语与人类编写的故事和显式含义配对。通过结合人类校准的LLM-as-a-Judge与结构度量的混合评估框架,我们分析了多种提示机制下的模型行为。我们的发现揭示了一个持续存在的解压缩差距:当前的LLM通常实现强大的表面流畅性,但未能忠实地实例化谚语中编码的潜在道德和因果结构。我们进一步表明,显式推理和迭代细化可以部分缓解这些失败,这表明许多解压缩错误源于将抽象含义转化为叙事形式的困难,而非完全缺乏相关知识。我们提出的任务自然扩展到其他形式的压缩文化知识。

英文摘要

Transforming a dense, abstract proverb into an engaging and morally faithful narrative requires deep cultural understanding and robust semantic grounding. We frame this problem as a constrained semantic decompression task and study proverb-conditioned story generation as a testbed for abstraction-to-realization in large language models (LLMs). Focusing on Persian, we introduce the Proverb Aligned Narrative Dataset (PAND), pairing proverbs with human-written stories and explicit meanings. Using a hybrid evaluation framework that combines human-calibrated LLM-as-a-Judge with structural metrics, we analyze model behavior across multiple prompting regimes. Our findings reveal a persistent decompression gap: current LLMs often achieve strong surface-level fluency while failing to faithfully instantiate the underlying moral and causal structure encoded in proverbs. We further show that explicit reasoning and iterative refinement can partially mitigate these failures, suggesting that many decompression errors arise from difficulties in translating abstract meaning into narrative form rather than a complete lack of relevant knowledge. Our proposed task naturally extends to other forms of compressed cultural knowledge.

2606.12576 2026-06-12 cs.CL 新提交

Helping Figures Tell their Story! Paper-Grounded Video Generation Explaining Complex Scientific Figures

帮助图表讲述它们的故事!基于论文的视频生成解释复杂科学图表

Ishani Mondal, Javad Baghirov, Jordan Boyd-Graber

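AI Summary Proposes the MINARD pipeline, which generates narrated, region-grounded videos from a figure and its paper, and releases the FigTalk benchmark; MINARD outperforms existing methods in automatic and human evaluation.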

Comments Webpage: https://minard.vercel.app/

详情
AI中文摘要

科学图表将复杂的流程压缩到单个画布中,但理解它们需要基于论文的、逐步的叙述,并与视觉高亮对齐——这是当前视频生成系统和基准所缺乏的能力。为了解决这个问题,我们引入了基于论文的图表到视频生成:从图表及其论文生成叙述性的、区域引导的导览视频。我们提出了MINARD(通过区域分解对叙述性架构进行多模态解释),这是一个生成基于论文的叙述并顺序将其与图表区域对齐的流水线。我们还发布了FigTalk,一个包含新的顺序和组件级对齐指标的基准。在FigTalk上,MINARD生成类人的、忠于论文的叙述,并在自动和人工评估中,在叙述条件下的图表空间对齐方面优于现有方法。

英文摘要

Scientific figures compress complex pipelines into a single canvas, yet understanding them requires paper-grounded, step-by-step narration aligned with visual highlights, a capability missing from current video generation systems and benchmarks. To address this, we introduce paper-grounded figure-to-video generation: generating narrated, region-grounded walkthrough videos from a figure and its paper. We propose MINARD (Multimodal Interpretation of Narrated Architecture via Region Decomposition), a pipeline that generates paper-grounded narrations and sequentially grounds them to figure regions. We also release FigTalk, a benchmark with new sequential and component-level grounding metrics. On FigTalk, MINARD generates humanlike, paper-faithful narrations and outperforms existing approaches on narration-conditioned spatial grounding of figures in both automatic and human evaluation.

2606.12483 2026-06-12 cs.LG 新提交

Scalable anomaly detection via a univariate Christoffel function

通过单变量Christoffel函数实现可扩展的异常检测

Florian Grivet, Didier Henrion, Jean-Bernard Lasserre, Louise Travé-Massuyès

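AI Summary Christoffel function methods are hard to apply to high-dimensional data because the matrix size grows exponentially with dimension; the paper proposes a univariate Christoffel function (UCF) based on the squared distance between the query point and the support points, which outperforms 14 baselines in Average Precision on the ADBench benchmark.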

详情
AI中文摘要

异常检测在欺诈检测、网络入侵和系统故障诊断等领域识别异常模式中发挥关键作用。近年来,基于Christoffel函数的方法(根植于多项式优化)因其坚实的数学基础和计算节俭性,成为深度学习的有前景替代方案。然而,其实用性受限于需要求逆一个大小随数据维度指数增长的矩阵,即使对于中等维度数据集也难以处理。本文解决了Christoffel函数异常检测的维度限制,同时保留了其关键理论性质,即开关支撑二分法行为和准确的支撑形状捕获。我们引入了UCF,一种基于查询点与支撑点间平方距离的单变量Christoffel函数。在ADBench基准上的大量实验表明,UCF在平均精度上持续优于14个最先进的基线方法。通过解决Christoffel函数的可扩展性瓶颈,本文扩展了异常检测方法的工具箱,提供了一种稳健、有理论依据且普遍适用的方法。

英文摘要

Anomaly detection plays a critical role in identifying unusual patterns across domains such as fraud detection, network intrusion, and system fault diagnosis. Recently, Christoffel function-based methods, rooted in polynomial optimization, have emerged as promising alternatives to deep learning due to their strong mathematical foundations and computational frugality. However, their practical applicability is hindered by the need to invert a matrix whose size grows exponentially with the data dimension, rendering the method intractable even for moderate-dimensional datasets. This paper addresses the dimensionality limitations of Christoffel function-based anomaly detection while preserving its key theoretical properties, i.e., the on-off support dichotomy behavior and the accurate capture of the support shape. We introduce UCF, a univariate Christoffel function based on the squared distance between the query point and the support points. Extensive experiments on the ADBench benchmark demonstrate that UCF consistently outperforms 14 state-of-the-art baselines in terms of Average Precision. By resolving the scalability bottleneck of the Christoffel function, this work expands the toolkit of anomaly detection methods with a robust, theoretically grounded, and universally applicable approach.
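The scalability issue and the scoring idea can be illustrated with the classical empirical Christoffel function, sketched below for one-dimensional data. This is background only and not the authors' UCF construction: the monomial basis, the degree, and the function names are assumptions. The first helper counts the monomials of degree at most d in n variables, which is the side length of the moment matrix that has to be inverted.

```python
from math import comb

import numpy as np

def n_monomials(n_dims: int, degree: int) -> int:
    """Side length of the moment matrix: monomials of degree <= `degree`."""
    return comb(n_dims + degree, degree)

def inverse_christoffel_1d(train: np.ndarray, query: np.ndarray, degree: int = 4) -> np.ndarray:
    """Score v(x)^T M^{-1} v(x) with v(x) = (1, x, ..., x^degree).

    M is the empirical moment matrix of the training sample. Large scores
    indicate points far from the support of the training data.
    """
    V = np.vander(train, degree + 1, increasing=True)
    M = V.T @ V / len(train)
    Vq = np.vander(query, degree + 1, increasing=True)
    return np.einsum("ij,jk,ik->i", Vq, np.linalg.inv(M), Vq)

print(n_monomials(2, 4), n_monomials(100, 4))   # matrix size in 2-D vs 100-D

rng = np.random.default_rng(0)
train = rng.uniform(-1.0, 1.0, size=500)        # toy inliers on [-1, 1]
scores = inverse_christoffel_1d(train, np.array([0.0, 3.0]))
print(scores)  # the score at 3.0 is orders of magnitude larger than at 0.0
```

A useful sanity check is that the average score over the training sample equals the number of basis functions (here degree + 1), so a threshold can be set relative to that value.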

2606.13634 2026-06-12 cs.CL math.CT 新提交

Operads for compositional reasoning in LLMs

用于LLM组合推理的Operad框架

Nathaniel Bottman, Kyle Richardson

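AI Summary Proposes operads as a mathematical framework for question decomposition, defines the questions operad Q, interprets QA models as algebras over Q, and introduces an operadic consistency measure; a companion paper reports that the measure correlates strongly with accuracy.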

详情
AI中文摘要

问题分解,即将复杂查询分解为更简单的子查询,并将子查询的答案组合成最终答案,是提高LLM推理能力的常用策略,但目前缺乏严格的数学基础。本文提出operad(一种模拟多输入单输出操作及其组合的数学结构)作为描述问题分解的自然框架。我们定义了问题operad $Q$,其中操作对应问题模板,组合对应子答案的替换,并展示了QA模型如何被解释为$Q$上的代数。除了重新诠释现有实践,这一operad视角还指向了新方法,特别是operadic一致性概念,它衡量QA模型的答案在问题分解树的部分折叠上是否一致。关于operadic一致性的实证评估见我们的姊妹论文(Bottman, Liu, and Richardson, 2026),该论文发现它在12个LLM和4个多跳QA数据集上与准确性强相关,且优于基于温度的标准自一致性基线。我们认为operad是问题分解的自然数学框架,而诸如operadic一致性等不变量为分析和改进多步推理的可靠性开辟了新方向。

英文摘要

Question decomposition, i.e. breaking a complex query into simpler sub-queries whose answers are composed to produce a final answer, is a widely used strategy for improving LLM reasoning, yet it currently lacks a rigorous mathematical foundation. In this paper, we propose operads, mathematical structures that model many-in, one-out operations and compositions thereof, as a natural framework for describing question decomposition. We define the questions operad $Q$, in which operations correspond to question templates and composition corresponds to substitution of sub-answers, and show how QA models can be interpreted as algebras over $Q$. Beyond reframing existing practice, this operadic perspective points toward new methods, in particular a notion of operadic consistency, which measures whether a QA model's answers agree across the partial collapses of a question decomposition tree. Empirical evaluation of operadic consistency is reported in our companion paper (Bottman, Liu, and Richardson, 2026), which finds it strongly correlated with accuracy across twelve LLMs and four multi-hop QA datasets and outperforming standard temperature-based self-consistency baselines. We argue that operads are the natural mathematical home for question decomposition, and that invariants such as operadic consistency open new directions for analyzing and improving the reliability of multi-step reasoning.
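The "many-in, one-out operations and compositions thereof" can be made concrete with a toy sketch of partial composition, in which one operation is plugged into a single input slot of another. This is only an illustration of the general operad idea: the `Operation` class, the `compose` helper, and the question templates are hypothetical, and the toy substitutes text into templates, whereas in the paper's setting the substituted value would be a sub-answer produced by a QA model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Operation:
    """A many-in, one-out operation: `arity` string inputs, one string output."""
    arity: int
    apply: Callable[..., str]

def compose(f: Operation, i: int, g: Operation) -> Operation:
    """Partial composition: feed the output of g into input slot i (0-based) of f."""
    if not 0 <= i < f.arity:
        raise IndexError("slot index out of range")

    def composed(*args: str) -> str:
        inner = g.apply(*args[i:i + g.arity])          # inputs consumed by g
        return f.apply(*args[:i], inner, *args[i + g.arity:])

    # The composite takes f's inputs, with slot i replaced by g's inputs.
    return Operation(f.arity + g.arity - 1, composed)

compare = Operation(2, lambda a, b: f"Is {a} larger than {b}?")
capital = Operation(1, lambda c: f"the capital of {c}")

question = compose(compare, 0, capital)
print(question.arity)                    # 2
print(question.apply("France", "Lyon"))  # Is the capital of France larger than Lyon?
```

The arity rule (arity of f plus arity of g minus one) is what lets decomposition trees of any shape be assembled from a small set of templates.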