arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.05208 2026-06-05 eess.SP cs.LG

Transformer-Enhanced Reinforcement Learning: Fundamentals and Applications in Communication Networks

Nguyen Cong Luong, Shaohan Feng, Nguyen Duc Hai, Zeping Sui, Bo Ma, Min Xu, Zhihao Dong, Qiushi Zhao, Nguyen Duc Duy Anh, Nguyen Quoc Khanh, Ngoc Hung Nguyen, Zitian Zhang, Jie Cao

Affiliations: Faculty of Computer Science, Phenikaa University; Faculty of Artificial Intelligence and Data Science, Phenikaa University; School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University; School of Computer Science and Electronics Engineering, University of Essex; School of Mathematics, Statistics and Mechanics, Beijing University of Technology; School of Information Science and Technology, Harbin Institute of Technology; Department of Electrical and Information Technology, Faculty

AI summary: This paper surveys Transformer-enhanced reinforcement learning algorithms and their applications in communication networks, focusing on how they address the limitations of traditional RL in modeling long-term dependencies and handling partial observability.

Abstract

Reinforcement Learning (RL) has long been a powerful solution to various problems in communication networks. However, traditional RL models still face several limitations. Not only do they rely on large numbers of interactions with the environment, but they are also limited in modeling long-term relationships and handling partial observability. In recent years, the Transformer model has demonstrated the ability to enhance RL models, allowing them to overcome these issues. In particular, the self-attention mechanism within the Transformer enables efficient modeling of long-range dependencies and global correlations, accelerates training, and handles heterogeneous data modalities. In this paper, we present a comprehensive survey of Transformer-based RL algorithms and their applications in communication networks. Specifically, the paper provides the mathematical background of RL and Transformer architectures, along with insights into key issues such as resource allocation, computation offloading, routing and trajectory control, and network security. We conclude the paper by discussing challenges, open issues, and notable future research directions, including Transformer-enhanced DRL algorithms for semantic communication and network optimization.
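
As a rough illustration of the self-attention mechanism that the abstract credits with modeling long-range dependencies, the sketch below applies single-head, causally masked scaled dot-product attention to a short observation history, as an RL agent under partial observability might. It is not taken from the survey: the function name `self_attention`, the random projection matrices, and the sizes `T` and `d` are assumptions chosen for illustration.

```python
import numpy as np

def self_attention(history, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T, d) history."""
    q, k, v = history @ w_q, history @ w_k, history @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: step t may only attend to steps <= t.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical sizes: 6 past observations, 4 features each.
rng = np.random.default_rng(0)
T, d = 6, 4
history = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
context, weights = self_attention(history, w_q, w_k, w_v)
```

Row `t` of `weights` shows how strongly step `t` draws on each earlier observation, which is the sense in which attention can relate distant time steps directly instead of through a recurrent state.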

2606.05206 2026-06-05 q-bio.NC cs.AI stat.AP

Ontology-constrained multi-LLM scoring of hypothesis support in the predictive processing literature

Hamed Nejat, Alexander Maier, Jesse Spencer-Smith, André M. Bastos

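Affiliations: University of Edinburgh; University of Cambridge

AI summary: This paper presents a local multi-LLM pipeline that scores studies in the predictive coding literature under ontology constraints, maps heterogeneous literature into a quantitative evidence space, and reveals structured disagreement among hypotheses.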

Comments 33 pages, 5 tables and 9 figures

Abstract

Fragmentation is common in interdisciplinary fields with diverse methods and theoretical commitments. Predictive coding neuroscience is a clear example: its literature spans computational theory, electrophysiology, imaging, behavior, and modeling, creating a synthesis problem that conventional meta-analysis cannot easily resolve. Here, we describe a local multi-LLM pipeline for ontology-constrained literature synthesis. The pipeline reads papers, extracts evidence, incorporates figure descriptions, assembles constrained prompts, and validates outputs against an expert glossary. We manually defined a predictive-coding glossary of thirty-six concepts grouped into three hypotheses: predictive suppression, feedforward error propagation, and ubiquity. A council of ten local language models scored 31 studies according to their agreement or disagreement with each glossary factor across local and global oddball contexts. This enabled pairwise study-agreement analysis, cross-model comparison, and three-dimensional hypothesis-space mapping. Agreement was high for some hypotheses but weaker for others, revealing structured disagreement, particularly across local versus global oddball paradigms. We further define hypothesis-space temperature, a geometric dispersion metric measuring how compactly studies occupy the hypothesis space. Temperature was lower for local oddball contexts and higher for global oddball contexts, indicating greater dispersion in the latter. The scoring geometry also allowed us to estimate vectors of change between experimental contexts. These results demonstrate that local multi-LLM councils can produce auditable disagreement measurements that map heterogeneous literatures into quantitative evidence spaces. This framework may generalize to cross-study hypothesis mapping where conventional meta-analysis lacks a common comparison space.

2606.05202 2026-06-05 physics.comp-ph cs.LG

Multi-Fidelity Learning with Shallow Recurrent Decoders for Reactor Physics

Stefano Riva, Carolina Introini, J. Nathan Kutz, Antonio Cammi

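Affiliations: Autodesk Research; Department of Energy, Nuclear Engineering Division; Politecnico di Milano; Department of Mechanical and Nuclear Engineering and Emirates Nuclear Technology Center

AI summary: To address the scarcity of high-fidelity data and the abundance of low-fidelity data in reactor physics, the paper proposes using Shallow Recurrent Decoders to map low-fidelity models (such as point kinetics) to higher-fidelity models (such as the diffusion equation), obtaining high-fidelity solutions at low cost.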

Abstract

In reactor physics, neutronics can be treated at different fidelity levels, according to the needs of the user. On the one hand, precise modeling of neutron behaviour is often expensive and time-consuming because of the high computational cost of numerically solving the Boltzmann transport equation. On the other hand, by adopting suitable approximations, such as SP$_N$, diffusion theory, and point kinetics, low-fidelity data can be generated efficiently. From the perspective of surrogate models, this computational limitation translates into a scarcity of high-fidelity data and an abundance of low-fidelity data. Given this difference in fidelity levels, it would be interesting to develop a suitable procedure to map low-fidelity models towards higher-fidelity models; for instance, one could obtain the solution to a multi-group diffusion equation starting from time-series data obtained from a point kinetics model. This work investigates this possibility by leveraging multi-fidelity information with Shallow Recurrent Decoders, a novel machine learning architecture able to map time-series observations to the full state of the reactor. This technique has been designed to use local or global measurements as input and map their temporal trajectories to the high-dimensional state; by the same logic, in principle, this architecture can also be used when the input is formed by the solution of a lumped model. This work applies this idea to a benchmark reactor geometry, mapping the point kinetics model to the diffusion solution under various input conditions, at a much lower computational cost.
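
To make the low-fidelity side concrete, the sketch below integrates a one-delayed-group point kinetics model with explicit Euler steps, producing the kind of time series that could serve as decoder input. It is only an illustration under stated assumptions: the abstract does not specify the number of delayed groups, the integrator, or any parameter values, so `beta`, `gen_time`, `lam`, and the reactivity steps are hypothetical.

```python
import numpy as np

def point_kinetics(rho, t_end=1.0, dt=1e-4, beta=0.0065, gen_time=1e-4, lam=0.08):
    """Explicit-Euler integration of one-delayed-group point kinetics.

    dn/dt = (rho - beta) / gen_time * n + lam * c
    dc/dt = beta / gen_time * n - lam * c
    All parameter values are illustrative, not taken from the paper.
    """
    steps = int(round(t_end / dt))
    n = np.empty(steps + 1)
    n[0] = 1.0
    c = beta / (gen_time * lam)  # precursor level in equilibrium with n = 1
    for i in range(steps):
        dn = ((rho - beta) / gen_time * n[i] + lam * c) * dt
        dc = (beta / gen_time * n[i] - lam * c) * dt
        n[i + 1] = n[i] + dn
        c += dc
    return n

steady = point_kinetics(rho=0.0)        # critical reactor: power stays flat
insertion = point_kinetics(rho=0.001)   # small positive reactivity step
```

The returned array is a scalar power history; in the multi-fidelity setting described above, trajectories like `insertion` would play the role of the lumped-model input that is mapped to a spatially resolved diffusion solution.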

2606.05200 2026-06-05 physics.comp-ph cs.LG

A differentiable machine learning small-angle X-ray scattering analysis framework for structure elucidation of lipid nanoparticles

Maria Bånkestad, Sandra Barman, Magnus Röding, Erik Kaunisto, Viktoriia Meklesh, Audrey Gallud, Marco Mendez, Marianna Yanez Arteta, Stefan Norberg, Ann Terry, Smita Chakraborty, Shun Yu, Jerk Rönnols, Sepideh Pashami

Affiliations: RISE Research Institutes of Sweden, Division Digital Systems, Computer Science; RISE Research Institutes of Sweden, Division Bioeconomy, Food Research and Innovation; Sustainable Innovation & Transformational Excellence, Pharmaceutical Technology & Development, Operations, AstraZeneca; Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg; Advanced Drug Delivery, Pharmaceutical Sciences, R&D, AstraZeneca; Global Product Development, Pharmaceutical Technology & Development, Operations, AstraZeneca; MAX IV Laboratory, Lund University

AI summary: Proposes a framework that combines a machine-learning surrogate model with a differentiable layer to accelerate SAXS data analysis of lipid nanoparticles, enabling multi-start fitting and identifiability analysis and revealing parameter degeneracy.

Comments 38 pages, 24 figures, 5 tables (incl. supplementary information)

Abstract

Lipid nanoparticles (LNPs) are efficient delivery systems for negatively charged nucleic acids. Their multi-component architecture yields a core-shell structure. Small-angle X-ray scattering (SAXS) is an important characterization technique for LNPs, but recovering internal structure and size distribution from SAXS is an inverse problem with non-unique solutions. Realistic models are often too expensive for systematic exploration. We introduce a machine-learning-accelerated, differentiable framework for SAXS analysis of heterogeneous, polydisperse LNPs. The forward model combines a core-shell particle with a Gaussian random-field interior, a neural surrogate for the monodisperse SAXS map, and a differentiable layer integrating over particle-size distributions. The surrogate reduces prediction cost by four orders of magnitude, while differentiability enables large-scale multi-start fitting and ensemble identifiability analysis. Applied to synthetic and experimental MC3 LNP data, the framework shows that near-identical SAXS fits can arise from distinct parameter modes, with the experimental fits dominated by a trade-off between size-distribution and interior-structure parameters.
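
The step of integrating a monodisperse scattering curve over a particle-size distribution can be sketched as a weighted sum on a radius grid. The example below is a simplification under explicit assumptions: it uses the textbook form factor of a homogeneous sphere in place of the paper's core-shell particle with a Gaussian random-field interior and its neural surrogate, and it assumes a log-normal radius distribution; the names `median_radius`, `sigma`, and the q-grid are hypothetical.

```python
import numpy as np

def sphere_intensity(q, radius):
    """Monodisperse intensity of a homogeneous sphere (unit contrast)."""
    x = np.outer(q, radius)  # shape (n_q, n_r)
    amplitude = 3.0 * (np.sin(x) - x * np.cos(x)) / x**3
    volume = 4.0 / 3.0 * np.pi * radius**3
    return (volume * amplitude) ** 2

def polydisperse_intensity(q, median_radius, sigma, n_grid=200):
    """Average the monodisperse curve over a log-normal radius distribution."""
    center = np.log(median_radius)
    log_r = np.linspace(center - 4.0 * sigma, center + 4.0 * sigma, n_grid)
    radius = np.exp(log_r)
    # Gaussian in log-radius on a uniform log grid = log-normal in radius.
    weights = np.exp(-0.5 * ((log_r - center) / sigma) ** 2)
    weights = weights / weights.sum()
    return sphere_intensity(q, radius) @ weights

q = np.linspace(0.005, 0.3, 100)  # hypothetical q-grid, in 1/Angstrom
narrow = polydisperse_intensity(q, median_radius=300.0, sigma=0.02)
broad = polydisperse_intensity(q, median_radius=300.0, sigma=0.2)
```

Because the result is a smooth weighted sum of the monodisperse curves, the same computation written in an automatic-differentiation framework would yield gradients with respect to the distribution parameters, which is the property that makes large-scale multi-start fitting practical.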

2606.05199 2026-06-05 physics.comp-ph cs.AI

Finite Element-Based Material Learning via Automatic Differentiation: Learning constitutive neural network models from full-field deformation data

Matthias Knipper, Chenyi Ji, Malte Brand, Kevin Linka

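Affiliations: Computational Mechanics in Medicine, Applied Medical Engineering, RWTH Aachen University; Institute for Continuum and Material Mechanics, Hamburg University of Technology

AI summary: Proposes the FE-MAD framework, which uses automatic differentiation to integrate constitutive neural networks into a JAX-FEM nonlinear solver and identifies material parameters from full-field deformation data by gradient-based optimization; it applies to grey-box and white-box constitutive models and is validated on three experimental datasets.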

Abstract

The identification of constitutive neural network models from heterogeneous full-field deformation data provides a robust alternative to traditional calibration methods based on homogeneous stress-strain experiments, particularly given the high dimensionality of trainable parameters. Existing approaches must balance generality, robustness, and computational efficiency: conventional finite element model updating is broadly applicable but computationally demanding; weak-form methods offer efficiency but are sensitive to noise and data scarcity; neural operator models are highly expressive but require extensive training datasets. This work presents FE-MAD (Finite Element-Based Material learning via Automatic Differentiation), an end-to-end differentiable framework that integrates a constitutive neural network model within a JAX-FEM nonlinear solver and identifies its parameters through gradient-based minimization of a measurement-mismatch loss. Newton tangent stiffness and loss gradients are computed automatically using forward- and reverse-mode automatic differentiation throughout the entire pipeline, thereby removing the need for analytic adjoints or offline surrogate models. FE-MAD is demonstrated for two architectures: a grey-box Constitutive Artificial Neural Network (CANN), a polyconvex, fully connected model with high flexibility, and a white-box CANN, an expert-system network with phenomenologically interpretable strain-energy terms. Focusing on incompressible isotropic hyperelasticity, FE-MAD is evaluated on three open experimental datasets: (1) full digital image correlation (DIC) of a perforated tensile specimen, (2) a reduced-data scenario with a one-dimensional stretch profile and global force-displacement curve, and (3) a heterogeneous matrix-inclusion system in which the constitutive laws of both phases are identified and generalized to twenty-two previously unseen samples.
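
The core idea of identifying material parameters by gradient-based minimization of a measurement-mismatch loss can be shown on a deliberately tiny problem. The sketch below is not FE-MAD: it has no finite element solver, no neural network, and no automatic differentiation. It assumes a one-parameter incompressible neo-Hookean model under uniaxial tension, synthetic noise-free data, and a hand-derived gradient; all names and values are hypothetical.

```python
import numpy as np

def neo_hookean_stress(stretch, mu):
    """Nominal uniaxial stress of an incompressible neo-Hookean solid."""
    return mu * (stretch - stretch ** -2)

def fit_shear_modulus(stretch, measured, mu0=1.0, lr=0.05, steps=500):
    """Gradient descent on the mean squared stress mismatch."""
    basis = stretch - stretch ** -2
    mu = mu0
    for _ in range(steps):
        residual = mu * basis - measured
        grad = 2.0 * np.mean(residual * basis)  # d(loss)/d(mu), derived by hand
        mu -= lr * grad
    return mu

stretch = np.linspace(1.1, 2.0, 10)
measured = neo_hookean_stress(stretch, mu=0.4)  # synthetic "measurements"
mu_hat = fit_shear_modulus(stretch, measured)
```

In a differentiable pipeline of the kind the abstract describes, the hand-written `grad` line is what automatic differentiation would supply, and the forward model would be a nonlinear finite element solve rather than a closed-form stress expression.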

2606.05198 2026-06-05 q-bio.BM cs.LG

An accurate nucleic acid-small molecule docking framework via geometric deep learning with large-scale pretraining

Shi Li, Xujun Zhang, Mingquan Liu, Hui Zhang, Shuoying Jia, Yu Kang, Tingjun Hou, Peichen Pan

Affiliations: College of Pharmaceutical Sciences, Zhejiang University; Faculty of Health Sciences, University of Macau; Zhejiang Provincial Key Laboratory for Intelligent Drug Discovery and Development, Jinhua Institute of Zhejiang University; Shanghai Innovation Institute

AI summary: Proposes the NucleoDock framework, which combines physics-guided large-scale pretraining and fine-tuning with sequence, structure, and atom-level features, and uses a mixture-density-network geometric scoring head for nucleic acid-small molecule docking; it reaches a 56% top-1 success rate (RMSD < 2.0 Å) on a benchmark of 125 complexes, outperforming rDock (29%).

Comments 34 pages, 4 figures, 4 tables; Supplementary Materials include 8 tables

Abstract

Nucleic acids are increasingly recognized as therapeutic targets beyond conventional protein-centered drug discovery, yet accurate and efficient docking of small molecules to nucleic acid structures remains challenging. Physics-based docking methods often show limited accuracy and efficiency, whereas deep learning approaches are constrained by the scarcity of experimentally resolved nucleic acid-ligand complexes. Here, we present NucleoDock, a deep learning framework for nucleic acid-small molecule docking. To address data scarcity, NucleoDock combines physics-guided large-scale pretraining on millions of docking-generated synthetic complexes with fine-tuning on curated experimental co-crystal structures. It further integrates sequence- and structure-informed nucleotide representations with atomistic three-dimensional features to capture both biological context and binding-site geometry. A mixture density network-based geometric scoring head is used to model conditional interaction-distance distributions for pose ranking. On an external benchmark of 125 nucleic acid-ligand complexes, NucleoDock achieved a top-1 success rate of 56 percent at an RMSD cutoff of 2.0 Angstrom, outperforming rDock with 29 percent, while generating 100 poses in approximately 5 seconds per complex. Retrospective virtual screening on the ROBIN benchmark further showed improved early enrichment. NucleoDock represents a step toward bridging the methodological gap between protein- and nucleic acid-directed computational drug discovery.
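
The headline metric, top-1 success rate at an RMSD cutoff, is simple to state in code. The sketch below assumes that poses and references share the same atom ordering and reference frame and applies no symmetry correction, which docking benchmarks often add; the coordinates are toy values, not benchmark data.

```python
import numpy as np

def rmsd(pose, reference):
    """RMSD between two (n_atoms, 3) coordinate arrays, without realignment."""
    return float(np.sqrt(np.mean(np.sum((pose - reference) ** 2, axis=1))))

def top1_success_rate(top_poses, references, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose lies within the cutoff."""
    hits = [rmsd(p, r) < cutoff for p, r in zip(top_poses, references)]
    return sum(hits) / len(hits)

# Toy data: one near-native pose and one clearly wrong pose.
reference = np.zeros((5, 3))
near = reference + 0.5   # every atom shifted by 0.5 along each axis
far = reference + 3.0    # every atom shifted by 3.0 along each axis
rate = top1_success_rate([near, far], [reference, reference])
```

Here one of the two toy poses falls under the 2.0 cutoff, so `rate` is 0.5; the reported 56 percent is the same ratio computed over the 125 benchmark complexes.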

2606.05188 2026-06-05 cs.CY cs.AI

Assessing the Geographic Diversity of AI's Platial Representations in Image Generation

Zilong Liu, Krzysztof Janowicz, Mina Karimi

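Affiliations: Department of Geography and Regional Research, University of Vienna, Austria

AI summary: Using GPT and DALL-E models as examples, the paper adapts species diversity measures from ecology and applies similarity weighting to assess the geographic diversity of image generation. It finds that older models can be more diverse and that prompt revision yields more diversity than image generation, and it observes model homogeneity underlying the lack of geographic diversity.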

Comments Full conference paper accepted at AGILE 2026 (https://agile-gi.eu/conference-2026)

Abstract

(Gen)AI diversity is not merely an ethical issue. From the perspective of geographic information science (GIScience), it could be interpreted as a function of uncertainty and as a form of cognitive bias embedded in AI outputs. Recent work has sought to develop information-theoretic diversity measures and apply them to evaluate AI-chatbot outputs in a geographic context. As the AI ecosystem to which we are exposed on a daily basis rapidly becomes multimodal, we believe it is important to examine geographic diversity across various modalities. Focusing on images, this paper aims to fill this research gap. First, we select the GPT and DALL-E models as state-of-the-art examples and point out how assessing their geographic diversity involves various stages, including prompt revision and image generation. Then, taking inspiration from species diversity measures in ecological research, we incorporate similarity weighting into the measurement of geographic diversity. Next, we demonstrate how to evaluate geographic diversity in image generation through a case study. Our analysis reveals several counterintuitive findings. For instance, older models can exhibit greater geographic diversity despite producing lower-quality images, and prompt revision yields greater geographic diversity than image generation. At the same time, we observe explicit model homogeneity underlying the lack of geographic diversity, as the selected models consistently depict the same prototypical geo-specific feature or similar features. This is concerning, as it risks producing stereotypical representations of places.
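
One well-known similarity-weighted species diversity measure from ecology is the Leinster-Cobbold index, sketched below. Whether the paper uses this exact measure is an assumption, since the abstract does not name one; the category shares and similarity matrices are hypothetical values chosen only to show the effect of similarity weighting.

```python
import numpy as np

def similarity_weighted_diversity(p, z, q=1.0):
    """Leinster-Cobbold diversity of order q.

    p: relative abundances of the categories (sums to 1).
    z: pairwise similarity matrix with ones on the diagonal.
    """
    p = np.asarray(p, dtype=float)
    z = np.asarray(z, dtype=float)
    keep = p > 0
    p, z = p[keep], z[np.ix_(keep, keep)]
    ordinariness = z @ p  # how typical each category is, given its neighbours
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(ordinariness))))
    return float(np.sum(p * ordinariness ** (q - 1.0)) ** (1.0 / (1.0 - q)))

# Hypothetical shares of three landmark types depicted for one place.
shares = [0.5, 0.25, 0.25]
unrelated = np.eye(3)                      # categories share nothing
alike = np.full((3, 3), 0.9)               # categories are nearly the same
np.fill_diagonal(alike, 1.0)
distinct = similarity_weighted_diversity(shares, unrelated)
similar = similarity_weighted_diversity(shares, alike)
```

With an identity similarity matrix the index reduces to the exponential of Shannon entropy; as categories become more alike the effective number of distinct categories falls toward one, which is how visually different but geographically similar outputs can be counted as low diversity.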

2606.05187 2026-06-05 cs.CY cs.AI

Geographic Bias and Diversity in AI Evaluation

Zilong Liu, Krzysztof Janowicz, Gengchen Mai, Song Gao, Rui Zhu

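Affiliations: University of Vienna; University of Texas at Austin; University of Wisconsin-Madison; University of Bristol

AI summary: Through a literature review, the chapter identifies multiple geographic biases in AI, from training data to generated outputs, and shows how recent studies address them by evaluating the geographic diversity of generative AI across cognitive levels, parameter settings, and output modalities.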

Comments Book chapter accepted by "Geography According to ChatGPT"

Abstract

Among the many challenges hindering the responsible development and deployment of AI, arguably none has faced more intense scrutiny than bias in its various forms. This underscores the widespread concern among AI researchers that model outputs, e.g., from generative AI, may encode structural distributional imbalances (stemming from training data or model design) that may amplify social inequality or introduce systemic distortions across application domains ranging from biodiversity to disaster mitigation. Yet, relatively little work has investigated the geographical nature of bias or developed measurable benchmarks for what it means for (generative) AI to be unbiased. In this chapter, we investigate this issue through a literature review. As foundation models are reshaping the landscape of bias research, we examine work spanning both the pre-generative AI and generative AI periods. First, we identify a range of geographic biases. These biases range from representation bias in the training data and regional disparities in the factual recall of language models to the tendency of generative AI to disproportionately favor prototypical places (called defaults). Then, we showcase how recent studies address the latter bias by evaluating geographic diversity in the outputs of generative AI across various cognitive levels, parameter settings, and output modalities.

2606.05185 2026-06-05 cs.CY cs.CV cs.LG

Drishti AI-Event Guardian: An Intelligent Real-Time Crowd Monitoring and Emergency Response System for Mass Gathering Events

Ritabrata Roy Choudhury, Arkajyoti Karmakar, Rudra Pratap Mitra

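Affiliations: School of Computer Engineering, Kalinga Institute of Industrial Technology; School of Electronics Engineering, Kalinga Institute of Industrial Technology

AI summary: Proposes the Drishti AI-Event Guardian framework, which combines YOLOv8, anomaly detection, and gradient-boosted regression to provide real-time crowd density estimation, anomaly detection, predictive modeling, facial recognition, medical emergency reporting, a chatbot, and intelligent guard reallocation; evaluation on the Kumbh Mela and RCB Victory Parade scenarios reports low latency and high accuracy.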

Comments 22 pages

Abstract

Mass gathering events are associated with critical safety incidents caused by insufficient crowd monitoring and inadequate emergency response coordination. Traditional surveillance systems lack intelligent analytics, resulting in delayed threat identification, poor resource deployment, and weak support for vulnerable individuals during dense public assemblies. This paper presents Drishti AI-Event Guardian, an intelligent crowd management framework using deep learning for public safety enhancement. The architecture combines multimodal data from CCTV networks and UAV platforms, processed by models on Google Vertex AI infrastructure. Core methods include real-time crowd density estimation using YOLOv8, spatiotemporal anomaly detection, and predictive crowd-flow modeling through gradient-boosted regression. Drishti also integrates four modules: (i) facial recognition for missing person identification with crowd-wide notification; (ii) medical emergency reporting with automated dispatch; (iii) a conversational AI chatbot for reports and complaints; and (iv) an intelligent guard reallocation engine that dynamically reassigns personnel in response to crowd density changes. The system is evaluated on two scenarios, the Kumbh Mela gathering and the RCB Victory Parade event, achieving a crowd density estimation MAE of 3.2 persons/m$^2$, an anomaly detection F1-score of 0.91, a facial recognition precision of 0.93, and a median alert latency of 111 ms. Predictive congestion modeling provides five-minute forecasts with a MAPE of 8.3%, enabling preemptive intervention. The chatbot resolved 89% of incident filings without human operators, while guard reallocation reduced responder deployment latency by 34% versus manual reassignment. Results demonstrate a shift from passive surveillance toward active crowd intelligence and a scalable foundation for events from local gatherings to mega festivals.
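
For readers less familiar with the reported metrics, the sketch below gives the standard definitions of MAE, MAPE, and F1-score. The input numbers are made-up toy values used only to exercise the formulas; they are not the paper's data, and the function names are illustrative.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (e.g. persons per m^2)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined when a true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion counts."""
    return 2 * tp / (2 * tp + fp + fn)

# Toy values for illustration only.
density_true = [2.0, 4.0, 5.0]
density_pred = [2.5, 3.0, 5.0]
density_mae = mae(density_true, density_pred)
density_mape = mape(density_true, density_pred)
anomaly_f1 = f1_score(tp=8, fp=1, fn=1)
```

MAE keeps the unit of the quantity being estimated, while MAPE is relative, so the two figures in the abstract (3.2 persons/m$^2$ and 8.3%) describe different targets and are not directly comparable.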

2606.05178 2026-06-05 cs.HC cs.AI

The Virtual Roundtable: Multi-Agent Personas Simulating the Dynamics of Human Brainstorming

Tim Dorn, Saara A. Khan, Julie Mumford

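Affiliations: University of California, Berkeley

AI summary: Proposes a multi-agent architecture that simulates roundtable brainstorming in two phases, divergent and convergent, using diverse AI personas and an agentic facilitator to generate, evaluate, and rank ideas; a case study suggests it produces diverse, relevant ideas and deepens discussion quality.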

Comments 10 pages, 10 figures, 2 tables

Abstract

As AI-driven product development accelerates, the bottleneck is shifting from how we build to what we build. Traditional human brainstorming faces challenges including groupthink, echo chambers, and limited diversity. To address this, we present a multi-agentic architecture that simulates roundtable brainstorming through two phases: divergent thinking to generate diverse ideas, and convergent thinking to evaluate and rank the most promising ones. The system employs diverse AI personas that engage in roundtable discussions, guided by an agentic facilitator that steers the discussion toward productive outcomes. Personas maintain private thoughts while commenting publicly, with ideas emerging organically throughout the discussion. Per-persona quotas on idea submissions and votes promote balanced participation while producing natural rankings. Throughout the session, the system tracks each idea's lineage, capturing how concepts originate and cross-pollinate over time. We demonstrate this approach through a case study generating consumer ideas for AI smart glasses, showing that (i) it produces diverse, relevant ideas with insights into their evolution, and (ii) the cumulative exchange of perspectives across personas cultivates a shared context that progressively deepens the quality of discussion and the ideas produced.
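
The way per-persona vote quotas can yield a ranking is easy to sketch. The example below is a guess at the simplest possible mechanism, not the authors' implementation: each persona's votes beyond a fixed quota are ignored and ideas are ranked by the remaining vote count. The persona names, idea identifiers, and the `vote_quota` parameter are all hypothetical.

```python
from collections import Counter

def rank_ideas(ballots, vote_quota):
    """Rank ideas by votes, counting at most `vote_quota` votes per persona.

    ballots: dict mapping persona name -> list of idea ids in the order cast.
    Ties are broken alphabetically by idea id so the result is deterministic.
    """
    tally = Counter()
    for votes in ballots.values():
        for idea in votes[:vote_quota]:
            tally[idea] += 1
    return sorted(tally.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical session: three personas voting on smart-glasses ideas.
ballots = {
    "engineer": ["hud-nav", "live-captions", "hud-nav"],
    "designer": ["live-captions", "memory-aid"],
    "skeptic": ["live-captions", "hud-nav", "memory-aid"],
}
ranking = rank_ideas(ballots, vote_quota=2)
```

With a quota of two, the engineer's third vote and the skeptic's third vote are dropped, so no single persona can dominate the tally; that is the balancing effect the abstract attributes to quotas.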

2606.05172 2026-06-05 cs.HC cs.CV

Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

Yixuan Ding, Wei Huang, Ruijie Quan, Xiaojuan Qi, Yi Yang

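Affiliations: Zhejiang University; The University of Hong Kong

AI summary: Proposes the RE-Edit benchmark, which evaluates image editing systems along five reasoning dimensions (physical, environmental, cultural, causal, and referential), finds that existing models fall short in reasoning about implicit logical constraints, and introduces a lightweight reasoning-guided post-edit baseline.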

Comments 23 pages, 10 figures, 7 tables

Abstract

Diffusion-based image editing has achieved strong visual fidelity under natural language instructions, yet most existing systems still operate at the level of surface instruction following, without reasoning about the implicit contextual constraints embedded in real user requests. This often leads to visually plausible but logically inconsistent edits. In this work, we introduce RE-Edit, a benchmark for REasoning-aware image Editing that evaluates image editing systems across five complementary reasoning dimensions: physical, environmental, cultural, causal, and referential. RE-Edit comprises 1,000 carefully curated samples, each designed such that visual plausibility alone is insufficient and correct editing requires satisfying implicit logical constraints. To support fine-grained analysis, we establish dimension-aligned evaluation criteria and conduct a comprehensive study of ten open-source and two commercial image editing models. Our results show that even advanced systems frequently struggle with implicit multi-dimensional reasoning despite producing high-quality visuals. We further present a lightweight reasoning-guided post-edit baseline as an initial exploration, illustrating how inserting explicit reasoning can help mitigate such failures in a model-agnostic manner.

2606.05167 2026-06-05 cs.MA cs.AI

RAINO: Anchoring Agents in Reality, A Systematic Review and Conceptual Framework for Realism in Agent-Based Modelling

Loïs Vanhée, Melania Borit

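Affiliations: Umeå Universitet; UiT The Arctic University of Norway

AI summary: Through a systematic literature review, the paper identifies shortcomings in how realism is operationalized and demonstrated in agent-based modelling and proposes the RAINO framework (Reality Anchor, Input, Output) to unify and broaden the understanding of realism.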

Comments The paper has been accepted at the Social Simulation Conference 2025

Abstract

Realism is a central yet seemingly under-theorized concept in Agent-Based Modelling. This paper presents a Systematic Literature Review, aiming to identify how realism is currently operationalized and demonstrated. The results show that realism is often poorly defined and lacks a consistent conceptual framework. A wide variety of methods are used to achieve and demonstrate realism, but explanations of whether and why these methods are appropriate for their intended purposes are generally limited. Building on this review, we introduce the Reality Anchor, Input, Output (RAINO) framework. RAINO identifies the key structures used to argue for realism in Agent-Based Models, consisting of Reality Anchors (e.g., empirical data, formal theory, expert knowledge, common-sense expectations) and their application as model Input or Output. RAINO broadens existing perspectives on how realism is framed. It explains why different assessors may evaluate the realism of a model in different ways, and it shows how this broader framing can lead to significantly different approaches to model development.

2606.03998 2026-06-05 eess.SP cs.CV

TGSD: Topology-Guided State-Space Diffusion Framework for EEG Spatial Super-Resolution

Zijian Kang, Weiming Zeng, Yueyang Li, Shengyu Gong, Hongjie Yan, Wai Ting Siok, Nizhuan Wang

Affiliations: Lab of Digital Image and Intelligent Computation, Shanghai Maritime University; Department of Language Science and Technology, The Hong Kong Polytechnic University; Affiliated Lianyungang Hospital of Xuzhou Medical University

AI summary: Proposes the TGSD framework, a topology-guided state-space diffusion model that uses a Hierarchical Spatial Prior Encoder and a Conditional State-Space Diffusion Reconstructor to recover high-density signals from low-density EEG, outperforming baselines on the SEED and PhysioNet MM/I datasets.

English abstract

Low-density EEG is more suitable for wearable and IoT-based brain sensing, but sparse electrode sampling often lacks sufficient spatial information to characterize cross-regional neural activity. EEG spatial super-resolution aims to recover dense-channel EEG from sparse recordings, yet remains challenging because channel missingness typically occurs at the whole-channel level, spatiotemporal dependencies over the full electrode layout are often underexplored, and the mapping from sparse to dense signals is inherently ambiguous. To address these issues, we propose TGSD, a topology-guided state-space diffusion framework for EEG spatial super-resolution. TGSD first employs a Hierarchical Spatial Prior Encoder to learn topology-aware priors over the complete electrode layout by integrating local geometric relationships with region-level contextual information. Based on these priors and sparse observations, a Conditional State-Space Diffusion Reconstructor progressively generates missing-channel signals through reverse diffusion, while alternating temporal and channel-wise state-space modeling captures long-range temporal dynamics and inter-channel dependencies in a unified framework. Experiments on the SEED and PhysioNet MM/I datasets show that TGSD consistently outperforms representative baselines under different super-resolution factors in both reconstruction fidelity and downstream classification performance. These results demonstrate the effectiveness of combining topology-aware spatial priors with conditional diffusion for enhancing practical low-density EEG sensing in wearable and IoT scenarios. The official implementation code is available at https://github.com/jtggz/TGSD.

2606.03067 2026-06-05 stat.ML cs.LG

Trajectory-Aware Node Contributions and the Limits of Static Controllability

轨迹感知的节点贡献与静态可控性的极限

Valentina Kuskova, Dmitry Zaytsev, Michael Coppedge

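Affiliations * University of California, Berkeley

AI summary: Proposes "emergent contribution" (EC), a finite-horizon measure of a node's dynamical leverage computed from the Jacobians of a differentiable model; EC reduces to average controllability in the linear, time-invariant limit, and a phase diagram characterizes the conditions under which the two measures agree and diverge.

Comments 11 pages, 1 figure

Details
AI Chinese abstract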

复杂网络中的一个常见数据挖掘任务是确定单个节点如何影响系统行为。现有方法依赖于静态图中心性或控制理论量(如可控性格拉姆矩阵),这些方法假设线性时不变动力学。然而,实际估计的系统通常是非线性和时变的。我们定义了“涌现贡献(EC)”,这是一种节点动态杠杆的有限时域度量:其脉冲响应的度量加权能量沿系统轨迹累积。EC 通过任何可微模型的雅可比矩阵计算,与估计器无关,并在线性时不变极限下精确地退化为平均可控性。我们的贡献是刻画了这两种度量一致与分歧的条件。使用一个具有已知真实贡献的受控合成族,我们构建了一个跨越非线性、机制结构、持续性和扰动幅度的相图。EC 和平均可控性在静态或平滑漂移动力学下一致,并且两者都跟踪真实值。分歧在持续机制切换下出现,在持续符号反转下最强,并在移除符号反转时消失。在极端扰动幅度下,两种度量都会退化,这揭示了局部线性化的极限。我们将来自多个领域的五个估计真实系统置于该相空间中。它们的位置可作为 EC 何时提供超出静态可控性信息的诊断,从而证明其额外计算成本的合理性。在一个深入检查的面板上,一个二十种子重训练集成揭示了稳健的方差-杠杆分离:节点的扰动广泛传播,尽管其系统内方差较低,这既未被静态中心性恢复,也未被基于方差的摘要恢复。

English abstract

A recurring data mining task in complex networks is to determine how individual nodes contribute to system behavior. Existing approaches rely on either static-graph centralities or control-theoretic quantities such as controllability Gramians, which assume linear, time-invariant dynamics. Estimated systems, however, are typically nonlinear and time-varying. We define "emergent contribution (EC)," a finite-horizon measure of a node's dynamical leverage: the metric-weighted energy of its impulse response accumulated along the system trajectory. Computed from the Jacobians of any differentiable model, EC is estimator-agnostic and reduces exactly to average controllability in the linear, time-invariant limit. Our contribution is a characterization of when the two measures agree and when they diverge. Using a controlled synthetic family with known ground-truth contribution, we construct a phase diagram spanning nonlinearity, regime structure, persistence, and perturbation amplitude. EC and average controllability agree under static or smoothly drifting dynamics and both track ground truth. Divergence emerges under persistent regime switching, is strongest under persistent sign reversal, and disappears when the sign reversal is removed. At extreme perturbation amplitudes, both measures degrade, identifying the limits of local linearization. We place five estimated real systems from several domains within this phase space. Their placement serves as a diagnostic of when EC provides information beyond static controllability and therefore justifies its additional computational cost. On one panel examined in depth, a twenty-seed retraining ensemble reveals a robust variance-leverage dissociation: nodes whose perturbations propagate widely despite low within-system variance, a pattern recovered by neither static centralities nor variance-based summaries.
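
The claim that EC reduces to average controllability in the linear, time-invariant limit can be illustrated with a minimal sketch. It assumes an identity metric, takes average controllability to be the trace of the finite-horizon controllability Gramian for an input at one node, and uses hypothetical function names and a made-up 3-node matrix; it is not the authors' implementation.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def emergent_contribution(jacobians, node):
    """Energy of a unit impulse at `node`, accumulated along a Jacobian sequence.

    Assumption: identity metric, so the energy is a plain squared norm.
    """
    n = len(jacobians[0])
    response = [1.0 if i == node else 0.0 for i in range(n)]
    energy = sum(r * r for r in response)
    for J in jacobians:
        response = matvec(J, response)      # propagate along the trajectory
        energy += sum(r * r for r in response)
    return energy

def average_controllability(A, node, horizon):
    """Trace of sum_{t=0..horizon} A^t b b^T (A^t)^T with b = e_node."""
    n = len(A)
    col = [1.0 if i == node else 0.0 for i in range(n)]
    trace = 0.0
    for _ in range(horizon + 1):
        trace += sum(c * c for c in col)    # trace(col col^T) = ||col||^2
        col = matvec(A, col)
    return trace

# Made-up stable 3-node system; with a constant Jacobian the two measures coincide.
A = [[0.5, 0.2, 0.0],
     [0.0, 0.4, 0.3],
     [0.1, 0.0, 0.6]]
horizon = 5
ec = [emergent_contribution([A] * horizon, i) for i in range(3)]
ac = [average_controllability(A, i, horizon) for i in range(3)]
```

Passing a list of different Jacobians (one per time step) to `emergent_contribution` gives the trajectory-aware case, for which no single Gramian exists.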

2606.03091 2026-06-05 cs.IR cs.AI

BAHSD: Bridging the Long-tail Gap via Adaptive Distillation in Black-box Sequential Recommendation

BAHSD:通过自适应蒸馏弥合黑盒序列推荐中的长尾差距

Xi Zhou, Famin Wu, Mingming Li, Hongyue Zhang, Jiao Dai, Jizhong Han, Tao Guo

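Affiliations * Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; Beijing Institute for General Artificial Intelligence, Beijing, China

AI summary: To address the signal heterogeneity that long-tail distributions induce in black-box sequential recommendation, proposes the BAHSD framework, which quantifies signal reliability with a multi-scale consistency probing mechanism and uses an adaptive hierarchical objective (dynamic-temperature KL divergence, ranking consistency, and InfoNCE contrastive learning) to mitigate preference solidification and improve noise robustness, with an improvement of more than 80% on tail users.

Details
AI Chinese abstract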

序列推荐系统被广泛采用,但通常作为黑盒API部署,这推动了近期对模型提取的兴趣,以在本地复制其能力。然而,长尾分布导致了严重的信号异质性:密集的头部序列触发教师偏好的固化,使提取偏向局部模式,而稀疏的尾部序列产生平坦且嘈杂的预测。现有的一刀切式提取忽略了这种差异,导致噪声过拟合和次优的知识迁移。我们提出BAHSD,一种黑盒自适应蒸馏框架,通过多尺度一致性探测机制隐式量化信号可靠性来处理信号异质性。基于此,设计了自适应分层目标:动态温度KL散度缓解高置信度信号的偏好固化,而排序一致性和InfoNCE对比学习为低置信度信号提供噪声鲁棒的增强。BAHSD持续优于基线,在教师模型上获得高达4.98%的提升,在尾用户上提升80%以上,为高保真黑盒推荐提取提供了一种即插即用的解决方案。

English abstract

Sequential recommendation systems are widely adopted but often deployed as black-box APIs, which has driven recent interest in model extraction to replicate their capabilities locally. However, the long-tail distribution induces severe signal heterogeneity: dense head sequences trigger the solidification of teacher preference, biasing extraction toward local patterns, while sparse tail sequences yield flat, noisy predictions. Existing one-size-fits-all extraction overlooks this disparity, resulting in noise overfitting and suboptimal knowledge transfer. We propose BAHSD, a black-box adaptive distillation framework that handles signal heterogeneity via a multi-scale consistency probing mechanism to implicitly quantify signal reliability. Based on this, an adaptive hierarchical objective is designed: dynamic-temperature KL divergence mitigates preference solidification for high-confidence signals, while ranking consistency and InfoNCE contrastive learning provide noise-robust enhancement for low-confidence signals. BAHSD consistently outperforms baselines, achieving up to 4.98% gain over the teacher and 80%+ improvement on tail users, offering a plug-and-play solution for high-fidelity black-box recommendation extraction.

2606.00804 2026-06-05 cs.MA cs.AI cs.CL

Dynamic Coordination Strategy Selection for Enterprise Multi-Agent Systems

企业多智能体系统的动态协调策略选择

Thanh Luong Tuan

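Affiliations * Golden Gate University; Foundation AgenticOS (FAOS)

AI summary: Through a large-scale experiment, evaluates whether enterprise multi-agent systems should select coordination strategies dynamically by problem class; dynamic routing works as a calibrated default, but no single best strategy can be identified.

Comments 13 pages, 4 appendix. Code and data: https://github.com/frank-luongt/faos-research/tree/main/RA-1

Details
AI Chinese abstract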

企业多智能体系统日益暴露多种协调模式,但部署时往往缺乏证据表明何时使用共识、辩论、综合或更简单的单智能体工作流。本文评估协调策略是否应根据问题类别动态选择,而非全局固定。我们运行了一个固定的矩阵,包含30个企业任务,涵盖六个行业、五个问题类别、四种执行条件、每个单元格三个重复,以及四个模型分支:qwen_local、sonnet、gemma_openrouter和一个辅助的openai云验证分支。所有1,440个生成输出均由固定的Sonnet评分标准评判。主要发现是有界且操作上有用的,但并非最初的严格H1。预先注册的精确胜者/CI标准未得到支持:精确胜者身份在不同模型分支间不稳定,且若干预测策略接近但未超过最佳观察到的替代方案。一个较弱的近最优路由主张得到强烈支持。在每个预先注册的模型分支和问题类别中,以及在辅助的OpenAI验证分支中,预测策略的质量分数与最佳观察条件相差在0.10以内。结构化合规验证是对原始映射最明显的例外:所有分支都偏好单智能体而非共识。预先注册的Kendall's W检验发现,越南语领域和英语领域任务在四种协调条件排序的一致性上没有可靠差异(两个分层的平均W均为0.20;符号秩检验p = .85),因此H2未得到支持。我们得出结论,企业协调策略应使用动态路由作为校准默认值,而非确定性胜者选择法则。

English abstract

Enterprise multi-agent systems increasingly expose multiple coordination patterns, but deployments often lack evidence for when to use consensus, debate, synthesis, or a simpler single-agent workflow. This paper evaluates whether coordination strategy should be selected dynamically by problem class rather than fixed globally. We run a frozen matrix of 30 enterprise tasks spanning six industries, five problem classes, four execution conditions, three replications per cell, and four model arms: qwen_local, sonnet, gemma_openrouter, and an auxiliary openai cloud-validation arm. All 1,440 generated outputs are judged by a fixed Sonnet rubric. The main finding is bounded and operationally useful, but it is not the original strict H1. The pre-registered exact-winner/CI criterion is not supported: exact winner identity is unstable across model arms, and several predicted strategies are close to, but not above, the best observed alternative. A weaker near-best routing claim is strongly supported. In every pre-registered model arm and problem class, and again in the auxiliary OpenAI validation arm, the predicted strategy is within 0.10 quality-score points of the best observed condition. Structured compliance verification is the clearest exception to the original mapping: all arms favor single_agent rather than consensus. A pre-registered Kendall's W test finds no reliable difference between Vietnamese-domain and English-domain tasks in how consistently the four coordination conditions are ranked (mean W of 0.20 in both strata; signed-rank p = .85), so H2 is not supported. We conclude that enterprise coordination policy should use dynamic routing as a calibrated default, not as a deterministic winner-selection law.
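
Two numbers in this abstract are easy to reproduce: the 1,440 outputs implied by the frozen matrix, and Kendall's W, the concordance statistic used for H2. The sketch below uses the standard no-ties formula for W; the function name and the toy rankings are illustrative assumptions, not the study's data.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m rankings of n items (no ties).

    Each ranking is a list of ranks 1..n, one rank per item.
    """
    m = len(rankings)
    n = len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Size of the experimental matrix described in the abstract.
tasks, conditions, replications, arms = 30, 4, 3, 4
total_outputs = tasks * conditions * replications * arms

# Toy rankings of four coordination conditions (illustrative only).
perfect_agreement = kendalls_w([[1, 2, 3, 4]] * 3)
full_disagreement = kendalls_w([[1, 2, 3, 4], [4, 3, 2, 1]])
```

W ranges from 0 (no agreement among the rankings) to 1 (identical rankings), so the reported mean W of 0.20 indicates weak concordance in both strata.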

2605.27991 2026-06-05 stat.ML cs.LG

Gradient-Flow Optimization as Dynamic Random-Effects Inference: Testing and Early Stopping with Applications to Deep Learning

深度神经网络训练作为随机效应:优化-推断对偶性

Minhao Yao, Ruoyu Wang, Xihong Lin, Lin Liu, Zhonghua Liu

Affiliations * Centre for Biomedical Data Science, Duke-NUS Medical School, National University of Singapore; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Institute of Natural Sciences, MOE-LSC, School of Mathematical Sciences, CMA-Shanghai, SJTU-Yale Joint Center of Biostatistics and Data Science, Shanghai Jiao Tong University; Department of Biostatistics, Columbia University, New York, NY, USA

AI summary: Shows that deep neural network training is equivalent to a classical random-effects model, revealing an optimization-inference duality, and uses restricted maximum likelihood (REML) estimation to obtain a likelihood-based early stopping rule.

Details
AI Chinese abstract

深度神经网络(DNN)取得了显著的实证成功,但其训练动态主要从优化而非统计原理的角度被理解。本文通过证明连续时间神经正切核(NTK)梯度流产生的预测与经典随机效应模型的预测完全等价,为过参数化机制下的DNN训练建立了一个统计框架。在该框架中,训练时间充当方差分量,或等价地作为经验贝叶斯协方差超参数,控制噪声到结构化信号的变异分配。这种等价性揭示了一种优化-推断对偶性:梯度流路径既是优化轨迹,也是经验贝叶斯随机效应推断路径。以训练时间为条件,网络输出是潜在信号的后验均值,通过限制最大似然估计(REML)估计训练时间,将早停转化为基于似然的经验贝叶斯推断,而非外部调参。这一视角产生了一个两阶段推断程序。首先,方差分量检验确定DNN训练是否捕捉到初始化之外的统计显著结构。其次,以训练合理为条件,REML提供基于似然的早停规则。由此产生的停止时间在NTK特征基下具有谱解释,其中训练持续到谱损失去相关实现。我们进一步证明,对于固定设计下的样本内预测,REML引导的早停实现了渐近最优预测误差,并且在额外的随机设计正则条件下,对于样本外预测也成立。这项工作将DNN训练重新定义为统计推断,并为决定是否以及训练深度神经网络多长时间提供了原则性基础。

English abstract

Gradient-flow optimization is usually viewed as an algorithmic procedure for minimizing empirical loss, with training duration selected by validation or heuristic early-stopping rules. We develop a statistical inference framework for the gradient-flow training trajectory itself. The central object is fixed-operator squared-error gradient flow: whenever the fitted value evolves through a time-invariant positive semidefinite training operator, the trained model output at each training time is exactly equivalent to the best linear unbiased predictor, or empirical-Bayes posterior mean, under a corresponding random-effects model. Under this representation, training time becomes a variance-component parameter governing how variance is reallocated from residual noise to structured signal. This turns two basic training decisions into inferential problems. First, whether training is needed is formulated as a variance-component test for signal beyond initialization. Second, how long to train is formulated as restricted maximum likelihood (REML) estimation of the training-time variance component. The resulting REML-guided early stopping rule has a spectral interpretation: it selects the training time at which optimized spectral losses become empirically decorrelated from the eigenvalues of the training operator, yielding an effective degrees-of-freedom measure for the evolving trained model. We establish asymptotic prediction optimality for fixed-design in-sample risk and, under additional kernel regularity conditions, random-design out-of-sample risk. Deep learning models in fixed-kernel gradient regimes provide canonical modern-AI instantiations of the theory. Numerical experiments and a UK Biobank proteomics application show that the proposed inferential approach attains competitive prediction accuracy while reducing the reliance on validation splits and repeated checkpoint evaluation.
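
The equivalence between gradient flow and a best linear unbiased predictor can be checked one eigen-direction at a time. The sketch below assumes squared-error gradient flow df/dt = -K (f - y) started at f = 0, and a random-effects model y = u + e with independent signal and noise. The signal variance used here, noise_var * (exp(t * eigenvalue) - 1), is one choice worked out for this illustration that makes the two shrinkage factors agree; it is not necessarily the parameterization used in the paper, and all function names are hypothetical.

```python
import math

def gradient_flow_shrinkage(eigenvalue, t):
    """Fraction of the target recovered along one eigen-direction at time t."""
    return 1.0 - math.exp(-t * eigenvalue)

def blup_shrinkage(signal_var, noise_var):
    """BLUP (posterior-mean) shrinkage factor for y = u + e along one direction."""
    return signal_var / (signal_var + noise_var)

def implied_signal_variance(eigenvalue, t, noise_var):
    """Signal variance under which the BLUP matches gradient flow (assumption)."""
    return noise_var * (math.exp(t * eigenvalue) - 1.0)

noise_var = 0.7
gaps = []
for eigenvalue in (0.1, 1.0, 5.0):
    for t in (0.0, 0.5, 2.0):
        s = implied_signal_variance(eigenvalue, t, noise_var)
        gaps.append(abs(gradient_flow_shrinkage(eigenvalue, t)
                        - blup_shrinkage(s, noise_var)))
```

Under this choice the implied signal variance is zero at t = 0 and grows with t, which matches the abstract's description of training time as a parameter that moves variance from residual noise to structured signal.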

2605.26179 2026-06-05 cond-mat.mtrl-sci cs.AI cs.CE

AutoDFT: A Closed-Loop Multi-Agent Framework for Autonomous DFT Calculations

AutoDFT:用于自主DFT计算的闭环多智能体框架

Penghui Yang, Zhonghan Zhang, Yue Li, Xinrun Wang, Yanchen Deng, Yuhao Lu, Bijun Tang, Zheng Liu, Bo An

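Affiliations * Nanyang Technological University, Singapore; Singapore Management University

AI summary: Proposes AutoDFT, a closed-loop multi-agent framework that embeds LLM reasoning into the full lifecycle of DFT calculations and adapts autonomously from planning through execution; it reaches a 94.1% task success rate on the VASPBench benchmark and reliably predicts electronic, magnetic, and energetic properties.

Details
AI Chinese abstract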

密度泛函理论(DFT)是材料科学和化学中计算发现的基础,然而每次计算都需要大量人工努力:当收敛停滞时调整算法,当出现意外物理现象时修改计划,以及当中间结果重塑问题时插入步骤。现有的基于LLM的智能体仅自动化初始规划阶段,预先生成完整的执行计划,而将所有后续调整留给手工规则。因此,这些工作流仍然脆弱,难以泛化到预规划场景之外,并且当失败或意外的中间结果需要改变计算路径时,通常需要专家干预。在此,我们介绍AutoDFT,一个闭环多智能体框架,将LLM推理嵌入DFT生命周期的每个阶段:战略规划器生成步骤目标的骨架计划;步骤规划器根据先前结果即时生成数值参数;监控-恢复-反思循环诊断失败、修复失败,并在证据支持时修改计划。我们展示了广度和深度:广度方面,在VASPBench(一个专门构建的基准,涵盖34个任务和9种DFT计算类型)上,AutoDFT使用GPT-5.2实现了94.1%的任务级成功率;深度方面,在已建立的材料数据库上,AutoDFT在电子、磁性和能量性质上产生了定量可靠的属性预测。通过闭环规划和执行,AutoDFT使没有深厚计算专业知识的实验人员能够获得可靠的第一性原理结果。

English abstract

Density functional theory (DFT) serves as the basis for computational discovery in materials science and chemistry, yet each calculation demands extensive human effort: adjusting algorithms when convergence stalls, revising plans when unexpected physics emerges, and inserting steps as intermediate results reshape the problem. Existing LLM-based agents automate only the initial planning stage, producing a full execution plan upfront and leaving all subsequent adaptation to hand-crafted rules. As a result, these workflows remain fragile, do not generalize well beyond pre-planned scenarios, and often require expert intervention when failures or unexpected intermediate results require changes to the calculation path. Here, we introduce AutoDFT, a closed-loop multi-agent framework that embeds LLM reasoning into every stage of the DFT lifecycle, where a strategic planner produces a skeletal plan of step objectives; a step planner generates numerical parameters just in time from preceding results; and a monitor-recover-reflect cycle diagnoses failures, repairs them, and revises the plan when the evidence justifies it. We demonstrate both breadth and depth: breadth on VASPBench, a purpose-built benchmark spanning 34 tasks and 9 DFT calculation types, where AutoDFT achieves 94.1% task-level success with GPT-5.2; and depth on established materials databases, where AutoDFT produces quantitatively reliable property predictions across electronic, magnetic, and energetic properties. By closing the loop between planning and execution, AutoDFT enables experimentalists without deep computational expertise to obtain reliable first-principles results.

2605.29916 2026-06-05 cs.NE cs.AI cs.DS math.OC

Selection Hyper-heuristics Can Automatically Adjust the Learning Period to Optimally Solve Pseudo-Boolean Problems

选择超启发式可以自动调整学习周期以最优地解决伪布尔问题

Benjamin Doerr, Pietro S. Oliveto, John Alasdair Warwicker

Affiliations * Laboratoire d’Informatique (LIX), CNRS, École Polytechnique, Institut Polytechnique de Paris; Department of Computer Science and Engineering, Southern University of Science and Technology; School of Computing & Communications, Lancaster University Leipzig

AI summary: Proposes a hyper-heuristic that sets its learning-period parameter automatically and proves that it selects the optimal neighbourhood size in a 1-o(1) fraction of the iterations, thereby optimising the LeadingOnes benchmark in optimal time (apart from lower-order terms).

Comments To appear in "Artificial Intelligence"

Details
Journal ref
Artificial Intelligence 357:104560 (2026)
AI Chinese abstract

最近研究表明,随机梯度超启发式在使用随机局部搜索(RLS)元启发式优化LeadingOnes基准时,能够学习最优邻域大小。然而,这需要使用一定长度$τ$的学习周期,这与经典超启发式不同,后者仅基于前一次迭代的成功来改变行为。在本文中,我们展示了如何自动设置这个新参数值,从而使用户免于控制这一新颖算法参数的非平凡任务。我们证明,由此产生的超启发式在$1-o(1)$比例的迭代中选择最优邻域大小,并因此以这些邻域大小所能达到的最佳时间(忽略低阶项)优化LeadingOnes基准。

English abstract

The Random Gradient hyper-heuristic was recently shown to be able to learn the optimal neighbourhood size when optimising the LeadingOnes benchmark via the Randomised Local Search (RLS) meta-heuristic. However, for this to happen, a learning period of a certain length $τ$ had to be used, unlike in classic hyper-heuristics, which change their behaviour based on the success of only the previous iteration. In this paper, we show how to set this new parameter value automatically, relieving the user of the non-trivial task of controlling this novel algorithm parameter. We prove that the resulting hyper-heuristic selects the optimal neighbourhood size in a $1-o(1)$ fraction of the iterations and, consequently, optimises the LeadingOnes benchmark in the best possible time (apart from lower-order terms) achievable with these neighbourhood sizes.
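
For readers unfamiliar with the setting, the following is a rough sketch of a Random Gradient hyper-heuristic with a fixed learning period τ on LeadingOnes. The acceptance rule (keep the candidate unless it is worse), the two neighbourhood sizes, and the restart-on-success behaviour are assumptions drawn from a general reading of earlier work; the automatic adjustment of τ that this paper contributes is not implemented here.

```python
import random

def leading_ones(bits):
    """Number of consecutive 1-bits at the start of the bit string."""
    count = 0
    for b in bits:
        if b != 1:
            break
        count += 1
    return count

def rls_step(bits, k, rng):
    """One RLS_k iteration: flip k distinct random bits, keep the result unless it is worse."""
    candidate = list(bits)
    for i in rng.sample(range(len(bits)), k):
        candidate[i] ^= 1
    if leading_ones(candidate) >= leading_ones(bits):
        return candidate
    return list(bits)

def random_gradient(n, tau, sizes=(1, 2), seed=0):
    """Fixed-tau sketch: keep a neighbourhood size while it succeeds within tau steps."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n)]
    iterations = 0
    while leading_ones(bits) < n:
        k = rng.choice(sizes)              # pick a neighbourhood size at random
        remaining = tau                    # learning period for this choice
        while remaining > 0 and leading_ones(bits) < n:
            before = leading_ones(bits)
            bits = rls_step(bits, k, rng)
            iterations += 1
            remaining -= 1
            if leading_ones(bits) > before:
                remaining = tau            # success: restart the period, keep k
    return iterations

steps_needed = random_gradient(n=20, tau=20)
```

A classic hyper-heuristic corresponds roughly to `tau=1`, where a single failed iteration already triggers a new random choice.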

2605.29054 2026-06-05 cs.SE cs.CL

Converted, Not Equivalent: Benchmarking Codebase Conversion via Observational Equivalence

转换而非等价:通过观察等价性基准测试代码库转换

Linxin Song, Jiefeng Chen, Yue Huang, Bhavana Dalvi Mishra, Chi Wang, Jieyu Zhao, Jinsung Yoon, Tomas Pfister

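Affiliations * University of Southern California; Google Cloud AI Research; University of Notre Dame; Google DeepMind

AI summary: To address agents that over-trust their local validation and so violate semantic contracts during codebase conversion, proposes the T2J-Bench benchmark, which evaluates conversion quality under a fixed equivalence contract with three verification stages (Spec, Numeric, Behavioral); the best system reaches a pass rate of only 26.7-28.9%, and all systems overestimate success by 66.6-97.8 points.

Details
AI Chinese abstract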

编码智能体日益成为代码库规模的协作者,能够协助代码库转换,但这一进展暴露了一个关键弱点:智能体往往过度信任自己的本地验证例程,并在满足表面检查但违反用户实际关心的语义契约的工件上宣布成功。这个问题在代码库转换中尤为严重,因为先前的评估主要是结果驱动的,因此不稳定:两个实现可以在浅层结果上匹配,例如单个前向损失,但在梯度、优化器行为或短期训练动态上存在差异。我们引入了T2J-Bench,一个代码库转换基准,它将转换重新定义为在固定等价契约下的迁移。然后,一个固定验证器通过三个有序阶段比较源代码库和转换后的代码库:Spec(接口可接受性)、Numeric(前向输出、损失、梯度和目标特定张量)和Behavioral(固定种子下的短期训练动态)。在355次盲转换尝试中,尽管Spec通过率高达91.1%,最佳系统总体通过率仅为26.7-28.9%;4.7倍的token预算差异仅产生2.2倍的通过率差异;所有系统相对于固定评估器高估成功率66.6-97.8点。这表明失败更多源于契约不一致的自我验证,而非有限的预算或骨干强度。

English abstract

Coding agents increasingly act as codebase-scale collaborators that can assist with codebase conversion, but this progress has exposed a critical weakness: agents often over-trust their own local validation routines and declare success on artifacts that satisfy surface checks while violating the semantic contracts users actually care about. This problem is especially acute in codebase conversion, where prior evaluation is largely outcome-driven and therefore unstable: two implementations can match on a shallow outcome, such as a single forward loss, while diverging in gradients, optimizer behavior, or short-horizon training dynamics. We introduce T2J-Bench, a benchmark for codebase conversion that reformulates conversion as transfer under a fixed equivalence contract. A fixed verifier then compares source and converted codebases through three ordered stages: Spec (interface admissibility), Numeric (forward outputs, losses, gradients, and objective-specific tensors), and Behavioral (short training dynamics under fixed seeds). Across 355 blind conversion attempts, the best system reaches only a 26.7 to 28.9% overall pass rate despite Spec pass rates up to 91.1%; a 4.7x token-budget spread yields only a 2.2x pass-rate spread; and all systems overestimate success by 66.6 to 97.8 points relative to the fixed evaluator. This suggests that failures stem more from contract-misaligned self-validation than from limited budget or backbone strength.
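
The point that two implementations can match on a single forward loss while diverging in gradients can be shown with a toy example. The two loss functions, the tolerance, and the two-stage check below are hypothetical stand-ins chosen for illustration; they are not part of T2J-Bench.

```python
def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def source_loss(w):
    return (w - 3.0) ** 2

def converted_loss(w):
    # Hypothetical faulty conversion: equals source_loss at w = 1.0 but has a different slope there.
    return 2.0 * abs(w - 3.0)

def staged_check(f, g, w, tol=1e-4):
    """Ordered checks that stop at the first failing stage."""
    if abs(f(w) - g(w)) > tol:
        return "fail: forward"
    if abs(numeric_grad(f, w) - numeric_grad(g, w)) > tol:
        return "fail: gradient"
    return "pass"

# Forward losses agree (both 4.0), gradients differ (-4.0 vs -2.0).
result = staged_check(source_loss, converted_loss, 1.0)
```

A validation routine that stops after the forward comparison would report success here, which is the kind of surface check the abstract warns about.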

2605.23809 2026-06-05 eess.SY cs.LG cs.SY

Advanced AI Service Provisioning in O-RAN through LLM Engine Integration

通过LLM引擎集成在O-RAN中的高级AI服务提供

Seyed Bagher Hashemi Natanzi, Pranshav Gajjar, Bo Tang, Vijay K. Shah

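Affiliations * Department of Electrical and Computer Engineering, Worcester Polytechnic Institute; Department of Electrical and Computer Engineering, North Carolina State University

AI summary: Proposes a Dual-Brain architecture that combines the reasoning ability of LLMs with the real-time performance of a lightweight ML engine to automate the deployment and configuration of AI services in O-RAN.

Details
AI Chinese abstract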

开放无线接入网络(O-RAN)架构允许通过模块化的xApp和rApp将AI直接嵌入到RAN中,然而创建这些应用程序——收集数据、训练模型、编写代码以及安全部署它们——仍然缓慢且主要依赖人工。大型语言模型(LLM)具有强大的推理和代码生成能力,但不适合实时RAN控制所需的快速、确定性推理。我们提出了一种概念验证的双脑架构,结合了两者的优势:基于LLM的编排器将运营商意图转化为数据收集策略和部署代码,而自动化ML引擎NeuralSmith通过API按需训练轻量级分类器。我们描述了架构和提供工作流,分享了来自容器化O-RAN 5G SA测试平台的实际见解,并讨论了开放的研究方向。

English abstract

The Open Radio Access Network (O-RAN) architecture allows AI to be embedded directly into the RAN through modular xApps and rApps, yet creating these applications (collecting data, training models, writing code, and deploying them safely) remains slow and largely manual. Large Language Models (LLMs) offer strong reasoning and code-generation capabilities but are unsuited for the fast, deterministic inference required in real-time RAN control. We present a proof-of-concept Dual-Brain architecture that combines both strengths: an LLM-based orchestrator translates operator intents into data-collection policies and deployment code, while an automated ML engine, NeuralSmith, trains lightweight classifiers on demand via an API. We describe the architecture and provisioning workflow, share practical insights from a containerized O-RAN 5G SA testbed, and discuss open research directions.

2604.15524 2026-06-05 eess.SY cs.RO cs.SY

Safe and Energy-Aware Multi-Robot Density Control via PDE-Constrained Optimization for Long-Duration Autonomy

面向长期自主性的安全与能量感知多机器人密度控制:基于PDE约束优化

Longchen Niu, Andrew Nasif, Gennaro Notomista

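Affiliations * Department of Electrical and Computer Engineering, University of Waterloo

AI summary: Proposes a density control framework that combines the Fokker-Planck partial differential equation with control Lyapunov and control barrier functions to achieve target density tracking, obstacle avoidance, and energy sustainability for multi-robot systems.

Details
AI Chinese abstract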

本文提出了一种新颖的多机器人系统密度控制框架,具有空间安全性和能量可持续性保证。随机机器人运动通过Fokker-Planck偏微分方程在密度层面进行编码。控制李雅普诺夫函数和控制障碍函数与PDE相结合,以强制实现目标密度跟踪、障碍区域避免以及多个充电周期内的能量充足性。由此产生的二次规划实现了快速的在环实现,可实时调整指令。进行了多机器人实验和广泛仿真,以证明控制器在定位和运动不确定性下的有效性。

English abstract

This paper presents a novel density control framework for multi-robot systems with spatial safety and energy sustainability guarantees. Stochastic robot motion is encoded through the Fokker-Planck Partial Differential Equation (PDE) at the density level. Control Lyapunov and control barrier functions are integrated with the PDE to enforce target density tracking, obstacle region avoidance, and energy sufficiency over multiple charging cycles. The resulting quadratic program enables fast in-the-loop implementation that adjusts commands in real time. Multi-robot experiments and extensive simulations were conducted to demonstrate the effectiveness of the controller under localization and motion uncertainties.

2605.21557 2026-06-05 stat.ML cs.AI cs.LG

Scalable Reinforcement Learning via Adaptive Batch Scaling

通过自适应批处理缩放实现可扩展的在线强化学习

Jongchan Park

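Affiliations * Jongchan Park

AI summary: Proposes Adaptive Batch Scaling, which dynamically adjusts the effective batch size to balance the need for plasticity early in reinforcement learning against stable convergence late in training, and finds that combining larger networks with larger batch sizes yields the best performance.

Details
AI Chinese abstract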

传统观点认为大批次训练与强化学习(RL)本质上不兼容,超过一定阈值后增大批次大小通常会导致回报减少或性能下降,由于数据分布的固有非平稳性。我们通过观察非平稳性并非RL的固定属性,而是随着训练过程演变:早期阶段表现出快速的行为转变,需要小批次以保持可塑性,而晚期阶段接近准平稳状态,大批次可实现精确收敛。受此启发,我们提出自适应批处理缩放(ABS),根据学习策略的稳定性动态调整有效批次大小。ABS的核心是行为分歧,一种新的度量指标,通过测量连续更新之间的动作级转变来量化策略非平稳性,用于将批次大小反向缩放至策略波动性。与并行化Q网络(PQN)算法结合并在ALE基准上评估,ABS无缝地平衡了早期阶段的可塑性和晚期阶段的稳定收敛。令人惊讶的是,与传统观点相反,我们的结果表明,较大的网络和较大的批次大小的组合实现了最佳性能——一种之前被认为在强化学习中无法实现的扩展行为,现在通过自适应批处理控制得以解锁。

English abstract

Conventional wisdom holds that large-batch training is fundamentally incompatible with Reinforcement Learning (RL): beyond a modest threshold, increasing batch sizes typically yields diminishing returns or performance degradation due to the inherent non-stationarity of the data distribution. We challenge this view by observing that non-stationarity is not a fixed property of RL, but evolves throughout training: early stages exhibit rapid behavioral shifts that demand small batches for plasticity, whereas late stages approach a quasi-stationary regime where large batches enable precise convergence. Motivated by this observation, we propose Adaptive Batch Scaling (ABS), which dynamically adjusts the effective batch size according to the stability of the learning policy. Central to ABS is Behavioral Divergence, a novel metric that quantifies policy non-stationarity by measuring action-level shifts between consecutive updates, which we use to scale batch size inversely to policy volatility. Integrated with the Parallelised Q-Network (PQN) algorithm and evaluated on the ALE benchmark, ABS seamlessly reconciles early-stage plasticity with late-stage stable convergence. Strikingly, contrary to conventional wisdom, our results reveal that the combination of larger networks and larger batch sizes achieves the best performance, a scaling behavior previously thought to be unattainable in RL and now unlocked through adaptive batch control.
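
The idea of scaling batch size inversely to policy volatility can be sketched as follows. The abstract does not give the formula for Behavioral Divergence or the scaling rule, so everything here is an assumption made for illustration: divergence is taken as the fraction of probe states whose greedy action changed between two consecutive updates, and the batch size is a linear interpolation between hypothetical bounds.

```python
def behavioral_divergence(old_actions, new_actions):
    """Fraction of probe states whose greedy action changed between two updates.

    Assumption: a simple action-disagreement rate stands in for the paper's metric.
    """
    changed = sum(a != b for a, b in zip(old_actions, new_actions))
    return changed / len(old_actions)

def adaptive_batch_size(divergence, min_batch=32, max_batch=1024):
    """Map high volatility to small batches and low volatility to large ones.

    Assumption: a linear map with hypothetical bounds of 32 and 1024.
    """
    divergence = min(max(divergence, 0.0), 1.0)
    return int(round(max_batch - divergence * (max_batch - min_batch)))

# Early training: many greedy actions flip, so the batch stays small.
early = adaptive_batch_size(behavioral_divergence([0, 1, 2, 3], [3, 0, 1, 3]))
# Late training: the policy is nearly stationary, so the batch grows.
late = adaptive_batch_size(behavioral_divergence([0, 1, 2, 3], [0, 1, 2, 3]))
```

In practice the effective batch size could be changed through gradient accumulation or the number of parallel environments; which mechanism the paper uses is not stated in the abstract.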

2605.10807 2026-06-05 cs.CR cs.AR cs.LG

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

利用大语言模型进行安全硬件设计及相关问题:机遇与挑战

Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri

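Affiliations * New York University Abu Dhabi; NYU Tandon School of Engineering

AI summary: Examines the use of large language models in electronic design automation and hardware security, analysing their potential for generating RTL code, automating testbenches, and bridging the semantic gap between high-level specifications and silicon, while noting the severe security vulnerabilities they introduce, and summarises recent advances and future research directions.

Comments Accepted for 2026 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Details
AI Chinese abstract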

将大语言模型(LLMs)整合到电子设计自动化(EDA)和硬件安全领域正迅速重塑半导体行业。尽管LLMs在生成寄存器传输级(RTL)代码、自动生成测试平台以及弥合高层次规格与硅芯片之间的语义差距方面提供了前所未有的能力,但同时它们也引入了严重的安全隐患。本文全面回顾了LLM驱动的硬件设计的最新研究进展,围绕EDA综合、硬件信任、安全设计和教育等方面的关键进展进行深入分析。我们系统地扩展了最近突破的方法——从基于推理的综合和多代理漏洞提取到数据污染和对抗性机器学习(ML)规避。我们整合了关于关键防御措施的一般讨论,如动态基准测试以对抗数据记忆和激进的红队测试以实现稳健的安全评估。最后,我们综合了跨领域的经验教训,以指导未来研究朝着安全、可信和自主的设计生态系统发展。

English abstract

The integration of Large Language Models (LLMs) into Electronic Design Automation (EDA) and hardware security is rapidly reshaping the semiconductor industry. While LLMs offer unprecedented capabilities in generating Register Transfer Level (RTL) code, automating testbenches, and bridging the semantic gap between high-level specifications and silicon, they simultaneously introduce severe vulnerabilities. This comprehensive review provides an in-depth analysis of the state-of-the-art in LLM-driven hardware design, organized around key advancements in EDA synthesis, hardware trust, design for security, and education. We systematically expand on the methodologies of recent breakthroughs, from reasoning-driven synthesis and multi-agent vulnerability extraction to data contamination and adversarial machine learning (ML) evasion. We integrate general discussions on critical countermeasures, such as dynamic benchmarking to combat data memorization and aggressive red-teaming for robust security assessment. Finally, we synthesize cross-cutting lessons learned to guide future research toward secure, trustworthy, and autonomous design ecosystems.

2603.17837 2026-06-05 eess.AS cs.CL

The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

沉默的思维:通过潜在推理建模全双工语音对话模型中的内部认知

Donghang Wu, Tianyu Zhang, Yuxin Li, Hexin Liu, Chen Chen, Eng Siong Chng, Yoshua Bengio

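Affiliations * DeepMind

AI summary: Proposes FLAIR, a full-duplex spoken dialogue model that performs latent reasoning concurrently with speech perception to improve dialogue quality; the method achieves competitive results on multiple speech benchmarks.

Comments Accepted by ICML 2026

Details
AI Chinese abstract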

在对话互动中,人类在听讲者说话时会潜意识地进行同时思考。尽管这种内部认知处理可能不总是表现为显式的语言结构,但它是制定高质量响应的关键。受这一认知现象的启发,我们提出了一种名为FLAIR的新全双工潜在和内部推理方法,该方法在语音感知的同时进行潜在思考。与传统NLP中的“思考”机制不同,我们的方法不需要事后生成,而是无缝地与语音对话系统结合:在用户说话阶段,它将前一步的潜在嵌入输出递归地馈入下一步,从而实现连续推理,严格遵循因果性而不引入额外延迟。为了实现这种潜在推理,我们设计了一个基于证据下界的目标,支持通过教师强制进行高效的监督微调,从而避免了需要显式推理注释的需要。实验表明,这种听的同时思考设计在多个语音基准测试中均取得了竞争性的结果。此外,FLAIR能够稳健地处理对话动态,并在全双工交互指标上取得了竞争性的性能。

English abstract

During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cognitive phenomenon, we propose a novel Full-duplex LAtent and Internal Reasoning method named FLAIR that conducts latent thinking simultaneously with speech perception. Unlike conventional "thinking" mechanisms in NLP, which require post-hoc generation, our approach aligns seamlessly with spoken dialogue systems: during the user's speaking phase, it recursively feeds the latent embedding output from the previous step into the next step, enabling continuous reasoning that strictly adheres to causality without introducing additional latency. To enable this latent reasoning, we design an Evidence Lower Bound-based objective that supports efficient supervised finetuning via teacher forcing, circumventing the need for explicit reasoning annotations. Experiments demonstrate the effectiveness of this think-while-listening design, which achieves competitive results on a range of speech benchmarks. Furthermore, FLAIR robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.

2605.13587 2026-06-05 stat.ML cs.LG eess.SP

Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: A large-scale benchmark of operator-adaptive PLS and Ridge models

将预处理选择重新定义为近红外光谱学中的模型内部校准:一种大规模的运算符自适应PLS和岭模型基准测试

Gregory Beurier, Robin Reiter, Camille Noûs, Lauriane Rouan, Denis Cornet

Affiliations * CIRAD, UMR AGAP Institut; UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro; Laboratoire Cogitamus

AI summary: Studies the reframing of preprocessing selection as model-internal calibration in near-infrared spectroscopy, comparing the performance and efficiency of operator-adaptive PLS and Ridge models in a large-scale benchmark.

Comments 17 pages, 8 figures; supplementary material (39 pages, 4 figures) included. Extended preprint version of a companion study prepared as a concise journal article (same results, different framing and scope). Code and artifacts: https://github.com/GBeurier/nirs4all-aom

Details
AI Chinese abstract

预处理筛选通常是近红外光谱学校准工作流程中最昂贵的部分。其有效性在于平滑、导数、去趋势及相关滤波器会改变PLS或岭回归所看到的光谱方向,但完整的外部搜索会反复拟合几乎相同的线性模型。本文研究了将该搜索折叠成一个校准步骤的情况。对于严格的线性预处理运算符,变换后的PLS交叉协方差满足(XA^T)^T Y = AX^T Y,而岭回归依赖于运算符诱导的核X A^T A X^T。这些恒等式允许在模型内部筛选有限的运算符银行,同时保留原始波长系数。样本自适应或拟合的校正如SNV、MSC、EMSC和ASLS仍保持为折叠局部分支,而不是被吸收进代数中。本研究使用AOM基准队列:在显式中包含61个回归行和17个分类行。在主回归分母(N=32)上,普通的紧凑银行AOM-PLS记录了与PLS默认值相比的中位RMSEP比为0.991,与PLS-HPO相比为0.990;所选的ASLS-AOM-compact-cv5分支在相同的两个参考上记录为0.985和1.002。普通的AOMRidge-global-compact-none基线记录了与Ridge默认值相比的0.974,与Ridge-HPO相比为0.984,而所选的AOMRidge-Blender-headline-spxy3记录为0.918和0.966。所选分类器AOM-PLS-DA-global-simpls-covariance在13个数据集上将平衡精度提高了0.159,其中12/13胜出。运行时间差距是实际结果:PLS-HPO每次运行的中位总时间是710.81秒,而所选的AOM-PLS分支仅为1.63秒。线性运算符自适应校准因此在预测质量上与彻底的预处理筛选相当,对于PLS来说,拟合时间减少了多个数量级。

English abstract

Preprocessing screening is often the most expensive part of a near-infrared spectroscopy calibration workflow. It works because smoothing, derivatives, detrending and related filters change the spectral directions seen by partial least squares (PLS) or Ridge regression, but a full external search repeatedly refits nearly the same linear model. This paper studies the case where that search can be collapsed into one calibration step. For a strict linear preprocessing operator A acting on row spectra as XA^T, the transformed PLS cross-covariance satisfies (XA^T)^T Y = A X^T Y, and Ridge regression depends on the operator-induced kernel X A^T A X^T. These identities let a finite operator bank be screened inside the model while retaining original-wavelength coefficients, and the same identity extends to cheaply evaluated linear operator chains. Sample-adaptive or fitted corrections such as SNV, MSC, EMSC and ASLS are not strict linear; we prove the boundary and keep them as fold-local branches. The cohort has 61 regression and 17 classification rows, with a strict paired regression denominator of N=32 for the eight paper variants. There, AOM-PLS reaches median RMSEP ratios of 0.991/0.990 (simple) and 0.985/1.002 (best) against PLS-default/PLS-HPO, and AOM-Ridge reaches 0.974/0.984 (simple) and 0.918/0.966 (best) against Ridge-default/Ridge-HPO. The operator-adaptive classifier AOM-PLS-DA improves balanced accuracy by a median 0.159 on N=13 datasets (12/13 wins). The practical result is the runtime gap: PLS-HPO takes a median 710.81 s per run, whereas AOM-PLS takes 1.18-1.63 s, that is, 436 to 602 times less PLS fitting time. Linear operator-adaptive calibration thus gives prediction quality comparable to exhaustive preprocessing screening, with orders-of-magnitude less fitting time for PLS.
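
The two identities quoted in the abstract follow from ordinary transpose rules and can be verified numerically. The sketch below uses a first-difference operator as an example of a strict linear preprocessing step; the spectra, the response vector, and the helper names are made up for illustration, and integer data keep the comparison exact.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def first_difference(p):
    """(p-1) x p first-difference operator: row i computes x[i+1] - x[i]."""
    return [[-1 if j == i else 1 if j == i + 1 else 0 for j in range(p)]
            for i in range(p - 1)]

# Made-up data: 3 spectra measured at 4 wavelengths, one response column.
X = [[1, 2, 4, 7],
     [0, 3, 3, 5],
     [2, 2, 6, 9]]
Y = [[1], [0], [2]]
A = first_difference(4)

XAt = matmul(X, transpose(A))                  # preprocessed spectra X A^T

# PLS cross-covariance: (X A^T)^T Y equals A (X^T Y).
lhs = matmul(transpose(XAt), Y)
rhs = matmul(A, matmul(transpose(X), Y))

# Ridge kernel: (X A^T)(X A^T)^T equals X (A^T A) X^T.
kernel_direct = matmul(XAt, transpose(XAt))
kernel_identity = matmul(matmul(X, matmul(transpose(A), A)), transpose(X))
```

Because `matmul(transpose(X), Y)` does not depend on A, it can be computed once and reused for every operator in a bank, which is presumably where the reported saving in fitting time comes from.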

2605.15212 2026-06-05 cs.AR cs.AI cs.CE

Fault tolerance estimation in digital circuits with visualised generative networks

数字电路中故障容错估计与可视化生成网络

Sascha Biel, Carl Alexander Gaede, Amiel Glaser, Jan Wolter, Alexej Schelle

Affiliations * IU Internationale Hochschule; Constructor University

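AI summary: This paper proposes a new numerical method that uses a generative network sampling technique to estimate the fault tolerance of failure modes in digital circuit structures. Expected outputs for random inputs of ideally digitalised analog currents are compared with realistic signals at the discriminator part of a Generative Adversarial Network (GAN) to compute the deviation from ideal digital electronic signals, covering error modes such as missing or interchanged logical devices.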

Comments 7 pages, 7 figures, 1 table

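AI Chinese abstract (English translation)

We propose a new numerical method that estimates the fault tolerance of failure modes in digital circuit structures using a generative network sampling technique. Starting from bit configurations generated from random inputs of ideally digitalised analog currents in a digital circuit design with classical logic gates, the expected output currents are compared with the realistic signals of a numerical experiment at the discriminator part of a Generative Adversarial Network (GAN) in order to compute the deviation from ideal digital electronic signals, including various error modes such as missing or interchanged logical devices. Analysing a representation of the GAN in terms of complex variables makes it possible to assess the robustness of electronic designs by distinguishing the impact of failure modes associated with different classical logical elements.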

English abstract

We propose a new numerical method to estimate the fault tolerance of failure modes in digital circuit structures with a generative network sampling technique. Starting from random inputs of generated bitwise configurations of ideally digitalised analog currents in a digital circuit design with classical logic gates, expected output currents are compared with the realistic signals of a numerical experiment at the discriminator part of a Generative Adversarial Network (GAN) to calculate the deviation from ideal digital electronic signals, including various error modes such as missing or interchanged logical devices. Analysing a representation of the GAN in terms of complex variables makes it possible to evaluate the robustness of electronic designs by differentiating the impact of failure modes associated with different classical logical elements in the circuit.
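
The comparison of expected and faulty outputs under random bit inputs can be illustrated with a minimal sketch. It is not the authors' GAN-based method: the discriminator is replaced here by a direct output comparison under plain Monte Carlo sampling. The four-input circuit, the two fault functions and all names are hypothetical.

```python
import random

# Hypothetical ideal circuit: (a AND b) OR (c XOR d).
def ideal(a, b, c, d):
    return (a & b) | (c ^ d)

# Fault mode 1: the AND gate is interchanged with an OR gate.
def and_to_or(a, b, c, d):
    return (a | b) | (c ^ d)

# Fault mode 2: the XOR gate is missing and its first input is wired through.
def missing_xor(a, b, c, d):
    return (a & b) | c

def deviation(faulty, n_samples=2000, seed=0):
    """Fraction of random input bit patterns on which the faulty circuit
    disagrees with the ideal one (0.0 means fully tolerant)."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n_samples):
        bits = [rng.randint(0, 1) for _ in range(4)]
        mismatches += ideal(*bits) != faulty(*bits)
    return mismatches / n_samples

dev_reference = deviation(ideal)       # ideal compared with itself
dev_swap = deviation(and_to_or)        # interchanged logical device
dev_missing = deviation(missing_xor)   # missing logical device
```

For uniformly random bits, enumerating the 16 input patterns of this toy circuit gives exact mismatch rates of 1/4 for the interchanged gate and 3/8 for the missing gate, so the sampled estimates should land near those values.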

2605.12951 2026-06-05 stat.ML cs.LG

Coreset-Induced Conditional Velocity Flow Matching

由Coreset诱导的条件速度流匹配

Xiao Wang, Zihua She, Jianxi Su

Affiliations * Department of Statistics, Purdue University

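AI summary: This paper proposes CCVFM, a generative model that augments hierarchical rectified flow with a data-driven source distribution. A coreset compresses the target data and is lifted to a Gaussian mixture, which gives a closed-form conditional velocity law without a learned neural sampler; a lightweight correction flow then refines the generated samples further.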

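AI Chinese abstract (English translation)

We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-driven source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow is required to transport isotropic Gaussian noise to a multimodal target velocity distribution from scratch. Our key observation is that this inner source can be replaced by a closed-form surrogate built from a coreset of the target. CCVFM first compresses the target into weighted atoms with an entropic Sinkhorn coreset and lifts them to a Gaussian mixture. The induced conditional velocity law is then a closed-form Gaussian mixture that can be sampled without learning a neural sampler. A lightweight correction flow, trained from this exact surrogate source, then refines the remaining surrogate-to-target residual instead of learning the entire noise-to-data map. We prove that, under an explicit compression assumption, the surrogate transport cost equals the target-surrogate Wasserstein gap, whereas the noise-source analogue has a dimension-scale lower bound. We further characterize the conditional second moment of the direct surrogate-source training target and show that its source-dependent excess is small when the surrogate conditional law is close to the true conditional velocity law in mean and covariance. Experiments on MNIST, CIFAR-10, ImageNet-32 and CelebA-HQ show that the proposed method achieves competitive few-step generation under matched architectures.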

English abstract

We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-informed source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow is asked to transport isotropic Gaussian noise to a multimodal target velocity distribution from scratch. Our key observation is that this inner source can be replaced by a closed-form surrogate built from a coreset of the target. CCVFM first compresses the target into weighted atoms using an entropic Sinkhorn coreset and lifts them to a Gaussian mixture. The induced conditional velocity law is then a closed-form Gaussian mixture that can be sampled without a learned neural sampler. A lightweight correction flow, trained from this exact surrogate source, then refines the remaining surrogate-to-target residual rather than learning an entire noise-to-data map. We prove that the surrogate transport cost equals the target-surrogate Wasserstein gap under an explicit compression assumption, whereas the noise-source analogue has a dimension-scale lower bound. We further characterize the conditional second moment of the direct surrogate-source training target and show that its source-dependent excess is small when the surrogate conditional law is close to the true conditional velocity law in mean and covariance. Empirically, on MNIST, CIFAR-10, ImageNet-32, and CelebA-HQ, the proposed method reaches competitive few-step generation under matched architectures.
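
The step of lifting weighted atoms to a Gaussian mixture that can be sampled without a learned sampler can be sketched as follows. This is only an illustration of that one step, under stated assumptions: the atoms and weights are hand-picked rather than produced by an entropic Sinkhorn coreset, every component shares one isotropic bandwidth `sigma`, and the function name `sample_mixture` is hypothetical. The conditional velocity law and the correction flow are not modelled.

```python
import random

def sample_mixture(atoms, weights, sigma, n, seed=0):
    """Draw n points from the mixture sum_k w_k * N(atom_k, sigma^2 I)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Pick a component with probability proportional to its weight,
        # then add isotropic Gaussian noise around the chosen atom.
        center = rng.choices(atoms, weights=weights, k=1)[0]
        samples.append([rng.gauss(mu, sigma) for mu in center])
    return samples

# Hypothetical coreset: three weighted atoms in two dimensions.
atoms = [[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
weights = [0.5, 0.3, 0.2]

samples = sample_mixture(atoms, weights, sigma=0.1, n=1000)
```

Because each draw is one categorical choice plus Gaussian noise, sampling needs no trained network. The abstract's argument is that a source of this kind starts close to the multimodal target, leaving the correction flow only the residual to learn.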

2605.11732 2026-06-05 cs.IR cs.CL cs.MA cs.MM

AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents

AgentDisCo: 向开放深度研究代理中的解耦与协作迈进

Jiarui Jin, Zexuan Yan, Shijian Wang, Wenxiang Jiao, Yuan Lu

Affiliations * University of Science and Technology of China

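AI summary: This paper proposes AgentDisCo, a disentangled and collaborative agentic architecture that treats deep research as an adversarial optimization problem between information exploration and exploitation. A critic agent evaluates the generated outlines and refines search queries, a generator agent retrieves updated results and revises the outlines, and a comprehensive report is produced at the end. The framework supports both handcrafted and automatically discovered design strategies through a meta-optimization harness and refines itself with the help of powerful code-generation agents.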

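AI Chinese abstract (English translation)

In this paper we present AgentDisCo, a novel disentangled and collaborative agentic architecture that treats deep research as an adversarial optimization problem between information exploration and exploitation. Unlike existing approaches, which merge these two processes into a single module, AgentDisCo uses a critic agent to evaluate generated outlines and refine search queries, and a generator agent to retrieve updated results and revise the outlines accordingly. The iteratively refined outline is then passed to a downstream report-writing agent, which synthesizes a comprehensive research report. The overall workflow supports both handcrafted and automatically discovered design strategies through a meta-optimization harness, in which the generator agent is repurposed as a scoring agent that evaluates the critic agent's outputs and produces quality signals. Powerful code-generation agents (e.g., Claude-Code, Codex) systematically explore agent configurations and build a policy bank, a structured repository of reusable design strategies, so that the framework can refine itself without extensive human intervention. We evaluate AgentDisCo on three established deep research benchmarks (DeepResearchBench, DeepConsult, DeepResearchGym) using Gemini-2.5-Pro and obtain performance comparable to or better than leading closed-source systems. Observing that existing benchmarks do not adequately reflect real-world user needs, we introduce GALA (General AI Life Assistants), a benchmark that mines latent research interests from users' historical browsing behavior. We further develop a rendering agent that converts research reports into visually rich poster presentations, and demonstrate an end-to-end product, AutoResearch Your Interest, which delivers personalized deep research recommendations based on individual browsing histories.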

English abstract

In this paper, we present AgentDisCo, a novel Disentangled and Collaborative agentic architecture that formulates deep research as an adversarial optimization problem between information exploration and exploitation. Unlike existing approaches that conflate these two processes into a single module, AgentDisCo employs a critic agent to evaluate generated outlines and refine search queries, and a generator agent to retrieve updated results and revise outlines accordingly. The iteratively refined outline is then passed to a downstream report writer that synthesizes a comprehensive research report. The overall workflow supports both handcrafted and automatically discovered design strategies via a meta-optimization harness, in which the generator agent is repurposed as a scoring agent to evaluate critic outputs and generate quality signals. Powerful code-generation agents (e.g., Claude-Code, Codex) systematically explore agent configurations and construct a policy bank, a structured repository of reusable design strategies, enabling the framework to self-refine without extensive human intervention. We evaluate AgentDisCo on three established deep research benchmarks (DeepResearchBench, DeepConsult, DeepResearchGym) using Gemini-2.5-Pro, achieving performance comparable to or surpassing leading closed-source systems. Observing that existing benchmarks inadequately reflect real-world user needs, we introduce GALA (General AI Life Assistants), a benchmark that mines latent research interests from users' historical browsing behavior. We further develop a rendering agent that converts research reports into visually rich poster presentations, and demonstrate an end-to-end product, AutoResearch Your Interest, which delivers personalized deep research recommendations derived from individual browsing histories.

2605.00174 2026-06-05 cs.AR cs.CV

DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference

DPU 或 GPU 加速神经网络推断——为何不两者都用?分割 CNN 推断

Ali Emre Oztas, Mahir Demir, James Garside, Mikel Luján

Affiliations * The University of Manchester

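AI summary: This paper proposes partitioning CNN inference across a DPU and a GPU to reduce latency. The DPU processes the initial layers and the GPU the remaining layers; combined with a GNN-based partition index prediction method, this yields lower latency than DPU-only or GPU-only execution.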

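AI Chinese abstract (English translation)

Video and image streaming on edge devices requires low latency. To address this, neural networks (NNs) are widely used, and prior work has mainly focused on accelerating them with a single hardware unit such as a Graphics Processing Unit (GPU), a Field Programmable Gate Array (FPGA) or a Deep Learning Processing Unit (DPU). Combining these units, however, can reduce latency further. This paper proposes partitioning CNN inference across a DPU and a GPU (Split CNN Inference). The first partition, comprising the initial CNN layers that process the input images, runs on the AI engines (DPU) of a Versal VCK190, so the DPU processes it close to the data source. In an asynchronous pipeline, a GPU runs the remaining layers: an NVIDIA RTX 2080 processes the second partition, with less data transferred between the data source (storage/camera) and the GPU. In addition, a Graph Neural Network (GNN)-based partition index prediction method is proposed to automate the CNN partitioning that Split Inference requires. Established models such as LeNet-5, ResNet18/50/101/152, VGG16 and MobileNetv2 are analysed. The results show a latency improvement of up to 2.48x over DPU-only execution and up to 3.37x over GPU-only execution. The trained GNN model splits the layers between the appropriate devices with 96.27% accuracy.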

English abstract

Video and image streaming on edge devices requires low latency. To address this, Neural Networks (NNs) are widely used, and prior work mainly focuses on accelerating them with single hardware units such as Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), and Deep Learning Processing Units (DPUs). However, further reductions in latency can be achieved by combining these units. In this paper, partitioning CNN inference across DPU and GPU (Split CNN Inference) is proposed. The first partition, which consists of the initial CNN layers processing the input images, runs on the AI engines (DPU) of a Versal VCK190. The DPU processes the first partition near the source of the data. Pipelined asynchronously, a GPU runs the remaining layers. The GPU (NVIDIA RTX 2080) processes the second partition, with reduced data transfer between the data source (storage/camera) and the GPU. Furthermore, a Graph Neural Network (GNN)-based partition index prediction method is proposed to automate the partitioning of CNNs needed for Split Inference. Well-established models such as LeNet-5, ResNet18/50/101/152, VGG16, and MobileNetv2 are analyzed. Results demonstrate up to 2.48x latency improvement over DPU-only execution and up to 3.37x over GPU-only execution. The trained GNN model splits the layers between the appropriate devices with 96.27% accuracy.
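
The idea of choosing a partition index can be illustrated with a minimal sketch. It does not reproduce the paper's GNN predictor: it substitutes an exhaustive search over a toy latency model in which the two devices form a two-stage pipeline and the slower stage sets the per-frame interval. The function name `best_split`, the latency model and every timing below are made-up assumptions, not measurements from the paper.

```python
def best_split(dpu_ms, gpu_ms, transfer_ms):
    """Return (index, interval_ms). Layers [0, index) run on the DPU and the
    remaining layers on the GPU; transfer_ms[index] is the assumed cost of
    moving the tensor at that boundary to the GPU."""
    n = len(dpu_ms)
    best = None
    for k in range(n + 1):
        stage_dpu = sum(dpu_ms[:k])
        stage_gpu = transfer_ms[k] + sum(gpu_ms[k:])
        # In a two-stage pipeline the slower stage bounds the frame interval.
        interval = max(stage_dpu, stage_gpu)
        if best is None or interval < best[1]:
            best = (k, interval)
    return best

# Hypothetical per-layer times (ms) for a four-layer network.
dpu_ms = [1.0, 1.5, 2.0, 4.0]
gpu_ms = [0.75, 1.0, 1.25, 1.5]
# Hypothetical transfer cost at each of the five possible boundaries;
# index 0 means sending the raw input straight to the GPU.
transfer_ms = [3.0, 1.0, 0.5, 0.5, 0.25]

split_index, split_interval = best_split(dpu_ms, gpu_ms, transfer_ms)
gpu_only_interval = transfer_ms[0] + sum(gpu_ms)
dpu_only_interval = sum(dpu_ms)
```

With these made-up numbers the search selects index 2, with an interval of 3.25 ms against 7.5 ms for GPU-only and 8.5 ms for DPU-only. The gain in this toy model comes from the early layers shrinking the tensor before it is transferred.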