arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.00308 2026-06-02 cs.SE cs.AI cs.LG

How Generation Architecture Shapes Code Complexity in Multi-Agent LLM Systems: A Paired Study on HumanEval

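Nazmus Ashrafi

Affiliations: GitHub

AI summary: A paired experiment compares the code complexity of six multi-agent architectures on HumanEval and finds no positive relationship between architectural complexity and functional correctness; the leanest architectures match or beat the more elaborate ones on accuracy.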

Comments 16 pages, 7 figures, 7 tables

Abstract

Large-language-model code generation has shifted from single-shot prompting to multi-agent orchestrations - analyst, coder, tester, and debugger pipelines - and is evaluated almost exclusively on functional correctness. Whether these architectures also affect the structural complexity of the code they produce, and which orchestration layers carry the cost, remains largely unexamined: prior work has documented prompt-level effects on code complexity, but the architecture-level question is open. We compare six widely-used multi-agent configurations (Basic, AC, ACT, Debugger, AC+Debugger, ACT+Debugger) under two models from the GPT-4o family across all 164 HumanEval tasks - 1,968 paired observations - using the five RADON complexity metrics (SLOC, cyclomatic complexity, and Halstead Volume, Difficulty, and Effort). We apply a paired non-parametric statistical pipeline (Friedman omnibus, Wilcoxon signed-rank post-hoc with Holm correction, Kendall's $W$ and matched-pairs rank-biserial effect sizes) in both all-completions and passing-only conditions. The six architectures collapse into two indistinguishable complexity clusters separated by a 50-130% gap, the same partition in both models and under both conditions; among the architectural layers, the analyst-coder split inflates complexity, the runtime debugger does not - and on the analyst-coder background actively deflates it - and the tester re-inflates it. The heavy cluster's additional complexity buys no pass@1 advantage: the leanest architectures match or beat the heaviest on accuracy. Architectural elaboration in LLM code generation should therefore be justified by measured benefit on the dimensions that matter, not assumed.
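The Holm correction named in the abstract's post-hoc step can be sketched as follows. This is a minimal standard-library illustration, not the paper's pipeline: the function name `holm_adjust` and the three raw p-values are hypothetical, and the Friedman and Wilcoxon tests that would produce such p-values are not reproduced here.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is scaled by (m - k + 1); here rank = k - 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise signed-rank comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

Only the smallest raw p-value survives at the 0.05 level in this toy input, which shows why a family of pairwise architecture comparisons needs a correction before clusters are declared distinguishable.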

2606.00298 2026-06-02 math.NA cs.LG cs.NA cs.SY eess.SY math.DS math.OC

Symmetric Hermite quadrature-based balanced truncation for learning linear dynamical systems from derivative data

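Sean Reiter, Steffen W. R. Werner

Affiliations: New York University; Virginia Tech

AI summary: Proposes a symmetric Hermite quadrature-based balanced truncation algorithm that builds linear reduced-order models from transfer-function and derivative data while preserving state-space Hermiticity and asymptotic stability.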

Comments 14 pages, 2 figures, 4 tables

Abstract

Data-driven reduced-order modeling is an essential component in the computer-aided design of control systems. In this work, we present a novel symmetric Hermite formulation of the quadrature-based balanced truncation algorithm that constructs linear reduced-order models from evaluations of the full-order system's transfer function and its derivative. Significantly, the Hermite formulation preserves desirable qualitative properties of the system used to generate the data, such as state-space Hermiticity and, consequently, asymptotic stability.

2606.00297 2026-06-02 eess.SY cs.RO cs.SY

Predicted-Flow Control Barrier Functions for Real-Time Safe Optimal Control

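Amirsaeid Safari, Jesse B. Hoagg

Affiliations: Department of Mechanical and Aerospace Engineering, University of Kentucky

AI summary: Proposes predicted-flow control barrier functions (P-CBFs), which generalize the CBF to a functional of a predicted flow; combined with a terminal candidate P-CBF and a planning-time shift, they yield a safety certificate over a finite prediction horizon and unify finite-horizon integral-cost optimization with safety certification.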

Abstract

Control barrier functions (CBFs) provide real-time safety guarantees through pointwise conditions on the state. However, synthesizing a valid CBF is difficult and the resulting controllers are myopic. To address myopia, this article introduces predicted-flow control barrier functions (P-CBFs), which generalize the CBF from a function of the current state to a functional of a predicted flow under a parametrized control plan over a finite prediction horizon. For safety, a P-CBF can certify that the predicted flow is in a safe set over the entire prediction horizon. However, candidate P-CBFs suffer from the same challenge as candidate CBFs, namely, control constraints make it difficult to guarantee that the P-CBF is valid. This article resolves this challenge by introducing a terminal candidate P-CBF requiring that the predicted flow end in a backup safe set at the terminal time, and a planning-time shift that modulates the prediction horizon, providing an additional degree of freedom to ensure feasibility. The real-time control and the evolution of the control-plan parameter and planning-time shift are determined jointly by a single convex optimization that is guaranteed to be feasible and renders the associated safe set forward invariant. The resulting safe optimal flow control provides a safety certificate over the entire prediction horizon and unifies finite-horizon integral-cost optimization with safety certification. This optimization reduces to a quadratic program (QP) if the control constraints are a convex polytope. The QP implementation, termed FlowBarrier, is validated on a nonholonomic ground robot navigating a dense environment. FlowBarrier is compared to nonlinear model predictive control and two CBF-based safety filter methods across 100 trials, where FlowBarrier achieves the highest goal-reaching rate, zero safety violations, and the lowest computation time.
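For orientation, the pointwise (myopic) CBF safety filter that the abstract takes as its starting point can be sketched in one dimension. This is the classical CBF condition only, not the P-CBF or FlowBarrier method; the single-integrator dynamics, the barrier `h(x) = x_max - x`, and all parameter values are assumptions chosen so that the QP has a closed-form solution.

```python
def cbf_filter(x, u_des, x_max, alpha):
    """Closest control to u_des satisfying the CBF condition for h(x) = x_max - x.

    For dynamics dx/dt = u, the condition dh/dt >= -alpha * h becomes
    u <= alpha * (x_max - x), so the one-dimensional QP
    min (u - u_des)^2 has the closed-form solution below.
    """
    return min(u_des, alpha * (x_max - x))


def simulate(x0, u_des, x_max=1.0, alpha=2.0, dt=0.01, steps=500):
    """Forward-Euler rollout of the filtered closed loop; returns the states."""
    xs = [x0]
    for _ in range(steps):
        u = cbf_filter(xs[-1], u_des, x_max, alpha)
        xs.append(xs[-1] + dt * u)
    return xs


# The desired control pushes toward the boundary; the filter slows it down.
trajectory = simulate(x0=0.0, u_des=1.0)
```

The filter reacts only to the current state, which is the myopia the abstract refers to; a predicted-flow formulation instead constrains a whole planned trajectory.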

2606.00296 2026-06-02 stat.ML cs.LG math.AP

Is Zero-Shot Super-Resolution Possible in Operator Learning?

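Unique Subedi, Ambuj Tewari

AI summary: A systematic study of zero-shot super-resolution in operator learning: shows that it can be information-theoretically impossible, identifies Hölder smoothness of the output functions as a sufficient condition, and derives generalization bounds.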

Abstract

Neural operators are often reported to exhibit zero-shot super-resolution, a phenomenon in which a model trained on coarse grids produces accurate predictions on finer testing grids without additional retraining. Despite strong empirical evidence, the theoretical foundations of this phenomenon remain unclear. In this work, we provide a systematic theoretical study of zero-shot super-resolution in operator learning. We first show that zero-shot super-resolution can be information-theoretically impossible even in benign settings such as when the input functions are available over the entire continuum and the ground truth is a simple rank-one linear operator. We then identify Hölder smoothness of the output functions as a sufficient condition for zero-shot super-resolution and derive corresponding generalization bounds. Finally, we also validate the identified failure modes through experimental results.
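A toy illustration of why coarse-grid data alone may not determine fine-grid behavior, and why a smoothness assumption can help. This is not the paper's construction (which concerns a rank-one linear operator); the grid size and the two functions are assumptions picked so that they agree on every coarse grid point yet differ maximally in between.

```python
import math


def grid(n):
    """n + 1 equally spaced points on [0, 1]."""
    return [i / n for i in range(n + 1)]


def f(x):
    """The zero function."""
    return 0.0


def g(x, n=8):
    """An oscillating function that vanishes at every point of grid(n)."""
    return math.sin(math.pi * n * x)


n = 8
coarse = grid(n)
fine = grid(2 * n)

# Largest disagreement between the two candidate outputs on each grid.
gap_on_coarse = max(abs(f(x) - g(x, n)) for x in coarse)
gap_on_fine = max(abs(f(x) - g(x, n)) for x in fine)
```

On the coarse grid the two outputs are indistinguishable up to rounding, while on the refined grid they differ by about 1. Bounding the Hölder constant of admissible outputs limits how far such an oscillation can rise between coarse grid points, which is the intuition behind a smoothness-based sufficient condition.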

2606.00291 2026-06-02 cs.GT cs.LG

The Representation-Rationalizability Tradeoff in Reward Learning

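Jing Dong, Yaoliang Yu, Pascal Poupart

Affiliations: Vector Institute; University of Waterloo

AI summary: Studies the tradeoff between representation and rationalizability in RLHF reward learning: decomposing the cross-entropy loss into a representational term and an aggregation term shows that a richer representation exposes more comparisons that cannot be rationalized, and that joint training is not guaranteed to reach the best balance.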

Abstract

In RLHF, each training example contains a prompt $x$ and two candidate responses $y,y'$, and annotators provide pairwise preferences between these responses. The learning problem is to convert these heterogeneous pairwise judgments into a single scalar reward $r(x,y)$ that measures response quality for each prompt. Classical social choice implies an impossibility because heterogeneous annotator samples can induce pooled preferences with Condorcet cycles, so no scalar reward can evaluate all compared response pairs consistently. A growing literature analyzes RLHF as a social-choice problem, but usually assumes a fixed finite set of alternatives, i.e., a pre-enumerated finite set of candidate responses for each prompt. Modern pipelines instead score responses through a learned representation $ϕ(x,y)$ before a scalar head, so $ϕ$ determines which responses are treated as distinguishable alternatives and which comparisons are visible to the reward model. Once this embedding is part of the problem, the impossibility results from social choice theory become a tradeoff. We show that the excess cross-entropy loss of any reward built on $ϕ$ decomposes exactly into a representational term, which a richer $ϕ$ shrinks, and an aggregation term, which a richer $ϕ$ enlarges by exposing more comparisons that no scalar can rank consistently. The same results extend to direct preference optimization (DPO), and jointly training the embedding and the reward is not guaranteed to recover the sweet spot of this tradeoff. Experiments on synthetic data and real preference datasets corroborate our results.
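The Condorcet cycle behind the impossibility can be checked by brute force. A minimal sketch, assuming three hypothetical annotators and three responses `A`, `B`, `C` for one prompt; it pools pairwise preferences by majority and then searches for any strict ordering (equivalently, any tie-free scalar reward) that agrees with all of them.

```python
from itertools import permutations

# Three hypothetical annotators ranking responses from best to worst.
rankings = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
responses = ("A", "B", "C")


def majority_prefers(a, b):
    """True if a strict majority of annotators rank a above b."""
    votes = sum(r.index(a) < r.index(b) for r in rankings)
    return votes > len(rankings) / 2


# Pooled pairwise preferences: every ordered pair won by a majority.
pooled = [(a, b) for a in responses for b in responses
          if a != b and majority_prefers(a, b)]


def consistent_orderings():
    """All strict orderings that agree with every pooled preference."""
    result = []
    for order in permutations(responses):
        if all(order.index(a) < order.index(b) for a, b in pooled):
            result.append(order)
    return result


orderings = consistent_orderings()
```

The pooled preferences form the cycle A over B, B over C, C over A, so the search returns no ordering. A representation that cannot distinguish two of the three responses would hide part of the cycle, which is one way to read the tradeoff the abstract describes.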

2606.00282 2026-06-02 cs.IR cs.AI

Synthetic Data from Cross-Domain Events for Large-Scale Recommendation Systems

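Xiangyu Wang, Yawen He, Shivendra Pratap Singh, Han Huang, Mengtong Hu, Sharath Ciddu, Yi-Hsuan Hsieh, Erik Groving, Yi Ding, Jieming Di, Tony Wang, Min Yun, Xiaoyu Chen, Ling Leng, Rob Malkin

Affiliations: Meta

AI summary: Proposes the SCALR framework, which generates synthetic user-item interaction events for a target domain from source-domain events to mitigate data sparsity and noisy feedback, and reports statistically significant improvements in online A/B tests on an industrial recommendation platform.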

Comments 13 pages, 3 figures

Abstract

Large-scale recommendation systems operate across diverse domains, yet they face the challenges of data sparsity and noisy implicit feedback. Traditional approaches mitigate this via model-specific knowledge distillation from source domains to a target domain. Inspired by the transformative success of synthetic data generation in large language models (LLMs), we introduce Synthetic Cross-domain Augmentation and Learning for Recommendation (SCALR), a framework that generates synthetic user-item interaction events for a target recommendation domain by leveraging observed events from a source domain. SCALR decomposes cross-domain learning into two modular stages. First, it translates observed user events in source domains by framing event generation as estimating the likelihood that a user would interact with a target-domain item, conditioned on their observed interactions in a source domain. Second, downstream models train on these synthetic events as cross-domain learning objectives, where the synthetic events augment the target domain's training data in a model-agnostic manner. Our approach yields statistically significant improvements in online A/B tests on an industrial recommendation platform. To the best of our knowledge, this is among the first works to explicitly frame cross-domain event transfer as synthetic data generation for recommendation systems.

2606.00281 2026-06-02 physics.ao-ph cs.LG

Flow Matching for Convective-Scale Precipitation Downscaling

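Tom Wetherell

Affiliations: Met Office

AI summary: For convective-scale precipitation downscaling, proposes a flow matching generative model that outperforms a diffusion model on spatial skill but underestimates the upper tail of the precipitation distribution, giving a dry bias in the climatological mean.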

Abstract

Generative machine learning is an increasingly important complement to dynamical downscaling for producing high-resolution precipitation projections, with diffusion models currently the leading approach. Flow matching is a related generative framework that has recently achieved strong results across image, video and other domains, and shown early promise for downscaling. We train a flow matching model to map daily precipitation from 8 km to 2 km over a convective-scale domain centred on Singapore, and benchmark it against CPMGEM, a score-based diffusion model. Flow matching achieves consistently better spatial skill: higher fractions skill score at every precipitation threshold and neighbourhood scale tested, and tighter structure and amplitude components of the SAL score with comparable location skill. However, flow matching underestimates the upper tail of the precipitation distribution, resulting in a dry bias in the climatological mean. These results suggest that flow matching is a competitive generative framework for convective-scale precipitation downscaling, particularly well suited to capturing spatial structure.
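The fractions skill score used as the headline spatial metric can be sketched with the standard library. This is an illustrative variant, not the paper's evaluation code: square neighbourhoods are clipped at the domain edge (an assumption; zero padding is another common choice), and the 4 by 4 fields and the 1.0 threshold are made up.

```python
def neighbourhood_fractions(field, threshold, half_width):
    """Fraction of cells at or above threshold in a square window per cell.

    Windows are clipped at the domain edge (an assumption of this sketch).
    """
    rows, cols = len(field), len(field[0])
    fractions = []
    for i in range(rows):
        row = []
        for j in range(cols):
            cells = [field[a][b] >= threshold
                     for a in range(max(0, i - half_width),
                                    min(rows, i + half_width + 1))
                     for b in range(max(0, j - half_width),
                                    min(cols, j + half_width + 1))]
            row.append(sum(cells) / len(cells))
        fractions.append(row)
    return fractions


def fss(forecast, observed, threshold, half_width):
    """Fractions skill score: 1 is a perfect match, 0 is no overlap."""
    pf = neighbourhood_fractions(forecast, threshold, half_width)
    po = neighbourhood_fractions(observed, threshold, half_width)
    pairs = [(f, o) for rf, ro in zip(pf, po) for f, o in zip(rf, ro)]
    mse = sum((f - o) ** 2 for f, o in pairs)
    reference = sum(f ** 2 + o ** 2 for f, o in pairs)
    return 1.0 - mse / reference if reference > 0 else float("nan")


# One wet cell in each field, displaced by a single grid point.
observed = [[0.0] * 4 for _ in range(4)]
forecast = [[0.0] * 4 for _ in range(4)]
observed[1][1] = 5.0
forecast[1][2] = 5.0

score_gridscale = fss(forecast, observed, threshold=1.0, half_width=0)
score_neighbourhood = fss(forecast, observed, threshold=1.0, half_width=1)
```

At grid scale the displaced cell scores zero, while a 3 by 3 neighbourhood gives partial credit, which is why the abstract reports the score across several neighbourhood scales and thresholds.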

2606.00266 2026-06-02 cs.NI cs.LG

KISS: Keeping it Simple and Slotted when Learning to Communicate over Wireless

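Kamil Szczech, Maksymilian Wojnar, Krzysztof Rusek, Katarzyna Kosek-Szott, Szymon Szott

Affiliations: AGH University of Krakow

AI summary: Uses an off-policy Double Deep Q-Network with Bayesian inference to train distributed agents on a slotted channel to learn random-access strategies autonomously, achieving fair access at near-theoretical efficiency; the learned behavior resembles slotted ALOHA with a dynamically adjusted transmission probability.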

Abstract

A long-standing challenge in distributed wireless systems is ensuring efficient and fair random channel access. Existing solutions often address specific constraints related to timing, periodicity, or centralization, but they typically rely on fixed heuristics. Motivated by recent advances in machine learning (ML), we investigate whether ML agents can autonomously learn efficient and fair access strategies, and whether such learning can offer new insights into medium access control (MAC) design. Rather than proposing a deployable protocol, our aim is to examine whether decentralized learning can rediscover or approximate theoretically efficient random-access mechanisms under minimal assumptions. To this end, we deploy an off-policy Double Deep Q-Network (DDQN) with Bayesian inference to train agents operating over a slotted channel. The resulting method is fully online (no pre-training), fully distributed (independent multi-agent learners), stochastic (non-periodic), and requires no coordination or explicit communication. Extensive simulations show that the learned strategy adapts to varying network conditions and achieves near-theoretical efficiency while maintaining fairness. Ablation studies further reveal that the learned behavior resembles slotted ALOHA with a dynamically adjusted transmission probability, leading us to refer to the method as KISS: Keeping It Simple and Slotted.
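The slotted ALOHA reference point that the learned behavior is compared with can be worked out directly. A minimal sketch of the textbook model, assuming `n` saturated stations that each transmit independently with probability `p` in every slot; the station count and the search grid are illustrative, and nothing here reproduces the paper's DDQN agents.

```python
import math


def slot_success_probability(n, p):
    """Probability that exactly one of n stations transmits in a slot."""
    return n * p * (1 - p) ** (n - 1)


n = 10
# Search a grid of transmission probabilities for the best throughput.
grid = [k / 1000 for k in range(1, 1000)]
best_p = max(grid, key=lambda p: slot_success_probability(n, p))
best_throughput = slot_success_probability(n, best_p)

# Large-n limit of the optimum, reached with p = 1 / n.
asymptotic_limit = 1 / math.e
```

The optimum sits at `p = 1 / n`, so an efficient policy has to adapt its transmission probability to the number of contending stations, consistent with the dynamically adjusted probability the ablation study describes.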

2606.00263 2026-06-02 eess.SP cs.LG

ReFLEX: Length-Generalizable CSI Denoising for MIMO-OFDM via Relative-Frequency Bias

Zhibin Zhang, Robert Potekhin, Ziwei Wan, Vladimir Lyashev, Zhen Gao

Affiliations: Moscow Institute of Physics and Technology (State University); Yangtze Delta Region Academy; Beijing Institute of Technology; School of Interdisciplinary Science

AI summary: Proposes ReFLEX, a length-generalizable Transformer based on a relative-frequency position bias (RFPB), for CSI denoising in MIMO-OFDM systems with variable RB allocations; it applies to unseen RB lengths and sparse DM-RS settings without retraining and improves performance in 3GPP channel and NR PUSCH simulations.

Comments 5 pages, 3 figures, submitted to IEEE journal

Abstract

This letter studies CSI denoising for MIMO-OFDM with variable NR resource block (RB) allocations. ReFLEX is a length-generalizable Transformer whose frequency attention uses a relative-frequency position bias (RFPB) generated from subcarrier offsets. A single checkpoint handles unseen RB lengths and can be applied to sparse DM-RS observations in the tested RB5/RB10 PUSCH setup without retraining. In a 3GPP TR 38.901 UMa NLOS channel, ReFLEX achieves about $-9.6$ dB NMSE on unseen RB lengths. In NR PUSCH/UL-SCH simulations, ReFLEX denoising followed by time-frequency interpolation reduces the 10% BLER threshold by about 2-3 dB.
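Why a bias computed from subcarrier offsets generalizes across allocation lengths can be sketched as follows. The function `offset_bias` and its linear decay are hypothetical stand-ins (the letter's RFPB generator is not specified in the abstract); the point is only that the bias depends on `i - j`, so the same parameters yield a valid attention bias for any number of subcarriers.

```python
import math


def offset_bias(offset, decay=0.1):
    """Hypothetical bias depending only on the subcarrier offset."""
    return -decay * abs(offset)


def bias_matrix(num_subcarriers):
    """Additive attention bias B[i][j] = offset_bias(i - j) for any length."""
    return [[offset_bias(i - j) for j in range(num_subcarriers)]
            for i in range(num_subcarriers)]


def softmax(row):
    """Numerically stable softmax of one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores):
    """Row-wise softmax of content scores plus the relative-offset bias."""
    bias = bias_matrix(len(scores))
    return [softmax([s + b for s, b in zip(score_row, bias_row)])
            for score_row, bias_row in zip(scores, bias)]


# The same bias function serves two allocation sizes (12 subcarriers per RB).
short = bias_matrix(5 * 12)
long = bias_matrix(10 * 12)
weights = attention_weights([[0.0] * 4 for _ in range(4)])
```

An absolute position embedding would need entries for positions never seen in training, whereas an offset-based bias reuses the same values at every position.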

2606.00235 2026-06-02 physics.soc-ph cs.AI cs.CY cs.MA

Civilizational Metamaterials: Engineering Coordination Under Capability Gradients and Structural Turbulence

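David Orban

Affiliations: Independent Researcher

AI summary: Inspired by the physics of metamaterials, proposes a formal framework for turning governance from a normative discipline into an engineering discipline; an effective coordination coefficient model predicts a phase transition between self-healing and self-destabilizing regimes, and the paper derives testable hypotheses and an experimental protocol.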

Comments 19 pages, 4 figures. Accepted for presentation at AGI-26 (Springer LNAI, forthcoming). v2 corrects the sign of the synergy term in the constitutive law (Eq. 2) and reformulates H3 as a threshold-crossing claim, per peer review

Abstract

We argue that governance must transition from a normative discipline to an engineering discipline, and we develop a formal framework, inspired by the physics of metamaterials, to make this transition quantitative and testable. Artificial General Intelligence affects civilization primarily by increasing decision velocity while human verification capacity remains bounded. When the cost of validating AI-generated outputs exceeds the expected utility of acting on them, rational agents default to inaction: a stable but catastrophic Nash equilibrium we term the Freezing Equilibrium. Drawing on metamaterials, where emergent macro-properties arise from designed microstructure, we develop a phenomenological constitutive law for institutional coordination: $R_{\mathrm{eff}} = β\cdot (1-ρ) \cdot (1-τ) \cdot (1-γρτ)$, where $β$ is the decision branching factor, $ρ$ is provenance fidelity, $τ$ is the verification rate, and $γ\in [0,1]$ captures correlated-detection synergy between provenance and verification failures. The model predicts a sharp phase transition between self-healing ($R_{\mathrm{eff}} < 1$) and self-destabilizing ($R_{\mathrm{eff}} > 1$) regimes. We introduce a three-class provenance taxonomy: cryptographic, institutional, and context binding, and derive four falsifiable hypotheses with a proposed 12-week stepped-wedge cluster-randomized trial in government grant review panels. The framework bridges AI alignment theory and institutional design.
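The constitutive law can be evaluated directly from the formula as written in the abstract. A minimal sketch: the parameter values are illustrative only, and the label `"critical"` for the boundary case `R_eff = 1` is a name introduced here, not one the abstract uses.

```python
def r_eff(beta, rho, tau, gamma):
    """Effective coordination coefficient from the abstract's formula."""
    return beta * (1 - rho) * (1 - tau) * (1 - gamma * rho * tau)


def regime(value):
    """Classify by the threshold R_eff = 1 stated in the abstract."""
    if value < 1:
        return "self-healing"
    if value > 1:
        return "self-destabilizing"
    return "critical"  # hypothetical label for the boundary case


# Illustrative parameter choices around the threshold.
boundary = r_eff(beta=4, rho=0.5, tau=0.5, gamma=0.0)
with_synergy = r_eff(beta=4, rho=0.5, tau=0.5, gamma=1.0)
high_branching = r_eff(beta=8, rho=0.5, tau=0.5, gamma=0.0)
```

With these inputs the synergy term alone moves the system from the boundary into the self-healing regime, while doubling the branching factor pushes it into the self-destabilizing one.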

2606.00220 2026-06-02 cs.PL cs.AI

SEMBridge: Tagless-Final Program Semantics with Weakest-Precondition and Bounded-Checking Interpretations

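Eric Liang

Affiliations: Oracle

AI summary: Proposes the SEMBridge framework, which uses the tagless-final style to generate executable semantics, weakest-precondition verification conditions, and bounded checks from a single definition, keeping the three synchronized.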

Abstract

Formal methods provide rigorous accounts of program behavior, but practical software engineering often works through executable libraries, tests, and incremental design. This paper presents SEMBridge, a small tagless-final framework for generating weakest-precondition and bounded-checking interpretations from the same executable object programs. Instead of committing a program semantics to one abstract syntax tree and then writing separate traversals, object programs are written once against a semantic interface and interpreted into multiple meanings: readable code, concrete execution, predicate transformers, bounded counterexample search, and future proof-assistant or SMT back ends. The Python prototype implements a loop-free imperative core with assignments, conditionals, assumptions, and assertions. Across five example programs, the same tagless-final definitions generated executable state transformers and verification conditions that passed bounded checking over domains up to 729 states. The contribution is not a Scala code-generation system or a new verifier, but a compact architecture for keeping executable semantics, weakest-precondition artifacts, and bounded validation synchronized.
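The tagless-final idea can be sketched in a few lines of Python. This is an independent illustration, not SEMBridge's API: the class names `Exec` and `WP`, the method names, and the example programs are all hypothetical, and expressions are plain Python lambdas over a state dictionary (a simplification). One object program is written against an interface and then interpreted both as a state transformer and as a weakest-precondition predicate transformer, which a bounded search then checks.

```python
from itertools import product


def absolute_value(s):
    """Object program: if x < 0 then x := -x; assert x >= 0."""
    return s.seq(
        s.cond(lambda st: st["x"] < 0,
               s.assign("x", lambda st: -st["x"]),
               s.skip()),
        s.assert_(lambda st: st["x"] >= 0),
    )


def unguarded(s):
    """Object program that asserts x >= 0 without establishing it."""
    return s.assert_(lambda st: st["x"] >= 0)


class Exec:
    """Concrete execution: each term denotes a state -> state function."""
    def skip(self): return lambda st: st
    def assign(self, var, expr): return lambda st: {**st, var: expr(st)}
    def seq(self, a, b): return lambda st: b(a(st))
    def cond(self, test, a, b): return lambda st: a(st) if test(st) else b(st)
    def assert_(self, pred):
        def run(st):
            if not pred(st):
                raise AssertionError("assertion failed")
            return st
        return run


class WP:
    """Weakest precondition: each term maps a postcondition to a precondition."""
    def skip(self): return lambda post: post
    def assign(self, var, expr):
        return lambda post: (lambda st: post({**st, var: expr(st)}))
    def seq(self, a, b): return lambda post: a(b(post))
    def cond(self, test, a, b):
        return lambda post: (lambda st: a(post)(st) if test(st) else b(post)(st))
    def assert_(self, pred):
        return lambda post: (lambda st: pred(st) and post(st))


def bounded_check(predicate, domain):
    """Return the first state in the finite domain violating predicate, else None."""
    for values in product(*domain.values()):
        st = dict(zip(domain.keys(), values))
        if not predicate(st):
            return st
    return None


domain = {"x": range(-4, 5)}
run = absolute_value(Exec())
good = bounded_check(absolute_value(WP())(lambda st: True), domain)
bad = bounded_check(unguarded(WP())(lambda st: True), domain)
```

Because both interpreters consume the same definition, a change to the object program changes the executable meaning and the verification condition together, which is the synchronization property the abstract emphasizes.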

2606.00219 2026-06-02 astro-ph.CO astro-ph.GA cs.LG

21cmEMUv3: a hybrid diffusion-LSTM emulator of 21cmFAST summary observables

Daniela Breitman, Andrei Mesinger, Steven G. Murray, Ivan Nikolic, Roberto Trotta

Affiliations: Research Center for the Early Universe, Graduate School of Science, The University of Tokyo; Department of Physics, Graduate School of Science, The University of Tokyo; Scuola Normale Superiore (SNS), Piazza dei Cavalieri 7, Pisa; Physics Department, Stellenbosch University; Cosmic Dawn Center (DAWN); Niels Bohr Institute, University of Copenhagen; SISSA, Via Bonomea 265, 34136 Trieste; INFN Sezione di Trieste; Centro Nazionale di Ricerca in High Performance Computing, Big Data e Quantum Computing; Physics Department, Blackett Lab, Imperial College London

AI summary: Proposes 21cmEMUv3, a hybrid diffusion-LSTM emulator trained on 21cmFASTv3 simulations that emulates seven summary observables, including the 21cm power spectrum, with high accuracy, and uses it to reinterpret HERA upper limits and forecast SKA detection capability.

Comments 12 pages, 6 figures

Abstract

We are witnessing a surge in observations of the cosmic dawn (CD) and epoch of reionisation (EoR), driving an increasing demand for fast and robust theoretical interpretation frameworks. In response, machine learning (ML), and emulation in particular, has emerged as a powerful approach to accelerate and enhance inference pipelines. In this work, we present 21cmEMUv3, an emulator trained on 21cmFASTv3 simulations that model both atomically and molecularly cooling galaxies. 21cmEMUv3 is conditioned on $σ_8$ and ten astrophysical parameters to produce seven summary observables: (i) the cylindrical 21cm power spectrum (PS), emulated for the first time at such high resolution and accuracy across a wide redshift range of $z \sim 6$-$30$; (ii) the spherically-averaged 21cm PS; (iii) the mean neutral fraction of the intergalactic medium (IGM); (iv) the mean 21cm spin temperature; (v) the global 21cm signal; (vi) the ultraviolet (UV) luminosity functions (LFs); and (vii) the Thomson scattering optical depth. Notably, the cylindrical 21cm PS is emulated via score-based diffusion, while the remaining six summaries are emulated via long short-term memory (LSTM) networks, all achieving sub-percent median accuracy. We use the emulator to reinterpret current 21cm PS upper limits from HERA, for the first time using state-of-the-art hydrodynamical simulations to inform priors on star formation inside molecularly cooling galaxies. We find that our inferred soft-band X-ray luminosity per unit star formation rate is consistent with extrapolations of high-mass X-ray binaries to the low-metallicity regimes expected in the first galaxies, excluding values below $10^{39.2}\,\mathrm{erg\,s^{-1}}\,M_\odot^{-1}\,\mathrm{yr}$ at $95\%$ confidence. Finally, we produce forecasts for the detection of the cosmic 21cm PS with the Square Kilometre Array for different array configurations. The 21cmEMU package is publicly available.

2606.00170 2026-06-02 cs.HC cs.AI cs.CV

UF-AMA: A unified framework for cross-domain emotion recognition via adaptive multimodal alignment

Zheng Wang, Shuo Wang, Junhong Wang

Affiliations: Institute of Advanced Technology, University of Science and Technology of China; Department of Electronic Engineering and Information Science, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

AI summary: Proposes UF-AMA, a unified framework that uses adaptive multimodal alignment and a confidence-aware screening mechanism to address distribution shift in cross-subject and cross-session emotion recognition from physiological signals, achieving state-of-the-art performance on the SEED and SEED-IV datasets.

Abstract

In recent years, emotion recognition based on physiological signals such as electroencephalogram (EEG) has gained considerable attention, as internal physiological data offer greater objectivity and reliability compared to external behavioral data like facial expressions. However, due to distribution shifts caused by individual and contextual differences, along with variations in sample quality across modalities, constructing a cross-domain multimodal emotion recognition model with high generalization and robustness remains a key challenge. In this study, we propose a Unified Framework with Adaptive Multimodal Alignment (UF-AMA) to address cross-subject and cross-session emotion recognition using multimodal physiological signals. First, we construct a cross-modal feature fusion network comprising Transformer encoders and multi-head cross-attention modules, enabling the deep integration of EEG signals and eye-tracking data. Subsequently, we introduce a confidence-aware screening mechanism that dynamically assesses the predictive reliability of each modality branch on target domain samples, partitions samples into different quality subsets, and accordingly applies global consistency alignment and cross-modal distillation. Finally, we propose a multi-level domain adaptation framework that jointly optimizes the marginal and conditional distributions of both local modality-specific and global fusion features, thereby reducing cross-domain distribution shifts at multiple granularities. Extensive experiments on the SEED and SEED-IV datasets demonstrate that UF-AMA achieves state-of-the-art (SOTA) performance in both cross-subject and cross-session tasks. The source code is available at: https://github.com/BetterCoderLab/UF-AMA.

2606.00161 2026-06-02 cs.CR cs.AI cs.LG

Improving IoT Intrusion Detection Through SMOTE-Based Oversampling and Extended Multi-Model Evaluation on Side-Channel Power Data

Muhammad Khuram Shahzad, Haseeb Khan, Muhammad Masood Khan, Mubashra Bibi

Affiliations: School of Electrical Engineering and Computer Science (SEECS), NUST

AI summary: To address severe class imbalance in an IoT side-channel dataset, balances the data with SMOTE oversampling and evaluates eight machine learning models; Random Forest and Extra Trees exceed the baseline method on F1 score, and the study highlights the importance of the macro-F1 metric.

Comments 8 pages, 14 figures; code and results publicly available

Abstract

Intrusion detection in IoT networks poses challenges that cannot be overcome using traditional machine learning methods. Perhaps the largest is class imbalance in the side-channel dataset, where the ratio of normal-class samples to attack samples can reach 75,964 to 1. Dominguez et al. addressed power-based intrusion detection with a proof of concept, but they neither attempted to handle the imbalance nor assessed classifier performance on a balanced training set. This paper handles both. First, the Synthetic Minority Oversampling Technique (SMOTE) was applied to all nine possible datasets extracted from the initial one, providing an exact imbalance ratio of 1.1 for each. Then eight algorithms (Random Forest, HistGradientBoosting, LightGBM, Extra Trees, XGBoost, k-Nearest Neighbors, Multi-Layer Perceptron, and Decision Tree) were trained under identical conditions on the SMOTE-balanced 6-hour dataset. Random Forest reached a micro-averaged F1 score of 0.9989 and a macro F1 of 0.9794, outperforming the previous best micro-F1 of 0.9983, obtained by the Time Series Forest algorithm in the base paper. Extra Trees matched this performance while running 10 times faster. Reporting macro-F1 explicitly, in contrast to the base paper's assessment, reveals important class-level information that aggregate performance metrics miss. Per-class recall rates from confusion matrices, F1 heatmaps, and ROC curves show that minority attack classes, especially those with combined M+L infections, are detected reliably only with SMOTE balancing. Feature importance analysis indicates that the most recent time steps are the most important predictors among the 60 steps in a power window.
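Why macro-F1 exposes what micro-F1 hides under class imbalance can be seen on a toy example. A minimal standard-library sketch, assuming a made-up two-class problem with 98 normal and 2 attack samples and a classifier that always predicts the majority class; none of the numbers come from the paper.

```python
def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1 pools the counts; macro-F1 averages per-class F1 equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# A classifier that never predicts the minority attack class.
y_true = ["normal"] * 98 + ["attack"] * 2
y_pred = ["normal"] * 100
micro, macro = f1_scores(y_true, y_pred)
```

The micro average stays high because it is dominated by the majority class, while the macro average drops below one half because the attack class contributes an F1 of zero with equal weight.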

2606.00160 2026-06-02 cs.CR cs.AI cs.CL

DataShield: Safety-degrading Data Filtering for LLM Benign Instruction Fine-Tuning

DataShield: 针对大语言模型良性指令微调的安全降级数据过滤

Junbo Zhang, Qianli Zhou, Xinyang Deng, Wen Jiang, Jie Pan, Jinbiao Zhu

发表机构 * nwpu.edu.cn(西北工业大学)

AI总结 提出DataShield方法,通过量化每个样本对模型合规行为的贡献作为安全降级分数,高效识别良性数据集中的安全降级样本,并在多个模型和数据集上验证有效性。

详情
AI中文摘要

大型语言模型(LLM)即使在使用良性数据集进行微调时,也会出现安全能力下降的问题。然而,现有识别良性数据集中安全降级样本的方法存在计算成本高和噪声显著的问题。在本文中,我们提出DataShield,以高效且有效地识别潜在的安全降级样本。我们的关键直觉基于以下观察:良性微调提高了LLM的整体响应合规性。DataShield的核心技术见解是将每个样本对模型合规行为的贡献量化为其安全降级分数。DataShield包含三个核心组件:(1)合规向量提取,捕获LLM的合规行为倾向;(2)一种新颖的合规感知分数(CAS),自动识别最优的安全关键层;(3)安全降级样本过滤,量化训练数据沿合规方向的投影偏移。在Llama3-8B、Llama3.1-8B和Qwen2.5-7B上使用Alpaca和Dolly良性数据集进行的广泛实验评估验证了我们的方法在识别高风险和低风险数据子集方面的有效性。我们还观察到,开放式问答更可能触发安全降级,且相应的响应往往更长。我们希望这项工作能为以数据为中心的防御方法提供新的见解。源代码可在https://github.com/ZJunBo/DataShield获取。

英文摘要

Large language models (LLMs) suffer from degraded safety capabilities even when fine-tuned with benign datasets. However, existing methods for identifying safety-degrading samples in benign datasets suffer from high computational costs and significant noise issues. In this paper, we propose DataShield to efficiently and effectively identify potential safety-degrading samples. Our key intuition is based on the observation that benign fine-tuning increases the overall response compliance of LLMs. DataShield's key technical insight is to quantify each sample's contribution to the model's compliance behavior as its safety degradation score. DataShield consists of three core components: (1) Compliance Vector Extraction, which captures the LLM's compliance behavior tendency; (2) a novel Compliance-Aware Score (CAS), which automatically identifies the optimal safety-critical layer; and (3) Safety-degrading Sample Filtering, which quantifies the projection shift of training data along the compliance direction. Extensive experimental evaluation on Llama3-8B, Llama3.1-8B, and Qwen2.5-7B using the Alpaca and Dolly benign datasets validates our method's effectiveness in identifying high-risk and low-risk data subsets. We also observe that open-ended question answering is more likely to trigger safety degradation, and corresponding responses tend to be longer. We hope this work can provide new insights into data-centric defense methods. The source code is available at: https://github.com/ZJunBo/DataShield.

2606.00158 2026-06-02 eess.IV cs.CV

Training-Free Continuous Bitrate Control for Scalable Image Coding for Humans and Machines

面向人类与机器的可扩展图像编码的无训练连续码率控制

Yui Tatsumi, Hiroshi Watanabe

发表机构 * University of Tokyo(东京大学)

AI总结 提出一种无训练的变码率可扩展图像编码框架,通过基于预测尺度值调整量化步长实现连续码率控制,同时保留机器层和增强层的高尺度信息。

详情
AI中文摘要

连续变码率压缩在实际应用中需求很高,但在面向人类和机器的可扩展图像编码中仍未得到充分探索。在本文中,我们提出了一种无训练的变码率可扩展图像编码框架。通过基于预测尺度值调整量化步长,所提出的方法实现了连续码率控制,同时保留了机器层和增强层中的高尺度信息。实验结果证明了所提出方法的有效性,并强调了两个层之间码率分配的重要性。

英文摘要

Continuous variable-rate compression is in high demand in real-world applications but remains underexplored in scalable image coding for humans and machines. In this paper, we propose a training-free variable-rate scalable image coding framework. By adjusting quantization steps based on predicted scale values, the proposed method achieves continuous bitrate control while preserving high-scale information in the machine and enhancement layers. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of bitrate allocation between the two layers.

2606.00157 2026-06-02 stat.ML cs.AI cs.LG math.PR

Interpreting FCDNNs via RG on Exponential Family

通过指数族上的重正化群解释全连接深度神经网络

Fuzhou Gong, Zigeng Xia

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 本文通过建立统计物理中重正化群方法与深度神经网络训练过程的对应关系,证明了对于指数族连续输入数据,全连接DNN训练后特征层输出的特征参数等于RG方法下的不动点,从而解释了DNN的特征提取能力。

Comments 18 pages, 2 figures

详情
AI中文摘要

我们考虑通过建立统计物理中的重正化群(RG)方法与深度神经网络(DNN)训练过程之间的对应关系,来建立深度学习的可解释性理论。我们已使用一维伊辛模型作为输入数据证明了所构建的关系。本文我们将结果推广到连续输入数据的情况,这是将该对应框架应用于真实数据的必要准备。为具有代表性,我们考虑指数族中的一类数据分布。我们证明,当全连接(FC)DNN的参数在训练后达到最优值时,DNN特征层输出的特征参数等于连续场RG方法下输入数据特征参数的不动点。这一结论表明,DNN的训练过程等价于对此类数据进行RG计算,因此网络能够像RG一样从输入数据中提取主要特征。此外,该等价性进一步验证了我们建立的对应框架,为DNN在真实数据上的卓越表现提供了解释。

英文摘要

We aim to establish an interpretability theory of deep learning by constructing a correspondence between the renormalization group (RG) method in statistical physics and the training process of deep neural networks (DNNs). We have proved the constructed relationship using the one-dimensional Ising model as the input data. In this paper we generalize our results to the case of continuous input data, which is a necessary preparation for applying the correspondence framework to real-world data. To be representative, we consider a class of data distributions in the exponential family. We prove that when the parameters of fully connected (FC) DNNs reach their optimal values after training, the characteristic parameters of the feature layer output of the DNNs are equal to the fixed points of the characteristic parameters of the input data under the RG method for continuous fields. This conclusion shows that the training process of DNNs is equivalent to an RG calculation on this kind of data, and therefore the network can extract the main features from the input data just as RG does. The equivalence also further validates the correspondence framework we have established, providing an explanation for the outstanding performance of DNNs on real-world data.

2606.00156 2026-06-02 eess.IV cs.AI

A physics-informed foundation model for quantitative diffusion MRI

一种用于定量扩散MRI的物理信息基础模型

Zihan Li, Jialan Zheng, Ziyu Li, Xun Yuan, Kasidit Anmahapong, Ziang Wang, Mingxuan Liu, Hongjia Yang, Yifei Chen, Zhuhao Wang, Yuhang He, Fang Chen, Rui Li, Huaiqiang Sun, Yi Liao, Congyu Liao, Yang Yang, Haibo Qu, Xue Zhang, Hongen Liao, Qiyuan Tian

发表机构 * School of Biomedical Engineering, Tsinghua University(清华大学生物医学工程系) Oxford Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford(牛津大学整合神经影像中心、FMRIB、临床神经科学系) Department of Radiology, West China Second University Hospital, Sichuan University(四川大学华西第二医院放射科) School of Biomedical Engineering and the Institute of Medical Robotics, Shanghai Jiaotong University(上海交通大学生物医学工程学院和医学机器人研究院) Department of Radiology, Institution of Radiology and Medical Imaging, West China Hospital, Sichuan University(四川大学华西医院放射科、放射医学与影像研究所) Department of Radiology and Biomedical Imaging, University of California San Francisco(加州大学旧金山分校放射科和生物医学影像系) Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine(斯坦福大学医学院精神病学与行为科学系)

AI总结 提出物理信息生成微结构网络(PIGMENT),通过零样本适应实现从稀疏数据中恢复可靠的定量扩散MRI参数映射。

详情
AI中文摘要

理解人脑需要获取其微观组织架构。扩散磁共振成像(MRI)提供了唯一非侵入性的活体全脑微结构窗口,但可靠的定量映射仍局限于需要密集采样和优化采集协议的专业研究环境。为解决这一差距,我们提出了一种物理信息生成微结构网络(PIGMENT),它学习人脑微结构的通用生成先验,并以零样本方式适应每个参与者的测量数据,以恢复特定主体的映射。PIGMENT在涵盖多个站点、供应商和场强的11375次扫描上训练,使得在来自五个独立中心的外部数据集上,能够对张量、峰度和NODDI模型进行可靠的定量映射。在传统拟合变得不可靠的情况下,它仍然有效,从极其稀疏的采集中恢复有意义的映射,同时支持下游的纤维追踪和结构连接映射。PIGMENT估计显示出强大的生物学有效性,从10倍加速扫描中保留了亚毫米级皮层微结构模式和早期儿童白质发育轨迹。此外,PIGMENT能够在成本效益高的低场系统上进行可靠的定量张量映射,并使用超快速临床协议提取肿瘤相关生物标志物。这些结果共同确立了PIGMENT作为一种物理信息基础模型,将定量扩散MRI扩展到传统上因过于稀疏、异质或临床受限而无法进行可靠分析的领域。

英文摘要

Understanding the human brain requires access to its microscopic tissue architecture. Diffusion magnetic resonance imaging (MRI) provides the only noninvasive window into whole-brain microstructure in vivo, yet reliable quantitative mapping remains confined to specialized research settings requiring dense sampling and optimized acquisition protocols. To address this gap, we present a physics-informed generative microstructure network (PIGMENT) that learns a universal generative prior of human brain microstructure and adapts it zero-shot to each participant's measured data to recover subject-specific maps. Trained on 11375 scans spanning multiple sites, vendors, and field strengths, PIGMENT enabled reliable quantitative mapping for tensor, kurtosis, and NODDI models across external datasets from five independent centers. It remains effective where conventional fitting becomes unreliable, recovering meaningful maps from extremely sparse acquisitions while supporting downstream tractography and structural connectivity mapping. PIGMENT estimates demonstrated strong biological validity, preserving submillimeter cortical microarchitectural patterns and early-childhood white matter developmental trajectories from 10-fold accelerated scans. Furthermore, PIGMENT enables reliable quantitative tensor mapping on cost-efficient low-field systems and the extraction of tumor-related biomarkers using ultra-fast clinical protocols. Together, these results establish PIGMENT as a physics-informed foundation model that extends quantitative diffusion MRI into regimes traditionally too sparse, heterogeneous, or clinically constrained for reliable analysis.

2606.00155 2026-06-02 cs.CR cs.AI

A Protocol-Language Model for Network Intrusion (Without Deep Packet Inspection)

一种用于网络入侵的协议语言模型(无需深度包检测)

Vivek Kumar Sharma

发表机构 * Palo Alto Networks(帕洛阿尔托网络)

AI总结 提出PLM-NIDS,利用RWKV-4状态空间模型将网络流作为语言处理,仅基于L3/L4包元数据检测攻击,无需深度包检测,实现零样本异常检测(PR-AUC=0.93)和加密协议透明处理。

Comments 20 pages. Research paper on Packet Language Models for Network Intrusion Detection Systems (Without Deep Packet Inspection). Code available on GitHub.

详情
AI中文摘要

现代网络入侵检测系统(NIDS)陷入结构性矛盾:承载最高威胁情报的协议恰恰是那些在TLS 1.3和QUIC下加密的协议,其中负载检测毫无用处。我们提出一个更简单的问题——如果攻击签名不在字节中,而在节奏中呢?——并通过将网络流视为一种语言来回答,该语言的语法完全由L3/L4包元数据编写:长度、到达间隔时间、TTL、TCP标志和哈希端口号。我们提出了PLM-NIDS,它依次证明了三个主张。(1)语法存在且可学习:在344,232个未标记的Monday流上训练的RWKV-4状态空间模型实现了0.204的因果LM验证损失,表明良性流量具有可预测的、统计一致的结构。(2)攻击违反此语法:在训练时使用零攻击标签的情况下,每流困惑度得分以PR-AUC=0.93清晰分离良性流和攻击流。(3)这种分离在架构上非平凡:在相同令牌序列上训练的LSTM退化为多数类预测器(ROC-AUC约0.50,通过始终预测“攻击”得到F1=0.91),证明RWKV的因果预训练提供了直接分类器无法获得的归纳偏置。监督微调进一步将PR-AUC提升至0.94,ROC-AUC提升至0.75,在校准操作阈值下精确率为97.7%。RWKV骨干的O(T)循环推理支持逐包流式处理而无需流缓冲,使PLM-NIDS在线速率下可操作。由于它仅读取IP/TCP/UDP头部,因此本质上是加密无关的:TLS 1.3、QUIC和未来的加密协议均被透明处理。

英文摘要

Modern network intrusion detection systems (NIDS) are caught in a structural contradiction: the protocols carrying the highest threat intelligence are precisely those encrypted under TLS 1.3 and QUIC, where payload inspection yields nothing. We ask a simpler question -- what if the attack signature is not in the bytes, but in the rhythm? -- and answer it by treating network flows as a language whose grammar is written entirely in L3/L4 packet metadata: length, inter-arrival time, TTL, TCP flags, and hashed port numbers. We present PLM-NIDS, which proves three claims in sequence. (1) The grammar exists and is learnable: a RWKV-4 state-space model trained on 344,232 unlabelled Monday flows achieves a causal LM validation loss of 0.204, demonstrating that benign traffic has predictable, statistically consistent structure. (2) Attacks violate this grammar: the per-flow perplexity score cleanly separates benign from attack flows with PR-AUC = 0.93 using zero attack labels at training time. (3) This separation is architecturally nontrivial: an LSTM trained on identical token sequences degenerates to a majority-class predictor (ROC-AUC approximately 0.50, F1 = 0.91 by always predicting "attack"), proving that RWKV's causal pre-training provides an inductive bias unavailable to direct classifiers. Supervised fine-tuning further raises PR-AUC to 0.94 and ROC-AUC to 0.75, with a precision of 97.7% at the calibrated operating threshold. The RWKV backbone's O(T) recurrent inference enables per-packet streaming without flow buffering, making PLM-NIDS operationally viable at line rate. Because it reads only IP/TCP/UDP headers, it is inherently encryption-agnostic: TLS 1.3, QUIC, and future encrypted protocols are handled transparently.
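The per-flow perplexity score mentioned in claim (2) can be sketched as follows. This is a minimal illustration that assumes a language model has already produced a natural-log probability for each packet token in a flow; the function names, the threshold value, and the example numbers are hypothetical and are not taken from PLM-NIDS.

```python
import math


def flow_perplexity(token_log_probs):
    """Perplexity of one flow: exp of the mean negative log-likelihood.

    token_log_probs holds the natural-log probability the model assigned
    to each observed token of the flow.
    """
    if not token_log_probs:
        raise ValueError("a flow needs at least one scored token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def is_anomalous(token_log_probs, threshold):
    """Flag a flow whose perplexity exceeds a calibrated threshold (assumed given)."""
    return flow_perplexity(token_log_probs) > threshold


# Hypothetical flows: the model predicts benign tokens well, attack tokens poorly.
benign_flow = [math.log(0.8)] * 6
attack_flow = [math.log(0.05)] * 6
print(flow_perplexity(benign_flow), flow_perplexity(attack_flow))
print(is_anomalous(attack_flow, threshold=5.0))
```

Because the score depends only on how surprising the token sequence is to a model trained on benign traffic, no attack labels are needed to compute it; labels are only needed to choose the threshold.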

2606.00152 2026-06-02 cs.CR cs.AI

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

PrivacyPeek: 审计基于LLM的智能体获取了什么,而不仅仅是它们说了什么

Mingxuan Zhang, Jiahui Han, Dadi Guo, Songze Li, Guanchu Wang, Na Zou, Dongrui Liu, Xia Hu

发表机构 * Shanghai Artificial Intelligence Laboratory(上海人工智能实验室) Southeast University(东南大学)

AI总结 提出PrivacyPeek基准,通过检查工具调用轨迹和探针诱导,评估基于LLM的智能体在获取阶段不必要的敏感信息泄露,发现该问题普遍存在且现有防御效果有限。

Comments 19 pages, 9 figures

详情
AI中文摘要

基于LLM的智能体正在快速发展,能够自主调用外部工具为用户完成多步骤任务。然而,智能体常常获取超出任务所需的敏感信息。现有的隐私基准审计智能体的响应或外部行为泄露了什么,但忽略了数据首次进入智能体上下文时的获取阶段。过度获取的信息只需一次粗心操作或一次攻击即可完全泄露。为了评估其普遍性,我们引入了PrivacyPeek,一个用于评估基于LLM的智能体获取阶段隐私泄露的基准,包含1,182个案例,涵盖7种获取行为和16个应用领域。具体来说,“获取检查”检查智能体的工具调用轨迹,包括其调用的工具和接收的数据,以检测其何时获取超出任务范围的敏感信息。然后,“探针诱导”发出后续探针,并衡量攻击者能够多容易地诱导出智能体已获取但未披露的敏感信息。我们在4个模型家族的10个基于LLM的智能体上的实验表明,不必要的敏感信息获取非常普遍。此外,我们观察到任务完成能力与获取阶段泄露之间存在相关性。提示级别的防御仅减少了获取阶段泄露的一小部分,大部分未被缓解。这些结果使得审计获取阶段的隐私既紧迫又必要。我们的数据集和代码可在https://github.com/Xuan269/PrivacyPeek-Resource获取。

英文摘要

LLM-based agents are rapidly advancing, autonomously invoking external tools to complete multi-step tasks for users. However, agents often acquire more sensitive information than the task requires. Existing privacy benchmarks audit what the agent's response or outgoing actions disclose, but overlook the acquisition stage where data first enters the agent's context. The over-acquired information is then one careless action or one attack away from an outright leak. To assess its prevalence, we introduce PrivacyPeek, a benchmark for evaluating acquisition-stage privacy leakage of LLM-based agents, with 1,182 cases across 7 acquisition behaviours and 16 application domains. Specifically, Acquisition Inspection examines the agent's tool-call trajectory, both the tools it invokes and the data it receives, to detect when it acquires sensitive information beyond the task scope. Probe Elicitation then issues a follow-up probe and measures how readily an attacker could elicit sensitive information the agent acquired but did not disclose. Our experiments on 10 LLM-based agents across 4 model families show that the unnecessary acquisition of sensitive information is widespread. In addition, we observe a correlation between the task-completion capability and acquisition-stage leakage. Prompt-level defences reduce only a small fraction of acquisition-stage leakage, leaving the majority unmitigated. These results make auditing acquisition-stage privacy both urgent and necessary. Our dataset and code are available at https://github.com/Xuan269/PrivacyPeek-Resource.

2606.00150 2026-06-02 cs.CR cs.AI

Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models

人格攻击:针对大型语言模型的增量记忆注入越狱攻击

Junyoung Park, Seongyong Ju, Sunghwan Park, Jaewoo Lee

发表机构 * Chung-Ang University(中央大学)

AI总结 提出一种基于记忆注入的越狱方法Persona Attack,通过逐步操纵模型上下文窗口,使模型在记忆积累中优先处理注入指令,从而绕过安全对齐,在特定配置下攻击成功率可达95%。

详情
AI中文摘要

随着大型语言模型为方便用户而不断发展,尽管在安全训练方面持续努力,但越狱攻击的脆弱性仍被不断报告。传统的越狱技术通常侧重于单次提示注入,忽略了模型记住对话流程和用户指令的能力。在本文中,我们提出了Persona Attack,一种基于记忆注入的越狱方法,通过逐步方法操纵模型的上下文窗口。将Persona Attack应用于多个广泛使用的LLM的实验结果表明,随着注入在记忆中的积累,模型越来越优先考虑这些指令,而不是其内部安全对齐机制。此外,我们的实验经验性地证明,攻击成功率不仅根据模型的记忆实现而变化,还取决于指令的组合,在特定指令配置下可达到95%。

英文摘要

As Large Language Models evolve for user convenience, vulnerabilities to jailbreak attacks continue to be reported despite ongoing efforts in safety training. Traditional jailbreak techniques typically focus on a single prompt injection, neglecting the model's ability to remember the flow of conversation and the user's instructions. In this paper, we propose Persona Attack, a memory-injection-based jailbreak method that manipulates the model's context window step by step. Experimental results from applying Persona Attack to several widely used LLMs reveal that, as injections accumulate in memory, models increasingly prioritize these instructions over their internal safety alignment mechanisms. Furthermore, our experiments show empirically that the attack success rate varies not only with the model's memory implementation but also with the combination of instructions, and can reach 95% under specific instruction configurations.

2606.00146 2026-06-02 eess.IV cs.AI cs.CV

Multi-Contrast MRI Motion Correction via Parameter-Informed Disentanglement and Adaptive Experts

多对比度MRI运动校正:基于参数信息解缠与自适应专家网络

Honglin Xiong, Yuxian Tang, Feng Li, Yulin Wang, Lei Xiang, Dinggang Shen, Qian Wang

发表机构 * ShanghaiTech University(上海科技大学)

AI总结 提出一种结合参数信息对比度解缠与严重度感知自适应校正的统一框架,通过ScanCLIP提取对比度嵌入以分离解剖内容,利用视觉Transformer估计运动严重度并路由至专家混合网络,实现跨对比度与严重度的运动伪影校正,在IXI和HCP基准上优于现有方法。

详情
AI中文摘要

磁共振成像中的运动伪影降低了诊断可靠性。现有的深度学习方法通常针对特定对比度,无法泛化到不同模态和伪影严重度。我们提出一个统一框架,结合参数信息对比度解缠与严重度感知自适应校正。ScanCLIP在超过30,000个MRI文本-图像对上预训练,从采集参数中导出对比度嵌入,将对比度风格与解剖内容分离,得到无对比度特征。然后,视觉Transformer估计运动严重度,并通过专家混合网络路由特征,实现针对性伪影校正。双路径解码器重建干净图像和残差伪影图,强制执行图像空间一致性。在IXI和HCP基准上,我们的方法在PSNR上提升0.75 dB,SSIM最高提升0.0279,优于现有方法,且在更高伪影严重度下增益更大。该方法在真实临床数据上展现出鲁棒的零样本泛化能力,这些数据使用未见过的扫描参数采集,而现有方法要么无法去除伪影,要么引入额外失真。

英文摘要

Motion artifacts in magnetic resonance imaging (MRI) degrade diagnostic reliability. Existing deep learning methods are typically contrast-specific and fail to generalize across diverse modalities and artifact severities. We propose a unified framework combining parameter-informed contrast disentanglement with severity-aware adaptive correction. ScanCLIP, pretrained on over 30,000 MRI text-image pairs, derives contrast embeddings from acquisition parameters to disentangle contrast style from anatomical content, yielding contrast-free features. A Vision Transformer then estimates motion severity and routes features through a Mixture-of-Experts network, enabling targeted artifact correction. A dual-pathway decoder reconstructs both the clean image and residual artifact map, enforcing image-space consistency. On IXI and HCP benchmarks, our method improves PSNR by 0.75 dB and SSIM by up to 0.0279 over state-of-the-art approaches, with larger gains at higher artifact severities. It further demonstrates robust zero-shot generalization on real-world clinical data acquired with unseen scanning parameters, where existing methods either fail to remove artifacts or introduce additional distortions.

2606.00143 2026-06-02 q-fin.PM cs.AI

Regime-Adaptive Continual Learning for Portfolio Management


Chaofan Pan, Lingfei Ren, Linbo Xiong, Yonghao Li, Wei Wei, Xin Yang

发表机构 * Southwestern University of Finance and Economics(西南财经大学) Shanxi University(山西大学)

AI总结 提出ReCAP框架,通过自适应制度检测和持续学习实现投资组合管理的快速适应与长期优异回报。

Comments Accepted by KDD 2026

详情
AI中文摘要

金融市场本质上是不稳定的,频繁出现制度转变和结构性变化,使得传统的投资组合管理方法失效。现有的补救措施,如滚动窗口重新训练和朴素在线微调,分别受到高计算成本和知识利用不足的困扰,导致低回报和有限的适应性。持续学习通过使交易代理能够跨顺序任务积累和转移知识,提供了一种有前景的范式。在本文中,我们提出了Regime-aware Continual Adaptive Portfolio management(ReCAP),一个将CL集成到PM中以应对动态金融环境挑战的新框架。ReCAP采用自适应制度检测模块将历史市场数据分割成可变长度的制度,实现制度特定的策略向量学习和策略库构建。在持续交易过程中,制度门控模块根据当前市场状态自适应地组合策略库中的策略向量,促进对新检测到的制度的快速适应。只有制度门控和当前制度的策略向量被持续更新,以有效保留有用知识。在五个真实世界数据集上的广泛实验表明,ReCAP持续优于流行的基线,在长期投资视野中实现卓越回报,并快速适应制度转变。

英文摘要

Financial markets are inherently non-stationary, exhibiting frequent regime shifts and structural changes that render traditional Portfolio Management (PM) approaches ineffective. Existing remedies, such as rolling-window retraining and naive online fine-tuning, are hindered by high computational costs and insufficient knowledge utilization, respectively, resulting in low returns and limited adaptability. Continual learning (CL) offers a promising paradigm by enabling trading agents to accumulate and transfer knowledge across sequential tasks. In this paper, we propose Regime-aware Continual Adaptive Portfolio management (ReCAP), a novel framework that integrates CL into PM to address the challenges of dynamic financial environments. ReCAP employs an adaptive regime detection module to segment historical market data into variable-length regimes, enabling regime-specific learning of policy vectors and the construction of a policy library. During continual trading, a regime-gate module adaptively combines policy vectors from the library based on the current market state, facilitating rapid adaptation to newly detected regimes. Only the regime-gate and the current regime's policy vector are continually updated to preserve useful knowledge effectively. Extensive experiments on five real-world datasets demonstrate that ReCAP consistently outperforms popular baselines, achieving superior returns in long-term investment horizons and rapid adaptation to regime shifts.

2606.00134 2026-06-02 cs.CR cs.AI cs.LG

XAI-SOH-FL: Enhancing SOH-FL with Adaptive Aggregation and Explainable AI for Intrusion Detection in Heterogeneous IoT

XAI-SOH-FL: 通过自适应聚合和可解释人工智能增强异构物联网入侵检测中的SOH-FL

Ambreen Aslam, Maaz Hassan, Bibi Zahra, Muhammad Khuram Shahzad

发表机构 * School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST)(国立科技大学(NUST)电气工程与计算机科学学院(SEECS))

AI总结 针对异构物联网中数据异构、标签稀缺和模型不可解释性问题,提出XAI-SOH-FL框架,通过自适应聚合(动态γ选择与贝叶斯优化)和SHAP可解释性,在CICIDS2017数据集上达到94.12%准确率和0.92 F1分数,优于基线SOH-FL。

Comments 8 pages, 6 figures; code available at https://github.com/aaslam-msit/SOH-FL-Enhancement

详情
AI中文摘要

物联网环境中的入侵检测系统面临数据异构、缺乏标记数据和模型可解释性有限等重大挑战。联邦学习提供了一种隐私保护解决方案;然而,现有方法如SOH-FL存在两个关键限制:依赖手动调整的聚合参数γ以及模型预测缺乏可解释性。在本文中,我们提出XAI-SOH-FL,一个增强框架,将自适应聚合和可解释人工智能集成到SOH-FL范式中。首先,我们引入基于相似性阈值的动态γ选择机制,使聚合过程能够适应不断变化的数据分布。其次,采用贝叶斯优化自动确定最优γ值,消除了手动调整的需要。第三,引入SHAP(SHapley Additive exPlanations)为入侵检测决策提供特征级可解释性。在CICIDS2017数据集上的实验评估表明,所提方法达到了94.12%的准确率和0.92的F1分数,优于基线SOH-FL模型,同时收敛所需的通信轮次更少。此外,基于SHAP的分析揭示,流级特征如流持续时间和数据包长度显著影响模型预测。这些结果表明,XAI-SOH-FL在异构物联网环境中提供了准确性、适应性和可解释性之间的有效平衡。

英文摘要

Intrusion Detection Systems (IDS) in Internet of Things (IoT) environments face significant challenges due to data heterogeneity, lack of labeled data, and limited model interpretability. Federated Learning (FL) offers a privacy-preserving solution; however, existing approaches such as SOH-FL suffer from two key limitations: reliance on a manually tuned aggregation parameter γ and lack of explainability in model predictions. In this paper, we propose XAI-SOH-FL, an enhanced framework that integrates adaptive aggregation and explainable artificial intelligence into the SOH-FL paradigm. First, we introduce a dynamic γ selection mechanism based on similarity thresholding, enabling the aggregation process to adapt to evolving data distributions. Second, Bayesian Optimization is employed to automatically determine optimal γ values, eliminating the need for manual tuning. Third, SHAP (SHapley Additive exPlanations) is incorporated to provide feature-level interpretability for intrusion detection decisions. Experimental evaluation on the CICIDS2017 dataset demonstrates that the proposed approach achieves an accuracy of 94.12% and an F1-score of 0.92, outperforming the baseline SOH-FL model while converging in fewer communication rounds. Furthermore, SHAP-based analysis reveals that flow-level features such as Flow Duration and Packet Length significantly influence model predictions. These results indicate that XAI-SOH-FL provides an effective balance between accuracy, adaptability, and interpretability in heterogeneous IoT environments.

2606.00131 2026-06-02 cs.SE cs.AI cs.LG cs.PL

AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve

AI-PROPELLER:基于AlphaEvolve的仓库规模过程间代码布局优化

Chaitanya Mamatha Ananda, Rajiv Gupta, Mircea Trofin, Aiden Grossman, Sriraman Tallam, Xinliang David Li, Amir Yazdanbakhsh

发表机构 * University of California, Riverside(加州大学河滨分校) Google(谷歌) DeepMind(深度思维)

AI总结 提出AI-PROPELLER系统,利用Magellan智能工作流将Propeller的编译器启发式方法演化为细粒度过程间优化器,并通过实际硬件执行评估布局变体,首次在工业仓库规模应用中实现细粒度过程间代码布局优化,性能提升0.23%至1.6%。

详情
AI中文摘要

后链接优化器(如Propeller和BOLT)已证明,精确的、基于性能剖析的代码布局可以从高度优化的二进制文件中提取显著的性能提升。然而,这些系统目前局限于过程内技术,未能充分利用过程间布局的全局潜力。由于组合爆炸的搜索空间和复杂的调用返回语义难以建模,过程间代码布局历来困难。因此,细粒度过程间布局的性能潜力在实践中尚未得到证实。AI-PROPELLER使用Magellan(一种智能工作流),将Propeller中的编译器启发式方法演化为细粒度过程间优化器,并微调所得策略的超参数。为确保高保真度,我们摒弃了近似的静态成本模型,智能工作流生成多个布局变体,并在实际硬件上执行以测量真实性能计数器,为进化循环提供精确的奖励信号。AI-PROPELLER已在包括大型仓库规模应用在内的多个基准测试上进行了评估,实验表明,在使用最先进的FDO和PLO优化后,性能提升0.23%至1.6%,这对于实际二进制文件而言意义重大。这是首次在工业环境中对大型仓库规模应用进行细粒度过程间代码布局优化。

英文摘要

Post-link optimizers (PLOs) such as Propeller and BOLT have demonstrated that precise, profile-guided code layout can extract significant performance gains from heavily optimized binaries. However, these systems are currently restricted to intraprocedural techniques, leaving the global potential of interprocedural layout largely untapped. Interprocedural code layout is historically difficult due to a combinatorially intractable search space and complex call-return semantics that are challenging to model. Consequently, the performance potential of fine-grained interprocedural layout remains unproven in practice. AI-PROPELLER uses Magellan, an agentic workflow that evolves the compiler heuristic in Propeller into a fine-grained interprocedural optimizer and fine-tunes the resulting policy hyperparameters. To ensure high fidelity, we move away from approximate static cost models: the agentic workflow generates multiple layout variants that are executed on actual hardware to measure real performance counters, providing a precise reward signal for the evolutionary loop. AI-PROPELLER has been evaluated on several benchmarks, including large warehouse-scale applications, and experiments show performance improvements of 0.23% to 1.6% on top of state-of-the-art FDO and PLO optimization, which is significant for real-world binaries. This is the first time that large warehouse-scale applications in industrial settings have been optimized with fine-grained interprocedural code layout.

2606.00125 2026-06-02 cs.IR cs.AI cs.LG cs.MM

Multimodal Music Recommendation System using LLMs

使用LLMs的多模态音乐推荐系统

Srikar Prabhas Kandagatla, Sreehitha R. Narayana, Chandana Magapu, Swetha Mohan, Shamanth Kuthpadi, Hongjie Chen, Ryan A. Rossi, Franck Dernoncourt, Nesreen Ahmed

发表机构 * University of Massachusetts Amherst(马萨诸塞大学阿姆赫斯特分校) Dolby Laboratories(Dolby实验室) Adobe Research(Adobe研究) Cisco Research(Cisco研究)

AI总结 提出一个多模态框架,通过融合音频、歌词、LLM生成的语义元数据和收听完成率,在基于会话的音乐推荐中显著提升Recall和NDCG。

详情
AI中文摘要

音乐推荐系统通常将歌曲视为不透明标记,依赖协同交互历史,忽略了语义或声学内容。先前工作探索了LLM增强、多模态和文本增强的序列推荐方法,但有些方法部分结合了语义、声学或参与信号,没有在一个统一的基于LLM的序列推理框架中联合建模所有三个信号,该框架将推荐基于实际歌曲内容。在这项工作中,我们提出了一个用于基于会话的音乐推荐的多模态框架,通过三种互补信号丰富了LastFM-1K数据集:(1) 使用预训练音乐和文本表示模型提取的音频和歌词嵌入,(2) 使用MGPHot注释方案生成的LLM语义元数据,以及(3) 收听完成率。我们采用E4SRec框架,通过扩展多模态特征和不同的项目ID编码器骨干(包括SASRec、BERT4Rec和GRU4Rec)来增强它。我们进一步扩展了LLM骨干选项,包括LLaMa-2-13B、Qwen2.5-7B-Instruct和LLaMa-3-70B,在零样本和微调设置下。我们的实验表明,集成基于内容的特征比仅使用ID的基线在Recall上提升高达95%,在NDCG上提升高达79%。此外,我们的实验表明,朴素的多模态融合并不总是产生加性改进,突显了跨模态整合的挑战。我们发布了一个用于音乐推荐的大规模多模态基准。

英文摘要

Music recommendation systems typically treat songs as opaque tokens, relying on collaborative interaction histories and overlooking semantic or acoustic content. Prior work has explored LLM-augmented, multimodal, and text-enhanced approaches to sequential recommendation, and while some methods partially combine semantic, acoustic, or engagement signals, none jointly model all three within a unified LLM-based sequential reasoning framework that grounds recommendations in actual song content. In this work, we propose a multimodal framework for session-based music recommendation that enriches the LastFM-1K dataset with three complementary signals: (1) audio and lyric embeddings extracted using pretrained music and text representation models, (2) LLM-generated semantic metadata using the MGPHot annotation schema, and (3) listening completion ratios. We adapt the E4SRec framework, extending it with multimodal features and different item ID encoder backbones, including SASRec, BERT4Rec, and GRU4Rec. We further extend the LLM backbone options with LLaMa-2-13B, Qwen2.5-7B-Instruct, and LLaMa-3-70B in both zero-shot and fine-tuned settings. Our experiments show that integrating content-based features improves over ID-only baselines by up to 95% in Recall and up to 79% in NDCG. Moreover, our experiments show that naive multimodal fusion does not always yield additive improvements, highlighting challenges in cross-modal integration. We release a large-scale multimodal benchmark for music recommendation.
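For readers unfamiliar with the two reported metrics, the sketch below computes Recall@K and NDCG@K under a common session-based convention: each session has exactly one ground-truth next item. That convention, the function name `recall_ndcg_at_k`, and the toy item IDs are assumptions for illustration; the abstract does not state the paper's exact evaluation protocol.

```python
import math


def recall_ndcg_at_k(ranked_lists, targets, k):
    """Average Recall@K and NDCG@K with one relevant item per session.

    ranked_lists[i] is the model's ranked item list for session i and
    targets[i] is the single ground-truth next item for that session.
    """
    if len(ranked_lists) != len(targets) or not targets:
        raise ValueError("need one non-empty ranked list per target")
    recall_sum = 0.0
    ndcg_sum = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = ranked[:k]
        if target in top_k:
            rank = top_k.index(target) + 1            # 1-indexed hit position
            recall_sum += 1.0
            ndcg_sum += 1.0 / math.log2(rank + 1)     # ideal DCG is 1 for a single item
    n = len(targets)
    return recall_sum / n, ndcg_sum / n


# Hypothetical sessions: a hit at rank 1, a hit at rank 3, and a miss.
ranked = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
truth = ["a", "z", "not_recommended"]
recall3, ndcg3 = recall_ndcg_at_k(ranked, truth, k=3)
print(recall3, ndcg3)
```

Recall@K only records whether the target appears in the top K, while NDCG@K also discounts hits that appear lower in the list, which is why the two metrics can improve by different relative amounts.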

2606.00120 2026-06-02 eess.SP cs.AI cs.LG

SpikeWFM: Spiking-Aided Wireless Foundation Model for Robust Channel Prediction

SpikeWFM:用于鲁棒信道预测的脉冲辅助无线基础模型

Liwen Jing, Yisha Lu, Tingting Yang, Li Sun, Yuxuan Shi, Yuwei Wang, Mengfan Zheng, Leiyang Xu

发表机构 * Mobile Information Networks-National Science and Technology Major Project(移动信息网络国家科技重大专项)

AI总结 提出SpikeWFM混合架构,将脉冲神经网络与基于ANN的Transformer结合,通过时间稀疏性和事件驱动处理增强无线基础模型对噪声和干扰的鲁棒性,在信道预测任务上优于传统模型。

详情
AI中文摘要

本文提出SpikeWFM,一种新颖的混合架构,它将脉冲神经网络(SNN)与基于传统人工神经网络(ANN)的Transformer集成用于无线基础模型(WFM)。受人类大脑中噪声鲁棒且节能的信息处理启发,SpikeWFM旨在增强WFM对噪声和干扰的抵抗力,同时保持跨多种无线场景的强大泛化能力。借鉴大型语言模型成功经验,WFM利用跨各种无线环境的大规模数据集上的自监督预训练,学习一个统一的嵌入表示,支持包括信道预测、信道估计、波束预测、定位等在内的广泛下游任务。这类模型通常优于任务特定设计,并对未见条件表现出卓越的适应性。然而,现有WFM在实际无线系统中仍易受真实噪声和干扰影响。为解决这一局限,我们将脉冲神经元引入基于Transformer的WFM架构。我们提供简要理论分析,展示SNN-ANN混合如何通过时间稀疏性和事件驱动处理有效减轻噪声和干扰。实验结果表明,SpikeWFM在预训练收敛和信道预测准确性上均持续优于传统基于ANN的WFM。关于通信和感知任务的更多结果将在本工作的完整期刊版本中呈现。

英文摘要

This paper proposes SpikeWFM, a novel hybrid architecture that integrates spiking neural networks (SNNs) with conventional artificial neural network (ANN)-based transformers for wireless foundation models (WFMs). Inspired by the noise-robust and energy-efficient information processing in the human brain, SpikeWFM aims to enhance the resilience of WFMs against noise and interference while maintaining strong generalization capabilities across diverse wireless scenarios. Drawing on the success of large language models, WFMs leverage self-supervised pre-training on large-scale datasets spanning various wireless environments to learn a unified embedding that supports a wide range of downstream tasks, including channel prediction, channel estimation, beam prediction, and positioning. Such models typically outperform task-specific designs and exhibit superior adaptability to unseen conditions. However, existing WFMs remain vulnerable to realistic noise and interference in practical wireless systems. To address this limitation, we incorporate spiking neurons into the transformer-based WFM architecture. We provide a brief theoretical analysis demonstrating how the SNN-ANN hybrid effectively mitigates noise and interference through temporal sparsity and event-driven processing. Experimental results show that SpikeWFM consistently outperforms conventional ANN-based WFMs in both pre-training convergence and channel prediction accuracy. Additional results on communication and sensing tasks will be presented in the full journal version of this work.

2606.00112 2026-06-02 cs.NE cs.CV

Evolving to the Aesthetics of a Vision-Language Model

进化到视觉语言模型的美学

Stephen James Krol, Jon McCormack

发表机构 * SensiLab, Monash University Melbourne, Australia(传感实验室,墨尔本莫纳什大学,澳大利亚)

AI总结 本研究探索使用视觉语言模型(VLM)通过CLIP-IQA评分或成对比较结合Glicko评级系统来评估进化设计的美学,并与艺术家排名对比分析两种方法的优劣。

Comments Paper presented at ICCC26, June 29 - July 3, 2026, Coimbra, Portugal

详情
AI中文摘要

进化系统在创意领域已展现出显著成果,最近的应用包括生成式排版、设计和音乐。然而,设计能有效捕捉抽象输出所需美学的适应度函数仍是一个开放问题。在这项工作中,我们探索了两种使用视觉语言模型(VLM)评估种群美学的方法。第一种方法使用CLIP-IQA预测每个设计的美学分数。第二种方法则让候选设计相互对抗,由VLM根据用户指定的自定义提示确定胜者。然后,这些成对比较的结果通过Glicko评级系统用于估计种群排名。我们在一个使用自定义生成系统的案例研究中展示了这些方法,并将所得排名与艺术家的美学排名以及其他美学评估技术产生的排名进行比较。此外,我们记录了艺术家使用这些方法进化设计的体验,批判性地分析了两种方法的优缺点。

英文摘要

Evolutionary systems have demonstrated remarkable results in creative domains, with recent applications in generative typography, design, and music. However, an open problem remains in designing fitness functions that effectively capture the desired aesthetics of abstract outputs. In this work, we explore two methods for evaluating the aesthetics of a population using Vision-Language Models (VLMs). The first method uses CLIP-IQA to predict an aesthetic score for each design. The second method instead pits candidates against each other, with winners determined by a VLM using a custom prompt specified by the user. The outcomes of these pairwise comparisons are then used to estimate a population ranking via the Glicko rating system. We present these methods in the context of a case study using a custom generative system and compare the resulting rankings with an artist's aesthetic ranking and those produced by other aesthetic evaluation techniques. Additionally, we document the artist's experience using these approaches to evolve designs, critically analysing the strengths and weaknesses of both methods.
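The step from pairwise comparison outcomes to a population ranking can be sketched with a rating update. The code below implements the standard Glicko-1 update for one rating period; whether the paper uses Glicko-1 or a later variant is not stated in the abstract, so the choice of Glicko-1, the function names, and the example numbers are assumptions. The rating-deviation inflation applied between rating periods is omitted for brevity.

```python
import math

Q = math.log(10) / 400.0


def _g(rd):
    """Attenuation factor that discounts opponents with uncertain ratings."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)


def _expected(r, r_j, rd_j):
    """Expected score of a candidate rated r against opponent (r_j, rd_j)."""
    return 1.0 / (1.0 + 10.0 ** (-_g(rd_j) * (r - r_j) / 400.0))


def glicko1_update(r, rd, results):
    """Return the updated (rating, rating deviation) after one rating period.

    results is a list of (opponent_rating, opponent_rd, score) tuples,
    where score is 1.0 for a win, 0.5 for a draw, and 0.0 for a loss.
    """
    if not results:
        return r, rd
    info = 0.0     # accumulates g^2 * E * (1 - E)
    delta = 0.0    # accumulates g * (score - E)
    for r_j, rd_j, score in results:
        g_j = _g(rd_j)
        e_j = _expected(r, r_j, rd_j)
        info += g_j * g_j * e_j * (1.0 - e_j)
        delta += g_j * (score - e_j)
    denom = 1.0 / (rd * rd) + Q * Q * info   # 1/RD^2 + 1/d^2
    return r + (Q / denom) * delta, math.sqrt(1.0 / denom)


# One design judged against three others by a (hypothetical) VLM: one win, two losses.
new_r, new_rd = glicko1_update(
    1500.0, 200.0,
    [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)],
)
print(round(new_r, 1), round(new_rd, 1))
```

Sorting candidates by their updated ratings then gives a population ranking, and the rating deviation indicates how many more comparisons a candidate might need before its position can be trusted.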

2606.00111 2026-06-02 eess.IV cs.CV cs.LG

ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression

ChWDTA:用于学习图像压缩的通道级小波域变换器注意力和熵建模

Haisheng Fu, Runyu Yang, Feng Ding, Siyu Zhu, Jie Liang, Xiaoxiao Li, Zhenman Fang, Jingning Han

Affiliations: Electrical and Computer Engineering Department, The University of British Columbia; School of Engineering Science, Simon Fraser University; School of Electronic Science and Technology, Eastern Institute of Technology; Google LLC

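AI summary: Proposes channel-wise wavelet-domain transformer attention (ChWDTA) and a channel-wise wavelet packet decomposition that improve rate-distortion performance in a hybrid CNN-transformer image compression framework, achieving substantial BD-rate reductions on several test sets.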

Comments 13 pages, 8 figures, 6 tables

AI Chinese abstract

最先进的学习图像压缩(LIC)方案越来越多地基于混合CNN-Transformer架构。为了进一步提高率失真性能,我们将通道级小波变换引入变换器和熵编码组件。首先,我们提出了一种通道级小波域变换器注意力(ChWDTA)机制。ChWDTA保留了现代LIC骨干中使用的有效窗口化空间自注意力,但在将注意力输出通过逆变换映射回来之前,在通道级小波变换特征上计算Q/K/V投影。因此,得到的通道级小波域变换器块(ChWDTB)保留了窗口化注意力的空间标记化模式,同时稀疏化了注意力投影所见的通道协方差。其次,在熵编码阶段,我们引入了一种通道级小波包(ChWP)分解,产生四个大小相等的子带,这更适合基于通道级切片的自回归熵建模。当每个通道级子带被分成两个切片时,我们使用八个切片进行熵编码。通过这种配置,所提出的方案在Kodak、CLIC Professional Validation和Tecnick测试集上分别获得了-17.82%、-19.15%和-22.56%的BD-rate降低。即使每个通道级子带被编码为单个切片,该方案仍以较低的复杂度保留了大部分编码增益。结果证实了在基于CNN-Transformer的LIC方案中引入小波变换的优势。

English abstract

State-of-the-art learned image compression (LIC) schemes are increasingly based on hybrid CNN-transformer architectures. To further improve rate-distortion performance, we introduce channel-wise wavelet transforms into both the transformer and entropy-coding components. First, we propose a channel-wise wavelet-domain transformer attention (ChWDTA) mechanism. ChWDTA keeps the efficient windowed spatial self-attention used in modern LIC backbones, but computes the Q/K/V projections on channel-wise wavelet-transformed features before mapping the attention output back with the inverse transform. The resulting Channel-wise Wavelet-Domain Transformer Block (ChWDTB) therefore preserves the spatial tokenization pattern of windowed attention while sparsifying the channel covariance seen by the attention projections. Second, in the entropy-coding stage, we introduce a channel-wise wavelet packet (ChWP) decomposition that produces four equal-sized subbands, which better fit channel-wise slice-based autoregressive entropy modeling. When each channel-wise subband is divided into two slices, we use eight slices for entropy coding. With this configuration, the proposed scheme obtains BD-rate reductions of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively. Even when each channel-wise subband is coded as a single slice, the scheme still retains most of the coding gains with lower complexity. The results confirm the advantage of introducing wavelet transform in CNN-transformer-based LIC schemes.
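
The slice arithmetic in the entropy-coding stage (four equal-sized subbands, each split into two slices, giving eight slices) can be illustrated with a small sketch. It applies a two-level Haar wavelet packet along the channel axis of a single channel vector. The Haar filter, the function names, and the use of one spatial position instead of a full feature tensor are assumptions for illustration; the abstract does not specify the wavelet used.

```python
import math


def haar_step(x):
    """One orthonormal Haar analysis step: returns (low, high) halves."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def channel_wavelet_packet(channels):
    """Two-level wavelet packet along the channel axis.

    Unlike a plain wavelet transform, both the low and the high band
    are decomposed again, so the four subbands have equal size C / 4.
    """
    if len(channels) % 4 != 0:
        raise ValueError("channel count must be a multiple of 4")
    low, high = haar_step(channels)
    ll, lh = haar_step(low)
    hl, hh = haar_step(high)
    return [ll, lh, hl, hh]


def split_into_slices(subbands, slices_per_subband=2):
    """Split every subband into equally sized channel slices."""
    slices = []
    for band in subbands:
        if len(band) % slices_per_subband != 0:
            raise ValueError("subband size must divide evenly into slices")
        size = len(band) // slices_per_subband
        for k in range(slices_per_subband):
            slices.append(band[k * size:(k + 1) * size])
    return slices


if __name__ == "__main__":
    features = [float(c) for c in range(16)]   # 16 channels at one position
    subbands = channel_wavelet_packet(features)
    slices = split_into_slices(subbands, slices_per_subband=2)
    print(len(subbands), "subbands,", len(slices), "slices")
```

Because the Haar step is orthonormal, the decomposition preserves signal energy. Setting `slices_per_subband=1` corresponds to the lower-complexity configuration in which each subband is coded as a single slice.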

2606.00108 2026-06-02 eess.SP cs.AI

Project SPARROW and the Future of Conservation Technology

SPARROW项目与保护技术的未来

Juan M. Lavista Ferres, Carl Chalmers, Bruno Demuro Segundo, Zhongqi Miao, Andres Hernandez Celis, Federico Alves Torres, Isai Daniel Chacon Silva, Anthony Cintron Roman, Allen Kim, Meygha Machado, Luana Marotti, Amy Michaels, Daniela Ruiz Lopez, Catherine Romero, Rahul Dodhia, Inbal Becker-Reshef, Pablo Arbelaez

Affiliations: Microsoft AI for Good Lab; Universidad de los Andes; University of Maryland

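AI summary: Proposes SPARROW, an open-source platform that integrates solar power, edge AI, and satellite communication for continuous, autonomous biodiversity monitoring in remote areas, with deployments in several countries demonstrating its robustness and scalability.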

AI Chinese abstract

全球生物多样性正以前所未有的速度下降,然而可用于监测和保护生态系统的工具仍受限于电力、连接性和可达性。我们提出SPARROW,一个集成太阳能、边缘人工智能和卫星通信的开源软硬件平台,能够在偏远环境中实现连续、自主的生物多样性监测。每个SPARROW节点结合低功耗图形处理单元(GPU)与模块化视觉、声学和环境传感器,执行设备端深度学习推理,并通过低地球轨道(LEO)卫星或全球移动通信系统(GSM)网络传输汇总结果。我们在哥伦比亚、秘鲁、坦桑尼亚和美国的热带、温带和高山生态系统中部署了SPARROW,它在多变的环境条件下维持24/7运行,并在前190天内收集了超过200万张图像和声学记录。该系统展示了鲁棒的实时分类和自适应电源管理,实现了无需现场人工干预的完全自主。通过集成可再生能源、边缘AI和开源设计,SPARROW降低了生态监测的技术和财务门槛,并为分布式智能传感器网络(新兴的“万物互联”用于行星生物多样性监测)建立了可扩展的基础。

English abstract

Global biodiversity is declining at unprecedented rates, yet the tools available to monitor and protect ecosystems remain limited by constraints in power, connectivity, and accessibility. We present SPARROW, an open-source hardware and software platform that integrates solar energy, edge artificial intelligence, and satellite communication to enable continuous, autonomous biodiversity monitoring in remote environments. Each SPARROW node combines a low-power Graphics Processing Unit (GPU) with modular visual, acoustic, and environmental sensors, performing on-device deep learning inference and transmitting summarized results through Low-Earth-Orbit (LEO) satellite or Global System for Mobile Communications (GSM) networks. We deployed SPARROW across tropical, temperate, and montane ecosystems in Colombia, Peru, Tanzania, and the United States, where it sustained 24/7 operation under variable environmental conditions and collected more than two million images and acoustic recordings in the first 190 days. The system demonstrated robust real-time classification and adaptive power management, achieving full autonomy without on-site human intervention. By integrating renewable energy, edge AI, and open-source design, SPARROW lowers the technical and financial barriers to ecological monitoring and establishes a scalable foundation for a distributed, intelligent network of sensors, an emerging "Internet of Living Things" for planetary biodiversity monitoring.