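arXivDaily: a daily academic digest synchronized with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.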

2606.00170 2026-06-02 cs.HC cs.AI cs.CV

UF-AMA: A unified framework for cross-domain emotion recognition via adaptive multimodal alignment

UF-AMA: 通过自适应多模态对齐的跨域情感识别统一框架

Zheng Wang, Shuo Wang, Junhong Wang

发表机构 * Institute of Advanced Technology, University of Science and Technology of China（中国科学技术大学先进技术研究院）； Department of Electronic Engineering and Information Science, University of Science and Technology of China（中国科学技术大学电子工程与信息科学系）； Institute of Artificial Intelligence, Hefei Comprehensive National Science Center（合肥综合国家科学中心人工智能研究院）

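AI summary: Proposes UF-AMA, a unified framework that uses adaptive multimodal alignment and a confidence-aware screening mechanism to address distribution shift in cross-subject and cross-session emotion recognition from physiological signals, achieving state-of-the-art performance on the SEED and SEED-IV datasets.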

AI中文摘要

近年来，基于脑电图（EEG）等生理信号的情感识别受到了广泛关注，因为与面部表情等外部行为数据相比，内部生理数据提供了更高的客观性和可靠性。然而，由于个体和情境差异导致的分布偏移，以及各模态样本质量的差异，构建具有高泛化性和鲁棒性的跨域多模态情感识别模型仍然是一个关键挑战。在本研究中，我们提出了一种具有自适应多模态对齐的统一框架（UF-AMA），以使用多模态生理信号解决跨主体和跨会话的情感识别问题。首先，我们构建了一个由Transformer编码器和多头交叉注意力模块组成的跨模态特征融合网络，实现了EEG信号和眼动追踪数据的深度融合。随后，我们引入了一种置信度感知筛选机制，动态评估每个模态分支在目标域样本上的预测可靠性，将样本划分为不同的质量子集，并相应地应用全局一致性对齐和跨模态蒸馏。最后，我们提出了一个多级域自适应框架，联合优化局部模态特定特征和全局融合特征的边际分布和条件分布，从而在多个粒度上减少跨域分布偏移。在SEED和SEED-IV数据集上的大量实验表明，UF-AMA在跨主体和跨会话任务中均达到了最先进的性能。源代码可在 https://github.com/BetterCoderLab/UF-AMA 获取。

英文摘要

In recent years, emotion recognition based on physiological signals such as electroencephalogram (EEG) has gained considerable attention, as internal physiological data offer greater objectivity and reliability compared to external behavioral data like facial expressions. However, due to distribution shifts caused by individual and contextual differences, along with variations in sample quality across modalities, constructing a cross-domain multimodal emotion recognition model with high generalization and robustness remains a key challenge. In this study, we propose a Unified Framework with Adaptive Multimodal Alignment (UF-AMA) to address cross-subject and cross-session emotion recognition using multimodal physiological signals. First, we construct a cross-modal feature fusion network comprising Transformer encoders and multi-head cross-attention modules, enabling the deep integration of EEG signals and eye-tracking data. Subsequently, we introduce a confidence-aware screening mechanism that dynamically assesses the predictive reliability of each modality branch on target domain samples, partitions samples into different quality subsets, and accordingly applies global consistency alignment and cross-modal distillation. Finally, we propose a multi-level domain adaptation framework that jointly optimizes the marginal and conditional distributions of both local modality-specific and global fusion features, thereby reducing cross-domain distribution shifts at multiple granularities. Extensive experiments on the SEED and SEED-IV datasets demonstrate that UF-AMA achieves state-of-the-art (SOTA) performance in both cross-subject and cross-session tasks. The source code is available at: https://github.com/BetterCoderLab/UF-AMA.
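The confidence-aware screening step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual partition rule, so everything here is an assumption for illustration: the threshold `tau`, the use of the maximum softmax probability as the confidence measure, and the four subset names are all hypothetical.

```python
def screen_samples(eeg_probs, eye_probs, tau=0.8):
    """Partition target-domain samples by per-branch confidence.

    Assumption (not from the paper): confidence is the maximum class
    probability of each branch, compared against a fixed threshold tau.
    """
    subsets = {
        "both_reliable": [],    # candidates for global consistency alignment
        "eeg_teaches_eye": [],  # candidates for cross-modal distillation
        "eye_teaches_eeg": [],  # candidates for cross-modal distillation
        "unreliable": [],       # neither branch is confident
    }
    for i, (p_eeg, p_eye) in enumerate(zip(eeg_probs, eye_probs)):
        eeg_ok = max(p_eeg) >= tau
        eye_ok = max(p_eye) >= tau
        if eeg_ok and eye_ok:
            subsets["both_reliable"].append(i)
        elif eeg_ok:
            subsets["eeg_teaches_eye"].append(i)
        elif eye_ok:
            subsets["eye_teaches_eeg"].append(i)
        else:
            subsets["unreliable"].append(i)
    return subsets


# Hypothetical class probabilities for four target-domain samples.
eeg = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.40, 0.30, 0.30], [0.40, 0.35, 0.25]]
eye = [[0.88, 0.10, 0.02], [0.50, 0.30, 0.20], [0.10, 0.85, 0.05], [0.34, 0.33, 0.33]]
print(screen_samples(eeg, eye))
```

A fixed threshold is the simplest possible choice; the abstract describes the assessment as dynamic, so the real mechanism presumably adapts during training.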


2606.00161 2026-06-02 cs.CR cs.AI cs.LG

Improving IoT Intrusion Detection Through SMOTE-Based Oversampling and Extended Multi-Model Evaluation on Side-Channel Power Data

基于SMOTE过采样和扩展多模型评估的侧信道功率数据物联网入侵检测改进

Muhammad Khuram Shahzad, Haseeb Khan, Muhammad Masood Khan, Mubashra Bibi

发表机构 * School of Electrical Engineering and Computer Science (SEECS), NUST（巴基斯坦国立科技大学（NUST）电气工程与计算机科学学院（SEECS））

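AI summary: To address severe class imbalance in IoT side-channel datasets, the paper balances the data with SMOTE oversampling and evaluates eight machine learning models; Random Forest and Extra Trees surpass the baseline method in F1 score, and the results highlight the importance of the macro-F1 metric.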

Comments 8 pages, 14 figures; code and results publicly available

AI中文摘要

物联网网络中的入侵检测面临传统机器学习方法无法克服的挑战，其中最大的挑战之一是侧信道数据集中存在的类别不平衡问题，正常类样本与攻击类样本的比例可达75964:1。Dominguez等人通过基于功率的入侵检测概念验证解决了这一问题，但既未尝试处理不平衡问题，也未使用平衡训练集评估分类器性能。本文同时处理这两个方面。首先，对从初始数据集提取的所有九个可能数据集应用合成少数类过采样技术（SMOTE），使每个数据集的精确不平衡比达到1.1。然后，在SMOTE平衡的6小时数据集上，在相同条件下训练了八种算法：随机森林、HistGradientBoosting、LightGBM、极端随机树、XGBoost、k近邻、多层感知机和决策树。随机森林的微平均F1分数达到0.9989，宏F1为0.9794，优于基线论文中时间序列森林算法之前的最佳微F1结果0.9983。极端随机树提供了相同的性能，但速度快10倍。与基线论文评估相比，显式引入宏F1指标揭示了聚合性能指标遗漏的重要类别级信息。基于混淆矩阵、F1热图和ROC曲线计算的每类召回率表明，仅当使用SMOTE平衡时，少数攻击类（尤其是M+L联合感染类）才能被可靠检测。特征重要性分析表明，在功率窗口的60个时间步中，最近的时间步是最重要的预测信号。

英文摘要

The detection of intrusions in IoT-based networks poses challenges that cannot be overcome using traditional machine learning methods. Perhaps the biggest of them is the class imbalance in the side-channel dataset, where the ratio of normal-class samples to attack samples can reach 75,964 to 1. Dominguez et al. addressed this setting with a proof of concept for power-based intrusion detection. However, the authors neither attempt to cope with the imbalance nor assess classifier performance using a balanced training set. The current paper handles both aspects. First, the Synthetic Minority Oversampling Technique (SMOTE) was applied to all nine possible datasets extracted from the initial one, giving an exact imbalance ratio of 1.1 for each. Then, eight algorithms (Random Forest, HistGradientBoosting, LightGBM, Extra Trees, XGBoost, k-Nearest Neighbors, Multi-Layer Perceptron, and Decision Tree) were trained under identical conditions on the SMOTE-balanced 6-hour dataset. Random Forest reached a micro-averaged F1 score of 0.9989 and a macro F1 of 0.9794, outperforming the previous best micro-F1 of 0.9983, obtained by the Time Series Forest algorithm in the base paper. Extra Trees matched this performance while running 10 times faster. Explicitly introducing the macro-F1 metric, in contrast to the base paper's assessment, reveals important class-level information missed by aggregate performance metrics. Per-class recall rates calculated from confusion matrices, along with F1 heatmaps and ROC curves, show that minority attack classes, especially those with combined M+L infections, are detected reliably only when SMOTE balancing is used. Feature importance analysis indicates that, of the 60 time steps in a power window, the latest ones are the most important predictor signals.
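To see why the abstract stresses macro F1 alongside micro F1, the following sketch computes both by hand on a toy imbalanced label set. The data and the function name `f1_scores` are hypothetical and have nothing to do with the paper's dataset; the point is only that a classifier which never detects the minority class can still score a high micro F1.

```python
def f1_scores(y_true, y_pred):
    """Return (micro F1, macro F1, per-class F1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    # Micro: pool counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(per_class.values()) / len(labels)
    return micro, macro, per_class


# Toy data: 98 normal samples, 2 attack samples, classifier always says "normal".
y_true = ["normal"] * 98 + ["attack"] * 2
y_pred = ["normal"] * 100
micro, macro, per_class = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")  # micro F1 = 0.9800, macro F1 = 0.4949
```

On this toy data the micro score looks excellent while the macro score exposes that the attack class is never found, which is the kind of class-level information the abstract says aggregate metrics miss.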


2606.00160 2026-06-02 cs.CR cs.AI cs.CL

DataShield: Safety-degrading Data Filtering for LLM Benign Instruction Fine-Tuning

DataShield: 针对大语言模型良性指令微调的安全降级数据过滤

Junbo Zhang, Qianli Zhou, Xinyang Deng, Wen Jiang, Jie Pan, Jinbiao Zhu

发表机构 * nwpu.edu.cn（西北工业大学）

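AI summary: Proposes DataShield, which quantifies each sample's contribution to the model's compliance behavior as a safety degradation score to efficiently identify safety-degrading samples in benign datasets, and validates its effectiveness across multiple models and datasets.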

AI中文摘要

大型语言模型（LLM）即使在使用良性数据集进行微调时，也会出现安全能力下降的问题。然而，现有识别良性数据集中安全降级样本的方法存在计算成本高和噪声显著的问题。在本文中，我们提出DataShield，以高效且有效地识别潜在的安全降级样本。我们的关键直觉基于以下观察：良性微调提高了LLM的整体响应合规性。DataShield的核心技术见解是将每个样本对模型合规行为的贡献量化为其安全降级分数。DataShield包含三个核心组件：（1）合规向量提取，捕获LLM的合规行为倾向；（2）一种新颖的合规感知分数（CAS），自动识别最优的安全关键层；（3）安全降级样本过滤，量化训练数据沿合规方向的投影偏移。在Llama3-8B、Llama3.1-8B和Qwen2.5-7B上使用Alpaca和Dolly良性数据集进行的广泛实验评估验证了我们的方法在识别高风险和低风险数据子集方面的有效性。我们还观察到，开放式问答更可能触发安全降级，且相应的响应往往更长。我们希望这项工作能为以数据为中心的防御方法提供新的见解。源代码可在https://github.com/ZJunBo/DataShield获取。

英文摘要

Large language models (LLMs) suffer from degraded safety capabilities even when fine-tuned with benign datasets. However, existing methods for identifying safety-degrading samples in benign datasets suffer from high computational costs and significant noise issues. In this paper, we propose DataShield to efficiently and effectively identify potential safety-degrading samples. Our key intuition is based on the observation that benign fine-tuning increases the overall response compliance of LLMs. DataShield's key technical insight is to quantify each sample's contribution to the model's compliance behavior as its safety degradation score. DataShield consists of three core components: (1) Compliance Vector Extraction, which captures the LLM's compliance behavior tendency; (2) a novel Compliance-Aware Score (CAS), which automatically identifies the optimal safety-critical layer; and (3) Safety-degrading Sample Filtering, which quantifies the projection shift of training data along the compliance direction. Extensive experimental evaluation on Llama3-8B, Llama3.1-8B, and Qwen2.5-7B using the Alpaca and Dolly benign datasets validates our method's effectiveness in identifying high-risk and low-risk data subsets. We also observe that open-ended question answering is more likely to trigger safety degradation, and corresponding responses tend to be longer. We hope this work can provide new insights into data-centric defense methods. The source code is available at: https://github.com/ZJunBo/DataShield.
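The idea of a "projection shift along the compliance direction" can be sketched as plain vector arithmetic. This is an illustration under stated assumptions, not the DataShield implementation: the abstract does not say how the compliance vector is extracted, so the difference-of-means construction below, the function names, and the toy two-dimensional hidden states are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def compliance_vector(compliant_states, refusing_states):
    """Assumed construction: mean compliant state minus mean refusing state."""
    mu_c = mean_vector(compliant_states)
    mu_r = mean_vector(refusing_states)
    return [c - r for c, r in zip(mu_c, mu_r)]


def projection_shift(sample_state, reference_state, direction):
    """Signed length of (sample - reference) projected onto the unit direction."""
    norm = math.sqrt(sum(x * x for x in direction))
    diff = [s - r for s, r in zip(sample_state, reference_state)]
    return sum(d * x for d, x in zip(diff, direction)) / norm


# Hypothetical 2-D hidden states at one layer.
direction = compliance_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 2.0]])
print(direction)                                             # [2.0, 2.0]
print(projection_shift([3.0, 4.0], [0.0, 0.0], [3.0, 4.0]))  # 5.0
print(projection_shift([4.0, -3.0], [0.0, 0.0], [3.0, 4.0])) # 0.0
```

Under this reading, samples with a large positive shift would be flagged as higher risk and samples orthogonal to the direction as lower risk; which layer to read the states from is what the paper's Compliance-Aware Score is said to select.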


2606.00158 2026-06-02 eess.IV cs.CV

Training-Free Continuous Bitrate Control for Scalable Image Coding for Humans and Machines

面向人类与机器的可扩展图像编码的无训练连续码率控制

Yui Tatsumi, Hiroshi Watanabe

发表机构 * University of Tokyo（东京大学）

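AI summary: Proposes a training-free variable-bitrate scalable image coding framework that achieves continuous bitrate control by adjusting the quantization step size based on predicted scale values, while preserving high-scale information in the machine layer and the enhancement layer.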

2606.00157 2026-06-02 stat.ML cs.AI cs.LG math.PR

Interpreting FCDNNs via RG on Exponential Family

通过指数族上的重正化群解释全连接深度神经网络

Fuzhou Gong, Zigeng Xia

发表机构 * University of Science and Technology of China（中国科学技术大学）

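AI summary: By establishing a correspondence between the renormalization group (RG) method in statistical physics and the training process of deep neural networks, the paper proves that, for continuous input data from the exponential family, the characteristic parameters of the feature-layer output of a trained fully connected DNN equal the fixed points under the RG method, which explains the DNN's feature extraction ability.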

Comments 18 pages, 2 figures

AI中文摘要

我们考虑通过建立统计物理中的重正化群（RG）方法与深度神经网络（DNN）训练过程之间的对应关系，来建立深度学习的可解释性理论。我们已使用一维伊辛模型作为输入数据证明了所构建的关系。本文我们将结果推广到连续输入数据的情况，这是将该对应框架应用于真实数据的必要准备。为具有代表性，我们考虑指数族中的一类数据分布。我们证明，当全连接（FC）DNN的参数在训练后达到最优值时，DNN特征层输出的特征参数等于连续场RG方法下输入数据特征参数的不动点。这一结论表明，DNN的训练过程等价于对此类数据进行RG计算，因此网络能够像RG一样从输入数据中提取主要特征。此外，该等价性进一步验证了我们建立的对应框架，为DNN在真实数据上的卓越表现提供了解释。

英文摘要

We consider establishing the interpretability theory of deep learning through constructing a corresponding relationship between the renormalization group (RG) method in statistical physics and the training process of deep neural networks (DNNs). We have proved the constructed relationship using the one-dimensional Ising model as the input data. In this paper we generalize our results to the case of continuous input data, which is a necessary preparation for applying the corresponding framework to real-world data. To be representative, we consider a class of data distribution in the exponential family. We prove that when the parameters of fully connected (FC) DNNs achieve their optimal value after training, the characteristic parameters of the feature layer output of DNNs are equal to the fixed points of the characteristic parameters of input data under RG method for continuous fields. This conclusion shows that the training process of DNNs is equivalent to RG calculation on this kind of data and therefore the network can extract main features from the input data just like RG. Also, the equivalence further validates the correspondence framework we have established, providing an explanation for the outstanding performance of DNNs on real-world data.


2606.00155 2026-06-02 cs.CR cs.AI

A Protocol-Language Model for Network Intrusion (Without Deep Packet Inspection)

一种用于网络入侵的协议语言模型（无需深度包检测）

Vivek Kumar Sharma

发表机构 * Palo Alto Networks（帕洛阿尔托网络）

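AI summary: Proposes PLM-NIDS, which uses an RWKV-4 state-space model to treat network flows as a language and detects attacks from L3/L4 packet metadata alone, without deep packet inspection, achieving zero-shot anomaly detection (PR-AUC = 0.93) and transparent handling of encrypted protocols.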

Comments 20 pages. Research paper on Packet Language Models for Network Intrusion Detection Systems (Without Deep Packet Inspection). Code available on GitHub

AI中文摘要

现代网络入侵检测系统（NIDS）陷入结构性矛盾：承载最高威胁情报的协议恰恰是那些在TLS 1.3和QUIC下加密的协议，其中负载检测毫无用处。我们提出一个更简单的问题——如果攻击签名不在字节中，而在节奏中呢？——并通过将网络流视为一种语言来回答，该语言的语法完全由L3/L4包元数据编写：长度、到达间隔时间、TTL、TCP标志和哈希端口号。我们提出了PLM-NIDS，它依次证明了三个主张。（1）语法存在且可学习：在344,232个未标记的Monday流上训练的RWKV-4状态空间模型实现了0.204的因果LM验证损失，表明良性流量具有可预测的、统计一致的结构。（2）攻击违反此语法：在训练时使用零攻击标签的情况下，每流困惑度得分以PR-AUC=0.93清晰分离良性流和攻击流。（3）这种分离在架构上非平凡：在相同令牌序列上训练的LSTM退化为多数类预测器（ROC-AUC约0.50，通过始终预测“攻击”得到F1=0.91），证明RWKV的因果预训练提供了直接分类器无法获得的归纳偏置。监督微调进一步将PR-AUC提升至0.94，ROC-AUC提升至0.75，在校准操作阈值下精确率为97.7%。RWKV骨干的O(T)循环推理支持逐包流式处理而无需流缓冲，使PLM-NIDS在线速率下可操作。由于它仅读取IP/TCP/UDP头部，因此本质上是加密无关的：TLS 1.3、QUIC和未来的加密协议均被透明处理。

英文摘要

Modern network intrusion detection systems (NIDS) are caught in a structural contradiction: the protocols carrying the highest threat intelligence are precisely those encrypted under TLS 1.3 and QUIC, where payload inspection yields nothing. We ask a simpler question -- what if the attack signature is not in the bytes, but in the rhythm? -- and answer it by treating network flows as a language whose grammar is written entirely in L3/L4 packet metadata: length, inter-arrival time, TTL, TCP flags, and hashed port numbers. We present PLM-NIDS, which proves three claims in sequence. (1) The grammar exists and is learnable: a RWKV-4 state-space model trained on 344,232 unlabelled Monday flows achieves a causal LM validation loss of 0.204, demonstrating that benign traffic has predictable, statistically consistent structure. (2) Attacks violate this grammar: the per-flow perplexity score cleanly separates benign from attack flows with PR-AUC = 0.93 using zero attack labels at training time. (3) This separation is architecturally nontrivial: an LSTM trained on identical token sequences degenerates to a majority-class predictor (ROC-AUC approximately 0.50, F1 = 0.91 by always predicting "attack"), proving that RWKV's causal pre-training provides an inductive bias unavailable to direct classifiers. Supervised fine-tuning further raises PR-AUC to 0.94 and ROC-AUC to 0.75, with a precision of 97.7% at the calibrated operating threshold. The RWKV backbone's O(T) recurrent inference enables per-packet streaming without flow buffering, making PLM-NIDS operationally viable at line rate. Because it reads only IP/TCP/UDP headers, it is inherently encryption-agnostic: TLS 1.3, QUIC, and future encrypted protocols are handled transparently.
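Two of the quantities in this abstract follow from short formulas, sketched below. The first function shows the standard definition of perplexity as the exponential of the mean negative log-likelihood; whether the paper normalizes per token in exactly this way is an assumption. The second pair works out what an F1 of 0.91 from always predicting "attack" implies, assuming F1 is measured on the attack class: precision equals the attack prevalence p and recall is 1, so F1 = 2p / (1 + p).

```python
import math


def flow_perplexity(token_log_probs):
    """Perplexity of one flow: exp of the mean negative log-likelihood (natural log)."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def always_positive_f1(prevalence):
    """F1 of a classifier that labels every flow as the positive (attack) class."""
    precision = prevalence  # every flow is flagged, so precision is the base rate
    recall = 1.0            # no positive is ever missed
    return 2 * precision * recall / (precision + recall)


def prevalence_for_f1(f1):
    """Invert F1 = 2p / (1 + p) to recover the positive-class prevalence p."""
    return f1 / (2.0 - f1)


# A flow whose tokens all have log-probability -0.204 has perplexity exp(0.204).
print(flow_perplexity([-0.204] * 10))
# Attack prevalence implied by F1 = 0.91 for an always-"attack" predictor.
print(prevalence_for_f1(0.91))
```

Under these assumptions, the reported F1 of 0.91 would correspond to roughly 83.5% of the evaluated flows being attacks, which is why the abstract pairs it with a ROC-AUC near 0.50 to show the LSTM learned nothing useful.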


2606.00152 2026-06-02 cs.CR cs.AI

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

PrivacyPeek: 审计基于LLM的智能体获取了什么，而不仅仅是它们说了什么

Mingxuan Zhang, Jiahui Han, Dadi Guo, Songze Li, Guanchu Wang, Na Zou, Dongrui Liu, Xia Hu

发表机构 * Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）； Southeast University（东南大学）

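AI summary: Proposes the PrivacyPeek benchmark, which inspects tool-call trajectories and uses probe elicitation to evaluate unnecessary acquisition-stage leakage of sensitive information by LLM-based agents, finding that the problem is widespread and that existing defences have limited effect.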

Comments 19 pages, 9 figures

AI中文摘要

基于LLM的智能体正在快速发展，能够自主调用外部工具为用户完成多步骤任务。然而，智能体常常获取超出任务所需的敏感信息。现有的隐私基准审计智能体的响应或外部行为泄露了什么，但忽略了数据首次进入智能体上下文时的获取阶段。过度获取的信息只需一次粗心操作或一次攻击即可完全泄露。为了评估其普遍性，我们引入了PrivacyPeek，一个用于评估基于LLM的智能体获取阶段隐私泄露的基准，包含1,182个案例，涵盖7种获取行为和16个应用领域。具体来说，“获取检查”检查智能体的工具调用轨迹，包括其调用的工具和接收的数据，以检测其何时获取超出任务范围的敏感信息。然后，“探针诱导”发出后续探针，并衡量攻击者能够多容易地诱导出智能体已获取但未披露的敏感信息。我们在4个模型家族的10个基于LLM的智能体上的实验表明，不必要的敏感信息获取非常普遍。此外，我们观察到任务完成能力与获取阶段泄露之间存在相关性。提示级别的防御仅减少了获取阶段泄露的一小部分，大部分未被缓解。这些结果使得审计获取阶段的隐私既紧迫又必要。我们的数据集和代码可在https://github.com/Xuan269/PrivacyPeek-Resource获取。

英文摘要

LLM-based agents are rapidly advancing, autonomously invoking external tools to complete multi-step tasks for users. However, agents often acquire more sensitive information than the task requires. Existing privacy benchmarks audit what the agent's response or outgoing actions disclose, but overlook the acquisition stage where data first enters the agent's context. The over-acquired information is then one careless action or one attack away from an outright leak. To assess its prevalence, we introduce PrivacyPeek, a benchmark for evaluating acquisition-stage privacy leakage of LLM-based agents, with 1,182 cases across 7 acquisition behaviours and 16 application domains. Specifically, Acquisition Inspection examines the agent's tool-call trajectory, both the tools it invokes and the data it receives, to detect when it acquires sensitive information beyond the task scope. Probe Elicitation then issues a follow-up probe and measures how readily an attacker could elicit sensitive information the agent acquired but did not disclose. Our experiments on 10 LLM-based agents across 4 model families show that the unnecessary acquisition of sensitive information is widespread. In addition, we observe a correlation between the task-completion capability and acquisition-stage leakage. Prompt-level defences reduce only a small fraction of acquisition-stage leakage, leaving the majority unmitigated. These results make auditing acquisition-stage privacy both urgent and necessary. Our dataset and code are available at https://github.com/Xuan269/PrivacyPeek-Resource.


2606.00150 2026-06-02 cs.CR cs.AI

Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models

人格攻击：针对大型语言模型的增量记忆注入越狱攻击

Junyoung Park, Seongyong Ju, Sunghwan Park, Jaewoo Lee

发表机构 * Chung-Ang University（中央大学）

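AI summary: Proposes Persona Attack, a memory-injection-based jailbreak method that incrementally manipulates the model's context window so that, as injections accumulate in memory, the model prioritizes the injected instructions and bypasses safety alignment; the attack success rate can reach 95% under specific configurations.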

AI中文摘要

随着大型语言模型为方便用户而不断发展，尽管在安全训练方面持续努力，但越狱攻击的脆弱性仍被不断报告。传统的越狱技术通常侧重于单次提示注入，忽略了模型记住对话流程和用户指令的能力。在本文中，我们提出了Persona Attack，一种基于记忆注入的越狱方法，通过逐步方法操纵模型的上下文窗口。将Persona Attack应用于多个广泛使用的LLM的实验结果表明，随着注入在记忆中的积累，模型越来越优先考虑这些指令，而不是其内部安全对齐机制。此外，我们的实验经验性地证明，攻击成功率不仅根据模型的记忆实现而变化，还取决于指令的组合，在特定指令配置下可达到95%。

英文摘要

As Large Language Models evolve for user convenience, vulnerability to jailbreak attacks continues to be reported despite ongoing efforts in safety training. Traditional jailbreak techniques typically focus on a single prompt injection, neglecting the models' ability to remember the flow of conversation and the user's instructions. In this paper, we propose Persona Attack, a memory-injection-based jailbreak method that manipulates the model's context window through a step-by-step approach. Experimental results from applying Persona Attack to several widely used LLMs reveal that, as injections accumulate in memory, models increasingly prioritize these instructions over their internal safety alignment mechanisms. Furthermore, our experiments empirically demonstrate that the attack success rate varies not only with the model's memory implementation but also with the combination of instructions, and can reach 95% under specific instruction configurations.


2606.00146 2026-06-02 eess.IV cs.AI cs.CV

Multi-Contrast MRI Motion Correction via Parameter-Informed Disentanglement and Adaptive Experts

多对比度MRI运动校正：基于参数信息解缠与自适应专家网络

Honglin Xiong, Yuxian Tang, Feng Li, Yulin Wang, Lei Xiang, Dinggang Shen, Qian Wang

发表机构 * ShanghaiTech University（上海科技大学）

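AI summary: Proposes a unified framework combining parameter-informed contrast disentanglement with severity-aware adaptive correction: ScanCLIP extracts contrast embeddings to separate out anatomical content, and a Vision Transformer estimates motion severity and routes features to a Mixture-of-Experts network, correcting motion artifacts across contrasts and severities and outperforming existing methods on the IXI and HCP benchmarks.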

AI中文摘要

磁共振成像中的运动伪影降低了诊断可靠性。现有的深度学习方法通常针对特定对比度，无法泛化到不同模态和伪影严重度。我们提出一个统一框架，结合参数信息对比度解缠与严重度感知自适应校正。ScanCLIP在超过30,000个MRI文本-图像对上预训练，从采集参数中导出对比度嵌入，将对比度风格与解剖内容分离，得到无对比度特征。然后，视觉Transformer估计运动严重度，并通过专家混合网络路由特征，实现针对性伪影校正。双路径解码器重建干净图像和残差伪影图，强制执行图像空间一致性。在IXI和HCP基准上，我们的方法在PSNR上提升0.75 dB，SSIM最高提升0.0279，优于现有方法，且在更高伪影严重度下增益更大。该方法在真实临床数据上展现出鲁棒的零样本泛化能力，这些数据使用未见过的扫描参数采集，而现有方法要么无法去除伪影，要么引入额外失真。

英文摘要

Motion artifacts in magnetic resonance imaging (MRI) degrade diagnostic reliability. Existing deep learning methods are typically contrast-specific and fail to generalize across diverse modalities and artifact severities. We propose a unified framework combining parameter-informed contrast disentanglement with severity-aware adaptive correction. ScanCLIP, pretrained on over 30,000 MRI text-image pairs, derives contrast embeddings from acquisition parameters to disentangle contrast style from anatomical content, yielding contrast-free features. A Vision Transformer then estimates motion severity and routes features through a Mixture-of-Experts network, enabling targeted artifact correction. A dual-pathway decoder reconstructs both the clean image and residual artifact map, enforcing image-space consistency. On IXI and HCP benchmarks, our method improves PSNR by 0.75 dB and SSIM by up to 0.0279 over state-of-the-art approaches, with larger gains at higher artifact severities. It further demonstrates robust zero-shot generalization on real-world clinical data acquired with unseen scanning parameters, where existing methods either fail to remove artifacts or introduce additional distortions.


2606.00143 2026-06-02 q-fin.PM cs.AI

Regime-Adaptive Continual Learning for Portfolio Management

Chaofan Pan, Lingfei Ren, Linbo Xiong, Yonghao Li, Wei Wei, Xin Yang

发表机构 * Southwestern University of Finance and Economics（西南财经大学）； Shanxi University（山西大学）

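AI summary: Proposes the ReCAP framework, which uses adaptive regime detection and continual learning to achieve rapid adaptation and superior long-term returns in portfolio management.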

Comments Accepted by KDD 2026

AI中文摘要

金融市场本质上是不稳定的，频繁出现制度转变和结构性变化，使得传统的投资组合管理方法失效。现有的补救措施，如滚动窗口重新训练和朴素在线微调，分别受到高计算成本和知识利用不足的困扰，导致低回报和有限的适应性。持续学习通过使交易代理能够跨顺序任务积累和转移知识，提供了一种有前景的范式。在本文中，我们提出了Regime-aware Continual Adaptive Portfolio management（ReCAP），一个将CL集成到PM中以应对动态金融环境挑战的新框架。ReCAP采用自适应制度检测模块将历史市场数据分割成可变长度的制度，实现制度特定的策略向量学习和策略库构建。在持续交易过程中，制度门控模块根据当前市场状态自适应地组合策略库中的策略向量，促进对新检测到的制度的快速适应。只有制度门控和当前制度的策略向量被持续更新，以有效保留有用知识。在五个真实世界数据集上的广泛实验表明，ReCAP持续优于流行的基线，在长期投资视野中实现卓越回报，并快速适应制度转变。

英文摘要

Financial markets are inherently non-stationary, exhibiting frequent regime shifts and structural changes that render traditional Portfolio Management (PM) approaches ineffective. Existing remedies, such as rolling-window retraining and naive online fine-tuning, are hindered by high computational costs and insufficient knowledge utilization, respectively, resulting in low returns and limited adaptability. Continual learning (CL) offers a promising paradigm by enabling trading agents to accumulate and transfer knowledge across sequential tasks. In this paper, we propose Regime-aware Continual Adaptive Portfolio management (ReCAP), a novel framework that integrates CL into PM to address the challenges of dynamic financial environments. ReCAP employs an adaptive regime detection module to segment historical market data into variable-length regimes, enabling regime-specific learning of policy vectors and the construction of a policy library. During continual trading, a regime-gate module adaptively combines policy vectors from the library based on the current market state, facilitating rapid adaptation to newly detected regimes. Only the regime-gate and the current regime's policy vector are continually updated to preserve useful knowledge effectively. Extensive experiments on five real-world datasets demonstrate that ReCAP consistently outperforms popular baselines, achieving superior returns in long-term investment horizons and rapid adaptation to regime shifts.
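The regime-gate step, which combines policy vectors from a library according to the current market state, can be pictured as a weighted sum. The sketch below is only one plausible reading: the abstract does not specify the gating function, so the softmax weighting, the function names, and the toy two-dimensional policy vectors are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_policy_vectors(gate_logits, policy_library):
    """Blend regime-specific policy vectors using gate weights (assumed softmax)."""
    weights = softmax(gate_logits)
    dim = len(policy_library[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, policy_library))
        for d in range(dim)
    ]


# Hypothetical library with one policy vector per previously seen regime.
library = [[1.0, 0.0], [0.0, 1.0]]
print(combine_policy_vectors([0.0, 0.0], library))  # [0.5, 0.5]
```

Equal gate scores give an even blend, while a strongly dominant score recovers a single regime's vector almost exactly. In the paper, the gate scores would be produced from the current market state, and only the gate and the current regime's vector are described as being updated.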


2606.00134 2026-06-02 cs.CR cs.AI cs.LG

XAI-SOH-FL: Enhancing SOH-FL with Adaptive Aggregation and Explainable AI for Intrusion Detection in Heterogeneous IoT

XAI-SOH-FL: 通过自适应聚合和可解释人工智能增强异构物联网入侵检测中的SOH-FL

Ambreen Aslam, Maaz Hassan, Bibi Zahra, Muhammad Khuram Shahzad

发表机构 * School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST)（巴基斯坦国立科技大学（NUST）电气工程与计算机科学学院（SEECS））

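AI summary: To address data heterogeneity, label scarcity, and lack of model interpretability in heterogeneous IoT, the paper proposes the XAI-SOH-FL framework, which uses adaptive aggregation (dynamic γ selection and Bayesian optimization) and SHAP-based explainability to reach 94.12% accuracy and a 0.92 F1 score on the CICIDS2017 dataset, outperforming the baseline SOH-FL.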

Comments 8 pages, 6 figures; code available at https://github.com/aaslam-msit/SOH-FL-Enhancement

AI中文摘要

物联网环境中的入侵检测系统面临数据异构、缺乏标记数据和模型可解释性有限等重大挑战。联邦学习提供了一种隐私保护解决方案；然而，现有方法如SOH-FL存在两个关键限制：依赖手动调整的聚合参数γ以及模型预测缺乏可解释性。在本文中，我们提出XAI-SOH-FL，一个增强框架，将自适应聚合和可解释人工智能集成到SOH-FL范式中。首先，我们引入基于相似性阈值的动态γ选择机制，使聚合过程能够适应不断变化的数据分布。其次，采用贝叶斯优化自动确定最优γ值，消除了手动调整的需要。第三，引入SHAP（SHapley Additive exPlanations）为入侵检测决策提供特征级可解释性。在CICIDS2017数据集上的实验评估表明，所提方法达到了94.12%的准确率和0.92的F1分数，优于基线SOH-FL模型，同时收敛所需的通信轮次更少。此外，基于SHAP的分析揭示，流级特征如流持续时间和数据包长度显著影响模型预测。这些结果表明，XAI-SOH-FL在异构物联网环境中提供了准确性、适应性和可解释性之间的有效平衡。

英文摘要

Intrusion Detection Systems (IDS) in Internet of Things (IoT) environments face significant challenges due to data heterogeneity, lack of labeled data, and limited model interpretability. Federated Learning (FL) offers a privacy-preserving solution; however, existing approaches such as SOH-FL suffer from two key limitations: reliance on a manually tuned aggregation parameter γ and lack of explainability in model predictions. In this paper, we propose XAI-SOH-FL, an enhanced framework that integrates adaptive aggregation and explainable artificial intelligence into the SOH-FL paradigm. First, we introduce a dynamic γ selection mechanism based on similarity thresholding, enabling the aggregation process to adapt to evolving data distributions. Second, Bayesian Optimization is employed to automatically determine optimal γ values, eliminating the need for manual tuning. Third, SHAP (SHapley Additive exPlanations) is incorporated to provide feature-level interpretability for intrusion detection decisions. Experimental evaluation on the CICIDS2017 dataset demonstrates that the proposed approach achieves an accuracy of 94.12% and an F1-score of 0.92, outperforming the baseline SOH-FL model while converging in fewer communication rounds. Furthermore, SHAP-based analysis reveals that flow-level features such as Flow Duration and Packet Length significantly influence model predictions. These results indicate that XAI-SOH-FL provides an effective balance between accuracy, adaptability, and interpretability in heterogeneous IoT environments.


2606.00131 2026-06-02 cs.SE cs.AI cs.LG cs.PL

AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve

AI-PROPELLER：基于AlphaEvolve的仓库规模过程间代码布局优化

Chaitanya Mamatha Ananda, Rajiv Gupta, Mircea Trofin, Aiden Grossman, Sriraman Tallam, Xinliang David Li, Amir Yazdanbakhsh

发表机构 * University of California, Riverside（加州大学河滨分校）； Google（谷歌）； DeepMind（深度思维）

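AI summary: Proposes the AI-PROPELLER system, which uses the Magellan agentic workflow to evolve Propeller's compiler heuristic into a fine-grained interprocedural optimizer and evaluates layout variants by executing them on actual hardware; it is the first to apply fine-grained interprocedural code layout optimization to industrial warehouse-scale applications, with performance improvements of 0.23% to 1.6%.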

AI中文摘要

后链接优化器（如Propeller和BOLT）已证明，精确的、基于性能剖析的代码布局可以从高度优化的二进制文件中提取显著的性能提升。然而，这些系统目前局限于过程内技术，未能充分利用过程间布局的全局潜力。由于组合爆炸的搜索空间和复杂的调用返回语义难以建模，过程间代码布局历来困难。因此，细粒度过程间布局的性能潜力在实践中尚未得到证实。AI-PROPELLER使用Magellan（一种智能工作流），将Propeller中的编译器启发式方法演化为细粒度过程间优化器，并微调所得策略的超参数。为确保高保真度，我们摒弃了近似的静态成本模型，智能工作流生成多个布局变体，并在实际硬件上执行以测量真实性能计数器，为进化循环提供精确的奖励信号。AI-PROPELLER已在包括大型仓库规模应用在内的多个基准测试上进行了评估，实验表明，在使用最先进的FDO和PLO优化后，性能提升0.23%至1.6%，这对于实际二进制文件而言意义重大。这是首次在工业环境中对大型仓库规模应用进行细粒度过程间代码布局优化。

英文摘要

Post-link optimizers (PLOs) such as Propeller and BOLT have demonstrated that precise, profile-guided code layout can extract significant performance gains from heavily optimized binaries. However, these systems are currently restricted to intraprocedural techniques, leaving the global potential of interprocedural layout largely untapped. Interprocedural code layout is historically difficult due to a combinatorially intractable search space and complex call-return semantics that are challenging to model. Consequently, the performance potential of fine-grained interprocedural layout remains unproven in practice. AI-PROPELLER uses Magellan, an agentic workflow that evolves the compiler heuristic in Propeller into a fine-grained interprocedural optimizer and fine-tunes the resulting policy hyperparameters. To ensure high fidelity, we move away from approximate static cost models: the agentic workflow generates multiple layout variants that are executed on actual hardware to measure real performance counters, providing a precise reward signal for the evolutionary loop. AI-PROPELLER has been evaluated on several benchmarks, including large warehouse-scale applications, and experiments show performance improvements of 0.23% to 1.6% over binaries already optimized with state-of-the-art FDO and PLO, which is significant for real-world binaries. This is the first time that large warehouse-scale applications in industrial settings have been optimized with fine-grained interprocedural code layout.


2606.00125 2026-06-02 cs.IR cs.AI cs.LG cs.MM

Multimodal Music Recommendation System using LLMs

使用LLMs的多模态音乐推荐系统

Srikar Prabhas Kandagatla, Sreehitha R. Narayana, Chandana Magapu, Swetha Mohan, Shamanth Kuthpadi, Hongjie Chen, Ryan A. Rossi, Franck Dernoncourt, Nesreen Ahmed

发表机构 * University of Massachusetts Amherst（马萨诸塞大学阿姆赫斯特分校）； Dolby Laboratories（Dolby实验室）； Adobe Research（Adobe研究）； Cisco Research（Cisco研究）

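AI summary: Proposes a multimodal framework that fuses audio, lyrics, LLM-generated semantic metadata, and listening completion ratios, significantly improving Recall and NDCG in session-based music recommendation.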

AI中文摘要

音乐推荐系统通常将歌曲视为不透明标记，依赖协同交互历史，忽略了语义或声学内容。先前工作探索了LLM增强、多模态和文本增强的序列推荐方法，但有些方法部分结合了语义、声学或参与信号，没有在一个统一的基于LLM的序列推理框架中联合建模所有三个信号，该框架将推荐基于实际歌曲内容。在这项工作中，我们提出了一个用于基于会话的音乐推荐的多模态框架，通过三种互补信号丰富了LastFM-1K数据集：(1) 使用预训练音乐和文本表示模型提取的音频和歌词嵌入，(2) 使用MGPHot注释方案生成的LLM语义元数据，以及(3) 收听完成率。我们采用E4SRec框架，通过扩展多模态特征和不同的项目ID编码器骨干（包括SASRec、BERT4Rec和GRU4Rec）来增强它。我们进一步扩展了LLM骨干选项，包括LLaMa-2-13B、Qwen2.5-7B-Instruct和LLaMa-3-70B，在零样本和微调设置下。我们的实验表明，集成基于内容的特征比仅使用ID的基线在Recall上提升高达95%，在NDCG上提升高达79%。此外，我们的实验表明，朴素的多模态融合并不总是产生加性改进，突显了跨模态整合的挑战。我们发布了一个用于音乐推荐的大规模多模态基准。

英文摘要

Music recommendation systems typically treat songs as opaque tokens, relying on collaborative interaction histories and overlooking semantic or acoustic content. Prior work has explored LLM-augmented, multimodal, and text-enhanced approaches to sequential recommendation, and while some methods partially combine semantic, acoustic, or engagement signals, none jointly model all three within a unified LLM-based sequential reasoning framework that grounds recommendations in actual song content. In this work, we propose a multimodal framework for session-based music recommendation that enriches the LastFM-1K dataset with three complementary signals: (1) audio and lyric embeddings extracted using pretrained music and text representation models, (2) LLM-generated semantic metadata using the MGPHot annotation schema, and (3) listening completion ratios. We adapt the E4SRec framework, extending it with multimodal features and different item ID encoder backbones, including SASRec, BERT4Rec, and GRU4Rec. We further extend the LLM backbone options with LLaMa-2-13B, Qwen2.5-7B-Instruct, and LLaMa-3-70B in both zero-shot and fine-tuned settings. Our experiments show that integrating content-based features improves over ID-only baselines by up to 95% in Recall and up to 79% in NDCG. Moreover, our experiments show that naive multimodal fusion does not always yield additive improvements, highlighting challenges in cross-modal integration. We release a large-scale multimodal benchmark for music recommendation.


2606.00120 2026-06-02 eess.SP cs.AI cs.LG

SpikeWFM: Spiking-Aided Wireless Foundation Model for Robust Channel Prediction

SpikeWFM：用于鲁棒信道预测的脉冲辅助无线基础模型

Liwen Jing, Yisha Lu, Tingting Yang, Li Sun, Yuxuan Shi, Yuwei Wang, Mengfan Zheng, Leiyang Xu

发表机构 * Mobile Information Networks-National Science and Technology Major Project（移动信息网络国家科技重大专项）

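AI summary: Proposes SpikeWFM, a hybrid architecture that combines spiking neural networks with ANN-based Transformers and uses temporal sparsity and event-driven processing to make wireless foundation models more robust to noise and interference, outperforming conventional models on channel prediction.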

AI中文摘要

本文提出SpikeWFM，一种新颖的混合架构，它将脉冲神经网络（SNN）与基于传统人工神经网络（ANN）的Transformer集成用于无线基础模型（WFM）。受人类大脑中噪声鲁棒且节能的信息处理启发，SpikeWFM旨在增强WFM对噪声和干扰的抵抗力，同时保持跨多种无线场景的强大泛化能力。借鉴大型语言模型成功经验，WFM利用跨各种无线环境的大规模数据集上的自监督预训练，学习一个统一的嵌入表示，支持包括信道预测、信道估计、波束预测、定位等在内的广泛下游任务。这类模型通常优于任务特定设计，并对未见条件表现出卓越的适应性。然而，现有WFM在实际无线系统中仍易受真实噪声和干扰影响。为解决这一局限，我们将脉冲神经元引入基于Transformer的WFM架构。我们提供简要理论分析，展示SNN-ANN混合如何通过时间稀疏性和事件驱动处理有效减轻噪声和干扰。实验结果表明，SpikeWFM在预训练收敛和信道预测准确性上均持续优于传统基于ANN的WFM。关于通信和感知任务的更多结果将在本工作的完整期刊版本中呈现。

英文摘要

This paper proposes SpikeWFM, a novel hybrid architecture that integrates spiking neural networks (SNNs) with conventional artificial neural network (ANN)-based transformers for wireless foundation models (WFMs). Inspired by the noise-robust and energy-efficient information processing in the human brain, SpikeWFM aims to enhance the resilience of WFMs against noise and interference while maintaining strong generalization capabilities across diverse wireless scenarios. Drawing from the success of large language models, WFMs leverage self-supervised pre-training on large-scale datasets spanning various wireless environments to learn a unified embedding that supports a wide range of downstream tasks, including channel prediction, channel estimation, beam prediction, and positioning. Such models typically outperform task-specific designs and exhibit superior adaptability to unseen conditions. However, existing WFMs remain vulnerable to realistic noise and interference in practical wireless systems. To address this limitation, we incorporate spiking neurons into the transformer-based WFM architecture. We provide a brief theoretical analysis demonstrating how the SNN-ANN hybrid effectively mitigates noise and interference through temporal sparsity and event-driven processing. Experimental results show that SpikeWFM consistently outperforms conventional ANN-based WFMs in both pre-training convergence and channel prediction accuracy. Additional results on communication and sensing tasks will be presented in the full journal version of this work.

2606.00112 2026-06-02 cs.NE cs.CV

Evolving to the Aesthetics of a Vision-Language Model

进化到视觉语言模型的美学

Stephen James Krol, Jon McCormack

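Affiliations: SensiLab, Monash University, Melbourne, Australia

AI summary: This study explores using vision-language models (VLMs) to assess the aesthetics of evolved designs, either through CLIP-IQA scores or through pairwise comparisons combined with the Glicko rating system, and analyses the strengths and weaknesses of both approaches against an artist's ranking.

Comments: Paper presented at ICCC26, June 29 - July 3, 2026, Coimbra, Portugal

AI Chinese abstract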

进化系统在创意领域已展现出显著成果，最近的应用包括生成式排版、设计和音乐。然而，设计能有效捕捉抽象输出所需美学的适应度函数仍是一个开放问题。在这项工作中，我们探索了两种使用视觉语言模型（VLM）评估种群美学的方法。第一种方法使用CLIP-IQA预测每个设计的美学分数。第二种方法则让候选设计相互对抗，由VLM根据用户指定的自定义提示确定胜者。然后，这些成对比较的结果通过Glicko评级系统用于估计种群排名。我们在一个使用自定义生成系统的案例研究中展示了这些方法，并将所得排名与艺术家的美学排名以及其他美学评估技术产生的排名进行比较。此外，我们记录了艺术家使用这些方法进化设计的体验，批判性地分析了两种方法的优缺点。

English abstract

Evolutionary systems have demonstrated remarkable results in creative domains, with recent applications in generative typography, design, and music. However, an open problem remains in designing fitness functions that effectively capture the desired aesthetics of abstract outputs. In this work, we explore two methods for evaluating the aesthetics of a population using Vision-Language Models (VLMs). The first method uses CLIP-IQA to predict an aesthetic score for each design. The second method instead pits candidates against each other, with winners determined by a VLM using a custom prompt specified by the user. The outcomes of these pairwise comparisons are then used to estimate a population ranking via the Glicko rating system. We present these methods in the context of a case study using a custom generative system and compare the resulting rankings with an artist's aesthetic ranking and those produced by other aesthetic evaluation techniques. Additionally, we document the artist's experience using these approaches to evolve designs, critically analysing the strengths and weaknesses of both methods.
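The pairwise-comparison ranking described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it applies the standard Glicko-1 update over a single rating period, and the function names, the starting rating of 1500, the starting deviation of 350, and the `(winner, loser)` input format are all assumptions made for the example. The VLM judge itself is out of scope; its verdicts are taken as given.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Attenuation factor: discounts results against uncertain opponents."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_j, rd_j):
    """Expected score of a player rated r against opponent (r_j, rd_j)."""
    return 1.0 / (1.0 + 10 ** (-g(rd_j) * (r - r_j) / 400.0))


def glicko_update(r, rd, results):
    """Glicko-1 update for one rating period.

    results: list of (opponent_rating, opponent_rd, score), score 1.0 = win, 0.0 = loss.
    """
    if not results:
        return r, rd
    d2_inv = 0.0
    delta = 0.0
    for r_j, rd_j, s in results:
        e = expected(r, r_j, rd_j)
        d2_inv += Q**2 * g(rd_j) ** 2 * e * (1.0 - e)
        delta += g(rd_j) * (s - e)
    denom = 1.0 / rd**2 + d2_inv
    return r + (Q / denom) * delta, math.sqrt(1.0 / denom)


def rank_population(names, comparisons, r0=1500.0, rd0=350.0):
    """Rank designs from (winner, loser) pairs, e.g. verdicts from a VLM judge (hypothetical input)."""
    ratings = {n: (r0, rd0) for n in names}
    results = {n: [] for n in names}
    for winner, loser in comparisons:
        results[winner].append((*ratings[loser], 1.0))
        results[loser].append((*ratings[winner], 0.0))
    updated = {n: glicko_update(*ratings[n], results[n]) for n in names}
    order = sorted(names, key=lambda n: updated[n][0], reverse=True)
    return order, updated
```

In an evolutionary loop, the resulting ratings (or ranks) would stand in for fitness values; the rating deviation indicates how many more comparisons a candidate may need before its rank can be trusted.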

2606.00111 2026-06-02 eess.IV cs.CV cs.LG

ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression

ChWDTA：用于学习图像压缩的通道级小波域变换器注意力和熵建模

Haisheng Fu, Runyu Yang, Feng Ding, Siyu Zhu, Jie Liang, Xiaoxiao Li, Zhenman Fang, Jingning Han

Affiliations: Electrical and Computer Engineering Department, The University of British Columbia; School of Engineering Science, Simon Fraser University; School of Electronic Science and Technology, Eastern Institute of Technology; Google LLC

AI summary: Proposes channel-wise wavelet-domain transformer attention (ChWDTA) and a channel-wise wavelet packet decomposition that improve rate-distortion performance within a hybrid CNN-Transformer image compression framework, achieving substantial BD-rate reductions on several test sets.

Comments: 13 pages, 8 figures, 6 tables

AI Chinese abstract

最先进的学习图像压缩（LIC）方案越来越多地基于混合CNN-Transformer架构。为了进一步提高率失真性能，我们将通道级小波变换引入变换器和熵编码组件。首先，我们提出了一种通道级小波域变换器注意力（ChWDTA）机制。ChWDTA保留了现代LIC骨干中使用的有效窗口化空间自注意力，但在将注意力输出通过逆变换映射回来之前，在通道级小波变换特征上计算Q/K/V投影。因此，得到的通道级小波域变换器块（ChWDTB）保留了窗口化注意力的空间标记化模式，同时稀疏化了注意力投影所见的通道协方差。其次，在熵编码阶段，我们引入了一种通道级小波包（ChWP）分解，产生四个大小相等的子带，这更适合基于通道级切片的自回归熵建模。当每个通道级子带被分成两个切片时，我们使用八个切片进行熵编码。通过这种配置，所提出的方案在Kodak、CLIC Professional Validation和Tecnick测试集上分别获得了-17.82%、-19.15%和-22.56%的BD-rate降低。即使每个通道级子带被编码为单个切片，该方案仍以较低的复杂度保留了大部分编码增益。结果证实了在基于CNN-Transformer的LIC方案中引入小波变换的优势。

English abstract

State-of-the-art learned image compression (LIC) schemes are increasingly based on hybrid CNN-transformer architectures. To further improve rate-distortion performance, we introduce channel-wise wavelet transforms into both the transformer and entropy-coding components. First, we propose a channel-wise wavelet-domain transformer attention (ChWDTA) mechanism. ChWDTA keeps the efficient windowed spatial self-attention used in modern LIC backbones, but computes the Q/K/V projections on channel-wise wavelet-transformed features before mapping the attention output back with the inverse transform. The resulting Channel-wise Wavelet-Domain Transformer Block (ChWDTB) therefore preserves the spatial tokenization pattern of windowed attention while sparsifying the channel covariance seen by the attention projections. Second, in the entropy-coding stage, we introduce a channel-wise wavelet packet (ChWP) decomposition that produces four equal-sized subbands, which better fit channel-wise slice-based autoregressive entropy modeling. When each channel-wise subband is divided into two slices, we use eight slices for entropy coding. With this configuration, the proposed scheme obtains BD-rate reductions of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively. Even when each channel-wise subband is coded as a single slice, the scheme still retains most of the coding gains with lower complexity. The results confirm the advantage of introducing wavelet transform in CNN-transformer-based LIC schemes.
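To make the "four equal-sized subbands" concrete, here is a small sketch of a two-level wavelet packet applied along the channel axis of a feature tensor. The abstract does not name the wavelet or the number of levels, so the Haar filter, the two-level depth, and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def haar_split(x):
    """One orthonormal Haar step along axis 0 (the channel axis)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def haar_merge(low, high):
    """Inverse of haar_split: interleave reconstructed even and odd channels."""
    even = (low + high) / np.sqrt(2)
    odd = (low - high) / np.sqrt(2)
    out = np.empty((2 * low.shape[0],) + low.shape[1:], dtype=low.dtype)
    out[0::2], out[1::2] = even, odd
    return out


def channel_wavelet_packet(x):
    """Split a (C, H, W) tensor into four subbands of C/4 channels each."""
    if x.shape[0] % 4 != 0:
        raise ValueError("channel count must be divisible by 4")
    low, high = haar_split(x)
    ll, lh = haar_split(low)   # a packet transform also splits the high band,
    hl, hh = haar_split(high)  # unlike a plain multi-level wavelet transform
    return ll, lh, hl, hh


def inverse_channel_wavelet_packet(ll, lh, hl, hh):
    """Reassemble the original tensor from the four subbands."""
    return haar_merge(haar_merge(ll, lh), haar_merge(hl, hh))
```

Because every subband has the same channel count, each one can be treated as one slice (or cut into two) for slice-based autoregressive entropy modelling, which matches the four-slice and eight-slice configurations mentioned in the abstract.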

2606.00107 2026-06-02 eess.SP cs.AI cs.LG

Motif-based morphology signatures for interpretable ECG screening and monitoring

基于基序的形态学特征用于可解释的心电图筛查和监测

Nivedita Bijlani, Mauricio Villarroel

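Affiliations: The Podium Institute of Sports Medicine and Technology

AI summary: Proposes a motif-based framework that defines interpretable beat-aligned motifs and three drift metrics to quantify morphological change and detect abnormalities in short- and long-term ECG monitoring.

Comments: Accepted to the IEEE Engineering in Medicine and Biology Conference (EMBC) 2026

AI Chinese abstract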

心电图仍然是心血管筛查的核心，但解读仍主要依赖人工且呈间歇性。临床实践依赖于简短的静息心电图，并在需要时进行长时间动态记录，两者都会产生需要大量资源审查的数据。因此，在临床明显异常出现之前，微妙的形态学变化或渐进性漂移可能被忽视。我们提出了一种基于基序的框架，该框架将心跳对齐的心电图基序定义为可解释的心脏特征，并量化短期和长期监测中的形态学漂移和偏差。基序是代表主导形态的典型心动周期。我们引入了三个可解释的漂移度量：与正常窦性心律的偏差、与个性化基线的偏差以及基序不稳定性指数。基序通过选择在固定窗口内最小化动态时间规整距离的心跳来提取。我们在短期（PTB-XL）和长期（MIT-BIH心律失常）心电图数据集上评估这些度量。通过代表性基序叠加和基于基准点的可视化实现可解释性，从而能够直接检查形态学变化。在MIT-BIH中，所提出的度量显著区分了主要正常和心律失常受试者（p<0.01）。在PTB-XL中，正常窦性心律偏差在主要诊断亚型中区分了正常和异常心电图（p<1e-4，Cliff's delta高达0.93）。心电图基序提供了心脏形态的可解释表示，支持可扩展的纵向监测和形态学驱动变化的早期检测。

English abstract

Electrocardiography (ECG) remains central to cardiovascular screening, yet interpretation remains largely manual and episodic. Clinical practice relies on brief resting ECGs and, when required, long-duration ambulatory recordings, both generating data that require resource-intensive review. Consequently, subtle morphological changes or progressive drift preceding clinically apparent abnormalities may go unnoticed. We propose a motif-based framework that defines beat-aligned ECG motifs as interpretable cardiac signatures and quantifies morphological drift and deviation across short and long-term monitoring. Motifs are representative cardiac cycles capturing dominant morphology. We introduce three interpretable drift metrics: deviation from a normal sinus rhythm (NSR), deviation from a personalised baseline, and a motif instability index. Motifs are extracted by selecting beats that minimise Dynamic Time Warping (DTW) distance within fixed windows. We evaluate these metrics on short (PTB-XL) and long-duration (MIT-BIH Arrhythmia) ECG datasets. Interpretability is achieved through representative motif overlays and fiducial-based visualisations, enabling direct inspection of morphological changes. In MIT-BIH, the proposed metrics significantly separated predominantly normal from arrhythmic subjects (p<0.01). In PTB-XL, NSR deviation distinguished normal from abnormal ECGs across major diagnostic subtypes (p<1e-4, Cliff's delta up to 0.93). ECG motifs provide an interpretable representation of cardiac morphology, supporting scalable longitudinal monitoring and early detection of morphology-driven change.
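One plausible reading of the motif-extraction step is a medoid search: within a window of segmented beats, pick the beat whose total DTW distance to the other beats is smallest. The sketch below follows that reading. It is an assumption rather than the paper's procedure; beat segmentation, window length, and the function names are all hypothetical, and the DTW shown is the plain unconstrained variant with an absolute-difference cost.

```python
import numpy as np


def dtw_distance(a, b):
    """Unconstrained DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def extract_motif(beats):
    """Return the index of the beat with the smallest summed DTW distance to all others."""
    k = len(beats)
    totals = np.zeros(k)
    for i in range(k):
        for j in range(i + 1, k):
            d = dtw_distance(beats[i], beats[j])
            totals[i] += d
            totals[j] += d
    return int(np.argmin(totals))
```

A drift metric in the spirit of the abstract could then be the DTW distance between a window's motif and a reference motif, such as a personalised baseline.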

2606.00106 2026-06-02 eess.SP cs.AI cs.HC cs.LG

A Methodological Framework for Explicit Control of the Speed-Accuracy Trade-off in Brain-Computer Interfaces

脑机接口中速度-准确性权衡显式控制的方法论框架

Javier Jiménez, Francisco B Rodríguez

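Affiliations: Grupo de Neurocomputación Biológica, Departamento de Ingeniería Informática, Universidad Autónoma de Madrid

AI summary: Proposes an evaluation framework, independent of classifier, paradigm, and early-stopping strategy, that explicitly controls the speed-accuracy trade-off through two measures (Gain and Conservation) and a tunable parameter α, and validates it on P300 paradigms.

AI Chinese abstract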

脑机接口（BCI）受到脑电图等模态低信噪比的限制，需要多次试验才能可靠解码用户意图。这导致了速度-准确性权衡，即更高的准确性以速度为代价。速度-准确性平衡依赖于应用，因此需要可控的权衡。传统指标（如信息传输率）将速度和准确性合并，模糊了它们的依赖关系并可能引入偏差。在本研究中，我们提出了一个独立于分类器、范式和早停策略的评估框架，将速度和准确性分离。我们采用两个度量：增益（相对速度提升）和保持度（相对准确性保持），并将它们组合成一个由α控制的可调增益-保持平衡，从而调节速度-准确性权衡。该参数无需修改分类器即可调整工作点，便于跨场景部署。该框架在P300事件相关电位范式上进行了评估，使用了63名受试者的公开记录以及多种分类器和早停策略，以实现速度-准确性和比特率的不同工作点。结果表明，调整α可产生快速、准确或平衡的BCI行为，展示了速度-准确性权衡的显式控制。该方法支持受试者级别的性能预测，并提高了BCI行为的可解释性。对信息传输率的进一步分析揭示了其向速度的系统性偏差，该偏差通过所提出的框架中的增益和保持度测量得到解释。总体而言，本工作将速度-准确性权衡确立为可控的设计变量，并在公开的P300范式上进行了验证，从而实现了BCI的透明评估和应用特定优化。

English abstract

Brain-computer interfaces (BCIs) are limited by low signal-to-noise ratio in modalities such as electroencephalography, which requires multiple trials to reliably decode user intentions. This induces a speed-accuracy trade-off, whereby higher accuracy comes at the cost of speed. The speed-accuracy balance is application-dependent, motivating controllable trade-offs. Conventional metrics, such as the Information Transfer Rate, combine speed and accuracy, obscuring their dependence and potentially introducing biases. In this study, we propose an evaluation framework independent of classifier, paradigm, and early-stopping strategy that separates speed and accuracy. We employ two measures, Gain (relative speed improvement) and Conservation (relative accuracy preservation), and combine them into a tunable Gain-Cons Balance controlled by α, regulating the speed-accuracy trade-off. The parameter adjusts the operating point without modifying the classifier, facilitating deployment across scenarios. The framework was evaluated on P300 event-related potential paradigms using public recordings from 63 subjects as well as multiple classifiers and early-stopping strategies to achieve distinct operating points in speed-accuracy and bitrate. Results show that tuning α yields fast, accurate, or balanced BCI behaviours, demonstrating explicit control of the speed-accuracy trade-off. The method supports subject-level performance prediction and improves explainability of BCI behaviour. Further analysis of the Information Transfer Rate reveals a systematic bias toward speed, explained by the proposed framework through the Gain and Conservation measurements. Overall, this work establishes the speed-accuracy trade-off as a controllable design variable validated on public P300-based paradigms, enabling transparent evaluation and application-specific optimization of BCIs.
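The abstract does not give formulas for Gain, Conservation, or the Gain-Cons Balance, so they are not reproduced here. What can be illustrated is the conventional metric it criticises: the sketch below computes the Information Transfer Rate using the widely used Wolpaw definition, which is assumed to be the one meant. The 36-symbol speller and the two operating points are made-up numbers for illustration.

```python
import math


def itr_bits_per_selection(n_classes, accuracy):
    """Wolpaw ITR: bits conveyed per selection for N classes at accuracy P."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_classes)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def itr_bits_per_minute(n_classes, accuracy, seconds_per_selection):
    """Scale bits per selection by the selection rate."""
    return itr_bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection


# Two hypothetical operating points of a 36-symbol P300 speller.
slow_accurate = itr_bits_per_minute(36, 0.95, seconds_per_selection=20.0)
fast_less_accurate = itr_bits_per_minute(36, 0.80, seconds_per_selection=10.0)
```

With these made-up numbers the faster, less accurate operating point scores the higher ITR, which shows how a single combined figure can hide a loss of accuracy; reporting speed and accuracy separately avoids that.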

2606.00084 2026-06-02 cs.IR cs.AI cs.CL cs.LG

SentimentLens: Reconciling Sentiment and Ratings via Dual-Modality in the Hospitality Sector

SentimentLens: 通过双模态调和酒店业中的情感与评分

Dineth Jayakody, Pasindu Thenahandi, Sampath Jayarathna

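Affiliations: University of Peradeniya

AI summary: Proposes SentimentLens, a system based on aspect-based sentiment analysis that extracts knowledge from unstructured hotel reviews and reconciles textual sentiment with numerical ratings across modalities to identify operational conflicts and service-improvement opportunities.

AI Chinese abstract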

在线旅游平台生成大量用户生成的酒店评论，为大规模理解旅行者体验提供了丰富机会。然而，将非结构化文本反馈转化为结构化、可操作的见解仍然是一项具有挑战性的任务。本文提出了SentimentLens，一个基于方面级情感分析的可扩展分析系统，该系统从非结构化酒店评论中执行知识提取，并将其组织成可解释的服务类别。SentimentLens集成了方面术语提取、方面情感分类、语义类别分配和多层次分析模块，以支持区域级、酒店级和类别级评估。该系统设计为在不同地理环境和酒店环境中运行。为了展示其实用性，我们将SentimentLens应用于一个包含超过10,000条公开酒店评论的大型真实数据集。通过广泛分析，该框架揭示了旅行者情感如何随区域、服务类别和酒店类型而变化。我们进一步实现了文本情感与数值评分的跨模态调和，以识别潜在运营冲突、服务质量的结构性不一致性，并使用重要性-绩效和基于熵的分析确定高影响力的改进机会。结果表明，SentimentLens有效地将大规模非结构化评论转化为可操作的情报，支持酒店管理和旅游政策的数据驱动决策。虽然通过一个国家案例研究进行了演示，但所提出的系统可推广到其他目的地和评论驱动的服务领域。

English abstract

Online travel platforms generate vast volumes of user-generated hotel reviews, offering rich opportunities to understand traveler experiences at scale. However, transforming unstructured textual feedback into structured, actionable insights remains a challenging task. This paper presents SentimentLens, a scalable analysis system based on Aspect-Based Sentiment Analysis that performs knowledge extraction from unstructured hotel reviews and organizes them into interpretable service categories. SentimentLens integrates aspect term extraction, aspect sentiment classification, semantic category assignment, and multi-level analytical modules to support region-level, hotel-level, and category-level evaluation. The system is designed to operate across different geographic contexts and hospitality settings. To demonstrate its practical utility, we apply SentimentLens to a large real-world dataset of over 10,000 publicly available hotel reviews. Through extensive analysis, the framework reveals how traveler sentiment varies across regions, service categories, and hotel archetypes. We further implement a cross-modal reconciliation of textual sentiment and numerical ratings to identify latent operational conflicts, structural inconsistencies in service quality, and high-impact improvement opportunities using importance-performance and entropy-based analyses. The results show that SentimentLens effectively transforms large-scale unstructured reviews into actionable intelligence, supporting data-driven decision-making for hospitality management and tourism policy. While demonstrated using a national case study, the proposed system is generalizable to other destinations and review-driven service domains.

2606.00074 2026-06-02 eess.SP cs.AI cs.LG

CLSP-REQA: A Real-Time Quality-Aware Closed-Loop Seizure Prediction Framework with Mamba-BiLSTM and Confidence-Gated Intervention

CLSP-REQA：基于Mamba-BiLSTM和置信门控干预的实时质量感知闭环癫痫发作预测框架

Mufeng Chen, Qi Wu, Bingchao Huang, Xiwen Lai, Zekai Chen, Xinge Ouyang, Quansheng Ren

Affiliations: Department of Engineering Science, University of Oxford; Mathematical Institute, University of Oxford; School of Computer Science and Engineering, Beihang University; Aerospace Information Research Institute, Chinese Academy of Sciences; Department of Mechanical Engineering, The University of British Columbia; College of Life Sciences, Hunan Normal University; School of Electronics, Peking University

AI summary: Proposes the CLSP-REQA framework, which embeds a real-time EEG quality assessment module alongside a Mamba-BiLSTM backbone and combines them through a tiered non-linear fusion function, outperforming existing methods in seizure prediction under strict cross-patient evaluation.

Comments: 27 pages, 8 figures, submitted to Biomedical Signal Processing and Control

AI Chinese abstract

可靠的癫痫发作预测是闭环神经刺激治疗的前提，然而现有方法很少考虑实际部署中EEG信号质量的可变性，并且绝大多数采用非严格的评估协议，高估了泛化性能。我们提出了CLSP-REQA（具有实时EEG质量评估的闭环癫痫发作预测），这是一个统一框架，将轻量级信号质量估计器直接嵌入预测流程中。实时EEG质量评估（REQA）模块与Mamba-BiLSTM骨干网络并行运行，产生一个标量质量分数q ∈ [0,1]，通过分层非线性融合函数（ECLO）调节输出置信度。在CHB-MIT头皮EEG数据库（n=23名受试者，198次发作）的严格跨患者评估下，CLSP-REQA实现了0.7426 ± 0.0199的AUC-ROC，优于Jemal等人报告的未适应跨患者基线0.69，仅使用16个EEG通道（先前工作为23个），且无需任何目标患者数据或域适应。在SIENA头皮EEG数据库（n=14名受试者，47次发作）上，CLSP-REQA实现了0.7012 ± 0.0249的AUC，大幅超过同一数据集上最佳域适应跨患者结果0.61，展示了强大的跨数据集泛化能力。该框架输出结构化四元组(p, q, c, Phi_SHAP)，可直接与闭环神经刺激器接口兼容。

English abstract

Reliable seizure prediction is a prerequisite for closed-loop neurostimulation therapy, yet existing methods rarely account for the variability in EEG signal quality encountered in real-world deployment, and the overwhelming majority adopt non-strict evaluation protocols that overestimate generalisation performance. We propose CLSP-REQA (Closed-Loop Seizure Prediction with Real-time EEG Quality Assessment), a unified framework that embeds a lightweight signal quality estimator directly within the prediction pipeline. A Real-time EEG Quality Assessment (REQA) module runs in parallel with a Mamba-BiLSTM backbone, producing a scalar quality score q ∈ [0,1] that modulates output confidence through a tiered non-linear fusion function (ECLO). Under strict cross-patient evaluation on the CHB-MIT Scalp EEG Database (n = 23 subjects, 198 seizures), CLSP-REQA achieves an AUC-ROC of 0.7426 ± 0.0199, outperforming the unadapted cross-patient baseline of 0.69 reported by Jemal et al., using only 16 EEG channels compared to 23 in prior work, and without requiring any target-patient data or domain adaptation. On the SIENA Scalp EEG Database (n = 14 subjects, 47 seizures), CLSP-REQA achieves an AUC of 0.7012 ± 0.0249, substantially surpassing the best domain-adapted cross-patient result of 0.61 on the same dataset, demonstrating strong cross-dataset generalisation. The framework outputs a structured four-tuple (p, q, c, Phi_SHAP) directly compatible with closed-loop neurostimulator interfaces.

2606.00073 2026-06-02 cs.NE cs.AI cs.LG

Rare Events, Real Signals: Functional Ensembles as Units of Computation in Deep Spiking Networks

罕见事件，真实信号：深度脉冲网络中的功能集合作为计算单元

Aditi Aravind, Konstantinos Ladakis, Mario Alexios Savaglio, Stelios M. Smirnakis, Maria Papadopouli

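Affiliations: University of Crete; Foundation of Research & Technology - Hellas; Archimedes Research Unit; Harvard Medical School; Brigham and Women's Hospital

AI summary: Introduces a functional-connectivity analysis framework to study emergent functional ensembles in deep spiking neural networks, finding that cofiring of first-order functionally connected ensembles reliably predicts downstream neuronal responses and that information encoding is concentrated in rare but highly coordinated activity patterns.

AI Chinese abstract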

我们通过引入一个受神经科学启发的框架，从功能连接性的角度分析深度脉冲神经网络（SNN），研究内部表征如何在层次化处理系统中涌现。借鉴系统神经科学和信息论的概念，我们基于一个神经元与训练好的SNN架构中前一层神经元的统计显著成对相关性，形成该神经元的一阶功能连接（1FC）组。然后，我们在各种条件下的推理过程中跟踪其响应特性。我们的分析表明，先前在生物皮层中观察到的功能连接性的几个原理在脉冲ResNet架构中得以保留。这些1FC集合表现出有趣的特性：它们的聚合协同放电通过一个鲁棒的、类似ReLU的输入输出关系可靠地预测下游神经元响应，其增益随集合大小系统性缩放。仅在高的1FC协同放电事件期间才出现所呈现类别的可靠编码，而这些事件本身发生频率较低，表明信息表征集中在罕见但高度协调的活动模式中。在均匀随机噪声或对抗性扰动下，这些响应轮廓被破坏，尤其是在早期和中间层。这使得能够在特定节点和路径上进行有针对性的高分辨率探查。我们表明，功能连接结构由学习塑造，并且在权重置换下该结构被破坏。这些确立了1FC集合作为输入编码和信息传递的功能上有意义的基质，对设计针对信息流的有针对性的细粒度诊断具有潜在意义。

English abstract

We investigate how internal representations emerge across hierarchical processing systems by introducing a neuroscience-inspired framework for analyzing deep spiking neural networks (SNNs) through the lens of functional connectivity. Drawing on concepts from systems neuroscience and information theory, we form the first-order functionally-connected (1FC) group of a neuron based on its statistically significant pairwise correlations with neurons from the previous layer of a trained SNN architecture. We then track its response properties during inference under various conditions. Our analysis shows that several principles of functional connectivity previously observed in biological cortex are preserved in spiking ResNet architectures. These 1FC ensembles display interesting properties: their aggregate cofiring reliably predicts downstream neuronal responses through a robust, ReLU-like input-output relationship, whose gain scales systematically with ensemble size. Reliable encoding of the presented class emerges only during high 1FC cofiring events, which themselves occur infrequently, indicating that informative representations are concentrated in rare but highly coordinated activity patterns. Under uniform random noise or adversarial perturbations, these response profiles are disrupted, particularly in early and intermediate layers. This enables a targeted high-resolution interrogation at specific nodes and pathways. We show that the functional connectivity structure is shaped by learning and that it breaks under weight permutation. These findings establish 1FC ensembles as a functionally meaningful substrate for input encoding and information transfer, with potential implications for designing targeted fine-grained diagnostics of the information flow.

2606.00065 2026-06-02 cs.IR cond-mat.mtrl-sci cs.AI cs.CL

Beyond Text and Tables: Vision-Language Model Integration in ComProScanner for Extracting Materials Data from Scientific Figures with High Accuracy

超越文本与表格：ComProScanner中视觉-语言模型集成实现从科学图表中高精度提取材料数据

Aritra Roy, Enrico Grisan, Chiara Gattinoni, John Buckeridge

Affiliations: Energy, Materials and Environment Research Centre, London South Bank University, London SE1 0AA, UK; School of Engineering and Design, London South Bank University, London SE1 0AA, UK; Bioscience and Bioengineering Research Centre, London South Bank University, London SE1 0AA, UK; Department of Physics, King's College London, London WC2R 2LS, UK

AI summary: Extends the ComProScanner framework with vision-language models to automatically extract composition-property data from scientific figures, reaching a composition accuracy and normalised F1 score of 0.97 on a piezoelectric ceramics dataset, and introduces a range-based error-threshold evaluation.

Comments: 18 pages, 3 figures

AI Chinese abstract

基于大语言模型流水线的自动提取科学文献中材料成分-性能数据的方法已取得显著进展；然而，现有框架仍局限于文本和表格内容，忽视了仅在科学图表中报告的大量定量性能数据。本文扩展了ComProScanner——一个用于自动构建成分-性能数据库的完全端到端多智能体框架，为其增加了基于原生视觉-语言模型（VLM）的图表提取能力。该扩展引入了一个FigureExtractor工具，用于基于标题关键词对所有支持的出版商进行图表过滤，以及一个GraphExtractorTool智能体，它将提取的图表传递给可配置的VLM，以从科学图表和绘图中恢复成分-性能对。基于LMArena Diagram排行榜和每百万token输入成本低于1.50美元的标准，选择了四个VLM进行评估。在来自已建立的d33测试语料库的50篇压电陶瓷文章上的基准测试表明，Gemini-3-Flash-Preview实现了最高性能，组成准确率为0.97，归一化F1分数为0.97，同时仍然是四个评估模型中成本效益最高的。此外，我们在评估框架中引入了一个基于范围的值误差阈值参数，与精确值匹配相比，提供了对从图表中提取的数值性能数据更具物理意义的评估。这些贡献使集成VLM的ComProScanner成为第一个针对材料科学、完全自动化、多模态的文献挖掘平台，能够在单一统一流水线中从文本、表格和图表中提取结构化的成分-性能数据。

English abstract

Automated extraction of materials composition-property data from scientific literature has advanced considerably with the development of large language model-based pipelines; however, existing frameworks remain limited to textual and tabular content, overlooking the substantial proportion of quantitative property data reported exclusively in scientific figures. Here, we extend ComProScanner, a fully end-to-end multi-agent framework for automated composition-property database construction, with a native vision-language model (VLM) based figure extraction capability. The extension introduces a FigureExtractor utility for caption-keyword-based figure filtering across all supported publishers, and a GraphExtractorTool agent that passes extracted figures to a configurable VLM to recover composition-property pairs from scientific charts and plots. Four VLMs are selected for evaluation on the basis of the LMArena Diagram leaderboard with an input cost criterion of less than $1.50 per million tokens. Benchmarking on 50 piezoelectric ceramic articles from the established d33 test corpus demonstrates that Gemini-3-Flash-Preview achieves the highest performance with a composition accuracy of 0.97 and a normalised F1 score of 0.97, whilst remaining the most cost-effective model among the four evaluated. We additionally introduce a range-based value error threshold parameter into the evaluation framework, providing a more physically meaningful assessment of numeric property values extracted from figures than exact value matching. These contributions establish VLM-integrated ComProScanner as the first materials-specific, fully automated, multimodal literature mining platform capable of extracting structured composition-property data from text, tables, and figures within a single unified pipeline.

2606.00060 2026-06-02 q-fin.TR cs.CE cs.LG

Machine Learning-Based Bitcoin Trading Under Transaction Costs: Evidence From Walk-Forward Forecasting

基于机器学习的比特币交易：考虑交易成本的滚动前向预测证据

Andrei Bysik, Robert Ślepaczuk

Affiliations: Quantitative Finance Research Group, Faculty of Economic Sciences, University of Warsaw; Quantitative Finance Research Group, Department of Quantitative Finance and Machine Learning, Faculty of Economic Sciences, University of Warsaw

AI summary: Studies whether machine learning models such as XGBoost, LSTM, and iTransformer can forecast hourly BTC-USDT returns under transaction costs, and uses a cost-aware execution filter to turn the forecasts into a profitable trading strategy.

Comments: 42 pages

AI Chinese abstract

本文研究机器学习对BTC-USDT小时收益率的预测能否在扣除交易成本后转化为具有经济意义的交易表现。使用2018-2026年间约70,000个小时观测值，在27折滚动前向协议中评估XGBoost、LSTM和iTransformer。所有三种模型在选定配置下均产生正的毛交易表现，但一旦施加十个基点的交易成本，基于符号的朴素策略便失效。一种成本感知的执行过滤器（仅当预测幅度超过基于交易成本的阈值时才允许交易）显著降低了换手率，并在选定配置下恢复了盈利能力。最强的纯多头XGBoost策略年化收益率超过65%，夏普比率高于1。额外测试表明，技术指标在选定情况下提升了表现，EGARCH导出的特征并未提供一致的稳健收益，且XGBoost在描述性上优于神经替代模型，尽管自助法证据不支持正式的统计优势。损失函数和模型选择效应是次要的且统计上脆弱。结果表明，小时级加密货币交易的主要障碍不仅在于弱可预测性，还在于将预测转化为交易的方式。

English abstract

This paper investigates whether machine learning forecasts of hourly BTC-USDT returns can be converted into economically meaningful trading performance after transaction costs. Using approximately 70,000 hourly observations from 2018-2026, XGBoost, LSTM, and iTransformer are evaluated in a 27-fold walk-forward protocol. All three models produce positive gross trading performance in selected configurations, but naive sign-based strategies fail once transaction costs of ten basis points are imposed. A cost-aware execution filter, which permits trades only when the forecast magnitude exceeds a transaction-cost-based threshold, sharply reduces turnover and restores profitability in selected configurations. The strongest long-only XGBoost strategy produces annualised returns above 65% with a Sharpe ratio above one. Additional tests show that technical indicators improve performance in selected cases, EGARCH-derived features do not provide uniformly robust gains, and XGBoost is descriptively stronger than the neural alternatives, although bootstrap evidence does not support formal statistical dominance. Loss-function and model-selection effects are secondary and statistically fragile. The results show that the main obstacle in hourly cryptocurrency trading is not only weak predictability, but also the way forecasts are converted into trades.
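A cost-aware execution filter of the kind described can be sketched in a few lines. This is an illustrative guess at the mechanism, not the paper's rule: the threshold is taken to be a multiple of the per-trade cost, the position is held unchanged whenever the forecast falls inside the threshold band, and the function and parameter names (`cost_bps`, `multiplier`, `long_only`) are hypothetical.

```python
import numpy as np


def naive_positions(forecasts):
    """Long-only sign strategy: hold the asset whenever the forecast is positive."""
    return (np.asarray(forecasts, dtype=float) > 0).astype(float)


def cost_aware_positions(forecasts, cost_bps=10.0, multiplier=1.0, long_only=True):
    """Change position only when |forecast| exceeds a cost-based threshold (assumed rule)."""
    threshold = multiplier * cost_bps / 1e4  # basis points -> return units
    forecasts = np.asarray(forecasts, dtype=float)
    positions = np.zeros_like(forecasts)
    current = 0.0
    for t, f in enumerate(forecasts):
        if abs(f) > threshold:
            current = 1.0 if f > 0 else (0.0 if long_only else -1.0)
        positions[t] = current  # otherwise keep the previous position
    return positions


def turnover(positions):
    """Total absolute position change, starting from a flat position."""
    return float(np.abs(np.diff(np.concatenate(([0.0], positions)))).sum())


def net_returns(positions, realized, cost_bps=10.0):
    """Per-period strategy return after charging costs on every position change."""
    positions = np.asarray(positions, dtype=float)
    trades = np.abs(np.diff(np.concatenate(([0.0], positions))))
    return positions * np.asarray(realized, dtype=float) - trades * cost_bps / 1e4
```

Comparing `turnover(naive_positions(f))` with `turnover(cost_aware_positions(f))` on the same forecast series shows how the filter suppresses trades driven by forecasts too small to cover their own cost.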

2606.00051 2026-06-02 cs.CY cs.AI

Business Utility of Large Language Models as Exploratory Data Analysis Agents

大型语言模型作为探索性数据分析代理的商业实用性

Rafał Łabędzki, Patryk Miziuła, Hubert Rutkowski, Szymon Betlewski, Cezary Depta, Szymon Janowski, Jarosław Kochanowicz, Jan Kanty Milczek

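Affiliations: deepsense.ai; SGH Warsaw School of Economics; Bydgoszcz University of Science and Technology; Google

AI summary: Using a benchmark built on an agent-based supply chain simulation, evaluates the average performance and repeatability of LLMs as EDA agents in business settings, proposes the risk-adjusted metric Business utility, and finds most configurations unreliable, with GPT-5.4 performing best.

AI Chinese abstract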

大型语言模型（LLM）越来越多地被用于分析工作流，但它们在商业环境中作为探索性数据分析（EDA）代理的适用性仍不确定。在实践中，一个可部署的EDA代理不仅必须提供有用的平均性能，还必须提供足够的可重复性以支持对其输出的信任。我们在一个受控的、与商业相关的基准上评估了这一要求，该基准基于基于代理的供应链模拟。任务是通过从间接操作痕迹而非显式标签进行推理，识别导致低质量和下游销售损失的供应商-产品组合。来自八个模型家族的十五种模型变体配置在四种实验条件下进行了评估，这些条件改变了数据表示、提示清晰度和信号强度，每种条件有五个轨迹。输出使用Jaccard指数与确定性真实值进行评分，并通过一个框架进行评估，该框架结合了平均得分（ms）、变异系数（CV）、探索性跨条件显著性检验以及商业实用性（Business utility），这是我们提出的一个风险调整指标，用于在单一操作度量中总结质量和可重复性。结果表明，大多数配置对于自主EDA使用来说不够可靠，即使它们的平均得分看起来可以接受。具有超高推理努力的GPT-5.4实现了最强的整体表现，实验平均ms为0.8748，实验平均商业实用性为0.6952，而次优配置在可变性折扣后损失了更多的实用性。我们的发现表明，对EDA代理的评估应将平均质量、可重复性和条件敏感性视为操作可信度的互补维度。

English abstract

Large Language Models (LLMs) are increasingly used in analytical workflows, but their suitability as exploratory data analysis (EDA) agents in business settings remains uncertain. In practice, a deployable EDA agent must provide not only useful average performance but also sufficient repeatability to support trust in its outputs. We evaluate this requirement in a controlled, business-relevant benchmark built on an agent-based supply chain simulation. The task is to identify supplier-product combinations responsible for low quality and downstream sales loss by reasoning from indirect operational traces rather than from explicit labels. Fifteen model-variant configurations from eight model families were evaluated under four experimental conditions that varied data representation, prompt clarity, and signal strength, with five trajectories per condition. Outputs were scored against deterministic ground truth using the Jaccard index and assessed through a framework that combines mean score (ms), coefficient of variation (CV), exploratory cross-condition significance tests, and Business utility, a risk-adjusted metric that we propose to summarise quality and repeatability in a single operational measure. The results show that most configurations are not reliable enough for autonomous EDA use, even when their average scores appear acceptable. GPT-5.4 with extra-high reasoning effort achieved the strongest overall profile, with an experiment-averaged ms of 0.8748 and an experiment-averaged Business utility of 0.6952, while the next-best configurations lost substantially more utility after variability discounting. Our findings suggest that evaluation of EDA agents should treat average quality, repeatability, and condition sensitivity as complementary dimensions of operational trustworthiness.
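The scoring pipeline named in the abstract (Jaccard index per run, then mean score and coefficient of variation across repeated trajectories) can be sketched as below. The Business utility formula is not stated in the abstract, so it is deliberately left out. The function names, the use of the sample standard deviation, and the supplier-product tuples are assumptions made for illustration.

```python
from statistics import mean, stdev


def jaccard(predicted, truth):
    """Jaccard index between the predicted and the ground-truth sets."""
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        return 1.0  # two empty sets are treated as a perfect match
    return len(predicted & truth) / len(predicted | truth)


def summarise(scores):
    """Return (mean score, coefficient of variation) over repeated runs."""
    ms = mean(scores)
    spread = stdev(scores) if len(scores) > 1 else 0.0
    cv = spread / ms if ms > 0 else float("inf")
    return ms, cv


# Hypothetical example: five trajectories of one configuration under one condition.
truth = {("S1", "P1"), ("S2", "P3")}
runs = [
    {("S1", "P1"), ("S2", "P3")},
    {("S1", "P1")},
    {("S1", "P1"), ("S2", "P4")},
    {("S1", "P1"), ("S2", "P3")},
    set(),
]
scores = [jaccard(r, truth) for r in runs]
ms, cv = summarise(scores)
```

A configuration with an acceptable `ms` but a large `cv` is the case the abstract warns about: good on average, yet too erratic to trust on any single run.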

2606.00049 2026-06-02 cs.CY cs.AI

Measuring and Mitigating Bias in Code Generated by Large Language Models

测量和减轻大型语言模型生成代码中的偏见

Yuxi Chen, Yutian Tang, Timothy Storer

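Affiliations: School of Computing Science, University of Glasgow

AI summary: Targeting GPT-4o and Gemini, two mainstream code generation tools, proposes an evaluation framework that quantifies bias with the code bias score and the attribute change ratio, and explores four lightweight mitigation strategies.

AI Chinese abstract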

大型语言模型（LLMs）在自然语言生成中的应用广受认可，并越来越多地用于代码生成任务。然而，其生成输出中的偏见问题仍然显著。本文聚焦于GPT-4o和Gemini这两个主流的代码生成工具，提出了一个评估LLM生成代码中偏见的框架，特别考察了受保护属性、提示和网络搜索能力的影响。我们使用两个指标：代码偏见分数（CBS）和属性变化比率（ACR），分别量化偏见的普遍性和不同属性的影响程度。此外，我们研究了四种轻量级缓解策略：少样本、思维链、少样本思维链和多智能体，旨在减轻生成代码中的偏见。我们的研究结果表明，即使在应用缓解策略后，偏见在不同受保护属性和数据集中仍然普遍存在，这凸显了需要更有效的方法来减少AI驱动的代码生成系统中的偏见。

English abstract

Large language models (LLMs) are widely recognised for their applications in natural language generation and are increasingly used for code generation tasks. However, concerns about bias in their generated outputs remain significant. This paper focuses on GPT-4o and Gemini, mainstream tools for code generation, and proposes a framework for evaluating bias in LLM-generated code, specifically examining the influence of protected attributes, prompts and web-search capability. We use two metrics: the code bias score (CBS) and the attribute change ratio (ACR), to quantify the prevalence of bias and the degree of influence of different attributes, respectively. In addition, we investigate four lightweight mitigation strategies: Few-Shot, Chain-of-Thought, Few-Shot Chain-of-Thought, and Multi-agent, aimed at mitigating bias in generated code. Our findings reveal that bias remains prevalent across different protected attributes and datasets even after applying mitigation strategies, highlighting the need for more effective approaches to reduce bias in AI-driven code generation systems.

2606.00048 2026-06-02 cs.CY cs.CL

The Invisible Coalition Partner: How LLMs Vote When Democracy Gets Concrete

无形的联盟伙伴：当民主变得具体时，LLM如何投票

Joel Barmettler

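Affiliations: Independent Researcher, Zurich, Switzerland

AI summary: By contrasting abstract questionnaires with actual Swiss referenda, finds that on concrete policy decisions LLMs behave as centrist, status-quo-favouring, and inconsistent across languages, rather than showing the left-leaning bias previously reported.

Comments: 13 pages, 10 figures. Preprint. Code and data: https://github.com/joelbarmettlerUZH/invisible-coalition-partner

AI Chinese abstract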

先前的研究已确定，经过指令调整的大型语言模型表现出左倾政治偏见，这些偏见仅通过抽象政治问卷测量。我们表明这一发现并不适用于具体的政策决策。我们引入了一种基于瑞士民主现实的双工具方法论。Smartvote问卷（75个抽象政策问题）被应用于来自27个模型家族的66个LLM，并与184名当选的瑞士国民院议员进行比较，复制了已确立的左倾趋同（Cohen's d = 3.64, p = 0.0002）。然后，作为本工作的创新，9个旗舰LLM在三种信息条件下面对48次真实的联邦公投（Volksabstimmungen），使用四种国家语言（德语、法语、意大利语、罗曼什语），将投票与实际结果和政党推荐（Parolen）进行比较。三个发现挑战了主流叙述。（1）抽象问卷不能预测具体行为：在Smartvote上的左右一致梯度从左侧峰值转变为在Volksabstimmungen上的中心峰值，模型最接近中间派的Die Mitte和FDP，而非左派的SP和Gruene（Wilcoxon p = 0.008）。（2）对于某些模型，政治问题的语言比政治内容更能改变答案：跨语言一致性范围从50%（Mistral）到98%（GPT-5.4）。（3）两个模型表现出系统的变革厌恶而非政治偏见，无论方向如何，在83-94%的公投中投反对票（二项式p < 0.0001）。先前工作测量的“左倾偏见”可能无法推广到抽象工具之外。在具体政策决策上，LLM的行为更像是谨慎的公务员而非左派的联盟伙伴：中间派、偏好现状且跨语言不一致。

English abstract

Prior research has established that instruction-tuned large language models exhibit left-of-center political bias, measured exclusively through abstract political questionnaires. We show that this finding does not generalize to concrete policy decisions. We introduce a dual-instrument methodology grounded in Swiss democratic reality. The Smartvote questionnaire (75 abstract policy questions) is administered to 66 LLMs from 27 model families and compared to 184 elected members of the Swiss National Council, replicating the established leftward convergence (Cohen's d = 3.64, p = 0.0002). Then, novel to this work, 9 flagship LLMs are confronted with 48 real federal referenda (Volksabstimmungen) in four national languages (German, French, Italian, Romansh) under three information conditions, comparing votes to actual outcomes and party recommendations (Parolen). Three findings challenge the prevailing narrative. (1) Abstract questionnaires do not predict concrete behavior: the left-to-right agreement gradient on Smartvote shifts from left-peaked to center-peaked on Volksabstimmungen, where models align most with centrist Die Mitte and FDP rather than leftist SP and Gruene (Wilcoxon p = 0.008). (2) For some models, the language of a political question changes the answer more than the political content does: cross-linguistic consistency ranges from 50% (Mistral) to 98% (GPT-5.4). (3) Two models exhibit systematic change-aversion rather than political bias, voting Nein on 83-94% of referenda regardless of direction (binomial p < 0.0001). What prior work measured as "leftward bias" may not generalize beyond abstract instruments. On concrete policy decisions, LLMs behave less like coalition partners of the left and more like cautious civil servants: centrist, status-quo-favoring, and inconsistent across languages.

2606.00047 2026-06-02 cs.CY cs.AI

Comprehensive AI governance requires addressing non-model gains

Arthur Goemans, Dan Altman, Noemi Dreksler, Jonas Freund, Milan Gandhi, Zhengdong Wang, Sarah Cogan, Sebastien Krier, Demetra Brady, Lewis Ho, Allan Dafoe

Affiliations * Stanford University; UC Berkeley; Open Philanthropy

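AI summary: This paper introduces the concept of non-model gains, comprising inference gain, systems gain, and asset gain, argues that these gains weaken the effectiveness of model-centric governance, and proposes governance approaches that go beyond the model level.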

Comments This paper has been accepted to ICML 2026 (Position paper track): https://openreview.net/forum?id=V3O1sHpKxX

Details

English abstract

Frontier AI governance often centres on the model-level governance paradigm, which assumes that a model's capability profile is primarily a function of the compute and data used during training. This position paper argues that model-level governance becomes less effective when capability progress is increasingly driven by "non-model gains"--improvements that are independent from advances in the base model. We formalise the concept of non-model gains and provide a taxonomy of three distinct vectors of capability gain: inference gain (scaling compute at test-time), systems gain (post-training enhancements such as scaffolds), and asset gain (enhancing a model with restricted assets). We demonstrate how these vectors--alongside potential future impacts from embodiment, continual learning, and AI diffusion--may undermine risk management strategies that hinge mostly on pre-deployment evaluation and mitigation. We provide an overview of governance approaches that go beyond the model level: system, entity, agent, and cloud governance. Finally, we emphasise the importance of societal resilience as a complement to these governance layers.

2606.00046 2026-06-02 cs.MM cs.AI cs.CV cs.CY

When Jokes Cross the Line: Analyzing Regular Humor and Dark Humor in YouTube Shorts

Sydney Johns, Sanjeev Parthasarathy, Shantnu Bhalla, Vaibhav Garg

Affiliations * Virginia Polytechnic Institute and State University

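AI summary: Building the TwistedHumor dataset (1,211 YouTube Shorts with 33,041 comments, hand-annotated) and combining it with a multi-view analysis (LLooM concept induction, comment sentiment analysis, and large-model evaluation), this work reveals how regular humor and dark humor in short-form video differ in themes, audience reactions, and model detection, and underscores the need for context-aware moderation.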

Details

English abstract

Video platforms such as YouTube have reshaped how users engage with entertainment and information, emphasizing brief, highly engaging content such as Shorts. Within this ecosystem, certain content occupies a gray area where it remains allowed but may still have unintended negative effects on some audiences. To study this problem, we introduce TwistedHumor, a dataset of 1,211 YouTube Shorts paired with 33,041 related comments, with hand annotations for humor presence, humor type, harm, topic, rhetorical devices, and stand-up context. Beyond dataset creation, we present a multi-view analysis of how humor and harm appear in short-form social media. Using LLooM-based concept induction over video descriptions, we find that dark humor frequently clusters around themes of critique, coping, awkwardness, and identity expression rather than appearing as a single uniform category. We further analyze audience response through linked comments and show that regular humor is associated with more positive sentiment, while dark humor receives more mixed, neutral, and sometimes more toxic reactions. Finally, we evaluate large language models against human annotations and find that they perform better on stand-up comedy than on shorter jokes. Together, these results position TwistedHumor not only as a new benchmark, but as an empirical study of the gray area between humor and harm in short-form video, highlighting the need for context-aware moderation and more robust multimodal evaluation.

2606.00044 2026-06-02 cs.CY cs.AI

Algorithmic Authority and the Clinical Standard of Care

Aizierjiang Aiersilan

Affiliations * The George Washington University

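AI summary: This paper examines the tension that AI introduces into clinical medicine between algorithmic probabilistic reasoning and physicians' experiential intuition, proposes treating AI systems as de facto medical regulation, and argues for a dialectical standard of care that treats the AI-physician dyad as a single entity bearing diagnostic responsibility.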

2606.00041 2026-06-02 cs.CY cs.AI

Improving Hospital Process Management through Process Mining: A Case Study on COVID-19 Clinical Pathways

Pasquale Ardimento, Mario Luca Bernardi, Marta Cimitile, Samuele Latorre

Affiliations * University of Bari Aldo Moro; Unisannio University of Benevento; UnitelmaSapienza Rome

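AI summary: Using the COVID data-sharing learning dataset, this study builds a transparent, reproducible pipeline that converts clinical data into event logs. Through process discovery, declarative conformance checking, and outcome analysis, it reveals a monitoring backbone in COVID-19 care pathways, variability at the interface between emergency care and admission, and outcome differences driven by age and intensive-care exposure, supporting triage standardization, capacity planning, and step-down coordination.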