arXivDaily: daily arXiv academic digest, updated Monday through Friday
2506.16114 2026-06-02 cs.IR cs.AI

GFlowGR: Fine-tuning Generative Recommendation Frameworks with Generative Flow Networks

Yejing Wang, Shengyu Zhou, Jinyu Lu, Qidong Liu, Xinhang Li, Wenlin Zhang, Feng Li, Pengjie Wang, Chuan Yu, Jian Xu, Bo Zheng, Xiangyu Zhao

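Affiliations: City University of Hong Kong; Alibaba Group

AI summary: To address the exposure bias that arises when the fine-tuning step in generative recommendation ignores unobserved positive samples, the paper proposes GFlowGR, a GFlowNets-based fine-tuning framework. It integrates collaborative knowledge through an adaptive trajectory sampler and a comprehensive reward model, and uses the diverse generation property of GFlowNets to mitigate the bias.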

Abstract

Generative recommendations (GR), which usually include item tokenizers and generative Large Language Models (LLMs), have demonstrated remarkable success across a wide range of scenarios. The majority of existing research efforts primarily concentrate on developing powerful item tokenizers or advancing LLM decoding strategies to attain superior performance. However, the critical fine-tuning step in GR frameworks, which is essential for adapting LLMs to recommendation data, remains largely unexplored. Current approaches predominantly rely on either the next-token prediction loss of supervised fine-tuning (SFT) or recommendation-specific direct preference optimization (DPO) strategies. Both methods ignore the exploration of possible positive unobserved samples, which is commonly referred to as the exposure bias problem. To mitigate this problem, this paper treats GR as a multi-step generation task and constructs a GFlowNets-based fine-tuning framework (GFlowGR). The proposed framework integrates collaborative knowledge from traditional recommender systems to create an adaptive trajectory sampler and a comprehensive reward model. Leveraging the diverse generation property of GFlowNets, along with sampling and heuristic weighting techniques, GFlowGR emerges as a promising approach to mitigate the exposure bias problem. Extensive empirical results on two real-world datasets and with two different GR backbones highlight the effectiveness and robustness of GFlowGR.

2511.13487 2026-06-02 eess.AS cs.LG cs.SD

Systematic Evaluation of Time-Frequency Features for Binaural Sound Source Localization

Davoud Shariat Panah, Alessandro Ragano, Dan Barry, Jan Skoglund, Andrew Hines

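Affiliations: Taighde Éireann – Research Ireland

AI summary: A systematic evaluation of how different time-frequency feature combinations affect binaural sound source localization. Carefully chosen combinations (such as channel spectrograms together with ILD and IPD) can outperform increases in model complexity, giving practical guidance for both domain-specific and general-purpose localization.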

Comments Accepted at EUSIPCO 2026

Abstract

This study presents a systematic evaluation of time-frequency feature design for binaural sound source localization (SSL), focusing on how feature selection influences model performance across diverse conditions. We investigate the performance of a convolutional neural network (CNN) model using various combinations of amplitude-based features (magnitude spectrogram, interaural level difference - ILD) and phase-based features (phase spectrogram, interaural phase difference - IPD). Evaluations on in-domain and out-of-domain data with mismatched head-related transfer functions (HRTFs) reveal that carefully chosen feature combinations often outperform increases in model complexity. While two-feature sets such as ILD + IPD are sufficient for in-domain SSL, generalization to diverse content requires richer inputs combining channel spectrograms with both ILD and IPD. Using the optimal feature sets, our low-complexity CNN model achieves competitive performance. Our findings underscore the importance of feature design in binaural SSL and provide practical guidance for both domain-specific and general-purpose localization.
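
The two interaural features named in this abstract can be illustrated with a minimal sketch. It assumes the left and right channels are already available as complex short-time spectra (nested lists of complex numbers); the function names, the `eps` guard, and the toy values are hypothetical and are not taken from the paper.

```python
import cmath
import math


def ild_db(left: complex, right: complex, eps: float = 1e-12) -> float:
    """Interaural level difference in dB for one time-frequency bin."""
    # eps avoids division by zero and log of zero for silent bins (an assumption).
    return 20.0 * math.log10((abs(left) + eps) / (abs(right) + eps))


def ipd_rad(left: complex, right: complex) -> float:
    """Interaural phase difference in radians, wrapped to [-pi, pi]."""
    # The phase of L * conj(R) equals phase(L) - phase(R), already wrapped.
    return cmath.phase(left * right.conjugate())


def binaural_features(left_spec, right_spec):
    """Return per-bin ILD and IPD grids for two equally shaped complex spectrograms."""
    ild = [[ild_db(l, r) for l, r in zip(lrow, rrow)]
           for lrow, rrow in zip(left_spec, right_spec)]
    ipd = [[ipd_rad(l, r) for l, r in zip(lrow, rrow)]
           for lrow, rrow in zip(left_spec, right_spec)]
    return ild, ipd


# Toy input: one frame with two frequency bins (made-up values).
left = [[2.0 + 0.0j, 0.0 + 1.0j]]
right = [[1.0 + 0.0j, 1.0 + 0.0j]]
ild, ipd = binaural_features(left, right)
```

In this toy case the first bin is about 6 dB louder on the left with no phase difference, and the second bin has equal level with the left channel leading by a quarter cycle. A model input in the spirit of the abstract would stack such grids with the per-channel spectrograms.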

2511.12081 2026-06-02 cs.IR cs.LG

From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction

Bencheng Yan, Yuejie Lei, Zhiyuan Zeng, Zheye Deng, Di Wang, Kaiyi Lin, Pengjie Wang, Chuan Yu, Jian Xu, Bo Zheng

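Affiliations: Alibaba Group

AI summary: To address the diminishing returns that Transformer models show in CTR prediction because of structural misalignment, the paper proposes the Field-Aware Transformer (FAT), which achieves structured expressivity through field-aware parameter reconstruction and a basis-composed hypernetwork. It is supported theoretically by a scaling law based on Rademacher complexity and empirically by up to +4.38% AUC, with +2.33% CTR and +0.66% RPM online.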

Comments KDD 2026; The first four authors contributed equally to this work

Abstract

Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns, a stark contrast to the predictable scaling laws seen in large language models (LLMs). We identify the root cause as a fundamental structural misalignment: standard Transformers assume sequential compositionality, whereas CTR data demand combinatorial reasoning over heterogeneous fields. To restore alignment, we introduce the Field-Aware Transformer (FAT). By reconstructing the standard Transformer block with field-centric parameters, FAT achieves structured expressivity, fundamentally shifting the model complexity dependence from the total vocabulary size $n$ to the number of fields $F$ ($n \gg F$). Crucially, to decouple model capacity from field cardinality, FAT employs a Basis-Composed Hypernetwork to synthesize field-specific parameters from shared bases, further reducing parameter complexity. Theoretically, we ground this scaling behavior through a formal scaling law based on Rademacher complexity. Empirically, FAT outperforms existing state-of-the-art methods with up to +4.38% AUC improvement, and delivers +2.33% CTR and +0.66% RPM in live production. Our work establishes that scalable recommendation arises not from size alone, but from structured expressivity: architectural coherence with data semantics.
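
The idea of synthesizing field-specific parameters from shared bases can be illustrated with a small sketch. The abstract does not say how the Basis-Composed Hypernetwork combines its bases, so the plain linear combination below is an assumption, and every name in it (`compose_field_weights`, `param_counts`, the toy matrices) is hypothetical.

```python
def compose_field_weights(coeffs, bases):
    """Build one d x d matrix per field as a weighted sum of K shared basis matrices."""
    d = len(bases[0])
    weights = []
    for alpha in coeffs:  # alpha holds the K mixing coefficients of one field
        w = [[sum(a * basis[i][j] for a, basis in zip(alpha, bases))
              for j in range(d)]
             for i in range(d)]
        weights.append(w)
    return weights


def param_counts(num_fields, num_bases, d):
    """Parameters for independent per-field matrices vs. the shared-basis scheme."""
    independent = num_fields * d * d
    composed = num_bases * d * d + num_fields * num_bases
    return independent, composed


# Two shared 2 x 2 bases and three fields (illustrative values only).
bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
coeffs = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
field_weights = compose_field_weights(coeffs, bases)
```

Under this assumed scheme, 100 fields with 64 x 64 matrices need 409,600 independent parameters, while 4 shared bases plus per-field coefficients need 16,784, which is the kind of decoupling from field cardinality the abstract describes.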

2507.22842 2026-06-02 stat.ML cs.LG

Tricks and Plug-ins for Gradient Boosting in Image Classification

Biyi Fang, Truong Vo, Jean Utke, Diego Klabjan

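Affiliations: Northwestern University; Allstate

AI summary: Proposes a framework that combines dynamic feature selection with the principles of BoostCNN. Using subgrid selection and importance sampling, it embeds boosting weights into training with a least squares loss to improve CNN performance and efficiency.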

Comments 6 pages, 5 figures. Experimental results reported on CIFAR-10, SVHN, and ImageNetSub datasets

Journal ref
2025 IEEE International Conference on Big Data (BigData), pp. 1382-1388
Abstract

Convolutional Neural Networks (CNNs) have achieved remarkable success across a wide range of machine learning tasks by leveraging hierarchical feature learning through deep architectures. However, the large number of layers and millions of parameters often make CNNs computationally expensive to train, requiring extensive time and manual tuning to discover optimal architectures. In this paper, we introduce a novel framework for boosting CNN performance that integrates dynamic feature selection with the principles of BoostCNN. Our approach incorporates two key strategies: subgrid selection and importance sampling, to guide training toward informative regions of the feature space. We further develop a family of algorithms that embed boosting weights directly into the network training process using a least squares loss formulation. This integration not only alleviates the burden of manual architecture design but also enhances accuracy and efficiency. Experimental results across several fine-grained classification benchmarks demonstrate that our boosted CNN variants consistently outperform conventional CNNs in both predictive performance and training speed.

2511.10806 2026-06-02 eess.IV cs.CV

From Attention to Frequency: Integration of Vision Transformer and FFT-ReLU for Enhanced Image Deblurring

Syed Mumtahin Mahmud, Mahdi Mohd Hossain Noki, Prothito Shovon Majumder, Abdul Mohaimen Al Radi, Md. Haider Ali, Md. Mosaddek Khan

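Affiliations: Department of Computer Science and Engineering, University of Dhaka

AI summary: Proposes a dual-domain architecture that combines a Vision Transformer with a frequency-domain FFT-ReLU module. Spatial attention modeling and frequency sparsity suppress blur artifacts and preserve detail, yielding better PSNR, SSIM, and perceptual quality than existing methods on benchmark datasets.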

Journal ref
Proceedings of the 18th International Conference on Agents and Artificial Intelligence (ICAART 2026), Volume 2, Marbella, Spain, March 5-7, 2026, pp. 1810-1820. SCITEPRESS
Abstract

Image deblurring is vital in computer vision, aiming to recover sharp images from blurry ones caused by motion or camera shake. While deep learning approaches such as CNNs and Vision Transformers (ViTs) have advanced this field, they often struggle with complex or high-resolution blur and computational demands. We propose a new dual-domain architecture that unifies Vision Transformers with a frequency-domain FFT-ReLU module, explicitly bridging spatial attention modeling and frequency sparsity. In this structure, the ViT backbone captures local and global dependencies, while the FFT-ReLU component enforces frequency-domain sparsity to suppress blur-related artifacts and preserve fine details. Extensive experiments on benchmark datasets demonstrate that this architecture achieves superior PSNR, SSIM, and perceptual quality compared to state-of-the-art models. Quantitative metrics, qualitative comparisons, and human preference evaluations all confirm its effectiveness, establishing a practical and generalizable paradigm for real-world image restoration.

2509.22689 2026-06-02 eess.IV cs.CV

Graph-Theoretic Consistency for Robust and Topology-Aware Semi-Supervised Histopathology Segmentation

Ha-Hieu Pham, Minh Le, Han Huynh, Nguyen Quoc Khanh Le, Huy-Hieu Pham

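Affiliations: Student

AI summary: Proposes the Topology Graph Consistency (TGC) framework, which aligns Laplacian spectra, component counts, and adjacency statistics between prediction graphs and references, achieving state-of-the-art semi-supervised segmentation with only 5-10% of the annotations.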

Comments Accepted to the AAAI 2026 Student Abstract and Poster Program

Journal ref
Proceedings of the AAAI Conference on Artificial Intelligence 2026
Abstract

Semi-supervised semantic segmentation (SSSS) is vital in computational pathology, where dense annotations are costly and limited. Existing methods often rely on pixel-level consistency, which propagates noisy pseudo-labels and produces fragmented or topologically invalid masks. We propose Topology Graph Consistency (TGC), a framework that integrates graph-theoretic constraints by aligning Laplacian spectra, component counts, and adjacency statistics between prediction graphs and references. This enforces global topology and improves segmentation accuracy. Experiments on GlaS and CRAG demonstrate that TGC achieves state-of-the-art performance under 5-10% supervision and significantly narrows the gap to full supervision.
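
One of the quantities this abstract aligns, the component count, can be sketched in a few lines. The example below is only an illustration under stated assumptions: it works on hard binary masks with 4-connectivity and returns a non-differentiable penalty, whereas a training loss would likely need a differentiable surrogate. The function names and toy masks are hypothetical and are not the authors' implementation.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected foreground components in a binary mask (rows of 0/1)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1  # new component found; flood-fill it with BFS
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def component_count_penalty(pred_mask, ref_mask):
    """Absolute difference in component counts between prediction and reference."""
    return abs(count_components(pred_mask) - count_components(ref_mask))


# Toy masks: the prediction splits one reference object into two fragments.
reference = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1]]
fragmented = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1]]
penalty = component_count_penalty(fragmented, reference)
```

A pixel-level consistency loss would treat the fragmented mask as only slightly wrong, while the count mismatch flags it as topologically different from the reference.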

2511.06663 2026-06-02 eess.SY cs.LG cs.SY

GNN-Enabled Robust Hybrid Beamforming with Score-Based CSI Generation and Denoising

Yuhang Li, Yang Lu, Bo Ai, Zhiguo Ding, Arumugam Nallanathan

Affiliations: State Key Laboratory of Advanced Rail Autonomous Operation; School of Computer Science and Technology; Beijing Jiaotong University; School of Electronics and Information Engineering; School of Electrical and Electronic Engineering (EEE); Nanyang Technological University; School of Electronic Engineering and Computer Science; Queen Mary University of London; Department of Electronic Engineering

AI summary: To address imperfect channel state information in hybrid beamforming, the paper uses graph neural networks and score-based generative models, achieving robust HBF through a hybrid message graph attention network, a BERT-based noise conditional score network, and a denoising score network.

Abstract

Accurate Channel State Information (CSI) is critical for Hybrid Beamforming (HBF) tasks. However, obtaining high-resolution CSI remains challenging in practical wireless communication systems. To address this issue, we propose to utilize Graph Neural Networks (GNNs) and score-based generative models to enable robust HBF under imperfect CSI conditions. Firstly, we develop the Hybrid Message Graph Attention Network (HMGAT) which updates both node and edge features through node-level and edge-level message passing. Secondly, we design a Bidirectional Encoder Representations from Transformers (BERT)-based Noise Conditional Score Network (NCSN) to learn the distribution of high-resolution CSI, facilitating CSI generation and data augmentation to further improve HMGAT's performance. Finally, we present a Denoising Score Network (DSN) framework and its instantiation, termed DeBERT, which can denoise imperfect CSI under arbitrary channel error levels, thereby facilitating robust HBF. Experiments on DeepMIMO urban datasets demonstrate the proposed models' superior generalization, scalability, and robustness across various HBF tasks with perfect and imperfect CSI.

2511.05613 2026-06-02 cs.CY cs.AI cs.LG

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

Anka Reuel, Avijit Ghosh, Jenny Chim, Andrew Tran, Yanan Long, Jennifer Mickel, Usman Gohar, Srishti Yadav, Pawan Sasanka Ammanamanchi, Mowafak Allaham, Hossein A. Rahmani, Mubashara Akhtar, Felix Friedrich, Robert Scholz, Michael Alexander Riegler, Jan Batzner, Eliya Habba, Arushi Saxena, Anastassia Kornilova, Kevin Wei, Prajna Soni, Yohan Mathew, Kevin Klyman, Jeba Sania, Subramanyam Sahoo, Olivia Beyer Bruvik, Pouya Sadeghi, Sujata Goswami, Angelina Wang, Yacine Jernite, Zeerak Talat, Stella Biderman, Mykel Kochenderfer, Sanmi Koyejo, Irene Solaiman

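Affiliations: University of California, Berkeley

AI summary: By analyzing 186 first-party release reports and 248 third-party evaluation sources, together with developer interviews, the paper shows that first-party reporting is sparse and superficial while third-party evaluations are broader and deeper. Disclosure gaps remain in key areas such as data provenance and content moderation labor, and the authors call for policies that mandate developer transparency and strengthen the independent evaluation ecosystem.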

Comments Accepted at the Forty-Third International Conference on Machine Learning (ICML), 2026, in Seoul, Korea

Abstract

Foundation models are increasingly central to high-stakes AI systems, and governance frameworks now depend on evaluations to assess their risks and capabilities. Although general capability evaluations are widespread, social impact assessments covering bias, fairness, privacy, environmental costs, and labor remain uneven. To characterize this landscape, we conduct the first comprehensive analysis of social impact evaluation reporting, examining 186 first-party release reports and 248 third-party evaluation sources, supplemented by developer interviews. We find a stark division of labor: first-party reporting is sparse, often superficial, and declining in areas like environmental impact and bias, while third-party evaluators provide broader, more rigorous coverage of bias, harmful content, and performance disparities. However, only developers can authoritatively report on data provenance, content moderation labor, costs, and infrastructure, yet interviews reveal these disclosures are deprioritized unless tied to product adoption or compliance. Current practices leave major gaps in assessing societal impacts, underscoring the need for policies that mandate developer transparency, strengthen independent evaluation ecosystems, and create shared infrastructure for aggregating third-party evaluations.

2511.04873 2026-06-02 stat.ML cs.LG

Prototype Selection Using Topological Data Analysis

Jordan Eckert, Elvan Ceyhan, Henry Schenck

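Affiliations: Department of Mathematics & Statistics

AI summary: Proposes two persistent-homology-based prototype selection methods, TPS and BoundaryTPS, which compress the training set using multi-scale topological structure. While retaining decision-boundary and interior-typical points, they achieve the best preservation of H1 persistence diagrams and stable behavior under fold perturbation.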

Comments Code will be made available upon request to Jordan Eckert

Abstract

Prototype selection methods compress a training set, but the existing taxonomy of condensation, edition, hybrid, competence-based, optimization-based, and clustering-based families does not include methods that operate on the multi-scale topological structure of the data. This paper introduces two different persistence-based prototype selector variants, Topological Prototype Selector (TPS) and Boundary-Conscious Topological Prototype Selector (BoundaryTPS). TPS uses two sequential Rips filtrations to retain boundary-relevant and interior-typical points. BoundaryTPS is a single-stage variant whose vertex-weighted filtration concentrates retention near the decision boundary. We evaluate both methods against seven classical baselines on fifteen real datasets and find that the topological methods occupy a different operating point in the prototype-selection design space than existing methods. BoundaryTPS achieves the lowest mean Friedman rank on $H_1$ persistence-diagram preservation and is significantly better than five of the seven baselines (Nemenyi, $\alpha = 0.05$). TPS ranks third on the same endpoint. Both methods are more stable under fold perturbation than any chained-decision selector tested, and both inherit the source set's class proportions without label-aware machinery. On aggregate G-Mean both methods are competitive but not leading, with rank-1 frequencies of $11.3\%$ (TPS) and $9.9\%$ (BoundaryTPS) across fold combinations. Empirically, both methods scale sub-quadratically in sample size.
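
The "mean Friedman rank" used in this comparison can be illustrated with a short sketch: each method is ranked within every dataset (1 is best, ties share the average rank), and the ranks are then averaged over datasets. The scores and method ordering below are made up for illustration and are not results from the paper; the function names are hypothetical.

```python
def ranks_with_ties(scores, higher_is_better=True):
    """Rank one dataset's scores (1 = best); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # average of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def mean_ranks(score_table, higher_is_better=True):
    """Average each method's rank over datasets (rows = datasets, columns = methods)."""
    per_dataset = [ranks_with_ties(row, higher_is_better) for row in score_table]
    n = len(per_dataset)
    return [sum(col) / n for col in zip(*per_dataset)]


# Made-up scores for three hypothetical methods on four datasets.
table = [
    [0.90, 0.85, 0.80],
    [0.70, 0.75, 0.60],
    [0.88, 0.88, 0.50],
    [0.95, 0.90, 0.92],
]
average = mean_ranks(table)
```

The method with the lowest average rank is the one reported as best; a post-hoc test such as Nemenyi, which the abstract mentions, then decides which pairwise rank differences are significant.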

2506.22666 2026-06-02 cs.CR cs.CL cs.LG stat.ML

VERA: Variational Inference Framework for Jailbreaking Large Language Models

Anamika Lochab, Lu Yan, Patrick Pynadath, Xiangyu Zhang, Ruqi Zhang

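Affiliations: Department of Computer Science, Purdue University

AI summary: Proposes the VERA framework, which treats black-box jailbreak prompt generation as a variational inference problem and trains a small attacker LLM to approximate the target LLM's posterior over adversarial prompts, so that diverse, fluent jailbreak prompts can be generated without re-optimization.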

Comments Accepted by NeurIPS 2025

Abstract

The rise of API-only access to state-of-the-art LLMs highlights the need for effective black-box jailbreak methods to identify model vulnerabilities in real-world settings. Without a principled objective for gradient-based optimization, most existing approaches rely on genetic algorithms, which are limited by their initialization and dependence on manually curated prompt pools. Furthermore, these methods require individual optimization for each prompt, failing to provide a comprehensive characterization of model vulnerabilities. To address this gap, we introduce VERA: Variational infErence fRamework for jAilbreaking. VERA casts black-box jailbreak prompting as a variational inference problem, training a small attacker LLM to approximate the target LLM's posterior over adversarial prompts. Once trained, the attacker can generate diverse, fluent jailbreak prompts for a target query without re-optimization. Experimental results show that VERA achieves strong performance across a range of target LLMs, highlighting the value of probabilistic inference for adversarial prompt generation.

2504.16129 2026-06-02 cs.MA cs.AI cs.LG cs.RO

MARFT: Multi-Agent Reinforcement Fine-Tuning

Junwei Liao, Muning Wen, Jun Wang, Weinan Zhang

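Affiliations: Shanghai Jiao Tong University; Shanghai Innovation Institute; OPPO Research Institute

AI summary: For multi-agent systems based on large language models, the paper proposes the Multi-Agent Reinforcement Fine-Tuning (MARFT) framework. By introducing the Flex-MG Markov game formulation and a universal algorithm, it addresses challenges such as asynchronous interaction and heterogeneous architectures, improving system robustness and adaptability.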

Comments 37 pages

Abstract

Large Language Model (LLM)-based Multi-Agent Systems (LaMAS) have demonstrated strong capabilities on complex agentic tasks requiring multifaceted reasoning and collaboration, from high-quality presentation generation to scientific research. Meanwhile, Reinforcement Learning (RL) is widely recognized for enhancing agent intelligence, but limited work has studied fine-tuning LaMAS with foundational RL techniques. Directly applying conventional Multi-Agent Reinforcement Learning (MARL) to LaMAS also introduces major challenges due to the unique mechanisms of LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce Flex-MG, a new Markov Game formulation aligned with real-world LaMAS optimization, together with a universal algorithmic framework tailored to LaMAS. We review the evolution from traditional RL to Reinforcement Fine-Tuning (RFT), then analyze the multi-agent counterpart. For LaMAS, we identify key differences between classical MARL and MARFT, including asynchronous agent interactions, profile-aware agent design, and heterogeneous architectures. These differences motivate a LaMAS-oriented formulation of RFT. We present a robust and scalable MARFT framework, detail its modular algorithm, and provide an open-source implementation to support adoption and further research. The paper further discusses application perspectives and open challenges, including dynamic environment modeling, sample inefficiency, and the lack of cohesive frameworks. By connecting theoretical foundations with practical methodology, this work aims to serve as a roadmap for advancing MARFT toward resilient, adaptive, and human-aligned agentic systems. Implementation: https://github.com/jwliao-ai/MARFT.

2410.14483 2026-06-02 stat.ML cs.LG stat.ME

Interventional Processes for Causal Uncertainty Quantification

Hugh Dance, Peter Orbanz, Arthur Gretton

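Affiliations: Gatsby Unit, University College London, London, United Kingdom

AI summary: Proposes a Gaussian-process-based method that quantifies uncertainty over interventional functions by representing them as inner products of observational functions in a reproducing kernel Hilbert space, with closed-form posterior moments and tractable training and inference.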

Abstract

Reliable uncertainty quantification for causal effects is crucial in high-stakes applications, but remains challenging when the target is an entire function rather than a scalar estimand. In this work, we introduce a Gaussian process (GP)-based approach for uncertainty quantification of interventional functions. The central idea is to build on recent work representing interventional functions as an inner product of observational functions in a reproducing kernel Hilbert space (RKHS), by constructing appropriate GP priors for such functions and inferring posteriors from observational data. Our approach yields closed-form posterior moments and tractable training and inference, while avoiding pathologies of previous GP prior constructions for RKHS functions. We further derive a practical procedure for posterior coverage calibration. Across synthetic benchmarks, causal Bayesian optimization tasks, and a large-scale real dataset, our method improves uncertainty quantification while remaining competitive in causal effect estimation.

2510.11560 2026-06-02 cs.IR cs.AI

Characterizing Web Search in The Age of Generative AI

Elisabeth Kirsten, Jost Grosse Perdekamp, Qinyuan Wu, Mihir Upadhyay, Krishna P. Gummadi, Muhammad Bilal Zafar

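Affiliations: UA Ruhr Research Center for Trustworthy Data Science and Security; Max Planck Institute for Software Systems; Ruhr University Bochum

AI summary: Through a systematic comparison of traditional search with several generative search systems, the paper reveals differences in knowledge sources, diversity, and stability, and notes that generative search introduces new dimensions not covered by existing evaluation paradigms.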

Abstract

The advent of LLMs has given rise to generative search, a new search paradigm in which LLMs retrieve information from the web related to a query and synthesize it into a single, coherent response. This paradigm differs fundamentally from traditional web search, where results are returned as a ranked list of independent web pages. In this paper, we ask: along what dimensions does generative search differ from traditional search? We conduct a systematic comparison between Google organic search and five generative search systems from three providers: Google, OpenAI, and Perplexity. Our analysis reveals substantial variation among engines in their reliance on internal vs. external knowledge, source diversity, and stability. While generative systems often achieve topical coverage comparable to traditional search, they do so using markedly different retrieval footprints and synthesis strategies. We further show that the outputs of generative search can vary across time and executions, raising new challenges for robustness. Our findings demonstrate that generative search introduces new dimensions that are not captured by existing evaluation paradigms, motivating the development of evaluations that explicitly account for retrieval behavior, synthesis, and stability in generative search systems.

2510.10943 2026-06-02 cs.MA cs.CL

The Social Cost of Intelligence: Emergence, Propagation, and Amplification of Stereotypical Bias in Multi-Agent Systems

Thi-Nhung Nguyen, Linhao Luo, Amardeep Kaur, Rollin Omari, Tamas Abraham, Junae Kim, Thuy-Trang Vu, Dinh Phung

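Affiliations: Department of Data Science & AI, Monash University; Defence Science and Technology Group, Australia

AI summary: Proposes an evaluation framework with three agent-level metrics that quantify the emergence, propagation, and amplification of bias in multi-agent systems. Communication can trigger up to 70% new bias emergence, spread bias to more than 80% of agents, and amplify stereotypes more than 3x; denser and competitive communication increases bias, and the systems are vulnerable to simple bias injection attacks.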

Abstract

Bias in large language models (LLMs) remains a persistent challenge, often leading to stereotyping and unfair treatment across social groups. While prior work has mainly focused on individual LLMs, the emergence of multi-agent systems (MAS), where multiple LLMs collaborate and communicate, introduces new and underexplored dynamics in how bias emerges, propagates, and amplifies. To systematically investigate these dynamics, we propose a simple evaluation framework with three agent-level metrics that quantify bias emergence, propagation, and amplification throughout multi-agent interaction. We evaluate MAS across three bias benchmarks under varying LLM backbones, social-group configurations, communication behaviors, and adversarial settings. Our results show that communication can trigger up to 70% new bias emergence, propagate bias across over 80% of agents, and amplify stereotypes by more than $3\times$. We further find that denser and competitive communication generally increases bias. Finally, we demonstrate that MAS are highly vulnerable to simple bias injection attacks, and existing defense strategies provide only limited protection. Our findings provide important insights into the fairness and robustness of multi-agent LLM systems.

2510.10676 2026-06-02 cs.AR cs.CL cs.RO eess.AS

Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation

Mukul Lokhande, Tanushree Dewangan, Mohd Sharik Mansoori, Tejas Chaudhari, Akarsh J., Damayanti Lokhande, Adam Teman, Santosh Kumar Vishvakarma

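Affiliations: Special Manpower Development Program for Chip to Start-Up (SMDP-C2S); Ministry of Electronics and Information Technology (MeitY)

AI summary: Proposes Bhasha-Rupantarika, a lightweight and efficient multilingual translation system built through algorithm-hardware co-design. Using sub-octet precision quantization (FP8/INT8/INT4/FP4) on FPGA, it achieves a 4.1x reduction in model size and a 4.2x inference speedup, offering a practical route to multilingual AI deployment in resource-constrained environments.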

Journal ref
International Symposium on Quality Electronic Design (ISQED), San Francisco, CA, USA, 2026
Abstract

This paper introduces Bhasha-Rupantarika, a lightweight and efficient multilingual translation system tailored through algorithm-hardware codesign for resource-limited settings. The method investigates model deployment at sub-octet precision levels (FP8, INT8, INT4, and FP4), with experimental results indicating a 4.1x reduction in model size (FP4) and a 4.2x inference speedup, corresponding to an increased throughput of 66 tokens/s (improvement by 4.8x). This underscores the importance of ultra-low precision quantization for real-time deployment in IoT devices using FPGA accelerators, achieving performance on par with expectations. Our evaluation covers bidirectional translation between Indian and international languages, showcasing its adaptability in low-resource linguistic contexts. The FPGA deployment demonstrated a 1.96x reduction in LUTs and a 1.65x decrease in FFs, resulting in a 2.2x enhancement in throughput compared to OPU and a 4.6x enhancement compared to HPTA. Overall, the evaluation provides a viable solution based on quantisation-aware translation along with hardware efficiency suitable for deployable multilingual AI systems. The complete code [https://github.com/mukullokhande99/Bhasha-Rupantarika/] and dataset are publicly available for reproducibility, facilitating rapid integration and further development by researchers.
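
To make the low-precision idea concrete, here is a minimal sketch of symmetric per-tensor integer quantization at 4 and 8 bits. It is a generic textbook scheme chosen for illustration, not the paper's method: the abstract does not describe its quantizer, and the FP8 and FP4 floating-point formats it also evaluates are not modeled here. The function names and weights are hypothetical.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from integer levels."""
    return [v * scale for v in q]


# Made-up weights; 0.6 falls between two 4-bit levels and is rounded.
weights = [1.75, -0.5, 0.6, 0.0]
q4, s4 = quantize_symmetric(weights, bits=4)
q8, s8 = quantize_symmetric(weights, bits=8)
restored4 = dequantize(q4, s4)
restored8 = dequantize(q8, s8)
```

With 4 bits the value 0.6 is stored as level 2 and restored as 0.5, while with 8 bits the error stays within half a quantization step. As a rough storage intuition, moving from 16 to 4 bits per weight would cut raw weight storage by 4x; the baseline behind the reported 4.1x figure is not stated in the abstract.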

2510.09288 2026-06-02 stat.ML cs.LG

A unifying Bayesian framework for adversarial robustness

对抗鲁棒性的统一贝叶斯框架

Pablo G. Arce, Roi Naveiro, David Ríos Insua

发表机构 * Universidad Autónoma de Madrid, Escuela de Doctorado(马德里自治大学博士学院) Institute of Mathematical Sciences, Spanish National Research Council(西班牙国家研究理事会数学研究所) CUNEF Universidad(CUNEF大学)

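AI summary Proposes a unified Bayesian framework that models adversarial uncertainty through a stochastic channel, derives two robustification strategies (adversarial training and adversarial purification), and validates the benefit of modeling adversarial uncertainty explicitly.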

详情
AI中文摘要

机器学习模型对对抗攻击的脆弱性仍然是一个关键的社会安全挑战。传统的防御方法,如对抗训练,通常通过最小化最坏情况损失来增强模型鲁棒性。这些确定性方法没有考虑对手攻击的不确定性。虽然存在将概率分布置于对手上的随机防御,但它们通常缺乏统计严谨性,并且未能明确其潜在假设。为了解决这些问题,我们引入了一个正式的贝叶斯框架,通过随机信道建模对抗不确定性,阐明所有概率假设。这产生了两种鲁棒化策略:一种是在训练期间实施的主动防御,与对抗训练一致;另一种是在操作期间实施的被动防御,与对抗净化一致。几种最先进的防御可以作为我们模型的极限情况恢复。我们通过实验验证了我们的方法,展示了显式建模对抗不确定性的好处。

英文摘要

The vulnerability of machine learning models to adversarial attacks remains a critical societal security challenge. Traditional defenses, such as adversarial training, typically robustify models by minimizing a worst-case loss. These deterministic approaches do not account for uncertainty in the adversary's attack. While stochastic defenses placing a probability distribution on the adversary exist, they often lack statistical rigor and fail to make explicit their underlying assumptions. To resolve these issues, we introduce a formal Bayesian framework that models adversarial uncertainty through a stochastic channel, articulating all probabilistic assumptions. This yields two robustification strategies: a proactive defense enacted during training, aligned with adversarial training, and a reactive defense enacted during operations, aligned with adversarial purification. Several state-of-the-art defenses can be recovered as limiting cases of our model. We empirically validate our methodology, showcasing the benefits of explicitly modeling adversarial uncertainty.

2510.09260 2026-06-02 cs.CR cs.LG

GREAT: Generalizable Backdoor Attacks in RLHF via Emotion-Aware Trigger Synthesis

GREAT: 通过情感感知触发器在RLHF中实现可泛化的后门攻击

Subrat Kishore Dutta, Yuelin Xu, Piyush Pant, Xiao Zhang

发表机构 * CISPA Helmholtz Center for Information Security(CISPA赫尔姆霍茨信息安全中心)

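AI summary Proposes the GREAT framework, which uses emotion-aware triggers to craft natural distributional backdoors in RLHF; angry triggers are generated via latent-space clustering and a diversity-driven prompting strategy, enabling attacks that generalize to unseen triggers.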

详情
AI中文摘要

近期研究表明,RLHF极易受到后门攻击。然而,现有方法通常依赖稀有令牌或固定触发器,限制了其在现实场景中的影响。在这项工作中,我们开发了GREAT,一个用于在RLHF中构造自然分布后门的新框架。具体而言,GREAT针对易受攻击的用户子群体,通过语义暴力的请求与情感愤怒的触发器配对,生成有害响应。我们框架的核心是一个在模型潜在嵌入空间中运行的触发器识别管道,利用降维和聚类技术来识别代表性触发器。为实现这一点,我们引入了一种层次化和多样性驱动的提示策略,构建了Erinyes,一个从GPT-4.1中策划的包含5000多个愤怒触发器的高质量数据集。实验表明,GREAT在攻击泛化到未见触发器方面显著优于基线,同时保持标准效用并在防御下保持隐蔽。

英文摘要

Recent work has shown that RLHF is highly susceptible to backdoor attacks. However, existing methods often rely on rare tokens or fixed triggers, limiting their impact in realistic scenarios. In this work, we develop GREAT, a novel framework for crafting natural distributional backdoors in RLHF. Specifically, GREAT targets harmful response generation for a vulnerable user subpopulation featured by semantically violent requests paired with emotionally angry triggers. At the core of our framework is a trigger identification pipeline that operates in the model's latent embedding space, leveraging dimensionality reduction and clustering techniques to identify representative triggers. To enable this, we introduce a hierarchical and diversity-driven prompting strategy to construct Erinyes, a high-quality dataset of over 5,000 angry triggers curated from GPT-4.1. Our experiments show that GREAT significantly outperforms baselines in attack generalization to unseen triggers, while preserving standard utility and maintaining stealth under defenses.

2510.05566 2026-06-02 stat.ML cs.AI cs.CL cs.LG stat.AP

Domain-Shift-Aware Conformal Prediction for Large Language Models

领域偏移感知的共形预测用于大型语言模型

Zhexiao Lin, Yuanyuan Li, Neeraj Sarna, Yuanyuan Gao, Michael von Gablenz

发表机构 * University of Waterloo(滑铁卢大学)

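AI summary Proposes a domain-shift-aware conformal prediction framework that reweights calibration samples to handle distribution shift, improving coverage reliability on the MMLU benchmark.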

Comments Accepted to Forty-Third International Conference on Machine Learning (ICML), 2026

详情
AI中文摘要

大型语言模型在各种任务中取得了令人印象深刻的性能。然而,它们倾向于产生过度自信且事实不正确的输出,即所谓的幻觉,这在实际应用中带来了风险。共形预测提供了有限样本、无分布假设的覆盖保证,但标准共形预测在领域偏移下会失效,常常导致覆盖不足和不可靠的预测集。我们提出了一种称为领域偏移感知共形预测(DS-CP)的新框架。我们的框架通过根据校准样本与测试提示的接近程度系统地重新加权校准样本,将共形预测适应于领域偏移下的大型语言模型,从而在保持有效性的同时增强适应性。我们的理论分析和在MMLU基准上的实验表明,所提出的方法比标准共形预测提供了更可靠的覆盖,尤其是在显著分布偏移下,同时保持了效率。这为大型语言模型在实际部署中实现可信的不确定性量化迈出了实际的一步。

英文摘要

Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real-world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under domain shift, often leading to under-coverage and unreliable prediction sets. We propose a new framework called Domain-Shift-Aware Conformal Prediction (DS-CP). Our framework adapts conformal prediction to large language models under domain shift, by systematically reweighting calibration samples based on their proximity to the test prompt, thereby preserving validity while enhancing adaptivity. Our theoretical analysis and experiments on the MMLU benchmark demonstrate that the proposed method delivers more reliable coverage than standard conformal prediction, especially under substantial distribution shifts, while maintaining efficiency. This provides a practical step toward trustworthy uncertainty quantification for large language models in real-world deployment.
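
The reweighting idea can be sketched as a weighted conformal quantile. The code below is a minimal illustration, not the DS-CP algorithm itself: it assumes each calibration sample already has a non-negative proximity weight (for example from embedding similarity to the test prompt, which is an assumption here), and it gives the test point a unit weight placed at infinity, following the generic weighted conformal recipe. The name `weighted_conformal_threshold` is hypothetical.

```python
import numpy as np

def weighted_conformal_threshold(cal_scores, weights, alpha):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores.

    The test point contributes a unit weight at +inf (an assumption of this
    sketch), so the threshold becomes infinite when calibration mass is too small.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(cal_scores)
    scores = cal_scores[order]
    # Normalise so calibration weights plus the test point's weight sum to one.
    probs = weights[order] / (weights.sum() + 1.0)
    cum = np.cumsum(probs)
    idx = int(np.searchsorted(cum, 1.0 - alpha, side="left"))
    return scores[idx] if idx < len(scores) else np.inf

uniform = weighted_conformal_threshold([0.1, 0.2, 0.3, 0.4], [1, 1, 1, 1], alpha=0.5)
shifted = weighted_conformal_threshold([0.1, 0.2, 0.3, 0.4], [0, 0, 0, 3], alpha=0.5)
```

A prediction set then keeps every candidate answer whose nonconformity score is at most the threshold; concentrating weight on calibration samples close to the test prompt moves the threshold toward their scores.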

2510.03938 2026-06-02 physics.optics cs.CV cs.NE physics.app-ph

Super-resolution image projection over an extended depth of field using a diffractive decoder

使用衍射解码器实现扩展景深上的超分辨率图像投影

Hanlong Chen, Cagatay Isil, Tianyi Gan, Mona Jarrahi, Aydogan Ozcan

发表机构 * Electrical and Computer Engineering Department, University of California, Los Angeles, California 90095, USA(加州大学洛杉矶分校电子与计算机工程系) Bioengineering Department, University of California, Los Angeles, California 90095, USA(加州大学洛杉矶分校生物工程系) California NanoSystems Institute (CNSI), University of California, Los Angeles, California 90095, USA(加州大学洛杉矶分校加州纳米系统研究所)

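AI summary Proposes a hybrid image projection system that combines a CNN encoder with an all-optical diffractive decoder to achieve an extended depth of field and pixel super-resolution, improving the space-bandwidth product.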

Comments 18 Pages, 6 Figures

详情
Journal ref
Light: Science & Applications (2026)
AI中文摘要

图像投影系统必须在数据存储、计算和传输方面高效,同时保持输出的大空间带宽积(SBP)。本文介绍了一种混合图像投影系统,该系统结合了基于卷积神经网络(CNN)的数字编码器和全光学衍射解码器,实现了具有改进分辨率的扩展景深(DOF)。基于CNN的编码器将输入图像压缩为紧凑的相位表示,随后由低分辨率(LR)投影仪显示,并由模拟衍射解码器进行全光学图像重建。该光学解码器完全被动,设计用于合成像素超分辨图像投影,具有扩展景深,同时无需额外功耗即可实现超分辨图像重建。我们的像素超分辨率(PSR)图像投影系统在约267倍波长(W)的扩展景深内展示了高保真图像合成,同时在每个横向平面上提供高达约16倍的SBP改进。通过太赫兹波段的实验验证了该概念,并且该系统可扩展到电磁波谱的不同部分。这种图像投影架构可以减少显示系统的数据存储和传输需求,而不会对光学解码器施加额外的功率限制。除了扩展景深PSR图像投影外,该方法的基本原理还可扩展到各种应用,包括光学计量和显微镜。

英文摘要

Image projection systems must be efficient in data storage, computation and transmission while maintaining a large space-bandwidth-product (SBP) at their output. Here, we introduce a hybrid image projection system that achieves extended depth-of-field (DOF) with improved resolution, combining a convolutional neural network (CNN)-based digital encoder with an all-optical diffractive decoder. A CNN-based encoder compresses input images into compact phase representations, which are subsequently displayed by a low-resolution (LR) projector and processed by an analog diffractive decoder for all-optical image reconstruction. This optical decoder is completely passive, designed to synthesize pixel super-resolved image projections that feature an extended DOF while eliminating the need for additional power consumption for super-resolved image reconstruction. Our pixel super-resolution (PSR) image projection system demonstrates high-fidelity image synthesis over an extended DOF of ~267xW, where W is the illumination wavelength, concurrently offering up to ~16-fold SBP improvement at each lateral plane. The proof of concept of this approach is validated through an experiment conducted in the THz spectrum, and the system is scalable across different parts of the electromagnetic spectrum. This image projection architecture can reduce data storage and transmission requirements for display systems without imposing additional power constraints on the optical decoder. Beyond extended DOF PSR image projection, the underlying principles of this approach can be extended to various applications, including optical metrology and microscopy.

2510.00481 2026-06-02 cs.NI cs.AI cs.HC cs.MM cs.PF

Make a Video Call with LLM: A Measurement Campaign over Six Mainstream Apps

用LLM进行视频通话:对六个主流应用的测量活动

Jiayang Xu, Xiangjie Huang, Zijie Li, Antariksh Verma, Zili Meng

发表机构 * Hong Kong University of Science and Technology(香港科技大学)

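AI summary Using custom testbeds and an online platform, benchmarks six mainstream AI video chat apps along four dimensions (quality, latency, internal mechanisms, and system overhead); finds that network latency matters less for AI video calls than for human video calls and that AI agent capability affects user experience the most.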

详情
AI中文摘要

2025年,大型语言模型(LLM)服务推出了一项新功能——AI视频聊天,允许用户通过实时视频通信(RTC)与AI代理互动,就像与真人聊天一样。尽管其重要性,但尚无系统性研究描述现有AI视频聊天系统的性能。为填补这一空白,本文提出了一个涵盖四个维度的综合基准:质量、延迟、内部机制和系统开销。使用自定义测试平台,我们进一步用该基准评估了六个主流AI视频聊天机器人。我们还构建了一个用于用户研究的在线平台。测量结果得出了有趣的发现,可能对未来优化有益。例如,AI视频通话的网络延迟不如人类视频通话重要。AI代理的能力对用户体验影响最大。我们的基准测试结果也为未来AI视频聊天机器人的优化提出了几个研究问题。可用性:在线评估平台、开源数据集和测试平台见https://callarena.net/。

英文摘要

In 2025, Large Language Model (LLM) services have launched a new feature -- AI video chat -- allowing users to interact with AI agents via real-time video communication (RTC), just like chatting with real people. Despite its significance, no systematic study has characterized the performance of existing AI video chat systems. To address this gap, this paper proposes a comprehensive benchmark across four dimensions: quality, latency, internal mechanisms, and system overhead. Using custom testbeds, we further evaluate six mainstream AI video chatbots with this benchmark. We also build an online platform for user study. The measurements lead to interesting findings that could benefit future optimizations. For example, network latency matters less in AI video chat than in human video chat. The capabilities of AI agents matter most to the user experience. Our benchmarking results also open up several research questions for future optimizations of AI video chatbots. Availability: https://callarena.net/ for the online evaluation platform and our open-sourced dataset and testbed.

2510.00053 2026-06-02 eess.IV cs.CV cs.LG

DPsurv: Dual-Prototype Evidential Fusion for Uncertainty-Aware and Interpretable Whole-Slide Image Survival Prediction

DPsurv: 双原型证据融合用于不确定性感知和可解释的全切片图像生存预测

Yucheng Xing, Ling Huang, Jingying Ma, Ruping Hong, Jiangdong Qiu, Pei Liu, Kai He, Huazhu Fu, Mengling Feng

发表机构 * National University of Singapore; National University of Singapore Guangzhou Research Translation and Innovation Institute; Imperial College London; Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College; Hunan University; Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR)

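AI summary Proposes DPsurv, a dual-prototype evidential fusion network that predicts uncertainty-aware survival intervals and offers interpretability through patch prototype assignment maps, component prototypes, and component-wise relative risk aggregation; it attains the best mean concordance index and integrated Brier score across five public datasets.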

详情
AI中文摘要

病理全切片图像(WSIs)因其在细胞和组织水平上全面的组织病理学信息而被广泛用于癌症生存分析,能够进行定量、大规模且预后丰富的肿瘤特征分析。然而,现有大多数WSI生存分析方法可解释性有限,且常常忽略异质性切片图像中的预测不确定性。本文提出DPsurv,一种双原型全切片图像证据融合网络,输出不确定性感知的生存区间,同时通过补丁原型分配图、组件原型和组件级相对风险聚合实现预测的解释。在五个公开数据集上的实验取得了最高的平均一致性指数和最低的平均积分Brier分数,验证了DPsurv的有效性和可靠性。预测结果的解释在特征、推理和决策层面提供了透明度,从而增强了DPsurv的可信度和可解释性。

英文摘要

Pathology whole-slide images (WSIs) are widely used for cancer survival analysis because of their comprehensive histopathological information at both cellular and tissue levels, enabling quantitative, large-scale, and prognostically rich tumor feature analysis. However, most existing methods in WSI survival analysis struggle with limited interpretability and often overlook predictive uncertainty in heterogeneous slide images. In this paper, we propose DPsurv, a dual-prototype whole-slide image evidential fusion network that outputs uncertainty-aware survival intervals, while enabling interpretation of predictions through patch prototype assignment maps, component prototypes, and component-wise relative risk aggregation. Experiments on five publicly available datasets achieve the highest mean concordance index and the lowest mean integrated Brier score, validating the effectiveness and reliability of DPsurv. The interpretation of prediction results provides transparency at the feature, reasoning, and decision levels, thereby enhancing the trustworthiness and interpretability of DPsurv.

2509.18025 2026-06-02 math.OC cs.AI cs.LG math.LO stat.ML

Deep Learning as the Disciplined Construction of Tame Objects

深度学习作为驯服对象的有纪律构造

Gilles Bareilles, Allen Gehret, Johannes Aspman, Jana Lepšová, Jakub Mareček

发表机构 * Czech Technical University in Prague, Artificial Intelligence Center(布拉格捷克技术大学人工智能中心)

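AI summary Using the framework of tame geometry (o-minimality), introduces the mathematical foundations of deep learning models as compositions of functions and shows how the framework yields convergence guarantees for stochastic gradient descent in nonsmooth, nonconvex but tame settings.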

Comments 39 pages, 10 figures

详情
AI中文摘要

人们可以将深度学习模型视为所谓驯服几何中函数的组合。在这篇说明性笔记中,我们概述了驯服几何(也称为o-极小性)、优化理论以及深度学习理论与实践之间的一些主题。为此,我们逐步介绍在一般非光滑非凸但驯服的设置中,为随机梯度下降建立收敛保证所使用的概念和工具。这说明了驯服几何作为研究AI系统(尤其是深度学习)的自然数学框架的一些方式。

英文摘要

One can see deep-learning models as compositions of functions within the so-called tame geometry. In this expository note, we give an overview of some topics at the interface of tame geometry (also known as o-minimality), optimization theory, and deep learning theory and practice. To do so, we gradually introduce the concepts and tools used to build convergence guarantees for stochastic gradient descent in a general nonsmooth nonconvex, but tame, setting. This illustrates some ways in which tame geometry is a natural mathematical framework for the study of AI systems, especially within Deep Learning.

2509.11056 2026-06-02 eess.SY cs.LG cs.SY

BERT4beam: Large AI Model Enabled Generalized Beamforming Optimization

BERT4beam: 大型AI模型实现通用波束赋形优化

Yuhang Li, Yang Lu, Wei Chen, Bo Ai, Zhiguo Ding

发表机构 * State Key Laboratory of Advanced Rail Autonomous Operation(先进轨道交通自主运行国家重点实验室) School of Computer Science and Technology(计算机科学与技术学院) School of Electronics and Information Engineering(电子与信息工程学院) School of Electrical and Electronic Engineering (EEE)(电子与电气工程学院)

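AI summary Proposes BERT4beam, a BERT-based framework that casts beamforming optimization as a token-level sequence learning task; pre-training and fine-tuning support single-task and multi-task optimization, with near-optimal performance across different user scales, system utilities, and antenna configurations.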

详情
AI中文摘要

人工智能(AI)有望成为未来第六代(6G)无线通信系统的关键推动力。然而,当前关于无线通信大型AI模型的研究主要集中在针对特定任务微调预训练的大型语言模型(LLM)。本文研究了专为波束赋形优化设计的大规模AI模型,以适应并泛化到由系统效用和规模定义的不同任务。我们提出了一种基于Transformer双向编码器表示(BERT)的新框架,称为BERT4beam。我们旨在将波束赋形优化问题表述为token级序列学习任务,对信道状态信息进行token化,构建BERT模型,并执行任务特定的预训练和微调策略。基于该框架,我们分别提出了两种基于BERT的方法用于单任务和多任务波束赋形优化。两种方法均可泛化到不同用户规模。此外,前者通过重新配置BERT模型的输入和输出模块,能够适应不同的系统效用和天线配置;而后者(称为UBERT)由于采用更细粒度的token化策略,可以直接泛化到多种任务。大量仿真结果表明,这两种方法能够实现接近最优的性能,并在各种波束赋形优化任务中优于现有AI模型,展现出强大的适应性和泛化能力。

英文摘要

Artificial intelligence (AI) is anticipated to emerge as a pivotal enabler for the forthcoming sixth-generation (6G) wireless communication systems. However, current research efforts regarding large AI models for wireless communications primarily focus on fine-tuning pre-trained large language models (LLMs) for specific tasks. This paper investigates the large-scale AI model designed for beamforming optimization to adapt and generalize to diverse tasks defined by system utilities and scales. We propose a novel framework based on bidirectional encoder representations from transformers (BERT), termed BERT4beam. We aim to formulate the beamforming optimization problem as a token-level sequence learning task, perform tokenization of the channel state information, construct the BERT model, and conduct task-specific pre-training and fine-tuning strategies. Based on the framework, we propose two BERT-based approaches for single-task and multi-task beamforming optimization, respectively. Both approaches are generalizable for varying user scales. Moreover, the former can adapt to varying system utilities and antenna configurations by re-configuring the input and output module of the BERT model, while the latter, termed UBERT, can directly generalize to diverse tasks, due to a finer-grained tokenization strategy. Extensive simulation results demonstrate that the two proposed approaches can achieve near-optimal performance and outperform existing AI models across various beamforming optimization tasks, showcasing strong adaptability and generalizability.

2509.10491 2026-06-02 eess.SP cs.LG

FlowECG: Using Flow Matching to Create a More Efficient ECG Signal Generator

FlowECG:利用流匹配创建更高效的ECG信号生成器

Vitalii Bondar, Serhii Semenov, Vira Babenko, Dmytro Holovniak

发表机构 * Cherkasy State Technological University(切尔卡瑟国立技术大学) University of the National Education Commission(国家教育委员会大学) State Scientific Research Institute of Armament and Military Equipment Testing and Certification(武器与军事装备测试和认证国家科学研究所)

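AI summary Proposes FlowECG, which replaces the diffusion process with flow matching and learns direct paths from noise to the data distribution through continuous flow dynamics; on PTB-XL it reaches generation quality comparable to diffusion methods (200 evaluations) with far fewer sampling steps (10-25), cutting computational requirements by an order of magnitude.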

Comments 8 pages, 2 figures, 1 table, reviewed version will be published in "Sensors, Devices and Systems 2025 Proceedings" (Springer's Lecture Notes in Electrical Engineering)

详情
AI中文摘要

合成心电图生成为需要隐私保护数据共享和训练数据集增强的医学AI应用提供服务。当前基于扩散的方法实现了高生成质量,但在采样过程中需要数百次神经网络评估,给临床部署造成了计算瓶颈。我们提出了FlowECG,一种流匹配方法,通过用连续流动力学替代迭代扩散过程来适配SSSD-ECG架构。流匹配通过常微分方程求解学习从噪声到数据分布的直连传输路径。我们使用动态时间规整、Wasserstein距离、最大均值差异和频谱相似性指标在PTB-XL数据集上评估了我们的方法。FlowECG在200次神经函数评估时匹配了SSSD-ECG的性能,并在三个指标上优于基线。关键发现表明,FlowECG以大幅减少的采样步数保持生成质量,与扩散方法需要200次评估相比,仅需10-25次评估即可获得可比结果。这种效率提升将计算需求降低了一个数量级,同时保留了生理上真实的12导联ECG特征。该方法使得在需要实时生成或大规模合成数据创建的资源受限临床环境中实现实际部署成为可能。

英文摘要

Synthetic electrocardiogram generation serves medical AI applications requiring privacy-preserving data sharing and training dataset augmentation. Current diffusion-based methods achieve high generation quality but require hundreds of neural network evaluations during sampling, creating computational bottlenecks for clinical deployment. We propose FlowECG, a flow matching approach that adapts the SSSD-ECG architecture by replacing the iterative diffusion process with continuous flow dynamics. Flow matching learns direct transport paths from noise to data distributions through ordinary differential equation solving. We evaluate our method on the PTB-XL dataset using Dynamic Time Warping, Wasserstein distance, Maximum Mean Discrepancy, and spectral similarity metrics. FlowECG matches SSSD-ECG performance at 200 neural function evaluations, outperforming the baseline on three metrics. The key finding shows that FlowECG maintains generation quality with substantially fewer sampling steps, achieving comparable results with 10-25 evaluations compared to 200 for diffusion methods. This efficiency improvement reduces computational requirements by an order of magnitude while preserving physiologically realistic 12-lead ECG characteristics. The approach enables practical deployment in resource-limited clinical settings where real-time generation or large-scale synthetic data creation is needed.
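
To make the link between sampling steps and function evaluations concrete, the sketch below integrates a flow ODE dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler, so the number of velocity evaluations equals `num_steps`. Everything here is an illustrative assumption: `toy_velocity` is a hand-written field that transports any start point to a single target along a straight line, standing in for a trained network, and the abstract does not say which ODE solver FlowECG uses.

```python
import numpy as np

def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) on [0, 1]; one velocity call per step."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity(x, t)
    return x

target = np.array([1.0, -2.0])

def toy_velocity(x, t):
    """Straight-line transport toward `target` (hypothetical stand-in for a learned model)."""
    return (target - x) / (1.0 - t)

noise = np.array([0.3, 0.8])
sample_10 = euler_sample(toy_velocity, noise, num_steps=10)
sample_25 = euler_sample(toy_velocity, noise, num_steps=25)
```

Because the toy paths are straight, even a coarse step count lands on the target; straighter learned paths are the intuition behind needing fewer evaluations than an iterative diffusion sampler.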

2509.03456 2026-06-02 stat.ML cs.LG

Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation

大动作空间中的离线策略学习:优化比估计更重要

Imad Aouali, Otmane Sakhi

发表机构 * Criteo AI Lab(Criteo AI实验室)

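AI summary Studies off-policy learning in offline contextual bandits, finds that existing methods face severe optimization issues in large action spaces, and shows that weighted log-likelihood objectives improve optimization while yielding competitive policies.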

Comments ICML '26

详情
AI中文摘要

离线策略评估(OPE)和离线策略学习(OPL)是离线上下文强盗中决策制定的基础。最近OPL的进展主要优化具有改进统计特性的OPE估计器,假设更好的估计器自然产生更优的策略。尽管有理论依据,但这种以估计器为中心的方法忽略了一个关键的实际障碍:具有挑战性的优化景观。在本文中,我们提供理论见解和实证证据,表明当前的OPL方法遇到严重的优化问题,特别是随着动作空间的增长。我们表明,估计器感知的策略参数化可以缓解但不能完全解决优化挑战。在此基础上,我们探索更简单的加权对数似然目标,并证明它们具有显著更好的优化特性,并且仍然能够恢复具有竞争力、通常更优的学习策略。我们的发现强调了在开发针对大动作空间的OPL算法时,明确考虑优化问题的必要性。

英文摘要

Off-policy evaluation (OPE) and off-policy learning (OPL) are foundational for decision-making in offline contextual bandits. Recent advances in OPL primarily optimize OPE estimators with improved statistical properties, assuming that better estimators inherently yield superior policies. Although theoretically justified, this estimator-centric approach neglects a critical practical obstacle: challenging optimization landscapes. In this paper, we provide theoretical insights and empirical evidence showing that current OPL methods encounter severe optimization issues, particularly as the action space grows. We show that estimator-aware policy parametrization can mitigate, but not fully resolve, optimization challenges. Building on this, we explore simpler weighted log-likelihood objectives and demonstrate that they enjoy substantially better optimization properties and still recover competitive, often superior, learned policies. Our findings emphasize the necessity of explicitly addressing optimization considerations in the development of OPL algorithms for large action spaces.
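
The contrast between an estimator-based objective and a weighted log-likelihood objective can be sketched for a context-free softmax policy. This is a simplified illustration under assumptions: the weighting `reward / logging_prob` is one plausible choice and is not taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def ips_objective(logits, actions, rewards, logging_probs):
    """Inverse propensity scoring estimate of the policy value (to be maximized)."""
    pi = softmax(logits)[actions]
    return float(np.mean(rewards * pi / logging_probs))

def weighted_loglik_objective(logits, actions, rewards, logging_probs):
    """Weighted log-likelihood of logged actions; the weights are an assumption of this sketch."""
    log_pi = np.log(softmax(logits))[actions]
    weights = rewards / logging_probs
    return float(np.mean(weights * log_pi))

logits = np.zeros(3)                       # uniform policy over three actions
actions = np.array([0, 1, 2])
rewards = np.array([1.0, 0.0, 1.0])
logging_probs = np.array([0.5, 0.25, 0.25])
v_ips = ips_objective(logits, actions, rewards, logging_probs)
v_wll = weighted_loglik_objective(logits, actions, rewards, logging_probs)
```

With non-negative weights the second objective is concave in the logits, because log-softmax is concave, whereas the IPS objective is not concave in general; this is one simple way to see why the optimization landscape can differ.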

2507.19702 2026-06-02 cs.SI cs.AI cs.LG

A Lightweight Deep Learning-based Model for Ranking Influential Nodes in Complex Networks

基于轻量级深度学习的复杂网络中有影响力节点排序模型

Mohammed A. Ramadhan, Abdulhakeem O. Mohammed

发表机构 * Computer Science Department, College of Science, University of Zakho(扎赫大学科学学院计算机科学系) Department of Computer Science and Information Technology, The American University of Kurdistan(库尔德斯坦美国大学计算机科学与信息技术系)

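AI summary Proposes 1D-CGS, a lightweight hybrid model combining a one-dimensional convolutional neural network with GraphSAGE. Using node degree and average neighbor degree as features, it ranks influential nodes efficiently via a regression task; on 12 real-world networks it improves Kendall's Tau by 4.73% and Jaccard similarity by 7.67% on average, reaches a Monotonicity Index of 0.99, and runs significantly faster than existing deep learning methods.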

详情
AI中文摘要

识别复杂网络中的有影响力节点是一项关键任务,在不同领域有广泛应用。然而,现有方法常在准确性和计算效率之间权衡。为解决这些挑战,我们提出1D-CGS,一种轻量级且有效的混合模型,它结合了一维卷积神经网络(1D-CNN)的速度和GraphSAGE的拓扑表示能力,用于高效节点排序。该模型使用基于两个简单且重要的拓扑特征(节点度和平均邻居度)构建的轻量级输入表示。这些特征通过一维卷积提取局部模式,然后通过GraphSAGE层聚合邻域信息。我们将节点排序任务表述为回归问题,并使用易感-感染-恢复(SIR)模型生成真实影响力分数。1D-CGS首先在Barabasi-Albert模型生成的合成网络上训练,然后应用于真实世界网络以识别有影响力节点。在12个真实网络上的实验评估表明,1D-CGS在排序准确性上显著优于传统中心性度量和最近的深度学习模型,同时运行速度非常快。与表现最佳的深度学习基线相比,所提模型在Kendall Tau相关性上平均提升4.73%,在Jaccard相似度上平均提升7.67%。它还实现了平均单调性指数(MI)分数0.99,并产生近乎完美的排名分布,表明高度独特和可区分的排名。此外,所有实验证实1D-CGS在高度合理的时间内运行,比现有深度学习方法快得多,使其适用于大规模应用。

英文摘要

Identifying influential nodes in complex networks is a critical task with a wide range of applications across different domains. However, existing approaches often face trade-offs between accuracy and computational efficiency. To address these challenges, we propose 1D-CGS, a lightweight and effective hybrid model that integrates the speed of one-dimensional convolutional neural networks (1D-CNN) with the topological representation power of GraphSAGE for efficient node ranking. The model uses a lightweight input representation built on two straightforward and significant topological features: node degree and average neighbor degree. These features are processed through 1D convolutions to extract local patterns, followed by GraphSAGE layers to aggregate neighborhood information. We formulate the node ranking task as a regression problem and use the Susceptible-Infected-Recovered (SIR) model to generate ground truth influence scores. 1D-CGS is initially trained on synthetic networks generated by the Barabasi-Albert model and then applied to real world networks for identifying influential nodes. Experimental evaluations on twelve real world networks demonstrate that 1D-CGS significantly outperforms traditional centrality measures and recent deep learning models in ranking accuracy, while operating in very fast runtime. The proposed model achieves an average improvement of 4.73% in Kendall's Tau correlation and 7.67% in Jaccard Similarity over the best performing deep learning baselines. It also achieves an average Monotonicity Index (MI) score 0.99 and produces near perfect rank distributions, indicating highly unique and discriminative rankings. Furthermore, all experiments confirm that 1D-CGS operates in a highly reasonable time, running significantly faster than existing deep learning methods, making it suitable for large scale applications.
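
The two input features named in the abstract, node degree and average neighbor degree, are cheap to compute from an adjacency list. The sketch below shows one way to do it in plain Python; the function name `degree_features` and the tiny example graph are hypothetical, and the 1D-CNN and GraphSAGE layers are not reproduced.

```python
def degree_features(adjacency):
    """Return {node: (degree, average neighbor degree)} for an undirected graph."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    features = {}
    for v, nbrs in adjacency.items():
        # Isolated nodes get an average neighbor degree of 0.0 by convention here.
        avg = sum(degree[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        features[v] = (degree[v], avg)
    return features

graph = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a", "d"], "d": ["a", "c"]}
feats = degree_features(graph)
```

Both features depend only on a node's one-hop neighborhood and its neighbors' degrees, which keeps the input representation lightweight.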

2507.12645 2026-06-02 eess.SP cs.AI cs.LG

A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis

一种用于生物医学时间序列数据鲁棒深度学习分类的新型数据增强策略:在ECG和EEG分析中的应用

Mohammed Guhdar, Ramadhan J. Mstafa, Abdulhakeem O. Mohammed

发表机构 * Computer Science Department, College of Science, University of Zakho(扎赫大学科学学院计算机科学系) Department of Computer Science and Information Technology, The American University of Kurdistan(库尔德斯坦美国大学计算机科学与信息技术系) PRIME Lab, Scientific Research Center, University of Zakho(扎赫大学科学研究中心PRIME实验室)

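AI summary Proposes a unified deep learning framework combining a ResNet-based CNN with an attention mechanism. A novel data augmentation strategy (time-domain concatenation of multiple augmented variants) and Focal Loss handle class imbalance; the model reaches accuracies of 99.96%, 99.78%, and 100% on the EEG and ECG benchmark datasets, with low memory requirements and fast inference.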

详情
AI中文摘要

准确统一分析多种生物信号(如ECG和EEG)的需求日益迫切,这对于全面评估患者状况至关重要,尤其是在同步监测中。尽管多传感器融合取得了进展,但在开发能够有效处理和提取本质上不同生理信号特征的统一架构方面仍存在关键空白。另一个挑战是许多生物医学数据集固有的类别不平衡,这常常导致传统方法性能偏差。本研究通过提出一种新颖且统一的深度学习框架来解决这些问题,该框架在不同信号类型上均达到了最先进的性能。我们的方法将基于ResNet的CNN与注意力机制相结合,并通过一种新颖的数据增强策略增强:对每个信号的多个增强变体进行时域拼接,以生成更丰富的表示。与先前工作不同,我们科学地增加信号复杂性以实现未来能力,从而相比现有技术获得了最佳预测。预处理步骤包括小波去噪、基线去除和标准化。通过结合使用这种高级数据增强和Focal Loss函数,有效管理了类别不平衡。训练过程中应用了正则化技术以确保泛化能力。我们在三个基准数据集上严格评估了所提出的架构:UCI癫痫EEG、MIT-BIH心律失常和PTB诊断ECG。它分别达到了99.96%、99.78%和100%的准确率,展示了在不同信号类型和临床背景下的鲁棒性。最后,该架构需要约130 MB内存,每个样本处理时间约10 ms,表明其适用于低端或可穿戴设备部署。

英文摘要

The increasing need for accurate and unified analysis of diverse biological signals, such as ECG and EEG, is paramount for comprehensive patient assessment, especially in synchronous monitoring. Despite advances in multi-sensor fusion, a critical gap remains in developing unified architectures that effectively process and extract features from fundamentally different physiological signals. Another challenge is the inherent class imbalance in many biomedical datasets, often causing biased performance in traditional methods. This study addresses these issues by proposing a novel and unified deep learning framework that achieves state-of-the-art performance across different signal types. Our method integrates a ResNet-based CNN with an attention mechanism, enhanced by a novel data augmentation strategy: time-domain concatenation of multiple augmented variants of each signal to generate richer representations. Unlike prior work, we scientifically increase signal complexity to achieve future-reaching capabilities, which resulted in the best predictions compared to the state of the art. Preprocessing steps included wavelet denoising, baseline removal, and standardization. Class imbalance was effectively managed through the combined use of this advanced data augmentation and the Focal Loss function. Regularization techniques were applied during training to ensure generalization. We rigorously evaluated the proposed architecture on three benchmark datasets: UCI Seizure EEG, MIT-BIH Arrhythmia, and PTB Diagnostic ECG. It achieved accuracies of 99.96%, 99.78%, and 100%, respectively, demonstrating robustness across diverse signal types and clinical contexts. Finally, the architecture requires ~130 MB of memory and processes each sample in ~10 ms, suggesting suitability for deployment on low-end or wearable devices.
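
Two ingredients from the abstract can be sketched in a few lines: the Focal Loss and the time-domain concatenation of augmented variants. The code is illustrative only. The focal loss follows the standard form -(1 - p_t)^gamma * log(p_t) without a class-balancing factor, and the augmentations passed to the hypothetical helper `concat_variants` (identity and amplitude scaling) are assumptions, since the abstract does not list the actual transforms.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Mean focal loss; p_t is the predicted probability of the true class."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    p_t = probs[np.arange(len(labels)), labels]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

def concat_variants(signal, transforms):
    """Apply each (hypothetical) augmentation and join the results along the time axis."""
    signal = np.asarray(signal, dtype=float)
    return np.concatenate([t(signal) for t in transforms])

probs = [[0.9, 0.1], [0.4, 0.6]]
labels = [0, 1]
ce = focal_loss(probs, labels, gamma=0.0)   # gamma = 0 reduces to cross-entropy
fl = focal_loss(probs, labels, gamma=2.0)   # confident predictions are down-weighted
longer = concat_variants([1.0, 2.0, 3.0], [lambda s: s, lambda s: 0.5 * s])
```

Raising `gamma` shrinks the contribution of well-classified samples, which is how the loss shifts attention toward rare or hard classes.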

2501.12189 2026-06-02 math.OC cs.LG

MirrorCBO: A consensus-based optimization method in the spirit of mirror descent

MirrorCBO:一种镜像下降思想的共识优化方法

Leon Bungert, Franca Hoffmann, Dohyeon Kim, Tim Roith

发表机构 * Department of Computing and Mathematical Sciences, Caltech(计算与数学科学系,加州理工学院) Helmholtz Imaging, Deutsches Elektronen-Synchrotron DESY(亥姆霍兹成像,德国电子同步加速器研究所(DESY))

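AI summary Proposes MirrorCBO, which combines consensus-based optimization with mirror descent for derivative-free non-convex optimization and extends to constrained problems; theory establishes an exponential convergence rate, and experiments show competitive performance in sparsity-inducing and constrained optimization.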

Comments 66 pages, 18 figures, 19 tables

详情
Journal ref
Mathematical Models and Methods in Applied Sciences 35 (14), 3083-3170, 2025
AI中文摘要

本文提出MirrorCBO,一种共识优化方法,它像镜像下降推广梯度下降一样推广了标准CBO。为此,我们将CBO方法应用于对偶粒子群,并通过应用镜像映射的逆(参数化为强凸函数$\phi$的次微分)来保留原始粒子位置。这样,我们结合了无导数非凸优化算法和镜像下降的优点。作为一个特例,该方法将CBO扩展到具有凸约束的优化问题。假设与$\phi$相关的Bregman距离有界,我们提供了MirrorCBO的渐近收敛结果,具有显式指数速率。另一个关键贡献是对该新算法在不同应用场景中的探索性数值研究,重点关注(i)稀疏诱导优化和(ii)约束优化,展示了MirrorCBO的竞争性能。我们经验性地观察到,该方法也可用于欧几里得空间(非凸)子流形上的优化,可适应其他近期CBO变体的镜像版本,并且继承了镜像下降选择理想极小值(如稀疏解)的能力。我们还概述了近期用于约束优化的CBO方法,并将其性能与MirrorCBO进行了比较。

英文摘要

In this work we propose MirrorCBO, a consensus-based optimization (CBO) method which generalizes standard CBO in the same way that mirror descent generalizes gradient descent. For this we apply the CBO methodology to a swarm of dual particles and retain the primal particle positions by applying the inverse of the mirror map, which we parametrize as the subdifferential of a strongly convex function $ϕ$. In this way, we combine the advantages of a derivative-free non-convex optimization algorithm with those of mirror descent. As a special case, the method extends CBO to optimization problems with convex constraints. Assuming bounds on the Bregman distance associated to $ϕ$, we provide asymptotic convergence results for MirrorCBO with explicit exponential rate. Another key contribution is an exploratory numerical study of this new algorithm across different application settings, focusing on (i) sparsity-inducing optimization, and (ii) constrained optimization, demonstrating the competitive performance of MirrorCBO. We observe empirically that the method can also be used for optimization on (non-convex) submanifolds of Euclidean space, can be adapted to mirrored versions of other recent CBO variants, and that it inherits from mirror descent the capability to select desirable minimizers, like sparse ones. We also include an overview of recent CBO approaches for constrained optimization and compare their performance to MirrorCBO.
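
A single step of the dual-particle idea can be sketched as follows. This is an illustration under several assumptions rather than the authors' scheme: the mirror function is taken to be $\phi(x) = \tfrac12\|x\|^2 + \lambda\|x\|_1$, whose inverse mirror map is soft-thresholding, the drift and componentwise noise follow a generic CBO discretization, and all names (`soft_threshold`, `consensus_point`, `mirror_cbo_step`) are hypothetical.

```python
import numpy as np

def soft_threshold(y, lam):
    """Inverse mirror map for phi(x) = 0.5*||x||^2 + lam*||x||_1 (assumed choice of phi)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def consensus_point(x, f, beta):
    """Softmin-weighted average of primal particles; lower objective means larger weight."""
    vals = np.array([f(xi) for xi in x])
    w = np.exp(-beta * (vals - vals.min()))
    w /= w.sum()
    return w @ x

def mirror_cbo_step(y, f, lam, beta, dt, sigma, rng):
    """Update dual particles y using the primal positions x = soft_threshold(y, lam)."""
    x = soft_threshold(y, lam)
    m = consensus_point(x, f, beta)
    drift = -(x - m) * dt
    noise = sigma * np.sqrt(dt) * np.abs(x - m) * rng.standard_normal(y.shape)
    return y + drift + noise

objective = lambda x: float(np.sum(x ** 2))
rng = np.random.default_rng(0)
y0 = np.array([[3.0, 0.0], [0.0, 0.0]])
y1 = mirror_cbo_step(y0, objective, lam=1.0, beta=5.0, dt=0.1, sigma=0.0, rng=rng)
```

Because the primal positions pass through soft-thresholding, small dual coordinates map to exact zeros, which gives an intuition for how a mirror map can favor sparse minimizers.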

2506.21278 2026-06-02 stat.ML cs.AI cs.LG math.ST stat.TH

Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution

使用高效球面柯西分布的超球面变分自编码器

Lukas Sablica, Kurt Hornik

发表机构 * Institute for Statistics and Mathematics(统计与数学研究所) Vienna University of Economics and Business(维也纳经济与商业大学) Austria(奥地利)

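AI summary Proposes hyperspherical variational autoencoders based on the spherical Cauchy distribution; a Möbius transformation gives a differentiable reparameterization that avoids Bessel-function evaluations, retaining heavy tails while enabling efficient and stable training and inference.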

详情
AI中文摘要

我们提出在超球面潜变量空间上使用球面柯西(spCauchy)潜变量的变分自编码器。spCauchy 族具有重尾全局行为,并且通过对球面上的均匀样本应用莫比乌斯变换,允许精确可微的重参数化。我们证明,在高浓度极限下,spCauchy 在显式浓度参数映射下恢复了 von Mises-Fisher(vMF)分布的局部切空间几何,同时避免了 vMF 实现所需的高阶贝塞尔函数计算。对于训练,到均匀球面先验的 Kullback-Leibler 散度具有快速收敛的级数、稳定的求积以及高浓度渐近形式。我们进一步建立了浓度依赖的 KL 核心的单调性,并推导了具有闭形式代理和误差控制的解析括号,支持极端情况下的稳定近似。压力测试基准表明,所得到的潜层目标在 CPU 和 GPU 上比 vMF 基线更稳定且评估更快。在图像和分子序列数据上的实验表明,spCauchy-VAE 为具有超球面潜表示的生成式建模提供了一种鲁棒且可扩展的替代方案。

英文摘要

We propose spherical Cauchy (spCauchy) latent variables for variational autoencoders on hyperspherical latent spaces. The spCauchy family has heavy-tailed global behavior and admits an exact differentiable reparameterization by applying a Möbius transformation to uniform samples on the sphere. We show that, in the high-concentration limit, spCauchy recovers the local tangent-space geometry of the von Mises-Fisher (vMF) distribution under an explicit concentration parameter mapping, while avoiding the high-order Bessel-function evaluations required by vMF implementations. For training, the Kullback-Leibler divergence to a uniform spherical prior admits rapidly convergent series, stable quadrature, and high-concentration asymptotic forms. We further establish monotonicity of the concentration-dependent KL core and derive analytic brackets with closed-form surrogates and error control, supporting stable approximation in extreme regimes. Stress-test benchmarks show that the resulting latent-layer objective remains stable and faster to evaluate than vMF baselines on CPU and GPU. Experiments on image and molecular sequence data demonstrate that spCauchy-VAEs provide a robust and scalable alternative for generative modeling with hyperspherical latent representations.
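
The reparameterization described above can be sketched by pushing uniform samples on the sphere through a Möbius map. The formula used below, x = (1 - rho^2)(u + rho*mu) / ||u + rho*mu||^2 + rho*mu, is the form recalled from the spherical Cauchy literature and should be read as an assumption, since the paper's exact parametrization may differ; `sample_spcauchy` is a hypothetical name.

```python
import numpy as np

def sample_spcauchy(mu, rho, n, rng):
    """Draw n points on the unit sphere via a Möbius map of uniform samples (0 <= rho < 1)."""
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)
    u = rng.standard_normal((n, mu.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # uniform on the sphere
    w = u + rho * mu
    return (1.0 - rho ** 2) * w / np.sum(w * w, axis=1, keepdims=True) + rho * mu

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 1.0])
x = sample_spcauchy(mu, rho=0.9, n=1000, rng=rng)
norms = np.linalg.norm(x, axis=1)
mean_alignment = float(np.mean(x @ mu))
```

The map involves only additions, a norm, and a division, so gradients with respect to `mu` and `rho` flow through it directly, and larger `rho` concentrates the samples around `mu`.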

2507.07339 2026-06-02 stat.AP cs.LG

Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset

基于新的纵向UNOS数据集通过时间-事件模型对心脏移植等待名单死亡率预测进行基准测试

Yingtao Luo, Reza Skandari, Carlos Martinez, Arman Kilic, Rema Padman

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Imperial College(帝国理工学院) United Network for Organ Sharing(美国器官共享网络) Medical University of South Carolina(南卡罗来纳医学院)

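AI summary Uses longitudinal waitlist history data and time-to-event models to predict heart transplant waitlist mortality; the best model reaches a C-Index of 0.94 and an AUROC of 0.89, significantly outperforming previous models.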

Comments Best Student Paper Finalist in Proceedings of AMIA Annual Symposium 2025

详情
AI中文摘要

目前,关于心脏移植等待名单患者管理的决策由医生委员会根据多种因素做出,但过程在很大程度上仍是临时的。随着2018年以来器官共享联合网络(UNOS)收集的纵向患者、供体和器官数据量的增加,人们对在器官可用时支持临床决策的分析方法越来越感兴趣。在本研究中,我们对利用纵向等待名单历史数据进行时间依赖性、时间-事件建模的机器学习模型进行了基准测试,以预测等待名单死亡率。我们使用23,807条患者记录(包含77个变量)进行训练,并在1年时间范围内评估生存预测和区分能力。我们的最佳模型实现了0.94的C-Index和0.89的AUROC,显著优于以往模型。关键预测因子与已知风险因素一致,同时也揭示了新的关联。我们的发现可以支持心脏移植决策中的紧迫性评估和政策改进。

英文摘要

Decisions about managing patients on the heart transplant waitlist are currently made by committees of doctors who consider multiple factors, but the process remains largely ad-hoc. With the growing volume of longitudinal patient, donor, and organ data collected by the United Network for Organ Sharing (UNOS) since 2018, there is increasing interest in analytical approaches to support clinical decision-making at the time of organ availability. In this study, we benchmark machine learning models that leverage longitudinal waitlist history data for time-dependent, time-to-event modeling of waitlist mortality. We train on 23,807 patient records with 77 variables and evaluate both survival prediction and discrimination at a 1-year horizon. Our best model achieves a C-Index of 0.94 and AUROC of 0.89, significantly outperforming previous models. Key predictors align with known risk factors while also revealing novel associations. Our findings can support urgency assessment and policy refinement in heart transplant decision making.
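
For readers unfamiliar with the C-Index reported above, the sketch below computes Harrell's concordance index for right-censored data in plain Python. It is a generic textbook version, not the evaluation code of the study; ties in risk count as half-concordant, and the example values are made up for illustration.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the earlier observed event has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event before subject j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

times = [1, 2, 3, 4]
events = [1, 1, 0, 1]          # 0 marks a censored subject
perfect = concordance_index(times, events, [0.9, 0.7, 0.2, 0.1])
reversed_ranking = concordance_index(times, events, [0.1, 0.2, 0.7, 0.9])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.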