2601.15014 2026-05-20 stat.ML cs.LG math.ST stat.TH

Efficient and Minimax Optimal In-context Nonparametric Regression with Transformers

Michelle Ching, Ioana Popescu, Nico Smith, Tianyi Ma, William G. Underwood, Richard J. Samworth

Affiliations Statistical Laboratory, University of Cambridge, Cambridge, UK

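AI summary This paper studies in-context learning for nonparametric regression with α-Hölder smooth regression functions and proves that a pretrained transformer can attain the minimax optimal rate of convergence with substantially fewer parameters and pretraining sequences than previous results in the literature.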

Comments 30 pages, 7 figures

Abstract

We study in-context learning for nonparametric regression with $α$-Hölder smooth regression functions, for some $α>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $Θ(\log n)$ parameters and $Ω\bigl(n^{2α/(2α+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax optimal rate of convergence $O\bigl(n^{-2α/(2α+d)}\bigr)$ in mean squared error. Our result requires substantially fewer transformer parameters and pretraining sequences than previous results in the literature. This is achieved by showing that transformers are able to approximate local polynomial estimators efficiently by implementing a kernel-weighted polynomial basis and then running gradient descent.
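
The estimator named in the abstract's last sentence, a local polynomial fit obtained from a kernel-weighted polynomial basis followed by gradient descent, can be illustrated with a minimal sketch. This is not the paper's transformer construction: the degree-1 basis, the Epanechnikov kernel, the bandwidth, and the function name `local_linear_gd` are all assumptions chosen for illustration, and the sketch assumes at least a few training points fall inside the kernel window.

```python
import numpy as np

def local_linear_gd(x_train, y_train, x0, bandwidth=0.2, steps=500):
    """Estimate f(x0) by kernel-weighted linear least squares solved with gradient descent."""
    # Scaled offsets from the query point and Epanechnikov kernel weights (assumed kernel).
    u = (x_train - x0) / bandwidth
    w = np.maximum(1.0 - u**2, 0.0)
    # Degree-1 polynomial basis centred at x0: columns [1, u].
    basis = np.stack([np.ones_like(u), u], axis=1)
    # Normal-equation pieces of the weighted squared loss 0.5 * sum_i w_i (y_i - basis_i @ beta)^2.
    gram = basis.T @ (w[:, None] * basis)
    rhs = basis.T @ (w * y_train)
    # Step size 1 / largest eigenvalue keeps gradient descent stable.
    lr = 1.0 / np.linalg.eigvalsh(gram).max()
    beta = np.zeros(2)
    for _ in range(steps):
        beta -= lr * (gram @ beta - rhs)
    # The intercept is the fitted value at x0.
    return beta[0]
```

For data that are exactly linear, the intercept recovers the function value at the query point; for smooth nonlinear functions the bandwidth controls the usual bias-variance trade-off.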

2601.12367 2026-05-20 cs.HC cs.RO

User-to-Vehicle Interaction in Smart Mobility: The GO-DRiVeS Autonomous Ride-Sharing Application

Hana E. Elmalah, Catherine M. Elias

Affiliations C-DRiVeS Lab: Cognitive Driving Research in Vehicular Systems, Cairo, Egypt; Computer Science and Engineering Department - Faculty of Media Engineering and Technology; German University in Cairo, Egypt

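AI summary This paper presents GO-DRiVeS, a ride-sharing application intended to spare university students and staff long walks in hot weather or when carrying heavy items. The application was developed with an Agile methodology, informed by an analysis and comparison of the frameworks of existing transportation applications. It implements user registration, ride requesting, and real-time tracking, and several experiments were used to verify its stability and reliability.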

Abstract

This paper introduces the GO-DRiVeS application, an on-demand ride-sharing and ride-requesting mobile application designed to spare university students and staff long walks, which are time-consuming and tiring, especially on hot days or when carrying heavy items. The GO-DRiVeS application was developed following the Agile methodology for its flexibility, using a mobile application system architecture and a client-server architecture. GO-DRiVeS was implemented using React Native (Expo) for the frontend, Node.js and Express for the backend, and MongoDB as the database; these choices were based on a detailed analysis of existing transportation applications, comparing their frameworks and identifying their essential functionalities. GO-DRiVeS supports core features such as user registration, ride requesting, and real-time tracking, and it handles multiple simultaneous requests in a first-come, first-served manner. The application was developed around these features and evaluated in multiple experiments that demonstrated stable behavior in handling requests, as presented in the Methodology and Results chapters.

2512.00667 2026-05-20 eess.SY cs.RO cs.SY

Active Learning of Fractional-Order Viscoelastic Model Parameters for Realistic Haptic Rendering

Harun Tolasa, Gorkem Gemalmaz, Volkan Patoglu

Affiliations Faculty of Engineering and Natural Sciences

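AI summary This paper proposes a systematic approach that uses active learning to optimize the parameters of fractional-order viscoelastic models so as to improve the perceived realism of haptic rendering, combining human-in-the-loop optimization with an aggregate perceptual map to select parameters that are broadly perceived as realistic across the general population.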

Comments This work has been submitted to the IEEE Transactions on Haptics for possible publication. 14 pages, 8 figures

Abstract

Effective medical simulators necessitate realistic haptic rendering of biological tissues that exhibit viscoelastic material properties, such as creep and stress relaxation. Fractional-order models provide an effective means of describing intrinsically time-dependent viscoelastic dynamics with few parameters, as they naturally capture memory effects. However, due to the unintuitive, frequency-dependent coupling among the order of the fractional element and other parameters, determining appropriate parameter values for fractional-order models that yield high perceived realism remains a significant challenge. In this study, we propose a systematic means of determining the parameters of fractional-order viscoelastic models that optimizes the perceived realism of haptic rendering across general populations. First, we demonstrate that the parameters of fractional-order models can be effectively optimized through active learning, using qualitative feedback-based human-in-the-loop (HiL) optimization, to ensure consistently high realism ratings for each individual. Second, we propose a rigorous method to combine HiL optimization results into an aggregate perceptual map trained on the entire dataset, and demonstrate how to select population-level optimal parameters from this representation that are broadly perceived as realistic across general populations. Finally, we provide evidence of the effectiveness of the generalized fractional-order viscoelastic model parameters for three viscoelastic materials by characterizing their perceived realism through human-subject experiments. Overall, generalized fractional-order viscoelastic models established through the proposed HiL optimization and aggregation approach possess the potential to significantly improve the sim-to-real transition performance of medical training simulators.
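
To make the memory effect of a fractional element concrete, the following minimal sketch evaluates the stress of a single fractional element (often called a springpot), σ(t) = c · D^α ε(t), with the Grünwald-Letnikov discretization of the fractional derivative. The abstract does not specify the model structure or the numerical scheme, so the single-element model, the function names, and the parameter values are assumptions made only for illustration.

```python
import math

def gl_weights(alpha, n):
    """Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k) for k = 0..n."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w

def springpot_stress(strain, alpha, c, h):
    """Stress c * D^alpha(strain) at the last sample of a uniformly sampled strain history.

    strain: strain samples at times 0, h, 2h, ...; alpha: fractional order in (0, 1);
    c: hypothetical material coefficient; h: sampling period.
    """
    n = len(strain) - 1
    w = gl_weights(alpha, n)
    # The whole strain history contributes, which is the memory effect.
    return c * h ** (-alpha) * sum(w[k] * strain[n - k] for k in range(n + 1))
```

For a unit step strain, the result approaches the known stress-relaxation law c · t^(-α) / Γ(1 - α), so the order α sets how quickly the rendered force decays after a sudden indentation.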

2511.01959 2026-05-20 astro-ph.IM astro-ph.CO astro-ph.HE cs.LG

Addressing prior dependence in hierarchical Bayesian modeling for PTA data analysis II: Noise and SGWB inference through parameter decorrelation

Eleonora Villa, Luigi D'Amico, Aldo Barca, Fatima Modica Bittordo, Francesco Alì, Massimo Meneghetti, Luca Naso

Affiliations INAF (Italian National Institute for Astrophysics)

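AI summary This paper proposes a hierarchical Bayesian modeling strategy that addresses prior dependence in pulsar timing array data through parameter decorrelation, using orthogonal projection and normalizing flows to improve the accuracy of inference for noise and stochastic gravitational wave background parameters.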

Comments 27 pages, 5 figures. Extended analysis and appendix added. Submitted to the Astronomy and Computing special issue HPC in Cosmology and Astrophysics

Abstract

Pulsar Timing Arrays (PTA) provide a powerful framework to measure low-frequency gravitational waves, but accuracy and robustness of the results are challenged by complex noise processes that must be accurately modeled. Standard PTA analyses assign fixed uniform noise priors to each pulsar, an approach that can introduce systematic biases when combining the array. To overcome this limitation, we adopt a hierarchical Bayesian modeling strategy in which noise priors are parametrized by higher-level hyperparameters. To mitigate the sensitivity of the inferred parameters to the choice of noise hyperprior, we introduce a reparametrization of the hierarchical model based on the orthogonal projection of hyperparameters onto the physical parameter subspace. The transformation is implemented through Normalizing Flows (NFs), which provide an invertible, tractable representation and preserve shrinkage and inter-pulsar information pooling in the reparametrized model. We also employ i-nessai, a flow-guided nested sampler, to efficiently explore the resulting higher-dimensional parameter space. We apply our method to a minimal 3-pulsar case study, performing a simultaneous inference of noise and stochastic gravitational wave background (SGWB) parameters. Despite the limited dataset, the results consistently show that the reparametrized hierarchical treatment constrains the noise parameters more tightly and partially alleviates the red-noise-SGWB degeneracy, while the orthogonal reparametrization further enhances parameter independence without affecting the correlations intrinsic to the power-law modeling of the physical processes involved.

2510.27588 2026-05-20 cs.DS cs.DB cs.LG

Learned Static Function Data Structures

Stefan Hermann, Hans-Peter Lehmann, Giorgio Vinciguerra, Stefan Walzer

Affiliations Karlsruhe Institute of Technology; University of Pisa

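AI summary This paper proposes a static function data structure that uses machine learning to capture correlations between keys and values, saving space through compact encoding and breaking the zero-order entropy barrier while still supporting point queries.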

Journal ref PVLDB, 19(5): 917-930, 2026

Abstract

We consider the task of constructing a data structure for associating a static set of keys with values, while allowing arbitrary output values for queries involving keys outside the set. Compared to hash tables, these so-called static function data structures do not need to store the key set and thus use significantly less memory. Several techniques are known, with compressed static functions approaching the zero-order empirical entropy of the value sequence. In this paper, we introduce learned static functions, which use machine learning to capture correlations between keys and values. For each key, a model predicts a probability distribution over the values, from which we derive a key-specific prefix code to compactly encode the true value. The resulting codeword is stored in a classic static function data structure. This design allows learned static functions to break the zero-order entropy barrier while still supporting point queries. Our experiments show substantial space savings: up to one order of magnitude on real data, and up to three orders of magnitude on synthetic data.
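
The encoding idea in the abstract (a per-key predicted distribution, a key-specific prefix code, and a stored codeword) can be sketched as follows. Everything here is an illustrative assumption: `toy_model` is a hypothetical predictor, Huffman coding is one possible choice of prefix code, and a real implementation would keep the codewords in a static function data structure that does not store keys, rather than in the caller's dictionary.

```python
import heapq
import math
from collections import Counter

VALUES = "ABCD"

def toy_model(key):
    """Hypothetical predictor: puts 0.85 of the mass on one value derived from the key."""
    likely = VALUES[key % 4]
    return {v: (0.85 if v == likely else 0.05) for v in VALUES}

def huffman_code(probs):
    """Build a prefix code {symbol: bitstring} from a dict of symbol probabilities."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    # Heap entries carry a unique tie-breaker so the dicts are never compared.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c0.items()}
        merged.update({s: "1" + bits for s, bits in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

def encode(key, value):
    """Codeword for the true value under the key-specific code."""
    return huffman_code(toy_model(key))[value]

def decode(key, codeword):
    """Point query: rebuild the key-specific code and invert it."""
    inverse = {bits: v for v, bits in huffman_code(toy_model(key)).items()}
    return inverse[codeword]

def zero_order_entropy(values):
    """Empirical zero-order entropy of the value sequence, in bits per value."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())
```

In this toy setting the values are uniformly distributed, so their zero-order entropy is 2 bits per key, yet every correctly predicted value costs a single bit because the model exploits the key-value correlation.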

2510.12278 2026-05-20 cs.ET cs.AI

Quantum Annealing for Staff Scheduling in Educational Environments

Alessia Ciacco, Francesca Guerriero, Eneko Osaba

Affiliations Department of Political and Social Sciences, University of Calabria, Rende (CS), Italy; Department of Mechanical, Energy and Management Engineering, University of Calabria, Rende (CS), Italy; TECNALIA, Basque Research and Technology Alliance (BRTA), Derio (Bizkaia), Spain

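AI summary This paper proposes an optimization model based on quantum annealing for allocating staff across multiple schools and educational levels, demonstrating the practical applicability of quantum annealing to educational scheduling.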

Comments 8 pages, 3 tables, and 2 figures. Paper presented at the International Conference on Quantum Communications, Networking, and Computing (QCNC 2026)

Journal ref in Proc. 2026 International Conference on Quantum Communications, Networking, and Computing (QCNC), IEEE, 2026, pp. 630-637

Abstract

We address a novel staff allocation problem that arises in the organization of collaborators among multiple school sites and educational levels. The problem emerges from a real case study in a public school in Calabria, Italy, where staff members must be distributed across kindergartens, primary, and secondary schools under constraints of availability, competencies, and fairness. To tackle this problem, we develop an optimization model and investigate a solution approach based on quantum annealing. Our computational experiments on real-world data show that quantum annealing is capable of producing balanced assignments in short runtimes. These results provide evidence of the practical applicability of quantum optimization methods in educational scheduling and, more broadly, in complex resource allocation tasks.
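
Quantum annealers minimize a quadratic function of binary variables, so an allocation problem of this kind is typically encoded with penalty terms. The sketch below is a toy instance under stated assumptions, not the paper's model: the sizes, the penalty weight, the single availability rule, and the use of balanced site loads as a stand-in for fairness are all hypothetical, and exhaustive search replaces the annealer.

```python
import itertools

STAFF, SITES = 4, 2
TARGET = STAFF // SITES      # balanced load per site (assumed fairness notion)
UNAVAILABLE = {(0, 1)}       # hypothetical rule: staff 0 cannot work at site 1
PENALTY = 10.0               # weight of the hard-constraint penalties

def energy(bits):
    """Quadratic objective over binary variables x[s][j] = 1 if staff s is assigned to site j."""
    x = [bits[s * SITES:(s + 1) * SITES] for s in range(STAFF)]
    e = 0.0
    # Each staff member must be assigned to exactly one site.
    for s in range(STAFF):
        e += PENALTY * (sum(x[s]) - 1) ** 2
    # Site loads should match the balanced target.
    for j in range(SITES):
        e += (sum(x[s][j] for s in range(STAFF)) - TARGET) ** 2
    # Availability violations are penalized linearly.
    e += PENALTY * sum(x[s][j] for (s, j) in UNAVAILABLE)
    return e

# Exhaustive search over all 2^8 bit strings stands in for the annealer.
best = min(itertools.product((0, 1), repeat=STAFF * SITES), key=energy)
assignment = [best[s * SITES:(s + 1) * SITES].index(1) for s in range(STAFF)]
```

An energy of zero means every constraint is satisfied; on annealing hardware the same objective would be expanded into QUBO coefficients and sampled rather than enumerated.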

2509.19707 2026-05-20 stat.ML cs.LG stat.CO stat.ME

Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies

David Huk, Theodoros Damoulas

Affiliations Department of Statistics; Department of Computer Science; University of Warwick

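AI summary This paper proposes copula modelling methods based on the principles of diffusions and flows. By forgetting and then remembering dependencies, the methods model multivariate dependence effectively and increase the representational power of copula models for complex, high-dimensional data.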

Comments Published as a conference paper at ICLR 2026

Abstract

Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows. We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.

2508.06526 2026-05-20 cs.DC cs.AI cs.AR

PiKV: KV Cache Management System for Mixture of Experts

Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu

Affiliations University of California, Los Angeles; Yale University; Columbia University; University of Wisconsin-Madison

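AI summary This paper proposes PiKV, a parallel and distributed KV cache serving framework designed for mixture-of-experts architectures, which reduces cache access overhead through expert-sharded caching, PiKV routing, and PiKV scheduling, and lowers memory usage with compression modules.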

Comments Github Link: https://github.com/NoakLiu/PiKV

Abstract

As large-scale language models continue to scale up in both size and context length, the memory and communication cost of key-value (KV) cache storage has become a major bottleneck in multi-GPU and multi-node inference. While MoE-based architectures sparsify computation across experts, the corresponding KV caches remain dense and globally synchronized, resulting in significant overhead. We introduce PiKV, a parallel and distributed KV cache serving framework tailored for MoE architectures. PiKV leverages expert-sharded KV storage to partition caches across GPUs, PiKV Routing to reduce token-to-KV access, and PiKV Scheduling to adaptively retain query-relevant entries. To further reduce memory usage, PiKV integrates PiKV Compression modules into the caching pipeline for acceleration. PiKV is publicly available as an open-source software library: https://github.com/NoakLiu/PiKV. PiKV is still a living project, aiming to become a comprehensive KV cache management system for MoE architectures.

2507.01932 2026-05-20 math.OC cs.LG cs.NA math.NA stat.ML

A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition

Zhaosong Lu, Xiangyuan Wang

Affiliations Department of Industrial and Systems Engineering, University of Minnesota, USA

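AI summary This paper studies a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. Compared with the global KL or Polyak-Lojasiewicz conditions common in the literature, this local KL condition covers a broader range of practical scenarios but also raises new analytical challenges. The paper shows that the associated maximal function is locally generalized Hölder smooth and, on that basis, develops an inexact proximal gradient method for the minimax problem, establishing complexity guarantees for computing an approximate stationary point under mild assumptions.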

Comments Accepted by SIAM Journal on Optimization

Abstract

We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-Lojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally generalized Hölder smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.

2506.12218 2026-05-20 eess.SP cs.LG

Directed Acyclic Graph Convolutional Networks

Samuel Rey, Hamed Ajorlou, Gonzalo Mateos

Affiliations Dept. of Signal Theory and Communications, Rey Juan Carlos University, Madrid, Spain; Dept. of Electrical and Computer Engineering, University of Rochester

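AI summary This paper proposes DCN, a novel graph neural network architecture designed for convolutional learning from signals on DAGs. It learns node representations with causal graph filters, builds on formal convolutional operations that admit spectral-domain representations, and introduces the Parallel DCN (PDCN) to decouple model complexity from graph size. Experiments indicate that it outperforms existing methods in accuracy, robustness, and computational efficiency.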

Abstract

Directed acyclic graphs (DAGs) are central to science and engineering applications including causal inference, scheduling, and neural architecture search. In this work, we introduce the DAG Convolutional Network (DCN), a novel graph neural network (GNN) architecture designed specifically for convolutional learning from signals supported on DAGs. The DCN leverages causal graph filters to learn nodal representations that account for the partial ordering inherent to DAGs, a strong inductive bias not present in conventional GNNs. Unlike prior art in machine learning over DAGs, DCN builds on formal convolutional operations that admit spectral-domain representations. We further propose the Parallel DCN (PDCN), a model that feeds input DAG signals to a parallel bank of causal graph-shift operators and processes these DAG-aware features using a shared multilayer perceptron. This way, PDCN decouples model complexity from graph size while maintaining satisfactory predictive performance. The architectures' permutation equivariance and expressive power properties are also established. Comprehensive numerical tests across several tasks, datasets, and experimental conditions demonstrate that (P)DCN compares favorably with state-of-the-art baselines in terms of accuracy, robustness, and computational efficiency. These results position (P)DCN as a viable framework for deep learning from DAG-structured data that is designed from first (graph) signal processing principles.

2506.07209 2026-05-20 cs.GR cs.CV

HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance

Lei Li, Angela Dai

Affiliations University of Virginia; Technical University of Munich

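AI summary This paper proposes HOI-PAGE, a zero-shot method that generates high-fidelity 4D human-object interactions through part-level affordance reasoning. It uses large language models to reason about part-level mechanics and a structured part affordance graph (PAG) to guide a three-stage synthesis, producing complex multi-object or multi-person interaction sequences.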

Comments ICML 2026. Project page: https://craigleili.github.io/projects/hoipage/ Video: https://www.youtube.com/watch?v=gwXjOffCFyk

Abstract

We present HOI-PAGE, a new approach that prioritizes part-level affordance reasoning to generate high-fidelity 4D human-object interactions (HOIs) from text prompts in a zero-shot fashion. In contrast to prior works that focus on global, whole body-object motion synthesis, our approach explicitly reasons about the underlying part-level mechanics of interactions using large language models (LLMs). We capture this reasoning in a structured part affordance graph (PAG) representation, serving as a high-level interaction scaffolding to guide a three-stage synthesis: first, decomposing input 3D objects into semantic parts; then, generating reference HOI videos from text prompts to extract part-based motion constraints; and finally, optimizing for 4D HOI motion sequences that mimic the reference dynamics while satisfying part-level contact constraints. Extensive experiments show that our approach is flexible and capable of generating complex multi-object or multi-person interaction sequences, with significantly improved realism and text alignment for zero-shot 4D HOI generation.

2506.03178 2026-05-20 eess.IV cs.AI cs.CV

LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning

Md. Zihad Bin Jahangir, Muhammad Ashad Kabir, Sumaiya Akter, Israt Jahan, Minh Chau

Affiliations Department of Computer Science and Engineering, Southeast University; School of Computing, Mathematics and Engineering, Charles Sturt University; Department of Computer Science and Engineering, University of Liberal Arts Bangladesh; Medical Imaging Group, School of Dentistry and Medical Sciences, Charles Sturt University

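AI summary This paper proposes the LLaMA-XR framework, which combines LLaMA 3.1 with DenseNet-121 image embeddings and QLoRA fine-tuning to improve the accuracy and clinical relevance of radiology report generation while maintaining computational efficiency.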

Comments 25 pages

Journal ref Bioengineering 2026, 13(5), 493

Abstract

Automated radiology report generation holds significant potential to reduce radiologists' workload and enhance diagnostic accuracy. However, generating precise and clinically meaningful reports from chest radiographs remains challenging due to the complexity of medical language and the need for contextual understanding. Existing models often struggle with maintaining both accuracy and contextual relevance. In this paper, we present LLaMA-XR, a novel framework that integrates LLaMA 3.1 with DenseNet-121-based image embeddings and Quantized Low-Rank Adaptation (QLoRA) fine-tuning. LLaMA-XR achieves improved coherence and clinical accuracy while maintaining computational efficiency. This efficiency is driven by an optimization strategy that enhances parameter utilization and reduces memory overhead, enabling faster report generation with lower computational resource demands. Extensive experiments conducted on the IU X-ray benchmark dataset demonstrate that LLaMA-XR outperforms a range of state-of-the-art methods. Our model achieves a ROUGE-L score of 0.433 and a METEOR score of 0.336, establishing new performance benchmarks in the domain. These results underscore LLaMA-XR's potential as an effective and efficient AI system for automated radiology reporting, offering enhanced clinical utility and reliability.

2504.08381 2026-05-20 eess.SP cs.LG

An Empirical Investigation of Reconstruction-Based Models for Seizure Prediction from ECG Signals

Mohammad Reza Chopannavaz, Foad Ghaderi

Affiliations Human-Computer Interaction Lab., Faculty of Electrical and Computer Engineering, Tarbiat Modares University

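AI summary This paper proposes a reconstruction-based anomaly detection framework that uses time-frequency representations and deep learning models to capture changes in heart rate dynamics associated with seizure onset, smoothing the reconstruction error and applying an adaptive thresholding strategy to improve prediction accuracy. On the Siena database it reports a specificity of 99.16% and an accuracy of 76.05%, while offering actionable early warnings in clinical settings.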

Abstract

Epileptic seizures are transient neurological events characterized by abnormal and excessive neuron activity in the brain, which are often associated with measurable disturbances in the cardiovascular system. Traditionally, electroencephalogram (EEG) signals have served as the primary modality for seizure prediction due to their direct measurement of brain activity and high diagnostic precision. However, their cost, sensitivity to noise, and practical deployment constraints limit their applicability outside controlled clinical environments. To overcome these challenges, recent studies have increasingly investigated electrocardiogram (ECG) signals as a practical and non-invasive alternative for seizure prediction in real-world settings. Evidence suggests that ECG-derived cardiac signatures may precede clinical seizure onset, offering a viable window for early detection. In this paper, we propose a reconstruction-based anomaly detection framework that integrates time-frequency representations with advanced deep learning models to capture deviations in heart rate dynamics associated with seizure onset. Afterward, reconstruction error is smoothed, and an adaptive thresholding strategy is applied to reduce false alarms. The method was evaluated on the Siena database, achieving a specificity of 99.16%, accuracy of 76.05%, and a false positive rate (FPR) of 0.01/h, with an average prediction horizon of 45 minutes prior to seizure onset. These results demonstrate that ECG-based prediction can provide clinically actionable early warnings while improving patient accessibility and comfort. Nevertheless, this performance reflects a trade-off favoring high specificity over sensitivity, resulting in reduced FPR and aligning with clinical requirements for reliable deployment.
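
The post-processing step described in the abstract, smoothing the reconstruction error and then applying an adaptive threshold, might look roughly like the sketch below. The abstract does not give the smoothing filter or the threshold rule, so the trailing moving average, the rolling mean-plus-k-standard-deviations threshold, and all parameter values are assumptions for illustration only.

```python
import numpy as np

def detect_alarms(errors, smooth_win=5, baseline_win=50, k=4.0):
    """Flag samples whose smoothed reconstruction error exceeds a rolling adaptive threshold."""
    errors = np.asarray(errors, dtype=float)
    # Trailing (causal) moving average; the first smooth_win - 1 outputs are zero-padded.
    kernel = np.ones(smooth_win) / smooth_win
    smoothed = np.convolve(errors, kernel, mode="full")[: len(errors)]
    alarms = np.zeros(len(errors), dtype=bool)
    for t in range(baseline_win, len(errors)):
        # Threshold adapts to the recent baseline: mean + k * standard deviation.
        base = smoothed[t - baseline_win : t]
        alarms[t] = smoothed[t] > base.mean() + k * base.std()
    return smoothed, alarms
```

A larger `k` trades sensitivity for fewer false alarms, which mirrors the specificity-over-sensitivity trade-off the abstract describes.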

2504.04349 2026-05-20 cs.GT cs.LG

Tight Regret Bounds for Fixed-Price Bilateral Trade

Houshuang Chen, Yaonan Jin, Pinyan Lu, Chihao Zhang

Affiliations Shanghai Jiao Tong University; Huawei’s Taylor Lab; Shanghai University of Finance and Economics, Laboratory of Interdisciplinary Research of Computation and Economics (SUFE)

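AI summary This paper studies regret minimization for fixed-price mechanisms in bilateral trade, giving tight regret bounds for independent values and for correlated/adversarial values, and improving on existing results.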

Abstract

We examine fixed-price mechanisms in bilateral trade through the lens of regret minimization. Our main results are twofold. (i) For independent values, a near-optimal $\widetilde{Θ}(T^{2/3})$ tight bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback. (ii) For correlated/adversarial values, a near-optimal $Ω(T^{3/4})$ lower bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback, which improves the best known $Ω(T^{5/7})$ lower bound obtained in the work [BCCF24] and, up to polylogarithmic factors, matches the $\widetilde{\mathcal{O}}(T^{3 / 4})$ upper bound obtained in the same work. Our work in combination with the previous works [CCCFL24mor, CCCFL24jmlr, AFF24, BCCF24] (essentially) gives a thorough understanding of regret minimization for fixed-price bilateral trade. En route, we have developed two technical ingredients that might be of independent interest: (i) A novel algorithmic paradigm, called $\textit{fractal elimination}$, to address one-bit feedback and independent values. (ii) A new $\textit{lower-bound construction}$ with novel proof techniques, to address the $\textsf{Global Budget Balance}$ constraint and correlated values.
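
For readers unfamiliar with the setting, the quantities behind these bounds can be written down in a few lines. The sketch below uses the basic single-price model, in which a trade happens when the posted price lies between the seller's and the buyer's values; it does not model the Global Budget Balance variant studied in the paper, and the function names and the price grid are assumptions for illustration.

```python
def gain_from_trade(seller, buyer, price):
    """Gain from trade in one round: buyer value minus seller value if both accept the price."""
    return buyer - seller if seller <= price <= buyer else 0.0

def two_bit_feedback(seller, buyer, price):
    """The learner sees separately whether the seller and the buyer accept."""
    return (seller <= price, price <= buyer)

def one_bit_feedback(seller, buyer, price):
    """The learner sees only whether the trade happened."""
    return seller <= price <= buyer

def regret(values, prices, grid):
    """Regret of the posted prices against the best fixed price on a grid, in hindsight."""
    earned = sum(gain_from_trade(s, b, p) for (s, b), p in zip(values, prices))
    best = max(sum(gain_from_trade(s, b, q) for s, b in values) for q in grid)
    return best - earned
```

The learner never observes the values themselves, only the feedback bits, which is why the achievable regret depends so strongly on the feedback model.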

2503.16309 2026-05-20 eess.IV cs.CV physics.med-ph

Rapid patient-specific neural networks for intraoperative X-ray to volume registration

Vivek Gopalakrishnan, David-Dimitris Chlorogiannis, Andrew Abumoussa, Anna M. Larson, Nazim Haouchine, Darren B. Orbach, Sarah Frisken, Neel Dey, Polina Golland

Affiliations Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology; Department of Radiology, Harvard Medical School; Saint Luke’s Marion Bloch Neuroscience Institute; Department of Critical Care Medicine, Shriners Children’s Hospital; Department of Interventional Neuroradiology, Boston Children’s Hospital; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital

AI总结 本文提出了一种自监督框架xvr,结合患者特异性神经网络和梯度优化,实现了快速且准确的2D到3D配准,通过物理模拟生成训练数据,无需手动标注,提升了临床和研究社区的广泛应用能力。

详情
AI中文摘要

图像引导介入和手术机器人中的先进导航技术需要将3D术前体积(如CT、MRI)快速且精确地对齐到2D术中图像(如X射线透视)。然而,现有的2D/3D配准方法无法在各类透视引导手术中泛化:传统的基于强度的优化器需要为每个受试者仔细调整超参数,而深度学习方法需要大量手动标注的数据集,并且受限于训练时所用的特定解剖结构。为了解决这些限制,我们提出了xvr,一种自监督框架,它将患者特异性神经网络与基于梯度的优化相结合,实现自动的2D/3D配准。xvr利用基于物理的模拟,从患者自身的术前扫描中生成训练数据,从而无需手动标注。我们提出了一个在数千例全身扫描上预训练的基础模型,仅需5分钟微调即可针对任意解剖区域完成患者特异性适配。在迄今为止规模最大的真实透视图像2D/3D配准评估中,xvr在数秒内即可在多种解剖结构、成像模态和医院的数据上达到高精度,精度比现有方法提高了一个数量级。xvr通过开源软件(https://xvr.csail.mit.edu)使泛解剖的2D/3D刚性配准可供广大临床和研究社区使用。

英文摘要

Advanced navigation techniques in image-guided interventions and surgical robotics require the rapid and precise alignment of 3D preoperative volumes (e.g., CT, MRI) to 2D intraoperative images (e.g., X-ray fluoroscopy). However, existing 2D/3D registration methods fail to generalize across the broad spectrum of fluoroscopy-guided procedures: traditional intensity-based optimizers require careful hyperparameter tuning for each subject, while deep learning approaches demand extensive manually labeled datasets and remain constrained to the specific anatomy on which they were trained. To address these limitations, we present xvr, a self-supervised framework that combines patient-specific neural networks with gradient-based optimization for automatic 2D/3D registration. xvr leverages physics-based simulation to generate training data from a patient's own preoperative scan, eliminating the need for manual annotation. We present a foundation model pretrained on thousands of whole-body scans, achieving patient-specific adaptation for any anatomical region in only 5 minutes of finetuning. In the largest evaluation of 2D/3D registration on real fluoroscopy to date, xvr achieves high accuracy in seconds across diverse anatomical structures, imaging modalities, and hospitals, improving upon the accuracy of existing methods by an order of magnitude. xvr makes pan-anatomical 2D/3D rigid registration accessible to broad clinical and research communities through open-source software at https://xvr.csail.mit.edu.

2404.16676 2026-05-20 cs.DS cs.LG

Multilayer Correlation Clustering

多层相关聚类

Atsushi Miyauchi, Florian Adriaens, Francesco Bonchi, Nikolaj Tatti

发表机构 * Intesa Sanpaolo University of Helsinki(Intesa Sanpaolo 赫尔辛基大学) Intesa Sanpaolo AI Research University of Helsinki(Intesa Sanpaolo AI Research 赫尔辛基大学)

AI总结 本文提出了一种多层相关聚类方法,旨在通过最小化多层不一致向量的ℓ_𝑝范数来优化聚类结果,并设计了相应的近似算法和实验验证。

Comments AISTATS 2026

详情
AI中文摘要

我们建立了多层相关聚类,这是相关聚类在多层设置下的新一般化。在该模型中,我们被给予一系列相关聚类的输入(称为层)在共同的集合V上。目标是找到V的一个聚类,使其多层不一致向量的ℓ_𝑝范数(p≥1)最小化,该向量的维度等于层数,每个元素表示聚类在相应层上的不一致程度。对于这一一般化,我们首先设计了一个O(L log n)的近似算法,其中L是层数。然后我们研究了我们问题的一个重要特殊情况,即具有所谓概率约束的情况。对于这种情况,我们首先给出一个(α+2)的近似算法,其中α是任何可能的单层对应物的近似比。此外,我们设计了一个4近似算法,该算法改进了上述一般概率约束情况下的近似比α+2=4.5。使用现实世界数据集的计算实验支持了我们的理论发现,并展示了所提出算法的实用性。

英文摘要

We establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering to the multilayer setting. In this model, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$ of $n$ elements. The goal is to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p\geq 1$) of the multilayer-disagreements vector, which is defined as the vector (with dimension equal to the number of layers), each element of which represents the disagreements of the clustering on the corresponding layer. For this generalization, we first design an $O(L\log n)$-approximation algorithm, where $L$ is the number of layers. We then study an important special case of our problem, namely the problem with the so-called probability constraint. For this case, we first give an $(α+2)$-approximation algorithm, where $α$ is any possible approximation ratio for the single-layer counterpart. Furthermore, we design a $4$-approximation algorithm, which improves the above approximation ratio of $α+2=4.5$ for the general probability-constraint case. Computational experiments using real-world datasets support our theoretical findings and demonstrate the practical effectiveness of our proposed algorithms.
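To make the objective concrete, here is a minimal Python sketch of the multilayer-disagreements vector and its $\ell_p$-norm. It assumes each layer is an unweighted set of +1 (similar) / -1 (dissimilar) pair labels; the function names and the three-element toy instance are illustrative and are not taken from the paper.

```python
def disagreements(labels, layer):
    """Count pairs whose cluster assignment contradicts the layer's label."""
    count = 0
    for (u, v), sign in layer.items():
        same_cluster = labels[u] == labels[v]
        # A +1 pair that is split, or a -1 pair that is merged, disagrees.
        if (sign > 0) != same_cluster:
            count += 1
    return count


def multilayer_objective(labels, layers, p):
    """Return (l_p norm of the disagreements vector, the vector itself)."""
    vector = [disagreements(labels, layer) for layer in layers]
    return sum(d ** p for d in vector) ** (1.0 / p), vector


# Toy instance (an assumption for illustration): two layers over V = {a, b, c}.
layers = [
    {("a", "b"): +1, ("a", "c"): -1, ("b", "c"): -1},
    {("a", "b"): -1, ("a", "c"): +1, ("b", "c"): +1},
]
split = {"a": 0, "b": 0, "c": 1}   # clusters {a, b} and {c}
merged = {"a": 0, "b": 0, "c": 0}  # one cluster {a, b, c}

for name, labels in (("split", split), ("merged", merged)):
    for p in (1, 2):
        value, vector = multilayer_objective(labels, layers, p)
        print(name, "p =", p, "vector =", vector, "norm =", round(value, 3))
```

On this instance both clusterings have the same $\ell_1$ value, while $\ell_2$ prefers the clustering that spreads disagreements more evenly across layers, which is the kind of trade-off the choice of $p$ controls.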

2312.02652 2026-05-20 hep-ex cs.LG

What Machine Learning Can Do for Focusing Aerogel Detectors

机器学习如何帮助聚焦气凝胶探测器

Foma Shipilov, Alexander Barnyakov, Viktor Bobrovnikov, Sergey Kononov, Fedor Ratnikov

发表机构 * NRU Higher School of Economics(俄罗斯国立研究大学高等经济学院) Budker Institute of Nuclear Physics of Siberian Branch Russian Academy of Sciences(俄罗斯科学院西伯利亚分院布德克尔核物理研究所) Novosibirsk State Technical University(新西伯利亚国立技术大学) Novosibirsk State University(新西伯利亚国立大学)

AI总结 本文提出利用机器学习技术过滤聚焦气凝胶环形成像切伦科夫探测器(FARICH)中的本底击中,以减少数据流并提高粒子速度分辨率。

Comments 5 pages, 4 figures, to be published in 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP2023) proceedings

详情
AI中文摘要

在超级粲-陶工厂(Super Charm-Tau factory)实验中,粒子鉴别将由聚焦气凝胶环形成像切伦科夫探测器(FARICH)提供。探测器所处位置的特殊性使其难以充分冷却,因此会记录到大量环境本底击中。必须抑制这些本底击中,以减少数据流并提高粒子速度分辨率。在本工作中,我们受计算机视觉中机器学习技术的启发,提出了几种筛选信号击中的方法。

英文摘要

Particle identification at the Super Charm-Tau factory experiment will be provided by a Focusing Aerogel Ring Imaging CHerenkov detector (FARICH). The specifics of detector location make proper cooling difficult, therefore a significant number of ambient background hits are captured. They must be mitigated to reduce the data flow and improve particle velocity resolution. In this work we present several approaches to filtering signal hits, inspired by machine learning techniques from computer vision.

2605.19737 2026-05-20 cs.GR cs.CV

Decentralized Direct Volume Rendering: A Browser-Native GPU Architecture for MRI Digital Twins in Resource-Constrained Settings

去中心化直接体渲染:一种浏览器原生的GPU架构,用于资源受限环境中的MRI数字孪生

Oserebameh Augustine Beckley

发表机构 * Lagos State University(拉各斯州大学)

AI总结 本研究提出了一种去中心化的浏览器原生GPU架构,用于在资源受限环境中实现高保真的MRI数字孪生,通过在低成本集成边缘GPU上执行确定性的单次通过射线投射和形态学梯度计算,实现了快速的像素生成和稳定的交互性能。

Comments 10 pages, 4 figures. Live interactive browser demo available at: https://webgpu-mri.vercel.app/ . Source code repository: https://github.com/Bahdmanbabzo/webgpu-mri

详情
AI中文摘要

数字孪生(DT)技术在手术规划和个性化医疗中具有巨大潜力。然而,生成交互式、患者特异性的解剖孪生体目前依赖于计算密集型的服务器端渲染(SSR)或昂贵的本地工作站,这构成了显著的部署障碍,在资源受限环境(RCS)中尤为突出。本文提出了一种去中心化的客户端WebGPU架构,使高保真解剖数字孪生得以普及。该框架绕过标准的服务器端渲染管线,直接在低成本的集成边缘GPU上执行确定性的单次通过光线步进(raymarching)和形态学梯度计算。由于消除了云渲染方案固有的网络延迟,系统的首像素时间(TTFP)低于920.0毫秒,并在不低于82.0 FPS的帧率下保持稳定的交互性。系统通过统一缓冲区(uniform buffers)维持连续交互保真度,实现对组织参数的零延迟操控,以支持动态临床决策。通过证明基于患者特异性MRI扫描的复杂3D医学模拟可以在浏览器中原生执行,且无需深度学习或外部计算依赖,该架构为医疗数字孪生的广泛临床应用提供了可扩展且经济的基础。

英文摘要

Digital Twin (DT) technology holds immense potential for surgical planning and personalized medicine. However, generating interactive, patient-specific anatomical twins currently relies on computationally heavy Server-Side Rendering (SSR) or expensive local workstations, creating significant barriers to deployment, especially in resource-constrained settings (RCS). This paper presents a decentralized, client-side WebGPU architecture that democratizes access to high-fidelity anatomical Digital Twins. By bypassing standard server-side rendering pipelines, the framework executes deterministic single-pass raymarching and morphological gradient calculations directly on low-cost integrated edge GPUs. Eliminating the network latency inherent to cloud-rendered solutions, the system achieves a Time to First Pixel (TTFP) of under 920.0ms and maintains stable interactivity at >= 82.0 FPS. Continuous Interaction Fidelity is maintained via uniform buffers, enabling zero-latency manipulation of tissue parameters for dynamic clinical decision-making. By proving that complex 3D medical simulations of patient-specific MRI scans can be executed natively in the browser without deep learning or external computational dependencies, this architecture provides a scalable, affordable foundation for the widespread clinical adoption of healthcare Digital Twins.
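As a rough illustration of what single-pass raymarching computes for one pixel, the sketch below composites scalar samples along a single ray, front to back, on the CPU. This is a common textbook formulation of direct volume rendering and is only an assumption about the general technique; the paper's actual shader runs on the GPU through WebGPU, and the names `march_ray` and `soft_tissue` are hypothetical.

```python
def march_ray(samples, transfer, step=1.0, opacity_cutoff=0.99):
    """Composite scalar samples along one ray, front to back.

    samples  : voxel intensities met along the ray, nearest first
    transfer : hypothetical transfer function, intensity -> (gray, opacity)
    step     : step length relative to the unit step the opacities assume
    """
    color, alpha = 0.0, 0.0
    for value in samples:
        gray, opacity = transfer(value)
        # Opacity correction so the result does not depend on the step length.
        opacity = 1.0 - (1.0 - opacity) ** step
        # Light from this sample is attenuated by everything in front of it.
        color += (1.0 - alpha) * opacity * gray
        alpha += (1.0 - alpha) * opacity
        if alpha >= opacity_cutoff:  # early ray termination
            break
    return color, alpha


def soft_tissue(value):
    """Toy transfer function: brighter voxels are also more opaque."""
    return value, 0.5 * value


# Two empty voxels followed by two bright ones.
print(march_ray([0.0, 0.0, 1.0, 1.0], soft_tissue))
```

Because the transfer function is evaluated inside the loop, changing a tissue parameter only changes the inputs to the next frame's pass, which is consistent with the abstract's point about updating parameters through uniform buffers rather than re-uploading the volume.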

2605.19722 2026-05-20 cs.CR cs.AI

Measuring Safety Alignment Effects in Autonomous Security Agents

在自主安全代理中测量安全对齐效应

Isaac David, Arthur Gervais

发表机构 * University College London(伦敦大学学院)

AI总结 本文提出了一种基于轨迹的基准测试,用于评估安全代理在执行漏洞分析任务时的安全对齐效果,发现安全代理的性能差异主要体现在拒绝、不安全行为和工具可靠性等方面,而非单纯的拒绝率。

详情
AI中文摘要

原版的安全对齐语言模型与其未经审查(uncensored)或经过消融(abliterated)的衍生版本在作为自主安全代理运行时,行为是否不同?单轮拒绝基准无法回答这个问题:安全代理必须检查代码仓库、调用工具,并在授权的沙箱中生成漏洞证据。我们提出了一个基于轨迹的基准测试,包含30个本地漏洞分析任务,配有固定的工具、确定性的成功谓词、脱敏规则和基础性(grounding)检查,并将四个原版模型与其未经审查或消融的衍生版本进行比较:Gemma 4 31B、Gemma 4 26B A4B、Qwen2.5-Coder 7B和Llama 3.1 8B。该成果包含1,500条安全代理轨迹和800条非安全对照轨迹。在Gemma的两组配对中,限制较少的衍生版本在安全任务上收益显著:31B的成功率为14.0%对0.7%,26B为10.7%对0.0%,平均基础性得分也更高(3.91对3.27和4.12对1.64,满分5分),且31B轨迹中的拒绝率、动作抑制率和不安全动作率均为0.0%。然而,对照任务和非Gemma配对排除了一种干净的、安全任务特有的或普遍适用的“限制减少”效应:Gemma的差距同样出现在普通编码任务中,Qwen2.5-Coder限制较少的衍生版本成功率反而更低(2.0%对5.3%),而经过消融的Llama衍生版本无法遵循工具协议。在所有模型家族中,困难的触发证明(proof-of-trigger)和补丁验证任务仍未被解决。这些结果表明,自主安全代理中的安全对齐效应应在系统层面进行测量,将拒绝、不安全动作、工具可靠性和证据基础性分开考察,而不是把拒绝率当作安全信号。

英文摘要

Do stock safety-aligned language models and their uncensored or abliterated derivatives behave differently when run as autonomous security agents? Single-turn refusal benchmarks cannot answer this question: security agents must inspect repositories, call tools, and produce vulnerability evidence inside authorized sandboxes. We present a trace-based benchmark of 30 local vulnerability-analysis tasks with fixed tools, deterministic success predicates, redaction rules, and grounding checks, and compare four stock models against uncensored or abliterated derivatives: Gemma 4 31B, Gemma 4 26B A4B, Qwen2.5-Coder 7B, and Llama 3.1 8B. The artifact contains 1,500 security-agent traces and 800 non-security control traces. The Gemma pairs show large less-restricted gains on security tasks: 14.0% versus 0.7% success for 31B and 10.7% versus 0.0% for 26B, with higher mean grounding (3.91 versus 3.27 and 4.12 versus 1.64 out of five) and 0.0% refusal, suppressed-action, and unsafe-action rates in the 31B traces. However, controls and non-Gemma pairs rule out a clean security-specific or universal less-restricted effect: Gemma gaps also appear on ordinary coding tasks, Qwen2.5-Coder success is lower for the less-restricted derivative (2.0% versus 5.3%), and the abliterated Llama derivative fails the tool protocol. Across all families, hard proof-of-trigger and patch-verification tasks remain unsolved. These results show that safety alignment effects in autonomous security agents should be measured at the system level, separating refusal, unsafe action, tool reliability, and evidence grounding rather than treating refusal rate as the safety signal.

2605.19698 2026-05-20 cs.CR cs.LG

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

唤醒 Hydra:在文本到图像扩散模型中稳定多概念后门注入

Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma, Songze Li

发表机构 * Yangzhou University(扬州大学) Nanjing University(南京大学) Chongqing University(重庆大学) Southeast University(东南大学)

AI总结 本文研究了在易受干扰的环境下多概念后门攻击的稳定性问题,提出 Hydra 框架,通过约束触发语义和协调跨任务交互,实现稳健且可控的多概念后门注入,实验表明 Hydra 在保持清洁生成质量的同时,有效激活后门。

Comments Preprint. 18 pages

详情
AI中文摘要

文本到图像扩散模型通过开源重用和多次下游微调不断发展,其中重用的检查点难以验证,因此更容易出现隐藏的后门行为。在这样的生态系统中,一个预训练模型可能被多个独立方依次适应和重新分发,导致多个概念特定的触发-目标关联在同一个模型中累积。当这些关联共存时,语义冲突会在共享的表示空间中被放大,导致跨概念纠缠和生成质量下降。值得注意的是,这种累积并不增强攻击,反而可能破坏之前注入的行为并降低攻击可靠性。在本工作中,我们系统地研究了在此干扰环境中后门攻击,并提出 Hydra,一个统一的框架,用于在累积和去中心化的重用下实现稳健和可控的多概念后门注入。我们的核心见解是,在大规模多概念设置下稳定的后门注入需要在优化过程中显式约束触发语义并协调跨任务交互。具体而言,Hydra 在文本编码器空间中执行进化触发搜索,以识别与目标概念语义对齐但与其他注入概念保持稳定的触发器。它进一步结合多任务微调与触发器清洁正则化,以提高在密集多概念注入下的训练稳定性。在多个扩散骨干网络上进行的严格多概念设置下的广泛实验表明,Hydra 在保持清洁生成保真度和图像质量的同时,维持了有效的后门激活。例如,在 8 个攻击者和 500 个概念对上,Hydra 维持了约 95% 的 ASR 和强清洁生成。

英文摘要

Text-to-image diffusion models are increasingly developed through open-source reuse and repeated downstream fine-tuning, where reused checkpoints are difficult to verify and thus more susceptible to hidden backdoor behaviors. In such ecosystems, a single pretrained model may be sequentially adapted and redistributed by multiple independent parties, allowing multiple concept-specific trigger-target associations to accumulate in the same model. When these associations coexist, semantic conflicts can be amplified in the shared representation space, leading to cross-concept entanglement and degraded generation quality. Notably, instead of strengthening the attack, such accumulation can destabilize previously injected behaviors and reduce attack reliability. In this work, we systematically investigate backdoor attacks under this interference-prone setting and propose Hydra, a unified framework for robust and controlled multi-concept backdoor injection under cumulative and decentralized reuse. Our core insight is that stable backdoor injection under large-scale multi-concept settings requires explicitly constraining trigger semantics while coordinating cross-task interactions during optimization. Specifically, Hydra performs evolutionary trigger search in the text encoder space to identify triggers that are semantically aligned with their target concepts while remaining stable across other injected concepts. It further combines multi-task fine-tuning with trigger-clean regularization to improve training stability under dense multi-concept injection. Extensive experiments across multiple diffusion backbones under rigorous multi-concept settings show that Hydra maintains effective backdoor activation while preserving clean generation fidelity and image quality. For instance, across 8 attackers and 500 concept pairs, Hydra maintains ~95% ASR and strong clean generation.

2605.19695 2026-05-20 eess.AS cs.SD

Cross-Talk Speech Reduction, by Separation, for Separation

通过分离、服务于分离的串音语音抑制

Zhong-Qiu Wang, Samuele Cornell

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系) Southern University of Science and Technology(南方科技大学) Language Technologies Institute(语言技术研究所) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本文提出了串音抑制(CTR)任务,旨在从每个近讲混合信号中分离出佩戴者的语音,并提出了名为CTRnet的新方法,可直接在真实录制的近讲与远场混合信号对上训练以完成CTR。基于CTRnet,进一步提出基于伪标签的远场语音分离(PuLSS),利用CTRnet估计的干净语音作为伪标签来训练分离远场混合信号的模型。该框架的主要优势是CTRnet和PuLSS都可以在目标域的真实录制数据上训练,从而缓解模型仅在模拟数据上训练时常见的泛化差距。在CHiME-6数据集上,该框架在使用真值(oracle)和估计的说话人日志两种条件下均取得了最先进的ASR性能,超过了所有CHiME-{7,8}挑战赛提交系统。据作者所知,这是首个在真实对话“野外语音”数据上显著优于引导源分离(guided source separation)的神经语音分离方法。

Comments in submission

详情
AI中文摘要

在对话语音分离与识别任务中,采集训练数据时通常会给每位说话人佩戴近讲麦克风,以记录近场的近讲混合信号,同时使用远场麦克风记录远场混合信号。每个近讲混合信号中佩戴者语音的能量相对较高,直观上可以作为弱监督,用于直接在真实录制的远场信号上训练远场语音分离模型。然而,这些信号对此用途而言还不够干净,因为除背景噪声外,它们通常还包含来自其他说话人的强串音语音。为了解决这个问题,我们提出了串音抑制(CTR)任务,旨在从每个近讲混合信号中分离出佩戴者的语音,并提出了名为CTRnet的新方法,该方法可以直接在真实录制的近讲与远场混合信号对上训练以完成CTR。基于CTRnet,我们进一步提出基于伪标签的远场语音分离(PuLSS),利用CTRnet估计的干净语音作为伪标签来训练分离远场混合信号的模型。该框架的一个关键优势是CTRnet和PuLSS都可以在目标域的真实录制数据上训练,从而缓解模型仅在模拟数据上训练时常见的泛化差距。在CHiME-6数据集上,我们的框架在使用真值(oracle)和估计的说话人日志两种条件下均取得了最先进的ASR性能,超过了所有CHiME-{7,8}挑战赛提交系统。据我们所知,这是首个在真实对话“野外语音”数据上显著优于引导源分离(guided source separation)的神经语音分离方法。

英文摘要

In conversational speech separation and recognition tasks, close-talk microphones are typically attached to each speaker during training data collection to capture near-field, close-talk mixture signals, in addition to using far-field microphones to record far-field mixture signals. Each such close-talk mixture exhibits a reasonably high energy level for the wearer and could intuitively serve as weak supervision for training far-field speech separation models directly on real-recorded far-field signals. However, they are not sufficiently clean for this purpose, as they often contain strong cross-talk speech from other speakers in addition to background noise. To address this, we propose cross-talk reduction (CTR), a task aiming to isolate the wearer's speech from each close-talk mixture, and a novel method called CTRnet, which can be trained directly on real-recorded pairs of close-talk and far-field mixtures to accomplish CTR. Building on CTRnet, we further propose pseudo-label based far-field speech separation (PuLSS), which uses CTRnet's estimated clean speech as pseudo-labels to train models for separating far-field mixtures. A key advantage of the proposed framework is that both CTRnet and PuLSS can be trained on real-recorded data from the target domain, addressing the generalization gap commonly observed when models are trained exclusively on simulated data. On the CHiME-6 dataset, our framework achieves state-of-the-art ASR performance under both oracle and estimated speaker diarization, surpassing all CHiME-{7,8} challenge submissions. To our knowledge, it is the first neural speech separation method that substantially outperforms guided source separation on real conversational "speech-in-the-wild" data.

2605.19685 2026-05-20 stat.ML cs.LG

Probabilistic Multivariate Time Series Forecasting with Diffusion Copulas

基于扩散Copula的概率多变量时间序列预测

David Huk, Dongshan Wang, Miha Bresar

发表机构 * Department of Statistics The University of Warwick(华威大学统计系) School of Data Science The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳)数据科学学院)

AI总结 本文提出了一种扩散-Copula框架,通过分离边际分布学习与依赖结构学习,改进了多变量时间序列预测中对尾部风险的估计,展示了在加密货币市场中对系统性极值的预测优势。

Comments ICLR 2026 Workshop Advances in Financial AI

详情
AI中文摘要

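准确评估金融风险既需要刻画单个资产的波动性,也需要刻画极端市场事件中出现的复杂非对称依赖结构。尽管现代基于扩散的模型推动了多变量预测的发展,但在端到端训练时,它们常常存在“正态性偏差”,为了联合一致性而牺牲边际校准,并持续低估尾部风险。为了解决这一问题,我们提出了一种扩散-Copula框架,将边际分布的学习与其依赖结构的学习显式解耦。我们使用深度混合密度网络来刻画重尾的资产动态,随后使用分类-扩散Copula对联合依赖进行建模。在加密货币市场上的应用表明,我们的方法在预测边际事件和联合事件的系统性极值方面均优于最先进的基线。关键的是,我们证明了基线模型会把同时发生的市场崩盘判定为统计上不可能的“黑天鹅”(高意外度),而我们的框架将其识别为“预期内的崩盘”(低意外度),成功保留了在风险传染事件中进行稳健风险管理所需的相关结构。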

英文摘要

Accurately assessing financial risk requires capturing both individual asset volatility and the complex, asymmetric dependence structures that emerge during extreme market events. While modern diffusion-based models have advanced multivariate forecasting, they often suffer from a "normality bias" when trained end-to-end, sacrificing marginal calibration for joint coherence and consistently underestimating tail risk. To address this, we propose a Diffusion-Copula framework that explicitly decouples the learning of marginal distributions from their dependence structure. We employ deep Mixture Density Networks to capture heavy-tailed asset dynamics, followed by a Classification-Diffusion Copula to model the joint dependence. Applied to cryptocurrency markets, our approach demonstrates superior performance over state-of-the-art baselines in forecasting systemic extremes of both marginal and joint events. Crucially, we demonstrate that while baseline models classify simultaneous market crashes as statistically impossible "Black Swans" (high surprise), our framework identifies them as "Expected Crashes" (low surprise), successfully preserving the correlation structure necessary for robust risk management during contagion events.

2605.19667 2026-05-20 math.OC cs.LG

Convergence of Consensus-Based Particle Methods for Nonconvex Bi-Level Optimization

非凸双层优化中基于共识的粒子方法的收敛性

Yutong Chao, Xudong Sun, Konstantin Riedl, Majid Khadiv, Jalal Etesami

发表机构 * Department of Computer Science(计算机科学系) Technical University of Munich(慕尼黑工业大学) Munich Institute of Robotics and Machine Intelligence(慕尼黑机器人与机器智能研究所) Mathematical Institute(数学研究所) University of Oxford(牛津大学)

AI总结 本文研究了一种用于非凸双层优化的基于共识的优化方法,其目标是在下层问题的全局极小点集合上最小化上层函数。该方法无需导数,通过平滑分位数选择与Gibbs型拉普拉斯近似相结合来构建共识点。研究为相应的平均场动力学及其有限粒子近似建立了收敛性保证:在适当的平滑分位数局部化、误差界和稳定性假设下,平均场分布能以显式指数速率到达目标双层解的任意给定Wasserstein邻域。数值实验进一步支持了理论结果。

详情
AI中文摘要

在本文中,我们研究了一种用于非凸双层优化的基于共识的优化方法,其目标是在下层问题的全局极小点集合上最小化上层函数。所提出的方法无需导数,其共识点通过平滑分位数选择与Gibbs型拉普拉斯近似相结合来构建。我们为相应的平均场动力学及其有限粒子近似都建立了收敛性保证。特别地,在关于平滑分位数局部化、误差界和稳定性的适当假设下,我们证明了平均场分布能够到达目标双层解的任意给定Wasserstein邻域,并且在击中时间之前具有显式的指数速率。在二维约束问题和神经网络训练上的数值实验进一步支持了这些理论结果。

英文摘要

In this paper, we study a consensus-based optimization method for nonconvex bi-level optimization, where the objective is to minimize an upper-level function over the set of global minimizers of a lower-level problem. The proposed approach is derivative-free, and constructs its consensus point via smooth quantile selection combined with a Gibbs-type Laplace approximation. We establish convergence guarantees for both the associated \textit{mean-field} dynamics and its \textit{finite-particle} approximation. In particular, under suitable assumptions on smooth quantile localization, error bounds, and stability, we show that the mean-field law reaches any arbitrary prescribed Wasserstein neighborhood of the target bi-level solution with an explicit exponential rate up to the hitting time. Numerical experiments on a two-dimensional constrained problem and neural network training further support the theoretical results.
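The consensus point described above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it uses a hard quantile cut on the lower-level objective where the paper uses a smooth quantile selection, it applies standard Gibbs weights $\exp(-\alpha\,\cdot)$ to the upper-level objective, and the names `consensus_point`, `beta`, and `alpha` are illustrative rather than the authors' notation.

```python
import math


def consensus_point(particles, lower, upper, beta=0.5, alpha=10.0):
    """Bi-level consensus point from a finite set of particles.

    particles : list of equal-length tuples (positions)
    lower     : lower-level objective, used only to select particles
    upper     : upper-level objective, used for the Gibbs weights
    beta      : fraction of particles kept (hard quantile, an assumption)
    alpha     : inverse temperature of the Gibbs weights
    """
    # Keep the beta-fraction of particles with the smallest lower-level value.
    ranked = sorted(particles, key=lower)
    kept = ranked[:max(1, math.ceil(beta * len(ranked)))]
    # Gibbs weights on the upper-level objective, shifted for numerical stability.
    best = min(upper(x) for x in kept)
    weights = [math.exp(-alpha * (upper(x) - best)) for x in kept]
    total = sum(weights)
    dim = len(kept[0])
    return [sum(w * x[k] for w, x in zip(weights, kept)) / total for k in range(dim)]


# Toy problem: the lower level is minimized on the horizontal axis,
# and the upper level prefers the point (1, 0) on that axis.
lower = lambda x: x[1] ** 2
upper = lambda x: (x[0] - 1.0) ** 2
particles = [(1.0, 0.0), (-1.0, 0.0), (1.0, 2.0), (0.0, 3.0)]
print(consensus_point(particles, lower, upper))
```

No gradients of either objective are evaluated, which reflects the derivative-free character of the method; in a full consensus-based scheme the particles would then drift toward this point with added noise.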

2605.19666 2026-05-20 physics.med-ph cs.LG

Cross-View Attention Fusion Net: A Prior-Guided Dual-View Representation Learning for Cardiac Output Estimation from Short-Term PPG Signals

跨视图注意力融合网络:一种基于先验信息的双视图表示学习用于从短时PPG信号估计心输出量

Yaowen Zhang, Bo Cui, Libera Fresiello, Peter H. Veltink, Dirk W. Donker, Ying Wang

发表机构 * Department of Biomedical Signals and Systems(生物医学信号与系统系) Department of Cardiovascular and Respiratory Physiology(心血管与呼吸生理学系) Department of Intensive Care(重症医学系)

AI总结 本文提出了一种基于先验信息的双视图深度学习模型CVAF-Net,用于从短时PPG信号估计心输出量,通过跨视图注意力融合技术提升模型性能,并在多个数据集上验证了其有效性。

详情
AI中文摘要

从光体积脉搏波描记术(PPG)准确估计心输出量(CO)对于无创血流动力学监测具有潜力,但仍然困难,因为CO由心脏功能和血管张力共同决定。传统基于特征的模型使用具有生理意义的PPG描述符,但依赖于准确的脉搏检测并可能遗漏潜在的时间关系。相比之下,全端到端深度学习模型直接从原始PPG信号学习,但往往未能充分利用已建立的PPG衍生先验信息。本文引入了跨视图注意力融合网络(CVAF-Net),一种用于从短时、固定长度PPG段估计CO的基于先验信息的双视图深度学习模型。CVAF-Net将原始PPG信号作为时间视图,并将特征序列图(FSM)作为结构化先验引导视图,通过跨视图注意力融合两种表示。该模型独立评估了来自三个数据集的5秒、15秒和30秒段:模拟脉冲波(3323名受试者)、血管收缩诱发(79名受试者)以及静息/骑车活动(10名受试者),并与多种机器学习和深度学习基准进行了比较。CVAF-Net在大多数基准方法上表现更优,并在模拟数据上以平均绝对误差(MAE)为0.19 L/min(MAPE: 3.95%)与最先进的基于Transformer的模型性能相当,在现实世界中也实现了高准确性(最小MAE: 1.20 L/min)。重要的是,CVAF-Net将浮点运算次数(FLOPs)减少了十二倍,与领先的基于Transformer的模型相比。合理性分析显示,CO估计在生理上一致,与年龄(ρ=-0.274)、心率(ρ=0.894)和全身血管阻力(ρ=-0.740)有预期的相关性。这些发现表明,CVAF-Net提供了一种准确、计算高效且可推广的连续可穿戴CO监测方法。

英文摘要

Accurate cardiac output (CO) estimation from photoplethysmography (PPG) is promising for unobtrusive hemodynamic monitoring, but remains difficult since CO is jointly determined by cardiac function and vascular tone. Conventional feature-based models use physiologically meaningful PPG descriptors, yet depend on accurate pulse detection and may miss latent temporal relationships. In contrast, fully end-to-end deep learning models learn directly from raw PPG but often underuse established PPG-derived prior information. Here, we introduce the Cross-View Attention Fusion Network (CVAF-Net), a prior-guided dual-view deep learning model for CO estimation from short, fixed-length PPG segments. CVAF-Net processes raw PPG as a temporal view and a feature sequence map (FSM) as a structured prior-guided view, and fuses the two representations through cross-view attention. The model was independently evaluated using 5-, 15-, and 30-s segments from three datasets: simulated pulse waves (3323 subjects), vasoconstriction provocation (79 subjects), and resting/cycling activities (10 subjects), and was compared with multiple machine learning and deep learning benchmarks. CVAF-Net outperformed most benchmark methods and achieved performance comparable to a state-of-the-art Transformer-based model, with a mean absolute error (MAE) of 0.19 L/min (MAPE: 3.95%) on simulated data and high accuracy in real-world settings (minimum MAE: 1.20 L/min). Importantly, CVAF-Net reduced FLOPs by twelvefold compared with the leading Transformer-based model. Plausibility analysis showed physiologically consistent CO estimates, with expected correlations with age ($ρ= -0.274$), heart rate ($ρ= 0.894$), and systemic vascular resistance ($ρ= -0.740$). These findings indicate that CVAF-Net provides an accurate, computationally efficient, and generalizable approach for continuous wearable-based CO monitoring.

2605.19665 2026-05-20 cs.SE cs.AI

CriterAlign: Criterion-Centric Rationale Alignment for Code Preference Judging

CriterAlign: 以标准为中心的推理对齐用于代码偏好判断

Zhenyu Li, Aleksandar Cvejic, Zehui Chen, Peter Wonka

发表机构 * KAUST(阿卜杜拉国王科技大学) ByteDance(字节跳动)

AI总结 本文提出CriterAlign,一种以标准为中心的框架,通过直接的标准级成对判断、平局驱动的标准细化、交换一致性过滤和最终的成对合成来提高代码偏好判断的准确性,同时引入Human-Preference-Aligned Guidance (HPAG)以进一步提升性能。

详情
AI中文摘要

成对的人类偏好预测是评估代码生成系统的核心,其中质量往往依赖于任务特定的权衡,而不仅仅是功能正确性。虽然基于评分表的LLM判断通过将评估分解为显式标准来提高可解释性,但大多数现有流程仍然是逐点的:它们独立评分每个响应,并通过比较聚合分数来推导偏好。我们证明这种设计与成对的代码偏好预测不匹配,并且可能在强单体判断下表现不佳。我们提出了CriterAlign,一种以标准为中心的框架,通过直接的标准级成对判断、tie驱动的标准细化、swap一致性过滤和最终成对合成,将基于评分表的判断适应于成对偏好评估。我们进一步引入Human-Preference-Aligned Guidance (HPAG),通过从训练示例中提取人类偏好与单体判断预测之间的反复推理缺口进行离线合成,并注入到标准生成器、标准判断器和最终判断器中。在BigCodeReward上,CriterAlign将Qwen2.5-VL-32B单体判断的准确率从60.4%提升到66.3%,消融实验确认了成对标准设计和HPAG的贡献。

英文摘要

Pairwise human preference prediction is central to evaluating code-generation systems, where quality often depends on task-specific trade-offs beyond functional correctness. While rubric-based LLM judges improve interpretability by decomposing evaluation into explicit criteria, most existing pipelines remain pointwise: they score each response independently and derive preferences by comparing aggregated scores. We show that this design is poorly matched to pairwise code preference prediction and can underperform a strong monolithic judge. We propose CriterAlign, a criterion-centric framework that adapts rubric-based judging to pairwise preference evaluation through direct criterion-level pairwise judgments, tie-driven criterion refinement, swap-consistency filtering, and final pairwise synthesis. We further introduce Human-Preference-Aligned Guidance (HPAG), synthesized offline from training examples by extracting recurring rationale gaps between human preferences and monolithic judge predictions, and injected into the criterion generator, criterion judge, and final judge. On BigCodeReward, CriterAlign improves a Qwen2.5-VL-32B monolithic judge from 60.4% to 66.3% accuracy, with ablations confirming the contributions of pairwise criterion design and HPAG.
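One of the steps named above, swap-consistency filtering, can be sketched in a few lines of Python. The sketch assumes a judge is any callable that sees a criterion and two responses in presentation order and answers "first", "second", or "tie"; in the paper the judge is an LLM, whereas the two toy judges below are hypothetical stand-ins chosen only so the example runs.

```python
def swap_consistent_verdict(judge, criterion, resp_a, resp_b):
    """Return "A", "B" or "tie" if the judge agrees with itself after the
    presentation order is swapped; return None if the verdict flips."""
    forward = judge(criterion, resp_a, resp_b)   # A is shown first
    backward = judge(criterion, resp_b, resp_a)  # B is shown first
    verdict_forward = {"first": "A", "second": "B", "tie": "tie"}[forward]
    verdict_backward = {"first": "B", "second": "A", "tie": "tie"}[backward]
    return verdict_forward if verdict_forward == verdict_backward else None


def prefers_shorter(criterion, first, second):
    """Toy order-independent judge: the shorter response wins."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) < len(second) else "second"


def position_biased(criterion, first, second):
    """Toy judge with pure position bias: always picks what it saw first."""
    return "first"


a, b = "return x + 1", "tmp = x\ntmp = tmp + 1\nreturn tmp"
print(swap_consistent_verdict(prefers_shorter, "conciseness", a, b))
print(swap_consistent_verdict(position_biased, "conciseness", a, b))
```

Criterion-level verdicts that come back as None would be dropped before the final pairwise synthesis, so only judgments that are stable under a change of presentation order contribute to the overall preference.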

2605.19646 2026-05-20 q-bio.NC cs.LG

BCI-sift: An automated feature selection toolbox for Brain Computer Interface applications

BCI-sift: 一种用于脑机接口应用的自动化特征选择工具箱

Elena C Offenberg, Dirk Keller, Mariska J Vansteensel, Zachary V Freudenburg, Nick F Ramsey, Julia Berezutskaya

发表机构 * Department of Neurology and Neurosurgery, University Medical Center Utrecht Brain Center, Utrecht University(神经学与神经外科学系,乌得勒支大学医学中心脑研究中心,乌得勒支大学) Donders Institute for Brain, Cognition and Behaviour, Radboud University(脑、认知与行为研究所,拉德堡德大学)

AI总结 本文提出BCI-sift工具箱,通过整合先进优化方法,为脑机接口任务提供自动化特征选择解决方案,提升了分类准确性和解释性。

Comments 19 pages, 12 figures

详情
AI中文摘要

临床脑机接口(BCI)的发展依赖于精确且可靠的信号解读。然而,植入式和非植入式BCI采集的数据具有高维且含噪的特性,这带来了重大挑战,也促使人们使用特征选择算法。我们引入了BCI-sift(BCI Systematic and Interpretable Feature Tuning),一个基于Python的工具箱,旨在简化将多种优化算法应用于BCI数据集的过程,以识别机器学习任务中最相关的特征。我们的工具箱与scikit-learn兼容(github.com/UMCU-RIBS/BCI-sift),通过整合先进的优化方法简化了BCI任务中的特征选择。我们在高密度皮层脑电图(HD ECoG)数据上验证了该工具箱,数据来自8名肢体健全的参与者,他们在感觉运动皮层上植入了64-128个电极,并重复说出12个单词。BCI-sift在电极、时间和频率维度上识别出了信息丰富的神经特征。所选电极的解剖位置在参与者之间一致,并与已知的感觉运动皮层功能组织相符。相关时间点集中在言语产生前后,高频带被识别为信息量最大的频带,这与先前的工作一致。与使用全部特征相比,特征选择提高了分类准确率。BCI-sift为BCI研究中的特征选择提供了一个易用且通用的平台,能够提升解码性能、实现自动化特征分析并增强可解释性。虽然该方法是在HD ECoG数据上验证的,但它广泛适用于其他BCI模态。通过提高分类准确率和可解释性,BCI-sift应对了开发高效且透明的BCI系统所面临的关键挑战。

英文摘要

Advancements in clinical Brain-Computer Interfaces (BCIs) depend on precise and reliable signal interpretation. However, the high-dimensional and noisy nature of data captured from both implanted and non-implanted BCIs poses significant challenges, motivating the use of feature selection algorithms. We introduce BCI-sift (BCI Systematic and Interpretable Feature Tuning), a Python-based toolbox designed to streamline the application of diverse optimization algorithms to BCI datasets for identifying the most relevant features in machine learning tasks. Our scikit-learn-compatible toolbox (github.com/UMCU-RIBS/BCI-sift) simplifies feature selection in BCI tasks by integrating advanced optimization methods. We validated the toolbox on high-density electrocorticography (HD ECoG) data from eight able-bodied participants with 64-128 electrodes implanted over the sensorimotor cortex, who repeatedly spoke 12 words. BCI-sift identified informative neural features across electrode, temporal, and frequency dimensions. The anatomical locations of electrode selections were consistent across participants and aligned with known functional organization of the sensorimotor cortex. Relevant time points clustered around speech production, and the high-frequency band was identified as most informative, in line with prior work. Feature selection improved classification accuracy compared to using all features. BCI-sift provides an accessible and versatile platform for feature selection in BCI research, enabling improved decoding performance, automated feature analysis, and enhanced interpretability. While validated on HD ECoG data, the approach is broadly applicable to other BCI modalities. By enhancing classification accuracy and interpretability, BCI-sift addresses key challenges in developing efficient and transparent BCI systems.

2605.19644 2026-05-20 cs.CR cs.LG

Inferring Sensitive Attributes from Knowledge Graph Embeddings: Attack and Defense Strategies

从知识图谱嵌入中推断敏感属性:攻击与防御策略

Yasmine Hayder

发表机构 * LIFO, INSA CVL, Univ. Orléans, Inria, France(LIFO,INSA卢瓦尔河谷中心,奥尔良大学,法国国家信息与自动化研究所,法国)

AI总结 本文研究了基于知识图谱嵌入(KGE)推理的隐私风险,提出了一种通过后处理去污技术减轻这些风险的框架,探讨了在推荐质量与隐私保护之间进行权衡的必要性。

Journal ref ESWC - Extended Semantic Web Conference, May 2026, Dubrovnik, Croatia

详情
AI中文摘要

知识图谱(KGs)是一种强大的链接数据表示形式,提供了灵活性、语义丰富性和支持知识丰富和推理的能力。它们帮助数据所有者组织和利用异构数据以提供有洞察力的服务(例如推荐),但现实中的KGs往往不完整,隐藏了真实的事实或遗漏了有价值的观点。知识图谱嵌入技术常用于推断有价值的缺失信息。然而,对KGs的推理可能会无意中暴露敏感的用户信息,即使这些数据并未显式存储。在本文中,我们研究了基于KGE推理的隐私风险,重点关注攻击者试图从看似非敏感的输出中推断出敏感用户属性的属性推断攻击。我们提出并评估了一个框架,通过应用后处理去污技术来减轻这些隐私风险。初步结果展示了这些攻击对KGE模型输出的有效性,并探讨了在应用基于随机化的技术时推荐质量与隐私保护之间的权衡,突显了未来工作需要实验更高级技术以解决此问题的必要性。

英文摘要

Knowledge Graphs (KGs) are a powerful representation of linked data, offering flexibility, semantic richness, and support for knowledge enrichment and reasoning. They help data owners organize and exploit heterogeneous data to provide insightful services (e.g., recommendations), yet real-world KGs are often incomplete, hiding true facts or missing valuable insights. Knowledge graph embedding techniques are commonly used to infer valuable missing information. However, reasoning over KGs can inadvertently expose sensitive user information, even when such data is not explicitly stored. In this work, we investigate the privacy risks associated with KGE-based reasoning, focusing on attribute inference attacks where adversaries attempt to deduce sensitive user attributes from seemingly non-sensitive outputs. We propose and evaluate a framework that mitigates these privacy risks by applying post processing sanitization techniques to KGE outputs. Preliminary results demonstrate the effectiveness of these attacks on the outputs of KGE models, and explore the trade-off between recommendation quality and privacy protection when applying randomization based approaches, highlighting the need to experiment with more advanced techniques in future work to address this issue.

2605.19641 2026-05-20 stat.ML cs.LG

Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

增加缺失值以减少偏差:带有缺失数据的Richardson-SGD

Ferdinand Genans, Erwan Scornet

发表机构 * Sorbonne Université and Université Paris Cité, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, LPSM(索邦大学和巴黎cité大学,CNRS,概率、统计与建模实验室,LPSM)

AI总结 本文研究了如何通过增加缺失值来减少梯度偏差,提出了一种基于Richardson外推的Richardson-SGD方法,该方法通过在已有不完整数据的基础上故意增加缺失率,从而抵消梯度偏差,提高了不完整数据下的优化和估计性能。

详情
AI中文摘要

随机梯度方法在现代大规模学习中至关重要,但其在不完整协变量中的使用仍然谨慎,因为插补方案通常会引入系统性的梯度偏差,如在线性模型中所示。在本工作中,我们证明了所有参数模型在各种插补程序中都表现出相似的梯度偏差,并且精确地刻画了缺失率向量p的依赖性,其中O(||p||)是主导项。我们利用这一分析,提出了一种简单的去偏差程序,用于带有缺失值的随机梯度下降(SGD),基于Richardson外推。关键思想是“故意增加缺失率”:从已有的不完整观测中,生成一个更稀疏的版本,在更高的、受控的缺失率下,并将两个结果的随机梯度结合以抵消主导的偏差项。我们证明,在几种缺失情况中,一个Richardson步骤将梯度偏差从O(||p||)减少到O(||p||²)。我们提出的方法计算高效,模型无关,并适用于任何参数损失函数,其随机梯度可以在插补后计算。此外,当缺失指示符独立时,总体梯度偏差是p的多线性多项式,并仅取决于由声明单个坐标缺失引起的总体梯度误差。在这种情况下,我们的方法可以推广到多步Richardson过程,该过程递归地抵消更高阶项。在经验上,Richardson去偏差提高了多个广义线性模型中的优化和估计性能,并与广泛使用的插补程序如MICE相结合。这些结果表明,有些反直觉地,在现有缺失数据上添加受控的缺失率可以使不完整数据的随机学习更准确。

英文摘要

Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to \emph{deliberately add missingness}: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic gradients to cancel the leading bias term. We prove that one Richardson step reduces the gradient bias from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios. Our proposed method is computationally efficient, model-agnostic and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when missing indicators are independent, the population gradient bias is a multilinear polynomial in $p$ and depends only on population gradient errors induced by declaring a single coordinate missing. In this case, our method generalizes to a multi-step Richardson procedure which recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines positively with widely used imputation procedures such as MICE. These results suggest that, somewhat counter-intuitively, adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.
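The bias-cancellation idea can be checked on a toy case where everything is available in closed form. The sketch below is an illustration under explicit assumptions, not the paper's algorithm: a noise-free linear model with zero-mean covariates, zero imputation, each coordinate missing independently with the same probability $p$, and population (not stochastic) gradients. If the gradient is $g(p) = g_0 + a\,p + b\,p^2$, the generic Richardson combination $(p_2\, g(p) - p\, g(p_2))/(p_2 - p)$ equals $g_0 - b\,p\,p_2$, so the linear bias term cancels. All function names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def population_gradient(theta, theta_star, sigma, p):
    """Gradient of 0.5 * E[(x_masked . theta - y)^2] with zero imputation.

    Assumes y = x . theta_star, E[x] = 0, Cov(x) = sigma, and each coordinate
    of x replaced by 0 independently with probability p.
    """
    keep = 1.0 - p
    d = len(theta)
    # E[x_masked x_masked^T]: diagonal entries scale by keep, others by keep^2.
    second_moment = [
        [(keep if i == j else keep * keep) * sigma[i][j] for j in range(d)]
        for i in range(d)
    ]
    left = mat_vec(second_moment, theta)
    right = mat_vec(sigma, theta_star)  # E[x_masked * y] = keep * sigma theta_star
    return [left[i] - keep * right[i] for i in range(d)]


def richardson_gradient(theta, theta_star, sigma, p, q):
    """Combine gradients at missingness p and at a deliberately higher level.

    q > 0 is the extra masking probability applied to observed coordinates,
    so the thinned data has missingness p2 = p + (1 - p) * q.
    """
    p2 = p + (1.0 - p) * q
    g1 = population_gradient(theta, theta_star, sigma, p)
    g2 = population_gradient(theta, theta_star, sigma, p2)
    return [(p2 * a - p * b) / (p2 - p) for a, b in zip(g1, g2)]


sigma = [[1.0, 0.5], [0.5, 1.0]]
theta, theta_star = [1.0, 2.0], [0.5, -1.0]
full = population_gradient(theta, theta_star, sigma, 0.0)  # complete-data gradient
naive = population_gradient(theta, theta_star, sigma, 0.1)
debiased = richardson_gradient(theta, theta_star, sigma, 0.1, 0.1)
print("naive bias   ", [round(n - f, 4) for n, f in zip(naive, full)])
print("debiased bias", [round(r - f, 4) for r, f in zip(debiased, full)])
```

In this toy setting the remaining bias is of order $p\,p_2$ rather than $p$, which matches the $O(\|p\|) \to O(\|p\|^2)$ reduction claimed in the abstract; with real data the two gradients would be stochastic and computed from the observed and further-thinned samples.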

2605.19638 2026-05-20 cs.HC cs.AI cs.CY cs.SE

The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems

可访问性能力边界:AI生成浏览器原生可访问性系统的操作极限与扩展潜力

Rizwan Jahangir, Daisuke Ishii

Affiliations * NUST Business School, NUST; Kiara Inc.

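AI summary This paper proposes the Accessibility Capability Boundary (ACB) theoretical framework to examine the operational limits and expansion potential of AI-generated browser-native accessibility systems. Through an analysis of empirical prototypes, it characterizes the reachable and unreachable regions of the accessibility capability space, providing a theoretical foundation for the scalability of autonomous accessibility computing.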

Comments 21 pages, 4 figures

Details
AI abstract (translated from Chinese)

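As large language models (LLMs) become increasingly capable of synthesizing functional user interfaces, a fundamental question arises in accessibility computing: how far can AI-driven accessibility systems go? This paper introduces the Accessibility Capability Boundary (ACB), a formal framework for reasoning about the operational limits and expansion potential of autonomous accessibility systems, and grounds the theory in a real-world systems artifact. Rather than treating accessibility as a binary compliance property, we treat it as a dynamic, multidimensional capability space constrained by measurable variables, including deployment latency, cognitive load, infrastructure dependency, offline persistence, interaction complexity, and adaptability. We argue that AI-generated browser-native systems built as single-file HTML artifacts on standard browser APIs may substantially expand the ACB by reducing deployment friction to near zero. Through formal definitions, propositions, and a comparative evaluation matrix, we characterize the regions of the accessibility capability space that such systems can and cannot reach. We further identify the remaining computational, infrastructural, and verification constraints that form the hard boundaries of this paradigm. The paper provides a theoretical foundation for understanding the scalability limits of autonomous accessibility computing and proposes a research agenda for accessibility-aware AI systems.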

English abstract

As large language models (LLMs) demonstrate increasing competence in synthesizing functional user interfaces, a fundamental question emerges in accessibility computing: how far can AI-driven accessibility systems go? This paper introduces the Accessibility Capability Boundary (ACB), a formal framework for reasoning about the operational limits and expansion potential of autonomous accessibility systems, and grounds this theory in a real-world systems artifact. We model accessibility not as a binary compliance property but as a dynamic, multidimensional capability space constrained by measurable variables including deployment latency, cognitive load, infrastructure dependency, offline persistence, interaction complexity, and adaptability. We argue that AI-generated, browser-native systems constructed as single-file HTML artifacts leveraging standard browser APIs may dramatically shift the ACB outward by reducing deployment friction to near-zero and enabling rapid, context-specific interface adaptation. We ground our theoretical framework in the analysis of two real-world exploratory prototypes. The first is an AI-generated browser-native accessibility interface deployed for a blind user in Nepal. The second is a fully functional, open-source webcam alignment assistant for visually impaired users, serving as a concrete systems artifact. Through formal definitions, propositions, and a comparative evaluation matrix, we characterize the regions of the accessibility capability space that such systems can and cannot reach. We further identify remaining computational, infrastructural, and verification constraints that constitute the hard boundaries of this paradigm. This work contributes a theoretical foundation for understanding the scalable limits of autonomous accessibility computing and proposes a research agenda for future work in accessibility-aware AI systems.

2605.19632 2026-05-20 cs.LO cs.SD

Executable Boundary Contracts for Sound Event Traces

可执行的边界合同用于声音事件轨迹

Faruk Alpay, Hamdi Alakkad

Affiliations * Bahcesehir University

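AI summary This paper proposes executable boundary contracts for measuring finite sound event traces. It defines a frame fragment, an event layer, and associated clauses for evaluating timed boundary behavior, with the aim of making sound event reporting more accurate.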

Comments 39 pages. Finite frame core code, tables, manifests, and Lean checks are ancillary material

Details
AI abstract (translated from Chinese)

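Sound event reports usually compress timed boundary behavior into frame, segment, or event scores. This paper defines executable boundary contracts for finite sound event traces. The frame fragment is a bounded Boolean fragment that can be embedded in STL after grid projection. The event layer adds declared interval matching, duration clauses, fragmentation clauses, and restricted vector scoring. The aim is measurement, not a new general temporal logic or a challenge leaderboard. The artifact evaluates controlled Mini LibriSpeech seeded scenes, MAESTRO Real soundscapes, frozen pretrained timing probes, and an official DCASE 2024 Task 4 baseline track. Across these tracks, standard scores and contract coordinates disagree in interpretable ways. The strongest finding on real corpora is that union activity can hide typed boundary failure, while external DCASE outputs provide a class-indexed, challenge-level reference. Code, generated tables, manifests, and Lean checks for the finite frame core are supplied as ancillary material.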

English abstract

Sound event reports often compress timed boundary behavior into frame, segment, or event scores. This paper defines executable boundary contracts for finite sound event traces. The frame fragment is a bounded Boolean fragment embeddable in STL after grid projection. The event layer adds declared interval matching, duration clauses, fragmentation clauses, and obligation-restricted vector scoring. The aim is measurement, not a new general temporal logic and not a challenge leaderboard. The artifact evaluates controlled Mini LibriSpeech seeded scenes, MAESTRO Real soundscapes, frozen pretrained timing probes, and an official DCASE 2024 Task 4 baseline track. Across these tracks, standard scores and contract coordinates disagree in interpretable ways. The strongest real-corpus finding is that union activity can hide typed boundary failure, while external DCASE outputs provide a class-indexed, challenge-level reference. Code, generated tables, manifests, and Lean checks for the finite frame core are supplied as ancillary material.
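To make the idea of duration and fragmentation clauses over a finite frame trace concrete, here is a minimal sketch. The abstract does not give the paper's definitions, so everything here is an assumption made for illustration: the half-open interval convention, the thresholds, and the function names (`intervals_from_frames`, `duration_clause`, `fragmentation_clause`) are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open frame interval [start, end)


def intervals_from_frames(frames: List[bool]) -> List[Interval]:
    """Collect maximal runs of active frames as half-open intervals."""
    intervals: List[Interval] = []
    start = None
    for i, active in enumerate(frames):
        if active and start is None:
            start = i
        elif not active and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(frames)))
    return intervals


def duration_clause(intervals: List[Interval], min_frames: int) -> bool:
    """Hypothetical clause: every interval lasts at least min_frames frames."""
    return all(end - start >= min_frames for start, end in intervals)


def fragmentation_clause(reference: List[Interval], predicted: List[Interval],
                         max_fragments: int) -> bool:
    """Hypothetical clause: each reference event overlaps at most
    max_fragments predicted intervals."""
    for r_start, r_end in reference:
        overlapping = sum(1 for p_start, p_end in predicted
                          if p_start < r_end and r_start < p_end)
        if overlapping > max_fragments:
            return False
    return True


# One reference event, predicted as two fragments with a one-frame gap.
reference = intervals_from_frames([False, True, True, True, True, True, False, False])
predicted = intervals_from_frames([False, True, True, False, True, True, False, False])

meets_duration = duration_clause(predicted, min_frames=2)
meets_fragmentation = fragmentation_clause(reference, predicted, max_fragments=1)
```

In this example the prediction covers four of the five reference frames, so a frame-level score would look strong, yet the fragmentation clause fails because the single reference event is split in two. That is one way a standard score and a contract-style check could disagree.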