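arXivDaily daily academic digest: synchronizes the full arXiv feed with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other areas.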

2507.01932 2026-05-20 math.OC cs.LG cs.NA math.NA stat.ML

A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition

Zhaosong Lu, Xiangyuan Wang

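AI summary: This paper studies a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz condition that may vary with the outer minimization variable. Compared with the global KL or Polyak-Lojasiewicz conditions common in the literature, this local KL condition covers a wider range of practical scenarios but also raises new analytical challenges. To address them, the paper proves that the associated maximal function is locally generalized Hölder smooth. It builds on this property to develop an inexact proximal gradient method for the minimax problem and establishes complexity guarantees for computing an approximate stationary point under mild assumptions.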

Comments: Accepted by SIAM Journal on Optimization

Abstract

We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-Lojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally generalized Hölder smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.
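
A minimal sketch of the general pattern described above, not the authors' algorithm, step sizes or inexactness rules. An outer proximal gradient step on $x$ uses a gradient of the maximal function, and that gradient is approximated by running a few inner gradient-ascent steps on $y$.

The following are assumptions chosen for illustration:
- The toy objective is $f(x,y) = xy - y^2/2$ with regularizer $\lambda|x|$.
- It is strongly concave in $y$, so a KL-type condition holds trivially.
- Its maximal function is $x^2/2$, so the minimizer $x = 0$ is known.
- The helper names are hypothetical.

```python
def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.| (soft-thresholding)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def inner_ascent(x: float, y: float, steps: int = 50, lr: float = 0.5) -> float:
    """Approximately solve max_y f(x, y) for the toy f(x, y) = x*y - 0.5*y**2."""
    for _ in range(steps):
        y = y + lr * (x - y)  # grad_y f(x, y) = x - y
    return y


def inexact_prox_grad(x0: float, lam: float = 0.1, eta: float = 0.5, outer: int = 100):
    """Outer proximal gradient on max_y f(x, y) + lam * |x| with an inexact inner solve."""
    x, y = x0, 0.0
    for _ in range(outer):
        y = inner_ascent(x, y)  # warm-started inexact maximization over y
        grad_x = y              # grad_x f(x, y) = y approximates the maximal function's gradient
        x = soft_threshold(x - eta * grad_x, eta * lam)
    return x, y


x_star, y_star = inexact_prox_grad(2.0)
print(x_star, y_star)
```

Warm-starting the inner solver from the previous $y$ is a common design choice: it keeps each inner solve cheap when $x$ changes little between outer steps.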

2506.12218 2026-05-20 eess.SP cs.LG

Directed Acyclic Graph Convolutional Networks

Samuel Rey, Hamed Ajorlou, Gonzalo Mateos

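AI summary: This paper proposes DCN, a new graph neural network architecture designed for convolutional learning from signals on DAGs. DCN learns node representations with causal graph filters and builds on formal convolution operations that admit a frequency-domain representation. The paper also introduces the Parallel DCN (PDCN) to decouple model complexity from graph size. Experiments show that the models outperform existing methods in accuracy, robustness, and computational efficiency.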

DOI: 10.1109/TSP.2026.3687632

Abstract

Directed acyclic graphs (DAGs) are central to science and engineering applications including causal inference, scheduling, and neural architecture search. In this work, we introduce the DAG Convolutional Network (DCN), a novel graph neural network (GNN) architecture designed specifically for convolutional learning from signals supported on DAGs. The DCN leverages causal graph filters to learn nodal representations that account for the partial ordering inherent to DAGs, a strong inductive bias not present in conventional GNNs. Unlike prior art in machine learning over DAGs, DCN builds on formal convolutional operations that admit spectral-domain representations. We further propose the Parallel DCN (PDCN), a model that feeds input DAG signals to a parallel bank of causal graph-shift operators and processes these DAG-aware features using a shared multilayer perceptron. This way, PDCN decouples model complexity from graph size while maintaining satisfactory predictive performance. The architectures' permutation equivariance and expressive power properties are also established. Comprehensive numerical tests across several tasks, datasets, and experimental conditions demonstrate that (P)DCN compares favorably with state-of-the-art baselines in terms of accuracy, robustness, and computational efficiency. These results position (P)DCN as a viable framework for deep learning from DAG-structured data that is designed from first (graph) signal processing principles.
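
To make "filtering that respects the partial order" concrete, here is a minimal plain-Python sketch of a generic polynomial graph filter $y = \sum_k h_k A^k x$ on a DAG adjacency matrix. This is an illustrative assumption, not the paper's causal graph-shift operators.

How the sketch behaves:
- With $A_{ij} = 1$ for an edge $j \to i$, the output at each node mixes only the signal at that node and at its ancestors.
- Because $A$ is nilpotent on a DAG, at most $n$ filter taps can be nonzero.

```python
def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dag_polynomial_filter(A, x, h):
    """Return y = sum_k h[k] * A^k x, where A[i][j] = 1 means edge j -> i."""
    y = [0.0] * len(x)
    z = list(x)  # z holds A^k x, starting from k = 0
    for hk in h:
        y = [yi + hk * zi for yi, zi in zip(y, z)]
        z = matvec(A, z)  # shift once more along the edges (parents -> children)
    return y


# Chain DAG 0 -> 1 -> 2
A = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
y = dag_polynomial_filter(A, [1.0, 2.0, 3.0], [1.0, 0.5, 0.25])
print(y)  # [1.0, 2.5, 4.25]
```

Node 0 is a source, so its output depends only on its own input. Node 2 collects contributions from both of its ancestors.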

2506.07209 2026-05-20 cs.GR cs.CV

HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance

Lei Li, Angela Dai

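AI summary: This paper proposes HOI-PAGE, a zero-shot method that generates high-fidelity 4D human-object interactions through part-level affordance reasoning. It uses large language models for part-level mechanical reasoning and a structured part affordance graph (PAG) to guide a three-stage synthesis, producing complex multi-object or multi-person interaction sequences.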

Comments: ICML 2026. Project page: https://craigleili.github.io/projects/hoipage/ Video: https://www.youtube.com/watch?v=gwXjOffCFyk

Abstract

We present HOI-PAGE, a new approach that prioritizes part-level affordance reasoning to generate high-fidelity 4D human-object interactions (HOIs) from text prompts in a zero-shot fashion. In contrast to prior works that focus on global, whole body-object motion synthesis, our approach explicitly reasons about the underlying part-level mechanics of interactions using large language models (LLMs). We capture this reasoning in a structured part affordance graph (PAG) representation, serving as a high-level interaction scaffolding to guide a three-stage synthesis: first, decomposing input 3D objects into semantic parts; then, generating reference HOI videos from text prompts to extract part-based motion constraints; and finally, optimizing for 4D HOI motion sequences that mimic the reference dynamics while satisfying part-level contact constraints. Extensive experiments show that our approach is flexible and capable of generating complex multi-object or multi-person interaction sequences, with significantly improved realism and text alignment for zero-shot 4D HOI generation.

2506.03178 2026-05-20 eess.IV cs.AI cs.CV

LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning

Md. Zihad Bin Jahangir, Muhammad Ashad Kabir, Sumaiya Akter, Israt Jahan, Minh Chau

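AI summary: This paper proposes the LLaMA-XR framework, which combines LLaMA 3.1 with DenseNet-121 image embeddings and QLoRA fine-tuning. The aim is to improve the accuracy and clinical relevance of radiology report generation while keeping computation efficient.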

Comments: 25 pages

DOI: 10.3390/bioengineering13050493
Journal ref: Bioengineering 2026, 13(5), 493

Abstract

Automated radiology report generation holds significant potential to reduce radiologists' workload and enhance diagnostic accuracy. However, generating precise and clinically meaningful reports from chest radiographs remains challenging due to the complexity of medical language and the need for contextual understanding. Existing models often struggle with maintaining both accuracy and contextual relevance. In this paper, we present LLaMA-XR, a novel framework that integrates LLaMA 3.1 with DenseNet-121-based image embeddings and Quantized Low-Rank Adaptation (QLoRA) fine-tuning. LLaMA-XR achieves improved coherence and clinical accuracy while maintaining computational efficiency. This efficiency is driven by an optimization strategy that enhances parameter utilization and reduces memory overhead, enabling faster report generation with lower computational resource demands. Extensive experiments conducted on the IU X-ray benchmark dataset demonstrate that LLaMA-XR outperforms a range of state-of-the-art methods. Our model achieves a ROUGE-L score of 0.433 and a METEOR score of 0.336, establishing new performance benchmarks in the domain. These results underscore LLaMA-XR's potential as an effective and efficient AI system for automated radiology reporting, offering enhanced clinical utility and reliability.
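
The low-rank adaptation part of QLoRA can be illustrated with a minimal plain-Python sketch: a frozen weight $W_0$ is augmented by a trainable low-rank product $BA$ scaled by $\alpha/r$. What is not shown:
- the 4-bit quantization of $W_0$ that distinguishes QLoRA from plain LoRA;
- how LLaMA-XR feeds DenseNet-121 embeddings into LLaMA 3.1.

Matrix sizes and values are illustrative assumptions.

```python
def matvec(M, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W0, A, B, x, alpha, r):
    """y = W0 x + (alpha / r) * B (A x).

    W0 (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.
    """
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W0 = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight
A = [[0.5, 0.5]]                # rank r = 1
B_init = [[0.0], [0.0]]         # B starts at zero, so fine-tuning starts exactly from W0
B_trained = [[1.0], [-1.0]]
x = [2.0, 4.0]
print(lora_forward(W0, A, B_init, x, alpha=2.0, r=1))     # [2.0, 4.0]
print(lora_forward(W0, A, B_trained, x, alpha=2.0, r=1))  # [8.0, -2.0]
```

Only $r(d_{in} + d_{out})$ adapter parameters are trained instead of $d_{in} d_{out}$. This parameter saving is what QLoRA combines with a quantized base model to cut memory.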

2504.08381 2026-05-20 eess.SP cs.LG

An Empirical Investigation of Reconstruction-Based Models for Seizure Prediction from ECG Signals

Mohammad Reza Chopannavaz, Foad Ghaderi

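AI summary: This paper proposes a reconstruction-based anomaly detection framework that uses time-frequency representations and deep learning models to capture seizure-related changes in heart rate dynamics. It improves prediction accuracy by smoothing the reconstruction error and applying an adaptive thresholding strategy. On the Siena database it reaches 99.16% specificity and 76.05% accuracy, while providing actionable early warnings in clinical settings.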

Abstract

Epileptic seizures are transient neurological events characterized by abnormal and excessive neuron activity in the brain, which are often associated with measurable disturbances in the cardiovascular system. Traditionally, electroencephalogram (EEG) signals have served as the primary modality for seizure prediction due to their direct measurement of brain activity and high diagnostic precision. However, their cost, sensitivity to noise, and practical deployment constraints limit their applicability outside controlled clinical environments. To overcome these challenges, recent studies have increasingly investigated electrocardiogram (ECG) signals as a practical and non-invasive alternative for seizure prediction in real-world settings. Evidence suggests that ECG-derived cardiac signatures may precede clinical seizure onset, offering a viable window for early detection. In this paper, we propose a reconstruction-based anomaly detection framework that integrates time-frequency representations with advanced deep learning models to capture deviations in heart rate dynamics associated with seizure onset. Afterward, reconstruction error is smoothed, and an adaptive thresholding strategy is applied to reduce false alarms. The method was evaluated on the Siena database, achieving a specificity of 99.16%, accuracy of 76.05%, and a false positive rate (FPR) of 0.01/h, with an average prediction horizon of 45 minutes prior to seizure onset. These results demonstrate that ECG-based prediction can provide clinically actionable early warnings while improving patient accessibility and comfort. Nevertheless, this performance reflects a trade-off favoring high specificity over sensitivity, resulting in reduced FPR and aligning with clinical requirements for reliable deployment.
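
The post-processing step (smooth the reconstruction error, then apply an adaptive threshold) can be sketched as follows. The moving-average window and the "trailing mean plus $k$ standard deviations" rule are assumptions for illustration. The abstract does not specify the smoothing filter or the thresholding rule actually used.

```python
from statistics import mean, pstdev


def moving_average(errors, w):
    """Causal moving average: each point averages the last w reconstruction errors."""
    out = []
    for i in range(len(errors)):
        window = errors[max(0, i - w + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def adaptive_alarms(errors, w=3, history=5, k=3.0):
    """Flag index i when the smoothed error exceeds mean + k * std of the previous `history` smoothed values."""
    smooth = moving_average(errors, w)
    flagged = []
    for i in range(history, len(smooth)):
        past = smooth[i - history:i]
        threshold = mean(past) + k * pstdev(past)
        if smooth[i] > threshold:
            flagged.append(i)
    return flagged


errors = [1.0] * 10 + [5.0] * 3  # flat baseline, then a jump in reconstruction error
print(adaptive_alarms(errors))
```

Raising $k$ or widening the smoothing window trades sensitivity for fewer false alarms. This mirrors the specificity-over-sensitivity trade-off described in the abstract.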

2504.04349 2026-05-20 cs.GT cs.LG

Tight Regret Bounds for Fixed-Price Bilateral Trade

Houshuang Chen, Yaonan Jin, Pinyan Lu, Chihao Zhang

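AI summary: This paper studies regret minimization for fixed-price mechanisms in bilateral trade. It gives tight regret bounds for independent values and for correlated/adversarial values, improving on existing results.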

Abstract

We examine fixed-price mechanisms in bilateral trade through the lens of regret minimization. Our main results are twofold. (i) For independent values, we obtain a near-optimal $\widetilde{\Theta}(T^{2/3})$ tight bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback. (ii) For correlated/adversarial values, we obtain a near-optimal $\Omega(T^{3/4})$ lower bound for $\textsf{Global Budget Balance}$ fixed-price mechanisms with two-bit/one-bit feedback, which improves the best known $\Omega(T^{5/7})$ lower bound obtained in [BCCF24] and, up to polylogarithmic factors, matches the $\widetilde{\mathcal{O}}(T^{3/4})$ upper bound obtained in the same work. Combined with the previous works [CCCFL24mor, CCCFL24jmlr, AFF24, BCCF24], our work (essentially) gives a thorough understanding of regret minimization for fixed-price bilateral trade. En route, we have developed two technical ingredients that might be of independent interest: (i) a novel algorithmic paradigm, called $\textit{fractal elimination}$, to address one-bit feedback and independent values; and (ii) a new $\textit{lower-bound construction}$ with novel proof techniques, to address the $\textsf{Global Budget Balance}$ constraint and correlated values.
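
To make the setting concrete, here is a minimal sketch of one round of fixed-price bilateral trade, the two feedback models, and regret against the best fixed price in hindsight on a finite grid. It is not the paper's fractal-elimination algorithm, and the function names are hypothetical.

The sketch makes these simplifying assumptions:
- A single price is posted to both sides each round, so every round is budget balanced by construction. Global budget balance, which the paper studies, is a weaker requirement.
- The reward is the gain from trade.

```python
def gain_from_trade(p, s, b):
    """Trade happens iff the seller accepts (s <= p) and the buyer accepts (p <= b)."""
    return b - s if s <= p <= b else 0.0


def feedback(p, s, b, bits=2):
    """Two-bit feedback reveals both acceptance decisions; one-bit reveals only whether trade occurred."""
    if bits == 2:
        return (s <= p, p <= b)
    return s <= p <= b


def regret(prices, pairs, grid):
    """Total gain of the best fixed grid price in hindsight minus the learner's total gain."""
    learner = sum(gain_from_trade(p, s, b) for p, (s, b) in zip(prices, pairs))
    best = max(sum(gain_from_trade(q, s, b) for s, b in pairs) for q in grid)
    return best - learner


pairs = [(0.2, 0.8), (0.4, 0.6), (0.1, 0.3)]  # (seller value, buyer value) per round
print(feedback(0.5, 0.4, 0.6, bits=2))         # (True, True)
print(feedback(0.5, 0.1, 0.3, bits=1))         # False
print(regret([0.9, 0.9, 0.9], pairs, grid=[0.25, 0.5]))
```

Neither feedback model reveals the values themselves, only acceptance decisions. This is what makes learning the right price, and proving regret bounds, nontrivial.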

2503.16309 2026-05-20 eess.IV cs.CV physics.med-ph

Rapid patient-specific neural networks for intraoperative X-ray to volume registration

Vivek Gopalakrishnan, David-Dimitris Chlorogiannis, Andrew Abumoussa, Anna M. Larson, Nazim Haouchine, Darren B. Orbach, Sarah Frisken, Neel Dey, Polina Golland

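AI summary: This paper proposes xvr, a self-supervised framework that combines patient-specific neural networks with gradient-based optimization to achieve fast and accurate 2D/3D registration. Training data are generated by physics-based simulation, so no manual annotation is needed, making the method broadly usable by clinical and research communities.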

Abstract

Advanced navigation techniques in image-guided interventions and surgical robotics require the rapid and precise alignment of 3D preoperative volumes (e.g., CT, MRI) to 2D intraoperative images (e.g., X-ray fluoroscopy). However, existing 2D/3D registration methods fail to generalize across the broad spectrum of fluoroscopy-guided procedures: traditional intensity-based optimizers require careful hyperparameter tuning for each subject, while deep learning approaches demand extensive manually labeled datasets and remain constrained to the specific anatomy on which they were trained. To address these limitations, we present xvr, a self-supervised framework that combines patient-specific neural networks with gradient-based optimization for automatic 2D/3D registration. xvr leverages physics-based simulation to generate training data from a patient's own preoperative scan, eliminating the need for manual annotation. We present a foundation model pretrained on thousands of whole-body scans, achieving patient-specific adaptation for any anatomical region in only 5 minutes of finetuning. In the largest evaluation of 2D/3D registration on real fluoroscopy to date, xvr achieves high accuracy in seconds across diverse anatomical structures, imaging modalities, and hospitals, improving upon the accuracy of existing methods by an order of magnitude. xvr makes pan-anatomical 2D/3D rigid registration accessible to broad clinical and research communities through open-source software at https://xvr.csail.mit.edu.
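
To make "aligning a 3D volume to a 2D image" concrete, the sketch below does two things:
- projects 3D landmarks through a pinhole model under a rigid pose;
- measures the mean 2D reprojection distance between an estimated pose and a ground-truth pose.

The following are illustrative assumptions:
- The rotation is restricted to the z-axis for brevity.
- The focal length is taken to be in pixels.
- xvr's own networks and optimization objective are not reproduced here.

```python
import math


def rigid_transform(p, theta_z, t):
    """Rotate point p about the z-axis by theta_z (radians), then translate by t."""
    x, y, z = p
    c, s = math.cos(theta_z), math.sin(theta_z)
    return (c * x - s * y + t[0], s * x + c * y + t[1], z + t[2])


def project(p, focal):
    """Pinhole projection onto the detector plane (source looks along +z)."""
    x, y, z = p
    return (focal * x / z, focal * y / z)


def mean_reprojection_error(points, pose, gt_pose, focal=1000.0):
    """Mean 2D distance between landmarks projected under an estimated and a ground-truth pose."""
    errors = []
    for p in points:
        u_est = project(rigid_transform(p, *pose), focal)
        u_gt = project(rigid_transform(p, *gt_pose), focal)
        errors.append(math.dist(u_est, u_gt))
    return sum(errors) / len(errors)


landmarks = [(0.0, 0.0, 500.0), (10.0, 0.0, 500.0)]  # toy 3D points (mm)
gt = (0.0, (0.0, 0.0, 0.0))
estimate = (0.0, (1.0, 0.0, 0.0))                     # 1 mm translation error along x
print(mean_reprojection_error(landmarks, estimate, gt))  # 2.0
```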

2404.16676 2026-05-20 cs.DS cs.LG

Multilayer Correlation Clustering

Atsushi Miyauchi, Florian Adriaens, Francesco Bonchi, Nikolaj Tatti

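AI summary: This paper introduces multilayer correlation clustering, whose goal is a clustering that minimizes the ℓ_p-norm of the multilayer-disagreements vector. It designs corresponding approximation algorithms and validates them experimentally.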

Comments: AISTATS 2026

Abstract

We establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering to the multilayer setting. In this model, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$ of $n$ elements. The goal is to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p\geq 1$) of the multilayer-disagreements vector, which is defined as the vector (with dimension equal to the number of layers), each element of which represents the disagreements of the clustering on the corresponding layer. For this generalization, we first design an $O(L\log n)$-approximation algorithm, where $L$ is the number of layers. We then study an important special case of our problem, namely the problem with the so-called probability constraint. For this case, we first give an $(\alpha+2)$-approximation algorithm, where $\alpha$ is any possible approximation ratio for the single-layer counterpart. Furthermore, we design a $4$-approximation algorithm, which improves the above approximation ratio of $\alpha+2=4.5$ for the general probability-constraint case. Computational experiments using real-world datasets support our theoretical findings and demonstrate the practical effectiveness of our proposed algorithms.
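
The objective can be made concrete with a small sketch that evaluates the multilayer-disagreements vector and its $\ell_p$-norm for a given clustering. What the sketch assumes and leaves out:
- It assumes unweighted layers in which every pair is labeled '+' (similar) or '-' (dissimilar).
- The weighted layers of the probability-constraint case are not shown.
- The approximation algorithms themselves are not shown.

```python
def disagreements(labels, layer):
    """Count '+' pairs split across clusters plus '-' pairs placed in the same cluster."""
    d = 0
    for (u, v), sign in layer.items():
        together = labels[u] == labels[v]
        if (sign == "+" and not together) or (sign == "-" and together):
            d += 1
    return d


def multilayer_cost(labels, layers, p=1.0):
    """Return (l_p-norm, disagreement vector) of a clustering across all layers."""
    vec = [disagreements(labels, layer) for layer in layers]
    return sum(x ** p for x in vec) ** (1.0 / p), vec


layers = [
    {(0, 1): "+", (0, 2): "-", (1, 2): "-"},
    {(0, 1): "-", (0, 2): "+", (1, 2): "-"},
]
labels = [0, 0, 1]  # elements 0 and 1 share a cluster, element 2 is alone
print(multilayer_cost(labels, layers, p=2.0))  # (2.0, [0, 2])
```

Varying $p$ interpolates between minimizing total disagreements ($p = 1$) and approaching the worst single layer as $p$ grows.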

2312.02652 2026-05-20 hep-ex cs.LG

What Machine Learning Can Do for Focusing Aerogel Detectors

Foma Shipilov, Alexander Barnyakov, Viktor Bobrovnikov, Sergey Kononov, Fedor Ratnikov

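AI summary: This paper proposes using machine learning to filter background signals in focusing aerogel ring-imaging Cherenkov (RICH) detectors, in order to reduce the data stream and improve particle velocity resolution.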

Comments: 5 pages, 4 figures, to be published in 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP2023) proceedings

2605.19755 2026-05-20 cs.SE cs.AI cs.CR cs.LG cs.MA

Operationalising Artificial Intelligence Bills of Materials (AIBOMs) for Verifiable AI Provenance and Lifecycle Assurance

Petar Radanliev, Omar Santos, Carsten Maple, Kay Atefi

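AI summary: This paper proposes an AIBOM framework that extends the CycloneDX standard to capture AI-specific provenance, model lineage, and disclosure metadata. Through structured schema engineering, cryptographic validation, and agent-driven automation, it enables verifiable software provenance. The reported results are 98.7% reproducibility fidelity, 96.2% vulnerability match precision, and a 63% reduction in manual oversight, supporting the feasibility of automated provenance assurance and reproducible AI lifecycle validation.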

DOI: 10.3389/fcomp.2026.1735919
Journal ref: Front. Comput. Sci. 8:1735919 (2026)

Abstract

Artificial Intelligence (AI) systems are increasingly dependent on complex, multi-layered software supply chains that introduce challenges for reproducibility, transparency, and security assurance. This study presents an Artificial Intelligence Bill of Materials (AIBOM) schema extending the CycloneDX standard to capture AI-specific provenance, model lineage, and disclosure metadata. The framework provides a formalised approach to verifiable software provenance through structured schema engineering, cryptographic validation, and agent-driven automation. An autonomous AI pipeline is developed to perform continuous environment inspection, vulnerability enrichment, and reproducibility auditing using machine-verifiable provenance chains. Empirical evaluation demonstrates 98.7% reproducibility fidelity, 96.2% vulnerability match precision, and a 63% reduction in manual oversight across containerised analytic workflows. These results confirm the feasibility of automated provenance assurance and reproducible AI lifecycle validation. The AIBOM framework advances the scientific foundations of software supply chain transparency and AI reproducibility engineering, offering a generalisable methodology for securing AI systems, strengthening provenance integrity, and supporting compliance with international information security standards.
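
As a minimal sketch of the cryptographic-validation idea, the snippet below records a SHA-256 digest of a model artifact in a CycloneDX-style component and later re-checks it. The field layout is assumed to follow the public CycloneDX JSON format (`bomFormat`, `specVersion`, `components`, `hashes`), including the `machine-learning-model` component type added in CycloneDX 1.5. It is not the paper's AIBOM schema extension, and the model name and bytes are made up.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def make_bom(name: str, version: str, artifact: bytes) -> dict:
    """Build a minimal CycloneDX-style BOM with one model component and its digest."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "components": [{
            "type": "machine-learning-model",
            "name": name,
            "version": version,
            "hashes": [{"alg": "SHA-256", "content": sha256_hex(artifact)}],
        }],
    }


def verify(bom: dict, name: str, artifact: bytes) -> bool:
    """Re-hash the artifact and compare with the SHA-256 digest recorded for `name`."""
    for comp in bom["components"]:
        if comp["name"] == name:
            recorded = {h["alg"]: h["content"] for h in comp.get("hashes", [])}
            return recorded.get("SHA-256") == sha256_hex(artifact)
    return False  # unknown component: nothing to verify against


weights = b"example model weights"  # stand-in for a real model file
bom = make_bom("toy-classifier", "1.0.0", weights)
print(json.dumps(bom, indent=2))
print(verify(bom, "toy-classifier", weights))              # True
print(verify(bom, "toy-classifier", weights + b"tamper"))  # False
```

A real pipeline would also sign the BOM itself, since a digest alone only detects changes relative to a BOM that is already trusted.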

2605.19737 2026-05-20 cs.GR cs.CV

Decentralized Direct Volume Rendering: A Browser-Native GPU Architecture for MRI Digital Twins in Resource-Constrained Settings

Oserebameh Augustine Beckley

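AI summary: This study proposes a decentralized, browser-native GPU architecture for high-fidelity MRI digital twins in resource-constrained settings. It runs deterministic single-pass raymarching and morphological gradient computation on low-cost integrated edge GPUs, achieving a fast time to first pixel and stable interactive performance.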

Comments: 10 pages, 4 figures. Live interactive browser demo available at https://webgpu-mri.vercel.app/. Source code repository: https://github.com/Bahdmanbabzo/webgpu-mri

Abstract

Digital Twin (DT) technology holds immense potential for surgical planning and personalized medicine. However, generating interactive, patient-specific anatomical twins currently relies on computationally heavy Server-Side Rendering (SSR) or expensive local workstations, creating significant barriers to deployment, especially in resource-constrained settings (RCS). This paper presents a decentralized, client-side WebGPU architecture that democratizes access to high-fidelity anatomical Digital Twins. By bypassing standard server-side rendering pipelines, the framework executes deterministic single-pass raymarching and morphological gradient calculations directly on low-cost integrated edge GPUs. Eliminating the network latency inherent to cloud-rendered solutions, the system achieves a Time to First Pixel (TTFP) of under 920.0 ms and maintains stable interactivity at >= 82.0 FPS. Continuous Interaction Fidelity is maintained via uniform buffers, enabling zero-latency manipulation of tissue parameters for dynamic clinical decision-making. By proving that complex 3D medical simulations of patient-specific MRI scans can be executed natively in the browser without deep learning or external computational dependencies, this architecture provides a scalable, affordable foundation for the widespread clinical adoption of healthcare Digital Twins.
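
The core of single-pass direct volume rendering is front-to-back alpha compositing along each ray, with early ray termination. The sketch below shows that loop for one ray in plain Python rather than as a WebGPU shader. The sample values, the linear transfer function, and the 0.99 termination threshold are illustrative assumptions, not details taken from the paper.

```python
def transfer(density):
    """Hypothetical linear transfer function: density in [0, 1] -> (gray color, opacity)."""
    d = min(max(density, 0.0), 1.0)
    return d, d


def march_ray(samples, stop_alpha=0.99):
    """Front-to-back compositing along one ray; returns (color, alpha, samples used)."""
    color, alpha = 0.0, 0.0
    used = 0
    for density in samples:
        c, a = transfer(density)
        color += (1.0 - alpha) * a * c  # contribution attenuated by what is already in front
        alpha += (1.0 - alpha) * a
        used += 1
        if alpha >= stop_alpha:  # early ray termination: later samples are effectively hidden
            break
    return color, alpha, used


print(march_ray([0.5] * 10))  # (0.49609375, 0.9921875, 7)
```

Front-to-back order is what makes early termination possible. Once accumulated opacity is near 1, later samples cannot change the pixel, so the loop can stop.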

2605.19733 2026-05-20 math.NA cs.LG cs.NA

Graph Neural Networks for Community Detection in Graph Signal Analysis

Roberto Cavoretto, Alessandra De Rossi, Enrico Montini

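AI summary: This paper studies the use of graph neural networks for community detection within a graph signal interpolation framework. It combines GNN-derived communities with a partition of unity method (PUM) for interpolation with graph basis functions (GBFs). The result is accurate reconstruction of graph signals, showing that deep-learning-based community detection can support large-scale graph signal analysis.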

Abstract

Community detection is a central problem in graph analysis, with applications ranging from network science to graph signal processing. In recent years, Graph Neural Networks (GNNs) have emerged as effective tools for learning low-dimensional representations of graph-structured data and have shown strong performance in clustering tasks, particularly on large and high-dimensional graphs. This paper investigates the use of GNN-based community detection within a graph signal interpolation framework. After reviewing the main classes of GNN architectures for community detection according to a standard taxonomy, we integrate the resulting graph communities into a Partition of Unity Method (PUM) for interpolation with Graph Basis Functions (GBFs). In this approach, GNN-derived communities are used to construct local subdomains on which GBF interpolants are computed and subsequently combined into a global approximation. Numerical experiments on benchmark graph datasets, including geometric and urban network examples, demonstrate that the proposed combination of GNN-based clustering and GBF-PUM interpolation yields accurate signal reconstructions. The results indicate that deep learning-based community detection can provide effective graph partitions for localized interpolation schemes, supporting its use in scalable graph signal analysis.
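
The final step of a partition of unity method, blending local interpolants into a global one, can be sketched as follows. What is assumed and what is left out:
- The overlapping communities and local values are made up.
- The weights are the simplest Shepard-type choice: indicator functions normalized to sum to one at each node.
- Computing the local GBF interpolants is not shown.

The paper's actual weight functions may differ.

```python
def pum_blend(num_nodes, subdomains, local_values):
    """Blend local interpolants s_j into s(v) = sum_j w_j(v) * s_j(v).

    subdomains: list of node-id lists (may overlap, must cover every node)
    local_values: list of dicts with local_values[j][v] = s_j(v) for v in subdomains[j]
    """
    blended = []
    for v in range(num_nodes):
        owners = [j for j, dom in enumerate(subdomains) if v in dom]
        if not owners:
            raise ValueError(f"node {v} is not covered by any subdomain")
        w = 1.0 / len(owners)  # normalized indicator weights form a partition of unity
        blended.append(sum(w * local_values[j][v] for j in owners))
    return blended


# Two overlapping communities on a 4-node graph; node 2 belongs to both.
subdomains = [[0, 1, 2], [2, 3]]
local_values = [{0: 1.0, 1: 2.0, 2: 3.0}, {2: 5.0, 3: 6.0}]
print(pum_blend(4, subdomains, local_values))  # [1.0, 2.0, 4.0, 6.0]
```

The communities only need to cover the graph and overlap slightly. Each local interpolation problem stays small, which is what makes the scheme scalable.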

2605.19722 2026-05-20 cs.CR cs.AI

Measuring Safety Alignment Effects in Autonomous Security Agents

Isaac David, Arthur Gervais

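AI summary: This paper proposes a trace-based benchmark for evaluating safety alignment effects in security agents that perform vulnerability-analysis tasks. It finds that differences between agents show up mainly in refusal, unsafe actions, and tool reliability, rather than in refusal rate alone.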

Abstract

Do stock safety-aligned language models and their uncensored or abliterated derivatives behave differently when run as autonomous security agents? Single-turn refusal benchmarks cannot answer this question: security agents must inspect repositories, call tools, and produce vulnerability evidence inside authorized sandboxes. We present a trace-based benchmark of 30 local vulnerability-analysis tasks with fixed tools, deterministic success predicates, redaction rules, and grounding checks, and compare four stock models against uncensored or abliterated derivatives: Gemma 4 31B, Gemma 4 26B A4B, Qwen2.5-Coder 7B, and Llama 3.1 8B. The artifact contains 1,500 security-agent traces and 800 non-security control traces. The Gemma pairs show large less-restricted gains on security tasks: 14.0% versus 0.7% success for 31B and 10.7% versus 0.0% for 26B, with higher mean grounding (3.91 versus 3.27 and 4.12 versus 1.64 out of five) and 0.0% refusal, suppressed-action, and unsafe-action rates in the 31B traces. However, controls and non-Gemma pairs rule out a clean security-specific or universal less-restricted effect: Gemma gaps also appear on ordinary coding tasks, Qwen2.5-Coder success is lower for the less-restricted derivative (2.0% versus 5.3%), and the abliterated Llama derivative fails the tool protocol. Across all families, hard proof-of-trigger and patch-verification tasks remain unsolved. These results show that safety alignment effects in autonomous security agents should be measured at the system level, separating refusal, unsafe action, tool reliability, and evidence grounding rather than treating refusal rate as the safety signal.
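
The recommendation to report refusal, unsafe action, tool reliability, and grounding separately can be illustrated with a small aggregation sketch over per-trace records. The record fields (`success`, `refused`, `unsafe_action`, `tool_error`, `grounding`) and the example traces are hypothetical, not the benchmark's actual trace format or data.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    success: bool        # deterministic success predicate passed
    refused: bool        # agent declined the task
    unsafe_action: bool  # agent took an action outside the allowed scope
    tool_error: bool     # agent broke the tool protocol
    grounding: int       # 0-5 evidence-grounding score


def summarize(traces):
    """Report each dimension separately instead of collapsing everything into a refusal rate."""
    n = len(traces)

    def rate(attr):
        return 100.0 * sum(getattr(t, attr) for t in traces) / n

    return {
        "success_%": rate("success"),
        "refusal_%": rate("refused"),
        "unsafe_action_%": rate("unsafe_action"),
        "tool_error_%": rate("tool_error"),
        "mean_grounding": sum(t.grounding for t in traces) / n,
    }


traces = [
    Trace(True, False, False, False, 5),
    Trace(False, True, False, False, 2),
    Trace(False, False, False, True, 1),
    Trace(False, False, False, True, 0),
]
print(summarize(traces))
```

In this toy data the refusal rate alone (25%) would hide that half of the traces failed for tool-protocol reasons.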

2605.19698 2026-05-20 cs.CR cs.LG

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma, Songze Li

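AI summary: This paper studies the stability of multi-concept backdoor attacks in interference-prone settings. It proposes the Hydra framework, which constrains trigger semantics and coordinates cross-task interactions to achieve robust and controllable multi-concept backdoor injection. Experiments show that Hydra activates backdoors effectively while preserving clean generation quality.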

Comments: Preprint. 18 pages

Abstract

Text-to-image diffusion models are increasingly developed through open-source reuse and repeated downstream fine-tuning, where reused checkpoints are difficult to verify and thus more susceptible to hidden backdoor behaviors. In such ecosystems, a single pretrained model may be sequentially adapted and redistributed by multiple independent parties, allowing multiple concept-specific trigger-target associations to accumulate in the same model. When these associations coexist, semantic conflicts can be amplified in the shared representation space, leading to cross-concept entanglement and degraded generation quality. Notably, instead of strengthening the attack, such accumulation can destabilize previously injected behaviors and reduce attack reliability. In this work, we systematically investigate backdoor attacks under this interference-prone setting and propose Hydra, a unified framework for robust and controlled multi-concept backdoor injection under cumulative and decentralized reuse. Our core insight is that stable backdoor injection under large-scale multi-concept settings requires explicitly constraining trigger semantics while coordinating cross-task interactions during optimization. Specifically, Hydra performs evolutionary trigger search in the text encoder space to identify triggers that are semantically aligned with their target concepts while remaining stable across other injected concepts. It further combines multi-task fine-tuning with trigger-clean regularization to improve training stability under dense multi-concept injection. Extensive experiments across multiple diffusion backbones under rigorous multi-concept settings show that Hydra maintains effective backdoor activation while preserving clean generation fidelity and image quality. For instance, across 8 attackers and 500 concept pairs, Hydra maintains an attack success rate (ASR) of about 95% while preserving strong clean generation.

2605.19695 2026-05-20 eess.AS cs.SD

Cross-Talk Speech Reduction, by Separation, for Separation

Zhong-Qiu Wang, Samuele Cornell

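AI summary: This paper proposes cross-talk reduction (CTR), a task of isolating the wearer's speech from each close-talk mixture, together with a new method, CTRnet, that can be trained directly on real-recorded pairs of close-talk and far-field mixtures. Building on CTRnet, it further proposes pseudo-label based far-field speech separation (PuLSS), which uses CTRnet's estimated clean speech as pseudo-labels to train models that separate far-field mixtures. A key advantage is that both CTRnet and PuLSS can be trained on real data from the target domain, closing the generalization gap typically observed when models are trained only on simulated data. On the CHiME-6 dataset, the framework achieves state-of-the-art ASR performance under both oracle and estimated speaker diarization, surpassing all CHiME-{7,8} challenge submissions. To the authors' knowledge, it is the first neural speech separation method that substantially outperforms guided source separation on real conversational "speech-in-the-wild" data.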

Comments: in submission

Abstract

In conversational speech separation and recognition tasks, close-talk microphones are typically attached to each speaker during training data collection to capture near-field, close-talk mixture signals, in addition to using far-field microphones to record far-field mixture signals. Each such close-talk mixture exhibits a reasonably high energy level for the wearer and could intuitively serve as weak supervision for training far-field speech separation models directly on real-recorded far-field signals. However, they are not sufficiently clean for this purpose, as they often contain strong cross-talk speech from other speakers in addition to background noise. To address this, we propose cross-talk reduction (CTR), a task aiming to isolate the wearer's speech from each close-talk mixture, and a novel method called CTRnet, which can be trained directly on real-recorded pairs of close-talk and far-field mixtures to accomplish CTR. Building on CTRnet, we further propose pseudo-label based far-field speech separation (PuLSS), which uses CTRnet's estimated clean speech as pseudo-labels to train models for separating far-field mixtures. A key advantage of the proposed framework is that both CTRnet and PuLSS can be trained on real-recorded data from the target domain, addressing the generalization gap commonly observed when models are trained exclusively on simulated data. On the CHiME-6 dataset, our framework achieves state-of-the-art ASR performance under both oracle and estimated speaker diarization, surpassing all CHiME-{7,8} challenge submissions. To our knowledge, it is the first neural speech separation method that substantially outperforms guided source separation on real conversational "speech-in-the-wild" data.

2605.19685 2026-05-20 stat.ML cs.LG

Probabilistic Multivariate Time Series Forecasting with Diffusion Copulas

Probabilistic Multivariate Time Series Forecasting Based on Diffusion Copulas

David Huk, Dongshan Wang, Miha Bresar

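AI Summary: This paper proposes a diffusion-copula framework that separates the learning of marginal distributions from the learning of the dependence structure, improving tail-risk estimation in multivariate time series forecasting, and demonstrates an advantage in forecasting systemic extremes in cryptocurrency markets.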

Comments ICLR 2026 Workshop Advances in Financial AI
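Separating marginals from dependence is the classical copula decomposition (Sklar's theorem). The listing gives no implementation details, so the following is only a minimal sketch, under that reading, of the step every copula pipeline shares: mapping each variable to rank-based pseudo-observations in (0, 1). The function names `pseudo_observations` and `to_copula_scale` are hypothetical, and the diffusion model that would learn the dependence structure on this scale is not shown.

```python
from typing import List, Sequence


def pseudo_observations(column: Sequence[float]) -> List[float]:
    """Rank-based probability integral transform of one marginal.

    Maps each value to rank / (n + 1), so all outputs lie strictly inside
    (0, 1). Ties are broken by order of appearance (a simplification;
    average ranks are a common alternative).
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def to_copula_scale(series: Sequence[Sequence[float]]) -> List[List[float]]:
    """Transform every variable of a multivariate series independently.

    `series` is a list of time steps, each holding d variables. Only the
    marginals are changed; the rank dependence between columns is kept,
    and that is what a separate copula model (hypothetically, a diffusion
    model) would then learn.
    """
    d = len(series[0])
    columns = [[row[j] for row in series] for j in range(d)]
    transformed = [pseudo_observations(col) for col in columns]
    return [[transformed[j][t] for j in range(d)] for t in range(len(series))]


# Usage: two toy return series observed at four time steps.
returns = [[0.02, -0.01], [-0.05, -0.04], [0.01, 0.03], [0.10, 0.07]]
print(to_copula_scale(returns))  # [[0.6, 0.4], [0.2, 0.2], [0.4, 0.6], [0.8, 0.8]]
```

Because ranks are unaffected by heavy-tailed marginals, the dependence model sees the same uniform scale for every asset, which is the motivation for the split.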

2605.19667 2026-05-20 math.OC cs.LG

Convergence of Consensus-Based Particle Methods for Nonconvex Bi-Level Optimization

Convergence of Consensus-Based Particle Methods for Nonconvex Bilevel Optimization

Yutong Chao, Xudong Sun, Konstantin Riedl, Majid Khadiv, Jalal Etesami

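AI Summary: This paper studies a consensus-based optimization method for nonconvex bilevel optimization, which aims to minimize an upper-level function over the set of global minimizers of the lower-level problem. The method is derivative-free and constructs the consensus point by combining smoothed quantile selection with a Gibbs-type Laplace approximation. The study establishes convergence guarantees for the associated mean-field dynamics and its finite-particle approximation. In particular, under suitable smoothed-quantile localization, error-bound, and stability assumptions, it proves that the mean-field law reaches a given Wasserstein neighborhood of the target bilevel solution at an explicit exponential rate. Numerical experiments further support the theoretical results.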
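To make the consensus-point construction concrete, here is a minimal sketch of generic consensus-based optimization (CBO) adapted to a bilevel setting. It is not the authors' scheme: the smoothed quantile selection is replaced by a hard quantile cut on the lower-level objective, and the names `bilevel_consensus_point` and `cbo_step`, as well as all default parameters, are assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def bilevel_consensus_point(
    particles: Sequence[Vector],
    upper: Callable[[Vector], float],
    lower: Callable[[Vector], float],
    alpha: float = 50.0,
    quantile: float = 0.5,
) -> Vector:
    """Consensus point for a bilevel problem (hard-quantile simplification).

    1. Keep the fraction `quantile` of particles with the smallest
       lower-level value (a hard stand-in for smoothed quantile selection).
    2. Average the survivors with Gibbs weights exp(-alpha * upper(x)),
       shifted by the minimum for numerical stability (Laplace principle).
    """
    ranked = sorted(particles, key=lower)
    k = max(1, math.ceil(quantile * len(ranked)))
    survivors = ranked[:k]
    values = [upper(x) for x in survivors]
    f_min = min(values)
    weights = [math.exp(-alpha * (f - f_min)) for f in values]
    total = sum(weights)
    dim = len(survivors[0])
    return [sum(w * x[j] for w, x in zip(weights, survivors)) / total for j in range(dim)]


def cbo_step(
    particles: Sequence[Vector],
    v: Vector,
    lam: float = 1.0,
    dt: float = 0.1,
    sigma: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[Vector]:
    """One Euler-Maruyama step of anisotropic CBO dynamics, per coordinate:
    x <- x - lam*dt*(x - v) + sigma*sqrt(dt)*(x - v)*N(0, 1)."""
    if rng is None:
        rng = random.Random(0)
    return [
        [
            xj - lam * dt * (xj - vj) + sigma * math.sqrt(dt) * (xj - vj) * rng.gauss(0.0, 1.0)
            for xj, vj in zip(x, v)
        ]
        for x in particles
    ]


def lower(x: Vector) -> float:
    return (x[0] ** 2 - 1.0) ** 2  # lower-level minimizers: x = -1 and x = +1


def upper(x: Vector) -> float:
    return (x[0] - 1.0) ** 2  # upper level prefers x = +1


swarm = [[-1.0], [1.0], [0.0], [2.0]]
v = bilevel_consensus_point(swarm, upper, lower)
print(v)  # close to [1.0]
swarm = cbo_step(swarm, v, sigma=0.5)  # particles drift toward v with noise
```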

2605.19666 2026-05-20 physics.med-ph cs.LG

Cross-View Attention Fusion Net: A Prior-Guided Dual-View Representation Learning for Cardiac Output Estimation from Short-Term PPG Signals

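Cross-View Attention Fusion Network: A Prior-Guided Dual-View Representation Learning Approach for Cardiac Output Estimation from Short-Term PPG Signals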

Yaowen Zhang, Bo Cui, Libera Fresiello, Peter H. Veltink, Dirk W. Donker, Ying Wang

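AI Summary: This paper proposes CVAF-Net, a prior-guided dual-view deep learning model for estimating cardiac output from short-term PPG signals; it improves performance through cross-view attention fusion and is validated on multiple datasets.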

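AI Chinese Abstract

Accurate estimation of cardiac output (CO) from photoplethysmography (PPG) is promising for non-invasive hemodynamic monitoring but remains difficult, because CO is jointly determined by cardiac function and vascular tone. Conventional feature-based models use physiologically meaningful PPG descriptors but depend on accurate pulse detection and may miss latent temporal relationships. In contrast, fully end-to-end deep learning models learn directly from raw PPG signals but often underuse established PPG-derived prior information. This paper introduces the Cross-View Attention Fusion Network (CVAF-Net), a prior-guided dual-view deep learning model for estimating CO from short, fixed-length PPG segments. CVAF-Net treats the raw PPG signal as a temporal view and a feature sequence map (FSM) as a structured, prior-guided view, and fuses the two representations through cross-view attention. The model was independently evaluated on 5-, 15-, and 30-s segments from three datasets: simulated pulse waves (3323 subjects), vasoconstriction provocation (79 subjects), and resting/cycling activities (10 subjects), and was compared with multiple machine learning and deep learning benchmarks. CVAF-Net outperformed most benchmark methods and matched a state-of-the-art Transformer-based model on simulated data, with a mean absolute error (MAE) of 0.19 L/min (MAPE: 3.95%), while also achieving high accuracy in real-world settings (minimum MAE: 1.20 L/min). Importantly, CVAF-Net reduced the number of floating-point operations (FLOPs) twelvefold compared with the leading Transformer-based model. Plausibility analysis showed physiologically consistent CO estimates, with the expected correlations with age (ρ = -0.274), heart rate (ρ = 0.894), and systemic vascular resistance (ρ = -0.740). These findings indicate that CVAF-Net provides an accurate, computationally efficient, and generalizable approach to continuous wearable-based CO monitoring.

English Abstract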

Accurate cardiac output (CO) estimation from photoplethysmography (PPG) is promising for unobtrusive hemodynamic monitoring, but remains difficult since CO is jointly determined by cardiac function and vascular tone. Conventional feature-based models use physiologically meaningful PPG descriptors, yet depend on accurate pulse detection and may miss latent temporal relationships. In contrast, fully end-to-end deep learning models learn directly from raw PPG but often underuse established PPG-derived prior information. Here, we introduce the Cross-View Attention Fusion Network (CVAF-Net), a prior-guided dual-view deep learning model for CO estimation from short, fixed-length PPG segments. CVAF-Net processes raw PPG as a temporal view and a feature sequence map (FSM) as a structured prior-guided view, and fuses the two representations through cross-view attention. The model was independently evaluated using 5-, 15-, and 30-s segments from three datasets: simulated pulse waves (3323 subjects), vasoconstriction provocation (79 subjects), and resting/cycling activities (10 subjects), and was compared with multiple machine learning and deep learning benchmarks. CVAF-Net outperformed most benchmark methods and achieved performance comparable to a state-of-the-art Transformer-based model, with a mean absolute error (MAE) of 0.19 L/min (MAPE: 3.95%) on simulated data and high accuracy in real-world settings (minimum MAE: 1.20 L/min). Importantly, CVAF-Net reduced FLOPs by twelvefold compared with the leading Transformer-based model. Plausibility analysis showed physiologically consistent CO estimates, with expected correlations with age ($\rho = -0.274$), heart rate ($\rho = 0.894$), and systemic vascular resistance ($\rho = -0.740$). These findings indicate that CVAF-Net provides an accurate, computationally efficient, and generalizable approach for continuous wearable-based CO monitoring.
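Cross-view attention, as described in the abstract, lets tokens from one representation query the other. The sketch below is a generic scaled dot-product cross-attention in plain Python, assuming both views (raw-PPG tokens and FSM tokens) have already been embedded to the same dimension. Learned projections, multiple heads, and the actual CVAF-Net fusion layout are not given in the abstract, so `fuse_views` is a hypothetical arrangement.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: queries come from one view, keys and
    values from the other. Learned Q/K/V projections are omitted (identity)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
    return out


def fuse_views(temporal: Matrix, fsm: Matrix) -> List[float]:
    """Hypothetical fusion: each view attends to the other, a residual
    connection keeps the original tokens, and mean pooling gives one vector
    per view; the two vectors are concatenated for a downstream CO regressor."""
    t_att = cross_attention(temporal, fsm, fsm)
    f_att = cross_attention(fsm, temporal, temporal)
    t_res = [[a + b for a, b in zip(x, y)] for x, y in zip(temporal, t_att)]
    f_res = [[a + b for a, b in zip(x, y)] for x, y in zip(fsm, f_att)]

    def mean_pool(m: Matrix) -> List[float]:
        return [sum(col) / len(m) for col in zip(*m)]

    return mean_pool(t_res) + mean_pool(f_res)


# Three temporal tokens and two FSM tokens, both embedded in 2 dimensions.
print(fuse_views([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], [[0.2, 0.8], [0.9, 0.1]]))
```

The two views may have different numbers of tokens; attention only requires that they share the embedding dimension.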

2605.19665 2026-05-20 cs.SE cs.AI

CriterAlign: Criterion-Centric Rationale Alignment for Code Preference Judging

CriterAlign: Criterion-Centric Rationale Alignment for Code Preference Judging

Zhenyu Li, Aleksandar Cvejic, Zehui Chen, Peter Wonka

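AI Summary: This paper proposes CriterAlign, a criterion-centric rationale alignment framework that improves the accuracy of code preference judging through direct criterion-level pairwise judgments, tie-driven criterion refinement, swap-consistency filtering, and final pairwise synthesis, and further introduces Human-Preference-Aligned Guidance (HPAG) to boost performance.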

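AI Chinese Abstract

Pairwise human preference prediction is central to evaluating code-generation systems, where quality often depends on task-specific trade-offs rather than functional correctness alone. Although rubric-based LLM judges improve interpretability by decomposing evaluation into explicit criteria, most existing pipelines remain pointwise: they score each response independently and derive preferences by comparing aggregated scores. We show that this design is poorly matched to pairwise code preference prediction and can underperform a strong monolithic judge. We propose CriterAlign, a criterion-centric framework that adapts rubric-based judging to pairwise preference evaluation through direct criterion-level pairwise judgments, tie-driven criterion refinement, swap-consistency filtering, and final pairwise synthesis. We further introduce Human-Preference-Aligned Guidance (HPAG), which is synthesized offline by extracting recurring rationale gaps between human preferences and monolithic-judge predictions on training examples, and is injected into the criterion generator, the criterion judge, and the final judge. On BigCodeReward, CriterAlign raises the accuracy of a Qwen2.5-VL-32B monolithic judge from 60.4% to 66.3%, and ablation experiments confirm the contributions of the pairwise criterion design and HPAG.

English Abstract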

Pairwise human preference prediction is central to evaluating code-generation systems, where quality often depends on task-specific trade-offs beyond functional correctness. While rubric-based LLM judges improve interpretability by decomposing evaluation into explicit criteria, most existing pipelines remain pointwise: they score each response independently and derive preferences by comparing aggregated scores. We show that this design is poorly matched to pairwise code preference prediction and can underperform a strong monolithic judge. We propose CriterAlign, a criterion-centric framework that adapts rubric-based judging to pairwise preference evaluation through direct criterion-level pairwise judgments, tie-driven criterion refinement, swap-consistency filtering, and final pairwise synthesis. We further introduce Human-Preference-Aligned Guidance (HPAG), synthesized offline from training examples by extracting recurring rationale gaps between human preferences and monolithic judge predictions, and injected into the criterion generator, criterion judge, and final judge. On BigCodeReward, CriterAlign improves a Qwen2.5-VL-32B monolithic judge from 60.4% to 66.3% accuracy, with ablations confirming the contributions of pairwise criterion design and HPAG.
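Swap-consistency filtering guards against position bias in pairwise LLM judges. A minimal sketch, assuming a judge that returns `"A"`, `"B"`, or `"tie"` for an ordered pair: ask twice with the responses swapped, map the second verdict back to the original labels, and discard criteria whose verdict does not survive the swap. The `Judge` callable stands in for an LLM call and, like the simple win count in `aggregate_criteria`, is hypothetical; CriterAlign's tie-driven refinement and final pairwise synthesis are not modeled.

```python
from typing import Callable, Dict, Sequence

# Hypothetical judge: (criterion, first response, second response) -> "A", "B", or "tie",
# where "A" means the response shown first is better. Stands in for an LLM call.
Judge = Callable[[str, str, str], str]

_SWAP = {"A": "B", "B": "A", "tie": "tie"}


def swap_consistent_verdict(judge: Judge, criterion: str, resp_a: str, resp_b: str) -> str:
    """Judge in both orders; keep the verdict only if it survives the swap."""
    forward = judge(criterion, resp_a, resp_b)
    backward = _SWAP[judge(criterion, resp_b, resp_a)]  # map back to original labels
    return forward if forward == backward else "inconsistent"


def aggregate_criteria(
    judge: Judge, criteria: Sequence[str], resp_a: str, resp_b: str
) -> Dict[str, int]:
    """Count swap-consistent criterion-level verdicts; inconsistent ones are dropped."""
    counts = {"A": 0, "B": 0, "tie": 0}
    for criterion in criteria:
        verdict = swap_consistent_verdict(judge, criterion, resp_a, resp_b)
        if verdict != "inconsistent":
            counts[verdict] += 1
    return counts


def position_biased(criterion: str, first: str, second: str) -> str:
    return "A"  # always prefers whatever is shown first


def prefers_longer(criterion: str, first: str, second: str) -> str:
    if len(first) == len(second):
        return "tie"
    return "A" if len(first) > len(second) else "B"


print(swap_consistent_verdict(position_biased, "readability", "x = 1", "x=1"))  # inconsistent
print(aggregate_criteria(prefers_longer, ["readability", "robustness"], "x = 1", "x=1"))
```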

2605.19646 2026-05-20 q-bio.NC cs.LG

BCI-sift: An automated feature selection toolbox for Brain Computer Interface applications

BCI-sift: An Automated Feature Selection Toolbox for Brain-Computer Interface Applications

Elena C Offenberg, Dirk Keller, Mariska J Vansteensel, Zachary V Freudenburg, Nick F Ramsey, Julia Berezutskaya

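AI Summary: This paper presents the BCI-sift toolbox, which integrates advanced optimization methods to provide automated feature selection for brain-computer interface tasks, improving classification accuracy and interpretability.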

Comments 19 pages, 12 figures

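AI Chinese Abstract

Progress in clinical brain-computer interfaces (BCIs) depends on precise and reliable signal interpretation. However, data acquired from both implanted and non-implanted BCIs are high-dimensional and noisy, which poses significant challenges and motivates the use of feature selection algorithms. We introduce BCI-sift (BCI Systematic and Interpretable Feature Tuning), a Python-based toolbox designed to streamline the application of diverse optimization algorithms to BCI datasets in order to identify the most relevant features for machine learning tasks. Our scikit-learn-compatible toolbox (github.com/UMCU-RIBS/BCI-sift) simplifies feature selection in BCI tasks by integrating advanced optimization methods. We validated the toolbox on high-density electrocorticography (HD ECoG) data from eight able-bodied participants with 64-128 electrodes implanted over the sensorimotor cortex, who repeatedly spoke 12 words. BCI-sift identified informative neural features across the electrode, time, and frequency dimensions. The anatomical locations of the selected electrodes were consistent across participants and aligned with the known functional organization of the sensorimotor cortex. Relevant time points clustered around speech production, and the high-frequency band was identified as the most informative, in line with prior work. Feature selection improved classification accuracy compared with using all features. BCI-sift provides an accessible, versatile platform for feature selection in BCI research, enabling better decoding performance, automated feature analysis, and enhanced interpretability. Although validated on HD ECoG data, the approach is broadly applicable to other BCI modalities. By improving classification accuracy and interpretability, BCI-sift addresses key challenges in developing efficient and transparent BCI systems.

English Abstract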

Advancements in clinical Brain-Computer Interfaces (BCIs) depend on precise and reliable signal interpretation. However, the high-dimensional and noisy nature of data captured from both implanted and non-implanted BCIs poses significant challenges, motivating the use of feature selection algorithms. We introduce BCI-sift (BCI Systematic and Interpretable Feature Tuning), a Python-based toolbox designed to streamline the application of diverse optimization algorithms to BCI datasets for identifying the most relevant features in machine learning tasks. Our scikit-learn-compatible toolbox (github.com/UMCU-RIBS/BCI-sift) simplifies feature selection in BCI tasks by integrating advanced optimization methods. We validated the toolbox on high-density electrocorticography (HD ECoG) data from eight able-bodied participants with 64-128 electrodes implanted over the sensorimotor cortex, who repeatedly spoke 12 words. BCI-sift identified informative neural features across electrode, temporal, and frequency dimensions. The anatomical locations of electrode selections were consistent across participants and aligned with known functional organization of the sensorimotor cortex. Relevant time points clustered around speech production, and the high-frequency band was identified as most informative, in line with prior work. Feature selection improved classification accuracy compared to using all features. BCI-sift provides an accessible and versatile platform for feature selection in BCI research, enabling improved decoding performance, automated feature analysis, and enhanced interpretability. While validated on HD ECoG data, the approach is broadly applicable to other BCI modalities. By enhancing classification accuracy and interpretability, BCI-sift addresses key challenges in developing efficient and transparent BCI systems.
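Wrapper-style feature selection, one family of methods such a toolbox can orchestrate, searches over feature subsets using a model-based score. The sketch below shows greedy forward selection in plain Python; it is not BCI-sift's API (see the repository for that), and `forward_select` and the toy score are hypothetical. In practice the score would be something like cross-validated decoding accuracy on a subset of electrode, time, or frequency features; scikit-learn also provides `SequentialFeatureSelector` for the same kind of search.

```python
from typing import Callable, FrozenSet, List, Optional


def forward_select(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    max_features: Optional[int] = None,
    tol: float = 0.0,
) -> List[int]:
    """Greedy forward selection: repeatedly add the feature that most improves
    `score`, stopping when no candidate improves it by more than `tol`."""
    selected: List[int] = []
    best = score(frozenset())
    limit = n_features if max_features is None else min(max_features, n_features)
    while len(selected) < limit:
        gains = {
            f: score(frozenset(selected + [f]))
            for f in range(n_features)
            if f not in selected
        }
        f_best = max(gains, key=gains.get)
        if gains[f_best] - best <= tol:
            break  # no remaining feature helps enough
        selected.append(f_best)
        best = gains[f_best]
    return selected


# Hypothetical toy score: per-feature utility minus a cost per selected feature.
utility = {0: 0.5, 1: 0.05, 2: 0.3}


def toy_score(subset: FrozenSet[int]) -> float:
    return sum(utility[f] for f in subset) - 0.1 * len(subset)


print(forward_select(3, toy_score))  # [0, 2]
```

Feature 1 is rejected because its utility does not cover the per-feature cost, which mirrors how selection can improve accuracy over using all features.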

2605.19644 2026-05-20 cs.CR cs.LG

Inferring Sensitive Attributes from Knowledge Graph Embeddings: Attack and Defense Strategies

Inferring Sensitive Attributes from Knowledge Graph Embeddings: Attack and Defense Strategies

Yasmine Hayder

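AI Summary: This paper studies the privacy risks of reasoning based on knowledge graph embeddings (KGE) and proposes a framework that mitigates these risks through post-processing sanitization techniques, highlighting the need to trade off recommendation quality against privacy protection.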

Journal ref: ESWC - Extended Semantic Web Conference, May 2026, Dubrovnik, France

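AI Chinese Abstract

Knowledge graphs (KGs) are a powerful representation of linked data, offering flexibility, semantic richness, and support for knowledge enrichment and reasoning. They help data owners organize and exploit heterogeneous data to provide insightful services (e.g., recommendations), but real-world KGs are often incomplete, hiding true facts or missing valuable insights. Knowledge graph embedding (KGE) techniques are commonly used to infer such missing information. However, reasoning over KGs can inadvertently expose sensitive user information, even when that data is not explicitly stored. In this paper, we investigate the privacy risks of KGE-based reasoning, focusing on attribute inference attacks in which adversaries attempt to deduce sensitive user attributes from seemingly non-sensitive outputs. We propose and evaluate a framework that mitigates these privacy risks by applying post-processing sanitization techniques. Preliminary results demonstrate the effectiveness of these attacks on the outputs of KGE models and explore the trade-off between recommendation quality and privacy protection when applying randomization-based techniques, highlighting the need for future work to experiment with more advanced techniques to address this issue.

English Abstract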

Knowledge Graphs (KGs) are a powerful representation of linked data, offering flexibility, semantic richness, and support for knowledge enrichment and reasoning. They help data owners organize and exploit heterogeneous data to provide insightful services (e.g., recommendations), yet real-world KGs are often incomplete, hiding true facts or missing valuable insights. Knowledge graph embedding techniques are commonly used to infer valuable missing information. However, reasoning over KGs can inadvertently expose sensitive user information, even when such data is not explicitly stored. In this work, we investigate the privacy risks associated with KGE-based reasoning, focusing on attribute inference attacks where adversaries attempt to deduce sensitive user attributes from seemingly non-sensitive outputs. We propose and evaluate a framework that mitigates these privacy risks by applying post-processing sanitization techniques to KGE outputs. Preliminary results demonstrate the effectiveness of these attacks on the outputs of KGE models, and explore the trade-off between recommendation quality and privacy protection when applying randomization-based approaches, highlighting the need to experiment with more advanced techniques in future work to address this issue.

2605.19641 2026-05-20 stat.ML cs.LG

Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

Ferdinand Genans, Erwan Scornet

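AI Summary: This paper studies how increasing missingness can reduce gradient bias. It proposes Richardson-SGD, a method based on Richardson extrapolation that deliberately raises the missingness rate on top of already incomplete data so that the leading gradient-bias term cancels, improving optimization and estimation with incomplete data.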

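AI Chinese Abstract

Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate, because imputation schemes generally introduce systematic gradient bias, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias under various imputation procedures, and we characterize exactly its dependence on the missingness-rate vector p, with O(||p||) as the leading term. We use this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values, based on Richardson extrapolation. The key idea is to deliberately add missingness: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic gradients to cancel the leading bias term. We prove that, under several missingness scenarios, one Richardson step reduces the gradient bias from O(||p||) to O(||p||²). The proposed method is computationally efficient, model-agnostic, and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when the missingness indicators are independent, the population gradient bias is a multilinear polynomial in p and depends only on the population gradient errors induced by declaring a single coordinate missing. In this case, our method generalizes to a multi-step Richardson procedure that recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines well with widely used imputation procedures such as MICE. These results suggest, somewhat counter-intuitively, that adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.

English Abstract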

Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to \emph{deliberately add missingness}: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic gradients to cancel the leading bias term. We prove that one Richardson step reduces the gradient bias from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios. Our proposed method is computationally efficient, model-agnostic and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when missing indicators are independent, the population gradient bias is a multilinear polynomial in $p$ and depends only on population gradient errors induced by declaring a single coordinate missing. In this case, our method generalizes to a multi-step Richardson procedure which recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines positively with widely used imputation procedures such as MICE. These results suggest that, somewhat counter-intuitively, adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.
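The debiasing step can be illustrated with first-order Richardson extrapolation. A minimal sketch, assuming independent per-entry masking and a doubling of the missingness rate (the abstract only says "a higher, controlled missingness level", so the ratio of 2 is our choice and requires p < 1/2): `extra_thinning_rate` gives the additional deletion probability that turns rate p into 2p, and `richardson_combine` forms 2 g(p) - g(2p). The helper names and the toy bias model below are invented for illustration, not taken from the paper.

```python
from typing import List, Sequence


def extra_thinning_rate(p: float, p_target: float) -> float:
    """Extra deletion probability r that raises an independent per-entry
    missingness rate from p to p_target.

    An entry survives both masks with probability (1 - p)(1 - r), so we
    solve 1 - (1 - p)(1 - r) = p_target for r.
    """
    if not 0.0 <= p <= p_target < 1.0:
        raise ValueError("need 0 <= p <= p_target < 1")
    return (p_target - p) / (1.0 - p)


def richardson_combine(g_p: Sequence[float], g_2p: Sequence[float]) -> List[float]:
    """First-order Richardson extrapolation with ratio 2 (assumed).

    If E[g(q)] = grad + B q + O(|q|^2), then
    E[2 g(p) - g(2p)] = grad + O(|p|^2): the linear bias term cancels.
    """
    return [2.0 * a - b for a, b in zip(g_p, g_2p)]


# Toy check with an invented scalar bias model: g(q) = grad + c*q + d*q^2.
grad, c, d, p = 1.0, 3.0, 5.0, 0.05


def biased(q: float) -> float:
    return grad + c * q + d * q * q


r = extra_thinning_rate(p, 2 * p)  # about 0.0526 extra deletions per observed entry
plain_bias = biased(p) - grad  # c*p + d*p^2, about 0.1625
combined = richardson_combine([biased(p)], [biased(2 * p)])[0]
extrapolated_bias = combined - grad  # -2*d*p^2, about -0.025
print(r, plain_bias, extrapolated_bias)
```

In a real SGD loop the two gradients would come from the same minibatch, once with its original mask and once after the extra thinning, each imputed before the gradient is computed.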

2605.19638 2026-05-20 cs.HC cs.AI cs.CY cs.SE

The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems

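The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems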

Rizwan Jahangir, Daisuke Ishii

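AI Summary: This paper proposes the Accessibility Capability Boundary (ACB), a theoretical framework for the core questions of the operational limits and expansion potential of AI-generated browser-native accessibility systems. Through analysis of empirical prototypes, it characterizes the reachable and unreachable regions of the accessibility capability space, providing a theoretical foundation for the scalability of autonomous accessibility computing.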

Comments 21 pages, 4 figures

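AI Chinese Abstract

As large language models (LLMs) become increasingly capable of synthesizing functional user interfaces, a fundamental question emerges in accessibility computing: how far can AI-driven accessibility systems go? This paper introduces the Accessibility Capability Boundary (ACB), a formal framework for reasoning about the operational limits and expansion potential of autonomous accessibility systems, and grounds the theory in a real-world systems artifact. Rather than treating accessibility as a binary compliance property, we model it as a dynamic, multidimensional capability space constrained by measurable variables, including deployment latency, cognitive load, infrastructure dependency, offline persistence, interaction complexity, and adaptability. We argue that AI-generated, browser-native systems built as single-file HTML artifacts using standard browser APIs may substantially expand the ACB by reducing deployment friction to near zero. Through formal definitions, propositions, and a comparative evaluation matrix, we characterize the regions of the accessibility capability space that such systems can and cannot reach. We further identify the remaining computational, infrastructural, and verification constraints that form the hard boundaries of this paradigm. This work provides a theoretical foundation for understanding the scalability limits of autonomous accessibility computing and proposes a research agenda for future work on accessibility-aware AI systems.

English Abstract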

As large language models (LLMs) demonstrate increasing competence in synthesizing functional user interfaces, a fundamental question emerges in accessibility computing: \textit{how far can AI-driven accessibility systems go?} This paper introduces the \textit{Accessibility Capability Boundary} (ACB), a formal framework for reasoning about the operational limits and expansion potential of autonomous accessibility systems, and grounds this theory in a real-world systems artifact. We model accessibility not as a binary compliance property but as a dynamic, multidimensional capability space constrained by measurable variables including deployment latency, cognitive load, infrastructure dependency, offline persistence, interaction complexity, and adaptability. We argue that AI-generated, browser-native systems constructed as single-file HTML artifacts leveraging standard browser APIs may dramatically shift the ACB outward by reducing deployment friction to near-zero and enabling rapid, context-specific interface adaptation. We ground our theoretical framework in the analysis of two real-world exploratory prototypes. The first is an AI-generated browser-native accessibility interface deployed for a blind user in Nepal. The second is a fully functional, open-source webcam alignment assistant for visually impaired users, serving as a concrete systems artifact. Through formal definitions, propositions, and a comparative evaluation matrix, we characterize the regions of the accessibility capability space that such systems can and cannot reach. We further identify remaining computational, infrastructural, and verification constraints that constitute the hard boundaries of this paradigm. This work contributes a theoretical foundation for understanding the scalable limits of autonomous accessibility computing and proposes a research agenda for future work in accessibility-aware AI systems.

2605.19632 2026-05-20 cs.LO cs.SD

Executable Boundary Contracts for Sound Event Traces

Executable Boundary Contracts for Sound Event Traces

Faruk Alpay, Hamdi Alakkad

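AI Summary: This paper proposes executable boundary contracts for measuring finite sound event traces. By defining a frame fragment, an event layer, and associated constraints, it evaluates timed boundary behavior with the goal of making sound event reporting more accurate.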

Comments 39 pages. Finite frame core code, tables, manifests, and Lean checks are ancillary material

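AI Chinese Abstract

Sound event reports often compress timed boundary behavior into frame, segment, or event scores. This paper defines executable boundary contracts for finite sound event traces. The frame fragment is a bounded Boolean fragment that can be embedded in STL after grid projection. The event layer adds declared interval matching, duration clauses, fragmentation clauses, and obligation-restricted vector scoring. The aim is measurement, not a new general temporal logic or a challenge leaderboard. The artifact evaluates controlled Mini LibriSpeech seeded scenes, MAESTRO Real soundscapes, frozen pretrained timing probes, and an official DCASE 2024 Task 4 baseline track. Across these tracks, standard scores and contract coordinates disagree in interpretable ways. The strongest real-corpus finding is that union activity can hide typed boundary failure, while external DCASE outputs provide a class-indexed, challenge-level reference. Code, generated tables, manifests, and Lean checks for the finite frame core are supplied as ancillary material.

English Abstract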

Sound event reports often compress timed boundary behavior into frame, segment, or event scores. This paper defines executable boundary contracts for finite sound event traces. The frame fragment is a bounded Boolean fragment embeddable in STL after grid projection. The event layer adds declared interval matching, duration clauses, fragmentation clauses, and obligation-restricted vector scoring. The aim is measurement, not a new general temporal logic and not a challenge leaderboard. The artifact evaluates controlled Mini LibriSpeech seeded scenes, MAESTRO Real soundscapes, frozen pretrained timing probes, and an official DCASE 2024 Task 4 baseline track. Across these tracks, standard scores and contract coordinates disagree in interpretable ways. The strongest real-corpus finding is that union activity can hide typed boundary failure, while external DCASE outputs provide a class-indexed, challenge-level reference. Code, generated tables, manifests, and Lean checks for the finite frame core are supplied as ancillary material.
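For readers unfamiliar with event-level clauses, here is a small sketch of what an interval-matching clause and a fragmentation clause can look like for one reference event of one class. The function `check_event`, the 0.2 s onset collar, and the one-fragment limit are illustrative assumptions, not the paper's contract definitions or its Lean-checked core.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share any time."""
    return a[0] < b[1] and b[0] < a[1]


def check_event(
    ref: Interval,
    estimates: Sequence[Interval],
    collar: float = 0.2,
    max_fragments: int = 1,
) -> Dict[str, object]:
    """Evaluate two illustrative clauses for one reference event (hypothetical).

    onset_ok: some overlapping estimate starts within `collar` seconds of the
        reference onset (an interval-matching clause).
    fragmentation_ok: the reference overlaps at most `max_fragments`
        estimated segments (a fragmentation clause).
    """
    hits = [e for e in estimates if overlaps(ref, e)]
    onset_ok = any(abs(e[0] - ref[0]) <= collar for e in hits)
    return {
        "onset_ok": onset_ok,
        "fragments": len(hits),
        "fragmentation_ok": len(hits) <= max_fragments,
    }


# A 1.0-3.0 s reference detected as two pieces: the onset clause passes but the
# fragmentation clause fails, although 1.7 s of the 2.0 s event is covered.
print(check_event((1.0, 3.0), [(1.1, 1.8), (2.0, 3.0)]))
```

Returning each clause separately, rather than one aggregate score, is what lets such a report show where a frame-level metric and a boundary-level check disagree.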

2605.19629 2026-05-20 stat.ML cs.LG math.OC

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

Ilya Levin, Maksim Shuklin, Eric Moulines, Paul Mangold, Sergey Samsonov

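AI Summary: This paper establishes Berry-Esseen-type bounds for federated linear stochastic approximation, giving the first federated Gaussian approximation that explicitly captures the communication-computation trade-off and the heterogeneity error terms, and quantifies how the local step size, the number of local updates, and heterogeneity affect the convergence rate.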
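The roles of the local step size, the number of local updates, and heterogeneity can be seen in a toy federated linear stochastic approximation loop. The sketch below is a scalar FedAvg-style scheme chosen for illustration and run noise-free by default; the listing does not describe the exact algorithm analyzed (for example, whether it includes a bias-correction mechanism), and `fed_lsa` is a hypothetical name. With heterogeneous agents and more than one local step, the averaged iterate settles away from the solution of the averaged system, which is the kind of heterogeneity error term such bounds must account for.

```python
import random
from typing import Sequence, Tuple


def fed_lsa(
    agents: Sequence[Tuple[float, float]],
    step: float = 0.1,
    local_steps: int = 5,
    rounds: int = 200,
    noise: float = 0.0,
    seed: int = 0,
    theta0: float = 0.0,
) -> float:
    """Scalar federated linear stochastic approximation (FedAvg-style sketch).

    Agent c holds the linear system a_c * theta = b_c. Each round, every agent
    starts from the shared iterate and runs `local_steps` updates
    theta <- theta - step * (a_c * theta - b_c + noise * N(0, 1));
    the server then averages the local iterates (one communication round).
    """
    rng = random.Random(seed)
    theta = theta0
    for _ in range(rounds):
        local_iterates = []
        for a, b in agents:
            x = theta
            for _ in range(local_steps):
                x -= step * (a * x - b + noise * rng.gauss(0.0, 1.0))
            local_iterates.append(x)
        theta = sum(local_iterates) / len(local_iterates)
    return theta


# Target: the solution of the averaged system, mean(a) * theta = mean(b).
heterogeneous = [(1.0, 1.0), (3.0, 9.0)]  # averaged system: 2 * theta = 5
print(fed_lsa(heterogeneous, local_steps=1))  # about 2.5
print(fed_lsa(heterogeneous, local_steps=5))  # about 2.34: heterogeneity bias
```

More local steps cut communication per unit of computation but pull each agent toward its own solution (here 1 and 3) before averaging, which is the trade-off the bounds quantify.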

2605.19621 2026-05-20 eess.IV cs.LG cs.NA math.NA

Diffusion Graph Posterior Sampling for Nonlinear Inverse Problems with Application to Electrical Impedance Tomography

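A Diffusion Posterior Sampling Method for Nonlinear Inverse Problems on Graph-Structured Data, with Application to Electrical Impedance Tomography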

Giovanni S. Alberti, Damiana Lazzaro, Serena Morigi, Matteo Santacesaria, Shibo Wang

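AI Summary: This paper proposes a framework that extends diffusion posterior sampling (DPS) to graph-structured data. It develops an unconditional score-based diffusion model on 2D triangular meshes to learn an accurate prior over the physical solution space, and introduces a regularized variant, RDPS, which incorporates explicit regularization terms such as total variation and generalized Tikhonov to mitigate severe ill-posedness. Experiments show that RDPS produces stable, physically plausible reconstructions on synthetic and real 2D EIT datasets.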

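AI Chinese Abstract

Deep generative models have become the state of the art for solving inverse problems, but applying them to PDE inverse problems such as electrical impedance tomography (EIT) remains challenging. Because physical domains are naturally discretized as unstructured meshes rather than regular grids, standard convolutional architectures are often inadequate. This paper proposes a new framework that extends diffusion posterior sampling (DPS) to graph-structured data. We develop an unconditional score-based diffusion model directly on a 2D triangular mesh to learn an accurate prior over the physical solution space. In addition, we introduce a regularized variant, RDPS, which incorporates explicit regularization terms such as total variation and generalized Tikhonov to complement the implicit diffusion prior and mitigate severe ill-posedness. Extensive experiments on synthetic and real 2D EIT datasets show that RDPS produces stable, physically plausible reconstructions. Our method generalizes well to out-of-distribution inclusion geometries, is highly robust to measurement noise, and outperforms current state-of-the-art solvers (e.g., GPnP-BM3D, DP-SGS) in reconstruction accuracy and artifact reduction.

English Abstract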

Deep generative models have emerged as the state of the art for solving inverse problems, but applying them to inverse problems for PDEs, such as electrical impedance tomography (EIT), remains challenging. Because physical domains are naturally discretized as unstructured meshes rather than regular grids, standard convolutional architectures are often inadequate. In this paper, we propose a novel framework that extends diffusion posterior sampling (DPS) to graph-structured data. We develop an unconditional score-based diffusion model directly on a 2D triangular mesh to learn an accurate prior over the physical solution space. Furthermore, we introduce a regularized variant, RDPS, which incorporates explicit regularization terms, such as total variation and generalized Tikhonov, to complement the implicit diffusion prior and mitigate severe ill-posedness. Extensive experiments on synthetic and real 2D EIT datasets demonstrate that RDPS produces stable, physically plausible reconstructions. Our approach generalizes well to out-of-distribution inclusion geometries, is highly robust to measurement noise, and outperforms current state-of-the-art solvers (e.g., GPnP-BM3D, DP-SGS) in reconstruction accuracy and artifact reduction.
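The explicit regularizers named in the abstract are easy to state on a mesh graph whose nodes carry conductivity values. A minimal sketch, assuming node-based values and unweighted mesh edges: graph total variation sums |sigma_i - sigma_j| over edges, and a gradient-type Tikhonov term sums the squared differences (sigma^T L sigma for the graph Laplacian L). The paper's generalized Tikhonov operator, any edge weights, and how RDPS injects these terms into the sampler are not specified here; the helper names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def mesh_edges(triangles: Sequence[Tuple[int, int, int]]) -> List[Edge]:
    """Unique undirected edges (i < j) of a triangular mesh, i.e. its graph."""
    edges: Set[Edge] = set()
    for i, j, k in triangles:
        for a, b in ((i, j), (j, k), (k, i)):
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)


def graph_tv(sigma: Sequence[float], edges: Sequence[Edge]) -> float:
    """Anisotropic graph total variation: sum of |sigma_i - sigma_j| over edges."""
    return sum(abs(sigma[i] - sigma[j]) for i, j in edges)


def graph_tikhonov(sigma: Sequence[float], edges: Sequence[Edge]) -> float:
    """Gradient-type Tikhonov term: sum of (sigma_i - sigma_j)^2 over edges,
    which equals sigma^T L sigma for the unweighted graph Laplacian L."""
    return sum((sigma[i] - sigma[j]) ** 2 for i, j in edges)


# Two triangles sharing edge (1, 2); a single "inclusion" at node 1.
edges = mesh_edges([(0, 1, 2), (1, 2, 3)])
sigma = [0.0, 2.0, 0.0, 0.0]
print(edges)  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(graph_tv(sigma, edges), graph_tikhonov(sigma, edges))  # 6.0 12.0
```

TV penalizes jumps linearly and so tolerates sharp inclusion boundaries, while the quadratic term favors smooth fields; pairing either with a learned prior is the general idea the abstract describes.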

2605.19610 2026-05-20 stat.ML cs.LG

Posterior Contraction of Lévy Adaptive B-spline Regression in Besov Spaces

Posterior Contraction of Lévy Adaptive B-Spline Regression in Besov Spaces

Jeunghun Oh, Sewon Park, Jaeyong Lee

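AI Summary: This paper studies the posterior contraction of the Lévy Adaptive B-spline regression model in Besov spaces, proving that, within a nonparametric regression framework, the posterior contracts around the true function at nearly optimal rates while adapting automatically to unknown smoothness.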

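AI Chinese Abstract

We investigate the asymptotic properties of the Lévy Adaptive B-spline (LABS) regression model, a Bayesian nonparametric method that incorporates B-spline kernels into the Lévy Adaptive Regression Kernel (LARK) model. LABS uses splines of varying degrees with independently defined knots, yielding a flexible model class that can adapt to irregular and locally structured features of the true function. Within the nonparametric regression framework with univariate random design and Gaussian errors, we prove that the LABS posterior contracts around the true function in Besov classes at nearly minimax-optimal rates, up to a logarithmic factor, while adapting automatically to unknown smoothness. This study fills a gap in the literature, where theoretical results on posterior contraction of the LARK model in Besov spaces remain scarce. Simulation experiments on standard test functions in Besov spaces, including Blocks, Bumps, HeaviSine, and Doppler, complement the theoretical results and demonstrate the practical utility of LABS.

English Abstract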

We investigate the asymptotic properties of the Lévy Adaptive B-spline (LABS) regression model, a Bayesian nonparametric method that incorporates B-spline kernels into the Lévy Adaptive Regression Kernel (LARK) model. LABS applies splines of varying degrees with independently defined knots, yielding a flexible model class capable of adapting to irregular and locally structured features of the true function. Within the nonparametric regression framework with univariate random design and Gaussian errors, we establish that the LABS posterior contracts around the true function in Besov classes at nearly minimax-optimal rates, up to a logarithmic factor, while adapting automatically to unknown smoothness. This study contributes to filling a gap in the literature, where theoretical results on posterior contraction of the LARK model in Besov spaces remain scarce. Simulation experiments on standard test functions in Besov spaces, including Blocks, Bumps, HeaviSine, and Doppler, complement the theoretical results and demonstrate the practical utility of LABS.
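Each LABS kernel is a B-spline, which can be evaluated with the Cox-de Boor recursion. A minimal sketch, assuming each kernel is a single B-spline basis function of degree k supported on its own k + 2 knots; the Lévy random measure that places the kernels, weights, and knots (the Bayesian part of the model) is not shown, and `kernel_sum` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def bspline_basis(i: int, k: int, t: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion: value at t of the i-th B-spline of degree k.

    Uses half-open intervals [knots[i], knots[i+1]), so t equal to the last
    knot evaluates to 0. Terms with a zero-length knot span (0/0) count as 0.
    """
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + k] - knots[i]
    if denom > 0:
        left = (t - knots[i]) / denom * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    denom = knots[i + k + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + k + 1] - t) / denom * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


def kernel_sum(t: float, terms: Sequence[Tuple[float, int, Sequence[float]]]) -> float:
    """LARK/LABS-style mean function (hypothetical helper): a finite weighted
    sum of B-spline kernels, each term given as (weight, degree, knots) with
    len(knots) == degree + 2, so every kernel has its own degree and knots."""
    return sum(w * bspline_basis(0, k, t, knots) for w, k, knots in terms)


knots = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Quadratic basis functions sum to 1 on [knots[2], knots[4]) = [2, 4).
print(sum(bspline_basis(i, 2, 2.5, knots) for i in range(4)))  # 1.0
# A hat-shaped (degree 1) kernel with weight 2 mixed with a quadratic kernel.
print(kernel_sum(1.0, [(2.0, 1, [0.0, 1.0, 2.0]), (0.5, 2, [3.0, 4.0, 5.0, 6.0])]))
```

Letting the degree vary per kernel is what allows such a sum to combine jumps (degree 0), kinks (degree 1), and smooth bumps (degree 2 and higher) in one fit.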

2605.19565 2026-05-20 physics.flu-dyn cs.LG

HiLiftAeroML: High-Fidelity Computational Fluid Dynamics Dataset for High-Lift Aircraft Aerodynamics

HiLiftAeroML: A High-Fidelity Computational Fluid Dynamics Dataset for High-Lift Aircraft Aerodynamics

Neil Ashton, Adam Clark, Liam Heidt, Christopher Ivey, Sanjeeb Bose, Rahul Agrawal, Konrad Goc, Rishi Ranade, Corey Adams, Peter Sharpe, Sheel Nidhan, Semit Akkurt, Daniel Leibovici, Jean Kossaifi

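AI Summary: This paper presents the first open-source high-fidelity computational fluid dynamics dataset for AI surrogate model development. The dataset contains 1800 samples, arising from 180 geometry variants and 10 angles of attack of the NASA Common Research Model (CRM) geometry used in the AIAA High-Lift Prediction Workshop series. Its novelty is that every simulation uses a GPU-accelerated, high-fidelity, explicit, wall-modeled LES approach on adaptive grids of 300M to 500M cells, ensuring the greatest possible accuracy given the known challenges of steady-state RANS approaches in these portions of the flight envelope. The entire dataset (geometries, time-averaged volume and surface variables, and integral forces) is freely available under a permissive open-source license (CC-BY-4.0). By releasing the data publicly, the authors aim to accelerate research and development of AI surrogate modeling in the aerospace industry.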

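AI Chinese Abstract

This paper describes the first open-source high-fidelity computational fluid dynamics dataset for AI surrogate model development. The dataset consists of 1800 samples, arising from 180 geometry variants and 10 angles of attack of the high-lift NASA Common Research Model (CRM) geometry used in the AIAA High-Lift Prediction Workshop series. One novelty of the dataset is that each simulation uses a GPU-accelerated, high-fidelity, explicit, wall-modeled LES approach on adaptive grids of 300M to 500M cells. This ensures the greatest possible accuracy given the known challenges of steady-state RANS approaches in these portions of the flight envelope. The entire dataset (geometries, time-averaged volume and surface variables, and integral forces) is freely available under a permissive open-source license (CC-BY-4.0). By making the data publicly available, we aim to accelerate research and development of AI surrogate modeling in the aerospace industry.

English Abstract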

This paper describes the first-ever open-source high-fidelity CFD dataset of a high-lift aircraft for the purpose of AI surrogate model development. The dataset is composed of 1800 samples, arising from 180 geometry variants and 10 angles of attack for the high-lift NASA Common Research Model (CRM) geometry, used within the AIAA High-Lift Prediction Workshop series. One of the novelties of this dataset is the use of a GPU-accelerated high-fidelity explicit, wall-modeled LES approach for each simulation, using solution-adapted grids between 300M and 500M cells. This ensures the greatest possible accuracy given known challenges in steady-state RANS approaches for these portions of the flight envelope. The entire dataset (geometries, time-averaged volume and surface variables, and integral forces) is available free of charge under a permissive open-source license (CC-BY-4.0). By making this data publicly available, we aim to accelerate the research and development of AI surrogate modeling within the aerospace industry.

2605.19557 2026-05-20 stat.ML cs.LG

Density-Ratio Losses for Post-Hoc Learning to Defer

基于密度比损失的后验学习延迟

Alexander Soen, Ragnar Thobaben, Joakim Jaldén, Richard Nock

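AI Summary: This paper studies post-hoc learning to defer (L2D). It defines deferral through the lens of ideal distributions and proposes density-ratio-based CPE losses; deferral decisions are made by thresholding a scorer, so the deferral rate can be adjusted without retraining. It also reveals a connection between Chow's rule and an expert-tilted Bayes posterior.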

Comments Preprint

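AI Chinese Abstract

We study post-hoc learning to defer (L2D) through the lens of ideal distributions, defined as divergence-regularized reweightings of the data distribution under which a model attains low loss. Using the reduction from density-ratio estimation to class-probability estimation, we derive DR CPE losses for post-hoc L2D scorers. Deferral decisions are made by thresholding the scorer, allowing the deferral rate to be adjusted without retraining. For KL-based ideal distributions, our deferral rule recovers either Chow's rule under the original distribution or a connection to an expert-tilted Bayes posterior, depending on whether the ideal distributions are joint or marginal distributions. Experiments show that our approach is competitive with common baselines and more robust across dataset settings. More broadly, our results cast post-hoc L2D as density-ratio learning between ideal distributions, bridging Chow-style rules and expert comparison and clarifying connections to related learning settings such as anomaly detection.

English Abstract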

We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density-ratio between a model's and an expert's ideals. Using the reduction from density-ratio estimation to class-probability estimation, we derive the DR CPE losses for post-hoc L2D scorers. Deferral decisions are then made by thresholding the scorer, allowing deferral rates to be adjusted without retraining. For KL-based ideal distributions, our deferral rule recovers either Chow's rule under the original distribution or a connection to an expert-tilted Bayes posterior (which incorporates the expert's performance), depending on whether the ideal distributions are joint or marginal distributions. Experimentally, our approach is competitive with common baselines and more robust across dataset settings. More broadly, our results cast post-hoc L2D as density-ratio learning between ideal distributions, bridging Chow-style rules and expert comparison, and elucidating connections to related learning settings including anomaly detection.
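Adjusting the deferral rate without retraining amounts to moving a threshold on a fixed scorer. A minimal sketch, assuming a real-valued scorer where larger values mean "defer" (for instance, an estimate of a density ratio favoring the expert): `threshold_for_rate` picks the threshold that defers the highest-scoring fraction, and Chow's classical rule (defer when the top-class posterior is below 1 - c, for deferral cost c) is included for reference. All helpers are hypothetical and do not implement the DR CPE losses.

```python
import math
from typing import List, Sequence


def chow_defer(probs: Sequence[float], cost: float) -> bool:
    """Chow's rule: defer (reject) when the top-class posterior is below 1 - cost."""
    return max(probs) < 1.0 - cost


def threshold_for_rate(scores: Sequence[float], target_rate: float) -> float:
    """Threshold that defers the round(target_rate * n) highest scores.

    Ties at the threshold are all deferred, so the realized rate can be
    slightly higher than requested.
    """
    k = round(target_rate * len(scores))
    if k <= 0:
        return math.inf  # defer nothing
    return sorted(scores, reverse=True)[k - 1]


def defer_decisions(scores: Sequence[float], threshold: float) -> List[bool]:
    """Defer every example whose score reaches the threshold."""
    return [s >= threshold for s in scores]


# Same scorer outputs, two operating points, no retraining.
scores = [0.1, 0.9, 0.4, 0.7, 0.2]
for rate in (0.2, 0.4):
    t = threshold_for_rate(scores, rate)
    print(rate, t, defer_decisions(scores, t))
```

On held-out data the threshold would be calibrated once per desired operating point; the scorer itself is never retrained.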

2605.19551 2026-05-20 cs.GR cs.CV

AnchorFlow: Editable SVG Reconstruction via Sparse Anchor Point Fields

AnchorFlow: Editable SVG Reconstruction via Sparse Anchor Point Fields

Mengnan Jiang, Christian Franke, Michele Franco Adesso, Antonio Haas, Grace Li Zhang

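AI Summary: This paper proposes AnchorFlow, a framework that performs path-level anchor placement via sparse anchor point fields to balance fidelity and editability in image-to-SVG reconstruction. Experiments show that it substantially reduces editable complexity while maintaining high quality.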

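AI Chinese Abstract

Image-to-SVG reconstruction aims to produce vector graphics that are faithful to raster inputs and easy to edit. Existing methods face a structural trade-off in how vector structure is parameterized, including how many paths represent an image and how many anchor points define each path. High-fidelity methods often rely on many paths or densely parameterized curves, whereas overly compact SVG generation may deviate from the input geometry. This problem becomes more pronounced when local raster evidence is imperfect, where boundary-following reconstruction can introduce redundant anchors and fragmented structures. We argue that this trade-off should be addressed at the level of anchor placement, because the anchors on Bézier curves define local path structure and strongly affect both accuracy and editability. We propose AnchorFlow, an editable SVG reconstruction framework that models path-level anchor placement with sparse anchor point fields. Given path-like foreground components extracted from a raster image, AnchorFlow predicts an image-conditioned sparse anchor field for each component and resolves it into an ordered Bézier path. Rendering-guided feedback then corrects local structural errors before re-resolution. The recovered paths are then assembled and optimized into the final SVG. Experiments on isolated paths and full images show that AnchorFlow achieves a favorable trade-off between fidelity and editability, substantially reducing editable complexity while preserving competitive raster fidelity.

English Abstract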

Image-to-SVG reconstruction aims to produce vector graphics that are faithful to raster inputs and easy to edit. Existing methods face a structural trade-off in how vector structure is parameterized, including how many paths represent an image and how many anchor points define each path. High-fidelity methods often rely on many paths or densely parameterized curves, whereas overly compact SVG generation may deviate from the input geometry. This issue becomes more pronounced when local raster evidence is imperfect, where boundary-following reconstruction can introduce redundant anchors and fragmented structures. We argue that this trade-off should be addressed at the level of anchor placement, since anchors on Bezier curves define local path structure and strongly affect both accuracy and editability. We propose AnchorFlow, an editable SVG reconstruction framework that models path-level anchor placement with sparse anchor point fields. Given path-like foreground components extracted from a raster image, AnchorFlow predicts an image-conditioned sparse anchor field for each component and resolves it into an ordered Bezier path. Rendering-guided feedback then corrects local structural errors before re-resolution. The recovered paths are then assembled and optimized into the final SVG. Experiments on isolated paths and full images show that AnchorFlow achieves a favorable fidelity-editability trade-off, substantially reducing editable complexity while preserving competitive raster fidelity.
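Resolving an ordered set of anchors into a Bézier path ends in an SVG `d` string. The abstract does not say how AnchorFlow chooses control handles, so this sketch substitutes uniform Catmull-Rom tangents (c1 = P1 + (P2 - P0)/6, c2 = P2 - (P3 - P1)/6), which make the curve pass through every anchor; `catmull_rom_segments`, `anchors_to_svg_path`, and `cubic_point` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point, Point]  # (control 1, control 2, end anchor)


def catmull_rom_segments(anchors: Sequence[Point], closed: bool = True) -> List[Segment]:
    """Cubic Bezier segments through ordered anchors, with handles taken from
    uniform Catmull-Rom tangents (an assumed stand-in for learned handles)."""
    n = len(anchors)

    def at(j: int) -> Point:
        # Closed paths wrap around; open paths repeat their end anchors.
        return anchors[j % n] if closed else anchors[min(max(j, 0), n - 1)]

    segments = []
    for i in range(n if closed else n - 1):
        p0, p1, p2, p3 = at(i - 1), at(i), at(i + 1), at(i + 2)
        c1 = (p1[0] + (p2[0] - p0[0]) / 6, p1[1] + (p2[1] - p0[1]) / 6)
        c2 = (p2[0] - (p3[0] - p1[0]) / 6, p2[1] - (p3[1] - p1[1]) / 6)
        segments.append((c1, c2, p2))
    return segments


def anchors_to_svg_path(anchors: Sequence[Point], closed: bool = True) -> str:
    """Serialize the segments as an SVG path `d` string (M, C, Z commands)."""
    def fmt(p: Point) -> str:
        return f"{p[0]:g} {p[1]:g}"

    parts = [f"M {fmt(anchors[0])}"]
    for c1, c2, end in catmull_rom_segments(anchors, closed):
        parts.append(f"C {fmt(c1)} {fmt(c2)} {fmt(end)}")
    if closed:
        parts.append("Z")
    return " ".join(parts)


def cubic_point(p0: Point, c1: Point, c2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier at t in [0, 1] with de Casteljau's algorithm."""
    def lerp(a: Point, b: Point) -> Point:
        return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

    a, b, c = lerp(p0, c1), lerp(c1, c2), lerp(c2, p3)
    d, e = lerp(a, b), lerp(b, c)
    return lerp(d, e)


print(anchors_to_svg_path([(0, 0), (6, 0)], closed=False))  # M 0 0 C 1 0 5 0 6 0
print(anchors_to_svg_path([(0, 0), (10, 0), (10, 10), (0, 10)]))
```

Editing cost scales with the number of `C` commands, so fewer, well-placed anchors directly shorten the path string a designer has to manipulate.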
