arXivDaily: arXiv daily academic digest, updated Monday through Friday
0910.0921 2026-06-03 cs.LG cs.NA math.NA

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison

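Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

AI summary The paper quantitatively compares three leading low-rank matrix completion algorithms (OptSpace, ADMiRA, and FPCA) under noisy observations on a single simulation platform, and shows that they reconstruct both real and randomly generated matrices accurately.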

Comments 7 pages, 7 figures, 47th Allerton Conference on Communication Control and Computing, 2009, invited paper

Abstract

We consider a problem of significant practical importance, namely, the reconstruction of a low-rank data matrix from a small subset of its entries. This problem appears in many areas such as collaborative filtering, computer vision and wireless sensor networks. In this paper, we focus on the matrix completion problem in the case when the observed samples are corrupted by noise. We compare the performance of three state-of-the-art matrix completion algorithms (OptSpace, ADMiRA and FPCA) on a single simulation platform and present numerical results. We show that in practice these efficient algorithms can be used to reconstruct real data matrices, as well as randomly generated matrices, accurately.
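
As a rough illustration of the reconstruction task (this is not OptSpace, ADMiRA, or FPCA), the sketch below completes a noiseless rank-1 matrix by alternating least squares over the observed entries only. The function name `complete_rank1`, the toy matrix, and the iteration count are assumptions made for this example.

```python
def complete_rank1(observed, n_rows, n_cols, iters=200):
    """Fit M[i][j] ~ u[i] * w[j] to the observed entries by alternating least squares.

    `observed` maps (row, col) -> value. Illustrative only: rank 1, no noise handling.
    """
    u = [1.0] * n_rows
    w = [1.0] * n_cols
    for _ in range(iters):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            num = sum(v * w[j] for (r, j), v in observed.items() if r == i)
            den = sum(w[j] ** 2 for (r, j) in observed if r == i)
            if den > 0:
                u[i] = num / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            num = sum(v * u[i] for (i, c), v in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j)
            if den > 0:
                w[j] = num / den
    return [[u[i] * w[j] for j in range(n_cols)] for i in range(n_rows)]


# Toy rank-1 matrix a * b^T with a = (1, 2, 3), b = (1, 2, 4); two entries are hidden.
a, b = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
hidden = {(0, 1), (2, 2)}
observed = {(i, j): a[i] * b[j]
            for i in range(3) for j in range(3) if (i, j) not in hidden}
estimate = complete_rank1(observed, 3, 3)
```

Here `estimate[0][1]` and `estimate[2][2]` are the recovered hidden entries. With noisy observations the same loop would only fit the data approximately, which is the regime the paper studies.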

0909.5000 2026-06-03 cs.LG cs.NA cs.NE math.NA

Eignets for function approximation on manifolds

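H. N. Mhaskar

AI summary For function approximation on compact smooth Riemannian manifolds, the paper proposes a deterministic, universal algorithm for constructing eignets (linear combinations of kernel functions), gives estimates of the optimal degree of approximation, and proves bounds on the coefficients and optimality of derivative approximation.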

Comments 28 pages. Articles in press; Applied and Computational Harmonic Analysis, 2009

Abstract

Let $\XX$ be a compact, smooth, connected, Riemannian manifold without boundary, $G:\XX\times\XX\to \RR$ be a kernel. Analogous to a radial basis function network, an eignet is an expression of the form $\sum_{j=1}^M a_jG(\circ,y_j)$, where $a_j\in\RR$, $y_j\in\XX$, $1\le j\le M$. We describe a deterministic, universal algorithm for constructing an eignet for approximating functions in $L^p(\mu;\XX)$ for a general class of measures $\mu$ and kernels $G$. Our algorithm yields linear operators. Using the minimal separation amongst the centers $y_j$ as the cost of approximation, we give modulus of smoothness estimates for the degree of approximation by our eignets, and show by means of a converse theorem that these are the best possible for every \emph{individual function}. We also give estimates on the coefficients $a_j$ in terms of the norm of the eignet. Finally, we demonstrate that if any sequence of eignets satisfies the optimal estimates for the degree of approximation of a smooth function, measured in terms of the minimal separation, then the derivatives of the eignets also approximate the corresponding derivatives of the target function in an optimal manner.

0906.0311 2026-06-03 cs.AI cs.NA math.NA physics.data-an

Solar radiation forecasting using ad-hoc time series preprocessing and neural networks

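Christophe Paoli, Cyril Voyant, Marc Muselli, Marie-Laure Nivet

AI summary The paper proposes a method for daily prediction of global solar radiation on a horizontal surface that combines ad-hoc time series preprocessing with a Multi-Layer Perceptron (MLP), reaching nRMSE < 21% and RMSE < 998 Wh/m²; its predictions are similar to or better than those of conventional methods such as ARIMA and Bayesian inference.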

Comments 14 pages, 8 figures, 2009 International Conference on Intelligent Computing

Abstract

In this paper, we present an application of neural networks in the renewable energy domain. We have developed a methodology for the daily prediction of global solar radiation on a horizontal surface. We use an ad-hoc time series preprocessing step and a Multi-Layer Perceptron (MLP) to predict solar radiation at a daily horizon. First results are promising, with nRMSE < 21% and RMSE < 998 Wh/m². Our optimized MLP yields predictions similar to or even better than those of conventional methods such as ARIMA techniques, Bayesian inference, Markov chains and k-Nearest-Neighbors approximators. Moreover, we found that our data preprocessing approach can significantly reduce forecasting errors.
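
For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute them. The abstract does not state how nRMSE is normalized; dividing by the mean observed value is an assumption here, and the radiation values are made up for illustration.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def nrmse(predicted, observed):
    """RMSE divided by the mean observed value (one common convention; assumed here)."""
    return rmse(predicted, observed) / (sum(observed) / len(observed))


# Made-up daily global radiation values in Wh/m^2, for illustration only.
observed = [4000.0, 5000.0, 6000.0]
predicted = [4300.0, 4600.0, 6000.0]
error = rmse(predicted, observed)
relative_error = nrmse(predicted, observed)
```

Other conventions normalize by the range or the standard deviation of the observations, so reported nRMSE values are only comparable when the convention matches.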

0804.1046 2026-06-03 cs.CV cs.CG cs.GR cs.NA math.NA

Discrete schemes for Gaussian curvature and their convergence

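Zhiqiang Xu, Guoliang Xu

AI summary The paper surveys several discrete schemes for Gaussian curvature, presents a new scheme and proves that it converges at regular vertices with valence not less than 5, shows by counterexample that no convergent discrete scheme can be built for regular vertices with valence 4, and compares the asymptotic errors of several schemes.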

Abstract

In this paper, several discrete schemes for Gaussian curvature are surveyed. The convergence property of a modified discrete scheme for the Gaussian curvature is considered. Furthermore, a new discrete scheme for Gaussian curvature is presented. We prove that the new scheme converges at regular vertices with valence not less than 5. By constructing a counterexample, we also show that it is impossible to build a discrete scheme for Gaussian curvature that converges at regular vertices with valence 4. Finally, the asymptotic errors of several discrete schemes for Gaussian curvature are compared.
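
To make "discrete scheme" concrete, here is a minimal sketch of the classical angle-deficit estimate, which divides 2π minus the sum of the angles at a vertex by one third of the area of the incident triangles. Whether this matches any particular scheme studied in the paper is an assumption; the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def angle_deficit_curvature(p, ring):
    """Estimate Gaussian curvature at vertex p from its cyclically ordered one-ring."""
    angle_sum = 0.0
    area_sum = 0.0
    for k in range(len(ring)):
        e1 = _sub(ring[k], p)
        e2 = _sub(ring[(k + 1) % len(ring)], p)
        cos_theta = _dot(e1, e2) / math.sqrt(_dot(e1, e1) * _dot(e2, e2))
        # Clamp to guard acos against rounding slightly outside [-1, 1].
        angle_sum += math.acos(max(-1.0, min(1.0, cos_theta)))
        n = _cross(e1, e2)
        area_sum += 0.5 * math.sqrt(_dot(n, n))
    # Angle deficit divided by one third of the incident triangle area.
    return (2.0 * math.pi - angle_sum) / (area_sum / 3.0)


# A flat hexagonal fan (valence 6) should give zero curvature.
flat_ring = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0)
             for k in range(6)]
k_flat = angle_deficit_curvature((0.0, 0.0, 0.0), flat_ring)

# Lowering the ring turns the fan into a cone tip with positive curvature.
cone_ring = [(x, y, -1.0) for (x, y, _) in flat_ring]
k_cone = angle_deficit_curvature((0.0, 0.0, 0.0), cone_ring)
```

The valence of the vertex is simply `len(ring)`, which is the quantity the convergence results above are stated in terms of.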

0712.4126 2026-06-03 cs.AI cs.CE cs.MS cs.NA cs.NE math.NA

TRUST-TECH based Methods for Optimization and Learning

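Chandan K. Reddy

AI summary For nonlinear and global optimization problems in machine learning, the thesis proposes a TRUST-TECH based framework that alternates local and neighborhood-search stages, reducing sensitivity to initialization and improving solution quality.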

Comments PHD Thesis

Journal ref
Chandan K. Reddy, TRUST-TECH based Methods for Optimization and Learning, PHD Thesis, Cornell University, February 2007

Abstract

Many problems that arise in the machine learning domain deal with nonlinearity and quite often demand that users obtain global optimal solutions rather than local optimal ones. Optimization problems are inherent in machine learning algorithms and hence many methods in machine learning were inherited from the optimization literature. In what is popularly known as the initialization problem, the set of parameters obtained depends significantly on the given initialization values. The recently developed TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) methodology systematically explores the subspace of the parameters to obtain a complete set of local optimal solutions. In this thesis work, we propose TRUST-TECH based methods for solving several optimization and machine learning problems. Two stages, namely the local stage and the neighborhood-search stage, are repeated alternately in the solution space to achieve improvements in the quality of the solutions. Our methods were tested on both synthetic and real datasets and the advantages of using this novel framework are clearly manifested. This framework not only reduces the sensitivity to initialization, but also gives practitioners the flexibility to use various global and local methods that work well for a particular problem of interest. Other hierarchical stochastic algorithms like evolutionary algorithms and smoothing algorithms are also studied and frameworks for combining these methods with TRUST-TECH have been proposed and evaluated on several test systems.

2606.02913 2026-06-03 eess.AS cs.SD

A Comparison of Generative and Discriminative Methods for Speech Enhancement: Robustness, Complexity, and Hallucination

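Shrishti Saha Shetu, Emanuël A. P. Habets, Andreas Brendel

AI summary The paper compares generative and discriminative deep learning methods for speech enhancement, analyzing robustness under high and low SNR and matched and mismatched training scenarios, along with complexity and hallucination characteristics.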

Abstract

In this study, we conduct a comprehensive comparative analysis of generative and discriminative deep learning-based speech enhancement methods, specifically in noise reduction tasks. Our investigation focuses on evaluating their effectiveness under high and low signal-to-noise ratio conditions, considering both matched and mismatched training scenarios. We further investigate the impact of training data volume and model convergence speed, and interpret the performance differences in terms of objective results for the considered training paradigms. Additionally, we compare the complexity-performance trade-off and the practical viability of these approaches. To further strengthen the evaluation, we study the hallucination characteristics of generative approaches in terms of word error rate and phoneme similarity. The insights derived from this study provide empirical evidence to assist researchers and practitioners in understanding whether the perceptual gains of different approaches justify their computational cost in practical applications.

2606.02634 2026-06-03 eess.IV cs.AI

Echo-POSED: Geometric Self-Distillation for Echocardiography Guidance

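Elias Stenhede, Edvart Grüner Bjerke, Joanna Sulkowska, Eivind Bjørkan Orstad, Ole Jakob Elle, Ulysse Côté-Allard, Arian Ranjbar

AI summary Echo-POSED is a self-supervised framework for real-time transthoracic echocardiography guidance, trained on 2D views sliced from 3D echocardiography volumes without expert-labelled views or tracked probe trajectories. It enforces equivariance to probe motions, yields a pose representation on SO(3)×SO(3), and reaches a combined mean angular error of 8.2 degrees in intra-patient guidance simulations with cardiac motion.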

Abstract

We introduce Echo-POSED, a self-supervised framework for real-time transthoracic echocardiography (TTE) guidance that recommends probe adjustments directly from 2D ultrasound images, without the need for expert-labelled views or tracked probe trajectories. Instead, it trains on 2D views sliced from routinely acquired 3D echocardiography volumes, enforcing equivariance to probe motions while remaining invariant to cardiac phase, yielding a pose representation on $\mathrm{SO}(3)\times\mathrm{SO}(3)$. Across a held-out split and public external 3D-TTE datasets (including vendor shift), Echo-POSED maintains geometric consistency under virtual perturbations and enables intra- and inter-patient guidance simulations, achieving a combined mean angular error of 8.2 degrees between the guided and target views in intra-patient simulations with cardiac motion.

2606.03112 2026-06-03 stat.AP cs.LG

Trans GAN-WT: A Feature Extraction and Interactive Learning-Based Anomaly Detection Model for Wind Turbine Time Series Data

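Jingzhe Kang

AI summary The paper proposes TransGAN-WT, an anomaly detection model that fuses a Transformer with a generative adversarial network. Using amplified reconstruction error, autoregressive multimodal feature extraction, and interactive learning across temporal features, it reaches an average F1 score of 96.10% and a false positive rate of only 0.06% on real wind turbine datasets.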

Abstract

With the increasing scale and number of wind farms, wind turbines' daily operation and maintenance costs are increasing. To reduce operation and maintenance costs and enhance the reliability of wind turbine and system operation data before reaching catastrophic failures, monitoring the operating status of the equipment and detecting failures at an early stage is crucial. It is of great practical significance to use working-condition data to assess and monitor anomalies in the operating status of wind turbines. However, existing anomaly detection methods can neither perform effective relational modeling in data filled with a large amount of redundant information nor make reasonable use of valuable anomaly data. For this reason, this paper proposes an anomaly detection model that fuses a Transformer and a generative adversarial network. First, it reduces the missed-detection rate for minor-deviation anomalies by amplifying the reconstruction error. Second, it uses autoregressive inference to extract multimodal features to enhance the stability and generalization ability of training. Finally, a temporal feature extraction module is constructed to promote interactive learning between features of different time scales and effectively reduce temporal redundancy. The results of multiple sets of experiments conducted on real WTG datasets show that TransGAN-WT achieves an average F1 score of 96.10% across multiple wind turbine datasets, which is 5.84% and 2.89% higher than several other state-of-the-art baseline methods. It also achieves a false positive rate (FPR) of 0.06%, and a Wilcoxon signed-rank test verifies that its improvement over the state-of-the-art baseline methods is statistically significant, effectively ensuring the stable operation of wind turbines.

2606.03018 2026-06-03 stat.ME cs.LG math.ST stat.ML stat.TH

A Fast Screening Approach for High-dimensional Outcomes and High-dimensional Predictors

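Hongju Park, Zhenyao Ye, Shuo Chen

AI summary The paper proposes Graph Independence Dual Screening (GIDS), a framework that reduces the dimensionality of response variables and predictors simultaneously, to address the computational burden and poor interpretability of high-dimensional cross-modal analyses.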

Comments 38 pages, 2 figures

Abstract

Modeling interactions among multimodal, high-dimensional data is intrinsically challenging due to ultra-high dimensionality and complex dependence structure with high noise levels. Screening methods are effective for reducing dimensionality, but most existing approaches shrink only the predictor space while retaining all outcomes. In cross-modal analyses, different outcomes often select different predictor subsets, so the union remains large and the response dimension is unchanged, limiting the practical benefit of screening. This gives rise to heavy computational burdens and poor interpretability. To address these limitations, we propose a new screening framework, Graph Independence Dual Screening (GIDS), which simultaneously reduces the dimensionality of response variables and predictors. We design computationally efficient algorithms that facilitate downstream selection procedures, improving accuracy and scalability, and establish supporting theoretical results. Extensive simulation studies demonstrate that GIDS outperforms existing methods that screen only predictors. To illustrate its utility, we applied GIDS to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, analyzing interactions between 865,353 genome-wide DNA methylation variables and 49,386 transcriptomic variables. GIDS reduced the feature space to approximately 9,000 CpGs and 2,000 transcripts, uncovering blockwise interaction structures: clusters of CpG sites and gene transcripts with strong associations. These findings not only improve computational tractability but also yield interpretable biological insights, highlighting coordinated regulatory mechanisms underlying Alzheimer's disease.

2606.03763 2026-06-03 econ.GN cs.AI q-fin.EC

Merit or networks? What decides where research is published

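Ning Li

AI summary Using economics working papers, the study scores idea quality with an LLM evaluator and combines it with execution quality, connections, author ability, and a language-model text score in a five-input production function, showing how merit and connections each shape journal placement.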

Abstract

Does scientific publishing reward the quality of ideas or the advantage of connections? The question is universal to prestige-driven science, yet it has resisted decades of study because a paper's quality could not be gauged ahead of its publication fate without using that fate as the yardstick. We break this constraint by measuring a paper's idea quality directly from its text, before publication, using a discipline-trained LLM evaluator that scores the idea without seeing author names or outcomes. Using economics as a case study, we combine this text-legible idea-quality score with an execution-quality rubric, a connection index, an author-ability index, and an off-the-shelf language-model text score to estimate a five-input production function for journal placement across 6,208 economics working papers. The inputs are not rivals but a sequence along the ladder of prestige. Execution sets a meritocratic floor and is the largest input overall. Text-legible idea quality grades the rungs in between. Connections set a favoritism ceiling that bites mainly near the apex, the most selective journals. Connections work through two additive channels: connected authors write papers that score higher, and at equal scores their papers are still more likely to place better. Yet this advantage is bounded. Connections raise the odds of every rung without making the apex the typical outcome for ordinary ideas, and even the highest-scoring papers face real friction reaching the visible journal ladder. The result nests, rather than chooses between, the meritocracy and network accounts of how science is published.

2606.02625 2026-06-03 q-bio.QM cs.AI cs.LG

DXA-Derived Skeletal Phenotypes and Hip Fracture Risk: A Backdoor-Adjusted Causal Analysis

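Zixin Shi, Chen Zhao, Meiling Zhou, Kevin A. Maupin, Joyce H. Keyak, Nancy E. Lane, Kuan-Jui Su, Hui Shen, Hong-Wen Deng, Kui Zhang, Weihua Zhou

AI summary The study uses backdoor-adjusted average treatment effects to compare DXA-derived hip skeletal phenotypes in relation to fracture risk, and assesses whether phenotypes ranked by these effects improve risk stratification.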

Comments 35 pages; main manuscript includes 4 figures and 3 tables; supplementary material includes 13 figures and 3 tables

Abstract

Purpose: To compare dual-energy X-ray absorptiometry (DXA)-derived hip skeletal phenotypes in relation to hip fracture risk using prespecified confounder adjustment and to assess whether phenotypes ranked by their backdoor-adjusted average treatment effects (ATEs) improve risk stratification. Methods: We analyzed 21,098 UK Biobank participants with linked health records, hip DXA-derived skeletal measures, and prespecified covariates. Sixteen phenotypes spanning bone mineral content (BMC), bone mineral density (BMD), and T-score across hip-related regions were evaluated. Confounder selection was guided by a prespecified directed acyclic graph (DAG). Backdoor-adjusted ATEs were estimated on the absolute risk-difference scale per standard deviation (SD) increase. Effect heterogeneity was evaluated for total femur BMD, and downstream prediction was assessed using clinical variables combined with phenotypes ranked by ATE magnitude. Results: Among 21,098 participants, 115 had hip fractures. All 16 phenotypes showed negative backdoor-adjusted ATEs per SD increase. The largest ATEs were observed for total femur BMC and total femur BMD, each with a risk difference of -0.0047, corresponding to approximately 4.7 fewer hip fractures per 1,000 participants per SD higher phenotype value. Conditional effects of total femur BMD were stronger among older participants and those with lower BMI. In prediction, clinical variables plus the top 11 ATE-ranked phenotypes achieved higher AUC than FRAX with femoral neck BMD (0.842 vs. 0.709), with higher sensitivity (0.748 vs. 0.443) and similar specificity (0.793 vs. 0.777). Conclusion: DXA-derived hip skeletal phenotypes differed in their backdoor-adjusted ATEs. Phenotype-level causal evaluation may help identify informative DXA measures for risk stratification.
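
The per-1,000 figure in the Results follows directly from the risk-difference scale. The arithmetic below only restates numbers given in the abstract; the crude incidence is computed from the reported counts as context and is not a quantity the abstract itself reports.

```latex
\[
  -0.0047 \times 1000 = -4.7
  \quad \text{(hip fractures per 1,000 participants, per SD increase)}
\]
\[
  \frac{115}{21{,}098} \approx 0.00545
  \quad \text{(crude incidence, about 5.45 per 1,000)}
\]
```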

2606.03910 2026-06-03 cs.PF cs.AI cs.DC cs.NI

NetKV: Network-Aware Decode Instance Selection for Disaggregated LLM Inference

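Mubarak Adetunji Ojewale

AI summary To address the increase in Time to First Token (TTFT) caused by KV cache transfer in disaggregated LLM inference, the paper proposes NetKV, a network-cost-aware scheduler that selects decode instances greedily and reduces mean TTFT by up to 21.2% on a 64-GPU fat-tree simulator.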

Abstract

Disaggregated LLM inference forces the KV cache to traverse the datacenter network before decoding begins, so transfer time enters directly into the Time to First Token (TTFT) budget. Current schedulers route on compute load and prefix-cache locality alone, ignoring the topological distance and dynamic congestion between prefill and decode instances. We close this gap with a thin operator-to-scheduler interface, the network cost oracle, and we prove that ignoring the network term renders cache-aware-only scheduling arbitrarily suboptimal as context length grows. NetKV, the O(|D|) per-request greedy that consumes this oracle, has tier rankings that are provably robust to stale telemetry. On a 64-GPU four-tier fat-tree simulator driven by Mooncake traces, NetKV reduces mean TTFT by up to 21.2% over round-robin and 17.6% over a tuned cache+load-aware scheduler, lifts SLO attainment by up to 20.1 percentage points, and keeps the Time Between Tokens overhead below 0.5 ms in every condition tested, with no changes to the transport, inference engine, or hardware.
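
The idea of a single O(|D|) pass over decode instances that includes a network term can be sketched as follows. This is not the NetKV algorithm: the cost formula, the `DecodeInstance` fields, the per-tier bandwidths, and the stand-in for the network cost oracle are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DecodeInstance:
    name: str
    queued_tokens: int   # proxy for compute load (hypothetical field)
    cached_prefix: int   # tokens of the request prefix already cached here
    tier: int            # hypothetical topology tier relative to the prefill node


def network_cost_ms(instance, kv_bytes, tier_bandwidth_gbps):
    """Stand-in for a network cost oracle: transfer time from bytes and bandwidth."""
    gbps = tier_bandwidth_gbps[instance.tier]
    return kv_bytes * 8 / (gbps * 1e9) * 1e3


def select_decode_instance(instances, prompt_tokens, bytes_per_token,
                           tier_bandwidth_gbps, load_weight_ms=0.001):
    """Single O(|D|) pass: pick the instance with the lowest estimated cost."""
    best, best_cost = None, float("inf")
    for inst in instances:
        # Assumption: only the uncached part of the KV cache crosses the network.
        missing = max(prompt_tokens - inst.cached_prefix, 0)
        cost = (network_cost_ms(inst, missing * bytes_per_token, tier_bandwidth_gbps)
                + load_weight_ms * inst.queued_tokens)
        if cost < best_cost:
            best, best_cost = inst, cost
    return best, best_cost


instances = [
    DecodeInstance("same-rack", queued_tokens=4000, cached_prefix=0, tier=0),
    DecodeInstance("cross-pod", queued_tokens=1000, cached_prefix=0, tier=3),
]
bandwidth = {0: 400.0, 3: 50.0}  # made-up per-tier bandwidths in Gbit/s
choice, cost = select_decode_instance(instances, prompt_tokens=32000,
                                      bytes_per_token=160_000,
                                      tier_bandwidth_gbps=bandwidth)
```

In this toy setup a load-only policy would prefer the lightly loaded `cross-pod` instance, while the network term makes `same-rack` cheaper, and the gap widens as `prompt_tokens` grows.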

2606.03895 2026-06-03 cs.OS cs.AI cs.CR

Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

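Yingqi Zhang

AI summary The paper presents the Agent libOS runtime, which models an LLM agent as an AgentProcess with process identity, lifecycle state, capability control, and audit records, and enforces authority at runtime primitive boundaries beneath libc-like tool wrappers so that agents can be scheduled and their resources controlled safely.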

Comments 14 pages, 1 figure, 2 tables

Abstract

Large language model (LLM) agents are evolving from request-response assistants into long-running software actors: they maintain state across model calls, fork subtasks, wait for external events, request human authority, generate tools, and perform side effects that must be resumed and audited. This paper presents Agent libOS, a library-OS-inspired runtime substrate for LLM agents. Agent libOS runs above a conventional host operating system; it does not implement hardware drivers, kernel-mode isolation, or a POSIX-compatible operating system. Instead, it treats an agent as an AgentProcess: a schedulable execution subject with process identity, parent-child lineage, lifecycle state, a tool table derived from an AgentImage, typed Object Memory, explicit capabilities, human queues, checkpoints, events, and audit records. Its central design rule is that tools are libc-like wrappers, while runtime primitives are the authority boundary. Filesystem access, object access, sleeps, human approval, JIT tool registration, and external side effects are checked at primitive boundaries under explicit capabilities and policy. We describe the design, threat model, Python prototype, and safety-oriented evaluation. The current prototype implements async scheduling, namespace-local Object Memory, runtime-integrated human approval, one-shot permission grants, per-process working directories, shell and image-registration primitives, Deno/TypeScript JIT tools over a libOS syscall broker, filesystem/object bridge tools, an injectable Resource Provider Substrate, deterministic demos, real-model smoke scripts, and 123 regression tests at the time of writing. Rather than improving planner accuracy, Agent libOS demonstrates a runtime substrate in which long-running LLM agents can be scheduled, authorized, resumed, and audited without treating tool dispatch as the trust boundary.
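
The design rule that tools are thin wrappers while primitives enforce authority can be illustrated with a small sketch. None of the names below come from the Agent libOS prototype: `AgentProcess` is reduced to a pid, a capability set, and an audit log, and `primitive_fs_read` and the `"fs.read"` capability string are hypothetical.

```python
class CapabilityError(PermissionError):
    """Raised when a process calls a primitive it holds no capability for."""


class AgentProcess:
    def __init__(self, pid, capabilities):
        self.pid = pid
        self.capabilities = set(capabilities)
        self.audit_log = []


def primitive_fs_read(proc, path, files):
    """Runtime primitive: the authority check and the audit record live here."""
    allowed = "fs.read" in proc.capabilities
    proc.audit_log.append(("fs.read", path, "allowed" if allowed else "denied"))
    if not allowed:
        raise CapabilityError(f"process {proc.pid} lacks fs.read")
    return files[path]


def tool_summarize_file(proc, path, files):
    """Tool: a thin wrapper that holds no authority of its own."""
    text = primitive_fs_read(proc, path, files)
    return text[:20]


# An in-memory dict stands in for the filesystem in this sketch.
files = {"/work/notes.txt": "capabilities are checked at primitives"}
trusted = AgentProcess(pid=1, capabilities={"fs.read"})
sandboxed = AgentProcess(pid=2, capabilities=set())

summary = tool_summarize_file(trusted, "/work/notes.txt", files)
try:
    tool_summarize_file(sandboxed, "/work/notes.txt", files)
    denied = False
except CapabilityError:
    denied = True
```

Because the check sits in the primitive, a newly generated tool cannot gain access simply by being registered; it can only do what its calling process is already permitted to do.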

2606.03796 2026-06-03 cs.NE cs.AI

Signed Spiking Neuron Enabled by an Orthogonal-Easy-Axis Magnetic Tunnel Junction

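Huannan Zheng, Jingli Liu, Kezhou Yang

AI summary The paper proposes a compact signed spiking neuron based on a magnetic tunnel junction with orthogonal easy axes in the free and pinned layers, which enables bipolar spike generation and maps magnetic-moment dynamics to signed LIF membrane-potential evolution, reaching 91.06% accuracy on CIFAR-10 and 77.40% on CIFAR10-DVS.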

Abstract

Signed spiking neurons carry richer information than standard spiking neurons. This work proposes a compact magnetic tunnel junction (MTJ)-based neuron for signed leaky integrate-and-fire (LIF) operation. With orthogonal easy axes in the free and pinned layers, the device enables bipolar spike generation and maps magnetic-moment dynamics to signed LIF membrane-potential evolution. Landau-Lifshitz-Gilbert simulations show that proper free-layer dimensions allow the device response to follow a signed LIF equation. A representative design of 10 nm × 45 nm × 50 nm corresponds to an aspect ratio of about 2:9:10. Network evaluations using the fitted device-neuron model achieve 91.06% on CIFAR-10 and 77.40% on CIFAR10-DVS, retaining most of the accuracy of ideal signed LIF neurons.
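
A generic discrete-time signed LIF neuron, which fires +1 or -1 depending on which threshold the membrane potential crosses, can be sketched as below. This is a textbook-style model under assumed parameters (`tau`, `threshold`, hard reset to zero), not the fitted device-neuron model from the paper.

```python
def signed_lif(inputs, tau=5.0, threshold=1.0, dt=1.0):
    """Discrete-time leaky integrate-and-fire neuron that can emit +1 or -1 spikes."""
    v = 0.0
    spikes = []
    for current in inputs:
        # Leaky integration of the (possibly negative) input current.
        v += (dt / tau) * (current - v)
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after a positive spike
        elif v <= -threshold:
            spikes.append(-1)
            v = 0.0  # hard reset after a negative spike
        else:
            spikes.append(0)
    return spikes


# Constant positive drive followed by constant negative drive.
spikes = signed_lif([3.0] * 6 + [-3.0] * 6)
```

A standard (unsigned) LIF neuron would stay silent during the negative half of this input, which is the extra information a signed neuron can carry.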

2606.03664 2026-06-03 cs.NI cs.AI

AUGUSTE: Online-Learning dApp for Predictive URLLC Scheduling

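Maxime Elkael, Michele Polese, Yunseong Lee, Koichiro Furueda, Tommaso Melodia

AI summary To address the high latency caused by scheduling requests in URLLC, the paper proposes AUGUSTE, a MAC scheduling framework based on online machine learning that predicts packet arrivals and allocates resources in advance, reaching the best achievable latency-overhead trade-off on a real 5G testbed.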

Abstract

Ultra Reliable and Low Latency Communications (URLLC) was one of the main motivations behind 5G, with 3GPP advertising 1-10 ms latency targets for applications such as industrial automation, Vehicle-To-Everything (V2X), tactical edge networking, and unmanned-system control. Years on, real 5G Time Division Duplexing (TDD) networks still show median Uplink (UL) round-trip times in the 50-70 ms range, largely because of the Scheduling Request (SR) procedure that a User Equipment (UE) must complete before transmitting UL data. Existing remedies, primarily Configured Grant (CG) scheduling, only eliminate this overhead for strictly periodic traffic and require cross-layer synchronization, which has limited their adoption. We propose AUGUSTE (Anticipatory Uplink Grants for URLLC via Self-Adapting Temporal Estimation), a learning-based Medium Access Control (MAC) scheduling framework that embeds online Machine Learning (ML) models in the UL scheduler to predict packet arrivals and proactively allocate resources before an SR is issued. An adaptive state machine alternates between a learning phase that collects unbiased arrival statistics and a confident phase that exploits the learned predictions to schedule only when traffic is expected. We evaluate AUGUSTE on a real 5G testbed running OpenAirInterface across three URLLC traffic patterns (request-response, ML edge inference, and periodic autonomous reporting), and show that it operates at the best achievable point on the latency-overhead trade-off: it matches always-on scheduling's median Round Trip Time (RTT) (around 10 ms, halving the 20 ms SR-based baseline) at roughly one-tenth its resource cost (7-10 percent overhead).

2606.03432 2026-06-03 cs.CR cs.AI cs.LG

A Hybrid Approach For Malware Classification Using Secondary Features Fusion

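Raja Khurram Shahzad, Muhammad Mustaqeem, Haroon Elahi

AI summary The paper proposes a method for malware detection and family classification that fuses features such as API calls and n-grams and combines algorithms by voting, reaching 99.72% accuracy and an AUC of 0.989 on the Microsoft dataset.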

Abstract

The number of malware samples (whether variants or novel) is rapidly increasing, making malware detection and mitigation a complex problem. One approach to improving malware mitigation is automatic detection and malware family classification. However, traditional malware detection methods cannot classify detected malware into their respective families, hindering effective malware mitigation. Consequently, this paper proposes a method to automate malware detection and classification of the detected malware into respective malware families. The proposed method uses feature fusion after extracting relevant malware features such as API calls and fixed and variable length n-grams with a customized feature selection method. Moreover, for the predictive model, a voting-based approach is proposed for algorithm fusion. For the experimental evaluation of the proposed method, both binary and multi-class classification approaches are applied to the data set provided by Microsoft. Finally, the experimental results are compared with the state of the art. The experimental results indicate the effectiveness and efficiency of the proposed approach with an AUC of 0.989, accuracy of 99.72%, and a log loss of 0.01.
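
The abstract does not specify how votes are combined, so the sketch below assumes plain hard majority voting over per-model label predictions. The classifier outputs and family labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard majority voting, sample by sample."""
    combined = []
    for sample_votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest count; on a tie the
        # label seen first among the votes wins.
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three classifiers on four samples.
model_a = ["benign", "trojan", "worm", "trojan"]
model_b = ["benign", "worm", "worm", "trojan"]
model_c = ["trojan", "trojan", "worm", "benign"]
ensemble = majority_vote([model_a, model_b, model_c])
```

Soft voting, which averages predicted class probabilities instead of counting labels, is a common alternative when the base models expose calibrated scores.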

2606.02814 2026-06-03 cs.IR cs.AI cs.CL

Do Neural Retrievers Prefer Certain Documents? Evidence of Learned Relevance Priors

神经检索器是否偏好某些文档?学习到的相关性先验的证据

Francisco Valentini, Edgar Altszyler, Martin Fajcik

AI总结 通过分析监督双编码器检索器在文档嵌入中编码的查询无关信号,发现模型从标注数据中学习到文档级相关性先验,导致低先验文档即使相关也更难被检索,揭示了监督检索的结构性局限。

详情
AI中文摘要

神经检索器通过标注的查询-文档对训练来估计查询-文档相关性。然而,标注协议可能并不纯粹反映相关性:它们只选择一部分文档进行标注,并且这种选择可能偏向某些文档类型。我们研究监督双编码器检索器是否隐式学习了一个文档级相关性先验:一个查询无关的信号,作为在标注数据上训练的副作用编码在其表示空间中。我们通过在冻结的文档嵌入上训练简单分类器来估计这个先验,并在多个IR基准上评估三个最先进的检索器。我们发现监督神经检索器编码了能泛化到未见文档且跨模型一致的相关性先验。这些先验造成了可发现性差距:先验较低的文档即使真正相关也更难被检索。这种效应在监督密集检索器中出现,但在BM25中较弱且不一致,并在受控的匹配文档比较下持续存在。利用基于LLM的解释,我们发现被判定为相关的文档往往是主流主题的全面、自包含的摘要,而小众、零碎或高度技术性的内容通常未被评判。检索器内化了这种偏见,将具有这些偏好特征的文档排得比缺乏这些特征的文档更高,而与它们的实际相关性无关。我们的发现揭示了监督检索的结构性局限:在标注数据上训练的模型不仅学习相关性,还学习其训练数据中的隐式文档偏好。

英文摘要

Neural retrievers are trained to estimate query-document relevance from annotated query-document pairs. Yet annotation protocols may not purely reflect relevance: they select only a subset of documents for labeling, and this selection can favor certain document types over others. We investigate whether supervised bi-encoder retrievers implicitly learn a document-level relevance prior: a query-independent signal encoded in their representation space as a side effect of training on annotated data. We estimate this prior by training simple classifiers on frozen document embeddings and evaluate three state-of-the-art retrievers across multiple IR benchmarks. We find that supervised neural retrievers encode relevance priors that generalize to unseen documents and are consistent across models. These priors create a findability gap: documents with lower prior are systematically harder to retrieve, even when genuinely relevant. This effect appears in supervised dense retrievers but is weaker and less consistent in BM25, and it persists under controlled matched-document comparisons. Using LLM-based explanations, we find that judged-relevant documents tend to be comprehensive, self-contained summaries of mainstream topics, while niche, fragmentary, or highly technical content is often left unjudged. Retrievers internalize this bias, ranking documents with these favored features higher than documents that lack them, independently of their actual relevance. Our findings expose a structural limitation of supervised retrieval: models trained on annotated data do not just learn relevance, but also the implicit document preferences in their training data.

2606.02755 2026-06-03 cs.SE cs.AI

Acceptance-Test-Driven Evaluation Protocols for Business-Centric LLM Systems

面向业务中心LLM系统的验收测试驱动评估协议

Eric Liang

AI总结 针对LLM系统概率生成与确定性需求不匹配问题,提出基于验收测试驱动开发、安全工程和业务中心验证的评估协议,将利益相关者目标转化为可执行行为契约,并采用红-训练-绿生命周期确保多维门控通过后才发布。

详情
AI中文摘要

大型语言模型(LLM)应用日益期望在依赖概率生成组件的同时满足确定性机构需求。这种不匹配使得普通的后期基准测试对于必须安全、可靠、可审计且经济有用的系统而言是不够的。本文为基于验收测试驱动开发、安全工程和业务中心验证的运营LLM系统贡献了一种评估协议扩展。该扩展在提示、模型、检索或智能体变更被接受之前,将利益相关者目标转化为可执行行为契约、发布门控、监控信号和证据工件。它将测试驱动开发的红-绿-重构纪律调整为红-训练-绿色生命周期:首先为期望行为定义失败的验收测试,然后通过提示变更、检索设计、微调、护栏或数据增强改进LLM系统,最后仅当多维门控满足时才发布。贡献在于一个面向治理的度量栈、参考架构和用于比较验收测试驱动LLM开发与提示优先和基准后工作流的经验协议。

英文摘要

Large language model (LLM) applications are increasingly expected to satisfy deterministic institutional requirements while relying on probabilistic generative components. This mismatch makes ordinary post-hoc benchmarking insufficient for systems that must be safe, reliable, auditable, and economically useful. This paper contributes an evaluation-protocol extension for operational LLM systems grounded in acceptance-test-driven development, safety engineering, and business-centric validation. The extension translates stakeholder goals into executable behavioral contracts, release gates, monitoring signals, and evidence artifacts before prompt, model, retrieval, or agent changes are accepted. It adapts the red-green-refactor discipline of test-driven development to a red-train-green lifecycle: first define failing acceptance tests for desired behavior, then improve the LLM system through prompt changes, retrieval design, fine-tuning, guardrails, or data augmentation, and finally release only when multidimensional gates are satisfied. The contribution is a governance-oriented metric stack, reference architecture, and empirical protocol for comparing acceptance-test-driven LLM development against prompt-first and benchmark-after workflows.
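A minimal sketch of the red-train-green idea, under stated assumptions: acceptance tests are written first as executable checks on system output, and release is allowed only when every test passes and every gate threshold is met. The class `AcceptanceTest`, the helper names, the metric names, and the thresholds are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AcceptanceTest:
    name: str
    prompt: str
    check: Callable[[str], bool]  # behavioral contract on the system output


def run_suite(system: Callable[[str], str], tests: List[AcceptanceTest]) -> Dict[str, bool]:
    """Run every acceptance test against a candidate system."""
    return {t.name: t.check(system(t.prompt)) for t in tests}


def release_allowed(results: Dict[str, bool], metrics: Dict[str, float],
                    gates: Dict[str, float]) -> bool:
    """Release only if all tests pass and every metric meets its gate.

    Metrics are assumed to be higher-is-better scores.
    """
    tests_green = all(results.values())
    gates_green = all(metrics[name] >= threshold for name, threshold in gates.items())
    return tests_green and gates_green


# Red: the test is defined before the system satisfies it.
tests = [AcceptanceTest("refund_answer_cites_source",
                        "What is the refund window?",
                        lambda out: "[source:" in out)]


def system_v1(prompt: str) -> str:  # stand-in for the LLM system before improvement
    return "30 days."


def system_v2(prompt: str) -> str:  # stand-in for the improved system
    return "30 days. [source: policy.md]"


gates = {"groundedness": 0.90, "safety": 0.99}       # hypothetical release gates
metrics = {"groundedness": 0.95, "safety": 0.995}    # hypothetical measurements

red = release_allowed(run_suite(system_v1, tests), metrics, gates)
green = release_allowed(run_suite(system_v2, tests), metrics, gates)
```

In this toy run `red` is `False` and `green` is `True`. A passing test suite with a failing gate would also block release.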

2606.02640 2026-06-03 cs.CR cs.AI

D-Judge: Disrupting Multi-Turn Jailbreaks using Semantics-Preserving Output Rewriting

D-Judge: 使用语义保持输出重写破坏多轮越狱攻击

Huanli Gong, Zhipeng Wei, Yu Fu, Haz Sameen Shahgir, Ananya Gupta, Yue Dong, N. Benjamin Erichson

AI总结 提出D-Judge防御方法,通过语义保持的输出重写干扰攻击者的评判模型反馈循环,从而降低多轮越狱攻击的成功率。

Comments Proceedings of the 43rd International Conference on Machine Learning

详情
AI中文摘要

多轮越狱攻击对大型语言模型(LLM)的安全性构成日益严重的威胁,因为它们利用辅助评判模型的反馈来迭代优化提示,以实现有害目标。现有的防御措施主要在单个轮次或最终响应中检测或阻止不安全内容,但保留了评判驱动的优化循环,使攻击者能够从中间交互中提取信息性反馈。我们引入了D-Judge,一种语义保持的输出重写防御方法,它直接干预该循环,在攻击者的评判模型评估之前重写受害者LLM的响应。通过在不改变原始响应含义的情况下使评判的反馈信号失准,D-Judge破坏了攻击者的提示优化过程,导致后续查询针对扭曲的攻击进展信号进行优化。为了提高D-Judge生成此类重写的能力,我们构建了一个语义等价的响应对数据集,这些响应对会诱导不同的评判分配的有害性分数,并使用该数据集进行监督微调,随后进行直接偏好优化。在HarmBench上的实验表明,D-Judge在保持良性基准性能的同时,降低了最先进的多轮越狱攻击的成功率。

英文摘要

Multi-turn jailbreak attacks pose a growing threat to large language model (LLM) safety because they exploit feedback from auxiliary judge models to iteratively refine prompts toward harmful goals. Existing defenses largely detect or block unsafe content at individual turns or at the final response, leaving the judge-driven refinement loop intact and allowing attackers to extract informative feedback from intermediate interactions. We introduce D-Judge, a semantics-preserving output rewriting defense that intervenes directly in this loop by rewriting the victim LLM's responses before they are evaluated by the attacker's judge. By misaligning the judge's feedback signal without changing the meaning of the original response, D-Judge derails the attacker's prompt-refinement process, causing subsequent queries to be optimized against a distorted signal of attack progress. To improve D-Judge's ability to produce such rewrites, we construct a dataset of semantically equivalent response pairs that induce different judge-assigned harmfulness scores, and use it for supervised fine-tuning followed by direct preference optimization. Experiments on HarmBench show that D-Judge reduces the success rate of state-of-the-art multi-turn jailbreaks while preserving performance on benign benchmarks.

2606.02581 2026-06-03 cs.IR cs.AI

Cost-Aware Query Routing in RAG: Empirical Analysis of Retrieval Depth Tradeoffs

RAG中的成本感知查询路由:检索深度权衡的实证分析

Sanjay Mishra

AI总结 提出CA-RAG框架,通过为每个查询选择最优的检索深度和生成配置组合,在保证答案质量的同时减少令牌成本和延迟。

Comments 13 pages, 18 figures, 8 tables

详情
AI中文摘要

检索增强生成(RAG)面临一个基本的三方权衡:更深的检索改善了事实基础,但增加了令牌成本和端到端延迟。静态检索配置无法解决异构查询工作负载下的这一矛盾:简单的定义性查询在不必要的上下文上浪费预算,而复杂的分析性提示则因浅层检索而得不到充分服务。本文介绍了\emph{成本感知RAG}(CA-RAG),这是一个逐查询路由框架,通过最大化一个标量效用(该效用线性结合了估计的质量先验与预测延迟和总计费令牌的归一化惩罚),从离散的\emph{策略包}目录中进行选择,其中每个包将检索深度(从无检索直接推理到top-$k{=}10$密集检索)与固定的生成配置配对。CA-RAG使用基于FAISS的密集检索和OpenAI聊天/嵌入API实现,并在一个包含28个查询、涵盖四个策略包的基准上进行评估。路由器动态地使用所有策略包,与始终重度检索相比,实现了\textbf{26\%更少的计费令牌},与始终直接推理相比,实现了\textbf{34\%更低的平均延迟},同时保持等效的答案质量。逐查询增量分析表明,节省是非均匀的,集中在较简单的查询上,这激发了复杂度感知的护栏。敏感性分析证实,仅通过权重调整,相同的策略包目录即可支持多个成本-延迟-质量操作点。所有结果直接从记录的CSV工件生成,以实现完全可重复性。CA-RAG为成本意识型LLM部署提供了透明、可审计的基础。

英文摘要

Retrieval-augmented generation (RAG) faces a fundamental three-way tension: deeper retrieval improves factual grounding but inflates token costs and end-to-end latency. Static retrieval configurations cannot resolve this tension across heterogeneous query workloads -- simple definitional queries waste budget on unnecessary context, while complex analytical prompts are underserved by shallow retrieval. This paper introduces \emph{Cost-Aware RAG} (CA-RAG), a per-query routing framework that selects from a discrete catalog of \emph{strategy bundles} -- each coupling a retrieval depth (from retrieval-free direct inference to top-$k{=}10$ dense retrieval) with a fixed generation profile -- by maximizing a scalar utility that linearly combines an estimated quality prior with normalized penalties for predicted latency and total billed tokens. CA-RAG is implemented with FAISS-backed dense retrieval and OpenAI chat/embedding APIs, and evaluated on a 28-query benchmark spanning four bundles. The router dynamically exercises all bundles, achieving \textbf{26\% fewer billed tokens} than always-heavy retrieval and \textbf{34\% lower mean latency} than always-direct inference while maintaining equivalent answer quality. Per-query delta analysis reveals that savings are non-uniform and concentrated in simpler queries, motivating complexity-aware guardrails. Sensitivity analysis confirms that the same bundle catalog supports multiple cost-latency-quality operating points through weight adjustment alone. All results are generated directly from logged CSV artifacts for full reproducibility. CA-RAG provides a transparent, auditable foundation for cost-conscious LLM deployments.
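The routing rule described above (pick the bundle that maximizes a linear utility of quality prior minus normalized latency and token penalties) can be sketched as follows. Only the endpoints of the catalog (retrieval-free and top-k = 10) come from the abstract. The intermediate depths, the predicted latencies and token counts, the quality priors, and the weights are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    name: str
    top_k: int              # retrieval depth (0 = retrieval-free direct inference)
    pred_latency_s: float   # hypothetical predicted latency
    pred_tokens: int        # hypothetical predicted billed tokens


BUNDLES = [
    Bundle("direct", 0, 0.8, 150),
    Bundle("light", 3, 1.2, 600),
    Bundle("medium", 5, 1.6, 1000),
    Bundle("heavy", 10, 2.4, 2000),
]


def utility(bundle, quality_prior, w_q=1.0, w_l=0.3, w_t=0.3):
    """Linear utility: weighted quality minus normalized latency and token penalties."""
    max_latency = max(b.pred_latency_s for b in BUNDLES)
    max_tokens = max(b.pred_tokens for b in BUNDLES)
    return (w_q * quality_prior
            - w_l * bundle.pred_latency_s / max_latency
            - w_t * bundle.pred_tokens / max_tokens)


def route(quality_priors, **weights):
    """Select the bundle with the highest utility for one query."""
    return max(BUNDLES, key=lambda b: utility(b, quality_priors[b.name], **weights))


# Hypothetical per-query quality priors for a simple and a complex query.
simple_query = {"direct": 0.85, "light": 0.88, "medium": 0.90, "heavy": 0.90}
complex_query = {"direct": 0.20, "light": 0.50, "medium": 0.75, "heavy": 0.95}

choice_simple = route(simple_query)
choice_complex = route(complex_query)
choice_complex_quality_first = route(complex_query, w_l=0.1, w_t=0.1)
```

With these illustrative numbers the simple query is routed to `direct` and the complex one to `medium`. Lowering the cost weights moves the complex query to `heavy`, which mirrors the abstract's point that weight adjustment alone shifts the operating point.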

2606.03769 2026-06-03 math.OC cs.LG math.PR

Bregman meets Lévy: Stochastic mirror descent with heavy-tailed noise in continuous and discrete time

Bregman遇见Lévy:具有重尾噪声的随机镜像下降在连续和离散时间中

Pierre-Louis Cauvin, Panayotis Mertikopoulos

AI总结 研究随机镜像下降在重尾噪声下的鲁棒性,通过引入Lévy镜像流连续时间模型,证明其在凸和强凸目标下达到ε-最优的时间复杂度,并推导出离散时间匹配保证。

Comments 68 pages, 3 figures; to appear in the proceedings of ICML 2026

详情
AI中文摘要

我们研究了随机镜像下降(SMD)在重尾噪声下的鲁棒性,重点关注该方法在使用无限方差随机梯度输入时是否保持其收敛保证。为了以原则性的方式解决这个问题,我们首先引入SMD的连续时间模型,作为一个由具有有限$p$阶矩($1 < p \leq 2$)的中心化Lévy噪声过程驱动的随机微分方程(SDE)。该方案——我们称之为Lévy镜像流(LMF)——自然作为重尾噪声下SMD的缩放极限出现。特别地,当$p < 2$(即重噪声区域)时,LMF的轨迹通常表现出任意大小的跳跃不连续性,如果这些跳跃足够频繁,会导致无限方差。然而,尽管存在这种高度奇异的行为,我们证明LMF在凸情况下在$\mathcal{O}(\epsilon^{-p/(p-1)})$时间内达到$\epsilon$-最优,在(相对)强凸目标下在$\mathcal{\tilde O}(\epsilon^{-1/(p-1)})$时间内达到$\epsilon$-最优。这些保证提供了频繁长跳跃对过程收敛影响的清晰刻画,并渗透到重尾噪声下SMD几种变体的系列匹配离散时间保证中。

英文摘要

We study the robustness of stochastic mirror descent (SMD) under heavy-tailed noise, focusing on whether the method retains its convergence guarantees when run with infinite-variance stochastic gradient input. To address this question in a principled manner, we begin by introducing a continuous-time model of SMD as a stochastic differential equation (SDE) driven by a centered Lévy noise process with finite $p$-th order moments, $1 < p \leq 2$. This scheme -- which we call the Lévy mirror flow (LMF) -- arises naturally as the scaling limit of SMD in the presence of heavy-tailed noise. In particular, when $p < 2$ -- the heavy noise regime -- the trajectories of LMF generically exhibit jump discontinuities of arbitrary magnitude which, if frequent enough, lead to infinite variance. Nonetheless, despite this highly singular behavior, we show that LMF attains $\epsilon$-optimality within $\mathcal{O}(\epsilon^{-p/(p-1)})$ time in the convex case, and within $\mathcal{\tilde O}(\epsilon^{-1/(p-1)})$ time for (relatively) strongly convex objectives. These guarantees provide a transparent characterization of the impact of frequent long jumps on the convergence of the process, and percolate to a series of matching discrete-time guarantees for several variants of SMD under heavy-tailed noise.
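To make the stated rates concrete, the two exponents can be evaluated at sample values of $p$. This is plain arithmetic on the expressions quoted in the abstract; the choice $p = 3/2$ is only an illustrative heavy-tailed case.

```latex
\[
p = 2:\qquad \epsilon^{-p/(p-1)} = \epsilon^{-2},
\qquad \epsilon^{-1/(p-1)} = \epsilon^{-1};
\]
\[
p = \tfrac{3}{2}:\qquad \epsilon^{-p/(p-1)} = \epsilon^{-3},
\qquad \epsilon^{-1/(p-1)} = \epsilon^{-2}.
\]
```

Both exponents grow without bound as $p \to 1^{+}$, so heavier tails translate into longer times to reach $\epsilon$-optimality.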

2606.01184 2026-06-03 stat.ME cs.AI

Topological Ignorability for Structural Causal Effects Beyond Means

超越均值的结构因果效应的拓扑可忽略性

Usef Faghihi

AI总结 本文提出基于拓扑几何的因果度量(如密度超水平Betti摘要、欧拉签名和持续同调摘要)来量化干预分布的结构差异,并引入拓扑可忽略性假设以在无需完整反事实分布的情况下识别结构因果效应。

Comments This is a new version of our paper titled: Beyond Means: Topological Causal Effects under Persistent-Homology Ignorability. So we will resubmit this as version 2 of arXiv:2603.14169

详情
AI中文摘要

许多干预措施改变的是结果分布的结构而非其均值:它们可以将总体分裂为不连通的区域、创建循环或空洞、生成分支,或重组结果云团而几乎不改变平均响应。在这种情况下,基于均值的因果估计量(如平均处理效应)可能遗漏重要的结构效应。 我们引入了基于干预结果定律摘要的拓扑几何因果度量,包括密度超水平Betti摘要、欧拉签名和持续同调摘要。这些度量量化了处理组和未处理组结果定律之间超出平均值的结构差异。我们还研究了因果解释所需的假设。我们引入了拓扑可忽略性,这是条件可忽略性的拓扑类比,要求所选结构特征的不变性而非整个反事实分布。当所选摘要是单射时,该条件与弱可忽略性一致;对于非单射摘要,它可以在不识别完整干预定律的情况下识别感兴趣的结构特征。 我们定义了一个协变量标准化的拓扑几何因果效应,并开发了实用的估计量。我们在两个隐藏混杂基准中验证了该框架:一个完全合成的精确基准和一个使用威斯康星乳腺癌协变量的真实协变量半合成基准。在这两个基准中,弱可忽略性失败,平衡观测协变量几乎消除了标准化均值差异,但坐标均值平均处理效应仍然有偏。相比之下,选定的有限密度超水平Betti和欧拉对比在神谕、观测和加权分析中保持稳定。

英文摘要

Many interventions alter the structure of an outcome distribution rather than its mean: they can split a population into disconnected regimes, create loops or holes, generate branches, or reorganize an outcome cloud while leaving the average response nearly unchanged. In such settings, mean-based causal estimands such as the average treatment effect may miss important structural effects. We introduce topological-geometrical causal metrics based on summaries of interventional outcome laws, including density-superlevel Betti summaries, Euler signatures, and persistent-homology summaries. These metrics quantify structural differences between treated and untreated outcome laws beyond averages. We also study the assumptions needed for causal interpretation. We introduce topological ignorability, a topological analogue of conditional ignorability that requires invariance of the chosen structural feature rather than the full counterfactual distribution. When the chosen summary is injective, this condition coincides with weak ignorability; for noninjective summaries, it can identify the structural feature of interest without identifying the full interventional law. We define a covariate-standardized topological-geometrical causal effect and develop practical estimators. We validate the framework in two hidden-confounding benchmarks: a fully synthetic exact benchmark and a real-covariate semi-synthetic benchmark using Wisconsin breast-cancer covariates. In both, weak ignorability fails and balancing observed covariates nearly eliminates standardized mean differences, yet the coordinate-mean average treatment effect remains biased. By contrast, selected finite density-superlevel Betti and Euler contrasts remain stable across oracle, observational, and weighted analyses.
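A toy illustration of the motivating claim, assuming one-dimensional outcomes and known densities: two outcome laws share the same mean, yet the number of connected components of a density superlevel set (the zeroth Betti number) differs. The densities, the level 0.1, and the grid are hypothetical choices; the paper's estimators are not reproduced here.

```python
import math


def gaussian_mixture_density(x, centers, sigma=0.5):
    """Equal-weight Gaussian mixture density with a shared standard deviation."""
    norm = 1.0 / (len(centers) * sigma * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers)


def betti0_superlevel(density, level, lo=-6.0, hi=6.0, steps=1201):
    """Count connected components of {x : density(x) >= level} on a 1-D grid."""
    components, inside = 0, False
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        above = density(x) >= level
        if above and not inside:
            components += 1  # entering a new component of the superlevel set
        inside = above
    return components


def untreated(x):
    return gaussian_mixture_density(x, centers=[0.0])        # one regime, mean 0


def treated(x):
    return gaussian_mixture_density(x, centers=[-2.0, 2.0])  # two regimes, mean 0


b0_untreated = betti0_superlevel(untreated, level=0.1)
b0_treated = betti0_superlevel(treated, level=0.1)
betti0_contrast = b0_treated - b0_untreated
```

Here the mean difference is zero while the Betti-0 contrast is 1, which is the kind of structural effect a mean-based estimand cannot register.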

2605.30155 2026-06-03 cs.LO cs.AI

Neural Network Verification using Partial Multi-Neuron Relaxation

使用部分多神经元松弛的神经网络验证

Ido Shmuel, Guy Katz

AI总结 提出部分多神经元松弛方法,通过启发式选择少量神经元生成多神经元边界,在Marabou验证器中实现紧致性与可扩展性的平衡。

Comments To appear in SAIV 2026

详情
AI中文摘要

深度神经网络在关键系统中的日益集成,激发了对其行为进行形式化安全保证的理论和实际兴趣。为了实现这一点,当代验证算法依赖于为网络的非线性激活函数计算线性松弛。现有的线性松弛方法通常分为两类:单神经元松弛,其中每个激活神经元根据其源进行界定;以及多神经元松弛,其中计算涉及多个激活神经元及其源的线性边界。然而,现有方法可能无法平衡紧致性和可扩展性,因为单神经元边界可能无法推导出验证所需的足够紧致的边界,而为所有激活神经元生成多神经元松弛在计算上代价高昂。在本文中,我们提出了一种中间方法,即部分多神经元松弛,其中我们仅对启发式选择的一小部分神经元生成多神经元边界。为了实现这一点,我们基于现有的分支启发式方法选择神经元,并优化多神经元边界的边界超平面。我们将所提出的方法集成到Marabou验证器中,并与现有的边界紧缩方法相比获得了有利的结果。我们的实验展示了我们的技术在神经网络验证中的潜力。

英文摘要

The increasing integration of deep neural networks in critical systems has spawned a theoretical and practical interest in formally guaranteeing safety properties about their behavior. To achieve this, contemporary verification algorithms rely on computing linear relaxations for a network's non-linear activation functions. Existing approaches for linear relaxations typically fall into one of two categories: single-neuron relaxation, in which each activation neuron is bounded in terms of its sources; and multi-neuron relaxation, in which linear bounds involving multiple activation neurons and their sources are calculated. However, existing methods might fail to balance tightness and scalability, as single-neuron bounds might not derive sufficiently tight bounds necessary for verification to complete, whereas generating multi-neuron relaxation for all activation neurons is computationally expensive. In this paper, we present a middle-ground approach featuring partial multi-neuron relaxation, in which we generate multi-neuron bounds for only a small, heuristically selected subset of neurons. To achieve this, we build upon existing branching heuristics for selecting neurons and for optimizing bounding hyper-planes for multi-neuron bounds. We integrated our proposed method within the Marabou verifier, and obtained favorable results in comparison to existing bound tightening methods. Our experiments showcase the potential of our technique for neural network verification.
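As a concrete instance of single-neuron relaxation, the sketch below computes the standard triangle relaxation of a ReLU neuron given bounds on its input. It is an illustration of the general idea only: the function name is hypothetical, and it does not represent Marabou's internals or the multi-neuron bounds proposed in the paper.

```python
def relu_triangle_relaxation(l, u):
    """Linear bounds for y = max(0, x) when x lies in [l, u].

    Returns (lower, upper): `lower` is a list of (a, b) pairs meaning
    y >= a * x + b, and `upper` is one (a, b) pair meaning y <= a * x + b.
    """
    if l > u:
        raise ValueError("lower bound exceeds upper bound")
    if u <= 0:   # neuron always inactive: y = 0
        return [(0.0, 0.0)], (0.0, 0.0)
    if l >= 0:   # neuron always active: y = x
        return [(1.0, 0.0)], (1.0, 0.0)
    # Unstable neuron: the upper bound is the chord through (l, 0) and (u, u).
    slope = u / (u - l)
    return [(0.0, 0.0), (1.0, 0.0)], (slope, -slope * l)


lower, upper = relu_triangle_relaxation(-1.0, 3.0)
```

For inputs in [-1, 3] the upper bound is y <= 0.75 x + 0.75, which touches the ReLU at both endpoints. Bounds of this kind constrain each neuron independently, which is why they can be too loose for some properties.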

2512.18552 2026-06-03 cs.SE cs.AI cs.CL cs.LG

Toward Training Superintelligent Software Agents through Self-Play SWE-RL

通过自我对弈SWE-RL训练超级智能软件代理

Yuxiang Wei, Zhiqing Sun, Emily McMilin, Jonas Gehring, David Zhang, Gabriel Synnaeve, Daniel Fried, Lingming Zhang, Sida Wang

AI总结 提出自我对弈SWE-RL(SSR)方法,通过强化学习在自对弈环境中训练单一LLM代理,使其在无需人工标注问题或测试的情况下,在真实代码库中迭代注入和修复软件缺陷,在SWE-bench基准上实现显著自我改进并超越人类数据基线。

Comments Accepted to ICML 2026

详情
AI中文摘要

尽管当前由大型语言模型(LLM)和智能体强化学习(RL)驱动的软件代理能够提高程序员的生产力,但其训练数据(例如GitHub问题和拉取请求)和环境(例如通过-通过和失败-通过测试)严重依赖人类知识或整理,这构成了通向超级智能的根本障碍。在本文中,我们提出了自我对弈SWE-RL(SSR),这是迈向超级智能软件代理训练范式的第一步。我们的方法仅需最小的数据假设,只需访问带有源代码和已安装依赖项的沙盒化仓库,无需人工标注的问题或测试。基于这些真实世界的代码库,单个LLM代理通过强化学习在自我对弈环境中进行训练,以迭代地注入和修复复杂度逐渐增加的软件缺陷,每个缺陷由测试补丁而非自然语言问题描述正式指定。在SWE-bench Verified和SWE-Bench Pro基准上,SSR实现了显著的自我改进(分别提升+10.4和+7.8分),并在整个训练轨迹中持续优于人类数据基线,尽管其评估的是自我对弈中未出现的自然语言问题。我们的结果虽然尚处于早期阶段,但表明了一条路径,即代理可以从真实软件仓库中自主收集广泛的学习经验,最终实现超越人类能力的超级智能系统,在理解系统构建方式、解决新挑战以及从头开始自主创建新软件方面超越人类。

英文摘要

While current software agents powered by large language models (LLMs) and agentic reinforcement learning (RL) can boost programmer productivity, their training data (e.g., GitHub issues and pull requests) and environments (e.g., pass-to-pass and fail-to-pass tests) heavily depend on human knowledge or curation, posing a fundamental barrier to superintelligence. In this paper, we present Self-play SWE-RL (SSR), a first step toward training paradigms for superintelligent software agents. Our approach takes minimal data assumptions, only requiring access to sandboxed repositories with source code and installed dependencies, with no need for human-labeled issues or tests. Grounded in these real-world codebases, a single LLM agent is trained via reinforcement learning in a self-play setting to iteratively inject and repair software bugs of increasing complexity, with each bug formally specified by a test patch rather than a natural language issue description. On the SWE-bench Verified and SWE-Bench Pro benchmarks, SSR achieves notable self-improvement (+10.4 and +7.8 points, respectively) and consistently outperforms the human-data baseline over the entire training trajectory, despite being evaluated on natural language issues absent from self-play. Our results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories, ultimately enabling superintelligent systems that exceed human capabilities in understanding how systems are constructed, solving novel challenges, and autonomously creating new software from scratch.

2510.12837 2026-06-03 cs.MA cs.AI cs.CY cs.NE

Semantic knowledge guides innovation and drives cultural evolution

语义知识引导创新并驱动文化进化

Anil Yaman, Shen Tian, Björn Lindström

AI总结 通过基于主体的模型和大规模行为实验,发现语义知识通过引导探索、增强创新成功率和促进泛化,与社会学习协同驱动累积文化进化。

详情
Journal ref
Proceedings of the National Academy of Sciences, 123(22), e2530750123, 2026
AI中文摘要

文化进化使得思想和技术能够代代积累,在人类中达到最复杂和开放的形式。虽然社会学习使得这些创新的传播成为可能,但产生这些创新的认知过程仍然知之甚少。经典理论通常将创新视为随机变异,这种简化不足以解释人类文化进化的复杂性。我们提出,语义知识——将概念与其属性和功能联系起来的关联——引导人类创新并驱动累积文化。为了验证这一点,我们结合了一个基于主体的模型(该模型考察语义知识如何塑造文化进化动态)和一个大规模行为实验(N = 1,243),测试其在人类创新中的作用。在这两种方法中,我们发现语义知识将探索引导向有意义的解决方案,增强创新成功率,并使得从先前发现中泛化成为可能。此外,语义知识与社会学习协同作用,放大创新并加速累积文化变化。相反,缺乏语义知识的实验参与者即使在社会学习可能的情况下,表现也不比随机好,并且依赖浅层探索策略进行创新。综合这些发现表明,语义知识是支撑人类累积文化的关键认知过程。

英文摘要

Cultural evolution allows ideas and technologies to accumulate across generations, reaching their most complex and open-ended form in humans. While social learning enables the transmission of such innovations, the cognitive processes that generate them remain poorly understood. Classical theories typically treat innovation as random variation, a simplification insufficient for explaining the complexity of human cultural evolution. We propose that semantic knowledge -- the associations linking concepts to their properties and functions -- guides human innovation and drives cumulative culture. To test this, we combined an agent-based model, which examines how semantic knowledge shapes cultural evolutionary dynamics, with a large-scale behavioral experiment (N = 1,243) testing its role in human innovation. Across both approaches, we found that semantic knowledge directed exploration toward meaningful solutions, enhanced innovation success, and enabled generalization from prior discoveries. Moreover, semantic knowledge interacted synergistically with social learning to amplify innovation and accelerate cumulative cultural change. In contrast, experimental participants lacking access to semantic knowledge performed no better than chance, even when social learning was possible, and relied on shallow exploration strategies for innovation. Together, these findings suggest that semantic knowledge is a key cognitive process underpinning human cumulative culture.


2601.02380 2026-06-03 cs.CY cs.AI

LLMs, Reasoning and Plagiarism

LLM、推理与抄袭

Elchanan Mossel

AI总结 本文指出当前声称LLM具备科学发现和通用智能的说法不满足波普尔可反驳性原则,并提出了提高科学透明度和可重复性的指南。

Comments The authors explicitly reserve all rights in this work. No permission is granted for the reproduction, storage, or use of this document for the purpose of training artificial intelligence systems or for text and data mining (TDM), including but not limited to the generation of embeddings, summaries, or synthetic derivatives. Claude and Gemini were used in writing this manuscript

详情
AI中文摘要

最近的报告声称大型语言模型(LLM)已经具备了推导新科学和展现人类级通用智能的能力。我们认为这样的说法并非严谨的科学声明,因为它们不满足波普尔的可反驳性原则(通常称为可证伪性),该原则要求科学陈述能够被证伪。我们识别了当前AI推理研究中的几个方法论陷阱,包括由于不透明且不可搜索的训练数据而无法验证发现的新颖性、由于持续模型更新导致缺乏可重复性,以及省略人机交互记录从而掩盖科学发现的真正来源。此外,缺乏反事实和失败尝试的数据造成了选择偏差,可能夸大LLM的能力。为应对这些挑战,我们提出了关于LLM推理研究的科学透明度和可重复性指南。建立这样的指南对于科学诚信以及当前关于公平数据使用的社会辩论至关重要。我们还讨论了相关问题,如LLM生成的抄袭挑战以及LLM中检索与新颖性的一般问题。

英文摘要

Recent reports claim that Large Language Models (LLMs) derive new science and exhibit human-level general intelligence. Such claims are entangled with two different narratives about what LLMs do: one in which they are an engine of synthesis that genuinely reasons to new knowledge, and one in which they retrieve and re-emit the work of others without attribution. In the scientific setting these are best understood as a contrast between \emph{reasoning} and \emph{plagiarism}. Finding where the truth lies between these two narratives is very challenging, as central components of the model -- the training data and the interaction transcript -- remain opaque. Thus claims of LLM reasoning do not satisfy Popper's refutability principle. We propose guidelines for transparency and reproducibility that will allow reasoning claims to be studied using the scientific method. The dominance of the reasoning narrative, we suggest, is in practice encouraging plagiarism in the scientific literature; we discuss what might be done about it.

2602.04899 2026-06-03 cs.CR cs.AI

Phantom Transfer: Data Poisoning can Survive Data-Level Defences

幻影转移:数据投毒可存活于数据级防御

Andrew Draganov, Tolga H. Dur, Anandmayi Bhongade, Mary Phuong

AI总结 提出一种名为“幻影转移”的数据投毒攻击,即使知道毒药如何被放入良性数据集也无法过滤,该攻击通过修改阈下学习以适应现实场景,并在多种数据级防御下存活。

详情
AI中文摘要

我们提出了一种数据投毒攻击——幻影转移——其特性是,即使你确切知道毒药是如何被放入原本良性的数据集中,你也无法将其过滤掉。我们通过修改阈下学习以在现实世界中工作来实现这一点,并证明无论数据由哪个模型生成、训练数据的是哪个模型或攻击目标是什么,该攻击都有效。此外,该攻击在11种测试的数据级防御下存活,包括一种将每个样本由另一个模型改写的防御。我们描述了这种攻击何时效果最佳,并展示了它可以用于将密码触发的行为植入模型,同时仍然击败防御。简而言之,我们提供了一个存在性证明,即最大能力防御可能无法阻止复杂的数据投毒攻击。我们建议未来的防御应辅以白盒方法和训练后模型审计。

英文摘要

We present a data poisoning attack -- Phantom Transfer -- with the property that, even if you know precisely how the poison was placed into an otherwise benign dataset, you cannot filter it out. We achieve this by modifying subliminal learning to work in real-world contexts and demonstrate that the attack works regardless of which model produced the data, which model is trained on the data or what the attack target is. Furthermore, the attack survives 11 tested data-level defences, including one where every sample is paraphrased by another model. We characterise when this attack works best and show that it can be used to plant password-triggered behaviours into models while still beating defences. In short, we provide an existence proof that maximum-affordance defences can fail to stop sophisticated data poisoning attacks. We suggest that future defences should be supplemented with white-box methods and post-training model audits.

2509.01641 2026-06-03 eess.SP cs.AI cs.LG

Non-Identical Diffusion Models in MIMO-OFDM Channel Generation

MIMO-OFDM信道生成中的非相同扩散模型

Yuzhi Yang, Omar Alhussein, Mérouane Debbah

AI总结 提出非相同扩散模型,通过元素级时间指示器捕获局部误差变化,解决MIMO-OFDM信道估计中元素可靠性不均的问题,理论验证其正确性并数值实验证明有效性。

Comments resubmitted to IEEE TCOM

详情
AI中文摘要

我们提出了一种新颖的扩散模型,称为非相同扩散模型,并研究了其在无线正交频分复用(OFDM)信道生成中的应用。与使用标量时间索引表示全局噪声水平的标准扩散模型不同,我们将这一概念扩展为元素级时间指示器,以更准确地捕获局部误差变化。非相同扩散使我们能够表征噪声输入中每个元素(例如OFDM中的子载波)的可靠性,从而在初始化有偏时改善生成结果。具体来说,我们专注于无线多输入多输出(MIMO)OFDM信道矩阵的恢复,其中由于导频方案,初始信道估计在元素间表现出高度不均匀的可靠性。传统的时间嵌入假设噪声进展均匀,无法捕获这种跨导频方案和噪声水平的变化。我们引入一个与输入大小匹配的矩阵来控制元素级噪声进展。遵循与现有方法类似的扩散过程,我们从理论和数值上证明了所提出的非相同扩散方案的正确性和有效性。对于MIMO-OFDM信道生成,我们提出了一种维度级时间嵌入策略。我们还开发并评估了多种训练和生成方法,并通过数值实验进行了比较。

英文摘要

We propose a novel diffusion model, termed the non-identical diffusion model, and investigate its application to wireless orthogonal frequency division multiplexing (OFDM) channel generation. Unlike the standard diffusion model that uses a scalar-valued time index to represent the global noise level, we extend this notion to an element-wise time indicator to capture local error variations more accurately. Non-identical diffusion enables us to characterize the reliability of each element (e.g., subcarriers in OFDM) within the noisy input, leading to improved generation results when the initialization is biased. Specifically, we focus on the recovery of wireless multi-input multi-output (MIMO) OFDM channel matrices, where the initial channel estimates exhibit highly uneven reliability across elements due to the pilot scheme. Conventional time embeddings, which assume uniform noise progression, fail to capture such variability across pilot schemes and noise levels. We introduce a matrix that matches the input size to control element-wise noise progression. Following a similar diffusion procedure to existing methods, we show the correctness and effectiveness of the proposed non-identical diffusion scheme both theoretically and numerically. For MIMO-OFDM channel generation, we propose a dimension-wise time embedding strategy. We also develop and evaluate multiple training and generation methods and compare them through numerical experiments.
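The element-wise time indicator can be sketched as a forward noising step in which the time index is a matrix of the same shape as the input instead of a scalar. Everything specific here is an assumption for illustration: the variance-preserving form, the cosine schedule `alpha_bar`, the real-valued toy channel, and the choice to mark every third subcarrier as fully reliable.

```python
import numpy as np


def alpha_bar(t):
    """Hypothetical cosine noise schedule: 1 at t = 0 (clean), 0 at t = 1 (pure noise)."""
    return np.cos(0.5 * np.pi * t) ** 2


def forward_noise(x0, t, rng):
    """Noise x0 with a per-element time indicator t (a scalar t broadcasts to all elements)."""
    t = np.broadcast_to(np.asarray(t, dtype=float), x0.shape)
    a = alpha_bar(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


rng = np.random.default_rng(0)
x0 = np.ones((4, 6))          # toy matrix: rows as antennas, columns as subcarriers
t_elem = np.full(x0.shape, 0.8)
t_elem[:, ::3] = 0.0          # hypothetical reliable positions: no noise applied
x_noisy = forward_noise(x0, t_elem, rng)
```

Elements with time indicator 0 pass through unchanged while the rest are heavily noised. Passing a scalar `t` recovers the usual single global noise level.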

2511.04243 2026-06-03 quant-ph cs.LG

Twirlator: A Pipeline for Analyzing Subgroup Symmetry Effects in Quantum Machine Learning Ansatzes

Twirlator: 分析量子机器学习拟设中子群对称性效应的流水线

Valter Uotila, Väinö Mehtola, Ilmo Salmenperä, Bo Zhao

AI总结 提出Twirlator流水线,通过对称群子群大小建模部分对称性,量化对称性增加时量子机器学习拟设的生成器漂移、电路开销、表达能力和纠缠能力之间的权衡。

Comments 8 pages; 7 figures; presented at the 7th International Workshop on Quantum Software Engineering (Q-SE 2026)

详情
Journal ref
Q-SE '26: Proceedings of the 7th IEEE/ACM International Workshop on Quantum Software Engineering (2026) 55 - 62
AI中文摘要

对称性是几何深度学习及其量子对应物中的强归纳偏置,并因其改善QML模型可训练性而受到越来越多的关注。然而,将对称性纳入量子机器学习(QML)拟设并非免费:对称化通常会增加门并约束电路。为了理解这些效应,我们提出了Twirlator,这是一个自动化流水线,用于对称化参数化QML拟设,并量化随着对称性增加而产生的权衡。Twirlator通过对称群子群的大小对部分对称性进行建模,从而能够分析“无对称性”和“完全对称性”极端之间的情形。在19种常见拟设模式中,Twirlator针对$S_n$的任何子群对称化电路,并测量(1)生成器漂移,(2)电路开销(深度和大小),以及(3)表达能力和纠缠能力。实验评估聚焦于$S_4$和$S_5$的子群。Twirlator揭示,较大的子群通常会增加电路开销,降低表达能力,并往往增加纠缠能力。该流水线和结果为在对称性感知的QML应用中选择平衡硬件成本和模型性能的拟设模式和对称性水平提供了实用指导。

英文摘要

Symmetry is a strong inductive bias in geometric deep learning and its quantum counterpart, and has attracted increasing attention for improving the trainability of QML models. Yet incorporating symmetries into quantum machine learning (QML) ansatzes is not free: symmetrization often adds gates and constrains the circuits. To understand these effects, we present Twirlator, which is an automated pipeline that symmetrizes parameterized QML ansatzes and quantifies the trade-offs as the amount of symmetry increases. Twirlator models partial symmetries by the size of a subgroup of the symmetric group, enabling analysis between the ``no symmetry'' and ``full symmetry'' extremes. Across 19 common ansatz patterns, Twirlator symmetrizes circuits with respect to any subgroup of $S_n$ and measures (1) generator drift, (2) circuit overhead (depth and size), and (3) expressibility and entangling capability. The experimental evaluation focuses on subgroups of $S_4$ and $S_5$. Twirlator reveals that larger subgroups typically increase circuit overhead, reduce expressibility, and often increase entangling capability. The pipeline and results provide practical guidance for selecting ansatz patterns and symmetry levels that balance hardware cost and model performance in symmetry-aware QML applications.
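One common way to symmetrize a gate generator with respect to a permutation subgroup is the twirling average $\frac{1}{|H|}\sum_{h \in H} P_h G P_h^{\dagger}$. The sketch below applies it to a three-qubit generator for a two-element subgroup and for all of $S_3$. It assumes qubit-permutation representations and NumPy matrices, and the function names are hypothetical; whether Twirlator implements exactly this formula is not stated in the abstract.

```python
import itertools

import numpy as np


def qubit_permutation_matrix(perm):
    """Unitary that moves qubit i to position perm[i] (qubit 0 is the leftmost factor)."""
    n = len(perm)
    P = np.zeros((2 ** n, 2 ** n))
    for bits in itertools.product([0, 1], repeat=n):
        new_bits = [0] * n
        for i, b in enumerate(bits):
            new_bits[perm[i]] = b
        src = int("".join(map(str, bits)), 2)
        dst = int("".join(map(str, new_bits)), 2)
        P[dst, src] = 1.0
    return P


def twirl(generator, subgroup):
    """Average the generator over conjugation by every permutation in the subgroup."""
    mats = [qubit_permutation_matrix(p) for p in subgroup]
    # Permutation matrices are real, so the transpose is the conjugate transpose.
    return sum(P @ generator @ P.T for P in mats) / len(mats)


def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)


Z = np.diag([1.0, -1.0])
I2 = np.eye(2)
G = kron3(Z, I2, I2)                                   # Pauli Z on qubit 0

partial_subgroup = [(0, 1, 2), (1, 0, 2)]              # identity and swap of qubits 0, 1
full_group = list(itertools.permutations(range(3)))    # all of S_3

G_partial = twirl(G, partial_subgroup)
G_full = twirl(G, full_group)
```

The partial twirl spreads the generator over qubits 0 and 1 only, while the full twirl spreads it over all three qubits, which illustrates the intermediate symmetry levels between the two extremes.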

2511.13663 2026-06-03 cs.PL cs.LG

SAIL: Sound Abstract Interpreters with LLMs

SAIL: 基于LLM的可靠抽象解释器

Qiuhan Gu, Avaljot Singh, Gagandeep Singh

AI总结 提出SAIL框架,利用大语言模型自动合成全局可靠的抽象变换器,通过约束优化和代价函数确保可靠性,在神经网络验证中匹配甚至超越人工设计的变换器。

Comments 43 pages, 21 figures

详情
Journal ref
Proc. ACM Program. Lang. 10, PLDI, Article 230, 26 pages (2026)
AI中文摘要

如何构建全局可靠的抽象解释器以安全地近似程序行为仍然是抽象解释中的一个瓶颈。在本文中,我们展示了使用最先进的大语言模型来自动化这一繁琐过程的潜力。聚焦于神经网络验证领域,我们利用大语言模型从零开始在无限空间中搜索,跨不同抽象域合成非平凡的可靠抽象变换器。我们将合成任务形式化为一个约束优化问题,为此设计了一种新颖的基于数学的代价函数,用于衡量每个生成候选变换器的不可靠程度,同时强制执行硬性的语法和语义有效性约束。基于这一公式,我们引入了SAIL,一个新颖的统一框架,结合了模型生成、语法和语义验证以及基于代价函数的细化,以合成全局可靠的抽象变换器。评估结果表明,SAIL不仅匹配了人工设计的变换器的性能,还能够合成为复杂非线性算子设计的、文献中不存在的可靠且高精度的变换器。

英文摘要

How to construct globally sound abstract interpreters to safely approximate program behaviors remains a bottleneck in abstract interpretation. In this paper, we show the potential of using state-of-the-art LLMs to automate this tedious process. Focusing on the neural network verification area, we synthesize non-trivial sound abstract transformers across diverse abstract domains using LLMs to search within infinite space from scratch. We formalize the synthesis task as a constrained optimization problem, for which we design a novel mathematically grounded cost function that measures the degree of unsoundness of each generated candidate transformer, while enforcing hard syntactic and semantic validity constraints. Building on this formulation, we introduce SAIL, a novel unified framework that combines model generation, syntactic and semantic validation, and cost-function-based refinement to synthesize globally sound abstract transformers. Evaluation results show that SAIL not only matches the performance of manually designed transformers, but also is able to synthesize sound and high-precision transformers that do not exist in the literature for complex non-linear operators.
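To give a feel for what a cost that measures the degree of unsoundness could look like, here is a simple sampled proxy on the interval domain: it accumulates how far concrete outputs fall outside the interval a candidate transformer returns. This is an assumption-heavy stand-in and not the paper's mathematically grounded cost function. The function names, the sampling scheme, and the deliberately broken candidate are hypothetical.

```python
import random


def unsoundness(transformer, concrete_op, boxes, samples_per_box=200, seed=0):
    """Average distance by which concrete outputs escape the abstract output interval.

    A value of 0 means no violation was observed on the samples. Sampling can
    expose unsoundness but cannot prove soundness.
    """
    rng = random.Random(seed)
    total = 0.0
    for lo, hi in boxes:
        out_lo, out_hi = transformer(lo, hi)
        for _ in range(samples_per_box):
            y = concrete_op(rng.uniform(lo, hi))
            total += max(0.0, out_lo - y) + max(0.0, y - out_hi)
    return total / (len(boxes) * samples_per_box)


def relu(x):
    return max(0.0, x)


def sound_relu(lo, hi):
    """Exact interval transformer for ReLU."""
    return (max(0.0, lo), max(0.0, hi))


def unsound_relu(lo, hi):
    """Broken candidate: underestimates the upper bound by half."""
    return (0.0, 0.5 * max(0.0, hi))


boxes = [(-2.0, 1.0), (-1.0, 3.0), (0.5, 2.0)]
cost_sound = unsoundness(sound_relu, relu, boxes)
cost_unsound = unsoundness(unsound_relu, relu, boxes)
```

A refinement loop could feed a positive cost back to the generator as a signal to repair the candidate. A final soundness claim would still need a formal check, not sampling.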