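arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.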

2606.07399 2026-06-08 stat.ML cs.LG New submission

Automatic, Debiased, and Invariant Counterfactual Generation under General Interventions

Raphael C Kim, Jingsen Zhu, Ramin Zabih, Michele Santacatterina

Affiliations: Cornell Tech; Cornell University; Department of Biostatistics, Department of Population Health; New York University Grossman School of Medicine

AI summary: Proposes the ADIGen framework, which combines Riesz regression, causal invariance, and orthogonal statistical learning to make counterfactual generation under general interventions automatic, debiased, and invariant, and provides excess risk bounds.

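2606.06957 2026-06-08 stat.ML cs.LG New submission

Deep Single-Index Fréchet Regression

Muqing Cui, Yidong Zhou, Su I Iao, Hans-Georg Müller

Affiliations: arXiv.org; University of California, Berkeley

AI summary: Proposes the DeSI framework, which estimates a single-index direction with a deep neural network and performs Fréchet regression in metric spaces, mitigating the curse of dimensionality while retaining interpretability; convergence rates are guaranteed theoretically, and the method performs well on distributions, networks, and other data.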

English abstract

Predicting outputs that are located in non-Euclidean spaces, such as probability distributions, networks, and symmetric positive-definite matrices, is becoming increasingly important in modern data analysis, particularly when inputs are high-dimensional. We propose DeSI (Deep Single-Index Fréchet Regression), a semiparametric framework for regression with metric space-valued outputs and multivariate inputs that assumes a single-index structure for the conditional Fréchet mean. DeSI estimates an interpretable index direction, which quantifies the relative importance of inputs, using a deep neural network, and performs Fréchet regression along the resulting one-dimensional index in the target metric space. This structure mitigates the curse of dimensionality while retaining interpretability, which stands in contrast to standard deep neural networks. We establish theoretical guarantees for DeSI, including uniform approximation and convergence rates, and demonstrate its strong predictive performance through simulations on distributions, networks, and symmetric positive-definite matrices, as well as an application to compositional mood data from New Jersey.
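The step "Fréchet regression along the resulting one-dimensional index" can be illustrated for the simplest output type, one-dimensional probability distributions under the 2-Wasserstein metric, where a weighted Fréchet mean is a weighted average of quantile functions. The sketch below is only an illustration under stated assumptions: the index direction `theta` is taken as given (in DeSI it is estimated by a deep neural network), a Gaussian-kernel local-constant smoother stands in for whatever Fréchet regression estimator the authors use, and the function name, `bandwidth`, and the shared quantile grid are hypothetical.

```python
import numpy as np

def frechet_regression_1d(theta, X, Q, x_new, bandwidth=0.5):
    """Kernel-weighted Wasserstein Frechet mean along the index theta^T x.

    X: (n, p) inputs; Q: (n, m) quantile functions on a shared grid.
    Returns the (m,) quantile function predicted at x_new.
    All names and the bandwidth default are illustrative assumptions.
    """
    theta = theta / np.linalg.norm(theta)       # unit-norm index direction
    u, u_new = X @ theta, x_new @ theta         # one-dimensional indices
    w = np.exp(-0.5 * ((u - u_new) / bandwidth) ** 2)  # Gaussian kernel weights
    w = w / w.sum()
    # For 1-D distributions under the 2-Wasserstein metric, the weighted
    # Frechet mean is the weighted average of the quantile functions.
    return w @ Q
```

Because the weights are nonnegative and sum to one, the output is again a non-decreasing quantile function, so the prediction stays inside the target metric space.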

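2606.06855 2026-06-08 stat.ML cs.LG math.ST stat.TH New submission

Stability beyond Bounded Differences: Sharp Generalization Bounds under Finite $L_p$ Moments

Qianqian Lei, Soham Bonnerjee, Yuefeng Han, Wei Biao Wu

Affiliations: University of California, Berkeley

AI summary: For heavy-tailed or unbounded losses, proposes a stability framework that requires only a finite $L_p$ moment condition and derives sharp high-probability generalization bounds covering empirical risk minimization, transductive regression, and meta-learning.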

English abstract

While algorithmic stability is a central tool for understanding generalization of learning algorithms, existing high-probability guarantees typically rely on uniform boundedness or sub-Gaussian/sub-Weibull tail assumptions, which can be overly restrictive for modern settings with heavy-tailed or unbounded losses. We develop a stability-based framework that requires only a finite $L_p$ moment condition. Our first contribution is sharp concentration inequalities for functions of independent random variables under $L_p$ constraints, extending McDiarmid's bounded-differences techniques beyond the classical regime. Leveraging these results, we derive sharp high-probability generalization bounds across a range of learning paradigms, including empirical risk minimization, transductive regression, and meta-learning. These guarantees show that $L_p$ stability suffices for robust generalization even when boundedness fails, substantially weakening the standard assumptions in the stability literature.
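For reference, the classical bounded-differences result that the abstract describes as the starting point can be stated as follows. This is the standard textbook form of McDiarmid's inequality for independent $X_1,\dots,X_n$, not a statement taken from the paper; the paper's $L_p$ versions are not reproduced here.

```latex
\[
\sup_{x_1,\dots,x_n,\,x_i'}
\bigl| f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n) \bigr| \le c_i
\quad (i = 1,\dots,n)
\;\Longrightarrow\;
\mathbb{P}\bigl( f(X_1,\dots,X_n) - \mathbb{E} f(X_1,\dots,X_n) \ge t \bigr)
\le \exp\!\Bigl( -\frac{2t^2}{\sum_{i=1}^n c_i^2} \Bigr),
\quad t > 0 .
\]
```

The uniform bound $c_i$ on each coordinate's influence is exactly the assumption that fails for heavy-tailed or unbounded losses, which is the gap a finite $L_p$ moment condition is meant to fill.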

2606.06814 2026-06-08 stat.ML cs.LG math.ST stat.AP stat.TH New submission

The Effect of Training Task Diversity on In-Context Learning through the Lens of Low-Dimensional Subspaces

Soo Min Kwon, Alec S. Xu, Can Yaras, Dogyoon Song, Laura Balzano, Qing Qu

Affiliations: University of California, Berkeley; University of Washington; University of California, Los Angeles; Stanford University; University of Toronto

AI summary: Using a mixture-of-low-rank-Gaussians model, this paper analyzes how training task diversity (defined by the number of non-overlapping columns between subspaces) improves the generalization and optimization of in-context learning with linear attention, explains why diverse training shortens the learning plateau and yields out-of-distribution generalization, and extends the results to nonlinear settings.

English abstract

The transformer's emergent ability to perform in-context learning (ICL) has sparked a wide range of studies designed to understand its underlying mechanisms. Existing works often study how training task diversity, defined either as the number of ICL training task vectors or as the number of function classes from which the task vectors are drawn, shapes both the learning dynamics and generalization capabilities of ICL. While both definitions have uncovered many interesting phenomena, many observations under the latter definition remain theoretically unexplained. This paper presents a minimal analytical model under which these phenomena provably emerge from the properties of the training data. By modeling the training task vectors as a mixture of low-rank Gaussians, we show how training task diversity, defined by the number of non-overlapping columns between subspaces that parameterize the covariance matrices, improves both the generalization and optimization trajectory of ICL with linear attention. In particular, we show that our model can explain (i) why training with task diversity shortens the ICL plateau and (ii) why ICL appears to achieve out-of-distribution generalization. We conclude by empirically demonstrating how our results extend to nonlinear transformers and nonlinear function classes. Overall, our work presents a tractable framework to unify existing observations.

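2606.06785 2026-06-08 stat.ML cs.LG math.DS New submission

Empirical Transfer Operators and Finite-Sample Change Detection for Noisy Expanding Interval Maps

Aparna Rajput

Affiliations: Department of Mathematics and Statistics, Concordia University

AI summary: For one-dimensional noisy dynamical systems, proposes a finite-sample change-detection method based on partition-based empirical transition matrices; it detects changes in the invariant density by comparing the L1 distance between the stationary distributions of sliding windows and a baseline segment, and provides finite-sample bounds and false-alarm guarantees.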

Comments: 27 pages, 2 tables, 1 figure

English abstract

We study finite-sample change detection for one-dimensional noisy dynamical systems using partition-based empirical approximations of stationary behaviour. Given observations from an interval-valued process, we partition the state space, estimate a finite transition matrix from observed transitions between partition elements, and apply a small Doeblin-type regularisation to ensure a unique stationary distribution. From an initial reference segment, we compute a baseline empirical stationary distribution $\widehat{\pi}_{0,\rho}$. For each later sliding window, we compute $\widehat{\pi}_{t,\rho}$ and define the score \[ S_t=\|\widehat{\pi}_{t,\rho}-\widehat{\pi}_{0,\rho}\|_1. \] Large values of $S_t$ indicate a change in stationary behaviour relative to the baseline. The statistic detects changes in invariant density or stationary law, but not all possible changes in transition dynamics. Under explicit assumptions on empirical transition concentration, finite-state stationary distribution stability, partition approximation, regularisation bias, and noise stability, we derive a finite-sample bound for the empirical stationary density. The bound separates sampling error, regularisation bias, partition approximation error, and noise bias. We then obtain a single-window false-alarm guarantee and a sufficient detection condition when the invariant density changes by more than the estimation error. We illustrate the method on synthetic noisy beta-map change-point experiments.
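The pipeline in the abstract (partition, empirical transition matrix, Doeblin-type regularisation, stationary distribution, L1 score) can be sketched in a few lines. This is a minimal illustration, not the author's implementation. It assumes observations in [0, 1], an equal-width partition, and a regularisation that mixes the empirical matrix with the uniform kernel at weight `rho`; the function names, `n_bins`, `rho`, and the uniform fallback for unvisited cells are all hypothetical choices.

```python
import numpy as np

def stationary_distribution(x, n_bins=10, rho=0.05):
    """Regularised empirical stationary distribution of a series in [0, 1]."""
    # Partition [0, 1] into equal-width cells and label each observation.
    cells = np.minimum((np.asarray(x) * n_bins).astype(int), n_bins - 1)
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (cells[:-1], cells[1:]), 1.0)   # observed transitions
    rows = counts.sum(axis=1, keepdims=True)
    # Row-normalise; unvisited cells fall back to a uniform row (assumption).
    P = np.where(rows > 0, counts / np.maximum(rows, 1.0), 1.0 / n_bins)
    # Doeblin-type regularisation (assumed form): mix with the uniform kernel,
    # which makes every entry positive and the stationary law unique.
    P = (1.0 - rho) * P + rho / n_bins
    # Solve pi P = pi together with sum(pi) = 1 by least squares.
    A = np.vstack([P.T - np.eye(n_bins), np.ones(n_bins)])
    b = np.append(np.zeros(n_bins), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def change_scores(x, ref_len, window, step=1):
    """S_t = ||pi_hat_t - pi_hat_0||_1 for sliding windows after the reference."""
    pi0 = stationary_distribution(x[:ref_len])
    starts = range(ref_len, len(x) - window + 1, step)
    return np.array([np.abs(stationary_distribution(x[s:s + window]) - pi0).sum()
                     for s in starts])
```

A threshold on the returned scores would then flag windows whose stationary law differs from the baseline. How that threshold is calibrated is the subject of the paper's false-alarm guarantee and is not modelled here.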

2606.06772 2026-06-08 stat.ML cs.AI cs.LG New submission

Generalization in Deep Neural Networks: Minimax Rates for Gradient Methods

Junyu Zhou, Puyu Wang, Yunwen Lei, Marius Kloft, Yiming Ying

Affiliations: Mathematical Institute for Machine Learning and Data Science, Catholic University of Eichstätt-Ingolstadt; Department of Computer Science, RPTU Kaiserslautern-Landau; Department of Mathematics, The University of Hong Kong; School of Mathematics and Statistics, The University of Sydney

AI summary: This paper connects the learning dynamics of over-parameterized deep neural networks with those of kernel methods and proves that gradient descent and stochastic gradient descent achieve minimax-optimal generalization error at sufficient width.

Comments: 37 pages

English abstract

Understanding the generalization performance of over-parameterized neural networks has become a central topic in deep learning theory. While recent advances, particularly works under the Neural Tangent Kernel (NTK) regime, have shed light on the behavior of shallow architectures, the statistical generalization properties of deep neural networks (DNNs), especially in regression tasks, remain far less understood. In this paper, we make significant progress toward closing this gap by providing a comprehensive generalization analysis of DNNs trained using gradient-based methods. First, we establish, for the first time, a crucial connection between the learning dynamics of a DNN with smooth activation functions trained via gradient-based methods and those of kernel methods, showing that gradient-based methods on over-parameterized DNNs can fully inherit the favorable learning dynamics of their kernel counterparts. Building on this connection and the well-established optimality of kernel methods, we derive the first known minimax-optimal rates for the excess population risk of both gradient descent (GD) and stochastic gradient descent (SGD), under the assumption that network width scales polynomially with the sample size. Our results demonstrate that, with sufficient width, DNNs trained by GD or SGD can achieve generalization performance comparable to kernel-based methods.

2606.06764 2026-06-08 stat.ML cs.AI cs.LG New submission

Optimal Rates for Generalization of Gradient Descent Methods with Deep Neural Networks

Junyu Zhou, Puyu Wang, Yunwen Lei, Yiming Ying, Ding-Xuan Zhou

Affiliations: Mathematical Institute for Machine Learning and Data Science, KU Eichstätt-Ingolstadt; Department of Computer Science, RPTU Kaiserslautern-Landau; Department of Mathematics, University of Hong Kong; School of Mathematics and Statistics, University of Sydney

AI summary: For deep ReLU networks in the neural tangent kernel (NTK) regime, this paper establishes the first minimax-optimal generalization error rates for gradient descent (GD) and stochastic gradient descent (SGD), proving that sufficiently wide networks attain the optimal rates of kernel methods.

Comments: 39 pages, 1 table

English abstract

Recent progress has been made in understanding the statistical generalization performance of gradient descent methods for overparameterized neural networks within the neural tangent kernel (NTK) regime. However, most of the existing work on regression problems is limited to shallow network architectures, leaving a notable gap in the theory of deep neural networks. This paper addresses this gap by presenting a comprehensive generalization analysis for deep ReLU networks trained using gradient descent (GD) and stochastic gradient descent (SGD). Specifically, we establish the first known minimax-optimal rates of excess population risk for both GD and SGD with deep ReLU networks, under the assumption that the network width scales polynomially with respect to the network depth and training sample size. Our results demonstrate that with sufficient width, gradient descent methods for deep ReLU networks can achieve optimal generalization rates on par with kernel methods.

2606.06516 2026-06-08 q-bio.QM cs.LG New submission

Probabilistic learning to perform pre-onset individualised prediction of disease severity: application to Veno Occlusive Disease

Dalia Chakrabarty, Kane Warrior, Chuqiao Zhang, Akash Bhojgaria, Joydeep Chakrabartty

Affiliations: University of York

AI summary: Proposes a new probabilistic supervised learning method that uses digital twins and probabilistic inverse learning to automatically predict the severity score of Veno Occlusive Disease (VOD) before bone marrow transplant, helping physicians decide on a treatment regimen.

English abstract

We advance a new probabilistic supervised learning approach that permits reliable, automated, and early individualised prediction of the severity with which a disease will develop in a prospective patient. The prediction capacity is illustrated via the pre-transplant prediction of the score of severity of Veno Occlusive Disease (or VOD) in the digital twin (DT) of the considered prospective patient, where this score parametrises the severity with which VOD will develop in this patient, after they undergo their Bone Marrow Transplant. The learning of the relationship between the pre-transplant variables, and a severity score variable is undertaken by modelling this relationship as a (random) function that is treated as a sample function of an adequately-chosen stochastic process. The parameters of this underlying process are learnt using a training dataset that is generated using the real-time evolution of retrospective patients in a cohort, with this training dataset subsequently augmented in size by a probabilistic inverse learning of the score of prospective patients. The augmented training set, then permits the learning of the function that capacitates - at the pre-transplant stage - automated prediction of the score of the severity of VOD that characterises the DT of a physical patient in their unique pre-transplant state. This score is subsequently fed back to the real prospective patient as the severity with which VOD will develop in them, after this patient undergoes their transplant. Such a score then permits the treating Haematologist-Oncologists to decide on the treatment regimen, which in this illustration reduces to deciding on treating the patient with Defibrotide. An AI facility is developed to undertake such automated prediction, with the physician inputting the data on the pre-transplant state that characterises the DT of the prospective patient under consideration.

2606.07492 2026-06-08 cs.IR cs.LG stat.ML New submission

Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies

Ekaterina Grishina, Stepan Kuznetsov, Askar Tsyganov, Ilya Ivanov, Daria Korovaitceva, Margarita Rusanova, Uliana Parkina, Alexander Derevyagin, Evgeny Frolov, Sergey Samsonov, Anton Lysenko

Affiliations: HSE University

AI summary: To address the sensitivity of recommendation-algorithm rankings to dataset characteristics, proposes a data-driven ranking method based on the Bradley-Terry model, and introduces a ranking-consistency metric and a method for ranking algorithms on unseen datasets.

DOI: 10.1145/3770855.3817890
Comments: KDD'26

English abstract

The ranking of recommendation algorithms is a challenging problem since model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This drives a demand for a proper methodology for fair comparison between algorithms. Naive aggregation of performance metrics (e.g., averaging NDCG over benchmarks) can yield misleading rankings, undermining practical selection. To address this problem, we introduce a novel, data-driven ranking methodology based on Bradley-Terry (BT) model. We demonstrate that the obtained ranking depends on key dataset statistics. Additionally, we propose a novel metric for evaluating ranking consistency and demonstrate robustness of our ranking to incomplete data. Finally, we introduce a dataset-specific methodology for ranking algorithms on unseen datasets without running the models, relying on extensions of the Bradley-Terry framework, including BT trees and BT models with covariates.
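As background for the Bradley-Terry (BT) ranking idea, the sketch below fits plain BT strengths from a pairwise win matrix using the standard minorise-maximise (MM) iteration. It is a generic illustration, not the paper's method: how the authors turn per-dataset metrics into pairwise outcomes is not stated in the abstract, so the reading of `wins[i, j]` as "number of datasets on which algorithm i scored higher than algorithm j" is an assumption, and the BT trees and covariate extensions are not covered. The sketch also assumes every algorithm has at least one win and one loss.

```python
import numpy as np

def fit_bradley_terry(wins, n_iter=200):
    """MM iterations for Bradley-Terry strengths.

    wins[i, j] = number of comparisons in which algorithm i beat algorithm j
    (diagonal assumed zero). Returns strengths p with sum(p) == 1, where the
    model says P(i beats j) = p[i] / (p[i] + p[j]).
    """
    wins = np.asarray(wins, dtype=float)
    n = wins + wins.T                       # number of comparisons per pair
    total_wins = wins.sum(axis=1)
    p = np.full(len(wins), 1.0 / len(wins))
    for _ in range(n_iter):
        denom = (n / (p[:, None] + p[None, :])).sum(axis=1)
        p = total_wins / denom              # MM update
        p /= p.sum()                        # fix the scale
    return p
```

Sorting algorithms by the fitted `p` gives the ranking. Unlike averaging a metric across benchmarks, the fit depends only on who beat whom, not on the size of each metric gap.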

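2606.07491 2026-06-08 cs.DC cs.AI cs.LG cs.SE New submission

Twelve quick tips for designing AI-driven HPC workflows

Jamie J. Alnasir

Affiliations: Department of Computer Science; Royal Holloway University of London

AI summary: Addressing the new challenges created by the convergence of AI and HPC, this paper offers twelve practical tips covering containerisation, job arrays, feedback loops, and I/O optimisation to help design efficient, scalable, and reproducible AI-driven HPC workflows.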

Comments: 12 pages, 1 figure. Formatted using the bioRxiv LaTeX preprint style

English abstract

High-performance computing (HPC) clusters remain the backbone of large-scale scientific computation, traditionally executing deterministic, linear pipelines optimised for predictable performance. However, the pervasive integration of artificial intelligence (AI) and foundation models into scientific research has introduced a fundamentally new computational paradigm. AI-driven workflows are characteristically iterative, data-driven, and probabilistic, introducing unique challenges regarding data gravity, heterogeneous resource management, and complex workflow orchestration. This guide provides twelve practical tips designed to help researchers design efficient, scalable, and reproducible AI-driven HPC workflows. By addressing critical system-level bottlenecks - such as containerisation for environment portability, strategic deployment of job arrays, explicit feedback loop mechanics, and I/O optimisation for small files - this article offers a framework for transitioning from rigid execution pipelines to adaptive, intelligent computational environments. While these architectural principles are broadly applicable across distributed environments, they are particularly tailored to the resource-intensive throughput demands of modern computational biology.

2606.07476 2026-06-08 eess.SY cs.RO cs.SY eess.SP New submission

Physiologically Constrained Musculoskeletal Neural Network for Multi-DoF Joint Kinematics Estimation from Partially Observed sEMG

Wending Heng, Mingming Zhang, Glen Cooper, Zhenhong Li

Affiliations: University of Manchester; Southern University of Science and Technology

AI summary: Proposes a musculoskeletal neural network (MSK-NN) that combines a CNN with a musculoskeletal forward dynamics module to estimate multi-DoF joint angles from partially observed surface electromyography signals, and infers physiologically plausible activations through a composite loss function.

English abstract

This paper investigates multi-degrees of freedom (DoF) joint kinematics estimation under partially observed surface electromyography (sEMG), where only a subset of task-relevant muscles can be measured due to anatomical inaccessibility or sensor constraints. A novel musculoskeletal neural network (MSK-NN) is proposed to estimate multi-DoF joint angles while simultaneously inferring activations for both measured and unmeasured muscles. MSK-NN consists of a CNN-based muscle activation estimator and an embedded MSK forward dynamics module, forming a fully differentiable architecture. Unlike existing hybrid neural frameworks that require additional biomechanical labels (e.g., muscle-tendon forces, joint torques), MSK-NN is trained without direct supervision of internal biomechanical variables. A composite physics-physiology loss is designed by incorporating a joint kinematics loss, a data-driven muscle synergy loss, and an anatomy-guided trend loss. The proposed method is evaluated on two-DoF wrist kinematics estimation across three rhythmic motions with unconstrained speed and amplitude, and one random motion. Compared with CNN, Bi-LSTM, CNN-LSTM, and PET baselines, MSK-NN achieves lower normalized root mean square error (NRMSE) and higher coefficient of determination (R2), especially for the random motion. More importantly, the optimized MSK parameters remain within physiological limits, and the estimated activation of an input-excluded muscle exhibits strong temporal agreement with its recorded sEMG envelope, demonstrating the capability of musculoskeletal (MSK)-NN to recover physiologically plausible activations.

2606.07454 2026-06-08 cs.IR cs.AI New submission

PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams

Fuqiang Wang, Song Tan, Zheng Guo, Jiaohao Fu, Xinglong Xu, Bihui Yu, Jie Dong, Zheng Sun, Siyuan Li, Jingxuan Wei, Cheng Tan

Affiliations: Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences); University of Chinese Academy of Sciences; Shanghai Artificial Intelligence Laboratory

AI summary: Proposes the PaperFlow framework, which handles dynamic paper streams in three stages (profiling, recommending, and adapting), and builds a longitudinal user-day benchmark; experiments show that it outperforms baselines in ranking and behavioral alignment.

Comments: 48 pages, 13 figures, 22 tables

English abstract

Scientific paper recommendation is typically evaluated as static ranking over a fixed candidate set, yet real scientific reading unfolds as a daily, longitudinal process in which interests shift and feedback accumulates. We introduce PaperFlow, a framework that organizes it into three coupled stages: Profiling, which constructs and maintains a structured, inspectable scholarly profile from heterogeneous cold-start evidence; Recommending, which ranks each date-specific paper stream through multi-signal aggregation under a fixed display budget; and Adapting, which updates user state from semantically distinct feedback signals and models interest drift across days. We further define a longitudinal user-day benchmark that fixes users, dates, candidate pools, visible inputs, and hidden simulated relevance labels under a shared temporal information boundary. The benchmark contains 24 simulated research users, 50 daily paper streams, 1,200 user-day episodes, 20,727 unique papers, and 497,448 episode-paper records. We additionally specify a blind human-evaluation protocol to validate alignment between automatic metrics and expert judgments. Experiments against five scientific recommendation baselines show that PaperFlow achieves the strongest oracle-based ranking, the highest behavioral alignment with simulated reading selections, and the best blind human-evaluation score.

2606.07449 2026-06-08 eess.SY cs.RO cs.SY New submission

On orbital stabilization of a circular motion primitive for a dynamic extension of the Dubins car model

Artem Angelchev-Shiryaev, Pavel E. Aleshin, Anton S. Shiriaev, Pavel A. Shamanaev, Leonid B. Freidovich

Affiliations: Department of Industrial and Mechanical Sciences, Lund University; Department of Information Technologies, Sirius University; Department of Engineering Cybernetics, NTNU; Department of Applied Physics and Electronics, Umeå University

AI summary: For a circular motion primitive of a dynamic extension of the Dubins car model, studies orbital stabilization within the transverse linearization framework, finds that the standard approach fails because the transverse linearization is unstable, and proposes a set of explicit, verifiable conditions under which the controller design still applies.

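2606.07316 2026-06-08 cs.MA cs.AI cs.DC New submission

Hierarchical Certified Semantic Commitment for Byzantine-Resilient LLM-Agent Collaboration

Haoran Xu, Lei Zhang, Iadh Ounis, Xianbin Wang

Affiliations: University of Glasgow; University of Western Ontario

AI summary: Proposes the H-CSC protocol, which converts embedding-derived finality signals into one of three typed outcomes (semantic commit, verdict commit, or explicit abort) to provide finality control for Byzantine collaboration among LLM agents, maintaining low angular deviation and a high abort rate under semantic poisoning and Byzantine attacks.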

Comments: 27 pages, 3 figures, 8 tables

English abstract

Byzantine collaboration among large-language-model agents requires a finality-control primitive: given delivered stochastic, structured natural-language proposals, the protocol must decide whether the round supports a commit, what kind of commit, or a typed safe abort. Naive aggregation hides this choice behind a single verdict; classical Byzantine fault tolerance hides it behind byte-identity that LLM proposals do not satisfy. We introduce Hierarchical Certified Semantic Commitment (H-CSC), a BFT-inspired protocol that converts embedding-derived finality signals over verdict-conditioned proposal groups into one of three typed outcomes: a semantic_commit (a 2f+1 within-verdict semantic core backs the verdict, emitting a parameter-bound digest over the quantised aggregate), a verdict_commit (strong verdict margin but dispersed semantic rationale, emitting a verdict-level certificate without claiming a semantic aggregate), or an explicit abort with a typed reason. The contribution is typed finality, not raw commit accuracy. On a controlled semantic-poisoning diagnostic (BCS_v1, 120 episodes), H-CSC commits with low angular deviation on BFT-feasible buckets (0.31 to 2.04 degrees) and aborts 100% of beyond-BFT rounds (n<3f+1) as intended. On a real LLM-agent claim-verification benchmark (MVR-50, 50 tasks) under paired static and rushing Byzantine attacks, H-CSC commits 0.90/0.92 with honest-reference-invalid rates of 0.02/0.00, statistically matching a strong certificate-emitting verdict-only baseline. Unlike that baseline, H-CSC also emits an embedding-backed semantic_commit digest on 74%/72% of rounds, supplying typed provenance. A strict-semantic ablation commits only 0.54/0.48, showing the verdict-level fallback is necessary for coverage (+0.36/+0.44) at the same <=0.04 safety floor; a 100-task cross-model check across four LLMs preserves invalid_hmaj within 0.00 to 0.03.
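The three typed outcomes can be pictured as a small decision function. The sketch below is a loose illustration of the control flow suggested by the abstract, not the H-CSC protocol itself. Only two thresholds come from the text: rounds with n < 3f+1 abort, and a semantic commit needs a 2f+1 within-verdict core. Everything else is hypothetical, including the function name, the reason strings, the requirement that the leading verdict itself reach 2f+1, and the default margin of f+1. How the semantic core is computed from embeddings is left out entirely.

```python
from collections import Counter

def decide_round(verdicts, core_size, f, margin=None):
    """Return (outcome, reason) for one round (illustrative sketch only).

    verdicts:  one verdict label per delivered proposal
    core_size: size of the largest semantically coherent group inside the
               leading verdict (its computation is out of scope here)
    f:         assumed bound on the number of Byzantine agents
    margin:    hypothetical verdict-margin threshold; defaults to f + 1
    """
    n = len(verdicts)
    if n < 3 * f + 1:
        return "abort", "beyond_bft"          # n < 3f+1: outside the BFT regime
    margin = f + 1 if margin is None else margin
    ranked = Counter(verdicts).most_common(2)
    top = ranked[0][1]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if top >= 2 * f + 1 and core_size >= 2 * f + 1:
        return "semantic_commit", "semantic_core"
    if top >= 2 * f + 1 and top - runner_up >= margin:
        return "verdict_commit", "verdict_margin_only"
    return "abort", "no_quorum"
```

The point of returning a typed pair rather than a bare verdict is that callers can tell a commit backed by a semantic core from one backed only by a verdict margin, and can see why a round was aborted.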

2606.07297 2026-06-08 cs.SE cs.CL New submission

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

Shaoqiu Zhang, Yuhang Wang, Jialiang Liang, Yuling Shi, Wenhao Zeng, Maoquan Wang, Shilin He, Ningyuan Xu, Siyu Ye, Kai Cai, Xiaodong Gu

Affiliations: Shanghai Jiao Tong University; Xinjiang University; University of Illinois at Urbana-Champaign; Independent Researcher; The Chinese University of Hong Kong

AI summary: Proposes the SWE-Explore benchmark, which measures repository exploration by evaluating how well a coding agent returns a ranked list of relevant code regions for a given repository and issue; it covers 848 issues across 10 programming languages and 203 repositories.

Comments: 20 pages, 5 figures

English abstract

Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding agents. Yet they usually treat coding tasks as a holistic, binary prediction problem (e.g., resolved or unresolved), neglecting fine-grained agent capabilities such as repository understanding, context retrieval, code localization, and bug diagnosis. In this paper, we introduce SWE-Explore, a benchmark that isolates the evaluation of repository exploration, a critical capability of coding agents. Given a repository and an issue, SWE-Explore asks an explorer to return a ranked list of relevant code regions under a fixed line budget. SWE-Explore covers 848 issues across 10 programming languages and 203 open-source repositories. For each instance, we derive line-level ground truth from independent agent trajectories that successfully solved the same issue, distilling the specific code regions their solution paths actually consulted. We evaluate exploration along coverage, ranking, and context-efficiency dimensions, showing that these metrics strongly track downstream repair behavior. Across a broad set of retrieval methods, general coding agents, and specialized localizers, we find that agentic explorers form a clear tier above classical retrieval. While file-level localization is already strong for modern methods, line-level coverage and efficient ranking remain the key axes differentiating state-of-the-art explorers.
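To make "a ranked list of relevant code regions under a fixed line budget" concrete, one plausible line-level coverage score is sketched below. The abstract does not define the benchmark's metrics, so this is an assumed formulation: the function name, the `(path, start, end)` region format, and the rule that every returned line (including repeats) consumes budget are all hypothetical.

```python
def coverage_under_budget(ranked_regions, truth, budget):
    """Fraction of ground-truth lines covered by the top-ranked regions.

    ranked_regions: list of (path, start, end) with inclusive line ranges,
                    best first
    truth:          set of (path, line) pairs
    budget:         maximum number of lines the explorer may return
    """
    seen, used = set(), 0
    for path, start, end in ranked_regions:
        for line in range(start, end + 1):
            if used == budget:
                break
            used += 1                      # every returned line costs budget
            seen.add((path, line))
        if used == budget:
            break
    return len(seen & truth) / len(truth) if truth else 0.0
```

Because regions are consumed in rank order until the budget runs out, an explorer that ranks relevant regions early scores higher than one that returns the same regions in a worse order. This is one way ranking and context efficiency can both show up in a coverage number.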

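2606.07150 2026-06-08 cs.CR cs.AI cs.MA cs.NI New submission

From Privacy to Workflow Integrity: Communication-Graph Metadata in Autonomous Agent Interoperability

Bijaya Dangol

Affiliations: Independent Researcher

AI summary: To address metadata leakage from the agent communication graph, proposes a workflow-integrity threat model, defines transport- and bootstrap-layer privacy properties, and uses an A2A case study to show that metadata protection effectively suppresses task inference.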

Comments: 12 pages, 6 figures

英文摘要

Agent-interoperability protocols such as A2A and MCP standardize what agents say to one another, but assume address-based transport over HTTP(S). Such transports protect message content, increasingly with end-to-end encryption. What they leave in the clear is the communication graph: which agent contacts which, when, and how often. In agent systems this graph is more consequential than a privacy framing suggests. Endpoints are often capability-labeled, workflows are structured and chained, and interactions are coupled to real actions, so an observer recovers more than past relationships. It can infer the pending workflow, the task being assembled and the action likely to follow. At machine speed, it can act on that inference before the workflow completes. The threat is therefore one of workflow integrity, not privacy alone: predictive leverage over autonomous action. We give a threat model for the agent communication graph; identify what makes agent metadata distinctively revealing (semanticity, prospectivity, actuation); define transport- and bootstrap-layer privacy properties and weigh candidate transports (SimpleX/SMP, Tor, mixnets) against them; and present an A2A case study in which a metadata-protecting binding is expressible but surfaces the protocol's identity assumptions. We test these on a generative model anchored to a real A2A capture. From passive metadata alone, with no payloads, a classifier recovers a task's class well above chance, from only the workflow's opening; applied together, the properties drive that recovery sharply back toward chance. Beyond what an observer can recover, we measure the leverage of acting on the leak: from a workflow's opening and under a fixed budget, an adversary choosing which workflows to act on realizes in this model most of a clairvoyant attacker's advantage over a metadata-blind one, and the same properties suppress it.


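2606.07119 2026-06-08 cs.ET cs.AI cs.MA New submission

The Three-Ring Architecture: Governing Agents in the Era of On-Platform Organisations

Sergio Alvarez-Telena, Marta Diez-Fernandez

Affiliations: arXiv

AI summary: Starting from the claim that enterprise AI deployments without governance infrastructure fail at rates as high as 95%, proposes the Three-Ring Architecture: Ring 1 is the existing production architecture, Ring 2 is a strategies-based agent federation layer, and Ring 3 is the LLM-based frontier intelligence layer. Ring 2 acts as the operating system of the agentic enterprise, providing resource abstraction, process coordination, permission enforcement, and intelligence accumulation; the paper also distinguishes the risk profiles of Ring 2 and Ring 3.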

Comments: 28 pages

Abstract

The current phase of enterprise AI deployment faces a structural failure: organisations are acquiring agentic capability without the infrastructure to govern it. The result is expected to reproduce the error of the first wave of AI deployment: decentralised intelligence without a federation layer leading to a 95% project failure rate. This paper formalises the Three-Ring Architecture as the governing infrastructure of the on-platform organisation. Ring 1 is the existing production architecture; Ring 2 is the M2 federation layer built on strategies-based agentic AI; Ring 3 is the LLM-based frontier intelligence layer. Ring 2 constitutes, in the technically exact sense, the operating system of the agentic enterprise - performing at the organisational level what a computing OS performs at the device level: resource abstraction, process coordination, permission enforcement, and a stable platform for compounding intelligence. A central contribution is the formal distinction between Ring 2 and Ring 3 risk profiles. Strategies-based agents operate within a deterministic framework: their consequences are traceable, their permissions enforceable, their deviations recoverable. LLM-based agents introduce a categorically distinct risk: a non-deterministic actor whose deviations propagate through complex organisational systems without retrospective traceability. Ring 2 is not a useful addition - it is a necessary condition of control and compliance. A further consequence: every improvement in LLM capability is a structural tailwind for this architecture. More capable non-deterministic actors produce larger consequences when they deviate. The governance requirement scales with capability. The architecture has been validated across a decade of deployment in financial services, government, procurement, and compliance among other sectors.


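2606.07114 2026-06-08 cs.NI cs.AI cs.IT math.IT New submission

DIFFRACT: Neuralized Utility Maximization for Wireless Networks by Differentiable Programming

Chee Wei Tan, Siya Chen

Affiliations: Nanyang Technological University

AI summary: Proposes DIFFRACT, a framework that uses differentiable programming to combine deep learning with optimization; algorithm unrolling maps interference-management algorithms into differentiable neural networks, enabling distributed end-to-end gradient-based learning that copes with dynamic multi-user interference and stochastic quality-of-service constraints.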

Comments: IEEE INFOCOM 2026

Abstract

Next-generation wireless networks, including satellite-to-Open RAN systems, demand agile and intelligent resource management capable of handling dynamic multi-user interference under stochastic quality of service constraints. This paper introduces DIFFRACT, a neuralized utility maximization framework that leverages differentiable programming to integrate deep learning with optimization in wireless networks. Central to our approach is the exploitation of the mathematical structure of standard interference functions, which are foundational in wireless power control. By developing a duality theory for these functions, we map iterative interference management algorithms into differentiable neural network architectures via algorithm unrolling. This enables distributed, end-to-end gradient-based learning at the network edge, supporting real-time adaptation to interference in both terrestrial and non-terrestrial environments. DIFFRACT allows for scalable and robust utility maximization by modeling complex channel dynamics and leveraging the expressiveness of differentiable models. Experimental results confirm the framework's theoretical soundness and practical effectiveness for next-generation wireless systems.
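To make "unrolling an iterative interference-management algorithm" concrete, here is a minimal sketch of the classical fixed-point power-control iteration p ← I(p) for a standard interference function with SINR targets. This is the textbook iteration, not DIFFRACT's architecture: the names `gain`, `noise`, and `target_sinr` are assumptions, and the learnable parameters a neural unrolling would attach to each layer are omitted.

```python
def interference(p, gain, noise, target_sinr):
    """One application of I(p): the power each link needs to meet its SINR target."""
    n = len(p)
    out = []
    for i in range(n):
        # Noise plus interference received from every other link j != i.
        interference_plus_noise = noise[i] + sum(
            gain[i][j] * p[j] for j in range(n) if j != i
        )
        out.append(target_sinr[i] * interference_plus_noise / gain[i][i])
    return out


def unrolled_power_control(gain, noise, target_sinr, num_layers=50):
    """Apply I(.) a fixed number of times; each application plays the role of one layer."""
    p = [0.0] * len(noise)
    for _ in range(num_layers):
        p = interference(p, gain, noise, target_sinr)
    return p
```

Fixing the number of iterations in advance is what turns the loop into a feed-forward computation whose output can be differentiated with respect to its inputs.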


2606.07094 2026-06-08 cs.SE cs.AI New submission

MetaConfigurator: AI-Assisted RDF Authoring from JSON Data

Felix Neubauer, Mahdi Jafarkhani, Kenichi Endo, Jürgen Pleiss, Benjamin Uekermann

Affiliations: Institute for Parallel and Distributed Systems, University of Stuttgart; Institute of Polymer Chemistry, University of Stuttgart; Institute of Biochemistry, University of Stuttgart

AI summary: Presents a web interface that integrates AI-assisted RML mapping, SPARQL querying, and knowledge-graph visualization to convert structured data such as JSON into RDF, and demonstrates it on data from MOF synthesis experiments.

Comments: Submitted as post-proceedings for the deRSE26 conference

Abstract

Scientific workflows increasingly generate structured JSON data that is easy to exchange but difficult to interpret consistently across systems due to a lack of semantic interoperability. While JSON Schema ensures structural validation, it provides no native support for Linked Data semantics. This paper presents an RDF Authoring View extending the open-source JSON Schema editor MetaConfigurator, enabling researchers to transform existing JSON, YAML, or CSV data into RDF using AI-assisted RML mappings, refine triples, execute SPARQL queries, visualize knowledge graphs, and export RDF serializations within a single integrated web interface. This workflow is supported by ontology-aware IRI auto-completion, bidirectional synchronization between JSON-LD text views and RDF triple tables, and AI-assisted SPARQL query generation from natural language hints. We demonstrate the workflow using laboratory data from metal-organic framework (MOF) synthesis experiments. Protocol data describing reagents, procedure steps, and quantities is converted from JSON to ontology-based JSON-LD via RML mappings. We then refine the semantic representation, query relationships between experimental conditions and outcomes, and explore the resulting knowledge graph interactively. This integrated environment bridges conventional structured data management with Semantic Web technologies while preserving experimental context and lowering technical barriers through AI assistance.


2606.07057 2026-06-08 cs.IR cs.CL New submission

Meaning in Order, Order in Meaning: Semantic R-precision for Keyphrase Evaluation

Shamira Venturini, Steffen Kinkel

Affiliations: ILIN - Institute for Learning and Innovation in Networks, Karlsruhe University of Applied Sciences; Karlsruhe Institute of Technology

AI summary: Proposes Semantic R-Precision (SemR-p), a metric that combines semantic similarity with rank awareness to evaluate automatically generated keyphrases from a human-centric perspective, complementing traditional lexical and semantic matching metrics.

Abstract

Evaluating the quality of automatically generated keyphrases remains a complex challenge. Traditional metrics either rely on exact lexical matching or consider semantic similarity while ignoring prediction ranking, both of which misalign with how humans judge informativeness and relevance. We introduce Semantic R-Precision (SemR-p), a novel evaluation metric that integrates semantic similarity into the rank-aware R-Precision framework. Designed from a human-centric perspective and inspired by Information Retrieval metrics, SemR-p rewards semantically relevant keyphrases that appear early in the output list. We conducted extensive analyses to assess its semantic sensitivity, ranking awareness, and discriminative power across models and datasets. The results suggest that SemR-p offers a complementary lens for evaluating keyphrase predictions, helping to better reflect user-centred notions of relevance alongside traditional lexical and semantic matching metrics.
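One plausible reading of a rank-aware, semantically tolerant R-Precision can be sketched as follows. This is not the paper's definition of SemR-p: the one-to-one matching rule, the `threshold` value, and the token-overlap `token_jaccard` similarity (a stand-in for an embedding-based similarity) are all assumptions made for illustration.

```python
def token_jaccard(a: str, b: str) -> float:
    """Toy similarity: token overlap. A real setup would use embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def semantic_r_precision(predicted, gold, similarity=token_jaccard, threshold=0.5):
    """Share of the top-R predictions that semantically match a distinct gold keyphrase,
    where R is the number of gold keyphrases."""
    r = len(gold)
    if r == 0:
        return 0.0
    unmatched = list(gold)
    hits = 0
    for phrase in predicted[:r]:  # only the first R predictions can earn credit
        if not unmatched:
            break
        best = max(unmatched, key=lambda g: similarity(phrase, g))
        if similarity(phrase, best) >= threshold:
            hits += 1
            unmatched.remove(best)  # each gold keyphrase is matched at most once
    return hits / r
```

Because only the first R predictions are inspected, moving a relevant keyphrase from position R+1 into the top R raises the score, which is the rank sensitivity the abstract describes.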


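2606.06838 2026-06-08 cs.SE cs.AI New submission

LLM Agent-Assisted Reverse Engineering with Quantitative Readability Metrics

Neil Archibald, Ruben Thijssen

Affiliations: University of Cambridge

AI summary: Proposes the Quantitative Readability Score (QRS) framework, which combines a structural similarity gate with three readability sub-metrics to guide LLM agents in improving the readability of decompiled code while preserving functional correctness.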

Abstract

Automatic decompilers produce functionally correct but often unreadable C code. This paper addresses one stage of the reverse engineering workflow: improving the readability of decompiled code using LLM agents guided by quantitative metrics. We present a three-phase research evolution. Phase 1 (tool-driven steering via Ghidra MCP) suffered from incomplete coverage and inconsistent improvements due to lack of quantitative guidance. Phase 2 (structural similarity validation alone) revealed that agents optimize for metrics in unintended ways, producing structurally equivalent but less readable code. Our contribution is the Quantitative Readability Score (QRS) framework, a composite metric combining a structural similarity gate with three independent readability sub-metrics (Lexical Surprisal, Structural Simplicity, and Idiomatic Quality). We demonstrate that QRS-guided refinement enables LLM agents to make targeted readability improvements without sacrificing correctness. We provide a discussion of the broader reverse engineering workflow (binary lifting, decompilation cleanup, and achieving functional equivalence) as context; however, it remains out of scope.
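The idea of a gated composite score can be sketched as below. The gate threshold, the equal weights, and the convention that every input lies in [0, 1] with higher meaning better are assumptions for illustration only; the abstract does not state how QRS actually combines its sub-metrics.

```python
def quantitative_readability_score(
    structural_similarity: float,
    lexical: float,
    structural: float,
    idiomatic: float,
    gate: float = 0.9,                     # hypothetical threshold
    weights=(1.0 / 3, 1.0 / 3, 1.0 / 3),   # hypothetical equal weighting
) -> float:
    """Return 0 if the rewrite drifts structurally; otherwise a weighted readability mean."""
    if structural_similarity < gate:
        # The gate rejects candidates that no longer resemble the original structure,
        # so readability gains cannot be bought by changing what the code does.
        return 0.0
    w_lex, w_struct, w_idiom = weights
    return w_lex * lexical + w_struct * structural + w_idiom * idiomatic
```

Placing the similarity check in front of the readability terms, rather than averaging it in, means an agent cannot trade structural fidelity for a higher readability score.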


2606.06830 2026-06-08 cs.CY cs.LG New submission

Learning Fair Demand Models

Adam N. Elmachtoub, Hyemi Kim, Jonathan Y. Tan

Affiliations: Department of Industrial Engineering and Operations Research and Data Science Institute, Columbia University; Department of Industrial Engineering and Operations Research, Columbia University

AI summary: Studies fairness in data-driven pricing by comparing strategies that impose fairness constraints at either the demand estimation stage or the price optimization stage, and analyzes their effect on consumer welfare and societal outcomes.

Abstract

Data-driven pricing is increasingly prevalent in sectors such as airlines, lending, insurance, and retail. By learning demand models from customer features and setting prices accordingly, these systems may generate discriminatory outcomes that raise fairness concerns. This leads to fundamental questions - how and where should systems incorporate fairness considerations in the pricing pipeline, and how does it ultimately affect societal outcomes? To answer these, we study a stylized model where a seller has a two-stage decision pipeline comprising linear demand model estimation followed by price optimization. The seller considers fairness notions in training loss, price, and demand, under both parity-wise and Rawlsian perspectives. We show that equalizing training loss across consumer groups leads to multiple solutions, which in turn can result in undesirable outcomes despite being a standard approach in fair machine learning. Focusing instead on fairness applied directly to prices or demand, we compare two strategies that enforce fairness in either the demand estimation stage or the price optimization stage. For parity-wise fairness, we characterize when each strategy yields higher social welfare under small fairness levels. We show that when market sizes and prices in the dataset are similar, imposing price fairness in the estimation stage is more beneficial to consumers, whereas imposing demand fairness in the optimization stage yields better consumer outcomes. For Rawlsian fairness, the two strategies coincide exactly. Lastly, we extend our model to alternate demand functions and conduct a case study using real-world vaccine pricing data.


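2606.06818 2026-06-08 cs.DC cs.AR cs.LG New submission

Terastal: Layer-Variant-based Scheduling for Real-Time Multi-DNN Workloads on Heterogeneous Accelerators

Sing-Yao Wu, Fengshuo Song, Eli Bozorgzadeh

Affiliations: 32nd IEEE International Conference on Embedded and Real-Time Computing Systems and Applications

AI summary: Targets multi-DNN execution on heterogeneous DNN accelerators, where per-layer latency differences limit scheduling flexibility and raise deadline miss rates. Introduces layer variants and the Terastal framework, which combines offline virtual budget assignment and layer-variant design with online scheduling to optimize accelerator mapping and variant selection; experiments show deadline miss rates reduced by more than 30% with only 2.24% accuracy loss.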

Comments: 8 pages, 6 figures. Accepted by RTCSA 2026. Author accepted manuscript

Abstract

Heterogeneous DNN accelerators improve soft real-time multi-DNN execution by mapping each layer to its preferred accelerator to reduce latency. However, under skewed workloads, large layer-latency differences across accelerators limit scheduling flexibility and increase deadline misses. To address this challenge, we introduce layer variants, customized layer implementations that reduce latency gaps on non-preferred accelerators. We then present Terastal, a soft real-time framework for layer-variant design and scheduling on heterogeneous DNN accelerators. Terastal combines offline heterogeneity-aware virtual budget assignment and layer-variant design, and online scheduling to jointly optimize accelerator mapping and variant selection under timing and accuracy constraints. Experimental results show that Terastal reduces deadline miss rate per model by 40.58%, 30.53%, and 36.27% compared with FCFS, EDF, and DREAM, respectively, while incurring only 2.24% average normalized accuracy loss across models with variants.


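2606.06800 2026-06-08 cs.HC cs.AI New submission

Exploring Reinforcement Learning for Fluid Transitions Between Clinical Mental Healthcare and Everyday Wellness Support

Tony Wang, Qian Yang

Affiliations: Cornell University

AI summary: Explores reinforcement learning (RL) for building digital health systems that dynamically select clinical and wellness interventions to optimize an overarching health goal (sustained journaling). Finds that the benefits of RL-optimized sequences often appear only after interventions end, and that the most engaged users of RL-generated interventions deepen their engagement over time, whereas those most engaged with a constant intervention tend to burn out and drop out.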

Journal ref: Healthcare Beyond Reaction: Harnessing AI and Sensing for Proactive Care, Workshop at ACM Interactive Health 2026 (IH '26), July 05--08, 2026, Porto, Portugal

Abstract

Mental health struggles wax and wane, yet clinical and wellness interventions typically operate separately, causing frequent breakdowns at care transitions. We explore reinforcement learning (RL) as a means to build digital health systems that deliver clinical and wellness interventions proactively, as part of a coherent care journey. We ask: what complexities does designing such a system involve? We built a contextual bandit that dynamically selects journaling prompts from clinical and wellness repertoires to optimize for an overarching health goal (sustained journaling) and deployed it in a four-week exploratory study (N=38). We found that, first, many benefits of RL-optimized intervention sequences appeared only after interventions ended, raising the question: Should systems that offer coherent clinical-wellness care journeys include stepping-back periods? If so, when and how? Second, participants most engaged with RL-generated interventions deepened their engagement over time, while those most engaged with a constant intervention tended to burn out and drop out later. This raises the question: When should a system blending clinical and wellness interventions reduce intensity to prevent burnout versus sustain it to maximize treatment gains?


2606.06784 2026-06-08 cs.CR cs.AI cs.CY New submission

What Your Posts Reveal: A Benchmark and Agentic Framework for User-Level Privacy Leakage on Social Media

Zifan Peng, Yini Huang, Aiwen Lu, Qiming Ye, Peixian Zhang, Jingyi Zheng, Yule Liu, Xuechao Wang, Xinlei He, Jiaheng Wei

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Wuhan University

AI summary: Addresses the lack of a unified benchmark and evaluation metric for user-level multimodal privacy leakage on social media by proposing the SopriBench benchmark and the Privacy Exposure Score (PES), and develops Argus, a training-free agentic framework that infers private attributes by reasoning over cues accumulated across posts; Argus reaches 0.55 PES, a 25% improvement over the strongest baseline.

Abstract

Public social media posts can reveal private information through weak cues scattered across text, images, or metadata. Such leakage is often cumulative and cross-post: cues that appear harmless in isolation may jointly expose a user's home, workplace, or routine. However, current research lacks a unified benchmark for user-level multimodal privacy leakage and an evaluation metric that captures exposure severity beyond binary accuracy. To address these gaps, we propose SopriBench, a synthetic benchmark guided by leakage patterns abstracted from a private reference corpus of Rednote and Instagram accounts, covering 50 user profiles and 1,569 images with attributes, contextual sensitivity, granularity, leakage type, inference difficulty, and supporting evidence. We further introduce the Privacy Exposure Score (PES), which weights value granularity by contextual sensitivity. Inspired by abductive reasoning, we introduce Argus, a training-free agentic framework for cumulative leakage inference. Argus forms hypotheses from accumulated evidence, verifies supporting evidence, and aggregates cross-post cues into privacy profiles, achieving 0.55 PES, a 25% improvement over the strongest baseline, with the largest gain on cross-post leakage.


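2606.06779 2026-06-08 cs.IR cs.AI New submission

Mind the Gap: Bridging Behavioral Silos with LLMs in Multi-Vertical Recommendations

Nimesh Sinha, Raghav Saboo, Martin Wang, Sudeep Das

Affiliations: DoorDash Inc.

AI summary: Proposes a framework that uses LLMs to transfer knowledge from data-rich verticals (such as restaurants) to data-sparse ones (such as grocery); a hierarchical RAG pipeline generates multi-level features that are integrated into an MTL ranking model, significantly improving personalization and engagement in emerging verticals.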

Abstract

In multi-vertical e-commerce platforms like DoorDash, relatively newer product verticals such as grocery and retail present a significant opportunity for personalization innovation. A key challenge lies in solving the "cold start" problem for users. This paper introduces a novel framework for enhancing recommendation quality by transferring knowledge from data-rich verticals (e.g., restaurants at DoorDash) to data-sparse ones. We leverage Large Language Models (LLMs) to perform generative inference, synthesizing sparse, high-dimensional features that encapsulate latent user affinities. Specifically, we employ a hierarchical Retrieval-Augmented Generation (RAG) pipeline to derive multi-level taxonomic features from user restaurant order histories and search queries. These generated features, encoding both long-term cross-vertical preferences and short-term intent, are integrated into a production Multi-Task Learning (MTL) ranking model. We demonstrate through extensive offline and online evaluation that this approach significantly improves personalization and engagement in emerging business verticals, effectively bridging the behavioral data gap.


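2606.06566 2026-06-08 cs.SE cs.AI New submission

NTILC: Neural Tool Invocation via Learned Compression

Andrew Krikorian, Yayuan Li, Jason J. Corso

Affiliations: Department of Robotics, University of Michigan; Department of ECE, University of Michigan

AI summary: Proposes NTILC, a framework that replaces in-context tool lookup with learned latent retrieval and decouples tool selection from argument generation; a signature-aware composite loss improves selection accuracy, and context consumption drops by over 95% and latency by up to 74% relative to the baselines.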

Comments: 10 Pages, 4 Figures, 5 Tables, 1 Algorithm

Abstract

Agentic tool-calling language models depend on large registries of callable APIs, functions, and local actions. Placing full tool specifications directly in the prompt incurs a cost that scales linearly with the size of the tool registry, rapidly consuming the context budget. As the registry grows, this leads to higher latency and degrades selection accuracy, particularly due to interference from irrelevant tools. We overcome these limitations by introducing NTILC, a neural tool selection and invocation framework that replaces in-context registry look-up with learned latent retrieval. NTILC maps both user intent and tool specifications into a shared embedding space, enabling tool selection via external retrieval rather than in-context lookup. The language model is conditioned only on the selected tool schema, allowing for precise, constrained argument generation. Central to our approach is a signature-aware composite objective, which augments semantic similarity with constraints derived from tool signatures (e.g., argument schema, type compatibility, and return types). By combining Circle Loss with a Functional Margin Loss, the model enforces separation between tools that are semantically similar but incompatible under their execution signatures. We evaluate NTILC on public tool-selection and function-calling datasets and report context token usage, retrieval accuracy, and selection latency metrics. Across these settings, NTILC reduces context window consumption by over 95% and inference latency by up to 74% compared to long-context ICT baselines.
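The retrieve-then-condition flow can be sketched in a few lines. The embeddings here are hand-written toy vectors standing in for learned ones, and `select_tool` and `build_prompt` are hypothetical helper names; the signature-aware training objective is not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_tool(intent_embedding, tool_embeddings):
    """Pick the tool whose embedding is closest to the user-intent embedding."""
    return max(tool_embeddings, key=lambda name: cosine(intent_embedding, tool_embeddings[name]))


def build_prompt(user_request, tool_name, tool_schemas):
    """Only the selected tool's schema enters the prompt, not the whole registry."""
    return f"Tool schema:\n{tool_schemas[tool_name]}\n\nRequest: {user_request}"
```

With this split, prompt length depends on one schema rather than on the registry size, which is the source of the context savings the abstract reports.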


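2606.06565 2026-06-08 cs.GR cs.HC cs.LG New submission

AI Level of Detail: Distance-Aware ML Model Precision Selection for Real-Time Human Motion Prediction in Games

Mathew Varghese

Affiliations: University of Washington

AI summary: Proposes the AI LOD framework, which adjusts machine learning inference precision according to the distance between each NPC and the player camera, using quantized models as the cheaper approximation to cut compute cost while maintaining perceived quality.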

Comments: Camera-ready for SIGGRAPH Technical Workshops 2026

Abstract

Modern game engines spend significant compute animating NPCs with learned motion models. This paper proposes AI Level of Detail (AI LOD), a framework in which machine learning inference precision is adapted based on the distance between each NPC and the player camera. The core idea mirrors classical geometry LOD: substitute a cheaper approximation where the difference is imperceptible. Here, the approximation is a lower-precision quantized machine learning model rather than a lower-polygon mesh. The contribution of this work is the AI LOD concept itself: that inference-time quantization can serve as the LOD axis for AI-driven character animation - and more broadly, for any AI-based runtime system where perceptual sensitivity varies with context. The convolutional sequence-to-sequence model of Li et al. is used as a representative example to demonstrate the concept, with its trained checkpoint exported into three ONNX Runtime variants (FP32, FP16, and INT8 per-tensor), intended to be routed by a distance-based selector at runtime. Evaluation on the CMU Mocap dataset provides initial evidence that each precision tier can be served at its assigned distance range with negligible perceptible degradation, supporting the broader premise that distance-aware ML model precision selection is a viable LOD strategy for AI-based character animation.
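A distance-based selector of the kind described could look like the sketch below. The distance thresholds and tier labels are hypothetical placeholders, not values from the paper, and the returned string would in practice map to one of the exported model variants.

```python
import bisect

# Hypothetical tier boundaries in metres: [0, 10) -> fp32, [10, 30) -> fp16, >= 30 -> int8.
TIER_BOUNDARIES = [10.0, 30.0]
TIER_VARIANTS = ["fp32", "fp16", "int8"]


def select_precision(distance_to_camera: float) -> str:
    """Return the model variant to run for an NPC at the given camera distance."""
    if distance_to_camera < 0:
        raise ValueError("distance must be non-negative")
    # bisect_right counts how many boundaries are <= the distance, i.e. the tier index.
    return TIER_VARIANTS[bisect.bisect_right(TIER_BOUNDARIES, distance_to_camera)]
```

This mirrors how geometry LOD switches meshes at fixed distance bands, with numeric precision taking the place of polygon count.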


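2606.06555 2026-06-08 cs.NE cs.LG New submission

Depth over Fidelity in Fixed-Budget Noisy Evolution Strategies

Sichen Wang, Zhipeng Lu

Affiliations: University of California, Berkeley

AI summary: For noisy evolution strategies under a fixed evaluation budget, proposes probabilistic elite membership (PEM), which replaces hard rank-based weights with conditional expected rank weights, a Rao-Blackwellization that reduces noise, and reports consistent gains on the COCO benchmark and on tasks such as RL.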

Comments: Accepted at the 43rd International Conference on Machine Learning (ICML 2026). 28 pages, 16 figures, 7 tables, including appendices

Abstract

Noisy evolution strategies under fixed evaluation budgets face a depth-fidelity trade-off: spending evaluations to denoise intra-generation rankings reduces the number of distribution updates the optimizer can execute. We argue for depth over fidelity and propose probabilistic elite membership (PEM), which replaces hard rank-based weights in evolution strategies with conditional expected rank weights that integrate over ranking uncertainty. PEM preserves the conditional mean update while reducing conditional update dispersion, a Rao-Blackwellization of the noisy rank-based step. We instantiate PEM via residual bootstrapping (RB-PEM) with capped per-generation overhead, complemented by an adaptive probe-and-switch mechanism for low-noise regimes. Across the COCO bbob-noisy suite and external tasks including RL policy search and hyperparameter optimization, RB-PEM achieves consistent gains in high-misranking, budget-constrained settings.
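The contrast between hard rank weights and expected rank weights can be illustrated with a small Monte Carlo sketch. It assumes minimization, equal weights on the top `mu` candidates, and Gaussian resampling of the observed fitness with a known `noise_std`; this substitutes a simpler resampling scheme for the paper's residual bootstrapping and is not the RB-PEM procedure itself.

```python
import random


def hard_weights(fitness, mu):
    """Classic truncation selection: weight 1/mu on the mu best (lowest) values, else 0."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    weights = [0.0] * len(fitness)
    for i in order[:mu]:
        weights[i] = 1.0 / mu
    return weights


def expected_rank_weights(fitness, noise_std, mu, num_samples=2000, seed=0):
    """Average the hard weights over plausible re-rankings of the noisy observations."""
    rng = random.Random(seed)
    totals = [0.0] * len(fitness)
    for _ in range(num_samples):
        resampled = [f + rng.gauss(0.0, noise_std) for f in fitness]
        for i, w in enumerate(hard_weights(resampled, mu)):
            totals[i] += w
    return [t / num_samples for t in totals]
```

When two candidates are nearly tied relative to the noise level, the averaged weights split the credit between them instead of letting a single noisy comparison decide, which is the dispersion reduction the abstract refers to.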


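2606.06545 2026-06-08 cs.SE cs.AI New submission

Queen-Bee Agents: A BeeSpec-Centered Architecture for Governed Enterprise MCP Orchestration

Dutao Zhang, Liaotian

Affiliations: Polytechnic University

AI summary: Proposes Queen-Bee, a multi-agent architecture in which a Queen control plane retrieves capabilities, plans tasks, and compiles a BeeSpec that Bee agents execute under constrained tool access, providing policy enforcement, tenant isolation, and in-boundary execution; it reaches a 0.964 success rate with zero governance failures on 59 enterprise-style tasks.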

Comments: Technical report. Prototype-level systems evidence; 59 enterprise-style tasks

Abstract

Enterprise agent systems increasingly need to connect large language models to private tools, internal knowledge, and Model Context Protocol (MCP) interfaces. In this setting, raw task capability is insufficient: organizations also require policy enforcement, tenant-scoped isolation, and execution that remains within explicit operational boundaries. We present Queen-Bee, a governed multi-agent architecture in which a Queen control plane retrieves capabilities, plans task-scoped execution, and compiles a structured BeeSpec that is executed by specialized Bee agents under constrained tool access. We implement a working prototype with tenant-scoped MCP connectors, audit-backed execution-time governance, retrieval-driven weak incubation, and multiple provisioning backends. We evaluate the system on 59 enterprise-style tasks spanning governance-sensitive requests, retrieval-driven provisioning, scoped local execution, and chemistry workflow integration. The retrieval-driven Queen-Bee variant achieves a task success rate of 0.964, zero governance failures, and substantially better scoped execution quality than both a static Queen-Bee baseline and a permissive single-agent baseline. We further show a multi-Bee chemistry workflow with explicit approval gating and a concrete top-3 shortlist grounded in real upstream evidence and screening artifacts. Additional comparisons with hybrid retrieval and LLM-guided provisioning show that richer provisioning backends are viable but do not outperform the lightweight structured retriever on the current small, highly structured capability registry. The results provide prototype-level systems evidence rather than a production deployment study, and suggest that enterprise agent platforms should be evaluated not only by capability, but also by governed provisioning, isolation behavior, scoped execution quality, and artifact-aware workflow coordination.
