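arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.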

2605.20172 2026-05-20 cs.LO cs.AI

Long-term Power Grid Planning via Answer Set Programming

Antonio Ielo, Francesco Doria, Sandra Castellanos-Paez, Marco Maratea, Francesco Percassi, Mauro Vallati

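AI summary: This paper proposes an Answer Set Programming (ASP) based approach to automating and optimising long-term power grid planning, a process shaped by sustainability targets, demand patterns, and urbanisation trends.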

Comments 16 pages, 4 figures

English abstract

The power grid is critical infrastructure underpinning all aspects of modern society and its services, and keeping it effective requires continuous adaptation. In particular, addressing sustainability targets, demand patterns, and urbanisation trends requires changes to the network. Actual developments can span more than a decade, and supply continuity and service quality must be preserved throughout by ensuring conformance to several topological and combinatorial invariants. Long-term power grid planning deals with this process. Although planning languages could be a natural choice, the properties and invariants needed are cumbersome to express in such languages; by contrast, they can be elegantly and succinctly encoded in Answer Set Programming (ASP). In this paper, we propose the first approach to automate and optimise the long-term power grid planning process using ASP. Experimental evaluations conducted on synthetic and real-world grid data confirm the expressive power of the proposed ASP-based approach and demonstrate its effectiveness.

2605.20145 2026-05-20 stat.ML cs.LG stat.ME

Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization

Aurélien Pion, Emmanuel Vazquez

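AI summary: This paper studies goal-oriented calibration, in the noiseless setting, of the predictive distributions of standard Gaussian process models below a low threshold t. It proposes tcGP, a post-hoc method that calibrates the part of the predictive distribution below t, and shows that the resulting global optimization algorithm remains dense in the design space. Experiments show improved lower-tail calibration and Bayesian optimization performance relative to standard and globally calibrated Gaussian process models.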

Journal ref: ICML 2026

English abstract

Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For minimization, sampling criteria such as expected improvement (EI) depend on the predictive distribution below the current best value, so lower-tail miscalibration directly affects the sampling decision. This article studies goal-oriented calibration of GP predictive distributions below a low threshold $t$ in the noiseless setting, for standard GP models with hyperparameters selected by maximum likelihood. A framework for predictive reliability below $t$ is introduced, based on two notions of spatial calibration: occurrence calibration over the design space and thresholded $\mu$-calibration on sublevel sets of the form $\{x\in\mathbb{X}, f(x)\le t\}$. Building on this framework, we propose tcGP, a post-hoc method that calibrates GP predictive distributions below $t$, and we show that the resulting EI-based global optimization algorithm remains dense in the design space. Experiments on standard benchmarks show improved lower-tail calibration and BO performance relative to standard GP models and globally calibrated GP models.
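
To make the lower-tail dependence concrete, here is a minimal sketch of the standard closed-form expected improvement for minimization under a Gaussian predictive distribution. It is not tcGP itself; the function and argument names are illustrative assumptions. EI only integrates the predictive mass below the current best value, which is why miscalibration in that tail changes where BO samples next.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimization when the prediction is N(mu, sigma^2).

    mu, sigma: predictive mean and standard deviation at a candidate point.
    f_best: current best (lowest) observed value.
    """
    if sigma <= 0.0:
        # Degenerate prediction: improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_best - mu) * cdf + sigma * pdf

# Same mean, different predictive spread: an over-wide lower tail inflates EI.
ei_narrow = expected_improvement(mu=1.0, sigma=0.5, f_best=0.0)
ei_wide = expected_improvement(mu=1.0, sigma=2.0, f_best=0.0)
```

With the mean held fixed above the current best, the wider predictive distribution yields the larger EI, so an overconfident or underconfident lower tail directly shifts the sampling decision.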

2605.20132 2026-05-20 physics.geo-ph cs.LG eess.SP

FiLark: a streaming-first software framework for end-to-end exploration, annotation, and algorithm integration in distributed acoustic sensing

Jintao Li, Weichang Li, Kai Tong, Xaingyu Guo

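AI summary: This paper presents FiLark, a framework that applies streaming-first principles to end-to-end exploration, annotation, and algorithm integration for distributed acoustic sensing data, addressing the inability of conventional batch-oriented analysis frameworks to handle continuous, high-channel-count data streams.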

English abstract

Distributed acoustic sensing (DAS) systems generate continuous, ultra-high-channel-count data streams at rates that exceed the capabilities of conventional batch-oriented analysis frameworks. As a result, essential tasks such as interactive exploration of long-duration recordings, scalable event annotation, and real-time algorithm-in-the-loop monitoring remain inadequately supported by workflows built around manually selected data segments and offline processing. This paper presents FiLark (Fiber Lark), a Python framework that applies a streaming-first principle uniformly across data access, signal processing, visualization, and monitoring for DAS. Instead of operating on manually selected data segments, FiLark presents any DAS source, including continuous multi-file recordings, as a unified stream and builds all system components around that abstraction. An OpenGL-based ring-buffer renderer enables interactive browsing and visualization of arbitrarily long recordings with constant memory usage. An integrated annotation interface supports event labeling directly within continuous data streams, facilitating the creation of reproducible machine-learning-ready labeled datasets without offline preprocessing. The signal processing library includes temporal, spatial, spectral, and decomposition-based operators, with both CPU implementations and GPU-accelerated variants via PyTorch, alongside stateful chunked execution that preserves processing continuity and application semantics across segment boundaries. A standardized monitor interface further integrates streaming detectors and learning-based models into the visualization workflow. By sharing a common streaming abstraction across all layers, FiLark allows processing configurations and workflows developed interactively to transfer directly to scalable production pipelines without modification.
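
The idea of stateful chunked execution can be illustrated with a small, generic sketch. This is not FiLark's API; the class name and the choice of a causal moving average are assumptions made for illustration. The operator keeps the last window-1 samples between calls, so processing a stream chunk by chunk gives the same output as processing it in one pass.

```python
class ChunkedMovingAverage:
    """Causal moving average that carries state across chunk boundaries."""

    def __init__(self, window):
        self.window = window
        self.tail = []  # up to window-1 most recent samples from earlier chunks

    def process(self, chunk):
        data = self.tail + list(chunk)
        out = []
        # Emit one output per new sample, averaging over the samples available
        # in the trailing window (shorter only at the very start of the stream).
        for i in range(len(self.tail), len(data)):
            start = max(0, i - self.window + 1)
            segment = data[start:i + 1]
            out.append(sum(segment) / len(segment))
        keep = self.window - 1
        self.tail = data[-keep:] if keep > 0 else []
        return out

signal = [float(i % 7) for i in range(50)]
one_shot = ChunkedMovingAverage(5).process(signal)

streaming = ChunkedMovingAverage(5)
chunked = (streaming.process(signal[:3])
           + streaming.process(signal[3:20])
           + streaming.process(signal[20:]))
```

Here `chunked` and `one_shot` are identical even though the chunk sizes are irregular, which is the continuity property the abstract describes at segment boundaries.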

2605.20127 2026-05-20 q-bio.NC cs.AI cs.LG

Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment

Ken Nakamura, Tomoya Nakai, Ryuto Yashiro, Ayumu Yamashita, Kaoru Amano

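AI summary: This paper proposes a new way to evaluate model-brain alignment: by analysing which reproducibly predictable dimensions of the target response space are recovered, it reveals aspects of alignment that prediction accuracy alone does not capture.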

Comments 34 pages, 12 figures, 5 tables

English abstract

Artificial vision models are often evaluated against the human visual cortex by measuring how accurately their internal representations predict brain responses. However, prediction accuracy alone does not indicate which dimensions of the target brain's response space are recovered. Here, we introduce a unified framework for evaluating both model-brain and brain-brain alignment by identifying the response dimensions recovered by prediction. Using repeated fMRI measurements, we first identify target-brain response dimensions that can be reproducibly predicted across independent trial splits. We then predict target-brain responses from either another subject's brain responses or a vision model's internal representations, and quantify how strongly each of these reproducible response dimensions is recovered. Applying this framework to a subset of the Natural Scenes Dataset, in which eight subjects viewed the same natural images during fMRI, we find that the early-to-intermediate visual-cortex responses contain a low-dimensional set of reproducible dimensions. Brain-to-brain comparisons identify which of these dimensions are consistently recoverable from other subjects' brains, providing a diagnostic human reference rather than only a scalar benchmark. In some cases, pretrained and randomly initialized models achieve similar prediction accuracy while showing distinct recovery profiles across these response dimensions. These results show that prediction accuracy alone can mask model-brain mismatches. By making explicit which reproducible brain response dimensions are recovered by prediction, our framework provides a more diagnostic evaluation of alignment between artificial vision models and the human visual cortex.

2605.20122 2026-05-20 stat.ML cs.CC cs.LG

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

Peter Matthew Jacobs, Jeff M. Phillips

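AI summary: This paper proposes a Sample-Sketch-Solve approach that introduces a regular Cartesian grid sketch to compress the data and speed up Wasserstein distance computation, reaching ε-error estimates in improved runtime for Hölder smooth distributions.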

English abstract

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower-dimensional Euclidean problems ($d \in \{2,3\}$), algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $\varepsilon$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm in which we introduce a regular Cartesian grid sketch of the samples. We show that (especially under $\alpha$-Hölder smooth distributions) this can compress the data without increasing asymptotic error, and that it also regularizes the structure, which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $\varepsilon$ error in $\varepsilon^{-\max(2,\frac{d+1+o(1)}{1+\alpha})}$ time for $\alpha$-Hölder smooth distributions $P,Q$ on $(0,1)^{d}$ with $0 < \alpha < 1$; this is an optimal $\Theta(\varepsilon^{-2})$ for $\alpha > 1/2$ when $d=2$, and nearly optimal as $\alpha \to 1$ when $d = 3$.
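
As a rough illustration of the "sketch" step, the snippet below snaps samples in $(0,1)^d$ to the cell centers of a regular Cartesian grid and keeps one weighted point per occupied cell. The abstract does not give the paper's exact construction, so the function name, the cell-center choice, and the uniform sample weights are assumptions.

```python
from collections import Counter

def grid_sketch(points, cells_per_axis):
    """Compress samples in (0,1)^d to weighted cell centers of a regular grid.

    Returns a dict mapping each occupied cell center to the fraction of
    samples that fell in that cell.
    """
    counts = Counter()
    for p in points:
        # Integer cell index per coordinate, clamped to the last cell.
        idx = tuple(min(int(x * cells_per_axis), cells_per_axis - 1) for x in p)
        counts[idx] += 1
    n = len(points)
    return {
        tuple((i + 0.5) / cells_per_axis for i in idx): c / n
        for idx, c in counts.items()
    }

samples = [(0.10, 0.10), (0.12, 0.18), (0.90, 0.90), (0.60, 0.10)]
sketch = grid_sketch(samples, cells_per_axis=4)
```

The sketch never holds more points than there are grid cells, however many samples are drawn, and the supports of two sketched measures lie on the same regular lattice, which is the kind of structure a downstream solver can exploit.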

2605.20108 2026-05-20 eess.SY cs.AI cs.LG cs.LO cs.SY

k-Inductive Neural Barrier Certificates for Unknown Nonlinear Dynamics

Ben Wooding, Hongchao Zhang, Taylor T. Johnson, Abolfazl Lavaei

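AI summary: This paper proposes neural-network-based k-inductive neural barrier certificates (k-NBCs) for partially unknown nonlinear systems. It exploits the scalability of neural networks and a generalization of Willems et al.'s fundamental lemma to build a data-driven representation for SMT verification, while increasing design flexibility.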

Comments 18 pages, 5 figures, 3rd International Conference on Neuro-Symbolic Systems (NeuS)

English abstract

Conventional (k=1) discrete-time barrier certificate conditions impose strict safety constraints by requiring the function to be non-increasing at every step. k-inductive barrier certificates relax this by allowing temporary increases (up to k-1 times, each within a threshold $\varepsilon$) while maintaining overall safety and improving flexibility. This paper leverages neural networks to construct k-inductive neural barrier certificates (k-NBCs) for (partially) unknown nonlinear systems. While neural networks offer scalability in the design process, they lack formal guarantees and require additional approaches such as counterexample-guided inductive synthesis (CEGIS) with satisfiability modulo theories (SMT) for verification. However, the CEGIS-SMT framework requires knowledge of the system dynamics, which is unavailable in practical settings. To address this, we leverage the generalization of Willems et al.'s fundamental lemma, using a single state trajectory, to construct a data-driven representation of (partially) unknown models for SMT verification without sacrificing accuracy. Additionally, CEGIS-SMT removes the constraint of restricting barrier certificates to specific function classes, such as sum-of-squares, enabling greater flexibility in their design. We validate our approach on three nonlinear case studies with (partially) unknown dynamics.
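
The relaxation can be sketched as a check on barrier values sampled along one trajectory. The two conditions used here (every one-step increase is at most ε, and the value does not increase over any k consecutive steps) are one plausible reading of the abstract, labeled as an assumption; the paper's precise conditions, and the conditions on initial and unsafe sets, are not stated there. A sampled check like this is a sanity test, not a formal verification.

```python
def satisfies_k_inductive_decrease(values, k, eps):
    """Check assumed k-inductive decrease conditions along one trajectory.

    values[i] is the barrier value B(x_i) at step i.
    Assumed condition 1: B(x_{i+1}) - B(x_i) <= eps for every step.
    Assumed condition 2: B(x_{i+k}) - B(x_i) <= 0 for every i.
    """
    one_step_ok = all(
        values[i + 1] - values[i] <= eps for i in range(len(values) - 1)
    )
    k_step_ok = all(
        values[i + k] - values[i] <= 0.0 for i in range(len(values) - k)
    )
    return one_step_ok and k_step_ok

# Barrier values that rise briefly but decrease over every two steps.
barrier_values = [1.0, 1.05, 0.9, 0.95, 0.8]
ok_k2 = satisfies_k_inductive_decrease(barrier_values, k=2, eps=0.1)
ok_k1 = satisfies_k_inductive_decrease(barrier_values, k=1, eps=0.1)
```

The same sequence passes with k=2 and fails with k=1, where the second condition reduces to the conventional non-increasing requirement.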

2605.20086 2026-05-20 cs.NE cs.AI cs.LG

What Do Evolutionary Coding Agents Evolve?

Nico Pelleriti, Sree Harsha Nelaturu, Zhanke Zhou, Zongze Li, Max Zimmer, Bo Han, Sebastian Pokutta

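AI summary: This paper studies how evolutionary coding agents generate, modify, and select code using task-specific feedback in mathematical discovery and algorithm design. Using the EvoTrace dataset and the EvoReplay methodology to analyse the mechanisms at work during evolution, it finds that most score gains come from a small number of edit types and that a deterministic cycling pattern is present.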

Comments 28 pages, 12 figures, 12 tables

English abstract

Recent work pairs LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have produced strong results in mathematical discovery and algorithm design, yet a fundamental question remains: what do they actually evolve? Progress is typically summarized by the best score a run reaches under a task-specific evaluator, but that score can reflect several different mechanisms: new algorithmic structure, re-tuning an existing strategy, recombining ideas already in the model's internal knowledge, or overfitting to the evaluator. Distinguishing these mechanisms requires inspecting the search process itself, not only its final outcome. We introduce EvoTrace, a dataset of evolutionary coding traces spanning four evolutionary frameworks, reasoning and non-reasoning models, and 16 tasks across mathematics and algorithm design. To analyze these traces, we develop EvoReplay, a replay-based methodology that reconstructs the local search states behind high-scoring solutions and tests controlled interventions, including adjusting constants, removing program components and substituting models or prompting contexts. We annotate every code edit in EvoTrace with one of nine recurring edit types using an LLM-as-judge pipeline validated against blind human re-annotation. Across EvoTrace, most score gains come from a small subset of these edit types. We further find a deterministic cycling pattern: about 30% of code lines added during search are byte-identical re-introductions of previously-deleted lines, present throughout nearly every run. These results show that benchmark gains in evolutionary coding agents can arise from qualitatively different mechanisms, only some of which correspond to new algorithmic structure. EvoTrace enables more diagnostic evaluation of evolutionary coding agents beyond final benchmark scores.
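
The cycling statistic can be approximated with a few lines of code. The sketch below is a simplified, set-based stand-in for whatever diffing the authors used (an assumption): it walks a sequence of program versions and counts how many added lines exactly match a line deleted at an earlier step.

```python
def reintroduced_fraction(versions):
    """Fraction of added lines that exactly match a previously deleted line.

    versions: list of program texts in the order the search produced them.
    Lines are compared as exact strings; duplicates within one version and
    line positions are ignored in this simplified sketch.
    """
    deleted = set()
    added_total = 0
    added_again = 0
    for prev, cur in zip(versions, versions[1:]):
        prev_lines = set(prev.splitlines())
        cur_lines = set(cur.splitlines())
        for line in cur_lines - prev_lines:
            added_total += 1
            if line in deleted:
                added_again += 1
        # Record deletions only after scoring this step's additions.
        deleted |= prev_lines - cur_lines
    return added_again / added_total if added_total else 0.0

history = ["a\nb", "a\nc", "a\nb", "a\nb\nd"]
fraction = reintroduced_fraction(history)
```

In this toy history, three lines are added in total and one of them (`b`) had been deleted earlier, so the fraction is one third.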

2605.20068 2026-05-20 stat.ML cs.LG

Tail Annealing for Heavy-Tailed Flow Matching

Jean Pachebat

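AI summary: This paper proposes a simple way to handle heavy-tailed data: apply a soft-log transform to the data before training, then exponentiate after generation. A Hill diagnostic decides per coordinate whether to transform, leaving light-tailed margins unchanged. This compresses heavy tails into a range that standard flow matching can handle, without heavy-tailed base distributions or architectural modifications.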

Comments 18 pages

English abstract

Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply the soft-log transform $\phi(x) = \mathrm{sign}(x) \cdot \log(1 + |x|)$ coordinate-wise to data before training, then exponentiate samples after generation. A Hill diagnostic decides per-coordinate whether to transform, leaving light-tailed margins untouched at no added complexity. This compresses heavy tails into a range where standard flow matching succeeds, without heavy-tailed base distributions or architectural modifications. We provide theoretical intuition for why this works: the log-transform maps Pareto tails to exponentials, and the induced dynamics implement a form of tail annealing via power transformations. On a 144-configuration multivariate benchmark (3 copulas, $d$ up to 100, 4 tail indices), Log-FM dominates specialized baselines on $W_1$, CVaR$_{99}$, and extreme-quantile metrics, and is the only method with zero severe divergences across 2,880 runs.
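
A minimal sketch of the pre- and post-processing described above: the soft-log transform, its inverse, and the classical Hill estimator. The function names are illustrative assumptions, and the abstract does not say how the Hill value is thresholded to decide whether a coordinate is transformed, so no decision rule is shown.

```python
import math

def soft_log(x):
    """phi(x) = sign(x) * log(1 + |x|), applied to one coordinate."""
    return math.copysign(math.log1p(abs(x)), x)

def soft_exp(y):
    """Inverse of soft_log: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)

def hill_estimator(samples, k):
    """Classical Hill estimate of the extreme-value index from the k largest
    absolute values; larger values indicate heavier tails.

    Requires 0 < k < len(samples) and a nonzero (k+1)-th largest magnitude.
    """
    xs = sorted((abs(s) for s in samples), reverse=True)
    threshold = xs[k]  # the (k+1)-th largest magnitude
    return sum(math.log(x / threshold) for x in xs[:k]) / k

compressed = soft_log(1.0e6)        # about 13.8: a huge value becomes moderate
restored = soft_exp(soft_log(-42.0))
gamma = hill_estimator([math.e ** i for i in range(1, 6)], k=2)
```

A value of one million is compressed to roughly 13.8, and applying `soft_exp` after generation maps samples back to the original scale.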

2605.20055 2026-05-20 cs.SE cs.AI cs.RO

Towards LLM-Assisted Architecture Recovery for Real-World ROS 2 Systems: An Agent-Based Multi-Level Approach to Hierarchical Structural Architecture Reconstruction

Dominique Briechle, Raj Chanchad, Tobias Geger, Ruidi He, Dhruv Jajadiya, Dhruv Kapadiya, Andreas Rausch, Meng Zhang

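AI summary: This paper proposes an agent-based multi-level approach for recovering the hierarchical structural architecture of complex ROS 2 systems; refined prompting and multi-level intermediate architectural representations improve the consistency and scalability of architecture recovery.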

English abstract

Explicit software architecture models are essential artifacts for communicating, analyzing, and evolving complex software-intensive systems. In ROS 2-based robotic systems, however, structural (de-)composition and integration semantics are often only implicitly encoded across distributed artifacts such as source code and launch files, making recovery of hierarchical architecture particularly difficult. Existing approaches mainly focus on node-level entities and communication wiring, while providing limited support for recovering hierarchical structural (de-)composition across multiple abstraction levels. In this paper, we extend our previously proposed blueprint-guided LLM-assisted architecture recovery pipeline for ROS 2 systems through two major enhancements: (1) refined prompting to improve the consistency and controllability of architecture synthesis, and (2) a staged recovery strategy based on multi-level intermediate architectural representations that incorporate the atomic ROS node list and launch file dependencies, thereby enabling structurally constrained reconstruction across multiple abstraction levels. The approach is evaluated on a real-world automated product disassembly system based on cooperative robotic arms and heterogeneous ROS 2 artifacts. Compared to our previous work, the considered case study exhibits substantially higher integration complexity and richer functionality. The results demonstrate improved structural consistency, scalability, and robustness of architecture recovery, while also revealing remaining challenges related to dynamic integration semantics in large-scale ROS 2 systems.

2605.20049 2026-05-20 cs.SE cs.AI

Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study

Priyansh Trivedi, Olivier Schmitt

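AI summary: This study examines how code cleanliness affects coding-agent performance. Using pairs of repositories that match in architecture and behaviour but differ in cleanliness, it finds that cleanliness does not affect the pass rate but substantially reduces computational cost and file revisitations.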

English abstract

As autonomous coding agents see rapid adoption, their evaluation has primarily focused on task completion rates holding the target codebase fixed. This leaves a critical question unanswered: does the structural and stylistic quality, or "cleanliness", of the underlying code affect an agent's ability to navigate and modify it? To isolate the effect of code cleanliness from agent capability, we introduce an evaluation protocol built around minimal pairs: repositories that match on architecture, dependencies, and external behaviour, but differ on static-analysis rule violations and cognitive complexity. The pairs are constructed in both directions, by agent pipelines that either degrade a clean repository or clean a messy one. We author 33 tasks across six such pairs, evaluated through hidden tests at the application's public surface. Across 660 trials with Claude Code, code cleanliness does not change the agent's pass rate. However, it substantially alters the agent's operational footprint: agents working on cleaner code use 7 to 8% fewer tokens and reduce file revisitations by 34%. Our findings suggest that traditional maintainability principles remain highly relevant in the era of AI-driven development, shaping the computational cost and navigational efficiency of coding agents. Code cleanliness joins model choice, harness, and prompting as a factor that materially affects agent behaviours.

2605.20016 2026-05-20 eess.IV cs.CV

FGSVQA: Frequency-Guided Short-form Video Quality Assessment

Xinyi Wang, Angeliki Katsenou, Junxiao Shen, David Bull

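AI summary: This paper proposes an end-to-end video quality assessment framework that uses a CLIP-based dense visual encoder and compression priors from the frequency domain to generate artifact- and structure-aware weight maps for efficient video quality prediction.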

Comments 4 pages, 1 figure

English abstract

Short-form video poses new challenges to the quality assessment of user-generated content (UGC) due to its complex generation pipeline, rapid content variation, and mixed distortions. To address this challenge, we propose an end-to-end video quality assessment (VQA) framework that employs a dense visual encoder based on CLIP, and incorporates compression priors derived from the frequency domain to generate artifact- and structure-aware weight maps for feature aggregation. By explicitly decomposing artifact, structure, and original visual feature branches and adaptively fusing them over time through a learned gating module, the proposed method achieves accurate and efficient quality prediction. Experimental results show that our method achieves strong performance on short-form video datasets in terms of average rank and linear correlation (SRCC: 0.736, PLCC: 0.787), while maintaining efficient inference runtime. The code and additional results are available at: https://github.com/xinyiW915/FGSVQA.

2605.19988 2026-05-20 cs.SE cs.AI cs.DB cs.PF

A Case for Agentic Tuning: From Documentation to Action in PostgreSQL

Hongyu Lin, Mingyu Li, Weichen Zhang, Yihang Lou, Mingjie Xing, Yanjun Wu, Haibo Chen

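AI summary: This paper argues for replacing static documentation with dynamic action in system tuning. It introduces PerfEvolve, a tool that uses LLM agents for version-consistent validation, workload-specific analysis, and joint multi-parameter optimization; experiments on PostgreSQL show a 35.2% improvement over existing documentation-driven tuning methods.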

2605.19945 2026-05-20 cs.DC cs.AI cs.CL

GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems

Sourish Wawdhane, Avinash Kumar, Poulami Das

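AI summary: GEM optimizes the expert-to-GPU mapping in MoE models by accounting for GPU variability, reducing latency and improving system performance.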

Comments 18 pages

English abstract

Mixture-of-Experts (MoE) models enable efficient inference by employing smaller experts and activating only a subset of them per token. MoE serving engines distribute experts across multiple GPUs and, at inference time, route tokens to the appropriate GPUs based on the experts activated. They process tokens in lock-step: all tokens within a batch must finish processing a layer before any proceeds to the next. This synchronization barrier is a critical bottleneck because the performance of MoE models is limited by the straggler GPU that finishes last. Stragglers emerge when too many heavily used experts are placed on the same GPU or on the slowest GPU. While prior works place experts so as to balance token loads across GPUs, they all overlook GPU variability and often place highly used experts on the slowest GPUs. We propose GEM (GPU-variability-aware Expert Mapping), a framework for GPU-variability-aware expert-to-GPU mapping for MoE models. GEM exploits two insights. First, experts must be placed so that each GPU receives a non-uniform token load matched to its variability, and all GPUs finish processing a layer at about the same time. Our studies show that there are two types of experts: consistent experts, which are used most of the time, and temporal experts, which are often used together for the remaining time. Our second insight is that simultaneously used consistent and temporal experts must be placed on different GPUs, and kept off slower GPUs, to reduce slowdown. GEM gathers the variability profile of GPUs for each model and task and uses the per-task token load distributions to map experts to GPUs. Our experiments show that GEM improves end-to-end latency by 7.9% on average and by up to 16.5% compared to the baseline.
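
The first insight (non-uniform loads so that all GPUs finish a layer together) can be illustrated with a simple greedy heuristic. This is not GEM's algorithm, which the abstract does not specify; the function name, the relative-speed model, and the longest-first greedy rule are assumptions chosen for illustration.

```python
def map_experts(expert_loads, gpu_speeds):
    """Greedy variability-aware placement.

    expert_loads: dict of expert name -> expected tokens per batch.
    gpu_speeds: relative throughput per GPU (1.0 = nominal, lower = slower).
    Each expert, heaviest first, goes to the GPU whose projected finish time
    after receiving it is smallest.
    """
    finish = [0.0] * len(gpu_speeds)
    placement = {}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        gpu = min(
            range(len(gpu_speeds)),
            key=lambda g: finish[g] + load / gpu_speeds[g],
        )
        finish[gpu] += load / gpu_speeds[gpu]
        placement[expert] = gpu
    return placement, finish

loads = {"e0": 8, "e1": 4, "e2": 4, "e3": 2}
placement, finish = map_experts(loads, gpu_speeds=[1.0, 0.5])
```

In this example the GPU running at half speed receives 6 tokens and the nominal one 12, so both finish at the same time; a load-balanced split of 9 and 9 would leave the slow GPU as the straggler.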

2605.19928 2026-05-20 cs.GT cs.AI cs.LG

Real-Time Parallel Counterfactual Regret Minimization

Boning Li, Longbo Huang

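AI summary: This paper proposes a parallel framework for real-time depth-limited CFR solving that seamlessly integrates pruning, abstraction, and advanced CFR variants, so that near-equilibrium strategies can be computed within a few seconds; experiments on Texas Hold'em show a 3.3 to 3.4x speedup.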

Comments 13 pages, 3 figures

English abstract

Counterfactual Regret Minimization (CFR) is the dominant algorithmic family for solving large imperfect-information games, underpinning breakthroughs such as Libratus and Pluribus in No-Limit Texas Hold'em poker. In real-time game-playing systems, the solver must compute a near-equilibrium strategy within a strict time budget of only a few seconds per decision, and the number of CFR iterations completed in this window directly determines play strength. We present Parallel CFR, the first parallelization framework for real-time depth-limited CFR solving that seamlessly integrates pruning, abstraction, and advanced CFR variants. We decompose each CFR iteration into a pipeline of seven stages and identify two orthogonal dimensions of parallelism: by information set and by tree node. Leaf node evaluation is offloaded to GPUs via batched neural network inference, creating a heterogeneous CPU-GPU pipeline. Experiments on Heads-Up No-Limit Texas Hold'em demonstrate that Parallel CFR achieves a $3.3$ to $3.4\times$ speedup over the single-threaded baseline on postflop streets, with a per-iteration time of about 47 to 54 ms on a depth-limited game tree with over 1 billion histories. All experiments run on a single desktop-class device (NVIDIA DGX Spark), enabling hundreds of CFR iterations within a typical real-time decision budget without requiring datacenter-scale infrastructure.
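
For readers unfamiliar with CFR, the per-information-set update at its core is regret matching, sketched below. This is the textbook rule, not the paper's seven-stage pipeline, and the function name is an illustrative assumption. Because the update at one information set only reads that set's own cumulative regrets, it is a natural unit of work for parallelism "by information set".

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative counterfactual regrets into a strategy.

    Each action is played in proportion to its positive regret; if no action
    has positive regret, the strategy is uniform.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(positive)] * len(positive)
    return [p / total for p in positive]

strategy = regret_matching([2.0, -1.0, 6.0])
fallback = regret_matching([-1.0, -2.0])
```

The action with negative regret gets probability zero, and the other two split the mass 0.25 to 0.75 in proportion to their regrets.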

2605.19892 2026-05-20 cs.DC cs.AI cs.ET cs.NI

Deep Tech to Space: Space Data Centers and AI Revolution at the Edge

Jonas Weiss, Patricia Sagmeister, Gabriel Maiolini Capez, Dinesh Verma, Roberto Garello, Alberto Perotti, Dawid Lazaj, Alicja Musial, Jakub Nalepa, Thomas Morf, Martin Schmatz, Marek Krawczyk, Mateusz Przeliorz, Kevin Roche, Sagar Tayal, Mahalakshmi Lakshminarayanan, Nicolas Longépé, Pierre-Philippe Mathieu, Agata Wijata

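AI summary: Sharp cost reductions driven by private-sector innovation have rapidly increased the number of satellites in orbit, and with it the volume of space-generated data. Transmitting large volumes of data to Earth for processing may become increasingly costly and challenging because of space-to-Earth link congestion and increased latency. Traditional ground station networks may also struggle with growing data flows and workloads, since capacity constraints, complex scheduling logistics, and restricted visibility windows can limit scalability. Space Data Centers (SDCs), software-driven, multi-tenant AI service platforms that process data in orbit to generate actionable insights for client satellites and ground users, are a promising way to address these challenges.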

Comments 7 pages, 4 figures, 2 tables

English abstract

Dramatic cost reductions driven by private sector innovations have led to a rapid increase in the number of satellites in orbit and a corresponding surge in space-generated data. As this trend continues, transmitting large volumes of data to Earth for processing may become increasingly costly and challenging due to potential space-to-Earth link congestion and increased latency. Moreover, traditional ground station networks may face difficulties accommodating growing data flows and workloads because of capacity constraints, complex scheduling logistics, and restricted visibility windows, which can limit scalability. Space Data Centers (SDCs), software-driven, multi-tenant artificial intelligence-based service platforms capable of processing data in orbit to generate actionable insights for client satellites and ground users, represent a promising approach to address these challenges. This article presents the architecture of a Low Earth Orbit SDC satellite constellation, considering orbital design, inter-satellite links and network topology, computational resource organization, and software service orchestration. We analyze the potential technical feasibility and economic viability of SDCs using forecasting models informed by technology roadmaps and illustrate the concept through Earth observation and lunar exploration use cases.


2605.19889 2026-05-20 cs.GR cs.CV

GLUT: 3D Gaussian Lookup Table for Continuous Color Transformation

GLUT: 3D高斯查找表用于连续颜色变换

Danna Xue, David Serrano-Lozano, Shaolin Su, Javier Vazquez-Corral

AI总结本文提出GLUT，一种连续且显式的颜色表示方法，通过学习的3D高斯基元建模颜色变换，实现灵活的表示能力和紧凑的内存占用，并支持高效的用户友好编辑。

Comments Project page: https://color.cvc.uab.cat/glut/

AI中文摘要

3D查找表（3D LUTs）广泛用于颜色映射，但其基于网格的表示需要对RGB空间进行离散化，导致容量-内存权衡问题，当存储大量LUT时尤为严重。最近的方法采用隐式神经表示来提高可扩展性，但其黑箱性质限制了可解释性和直观的局部编辑。在本文中，我们提出了Gaussian LUT（GLUT），一种连续且显式的颜色表示方法，通过一组可学习的3D高斯基元来建模颜色变换。通过避免固定分辨率的网格，GLUT在保持紧凑内存占用的同时实现了灵活的表示能力。其显式、空间局部化的形式进一步使准确建模和可解释性成为可能。基于这一表示，我们引入了一个紧凑的条件生成器（CGLUT），用于为多个LUT实例预测GLUT参数，将多样的颜色风格编码在一个框架中，以实现平滑且可控的LUT风格混合。此外，GLUT通过允许对特定颜色区域进行局部调整而不需全局重新训练，实现了高效的用户友好编辑。实验结果表明，我们的方法在准确性和效率方面均优于先前的神经LUT表示，同时提供了改进的可解释性和交互控制。

英文摘要

3D Lookup Tables (3D LUTs) are widely used for color mapping, but their grid-based representation requires discretizing the RGB space, leading to a capacity-memory trade-off that becomes prohibitive when storing large numbers of LUTs. Recent approaches adopt implicit neural representations to improve scalability, yet their black-box nature limits interpretability and hinders intuitive, localized editing. In this paper, we propose Gaussian LUT (GLUT), a continuous and explicit color representation that models color transformations using a set of learnable 3D Gaussian primitives. By avoiding fixed-resolution grids, GLUT achieves flexible representational capacity while maintaining a compact memory footprint. Its explicit, spatially localized formulation further enables both accurate modeling and interpretability. Building on this representation, we introduce a compact conditional generator (CGLUT) that predicts GLUT parameters for multiple LUT instances, encoding diverse color styles in a single framework to enable smooth and controllable LUT style blending. Moreover, GLUT supports efficient, user-friendly editing by allowing localized adjustments to specific color regions without global retraining. Experimental results demonstrate that our approach outperforms prior neural LUT representations in both accuracy and efficiency, while offering improved interpretability and interactive control.


2605.19887 2026-05-20 cs.DC cs.MA cs.RO cs.SY eess.SY

DAG-Based QoS-Aware Dynamic Task Placement for Networked Multi-Stage Control Pipelines

基于DAG的QoS感知动态任务放置用于网络化多阶段控制流水线

Thien Tran, Jonathan Kua, Thuong Hoang, Minh Tran, Yuemin Ding, Jiong Jin

AI总结本文提出一种基于DAG的QoS感知动态任务放置框架，用于网络化机器人中的传感-感知-规划-控制流水线；该框架在少量候选放置方案上综合权衡尾部端到端延迟、截止时间违规率、硬件利用率和放置切换惩罚，以弥补静态边缘卸载和单阶段模型的不足。

Comments 4 pages, 1 figure, 1 algorithm, accepted as a Work-in-Progress (WiP) paper, on the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July, 2026, Melbourne, Australia

AI中文摘要

当前的物理人工智能（PAI）严重依赖闭环视觉伺服流水线，由于机器人上嵌入了复杂模型，其感知和规划阶段在机载执行时可能变得计算密集。在实践中，对于运行在标准化工业网络之上、对延迟敏感且要求精确的工业场景，将感知任务静态卸载到现场边缘节点并不合适。这凸显了工业自动化中控制-通信-计算（3C）协同设计的重要性：单体式本地执行会使AI加速的机器和机器人硬件饱和，而静态边缘卸载会使控制环路暴露于网络抖动。现有的自适应任务放置（ATP）控制器只能部分弥补这一差距：它们依据二值阈值规则迁移单个流水线阶段，既没有多阶段模型，也没有显式的放置切换成本。在这篇进展中工作（WiP）论文中，我们提出了一种基于有向无环图（DAG）的服务质量（QoS）感知动态任务放置（DTP）框架，用于网络化机器人中的传感-感知-规划-控制流水线。该流水线被形式化为一个DAG，带有描述计算成本、通信延迟和可行放置集的任务级和节点级属性；在一个小而可解释的候选集（完全本地、静态卸载、混合）上，基于窗口的成本函数综合了尾部端到端延迟、截止时间违规率、硬件利用率和汉明距离切换惩罚，而带有滞回和最小驻留时间的DTP算法则限制了放置抖动。本文给出了理论框架、结构化的定性分析，以及由仿真和硬件在环验证组成的两阶段路线图。

英文摘要

Current Physical AI (PAI) relies heavily on closed-loop visual-servoing pipelines, whose perception and planning stages may become computationally intensive onboard due to complex models embedded on robots. In practice, offloading the perception task to on-site edges statically is inappropriate for latency-sensitive, precise industrial settings over a standardized industrial network. This emphasizes the importance of Control-Communication-Computing (3C) co-design in industrial automation: monolithic local execution saturates AI-accelerated machine and robot hardware, while static edge offloading exposes the control loop to network jitter. Existing adaptive task placement (ATP) controllers can partially address the gap by relocating a single pipeline stage on binary threshold rules, without a multi-stage model and an explicit cost on placement switching. In this Work-in-Progress (WiP) paper, we propose a directed acyclic graph (DAG) based quality-of-service (QoS)-aware dynamic task placement (DTP) framework for sensing-perception-planning-control pipelines in networked robotics. This pipeline is formalized as a DAG with task-level and node-level attributes for compute cost, communication delay, and feasible placement sets; over a small interpretable candidate set (fully local, static offload, hybrid), a window-based cost function combines tail end-to-end latency, deadline violation rate, hardware utilization, and a Hamming-distance switching penalty, and a DTP algorithm with hysteresis and a minimum dwell-time bounds placement chatter. Our WiP paper presents the theoretical framework, a structured qualitative analysis, and a two-phase simulation plus hardware-in-the-loop validation roadmap.
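The switching logic described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the candidate encodings, the weights, the `WindowStats` fields, and the `DtpController` class are all assumptions introduced here, and the per-candidate statistics are assumed to be estimates supplied by some external monitor.

```python
from dataclasses import dataclass

# Hypothetical candidate placements: one bit per pipeline stage
# (sensing, perception, planning, control); 0 = on robot, 1 = on edge.
CANDIDATES = {
    "fully_local": (0, 0, 0, 0),
    "static_offload": (0, 1, 1, 0),
    "hybrid": (0, 1, 0, 0),
}


@dataclass
class WindowStats:
    tail_latency_ms: float     # e.g. estimated p99 end-to-end latency over the window
    deadline_miss_rate: float  # fraction of missed deadlines, in [0, 1]
    utilization: float         # hardware utilization, in [0, 1]


def hamming(a, b):
    """Number of stages whose placement differs between two candidates."""
    return sum(x != y for x, y in zip(a, b))


def window_cost(stats, placement, current, weights=(1.0, 100.0, 10.0, 5.0)):
    """Weighted sum of the four terms named in the abstract (weights are assumed)."""
    w_lat, w_miss, w_util, w_switch = weights
    return (w_lat * stats.tail_latency_ms
            + w_miss * stats.deadline_miss_rate
            + w_util * stats.utilization
            + w_switch * hamming(placement, current))


class DtpController:
    def __init__(self, initial="fully_local", hysteresis=5.0, min_dwell=3):
        self.current = initial
        self.hysteresis = hysteresis   # minimum cost gain required to switch
        self.min_dwell = min_dwell     # minimum windows between switches
        self.windows_since_switch = 0

    def step(self, stats_by_candidate):
        """Process one window and return the placement to use next."""
        self.windows_since_switch += 1
        cur = CANDIDATES[self.current]
        costs = {name: window_cost(stats_by_candidate[name], bits, cur)
                 for name, bits in CANDIDATES.items()}
        best = min(costs, key=costs.get)
        gain = costs[self.current] - costs[best]
        dwell_ok = self.windows_since_switch >= self.min_dwell
        if best != self.current and dwell_ok and gain > self.hysteresis:
            self.current = best
            self.windows_since_switch = 0
        return self.current
```

The hysteresis margin and the dwell counter are two separate guards: the first ignores small cost differences, the second rate-limits switches even when the difference is large.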


2605.19794 2026-05-20 cs.HC cs.AI cs.DB

AffectAI-Capture: A Reproducible Multimodal Protocol for Small-Group Meeting Research

AffectAI-Capture：一种用于小组会议研究的可复现多模态协议

Meisam Jamshidi Seikavandi, Alice Modica, Anna Obara, Fabricio Batista Narcizo, Tanya Ignatenko, Ted Vucurevich, Jesper Bünsow Boldt, Paolo Burelli, Andrew Burke Dittberner

AI总结本文提出了一种可重复的多模态协议AffectAI-Capture，用于收集四人会议类互动的同步多模态数据，结合眼动追踪、可穿戴生理、近距离和房间音频、多视角视频、事件日志和结构化自我报告。通过固定任务块和已建立的小组互动范式，结合权威事件时间线和标准化输出进行数据采集和后期处理。本文贡献在于建立了可重复的协议架构，将任务设计、仪器化、时间溯源和数据封装连接起来，用于情绪、行为和会议分析研究。

2605.17635 2026-05-20 hep-ex cs.LG

ML-based Fast Simulation of FARICH Responses

基于机器学习的FARICH响应快速模拟

Foma Shipilov, Alexander Barnyakov, Artem Ivanov, Fedor Ratnikov

AI总结本文提出基于条件生成对抗网络的机器学习方法，用于快速模拟FARICH探测器响应，通过轻量级卷积架构生成真实光子击中探测器矩阵的样本，并在速度和精度上优于传统蒙特卡洛方法。

Comments to be published in 7th International Workshop on Future Tau Charm Facilities (FTCF2025) proceedings

2605.16630 2026-05-20 cs.CR cs.AI

PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems

PrivScope：面向混合代理系统的任务范围披露控制

Shafizur Rahman Seeam, Zhengxiong Li, Zhiyuan Yu, Yimin Chen, Yidan Hu

AI总结本文提出PrivScope，一种在本地与云语言模型之间实施任务范围披露控制的可信本地负载管理者，旨在防止在混合代理系统中因任务无关上下文、先前工作流残留信息和过于具体的敏感细节导致的过度披露问题。

AI中文摘要

混合本地-云代理在将能力密集型子任务委托给云语言模型（CLM）之前，会利用持久的工作状态上下文来丰富用户请求。尽管这种增强可以提高任务成功率，但也暴露了云绑定负载中的不必要的信息，包括任务无关的上下文、先前工作流的残留信息以及过于具体的敏感细节，导致过度披露。现有解决方案要么将工作流隔离以限制跨工作流泄漏，要么应用通用目的的清理方法，但这些方法无法在LC组装的负载范围内进行推理。我们提出PrivScope，一种可信的本地负载管理者，它在本地-CLM边界实施任务范围披露控制，而无需对云端进行更改。其关键思想是：敏感信息应仅在需要完成委托子任务时才传送到云，并且以最不透露的形式传递，同时保持实用性。PrivScope从组装的负载中提取披露单元，并在本地保留直接标识符和账户关联值。其余单元通过云必要性控制，确定实际上需要什么；必须传送到云的单元会被抽象为最不具体的表示，足以完成任务。在100个医疗预约工作流上跨三个商业CLM测试中，PrivScope消除了资料泄漏（0.0% vs. 17.7%），将攻击者再识别率减少了一半以上（23.1% vs. 64.3%），并在每个测试的CLM上实现了最高的候选召回率，同时在GPT-4o-mini和Gemini 2.5 Flash上保持任务成功率接近未保护基线。在五种本地后端和商用硬件上，收益持续存在，仅增加了几秒钟的本地延迟。

英文摘要

Hybrid local-cloud agents enrich user requests with context from persistent working state before delegating capability-intensive subtasks to a cloud language model (CLM). While this enrichment can improve task success, it also exposes unnecessary information in the cloud-bound payload, including task-irrelevant context, carryover from prior workflows, and overly specific sensitive details, resulting in over-disclosure. Existing solutions either isolate workflows to limit cross-workflow leakage or apply general-purpose sanitization that does not reason over LC-assembled payload scope. We present PrivScope, a trusted on-device payload governor that enforces task-scoped disclosure at the local-CLM boundary, without requiring cloud-side changes. Its key idea: sensitive information should reach the cloud only when required for the delegated subtask, and then only in the least revealing form preserving utility. PrivScope extracts disclosure units from the assembled payload and keeps direct identifiers and account-linked values on device. The remaining units pass through cloud-necessity control, which determines what is actually needed; units that must reach the cloud are abstracted to the least-specific representation sufficient for the task. On 100 medical-booking workflows across three commercial CLMs, PrivScope eliminates profile leakage (0.0% vs. 17.7%), more than halves attacker re-identification (23.1% vs. 64.3%), and achieves the highest candidate recall on every CLM tested while preserving task success close to the unprotected baseline on GPT-4o-mini and Gemini 2.5 Flash. Gains hold across five local backbones and add only seconds of on-device latency on commodity hardware.
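As a rough illustration of the three steps in the abstract (keep identifiers on device, apply cloud-necessity control, abstract what remains), here is a toy sketch. Everything in it is hypothetical: the `Unit` type, the unit kinds, the `NEEDED` table, and the decade-level abstraction rule are invented for illustration and are not taken from PrivScope.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    kind: str   # hypothetical unit type, e.g. "name", "birth_date", "city"
    value: str


# Hypothetical: unit kinds that never leave the device.
DIRECT_IDENTIFIERS = {"name", "account_id", "phone"}

# Hypothetical: unit kinds each delegated subtask actually needs.
NEEDED = {"find_clinic": {"city", "birth_date", "symptom"}}


def abstract(unit):
    """Coarsen a unit to a less specific form (toy rule, assumed)."""
    if unit.kind == "birth_date":  # "1990-04-12" -> "1990s"
        return Unit("age_band", unit.value[:3] + "0s")
    return unit


def govern(units, subtask):
    """Return the units allowed into the cloud-bound payload."""
    payload = []
    for u in units:
        if u.kind in DIRECT_IDENTIFIERS:
            continue  # step 1: direct identifiers stay on device
        if u.kind not in NEEDED.get(subtask, set()):
            continue  # step 2: not necessary for this subtask
        payload.append(abstract(u))  # step 3: least-specific form
    return payload
```

A static lookup table stands in here for the necessity decision; the abstract does not say how that decision is made.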


2603.24400 2026-05-20 stat.ML cs.LG

Neural Network Models for Contextual Regression

用于上下文回归的神经网络模型

Seksan Kiatsupaibul, Pakawan Chansiripas

AI总结本文提出了一种用于上下文回归的神经网络模型，通过将上下文特征确定主动子模型和拟合模型的算法分离，实现了结构化且可解释的架构，参数更少。数学上证明该架构足以用标准神经网络组件表示上下文线性回归模型，并通过数值实验表明所提模型在参数数量相当的情况下，具有更低的均方误差和更稳定的性能。

AI中文摘要

我们提出了一种用于上下文回归的神经网络模型，其中回归模型依赖于确定活跃子模型的上下文特征以及一个拟合模型的算法。所提出的简单上下文神经网络（SCtxtNN）将上下文识别与上下文特定回归分离，从而实现了一个结构化且可解释的架构，其参数数量少于全连接前馈网络。我们数学上证明所提出的架构仅使用标准神经网络组件即可表示上下文线性回归模型。提供的数值实验支持这一理论结果，显示所提模型在参数数量相当的情况下，比具有相同参数数量的前馈神经网络具有更低的超额均方误差和更稳定的性能，而更大的网络只能以增加复杂性为代价提高准确性。结果表明，引入上下文结构可以提高模型效率，同时保持可解释性。

英文摘要

We propose a neural network model for contextual regression in which the regression model depends on contextual features that determine the active submodel and an algorithm to fit the model. The proposed simple contextual neural network (SCtxtNN) separates context identification from context-specific regression, resulting in a structured and interpretable architecture with fewer parameters than a fully connected feed-forward network. We show mathematically that the proposed architecture is sufficient to represent contextual linear regression models using only standard neural network components. Numerical experiments are provided to support the theoretical result, showing that the proposed model achieves lower excess mean squared error and more stable performance than feed-forward neural networks with comparable numbers of parameters, while larger networks improve accuracy only at the cost of increased complexity. The results suggest that incorporating contextual structure can improve model efficiency while preserving interpretability.


2602.18718 2026-05-20 stat.ML cs.LG math.OC stat.CO

Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

从Bures-Wasserstein空间到参数空间：使用Price梯度估计器的随机梯度变分推断

Kyurae Kim, Qiang Fu, Yi-An Ma, Jacob R. Gardner, Trevor Campbell

AI总结本文研究在仅给定目标分布未归一化对数密度时基于随机梯度的变分推断方法。通过比较Wasserstein VI（WVI）和黑盒VI（BBVI），发现WVI的优势源于其使用的Price梯度估计器，并进一步证明两者可以获得相同的最优迭代复杂度保证。

Comments Accepted to ICML'26

AI中文摘要

在仅给定未归一化对数密度的情况下近似目标分布时，基于随机梯度的变分推断（VI）算法是一种流行的方法。例如，Wasserstein VI（WVI）和黑盒VI（BBVI）分别在测度空间（Bures-Wasserstein空间）和参数空间中执行梯度下降。此前，对于高斯变分族，WVI的收敛性保证优于采用重参数化梯度的BBVI的已有结果，这表明测度空间方法可能具有某些独特优势。然而，本文为两者给出了相同的、当前最优的迭代复杂度保证，从而弥合了这一差距。特别地，我们发现WVI的优越性源于其使用的特定梯度估计器，而BBVI只需少量修改也可以利用该估计器。该估计器通常与Price定理相关，并利用目标对数密度的二阶信息（Hessian），我们称之为Price梯度。另一方面，WVI也可以改用仅需对数密度梯度的重参数化梯度，从而适用于更广泛的场景。我们通过实验证明，使用Price梯度是性能提升的主要来源。

英文摘要

For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space) and parameter space, respectively. Previously, for the Gaussian variational family, convergence guarantees for WVI have shown superiority over existing results for black-box VI with the reparametrization gradient, suggesting the measure space approach might provide some unique benefits. In this work, however, we close this gap by obtaining identical state-of-the-art iteration complexity guarantees for both. In particular, we identify that WVI's superiority stems from the specific gradient estimator it uses, which BBVI can also leverage with minor modifications. The estimator in question is usually associated with Price's theorem and utilizes second-order information (Hessians) of the target log-density. We will refer to this as Price's gradient. On the flip side, WVI can be made more widely applicable by using the reparametrization gradient, which requires only gradients of the log-density. We empirically demonstrate that the use of Price's gradient is the major source of performance improvement.
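For orientation, the two estimators contrasted in the abstract rest on standard Gaussian identities. The block below states them in textbook form for $q = \mathcal{N}(\mu, \Sigma)$ and a sufficiently smooth integrand $f$; the notation ($f$, $C$, $\varepsilon$) is assumed here and need not match the paper's.

```latex
% Bonnet's theorem (mean) and Price's theorem (covariance),
% for z ~ N(mu, Sigma) and f twice differentiable with suitable growth:
\nabla_{\mu}\, \mathbb{E}_{z \sim \mathcal{N}(\mu,\Sigma)}[f(z)]
  = \mathbb{E}\big[\nabla f(z)\big],
\qquad
\nabla_{\Sigma}\, \mathbb{E}_{z \sim \mathcal{N}(\mu,\Sigma)}[f(z)]
  = \tfrac{1}{2}\, \mathbb{E}\big[\nabla^{2} f(z)\big].

% Reparametrization gradient with z = mu + C eps, eps ~ N(0, I), Sigma = C C^T:
\nabla_{C}\, \mathbb{E}_{\varepsilon}\big[f(\mu + C\varepsilon)\big]
  = \mathbb{E}_{\varepsilon}\big[\nabla f(\mu + C\varepsilon)\, \varepsilon^{\top}\big].
```

The covariance identity requires Hessians of $f$, whereas the reparametrization form needs only first-order gradients, which matches the trade-off the abstract describes.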


2601.15014 2026-05-20 stat.ML cs.LG math.ST stat.TH

Efficient and Minimax Optimal In-context Nonparametric Regression with Transformers

基于Transformer的高效且极小极大最优的上下文非参数回归

Michelle Ching, Ioana Popescu, Nico Smith, Tianyi Ma, William G. Underwood, Richard J. Samworth

AI总结本文研究了基于上下文学习的非参数回归，针对α-Hölder光滑回归函数，证明了使用预训练的Transformer可以达到极小极大最优收敛率，且所需的参数和预训练序列数量显著少于现有文献。

Comments 30 pages, 7 figures

2601.12367 2026-05-20 cs.HC cs.RO

User-to-Vehicle Interaction in Smart Mobility: The GO-DRiVeS Autonomous Ride-Sharing Application

智能出行中的用户-车辆交互：GO-DRiVeS自动驾驶拼车应用

Hana E. Elmalah, Catherine M. Elias

AI总结本文提出了一种名为GO-DRiVeS的拼车应用，旨在解决大学学生和员工在炎热天气或携带重物时长时间步行的问题。该应用采用敏捷开发方法，并基于现有的交通应用框架进行分析和比较，实现了用户注册、拼车请求和实时追踪等功能，并通过多个实验验证了其稳定性和可靠性。

DOI: 10.1109/MELECON64486.2026.11418861

AI中文摘要

本文介绍了GO-DRiVeS应用，这是一种按需拼车和请求的移动应用，专门针对解决长时间步行、时间消耗和疲劳的问题，尤其是在炎热天气或携带重物时，这对大学学生和员工来说是一个挑战。GO-DRiVeS应用是按照敏捷方法开发的，以确保其灵活性。此外，使用移动应用程序系统架构和客户端-服务器架构。GO-DRiVeS是使用React Native（Expo）作为前端，Node.js和Express作为后端，MongoDB作为数据库实现的；基于对现有交通应用的详细分析，比较其框架并识别其核心功能。GO-DRiVeS支持用户注册、拼车请求和实时追踪等核心功能。此外，它能够以先到先得的方式同时处理多个请求。该应用基于这些功能进行开发，其结果以多种形式的实验形式呈现，展示了在处理请求时的稳定性，如在方法和结果章节中所展示的。

英文摘要

This paper introduces the GO-DRiVeS application, an on-demand ride-sharing and ride-requesting mobile application designed to spare university students and staff long walks, which are time-consuming and tiring, especially on hot days or when carrying heavy items. The GO-DRiVeS application was developed following the Agile methodology for its flexibility, using a mobile application system architecture and a client-server architecture. GO-DRiVeS was implemented using React Native (Expo) for the frontend, Node.js and Express for the backend, and MongoDB as the database, based on a detailed analysis of existing transportation applications that compared their frameworks and identified their essential functionalities. GO-DRiVeS supports core features such as user registration, ride requesting, and real-time tracking, and handles multiple simultaneous requests on a first-come, first-served basis. The application was developed around these features, and it was evaluated in multiple experiments that demonstrated stable behavior in handling requests, as presented in the Methodology and Results chapters.


2512.00667 2026-05-20 eess.SY cs.RO cs.SY

Active Learning of Fractional-Order Viscoelastic Model Parameters for Realistic Haptic Rendering

分数阶黏弹性模型参数的主动学习用于真实触觉渲染

Harun Tolasa, Gorkem Gemalmaz, Volkan Patoglu

AI总结本文提出了一种系统的方法，通过主动学习优化分数阶黏弹性模型的参数，以提高触觉渲染的感知真实感，同时通过人类在回路优化和群体感知地图结合，选择出在一般人群中被广泛认为真实的参数。

Comments This work has been submitted to the IEEE Transactions on Haptics for possible publication. 14 pages, 8 figures

AI中文摘要

有效的医疗模拟器需要真实地渲染具有黏弹性材料特性（如蠕变和应力松弛）的生物组织。分数阶模型提供了一种有效描述本质上时间依赖的黏弹性动力学的方法，仅需少量参数，因为它们自然地捕捉记忆效应。然而，由于分数元素的阶数与其他参数之间的非直观、频率依赖的耦合，确定产生高感知真实感的分数阶模型参数值仍是一个重大挑战。在本研究中，我们提出了一种系统的方法，通过主动学习优化分数阶黏弹性模型的参数，以优化触觉渲染在一般人群中的感知真实感。首先，我们证明通过基于定性反馈的人类在回路（HiL）优化可以有效优化分数阶模型的参数，以确保对每个人都能保持一致的高真实感评分。其次，我们提出了一种严格的方法，将HiL优化结果结合到一个在完整数据集上训练的聚合感知地图中，并展示如何从这种表示中选择群体层面的最佳参数，这些参数在一般人群中被广泛认为是真实的。最后，我们通过人类受试者实验验证了广义分数阶黏弹性模型参数在三种黏弹性材料中的有效性。总体而言，通过所提出的HiL优化和聚合方法建立的广义分数阶黏弹性模型有潜力显著提高医疗训练模拟器的sim-to-real过渡性能。

英文摘要

Effective medical simulators necessitate realistic haptic rendering of biological tissues that exhibit viscoelastic material properties, such as creep and stress relaxation. Fractional-order models provide an effective means of describing intrinsically time-dependent viscoelastic dynamics with few parameters, as they naturally capture memory effects. However, due to the unintuitive, frequency-dependent coupling among the order of the fractional element and other parameters, determining appropriate parameter values for fractional-order models that yield high perceived realism remains a significant challenge. In this study, we propose a systematic means of determining the parameters of fractional-order viscoelastic models that optimizes the perceived realism of haptic rendering across general populations. First, we demonstrate that the parameters of fractional-order models can be effectively optimized through active learning, using qualitative feedback-based human-in-the-loop (HiL) optimization, to ensure consistently high realism ratings for each individual. Second, we propose a rigorous method to combine HiL optimization results into an aggregate perceptual map trained on the entire dataset, and demonstrate how to select population-level optimal parameters from this representation that are broadly perceived as realistic across general populations. Finally, we provide evidence of the effectiveness of the generalized fractional-order viscoelastic model parameters for three viscoelastic materials by characterizing their perceived realism through human-subject experiments. Overall, generalized fractional-order viscoelastic models established through the proposed HiL optimization and aggregation approach possess the potential to significantly improve the sim-to-real transition performance of medical training simulators.


2511.01959 2026-05-20 astro-ph.IM astro-ph.CO astro-ph.HE cs.LG

Addressing prior dependence in hierarchical Bayesian modeling for PTA data analysis II: Noise and SGWB inference through parameter decorrelation

解决PTA数据分析的层次贝叶斯建模中的先验依赖问题 II：通过参数去相关进行噪声和随机引力波背景推断

Eleonora Villa, Luigi D'Amico, Aldo Barca, Fatima Modica Bittordo, Francesco Alì, Massimo Meneghetti, Luca Naso

AI总结本文提出了一种层次贝叶斯建模策略，通过参数去相关来解决脉冲星计时阵列数据中的先验依赖问题，同时通过正交投影和归一化流方法提高噪声和随机引力波背景参数推断的准确性。

Comments 27 pages, 5 figures. Extended analysis and appendix added. Submitted to the Astronomy and Computing special issue HPC in Cosmology and Astrophysics

AI中文摘要

脉冲星计时阵列（PTA）为测量低频引力波提供了一个强大的框架，但结果的准确性和鲁棒性受到复杂噪声过程的挑战，这些噪声必须精确建模。标准PTA分析为每颗脉冲星分配固定的均匀噪声先验，这种做法在合并整个阵列时可能引入系统性偏差。为克服这一限制，我们采用层次贝叶斯建模策略，其中噪声先验由更高层次的超参数来参数化。为减轻推断参数对噪声超先验选择的敏感性，我们引入了一种层次模型的重参数化方法，其基础是将超参数正交投影到物理参数子空间上。该变换通过归一化流（NFs）实现，它提供了可逆且易处理的表示，并在重参数化模型中保留了收缩效应和脉冲星间的信息共享。我们还采用流引导的嵌套采样器i-nessai，以高效探索由此产生的更高维参数空间。我们将该方法应用于一个最小的3脉冲星案例研究，同时推断噪声和随机引力波背景（SGWB）参数。尽管数据集有限，结果一致表明，重参数化的层次处理能更紧密地约束噪声参数，并部分缓解红噪声与SGWB之间的简并，而正交重参数化进一步增强了参数独立性，且不影响相关物理过程幂律建模所固有的相关性。

英文摘要

Pulsar Timing Arrays (PTA) provide a powerful framework to measure low-frequency gravitational waves, but accuracy and robustness of the results are challenged by complex noise processes that must be accurately modeled. Standard PTA analyses assign fixed uniform noise priors to each pulsar, an approach that can introduce systematic biases when combining the array. To overcome this limitation, we adopt a hierarchical Bayesian modeling strategy in which noise priors are parametrized by higher-level hyperparameters. To mitigate the sensitivity of the inferred parameters to the choice of noise hyperprior, we introduce a reparametrization of the hierarchical model based on the orthogonal projection of hyperparameters onto the physical parameter subspace. The transformation is implemented through Normalizing Flows (NFs), which provide an invertible, tractable representation and preserve shrinkage and inter-pulsar information pooling in the reparametrized model. We also employ i-nessai, a flow-guided nested sampler, to efficiently explore the resulting higher-dimensional parameter space. We apply our method to a minimal 3-pulsar case study, performing a simultaneous inference of noise and stochastic gravitational wave background (SGWB) parameters. Despite the limited dataset, the results consistently show that the reparametrized hierarchical treatment constrains the noise parameters more tightly and partially alleviates the red-noise-SGWB degeneracy, while the orthogonal reparametrization further enhances parameter independence without affecting the correlations intrinsic to the power-law modeling of the physical processes involved.


2510.27588 2026-05-20 cs.DS cs.DB cs.LG

Learned Static Function Data Structures

学习静态函数数据结构

Stefan Hermann, Hans-Peter Lehmann, Giorgio Vinciguerra, Stefan Walzer

AI总结本文提出了一种利用机器学习捕获键值间相关性的静态函数数据结构，通过压缩编码实现空间节省，突破零阶熵限制并支持点查询。

DOI: 10.14778/3796195.3796205
Journal ref: PVLDB, 19(5): 917-930, 2026

AI中文摘要

我们考虑了构建一个数据结构的任务，该数据结构将静态键集与值关联起来，同时允许对键集外的查询返回任意值。与哈希表相比，这些所谓的静态函数数据结构不需要存储键集，因此使用显著更少的内存。已知几种技术，压缩的静态函数接近值序列的零阶经验熵。在本文中，我们引入了学习静态函数，利用机器学习捕捉键和值之间的相关性。对于每个键，模型预测一个值的概率分布，从中推导出键特定的前缀码以紧凑地编码真实值。所得的编码词存储在经典静态函数数据结构中。这种设计使学习静态函数能够突破零阶熵限制，同时支持点查询。我们的实验显示了显著的空间节省：在真实数据上可达一个数量级，在合成数据上可达三个数量级。

英文摘要

We consider the task of constructing a data structure for associating a static set of keys with values, while allowing arbitrary output values for queries involving keys outside the set. Compared to hash tables, these so-called static function data structures do not need to store the key set and thus use significantly less memory. Several techniques are known, with compressed static functions approaching the zero-order empirical entropy of the value sequence. In this paper, we introduce learned static functions, which use machine learning to capture correlations between keys and values. For each key, a model predicts a probability distribution over the values, from which we derive a key-specific prefix code to compactly encode the true value. The resulting codeword is stored in a classic static function data structure. This design allows learned static functions to break the zero-order entropy barrier while still supporting point queries. Our experiments show substantial space savings: up to one order of magnitude on real data, and up to three orders of magnitude on synthetic data.
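The encoding idea in the abstract (a per-key predicted distribution, a key-specific prefix code, and a stored codeword) can be illustrated with a toy sketch. The assumptions are substantial: `toy_model` is a hand-written stand-in for a learned model, Huffman coding is one possible choice of prefix code, and a plain `dict` stands in for the classic static function structure, which in practice would not store keys.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a prefix code {value: bitstring} from {value: probability}."""
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(p, next(tie), {v: ""}) for v, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {v: "0" for v in heap[0][2]}
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {v: "0" + b for v, b in c0.items()}
        merged.update({v: "1" + b for v, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def toy_model(key):
    """Hypothetical predictor: even keys are usually "A", odd keys usually "B"."""
    if key % 2 == 0:
        return {"A": 0.90, "B": 0.05, "C": 0.05}
    return {"A": 0.05, "B": 0.90, "C": 0.05}


def encode(key, value):
    """Codeword for the true value under the key-specific prefix code."""
    return huffman_code(toy_model(key))[value]


def decode(key, bits):
    """Rebuild the key-specific code at query time and invert it."""
    inverse = {b: v for v, b in huffman_code(toy_model(key)).items()}
    return inverse[bits]


data = {0: "A", 1: "B", 2: "A", 3: "B", 4: "C"}
# Stand-in for the static function that maps each key to its codeword.
store = {k: encode(k, v) for k, v in data.items()}
total_bits = sum(len(bits) for bits in store.values())
```

In this toy instance the well-predicted values cost one bit each and the surprising one costs two, so `total_bits` is 6, compared with 10 bits for a fixed 2-bit code over three symbols.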

URL PDF HTML ☆

赞 0 踩 0

2510.12278 2026-05-20 cs.ET cs.AI

Quantum Annealing for Staff Scheduling in Educational Environments

量子退火在教育环境中的员工调度应用

Alessia Ciacco, Francesca Guerriero, Eneko Osaba

AI总结本文提出了一种基于量子退火的优化模型，用于解决多所学校和教育层次间员工分配问题，展示了量子退火在教育调度中的实际应用价值。

Comments 8 pages, 3 tables, and 2 figures. Paper presented at the International Conference on Quantum Communications, Networking, and Computing (QCNC 2026)

DOI: 10.1109/QCNC69040.2026.00104
Journal ref: in Proc. 2026 International Conference on Quantum Communications, Networking, and Computing (QCNC), IEEE, 2026, pp. 630-637

AI中文摘要

我们解决了一个新的员工分配问题，该问题出现在多个学校站点和教育层次之间组织协作者的过程中。该问题源于意大利卡拉布里亚公立学校的一个真实案例，其中员工必须在幼儿园、小学和中学之间分配，受到可用性、能力和公平性的约束。为解决此问题，我们开发了一个优化模型并研究了基于量子退火的解决方案方法。我们在真实数据上的计算实验表明，量子退火能够在较短的运行时间内产生平衡的分配结果。这些结果为量子优化方法在教育调度中的实际应用提供了证据，并更广泛地为复杂资源分配任务提供了依据。

英文摘要

We address a novel staff allocation problem that arises in the organization of collaborators among multiple school sites and educational levels. The problem emerges from a real case study in a public school in Calabria, Italy, where staff members must be distributed across kindergartens, primary, and secondary schools under constraints of availability, competencies, and fairness. To tackle this problem, we develop an optimization model and investigate a solution approach based on quantum annealing. Our computational experiments on real-world data show that quantum annealing is capable of producing balanced assignments in short runtimes. These results provide evidence of the practical applicability of quantum optimization methods in educational scheduling and, more broadly, in complex resource allocation tasks.


2509.19707 2026-05-20 stat.ML cs.LG stat.CO stat.ME

Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies

扩散与流基copula：遗忘与记忆依赖

David Huk, Theodoros Damoulas

AI总结本文提出基于扩散和流原理的copula建模方法，通过遗忘和记忆依赖机制，有效建模多变量依赖，提升了copula模型的表示能力，适用于复杂和高维数据。

Comments Published as a conference paper at ICLR 2026

AI中文摘要

copulas是建模数据多变量依赖的基本工具，在众多领域和应用中被广泛采用。然而，现有模型在处理多模态和高维依赖时受到限制性假设和扩展性差的阻碍。在本文中，我们提出了基于扩散和流原理的copula建模方法。我们设计了两种过程，逐步遗忘变量间依赖，同时不影响维度分布，证明在所有时间都定义有效的copula。我们展示了如何通过学习从每个过程中记忆遗忘的依赖来获得copula模型，理论上在最优时恢复真实copula。我们的框架的第一种实例专注于直接密度估计，第二种则专注于高效采样。实验表明，我们的方法在建模科学数据集和图像中的复杂和高维依赖方面优于现有copula方法。我们的工作增强了copula模型的表示能力，推动了其在更广泛领域和更大规模应用中的采用。

英文摘要

Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows. We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.
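As background for the phrase "valid copulas", the standard definition via Sklar's theorem is stated below. This is textbook material rather than the paper's construction; the symbols $F$, $F_i$, $C$, $c$, $p$ are notation assumed here.

```latex
% Sklar's theorem: for a joint CDF F with continuous marginals F_1, ..., F_d
% there is a unique copula C, a CDF on [0,1]^d with uniform marginals, such that
F(x_1, \dots, x_d) = C\big(F_1(x_1), \dots, F_d(x_d)\big).

% When densities exist, the joint density factorizes into the copula density c
% and the marginal densities p_i:
p(x_1, \dots, x_d) = c\big(F_1(x_1), \dots, F_d(x_d)\big) \prod_{i=1}^{d} p_i(x_i).
```

A process that leaves every marginal $F_i$ unchanged while altering only the dependence therefore acts on $C$ alone, which is the sense in which dependencies can be "forgotten" without touching dimension-wise distributions.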


2508.06526 2026-05-20 cs.DC cs.AI cs.AR

PiKV: KV Cache Management System for Mixture of Experts

PiKV: 一种用于混合专家架构的键值缓存管理系统

Dong Liu, Yanxuan Yu, Ben Lengerich, Ying Nian Wu

AI总结本文提出PiKV，一种专为混合专家架构设计的并行分布式键值缓存服务框架，通过专家分片缓存、PiKV路由和PiKV调度来减少缓存访问开销，并通过压缩模块降低内存使用。

Comments Github Link: https://github.com/NoakLiu/PiKV

AI中文摘要

随着大规模语言模型在规模和上下文长度上持续扩展，键值（KV）缓存存储的内存和通信成本已成为多GPU和多节点推理中的主要瓶颈。虽然基于混合专家（MoE）的架构使各专家间的计算稀疏化，但相应的KV缓存仍然是稠密且全局同步的，导致显著的开销。我们介绍了PiKV，一种专为MoE架构设计的并行分布式KV缓存服务框架。PiKV利用专家分片的KV存储在多个GPU之间划分缓存，利用PiKV路由减少令牌到KV的访问，并利用PiKV调度自适应地保留与查询相关的条目。为进一步降低内存使用，PiKV将PiKV压缩模块集成到缓存流水线中以实现加速。PiKV已作为开源软件库公开发布：https://github.com/NoakLiu/PiKV。PiKV仍是一个持续演进的项目，目标是成为面向MoE架构的全面KV缓存管理系统。

英文摘要

As large-scale language models continue to scale up in both size and context length, the memory and communication cost of key-value (KV) cache storage has become a major bottleneck in multi-GPU and multi-node inference. While MoE-based architectures sparsify computation across experts, the corresponding KV caches remain dense and globally synchronized, resulting in significant overhead. We introduce PiKV, a parallel and distributed KV cache serving framework tailored for MoE architectures. PiKV leverages expert-sharded KV storage to partition caches across GPUs, PiKV Routing to reduce token-to-KV access, and PiKV Scheduling to adaptively retain query-relevant entries. To further reduce memory usage, PiKV integrates PiKV Compression modules into the caching pipeline for acceleration. PiKV is publicly available as an open-source software library: https://github.com/NoakLiu/PiKV. PiKV is still a living project, aiming to become a comprehensive KV cache management system for MoE architectures.
