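arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.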

2605.16208 2026-05-18 stat.ML cs.LG

A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature

Chaeyeon Lee, Sehwan Kim, Hyungrok Do

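Affiliations: Department of Statistics; Ewha Womans University; NYU Grossman School of Medicine

AI summary: This paper proposes QSurv, which uses Gauss-Legendre numerical quadrature for nonparametric continuous-time survival modeling without time discretization or restrictive distributional assumptions, captures non-stationary hazard dynamics, and shows experimental advantages in instantaneous hazard function estimation.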

Abstract

Flexible continuous-time survival modeling is critical for capturing complex time-varying hazard dynamics in high-dimensional data; however, training such models remains challenging due to the intractable integral required for likelihood estimation. We introduce QSurv, a scalable deep learning framework that enables nonparametric continuous-time modeling without relying on time discretization or restrictive distributional assumptions. We propose a training objective based on Gauss-Legendre numerical quadrature, which approximates the cumulative hazard with high-order accuracy while facilitating efficient end-to-end training via standard backpropagation. Furthermore, to effectively capture non-stationary hazard dynamics in complex architectures, we introduce time-conditioned low-rank adaptation, a mechanism that conditions general neural backbones on time by dynamically modulating weights via low-rank updates. We provide theoretical analysis establishing approximation error bounds for cumulative-hazard evaluation. Comprehensive experiments across synthetic benchmarks, large-scale real-world tabular datasets, and high-dimensional medical imaging tasks demonstrate that QSurv achieves competitive predictive performance with advantages in instantaneous hazard function estimation, enabling more interpretable characterization of time-varying risk patterns.
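A minimal sketch of the quadrature idea described above, not QSurv's implementation: a fixed 3-point Gauss-Legendre rule approximates the cumulative hazard H(t), the integral of h(s) over [0, t], which enters the standard right-censored log-likelihood event * log h(t) - H(t). The function names and the toy hazard h(s) = 3s^2 are assumptions made for illustration; in the paper the hazard is a neural network, and the quadrature order it uses is not stated in the abstract.

```python
import math

# 3-point Gauss-Legendre nodes and weights on [-1, 1]
# (this rule is exact for polynomials up to degree 5).
GL_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
GL_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def cumulative_hazard(hazard, t):
    """Approximate H(t) = integral of hazard(s) ds over [0, t]."""
    half = 0.5 * t  # maps the reference interval [-1, 1] onto [0, t]
    return half * sum(
        w * hazard(half * (x + 1.0)) for x, w in zip(GL_NODES, GL_WEIGHTS)
    )


def log_likelihood(hazard, t, event):
    """Right-censored log-likelihood: event * log h(t) - H(t)."""
    log_h = math.log(hazard(t)) if event else 0.0
    return log_h - cumulative_hazard(hazard, t)


def toy_hazard(s):
    """Hypothetical Weibull-type hazard with closed form H(t) = t**3."""
    return 3.0 * s * s
```

Because every step is a weighted sum of hazard evaluations, the approximation stays differentiable with respect to the hazard's parameters, which is what makes it compatible with backpropagation.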

2605.16194 2026-05-18 cs.DL cs.AI cs.IR cs.MA

paper.json: A Coordination Convention for LLM-Agent-Actionable Papers

Arquimedes Canedo

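Affiliations: arquicanedo

AI summary: This paper proposes paper.json, a companion file whose conventions (stable claim IDs, an explicit does-not-claim list, exact per-figure commands, and stable definition IDs) address the recurring failures of LLM agents reading academic papers.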

Abstract

LLM agents routinely serve as first (and sometimes only) readers of academic papers, skimming for sub-claims, extracting reproducibility steps, and generalizing scope. Standard prose papers produce recurring failures in this role: sub-claims that cannot be cited at sub-paper granularity, scope overextension beyond what the paper tests, and figure commands buried in codebases rather than the paper itself. We propose `paper.json`, a companion JSON file that travels with the PDF and addresses each failure with a lightweight convention: stable claim IDs (C1), an explicit does-not-claim list (C2), exact per-figure shell commands (C3), and stable definition IDs (C5). A fifth convention (C4) holds that minimum viable compliance, hand-written JSON alongside the PDF, is achievable in under an hour for a finished paper without touching the human-readable output. C1, C2, C3, and C5 are open invitations: an agent that reads a compliant paper and acts on it produces evidence for or against them. This paper is itself compliant: `uv run validator.py paper.json --against paper.typ` passes. Repo: https://github.com/arquicanedo/paper-json

2605.16184 2026-05-18 cs.DC cs.LG

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

Yishun Lu, Junhao Zhang, Zeyu Yang, Wes Armour

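Affiliations: Nvidia

AI summary: This paper proposes Asteria, a runtime system that separates second-order optimization logic from the GPU training path to address the systems cost of maintaining large matrix-based optimizer states, improving LLM training efficiency in memory-constrained and distributed settings.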

Abstract

Second-order methods offer an attractive path toward more sample-efficient LLM training, but their practical use is often blocked by the systems cost of maintaining and updating large matrix-based optimizer states. We introduce Asteria, a runtime system designed to remove this bottleneck by separating second-order optimization logic from the critical GPU training path. Rather than keeping all preconditioner state on the accelerator, Asteria dynamically distributes optimizer state across GPU memory, CPU memory, and optional NVMe storage according to architectural constraints and runtime pressure. It further uses training hooks to prepare shadow states in advance, allowing expensive inverse-root computations to proceed asynchronously on the host while GPU computation continues. For distributed training, Asteria employs a bounded-staleness protocol that limits synchronization frequency while preserving optimizer effectiveness through topology-aware coordination. We evaluate Asteria on both memory-constrained and distributed training settings. On a DGX Spark platform with a single GB10 GPU and 128GB unified memory, Asteria supports second-order training for a 1B-parameter language model. On multi-node GH200 systems, it lowers visible optimizer overhead, reduces recurring latency spikes, accelerates convergence in wall-clock time, and maintains the optimization advantages of SOAP and KL-Shampoo in a 7B-parameter language model. Our results suggest that second-order LLM training can be made practical not by simplifying the optimizer alone, but by rethinking how optimizer state, background computation, and distributed synchronization are managed at the runtime level.

2605.16145 2026-05-18 stat.ML cs.LG

Skew-adaptive conformal prediction

Paulo C. Marques F., Helton Graziadei

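Affiliations: Insper Institute of Education and Research

AI summary: This paper proposes a skew-adaptive conformal prediction method that builds its conformity score from an asymmetric interval family via the gauge approach and trains an additional predictive model on an inverse hyperbolic sine transform to learn how uncertainty tilts across the feature space, preserving finite-sample marginal validity while adapting to local scale and skewness.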

Comments 17 pages, 2 figures

Abstract

We develop a skew-adaptive extension of split conformal prediction for regression. The method starts from an asymmetric interval family centered at a point prediction and uses the gauge approach to deduce the conformity score induced by this family. The inverse hyperbolic sine transform of signed scaled residuals provides the training target for an additional predictive model, whose role is to learn how predictive uncertainty should tilt across the feature space. The resulting procedure preserves the finite-sample marginal validity of split conformal prediction under exchangeability, while producing intervals that adapt to both local scale and local skewness. We also develop a calibration-sample-based estimator for comparing the expected relative future width of the skew-adaptive and classical scaled-score intervals. Experiments on a variety of datasets indicate gains in prediction interval efficiency over the scaled-score construction and conformalized quantile regression, and show that the proposed estimator closely matches the corresponding average width ratio observed on the test sample.
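For orientation, here is a sketch of the classical scaled-score split conformal interval that the abstract uses as its baseline; it is not the skew-adaptive method itself, whose score is not spelled out in the abstract. The names `mu` (point predictor), `sigma` (positive scale predictor), and `calib` (held-out calibration pairs) are assumptions for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """k-th smallest score with k = ceil((n + 1) * (1 - alpha)).

    Returns infinity when k exceeds n (too few calibration points).
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def scaled_score_interval(mu, sigma, calib, x_new, alpha=0.1):
    """Symmetric interval mu(x) +/- q * sigma(x) from scaled absolute residuals."""
    scores = [abs(y - mu(x)) / sigma(x) for x, y in calib]
    q = conformal_quantile(scores, alpha)
    center, scale = mu(x_new), sigma(x_new)
    return center - q * scale, center + q * scale
```

The interval is symmetric around the point prediction by construction, so it can adapt to local scale through `sigma` but not to local skewness, which is the limitation the paper targets.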

2605.16114 2026-05-18 cs.NE cs.LG

Scalable neuromorphic computing from autonomous spiking dynamics in a clockless reconfigurable chip

Eric Oliveira Gomes, Damien Rontani

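Affiliations: LMOPS UR4423 Laboratory, CentraleSupélec and Université de Lorraine

AI summary: This paper proposes a scalable neuromorphic architecture based on spiking dynamics from the autonomous time-continuous evolution of clockless digital circuits; implemented on FPGAs as networks of configurable Boolean spiking neurons, it processes spike-encoded data efficiently on an audio classification task, with power consumption significantly lower than traditional digital implementations.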

Abstract

We propose a scalable neuromorphic architecture based on spiking dynamics emerging from the autonomous time-continuous evolution of clockless (asynchronous) digital circuits. Implemented on commercially available field-programmable gate arrays (FPGAs), our system implements networks of interacting Boolean spiking neurons with configurable excitatory and inhibitory synaptic weights. A complete processing pipeline enables efficient handling of spike-encoded data for solving machine-learning tasks. We demonstrate competitive performance for an audio classification task with spike-based encoding and high-speed processing. Power consumption is significantly lower than traditional digital implementations; this makes our approach an efficient alternative that bridges the gap to dedicated analog neuromorphic systems without the need for specialized hardware design. More generally, our approach establishes clockless digital hardware as a viable platform for neuromorphic computing. It paves the way for reconfigurable chips to be turned into energy-efficient quasi-analog neuromorphic processors.

2605.16094 2026-05-18 cs.IT cs.AI math.IT

GeoGS-CE: Learning Delay-Beam Channel Priors with 3D Gaussians for High-Mobility Scenarios

Yumeng Zhang, Jiajia Guo, Chaozheng Wen, Chenghong Bian, Jun Zhang

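Affiliations: iComAI Lab, HKUST

AI summary: This paper proposes GeoGS-CE, a framework that models channel characteristics in high-mobility scenarios with 3D Gaussians and uses the delay-beam power spectrum as a prior to improve channel frequency response reconstruction under sparse pilots.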

Abstract

Wideband channel estimation (CE) in high-mobility scenarios remains challenging because channel responses vary rapidly, while practical systems can allocate only sparse pilots to accommodate dense users. Fortunately, many high-mobility environments, such as high-speed railways, exhibit scheduled trajectories, predictable velocities, and a limited number of dominant propagation paths. These properties induce a delay-beam power spectrum that is more stable than the instantaneous complex channel frequency response (CFR), less sensitive to the random phase coherence, and rich in geometric information. To exploit such environmental properties, we propose GeoGS-CE, a two-stage channel estimation framework for sparse-pilot high-mobility scenarios. In the offline stage, GeoGS-CE jointly models: 1) a scene-level 3D Gaussian representation that captures the non-line-of-sight (NLoS) geometric scattering support, and 2) a leakage-aware differentiable wireless rendering process that maps the NLoS Gaussians, together with an explicit virtual line-of-sight (LoS) component, to the measured delay-beam power spectrum, while accounting for practical OFDM delay and array leakage effects. In the online stage, the delay-beam power spectrum is predicted for each user location and used as a strong covariance prior, enabling accurate full-band and full-array CFR reconstruction and tracking through a linear MMSE estimator. Simulations based on channels generated from a segment of the Guangshen high-speed railway show that the proposed geometric prior substantially improves CFR reconstruction over pilot-only and non-geometric baselines.

2605.16090 2026-05-18 cs.CR cs.CV

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

Hao Yang, Zhuo Ma, Yang Liu, Yilong Yang, Guancheng Wang, JianFeng Ma

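Affiliations: Xidian University

AI summary: This paper proposes CrossMPI, a cross-modal prompt injection attack that uses image-only perturbation; it moves the perturbation optimization to the model's hidden state space and adds a layer selection strategy and a distance-decremental perturbation budget strategy, improving attack performance.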

Abstract

Large vision-language models (LVLMs) have emerged as a powerful paradigm for multimodal intelligence, but their growing deployment also expands the attack surface of prompt injection. Despite this growing concern, existing attacks still suffer from a critical limitation: the injected prompt for one modality only steers the model's interpretation of that singular input. Alternatively, these attacks remain multimodal but fail to achieve cross-modal prompt perturbation. To bridge this gap, we introduce a novel cross-modal prompt injection attack CrossMPI, which can steer the model's interpretation of both textual and visual inputs via image-only prompt injection. Our design is underpinned by the following key breakthroughs. First, we turn the focus of the injected prompt perturbation optimization from the visual embedding space (typically with only $10^5$ parameters) to the model hidden state space (for multimodal information integration and with $10^7$ parameters). Then, two strategies are adopted to mitigate the optimization challenges posed by the larger parameter space. To constrain the optimized model parameter space, we introduce a layer selection strategy that identifies the layers most critical to multimodal integration. Interestingly, deviating from the past experience, our analysis reveals that the optimal layers for LVLM prompt perturbation reside in the middle of the model rather than the last. To constrain the image perturbation space, we propose a new distance-decremental perturbation budget assignment strategy that allocates budgets decrementally as the pixel distance to semantic-critical regions increases. Extensive experiments across multiple LVLMs and datasets show that our method significantly outperforms baseline approaches.

2605.16085 2026-05-18 cs.DB cs.AI

Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks

Jingcheng Wu, Ratan Bahadur Thapa, Mojtaba Nayyeri, Lucas Etteldorf, Max Finkenbeiner, Fabian Leeske, Steffen Staab

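Affiliations: University of Stuttgart, Stuttgart, Germany; Internet Science Research Group, University of Southampton, Southampton, United Kingdom

AI summary: This paper proposes a hybrid architecture that combines a language model with a graph neural network over relational entity graphs to improve predictive performance on relational databases; in the reported experiments it approaches supervised baselines and narrows the gap to RDL.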

Comments 15 pages, 7 figures, 4 tables. Preprint of a paper accepted at the 1st Workshop on Extraction from Triplet Text-Table-Knowledge Graph and associated Challenge (TRIPLET), co-located with ESWC 2026

Abstract

Relational databases store much of the world's structured information, and they are essential for driving complex predictive applications. However, deep learning progress on relational data remains limited, as conventional approaches flatten databases into single tables via manual feature engineering, discarding relational context. Relational deep learning (RDL) addresses this by modeling databases as relational entity graphs (REGs) for graph neural networks (GNNs), but remains task- and database-specific. To combine the strengths of both paradigms, we propose a hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over REGs to inject relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. This performance is competitive with supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to within 5.22 points, though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63). These results suggest that lightweight hybrid LM-GNN architectures offer a promising and resource-efficient path towards foundation models for relational databases.

2605.16078 2026-05-18 stat.ML cs.LG

A numerical study into neural network surrogate model performance for uncertainty propagation

Noah Wade, Kirubel Teferra

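Affiliations: ASEE Postdoctoral Associate; U.S. Naval Research Laboratory

AI summary: This paper studies how well neural network surrogate models capture the full distribution of solution fields over the entire probability space, with emphasis on the distribution tails, comparing a fully connected network and a Deep Operator Network on the heat conduction equation.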

DOI: 10.1061/JENMDT/EMENG-8978

Abstract

Neural network surrogate models have emerged as a promising approach to model solution fields for a wide variety of boundary value problems encountered in physical modeling. Stochastic problems represent an area of particularly high interest because of the potential to significantly reduce the repeated evaluation of expensive forward models via traditional numerical solvers when conducting parametric analysis. However, many studies found in the literature primarily focus on the ability of neural network surrogate models to represent deterministic samples or mean field solutions and largely overlook surrogate model performance at the tails of the distribution. The present study examines in detail the ability of neural network surrogate models to capture the full distribution of solution fields over the entire probability space, with emphasis on the tails of the distribution. Serving as a canonical problem is the heat conduction equation with a highly stochastic source term, inducing extremely large variation in the thermal solution field. Comparisons are made between a classic feed-forward fully connected network and a Deep Operator Network architecture, using both data-driven and physics-informed loss functions. Results show that the worst-case prediction errors are an order of magnitude larger than the mean field error, highlighting the importance of the outlier samples. The large errors associated with extreme samples result from the networks having to extrapolate beyond the bounds of the training data. A method for identifying these samples is presented along with a discussion of potential approaches to account for their errors. Among the models considered, the fully connected neural network trained using a weak form residual loss performs best in handling these extrapolated inputs, achieving the highest prediction accuracy for the numerically produced datasets.

2605.16046 2026-05-18 cs.SE cs.AI

XSearch: Explainable Code Search via Concept-to-Code Alignment

Yiming Liu, Ruofan Liu, Yun Lin, Zicong Zhang, Weiyu Kong, Pengnian Qi, Xiao Cheng, Weinan Zhang, Qianxiang Wang, Linpeng Huang

Affiliations: Shanghai Jiao Tong University; Shanghai Innovation Institute; National University of Singapore; Huawei Technologies Co., Ltd

AI summary: This paper proposes XSearch, a framework that recasts code search as a concept alignment problem to improve explainability and generalization, with large gains on out-of-distribution benchmarks.

Comments Accepted to ISSTA 2026

Abstract

Semantic code search has been widely adopted in both academia and industry. These approaches embed natural-language queries and code snippets into a shared embedding space and retrieve results based on vector similarity. Despite strong performance on benchmark datasets, they often suffer from poor explainability and generalization. Retrieved code may appear semantically similar yet miss critical functional requirements of the query, while providing no explanation of why the result was retrieved. Moreover, such failures become more severe under distribution shift, where models struggle to generalize to unseen benchmarks. In this work, we propose XSearch, an intrinsically explainable code search framework. Our key insight is that by relying on global embedding similarity, existing retrievers inherently take an inductive view. They learn statistical patterns rather than truly understanding the query's functional requirements. We address this problem by reformulating code search as a deductive concept alignment problem. XSearch (i) identifies functional concepts in the query and (ii) explicitly aligns them with corresponding code statements. This explain-then-predict design produces inherent concept-level explanations and mitigates shortcut learning that harms out-of-distribution generalization. We train an encoder with explicit concept-alignment objectives and perform retrieval through explicit matching between query concepts and code statements. Experiments show that, trained on CodeSearchNet using GraphCodeBERT (125M parameters), XSearch improves performance on out-of-distribution benchmarks from 0.02 to 0.33 (15x) over eight state-of-the-art retrievers, and consistently outperforms both encoder- and decoder-based baselines with up to 7B parameters. A user study demonstrates that concept-alignment explanations enable users to evaluate retrieved results faster and more accurately.

2605.16041 2026-05-18 stat.ML cs.LG

Explainable AI Isn't Enough! Rethinking Algorithmic Contestability

Timo Freiesleben, Kristof Meding, Gunnar König

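Affiliations: LMU Munich; Munich Center for Machine Learning (MCML); Munich Center for Mathematical Philosophy (MCMP); CZS Institute for Artificial Intelligence and Law

AI summary: This paper argues for the importance of algorithmic contestability, proposes a new operational definition, shows that traditional XAI methods are insufficient for challenging algorithmic decisions, and identifies three types of evidence that warrant reversing a decision.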

Abstract

Machine learning systems increasingly make life-changing decisions about individuals, such as loan approvals, hiring, and cheating detection, raising a pressing question: how can individuals respond to negative decisions made by these opaque systems? While explainable artificial intelligence (XAI) has largely focused on algorithmic recourse -- helping individuals change their features to obtain a desired outcome -- the parallel problem of algorithmic contestability -- helping individuals review and correct erroneous algorithmic decisions -- has received far less attention, despite its central ethical and legal importance. We trace this neglect to the absence of clear formal definitions and a systematic operationalization of contestability as an algorithmic problem. To address it, we propose an operational definition of contestability as a natural complement to recourse: contestability starts from the presumption that a decision may be incorrect and focuses on identifying evidence to challenge and potentially overturn it, whereas recourse assumes the decision is valid and instead provides pathways for changing it. We show that standard XAI explanations, such as counterfactuals, LIME, or Anchors, even when combined with human intuitions about decision continuity or monotonicity, reveal only errors in the neighborhood of the individual, but provide insufficient grounds for overturning the decision at hand. Going thus beyond traditional XAI, we identify three types of evidence warranting reversal according to the decision maker's own ethical standards: predictive multiplicity, incorrect feature values, and neglected overruling evidence. We argue that these render decisions normatively indefensible and thus successfully contestable. Finally, we analyze how existing EU legislation connects to our framework and argue that individuals already hold some legal rights to these forms of evidence.

2605.16035 2026-05-18 cs.CR cs.AI cs.MA

Who Owns This Agent? Tracing AI Agents Back to Their Owners

Ruben Chocron, Doron Jonathan Ben Chayim, Eyal Lenga, Gilad Gressel, Alina Oprea, Yisroel Mirsky

Affiliations: Ben-Gurion University of the Negev, Beer-Sheva, Israel; Center for Cybersecurity Systems & Networks, Amrita Vishwa Vidyapeetham, Amritapuri, India; Northeastern University, Boston, Massachusetts, USA

AI summary: This paper proposes a canary-based method for attributing agents to their owners, addressing the inability to trace malicious or misconfigured agents back to their operators, and demonstrates its reliability and robustness in realistic scenarios.

Comments Under Review

Abstract

AI agents are increasingly deployed to act autonomously in the world, yet there is still no reliable way to trace a harmful agent back to the account that deployed it. This creates the same accountability gap across both ends of the intent spectrum: benign operators may deploy misconfigured or overbroad agents that cause harm unintentionally, while malicious operators may deliberately weaponize agents for scams, harassment, or cyber attacks. In many cases, these agents are powered by vendor-hosted models, a dependency that holds even for sophisticated adversaries such as state actors conducting cyber operations. In either case, affected parties can observe the behavior but cannot notify the responsible operator, stop the session, or identify the account for investigation. We formalize this gap as the problem of agent attribution: linking an observed agent interaction to the responsible account at the hosting vendor. To our knowledge, this is the first work to define the problem and present a practical solution. Our protocol is canary-based: an authorized party injects a canary into the agent's interaction stream, and the vendor searches a narrow window of session logs to recover the originating session and account. Simple canaries suffice in non-adversarial settings. For adversarial operators who filter or paraphrase incoming content, we develop robust canary constructions that cannot be suppressed without degrading the agent's own task performance, yielding a formal asymmetry in the defender's favor. We evaluate a variety of scenarios including real-world agents and show that our attribution method is reliable, robust, and scalable for vendor-side deployment.
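The non-adversarial case in the abstract (a simple canary, a narrow window of session logs) can be sketched as below. This is an illustration under assumed data shapes, not the paper's protocol: the session record layout, the `canary-` prefix, and the 60-second window are all hypothetical, and the robust canary constructions for adversarial operators are not covered.

```python
import secrets


def make_canary():
    """Generate a random, hard-to-guess marker string (hypothetical format)."""
    return "canary-" + secrets.token_hex(8)


def attribute(canary, sessions, t_inject, window_s=60.0):
    """Return the accounts whose logs contain the canary near the injection time.

    `sessions` is assumed to be a list of dicts shaped like
    {"account": str, "log": [(timestamp_seconds, text), ...]}.
    """
    hits = set()
    for session in sessions:
        for timestamp, text in session["log"]:
            in_window = abs(timestamp - t_inject) <= window_s
            if in_window and canary in text:
                hits.add(session["account"])
    return hits
```

Restricting the search to a time window keeps the vendor-side scan cheap and limits how much unrelated log content is inspected.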

2605.15996 2026-05-18 stat.ML cs.LG math.ST stat.TH

Testing properties of trees in graphical models with covariance queries

Sofiya Burova, Francisco Calvillo, Gábor Lugosi, Piotr Zwiernik

Affiliations: IMTECH and Departament de Matemàtiques, Universitat Politècnica de Catalunya, Barcelona, Spain; Department of Economics and Business, Pompeu Fabra University, Barcelona, Spain; LPSM, Sorbonne Université, 4 Place Jussieu, 75005 Paris, France; ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain; Barcelona School of Economics

AI summary: This paper studies property testing of tree structures in high-dimensional graphical models, designing randomized tests that use a subquadratic number of queries and giving explicit query complexity bounds for properties such as the number of leaves, maximum degree, typical distance, and diameter.

2605.15952 2026-05-18 cs.HC cs.RO

Driving Through the Network: Performance and Workload Under Latency and Video Impairment

Ines Trautmannsheimer, Ahmed Azab, Frank Diermeyer

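Affiliations: Technical University of Munich, School of Engineering and Design, Institute of Automotive Technology and Munich Institute of Robotics and Machine Intelligence (MIRMI)

AI summary: A driving-simulator study of how network latency and video quality affect teleoperator performance and workload; latency and bitrate impairments each increase operator load, physiological measures show sub-additive interactions, and performance and oculomotor interactions are small.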

Comments Preprint of VEHITS 2026 : 12th International Conference on Vehicle Technology and Intelligent Transport Systems

Abstract

Teleoperation promises to extend the operational envelope of automated vehicles, yet it critically depends on network latency and video quality. We report a fixed-base driving-simulator study (N=25) with a 2x2 manipulation of added latency (100/300 ms) and bitrate (500/2000 kbit/s), plus a best-case baseline (0 ms added, 9000 kbit/s). We measured effective glass-to-glass (G2G) latency per condition (baseline approx. 413 ms; effective totals approx. 500-700 ms) and verified stable framerate and encoder settings. Multimodal measures covered performance (speed, steering reversals, crashes), oculomotor behavior (blink rate, fixation duration), physiology (RR interval, heart rate, skin conductance), and subjective workload. Latency and bitrate each increased operator load and modestly affected performance. Physiological measures (heart rate, RR interval) exhibited sub-additive interactions, whereas performance and oculomotor interactions were small or non-significant. Equivalence tests showed that 300 ms with 2000 kbit/s was velocity-equivalent to best-case (SESOI +/- 2 km/h), while 300 ms with 500 kbit/s was not. We argue that latency and video quality should be treated as largely independent design levers, and that physiology-aware adaptation can anticipate overload before safety is compromised.
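The equivalence result above rests on a smallest effect size of interest (SESOI) of +/- 2 km/h. A rough sketch of how such a check can work, assuming a TOST-style criterion on paired speed differences: two conditions count as equivalent when the 90% confidence interval of the mean difference lies entirely inside the SESOI bounds. The abstract does not state the exact procedure, so this uses a normal approximation for simplicity (a t quantile would be more appropriate at N=25), and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def tost_equivalent(diffs, sesoi, alpha=0.05):
    """True if the (1 - 2*alpha) CI of the mean difference lies within +/- sesoi.

    `diffs` holds per-participant differences between two conditions
    (for example, mean speed in km/h). Normal approximation, not Student's t.
    """
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    lower = mean(diffs) - z * standard_error
    upper = mean(diffs) + z * standard_error
    return -sesoi < lower and upper < sesoi
```

Unlike an ordinary significance test, a non-significant difference is not enough here: the whole interval has to fit inside the SESOI band before equivalence is claimed.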

2605.15938 2026-05-18 physics.bio-ph cs.LG

Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery

湍流中基于Q学习的时钟状态嗅觉搜索：羽流重获的几何学

Marco Rando, Robin A. Heinonen, Yujia Qi, Agnese Seminara

发表机构 * Université Côte d'Azur, Inria, CNRS, Laboratoire J.A. Dieudonné, 28 avenue Valrose, 06108, Nice, France ； Machine Learning Genoa Center & Department of Civil, Chemical and Environmental Engineering, University of Genova, Via Montallegro 1, 16145 Genoa, Italy ； Environmental Engineering, University of Genova, Via Montallegro 1, 16145 Genoa, Italy, now at University of California, Dept Mechanical Engineering, Engineering II, Santa Barbara, CA 93106-5070, USA ； Environmental Engineering, University of Genova, Via Montallegro 1, 16145 Genoa, Italy

AI总结本文通过Q学习训练嗅觉搜索智能体，利用时钟状态重新找回气味羽流，结合昆虫行为提升导航策略，但需改进策略以适应局部间歇性水平，从而增强鲁棒性。

Comments 15 pages, 13 figures, 1 table

2605.15920 2026-05-18 stat.ML cs.LG

Unsupervised Domain Shift Detection with Interpretable Subspace Attribution

无监督领域偏移检测与可解释子空间归因

Sebastian Springer, Alessandro Laio

发表机构 * SISSA（国际高等研究院）

AI总结本文提出一种无监督领域偏移检测工具，通过高维特征空间中的局部密度异常检测，识别偏移特征子空间，从而可解释偏移来源，并提供补偿协议。

AI中文摘要

我们开发了一种检测领域偏移的工具，领域偏移即数据集概率分布之间的细微差异。我们使用一种专为检测高维特征空间中局部密度异常而设计的算法来识别这些偏移。如果存在异常，则进一步确定异常最显著的特征子空间。这使我们能够将领域偏移追溯到一小组特征，从而使偏移可解释。此外，我们提供了一种补偿领域偏移的协议：从两个未标记数据集中提取不存在可检测的残余分布差异的样本子集。我们在具有已知真值的受控20维基准上验证了该框架，恢复了大范围偏移和局部偏移及其对应的特征子空间。随后将其应用于由782个特征表示的健康人心电图（ECG）记录。在年龄和性别匹配、但测量设备构成不同的队列比较中，该方法检测到由设备引起的偏移，提取出富含不平衡设备成分的代表性子集，并识别出与采集差异相关的ECG特征。这些结果表明，密度偏移检测和子空间归因提供了一个实用框架，可在下游建模之前揭示隐藏的队列偏差。

英文摘要

We developed a tool for detecting domain shifts, namely subtle differences in the probability distributions of datasets. We identify these shifts using an algorithm designed to detect localised density anomalies in high-dimensional feature spaces. If an anomaly is present, we then identify the feature subspace in which the anomaly is most pronounced. This allows us to trace the domain shift to a small set of features, making the shift interpretable. Moreover, we provide a protocol for compensating domain shifts by extracting, from two unlabelled datasets, subsets of samples with no detectable residual distributional difference. We validate the framework on controlled 20-dimensional benchmarks with known ground truth, recovering both broad and localized shifts together with their supporting feature subspaces. We then apply it to healthy electrocardiogram (ECG) recordings represented by 782 features. In age- and sex-matched cohort comparisons differing in measurement-device composition, the method detects device-induced shifts, extracts representative subsets enriched in the imbalanced device components, and identifies ECG features associated with the acquisition contrast. These results suggest that density-shift detection and subspace attribution provide a practical framework for uncovering hidden cohort biases before downstream modelling.

2605.15915 2026-05-18 cs.HC cs.AI cs.CL

SLIP & ETHICS: Graduated Intervention for AI Emotional Companions

SLIP与伦理：面向AI情感伴侣的渐进干预

Minseo Kim

发表机构 * HUA Labs（HUA实验室）

AI总结本文提出SLIP与ETHICS框架，通过渐进干预方法应对（而非消除）AI情感伴侣在安全与亲和之间的张力，实验显示在高能量状态下干预不足，但提升模型能力可改善检测效果。

Comments Accepted to PervasiveHealth 2026. 11 pages, 2 figures, 4 tables. Proc. of the 20th EAI International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth 2026)

AI中文摘要

AI情感伴侣面临安全与亲和的悖论：限制性的安全措施可能损害支持性联盟，而宽松的系统则可能使用户受到伤害。本文提出SLIP（Staged Layers of Intervention Protocol，分层分阶段干预协议），一种四阶段渐进方法，从结构化定性指标（情感强度(a)和叙事动态性(m)）推导干预措施（无、软性、硬性）；同时提出ETHICS（Emergent Taxonomy for Human-AI Interaction Context Signals，人机交互上下文信号的涌现式分类法），一种“信号而非标签”的分类法。评估结合了小规模生产部署（N=68条记录，10名用户，10周）和合成人设测试集（N=91，5种行为风险画像），结果显示心流（flow）人设的误报率为0%，危机导向人设则呈现出预期的升级模式。然而，初步结果表明，连续8天的高能量亢奋状态未触发任何干预（0/8），暴露出“不病理化”原则与安全相冲突的边界。随后的三模型压力测试表明，提升模型能力可将检测率从0/8提高到6/8，同时最大模型仍保持0/10的心流误报。这些发现应视为初步结果，它们将渐进干预定位为在情感计算中应对（而非消除）安全与亲和张力的设计方向。

英文摘要

AI emotional companions face a safety-rapport paradox: restrictive safeguards can damage supportive alliance, while permissive systems risk user harm. We present SLIP (Staged Layers of Intervention Protocol), a four-stage graduated methodology deriving interventions (none, soft, hard) from structured qualitative indicators -- affect intensity (a) and narrative dynamism (m) -- alongside ETHICS (Emergent Taxonomy for Human-AI Interaction Context Signals), a "signals not labels" taxonomy. An evaluation combining a small-scale production deployment (N=68 entries, 10 users, 10 weeks) with a synthetic persona battery (N=91, 5 behavioral-risk profiles) achieved 0% false positives for the flow persona and showed expected escalation patterns in crisis-oriented personas. However, initial results showed that 8 consecutive days of high-energy elevation produced zero interventions (0/8), exposing a boundary where the "do not pathologize" principle conflicts with safety. A subsequent three-model stress test demonstrated that increased model capability improves detection from 0/8 to 6/8 while preserving 0/10 flow false positives in the largest model. Read as preliminary, these findings position graduated intervention as a design direction for navigating -- not resolving -- the safety-rapport tension in affective computing.

2605.15905 2026-05-18 cs.IR cs.AI

Generative Long-term User Interest Modeling for Click-Through Rate Prediction

面向点击率预测的生成式长期用户兴趣建模

Jiangli Shao, Kaifu Zheng, Hao Fang, Huimu Ye, Zhiwei Liu, Bo Zhang, Shu Han, Xingxing Wang

发表机构 * Meituan, Beijing, China（美团，中国北京）

AI总结本文提出GenLI模型，通过兴趣生成模块、行为检索模块和兴趣融合模块，提升CTR预测的准确性和效率，解决传统方法中长期兴趣建模不完整和效率低的问题。

AI中文摘要

利用海量历史用户行为建模长期用户兴趣，可提升广告和推荐系统中的点击率（CTR）预测性能。业界通常采用两阶段框架：通用搜索单元（GSU）首先针对目标物品检索top-$k$个相关行为，精确搜索单元（ESU）再通过定制的注意力机制生成兴趣特征。然而，当前以目标为中心的GSU会忽略其他潜在的用户兴趣，导致兴趣特征不完整且有偏。此外，GSU中基于匹配的检索过程依赖于目标物品与每个历史行为之间的成对相似度分数，这不仅随着用户行为不断增长而使在线服务变得耗时，还忽略了用户行为之间的交互信息。为解决这些问题，我们提出了一种用于CTR预测的生成式长期用户兴趣模型GenLI。GenLI由兴趣生成模块（IGM）、行为检索模块（BRM）和兴趣融合模块（IFM）组成。IGM生成多个兴趣分布以刻画实时用户兴趣的不同方面，它与目标无关，并融入了行为之间的交互信息，从而保证兴趣特征的完整性和多样性。BRM通过简单的查表操作选择相关行为，将为每个行为加权的时间复杂度降低到$O(1)$。最后，IFM使用精细的门控机制生成兴趣特征。基于生成过程，GenLI提高了用户兴趣的多样性，避免了复杂的基于匹配的行为检索，在CTR预测的准确性和效率之间取得了更好的平衡。

英文摘要

Modeling long-term user interests with massive historical user behaviors enhances click-through rate (CTR) prediction performance in advertising and recommendation systems. Typically, a two-stage framework is widely adopted, where a general search unit (GSU) first retrieves top-$k$ relevant behaviors towards the target item, and an exact search unit (ESU) generates interest features via tailored attention. However, current target-centered GSU would ignore other latent user interests, leading to incomplete and biased interest features. Additionally, the matching-based retrieval process in GSUs depends on the pairwise similarity score between target item and each historical behavior, which not only becomes time-consuming for online services as user behaviors continue to grow, but also overlooks the interaction information among user behaviors. To combat these problems, we propose a Generative Long-term user Interest model named GenLI for CTR prediction. GenLI consists of an interest generation module (IGM), a behavior retrieval module (BRM), and an interest fusion module (IFM). The IGM generates multiple interest distributions to indicate different aspects of real-time user interests, which is target-independent and incorporates interaction information among behaviors, ensuring complete and diverse interest features. The BRM selects related behaviors via a simple lookup operation, reducing the time complexity for weighting each behavior to $O(1)$. Finally, the IFM uses delicate gating mechanisms to generate interest features. Based on the generation process, GenLI improves the diversity of user interests and avoids complex matching-based behavioral retrieval, achieving a better balance between accuracy and efficiency for CTR prediction.

2605.15895 2026-05-18 eess.IV cs.CV

Layer Selection in Feature-Based Losses Affects Image Quality and Microstructural Consistency in Deep Learning Super-Resolution of Brain Diffusion MRI

基于特征的损失函数中的层选择影响脑部扩散MRI深度学习超分辨率的图像质量和微结构一致性

David Lohr, Rene Werner

发表机构 * Institute for Applied Medical Informatics, University Medical Center Hamburg-Eppendorf（应用医学信息学研究所，汉堡-埃彭多夫大学医学中心）； Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf（计算神经科学研究所，汉堡-埃彭多夫大学医学中心）； Center for Biomedical Artificial Intelligence (bAIome), University Medical Center Hamburg-Eppendorf（生物医学人工智能中心（bAIome），汉堡-埃彭多夫大学医学中心）

AI总结本研究探讨了基于特征的损失函数在深度学习超分辨率中对扩散信号一致性的影响，发现较深的VGG16层会导致网格状伪影，而最浅层不产生此类伪影，且其下游分析与真值高度一致，即使在9倍超分辨率下也是如此。

AI中文摘要

高分辨率扩散MRI的临床应用受到硬件限制和过长扫描时间的阻碍，这推动了计算超分辨率的发展。本研究探讨了基于特征的损失函数在深度学习超分辨率中保持扩散信号一致性的有效性。我们利用人类连接组计划（Human Connectome Project）的7T数据生成低分辨率和高分辨率扩散加权图像（DWI）对，并训练UNet进行2D超分辨率。通过消融和隔离实验，我们以基于图像的L1损失为基线，评估了用于特征损失的不同VGG16层。较深的层及其组合会在超分辨率DWI中产生网格状伪影，这些伪影在定量各向异性和分数各向异性等扩散参数中依然存在。使用最浅层时则没有此类伪影。针对该层的下游分析显示其与真值高度一致，即使在9倍超分辨率下也是如此。图像信噪比（SNR）和所用VGG16层的深度会影响伪影的出现和严重程度，因此在扩散MRI及其他应用中必须谨慎选择参与损失计算的层。

英文摘要

Clinical application of high-resolution diffusion MRI is hindered by hardware limitations and prohibitive scan times, motivating computational super-resolution. This study investigates the efficacy of a feature-based loss function in preserving diffusion signal consistency in deep learning super-resolution. Using 7T data from the human connectome project to generate pairs of low- and high-resolution diffusion weighted images (DWI), we trained UNets for 2D super-resolution. Ablation and isolation studies evaluated different VGG16-layers for feature-based losses against an image-based L1 baseline. Deeper layers and combinations thereof resulted in grid-like artifacts in super-resolution DWIs, which persisted in diffusion parameters like quantitative and fractional anisotropy. No such artifacts were present when using the shallowest layer. Downstream analysis for this layer showed great consistency with the ground truth, even for 9-fold super-resolution. Image SNR and used VGG16-layer depths modulated artifact appearance and severity, mandating careful selection of contributing layers for application in and beyond diffusion MRI.

2605.15889 2026-05-18 cs.CR cs.LG

A Multi-Layer Cloud-IDS Pipeline with LLM and Adaptive Q-Learning Calibration

结合LLM与自适应Q学习校准的多层云入侵检测流水线

Syed Waqas Ali, Ibrar Ali Shah, Farzana Zahid, Daniyal Munir, Hans D. Schotten

发表机构 * Department of Computer Software Engineering, University of Engineering and Technology（计算机软件工程系，工程技术大学）； Department of Computer Science, University of Waikato（计算机科学系，怀卡托大学）； WICON, RPTU University, Kaiserslautern-Landau（WICON，莱茵兰-普法尔茨工业大学凯泽斯劳滕-兰道）； Department of Intelligent Networks, German Research Center for Artificial Intelligence (DFKI)（智能网络系，德国人工智能研究中心（DFKI））

AI总结本文提出一种多层云入侵检测系统，结合强化学习和LLM，通过自适应阈值和Chroma数据库提升检测性能，实验显示系统在准确率、精确率和召回率等方面表现优异。

AI中文摘要

由于分层云架构、动态环境以及暴露于未见过的攻击或零日攻击等因素，云计算安全已成为重大关注点。此外，入侵检测系统（IDS）通常只在特定层运行，并严重依赖机器学习模型，这些模型在实验环境中往往表现良好，却难以在真实云部署中维持性能。本文实现了一个面向云环境、基于强化学习的置信度感知多级入侵检测系统。该系统保护三个不同的层：网络层、主机层和虚拟机监控器（hypervisor）层。每层的机器学习模型检测已知攻击模式，预测置信度则用于区分可靠决策与不确定结果。在多门控流程中，低置信度事件先经过学习阈值的置信度门（Gate-1），再经过Chroma记忆匹配门（Gate-2），仍未解决的事件升级至大语言模型（LLM）进行语义分析和解释。Gate-3的最终攻击判定使用校准后的LLM置信度或加权融合回退，不确定事件则保留在审查桶中以避免强制分类。生成的解释和已确认的知识存储在ChromaDB中，以支持后续分析和再训练。该方法首先使用静态阈值进行评估，以建立比较基线。结果表明，所提系统能够学习自适应阈值，将LLM升级减少58.78%，在降低成本的同时保持较强性能（准确率88.68%，精确率85.29%，召回率84.72%，F1 85.00%）。网络层和虚拟机监控器层的准确率分别达到98.02%和97.08%，表明该检测系统均衡且高效。

英文摘要

Security in cloud computing has become a major concern due to several factors such as layered cloud architectures, dynamic environments, and exposure to unseen or zero-day attacks. Moreover, intrusion detection systems (IDS) typically operate at specific layers and rely heavily on machine learning models, which often perform well in experimental settings but fail to sustain performance in real cloud deployments. In this work, we implement a confidence-aware multilevel intrusion detection system using reinforcement learning tailored for cloud environments. The system secures three distinct layers: network, host, and hypervisor. Machine learning models at each layer detect known attack patterns, while prediction confidence distinguishes reliable decisions from uncertain outcomes. Within the multi-gate flow, low-confidence events pass through a learned-threshold confidence gate (Gate-1), followed by a Chroma memory-matching gate (Gate-2), with unresolved events escalated to a large language model (LLM) for semantic analysis and explanation. Final attack promotion at Gate-3 uses calibrated LLM confidence or weighted-fusion fallback, while uncertain events are retained in a review bucket to avoid forced classification. Generated explanations and confirmed knowledge are stored in ChromaDB to support future analysis and retraining. The approach is first evaluated using static thresholds, establishing a baseline for comparison. Results show that the proposed system learns adaptive thresholds and reduces LLM escalation by 58.78%, lowering cost while maintaining strong performance (88.68% accuracy, 85.29% precision, 84.72% recall, 85.00% F1). The network and hypervisor layers achieve 98.02% and 97.08% accuracy, demonstrating a balanced and efficient detection system.
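The gate ordering described in the abstract can be sketched as a small routing function. This is an illustrative sketch only, not the authors' implementation: the `Event` fields, the function names, the threshold values, and the stubbed memory and LLM callables are all assumptions, and the weighted-fusion fallback mentioned for Gate-3 is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Event:
    """Hypothetical event record: the layer model's label and its confidence."""
    model_label: str          # e.g. "attack" or "benign"
    model_confidence: float   # in [0, 1]


def route_event(
    event: Event,
    gate1_threshold: float,
    memory_lookup: Callable[[Event], Optional[str]],
    llm_analyze: Callable[[Event], Tuple[str, float]],
    gate3_threshold: float,
) -> Tuple[str, str]:
    """Return (decision, stage) for one event.

    Gate-1: accept the layer model's label when its confidence is high enough.
    Gate-2: otherwise try a memory match (stub standing in for a vector store).
    Gate-3: otherwise escalate to an LLM and accept only a confident verdict.
    Anything still uncertain goes to a review bucket instead of being forced
    into a class.
    """
    if event.model_confidence >= gate1_threshold:
        return event.model_label, "gate1"

    remembered = memory_lookup(event)
    if remembered is not None:
        return remembered, "gate2"

    llm_label, llm_confidence = llm_analyze(event)
    if llm_confidence >= gate3_threshold:
        return llm_label, "gate3"

    return "review", "review_bucket"
```

In this sketch the learned part of the system would be `gate1_threshold`: raising it sends more events to the later gates, and lowering it reduces LLM escalations at the risk of accepting weaker predictions.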

2605.15881 2026-05-18 math.DS cs.AI physics.comp-ph

Symplectic Neural Operators for Learning Infinite Dimensional Hamiltonian Systems

辛神经算子用于学习无限维哈密顿系统

Yeang Makara, Yusuke Tanaka, Takashi Matsubara, Takaharu Yaguchi

发表机构 * Graduate School of Science（理学研究科）； Kobe University（神户大学）； NTT Communication Science Laboratories（NTT通信科学实验室）； Faculty of Information Science and Technology（信息科学和技术学部）； Hokkaido University（北海道大学）； Institute of Mathematics for Industry（工业数学研究所）； Kyushu University（九州大学）

AI总结本文提出辛神经算子，用于解决无限维哈密顿系统建模与模拟中的计算与结构挑战，通过保持辛结构提升长期稳定性与能量行为。

2605.15859 2026-05-18 cs.DS cs.LG math.ST stat.ML stat.TH

Complexity of Non-Log-Concave Sampling in Fisher Information

以Fisher信息度量的非对数凹采样复杂度

Sinho Chewi, Andre Wibisono

发表机构 * Yale University, Department of Statistics and Data Science（耶鲁大学统计与数据科学系）； Yale University, Department of Computer Science（耶鲁大学计算机科学系）

AI总结研究非对数凹分布采样中相对Fisher信息保证的查询复杂度，提出基于近端采样器的算法，并借助高精度对数凹采样的结果近似实现受限高斯oracle，从而改进非对数凹采样的复杂度；同时给出逆向归约，将该问题维度依赖性的改进与高精度对数凹采样联系起来。

AI中文摘要

我们研究了从对数光滑的非对数凹分布中采样时获得相对Fisher信息保证的查询复杂度；这一问题是优化中寻找近似驻点在采样中的对应物。我们的算法基于近端采样器（proximal sampler），即Langevin扩散的一种隐式离散化，它需要实现称为受限高斯oracle（RGO）的后向步骤。我们证明，借助近期在Rényi散度下具有高精度保证的对数凹采样结果，可以得到一种近似的RGO实现；将其与近端采样器结合使用时，可得到相对Fisher信息意义下的复杂度保证，该保证继承了与对数凹采样相同的维度依赖性，并改进了非对数凹采样的已有工作。我们还给出了一个逆向归约：非对数凹采样在相对Fisher信息意义下维度依赖性的任何改进，都将带来高精度对数凹采样维度依赖性的改进。

英文摘要

We study the query complexity of obtaining a relative Fisher information guarantee for sampling from a log-smooth non-log-concave distribution; this is a sampling analog of finding an approximate stationary point in optimization. Our algorithm is based on the proximal sampler, which is an implicit discretization of the Langevin diffusion, and requires an implementation of the backward step known as the restricted Gaussian oracle (RGO). We show that by leveraging the recent results for log-concave sampling with high-accuracy guarantees in Rényi divergence, we can obtain an approximate RGO implementation that -- when used with the proximal sampler -- yields a complexity guarantee in relative Fisher information that inherits the same dimension dependence as log-concave sampling, and improves upon prior work for non-log-concave sampling. We also show a converse reduction that any improvement in the dimension dependence in relative Fisher information for non-log-concave sampling will yield an improved dimension dependence for high-accuracy log-concave sampling.
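For readers unfamiliar with the objects named in the abstract, the standard definitions can be written out as follows. The notation is an assumption of this note rather than taken from the paper: $\pi \propto e^{-V}$ is the target, $\mu$ the law of the sampler's output, and $h > 0$ the step size.

```latex
% Relative Fisher information of \mu with respect to \pi
\mathrm{FI}(\mu \,\|\, \pi)
  = \mathbb{E}_{\mu}\!\left[ \left\| \nabla \log \frac{\mu}{\pi} \right\|^{2} \right]

% One iteration of the proximal sampler with step size h
\text{(forward)}\quad  y_k \sim \mathcal{N}(x_k,\, h I)
\text{(backward, RGO)}\quad
  x_{k+1} \sim \pi^{X \mid Y = y_k}(x)
  \;\propto\; \exp\!\left( -V(x) - \frac{\| x - y_k \|^{2}}{2h} \right)
```

A small relative Fisher information is the sampling counterpart of a small gradient norm, which is why the abstract compares the guarantee to finding an approximate stationary point.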

2605.15848 2026-05-18 cs.HC cs.CL

Conversations in Space: Structuring Non-Linear LLM Interactions on a Canvas

空间对话：在画布上构建非线性大语言模型交互结构

Rifat Mehreen Amin, Alperen Adatepe, Daniela Fernandes, Daniel Buschek, Andreas Butz

发表机构 * LMU Munich（慕尼黑大学）； Aalto University（阿尔托大学）； University of Bayreuth（拜罗伊特大学）

AI总结本文提出CanvasConvo，通过将线性聊天转化为分支对话树，支持在画布上探索假设场景，提升LLM交互的非线性结构和探索效率。

AI中文摘要

由大语言模型（LLM）驱动的对话界面被广泛用于构思和分析，但其线性结构限制了对替代方案的探索以及对长时间交互的管理。我们提出了CanvasConvo，一种将线性聊天转化为嵌入空间画布的分支对话树的对话界面概念。CanvasConvo允许用户直接从对话内容分支来探索假设（what-if）场景，支持并行发展不同方向。这些分支在画布上可视化，同时与熟悉的聊天界面保持集成，使用户可以在线性和非线性交互之间切换。基于时间线的导航、自动打标签与摘要，以及上下文感知控件（例如目标、可复用提示词）等功能支持结构化交互和连续性。我们通过一项为期5-7天、有24名参与者的现场研究评估了CanvasConvo。我们的发现揭示了非线性对话结构如何支持探索式工作流程以及基于LLM的工作中的不同交互方式。

英文摘要

Conversational interfaces powered by large language models (LLMs) are widely used for ideation and analysis, yet their linear structure limits exploration of alternatives and management of long-running interactions. We present CanvasConvo, a conversational interface concept that transforms linear chat into a branching conversation tree embedded in a spatial canvas. CanvasConvo enables users to explore what-if scenarios by branching directly from conversational content, supporting parallel development of alternative directions. These branches are visualized on a canvas while remaining integrated with a familiar chat interface, allowing users to switch between linear and non-linear interaction. Features such as timeline-based navigation, automatic tagging and summarization, and context-aware controls (e.g., goals, reusable prompts) support structured interaction and continuity. We evaluated CanvasConvo in a 5-7 day field study with 24 participants. Our findings highlight how non-linear conversational structures support exploratory workflows and different interactions in LLM-based work.
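One plausible way to represent a branching conversation is a tree of turns in which any node can spawn alternative children, and the linear context for a branch is recovered by walking back to the root. The sketch below illustrates that idea only; the `Turn` class and its method names are hypothetical and are not CanvasConvo's actual data model.

```python
from typing import List, Optional, Tuple


class Turn:
    """One message in a branching conversation (hypothetical structure)."""

    def __init__(self, role: str, text: str, parent: Optional["Turn"] = None):
        self.role = role
        self.text = text
        self.parent = parent
        self.children: List["Turn"] = []

    def branch(self, role: str, text: str) -> "Turn":
        """Start a new direction from this turn without touching its siblings."""
        child = Turn(role, text, parent=self)
        self.children.append(child)
        return child

    def linear_context(self) -> List[Tuple[str, str]]:
        """Root-to-here path: the linear chat history for this branch."""
        path = []
        node: Optional[Turn] = self
        while node is not None:
            path.append((node.role, node.text))
            node = node.parent
        path.reverse()
        return path


root = Turn("user", "Plan a trip")
option_a = root.branch("assistant", "Option A")
option_b = root.branch("assistant", "Option B")   # parallel alternative
follow_up = option_a.branch("user", "More on A")
```

Keeping a parent pointer on every turn means switching between the canvas view (the whole tree) and the chat view (one root-to-leaf path) needs no copying of messages.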

2605.15832 2026-05-18 cs.PF cs.LG

Heuristic-Based Merging of HPC Traces to Extend Hardware Counter Coverage

基于启发式的HPC追踪合并以扩展硬件计数器覆盖率

Júlia Orteu Aubach, Fabio Banchelli, Marc Clascà Ramírez, Marta Garcia-Gasulla

发表机构 * Barcelona Supercomputing Center (BSC)（巴塞罗那超级计算中心）

AI总结本文提出一种基于启发式的HPC追踪合并方法，通过分析MPI结构、时间和通信模式匹配计算突发，从而扩展硬件计数器覆盖率，用于性能预测和分析。

AI中文摘要

本文扩展了一个使用机器学习（ML）预测高性能计算（HPC）工作负载性能的框架。性能建模的一个常见限制是可同时采集的硬件计数器数量有限。为此，我们提出了一种基于启发式的方法，用于合并多次运行的执行轨迹，其中每次运行使用一组不同的硬件计数器进行插桩。该方法通过分析MPI结构、时间和通信模式，在不同执行之间匹配计算突发段（computation bursts）。这一过程可以在不依赖多路复用的情况下，构建一个包含更多硬件特征的统一数据集。输出是一条包含所有合并计数器的新的合成轨迹，既可用于HPC性能预测，也可用于传统性能分析。该方法已在MareNostrum5机器上用一系列内核和真实应用进行了验证。结果表明，合并后的计数器能保持可接受的准确性（具体取决于应用），并可直接用于在更丰富的特征空间上训练ML模型，而无需事先进行计数器选择。

英文摘要

This work extends a framework for predicting the performance of High-Performance Computing (HPC) workloads using Machine Learning (ML). A common limitation in performance modeling is the restricted number of hardware counters that can be collected simultaneously. To address this, we propose a heuristic-based methodology to merge execution traces from multiple runs, each instrumented with a different set of hardware counters. Our approach matches computation bursts across executions by analyzing MPI structure, timing, and communication patterns. This process enables the construction of a unified dataset that includes a wider set of hardware features without relying on multiplexing. The output is a new synthetic trace with all merged counters, which can be used both for HPC performance prediction and for conventional performance analysis. The methodology has been validated on MareNostrum5 machine with a range of kernels and real applications. Results show that the merged counters maintain acceptable accuracy depending on the application, and can be directly used to train ML models on a richer feature space without prior counter selection.

2605.15816 2026-05-18 cs.GR cs.CV cs.LG

StippleDiffusion: Capacity-Constrained Stippling using Controlled Diffusion

StippleDiffusion：基于受控扩散的容量受限点画

Ofir Gilad, Aleksander Plocharski, Przemyslaw Musialski, Andrei Sharf

发表机构 * Ben Gurion University of the Negev（内盖夫本-古里安大学）； Warsaw University of Technology（华沙理工大学）； New Jersey Institute of Technology（新泽西理工学院）

AI总结本文提出一种基于扩散模型的点画方法，通过学习局部点分布先验并施加连续容量约束，实现高效且可微的点集生成，适用于任意目标密度。

Comments 12 pages, 10 figures

AI中文摘要

点画图案是局部密度跟随目标图像的点集，传统上由针对每个密度的迭代优化器生成，这类方法速度慢、不可微，且每遇到新目标都必须从头重新运行。已有的学习式替代方法迄今只解决了无条件点生成问题；容量受限、以图像为条件的点画仍未实现。我们提出了首个基于扩散的采样器，它在推理时同时满足学习到的局部点分布先验和由图像定义的连续容量约束。该方法是一个ControlNet分支，构建在基于最优传输网格的点集扩散基线之上，并以目标密度图和高分辨率图像为条件。两项设计选择使这一组合变得可行：训练和推理被限制在去噪的后期阶段，并以密度加权的拒绝采样样本作为初始化；标准的零卷积注入被替换为sigmoid门控的1x1投影，从而在硬密度信号下保持基础模型的蓝噪声结构。单个训练好的检查点在推理时可接受任意目标密度，能够泛化到训练时未见过的点数预算，并且生成点画所需的时间几乎与输出点数无关。在Icons-50基准上，我们的学习式采样器在所有报告的指标上与针对每个密度优化的基线持平，同时保持端到端可微。

英文摘要

Stipple patterns, point sets whose local density tracks a target image, are traditionally produced by per-density iterative optimizers, which are slow, non-differentiable, and must be re-run from scratch for each new target. Learned alternatives have so far addressed only unconditional point generation; capacity-constrained, image-conditioned stippling has remained out of reach. We present the first diffusion-based sampler that simultaneously satisfies a learned local point-distribution prior and a continuous, image-defined capacity constraint at inference. The method is a ControlNet branch built on top of an optimal-transport-grid point-set diffusion baseline, conditioned on the target density map and a high-resolution image. Two design choices make the combination tractable: training and inference are restricted to the late-stage denoising regime, initialized from a density-weighted rejection sample, and the standard zero-convolution injection is replaced with a sigmoid-gated 1x1 projection that preserves the base model's blue-noise structure under hard density signals. A single trained checkpoint accepts arbitrary target densities at inference, generalizes to point budgets that were not seen during training, and produces stipples in time nearly independent of the output point count. On the Icons-50 benchmark, our learned sampler reaches parity with per-density-optimized baselines on every reported metric while remaining differentiable end-to-end.
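The abstract mentions initializing from a density-weighted rejection sample. A minimal sketch of plain rejection sampling on the unit square is shown below, under the assumption that the density is a callable bounded by `max_density`; the function name and signature are hypothetical, and this covers only the initialization idea, not the diffusion model.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample_points(
    density: Callable[[float, float], float],
    n_points: int,
    rng: random.Random,
    max_density: float = 1.0,
) -> List[Tuple[float, float]]:
    """Draw n_points from the unit square with probability proportional to density.

    Each candidate (x, y) is uniform; it is kept with probability
    density(x, y) / max_density. The loop does not terminate if the
    density is zero everywhere, so callers should pass a density with
    some positive mass.
    """
    points: List[Tuple[float, float]] = []
    while len(points) < n_points:
        x, y = rng.random(), rng.random()
        if rng.random() * max_density < density(x, y):
            points.append((x, y))
    return points


# Example: all mass on the left half of the square.
left_half = lambda x, y: 1.0 if x < 0.5 else 0.0
sample = rejection_sample_points(left_half, 50, random.Random(0))
```

Points drawn this way follow the target density on average but lack the even spacing of a blue-noise pattern, which is presumably the part the learned late-stage denoising is meant to supply.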

2605.15815 2026-05-18 cs.SE cs.CL cs.MA

BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge

BootstrapAgent: 将仓库设置提炼为可重用的代理知识

Sihan Fu, Oucheng Liu, Shiyuan Wang, Jin Shi, Chengkun Wei

发表机构 * Zhejiang University（浙江大学）； The Australian National University（澳大利亚国立大学）； Institute of Information Engineering, Chinese Academy of Sciences（中国科学院信息工程研究所）

AI总结BootstrapAgent通过提炼仓库引导过程中的启发式知识，生成可验证的代理契约，提升代码代理在不熟悉仓库上的任务成功率和效率。

Comments 19 pages, 9 figures, 6 tables

AI中文摘要

代码代理越来越多地帮助开发者处理不熟悉的仓库，但每个此类任务都依赖于一个代价高昂的前提：将仓库引导（bootstrap）到可用的开发状态。这一过程需要大量试错式探索，而由此产生的知识（已解决的依赖关系、修复策略）却被困在单次对话中，无法为后续代理所用。因此，我们将仓库引导表述为一个可复用的启动知识问题，并提出BootstrapAgent，一个多代理框架，它将引导探索过程中发现的启发式经验提炼为持久、可验证、可供代理使用的.bootstrap契约。通过证据提取、结构化规划、基于Docker的确定性验证以及轨迹驱动的修复，BootstrapAgent生成的契约涵盖环境搭建、诊断检查、最小化验证和累积的修复知识。我们进一步提出“warm repair with clean replay”，在不牺牲冷启动可复现性的前提下加速迭代调试，并提出“delta repair with sanity check”以防止奖励投机（reward hacking）。在三个基准上的实验表明，BootstrapAgent达到92.9%的成功率，比基线高出10%以上，同时将下游代理的token用量减少25.9%，构建时间减少22.3%。我们的代码可在 https://github.com/Vossera/BootstrapAgent 获取。

英文摘要

Code agents increasingly help developers work with unfamiliar repositories, but every such task depends on a costly prerequisite: bootstrapping the repository into a usable development state. This process requires substantial trial-and-error exploration, yet the resulting knowledge--resolved dependencies, repair strategies--stays trapped in a single conversation, unavailable to future agents. We therefore formulate repository bootstrapping as a reusable startup knowledge problem and introduce BootstrapAgent, a multi-agent framework that distills the heuristics discovered during bootstrap exploration into a persistent, verifiable, agent-consumable .bootstrap contract. Through evidence extraction, structured planning, deterministic Docker-based verification, and trace-driven repair, BootstrapAgent generates a contract covering environment setup, diagnostic checks, minimal verification, and accumulated repair knowledge. We further propose warm repair with clean replay to accelerate iterative debugging without sacrificing cold-start reproducibility, and a delta repair with sanity check to prevent reward hacking. Experiments on three benchmarks show that BootstrapAgent achieves a 92.9% success rate, outperforming the baseline by over 10% while reducing downstream agent token usage by 25.9% and build time by 22.3%. Our code is available at https://github.com/Vossera/BootstrapAgent.

2605.15812 2026-05-18 cs.HC cs.AI

Toward Natural and Companionable Virtual Agents via Cross-Temporal Emotional Modeling

通过跨时间情感建模实现自然和陪伴型虚拟代理

Feier Qin, Xiao Li, Yi Zheng, Haibin Huang, Hanyao Wang, Xiaoyu Wang, Yan Lu, Yuan Zhang

发表机构 * Communication University of China（中国传媒大学）； Microsoft Research Asia（微软亚洲研究院）； Institute of Artificial Intelligence, China Telecom（中国电信人工智能研究院）

AI总结本文提出CTEM框架，通过链接长期行为历史与即时情感表达，提升虚拟代理的自然性和情感和谐度，实验显示在21天的真实场景中效果显著。

Comments 21 pages, published in CHI '26

Journal ref Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26), ACM, 2026

DOI: 10.1145/3772318.3790917

AI中文摘要

基础模型的最新进展催生了以持续陪伴而非单纯完成任务为目标的对话代理。然而，大多数代理仍无法支持自然、长期的陪伴式互动，导致体验显得零散且不真实。我们认为，当前的代理忽视了对代理社交行为和内部情感的跨时间建模：生成的行为很少影响代理的情感状态，情感状态也很少影响后续行为。我们提出了跨时间情感建模（CTEM），一个将长期行为历史与每时每刻的情感表达联系起来的框架。CTEM建立了一个闭环：过去的经历更新不断演化的情感状态；该状态制约即时互动；用户反馈持续修正记忆和情感状态，从而使反思和预期成为可能。我们将CTEM实例化为Auri，一个即时通讯平台上的陪伴代理，并报告了一项为期21天的真实环境研究，结果显示CTEM在感知自然度、连贯性和情感和谐度方面有所提升。

英文摘要

Recent advances in foundation models have enabled conversational agents that aim for sustained companionship rather than mere task completion. Yet most still remain unable to support natural, long-term companion-like interactions, resulting in experiences that feel episodic and inauthentic. We argue that current agents overlooked cross-temporal modeling of agents' social behaviors and internal emotions: generated behaviors rarely influence an agent's emotional state, and emotional states seldom shape subsequent behaviors. We present Cross-Temporal Emotion Modeling (CTEM), a framework that links long-term behavioral history to moment-to-moment emotional expression. CTEM establishes a closed loop where past experiences update an evolving emotional state; this state conditions immediate interactions; and user feedback continually revises both memory and emotional state, enabling reflection and anticipation. We instantiate CTEM as Auri, a companion agent on an instant-messaging platform, and report a 21-day in-the-wild study showing that CTEM shows improvements in perceived naturalness, coherence, and emotional harmony.

2605.15799 2026-05-18 cs.MA cs.RO

From Gridworlds to Warehouses: Adapting Lightweight One-shot Multi-Agent Pathfinding for AGVs

从网格世界到仓库：为AGVs适应轻量级一次性多智能体路径规划

Hiroki Nagai, Keisuke Okumura

发表机构 * National Institute of Advanced Industrial Science and Technology (AIST)（日本产业技术综合研究所）； Keio University（庆应义塾大学）

AI总结本文提出多智能体仓库路径规划（MAWPF），针对差分驱动AGVs的运动特性，引入四条约束条件，改进传统MAPF算法，通过实验验证PP和LNS2在多智能体场景下的不足，PIBT类方法在可扩展性上表现更优。

Comments To be presented at IJCAI 2026

AI中文摘要

一次性规划下的多智能体路径规划（MAPF）是仓库自动化的核心组成部分，但经典形式通常假设四连通的二维网格，智能体可在四个方向上以单位时间移动。为了弥合与现实的差距，同时仍可用离散组合搜索处理，本文提出了一种面向差速驱动AGV的更实用的对应问题，称为多智能体仓库路径规划（MAWPF），它具有四个约束：（i）智能体动作仅限于直线运动和原地旋转；（ii）旋转需要多步成本；（iii）考虑加速和减速；（iv）禁止跟随冲突以防止追尾事故。为高效求解MAWPF，我们改造了具有代表性的次优MAPF算法PP、LNS2、PIBT和LaCAM，并进行了全面的基准测试。实验表明，PP和LNS2难以求解智能体数量较多的实例，而基于PIBT的方法以解的成本增加为代价获得了更好的可扩展性。我们相信，这些工作是将经典网格世界MAPF适配到实际仓库环境的重要一步。

英文摘要

Multi-agent pathfinding (MAPF) under one-shot planning is a core component of warehouse automation, yet classical formulations typically assume four-connected 2D grids with unit-time moves in four directions. To fill reality gaps while still being trackable with discrete combinatorial search, this work proposes a more practical counterpart tailored to differential-drive AGVs. We term this multi-agent warehouse pathfinding (MAWPF), featured with four constraints: (i) agent actions are restricted to straight motion and in-place rotation; (ii) rotations require multi-step costs; (iii) acceleration and deceleration are considered, and; (iv) follower collisions are prohibited to prevent rear-end crashes. To solve MAWPF efficiently, we adapt representative suboptimal MAPF algorithms-PP, LNS2, PIBT, and LaCAM-and conduct comprehensive benchmarking. Our experiments reveal that PP and LNS2 struggle to solve instances with many agents, while PIBT-based approaches achieve preferable scalability with increased solution cost. We believe that these constitute an important step toward adapting classical gridworld MAPF to operational warehouse setups.
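Constraints (i) and (ii) change the single-agent search space from grid cells to (cell, heading) pairs. The sketch below shows one way a successor function could look under those two constraints; the state encoding, the `rotation_steps` value, and the function name are assumptions of this note, and acceleration (iii) and follower collisions (iv) are not modeled.

```python
from typing import List, Set, Tuple

# Headings indexed 0..3 as (dx, dy): north, east, south, west.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

State = Tuple[int, int, int]  # (x, y, heading index)


def successors(
    state: State,
    free_cells: Set[Tuple[int, int]],
    rotation_steps: int = 2,
) -> List[Tuple[State, int]]:
    """Return (next_state, cost_in_timesteps) pairs for one agent.

    Allowed actions: wait, move one cell straight ahead, or rotate in
    place by 90 degrees. Sideways and backward moves are not available,
    and each rotation costs several timesteps (assumed value).
    """
    x, y, h = state
    result: List[Tuple[State, int]] = [((x, y, h), 1)]  # wait in place

    dx, dy = HEADINGS[h]
    ahead = (x + dx, y + dy)
    if ahead in free_cells:
        result.append(((ahead[0], ahead[1], h), 1))      # straight motion

    result.append(((x, y, (h + 1) % 4), rotation_steps))  # rotate clockwise
    result.append(((x, y, (h - 1) % 4), rotation_steps))  # rotate counter-clockwise
    return result
```

Compared with the classical four-connected model, reaching a neighbouring cell that is not straight ahead now costs a rotation plus a move, which is one reason solution costs grow in this setting.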

URL PDF HTML ☆

赞 0 踩 0

2605.15788 2026-05-18 cs.DC cs.LG

ADAPT: A Self-Calibrating Proactive Autoscaler for Container Orchestration

ADAPT：一种自校准的前瞻性容器编排自动扩展器

Himanshu Singh Baghel

发表机构 * Department of Computer Engineering（计算机工程系）； J.C. Bose University of Science and Technology（J.C. Bose科学技术大学）

AI总结ADAPT通过在线EWMA估计器在运行时跟踪冷启动时长，并据此为MPC提供动态规划时域以优化副本数量；评估中MPC+LSTM策略在所有工作负载上的SLA违规率低于5%，优于反应式HPA和MPC+Prophet。

Comments 9 pages, 5 figures, 3 tables. Includes reproducible simulation framework for proactive Kubernetes autoscaling with adaptive cold-start estimation and MPC-based scaling. Source code and experiment configurations available at: https://github.com/Himanshu21035/autoscaling_research

AI中文摘要

容器化工作负载的前瞻性自动扩缩依赖于对资源供给延迟的了解，即从做出扩缩决策到新容量可以承接流量之间的时间。在实践中，这一冷启动时长在不同环境之间、甚至在连续的扩容事件之间都可能有很大差异。本文提出ADAPT（Adaptive Duration Approximation for Predictive Timing），一种在运行时跟踪冷启动时长的在线EWMA估计器。ADAPT将动态规划时域FH-OPT输入模型预测控制器（MPC），后者在滚动窗口内优化副本数量。这些组件共同构成一个闭环的前瞻性自动扩缩设计，可根据测得的供给延迟调整前瞻范围。在三种策略（MPC+LSTM、MPC+Prophet、HPA）和六种工作负载原型上使用五个随机种子进行评估，MPC+LSTM在所有工作负载上的SLA违规率均低于5%，相比之下，反应式HPA为7-19%，MPC+Prophet在双峰流量下最高达28.7%。

英文摘要

Proactive autoscaling for containerized workloads depends on knowing the provisioning delay, i.e., the time between a scaling decision and the moment new capacity is ready to serve traffic. In practice, this cold-start duration can vary substantially across environments and even across consecutive scale-out events. We present ADAPT (Adaptive Duration Approximation for Predictive Timing), an online EWMA estimator that tracks coldstart duration at runtime. ADAPT feeds a dynamic planning horizon, FH-OPT, into a Model Predictive Controller (MPC) that optimizes replica counts over a rolling window. Together, these components form a closed-loop proactive autoscaling design that adapts its lookahead based on measured provisioning delay. Evaluated across three policies (MPC+LSTM, MPC+Prophet, HPA) and six workload archetypes with five random seeds, MPC+LSTM achieves below 5% SLA violation on all workloads, compared with 7-19% for reactive HPA and up to 28.7% for MPC+Prophet on bimodal traffic.
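An exponentially weighted moving average is simple enough to sketch directly. The example below assumes a fixed smoothing factor and a hypothetical rule that turns the estimate into a lookahead measured in control steps; the abstract does not say how FH-OPT is actually computed, so `planning_horizon_steps` is an illustration, not the paper's formula.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColdStartEstimator:
    """Online EWMA of observed cold-start durations, in seconds."""
    alpha: float = 0.3                  # smoothing factor (assumed value)
    estimate: Optional[float] = None    # no estimate until the first sample

    def update(self, observed_seconds: float) -> float:
        """Fold one measured cold start into the running estimate."""
        if self.estimate is None:
            self.estimate = observed_seconds
        else:
            self.estimate = (
                self.alpha * observed_seconds
                + (1.0 - self.alpha) * self.estimate
            )
        return self.estimate


def planning_horizon_steps(
    estimate_seconds: float, step_seconds: float, margin_steps: int = 1
) -> int:
    """Hypothetical horizon rule: cover the cold start plus a safety margin."""
    return math.ceil(estimate_seconds / step_seconds) + margin_steps


estimator = ColdStartEstimator(alpha=0.5)
estimator.update(10.0)            # first sample initializes the estimate
current = estimator.update(20.0)  # 0.5 * 20 + 0.5 * 10
horizon = planning_horizon_steps(current, step_seconds=10.0)
```

A larger `alpha` reacts faster to a sudden change in provisioning delay but is noisier between consecutive scale-out events; the lookahead grows and shrinks with the estimate, which is the closed-loop behaviour the abstract describes.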

2605.15108 2026-05-18 stat.ML cs.AI cs.IR cs.LG stat.ME

Logging Policy Design for Off-Policy Evaluation

为离策略评估设计日志策略

Connor Douglas, Joel Persson, Foster Provost

发表机构 * New York University（纽约大学）； Spotify

AI总结本文研究如何设计日志策略以最小化OPE误差，探讨了奖励与覆盖之间的根本权衡，并在不同信息场景下提出了最优策略。

AI中文摘要

离策略评估（OPE）利用由另一日志策略收集的数据来估计目标处理策略（例如推荐系统）的价值。它使高风险实验无需线上部署即可进行，但在实践中其准确性在很大程度上取决于用来收集估计所需数据的日志策略。我们研究如何针对给定的目标策略设计日志策略以最小化OPE误差。我们刻画了一个根本性的奖励-覆盖权衡：将概率质量集中在高奖励动作上可降低方差，但可能在目标策略会采取的动作上缺失信号。我们提出了一个统一的日志策略设计框架，并在若干典型信息情形下推导出最优策略：目标策略和奖励分布（i）已知，（ii）未知，以及（iii）在记录日志时通过先验或带噪估计部分已知。我们的结果为在多个候选推荐系统之间做选择的公司提供了可操作的指导。我们展示了在为OPE收集数据时处理（treatment）选择的重要性，并描述了当这是公司首要目标时理论上最优的方法。我们还提炼了在运营约束使理论最优方案无法实施时选择日志策略的实用设计原则。

英文摘要

Off-policy evaluation (OPE) estimates the value of a target treatment policy (e.g., a recommender system) using data collected by a different logging policy. It enables high-stakes experimentation without live deployment, yet in practice accuracy depends heavily on the logging policy used to collect data for computing the estimate. We study how to design logging policies that minimize OPE error for given target policies. We characterize a fundamental reward-coverage tradeoff: concentrating probability mass on high-reward actions reduces variance but risks missing signal on actions the target policy may take. We propose a unifying framework for logging policy design and derive optimal policies in canonical informational regimes where the target policy and reward distribution are (i) known, (ii) unknown, and (iii) partially known through priors or noisy estimates at logging time. Our results provide actionable guidance for firms choosing among multiple candidate recommendation systems. We demonstrate the importance of treatment selection when gathering data for OPE, and describe theoretically optimal approaches when this is a firm's primary objective. We also distill practical design principles for selecting logging policies when operational constraints prevent implementing the theoretical optimum.
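To make the role of the logging policy concrete, here is a minimal sketch of the standard inverse propensity scoring (IPS) estimator for a context-free setting. Choosing IPS is an assumption of this note, since the abstract does not name the estimator it analyzes, and the function name and log format are hypothetical.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[str, float, float]  # (action, observed reward, logging probability)


def ips_estimate(logs: List[LogEntry], target_policy: Dict[str, float]) -> float:
    """Estimate the target policy's value from data logged by another policy.

    Each logged reward is reweighted by target_prob / logging_prob, so
    actions the logging policy rarely takes receive large weights.
    """
    if not logs:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for action, reward, logging_prob in logs:
        if logging_prob <= 0.0:
            raise ValueError("logging probability must be positive")
        weight = target_policy.get(action, 0.0) / logging_prob
        total += weight * reward
    return total / len(logs)


# Uniform logging over two actions; the target policy always plays "a".
logs = [("a", 1.0, 0.5), ("b", 0.0, 0.5)]
value = ips_estimate(logs, {"a": 1.0})
```

The weight `target_prob / logging_prob` is where the design question enters: a logging policy that puts little mass on an action the target policy favors yields few, heavily weighted samples for it, which is the coverage side of the tradeoff described above.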

A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature

paper.json: A Coordination Convention for LLM-Agent-Actionable Papers

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

Skew-adaptive conformal prediction

Scalable neuromorphic computing from autonomous spiking dynamics in a clockless reconfigurable chip

GeoGS-CE: Learning Delay--Beam Channel Priors with 3D Gaussians for High-Mobility Scenarios

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks

A numerical study into neural network surrogate model performance for uncertainty propagation

XSearch: Explainable Code Search via Concept-to-Code Alignment

Explainable AI Isn't Enough! Rethinking Algorithmic Contestability

Who Owns This Agent? Tracing AI Agents Back to Their Owners

Testing properties of trees in graphical models with covariance queries

Driving Through the Network: Performance and Workload Under Latency and Video Impairment

Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery

Unsupervised Domain Shift Detection with Interpretable Subspace Attribution

SLIP & ETHICS: Graduated Intervention for AI Emotional Companions

Generative Long-term User Interest Modeling for Click-Through Rate Prediction

Layer Selection in Feature-Based Losses Affects Image Quality and Microstructural Consistency in Deep Learning Super-Resolution of Brain Diffusion MRI

A Multi-Layer Cloud-IDS Pipeline with LLM and Adaptive Q-Learning Calibration

Symplectic Neural Operators for Learning Infinite Dimensional Hamiltonian Systems

Complexity of Non-Log-Concave Sampling in Fisher Information

Conversations in Space: Structuring Non-Linear LLM Interactions on a Canvas

Heuristic-Based Merging of HPC Traces to Extend Hardware Counter Coverage

StippleDiffusion: Capacity-Constrained Stippling using Controlled Diffusion

BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge

Toward Natural and Companionable Virtual Agents via Cross-Temporal Emotional Modeling

From Gridworlds to Warehouses: Adapting Lightweight One-shot Multi-Agent Pathfinding for AGVs

ADAPT: A Self-Calibrating Proactive Autoscaler for Container Orchestration

Logging Policy Design for Off-Policy Evaluation