arXivDaily: arXiv daily academic digest, updated Monday to Friday
2510.06719 2026-05-13 cs.CR cs.CL cs.LG

Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)

Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma

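AI summary This paper proposes DP-SynRAG, a privacy-preserving retrieval-augmented generation (RAG) framework that addresses the privacy risks conventional RAG systems face in sensitive domains. Unlike existing methods that rely on query-time differential privacy, DP-SynRAG uses large language models to generate a differentially private synthetic database, avoiding the privacy loss caused by repeated noise injection. Experiments show that, under a fixed privacy budget, the method outperforms state-of-the-art private RAG systems and offers a scalable solution for privacy-friendly RAG applications.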

Comments Accepted to ACL 2026 Findings

Abstract

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by grounding them in external knowledge. However, its application in sensitive domains is limited by privacy risks. Existing private RAG methods typically rely on query-time differential privacy (DP), which requires repeated noise injection and leads to accumulated privacy loss. To address this issue, we propose DP-SynRAG, a framework that uses LLMs to generate differentially private synthetic RAG databases. Unlike prior methods, the synthetic text can be reused once created, thereby avoiding repeated noise injection and additional privacy costs. To preserve essential information for downstream RAG tasks, DP-SynRAG extends private prediction, which instructs LLMs to generate text that mimics subsampled database records in a DP manner. Experiments show that DP-SynRAG achieves superior performance to the state-of-the-art private RAG systems while maintaining a fixed privacy budget, offering a scalable solution for privacy-preserving RAG.
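
The contrast between accumulated and fixed privacy loss can be illustrated with basic sequential composition, under which per-query budgets add up. This is a minimal sketch, not the DP-SynRAG method: the function names and the numeric budgets are illustrative assumptions, and tighter composition theorems exist.

```python
def query_time_budget(eps_per_query: float, num_queries: int) -> float:
    """Total epsilon under basic sequential composition: per-query budgets add up."""
    return eps_per_query * num_queries


def synthetic_db_budget(eps_generation: float, num_queries: int) -> float:
    """The synthetic database is generated once with budget eps_generation.

    Later queries only post-process the synthetic text, so the total
    budget does not depend on num_queries.
    """
    return eps_generation


if __name__ == "__main__":
    # Hypothetical budgets chosen only for illustration.
    for n in (1, 10, 100):
        print(n, query_time_budget(0.5, n), synthetic_db_budget(2.0, n))
```

With these assumed numbers, the query-time total grows linearly with the number of queries, while the one-time generation cost stays constant.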

2510.05497 2026-05-13 cs.DC cs.AI cs.AR cs.LG

Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference

Zhongkai Yu, Yue Guan, Zihao Yu, Chenyang Zhou, Zhengding Hu, Shuyi Pei, Yangwook Kang, Yufei Ding, Po-An Tsai

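AI summary This paper studies data movement patterns during inference of large-scale Mixture of Experts (MoE) large language models, with the aim of improving execution efficiency on multi-unit systems. By profiling four large MoE models released in 2025 on more than 24,000 requests, the study distills six key insights from temporal and spatial perspectives and derives optimizations for future wafer-scale GPUs and existing GPU systems, achieving a 6.6× average speedup and up to a 1.25× speedup, respectively. It is the first systematic analysis and application study of data movement in large-scale MoE models.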

Abstract

Large-scale Mixture of Experts (MoE) Large Language Models (LLMs) have recently become the frontier open-weight models, achieving remarkable model capability similar to proprietary ones. But their random expert selection mechanism introduces significant data movement overhead that becomes the dominant bottleneck in multi-unit LLM serving systems. To understand the patterns underlying this data movement, we conduct comprehensive data-movement-centric profiling across four state-of-the-art large-scale MoE models released in 2025 (200B-1000B) using over 24,000 requests spanning diverse workloads. We perform systematic analysis from both temporal and spatial perspectives and distill six key insights to guide the design of diverse serving systems. We verify these insights on both future wafer-scale GPU architectures and existing GPU systems. On wafer-scale GPUs, lightweight architectural modifications guided by our insights yield a 6.6$\times$ average speedup across four 200B-1000B models. On existing GPU systems, our insights drive the design of a prefill-aware expert placement algorithm that achieves up to 1.25$\times$ speedup on MoE computation. Our work presents the first comprehensive data-centric analysis of large-scale MoE models together with a concrete design study applying the learned lessons. Our profiling traces are publicly available at https://huggingface.co/datasets/core12345/MoE_expert_selection_trace.

2509.21711 2026-05-13 stat.ML cs.LG

Multi-modal Bayesian Neural Network Surrogates with Conjugate Last-Layer Estimation

Ian Taylor, Juliane Mueller, Julie Bessac

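AI summary This paper studies how to build efficient surrogate models from multi-modal data to support the modeling and analysis of an expensive quantity of interest. The authors propose two multi-modal Bayesian neural network surrogates based on conjugate distributions and estimate their parameters with variational inference, including when observations are partially missing. Experiments show higher prediction accuracy and better uncertainty quantification than uni-modal models on both scalar and time series data.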

Comments 47 pages including references and appendix, 9 figures

Abstract

As data collection and simulation capabilities advance, multi-modal learning, the task of learning from multiple modalities and sources of data, is becoming an increasingly important area of research. Surrogate models that learn from data of multiple auxiliary modalities to support the modeling of a highly expensive quantity of interest have the potential to aid outer loop applications such as optimization, inverse problems, or sensitivity analyses when multi-modal data are available. We develop two multi-modal Bayesian neural network surrogate models and leverage conditionally conjugate distributions in the last layer to estimate model parameters using stochastic variational inference (SVI). We provide a method to perform this conjugate SVI estimation in the presence of partially missing observations. We demonstrate improved prediction accuracy and uncertainty quantification compared to uni-modal surrogate models for both scalar and time series data.

2509.00931 2026-05-13 stat.ML cs.LG

Semi-Supervised Bayesian GANs with Log-Signatures for Uncertainty-Aware Credit Card Fraud Detection

David Hirnschall

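AI summary This paper proposes a new deep generative framework based on semi-supervised Bayesian generative adversarial networks (GANs) for credit card fraud detection, formulating the problem as a time series classification task. The method combines conditional GANs for targeted data augmentation, Bayesian inference to quantify predictive uncertainty, and log-signatures for robust feature encoding of transaction histories, and it introduces a Wasserstein distance-based loss to align generated samples with real unlabeled samples. Experiments on the BankSim dataset show that the method outperforms existing benchmarks, with strong statistical and domain-specific performance across different proportions of labeled data.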

Abstract

We present a novel deep generative semi-supervised framework for credit card fraud detection, formulated as a time series classification task. As financial transaction data streams grow in scale and complexity, traditional methods often require large labeled datasets and struggle with time series of irregular sampling frequencies and varying sequence lengths. To address these challenges, we extend conditional Generative Adversarial Networks (GANs) for targeted data augmentation, integrate Bayesian inference to obtain predictive distributions and quantify uncertainty, and leverage log-signatures for robust feature encoding of transaction histories. We introduce a novel Wasserstein distance-based loss to align generated and real unlabeled samples while simultaneously maximizing classification accuracy on labeled data. Our approach is evaluated on the BankSim dataset, a widely used simulator for credit card transaction data, under varying proportions of labeled samples, demonstrating consistent improvements over benchmarks in both global statistical and domain-specific metrics. These findings highlight the effectiveness of GAN-driven semi-supervised learning with log-signatures for irregularly sampled time series and emphasize the importance of uncertainty-aware predictions.

2508.20614 2026-05-13 stat.ML cs.LG stat.CO

Improving the Accuracy of Amortized Model Comparison with Self-Consistency

Šimon Kucharský, Aayush Mishra, Daniel Habermann, Stefan T. Radev, Paul-Christian Bürkner

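AI summary This paper studies how to improve the accuracy of amortized Bayesian model comparison (BMC), particularly when the simulation models are misspecified. The authors add a self-consistency loss on unlabeled real data to the simulation-based training of neural surrogates, making model comparison more robust under distribution shift. Experiments show that in the open-world scenario, self-consistency training can substantially improve the accuracy of BMC estimates, including for severely misspecified models.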

Comments 22 pages, 14 figures. This version extends our initial results presented at Reliable ML from Unreliable Data Workshop at NeurIPS 2025. Previously, this version appeared as arXiv:2512.14308v2, which has now been withdrawn: the two versions share too much content to be considered separate papers

Abstract

Amortized Bayesian model comparison (BMC) enables fast probabilistic ranking of models via simulation-based training of neural surrogates. However, the accuracy of neural surrogates deteriorates when simulation models are misspecified, which is the very case where model comparison is most needed. We evaluate four different amortized BMC methods. We supplement traditional simulation-based training of these methods with a self-consistency (SC) loss on unlabeled real data to improve BMC estimates under distribution shifts. Using one artificial and two real-world case studies, we compare amortized BMC estimators with and without SC against analytic or bridge sampling benchmarks. In the closed-world case (data is generated by one of the candidate models), BMC estimators using classifiers work acceptably well even without SC training. However, these methods also benefit the least from SC training. In the open-world scenario (all models misspecified), SC training strongly improves BMC estimators when having access to analytic likelihoods, or when surrogate likelihoods are locally accurate near the true parameter posterior, even for severely misspecified models. We conclude with practical recommendations for amortized BMC and suggestions for future research.

2508.02455 2026-05-13 cs.SE cs.AI cs.IR

TreeRanker: Fast and Model-agnostic Ranking System for Code Suggestions in IDEs

Daniele Cipollone, Egor Bogomolov, Arie van Deursen, Maliheh Izadi

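AI summary TreeRanker is a fast, model-agnostic ranking system for code suggestions that aims to improve the relevance of code completion in IDEs. The method uses a language model to score static completions: it organizes them into a prefix tree and performs a single greedy decoding pass, producing a precise ranking without complex adjustments. It is efficient, general, and compatible with existing code completion models, offering a practical and effective way to integrate language models into IDEs.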

Abstract

Token-level code completion is one of the most critical features in modern Integrated Development Environments (IDEs). It assists developers by suggesting relevant identifiers and APIs during coding. While completions are typically derived from static analysis, their usefulness depends heavily on how they are ranked, as correct predictions buried deep in the list are rarely seen by users. Most current systems rely on hand-crafted heuristics or lightweight machine learning models trained on user logs, which can be further improved to capture context information and generalize across projects and coding styles. In this work, we propose a new scoring approach to ranking static completions using language models in a lightweight and model-agnostic way. Our method organizes all valid completions into a prefix tree and performs a single greedy decoding pass to collect token-level scores across the tree. This enables a precise token-aware ranking without needing beam search, prompt engineering, or model adaptations. The approach is fast, architecture-agnostic, and compatible with already deployed models for code completion. These findings highlight a practical and effective pathway for integrating language models into already existing tools within IDEs, and ultimately providing smarter and more responsive developer assistance.
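
The prefix-tree idea can be sketched as follows. This is a simplified interpretation of the abstract, not the TreeRanker implementation: the `scorer` callback stands in for a language model's token log-probability, and the walking and aggregation rules (score all children at each visited node, follow the best child, sum the scores seen along each completion's prefix) are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer: (prefix tokens, candidate token) -> log-probability.
Scorer = Callable[[Tuple[str, ...], str], float]


def build_trie(completions: Sequence[Sequence[str]]) -> dict:
    """Build a nested-dict prefix tree over tokenized completions."""
    root: dict = {}
    for tokens in completions:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
    return root


def greedy_rank(completions: Sequence[Sequence[str]], scorer: Scorer) -> List[Sequence[str]]:
    """Rank completions using scores collected in one greedy walk of the trie."""
    token_scores: Dict[Tuple[str, ...], float] = {}
    prefix: Tuple[str, ...] = ()
    node = build_trie(completions)
    while node:
        # Score every child of the current node, then descend into the best one.
        scored = {tok: scorer(prefix, tok) for tok in node}
        for tok, score in scored.items():
            token_scores[prefix + (tok,)] = score
        best = max(scored, key=scored.get)
        prefix, node = prefix + (best,), node[best]

    def total(tokens: Sequence[str]) -> float:
        # Sum the scores along the completion until it leaves the scored region.
        acc = 0.0
        for i in range(1, len(tokens) + 1):
            key = tuple(tokens[:i])
            if key not in token_scores:
                break
            acc += token_scores[key]
        return acc

    return sorted(completions, key=total, reverse=True)


if __name__ == "__main__":
    toy = {"get": -0.1, "set": -2.0, "Name": -0.3, "Id": -1.0}
    candidates = [["set", "Name"], ["get", "Id"], ["get", "Name"]]
    print(greedy_rank(candidates, lambda prefix, tok: toy[tok]))
```

Because only one path is followed, the number of scoring steps is bounded by the depth of the tree rather than by the number of candidates, which is the property the abstract emphasizes.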

2506.10664 2026-05-13 stat.ML cs.LG

Sequential Off-Policy Learning with Logarithmic Smoothing

Maxime Haddouche, Otmane Sakhi

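AI summary This paper studies sequential off-policy learning, where policies in a real system are repeatedly updated and redeployed and each update learns from all previously collected data. The authors propose a simple algorithm that combines Logarithmic Smoothing estimation with online PAC-Bayesian tools, and they show that an adjustment to Logarithmic Smoothing improves performance and accelerates convergence under mild conditions. The algorithm matches state-of-the-art offline methods in the batch setting and substantially outperforms them when policies are updated sequentially; experiments confirm its effectiveness.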

Comments AISTATS 2026

Abstract

Off-policy learning enables training policies from logged interaction data. Most prior work considers the batch setting, where a policy is learned from data generated by a single behavior policy. In real systems, however, policies are updated and redeployed repeatedly, each time training on all previously collected data while generating new interactions for future updates. This sequential off-policy learning setting is common in practice but remains largely unexplored theoretically. In this work, we present and study a simple algorithm for sequential off-policy learning, combining Logarithmic Smoothing (LS) estimation with online PAC-Bayesian tools. We further show that a principled adjustment to LS improves performance and accelerates convergence under mild conditions. The algorithms introduced generalise previous work: they match state-of-the-art offline approaches in the batch case and substantially outperform them when policies are updated sequentially. Empirical evaluations highlight both the benefits of the sequential framework and the strength of the proposed algorithms.

2506.07859 2026-05-13 quant-ph cs.LG

Deep reinforcement learning for near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing

Amanuel Anteneh, Léandre Brunel, Carlos González-Arciniegas, Olivier Pfister

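AI summary This study uses deep reinforcement learning to achieve near-deterministic preparation of cubic- and quartic-phase gates in photonic quantum computing. Deep neural networks trained to control a quantum optical circuit generate cubic-phase states with an average success rate of 96%, using photon-number-resolving measurements as the only non-Gaussian resource. The study also shows that the same resources can directly generate a quartic-phase gate without a cubic gate decomposition.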

Journal ref Physical Review Research 8 (2026) L012048
Abstract

Cubic-phase states are a sufficient resource for universal quantum computing over continuous variables. We present results from numerical experiments in which deep neural networks are trained via reinforcement learning to control a quantum optical circuit for generating cubic-phase states, with an average success rate of 96%. The only non-Gaussian resource required is photon-number-resolving measurements. We also show that the exact same resources enable the direct generation of a quartic-phase gate, with no need for a cubic gate decomposition.

2506.00294 2026-05-13 astro-ph.IM cs.CV

Applying Vision Transformers on Spectral Analysis of Astronomical Objects

Luis Felipe Strano Moraes, Ignacio Becker, Pavlos Protopapas, Guillermo Cabrera-Vives

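AI summary This paper applies pre-trained Vision Transformers (ViTs) to the analysis of astronomical spectra. Converting one-dimensional spectra into two-dimensional image representations lets ViTs capture local and global spectral features through spatial self-attention. The authors fine-tune a ViT on millions of spectra from the SDSS and LAMOST surveys; it performs well on stellar object classification and redshift estimation, with classification accuracy above Support Vector Machines and Random Forests and generalization across object types comparable to AstroCLIP. According to the authors, this is the first application of ViTs to the analysis of large-scale real spectroscopic data without relying on synthetic inputs.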

Comments 9 pages, 9 figures

Journal ref A&A 709, A122 (2026)
Abstract

We apply pre-trained Vision Transformers (ViTs), originally developed for image recognition, to the analysis of astronomical spectral data. By converting traditional one-dimensional spectra into two-dimensional image representations, we enable ViTs to capture both local and global spectral features through spatial self-attention. We fine-tune a ViT pretrained on ImageNet using millions of spectra from the SDSS and LAMOST surveys, represented as spectral plots. Our model is evaluated on key tasks including stellar object classification and redshift ($z$) estimation, where it demonstrates strong performance and scalability. We achieve classification accuracy higher than Support Vector Machines and Random Forests, and attain $R^2$ values comparable to AstroCLIP's spectrum encoder, even when generalizing across diverse object types. These results demonstrate the effectiveness of using pretrained vision models for spectroscopic data analysis. To our knowledge, this is the first application of ViTs to large-scale real spectroscopic data that does not rely on synthetic inputs.

2505.20754 2026-05-13 stat.ML cs.LG

Stationary MMD Points

Zonghao Chen, Toni Karvonen, Heishiro Kanagawa, François-Xavier Briol, Chris. J. Oates

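AI summary This paper studies the approximation of a target probability distribution by a finite set of points, where the points are selected by minimising a maximum mean discrepancy (MMD). Because the MMD objective is non-convex, the global optimum is hard to obtain directly, so the authors study stationary points of the MMD, which can be computed accurately. They prove that, for integrands in the associated reproducing kernel Hilbert space, the numerical integration error of stationary MMD points vanishes faster than the MMD itself, and on that basis they propose MMD gradient flow as a practical way to compute stationary points, with a rigorous convergence analysis and error bound.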

Journal ref International Conference on Machine Learning, 2026
Abstract

Approximation of a target probability distribution using a finite set of points is a problem of fundamental importance in numerical integration. Several authors have proposed to select points by minimising a maximum mean discrepancy (MMD), but the non-convexity of this objective typically precludes global minimisation. Instead, we consider the concept of stationary points of the MMD which, in contrast to points globally minimising the MMD, can be accurately computed. Our main contributions are two-fold and theoretical in nature. We first prove the (perhaps surprising) result that, for integrands in the associated reproducing kernel Hilbert space, the numerical integration error of stationary MMD points vanishes faster than the MMD. Motivated by this super-convergence property, we consider MMD gradient flows as a practical strategy for computing stationary points of the MMD. We then prove that MMD gradient flow can indeed compute stationary MMD points, based on a refined convergence analysis that establishes a novel non-asymptotic finite-particle error bound.
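
For readers unfamiliar with the objective, here is a minimal sketch of the squared MMD between two finite point sets. It is not the paper's method: the Gaussian kernel, the bandwidth, the one-dimensional points, and the use of a finite sample to stand in for the target distribution are all assumptions made for illustration.

```python
from math import exp
from typing import Sequence


def gaussian_kernel(x: float, y: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel on the real line."""
    return exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(points: Sequence[float], target: Sequence[float],
                bandwidth: float = 1.0) -> float:
    """Squared MMD between the empirical measures of two point sets.

    MMD^2 = mean k(x, x') - 2 * mean k(x, y) + mean k(y, y').
    """
    def mean_k(a: Sequence[float], b: Sequence[float]) -> float:
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return (mean_k(points, points)
            - 2.0 * mean_k(points, target)
            + mean_k(target, target))


if __name__ == "__main__":
    target_sample = [-1.0, -0.5, 0.0, 0.5, 1.0]
    print(mmd_squared([0.0], target_sample))        # one point at the centre
    print(mmd_squared([-0.75, 0.75], target_sample))  # two spread-out points
```

Selecting points by minimising this quantity over `points` is the non-convex problem the abstract refers to; a gradient-based scheme would move the points along the negative gradient of `mmd_squared`.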

2505.17907 2026-05-13 stat.ML cs.LG

Approximating Simple ReLU Networks based on Spectral Decomposition of Fisher Information

Ka Long Keith Ho, Yoshinari Takeishi, Junichi Takeuchi

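AI summary This paper studies properties of the Fisher information matrix of two-layer ReLU neural networks with random hidden weights. Its eigenvalue distribution is highly concentrated on a few eigenspaces: the eigenvalues of the first three eigenspaces account for 97.7% of the trace of the Fisher information matrix, independently of the number of parameters. The authors identify the function spaces corresponding to these major eigenspaces, which consist of spherical harmonic functions of order at most 2, a result closely related to the Mercer decomposition of the neural tangent kernel.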

Comments 18 pages, 1 figure, 1 table

Abstract

Properties of Fisher information matrices of 2-layer ReLU neural networks with random hidden weights are studied. For these networks, the eigenvalue distribution is known to concentrate approximately on a few eigenspaces. In particular, the eigenvalues for the first three eigenspaces account for 97.7% of the trace of the Fisher information matrix, independently of the number of parameters. In this paper, we identify the function spaces which correspond to those major eigenspaces. These function spaces consist of the spherical harmonic functions whose orders are not greater than 2. This result relates to the Mercer decomposition of the neural tangent kernels.

2505.16156 2026-05-13 stat.ML cs.LG

Integral Imprecise Probability Metrics

Siu Lun Chau, Michele Caprio, Krikamol Muandet

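AI summary This paper proposes the integral imprecise probability metric (IIPM) framework, based on the Choquet integral, for comparing distributions under imprecise probability models, extending the scope of classical probability metrics. The approach applies to many imprecise probability models, including lower probabilities, probability intervals, and belief functions, and can quantify epistemic uncertainty. Theoretical analysis establishes conditions under which IIPM is a valid metric and describes a form of weak convergence of imprecise probabilities; experiments on classification tasks show strong performance, in particular outperforming established methods when the number of classes is large.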

Comments 48 pages

Abstract

Quantifying differences between probability distributions is fundamental to statistics and machine learning, primarily for comparing statistical uncertainty. In contrast, epistemic uncertainty -- due to incomplete knowledge -- requires richer representations than those offered by classical probability. Imprecise probability (IP) theory offers such models, capturing ambiguity and partial belief. This has driven growing interest in imprecise probabilistic machine learning (IPML), where inference and decision-making rely on broader uncertainty models -- highlighting the need for metrics beyond classical probability. This work introduces the integral imprecise probability metric (IIPM) framework, a Choquet integral-based generalisation of classical integral probability metrics to the setting of capacities -- a broad class of IP models encompassing many existing ones, including lower probabilities, probability intervals, belief functions, and more. Theoretically, we establish conditions under which IIPM serves as a valid metric and metrises a form of weak convergence of capacities. Practically, IIPM not only enables comparison across different IP models but also supports the quantification of epistemic uncertainty (EU) within a single IP model. In particular, by comparing an IP model with its conjugate, IIPM gives rise to a new class of epistemic uncertainty measures -- Maximum Mean Imprecision (MMI) -- which satisfy key axiomatic properties proposed in the uncertainty quantification literature. We validate MMI through selective classification experiments, demonstrating strong empirical performance against established EU measures, and outperforming them when classical methods struggle to scale to a large number of classes. Our work advances both theory and practice in Imprecise Probabilistic Machine Learning, offering a principled framework for comparing and quantifying epistemic uncertainty under imprecision.

2504.13898 2026-05-13 cs.HC cs.AI

Social Human Robot Embodied Conversation (SHREC) Dataset: Benchmarking Foundational Models' Social Reasoning

Dong Won Lee, Yubin Kim, Denison Guvenoz, Sooyeon Jeong, Parker Malachowsky, Louis-Philippe Morency, Cynthia Breazeal, Hae Won Park

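AI summary This paper introduces the SHREC dataset for evaluating the social reasoning of foundation models in real-world human-robot interaction. The dataset contains about 400 real human-robot interaction videos and more than 10,000 annotations, covering the social challenges and errors of robots in areas such as emotion understanding and intention tracking. The study defines eight benchmark tasks, and experiments show that current state-of-the-art models still have substantial performance gaps in social reasoning, underscoring both the difficulty of developing socially intelligent AI and directions for doing so.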

Comments 23 pages, 11 figures

Abstract

Our work focuses on the social reasoning capabilities of foundation models for real-world human-robot interactions. We introduce the Social Human Robot Embodied Conversation (SHREC) Dataset, a benchmark of $\sim$400 real-world human-robot interaction videos and over 10K annotations, capturing robot social errors, competencies, underlying rationales, and corrections. Unlike prior datasets focused on human-human interactions, the SHREC Dataset uniquely highlights the social challenges faced by real-world social robots such as emotion understanding, intention tracking, and conversational mechanics. Moreover, current foundation models struggle to recognize these deficits, which manifest as subtle, socially situated failures. To evaluate AI models' capacity for social reasoning, we define eight benchmark tasks targeting critical areas such as (1) detection of social errors and competencies, (2) identification of underlying social attributes, (3) comprehension of interaction flow, and (4) providing rationale and alternative correct actions. Experiments with state-of-the-art foundation models, alongside human evaluations, reveal substantial performance gaps -- underscoring the difficulty and providing directions in developing socially intelligent AI.

2504.10428 2026-05-13 stat.ML cs.DS cs.LG math.ST stat.TH

Smoothed Analysis of Learning from Positive Samples

Jane H. Lee, Anay Mehrotra, Manolis Zampetakis

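AI summary This paper presents a smoothed analysis of learning binary classifiers from positive samples only, addressing the limited learnability of the traditional worst-case setting. Assuming the true distribution is smooth with respect to a reference distribution, the authors prove that all VC classes are learnable in the smoothed model and give the required number of samples along with an efficient algorithm. The results also yield improved algorithms for several applications, including estimation with unknown truncation, truncation detection, and learning from multiple reference distributions.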

Comments Accepted for presentation at the 58th ACM Symposium on Theory of Computing (STOC), 2026; abstract shortened for arXiv

Abstract

Binary classification from positive-only samples is a variant of PAC learning where the learner receives i.i.d. positive samples and aims to learn a classifier with low error. Previous work by Natarajan, Gereb-Graus, and Shvaytser characterized learnability and revealed a largely negative picture: almost no interesting classes, including two-dimensional halfspaces, are learnable. This poses a challenge for applications from bioinformatics to ecology, where practitioners rely on heuristics. In this work, we initiate a smoothed analysis of positive-only learning. We assume samples from a reference distribution $D$ such that the true distribution $D^*$ is smooth with respect to it. In stark contrast to the worst-case setting, we show that all VC classes become learnable in the smoothed model, requiring $O(VC/ε^2)$ positive samples for $ε$ classification error. We also give an efficient algorithm for any class admitting $\mathrm{poly}(ε)$-approximation by degree-$k$ polynomials whose range is lower-bounded by a constant with respect to $D$ in L1-norm. It runs in time $\mathrm{poly}(d^k/ε)$, qualitatively matching L1-regression. Our results also imply faster or more general algorithms for: (1) estimation with unknown truncation, giving the first polynomial-time algorithm for estimating exponential-family parameters from samples truncated to an unknown set approximable by non-negative polynomials in L1 norm, improving on [KTZ FOCS19; LMZ FOCS24], who required strong L2-approximation; (2) truncation detection for broad classes, including non-product distributions, improving on [DLNS STOC24], who required product distributions; and (3) learning from a list of reference distributions, where samples come from $O(1)$ distributions, one of which witnesses smoothness of $D^*$, as arises when list-decoding algorithms learn samplers for $D^*$ from corrupted data.

2502.04763 2026-05-13 cs.GT cs.LG

Shapley Value Approximation Based on k-Additive Games

Guilherme Dean Pelegrina, Patrick Kolpaczki, Eyke Hüllermeier

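AI summary This paper proposes SVA$k_{\text{ADD}}$, a Shapley value approximation method based on $k$-additive games, to address the computational complexity of Shapley values in fair division among multiple agents. The method fits a $k$-additive surrogate game, computes the exact Shapley values of the surrogate, and uses them as estimates for the original problem. Its efficacy is evaluated empirically against competing methods, and it offers a new tool for quantifying the contribution of features or data points.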

Abstract

The Shapley value is the prevalent solution for fair division problems in which a payout is to be divided among multiple agents. By adopting a game-theoretic view, the idea of fair division and the Shapley value can also be used in machine learning to quantify the individual contribution of features or data points to the performance of a predictive model. Despite its popularity and axiomatic justification, the Shapley value suffers from a computational complexity that scales exponentially with the number of entities involved, and hence requires approximation methods for its reliable estimation. We propose SVA$k_{\text{ADD}}$, a novel approximation method that fits a $k$-additive surrogate game. By taking advantage of $k$-additivity, we are able to elicit the exact Shapley values of the surrogate game and then use these values as estimates for the original fair division problem. The efficacy of our method is evaluated empirically and compared to competing methods.
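
To make the exponential cost concrete, here is a minimal sketch of exact Shapley value computation by subset enumeration. It is the textbook definition, not the SVA$k_{\text{ADD}}$ approximation, and the toy game in the usage section is a made-up example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(n: int, value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values for players 0..n-1 of a cooperative game.

    Each player's value is the weighted average of its marginal contribution
    over all coalitions of the other players, so the number of evaluations
    of `value` grows exponentially with n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


if __name__ == "__main__":
    # Toy game (illustrative): a payout of 10 only if players 0 and 1 cooperate.
    def toy_game(coalition: FrozenSet[int]) -> float:
        return 10.0 if {0, 1} <= coalition else 0.0

    print(shapley_values(3, toy_game))
```

In the toy game, players 0 and 1 split the payout equally and player 2, who never changes any coalition's value, receives nothing.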

2412.11875 2026-05-13 stat.ML cs.LG

Bayesian Surrogate Training on Multiple Data Sources: A Hybrid Modeling Strategy

Philipp Reiser, Paul-Christian Bürkner, Anneli Guthke

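AI summary This paper proposes a hybrid modeling strategy that combines simulation data and real-world measurement data to improve surrogate model training. It integrates the data sources through two probabilistic approaches: one trains a separate surrogate for each data source and combines their predictive distributions, and the other trains a single surrogate on both sources. Both use a novel weighting strategy for heterogeneous data, which can improve predictive accuracy and coverage and help diagnose potential problems in the simulation model.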

Abstract

Surrogate models are often used as computationally efficient approximations to complex simulation models, enabling tasks such as solving inverse problems, sensitivity analysis, and probabilistic forward predictions, which would otherwise be computationally infeasible. During training, surrogate parameters are fitted such that the surrogate reproduces the simulation model's outputs as closely as possible. However, the simulation model itself is merely a simplification of the real-world system, often missing relevant processes or suffering from misspecifications, e.g., in inputs or boundary conditions. Hints about these might be captured in real-world measurement data, and yet, we typically ignore those hints during surrogate building. In this paper, we propose two novel probabilistic approaches to integrate simulation data and real-world measurement data during surrogate training. The first method trains separate surrogate models for each data source and combines their predictive distributions, while the second incorporates both data sources by training a single surrogate. Both hybrid modeling approaches employ a novel weighting strategy for combining heterogeneous data sources during surrogate training, which operates independently of the chosen surrogate family. We show the conceptual differences and benefits of the two approaches through both synthetic and real-world case studies. The results demonstrate the potential of these methods to improve predictive accuracy and predictive coverage, and to diagnose problems in the underlying simulation model. These insights can improve system understanding and future model development.

2409.08290 2026-05-13 cs.NE cs.AI cs.LG

Reconsidering the energy efficiency of spiking neural networks

Zhanglu Yan, Zhenyu Bai, Weng-Fai Wong

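AI summary This paper re-evaluates the energy-efficiency advantage of spiking neural networks (SNNs) over quantized artificial neural networks (QNNs). It establishes a fair baseline by mapping rate-encoded SNNs with $T$ timesteps to equivalent QNNs with $\lceil \log_2(T+1) \rceil$ bits, so that both have comparable representational capacity and hardware requirements. Using a detailed energy model covering computation and data movement, the study analyzes the effect of a range of network and hardware parameters and finds that SNNs are more energy-efficient under specific conditions (such as moderate time windows and low spike rates), illustrating the practical savings with a smartwatch example.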

Abstract

Spiking Neural Networks (SNNs) promise higher energy efficiency over conventional Quantized Artificial Neural Networks (QNNs) due to their event-driven, spike-based computation. However, prevailing energy evaluations often oversimplify, focusing on computational aspects while neglecting critical overheads like comprehensive data movement and memory access. Such simplifications can lead to misleading conclusions regarding the true energy benefits of SNNs. This paper presents a rigorous re-evaluation. We establish a fair baseline by mapping rate-encoded SNNs with $T$ timesteps to functionally equivalent QNNs with $\lceil \log_2(T+1) \rceil$ bits. This ensures both models have comparable representational capacities, as well as similar hardware requirements, enabling meaningful energy comparisons. We introduce a detailed analytical energy model encompassing core computation and data movement. Using this model, we systematically explore a wide parameter space, including intrinsic network characteristics ($T$, spike rate $s_r$, QNN sparsity $γ$, model size $N$, weight bit-level) and hardware characteristics (memory system and network-on-chip). Our analysis identifies specific operational regimes where SNNs genuinely offer superior energy efficiency. For example, under typical neuromorphic hardware conditions, SNNs with moderate time windows ($T \in [5,10]$) require an average spike rate ($s_r$) below 6.4% to outperform equivalent QNNs. Furthermore, to illustrate the real-world implications of our findings, we analyze the operational lifetime of a typical smartwatch, showing that an optimized SNN can nearly double its battery life compared to a QNN. These insights guide the design of truly energy-efficient neural network solutions.
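
The bit-width mapping $\lceil \log_2(T+1) \rceil$ can be worked through in a few lines. This is a minimal sketch: the function name is hypothetical, and the rationale in the comments (a rate-coded neuron emits between 0 and $T$ spikes, giving $T+1$ levels) is a reading of the formula rather than a statement from the paper.

```python
def equivalent_qnn_bits(timesteps: int) -> int:
    """Bits needed to represent the T + 1 spike counts 0..T.

    For T >= 1, ceil(log2(T + 1)) equals T.bit_length(), which avoids
    floating-point rounding in the logarithm.
    """
    if timesteps < 1:
        raise ValueError("timesteps must be a positive integer")
    return timesteps.bit_length()


if __name__ == "__main__":
    # The abstract's moderate time windows, T in [5, 10].
    for t in range(5, 11):
        print(t, equivalent_qnn_bits(t))
```

Under this mapping, the time windows $T \in [5,10]$ discussed in the abstract correspond to 3-bit QNNs for $T \le 7$ and 4-bit QNNs for $T \ge 8$.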

2406.12017 2026-05-13 stat.ML cs.LG stat.CO

Sparsity-Constraint Optimization via Splicing Iteration

Jin Zhu, Junxian Zhu, Zezhi Wang, Borui Tang, Hongmei Lin, Xueqin Wang

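AI summary This paper proposes SCOPE, a new sparsity-constrained optimization algorithm for problems in signal processing, statistics, and machine learning. The algorithm replaces the conventional gradient step with a splicing iteration, so no continuous hyperparameter needs tuning and convergence follows naturally. Theoretical analysis shows that SCOPE converges linearly and recovers the true support set when the sparsity level is correctly specified, and that these results do not rely on restricted-isometry-property-type conditions. Experiments show superior performance in sparse quadratic optimization, sparse classifier learning, and sparse Markov network recovery.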

Comments 35 pages

Abstract

Sparsity-constrained optimization underlies many problems in signal processing, statistics, and machine learning. State-of-the-art hard-thresholding (HT) algorithms rely on an appropriately selected continuous step-size parameter to ensure convergence. In this paper, we propose a naturally convergent iterative algorithm, SCOPE (Sparsity-Constrained Optimization via sPlicing itEration). The algorithm is capable of optimizing nonlinear differentiable objective functions that are strongly convex and smooth on low-dimensional subspaces. SCOPE replaces the gradient step with a splicing operation guided directly by the objective value, thereby eliminating the need to tune any continuous hyperparameter. Theoretically, it achieves a linear convergence rate and recovers the true support set when the sparsity level is correctly specified. We also establish parallel theoretical results without relying on restricted-isometry-property-type conditions. We demonstrate SCOPE's versatility and power by applying it to solve sparse quadratic optimization, learn sparse classifiers, and recover sparse Markov networks for binary variables. With our C++ implementation of SCOPE, numerical experiments on these tasks show that it achieves superior support recovery performance, confirming both its algorithmic efficiency and theoretical guarantees.

2310.17025 2026-05-13 cs.NI cs.AI

netFound: Principled Design for Network Foundation Models

Sylee Beltiukov, Satyandra Guthula, Haarika Manda, Jaber Daneshamooz, Wenbo Guo, Walter Willinger, Arpit Gupta, Inder Monga

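AI summary This paper proposes netFound, a network foundation model designed to address problems of existing models in traffic analysis tasks: reliance on dataset shortcuts, collapsed embedding spaces, and failure to capture exogenous network conditions. The study sets out four design principles (protocol-aware tokenization, operational context embedding, burst-flow hierarchical attention, and privacy-by-construction input design) and builds netFound on them. Experiments show that netFound significantly outperforms existing models in representation quality, alignment with domain-expert features, and exogenous context discrimination, while also preserving privacy.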

Abstract

Network foundation models promise reusable representations for diverse traffic analysis tasks, but recent diagnostic works have revealed fundamental problems: models exploit dataset shortcuts rather than learning genuine traffic patterns, produce collapsed embedding spaces, and fail to capture the exogenous network conditions that shape real-world behavior. We translate these diagnostic insights into four concrete design principles: protocol-aware tokenization, operational context embedding, burst-flow hierarchical attention, and privacy-by-construction input design, and build netFound, a network foundation model whose architecture is motivated by this failure analysis. We pretrain netFound on a billion-token-scale corpus over 5000 GPU hours, and demonstrate that it produces high-quality representations with lower anisotropy, significantly higher alignment with domain-expert features, and an F1 of 0.95 on exogenous context discrimination where existing state-of-the-art models score below 0.62, while preserving privacy by excluding payload and IP addresses. netFound demonstrates significant improvements in frozen-encoder evaluation, showing that pretrained embeddings themselves carry useful structure, and remains the top performer across all benchmarks in end-to-end fine-tuned settings. We release full open-source code, weights for three model sizes on HuggingFace, a containerized pipeline from raw PCAPs to downstream inference, and the full 4.2 billion flows pretraining dataset to facilitate reproducibility and further research.

2110.01729 2026-05-13 stat.ML cs.LG

Stochastic tensor space feature theory with applications to robust machine learning

Julio Enrique Castrillon-Candas, Kaili Shi, Dingning Liu, Sicheng Yang, Xiaoling Zhang, Mark Kon, the Alzheimer's Disease Neuroimaging Initiative

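AI summary This paper proposes a Multilevel Orthogonal Subspace (MOS) Karhunen-Loeve feature theory based on stochastic tensor spaces for constructing robust machine learning features. Training data are treated as instances of a random field in a Bochner space, and the Karhunen-Loeve expansion together with a hierarchical expansion is used to build multilevel orthogonal subspaces that detect anomalous signal components, yielding more discriminative features for classification. Experiments show that on an Alzheimer's disease blood plasma dataset the method achieves markedly higher classification accuracy than mainstream machine learning methods such as gradient boosting and random forests.

Details
Abstract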

In this paper we develop a Multilevel Orthogonal Subspace (MOS) Karhunen-Loeve feature theory based on stochastic tensor spaces, for the construction of robust machine learning features. Training data are treated as instances of a random field within a relevant Bochner space. Our key observation is that separate machine learning classes can reside predominantly in mostly distinct subspaces. Using the Karhunen-Loeve expansion and a hierarchical expansion of the first (nominal) class, a MOS is constructed to detect anomalous signal components, treating the second class as an outlier of the first. The projection coefficients of the input data into these subspaces are then used to train a Machine Learning (ML) classifier. These coefficients become new features from which much clearer separation surfaces can arise for the underlying classes. Tests on the blood plasma dataset (Alzheimer's Disease Neuroimaging Initiative) show dramatic increases in accuracy. This is in contrast to popular ML methods such as Gradient Boosting, RUS Boost, Random Forest and Neural Networks. We show that with a non-invasive blood test, high-accuracy results can be obtained for predicting AD stages such as cognitively normal, mild cognitive impairment and dementia.

2605.11829 2026-05-13 physics.optics cs.LG eess.SP physics.med-ph

Bin Latent Transformer (BiLT): A shift-invariant autoencoder for calibration-free spectral unmixing of turbid media

Martin Hohmann

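AI summary This study proposes an autoencoder called the Bin Latent Transformer (BiLT) for calibration-free spectral unmixing of turbid media, so that the optical properties of their constituents can be recovered accurately. Its core idea is to replace the conventional fully connected network with a cross-attention-based encoder, making the model insensitive to the position of spectral wavelengths and therefore more robust to spectrometer calibration drift or hardware exchange. Experiments show that the model performs very well on a liquid phantom benchmark and generalizes to different instrument configurations, making it of considerable practical value.

Details
Abstract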

The accurate recovery of constituent-level optical properties from integrating sphere measurements is a central analytical challenge in pharmaceutical analysis, food science, and biomedical diagnostics. Neural network autoencoders can extract spectrally resolved absorption and scattering coefficients for each constituent without prior knowledge, but their fully connected encoders bind learned features to absolute wavelength indices, causing accuracy loss under spectrometer calibration drift or hardware exchange. This work introduces the Bin Latent Transformer (BiLT)-Autoencoder, in which the dense encoder is replaced by a cross-attention scanner: 16 learnable probe vectors query a convolutional feature map, aggregating morphological spectral information independently of absolute wavelength position. A physics-constrained linear decoder with enforced absorption/scattering separation and a three-phase curriculum augmentation strategy complete the architecture. On a liquid phantom benchmark (intralipid and two ink absorbers; 496 samples), the model achieves $R^2 = 0.979$ and $0.975$ for $\mu_a(\lambda)$ and $\mu_s'(\lambda)$, respectively, on held-out test spectra, maintaining $R^2 > 0.90$ for $\mu_a$ and $R^2 \approx 0.99$ for $\mu_s'$ across the full tested shift range of $\pm 10$ spectral bands. The model generalises to a simulated spectrometer with a broader instrument line shape (${\approx}24$ nm FWHM) without retraining, retaining $R^2 \approx 0.96$ and $0.974$ for the two channels. Attention map analysis reveals a physically interpretable two-component probe strategy: sparse anchor probes at absorption-edge wavelengths combined with a diffuse, SNR-driven ensemble at the high-transmittance long-wavelength region, which recruits additional probes dynamically under noise to provide implicit spectral averaging.

2605.11784 2026-05-13 cs.CE cs.AI cs.LG

Crash Assessment via Mesh-Based Graph Neural Networks and Physics-Aware Attention

Gabriel Curtosi, Carlos Manuel Ruiz Ruiz, Fabiola Cavaliere, Xabier Larráyoz Izcara

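AI summary This study proposes hybrid surrogate models based on mesh graph neural networks and physics-aware attention for efficiently predicting structural deformation fields in full-vehicle lateral pole impacts. By combining local mesh message passing, geometry-aware global attention, and sparse contact-aware correction, the models capture both short-range structural interactions and long-range deformation patterns while remaining computationally efficient. Experiments show a temporal mean root-mean-square error of 3.20 mm on the test set, and the approach compares favorably with the baseline architectures in accuracy, structural consistency, and physical interpretability, providing a fast and reliable prediction tool for industrial crash-engineering analysis.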

Comments 40 pages, 15 figures, 7 tables

Details
Abstract

Full-vehicle crash simulations are computationally expensive, limiting their use in iterative design exploration. This work investigates learned hybrid surrogate models (MeshTransolver, MeshGeoTransolver, and MeshGeoFLARE) for predicting time-resolved structural deformation fields in an industrial lateral pole-impact benchmark. We evaluate whether neural surrogates can reproduce full-field crash kinematics with sufficient accuracy, spatial regularity, and structural plausibility for engineering interpretation. The proposed architectures combine local mesh message passing, geometry-aware global attention, and sparse contact-aware correction for autoregressive crash rollout. We compare mesh-based graph neural networks, attention-based geometric models, and hybrid architectures under a common training and hyperparameter configuration. The hybrid models capture both short-range structural interactions and long-range deformation patterns, while a sparse contact-aware variant assesses the effect of dynamic proximity interactions during rollout. On a 25-sample full-vehicle test set, the best hybrid model achieves a temporal mean root-mean-square error of 3.20 mm. While geometry-aware attention baselines are quantitatively competitive, qualitative side-view inspection shows they can introduce local spatial noise and deformation irregularities that complicate structural interpretation. In contrast, hybrid mesh-attention models provide the best balance between scalar accuracy, survival-space consistency, and physically interpretable displacement fields. These results suggest that crash surrogate assessment should combine global error metrics with downstream safety-relevant quantities and qualitative field inspection. The proposed methodology enables fast full-field predictions while preserving essential structural information for industrial crash-engineering analysis.
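
As a rough illustration of the headline metric, the sketch below computes a temporal mean RMSE over a displacement rollout. The abstract does not define the metric precisely; this version assumes a per-frame RMSE over the Euclidean node errors, averaged over time steps. The function names and the toy rollout are hypothetical.

```python
import math


def frame_rmse(pred, true):
    """RMSE over nodes for one time step; each node is an (x, y, z) tuple in mm."""
    squared = [
        sum((p - t) ** 2 for p, t in zip(p_node, t_node))
        for p_node, t_node in zip(pred, true)
    ]
    return math.sqrt(sum(squared) / len(squared))


def temporal_mean_rmse(pred_seq, true_seq):
    """Average of the per-frame RMSE over all time steps (assumed definition)."""
    errors = [frame_rmse(p, t) for p, t in zip(pred_seq, true_seq)]
    return sum(errors) / len(errors)


# Toy rollout: 2 time steps, 2 nodes. Only one node in the second frame is off.
true_seq = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
pred_seq = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],
]

score = temporal_mean_rmse(pred_seq, true_seq)
print(f"temporal mean RMSE: {score:.4f} mm")
```

A single scalar like this hides where the error sits, which is consistent with the abstract's point that it should be paired with safety-relevant quantities and qualitative field inspection.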

2605.11770 2026-05-13 cs.CR cs.AI cs.SY eess.SY

Behavioral Integrity Verification for AI Agent Skills

Yuhao Wu, Tung-Ling Li, Hongliang Liu

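AI summary This study proposes behavioral integrity verification (BIV), a method for checking whether the actual capabilities of an AI agent skill match its declared ones, filling a gap in existing safety mechanisms, which leave the skill itself unverified. The method pairs deterministic code analysis with LLM-assisted capability extraction over a unified taxonomy and supports downstream tasks such as deviation classification, root-cause analysis, and malicious-skill detection. Experiments on a large-scale skill dataset reveal a widespread gap between skill descriptions and implementations, and BIV outperforms existing methods on malicious-skill detection.

Details
Abstract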

Agent skills extend LLM agents with privileged third-party capabilities such as filesystem access, credentials, network calls, and shell execution. Existing safety work catches malicious prompts and risky runtime actions, but the skill artifact itself goes unverified. We formalize this as the behavioral integrity verification (BIV) problem: a typed set comparison between declared and actual capabilities over a shared taxonomy that bridges code, instructions, and metadata. The BIV framework instantiates this comparison by pairing deterministic code analysis with LLM-assisted capability extraction. The resulting structured evidence supports three downstream analyses: deviation taxonomy, root-cause classification, and malicious-skill detection. On 49,943 skills from the OpenClaw registry, the deviation taxonomy reveals a pervasive description-implementation gap: 80.0% of skills deviate from declared behavior, with four novel compound-threat categories surfaced. Root-cause classification finds that deviations are mostly oversight, not malice: 81.1% trace to developer oversight and 18.9% to adversarial intent, with 5.0% of skills carrying predicted multi-stage attack chains. On a 906-skill malicious-skill detection benchmark, BIV reaches an F1 of 0.946, outperforming state-of-the-art rule-based and single-pass LLM baselines. These results demonstrate behavioral integrity auditing for agent skills at scale.
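
The "typed set comparison" at the core of BIV can be sketched in a few lines. This is only an illustration of the idea, not the authors' implementation: the `Capability` type, the category names, and the deviation labels are hypothetical, and the extraction step (code analysis plus LLM assistance) is assumed to have already produced the two sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """One entry of a shared taxonomy: a capability kind plus its target."""
    kind: str    # e.g. "filesystem", "network", "shell", "credentials"
    target: str  # e.g. a path, a host, or a command (hypothetical encoding)


def compare(declared, actual):
    """Split capabilities into matched, undeclared, and unused sets."""
    return {
        "matched": declared & actual,
        "undeclared": actual - declared,  # done by the code but never declared
        "unused": declared - actual,      # declared but never exercised
    }


# Hypothetical skill: it declares a file read, but the code also calls out.
declared = {Capability("filesystem", "read:./notes")}
actual = {
    Capability("filesystem", "read:./notes"),
    Capability("network", "POST:example.invalid"),
}

report = compare(declared, actual)
deviates = bool(report["undeclared"] or report["unused"])
print(deviates, sorted(c.kind for c in report["undeclared"]))
```

Frozen dataclasses are hashable, so plain set operations give the comparison directly; a real taxonomy would need normalization so that equivalent capabilities compare equal.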

2605.11759 2026-05-13 cs.CE cs.LG cs.NA math.NA

A nonlinear extension of parametric model embedding for dimensionality reduction in parametric shape design

Andrea Serani, Giorgio Palma, Matteo Diez

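AI summary In simulation-based shape design, high-dimensional parameterizations limit the efficiency of optimization and design-space exploration. This paper proposes a nonlinear extension of parametric model embedding (NLPME) that keeps geometry-driven latent variables and parameter-mediated reconstruction but replaces the conventional linear subspace with a nonlinear latent space, handling nonlinear geometric variability more efficiently. Experiments show that NLPME, while preserving an explicit backmapping, reaches the same reconstruction accuracy as the linear method with fewer latent variables, giving higher compression efficiency and good engineering applicability.

Details
Abstract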

Dimensionality reduction is essential in simulation-based shape design, where high-dimensional parameterizations hinder optimization, surrogate modeling, and systematic design-space exploration. Parametric Model Embedding (PME) addresses this issue by constructing reduced variables from geometric information while preserving an explicit backmapping to the original design parameters. However, PME is intrinsically linear and may become inefficient when the sampled design space is governed by nonlinear geometric variability. This paper introduces a nonlinear extension of PME, denoted NLPME. The proposed framework preserves the defining principle of PME -- geometry-driven latent variables and parameter-mediated reconstruction -- while replacing the linear reduced subspace with a nonlinear latent representation. Geometry is not reconstructed directly from the latent variables; instead, the latent representation is decoded into admissible design parameters, and the corresponding geometry is recovered through a forward parametric map. The method is assessed on a bio-inspired autonomous underwater glider with a 32-dimensional parametric shape description and a CAD-based geometry-generation process. NLPME reaches a 5\% reconstruction-error threshold with \(N=5\) latent variables, compared with \(N=8\) for linear PME, and a 1\% threshold with \(N=9\), compared with \(N=15\) for PME. Comparison with a deep autoencoder shows that most of the nonlinear compression gain can be retained while preserving an explicit backmapping to the original design variables. The results establish NLPME as a compact, admissible, and engineering-compatible nonlinear reduced representation for parametric shape design spaces.

2605.11758 2026-05-13 eess.IV cs.CV

DiffSegLung: Diffusion Radiomic Distillation for Unsupervised Lung Pathology Segmentation

Rezkellah Noureddine Khiati, Pierre-Yves Brillet, Catalin Fetita

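AI summary This paper proposes DiffSegLung, an unsupervised lung pathology segmentation framework that addresses the lack of annotated CT data and the failure of existing diffusion models to exploit the Hounsfield Unit (HU) signal. It introduces Diffusion Radiomic Distillation, in which handcrafted radiomic features act as a physics-grounded "teacher" that guides the bottleneck features of a 3D diffusion U-Net, so that pathology-discriminative structure is learned without annotations. Experiments show that the method significantly improves segmentation performance and generation quality on several heterogeneous CT datasets.

Details
Abstract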

Unsupervised segmentation of pulmonary pathologies in CT remains an open challenge due to the absence of annotated multi-pathology cohorts and the failure of existing diffusion-based methods to exploit the quantitative Hounsfield Unit (HU) signal that physically distinguishes tissue classes. To address this, we propose DiffSegLung, a framework that introduces Diffusion Radiomic Distillation, in which handcrafted radiomic descriptors serve as a physics-grounded teacher to shape the bottleneck of a 3D diffusion U-Net via a contrastive objective, transferring pathology-discriminative structure into the learned representation without any annotations. At inference, the teacher is discarded and multi-timestep bottleneck features are clustered by a Gaussian Mixture Model with HU-guided label assignment, followed by Sobel Diffusion Fusion for boundary refinement. Evaluated on 190 expert-annotated axial slices drawn from four heterogeneous CT cohorts, DiffSegLung improves segmentation across all four pathology classes over unsupervised baselines and improves generation fidelity over prior CT diffusion models.

2605.11720 2026-05-13 cs.SE cs.AI cs.MA

A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar

Davide Taibi, Henry Muccini, Karthik Vaidhyanathan, Marcos Kalinowski, Michele Albano, Antonio Pedro Santos Alves, Renato Cerqueira, Mateus Devino, Matteo Esposito, Rodrigo Falcão, Vinicius Henning, Foutse Khomh, Valentina Lenarduzzi, Qinghua Lu, Matías Martínez, Henrique Mello, Daniel Mendez, Lucas Romao

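AI summary With the rise of agentic AI, software engineering faces two intertwined changes: agents are increasingly used to support software engineering tasks, and agentic AI systems are themselves complex systems that call for rethinking established software engineering practices. Based on the outcomes of the A2SE seminar held in Rio de Janeiro, this paper presents a community-driven research agenda that identifies six thematic areas and sets short-term and long-term research directions for each, giving the software engineering community a structured foundation for coordinating research efforts.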

Comments 6 pages, 1 table, A2SE meeting, https://sites.google.com/view/a2se2026/home

Details
Abstract

The rise of agentic AI is reshaping software engineering in two intertwined directions: agents are increasingly applied to support software engineering tasks, and agentic AI systems themselves are complex systems that require rethinking established software engineering practices. To chart a coherent research agenda covering the two directions, we organized the A2SE seminar in Rio de Janeiro, bringing together 18 experts from academia and industry. Through structured presentations, collaborative topic clustering, and focused group discussions, participants identified six thematic areas: Governance, Software Engineering for Agents, Agents for Software Architecture, Quality and Evaluation, Sustainability, and Code, and they prioritized short-term and long-term research directions for each. This paper presents the resulting community-driven, opinionated research agenda, offering the SE community a structured foundation for coordinating efforts at this critical juncture.

2605.11718 2026-05-13 q-bio.NC cs.AI cs.NE

Self-organized MT Direction Maps Emerge from Spatiotemporal Contrastive Optimization

Zhaotian Gu, Molan Li, Jie Su, Chang Liu, Tianyi Qian, Dahui Wang

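AI summary This study investigates the computational origins of direction-selective maps in the dorsal stream of the primate visual cortex, such as those in area MT. Using a spatiotemporal Topographic Deep Artificial Neural Network (TDANN) that combines self-supervised contrastive learning with a biologically inspired spatial loss, the model spontaneously develops brain-like motion direction maps and topological pinwheel structures when trained on natural videos. The study shows that MT direction selectivity arises from an optimization trade-off between task-driven discriminative pressure and spatial regularization, that the representations quantitatively match macaque MT physiological baselines, and that this offers new insight into unifying the computational mechanisms of the dorsal and ventral visual streams.

Details
Abstract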

The spatial and functional organization of the primate visual cortex is a fundamental problem in neuroscience. While recent computational frameworks like the Topographic Deep Artificial Neural Network (TDANN) have successfully modeled spatial organization in the ventral stream, the computational origins of the dorsal stream's distinct topographies, such as direction-selective maps in the middle temporal (MT) area, remain largely unresolved. In this work, we present a spatiotemporal TDANN to investigate whether MT topography is governed by the same universal principles. By training a 3D ResNet on naturalistic videos via a Momentum Contrast (MoCo) self-supervised paradigm alongside a biologically inspired spatial loss, we demonstrate the spontaneous emergence of brain-like direction maps and topological pinwheel structures. Crucially, we reveal that MT tuning properties, characterized by strong direction selectivity paired with a residual axial component, arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization. The model's representations quantitatively match in vivo macaque MT physiological baselines, including direction selectivity index, circular variance, and pinwheel density. These findings unify the computational origins of the ventral and dorsal streams, establishing a general mechanism for cortical self-organization.
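
Two of the physiological measures named above have widely used textbook definitions, sketched here for a single unit's tuning curve. The abstract does not state which variants the authors use, so these formulas are an assumption: the direction selectivity index as (R_pref - R_null) / (R_pref + R_null), and circular variance as one minus the normalized length of the response-weighted mean direction vector. The sample tuning curve is invented.

```python
import cmath
import math


def direction_selectivity_index(responses, directions_deg):
    """DSI from the preferred direction and the one 180 degrees opposite."""
    pref_idx = max(range(len(responses)), key=lambda i: responses[i])
    null_dir = (directions_deg[pref_idx] + 180) % 360
    null_idx = directions_deg.index(null_dir)
    r_pref, r_null = responses[pref_idx], responses[null_idx]
    return (r_pref - r_null) / (r_pref + r_null)


def circular_variance(responses, directions_deg):
    """1 - |sum_k R_k exp(i theta_k)| / sum_k R_k, computed in direction space."""
    vector = sum(
        r * cmath.exp(1j * math.radians(d))
        for r, d in zip(responses, directions_deg)
    )
    return 1 - abs(vector) / sum(responses)


directions = [0, 45, 90, 135, 180, 225, 270, 315]

# Invented unit: strong response at 90 degrees, weak residual at 270 degrees,
# loosely mimicking direction selectivity with a residual axial component.
tuned = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 2.0, 0.0]
flat = [1.0] * 8

print(direction_selectivity_index(tuned, directions))
print(circular_variance(tuned, directions))
print(circular_variance(flat, directions))
```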

2605.11671 2026-05-13 cs.CR cs.AI cs.SE

Cochise: A Reference Harness for Autonomous Penetration Testing

Andreas Happe, Jürgen Cito

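AI summary Cochise is a lightweight Python framework for autonomous penetration-testing experiments that aims to provide a standardized platform for evaluating LLM-driven penetration-testing agents. It connects to a Linux host over SSH, supports controlled target environments, and uses a Planner-Executor architecture that separates long-term state from execution logic, improving experimental control and reproducibility. The authors also provide replay and analysis tools that let researchers visualize runs and evaluate performance, supporting deeper study of agent behavior and efficiency.

Details
Abstract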

Recent work on LLM-driven autonomous penetration testing reports promising results, but existing systems often combine many architectural, prompting, and tool-integration choices, making it difficult to tell what is gained over a simple agent scaffold. We present cochise, a 597 LOC Python reference harness for autonomous penetration-testing experiments. Cochise connects an LLM-driven agent to a Linux execution host over SSH and supports controlled target environments reachable from that jump host. The prototype implements a separated Planner--Executor architecture in which long-term state is maintained outside the LLM context, while a ReAct-style executor issues commands over SSH and self-corrects based on command outputs. The scenario prompt can be adapted to different target environments. To demonstrate the efficacy of our minimal harness, we evaluate it against a live third-party testbed called Game of Active Directory (GOAD). Alongside the harness, we release replay and analysis tools: (i) cochise-replay for offline visualization of captured runs, (ii) cochise-analyze-alogs and cochise-analyze-graphs for cost, token, duration, and compromise analysis, and (iii) a corpus of JSON trajectory logs from GOAD runs, allowing researchers to study agent behavior without provisioning the 48--64 GB RAM / 190 GB storage testbed themselves. Cochise is intended not as a state-of-the-art pen-testing agent, but as reusable experimental infrastructure for comparing models, agent architectures, and penetration-testing traces.

2605.11653 2026-05-13 cs.CR cs.AI

Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark

Thibaud Gloaguen, Robin Staab, Mark Vero, Martin Vechev

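AI summary As LLM watermarking moves into commercial use, practical applications increasingly need watermarks that carry more complex multibit payloads such as user IDs or timestamps. This paper proposes a new multibit watermarking method that uses binomial encoding to embed every bit of the payload at every token position, combined with a stateful encoder that dynamically adjusts encoding pressure. Experiments show that the method outperforms 8 baselines in message accuracy and robustness, with a larger advantage for large payloads and low-distortion regimes, and the paper introduces per-bit confidence scoring as a more practically relevant evaluation metric.

Details
Abstract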

With LLM watermarking already being deployed commercially, practical applications increasingly require multibit watermarks that encode more complex payloads, such as user IDs or timestamps, into the generated text. In this work, we propose a fundamentally new approach for multibit watermarking: introducing binomial encoding to directly encode every bit of the payload at every token position. We complement our approach with a stateful encoder that during generation dynamically redirects encoding pressure toward underencoded bits. Our evaluation against 8 baselines on up to 64-bit payloads shows that our scheme achieves superior message accuracy and robustness, with the gap to baseline methods widening in more relevant settings (i.e., large payloads and low-distortion regimes). At the same time, we challenge prior works' evaluation metrics, highlighting their lack of practical insights, and introduce per-bit confidence scoring as a practically relevant metric for evaluating multibit LLM watermarks.

2605.11652 2026-05-13 stat.ML cs.LG math.ST stat.TH

Posterior Contraction Rates for Sparse Kolmogorov-Arnold Networks in Anisotropic Besov Spaces

Jeunghun Oh, Kyeongwon Lee, Jaeyong Lee, Lizhen Lin

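AI summary This paper studies posterior contraction rates for sparse Bayesian Kolmogorov-Arnold networks (KANs) over anisotropic Besov spaces, giving KANs a statistical foundation from a Bayesian perspective. With spike-and-slab-type sparsity priors, sparse Bayesian KANs are shown to attain near-minimax posterior contraction rates that depend on the intrinsic anisotropic smoothness of the target function. With a hyperprior on the model-size parameter, the posterior also adapts to unknown anisotropic smoothness while keeping the corresponding near-minimax rate. Compared with sparse MLP-based models, the KAN depth can be kept fixed, with complexity controlled through network width, spline-grid range, and parameter sparsity; for compositional structures, the rates effectively avoid the curse of dimensionality.

Details
Abstract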

We study posterior contraction rates for sparse Bayesian Kolmogorov-Arnold networks (KANs) over anisotropic Besov spaces, providing a statistical foundation for KANs from a Bayesian point of view. We show that sparse Bayesian KANs equipped with spike-and-slab-type sparsity priors attain near-minimax posterior contraction rates. In particular, the contraction rate depends on the intrinsic anisotropic smoothness of the underlying function. Moreover, by placing a hyperprior on a single model-size parameter, the resulting posterior adapts to unknown anisotropic smoothness and still achieves the corresponding near-minimax rate. A distinctive feature of our results, compared with those for standard sparse MLP-based models, is that the KAN depth can be kept fixed: owing to the flexibility of learnable spline edge functions, the required approximation complexity is controlled through the network width, spline-grid range and size, and parameter sparsity. Our analysis develops theoretical tools tailored to sparse spline-edge architectures, including approximation and complexity bounds for Bayesian KANs. We then extend to compositional Besov spaces and show that the contraction rates depend on layerwise smoothness and effective dimension of the underlying compositional structure, thereby effectively avoiding the curse of dimensionality. Together, the developed tools and findings advance the theoretical understanding of Bayesian neural networks and provide rigorous statistical foundations for KANs.
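
For orientation, the dependence on "intrinsic anisotropic smoothness" usually takes the following form in nonparametric regression over anisotropic Besov classes. The notation here ($s_i$, $\tilde{s}$, $\varepsilon_n$) is an assumption and is not taken from the abstract, and the paper's exact statement, including its logarithmic factors, may differ.

```latex
% Assumed notation: s = (s_1, ..., s_d) are the directional smoothness levels,
% n is the sample size, and \varepsilon_n is the posterior contraction rate.
\[
  \tilde{s} \;=\; \Bigl( \sum_{i=1}^{d} \frac{1}{s_i} \Bigr)^{-1},
  \qquad
  \varepsilon_n \;\asymp\; n^{-\tilde{s}/(2\tilde{s}+1)}
  \quad \text{(up to logarithmic factors)}.
\]
% Isotropic special case: s_1 = \dots = s_d = s gives \tilde{s} = s/d and
% \varepsilon_n \asymp n^{-s/(2s+d)}, the familiar rate with its explicit
% dependence on the dimension d.
```

Under this form, a function that is very smooth in most directions has $\tilde{s}$ governed mainly by its few rough directions, so the rate degrades far less than the nominal dimension $d$ would suggest.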