arXivDaily: daily arXiv digest, updated Monday through Friday
2605.13252 2026-05-14 stat.ML cs.LG math.ST stat.TH

The Sample Complexity of Multiple Change Point Identification under Bandit Feedback

Maximilian Graf, Victor Thuot

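Affiliations: Institut für Mathematik, Universität Potsdam; INRAE, Mistea, Institut Agro, Univ Montpellier

AI summary: This paper studies multiple change point localization under bandit feedback, aiming to identify a prescribed number of change points of a function with as few samples as possible while meeting a given precision and confidence level. The authors propose an adaptive algorithm that first detects intervals likely to contain change points and then pins down their locations, and they give upper and lower bounds on its sample complexity. The sample complexity turns out to be governed jointly by the jump magnitudes and the relative positions of the change points, not by the jump magnitudes alone.

Abstract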

We study multiple change point localization under bandit feedback. An unknown piecewise-constant function on a compact interval can be queried sequentially at adaptively chosen inputs, and each query returns a noisy evaluation of the function. The goal is to identify a prescribed number of discontinuities, known as change points, within a target precision $η$ and confidence level $1-δ$, while using as few samples as possible. We propose an adaptive algorithm that first detects intervals likely to contain change points and then refines their locations to precision $η$. We establish non-asymptotic upper bounds on its sample budget, together with corresponding lower bounds. Prior work shows that jump magnitudes alone determine the asymptotic sample complexity as $δ\to 0$. We reveal that this picture is incomplete beyond this regime. We demonstrate, both empirically and theoretically, that for general $δ$ and $η$, the complexity is jointly governed by the jumps and the relative positions of the change points.
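To make the refinement phase concrete, here is a minimal sketch of noisy bisection for a single change point. It is not the authors' algorithm: the oracle `noisy_query`, the jump location 0.37, the noise level, and the fixed per-query budget `samples` are all assumptions for illustration, whereas the paper's method adapts its budget and handles several change points.

```python
import random


def noisy_query(x, rng, change_point=0.37, noise_std=0.5):
    """Hypothetical oracle: a step function with one jump, plus Gaussian noise."""
    return (1.0 if x >= change_point else 0.0) + rng.gauss(0.0, noise_std)


def mean_at(x, rng, samples):
    """Average `samples` noisy evaluations at the input x."""
    return sum(noisy_query(x, rng) for _ in range(samples)) / samples


def refine_change_point(lo, hi, eta, rng, samples=200):
    """Noisy bisection: shrink [lo, hi] around the jump until its width is at most eta."""
    left_level = mean_at(lo, rng, samples)
    right_level = mean_at(hi, rng, samples)
    threshold = (left_level + right_level) / 2.0
    used = 2 * samples
    while hi - lo > eta:
        mid = (lo + hi) / 2.0
        # The midpoint sits on the right of the jump if its mean falls on the
        # same side of the threshold as the right endpoint's level.
        if (mean_at(mid, rng, samples) > threshold) == (right_level > threshold):
            hi = mid
        else:
            lo = mid
        used += samples
    return (lo + hi) / 2.0, used


estimate, budget = refine_change_point(0.0, 1.0, eta=0.01, rng=random.Random(0))
```

With these settings the search halves the interval seven times, so `budget` is 1800 queries, and `estimate` lands within `eta` of the assumed jump. A fixed per-query budget is the simplest choice; it wastes samples on queries far from the jump, which is the kind of inefficiency an adaptive allocation is meant to avoid.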

2605.13248 2026-05-14 eess.SP cs.AI

Compact Latent Manifold Translation: A Parameter-Efficient Foundation Model for Cross-Modal and Cross-Frequency Physiological Signal Synthesis

Bo Cui, Xiaowen Song, Yaowen Zhang, Shunzhe Zhang, B. J. F. van Beijnum, Monique Tabak, Ying Wang

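Affiliations: Department of Biomedical Signals and Systems, University of Twente

AI summary: Physiological signals such as electrocardiograms (ECG) and photoplethysmograms (PPG) differ in modality and sampling frequency because recording devices are heterogeneous. To bridge these gaps, the paper proposes Compact Latent Manifold Translation (CLMT), a parameter-efficient unified framework. Its two-stage discrete translation scheme combines a universal tokenizer based on hierarchical residual vector quantization (RVQ) with a context-prompted latent translator that incorporates physiological priors, decoupling heterogeneous signals and enabling high-fidelity cross-modal and cross-frequency synthesis. With only 0.09B parameters, the model significantly outperforms much larger baselines on cross-modal synthesis and high-frequency super-resolution.

Abstract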

The analysis of physiological time series, such as electrocardiograms (ECG) and photoplethysmograms (PPG), is persistently hindered by modality and frequency gaps stemming from heterogeneous recording devices. Existing foundation models typically rely on continuous latent spaces, which frequently suffer from severe modality entanglement, lack high-fidelity cross-frequency generative capacity, and impose high computational costs that prohibit edge-device deployment. In this paper, we propose Compact Latent Manifold Translation (CLMT), a highly parameter-efficient (0.09B) unified framework that bridges these gaps through a novel two-stage discrete translation paradigm. First, we introduce a Universal Tokenizer utilizing Hierarchical Residual Vector Quantization (RVQ) to decouple heterogeneous signals into isolated, well-structured discrete latent manifolds, effectively preventing inter-modality interference. Second, a Context-Prompted Latent Translator maps these discrete tokens across modalities by integrating static physiological priors, reframing complex signal synthesis as a pure latent sequence translation task. Extensive evaluations demonstrate that our 0.09B model significantly outperforms massive baselines. In cross-modal PPG-to-ECG synthesis, it resolves temporal phase drift and dramatically improves the clinical R-peak detection F1-score from 0.37 (baseline) to 0.83. Furthermore, in extreme cross-frequency super-resolution (25Hz to 100Hz), it successfully recovers high-frequency diagnostic landmarks, achieving an unprecedented Pearson correlation of 0.9956. By learning a universal discrete language for biological signals with a fraction of the computational footprint, our approach sets a new trajectory for edge-deployable, multi-modal medical foundation models.

2605.13242 2026-05-14 cs.GT cs.LG

When and Why is Optimistic Multiplicative Weights Slow? The Geometry of Energy Dissipation

John Lazarsfeld, Anas Barakat, Georgios Piliouras, Antonios Varvitsiotis, Andre Wibisono

Affiliations: John Lazarsfeld (1), Georgios Piliouras (2), Antonios Varvitsiotis (3), Andre Wibisono (5)

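AI summary: This paper studies the convergence of the Optimistic Multiplicative Weights Update algorithm (OMWU) in two-player zero-sum games. The authors develop a new analysis framework that views the algorithm's dual iterates as optimistic skew-gradient descent on an energy function; it explains geometrically why OMWU can converge slowly and quantifies the bottlenecks that arise when the primal iterates approach the simplex boundary. They also give a new linear last-iterate convergence rate for games with a unique interior Nash equilibrium, with a sharper dependence on game-specific constants, and show that OMWU's convergence rates separate across different measures of convergence.

Abstract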

This paper studies the convergence of the Optimistic Multiplicative Weights Update algorithm (OMWU) in two-player zero-sum games. Recent works have identified instances on which the last iterate of OMWU can converge arbitrarily slowly, but understanding when and why this slow convergence occurs has remained open. In this work, we develop a new analysis framework that gives sharp, quantitative explanations for this behavior. Our analysis is based on viewing the algorithm's dual iterates as an optimistic skew-gradient descent with respect to an energy function. We prove over the dual iterates that energy is dissipative, and by establishing tight bounds on the magnitude of dissipation, our analysis quantifies the geometric bottlenecks that arise when the corresponding primal iterates are close to the simplex boundary. This further translates into a new linear last-iterate convergence rate in KL divergence on games with a unique and interior Nash equilibrium. Compared to prior work, this new rate contains a much sharper dependence on game-specific constants, and we prove this dependence is optimal. Moreover, these geometric insights further translate into new separations on uniform convergence rates for OMWU. On the one hand, we prove constant lower bounds on the uniform best-iterate convergence rate in KL divergence and total variation distance from Nash. On the other hand, we establish for the $2\times 2$ setting a new ${\widetilde O}(T^{-1/2})$ best-iterate rate in duality gap, improving substantially over prior work. Together, this shows in general that uniform convergence rate guarantees do not transfer across different measures of distance to Nash.
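For readers unfamiliar with the update, the following sketch runs a standard form of OMWU on matching pennies, whose unique Nash equilibrium is uniform play by both players. The step size, starting points, round count, and helper names are illustrative assumptions, and the code does not reproduce the paper's energy-based analysis.

```python
import math


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(w):
    total = sum(w)
    return [w_i / total for w_i in w]


def omwu(payoff, x, y, step=0.1, rounds=1000):
    """Run OMWU where x minimizes x^T A y and y maximizes it; return the last iterates."""
    payoff_t = transpose(payoff)
    prev_gx, prev_gy = matvec(payoff, y), matvec(payoff_t, x)
    for _ in range(rounds):
        # Both gradients are evaluated at the current pair before either player moves.
        gx, gy = matvec(payoff, y), matvec(payoff_t, x)
        # Optimistic step: use 2 * (current gradient) - (previous gradient).
        x = normalize([xi * math.exp(-step * (2 * g - pg))
                       for xi, g, pg in zip(x, gx, prev_gx)])
        y = normalize([yi * math.exp(step * (2 * g - pg))
                       for yi, g, pg in zip(y, gy, prev_gy)])
        prev_gx, prev_gy = gx, gy
    return x, y


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
x_final, y_final = omwu(matching_pennies, [0.9, 0.1], [0.2, 0.8])
```

Calling `kl_divergence([0.5, 0.5], x_final)` after different values of `rounds` gives a rough view of last-iterate progress toward the equilibrium; starting points closer to the simplex boundary are the regime the abstract singles out as slow.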

2605.13214 2026-05-14 cs.CR cs.LG

Backdoor Channels Hidden in Latent Space: Cryptographic Undetectability in Modern Neural Networks

Marte Eggen, Eirik Reiestad, Kristian Gjøsteen, Inga Strümke

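Affiliations: Department of Computer Science, Norwegian University of Science and Technology; Department of Mathematical Sciences, Norwegian University of Science and Technology

AI summary: This paper studies backdoor channels hidden in the latent space of modern neural networks and proposes a backdoor attack aligned with the cryptographic notion of undetectability. By identifying backdoor channels as learned latent directions, it reduces the undetectability question to a hypothesis test between distributions over model parameters, which the authors conjecture to be intractable in practice. On mainstream architectures such as ResNet and Vision Transformer, the attack achieves high success rates with negligible loss of clean accuracy and resists a range of post-training defences, showing that backdoors can exist as properties inherent to the geometry of a network's latent representations, without exotic architectures or artificial constructions.

Abstract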

Recent cryptographic results establish that neural networks can be backdoored such that no efficient algorithm can distinguish them from a clean model. These guarantees, however, have been confined to stylised architectures of limited practical relevance, leaving open whether comparable undetectability extends to modern, end-to-end trained networks. We construct such an attack mechanism for state-of-the-art architectures, closely aligned to the cryptographic notion of undetectability, by identifying backdoor channels as learned latent directions, and show that the question of undetectability reduces to a hypothesis test between two unknown distributions over model parameters, which we conjecture to be intractable in practice. The consequence of this reframing is significant: if exploitable channels within a network's latent space are statistically indistinguishable from naturally learned directions, an attacker need not introduce foreign structure but can instead exploit the geometry the network already possesses. Demonstrating the approach on ResNet and Vision Transformer architectures trained on standard image classification datasets, the attack achieves consistently high success rates with negligible clean accuracy degradation and resists a comprehensive suite of post-training defences, none of which neutralise the backdoor without rendering the model unusable. Our results establish that cryptographic backdoors need not be artefacts requiring exotic architectures or artificial constructions, but can be identified as latent properties inherent to the geometry of learned representations.

2605.13188 2026-05-14 stat.ML cs.CL cs.LG stat.ME

LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information

Stef van Buuren

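Affiliations: TNO - Netherlands Organization for Applied Scientific Research; Dept. of Methodology and Statistics, University of Utrecht

AI summary: This paper examines the uncertainty of answers that large language models (LLMs) give under incomplete context. It argues that LLMs should be viewed as implicit imputers of missing values and borrows a criterion from multiple imputation theory: uncertainty should rise as the amount of missing information grows. In experiments on SQuAD, sampling-based response entropy tracks the degree of missing context more faithfully, whereas confidence fails to reflect it. The paper also proposes a black-box diagnostic that estimates the proportion of model uncertainty resolved at each context level, offering a new way to evaluate LLMs under incomplete information.

Comments: 9 pages, 3 figures, 2 tables, NeurIPS 2026 position paper

Abstract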

Large language models (LLMs) are increasingly deployed in settings where the available context is incomplete or degraded. We argue that an LLM generating answers under incomplete context can be viewed as an implicit imputer, and evaluated against a criterion from the multiple imputation (MI) literature: uncertainty should scale with the amount of missing information. We assess this criterion on SQuAD, using a controlled framework in which context availability is varied across five levels. We evaluate two answer-level uncertainty measures that can be estimated from repeated sampling: sampling-based confidence (empirical mode frequency) and response entropy. Confidence fails to reflect increasing missingness: it remains high even as accuracy collapses. Entropy, by contrast, increases with context removal, consistent with the MI analogy, and explains substantially more variance in accuracy than confidence across all evidence levels (quadratic $R^2$ gap up to 0.057). We further introduce a black-box diagnostic $ρ_R(α)$ that estimates the proportion of baseline uncertainty resolved by context level $α$, requiring only repeated sampling with and without context. These results suggest that entropy is a more responsive black-box uncertainty measure than confidence under incomplete context.
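The two answer-level measures can be estimated from a list of sampled answers as sketched below. The `resolved_fraction` helper is an assumed form of the diagnostic (one minus the ratio of entropy with context to entropy without); the abstract does not give the exact definition of $ρ_R(α)$, and the sample answers are made up.

```python
import math
from collections import Counter


def confidence(answers):
    """Empirical mode frequency: share of samples equal to the most common answer."""
    counts = Counter(answers)
    return max(counts.values()) / len(answers)


def response_entropy(answers):
    """Shannon entropy (in nats) of the empirical answer distribution."""
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in Counter(answers).values())


def resolved_fraction(answers_without_context, answers_with_context):
    """ASSUMED form of the diagnostic: 1 - H(with context) / H(without context)."""
    baseline = response_entropy(answers_without_context)
    if baseline == 0.0:
        raise ValueError("baseline entropy is zero; the ratio is undefined")
    return 1.0 - response_entropy(answers_with_context) / baseline


# Made-up repeated samples for one question, without and with its context passage.
no_context = ["Paris", "Lyon", "Paris", "Nice"]
full_context = ["Paris", "Paris", "Paris", "Paris"]
```

Here `confidence(no_context)` is 0.5 and `response_entropy(full_context)` is 0, so `resolved_fraction(no_context, full_context)` is 1.0 under the assumed definition. Answers are compared as exact strings, which is a simplification: free-form answers would first need to be normalized or grouped by meaning.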

2605.13174 2026-05-14 stat.ML cs.LG stat.CO

Coupling-Informed Transport Maps for Bayesian Filtering in Nonlinear Dynamical Systems

Dengfei Zeng, Lijian Jiang, Shuyu Sun, Dunhui Xiao

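Affiliations: School of Mathematical Sciences, Tongji University; Key Laboratory of Intelligent Computing and Applications (Ministry of Education), Tongji University

AI summary: This paper proposes a likelihood-free transport filtering method for Bayesian filtering in nonlinear dynamical systems, based on the couplings between state and observation variables. Exploiting a block-triangular structure in the transport map, it recasts the analysis step of filtering as minimizing the maximum mean discrepancy (MMD) between the true joint distribution and its transport-based approximation. To sidestep the non-convexity of the MMD optimization, the authors introduce a training-free transport filter that computes the transport map analytically via gradient flows, approximating non-Gaussian filtering posteriors accurately and avoiding particle collapse. The method is extended to high-dimensional problems through domain localization and outperforms conventional filters in numerical experiments.

Comments: 29 pages, 14 figures

Abstract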

A likelihood-free transport filtering method is proposed based on the couplings between state and observation variables. By exploiting a block-triangular structure in the transport map, the analysis step of filtering is reformulated as the minimization of the maximum mean discrepancy (MMD) between the true joint measure and its transport-based approximation. To circumvent the non-convexity in the MMD optimization, we introduce a training-free transport filter method via gradient flows, which leads to an analytic computation for the transport map that implies the steepest descent direction of the MMD. The proposed approach accurately approximates non-Gaussian filtering posteriors and avoids particle collapse. We provide a convergence analysis for the expectation of the MMD between the approximated posterior and the true posterior. Finally, we extend the method to high-dimensional problems through domain localization. Numerical examples demonstrate the superior performance of our approach over conventional filtering methods in nonlinear, non-Gaussian scenarios.
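As background, the sketch below computes a biased (V-statistic) estimate of the squared MMD between two one-dimensional samples with a Gaussian kernel. The kernel choice, bandwidth, and function names are assumptions; the paper's filter minimizes an MMD over joint state-observation measures, which this toy example does not attempt.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E k(x,x') + E k(y,y') - 2 E k(x,y)."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


same = mmd_squared([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
shifted = mmd_squared([0.0, 1.0, 2.0], [5.0, 6.0, 7.0])
```

Identical samples give a value of zero up to rounding, while the shifted pair gives a clearly positive value. The double loop costs a number of kernel evaluations proportional to the product of the sample sizes, which is acceptable for a few hundred particles but is one reason localization matters in high dimensions.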

2605.13172 2026-05-14 cs.MA cs.AI

When Does Hierarchy Help? Benchmarking Agent Coordination in Event-Driven Industrial Scheduling

Ziqi Wang, Yuhao Yang, Zhiwei Ling, Wenzhuo Qian, Hailiang Zhao

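Affiliations: Zhejiang University

AI summary: This paper introduces DESBench, a new benchmark for evaluating agent coordination in hierarchical event-driven industrial scheduling. It compares how different coordination paradigms (centralized, hierarchical, heterarchical, and holonic) perform in environments with dynamically coupled constraints, and exposes their trade-offs in robustness, efficiency, and communication overhead. The work offers insight into design principles for agent coordination in complex systems and underscores the need for more adaptive and dynamic coordination mechanisms in future multi-agent systems research.

Abstract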

Recent advances in agent and multi-agent systems have shown strong performance on tool use, reasoning, and collaborative tasks. However, existing benchmarks mostly evaluate task completion in weakly coupled environments, and provide limited support for studying coordination in shared, dynamically evolving systems with hierarchy and coupled constraints. This leaves an important question underexplored: when do different coordination paradigms succeed or fail? We introduce Distributed Event-driven Scheduling Benchmark (DESBench), a benchmark for evaluating agent coordination in hierarchical event-driven scheduling. Built on a shared discrete-event driven environment in industrial scheduling, our benchmark captures multi-timescale decision making, partial observability, and dynamically coupled constraints. We define tasks and metrics that evaluate effectiveness, constraint alignment, coordination efficiency, and robustness, and focus on four representative coordination paradigms: centralized, hierarchical, heterarchical, and holonic. These paradigms correspond to distinct mechanisms of information flow, decision authority, and conflict resolution. Our controlled evaluations reveal clear coordination trade-offs: centralized coordination is robust and communication-efficient but scales poorly with difficulty; hierarchical coordination improves efficiency through decomposition but suffers from cross-level misalignment; heterarchical coordination is flexible but communication-heavy; and holonic coordination satisfies constraints well but loses global robustness. These findings demonstrate that coordination design fundamentally shapes agent system behavior in complex environments, revealing structural trade-offs that cannot be captured by outcome metrics alone and underscoring the imperative for more adaptive, principled, and dynamic coordination mechanisms in future MAS research.

2605.13163 2026-05-14 cs.CR cs.CV cs.LG

LoREnc: Low-Rank Encryption for Securing Foundation Models and LoRA Adapters

Beomjin Ahn, Jungmin Kwon, Chanyong Jung, Jaewook Chung

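Affiliations: Samsung Research; Samsung Electronics; Amazon Web Services; University of Michigan

AI summary: This paper proposes LoREnc, a training-free framework for securing foundation models and LoRA adapters against intellectual property leakage and model recovery attacks. Its core is spectral truncation and compensation: it suppresses the dominant low-rank components of the foundation model's weights, compensates for the missing information inside authorized adapters, and applies orthogonal reparameterization to hide the adapter's structural fingerprints. Experiments show that LoREnc preserves model performance for authorized users while effectively resisting model recovery attacks, at very low computational overhead.

Comments: Accepted to ICIP 2026

Abstract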

Foundation models and low-rank adapters enable efficient on-device generative AI but raise risks such as intellectual property leakage and model recovery attacks. Existing defenses are often impractical because they require retraining or access to the original dataset. We propose LoREnc, a training-free framework that secures both FMs and adapters via spectral truncation and compensation. LoREnc suppresses dominant low-rank components of FM weights, compensates for the missing information in authorized adapters, and further applies orthogonal reparameterization to obscure structural fingerprints of the protected adapter. Unauthorized users produce structurally collapsed outputs, while authorized users recover exact performance. Experiments demonstrate that LoREnc provides strong protection against model recovery with under 1% computational overhead.

2605.13160 2026-05-14 stat.ML cs.LG

Kernel-based guarantees for nonlinear parametric models in Bayesian optimization

Rafael Oliveira

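Affiliations: The Commonwealth Scientific and Industrial Research Organisation

AI summary: This paper addresses theoretical guarantees for nonlinear parametric models in Bayesian optimization, where analysis under adaptive data collection has lacked theoretical support, and proposes a kernel-based framework. Kernels over the parameter space induce a reproducing kernel Hilbert space structure on the model class, yielding confidence bounds for nonlinear models trained with broad classes of regularized convex losses. These bounds in turn support convergence guarantees for nonlinear acquisition and surrogate models, giving a unified route to analyzing Bayesian optimization and related adaptive optimization problems.

Abstract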

Modern Bayesian optimization and adaptive sampling methods increasingly rely on nonlinear parametric models, yet theoretical guarantees for such models under adaptive data collection remain limited. Existing analyses largely focus on Gaussian processes, kernel machines, linear models, or linearized neural approximations, leaving a gap between theory and the nonlinear models used in practice. We develop a kernel-based framework for analyzing regularized nonlinear parametric models trained on adaptively collected data. Our approach uses kernels over the parameter space to induce reproducing kernel Hilbert space structures over the corresponding model class, yielding confidence bounds for models trained with broad classes of regularized convex losses. We show how these bounds can support convergence guarantees for nonlinear acquisition and surrogate models, including randomized regularized policies that select points by maximizing a trained random model. These results provide a unified route to analyzing nonlinear parametric models in Bayesian optimization and related adaptive optimization settings.

2605.13150 2026-05-14 stat.ML cs.LG

Generative Modeling of Approximately Periodic Time Series by a Posterior-Weighted Gaussian Process

Elias Reich, Saverio Messineo, Stefan Huber

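Affiliations: Josef Ressel Centre for Intelligent and Secure Industrial Automation; Salzburg University of Applied Sciences

AI summary: This paper studies time series generation for discrete automated processes in industrial and cyber-physical systems, which exhibit approximately periodic behavior. To address the shortcomings of standard Gaussian process models on such data, the authors propose a generative model based on a posterior-weighted Gaussian process; a novel kernel decouples the periodic structure from the variability between repetitions. The method generates approximately periodic time series that keep a consistent structure across repetitions while varying smoothly between them, offering a new approach to modeling and generation tasks in this area.

Abstract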

Discrete automated processes in industrial and cyber-physical systems often exhibit a repetitive structure in which successive repetitions follow a common trajectory while differing in duration, amplitude, and fine-scale dynamics. Such approximately periodic behavior poses a challenge for Gaussian process (GP) modeling: strictly periodic models suppress inter-repetition variability, while non-periodic models fail to capture the strong structural regularities required for generation. In this work, we propose a stochastic generative model for approximately periodic time series. The model is based on a GP whose posterior is modulated by a novel kernel. Our approach decouples intra-repetition structure from inter-repetition variability through a two-stage construction, which yields a generative distribution with an identical mean function across repetitions while allowing smooth variation between repetitions. The modeling choices are supported by an implementation in which realistic synthetic trajectories are generated from toy datasets.

2605.13146 2026-05-14 stat.ML cs.CV cs.LG

On Hallucinations in Inverse Problems: Fundamental Limits and Provable Assessment Methods

David Iagaru, Nina M. Gottschling, Anders C. Hansen, Josselin Garnier

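Affiliations: Gauss Centre for Supercomputing e.V.; John von Neumann Institute for Computing; Deutsches Zentrum für Luft- und Raumfahrt; Laboratory Directed Research and Development Program of Oak Ridge National Laboratory; UT-Battelle, LLC; Computing and Computational Sciences, Oak Ridge National Laboratory; DAMTP, University of Cambridge

AI summary: This paper studies hallucinations in inverse problems: plausible-looking but incorrect details generated by AI models. The authors develop a theoretical framework showing that such hallucinations are not merely artifacts of particular models but can arise from the ill-posed nature of the inverse problem itself, and they derive necessary and sufficient conditions for hallucinations together with computable bounds that depend only on the forward model. Building on this theory, they present two algorithms, one estimating the minimum hallucination magnitude and one assessing the faithfulness of reconstructed details. Experiments show that the approach applies to a range of imaging tasks and generative models, giving a principled basis for quantifying and evaluating AI hallucinations.

Comments: 31 pages, 11 figures; code available at https://github.com/davidiagraid/hallucinations_invpb

Abstract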

Artificial intelligence (AI) has transformed imaging inverse problems, from medical diagnostics to Earth observation. Yet deep neural networks can produce hallucinations, realistic-looking but incorrect details, undermining their reliability, especially when ground truth data is unavailable. We develop a theoretical framework showing that such hallucinations are not merely artifacts of particular models, but can arise from the ill-posed nature of the inverse problem itself. We derive necessary and sufficient conditions for hallucinations, together with computable bounds on their magnitude that depend only on the forward model. Building on this theory, we introduce algorithms to: (1) estimate the minimum hallucination magnitude achievable by any reconstruction model for a given input; (2) assess the faithfulness of reconstructed details by a given reconstruction model. Experiments across three imaging tasks demonstrate that our approach applies broadly, including to modern generative models, and provides a principled way to quantify and evaluate AI hallucinations.

2605.13138 2026-05-14 cs.SE cs.CR cs.LG

Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study

Nils Loose, Joseph Bienhüls, Kristoffer Hempel, Felix Mächtle, Thomas Eisenbarth

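Affiliations: University of Lübeck; University of Lübeck Institute for IT Security

AI summary: This paper studies automatic detection of vulnerability-fixing commits (VFCs) with code language models. It builds a unified benchmark framework that consolidates more than 180,000 commits from over 20 datasets and runs a large set of experiments. The study finds no evidence that models acquire transferable security-relevant understanding from code changes alone: commit messages dominate model attention when present, and code diffs by themselves do not support effective detection. The experiments also show how the choice of data split affects measured performance, and that current code-only VFC detectors still miss most vulnerabilities at low false positive rates, so further improvement is needed.

Abstract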

Automated detection of vulnerability-fixing commits (VFCs) is critical for timely security patch deployment, as advisory databases lag patch releases by a median of 25 days and many fixes never receive advisories. We present a comprehensive evaluation of VFC detection based on code language models through a unified framework consolidating over 20 fragmented datasets spanning more than 180,000 commits. Across over 180 experiments with fine-tuned models from 125M to 14B parameters, we find no evidence that models acquire transferable security-relevant code understanding from code changes alone. When commit messages are available, they dominate model attention, and when removed, an attribution analysis shows that enriching diffs with additional intra-procedural semantic context does not shift model attention toward the code changes. Group-stratified evaluation exposes approximately 17% performance drops compared to random splits, while temporal splits on aggregated datasets prove unreliable due to compositional shift in the underlying project distributions. At a false positive rate of 0.5%, all fine-tuned code-only models miss over 93% of vulnerabilities. Larger and more diverse training data or generative approaches show preliminary improvements but do not resolve the underlying limitations. To support future research on code-centric VFC detection, we release our unified framework and evaluation suite.

2605.13129 2026-05-14 cs.GR cs.CV

Rigel3D: Rig-aware Latents for Animation-Ready 3D Asset Generation

Nikitas Chatzis, Marios Loizou, Evangelos Kalogerakis

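Affiliations: Technical University of Crete; CYENS Center of Excellence; University of Massachusetts Amherst

AI summary: Rigel3D is a generative method for animation-ready 3D assets. It addresses the fact that outputs of existing 3D generative models lack skeletal rigs, joint hierarchies, and skinning weights. The method jointly models geometry and rig structure through coupled surface and skeleton structured latent representations, and a rig-aware autoencoder decodes them into mesh geometry, skeleton topology, joint coordinates, and skinning weights. Rigel3D also introduces an open-vocabulary joint labeling module that lets generated joints be matched to arbitrary retargeting templates. Experiments show that it outperforms existing methods on multiple metrics and generates diverse, high-quality animation-ready 3D assets.

Abstract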

Recent 3D generative models can synthesize high-quality assets, but their outputs are typically static: they lack the skeletal rigs, joint hierarchies, and skinning weights required for animation. This limits their use in games, film, simulation, virtual agents, and embodied AI, where assets must not only look plausible but also move plausibly. We introduce Rigel3D, a generative method for animation-ready 3D assets represented as rigged meshes. Unlike post-hoc auto-rigging methods that attach rigs to completed shapes, our method jointly models geometry and rig structure through coupled surface and skeleton structured latent representations. A rig-aware autoencoder decodes these representations into mesh geometry, skeleton topology, joint coordinates, and skinning weights, while a two-stage latent generative model synthesizes both surface and skeleton representations for image-conditioned generation. To support downstream animation workflows, we further introduce an open-vocabulary joint labeling module that embeds generated joints into a shared vision-language space, enabling correspondence to arbitrary retargeting templates. Experiments on large-scale rigged asset datasets demonstrate that our method generates diverse, high-quality animation-ready assets and outperforms existing rigging baselines across multiple metrics.

2605.13128 2026-05-14 stat.ML cs.LG stat.CO

Amortized Neural Clustering of Time Series based on Statistical Features

Ángel López-Oriona, Ying Sun

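Affiliations: Statistics Program, King Abdullah University of Science and Technology (KAUST)

AI summary: This paper proposes a feature-based time series clustering method that reduces reliance on conventional clustering algorithms such as K-means, K-medoids, or hierarchical clustering by learning the optimal partitioning rule through amortized neural inference. Using statistical features such as autocorrelations and quantile autocorrelations, it learns an affinity structure from data without requiring cluster shapes or structures to be specified in advance, and one version of the method determines the number of clusters automatically. Experiments show clustering accuracy competitive with or superior to traditional methods across a variety of scenarios, and an application to financial time series illustrates its practical value.

Abstract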

This paper introduces an algorithm-agnostic approach to feature-based time series clustering via amortized neural inference. By training neural networks to approximate the optimal partitioning rule from simulated data, the proposed framework reduces reliance on conventional clustering methods, such as $K$-means, $K$-medoids, or hierarchical clustering, and their associated objective functions and heuristics. Leveraging statistical features, such as autocorrelations and quantile autocorrelations, the approach learns a data-driven affinity structure from which clustering partitions can be recovered, without requiring explicit prior specification of cluster shapes or structures. In addition, one version of the method can automatically determine the number of clusters, avoiding ad-hoc selection procedures. Comprehensive empirical studies show that the proposed framework achieves competitive or superior clustering accuracy relative to traditional methods, even in challenging scenarios where competing techniques are provided with the true number of clusters. An application to financial time series of stock returns illustrates its practical utility. By reducing the need for algorithm selection and calibration, the proposed framework opens new possibilities for automated, adaptive, and data-driven clustering of temporal data across scientific and industrial domains.

2605.13127 2026-05-14 stat.ML cs.LG math.PR

State-of-art minibatches via novel DPP kernels: discretization, wavelets, and rough objectives

Hoang-Son Tran, Pranav Gupta, Rémi Bardenet, Subhroshekhar Ghosh

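Affiliations: Univ. Lille, CNRS, Centrale Lille UMR 9189 – CRIStAL

AI summary: This paper studies how novel determinantal point process (DPP) kernels can produce more efficient minibatches and coresets for machine learning on large-scale datasets. The authors propose wavelet-based DPPs on Euclidean space with accuracy guarantees better than the best known rates, and develop a general method for converting continuous DPPs into discrete kernels that preserves the variance decay while keeping sampling cheap. The approach extends DPP-based subsampling to tasks with irregular objective functions and provides guarantees that adapt to the regularity of the task.

Comments: 52 pages

Abstract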

Determinantal point processes (DPPs) have emerged as a kernelized alternative to vanilla independent sampling for generating efficient minibatches, coresets and other parsimonious representations of large-scale datasets. While theoretical foundations and promising empirical performance have been demonstrated, there are two challenges for current proposals for DPP-based coresets or minibatches. The first is the need for families of DPPs with certain key variance reduction properties, usually constructed in a continuous setting, of which there are few known examples. The second is the need for an ad-hoc construction of a discrete DPP defined on a given dataset, that inherits such variance reduction. In this work, we contribute to the programme of establishing DPPs as a subsampling toolbox for ML by advancing on these two fronts. First, we propose new DPPs on the Euclidean space based on wavelets, with provably better accuracy guarantees than the best known rates. Second, we introduce a general method to convert such continuous DPPs, which are more amenable to proving analytical statements, into discrete kernels, which are pertinent for subsampling tasks such as minibatch and coreset constructions. This conversion mechanism simultaneously preserves the desired variance decay and reveals a low-rank decomposition of the discrete kernel, which makes sampling the corresponding DPP computationally inexpensive. En route, we enlarge the class of ML tasks amenable to improvements via DPP-based minibatches and coresets to include objective functions with arbitrarily low regularity, and rate guarantees that explicitly adapt to this regularity.

2605.13115 2026-05-14 cs.CR cs.LG

DiffusionHijack: Supply-Chain PRNG Backdoor Attack on Diffusion Models and Quantum Random Number Defense

Ziyang You, Liling Zheng, Xiaoke Yang, Xuxing Lu

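Affiliations: School of Electronics, Electrical Engineering and Physics, Fujian University of Technology; School of Humanities, Fujian University of Technology; Institute of Applied Physics and Materials Engineering, University of Macau

AI summary: This paper presents DiffusionHijack, a supply-chain backdoor attack on diffusion models. By planting malicious code in the pseudo-random number generator (PRNG), an attacker can precisely control the content of generated images without modifying model weights, and the attack is hard for existing detection mechanisms to find. As a defense, the authors replace the conventional PRNG with a quantum random number generator (QRNG), which neutralizes the attack and brings the similarity of generated images down to random-baseline levels. The work exposes a supply-chain security risk for diffusion models and offers a hardware-level defense.

Comments: This work has been submitted to the IEEE for possible publication

Abstract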

Diffusion models depend on pseudo-random number generators (PRNGs) for latent noise sampling. We present DiffusionHijack, a supply-chain backdoor attack that hijacks the PRNG to deterministically control generated images. A malicious PRNG, injected via compromised packages, forces pixel-perfect reproduction of attacker-chosen content (SSIM = 1.00, N = 100 trials) on Stable Diffusion v1.4, v1.5, and SDXL -- without modifying model weights. The attack is inherently undetectable by existing model auditing and content moderation mechanisms, as it operates entirely outside the neural network computation graph. The attack remains effective under stochastic sampling (eta > 0), bypasses CLIP-based safety checkers (98-100% success), and operates independently of the user's prompt. As a countermeasure, we replace the PRNG with a quantum random number generator (QRNG), which provides information-theoretic unpredictability. Across N = 100 prompt-model combinations, QRNG defense completely neutralizes the attack, reducing output similarity to random baseline levels (SSIM < 0.20 for SD 1.x models, < 0.45 for SDXL). This work exposes a previously overlooked supply-chain vulnerability and offers a hardware-level fundamental mitigation for generative AI systems.

2605.13113 2026-05-14 cs.CY cs.AI

Context Matters: Auditing Gender Bias in T2I Generation through Risk-Tiered Use-Case Profiles

Jose Luna, Yankun Wu, Xiaofei Xie, Noa Garcia

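Affiliations: Singapore Management University; The University of Osaka

AI summary: This paper studies gender bias in text-to-image generative models and proposes a risk-tiered auditing framework for evaluating and governing it more systematically. The framework has three core components: tiered use-case profiles defined according to the risk categories of the EU AI Act, a catalog that consolidates gender-bias evaluation metrics, and a typology that maps context-dependent harm types to specific risk scenarios. The paper also introduces THUMB cards, a tool that helps auditors consider context, scenario, bias manifestation, and potential harms together, making evaluations more systematic and practical.

Comments: FAccT 2026

Abstract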

Text-to-image (T2I) generative models are increasingly used to produce content for education, media, and public-facing communication, and are starting to be integrated into higher-impact pipelines. Since generated images tend to reinforce stereotypes, producing representational erasure via "default" depictions and shaping perceptions of who belongs in certain roles, a growing body of work has proposed metrics to quantify gender bias in T2I outputs. Yet existing evaluations remain fragmented. Metrics are often reported without a shared view of what they measure, what assumptions they entail, or how their results should be interpreted under different deployment contexts. This limits the usefulness of gender bias measurement for both technical auditing and emerging governance discussions. We propose a risk-aligned auditing framework for gender bias in T2I models composed of three constituents that connect risk categories, evaluation metrics, and harms. First, we identify risk-tiered use-case profiles aligned with the EU AI Act's risk categories to motivate why auditing expectations may vary with deployment contexts and stakeholder exposure. Second, we construct a metric catalog that consolidates gender-bias evaluation methods and organizes them in three measurement categories: gender prediction, embedding similarity, and downstream task. Third, we introduce a harm typology that maps context-dependent harm categories (e.g., representational, quality-of-service) to specific risk-tiered scenarios. Finally, we introduce THUMB cards (Text-to-image Harms-informed Use-case-aligned Metrics of gender Bias) that help formulate audits systematically by incorporating context, scenario and bias manifestation, harm hypotheses, and audit strategy.

2605.13110 2026-05-14 cs.MA cs.AI cs.IR

A Multi-Agent Orchestration Framework for Venture Capital Due Diligence

Grigorios Alexandrou, Katerina Pramatari

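Affiliations * Greek Business Registry

AI summary This paper presents a fully automated multi-agent framework for venture capital due diligence and market analysis. Built on an event-driven architecture, it combines large language models with real-time web retrieval to turn unstructured data into structured investment intelligence. Its core contributions are a programmatic data-extraction pipeline that reverse-engineers the frontend-to-backend communication of the Greek Business Registry, and a structural fallback mechanism that explicitly flags missing data instead of generating unverified figures, which targets hallucination in financial settings.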

Comments 13 pages, 1 figure

Details
Abstract

We present a fully automated multi-agent framework for corporate due diligence and market analysis in venture capital. The system runs on an event-driven orchestration architecture, combining Large Language Models (LLMs) with real-time web retrieval to synthesize unstructured data into structured investment intelligence. A central technical contribution is a programmatic extraction pipeline that reverse-engineers the frontend-to-backend communication of the Greek Business Registry (Γ.Ε.ΜΗ.), querying dynamic endpoints to retrieve official financial filings that are then parsed using a layout-aware OCR extractor. A structural fallback mechanism explicitly flags data absence rather than generating unverified figures, directly targeting hallucination in financial contexts. All workflow artifacts are publicly available to support replication.

2605.13077 2026-05-14 cs.MA cs.AI

Counterfactual Reasoning for Causal Responsibility Attribution in Probabilistic Multi-Agent Systems

Chunyan Mu, Muhammad Najib

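Affiliations * University of Aberdeen; Heriot-Watt University

AI summary This paper studies causal responsibility attribution in multi-agent systems and proposes an allocation method based on counterfactual reasoning. The authors model the system as a concurrent stochastic multi-player game, introduce a notion of retrospective counterfactual responsibility, and allocate responsibility with the Shapley value, which guarantees key properties such as fairness and consistency. On this basis they build a formal framework for verification and strategic reasoning in responsibility-aware systems, and use Nash equilibrium to analyze strategies that trade off responsibility against expected reward.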

Details
Abstract

Responsibility allocation -- determining the extent to which agents are accountable for outcomes -- is a fundamental challenge in the design and analysis of multi-agent systems. In this work, we model such systems as concurrent stochastic multi-player games and introduce a notion of retrospective (backward) counterfactual responsibility, which quantifies an agent's accountability for outcomes resulting from a given strategy profile. To allocate responsibility among agents, we utilise the Shapley value and formally show that this method satisfies key desirable properties, including fairness and consistency. Building on this foundation, we propose a formal framework that supports both verification and strategic reasoning in responsibility-aware multi-agent systems. Furthermore, by adopting Nash equilibrium as the solution concept, we demonstrate how to compute stable strategy profiles in which agents trade off responsibility against expected reward.
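The Shapley-value allocation mentioned in this abstract can be illustrated with a minimal sketch. The toy characteristic function below is a hypothetical example, not the paper's game model: it assumes three agents `a`, `b`, `c`, where the outcome occurs if `c` acts, or if `a` and `b` act together. The function names are illustrative only.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every order in which players can join."""
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            shares[p] += value(grown) - value(coalition)
            coalition = grown
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in shares.items()}


def toy_outcome_probability(coalition):
    # Hypothetical characteristic function (an assumption for illustration):
    # the outcome occurs if "c" acts, or if "a" and "b" act together.
    if "c" in coalition or {"a", "b"} <= coalition:
        return 1.0
    return 0.0


shares = shapley_values(["a", "b", "c"], toy_outcome_probability)
for agent, share in sorted(shares.items()):
    print(agent, round(share, 3))
```

In this toy game the shares sum to the value of the full coalition (the efficiency property), and the two symmetric agents `a` and `b` receive equal shares. Exact enumeration costs n! orderings, so it is only practical for a handful of agents.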

2605.13072 2026-05-14 quant-ph cs.AI

Neural QAOA$^{2}$: Differentiable Joint Graph Partitioning and Parameter Initialization for Quantum Combinatorial Optimization

Zubin Zheng, Jiahao Wu, Shengcai Liu

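Affiliations * Guangdong Provincial Key Laboratory of Brain-Inspired Intelligent Computation, Department of CSE, SUSTech

AI summary This paper proposes Neural QAOA², an end-to-end differentiable framework for graph partitioning and parameter initialization in quantum combinatorial optimization. By integrating a generative evaluative network (GEN) that uses a differentiable quantum evaluator as a high-fidelity performance surrogate, the method generates graph partitions and initial parameters jointly. This addresses two weaknesses of earlier approaches: partitioning metrics that are misaligned with the optimization objective, and parameter initialization that ignores graph topology. Experiments on QUBO, Ising, and MaxCut instances show that it outperforms heuristic baselines, ranking first on 101 of 183 instances, and generalizes across distributions.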

Comments Accepted to ICML 2026

Details
Abstract

The quantum approximate optimization algorithm (QAOA) holds promise for combinatorial optimization but is constrained by limited qubits. While divide-and-conquer frameworks like QAOA$^{2}$ address scalability by partitioning graphs into subgraphs, existing methods suffer from two fundamental limitations: i) misalignment between heuristic partitioning metrics and quantum optimization goals, and ii) topology-blind parameter initialization that leads to optimization cold starts. To bridge these gaps, we propose Neural QAOA$^{2}$, an end-to-end differentiable framework that jointly generates graph partitions and initial parameters. By integrating a generative evaluative network (GEN), our method utilizes a differentiable quantum evaluator as a high-fidelity performance surrogate to provide direct gradient guidance, enabling the joint generator to learn the intrinsic mapping from graph topology to high-quality partition and parameter configurations. Extensive experiments on 183 QUBO, Ising, and MaxCut instances (21 to 1000 variables) demonstrate that our gradient-driven approach broadly outperforms heuristic baselines, ranking first on 101 instances. It exhibits zero-shot generalization across out-of-distribution graph topologies and scales.

2605.13052 2026-05-14 cs.IR cs.CL

RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search

Tingyu Chen, Wenkai Zhang, Li Gao, Lixin Su, Ge Chen, Dawei Yin, Daiting Shi

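Affiliations * Baidu Inc.

AI summary In commercial web search, matching content freshness to user intent remains difficult: traditional methods filter by static time windows, so rankings can include content that is semantically out of date. This paper proposes an LLM-based, query-aware framework for dynamic content expiration prediction. It extracts fine-grained temporal context from documents and uses an LLM to infer a query-specific "validity horizon" that determines when information becomes obsolete. The method includes hallucination mitigation strategies and was evaluated through offline tests and online A/B tests on live production traffic, where it improved search freshness and user-experience metrics.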

Comments Accepted at SIGIR 2026. Final version: https://doi.org/10.1145/3805712.3808457

Details
Abstract

In commercial web search, aligning content freshness with user intent remains challenging due to the highly varied lifespans of information. Traditional industrial approaches rely on static time-window filtering, resulting in "one-size-fits-all" rankings where content may be chronologically recent but semantically expired. To address this limitation, we present a novel Large Language Model (LLM)-based Query-Aware Dynamic Content Expiration Prediction Framework deployed in Baidu search, reformulating timeliness as a dynamic validity inference task. Our framework extracts fine-grained temporal contexts from documents and leverages LLMs to deduce a query-specific "validity horizon", a semantic boundary defining when information becomes obsolete based on user intent. Integrated with robust hallucination mitigation strategies to ensure reliability, our approach has been evaluated through offline and online A/B testing on live production traffic. Results demonstrate significant improvements in search freshness and user experience metrics, validating the effectiveness of LLM-driven reasoning for solving semantic expiration at an industrial scale.

2605.13044 2026-05-14 cs.CR cs.AI

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

Ying Li, Hongbo Wen, Yanju Chen, Hanzhi Liu, Yuan Tian, Yu Feng

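Affiliations * University of California, San Diego

AI summary This paper studies "specification violations" in LLM-based agent skills: cases where a skill breaks its own declared safety rules during execution without being attacked. The authors propose Sefz, a semantic fuzzing framework that turns each safety rule into a reachability goal over the execution trace and uses an LLM to generate benign inputs that progressively approach violation patterns, discovering violations automatically. In experiments on 402 real-world skills, Sefz found specification violations in 120 (29.9%) and identified six recurring design pitfalls, offering guidance for safer skill design.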

Details
Abstract

LLM-powered agents can silently delete documents, leak credentials, or transfer funds on a routine user request, not because the agent was attacked, but because the skill it invoked broke its own declared safety rules. We call these specification violations: benign inputs cause a skill to breach the natural-language guardrails in its own specification, typically because the guardrail's semantics are undefined for autonomous execution, or because the implementation silently ignores the documented constraint. These violations are invisible to static analyzers, traditional fuzzers, and prompt-injection defenses alike, yet they undermine the very contract a user trusts when installing a skill. We present Sefz, a goal-directed semantic fuzzing framework that automatically discovers specification violations in agent skills. Sefz translates each guardrail into a reachability goal over an annotated execution trace, reducing violation checking to a deterministic graph query. An LLM-based mutator generates benign inputs whose traces progressively approach the violation patterns, guided by a multi-armed bandit that uses goal-proximity as its reward signal. On 402 real-world skills from the largest public agent-skill marketplace, Sefz finds specification violations in 120 (29.9%), including 26 previously unknown exploitable guardrail violations in deployed skills. Six recurring specification pitfalls explain the bulk of the failures, suggesting concrete principles for safer skill design.
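The abstract says a multi-armed bandit steers the mutator using goal-proximity as reward but does not name the bandit algorithm. As a rough illustration only, the sketch below uses UCB1, a standard choice that is an assumption here, with three hypothetical mutation strategies whose fixed proximity scores stand in for the reward a real trace analysis would produce.

```python
import math


class UCB1:
    """Standard UCB1 bandit: pick the arm with the highest
    mean reward plus an exploration bonus."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        # Try every arm once before using the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental running mean of the rewards seen for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical mutation strategies with fixed goal-proximity rewards in [0, 1].
# A real fuzzer would compute proximity from each execution trace instead.
proximity = [0.2, 0.5, 0.8]
bandit = UCB1(n_arms=len(proximity))
for _ in range(500):
    arm = bandit.select()
    bandit.update(arm, proximity[arm])

print(bandit.counts)
```

With these deterministic rewards the bandit spends most of its budget on the strategy whose inputs land closest to the violation pattern, while still sampling the others occasionally.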

2605.13031 2026-05-14 eess.SY cs.RO cs.SY

Relative Pose-Velocity Estimation Using Dual IMU Measurements and Relative Position Sensing

Alessandro Melis, Tarek Bouazza, Soulaimane Berkane, Tarek Hamel

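Affiliations * I3S-CNRS, Nice-Sophia Antipolis, France; Department of Computer Science and Engineering, Université du Québec en Outaouais (UQO)

AI summary This paper studies how to estimate the relative pose (position and orientation) and velocity between a vehicle and a moving target when both carry inertial measurement units (IMUs) and relative position or bearing measurements are available. The relative dynamics are modeled as a system on SE₂(3) and recast as a linear time-varying model in the ambient space ℝ¹⁵, for which a deterministic Riccati observer is designed. The authors analyze the observability conditions that guarantee global exponential convergence of the estimation error, and propose a nonlinear complementary filter that gives a smooth orientation estimate with almost global asymptotic stability. Simulation results validate the approach.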

Details
Abstract

This paper addresses the problem of estimating the relative pose (position and orientation) and velocity of a vehicle with respect to a moving target, where both are equipped with Inertial Measurement Units (IMUs), assuming the availability of relative position or bearing measurements. The body-target relative dynamics are formulated on $\mathbf{SE}_2(3)$ and recast into a linear time-varying (LTV) model in the ambient space $\mathbb{R}^{15}$, on which a deterministic Riccati observer is designed. We analyze the uniform observability (UO) conditions required to guarantee global exponential convergence of the estimation error in the ambient space for both measurement cases. In the case of relative position measurements, UO requires only a persistence-of-excitation condition on the target acceleration, whereas for bearing measurements, additional conditions are required. Building on this, a nonlinear complementary filter on $\mathbf{SO}(3)$ is designed to provide a smooth estimate of the orientation component of the state with almost global asymptotic stability. Finally, simulation results are provided to validate the proposed solution.

2605.13015 2026-05-14 eess.IV cs.CV cs.LG

A General Bézier Tree Encoding Counterfactual Framework for Retinal-Vessel-Mediated Disease Analysis

Tan Su, Ethan Elio Meidinger, Lin Gu, Ruogu Fang

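Affiliations * Department of Electronic and Electrical Engineering; School of Data Science; Research Institute of Electrical Communication; J. Crayton Pruitt Family Department of Biomedical Engineering

AI summary This study proposes the Bézier Tree Encoding Counterfactual Framework (BTECF) for analyzing causal links between retinal vessel structure and systemic disease. The method abstracts the retinal vascular network into connected cubic Bézier segments, preserving vessel topology while allowing atomic interventions on geometric features such as tortuosity and caliber. Coupled with a diffusion-based generative model, BTECF produces controllable counterfactual vessel structures without disturbing background texture. It is validated on diabetic retinopathy, ischemic stroke, and Alzheimer's disease, providing a unified generative paradigm for testing causal hypotheses across diseases.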

Comments 33 pages, 6 figures; preprint

Details
Abstract

The geometry of the retinal vessel is a key biomarker of vascular diseases, yet clinical evidence remains primarily observational. Existing generative counterfactuals intervene only at the image-level disease label, failing to isolate explicit anatomical structure. To address this limitation, we propose the Bézier Tree Encoding Counterfactual Framework (BTECF). By abstracting vascular networks into interconnected cubic-Bézier segments, BTECF establishes a disease-agnostic representation in which structural topology is explicitly preserved and atomically perturbable. Coupling this encoding with a diffusion-based generator enables parameter-level do-interventions on explicit geometric axes (e.g., tortuosity, caliber) while preserving background fundus textures. We validate BTECF on diabetic retinopathy, together with independent cohorts for ischemic stroke and Alzheimer's disease. Isolated counterfactual interventions produce dose-responsive shifts in classifier predictions; a matched pixel-drop control attenuates this response by an order of magnitude or more, ruling out out-of-distribution generation artifacts. By enforcing causal isolation between vessel topology and pixel-level confounders, BTECF provides a unified generative paradigm for hypothesis verification across systemic diseases. To support reproducibility, the code will be publicly released upon acceptance.
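To make the idea of a parameter-level intervention on a cubic Bézier segment concrete, here is a minimal sketch. It is not the BTECF encoding: the arc-to-chord ratio is one common tortuosity measure and is an assumption here, and the `bend` helper and the control points are hypothetical.

```python
import math


def cubic_bezier(ctrl, t):
    """Point at parameter t in [0, 1] on a cubic Bezier curve
    given four 2D control points."""
    u = 1.0 - t
    weights = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)  # Bernstein basis
    x = sum(w * p[0] for w, p in zip(weights, ctrl))
    y = sum(w * p[1] for w, p in zip(weights, ctrl))
    return (x, y)


def tortuosity(ctrl, samples=200):
    """Arc-to-chord ratio, with arc length approximated by a polyline."""
    pts = [cubic_bezier(ctrl, i / samples) for i in range(samples + 1)]
    arc = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    chord = math.dist(ctrl[0], ctrl[3])
    return arc / chord


def bend(ctrl, factor):
    """Hypothetical intervention: scale the inner control points' offset
    from the chord. Assumes the chord lies along the x-axis."""
    p0, p1, p2, p3 = ctrl
    return [p0, (p1[0], p1[1] * factor), (p2[0], p2[1] * factor), p3]


segment = [(0.0, 0.0), (1.0, 1.0), (2.0, -1.0), (3.0, 0.0)]
baseline = tortuosity(segment)
intervened = tortuosity(bend(segment, 2.0))
print(baseline, intervened)
```

Only the two inner control points move, so the segment's endpoints, and therefore its connections to neighbouring segments in a tree, are left untouched. That is the sense in which such an edit changes geometry without changing topology.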

2605.12999 2026-05-14 q-bio.NC cs.LG

Implicit Behavioral Decoding from Next-Step Spike Forecasts at Population Scale

John R. Minnick, Jesus Gonzalez-Ferrer, Kamran Hussain, Jinghui Geng, Ash Robbins, Mohammed A. Mostajo-Radji, David Haussler, Jason Eshraghian, Mircea Teodorescu

Affiliations * Department of Electrical and Computer Engineering, University of California, Santa Cruz, CA, USA; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA; Department of Applied Mathematics, University of California, Santa Cruz, CA, USA; Department of Computer Science and Engineering, University of California, Santa Cruz, CA, USA

AI summary This study proposes a closed-loop brain-computer interface approach built on a single Mamba model that both forecasts neural population activity and decodes the animal's behavioral state in one forward pass. The model is trained on next-step spike counts at Neuropixels scale, and a lightweight linear classifier reads its predicted firing rates; this decodes behavior better than the same linear classifier applied to raw spike counts. On a visual-discrimination task, the method decodes mouse choice and stimulus side more accurately than a matched linear decoder, and the full pipeline fits inside the 50 ms bin budget.

Comments 21 pages, 6 figures, 5 tables; submitted to NeurIPS 2026 Neuroscience & Cognitive Science Track

Details
Abstract

Closed-loop brain-computer interfaces often require both a forecast of upcoming neural population activity and a readout of the animal's behavioral state. A single Mamba forecaster, trained only on next-step spike counts at Neuropixels scale, can deliver both in one forward pass. A lightweight per-session linear head reading the model's predicted rates decodes behavior better than the same linear classifier reading the raw spike counts, under matched temporal context. We test on the Steinmetz visual-discrimination benchmark, which spans 39 sessions, roughly 27,000 neurons, and 1,994 held-out trials. Across three training seeds, Mamba's predicted rates decode mouse choice at 75.7$\pm$0.2% trial vote, roughly 2.3 times chance level, and stimulus side at 66.1$\pm$0.6%, about twice chance. Compared to a matched 500 ms-context linear decoder on the raw spike counts, Mamba wins at trial vote by 4-6 pp on response and 4-6 pp on stimulus side. A session-start calibration block of about 100-150 trials brings the readout within 1-2 pp of asymptote, and the full pipeline fits inside the 50 ms bin budget on workstation-class GPUs typical of tethered chronic Neuropixels recordings.

2605.12992 2026-05-14 q-bio.NC cs.LG

SpikeProphecy: A Large-Scale Benchmark for Autoregressive Neural Population Forecasting

John R. Minnick, Jinghui Geng, Kamran Hussain, Jesus Gonzalez-Ferrer, Ash Robbins, Mohammed A. Mostajo-Radji, David Haussler, Jason K. Eshraghian, Mircea Teodorescu

Affiliations * Department of Electrical and Computer Engineering, University of California, Santa Cruz, CA, USA; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA; Department of Computer Science and Engineering, University of California, Santa Cruz, CA, USA; Department of Applied Mathematics, University of California, Santa Cruz, CA, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA

AI summary This paper introduces SpikeProphecy, the first large-scale benchmark for causal, autoregressive neural population forecasting on real electrophysiology recordings. It decomposes population forecasting performance into temporal fidelity, spatial pattern accuracy, and magnitude-invariant alignment, exposing structure that a single aggregate correlation coefficient hides. Experiments on 105 Neuropixels sessions compare seven baseline architectures, show that predictability differs markedly across brain regions, and report a negative result for ANN-to-SNN transfer in the Poisson count domain.

Comments 26 pages, 4 figures, 12 tables; submitted to NeurIPS 2026 Datasets and Benchmarks Track; processed dataset at https://huggingface.co/datasets/mysteriousauthor/spikeprophecy-steinmetz (CC-BY-4.0); code at https://github.com/JohnMinnick/SpikeProphecy-A-Large-Scale-Benchmark-for-Autoregressive-Neural-Population-Forecasting

Details
Abstract

Neural population models, which predict the joint firing of many simultaneously recorded neurons forward in time, are typically evaluated by a single aggregate Pearson correlation $r$ between predicted and actual spike counts, a number that masks critical structure. We argue that how we evaluate spike forecasting matters as much as what we build, and introduce SpikeProphecy, the first large-scale benchmark for causal, autoregressive spike-count forecasting on real electrophysiology recordings. Our core contribution is a population metric decomposition that separates aggregate performance into temporal fidelity, spatial pattern accuracy, and magnitude-invariant alignment. The decomposition surfaces aspects of the underlying data that an aggregate scalar collapses together. We apply the protocol to 105 Neuropixels sessions (Steinmetz 2019 + IBL Repeated Site; ~89,800 neurons) with seven architecture baselines spanning four structural families: four SSMs (three diagonal and one non-diagonal), a Transformer, an LSTM, and a spiking network. The decomposition surfaces a brain-region predictability ranking that reproduces across all seven baselines and survives ANCOVA correction for firing-statistics constraints (region $ΔR^2 = 0.018$ above the firing-statistics covariates). It also exposes a sub-Poisson evaluation floor where rigorous metrics combine with genuine biophysical constraints on regular spike trains, and yields a negative result on KL-on-output-rates distillation for ANN-to-SNN transfer in this Poisson count domain.

2605.12947 2026-05-14 stat.ML cs.AI cs.LG stat.ME

When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems

Young Hyun Cho, Will Wei Sun

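Affiliations * Department of Statistics, Purdue University

AI summary As LLM-based AI workflows increasingly rely on iterative generate-evaluate-revise loops, deciding when to stop iterating and release the output becomes a key question. This paper proposes an always-valid release wrapper for existing generator-evaluator pipelines: it builds a reference pool of high-scoring failures and accumulates evidence with an e-process, giving statistical guarantees under optional stopping. The wrapper controls the probability of releasing on infeasible tasks while still releasing on feasible ones; theoretical analysis and experiments support its effectiveness.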

Details
Abstract

LLM-enabled AI workflows increasingly produce outputs through iterative generate-evaluate-revise loops. Each iteration can improve the candidate, but it also creates a release decision: when to stop and output the current result? This raises a statistical challenge because deployment-time evaluator scores are adaptively generated and repeatedly monitored, yet the likelihood models or exchangeability assumptions typically used for calibration are unavailable. We propose an always-valid release wrapper for existing generator-evaluator pipelines. The wrapper builds a hard-negative reference pool of high-scoring failures, calibrates deployment-time evaluator scores against this pool, and accumulates the resulting evidence with an e-process. This separates two roles: the reference pool turns black-box scores into conservative evidence, while the e-process provides validity under optional stopping. In theory, we show that a conservative reference pool yields finite-sample control of the probability of releasing on infeasible tasks, that is, tasks for which the given workflow is not capable of producing a reliable solution. We also characterize conditions under which the same conservative rule still achieves nontrivial release on feasible tasks. In an MBPP+ coding-agent case study, the wrapper reduces premature incorrect release relative to baseline stopping rules while still releasing on tasks for which the workflow repeatedly accumulates moderate supporting evidence.
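The two roles described in this abstract, calibrating scores against a pool of failures and accumulating evidence under optional stopping, can be sketched as follows. This is an illustrative construction, not the paper's wrapper: it assumes a rank-based conservative p-value, the standard p-to-e calibrator e = κ·p^(κ−1) with κ in (0, 1), and a release threshold of 1/α from Ville's inequality. The guarantee also assumes each per-round p-value is valid given the past. The pool values and function names are hypothetical.

```python
def conservative_p_value(score, reference_pool):
    """Rank of the score among known failures, with a +1 correction.
    Small when the score beats almost every hard negative."""
    at_least = sum(1 for ref in reference_pool if ref >= score)
    return (1 + at_least) / (1 + len(reference_pool))


def p_to_e(p, kappa=0.5):
    """Calibrate a p-value into an e-value; kappa * p**(kappa - 1)
    integrates to 1 over p in (0, 1], so it is a valid calibrator."""
    return kappa * p ** (kappa - 1.0)


def release_round(scores, reference_pool, alpha=0.05):
    """Multiply per-round e-values and return the first round at which
    the accumulated evidence reaches 1 / alpha, or None if it never does."""
    wealth = 1.0
    for round_index, score in enumerate(scores, start=1):
        wealth *= p_to_e(conservative_p_value(score, reference_pool))
        if wealth >= 1.0 / alpha:
            return round_index
    return None


# Hypothetical evaluator scores of high-scoring failures (the hard negatives).
pool = [0.60, 0.65, 0.70, 0.72, 0.75, 0.78, 0.80, 0.82, 0.85, 0.90]

strong_run = release_round([0.95] * 8, pool)  # scores above every failure
weak_run = release_round([0.50] * 8, pool)    # scores below every failure
print(strong_run, weak_run)
```

Because the stopping rule compares accumulated evidence to a fixed threshold, the loop can be checked after every iteration without inflating the error rate, which is the property "always-valid" refers to. A larger reference pool allows smaller p-values and therefore faster release on strong candidates.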

2605.12927 2026-05-14 cs.CR cs.CV cs.HC

ThermalTap: Passive Application Fingerprinting in VR Headsets via Thermal Side Channels

Mahsin Bin Akram, A H M Nazmus Sakib, OFM Riaz Rahman Aranya, Raveen Wijewickrama, Kevin Desai, Murtuza Jadliwala

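Affiliations * Meta HTC

AI summary This paper presents ThermalTap, a passive, non-contact side-channel attack that remotely identifies the VR application running on a headset from the long-wave infrared radiation emitted by its chassis, with no device interaction or malware execution. The method treats the headset's thermal signature as a high-fidelity proxy for its internal computational load and uses ambient sensor data to normalize environmental noise. It identifies applications with over 90% accuracy indoors, and several applications remain identifiable outdoors. The study shows that thermal radiation is a privacy risk for immersive systems that current software-level protections and physical access controls do not cover.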

Details
Abstract

Standalone virtual reality (VR) headsets process highly sensitive personal, professional, and health-related data, yet their susceptibility to non-contact physical side channels remains largely unexplored. Existing side-channel attacks typically require malicious software execution or physical access to peripherals, making them conspicuous and potentially patchable. This paper introduces ThermalTap, the first passive, non-contact side-channel attack that fingerprints VR applications solely from the long-wave infrared (LWIR) radiation emitted by the headset chassis. By treating a headset's thermal signature as a high-fidelity proxy for internal computational workloads, ThermalTap enables remote application inference at meter-scale distances without any device interaction. To achieve robust performance in real-world settings, the system combines a commodity thermal camera with a multi-modal sensor suite (capturing ambient temperature, humidity, and airflow) to normalize environmental noise. We evaluate ThermalTap using six applications across three commercial standalone headsets. In indoor settings, ThermalTap identifies applications with over 90% accuracy using only 10 seconds of thermal camera data. Under outdoor conditions, with longer session-level observations, several applications remain identifiable despite environmental variability, with the strongest outdoor application reaching 81% accuracy. Our findings establish thermal radiation as a fundamental and unavoidable privacy risk for immersive systems, exposing a critical security gap that bypasses current software-level protections and physical access controls.

2605.12916 2026-05-14 cs.MA cs.LG

SHM-Agents: A Generalist-Specialist Integrated Agent System for Structural Health Monitoring

Yuequan Bao, Xing Li, Huabin Sun, Dawei Liu, Yuxuan Tian, Haiyang Hu

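Affiliations * Key Lab of Smart Prevention and Mitigation of Civil Engineering Disasters of the Ministry of Industry and Information Technology, Harbin Institute of Technology, Harbin; Key Lab of Structures Dynamic Behavior and Control of the Ministry of Education, Harbin Institute of Technology, Harbin; School of Civil Engineering, Harbin Institute of Technology, Harbin

AI summary This paper proposes SHM-Agents, a generalist-specialist integrated agent system for structural health monitoring (SHM), aimed at the high implementation barriers, limited interoperability, and complex training procedures of existing specialized algorithms. The system combines the reasoning and planning abilities of large language models with the problem-solving strengths of specialized algorithms, executes SHM tasks end to end from natural-language requests, and supports pre-trained deep learning models and modular extension. Experiments on a long-span cable-stayed bridge show that SHM-Agents performs a range of complex tasks accurately and efficiently, including data anomaly diagnosis, modal identification, and damage identification.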

Comments 19 pages, 20 figures

Details
Abstract

Artificial intelligence is increasingly used to simplify complex tasks. In engineering applications of structural health monitoring (SHM), existing specialized algorithms, while effective, often face high implementation barriers, limited interoperability and complex training procedures. To overcome these challenges, this paper proposes SHM-Agents, a generalist-specialist agent system that integrates the reasoning and planning abilities of large language models with the problem-solving strengths of specialized algorithms. SHM-Agents enables end-to-end execution of single and combined SHM tasks via natural language, supports deep learning pre-training to simplify deployment and allows flexible expansion through a modular design. Experiments on a long-span cable-stayed bridge show that SHM-Agents can accurately and efficiently perform diverse SHM tasks, including data anomaly diagnosis and recovery, signal processing, statistical analysis, modal identification, damage identification, finite element model updating, vehicle load modeling, response calculation, reliability assessment, fatigue estimation and bridge knowledge Q&A.

2605.12908 2026-05-14 stat.ML cs.LG

The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge

Ryoya Awano, Taiji Suzuki

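Affiliations * University of Tokyo; Center for Advanced Intelligence Project, RIKEN

AI summary This paper studies the mechanism of weak-to-strong (W2S) generalization, in which a strong model is fine-tuned on the outputs of a weak model so that it learns a new task while keeping its existing capabilities. In a reward-model learning setting with two-layer neural networks, the authors prove that the strong model efficiently learns the task feature and retains its pre-trained general capabilities without catastrophic forgetting. The work gives theoretical support for W2S generalization in the feature-learning regime.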

Comments 48 pages, 1 figure

Details
Abstract

Weak-to-strong (W2S) generalization, in which a strong model is fine-tuned on outputs of a weaker, task-specialized model, has been proposed as an approach to aligning superhuman AI systems. Existing theoretical analyses either fix the student's representations or operate in restricted settings. Whether multi-step SGD can succeed in feature learning while preserving diverse pre-trained capabilities remains open. We study W2S in the setting of reward-model learning with two-layer neural networks. The strong model has pre-trained representations organized into low-dimensional subspaces $V_k$, and is fine-tuned under the supervision of a weak model specialized on task $κ$. We prove that the strong model efficiently learns task $κ$, eliciting its pre-trained knowledge while retaining general capabilities. This establishes W2S generalization in the feature-learning regime, in the sense that the strong model acquires the target feature direction through W2S training, rather than having it given a priori. Moreover, W2S preserves pre-trained off-target features, whereas standard supervised fine-tuning causes catastrophic forgetting when off-target feature directions are correlated with the target's. Numerical experiments on synthetic data confirm our theoretical results.