arXivDaily arXiv daily academic digest, updated Monday through Friday
2605.09509 2026-05-12 stat.ML cs.LG stat.ME

Empirical Bayes 1-bit matrix completion

Takeru Matsuda

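AI summary This paper studies the problem of predicting unobserved entries in a binary matrix, known as 1-bit matrix completion, which has broad applications in areas such as recommendation systems. Motivated by the Efron-Morris estimator, the author proposes an empirical Bayes method that exploits the low-rank structure of binary matrices by shrinking singular values. The method achieves a better balance of predictive accuracy, uncertainty quantification, and computational efficiency than existing methods.

Abstract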

The problem of predicting unobserved entries in a binary matrix, known as 1-bit matrix completion, has found diverse applications in fields such as recommendation systems. In this study, we develop an empirical Bayes method for 1-bit matrix completion motivated by the Efron--Morris estimator, a matrix generalization of the James--Stein estimator that shrinks singular values toward zero. The proposed method exploits the underlying low-rank structure of binary matrices, drawing parallels with multidimensional item response theory. Simulation studies and real-data applications demonstrate that the proposed method achieves a superior balance of predictive accuracy, calibration reliability (uncertainty quantification), and computational efficiency compared to existing methods.
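As a rough illustration of the singular-value shrinkage mentioned in the abstract, the sketch below implements the classical Efron-Morris estimator for a Gaussian observation matrix, $\hat{M} = X\,(I_p - (n-p-1)(X^\top X)^{-1})$, which maps each singular value $\sigma_i$ of $X$ to $\sigma_i\,(1 - (n-p-1)/\sigma_i^2)$. It is not the paper's 1-bit method: the restriction to $p = 2$ columns (so the Gram matrix can be inverted in closed form) and the function name `efron_morris` are assumptions made only for this example.

```python
def efron_morris(X):
    """Classical Efron-Morris estimate X (I - (n-p-1) (X^T X)^{-1}) for an n x 2 matrix."""
    n, p = len(X), len(X[0])
    if p != 2 or n - p - 1 <= 0:
        raise ValueError("this sketch needs exactly 2 columns and at least 4 rows")
    # Gram matrix S = X^T X (2 x 2, symmetric).
    s00 = sum(r[0] * r[0] for r in X)
    s01 = sum(r[0] * r[1] for r in X)
    s11 = sum(r[1] * r[1] for r in X)
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("X^T X is singular")
    # Closed-form inverse of the 2 x 2 Gram matrix.
    inv = [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]
    c = n - p - 1
    # Shrinkage matrix W = I - c * S^{-1}.
    W = [[1 - c * inv[0][0], -c * inv[0][1]],
         [-c * inv[1][0], 1 - c * inv[1][1]]]
    # Return X W, row by row.
    return [[r[0] * W[0][0] + r[1] * W[1][0],
             r[0] * W[0][1] + r[1] * W[1][1]] for r in X]
```

For a matrix with orthogonal columns of lengths 3 and 4 and $n = 5$ rows, the two singular values shrink to $3\,(1 - 2/9)$ and $4\,(1 - 2/16)$, while the singular vectors are left unchanged.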

2605.09504 2026-05-12 cs.CR cs.AI cs.LG

Position: AI Security Policy Should Target Systems, Not Models

Michael A. Riegler, Inga Strümke

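AI summary This study presents swarm-attack, an open-source adversarial testing framework in which multiple lightweight LLM agents coordinate to bypass the safety measures of frontier models and to discover software vulnerabilities. The experiments show that, with commodity hardware and openly available models, effective attacks on models such as GPT-4o can be carried out at effectively zero cost, and planted software vulnerabilities can be detected within minutes. The study argues that the system scaffold itself, rather than the sophistication of the individual models, is the key enabler of these capabilities.

Abstract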

We present swarm-attack, an open-source adversarial testing framework in which multiple lightweight LLM agents coordinate through shared memory, parallel exploration, and evolutionary optimization. Together, our results demonstrate that both safety bypass of frontier models and software vulnerability discovery, i.e., the capability class that motivated restricted release of Anthropic's Mythos Preview, are achievable at effectively zero cost using commodity hardware and openly available models. We report two experiments. In the first, five instances of a 1.2 billion parameter model conducted 225 jailbreak attacks each against GPT-4o and Claude Sonnet 4. Against GPT-4o, the swarm achieved an Effective Harm Rate of 45.8%, producing 49 critical-severity breaches; against Claude Sonnet 4, the Effective Harm Rate was 0% despite a 40% technical success rate. In the second experiment, the same models performed combined source code analysis and binary fuzzing against a vulnerable C application with 9 planted CWEs. With a hand-crafted exploit seed corpus, regex pattern detection, and AddressSanitizer-based crash classification, the pipeline recovers 9 of 9 vulnerabilities (100% recall) in approximately four minutes on a consumer MacBook. With those scaffold components disabled, the same model recovers 0 of 9 by crash verification and 2 of 9 by citation. The capability class that motivated restricted release of Anthropic's Mythos Preview is therefore reproducible at effectively zero cost; the important enabler is the system scaffold itself, which compensates for the limited reasoning capacity of small individual models.

2605.09495 2026-05-12 physics.chem-ph cs.LG physics.comp-ph

Enabling Structure-Only Initialization and Out-of-Distribution Generalization in GNN-based Molecular Dynamics Simulators

S. A. Shteingolts, Salman N. Salman, Dan Mendels

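AI summary This study addresses the limited initialization and out-of-distribution (OOD) generalization capabilities of graph neural network (GNN)-based molecular dynamics simulators in inverse design, and proposes two complementary strategies. An inference-time physics-based optimization framework and a differentiable GNN-based barostat improve the stability of simulations initialized from a single static structure and strengthen generalization under OOD conditions. Experiments show that these methods substantially improve rollout stability and reliability, providing a more efficient and general tool for materials discovery and structural optimization.

Comments 10 pages, 7 figures

Abstract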

Machine learning-based simulators offer the potential to model the dynamics of complex systems more efficiently than classical approaches, while retaining differentiability, a key property for materials design. Graph neural network (GNN)-based simulators have shown strong performance across a range of physical domains, including molecular dynamics. However, their reliance on temporal context for accurate prediction limits their use in inverse design settings, where simulations must be initialized from a single static configuration. Moreover, inverse design requires robust out-of-distribution (OOD) generalization, as candidate structures typically lie outside the training domain. Here, we address both challenges by introducing two complementary strategies that enable stable and accurate structure-only initialization of GNN-based simulations. To directly target OOD generalization, we propose an inference-time physics-based optimization framework that constrains model predictions to remain physically consistent during rollout. In addition, we introduce a differentiable, GNN-based barostat that enables accurate tracking of system dimensions and pressure, critical for capturing macroscopic responses and supporting OOD generalization. We evaluate these approaches in the context of uniaxial compression of disordered elastic networks spanning a broad range of geometries, Poisson ratios, and microscopic behaviors. We find that, together, these methods substantially improve rollout stability and enable reliable OOD generalization, including regimes with distinct, more complex dynamics than those in the training data. These results show that, when properly initialized and constrained, GNN-based simulators can serve as efficient and generalizable tools for materials discovery and structural optimization, advancing their use in materials, molecular, and dynamical system design.

2605.09479 2026-05-12 eess.IV cs.CV cs.MM

ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

Feng Ding, Haisheng Fu, Jie Liang, Qihan Xu, Siyu Zhu, Jingning Han

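AI summary This paper studies full-reference image quality assessment from a machine-centric perspective, evaluating how well images preserve information for multiple downstream models. It proposes ML-CLIPSim, a differentiable quality metric built on a CLIP visual encoder that approximates machine-perceived image quality by aggregating intermediate feature similarities and global image embeddings. Experiments show that the method aligns better with machine preferences than conventional fidelity and perceptual metrics, remains competitive for human quality prediction, and improves learned image compression.

Abstract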

We study full-reference image quality assessment from a machine-centric perspective, where images are evaluated by how well they preserve information for downstream models. We formulate machine-oriented quality as a latent machine utility and approximate it through pairwise predictive-consistency comparisons. To this end, we construct PCMP, a dataset of PSNR-matched distortion pairs labeled by consistency votes from multiple pretrained models. We further propose ML-CLIPSim, a differentiable quality metric built on a frozen CLIP visual encoder, which aggregates intermediate patch-token similarities and global image embeddings. Experiments on machine-preference benchmarks, human-IQA datasets, and learned image compression show that ML-CLIPSim better aligns with machine-oriented preferences than conventional fidelity and perceptual metrics, while remaining competitive for human quality prediction. Used as a compression distortion term, it improves rate--task trade-offs across multiple downstream tasks.
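The aggregation described in the abstract (intermediate patch-token similarities plus a global embedding similarity) can be sketched as follows. This is only an illustration under stated assumptions: the features are plain Python lists standing in for CLIP encoder outputs, and the uniform averaging over tokens and layers, the `global_weight` parameter, and the function names are hypothetical choices, not the paper's definition of ML-CLIPSim.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def multilayer_similarity(ref_layers, dist_layers, ref_global, dist_global,
                          global_weight=0.5):
    """Blend per-layer patch-token similarity with global embedding similarity.

    ref_layers / dist_layers: one list per layer, each holding patch-token vectors.
    The uniform means and global_weight are assumptions made for this sketch.
    """
    layer_scores = []
    for ref_tokens, dist_tokens in zip(ref_layers, dist_layers):
        sims = [cosine(r, d) for r, d in zip(ref_tokens, dist_tokens)]
        layer_scores.append(sum(sims) / len(sims))
    local = sum(layer_scores) / len(layer_scores)
    return (1 - global_weight) * local + global_weight * cosine(ref_global, dist_global)
```

Because every step is a differentiable function of the features, a metric with this shape could in principle serve as a distortion term in a learned codec, which is the use case the abstract mentions.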

2605.09456 2026-05-12 stat.ML cs.LG math.AP math.OC

Quantitative Local Convergence of Mean-Field Stein Variational Gradient Flow

Lénaïc Chizat, Maria Colombo, Roberto Colombo, Xavier Fernández-Real

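AI summary This paper studies quantitative local convergence in strong norms for the mean-field Stein variational gradient flow (SVGD). For Riesz-type interaction kernels on the $d$-dimensional torus, and assuming that the initial density and the target are smooth and close in $L^2$-norm, the authors obtain explicit polynomial convergence rates and prove that these rates are sharp in certain regimes. They also show that, for kernels with a Coulomb singularity, the global exponential convergence result of prior work is recovered. The analysis is inspired by work on Wasserstein gradient flows of kernel mean discrepancies.

Abstract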

Stein Variational Gradient Descent (SVGD) is a deterministic interacting-particle method for sampling from a target probability measure given access to its score function. In the mean-field and continuous-time limit, it is known that the flow converges weakly toward the target, but no quantitative rate is known for the last iterate. In this paper, we establish quantitative local convergence in strong norms for this dynamics, when the interaction kernel is of Riesz type on the $d$-dimensional torus. Specifically, assuming that the initial density and the target are smooth and close in $L^2$-norm, we obtain explicit polynomial convergence rates in $L^2$-norm that depend on the dimension and on the regularity parameters of the kernel, the initialization and the target. We further show that these rates are sharp in certain regimes, and support the theory with numerical experiments. In the edge case of kernels with a Coulomb singularity, we recover the global exponential convergence result established in prior work. Our analysis is inspired by recent results on Wasserstein gradient flows of kernel mean discrepancies.
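For readers unfamiliar with the particle method whose mean-field limit is analyzed here, the following is a minimal sketch of the standard discrete SVGD update in one dimension. It uses a Gaussian (RBF) kernel on the real line and a standard normal target, both assumptions chosen for simplicity; the paper itself studies Riesz-type kernels on the torus in continuous time, which this sketch does not reproduce.

```python
import math

def svgd_step(xs, score, step=0.1, h=1.0):
    """One SVGD update with kernel k(x, y) = exp(-(x - y)^2 / h)."""
    n = len(xs)
    updated = []
    for xi in xs:
        phi = 0.0
        for xj in xs:
            k = math.exp(-(xj - xi) ** 2 / h)
            grad_k = -2.0 * (xj - xi) / h * k  # derivative of k(xj, xi) in xj
            # Attraction along the score plus repulsion between particles.
            phi += k * score(xj) + grad_k
        updated.append(xi + step * phi / n)
    return updated

def run_svgd(n=20, iters=1000):
    """Move particles started in [2, 4] toward a standard normal target."""
    xs = [2.0 + 2.0 * i / (n - 1) for i in range(n)]
    for _ in range(iters):
        xs = svgd_step(xs, lambda x: -x)  # score of N(0, 1) is -x
    return xs
```

The first term pulls particles toward high-density regions of the target, while the kernel-gradient term keeps them from collapsing onto a single point.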

2605.09454 2026-05-12 stat.ML cs.LG

Optimal Regret for Single Index Bandits

Devdan Dey, Sujoy Bhore, Avishek Ghosh

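AI summary This paper studies the single-index bandit problem, in which rewards depend on an unknown one-dimensional projection of high-dimensional contexts through an unknown reward function. The model extends linear and generalized linear bandits to a nonparametric setting and applies when the reward function is not known in advance. The authors propose a two-phase algorithm, ZoomSIB-UCB, which estimates the projection direction with a normalized Stein estimator and then reduces the problem to a one-dimensional bandit solved with UCB. Without additional assumptions, it achieves a regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$, and a matching lower bound of $\tilde{\Omega}(T^{2/3})$ gives a sharp characterization of the regret for single-index bandits.

Comments 27 pages, 9 figures

Abstract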

We study the $\textit{single-index bandit}$ problem, where rewards depend on an unknown one-dimensional projection of high-dimensional contexts through an unknown reward function. This model extends linear and generalized linear bandits to a nonparametric setting, and is particularly relevant when the reward function is not known in advance. While optimal regret guarantees are known for monotone reward functions, the general non-monotone case remains poorly understood, with the best known bound being $\tilde{\mathcal{O}}(T^{3/4})$ (under standard boundedness and Lipschitz assumptions on the reward function [Kang et al., 2025]). We close this gap by establishing the optimal regret for general single-index bandits. We propose a simple two-phase algorithm, namely, Zoomed Single Index Bandit with Upper Confidence Bound ($\texttt{ZoomSIB-UCB}$), that first estimates the projection direction via a normalized Stein estimator, then reduces the problem to a one-dimensional bandit via discretization, and finally applies UCB. This approach achieves a regret of $\tilde{\mathcal{O}}(T^{2/3})$, and improves significantly upon prior work without any additional assumptions. We also prove a matching minimax lower bound of $\tilde{\Omega}(T^{2/3})$, showing that the upper bound is essentially tight. Our upper and lower bounds together provide a sharp characterization of the regret in single-index bandits. Empirical results further demonstrate the effectiveness and robustness of our approach.
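The first phase (direction estimation) can be illustrated with the first-order Stein identity: for standard Gaussian contexts $x$ and rewards $y = f(\theta^\top x) + \text{noise}$, one has $\mathbb{E}[y\,x] = \mathbb{E}[f'(\theta^\top x)]\,\theta$, so normalizing the empirical average of $y\,x$ recovers the direction when the average derivative is nonzero. The sketch below is a hedged example only: the Gaussian contexts, the `tanh` link, the noise level, and the function names are assumptions for the demonstration, and the paper's normalized Stein estimator may differ in its details.

```python
import math
import random

def stein_direction(contexts, rewards):
    """Normalize the empirical mean of y * x to estimate the index direction."""
    n, d = len(contexts), len(contexts[0])
    avg = [sum(y * x[k] for x, y in zip(contexts, rewards)) / n for k in range(d)]
    norm = math.sqrt(sum(a * a for a in avg))
    return [a / norm for a in avg]

def demo(n=4000, d=5, seed=0):
    """Return the cosine between the true and estimated directions (toy data)."""
    rng = random.Random(seed)
    theta = [1.0 / math.sqrt(d)] * d          # hypothetical true unit direction
    contexts, rewards = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        u = sum(a * b for a, b in zip(theta, x))
        contexts.append(x)
        rewards.append(math.tanh(u) + 0.1 * rng.gauss(0.0, 1.0))
    est = stein_direction(contexts, rewards)
    return sum(a * b for a, b in zip(theta, est))
```

Once a direction estimate is available, each context reduces to the scalar $\hat{\theta}^\top x$, which is what allows the second phase to discretize a one-dimensional interval and run UCB over the resulting bins.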

2605.09434 2026-05-12 cs.DC cs.HC cs.LG

PoHAR: Understanding Hyperlocal Human Activities with Pollution Sensor Networks

Prasenjit Karmakar, Karthik Reddy, Sandip Chakraborty

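AI summary This paper proposes PoHAR, a framework that uses networks of low-cost air pollution sensors to understand hyperlocal human activities indoors. It combines a conflict-free data replication mechanism, hierarchical clustering on ESP32 devices, and leader-based group inference so that the sensor network can collaboratively detect activities under limited computational resources. Experiments show that, using off-the-shelf machine learning models, the method reaches up to 97.41% accuracy for indoor activity recognition and 99.68% for cooking activity recognition, with latency below 34 microseconds.

Comments 8 pages, 8 figures, accepted to IEEE DCOSS-IoT 2026

Abstract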

Low-cost air quality sensors are becoming ubiquitous in our daily lives as public awareness of air pollution continues to grow, and people take measures to monitor and improve the air they breathe indoors. Besides the standard operation of these sensors, fluctuations in environmental parameters can be leveraged to understand human behavior and activities in indoor spaces. Unlike traditional audio-visual, Radio Frequency, and inertial sensors, air quality sensors are easily scalable to a household, are privacy-preserving, and are more economical. Such distributed sensor networks must jointly make decisions to monitor indoor occupants for downstream smart home and healthcare applications. However, due to low processing power, memory, and energy, they often struggle to maintain distributed data consensus and identify activity-affected sensor groups for accurate on-device inference. In this paper, we propose the PoHAR framework, which implements: (i) a conflict-free replicated data primitive for data sharing, (ii) hierarchical clustering for ESP32 to detect activity-affected sensor groups with a self-supervised distance metric, and (iii) leader-based group inference with off-the-shelf ML classifiers, enabling the sensor network to collaboratively detect hyperlocal indoor activities. Our extensive experiments demonstrated on-device activity detection, achieving 97.41% accuracy for indoor activity and 99.68% for cooking activity, using off-the-shelf ML models with latency below 34 microseconds.
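To make the idea of a conflict-free replicated data primitive concrete, here is a minimal sketch of one common design, a last-writer-wins map keyed by sensor channel. The abstract does not say which primitive PoHAR uses, so the entry layout `(timestamp, node_id, value)`, the channel names, and the `merge` function are assumptions for illustration only.

```python
def merge(replica_a, replica_b):
    """Merge two last-writer-wins maps.

    Each value is a (timestamp, node_id, reading) tuple; tuples are compared
    lexicographically, so ties on timestamp are broken by node_id. Because the
    winner is the maximum under a total order, merging is commutative,
    associative, and idempotent, which is what lets replicas converge.
    """
    merged = dict(replica_a)
    for key, entry in replica_b.items():
        if key not in merged or entry > merged[key]:
            merged[key] = entry
    return merged

# Two hypothetical sensor nodes with partially overlapping state.
node_1 = {"pm25": (10, "n1", 35.0), "co2": (12, "n1", 600.0)}
node_2 = {"pm25": (11, "n2", 40.0), "voc": (9, "n2", 0.3)}
```

Nodes can exchange state in any order and any number of times and still end up with the same map, which avoids running a consensus protocol on memory-constrained devices.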

2605.09396 2026-05-12 cs.IT cs.LG math.IT math.ST stat.ML stat.TH

Universal Feature Selection with Noisy Observations and Weak Symmetry Conditions

Dier Tang, Guangyue Han

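AI summary This paper relaxes the strict symmetry conditions of prior work and proposes a universal feature selection framework that accommodates noisy observations and attribute structures with directional preferences. It introduces weak spherical symmetry, quantified by second-moment distances, which allows controlled deviations from rotational invariance, and builds the feature selection method on the singular value decomposition of the canonical dependence matrix computed from noisy data. The selected features are shown to achieve asymptotically near-optimal error exponents, with a residual that depends on the symmetry deviation and the noise levels; when these parameters are small, the result agrees with prior work, showing that exact spherical symmetry is unnecessary. The results demonstrate robustness to second-moment deviations and observation noise, broadening the framework's applicability across inference tasks.

Comments 6 pages, 0 figures. This work has been submitted to the 2026 IEEE Information Theory Workshop (ITW) for possible publication

Abstract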

This paper relaxes the restrictive symmetry conditions adopted in [4], [5] and extends their universal feature selection framework to accommodate noisy observations as well as attribute structures that may exhibit directional preferences. We introduce the notion of weak spherical symmetry, quantified by second-moment distances, which allows controlled deviations from rotational invariance. Under this relaxed condition, we develop a universal feature selection framework based on the singular value decomposition of the canonical dependence matrix computed from noisy data. Our main result shows that the selected features achieve asymptotically optimal error exponents up to a residual term that depends on the symmetry deviation $\delta$ and the noise levels $\eta_1, \eta_2$. When $\delta, \eta_1, \eta_2$ are relatively small, our result recovers that of [5], thereby demonstrating that exact spherical symmetry is unnecessary. Overall, our findings highlight the robustness of the selection framework against second-moment deviations and observation noise, thereby broadening its applicability across diverse inference tasks and providing a theoretically grounded tool for universal feature selection in practical scenarios.

2605.09386 2026-05-12 eess.AS cs.AI cs.LG

Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech

Dong Yang, Yiyi Cai, Haoyu Zhang, Yuki Saito, Hiroshi Saruwatari

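AI summary This paper studies scheduling and path tracking in metric-induced discrete flow matching (MI-DFM) for zero-shot text-to-speech. To address schedulers that depend on hyperparameter search and the finite-step path-tracking error of the continuous-time Markov chain solver, the authors propose a kinetic-optimal scheduler and a finite-step moment correction. Experiments show that the resulting GibbsTTS achieves the best objective naturalness, is preferred in subjective evaluations over masked discrete generative baselines, and shows strong speaker similarity.

Comments Under Review

Abstract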

Metric-induced discrete flow matching (MI-DFM) exploits token-latent geometry for discrete generation, but its practical use is limited by two issues: heuristic schedulers requiring hyperparameter search, and finite-step path-tracking error from its first-order continuous-time Markov chain (CTMC) solver. We address both issues. First, we derive a kinetic-optimal scheduler for prescribed scalar-parameterized probability paths, and instantiate it for MI-DFM as a training-free numerical schedule that traverses the path at constant Fisher-Rao speed. Second, we introduce a finite-step moment correction that adjusts the jump probability while preserving the CTMC jump destination distribution. We validate the resulting method, GibbsTTS, on codec-based zero-shot text-to-speech (TTS). Under controlled comparisons with a unified architecture and large-scale dataset, GibbsTTS achieves the best objective naturalness and is preferred in subjective evaluations over masked discrete generative baselines. Additionally, in comparison with the evaluated state-of-the-art TTS systems, GibbsTTS shows strong speaker similarity, achieving the highest similarity on three of four test sets and ranking second on the fourth. Project page: https://ydqmkkx.github.io/GibbsTTSProject

2605.09362 2026-05-12 cs.GR cs.CV

FrameTwin: Curve-Anchored Gaussian Alignment from Sparse Views for Adaptive Wireframe 3D Printing

Wenting Wang, Zhuo Huang, Kun Qian, Neelotpal Dutta, Yuhu Guo, Yingjun Tian, Yeung Yam, Charlie C. L. Wang

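AI summary This paper presents FrameTwin, a curve-anchored Gaussian alignment framework that works from sparse-view images for adaptive wireframe 3D printing. By anchoring Gaussian kernels to parametric curves, the method captures the deformation of thin wireframe structures and yields a compact, geometry-aware encoding that explicitly represents strut topology. Unlike general Gaussian-splatting approaches, it constrains kernel placement along parametric curves, which substantially reduces the ambiguity of sparse-view observations of thin structures, enforces a globally consistent deformation-field alignment, and allows subsequent printing trajectories to be updated adaptively.

Abstract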

We present FrameTwin, a curve-anchored Gaussian alignment framework that uses sparse-view images to close the control loop for adaptive wireframe 3D printing. Our key idea is to capture the deformation of thin wireframe structures from sparse-view images using Gaussian kernels anchored to parametric curves, yielding a compact and geometry-aware encoding that explicitly captures strut topology. Driven by a differentiable rendering pipeline, FrameTwin estimates a neural deformation field that aligns the partially printed target model with the deformed structure observed during fabrication, where the optimized curve-Gaussian representation serves as a digital twin of the evolving wireframe. Unlike general Gaussian-splatting approaches, our formulation constrains kernel placement along parametric curves, substantially reducing the ambiguity inherent in sparse-view observations of thin structures. The resultant deformation-field alignment enforces global consistency across all struts. By using the estimated deformation field to blend the distorted printed geometry with the remaining unprinted geometry, FrameTwin enables adaptive updates to future printing trajectories. We demonstrate that FrameTwin can robustly capture and compensate for deformation in wireframe models fabricated using a robotized 3D printing system.

2605.09357 2026-05-12 cs.DC cs.LG

Split CNN Inference on Networked Microcontrollers

Junyu Lu, Shashwath Suresh, Hao Liu, Qi Hong, Qing Wang

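AI summary Running deep neural networks on microcontroller units (MCUs) is severely constrained by memory, in particular by the peak random access memory (RAM) consumed by intermediate activations during inference, which makes many models infeasible on a single MCU. This paper proposes a fine-grained distributed inference system that executes convolutional neural network (CNN) models collaboratively across multiple networked MCUs to break this memory bottleneck. The method splits inference at sub-layer granularity, reinterpreting pre-trained models to enable kernel-wise and neuron-wise partitioning, and uses a lightweight coordinator to manage resource allocation and the inference flow across devices. Experiments show that the approach reduces per-MCU peak RAM usage while keeping end-to-end inference latency practical, so that CNN models infeasible on a single MCU can be executed.

Comments 10 pages

Journal ref
IEEE WoWMoM 2026
Abstract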

Running deep neural networks on microcontroller units (MCUs) is severely constrained by limited memory resources. While TinyML techniques reduce model size and computation, they often fail in practice due to excessive peak Random Access Memory (RAM) usage during inference, dominated by intermediate activations. As a result, many models remain infeasible on standalone MCUs. In this work, we present a fine-grained split inference system for networked MCUs that enables collaborative inference of Convolutional Neural Networks (CNN) models across multiple devices. Our key insight is that breaking the memory bottleneck requires splitting inference at sub-layer granularity rather than at layer boundaries. We reinterpret pre-trained models to enable kernel-wise and neuron-wise partitioning, and distribute both model parameters and intermediate activations across multiple MCUs. A lightweight, resource-aware coordinator orchestrates the inference across MCU devices with heterogeneous resources. We implement the proposed system on a real testbed and evaluate it on up to 8 MCUs using MobileNetV2, a representative CNN model. Our experimental results show that CNN models infeasible on a single MCU can be executed across networked MCUs, reducing the per-MCU peak RAM usage while maintaining the practical end-to-end inference latency. All the source code of this work can be found here: https://github.com/shashsuresh/split-inference-on-MCUs.
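Kernel-wise partitioning, as described in the abstract, relies on the fact that each output channel of a convolution depends on only one kernel, so different devices can compute different channels independently. The sketch below shows this on a toy 1D layer; the round-robin assignment, the loop that stands in for separate MCUs, and the function names are assumptions for illustration and do not reflect the paper's coordinator or communication scheme.

```python
def conv1d(signal, kernels):
    """Valid 1D cross-correlation; returns one output channel per kernel."""
    out = []
    for k in kernels:
        width = len(k)
        out.append([sum(signal[i + j] * k[j] for j in range(width))
                    for i in range(len(signal) - width + 1)])
    return out

def split_conv1d(signal, kernels, num_devices):
    """Assign kernels round-robin to devices, run each share, reassemble in order."""
    shares = [list(range(dev, len(kernels), num_devices))
              for dev in range(num_devices)]
    result = [None] * len(kernels)
    for indices in shares:  # each iteration stands in for one MCU
        partial = conv1d(signal, [kernels[i] for i in indices])
        for i, channel in zip(indices, partial):
            result[i] = channel
    return result
```

In this scheme each device stores only its own kernels and output channels, which is where the per-device memory saving comes from; the input still has to reach every device, so communication cost becomes the trade-off.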

2605.09349 2026-05-12 math.OC cs.LG cs.SY eess.SY

Mutual Information Optimal Density Control of Linear Systems and Generalized Schrödinger Bridges with Reference Refinement

Shoju Enami, Kenji Kashima

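AI summary This paper studies optimal density control of discrete-time linear systems with mutual information (MI) regularization, which trades off control performance against the benefits provided by stochastic inputs. Because the stochasticity induced by MI regularization is problematic in safety-critical scenarios, the authors impose Gaussian density constraints at specified times, propose an alternating optimization algorithm, and derive each step in closed form. They also show that the alternating optimization of this problem coincides with that of the generalized Schrödinger bridge problem associated with the system.

Comments 19 pages, 5 figures

Abstract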

We consider a mutual information (MI) regularized version of optimal density control of a discrete-time linear system. MI optimal control has been proposed as an extension of maximum entropy optimal control to trade off between control performance and benefits provided by stochastic inputs. MI regularization induces stochasticity in the policy, which poses challenges for applications of MI optimal control in safety-critical scenarios. To remedy this situation, we impose Gaussian density constraints at specified times to directly control state uncertainty. For this MI optimal density control problem, we propose an alternating optimization algorithm and derive the closed form of each step in the algorithm. In addition, we reveal that the alternating optimization of the MI optimal density control problem coincides with that of the so-called generalized Schrödinger bridge problem associated with the discrete-time linear system.

2605.09342 2026-05-12 cs.MA cs.LG

A Cross-Layered Multi-Drone Coordination for Medical Supply Delivery during Disaster Response Management

Aneesh Calyam, Subrahmanya Chandra Bhamidipati, Zack Murry, Sharan Srinivas

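AI summary This paper studies cooperative multi-drone delivery of medical supplies in disaster response, which faces dynamic environments, unstable networks, limited energy, and priority-based patient scheduling. It proposes CEDA, a cooperative algorithm based on a CTDE Deep Q-Network, which uses priority-aware routing and a fair scheduling mechanism to coordinate multiple drones under uncertainty. In grid simulation and in PX4 SITL validation, CEDA shows a high delivery completion rate, a low collision rate, and preserved triage priority.

Comments 18 pages, 14 figures, 3 tables

Abstract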

Autonomous drone fleets have immense potential in medical supply delivery during disaster incident response. However, coordinating multiple drones in such settings introduces compounding challenges: dynamic environmental hazards such as wind, obstacles, and intermittent network connectivity, constrained energy budgets, and the need to serve patient locations fairly under deadlines and triage-based priority while optimizing schedule utilization. In this paper, we present CEDA, a novel CTDE Deep Q-Network algorithm for cooperative multi-drone medical delivery, designed to jointly optimize triage-priority-aware routing, multi-agent coordination, and energy-efficient navigation under dynamic uncertainty. CEDA introduces a Priority-Preserving Fair Scheduling strategy, in which a structured reward function encodes both triage weights and complementary fairness mechanisms ensuring no patient class is starved of service. We evaluate CEDA in a simulated grid environment featuring dynamic hazard zones, stochastic action failures, and dynamically spawning patients across three triage priority levels, as well as in a PX4 SITL validation using two X500 quadrotors controlled via MAVSDK in offboard position mode. Simulation results demonstrate that CEDA achieves a delivery completion rate above 85%, reduces obstacle collisions by over 90% across training, and delivers an average of 6 patients per episode with a triage efficiency of 0.82. CEDA preserves clinical priority ordering (Critical patients are served first) while achieving near-zero mortality across lower-triage classes, confirming that priority-weighted routing does not condemn Stable or Urgent patients to neglect. PX4 SITL validation further demonstrates that the learned policy remains executable and triage-coherent under practical communication constraints and realistic multi-drone coordination in disaster response settings.

2605.09316 2026-05-12 quant-ph cs.AI

Neural Information Causality

Jeongho Bang, Marcin Pawłowski

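AI summary This paper proposes neural information causality (Neural-IC), a framework that embeds information causality into representation learning to analyze how information is conveyed in query-separated computation. It provides an operational diagnostic for query leakage, precision leakage, and episode-specific memory in representations, and shows that the relevant quantum enhancement is not total information beyond the bottleneck but fair query-conditioned access. The framework is extended to several capacity constraints and checked with controlled simulations.

Comments 32 pages, 15 figures (including Appendix)

Abstract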

Query-separated computation forces a representation to play an operational role: data are encoded before a query is known, and a later decoder can answer only through the intermediate interface. In this regime the representation functions as a message rather than merely as a feature map. We formalize this observation by embedding information causality (IC) into representation learning, obtaining a framework called neural information causality (Neural-IC). The revised formulation separates two logically distinct statements. First, every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality $I_{\mathrm{N\text{-}RAC}}\le I(\vec a:H,B)$. Second, any independently certified physical capacity bound on the interface, such as a hard $m$-bit alphabet, a finite-precision register, or a power-constrained noisy channel, implies $I_{\mathrm{N\text{-}RAC}}\le C_H$. This separation avoids treating capacity as a post hoc definition and makes Neural-IC an operational diagnostic for query leakage, precision leakage, and episode-specific memory. We also provide an exact one-bit classical RAC benchmark, showing explicitly that the relevant quantum enhancement is not total information beyond the bottleneck, but fair query-conditioned access. For CHSH-type correlation layers, nested Neural-RAC protocols multiply correlation biases across depth; requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold. We extend the analysis to asymmetric seed biases, to multi-capacity finite-depth phase diagrams, and to correlated data via a conditional information score. Controlled simulations, including straight-through binary bottlenecks and deliberately leaky ablations, verify that apparent violations are accounted for by broken query separation or undercounted capacity.
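The statement that multiplying correlation biases across depth selects the Tsirelson threshold can be made concrete with the standard information-causality argument for nested random-access codes. The derivation below is a sketch under assumptions: $E$ denotes the bias of each CHSH-type layer, $n$ the nesting depth (so $2^n$ data bits), $h$ the binary entropy, and $I$ the summed per-bit information; whether $I$ coincides exactly with the paper's $I_{\mathrm{N\text{-}RAC}}$ is assumed here rather than taken from the abstract.

```latex
\begin{align}
  P_n &= \tfrac{1}{2}\bigl(1 + E^{n}\bigr)
      && \text{(per-bit success probability after $n$ nested layers)} \\
  I &\ge 2^{n}\bigl[\,1 - h(P_n)\bigr]
      \;\ge\; \frac{(2E^{2})^{n}}{2\ln 2}
      && \text{(using } 1 - h\bigl(\tfrac{1+y}{2}\bigr) \ge \tfrac{y^{2}}{2\ln 2}\text{)} \\
  \frac{(2E^{2})^{n}}{2\ln 2} &\le 1 \ \text{ for all } n
      \iff E \le \tfrac{1}{\sqrt{2}}
      && \text{(one-bit bottleneck, } C_H = 1\text{)}
\end{align}
```

If $E > 1/\sqrt{2}$, the lower bound grows without limit in $n$, so a one-bit bottleneck would eventually be exceeded; $E = 1/\sqrt{2}$ is the Tsirelson value for CHSH correlations.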

2605.09305 2026-05-12 stat.ME cs.HC cs.LG stat.CO stat.ML

Reinforcement Learning Measurement Model

Wenqian Xu, Feng Ji

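AI summary This paper proposes the Reinforcement Learning Measurement Model (RLMM) for the sequential process data generated by interactive assessments, addressing the poor scalability of conventional item response models and existing Markov decision process-based measurement models on larger tasks. Through a shared parametric action-value function, the model decouples person-level choice sensitivity from task-level value representation, which makes estimation more efficient. It combines a Boltzmann choice rule, a soft Bellman consistency penalty, and block-coordinate MAP estimation, and yields diagnostics for behaviorally critical decisions. Experiments show higher estimation accuracy and lower runtime on complex tasks, and an estimated person parameter that is positively associated with task performance.

Abstract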

Interactive assessments generate sequential process data that are not well handled by conventional item response models. Existing MDP-based measurement approaches, such as the Markov decision process measurement model (MDP-MM, LaMar, 2018), link action choices to state-action values, but their reliance on person-specific tabular value functions makes them difficult to scale beyond small, fully enumerated tasks. We propose the Reinforcement Learning Measurement Model (RLMM), a measurement framework that decouples person-level choice sensitivity from task-level value representation through a shared parametric action-value function, making estimation more computationally efficient for larger process-data settings. The model combines a Boltzmann choice rule with normalized advantages, a soft Bellman consistency penalty, and a block-coordinate MAP procedure for joint estimation, while also yielding step-level influence diagnostics for identifying behaviorally critical decisions. In peg-solitaire simulations, the RLMM achieved higher estimation accuracy and substantially lower runtime than the original MDP-MM, with advantages increasing as task complexity grew. In AQUALAB gameplay logs, the estimated person parameter was positively associated with cumulative reward, task completion, and behavioral efficiency. These results show that the RLMM extends decision-process-based psychometric models to larger and more behaviorally realistic environments while preserving an interpretable latent trait tied to decision making steps.
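A Boltzmann choice rule over normalized advantages can be sketched as below: action probabilities are a softmax of the person-level sensitivity times a rescaled advantage, so one person parameter controls how sharply choices concentrate on the best action. The specific normalization (advantage relative to the best action, divided by the range of values in the state) and the function name are assumptions for this example; the abstract does not specify the RLMM's exact form.

```python
import math

def choice_probabilities(q_values, sensitivity):
    """Softmax over range-normalized advantages, scaled by a person parameter."""
    best, worst = max(q_values), min(q_values)
    spread = best - worst
    # Normalized advantage lies in [-1, 0]; all zeros when actions are tied.
    advantages = [(q - best) / spread if spread > 0 else 0.0 for q in q_values]
    weights = [math.exp(sensitivity * a) for a in advantages]
    total = sum(weights)
    return [w / total for w in weights]
```

With sensitivity 0 every action is equally likely, and as sensitivity grows the probability mass moves toward the highest-valued action, which is what makes the parameter interpretable as decision-making proficiency.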

2605.09299 2026-05-12 cs.GR cs.LG

LagrangianSplats: Divergence-Free Transport of Gaussian Primitives for Fluid Reconstruction

Ningxiao Tao, Baoquan Chen, Mengyu Chu

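AI summary This paper proposes a method for reconstructing 3D fluid velocity fields from sparse 2D video, an inverse problem in which transport consistency and physical validity are usually enforced only through soft penalties. By combining a continuous divergence-free kernel representation with a Lagrangian 3D Gaussian Splatting representation, it guarantees flow incompressibility and long-range transport coherence by construction. A sliding-window optimization scheme keeps training costs tractable. Experiments show that the method outperforms existing approaches in transport consistency and physical accuracy, supporting high-quality re-simulation and flow analysis.

Abstract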

Reconstructing 3D fluid velocity fields from sparse 2D video observations is a highly ill-posed inverse problem, demanding both transport consistency with observed motion and physical validity under fluid laws. Existing methods typically impose these constraints through soft penalties, often leading to compromised accuracy and convergence issues. We introduce a reconstruction framework that structurally enforces both constraints. Specifically, we parameterize the reconstructed velocity using a continuous Divergence-Free Kernel representation, driving the advection of a Lagrangian 3D Gaussian Splatting representation. This formulation intrinsically guarantees both flow incompressibility and long-range transport coherence by construction. To enable the efficient optimization of such a constrained system, we introduce a novel Sliding Window scheme that propagates gradients over meaningful temporal horizons while maintaining tractable training costs. Experiments on synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art baselines in both transport consistency and physical accuracy, enabling applications such as high-quality re-simulation and flow analysis.
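One way to see how incompressibility can hold "by construction" is to build a velocity field as the rotated gradient of a scalar stream function, which is divergence-free for any smooth choice of that function. The 2D sketch below uses a single Gaussian stream function and checks the divergence numerically; it is an illustration of the general principle only, and the Gaussian form, the parameter `sigma`, and the function names are assumptions rather than the paper's Divergence-Free Kernel.

```python
import math

def velocity(x, y, sigma=1.0):
    """Velocity (d psi/dy, -d psi/dx) for psi = exp(-(x^2 + y^2) / (2 sigma^2))."""
    psi = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    return (-y / sigma ** 2 * psi, x / sigma ** 2 * psi)

def divergence(x, y, h=1e-4):
    """Central finite-difference estimate of du/dx + dv/dy."""
    du_dx = (velocity(x + h, y)[0] - velocity(x - h, y)[0]) / (2.0 * h)
    dv_dy = (velocity(x, y + h)[1] - velocity(x, y - h)[1]) / (2.0 * h)
    return du_dx + dv_dy
```

A weighted sum of such kernels centered at different points remains divergence-free, so the weights can be optimized freely without a soft incompressibility penalty.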

2605.09279 2026-05-12 cs.GR cs.CV cs.MM cs.NI eess.IV

CAGS: Color-Adaptive Volumetric Video Streaming with Dynamic 3D Gaussian Splatting

Daheng Yin, Yili Jin, Jianxin Shi, Isaac Ding, Miao Zhang, Fangxin Wang, Zhaowu Huang, Cong Zhang, Jiangchuan Liu, Fang Dong

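AI summary This paper presents CAGS, a color-adaptive volumetric video streaming system that addresses the bandwidth consumption and quality degradation of streaming dynamic 3D Gaussian Splatting in real time. It builds levels of detail (LoD) with vector quantization and corrects color distortion using low-resolution reference images. Experiments show that CAGS improves PSNR by 5 to 20 dB over existing adaptive streaming systems under fluctuating bandwidth, runs faster than existing scalable Gaussian compression methods, and generalizes across Gaussian representations.

Comments SIGGRAPH 2026 Conference Paper. Code is available at https://github.com/yindaheng98/ColorAdaptiveGaussianSplatting

Journal ref
ACM SIGGRAPH 2026
Abstract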

Volumetric video (VV) streaming enables real-time, immersive access to remote 3D environments, powering telepresence, ecological monitoring, and robotic teleoperation. These applications turn VV streaming into a real-time interface to remote physical environments, imposing new system-level demands for photorealistic scene representation, low-latency interaction, and robust performance under heterogeneous networks. 3D Gaussian Splatting (3DGS) has been widely used for real-time photorealistic rendering, offering superior visual quality and rendering performance, but it faces challenges due to bandwidth consumption. Furthermore, as the foundation of adaptive VV streaming, existing Levels of Detail (LoD) methods based on density are not well-suited to Gaussian representations, leading to visible gaps and severe quality degradation. Recent studies have also explored attribute compression techniques to reduce bandwidth consumption. Our preliminary studies reveal that aggressive attribute compression primarily causes color distortion, which can be effectively corrected in the rendered image using a reference image. Motivated by these findings, we propose a novel Color-Adaptive scheme for adaptive VV streaming that uses vector quantization (VQ) to establish LoDs and correct color distortions with low-resolution reference images. We further present CAGS, an adaptive VV streaming system compatible with diverse Gaussian representations, which integrates the Color-Adaptive scheme by rendering reference images on the streaming server and performing color restoration on the client. Extensive experiments on our prototype system demonstrate that CAGS outperforms the existing adaptive streaming systems in PSNR by 5$\sim$20 dB under fluctuating bandwidth, operates significantly faster than existing scalable Gaussian compression methods, and generalizes across different Gaussian representations.

2605.09242 2026-05-12 eess.IV cs.CV

Cross-Modal Semantic-Enhanced Diffusion Framework for Diabetic Retinopathy Grading

Yiqun Wang

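AI summary This paper proposes CLIP-Guided Semantic Diffusion (CGSD), a cross-modal semantic-enhanced diffusion framework that combines vision-language pretraining with diffusion probabilistic modeling for automated diabetic retinopathy grading. A domain-specific vision-language model is fine-tuned with Low-Rank Adaptation (LoRA) to narrow the distributional gap between the pretrained model and the target dataset. Dot products between image features and the text-description features of each grade form a cross-modal semantic conditioning vector for the diffusion denoising network, improving sensitivity to fine-grained lesion features and clinical semantics. On the APTOS 2019 dataset, the method reports higher accuracy and F1 score than the compared methods.

Comments 6 pages, 3 figures, 2 tables

Abstract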

Automated grading of diabetic retinopathy (DR) faces several critical challenges: subtle inter-grade visual distinctions in fine-grained lesion patterns, distributional discrepancies induced by heterogeneous imaging devices and acquisition conditions, and the inherent inability of purely visual approaches to exploit clinical semantic knowledge. In this paper, we propose CLIP-Guided Semantic Diffusion (CGSD), a DR grading framework that synergistically integrates vision-language pretraining with diffusion probabilistic modeling. We adopt a domain-specific vision-language model tailored for DR grading as the semantic guidance module and adapt it to the target domain via Low-Rank Adaptation (LoRA), effectively bridging the distributional gap between the pretrained model and the target dataset with only a minimal number of trainable parameters. Building on this foundation, we construct a cross-modal semantic conditioning vector by computing the dot product between image features and the text description features of each DR grade, yielding a joint representation that simultaneously encodes visual content and clinical-grade semantics. This vector serves as the conditioning signal for the diffusion denoising network, replacing the structurally complex dual-branch visual prior employed in existing diffusion-based classification methods. Experiments on the APTOS 2019 dataset demonstrate that the proposed approach achieves an accuracy of 87.5% and a macro-averaged F1 score of 0.731, outperforming a variety of representative methods. Ablation studies further validate the independent contribution of each constituent module.
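The conditioning vector described in the abstract has one entry per DR grade, each being the dot product between the image feature and that grade's text-description feature. A minimal sketch follows, with plain lists standing in for encoder outputs; the function name and the toy feature values are hypothetical, and any normalization or scaling the paper may apply is not shown.

```python
def conditioning_vector(image_feature, grade_text_features):
    """Return one image-text dot product per grade description."""
    return [sum(a * b for a, b in zip(image_feature, text_feature))
            for text_feature in grade_text_features]

# Toy example: a 2-dimensional image feature and three grade descriptions.
example = conditioning_vector([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The result has a fixed length equal to the number of grades regardless of the feature dimension, which keeps the conditioning signal passed to the denoising network compact.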

2605.07964 2026-05-12 stat.ML cs.LG

Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means

Valentin Kilian, Stefano Cortinovis, François Caron

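AI summary This paper proposes a method for constructing confidence sequences based on a Bayesian predictive model, providing time-uniform uncertainty quantification for the mean of bounded IID observations. At each step, the method selects the valid martingale update factor that maximises predictive expected log-growth, so it uses prior information to improve efficiency while preserving validity. The authors prove that the method is asymptotically log-optimal when the predictive distribution is Wasserstein-consistent, and experiments show clear gains in reducing confidence-interval width and sampling effort.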

Comments Valentin and Stefano are joint first authors

English abstract

Confidence sequences based on test martingales provide time-uniform uncertainty quantification for the mean of bounded IID observations without parametric distributional assumptions. Their practical efficiency, however, depends strongly on the choice of martingale updates, and many existing constructions do not exploit prior information about plausible data-generating distributions or mean values. We propose a Bayes-assisted framework that uses a Bayesian working predictive model to adaptively construct confidence sequences. For each candidate mean and time point, the predictive distribution selects, among valid one-step martingale factors, the update maximising predictive expected log-growth; validity is therefore preserved even when the prior or working model is misspecified. We prove that if the predictive distribution is Wasserstein-consistent, the resulting procedure is asymptotically log-optimal, matching the per-sample log-growth of an oracle procedure with access to the true distribution. We instantiate the framework using robust predictives based on Dirichlet-process mixtures and Bayesian exponentially tilted empirical likelihood. Experiments on synthetic data, sequential best-arm identification for LLM evaluation, and prediction-powered inference show that informative priors can substantially reduce confidence-sequence width and sampling effort while retaining anytime-valid coverage.
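To make the test-martingale idea concrete, here is a minimal sketch of a confidence sequence for a mean in [0, 1]. It deliberately uses a fixed bet size rather than the paper's predictive, log-growth-maximising update, so it only illustrates the validity mechanism: for each candidate mean m, a nonnegative wealth process is a martingale when m is the true mean, and by Ville's inequality m is rejected once wealth reaches 1/alpha. The bet size, the grid, and the Beta(2, 5) data are assumptions for illustration.

```python
import random


def confidence_set(xs, grid, alpha=0.05, bet=0.5):
    """Return the candidate means in `grid` not rejected at any time step.

    For each candidate m the wealth is the average of two processes that bet
    on the mean being above m and below m. With |bet| <= 0.5 and data in
    [0, 1], every factor stays positive, so the wealth is a valid
    nonnegative martingale when m is the true mean.
    """
    kept = []
    for m in grid:
        up, down = 1.0, 1.0
        rejected = False
        for x in xs:
            up *= 1.0 + bet * (x - m)
            down *= 1.0 - bet * (x - m)
            if 0.5 * (up + down) >= 1.0 / alpha:
                rejected = True  # rejection at any time is permanent
                break
        if not rejected:
            kept.append(m)
    return kept


random.seed(0)
xs = [random.betavariate(2, 5) for _ in range(2000)]  # true mean is 2/7
grid = [i / 100 for i in range(1, 100)]
kept = confidence_set(xs, grid)
```

A fixed bet keeps the code short but is generally far from log-optimal; the abstract's point is that choosing the bet adaptively from a predictive distribution can shrink the retained set faster without losing validity.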

2605.05743 2026-05-12 stat.ML cs.AI cs.LG

Fourier Feature Methods for Nonlinear Causal Discovery: FFML Scoring, TRFF Scoring, and FFCI Testing in Mixed Data

Joseph D. Ramsey

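AI summary This paper proposes three practical Fourier-feature-based methods that address the large-scale computational cost of nonlinear causal discovery. The FFML score approximates the Gaussian process marginal likelihood with a finite-dimensional feature representation, reducing computational complexity and supporting mixed data; the TRFF score uses penalized Student-t regression, which is more robust and runs faster; and the FFCI test is a fast nonparametric conditional independence test for mixed data. The methods show complementary strengths across data settings and improve the accuracy and efficiency of causal discovery.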

Comments 18 pages, 2 figures, 3 tables

English abstract

Gaussian process (GP) marginal likelihood scores and kernel conditional independence tests are theoretically appealing for nonlinear causal discovery but computationally prohibitive at scale. We present three complementary methods based on random Fourier features (RFF), forming a practical toolkit for score-based, constraint-based, and hybrid causal discovery. The Fourier Feature Marginal Likelihood (FFML) score approximates the exact GP marginal likelihood by replacing the $n \times n$ kernel Gram matrix with a finite-dimensional feature representation, reducing cost to $O(nm^2 + m^3)$ while retaining the probabilistic interpretation and automatic complexity penalty of the exact score. FFML extends to mixed (continuous and discrete) parent sets via a product-kernel construction, with a Kronecker path for small discrete parent sets and a Hadamard-product path otherwise. The Tetrad Random Fourier Feature (TRFF) score is a complementary BIC-style alternative using penalized Student-t regression with random Fourier features. TRFF offers robustness to heavy-tailed noise and faster runtime than FFML. Empirically, TRFF and FFML exhibit a complementary precision-recall profile: TRFF achieves higher precision while FFML achieves better recall and lower SHD overall. The Fourier Feature Conditional Independence (FFCI) test is a fast nonparametric CI test for mixed data, using ridge residualization in feature space and a Frobenius-norm cross-covariance statistic approximated as a weighted sum of chi-squared variables. Empirically, BOSS+FFML achieves the lowest SHD on nonlinear data, while BOSS+TRFF offers the highest precision. When run through PC-Max, FFCI and RCIT exhibit complementary precision-recall profiles: RCIT is more precise while FFCI achieves better recall and substantially lower SHD, at approximately twice the runtime.
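The feature-space replacement mentioned in the abstract rests on the standard random Fourier feature construction, sketched below for an RBF kernel. With m features, an n-point dataset is represented by an n-by-m feature matrix, so downstream linear algebra works with m-by-m matrices, which is where a cost of the form O(nm^2 + m^3) comes from. The lengthscale, feature count, and function names are assumptions for illustration; this is the generic construction, not the FFML score itself.

```python
import math
import random


def make_rff(dim, num_features, lengthscale, rng):
    """Build a feature map z(x) such that z(x) . z(y) approximates an RBF kernel."""
    # Frequencies are drawn from the kernel's spectral density, N(0, 1/lengthscale^2).
    freqs = [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        for _ in range(num_features)
    ]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]

    return features


def rbf_kernel(x, y, lengthscale):
    """Exact RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


rng = random.Random(0)
features = make_rff(dim=2, num_features=2000, lengthscale=1.0, rng=rng)
x, y = [0.3, -0.2], [0.8, 0.1]
approx = sum(a * b for a, b in zip(features(x), features(y)))
exact = rbf_kernel(x, y, 1.0)
```

The approximation error shrinks roughly as one over the square root of the number of features, so the feature count trades accuracy against the m-dependent cost.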

2605.05284 2026-05-12 cs.NE cs.LG q-bio.PE q-bio.QM

Direct From Darwin: Deriving Advanced Optimizers From Evolutionary First Principles

Daniel Grimmer

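AI summary Starting from the first principles of evolutionary theory, this paper directly derives a family of advanced gradient-based optimization algorithms, aiming to deliver both high-performance optimization tools and scientific simulations of Darwinian evolution. It introduces Darwinian Lineage Simulations (DLS), proves that Fisher's and Wright's opposing views of evolution are formally equivalent in the asexual setting, and proposes the DLS noise relation to keep the simulation faithful. Within this framework, many established optimization algorithms, such as stochastic gradient descent and Newton's method with its regularized forms, are shown to be compatible with evolutionary dynamics: adding genetic drift that satisfies the DLS noise relation is enough to obtain a scientific simulation of Darwinian evolution. Even the state-of-the-art Adam optimizer can be made evolutionarily consistent through a simple mathematical adjustment.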

Comments 38 pages, 5 figures. Submitted to Evolutionary Computation, May 2026. Code available at: https://github.com/danielgrimmer/adam-dls

English abstract

Evolutionary computation has long promised to deliver both high-performance optimization tools and rigorous scientific simulations of Darwinian evolution. However, modern algorithms frequently abandon evolutionary fidelity for physics-inspired heuristics or superficial biological metaphors. This paper derives a suite of advanced gradient-based optimization algorithms directly from evolutionary first principles. We introduce Darwinian Lineage Simulations (DLS) to prove that, in an asexual context, Fisher's and Wright's historically opposed views of evolution are actually formally equivalent: one can partition Fisher's deterministically-evolving total population into Wright's randomly-drifting sub-populations. We prove that proper bookkeeping requires introducing a specific kind of structured noise (the DLS noise relation). Crucially, any bookkeeping choices which satisfy this relation will yield a faithful simulation of evolution. Using this vast representational freedom, we prove that a broad family of battle-tested optimization algorithms are already perfectly compatible with evolutionary dynamics. These include Stochastic Gradient Descent as well as many regularizations/approximations of Newton's method and Natural Gradient Descent. By simply adding DLS noise (i.e., evolutionarily faithful genetic drift), these algorithms become scientifically valid in silico simulations of Darwinian evolution. Finally, we demonstrate that even the state-of-the-art Adam optimizer can be brought into evolutionary compliance through a minor mathematical surgery.

2605.04589 2026-05-12 stat.ML cs.LG math.ST stat.TH

Multiscale Euclidean Network Trajectories: Second-Moment Geometry, Attribution, and Change Points

Haruka Ezoe, Ryohei Hisano

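AI summary This paper studies geometric representations of how dynamic networks evolve over time and proposes Multiscale Euclidean Network Trajectories (MENT), a framework based on second-moment geometry. An isotropic normalization reduces the linear-transformation ambiguity in node embeddings to orthogonal transformations, which preserves geometric structure and supports analysis of temporal change at both the trajectory and node levels. The method supports mode-wise decomposition, change attribution, and change point detection, and shows good structure recovery and change point detection performance in experiments on synthetic and real dynamic networks.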

English abstract

A central challenge in dynamic network analysis is to represent temporal evolution in a way that is both geometrically meaningful and statistically identifiable. One approach embeds a sequence of network snapshots as trajectories in a Euclidean space and relates these trajectories to node embeddings. In multilayer and unfolded spectral constructions, however, node embeddings and their underlying latent positions are identifiable only up to general linear transformations. Although this ambiguity preserves edge probabilities, it can distort geometry and invalidate distance-based temporal comparisons at both the trajectory and node levels. We develop Multiscale Euclidean Network Trajectories (MENT), a framework for multiscale temporal trajectories based on second-moment geometry. By imposing an isotropic normalization on the anchor latent positions, we reduce the relevant ambiguity to orthogonal transformations and prevent distortion of the second-moment geometry. In this canonical representation, we define a trace variation distance and mode-wise variation distances along orthogonal directions, and use multidimensional scaling to obtain low-dimensional trajectories of time points at both global and mode-wise levels. The resulting trajectories support interpretation and inference. They admit mode-wise decompositions, support attribution of global and mode-wise temporal changes to nodes, and enable change point detection through 1D trajectories. We prove consistency of the proposed unfolded spectral embedding and of the induced temporal trajectories. Experiments on two synthetic and two real dynamic networks illustrate stable and interpretable recovery of temporal structure and show strong performance against existing change point detection baselines.

2605.01805 2026-05-12 cs.MA cs.LG

MAGIC: Multi-Step Advantage-Gated Causal Influence for Multi-agent Reinforcement Learning

Haohan Yu, Jinmiao Cong, Shengzhi Wang, Lu Wang, Chanjuan Liu

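AI summary A key challenge in multi-agent reinforcement learning is designing learning signals that effectively promote coordination among agents. The paper proposes MAGIC, a framework that uses multi-step advantage-gated causal influence estimation to convert multi-step action effects between agents into intrinsic rewards that guide exploration. The method uses counterfactual interventions to compare teammates' futures under factual and counterfactual branches and adds an advantage-based gate, significantly improving coordination in multi-agent environments. Experiments show that MAGIC outperforms existing methods on several benchmarks, with improvements ranging from 10.1% to 26.9%.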

English abstract

A key challenge in multi-agent reinforcement learning (MARL) lies in designing learning signals that effectively promote coordination among agents. Designing such signals requires estimating how one agent's current action affects its teammates over future interaction steps. To address this, we introduce Multi-step Advantage-Gated Interventional Causal MARL (MAGIC), a framework that estimates multi-step action effects between agents and selectively converts them into intrinsic rewards. MAGIC uses counterfactual action interventions to compare teammate futures under factual and counterfactual branches, and introduces a gate based on advantage to direct exploration toward beneficial behaviors aligned with the task goal. Experiments on Multi-Agent Particle Environments (MPE) and StarCraft micromanagement benchmarks (SMAC and SMACv2) show that MAGIC consistently outperforms leading prior methods, with average relative final performance improvements of 26.9% and 10.1%, respectively.

2605.00366 2026-05-12 cs.NE cs.LG

Geometric and dynamical analysis of attractor boundaries and storage limits in kernel Hopfield networks

Akira Tamamori

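AI summary This paper studies the geometric and dynamical properties of attractor boundaries, and the storage limit, in Hopfield networks trained with kernel logistic regression (KLR). Experiments on random sequences and real image embeddings (such as CIFAR-10), together with morphing and signal-to-noise ratio analysis, reveal how the network maintains stable retrieval under high load. The study finds that attractors on the ridge of optimization are separated by steep potential barriers and that the storage limit is governed mainly by dynamical stability rather than separability in feature space, offering a new perspective on the design of robust large-scale memory systems.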

Comments 10 pages, 6 figures

English abstract

High-capacity associative memories based on Kernel Logistic Regression (KLR) exhibit strong storage capabilities, but the dynamical and geometric mechanisms underlying their stability remain poorly understood. This paper investigates the global geometry of attractor basins and the mechanisms governing the storage limit in KLR-trained Hopfield networks. We combine empirical evaluations using random sequences and real-world image embeddings (CIFAR-10) with morphing experiments and statistical Signal-to-Noise Ratio (SNR) analysis. Our experiments show that the network achieves a storage capacity for random sequences up to $P/N \approx 16$, while maintaining stable retrieval for structured data at effective loads near $P/N \approx 20$. Morphing analysis indicates that attractors on the "Ridge of Optimization" are separated by sharp, phase-transition-like boundaries, characterized by steep effective potential barriers and critical slowing down. Furthermore, by comparing an SNR analysis with a geometric reference point inspired by Cover's theorem, we show that the practical storage limit is governed primarily not by a lack of geometric separability in the feature space, but by the loss of dynamical stability against crosstalk noise. These findings suggest that KLR networks function as highly localized exemplar-based memories that operate near the onset of dynamical collapse, providing a useful perspective on the design of robust, large-scale retrieval systems.

2604.26961 2026-05-12 cs.SE cs.AI cs.PL

Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding

Pengfei He, Shaowei Wang, Tse-Hsun Chen, Muhammad Asaduzzaman

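AI summary Static program slicing is an important software engineering technique for isolating code relevant to specific variables. This paper proposes Sliceformer, a language-model-based approach that adds dataflow-aware pretraining objectives and a constrained decoding mechanism, improving slice accuracy and reducing erroneous content in generated slices. Experiments show that the method clearly outperforms existing approaches on Java and Python benchmarks, with gains of up to 22% in ExactMatch.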

Comments Accepted at ACL 2026

English abstract

Static program slicing is a fundamental software engineering technique for isolating code relevant to specific variables. While recent learning-based approaches using language models (LMs) show promise in automating slice prediction, they suffer from inaccurate dependency modeling and unconstrained generation, where LMs fail to capture precise data flow relations and produce slices containing hallucinated tokens and statements. To address these challenges, we propose Sliceformer, a novel approach that reformulates static program slicing as a sequence-to-sequence task using small language models such as CodeT5+. Sliceformer introduces two key innovations that directly target the identified limitations. First, to improve dependency modeling, we design dataflow-aware pretraining objectives that leverage data flow graphs (DFG) to teach models data dependencies through dataflow-preserving statement permutation and dataflow-aware span corruption. Second, to eliminate hallucination, we develop a constrained decoding mechanism that enforces both lexical and syntactic constraints. We evaluate Sliceformer on Java and Python program slicing benchmarks, demonstrating consistent improvements over state-of-the-art baselines with up to 22% gain in ExactMatch.

2604.24428 2026-05-12 eess.SP cs.AI

BandRouteNet: An Adaptive Band Routing Neural Network for EEG Artifact Removal

Phat Lam

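AI summary This paper proposes BandRouteNet, an adaptive band-routing neural network that removes common artifacts such as electrooculographic (EOG) and electromyographic (EMG) interference from electroencephalography (EEG) signals. The method combines band-specific processing with full-band contextual modeling: a routing mechanism dynamically decides the denoising strength at each temporal location within each frequency band, and a full-band conditioner extracts global temporal information to assist denoising. Experiments show that BandRouteNet outperforms existing methods on several denoising metrics with only 0.2M parameters, making it efficient and promising for practical use.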

Comments Preprint version, 5 pages

English abstract

Electroencephalography (EEG) is highly susceptible to artifact contamination, such as electrooculographic (EOG) and electromyographic (EMG) interference, which severely degrades signal quality and hinders reliable interpretation in applications such as neurological diagnosis and brain-computer interfaces (BCIs). Effective EEG denoising remains challenging because different artifact sources exhibit diverse and temporally varying distributions, together with distinct spectral characteristics across frequency bands. To address these issues, we propose BandRouteNet, an adaptive frequency-aware neural network for EEG denoising that jointly exploits band-specific processing and full-band contextual modeling. The proposed model performs band-wise denoising to explicitly capture frequency-dependent artifact patterns. Within this framework, we introduce a routing mechanism that adaptively determines where and to what extent denoising should be applied across temporal locations within each frequency band. In parallel, a full-band conditioner directly processes the original noisy EEG to extract global temporal context, producing both conditional parameters for modulating the band-wise pathway and a coarse-grained signal-level refinement to supplement the final reconstruction. Extensive experiments on the EEGDenoiseNet benchmark dataset demonstrate that BandRouteNet outperforms other methods under EOG, EMG, and mixed-artifact conditions in terms of Relative Root Mean Square Error (RRMSE) and Signal-to-Noise Ratio Improvement (SNR$_{\text{imp}}$) under unified experimental settings, while remaining highly parameter-efficient with only 0.2M trainable parameters. These results highlight its strong potential for high-performance EEG artifact removal in resource-constrained applications.

2604.23904 2026-05-12 stat.ME cs.AI stat.ML

Generative Synthetic Data for Causal Inference: Pitfalls, Remedies, and Opportunities

Yichen Xu

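AI summary This paper examines whether generative synthetic data is valid for causal inference and shows that conventional generative models, although strong on predictive performance, can distort average treatment effect (ATE) estimates. It analyzes the structural tension between preserving the covariate distribution and preserving accurate treatment effects, and proposes a hybrid generative framework that separates covariate generation from the modeling of treatment and outcome mechanisms to improve causal accuracy. Experiments show that the method substantially improves causal fidelity over fully generative models across a range of settings.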

English abstract

Synthetic tabular data are often evaluated by distributional similarity, privacy distance, or train-on-synthetic-test-on-real predictive performance, but these criteria do not ensure validity for causal inference. We show that fully generative tabular synthesizers, including GAN- and LLM-based models, can preserve predictive utility while distorting average treatment effect (ATE) estimates. The failure is structural: ATE preservation requires both a realistic covariate law and an accurate treatment-effect contrast, whereas prediction loss penalizes treatment-effect error only through an overlap-weighted term. We formalize this mismatch through sensitivity and loss-decomposition results, and identify an analogous decomposition in block-level next-token prediction under log loss. Motivated by the tabular causal analysis, we propose a hybrid synthetic-data framework that generates covariates while modeling treatment and outcome mechanisms separately, allowing causal-purpose treatment assignment such as randomized synthetic assignment. We evaluate this framework in three settings: ATE preservation under fully generative versus hybrid synthesis, targeted augmentation for practical positivity problems, and synthetic simulation engines for comparing OR, IPW, AIPW, and TMLE before real-data analysis. Across synthetic and ACTG experiments, hybrid synthesis improves causal fidelity relative to fully generative baselines; LLM-based hybrid synthesis is often more faithful than CTGAN for ATE preservation and finite-sample estimator benchmarking.
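The abstract mentions randomized synthetic assignment and estimators such as IPW. As a minimal sketch of why a known, randomized propensity makes the ATE straightforward to recover, the example below simulates data with a constant treatment effect and compares an inverse-probability-weighted estimate with a plain difference in means. The data-generating process, the true effect of 2.0, the propensity of 0.5, and all function names are assumptions for illustration and are not taken from the paper's framework.

```python
import random


def simulate(n, rng, true_ate=2.0, propensity=0.5):
    """Generate (covariate, treatment, outcome) rows with randomized treatment."""
    rows = []
    for _ in range(n):
        x = rng.random()
        t = 1 if rng.random() < propensity else 0
        y = true_ate * t + x + rng.gauss(0.0, 0.1)
        rows.append((x, t, y))
    return rows


def ipw_ate(rows, propensity):
    """Inverse-probability-weighted ATE estimate with a known propensity."""
    total = 0.0
    for _, t, y in rows:
        total += t * y / propensity - (1 - t) * y / (1.0 - propensity)
    return total / len(rows)


def difference_in_means(rows):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rng = random.Random(0)
rows = simulate(20000, rng)
ipw_estimate = ipw_ate(rows, 0.5)
dim_estimate = difference_in_means(rows)
```

Both estimators target the same quantity here because assignment is independent of the covariate; when a synthesizer distorts the treatment mechanism or the outcome contrast, that agreement is no longer guaranteed.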

2604.16838 2026-05-12 cs.CR cs.AI cs.MA

enclawed: A Configurable, Sector-Neutral Hardening Framework for Single-User AI Assistant Gateways

Alfredo Metere

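AI summary This paper proposes enclawed, a configurable hardening framework for AI assistant gateways, designed for regulated industries such as finance, healthcare, and defense that need trusted peer connections, deny-by-default external communication, signed-module loading, and tamper-evident audit trails. The framework offers two modes: one stays compatible with OpenClaw while emitting audit and data-loss-prevention signals, and the other enables strict allowlists, FIPS cryptographic-module assertion, and high-assurance peer attestation. The work also introduces a data-driven classification mechanism and a range of security test cases to improve the robustness and security of AI gateways against malicious attacks.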

English abstract

We present enclawed, a hard-fork hardening framework built on the OpenClaw AI assistant gateway. enclawed targets deployments that need attestable peer trust, deny-by-default external connectivity, signed-module loading, and a tamper-evident audit trail -- typically regulated industries (financial services, healthcare, defense, government). The framework ships in two flavors: an open flavor preserving OpenClaw compatibility while emitting audit, classification, and data-loss-prevention (DLP) signals, and an enclaved flavor activating strict allowlists, FIPS cryptographic-module assertion, mandatory manifest signature verification, and high-assurance peer attestation for the Model Context Protocol. The classification ladder is data-driven: deployers pick from five built-in presets or supply their own JSON. We ship a 356-case test suite (261 unit + 95 adversarial pen-tests) covering tamper detection, signature forgery, egress bypass, audit-log truncation, trust-root mutation, DLP evasion, prompt injection, code injection, and biconditional admission for net-capable extensions; real-time human-in-the-loop control; a memory-bounded transaction buffer with rollback; strict-mode TypeScript typecheck; and a CI workflow. The biconditional extension-admission gate extends the skill trust schema to non-skill extensions. The four-level verification lattice is now closed at the top: four skill-formal-* primitives plus a CLI produce a signed proof-carrying bundle the runtime re-checks at load, raising a skill from tested to formal via static effect-containment, refinement-typed dispatch, and bounded model checking. enclawed is a hardening framework, not an accredited certification; hardware, validated crypto, facilities, and assessor sign-off remain the deployer's responsibility.
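One common way to obtain a tamper-evident audit trail of the kind the abstract describes is a hash chain, in which each entry commits to the hash of the previous one. The sketch below is a generic illustration of that technique, not enclawed's actual implementation; the record layout, the genesis value, and the function names are assumptions. Detecting truncation additionally assumes the latest head hash is stored somewhere the attacker cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first entry


def entry_hash(prev_hash, event):
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event to the log and return the new head hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    digest = entry_hash(prev_hash, event)
    log.append({"event": event, "prev": prev_hash, "hash": digest})
    return digest


def verify_log(log, expected_head=None):
    """Recompute the chain; optionally compare against a trusted head hash."""
    prev_hash = GENESIS
    for entry in log:
        digest = entry_hash(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != digest:
            return False  # an entry was modified, reordered, or removed mid-chain
        prev_hash = digest
    # Without a trusted head, dropping trailing entries would go unnoticed.
    return expected_head is None or prev_hash == expected_head


audit_log = []
append_entry(audit_log, {"action": "load_module", "module": "example"})
head = append_entry(audit_log, {"action": "egress_denied", "host": "example.invalid"})
```

Calling `verify_log(audit_log, head)` succeeds for the untouched log, while editing any stored event or dropping the last entry makes it fail.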

2604.16349 2026-05-12 cs.IR cs.AI cs.CL

Benchmarking Real-Time Question Answering via Executable Code Workflows

Wenjie Zhou, Yuan Gao, Xin Zhou, Hao Fu, Zhongjian Miao, Wei Chen, Bo Chen, Xiaobing Zhao

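AI summary This paper proposes RT-QA, a dynamic real-time question answering evaluation framework based on executable code workflows, to address the failure of existing benchmarks to capture information timeliness and knowledge evolution. The framework autonomously generates code for web crawling and DOM parsing to produce real-time answers, and includes a self-repair mechanism that adapts to changes in web page structure. Experiments show that current state-of-the-art models have significant limitations in real-time adaptability and reveal two main failure modes, lazy retrieval and temporal confusion, which offer guidance for the design of future agents.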

English abstract

Retrieving real-time information is a fundamental capability for search-integrated agents in real-world applications. However, existing benchmarks are predominantly static and therefore fail to capture the temporal dynamics of information and the continuously evolving nature of real-world knowledge. To address this limitation, we propose RT-QA, a dynamic evaluation framework that leverages executable code workflows to retrieve up-to-date answers at evaluation time. Specifically, we construct an agent-driven pipeline that autonomously generates code for web crawling and DOM-based answer extraction to produce real-time ground truth. To ensure robust evaluation over time, the pipeline further incorporates a self-repair mechanism to adapt to changes in web page structures. RT-QA spans 12 domains (e.g., Finance, Sports) with 320 Chinese questions categorized into three difficulty levels. Extensive evaluations of state-of-the-art models (e.g., GPT-5.2, GLM-4.7) reveal significant limitations in real-time adaptability: even the best models achieve only 46% accuracy. Our analysis highlights two primary failure modes: (1) Lazy Retrieval, where agents rely on search snippets instead of deeply scanning specific websites for information (20% of failures); and (2) Temporal Confusion, a cognitive error where agents retrieve a historical date (e.g., an event in 2024) and fail to re-anchor to the current time (2026) for subsequent reasoning. These findings suggest that future agents require not just better retrieval strategies, but robust temporal state management.

2604.13630 2026-05-12 cs.CR cs.AI

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Xixun Lin, Yang Liu, Yancheng Chen, Yongxuan Wu, Yucheng Ning, Yilong Liu, Nan Sun, Shun Zhang, Bin Chong, Chuan Zhou, Yanan Cao

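AI summary This paper proposes SafeHarness, a security architecture that addresses the security risks of the execution harness in deployments of large language model (LLM) based agents. The architecture embeds four defense layers directly into the agent lifecycle: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action execution, and safe rollback with adaptive degradation at state update, so that potential attacks are detected and mitigated. Experiments show that SafeHarness substantially lowers the unsafe behavior rate and attack success rate across multiple attack scenarios while preserving task effectiveness.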

Comments 26 pages, 6 figures

English abstract

The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from structural mismatch, leaving them blind to harness-internal state and unable to coordinate across the different phases of agent operation. In this paper, we introduce SafeHarness, a security architecture in which four proposed defense layers are woven directly into the agent lifecycle to address these limitations: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action execution, and safe rollback with adaptive degradation at state update. The proposed cross-layer mechanisms tie these layers together, escalating verification rigor, triggering rollbacks, and tightening tool privileges whenever sustained anomalies are detected. We evaluate SafeHarness on benchmark datasets across diverse harness configurations, comparing against four security baselines under five attack scenarios spanning six threat categories. Compared to the unprotected baseline, SafeHarness achieves an average reduction of approximately 38% in UBR and 42% in ASR, substantially lowering both the unsafe behavior rate and the attack success rate while preserving core task utility.