arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.11645 2026-05-13 cs.MA cs.LG q-fin.ST

GeomHerd: A Forward-looking Herding Quantification via Ricci Flow Geometry on Agent Interactive Simulations

Lake Yang, Junwei Su, Jingfeng Zeng, Wenhao Lu, Xingzhi Qian, Weitong Zhang, Chuan Wu, Dunhong Jin

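AI summary: This paper proposes GeomHerd, a forward-looking framework for quantifying herding among market agents. The method builds on Ricci-curvature geometry and measures coordination structure directly on agent-interaction graphs, avoiding the lag of traditional price-correlation statistics. By tracking the discrete Ollivier-Ricci curvature of agent action graphs, GeomHerd anticipates market herding ahead of time and shows better predictive performance than traditional indicators across several experimental settings.

Abstract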

Herding -- where agents align their behaviors and act collectively -- is a central driver of market fragility and systemic risk. Existing approaches to quantify herding rely on price-correlation statistics, which inherently lag because they only detect coordination after it has already moved realised returns. We propose GeomHerd, a forward-looking geometric framework that bypasses this observability lag by quantifying coordination directly on upstream agent-interaction graphs. To generate these graphs, we treat a heterogeneous LLM-driven multi-agent simulator -- each financial trader instantiated by a persona-conditioned LLM call -- as a forecastable world, and evaluate the geometric pipeline on the Cividino--Sornette continuous-spin agent-based substrate as our headline financial testbed. By tracking the discrete Ollivier--Ricci curvature of these action graphs, GeomHerd captures the structural topology of emerging coordination. Theoretically, we establish a mean-field bridge mapping our graph-theoretic metric to CSAD, the classical macroscopic herding statistic, linking GeomHerd to downstream price-dispersion measurement. Empirically, GeomHerd anticipates herding long before aggregate market baselines: on the continuous-spin substrate, our primary detector fires a median of 272 steps before order-parameter onset; a contagion detector ($β_{-}$) recalls 65% of critical trajectories 318 steps early; and on co-firing trajectories the agent-graph signal precedes price-correlation-graph baselines by 40 steps. As a complementary indicator, the effective vocabulary of agent actions contracts during cascades. The geometric signature transfers out-of-domain to the Vicsek self-driven-particle model, and a curvature-conditioned forecasting head reduces cascade-window log-return MAE over detector-conditioned and price-only baselines.
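
For readers unfamiliar with CSAD, the downstream statistic the abstract refers to, the following is a minimal sketch of the usual textbook definition: the mean absolute deviation of individual asset returns from the market return in one period. It is not the paper's mean-field bridge or its curvature pipeline. The function name `csad`, the sample data, and the use of an equal-weighted mean as the market return are assumptions made for illustration.

```python
def csad(returns):
    """Cross-sectional absolute deviation (CSAD) for a single period.

    `returns` holds one return per asset. The market return is taken to be
    the equal-weighted cross-sectional mean (an assumption of this sketch).
    """
    n = len(returns)
    market = sum(returns) / n
    return sum(abs(r - market) for r in returns) / n


# Hypothetical one-period returns for four assets.
dispersed = [0.04, -0.03, 0.01, -0.02]  # assets move independently
herded = [0.021, 0.019, 0.020, 0.020]   # assets move together

print(csad(dispersed), csad(herded))
```

Low dispersion around the market return is what price-based herding tests look for, which is why such a statistic can only react after coordination has already shown up in returns.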

2605.11644 2026-05-13 cs.FL cs.LG

Finite Sentence-Interface Control for Learning Bounded-Fan-Out Linear MCFGs under Fixed Monoid Typing

Takayuki Kuriyama

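AI summary: This paper studies positive-data learning of bounded-fan-out linear multiple context-free grammars under a fixed finite monoid homomorphism. The main challenge is that the components of a tuple derived by a nonterminal may appear in a sentence in different orders. To handle this, the paper introduces "sentence-interface types" as a finite external control mechanism that records the permutation of the tuple components and the values of the boundary intervals between them in the sentence. By constructing a typed refinement, a finite characteristic sample, and a positive-data learner, the paper shows that, for a fixed fan-out bound and a fixed homomorphism, the grammar class is identifiable in the limit from positive data and that the hypothesis can be constructed in polynomial time.

Abstract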

We study positive-data learning of bounded-fan-out linear multiple context-free grammars under a fixed explicit finite monoid homomorphism \(h\). The main obstacle beyond the context-free case is that an MCFG nonterminal derives a tuple whose components may be placed in a surrounding sentence in different orders. We introduce sentence-interface types as finite external control objects for such tuple occurrences. A type records the permutation of tuple components in the final sentence together with the \(h\)-values of the boundary intervals between them. For reduced working binary linear nondeleting MCFG presentations whose string languages satisfy \((f,h)\)-tuple substitutability, we build a typed refinement, a finite characteristic sample, and a canonical positive-data learner. Once the sample contains this characteristic sample and remains contained in the target language, the learner reconstructs the language exactly. Consequently, for fixed fan-out bound \(f\) and fixed explicit \(h\), the resulting class is identifiable in the limit from positive data. Moreover, the hypothesis associated with any given finite sample is constructible in polynomial time for fixed \(f\) and fixed \(h\), including output size. Thus sentence-interface control is the finite mechanism that lifts fixed-\(h\) distributional reconstruction from context-free grammars to bounded-fan-out linear MCFGs.

2605.11638 2026-05-13 stat.ML cs.LG

Learning U-Statistics with Active Inference

Xiaoning Wang, Yuyang Huo, Liuhua Peng, Changliang Zou

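AI summary: This paper studies how active inference can improve the estimation efficiency of U-statistics when labels are costly to acquire. The authors propose a U-statistic framework based on augmented inverse probability weighting that incorporates the sampling rule and machine learning predictions, design an optimal sampling strategy that minimizes variance, and extend the framework to U-statistic-based empirical risk minimization. Experiments show that the method substantially improves estimation efficiency while preserving valid statistical inference.

Abstract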

$U$-statistics play a central role in statistical inference. In many modern applications, however, acquiring the labels required for $U$-statistics is costly. Motivated by recent advances in active inference, we develop an active inference framework for $U$-statistics that selectively queries informative labels to improve estimation efficiency under a fixed labeling budget, while preserving valid statistical inference. Our approach is built on the augmented inverse probability weighting $U$-statistic, which is designed to incorporate the sampling rule and machine learning predictions. We characterize the optimal sampling rule that minimizes its variance and design practical sampling strategies. We further extend the framework to $U$-statistic-based empirical risk minimization. Experiments on real datasets demonstrate substantial gains in estimation efficiency over baseline methods, while maintaining target coverage.
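
To make the idea of an augmented inverse probability weighting (AIPW) $U$-statistic concrete, here is a minimal sketch for a degree-2 kernel. It uses the standard product-weight correction and assumes each label is queried independently with a known probability; the paper's exact estimator and its optimal sampling rule are not given in the abstract and may differ. The names `aipw_u_statistic` and `variance_kernel` are hypothetical.

```python
from itertools import permutations


def aipw_u_statistic(kernel, y, pred, labeled, prob):
    """AIPW-style degree-2 U-statistic (illustrative sketch).

    kernel  : symmetric function of two values
    y       : true labels; y[i] is only read when labeled[i] is True
    pred    : machine learning predictions, available for every unit
    labeled : labeled[i] is True if the label of unit i was queried
    prob    : prob[i] is the (assumed known) probability of querying unit i
    """
    n = len(pred)
    total = 0.0
    for i, j in permutations(range(n), 2):
        # Prediction-based term, computable for every pair.
        term = kernel(pred[i], pred[j])
        # Inverse-probability-weighted correction on fully labeled pairs.
        if labeled[i] and labeled[j]:
            correction = kernel(y[i], y[j]) - kernel(pred[i], pred[j])
            term += correction / (prob[i] * prob[j])
        total += term
    return total / (n * (n - 1))


def variance_kernel(a, b):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2
```

When every label is queried with probability 1, the prediction terms cancel and the estimator reduces to the ordinary $U$-statistic on the true labels. When predictions are accurate, the correction terms are small, which is where the efficiency gain comes from.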

2605.11583 2026-05-13 eess.IV cs.AI cs.CV cs.LG eess.SP

NexOP: Joint Optimization of NEX-Aware k-space Sampling and Image Reconstruction for Low-Field MRI

Tal Oved, Efrat Shimron

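AI summary: This paper proposes NexOP, a deep-learning framework that addresses the low signal-to-noise ratio of low-field MRI by jointly optimizing the k-space sampling strategy for multi-repetition (NEX) acquisitions and the image reconstruction. The method optimizes sampling density probabilities over the extended k-space-NEX domain to obtain more efficient sampling under a fixed sampling budget, and it introduces a new deep-learning architecture that reconstructs a high-quality image from multiple low-SNR measurements. Experiments show that NexOP outperforms existing methods across a range of acceleration factors and tissue contrasts, and that it produces non-uniform sampling schemes that exploit the NEX dimension to improve imaging efficiency and quality.

Abstract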

Modern low-field magnetic resonance imaging (MRI) technology offers a compelling alternative to standard high-field MRI, with portable, low-cost systems. However, its clinical utility is limited by a low Signal-to-Noise Ratio (SNR), which hampers diagnostic image quality. A common approach to increase SNR is through repetitive signal acquisitions, known as NEX, but this results in excessively long scan durations. Although recent work has introduced methods to accelerate MRI scans through k-space sampling optimization, the NEX dimension remains unexploited; typically, a single sampling mask is used across all repetitions. Here we introduce NexOP, a deep-learning framework for joint optimization of the sampling and reconstruction in multi-NEX acquisitions, tailored for low-SNR settings. NexOP enables optimizing the sampling density probabilities across the extended k-space-NEX domain, under a fixed sampling-budget constraint, and introduces a new deep-learning architecture for reconstructing a single high-SNR image from multiple low-SNR measurements. Experiments with raw low-field (0.3T) brain data demonstrate that NexOP consistently outperforms competing methods, both quantitatively and qualitatively, across diverse acceleration factors and tissue contrasts. The results also demonstrate that NexOP yields non-uniform sampling strategies, with progressively decreasing sampling across repetitions, hence exploiting the NEX dimension efficiently. Moreover, we present a theoretical analysis supporting these numerical observations. Overall, this work proposes a sampling-reconstruction optimization framework highly suitable for low-field MRI, which can enable faster, higher-quality imaging with low-cost systems and contribute to advancing affordable and accessible healthcare.

2605.11531 2026-05-13 physics.ao-ph cs.LG stat.AP

Generative climate downscaling enables high-resolution compound risk assessment by preserving multivariate dependencies

Takuro Kutsuna, Noriko N. Ishizaki, Norihiro Oyama, Hiroaki Yoshida

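AI summary: This study proposes a diffusion-based multivariate generative framework for producing high-resolution climate data and improving the accuracy of compound risk assessment. Combined with bias correction, the method recovers inter-variable correlations that degrade when resolution is increased, and so captures the dependence structure of compound hazards such as drought and heat stress more accurately. Experiments show that the method improves univariate and spatial accuracy while markedly reducing inter-variable correlation errors, providing a more reliable basis for regional climate risk assessment.

Abstract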

Physics-based climate projections using general circulation models are essential for assessing future risks, but their coarse resolution limits regional decision-making. Statistical downscaling can efficiently add detail, yet many methods treat variables independently, degrading inter-variable relationships that govern compound hazards such as heat stress, drought, and wildfire. Here we show that a diffusion-based multivariate generative framework, combined with bias correction, recovers degraded inter-variable correlations even under a 50$\times$ increase in linear resolution. When applied to five meteorological variables over Japan, the framework reduces inter-variable correlation errors by more than fourfold relative to existing baselines while improving both univariate and spatial accuracy, leading to more accurate detection of severe drought. These results demonstrate that multivariate generative downscaling improves the reliability of compound risk assessment under large resolution gaps.

2605.11526 2026-05-13 math.OC cs.AI cs.LG

Efficient and provably convergent end-to-end training of deep neural networks with linear constraints

Zonglin Yang, Zhexuan Gu, Yancheng Yuan

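AI summary: This paper studies efficient, theoretically grounded end-to-end training of deep neural networks with linear constraints. To handle the nonsmoothness introduced by projection layers, the authors introduce an efficiently computable HS-Jacobian and prove that it is a conservative mapping for the projection operator onto a polyhedral set, so that it integrates seamlessly into the nonsmooth automatic differentiation framework. This allows efficient optimizers such as Adam to be used for training such networks, with convergence guarantees. Experiments show strong performance in finance, computer vision, and other domains.

Abstract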

Training a deep neural network with the outputs of selected layers satisfying linear constraints is required in many contemporary data-driven applications. While this can be achieved by incorporating projection layers into the neural network, its end-to-end training remains challenging due to the lack of rigorous theory and efficient algorithms for backpropagation. A key difficulty in developing the theory and efficient algorithms for backpropagation arises from the nonsmoothness of the solution mapping of the projection layer. To address this bottleneck, we introduce an efficiently computable HS-Jacobian to the projection layer. Importantly, we prove that the HS-Jacobian is a conservative mapping for the projection operator onto the polyhedral set, enabling its seamless integration into the nonsmooth automatic differentiation framework for backpropagation. Therefore, many efficient algorithms, such as Adam, can be applied for end-to-end training of deep neural networks with linear constraints. In particular, we establish convergence guarantees of the HS-Jacobian-based Adam algorithm for training linearly constrained deep neural networks. Extensive experimental results on several important applications, including finance, computer vision, and network architecture design, demonstrate the superior performance of our method compared to other existing popular methods.

2605.11511 2026-05-13 stat.ML cs.LG

Post-ADC Inference: Valid Inference After Active Data Collection

Shuichi Nishino, Tomohiro Shiraishi, Teruyuki Katsuoka, Ichiro Takeuchi

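AI summary: This paper studies the validity of statistical inference after active data collection (ADC), noting that conventional inference can fail because of the adaptive bias of the collection process. The authors propose the post-ADC inference framework, which builds on selective inference to correct the biases from both the data collection process and the subsequent data-driven target construction, yielding valid p-values and confidence intervals. The method requires assumptions only on the observation noise and applies to many ADC processes. Experiments show valid inference on data collected by GP-UCB and TPE.

Abstract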

The validity of statistical inference depends critically on how data are collected. When data gathered through active data collection (ADC) are reused for a post-hoc inferential task, conventional inference can fail because the sampling is adaptively biased toward regions favored by the collection strategy. This issue is especially pronounced in black-box optimization, where sequential model-based optimization (SMBO) methods such as the tree-structured Parzen estimator (TPE) and Gaussian process upper confidence bound (GP-UCB) preferentially concentrate evaluations in promising regions. We study statistical inference on actively collected data when the inferential target is constructed in a data-dependent manner after data collection. To enable valid inference in this setting, we propose post-ADC inference, a framework that accounts for the biases arising from both the active data collection process and the subsequent data-driven target construction. Our method builds on selective inference and provides valid $p$-values and confidence intervals that correct for both sources of bias. The framework applies to a broad class of ADC processes by imposing only assumptions on the observation noise, without requiring any assumptions on the underlying black-box function or the surrogate model used by the SMBO algorithm. Empirical results also show that post-ADC inference provides valid inference for data collected by GP-UCB and TPE.

2605.11501 2026-05-13 cs.SE cs.AI cs.CR

Decaf: Improving Neural Decompilation with Automatic Feedback and Search

Alexander Shypula, Osbert Bastani, Edward Schwartz

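AI summary: This paper proposes Decaf, a neural decompilation system that substantially improves the semantic correctness of decompiled output through automatic feedback and search. Rather than relying on more training data, the method uses compiler feedback to guide the search, raising the decompilation success rate from 26.0% to 83.9% while preserving similarity to the original source code. Experiments show that the approach is especially effective at improving weaker neural decompilation models.

Comments: 15 pages, 6 figures. Preprint; under review. Code and models available at https://github.com/AlexShypula/decaf

Abstract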

Decompilers are tools used in reverse engineering to understand compiled code. Reconstructing source code from compiled binaries is a challenging task, because high-level syntax, identifiers, and custom data types are generally lost as the compiler translates human-readable code to low-level machine code. Deterministic decompilers are useful tools for binary analysis, but can struggle to infer idiomatic syntax and identifier names. Generative AI models are a natural fit for reconstructing high-level syntax, identifiers, and types, but they can still hallucinate improper programming constructs and semantics. Instead of attempting to improve neural decompilers with more data and more training, we argue that compiler feedback can be used to dramatically improve the semantic correctness of neural decompiler outputs via search. Our system, Decaf (DECompilation with Automated Feedback), raises the neural decompilation rate on the Real -O2 split of ExeBench from 26.0% to 83.9% without sacrificing similarity to the original source code. We also find our automatic feedback methodology is highly effective for improving weaker neural decompilation models.

2605.11489 2026-05-13 cs.GR cs.CV

3DGS$^3$: Joint Super Sampling and Frame Interpolation for Real-Time Large-Scale 3DGS Rendering

Yibo Zhao, Fan Gao, Youcheng Cai, Ligang Liu

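AI summary: 3DGS$^3$ is a unified post-rendering framework that targets the efficiency bottlenecks of real-time 3D Gaussian Splatting (3DGS) rendering for very large scenes at high resolution. It jointly performs super sampling and frame interpolation through differentiable processing of low-resolution outputs to achieve high-resolution, high-frame-rate rendering. Its core modules, a Gradient-Aware Super Sampling (GASS) network and a Lightweight Temporal Frame Interpolation (LTFI) network, improve spatial detail and temporal coherence respectively. Experiments show that the method outperforms existing methods in rendering efficiency and visual quality and is compatible with existing 3DGS acceleration techniques.

Abstract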

3D Gaussian Splatting (3DGS) enables high-quality real-time 3D rendering but faces challenges in efficiently scaling to ultra-dense scenes and high resolutions due to computational bottlenecks that limit its use in latency-sensitive applications. Instead of optimizing the splatting pipeline itself, we propose 3DGS$^3$, a unified post-rendering framework that jointly performs super sampling and frame interpolation through differentiable processing of low-resolution outputs to achieve both high-resolution and high-frame-rate rendering. Our Gradient-Aware Super Sampling (GASS) module leverages the continuous differentiability of 3DGS to extract image gradients that guide a GRU-based refinement network to enable high-fidelity super sampling. Furthermore, a Lightweight Temporal Frame Interpolation (LTFI) module based on a compact U-Net-like backbone fuses temporal and differentiable spatial cues from consecutive frames to synthesize temporally coherent intermediate frames. Experiments on public datasets demonstrate that 3DGS$^3$ achieves superior rendering efficiency and visual quality when compared with state-of-the-art methods and remains compatible with existing 3DGS acceleration techniques. The code will be publicly released upon acceptance.

2605.11487 2026-05-13 cs.CR cs.AI cs.MA

Digital Identity for Agentic Systems: Toward a Portable Authorization Standard for Autonomous Agents

Partha Madhira

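AI summary: As enterprise AI shifts from assistive tools to autonomous agents that execute tasks, negotiate outcomes, and make decisions, identity authentication alone is no longer sufficient: an agent's authorization must be explicit, constrained, auditable, revocable, and consistently interpretable across trust boundaries. By analyzing representative enterprise scenarios in insurance claims and supply chain integrity, the paper exposes structural gaps in existing identity and access models and proposes a portable authorization model based on authorization payloads, a constraint algebra, and decision-consistent evaluation semantics, aiming to give autonomous agents a uniform authorization standard across organizations and systems.

Comments: 46 pages, 10 figures

Abstract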

Enterprise AI is shifting from copilots to autonomous agents capable of executing workflows, negotiating outcomes, and making decisions with limited human oversight. As these systems extend across organizational boundaries, identity alone is insufficient: an agent's authority must also be explicit, constrained, auditable, revocable, and consistently interpretable by independent receivers. This paper analyzes representative enterprise use cases in insurance claims processing and supply chain integrity to surface structural gaps in existing identity and access models. It proposes a portable authorization model for autonomous agents based on issuer-authored authorization payloads, typed constraint algebra, decision-consistent evaluation semantics, delegation attenuation, governed semantic resolution, fail-closed processing, and pre-flight discovery. The model separates credential containers, authorization payload semantics, and enforcement engines, allowing profiles such as JWT/JWS, Verifiable Credentials, OAuth Rich Authorization Requests, or policy-engine bindings to preserve a common authorization meaning across trust boundaries.

2605.11447 2026-05-13 cs.IR cs.AI

Conditional Memory Enhanced Item Representation for Generative Recommendation

Ziwei Liu, Yejing Wang, Shengyu Zhou, Xinhang Li, Xiangyu Zhao

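AI summary: Generative recommendation (GR) is an emerging paradigm that predicts target items by autoregressively generating their semantic identifiers (SIDs). Existing methods for building item-level representations face a conflict between information loss and structure preservation. This paper proposes ComeIR, a conditional-memory-enhanced item representation framework that uses multimodal-guided token scoring, a dual-level memory module, and a memory-restoring prediction head to recover the structural information and granularity of SIDs, markedly improving the effectiveness and flexibility of generative recommendation.

Abstract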

Generative recommendation (GR) has emerged as a promising paradigm that predicts target items by autoregressively generating their semantic identifiers (SID). Most GR methods follow a quantization-representation-generation pipeline, first assigning each item a SID, then constructing input representations from SID-token embeddings, and finally predicting the target SID through autoregressive generation. Existing item-level representation constructions mainly take two forms: directly merging SID-token embeddings into a compact vector, or enriching item-level representations with external inputs through additional networks. However, these item-level constructors still expose two practical challenges: direct merging may amplify the information loss caused by quantization and ID collision while obscuring SID code relations, whereas external-input-based methods can strengthen item semantics but cannot reliably preserve the SID-structured evidence required for token-level generation. These limitations make representation construction an underexplored bottleneck, leading to two severe problems, i.e., the Identity-Structure Preservation Conflict and Input-Output Granularity Mismatch. To this end, we propose ComeIR, a Conditional Memory enhanced Item Representation framework that reconstructs SID-token embeddings into item-aware inputs and restores the token granularity during SID decoding. Specifically, MM-guided token scoring adaptively estimates the contribution of each code within the SID, dual-level Engram memory captures intra-item code composition and inter-item transition patterns, and a memory-restoring prediction head reuses the memories during SID decoding. Extensive experiments demonstrate the effectiveness and flexibility of ComeIR, and further reveal scalable gains from enlarging conditional memory.

2605.11442 2026-05-13 cs.CR cs.AI cs.CL

Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection

Zi Liang, Ronghua Li, Yanyun Wang, Qingqing Ye, Haibo Hu

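AI summary: This paper presents Mobius Injection, a new attack on AI infrastructure that exploits the Semantic Closure vulnerability of autonomous agents to turn a single message into an instruction that executes recursively and persistently, triggering Agent-based and -Oriented DDoS (AbO-DDoS) attacks. The attack is lightweight, stealthy, and highly configurable, and it can precisely target specific environments or model providers. Experiments show significant performance degradation across several mainstream agent systems. To counter the threat, the authors propose a proactive defense based on Agent Component Energy (ACE) analysis that detects malicious recursive triggers.

Abstract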

Large Language Model (LLM) agents have emerged as key intermediaries, orchestrating complex interactions between human users and a wide range of digital services and LLM infrastructures. While prior research has extensively examined the security of LLMs and agents in isolation, the systemic risk of the agent acting as a disruptive hub within the user-agent-service chain remains largely overlooked. In this work, we expose a novel threat paradigm by introducing Mobius Injection, a sophisticated attack that weaponizes autonomous agents into zombie nodes to launch what we define as Agent-based and -Oriented DDoS (AbO-DDoS) attacks. By exploiting a structural vulnerability in agentic logic named Semantic Closure, an adversary can induce sustained recursive execution of agent components through a single textual injection. We demonstrate that this attack is exceptionally lightweight, stealthy against both traditional DDoS monitors and contemporary AI safety filters, and highly configurable, allowing for surgical targeting of specific environments or model providers. To evaluate the real-world impact, we conduct extensive experiments across three representative claw-style agents and three mainstream coding agents, integrated with 12 frontier proprietary or open-weight LLMs. Our results demonstrate that Mobius Injection achieves substantial attack success across diverse tasks, driving single-node call amplification up to 51.0x and multi-node p95 latency inflation up to 229.1x. The attack performance exhibits a superlinear increase with the number of poisoning nodes. To mitigate Mobius Injection, we propose a proactive defense mechanism using Agent Component Energy (ACE) Analysis, which detects malicious recursive triggers by measuring anomalous energy in the agent's component graph.

2605.11394 2026-05-13 stat.ML cs.AI cs.LG stat.AP stat.ME

Spatial Adapter: Structured Spatial Decomposition and Closed-Form Covariance for Frozen Predictors

Wen-Ting Wang, Wei-Ying Wu, Hao-Yun Huang, Xuan-Chun Wang

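AI summary: This paper proposes the Spatial Adapter, a parameter-efficient module that equips any frozen first-stage predictor with a structured spatial representation of its residuals and a closed-form covariance estimate, without modifying the original model's parameters. Using a tractable mini-batch ADMM algorithm, it jointly learns a spatially regularized orthonormal basis and per-sample scores, extracting a smooth, sparse, and orthogonal low-rank spatial structure from the residual field. The method supports kriging-style spatial prediction at unobserved locations and can also be used for uncertainty quantification. Experiments show that it recovers residual spatial structure on a range of datasets with far fewer parameters than traditional methods.

Comments: Preprint. 10 pages main text, with appendices

Abstract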

We present the Spatial Adapter, a parameter-efficient post-hoc layer that equips any frozen first-stage predictor with a structured spatial representation of its residual field and an induced closed-form spatial covariance. The adapter operates as a cascade second stage on residuals, jointly learning a spatially regularized orthonormal basis and per-sample scores via a tractable mini-batch ADMM procedure, without modifying any first-stage parameter. Because the first-stage parameters are frozen, the adapter does not retrain the backbone; its role is to supply a compressed distributional summary of the residual field. Smoothness, sparsity, and orthogonality together turn a generic low-rank factorization into an identifiable spatial representation whose induced residual covariance admits a closed-form low-rank-plus-noise estimator; the effective rank is determined data-adaptively by spectral thresholding, while the nominal rank K is an optimization-side upper bound only. This covariance enables kriging-style spatial prediction at unobserved locations, with plug-in uncertainty quantification as a secondary downstream use. Across synthetic data, Weather2K for spatial-holdout prediction, and GWHD patch grids as a basis-transferability diagnostic, the adapter recovers residual spatial structure when paired with frozen first stages from linear models to deep spatiotemporal and vision backbones; the added representation uses fewer than K(N+T) parameters alongside a compact residual-trend network.

2605.11360 2026-05-13 cs.CR cs.AI cs.SE

Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization

Ying Li, Yanju Chen, Peiran Wang, Issac Khabra, Faysal Hossain Shezan, Yu Feng, Yuan Tian

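AI summary: As the Model Context Protocol is widely adopted, securing tool invocations through meaningful user consent has become a key problem. This paper presents Conleash, a client-side middleware that uses a risk lattice to automatically permit safe calls within known boundaries while identifying and escalating potential risks, and that supports user-defined invariants and reusable rules through a policy engine and a rule refinement loop. Experiments show high accuracy and low overhead on real-world traces, and participants in a user study reported higher trust and less prompting.

Abstract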

As Model Context Protocol adoption grows, securing tool invocations via meaningful user consent has become a critical challenge. Existing methods, such as broad "always allow" toggles or opaque LLM-based decisions, fail to account for dangerous call arguments and often lead to consent fatigue. In this work, we present Conleash, a client-side middleware that enforces boundary-scoped authorization through three components: a risk lattice that auto-permits safe calls within known boundaries while escalating risks, a policy engine for user-defined invariants, and a refinement loop that converts user decisions into reusable rules. Evaluated on 984 real-world traces, Conleash achieved 98.2% accuracy, caught 99.4% of escalations, and added only 8.2 ms of overhead for policy verification. Furthermore, in a user study with N=16 participants, participants significantly preferred Conleash's scoped permissions over traditional methods, citing higher trust and reduced prompting.

2605.11350 2026-05-13 cs.GT cs.AI econ.TH

Human-AI Productivity Paradoxes: Modeling the Interplay of Skill, Effort, and AI Assistance

Ali Aouad, Thodoris Lykouris, Huiying Zhong

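AI summary: Against the backdrop of widespread adoption of generative AI tools in the workplace and in education, this paper studies the mechanisms behind their effect on productivity. The authors build a model of human-AI interaction and analyze the interplay of skill level, effort, and AI assistance, finding that AI unreliability or endogenous skill development can produce a productivity paradox in which more AI assistance lowers productivity. The study also examines the long-term effect of AI on the skill distribution and shows that skill polarization can emerge in steady state when AI literacy is heterogeneous.

Abstract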

Generative Artificial Intelligence (AI) tools are being rapidly adopted in the workplace and in education, yet the empirical evidence on AI's impact remains mixed. We propose a model of human-AI interaction to better understand and analyze several mechanisms by which AI affects productivity. In our setup, human agents with varying skill levels exert utility-maximizing effort to produce certain task outcomes with AI assistance. We find that incorporating endogeneity in either skill development or AI unreliability can induce a productivity paradox: increased levels of AI assistance may degrade productivity, leading to potentially significant shortfalls. Moreover, we examine the long-term distributional effect of AI on skill, and demonstrate that skill polarization can emerge in steady state when accounting for heterogeneity in AI literacy -- the agent's capability to identify and adapt to inaccurate AI outputs. Our results elucidate several mechanisms that may explain the emergence of human-AI productivity paradoxes and skill polarization, and identify simple measures that characterize when they arise.

2605.11335 2026-05-13 cs.DC cs.LG

ChunkFlow: Communication-Aware Chunked Prefetching for Layerwise Offloading in Distributed Diffusion Transformer Inference

Han Meng, Danny Willow Liu, Dong Li

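AI summary: This paper studies how communication-aware chunked prefetching can make layerwise offloading more efficient in distributed diffusion transformer (DiT) inference. Existing layerwise offloading cannot hide prefetch latency when the compute workload is small, and prefetch contends with communication over PCIe. The authors propose ChunkFlow, an adaptive chunk-granular offloading runtime that dynamically coordinates prefetch and communication so that the two are co-scheduled. Experiments show that ChunkFlow substantially reduces peak GPU memory at near-identical inference time and provides a tunable memory-latency tradeoff across workloads.

Abstract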

Layerwise offloading reduces the GPU memory footprint of large diffusion transformer (DiT) inference by prefetching upcoming layers from host memory, but its effectiveness hinges on hiding prefetch latency behind per-layer computation. This assumption breaks down when the per-GPU compute workload is small. Moreover, on PCIe-only nodes, prefetch and inter-GPU collective communications such as all-reduce and all-to-all contend on the shared PCIe path, exposing prefetch latency even when compute would otherwise hide it. We revisit layerwise offloading as a co-scheduling problem between prefetch and communication, guided by a first-order analytical model that predicts when prefetch can be hidden by computation. Building on this model, we design ChunkFlow, a communication-aware, chunk-granular offloading runtime that adaptively yields to collective communication and smoothly trades GPU memory for prefetch volume. On three representative diffusion transformers running on two H100 GPUs over PCIe with Ulysses sequence parallelism, ChunkFlow delivers up to 1.28x step-time speedup over SGLang's existing layerwise offloading, reduces peak GPU memory by up to 49% over the no-offload baseline at near-identical step time once the workload is large enough, and exposes a tunable memory-latency tradeoff that recovers near-zero step-time overhead in the small-workload regime.
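
The condition under which prefetch latency is hidden can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's analytical model, which the abstract does not spell out; it assumes the simplest possible rule, that a layer's host-to-GPU transfer is hidden only if it finishes within the previous layer's compute time, and it ignores contention with collective communication. The function name `exposed_prefetch_ms` and all numbers are hypothetical.

```python
def exposed_prefetch_ms(layer_bytes, pcie_gb_per_s, compute_ms):
    """Prefetch time (ms) that per-layer compute fails to hide.

    Assumes the transfer of the next layer overlaps perfectly with the
    current layer's compute and that the PCIe link is otherwise idle
    (both are simplifying assumptions of this sketch).
    """
    prefetch_ms = layer_bytes / (pcie_gb_per_s * 1e9) * 1e3
    return max(0.0, prefetch_ms - compute_ms)


# Hypothetical 1 GB layer over a 25 GB/s link: the transfer takes about 40 ms.
large_workload = exposed_prefetch_ms(1e9, 25.0, compute_ms=50.0)  # fully hidden
small_workload = exposed_prefetch_ms(1e9, 25.0, compute_ms=10.0)  # partly exposed
print(large_workload, small_workload)
```

Under this simplified rule, shrinking the per-GPU workload lowers `compute_ms` and exposes prefetch latency, which matches the small-workload regime the abstract describes.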

2605.11315 2026-05-13 cs.SE cs.AI cs.CR

Natural Language based Specification and Verification

Zhaorui Li, Chengyu Song

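AI summary: This paper studies how large language models (LLMs) can generate system specifications in natural language and verify implementations compositionally, with the goal of preventing vulnerable code from being generated. Unlike traditional formal verification, which relies on rigid formal languages, the approach expresses specifications directly in natural language, simplifying the verification pipeline. Preliminary experiments suggest that the approach is promising for specification generation and verification.

Abstract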

Recent frontier large language models (LLMs) have shown strong performance in identifying security vulnerabilities in large, mature open-source systems. As LLM-generated code becomes increasingly common, a natural goal is to prevent such models from producing vulnerable implementations in the first place. Formal verification offers a principled route to this objective, but existing verification pipelines typically require specifications written in rigid formal languages. Prior work has explored using LLMs to synthesize such specifications, with limited success. In this paper, we investigate a different approach: using LLMs both to generate specifications and to verify implementations compositionally when the specifications are expressed in natural language. Our preliminary results suggest that this approach is promising.

2605.11286 2026-05-13 eess.SP cs.SD eess.AS

Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming

Manan Mittal, Ryan M. Corey, John R. Buck, Andrew C. Singer

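AI summary: This paper addresses the shortage of data snapshots that large microphone arrays face when performing adaptive beamforming in dynamic acoustic environments, and proposes an adaptive diagonal loading method based on Krylov subspaces. The method uses Lanczos iterations to build a small Krylov subspace and projects the spatial correlation matrix onto a low-dimensional tridiagonal matrix, which allows its extreme eigenvalues to be estimated efficiently and greatly reduces computational complexity. Experiments show that the method preserves beamforming performance and strict white noise gain constraints at a small fraction of the computational cost of conventional eigenvalue decomposition.

Comments: 5 pages, 8 figures

Abstract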

Reliable adaptive beamforming is critical for large microphone arrays operating in highly dynamic acoustic environments. In scenarios characterized by fast-moving talkers and interferers, the available sample support for estimating the spatial correlation matrix is often snapshot-deficient. This deficiency degrades the White Noise Gain (WNG), leading to severe target signal cancellation. To ensure stable and robust beamforming, we previously proposed an adaptive diagonal loading method that leverages the Kantorovich inequality to guarantee the WNG remains strictly within specified bounds. However, accurately determining the smallest necessary loading level requires calculating the extreme eigenvalues of the spatial correlation matrix, a computationally expensive $\mathcal{O}(M^3)$ operation for large arrays. In this paper, we introduce a highly efficient $\mathcal{O}(kM^2)$ estimation technique using Lanczos iterations to build a small Krylov subspace. By projecting the correlation matrix onto a tridiagonal matrix of dimension $k \ll M$, we extract Ritz values that rapidly converge to the exact extreme eigenvalues. Our evaluations demonstrate that this Lanczos-accelerated approach achieves performance identical to exact Eigenvalue Decomposition (EVD), ensuring optimal interference suppression and strict WNG adherence at a fraction of the computational cost.
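
The Lanczos step described in the abstract can be sketched in a few lines. The code below is a generic textbook Lanczos iteration for a real symmetric matrix, followed by a Sturm-sequence bisection that extracts the smallest and largest Ritz values from the resulting tridiagonal matrix. It is an illustration only, not the authors' implementation: the function names are hypothetical, full reorthogonalisation is added here for numerical safety (at extra cost), and the diagonal loading rule itself is omitted because the abstract does not state it.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def lanczos(A, v0, k):
    """Run up to k Lanczos steps on symmetric A.

    Returns the diagonal (alphas) and off-diagonal (betas) of the
    tridiagonal projection of A onto the Krylov subspace.
    """
    norm = math.sqrt(dot(v0, v0))
    basis = [[x / norm for x in v0]]
    alphas, betas = [], []
    for _ in range(k):
        w = matvec(A, basis[-1])
        alphas.append(dot(w, basis[-1]))
        # Full reorthogonalisation against every Lanczos vector so far.
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        beta = math.sqrt(dot(w, w))
        if len(alphas) == k or beta < 1e-12:
            break
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas


def count_below(alphas, betas, x):
    """Sturm count: eigenvalues of the tridiagonal matrix smaller than x."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        off = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-300  # avoid division by zero on the next step
        if d < 0.0:
            count += 1
    return count


def extreme_ritz_values(alphas, betas, tol=1e-10):
    """Smallest and largest eigenvalue of the tridiagonal matrix."""
    k = len(alphas)
    radius = [(betas[i - 1] if i > 0 else 0.0) + (betas[i] if i < k - 1 else 0.0)
              for i in range(k)]
    lo = min(a - r for a, r in zip(alphas, radius))  # Gershgorin bounds
    hi = max(a + r for a, r in zip(alphas, radius))

    def mth_smallest(m):
        a, b = lo, hi
        while b - a > tol:
            mid = 0.5 * (a + b)
            if count_below(alphas, betas, mid) >= m:
                b = mid
            else:
                a = mid
        return 0.5 * (a + b)

    return mth_smallest(1), mth_smallest(k)
```

Each Lanczos step costs one matrix-vector product, which is the source of the $\mathcal{O}(kM^2)$ scaling for a dense $M \times M$ matrix. In exact arithmetic the Ritz values always lie between the true extreme eigenvalues, so a small $k$ gives inner estimates that tighten as $k$ grows.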

2605.11284 2026-05-13 stat.ME cs.AI cs.LG

Rethinking external validation for the target population: Capturing patient-level similarity with a generative model

Mohammad Azizmalayeri, Ameen Abu-Hanna, Saskia Houterman, Marije M. Vis, Giovanni Cinà

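AI summary: This study addresses the difficulty of interpreting model performance in external validation when the target population differs from the development population. It proposes a generative-model-based framework that quantifies each external patient's similarity to the development data and evaluates model performance in subgroups with different similarity levels. Using generative models such as autoencoders, the method allows more flexible similarity estimation without sharing the original development data, improving the interpretability and practicality of external validation. Experiments show that the framework reveals performance differences that conventional external validation hides, giving a more principled basis for assessing model transportability.

Abstract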

Background: External validation is essential for assessing the transportability of predictive models. However, its interpretation is often confounded by differences between external and development populations. This study introduces a framework to distinguish model deficiencies from case-mix effects. Method: We propose a framework that quantifies each external patient's similarity to the development data and measures performance in subgroups with varying levels of alignment to the development distribution. We use generative models, specifically autoencoders, to estimate similarity, offering a more flexible alternative to traditional linear approaches and enabling validation without sharing the original development data. The utility of the autoencoder-based similarity measure is demonstrated using synthetic data, and the framework's application is illustrated using data from the Netherlands Heart Registration (NHR) to predict mortality after transcatheter aortic valve implantation. Results: Our framework revealed substantial variation in model performance across similarity-defined subgroups, differences that remain hidden under conventional external validation yet can meaningfully alter conclusions. In several settings, conventional external validation suggested poor overall performance. However, after accounting for differences in patient characteristics, model performance in some subgroups was consistent with internal validation results. Conversely, apparently acceptable overall performance could mask clinically relevant performance deficits in specific subgroups. Conclusion: The proposed framework enhances the interpretability of external validation by linking model performance to population alignment with the development data. This provides a more principled basis for deciding whether a model is transportable and to which patients it can be safely applied.

2605.11280 2026-05-13 gr-qc astro-ph.HE cs.AI

Discovery of Interpretable Surrogates via Agentic AI: Application to Gravitational Waves

Tousif Islam, Digvijay Wadekar, Tejaswi Venumadhav, Matias Zaldarriaga, Ajit Kumar Mehta, Javier Roulet, Barak Zackay

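AI summary This study proposes GWAgent, a large language model-based agentic workflow that builds interpretable analytic surrogate models directly from simulation data as a replacement for time-consuming numerical simulations. By introducing a physics-informed ansatz, the method achieves high accuracy and a substantial speedup in gravitational waveform modeling and reveals compact physical structure in the waveforms. The study demonstrates the method by analyzing the orbital eccentricity of the real gravitational wave event GW200129, with results that outperform conventional methods.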

Comments 25 pages, 9 figures, codes available at https://github.com/tousifislam/GWAgent

English abstract

Fast surrogate models for expensive simulations are now essential across the sciences, yet they typically operate as black boxes. We present GWAgent, a large language model (LLM)-based workflow that constructs interpretable analytic surrogates directly from simulation data. Surrogate modeling is well suited to agentic workflows because candidate models can be quantitatively validated against ground-truth simulations at each iteration. As a demonstration, we build a surrogate for gravitational waveforms from eccentric binary black hole mergers. We show that providing the agent with a physics-informed domain ansatz substantially improves output model accuracy. The resulting analytic surrogate attains a median Advanced LIGO mismatch of $6.9\times10^{-4}$ together with a $\sim 8.4\times$ speedup in waveform evaluation, surpassing both symbolic regression and conventional machine learning baselines. Beyond producing an accurate model, the workflow identifies compact physical structure from the learned representation. As an astrophysical application, we use GWAgent to analyze the eccentricity of GW200129 and infer $e_{20\mathrm{Hz}}=0.099^{+0.063}_{-0.044}$. These results show that validation-constrained agentic workflows can produce accurate, fast, and interpretable surrogates for scientific simulations and inference.

2605.11269 2026-05-13 gr-qc astro-ph.HE astro-ph.IM cs.AI

gwBenchmarks: Stress-Testing LLM Agents on High-Precision Gravitational Wave Astronomy

Tousif Islam, Digvijay Wadekar, Zihan Zhou

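AI summary This study presents gwBenchmarks, a suite of benchmark tasks for evaluating large language model (LLM) agents on high-precision modeling tasks in gravitational wave astronomy. The tasks cover interpolation, regression, and high-dimensional time-series modeling, involve numerical methods, machine learning, and physics-informed approaches, and represent a large investment of computing resources. Experiments show that existing LLM agents commonly make systematic errors on these tasks and struggle to meet the strict precision requirements of gravitational wave research, indicating that current AI agents still face major challenges in scientific modeling.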

Comments 26 pages, 4 figures

English abstract

Modern gravitational wave astronomy relies on modeling tasks that often require months of graduate-level effort, including building fast waveform surrogates from expensive numerical relativity simulations, modeling orbital dynamics of black holes, fitting merger remnant properties and constructing template banks. These problems demand extreme precision to support detection and parameter inference, with state-of-the-art models achieving $\lesssim 10^{-4}$ relative error. We study whether state-of-the-art LLM coding agents can perform such end-to-end scientific modeling, where success requires constructing models with stringent accuracy criteria and reasoning about physical systems. We introduce gwBenchmarks, a suite of eight tasks grounded in gravitational wave analytic calculations and numerical simulations collectively representing over $10^8$ core-hours of compute. The tasks span interpolation, regression, and high-dimensional time-series modeling, requiring a combination of numerical methods, machine learning, and physics-informed approaches. In preliminary experiments, agents frequently relied on proxy metrics, partial evaluation, or fabricated results to spuriously complete tasks. We therefore implement an external pre-defined framework to gauge agent progress. Evaluating twelve coding agents, we find no consistent winner. On the easiest task, multiple agents converge to the same cubic spline solution, with one rediscovering a coordinate transformation widely used in the literature. On harder tasks like analytic waveform modeling, all agents fall 1-2 orders of magnitude short of domain requirements and exhibit systematic failures, including metric misuse, constraint violations, and result fabrication. Our code, data, and website are publicly available.

2605.11240 2026-05-13 cs.GT cs.CY cs.LG

When to Ask a Question: Understanding Communication Strategies in Generative AI Tools

Charlotte Park, Kate Donahue, Manish Raghavan

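AI summary This paper studies communication strategies between users and models in generative AI tools, examining when users should proactively provide more information to improve personalization and fairness. It proposes an optimization objective that balances user burden against preference representation and, building on the observation that user preferences are correlated, analyzes the optimal strategy for an AI system choosing between inferring information and actively asking for it. Experiments show that appropriate information elicitation reduces the systematic bias introduced by preference inference, so that diverse user perspectives are better incorporated while efficiency is maintained.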

English abstract

Generative AI models differ from traditional machine learning tools in that they allow users to provide as much or as little information as they choose in their inputs. This flexibility often leads users to omit certain details, relying on the models to infer and fill in under-specified information based on distributional knowledge of user preferences. Such inferences may privilege majority viewpoints and disadvantage users with atypical preferences, raising concerns about fairness. Unlike more traditional recommender systems, LLMs can explicitly solicit more information from users through natural language. However, while directly eliciting user preferences could increase personalization and mitigate inequality, excessive querying places a burden on users who value efficiency. We develop a stylized model of user-LLM interaction and formulate an objective that captures the tradeoff between user burden and preference representation. Building on the observation that individual preferences are often correlated, we analyze how AI systems should balance inference and elicitation, characterizing the optimal amount of information to solicit before content generation. Ultimately, we show that information elicitation can mitigate the systematic biases of preference inference, enabling the design of generative tools that better incorporate diverse user perspectives while maintaining efficiency. We complement this theoretical analysis with an empirical evaluation illustrating the model's predictions and exploring their practical implications.

2605.11229 2026-05-13 cs.CR cs.AI cs.SE

Comment and Control: Hijacking Agentic Workflows via Context-Grounded Evolution

Neil Fendley, Zhengyu Liu, Aonan Guan, Jiacheng Zhong, Yinzhi Cao

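AI summary This paper studies the security risks facing agent-based workflows on automation platforms such as GitHub Actions and n8n, where an attacker uses carefully crafted inputs (such as GitHub comments) to manipulate large language model agents into malicious actions such as credential exfiltration and arbitrary command execution. The authors propose JAW, the first detection and exploitation framework, which uses a new approach called "Context-Grounded Evolution" and combines static path-feasibility analysis, dynamic prompt-provenance analysis, and capability analysis to generate inputs that trigger the agent to perform malicious operations. Experiments show that JAW can hijack a large number of GitHub workflows and n8n templates; the vulnerabilities were responsibly disclosed to the affected vendors and were acknowledged and fixed by several companies.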

English abstract

Automation platforms such as GitHub Actions and n8n are increasingly adopting so-called agentic workflows, which integrate Large Language Model (LLM) agents for tasks such as code review and data synchronization. While bringing convenience for developers, this integration exposes a new risk: An adversary may control and craft certain inputs, such as GitHub issue comments, to manipulate the LLM agent for unwanted actions, such as credential exfiltration and arbitrary command execution. To our knowledge, no prior academic work has studied such a risk in agentic workflows. In this paper, we design the first detection and exploitation framework, called JAW, to hijack agentic workflows hosted on automation platforms via a novel approach called Context-Grounded Evolution. Our key idea is to evolve agentic workflow inputs under the contexts derived from hybrid program analysis for hijacking purposes. Specifically, JAW generates agentic workflow contexts through three analyses: (i) static path-feasibility analysis to identify feasible agent-invocation paths and the input constraints required to trigger them, (ii) dynamic prompt-provenance analysis to determine how that input is transformed and embedded into the LLM context, and (iii) capability analysis to identify the actions and restrictions available to the agent at runtime. Our evaluation of JAW on GitHub workflows and n8n templates showed that 4714 GitHub workflows and eight n8n templates can be successfully hijacked, for example, to leak user credentials. Our findings span 15 widely-used GitHub Actions, including official GitHub Actions for Claude Code, Gemini CLI, Qwen CLI, and Cursor CLI, and two official n8n nodes. We responsibly disclosed all findings to the affected vendors and received many acknowledgements, fixes, and bug bounties, notably from GitHub, Google, and Anthropic.

2605.11221 2026-05-13 q-bio.QM cs.LG

Beyond Manual Curation: Augmenting Targeted Protein Degradation Databases via Agentic Literature Extraction Workflows

Yaochen Rao, Farzaneh Jalalypour, N. M. Anoop Krishnan, Rocío Mercado

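AI summary This study addresses the lack of structured experimental data in targeted protein degradation (TPD) and proposes a large language model (LLM) workflow that incorporates expert feedback to automate the extraction of key experimental information from the scientific literature. The method refines its prompt instructions using a small number of expert-annotated samples and achieves high-precision data extraction and expansion for databases of two classes of TPD compounds, molecular glues and PROTACs, markedly increasing database size and the completeness of experimental information. The results provide reusable tools and data resources for TPD research and for scientific literature curation more broadly.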

English abstract

Predictive models in biomedicine depend on structured assay data locked in the text, tables, and supplements of primary publications. This bottleneck is especially acute in targeted protein degradation (TPD), where each assay record must combine compound identity, degradation target, recruiter, assay context, and endpoint values reported across sections, tables, and supplementary files. Inconsistent compound identifiers and incomplete or implicit assay context further demand domain-specific logic that generic LLM pipelines do not provide. Existing molecular glue and PROTAC databases are manually curated and often lack the experimental context required for downstream modeling. We formulate TPD database extraction as a domain-specific curation task and present an expert-in-the-loop LLM workflow, evaluated through a triangular comparison among LLM predictions, standardized baseline records, and expert-annotated ground truth. A lightweight cross-validated prompt-refinement module adapts extraction instructions from scarce expert annotations. With only seven annotated molecular glue publications, the workflow achieved record-level $F_1 = 0.98$ and transferred to PROTACs by terminology substitution alone, maintaining record-level $F_1 > 0.93$. Applied at scale, it expanded molecular glue and PROTAC databases by 81% and 92% in record count, respectively, with 92% and 82.5% of newly recovered records validated as correct upon expert review. The workflow also recovered kinetic and assay-context information essential for cross-study potency comparison and condition-aware degradation modeling. We release the workflow, prompts, evaluation code, and extracted datasets as resources for TPD data curation and AI-assisted scientific curation more broadly.

2605.11204 2026-05-13 eess.SY cs.LG cs.MA cs.SY math.AT

Multi-Agent System Identification with Nonlinear Sheaf Diffusion

Nivar Anwer, Hans Riess, Matthew Hale

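AI summary This paper studies how to recover local interaction laws from trajectory data of multi-agent systems, particularly systems described by a nonlinear sheaf Laplacian. The core challenge is that trajectory data reflect only the evolution of node states and cannot distinguish between different but equivalent edge potential functions. The study uses sheaf cohomology to expose the topological obstruction to recovery, shows that unique recovery is possible under specific conditions, and shows that recovery within a parameterized class is tied to positive definiteness of a data information matrix. Experiments validate the theory and show that accurate trajectory reproduction does not necessarily imply correct recovery of the interaction law.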

English abstract

Local interaction laws governing multi-agent systems can be difficult to recover from trajectory data, even when the dynamics are observed faithfully. In systems governed by a nonlinear sheaf Laplacian -- a generalization of the graph Laplacian accommodating heterogeneous state spaces and asymmetric communication channels -- the coordination law is encoded by edge potential functions whose gradients produce the inter-agent forces. Because trajectory observations record node-state evolution, they expose only the aggregate effect of the edge forces at each node: distinct interaction laws that agree at the node level are indistinguishable from trajectory data alone. We show that the fundamental obstruction to recovery is topological, measured by sheaf cohomology, and that unique recovery from an unconstrained function class is possible if and only if this cohomology vanishes. When the obstruction is nontrivial, we show that recovery within a finite-dimensional parameterized class is possible precisely when a data-dependent information matrix is positive definite. Experiments validate the theory and illustrate that accurate trajectory reproduction need not certify recovery of the underlying interaction law.

2605.11202 2026-05-13 cs.CR cs.AI cs.LG cs.SE

Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing

Yunze Zhao, Yibo Zhao, Yuchen Zhang, Zaoxing Liu, Michelle L. Mazurek

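AI summary This study targets security problems in large language model (LLM) inference serving systems and proposes GRIEF, a fuzzing-based greybox tool for continuously discovering vulnerabilities in the serving layer. GRIEF takes timed multi-request traces as input and uses lightweight detection mechanisms to identify crashes, performance anomalies, and output corruption, and to confirm reproducible serving-layer failures. Experiments show that GRIEF found 15 vulnerabilities across multiple mainstream inference engines, 10 of them confirmed by developers, revealing the security risks that concurrency, caching, and state reuse can introduce.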

English abstract

LLM inference and serving systems have become security-critical infrastructure; however, many of their most concerning failures arise from the serving layer rather than from model behavior alone. Modern inference engines combine KV cache, batching, prefix sharing, speculative decoding, adapters, and multi-tenant scheduling, creating shared-state behavior that only emerges under realistic concurrent workloads and is missed by standard model, safety, and API tests. We present GRIEF, a greybox fuzzer for LLM inference engines that treats timed multi-request traces as first-class inputs, uses lightweight oracles to detect crashes, hangs, performance pathologies, and silent output corruption, and applies controlled replay with log-probability checks to confirm reproducible serving-layer failures. Across early campaigns on vLLM and SGLang, GRIEF discovers 15 vulnerabilities, 10 confirmed by engine developers, including 2 CVEs, spanning KV-cache isolation failures, cross-request performance interference, and crash or liveness bugs. These results show that concurrency, caching, and state reuse can induce silent cross-request contamination, noisy-neighbor denial of service, and delayed crashes without malformed inputs or explicit server errors, making concurrent serving behavior a first-class security and reliability boundary for LLM infrastructure.

2605.11199 2026-05-13 hep-lat cs.LG

Operator Spectroscopy of Trained Lattice Samplers

Moxian Qian

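AI summary This paper studies the properties of trained lattice samplers as functions on field space, rather than only the ensembles they generate. It exposes their internal structure by projecting sampler outputs (such as a flow-matching velocity, a diffusion score, or a normalizing-flow residual) onto operator bases chosen in advance from symmetry, Gaussian path limits, finite-volume modes, and gauge covariance. In the two-dimensional $\phi^4$ model, a trained straight-flow teacher cannot be described by a local force basis alone: its residual decomposes into a zero-mode Binder component and a lowest-shell finite-$k$ correlator component, specific operator projections effectively reduce the residual, and other controls do not. The method can distinguish different sampler classes and provides a unified testing framework for model evaluation.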

Comments 26 pages, 13 figures, 15 tables

English abstract

Trained lattice samplers are usually judged by the ensembles they generate. Here we instead analyze the trained field-space function itself: a flow-matching velocity, a diffusion score, or a normalizing-flow action residual. We project these functions onto operator bases fixed before the fit, chosen from symmetry, exact Gaussian path limits, finite-volume modes, and gauge covariance. For two-dimensional lattice $\phi^4$, a trained straight-flow teacher is not described by a local force basis alone. After the local transport basis, the residual separates into a zero-mode Binder component and a lowest-shell finite-$k$ correlator component. The deflated zero-mode polynomial $P_5(M;t)$ reduces the dominant Binder-tail component, while $\phi^\perp_{|n|^2=1}$ reduces the finite-$k$ correlator component; wrong-parity, off-zero-mode, and random controls do not produce the same reductions. The same projection distinguishes other sampler classes. Diffusion follows the force-resolvent ordering predicted by the free theory, reverse-KL normalizing-flow collapse appears as a forbidden odd zero-mode residual, and gauge-equivariant teachers are resolved by Wilson-loop-force tangent directions. The operator basis is model- and symmetry-dependent, but the test is common: project the trained field-space function and retain sectors that lower held-out residuals and pass the available controls.

2605.11191 2026-05-13 stat.ML cs.LG

Adaptive Policy Learning Under Unknown Network Interference

Aidan Gleich, Eric Laber, Alexander Volfovsky

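AI summary This paper studies adaptive policy learning under unknown network interference, aiming to learn the interference dynamics among units in a network while using them to optimize individual-level treatment allocation and maximize cumulative reward. The authors propose a Thompson sampling algorithm based on Gibbs sampling that jointly learns the interference network and adaptively optimizes the treatment policy, and that also provides an estimate of the interference network to support downstream causal analysis. Experiments show that the method achieves substantial cumulative reward gains across a range of settings, with sound theoretical guarantees and good empirical performance.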

English abstract

Adaptive experimentation under unknown network interference requires solving two coupled problems: (i) learning the underlying dynamics of interference among units and (ii) using these dynamics to inform treatment allocation in order to maximize a cumulative outcome of interest (e.g., revenue). Existing adaptive experimentation methods either assume the interference network is fully known or bypass the network by operating on coarse cluster-level randomizations. We develop a Thompson sampling algorithm that jointly learns the interference network and adaptively optimizes individual-level treatment allocations via a Gibbs sampler. The algorithm returns both an optimized treatment policy and an estimate of the interference network; the latter supports downstream causal analyses such as estimation of direct, indirect, and total treatment effects. For additive spillover models, we show that total reward is linear in the treatment vector with coefficients given by an $n$-dimensional latent score. We prove a Bayesian regret bound of order $\sqrt{nT \cdot B \log(en/B)}$ for exact posterior sampling; empirically, our Gibbs-based approximate sampler achieves regret consistent with this rate and remains sublinear when the additive spillovers assumption is violated. For general Neighborhood Interference, where this reduction is unavailable, we analyze an explore-then-commit variant with $O(n^2 \log T)$ graph-discovery cost. An information-theoretic $\Omega(n \log T)$ lower bound complements both results. Empirically, our method achieves more than an order-of-magnitude reduction in regret in head-to-head comparisons. On two real-world networks, the algorithm achieves sublinear regret and yields downstream effect estimates with small RMSE relative to the truth.
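
The linearity claim for additive spillovers can be illustrated with a short derivation. The outcome model below is an assumption made for illustration, since the abstract does not state the paper's exact model; the symbols $\mu_i$, $\tau_i$, $\gamma_{ij}$, and $s_j$ are hypothetical names.

```latex
% Assumed additive spillover model (illustrative, not taken from the paper):
% unit i's outcome is a baseline, plus its own treatment effect,
% plus additive spillovers from every other treated unit j.
\begin{align*}
Y_i(z) &= \mu_i + \tau_i z_i + \sum_{j \neq i} \gamma_{ij} z_j,
  \qquad z \in \{0,1\}^n, \\
\sum_{i=1}^{n} Y_i(z)
  &= \sum_{i=1}^{n} \mu_i
   + \sum_{j=1}^{n} \Bigl( \tau_j + \sum_{i \neq j} \gamma_{ij} \Bigr) z_j
   = \text{const} + s^{\top} z, \\
s_j &= \tau_j + \sum_{i \neq j} \gamma_{ij}.
\end{align*}
```

Under this assumed model the total reward depends on the treatment vector only through the $n$-dimensional score $s$, where $s_j$ collects unit $j$'s direct effect and the spillovers it sends to other units.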

2605.11188 2026-05-13 cs.CR cs.AI cs.ET

Adversarial SQL Injection Generation with LLM-Based Architectures

Ali Karakoc, H. Birkan Yilmaz

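AI summary This paper studies how large language models (LLMs) can be used to generate adversarial SQL injection attacks in order to evaluate the defenses of web application firewalls (WAFs). The authors propose two new LLM-based methods, RADAGAS and RefleXQLi, and run large-scale experiments against a range of WAF systems; RADAGAS performs strongly against AI/ML-based WAFs but has limited effect on rule-based WAFs. The study offers useful empirical reference points for LLM-based security testing.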

Comments 32 pages, 8 figures, 8 tables

English abstract

SQL injection (SQLi) remains one of the most serious attacks listed in the Open Worldwide Application Security Project (OWASP) Top 10. Advances in Artificial Intelligence (AI), especially in Large Language Models (LLMs), have created an opportunity to automate adversarial attack tests that measure defense mechanisms. In this paper, we aim to provide a comprehensive evaluation of use cases that utilize LLMs for adversarial SQL injection generation. We introduce two novel LLM-based systems, Retrieval Augmented Generation for Adversarial SQLi (RADAGAS) and Reflective Chain-of-Thought SQLi (RefleXQLi), and compare them with existing baselines against 10 Web Application Firewalls (WAFs) and one execution-based MySQL validator. For a comprehensive test, we used six rule-based open-source WAFs (ModSecurity PL1--3, Coraza PL1--3), two AI/ML-based WAFs (WAF Brain, CNN-WAF), and two commercial WAFs (AWS WAF and Cloudflare WAF). For the LLMs, we used GPT-4o, Claude 3.7 Sonnet, and DeepSeek R1. Our tests consist of 240 experiments that generate 240,000 payloads and perform 2.2 million tests against WAFs. Our evaluation reveals that RADAGAS-GPT4o outperforms other baseline models with a 22.73% bypass rate. The proposed RADAGAS variants are highly successful on AI/ML-based WAFs (92.49% on WAF-Brain by RADAGAS-DeepSeek, 80.48% on CNN-WAF by RADAGAS-Claude), but struggle to bypass rule-based WAFs (0--5.70% on ModSecurity and Coraza). We also observe that generating less diverse payloads achieves more bypasses, but such payloads perform poorly if the initially chosen payload is not successful. Together, these findings provide a comprehensive view of LLM-based approaches in security testing.

2605.11179 2026-05-13 stat.ML cs.LG

Interpretable Machine Learning for Spatial Science: A Lie-Algebraic Kernel for Rotationally Anisotropic Gaussian Processes

Kane Warrior, Dalia Chakrabarty

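AI summary Many three-dimensional spatial fields are rotationally anisotropic, meaning their directions of variation are not aligned with the coordinate axes. This paper proposes an interpretable rotationally anisotropic Gaussian process kernel that parameterises a three-dimensional symmetric positive definite covariance metric with three principal length-scales and an explicit SO(3) rotation, describing anisotropy directions and scales more directly. The method uses the Lie-algebra exponential map to represent the rotation in unconstrained Euclidean coordinates while guaranteeing a valid covariance matrix, and its advantages and interpretability are validated on synthetic data and real material-density data.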

English abstract

Many three-dimensional spatial fields are anisotropic, with directions of rapid and slow variation that need not align with the coordinate axes. Standard Gaussian process kernels with Automatic Relevance Determination (ARD) capture only axis-aligned anisotropy, while generic full symmetric positive definite (SPD) metrics can represent rotated anisotropy but do not parameterise principal length-scales and directions directly. We introduce an interpretable rotationally anisotropic GP kernel that parameterises a three-dimensional SPD covariance metric using three principal length-scales and an explicit SO(3) rotation. The rotation is represented by an axis-angle vector and mapped to SO(3) via the Lie-algebra exponential map, giving unconstrained Euclidean coordinates for inference while always inducing a valid SPD metric. The construction spans the same family of three-dimensional SPD covariance metrics as a generic full-SPD parameterisation, but exposes the geometry differently: length-scales and orientation are explicit, interpretable, and directly available for prior specification and posterior summaries. We perform Bayesian inference on these quantities using Markov Chain Monte Carlo (MCMC), and characterise the resulting symmetries and weakly identified regimes. On synthetic data with rotated anisotropy, the posterior recovers the generating metric and improves prediction relative to an axis-aligned ARD baseline, while matching the predictive performance of a generic full SPD baseline. When the ground truth is axis-aligned, posterior mass concentrates near the identity rotation and predictive performance matches ARD. On a material-density dataset from a laboratory-fabricated nano-brick, the inferred metric reveals rotated anisotropy that is not captured by axis-aligned kernels.
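
The parameterisation described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the metric form $M = R\,\mathrm{diag}(1/\ell_k^2)\,R^\top$, the squared-exponential kernel family, and the function names (`so3_exp`, `spd_metric`, `rotated_se_kernel`) are all choices made here. Only the axis-angle vector, the exponential map to SO(3), and the three principal length-scales come from the abstract.

```python
import math

def so3_exp(omega):
    """Map an axis-angle vector in R^3 to a rotation matrix (Rodrigues' formula)."""
    wx, wy, wz = omega
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    # Skew-symmetric matrix K such that K v equals the cross product omega x v.
    K = [[0.0, -wz, wy],
         [wz, 0.0, -wx],
         [-wy, wx, 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    if theta < 1e-8:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos(t))/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def spd_metric(length_scales, omega):
    """Assumed metric form: M = R diag(1 / l_k^2) R^T with R = so3_exp(omega)."""
    R = so3_exp(omega)
    return [[sum(R[i][k] * R[j][k] / length_scales[k] ** 2 for k in range(3))
             for j in range(3)] for i in range(3)]

def rotated_se_kernel(x, y, length_scales, omega, variance=1.0):
    """Squared-exponential kernel under the rotated metric (assumed kernel family)."""
    M = spd_metric(length_scales, omega)
    d = [xi - yi for xi, yi in zip(x, y)]
    q = sum(d[i] * M[i][j] * d[j] for i in range(3) for j in range(3))
    return variance * math.exp(-0.5 * q)

# Example: a quarter turn about the z-axis moves the shortest length-scale
# (1.0, attached to the first principal direction) from the x-axis to the y-axis.
omega = (0.0, 0.0, math.pi / 2)
print(rotated_se_kernel((0, 0, 0), (0, 1, 0), (1.0, 2.0, 3.0), omega))
```

Because any vector in $\mathbb{R}^3$ maps to a valid rotation and any positive length-scales give positive eigenvalues, every parameter setting yields a symmetric positive definite metric, which is the property the abstract relies on for unconstrained inference.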