arXivDaily: arXiv daily academic digest, updated Monday through Friday
2506.01666 2026-05-12 quant-ph cs.AI cs.LG

Synthesis of discrete-continuous quantum circuits with multimodal diffusion models

Florian Fürrutter, Zohim Chandani, Ikko Hamamura, Hans J. Briegel, Gorka Muñoz-Gil

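AI summary This paper proposes a method based on multimodal diffusion models that generates both the discrete structure and the continuous parameters of a quantum circuit in order to compile a target quantum gate efficiently. Two independent diffusion processes handle discrete gate selection and continuous parameter prediction, respectively, addressing the long runtimes and the reliance on hardware or simulation resources of existing methods. Experiments show that the model achieves higher accuracy and robustness across quantum circuits of different sizes, and that a simple post-optimization step improves performance further, offering a new approach and a practical tool for quantum circuit synthesis.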

Comments Main Text: 11 pages, 8 figures and 1 table; Code available at: https://github.com/FlorianFuerrutter/genQC

Journal ref
Machine Learning: Science and Technology 7.2 (2026)
Abstract

Efficiently compiling quantum operations remains a major bottleneck in scaling quantum computing. Today's state-of-the-art methods achieve low compilation error by combining search algorithms with gradient-based parameter optimization, but they incur long runtimes and require multiple calls to quantum hardware or expensive classical simulations, making their scaling prohibitive. Recently, machine-learning models have emerged as an alternative, though they are currently restricted to discrete gate sets. Here, we introduce a multimodal denoising diffusion model that simultaneously generates a circuit's structure and its continuous parameters for compiling a target unitary. It leverages two independent diffusion processes, one for discrete gate selection and one for parameter prediction. We benchmark the model over different experiments, analyzing the method's accuracy across varying qubit counts and circuit depths, showcasing the ability of the method to outperform existing approaches in gate counts and under noisy conditions. Additionally, we show that a simple post-optimization scheme allows us to significantly improve the generated ansätze. Finally, by exploiting its rapid circuit generation, we create large datasets of circuits for particular operations and use these to extract valuable heuristics that can help us discover new insights into quantum circuit synthesis.

2505.17204 2026-05-12 stat.ML cs.LG math.PR math.ST stat.CO stat.TH

Liouville PDE-based sliced-Wasserstein flow

Jayshawn Cooper, Pilhwa Lee

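AI summary This paper recasts the sliced Wasserstein flow (SWF) in a form based on the Liouville partial differential equation (PDE), yielding a new nonparametric, implicit generative gradient flow method. The stochastic diffusion term of the Fokker-Planck formulation is reformulated as a diffusion-free Liouville PDE transport equation, and density estimation is handled by neural-ODE normalizing flows, which improves convergence efficiency and stability. When generating Wasserstein barycenters, the method introduces Kantorovich potentials, which reduces variance, and in fair regression tasks it offers accuracy-fairness trade-offs that are competitive with the standard SWF.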

Comments 24 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:1806.08141 by other authors

Abstract

The sliced Wasserstein flow (SWF), a nonparametric and implicit generative gradient flow, is transformed into a Liouville partial differential equation (PDE)-based formalism. First, the stochastic diffusive term from the Fokker-Planck equation-based Monte Carlo is reformulated as a Liouville PDE-based transport without the diffusive term, essentially reflecting the probability flow ODE. The involved density estimation is handled by normalizing flows of neural ODE without an explicitly defined score function. Next, the computation of the Wasserstein barycenter is approximated by the Liouville PDE-based SWF barycenter with the prescription of Kantorovich potentials for the induced gradient flow to generate its samples. These two efforts show outperforming convergence in training and testing Liouville PDE-based SWF and SWF barycenters with reduced variance. Applying the generative Liouville PDE-based SWF barycenter for fair regression demonstrates competent profiles in the accuracy-fairness Pareto curves, with comparable and alternative choices against the standard SWF, and significant benefit in improving fairness with scalability in comparison to the exact Wasserstein barycenter.
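As background for the sliced Wasserstein construction named in this abstract, here is a minimal sketch of a Monte Carlo estimate of the sliced 2-Wasserstein distance between two point clouds. It is not the paper's Liouville PDE-based flow. The function name `sliced_wasserstein_2`, the equal-sample-size restriction, and the number of projections are assumptions made for illustration.

```python
import math
import random


def sliced_wasserstein_2(xs, ys, n_projections=200, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    xs and ys are equally sized lists of d-dimensional tuples
    (equal sizes are an assumption of this sketch).
    """
    assert len(xs) == len(ys), "this sketch assumes equal sample sizes"
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a direction uniformly on the unit sphere.
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both clouds onto the direction and sort the projections.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(a * t for a, t in zip(y, theta)) for y in ys)
        # In 1D, squared W2 between equal-size empirical measures is the
        # mean squared gap between sorted samples.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)
```

Each projection reduces the transport problem to sorting, which is what makes sliced variants cheap compared with solving the full Wasserstein problem.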

2505.01307 2026-05-12 cs.SE cs.AI

Document Retrieval Augmented Fine-Tuning (DRAFT) for safety-critical software assessments

Regan Bolton, Mohammadreza Sheikhfathollahi, Simon Parkinson, Vanessa Vulovic, Gary Bamford, Dan Basher, Howard Parkinson

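AI summary This paper proposes DRAFT, a new method that improves the ability of large language models to perform compliance assessments of safety-critical software. Building on retrieval-augmented generation (RAG), DRAFT introduces a new fine-tuning framework that retrieves both software documentation and the applicable reference standards, enabling more accurate assessments. With training data generated by a semi-automated method, DRAFT achieves higher correctness than the baseline model in experiments and shows clear improvements in evidence handling, response structure, and domain-specific reasoning, providing a more practical and transparent solution for compliance assessment in safety-critical domains.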

Abstract

Safety critical software assessment requires robust assessment against complex regulatory frameworks, a process traditionally limited by manual evaluation. This paper presents Document Retrieval-Augmented Fine-Tuning (DRAFT), a novel approach that enhances the capabilities of a large language model (LLM) for safety-critical compliance assessment. DRAFT builds upon existing Retrieval-Augmented Generation (RAG) techniques by introducing a novel fine-tuning framework that accommodates our dual-retrieval architecture, which simultaneously accesses both software documentation and applicable reference standards. To fine-tune DRAFT, we develop a semi-automated dataset generation methodology that incorporates variable numbers of relevant documents with meaningful distractors, closely mirroring real-world assessment scenarios. Experiments with GPT-4o-mini demonstrate a 7% improvement in correctness over the baseline model, with qualitative improvements in evidence handling, response structure, and domain-specific reasoning. DRAFT represents a practical approach to improving compliance assessment systems while maintaining the transparency and evidence-based reasoning essential in regulatory domains.

2504.21228 2026-05-12 cs.CR cs.AI

CachePrune: Teaching LLMs What Not to Follow via KV-Cache Editing

Rui Wang, Junda Wu, Yu Xia, Tong Yu, Ruiyi Zhang, Ryan Rossi, Subrata Mitra, Lina Yao, Julian McAuley

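AI summary This paper proposes CachePrune, a method for defending large language models (LLMs) against indirect prompt injection attacks. It identifies and prunes neurons associated with instruction following during key-value cache (KV-cache) encoding of the prompt context, so that the model treats the context as data rather than as instructions to follow. The work introduces a neural attribution mechanism guided by a preferential attribution loss and connects it theoretically to the Direct Preference Optimization (DPO) objective. Experiments show that CachePrune reduces the attack success rate while preserving the model's ability to respond to user instructions.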

Abstract

Large Language Models (LLMs) are susceptible to indirect prompt injection attacks, where the model inadvertently responds to instructions injected into the prompt context. This vulnerability stems from LLMs' inability to distinguish between data and instructions within a prompt. We propose CachePrune, which defends against this attack by identifying and pruning neurons associated with instruction-following during KV cache encoding of the prompt context. The pruning steers the LLM toward interpreting the context purely as data rather than as instructions to follow. To identify these neurons, we introduce a neural attribution mechanism guided by a preferential attribution loss, and theoretically connect this loss to an upper bound of the Direct Preference Optimization (DPO) objective. Further, we improve the fidelity of neural attribution by leveraging an observed triggering effect in instruction-following. Our approach does not interfere with prompt formatting or incur test-time overhead during response generation. Experiments show that CachePrune significantly reduces the attack success rate while preserving the LLM's ability to follow user instructions.

2502.12396 2026-05-12 physics.flu-dyn cs.CE cs.LG

Scientific Machine Learning of Flow Resistance Using Universal Shallow Water Equations with Differentiable Programming

Xiaofeng Liu, Yalan Song

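AI summary This study presents Hydrograd, a shallow water equations solver based on universal differential equations (UDEs), for scientific machine learning in hydrodynamics. By combining physics-based equations with neural networks, the method performs accurate forward simulations and supports gradient-based parameter inversion and sensitivity analysis. The study demonstrates Hydrograd's ability to invert Manning's roughness coefficient in a real-world case and uses a neural network to learn a universal relationship between the roughness coefficient and hydraulic parameters in a river channel, providing a more reliable and generalizable approach to hydrodynamic modeling.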

Abstract

Shallow water equations (SWEs) are the backbone of most hydrodynamics models for flood prediction, river engineering, and many other water resources applications. The estimation of flow resistance, i.e., the Manning's roughness coefficient $n$, is crucial for ensuring model accuracy, and has been previously determined using empirical formulas or tables. To better account for temporal and spatial variability in channel roughness, inverse modeling of $n$ using observed flow data is more reliable and adaptable; however, it is challenging when using traditional SWE solvers. Based on the concept of universal differential equation (UDE), which combines physics-based differential equations with neural networks (NNs), we developed a universal SWEs (USWEs) solver, Hydrograd, for hybrid hydrodynamics modeling. It can do accurate forward simulations, support automatic differentiation (AD) for gradient-based sensitivity analysis and parameter inversion, and perform scientific machine learning for physics discovery. In this work, we first validated the accuracy of its forward modeling, then applied a real-world case to demonstrate the ability of USWEs to capture model sensitivity (gradients) and perform inverse modeling of Manning's $n$. Furthermore, we used a NN to learn a universal relationship between $n$, hydraulic parameters, and flow in a real river channel. Unlike inverse modeling using surrogate models, Hydrograd uses a two-dimensional SWEs solver as its physics backbone, which eliminates the need for data-intensive pretraining and resolves the generalization problem when applied to out-of-sample scenarios. This differentiable modeling approach, with seamless integration with NNs, provides a new pathway for solving complex inverse problems and discovering new physics in hydrodynamics.
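To make the idea of gradient-based inversion of Manning's $n$ concrete, the following toy sketch fits $n$ to observed velocities using Manning's equation for steady uniform flow in SI units, $V = \frac{1}{n} R^{2/3} S^{1/2}$. It is not Hydrograd and does not solve the 2D SWEs. The function names, the hand-derived gradient, the learning rate, and the starting value are all assumptions tuned for this small example.

```python
import math


def manning_velocity(n, hydraulic_radius, slope):
    # Manning's equation (SI units): V = (1/n) * R^(2/3) * S^(1/2)
    return hydraulic_radius ** (2.0 / 3.0) * math.sqrt(slope) / n


def invert_manning_n(radii, slope, observed_v, n_init=0.05, lr=1e-4, n_iter=500):
    """Fit n by gradient descent on the mean squared velocity error.

    lr and n_init are hand-tuned for the toy data below (an assumption).
    """
    n = n_init
    for _ in range(n_iter):
        grad = 0.0
        for r, v_obs in zip(radii, observed_v):
            v = manning_velocity(n, r, slope)
            # d/dn of (v - v_obs)^2, using dv/dn = -v / n
            grad += 2.0 * (v - v_obs) * (-v / n)
        n -= lr * grad / len(radii)
    return n


# Synthetic observations generated with a "true" roughness of 0.03.
radii = [0.5, 1.0, 2.0]
slope = 0.001
observed = [manning_velocity(0.03, r, slope) for r in radii]
n_hat = invert_manning_n(radii, slope, observed)
```

A differentiable solver replaces the hand-written derivative with automatic differentiation through the full simulation, but the outer loop has the same shape.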

2502.06830 2026-05-12 q-fin.CP cs.AI cs.LG

OrderFusion: Encoding Orderbook for End-to-End Probabilistic Intraday Electricity Price Forecasting

Runyao Yu, Yuchen Tao, Fabian Leimgruber, Tara Esterl, Jochen Stiasny, Derek W. Bunn, Qingsong Wen, Hongye Guo, Jochen L. Cremer

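AI summary This paper studies probabilistic forecasting of continuous intraday (CID) electricity prices and, in response to the dynamic nature of the buy and sell orderbook in CID markets, proposes a new order fusion method. An end-to-end learning framework captures the interaction structure between buy and sell orders to produce a more expressive price forecasting model, and hierarchical quantile estimation is introduced to avoid quantile crossing in the forecasts. Experiments show that the method outperforms conventional approaches in European markets with different liquidity levels and is of practical value.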

Comments 10 pages, 4 figures, 4 tables

Abstract

Probabilistic intraday electricity price forecasting is becoming increasingly important for short-term power-system operation. With increasing renewable generation, demand-side flexibility, and storage assets, market participants need to adjust their positions under uncertainty closer to delivery. Continuous intraday (CID) markets support this process by providing updated price signals, helping participants manage imbalance exposure and operational risk. Unlike auction markets, CID trading in many jurisdictions is characterized by the continuous posting of buy and sell orders. This dynamic orderbook microstructure of price formation presents special challenges for price forecasting. Conventional methods represent the orderbook via domain features aggregated from buy and sell trades, or by treating it as a multivariate time series, but such representations neglect the full buy-sell interaction structure of the orderbook. This research therefore develops a new order fusion methodology, which is an end-to-end and parameter-efficient probabilistic forecasting model that learns an interaction-aware representation of the buy-sell dynamics. Furthermore, as quantile crossing is often a problem in probabilistic forecasting, this approach hierarchically estimates the quantiles with non-crossing constraints. Extensive experiments on CID price indices across high- and low-liquidity European markets demonstrate consistent improvements over conventional baselines, and ablation studies highlight the contributions of the main components. The methodology is available at: https://runyao-yu.github.io/OrderFusion/.
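One common way to rule out quantile crossing by construction is to predict a median and then add or subtract non-negative increments for the outer quantiles. The sketch below illustrates that generic construction together with the pinball (quantile) loss. It is an assumption about how such a constraint can be built, not the OrderFusion architecture, and the names `non_crossing_quantiles` and `pinball_loss` are hypothetical.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); the result is never negative.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def non_crossing_quantiles(median, raw_up, raw_down):
    """Build ordered quantiles from unconstrained model outputs.

    raw_up:   raw outputs for quantiles above the median, nearest first.
    raw_down: raw outputs for quantiles below the median, nearest first.
    """
    upper, lower = [], []
    q = median
    for r in raw_up:
        q += softplus(r)  # each step can only move upward
        upper.append(q)
    q = median
    for r in raw_down:
        q -= softplus(r)  # each step can only move downward
        lower.append(q)
    return lower[::-1] + [median] + upper


def pinball_loss(y, q_pred, tau):
    # Quantile loss for level tau in (0, 1).
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


quantiles = non_crossing_quantiles(50.0, raw_up=[-3.0, 2.0], raw_down=[0.5, -10.0])
```

Because every increment passes through `softplus`, the returned list is ordered for any raw outputs, so no post-hoc sorting of predicted quantiles is needed.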

2502.06096 2026-05-12 stat.ML cs.AI cs.LG stat.ME

Post-detection inference for sequential changepoint localization

Aytijhya Saha, Aaditya Ramdas

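AI summary This paper studies a fundamental but underexplored problem in sequential changepoint analysis: statistical inference after a change has been detected. The authors propose a general nonparametric framework that constructs confidence sets for the unknown changepoint using only the data observed up to the stopping time at which an arbitrary sequential detection algorithm declares a change. The method makes no assumptions about the post-change distribution class, the observation space, or the detection procedure, is non-asymptotically valid, applies to a wide range of practical settings, and comes with theoretical guarantees on the width of the confidence intervals.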

Abstract

This paper addresses a fundamental but largely unexplored challenge in sequential changepoint analysis: conducting inference following a detected change. We develop a very general framework to construct confidence sets for the unknown changepoint using only the data observed up to a data-dependent stopping time at which an arbitrary sequential detection algorithm declares a change. Our framework is nonparametric, making no assumption on the composite post-change class, the observation space, or the sequential detection procedure used, and is non-asymptotically valid. We also extend it to handle composite pre-change classes under a suitable assumption, and also derive confidence sets for the change magnitude in parametric settings. We provide theoretical guarantees on the width of our confidence intervals. Extensive simulations demonstrate that the produced sets have reasonable size, and slightly conservative coverage. In summary, we present the first general method for sequential changepoint localization, which is theoretically sound and broadly applicable in practice.

2501.04721 2026-05-12 stat.AP cs.LG physics.med-ph

A Shape-Based Functional Index for Objective Assessment of Pediatric Motor Function

Shashwat Kumar, Arafat Rahman, Robert Gutierrez, Sarah Livermon, Allison N. McCrady, Silvia Blemker, Rebecca Scharf, Anuj Srivastava, Laura E. Barnes

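AI summary This study proposes a shape-based functional index for the objective assessment of motor function in children with neuromuscular disorders. Using data collected with wearable sensors, it combines shape-based principal component analysis with partial least squares to identify movement patterns related to variations in motion speed and asymmetry, and constructs a new motor function index that correlates strongly with muscle fat infiltration, a motor function score, and age-related degenerative changes. The method can be used in home settings and supports longitudinal tracking of treatment efficacy, providing a more objective means of assessing pediatric neuromuscular disorders.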

Comments 13 pages

Journal ref
Plos one, 2025
Abstract

Clinical assessments for neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), continue to rely on subjective measures to monitor treatment response and disease progression. We introduce a novel method using wearable sensors to objectively assess motor function during daily activities in 19 patients with DMD, 9 with SMA, and 13 age-matched controls. Pediatric movement data is complex due to confounding factors such as limb length variations in growing children and variability in movement speed. Our approach uses Shape-based Principal Component Analysis to align movement trajectories and identify distinct kinematic patterns, including variations in motion speed and asymmetry. Both DMD and SMA cohorts have individuals with motor function on par with healthy controls. Notably, patients with SMA showed greater activation of the motion asymmetry pattern. We further combined projections on these principal components with partial least squares (PLS) to identify a covariation mode with a canonical correlation of r = 0.78 (95% CI: [0.34, 0.94]) with muscle fat infiltration, the Brooke score (a motor function score), and age-related degenerative changes, proposing a novel motor function index. This data-driven method can be deployed in home settings, enabling better longitudinal tracking of treatment efficacy for children with neuromuscular disorders.

2410.02543 2026-05-12 cs.NE cs.LG

Diffusion Models are Evolutionary Algorithms

Yanbo Zhang, Benedikt Hartl, Hananel Hazan, Michael Levin

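AI summary This paper shows that diffusion models are, in essence, evolutionary algorithms. By viewing evolution as a denoising process and reversed evolution as diffusion, it demonstrates mathematically that diffusion models naturally encompass evolutionary mechanisms such as selection, mutation, and reproductive isolation. Building on this equivalence, the authors propose Diffusion Evolution, which uses iterative denoising to heuristically refine solutions in parameter space; compared with traditional evolutionary algorithms, it is more efficient and can find multiple optimal solutions. By further incorporating latent space diffusion and accelerated sampling, they propose Latent Space Diffusion Evolution, which handles evolutionary tasks in high-dimensional, complex parameter spaces while significantly reducing the number of computational steps. The finding connects machine learning and biology and opens new directions for mutual enhancement.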

Comments Accepted by International Conference on Learning Representations (ICLR) 2025

Journal ref
Proceedings of the International Conference on Learning Representations (ICLR), 2025, pp. 74873-74894
Abstract

In a convergence of machine learning and biology, we reveal that diffusion models are evolutionary algorithms. By considering evolution as a denoising process and reversed evolution as diffusion, we mathematically demonstrate that diffusion models inherently perform evolutionary algorithms, naturally encompassing selection, mutation, and reproductive isolation. Building on this equivalence, we propose the Diffusion Evolution method: an evolutionary algorithm utilizing iterative denoising -- as originally introduced in the context of diffusion models -- to heuristically refine solutions in parameter spaces. Unlike traditional approaches, Diffusion Evolution efficiently identifies multiple optimal solutions and outperforms prominent mainstream evolutionary algorithms. Furthermore, leveraging advanced concepts from diffusion models, namely latent space diffusion and accelerated sampling, we introduce Latent Space Diffusion Evolution, which finds solutions for evolutionary tasks in high-dimensional complex parameter space while significantly reducing computational steps. This parallel between diffusion and evolution not only bridges two different fields but also opens new avenues for mutual enhancement, raising questions about open-ended evolution and potentially utilizing non-Gaussian or discrete diffusion models in the context of Diffusion Evolution.

2406.19581 2026-05-12 cs.HC cs.LG

Quasi-Linear ICA for Motor Unit Decomposition during Dynamic Contractions

Alexander Kenneth Clarke, Dimitrios Halatsis, Agnese Grison, Irene Mendez Guerra, Noura Ezaz-Nikpay, Pranav Mamidanna, Shihan Ma, Silvia Muceli, Dario Farina

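AI summary This study addresses motor unit decomposition of surface electromyography (EMG) signals during dynamic contractions, a key step toward building neural interface devices. Conventional methods are based on independent component analysis (ICA) but assume that the mixing process is stationary in time, so they cannot handle the time-varying mixing caused by deformation of the volume conductor during movement. The paper proposes a quasi-linear ICA method that combines a learned, low-rank, time-varying invertible transformation with a static linear separator, which absorbs non-stationary distortion and yields more accurate motor unit decomposition under dynamic contractions. Experiments on a public dataset show that the method outperforms several adaptive ICA baselines, with higher decomposition accuracy and recall.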

Abstract

Decomposing surface electromyography (EMG) into the spike trains of individual motor neurons is a long-standing inverse problem and a key step toward motor-neuron-driven neural interfaces such as prosthetics and exoskeletons. The standard approach, independent component analysis (ICA) of the multichannel signal, assumes that the mixing from neurons to electrodes is stationary in time. This assumption fails during movement, when volume-conductor deformation makes the mixing time-varying, and current decomposition algorithms are correspondingly restricted to isometric contractions. We introduce a quasi-linear ICA formulation in which a static linear separator is preceded by a learned, low-rank, time-varying invertible transformation. The separator is trained with an independence loss on the uncompensated projection, and the transformation with a stationarity loss on the recovered source. Gradients are not shared between the two, so the source-extraction step reduces to classical linear ICA and inherits its identifiability guarantee, while non-stationary distortion is absorbed by the transformation. The closed-form inverse of the transformation enables per-spike subtraction with a time-varying template during sequential peel-off. On a public benchmark of dynamic high-density EMG with ground-truth spike trains, the method outperforms four adaptive ICA baselines at every recall threshold, recovering more units at a higher accuracy.

2401.16310 2026-05-12 cs.SE cs.AI

An Insight into Security Code Review with LLMs: Capabilities, Obstacles, and Influential Factors

Jiaxin Yu, Peng Liang, Yujia Fu, Amjed Tahir, Mojtaba Shahin, Chong Wang, Yangxiao Cai

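AI summary This paper reports an empirical study of the potential of large language models (LLMs) for security code review, evaluating seven LLMs under different prompts and comparing them with state-of-the-art static analysis tools. The study finds that LLMs significantly outperform existing tools at detecting security defects and that the reasoning-optimized LLM performs better than general-purpose ones. DeepSeek-R1 and GPT-4 perform best under particular prompts, but each has its own problems: vague responses for GPT-4 and inaccurate code details for DeepSeek-R1. Factors such as code complexity and annotations also affect the LLMs' detection capability.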

Comments 30 pages, 13 images, 10 tables, Manuscript revision submitted to a journal (2026)

Abstract

Security code review is a time-consuming and labor-intensive process typically requiring integration with automated security defect detection tools. However, existing security analysis tools struggle with poor generalization, high false positive rates, and coarse detection granularity. Large Language Models (LLMs) have been considered promising candidates for addressing those challenges. In this study, we conducted an empirical study to explore the potential of LLMs in detecting security defects during code review. Specifically, we evaluated the performance of seven LLMs under five different prompts and compared them with state-of-the-art static analysis tools. We also performed linguistic and regression analyses for the two top-performing LLMs to identify quality problems in their responses and factors influencing their performance. Our findings show that: (1) In security code review, LLMs significantly outperform state-of-the-art static analysis tools, and the reasoning-optimized LLM performs better than general-purpose LLMs. (2) DeepSeek-R1 achieves the highest performance, followed by GPT-4 provided in the ChatGPT platform. The optimal prompt for DeepSeek-R1 incorporates both the commit message and chain-of-thought (CoT) guidance, while for GPT-4 via ChatGPT, the prompt with a Common Weakness Enumeration (CWE) list works best. (3) GPT-4 via ChatGPT frequently produces vague expressions and exhibits difficulties in accurately following instructions in the prompts, while DeepSeek-R1 more commonly generates inaccurate code details in its outputs. (4) LLMs are more adept at identifying security defects in code files that have fewer tokens and security-relevant annotations. (5) Higher code complexity correlates with enhanced detection capabilities of DeepSeek-R1 for specific security defect types.

2305.01507 2026-05-12 cs.NE cs.LG

A Parameter-free Adaptive Resonance Theory-based Topological Clustering Algorithm Capable of Continual Learning

Naoki Masuyama, Takanori Takebayashi, Yusuke Nojima, Chu Kiong Loo, Hisao Ishibuchi, Stefan Wermter

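AI summary This paper proposes a parameter-free topological clustering algorithm based on Adaptive Resonance Theory (ART) that is capable of continual learning. The algorithm estimates the similarity threshold with a determinantal point process-based criterion and defines the edge deletion threshold from the age of edges, so it adaptively generates well-separated clusters without manual parameter setting. Experimental results show that its clustering performance on synthetic and real-world datasets is superior to that of state-of-the-art algorithms.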

Comments This paper is accepted to Neural Computing and Applications

Abstract

In general, a similarity threshold (i.e., a vigilance parameter) for a node learning process in Adaptive Resonance Theory (ART)-based algorithms has a significant impact on clustering performance. In addition, an edge deletion threshold in a topological clustering algorithm plays an important role in adaptively generating well-separated clusters during a self-organizing process. In this paper, we propose an ART-based topological clustering algorithm that integrates parameter estimation methods for both the similarity threshold and the edge deletion threshold. The similarity threshold is estimated using a determinantal point process-based criterion, while the edge deletion threshold is defined based on the age of edges. Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance to state-of-the-art clustering algorithms without requiring parameter specifications specific to the datasets. Source code is available at https://github.com/Masuyama-lab/CAE

2303.05307 2026-05-12 cs.GT cs.AI

Learning Strategic Value and Cooperation in Multi-Player Stochastic Games through Side Payments

Yixin Chen, Jeffrey Richley, Darleen Perez-Lavin, Jessica Singh Syal, Solmaz Kia, Alan Kuhnle

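AI summary This paper studies multi-player stochastic games with transferable utility, aiming to make cooperation individually rational through side payments. Building on the Harsanyi-Shapley value, the authors propose two value notions for stochastic games, HS-S and Coco-S, and extend the associated axioms to the dynamic setting. They prove that the two coincide in two-player games but can differ in games with three or more players, analyze the existence and uniqueness of Coco-S fixed points via topological degree theory, and evaluate the approach on multi-player grid games.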

Comments 51 pages, 10 figures

Abstract

We study general-sum, multi-player stochastic games with transferable utility, motivated by settings where agents can use side payments to make cooperation individually rational. Building on the Harsanyi-Shapley (HS) value for normal-form games, we introduce two HS-based value notions for stochastic games: HS-S, defined by aggregating dynamic coalition-versus-complement threat powers, and Coco-S, defined as fixed points of a statewise HS Bellman operator. We extend HS-style axioms to the stochastic setting and show that HS-S is the unique mapping satisfying them. We prove that HS-S and Coco-S coincide in all two-player stochastic games, but can disagree when $n>2$, via an explicit three-player counterexample. We prove existence and uniqueness of Coco-S fixed points for all two-player games and for three-player two-state games via topological degree theory, and provide an axiomatic characterization of Coco-S through a new Markov Consistency axiom that distinguishes it from HS-S. Finally, we give sampling-based estimators with finite-sample guarantees and empirically compare the induced values, policies, and side payments on multi-player grid-game benchmarks.

2210.05108 2026-05-12 math.OC cs.LG

Projection-Free Functional Constrained Optimization for Risk Aversion and Sparsity Control

Yi Cheng, Guanghui Lan, Saeed Masiha, H. Edwin Romeijn

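AI summary This paper studies projection-free methods for functional constrained optimization with convex or smooth nonconvex objectives. Such problems are common in applications such as portfolio optimization and radiation therapy planning, which involve risk awareness and sparsity control. For convex problems, it proposes the Level Conditional Gradient (LCG) method, which combines a level-set outer loop with conditional gradient steps and has an iteration complexity that does not depend on the magnitude of the optimal dual Lagrange multiplier. For nonconvex problems, it proposes the Inexact Proximal Point LCG (IPP-LCG) method, which efficiently computes approximate KKT points. Numerical experiments show that the proposed methods balance sparsity and risk effectively in portfolio selection and intensity-modulated radiation therapy.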

Abstract

We study projection-free methods for functional constrained optimization with convex or smooth nonconvex objectives. Such problems arise in applications such as portfolio optimization and radiation therapy planning, where risk-aware criteria and sparsity frequently appear together. For the convex setting, we propose a Level Conditional Gradient (LCG) method that combines a level-set outer loop with a conditional gradient oracle for saddle-point subproblems, and we show an iteration complexity of $\mathcal{O}\big(\epsilon^{-2}\log(\epsilon^{-1})\big)$ for smooth and nonsmooth cases without dependence on the magnitude of an optimal dual Lagrange multiplier. For the nonconvex setting, we propose the Inexact Proximal Point LCG (IPP-LCG) method, which solves a sequence of convex subproblems by LCG and attains $\mathcal{O}\big(\epsilon^{-3}\log(\epsilon^{-1})\big)$ complexity for computing an $(\epsilon,\epsilon)$-near-KKT point. Numerical results on portfolio selection and IMRT illustrate the practical sparsity/risk trade-offs of the proposed methods.
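For readers unfamiliar with projection-free methods, here is a minimal sketch of the plain conditional gradient (Frank-Wolfe) iteration over the probability simplex, the building block that the abstract's LCG method wraps in a level-set outer loop. It does not handle functional constraints and is not LCG itself. The function name, the step-size rule $\gamma_k = 2/(k+2)$, and the quadratic test objective are assumptions for illustration.

```python
def frank_wolfe_simplex(grad, dim, n_iter=5000):
    """Minimize a smooth convex function over the probability simplex.

    grad(x) returns the gradient as a list; no projection is ever computed.
    """
    x = [0.0] * dim
    x[0] = 1.0  # start at a vertex of the simplex
    for k in range(n_iter):
        g = grad(x)
        # Linear minimization oracle over the simplex: the best single vertex.
        j = min(range(dim), key=lambda i: g[i])
        gamma = 2.0 / (k + 2.0)
        # A convex combination with a vertex keeps x feasible.
        x = [(1.0 - gamma) * xi for xi in x]
        x[j] += gamma
    return x


# Toy objective: f(x) = 0.5 * ||x - b||^2 with b inside the simplex.
b = [0.2, 0.5, 0.3]
x_star = frank_wolfe_simplex(lambda x: [xi - bi for xi, bi in zip(x, b)], dim=3)
```

Each iterate is a convex combination of at most k+1 vertices, which is why conditional gradient methods are attractive when sparse solutions are wanted, as in portfolio selection.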

2110.12907 2026-05-12 stat.ML cs.LG math.PR math.ST stat.TH

Hamiltonian Monte Carlo with Asymmetrical Momentum Distributions

Soumyadip Ghosh, Yingdong Lu, Tomasz Nowicki

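AI summary This paper studies the convergence of the Hamiltonian Monte Carlo (HMC) algorithm with asymmetrical momentum distributions. Existing analyses of HMC rely on symmetric Gaussian momentum variables; using new dynamical and probabilistic arguments, the paper establishes convergence under weaker conditions and shows that plain HMC with asymmetrical momentum breaks a self-adjointness requirement. The authors therefore propose a modified algorithm, AD-HMC, which converges geometrically in Wasserstein distance, and numerical experiments suggest improved performance over HMC with Gaussian auxiliaries.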

Abstract

Existing rigorous convergence guarantees for the Hamiltonian Monte Carlo (HMC) algorithm use Gaussian auxiliary momentum variables, which are crucially symmetrically distributed. We present a novel convergence analysis for HMC utilizing new dynamical and probabilistic arguments. The convergence is rigorously established under significantly weaker conditions, which among others allow for general auxiliary distributions. In our framework, we show that plain HMC with asymmetrical momentum distributions breaks a key self-adjointness requirement. We propose a modified version of HMC, that we call the Alternating Direction HMC (AD-HMC), which overcomes this difficulty. Sufficient conditions are established under which AD-HMC exhibits geometric convergence in Wasserstein distance. The geometric convergence analysis is extended to when the Hamiltonian motion is approximated by the leapfrog symplectic integrator, where an additional Metropolis-Hastings rejection step is required. Numerical experiments suggest that AD-HMC can generalize a popular dynamic auxiliary scheme to show improved performance over HMC with Gaussian auxiliaries.
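As a point of reference for the baseline discussed in this abstract, the sketch below implements standard HMC with Gaussian momentum, a leapfrog integrator, and a Metropolis-Hastings correction for a one-dimensional target. It is not AD-HMC and uses a symmetric momentum distribution. The function names, step size, and trajectory length are assumptions chosen for a standard normal target.

```python
import math
import random


def leapfrog(x, p, grad_u, step, n_steps):
    # Half step in momentum, alternating full steps, final half step.
    p = p - 0.5 * step * grad_u(x)
    for i in range(n_steps):
        x = x + step * p
        if i < n_steps - 1:
            p = p - step * grad_u(x)
    p = p - 0.5 * step * grad_u(x)
    return x, p


def hmc(u, grad_u, x0, n_samples, step=0.2, n_steps=10, seed=0):
    """Sample from a density proportional to exp(-u(x)) in one dimension."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # symmetric Gaussian momentum
        x_new, p_new = leapfrog(x, p0, grad_u, step, n_steps)
        h_old = u(x) + 0.5 * p0 * p0
        h_new = u(x_new) + 0.5 * p_new * p_new
        # Metropolis-Hastings step corrects the discretization error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Standard normal target: u(x) = x^2 / 2.
draws = hmc(lambda x: 0.5 * x * x, lambda x: x, x0=0.0, n_samples=5000)
```

The leapfrog map is reversible when the momentum is negated, which pairs naturally with a symmetric momentum distribution; the abstract's point is that this pairing no longer holds for asymmetrical momentum.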

2605.08442 2026-05-12 cs.CR cs.AI cs.LG

Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents

Jun Wen Leong

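AI summary This study systematically evaluates how effective defenses at different architectural layers are against persistent memory attacks on stateful large language model agents. In more than five thousand runs on nine open-source models, it finds that input-level and retrieval-level defenses are almost powerless against the attack, whereas tool-gating at the memory layer (Memory Sandbox) reduces the attack success rate to zero for eight of the nine models, far better than the other methods. The study explains why each class of defense fails and notes that Memory Sandbox does not degrade model performance in the absence of attack, which makes it practically useful.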

Comments 9 models, 5,700 runs across 5 experiments, pre-registered comparisons. Code and results: github.com/junwenleong/stateful-agent-security-eval

Abstract

Persistent memory attacks against LLM agents achieve high attack success rates against open-source models. In these attacks, malicious instructions injected via RAG-retrieved documents are stored in persistent memory and executed in later sessions. However, no systematic evaluation of defense effectiveness against this attack class exists. We evaluate six defenses across four architectural layers against delayed-trigger attacks on nine open-source models (5,040 runs, N=40 per condition). Four defenses fail at approximately baseline attack success rate: input-level filtering (Minimizer, Sanitizer) and retrieval-level filtering (RAG Sanitizer, RAG LLM Judge) achieve 88-89% ASR, statistically indistinguishable from the undefended baseline of 88.6%. Prompt Hardening partially fails at 77.8% ASR, with the reduction driven by two models at 0%: one genuine defense effect and one model-level refusal independent of the defense. The architectural explanation holds: input-level defenses cannot observe RAG-injected content, and retrieval-level classifiers are defeated by compliance-framed semantic masking. One defense, tool-gating at the memory layer (Memory Sandbox), reduces ASR to 0% for eight of nine models by removing the recall capability the attack requires. The exception inverts the defense entirely: a reasoning model that achieves 0% ASR under no defense via execution refusal inverts to 100% ASR under Memory Sandbox, because removing explicit recall forces the model onto the RAG pathway where its refusal mechanism does not activate. Memory Sandbox imposes zero utility cost in the absence of attack (BTCR = 100% across all conditions). These results provide the first systematic characterization of why each defense class fails against persistent memory attacks, enabling informed defense investment decisions.

2605.08429 2026-05-12 stat.ML cs.LG stat.ME

Active Multiple-Prediction-Powered Inference

Nicholas Brawand, Nima Leclerc, Anhthy Ngo, Matthew Peterson, Sriram Vishwanath, Laith Alhussein, Ben Wellner

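AI summary In post-deployment monitoring of healthcare AI, obtaining statistically valid inference from few labeled examples is an important problem. This paper proposes Active Multiple-Prediction-Powered Inference (AM-PPI), which routes each instance to a cost-appropriate subset of predictors and samples gold-standard labels in proportion to the residual uncertainty of the chosen predictors, reducing estimator variance under a limited budget. The method generalizes single-predictor prediction-powered inference and active statistical inference, moves from global per-predictor allocation to per-instance adaptive routing, and is supported by both theoretical and experimental results.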

Abstract

Post-deployment monitoring of healthcare AI requires statistically valid, label-efficient methods, but gold-standard labels from clinician chart review are expensive. Prediction-powered inference (PPI) and active statistical inference (ASI) reduce label cost by combining a small labeled sample with abundant model predictions, but both are restricted to a single predictor, a poor fit for modern clinical pipelines that have multiple predictors of differing cost and accuracy available at inference time. We propose Active Multiple-Prediction-Powered Inference (AM-PPI), which routes each instance to a cost-appropriate predictor subset, samples gold-standard labels in proportion to the chosen subset's residual uncertainty, and reweights predictions to minimize estimator variance, all under a single deployment-time budget. AM-PPI generalizes ASI to leverage multiple predictors and extends Multiple-PPI from global per-predictor allocation to per-instance adaptive routing. We derive closed-form Karush-Kuhn-Tucker (KKT) conditions for all three decisions and prove, via biconvexity and strong duality, that the resulting fixed point is a global optimum despite the joint problem being non-jointly-convex. We establish asymptotic normality with valid coverage, minimum-variance unbiasedness within the linear-prediction augmented inverse propensity weighted (AIPW) class, and a closed-form criterion identifying when multiple predictors help. On synthetic data and three healthcare monitoring tasks, AM-PPI produces 10 to 40 percent narrower confidence intervals (CIs) than single-predictor ASI in the budget regime where routing matters, and matches the better baseline elsewhere.
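To illustrate the single-predictor starting point that AM-PPI generalizes, here is a minimal sketch of the basic prediction-powered estimate of a mean: the average prediction on unlabeled data plus a correction (the "rectifier") estimated from the small labeled sample. It covers neither routing, active sampling, nor multiple predictors. The function name `ppi_mean` and the normal-approximation interval with z = 1.96 are assumptions of this sketch.

```python
import math
import statistics


def ppi_mean(y_labeled, pred_labeled, pred_unlabeled, z=1.96):
    """Prediction-powered estimate of E[Y] with a normal-approximation CI."""
    n = len(y_labeled)
    big_n = len(pred_unlabeled)
    # Rectifier: how far the predictor is off on the labeled sample.
    rectifier = [y - f for y, f in zip(y_labeled, pred_labeled)]
    estimate = statistics.fmean(pred_unlabeled) + statistics.fmean(rectifier)
    # The two terms come from independent samples, so variances add.
    var = (statistics.variance(pred_unlabeled) / big_n
           + statistics.variance(rectifier) / n)
    half_width = z * math.sqrt(var)
    return estimate, (estimate - half_width, estimate + half_width)


est, (lo, hi) = ppi_mean(
    y_labeled=[1, 0, 1, 1],
    pred_labeled=[0.9, 0.2, 0.8, 0.7],
    pred_unlabeled=[0.5, 0.6, 0.7, 0.8, 0.4, 0.6],
)
```

When the predictor is accurate the rectifier has small variance, so the interval is driven mostly by the large unlabeled sample; that is the source of the label savings.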

2605.08400 2026-05-12 math.ST cs.IT cs.LG math.IT stat.ML stat.TH

On Observation Time for Recovering Latent Hawkes Networks

Jonas Linkerhägner, Michele Bortolasi, Lorenzo Baldassari, Maarten V. de Hoop, Ivan Dokmanić

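AI summary This paper studies the minimum observation time needed to recover a latent interaction network from event-based observations, a problem that matters in finance, seismology, and neuroscience. For a class of stationary Hawkes processes with sparse, weak interactions, the authors prove that an observation time of order $\log d$, where $d$ is the number of interacting entities, is both sufficient and necessary. They construct a two-stage estimator and obtain the guarantees by combining concentration bounds from the Poisson cluster representation with Fano's inequality and Jacod's Girsanov formula.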

Abstract

Dynamics of interacting systems in engineering, society, and nature often evolve over latent networks that govern which entities can interact. We study the problem of inferring these networks from event-based observations, which arise naturally in finance, seismology, and neuroscience. While there is substantial algorithmic work addressing this important problem, theoretical results are scarce. In this paper we ask the following fundamental question: what is the minimum time that one must observe the dynamics in order to exactly recover the underlying network, as a function of the number $d$ of interacting entities? For a class of stationary Hawkes processes with sparse, weak interactions, we prove that an observation time of order $\log d$ is sufficient and necessary. For the upper bound we construct a two-stage estimator that uses clipped and binned event data for screening, followed by a least-squares refinement, and apply concentration bounds derived from the Poisson cluster representation. For the lower bound we combine Fano's inequality with Jacod's Girsanov formula for point processes on a suitable subclass of networks.

2605.08382 2026-05-12 cs.CR cs.CL cs.CY

SecureForge: Finding and Preventing Vulnerabilities in LLM-Generated Code via Prompt Optimization

Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann, Duncan Eddy, Christopher D. Manning, Mykel Kochenderfer, Diyi Yang

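AI summary As LLM-generated code grows in scale, it introduces security vulnerabilities into codebases without human involvement. This paper presents SecureForge, an automated pipeline that audits the security risks of code generated by frontier models and produces system prompts that reduce output vulnerabilities. It identifies benign prompts that yield statically detectable vulnerabilities, amplifies them into a diverse synthetic prompt corpus with a Markovian sampling technique, and uses that corpus to optimize the system prompts. Across frontier models this gives a Pareto improvement in unit test success and output security, with output vulnerabilities reduced by up to 48%.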

Details
English abstract

LLM coding agents now generate code at an unprecedented scale, yet LLM-generated code introduces cybersecurity vulnerabilities into codebases without human involvement. Even when frontier models are explicitly asked to write secure production code and are given the relevant weaknesses to avoid in context, we find that they still produce verifiable vulnerabilities 23% of the time on average across a corpus of 250 benign coding prompts. We introduce SecureForge, an automated pipeline that both audits security risks of frontier models and produces auditing-informed secure system prompts that reduce output security vulnerabilities while maintaining unit test performance. SecureForge first identifies benign prompts that produce statically detectable vulnerabilities, and then amplifies them into a large synthetic prompt corpus of diverse scenarios using a Markovian sampling technique to jointly maintain error rates and prompt diversity. This corpus is then used to iteratively optimize the system prompts to reduce output security vulnerabilities. On frontier models, SecureForge yields a statistically significant Pareto improvement in both unit test success and output security, with output vulnerabilities reduced by up to 48%. The resulting system prompts transfer zero-shot to in-the-wild coding agent prompts, without any exposure to real user prompt distributions during optimization.

2605.08379 2026-05-12 stat.AP cs.LG

Transfer Learning for Dead Fuel Moisture Prediction Using Time-Warping Recurrent Neural Networks

Jonathon Hirschi, Jan Mandel, Adam Kochanski

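AI summary This paper proposes a time-warping transfer learning method that rescales the time scale of a Long Short-Term Memory (LSTM) network to transfer between fuel moisture classes. A model is trained on the abundant 10h fuel data from weather station sensors and then adapted to predict moisture content for 1h, 100h, and 1000h fuels, for which observations are sparse. The method is validated on data from a landmark field study in Oklahoma.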

Comments Preprint. Related to PhD thesis work that is also available as a preprint at https://doi.org/10.48550/arXiv.2604.02474

Details
English abstract

This paper proposes a time-warping transfer learning method, a technique for temporally rescaling the learned dynamics of a recurrent neural network (RNN) with a Long Short-Term Memory (LSTM) layer to enable task transfer across fuel moisture classes. Fuel moisture content (FMC) is divided into idealized classes based on characteristic lag time. Large quantities of real-time data are available for 10h fuels from sensors on weather stations, but observations of other fuel classes are sparse in space and time. We use transfer learning to adapt an RNN pretrained on 10h FMC to predict FMC for 1h, 100h, and 1000h fuels. We validate this method using data from a landmark field study conducted in Oklahoma that was used to calibrate the state-of-the-art Nelson fuel moisture model.

2605.08332 2026-05-12 quant-ph cs.AI

Optimal FALQON for Quantum Approximate Optimization via Layer-wise Parameter Tuning

Michael Mancini, Shabnam Sodagari

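AI summary This paper proposes Optimal FALQON, which treats the per-layer time step and scaling factor as decision variables tuned by classical optimization, improving the efficiency and convergence speed of feedback-based adaptive quantum optimization (FALQON) on combinatorial problems. On the evaluated benchmarks, Optimal FALQON significantly outperforms standard FALQON and several QAOA variants in success probability and evaluation efficiency. Initializing QAOA with Optimal FALQON parameters also gives better warm-start performance.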

Details
English abstract

Feedback-based adaptive quantum optimization (FALQON) is a promising approach for solving combinatorial problems on noisy intermediate-scale quantum (NISQ) devices, requiring only a single circuit evaluation per layer. However, standard FALQON relies on fixed hyperparameters that severely limit convergence speed, requiring hundreds to thousands of layers for acceptable solutions. This paper proposes Optimal FALQON, an optimization-based formulation that treats the per-layer time step ($\delta_k$) and scaling factor ($M_k$) as decision variables optimized via classical methods. We present a comprehensive empirical study on all 94 non-isomorphic 3-regular graphs with 12 vertices, comparing Optimal FALQON with standard FALQON and multiple QAOA variants. Results demonstrate statistically significant improvements in success probability, evaluation efficiency, and depth-normalized cost across the evaluated benchmarks. Furthermore, initializing QAOA with parameters from Optimal FALQON yields superior warm-start performance compared to fixed initialization.

2605.08325 2026-05-12 eess.IV cs.AI cs.LG

CAMAL: Improving Attention Alignment and Faithfulness with Segmentation Masks

Rajdeep Singh Hundal, Yan Xiao, Jin Song Dong, Manuel Rigger

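AI summary This work proposes CAMAL, a method that uses segmentation masks to improve attention alignment and faithfulness in vision models. During training, CAMAL compares the model's attention with the ground-truth discriminative regions given by the segmentation masks and acts as an auxiliary regularizer, encouraging attention on those regions and suppressing it elsewhere. Experiments in both deep learning and deep reinforcement learning show significant gains in attention alignment and faithfulness and better explainability, without increasing inference cost.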

Details
English abstract

Many vision datasets now provide segmentation masks in addition to annotated images to support a wide range of tasks. In this work, we propose Class Activation Map Attention Learning (CAMAL), an efficient and scalable method that utilizes segmentation masks to improve attention alignment and faithfulness in vision models. Specifically, attention alignment refers to the degree to which a model's attention aligns with ground-truth discriminative regions, while attention faithfulness refers to the degree to which a model's attention influences its decision. Improving both attention alignment and faithfulness is essential for ensuring that model attention is both spatially accurate and causally meaningful. To improve attention alignment and faithfulness in vision models, CAMAL first extracts the model's attention for each image during training and then compares the attention to ground-truth discriminative regions obtained from the corresponding segmentation masks. CAMAL then acts as an auxiliary regularizer, encouraging attention that aligns with ground-truth discriminative regions, while suppressing attention elsewhere. We evaluated CAMAL across two learning paradigms -- Deep Learning (DL) and Deep Reinforcement Learning (DRL) -- and observed consistent, significant improvements in both attention alignment and faithfulness. In particular, CAMAL yields statistically significant gains in attention alignment across all settings, and improves attention faithfulness by over 35% compared to recent work. Moreover, we show that improved attention alignment and faithfulness enhance explainability, while yielding improved or comparable generalization performance without increasing inference cost. These findings demonstrate that the spatial information contained within segmentation masks can be effectively leveraged to guide model attention across learning tasks.

2605.08324 2026-05-12 eess.IV cs.AI cs.LG

FQPDR: Federated Quantum Neural Network for Privacy-preserving Early Detection of Diabetic Retinopathy

Debashis De, Mahua Nandy Pal, Dipankar Hazra

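AI summary This paper proposes FQPDR, a federated learning-based quantum neural network for privacy-preserving early detection of diabetic retinopathy. The quantum neural network is trained on distributed medical data, so patient data is never shared centrally. Experiments with limited samples and few parameters show good detection performance that compares favorably with existing non-federated and federated methods.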

Details
Journal ref
Evolutionary Intelligence 17, 4047-4068 (2024)
English abstract

Diabetic Retinopathy (DR) is a common complication of diabetes that can lead to blindness. Detecting DR at the earliest stage is essential to prevent irreversible eye damage. Microaneurysm dots are the first signs of DR. Because the dots are tiny and low in contrast, detecting mild DR is very challenging. Federated learning (FL) preserves data privacy, which is a major concern in medical image processing. FL is a collaborative learning method that shares only model parameters with a central server, not patient data. Inspired by classical FL, we propose a federated learning-based quantum neural network (federated QNN) for this task. We implemented the models with few learnable parameters, using limited samples from the E-ophtha and Retina MNIST datasets. The cross-evaluation efficiency of the proposed federated quantum neural network system for privacy-preserving early detection of diabetic retinopathy (FQPDR) on Kaggle dataset images indicates the robustness of the lightweight learning models. FQPDR's performance is encouraging compared with existing non-FL and FL methods.

2605.08319 2026-05-12 cs.SE cs.AI

Mazocarta: A Seeded Procedural Deckbuilder for Instrumented Game Development

Timothy C. Cogan

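AI summary Mazocarta is a seeded procedural tactical deckbuilder written in Rust and compiled to WebAssembly for browser play, which can also run natively for simulation. Its main contribution is an instrumented game-development reference artifact: the same rules engine supports interactive play, command-line simulation, automated tests, save/load fixtures, and local-area multiplayer. Its deterministic run model and reproducible balance probes provide repeatable development signals for balance analysis and regression testing.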

Comments 9 pages, 4 figures, 1 table. Code available at https://github.com/timcogan/mazocarta

Details
English abstract

Mazocarta is a seeded procedural tactical deckbuilder implemented in Rust, compiled to WebAssembly for browser play, and executable natively for simulation. Its primary technical contribution is not the invention of a new deckbuilding genre, but the construction of an instrumented game-development reference artifact: the same rules engine supports interactive play, native command-line simulation, automated end-to-end tests, save/load fixtures, and local-area multiplayer. This paper describes Mazocarta's architecture, deterministic run model, reproducible balance probes, and QR-mediated WebRTC pairing for local multiplayer. An evaluation snapshot over 1,000 deterministic seeds shows that the simulation pipeline can produce reproducible development signals. In the evaluated configuration, single-player and two-player autoplay win rates were 36.1% and 34.9%, respectively. These rates are not presented as final player-facing balance metrics, but as repeatable probes for future balance shifts and regressions. Mazocarta is positioned as a playable open-source reference artifact for instrumented game development: deterministic regression checks, automated playtesting workflows, balance probes for game mechanics, and browser-native local multiplayer all exercise one shared production rules core.
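
The idea of a seed-driven balance probe can be illustrated with a minimal sketch. The toy combat loop below is a hypothetical stand-in, not Mazocarta's rules engine (which is written in Rust); `autoplay_wins`, `win_rate`, and the damage ranges are assumptions made only to show that routing all randomness through a per-run seed makes an aggregate win rate exactly repeatable.

```python
import random


def autoplay_wins(seed: int) -> bool:
    """Hypothetical stand-in for one autoplay run.

    All randomness comes from a generator built from the seed, so the
    same seed always produces the same outcome.
    """
    rng = random.Random(seed)
    player_hp, enemy_hp = 20, 20
    while player_hp > 0 and enemy_hp > 0:
        enemy_hp -= rng.randint(1, 6)   # player attacks first
        if enemy_hp <= 0:
            break
        player_hp -= rng.randint(1, 7)  # enemy hits slightly harder
    return enemy_hp <= 0


def win_rate(seeds) -> float:
    """Fraction of seeded runs that end in a win."""
    results = [autoplay_wins(s) for s in seeds]
    return sum(results) / len(results)


seeds = range(1000)
first = win_rate(seeds)
second = win_rate(seeds)
print(first, first == second)
```

Because the probe is a pure function of the seed list, a change in the reported rate after a rules edit points to the edit itself and not to sampling noise between runs.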

2605.08313 2026-05-12 cs.CR cs.AI cs.LG

Seed Hijacking of LLM Sampling and Quantum Random Number Defense

Ziyang You, Xiaoke Yang, Zhanling Fan, Feng Guo, Xiaogen Zhou, Xuxing Lu

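AI summary This work shows that LLMs' reliance on deterministic pseudorandom number generators (PRNGs) during autoregressive sampling creates a security vulnerability, and presents SeedHijack, a backdoor attack that forces attacker-specified tokens without altering model logits. The attack achieves high success rates across several models and sampling configurations. The authors propose a defense based on a hardware quantum random number generator (QRNG) that neutralizes the attack in the evaluated threat model with negligible overhead.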

Details
English abstract

Large language models (LLMs) rely on deterministic pseudorandom number generators (PRNGs) for autoregressive sampling, creating a critical supply-chain attack surface overlooked by existing defenses. We present SeedHijack, a backdoor attack that manipulates PRNG outputs to force attacker-specified token selection without altering model logits. In a 540-trial benchmark on GPT-2 (124M), the attack achieves 99.6% exact token injection across 9 sampling configurations; it reaches 100% success on four aligned models (1.5B-7B, RLHF/SFT/reasoning distillation) and bypasses all alignment methods tested in this work. We further propose a defense based on a hardware quantum random number generator (QRNG), which neutralizes the attack in our evaluated threat model with negligible median overhead (+0.6% latency, +7.7 MB memory). Our work identifies a critical sampling-layer vulnerability and provides a practical, deployable QRNG-based defense.
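
The property this entry builds on, that seeded sampling is fully determined by the seed, can be shown with a toy sketch. This is not the SeedHijack attack or the paper's QRNG defense; the vocabulary, the fixed distribution, and the use of `random.SystemRandom` (operating-system entropy, standing in for a non-seedable source) are all illustrative assumptions.

```python
import random

# Hypothetical toy next-token distribution; the "logits" never change.
VOCAB = ["the", "cat", "sat", "on", "mat"]
PROBS = [0.4, 0.2, 0.2, 0.1, 0.1]


def sample_tokens(rng, n=8):
    """Draw n tokens from the fixed distribution using the given generator."""
    return [rng.choices(VOCAB, weights=PROBS, k=1)[0] for _ in range(n)]


# Same seed, same draws: whoever controls the seed controls the sequence,
# even though the probabilities themselves are untouched.
run_a = sample_tokens(random.Random(1234))
run_b = sample_tokens(random.Random(1234))

# OS-entropy generator: it has no seed to replay, so draws are not reproducible
# from any value an attacker could fix in advance.
run_c = sample_tokens(random.SystemRandom())

print(run_a == run_b)
```

The sketch only demonstrates determinism; how a compromised generator would be introduced into a serving stack, and how a hardware source is integrated, are outside what it shows.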

2605.08310 2026-05-12 cs.CR cs.AI

WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation

Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao, Tianqing Zhu, Xiaohua Jia

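AI summary As browser agents are increasingly deployed on long-horizon tasks, attackers gain more opportunities to inject malicious instructions. This paper proposes WebTrap, a stealthy mid-task hijacking attack that uses multi-step instruction fusion to combine the attack goal with the user's original task, so the agent resumes the original task after executing the attack goal and system usability is preserved. Experiments show a high attack success rate while the system keeps operating normally, revealing a critical vulnerability to stealthy hijacking in long-horizon agent tasks.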

Comments 31 pages, 4 figures, 10 tables. Code: https://github.com/liuyaojialiuyaojia/WebTrap

Details
English abstract

Browser agents are increasingly deployed in long-horizon tasks, which require executing extended action chains to accomplish user goals. However, this prolonged execution process provides attackers with more opportunities to inject malicious instructions. Existing prompt injection attacks against browser agents expose two key gaps: (1) low effectiveness, as attacks optimized for toy baselines fail to achieve end-to-end goals in real-world scenarios with complex environments and more steps; (2) weak stealthiness, since most attacks pit the attack goal against the user goal, causing a significant drop in system usability under attack. To address these gaps, we propose WebTrap, a mid-task hijacking injection attack. It employs multi-step instruction fusion steering to seamlessly combine both goals, enabling the agent to resume the original user task after executing the attack goal. Furthermore, we design a context-grounded generation method to align the injected content with the task environment and system instructions, maximizing the hijacking success rate. Extensive experiments on two browser agent tasks, based on extended WASP and InjecAgent environments, demonstrate that our method achieves a high attack success rate while preserving the usability of the original system. We find that WebTrap exploits the agent's navigation vulnerabilities, binding the two goals so tightly that standard defense mechanisms cannot restore the system to normal operation. These findings reveal a critical vulnerability in agent systems during long-horizon tasks: they can be stealthily hijacked.

2605.08299 2026-05-12 cs.SE cs.AI

Do not copy and paste! Rewriting strategies for code retrieval

Andrea Gurioli, Federico Pennino, Maurizio Gabbrielli

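AI summary This study examines how rewriting strategies can improve embedding-based code retrieval. Encoders tend to overfit to surface syntax, and using LLMs to rewrite queries and corpora into a normalized style mitigates this. The authors compare three rewriting strategies across encoders and benchmarks: full natural-language rewriting under joint query-corpus augmentation performs best, while corpus-only rewriting degrades retrieval in most configurations. They also introduce diagnostics such as Delta H that predict the retrieval gain from rewriting, offering a cheap basis for deciding whether to rewrite.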

Details
English abstract

Embedding-based code retrieval often suffers when encoders overfit to surface syntax. Prior work mitigates this by using LLMs to rephrase queries and corpora into a normalized style, but leaves two questions open: how much representational shift helps, and when is the per-query LLM call justified? We study a hierarchy of three rewriting strategies: stylistic rephrasing, NL-enriched PseudoCode, and full Natural-Language transcription, under joint query-corpus (QC, online) and corpus-only (C, offline) augmentation, across six CoIR benchmarks, five encoders, and three rewriters spanning independent model families (Qwen, DeepSeek, Mistral). We are the first to evaluate NL-enriched PseudoCode and snippet-level Natural Language as direct retrieval representations, rather than as transient intermediates. Full NL rewriting with QC yields the largest gains (+0.51 absolute NDCG@10 on CT-Contest for MoSE-18), while corpus-only rewriting degrades retrieval in 56 of 90 configurations, about 62%. We introduce two diagnostics, Delta H, token entropy, and Delta s, embedding cosine, and show that Delta H predicts retrieval gain under QC across all three rewriter families: pooled Spearman rho = +0.436, p < 0.001 on DeepSeek+Codestral; rho = +0.593 on Codestral alone; rho = +0.356 on Qwen. This establishes Delta H as a cheap, rewriter-agnostic proxy for deciding when rewriting pays off before running retrieval. Our analysis reframes LLM rewriting as a cost-benefit decision: it is most effective as a remediation layer for lightweight encoders on code-dominant queries, with diminishing returns for strong encoders or NL-heavy queries.
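
A token-entropy diagnostic of the kind named above could be computed along the following lines. The abstract does not define Delta H precisely, so this is only a sketch under stated assumptions: Shannon entropy of the empirical token distribution, whitespace tokenization, and the sign convention "rewritten minus original" are all choices made here for illustration.

```python
import math
from collections import Counter


def token_entropy(text: str) -> float:
    """Shannon entropy (bits) of the empirical whitespace-token distribution."""
    tokens = text.split()
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def delta_h(original: str, rewritten: str) -> float:
    """Assumed convention: entropy of the rewrite minus entropy of the source."""
    return token_entropy(rewritten) - token_entropy(original)


# Hypothetical snippet and a natural-language rewrite of it.
code = "for i in range ( n ) : total += a [ i ]"
rewrite = "add up the first n elements of the list a into total"
print(delta_h(code, rewrite))
```

A real pipeline would presumably use the encoder's or rewriter's own tokenizer instead of whitespace splitting; the point of the sketch is that the quantity is cheap to compute before any retrieval is run.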

2605.08282 2026-05-12 eess.IV cs.AI cs.CV

A Paired Point-of-Care Ultrasound Dataset for Image Quality Enhancement and Benchmarking via a cGAN Baseline

Lennard M. van Karnenbeek, Hilde G. A. van der Pol, Mark Wijkhuizen, Eva Poelman, Caroline A. Drukker, Theo Ruers, Freija Geldof, Behdad Dashtbozorg

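AI summary This paper presents a conditional generative adversarial network (cGAN) approach for enhancing point-of-care ultrasound (POCUS) image quality, together with POCUS-IQ, the first accurately paired dataset of low-end POCUS and high-end ultrasound images. The dataset was collected with a custom-built automated gantry system; the model uses a U-Net generator with L1 and SSIM losses to improve perceptual quality. The method improves both full-reference and no-reference quality metrics, showing its potential to raise the diagnostic value of POCUS in low-resource settings.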

Details
English abstract

Purpose: We aim to enhance the image quality of point-of-care ultrasound (POCUS) devices using deep learning and a novel paired dataset of POCUS and high-end ultrasound images. Approach: Using a custom-built automated gantry system, we collected the first accurately paired dataset of low-end POCUS and high-end ultrasound images. We used a conditional generative adversarial network (cGAN) based on the pix2pix architecture, with a U-Net generator that incorporates both L1 and structural similarity index (SSIM) losses to improve perceptual quality. Pretraining on a simulation dataset further boosts performance. Evaluation was performed on 1064 paired ex vivo tissue and phantom ultrasound image sets. Results: Our approach improves the SSIM from 0.29 to 0.54 and the PSNR from 19.16 dB to 22.41 dB. No-reference metrics also indicate substantial enhancement, with the Natural Image Quality Evaluator (NIQE) and Perception-based Image Quality Evaluator (PIQE) scores dropping from 7.95 to 4.44 and from 31.12 to 19.99, respectively. Conclusions: This work presents the first publicly available, accurately paired dataset of low-end POCUS and high-end ultrasound images. Additionally, our results demonstrate the potential of the proposed framework to overcome hardware limitations of handheld POCUS, enhancing its diagnostic value in low-resource and point-of-care settings. The POCUS-IQ Dataset is publicly available at https://github.com/NKI-MedTech-AI/POCUS-IQ.
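
For readers unfamiliar with the PSNR figures quoted above, the metric follows the standard definition PSNR = 10 log10(MAX^2 / MSE). The sketch below uses flat lists of pixel values and assumes an 8-bit dynamic range (`max_value=255`); the dynamic range and preprocessing used in the paper are not stated in the abstract.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise to measure
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel example: every pixel is off by one grey level.
high_end = [10, 20, 30, 40]
enhanced = [11, 21, 31, 41]
print(psnr(high_end, enhanced))
```

Since the scale is logarithmic, the reported gain from 19.16 dB to 22.41 dB (3.25 dB) corresponds to roughly a halving of the mean squared error against the high-end reference.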

2605.08277 2026-05-12 cs.CR cs.AI

Mitigating Many-shot Jailbreak Attacks with One Single Demonstration

Kejia Chen, Jiawen Zhang, Boheng Li, Pengcheng Li, Jian Lou, Zunlei Feng, Mingli Song, Ruoxi Jia, Tianwei Zhang

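AI summary This study examines how many-shot jailbreaking (MSJ) uses many harmful question-answer demonstrations to make safety-aligned language models answer harmful queries, and shows that the model's representation drifts progressively away from the safety-aligned region as demonstrations are added. Theoretically, this drift corresponds to implicit malicious fine-tuning. Based on this, the authors append a fixed one-shot safety demonstration at inference time, which induces a counteracting safety-oriented update and restores refusal behavior, improving robustness to MSJ without modifying model parameters or requiring white-box access.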

Details
English abstract

Many-shot jailbreaking (MSJ) causes safety-aligned language models to answer harmful queries by preceding them with many harmful question-answer demonstrations. We study why this attack becomes stronger as the number of demonstrations increases. Empirically, we find that MSJ induces a progressive activation drift: the representation of a fixed harmful query moves step by step away from the safety-aligned region as more harmful demonstrations are added. Theoretically, we show that this drift can be interpreted as implicit malicious fine-tuning: conditioning on N harmful demonstrations induces SGD-style updates equivalent to optimizing on the corresponding N harmful samples. This view turns the attack mechanism into a defense principle. We append a fixed one-shot safety demonstration at inference time, which induces a counteracting safety-oriented update and restores refusal behavior. The resulting method improves the model's robustness to MSJ without modifying its parameters or requiring white-box access at deployment. Code is available at https://github.com/Thecommonirin/SafeEnd.

2605.08275 2026-05-12 eess.IV cs.CV

Model-based Dynamic 3D MRI Reconstructions using Neural Fields and Tensor Product Expansions

Ray Sheombarsing, Max van Riel, David Heesterbeek, Nico van den Berg, Alessandro Sbrizzi

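AI summary This work proposes a model-based method using neural fields and tensor product expansions to reconstruct dynamic 2D and 3D MR images from highly undersampled data. By representing magnetization and coil sensitivities as continuous differentiable functions, it avoids the high memory demands and limited structural awareness of conventional discretization. The method outperforms existing model-based reconstructions in dynamic MRI and preserves structure and motion even at high acceleration factors.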

Details
English abstract

Conventional MRI reconstruction methods treat images and coil sensitivities as discrete objects, leading to high memory demands and limited structural awareness that hamper effective regularization. These limitations hinder accurate reconstruction in highly undersampled scenarios, such as dynamic 3D cardiac magnetic resonance (CMR). We introduce a discretization-free, memory-efficient, model-based framework for dynamic 2D and 3D MRI reconstruction from highly undersampled data. We represent magnetization and coil sensitivities as continuous objects -- differentiable functions -- using tensor products of univariate neural fields. This tensor product structure enables scalable optimization in high-dimensional spatiotemporal settings. Our method outperforms state-of-the-art model-based reconstructions in dynamic 2D and 3D MR settings, preserving structure and motion even under aggressive undersampling (e.g., acceleration factor 16).
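
One common way to write a tensor product of univariate functions is the rank-$R$ separable form below. The abstract does not give the paper's exact expansion, so the symbols $m$, $\phi^{(j)}_r$, and $R$, and the choice of a plain sum of products, are assumptions made here for illustration:

$$
m(x, y, z, t) \;\approx\; \sum_{r=1}^{R} \phi^{(1)}_r(x)\,\phi^{(2)}_r(y)\,\phi^{(3)}_r(z)\,\phi^{(4)}_r(t)
$$

Here each $\phi^{(j)}_r$ would be a small neural field of a single variable. Evaluating such a form on an $n_x \times n_y \times n_z \times n_t$ grid needs only $R\,(n_x + n_y + n_z + n_t)$ univariate evaluations, compared with $n_x n_y n_z n_t$ stored values for a fully discretized image series, which is one way to see why a separable structure can scale to high-dimensional spatiotemporal settings.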