arXivDaily arXiv daily academic digest, updated Monday through Friday
2405.09570 2026-05-12 eess.SP cs.LG cs.SD eess.AS

FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time

Md Jobayer, Md. Mehedi Hasan Shawon, Md Zakir Hossain, Shreya Ghosh, Imre Rudas, Tom Gedeon, Md Rakibul Hasan

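AI summary This paper proposes FunnelNet, an end-to-end deep learning framework for real-time monitoring of digital heart murmurs. The method combines traditional filtering with a depthwise separable convolutional network, uses a Butterworth filter and the continuous wavelet transform to extract features from heart sounds, and learns features efficiently through three network modules: squeeze, bottleneck, and expansion. Experiments show that, with only 5.4k parameters, the model reaches 85% accuracy and 92% specificity on a pediatric heart sound dataset and delivers strong real-time detection performance on resource-constrained devices, offering a practical option for accessible diagnosis in regions with scarce medical resources.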

English abstract

Heart murmurs are abnormal sounds caused by turbulent blood flow in the heart. Several diagnostic methods are available to detect heart murmurs and their severity, including cardiac auscultation, echocardiography, and phonocardiography (PCG). However, these methods have limitations, including the need for extensive training among healthcare providers, the cost and accessibility of echocardiography, and noise interference during PCG data processing. This study proposes an end-to-end real-time heart murmur detection approach using traditional and depthwise separable convolutional networks. We applied a Butterworth filter and Continuous Wavelet Transform (CWT) to eliminate noise and extract meaningful features from the PCG data. The proposed network consists of three parts: a Squeeze net that generates a compressed data representation, a Bottleneck layer that minimizes computational complexity using depthwise-separable convolutions, and an Expansion net that up-samples the data to capture fine details. We evaluated our model on the publicly available CirCor pediatric heart sound dataset. Using only $\sim$5.4k parameters, we achieved an accuracy of 85%, a sensitivity of 85%, and a specificity of 92%, successfully outperforming several larger models. Furthermore, we converted our network into a TinyML format and tested it on two resource-constrained devices, achieving an average real-time inference accuracy of 91% on a Raspberry Pi 4B and 80% on an Android smartphone. The proposed lightweight model offers a robust deep learning framework for accurate, real-time heart murmur detection, showing strong promise for accessible medical diagnostics in limited-resource environments. The code is publicly available at https://github.com/jobayer/FunnelNet.
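The abstract credits depthwise-separable convolutions for the small parameter budget. The sketch below counts parameters for a standard 2D convolution and for its depthwise-separable counterpart. The layer sizes (32 input channels, 64 output channels, 3x3 kernel) are hypothetical and are not taken from FunnelNet.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameters of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Parameters of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = conv2d_params(32, 64, 3)
separable = depthwise_separable_params(32, 64, 3)
print(standard, separable, round(standard / separable, 1))
```

For these illustrative sizes the separable layer needs roughly an eighth of the parameters, which is the kind of saving that makes a model with a few thousand parameters plausible.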

1706.00476 2026-05-12 math.OC cs.LG stat.ML

The Mixing method: low-rank coordinate descent for semidefinite programming with diagonal constraints

Po-Wei Wang, Wei-Cheng Chang, J. Zico Kolter

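AI summary This paper proposes a low-rank coordinate descent method, called the Mixing method, for structured semidefinite programs with diagonal constraints. The method is simple to implement, has no parameters to tune, and substantially improves optimization performance over existing methods. The authors prove that it is strictly decreasing, converges to a critical point, and that for sufficient rank all non-optimal critical points are unstable. Moreover, under random initialization it converges almost surely to the global optimum at a locally linear rate, making it the first low-rank semidefinite programming method shown to reach a global optimum on the spherical manifold without assumption. The authors apply the algorithm to relaxations of the maximum cut and maximum satisfiability problems and demonstrate substantial improvements over existing methods along several dimensions.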

Comments The proof has been updated to match the version presented in the 2021 thesis: https://ml.cmu.edu/research/phd-dissertation-pdfs/thesis_poweiw.pdf

English abstract

In this paper, we propose a low-rank coordinate descent approach to structured semidefinite programming with diagonal constraints. The approach, which we call the Mixing method, is extremely simple to implement, has no free parameters, and typically attains an order of magnitude or better improvement in optimization performance over the current state of the art. We show that the method is strictly decreasing, converges to a critical point, and further that for sufficient rank all non-optimal critical points are unstable. Moreover, we prove that with a step size, the Mixing method converges to the global optimum of the semidefinite program almost surely in a locally linear rate under random initialization. This is the first low-rank semidefinite programming method that has been shown to achieve a global optimum on the spherical manifold without assumption. We apply our algorithm to two related domains: solving the maximum cut semidefinite relaxation, and solving a maximum satisfiability relaxation (we also briefly consider additional applications such as learning word embeddings). In all settings, we demonstrate substantial improvement over the existing state of the art along various dimensions, and in total, this work expands the scope and scale of problems that can be solved using semidefinite programming methods.
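As an illustration of low-rank coordinate descent on unit vectors, the sketch below minimizes f(V) = sum_{i,j} C[i][j] <v_i, v_j> subject to ||v_i|| = 1 by replacing one v_i at a time with the normalized, negated weighted sum of the other vectors. The abstract does not spell out the update, so that rule is an assumption here, as are the 4-cycle graph, the rank k = 3, and all function names. This is not the authors' reference implementation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(C, V):
    """f(V) = sum over ordered pairs (i, j) of C[i][j] * <v_i, v_j>."""
    n = len(V)
    return sum(C[i][j] * dot(V[i], V[j]) for i in range(n) for j in range(n))


def random_unit_vectors(n, k, rng):
    vectors = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(dot(v, v))
        vectors.append([x / norm for x in v])
    return vectors


def mixing_sweep(C, V):
    """One pass of coordinate descent: each v_i is set to -g_i / ||g_i||."""
    n, k = len(V), len(V[0])
    for i in range(n):
        g = [0.0] * k
        for j in range(n):
            if j != i:
                for t in range(k):
                    g[t] += C[i][j] * V[j][t]
        norm = math.sqrt(dot(g, g))
        if norm > 1e-12:  # leave v_i unchanged if the direction is degenerate
            V[i] = [-x / norm for x in g]
    return V


# Toy instance: unit-weight 4-cycle, symmetric C with zero diagonal.
C = [[0.0, 1.0, 0.0, 1.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [1.0, 0.0, 1.0, 0.0]]
rng = random.Random(0)
V = random_unit_vectors(4, 3, rng)
history = [objective(C, V)]
for _ in range(50):
    mixing_sweep(C, V)
    history.append(objective(C, V))
print(history[0], history[-1])
```

With symmetric C, each single-vector update is the exact minimizer over that vector with the others held fixed, so the recorded objective values never increase.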

2605.09232 2026-05-12 cs.CR cs.LG

Privacy-Preserving Distributed Learning in IoT Systems: A Unified Threat Model and Evaluation Framework

John Cartmell, Alexander Williams

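AI summary As IoT devices become widespread, distributed learning frameworks keep data local while sharing model updates, which introduces privacy leakage risks. This paper proposes a unified threat model covering model inversion, membership inference, gradient leakage, and other attacks, and builds an evaluation framework for comparing the effectiveness of privacy-preserving methods under realistic attack scenarios and IoT resource constraints. The study analyzes several representative methods, reveals a fundamental trade-off between privacy strength and system efficiency, and finds that Bloom filter-based methods provide lightweight privacy protection while keeping computational and communication overhead low.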

Comments 14 pages, 6 figures

English abstract

The increasing deployment of Internet-of-Things (IoT) devices has accelerated the use of distributed learning frameworks, where data remains local while model updates are shared across decentralized systems. Although this reduces centralized data collection, it introduces privacy risks through the exchange of gradients, model parameters, and intermediate representations. A variety of privacy-preserving techniques have been proposed to address these risks, including differential privacy, cryptographic methods, and lightweight system-level approaches. However, existing surveys often evaluate these methods in isolation and lack a unified framework for comparing their effectiveness under realistic attack models and IoT resource constraints. This paper presents a structured analysis of privacy-preserving techniques for distributed learning in IoT environments. A unified threat model is introduced that captures model inversion, membership inference, gradient leakage, and communication-based attacks. Building on this model, an evaluation framework is developed to compare methods in terms of both privacy robustness and system-level efficiency, including computational, memory, and communication overhead. Using this framework, representative approaches including differential privacy, homomorphic encryption, secure multi-party computation, distributed selective stochastic gradient descent, and Bloom Filter-based methods are analyzed. The results highlight a fundamental trade-off between privacy strength and system efficiency. In particular, Bloom Filter-based encodings are shown to provide lightweight privacy through collision-induced ambiguity while maintaining low computational and communication overhead. The paper provides a unified perspective on privacy-preserving design choices for distributed learning in IoT systems.
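The phrase "collision-induced ambiguity" is easier to see with a concrete Bloom filter. The sketch below is a generic textbook Bloom filter, not the encoding evaluated in the paper; the class name, the bit-array size, the number of hash functions, and the SHA-256-based hashing scheme are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter over strings (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True for every added item; may also be True for items never added.
        return all(self.bits[pos] for pos in self._positions(item))


items = ["sensor-a", "sensor-b", "sensor-c"]
bf = BloomFilter(num_bits=64, num_hashes=3)
for item in items:
    bf.add(item)
print(sum(bf.bits), "bits set for", len(items), "items")
```

Because different inputs can set the same bits, an observer who sees only the bit array cannot tell with certainty which inputs produced it. The cost of that ambiguity is a nonzero false-positive rate on membership queries.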

2605.09225 2026-05-12 cs.CR cs.AI cs.LG

The Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security Beyond Binary Scoring

Ismail Hossain, Tanzim Ahad, Md Jahangir Alam, Sai Puppala, Syed Bahauddin Alam, Sajedul Talukder

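AI summary This paper studies jailbreak attacks on large language models (LLMs) and proposes an evaluation approach that goes beyond binary scoring. The authors build a large jailbreak dataset of 114,000 adversarial prompts and develop automated jailbreak generation that produces fluent attack inputs from harmful seed prompts. They also propose OPTIMUS, a training-free evaluation metric that measures jailbreak effectiveness more accurately and exposes an attack-optimal regime that the traditional binary success-rate metric misses.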

Comments This paper is under review at one of the top security venues

English abstract

Jailbreak attacks -- adversarial prompts that bypass LLM alignment through purely linguistic manipulation -- pose a growing operational security threat, yet the field lacks large-scale, reproducible infrastructure for generating, categorizing, and evaluating them systematically. This paper addresses that gap with three contributions. (1) Large-scale compositional jailbreak dataset. We construct 114,000 adversarial prompts by applying 912 composing strategies to 125 harmful seed prompts from JailBreakV-28K. Every prompt is assigned to one of 14 cybersecurity attack categories (e.g., malware, phishing, privilege escalation) via a six-model majority-vote pipeline, and each strategy is ranked by effectiveness per category, enabling principled strategy selection grounded in concrete adversarial objectives. (2) Automated jailbreak generation. We instruction-fine-tune category-aware LLMs on Moderate and Optimal subsets, producing models that synthesize fluent jailbreak prompts from a harmful seed at inference time -- no templates, no gradient search. Our generators achieve perplexity 24-39 versus 40-140 for AutoDAN and AmpleGCG, with safety-filter evasion rates of 0.29-0.51 Mal (LlamaPromptGuard-2-86M), enabling controllable, scalable red-teaming under realistic adversarial conditions. (3) OPTIMUS: a training-free jailbreak evaluator. OPTIMUS is a continuous metric J(S,H) that jointly captures semantic similarity between the harmful seed and the jailbreak (S) and harmfulness probability (H) via calibrated penalty functions. Unlike binary attack success rate (ASR), OPTIMUS requires no task-specific training, generalizes across evolving strategies, and exposes a stealth-optimal regime (S*=0.57, H*=0.43) that ASR misses. Experiments across 114,000 prompts confirm that OPTIMUS separates Weak, Moderate, and Optimal jailbreaks with category-level evidence binary evaluation cannot supply.

2605.09222 2026-05-12 cs.DB cs.AI cs.SE

Detect, Localize, and Explain: Interactive Hierarchical Log Anomaly Analytics with LLM Augmentation

Lei Ma, Suhani Chaudhary, Ethan Shanbaum, Athanasios Tassiadamis, Peter M. VanNostrand, Dennis M. Hofmann, Haowen Xu, Elke Rundensteiner

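AI summary This paper presents Krone-viz, an interactive log anomaly analysis system built on a hierarchical log abstraction, aimed at the problem that conventional flat log sequences are loosely structured and make precise anomaly diagnosis difficult. Its core approach decomposes flat logs into semantically coherent units at the entity, action, and status levels and combines modular detection with large language model (LLM) reasoning to detect, localize, and explain anomalies precisely. The system lets engineers interactively review and revise the log decomposition, the detection results, and the LLM-generated explanations, improving the interpretability and practicality of log analysis.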

English abstract

Logs are ubiquitous in modern systems. Unfortunately, their unstructured nature in flat sequences limits understanding of execution behaviors, hindering effective anomaly diagnosis. To address this, Krone introduces a novel hierarchical log abstraction that transforms flat log sequences into semantically coherent units across entity, action, and status levels. Building on this abstraction, Krone introduces a hierarchical orchestration framework that decomposes flat log sequences into hierarchical execution units and performs modular detection over them. It executes and optimizes the modular detection tasks across levels, enabling precise anomaly detection, localization, and explanation with selective invocation of LLM-based reasoning. In this work, we present Krone-viz, an interactive visualization system based on Krone, which makes hierarchical log analysis interpretable and actionable for software engineers and system operators. Demonstrated on the widely used HDFS benchmark dataset, Krone-viz supports: 1) examining hierarchical decompositions of flat log sequences, 2) inspecting detection results and abnormal segments identified by Krone with LLM-generated explanations, and 3) reusing, reviewing, and revising knowledge generated by LLMs with human-in-the-loop guardrails. The code of Krone-viz is available at https://github.com/LeiMa0324/KRONE_Demo_official, and we deploy a live demo at https://leima0324.github.io/KRONE_Demo_official.

2605.09213 2026-05-12 math.AP cs.LG math.PR

Kinetic theory for Transformers and the lost-in-the-middle phenomenon

Mitia Duerinckx, Borjan Geshkovski, Stefano Rossi

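AI summary This paper studies causal self-attention dynamics, a toy model of decoder Transformers, and interprets it as a non-exchangeable interacting particle system. By adapting cumulant expansions and estimating correlations with Glauber calculus, the authors prove a quantitative mean-field limit and a next-order characterization of correlations. For i.i.d. uniformly distributed tokens, the limiting correlation equation can be solved in closed form, giving a rigorous explanation of the empirically observed "lost-in-the-middle" phenomenon: token retrieval performance is U-shaped in the source position, with primacy, recency, and a unique interior minimum under an explicit smallness condition.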

English abstract

We study causal self-attention dynamics -- a toy model for decoder Transformers -- which we interpret as a non-exchangeable interacting particle system. Adapting cumulant expansions to the triangular causal dependency structure of the model, and appealing to non-hierarchical methods to estimate correlations using Glauber calculus, we prove a quantitative mean-field limit result and a next-order characterization of correlations. For iid uniformly distributed tokens, the limiting correlation equation can be solved in closed form and we obtain a rigorous explanation of the empirically observed \emph{lost-in-the-middle} phenomenon: the token retrieval profile, as a function of the source position in the prompt, is $\mathsf{U}$-shaped, with primacy, recency, and a unique interior minimum under an explicit smallness condition.

2605.09209 2026-05-12 math.OC cs.AI

Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets

Saeed Masiha, Zebang Shen, Negar Kiyavash, Niao He

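AI summary This paper studies how to solve optimistic bilevel optimization problems when the lower-level problem has a non-isolated manifold of solutions. The authors propose a select-then-differentiate approach that performs explicit optimistic selection on the solution manifold and computes the hyper-gradient with a pseudoinverse, yielding a gradient formula that holds when the lower-level solution is not unique. Under a non-degeneracy condition the hyper-objective is locally smooth, and in experiments on matched-budget source reweighting for large language models the method performs strongly.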

English abstract

We study optimistic bilevel optimization when the lower-level problem has a non-isolated manifold of minimizers. In this setting, the hyper-objective may be non-differentiable because the upper-level criterion must choose among multiple lower-level solutions. Under a local Polyak--Łojasiewicz (PŁ) condition, we show that differentiability does not require the lower-level solution set to be a singleton: uniqueness of the optimistic selection is sufficient. This yields an explicit pseudoinverse-based hyper-gradient formula extending the classical singleton-minimizer result. We further characterize the regularity of the hyper-objective: non-degeneracy of the selected minimizer along the solution manifold yields local smoothness, while failure of uniqueness can create many non-differentiable points and failure of non-degeneracy can destroy all positive Hölder regularity of the hyper-gradient. Motivated by this theory, we propose HG-MS, a select-then-differentiate method combining explicit optimistic selection with efficient pseudoinverse-based hyper-gradient computation. Despite the nonconvex nature of optimistic selection over the lower-level solution manifold, we show that HG-MS converges to a stationary point of the optimistic objective with complexity governed by the intrinsic dimension of the solution manifold rather than its ambient dimension. Empirically, we test a practical variant of HG-MS for matched-budget LLM source reweighting. This variant preserves the select-then-differentiate principle and obtains the best GSM8K/MATH scores across the tested backbones, along with competitive or best MT-Bench instruction-following results.

2605.09128 2026-05-12 cs.MA cs.AI

Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design

Hershraj Niranjani, Ujwal Kumar, Phan Xuan Tan

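AI summary This study asks whether behavioral rules in multi-agent systems should be set through internal deliberation or external optimization. It compares internal deliberation and external evolution in three social environments and finds that external evolution performs better in collective-action settings, while the two are comparable in bilateral trading. The experiments also show that external optimization can produce harmful cooperation when incentives shift, whereas internal deliberation fails to explore punishment mechanisms, suggesting that external optimization has the advantage under specific conditions.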

Comments 20 pages

English abstract

Multi-agent AI systems need behavioral constitutions, but it is unresolved whether such rules should emerge internally through agent self-governance or be discovered externally through optimization. We present the first controlled comparison of internal deliberation and external evolution across three social environments: a coordination grid-world, an iterated public goods game, and a bilateral trading market. Across 180 simulation runs, evolution significantly outperforms deliberation in collective-action settings (p < 0.01), while neither method improves outcomes in bilateral trading. A multiplier ablation reveals that evolution's advantage inverts when incentives shift: at pool multiplier (m = 0.75) the evolved constitution forces value-destroying cooperation and becomes the worst-performing method. Notably, no deliberation run across thirty trials ever proposed punishment -- the canonical cooperation-sustaining mechanism evolution reliably discovers -- suggesting external optimization wins on peaks while internal self-governance trades peaks for structural responsiveness.
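The multiplier ablation is easier to follow with the textbook linear public goods payoff in hand. The sketch below assumes that standard form: each agent keeps whatever it does not contribute, and the pool is multiplied by m and split equally. The paper's actual environment may differ, and the group size and endowment are hypothetical.

```python
def public_goods_payoffs(contributions, endowment, multiplier):
    """Textbook linear public goods game: the pool is scaled and shared equally."""
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


# Hypothetical setup: 4 agents, endowment 10, everyone contributes everything.
full_cooperation = [10.0, 10.0, 10.0, 10.0]
no_cooperation = [0.0, 0.0, 0.0, 0.0]

welfare_m_high = sum(public_goods_payoffs(full_cooperation, 10.0, 1.5))
welfare_m_low = sum(public_goods_payoffs(full_cooperation, 10.0, 0.75))
welfare_defect = sum(public_goods_payoffs(no_cooperation, 10.0, 0.75))
print(welfare_m_high, welfare_m_low, welfare_defect)
```

Under this form, a multiplier below 1 shrinks every contributed unit, so full cooperation leaves the group with less than it would have by contributing nothing. That is consistent with the abstract's description of cooperation at m = 0.75 as value-destroying.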

2605.09125 2026-05-12 eess.SY cs.LG cs.SY math.OC

Transfer Learning of Multiobjective Indirect Low-Thrust Trajectories Using Diffusion Models and Markov Chain Monte Carlo

Jannik Graebner, Ryne Beeson

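AI summary This paper studies transfer learning with diffusion models and Markov chain Monte Carlo (MCMC) for multiobjective indirect low-thrust trajectory optimization. To address frequently changing mission parameters and the difficulty of generating high-quality initial conditions, it proposes a transfer-learning framework that combines homotopy with MCMC to generate training data more efficiently and speed up optimization. Experiments show that the method strikes a good balance between sample quality and computational cost, produces more feasible solutions, and improves the quality of the Pareto front, offering an effective approach to trajectory optimization under varying parameters.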

English abstract

Preliminary low-thrust spacecraft mission design is a global search problem characterized by a complex solution landscape, multiple objectives, and numerous local minima. During this phase, mission parameters are often not yet fully defined, requiring new solutions to be generated at a high cadence across varying parameter values. When combined with the indirect approach to optimal control, diffusion models can accelerate this search by learning distributions that represent high-quality initial costates. However, generating training data remains expensive, and opportunities exist to better exploit past data. We propose a transfer-learning framework that combines homotopy in a mission parameter with Markov chain Monte Carlo (MCMC) to generate training data more efficiently. The approach reformulates a multiobjective optimization problem as sampling from an unnormalized target distribution in costate space. We compare three MCMC algorithms on a planar multi-revolution transfer in the circular restricted three-body problem, with homotopy in the system mass parameter. The results show that gradient-based MCMC variants achieve the best trade-off between sample quality and computational cost. For the test transfer, the proposed framework generates 40 % more feasible solutions and achieves a higher-quality Pareto front than a state-of-the-art indirect approach based on adjoint control transformations and gradient-based optimization. Finally, the MCMC-generated samples are used to fine-tune a diffusion model conditioned on the mass parameter, enabling it to learn a global representation of the underlying solution distribution and efficiently generate new solutions. These findings establish the transfer-learning framework as a practical method for efficiently solving indirect trajectory optimization problems with varying parameters.
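Sampling from an unnormalized target, as the abstract describes, can be illustrated with the simplest MCMC variant, random-walk Metropolis. The abstract reports that gradient-based variants work best, so this sketch is a plain baseline and not one of the paper's recommended samplers. The toy log-density (a standard Gaussian in two dimensions standing in for a cost over costates), the step size, and the sample count are all assumptions.

```python
import math
import random


def random_walk_metropolis(log_density, x0, step, n_samples, rng):
    """Sample from exp(log_density) using Gaussian random-walk proposals."""
    x = list(x0)
    log_p = log_density(x)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old); no normalizer needed.
        if log_p_new >= log_p or rng.random() < math.exp(log_p_new - log_p):
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_samples


def toy_log_density(x):
    # Hypothetical stand-in for a negative cost: -0.5 * ||x||^2.
    return -0.5 * sum(xi * xi for xi in x)


rng = random.Random(0)
samples, acceptance_rate = random_walk_metropolis(
    toy_log_density, [3.0, -3.0], 1.0, 20000, rng
)
kept = samples[2000:]  # discard early samples taken before the chain settles
mean = [sum(s[d] for s in kept) / len(kept) for d in range(2)]
print(acceptance_rate, mean)
```

Only differences of log-densities enter the acceptance test, which is why an unnormalized target is enough.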

2605.09120 2026-05-12 cs.IR cs.SD

Reddit2Deezer: A Scalable Dataset for Real-World Grounded Conversational Music Recommendation

Haven Kim, Julian McAuley

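AI summary Conversational music recommendation (CMR) research currently faces a dilemma: authentic dialogue corpora are limited in scale, while synthetic corpora scale up but lack naturalness. This paper presents Reddit2Deezer, a reality-grounded CMR dataset built from 190k unique {thread, leaf-comment} pairs, released in raw and paraphrased versions, with every musical entity linked to a Deezer identifier for easy access to audio previews and rich metadata. The dataset has been validated by humans for dialogue quality, item grounding, and paraphrase accuracy, and serves as a resource for research on content-grounded conversational recommendation.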

English abstract

Conversational music recommendation (CMR) research currently faces a tradeoff between authentic dialogue corpora that are limited in scale and synthesized corpora that scale up but whose conversations are artificially constructed rather than naturally observed. In this paper, we introduce Reddit2Deezer, a reality-grounded CMR resource derived from 190k unique {thread, leaf-comment} pairs. We release the resource in two versions: a raw version that preserves authenticity, and a paraphrased version that maximizes long-term reproducibility. Each musical entity is linked to a Deezer identifier, which provides straightforward access to audio previews and rich metadata (e.g., genre tags, popularity, BPM), opening the door to future research on content-grounded conversational recommendation. A human validation confirms the quality of the dialogues, item grounding, and paraphrases. The dataset is available at https://huggingface.co/datasets/McAuley-Lab/Reddit2Deezer.

2605.09118 2026-05-12 quant-ph cs.LG

Quantum Transfer Learning Shows Improved Robustness in Low-Data Regimes

Li-An Lo, Li-Yi Hsu, Hsien-Yi Hsieh

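AI summary This paper studies how robust quantum models are in transfer learning when data is limited. Comparing several quantum and classical models across different transfer tasks and retraining configurations, it finds that classical models usually perform better when data is plentiful, but quantum models show more stable performance and higher data efficiency in low-data conditions. The results provide empirical support for the potential of quantum models to improve robustness in resource-constrained transfer learning.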

Comments 22 pages, 5 figures

English abstract

Transfer learning under limited data is a challenging setting, where models must adapt to new tasks with minimal supervision. Prior work has primarily focused on improving absolute accuracy in transfer learning. However, empirical evidence comparing quantum and classical models in realistic transfer learning settings remains limited, especially in low-data regimes. In this work, we systematically study the robustness of quantum models under reduced training data. We evaluate multiple quantum and classical architectures across diverse transfer tasks and retraining configurations, and quantify robustness using accuracy degradation and relative performance retention (RPR). Our results show that, although classical models often achieve higher peak performance, they exhibit significantly larger degradation when training data is limited. In contrast, quantum models maintain more stable performance across data regimes, indicating improved robustness and data efficiency. These findings provide empirical evidence that quantum models can offer improved robustness in low-resource transfer learning scenarios.

2605.09076 2026-05-12 cs.MA cs.AI cs.LG

Robust Multi-Agent LLMs under Byzantine Faults

Haejoon Lee, Vincent-Daniel Yun, Hyeonho Oh, Dimitra Panagou, Sai Praneeth Karimireddy

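AI summary As large language model (LLM) agents increasingly collaborate over peer-to-peer networks to improve reliability, handling the influence of unreliable or Byzantine agents becomes a key problem. This paper proposes Self-Anchored Consensus (SAC), a fully decentralized protocol that strengthens robustness by iteratively exchanging responses, evaluating them locally, and filtering out unreliable messages, and it gives conditions on the communication graph under which the system keeps working despite Byzantine influence. Experiments show that SAC suppresses Byzantine influence across a range of communication topologies and clearly outperforms existing methods.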

English abstract

Large language model (LLM) agents increasingly collaborate over peer-to-peer networks to improve their reliability. However, these same interactions can also become a source of vulnerability, as unreliable or Byzantine agents may sway neighboring agents toward incorrect conclusions and degrade overall system performance. Existing methods rely on leader-based coordination or self-reported confidence, both of which are susceptible to adversarial manipulation. We study decentralized LLM multi-agent systems (LLM-MAS) and propose Self-Anchored Consensus (SAC), a fully decentralized iterative filter-and-refine protocol in which agents iteratively exchange responses, locally evaluate and filter unreliable messages, and refine their own outputs. We present $(F{+}1)$-robustness conditions for the communication graph that ensure honest agents preserve and propagate reliable information despite Byzantine influence. Experiments on mathematical and commonsense reasoning benchmarks show that SAC effectively suppresses Byzantine influence and consistently improves performance across diverse communication topologies, whereas prior methods degrade under adversarial conditions.

2605.09075 2026-05-12 stat.ML cs.LG

Optimality of Sub-network Laplace Approximations: New Results and Methods

Swarnali Raha, Kshitij Khare, Rohit K Patra

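AI summary This paper studies the optimality of sub-network Laplace approximations for uncertainty quantification in deep neural networks. Existing methods typically select parameter subsets heuristically, ignore cross-parameter interactions, and lack theoretical guarantees. The authors prove that every sub-network Laplace method systematically underestimates the predictive variance of the full Laplace posterior, and that this bias shrinks as the retained sub-matrix grows. Building on this finding, they propose two sub-network Laplace approximations, one gradient-based and one greedy, establish theoretical guarantees for them, and report experiments in which they outperform existing methods.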

Comments 34 Pages, 8 Figures, 2 Tables

English abstract

Although the Laplace approximation offers a simple route to uncertainty quantification in deep neural networks, its reliance on inverting large Hessian matrices has motivated a range of computationally feasible low-dimensional or sparse approximations. A prominent class of such methods, sub-network Laplace approximations, constructs surrogates by restricting attention to a small subset of parameters. Existing approaches in this family typically rely on diagonal, layer-wise, or other architectural heuristics for subset selection, which ignore cross-parameter interactions and lack formal optimality guarantees. In this paper, we provide a rigorous theoretical analysis of the sub-network Laplace paradigm. We prove that all sub-network Laplace methods systematically underestimate the predictive variance of the full Laplace posterior, and that this bias decreases monotonically as the retained sub-matrix expands. Leveraging this insight, we propose two principled, analytically grounded sub-network Hessian approximations: Gradient-Laplace selects parameters with the largest average squared gradients of the model output with respect to the parameters over a reference dataset, while Greedy-Laplace iteratively refines this selection by accounting for off-diagonal interactions in the precision matrix. We establish theoretical guarantees characterizing their optimality properties and show that Gradient-Laplace provably outperforms existing heuristic approaches. Extensive numerical studies across diverse settings indicate that these methods perform strongly relative to existing benchmarks.
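The Gradient-Laplace selection rule, as worded in the abstract, ranks parameters by the average squared gradient of the model output over a reference dataset. The sketch below implements that ranking on precomputed per-example gradients. The function name, the input layout (one gradient vector per reference example), and the toy numbers are assumptions, and the Greedy-Laplace refinement is not shown.

```python
def gradient_laplace_select(per_example_grads, k):
    """Return the k parameter indices with the largest mean squared gradient.

    per_example_grads: list of gradient vectors, one per reference example,
    each holding d(output)/d(parameter_j) for every parameter j.
    """
    n = len(per_example_grads)
    p = len(per_example_grads[0])
    scores = [sum(g[j] ** 2 for g in per_example_grads) / n for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Toy gradients for 2 reference examples and 3 parameters (made-up values).
grads = [[0.1, 2.0, -0.5],
         [0.2, -1.0, 0.5]]
selected, scores = gradient_laplace_select(grads, k=2)
print(selected, scores)
```

The selected indices define which rows and columns of the Hessian are retained. The rest of the Laplace approximation, which builds and inverts that sub-matrix, is outside this sketch.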

2605.09070 2026-05-12 cs.CR cs.AI

Single-Configuration Attack Success Rate Is Not Enough: Jailbreak Evaluations Should Report Distributional Attack Success

Carsten Maple, Abhishek Kumar, Riya Tapwal

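AI summary Many jailbreak research papers report attack success rates for only a limited number of parameter configurations and overlook the variety of parameter combinations. This paper argues that the practice cannot adequately capture the real threat posed by parameterized jailbreak attacks and that evaluation should be distributional. The authors introduce two new measures, the Variant Sensitivity Measure (VSM) and Union Coverage (UC), to assess attacks more completely, and show experimentally why the measures matter across several attack methods and target models.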

English abstract

Many jailbreak attack research papers report attack success rates (ASR) for a limited number of parameter settings, even though there are many combinations of parameter settings that could be used. Further, when new jailbreak papers are released, they often benchmark results against single configurations of existing attacks. This position paper argues that such practices are fundamentally insufficient for characterising the threat posed by parameterised jailbreak attacks and for comparing attacks. Most jailbreak attacks expose multiple internal parameters (system prompt templates, conversation rounds, cipher dispersion, teaching shots), and ASR varies substantially across these parameters. Reporting only the best-case configuration discards two pieces of information that defenders genuinely need: how typical that performance is across the variant space, and how much of the attack surface is missed by selecting a single variant. We propose two new measures for jailbreak attacks: the Variant Sensitivity Measure (VSM) and Union Coverage (UC). VSM quantifies how far the best reported ASR deviates from the mean ASR across the tested variant space; UC is the total fraction of prompts resulting in unsafe responses across all tested configurations. We empirically demonstrate the importance of these measures using two attack families across three open-source target models. For PAIR, the best template reaches 69% ASR on Mistral-7B and 75% on Qwen3-0.6B, while UC rises to 88% and 93%, respectively. For bijection on Mistral-7B, the best variant reaches 81% ASR, but the 36-variant union covers 100% of HarmBench-100 prompts. We argue that distributional reporting (publishing VSM alongside ASR and enumerating variant coverage as fully as compute allows) should become the new minimum standard for parameterised jailbreak evaluation.
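The two measures can be computed from a table of per-prompt outcomes for each attack variant. In the sketch below, Union Coverage follows the abstract's wording directly. The abstract does not give a formula for VSM, so the best-minus-mean gap computed here is only an assumed stand-in. The variant names and outcomes are made up.

```python
def attack_success_rates(outcomes):
    """ASR per variant: fraction of prompts that produced an unsafe response."""
    return {variant: sum(hits) / len(hits) for variant, hits in outcomes.items()}


def union_coverage(outcomes):
    """Fraction of prompts broken by at least one tested variant."""
    per_prompt = [any(row) for row in zip(*outcomes.values())]
    return sum(per_prompt) / len(per_prompt)


def best_minus_mean_gap(asr):
    """Assumed stand-in for VSM: best ASR minus mean ASR over variants."""
    values = list(asr.values())
    return max(values) - sum(values) / len(values)


# Made-up outcomes: 3 variants evaluated on the same 4 prompts.
outcomes = {
    "variant_a": [True, True, False, False],
    "variant_b": [False, False, True, False],
    "variant_c": [True, False, False, False],
}
asr = attack_success_rates(outcomes)
uc = union_coverage(outcomes)
gap = best_minus_mean_gap(asr)
print(asr, uc, gap)
```

In this toy table the best single variant breaks half of the prompts, while the union of variants breaks three quarters of them. A best-case ASR alone would hide that difference.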

2605.09061 2026-05-12 q-fin.CP cs.LG

A Market-Rule-Informed Neural Network for Efficient Imbalance Electricity Price Forecasting

Runyao Yu, Julia Lin, Derek W. Bunn, Jochen Stiasny, Wentao Wang, Yujie Chen, Tara Esterl, Peter Palensky, Jochen L. Cremer

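AI summary This paper proposes a market-rule-informed neural network framework for efficient forecasting of imbalance electricity prices, addressing the nonlinear pricing mechanisms, heterogeneous input signals, and missing data that complicate real-time trading. The method embeds price formation rules into the latent space of the neural network, preserving raw signal information while exploiting transparent market-rule priors, which improves forecasting accuracy and computational efficiency. Experiments show that the model needs fewer parameters and less training time than generic deep learning models, supporting the combination of market rules and neural networks for accurate and computationally sustainable forecasting in industrial energy trading.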

Comments 10 pages, 3 figures, 3 tables

English abstract

Accurate and efficient imbalance electricity price forecasting is critical for industrial energy trading systems, especially as battery assets and automated bidding pipelines increasingly participate in balancing markets. However, real-time forecasting is complicated by nonlinear market-rule-based price formation, heterogeneous input signals, and incomplete data availability caused by communication delays, publication lags, and measurement outages. This paper proposes a market-rule-informed neural forecasting framework that embeds imbalance price formation rules into the latent space of an expressive neural network. The proposed framework preserves raw signal information while exploiting transparent market-rule priors. We further analyze operational robustness by removing price-component information and characterize how forecasting performance scales with input length and forecasting horizon. Experimental results show that the proposed model achieves competitive forecasting performance with substantially fewer trainable parameters and shorter training time than generic deep learning baselines, demonstrating that market-rule priors and expressive neural networks should be jointly used for accurate and computationally sustainable forecasting in industrial energy trading applications. The implementation is publicly available at https://runyao-yu.github.io/MRINN/.

2605.09058 2026-05-12 physics.comp-ph cs.LG

Nonlinear GENERIC Informed Neural Networks (N-GINNs): learning GENERIC dynamics with non-quadratic dissipation potentials

Vojtěch Votruba, Zequn He, Weilun Qiu, Celia Reina, Michal Pavelka

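AI summary This paper proposes Nonlinear GENERIC Informed Neural Networks (N-GINNs), a deep learning framework for discovering from data the evolution equations of systems described by the nonlinear GENERIC formalism. By introducing convex dissipation potentials, the method can identify a broader class of thermodynamically consistent dynamical systems, including systems with non-quadratic dissipation potentials. Through suitable reparameterizations of the bivector operator and the dissipation potential, N-GINNs enforce the first and second laws of thermodynamics exactly by construction, and the approach is validated on three representative examples.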

Comments 26 pages, 7 figures, 4 tables

English abstract

We introduce Nonlinear GENERIC Informed Neural Networks (N-GINNs), a deep learning framework for discovering evolution equations of systems governed by the nonlinear GENERIC formalism (General Equation for Non-Equilibrium Reversible-Irreversible Coupling). Such systems exhibit coupled conservative and dissipative dynamics, and can be described via the superposition of a Hamiltonian flow and a generalized gradient flow. In contrast to existing approaches, our formulation incorporates generalized gradient flows via convex dissipation potentials, enabling the identification of a broader class of thermodynamically consistent dynamics, including systems with non-quadratic dissipation potentials. Thermodynamic structure is strongly enforced by construction through suitable reparameterizations of both the bivector operator and the dissipation potential, ensuring exact compliance with the first and second laws of thermodynamics. We validate the proposed approach on three representative examples: a harmonic oscillator coupled to a heat bath, an idealized chemical motor, and a one-dimensional viscoplastic model of Perzyna type. These results demonstrate the method's ability to accurately infer thermodynamically consistent models from data for systems incorporating both conservative and nonlinear dissipative dynamics.

2605.09019 2026-05-12 quant-ph cs.LG

Learning Pure Quantum States in Any Dimension (Almost) Without Regret

Josep Lumbreras, Marco Tomamichel

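AI summary This paper studies how to learn pure quantum states in any finite dimension with almost no disturbance and proposes a quantum state tomography method for pure states of arbitrary dimension. The method works locally on the pure-state manifold, uses differences between nearby projective measurements to estimate the tangent component of the error, and combines variance-adaptive estimation with regularization across epochs. In any dimension $d$, after $T$ measurements, the algorithm achieves cumulative regret $\mathcal{O}(d^3\log^2 T)$ and online infidelity $\mathcal{O}(d^3\log(T)/t)$ at every intermediate time $t$, showing that disturbance-free pure-state tomography extends to high-dimensional quantum systems.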

Comments 43 pages

English abstract

We extend quantum state tomography with minimal cumulative disturbance, first investigated in [arXiv:2406.18370], to arbitrary finite-dimensional pure states. A learner sequentially receives fresh copies of an unknown pure state, chooses a rank-one projector for each copy using the previous outcomes, and performs the corresponding two-outcome projective measurement. The goal is to learn the state while keeping the chosen projectors close to the unknown state in order to minimize disturbance. The qubit solution relies on the special geometry of the Bloch sphere and does not extend directly to qudits, where pure states form a curved manifold. We show that this obstruction can be overcome by working locally on the pure-state manifold. The algorithm proceeds in epochs. In each epoch, it fixes a current estimate, measures pairs of nearby rank-one projectors obtained by moving in opposite tangent directions, and takes differences of the corresponding outcomes. This gives an exact linear observation of the tangent component of the error. The resulting local linear models are combined with a robust variance-adaptive estimator and a hot-start regularization that transfers precision across epochs. For every unknown pure state in dimension \(d\), after \(T\) measured copies, our protocol achieves cumulative regret \(\mathcal{O}(d^3\log^2 T)\), and at each intermediate time \(t\leq T\) its current estimate has online infidelity \(\mathcal{O}(d^3\log(T)/t)\). Hence, pure-state tomography with essentially no cumulative disturbance is not a peculiarity of qubits but a geometric phenomenon that persists for qudits.

2605.08994 2026-05-12 physics.chem-ph cond-mat.mtrl-sci cs.LG

Beyond the Black Box: An Interpretable Machine Learning Framework for Predicting Electronic Structure Microdescriptors and Structure-Performance Relationships in Fe-based Catalytic Systems

Oyinkansola Romiluyi

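AI summary This study proposes an interpretable machine learning framework for predicting electronic-structure microdescriptors and structure-performance relationships in Fe-based catalytic systems. The framework combines SHAP feature-importance analysis with tree-based ensemble models to identify and rank, from limited data, the thermodynamic, structural, and geometric microdescriptors that drive catalytic performance, and to relate the electronic band gap to that performance. It finds that thermodynamic lattice stability and geometric factors, rather than bulk stoichiometry, are the main drivers of the electronic band gap, which gives interpretable physical features to guide catalyst screening and optimization.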

Comments 27 pages, 10 figures

Abstract

The current catalyst discovery and development pipeline for energy-intensive applications like methane conversion remains bottlenecked by expensive trial-and-error experimentation, irreproducible chemical intuition, and a lack of frameworks linking complex catalytic design spaces to performance. This work presents an interpretable machine learning framework that integrates SHAP-based feature importance analysis (Explainable AI) with tree-based ensembles (Random Forest and Bayesian-optimized CatBoost) to characterize Fe-zeolite and oxide-supported catalysts for the partial oxidation of methane (POM). Despite limited data, the framework decodes complex structure-performance relationships by identifying and ranking thermodynamic, structural, and geometric microdescriptors that influence the electronic band gap and govern macroscale performance metrics such as selectivity, activity, and stability. This work explicitly demonstrates that thermodynamic lattice stability and geometric factors are the primary drivers of electronic band gap (a critical proxy for redox reactivity) rather than bulk stoichiometry. Non-linear models achieve an $R^2$ of 0.61-0.77, significantly outperforming traditional linear baselines ($R^2$ = 0.32). This workflow provides both a light-weight generalizable methodology and a prioritized list of physical features for accelerated catalyst screening, and these features can subsequently be integrated into microkinetic and reaction engineering models to create digital twins of complex reactor systems and to enable predictive optimization in autonomous R&D laboratories.

2605.08963 2026-05-12 stat.ML cs.LG

Survey-aware Machine Learning: A Guideline for Valid Population Health Inference based on Scoping Review

YongKyung Oh, Henry W. Zheng, Jeffrey Feng, Alex A. T. Bui

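AI summary Machine learning models trained on complex health survey data such as NHANES often ignore survey design information. To address this, the study proposes Survey-aware Machine Learning (SaML), a nine-step guideline for valid population health inference. Through a review of 16 methodological papers, it summarizes existing approaches to weighted model training, design-based cross-validation, and survey-adjusted performance evaluation, and points out gaps in hyperparameter tuning and deployment. SaML gives concrete step-by-step guidance for different analytical objectives, which helps improve model fairness and the accuracy of inference.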

Abstract

Machine Learning (ML) models trained on complex health surveys such as the National Health and Nutrition Examination Survey (NHANES) often ignore primary sampling units, stratification variables, and sampling weights. This practice violates the independence assumptions of standard evaluation methods. As a result, estimates become biased, uncertainty is underestimated, and fairness assessments fail to reflect population-level disparities. We propose Survey-aware Machine Learning (SaML), a nine-step guideline that incorporates survey design metadata across the ML lifecycle. Through a scoping review of 16 methodological papers, we summarize existing work on weighted model training, design-based cross-validation, and survey-adjusted performance evaluation. We also identify gaps in hyperparameter tuning and deployment. We provide task-specific guidance that clarifies which steps are required for different analytical objectives. SaML provides a checklist for valid population inference from survey data.
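Two of the ingredients named above, survey-adjusted evaluation and design-based cross-validation, can be sketched in a few lines. This is an illustration under simplifying assumptions, not the SaML checklist itself: `weighted_accuracy` and `psu_folds` are hypothetical helper names, stratification is ignored, and whole primary sampling units (PSUs) are assigned to folds round-robin so that no PSU is split between training and validation data.

```python
def weighted_accuracy(y_true, y_pred, weights):
    # Accuracy in which each respondent counts in proportion to its sampling weight.
    total = sum(weights)
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / total

def psu_folds(psu_ids, n_folds):
    # Assign every distinct PSU to one fold (round-robin, in order of first
    # appearance) and give each row the fold of its PSU.
    fold_of_psu = {}
    for psu in psu_ids:
        if psu not in fold_of_psu:
            fold_of_psu[psu] = len(fold_of_psu) % n_folds
    return [fold_of_psu[psu] for psu in psu_ids]

# Toy data: the third respondent represents many more people than the others.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
weights = [1.0, 1.0, 6.0, 2.0]

survey_acc = weighted_accuracy(y_true, y_pred, weights)
naive_acc = weighted_accuracy(y_true, y_pred, [1.0] * 4)
folds = psu_folds(["a", "a", "b", "c", "b"], n_folds=2)
```

In this toy case the unweighted accuracy is 0.75 while the weighted one is 0.4, because the single misclassified respondent carries most of the population weight.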

2605.08960 2026-05-12 cond-mat.mtrl-sci cs.LG physics.chem-ph physics.comp-ph

CrystalREPA: Transferring Physical Priors from Universal MLIPs to Crystal Generative Models

Chengqian Zhang, Yucheng Jin, Duo Zhang, Tiejun Li, Han Wang

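AI summary This study proposes CrystalREPA, a framework that transfers the physical priors encoded in universal machine learning interatomic potentials (MLIPs) to crystal generative models in order to improve the thermodynamic stability and structural realism of generated crystals. Using a contrastive objective at training time, CrystalREPA aligns the atom-wise hidden states of the generative encoder with those of a frozen MLIP, transferring stability-related atomistic priors at low training cost. Experiments show that the method improves the quality of generated crystals across several generative models and benchmark datasets. They also show that an MLIP's transfer effectiveness is not directly tied to its accuracy on standard leaderboards but is closely related to the distinguishability of its atom-wise representation space.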

Abstract

Crystal generative models mainly learn what stable crystals look like, with little explicit supervision for what makes them stable. We reveal a substantial representation gap between state-of-the-art crystal generative models and pretrained universal machine learning interatomic potentials (MLIPs) via energy probing, and show this gap can be closed by a simple training-time alignment. We propose Crystal REPresentation Alignment (CrystalREPA), a plug-and-play framework that aligns the atom-wise hidden states of generative encoders with frozen MLIP representations through an element-aware contrastive objective, transferring stability-aware atomistic priors with marginal training overhead and no additional inference cost. Across three generative frameworks, ten MLIP teachers, and two benchmark datasets, CrystalREPA consistently improves the thermodynamic stability, structural validity, and structural fidelity of generated crystals. Equally important, we find that an MLIP's transfer effectiveness is poorly predicted by its accuracy on standard leaderboards (e.g., Matbench Discovery) but strongly predicted by the distinguishability of its atom-wise representation space, yielding a practical, accuracy-independent criterion for selecting MLIP teachers for generative transfer.

2605.08910 2026-05-12 cs.CR cs.LG

Enhancing Adversarial Robustness in Network Intrusion Detection: A Layer-wise Adaptive Regularization Approach

Hira Nasir, Eiman Javed, Balawal Shabir, Zunera Jalil, Ahmad Mohsin

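AI summary This paper proposes LARAR, a layer-wise adaptive regularization method for making neural-network-based network intrusion detection systems more robust to adversarial attacks. The method adds layer-wise vulnerability analysis and an adaptive weighting mechanism to adversarial training, together with auxiliary classifiers, to improve interpretability and defensive capability. On the UNSW-NB15 dataset, LARAR reaches a clean accuracy of 95.01% and shows stronger robustness under several adversarial attacks, while also helping to reduce computational complexity and to detect adversarial samples early.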

Abstract

The new wave of adversarial attacks that utilize gradient-related vulnerabilities in neural network-based classifiers makes Network Intrusion Detection Systems more open to such threats. Although state-of-the-art adversarial training methods have shown promising results in producing more robust classifiers, their interpretability and defense ability are limited due to their lack of understanding of how adversarial attacks propagate in different layers of network classifiers. In this paper, we present an insightful approach, called LARAR (Layer-wise Adversarial Robustness using Adaptive Regularization), that incorporates additional layer-wise vulnerability analysis and adaptive weighting in conventional adversarial training methods. Additionally, we utilize 'Auxiliary Classifiers' in our approach. LARAR provides interpretable layer-wise vulnerability scores, achieves a clean accuracy of 95.01%, and provides better robustness against adversarial attacks (FGSM, PGD, and transfer attacks) on the UNSW-NB15 dataset. Through the identification of vulnerable layers, the proposed framework reduces computational complexity and enables the early detection of adversarial samples, thus enhancing the effectiveness and interpretability of adversarial defense mechanisms in NIDS.

2605.08886 2026-05-12 eess.IV cs.RO

VISTA: A Benchmark for Real-Time Video Streaming under Network Impairments in Surgical Teleoperation

Zexin Deng, Zhenhui Yuan, Tian Lu, Gaofeng Li, Meipeng Huang, Longhao Zou

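AI summary VISTA is a benchmark for evaluating real-time video streaming in surgical teleoperation under realistic network impairments. The study emulates five network environments (Hospital LAN, 5G Urban, 4G Rural, LEO Satellite, and GEO Satellite) and systematically measures how network quality affects video quality, temporal continuity, and task success. Experiments show that network degradation substantially lowers the teleoperation success rate and lengthens task completion time, and the benchmark provides a reproducible basis for future evaluation.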

Comments Oral presentation at the Connected Autonomous Robotic Systems Workshop, ICRA 2026

Abstract

Real-time video streaming is crucial in surgical teleoperation, yet reproducible evaluation under realistic network impairments remains limited. This paper presents VISTA, a benchmark designed to study how impairments along the forward video path affect received video quality, temporal continuity, and human task performance. VISTA employs Linux Traffic Control with NetEm and a Gilbert-Elliott loss model to emulate five network conditions: Hospital LAN, 5G Urban, 4G Rural, LEO Satellite, and GEO Satellite. The benchmark integrates a standardised peg transfer task with synchronized measurements of network quality of service (QoS), objective video quality (PSNR, SSIM, and VMAF), and temporal continuity through freeze rate, while maintaining a stable reverse control channel. Across 375 experimental trials, network degradation substantially reduced teleoperation performance: success rate decreased from 97% in Hospital LAN to 79% in 5G Urban, 35% in 4G Rural, 71% in LEO Satellite, and 12% in GEO Satellite, while mean task completion time for successful trials increased from 80 s in Hospital LAN to 117 s in 5G Urban, 211 s in 4G Rural, 152 s in LEO Satellite, and 255 s in GEO Satellite. These findings show that network impairments have a direct impact on task completion and success in surgical teleoperation, and provide a reproducible basis for evaluating teleoperation video under realistic network constraints. Source code available at https://github.com/Dzxx623/VISTA.
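For readers unfamiliar with the Gilbert-Elliott loss model mentioned above, the following sketch simulates it as a two-state Markov chain with a "good" and a "bad" state, each with its own packet-loss probability. The function names and all parameter values here are illustrative assumptions and are not the settings used in VISTA.

```python
import random

def gilbert_elliott(n_packets, p_good_to_bad, p_bad_to_good,
                    loss_good, loss_bad, seed=0):
    # Returns a list of booleans, True where the packet is lost.
    rng = random.Random(seed)
    bad = False
    lost = []
    for _ in range(n_packets):
        # Loss probability depends on the current channel state.
        lost.append(rng.random() < (loss_bad if bad else loss_good))
        # State transition applied after each packet.
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
    return lost

def stationary_loss_rate(p_good_to_bad, p_bad_to_good, loss_good, loss_bad):
    # Long-run loss rate from the stationary distribution of the chain.
    pi_bad = p_good_to_bad / (p_good_to_bad + p_bad_to_good)
    return (1.0 - pi_bad) * loss_good + pi_bad * loss_bad

params = dict(p_good_to_bad=0.05, p_bad_to_good=0.5, loss_good=0.0, loss_bad=1.0)
trace = gilbert_elliott(200_000, **params)
empirical = sum(trace) / len(trace)
expected = stationary_loss_rate(**params)
```

Unlike independent random loss with the same average rate, this chain produces losses in bursts, which is what makes it relevant to video freezes.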

2605.08878 2026-05-12 cs.CR cs.AI

Why Do Aligned LLMs Remain Jailbreakable: Refusal-Escape Directions, Operator-Level Sources, and Safety-Utility Trade-off

Yu Chen, Yuanhao Liu, Qi Cao

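AI summary Aligned large language models (LLMs) remain vulnerable to jailbreak attacks. Taking a continuous input-transformation view, this paper introduces Refusal-Escape Directions (RED), local perturbation directions around a harmful input along which the model's behavior shifts from refusal to answering, and proves that RED decomposes into contributions from operator-level sources in the model. It further argues that eliminating RED requires suppressing these structural vulnerabilities while preserving the model's ability to respond normally, which gives rise to a safety-utility trade-off. Experiments support the existence of RED and its connection to jailbreak behavior.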

Comments 40 pages, 45 figures

Abstract

Aligned large language models (LLMs) remain vulnerable to jailbreak attacks. Recent mechanistic studies have identified latent features and representation shifts associated with jailbreak success, but they leave a more fundamental question open: why do aligned LLMs remain jailbreakable, and what structural vulnerabilities in the model make this possible? We study this question through a continuous input-transformation view. Our theoretical finding is that aligned models can still exhibit Refusal-Escape Directions (RED): local perturbation directions around a harmful input that shift the model's behavior from refusal to answering while preserving the model's harmful-semantics interpretation. From this perspective, a jailbreak is not only a successful discrete prompt construction, but can also be understood as a refusal-to-answer behavior transition induced by continuously perturbing a harmful input along RED. We then prove that RED can be exactly decomposed into contributions from operator-level sources across the model's operator structure, and identify normalization, residual-wiring, and terminal sources as analytically constrained operator-level sources. To eliminate RED, the shared expressive modules -- self-attention and MLP -- must eliminate the contributions from these analytically constrained sources while preserving the mechanisms that support benign responses. These competing requirements give rise to a conditional safety-utility trade-off. Experiments across multiple models and attack methods empirically analyze RED from two complementary perspectives and show that added token dimensions can expose RED, while successful jailbreaks exhibit refusal-to-answer shifts largely aligned with terminal-source contributions.

2605.08871 2026-05-12 math.OC cs.DC cs.LG stat.ML

Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

Zhirayr Tovmasyan, Artavazd Maranjyan, Peter Richtárik

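AI summary This paper studies whether variance reduction can improve the time complexity of parallel stochastic optimization on heterogeneous compute clusters. The authors propose Rennala MVR, a momentum-based variance-reduced extension of Rennala SGD, and show under a mean-squared smoothness assumption that it improves time complexity in relevant parameter regimes. Experiments on a stochastic quadratic benchmark and on neural networks are consistent with the theory.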

Abstract

Large-scale machine learning models are trained on clusters of machines that exhibit heterogeneous performance due to hardware variability, network delays, and system-level instabilities. In such environments, time complexity rather than iteration complexity becomes the relevant performance metric for optimization algorithms. Recent work by Tyurin and Richtárik (2023) established the first time complexity analysis for parallel first-order stochastic optimization, proposing Rennala SGD as a time-optimal method for smooth nonconvex optimization. However, Rennala SGD is fundamentally a modification of SGD, and variance reduction techniques are known to improve the iteration complexity of SGD. In this work, we investigate whether variance reduction can also improve time complexity in heterogeneous systems. We show that, under a mean-squared smoothness assumption, variance reduction can improve time complexity in relevant parameter regimes. To this end, we propose Rennala MVR, a variance-reduced extension of Rennala SGD based on momentum-based variance reduction, and analyze its oracle and time complexity. We establish lower bounds for time complexity under these assumptions. On a stochastic quadratic benchmark, experiments with the exact method support the theory, while neural-network experiments with a practical inexact variant show similar empirical gains over Rennala SGD.
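The momentum-based variance reduction idea can be shown on a one-dimensional toy problem. The sketch below uses the commonly stated recursive estimator, in which the same random sample is evaluated at the current and the previous iterate. It is a single-worker illustration only: the parallel, heterogeneous batch collection that distinguishes Rennala-type methods is not modeled, and the toy loss, step size `lr`, and momentum parameter `a` are assumptions chosen for the example.

```python
import random

def stochastic_grad(x, xi):
    # Gradient of the sample loss f(x; xi) = 0.5 * (1 + xi) * x**2,
    # whose expectation over zero-mean xi is 0.5 * x**2.
    return (1.0 + xi) * x

def mvr_run(x0=5.0, lr=0.1, a=0.1, steps=200, noise_std=0.1, seed=0):
    rng = random.Random(seed)
    x_prev = x0
    # Start from a plain stochastic gradient.
    d = stochastic_grad(x_prev, rng.gauss(0.0, noise_std))
    x = x_prev - lr * d
    for _ in range(steps):
        xi = rng.gauss(0.0, noise_std)
        # Fresh gradient plus a correction that reuses the same sample xi
        # at the previous iterate.
        d = stochastic_grad(x, xi) + (1.0 - a) * (d - stochastic_grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x

final_x = mvr_run()
```

Setting `a=1.0` removes the correction term and recovers plain SGD, which makes the role of the momentum parameter easy to compare.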

2605.08866 2026-05-12 stat.ML cs.LG math.OC

Tight Generalization Bounds for Noiseless Inverse Optimization

Pouria Fatemi, Hoomaan Maskan, Suvrit Sra, Peyman Mohajerin Esfahani

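AI summary This paper studies noiseless inverse optimization, which aims to infer the parameters of a decision-maker's objective from observed context-action data. The authors give a high-probability $O(\frac{d}{T})$ generalization bound and strengthen it under additional conditions, bringing it in line with best-arm identification results in the bandit literature. They also prove that the bound is tight over the consistent estimators considered and extend the results to instantaneous and cumulative regret. Experiments validate the theory.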

Comments 29 pages, 2 figures

Abstract

Inverse optimization (IO) seeks to infer the parameters of a decision-maker's objective from observed context--action data. We study noiseless IO, where demonstrations are generated by a ground-truth objective. We provide a high-probability ${O}(\frac{d}{T})$ generalization bound for the induced action set, where $d$ is the number of unknown parameters and $T$ is the size of the training dataset. We strengthen these guarantees under additional conditions that ensure uniqueness of the chosen action, bringing our IO guarantees in line with best-arm identification results in the bandit literature. We further show that the ${O}(\frac{d}{T})$ rate is tight over all consistent estimators considered here, and extend the result to both instantaneous and cumulative regret. Notably, the resulting regret lower bound matches the corresponding upper bounds in the adversarial setting, indicating that the stochastic IO setting is effectively adversarial for the class of estimators studied here. Finally, we propose a parameter-free algorithm with lower per-iteration complexity than generic solvers. Experiments validate the predicted rates and illustrate the tightness of our bounds.

2605.08850 2026-05-12 math.OC cs.LG stat.ML

Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle

Peter Richtárik, Kaja Gruntkowska, Hanmin Li

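AI summary This paper proposes Local LMO, a new projection-free gradient method for constrained optimization. The core idea is to replace the global linear minimization oracle used by the classical Frank-Wolfe method with a local one that minimizes over the intersection of the constraint set and a small ball around the current iterate. Local LMO inherits the convergence rates of Projected Gradient Descent in several important regimes and, without requiring a bounded constraint set or a curvature assumption, obtains convergence guarantees for convex and strongly convex functions as well as for non-convex, stochastic, and non-smooth problems.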

Comments 71 pages, 8 figures

Abstract

We design Local LMO - a new projection-free gradient-type method for constrained optimization. The key algorithmic idea is to replace the global linear minimization oracle over the constraint set used by Frank-Wolfe (FW) with a local linear minimization oracle over the intersection of the constraint set and a "small" ball centered at the current iterate. In particular, when minimizing $f:\mathbb{R}^d\to \mathbb{R}$ over a constraint $\emptyset\neq\mathcal{X}\subseteq\mathbb{R}^d$, Local LMO performs the iteration \[x_{k+1}\in \arg\min_{z\in\mathcal{X}\cap\mathcal{B}(x_{k},t_k)}\langle\nabla f(x_{k}), z \rangle,\] where $x_0\in\mathcal{X}$, and $t_k>0$ is a suitably chosen radius which can be interpreted as an effective stepsize. While designed as an alternative to FW, Local LMO is perhaps best viewed as a generalization of Gradient Descent (GD) rather than a modification of FW. Indeed, it is easy to see that Local LMO reduces to GD in the unconstrained setting and, more generally, to GD restricted to an affine subspace if the constraint $\mathcal{X}$ is affine. We prove that this simple algorithmic scheme transfers the known (unaccelerated) convergence rates of Projected Gradient Descent (PGD) to the projection-free world in several important regimes, some of which are beyond the reach of FW. In contrast to FW theory, i) our guarantees hold without requiring the feasible set $\mathcal{X}$ to be bounded, ii) our theory does not require the "curvature" assumption, which allows us to establish a standard sublinear rate for convex functions with bounded gradients, iii) we obtain a linear rate in the smooth strongly convex regime. Furthermore, we obtain sharp sublinear rates in the smooth convex and non-convex regimes, in the $(L_0,L_1)$-smooth convex regime, and in stochastic and non-differentiable settings.
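The remark that the iteration reduces to Gradient Descent in the unconstrained setting can be checked directly. The sketch below assumes a Euclidean ball, for which the minimizer of the linear function $\langle \nabla f(x_k), z\rangle$ over $\mathcal{B}(x_k, t_k)$ is $x_k - t_k \nabla f(x_k)/\|\nabla f(x_k)\|$. Choosing the radius as $t_k = \gamma \|\nabla f(x_k)\|$, which is an illustrative choice and not necessarily the paper's radius rule, then gives exactly a GD step with stepsize $\gamma$. The function names are hypothetical.

```python
import math

def local_lmo_unconstrained(x, grad, radius):
    # Minimizer of <grad, z> over the Euclidean ball of the given radius
    # centered at x: move the full radius along the negative gradient direction.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)  # every point of the ball is a minimizer; stay put
    return [xi - radius * gi / norm for xi, gi in zip(x, grad)]

def gd_step(x, grad, gamma):
    # Plain gradient descent step for comparison.
    return [xi - gamma * gi for xi, gi in zip(x, grad)]

x = [1.0, 2.0]
grad = [3.0, 4.0]
gamma = 0.1
radius = gamma * math.sqrt(sum(g * g for g in grad))  # t_k = gamma * ||grad||

lmo_next = local_lmo_unconstrained(x, grad, radius)
gd_next = gd_step(x, grad, gamma)
```

With a constraint set present, the same oracle would have to minimize over the intersection of the set and the ball, which no longer has this closed form in general.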

2605.08824 2026-05-12 cs.GR cs.CV

HairGPT: Strand-as-Language Autoregressive Modeling for Realistic 3D Hairstyle Synthesis

Haimin Luo, Min Ouyang, Lan Xu, Jingyi Yu

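AI summary HairGPT is an autoregressive generative model that treats hair strands as language units, aiming to resolve the entanglement of structure and texture in realistic 3D hairstyle synthesis. The method formulates hairstyle generation as a dual-decoupled sequence modeling problem over semantic regions and a structural hierarchy, and uses a geometric tokenizer and semantic annotations to guide strand-level generation, enabling the synthesis and editing of complex hairstyles. HairGPT turns hairstyle generation from conventional texture synthesis into a structured, semantically controllable authoring process and supports high-fidelity results in both realistic and stylized settings.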

Comments Accepted to SIGGRAPH 2026 (Journal Track)

Abstract

Hair is a rich medium of visual and cultural expression, yet its digital modeling remains challenging due to the duality of fluidity and structure. Many existing generative approaches rely primarily on continuous diffusion fields, which entangle global topology with local texture and obscure the semantic and structural organization of hairstyles. To address this, we propose HairGPT, a strand-centric framework that treats strands as generative primitives and formulates realistic 3D hairstyle synthesis as a dual-decoupled autoregressive sequence modeling problem. Our method applies spatial decoupling across semantic scalp regions and structural decoupling along a hierarchical strand representation, progressing from global layout to fine-grained style. We further introduce a geometric tokenizer and region-aware semantic annotations to guide strand-level generation, enabling compositional editing, synthesis of rare and complex hairstyles, and adaptation to stylized domains. By aligning generative modeling with the workflow of digital grooming, HairGPT turns hair generation from opaque texture synthesis into a structured and semantically controllable authoring process, supporting robust semantic conditioning and high-fidelity results across realistic and stylized domains. Project Page: https://haiminluo.github.io/hairgpt/

2605.08811 2026-05-12 stat.ML cs.LG

Learning Theory of Transformers: Local-to-Global Approximation via Softmax Partition of Unity

Zhongjie Shi, Wenjing Liao

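AI summary This paper studies the learning theory of Transformer networks for regression on compact Euclidean domains and compact Riemannian manifolds. It proposes a constructive approximation framework based on a softmax partition of unity, in which the attention mechanism aggregates local approximations into a global one. The analysis shows that a dense Transformer with only two encoder blocks and standard single-hidden-layer feed-forward networks achieves a uniform $\varepsilon$-approximation of $\alpha$-Hölder continuous functions using $\mathcal{O}(\varepsilon^{-d/\alpha})$ parameters. It further establishes a near minimax-optimal generalization error bound of order $\mathcal{O}\big(n^{-\frac{2\alpha}{2\alpha+d}} \log n\big)$, where $n$ is the number of training samples.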

Abstract

This paper investigates the learning theory of Transformer networks for regression tasks on the compact Euclidean domain $[0,1]^d$ and $d$-dimensional compact Riemannian manifolds. We propose a novel constructive approximation framework for Transformers that builds local approximations of the target function and aggregates them into a global approximation via softmax partition of unity. This approach leverages the attention mechanism to achieve spatial localization through affine transformations of the input. The softmax activation plays a crucial role in aggregating local approximations to a global output. From an approximation perspective, we prove that a dense Transformer equipped with only two encoder blocks and standard single-hidden-layer point-wise feed-forward networks can achieve a uniform $\varepsilon$-approximation error for $\alpha$-Hölder continuous functions with $\alpha\in (0,1]$ using $\mathcal{O}(\varepsilon^{-d/\alpha})$ total parameters. Building upon this approximation guarantee, we establish a near minimax-optimal generalization error bound of order $\mathcal{O}\big(n^{-\frac{2\alpha}{2\alpha+d}} \log n\big)$ for the empirical risk minimizer, where $n$ is the training data size. The Transformer architecture studied in this paper is dense, shallow and wide, and employs softmax activation and sinusoidal positional encodings, closely reflecting practical implementations.
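A one-dimensional toy version shows why softmax weights form a partition of unity that can glue local approximations together. The sketch below is an illustration and not the paper's construction: it assumes evenly spaced centers on $[0,1]$, piecewise-constant local approximations, and a sharpness parameter `beta`, all of which are choices made for the example. The scores are affine in the input because $-\beta(x-c)^2$ and $2\beta c x - \beta c^2$ differ only by the term $-\beta x^2$, which is shared by all centers and cancels inside the softmax.

```python
import math

def softmax(scores):
    # Numerically stable softmax; the outputs are positive and sum to one.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def partition_weights(x, centers, beta):
    # Scores affine in x; equivalent to -beta * (x - c)**2 up to a shift
    # that is common to all centers.
    return softmax([2.0 * beta * c * x - beta * c * c for c in centers])

def aggregate(x, centers, local_values, beta):
    # Global approximation as a softmax-weighted sum of local approximations.
    weights = partition_weights(x, centers, beta)
    return sum(w * v for w, v in zip(weights, local_values))

def target(x):
    return x * x

centers = [j / 20 for j in range(21)]
local_values = [target(c) for c in centers]  # constant local approximations
beta = 2000.0

grid = [k / 100 for k in range(101)]
max_error = max(abs(aggregate(x, centers, local_values, beta) - target(x)) for x in grid)
```

Increasing the number of centers while scaling `beta` accordingly makes the weights more localized and shrinks `max_error`, which is the local-to-global mechanism in miniature.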

2605.08793 2026-05-12 cs.MS cs.AI cs.LG stat.CO stat.ML

cuRegOT: A GPU-Accelerated Solver for Entropic-Regularized Optimal Transport

Yixuan Qiu

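AI summary Optimal transport (OT) has become a fundamental tool in modern machine learning, but its computational cost remains a significant bottleneck at scale. To improve efficiency, this paper presents cuRegOT, a high-performance GPU solver for entropic-regularized optimal transport. It combines several algorithmic and architectural optimizations, including amortized symbolic analysis, an asynchronous mechanism for generating Sinkhorn iterates, and a fused kernel design, which improve computational efficiency and convergence speed on GPUs and outperform existing methods on several benchmark tasks.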

Abstract

Optimal transport (OT) has emerged as a fundamental tool in modern machine learning, yet its computational cost remains a significant bottleneck for large-scale applications. While harnessing the massive parallelism of modern GPU hardware is critical for efficiency, the de facto standard Sinkhorn algorithm, despite its ease of parallelization, often suffers from slow convergence in challenging problems. More recently, the sparse-plus-low-rank quasi-Newton method offers a balance between convergence rate and per-iteration complexity; however, its efficiency on GPUs is severely hindered by the serial nature of sparse matrix symbolic analysis and irregular memory access patterns. To bridge this gap, we present cuRegOT, a high-performance GPU solver tailored for entropic-regularized OT. We introduce a suite of algorithmic and architectural optimizations, including an amortized symbolic analysis strategy to mitigate CPU bottlenecks, an asynchronous Sinkhorn iterates generation mechanism, and a fused kernel for bandwidth-efficient gradient evaluation. These strategies are backed by rigorous theoretical guarantees ensuring algorithmic convergence. Extensive numerical experiments demonstrate that cuRegOT achieves significant speedups over state-of-the-art GPU-based solvers across a variety of benchmark tasks.
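As background, the baseline Sinkhorn algorithm referred to above fits in a few lines. This is a plain CPU sketch of the textbook iteration for entropic-regularized OT, not cuRegOT or the quasi-Newton method; the marginals, cost matrix, and regularization strength `eps` are illustrative assumptions.

```python
import math

def sinkhorn(a, b, cost, eps, n_iters=500):
    # Entropic OT between histograms a and b: alternately rescale the rows
    # and columns of the Gibbs kernel K = exp(-cost / eps).
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[float(abs(i - j)) for j in range(3)] for i in range(3)]
plan = sinkhorn(a, b, cost, eps=0.5)

row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
```

Each iteration is two matrix-vector products followed by elementwise divisions, which is why the method parallelizes easily; smaller `eps` values typically require more iterations.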

2605.08785 2026-05-12 cs.AR cs.AI

A Reconfigurable Multiplier Architecture for Error-Resilient Applications in RISC-V Core

Pragun Jaswal, L. Hemanth Krishna, B. Srinivasu

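AI summary This paper proposes a reconfigurable multiplier architecture integrated into a RISC-V core, targeting energy-efficient neural network inference and edge AI applications. Through a dedicated mulscr, the architecture supports exact and approximate computation at multiple configurable accuracy levels, giving fine-grained control over the energy-accuracy trade-off within a standard processor pipeline. The design achieves power reductions of 44%-52% in exact mode and 62%-68% in approximate mode while maintaining computational performance, supporting its use in energy-constrained edge AI settings.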

Comments Accepted in ISVLSI 2026

Abstract

Neural Networks (NNs) have been widely adopted due to their outstanding efficacy and adaptability across computer vision and deep learning applications. The optimization of NNs is necessary to enable their deployment on energy constrained embedded devices, where the limited available energy poses a significant challenge for efficient inference. This paper presents a runtime reconfigurable multiplier architecture integrated into the RISC-V core, targeting energy efficient neural network inference and edge AI applications. The proposed multiplier supports adaptability for exact and approximate computation with multiple configurable accuracy levels via a dedicated mulscr, enabling fine-grained energy accuracy control within a standard processor pipeline. The proposed design achieves 44%-52% and 62%-68% power reduction in exact and approximate modes respectively, while maintaining the computational performance of 1.89 DMIPS/MHz. Evaluations on error-tolerant workloads including 2d convolution and matrix multiplication demonstrate up to 63% reduction in energy consumption, with the proposed design achieving 1.21 pJ/instruction for matrix multiplication, confirming its effectiveness for energy-constrained edge AI deployments.