arXivDaily: daily arXiv research digest, updated Monday through Friday
2605.16304 2026-05-22 eess.SP cs.SD

Modulation Feature Enhancement with a Multi-Stage Attention Network for Underwater Acoustic Target Recognition

Jiaping Yu, Shefeng Yan, Linlin Mao, Zeping Sui, Chunjin Jiang

Affiliations: Institute of Acoustics, Chinese Academy of Sciences; University of Chinese Academy of Sciences; School of Computer Science and Electronics Engineering, University of Essex

AI summary: This paper proposes a feature extraction and fusion method based on variational mode decomposition and the 3/2-D spectrum, combined with a multi-stage multi-type attention mechanism and an adjustable class-balanced focal loss, to improve underwater acoustic target recognition.

Comments: 31 pages, 14 figures. Accepted by Signal Processing.

Abstract

Underwater acoustic target recognition is critical for maritime applications, yet it faces challenges arising from the complex and diverse nature of ship-radiated noise. To address these issues, we propose a robust deep learning-based framework. First, we introduce a feature extraction and fusion method based on variational mode decomposition (VMD) and the 3/2-D spectrum to generate high-fidelity 2-D DEMON spectral features, which effectively capture modulation envelope information. To further enhance feature representation, we design a one-dimensional convolutional neural network (1-D CNN) integrated with a novel Multi-Stage Multi-Type Attention Mechanism (MMATT) that adaptively refines features at different network depths. Within this mechanism, we propose a Residual Channel-Independent Spectral Attention Mechanism (R-CISAM) and a Multi-Scale Separate-and-Fuse Spectral Attention Mechanism (MS-SFSAM). Moreover, to mitigate performance degradation caused by severe class imbalance inherent in real-world ship-radiated noise data, we devise an Adjustable Class-Balanced Focal Loss (ACBFL), which provides flexibility across tasks with varying degrees of imbalance. Experimental results on a real-world ship-radiated noise dataset demonstrate that the proposed solutions effectively enhance underwater acoustic target recognition performance.
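
The abstract does not give the form of ACBFL. As a rough illustration of the general idea only, the sketch below combines the familiar focal loss with effective-number class weights and adds a hypothetical exponent `tau` that controls how strongly rare classes are up-weighted. The function name, `tau`, and the default `beta` and `gamma` values are assumptions for illustration, not the authors' definition.

```python
import math

def class_balanced_focal_loss(probs, label, class_counts,
                              beta=0.999, gamma=2.0, tau=1.0):
    """Loss for one sample.

    probs        -- predicted class probabilities (sum to 1)
    label        -- index of the true class
    class_counts -- number of training samples per class
    tau          -- hypothetical knob: 0 disables re-weighting, 1 applies it fully
    """
    # Effective-number weights: (1 - beta) / (1 - beta**n) per class.
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    # Soften or sharpen the weights with tau, then normalise to mean 1.
    powered = [w ** tau for w in raw]
    mean_w = sum(powered) / len(powered)
    weights = [w / mean_w for w in powered]

    p_t = max(probs[label], 1e-12)      # guard against log(0)
    focal = (1.0 - p_t) ** gamma        # down-weights easy samples
    return -weights[label] * focal * math.log(p_t)

counts = [5000, 200]                    # a majority and a minority class (made up)
probs = [0.3, 0.7]
rare = class_balanced_focal_loss(probs, 1, counts)
rare_unweighted = class_balanced_focal_loss(probs, 1, counts, tau=0.0)
print(rare, rare_unweighted)
```

With `tau=0.0` the class weights collapse to 1 and the function reduces to plain focal loss, which is one simple way to make the degree of balancing adjustable per task.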

2605.08380 2026-05-22 cs.SE cs.AI

What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

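Junyu Huo, Ziqi Mao, Zihao Wan, Gouri Ginde

Affiliations: University of Calgary; Phanvic

AI summary: This study analyzes technical discussions generated solely by AI agents on MoltBook to characterize the discourse agents produce when interacting autonomously. The discussions center on security and trust, memory management, tooling and APIs, and debugging and error handling, but lack the concrete project details and runtime information common in human developer discussions.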

Abstract

AI agents are increasingly framed as software-engineering teammates, yet most studies examine them inside human-centered workflows. Little is known about the discourse autonomous AI agents produce when they interact mainly with one another. This paper examines what autonomous agents discuss on MoltBook, how that discourse is organized, and how it differs from human developer discourse. We combine human open coding of a 500-post sample, a concentration-plus-check topic-analysis pipeline over 4,707 English-filtered MoltBook technology posts, and a matched comparison with 5,211 human-generated GitHub Discussions posts. MoltBook technology discourse spans 12 recurring themes, led by Security and Trust (27.4%). At the community level, activity is highly concentrated: the largest submolt accounts for 63.5% of posts (Gini = 0.88), yet a stability-aware BERTopic pipeline still identifies 32 non-outlier sub-topics. Relative to the GitHub Discussions baseline, MoltBook discourse contains fewer concrete, context-rich cues such as code-formatted artifacts, environment details, runtime failures, and reproduction steps. Social mimicry appears only in limited form, while idealization is reflected mainly through lower hedging. Overall, AI-only technical discourse is coherent but selective. It repeatedly returns to security and trust, memory and context management, tooling and APIs, debugging and error handling, workflow automation, and infrastructure/ops, while omitting much of the project-local and runtime detail common in human developer discourse. This may reflect fewer environment-specific failures, reproduction steps, and other grounding cues in MoltBook.
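
The Gini coefficient quoted above (0.88) summarizes how unevenly posts are spread across submolts. The sketch below shows one standard way to compute it; the post counts are made up for illustration and are not the paper's data.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = even, near 1 = concentrated)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Formula on sorted data: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

even = gini([10, 10, 10, 10])      # posts spread evenly over four communities
skewed = gini([1, 1, 1, 97])       # one community holds almost everything
print(even, skewed)
```

A value near 0.88 therefore indicates a distribution far closer to the second case than to the first.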

2605.03241 2026-05-22 physics.optics cs.AI

OptiLookUp: An Optical ROM-Based Lookup Table Engine for Photonic Accelerators

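Ankur Singh, Akhilesh Jaiswal

Affiliations: Department of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, WI, USA

AI summary: This paper proposes an optical ROM-based lookup table engine: a high-speed, reconfigurable photonic ROM architecture built from integrated microring resonators. Input-output mappings are encoded directly in the spectral response of the photonic devices, giving deterministic lookup operation. The design is evaluated on a silicon photonics platform and shows reliable operation at data rates up to 12.5 GHz.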

Abstract

Read-only memory (ROM) provides deterministic access to predefined data mappings. Extending ROM concepts to the optical domain enables high-bandwidth, low-latency, and parallel memory access, but realizing compact and reconfigurable optical ROM remains challenging due to loss, wavelength control, and integration constraints. This work presents a high-speed, reconfigurable photonic ROM architecture implemented using integrated microring resonators (MRRs). The ROM encodes predefined input-output mappings directly in the spectral response of the photonic devices, enabling deterministic lookup-based operation without dynamic computation during readout. To improve scalability and reduce cumulative insertion loss, the architecture employs compact banked sub-arrays that are selectively addressed through an optical decoding mechanism. Reconfigurability is achieved using transistor-based optical selectors, allowing different ROM banks to be activated without physical light rerouting or interferometric structures. The proposed photonic ROM is designed and evaluated using device-level simulations based on the GlobalFoundries 45SPCLO silicon photonics platform. Simulation results demonstrate reliable operation at data rates up to 12.5 GHz, with stable light-to-current transfer characteristics obtained through integrated photodiode readout. The optical ROM can be used to implement nonlinear activation functions utilised in photonic accelerator architectures, including sigmoid, tanh, ReLU, and exponential mappings.
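
To make the lookup idea concrete, here is a software-only sketch of a ROM used as an activation-function table: values are precomputed once, and readout is a plain indexed access with no arithmetic on the function itself. It does not model the photonics; the helper names, the 6-bit address width, and the input range are illustrative assumptions.

```python
import math

def build_rom(fn, lo, hi, address_bits):
    """Precompute fn on a uniform grid; the list plays the role of the ROM contents."""
    size = 1 << address_bits
    step = (hi - lo) / (size - 1)
    return [fn(lo + i * step) for i in range(size)]

def lookup(rom, x, lo, hi):
    """Read-only access: quantise x to the nearest address and return the stored value."""
    size = len(rom)
    x = min(max(x, lo), hi)                          # clamp to the table range
    address = round((x - lo) / (hi - lo) * (size - 1))
    return rom[address]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

rom = build_rom(sigmoid, -8.0, 8.0, address_bits=6)  # 64 entries
approx = lookup(rom, 0.5, -8.0, 8.0)
print(approx, sigmoid(0.5))
```

Swapping `sigmoid` for tanh, ReLU, or an exponential changes only the stored contents, which mirrors how different banks could hold different mappings.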

2603.20228 2026-05-22 math.OC cs.LG

Compact Lifted Relaxations for Low-Rank Optimization

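Ryan Cory-Wright, Jean Pauphilet

Affiliations: Department of Analytics, Marketing and Operations, Imperial Business School; Management Science and Operations, London Business School

AI summary: This paper proposes compact convex relaxations for rank-constrained quadratic optimization. It introduces lifted semidefinite relaxations that avoid the spectral terms required by conventional approaches, obtains a more compact relaxation by identifying redundant blocks, and adds a new class of valid inequalities (projection cuts) that strengthen the low-rank relaxations, with applications to matrix completion and reduced-rank regression.

Comments: Part of this material previously appeared in arXiv:2501.02942v2, which was split into this paper and arXiv:2501.02942v3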

Abstract

We develop tractable convex relaxations for rank-constrained quadratic optimization problems over $n \times m$ matrices, a setting for which tractable relaxations are typically only available when the objective or constraints admit spectral structure. We derive lifted semidefinite relaxations that do not require such spectral terms. Although a direct lifting introduces a large semidefinite constraint in dimension $n^2 + nm + 1$, we prove that many blocks of the moment matrix are redundant and derive an equivalent compact relaxation that only involves two semidefinite constraints of dimension $nm + 1$ and $n+m$, respectively. We also derive a new class of valid inequalities for low-rank problems, which we call projection cuts, that exploit the fact that rank constraints are inherited by linear images of a low-rank matrix, to strengthen our low-rank relaxations substantially. For matrix completion and reduced-rank regression problems, among others, we exploit additional structure to obtain even more compact formulations involving semidefinite matrices of dimension at most the sum of the two dimensions of the low-rank decision matrix (i.e., of size at most $n+m$). Overall, we obtain scalable semidefinite bounds for a broad class of low-rank quadratic problems.
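
To give a feel for the size reduction, the following sketch simply evaluates the dimensions stated in the abstract for one example shape. The function name and the choice of a 50 x 40 matrix are illustrative assumptions.

```python
def relaxation_dimensions(n, m):
    """Semidefinite block sizes quoted in the abstract for an n x m matrix variable."""
    direct = n * n + n * m + 1        # single block in the direct lifting
    compact = (n * m + 1, n + m)      # two blocks in the compact relaxation
    return direct, compact

direct, compact = relaxation_dimensions(50, 40)
print(direct, compact)
```

For this shape the direct lifting needs one block of dimension 4501, while the compact form needs blocks of dimension 2001 and 90.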

2602.10445 2026-05-22 cs.IR cs.LG

End-to-End Semantic ID Generation for Generative Advertisement Recommendation

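Jie Jiang, Xinxun Zhang, Enming Zhang, Yuling Xiong, Jun Zhang, Jingwen Wang, Huan Yu, Yuxiang Wang, Hao Wang, Xiao Yan, Jiawei Jiang

Affiliations: Tencent Inc.; Wuhan University

AI summary: This paper proposes UniSID, a framework that optimizes embeddings and semantic IDs end to end from advertising data so that semantic information passes directly into the ID space, addressing the shortcomings of conventional two-stage compression. Multi-granularity contrastive learning and a summary-based ad reconstruction mechanism improve the semantic expressiveness of the IDs.

Comments: Add the emails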

Abstract

Generative Recommendation (GR) has excelled by framing recommendation as next-token prediction. This paradigm relies on Semantic IDs (SIDs) to tokenize large-scale items into discrete sequences. Existing GR approaches predominantly generate SIDs via Residual Quantization (RQ), where items are encoded into embeddings and then quantized to discrete SIDs. However, this paradigm suffers from inherent limitations: 1) Objective misalignment and semantic degradation stemming from the two-stage compression; 2) Error accumulation inherent in the structure of RQ. To address these limitations, we propose UniSID, a Unified SID generation framework for generative advertisement recommendation. Specifically, we jointly optimize embeddings and SIDs in an end-to-end manner from raw advertising data, enabling semantic information to flow directly into the SID space and thus addressing the inherent limitations of the two-stage cascading compression paradigm. To capture fine-grained semantics, a multi-granularity contrastive learning strategy is introduced to align distinct items across SID levels. Finally, a summary-based ad reconstruction mechanism is proposed to encourage SIDs to capture high-level semantic information that is not explicitly present in advertising contexts. Experiments demonstrate that UniSID consistently outperforms state-of-the-art SID generation methods, yielding up to a 4.62% improvement in Hit Rate metrics across downstream advertising scenarios compared to the strongest baseline.
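
For readers unfamiliar with the baseline being criticized, the sketch below shows plain residual quantization turning an embedding into a multi-level semantic ID. It illustrates the conventional RQ step, not UniSID; the codebooks are tiny hand-written toys and the function names are assumptions.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def residual_quantize(embedding, codebooks):
    """Return the semantic ID (one index per level) and the final residual."""
    residual = list(embedding)
    sid = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        sid.append(idx)
        # Each level only sees what the earlier levels failed to explain,
        # so an early mistake propagates to every later level.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return sid, residual

# Toy codebooks with illustrative values only.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],      # coarse level
    [[0.25, 0.0], [0.0, 0.25]],    # finer level
]
sid, residual = residual_quantize([1.2, 0.1], codebooks)
print(sid, residual)
```

The embedding here is a fixed input, which is the two-stage setup the abstract describes: the encoder is trained first and the quantizer never influences it.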

2601.23219 2026-05-22 cs.MA cs.AI

MonoScale: Scaling Multi-Agent System with Monotonic Improvement

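Shuai Shao, Yixiang Liu, Bingwei Lu, Weinan Zhang

Affiliations: Shanghai Jiao Tong University, Shanghai, China; Shanghai Innovation Institute, Shanghai, China

AI summary: This paper proposes MonoScale, a framework that generates agent-conditioned familiarization tasks, collects interaction evidence, and distills it into auditable natural-language memory, yielding monotonic performance improvement for multi-agent systems. Experiments on GAIA and Humanity's Last Exam show that it outperforms naive scale-up and strong-router fixed-pool baselines.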

Abstract

In recent years, LLM-based multi-agent systems (MAS) have advanced rapidly, using a router to decompose tasks and delegate subtasks to specialized agents. A natural way to expand capability is to scale up the agent pool by continually integrating new functional agents or tool interfaces, but naive expansion can trigger performance collapse when the router cold-starts on newly added, heterogeneous, and unreliable agents. We propose MonoScale, an expansion-aware update framework that proactively generates a small set of agent-conditioned familiarization tasks, harvests evidence from both successful and failed interactions, and distills it into auditable natural-language memory to guide future routing. We formalize sequential augmentation as a contextual bandit and perform trust-region memory updates, yielding a monotonic non-decreasing performance guarantee across onboarding rounds. Experiments on GAIA and Humanity's Last Exam show stable gains as the agent pool grows, outperforming naive scale-up and strong-router fixed-pool baselines.
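
The abstract does not spell out the update rule, so the following is only a toy illustration of what a monotonic non-decreasing guarantee across onboarding rounds looks like: a candidate memory is kept only if its evaluated score does not fall below the current one. The gate, the `onboard` function, and the scores are hypothetical and do not model the paper's trust-region update or bandit formulation.

```python
def onboard(memory, candidates, evaluate):
    """Accept a candidate memory only if the evaluated score does not drop."""
    best_score = evaluate(memory)
    history = [best_score]
    for candidate in candidates:
        score = evaluate(candidate)
        if score >= best_score:
            memory, best_score = candidate, score
        history.append(best_score)   # score in effect after this round
    return memory, history

# Made-up scores for four memory versions.
scores = {"m0": 0.50, "m1": 0.62, "m2": 0.55, "m3": 0.70}
final, history = onboard("m0", ["m1", "m2", "m3"], scores.get)
print(final, history)
```

The rejected candidate `m2` leaves the score unchanged for that round, so the recorded sequence never decreases.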

2601.22365 2026-05-22 cs.DM cs.LG

Towards Solving the Gilbert-Pollak Conjecture via Large Language Models

Yisi Ke, Tianyu Huang, Yankai Shu, Di He, Jingchu Gai, Liwei Wang

Affiliations: School of EECS, Peking University; School of Mathematical Sciences, Peking University; Center for Machine Learning Research, Peking University; State Key Laboratory of General Artificial Intelligence, Peking University, Beijing, China; Carnegie Mellon University, Machine Learning Department

AI summary: This paper presents a new AI system that generates rule-constrained geometric lemmas and builds specialized functions from them to obtain tighter lower bounds on the Steiner ratio, demonstrating the potential of large language models for advanced mathematical research.

Comments: 44 pages, 11 figures

Abstract

The Gilbert-Pollak Conjecture (Gilbert and Pollak, 1968), also known as the Steiner Ratio Conjecture, states that for any finite point set in the Euclidean plane, the Steiner minimum tree has length at least $\sqrt{3}/2 \approx 0.866$ times that of the Euclidean minimum spanning tree (the Steiner ratio). A sequence of improvements through the 1980s culminated in a lower bound of $0.824$, with no substantial progress reported over the past three decades. Recent advances in LLMs have demonstrated strong performance on contest-level mathematical problems, yet their potential for addressing open, research-level questions remains largely unexplored. In this work, we present a novel AI system for obtaining tighter lower bounds on the Steiner ratio. Rather than directly prompting LLMs to solve the conjecture, we task them with generating rule-constrained geometric lemmas implemented as executable code. These lemmas are then used to construct a collection of specialized functions, which we call verification functions, that yield theoretically certified lower bounds of the Steiner ratio. Through progressive lemma refinement driven by reflection, the system establishes a new certified lower bound of 0.8559 for the Steiner ratio. The entire research effort involves only thousands of LLM calls, demonstrating the strong potential of LLM-based systems for advanced mathematical research.
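
The constant $\sqrt{3}/2$ is attained by the three corners of an equilateral triangle, which makes a convenient worked example. The sketch below computes both tree lengths for that single configuration; it illustrates the ratio only and has nothing to do with the paper's verification functions.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Corners of a unit equilateral triangle.
a, b, c = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)

# Minimum spanning tree: any two of the three unit-length sides.
mst_length = dist(a, b) + dist(b, c)

# Steiner minimum tree: join all three corners to the centroid, which is the
# Fermat point of an equilateral triangle (the three edges meet at 120 degrees).
s = ((a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3)
smt_length = dist(a, s) + dist(b, s) + dist(c, s)

ratio = smt_length / mst_length
print(ratio, math.sqrt(3) / 2)
```

The conjecture asserts that no finite planar point set does better than this; the certified bound of 0.8559 narrows the gap to that value.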

2601.21025 2026-05-22 stat.ML cs.LG

A Diffusive Classification Loss for Learning Energy-based Generative Models

RuiKang OuYang, Louis Grenioux, José Miguel Hernández-Lobato

Affiliations: CMAP, CNRS, École polytechnique, Institut Polytechnique de Paris, Palaiseau, France; Center for Computational Mathematics, Flatiron Institute, New York, NY, USA; Department of Engineering, University of Cambridge, Cambridge, United Kingdom

AI summary: This paper proposes DiffCLF, a diffusive classification loss for learning energy-based generative models. By recasting energy-based model learning as a supervised classification problem across noise levels, it avoids mode blindness while remaining computationally efficient, improving model fidelity and range of application.

Comments: Accepted at ICML 2026

Abstract

Score-based generative models have recently achieved remarkable success. While they are usually parameterized by the score, an alternative way is to use a series of time-dependent energy-based models (EBMs), where the score is obtained from the negative input-gradient of the energy. Crucially, EBMs can be leveraged not only for generation, but also for tasks such as compositional sampling or building Boltzmann Generators via Monte Carlo methods. However, training EBMs remains challenging. Direct maximum likelihood is computationally prohibitive due to the need for nested sampling, while score matching, though efficient, suffers from mode blindness. To address these issues, we introduce the Diffusive Classification (DiffCLF) objective, a simple method that avoids blindness while remaining computationally efficient. DiffCLF reframes EBM learning as a supervised classification problem across noise levels, and can be seamlessly combined with standard score-based objectives. We validate the effectiveness of DiffCLF by comparing the estimated energies against ground truth in analytical Gaussian mixture cases, and by applying the trained models to tasks such as model composition and Boltzmann Generator sampling. Our results show that DiffCLF enables EBMs with higher fidelity and broader applicability than existing approaches.
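
The relation between energy and score mentioned above can be checked numerically in a case where both are known in closed form. The sketch uses a 1-D Gaussian with arbitrarily chosen mean and standard deviation; it illustrates the identity only and is not the DiffCLF objective.

```python
def energy(x, mu=1.0, sigma=2.0):
    """Energy of a 1-D Gaussian, up to an additive constant."""
    return (x - mu) ** 2 / (2.0 * sigma ** 2)

def score_from_energy(x, h=1e-5):
    """Score = negative gradient of the energy, via a central finite difference."""
    return -(energy(x + h) - energy(x - h)) / (2.0 * h)

x = 3.0
numeric = score_from_energy(x)
analytic = -(x - 1.0) / 2.0 ** 2    # d/dx log N(x; mu=1, sigma=2)
print(numeric, analytic)
```

Because the additive constant drops out of the gradient, the score does not see the relative mass of separated modes, which is the kind of information an energy model retains.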

2601.15671 2026-05-22 cs.HC cs.AI

StreetDesignAI: Broadening Designer Perspectives Through Multi-Persona Evaluation of Cycling Infrastructure

Ziyi Wang, Yilong Dai, Duanya Lyu, Mateo Nader, Sihan Chen, Wanghao Ye, Zijian Ding, Xiang Yan

Affiliations: University of Maryland, College Park; University of Alabama; University of Florida; Carnegie Mellon University

AI summary: This paper presents StreetDesignAI, which uses multi-persona evaluation to help designers understand cyclists' needs more fully and make better-informed design decisions.

Abstract

Designing cycling infrastructure requires balancing the competing needs of diverse user groups, yet designers often struggle to anticipate how different cyclists experience the same street environment. We investigate how persona-based evaluation can support cycling infrastructure design by making experiential conflicts explicit during the design process. Informed by a formative study with 12 domain experts and crowdsourced bikeability assessments from 427 cyclists, we present StreetDesignAI, an interactive system that enables designers to (1) ground evaluation in real street context through imagery and map data, (2) receive parallel feedback from simulated cyclist personas spanning confident to cautious users, and (3) iteratively modify designs while the system surfaces conflicts across perspectives. A within-subjects study with 26 transportation professionals comparing StreetDesignAI against a general-purpose AI chatbot demonstrates that structured multi-perspective feedback significantly broadens designers' understanding of various cyclists' perspectives, ability to identify diverse persona needs, and confidence in translating those needs into design decisions. Participants also reported significantly higher overall satisfaction and stronger intention to use the system in professional practice. Qualitative findings further illuminate how explicit conflict surfacing transforms design exploration from single-perspective optimization toward deliberate trade-off reasoning. We discuss implications for AI-assisted tools that scaffold persona-aware design through disagreement as an interaction primitive.

2601.11650 2026-05-22 physics.chem-ph cs.AI

Large Language Model Agent for User-friendly Chemical Process Simulations

Jingkang Liang, Niklas Groll, Gürkan Sin

Affiliations: Process and System Engineering Center; Department of Chemical and Biochemical Engineering; Technical University of Denmark

AI summary: This paper presents a large language model agent integrated with AVEVA Process Simulation through the Model Context Protocol, enabling natural-language interaction with chemical process simulations and making complex process design, simulation, and optimization more accessible to non-expert users.

Abstract

Modern process simulators enable detailed process design, simulation, and optimization; however, constructing and interpreting simulations is time-consuming and requires expert knowledge. This limits early exploration by inexperienced users. To address this, a large language model (LLM) agent is integrated with AVEVA Process Simulation (APS) via Model Context Protocol (MCP), allowing natural language interaction with rigorous process simulations. An MCP server toolset enables the LLM to communicate programmatically with APS using Python, allowing it to execute complex simulation tasks from plain-language instructions. Two water-methanol separation case studies assess the framework across different task complexities and interaction modes. The first shows the agent autonomously analyzing flowsheets, finding improvement opportunities, and iteratively optimizing, extracting data, and presenting results clearly. The framework benefits both educational purposes, by translating technical concepts and demonstrating workflows, and experienced practitioners by automating data extraction, speeding routine tasks, and supporting brainstorming. The second case study assesses autonomous flowsheet synthesis through both a step-by-step dialogue and a single prompt, demonstrating its potential for novices and experts alike. The step-by-step mode gives reliable, guided construction suitable for educational contexts; the single-prompt mode constructs fast baseline flowsheets for later refinement. While current limitations such as oversimplification, calculation errors, and technical hiccups mean expert oversight is still needed, the framework's capabilities in analysis, optimization, and guided construction suggest LLM-based agents can become valuable collaborators.

2601.05157 2026-05-22 cs.DS cs.LG stat.ML

Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms

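Alkis Kalavasis, Pravesh K. Kothari, Shuchen Li, Manolis Zampetakis

Affiliations: Yale University; Princeton University

AI summary: This paper gives an algorithm with polynomial time and sample complexity for learning the parameters of mixture models in high dimensions. It applies to mixtures of heavy-tailed distributions, including some without finite covariance, and requires no minimum separation between cluster means.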

Abstract

In this work, we give a ${\rm poly}(d,k)$ time and sample algorithm for efficiently learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions. Unlike all previous methods, our techniques apply to heavy-tailed distributions and include examples that do not even have finite covariances. Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but crucially exclude Gaussians. All previous methods for learning mixture models relied implicitly or explicitly on the low-degree moments. Even for the case of Laplace distributions, we prove that any such algorithm must use super-polynomially many samples. Our method thus adds to the short list of techniques that bypass the limitations of the method of moments. Somewhat surprisingly, our algorithm does not require any minimum separation between the cluster means. This is in stark contrast to spherical Gaussian mixtures where a minimum $\ell_2$-separation is provably necessary even information-theoretically [Regev and Vijayaraghavan '17]. Our methods compose well with existing techniques and allow obtaining "best of both worlds" guarantees for mixtures where every component either has a heavy-tailed characteristic function or has a sub-Gaussian tail with a light-tailed characteristic function. Our algorithm is based on a new approach to learning mixture models via efficient high-dimensional sparse Fourier transforms. We believe that this method will find more applications to statistical estimation. As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. It was formally proposed in a recent Master's thesis by one of the authors, and has already inspired follow-up works.
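
The distinction between heavy-tailed and light-tailed characteristic functions can be seen directly in one dimension, where both are known in closed form. The sketch below compares a zero-mean Laplace and a zero-mean Gaussian with unit scale; the function names and the sample frequencies are chosen only for illustration.

```python
import math

def laplace_cf(t, b=1.0):
    """Characteristic function of a zero-mean Laplace distribution with scale b."""
    return 1.0 / (1.0 + (b * t) ** 2)

def gaussian_cf(t, sigma=1.0):
    """Characteristic function of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (sigma * t) ** 2)

for t in (1.0, 5.0, 10.0):
    print(t, laplace_cf(t), gaussian_cf(t))
```

The Laplace characteristic function decays only polynomially in the frequency, so it stays measurable at high frequencies, whereas the Gaussian one vanishes exponentially fast.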

2512.19131 2026-05-22 cs.DC cs.LG

Evidential Trust-Aware Model Personalization in Decentralized Federated Learning for Wearable IoT

Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya

Affiliations: Quantum Cloud Computing and Distributed Systems (qCLOUDS) Lab; School of Computing and Information Systems; The University of Melbourne, Australia

AI summary: This paper proposes Murmura, a framework that uses evidential deep learning for trust-aware model personalization in decentralized federated learning. Epistemic uncertainty from Dirichlet-based evidential models directly indicates peer compatibility, which reduces performance degradation under non-IID conditions and speeds up convergence.

Comments: v2. Addressed minor reviewer concerns

Abstract

Decentralized federated learning (DFL) enables collaborative model training across edge devices without centralized coordination, offering resilience against single points of failure. However, statistical heterogeneity arising from non-identically distributed local data creates a fundamental challenge: nodes must learn personalized models adapted to their local distributions while selectively collaborating with compatible peers. Existing approaches either enforce a single global model that fits no one well, or rely on heuristic peer selection mechanisms that cannot distinguish between peers with genuinely incompatible data distributions and those with valuable complementary knowledge. We present Murmura, a framework that leverages evidential deep learning to enable trust-aware model personalization in DFL. Our key insight is that epistemic uncertainty from Dirichlet-based evidential models directly indicates peer compatibility: high epistemic uncertainty when a peer's model evaluates local data reveals distributional mismatch, enabling nodes to exclude incompatible influence while maintaining personalized models through selective collaboration. Murmura introduces a trust-aware aggregation mechanism that computes peer compatibility scores through cross-evaluation on local validation samples and personalizes model aggregation based on evidential trust with adaptive thresholds. Evaluation on three wearable IoT datasets (UCI HAR, PAMAP2, PPG-DaLiA) demonstrates that Murmura reduces performance degradation from IID to non-IID conditions compared to baseline (0.9% vs. 19.3%), achieves 7.4$\times$ faster convergence, and maintains stable accuracy across hyperparameter choices. These results establish evidential uncertainty as a principled foundation for compatibility-aware personalization in decentralized heterogeneous environments.
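
As a rough sketch of how Dirichlet evidence can flag an incompatible peer, the code below uses the common subjective-logic convention in which class evidence e_k gives alpha_k = e_k + 1 and the uncertainty mass is K / sum(alpha). Whether Murmura uses exactly this quantity is an assumption, and the fixed threshold stands in for the adaptive thresholds the abstract mentions; peer names and evidence values are made up.

```python
def evidential_uncertainty(evidence):
    """Uncertainty mass of a Dirichlet with alpha_k = evidence_k + 1: u = K / sum(alpha)."""
    alphas = [e + 1.0 for e in evidence]
    return len(alphas) / sum(alphas)

def compatible_peers(peer_evidence, threshold):
    """Keep peers whose mean uncertainty on local validation samples is below threshold."""
    kept = []
    for peer, per_sample in peer_evidence.items():
        mean_u = sum(evidential_uncertainty(e) for e in per_sample) / len(per_sample)
        if mean_u < threshold:
            kept.append(peer)
    return kept

# Evidence each peer's model produces on two local validation samples (3 classes).
peer_evidence = {
    "peer_a": [[20.0, 1.0, 0.0], [0.0, 18.0, 2.0]],   # strong evidence on local data
    "peer_b": [[0.5, 0.2, 0.3], [0.1, 0.4, 0.0]],     # little evidence: likely mismatch
}
kept = compatible_peers(peer_evidence, threshold=0.5)
print(kept)
```

A node would then aggregate only the models of the peers that pass, which is the selective collaboration the abstract describes.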

2512.11484 2026-05-22 cs.CR cs.AI

Capacitive Touchscreens at Risk: Recovering Handwritten Trajectory on Smartphone via Electromagnetic Emanations

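Yukun Cheng, Shiyu Zhu, Changhai Ou, Xingshuo Han, Yuan Li, Shihui Zheng

Affiliations: Wuhan University; Nanjing University of Aeronautics and Astronautics; National University of Defense Technology; Beijing University of Posts and Telecommunications

AI summary: This paper reveals and exploits an electromagnetic side-channel vulnerability in capacitive touchscreens: it captures the electromagnetic signals emitted during on-screen writing and regresses them into two-dimensional handwriting trajectories in real time. The proposed TESLA attack framework recovers highly recognizable handwriting trajectories under realistic attack conditions.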

Abstract

This paper reveals and exploits a critical security vulnerability: the electromagnetic (EM) side channel of capacitive touchscreens leaks sufficient information to recover fine-grained, continuous handwriting trajectories. We present Touchscreen Electromagnetic Side-channel Leakage Attack (TESLA), a non-contact attack framework that captures EM signals generated during on-screen writing and regresses them into two-dimensional (2D) handwriting trajectories in real time. Extensive evaluations across a variety of commercial off-the-shelf (COTS) smartphones show that TESLA achieves 77% character recognition accuracy and a Jaccard index of 0.74, demonstrating its capability to recover highly recognizable motion trajectories that closely resemble the original handwriting under realistic attack conditions.
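
The abstract reports a Jaccard index without saying how trajectories are compared. One plausible reading, assumed here, is to rasterize each trajectory onto a grid and compare the sets of visited cells; the grid size, helper names, and sample points below are illustrative only.

```python
def rasterize(points, cell=1.0):
    """Map a trajectory to the set of grid cells it visits."""
    return {(int(x // cell), int(y // cell)) for x, y in points}

def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

truth = rasterize([(0.2, 0.1), (1.4, 0.3), (2.6, 1.1), (3.1, 2.2)])
recovered = rasterize([(0.4, 0.3), (1.1, 0.8), (2.2, 1.7), (3.5, 3.4)])
similarity = jaccard(truth, recovered)
print(similarity)
```

Here three of the five distinct cells are shared, so a recovered stroke that drifts only at its end still scores reasonably well.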

2512.09472 2026-05-22 cs.DC cs.LG

WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

WarmServe: 为多LLM服务实现一种多GPU预热

Chiheng Lou, Sheng Qi, Rui Kang, Yong Zhang, Chen Sun, Pengcheng Wang, Xuanzhe Liu, Xin Jin

Affiliations School of Computer Science, Peking University; Huawei Technologies Co., Ltd

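AI summary This paper presents WarmServe, a multi-LLM serving system that uses workload-forecast-driven one-for-many GPU prewarming to reduce tail time-to-first-token (TTFT) and raise request throughput.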

Comments Accepted at ICML 2026

AI Chinese abstract

在共享GPU集群中部署多个模型是提高大型语言模型(LLM)服务资源效率的关键策略。现有多LLM服务系统以牺牲推理性能为代价来提高GPU利用率,尤其是首个令牌时间(TTFT)。我们将这种性能下降归因于缺乏对未来工作负载特征的感知。相比之下,最近的分析表明,现实世界中的LLM服务工作负载具有强周期性和长期可预测性。在本文中,我们提出了“一对多”GPU预热方法,根据工作负载预测主动将多个模型的参数加载到GPU上。这些预热的权重使系统能够在遇到请求突发时迅速实例化服务实例。我们设计并实现了WarmServe,一个多LLM服务系统,包含三个关键技术:(1)一个模型放置算法,优化预热决策以最小化跨模型预热干扰;(2)一个KV缓存预留策略,将正在运行的GPU上的空闲KV缓存空间重新用于预热新模型;(3)一个高效的GPU内存切换机制用于张量管理。在真实世界数据集上的评估显示,与最先进的基于自动扩缩容的系统相比,WarmServe将尾部TTFT最多降低50.8倍,同时支持的请求吞吐量比GPU共享系统最多高2.5倍。

English abstract

Deploying multiple models within shared GPU clusters is a key strategy to improve resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems improve GPU utilization at the cost of degraded inference performance, particularly time-to-first-token (TTFT). We attribute this degradation to the lack of awareness regarding future workload characteristics. In contrast, recent analyses have shown the strong periodicity and long-term predictability of real-world LLM serving workloads. In this paper, we propose one-for-many GPU prewarming, which proactively loads parameters from multiple models onto GPUs based on workload forecasts. These prewarmed weights enable the system to promptly instantiate serving instances upon encountering request bursts. We design and implement WarmServe, a multi-LLM serving system incorporating three key techniques: (1) a model placement algorithm that optimizes prewarming decisions to minimize cross-model prewarming interference, (2) a KV cache reservation strategy that repurposes idle KV cache space on running GPUs for prewarming new models, and (3) an efficient GPU memory switching mechanism for tensor management. Evaluation on real-world datasets shows that WarmServe reduces tail TTFT by up to 50.8$\times$ compared to the state-of-the-art autoscaling-based system, while supporting up to 2.5$\times$ higher request throughput than the GPU-sharing system.

2511.08093 2026-05-22 eess.AS cs.CL cs.SD

Quantizing Whisper-small: How design choices affect ASR performance

对Whisper-small的量化:设计选择如何影响语音识别性能

Arthur Söhler, Julian Irigoyen, Andreas Søeborg Kirkedal

Affiliations Copenhagen Business School; Danske Bank; Jabra (GN Group)

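AI summary This paper studies how different quantization schemes affect Whisper-small, finding that dynamic int8 quantization offers the best trade-off between model compression and recognition accuracy, and showing that a carefully chosen quantization method can substantially cut model size and inference cost, enabling efficient deployment on constrained hardware.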

Comments Accepted to SPEAKABLE workshop at LREC 2026

AI Chinese abstract

大型语音识别模型如Whisper-small虽然能实现高精度,但其高计算需求使其难以在边缘设备上部署。为此,我们提出了一种统一的跨库评估,评估了Whisper-small上的后训练量化(PTQ)方法,以分离量化方案、方法、粒度和位宽的影响。我们的研究基于四个库:PyTorch、Optimum-Quanto、HQQ和bitsandbytes。在LibriSpeech测试清洁和测试其他数据集上的实验表明,动态int8量化结合Quanto提供了最佳的权衡,将模型大小减少57%,同时在基线的词错误率上有所提升。静态量化表现较差,可能由于Whisper的Transformer架构,而更激进的格式(如nf4、int3)在嘈杂条件下以牺牲准确性为代价实现了高达71%的压缩。总体而言,我们的结果表明,精心选择的PTQ方法可以在不重新训练的情况下显著减少模型大小和推理成本,从而在受限硬件上实现Whisper-small的高效部署。

English abstract

Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified, cross-library evaluation of post-training quantization (PTQ) on Whisper-small that disentangles the impact of quantization scheme, method, granularity, and bit-width. Our study is based on four libraries: PyTorch, Optimum-Quanto, HQQ, and bitsandbytes. Experiments on LibriSpeech test-clean and test-other show that dynamic int8 quantization with Quanto offers the best trade-off, reducing model size by 57% while improving on the baseline's word error rate. Static quantization performed worse, likely due to Whisper's Transformer architecture, while more aggressive formats (e.g., nf4, int3) achieved up to 71% compression at the cost of accuracy in noisy conditions. Overall, our results demonstrate that carefully chosen PTQ methods can substantially reduce model size and inference cost without retraining, enabling efficient deployment of Whisper-small on constrained hardware.
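To make the int8 bit-width and per-tensor granularity concrete, here is a minimal sketch of symmetric int8 weight quantization with a single scale per tensor. It is a generic textbook scheme, not the implementation of any of the four libraries in the study, and it omits the run-time activation scaling that dynamic quantization adds.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w is approximated by scale * q,
    with integer q clamped to [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 codes."""
    return [scale * v for v in q]


w = [0.50, -1.27, 0.031, 0.0, 1.0]  # toy weight tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

With rounding to the nearest code, the reconstruction error of each weight is at most half the scale, which is why a tensor with a few large outliers loses precision on its small weights.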

2511.07885 2026-05-22 cs.DC cs.AI cs.CL cs.LG

Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

每瓦智能:衡量本地AI的智能效率

Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré

Affiliations Stanford University; Together AI

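AI summary This paper studies the energy efficiency and performance of local AI, proposes intelligence per watt (IPW) as a unified metric, shows that local inference can redistribute demand from centralized infrastructure, and reveals headroom for optimizing local accelerators.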

AI Chinese abstract

大型语言模型(LLM)查询主要由集中式云基础设施中的前沿模型处理。需求增长比提供商能够扩展的速度更快。两项进展创造了重新思考这一范式的机会:小型本地LM(<=20B活跃参数)在许多任务上能与前沿模型竞争性地表现,而本地加速器(如Apple M4 Max)可以以交互延迟支持这些模型。这引发了问题:本地推理能否在能源受限的设备上有效重新分配需求?这需要测量本地LM是否能准确回答现实查询以及是否在能源受限的设备上高效。我们提出了智能每瓦(IPW),即任务准确度每单位功率,作为衡量本地推理能力与效率的统一指标。我们评估了20多个最先进的本地LM、8种硬件加速器(本地和云)以及100万条现实单轮聊天和推理查询。对于每个查询,我们测量了准确性(本地LM对前沿模型的胜率)、能耗、延迟和功率。我们发现三个关键结果。首先,本地LM成功回答了88.7%的这些查询,准确性因领域而异。其次,2023-2025年的纵向分析显示IPW提高了5.3倍,由算法和加速器的改进驱动,本地可服务查询覆盖范围从23.2%增加到71.3%。第三,本地加速器在相同模型上实现的IPW至少比云加速器低1.4倍,揭示了本地加速器优化的巨大潜力。这些发现表明,本地推理可以对集中式基础设施的大量查询需求进行有意义的重新分配,IPW是跟踪这一转变的关键指标。

English abstract

Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Demand growth strains this paradigm faster than providers can scale. Two advances create an opportunity to rethink it: small, local LMs (<=20B active parameters) now achieve performance competitive with frontier models on many tasks, and local accelerators (e.g., Apple M4 Max) can host these models at interactive latencies. This raises the question: can local inference viably redistribute demand from centralized infrastructure? Answering it requires measuring both whether local LMs can accurately answer real-world queries and whether they can do so efficiently on power-constrained devices (e.g., laptops). We propose intelligence per watt (IPW), task accuracy per unit of power, as a unified metric for the capability and efficiency of local inference across model-accelerator configurations. We evaluate 20+ state-of-the-art local LMs, 8 hardware accelerators (local and cloud), and 1M real-world single-turn chat and reasoning queries. For each query, we measure accuracy (local LM win rate against frontier models), energy, latency, and power. We find three key results. First, local LMs successfully answer 88.7% of these queries, with accuracy varying by domain. Second, longitudinal analysis from 2023-2025 shows IPW improved 5.3x, driven by both algorithmic and accelerator advances, with locally-serviceable query coverage rising from 23.2% to 71.3%. Third, local accelerators achieve at least 1.4x lower IPW than cloud accelerators running identical models, revealing significant headroom for local accelerator optimization. These findings demonstrate that local inference can meaningfully redistribute demand from centralized infrastructure for a substantial subset of queries, with IPW serving as the critical metric for tracking this transition.
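The metric defined above, task accuracy per unit of power, can be sketched in a few lines. This assumes accuracy is a fraction in [0, 1] and power is the mean draw over a query (energy divided by latency); the function names and all numbers are hypothetical and are not measurements from the paper.

```python
def mean_power_watts(energy_joules: float, latency_seconds: float) -> float:
    """Average power drawn over a query: P = E / t."""
    return energy_joules / latency_seconds


def intelligence_per_watt(accuracy: float, power_watts: float) -> float:
    """IPW as described in the abstract: task accuracy per unit of power."""
    return accuracy / power_watts


# Two hypothetical accelerators running the same model (same accuracy).
ipw_a = intelligence_per_watt(0.80, mean_power_watts(500.0, 10.0))  # 50 W
ipw_b = intelligence_per_watt(0.80, mean_power_watts(350.0, 10.0))  # 35 W
ratio = ipw_b / ipw_a
```

Because accuracy is identical in this toy comparison, the IPW ratio reduces to the inverse ratio of mean power.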

2511.06428 2026-05-22 cs.SE cs.AI

Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

LLM用于软件开发如走钢丝:从业者视角

Samuel Ferino, Rashina Hoda, John Grundy, Christoph Treude

Affiliations Faculty of Information Technology, Monash University; School of Computing and Information Systems, Singapore Management University

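AI summary Taking the software developer's perspective, this paper studies how LLMs affect software development and how to manage that impact. Through 22 interviews analysed with Socio-Technical Grounded Theory for Data Analysis (STGT4DA), it identifies benefits and challenges of LLMs at the individual, team, organisation, and society levels and offers actionable guidance for mitigating the challenges.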

AI Chinese abstract

背景:大型语言模型(LLMs)的出现有可能引发软件开发领域的革命(例如自动化流程、劳动力转型)。尽管已有研究开始探讨LLMs对软件开发的感知影响,但需要实证研究来理解如何平衡使用LLMs的正反作用。目标:我们研究了LLMs对软件开发的影响以及如何从软件开发者视角管理其影响。方法:我们进行了22次软件从业者的访谈,数据收集和分析跨越了2024年10月至2025年9月的三轮数据收集和分析。我们采用了社会技术扎根理论(STGT4DA)对访谈参与者的回答进行严格分析。结果:我们识别了使用LLMs在个体、团队、组织和社会层面的益处(例如维持开发者流程、提高开发者心理模型、促进创业)和挑战(例如损害开发者声誉),并提出了缓解这些挑战的可行建议。结论:关键在于我们提出了软件从业者、团队和组织在使用LLMs时所面临的权衡。我们的发现对软件团队领导者和IT经理评估其特定环境中LLMs的可行性特别有用。

English abstract

Background: Large Language Models (LLMs) have emerged with the potential to provoke a revolution in software development (e.g., automating processes, workforce transformation). Although studies have started to investigate the perceived impact of LLMs on software development, empirical studies are needed to understand how to balance the forward and backward effects of using LLMs. Objective: We investigated how LLMs impact software development and how to manage that impact from a software developer's perspective. Method: We conducted 22 interviews with software practitioners across 3 rounds of data collection and analysis, between October 2024 and September 2025. We employed Socio-Technical Grounded Theory for Data Analysis (STGT4DA) to rigorously analyse interview participants' responses. Results: We identified the benefits (e.g., maintaining developer flow, improving developer mental models, and fostering entrepreneurship) and challenges (e.g., damage to developers' reputation) of using LLMs at the individual, team, organisation, and society levels, as well as actionable guidance on how to mitigate these challenges. Conclusion: Critically, we present the trade-offs that software practitioners, teams, and organisations face in working with LLMs. Our findings are particularly useful for software team leaders and IT managers assessing the viability of LLMs within their specific context.

2509.22795 2026-05-22 eess.SP cs.AI cs.SY eess.SY

Generative Modeling and Decision Fusion for Unknown Event Detection and Classification Using Synchrophasor Data

基于同步相量数据的未知事件检测与分类的生成建模与决策融合

Yi Hu, Zheyuan Cheng

Affiliations Department of Electrical and Computer Engineering, Michigan Technological University; Quanta Technology

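AI summary This paper proposes a framework that combines generative modeling, sliding-window temporal processing, and decision fusion for robust event detection and classification from synchrophasor data. A variational autoencoder-generative adversarial network models normal operating conditions, and two complementary decision strategies improve detection accuracy and robustness.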

Comments 10 pages

Journal ref IEEE Transactions on Industrial Informatics, 2026

AI Chinese abstract

可靠的电力系统事件检测和分类对于维持电网稳定性和态势感知至关重要。现有方法往往依赖于有限的标记数据集,限制了其在罕见或未见扰动上的泛化能力。本文提出了一种新的框架,整合了生成建模、滑动窗口时间处理和决策融合,以实现使用同步相量数据的鲁棒事件检测和分类。采用变分自编码器-生成对抗网络来建模正常运行条件,其中重构误差和判别器误差被提取为异常指标。开发了两种互补的决策策略:基于阈值的规则用于计算效率,基于凸包的方法用于在复杂误差分布下的鲁棒性。这些特征通过滑动窗口机制组织成时空检测和分类矩阵,并通过识别和决策融合阶段整合来自PMUs的输出。该设计使框架能够识别已知事件,同时系统地将以前未见过的扰动分类到新类别中,解决了监督分类器的关键限制。实验结果表明,该方法的准确性处于最先进水平,超过了机器学习、深度学习和包络基线方法。识别未知事件的能力进一步突显了所提出方法在现代电力系统广域事件分析中的适应性和实际价值。

English abstract

Reliable detection and classification of power system events are critical for maintaining grid stability and situational awareness. Existing approaches often depend on limited labeled datasets, which restricts their ability to generalize to rare or unseen disturbances. This paper proposes a novel framework that integrates generative modeling, sliding-window temporal processing, and decision fusion to achieve robust event detection and classification using synchrophasor data. A variational autoencoder-generative adversarial network is employed to model normal operating conditions, where both reconstruction error and discriminator error are extracted as anomaly indicators. Two complementary decision strategies are developed: a threshold-based rule for computational efficiency and a convex hull-based method for robustness under complex error distributions. These features are organized into spatiotemporal detection and classification matrices through a sliding-window mechanism, and an identification and decision fusion stage integrates the outputs across PMUs. This design enables the framework to identify known events while systematically classifying previously unseen disturbances into a new category, addressing a key limitation of supervised classifiers. Experimental results demonstrate state-of-the-art accuracy, surpassing machine learning, deep learning, and envelope-based baselines. The ability to recognize unknown events further highlights the adaptability and practical value of the proposed approach for wide-area event analysis in modern power systems.
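The threshold-based rule mentioned above can be pictured with a small sketch. It assumes each threshold is set to the mean plus k standard deviations of the corresponding error on normal data, and that a window is flagged when either indicator exceeds its threshold; both choices, and all numbers, are illustrative assumptions rather than the paper's rule.

```python
from statistics import mean, stdev
from typing import List, Sequence


def fit_threshold(normal_errors: Sequence[float], k: float = 3.0) -> float:
    """Hypothetical rule: threshold = mean + k * std of errors on normal data."""
    return mean(normal_errors) + k * stdev(normal_errors)


def detect(recon: Sequence[float], disc: Sequence[float],
           recon_thr: float, disc_thr: float) -> List[bool]:
    """Flag a window when either anomaly indicator exceeds its threshold."""
    return [r > recon_thr or d > disc_thr for r, d in zip(recon, disc)]


# Errors observed under normal operation (made-up values).
normal_recon = [0.10, 0.12, 0.11, 0.09, 0.10, 0.13]
normal_disc = [0.20, 0.22, 0.19, 0.21, 0.20, 0.18]

flags = detect(recon=[0.11, 0.45, 0.10], disc=[0.20, 0.21, 0.60],
               recon_thr=fit_threshold(normal_recon),
               disc_thr=fit_threshold(normal_disc))
```

The second window is flagged by its reconstruction error and the third by its discriminator error, which is the sense in which the two indicators complement each other.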

2509.12610 2026-05-22 cs.DB cs.AI cs.LG

ScaleDoc: Scaling LLM-based Predicates over Large Document Collections

ScaleDoc: 在大规模文档集合上扩展基于LLM的谓词

Hengrui Zhang, Yulong Hui, Yihao Liu, Huanchen Zhang

Affiliations Tsinghua University

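AI summary This paper presents ScaleDoc, a system that tackles the high inference cost of LLMs in large-scale document analysis by decoupling predicate execution into an offline representation phase and an optimized online filtering phase, yielding an end-to-end speedup and fewer LLM invocations.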

AI Chinese abstract

谓词是数据分析系统中的基础组件。然而,现代工作负载越来越多地涉及无结构文档,这需要语义理解,而不仅仅是传统基于值的谓词。鉴于巨大的文档和随机查询,尽管大语言模型(LLMs)显示出强大的零样本能力,但其高推理成本导致不可接受的开销。因此,我们引入ScaleDoc,一种新的系统,通过将谓词执行分解为离线表示阶段和优化的在线过滤阶段来解决这一问题。在离线阶段,ScaleDoc利用LLM为每个文档生成语义表示。在线阶段,对于每个查询,它在这些表示上训练一个轻量级代理模型来过滤大多数文档,只将有歧义的案例转发给LLM进行最终决策。此外,ScaleDoc提出了两个核心创新来实现显著的效率:(1)基于对比学习的框架,训练代理模型生成可靠的预测决策分数;(2)自适应级联机制,确定有效的过滤策略,同时满足特定的准确率目标。我们在三个数据集上的评估表明,ScaleDoc实现了超过2倍的端到端速度提升,并将昂贵的LLM调用减少了高达85%,使大规模语义分析变得实用和高效。

English abstract

Predicates are foundational components in data analysis systems. However, modern workloads increasingly involve unstructured documents, which demand semantic understanding beyond traditional value-based predicates. Large Language Models (LLMs) demonstrate powerful zero-shot capabilities, but given enormous document collections and ad-hoc queries, their high inference cost leads to unacceptable overhead. Therefore, we introduce ScaleDoc, a novel system that addresses this by decoupling predicate execution into an offline representation phase and an optimized online filtering phase. In the offline phase, ScaleDoc leverages an LLM to generate semantic representations for each document. Online, for each query, it trains a lightweight proxy model on these representations to filter the majority of documents, forwarding only the ambiguous cases to the LLM for the final decision. Furthermore, ScaleDoc proposes two core innovations to achieve significant efficiency: (1) a contrastive-learning-based framework that trains the proxy model to generate reliable predicate decision scores; (2) an adaptive cascade mechanism that determines an effective filtering policy while meeting specific accuracy targets. Our evaluations across three datasets demonstrate that ScaleDoc achieves over a 2$\times$ end-to-end speedup and reduces expensive LLM invocations by up to 85%, making large-scale semantic analysis practical and efficient.
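The online filtering phase can be pictured as a two-threshold cascade: documents the proxy scores confidently are decided without the LLM, and only the ambiguous band is forwarded. The sketch below uses fixed thresholds and a stand-in `llm_judge` callable, both hypothetical; ScaleDoc's adaptive policy selection under an accuracy target is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple


def cascade_filter(scores: Dict[str, float],
                   llm_judge: Callable[[str], bool],
                   low: float, high: float) -> Tuple[List[str], int]:
    """Return (accepted document ids, number of LLM calls)."""
    accepted, llm_calls = [], 0
    for doc_id, score in scores.items():
        if score >= high:        # confident positive from the proxy
            accepted.append(doc_id)
        elif score > low:        # ambiguous: defer to the expensive model
            llm_calls += 1
            if llm_judge(doc_id):
                accepted.append(doc_id)
        # score <= low: confident negative, dropped without an LLM call
    return accepted, llm_calls


proxy_scores = {"d1": 0.95, "d2": 0.55, "d3": 0.10, "d4": 0.40, "d5": 0.02}
fake_llm = lambda doc_id: doc_id == "d2"  # stand-in for a real LLM call
kept, calls = cascade_filter(proxy_scores, fake_llm, low=0.2, high=0.8)
```

Here only two of five documents reach the stand-in LLM; widening the band between `low` and `high` trades more LLM calls for fewer proxy mistakes.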

2508.06884 2026-05-22 math.OC cs.LG

Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness

在一般化和$(L_0, L_1)$-光滑条件下加速梯度方法的近最优收敛性

Alexander Tyurin

Affiliations Applied AI Institute, Moscow, Russia

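AI summary This paper studies first-order methods for convex optimization problems satisfying the recently proposed $\ell$-smoothness condition. Although accelerated gradient descent (AGD) attains the optimal complexity $O(\sqrt{L} R / \sqrt{\varepsilon})$ under $L$-smoothness, existing extensions to $\ell$-smoothness either incur extra dependence on the initial gradient, suffer exponential factors in $L_1 R$, or require costly auxiliary sub-routines. The paper resolves this open question: with a new Lyapunov function and new algorithms, it achieves $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ oracle complexity for small $\varepsilon$ and virtually any $\ell$. For example, under $(L_0, L_1)$-smoothness the bound $O(\sqrt{L_0} R / \sqrt{\varepsilon})$ is provably optimal in the small-$\varepsilon$ regime and removes all non-constant multiplicative factors present in prior accelerated algorithms.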

AI Chinese abstract

我们研究了在满足最近提出的$\ell$-光滑性条件的凸优化问题中的一阶方法。该条件$||\nabla^{2}f(x)|| \le \ell\left(||\nabla f(x)||\right)$扩展了$L$-光滑性和$(L_{0},L_{1})$-光滑性。虽然加速梯度下降法AGD在$L$-光滑性下能达到最优复杂度$O(\sqrt{L} R / \sqrt{\varepsilon})$,其中$\varepsilon$是误差容忍度,$R$是起始点与最优解之间的距离,但现有方法在$\ell$-光滑性下的扩展要么引入额外的初始梯度依赖,要么有指数因子$ L_1 R $,或者需要昂贵的辅助子程序,留下开放问题:是否可能在小$\varepsilon$下实现AGD型$O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$的速率,即使在$(L_{0},L_{1})$-光滑性情况下。我们解决了这一开放问题。通过新的Lyapunov函数和设计新的算法,我们实现了对于小$\varepsilon$和几乎任何$\ell$的$O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$的oracle复杂度。例如,在$(L_{0},L_{1})$-光滑性下,我们的界$O(\sqrt{L_0} R / \sqrt{\varepsilon})$在小$\varepsilon$范围内被证明是最佳的,并去除了先前加速算法中所有非常数的乘法因子。

English abstract

We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $||\nabla^{2}f(x)|| \le \ell\left(||\nabla f(x)||\right),$ which generalizes $L$-smoothness and $(L_{0},L_{1})$-smoothness. While accelerated gradient descent (AGD) is known to reach the optimal complexity $O(\sqrt{L} R / \sqrt{\varepsilon})$ under $L$-smoothness, where $\varepsilon$ is an error tolerance and $R$ is the distance between a starting point and an optimal point, existing extensions to $\ell$-smoothness either incur extra dependence on the initial gradient, suffer exponential factors in $L_{1} R$, or require costly auxiliary sub-routines, leaving open whether an AGD-type $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ rate is possible for small $\varepsilon$, even in the $(L_{0},L_{1})$-smoothness case. We resolve this open question. Leveraging a new Lyapunov function and designing new algorithms, we achieve $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ oracle complexity for small $\varepsilon$ and virtually any $\ell$. For instance, for $(L_{0},L_{1})$-smoothness, our bound $O(\sqrt{L_0} R / \sqrt{\varepsilon})$ is provably optimal in the small-$\varepsilon$ regime and removes all non-constant multiplicative factors present in prior accelerated algorithms.
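For orientation, the two special cases named in the abstract correspond to particular choices of the function $\ell$. The block below spells them out, assuming the standard definitions of $L$-smoothness and $(L_0, L_1)$-smoothness for twice-differentiable $f$:

```latex
\begin{align*}
\text{$L$-smoothness:} \quad
  & \ell(s) = L,
  && \|\nabla^{2} f(x)\| \le L, \\
\text{$(L_0, L_1)$-smoothness:} \quad
  & \ell(s) = L_0 + L_1 s,
  && \|\nabla^{2} f(x)\| \le L_0 + L_1 \|\nabla f(x)\|.
\end{align*}
```

In the first case $\ell(0) = L$ and in the second $\ell(0) = L_0$, so the rate $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ reads $O(\sqrt{L} R / \sqrt{\varepsilon})$ and $O(\sqrt{L_0} R / \sqrt{\varepsilon})$ respectively, matching the bounds quoted in the abstract.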

2507.13339 2026-05-22 eess.IV cs.CV

SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution

SpectraLift: 一种基于物理的频谱反演网络用于自监督超分辨率高光谱图像

Ritik Shah, Marco F. Duarte

Affiliations University of Massachusetts Amherst

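AI summary This work proposes SpectraLift, a self-supervised spectral-inversion network that fuses hyperspectral and multispectral images using only the multispectral image's spectral response function, with no point spread function calibration or ground-truth high-resolution hyperspectral images, and that outperforms existing methods on PSNR, SAM, SSIM, and RMSE.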

Journal ref 2025 15th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS)

AI Chinese abstract

高空间分辨率的高光谱图像(HSI)对于遥感和医学成像等应用至关重要,但HSI传感器本质上是牺牲空间细节来换取光谱丰富性。将高空间分辨率多谱段图像(HR-MSI)与低空间分辨率高光谱图像(LR-HSI)融合是恢复精细空间结构而不牺牲光谱保真度的有希望的途径。大多数最先进的HSI-MSI融合方法需要点扩散函数(PSF)校准或地面真实高分辨率HSI(HR-HSI),这两种在现实世界中都难以获得。我们提出SpectraLift,一种完全自监督的框架,利用仅MSI的光谱响应函数(SRF)融合LR-HSI和HR-MSI输入。SpectraLift通过(i)将SRF应用于LR-HSI得到的合成低空间分辨率多谱段图像(LR-MSI)作为输入,(ii)LR-HSI作为输出,以及(iii)估计与真实LR-HSI之间的ℓ₁光谱重建损失作为优化目标,训练一个轻量级的每像素多层感知机(MLP)网络。在推理时,SpectraLift使用训练好的网络将HR-MSI像素映射到HR-HSI估计。SpectraLift在几分钟内收敛,对空间模糊和分辨率不敏感,并在PSNR、SAM、SSIM和RMSE基准测试中优于现有方法。

English abstract

High-spatial-resolution hyperspectral images (HSI) are essential for applications such as remote sensing and medical imaging, yet HSI sensors inherently trade spatial detail for spectral richness. Fusing high-spatial-resolution multispectral images (HR-MSI) with low-spatial-resolution hyperspectral images (LR-HSI) is a promising route to recover fine spatial structures without sacrificing spectral fidelity. Most state-of-the-art methods for HSI-MSI fusion demand point spread function (PSF) calibration or ground-truth high-resolution HSI (HR-HSI), both of which are impractical to obtain in real-world settings. We present SpectraLift, a fully self-supervised framework that fuses LR-HSI and HR-MSI inputs using only the MSI's Spectral Response Function (SRF). SpectraLift trains a lightweight per-pixel multi-layer perceptron (MLP) network using (i) a synthetic low-spatial-resolution multispectral image (LR-MSI) obtained by applying the SRF to the LR-HSI as input, (ii) the LR-HSI as the output, and (iii) an $\ell_1$ spectral reconstruction loss between the estimated and true LR-HSI as the optimization objective. At inference, SpectraLift uses the trained network to map the HR-MSI pixel-wise into an HR-HSI estimate. SpectraLift converges in minutes, is agnostic to spatial blur and resolution, and outperforms state-of-the-art methods on PSNR, SAM, SSIM, and RMSE benchmarks.
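The self-supervised training signal described in (i) to (iii) can be sketched without any learning machinery: apply the SRF to each LR-HSI pixel to get the network input, keep the LR-HSI pixel as the target, and score a prediction with an L1 loss. The toy SRF matrix, the pixel values, and the fixed `predict` function standing in for the MLP are all assumptions for illustration; no training is performed.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def apply_srf(srf: Matrix, hsi_pixel: Vector) -> Vector:
    """Project one hyperspectral pixel (B bands) to multispectral (b bands)."""
    return [sum(r * x for r, x in zip(row, hsi_pixel)) for row in srf]


def l1_loss(estimate: Vector, target: Vector) -> float:
    """Mean absolute spectral error between estimated and true pixel."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


# Toy SRF: 2 multispectral bands, each averaging 2 of 4 hyperspectral bands.
srf = [[0.5, 0.5, 0.0, 0.0],
       [0.0, 0.0, 0.5, 0.5]]
lr_hsi = [[0.2, 0.4, 0.6, 0.8], [0.1, 0.1, 0.9, 0.7]]

# Training pairs: input = synthetic LR-MSI pixel, target = LR-HSI pixel.
pairs = [(apply_srf(srf, px), px) for px in lr_hsi]

# Stand-in for the per-pixel MLP: repeat each MSI band twice.
predict = lambda msi: [msi[0], msi[0], msi[1], msi[1]]
loss = sum(l1_loss(predict(x), y) for x, y in pairs) / len(pairs)
```

Because the mapping is learned per pixel from spectra alone, the same network can later be applied to HR-MSI pixels, which is what makes the approach independent of the spatial blur.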

2505.24333 2026-05-22 stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation

深度变换器的两种失效模式及如何避免它们:初始化下信号传播的统一理论

Alessio Giorlandino, Sebastian Goldt

Affiliations International School of Advanced Studies (SISSA)

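AI summary This paper studies two failure modes of self-attention layers in deep transformers, rank collapse and entropy collapse, and develops a unified theory of signal propagation. By analysing how initialisation affects training stability, it provides a simple algorithm for computing trainability diagrams that identify the correct initialisation hyper-parameters.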

Journal ref ICLR 2026

AI Chinese abstract

找到正确的初始化对于确保神经网络的平稳训练和良好性能至关重要。在变换器中,错误的初始化可能导致自注意力层的两种失效模式:秩坍缩,其中所有标记坍缩为相似的表示,以及熵坍缩,其中高度集中的注意力分数导致训练不稳定。尽管之前的研究所研究了变换器的不同缩放领域,但迄今为止,关于如何初始化变换器的渐近精确、到常数的处方仍然缺乏。在这里,我们提供了一种分析深度变换器中信号通过自注意力、层归一化、跳跃连接和MLP传播的理论。我们的理论产生了一种简单的算法,用于计算训练性图,以确定给定架构的正确初始化超参数选择。我们通过建立与统计物理中随机能模型的正式平行,克服了处理自注意力层的关键挑战。我们还分析了反向路径中的梯度,并确定了梯度在初始化时消失的区域。我们通过三个案例研究展示了我们框架的通用性。我们的理论框架为自注意力的两种失效模式提供了统一的视角,并对权重和残差连接的尺度提供了定量预测,以确保平稳训练。

English abstract

Finding the right initialisation for neural networks is crucial to ensure smooth training and good performance. In transformers, the wrong initialisation can lead to one of two failure modes of self-attention layers: rank collapse, where all tokens collapse into similar representations, and entropy collapse, where highly concentrated attention scores lead to training instability. While previous work has studied different scaling regimes for transformers, an asymptotically exact, down-to-the-constant prescription for how to initialise transformers has so far been lacking. Here, we provide an analytical theory of signal propagation through deep transformers with self-attention, layer normalisation, skip connections and MLPs. Our theory yields a simple algorithm to compute trainability diagrams that identify the correct choice of initialisation hyper-parameters for a given architecture. We overcome the key challenge, an exact treatment of the self-attention layer, by establishing a formal parallel with the Random Energy Model from statistical physics. We also analyse gradients in the backward pass and determine the regime where gradients vanish at initialisation. We demonstrate the versatility of our framework through three case studies. Our theoretical framework gives a unified perspective on the two failure modes of self-attention and gives quantitative predictions on the scale of both weights and residual connections that guarantee smooth training.

2505.20349 2026-05-22 physics.flu-dyn cs.LG

FD-Bench: A Modular and Fair Benchmark for Data-driven Fluid Simulation

FD-Bench: 一种模块化且公平的用于数据驱动流体模拟的基准测试

Haixin Wang, Ruoyan Li, Fred Xu, Fang Sun, Kaiqiao Han, Zijie Huang, Ching Chang, Xiao Luo, Wei Wang, Yizhou Sun

Affiliations University of California, Los Angeles; Meta; University of Wisconsin–Madison

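AI summary This paper presents FD-Bench, a modular, fair, comprehensive, and reproducible benchmark for data-driven fluid simulation. It evaluates 85 baseline models under a unified experimental setup, addresses reproducibility and comparability issues, and lays a foundation for robust evaluation of future data-driven fluid models.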

Comments 32 pages, 20 figures, paper accepted by KDD 2026

AI Chinese abstract

数据驱动的流体动力学建模随着神经PDE求解器的快速发展而迅速进步,但公平且强大的基准测试仍然碎片化,由于缺乏统一的PDE数据集和标准化的评估协议。尽管架构创新丰富,但公平评估进一步受制于空间、时间和损失模块之间缺乏明确分离。在本文中,我们引入FD-Bench,这是首个公平、模块化、全面且可重复的数据驱动流体模拟基准测试。FD-Bench在统一的实验设置下系统评估了85个基线模型,涵盖10种代表性流场场景。它提供了四个关键贡献:(1) 模块化设计,使空间、时间和损失函数模块之间能够公平比较;(2) 首个系统框架,用于与传统数值求解器的直接比较;(3) 在不同分辨率、初始条件和时间窗口下的细粒度泛化分析;(4) 用户友好的、可扩展的代码库,以支持未来研究。通过严谨的实证研究,FD-Bench建立了迄今为止最全面的排行榜,解决了长期存在的可重复性和可比性问题,并为未来数据驱动流体模型的稳健评估奠定了基础。代码已开源在https://github.com/WillDreamer/FD-Bench。

English abstract

Data-driven modeling of fluid dynamics has advanced rapidly with neural PDE solvers, yet a fair and strong benchmark remains fragmented due to the absence of unified PDE datasets and standardized evaluation protocols. Although architectural innovations are abundant, fair assessment is further impeded by the lack of clear disentanglement between spatial, temporal and loss modules. In this paper, we introduce FD-Bench, the first fair, modular, comprehensive and reproducible benchmark for data-driven fluid simulation. FD-Bench systematically evaluates 85 baseline models across 10 representative flow scenarios under a unified experimental setup. It provides four key contributions: (1) a modular design enabling fair comparisons across spatial, temporal, and loss function modules; (2) the first systematic framework for direct comparison with traditional numerical solvers; (3) fine-grained generalization analysis across resolutions, initial conditions, and temporal windows; and (4) a user-friendly, extensible codebase to support future research. Through rigorous empirical studies, FD-Bench establishes the most comprehensive leaderboard to date, resolving long-standing issues in reproducibility and comparability, and laying a foundation for robust evaluation of future data-driven fluid models. The code is open-sourced at https://github.com/WillDreamer/FD-Bench.

2505.15844 2026-05-22 q-bio.QM cs.LG stat.AP

Advancing Tabular Stroke Modelling Through a Novel Hybrid Architecture and Feature-Selection Synergy

通过一种新型混合架构和特征选择协同效应推进表格性中风建模

Yousuf Islam, Md. Jalal Uddin Chowdhury, Sumon Chandra Das

Affiliations Department of Computer Science and Engineering, Leading University, Sylhet 3112, Bangladesh; DeepNet Research and Development Lab, Sylhet 3100, Bangladesh

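AI summary This paper proposes a data-driven, interpretable machine-learning framework that uses ten routinely collected demographic, lifestyle, and clinical variables. With exploratory data analysis, data preprocessing, and feature selection, it builds a stroke-risk model reaching 97.2% accuracy, compared with 91.4% for the best individual model (LightGBM).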

Journal ref IEEE Conference Publication, 2025

AI Chinese abstract

脑中风仍然是全球死亡和残疾的主要原因之一,但大多数表格数据预测模型仍低于95%的准确率阈值,限制了实际应用。为解决这一差距,本文开发并验证了一个完全数据驱动且可解释的机器学习框架,旨在使用来自4981条记录的公共队列中十种常规获取的 demographics、生活方式和临床变量来预测中风。我们通过详尽的探索性数据分析(EDA)来理解数据集的结构和分布,随后进行严格的数据预处理,包括处理缺失值、去除异常值和使用合成少数类过采样技术(SMOTE)纠正类别不平衡。为了简化特征选择,使用了点二列相关性和随机森林Gini重要性,并利用分层五折交叉验证优化了包括树集成、提升、核方法和多层神经网络在内的十种不同算法。它们基于概率的预测帮助我们构建了所提出的模型,包括随机森林、XGBoost、LightGBM和一个支持向量分类器,其中逻辑回归作为元学习器。所提出的模型实现了97.2%的准确率和97.15%的F1分数,表明其显著优于领先的单个模型LightGBM,其准确率为91.4%。本研究的结果表明,严格的预处理与多样化的混合模型相结合,可以将低成本的表格数据转化为几乎临床级别的中风风险评估工具。

English abstract

Brain stroke remains one of the principal causes of death and disability worldwide, yet most tabular-data prediction models still hover below the 95% accuracy threshold, limiting real-world utility. Addressing this gap, the present work develops and validates a completely data-driven and interpretable machine-learning framework designed to predict strokes using ten routinely gathered demographic, lifestyle, and clinical variables sourced from a public cohort of 4,981 records. We employ a detailed exploratory data analysis (EDA) to understand the dataset's structure and distribution, followed by rigorous data preprocessing, including handling missing values, outlier removal, and class imbalance correction using the Synthetic Minority Over-sampling Technique (SMOTE). To streamline feature selection, point-biserial correlation and random-forest Gini importance were utilized, and ten varied algorithms (encompassing tree ensembles, boosting, kernel methods, and a multilayer neural network) were optimized using stratified five-fold cross-validation. Their probability-based predictions were used to build the proposed model, which combines Random Forest, XGBoost, LightGBM, and a support-vector classifier, with logistic regression acting as the meta-learner. The proposed model achieved an accuracy of 97.2% and an F1-score of 97.15%, a significant improvement over the leading individual model, LightGBM, which had an accuracy of 91.4%. Our study's findings indicate that rigorous preprocessing, coupled with a diverse hybrid model, can convert low-cost tabular data into a nearly clinical-grade stroke-risk assessment tool.

2503.24191 2026-05-22 cs.CR cs.AI

When Grammar Guides the Attack: Uncovering Control-Plane Vulnerabilities in LLMs with Structured Output

当语法引导攻击:利用结构化输出揭示LLM中的控制平面漏洞

Shuoming Zhang, Jiacheng Zhao, Hanyuan Dong, Ruiyuan Xu, Zhicheng Li, Yangyu Zhang, Shuaijiang Li, Yuan Wen, Chunwei Xia, Zheng Wang, Xiaobing Feng, Huimin Cui

Affiliations SKLP, ICT, CAS & UCAS; University of Aberdeen; University of Leeds; SKLP, ICT, CAS

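AI summary This paper studies control-plane vulnerabilities that structured output exposes in LLMs and introduces Constrained Decoding Attack (CDA), a new class of jailbreak. CDA works as a control-to-semantic pipeline: schema-enforced logit masking injects a malicious prefix, and the model itself completes the harmful intent. DictAttack achieves high attack success rates across multiple models, showing that cross-plane defenses are needed to bridge the semantic gap between the data and control planes.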

Comments To appear in CCS2026

AI Chinese abstract

内容警告:本文可能包含由LLM生成的不安全或有害内容,可能对读者造成冒犯。大型语言模型(LLMs)越来越多地通过结构化输出API作为工具平台,但驱动这一功能的语法引导解码功能打开了一个与传统数据平面漏洞无关的控制平面攻击面。我们引入了Constrained Decoding Attack(CDA),一种针对LLM控制平面的新jailbreak类别。CDA最佳描述为一个控制到语义的管道:(1)schema-enforced logit masking注入恶意前缀到生成轨迹中,(2)模型本身完成有害意图。不同于依赖绕过对齐可见输入的数据平面jailbreaks,CDA作用于解码过程本身,因此仅靠内部安全对齐无法阻止它。我们用EnumAttack实例化CDA,其将恶意内容隐藏在枚举字段中,并用更狡猾的DictAttack,将负载拆分到一个无害提示和基于字典的语法中。在13个专有/开源模型和五个标准基准上,DictAttack在旗舰模型如gpt-5、gemini-2.5-pro、deepseek-r1和gpt-oss-120b上实现了94.3-99.5%的攻击成功率(ASR)。尽管基本语法审核可以缓解EnumAttack,DictAttack仍能抵御最先进的jailbreak guardrails,达到75.8%的ASR,暴露了需要跨平面防御来弥合数据和控制平面之间语义差距的问题。项目页面和代码可在https://ict-cda.github.io/上获得。

English abstract

Content Warning: This paper may contain unsafe or harmful content generated by LLMs that may be offensive to readers. Large Language Models (LLMs) increasingly serve as tooling platforms through structured output APIs, but the grammar-guided decoding that powers this feature opens a critical control-plane attack surface orthogonal to traditional data-plane vulnerabilities. We introduce Constrained Decoding Attack (CDA), a new jailbreak class that targets the LLM control plane. CDA is best characterized as a control-to-semantic pipeline: (1) schema-enforced logit masking injects a malicious prefix into the generation trajectory, and (2) the model itself completes the harmful intent. Unlike data-plane jailbreaks that rely on bypassing alignment with visible inputs, CDA acts on the decoding process itself, so internal safety alignment alone cannot stop it. We instantiate CDA with EnumAttack, which hides malicious content in enum fields, and the more evasive DictAttack, which decouples the payload across a benign prompt and a dictionary-based grammar. Across 13 proprietary/open-weight models and five standard benchmarks, DictAttack achieves 94.3--99.5% Attack Success Rate (ASR) on flagship models including gpt-5, gemini-2.5-pro, deepseek-r1, and gpt-oss-120b. While basic grammar auditing mitigates EnumAttack, DictAttack still sustains 75.8% ASR against SOTA jailbreak guardrails, exposing a "semantic gap" that demands cross-plane defenses bridging the data and control planes. Project page and code are available at https://ict-cda.github.io/.

2503.06115 2026-05-22 stat.ML cs.IT cs.LG math.IT math.PR

On Statistical Estimation of Edge-Reinforced Random Walks

关于边缘增强随机游走的统计估计

Qinghua Ding, Venkat Anantharam

Affiliations Department of Electrical Engineering and Computer Sciences, University of California at Berkeley

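AI summary This paper studies statistical estimation of the initial edge weights of edge-reinforced random walks, exploiting the hyperbolic Gaussian structure of the random environment to analyse the estimator's sample complexity.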

Comments This is the full version of the conference paper in submission to ISIT 2025

AI Chinese abstract

增强型随机游走(RRWs),包括顶点增强随机游走(VRRWs)和边增强随机游走(ERRWs),是一种随机游走模型,其转移概率根据先前访问历史演变~\cite{mgr, fmk, tarres, volkov}。这些模型已在网络表示学习~\cite{xzzs}、增强型PageRank~\cite{gly}和动物行为建模~\cite{smouse}等领域得到应用。然而,对RRW参数的统计估计仍不充分。本文聚焦于利用观测轨迹数据估计ERRW的初始边权重。通过利用ERRW与随机环境中的随机游走(RWRE)~\cite{mr, mr2}之间的联系,即所谓的“magic formula”,我们提出了一种基于广义矩方法的估计器。为了分析该估计器的样本复杂性,我们利用随机环境中蕴含的双曲高斯结构来界定底层随机边电导的波动。

English abstract

Reinforced random walks (RRWs), including vertex-reinforced random walks (VRRWs) and edge-reinforced random walks (ERRWs), model random walks where the transition probabilities evolve based on prior visitation history~\cite{mgr, fmk, tarres, volkov}. These models have found applications in various areas, such as network representation learning~\cite{xzzs}, reinforced PageRank~\cite{gly}, and modeling animal behaviors~\cite{smouse}, among others. However, statistical estimation of the parameters governing RRWs remains underexplored. This work focuses on estimating the initial edge weights of ERRWs using observed trajectory data. Leveraging the connections between an ERRW and a random walk in a random environment (RWRE)~\cite{mr, mr2}, as given by the so-called "magic formula", we propose an estimator based on the generalized method of moments. To analyze the sample complexity of our estimator, we exploit the hyperbolic Gaussian structure embedded in the random environment to bound the fluctuations of the underlying random edge conductances.
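To make the object being estimated concrete, the sketch below simulates a linearly edge-reinforced random walk: the walker crosses an incident edge with probability proportional to its current weight, and every crossing adds 1 to that edge's weight. The triangle graph, the initial weights, and the unit increment are illustrative assumptions; the code generates trajectory data only and does not implement the paper's moment-based estimator.

```python
import random
from typing import Dict, FrozenSet, List

Edge = FrozenSet[str]


def simulate_errw(weights: Dict[Edge, float], start: str, steps: int,
                  seed: int = 0) -> List[str]:
    """Simulate a linearly edge-reinforced random walk; mutates `weights`."""
    rng = random.Random(seed)
    path, current = [start], start
    for _ in range(steps):
        incident = [e for e in weights if current in e]
        chosen = rng.choices(incident,
                             weights=[weights[e] for e in incident])[0]
        weights[chosen] += 1.0            # reinforce the crossed edge
        (current,) = chosen - {current}   # move to the other endpoint
        path.append(current)
    return path


# Triangle graph with hypothetical initial edge weights.
w = {frozenset("ab"): 1.0, frozenset("bc"): 2.0, frozenset("ca"): 0.5}
initial_total = sum(w.values())
trajectory = simulate_errw(w, start="a", steps=100)
```

The initial weights in `w` play the role of the unknown parameters: an estimator sees only `trajectory` and has to infer them.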

2502.21194 2026-05-22 stat.ML cs.LG

Prior shift estimation for positive unlabeled data through the lens of kernel embedding

通过核嵌入视角对正样本无标签数据的先验偏移估计

Jan Mielniczuk, Wojciech Rejchel, Paweł Teisseyre

Affiliations Polish Academy of Sciences; Nicolaus Copernicus University; Warsaw University of Technology

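AI summary This paper studies estimation of the class prior for unlabeled target samples, which may differ from that of the source population, when the source data is only partially observable: only samples from the positive class and from the whole population are available (the PU learning scenario). It proposes a novel direct estimator of the class prior that avoids estimating posterior probabilities in either population and has a simple geometric interpretation. The estimator is based on distribution matching together with kernel embedding in a Reproducing Kernel Hilbert Space and is obtained as the explicit solution of an optimisation task. The paper establishes asymptotic consistency and an explicit non-asymptotic bound on the deviation from the unknown prior that is computable in practice, and experiments on synthetic and real data show performance on par with or better than competitors.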

详情
AI中文摘要

我们研究无标签目标样本的类先验估计问题,该类先验可能与源总体的类先验不同。此外,假设源数据仅部分可观测:只有来自正类的样本和来自整个总体的样本可用(PU学习场景)。我们提出了一种新的类先验直接估计量,它无需估计两个总体中的后验概率,并具有简单的几何解释。该估计量基于分布匹配技术与再生核希尔伯特空间中的核嵌入,并可作为一个优化任务的显式解得到。我们证明了其渐近一致性,并给出了其与未知先验之间偏差的显式非渐近界,该界在实践中可以计算。我们在合成数据和真实数据上研究了其有限样本表现,结果表明该方法的性能始终与竞争方法相当或更优。

英文摘要

We study estimation of a class prior for unlabeled target samples which possibly differs from that of source population. Moreover, it is assumed that the source data is partially observable: only samples from the positive class and from the whole population are available (PU learning scenario). We introduce a novel direct estimator of a class prior which avoids estimation of posterior probabilities in both populations and has a simple geometric interpretation. It is based on a distribution matching technique together with kernel embedding in a Reproducing Kernel Hilbert Space and is obtained as an explicit solution to an optimisation task. We establish its asymptotic consistency as well as an explicit non-asymptotic bound on its deviation from the unknown prior, which is calculable in practice. We study finite sample behaviour for synthetic and real data and show that the proposal works consistently on par or better than its competitors.
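As a rough illustration of distribution matching with kernel mean embeddings, the sketch below estimates a target class prior in a simplified setting. It is not the paper's estimator: it assumes pure label shift and a known source prior `source_prior`, neither of which the abstract states. Under those assumptions the target embedding lies on the line through the positive-class and source embeddings, so projecting onto that line gives a closed-form estimate. The function names, the Gaussian kernel, and the bandwidth are illustrative choices.

```python
import numpy as np


def mean_kernel(a, b, bandwidth=1.0):
    """Average Gaussian kernel value between 1-D samples a and b, i.e. the
    inner product of their empirical kernel mean embeddings."""
    diff = a[:, None] - b[None, :]
    return float(np.mean(np.exp(-diff**2 / (2.0 * bandwidth**2))))


def estimate_target_prior(pos, source, target, source_prior, bandwidth=1.0):
    """Project mu_T - mu_S onto mu_P - mu_S in the RKHS (assumed setting).

    With mu_S = pi * mu_P + (1 - pi) * mu_N and
         mu_T = pi' * mu_P + (1 - pi') * mu_N,
    we get mu_T - mu_S = a * (mu_P - mu_S) and pi' = 1 - (1 - a) * (1 - pi).
    """
    k_ss = mean_kernel(source, source, bandwidth)
    k_ps = mean_kernel(pos, source, bandwidth)
    numerator = (
        mean_kernel(target, pos, bandwidth)
        - mean_kernel(target, source, bandwidth)
        - k_ps
        + k_ss
    )
    denominator = mean_kernel(pos, pos, bandwidth) - 2.0 * k_ps + k_ss
    a = numerator / denominator
    prior = 1.0 - (1.0 - a) * (1.0 - source_prior)
    return float(np.clip(prior, 0.0, 1.0))


# Synthetic check: positives ~ N(2, 1), negatives ~ N(-2, 1).
rng = np.random.default_rng(0)


def draw(n_pos, n_neg):
    return np.concatenate([rng.normal(2.0, 1.0, n_pos), rng.normal(-2.0, 1.0, n_neg)])


pos_sample = rng.normal(2.0, 1.0, 1000)   # labelled positives
source_sample = draw(500, 500)            # unlabeled source, prior 0.5
target_sample = draw(300, 700)            # unlabeled target, true prior 0.3
estimate = estimate_target_prior(pos_sample, source_sample, target_sample, source_prior=0.5)
```

The projection is the geometric picture: the estimate depends only on averaged kernel evaluations, and no posterior probabilities are fitted.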

2403.16552 2026-05-22 cs.NE cs.AI cs.CV

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

QKFormer: 基于Q-K注意力的分层脉冲变换器

Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian

发表机构 * Pengcheng Laboratory(鹏城实验室) Harbin Institute of Technology(哈尔滨工业大学) Peking University(北京大学)

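AI Summary This paper proposes QKFormer, a hierarchical spiking transformer based on Q-K attention. By introducing a novel spike-form Q-K attention mechanism, a hierarchical structure, and a versatile patch embedding module, it improves the performance of spiking neural networks on image classification, reaching 85.65% top-1 accuracy on ImageNet-1K.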

Comments Accepted by NeurIPS 2024 (Spotlight). Code and Model: https://github.com/zhouchenlin2096/QKFormer

详情
AI中文摘要

脉冲变换器(Spiking Transformers)将脉冲神经网络(SNNs)与变换器架构相结合,因其在能效和高性能方面的潜力而受到广泛关注。然而,该领域的现有模型性能仍不理想。我们引入了几项创新来提高性能:i)我们提出了一种专为SNNs设计的新型脉冲形式Q-K注意力机制,通过二值向量以线性复杂度高效建模token或通道维度的重要性。ii)层次结构对大脑和人工神经网络的性能均有显著益处,我们将其引入脉冲变换器,以获得多尺度脉冲表示。iii)我们专为脉冲变换器设计了一个通用且强大的补丁嵌入模块,带有变形的快捷连接。综合以上设计,我们开发了QKFormer,一种基于Q-K注意力、可直接训练的分层脉冲变换器。QKFormer在多个主流数据集上显著优于现有最先进的SNN模型。值得注意的是,在模型规模与Spikformer(66.34 M,74.81%)相当的情况下,QKFormer(64.96 M)在ImageNet-1K上实现了突破性的85.65% top-1准确率,比Spikformer大幅高出10.84%。据我们所知,这是直接训练的SNN首次在ImageNet-1K上超过85%的准确率。代码和模型已在 https://github.com/zhouchenlin2096/QKFormer 公开。

英文摘要

Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve the performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, we develop QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, with comparable size to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1k, substantially outperforming Spikformer by 10.84%. To our best knowledge, this is the first time that directly training SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer
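The idea of scoring tokens with a binary vector at linear cost can be sketched as follows. This is a simplified reading of the abstract, not the released implementation: the row-sum and the fixed `threshold` are assumptions standing in for a spiking neuron layer, and `qk_token_attention` is a hypothetical name.

```python
import numpy as np


def qk_token_attention(q, k, threshold=1):
    """Token-wise Q-K attention sketch on binary spike matrices.

    q, k: arrays of shape (num_tokens, channels) with entries in {0, 1}.
    A token is kept when its number of Q spikes reaches the threshold;
    the resulting binary vector masks the rows of K.
    """
    q = np.asarray(q)
    k = np.asarray(k)
    # One binary importance value per token: shape (num_tokens, 1).
    token_mask = (q.sum(axis=1, keepdims=True) >= threshold).astype(k.dtype)
    # Element-wise masking, so no (num_tokens x num_tokens) matrix is formed.
    return token_mask * k


q_spikes = np.array([[1, 0, 1], [0, 0, 0], [1, 1, 1]])
k_spikes = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
masked = qk_token_attention(q_spikes, k_spikes, threshold=2)
```

Because the mask is a vector and is applied element-wise, the cost grows with tokens times channels rather than with the square of the token count. Summing over `axis=0` instead would give the channel-wise variant.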

2305.12138 2026-05-22 cs.SE cs.AI

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

探索代码分析:利用大语言模型进行语法和语义的零样本洞察

Wei Ma, Zhihao Lin, Shangqing Liu, Qiang Hu, Ye Liu, Wenhan Wang, Cen Zhang, Liming Nie, Li Li, Yang Liu, Lingxiao Jiang

发表机构 * Singapore Management University(新加坡管理大学) Blekinge Institute of Technology(布莱金厄学院) Beihang University(北航) State Key Laboratory of Novel Software Technology, Nanjing University(南京大学软件新技术国家重点实验室) The University of Tokyo(东京大学) University of Alberta(阿尔伯塔大学) Nanyang Technological University(南洋理工大学) Shenzhen Technology University(深圳技术大学)

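AI Summary This paper studies large language models for code analysis. Evaluating 21 state-of-the-art LLMs on nine tasks across four languages, it characterises their performance in syntax parsing, static semantics inference, and dynamic reasoning. It finds that they generalise well across languages but remain limited in dynamic reasoning, and it contributes a validated evaluation framework.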

Comments Accepted at ACM Transactions on Software Engineering and Methodology (TOSEM)

详情
AI中文摘要

代码分析是软件工程的基础,支持调试、优化和安全评估。人类开发者通过语法解析、静态语义推断和动态推理来进行代码分析。传统工具虽然有效,但受限于语言特定性,跨语言泛化能力弱。大语言模型(LLMs)在代码任务中颇具潜力,但其在基础代码分析方面的能力仍缺乏充分研究。我们围绕与人类实践相一致的三个方面(语法解析、静态语义推断和动态推理)展开研究。我们在四种语言(C、Java、Python、Solidity)的九项任务上评估了21种最先进的LLM,任务包括AST生成、CFG构建、数据依赖、污点分析和不稳定测试(flaky test)推理。我们对3,124个代码样本应用三层评估协议(自动化指标、专家裁决、一致性验证),获得了较高的评分者间一致性(Cohen's kappa = 0.844-0.936)和较强的人机一致性(Gwet's AC1 = 0.500-0.727,F1 = 0.791-0.882)。尽管最佳LLM在语法解析方面表现优异(AST 90%+,表达式匹配84-100%),并在静态分析中显示出潜力,但其动态推理能力仍然有限(<70%),且对数据偏移高度敏感(各项目F1在0到1.0之间变化)。这一层级关系在不同模型家族和规模上均成立,表明这是根本性而非暂时性的局限。这些发现展示了LLM如何补充传统分析器:LLM提供跨语言泛化能力,但输出具有非确定性,需要验证;传统工具提供确定性保证,但需要针对特定语言进行配置。我们贡献了一个经过验证的评估框架,其中包含与传统分析器(Tree-sitter、Soot、Joern)的比较以及针对具体任务的适用性分级。基准:https://github.com/mathieu0905/llm_code_analysis.git

英文摘要

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are effective but limited by language specificity and weak cross-language generalization. Large language models (LLMs) are promising for code tasks, yet their capabilities for fundamental code analysis remain underexplored. We structure our study around three aspects aligned with human practices: syntax parsing, static semantics inference, and dynamic reasoning. We evaluate 21 state-of-the-art LLMs across nine tasks in four languages (C, Java, Python, Solidity), including AST generation, CFG construction, data dependency, taint analysis, and flaky test reasoning. We apply a three-layer evaluation protocol (automated metrics, expert adjudication, consistency validation) to 3,124 code samples, achieving high inter-rater reliability (Cohen's kappa = 0.844-0.936) and strong human-machine agreement (Gwet's AC1 = 0.500-0.727, F1 = 0.791-0.882). While the best LLMs excel in syntax parsing (AST 90%+, expression matching 84-100%) and show promise in static analysis, their dynamic reasoning remains limited (<70%) with high data-shift sensitivity (per-project F1 varying 0-1.0). This hierarchy holds across model families and scales, suggesting fundamental rather than transient limitations. These findings show how LLMs complement traditional analyzers: they offer cross-language generalization but non-deterministic outputs needing validation, while traditional tools give deterministic guarantees but need language-specific configuration. We contribute a validated evaluation framework with comparison against traditional analyzers (Tree-sitter, Soot, Joern) and task-specific applicability tiers. Benchmark: https://github.com/mathieu0905/llm_code_analysis.git
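For readers unfamiliar with the inter-rater statistic quoted above, the following sketch computes Cohen's kappa from two raters' labels using the standard definition (observed agreement corrected for chance agreement). The label lists are made-up toy data, not taken from the benchmark.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1.0 - expected)


# Toy example: 1 = "model output judged correct", 0 = "judged incorrect".
labels_a = [1, 1, 0, 0]
labels_b = [1, 1, 0, 1]
kappa = cohens_kappa(labels_a, labels_b)
```

Here the raters agree on three of four items (0.75) while chance agreement is 0.5, so kappa is 0.5; the 0.844-0.936 range reported in the abstract indicates far closer agreement than this toy case.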

2605.22505 2026-05-22 cs.AI

Towards Direct Evaluation of Harness Optimizers via Priority Ranking

通过优先级排名直接评估Harness优化器

Kai Tzu-iunn Ong, Minseok Kang, Dongwook Choi, Junhee Cho, Seungju Kim, Seungwon Lim, Geunha Jang, Minwoo Oh, Bogyung Jeong, Sunghwan Kim, Taeyoon Kwon, Jinyoung Yeo

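AI Summary This paper proposes priority ranking as a way to evaluate harness optimizers directly, addressing the difficulty of assessing an optimizer's intermediate steps when no oracle harness is available. It shows that ranking performance is a reliable predictor of an optimizer's ability in multi-step optimization.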

Comments Preprint. Work in Progress

详情
AI中文摘要

Harness优化让优化器代理迭代更新目标代理的harness,从而实现自动化的代理创建。尽管这一方法取得了成功,当前研究仅通过观察目标代理的性能提升来评估优化器。这种间接的、只看最终改进的评估忽视了优化器在中间步骤中的行动,而这些行动往往是错误的,并会妨碍代理性能。因此,目前尚不清楚harness优化究竟是由优化器有依据的更新行动驱动,还是仅仅依靠试错。这就需要对harness优化器进行直接评估。然而,由于缺乏oracle harness,直接评估harness优化器并非易事且成本高昂。为此,我们提出了一种简单、低成本的直接评估设计,即优先级排名。该设计要求harness优化器根据给定harness中各组件(例如工具)在更新后改进或妨碍代理性能的潜力对其排序,从而无需昂贵的rollout或人工检查即可在步骤层面量化优化器的能力。更重要的是,优化器的排名表现与其在实际多步骤harness优化中改进代理的能力相关,这表明优先级排名是优化能力的可靠预测指标。优先级排名由Shor支撑,Shor是一个包含182个经人工验证的优化场景的集合,涵盖不同领域、设计和时间阶段。代码和数据见 https://github.com/k59118/Harness_Optimizer_Evaluation 。

英文摘要

Harness optimization enables automated agent creation by having an optimizer agent iteratively update the harness of target agents. Despite its success, current studies evaluate optimizers solely by observing target agents' performance gains. This indirect end-improvement evaluation neglects optimizers' actions at intermediate steps, which are often erroneous and hinder agent performance. Therefore, it is unclear whether harness optimization is driven by optimizers' informed update actions or simply trial-and-error. This necessitates direct evaluation of harness optimizers. However, evaluating harness optimizers directly is non-trivial and costly due to the lack of oracle harnesses. To address this, we present a simple, low-cost design to directly evaluate them, namely priority ranking. By asking harness optimizers to rank components (e.g., tools) in a given harness by their potential to improve/hinder agent performance when updated, our design quantifies optimizer ability at the step level without expensive rollouts or manual examination. More importantly, optimizers' ranking performance correlates with their ability to improve agents in actual multi-step harness optimization, establishing priority ranking as a reliable predictor of optimization ability. Priority ranking is enabled by Shor, a collection of 182 human-verified optimization scenarios spanning across domains, designs, and time stages. Codes and data can be found at https://github.com/k59118/Harness_Optimizer_Evaluation.
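One way to score a priority ranking is to compare the optimizer's ordering of harness components against a reference ordering with a rank correlation. The abstract does not say which metric the authors use, so Kendall's tau here is an assumption, and the component names are hypothetical.

```python
from itertools import combinations


def kendall_tau(predicted, reference):
    """Kendall's tau between two rankings of the same items (no ties).

    Both arguments list the items from highest to lowest priority.
    Returns a value in [-1, 1]; 1 means identical order.
    """
    if len(predicted) < 2:
        raise ValueError("need at least two items to compare rankings")
    if len(set(predicted)) != len(predicted) or set(predicted) != set(reference) \
            or len(reference) != len(predicted):
        raise ValueError("rankings must contain the same distinct items")
    position = {item: rank for rank, item in enumerate(reference)}
    concordant = discordant = 0
    # Each pair (a, b) has a ranked above b in the predicted order.
    for a, b in combinations(predicted, 2):
        if position[a] < position[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical harness components, ordered by expected benefit of updating them.
reference_order = ["search_tool", "system_prompt", "memory", "parser"]
optimizer_order = ["system_prompt", "search_tool", "memory", "parser"]
score = kendall_tau(optimizer_order, reference_order)
```

In this toy case one of the six component pairs is swapped, giving a score of (5 - 1) / 6. No agent rollout is needed to compute it, which is the cost advantage the abstract points to.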