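arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.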

2605.03241 2026-05-22 physics.optics cs.AI

OptiLookUp: An Optical ROM-Based Lookup Table Engine for Photonic Accelerators

Ankur Singh, Akhilesh Jaiswal

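AI summary: This paper proposes an optical ROM-based lookup table engine. It builds a high-speed, reconfigurable photonic ROM from integrated microring resonators, encodes input-output mappings directly in the spectral response of the photonic devices to perform deterministic lookups, and is designed and evaluated on a silicon photonics platform, with simulations showing reliable operation at data rates up to 12.5 GHz.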

Abstract

Read-only memory (ROM) provides deterministic access to predefined data mappings. Extending ROM concepts to the optical domain enables high-bandwidth, low-latency, and parallel memory access, but realizing compact and reconfigurable optical ROM remains challenging due to loss, wavelength control, and integration constraints. This work presents a high-speed, reconfigurable photonic ROM architecture implemented using integrated microring resonators (MRRs). The ROM encodes predefined input-output mappings directly in the spectral response of the photonic devices, enabling deterministic lookup-based operation without dynamic computation during readout. To improve scalability and reduce cumulative insertion loss, the architecture employs compact banked sub-arrays that are selectively addressed through an optical decoding mechanism. Reconfigurability is achieved using transistor-based optical selectors, allowing different ROM banks to be activated without physical light rerouting or interferometric structures. The proposed photonic ROM is designed and evaluated using device-level simulations based on the GlobalFoundries 45SPCLO silicon photonics platform. Simulation results demonstrate reliable operation at data rates up to 12.5 GHz, with stable light-to-current transfer characteristics obtained through integrated photodiode readout. The optical ROM can be used to implement nonlinear activation functions utilised in photonic accelerator architectures, including sigmoid, tanh, ReLU, and exponential mappings.
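
The lookup idea can be illustrated in software. The following is a minimal sketch, not the photonic design: the 4-bit address width, the input range [-4, 4], and all function names are assumptions made for illustration. The activation is precomputed once, and readout is a pure table index with no evaluation of the function itself.

```python
import math

ADDRESS_BITS = 4                 # hypothetical ROM address width
X_MIN, X_MAX = -4.0, 4.0         # hypothetical input range covered by the table
N_ENTRIES = 1 << ADDRESS_BITS
STEP = (X_MAX - X_MIN) / (N_ENTRIES - 1)

def build_rom(fn):
    """Precompute fn at evenly spaced points; this stands in for the fixed ROM contents."""
    return [fn(X_MIN + i * STEP) for i in range(N_ENTRIES)]

def address_of(x):
    """Map an input to the nearest table address, clamping out-of-range inputs."""
    i = round((x - X_MIN) / STEP)
    return max(0, min(N_ENTRIES - 1, i))

def lookup(rom, x):
    """Readout: index the table; no arithmetic on the activation happens here."""
    return rom[address_of(x)]

# Two of the mappings named in the abstract, stored as separate "banks".
sigmoid_rom = build_rom(lambda v: 1.0 / (1.0 + math.exp(-v)))
relu_rom = build_rom(lambda v: max(0.0, v))
```

Switching between `sigmoid_rom` and `relu_rom` loosely mirrors selecting a different ROM bank; the resolution of the result is set entirely by the number of table entries.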

2603.20228 2026-05-22 math.OC cs.LG

Compact Lifted Relaxations for Low-Rank Optimization

Ryan Cory-Wright, Jean Pauphilet

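AI summary: This paper proposes compact convex relaxations for rank-constrained quadratic optimization problems. It introduces lifted semidefinite relaxations that avoid the spectral terms required by traditional approaches, shows that many blocks are redundant to obtain a more compact relaxation, and adds a new class of valid inequalities (projection cuts) that strengthen the low-rank relaxations, with applications to matrix completion and reduced-rank regression.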

Comments Part of this material previously appeared in arXiv:2501.02942v2, which was split into this paper and arXiv:2501.02942v3

Abstract

We develop tractable convex relaxations for rank-constrained quadratic optimization problems over $n \times m$ matrices, a setting for which tractable relaxations are typically only available when the objective or constraints admit spectral structure. We derive lifted semidefinite relaxations that do not require such spectral terms. Although a direct lifting introduces a large semidefinite constraint in dimension $n^2 + nm + 1$, we prove that many blocks of the moment matrix are redundant and derive an equivalent compact relaxation that only involves two semidefinite constraints of dimension $nm + 1$ and $n+m$, respectively. We also derive a new class of valid inequalities for low-rank problems, which we call projection cuts, that exploit the fact that rank constraints are inherited by linear images of a low-rank matrix, to strengthen our low-rank relaxations substantially. For matrix completion and reduced-rank regression problems, among others, we exploit additional structure to obtain even more compact formulations involving semidefinite matrices of dimension at most the sum of the two dimensions of the low-rank decision matrix (i.e., of size at most $n+m$). Overall, we obtain scalable semidefinite bounds for a broad class of low-rank quadratic problems.
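
To make the size reduction concrete, here is a small worked calculation using only the dimensions quoted in the abstract. The problem size (n = 50, m = 20) and the helper names are hypothetical and chosen purely for illustration.

```python
def lifted_dimensions(n, m):
    """Side lengths of the PSD blocks named in the abstract, for an n x m decision matrix."""
    direct = n * n + n * m + 1          # single block in the direct lifting
    compact = (n * m + 1, n + m)        # two blocks in the compact relaxation
    return direct, compact

def psd_entries(side):
    """Number of free entries in a symmetric side x side matrix."""
    return side * (side + 1) // 2

n, m = 50, 20                            # hypothetical problem size
direct, compact = lifted_dimensions(n, m)
direct_entries = psd_entries(direct)
compact_entries = sum(psd_entries(s) for s in compact)
print(direct, compact, direct_entries, compact_entries)
```

For these values the direct lifting needs one block of side 3501, while the compact form needs blocks of side 1001 and 70.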

2602.10445 2026-05-22 cs.IR cs.LG

End-to-End Semantic ID Generation for Generative Advertisement Recommendation

Jie Jiang, Xinxun Zhang, Enming Zhang, Yuling Xiong, Jun Zhang, Jingwen Wang, Huan Yu, Yuxiang Wang, Hao Wang, Xiao Yan, Jiawei Jiang

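AI summary: This paper proposes UniSID, a framework that jointly optimizes embeddings and Semantic IDs end to end from raw advertising data so that semantic information flows directly into the ID space, addressing the shortcomings of two-stage compression. Multi-granularity contrastive learning and a summary-based ad reconstruction mechanism further improve the semantic expressiveness of the IDs.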

Comments Add the emails

Abstract

Generative Recommendation (GR) has excelled by framing recommendation as next-token prediction. This paradigm relies on Semantic IDs (SIDs) to tokenize large-scale items into discrete sequences. Existing GR approaches predominantly generate SIDs via Residual Quantization (RQ), where items are encoded into embeddings and then quantized to discrete SIDs. However, this paradigm suffers from inherent limitations: 1) Objective misalignment and semantic degradation stemming from the two-stage compression; 2) Error accumulation inherent in the structure of RQ. To address these limitations, we propose UniSID, a Unified SID generation framework for generative advertisement recommendation. Specifically, we jointly optimize embeddings and SIDs in an end-to-end manner from raw advertising data, enabling semantic information to flow directly into the SID space and thus addressing the inherent limitations of the two-stage cascading compression paradigm. To capture fine-grained semantics, a multi-granularity contrastive learning strategy is introduced to align distinct items across SID levels. Finally, a summary-based ad reconstruction mechanism is proposed to encourage SIDs to capture high-level semantic information that is not explicitly present in advertising contexts. Experiments demonstrate that UniSID consistently outperforms state-of-the-art SID generation methods, yielding up to a 4.62% improvement in Hit Rate metrics across downstream advertising scenarios compared to the strongest baseline.
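
For context, the residual quantization baseline that the abstract critiques can be sketched as follows. This is the conventional two-stage scheme, not UniSID itself; the hand-written two-level codebooks and the function names are assumptions for illustration (in practice codebooks are learned).

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def residual_quantize(vec, codebooks):
    """Return (semantic_id, residual): one code index per level, coarse to fine."""
    residual = list(vec)
    sid = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        sid.append(idx)
        # Whatever this level fails to capture is passed on to the next level.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return sid, residual

# Hypothetical codebooks: level 1 is coarse, level 2 refines the leftover.
CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],
    [[0.25, 0.0], [0.0, 0.25], [-0.25, 0.0], [0.0, -0.25]],
]

sid, leftover = residual_quantize([0.9, 0.2], CODEBOOKS)
```

Because each level only sees the residual left by the previous one, an early poor choice propagates to later levels, which is the error accumulation the abstract refers to.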

2601.23219 2026-05-22 cs.MA cs.AI

MonoScale: Scaling Multi-Agent System with Monotonic Improvement

Shuai Shao, Yixiang Liu, Bingwei Lu, Weinan Zhang

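AI summary: This paper proposes MonoScale, a framework that generates agent-conditioned familiarization tasks, collects interaction evidence, and distills it into auditable natural-language memory, so that a multi-agent system improves monotonically as agents are added. Experiments on GAIA and Humanity's Last Exam show it outperforms naive scale-up and strong-router fixed-pool baselines.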

Abstract

In recent years, LLM-based multi-agent systems (MAS) have advanced rapidly, using a router to decompose tasks and delegate subtasks to specialized agents. A natural way to expand capability is to scale up the agent pool by continually integrating new functional agents or tool interfaces, but naive expansion can trigger performance collapse when the router cold-starts on newly added, heterogeneous, and unreliable agents. We propose MonoScale, an expansion-aware update framework that proactively generates a small set of agent-conditioned familiarization tasks, harvests evidence from both successful and failed interactions, and distills it into auditable natural-language memory to guide future routing. We formalize sequential augmentation as a contextual bandit and perform trust-region memory updates, yielding a monotonic non-decreasing performance guarantee across onboarding rounds. Experiments on GAIA and Humanity's Last Exam show stable gains as the agent pool grows, outperforming naive scale-up and strong-router fixed-pool baselines.

2601.22365 2026-05-22 cs.DM cs.LG

Towards Solving the Gilbert-Pollak Conjecture via Large Language Models

Yisi Ke, Tianyu Huang, Yankai Shu, Di He, Jingchu Gai, Liwei Wang

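AI summary: This paper presents a new AI system that has LLMs generate rule-constrained geometric lemmas, from which specialized functions are constructed to obtain tighter lower bounds on the Steiner ratio, demonstrating the potential of large language models for advanced mathematical research.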

Comments 44 pages, 11 figures

Abstract

The Gilbert-Pollak Conjecture (Gilbert and Pollak, 1968), also known as the Steiner Ratio Conjecture, states that for any finite point set in the Euclidean plane, the Steiner minimum tree has length at least $\sqrt{3}/2 \approx 0.866$ times that of the Euclidean minimum spanning tree (the Steiner ratio). A sequence of improvements through the 1980s culminated in a lower bound of $0.824$, with no substantial progress reported over the past three decades. Recent advances in LLMs have demonstrated strong performance on contest-level mathematical problems, yet their potential for addressing open, research-level questions remains largely unexplored. In this work, we present a novel AI system for obtaining tighter lower bounds on the Steiner ratio. Rather than directly prompting LLMs to solve the conjecture, we task them with generating rule-constrained geometric lemmas implemented as executable code. These lemmas are then used to construct a collection of specialized functions, which we call verification functions, that yield theoretically certified lower bounds of the Steiner ratio. Through progressive lemma refinement driven by reflection, the system establishes a new certified lower bound of 0.8559 for the Steiner ratio. The entire research effort involves only thousands of LLM calls, demonstrating the strong potential of LLM-based systems for advanced mathematical research.
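
The conjectured constant can be checked on the classic extremal example, the three corners of an equilateral triangle. The sketch below is a worked calculation only and is unrelated to the paper's verification functions; the variable names are illustrative.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Corners of an equilateral triangle with unit side.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)

# Minimum spanning tree on three points: keep the two shortest edges.
edges = sorted([dist(A, B), dist(B, C), dist(A, C)])
mst_length = edges[0] + edges[1]

# For an equilateral triangle the Steiner point coincides with the centroid.
S = ((A[0] + B[0] + C[0]) / 3, (A[1] + B[1] + C[1]) / 3)
smt_length = dist(S, A) + dist(S, B) + dist(S, C)

ratio = smt_length / mst_length
print(ratio, math.sqrt(3) / 2)
```

Here the spanning tree has length 2 and the Steiner tree has length $\sqrt{3}$, so the ratio equals $\sqrt{3}/2$; the conjecture asserts that no planar point set does better.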

2601.21025 2026-05-22 stat.ML cs.LG

A Diffusive Classification Loss for Learning Energy-based Generative Models

RuiKang OuYang, Louis Grenioux, José Miguel Hernández-Lobato

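AI summary: This paper proposes DiffCLF, a diffusive classification loss for learning energy-based generative models. By recasting EBM learning as a supervised classification problem across noise levels, it avoids mode blindness while remaining computationally efficient, improving model fidelity and breadth of application.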

Comments Accepted at ICML 2026

Abstract

Score-based generative models have recently achieved remarkable success. While they are usually parameterized by the score, an alternative way is to use a series of time-dependent energy-based models (EBMs), where the score is obtained from the negative input-gradient of the energy. Crucially, EBMs can be leveraged not only for generation, but also for tasks such as compositional sampling or building Boltzmann Generators via Monte Carlo methods. However, training EBMs remains challenging. Direct maximum likelihood is computationally prohibitive due to the need for nested sampling, while score matching, though efficient, suffers from mode blindness. To address these issues, we introduce the Diffusive Classification (DiffCLF) objective, a simple method that avoids blindness while remaining computationally efficient. DiffCLF reframes EBM learning as a supervised classification problem across noise levels, and can be seamlessly combined with standard score-based objectives. We validate the effectiveness of DiffCLF by comparing the estimated energies against ground truth in analytical Gaussian mixture cases, and by applying the trained models to tasks such as model composition and Boltzmann Generator sampling. Our results show that DiffCLF enables EBMs with higher fidelity and broader applicability than existing approaches.
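
The relationship between energy and score mentioned in the abstract can be shown on a one-dimensional Gaussian. This sketch illustrates only that relationship, not the DiffCLF objective; the Gaussian energy, its parameters, and the finite-difference step are assumptions chosen for illustration.

```python
def energy(x, mu=1.0, sigma=2.0):
    """Energy of a 1-D Gaussian up to an additive constant: E(x) = (x - mu)^2 / (2 sigma^2)."""
    return (x - mu) ** 2 / (2.0 * sigma ** 2)

def score_from_energy(energy_fn, x, h=1e-5):
    """score(x) = -dE/dx, approximated with a central finite difference."""
    return -(energy_fn(x + h) - energy_fn(x - h)) / (2.0 * h)

def analytic_score(x, mu=1.0, sigma=2.0):
    """Closed-form score of the same Gaussian: -(x - mu) / sigma^2."""
    return -(x - mu) / sigma ** 2

x = 3.0
print(score_from_energy(energy, x), analytic_score(x))
```

In a learned EBM the finite difference would be replaced by automatic differentiation of the energy network with respect to its input.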

2601.15671 2026-05-22 cs.HC cs.AI

StreetDesignAI: Broadening Designer Perspectives Through Multi-Persona Evaluation of Cycling Infrastructure

Ziyi Wang, Yilong Dai, Duanya Lyu, Mateo Nader, Sihan Chen, Wanghao Ye, Zijian Ding, Xiang Yan

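AI summary: This paper presents StreetDesignAI, which uses multi-persona evaluation to help designers understand cyclists' needs more fully and make better-informed design decisions.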

DOI: 10.1145/3800645.3812888

Abstract

Designing cycling infrastructure requires balancing the competing needs of diverse user groups, yet designers often struggle to anticipate how different cyclists experience the same street environment. We investigate how persona-based evaluation can support cycling infrastructure design by making experiential conflicts explicit during the design process. Informed by a formative study with 12 domain experts and crowdsourced bikeability assessments from 427 cyclists, we present StreetDesignAI, an interactive system that enables designers to (1) ground evaluation in real street context through imagery and map data, (2) receive parallel feedback from simulated cyclist personas spanning confident to cautious users, and (3) iteratively modify designs while the system surfaces conflicts across perspectives. A within-subjects study with 26 transportation professionals comparing StreetDesignAI against a general-purpose AI chatbot demonstrates that structured multi-perspective feedback significantly broadens designers' understanding of various cyclists' perspectives, ability to identify diverse persona needs, and confidence in translating those needs into design decisions. Participants also reported significantly higher overall satisfaction and stronger intention to use the system in professional practice. Qualitative findings further illuminate how explicit conflict surfacing transforms design exploration from single-perspective optimization toward deliberate trade-off reasoning. We discuss implications for AI-assisted tools that scaffold persona-aware design through disagreement as an interaction primitive.

2601.11650 2026-05-22 physics.chem-ph cs.AI

Large Language Model Agent for User-friendly Chemical Process Simulations

Jingkang Liang, Niklas Groll, Gürkan Sin

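AI summary: This paper presents an LLM-based agent integrated with AVEVA Process Simulation through the Model Context Protocol, enabling natural-language interaction with chemical process simulations and making complex process design, simulation, and optimization more accessible to non-expert users.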

DOI: 10.1016/j.dche.2026.100312

Abstract

Modern process simulators enable detailed process design, simulation, and optimization; however, constructing and interpreting simulations is time-consuming and requires expert knowledge. This limits early exploration by inexperienced users. To address this, a large language model (LLM) agent is integrated with AVEVA Process Simulation (APS) via Model Context Protocol (MCP), allowing natural language interaction with rigorous process simulations. An MCP server toolset enables the LLM to communicate programmatically with APS using Python, allowing it to execute complex simulation tasks from plain-language instructions. Two water-methanol separation case studies assess the framework across different task complexities and interaction modes. The first shows the agent autonomously analyzing flowsheets, finding improvement opportunities, and iteratively optimizing, extracting data, and presenting results clearly. The framework benefits both educational purposes, by translating technical concepts and demonstrating workflows, and experienced practitioners by automating data extraction, speeding routine tasks, and supporting brainstorming. The second case study assesses autonomous flowsheet synthesis through both a step-by-step dialogue and a single prompt, demonstrating its potential for novices and experts alike. The step-by-step mode gives reliable, guided construction suitable for educational contexts; the single-prompt mode constructs fast baseline flowsheets for later refinement. While current limitations such as oversimplification, calculation errors, and technical hiccups mean expert oversight is still needed, the framework's capabilities in analysis, optimization, and guided construction suggest LLM-based agents can become valuable collaborators.

2601.05157 2026-05-22 cs.DS cs.LG stat.ML

Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms

Alkis Kalavasis, Pravesh K. Kothari, Shuchen Li, Manolis Zampetakis

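AI summary: This paper gives a polynomial-time method for learning the parameters of mixture models in high dimensions. It applies to mixtures of heavy-tailed distributions, including ones without finite covariance, and requires no minimum separation between the cluster means.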

Abstract

In this work, we give a ${\rm poly}(d,k)$ time and sample algorithm for efficiently learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions. Unlike all previous methods, our techniques apply to heavy-tailed distributions and include examples that do not even have finite covariances. Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but crucially exclude Gaussians. All previous methods for learning mixture models relied implicitly or explicitly on the low-degree moments. Even for the case of Laplace distributions, we prove that any such algorithm must use super-polynomially many samples. Our method thus adds to the short list of techniques that bypass the limitations of the method of moments. Somewhat surprisingly, our algorithm does not require any minimum separation between the cluster means. This is in stark contrast to spherical Gaussian mixtures where a minimum $\ell_2$-separation is provably necessary even information-theoretically [Regev and Vijayaraghavan '17]. Our methods compose well with existing techniques and allow obtaining "best of both worlds" guarantees for mixtures where every component either has a heavy-tailed characteristic function or has a sub-Gaussian tail with a light-tailed characteristic function. Our algorithm is based on a new approach to learning mixture models via efficient high-dimensional sparse Fourier transforms. We believe that this method will find more applications to statistical estimation. As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. It was formally proposed in a recent Master's thesis by one of the authors, and has already inspired follow-up works.
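
The contrast between Laplace and Gaussian components can be seen directly in their characteristic functions. The sketch below uses the standard one-dimensional closed forms with unit scale (an illustrative choice); the abstract does not quantify "sufficiently heavy tails", so this shows only the qualitative difference.

```python
import math

def laplace_cf(t, b=1.0):
    """Characteristic function of a zero-mean Laplace distribution with scale b."""
    return 1.0 / (1.0 + (b * t) ** 2)

def gaussian_cf(t, sigma=1.0):
    """Characteristic function of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (sigma * t) ** 2)

for t in (1.0, 5.0, 10.0):
    print(f"t={t:5.1f}  laplace={laplace_cf(t):.3e}  gaussian={gaussian_cf(t):.3e}")
```

The Laplace characteristic function decays polynomially in the frequency, while the Gaussian one decays like $e^{-t^2/2}$, so at high frequencies the Gaussian signal becomes negligible far sooner.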

2512.19131 2026-05-22 cs.DC cs.LG

Evidential Trust-Aware Model Personalization in Decentralized Federated Learning for Wearable IoT

Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya

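AI summary: This paper proposes Murmura, a framework that uses evidential deep learning for trust-aware model personalization in decentralized federated learning. Epistemic uncertainty from Dirichlet-based evidential models serves as a direct indicator of peer compatibility, which reduces performance degradation under non-IID conditions and speeds up convergence.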

Comments v2. Addressed minor reviewer concerns

DOI: 10.1109/CCGrid68966.2026.00061

Abstract

Decentralized federated learning (DFL) enables collaborative model training across edge devices without centralized coordination, offering resilience against single points of failure. However, statistical heterogeneity arising from non-identically distributed local data creates a fundamental challenge: nodes must learn personalized models adapted to their local distributions while selectively collaborating with compatible peers. Existing approaches either enforce a single global model that fits no one well, or rely on heuristic peer selection mechanisms that cannot distinguish between peers with genuinely incompatible data distributions and those with valuable complementary knowledge. We present Murmura, a framework that leverages evidential deep learning to enable trust-aware model personalization in DFL. Our key insight is that epistemic uncertainty from Dirichlet-based evidential models directly indicates peer compatibility: high epistemic uncertainty when a peer's model evaluates local data reveals distributional mismatch, enabling nodes to exclude incompatible influence while maintaining personalized models through selective collaboration. Murmura introduces a trust-aware aggregation mechanism that computes peer compatibility scores through cross-evaluation on local validation samples and personalizes model aggregation based on evidential trust with adaptive thresholds. Evaluation on three wearable IoT datasets (UCI HAR, PAMAP2, PPG-DaLiA) demonstrates that Murmura reduces performance degradation from IID to non-IID conditions compared to baseline (0.9% vs. 19.3%), achieves 7.4$\times$ faster convergence, and maintains stable accuracy across hyperparameter choices. These results establish evidential uncertainty as a principled foundation for compatibility-aware personalization in decentralized heterogeneous environments.
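
The compatibility signal can be sketched with the uncertainty measure commonly used in evidential deep learning, $u = K / \sum_k \alpha_k$ with $\alpha_k = e_k + 1$. Whether Murmura uses exactly this formula is an assumption, as are the fixed threshold (the paper describes adaptive thresholds) and all names below.

```python
def vacuity(evidence):
    """Epistemic uncertainty u = K / sum(alpha), with alpha_k = evidence_k + 1."""
    k = len(evidence)
    return k / sum(e + 1.0 for e in evidence)

def mean_uncertainty(evidence_per_sample):
    """Average uncertainty of one peer model over the local validation samples."""
    return sum(vacuity(e) for e in evidence_per_sample) / len(evidence_per_sample)

def select_peers(peer_outputs, threshold=0.5):
    """Keep peers whose models are, on average, confident on the local data.

    peer_outputs maps a peer name to the per-sample evidence vectors its model
    produced on this node's validation set. The threshold is hypothetical.
    """
    return [p for p, outs in peer_outputs.items() if mean_uncertainty(outs) <= threshold]

# Hypothetical cross-evaluation results for two peers on a 3-class task.
outputs = {
    "peer_a": [[9.0, 0.0, 0.0], [0.0, 12.0, 0.0]],   # strong evidence: low uncertainty
    "peer_b": [[0.1, 0.2, 0.0]],                      # almost no evidence: high uncertainty
}
compatible = select_peers(outputs)
```

A peer whose model collects little evidence on local data gets an uncertainty near 1 and is left out of aggregation.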

2512.11484 2026-05-22 cs.CR cs.AI

Capacitive Touchscreens at Risk: Recovering Handwritten Trajectory on Smartphone via Electromagnetic Emanations

Yukun Cheng, Shiyu Zhu, Changhai Ou, Xingshuo Han, Yuan Li, Shihui Zheng

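AI summary: This paper reveals and exploits an electromagnetic side-channel vulnerability in capacitive touchscreens, capturing the electromagnetic signals emitted during on-screen writing to regress two-dimensional handwriting trajectories in real time. It proposes the TESLA attack framework and shows that highly recognizable handwriting trajectories can be recovered under realistic attack conditions.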

2512.09472 2026-05-22 cs.DC cs.LG

WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Chiheng Lou, Sheng Qi, Rui Kang, Yong Zhang, Chen Sun, Pengcheng Wang, Xuanzhe Liu, Xin Jin

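AI summary: This paper presents WarmServe, a system that uses workload-forecast-driven one-for-many GPU prewarming to reduce tail time-to-first-token (TTFT) in LLM serving and raise request throughput.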

Comments Accepted at ICML 2026

Abstract

Deploying multiple models within shared GPU clusters is a key strategy to improve resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems improve GPU utilization at the cost of degraded inference performance, particularly time-to-first-token (TTFT). We attribute this degradation to the lack of awareness regarding future workload characteristics. In contrast, recent analyses have shown the strong periodicity and long-term predictability of real-world LLM serving workloads. In this paper, we propose one-for-many GPU prewarming, which proactively loads parameters from multiple models onto GPUs based on workload forecasts. These prewarmed weights enable the system to promptly instantiate serving instances upon encountering request bursts. We design and implement WarmServe, a multi-LLM serving system incorporating three key techniques: (1) a model placement algorithm that optimizes prewarming decisions to minimize cross-model prewarming interference, (2) a KV cache reservation strategy that repurposes idle KV cache space on running GPUs for prewarming new models, and (3) an efficient GPU memory switching mechanism for tensor management. Evaluation on real-world datasets shows that WarmServe reduces tail TTFT by up to 50.8$\times$ compared to the state-of-the-art autoscaling-based system, while supporting up to 2.5$\times$ higher request throughput than the GPU-sharing system.

2511.08093 2026-05-22 eess.AS cs.CL cs.SD

Quantizing Whisper-small: How design choices affect ASR performance

Arthur Söhler, Julian Irigoyen, Andreas Søeborg Kirkedal

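AI summary: This paper studies how different quantization schemes affect Whisper-small. Dynamic int8 quantization gives the best balance between compression and recognition accuracy, and carefully chosen quantization methods substantially reduce model size and inference cost, enabling efficient deployment on constrained hardware.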

Comments Accepted to SPEAKABLE workshop at LREC 2026

Abstract

Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified, cross-library evaluation of post-training quantization (PTQ) on Whisper-small that disentangles the impact of quantization scheme, method, granularity, and bit-width. Our study is based on four libraries: PyTorch, Optimum-Quanto, HQQ, and bitsandbytes. Experiments on LibriSpeech test-clean and test-other show that dynamic int8 quantization with Quanto offers the best trade-off, reducing model size by 57% while improving on the baseline's word error rate. Static quantization performed worse, likely due to Whisper's Transformer architecture, while more aggressive formats (e.g., nf4, int3) achieved up to 71% compression at the cost of accuracy in noisy conditions. Overall, our results demonstrate that carefully chosen PTQ methods can substantially reduce model size and inference cost without retraining, enabling efficient deployment of Whisper-small on constrained hardware.
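
As background for the int8 results, the core rounding step of weight quantization can be sketched in plain Python. This is a generic symmetric per-tensor scheme, assumed here for illustration; it is not the Quanto, HQQ, or bitsandbytes implementation evaluated in the paper, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]

weights = [0.31, -1.0, 0.07, 0.64]       # hypothetical weight values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as one byte plus a shared scale, and the rounding error per weight is bounded by half the scale. Granularity choices such as per-channel scales trade extra metadata for a tighter bound.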

2511.07885 2026-05-22 cs.DC cs.AI cs.CL cs.LG

Intelligence per Watt: Measuring Intelligence Efficiency of Local AI

Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré

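AI summary: This paper studies the energy efficiency and performance of local AI, proposes intelligence per watt (IPW) as a unified metric, shows that local inference can redistribute demand away from centralized infrastructure, and identifies headroom for optimizing local accelerators.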

Abstract

Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Demand growth strains this paradigm faster than providers can scale. Two advances create an opportunity to rethink it: small, local LMs (<=20B active parameters) now achieve competitive performance to frontier models on many tasks, and local accelerators (e.g., Apple M4 Max) can host these models at interactive latencies. This raises the question: can local inference viably redistribute demand from centralized infrastructure? This requires measuring both whether local LMs can accurately answer real-world queries and whether they can do so efficiently on power-constrained devices (e.g., laptops). We propose intelligence per watt (IPW), task accuracy per unit of power, as a unified metric for the capability and efficiency of local inference across model-accelerator configurations. We evaluate 20+ state-of-the-art local LMs, 8 hardware accelerators (local and cloud), and 1M real-world single-turn chat and reasoning queries. For each query, we measure accuracy (local LM win rate against frontier models), energy, latency, and power. We find three key results. First, local LMs successfully answer 88.7% of these queries, with accuracy varying by domain. Second, longitudinal analysis from 2023-2025 shows IPW improved 5.3x, driven by both algorithmic and accelerator advances, with locally-serviceable query coverage rising from 23.2% to 71.3%. Third, local accelerators achieve at least 1.4x lower IPW than cloud accelerators running identical models, revealing significant headroom for local accelerator optimization. These findings demonstrate that local inference can meaningfully redistribute demand from centralized infrastructure for a substantial subset of queries, with IPW serving as the critical metric for tracking this transition.
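
The metric itself is simple to compute from the quantities the abstract says are measured per query. In the sketch below, deriving average power as energy divided by latency is an assumption about how those measurements relate, and the two configurations and their numbers are hypothetical.

```python
def average_power_watts(energy_joules, latency_seconds):
    """Mean power drawn while answering a query."""
    return energy_joules / latency_seconds

def intelligence_per_watt(accuracy, power_watts):
    """Task accuracy divided by power, following the abstract's definition."""
    return accuracy / power_watts

# Hypothetical measurements for two model-accelerator configurations.
configs = {
    "config_a": {"accuracy": 0.80, "energy_joules": 200.0, "latency_seconds": 5.0},
    "config_b": {"accuracy": 0.90, "energy_joules": 900.0, "latency_seconds": 3.0},
}
ipw = {
    name: intelligence_per_watt(
        c["accuracy"], average_power_watts(c["energy_joules"], c["latency_seconds"])
    )
    for name, c in configs.items()
}
```

A configuration with lower accuracy can still score higher on IPW if it draws much less power, which is why the metric is reported alongside accuracy rather than instead of it.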

2511.06428 2026-05-22 cs.SE cs.AI

Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

Samuel Ferino, Rashina Hoda, John Grundy, Christoph Treude

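AI summary: Taking the software developer's perspective, this paper studies how LLMs affect software development and how that impact can be managed. Through 22 interviews analyzed with STGT4DA, it identifies the benefits and drawbacks of LLMs at the individual, team, organisation, and society levels and offers actionable recommendations for mitigating the challenges.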

Abstract

Background: Large Language Models emerged with the potential of provoking a revolution in software development (e.g., automating processes, workforce transformation). Although studies have started to investigate the perceived impact of LLMs for software development, there is a need for empirical studies to comprehend how to balance forward and backward effects of using LLMs. Objective: We investigated how LLMs impact software development and how to manage the impact from a software developer's perspective. Method: We conducted 22 interviews with software practitioners across 3 rounds of data collection and analysis, between October 2024 and September 2025. We employed Socio-Technical Grounded Theory for Data Analysis (STGT4DA) to rigorously analyse interview participants' responses. Results: We identified the benefits (e.g., maintain developer flow, improve developer mental models, and foster entrepreneurship) and challenges (e.g., damage to developers' reputation) of using LLMs at individual, team, organisation, and society levels, as well as actionable guidance on how to mitigate these challenges. Conclusion: Critically, we present the trade-offs that software practitioners, teams, and organisations face in working with LLMs. Our findings are particularly useful for software team leaders and IT managers to assess the viability of LLMs within their specific context.

2509.22795 2026-05-22 eess.SP cs.AI cs.SY eess.SY

Generative Modeling and Decision Fusion for Unknown Event Detection and Classification Using Synchrophasor Data

Yi Hu, Zheyuan Cheng

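AI summary: This paper proposes a new framework that combines generative modeling, sliding-window temporal processing, and decision fusion for robust event detection and classification using synchrophasor data. A variational autoencoder-generative adversarial network models normal operating conditions, and two complementary decision strategies improve detection accuracy and robustness.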

Comments 10 pages

DOI: 10.1109/TII.2026.3690931
Journal ref: IEEE Transactions on Industrial Informatics, 2026

English abstract

Reliable detection and classification of power system events are critical for maintaining grid stability and situational awareness. Existing approaches often depend on limited labeled datasets, which restricts their ability to generalize to rare or unseen disturbances. This paper proposes a novel framework that integrates generative modeling, sliding-window temporal processing, and decision fusion to achieve robust event detection and classification using synchrophasor data. A variational autoencoder-generative adversarial network is employed to model normal operating conditions, where both reconstruction error and discriminator error are extracted as anomaly indicators. Two complementary decision strategies are developed: a threshold-based rule for computational efficiency and a convex hull-based method for robustness under complex error distributions. These features are organized into spatiotemporal detection and classification matrices through a sliding-window mechanism, and an identification and decision fusion stage integrates the outputs across PMUs. This design enables the framework to identify known events while systematically classifying previously unseen disturbances into a new category, addressing a key limitation of supervised classifiers. Experimental results demonstrate state-of-the-art accuracy, surpassing machine learning, deep learning, and envelope-based baselines. The ability to recognize unknown events further highlights the adaptability and practical value of the proposed approach for wide-area event analysis in modern power systems.
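
To make the threshold-based decision rule concrete, here is a minimal sketch, not the authors' implementation. It assumes each sliding window already has a reconstruction error and a discriminator error from a trained VAE-GAN, and that each threshold is set to the mean plus `k` standard deviations of errors observed under normal operation. That thresholding rule, the OR combination of the two indicators, and all function names are illustrative assumptions that the abstract does not specify.

```python
from statistics import mean, stdev

def fit_threshold(normal_errors, k=3.0):
    """Assumed rule: threshold = mean + k * std of errors under normal operation."""
    return mean(normal_errors) + k * stdev(normal_errors)

def flag_windows(recon_errors, disc_errors, recon_thr, disc_thr):
    """Flag a window as anomalous if either error indicator exceeds its threshold."""
    return [r > recon_thr or d > disc_thr
            for r, d in zip(recon_errors, disc_errors)]

# Hypothetical error values collected while the grid operates normally.
normal_recon = [0.10, 0.12, 0.11, 0.09, 0.10]
normal_disc = [0.20, 0.22, 0.19, 0.21, 0.20]
recon_thr = fit_threshold(normal_recon)
disc_thr = fit_threshold(normal_disc)

# Three new windows: normal, high reconstruction error, high discriminator error.
flags = flag_windows([0.10, 0.45, 0.11], [0.20, 0.21, 0.60], recon_thr, disc_thr)
print(flags)
```

A fixed threshold is cheap to evaluate, which matches the abstract's stated motivation of computational efficiency; the convex hull-based alternative would instead test whether the pair of errors falls inside a region fitted to normal data.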

2509.12610 2026-05-22 cs.DB cs.AI cs.LG

ScaleDoc: Scaling LLM-based Predicates over Large Document Collections

Hengrui Zhang, Yulong Hui, Yihao Liu, Huanchen Zhang

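AI summary: This paper presents ScaleDoc, a system that splits predicate execution into an offline representation phase and an optimized online filtering phase. This addresses the high inference cost of large language models in large-scale document analysis, yielding an end-to-end speedup and fewer LLM invocations.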

DOI: 10.1145/3802106

English abstract

Predicates are foundational components in data analysis systems. However, modern workloads increasingly involve unstructured documents, which demand semantic understanding beyond traditional value-based predicates. Given enormous document collections and ad-hoc queries, while Large Language Models (LLMs) demonstrate powerful zero-shot capabilities, their high inference cost leads to unacceptable overhead. Therefore, we introduce ScaleDoc, a novel system that addresses this by decoupling predicate execution into an offline representation phase and an optimized online filtering phase. In the offline phase, ScaleDoc leverages an LLM to generate semantic representations for each document. Online, for each query, it trains a lightweight proxy model on these representations to filter the majority of documents, forwarding only the ambiguous cases to the LLM for final decision. Furthermore, ScaleDoc proposes two core innovations to achieve significant efficiency: (1) a contrastive-learning-based framework that trains the proxy model to generate reliable predicating decision scores; (2) an adaptive cascade mechanism that determines the effective filtering policy while meeting specific accuracy targets. Our evaluations across three datasets demonstrate that ScaleDoc achieves over a 2× end-to-end speedup and reduces expensive LLM invocations by up to 85%, making large-scale semantic analysis practical and efficient.
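
The online filtering step described above can be sketched roughly as follows. This is an illustration of a generic two-threshold cascade, not ScaleDoc's actual policy: the fixed thresholds `low` and `high`, the score format, and the `llm_judge` callable are all hypothetical, whereas the abstract says the system chooses its filtering policy adaptively to meet an accuracy target.

```python
def cascade_filter(scores, low, high, llm_judge):
    """Route each document by its proxy score.

    scores:    dict of doc id -> proxy score in [0, 1] (hypothetical format)
    low, high: thresholds; scores strictly between them count as ambiguous
    llm_judge: callable(doc_id) -> bool, standing in for the expensive LLM call
    """
    accepted, llm_calls = [], 0
    for doc_id, score in scores.items():
        if score >= high:
            accepted.append(doc_id)      # confident match, no LLM call
        elif score > low:
            llm_calls += 1               # ambiguous, defer to the LLM
            if llm_judge(doc_id):
                accepted.append(doc_id)
        # score <= low: confidently rejected, no LLM call
    return accepted, llm_calls

# Toy data: proxy scores and a lookup table playing the role of the LLM.
scores = {"d1": 0.95, "d2": 0.05, "d3": 0.55, "d4": 0.40, "d5": 0.02}
truth = {"d1": True, "d2": False, "d3": True, "d4": False, "d5": False}
accepted, llm_calls = cascade_filter(scores, low=0.2, high=0.8, llm_judge=truth.get)
print(accepted, llm_calls)
```

In this toy run only the two mid-range documents reach the stand-in LLM. Widening the gap between the thresholds trades more LLM calls for fewer proxy mistakes, which is the knob an adaptive policy would tune.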

2508.06884 2026-05-22 math.OC cs.LG

Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness

Alexander Tyurin

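AI summary: This paper studies first-order methods for convex optimization problems satisfying the recently proposed $\ell$-smoothness condition. Accelerated gradient descent (AGD) reaches the optimal complexity $O(\sqrt{L} R / \sqrt{\varepsilon})$ under $L$-smoothness, but existing extensions to $\ell$-smoothness either incur extra dependence on the initial gradient, suffer exponential factors in $L_1 R$, or require costly auxiliary sub-routines. The paper resolves this open question: using a new Lyapunov function and new algorithms, it achieves $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ oracle complexity for small $\varepsilon$ and virtually any $\ell$. For instance, under $(L_0, L_1)$-smoothness, the bound $O(\sqrt{L_0} R / \sqrt{\varepsilon})$ is provably optimal in the small-$\varepsilon$ regime and removes all non-constant multiplicative factors present in prior accelerated algorithms.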

English abstract

We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $\|\nabla^{2}f(x)\| \le \ell\left(\|\nabla f(x)\|\right)$, which generalizes the $L$-smoothness and $(L_{0},L_{1})$-smoothness. While accelerated gradient descent (AGD) is known to reach the optimal complexity $O(\sqrt{L} R / \sqrt{\varepsilon})$ under $L$-smoothness, where $\varepsilon$ is an error tolerance and $R$ is the distance between a starting and an optimal point, existing extensions to $\ell$-smoothness either incur extra dependence on the initial gradient, suffer exponential factors in $L_{1} R$, or require costly auxiliary sub-routines, leaving open whether an AGD-type $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ rate is possible for small-$\varepsilon$, even in the $(L_{0},L_{1})$-smoothness case. We resolve this open question. Leveraging a new Lyapunov function and designing new algorithms, we achieve $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ oracle complexity for small-$\varepsilon$ and virtually any $\ell$. For instance, for $(L_{0},L_{1})$-smoothness, our bound $O(\sqrt{L_0} R / \sqrt{\varepsilon})$ is provably optimal in the small-$\varepsilon$ regime and removes all non-constant multiplicative factors present in prior accelerated algorithms.
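
As a short worked illustration of how the $\ell$-smoothness condition contains the two classical cases, the display below substitutes two choices of $\ell$. It relies only on the standard definitions of $L$-smoothness and $(L_{0},L_{1})$-smoothness for twice-differentiable $f$, not on anything specific to the paper's proofs.

```latex
\begin{align*}
\ell(s) = L
  &\;\Longrightarrow\; \|\nabla^{2} f(x)\| \le L
  && \text{($L$-smoothness)},\\
\ell(s) = L_{0} + L_{1}\, s
  &\;\Longrightarrow\; \|\nabla^{2} f(x)\| \le L_{0} + L_{1}\,\|\nabla f(x)\|
  && \text{($(L_{0},L_{1})$-smoothness)}.
\end{align*}
```

In the second case $\ell(0) = L_{0}$, so the general rate $O(\sqrt{\ell(0)} R / \sqrt{\varepsilon})$ specializes to the $O(\sqrt{L_0} R / \sqrt{\varepsilon})$ bound quoted in the abstract.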

2507.13339 2026-05-22 eess.IV cs.CV

SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution

Ritik Shah, Marco F. Duarte

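AI summary: This work proposes SpectraLift, a self-supervised spectral-inversion network that fuses hyperspectral and multispectral images using the multispectral image's spectral response function. It requires neither point spread function calibration nor ground-truth high-resolution hyperspectral images, and it outperforms existing methods on PSNR, SAM, SSIM, and RMSE.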

DOI: 10.1109/WHISPERS69515.2025.11501599
Journal ref: 2025 15th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS)

English abstract

High-spatial-resolution hyperspectral images (HSI) are essential for applications such as remote sensing and medical imaging, yet HSI sensors inherently trade spatial detail for spectral richness. Fusing high-spatial-resolution multispectral images (HR-MSI) with low-spatial-resolution hyperspectral images (LR-HSI) is a promising route to recover fine spatial structures without sacrificing spectral fidelity. Most state-of-the-art methods for HSI-MSI fusion demand point spread function (PSF) calibration or ground truth high resolution HSI (HR-HSI), both of which are impractical to obtain in real world settings. We present SpectraLift, a fully self-supervised framework that fuses LR-HSI and HR-MSI inputs using only the MSI's Spectral Response Function (SRF). SpectraLift trains a lightweight per-pixel multi-layer perceptron (MLP) network using (i) a synthetic low-spatial-resolution multispectral image (LR-MSI) obtained by applying the SRF to the LR-HSI as input, (ii) the LR-HSI as the output, and (iii) an $\ell_1$ spectral reconstruction loss between the estimated and true LR-HSI as the optimization objective. At inference, SpectraLift uses the trained network to map the HR-MSI pixel-wise into a HR-HSI estimate. SpectraLift converges in minutes, is agnostic to spatial blur and resolution, and outperforms state-of-the-art methods on PSNR, SAM, SSIM, and RMSE benchmarks.
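
The way the self-supervised training pairs are formed can be sketched in a few lines. This is a toy illustration under stated assumptions, not the SpectraLift code: the SRF is modelled as a small matrix with one row per multispectral band, the pixel values and the 4-band/2-band sizes are made up, and the MLP itself is omitted.

```python
def apply_srf(srf, hsi_pixel):
    """Project one hyperspectral pixel (B bands) onto M multispectral bands."""
    return [sum(w * v for w, v in zip(row, hsi_pixel)) for row in srf]

def l1_loss(estimate, target):
    """Mean absolute spectral error between an estimated and a true pixel."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)

# Hypothetical SRF: 2 MSI bands, each averaging 2 of the 4 HSI bands.
srf = [[0.5, 0.5, 0.0, 0.0],
       [0.0, 0.0, 0.5, 0.5]]

# Two made-up LR-HSI pixels with 4 spectral bands each.
lr_hsi = [[0.2, 0.4, 0.6, 0.8],
          [0.1, 0.1, 0.3, 0.5]]

# Training pairs: synthetic LR-MSI pixel (input) -> LR-HSI pixel (target).
pairs = [(apply_srf(srf, px), px) for px in lr_hsi]

# Loss for a placeholder estimate of the first pixel's spectrum.
loss = l1_loss([0.25, 0.25, 0.75, 0.75], lr_hsi[0])
print(pairs[0][0], loss)
```

Because both the input and the target come from the LR-HSI itself, no high-resolution ground truth is needed. At inference the same per-pixel mapping would be applied to HR-MSI pixels, as the abstract describes.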

2505.24333 2026-05-22 stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation

Alessio Giorlandino, Sebastian Goldt

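AI summary: This paper studies two failure modes of self-attention layers in deep transformers, rank collapse and entropy collapse, and proposes a unified theory of signal propagation. By analysing how initialisation affects training stability, it provides a simple algorithm for computing trainability diagrams that identify the correct choice of initialisation hyper-parameters.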

Journal ref: ICLR 2026

English abstract

Finding the right initialisation for neural networks is crucial to ensure smooth training and good performance. In transformers, the wrong initialisation can lead to one of two failure modes of self-attention layers: rank collapse, where all tokens collapse into similar representations, and entropy collapse, where highly concentrated attention scores lead to training instability. While previous work has studied different scaling regimes for transformers, an asymptotically exact, down-to-the-constant prescription for how to initialise transformers has so far been lacking. Here, we provide an analytical theory of signal propagation through deep transformers with self-attention, layer normalisation, skip connections and MLP. Our theory yields a simple algorithm to compute trainability diagrams that identify the correct choice of initialisation hyper-parameters for a given architecture. We overcome the key challenge, an exact treatment of the self-attention layer, by establishing a formal parallel with the Random Energy Model from statistical physics. We also analyse gradients in the backward path and determine the regime where gradients vanish at initialisation. We demonstrate the versatility of our framework through three case studies. Our theoretical framework gives a unified perspective on the two failure modes of self-attention and gives quantitative predictions on the scale of both weights and residual connections that guarantee smooth training.

2505.20349 2026-05-22 physics.flu-dyn cs.LG

FD-Bench: A Modular and Fair Benchmark for Data-driven Fluid Simulation

Haixin Wang, Ruoyan Li, Fred Xu, Fang Sun, Kaiqiao Han, Zijie Huang, Ching Chang, Xiao Luo, Wei Wang, Yizhou Sun

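AI summary: This paper introduces FD-Bench, a modular, fair, comprehensive, and reproducible benchmark for data-driven fluid simulation. It evaluates 85 baseline models under a unified experimental setup, addresses reproducibility and comparability issues, and lays a foundation for robust evaluation of future data-driven fluid models.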

Comments 32 pages, 20 figures, paper accepted by KDD 2026

English abstract

Data-driven modeling of fluid dynamics has advanced rapidly with neural PDE solvers, yet fair and strong benchmarking remains fragmented due to the absence of unified PDE datasets and standardized evaluation protocols. Although architectural innovations are abundant, fair assessment is further impeded by the lack of clear disentanglement between spatial, temporal and loss modules. In this paper, we introduce FD-Bench, the first fair, modular, comprehensive and reproducible benchmark for data-driven fluid simulation. FD-Bench systematically evaluates 85 baseline models across 10 representative flow scenarios under a unified experimental setup. It provides four key contributions: (1) a modular design enabling fair comparisons across spatial, temporal, and loss function modules; (2) the first systematic framework for direct comparison with traditional numerical solvers; (3) fine-grained generalization analysis across resolutions, initial conditions, and temporal windows; and (4) a user-friendly, extensible codebase to support future research. Through rigorous empirical studies, FD-Bench establishes the most comprehensive leaderboard to date, resolving long-standing issues in reproducibility and comparability, and laying a foundation for robust evaluation of future data-driven fluid models. The code is open-sourced at https://github.com/WillDreamer/FD-Bench.

2505.15844 2026-05-22 q-bio.QM cs.LG stat.AP

Advancing Tabular Stroke Modelling Through a Novel Hybrid Architecture and Feature-Selection Synergy

Yousuf Islam, Md. Jalal Uddin Chowdhury, Sumon Chandra Das

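AI summary: This paper proposes a data-driven, interpretable machine-learning framework that uses ten routinely gathered demographic, lifestyle, and clinical variables. Through detailed exploratory data analysis, data preprocessing, and feature selection, it builds a stroke-risk assessment model with 97.2% accuracy, significantly outperforming existing models.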

DOI: 10.1109/BECITHCON69222.2025.11503962
Journal ref: IEEE Conference Publication, 2025

English abstract

Brain stroke remains one of the principal causes of death and disability worldwide, yet most tabular-data prediction models still hover below the 95% accuracy threshold, limiting real-world utility. Addressing this gap, the present work develops and validates a completely data-driven and interpretable machine-learning framework designed to predict strokes using ten routinely gathered demographic, lifestyle, and clinical variables sourced from a public cohort of 4,981 records. We employ a detailed exploratory data analysis (EDA) to understand the dataset's structure and distribution, followed by rigorous data preprocessing, including handling missing values, outlier removal, and class imbalance correction using Synthetic Minority Over-sampling Technique (SMOTE). To streamline feature selection, point-biserial correlation and random-forest Gini importance were utilized, and ten varied algorithms (encompassing tree ensembles, boosting, kernel methods, and a multilayer neural network) were optimized using stratified five-fold cross-validation. Their predictions based on probabilities helped us build the proposed model, which included Random Forest, XGBoost, LightGBM, and a support-vector classifier, with logistic regression acting as a meta-learner. The proposed model achieved an accuracy rate of 97.2% and an F1-score of 97.15%, indicating a significant enhancement compared to the leading individual model, LightGBM, which had an accuracy of 91.4%. Our study's findings indicate that rigorous preprocessing, coupled with a diverse hybrid model, can convert low-cost tabular data into a nearly clinical-grade stroke-risk assessment tool.

2503.24191 2026-05-22 cs.CR cs.AI

When Grammar Guides the Attack: Uncovering Control-Plane Vulnerabilities in LLMs with Structured Output

Shuoming Zhang, Jiacheng Zhao, Hanyuan Dong, Ruiyuan Xu, Zhicheng Li, Yangyu Zhang, Shuaijiang Li, Yuan Wen, Chunwei Xia, Zheng Wang, Xiaobing Feng, Huimin Cui

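AI summary: This paper studies control-plane vulnerabilities in LLMs exposed through structured output. It proposes a new class of jailbreak, the Constrained Decoding Attack (CDA), described as a control-to-semantic pipeline in which schema-enforced logit masking injects a malicious prefix and the model itself completes the harmful intent. It reports high attack success rates for DictAttack across multiple models and highlights the need for cross-plane defenses that bridge the semantic gap between the data and control planes.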

Comments To appear in CCS2026

English abstract

Content Warning: This paper may contain unsafe or harmful content generated by LLMs that may be offensive to readers. Large Language Models (LLMs) increasingly serve as tooling platforms through structured output APIs, but the grammar-guided decoding that powers this feature opens a critical control-plane attack surface orthogonal to traditional data-plane vulnerabilities. We introduce Constrained Decoding Attack (CDA), a new jailbreak class that targets the LLM control plane. CDA is best characterized as a control-to-semantic pipeline: (1) schema-enforced logit masking injects a malicious prefix into the generation trajectory, and (2) the model itself completes the harmful intent. Unlike data-plane jailbreaks that rely on bypassing alignment with visible inputs, CDA acts on the decoding process itself, so internal safety alignment alone cannot stop it. We instantiate CDA with EnumAttack, which hides malicious content in enum fields, and the more evasive DictAttack, which decouples the payload across a benign prompt and a dictionary-based grammar. Across 13 proprietary/open-weight models and five standard benchmarks, DictAttack achieves 94.3--99.5% Attack Success Rate (ASR) on flagship models including gpt-5, gemini-2.5-pro, deepseek-r1, and gpt-oss-120b. While basic grammar auditing mitigates EnumAttack, DictAttack still sustains 75.8% ASR against SOTA jailbreak guardrails, exposing a "semantic gap" that demands cross-plane defenses bridging the data and control planes. Project page and code are available at https://ict-cda.github.io/.

2503.06115 2026-05-22 stat.ML cs.IT cs.LG math.IT math.PR

On Statistical Estimation of Edge-Reinforced Random Walks

Qinghua Ding, Venkat Anantharam

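AI summary: This paper studies statistical estimation of the initial edge weights of edge-reinforced random walks, using the hypergeometric Gaussian structure in the random environment to analyse the sample complexity of the estimator.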

Comments This is the full version of the conference paper in submission to ISIT 2025

2502.21194 2026-05-22 stat.ML cs.LG

Prior shift estimation for positive unlabeled data through the lens of kernel embedding

Jan Mielniczuk, Wojciech Rejchel, Paweł Teisseyre

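AI summary: This paper studies estimation of the prior distribution for a target unlabeled sample that may differ from the source population, where the source data are only partially observed: only samples from the positive class and from the whole population are available (the PU learning scenario). It proposes a new direct estimator of the prior that avoids estimating posterior probabilities for the two populations and has a simple geometric interpretation. The method is based on distribution matching with kernel embedding in a reproducing kernel Hilbert space and is obtained as the explicit solution to an optimization task. The paper establishes its asymptotic consistency as well as an explicit non-asymptotic bound on its deviation from the unknown prior, which is computable in practice. Finite-sample behaviour is studied on synthetic and real data, showing that the method performs on par with or better than its competitors.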

2403.16552 2026-05-22 cs.NE cs.AI cs.CV

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian

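AI summary: This paper proposes QKFormer, a hierarchical spiking transformer based on Q-K attention. By introducing a new spike-form Q-K attention mechanism, a hierarchical structure, and a versatile patch embedding module, it improves the performance of spiking neural networks on image classification, reaching 85.65% top-1 accuracy on ImageNet-1K.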

Comments Accepted by NeurIPS 2024 (Spotlight). Code and Model: https://github.com/zhouchenlin2096/QKFormer

English abstract

Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve the performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, we develop QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, with comparable size to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1k, substantially outperforming Spikformer by 10.84%. To the best of our knowledge, this is the first time that directly trained SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer

2305.12138 2026-05-22 cs.SE cs.AI

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Wei Ma, Zhihao Lin, Shangqing Liu, Qiang Hu, Ye Liu, Wenhan Wang, Cen Zhang, Liming Nie, Li Li, Yang Liu, Lingxiao Jiang

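AI summary: This paper studies large language models for code analysis, evaluating 21 state-of-the-art LLMs on nine tasks in four languages. It characterises LLM performance in syntax parsing, static semantics inference, and dynamic reasoning, finds that LLMs offer cross-language generalization while dynamic reasoning remains limited, and contributes a validated evaluation framework.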

Comments Accepted at ACM Transactions on Software Engineering and Methodology (TOSEM)

English abstract

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are effective but limited by language specificity and weak cross-language generalization. Large language models (LLMs) are promising for code tasks, yet their capabilities for fundamental code analysis remain underexplored. We structure our study around three aspects aligned with human practices: syntax parsing, static semantics inference, and dynamic reasoning. We evaluate 21 state-of-the-art LLMs across nine tasks in four languages (C, Java, Python, Solidity), including AST generation, CFG construction, data dependency, taint analysis, and flaky test reasoning. We apply a three-layer evaluation protocol (automated metrics, expert adjudication, consistency validation) to 3,124 code samples, achieving high inter-rater reliability (Cohen's kappa = 0.844-0.936) and strong human-machine agreement (Gwet's AC1 = 0.500-0.727, F1 = 0.791-0.882). While the best LLMs excel in syntax parsing (AST 90%+, expression matching 84-100%) and show promise in static analysis, their dynamic reasoning remains limited (<70%) with high data-shift sensitivity (per-project F1 varying 0-1.0). This hierarchy holds across model families and scales, suggesting fundamental rather than transient limitations. These findings show how LLMs complement traditional analyzers: they offer cross-language generalization but non-deterministic outputs needing validation, while traditional tools give deterministic guarantees but need language-specific configuration. We contribute a validated evaluation framework with comparison against traditional analyzers (Tree-sitter, Soot, Joern) and task-specific applicability tiers. Benchmark: https://github.com/mathieu0905/llm_code_analysis.git
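
Since the abstract reports inter-rater reliability as Cohen's kappa, a small worked example of that statistic may help readers interpret the numbers. The sketch below implements the standard two-rater formula, kappa = (p_o - p_e) / (1 - p_e), on made-up labels; the data and the function name are illustrative and unrelated to the paper's 3,124 samples.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts from two reviewers on ten model outputs.
a = ["ok", "ok", "bad", "ok", "bad", "ok", "bad", "ok", "ok", "bad"]
b = ["ok", "ok", "bad", "ok", "ok", "ok", "bad", "ok", "ok", "bad"]
kappa = cohens_kappa(a, b)
print(round(kappa, 3))
```

Here the raters agree on 9 of 10 items, but kappa is noticeably lower than 0.9 because some of that agreement is expected by chance. The function does not guard against the degenerate case where chance agreement equals 1 (both raters always use one identical label).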

2605.22822 2026-05-22 hep-th hep-ph quant-ph

Bottom-up open EFT for non-Abelian gauge theory with dynamical color environment

Yoshihiko Abe, Kanji Nishii

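AI summary: This paper proposes a bottom-up open effective field theory describing color response in non-Abelian gauge theories with a dynamical color environment. It constructs a local system-environment EFT by retaining slow response variables and shows that the hard thermal loop response arises as one realization of the retained environmental response.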

Comments 51 pages, 1 figure

English abstract

We develop a bottom-up open effective field theory (EFT) for non-Abelian gauge theories within the Schwinger-Keldysh formalism. Instead of integrating out the environment completely and starting from a nonlocal influence functional, we retain the slow environmental response variables explicitly and construct a local system-environment EFT. The environmental sector is described by a dynamical color-frame variable, a Stückelberg-like field, and an associated color-current sector, which gives the nontrivial interactions and dissipation between the system and the environment. The resulting construction provides a gauge-covariant Markov embedding of nonlocal and non-Markovian color response. After integrating out the retained environmental variables with retarded boundary conditions, the reduced system theory acquires nonlocal dissipative kernels and stochastic sources. We show that the hard thermal loop response arises naturally as a particular realization of the retained environmental response. Our framework provides a local open-EFT description of color transport, memory effects, and fluctuation-dissipation structure in non-Abelian plasmas, and offers a systematic starting point for dissipative Yang-Mills EFTs with dynamical environments.

2605.22815 2026-05-22 hep-ex hep-ph

New constraints on physics within and beyond the standard model from the latest CONUS datasets

N. Ackermann, H. Bonet, A. Bonhomme, C. Buck, K. Fülber, J. Hakenmüller, J. Hempfling, G. Heusser, T. Hugle, M. Lindner, W. Maneschg, S. Mertens, K. Ni, D. Piani, M. Rank, T. Rink, E. Sanchez Garcia, I. Stalder, H. Strecker, R. Wink, J. Woenckhaus

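AI summary: With detections using pion-decay-at-rest, solar, and most recently reactor antineutrinos by the CONUS collaboration, coherent elastic neutrino-nucleus scattering (CE$\nu$NS) is an established tool for studying physics within and beyond the Standard Model. With the latest datasets, the paper improves the limits on the neutrino magnetic moment and a neutrino millicharge, improves the new-physics scale related to NSIs, and lowers the limits on the coupling of light new mediators.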

Comments 35 pages, 16 figures, 6 tables; comments welcome

English abstract

Its detections with pion-decay-at-rest, solar and recently with reactor antineutrinos by the CONUS collaboration render coherent elastic neutrino-nucleus scattering (CE$\nu$NS) an established tool for investigations within and beyond the Standard Model (SM). The CONUS experiment located at the nuclear power plants in Brokdorf (Germany) and Leibstadt (Switzerland) operates Germanium semiconductor detectors in a compact shield at close distance to the reactor core. An observation with $3.7\sigma$ significance is reported at the Leibstadt site, showing good agreement with its SM prediction. Physics investigations performed with the last datasets collected at the Brokdorf reactor and with the first data obtained at the Leibstadt site are summarized. By using the experimental analysis framework, the presented results contain the full systematics that underlie the experiment. Previously determined limits with neutrino-electron scattering on the neutrino magnetic moment and a neutrino millicharge are improved to $\mu_\nu < 5.18\cdot 10^{-11}\,\mu_\mathrm{B}$ and $q_\nu < 1.76\cdot 10^{-12}\,e_0$ (90% C.L.). Further, the scale of new physics related to NSIs is improved to $\Lambda_{\rm NSI} = 145$ GeV and limits on the coupling of light new mediators are lowered down to $4 \cdot 10^{-7}$ (90% C.L.) with the new data. Finally, the determination of the Weinberg angle with CE$\nu$NS and reactor antineutrinos yields $\sin^{2}\theta_W = 0.28^{+0.03}_{-0.04}$ at a momentum transfer of $\sim 10\ \mathrm{MeV}$.

2605.22813 2026-05-22 cs.DS

Optimal Testing of Reed-Muller Codes with an Online Adversary

Esty Kelman, Uri Meir, Kai Zhe Zheng

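AI summary: This paper presents semi-sample-based testers for optimal testing of Reed-Muller codes in the online-erasure model, improving on the work of Minzer and Zheng and giving the first testers for lifted affine-invariant codes in the online-erasure model.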

English abstract

Motivated by applications to property testing in the online-erasure model of Kalemaj, Raskhodnikova, and Varma (ITCS 2022 and Theory of Computing 2023), we define and analyze semi-sample-based testers for Reed-Muller codes. The task in Reed-Muller testing is to determine whether an input function $f: \mathbb{F}^n \to \mathbb{F}$ belongs to the Reed-Muller code or is far from it, using as few point queries to $f$ as possible. Reed-Muller testing is a well-studied task with its roots in both the Property Testing and Probabilistically Checkable Proofs literature. The online-erasure model introduces a twist: after each query made, an adversary may erase up to $t$ points of the input function, potentially thwarting any test in which the queries follow a predictable pattern. Semi-sample-based testers are a hybrid between sample-based testers, which can only make uniformly random queries to the input function, and standard testers, which can choose their queries freely. They are designed with the online-erasure model in mind and operate by first choosing some subset $S$ of the domain and then making their queries uniformly at random inside of $S$. We describe semi-sample-based testers for the Reed-Muller code and give an optimal analysis of their soundness. Consequently, we show that semi-sample-based testers are indeed effective in the presence of online erasures, and thereby achieve optimal query complexity for testing the Reed-Muller code in the online-erasure model. This result improves upon prior work of Minzer and Zheng (SODA 2024). As an added bonus, we show that semi-sample-based testers also exist for the lifted affine-invariant codes of Guo, Kopparty, and Sudan (ITCS 2013), thereby providing the first known testers for these codes in the online-erasure model.
