arXivDaily: arXiv daily academic digest, updated Monday through Friday

2602.08873 2026-06-03 cs.IR cs.AI cs.CY cs.SI physics.soc-ph

Whose Name Comes Up? II: Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation

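Lisette Espín-Noboa, Gonzalo Gabriel Méndez

Affiliations: Complexity Science Hub Vienna; Universitat Politècnica de València; Inria Rennes

AI summary: Proposes LLMScholarBench, a benchmark that audits the technical quality and social representation of 22 LLMs on physics expert recommendation under interventions such as temperature variation, representation-constrained prompting, and retrieval-augmented generation, and finds that each intervention brings different trade-offs.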

Comments In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26). 30 pages: 11 pages in main (6 figures, 1 table), 19 pages in appendix (22 figures, 2 tables)

Abstract

Large language models (LLMs) are now used for academic expert recommendation. Existing audits typically evaluate such recommendations in isolation, ignoring end-user inference-time interventions. Thus, it remains unclear whether failures (e.g., refusals, hallucinations, uneven coverage) stem from model choice or deployment decisions. We introduce LLMScholarBench, a benchmark for auditing LLM-based scholar recommendation that jointly evaluates model infrastructure and end-user interventions across multiple tasks. LLMScholarBench measures technical quality and social representation using nine metrics. We instantiate the benchmark in physics expert recommendation and audit 22 LLMs under temperature variation, representation-constrained prompting, and retrieval-augmented generation (RAG) via web search. Our results show that each intervention entails distinct trade-offs. Higher temperature degrades validity, consistency, and factuality. Representation-constrained prompting improves diversity at the expense of factuality, whereas RAG primarily improves technical quality but reduces diversity and parity. Overall, end-user interventions reshape trade-offs rather than providing uniform gains. LLMScholarBench makes all these dynamics auditable across models and interventions in LLM-based scholar recommendations.
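
Illustrative sketch (not from the paper): the temperature intervention mentioned in the abstract can be pictured as rescaling a model's next-token scores before sampling. The logits below are hypothetical values for three candidate names, and the function names are made up for this example. The sketch only shows that a higher temperature flattens the sampling distribution, which is one plausible reason outputs become less consistent.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical scores for three candidate scholar names (assumption).
logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # low temperature: peaked
hot = softmax_with_temperature(logits, 2.0)   # high temperature: flatter

print("T=0.5:", [round(p, 3) for p in cold], "entropy", round(entropy(cold), 3))
print("T=2.0:", [round(p, 3) for p in hot], "entropy", round(entropy(hot), 3))
```

At the lower temperature most of the probability mass sits on the top-scoring name, while at the higher temperature the three candidates are sampled far more evenly.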

2602.06842 2026-06-03 math.NA cs.LG cs.NA

Are Deep Learning Based Hybrid PDE Solvers Reliable? Why Training Paradigms and Update Strategies Matter

Yuhan Wu, Jan Willem van Beek, Victorita Dolean, Alexander Heinlein

Affiliations: Delft Institute of Applied Mathematics; Delft University of Technology; Department of Mathematics and Computer Science; Eindhoven University of Technology; The Netherlands

AI summary: Studies the reliability of deep learning-based hybrid iterative methods (DL-HIMs) in scientific computing, finds that training objectives misaligned with solver dynamics and problem physics cause residual stagnation, and proposes physics-aware Anderson acceleration (PA-AA) to restore reliable convergence.

Comments Accepted manuscript version of an article accepted for publication in IEEE Computing in Science & Engineering. The final published version will be available through IEEE Xplore

Abstract

Deep learning-based hybrid iterative methods (DL-HIMs) integrate classical numerical solvers with neural operators, utilizing their complementary spectral biases to accelerate convergence. Despite this promise, many DL-HIMs stagnate at false fixed points where neural updates vanish while the physical residual remains large, raising questions about reliability in scientific computing. In this paper, we provide evidence that performance is highly sensitive to training paradigms and update strategies, even when the neural architecture is fixed. Through a detailed study of a DeepONet-based hybrid iterative numerical transferable solver (HINTS) and an FFT-based Fourier neural solver (FNS), we show that significant physical residuals can persist when training objectives are not aligned with solver dynamics and problem physics. We further examine Anderson acceleration (AA) and demonstrate that its classical form is ill-suited for nonlinear neural operators. To overcome this, we introduce physics-aware Anderson acceleration (PA-AA), which minimizes the physical residual rather than the fixed-point update. Numerical experiments confirm that PA-AA restores reliable convergence in substantially fewer iterations. These findings provide a concrete answer to ongoing controversies surrounding AI-based PDE solvers: reliability hinges not only on architectures but on physically informed training and iteration design.
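
Illustrative sketch (not from the paper): for readers unfamiliar with Anderson acceleration, the block below implements its classical form with memory 1 on a scalar fixed-point problem x = cos(x), where it reduces to a secant-type update. The mixing coefficient is chosen to minimize the fixed-point update f = g(x) - x, which is exactly the quantity the abstract says PA-AA replaces with the physical residual. PA-AA itself is not reproduced here, because the abstract does not give its details. The function names and the test problem are assumptions.

```python
import math

def plain_fixed_point(g, x0, tol=1e-10, max_iter=500):
    """Plain iteration x <- g(x); returns (solution, iterations used)."""
    x = x0
    for iteration in range(1, max_iter + 1):
        g_x = g(x)
        if abs(g_x - x) < tol:
            return x, iteration
        x = g_x
    return x, max_iter

def anderson_depth1(g, x0, tol=1e-10, max_iter=500):
    """Classical Anderson acceleration with memory 1 for scalar x = g(x)."""
    g_prev = g(x0)
    f_prev = g_prev - x0   # fixed-point update at the previous iterate
    x = g_prev             # the first step is a plain fixed-point step
    for iteration in range(1, max_iter + 1):
        g_x = g(x)
        f = g_x - x        # fixed-point update at the current iterate
        if abs(f) < tol:
            return x, iteration
        denom = f - f_prev
        if denom == 0.0:
            x_next = g_x   # fall back to a plain step
        else:
            # gamma minimizes |f - gamma * (f - f_prev)| over the two iterates
            gamma = f / denom
            x_next = g_x - gamma * (g_x - g_prev)
        g_prev, f_prev, x = g_x, f, x_next
    return x, max_iter

x_plain, n_plain = plain_fixed_point(math.cos, 1.0)
x_aa, n_aa = anderson_depth1(math.cos, 1.0)
print("plain iteration:", x_plain, "in", n_plain, "steps")
print("Anderson (m=1): ", x_aa, "in", n_aa, "steps")
```

On this smooth scalar problem the accelerated iteration needs far fewer steps than the plain one. The abstract's point is that this behavior does not carry over automatically when the map g involves a nonlinear neural operator.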

2511.12085 2026-06-03 cs.CR cs.AI cs.LG

A Robust and Explainable Transformer-Based Framework for Phishing Email Detection

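Sajad U P

Affiliations: Independent Researcher

AI summary: Proposes a lightweight DistilBERT-based phishing email detection framework that improves robustness through gradient-based adversarial training and character-level noise, integrates three explainable-AI methods (LIME, SHAP, and IG), and uses Flan-T5-Small to generate natural-language explanations, improving detection accuracy and user trust.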

Abstract

Phishing and related cyber threats are becoming increasingly sophisticated, with email-based phishing remaining the most persistent attack vector. These attacks exploit human vulnerabilities to deliver malware or gain unauthorized access to sensitive information. Transformer-based models enhance phishing detection through robust contextual language understanding; yet they are often regarded as black boxes due to a lack of interpretability. Moreover, recent AI-enabled attacks further undermine model resilience. To address these challenges, this work proposes a lightweight phishing detection framework based on DistilBERT, a lightweight Transformer model. Robustness to embedding-level perturbations and character-level input noise is enhanced through gradient-based adversarial training using the Fast Gradient Method (FGM), combined with stochastic character-level perturbations. To improve transparency, three prominent Explainable AI (XAI) methods, LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), and IG (Integrated Gradients), are integrated to interpret model decision-making. A structured rule-based prompt combines model predictions and XAI features to guide Flan-T5-Small in generating plain-language, evidence-based explanations. Experimental results demonstrate that the proposed framework outperforms a standard DistilBERT-based detection model trained without robustness enhancements in terms of accuracy and resilience. This integrated approach helps bridge the gap between model reliability and user trust, advancing transparent phishing detection.
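
Illustrative sketch (not from the paper): the Fast Gradient Method named in the abstract perturbs an input embedding along the normalized gradient of the loss, with a fixed perturbation size epsilon. The paper applies this to DistilBERT embeddings. The toy linear classifier, weights, embedding values, and function names below are assumptions chosen so the example runs with the standard library alone.

```python
import math

def logistic_loss(weights, embedding, label):
    """Loss of a linear classifier; label is +1 (phishing) or -1 (legitimate)."""
    margin = label * sum(w * e for w, e in zip(weights, embedding))
    return math.log1p(math.exp(-margin))

def loss_gradient_wrt_embedding(weights, embedding, label):
    """Gradient of the logistic loss with respect to the input embedding."""
    margin = label * sum(w * e for w, e in zip(weights, embedding))
    scale = -label / (1.0 + math.exp(margin))
    return [scale * w for w in weights]

def fgm_perturb(weights, embedding, label, epsilon):
    """FGM step: move the embedding by epsilon along the L2-normalized gradient."""
    grad = loss_gradient_wrt_embedding(weights, embedding, label)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]

# Hypothetical classifier weights and token embedding (assumptions).
weights = [0.8, -0.5, 0.3]
embedding = [1.0, 0.2, -0.4]
label = 1
adversarial = fgm_perturb(weights, embedding, label, epsilon=0.1)

clean_loss = logistic_loss(weights, embedding, label)
adv_loss = logistic_loss(weights, adversarial, label)
print("clean loss:", round(clean_loss, 4), "adversarial loss:", round(adv_loss, 4))
```

In adversarial training, the loss on the perturbed embedding would be added to the clean loss at each step, so the model learns to classify both correctly.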

2601.10222 2026-06-03 math.NA cs.AI cs.NA math.OC

Introduction to optimization methods for training SciML models

Alena Kopaničáková, Elisa Riccietti

Affiliations: Toulouse-INP, IRIT-APO, ANITI; ENS de Lyon, CNRS, Inria, Université Claude Bernard Lyon 1, LIP, UMR 5668

AI summary: Gives a unified introduction to optimization methods in machine learning and scientific machine learning, emphasizing how problem structure shapes algorithmic choices, and discusses practical strategies for physics-constrained and data-driven SciML models.

Abstract

Optimization is central to both modern machine learning (ML) and scientific machine learning (SciML), yet the structure of the underlying optimization problems differs substantially across these domains. Classical ML typically relies on stochastic, sample-separable objectives that favor first-order and adaptive gradient methods. In contrast, SciML often involves physics-informed or operator-constrained formulations in which differential operators induce global coupling, stiffness, and strong anisotropy in the loss landscape. As a result, optimization behavior in SciML is governed by the spectral properties of the underlying physical models rather than by data statistics, frequently limiting the effectiveness of standard stochastic methods and motivating deterministic or curvature-aware approaches. This document provides a unified introduction to optimization methods in ML and SciML, emphasizing how problem structure shapes algorithmic choices. We review first- and second-order optimization techniques in both deterministic and stochastic settings, discuss their adaptation to physics-constrained and data-driven SciML models, and illustrate practical strategies through tutorial examples, while highlighting open research directions at the interface of scientific computing and scientific machine learning.
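
Illustrative sketch (not from the paper): the abstract's claim that anisotropy limits first-order methods and motivates curvature-aware ones can be seen on a two-variable quadratic f(x, y) = 0.5 * (x^2 + kappa * y^2). The problem, the step-size choice 1/kappa, and the function names are assumptions picked for a minimal demonstration.

```python
def gradient(point, kappa):
    """Gradient of f(x, y) = 0.5 * (x**2 + kappa * y**2)."""
    x, y = point
    return (x, kappa * y)

def gradient_descent(start, kappa, tol=1e-6, max_iter=100000):
    """First-order method; the step is capped by the stiffest direction."""
    step = 1.0 / kappa
    point = start
    for iteration in range(max_iter):
        gx, gy = gradient(point, kappa)
        if max(abs(gx), abs(gy)) < tol:
            return point, iteration
        point = (point[0] - step * gx, point[1] - step * gy)
    return point, max_iter

def newton(start, kappa, tol=1e-6, max_iter=100):
    """Second-order method; the Hessian diag(1, kappa) rescales each direction."""
    point = start
    for iteration in range(max_iter):
        gx, gy = gradient(point, kappa)
        if max(abs(gx), abs(gy)) < tol:
            return point, iteration
        point = (point[0] - gx / 1.0, point[1] - gy / kappa)
    return point, max_iter

kappa = 100.0  # condition number of the quadratic (assumption)
_, gd_iters = gradient_descent((1.0, 1.0), kappa)
_, newton_iters = newton((1.0, 1.0), kappa)
print("gradient descent iterations:", gd_iters)
print("Newton iterations:", newton_iters)
```

Gradient descent needs a number of iterations that grows with the condition number, while Newton's method solves a quadratic in one step. In SciML the conditioning comes from the differential operator, and the Hessian is rarely this cheap to invert.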

2601.04120 2026-06-03 math.OC cs.LG

A Single-Loop Bilevel Deep Learning Method for Optimal Control of Obstacle Problems

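Yongcun Song, Shangzhi Zeng, Jin Zhang, Lvgang Zhang

Affiliations: SUSTech

AI summary: Proposes a mesh-free, scalable single-loop bilevel deep learning method that efficiently solves optimal control of obstacle problems using constraint-embedding neural networks and a single-loop stochastic first-order bilevel algorithm.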

Abstract

Optimal control of obstacle problems arises in a wide range of applications and is computationally challenging due to its nonsmoothness, nonlinearity, and bilevel structure. Classical numerical approaches rely on mesh-based discretization and typically require solving a sequence of costly subproblems. In this work, we propose a single-loop bilevel deep learning method, which is mesh-free, scalable to high-dimensional and complex domains, and avoids repeated solution of discretized subproblems. The method employs constraint-embedding neural networks to approximate the state and control and preserves the bilevel structure. To train the neural networks efficiently, we propose a Single-Loop Stochastic First-Order Bilevel Algorithm (S2-FOBA), which eliminates nested optimization and does not rely on restrictive lower-level uniqueness assumptions. We analyze the convergence behavior of S2-FOBA under mild assumptions. Numerical experiments on benchmark examples, including distributed and obstacle control problems with regular and irregular obstacles on complex domains, demonstrate that the proposed method achieves satisfactory accuracy while reducing computational cost compared to classical numerical methods.

2512.16882 2026-06-03 physics.chem-ph cond-mat.mtrl-sci cs.LG

A Cartesian-3j Framework for Machine Learning Interatomic Potentials

Zemin Xu, Chenyu Wu, Wenbo Xie, P. Hu

Affiliations: University of Science and Technology of China

AI summary: Proposes a framework for irreducible Cartesian tensors based on the Cartesian-3j symbol and Cartesian generalized Clebsch-Gordan coefficients, builds Cartesian versions of MACE, NequIP, and Allegro, and introduces the TACE-v1-OAM-M model, which achieves competitive performance on Matbench Discovery.

Abstract

Machine learning interatomic potentials (MLIPs) have brought substantial gains in the extrapolation capability in computational chemistry. However, most equivariant models are typically built with spherical tensors (STs), while Cartesian tensor formulations remain less developed despite their natural alignment with atomic coordinates and tensorial targets. In this work, we develop a Cartesian framework for irreducible Cartesian tensors (ICTs) by introducing the Cartesian-3j symbol and Cartesian generalized Clebsch-Gordan coefficients, which serve as direct analogues of the Wigner-3j symbol and generalized Clebsch-Gordan coefficients defined for ST coupling. We extend the e3nn library to support ICT products, and use this framework to build Cartesian counterparts of MACE, NequIP, and Allegro, allowing the first controlled comparison where architectures are held fixed and only the tensor basis is changed. Our experiments show that irreducible Cartesian models can achieve accuracy comparable to spherical counterparts, but direct Cartesianization incurs unfavorable compute and memory scaling, motivating dedicated Cartesian architectural choices. Leveraging ICTs and our framework, we introduce TACE-v1-OAM-M and demonstrate that it achieves competitive performance on Matbench Discovery compared to state-of-the-art ST models.

2511.17126 2026-06-03 eess.IV cs.CV cs.LG physics.optics

Towards Blind Lens Aberration Correction via Large LensLib Pre-training and Discrete Degradation Priors

Xiaolong Qian, Qi Jiang, Yao Gao, Lei Sun, Kailun Yang, Xian Wang, Zhonghua Yi, Wenyong Li, Ming-Hsuan Yang, Luc Van Gool, Kaiwei Wang

Affiliations: National Research Center for Optical Instrumentation, Zhejiang University; INSAIT, Sofia University "St. Kliment Ohridski"; School of Artificial Intelligence and Robotics, Hunan University; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University

AI summary: Proposes the FoundCAC framework, which builds AODLibpro, a large-scale unbiased lens library, and LPR, a discrete degradation prior, to address the data-scaling and missing-prior problems, enabling zero-shot generalization and efficient few-shot adaptation for blind lens aberration correction.

Comments Accepted to 2026 IEEE International Conference on Computational Photography (ICCP). The source code and datasets will be made publicly available at https://github.com/zju-jiangqi/FoundCAC

Abstract

The emerging deep-learning-based lens library pre-training (LensLib-PT) pipeline offers a new avenue for blind lens aberration correction by training a universal neural network, demonstrating strong capability in handling diverse unknown optical degradations. This work proposes FoundCAC, a universal foundational framework that resolves two challenges hindering the generalization of existing pipelines: the difficulty of scaling training data and the absence of prior guidance characterizing optical degradation. To improve data scalability, we expand the design specifications to increase degradation diversity and construct AODLibpro, a large-scale, unbiased lens library based on a uniform sampling strategy that quantifies spatial-variation patterns and severity. In terms of model design, to leverage Point Spread Functions (PSFs) as guidance while maintaining the blind paradigm, we propose a multi-stage vector-quantized representation learning scheme. This paradigm is specifically designed to construct a Latent PSF Representation (LPR), explicitly encoding complex continuous PSFs into a discrete degradation prior to regularize the highly ill-posed restoration process. Through a simple yet effective codebook-freezing strategy, our framework leverages the discrete prior to elevate full-shot restoration performance and unlock highly efficient few-shot adaptation for unseen lenses. Experiments on diverse aberrations of synthetic LensLib and real-world lenses demonstrate that our framework achieves state-of-the-art zero-shot generalization while enabling highly efficient few-shot adaptation for specific lenses. The source code and datasets will be made publicly available at https://github.com/zju-jiangqi/FoundCAC.
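
Illustrative sketch (not from the paper): the core operation behind a vector-quantized representation is replacing a continuous latent vector with its nearest entry in a codebook, so the downstream network sees a discrete code. The abstract does not specify the quantizer used for the Latent PSF Representation; the nearest-neighbor lookup below is the standard vector-quantization step, and the codebook values and names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to the vector."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]

# Hypothetical frozen codebook of three 2-D degradation codes (assumption).
codebook = [
    [0.0, 0.0],   # e.g. nearly aberration-free
    [1.0, 0.0],   # e.g. blur dominated by one aberration type
    [0.0, 1.0],   # e.g. blur dominated by another aberration type
]

# Continuous latents, as an encoder might produce for image patches.
latents = [[0.9, 0.1], [0.1, 0.2], [-0.2, 0.8]]
indices = [quantize(z, codebook)[0] for z in latents]
print("assigned code indices:", indices)
```

With the codebook frozen, adapting to a new lens would only need to learn how latents map onto existing codes, which is one way to read the few-shot adaptation claim in the abstract.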

2511.05050 2026-06-03 stat.ML cs.LG stat.ME

Estimating Bidirectional Causal Effects with Large Scale Online Kernel Learning

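Masahiro Tanaka

Affiliations: Japan Society for the Promotion of Science

AI summary: Proposes a scalable online kernel learning framework that combines heteroskedasticity-based identification with quasi-maximum likelihood estimation to estimate bidirectional causal effects in systems with mutual dependence and heteroskedasticity, with efficient computation via random Fourier features and adaptive online gradient descent.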

Journal ref
Proceedings of the 2025 International Conference on Data Science and Intelligent Systems (DSIS 2025), Article 65, pp. 449-455

Abstract

In this study, a scalable online kernel learning framework is proposed for estimating bidirectional causal effects in systems characterized by mutual dependence and heteroskedasticity. Traditional causal inference often focuses on unidirectional effects, overlooking the common bidirectional relationships in real-world phenomena. Building on heteroskedasticity-based identification, the proposed method integrates a quasi-maximum likelihood estimator for simultaneous equation models with large-scale online kernel learning. It employs random Fourier feature approximations to flexibly model nonlinear conditional means and variances, while an adaptive online gradient descent algorithm ensures computational efficiency for streaming and high-dimensional data. Results from extensive simulations demonstrate that the proposed method achieves higher accuracy and stability than single-equation and polynomial-approximation baselines, exhibiting lower bias and root mean squared error across various data-generating processes. These results confirm that the proposed approach effectively captures complex bidirectional causal effects with near-linear computational scaling. By combining econometric identification with modern machine learning techniques, the proposed framework offers a practical, scalable, and theoretically grounded solution for large-scale causal inference in natural and social science, policy making, business, and industrial applications.
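
Illustrative sketch (not from the paper): random Fourier features, which the abstract uses to model nonlinear means and variances, replace a kernel evaluation with an inner product of explicit finite-dimensional features. The block below checks this for the Gaussian (RBF) kernel. The kernel choice, bandwidth, feature count, and function names are assumptions; the paper's estimator and online update are not reproduced.

```python
import math
import random

def sample_rff_params(input_dim, num_features, bandwidth, rng):
    """Draw frequencies w ~ N(0, I / bandwidth**2) and phases b ~ U[0, 2*pi]."""
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(input_dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases

def rff_map(x, freqs, phases):
    """Feature map z(x) with z(x) . z(y) approximating the RBF kernel."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]

def rbf_kernel(x, y, bandwidth):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable
freqs, phases = sample_rff_params(input_dim=2, num_features=4000,
                                  bandwidth=1.0, rng=rng)
x, y = [0.3, -0.2], [0.1, 0.4]
approx = sum(a * b for a, b in zip(rff_map(x, freqs, phases),
                                   rff_map(y, freqs, phases)))
exact = rbf_kernel(x, y, 1.0)
print("exact kernel:", round(exact, 4), "RFF approximation:", round(approx, 4))
```

Because each sample is mapped to a fixed-length feature vector, a linear model on these features can be updated one observation at a time, which is what makes the approach compatible with online gradient descent.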

2511.13899 2026-06-03 q-bio.NC cs.CE cs.LG

A Factorized Low-Rank RNN Framework for Uncovering Independent Neural Latent Dynamics and Connectivity

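Chengrui Li, Yunmiao Wang, Yule Wang, Weihan Li, Dieter Jaeger, Anqi Wu

Affiliations: University of California, San Diego

AI summary: Proposes the FacRNN framework, which uses a group-wise independence assumption and a partial correlation penalty to improve the disentanglement and interpretability of latent dynamics in low-rank recurrent neural networks.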

Abstract

Low-rank recurrent neural networks (lrRNNs) are a class of models that uncover low-dimensional latent dynamics underlying neural population activity. Although their functional connectivity is low-rank, it lacks independence interpretations, making it difficult to assign distinct computational roles to different latent dimensions. To address this, we propose the Factored Recurrent Neural Network (FacRNN), a generative lrRNN framework that assumes group-wise independence among latent dynamics while allowing flexible within-group entanglement. These independent latent groups allow latent dynamics to evolve separately, but are internally rich for complex computation. We reformulate the lrRNN under a variational autoencoder (VAE) framework, enabling us to introduce a partial correlation penalty that encourages independence between groups of latent dimensions. Experiments on synthetic, monkey M1, and mouse voltage imaging data show that FacRNN consistently improves the disentanglement and interpretability of learned neural latent trajectories in low-dimensional space and low-rank connectivity over baseline lrRNNs that do not encourage group-wise independence.

2511.12482 2026-06-03 quant-ph cs.LG

Discovering autonomous quantum error correction via deep reinforcement learning

Yue Yin, Tailong Xiao, Xiaoyang Deng, Ming He, Jianping Fan, Guihua Zeng

Affiliations: Zhiyuan College, Shanghai Jiao Tong University, Shanghai 200240, P.R. China; State Key Laboratory of Photonics and Communications, Institute for Quantum Sensing and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, P.R. China; Hefei National Laboratory, Hefei, 230088, P.R. China; Shanghai Research Center for Quantum Sciences, Shanghai, 201315, P.R. China; AI Lab, Lenovo Research, Beijing 100094, P.R. China

AI summary: Uses curriculum-learning-inspired deep reinforcement learning to discover bosonic codes that resist single-photon and double-photon losses under an approximate autonomous quantum error correction framework, obtaining optimal codewords that surpass the break-even point.

Journal ref
Phys. Rev. A 112, 062618 (2025)

Abstract

Quantum error correction is essential for fault-tolerant quantum computing. However, standard methods relying on active measurements may introduce additional errors. Autonomous quantum error correction (AQEC) circumvents this by utilizing engineered dissipation and drives in bosonic systems, but identifying practical encodings remains challenging due to the stringent Knill-Laflamme conditions. In this work, we use curriculum-learning-enabled deep reinforcement learning to discover bosonic codes under an approximate AQEC framework that resist both single-photon and double-photon losses. We present an analytical solution of the master equation under approximation conditions, which can significantly accelerate the training process of reinforcement learning. The agent first identifies an encoded subspace surpassing the break-even point through rapid exploration within a constrained evolution time frame, then strategically fine-tunes its policy to sustain this performance advantage over extended temporal horizons. We find that the two-phase trained agent can discover the optimal set of codewords, i.e., the Fock states $\ket{4}$ and $\ket{7}$, when both single-photon and double-photon losses are considered. The discovered code surpasses the break-even threshold over a longer evolution time and achieves state-of-the-art performance. We also analyze the robustness of the code against phase damping and amplitude damping noise. Our work highlights the potential of curriculum-learning-enabled deep reinforcement learning in discovering optimal quantum error-correcting codes, especially in early fault-tolerant quantum systems.

2511.02986 2026-06-03 stat.ML cs.LG q-bio.GN

Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

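Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, James D. Pearce, Donghui Li, Aly A. Khan, Theofanis Karaletsos, Jakub M. Tomczak

Affiliations: University of Cambridge

AI summary: Proposes scLDM, a scalable generative method that combines a variational autoencoder with a latent diffusion model, using permutation-invariant and permutation-equivariant architectures and Diffusion Transformers to achieve high-quality single-cell gene expression generation.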

Comments Accepted to ICML 2026, Github: https://github.com/czi-ai/scldm/

Abstract

Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression data and complex latent dependencies among genes. Existing generative models often impose artificial gene orderings or rely on shallow neural network architectures. We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM, that respects the fundamental exchangeability property of the data. Our VAE uses fixed-size latent variables leveraging a unified Multi-head Cross-Attention Block (MCAB) architecture, which serves dual roles: permutation-invariant pooling in the encoder and permutation-equivariant unpooling in the decoder. We enhance this framework by replacing the Gaussian prior with a latent diffusion model using Diffusion Transformers and linear interpolants, enabling high-quality generation with multi-conditional classifier-free guidance. We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.

2511.02304 2026-06-03 cs.MA cs.AI cs.CL cs.FL cs.LG

Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning

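Beyazit Yalcinkaya, Marcell Vazquez-Chanlatte, Ameesh Shah, Hanna Krasowski, Sanjit A. Seshia

Affiliations: Massachusetts Institute of Technology; Stanford University

AI summary: Proposes an automata-conditioned cooperative multi-agent reinforcement learning framework that uses automata to decompose team objectives into sub-tasks and learns task-conditioned decentralized policies, enabling optimal task assignment and multi-step coordination.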

Abstract

We study learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks assigned to agents enables breaking down a team-level objective into simpler, smaller sub-tasks. However, existing approaches remain sample-inefficient and are limited to the single-task case, requiring retraining policies for each new task. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify challenges to the feasibility of ACC-MARL, propose solutions, and prove that our approach is optimal. We further show that learned value functions can be used to assign tasks optimally at test time. Experiments demonstrate emergent task-aware, multi-step coordination among agents, such as pressing a button to unlock a door, holding the door, and short-circuiting tasks.

2510.15780 2026-06-03 stat.AP cs.LG

Enhanced Renewable Energy Forecasting using Context-Aware Conformal Prediction

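Alireza Moradi, Mathieu Tanneau, Reza Zandehshahvar, Pascal Van Hentenryck

Affiliations: EPFL, Switzerland; Ghent University, Belgium

AI summary: Proposes Context-Aware Conformal Prediction (CACP), a framework that calibrates prediction intervals by weighting historical observations, without retraining the model, to improve the reliability and efficiency of renewable energy forecasts.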

Abstract

Artificial intelligence (AI) is increasingly used to support renewable energy forecasting and grid operations. As renewable penetration grows, reliable probabilistic forecasting is becoming essential for managing uncertainty and supporting risk-aware operational decision-making. However, these forecasts often suffer from miscalibration due to temporal variability, changing weather conditions, and heterogeneous operating regimes. In many real-world settings, renewable energy forecasts are provided by external sources, vendors, or independently trained systems, making retraining infeasible because of limited model access or computational constraints. This creates a need for efficient and model-agnostic methods that can improve forecast reliability after the forecasts are produced. This paper presents Context-Aware Conformal Prediction (CACP), a framework for calibrating renewable energy forecasts. The proposed method relies on a weighting mechanism during the calibration procedure that assigns higher weights to historical observations that are more similar to the target forecasting condition. This enables adaptive prediction intervals that reflect local uncertainty regimes without requiring access to, or retraining of, the underlying forecasting model. Experiments are performed on a large-scale dataset from the National Renewable Energy Laboratory (NREL) for day-ahead solar forecasting, covering multiple systems including MISO, ERCOT, and SPP. The results show that CACP improves the reliability-efficiency trade-off at both site and system levels compared to NREL's base forecasting model and the other conformal prediction baselines. These results suggest that CACP can serve as a practical reliability-enhancement layer for trustworthy AI-enabled renewable energy forecasting and operational decision support.
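
Illustrative sketch (not from the paper): the weighting mechanism described in the abstract can be pictured as a weighted quantile of past absolute forecast errors, where errors observed under conditions similar to the target receive more weight. The Gaussian similarity kernel, the scalar "cloudiness" context, the toy calibration data, and all function names below are assumptions; CACP's actual weighting scheme is not given in the abstract.

```python
import math

def similarity_weights(calib_contexts, target_context, bandwidth):
    """Gaussian kernel weights: closer contexts get weights nearer to 1."""
    return [math.exp(-((c - target_context) ** 2) / (2.0 * bandwidth ** 2))
            for c in calib_contexts]

def weighted_quantile(scores, weights, level, test_weight=1.0):
    """Smallest score whose cumulative weight reaches the requested level.

    The test point carries its own weight, as in weighted conformal
    prediction, so the result is infinite if calibration mass is too small.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= level * total:
            return score
    return math.inf

def calibrated_interval(forecast, scores, contexts, target_context,
                        level, bandwidth):
    """Symmetric interval around a point forecast from weighted residuals."""
    weights = similarity_weights(contexts, target_context, bandwidth)
    radius = weighted_quantile(scores, weights, level)
    return forecast - radius, forecast + radius

# Hypothetical calibration set: cloudiness in [0, 1] and absolute errors (MW).
contexts = [0.1, 0.1, 0.2, 0.2, 0.8, 0.9, 0.9, 1.0]
abs_errors = [1.0, 2.0, 1.0, 2.0, 8.0, 9.0, 10.0, 9.0]

# level=0.5 keeps the toy example small; practical targets are usually higher.
clear_day = calibrated_interval(50.0, abs_errors, contexts, 0.1, 0.5, 0.1)
cloudy_day = calibrated_interval(50.0, abs_errors, contexts, 0.9, 0.5, 0.1)
print("clear-day interval:", clear_day)
print("cloudy-day interval:", cloudy_day)
```

The same point forecast gets a narrow interval on a clear day and a wide one on a cloudy day, without touching the forecasting model that produced it.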

2509.08048 2026-06-03 hep-ph cs.LG

Forecasting Generative Amplification

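Henning Bahl, Sascha Diefenbacher, Nina Elmer, Tilman Plehn, Jonas Spinner

Affiliations: Institut für Theoretische Physik, Universität Heidelberg, Germany; Physics Division, Lawrence Berkeley National Laboratory, Berkeley, USA; Interdisciplinary Center for Scientific Computing (IWR), Universität Heidelberg, Germany

AI summary: Proposes two complementary methods, averaging amplification and differential amplification, to estimate the statistical amplification factor of generative networks in LHC simulations without large holdout datasets; applied to state-of-the-art event generators, they show that amplification is feasible in specific regions of phase space but not yet across the entire distribution.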

Comments 23 pages, 15 figures. v2: added link to github repo, extended acknowledgements. v3: updated conventions and refined text, now 25 pages

Journal ref
SciPost Phys. 20, 150 (2026)

Abstract

Generative networks are perfect tools to enhance the speed and precision of LHC simulations. It is important to understand their statistical precision, especially when generating events beyond the size of the training dataset. We present two complementary methods to estimate the amplification factor without large holdout datasets. Averaging amplification uses Bayesian networks or ensembling to estimate amplification from the precision of integrals over given phase-space volumes. Differential amplification uses hypothesis testing to quantify amplification without any resolution loss. Applied to state-of-the-art event generators, both methods indicate that amplification is possible in specific regions of phase space, but not yet across the entire distribution.

2509.09685 2026-06-03 cs.IR cs.AI cs.MM cs.SD eess.AS

TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation

TalkPlayData 2:用于多模态对话式音乐推荐的智能体合成数据流水线

Keunwoo Choi, Seungheon Doh, Juhan Nam

发表机构 * KAIST(韩国科学技术院)

AI总结 提出TalkPlayData 2,一个由智能体数据流水线生成的多模态对话式音乐推荐合成数据集,通过多角色大语言模型模拟对话并覆盖多种场景,以支持生成式推荐模型训练。

详情
AI中文摘要

我们提出了TalkPlayData 2,一个由智能体数据流水线生成的多模态对话式音乐推荐合成数据集。在该流水线中,多个大语言模型(LLM)智能体被创建,承担不同角色,具有专门的提示词和访问不同信息部分的权限,通过记录Listener LLM和Recsys LLM之间的对话来获取聊天数据。为了覆盖各种对话场景,每个对话的Listener LLM基于微调的对话目标进行条件设置。最后,所有LLM都是多模态的,支持音频和图像,从而模拟多模态推荐和对话。在LLM-as-a-judge和主观评估实验中,TalkPlayData 2在训练音乐生成式推荐模型的各个方面达到了预期目标。TalkPlayData 2及其生成代码已在https://talkpl-ai.github.io发布。

英文摘要

We present TalkPlayData 2, a synthetic dataset for multimodal conversational music recommendation generated by an agentic data pipeline. In the proposed pipeline, multiple large language model (LLM) agents are created under various roles with specialized prompts and access to different parts of information, and the chat data is acquired by logging the conversation between the Listener LLM and the Recsys LLM. To cover various conversation scenarios, for each conversation, the Listener LLM is conditioned on a finetuned conversation goal. Finally, all the LLMs are multimodal with audio and images, allowing a simulation of multimodal recommendation and conversation. In the LLM-as-a-judge and subjective evaluation experiments, TalkPlayData 2 achieved the proposed goal in various aspects related to training a generative recommendation model for music. TalkPlayData 2 and its generation code are released at https://talkpl-ai.github.io.

2510.01698 2026-06-03 cs.IR cs.MM cs.SD eess.AS

TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling

TalkPlay-Tools: 基于大语言模型工具调用的对话式音乐推荐

Seungheon Doh, Keunwoo Choi, Juhan Nam

发表机构 * KAIST(韩国科学技术院) talkpl.ai

AI总结 提出一种基于LLM工具调用的统一检索-重排序流水线,通过布尔过滤、稀疏检索、稠密检索和生成式检索的组合,实现端到端的对话式音乐推荐。

Comments Accepted for publication at The Workshop on AI for Music, Neural Information Processing Systems (NeurIPS-AI4Music)

详情
AI中文摘要

尽管大型语言模型(LLM)的最新进展已成功实现了具有自然语言交互的生成式推荐系统,但其推荐行为受限,导致系统中其他更简单但关键组件(如元数据或属性过滤)未被充分利用。我们提出了一种基于LLM的音乐推荐系统,通过工具调用作为统一的检索-重排序流水线。该系统将LLM定位为端到端推荐系统,解释用户意图、规划工具调用并编排专门组件:布尔过滤(SQL)、稀疏检索(BM25)、稠密检索(嵌入相似度)和生成式检索(语义ID)。通过工具规划,系统预测要使用的工具类型、执行顺序以及查找匹配用户偏好的音乐所需的参数,支持多种模态,同时无缝集成多个数据库过滤方法。我们证明,这种统一的工具调用框架通过基于用户查询选择性地采用适当的检索方法,在多种推荐场景中实现了有竞争力的性能,为对话式音乐推荐系统设想了新的范式。

英文摘要

While the recent developments in large language models (LLMs) have successfully enabled generative recommenders with natural language interactions, their recommendation behavior is limited, leaving other simpler yet crucial components such as metadata or attribute filtering underutilized in the system. We propose an LLM-based music recommendation system with tool calling to serve as a unified retrieval-reranking pipeline. Our system positions an LLM as an end-to-end recommendation system that interprets user intent, plans tool invocations, and orchestrates specialized components: boolean filters (SQL), sparse retrieval (BM25), dense retrieval (embedding similarity), and generative retrieval (semantic IDs). Through tool planning, the system predicts which types of tools to use, their execution order, and the arguments needed to find music matching user preferences, supporting diverse modalities while seamlessly integrating multiple database filtering methods. We demonstrate that this unified tool-calling framework achieves competitive performance across diverse recommendation scenarios by selectively employing appropriate retrieval methods based on user queries, envisioning a new paradigm for conversational music recommendation systems.
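Two of the components named in the abstract above, a boolean metadata filter and BM25 sparse retrieval, can be chained as in the sketch below. It is only an illustration of a filter-then-rank step: the `CATALOG` entries, the `recommend` function, and the whitespace tokenization are hypothetical, and the actual system uses an LLM to plan which tools to call.

```python
import math
from collections import Counter

# Hypothetical toy catalog; titles, years, and tags are made up for illustration.
CATALOG = [
    {"title": "Rainy Nocturne", "year": 1998, "tags": "calm piano night"},
    {"title": "Morning Keys", "year": 2015, "tags": "calm piano sunrise"},
    {"title": "Street Anthem", "year": 2020, "tags": "loud guitar city"},
]


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs_tokens:
        return []
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            length_norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores.append(score)
    return scores


def recommend(query, min_year, top_k=1):
    """Boolean filter on release year, then BM25 ranking over the tag text."""
    candidates = [track for track in CATALOG if track["year"] >= min_year]
    scores = bm25_scores(query.split(), [t["tags"].split() for t in candidates])
    ranked = sorted(zip(scores, (t["title"] for t in candidates)), reverse=True)
    return [title for _, title in ranked[:top_k]]
```

Here `recommend("calm piano", 2000)` first drops the 1998 track and then ranks the remaining two by tag overlap. Filtering before ranking keeps the hard constraint exact while the lexical score only orders what is left.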

2502.09755 2026-06-03 cs.CR cs.LG

Jailbreak Attack Initializations as Extractors of Compliance Directions

越狱攻击初始化作为合规方向的提取器

Amit Levi, Rom Himelstein, Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin

发表机构 * Department of Computer Science, Technion - Israel Institute of Technology(以色列理工学院计算机科学系) Department of Data and Decision Science, Technion - Israel Institute of Technology(以色列理工学院数据与决策科学系) School of Electrical and Computer Engineering, Ben-Gurion University of the Negev(内盖夫本-古里安大学电气与计算机工程学院)

AI总结 本文发现基于梯度的越狱攻击初始化会收敛到抑制拒绝的单一合规方向,并据此提出CRI框架,通过沿合规方向投影未见提示来提高攻击成功率并降低计算开销。

Comments Accepted to Findings of the Association for Computational Linguistics 2025 (EMNLP 2025)

详情
AI中文摘要

安全对齐的LLM对提示的响应要么是合规要么是拒绝,每种响应对应模型激活空间中的不同方向。最近的研究表明,通过从其他提示进行自我迁移来初始化攻击可以显著提升其性能。然而,这些初始化的潜在机制仍不清楚,并且攻击使用任意或手动选择的初始化。本文表明,每个基于梯度的越狱攻击及其后续初始化逐渐收敛到一个抑制拒绝的单一合规方向,从而能够实现从拒绝到合规的高效转换。基于这一见解,我们提出了CRI,一个旨在将未见提示进一步投影到合规方向的初始化框架。我们在多种攻击、模型和数据集上展示了我们的方法,实现了更高的攻击成功率(ASR)并降低了计算开销,突显了安全对齐LLM的脆弱性。参考实现可在以下网址获取:https://amit1221levi.github.io/CRI-Jailbreak-Init-LLMs-evaluation

英文摘要

Safety-aligned LLMs respond to prompts with either compliance or refusal, each corresponding to distinct directions in the model's activation space. Recent works show that initializing attacks via self-transfer from other prompts significantly enhances their performance. However, the underlying mechanisms of these initializations remain unclear, and attacks utilize arbitrary or hand-picked initializations. This work shows that each gradient-based jailbreak attack and subsequent initialization gradually converges to a single compliance direction that suppresses refusal, thereby enabling an efficient transition from refusal to compliance. Based on this insight, we propose CRI, an initialization framework that aims to project unseen prompts further along compliance directions. We demonstrate our approach on multiple attacks, models, and datasets, achieving an increased attack success rate (ASR) and reduced computational overhead, highlighting the fragility of safety-aligned LLMs. A reference implementation is available at: https://amit1221levi.github.io/CRI-Jailbreak-Init-LLMs-evaluation.

2510.01377 2026-06-03 math.OC cs.AI cs.LG cs.MA cs.SY eess.SY

DeMuon: A Decentralized Muon for Matrix Optimization over Graphs

DeMuon:一种用于图上矩阵优化的去中心化Muon方法

Chuan He, Shuyi Ren, Jingwei Mao, Erik G. Larsson

发表机构 * Department of Mathematics, Linköping University(林雪平大学数学系) Department of Electrical Engineering, Linköping University(林雪平大学电气工程系) Department of Computer and Information Science, Linköping University(林雪平大学计算机与信息科学系)

AI总结 提出DeMuon方法,通过牛顿-舒尔茨迭代实现矩阵正交化,并利用梯度跟踪处理局部函数异质性,在重尾噪声下达到与集中式算法匹配的复杂度,首次将Muon扩展到去中心化图优化并具有可证明的复杂度保证。

Comments Add an accelerated variant of the proposed method. New proofs of proposed methods

详情
AI中文摘要

本文提出DeMuon,一种在给定通信拓扑上进行去中心化矩阵优化的方法。DeMuon通过牛顿-舒尔茨迭代(继承自其集中式前身Muon)实现矩阵正交化,并采用梯度跟踪来减轻局部函数之间的异质性。在重尾噪声条件和额外的温和假设下,我们建立了DeMuon达到近似随机驻点的迭代复杂度。该复杂度结果在目标容差依赖方面与已知的最佳集中式算法复杂度界相匹配。据我们所知,DeMuon是首个将Muon直接扩展到图上去中心化优化并具有可证明复杂度保证的方法。我们在不同连通程度的图上进行了去中心化Transformer预训练的初步数值实验。数值结果表明,在不同网络拓扑下,DeMuon相比其他流行的去中心化算法具有明显的改进优势。

英文摘要

In this paper, we propose DeMuon, a method for decentralized matrix optimization over a given communication topology. DeMuon incorporates matrix orthogonalization via Newton-Schulz iterations, a technique inherited from its centralized predecessor Muon, and employs gradient tracking to mitigate heterogeneity among local functions. Under heavy-tailed noise conditions and additional mild assumptions, we establish the iteration complexity of DeMuon for reaching an approximate stochastic stationary point. This complexity result matches the best-known complexity bounds of centralized algorithms in terms of dependence on the target tolerance. To the best of our knowledge, DeMuon is the first direct extension of Muon to decentralized optimization over graphs with provable complexity guarantees. We conduct preliminary numerical experiments on decentralized transformer pretraining over graphs with varying degrees of connectivity. Our numerical results demonstrate a clear margin of improvement of DeMuon over other popular decentralized algorithms across different network topologies.
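For readers unfamiliar with Newton-Schulz orthogonalization, the sketch below shows the classical cubic iteration $X \leftarrow 1.5X - 0.5XX^\top X$ on a small matrix in plain Python. It is an assumption-laden illustration: Muon-style optimizers typically use a tuned higher-order polynomial and run on accelerator tensors, and nothing here reproduces DeMuon's gradient tracking or communication steps. The helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor of a full-rank matrix `g`.

    The input is first divided by its Frobenius norm so that every singular
    value lies in (0, 1], inside the convergence region of the cubic map
    s -> 1.5 * s - 0.5 * s**3, whose attracting fixed point is s = 1.
    """
    fro_norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / fro_norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxt_x[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x
```

The iteration only uses matrix products, which is the usual argument for preferring it over an explicit SVD on hardware optimized for matrix multiplication. Very small singular values need more steps, since each step grows them by a factor of at most 1.5.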

2509.08726 2026-06-03 math.OC cs.LG

Decentralized Stochastic Nonconvex Optimization under the $(L_0,L_1)$-Smoothness

$(L_0,L_1)$-光滑条件下的去中心化随机非凸优化

Luo Luo, Xue Cui, Tingkai Jia, Cheng Chen

发表机构 * School of Data Science, Fudan University(复旦大学数据科学学院) East China Normal University(华东师范大学)

AI总结 针对满足$(L_0,L_1)$-光滑条件的非凸函数,提出去中心化归一化随机梯度下降算法,实现每个局部智能体达到ε-稳定点,并给出样本复杂度和通信复杂度的上界。

详情
AI中文摘要

本文关注去中心化随机优化问题 $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m f_i(\mathbf{x})$,其中网络由 $n$ 个智能体连接,每个局部函数形如 $f_i(\mathbf{x}) = {\mathbb E}\left[F(\mathbf{x};{\boldsymbol ξ}_i)\right]$,满足 $(L_0,L_1)$-光滑条件但可能非凸,且每个随机变量 ${\boldsymbol ξ}_i$ 服从分布 ${\mathcal D}_i$。我们提出一种新算法——去中心化归一化随机梯度下降(DNSGD),该算法可使每个局部智能体达到 $\varepsilon$-稳定点。我们提出了一个基于梯度范数与一致性误差乘积的李雅普诺夫函数的新框架,用于分析 $(L_0,L_1)$-光滑设置下的去中心化一阶方法。我们证明,所提算法在每个智能体上的样本复杂度上界为 ${\mathcal O}(m^{-1}(L_fσ^2Δ_fε^{-4} + σ^2ε^{-2} + L_f^{-2}L_1^3σ^2Δ_fε^{-1} + L_f^{-2}L_1^2σ^2))$,通信复杂度上界为 $\tilde{\mathcal O}((L_fε^{-2} + L_1ε^{-1})γ^{-1/2}Δ_f)$,其中 $L_f=L_0 +L_1ζ$,$σ^2$ 是随机梯度的方差,$Δ_f$ 是初始最优函数值差距,$γ$ 是网络的谱间隙,$ζ$ 是梯度异质性程度。在 $L_1=0$ 的特殊情况下,上述结果(几乎)匹配标准光滑条件下去中心化随机非凸优化的下界。我们还进行了数值实验,以展示我们方法的实证优越性。

英文摘要

This paper focuses on the decentralized stochastic optimization problem $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m f_i(\mathbf{x})$ over a connected network of $n$ agents, where each local function has the form $f_i(\mathbf{x}) = {\mathbb E}\left[F(\mathbf{x};{\boldsymbol \xi}_i)\right]$, satisfies the $(L_0,L_1)$-smooth condition but is possibly nonconvex, and each random variable ${\boldsymbol \xi}_i$ follows distribution ${\mathcal D}_i$. We propose a novel algorithm called decentralized normalized stochastic gradient descent (DNSGD), which can achieve an $\varepsilon$-stationary point at each local agent. We present a new framework for analyzing decentralized first-order methods in the $(L_0,L_1)$-smooth setting, based on a Lyapunov function related to the product of the gradient norm and the consensus error. We show that the proposed algorithm attains upper bounds on the sample complexity of ${\mathcal O}(m^{-1}(L_f\sigma^2\Delta_f\varepsilon^{-4} + \sigma^2\varepsilon^{-2} + L_f^{-2}L_1^3\sigma^2\Delta_f\varepsilon^{-1} + L_f^{-2}L_1^2\sigma^2))$ per agent and the communication complexity of $\tilde{\mathcal O}((L_f\varepsilon^{-2} + L_1\varepsilon^{-1})\gamma^{-1/2}\Delta_f)$, where $L_f=L_0 +L_1\zeta$, $\sigma^2$ is the variance of the stochastic gradient, $\Delta_f$ is the initial optimal function value gap, $\gamma$ is the spectral gap of the network, and $\zeta$ is the degree of the gradient dissimilarity. In the special case of $L_1=0$, the above results (nearly) match the lower bounds of decentralized stochastic nonconvex optimization under the standard smoothness. We also conduct numerical experiments to show the empirical superiority of our method.
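The abstract does not spell out the smoothness condition. As background, the form commonly used in the relaxed-smoothness literature for twice-differentiable $f$ is shown first below; the paper's exact definition may differ. The second line is a generic template of a gossip-averaged normalized stochastic gradient step, included only to show what "normalized" means here. The mixing weights $w_{ij}$, step size $\eta$, and stochastic gradients $\mathbf{g}_j^{t}$ are assumed notation, and this is not claimed to be the exact DNSGD update.

```latex
\|\nabla^2 f(\mathbf{x})\| \le L_0 + L_1 \|\nabla f(\mathbf{x})\|
\qquad \text{(reduces to standard $L_0$-smoothness when $L_1 = 0$)}

\mathbf{x}_i^{t+1} = \sum_{j} w_{ij}\left(\mathbf{x}_j^{t} - \eta\,\frac{\mathbf{g}_j^{t}}{\|\mathbf{g}_j^{t}\|}\right)
```

Normalization bounds the length of every local step by $\eta$ regardless of how large the gradient is, which is the usual reason it pairs well with a curvature bound that grows with the gradient norm.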

2509.08707 2026-06-03 q-bio.BM cs.LG

Tokenizing Loops of Antibodies

抗体环的标记化

Ada Fang, Robert G. Alberstein, Simon Kelow, Frédéric A. Dreyer

发表机构 * Harvard University(哈佛大学) Prescient Design, Genentech(Prescient Design,基因泰克)

AI总结 提出Igloo多模态抗体环标记器,通过对比学习编码主链二面角和序列,高效检索相似环结构,提升H3环识别性能5.9%,并集成到蛋白质语言模型中改善抗体设计。

Comments 21 pages, 7 figures, 10 tables, code available at https://github.com/prescient-design/igloo

详情
AI中文摘要

抗体的互补决定区是环状结构,对其与抗原的相互作用至关重要,并且对新型生物制品的设计具有高度重要性。自20世纪80年代以来,将CDR结构的多样性分类为规范簇使得能够识别抗体的关键结构基序。然而,现有方法的覆盖范围有限,并且不能轻易地整合到蛋白质基础模型中。在这里,我们介绍了免疫球蛋白环标记器Igloo,这是一种多模态抗体环标记器,用于编码主链二面角和序列。Igloo使用对比学习目标进行训练,以在潜在空间中将具有相似主链二面角的环映射得更近。Igloo可以高效地从结构抗体数据库中检索最接近的匹配环结构,在识别相似H3环方面比现有方法高出5.9%。Igloo为所有环分配标记,解决了规范簇覆盖范围有限的问题,同时保留了恢复规范环构象的能力。为了展示Igloo标记的多功能性,我们展示了它们可以通过IglooLM和IglooALM整合到蛋白质语言模型中。在预测重链变体的结合亲和力方面,IglooLM在10个抗体-抗原靶点中的8个上优于基础蛋白质语言模型。此外,它与现有的最先进的基于序列和多模态蛋白质语言模型相当,与参数多7倍的模型表现相当。IglooALM采样的抗体环在序列上多样化,在结构上比最先进的抗体逆折叠模型更一致。Igloo展示了引入多模态标记用于抗体环在编码抗体环的多样化景观、改进蛋白质基础模型以及抗体CDR设计方面的优势。

英文摘要

The complementarity-determining regions of antibodies are loop structures that are key to their interactions with antigens, and of high importance to the design of novel biologics. Since the 1980s, categorizing the diversity of CDR structures into canonical clusters has enabled the identification of key structural motifs of antibodies. However, existing approaches have limited coverage and cannot be readily incorporated into protein foundation models. Here we introduce ImmunoGlobulin LOOp Tokenizer, Igloo, a multimodal antibody loop tokenizer that encodes backbone dihedral angles and sequence. Igloo is trained using a contrastive learning objective to map loops with similar backbone dihedral angles closer together in latent space. Igloo can efficiently retrieve the closest matching loop structures from a structural antibody database, outperforming existing methods on identifying similar H3 loops by 5.9%. Igloo assigns tokens to all loops, addressing the limited coverage issue of canonical clusters, while retaining the ability to recover canonical loop conformations. To demonstrate the versatility of Igloo tokens, we show that they can be incorporated into protein language models with IglooLM and IglooALM. On predicting binding affinity of heavy chain variants, IglooLM outperforms the base protein language model on 8 out of 10 antibody-antigen targets. Additionally, it is on par with existing state-of-the-art sequence-based and multimodal protein language models, performing comparably to models with $7\times$ more parameters. IglooALM samples antibody loops which are diverse in sequence and more consistent in structure than state-of-the-art antibody inverse folding models. Igloo demonstrates the benefit of introducing multimodal tokens for antibody loops for encoding the diverse landscape of antibody loops, improving protein foundation models, and for antibody CDR design.

2506.01969 2026-06-03 cs.DC cs.AI cs.LG

FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs

FlashMLA-ETAP:用于加速NVIDIA H20 GPU上MLA推理的高效转置注意力流水线

Pengcuo Dege, Qiuming Luo, Rui Mao, Chang Kong

发表机构 * Tencent(腾讯) College of Computer Science and Software Engineering, Shenzhen University(深圳大学计算机科学与软件工程学院) College of Artificial Intelligence, Shenzhen Polytechnic University(深圳职业技术大学人工智能学院)

AI总结 针对在单台多GPU服务器上部署DeepSeek-R1 671B模型时多头潜在注意力(MLA)推理效率低的问题,提出FlashMLA-ETAP框架,通过高效转置注意力流水线(ETAP)重配置注意力计算,在NVIDIA H20 GPU上相比FlashMLA实现2.78倍加速,并保持数值稳定性。

Comments Accepted by ICONIP2025

详情
AI中文摘要

多头潜在注意力(MLA)的高效推理面临在单台多GPU服务器上部署DeepSeek-R1 671B模型的挑战。本文介绍FlashMLA-ETAP,一种新颖的框架,用于增强NVIDIA H20 GPU上单实例部署场景的MLA推理。我们提出了高效转置注意力流水线(ETAP),通过转置重新配置注意力计算,使KV上下文长度与WGMMA操作中的\(M\)维度对齐,显著减少冗余计算。FlashMLA-ETAP在64K序列长度(批大小16)下比FlashMLA加速2.78倍,比FlashAttention-3和FlashInfer分别提升5.24倍和4.94倍,同时保持数值稳定性,均方根误差(RMSE)比FlashAttention-3低15.2倍(\(1.25 \times 10^{-5}\))。此外,ETAP的设计能够无缝集成到FlashAttention-3和FlashInfer等框架中,并有详细的理论分析支持。我们的工作解决了资源受限推理中的一个关键空白,为中端GPU提供了可扩展的解决方案,并为硬件感知优化的更广泛采用铺平了道路。代码可在https://github.com/pengcuo/FlashMLA-ETAP获取。

英文摘要

Efficient Multi-Head Latent Attention (MLA) inference becomes a challenge when the DeepSeek-R1 671B model is deployed on a single multi-GPU server. This paper introduces FlashMLA-ETAP, a novel framework that enhances MLA inference for the single-instance deployment scenario on NVIDIA H20 GPUs. We propose the Efficient Transpose Attention Pipeline (ETAP), which reconfigures attention computation through transposition to align the KV context length with the \(M\)-dimension in WGMMA operations, significantly reducing redundant computations. FlashMLA-ETAP achieves a 2.78x speedup over FlashMLA at 64K sequence length (batch size 16), with 5.24x and 4.94x improvements over FlashAttention-3 and FlashInfer, respectively, while maintaining numerical stability with a 15.2x lower RMSE (\(1.25 \times 10^{-5}\)) than FlashAttention-3. Furthermore, ETAP's design enables seamless integration into frameworks like FlashAttention-3 and FlashInfer, supported by a detailed theoretical analysis. Our work addresses a critical gap in resource-constrained inference, offering a scalable solution for mid-tier GPUs and paving the way for broader adoption in hardware-aware optimization. Code is available at https://github.com/pengcuo/FlashMLA-ETAP.
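The transposition mentioned in the abstract rests on a simple identity, sketched below for a single head with the softmax scaling factor omitted. The symbols $n_q$ (number of query rows) and $n_{kv}$ (KV context length) are assumed notation, and this is one reading of the abstract rather than the paper's derivation.

```latex
S = Q K^\top \in \mathbb{R}^{n_q \times n_{kv}}, \qquad
P = \operatorname{softmax}_{\text{rows}}(S), \qquad
O = P V

S^\top = K Q^\top \in \mathbb{R}^{n_{kv} \times n_q}, \qquad
O^\top = V^\top P^\top
```

In the transposed form the long KV dimension $n_{kv}$ becomes the leading (row) dimension of the first product, so a matrix instruction with a fixed tile size along its $M$ dimension is filled by the context length instead of by the typically small number of decode-time query rows.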

2506.01075 2026-06-03 cs.DS cs.IT cs.LG math.IT

Learning DNF through Generalized Fourier Representations

通过广义傅里叶表示学习DNF

Mohsen Heidari, Roni Khardon

发表机构 * Department of Computer Sciences, Indiana University, Bloomington, IN, USA(印第安纳大学计算机科学系,美国印第安纳州布卢明顿)

AI总结 针对非乘积分布下DNF学习难题,引入基于贝叶斯网络的广义傅里叶表示,证明合取式的L1谱范数有界性,实现DNF和决策树的可学习性。

Comments 60 pages

详情
AI中文摘要

布尔傅里叶表示在学习理论中被广泛使用,特别是在均匀分布和乘积分布下学习析取范式(DNF)。将这些结果扩展到非乘积分布一直是一个长期未解决的开放问题。我们通过引入一种广义傅里叶表示来应对这一挑战,该表示能够在广泛的一类非乘积分布下进行学习。我们的方法将任意分布$D$表示为贝叶斯网络(BN),并推导出相应的傅里叶展开。我们证明了使用成员查询来识别重系数的标准基于傅里叶的学习技术可以通过少量修改适应于这种广义表示。我们证明了对于差分有界树BN,合取式的$L_1$谱范数在这种展开下保持有界,显著推广了均匀分布的已知结果;匹配的下界证明了这些约束的必要性。利用这些结果,我们建立了DNF的可学习性以及决策树在此类分布下的不可知学习性。最后,我们提出了一种学习差分有界树BN分布的算法,将我们的结果扩展到分布未知的场景。

英文摘要

The Boolean Fourier representation has been widely used in learning theory, particularly for learning Disjunctive Normal Form (DNF) under uniform and product distributions. Extending these results to non-product distributions has remained a longstanding open problem. We address this challenge by introducing a generalized Fourier representation that enables learning under a broad class of non-product distributions. Our approach represents any distribution $D$ as a Bayesian network (BN) and derives a corresponding Fourier expansion. We show that standard Fourier-based learning techniques using membership queries to identify heavy coefficients can be adapted to this generalized representation with minor modifications. We prove that the $L_1$ spectral norm of conjunctions remains bounded under this expansion for difference-bounded tree BNs, significantly generalizing the known result for uniform distributions; matching lower bounds demonstrate the necessity of these constraints. Using these results, we establish the learnability of DNF and the agnostic learnability of decision trees under such distributions. Finally, we present an algorithm for learning difference-bounded tree BN distributions, extending our results to settings where the distribution is unknown.

2502.13713 2026-06-03 cs.IR cs.SD eess.AS

TALKPLAY: Multimodal Music Recommendation with Large Language Models

TALKPLAY: 基于大语言模型的多模态音乐推荐

Seungheon Doh, Keunwoo Choi, Juhan Nam

发表机构 * KAIST(韩国科学技术院) talkpl.ai

AI总结 提出TALKPLAY系统,通过将推荐转化为token生成问题,利用大语言模型处理多模态音乐特征,实现端到端对话式推荐,显著优于单模态方法。

详情
AI中文摘要

我们提出TALKPLAY,一种新颖的多模态音乐推荐系统,它将推荐重新表述为使用大语言模型(LLM)的token生成问题。通过利用LLM的指令遵循和自然语言生成能力,我们的系统能够从多样化的用户查询中有效推荐音乐,同时生成上下文相关的响应。虽然预训练的LLM主要设计用于文本模态,但TALKPLAY通过两个关键创新扩展了其范围:一个多模态音乐分词器,用于编码音频特征、歌词、元数据、语义标签和播放列表共现信号;以及一个词汇扩展机制,能够统一处理和生成语言和音乐相关的token。通过将推荐系统直接集成到LLM架构中,TALKPLAY通过以下方式改造传统系统:(1)将先前的两阶段对话推荐系统(推荐引擎和对话管理器)统一为连贯的端到端系统,(2)有效利用长对话上下文进行推荐,同时在扩展的多轮交互中保持强劲性能,以及(3)生成自然语言响应以实现无缝的用户交互。我们的定性和定量评估表明,TALKPLAY在推荐性能和对话自然度方面显著优于仅基于文本或收听历史的单模态方法。

英文摘要

We present TALKPLAY, a novel multimodal music recommendation system that reformulates recommendation as a token generation problem using large language models (LLMs). By leveraging the instruction-following and natural language generation capabilities of LLMs, our system effectively recommends music from diverse user queries while generating contextually relevant responses. While pretrained LLMs are primarily designed for text modality, TALKPLAY extends their scope through two key innovations: a multimodal music tokenizer that encodes audio features, lyrics, metadata, semantic tags, and playlist co-occurrence signals; and a vocabulary expansion mechanism that enables unified processing and generation of both linguistic and music-relevant tokens. By integrating the recommendation system directly into the LLM architecture, TALKPLAY transforms conventional systems by: (1) unifying previous two-stage conversational recommendation systems (recommendation engines and dialogue managers) into a cohesive end-to-end system, (2) effectively utilizing long conversational context for recommendation while maintaining strong performance in extended multi-turn interactions, and (3) generating natural language responses for seamless user interaction. Our qualitative and quantitative evaluation demonstrates that TALKPLAY significantly outperforms unimodal approaches based solely on text or listening history in both recommendation performance and conversational naturalness.

2505.07068 2026-06-03 stat.ML cs.LG math.DS

A Sparse Bayesian Learning Algorithm for Estimation of Interaction Kernels in Motsch-Tadmor Model

Motsch-Tadmor模型中交互核估计的稀疏贝叶斯学习算法

Jinchao Feng, Sui Tang

发表机构 * Department of Mathematics, Great Bay University(大湾区大学数学系) Department of Mathematics, University of California, Santa Barbara(加州大学圣芭芭拉分校数学系)

AI总结 针对Motsch-Tadmor模型中非对称交互核的估计问题,提出一种基于变分框架和稀疏贝叶斯学习的算法,实现核函数的鲁棒识别与不确定性量化。

Comments 23 pages

详情
AI中文摘要

本文基于观测轨迹数据,研究Motsch-Tadmor模型中非对称交互核的数据驱动辨识。所考虑的模型由一类半线性演化方程控制,其中交互核定义了一个归一化的、依赖于状态的拉普拉斯算子,该算子支配集体动力学。为了解决由此产生的非线性逆问题,我们提出一个变分框架,利用控制方程的隐式形式重新表述核辨识问题,将其简化为子空间辨识问题。我们建立了一个可辨识性结果,刻画了交互核在尺度意义下可唯一恢复的条件。为了鲁棒地求解逆问题,我们开发了一种稀疏贝叶斯学习算法,该算法引入信息先验进行正则化,量化不确定性,并实现原则性的模型选择。在代表性交互粒子系统上的大量数值实验表明,所提出的框架在不同噪声水平和数据范围内具有准确性、鲁棒性和可解释性。

英文摘要

In this paper, we investigate the data-driven identification of asymmetric interaction kernels in the Motsch-Tadmor model based on observed trajectory data. The model under consideration is governed by a class of semilinear evolution equations, where the interaction kernel defines a normalized, state-dependent Laplacian operator that governs collective dynamics. To address the resulting nonlinear inverse problem, we propose a variational framework that reformulates kernel identification using the implicit form of the governing equations, reducing it to a subspace identification problem. We establish an identifiability result that characterizes conditions under which the interaction kernel can be uniquely recovered up to scale. To solve the inverse problem robustly, we develop a sparse Bayesian learning algorithm that incorporates informative priors for regularization, quantifies uncertainty, and enables principled model selection. Extensive numerical experiments on representative interacting particle systems demonstrate the accuracy, robustness, and interpretability of the proposed framework across a range of noise levels and data regimes.

2502.03139 2026-06-03 astro-ph.CO astro-ph.IM cs.LG

Fast Sampling of Cosmological Initial Conditions with Gaussian Neural Posterior Estimation

基于高斯神经后验估计的宇宙学初始条件快速采样

Oleg Savchenko, Guillermo Franco Abellán, Florian List, Noemi Anau Montel, Christoph Weniger

发表机构 * GRAPPA Institute, Institute for Theoretical Physics Amsterdam, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands(阿姆斯特丹大学GRAPPA研究所、阿姆斯特丹理论物理研究所,荷兰阿姆斯特丹科学园904号,1098 XH) Department of Astrophysics, University of Vienna, Türkenschanzstraße 17, 1180 Vienna, Austria(维也纳大学天体物理学系,奥地利维也纳Türkenschanzstraße 17号,1180)

AI总结 提出一种基于模拟推理的方法,通过高斯后验建模和神经网络估计,实现从晚期观测数据快速重建宇宙初始密度场,比现有方法快数个数量级。

Comments 9 + 2 pages, 7 figures, 1 table. Comments welcome!

详情
Journal ref
Mon Not R Astron Soc (2026)
AI中文摘要

了解宇宙大尺度结构在宇宙时间中形成的原初物质密度场对宇宙学至关重要。然而,从晚期观测重建这些宇宙学初始条件是一项著名的困难任务,需要先进的宇宙学模拟器和复杂的统计方法来探索数百万维的参数空间。我们展示了如何利用基于模拟的推理(SBI)来解决这个问题,并以模拟高效的方式使用通用的不可微模拟器获得数据约束的原初暗物质密度场实现。我们的方法适用于完整的高分辨率暗物质$N$体模拟,并基于将约束初始条件的后验分布建模为傅里叶空间中对角协方差矩阵的高斯分布。因此,我们可以在单个GPU上几秒内生成数千个后验样本,比现有方法快数个数量级,为宇宙学场的顺序SBI铺平了道路。此外,我们对协方差与波数的依赖关系进行了解析拟合,有效地将任何初始条件的点估计器转化为快速采样器。我们通过汇总统计将获得的样本与真实值进行比较,并执行贝叶斯一致性检验,验证了样本的有效性。

英文摘要

Knowledge of the primordial matter density field from which the large-scale structure of the Universe emerged over cosmic time is of fundamental importance for cosmology. However, reconstructing these cosmological initial conditions from late-time observations is a notoriously difficult task, which requires advanced cosmological simulators and sophisticated statistical methods to explore a multi-million-dimensional parameter space. We show how simulation-based inference (SBI) can be used to tackle this problem and to obtain data-constrained realisations of the primordial dark matter density field in a simulation-efficient way with general non-differentiable simulators. Our method is applicable to full high-resolution dark matter $N$-body simulations and is based on modelling the posterior distribution of the constrained initial conditions to be Gaussian with a diagonal covariance matrix in Fourier space. As a result, we can generate thousands of posterior samples within seconds on a single GPU, orders of magnitude faster than existing methods, paving the way for sequential SBI for cosmological fields. Furthermore, we perform an analytical fit of the estimated dependence of the covariance on the wavenumber, effectively transforming any point-estimator of initial conditions into a fast sampler. We test the validity of our obtained samples by comparing them to the true values with summary statistics and performing a Bayesian consistency test.

2501.02173 2026-06-03 cs.IR cs.LG

The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit

效率与准确性的权衡:使用多头早期退出优化RAG增强的LLM推荐系统

Huixue Zhou, Hengrui Gu, Xi Liu, Kaixiong Zhou, Mingfu Liang, Yongkang Xiao, Srinivas Govindan, Piyush Chawla, Jiyan Yang, Xiangfei Meng, Huayu Li, Buyun Zhang, Liang Luo, Wen-Yen Chen, Yiping Han, Bo Long, Rui Zhang, Tianlong Chen

发表机构 * Meta Platforms(Meta平台) University of Minnesota(明尼苏达大学) NCSU(北卡罗来纳州立大学) UNC at Chapel Hill(北卡罗来纳大学教堂山分校)

AI总结 提出结合检索增强生成(RAG)与多头早期退出架构的优化框架,通过图卷积网络(GCN)高效检索和动态推理终止,在降低计算时间的同时保持或提升点击率(CTR)预测准确性。

详情
AI中文摘要

在推荐系统中部署大型语言模型(LLM)以预测点击率(CTR)需要在计算效率和预测准确性之间取得微妙的平衡。本文提出一个优化框架,结合检索增强生成(RAG)与创新的多头早期退出架构,同时增强这两个方面。通过集成图卷积网络(GCN)作为高效检索机制,我们能够显著减少数据检索时间,同时保持高模型性能。采用的早期退出策略允许动态终止模型推理,利用跨多个头的实时预测置信度评估。这不仅加快了LLM的响应速度,还维持或提高了其准确性,使其非常适合实时应用场景。我们的实验表明,该架构有效减少了计算时间,而不牺牲可靠推荐交付所需的准确性,为商业系统中高效、实时的LLM部署建立了新标准。

英文摘要

The deployment of Large Language Models (LLMs) in recommender systems for predicting Click-Through Rates (CTR) necessitates a delicate balance between computational efficiency and predictive accuracy. This paper presents an optimization framework that combines Retrieval-Augmented Generation (RAG) with an innovative multi-head early exit architecture to concurrently enhance both aspects. By integrating Graph Convolutional Networks (GCNs) as efficient retrieval mechanisms, we are able to significantly reduce data retrieval times while maintaining high model performance. The early exit strategy employed allows for dynamic termination of model inference, utilizing real-time predictive confidence assessments across multiple heads. This not only quickens the responsiveness of LLMs but also upholds or improves their accuracy, making it ideal for real-time application scenarios. Our experiments demonstrate how this architecture effectively decreases computation time without sacrificing the accuracy needed for reliable recommendation delivery, establishing a new standard for efficient, real-time LLM deployment in commercial systems.
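The early-exit idea described above can be sketched as a loop that stops as soon as an intermediate prediction head is confident enough. This is a toy illustration under stated assumptions: one exit head per layer, a softmax-maximum confidence score, and a fixed `threshold` are all choices made here for clarity, and the function names are hypothetical rather than taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(hidden, layers, exit_heads, threshold=0.9):
    """Run layers in order and stop once an exit head is confident enough.

    `layers` and `exit_heads` are equal-length lists of callables. Each head
    maps the current hidden state to class logits (for example click versus
    no click). If no head reaches `threshold`, the last head's prediction is
    returned. Returns (predicted_class, confidence, layers_used).
    """
    if not layers or len(layers) != len(exit_heads):
        raise ValueError("need at least one layer and one exit head per layer")
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        hidden = layer(hidden)
        probs = softmax(head(hidden))
        confidence = max(probs)
        if confidence >= threshold:
            break  # skip the remaining, more expensive layers
    return probs.index(confidence), confidence, depth
```

The returned `layers_used` makes the efficiency side of the trade-off measurable: lowering `threshold` exits earlier on more inputs, at the risk of accepting less reliable predictions.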

2412.17484 2026-06-03 cs.DC cs.AI

Power- and Fragmentation-aware Online Scheduling for GPU Datacenters

面向GPU数据中心的功耗与碎片感知在线调度

Francesco Lettich, Emanuele Carlini, Franco Maria Nardini, Raffaele Perego, Salvatore Trani

发表机构 * Istituto di Scienza e Tecnologie dell’Informazione "Alessandro Faedo", Consiglio Nazionale delle Ricerche(意大利国家研究委员会“亚历山德罗·法埃多”信息科学与技术研究所)

AI总结 针对GPU数据中心在线调度问题,提出PWR调度策略,结合碎片梯度下降(FGD)方法,在降低功耗和最小化GPU碎片之间取得平衡。

Comments This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

人工智能和大语言模型的兴起推动了数据中心中GPU在复杂训练和推理任务中的使用增加,影响了大规模计算基础设施的运营成本、能源需求和环境足迹。本文解决了GPU数据中心中的在线调度问题,即在不知道任务未来到达时间的情况下进行调度。我们关注两个目标:最小化GPU碎片和降低功耗。当数据中心接近满容量时,部分GPU分配会阻碍剩余资源的有效利用,从而产生GPU碎片。最近的调度策略FGD(碎片梯度下降)利用碎片度量来解决这个问题。由于GPU的功耗需求巨大,降低功耗也至关重要。为此,我们提出了PWR,一种新颖的调度策略,通过选择功耗高效的GPU和CPU组合来最小化功耗。这涉及到一个简化的功耗测量模型,该模型集成到Kubernetes评分插件中。通过在模拟集群中的广泛实验评估,我们展示了PWR与FGD结合时,如何在降低功耗和最小化GPU碎片之间实现平衡的权衡。

英文摘要

The rise of Artificial Intelligence and Large Language Models is driving increased GPU usage in data centers for complex training and inference tasks, impacting operational costs, energy demands, and the environmental footprint of large-scale computing infrastructures. This work addresses the online scheduling problem in GPU datacenters, which involves scheduling tasks without knowledge of their future arrivals. We focus on two objectives: minimizing GPU fragmentation and reducing power consumption. GPU fragmentation occurs when partial GPU allocations hinder the efficient use of remaining resources, especially as the datacenter nears full capacity. A recent scheduling policy, Fragmentation Gradient Descent (FGD), leverages a fragmentation metric to address this issue. Reducing power consumption is also crucial due to the significant power demands of GPUs. To this end, we propose PWR, a novel scheduling policy to minimize power usage by selecting power-efficient GPU and CPU combinations. This involves a simplified model for measuring power consumption integrated into a Kubernetes score plugin. Through an extensive experimental evaluation in a simulated cluster, we show how PWR, when combined with FGD, achieves a balanced trade-off between reducing power consumption and minimizing GPU fragmentation.

2406.10407 2026-06-03 math.OC cs.LG cs.NA math.NA

Suboptimality bounds for trace-bounded SDPs enable a faster and scalable low-rank SDP solver SDPLR+

迹有界半定规划的次优性界实现更快且可扩展的低秩SDP求解器SDPLR+

Yufan Huang, David F. Gleich

发表机构 * Purdue University(普渡大学)

AI总结 本文利用迹有界半定规划的次优性界改进Burer-Monteiro的低秩SDP求解器SDPLR,提出SDPLR+,通过动态调整秩并跟踪原始不可行性和次优性,实现更快的求解和更好的可扩展性。

Comments 31 pages, 12 figures

详情
AI中文摘要

半定规划(SDP)及其求解器是机器学习和数据科学中许多应用的有力工具。设计可扩展的SDP求解器具有挑战性,因为在标准形式下正半定决策变量是一个$n \times n$的稠密矩阵,尽管输入通常是$n \times n$的稀疏矩阵。然而,如Barvinok和Pataki所示,解可能不需要满秩矩阵。二十年前,Burer和Monteiro开发了SDP求解器SDPLR,它在低秩分解上而不是完整矩阵上进行优化。这大大降低了存储成本,并且对许多问题效果良好。原始求解器SDPLR仅跟踪解的原始不可行性,阻止了在中等精度下的提前终止。我们利用迹有界SDP问题的次优性界,使我们能够更好地跟踪进展并执行提前终止。然后我们开发了SDPLR+,它以极低秩分解开始优化,并基于原始不可行性和次优性动态更新秩。这进一步加速了计算并节省了存储。在Max Cut、Minimum Bisection、Cut Norm和Lovász Theta问题上与许多近期的内存高效可扩展SDP求解器的数值比较展示了SDPLR+在决策变量达到百万乘百万规模问题上的可扩展性。它通常是达到中等精度$10^{-2}$的最快求解器。在$\mu$-电导、矩阵补全和$k$-均值聚类上的进一步实验显示了SDPLR+在更广泛数据科学应用中的潜力。

英文摘要

Semidefinite programs (SDPs) and their solvers are powerful tools with many applications in machine learning and data science. Designing scalable SDP solvers is challenging because, in the standard formulation, the positive semidefinite decision variable is an $n \times n$ dense matrix, even though the input is often an $n \times n$ sparse matrix. However, the solution may not require a full-rank matrix, as shown by Barvinok and Pataki. Two decades ago, Burer and Monteiro developed an SDP solver, SDPLR, that optimizes over a low-rank factorization instead of the full matrix. This greatly decreases the storage cost and works well for many problems. The original solver SDPLR tracks only the primal infeasibility of the solution, preventing early termination at moderate accuracy. We use a suboptimality bound for trace-bounded SDP problems that enables us to track the progress better and perform early termination. We then develop SDPLR+, which starts the optimization with an extremely low-rank factorization and dynamically updates the rank based on the primal infeasibility and suboptimality. This further speeds up the computation and saves storage. Numerical comparisons on Max Cut, Minimum Bisection, Cut Norm, and Lovász Theta problems with many recent memory-efficient scalable SDP solvers demonstrate the scalability of SDPLR+ up to problems with million-by-million decision variables. It is often the fastest solver to a moderate accuracy of $10^{-2}$. Further experiments on $\mu$-conductance, matrix completion, and $k$-means clustering show the potential of SDPLR+ on a broader range of data science applications.
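The low-rank factorization that the abstract attributes to Burer and Monteiro can be written out as follows. The standard-form notation ($C$, the linear map $\mathcal{A}$, the right-hand side $b \in \mathbb{R}^{m}$, and the factor $Y$) is assumed here for illustration and is not quoted from the paper, and the trace bound and the suboptimality bound used by SDPLR+ are not reproduced.

```latex
\min_{X \succeq 0} \ \langle C, X \rangle
\quad \text{s.t.} \quad \mathcal{A}(X) = b
\qquad \longrightarrow \qquad
\min_{Y \in \mathbb{R}^{n \times r}} \ \langle C, Y Y^\top \rangle
\quad \text{s.t.} \quad \mathcal{A}(Y Y^\top) = b
```

Substituting $X = YY^\top$ makes the positive semidefinite constraint implicit and cuts storage from $n^2$ to $nr$ entries, at the price of a nonconvex problem. The Barvinok-Pataki result cited in the abstract is what justifies a small $r$: when a solution exists, one can be found with rank $r$ satisfying $r(r+1)/2 \le m$, where $m$ is the number of equality constraints.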

2606.03992 2026-06-03 cs.CV cs.RO

Exploring Easy Boosts for Lidar Semantic Scene Completion

探索激光雷达语义场景补全的简易提升方法

Tetiana Martyniuk, Jonathan Seele, Alexandre Boulch, Gilles Puy, Renaud Marlet, Raoul de Charette

AI总结 本文研究无需复杂架构重设计的“免费午餐”策略,通过为输入点云添加语义伪标签和可见性信息,显著提升激光雷达语义场景补全性能,使旧模型与最先进系统竞争甚至超越。

Comments Accepted to ICIP 2026

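Details
AI abstract (translated from the Chinese)

This paper investigates "free lunch" strategies for improving lidar semantic scene completion (SSC) without complex architectural redesign. We first show that assigning semantic pseudo-labels to the input point cloud with an off-the-shelf segmentor significantly improves existing architectures. By evaluating these models against an oracle, we establish that high-quality semantic priors are the main driver of mIoU gains. In addition, we equip the input lidar scan with visibility information that distinguishes empty regions from unknown ones, which gives the tested architectures a secondary performance boost. With these simple enhancements, we observe that older models remain competitive with state-of-the-art systems and can even surpass them. Our code is available at https://github.com/astra-vision/SSC-Priors.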

English abstract

This paper investigates "free lunch" strategies to boost the performance of lidar semantic scene completion (SSC) without requiring complex architectural redesigns. We first demonstrate that endowing input point clouds with semantic pseudo-labels from off-the-shelf segmentors significantly improves the performance of existing architectures. By evaluating these models against an oracle, we establish that high-quality semantic priors are a primary driver of mIoU gains. Furthermore, we equip the input lidar scan with visibility information that distinguishes between empty and unknown spaces, which provides a secondary performance boost across the tested architectures. Using these simple enhancements, we observe that older models remain competitive with state-of-the-art systems, and can even outperform them. Our code is available at https://github.com/astra-vision/SSC-Priors.
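The distinction between empty and unknown space can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function `visibility_labels`, the dictionary-based voxel grid, and the half-voxel sampling step are all hypothetical choices. Voxels crossed by a ray before its return are marked empty, the voxel containing the return is marked occupied, and any voxel no ray reaches stays unknown.

```python
import math

EMPTY, OCCUPIED = "empty", "occupied"  # voxels absent from the dict are unknown


def visibility_labels(sensor, points, voxel_size=1.0):
    """Label voxels along each sensor-to-point ray (sampling-based sketch).

    sensor: (x, y, z) position of the lidar.
    points: iterable of (x, y, z) returns.
    Returns a dict mapping integer voxel indices to EMPTY or OCCUPIED.
    """
    labels = {}
    step = 0.5 * voxel_size  # assumed sampling step; may miss corner-clipped voxels

    def cell(p):
        return tuple(math.floor(c / voxel_size) for c in p)

    for point in points:
        hit = cell(point)
        delta = [p - s for p, s in zip(point, sensor)]
        dist = math.sqrt(sum(d * d for d in delta))
        n_steps = int(dist / step) if dist > 0 else 0
        for s in range(n_steps + 1):
            t = s * step / dist if dist > 0 else 0.0
            sample = [o + t * d for o, d in zip(sensor, delta)]
            c = cell(sample)
            # Free space seen by the ray; never overwrite an existing label.
            if c != hit and c not in labels:
                labels[c] = EMPTY
        labels[hit] = OCCUPIED  # the return itself marks a surface
    return labels
```

Fixed-step sampling is only an approximation. An exact voxel traversal would visit every voxel the ray crosses, at the cost of slightly more code.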

2606.03982 2026-06-03 cs.CL

Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics

Mutsumi Sasaki, Go Kamoda, Ryosuke Takahashi, Kosuke Sato, Kentaro Inui, Keisuke Sakaguchi, Benjamin Heinzerling

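AI summary Through controlled experiments, this study finds that when language models compare quantities with units, they do not perform an exact scale conversion but instead rely on heuristics based on numerical differences and unit-scale differences, which leads to systematic errors near the comparison boundary.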

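Details
AI abstract (translated from the Chinese)

Quantities with measurement units, such as 110 cm and 1.2 m, require language models (LMs) to combine a numeral with a symbolic unit scale. Here we study how LMs compare such quantities in controlled settings spanning several unit systems. We find that accuracy drops near the comparison boundary, where small changes in value determine the correct answer. The resulting errors are systematic: linear surrogate models predict LM preferences from numerical-difference and unit-scale-difference cues, and causal interventions on subspaces aligned with these variables change the model's output. The results suggest that LMs compare quantities through a collection of heuristics over numerals and units, rather than first converting both expressions into an exact shared-scale representation.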

English abstract

Quantities with measurement units, such as 110 cm and 1.2 m, require language models (LMs) to combine a numeral with a symbolic unit scale. Here, we study how LMs compare such quantities in controlled settings spanning several unit systems. We find that accuracy degrades near the comparison boundary, where small changes in value determine the correct answer. The resulting errors are systematic: linear surrogate models predict LM preferences from numerical-difference and unit-scale-difference cues, and causal interventions on subspaces aligned with these variables shift the model's output. The results suggest that LMs compare quantities through a bag of heuristics over numerals and units, rather than first converting both expressions to an exact shared-scale representation.
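The contrast between an exact shared-scale comparison and a cue-based heuristic can be sketched as follows. Everything here is an assumption for illustration: the conversion table `TO_METRES`, the feature definitions, and the weights in `linear_cue_score` are hypothetical and are not the surrogate model fitted in the paper.

```python
import math
from fractions import Fraction

# Hypothetical conversion table: scale of each unit relative to one metre.
TO_METRES = {
    "mm": Fraction(1, 1000),
    "cm": Fraction(1, 100),
    "m": Fraction(1),
    "km": Fraction(1000),
}


def exact_compare(a, b):
    """Compare (numeral, unit) pairs after exact conversion to metres.

    Returns -1 if a < b, 0 if equal, 1 if a > b.
    """
    va = Fraction(a[0]) * TO_METRES[a[1]]
    vb = Fraction(b[0]) * TO_METRES[b[1]]
    return (va > vb) - (va < vb)


def linear_cue_score(a, b, w_num=0.01, w_unit=1.0):
    """Toy linear score over two cues; a positive score prefers a.

    The cues are the raw numeral difference and the log10 unit-scale
    difference. The weights are arbitrary, not fitted to any model.
    """
    num_diff = float(a[0]) - float(b[0])
    unit_diff = (
        math.log10(float(TO_METRES[a[1]])) - math.log10(float(TO_METRES[b[1]]))
    )
    return w_num * num_diff + w_unit * unit_diff


print(exact_compare(("110", "cm"), ("1.2", "m")))    # exact conversion
print(linear_cue_score(("999", "m"), ("1", "km")))   # cue-based score
```

With these arbitrary weights, the cue score prefers 999 m over 1 km even though exact conversion says it is smaller. This is the kind of near-boundary disagreement an additive heuristic over numerals and units can produce.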