arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.16011 2026-05-18 cs.SE cs.AI cs.CL

FormulaCode: Evaluating Agentic Optimization on Large Codebases

Atharva Sehgal, James Hou, Akanksha Sarkar, Ishaan Mantripragada, Swarat Chaudhuri, Jennifer J. Sun, Yisong Yue

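AI summary: This paper presents FormulaCode, a benchmark for evaluating how well large language model (LLM) agents perform multi-objective optimization on large, real-world codebases. The benchmark is built from 957 performance bottlenecks mined from scientific Python repositories on GitHub, each paired with an expert-authored patch and a large set of community-maintained performance workloads, allowing a comprehensive evaluation of LLM optimization ability under correctness and performance constraints. Experiments show that state-of-the-art LLM agents still face significant challenges on large-scale, multi-objective optimization tasks.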

Comments Preprint version

Abstract

Large language model (LLM) coding agents increasingly operate at the repository level, motivating benchmarks that evaluate their ability to optimize entire codebases under realistic constraints. Existing code benchmarks largely rely on synthetic tasks, binary correctness signals, or single-objective evaluation, limiting their ability to assess holistic optimization behavior. We introduce FormulaCode, a benchmark for evaluating agentic optimization on large, real-world codebases with fine-grained, multi-objective performance metrics. FormulaCode comprises 957 performance bottlenecks mined from scientific Python repositories on GitHub, each paired with expert-authored patches and, on average, 264.6 community-maintained performance workloads per task, enabling holistic evaluation of the ability of LLM agents to optimize codebases under realistic correctness and performance constraints. Our evaluations reveal that repository-scale, multi-objective optimization remains a major challenge for frontier LLM agents. Project website at: https://formula-code.github.io

2603.13864 2026-05-18 cs.CR cs.CV

Inevitable Encounters: Backdoor Attacks Involving Lossy Compression

Qian Li, Yunuo Chen, Yuntian Chen

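AI summary: This paper studies how, in realistic settings, the unavoidable use of lossy compression during data storage and transmission weakens backdoor attacks. Because trigger information embedded in images can be lost during compression, the authors propose two poisoning strategies designed specifically for lossy compression, ensuring that trigger information can still be recovered after compression. Experiments show that both methods are effective across multiple compression schemes, offering a new approach to carrying out backdoor attacks in practice.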

Abstract

Real-world backdoor attacks often require poisoned datasets to be stored and transmitted before being used to compromise deep learning systems. However, in the era of big data, the inevitable use of lossy compression poses a fundamental challenge to invisible backdoor attacks. We find that triggers embedded in RGB images often become ineffective after the images are lossily compressed into binary bitstreams (e.g., JPEG files) for storage and transmission. As a result, the poisoned data lose their malicious effect after compression, causing backdoor injection to fail. In this paper, we highlight the necessity of explicitly accounting for the lossy compression process in backdoor attacks. This requires attackers to ensure that the transmitted binary bitstreams preserve malicious trigger information, so that effective triggers can be recovered in the decompressed data. Building on the region-of-interest (ROI) coding mechanism in image compression, we propose two poisoning strategies tailored to inevitable lossy compression. First, we introduce Universal Attack Activation, a universal method that uses sample-specific ROI masks to reactivate trigger information in binary bitstreams for learned image compression (LIC). Second, we present Compression-Adapted Attack, a new attack strategy that employs customized ROI masks to encode trigger information into binary bitstreams and is applicable to both traditional codecs and LIC. Extensive experiments demonstrate the effectiveness of both strategies.

2603.04459 2026-05-18 cs.CR cs.AI cs.SE

Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

Junjie Chu, Xinyue Shen, Ye Leng, Michael Backes, Yun Shen, Yang Zhang

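AI summary: This paper systematically evaluates the code quality and runnability of 31 LLM safety benchmarks, comparing them against 382 non-benchmark papers. It finds that most benchmark code needs modification before it will run, and that only a few benchmarks provide complete installation guides or ethical considerations. The authors report that benchmark adoption correlates with author prominence and code runnability rather than with code quality standards, revealing a potential bias in how the community selects benchmarks. In addition, some benchmarks pose safety risks: they may serve as attack resources and undermine the reliability of safety evaluations.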

Comments 24 pages. 19 figures

Abstract

The rapid expansion of research in LLM safety presents challenges in tracking advancements, making benchmarks important evaluation infrastructures for identifying key trends and facilitating systematic comparisons. Yet no systematic assessment exists of their code quality and runnability, nor of what factors are associated with the community's adoption of certain benchmarks over others. To address this gap, we conduct a systematic measurement study of 31 LLM safety benchmarks (covering prompt injection, jailbreak, and hallucination) with 382 non-benchmark papers as a control group, combining automated static analysis, human runnability testing (220+ person-hours), and bibliometric analysis. We find that only 39% of benchmark repositories can run without modification, only 16% provide flawless installation guides, and a mere 6% include ethical considerations despite containing potentially harmful content. These deficiencies persist across the study period with no significant improvement. Analyzing adoption factors, we find that benchmark adoption correlates with author prominence and code runnability, but not with code quality standards such as Pylint score and maintainability, suggesting that the community's benchmark selection does not reward higher coding standards. Based on these results, we identify potential safety and reliability concerns. Some safety benchmark repositories openly expose harmful content, such as successful jailbreak responses, without any ethical warning or access control, effectively serving as unguarded attack resources. Furthermore, when benchmarks require ad-hoc modifications to run, downstream safety evaluations across different papers may not be comparable. We present case studies illustrating these concrete consequences and propose a targeted checklist to help benchmark contributors improve code quality, documentation, and ethical practices.

2602.14342 2026-05-18 math.ST cs.DS cs.LG math.PR stat.TH

High-accuracy log-concave sampling with stochastic queries

Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin

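AI summary: This paper studies how to obtain high-accuracy guarantees for log-concave sampling, showing that stochastic gradients with subexponential tails suffice for iteration and query complexities that scale as $\mathrm{poly}\log(1/δ)$. This contrasts with convex optimization, where stochasticity in the gradient requires $\mathrm{poly}(1/δ)$ queries. The paper also gives an information-theoretic argument that light-tailed stochastic gradients are necessary for high-accuracy sampling, extends the guarantees to stochastic zeroth-order queries, and provides an improved complexity result for sampling from finite-sum potentials.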

Abstract

We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/δ)$, where $δ$ is the desired target accuracy -- are achievable using stochastic gradients with subexponential tails. Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the gradient oracle incurs $\mathrm{poly}(1/δ)$ queries. We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the bounded variance case, we show that the minimax-optimal query complexity scales as $Θ(1/δ)$. Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries, and an improved complexity result for sampling from finite-sum potentials.

2602.14092 2026-05-18 eess.SY cs.RO cs.SY

Simultaneous State Estimation and Online Model Learning in a Soft Robotic System

Jan-Hendrik Ewering, Max Bartholdt, Simon F. G. Ehlers, Niklas Wahlström, Thomas B. Schön, Thomas Seel

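AI summary: This paper addresses simultaneous state estimation and online model learning in a soft robotic system. The authors present an approach based on a gray-box system identification tool that needs only a nominal constant-curvature robot model and force measurements at the robot's base to estimate the soft robot's current pose while learning a bending stiffness model. Using a marginalized particle filter, the method couples the constant-curvature model with a Gaussian process model, which improves prediction accuracy and overall model quality, and it is validated in experiments on a real soft robot.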

Comments 8 pages, 3 figures, 2 tables, contribution to the International Conference on Information Fusion 2026

Abstract

Operating complex real-world systems, such as soft robots, can benefit from precise predictive control schemes that require accurate state and model knowledge. This knowledge is typically not available in practical settings and must be inferred from noisy measurements. In particular, it is challenging to simultaneously estimate unknown states and learn a model online from sequentially arriving measurements. In this paper, we show how a recently proposed gray-box system identification tool enables the estimation of a soft robot's current pose while at the same time learning a bending stiffness model. For estimation and learning, we only need a nominal constant-curvature robot model and measurements of the robot's base reactions (e.g., base forces). The estimation scheme -- relying on a marginalized particle filter -- allows us to conveniently interface nominal constant-curvature equations with a Gaussian Process (GP) bending stiffness model to be learned. This, in contrast to estimation via a random walk over stiffness values, enables prediction of bending stiffness and improves overall model quality. We demonstrate, using a real-world soft robot, that the method learns a bending-stiffness model online while accurately estimating the robot's pose. Notably, reduced error in multi-step forward predictions indicates that the learned bending-stiffness GP improves overall model quality.

2602.12292 2026-05-18 eess.SP cs.LG

A Gradient Boosted Mixed-Model Machine Learning Framework for Vessel Speed in the U.S. Arctic

Mauli Pant, Linda Fernandez, Indranil Sahoo

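AI summary: This paper examines how environmental and operational conditions affect vessel speed in the U.S. Arctic. Analyzing Automatic Identification System (AIS) data from 2010 to 2019, it applies a two-stage mixed-model machine learning framework that separately models the probability that speed is greater than zero and the speed conditional on being positive. The method combines gradient boosted decision trees with random effects, capturing nonlinear environmental responses while handling repeated observations. Distance to coast and water depth are the main determinants of vessel speed, while wind and sea ice have comparatively modest effects.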

Abstract

Understanding how environmental and operational conditions influence vessel speed is crucial for characterizing navigational conditions in the Arctic. We analyzed Automatic Identification System (AIS) data from 2010-2019 to examine vessel speed over ground (SOG). Over half of the AIS records showed zero SOG, and treating zero and positive SOG as a single continuous process can obscure important patterns. We therefore applied a two-stage machine learning framework, first modeling the probability of SOG greater than zero and then modeling SOG conditional on being positive. AIS observations were integrated with sea ice concentration, course over ground, wind, bathymetric depth, distance to coast, vessel group, and navigational status. Gradient boosted decision trees with random effects captured nonlinear environmental responses while accounting for repeated observations. The positive SOG classifier achieved strong discrimination (AUC = 0.85), while the conditional speed model explained approximately 77 percent of out-of-fold variance. SHAP values quantified covariate effects by decomposing model predictions into additive contributions from individual variables. Distance to coast and bathymetric depth were dominant determinants of both the likelihood and magnitude of vessel speed, while changes in course, vessel group, and navigational status introduced secondary variation. Wind and sea ice effects were modest. Together, these results empirically characterize Arctic vessel operating regimes relevant to speed management and corridor-level assessment.
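
The two-stage structure described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: per-group frequencies and means stand in for the gradient boosted models with random effects, and the names `fit_two_stage` and `expected_sog` are hypothetical.

```python
from collections import defaultdict


def fit_two_stage(records):
    """Fit a toy two-stage model from (group, sog) pairs.

    Stage 1: share of records with SOG > 0 in each group.
    Stage 2: mean SOG among the positive records of each group.
    Both stages are simple stand-ins for the boosted models in the abstract.
    """
    counts = defaultdict(int)
    positives = defaultdict(list)
    for group, sog in records:
        counts[group] += 1
        if sog > 0:
            positives[group].append(sog)

    model = {}
    for group, n in counts.items():
        moving = positives[group]
        p_moving = len(moving) / n
        cond_mean = sum(moving) / len(moving) if moving else 0.0
        model[group] = (p_moving, cond_mean)
    return model


def expected_sog(model, group):
    """Unconditional expected SOG: P(SOG > 0) times E[SOG | SOG > 0]."""
    p_moving, cond_mean = model[group]
    return p_moving * cond_mean


records = [("tug", 0.0), ("tug", 4.0), ("tug", 6.0), ("tug", 0.0)]
model = fit_two_stage(records)
print(model["tug"], expected_sog(model, "tug"))
```

Multiplying the two stages gives an unconditional expected speed. That is a general property of two-part models; the abstract does not say whether the authors report this quantity.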

2602.06824 2026-05-18 math.OC cs.LG

RanSOM: Second-Order Momentum with Randomized Scaling for Constrained and Unconstrained Optimization

El Mahdi Chayti

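AI summary: This paper proposes RanSOM, a unified optimization framework for constrained and unconstrained problems that aims to eliminate the curvature-induced bias of conventional momentum methods in stochastic settings. By replacing deterministic step sizes with steps drawn at random from specific distributions and applying Stein-type identities, the method obtains an exact, unbiased estimate of the momentum bias from a single Hessian-vector product, avoiding auxiliary sampling and restrictive smoothness assumptions. Theoretical analysis shows that RanSOM achieves the optimal $\mathcal{O}(ε^{-3})$ convergence rate under standard noise conditions and optimal rates under heavy-tailed noise.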

Abstract

Momentum methods, such as Polyak's Heavy Ball, are the standard for training deep networks but suffer from curvature-induced bias in stochastic settings, limiting convergence to suboptimal $\mathcal{O}(ε^{-4})$ rates. Existing corrections typically require expensive auxiliary sampling or restrictive smoothness assumptions. We propose RanSOM, a unified framework that eliminates this bias by replacing deterministic step sizes with randomized steps drawn from distributions with mean $η_t$. This modification allows us to leverage Stein-type identities to compute an exact, unbiased estimate of the momentum bias using a single Hessian-vector product computed jointly with the gradient, avoiding auxiliary queries. We instantiate this framework in two algorithms: RanSOM-E for unconstrained optimization (using exponentially distributed steps) and RanSOM-B for constrained optimization (using beta-distributed steps to strictly preserve feasibility). Theoretical analysis confirms that RanSOM recovers the optimal $\mathcal{O}(ε^{-3})$ convergence rate under standard bounded noise, and achieves optimal rates for heavy-tailed noise settings ($p \in (1, 2]$).
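
An identity of the kind the abstract alludes to can be written out for exponentially distributed steps. The following is a standard integration-by-parts fact; whether it is the exact identity used in RanSOM-E is an assumption, and the symbols $T$, $d$, and $g$ are introduced here only for illustration.

```latex
% Assumptions: T is exponential with mean \eta; g is differentiable and
% grows slowly enough that g(t) e^{-t/\eta} -> 0 as t -> \infty.
\mathbb{E}[g(T)]
  = \int_0^\infty g(t)\,\tfrac{1}{\eta}\,e^{-t/\eta}\,dt
  = g(0) + \int_0^\infty g'(t)\,e^{-t/\eta}\,dt
  = g(0) + \eta\,\mathbb{E}[g'(T)].

% Taking g(t) = \nabla f(x + t d) for a fixed direction d:
\mathbb{E}\big[\nabla f(x + T d)\big] - \nabla f(x)
  = \eta\,\mathbb{E}\big[\nabla^2 f(x + T d)\,d\big].
```

Read this way, a single Hessian-vector product evaluated at the randomly stepped point is an unbiased estimate of how much the gradient changes along the step, which is the kind of quantity a momentum correction needs.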

2602.01568 2026-05-18 cs.GT cs.RO

Efficiently Solving Mixed-Hierarchy Games with Quasi-Policy Approximations

Hamzah Khan, Dong Ho Lee, Jingqi Li, Tianyu Qiu, Christian Ellis, Jesse Milzman, Wesley Suttle, David Fridovich-Keil

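AI summary: This paper studies multi-robot games with mixed hierarchical structure, in which each robot acts as a Stackelberg leader over its subtree while robots in different branches interact via Nash equilibria. To address the difficulty that high-order derivatives create for solving such games, the authors propose a quasi-policy approximation and combine it with an inexact Newton method to solve the approximated KKT systems efficiently, proving local exponential convergence for non-quadratic objectives and nonlinear constraints. The method is evaluated in hardware and simulation, demonstrating real-time solution of complex mixed-hierarchy structures.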

Abstract

Multi-robot coordination often exhibits hierarchical structure, with some robots' decisions depending on the planned behaviors of others. While game theory provides a principled framework for such interactions, existing solvers struggle to handle mixed information structures that combine simultaneous (Nash) and hierarchical (Stackelberg) decision-making. We study N-robot forest-structured mixed-hierarchy games, in which each robot acts as a Stackelberg leader over its subtree while robots in different branches interact via Nash equilibria. We derive the Karush-Kuhn-Tucker (KKT) first-order optimality conditions for this class of games and show that they involve increasingly high-order derivatives of robots' best-response policies as the hierarchy depth grows, rendering a direct solution intractable. To overcome this challenge, we introduce a quasi-policy approximation that removes higher-order policy derivatives and develop an inexact Newton method for efficiently solving the resulting approximated KKT systems. We prove local exponential convergence of the proposed algorithm for games with non-quadratic objectives and nonlinear constraints. The approach is implemented in a highly optimized Julia library (MixedHierarchyGames.jl) and evaluated in hardware and simulated multi-agent experiments, demonstrating real-time convergence for complex mixed-hierarchy information structures.

2601.23030 2026-05-18 stat.ML cs.LG stat.ME

Neural Backward Filtering Forward Guiding

Gefan Yang, Frank van der Meulen, Stefan Sommer

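AI summary: This paper proposes Neural Backward Filtering Forward Guiding (NBFFG), a unified framework for inference in nonlinear continuous stochastic processes on trees, particularly when observations are sparse and the topology is complex. The method constructs a proxy linear-Gaussian process that yields a closed-form backward filter, which guides generated paths toward high-likelihood regions, and learns a neural residual to capture nonlinear discrepancies. This permits unbiased pathwise subsampling, which substantially reduces training complexity. Experiments show that NBFFG outperforms existing methods on synthetic benchmarks, and the method is demonstrated on a high-dimensional phylogenetic inference task.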

Abstract

Inference in nonlinear continuous stochastic processes on trees is challenging, particularly when observations are sparse and the topology is complex. Exact smoothing via Doob's $h$-transform is intractable for general nonlinear dynamics. We propose Neural Backward Filtering Forward Guiding (NBFFG), a unified framework for both discrete transitions and continuous diffusions. Our method constructs a variational posterior by leveraging a proxy linear-Gaussian process. This proxy process yields a closed-form backward filter that serves as a guide, steering the generative path toward high-likelihood regions. We then learn a neural residual to capture the non-linear discrepancies. This formulation allows for an unbiased pathwise subsampling scheme, reducing the training complexity from tree-size dependent to path-length dependent. Empirical results show that NBFFG outperforms baselines on synthetic benchmarks, and we demonstrate the method on a high-dimensional inference task in phylogenetic analysis with reconstruction of ancestral butterfly wing shapes.

2601.21028 2026-05-18 cs.CY cs.AI cs.HC

"Unlimited Realm of Exploration and Experimentation": Methods and Motivations of AI-Generated Sexual Content Creators

Jaron Mink, Lucy Qin, Elissa M. Redmiles

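AI summary: This paper studies the motivations, methods, and content types of creators of AI-generated sexual content (AIG-SC), revealing the diversity of their work, including sexual exploration, creative expression, and technical experimentation. Through in-depth interviews with 28 creators, the study examines the technical, ethical, and social implications of AIG-SC and offers evidence to inform policy.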

Abstract

AI-generated media is radically changing the way content is both consumed and produced on the internet, and in no place is this potentially more visible than in sexual content. AI-generated sexual content (AIG-SC) is increasingly enabled by an ecosystem of individual AI developers, specialized third-party applications, and foundation model providers. AIG-SC raises a number of concerns from older debates about the line between pornography and obscenity to newer debates about fair use and labor displacement (in this case, of sex workers), and has spurred new regulations to curb the spread of non-consensual intimate imagery (NCII) created using the same technology used to create AIG-SC. However, despite the growing prevalence of AIG-SC, little is known about its creators, their motivations, and what types of content they produce. To inform effective governance in this space, we conducted an in-depth study to understand what AIG-SC creators make, along with how and why they make it. Interviews with 28 AIG-SC creators, ranging from hobbyists to entrepreneurs to those who moderate communities of hundreds of thousands of other creators, revealed a wide spectrum of motivations, including sexual exploration, creative expression, technical experimentation, and in a handful of cases, the creation of NCII.

2601.00361 2026-05-18 cs.DS cs.LG

Deterministic Coreset for Lp Subspace

Rachit Chhaya, Anirban Dasgupta, Dan Feldman, Supratim Shit

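AI summary: This paper presents the first iterative algorithm for constructing an $\varepsilon$-coreset that guarantees a deterministic $\ell_p$ subspace embedding, for any $p \in [1, \infty)$ and $\varepsilon > 0$. The algorithm iteratively maintains a subset of rows whose loss is upper and lower bounded by the loss on the original dataset under appropriate scalings, which yields the deterministic embedding guarantee. The method removes the logarithmic factors from the coreset size, matching the lower bound, and can be used to approximately solve the $\ell_p$ regression problem deterministically.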

Comments The proofs of some claims are incomplete

Abstract

We introduce the first iterative algorithm for constructing an $\varepsilon$-coreset that guarantees deterministic $\ell_p$ subspace embedding for any $p \in [1,\infty)$ and any $\varepsilon > 0$. For a given full rank matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ where $n \gg d$, $\mathbf{X}' \in \mathbb{R}^{m \times d}$ is an $(\varepsilon,\ell_p)$-subspace embedding of $\mathbf{X}$, if for every $\mathbf{q} \in \mathbb{R}^d$, $(1-\varepsilon)\|\mathbf{Xq}\|_{p}^{p} \leq \|\mathbf{X'q}\|_{p}^{p} \leq (1+\varepsilon)\|\mathbf{Xq}\|_{p}^{p}$. Specifically, in this paper, $\mathbf{X}'$ is a weighted subset of rows of $\mathbf{X}$, which is commonly known in the literature as a coreset. In every iteration, the algorithm ensures that the loss on the maintained set is upper and lower bounded by the loss on the original dataset with appropriate scalings. So, unlike typical coreset guarantees, due to bounded loss, our coreset gives a deterministic guarantee for the $\ell_p$ subspace embedding. For an error parameter $\varepsilon$, our algorithm takes $O(\mathrm{poly}(n,d,\varepsilon^{-1}))$ time and returns a deterministic $\varepsilon$-coreset for $\ell_p$ subspace embedding of size $O\left(\frac{d^{\max\{1,p/2\}}}{\varepsilon^{2}}\right)$. Here, we remove the $\log$ factors in the coreset size, which had been a long-standing open problem. Our coresets are optimal as they are tight with the lower bound. As an application, our coreset can also be used for approximately solving the $\ell_p$ regression problem in a deterministic manner.
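
The embedding inequality above can be checked numerically for a given query vector. The sketch below only tests the definition; it does not construct a coreset. It assumes each coreset row carries a weight that multiplies its $|\mathbf{x}^\top \mathbf{q}|^p$ term, which is one common convention and not necessarily the paper's, and the function names are hypothetical.

```python
def lp_loss(rows, weights, q, p):
    """Weighted sum of |<row, q>|^p over the rows."""
    total = 0.0
    for row, w in zip(rows, weights):
        inner = sum(x_i * q_i for x_i, q_i in zip(row, q))
        total += w * abs(inner) ** p
    return total


def satisfies_embedding(X, coreset_rows, coreset_weights, q, p, eps):
    """Check (1 - eps) * ||Xq||_p^p <= ||X'q||_p^p <= (1 + eps) * ||Xq||_p^p for one q."""
    full = lp_loss(X, [1.0] * len(X), q, p)
    approx = lp_loss(coreset_rows, coreset_weights, q, p)
    return (1 - eps) * full <= approx <= (1 + eps) * full


# Toy data: the first two rows of X are identical, so one copy with weight 2
# reproduces the full loss exactly for every q.
X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
coreset = [[1.0, 0.0], [0.0, 1.0]]
print(satisfies_embedding(X, coreset, [2.0, 1.0], q=[3.0, -2.0], p=1, eps=0.1))
```

A true subspace embedding must satisfy the inequality for every $\mathbf{q} \in \mathbb{R}^d$, so a check on a finite set of query vectors can refute a candidate coreset but cannot certify it.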

2512.07946 2026-05-18 hep-th cs.LG

Conformal Defects in Neural Network Field Theories

Pietro Capuozzo, Brandon Robinson, Benjamin Suzzoni

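AI summary: This paper studies how to construct conformally invariant defects in neural network field theories (NN-FTs), presenting a formalism for introducing conformal defects into these theories. The formalism is demonstrated in two toy models of scalar field theories, and the authors develop a neural network interpretation of an expansion akin to the defect operator product expansion, providing a new tool for research at the intersection of conformal field theory and deep learning.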

Comments 23 pages, 1 figure

Journal ref
J. High Energy Phys. 05 (2026) 124
Abstract

Neural Network Field Theories (NN-FTs) represent a novel construction of arbitrary field theories, including those of conformal fields, through the specification of the network architecture and prior distribution for the network parameters. In this work, we present a formalism for the construction of conformally invariant defects in these NN-FTs. We demonstrate this new formalism in two toy models of NN scalar field theories. We develop an NN interpretation of an expansion akin to the defect OPE in two-point correlation functions in these models.

2512.04745 2026-05-18 math.OC cs.AI cs.SY eess.SY nlin.AO

Neural Policy Composition from Free Energy Minimization

Francesca Rossi, Veronica Centorrino, Francesco Bullo, Giovanni Russo

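AI summary: This paper studies how neural policy composition can arise from minimizing a variational free energy, introducing a normative framework that provides a principled and broadly applicable objective for composing policies. From this framework the authors derive a continuous-time gradient flow whose trajectories are guaranteed to converge, at an explicit rate, to the optimal composition, and show that the dynamics can be implemented by a soft-competitive recurrent circuit. Evaluations on multi-agent flocking, human decision-making tasks, and layered control show that the model gives interpretable accounts of policy composition, reproduces key behavioral signatures, and matches or outperforms established models.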

Abstract

The ability to flexibly compose previously acquired skills to execute intelligent behaviors is a hallmark of natural intelligence. Such compositional flexibility is often attributed to context-dependent gating mechanisms that determine how multiple policies or behavioral primitives are combined. Yet, despite remarkable efforts, the normative objective from which such gating rules should arise, and the neural computations capable of implementing them, remain unclear. Existing approaches typically rely on prespecified design choices for the gating rules, and remain tied to specific architectures, learning paradigms, or datasets. Here, we introduce a normative framework in which policy composition emerges from the minimization of a variational free energy, providing a principled and broadly applicable objective for gating. Based on this framework, we derive a continuous-time gradient flow whose trajectories are guaranteed to converge, with explicit rate, to the optimal composition of primitives. We further show that this dynamics admits a mechanistic neural implementation as a soft-competitive recurrent circuit with context-sensitive local interactions. We evaluate the model on emerging flocking behaviors in multi-agent systems, human decision-making in bandit tasks, and control benchmarks in layered architectures. Across these settings, the model provides interpretable mechanistic accounts of policy composition, reproduces key behavioral signatures, yields insights into data, and matches or outperforms established models.

2511.05297 2026-05-18 cs.SE cs.LG

Building Specialized Software-Assistant ChatBot with Graph-Based Retrieval-Augmented Generation

Mohammed Hilel, Yannis Karmim, Jean De Bodinat, Reda Sarehane, Antoine Gillon

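AI summary: This paper proposes a graph-based retrieval-augmented generation framework for building specialized software-assistant chatbots for enterprise software, addressing the tendency of large language models to hallucinate when they lack a structured understanding of the software. The method automatically converts enterprise web applications into state-action knowledge graphs, which help the language model generate more accurate, context-aware guidance. The paper also details the engineering pipeline that extracts and structures knowledge graphs from software interfaces, and describes the integration of the approach into production digital adoption platforms.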

Comments Accepted at ICMLC 2026

Abstract

Digital Adoption Platforms (DAPs) have become essential tools for helping employees navigate complex enterprise software such as CRM, ERP, or HRMS systems. Companies like Lemon Learning have shown how digital guidance can reduce training costs and accelerate onboarding. However, building and maintaining these interactive guides still requires extensive manual effort. Leveraging Large Language Models as virtual assistants is an appealing alternative, yet without a structured understanding of the target software, LLMs often hallucinate and produce unreliable answers. Moreover, most production-grade LLMs are black-box APIs, making fine-tuning impractical due to the lack of access to model weights. In this work, we introduce a Graph-based Retrieval-Augmented Generation framework that automatically converts enterprise web applications into state-action knowledge graphs, enabling LLMs to generate grounded and context-aware assistance. The framework was co-developed with the AI enterprise RAKAM, in collaboration with Lemon Learning. We detail the engineering pipeline that extracts and structures software interfaces, the design of the graph-based retrieval process, and the integration of our approach into production DAP workflows. Finally, we discuss scalability, robustness, and deployment lessons learned from industrial use cases.
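
To make the idea of a state-action knowledge graph concrete, the sketch below stores screens as states and UI actions as labeled edges, then retrieves the shortest action sequence between two screens. The schema, the example screens, and the use of breadth-first search are all illustrative assumptions; the abstract does not describe the paper's actual graph format or retrieval procedure.

```python
from collections import deque


def find_action_path(graph, start, goal):
    """Return the shortest list of actions leading from start to goal, or None.

    graph maps a state name to a list of (action, next_state) pairs.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, next_state in graph.get(state, []):
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, actions + [action]))
    return None


# Hypothetical CRM screens and actions, invented for this example.
crm_graph = {
    "home": [("click 'Contacts'", "contacts"), ("click 'Reports'", "reports")],
    "contacts": [("click 'New contact'", "contact_form")],
    "contact_form": [("click 'Save'", "contact_saved")],
}

print(find_action_path(crm_graph, "home", "contact_saved"))
```

A retrieved path of this kind could be passed to a language model as grounding context, so that its step-by-step guidance refers only to actions that exist in the application.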

2511.04484 2026-05-18 cs.DS cs.LG

Online Algorithms for Repeated Optimal Stopping: Balancing Baseline Guarantees and Regret

Tsubasa Harada, Yasushi Kawase, Hanna Sumita

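AI summary: This paper studies the repeated optimal stopping problem with an unknown distribution, where the goal is to maintain strong per-round performance guarantees while achieving sublinear regret overall. Under full feedback, the authors propose a general algorithmic framework that, with high probability, satisfies the per-round guarantee while achieving sublinear regret, and that applies to classical settings such as the prophet inequality and the secretary problem. The paper also gives a regret lower bound in the i.i.d. model, showing that the method is nearly optimal.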

Comments 30 pages, Major revision with corrected results, new impossibility results, and revised exposition

Abstract

We study the repeated optimal stopping problem, in which the same optimal stopping instance with an unknown distribution is solved repeatedly over $T$ rounds. We aim to simultaneously achieve strong per-round performance guarantees relative to a given baseline and sublinear regret across all rounds. Our primary contribution is a comprehensive theoretical characterization of whether and when these two objectives are compatible. First, under standard semi-bandit feedback, we prove that maintaining the per-round guarantee forces regret of $Ω(T / \log T)$. Second, even under full feedback, we show that requiring almost-sure satisfaction of the per-round guarantee in every round is incompatible with sublinear regret. Third, under full feedback, we propose a general algorithmic framework that achieves both sublinear regret and the per-round guarantee with high probability. Our framework applies to canonical problems, including the prophet inequality, the secretary problem, and their variants under adversarial, random, and i.i.d. input models. For example, in the repeated prophet inequality problem, our method guarantees that, with high probability in each round, its expected reward is at least that of the classical single-sample algorithm, which achieves a $1/2$ competitive ratio, while simultaneously ensuring $\tilde{O}(\sqrt{T})$ regret. Furthermore, we establish a regret lower bound of $Ω(\sqrt{T})$ even in the i.i.d. model, which is nearly tight with respect to the number of rounds.
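
The classical single-sample baseline mentioned above is short enough to sketch. It sets a threshold equal to the largest of one independent sample per distribution and accepts the first arriving value that exceeds it. The function name and the return convention are illustrative choices, and the sketch covers a single round only, not the repeated framework of the paper.

```python
def single_sample_stop(samples, values):
    """Single-sample threshold rule for one round of the prophet inequality.

    samples: one independent draw from each of the n distributions.
    values:  the realized values, revealed in arrival order.
    Returns (index, value) of the accepted item, or (None, 0.0) if none is accepted.
    """
    threshold = max(samples)
    for index, value in enumerate(values):
        if value > threshold:
            return index, value
    return None, 0.0


# Threshold is 0.7; the first value above it is 0.9 at index 1.
print(single_sample_stop([0.3, 0.7, 0.5], [0.6, 0.9, 0.95]))
```

Note that the rule stops at the first value above the threshold even when a larger value arrives later, which is why its guarantee is stated as a competitive ratio in expectation.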

2511.03606 2026-05-18 stat.ML cs.LG math.ST stat.TH

Vector-valued self-normalized concentration inequalities beyond sub-Gaussianity

Diego Martinez-Taboada, Tomas Gonzalez, Aaditya Ramdas

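AI summary: This paper studies concentration inequalities for vector-valued self-normalized processes beyond the sub-Gaussian setting, where existing theory is comparatively underdeveloped. The authors provide concentration bounds for light-tailed processes (such as Bennett or Bernstein bounds), extending the scope of conventional self-normalized analysis. The results are relevant to online linear regression and (kernelized) linear bandits.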

Abstract

The study of self-normalized processes plays a crucial role in a wide range of applications, from sequential decision-making to econometrics. While the behavior of self-normalized concentration has been widely investigated for scalar-valued processes, vector-valued processes remain comparatively underexplored, especially outside of the sub-Gaussian framework. In this contribution, we provide concentration bounds for self-normalized processes with light tails beyond sub-Gaussianity (such as Bennett or Bernstein bounds). We illustrate the relevance of our results in the context of online linear regression, with applications in (kernelized) linear bandits.

2510.19315 2026-05-18 cs.FL cs.LG cs.LO

Transformers are Inherently Succinct

Pascal Bergsträßer, Ryan Cotterell, Anthony W. Lin

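AI summary: This paper studies succinctness as a measure of the expressive power of transformers. The authors prove that fixed-precision transformers can describe languages more succinctly than linear temporal logic (LTL), recurrent neural networks (RNNs), and finite automata, with exponential or even doubly exponential gaps in some cases. They also give matching upper bounds, showing that a transformer can be converted to an LTL formula with at most an exponential blow-up, improving a prior doubly exponential translation. As a consequence, basic verification problems for transformers, such as emptiness and equivalence, are intractable: specifically, EXPSPACE-complete.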

Abstract

We study succinctness as a measure of the expressive power of transformers. Succinctness -- how compactly a formalism can describe a language relative to other formalisms -- is a classical notion in logic and automata theory. We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks, and, by extension, state-space models, and doubly exponentially more succinct than finite automata. In other words, there exist families of languages describable by polynomial-size transformers whose smallest equivalent LTL formula or recurrent neural network is exponentially large, and whose smallest equivalent automaton is doubly exponentially large. We also establish matching upper bounds, showing that any fixed-precision transformer can be converted to an LTL formula with at most an exponential blow-up -- improving a prior doubly exponential translation. As a consequence of this succinctness, we show that basic verification problems for transformers, such as emptiness and equivalence, are provably intractable: specifically, EXPSPACE-complete.

2510.15714 2026-05-18 math.OC cs.LG

A Split-Client Approach to Second-Order Optimization

El Mahdi Chayti, Martin Jaggi

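AI summary: This paper proposes the Split-Client framework to address the computational bottleneck of Hessian computation and factorization in second-order optimization. The method decouples optimization into parallel gradient and curvature processes and adapts to delay automatically, matching the convergence rate of the optimally tuned Lazy method without manual tuning. The framework also provides a noise-adaptive rate under persistent curvature error and a faster rate under a structured condition, and experiments on non-convex problems show substantial speedups.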

Abstract

Second-order optimization methods offer superior convergence rates but are often bottlenecked by the wall-clock cost of Hessian computation and factorization. In the moderate-dimensional regime where the full Hessian fits in memory, factorization $\mathcal{O}(d^3)$ typically dominates gradient evaluation $\mathcal{O}(nd)$, creating a synchronization barrier that negates the per-iteration progress of classical second-order methods. We propose the Split-Client framework, which decouples optimization into parallel gradient and curvature processes. Unlike Lazy Hessian approaches, whose arithmetic-complexity analysis does not charge factorization time and whose optimal reuse frequency requires tuning, our method is fully delay-adaptive: its wall-clock complexity scales with the average delay $\bar{τ}$, and it matches the optimally-tuned Lazy rate of $\mathcal{O}(ε^{-3/2}\sqrt{\bar{τ}})$ without any tuning. For persistent curvature error, we provide a noise-adaptive schedule with $\widetilde{\mathcal{O}}(T^{-3/4})$ rate (on $E[\|\nabla f\|]^{3/2}$), recovering the rate that uniform-error analyses such as Kamzolov et al. (2023) achieve via inflated regularization. Under a verifiable subspace-alignment condition, an additional structured analysis based on the secant condition of L-BFGS gives a faster $\mathcal{O}(T^{-1})$ rate, with a hybrid theorem interpolating smoothly between the two regimes. We extend the framework to Subsampled Cubic Newton with adaptive batch sizes and an aggregate sampling budget linear in $T$. Experiments on two non-convex problems show wall-clock speedups of up to $800\times$ over Vanilla and $30\times$ over Lazy in the strongly factorization-dominated regime.

2510.02734 2026-05-18 q-bio.BM cs.AI q-bio.GN

SAE-RNA: A Sparse Autoencoder Model for Interpreting RNA Language Model Representations

Taehan Kim, Sangdae Nam

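AI summary: This paper presents SAE-RNA, a sparse autoencoder model for interpreting RNA language model representations, and explores whether such models can yield interpretable decompositions of RNA language model features. Built on RiNALMo, the method maps representations to known biological features to analyze how RNA language models organize biological information internally. The study offers a feature-level framework for comparing RNA groups and identifying components associated with RNA family identity or structural context, and discusses the applicability and limitations of sparse autoencoders for this task.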

Comments 12 pages, 7 figures. v2: Updated bibliography to improve reference accuracy and reflect updated publication venues. Refined claims for better alignment with results and added an Appendix

Abstract

Deep learning, particularly with the advancement of Large Language Models, has transformed biomolecular modeling, with protein language models such as ESM inspiring emerging RNA language models such as RiNALMo. Recent work has begun applying sparse autoencoders (SAEs) to protein language model representations, exploring representation-level interpretability in biomolecular models. Here, we explore whether SAEs can provide interpretable feature decompositions of RNA language model representations, while also examining their limitations in this setting. We present SAE-RNA, an interpretability model that analyzes RiNALMo representations and maps them to known human-level biological features. Rather than claiming definitive biological concept discovery, our study frames SAE-based analysis as a representation-level probe for characterizing how RNA language models organize biological information internally. More broadly, SAE-RNA provides a feature-level framework for comparing RNA groups and identifying sparse representation components associated with RNA family identity or structural context.

2509.16223 2026-05-18 eess.SP cs.CV

mRadNet: A Compact Radar Object Detector with MetaFormer

Huaiyu Chen, Fahed Hassanat, Robert Laganiere, Martin Bouchard

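AI summary This paper proposes mRadNet, a compact radar object detection model designed to meet the demands of automotive embedded systems for lightweight, efficient models. The model is based on a U-Net architecture combined with MetaFormer blocks, uses separable convolution and attention mechanisms to extract local and global features effectively, and introduces more efficient feature embedding and fusion strategies to further reduce computational complexity. Experiments show that mRadNet outperforms existing methods on the CRUW dataset with the fewest parameters and the lowest computational cost.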

Comments 5 pages, 2 figures, to appear in Proc. of 34th European Signal Processing Conference (EUSIPCO 2026), Bruges, Belgium, Aug. 31 - Sept. 4, 2026. Code available at https://github.com/huaiyu-chen/mRadNet

Details
English abstract

Frequency-modulated continuous wave radars have gained increasing popularity in the automotive industry. Their robustness against adverse weather conditions makes them a suitable choice for radar object detection in advanced driver assistance systems. These real-time embedded systems place requirements on the compactness and efficiency of the model, which have been largely overlooked in previous work. In this work, we propose mRadNet, a novel radar object detection model designed with compactness in mind. mRadNet employs a U-Net-style architecture with MetaFormer blocks, in which separable convolution and attention token mixers are used to capture both local and global features effectively. More efficient token embedding and merging strategies are introduced to further facilitate the lightweight design. The performance of mRadNet is validated on the CRUW dataset, improving state-of-the-art performance with the fewest parameters and the lowest FLOPs.

2509.12266 2026-05-18 q-bio.GN cs.LG

Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models

Weimin Wu, Xuefeng Song, Yibo Wen, Qinjie Lin, Zhihan Zhou, Jerry Yao-Chieh Hu, Zhong Wang, Han Liu

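AI summary This paper introduces Genome-Factory, the first integrated Python library for tuning, deploying, and interpreting genomic foundation models. By unifying the workflow of data collection, model tuning, inference, benchmarking, and interpretability analysis, the library simplifies genomic model development. Its core contributions include automated data preprocessing, support for multiple model tuning methods, embedding extraction and sequence generation, and a biological interpreter based on sparse autoencoders, which significantly improves the practical value of genomic models in real-world analysis.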

Details
English abstract

We introduce Genome-Factory, the first integrated Python library for tuning, deploying, and interpreting genomic foundation models. Our core contribution is to simplify and unify the workflow for genomic model development: data collection, model tuning, inference, benchmarking, and interpretability. For data collection, Genome-Factory offers an automated pipeline to download genomic sequences and preprocess them. For model tuning, Genome-Factory supports both full and parameter-efficient fine-tuning across diverse genomic models. For inference, Genome-Factory enables both embedding extraction and DNA sequence generation. For benchmarking, we include two existing benchmarks and provide a flexible interface to incorporate additional benchmarks. For interpretability, Genome-Factory introduces an open-source biological interpreter based on a sparse auto-encoder. We validate the utility of Genome-Factory across three dimensions: (i) Compatibility with diverse models and fine-tuning methods; (ii) Benchmarking downstream performance using two open-source benchmarks; (iii) Biological interpretation of learned representations with DNABERT-2. These results highlight its practical value for real-world genomic analysis. GitHub: https://github.com/WeiminWu2000/Genome_Factory.

2509.01685 2026-05-18 stat.ML cs.LG math.OC stat.CO

Preconditioned Regularized Wasserstein Proximal Sampling

Hong Ye Tan, Stanley Osher, Wuchen Li

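AI summary This paper studies how to sample from a Gibbs distribution by evolving finitely many particles, and proposes a preconditioned regularized Wasserstein proximal sampling method. The method approximates the score function with the numerically tractable score of a regularized Wasserstein proximal operator, and derives its kernel form from a Cole-Hopf transformation of anisotropic heat equations. Experiments show acceleration and stability advantages on a variety of log-concave and non-log-concave distributions, as well as on Bayesian image deconvolution and neural network training tasks.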

Details
English abstract

We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently proposed noise-free sampling method, governed by approximating the score function with the numerically tractable score of a regularized Wasserstein proximal operator. This is derived by a Cole--Hopf transformation on coupled anisotropic heat equations, yielding a kernel formulation for the preconditioned regularized Wasserstein proximal. The diffusion component of the proposed method is also interpreted as a modified self-attention block, as in transformer architectures. For quadratic potentials, we provide a discrete-time non-asymptotic convergence analysis and explicitly characterize the bias, which is dependent on regularization and independent of step-size. Experiments demonstrate acceleration and particle-level stability on problems ranging from various log-concave and non-log-concave toy examples to Bayesian total-variation regularized image deconvolution, and competitive or better performance on non-convex Bayesian neural network training when utilizing variable preconditioning matrices.

2508.16114 2026-05-18 astro-ph.GA astro-ph.IM astro-ph.SR cs.LG

Neural-Network Chemical Emulator for First-Star Formation: Robust Iterative Predictions over a Wide Density Range

Sojun Ono, Kazuyuki Sugimura

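AI summary This paper presents a neural-network chemical emulator for studying the thermal and chemical evolution during the formation of first-generation (Population III) stars. The emulator covers a density range spanning 21 orders of magnitude (10⁻³–10¹⁸ cm⁻³) and accurately tracks the evolution of six primordial species. To improve the robustness and efficiency of predictions, the study introduces a timescale-based update method and trains separate deep operator networks for different density intervals, significantly increasing computational speed while preserving prediction accuracy over many iterations.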

Comments 19 pages, 7 figures, Accepted for publication in ApJ

Details
Journal ref
ApJ, 996, 9 (2026)
English abstract

We present a neural-network emulator for the thermal and chemical evolution in Population III star formation. The emulator accurately reproduces the thermochemical evolution over a wide density range spanning 21 orders of magnitude (10$^{-3}$-10$^{18}$ cm$^{-3}$), tracking six primordial species: H, H$_2$, e$^{-}$, H$^{+}$, H$^{-}$, and H$_2^{+}$. To handle the broad dynamic range, we partition the density range into five subregions and train separate deep operator networks (DeepONets) in each region. When applied to randomly sampled thermochemical states, the emulator achieves relative errors below 10% in over 90% of cases for both temperature and chemical abundances (except for the rare species H$_2^{+}$). The emulator is roughly ten times faster on a CPU and more than 1000 times faster for batched predictions on a GPU, compared with conventional numerical integration. Furthermore, to ensure robust predictions under many iterations, we introduce a novel timescale-based update method, where a short-timestep update of each variable is computed by rescaling the predicted change over a longer timestep equal to its characteristic variation timescale. In one-zone collapse calculations, the results from the timescale-based method agree well with traditional numerical integration even with many iterations at a timestep as short as 10$^{-4}$ of the free-fall time. This proof-of-concept study suggests the potential for neural network-based chemical emulators to accelerate hydrodynamic simulations of star formation.
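
The timescale-based update described in this abstract can be illustrated with a minimal Python sketch. It assumes the rescaling is linear in the ratio of the two timesteps and uses a hypothetical `predict_change(state, tau)` callable as a stand-in for the trained emulator; neither detail is confirmed by the abstract, and the toy decay model below is not the paper's chemistry.

```python
import numpy as np

def timescale_update(state, dt, timescales, predict_change):
    """One short-timestep update built from rescaled long-timestep predictions.

    state          : current values of the tracked variables
    dt             : the short timestep actually being taken
    timescales     : per-variable characteristic variation timescales (assumed given)
    predict_change : hypothetical emulator, (state, tau) -> predicted change over tau
    """
    state = np.asarray(state, dtype=float)
    new_state = state.copy()
    for i, tau in enumerate(timescales):
        # Ask the emulator for the change of variable i over its own timescale tau.
        delta_long = predict_change(state, tau)[i]
        # ASSUMPTION: the long-timestep change is scaled down linearly to dt.
        new_state[i] = state[i] + delta_long * (dt / tau)
    return new_state

# Toy stand-in for the emulator: exact solution of dy/dt = -k * y.
k = np.array([1.0, 10.0])

def toy_predict_change(state, tau):
    return state * (np.exp(-k * tau) - 1.0)

y = timescale_update([1.0, 1.0], dt=1e-3, timescales=1.0 / k,
                     predict_change=toy_predict_change)
```

In this sketch the emulator is only ever queried at timesteps comparable to each variable's own timescale, which is one plausible reading of why very short timesteps would not accumulate error as quickly.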

2508.03810 2026-05-18 hep-th cs.LG

Viability of perturbative expansion for quantum field theories on neurons

Srimoyee Sen, Varun Vaidya

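AI summary This paper examines the viability of using neural network architectures for perturbative calculations of local quantum field theories at finite neuron number, taking scalar $\phi^4$ theory in $d$-dimensional Euclidean space as an example. The study finds that the perturbative series formed by the renormalized $O(1/N)$ corrections to the two- and four-point correlators is sensitive to the ultraviolet cutoff and converges weakly. The authors therefore propose a modification to the network architecture and discuss the scaling relations of the theory's parameters and the neuron number needed to extract field theory results more accurately.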

Comments Published version

Details
English abstract

Neural Network (NN) architectures that break statistical independence of parameters have been proposed as a new approach for simulating local quantum field theories (QFTs). In the infinite neuron number limit, single-layer NNs can exactly reproduce QFT results. This paper examines the viability of this architecture for perturbative calculations of local QFTs for finite neuron number $N$ using scalar $\phi^4$ theory in $d$ Euclidean dimensions as an example. We find that the renormalized $O(1/N)$ corrections to two- and four-point correlators yield perturbative series which are sensitive to the ultraviolet cut-off and therefore converge only weakly. We propose a modification to the architecture to improve this convergence and discuss constraints on the parameters of the theory and the scaling of $N$ which allow us to extract accurate field theory results.

2506.14829 2026-05-18 cs.HC cs.AI cs.LG

The Hardness of Achieving Impact in AI for Social Impact Research: A Ground-Level View of Challenges & Opportunities

Aditya Majumdar, Wenbo Zhang, Kashvi Prawal, Amulya Yadav

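AI summary This paper examines the main challenges and opportunities that AI for Social Impact (AI4SI) research faces in real-world application. Through interviews with 26 AI4SI researchers, the study analyzes the structural, organizational, communication, and collaboration barriers that hinder AI4SI deployment, and summarizes workable collaboration strategies and practical experience. The study offers practical guidance for AI researchers and institutions seeking to drive social impact.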

Comments To be published in FAccT'26

Details
English abstract

AI for Social Impact (AI4SI) is an emergent field that draws on artificial intelligence (AI), machine learning (ML), and the social sciences to address societal issues aligned with the United Nations Sustainable Development Goals (UN SDGs), such as universal healthcare and climate action. Despite AI4SI's rising popularity, achieving tangible, on-the-ground impact remains a significant challenge. In particular, identifying collaborators open to co-designing and deploying AI4SI-based solutions in real-world settings is often difficult. Thus, many projects stall at the proof-of-concept stage, unable to scale to production-level deployment. Drawing on interviews with twenty-six AI4SI researchers, primarily from academic institutions though also including some industry researchers and practitioners, and on the authors' own lived experiences, this paper employs thematic analysis to highlight structural, organizational, communication, collaboration, and operational challenges hindering socially impactful AI4SI deployments. While there are no easy fixes, the authors synthesize best practices and actionable strategies from interviews and personal experiences, positioning this paper as a practical guide for AI4SI researchers and organizations pursuing socially impactful collaborations. We note that our findings are most directly applicable to academic research groups in the global north, as governmental, startup, and global south researchers' perspectives are underrepresented in our sample.

2506.00182 2026-05-18 stat.ML cs.IT cs.LG math.IT math.ST stat.TH

Overfitting has a limitation: a model-independent generalization gap bound based on Rényi entropy

Atsushi Suzuki, Jing Wang

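AI summary This paper studies the limits of machine learning model generalization and proposes a model-independent upper bound on the generalization gap that depends only on the Rényi entropy of the data-generating distribution. The study shows that even as model size grows without bound, the generalization gap can remain small as long as the amount of data is sufficient relative to the Rényi entropy. The framework not only explains the performance degradation caused by injecting noise into data, but also extends the no-free-lunch theorem, underscoring the key role of the data distribution's entropy in successful learning.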

Details
English abstract

Will further scaling up of machine learning models continue to bring success? A significant challenge in answering this question lies in understanding the generalization gap, which measures the impact of overfitting. Understanding the generalization gap behavior of increasingly large-scale machine learning models remains a significant area of investigation, as conventional analyses often link error bounds to model complexity, failing to fully explain the success of extremely large architectures. This research introduces a novel perspective by establishing a model-independent upper bound for the generalization gap applicable to algorithms whose outputs are determined solely by the data's histogram, such as empirical risk minimization or gradient-based methods. Crucially, this bound is shown to depend only on the Rényi entropy of the data-generating distribution, suggesting that a small generalization gap can be maintained even with arbitrarily large models, provided the data quantity is sufficient relative to this entropy. This framework offers a direct explanation for the phenomenon where generalization performance degrades significantly upon injecting random noise into data, with the degradation attributed to the consequent increase in the data distribution's Rényi entropy. Furthermore, we adapt the no-free-lunch theorem to be data-distribution-dependent, demonstrating that an amount of data corresponding to the Rényi entropy is indeed essential for successful learning, thereby highlighting the tightness of our proposed generalization bound.
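
As a small illustration of the quantity this abstract relies on, the Python sketch below computes the standard Rényi entropy of order `alpha` for a discrete distribution and shows that mixing a distribution with uniform noise raises it. The order `alpha=0.5`, the example distribution, and the uniform-noise model are illustrative assumptions; the abstract does not say which order or noise model the paper's bound uses.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (in nats) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability outcomes do not contribute
    if np.isclose(alpha, 1.0):
        # The limit alpha -> 1 is the Shannon entropy.
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def mix_with_uniform(p, eps):
    """Distribution obtained when each sample is replaced, with probability eps,
    by a symbol drawn uniformly at random (an assumed noise model)."""
    p = np.asarray(p, dtype=float)
    return (1.0 - eps) * p + eps / p.size

clean = np.array([0.7, 0.2, 0.1, 0.0])
noisy = mix_with_uniform(clean, eps=0.3)

h_clean = renyi_entropy(clean, alpha=0.5)
h_noisy = renyi_entropy(noisy, alpha=0.5)
print(f"clean: {h_clean:.3f} nats, noisy: {h_noisy:.3f} nats")
```

Under the abstract's claim, a higher entropy after noise injection means more data is needed to keep the generalization gap small.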

2505.11708 2026-05-18 cs.CR cs.LG

Unveiling the Black Box: A Multi-Layer Framework for Explaining Reinforcement Learning-Based Cyber Agents

Diksha Goel, Kristen Moore, Jeff Wang, Minjune Kim, Thanh Thi Nguyen

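AI summary As reinforcement learning (RL) is increasingly used to simulate sophisticated cyberattacks, the opacity of its decision-making has become a key obstacle to building trust, debugging, and defensive preparedness. This paper proposes a unified multi-layer explainability framework that reveals the decision logic of RL-based attacker agents at the strategic (MDP) and tactical (policy) levels; by modeling cyberattacks as a partially observable Markov decision process (POMDP) and analyzing the dynamics of Q-values, it provides an in-depth account of how attack behavior evolves. The framework is general, applies to a variety of attacker agents and environments, and offers interpretable behavioral insights for red-team simulation, policy debugging, threat modeling, and anticipatory defense.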

Details
English abstract

Reinforcement Learning (RL) agents are increasingly used to simulate sophisticated cyberattacks, but their decision-making processes remain opaque, hindering trust, debugging, and defensive preparedness. In high-stakes cybersecurity contexts, explainability is essential for understanding how adversarial strategies are formed and evolve over time. In this paper, we propose a unified, multi-layer explainability framework for RL-based attacker agents that reveals both strategic (Markov Decision Process (MDP)-level) and tactical (policy-level) reasoning. At the MDP-level, we model cyberattacks as a Partially Observable Markov Decision Process (POMDP) to expose exploration-exploitation dynamics and phase-aware behavioural shifts. At the policy-level, we analyse the temporal evolution of Q-values and use Prioritised Experience Replay (PER) to surface critical learning transitions and evolving action preferences. Evaluated across CyberBattleSim environments of increasing complexity, our framework offers interpretable insights into agent behaviour at scale. Unlike previous explainable RL methods, which are predominantly post-hoc, domain-specific, or limited in depth, our approach is both agent- and environment-agnostic, supporting use cases such as red-team simulation, RL policy debugging, phase-aware threat modelling and anticipatory defence planning. By transforming black-box learning into actionable behavioural intelligence, our framework enables both defenders and developers to better anticipate, analyse, and respond to autonomous cyber threats.
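
For readers unfamiliar with Prioritised Experience Replay, the Python sketch below shows the commonly used proportional variant, in which a transition's sampling probability grows with the magnitude of its temporal-difference (TD) error. The exponent `alpha`, the offset `eps`, the function names, and the example TD errors are illustrative assumptions, not the authors' configuration or data.

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER: P(i) = p_i**alpha / sum_k p_k**alpha, with p_i = |delta_i| + eps."""
    priorities = (np.abs(np.asarray(td_errors, dtype=float)) + eps) ** alpha
    return priorities / priorities.sum()

def top_transitions(td_errors, k=3):
    """Indices of the k transitions with the largest |TD error|."""
    order = np.argsort(-np.abs(np.asarray(td_errors, dtype=float)))
    return order[:k].tolist()

# Made-up TD errors for five stored transitions.
td = [0.05, -2.0, 0.3, 1.1, -0.01]
probs = per_probabilities(td)
critical = top_transitions(td, k=2)
```

Ranking transitions by priority, as `top_transitions` does, is one simple way such a buffer could be inspected to surface the transitions that drove learning most strongly.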

2504.13850 2026-05-18 cs.DC cs.LG

FedOptima: Optimizing Resource Utilization in Federated Learning

Zihan Zhang, Leon Wong, Blesson Varghese

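AI summary This paper proposes FedOptima, a system that optimizes resource utilization in federated learning and aims to address low resource utilization on servers and devices. Through innovations such as asynchronous aggregation, auxiliary networks, and centralized task scheduling, the system simultaneously reduces the idle time caused by task dependency and by device asynchrony, significantly improving training efficiency and model accuracy. Experiments show that FedOptima substantially increases training speed and system throughput while maintaining high accuracy.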

Comments Accepted for publication in Future Generation Computer Systems

Details
Journal ref
Future Generation Computer Systems, Volume 183, October 2026, 108551
English abstract

Federated learning (FL) systems facilitate distributed machine learning across a server and multiple devices. However, FL systems have low resource utilization on servers and devices, limiting their practical use in the real world. This inefficiency primarily arises from two types of idle time: (i) task dependency between the server and devices, and (ii) stragglers among heterogeneous devices. This paper introduces FedOptima, a resource-optimized FL system designed to simultaneously minimize both types of idle time; existing systems do not eliminate or reduce both at the same time. FedOptima offloads the training of certain layers of a neural network from a device to a server using three innovations. First, devices operate independently of each other using asynchronous aggregation to eliminate straggler effects, and independently of the server by utilizing auxiliary networks to minimize idle time caused by task dependency. Second, the server performs centralized training using a task scheduler that ensures balanced contributions from all devices, improving model accuracy. Third, an efficient memory management mechanism on the server increases the scalability of the number of participating devices. Extensive experiments are conducted on multiple lab-based testbeds, evaluated on image classification and sentiment analysis tasks with CNNs and Transformers. Compared to four state-of-the-art offloading-based and asynchronous FL baselines, FedOptima (i) achieves higher or comparable accuracy, (ii) accelerates training by 1.9x to 21.8x, (iii) reduces server and device idle time by up to 93.9% and 81.8%, respectively, and (iv) increases throughput by 1.1x to 2.0x.

2501.13188 2026-05-18 cond-mat.stat-mech cs.LG nlin.AO q-bio.CB

Topological constraints on self-organisation in locally interacting systems

Francesco Sacco, Dalton A R Sakthivadivel, Michael Levin

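AI summary This paper studies topological constraints on self-organising behaviour in locally interacting systems, exploring the necessary conditions for a system on a planar graph to form an ordered phase. By analysing how free energy scales with domain-wall formation in three model systems (the Potts model, autoregressive models, and hierarchical networks), it shows how the combinatorics of interactions on a graph affect the emergence of spontaneous order. The results provide a theoretical basis for understanding why multiscale biological systems can form complex patterns, whereas rudimentary language models struggle with long sequences.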

Comments 11+3 pages, four figures, four tikzpictures. This version to appear in Philos Trans R Soc A

Details
Journal ref
Philosophical Transactions A, 384(2320), 2026
English abstract

All intelligence is collective intelligence, in the sense that it is made of parts which must align with respect to system-level goals. Understanding the dynamics which facilitate or limit navigation of problem spaces by aligned parts thus impacts many fields ranging across life sciences and engineering. To that end, consider a system on the vertices of a planar graph, with pairwise interactions prescribed by the edges of the graph. Such systems can sometimes exhibit long-range order, distinguishing one phase of macroscopic behaviour from another. In networks of interacting systems we may view spontaneous ordering as a form of self-organisation, modelling neural and basal forms of cognition. Here, we discuss necessary conditions on the topology of the graph for an ordered phase to exist, with an eye towards finding constraints on the ability of a system with local interactions to maintain an ordered target state. By studying the scaling of free energy under the formation of domain walls in three model systems -- the Potts model, autoregressive models, and hierarchical networks -- we show how the combinatorics of interactions on a graph prevent or allow spontaneous ordering. As an application we are able to analyse why multiscale systems like those prevalent in biology are capable of organising into complex patterns, whereas rudimentary language models are challenged by long sequences of outputs.

2412.12636 2026-05-18 cs.DC cs.AI cs.LG cs.PF

TrainMover: An Interruption-Resilient Runtime for ML Training

ChonLam Lao, Jiaqi Gao, Jiamin Cao, Zhipeng Zhang, Pengcheng Zhang, Jiangfei Duan, Zhilong Zheng, Yu Guan, Yichi Xu, Yong Li, Zhengping Qian, Aditya Akella, Minlan Yu, Ennan Zhai, Dennis Cai, Jingren Zhou

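AI summary Large-scale machine learning training jobs are often interrupted by hardware or software failures or by management events, and existing approaches such as checkpoint-restart or runtime reconfiguration tend to cause long downtime and degraded performance. This paper presents TrainMover, a resilient runtime system for large language model training that uses elastic and standby machines to handle interruptions with minimal downtime and zero memory overhead. TrainMover introduces key techniques including two-phase, delta-based communication group setup, communication-free sandboxed warmup, and a general standby design; experiments show that its downtime when handling interruptions at the thousand-GPU scale stays at about 20 seconds, and that it can reduce wasted GPU hours by 55% compared with the best existing alternative.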

Comments 14 pages body, 19 pages total

Details
English abstract

Large-scale ML training jobs are frequently interrupted by hardware and software anomalies, failures, and management events. Existing solutions like checkpoint-restart or runtime reconfiguration suffer from long downtimes and degraded performance. We present TrainMover, a resilient LLM training runtime that leverages elastic and standby machines to handle interruptions with minimal downtime and zero memory overhead. To achieve these goals, TrainMover introduces three key techniques: two-phase, delta-based communication group setup; communication-free sandboxed warmup; and general standby design that enables failure recovery from any role. Our evaluation shows that TrainMover consistently achieves around 20 seconds of downtime when handling various interruptions at the 1024-GPU scale. TrainMover is projected to reduce wasted GPU hours by 55% compared to the best alternative, saving 1.4 million GPU-hours per week at the 64K-GPU scale.