arXivDaily: daily arXiv digest, updated Monday to Friday
cs.NE Neural and Evolutionary Computing (14 papers)
2606.12382 2026-06-11 cs.NE cs.AI New submission

SPEA2$^+$: Improved Density Estimation in SPEA2 with Provable Runtime Guarantees

Duc-Cuong Dang, Andre Opris, Dirk Sudholt

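AI summary: SPEA2 maintains too little diversity among dominated solutions. The authors propose SPEA2$^+$, which improves density estimation by using all pairwise distances. It achieves the same performance guarantees as other prominent algorithms on the OneTrapZeroTrap benchmark.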

Comments
To appear in the Proceedings of PPSN 2026
English abstract

The Strength Pareto Evolutionary Algorithm 2 (SPEA2) is a popular and prominent evolutionary algorithm for solving multi-objective optimisation problems. Despite its popularity, theoretical analyses of SPEA2 have only appeared recently. Moreover, these analyses focus exclusively on how SPEA2 handles non-dominated solutions and disregard the algorithmic components responsible for handling dominated solutions. We conduct a first runtime analysis of SPEA2 for which these components are analysed. We prove that, unlike other prominent algorithms, including NSGA-II, NSGA-III and SMS-EMOA under the same setting of constant population size and duplicate elimination, SPEA2 is unable to cover the Pareto front of the OneTrapZeroTrap benchmark efficiently. Our results indicate that using k-th nearest-neighbour distance in the fitness assignment provides an insufficient signal to maintain diversity among dominated individuals. To address this issue, we propose an improved variant, SPEA2$^+$, that considers all pairwise distances. The new algorithm achieves the same performance guarantees as the other prominent algorithms on OneTrapZeroTrap, while matching the performance of the original SPEA2 on simpler problems. Experimental results complement our theoretical findings.
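The k-th nearest-neighbour signal criticised above comes from the original SPEA2 fitness assignment. That assignment combines two terms:

- a raw fitness R(i), the summed strengths of all solutions dominating i;
- a density term D(i) = 1/(σ_i^k + 2), where σ_i^k is the distance to the k-th nearest neighbour in objective space.

k is usually the square root of the sample size. The minimal sketch below assumes minimisation and a plain list of objective vectors. It reproduces only the original density term, not SPEA2$^+$, whose all-pairwise formula the abstract does not give.

```python
import math

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation assumed)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def spea2_fitness(points, k=None):
    """Original SPEA2 fitness F(i) = R(i) + D(i); lower is better.

    Non-dominated points get F(i) < 1 because R(i) = 0 and the density term
    1 / (sigma_k + 2) never exceeds 1/2.
    """
    n = len(points)
    if k is None:
        k = max(1, int(math.sqrt(n)))  # usual choice: k = sqrt(sample size)
    # Strength S(i): how many points i dominates.
    strength = [sum(dominates(p, q) for q in points) for p in points]
    # Raw fitness R(i): summed strengths of every point that dominates i.
    raw = [sum(strength[j] for j in range(n) if dominates(points[j], points[i]))
           for i in range(n)]
    fitness = []
    for i in range(n):
        # Distances from i to all others; only the k-th smallest is used.
        dists = sorted(math.dist(points[i], points[j]) for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        fitness.append(raw[i] + 1.0 / (sigma_k + 2.0))
    return fitness

pop = [(0, 3), (1, 1), (3, 0), (2, 2)]   # (2, 2) is dominated by (1, 1)
print(spea2_fitness(pop))
```

Every distance other than σ_i^k is discarded here. That loss of information is what the abstract identifies as an insufficient diversity signal among dominated individuals.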

2606.12289 2026-06-11 cs.LG cs.AI cs.NE New submission

The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics

Pietro Barbiero, Giovanni De Felice, Mateo Espinosa Zarlenga, Francesco Giannini, Filippo Bonchi, Mateja Jamnik, Giuseppe Marra, Ruggero Noris

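AI summary: Proposes the Standard Interpretable Model (SIM), which uses Lagrangian mechanics to deduce interpretability symmetries and constraints from a set of premises. Optimal interpretable models are obtained by minimising a Lagrangian. This addresses limitations of existing methods and guides the design of new ones.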

English abstract

As Artificial Intelligence models grow in complexity, interpretability has become an indispensable tool for understanding, debugging, and controlling their computations. However, interpretability lacks general theories to deductively design interpretable methods. This gap between theories and methods results in a fragmented literature and inconsistent evaluation protocols. To fill this gap, we introduce the Standard Interpretable Model (SIM), a general theory grounded in Lagrangian mechanics that enables the deductive design of interpretable methods. Specifically, the SIM summarises, in a set of premises, what interpretability is for a target user. From these premises, the SIM systematically derives interpretability symmetries and corresponding constraints, which shape the landscape of a Lagrangian whose minima correspond to optimal interpretable models. To reach the minima, one can either update the parameter values of an opaque model to make it more interpretable or compile constraints into an interpretable architecture. We empirically show that the SIM identifies and solves limitations of existing methods (including traditional, concept-based, and mechanistic interpretability), highlights underexplored research directions, and informs the design of core programming interfaces. Beyond being a research method, the deductive nature of the SIM offers pedagogical grounding for interpretability curricula and may shift the scientific community's perspective of a discipline that has long been fragmented.

2606.12287 2026-06-11 cs.NE cs.AI New submission

SpikeDecoder: Realizing the GPT Architecture with Spiking Neural Networks

Claas Beger, Florian Walter, Alois Knoll

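AI summary: Proposes SpikeDecoder, a Transformer decoder built on spiking neural networks (SNNs) for natural language processing. The authors analyse the trade-offs of replacing ANN blocks with spike-based ones and compare embedding methods for projecting text into spikes. They report an 87% to 93% reduction in theoretical energy consumption relative to the ANN baseline.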

English abstract

The Transformer architecture is widely regarded as the most powerful tool for natural language processing, but due to a high number of complex operations, it inherently faces the issue of high energy consumption. To address this issue, we consider Spiking Neural Networks (SNNs), which are an energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their naturally event-driven approach to processing information. However, this inherently makes them difficult to train. Many SNN-based models circumvent this issue by converting pre-trained ANNs. More recently, attempts have been made to design directly trainable SNN-based adaptations of the Transformer model structure. Although the results showed great promise, the application field was computer vision, and the proposed models incorporated only encoder blocks. In this paper, we propose SpikeDecoder, a fully SNN-based implementation of the Transformer decoder block, for applications in natural language processing. In a series of experiments, we analyze the impact of exchanging different blocks of the ANN model with spike-based alternatives to identify trade-offs and significant sources of performance loss. We further investigate the role of residual connections and the selection of SNN-compatible normalization techniques. Besides the work on the model architecture, we formulate and compare different embedding methods to project text data into spikes. Finally, we demonstrate that our proposed SNN-based decoder block reduces the theoretical energy consumption by 87% to 93% compared to the ANN baseline.
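The abstract does not say how text embeddings are projected into spikes. As one hypothetical illustration of the general idea, the sketch below rate-codes a real-valued embedding through leaky integrate-and-fire (LIF) neurons over a few time steps. The leak factor `beta`, threshold `theta` and reset-by-subtraction rule are assumptions, not SpikeDecoder's design.

```python
def lif_encode(embedding, timesteps=8, beta=0.9, theta=1.0):
    """Encode each embedding dimension as a binary spike train via an LIF neuron.

    The value is injected as a constant input current at every step; the
    membrane potential leaks by factor beta, and a spike is emitted (and theta
    subtracted) whenever the potential reaches the threshold. In this simple
    scheme, non-positive inputs never spike.
    """
    trains = []
    for x in embedding:
        u, spikes = 0.0, []
        for _ in range(timesteps):
            u = beta * u + x          # leaky integration of the input current
            s = 1 if u >= theta else 0
            u -= s * theta            # soft reset (reset by subtraction)
            spikes.append(s)
        trains.append(spikes)
    return trains

trains = lif_encode([0.0, 0.5, 1.2], timesteps=4)
# → [[0, 0, 0, 0], [0, 0, 1, 0], [1, 1, 1, 1]]
```

Larger embedding values produce denser spike trains, so information is carried by spike counts rather than by real-valued activations.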

2606.12279 2026-06-11 cs.NE cs.AI cs.LG New submission

Mathematical perspective on genetic algorithms with optimization guided operators

Anna Brandenberger, Ilan Doron-Arad, Elchanan Mossel

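AI summary: Models genetic algorithms mathematically and casts optimization as a query-complexity problem. Proves that some problems require generation, mutation and recombination operators, and shows the key role of diversity in the solution pool.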

Comments
18 pages, 1 figure
English abstract

Recent work in ML applies genetic algorithms at inference time to iteratively improve solutions to optimization problems. The basic mutation and recombination operators involved are qualitatively different from those studied classically. Mutations are no longer random; an ML algorithm mutates a solution with the goal of improving an objective. Similarly, recombination is not based on random collages of parent solutions. Instead, it is an ML optimization-based operator whose goal is to synthesize improved solutions from its inputs. Thus, these mutation and recombination operators are more likely to improve the objective, but their computational cost is much higher. We introduce a general model of genetic algorithms and formulate optimization in this model as a query-complexity problem, using the language of reinforcement learning. We then study specialized models. We show that some optimization problems require generation, mutation, and recombination to be solved. We then obtain qualitatively tight algorithms for a family of problems within this framework that captures the nontrivial role of diversity in the solution pool, a key feature of practical ML genetic algorithms.
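To make the query-complexity view concrete, the sketch below runs a generic genetic-algorithm loop with three pluggable operators: generation, mutation and recombination. Every objective evaluation is counted as a query against a fixed budget. The toy OneMax objective and the greedy "guided" mutation are hypothetical placeholders, not the paper's model. The mutation always flips a 0 bit to 1, standing in for an ML operator that tries to improve the objective.

```python
import random

def run_ga(objective, generate, mutate, recombine, pool_size, budget, seed=0):
    """Generic GA loop (maximisation); returns (best solution, queries used)."""
    rng = random.Random(seed)
    queries = 0

    def evaluate(x):
        nonlocal queries
        queries += 1                      # each objective call is one query
        return objective(x)

    pool = [(evaluate(s), s) for s in (generate(rng) for _ in range(pool_size))]
    while queries < budget:
        if rng.random() < 0.5:
            child = mutate(rng.choice(pool)[1], rng)
        else:
            a, b = rng.sample(pool, 2)
            child = recombine(a[1], b[1], rng)
        pool.append((evaluate(child), child))
        pool.sort(key=lambda t: t[0], reverse=True)
        pool = pool[:pool_size]           # keep the best pool_size solutions
    return pool[0][1], queries

N = 20

def generate(rng):
    return tuple(rng.randint(0, 1) for _ in range(N))

def guided_mutate(x, rng):
    """Hypothetical 'ML-guided' mutation: flips a 0 bit to 1 if one exists."""
    zeros = [i for i, b in enumerate(x) if b == 0]
    if not zeros:
        return x
    i = rng.choice(zeros)
    return x[:i] + (1,) + x[i + 1:]

def uniform_crossover(a, b, rng):
    return tuple(rng.choice(pair) for pair in zip(a, b))

best, used = run_ga(sum, generate, guided_mutate, uniform_crossover,
                    pool_size=4, budget=200)
```

Because the guided mutation always improves its parent, the pool's total objective value strictly increases with each mutation until the optimum is found. This reflects the abstract's point that such operators are more likely to improve the objective, at a higher cost per call in practice.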

2606.12059 2026-06-11 cs.LG cs.NE nlin.AO New submission

Attention by Synchronization in Coupled Oscillator Networks

Fabio Pasqualetti, Taosha Guo

Affiliation: University of California, Irvine

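AI summary: Proposes fixed-query oscillator attention based on Kuramoto synchronization dynamics. It requires no exponentiation, and its only global operation is an affine normalization at readout, so attention can be computed on physical substrates. It outperforms softmax on keyword spotting and subject-verb agreement.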

English abstract

We address transformer attention on energy-constrained physical substrates. Softmax attention requires exponentiation and global reduction, operations with high energy cost on von Neumann hardware and no natural physical analog. We show that Kuramoto synchronization dynamics (which arise in electrical, mechanical, superconducting, and charge-density-wave oscillator arrays, among other physical systems) implement a well-defined attention operation without either. The resulting mechanism, fixed-query oscillator attention, replaces softmax's arithmetic with the equilibration of a gradient flow on the sphere: queries are learned anchors fixed on the sphere, and free oscillators evolve under Kuramoto-Lohe dynamics until they settle at positions encoding attention weights via cosine similarity. Because the computation is equilibration, it requires no exponentiation; the only global operation is an affine normalization at readout. The fixed point is provably unique and globally attractive from almost every initial condition, a guarantee that holds across every physical realization. Empirically, at the minimal hardware configuration (oscillator dimension $d_{\mathrm{osc}}$ = 2), oscillator attention outperforms softmax on keyword spotting (+1.00 pp) and on subject-verb agreement (+5.27 pp on hard sentences, with zero training failures versus one in five for softmax). On causal language modeling, where softmax retains an advantage, oscillator attention closes the gap as $d_{\mathrm{osc}}$ grows: from +11.09 PPL at $d_{\mathrm{osc}}$ = 2 to +2.98 PPL at $d_{\mathrm{osc}}$ = 32 on WikiText-2, and from +2.39 PPL at $d_{\mathrm{osc}}$ = 2 to +0.57 PPL at $d_{\mathrm{osc}}$ = 32 on TinyStories. The main objective of this work is not to replace softmax in software but to provide a mathematically grounded blueprint for accurate attention on physical substrates.
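The abstract does not give the exact equations. The core mechanism it names, a gradient flow on the unit sphere that equilibrates instead of computing an exponential, can be illustrated with a single free oscillator x coupled to a fixed anchor q under a Lohe-type flow dx/dt = κ (q − ⟨q, x⟩ x). This is the projected gradient ascent of ⟨q, x⟩ on the sphere.

Assumptions in this sketch:

- the coupling strength `kappa` and the step size;
- forward-Euler integration with renormalisation after each step.

In this toy flow, x converges to q from every starting point except the antipode −q. The paper's readout of attention weights is not reproduced here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def lohe_anchor_flow(x0, q, kappa=1.0, dt=0.01, steps=2000):
    """Integrate dx/dt = kappa * (q - <q, x> x) on the unit sphere (forward Euler).

    The right-hand side is the tangential projection of q at x, i.e. the
    Riemannian gradient of f(x) = <q, x>; x is renormalised after each step
    so it stays on the sphere.
    """
    x, q = normalize(x0), normalize(q)
    for _ in range(steps):
        c = dot(q, x)
        x = normalize([xi + dt * kappa * (qi - c * xi) for xi, qi in zip(x, q)])
    return x

q = [0.0, 0.0, 1.0]                       # fixed query anchor
x = lohe_anchor_flow([1.0, 0.0, -0.5], q)
print(dot(x, q))                          # cosine similarity after equilibration
```

The equilibrium is reached by letting the dynamics settle, with no exponentiation anywhere in the update.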

2606.11256 2026-06-11 physics.chem-ph cs.LG cs.NE New submission

My Chemical Harness: Evolutionary Molecular Design over Synthetic Pathways with Large Language Model Agents

César Ojeda, Darius A. Faroughy, Maryam Karimi, Payam Zarrintaj, Mir Mehdi Seyedebrahimi, Martín Carballo-Pacheco

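AI summary: Proposes an evolutionary framework whose population consists of executable synthetic pathways, with large language models acting only as strategy controllers. It reaches state-of-the-art performance on a soluble epoxide hydrolase (sEH) proxy task.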

Comments
27 pages | 10 figures
English abstract

Designing molecules with target properties is most useful when candidate structures are accompanied by feasible synthetic routes. We introduce My Chemical Harness, a route-native evolutionary framework for goal-directed molecular design in which the search population consists of executable synthetic pathways rather than isolated molecular graphs. Each route is built from purchasable building blocks and reaction templates, executed by deterministic chemistry tools, and scored through task-specific molecular oracles. Large language models (LLMs) are used only as strategy controllers that select high-level preferences over route length, move type, reaction families, motifs, and exploration pressure, while local code performs route construction, validation, deduplication, scoring, selection, and memory updates. This separation lets the LLM guide exploration without allowing it to introduce hallucinated products or unsupported reaction steps. On a soluble epoxide hydrolase proxy task, our LLM agent improves over single-pass LLM and deterministic controllers, reaching state-of-the-art performance across the sEH score, synthetic accessibility score, and AiZynthFinder success rate metrics. These results suggest that constrained LLM agents can play a significant role in molecular discovery without requiring training, fine-tuning, or dedicated generative models.
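The division of labour described above can be sketched structurally: the LLM emits only high-level preferences, while deterministic local code builds and validates routes. Everything in the sketch is hypothetical:

- `controller` stands in for an LLM call;
- the template names and building blocks are placeholders;
- no real chemistry toolkit is used.

The point is that any step naming a reaction outside the allowed template set is rejected by local code, so the controller cannot smuggle in unsupported steps.

```python
import random

# Hypothetical placeholders for a template library and purchasable building blocks.
ALLOWED_TEMPLATES = {"amide_coupling", "suzuki", "reductive_amination"}
BUILDING_BLOCKS = ["BB1", "BB2", "BB3", "BB4"]

def controller(history):
    """Stand-in for an LLM strategy controller: returns preferences only."""
    return {"max_length": 3, "families": ["suzuki", "amide_coupling"]}

def build_route(prefs, rng):
    """Local code turns preferences into a concrete route: a list of
    (reaction template, building block) steps."""
    length = rng.randint(1, prefs["max_length"])
    return [(rng.choice(prefs["families"]), rng.choice(BUILDING_BLOCKS))
            for _ in range(length)]

def validate(route):
    """Reject any route containing a step whose template is not allowed."""
    return all(template in ALLOWED_TEMPLATES for template, _ in route)

rng = random.Random(0)
route = build_route(controller(history=[]), rng)
print(route, validate(route))
```

Scoring, deduplication and selection would sit in the same local layer. Only validated routes would ever reach an oracle.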

2606.11245 2026-06-11 cs.AI cs.NE q-bio.NC New submission

Position: Hippocampal Explicit Memory Is the Cornerstone for AGI

Sangjun Park

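AI summary: Argues that integrating explicit memory into large language models is key to progress toward AGI. LLM learning mechanisms resemble human implicit memory, whereas higher-order cognitive functions depend on hippocampal explicit memory.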

Comments
Accepted to ICML 2026 (Position Paper Track)
English abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, raising expectations for Artificial General Intelligence (AGI). This position paper argues that integrating explicit memory is the cornerstone for advancing LLMs toward AGI. The key reason is that the underlying learning mechanism of LLMs is highly analogous to human implicit memory. However, higher-order cognitive functions necessary for AGI, such as long-term strategic planning, metacognition, and symbolic reasoning, heavily rely on hippocampal explicit memory and cannot arise solely from implicit statistical learning. Drawing on findings from neuroscience, I advance this perspective and complement it with computational requirements for artificial explicit memory systems, hoping to foster further research and lay the groundwork for explicit memory integration.

2606.11236 2026-06-11 cs.NE cs.CV cs.LG New submission

A2SG: Adaptive and Asymmetric Surrogate Gradients for Training Deep Spiking Neural Networks

Yechan Kang, Yongjin Kweon, Mingyeong Seo, Sohee Park, Yeonguk Jeon, Jongkil Park, Hyun Jae Jang, Jaewook Kim, YeonJoo Jeong, Suyoun Lee, Seongsik Park

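AI summary: Proposes the adaptive and asymmetric surrogate gradient (A2SG) framework. An adaptive window keeps gradient directions consistent over time, and asymmetric gradients reflect neuronal dynamics. Together they reduce gradient variation and promote convergence to flatter minima, improving accuracy and energy efficiency across various SNN models and tasks.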

Comments
Accepted at ICML 2026
English abstract

Training deep spiking neural networks (SNNs) remains challenging due to sharp loss landscapes and temporal inconsistency caused by surrogate gradients. To address these challenges, we propose a unified framework: adaptive and asymmetric surrogate gradients (A2SG). The adaptive gradients adjust an effective window for spatio-temporal adaptation, reducing spatial gradient variation and maintaining directional consistency of gradients over time. The asymmetric gradients reflect neuronal dynamics by assigning larger gradients to neurons with higher membrane potentials, and we prove that they yield lower variation than symmetric surrogates. Our analysis further establishes a direct connection between local gradient variation and the curvature of the loss landscape, providing a principled explanation for how A2SG promotes convergence to flatter minima and improves generalization. We conduct extensive experiments on diverse models, including CNN-based and Transformer-based SNNs, across various tasks such as image classification using both static and neuromorphic datasets, as well as segmentation. The results demonstrate that A2SG consistently improves accuracy and energy efficiency, establishing it as a general and reliable solution for training deep SNNs. Our code is available at this https URL.
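To illustrate the asymmetric idea, the sketch below compares two surrogate derivatives for the Heaviside spike function:

- a standard symmetric triangular surrogate;
- a hypothetical asymmetric variant whose window is wider above the threshold than below it.

In the asymmetric variant, a neuron whose membrane potential sits a given distance above threshold receives a larger gradient than one the same distance below. The window widths `a_low`/`a_high` and the piecewise-linear shape are assumptions. The paper's exact surrogate and its adaptive window schedule are not reproduced here.

```python
def heaviside(u, theta=1.0):
    """Forward pass: emit a spike when the membrane potential reaches threshold."""
    return 1.0 if u >= theta else 0.0

def triangle_surrogate(u, theta=1.0, a=1.0):
    """Symmetric triangular surrogate derivative with unit area and half-width a."""
    return max(0.0, 1.0 - abs(u - theta) / a) / a

def asymmetric_surrogate(u, theta=1.0, a_low=0.5, a_high=1.5):
    """Hypothetical asymmetric triangle: wider (slower-decaying) above threshold.

    The peak height 2 / (a_low + a_high) keeps the total area equal to 1, so
    only the shape, not the overall gradient scale, differs from the
    symmetric case.
    """
    h = 2.0 / (a_low + a_high)
    d = u - theta
    width = a_high if d >= 0 else a_low
    return h * max(0.0, 1.0 - abs(d) / width)

# Same distance (0.3) above and below the threshold theta = 1.0:
print(triangle_surrogate(1.3), triangle_surrogate(0.7))      # equal
print(asymmetric_surrogate(1.3), asymmetric_surrogate(0.7))  # larger above
```

During backpropagation, `heaviside` would be used in the forward pass and one of the surrogate functions in place of its (zero almost everywhere) derivative.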

2605.15435 2026-06-11 cs.LG cs.NE Version update

On the Stability of Growth in Structural Plasticity

Lute Lillo, Nick Cheney

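AI summary: Studies why growth and pruning behave differently under structural plasticity. Pruning selects among units that have participated in training from the start, whereas growth inserts new units into an already specialized optimization trajectory, where they receive weak gradient signal. Growth can reach high final accuracy in harder image-classification settings, but in continual learning it becomes competitive mainly when new units have enough time to integrate.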

English abstract

Standard deep-learning pipelines usually choose the network architecture before training and keep it fixed throughout optimization. In contrast, a model can also be adapted by editing its structure during training, for example by pruning existing hidden-neuron units or growing new ones. Although growth is appealing for adaptive and continual systems, we show that it is not simply the inverse of pruning. Pruning selects among units that have participated in training from the start, whereas growth inserts new units into an already specialized optimization trajectory. We isolate this insertion problem and show that newborn units are often forward-active but backward-starved: they participate in the forward computation, yet receive much weaker gradient signal than incumbent units. This disadvantage is minor in small MLP benchmarks, but becomes clear in harder image-classification settings with a convolutional trunk. In these settings, Grow can achieve high final accuracy during the structural-editing procedure, while Prune is stronger when performance is averaged over the training trajectory or when the final sparse network is retrained from scratch. Interventions targeting optimizer state, insertion, selection, and trainability show that improving the integration of newborn units can improve adaptive performance, but does not automatically produce better final subnetworks. In continual-learning benchmarks stressing plasticity loss, Grow becomes competitive mainly when new units have enough time to integrate. Together, these results suggest that Grow should be evaluated not only as an architecture-search operator, but as a time-sensitive optimization process whose success depends on insertion stability.
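The "forward-active but backward-starved" effect has a simple mechanical reading in a one-hidden-layer network. The gradient reaching a hidden unit's incoming weights is scaled by that unit's outgoing weight.

The sketch below grows a hypothetical ReLU layer by one unit and compares per-unit gradient norms on a single input under a squared-error loss. The newborn unit's outgoing weight is initialised small, a common choice to avoid disrupting the function. This initialisation is assumed here, not taken from the paper, and the sketch does not reproduce the paper's experimental setup.

```python
import math

def forward(W, v, x):
    """One hidden ReLU layer: pre-activations, activations, scalar output."""
    pre = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    h = [max(0.0, p) for p in pre]
    return pre, h, sum(vj * hj for vj, hj in zip(v, h))

def incoming_grad_norms(W, v, x, target):
    """Norm of dL/dw_j for each hidden unit j under L = 0.5 * (y - target)**2."""
    pre, h, y = forward(W, v, x)
    err = y - target
    norms = []
    for vj, p in zip(v, pre):
        active = 1.0 if p > 0 else 0.0
        g = [err * vj * active * xi for xi in x]   # chain rule through v_j and ReLU
        norms.append(math.sqrt(sum(gi * gi for gi in g)))
    return norms, h

# Two incumbent units (hypothetical trained values) plus one newborn unit.
W = [[0.5, -0.2], [0.3, 0.8]]
v = [1.0, -0.7]
W_grown = W + [[0.6, 0.4]]      # newborn incoming weights: the unit is active
v_grown = v + [0.01]            # small outgoing weight at insertion
norms, h = incoming_grad_norms(W_grown, v_grown, x=[1.0, 1.0], target=2.0)
print(h, norms)
```

The newborn unit fires as strongly as the incumbents (h = 1.0). However, its incoming-weight gradient is about 100 times smaller than the first incumbent's, because it is scaled by v = 0.01.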

2604.10639 2026-06-11 cs.NE cs.ET Version update

Visualising the Attractor Landscape of Neural Cellular Automata

James Stovold, Mia-Katrin Kvalsund, Harald Michael Ludwig, Varun Sharma, Alexander Mordvintsev

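AI summary: Applies manifold learning and topological data analysis to reveal the behavioural manifold of neural cellular automata (NCAs) at macroscopic and microscopic levels, with the aim of improving their interpretability.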

Comments
Accepted to ALIFE 2026
English abstract

As Neural Cellular Automata (NCAs) are increasingly applied outside of the toy models in Artificial Life, there is a pressing need to understand how they behave and to build appropriate routes to interpret what they have learnt. By their very nature, the benefits of training NCAs are balanced with a lack of interpretability: we can engineer emergent behaviour, but have limited ability to understand what has been learnt. In this paper, we apply a variety of techniques to pry open the NCA black box and glean some understanding of what it has learnt to do. We apply techniques from manifold learning (principal components analysis and both dense and sparse autoencoders) along with techniques from topological data analysis (persistent homology) to capture the NCA's underlying behavioural manifold, with varying success. Results show that when analysis is performed at a macroscopic level (i.e. taking the entire NCA state as a single data point), the underlying manifold is often quite simple and can be captured and analysed quite well. When analysis is performed at a microscopic level (i.e. taking the state of individual cells as a single data point), the manifold is highly complex and more complicated techniques are required in order to make sense of it.
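The macroscopic versus microscopic distinction amounts to choosing what counts as one data point before applying a method such as PCA. The sketch below uses a hypothetical random-weight update rule on a small periodic grid as a stand-in for a trained NCA. This is an assumption: real NCAs use learned perception and update networks.

It then runs PCA via NumPy's SVD on two views of the same trajectory:

- whole flattened grids per time step (macroscopic);
- individual cell state vectors (microscopic).

```python
import numpy as np

rng = np.random.default_rng(0)
HEIGHT, WIDTH, CHANNELS, STEPS = 8, 8, 4, 50
weights = rng.normal(scale=0.3, size=(3 * CHANNELS, CHANNELS))  # hypothetical rule

def perceive(state):
    """Concatenate each cell's state with horizontal and vertical neighbour means."""
    horiz = 0.5 * (np.roll(state, 1, axis=1) + np.roll(state, -1, axis=1))
    vert = 0.5 * (np.roll(state, 1, axis=0) + np.roll(state, -1, axis=0))
    return np.concatenate([state, horiz, vert], axis=-1)

def step(state):
    """Residual update applied identically to every cell."""
    return state + 0.1 * np.tanh(perceive(state) @ weights)

def pca_explained_variance(X):
    """Fraction of variance captured by each principal component."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

state = rng.normal(size=(HEIGHT, WIDTH, CHANNELS))
trajectory = []
for _ in range(STEPS):
    state = step(state)
    trajectory.append(state)
traj = np.stack(trajectory)                    # (STEPS, HEIGHT, WIDTH, CHANNELS)

macro = traj.reshape(STEPS, -1)                # one point per whole grid state
micro = traj.reshape(-1, CHANNELS)             # one point per cell per step
print(pca_explained_variance(macro)[:3], pca_explained_variance(micro)[:3])
```

Only the reshaping line differs between the two levels of analysis. The same trajectory yields 50 points in 256 dimensions in one view and 3,200 points in 4 dimensions in the other.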

2512.20464 2026-06-11 physics.optics cs.CV cs.NE physics.app-ph

Snapshot 3D image projection using a diffractive decoder

Cagatay Isil, Alexander Chen, Yuhang Li, F. Onuralp Ardic, Shiqi Chen, Che-Yung Shen, Aydogan Ozcan

Journal ref
Light: Science & Applications (2026)
Comments
22 Pages, 8 Figures
English abstract

3D image display is essential for next-generation volumetric imaging; however, dense depth multiplexing for 3D image projection remains challenging because diffraction-induced cross-talk rapidly increases as the axial image planes get closer. Here, we introduce a 3D display system comprising a digital encoder and a diffractive optical decoder, which simultaneously projects different images onto multiple target axial planes with high axial resolution. By leveraging multi-layer diffractive wavefront decoding and deep learning-based end-to-end optimization, the system achieves high-fidelity depth-resolved 3D image projection in a snapshot, enabling axial plane separations on the order of a wavelength. The digital encoder leverages a Fourier encoder network to capture multi-scale spatial and frequency-domain features from input images, integrates axial position encoding, and generates a unified phase representation that simultaneously encodes all images to be axially projected in a single snapshot through a jointly-optimized diffractive decoder. We characterized the impact of diffractive decoder depth, output diffraction efficiency, spatial light modulator resolution, and axial encoding density, revealing trade-offs that govern axial separation and 3D image projection quality. We further demonstrated the capability to display volumetric images containing 28 axial slices, as well as the ability to dynamically reconfigure the axial locations of the image planes, performed on demand. Finally, we experimentally validated the presented approach, demonstrating close agreement between the measured results and the target images. These results establish the diffractive 3D display system as a compact and scalable framework for depth-resolved snapshot 3D image projection, with potential applications in holographic displays, AR/VR interfaces, and volumetric optical computing.
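Projecting onto closely spaced axial planes relies on modelling free-space propagation between planes. A standard numerical way to do that is the angular spectrum method, sketched below with NumPy. It is offered only as background on the propagation model, with the wavelength, pixel pitch and grid size chosen arbitrarily. It does not reproduce the paper's Fourier encoder network or diffractive decoder layers.

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a 2D complex field by distance z using the angular spectrum method.

    Evanescent components (fx^2 + fy^2 > 1 / wavelength^2) are discarded.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)               # shape (ny, nx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.where(arg > 0, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * H)

wl = 1.0                # arbitrary units
dx = wl                 # pitch = wavelength, so every sampled frequency propagates
rng = np.random.default_rng(0)
u0 = rng.random((64, 64)) * np.exp(2j * np.pi * rng.random((64, 64)))
u1 = angular_spectrum(u0, wl, dx, z=wl)          # propagate one wavelength
u_back = angular_spectrum(u1, wl, dx, z=-wl)     # and back again
```

With a pixel pitch equal to the wavelength, the transfer function has unit magnitude everywhere. Propagation therefore conserves energy and is exactly reversible, which makes the sketch easy to sanity-check.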

2509.23982 2026-06-11 cs.CL cs.AI cs.CY cs.LG cs.NE Version update

Toward Preference-aligned Large Language Models via Residual-based Model Steering

Lucio La Cava, Andrea Tagarelli

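AI summary: Proposes PaLRS, which extracts lightweight steering vectors from preference signals in the residual stream and aligns models at inference time without training. It yields consistent gains on mathematical reasoning and code generation tasks while saving substantial time.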

Comments
Accepted at IJCAI 2026
English abstract

Preference alignment is a critical step in making Large Language Models (LLMs) useful and aligned with (human) preferences. Existing approaches such as Reinforcement Learning from Human Feedback or Direct Preference Optimization typically require curated data and expensive optimization over billions of parameters, and eventually lead to persistent task-specific models. In this work, we introduce Preference alignment of Large Language Models via Residual Steering (PaLRS), a training-free method that exploits preference signals encoded in the residual streams of LLMs. From as few as one hundred preference pairs, PaLRS extracts lightweight, plug-and-play steering vectors that can be applied at inference time to push models toward preferred behaviors. We evaluate PaLRS on various small-to-medium-scale open-source LLMs, showing that PaLRS-aligned models achieve consistent gains on mathematical reasoning and code generation benchmarks while preserving baseline general-purpose performance. Moreover, when compared to models aligned with DPO and SimPO, they perform better while requiring substantially less time. Our findings highlight that PaLRS offers an effective, much more efficient, and flexible alternative to standard preference optimization pipelines, offering a training-free, plug-and-play mechanism for alignment with minimal data.
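A common way to build a steering vector from preference pairs is a difference of means:

1. Average the residual-stream activations at some layer for preferred responses.
2. Subtract the average for dispreferred ones.
3. At inference time, add a scaled copy of the result to the residual stream.

The sketch below assumes that recipe and uses tiny made-up activation vectors instead of a real model. The layer choice, the scale `alpha`, and whether PaLRS extracts its vectors exactly this way are assumptions, not details given in the abstract.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_vector(preferred, dispreferred):
    """Difference of mean residual activations: preferred minus dispreferred."""
    mp, md = mean_vector(preferred), mean_vector(dispreferred)
    return [a - b for a, b in zip(mp, md)]

def steer(residual, vector, alpha=1.0):
    """Add the scaled steering vector to a residual-stream activation."""
    return [r + alpha * v for r, v in zip(residual, vector)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Made-up 3-dimensional activations for two preference pairs.
preferred = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
dispreferred = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]
vec = steering_vector(preferred, dispreferred)   # [2.0, -2.0, 0.0]
h = [0.5, 0.5, 1.0]
h_steered = steer(h, vec, alpha=0.5)             # [1.5, -0.5, 1.0]
```

Steering increases the projection of the activation onto the preference direction, from 0 to 4 here, without changing any model weights. This is what makes such approaches training-free.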

2508.11703 2026-06-11 cs.NE cs.LG

Data-Driven Discovery of Interpretable Kalman Filter Variants through Large Language Models and Genetic Programming

Vasileios Saketos, Sebastian Kaltenbach, Sergey Litvinov, Petros Koumoutsakos

English abstract

Algorithmic discovery has traditionally relied on human ingenuity and extensive experimentation. Here we investigate whether a prominent scientific computing algorithm, the Kalman Filter, can be discovered through an automated, data-driven, evolutionary process that relies on Cartesian Genetic Programming (CGP) and Large Language Models (LLM). We evaluate the contributions of both modalities (CGP and LLM) in discovering the Kalman filter under varying conditions. Our results demonstrate that our framework of CGP and LLM-assisted evolution converges to near-optimal solutions when Kalman optimality assumptions hold. When these assumptions are violated, our framework evolves interpretable alternatives that outperform the Kalman filter. These results demonstrate that combining evolutionary algorithms and generative models for interpretable, data-driven synthesis of simple computational modules is a potent approach for algorithmic discovery in scientific computing.
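For reference, the algorithm the search is meant to rediscover is the standard Kalman filter. The scalar sketch below assumes:

- a random-walk state model x_t = x_{t-1} + w_t, with process noise variance `q`;
- observations z_t = x_t + v_t, with measurement noise variance `r`.

These model choices and the sample readings are illustrative and are not the paper's benchmark systems.

```python
def kalman_1d(measurements, q=1e-4, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state; returns (estimates, variances)."""
    x, p = x0, p0
    estimates, variances = [], []
    for z in measurements:
        # Predict: the mean is unchanged under a random walk; uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain k.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
        variances.append(p)
    return estimates, variances

zs = [1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 1.01]  # noisy readings of a value near 1
est, var = kalman_1d(zs)
```

When the Gaussian and linear assumptions hold, these predict and update equations are optimal. This is the regime in which the abstract reports convergence to near-optimal solutions.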

2503.12743 2026-06-11 cs.NE Version update

Oncomorphic neural agent populations for resource-limited sequential learning

Philip Greulich, Michael Levin, Rosalia Moreddu

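AI summary: Proposes a cancer-inspired multi-agent framework in which agents replicate, mutate their architectures, migrate, and undergo ecological competition, enabling sequential learning under limited shared resources. Experiments show that stronger selection improves local accuracy and multi-task competence, while the effect of architecture mutation depends on the initial population.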

Comments
17 pages, 5 figures, 1 table
English abstract

Distributed artificial intelligence often operates under sequential task exposure, uneven compute, and decentralized coordination. Here, we present a cancer-inspired, or oncomorphic, multi-agent framework in which simulated neural agents can replicate, mutate their neural network architecture, migrate across task environments, undergo ecological turnover, and recruit learning/ecological resources from a finite shared reserve. We evaluate the framework in controlled synthetic nonlinear classification environments in which each agent trains only on its local task, allowing population ecology rather than centralized optimization to determine which neural network architectures persist. For various initial conditions, we find that stronger selection increased the endpoint local accuracy of surviving agent populations. Architecture mutation played a state-dependent role: diverse initial populations performed best at low mutation, whereas clonal large-architecture populations benefited from mutation-generated variation. Selection also increased end-of-run multi-task competence, measured by evaluating surviving agents on all environments without additional training. Recruitment and elevated baseline replication reshaped demographic support while prediction quality remained within a narrow band, consistent with redistribution of finite learning resources. Time-resolved entropy and dominance analyses revealed concentration toward successful architectures, while finite training cycles kept agents in a non-asymptotic learning regime. These results provide proof-of-concept mechanistic evidence that oncomorphic population dynamics may offer a route to decentralized adaptation in engineering applications under bounded local resources.
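Part of the population mechanics described above can be sketched as a toy simulation: replication, architecture mutation, recruitment from a finite shared reserve, and turnover. This hypothetical version makes several simplifications:

- an agent's "architecture" is just a hidden-layer width;
- its fitness is a made-up function peaking at width 16;
- each replication costs one unit from a shared reserve;
- migration and actual network training are omitted.

None of these choices come from the paper.

```python
import random

def toy_fitness(width):
    """Hypothetical fitness peaking at width 16 (stand-in for local task accuracy)."""
    return 1.0 / (1.0 + abs(width - 16))

def simulate(widths, reserve, generations=30, mutation_rate=0.2,
             capacity=20, seed=0):
    """Run the toy population; returns (final widths, remaining reserve)."""
    rng = random.Random(seed)
    pop = list(widths)
    for _ in range(generations):
        # Replication: fitter agents reproduce more often, and each offspring
        # consumes one unit of the finite shared reserve.
        offspring = []
        for w in pop:
            if reserve > 0 and rng.random() < toy_fitness(w):
                reserve -= 1
                child = w
                if rng.random() < mutation_rate:          # architecture mutation
                    child = max(1, child + rng.choice([-4, 4]))
                offspring.append(child)
        pop += offspring
        # Ecological turnover: keep only the `capacity` fittest agents.
        pop = sorted(pop, key=toy_fitness, reverse=True)[:capacity]
    return pop, reserve

pop, left = simulate([4, 8, 32, 64], reserve=100)
print(sorted(pop), left)
```

Selection here acts only through differential replication and truncation, so the best width can only stay the same or improve over generations. Once the reserve runs out, replication stops.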