arXivDaily: arXiv daily academic digest, updated Monday through Friday
nlin.AO (Adaptation and Self-Organizing Systems): 5 papers
2606.12059 2026-06-11 cs.LG cs.NE nlin.AO New submission

Attention by Synchronization in Coupled Oscillator Networks

Fabio Pasqualetti, Taosha Guo

Affiliation: University of California, Irvine

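AI summary: Proposes fixed-query oscillator attention, built on Kuramoto synchronization dynamics, which computes attention on physical substrates without exponentiation or global reduction and outperforms softmax on keyword spotting and subject-verb agreement.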

Abstract

We address transformer attention on energy-constrained physical substrates. Softmax attention requires exponentiation and global reduction, operations with high energy cost on von Neumann hardware and no natural physical analog. We show that Kuramoto synchronization dynamics (which arise in electrical, mechanical, superconducting, and charge-density-wave oscillator arrays, among other physical systems) implement a well-defined attention operation without either. The resulting mechanism, fixed-query oscillator attention, replaces softmax's arithmetic with the equilibration of a gradient flow on the sphere: queries are learned anchors fixed on the sphere, and free oscillators evolve under Kuramoto-Lohe dynamics until they settle at positions encoding attention weights via cosine similarity. Because the computation is equilibration, it requires no exponentiation; the only global operation is an affine normalization at readout. The fixed point is provably unique and globally attractive from almost every initial condition, a guarantee that holds across every physical realization. Empirically, at the minimal hardware configuration (oscillator dimension $d_{\mathrm{osc}} = 2$), oscillator attention outperforms softmax on keyword spotting (+1.00 pp) and on subject-verb agreement (+5.27 pp on hard sentences, with zero training failures versus one in five for softmax). On causal language modeling, where softmax retains an advantage, oscillator attention closes the gap as $d_{\mathrm{osc}}$ grows: from +11.09 PPL at $d_{\mathrm{osc}} = 2$ to +2.98 PPL at $d_{\mathrm{osc}} = 32$ on WikiText-2, and from +2.39 PPL at $d_{\mathrm{osc}} = 2$ to +0.57 PPL at $d_{\mathrm{osc}} = 32$ on TinyStories. The main objective of this work is not to replace softmax in software but to provide a mathematically grounded blueprint for accurate attention on physical substrates.
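
A minimal sketch of the equilibration idea at $d_{\mathrm{osc}} = 2$, where the sphere reduces to the unit circle. It is not the paper's architecture: the single free oscillator, the coupling strengths, the Euler integrator, and the names `equilibrate`, `resultant_angle`, `angle_diff`, and `cosine_readout` are all assumptions made for illustration. It only shows that a gradient flow toward fixed anchors settles at the same fixed point from several starting phases, and that cosine similarities to the anchors can then be read off.

```python
import math

def equilibrate(theta0, anchors, couplings, dt=0.05, steps=4000):
    """Euler-integrate d(theta)/dt = sum_j k_j * sin(phi_j - theta) (assumed toy dynamics)."""
    theta = theta0
    for _ in range(steps):
        drift = sum(k * math.sin(phi - theta) for k, phi in zip(couplings, anchors))
        theta += dt * drift
    return theta

def resultant_angle(anchors, couplings):
    """Stable fixed point in closed form: the angle of sum_j k_j * (cos phi_j, sin phi_j)."""
    x = sum(k * math.cos(phi) for k, phi in zip(couplings, anchors))
    y = sum(k * math.sin(phi) for k, phi in zip(couplings, anchors))
    return math.atan2(y, x)

def angle_diff(a, b):
    """Difference a - b wrapped onto the circle."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

def cosine_readout(theta, anchors):
    """Cosine similarity between the settled oscillator and each anchor."""
    return [math.cos(phi - theta) for phi in anchors]

anchors = [0.2, 1.0, 2.4]     # hypothetical fixed query anchors (phases)
couplings = [1.0, 0.5, 0.8]   # hypothetical coupling strengths
target = resultant_angle(anchors, couplings)

# Different initial phases reach the same equilibrium (modulo 2*pi).
finals = [equilibrate(t0, anchors, couplings) for t0 in (-3.0, 0.0, 2.0, 4.0)]
similarities = cosine_readout(finals[0], anchors)
```

In this toy model the only starting phase that fails to converge is the one exactly antipodal to `target`, which is the one-dimensional picture behind "almost every initial condition".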

2606.11585 2026-06-11 cs.LG cs.CL nlin.AO New submission

Kuramoto Attention: Synchronizing Self-Attention on the Torus

Joshua Nunley

Affiliation: Department of Informatics, Luddy School of Informatics, Computing, and Engineering, Cognitive Science Program, Indiana University Bloomington

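AI summary: Proposes a Kuramoto attention layer that treats hidden coordinates as angles and implements self-attention through gated cosine similarity and a circular-mean update. The update is equivalent to the Kuramoto coupling term, and the layer performs close to a strong baseline on character-level language modeling.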

Comments
13 pages, 2 figures, 3 tables
Abstract

We introduce Kuramoto attention, a self-attention layer in which each hidden coordinate is an angle. The layer scores tokens by gated cosine similarity, attends over previous phase states, and updates each token by the tangent component of the attention-weighted circular mean. Because the values are the raw phase states, this update is exactly the Kuramoto coupling term $\sum_u A_{t,u}\sin(\theta_u-\theta_t)$, with the attention matrix acting as an adaptive, content-dependent coupling kernel. Equivalently, the gated score is a learned metric on the torus that selects which tokens couple, and the update pulls each token toward the circular mean of the tokens it selects, tightening their phase agreement. The same two ingredients, an invariant similarity score and an on-manifold mean, define such a layer on any compact group; the torus is the abelian case, where both are closed-form. The softmax weights solve an entropy-regularized phase-retrieval problem, and rotary position enters as a position-dependent phase drift in the score. On enwiki8 character-level language modeling, the layer trains as a functional language model whose bits-per-character stays close to a strong matched RoPE+SwiGLU transformer: within $0.02$ BPC at one million parameters ($1.637\pm0.010$ versus $1.616\pm0.004$) and level on the median at five million ($1.448$ versus $1.452$ over five seeds) with the transformer ahead on the mean ($1.468$ versus $1.456$). These experiments establish that the constrained geometric structure is a viable language model at this scale; the structure itself, and its synchronization reading, is the contribution. Ablations isolate the load-bearing components, and the result gives a compact bridge between self-attention and phase synchronization.
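
The identity between the tangent component of the attention-weighted circular mean and the Kuramoto coupling term $\sum_u A_{t,u}\sin(\theta_u-\theta_t)$ can be checked numerically for one angular coordinate. The sketch below is an illustration under assumptions, not the paper's layer: the score is a plain (ungated) cosine similarity, the causal context is taken to be the earlier tokens plus the token itself, and all function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kuramoto_coupling(theta_t, thetas, weights):
    """Kuramoto coupling term: sum_u A_{t,u} * sin(theta_u - theta_t)."""
    return sum(a * math.sin(th - theta_t) for a, th in zip(weights, thetas))

def tangent_of_circular_mean(theta_t, thetas, weights):
    """Project the weighted mean of the unit vectors onto the tangent at theta_t."""
    mean_x = sum(a * math.cos(th) for a, th in zip(weights, thetas))
    mean_y = sum(a * math.sin(th) for a, th in zip(weights, thetas))
    # The unit tangent to the circle at angle theta_t is (-sin theta_t, cos theta_t).
    return -math.sin(theta_t) * mean_x + math.cos(theta_t) * mean_y

thetas = [0.3, 1.2, 2.5, -0.7]   # hypothetical phase states of four tokens
t = 3                            # token being updated
context = thetas[: t + 1]        # assumed causal context; the self term adds sin(0) = 0
weights = softmax([math.cos(th - thetas[t]) for th in context])  # assumed ungated score

update = kuramoto_coupling(thetas[t], context, weights)
projected = tangent_of_circular_mean(thetas[t], context, weights)
```

Both routes give the same number because $-\sin\theta_t\cos\theta_u + \cos\theta_t\sin\theta_u = \sin(\theta_u-\theta_t)$.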

2606.11259 2026-06-11 nlin.AO cond-mat.stat-mech cs.SI math.DS q-bio.PE New submission

Stabilizing Role of Uninformed Participants in Collective Decision Making

Leonardo Colombo, María Emma Eyrea Irazú, Laura P. Schaposnik, James Unwin

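AI summary: Using a dissipative Hamiltonian model, shows that uninformed participants stabilize collective decisions: their direction-free dissipation delays the transition to polarization.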

Comments
23 pages, 6 images
Abstract

For groups without strict hierarchy, collective decisions often emerge through compromise. We develop a second-order network model of collective decision-making using a dissipative Hamiltonian formulation, in which informed agents introduce preferred directions while uninformed participants contribute only direction-free dissipation. We show that under low conflict, the model admits a locally unique, exponentially stable compromise state. Using a structured modular network, we further show that, as conflict increases, the local compromise branch terminates through a saddle-node fold rather than through a smooth mean-field symmetry-breaking transition. Modular polarized states persist on branches that are locally separated from the compromise branch. Direction-free dissipation does not shift the static structural threshold, but it delays escape from the saddle-node ghost and pushes the observable onset of polarization to larger conflicts. Our work identifies a dissipation-mediated mechanism, complementary to connectivity-based accounts, through which uninformed participants stabilize collective behavior in biological and engineered swarms.
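
The distinction between a static threshold and a delayed escape can be seen in a one-variable toy problem. The sketch below is not the paper's network model: it assumes the second-order saddle-node normal form $\ddot{x} + \gamma\dot{x} = \mu + x^2$, with hypothetical parameter values and function names. The equilibria depend only on $\mu$, so the fold sits at $\mu = 0$ for every damping $\gamma$, while the time spent passing the bottleneck (the "ghost") just past the fold grows with $\gamma$.

```python
def passage_time(gamma, mu=0.01, x_start=-1.0, x_end=1.0, dt=1e-3, max_steps=2_000_000):
    """Time for x to travel from x_start to x_end under x'' + gamma * x' = mu + x**2."""
    x, v, t = x_start, 0.0, 0.0
    for _ in range(max_steps):
        if x >= x_end:
            return t
        force = mu + x * x
        v += dt * (force - gamma * v)   # semi-implicit Euler: update velocity first
        x += dt * v
        t += dt
    raise RuntimeError("did not cross the bottleneck within max_steps")

def static_equilibria(mu):
    """Roots of mu + x**2 = 0; the damping gamma does not appear here."""
    if mu > 0:
        return []                       # past the fold: no equilibrium remains
    root = (-mu) ** 0.5
    return [-root, root]

# Same distance past the fold (mu = 0.01), two hypothetical damping levels.
fast = passage_time(gamma=0.5)
slow = passage_time(gamma=2.0)
```

With stronger damping the trajectory lingers longer near $x = 0$, so a simulation of fixed duration would report the transition at a larger value of $\mu$ even though the fold itself has not moved.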

2511.04327 2026-06-11 q-bio.PE nlin.AO physics.bio-ph

Feasibility and Single Parameter Scaling of Extinctions in Large Ecological Communities

Philippe Jacquod

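AI summary: Uses random matrix theory to analyze the feasibility of species coexistence and the triggers of extinction in large ecological communities, derives an analytical expression for extinction probabilities, and conjectures a single-parameter scaling law.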

Journal ref
Phys. Rev. E 113, L62202 (2026)
Comments
Final version; to appear in Phys. Rev. E Letters
Abstract

Multispecies ecosystems modelled by generalized Lotka-Volterra equations exhibit stationary population abundances, where a large number of species often coexist. Understanding the precise conditions under which this is at all feasible and what triggers species extinctions is a key, outstanding problem in theoretical ecology. Using standard methods of random matrix theory, I show that distributions of species abundances are Gaussian at equilibrium, in the weakly interacting regime. One consequence is that feasibility is generically broken before stability, for a large enough number of species. I further derive an analytical expression for the probability that $n=0,1,2,\ldots$ species go extinct and conjecture that a single-parameter scaling law governs species extinctions. These results are corroborated by numerical simulations in a wide range of system parameters.
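
A feasibility check of the kind discussed above can be sketched in a few lines. The model form is an assumption, since the abstract does not state it: $\dot{N}_i = N_i\,(r_i - N_i + \sum_j A_{ij} N_j)$ with Gaussian off-diagonal interactions of standard deviation $\sigma/\sqrt{S}$. The interior equilibrium then satisfies $N = r + AN$, which the sketch solves by fixed-point iteration (adequate only for weak interactions). All names and parameter values are hypothetical.

```python
import random

def random_interactions(S, sigma, seed=0):
    """S x S matrix with zero diagonal and Gaussian entries of std sigma / sqrt(S)."""
    rng = random.Random(seed)
    scale = sigma / S ** 0.5
    return [[0.0 if i == j else rng.gauss(0.0, scale) for j in range(S)]
            for i in range(S)]

def glv_equilibrium(A, r, iters=200):
    """Solve N = r + A N by fixed-point iteration (converges when A is weak)."""
    S = len(r)
    n = list(r)
    for _ in range(iters):
        n = [r[i] + sum(A[i][j] * n[j] for j in range(S)) for i in range(S)]
    return n

def is_feasible(n):
    """An equilibrium is feasible when every abundance is strictly positive."""
    return all(x > 0 for x in n)

S = 30                                   # hypothetical number of species
r = [1.0] * S                            # hypothetical uniform growth rates
A = random_interactions(S, sigma=0.2, seed=1)
n_star = glv_equilibrium(A, r)
residual = max(abs(n_star[i] - r[i] - sum(A[i][j] * n_star[j] for j in range(S)))
               for i in range(S))
feasible = is_feasible(n_star)
```

Repeating this over many seeds while increasing `S` or `sigma` is one way to estimate how often feasibility fails. The iteration itself stops converging once interactions become strong, so a direct linear solve would be needed in that regime.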

2511.00044 2026-06-11 cs.LG nlin.AO Updated version

Time-multiplexed layer reuse for physical neural networks

Kohei Tsuchiyama, Andre Roehm, Takatomo Mihana, Ryoichi Horisaki

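AI summary: To address the bottleneck of slow weight adjustment in physical neural networks, proposes TIDAL-Net, which increases effective depth through time-multiplexed layers and improves performance on image classification and natural language processing tasks.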

Abstract

Physical neural networks (PNNs) are promising candidates for next-generation computing, but existing demonstrations remain several orders of magnitude smaller than modern digital neural networks, whose recent advances have been driven by rapid growth in trainable parameters. This situation resembles the constraints of early digital neural networks, which led to ideas around parameter reuse. We investigate what similarly efficient hardware architectures may look like, focusing specifically on the common bottleneck of slow re-adjustment of the weights in PNNs. We propose the Time-Indexed Deep Alternating Layers Network (TIDAL-Net), which occupies an intermediate regime between recurrent and deep neural networks, specifically aimed at the scales and restrictions of common PNN prototypes. TIDAL-Net leverages the timescale separation found in many PNNs between fast forward dynamics and slowly trainable weights and biases, using layer-by-layer time multiplexing to increase effective depth while limiting implementation cost. Numerical experiments on image classification and natural language processing tasks show that TIDAL-Net improves performance with only minor modifications to conventional PNNs.
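
The general idea of gaining depth by reusing a few slowly tunable layers over time can be sketched as follows. This is an illustration under assumptions, not the TIDAL-Net architecture itself: the abstract does not specify the layer schedule, so the sketch simply cycles through a small bank of fixed layers (`layers[t % len(layers)]`) with a `tanh` nonlinearity, and all names and weights are hypothetical.

```python
import math

def layer(weights, bias, x):
    """One dense layer with tanh activation: y_i = tanh(sum_j W_ij * x_j + b_i)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def time_multiplexed_forward(layers, x, depth):
    """Apply `depth` layer passes while cycling through a fixed bank of layers."""
    for t in range(depth):
        weights, bias = layers[t % len(layers)]   # assumed alternation schedule
        x = layer(weights, bias, x)
    return x

def parameter_count(layers):
    """Trainable parameters depend on the bank size, not on the effective depth."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

# Two hypothetical 2x2 layers, reused to reach an effective depth of four.
layer_a = ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1])
layer_b = ([[-0.3, 0.4], [0.7, 0.2]], [0.05, -0.1])
bank = [layer_a, layer_b]

x0 = [1.0, -1.0]
deep_output = time_multiplexed_forward(bank, x0, depth=4)
```

The weights are set once and stay fixed during the forward pass, which matches the timescale separation described above: only the fast signal path is exercised repeatedly.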