SpikeDecoder: Realizing the GPT Architecture with Spiking Neural Networks
Claas Beger, Florian Walter, Alois Knoll
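AI summary: Proposes SpikeDecoder, a Transformer decoder based on Spiking Neural Networks (SNNs) for natural language processing. By replacing ANN modules and optimizing the embedding method, it lowers theoretical energy consumption by 87% to 93% while maintaining performance.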
The Transformer architecture is widely regarded as the most powerful tool for natural language processing, but its large number of complex operations makes it inherently energy-intensive. To address this issue, we consider Spiking Neural Networks (SNNs), which offer an energy-efficient alternative to conventional Artificial Neural Networks (ANNs) because they process information in a naturally event-driven manner. However, this event-driven nature also makes them difficult to train. Many SNN-based models circumvent this issue by converting pre-trained ANNs. More recently, attempts have been made to design directly trainable SNN-based adaptations of the Transformer model structure. Although the results showed great promise, the application field was computer vision, and the proposed model incorporated only encoder blocks. In this paper, we propose SpikeDecoder, a fully SNN-based implementation of the Transformer decoder block for applications in natural language processing. In a series of experiments, we analyze the impact of exchanging individual blocks of the ANN model with spike-based alternatives to identify trade-offs and significant sources of performance loss. We further investigate the role of residual connections and the selection of SNN-compatible normalization techniques. Beyond the model architecture, we formulate and compare different embedding methods for projecting text data into spikes. Finally, we demonstrate that our proposed SNN-based decoder block reduces the theoretical energy consumption by 87% to 93% compared to the ANN baseline.
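The abstract does not state how the theoretical energy consumption is computed. A common approach in the SNN literature, shown here only as a rough sketch, is to count operations: an ANN layer pays one multiply-accumulate (MAC) per connection, while an SNN layer pays one accumulate (AC) per connection only when a spike arrives. The per-operation energies below (4.6 pJ per MAC, 0.9 pJ per AC, often quoted for 32-bit floating point on 45 nm hardware) and the function names, firing rate, and time-step count are assumptions for illustration, not values taken from the paper.

```python
# Assumed per-operation energies in picojoules (commonly quoted 45 nm estimates);
# the paper may use different constants.
E_MAC_PJ = 4.6  # one multiply-accumulate (dense ANN operation)
E_AC_PJ = 0.9   # one accumulate (spike-driven SNN operation)


def ann_energy_pj(num_connections: int) -> float:
    """Energy of a dense ANN layer: one MAC per connection."""
    return num_connections * E_MAC_PJ


def snn_energy_pj(num_connections: int, firing_rate: float, time_steps: int) -> float:
    """Energy of a spiking layer: one AC per connection per time step,
    scaled by the fraction of inputs that actually spike."""
    if not 0.0 <= firing_rate <= 1.0:
        raise ValueError("firing_rate must lie in [0, 1]")
    return num_connections * firing_rate * time_steps * E_AC_PJ


def relative_saving(ann_pj: float, snn_pj: float) -> float:
    """Fraction of the ANN energy saved by the SNN variant."""
    return 1.0 - snn_pj / ann_pj


if __name__ == "__main__":
    # Hypothetical layer: 1,000 connections, 20% of inputs spiking, 4 time steps.
    ann = ann_energy_pj(1_000)
    snn = snn_energy_pj(1_000, firing_rate=0.2, time_steps=4)
    print(f"ANN: {ann:.1f} pJ, SNN: {snn:.1f} pJ, saving: {relative_saving(ann, snn):.1%}")
```

Under this kind of estimate, the saving depends on how sparse the spikes are and how many time steps are simulated, so lower firing rates and fewer time steps translate directly into lower theoretical energy.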