Stability properties of Minimal Gated Unit neural networks
Stefano De Carli, Davide Previtali, Mirko Mazzoleni, Fabio Previdi
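AI summary: Targeting resource-constrained environments, the paper analyzes the input-to-state stability of Minimal Gated Unit networks, derives sufficient parametric conditions, proposes stability-promoting training methods, and validates the networks' parameter efficiency and inference speed advantages on synthetic data and the Silverbox benchmark.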
Details
- Comments: Preprint submitted to Automatica. 16 pages, 6 figures, and 1 table. MATLAB code for the proposed methodologies is available at: https://github.com/StefanoDeCarli/MGU_dISS.git
In this work, we address the need for efficient and formally stable Recurrent Neural Networks (RNNs) in environments with limited computational resources by analyzing the stability of the Minimal Gated Unit (MGU) network, a lightweight alternative to the gated RNNs commonly used in system identification. We derive sufficient parametric conditions for the input-to-state stability and incremental input-to-state stability of the MGU network. These conditions enable a posteriori validation of model stability and form the basis for novel stability-promoting training methodologies, including a warm start of the network's parameters and a projected gradient-based optimization scheme, both of which are presented in this work. A comparative evaluation, including robustness analysis and validation on synthetic and real-world data (the Silverbox benchmark), demonstrates that the MGU network combines formal stability guarantees with superior parameter efficiency and faster inference than other state-of-the-art RNNs, while maintaining comparable and satisfactory accuracy. Notably, on the Silverbox benchmark, the stable MGU network effectively captures the system dynamics, whereas other stable RNNs fail to converge to a reliable model.
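For readers unfamiliar with the architecture, the following is a minimal sketch of a single-layer MGU state update in plain Python. It assumes the commonly cited MGU formulation (one forget gate shared between the candidate state and the state update); the paper's exact parametrization may differ. The names `mgu_step`, `random_params`, `W_f`, `U_f`, `b_f`, `W_h`, `U_h`, and `b_h` are hypothetical and are not taken from the authors' MATLAB code. The sketch does not implement the stability conditions derived in the paper; it only illustrates the elementary fact that the state stays in [-1, 1] when initialized there, because each update is a convex combination of the previous state and a tanh output.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    # Dense matrix-vector product on nested lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mgu_step(x, h, p):
    """One MGU update (assumed standard formulation, not the paper's code)."""
    # Forget gate: f = sigmoid(W_f x + U_f h + b_f)
    f = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["W_f"], x), matvec(p["U_f"], h), p["b_f"])]
    # Candidate state: tanh(W_h x + U_h (f * h) + b_h)
    fh = [fi * hi for fi, hi in zip(f, h)]
    cand = [math.tanh(a + b + c)
            for a, b, c in zip(matvec(p["W_h"], x), matvec(p["U_h"], fh), p["b_h"])]
    # New state: convex combination of the old state and the candidate.
    return [(1.0 - fi) * hi + fi * ci for fi, hi, ci in zip(f, h, cand)]


def random_params(n_in, n_h, rng):
    # Hypothetical initialization: uniform weights in [-1, 1].
    def mat(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_f": mat(n_h, n_in), "U_f": mat(n_h, n_h), "b_f": [0.0] * n_h,
        "W_h": mat(n_h, n_in), "U_h": mat(n_h, n_h), "b_h": [0.0] * n_h,
    }


rng = random.Random(0)
params = random_params(n_in=1, n_h=4, rng=rng)
h = [0.0] * 4          # initial state inside [-1, 1]
max_abs = 0.0
for t in range(200):
    x = [math.sin(0.1 * t)]      # scalar test input
    h = mgu_step(x, h, params)
    max_abs = max(max_abs, max(abs(v) for v in h))
```

Here `max_abs` tracks the largest state magnitude seen along the trajectory. Boundedness of the state is a much weaker property than the input-to-state stability and incremental input-to-state stability analyzed in the paper, which depend on the sufficient parametric conditions derived there.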