The location attention mechanism is widely applied in deep neural networks. However, because the mechanism entails a heavy computing workload, consumes significant memory for weight storage, and exhibits poor parallelism in some of its calculations, high-efficiency deployment is difficult. In this paper, a field-programmable gate array (FPGA) is employed to implement the location attention mechanism in hardware, and a novel fusion approach is proposed to connect the convolutional layer with the fully connected layer, which not only improves the parallelism of both the algorithm and the hardware pipeline but also reduces the cost of operations such as multiplication and addition. Meanwhile, a shared computing architecture reduces the demand for hardware resources: a single computing array is time-multiplexed to serve as parallel computing arrays, which speeds up the pipelined parallel computation of the attention mechanism. Experimental results show that for the location attention mechanism, the FPGA's inference time is 0.010 ms, about a quarter of that on a GPU, and its power consumption is 1.73 W, about 2.89% of that of a CPU. Compared with other FPGA implementations of attention mechanisms, the design consumes fewer hardware resources and requires less inference time. When applied to a speech recognition task, the trained attention model is symmetrically quantized and deployed on the FPGA; the word error rate is only 0.79% higher than before quantization, which confirms the effectiveness and correctness of the hardware circuit.
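The symmetric quantization step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's actual fixed-point pipeline; the function names and the 8-bit width are illustrative assumptions. Symmetric quantization uses a zero-point of 0, so zero stays exactly representable, which simplifies fixed-point arithmetic on an FPGA.

```python
import numpy as np

def symmetric_quantize(w, num_bits=8):
    """Symmetrically quantize a float array to signed integers.

    The scale maps the largest absolute weight to the top of the
    signed integer range; the zero-point is fixed at 0.
    """
    qmax = 2 ** (num_bits - 1) - 1                    # e.g. 127 for int8
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight matrix and reconstruct it.
w = np.array([[0.50, -1.27], [0.031, 0.0]], dtype=np.float32)
q, s = symmetric_quantize(w)
w_hat = dequantize(q, s)
```

Because rounding is the only lossy step, the reconstruction error per weight is bounded by half the scale, which is the kind of bound that keeps the post-quantization word-error-rate degradation small.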
Journal of Intelligent and Fuzzy Systems – IOS Press
Published: Aug 10, 2022
Keywords: Attention mechanism; neural networks; FPGA; deep learning; hardware implementation