The location attention mechanism is widely applied in deep neural networks. However, because the mechanism entails a heavy computing workload, consumes significant memory for weight storage, and exhibits poor parallelism in some of its calculations, high-efficiency deployment is difficult. In this paper, a field-programmable gate array (FPGA) is employed to implement the location attention mechanism in hardware, and a novel fusion approach is proposed to connect the convolutional layer with the fully connected layer, which not only improves the parallelism of both the algorithm and the hardware pipeline but also reduces the cost of operations such as multiplication and addition. Meanwhile, a shared computing architecture reduces the demand for hardware resources: a single computing array is time-multiplexed to serve as parallel computing arrays, which speeds up the pipelined parallel computation of the attention mechanism. Experimental results show that for the location attention mechanism, the FPGA's inference time is 0.010 ms, about a quarter of that on a GPU, and its power consumption is 1.73 W, about 2.89% of that of a CPU. Compared with other FPGA implementations of attention mechanisms, the design consumes fewer hardware resources and requires less inference time. When applied to a speech recognition task, the trained attention model is symmetrically quantized and deployed on the FPGA; the word error rate is only 0.79% higher than before quantization, which confirms the effectiveness and correctness of the hardware circuit.
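The symmetric quantization step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's actual fixed-point pipeline; the function names and the 8-bit width are illustrative assumptions. Symmetric quantization uses a zero-point of 0, so zero stays exactly representable, which simplifies fixed-point arithmetic on an FPGA.

```python
import numpy as np

def symmetric_quantize(w, num_bits=8):
    """Symmetrically quantize a float array to signed integers.

    The scale maps the largest absolute weight to the top of the
    signed integer range; the zero-point is fixed at 0.
    """
    qmax = 2 ** (num_bits - 1) - 1                    # e.g. 127 for int8
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight matrix and reconstruct it.
w = np.array([[0.50, -1.27], [0.031, 0.0]], dtype=np.float32)
q, s = symmetric_quantize(w)
w_hat = dequantize(q, s)
```

Because rounding is the only lossy step, the reconstruction error per weight is bounded by half the scale, which is the kind of bound that keeps the post-quantization word-error-rate degradation small.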
Journal of Intelligent and Fuzzy Systems – IOS Press
Published: Aug 10, 2022
Keywords: Attention mechanism; neural networks; FPGA; deep learning; hardware implementation