# Deep learning for natural language processing: advantages and challenges

## INTRODUCTION

Deep learning refers to machine learning technologies for learning and utilizing ‘deep’ artificial neural networks, such as deep neural networks (DNN), convolutional neural networks (CNN) and recurrent neural networks (RNN). Recently, deep learning has been successfully applied to natural language processing and significant progress has been made. This paper summarizes the recent advances of deep learning for natural language processing and discusses its advantages and challenges.

We think that there are five major tasks in natural language processing: classification, matching, translation, structured prediction and the sequential decision process. For the first four tasks, the deep learning approach has outperformed, often significantly, the traditional approaches. End-to-end training and representation learning are the key features of deep learning that make it a powerful tool for natural language processing. Deep learning is not a panacea, however. It may not be sufficient for inference and decision making, which are essential for complex problems like multi-turn dialogue. Furthermore, how to combine symbolic processing with neural processing, and how to deal with the long-tail phenomenon, remain open challenges for deep learning in natural language processing.

## PROGRESS IN NATURAL LANGUAGE PROCESSING

In our view, there are five major tasks in natural language processing, namely classification, matching, translation, structured prediction and the sequential decision process. Most problems in natural language processing can be formalized as one of these five tasks, as summarized in Table 1. In these tasks, words, phrases, sentences, paragraphs and even documents are usually viewed as sequences of tokens (strings) and treated similarly, although they differ in complexity. In fact, sentences are the most widely used processing units.
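The five formalizations can be sketched as hypothetical Python signatures; the toy bodies below exist only to illustrate the input/output types of each task, not actual models:

```python
# Illustrative (hypothetical) signatures for the five NLP tasks.
# The bodies are toy placeholders standing in for learned models.

def classify(s: str) -> str:
    """Classification: s -> c (string to label), e.g. sentiment analysis."""
    return "positive" if "good" in s else "negative"   # toy rule

def match(s: str, t: str) -> float:
    """Matching: s, t -> R+ (two strings to a non-negative score)."""
    overlap = set(s.split()) & set(t.split())
    return float(len(overlap))                         # toy overlap score

def translate(s: str) -> str:
    """Translation: s -> t (string to string)."""
    return s[::-1]                                     # placeholder transform

def parse(s: str) -> list:
    """Structured prediction: s -> [s] (string to structure)."""
    return s.split()                                   # toy "structure": token list

def policy(state: str) -> str:
    """Sequential decision process: pi: s -> a (state to action)."""
    return "ask_clarification" if state == "ambiguous" else "answer"
```

In a real system each body would be a trained neural network; only the type of the mapping distinguishes the five tasks.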
It has been observed recently that deep learning can enhance performance on the first four tasks and has become the state-of-the-art technology for them (e.g. [1–8]). Table 2 shows the performance on example problems for which deep learning has surpassed traditional approaches. Among all NLP problems, progress in machine translation is particularly remarkable. Neural machine translation, i.e. machine translation using deep learning, has significantly outperformed traditional statistical machine translation. The state-of-the-art neural translation systems employ sequence-to-sequence learning models comprising RNNs [4–6]. Deep learning has also, for the first time, made certain applications possible. For example, deep learning has been successfully applied to image retrieval (also known as text-to-image retrieval), in which the query and the image are first transformed into vector representations with CNNs, the representations are matched with a DNN, and the relevance of the image to the query is calculated [3]. Deep learning is also employed in generation-based natural language dialogue, in which, given an utterance, the system automatically generates a response, with the model trained by sequence-to-sequence learning [7].

**Table 1.** Five tasks in natural language processing.
| Task | Description | Model | Applications |
| --- | --- | --- | --- |
| Classification | assign a label to a string | $s \to c$; $s$: string, $c$: label | text classification, sentiment analysis |
| Matching | match two strings | $s, t \to \mathbb{R}^+$; $s$: string, $t$: string, $\mathbb{R}^+$: non-negative real value | search, question answering, single-turn dialogue (retrieval-based) |
| Translation | transform one string into another | $s \to t$; $s$: string, $t$: string | machine translation, automatic speech recognition, single-turn dialogue (generation-based) |
| Structured prediction | map a string to a structure | $s \to [s]$; $s$: string, $[s]$: structure | named entity recognition, word segmentation, part-of-speech tagging, dependency parsing, semantic parsing |
| Sequential decision process | take actions in states in a dynamically changing environment | $\pi: s \to a$; $\pi$: policy, $s$: state, $a$: action | multi-turn dialogue |

**Table 2.** Performance on natural language processing problems.
| Task | Example problem | Deep learning | Traditional approach | Reference |
| --- | --- | --- | --- | --- |
| Classification | sentiment classification | CNN, acc = 86.8% | SVM, acc = 79.4% | [1] |
| Matching | single-turn dialogue | CNN, p@1 = 49.6% | MLP, p@1 = 36.1% | [2] |
| Translation | machine translation | NMT, BLEU = 39.0 | SMT, BLEU = 37.0 | [6] |
| Structured prediction | dependency parsing | acc = 91.8% | acc = 90.7% | [8] |

**Table 3.** Advantages and challenges of deep learning for natural language processing.

| Advantages | Challenges |
| --- | --- |
| good at pattern-recognition problems | not good at inference and decision making |
| data-driven, with high performance on many problems | cannot directly handle symbols |
| end-to-end training: little or no domain knowledge is needed for system construction | data-hungry, and thus not suitable when data size is small |
| representation learning: cross-modal processing is possible | difficult to handle long-tail phenomena |
| gradient-based learning: the learning algorithm is simple | the model is usually a black box and difficult to interpret |
| mainly supervised learning methods | the computational cost of learning is high |
| | unsupervised learning methods need to be developed |
| | still lacks a theoretical foundation |

The fifth task, the sequential decision process such as the Markov decision process, is the key issue in multi-turn dialogue, as explained below. It has not been thoroughly verified, however, how deep learning can contribute to this task.

## ADVANTAGES AND CHALLENGES

Deep learning certainly has advantages and challenges when applied to natural language processing, as summarized in Table 3.

### Advantages

We think that, among the advantages, end-to-end training and representation learning really differentiate deep learning from traditional machine learning approaches and make it powerful machinery for natural language processing. It is often possible to perform end-to-end training in deep learning for an application. This is because the model (a deep neural network) offers rich representability, so the information in the data can be effectively ‘encoded’ in the model. For example, in neural machine translation, the model is constructed completely automatically from a parallel corpus, and usually no human intervention is needed. This is clearly an advantage over the traditional approach of statistical machine translation, in which feature engineering is crucial.

With deep learning, the representations of data in different forms, such as text and images, can all be learned as real-valued vectors. This makes it possible to perform information processing across multiple modalities. For example, in image retrieval, it becomes feasible to match a query (text) against images and find the most relevant ones, because all of them are represented as vectors.
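Cross-modal matching in a shared vector space can be sketched as follows. In [3] the text query and the images are encoded by CNNs; here, as stand-ins for those hypothetical encoders, we use random vectors and score relevance by cosine similarity:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two representation vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
d = 128                                        # assumed embedding dimension
query_vec = rng.standard_normal(d)             # stand-in for an encoded text query
image_vecs = rng.standard_normal((5, d))       # stand-ins for 5 encoded images
# Make image 2 "relevant" by placing it near the query in the shared space.
image_vecs[2] = query_vec + 0.1 * rng.standard_normal(d)

# Rank images by similarity to the query; the relevant one ranks first.
scores = [cosine(query_vec, v) for v in image_vecs]
ranking = sorted(range(5), key=lambda i: -scores[i])
print(ranking[0])  # 2
```

The point of the sketch is only that, once every modality lives in the same vector space, retrieval reduces to nearest-neighbor search under a similarity measure.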
### Challenges

Some challenges of deep learning are general: the lack of a theoretical foundation, the lack of model interpretability, and the need for large amounts of data and powerful computing resources. Other challenges are more specific to natural language processing: difficulty in dealing with the long tail, the inability to directly handle symbols, and ineffectiveness at inference and decision making.

Data in natural language always follow a power-law distribution. As a result, for example, the size of the vocabulary keeps increasing as the size of the data increases. That means that, no matter how much data there is for training, there always exist cases that the training data cannot cover. How to deal with this long-tail problem poses a significant challenge to deep learning; the problem would be hard to solve by resorting to deep learning alone.

Language data is by nature symbolic, which differs from the vector data (real-valued vectors) that deep learning normally utilizes. Currently, symbols in language are converted to vector data before being input into neural networks, and the output of the networks is converted back to symbols. In fact, a large amount of knowledge for natural language processing is in symbolic form, including linguistic knowledge (e.g. grammar), lexical knowledge (e.g. WordNet) and world knowledge (e.g. Wikipedia). Currently, deep learning methods have not yet made effective use of such knowledge. Symbolic representations are easy to interpret and manipulate, while vector representations are robust to ambiguity and noise. How to combine symbolic and vector data, and how to leverage the strengths of both, remains an open question for natural language processing.

There are also complex tasks in natural language processing that may not be easily realized with deep learning alone. For example, multi-turn dialogue amounts to a very complicated process.
It involves language understanding, language generation, dialogue management, knowledge base access and inference. Dialogue management can be formalized as a sequential decision process, in which reinforcement learning can play a critical role. Obviously, a combination of deep learning and reinforcement learning could be potentially useful for this task, which goes beyond deep learning itself.

In summary, there are still a number of open challenges with regard to deep learning for natural language processing. Deep learning, when combined with other technologies (reinforcement learning, inference, knowledge), may further push the frontier of the field.

## FUNDING

This work is supported in part by the National Basic Research Program of China (973 Program, 2014CB340301).

## REFERENCES

1. Blunsom P, Grefenstette E, Kalchbrenner N. A convolutional neural network for modelling sentences. In: 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, USA, 2014, 655–65.
2. Hu B, Lu Z, Li H. Convolutional neural network architectures for matching natural language sentences. In: Advances in Neural Information Processing Systems 27. Montreal, Canada, 2014, 2042–50.
3. Ma L, Lu Z, Shang L et al. Multimodal convolutional neural networks for matching image and sentence. In: IEEE International Conference on Computer Vision. Santiago, Chile, 2015, 2623–31.
4. Cho K, Van Merriënboer B, Gulcehre C et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Conference on Empirical Methods in Natural Language Processing. Doha, Qatar, 2014, 1724–34.
5. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations. San Diego, USA, 2015.
6. Wu Y, Schuster M, Chen Z et al. Google's neural machine translation system: bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016.
7. Shang L, Lu Z, Li H. Neural responding machine for short-text conversation. In: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing, China, 2015, 1577–86.
8. Chen D, Manning CD. A fast and accurate dependency parser using neural networks. In: Conference on Empirical Methods in Natural Language Processing. Doha, Qatar, 2014, 740–50.

© The Author(s) 2017. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com


National Science Review, Volume 5 (1) – Jan 1, 2018, 3 pages

Publisher: Oxford University Press
ISSN: 2095-5138
eISSN: 2053-714X
DOI: 10.1093/nsr/nwx110


