Grammatical Error Detection and Correction using a Single Maximum Entropy Model

Peilu Wang; Zhongye Jia; Hai Zhao

doi:10.3115/v1/w14-1710

Wang, Peilu;Jia, Zhongye;Zhao, Hai;

2014-01-01 00:00:00

Grammatical Error Detection and Correction using a Single Maximum Entropy Model Peilu Wang, Zhongye Jia and Hai Zhao Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Center for Brain-Like Computing and Machine Intelligence Department of Computer Science and Engineering, Shanghai Jiao Tong University 800 Dongchuan Road, Shanghai 200240, China { plwang1990,jia.zhongye} @gmail.com,[email protected] Abstract cusing on ﬁve major types of errors, requires to correct all 28 types of errors (Ng et al., 2014). This paper describes the system of Shang- One traditional strategy is designing a system hai Jiao Tong Unvierity team in the combined of a set of sub-models, where each sub- CoNLL-2014 shared task. Error correc- model is specialized for a speciﬁc subtask, for ex- tion operations are encoded as a group of ample, correcting one type of errors. This strat- predeﬁned labels and therefore the task egy is computationally efﬁcient and can adopt d- is formulized as a multi-label classiﬁca- ifferent favorable features for each subtask. Top tion task. For training, labels are obtained ranked systems in CoNLL-2013 (Rozovskaya et through a strict rule-based approach. For al., 2013; Kao et al., 2013; Xing et al., 2013; decoding, errors are detected and correct- Yoshimoto et al., 2013; Xiang et al., 2013) are ed according to the classiﬁcation results. based on this strategy. However, the division of A single maximum entropy model is used the model relies on prior-knowledges and the de- for the classiﬁcation implementation in- signing of different features for each sub-model corporated with an improved feature selec- requires a large amount of manual works. This tion algorithm. Our system achieved pre- shortage is especially notable in CoNLL-2014 cision of 29.83, recall of 5.16 and F 0.5 shared task, since the number of error types is of 15.24 in the ofﬁcial evaluation. much larger and the composition of errors is more complicated than before. 1 Introduction In contrast, we follow the work in (Jia et al., The task of CoNLL-2014 is grammatical error cor- 2013a; Zhao et al., 2009a), integrating everything rection which consists of detecting and correcting into one model. This integrated system holds a the grammatical errors in English essays written merit that a one-way feature selection beneﬁts the by non-native speakers (Ng et al., 2014). The re- whole system and no additional process is needed search of grammatical error correction can poten- to deal with the conﬂict or error propagation of ev- tially help millions of people in the world who are ery sub-models. Here is a glance of this method: A learning English as foreign language. Although set of more detailed error types are generated auto- there have been many works on grammatical error matically from the original 28 types of errors. The correction, the current approaches mainly focus on detailed error type can be regarded as the label of very limited error types and the result is far from a word, thus the task of grammatical error detec- satisfactory. tion is transformed to a multi-label classiﬁcation The CoNLL-2014 shared task, compared with task using maximum entropy model (Berger et al., the previous Help Our Own (HOO) tasks (Dale et 1996; Zhao et al., 2013). A feature selection ap- al., 2012) considering only determiner and prepo- proach is introduced to get effective features from sition errors and the CoNLL-2013 shared task fo- large amounts of feature candidates. Once errors This work was partially supported by the National Natu- are detected through word label classiﬁcation, a ral Science Foundation of China (Grant No.60903119, Grant rule-based method is used to make corrections ac- No.61170114, and Grant No.61272248), the National Ba- cording to their labels. sic Research Program of China (Grant No.2013CB329401), the Science and Technology Commission of Shanghai Mu- The rest of the paper is organized as follows. nicipality (Grant No.13511500200), and the European Union Section 2 describes the system architecture. Sec- Seventh Framework Program (Grant No.247619). Corresponding author tion 3 introduces the feature selection approach Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 74–82, Baltimore, Maryland, 26-27 July 2014. c 2014 Association for Computational Linguistics and the features we used. Experiments and result- extended from the Levenshtein edit distance algo- s are presented in section 5, followed by conclu- rithm and can divide error types into more detailed sion. subtypes that each subtype can be corrected by ap- plying one simple rule. How to calculate the ex- 2 System Architecture tended Levenshtein edit distance is described in Algorithm 1. In our approach, the grammatical error detection is regarded as a multi-label classiﬁcation task. At Algorithm 1 Extended Levenshtein Edit Distance ﬁrst, each token in training corpus is assigned a la- INPUT: toks , toks src dst bel according to the golden annotation. The con- OUTPUT: E, P struction of labels is rule based using an extend- l , l ← len(toks ), len(toks ) src dst src dst D[0 . . . l ][0 . . . l ] ← 0 ed version of Levenshtein edit distance algorith- src dst B[0 . . . l ][0 . . . l ] ← (0, 0) src dst m which will be discussed in the following sub- E[0 . . . l ][0 . . . l ] ← ϕ src dst section. Each label maps an edit operation to do for i ← 1 . . . l do src D[i][0] ← i the correction, thus the generated labels are much B[i][0] ← (i- 1, 0) more detailed than the originial 28 error types. E[i][0] ← D Then, a maximum entropy (ME) model is adopted end for for j ← 1 . . . l do dst as the classiﬁer. With the labeled data, the process D[0][j] ← j of grammatical error correction is just applying the B[0][j] ← (0, j- 1) edit operation mapped by each label, which is ba- E[0][j] ← A end for sically the reverse of the labeling phase. for i ← 1 . . . l ; do src for j ← 1 . . . l do dst 2.1 Data Labeling if toks [i- 1] = toks [j- 1] then src dst D[i][j] ← D[i- 1][j- 1] In CoNLL-2014 shared task, there are 28 error B[i][j] ← (i- 1, j- 1) types but they can not be used directly as class la- E[i][j] ← U else bels, since these types are too general that they can m = min(D[i- 1][j- 1], D[i- 1][j], D[i][j- 1]) hardly be corrected by applying one rule-based if m = D[i- 1][j- 1] then edit. For example, the correction of Vform (ver- D[i][j] ← D[i- 1][j- 1] + 1 B[i][j] ← (i- 1, j- 1) b form) error type includes all verb form inﬂec- if lemma(toks [i- 1]) src tions such as converting a verb to its inﬁnitive for- = lemma(toks [j- 1]) then dst m, gerund form, past form and past participle and E[i][j] ← S else so on. Previous works (Dahlmeier et al., 2012; E[i][j] ← I Rozovskaya et al., 2012; Kochmar et al., 2012) end if manually decompose each error types to more de- else if m = D[i- 1][j] then D[i][j] ← D[i- 1][j] + 1 tailed subtypes. For example, in (Dahlmeier et al., B[i][j] ← (i- 1, j) 2012), the determinater errors are decomposed in- E[i][j] ← D to: else if m = D[i][j- 1] then D[i][j] ← D[i][j- 1] + 1 B[i][j] ← (i, j- 1) • replacement determiner (RD): { a → the } E[i][j] ← A end if • missing determiner (MD): { ϵ → a } end if end for • unwanted determiner (UD): { a → ϵ } end for i, j ← l , l src dst For a task with a few error types such as merely while i > 0∨ j > 0 do insert E[i][j] into head of E determinative and preposition error in HOO 2012, insert toks [j − 1] into head of P dst manually decomposition may be sufﬁcient. How- (i, j) ← B[i][j] end while ever, for CoNLL-2014, all 28 error types are re- return (E, P) quired to be corrected and some of these types such as Rloc- (Local redundancy) and Um (Un- clear meaning) are quite complex that the manu- In this algorithm, toks represents the tokens src al decomposition is time consuming and requires that are annotated with one grammatical error and lots of grammatical knowledges. Therefore, an au- toks represents the corrected tokens of toks . dst src tomatica decomposition method is proposed. It is At ﬁrst, three two dimensional matrixes D, B and 75 ⟨label⟩ ::= ⟨simple-label ⟩ | ⟨ compound-label ⟩ E are initialized. For all i and j, D[i][j] holds the Levenshtein distance between the ﬁrst i tokens ⟨simple-label ⟩ ::= ⟨pivot⟩ | ⟨ add-before ⟩ | of toks and ﬁrst j tokens of toks . B stores src dst ⟨add-after ⟩ the path of the Levenshtein distance and E stores the edit operations in this path. The original Lev- ⟨compound-label ⟩ ::= ⟨add-before ⟩ ⟨pivot⟩ enshtein edit distance has 4 edit operations: un- | ⟨ pivot⟩ ⟨add-after ⟩ change (U), addition (A), deletion (D) and substi- | ⟨ add-before ⟩ ⟨pivot⟩ ⟨add-after ⟩ tution (S). We extend the “substitution” edit into two types of edits: inﬂection (I) and the original ⟨pivot⟩ ::= ⟨unchange⟩ | ⟨ substitution⟩ | substitution (S). If two different words have the ⟨inﬂection⟩ same lemma, the substitution operation is I, else is | ⟨ deletion⟩ S. lemma(x) returns the lemma of token x. This algorithm returns the edit operations E and the pa- ⟨add-before ⟩ ::= ⟨word⟩⊕ rameters of these operations P. Here is a simple | ⟨ word⟩⊕⟨add-before ⟩ sample illustrating this algorithm. For the golden ⟨add-after ⟩ ::= ⊕⟨word⟩ edit { a red apple is → red apples are} , toks is src | ⊕⟨ word⟩⟨add-after ⟩ a red apple is, toks is red apples are, the output dst edits E will be {D , U, I, S} , and the parameters P ⟨substitution⟩ ::= ⟨word⟩ will be { -, red, apples, are } . Then with the output of this extended Leven- ⟨inﬂection⟩ ::= ⟨inﬂection-rules ⟩ shtein distance algorithm, labels can be generated by transforming these edit operations into readable ⟨unchange⟩ ::= ⊙ symbols. For those tokens without errors, we di- ⟨deletion⟩ ::= ⊖ rectly assign a special label “⊙” to them. A tricky part of the labeling process is the problem of the edit “addition”, A. A new token can only be added Figure 1: BNF syntax of label before or after an existing token. Thus for edit op- eration with addition, we must ﬁnd an existing to- ken that the label can be assigned to, and this sort Rules Description of token is deﬁned as pivot. A pivot can be a token LEMMA change word to its lemma that is not changed in an edit operation, such as the NPLURAL change noun to its plural form “apple” in edit { apple → an apple} , or some oth- VSINGULAR change verb to its singular form er types of edit such as the inﬂection of “look” to GERUND change verb to its gerund form “looking” in edit { look → have been looking at} . PAST change verb to its past form The names of these labels are based on BNF PART change verb to its past partici- syntax which is deﬁned in Figure 1. The non- ple terminal ⟨word⟩ can be substituted by all words Table 1: Inﬂection rules in the vocabulary. The non-terminal ⟨inﬂection- rules⟩ can be substituted by terminals of inﬂection rules that are used for correcting the error types of noun number, verb form, and subject-verb agree- the L returned by Algorithm 2 is {⊖ , ⊙, NPLU- ment errors. All the inﬂection rules are listed in RAL, ARE} corresponding to the tokens { a, red, Table 1. apple, is} in toks . Some other examples of the src With the output of extended Levenshtein edits generated labels are presented in Table 2. distance algorithm, Algorithm 2 gives the process to generate labels whose names are based on the These labels are elaborately designed that each syntax deﬁned in Figure 1. It takes the output E, P of them can be interpreted easily as a series of ed- of Algorithm 1 as inputs and returns the generat- it operations. Once the labels are determined by ed set of labels L. Each label in L corresponds to classiﬁer, the correction of the grammatical errors one token in toks in order. For our previous ex- is conducted by applying the edit operations inter- src ample of edit { a red apple is → red apples are} , preted from these labels. 76 Algorithm 2 Labeling Algorithm ine the factors involved in a wide range of fea- 1: INPUT: E, P tures that have been or can be used to the word 2: OUTPUT: L label classiﬁcation task. Many features that are 3: pivot ← number of edits in E that are notA considered effective in various of previous work- 4: L ← ϕ ′′ 5: L ← s (Dahlmeier et al., 2012; Rozovskaya et al., 6: while i < length(E) do 2012; Han et al., 2006; Rozovskaya et al., 2011; 7: if E[i] = A then Tetreault, Joel R and Chodorow, Martin, 2008) 8: L ← L+ label of edit E[i] with P[i] 9: i ← i + 1 are included. Besides, features that are used in 10: else the similar spell checking tasks (Jia et al., 2013b; 11: l ← L+ label of edit E[i] with P[i] Yang et al., 2012) and some novel features show- 12: pivot ← pivot− 1 13: if pivot = 0 then ing effectiveness in other NLP tasks (Wang et al., 14: i ← i + 1 2013; Zhang and Zhao, 2013; Xu and Zhao, 2012; 15: while i < length of E do Ma and Zhao, 2012; Zhao, 2009; Zhao et al., 16: l ← l +⊕ + P[i] 17: i ← i + 1 2009b) are also included. However, using too 18: end while many features is time consuming. Besides, it in- 19: end if 20: push l into L creases the probability of overﬁtting and may lead ′′ 21: L ← to a poor solution of the maximum-likelihood pa- 22: end if rameter estimate in the ME training. 23: end while 24: L ← upper case of L 25: return L Algorithm 3 Greedy Feature Selection 1: INPUT: all feature candidates F Tokens Edit Label 2: OUTPUT: selected features S to ⊖ 3: S = { f , f , . . . , f } , a random subset of F 0 1 k { to reveal→revealing} reveal GERUND 4: while do a ⊖ 5: C = RECRUITMORE(S) { a woman→women} woman NPLURAL 6: if C = {} then developing { developing world THE⊕ 7: return S wold →the developing world} ⊙ 8: end if a { a→ ϵ} ⊖ 9: S = SHAKEOFF(S+C) 10: if scr(M(S)) ≥ scr(M(S )) then in { in→on} ON 11: return S apple { apple→an apple} AN⊕ 12: end if 13: S = S Table 2: Examples of labeling 14: end while 15: function RECRUITMORE(S) 16: C = {} , and p = scr(M(S)) 2.2 Label Classiﬁcation 17: for each f ∈ F − S do 18: if p < scr(M(S +{ f} )) then Using the approach described above, the training 19: C = C +{ f} 20: end if corpus is converted to a sequence of words with 21: end for labels. Maximum entropy model is used as the 22: end function classiﬁer. It allows a very rich set of features to be 23: function SHAKEOFF(S) 24: while do used in a model and has shown good performance 25: S = S = S in similiar tasks (Zhao et al., 2013). The features 26: for each f ∈ S do ′ ′ 27: if scr(M(S )) < scr(M(S −{ f} )) then we used are discussed in the next section. ′ ′ 28: S = S −{ f} 29: end if 3 Feature Selection and Generation 30: end for 31: S = S One key factor affecting the performance of maxi- 32: if S = S then 33: return S mum entropy classiﬁer is the features it used. A 34: end if good feature that contains useful information to 35: end while 36: end function guide classiﬁcation will signiﬁcantly improve the performance of the classiﬁer. One direct way to involve more good features is involving more fea- Therefore a feature selection algorithm is intro- tures. duced to ﬁlter out “bad” features at ﬁrst and the re- In our approach, large amounts of candidate maining features will be used to generate new fea- features are collected at ﬁrst. We carefully exam- tures. The feature selection algorithm has shown 77 Abbreviation Description effectiveness in (Zhao et al., 2013) and is present- NP Noun Phrase ed in Algorithm 3. NC Noun Compound and is ac- In this algorithm, M (S) represents the model tive if second to last word in using feature set S and scr(M ) represents the e- NP is tagged as noun valuation score of model M on a development da- VP Verb Phrase ta set. It repeats two main steps until no further cw Current Word performance gain is achievable: pos part-of-speech of the current word 1. Include any features from the rest of F into X.l the ith word in the left of X the current set of candidate features if the in- X.r the ith word in the right of X clusion would lead to a performance gain. NP[0] the ﬁrst word of NP NP.head the head word of NP 2. Exclude any features from the current set of NP.(DT or word in NP whose pos is DT candidate templates if the exclusion would IN or TO) or IN or TO lead to no deterioration in performance. VP.verb word in VP whose pos is ver- VP.NP NP in VP By repeatedly adding the useful and removing dp the dependency relation gen- the useless features, the algorithm aims to return erated by standford depen- a better and smaller set of features for next round. dency parser Only 55 of the 109 candidate features remain af- dp.dep the dependent in the depen- ter using this algorithm and they are presented in dency relation Table 4. Table 3 gives an interpretation of the ab- dp.head the head in the dependency breviations used in Table 4. Each feature of a word relation is set to that listed in feature column if the word dp.rel the type of the dependency satisﬁes the condition listed in current word col- relation umn, else the feature is set to “NULL”. For ex- ample, if the current word satisﬁes the condition in the ﬁrst row of Table 4 which is the ﬁrst word Table 3: The interpretation of the abbrevations in in the left of a NC, feature 1 of this word is set to Table 4 all words in the NC, otherwise, feature 1 is set to “NULL”. 4.2 Data Labeling 4 Experiment The labeling algorithm described in section 2.1 is ﬁrstly applied to the training corpus. Total 7047 4.1 Data Sets labels are generated and those whose count is larg- The CoNLL-2014 training data is a corpus of er than 15 is presented in Table 5. Directly ap- learner English provided by (Dahlmeier et al., plying these 7047 labels for correction receives an 2013). This corpus consists of 1,397 articles, 12K M score of precision=90.2%, recall=87.0% and sentences and 116K tokens. The ofﬁcial blind test F 0.5=89.5%. However, the number of labels data consists of 50 articles, 245 sentences and 30K is too large that the training process is time con- tokens. More detailed information about this data suming and those labels appears only few times is described in (Ng et al., 2014; Dahlmeier et al., will hurt the generalization of the trained model. 2013). Therefore, labels with low frequency which ap- In development phase, the entire training corpus pear less than 30 times are cut out and 109 labels is splited by sentence. 80% sentences are picked remain. The M score of the system using this re- up randomly and used for training and the rest ﬁned labels is precision=83.9%, recall=64.0% and 20% are used as the developing corpus. For the ﬁ- F 0.5=79.0%. Note that even applying all labels, nal submission, the entire corpus is used for train- the F 0.5 is not 100%. It is because some annota- ing. tions in the training corpus are not consistency. 78 Count Label current word feature 1091911 ⊙ NC.l NC 31507 ⊖ 3637 NPLURAL NP.l NP 2822 THE⊕ NP[0] NP.l .pos 2600 LEMMA NC.l NC 948 ,⊕ NC.l NC.l .pos 1 1 300˜900 A⊕ PAST THE IN TO . IS OF ARE FOR NC.l and pos=DT NC GERUND , 50˜100 AND ON AN⊕ A VSINGULAR WAS THEIR NC.l and pos=VB NC 20˜50 ELDERLY IT OF⊕ THEY WITH TO⊕ NP.l and pos=VB NP WERE THIS ; ITS .⊕ THAT ’S ⊕ AND⊕ pos=VB cw THAT⊕ HAVE⊕ CAN AS HAVE⊕PART FROM BE WOULD BY pos=DT cw 15˜20 HAVE HAS⊕ WILL HAS AT AN THESE ⊕, the cw.r THEM IN⊕ INTO #⊕ ARE⊕ WHICH PEO- PLE HAS⊕PART ECONOMIC IS⊕ BE⊕ SO a cw.r COULD TO⊕LEMMA MANY PART MAY an cw.r LESS IT⊕ FOR⊕ BEING⊕ NP[0] cw 15˜20 NOT ABOUT WILL⊕LEMMA SHOULD HIS BECAUSE AGED SUCH ALSO NP[0] NP.l WHICH⊕ HAVE⊕PAST WILL⊕ WHO NP[0] NP.l WHEN MUCH 15˜20 ON⊕ ’ THROUGH BE ⊕PAST MORE NP[0] NP.l IF HELP THE⊕ELDERLY ’S ONE AS ⊕ NP[0] NP.l .pos THERE THEIR⊕ WITH⊕ HAVE⊕⊙ NP[0] NP.l .pos ECONOMY DEVELOPMENT CON- CERNED PEOPLE⊕ PROBLEMS BUT NP[0] NP.l .pos MEANS THEREFORE HOWEVER BE- NP.l NP.head ING : UP PROBLEM ’ ⊕ THE⊕LEMMA IN⊕ADDITION HOWEVER⊕,⊕ AMONG NP.l NP.head.pos ;⊕ WHERE THUS ONLY HEALTH NP.head NP. head HAS⊕PAST FUNDING EXTENT ALSO⊕ NP.head NP. head.bag TECHNOLOGICAL ” OR HAD WOULD⊕ VERY .⊕THIS ITS⊕ IMPORTANT DEVEL- NP.head NP. head.pos OPED ⊕BEEN AGE ABOUT⊕ WHO⊕ USE NP.head NP. head.pos.bag THEY⊕ THAN NUMBER HOWEVER⊕, GOVERNMENT FURTHERMORE DURING NP.head NP. (JJ or CC) BUT⊕ YOUNGER RIGHT POPULATION NP.(DT or IN or TO) NP PERSON⊕ FEWER ENVIRONMENTAL- NP.(DT or IN or TO) NP.head LY WOULD⊕LEMMA OTHER MAY⊕ LIMITED HE COULD⊕HAVE BEEN STIL- NP.(DT or IN or TO) NP.head.pos L SPENDING SAFETY OVER ONE⊕’S dp.dep dp.head MAKE MADE LIFE HUMAN HAD⊕ FUNDS CARE ARGUED ALL ”⊕ WHEN⊕ dp.head dp.dep TIME THOSE SOCIETY RESEARCH dp.dep dp.head.pos PROVIDE OLD NEEDS INCREASING DE- dp.head dp.dep.pos VELOPING BECOME BE⊕⊙ ADDITION dp.dep dp.rel Table 5: Labels whose count is larger than 15. dp.head dp.rel VP.verb VP.NP VP.verb VP.NP.head VP.NP.head VP.verb current word feature VP.verb VP.NP.head.pos NC.l NC, cw, cw.l , cw.l .pos, 1 1 1 VP.NP.head VP.verb.pos cw.r , cw.r .pos 1 1 cw cw.l , i ∈ { 0, 1, 2, 3} NP[0] NP.head, NP.l , NP.l , 1 2 cw cw.r , i ∈ { 1, 2, 3} cw, cw.l , cw.l .pos, i 1 1 cw cw.l .pos, i ∈ { 0, 1, 2, 3} NP.head NP[0], NP.l , NP.l , cw, i 1 2 cw.l , cw.l .pos, cw cw.r .pos, i ∈ { 1, 2, 3} i 1 1 dp.head cw, cw.l , cw.l dp.dep, 1 2 dp.dep.pos, dp.rel Table 4: Remained features after the feature selec- tion. Table 6: Examples of the new generated features. 79 2 4.3 Data Reﬁnement l and F 0.5 score of the M scorer according to (Dahlmeier and Ng, 2012). It can be seen The training corpus is reﬁned before used that sen- that sys 220 with the most number of features tences which do not contain errors are ﬁltered out. achieves the best performance. Only 38% of the total sentences remain. With less training corpus, it takes less time to train the ME 4.5 Evaluation Result model. Table 7 presents the performance of sys- The ﬁnal system we use is sys 220 with reﬁned tems using the unreﬁned training corpus and re- training data, the performance of our system on the ﬁned corpus. developing corpus and the blind ofﬁcial test data is System Presicion Recall F 0.5 presented in Table 8. The score is calculated using M scorer. unreﬁned 26.99% 1.67% 6.71% reﬁned 11.17% 3.1% 7.34% Data Set Precision Recall F 0.5 Table 7: Comparison of systems with differen- DEV 13.52% 6.41% 11.07% t training corpus. OFFICIAL 29.83% 5.16% 15.24% Table 8: Evaluation Results All sets of these systems are kept the same ex- cept the training corpus they use. It can be seen 5 Conclusion that the reﬁnement also improves the performance of the system. In this paper, we describe the system of Shang- hai Jiao Tong Univerity team in the CoNLL-2014 4.4 Feature Selection shared task. The grammatical error detection is re- Figure 2 shows the results of systems with dif- garded as a multi-label classiﬁcation task and the ferent feature sets. sys 10 is the system with correction is conducted by applying a rule-based approach based on these labels. A single max- imum entropy classiﬁer is introduced to do the multi-label classiﬁcation. Various features are in- volved and a feature selection algorithm is used to reﬁne these features. Finally, large amounts of feature templates that are generated by the combi- nation of the reﬁned features are used. This system achieved precision of 29.83%, recall of 5.16% and F 0.5 of 15.24% in the ofﬁcial evaluation. References Figure 2: Performance of systems with different Adam L Berger, Vincent J Della Pietra, and Stephen features. A Della Pietra. 1996. A maximum entropy ap- proach to natural language processing. Computa- tional linguistics, 22(1):39–71. 10 randomly chosen features which are used as the initial set of features in Algorithm 3, sys 55 Daniel Dahlmeier and Hwee Tou Ng. 2012. Better is the system with the reﬁned 55 features. With evaluation for grammatical error correction. In Pro- ceedings of the 2012 Conference of the North Amer- these reﬁned features, various of new features are ican Chapter of the Association for Computational generated by combining different features. This Linguistics: Human Language Technologies (NAA- combination is conducted empirically that features CL 2012), pages 568–572, Montreal, Canada. which are considered having relations are com- bined to generate new features. Using this method, Daniel Dahlmeier, Hwee Tou Ng, and Eric Jun Feng Ng. 2012. NUS at the HOO 2012 Shared Task. In 165 new features are generated and total 220 fea- Proceedings of the Seventh Workshop on Building E- tures are used in sys 220. Table 6 gives a few ducational Applications Using NLP, pages 216–224, of examples showing the combined features. The Montreal, ´ Canada, June. Association for Computa- performance is evaluated by the precision, recal- tional Linguistics. 80 Daniel Dahlmeier, Hwee Tou Ng, and Siew Mei Wu. Alla Rozovskaya, Mark Sammons, Joshua Gioja, and 2013. Building a large annotated corpus of learner Dan Roth. 2011. University of Illinois System in english: The nus corpus of learner english. In Pro- HOO Text Correction Shared Task. In Proceedings ceedings of the Eighth Workshop on Innovative Use of the 13th European Workshop on Natural Lan- of NLP for Building Educational Applications (BEA guage Generation, pages 263–266. Association for 2013), pages 22–31, Atlanta, Georgia, USA. Computational Linguistics. Alla Rozovskaya, Mark Sammons, and Dan Roth. Robert Dale, Ilya Anisimoff, and George Narroway. 2012. The UI System in the HOO 2012 Shared Task 2012. Hoo 2012: A report on the preposition and on Error Correction. In Proceedings of the Seventh determiner error correction shared task. In Proceed- Workshop on Building Educational Applications Us- ings of the Second Workshop on Building Education- ing NLP, pages 272–280, Montreal, ´ Canada, June. al Applications Using NLP, pages 54–62, Montreal, ´ Association for Computational Linguistics. Canada, June. Association for Computational Lin- guistics. Alla Rozovskaya, Kai-Wei Chang, Mark Sammons, and Dan Roth. 2013. The university of illinois sys- NA-RAE Han, Martin Chodorow, and Claudia Lea- tem in the conll-2013 shared task. In Proceedings of cock. 2006. Detecting Errors in English Article the Seventeenth Conference on Computational Natu- Usage by Non-Native Speakers. Natural Language ral Language Learning: Shared Task, pages 13–19, Engineering, 12:115–129, 5. Soﬁa, Bulgaria, August. Association for Computa- tional Linguistics. Zhongye Jia, Peilu Wang, and Hai Zhao. 2013a. Grammatical error correction as multiclass classi- Tetreault, Joel R and Chodorow, Martin. 2008. The ﬁcation with single model. In Proceedings of the Ups and Downs of Preposition Error Detection in Seventeenth Conference on Computational Natural ESL Writing. In Proceedings of the 22nd Inter- Language Learning: Shared Task, pages 74–81, national Conference on Computational Linguistics- Soﬁa, Bulgaria, August. Association for Computa- Volume 1, pages 865–872. Association for Compu- tional Linguistics. tational Linguistics. Zhongye Jia, Peilu Wang, and Hai Zhao. 2013b. Graph Rui Wang, Masao Utiyama, Isao Goto, Eiichro Sumi- model for chinese spell checking. In Proceedings ta, Hai Zhao, and Bao-Liang Lu. 2013. Convert- of the Seventh SIGHAN Workshop on Chinese Lan- ing continuous-space language models into n-gram guage Processing, pages 88–92, Nagoya, Japan, Oc- language models for statistical machine translation. tober. Asian Federation of Natural Language Pro- In Proceedings of the 2013 Conference on Empiri- cessing. cal Methods in Natural Language Processing, pages 845–850, Seattle, Washington, USA, October. Asso- Ting-hui Kao, Yu-wei Chang, Hsun-wen Chiu, Tzu-Hsi ciation for Computational Linguistics. Yen, Joanne Boisson, Jian-cheng Wu, and Jason S. Chang. 2013. Conll-2013 shared task: Grammati- Yang Xiang, Bo Yuan, Yaoyun Zhang, Xiaolong Wang, cal error correction nthu system description. In Pro- Wen Zheng, and Chongqiang Wei. 2013. A hybrid ceedings of the Seventeenth Conference on Compu- model for grammatical error correction. In Proceed- tational Natural Language Learning: Shared Task, ings of the Seventeenth Conference on Computation- pages 20–25, Soﬁa, Bulgaria, August. Association al Natural Language Learning: Shared Task, pages for Computational Linguistics. 115–122, Soﬁa, Bulgaria, August. Association for Computational Linguistics. Ekaterina Kochmar, Øistein Andersen, and Ted Briscoe. 2012. HOO 2012 Error Recognition and Junwen Xing, Longyue Wang, Derek F. Wong, Lidi- Correction Shared Task: Cambridge University Sub- a S. Chao, and Xiaodong Zeng. 2013. Um-checker: mission Report. In Proceedings of the Seventh A hybrid system for english grammatical error cor- Workshop on Building Educational Applications Us- rection. In Proceedings of the Seventeenth Confer- ing NLP, pages 242–250, Montreal, ´ Canada, June. ence on Computational Natural Language Learn- Association for Computational Linguistics. ing: Shared Task, pages 34–42, Soﬁa, Bulgaria, Au- gust. Association for Computational Linguistics. Xuezhe Ma and Hai Zhao. 2012. Fourth-order depen- dency parsing. In Proceedings of COLING 2012: Qiongkai Xu and Hai Zhao. 2012. Using deep lin- Posters, pages 785–796, Mumbai, India, December. guistic features for ﬁnding deceptive opinion spam. The COLING 2012 Organizing Committee. In Proceedings of COLING 2012: Posters, pages 1341–1350, Mumbai, India, December. The COL- Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian ING 2012 Organizing Committee. Hadiwinoto, Raymond Hendy Susanto, and Christo- pher Bryant. 2014. The conll-2014 shared task Shaohua Yang, Hai Zhao, Xiaolin Wang, and Bao on grammatical error correction. In Proceedings of liang Lu. 2012. Spell checking for chinese. the Eighteenth Conference on Computational Natu- In Nicoletta Calzolari (Conference Chair), Khalid ral Language Learning: Shared Task (CoNLL-2014 Choukri, Thierry Declerck, Mehmet Ugur ˘ Dogan, ˘ Shared Task), Baltimore, Maryland, USA. Bente Maegaard, Joseph Mariani, Jan Odijk, and 81 Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012) , pages 730–736, Is- tanbul, Turkey, May. European Language Resources Association (ELRA). ACL Anthology Identiﬁer: L12-1423. Ippei Yoshimoto, Tomoya Kose, Kensuke Mitsuza- wa, Keisuke Sakaguchi, Tomoya Mizumoto, Yuta Hayashibe, Mamoru Komachi, and Yuji Matsumo- to. 2013. Naist at 2013 conll grammatical error correction shared task. In Proceedings of the Seven- teenth Conference on Computational Natural Lan- guage Learning: Shared Task, pages 26–33, Soﬁ- a, Bulgaria, August. Association for Computational Linguistics. Jingyi Zhang and Hai Zhao. 2013. Improving function word alignment with frequency and syntactic infor- mation. In Proceedings of the Twenty-Third inter- national joint conference on Artiﬁcial Intelligence, pages 2211–2217. AAAI Press, August. Hai Zhao, Wenliang Chen, and Chunyu Kit. 2009a. Semantic dependency parsing of nombank and prop- bank: An efﬁcient integrated approach via a large- scale feature selection. In Proceedings of the 2009 Conference on Empirical Methods in Natural Lan- guage Processing, pages 30–39, Singapore, August. Association for Computational Linguistics. Hai Zhao, Yan Song, Chunyu Kit, and Guodong Zhou. 2009b. Cross language dependency parsing using a bilingual lexicon. In Proceedings of the Joint Con- ference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 55–63, Suntec, Singapore, August. Association for Compu- tational Linguistics. Hai Zhao, Xiaotian Zhang, and Chunyu Kit. 2013. In- tegrative semantic dependency parsing via efﬁcien- t large-scale feature selection. Journal of Artiﬁcial Intelligence Research, 46:203–233. Hai Zhao. 2009. Character-level dependencies in chi- nese: Usefulness and learning. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), pages 879–887, Athens, Greece, March. Association for Computational Linguistics.

http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png

Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task Unpaywall

http://www.deepdyve.com/lp/unpaywall/grammatical-error-detection-and-correction-using-a-single-maximum-TWAwLTG723

Grammatical Error Detection and Correction using a Single Maximum Entropy Model

Wang, Peilu; Jia, Zhongye; Zhao, Hai

Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task – Jan 1, 2014

Loading next page...

References

References for this paper are not available at this time. We will be adding them shortly, thank you for your patience.

Publisher: Unpaywall
DOI: 10.3115/v1/w14-1710
Publisher site: See Article on Publisher Site

Abstract

Journal

Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task – Unpaywall

Published: Jan 1, 2014

Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 7-Day Trial for You or Your Team.

Learn More →

Grammatical Error Detection and Correction using a Single Maximum Entropy Model

Grammatical Error Detection and Correction using a Single Maximum Entropy Model

Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 7-Day Trial for You or Your Team.

Learn More →

Grammatical Error Detection and Correction using a Single Maximum Entropy Model

Grammatical Error Detection and Correction using a Single Maximum Entropy Model

References

Abstract

Journal

Recommended Articles

There are no references for this article.

Our policy towards the use of cookies