Building Concept Definitions from Explanatory Dictionaries

Abstract

A key resource for any application in computational semantics is a model of word meaning. Existing systems currently rely either on distributional models trained on large corpora or on lexical ontologies such as WordNet. The dict_to_4lang module of the 4lang software library builds concept graph definitions for virtually all words of English by automatically processing entries of three large explanatory dictionaries of English, using a state-of-the-art dependency parser and a rule-based system that converts its output to graphs over concepts corresponding to the words in each definition. The resulting set of definition graphs has been used successfully in measuring the semantic similarity of English words and sentences. The current top-scoring system on the popular SimLex benchmark uses features derived from definitions built by dict_to_4lang. Plans for further applications such as recognizing textual entailment and semantics-driven parsing are also outlined in the paper.

1. Introduction

The 4lang concept dictionary contains manual definitions of over 2,000 language-independent concepts using the 4lang formalism for representing meaning. We present the dict_to_4lang tool of the open-source 4lang library, which builds similar definitions for virtually any word of the English language by processing entries of monolingual dictionaries. Concept graphs created by our tool have been used successfully in measuring the semantic similarity of words. Our future plans include applying them to other common tasks in computational semantics such as recognizing textual entailment, question answering, and inference. All software presented in this paper is available from the repository at http://github.com/kornai/4lang and may be freely distributed under an MIT license.

The paper is structured as follows: Section 2 provides a short overview of the 4lang formalism for modeling meaning (Kornai et al., 2015).
Section 3 presents the dep_to_4lang pipeline, which creates 4lang-style meaning representations from running text. Section 4 describes its application to monolingual dictionary definitions, dict_to_4lang, used to create large concept lexica automatically. Section 5 presents two applications of the dict_to_4lang module to the tasks of measuring the semantic similarity of pairs of English sentences and words. Finally, Section 6 discusses our plans for future applications of the 4lang system.

2. The 4lang system

This section is a short outline of the 4lang system for representing meaning using directed graphs of concepts. We shall not attempt a full presentation of the 4lang principles. Instead, we introduce the formalism in Section 2.1, then discuss some specific aspects relevant to this paper. 4lang’s approach to multiple word senses is summarized in Section 2.2, Section 2.3 is concerned with reasoning based on 4lang graphs, and the treatment of extra-linguistic knowledge is discussed in Section 2.4. Finally, Section 2.5 considers the primitives of the 4lang representation and contrasts them with some earlier approaches to representing word meaning. For a complete presentation of the theory of lexical semantics underlying 4lang the reader is referred to (Kornai, 2010) and (Kornai, 2012); (Kornai et al., 2015) compares 4lang to contemporary theories of word meaning.

4lang is also the name of a manually built dictionary mapping 2,200 English words to concept graphs (as well as their translations in Hungarian, Polish, and Latin, hence the name). The dictionary is described in (Kornai & Makrai, 2013). For work on extending 4lang to include all European languages, see (Ács et al., 2013).

2.1. The formalism

4lang represents the meaning of words, phrases, and utterances as directed graphs whose nodes correspond to language-independent concepts and whose edges may bear one of three labels: 0, 1, and 2.
(The 4lang theory represents concepts as Eilenberg machines (Eilenberg, 1974) with three partitions, each of which may contain zero or more pointers to other machines, and which therefore also represent a directed graph with three types of edges. Since the additional capabilities offered by Eilenberg machines have not been exploited by any of the systems presented here, it makes more sense to consider the representations under discussion as plain directed graphs.) We first discuss the nature of 4lang concepts, represented by the nodes of the graph, then introduce the types of relationships encoded by each of the three edge types.

2.1.1 Nodes

Nodes of 4lang graphs correspond to concepts. 4lang concepts are not words, nor do they have any grammatical attributes such as part of speech (category), number, tense, mood, or voice. For example, 4lang representations make no distinction between the meanings of freeze (N), freeze (V), freezing, and frozen. Therefore, the mapping between words of some language and the language-independent set of 4lang concepts is a many-to-one relation. In particular, many concepts are defined by a single link to another concept that is their hypernym or synonym, e.g. above →0 up or grasp →0 catch. Encyclopedic information is omitted: e.g. Canada, Denmark, and Egypt are all defined as country (their definitions also containing an indication that an external resource may provide more information). In general, definitions are limited to what can be considered the shared knowledge of competent speakers; e.g. the definition of water contains the information that it is a colorless, tasteless, odorless liquid, but not that it is made up of hydrogen and oxygen. The distinction between linguistic and extra-linguistic knowledge will be discussed in more detail in Section 2.4. We now go through the types of links used in 4lang graphs.
2.1.2 The 0-edge

The most common relation between concepts in 4lang graphs is the 0-edge, which represents attribution (dog →0 friendly), the IS_A relation (hypernymy) (dog →0 animal), and unary predication (dog →0 bark). Since concepts do not have grammatical categories, this uniform treatment means that the same graph can be used to encode the meaning of phrases like water freezes and frozen water, both of which are represented as water →0 freeze.

2.1.3 1- and 2-edges

Edges of type 1 and type 2 connect binary predicates to their arguments (e.g. cat ←1 catch →2 mouse). The formalism used in the 4lang dictionary explicitly marks binary (transitive) elements by using UPPERCASE printnames. The pipeline that we introduce in Section 3 does not make use of this distinction: any concept can have outgoing 1- and 2-edges. However, we retain the uppercase marking for those binary elements that do not correspond to any word in a given phrase or sentence; e.g. the meaning of the sentence Penny ate Leonard’s food is represented by the graph in Figure 1. The ten most common binaries used in 4lang are listed in Table 1, with an example for each.

Table 1: Most common binaries in the 4lang dictionary.

HAS         shirt ←1 HAS →2 collar
IN          letter ←1 IN →2 envelope
AT          move ←1 AT →2 way
CAUSE       humor ←1 CAUSE →2 laugh
INSTRUMENT  sew ←1 INSTRUMENT →2 needle
PART_OF     leaf ←1 PART_OF →2 plant
ON          smile ←1 ON →2 face
ER          slow ←1 ER →2 speed
FOLLOW      Friday ←1 FOLLOW →2 Thursday
MAKE        bee ←1 MAKE →2 honey
Fig. 1: 4lang graph with two types of binaries.

Given two concepts c1 and c2 such that c2 is a predicate that holds for c1, 4lang allows one of two possible connections between them: c1 →0 c2 if c2 is a one-place predicate, and c2 →1 c1 if c2 is a two-place predicate. The mutual exclusiveness of these two configurations is both counter-intuitive and impractical: two-place predicates often appear with a single argument (e.g. John is eating), and representing such a statement as John →0 eat while the sentence John is eating a muffin warrants John ←1 eat →2 muffin would mean that we consider the relationship between John and eat dependent on whether we have established the object of his eating. Therefore we adopt a modified version of the 4lang representation where the 0-connection holds between a subject and its predicate regardless of whether the predicate has another argument. The example graph in Figure 1 can then be revised to obtain that in Figure 2. The meaning of each 4lang concept is itself represented as a 4lang graph over other concepts; a typical definition from the 4lang dictionary can be seen in Figure 3. This graph captures the facts that birds are vertebrates, that they lay eggs, and that they have feathers and wings.
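To make the formalism concrete, here is a minimal sketch of the definition of bird from Figure 3 as a set of labeled edges. This triple-based encoding is our own illustrative choice, not the data structure actually used by the 4lang library:

```python
# A 4lang graph encoded as triples (src, label, dst) with labels 0, 1, 2.
# Separate occurrences of a binary get distinct instance ids (HAS#1, HAS#2)
# so that the HAS linking bird to feather stays distinct from the HAS
# linking bird to wing. Illustrative encoding only.

bird = [
    ("bird", 0, "vertebrate"),                      # IS_A via the 0-edge
    ("lay", 1, "bird"), ("lay", 2, "egg"),          # bird <-1 lay ->2 egg
    ("HAS#1", 1, "bird"), ("HAS#1", 2, "feather"),  # bird <-1 HAS ->2 feather
    ("HAS#2", 1, "bird"), ("HAS#2", 2, "wing"),     # bird <-1 HAS ->2 wing
]

def zero_neighbors(graph, node):
    """Concepts reachable from `node` via a single 0-edge."""
    return {dst for src, label, dst in graph if src == node and label == 0}

print(zero_neighbors(bird, "bird"))  # {'vertebrate'}
```

The same encoding accommodates the revised subject treatment of Figure 2 simply by adding a 0-edge from the subject to the predicate alongside the predicate’s 1-edge.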
The generic applicability of the 4lang relations introduced in Section 2.1 has the consequence that to create, understand, and manipulate 4lang representations one need not make the traditional distinction between entities, properties, and events. The relationships dog →0 bark and dog →0 faithful can be treated in a uniform fashion when making inferences based on the definitions of each concept, e.g. that dog ←1 MAKE →2 sound or that calling another person a dog is insulting. In other words, all semantic properties are inherited by default via paths of 0-edges.

Fig. 2: Revised 4lang graph with two types of binaries.

Fig. 3: 4lang definition of bird.

2.2. Ambiguity and compositionality

4lang does not allow for multiple senses when representing word meaning: all occurrences of the same word form – with the exception of true homonyms like trunk ‘the very long nose of an elephant’ and trunk ‘the part at the back of a car where you can put bags, tools etc.’ – must be mapped to the same concept, whose definition in turn must be generic enough to allow for all possible uses of the word (Ruhl, 1989). As Jakobson reportedly notes, such a monosemic approach might define the word bachelor as ‘unfulfilled in typical male role’ (Fillmore, 1977) – to account for all senses of the word, including ‘man who has never married’, ‘holder of the first or lowest academic degree’, and ‘young fur seal when without a mate during the breeding time’ (Katz & Fodor, 1963, p.186).
While such definitions place a great burden on the process responsible for combining the meanings of words to create representations of phrases and utterances (see Section 3), they also have the potential to model the flexibility and creativity of language use:

we note here a significant advantage of the monosemic approach, namely that it makes interesting predictions about novel usage, while the predictions of the polysemic approach border on the trivial. To stay with the example, it is possible to envision novel usage of bachelor to denote a contestant in a game who wins by default (because no opponent could be found in the same weight class or the opponent was a no-show). The polysemic theory would predict that not just seals but maybe also penguins without a mate may be termed bachelor – true but not very revealing. (Kornai, 2010, p.182)

One typical consequence of this approach is that 4lang definitions will not distinguish between bachelor and some concept w that means ‘unfulfilled male’ – both could be defined in 4lang as male, LACK. This is not a shortcoming of the representation; rather, it is in accordance with the principles underlying it. The concepts unfulfilled and male cannot be combined (e.g. to create a representation describing an unfulfilled male) without making reference to some nodes of the graph representing the meaning of male: if something is a ‘typical male role’, this should be indicated in the definition graph of male – if only by inbound pointers – and without any such information, unfulfilled male cannot be interpreted at all. This does not mean that male cannot be defined without listing all stereotypes associated with the concept.
However, if the piece of information that ‘being with a mate at breeding time’ is a typical male role – which is necessary to account for the interpretation of bachelor as ‘young fur seal when without a mate at breeding time’ – is to be accessed by some inference mechanism, then it must be present in the form of some subgraph containing the nodes seal, mate, male, and possibly others. Then, a 4lang-based natural language understanding system that is presented with the word bachelor in the context of mating seals for the first time may explore the neighborhood of these nodes until it finds this piece of information as the only one that makes sense of this novel use of bachelor. Note that this is a model of novel language use in general: humans produce and understand without much difficulty novel phrases that most theories would label ‘semantically anomalous’. In particular, all language use that is commonly labeled metaphoric involves accessing a lexical element for the purpose of activating some of its meaning components while ignoring others completely. It is this use of language that 4lang wishes to model, as it is most typical of everyday communication (Richards, 1937; Hobbs, 1990).

Another 4lang principle that ensures metaphoric interpretation is that any link in a 4lang definition can be overridden. In fact, the only type of negation used in 4lang definitions (LACK) carries the potential to override elements that might otherwise be activated when definitions are expanded: e.g. the definition of penguin, which undoubtedly contains →0 bird, may also contain ←1 LACK →2 fly to block inference based on bird →0 fly. That any element can freely be overridden ensures that novel language use does not necessarily cause contradiction. “[T]o handle ‘the ship plowed through the sea’, one lifts the restriction on ‘plow’ that the medium be earth and keeps the property that the motion is in a substantially straight line through some medium” (Hobbs, 1990, p.55).
Since a 4lang definition of plow must contain some version of →2 earth, there must be a mechanism that allows overriding it, so that inferences such as sea →0 earth are not made.

2.3. Reasoning

The 4lang principles summarized so far place a considerable burden on the inferencing mechanism. Given the possibility of defining all concepts using only a small set of primitives, and a formalism that strictly limits the variety of connections between concepts, we claim to have laid the groundwork for a semantic engine with a chance of understanding creative language use. Since no generic reasoning has yet been implemented in 4lang – although we present early attempts in Section 4.3 – we shall now simply outline what we believe could be the main mechanisms of such a system.

The simplest kind of lexical inference in 4lang graphs is performed by following paths of 0-edges from some concept to determine the relationships in which it takes part. The concept mammal is defined in 4lang as an animal that has fur and milk (see Figure 4), from which one can conclude that the relations ←1 HAS →2 milk and ←1 HAS →2 fur also hold for all concepts whose definition includes →0 mammal (we shall assume that this simple inference can be made when we construct 4lang definitions from dictionary definitions in Section 4). Similar inferences can be made after expanding definitions, i.e. connecting all concept nodes to their own definition graphs (see Section 4.3 for details). If the definition of giraffe contains →0 mammal, to which we add the edges ←1 HAS →2 fur and ←1 HAS →2 milk, this expanded graph will allow us to infer the relations giraffe ←1 HAS →2 fur and giraffe ←1 HAS →2 milk. As mentioned in the previous section, this process requires that relations present explicitly in a definition override those obtained by inference: penguins are birds and yet they cannot fly, humans are mammals without fur, etc.

Fig. 4: 4lang definition of mammal.
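The 0-edge inference just described, including the override of inherited properties, can be sketched as follows. This is a toy implementation over a triple-based encoding of graphs that we adopt purely for illustration; the actual expansion mechanism of the 4lang library is described in Section 4.3:

```python
def inherited_properties(defs, concept):
    """Follow 0-edge paths from `concept`, collecting everything it is or
    does (hypernyms, attributes, unary predicates), while edges of the form
    concept <-1 LACK ->2 x block the inherited property x (the penguin/fly case).
    `defs` maps each concept name to its definition as (src, label, dst) triples."""
    props, blocked = [], set()
    frontier, seen = [concept], set()
    while frontier:
        node = frontier.pop(0)
        if node in seen:
            continue
        seen.add(node)
        for src, label, dst in defs.get(node, []):
            if src == node and label == 0:
                props.append(dst)
                frontier.append(dst)   # inherit along the 0-path
            if dst == node and label == 1 and src.split("#")[0] == "LACK":
                # collect the 2nd argument of this LACK instance
                blocked.update(d for s, l, d in defs[node]
                               if s == src and l == 2)
    return [p for p in props if p not in blocked]

# Toy definitions: penguins are birds that LACK fly; birds are vertebrates that fly.
defs = {
    "penguin": [("penguin", 0, "bird"),
                ("LACK#1", 1, "penguin"), ("LACK#1", 2, "fly")],
    "bird": [("bird", 0, "vertebrate"), ("bird", 0, "fly")],
}
print(inherited_properties(defs, "penguin"))  # ['bird', 'vertebrate'] ('fly' is blocked)
```

Note that the explicit LACK edge in the definition of penguin wins over the inherited bird →0 fly, mirroring the override behavior described above.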
A more complicated procedure is necessary to detect connections between nodes of an expanded definition and nodes connected to the original concept. According to Quillian’s account of his Teachable Language Comprehender (Quillian, 1969), the phrase lawyer’s client triggers an iterative search process that will eventually find lawyer to be compatible with the employer property of client, since both are professionals. A similar process can be implemented for 4lang graphs; consider the definition graphs for lawyer and client in Figures 5 and 6, built automatically from definitions in the Longman dictionary, as described in Section 4, then pruned manually. (These graphs, being the output of the dict_to_4lang system and not manual annotation, have numerous issues: the word people in the Longman definition of lawyer was not mapped to person, nor have the words advice and advise been mapped to the same concept.) After correcting these errors manually, nodes with identical names in the graph for lawyer’s client (Figure 7) can form the starting point of the inference process. Let us now go over the various steps of inference necessary to reduce this graph to the most informative representation of lawyer’s client. Note that we do not wish to impose any logical order on these steps; they should rather be the ‘winners’ of a process that considers many transformations in parallel and ends up keeping only some of them.

Fig. 5: Definition graph for lawyer.

Fig. 6: Definition graph for client.

Fig. 7: Corrected graph for lawyer’s client.
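The first step of such a process, collecting identically-named nodes shared by two definition graphs as candidate unification points, can be sketched as follows. The graph fragments below are hand-simplified stand-ins, not the actual dict_to_4lang output, and the triple encoding is our own illustrative choice:

```python
def concept_names(graph):
    """All concept names in a triple-encoded graph (src, label, dst),
    stripping instance suffixes from binaries (HAS#1 -> HAS)."""
    names = set()
    for src, _, dst in graph:
        names.add(src.split("#")[0])
        names.add(dst.split("#")[0])
    return names

# Hand-simplified fragments of the corrected lawyer/client graphs:
lawyer = [("lawyer", 0, "person"),
          ("advise", 1, "lawyer"), ("advise", 2, "person")]
client = [("client", 0, "person"),
          ("get", 1, "client"), ("get", 2, "advice")]

shared = concept_names(lawyer) & concept_names(client)
print(shared)  # {'person'}
```

The shared nodes returned here are only hypotheses: whether two same-named nodes actually co-refer is decided by the later inference steps discussed below.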
We should be able to realize that the person who is advised by (and represented by) the lawyer can be the same as the client who gets advice from the lawyer. To this end we must be able to make the inference that X ←1 get →2 advice and advice →2 X are synonymous. We believe a 4lang-based system should be able to make such an inference in at least one of two independent ways. First, we would like to be able to accommodate constructions in the 4lang system (see also Section 6.3), in this case one that explicitly pairs the above two configurations for some surface forms but not for others. Second, since we cannot expect to have all possibilities listed, we should also be able to establish for any concept Y the hypothesis Y →2 X in the presence of X ←1 get →2 Y, to be confirmed or disproved at some later step. We should also consider unifying the person node in person ←1 from →2 advice with lawyer in advice →1 lawyer, which would once again require either some construction stating that when someone advises, the advice is from her, or a generic rule that can guess the same connection. Given these inferences, the two advice nodes can also be merged as likely referring to the same action, resulting in the final graph in Figure 8. (The nodes organization, company, and service have been omitted from the figure to improve readability.)

Fig. 8: Inferred graph for lawyer’s client.

2.4. Extra-linguistic knowledge

The same 4lang graph might represent the meaning of some utterance or a piece of world knowledge, or could be the output of some inference mechanism whose input can be any combination of linguistic and extra-linguistic knowledge. In fact, neither the 4lang formalism nor the mechanisms we propose for reasoning based on 4lang representations require a distinction between linguistic and extra-linguistic information.
Returning to one of the simplest examples above, where bird →0 fly is overridden to accommodate both penguin ←1 LACK →2 fly and penguin →0 bird, we need not decide whether the particular piece of information that penguins cannot fly is part of the meaning of penguin. Clearly it is possible for one to learn of the existence of penguins and that they are a type of bird without realizing that they cannot fly, and this person could easily make the (incorrect) inference that they can; yet we would not like to claim that this person does not know what the word penguin means. Some components of word meaning, on the other hand, appear to be essential to the understanding of a particular concept: e.g. if a learner of English believes that nephew refers to the child of one’s sibling, male or female – perhaps because in her native language a single word stands for both nephews and nieces, and because she has heard no contradicting examples – we say that she does not know the meaning of the word; nephew →0 male appears to be somehow more internal to the concept nephew than penguin ←1 LACK →2 fly is to penguin. While this distinction is commonly made in semantics, we believe that in everyday discourse it is neither well-defined nor does it play an important role in predicting language use and common-sense reasoning. Carrying on a conversation successfully only requires that the participants’ representations of word meaning do not contradict each other in a way relevant to the conversation at hand. Static lexical resources such as the Longman Dictionary of Contemporary English (LDOCE) or the 4lang concept dictionary must make decisions about which pieces of information to include, and may do so based on some notion of how ‘technical’ or ‘commonplace’ they are, but this distinction is not necessary for modeling language use in general.
A person’s ignorance of the fact that somebody’s nephew is necessarily male is probably itself the result of one or several conversations about nephews that somehow remained consistent despite his incomplete knowledge of how the word is typically used. The uniform representation of linguistic and extra-linguistic knowledge should also allow us to extend 4lang representations arbitrarily using non-linguistic sources of world knowledge; an example is discussed in Section 6.5.2.

2.5. Primitives of representation

In the following two sections we present methods for (1) building 4lang representations from raw text and (2) building 4lang definition graphs for virtually all words based on monolingual dictionaries. Given these two applications, any text can be mapped to 4lang graphs, and the nodes of any graph can be expanded to include their 4lang definitions. Performing this expansion iteratively, all representations can be traced back to a small set of concepts; if the Longman Dictionary is used to build definition graphs, the concepts listed in the 4lang dictionary will suffice to cover all of them, since it contains all words of the Longman Defining Vocabulary (LDV), the set of all words used in definitions of the Longman Dictionary (Boguraev & Briscoe, 1989). The set of concepts necessary to define all others can be further reduced: it has been shown in (Kornai et al., 2015) that as few as 129 4lang concepts are enough to define all others in the 4lang dictionary, and thus, via monolingual dictionaries, practically all words of the English language.

2.6. Theoretical significance

This section provided a brief summary of the main principles behind the 4lang system for representing the meaning of linguistic structures. Before we proceed to present a set of tools for building and manipulating 4lang representations, let us point out some of the most important characteristics of 4lang representations that make it our formalism of choice.
No categories. 4lang does not differentiate between concepts denoting actions, entities, attributes, etc.; there are no categories of concepts equivalent to the part-of-speech categories of words. This ensures, among other things, that words with a shared root are mapped to the same concept, and that ultimately utterances with the same information content can be mapped to identical 4lang representations (although neither of these is strictly required).

No polysemy. 4lang will only accommodate multiple senses of a word as a last resort. Distant but related uses of the same word must be interpreted via the same generic concept. This virtually eliminates the difficulty of word sense disambiguation.

Requires powerful inference. The above principles require a mechanism for deriving all uses of a word from minimalistic definitions. Such a mechanism may stand a real chance at handling the creative language use typical of everyday human communication (and responsible for polysemy in the first place). Such inference may be achieved using spreading activation over nodes of 4lang graphs, as has been shown by (Nemeskey et al., 2013).

No failure of interpretation. No combinations of concepts and connections between them are forbidden by the formalism itself. Inference may judge certain states of affairs unlikely or even impossible, but the formalism will not fail the interpretation process.

3. From text to concept graph

In this section we present our work on combining word representations like those described in Section 2 to create graphs that encode the meaning of phrases. We defer the task of syntactic parsing to the state-of-the-art Stanford Parser (De Marneffe et al., 2006; Socher et al., 2013): the pipeline presented in this section processes sets of dependency triplets emitted by the Stanford Parser to create 4lang-style graphs of concepts (our future plans to incorporate syntactic parsing into 4lang are outlined in Section 6.3).
This section is structured as follows: dependency parsing is briefly introduced in Section 3.1, and the central dep_to_4lang module, which maps dependencies to 4lang graphs, is presented in Section 3.2. Major issues are discussed in Section 3.3, some solutions are presented in Section 3.4, and a manual evaluation of the text_to_4lang system is provided in Section 3.5. Besides the ability to map chunks of running text to semantic representations, text_to_4lang sees another application that is crucial to the system described in this paper: we process definitions of monolingual dictionaries to acquire word representations for lexical items that are not covered by 4lang. The resulting module, dict_to_4lang, is presented in Section 4.

3.1. Dependency parsing

Our present work is not concerned with the well-known problem of analyzing the syntactic structure of natural language text. Instead, we use a robust, state-of-the-art tool, the Stanford Parser, to obtain the dependency relations that hold between pairs of words in an English sentence. Unlike dependency parsers that have been trained on manually annotated dependency treebanks, the Stanford Parser discovers relations by matching templates against its parse of a sentence’s constituent structure (De Marneffe et al., 2006). This approach is more robust, since phrase structure parsers, and in particular the PCFG parser in the Stanford toolkit (Klein & Manning, 2003), are trained on much larger datasets than what is available to standard dependency parsers. The Stanford Dependency Parser is also capable of returning collapsed dependencies, which explicitly encode relations between two words that are expressed in the sentence by a function word such as a preposition or conjunction. For example, for the sentence I saw the man who loves you, a standard dependency parse would contain the relation nsubj(loves, who) but not nsubj(loves, man), even though man is clearly the subject of loves.
Collapsed dependency parses contain these implicitly present dependencies and are therefore more useful for extracting the semantic relationships between words in the sentence. Furthermore, the Stanford Parser can postprocess conjunct dependencies: in the sentence Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas, the NP Bills on ports and immigration will at first be parsed into the relations prep_on(Bills, ports) and cc_and(ports, immigration), then matched against a rule that adds the relation prep_on(Bills, immigration). For our purposes we enable both types of postprocessing and use the resulting set of relations (or triplets) as input to the dep_to_4lang module, which uses them to build 4lang graphs and is introduced in Section 3.2.

The list of dependency relations extracted from a sentence is clearly not intended as a representation of meaning. However, it will prove sufficient for constructing good-quality semantic representations because of the nature of 4lang relations: for sentences and phrases such as Mary loves John or queen of France, the 4lang representations are as simple as Mary ←1 love →2 John and France ←1 HAS →2 queen, which can be straightforwardly constructed from the dependency relations nsubj(love, Mary), dobj(love, John), and prep_of(queen, France). Any further details that one may demand of a semantic representation, e.g. that John is an experiencer or that France does not physically possess the queen, will be inferred from the 4lang definitions of the concepts love and queen, in the latter case probably also accessing the definitions of rule or country.

3.2. From dependencies to graphs

To construct 4lang graphs using the dependency relations in the parser’s output, we manually created a mapping from relations to 4lang subgraphs, assigning to each dependency one of nine possible configurations.
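A few rows of this mapping can be sketched as follows. This is a simplified illustration covering only part of the table (and using our own triple encoding of edges); the real dep_to_4lang also handles the remaining relations and the full prep_*/prepc_* families:

```python
def dep_to_edges(rel, w1, w2):
    """Map one dependency triplet rel(w1, w2) to 4lang edges (src, label, dst).
    Simplified: only a subset of the full mapping is covered here."""
    if rel in ("amod", "advmod", "acomp", "num"):
        return [(w1, 0, w2)]                      # e.g. dog ->0 friendly
    if rel in ("nsubj", "csubj", "agent"):
        return [(w1, 1, w2), (w2, 0, w1)]         # love ->1 Mary, Mary ->0 love
    if rel in ("dobj", "pobj", "nsubjpass"):
        return [(w1, 2, w2)]                      # love ->2 John
    if rel in ("poss", "prep_of"):
        return [("HAS", 1, w2), ("HAS", 2, w1)]   # w2 <-1 HAS ->2 w1
    if rel == "prep_with":
        return [("INSTRUMENT", 1, w1), ("INSTRUMENT", 2, w2)]
    if rel == "prep_without":
        return [("LACK", 1, w1), ("LACK", 2, w2)]
    if rel.startswith("prep_"):                   # generic prep_P: w1 <-1 P ->2 w2
        p = rel[len("prep_"):]
        return [(p, 1, w1), (p, 2, w2)]
    return []                                     # relations ignored in this sketch

# Mary loves John: nsubj(love, Mary), dobj(love, John)
edges = dep_to_edges("nsubj", "love", "Mary") + dep_to_edges("dobj", "love", "John")
print(edges)
```

For queen of France, prep_of(queen, France) yields France ←1 HAS →2 queen, matching the example above; the nsubj case also emits the subject-to-predicate 0-edge of the revised representation from Section 2.1.3.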
Additionally, all remaining relations of the form prep_* and prepc_* are mapped to binary subgraphs containing a node corresponding to the given preposition. To map words to 4lang concepts, we first lemmatize them using the hunmorph morphological analyzer (Trón et al., 2005) and the morphdb.en database. Graph edges for each dependency are added between the nodes corresponding to the lemmas returned by hunmorph. The full mapping from dependencies to 4lang subgraphs is presented in Table 2. Figures 9 and 10 provide examples of how 4lang subgraphs correspond to dependency triplets. For a detailed description of each dependency relation the reader is referred to (De Marneffe & Manning, 2008).

Table 2: Mapping from dependency relations to 4lang subgraphs.

Dependency                                         Edge
amod, advmod, npadvmod, acomp, dep, num, prt       w1 →0 w2
nsubj, csubj, xsubj, agent                         w1 ⇌01 w2
dobj, pobj, nsubjpass, csubjpass, pcomp, xcomp     w1 →2 w2
appos                                              w1 ⇌00 w2
poss, prep_of                                      w2 ←1 HAS →2 w1
tmod                                               w1 ←1 AT →2 w2
prep_with                                          w1 ←1 INSTRUMENT →2 w2
prep_without                                       w1 ←1 LACK →2 w2
prep_P                                             w1 ←1 P →2 w2

Table 3: Basic figures for each dataset.

Dict       headwords    av. def. length    approx. vocab. size
LDOCE      30,126       11.6               9,000
Collins    82,026       13.9               31,000
en.wikt    128,003      8.4                38,000

Table 4: Graphs built from each dataset.

Dict       # graphs    av. nodes
LDOCE      24,799      6.1
Collins    45,311      4.9
en.wikt    120,670     5.4

Fig. 9: Constructing the graph for Harry shivered in the cold night air.

Fig. 10: Constructing the graph for Everyone from wizarding families talked about Quidditch constantly.

3.3. Issues

3.3.1 Parsing errors

Using the Stanford Parser for dependency parsing yields high-quality output; it is, however, limited by the quality of the underlying phrase structure grammar parser. Parsing errors constitute a major source of errors in our pipeline, occasionally resulting in dubious semantic representations that could be discarded by a system that integrates semantic analysis into the parsing process.
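The core of this mapping can be sketched as a lookup from dependency names to edge-building rules. The following is a simplified, hypothetical rendition of what dep_to_4lang does for a few rows of Table 2; the real module also covers the remaining relations and special cases such as prep_with → INSTRUMENT, which are omitted here:

```python
# Simplified sketch: map a few dependency relations to 4lang edges.
# Edges are triples (source, edge_type, target); binary relations
# such as HAS introduce an intermediate node, as in Table 2.

UNARY_DEPS = {          # w1 --0--> w2
    'amod', 'advmod', 'npadvmod', 'acomp', 'dep', 'num', 'prt',
}
OBJECT_DEPS = {         # w1 --2--> w2
    'dobj', 'pobj', 'nsubjpass', 'csubjpass', 'pcomp', 'xcomp',
}

def dep_to_edges(rel, w1, w2):
    """Return the list of 4lang edges for one dependency triplet."""
    if rel in UNARY_DEPS:
        return [(w1, 0, w2)]
    if rel in OBJECT_DEPS:
        return [(w1, 2, w2)]
    if rel in ('poss', 'prep_of'):
        # w2 <--1-- HAS --2--> w1
        return [('HAS', 1, w2), ('HAS', 2, w1)]
    if rel.startswith('prep_'):
        # generic preposition rule: w1 <--1-- P --2--> w2
        # (special cases like prep_with -> INSTRUMENT omitted here)
        prep = rel.split('_', 1)[1].upper()
        return [(prep, 1, w1), (prep, 2, w2)]
    return []

# e.g. the triplet prep_under(wombat, table):
edges = dep_to_edges('prep_under', 'wombat', 'table')
# [('UNDER', 1, 'wombat'), ('UNDER', 2, 'table')]
```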
While our long-term plans include implementing such a process within the 4lang framework using constructions (see Section 6.3), we must currently rely on independent efforts to improve the accuracy of phrase structure grammar parsers using semantic information. Results of a pioneering effort in this direction are already included in the latest versions of the Stanford Parser (including the one used in the 4lang system): (Socher et al., 2013) improves the accuracy of the Stanford Parser by using Compositional Vector Grammars. Their model combines classic PCFG grammars with word embeddings to account for the semantic relationships between words in the text that is to be parsed and words that have occurred in the training data. For example, the sentence He ate spaghetti with a spoon can be structurally distinguished from He ate spaghetti with meatballs even if in the training phase the model has only had access to [eat [spaghetti] [with a fork]], by grasping the similarity between the words spoon and fork. This phenomenon of incorrect PP-attachment is the single most frequent source of anomalies in our output. For example, the syntactically ambiguous Longman definition of basement: a room or area in a building that is under the level of the ground, which has the constituent structure in Figure 11, is incorrectly assigned the structure in Figure 12, resulting in the erroneous semantic representation in Figure 13. Most such ambiguities are easily resolved by humans based on world knowledge (in this case, e.g., that buildings with some underground rooms are more common than buildings that are entirely under the ground, if the latter can be called buildings at all), but it is unclear whether such inferencing isn't beyond the capabilities of even those parsers that use word embeddings.

Fig. 11: Constituent structure of a room or area in a building that is under the level of the ground.
Fig. 12: Incorrect parse tree for a room or area in a building that is under the level of the ground.

Fig. 13: Incorrect definition graph for basement.

3.4. Postprocessing dependencies

Some of the typical issues of the graphs constructed by the process described in Section 3.2 can be resolved by postprocessing the dependency triplets in the parser's output before passing them to dep_to_4lang. Currently the dependency_processor module handles two configurations: coordination (Section 3.4.1) and copular sentences (Section 3.4.2).

3.4.1 Coordination

One frequent class of parser errors related to PP-attachment (cf. Section 3.3.1) involves constituents modifying a coordinated phrase which are analyzed as modifying only one of the coordinated elements. E.g. in the Longman entry casualty: someone who is hurt or killed in an accident or war, the parser fails to detect that the PP in an accident or war modifies the constituent hurt or killed, not just killed. Determining which of two possible parse trees is the correct one is of course difficult: once again, casualty may as well mean 'someone who is killed in an accident or war or someone who is hurt (in any way)', and the fact that such a misunderstanding is unlikely in real life is a result of inference mechanisms well beyond what we are able to model. Our simple attempt to improve the quality of the graphs built is to process all pairs of words between which a coordinating dependency holds (e.g. conj_and, conj_or, etc.) and copy all edges from each node to the other.
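The edge-copying step over coordinated nodes can be sketched as follows. This is a minimal, hypothetical version of what the dependency_processor module does, operating on dependency triplets represented as (relation, governor, dependent) tuples:

```python
# Sketch of the coordination postprocessing step: for every pair of
# words joined by a conj_* dependency, copy all other dependencies
# of each member of the pair to the other member.

def copy_coordinated_edges(triplets):
    """triplets: list of (relation, governor, dependent) tuples."""
    pairs = [(gov, dep) for rel, gov, dep in triplets
             if rel.startswith('conj_')]
    new_triplets = list(triplets)
    for w1, w2 in pairs:
        for rel, gov, dep in triplets:
            if rel.startswith('conj_'):
                continue
            # copy edges in both directions between coordinated words
            if gov == w1:
                new_triplets.append((rel, w2, dep))
            elif gov == w2:
                new_triplets.append((rel, w1, dep))
            if dep == w1:
                new_triplets.append((rel, gov, w2))
            elif dep == w2:
                new_triplets.append((rel, gov, w1))
    return new_triplets

# e.g. for "hurt or killed in an accident":
triplets = [('conj_or', 'hurt', 'killed'),
            ('prep_in', 'killed', 'accident')]
result = copy_coordinated_edges(triplets)
# the prep_in edge is now also present on 'hurt'
```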
While this could hardly be called a solution, as it may introduce dependencies incorrectly, in practice it has proved an improvement. In our current example this step enables us to obtain the missing dependencies and thus build the correct 4lang graph (see Figure 14).

Fig. 14: Definition graph built from casualty: someone who is hurt or killed in an accident or war, with extra dependencies added by the postprocessor.

3.4.2 Copulars and prepositions

Two further postprocessing steps involve copular constructions containing prepositional phrases. In simple sentences such as The wombat is under the table, the parser returns the pair of dependencies nsubj(is, wombat) and prep_under(is, table), which we use to generate prep_under(wombat, table). Similarly, when PPs are used to modify a noun, such as in the Longman definition of abbess: a woman who is in charge of a convent, for which the dependency parser returns, among others, the triplets rcmod(woman, is) and prep_in(is, charge), we let a simple rule add the triplet prep_in(woman, charge) (see Figure 15). In both cases we finish by removing the copular verb in order to simplify our final representation.

Fig. 15: Postprocessing of the entry abbess: a woman who is in charge of a convent.

3.5. Evaluation

We performed manual evaluation of the text_to_4lang module on a sample from the UMBC Webbase corpus (Han et al., 2013), a set of 3 billion English words based on a 2007 webcrawl performed as part of the Stanford Webbase10 project.
We used the GNU utility shuf to extract a random sample of 50 sentences, which we processed with text_to_4lang, then examined manually both the final output and the dependencies output by the Stanford Parser in order to gain a full understanding of each anomaly in the graphs created. The sentences in this corpus are quite long (22.1 words/sentence on average), therefore most graphs are affected by multiple issues; we shall now take stock of those that affected more than one sentence in our sample. Parser errors remain the single most frequent source of error in our final 4lang graphs: 16 sentences in our sample of 50 were assigned dependencies erroneously, 4 of these cases being related to PP-attachment (see Section 3.3.1). Parser errors are also virtually the only issue that causes incorrect edges to be added to the final graph; nearly all remaining issues result in missing connections only. The second largest source of errors in this dataset is related to connectives between clauses that our pipeline does not currently process. Our sample contains 12 such examples, including 4 relative clauses and 4 pairs of clauses connected by connectives such as that, unless, etc. The output of our pipeline for these sentences typically consists of two graphs that are near-perfect representations of the two clauses, but are not connected to each other in any way; an example is shown in Figure 16.

Fig. 16: 4lang graph built from the sentence The Manitoba Action Committee is concerned that the privatization of MTS will lead to rate increases. The dependency ccomp(concerned, lead) was not processed.
There are three more error classes worth mentioning. 5 graphs suffered from recall errors made by the Stanford Coreference Resolution system: in these cases the connections of a single concept in the final graph are split among two or more nodes, since our pipeline failed to identify two words as referring to the same entity (Figure 17 shows an example). Another 5 sentences caused errors because of the appearance in the parser output of the dependency relation vmod, which holds between a noun and a reduced non-finite verbal modifier, which "is a participial or infinitive form of a verb heading a phrase (which may have some arguments, roughly like a VP). These are used to modify the meaning of an NP or another verb." (De Marneffe et al., 2006, p.10). This dependency is not processed by dep_to_4lang, since it may encode the relation between a verb and either its subject or object; e.g. the example sentences in the Stanford Dependency Manual, Truffles picked during the spring are tasty and Bill tried to shoot, demonstrating his incompetence, will result in the triplets vmod(truffles, picked) and vmod(shoot, demonstrating), but should be represented in 4lang by the edges pick →2 truffles and shoot →0 demonstrate, respectively. Most representations in our sample suffer from multiple errors. While a quantitative analysis of the quality of these representations is currently not possible, our manual inspection tells us that 16 of the 50 graphs in our sample are either perfect representations of the input sentence (in 4 cases) or are affected by a single minor error only and remain high-quality representations.

Fig. 17: 4lang graph built from the sentence My wife and I have used Western Union very successfully for almost two years to send money to her family in Ukraine. Nodes with dashed edges should have been unified based on coreference resolution.
4. Building definition graphs

By using the text_to_4lang module to process entries in monolingual dictionaries written for humans, we can attempt to build definition graphs like those in 4lang for practically every word. This section presents the dict_to_4lang module, which extends the text_to_4lang pipeline with parsers for several major dictionaries (an overview of these is given in Section 4.1) as well as some preprocessing steps specific to the genre of dictionary definitions; these are presented in Section 4.2. Finally, Section 4.4 points out several remaining issues with definition graphs produced by the dict_to_4lang pipeline. Applications of dict_to_4lang, both existing and planned, shall be described in Section 5. The entire pipeline is available as part of the 4lang library.

4.1. Dictionaries of English

We process three large dictionaries of English; custom parsers have been built for each and are distributed as part of the 4lang module. The Longman Dictionary of Contemporary English (Bullon, 2003) contains ca. 42,000 English headwords and its definitions are constrained to a small vocabulary, the Longman Defining Vocabulary (LDV; Boguraev & Briscoe, 1989). The longman_parser tool processes the xml-formatted data and extracts for each headword a list of its senses, including for each the plain-text definition, the part-of-speech tag, and the full form of the word being defined, if present: e.g. definitions of acronyms will contain the phrase that is abbreviated by the headword. No component of 4lang currently makes use of this last field (AAA will not be replaced by American Automobile Association), but this may change in the future.
The Collins-COBUILD dictionary (Sinclair, 1987) contains over 84,500 headwords and its definitions use a vocabulary that is considerably larger than that of LDOCE, including a large technical vocabulary (e.g. adularia: a white or colourless glassy variety of orthoclase in the form of prismatic crystals), rare words (affricare: to rub against), and multiple orthographic forms (adsuki bean: variant spelling of adzuki bean). Since many definitions are simply pointers to other headwords, the average entry in Collins is much shorter than in LDOCE. However, given the technical nature of many entries, the vocabulary used by definitions exhibits a much larger variety: while Longman definitions, for the greatest part limited to the LDV, contain fewer than 9,000 English lemmas (not including named entities, numbers, etc.), Collins definitions use over 38,000 (figures obtained using the hunmorph analyzer and the morphdb.en database). Our third source of English definitions, the English Wiktionary at http://en.wiktionary.org, is the most comprehensive database, containing over 128,000 headwords and available via public data dumps that are updated weekly. Since wiktionaries are available for many languages using similar, although not standardized, data formats, Wiktionary has long been a resource for various NLP tasks, among them an effort to extend the 4lang dictionary to 40 languages (Ács et al., 2013). While for most languages datasets such as Longman and Collins may not be publicly available, wiktionaries currently contain over 100,000 entries for each of nearly 40 languages, and over 10,000 for a total of 76.

4.2. Parsing definitions

4.2.1 Preprocessing

Before passing dictionary entries to the parser, we match them against some simple patterns that are then deleted or changed to simplify the phrase or sentence without loss of information. A structure typical of dictionary definitions is the noun phrase with very generic meaning, e.g. something, one, a person, etc.
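The removal of such generic phrases can be sketched with a few regular expressions. The pattern list below is a hypothetical simplification; the actual pattern inventory of the 4lang preprocessor may differ:

```python
import re

# Hypothetical subset of the patterns removed during preprocessing;
# each matches a generic NP (with an optional relative pronoun and
# copula) at the start of a definition.
GENERIC_PATTERNS = [
    r'^(someone|somebody|something|one|a person|a thing)'
    r'( (who|that|which) (is )?)?',
]

def preprocess_definition(definition):
    """Strip generic leading NPs from a dictionary definition."""
    for pattern in GENERIC_PATTERNS:
        definition = re.sub(pattern, '', definition).strip()
    return definition

print(preprocess_definition(
    'someone who is hurt or killed in an accident or war'))
# the generic "someone who is" prefix is removed
```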
For example, LDOCE defines buffer as someone or something that protects one thing or person from being harmed by another. The frequency of such structures makes it worthwhile to perform a simple preprocessing step: phrases such as someone, something, someone who, etc. are removed from definitions in order to simplify them, thus reducing the chance of error in later steps. The above definition of buffer, for example, can be reduced to protects from being harmed, which can then be parsed to construct the definition graph protect ←1 FROM →2 harm.

4.2.2 Constraining the parser

Since virtually all dictionary definitions of nouns are single noun phrases, we constrain the parser to only allow such analyses for the definitions of all noun headwords11. This fixes many incorrect parses, for example when the defining noun phrase could also be parsed as a complete sentence, as in Figure 18.

Fig. 18: Incorrect parse tree from the Stanford Parser for the definition of wavelength: the size of a radio wave used to broadcast a radio signal.

4.2.3 Building definition graphs

The output of the (possibly constrained) parsing process is passed to the dep_to_4lang module introduced in Section 3. The ROOT dependency in each parse, which was ignored in the general case, is now used to identify the head of the definition, which is a hypernym of the word being defined. This allows us to connect, via a 0-edge, the node of the concept being defined to the graph built from its definition. We can perform this step safely because the vast majority of definitions contain a hypernym of the headword as their root element; exceptions will be discussed in Section 4.4.2.

4.3.
Expanding definition graphs

The 4lang dictionary contains by design all words of the Longman Defining Vocabulary (LDV; Boguraev & Briscoe, 1989). This way, if we use dict_to_4lang to define each headword in LDOCE as a graph over nodes corresponding to words in its dictionary definition, these graphs will only contain concepts that are defined in the hand-written 4lang dictionary. To take advantage of this, we implement an expansion step in 4lang, which adds the definition of each concept to a 4lang graph by simply adjoining each definition graph to G at the node corresponding to the concept being defined. This can be stated formally as follows:

Definition 1. Given the set of all concepts C, a 4lang graph G with concept nodes V(G) = {c1, c2, …, ci} ⊆ C, a set of definition graphs D, and a lexicon function L: C → D such that ∀c ∈ C: c ∈ V(L(c)), we define the expansion of G as

G* = G ∪ ⋃_{ci ∈ V(G)} L(ci)

Hand-written definitions in the 4lang dictionary may also contain pointers to arguments of the definiendum. For example, the concept stand is defined as upright ←0 =AGT ←1 ON →1 feet, indicating that it is the agent of stand that is →0 upright, etc. While detecting the thematic role of a verb's arguments can be difficult, we handle the majority of cases correctly using a simple step after expansion: all edges containing =AGT (=PAT) nodes are moved to the machine(s) with a 1-edge (2-edge) pointing to it from the concept being defined. This allows us to create the graph in Figure 19 based on the above definition of stand.

Fig. 19: Expanded graph for A man stands in the door. Nodes of the unexpanded graph are shown in grey.
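The expansion step of Definition 1 amounts to a simple graph union. The following is a minimal illustration using a hypothetical triple-based graph representation and toy definitions; the actual 4lang implementation operates on machine graphs:

```python
# Minimal sketch of the expansion step: a graph is a set of
# (source, edge_type, target) triples; the lexicon maps each
# concept to its definition graph, which contains the concept
# itself as a node (c in V(L(c))).

def nodes(graph):
    return {w for s, _, t in graph for w in (s, t)}

def expand(graph, lexicon):
    """Adjoin the definition graph of every node to the input graph."""
    expanded = set(graph)
    for concept in nodes(graph):
        expanded |= lexicon.get(concept, set())
    return expanded

# toy lexicon with invented definitions, for illustration only
lexicon = {
    'man': {('man', 0, 'person'), ('man', 0, 'male')},
    'door': {('door', 0, 'entrance')},
}
g = {('stand', 1, 'man'), ('IN', 1, 'stand'), ('IN', 2, 'door')}
expanded = expand(g, lexicon)
# the definitions of 'man' and 'door' are now part of the graph
```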
Expansion will affect all nodes of graphs built from LDOCE; when processing generic English text using text_to_4lang we may choose to limit expansion to manually built 4lang definitions, or we can turn to dictionaries built using dict_to_4lang, allowing ourselves to add definitions to nearly all nodes. 4lang modules can be configured to select the approach most suitable for any given application.

4.4. Issues and evaluation

In this section we shall describe sources of errors in our pipeline besides those caused by incorrect parser output (see Section 3.3.1). We shall also present the results of manual error analysis conducted on a small sample of graphs, in an effort both to determine the average accuracy of our output graphs and to identify the key error sources.

4.4.1 Error analysis

To perform manual evaluation of the dict_to_4lang pipeline we randomly selected 50 headwords from the Longman Dictionary12. In one round of evaluation we grouped the 50 definition graphs by quality, disregarding the process that created them. We found that 31 graphs were high-quality representations: 19 perfectly represented all facts present in the dictionary entry (see e.g. Figure 20) and another 12 were mostly accurate, with only minor details missing or an incorrect relation present in addition to the correct ones. Of the remaining 19 graphs, 9 still encoded several true relationships; the last 10 were essentially useless. Our sample is too small to conclude that 62% of the graphs we build are of acceptable quality, but these results are nevertheless promising. Our second round of manual inspection was directed at the entire process of building the 50 graphs and aimed to identify the sources of errors.
Out of the 31 graphs that had errors at all, 8 were clearly a result of parser errors (discussed in Section 3.3.1), another 8 contained non-compositional structures that in the future may be handled by constructions (see Section 6.5.1), and 3 were connected to non-standard definitions (see Section 4.4.2). All remaining errors were caused by one-of-a-kind bugs in the pipeline, e.g. preprocessing issues, the occasional overgeneration of relations by the postprocessing of coordinated structures (see Section 3.4.1), etc.

Fig. 20: Graph constructed from the definition of Zen: a kind of Buddhism from Japan that emphasizes meditation.

4.4.2 Non-standard definitions

Our method for building 4lang definitions can be successful in the great majority of cases because most dictionary definitions, or at least their first sentences, which is all we make use of, are rarely complex sentences; in most cases they are single phrases describing the concept denoted by the headword. A typical example is the definition of koala: an Australian animal like a small grey bear with no tail that climbs trees and eats leaves. It is these kinds of simple definitions that are prevalent in the dictionaries we process and that are handled quite accurately by both the Stanford Parser and our mapping from dependencies to 4lang relations.
In some cases, however, definitions use full sentences to explain the meaning of a word in a more straightforward and comprehensible way, for example:

playback: the playback of a tape that you have recorded is when you play it on a machine in order to watch or listen to it
indigenous: indigenous people or things have always been in the place where they are, rather than being brought there from somewhere else
ramshackle: a ramshackle building or vehicle is in bad condition and in need of repair

These sentences will result in a higher number of dependency relations, and consequently a denser definition graph, often with erroneous edges. In the special case when the Stanford Parser's output does not contain the ROOT relation, i.e. the parser failed to identify any of the words as the root of the sentence, we skip the entry entirely; this affects 0.76% of LDOCE entries and 0.90% of entries in en.wiktionary.

4.4.3 Word senses

As discussed in Section 2.2, the 4lang theory assigns only one definition to each word form, i.e. it does not permit multiple word senses; all usage of a word must be derived from a single concept graph. Explanatory dictionaries like the ones listed in Section 4.1 provide several definitions for each word, of which we always process the first one. This decision is somewhat arbitrary, but produces good results in practice; the first definition typically describes the most common sense of the word, as in the case of tooth:

1. one of the hard white objects in your mouth that you use to bite and eat food
2. one of the sharp or pointed parts that sticks out from the edge of a comb or saw

We cannot expect to construct from this entry a generic definition such as sharp, one_of_many. Instead, to capture at a later stage that objects other than those in your mouth could be instances of tooth, we must turn to the principle that any link in a 4lang definition can be overridden (see Section 2.2).
Not only are we unable to predict the particular subset of links in the definition of tooth that will be shared across various uses of the word tooth, we shouldn't make any such predictions: it is no more than an accident that teeth turned out to be metaphors for small, sharp objects lined up next to one another and not for e.g. small, white, cube-shaped objects. While in most cases the various senses defined for a word are metaphoric uses of the first, there remain words whose first definition is not generic enough to accommodate all others even if we assume powerful inferencing capabilities. Consider e.g. the definitions of shower from LDOCE below:

1. a piece of equipment that you stand under to wash your whole body
2. an act of washing your body while standing under a shower
3. a short period of rain or snow
4. a lot of small, light things falling or going through the air together
5. a party at which presents are given to a woman who is going to get married or have a baby
6. a group of stupid or lazy people
7. to wash your whole body while standing under a shower
8. to give someone a lot of things
9. to scatter a lot of things onto a person or place, or to be scattered in this way

A 4lang definition generic enough that one could derive at least the majority of these senses would be most similar to definition #4: showers are occurrences of many things falling, typically through the air. Understanding the word shower in the context of e.g. baby showers (#5) would remain a difficult task, requiring among others the understanding that fall may refer to an object changing place not only physically but also in terms of ownership. In the above LDOCE entry, however, since we use the first definition to build the 4lang graph, we lose any chance of recovering any of the meanings #3-6 and #8-9.
The lexicographic principle that keeps sense #2 and sense #7 separate simply does not apply in 4lang, which does not distinguish meanings that differ in part of speech alone: the verb and the nomen actionis are simply one and the same. We further note that many of the distinctions made here would be made by overt suffixes in other languages; e.g. the Hungarian equivalents of #1 and #2 are zuhany and zuhanyozik, respectively.

5. Semantic similarity

This section summarizes two successful applications of the dict_to_4lang system. A tool for measuring the similarity of English sentence pairs, introduced in (Recski & Ács, 2015), is presented in Section 5.1, while Section 5.2 documents the more recent wordsim system for measuring the similarity of word pairs, which we evaluate on the popular benchmark SimLex-999, achieving significant improvement over the current state of the art (see also (Recski et al., 2016)).

5.1. Sentence similarity

This section reviews a set of systems participating in the 2015 SemEval task of measuring semantic similarity of sentence pairs, using concept graphs built with dict_to_4lang to measure the semantic similarity between words. We briefly review the STS task, then present the system architecture and our measure of word similarity based on 4lang representations. This measure is combined with word pair features derived from various word embeddings, lexical resources like WordNet, and surface forms of words, to produce a competitive algorithm for measuring sentence similarity.

5.1.1 The STS task

The Semantic Textual Similarity (STS) track of SemEval conferences requires participating systems to measure the degree of semantic similarity between pairs of sentences. Datasets used in recent years were taken from a variety of sources (news headlines, image captions, answers to questions posted in online forums, answers given by students in classroom tests, etc.).
Gold annotation was obtained by crowdsourcing (using Amazon Mechanical Turk): annotators were required to grade sentence pairs on a scale from 0 to 5, and inter-annotator agreement was calculated to ensure the high quality of annotations.

5.1.2 System architecture

Our framework for measuring the semantic similarity of sentence pairs is a reimplementation of the system presented in (Han et al., 2013), which was among the top scorers in all STS tasks since 2013 (Kashyap et al., 2014; Han et al., 2015). Their architecture, Align and Penalize, involves computing an alignment score between two sentences based on some measure of word similarity. Our system extends the capabilities of this system in several ways, among them by defining a measure of semantic similarity between 4lang graphs and using it as an additional source of word similarity in several of their configurations. The core idea behind the Align and Penalize architecture is, given two sentences S1 and S2 and some measure of word similarity, to align each word of one sentence with some word of the other sentence so that the total similarity of word pairs is maximized. The mapping need not be one-to-one and is calculated independently for words of S1 (aligning them with words from S2) and words of S2 (aligning them with words from S1). The score of an alignment is the sum of the similarities of each word pair, normalized by sentence length; the final score assigned to a pair of sentences is the average of the alignment scores for each sentence. Multiple components are used to measure word similarity, and their output is combined using supervised learning methods. For out-of-vocabulary (OOV) words, i.e. those that are not covered by the components used for measuring word similarity, the systems rely on string similarity: the Dice- and Jaccard-similarities (Dice, 1945; Jaccard, 1912) over the sets of character n-grams in each word, for n = 1, 2, 3, 4.
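The alignment scoring described above can be sketched as follows. This is a simplified, hypothetical rendition of the Align and Penalize idea; the published system includes penalty terms and a supervised combination of components that are not shown here:

```python
def align_score(s1, s2, word_sim):
    """Align each word of s1 with its most similar word in s2 and
    return the length-normalized sum of similarities."""
    if not s1:
        return 0.0
    total = sum(max(word_sim(w1, w2) for w2 in s2) for w1 in s1)
    return total / len(s1)

def sentence_similarity(s1, s2, word_sim):
    # alignments are computed independently in both directions,
    # then averaged to give the final sentence-pair score
    return (align_score(s1, s2, word_sim) +
            align_score(s2, s1, word_sim)) / 2

def ngram_dice(w1, w2, n=2):
    """Dice similarity over character n-grams, usable for OOV words."""
    g1 = {w1[i:i + n] for i in range(len(w1) - n + 1)}
    g2 = {w2[i:i + n] for i in range(len(w2) - n + 1)}
    if not g1 or not g2:
        return 0.0
    return 2 * len(g1 & g2) / (len(g1) + len(g2))
```

With word_sim = ngram_dice, identical sentences score 1.0 and sentences of pairwise dissimilar character strings score near 0.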
5.1.3 Word similarity in 4lang

The 4lang-similarity of two words is the similarity between the 4lang graphs defining them. The exact definition is based on the intuition that similar concepts will overlap in the elementary configurations they take part in: they might share a 0-neighbor, e.g. train →0 vehicle ←0 car, or they might be on the same path of 1- and 2-edges, e.g. park ←1 IN →2 town and street ←1 IN →2 town. The predicates of a concept are defined as the set of elementary configurations it takes part in: for example, based on the definition graph in Figure 3, the predicates of the concept bird (P(bird)) are {vertebrate, (HAS, feather), (HAS, wing), (MAKE, egg)}. Predicates can also be inherited via paths of 0-edges, that is, (HAS, wing) is considered a predicate of all concepts for which →0 bird holds. By default, the similarity of two concepts is the Jaccard similarity of the sets of predicates of each concept:

S(w1, w2) = J(P(w1), P(w2)) = |P(w1) ∩ P(w2)| / |P(w1) ∪ P(w2)|

If the same metric computed over the sets of all nodes in the two definition graphs is larger, it is used instead; this is meant to account for small degrees of similarity such as that between casualty and army, whose definitions do not share any predicates but have a single common node, war, causing their similarity to be greater than zero (see Figure 21).

Fig. 21: Definitions of casualty (built from LDOCE) and army (defined in 4lang).

Our submissions achieved state-of-the-art results on the 2015 STS task. One of the three systems, embedding, did not make use of 4lang, but used a word embedding built from the first 1 billion words of the English Wikipedia. Our second submission, machine, used the 4lang-based word similarity, while the hybrid submission combined the output of the first two systems. Results are presented in Table 5; our top system ranked 11th among 78 systems in 2015.
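The predicate-based similarity can be sketched over a triple-based graph representation. The sketch below is a hypothetical simplification: it collects 0-neighbors and binary configurations as predicates, but omits the inheritance of predicates along 0-paths described above:

```python
def predicates(graph, concept):
    """Collect elementary configurations of a concept: its 0-neighbors
    and the (binary, other-argument) pairs it takes part in.
    Triples are (source, edge_type, target)."""
    preds = set()
    for s, e, t in graph:
        if s == concept and e == 0:
            preds.add(t)                      # e.g. vertebrate
        if e == 1 and t == concept:
            # pair the binary with its 2-argument(s)
            preds.update((s, t2) for s2, e2, t2 in graph
                         if s2 == s and e2 == 2)
    return preds

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# toy definition graph of bird, following the text above
bird = {('bird', 0, 'vertebrate'), ('HAS', 1, 'bird'),
        ('HAS', 2, 'feather'), ('HAS', 2, 'wing'),
        ('MAKE', 1, 'bird'), ('MAKE', 2, 'egg')}
# P(bird) = {vertebrate, (HAS, feather), (HAS, wing), (MAKE, egg)}
p_bird = predicates(bird, 'bird')
```

The similarity of two concepts is then jaccard(predicates(g1, w1), predicates(g2, w2)), with the node-set Jaccard used as a fallback when it is larger.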
Table 5: Performance of our team's systems on STS 2015.

                  embedding  machine  hybrid
answers-forums    0.704      0.698    0.723
answers-students  0.700      0.746    0.751
belief            0.733      0.736    0.747
headlines         0.769      0.805    0.804
images            0.804      0.841    0.844
mean Pearson      0.748      0.777    0.784

5.2. Word Similarity

The experiments described in Section 5.1 provided many insights about the potential of 4lang representations to model the semantic relatedness of concepts. This section describes more recent efforts at measuring the semantic similarity of word pairs, resulting in the hybrid wordsim system. The word similarity task has been a standard method for evaluating distributional models of semantics, with some models trained explicitly for this task. The wordsim system implements supervised learning over features from multiple models (including both word embeddings and 4lang representations). Models were evaluated on the standard SimLex-999 dataset13; we introduce the dataset and summarize previous results in Section 5.3. Section 5.4 lists the features defined by wordsim over pairs of 4lang definition graphs; results are presented in Section 5.5. The wordsim library is available under an MIT license from http://www.github.com/recski/wordsim; the contents of this section are presented in greater detail in (Recski et al., 2016).

5.3.
Previous work (Hill et al., 2015) recently proposed the SimLex-999 dataset as a benchmark for systems measuring word similarity. They argue that earlier gold standards measure association, not similarity, of word pairs; e.g. the words cup and coffee receive a high score from annotators in the widely used wordsim353 data (Finkelstein et al., 2002). Hill et al. note that “[a]ssociation and similarity are neither mutually exclusive nor independent” (2015, p. 668). Instead of being given a definition of this distinction, annotators of the SimLex dataset were simply shown a small set of examples and counter-examples. Since its publication in 2015, dozens of models have used the SimLex dataset for evaluation; some of these are listed on the SimLex webpage14. Systems for measuring word similarity are compared on the SimLex dataset by measuring the Spearman correlation between the scores assigned to word pairs by each system and the average of the scores given by human annotators. Several authors evaluate word embeddings by treating the cosine similarity of a pair of word vectors as the word similarity score assigned by that embedding to the pair. (Hill et al., 2015) report a correlation of 0.41 for an embedding trained on Wikipedia using word2vec (Mikolov et al., 2013); (Schwartz et al., 2015) achieve a score of 0.56 using a combination of a standard word2vec-based embedding and the SP model, which encodes the cooccurrence of words in symmetric patterns such as X and Y or X as well as Y. (Banjade et al., 2015) document a set of experiments on the contribution of various models to the task of measuring word similarity. Half a dozen distributional models are combined with simple WordNet-based features indicating whether word pairs are synonymous or antonymous, and with the word similarity algorithm of (Han et al., 2013), which we briefly introduced in Section 5.1.2, and which itself uses WordNet-based features for boosting.
By generating features using each of these resources and evaluating ML models trained on 11 different subsets of the 10 feature classes, (Banjade et al., 2015) conclude that top performance is achieved when all of them are included. This system achieved a Spearman correlation of 0.64, a considerable improvement over the performance of any individual model. The highest score on SimLex that we are aware of (other than that of our own system) is achieved using the Paragram embedding (Wieting et al., 2015), a set of vectors obtained by training pre-existing embeddings on word pairs from the Paraphrase Database (Ganitkevitch et al., 2013). Their top correlation of 0.69 is measured using a 300-dimension embedding created from the same GloVe vectors introduced earlier (trained on 840 billion tokens). Hyperparameters of this embedding were tuned for maximum performance on SimLex; another version, tuned for the WS-353 dataset, achieves a correlation of 0.67. 5.4. 4lang-based features Based on insights gained from developing a 4lang-based similarity measure for the 2015 STS system (see Section 5.1 for details), we have defined multiple features over pairs of 4lang graphs which we predicted would correlate with word similarity. In defining these features we rely on the definition of predicates introduced in Section 5.1.3. Two real-valued features correspond to the main components of our earlier, rule-based measure: the Jaccard-similarities of the sets of predicates and nodes in the two definition graphs. Additionally, we introduce three binary features: the links_contain feature is true iff either concept is contained in a predicate of the other, nodes_contain holds iff either concept is included in the other’s definition graph, and 0_connected is true iff the two nodes are connected by a path of 0-edges in either definition graph. All 4lang-based features are listed in Table 6.
Table 6 4lang similarity features.

feature        definition
links_jaccard  J(P(w1), P(w2))
nodes_jaccard  J(N(w1), N(w2))
links_contain  1 if w1 ∈ P(w2) or w2 ∈ P(w1), 0 otherwise
nodes_contain  1 if w1 ∈ N(w2) or w2 ∈ N(w1), 0 otherwise
0_connected    1 if w1 and w2 are connected by a path of 0-edges, 0 otherwise

Since these features are not sensitive to the 4lang nodes LACK, representing negation (dumb →0intelligent →0LACK), and BEFORE, which indicates that something was only true in the past (forget →0know →0BEFORE), pairs of antonyms in SimLex were regularly assigned high similarity scores. A further binary feature, is_antonym, was therefore implemented; it is true iff one word is within the scope of, i.e. 0-connected to, an instance of either LACK or BEFORE in the other word’s definition graph. A system trained on 4lang-based features only achieves a Pearson correlation of 0.38 on the SimLex data, which is competitive with some word embeddings but significantly below the 0.58–0.68 range of the state-of-the-art systems cited in Section 5.3. After measuring the individual contribution of each type of 4lang feature to the performance of purely vector-based configurations, only two features, 0_connected and is_antonym, were kept.
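The two features that survived this ablation can be sketched over the same simplified edge-set representation of graphs (sets of (source, label, target) triples). This is an illustrative reimplementation, not the wordsim code.

```python
def zero_adjacency(graph):
    """Undirected adjacency lists over 0-edges only."""
    adj = {}
    for u, label, v in graph:
        if label == 0:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

def zero_connected(graph, a, b):
    """True iff a and b lie on a common path of 0-edges (depth-first search)."""
    adj, seen, stack = zero_adjacency(graph), set(), [a]
    while stack:
        x = stack.pop()
        if x == b:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(adj.get(x, ()))
    return False

def is_antonym(g1, w1, g2, w2):
    """True iff either word is 0-connected to an instance of LACK or
    BEFORE in the other word's definition graph."""
    return any(zero_connected(g, w, neg)
               for g, w in ((g1, w2), (g2, w1))
               for neg in ("LACK", "BEFORE"))

# dumb ->0 intelligent ->0 LACK
dumb = {("dumb", 0, "intelligent"), ("intelligent", 0, "LACK")}
```

With this toy graph, dumb is 0-connected to intelligent, and is_antonym fires for the pair (dumb, intelligent), since intelligent is 0-connected to LACK inside dumb's definition graph.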
Adding these two features to the vector-based system brought the correlation to 0.75; a model using both 4lang and WordNet achieved the top score of 0.76. 5.5. Results Table 7 presents correlation figures for the major configurations of wordsim. Features extracted from pairs of graphs built by dict_to_4lang improve the top system by a significant margin, considerably narrowing the gap between other configurations and the 0.78 average performance of human annotators when measured against the average of all other annotators’ scores (Hill et al., 2015). This improvement also appears to be more significant than that achieved via a set of WordNet features encoding basic lexical relations such as synonymy and hypernymy.

Table 7 Performance of major configurations on SimLex.

System                    Spearman’s ρ
embeddings                0.72
embeddings+wordnet        0.73
embeddings+4lang          0.75
embeddings+wordnet+4lang  0.76

6. Outlook We have presented a system for building 4lang-style concept definitions for practically all words of English, and for using them in creating 4lang-representations of the meaning of any utterance. We have also reviewed a pair of experiments in measuring semantic similarity by combining features derived from 4lang-representations with standard distributional models of meaning. This section outlines our future plans for using 4lang representations to solve some of the most challenging tasks in computational semantics.
We shall briefly discuss the tasks of measuring sentence similarity and entailment (Section 6.1), question answering (Section 6.2), and semantics-based parsing (Section 6.3), arguing that each of these should be approached via the single generic task of determining the likelihood of some 4lang representation based on models trained on other 4lang graphs relevant to the task at hand (the context). Some preliminary ideas for such a component are presented in Section 6.4. Finally, Section 6.5 discusses ways to exploit existing sources of both linguistic and extra-linguistic knowledge in the 4lang system by converting them to 4lang representations. 6.1. Sentence similarity and entailment In Sections 5.1 and 5.2 we introduced measures of semantic similarity between words based on their 4lang definitions, which helped achieve state-of-the-art performance on the tasks of measuring word and sentence similarity. Most top STS systems reduce the task of measuring textual similarity to that of word similarity, and lexical resources such as WordNet and surface features such as character-based similarity play an important role in most approaches; our current systems are no exception. We believe that the task of directly quantifying the similarity of two meaning representations amounts to detecting entailment between parts of such representations. The nature of the similarity scale (e.g. what it means for two sentences to be 70% similar) is unclear, but it can be assumed that (i) if two sentences S1 and S2 are perfectly similar (i.e. mean exactly the same thing), then each of them must entail the other, and (ii) if S1 and S2 are similar to some extent, then there must exist some substructures of the meanings of S1 and S2 such that these substructures are perfectly similar, i.e. entail each other.
The connection between the STS and RTE tasks has recently been made by (Vo & Popescu, 2016), who present a corpus annotated for both semantic relatedness and entailment, measure the correlation between the two sets of scores, and propose a joint architecture for performing the two tasks simultaneously. The nature of these substructures is less obvious. A straightforward approach is to consider subgraphs of 4lang representations and assume that the similarity of two representations is connected to the intersection of the graphs (i.e. the intersection of the sets of edges over the intersection of the sets of nodes). For example, the sentences John walks and John runs, when interpreted in 4lang and properly expanded, will map to graphs that share the subgraph John ⇌10 move ←1INSTRUMENT →2foot. Other common configurations between graphs can also warrant similarity, e.g. John walks with a stick and John fights with a stick both map to John ⇌10 X ←1INSTRUMENT →2stick for some X. If our notion of similarity could refer to shared subgraphs only, no connection could be made between John and stick, and these sentences could not be judged more similar to each other than to virtually any sentence about John or about a stick being an instrument. Thus it appears that such common templates, i.e. graphs with some unspecified nodes, must play a role in determining the similarity of two 4lang graphs. The number of such templates matching a given graph grows exponentially with the number of nodes, but we can expect the relevant templates to be of limited size, and a search for common templates in two graphs seems feasible15. If similarity can be defined in terms of common substructures of 4lang graphs, a definition of entailment can follow that takes into account the substructures of one graph that are also present in the other.
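Matching a template with unspecified nodes against a graph can be sketched as a brute-force assignment of wildcards to graph nodes. The representation (edge sets of (source, label, target) triples, wildcards written as "?X") and the exhaustive search are illustrative simplifications; a practical implementation would prune the search as discussed above.

```python
from itertools import permutations

def graph_nodes(graph):
    return {x for u, _, v in graph for x in (u, v)}

def matches(template, graph):
    """True iff some assignment of the template's wildcard nodes to graph
    nodes makes every template edge an edge of the graph."""
    wilds = sorted(n for n in graph_nodes(template) if n.startswith("?"))
    candidates = sorted(graph_nodes(graph))
    for assignment in permutations(candidates, len(wilds)):
        sub = dict(zip(wilds, assignment))
        rename = lambda n: sub.get(n, n)
        if all((rename(u), l, rename(v)) in graph for u, l, v in template):
            return True
    return False

# the shared template of 'John walks with a stick' / 'John fights with a stick':
# John <-1 ?X, ?X <-1 INSTRUMENT ->2 stick  (edge directions simplified)
template = {("?X", 1, "John"), ("INSTRUMENT", 1, "?X"), ("INSTRUMENT", 2, "stick")}
walk = {("walk", 1, "John"), ("INSTRUMENT", 1, "walk"), ("INSTRUMENT", 2, "stick")}
fight = {("fight", 1, "John"), ("INSTRUMENT", 1, "fight"), ("INSTRUMENT", 2, "stick")}
```

Here matches(template, walk) and matches(template, fight) both hold, capturing the similarity that a pure shared-subgraph test would miss.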
Simply put, John walks entails John moves because the representation of the latter, John ⇌10 move, is contained in that of the former, but entailment does not hold the other way round, because many edges of John walks are left uncovered by John moves, e.g. those in move ←1INSTRUMENT →2foot. Since this asymmetric relationship between graphs – the ratio of templates in one that are present in the other – is also of a gradual nature, it is more intuitive to think of it as the extent to which one utterance supports the other (the term entailment is typically used as a strictly binary concept). While John moves may not entail John walks, it nevertheless supports it to a greater extent than e.g. John sings. Exactly how similarity and support between 4lang graphs should be measured cannot be worked out without considerable experimenting (we are trying to approximate human judgment, as in the case of the STS task in Section 5.1); what we have argued here is that 4lang representations are powerful and expressive enough for the semantic relatedness of utterances to be measured through them effectively. 6.2. Question Answering In the previous section we discussed the task of measuring the extent to which one utterance supports another – a relationship that differs from entailment in being gradual. A workable measure of support can play a part in question answering: it can be used to rank answer candidates in order to find those that are supported to the highest degree by a given context. There remains the task of finding candidates that are relevant answers to the question asked. The text_to_4lang pipeline offers no special treatment for questions. A wh-question such as Who won the 2014 World Cup is handled by all components in the same way as an indicative, creating e.g. the edges who ←1win →2cup. Yes-no questions are simply not detected as such: Did Germany win the 2014 World Cup and Germany won the 2014 World Cup will map to the same 4lang graph.
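The asymmetric support relation sketched above can be illustrated over raw edge sets: the extent to which a premise graph supports a hypothesis graph is the fraction of the hypothesis' edges covered by the premise. This is a deliberate simplification (a full treatment would count shared templates, not raw edges).

```python
def support(premise, hypothesis):
    """Extent to which `premise` supports `hypothesis`: the ratio of the
    hypothesis' edges that are also present in the premise."""
    if not hypothesis:
        return 1.0
    return len(hypothesis & premise) / len(hypothesis)

# 'John walks', expanded, vs. 'John moves' (edge directions simplified)
walks = {("walk", 1, "John"), ("walk", 0, "move"), ("move", 1, "John"),
         ("INSTRUMENT", 1, "move"), ("INSTRUMENT", 2, "foot")}
moves = {("move", 1, "John")}
```

With these graphs, support(walks, moves) is 1.0 (full entailment), while support(moves, walks) is only 0.2: John moves does not entail John walks but still supports it to a nonzero degree.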
In the future we plan to experiment with simple methods for finding candidates: e.g. detecting wh-questions allows us to identify the template X ←1win →2cup(…) and match it against graphs already in the context; we shall discuss how such a context might be modeled in Section 6.4. 6.3. Parsing in 4lang For the purposes of the 4lang modules and applications presented in this paper, we relegate syntactic analysis to dependency parsers. In Section 3.3.1 we saw examples of errors introduced by the parsing component, and in the sections on evaluation we observed that such errors are in fact the single largest source of errors in most of our applications. Our long-term plans for the 4lang library include an integrated module for semantics-assisted parsing. Since most of these plans are unimplemented (with the exception of some early experiments documented in (Nemeskey et al., 2013)), here we shall only provide a summary of our basic ideas. Since generic parsing remains a challenging task in natural language processing, many NLP applications rely on the output of chunkers for high-accuracy syntactic information about a sentence. Chunkers typically identify the boundaries of phrases at the lowest level of the constituent structure, e.g. in the sentence A 61-year old furniture salesman was pushed down the shaft of a freight elevator they would identify the noun phrases [A 61-year old furniture salesman], [the shaft], and [freight elevator]. Since chunking can be performed with high accuracy across languages (Kudo & Matsumoto, 2001; Recski & Varga, 2010), and some of our past experiments suggest that the internal syntactic structure of chunks can also be detected with high accuracy (Recski, 2014), our first goal for 4lang is to detect phrase-internal semantic relations directly. The aim of parsing with 4lang is to make the process sensitive to (lexical) semantics.
Currently the phrase blue giraffe is mapped to the graph giraffe →0blue on the basis of the dependency relation amod(giraffe, blue), warranted by a particular fragment of the parse tree, something along the lines of [NP [A blue] [N giraffe]], which has been constructed with little or no regard to the semantics of blue or giraffe. The architecture we propose would still make use of the constituent structure of phrases, but it would create a connection between blue giraffe and giraffe →0blue by means of a construction that pairs the rewrite rule NP → A N with the operation that adds the 0-edge between the concepts corresponding to the words blue and giraffe16. Since many dependency parsers, among them the Stanford Parser used by dict_to_4lang, derive their analyses from parse trees using template matching, it seems reasonable to assume that a direct mapping between syntactic patterns and 4lang configurations can also be implemented straightforwardly. The task of ranking competing parse trees can then be supplemented by some module that ranks 4lang representations by likelihood; what likelihood means and how such a module could be designed is discussed in Section 6.4. Thus, the problem of resolving ambiguities such as the issue of PP-attachment discussed in Section 3.3.1, e.g. in parsing the sentence He ate spaghetti with meatballs, becomes no more difficult than predicting that eat →2meatball is significantly more likely than eat ←1INSTRUMENT →2meatball. If we are to make such predictions based on statistics over previously seen 4lang representations, our approach can be seen as the semantic counterpart of data-oriented parsing (Bod, 2008), a theory that estimates the likelihood of a syntactic parse based on the likelihood of its substructures, learned from structures in some training data. 6.4.
Likelihood of 4lang representations We proposed the notion of support, the extent to which parts of one utterance entail parts of another, in Section 6.1, and we indicated in Section 6.2 that we require a model of context that allows us to measure the extent to which the context supports some utterance. Finally, in Section 6.3, we argued that a method for ranking 4lang (sub)graphs by the extent to which the context supports them could be used to improve the quality of syntactic parsing and thereby reduce errors in the entire text_to_4lang pipeline. We shall refer to this measure as the likelihood of some 4lang graph (given some context). This section presents some early ideas for the design of a future 4lang module that models context and measures likelihood. Given a system capable of comparing the likelihoods of competing semantic representations, we will have a chance of successfully addressing more complex tasks in artificial intelligence, such as the Winograd Schema Challenge (Levesque et al., 2011). In Section 6.1 we introduced 4lang templates – sets of concepts and paths of edges between them – as the structures shared by 4lang graphs that are semantically related. Templates are more general structures than subgraphs: two graphs may share many templates over a set of nodes in spite of having only a few shared edges; a previous example was the pair of sentences John walks with a stick and John fights with a stick, sharing the template John ⇌10 X ←1INSTRUMENT →2stick. Our initial approach is to think of the likelihood of some graph as some product of the likelihoods of the matching templates, given a model of the context. We believe that both the likelihood of templates in some context and the way they can be combined to obtain the likelihood of an utterance should be learned from the set of 4lang graphs associated with the context. E.g.
if we are to establish the likelihood of the utterance Germany won the 2014 World Cup and the context is a set of 4lang graphs obtained by processing a set of newspaper articles on sports using text_to_4lang, our answer should be based on (i) the frequency of the templates in the target 4lang graph, as observed in the set of context graphs, and (ii) our knowledge of how important each template is, e.g. based on their overall frequency in the context or among all occurrences over their sets of nodes17. In theory there is an enormous number of templates to consider over some graph (doubly exponential in the number of nodes), but the search space can be effectively reduced in a fashion similar to the way standard language modeling reduces the space of all possible word sequences to that of trigrams. If e.g. we consider templates of no more than 4 nodes, and we use expansion to reduce all graphs to some form of ‘plain English’ with a vocabulary no greater than 10^5 ((Kornai et al., 2015) has shown that an even greater reduction is possible: by iterative expansion, 4lang representations can be reduced to 129 primitives, possibly fewer), then the number of node sets will remain in the 10^15 range, and while the total number of theoretically possible 4lang graphs over 4 nodes is as high as 2^(6·C(4,2)) ≈ 10^12, we cannot expect to observe more than a fraction of them: the present 4lang architecture in itself determines a much smaller variety. Note that the templates likely to occur in data are also mostly meaningful: e.g. templates over the graph for Germany won the 2014 World Cup are representations of states-of-affairs such as ‘Germany won a 2014 something’ (Germany ←1win →2X →0 2014), ‘somebody won a world cup’ (X ←1win →2cup →0world), or ‘Germany did something to a world something’ (Germany ←1X →2Y →0world) – our proposed parameters are the likelihoods of each of these states-of-affairs based on what we have learned from previous experience.
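The counting scheme above can be made concrete in a toy form. In this sketch "templates" are single edges with at most one node replaced by a wildcard, counts come from a set of context graphs, and the likelihood of a graph is the geometric mean of add-one-smoothed template frequencies; the restriction to single-edge templates and the scoring formula are our illustrative assumptions, not the proposed module.

```python
from math import log, exp

def edge_templates(edge):
    """An edge and its generalizations with one node wildcarded."""
    u, label, v = edge
    return {(u, label, v), (u, label, "?"), ("?", label, v)}

def count_templates(context_graphs):
    """Frequency of every (partially wildcarded) edge template in the context."""
    counts = {}
    for g in context_graphs:
        for e in g:
            for t in edge_templates(e):
                counts[t] = counts.get(t, 0) + 1
    return counts

def likelihood(graph, counts, total):
    """Geometric mean of smoothed best-template frequencies over the graph."""
    logp = 0.0
    for e in graph:
        best = max(counts.get(t, 0) for t in edge_templates(e))
        logp += log((best + 1) / (total + 1))
    return exp(logp / len(graph)) if graph else 1.0

# toy context: two sports 'articles' (win ->1 subject, win ->2 object)
context = [{("win", 1, "Germany"), ("win", 2, "cup")},
           {("win", 1, "Brazil"), ("win", 2, "cup")}]
counts = count_templates(context)
total = sum(len(g) for g in context)
```

Against this context, a graph for 'somebody won a cup' scores higher than an unrelated one, and the wildcard templates let even the unseen 'France won' benefit from the observed 'X won' pattern.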
What we have outlined here are merely directions for further investigation – the exact architecture and the method of learning (including the reduction of the parameter space) need to be determined by experiments, as does the question of how far such an approach can scale across many domains, genres, and large amounts of data. Our purpose was once again to argue for the expressiveness of 4lang representations, and to indicate our plans for future research in computational semantics. 6.5. External sources In this final section we present simple examples of using external databases of linguistic and extra-linguistic knowledge to build or extend 4lang representations automatically. Just as dict_to_4lang is a tool for acquiring knowledge about the meaning of words, similar systems could be built for learning grammar (Section 6.5.1) or facts about the world (Section 6.5.2). 6.5.1 Constructions As discussed in Section 6.3, in the future we plan to map text to 4lang representations using constructions, which are essentially pairs of patterns mapping classes of surface forms to classes of 4lang graphs. Such constructions need not be hand-coded; they may be created on a large scale from existing linguistic ontologies. One example is the PropBank database (Palmer et al., 2005) – also a key component of the AMR semantic representation (Banarescu et al., 2013) – which contains argument lists of English verbs along with the semantic roles each argument takes. The example entry in Figure 22 establishes that the mandatory roles associated with arguments of the verb agree are those of agreer and proposition, and that their functions are those of prototypical agent (PAG) and prototypical patient (PPT), respectively. This information could be represented as a 4lang construction stating that concepts accessible from agree via 1- and 2-edges should have 0-edges leading to the concepts agreer and proposition.
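Such a role-attaching construction can be sketched as follows; the edge-set representation and the toy definition fragment of agree are illustrative assumptions (real PropBank framesets are XML, as in Figure 22).

```python
def extend_with_roles(graph, verb, role1, role2):
    """Add 0-edges from the verb's 1- and 2-neighbours to the two role
    concepts, e.g. agreer and proposition for agree."""
    new_edges = set()
    for u, label, v in graph:
        if u == verb and label == 1:
            new_edges.add((v, 0, role1))   # first argument ->0 role1
        if u == verb and label == 2:
            new_edges.add((v, 0, role2))   # second argument ->0 role2
    return graph | new_edges

# hypothetical fragment of the 4lang definition of agree
agree = {("agree", 1, "someone"), ("agree", 2, "something")}
extended = extend_with_roles(agree, "agree", "agreer", "proposition")
```

After the call, the graph contains someone →0 agreer and something →0 proposition, mirroring the extension shown in Figure 23.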
This construction could then be used to extend the 4lang definition of agree (see Figure 23). The large-scale extension of 4lang data based on this external source will require a carefully selected set of high-precision patterns: a method must be devised to decide, for each pair of PropBank frameset and 4lang definition, whether an extension of the latter is warranted. Fig. 22 Part of the PropBank frameset for agree.18 Fig. 23 Extending the 4lang definition of agree (new nodes are shown in grey). 6.5.2 World knowledge Even the simplest forms of reasoning will require some model of world knowledge, and 4lang representations are capable of representing facts taken from publicly available knowledge bases such as WikiData (successor to the widely used but discontinued Freebase (Bollacker et al., 2008)). Such datasets contain triplets of the form predicate(argument1, argument2), such as author(George_Orwell, 1984). author is defined in Longman as someone who has written a book, which dict_to_4lang uses to build the definition graph in Figure 24. If we are ready to make the assumption that the first and second arguments of the WikiData predicate author correspond to the 1- and 2-neighbours of the only binary relation in this definition (write), we can combine the fact author(George_Orwell, 1984) with the definition of author to obtain the graph in Figure 25. Fig. 24 4lang definition of author. Fig. 25 4lang graph inferred from author(George_Orwell, 1984).
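The combination step above can be sketched in a few lines, using the naive heuristic described in the text: locate the single binary relation in the definition graph and attach the fact's arguments to it. The edge-set representation and the toy definition of author are illustrative; a real system would need the high-precision matching discussed below.

```python
def binary_relations(graph):
    """Nodes that have both an outgoing 1-edge and an outgoing 2-edge."""
    ones = {u for u, label, _ in graph if label == 1}
    twos = {u for u, label, _ in graph if label == 2}
    return ones & twos

def apply_fact(definition, concept, arg1, arg2):
    """Instantiate the unique binary relation of `definition` with the
    fact's arguments; fail if that relation is not unique."""
    rels = binary_relations(definition)
    if len(rels) != 1:
        raise ValueError("no unique binary relation in definition")
    rel = rels.pop()
    return definition | {(arg1, 0, concept), (rel, 1, arg1), (rel, 2, arg2)}

# author: 'someone who has written a book' (cf. Figure 24, simplified)
author = {("author", 0, "someone"), ("write", 1, "someone"), ("write", 2, "book")}
combined = apply_fact(author, "author", "George_Orwell", "1984")
```

The resulting graph adds George_Orwell →0 author and George_Orwell ←1 write →2 1984, roughly the graph of Figure 25.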
A system for building 4lang graphs from WikiData automatically will require a high-precision method for matching WikiData relations with arguments of 4lang definitions, as we did in the case of author above. Simple heuristics like the one used in this example will have to be evaluated, and only those with reasonable precision selected. Such a curated set of patterns can then be applied to any subset of WikiData to convert large amounts of factual information to the 4lang format and efficiently combine it with 4lang’s knowledge of linguistic semantics. Footnotes 1 https://github.com/kornai/4lang/blob/master/4lang. 2 Since the text_to_4lang pipeline presented in Section 3 assigns 4lang graphs to raw text based on the output of dependency parsers that treat the relationship between a subject and a verb uniformly, irrespective of whether the verb is transitive or not, the 4lang graphs we build will include a 1-edge between all verbs and their subjects. We do not consider this a shortcoming: for the purposes of semantic analysis we do not see the practicality of a distinction between transitive and intransitive verbs – we only recognize differences in the likelihood (based on data) of some verb taking a certain number of arguments. 3 All example definitions, unless otherwise indicated, are taken from the Longman Dictionary of Contemporary English (Bullon, 2003). 4 Note that we do not provide a proper definition of true homonyms – particular applications of the 4lang system for representing word meaning can and should make their own decisions on where to draw the line based on their inferencing capabilities. Allowing for polysemy when the need arises is essential not only because of words like trunk; it is also more practical to maintain multiple definitions for polysemous words when such definitions are readily available (see Section 4.4.3 for more discussion). 5 For a possible typology of such semantic exploitations, see Chapter 8 of (Hanks, 2013).
6 Note that such an inference must access some form of world knowledge in addition to the definition of each concept: the definition of ship will contain ←1ON →2water (or similar), but to infer that this makes it incompatible with the earth in the definition of plow one must also be aware that water and earth cancel each other out in the context of where a vehicle runs. 7 This is also reflected in The Urban Dictionary’s definition of semantics: The study of discussing the meaning/interpretation of words or groups of words within a certain context; usually in order to win some form of argument (http://www.urbandictionary.com). 8 http://nlp.stanford.edu/software/lex-parser.shtml. 9 the word wizarding should have been mapped to the concept wizard. 10 http://dbpubs.stanford.edu:8091/∼testbed/doc2/WebBase/. 11 The command-line interface of the Stanford Parser does not support adding constraints on parse trees, but the Java API does; we implemented a small wrapper in jython that allowed us to access the classes and functions necessary to enforce this constraint. 12 The 50 words in our sample, selected randomly using GNU shuf were the following: aircraft, arbour, armful, characteristic, clothesline, contact, contrived, costermonger, cycling, cypress, dandy, efface, excited, fedora, forester, frustrate, gazette, grenade, houseboy, incandescent, invalid, khaki, kohl, lecture, lizard, might, multiplication, nightie, okey-doke, outdid, overwork, popularity, preceding, Presbyterian, punch-drunk, reputed, residency, retaliation, rock-solid, sandpaper, scant, sewing, slurp, transference, T-shirt, underwrite, vivace, well-fed, whatsit, Zen. 13 http://www.cl.cam.ac.uk/∼fh295/simlex.html. 14 http://www.cl.cam.ac.uk/∼fh295/simlex.html. 
15 The 4lang theory of representing meaning using networks of Eilenberg machines – of which our graphs are simplifications – will have the machines walk and fight inherit all properties of all machines to which they have pointers on their 0th partition; in other words, they will end up with all properties of concepts that are accessible through a path of IS_A relationships, and will probably share at least some very generic properties such as voluntary action. The machine-equivalent of templates could then be networks of machines, each with any arbitrary set of properties. 16 As mentioned in Section 2.1, the directed graphs used throughout this paper are simplifications of our formalism; the constructions in 4lang actually map surface patterns to operations over Eilenberg machines, in this case one that places a pointer to a blue machine on the 0th partition of a giraffe machine. 17 At this point we must note that likelihood is not (directly related to) truth; in fact, none of our previous discussions leading up to this notion makes reference to truth. Neither do we suggest that calculating likelihood can take the place of inference – a context may entail or contradict an utterance regardless of how likely the latter is; our notion is rather motivated by the various applications discussed in this section. 18 https://github.com/propbank/propbank-frames/blob/master/frames/agree.xml. References Ács, J., Pajkossy, K., & Kornai, A. (2013). Building basic vocabulary across 40 languages. In Proceedings of the Sixth Workshop on Building and Using Comparable Corpora (pp. 52–58). Sofia, Bulgaria: Association for Computational Linguistics. Banarescu, L., Bonial, C., Cai, S., Georgescu, M., Griffitt, K., Hermjakob, U., … Schneider, N. (2013). Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse (pp. 178–186). Sofia, Bulgaria: Association for Computational Linguistics. Banjade, R.
, Maharjan, N., Niraula, N. B., Rus, V., & Gautam, D. (2015). Lemon and tea are not similar: Measuring word-to-word similarity by combining different methods. In Gelbukh, A. (Ed.), International Conference on Intelligent Text Processing and Computational Linguistics (pp. 335–346). Springer. Bod, R. (2008). The Data-Oriented Parsing Approach: Theory and Application. Springer. Boguraev, B. K., & Briscoe, E. J. (1989). Computational Lexicography for Natural Language Processing. Longman. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (pp. 1247–1250). Bullon, S. (2003). Longman Dictionary of Contemporary English 4. Longman. De Marneffe, M.-C., MacCartney, W., & Manning, C. (2006). Generating typed dependency parses from phrase structure parses. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) (Vol. 6, pp. 449–454). Genoa, Italy. De Marneffe, M.-C., & Manning, C. D. (2008). Stanford typed dependencies manual [Computer software manual]. Retrieved from http://nlp.stanford.edu/software/dependencies_manual.pdf (Revised for the Stanford Parser v. 3.5.1 in February 2015). Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302. Eilenberg, S. (1974). Automata, Languages, and Machines (Vol. A). Academic Press. Fillmore, C. J. (1977). Scenes-and-frames semantics. In Zampolli, A. (Ed.), Linguistic Structures Processing (pp. 55–88). North Holland. Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., … Ruppin, E. (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1), 116–131. Ganitkevitch, J.
, Van Durme B. , & Callison-Burch C. ( 2013 ). PPDB: The Paraphrase Database . In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013) (pp. 758 – 764 ). Atlanta, Georgia : Association for Computational Linguistics . Han L. , Kashyap L. , Finin A. , Mayfield T. , & Weese J. , ( 2013 ). Umbc_ebiquity-core: Semantic textual similarity systems . In Second Joint Conference on Lexical and Computational Semantics (*SEM) (pp. 44 – 52 ). Atlanta, Georgia, USA : Association for Computational Linguistics . Han L. , Martineau J. , Cheng D. , & Thomas C. ( 2015 ). Samsung: Alignand- Differentiate Approach to Semantic Textual Similarity . In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (pp. 172 – 177 ). Denver, Colorado : Association for Computational Linguistics . Hanks P. ( 2013 ). Lexical analysis: Norms and exploitations . MIT Press . Hill F. , Reichart R. , & Korhonen A. ( 2015 ). Simlex-999: Evaluating semantic models with (genuine) similarity estimation . Computational Linguistics , 41 ( 4 ), 665 – 695 . Google Scholar Crossref Search ADS Hobbs J. R. ( 1990 ). Literature and cognition (No. 21). Center for the Study of Language (CSLI) . Jaccard P. ( 1912 ). The distribution of the flora in the alpine zone . New phytologist , 11 ( 2 ), 37 – 50 . Google Scholar Crossref Search ADS Kashyap A. , Han L. , Yus R. , Sleeman J. , Satyapanich T. , Gandhi S. , & Finin T. ( 2014 ). Meerkat Mafia: Multilingual and Cross-Level Semantic Textual Similarity Systems . In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) (pp. 416 – 423 ). Dublin, Ireland : Association for Computational Linguistics and Dublin City University . Katz J. , & Fodor J. A. ( 1963 ). The structure of a semantic theory . Language , 39 , 170 – 210 . Google Scholar Crossref Search ADS Klein D. , & Manning C. D. ( 2003 ). 
Accurate unlexicalized parsing . In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (pp. 423 – 430 ). Sapporo, Japan : Association for Computational Linguistics . Kornai A. ( 2010 ). The algebra of lexical semantics . In Ebert C. , Jäger G. , & Michaelis J. (Eds.), Proceedings of the 11th Mathematics of Language Workshop (pp. 174 – 199 ). Springer . Kornai A. ( 2012 ). Eliminating ditransitives . In de Groote P. & Nederhof M.-J. (Eds.), Revised and Selected Papers from the 15th and 16th Formal Grammar Conferences (pp. 243 – 261 ). Springer . Kornai A. , Ács J. , Makrai M. , Nemeskey D. M. , Pajkossy K. , & Recski G. ( 2015 ). Competence in lexical semantics . In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM 2015) (pp. 165 – 175 ). Denver, Colorado : Association for Computational Linguistics . Kornai A. , & Makrai M. ( 2013 ). A 4lang fogalmi szótár . In Tanács A. & Vincze V. (Eds.), IX. Magyar Számitógépes Nyelvészeti Konferencia (pp. 62 – 70 ). Kudo T. , & Matsumoto Y. ( 2001 ). Chunking with support vector machines . In Proceedings of the 2nd meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2001) (pp. 1 – 8 ). Association for Computational Linguistics . Levesque H. J. , Davis E. , & Morgenstern L. ( 2011 ). The Winograd schema challenge . In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning ( Vol. 46 , p. 47 ). Mikolov T. , Chen K. , Corrado G. , & Dean J. ( 2013 ). Efficient estimation of word representations in vector space . In Bengio Y. & LeCun Y. (Eds.), Proceedings of the ICLR 2013 . Nemeskey D. , Recski G. , Makrai M. , Zséder A. , & Kornai A. ( 2013 ). Spreading activation in language understanding . In Proceedings of the 9th International Conference on Computer Science and Information Technologies (CSIT 2013) (pp. 140 – 143 ). Yerevan, Armenia : Springer . Palmer M. , Gildea D. , & Kingsbury P. ( 2005 ). 
The Proposition Bank: An annotated corpus of semantic roles . Computational linguistics , 31 ( 1 ), 71 – 106 . Google Scholar Crossref Search ADS Quillian M. R. ( 1969 ). The teachable language comprehender . Communications of the ACM , 12 , 459 – 476 . Google Scholar Crossref Search ADS Recski G. ( 2014 ). Hungarian noun phrase extraction using rule-based and hybrid methods . Acta Cybernetica , 21 , 461 – 479 . Google Scholar Crossref Search ADS Recski G. , & Ács J. ( 2015 ). MathLingBudapest: Concept networks for semantic similarity . In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (pp. 543 – 547 ). Denver, Colorado : Association for Computational Linguistics . Recski G. , Iklódi E. , Pajkossy K. , & Kornai A. ( 2016 ). Measuring semantic similarity of words using concept networks . In Proceedings of the 1st Workshop on Representation Learning for NLP (pp. 193 – 200 ). Berlin, Germany : Association for Computational Linguistics . Recski G. , & Varga D. ( 2010 ). A Hungarian NP Chunker . The Odd Yearbook. ELTE SEAS Undergraduate Papers in Linguistics , 8 , 87 – 93 . Richards I. ( 1937 ). The philosophy of rhetoric . Oxford University Press . Ruhl C. ( 1989 ). On monosemy: a study in lingusitic semantics . State University of New York Press . Schwartz R. , Reichart R. , & Rappoport A. ( 2015 ). Symmetric pattern based word embeddings for improved word similarity prediction . In Proceedings of the 19th Conference on Computational Natural Language Learning (CoNLL 2015) (pp. 258 – 267 ). Beijing, China : Association for Computational Linguistics . Sinclair J. M. ( 1987 ). Looking up: an account of the COBUILD project in lexical computing . Collins ELT . Socher R. , Bauer J. , Manning C. D. , & Andrew Y. N. ( 2013 ). Parsing with compositional vector grammars . In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013) (pp. 455 – 465 ). 
Sofia, Bulgaria : Association for Computational Linguistics . Trón V. , Gyepesi G. , Halácsky P. , Kornai A. , Németh L. , & Varga D. ( 2005 ). Hunmorph: Open source word analysis . In Proceedings of the ACL Workshop on Software (pp. 77 – 85 ). Ann Arbor, Michigan : Association for Computational Linguistics . Vo N. P. A. , & Popescu O. ( 2016 ). Corpora for learning the mutual relationship between semantic relatedness and textual entailment . In Calzolari N. et al. (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) . Paris, France : European Language Resources Association (ELRA) . Wieting J. , Bansal M. , Gimpel K. , Livescu K. , & Roth D. ( 2015 ). From paraphrase database to compositional paraphrase model and back . TACL , 3 , 345 – 358 . © 2017 Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png International Journal of Lexicography Oxford University Press

ISSN 0950-3846
eISSN 1477-4577
DOI 10.1093/ijl/ecx007

Section 3 presents the dep_to_4lang pipeline, which creates 4lang-style meaning representations from running text. Section 4 describes its application to monolingual dictionary definitions, dict_to_4lang, used to create large concept lexica automatically. Section 5 presents two applications of the dict_to_4lang module to the tasks of measuring the semantic similarity of pairs of English sentences and words. Finally, Section 6 discusses our plans for future applications of the 4lang system. 2. The 4lang system This section is a short outline of the 4lang system for representing meaning using directed graphs of concepts. We shall not attempt a full presentation of the 4lang principles. Instead, we shall introduce the formalism in Section 2.1, then continue to discuss some specific aspects relevant to this paper. 4lang’s approach to multiple word senses is summarized in Section 2.2; Section 2.3 is concerned with reasoning based on 4lang graphs. The treatment of extra-linguistic knowledge is discussed in Section 2.4. Finally, Section 2.5 considers the primitives of the 4lang representation and contrasts them with some earlier approaches to representing word meaning. For a complete presentation of the theory of lexical semantics underlying 4lang the reader is referred to Kornai (2010) and Kornai (2012); Kornai et al. (2015) compare 4lang to contemporary theories of word meaning. 4lang is also the name of a manually built dictionary1 mapping 2,200 English words to concept graphs (as well as their translations in Hungarian, Polish, and Latin, hence its name). The dictionary is described in Kornai and Makrai (2013). For work on extending 4lang to include all European languages, see Ács et al. (2013). 2.1. The formalism 4lang represents the meaning of words, phrases and utterances as directed graphs whose nodes correspond to language-independent concepts and whose edges may have one of three labels: 0, 1, and 2.
(The 4lang theory represents concepts as Eilenberg-machines (Eilenberg, 1974) with three partitions, each of which may contain zero or more pointers to other machines; such a machine therefore also represents a directed graph with three types of edges. The additional capabilities offered by Eilenberg-machines have not been applied by any of the systems presented here, so it makes more sense to consider the representations under discussion plain directed graphs.) First we shall discuss the nature of 4lang concepts, represented by the nodes of the graph, and then introduce the types of relationships encoded by each of the three edge types. 2.1.1 Nodes Nodes of 4lang graphs correspond to concepts. 4lang concepts are not words, nor do they have any grammatical attributes such as part-of-speech (category), number, tense, mood, voice, etc. For example, 4lang representations make no distinction between the meaning of freeze (N), freeze (V), freezing, or frozen. Therefore, the mapping between words of some language and the language-independent set of 4lang concepts is a many-to-one relation. In particular, many concepts will be defined by a single link to another concept that is its hypernym or synonym, e.g. above →0up or grasp →0catch. Encyclopedic information is omitted, e.g. Canada, Denmark, and Egypt are all defined as country (their definitions also containing an indication that an external resource may contain more information). In general, definitions are limited to what can be considered the shared knowledge of competent speakers; e.g. the definition of water contains the information that it is a colourless, tasteless, odorless liquid, but not that it is made up of hydrogen and oxygen. The distinction between linguistic and extra-linguistic knowledge will be discussed in more detail in Section 2.4. We shall now go through the types of links used in 4lang graphs.
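Viewed as plain directed graphs, such representations can be sketched with a minimal data structure. The class below is our own illustration only (the names ConceptGraph, add_edge, and neighbours are not part of the 4lang library's API): nodes are concept names, and every edge carries one of the three labels 0, 1, or 2.

```python
# Minimal sketch of a 4lang-style concept graph; our own illustration,
# not the actual 4lang API.
from collections import defaultdict

class ConceptGraph:
    def __init__(self):
        # maps a source concept to a set of (edge_label, target) pairs
        self.edges = defaultdict(set)

    def add_edge(self, source, label, target):
        assert label in (0, 1, 2), "4lang edges are labeled 0, 1, or 2"
        self.edges[source].add((label, target))

    def neighbours(self, source, label):
        # all concepts reachable from `source` via one edge of type `label`
        return {t for (l, t) in self.edges[source] if l == label}

# water freezes / frozen water: both reduce to water -0-> freeze
g = ConceptGraph()
g.add_edge("water", 0, "freeze")
print(g.neighbours("water", 0))  # {'freeze'}
```

Because concepts carry no grammatical category, the same three-line graph update serves for attribution, hypernymy, and unary predication alike.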
2.1.2 The 0-edge The most common relation between concepts in 4lang graphs is the 0-edge, which represents attribution (dog →0friendly); the IS_A relation (hypernymy) (dog →0animal); and unary predication (dog →0bark). Since concepts do not have grammatical categories, this uniform treatment means that the same graph can be used to encode the meaning of phrases like water freezes and frozen water, both of which would be represented as water →0freeze. 2.1.3 1- and 2-edges Edges of type 1 and type 2 connect binary predicates to their arguments (e.g. cat ←1catch →2mouse). The formalism used in the 4lang dictionary explicitly marks binary (transitive) elements by using UPPERCASE printnames. The pipeline that we shall introduce in Section 3 will not make use of this distinction: any concept can have outgoing 1- and 2-edges. However, we will retain the uppercase marking for those binary elements that do not correspond to any word in a given phrase or sentence, e.g. the meaning of the sentence Penny ate Leonard’s food will be represented by the graph in Figure 1. The ten most common binaries used in 4lang are listed in Table 1, with an example for each.

Table 1: Most common binaries in the 4lang dictionary.
HAS: shirt ←1HAS →2collar
IN: letter ←1IN →2envelope
AT: move ←1AT →2way
CAUSE: humor ←1CAUSE →2laugh
INSTRUMENT: sew ←1INSTRUMENT →2needle
PART_OF: leaf ←1PART_OF →2plant
ON: smile ←1ON →2face
ER: slow ←1ER →2speed
FOLLOW: Friday ←1FOLLOW →2Thursday
MAKE: bee ←1MAKE →2honey

Fig. 1: 4lang graph with two types of binaries.

Given two concepts c1 and c2 such that c2 is a predicate that holds for c1, 4lang will allow one of two possible connections between them: c1 →0c2 if c2 is a one-place predicate and c2 →1c1 if c2 is a two-place predicate. The mutual exclusiveness of these two configurations is both counter-intuitive and impractical: two-place predicates often appear with a single argument (e.g. John is eating), and representing such a statement as John →0eat while the sentence John is eating a muffin warrants John ←1eat →2muffin would mean that we consider the relationship between John and eat dependent on whether we have established the object of his eating. We therefore choose to adopt a modified version of the 4lang representation where the 0-connection holds between a subject and predicate regardless of whether the predicate has another argument. The example graph in Figure 1 can then be revised to obtain that in Figure 22. The meaning of each 4lang concept is represented as a 4lang graph over other concepts; a typical definition in the 4lang dictionary can be seen in Figure 3. This graph captures the facts that birds are vertebrates, that they lay eggs, and that they have feathers and wings.
The generic applicability of the 4lang relations introduced in Section 2.1 has the consequence that to create, understand, and manipulate 4lang representations one need not make the traditional distinction between entities, properties, and events. The relationships dog →0bark and dog →0faithful can be treated in a uniform fashion when making inferences based on the definitions of each concept, e.g. that dog ←1MAKE →2sound or that calling another person a dog is insulting. In other words, all semantic properties are inherited by default via paths of 0-edges.

Fig. 2: Revised 4lang graph with two types of binaries.

Fig. 3: 4lang definition of bird.

2.2. Ambiguity and compositionality 4lang does not allow for multiple senses when representing word meaning: all occurrences of the same word form – with the exception of true homonyms like trunk ‘the very long nose of an elephant’ and trunk ‘the part at the back of a car where you can put bags, tools etc.’3,4 – must be mapped to the same concept, whose definition in turn must be generic enough to allow for all possible uses of the word (Ruhl, 1989). As Jakobson reportedly notes, such a monosemic approach might define the word bachelor as ‘unfulfilled in typical male role’ (Fillmore, 1977) – to account for all senses of the word including ‘man who has never married’, ‘has the first or lowest academic degree’ and ‘young fur seal when without a mate during the breeding time’ (Katz & Fodor, 1963, p.186).
While such definitions place a great burden on the process responsible for combining the meaning of words to create representations of phrases and utterances (see Section 3), the approach also has the potential to model the flexibility and creativity of language use: “we note here a significant advantage of the monosemic approach, namely that it makes interesting predictions about novel usage, while the predictions of the polysemic approach border on the trivial. To stay with the example, it is possible to envision novel usage of bachelor to denote a contestant in a game who wins by default (because no opponent could be found in the same weight class or the opponent was a no-show). The polysemic theory would predict that not just seals but maybe also penguins without a mate may be termed bachelor – true but not very revealing” (Kornai, 2010, p.182). One typical consequence of this approach is that 4lang definitions will not distinguish between bachelor and some concept w that means ‘unfulfilled male’ – both could be defined in 4lang as male, LACK. This is not a shortcoming of the representation; rather, it is in accordance with the principles underlying it: the concepts unfulfilled and male cannot be combined (e.g. to create a representation describing an unfulfilled male) without making reference to some nodes of the graph representing the meaning of male; if something is a ‘typical male role’, this should be indicated in the definition graph of male – if only by inbound pointers – and without any such information, unfulfilled male cannot be interpreted at all. This does not mean that male cannot be defined without listing all stereotypes associated with the concept.
However, if the piece of information that ‘being with a mate at breeding time’ is a typical male role – which is necessary to account for the interpretation of bachelor as ‘young fur seal when without a mate at breeding time’ – is to be accessed by some inference mechanism, then it must be present in the form of some subgraph containing the nodes seal, mate, male, and possibly others. Then, a 4lang-based natural language understanding system that is presented with the word bachelor in the context of mating seals for the first time, may explore the neighborhood of these nodes until it finds this piece of information as the only one that makes sense of this novel use of bachelor. Note that this is a model of novel language use in general. Humans produce and understand without much difficulty novel phrases that most theories would label ‘semantically anomalous’. In particular, all language use that is commonly labeled metaphoric involves accessing a lexical element for the purpose of activating some of its meaning components, while ignoring others completely. It is this use of language that 4lang wishes to model, as it is most typical of everyday communication (Richards, 1937; Hobbs, 1990)5. Another 4lang principle that ensures metaphoric interpretation is that any link in a 4lang definition can be overridden. In fact, the only type of negation used in 4lang definitions (LACK) carries the potential to override elements that might otherwise be activated when definitions are expanded: e.g. the definition of penguin, which undoubtedly contains →0bird, may also contain ←1LACK →2fly to block inference based on bird →0fly. That any element can freely be overridden ensures that novel language use does not necessarily cause contradiction. “[T]o handle ‘the ship plowed through the sea’, one lifts the restriction on ‘plow’ that the medium be earth and keeps the property that the motion is in a substantially straight line through some medium” (Hobbs, 1990, p.55). 
Since a 4lang definition of plow must contain some version of →2earth, there must be a mechanism that allows it to be overridden without making inferences such as sea →0earth6. 2.3. Reasoning The 4lang principles summarized so far place a considerable burden on the inferencing mechanism. Given the possibility of defining all concepts using only a small set of primitives, and a formalism that strictly limits the variety of connections between concepts, we claim to have laid the groundwork for a semantic engine with the chance of understanding creative language use. Since no generic reasoning has yet been implemented in 4lang – although we present early attempts in Section 4.3 – we shall now simply outline what we believe could be the main mechanisms of such a system. The simplest kind of lexical inference in 4lang graphs is performed by following paths of 0-edges from some concept to determine the relationships in which it takes part. The concept mammal is defined in 4lang as an animal that has fur and milk (see Figure 4), from which one can conclude that the relations ←1HAS →2milk and ←1HAS →2fur also hold for all concepts whose definition includes →0mammal (we shall assume that this simple inference can be made when we construct 4lang definitions from dictionary definitions in Section 4). Similar inferences can be made after expanding definitions, i.e. connecting all concept nodes to their own definition graphs (see Section 4.3 for details). If the definition of giraffe contains →0mammal, to which we add edges ←1HAS →2fur and ←1HAS →2milk, this expanded graph will allow us to infer the relations giraffe ←1HAS →2fur and giraffe ←1HAS →2milk. As mentioned in the previous section, this process requires that relations present explicitly in a definition override those obtained by inference: penguins are birds and yet they cannot fly, humans are mammals without fur, etc.
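This default inheritance along 0-edge paths, together with LACK overrides, can be sketched in a few lines. Both the encoding (relations as (label, target) pairs, with binaries abbreviated to a single HAS or LACK label) and the toy definitions are our own illustration, not the actual 4lang lexicon or reasoner.

```python
# Toy inheritance along 0-edge paths with LACK overrides; our own sketch,
# not the 4lang reasoner. Binaries are abbreviated to one label each.
definitions = {
    "giraffe": {(0, "mammal")},
    "mammal":  {(0, "animal"), ("HAS", "fur"), ("HAS", "milk")},
    "penguin": {(0, "bird"), ("LACK", "fly")},
    "bird":    {(0, "vertebrate"), (0, "fly"), ("HAS", "feather")},
}

def inherited(concept):
    """Relations reachable via 0-edge paths, honouring the concept's own LACK links."""
    blocked = {t for (r, t) in definitions.get(concept, set()) if r == "LACK"}
    seen, stack, relations = set(), [concept], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        for rel, target in definitions.get(c, set()):
            if rel == 0 and target not in blocked:
                stack.append(target)          # follow the IS_A / 0-path
            if rel != "LACK" and target not in blocked:
                relations.add((rel, target))  # inherit unless overridden
    return relations

print(("HAS", "milk") in inherited("giraffe"))  # True
print((0, "fly") in inherited("penguin"))       # False: blocked by LACK
```

Overrides are read only from the queried concept's own definition, mirroring the requirement that explicit relations take precedence over inferred ones.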
Fig. 4: 4lang definition of mammal.

A more complicated procedure is necessary to detect connections between nodes of an expanded definition and nodes connected to the original concept. According to Quillian’s account of his Teachable Language Comprehender (Quillian, 1969), the phrase lawyer’s client triggers an iterative search process that will eventually find lawyer to be compatible with the employer property of client, since both are professionals. A similar process can be implemented for 4lang graphs; consider the definition graphs for lawyer and client in Figures 5 and 6, built automatically from definitions in the Longman dictionary, as described in Section 4, then pruned manually. (These graphs, being the output of the dict_to_4lang system and not manual annotation, have numerous issues: the word people in the Longman dictionary definition of lawyer was not mapped to person, nor have the words advice and advise been mapped to the same concept.) After correcting these errors manually, nodes with identical names in the graph for lawyer’s client (Figure 7) can form the starting point of the inference process. Let us now go over the various steps of inference necessary to reduce this graph to the most informative representation of lawyer’s client. Note that we do not wish to impose any logical order on these steps; they should rather be the ‘winners’ of a process that considers many transformations in parallel and ends up keeping only some of them.

Fig. 5: Definition graph for lawyer.

Fig. 6: Definition graph for client.

Fig. 7: Corrected graph for lawyer’s client.
We should be able to realize that the person who is advised (and represented) by the lawyer can be the same as the client who gets advice from the lawyer. To this end we must be able to make the inference that X ←1get →2advice and advice →2X are synonymous. We believe a 4lang-based system should be able to make such an inference in at least one of two independent ways. First, we would like to be able to accommodate constructions in the 4lang system (see also Section 6.3); in this case one that explicitly pairs the above two configurations for some surface forms but not for others. Secondly, since we cannot expect to have all possibilities listed, we should also be able to establish for any concept Y the hypothesis Y →2X in the presence of X ←1get →2Y, to be confirmed or disproved at some later step. We should also consider unifying the person node in person ←1from →2advice with lawyer in advice →1lawyer, which would once again require either some construction stating that when someone advises, the advice is from her, or a generic rule that can guess the same connection. Given these inferences, the two advice nodes can also be merged as likely referring to the same action, resulting in the final graph in Figure 8. The nodes organization, company, and service have been omitted from the figure to improve readability.

Fig. 8: Inferred graph for lawyer’s client.

2.4. Extra-linguistic knowledge The same 4lang graph might represent the meaning of some utterance, a piece of world knowledge, or could be the output of some inference mechanism whose input can be any combination of linguistic or extra-linguistic knowledge. In fact, neither the 4lang formalism nor the mechanisms we propose for reasoning based on 4lang representations require a distinction between linguistic and extra-linguistic information.
Returning to one of the simplest examples above, where bird →0fly is overridden to accommodate both penguin ←1LACK →2 fly and penguin →0bird, we need not decide whether the particular piece of information that penguins cannot fly is part of the meaning of penguin. Clearly it is possible for one to learn of the existence of penguins and that they are a type of bird without realizing that they cannot fly, and this person could easily make the (incorrect) inference that they can, yet we would not like to claim that this person does not know what the word penguin means. Some components of word meaning, on the other hand, appear to be essential to the understanding of a particular concept, e.g. if a learner of English believes that nephew refers to the child of one’s sibling, male or female – perhaps because in her native language a single word stands for both nephews and nieces, and because she has heard no contradicting examples –, we say that she does not know the meaning of the word; nephew →0male appears to be somehow more internal to the concept nephew than penguin ←1LACK →2fly is to penguin. While this distinction is commonly made in semantics, we believe that in everyday discourse it is neither well-defined, nor does it play an important role in predicting language use and common-sense reasoning. Carrying a conversation successfully only requires that the participants’ representations of word meaning do not contradict each other in a way relevant to the conversation at hand7. Static lexical resources such as the Longman Dictionary of Contemporary English (LDOCE) or the 4lang concept dictionary must make decisions about which pieces of information to include, and may do so based on some notion of how ‘technical’ or ‘commonplace’ they are, but this distinction is not necessary for modeling language use in general. 
A person’s ignorance of the fact that somebody’s nephew is necessarily male is probably itself the result of one or several conversations about nephews that somehow remained consistent despite their incomplete knowledge about how the word is typically used. The uniform representation of linguistic and extra-linguistic knowledge should also allow us to extend 4lang representations arbitrarily using non-linguistic sources of world knowledge; an example is discussed in Section 6.5.2. 2.5. Primitives of representation In the following two sections we present methods for 1) building 4lang representations from raw text and 2) building 4lang definition graphs for virtually all words based on monolingual dictionaries. Given these two applications, any text can be mapped to 4lang graphs and nodes of any graph can be expanded to include their 4lang definitions. Performing this expansion iteratively, all representations can be traced back to a small set of concepts; if the Longman Dictionary is used to build definition graphs, the concepts listed in the 4lang dictionary will suffice to cover all of them, since it contains all words of the Longman Defining Vocabulary (LDV), the set of all words used in definitions of the Longman Dictionary (Boguraev & Briscoe, 1989). The set of concepts necessary to define all others can be further reduced: it has been shown by Kornai et al. (2015) that as few as 129 4lang concepts are enough to define all others in the 4lang dictionary, and thus, via monolingual dictionaries, practically all words in the English language. 2.6. Theoretical significance This section provided a brief summary of the main principles behind the 4lang system for representing the meaning of linguistic structures. Before we proceed to present a set of tools for building and manipulating 4lang representations, let us point out some of the most important characteristics of 4lang representations that make it our formalism of choice.
No categories: 4lang does not differentiate between concepts denoting actions, entities, attributes, etc.; there are no categories of concepts equivalent to part-of-speech categories of words. This ensures, among other things, that words with a shared root are mapped to the same concept, and that ultimately utterances with the same information content can be mapped to identical 4lang representations (although neither of these is strictly required).

No polysemy: 4lang will only accommodate multiple senses of a word as a last resort. Distant but related uses of the same word must be interpreted via the same generic concept. This virtually eliminates the difficulty of word sense disambiguation.

Requires powerful inference: The above principles require a mechanism for deriving all uses of a word from minimalistic definitions. Such a mechanism may stand a real chance at handling the creative language use typical of everyday human communication (and responsible for polysemy in the first place). Such inference may be achieved using spreading activation over nodes of 4lang graphs, as has been shown by Nemeskey et al. (2013).

No failure of interpretation: No combinations of concepts and connections between them are forbidden by the formalism itself. Inference may judge certain states-of-affairs unlikely or even impossible, but the formalism will not fail the interpretation process.

3. From text to concept graph In this section we present our work on combining word representations like those described in Section 2 to create graphs that encode the meaning of phrases. We shall defer the task of syntactic parsing to the state-of-the-art Stanford Parser (DeMarneffe et al., 2006; Socher et al., 2013): the pipeline presented in this section processes sets of dependency triplets emitted by the Stanford Parser to create 4lang-style graphs of concepts (our future plans to incorporate syntactic parsing in 4lang are outlined in Section 6.3).
This section is structured as follows: dependency parsing is briefly introduced in Section 3.1, and the central dep_to_4lang module, which maps dependencies to 4lang graphs, is presented in Section 3.2. Major issues are discussed in Section 3.3, some solutions are presented in Section 3.4, and manual evaluation of the text_to_4lang system is provided in Section 3.5. Besides the ability to map chunks of running text to semantic representations, text_to_4lang will see another application that is crucial to the system described in this paper: we process definitions of monolingual dictionaries to acquire word representations for lexical items that are not covered by 4lang. The resulting module dict_to_4lang will be presented in Section 4. 3.1. Dependency parsing Our present work is not concerned with the well-known problem of analyzing the syntactic structure of natural language text. Instead, we use a robust, state-of-the-art tool, the Stanford Parser, to obtain dependency relations that hold between pairs of words in an English sentence. Unlike dependency parsers that have been trained on manually annotated dependency treebanks, the Stanford Parser discovers relations by matching templates against its parse of a sentence’s constituent structure (DeMarneffe et al., 2006). This approach is more robust, since phrase structure parsers, and in particular the PCFG parser in the Stanford toolkit (Klein & Manning, 2003), are trained on much larger datasets than what is available to standard dependency parsers. The Stanford Dependency Parser is also capable of returning collapsed dependencies, which explicitly encode relations between two words that are expressed in the sentence by a function word such as a preposition or conjunction. E.g. for the sentence I saw the man who loves you, the standard dependency parse contains the relation nsubj(loves, who) but not nsubj(loves, man), even though man is clearly the subject of loves.
Collapsed dependency parses contain these implicitly present dependencies and are therefore more useful for extracting the semantic relationships between words in the sentence. Furthermore, the Stanford Parser can postprocess conjunct dependencies: in the sentence Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas, the NP Bills on ports and immigration will at first be parsed into the relations prep_on(Bills, ports) and cc_and(ports, immigration), then matched against a rule that adds the relation prep_on(Bills, immigration). For our purposes we enable both types of postprocessing and use the resulting set of relations (or triplets) as input to the dep_to_4lang module, which uses them to build 4lang graphs and will be introduced in Section 3.2. The list of dependency relations extracted from a sentence is clearly not intended as a representation of meaning. However, it will prove sufficient to construct good quality semantic representations because of the nature of 4lang relations: for sentences and phrases such as Mary loves John or queen of France, 4lang representations are as simple as Mary ←1love →2John and France ←1HAS →2queen which can be straightforwardly constructed from the dependency relations nsubj(love, Mary), dobj(love, John), and prep_of(queen, France). Any further details that one may demand of a semantic representation, e.g. that John is an experiencer or that France does not physically possess the queen, will be inferred from the 4lang definitions of the concepts love and queen, in the latter case probably also accessing the definitions of rule or country. 3.2. From dependencies to graphs To construct 4lang graphs using dependency relations in the parser’s output, we created manually a mapping from relations to 4lang subgraphs, assigning to each dependency one of nine possible configurations. 
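The shape of this mapping can be illustrated with a short Python sketch. Everything below is hypothetical: the triplet format, the edge representation, and the handful of rules shown are a small illustrative subset of the full Table 2 mapping (in particular, the extra reverse 0-edge that the table assigns to subject relations is omitted), not the actual dep_to_4lang code.

```python
# Illustrative sketch of the dependency-to-4lang mapping (NOT the real
# dep_to_4lang code). A dependency triplet (rel, w1, w2) is rewritten into
# labeled 4lang edges, stored as (source, label, target) tuples.

DEP_TO_EDGES = {
    "amod":   [("w1", 0, "w2")],   # w1 --0--> w2
    "advmod": [("w1", 0, "w2")],
    "nsubj":  [("w1", 1, "w2")],   # e.g. love --1--> Mary
    "dobj":   [("w1", 2, "w2")],   # e.g. love --2--> John
    # some relations introduce an extra relation node, e.g. poss/prep_of -> HAS
    "poss":    [("HAS", 1, "w2"), ("HAS", 2, "w1")],
    "prep_of": [("HAS", 1, "w2"), ("HAS", 2, "w1")],
}

def dep_to_edges(rel, w1, w2):
    """Instantiate the edge templates for a single dependency triplet."""
    if rel in DEP_TO_EDGES:
        templates = DEP_TO_EDGES[rel]
    elif rel.startswith("prep_"):
        # remaining prep_P relations: w1 <-1- P -2-> w2
        prep = rel[len("prep_"):].upper()
        templates = [(prep, 1, "w1"), (prep, 2, "w2")]
    else:
        return []  # unhandled relations are ignored in this sketch
    slots = {"w1": w1, "w2": w2}
    return [(slots.get(s, s), label, slots.get(t, t))
            for s, label, t in templates]

# "Mary loves John" -> Mary <-1- love -2-> John
edges = (dep_to_edges("nsubj", "love", "Mary")
         + dep_to_edges("dobj", "love", "John"))
```

Running this on the triplets for queen of France, for instance, instantiates the HAS template and yields the France ←1 HAS →2 queen configuration described in the text.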
Additionally, all remaining relations of the form prep_* and prepc_* are mapped to binary subgraphs containing a node corresponding to the given preposition. To map words to 4lang concepts, we first lemmatize them using the hunmorph morphological analyzer (Trón et al., 2005) and the morphdb.en database. Graph edges for each dependency are added between the nodes corresponding to the lemmas returned by hunmorph. The full mapping from dependencies to 4lang subgraphs is presented in Table 2. Figures 9 and 10 provide examples of how 4lang subgraphs correspond to dependency triplets. For a detailed description of each dependency relation the reader is referred to (De Marneffe & Manning, 2008).

Table 2: Mapping from dependency relations to 4lang subgraphs.

Dependency | Edge
amod, advmod, npadvmod, acomp, dep, num, prt | w1 →0 w2
nsubj, csubj, xsubj, agent | w1 ⇌01 w2
dobj, pobj, nsubjpass, csubjpass, pcomp, xcomp | w1 →2 w2
appos | w1 ⇌00 w2
poss, prep_of | w2 ←1 HAS →2 w1
tmod | w1 ←1 AT →2 w2
prep_with | w1 ←1 INSTRUMENT →2 w2
prep_without | w1 ←1 LACK →2 w2
prep_P | w1 ←1 P →2 w2

Table 3: Basic figures for each dataset.

Dict | headwords | av. def. length | approx. vocab. size
LDOCE | 30,126 | 11.6 | 9,000
Collins | 82,026 | 13.9 | 31,000
en.wikt | 128,003 | 8.4 | 38,000

Table 4: Graphs built from each dataset.

Dict | # graphs | av. nodes
LDOCE | 24,799 | 6.1
Collins | 45,311 | 4.9
en.wikt | 120,670 | 5.4

Fig. 9: Constructing the graph for Harry shivered in the cold night air. Fig. 10: Constructing the graph for Everyone from wizarding families talked about Quidditch constantly. 3.3. Issues 3.3.1 Parsing errors Using the Stanford Parser for dependency parsing yields high-quality output; it is, however, limited by the quality of the underlying phrase structure parser. Parsing errors constitute a major source of errors in our pipeline, occasionally resulting in dubious semantic representations that could be discarded by a system that integrates semantic analysis into the parsing process.
While our long-term plans include implementing such a process within the 4lang framework using constructions (see Section 6.3), we must currently rely on independent efforts to improve the accuracy of phrase structure grammar parsers using semantic information. Results of a pioneering effort in this direction are already included in the latest versions of the Stanford Parser (including the one used in the 4lang system): (Socher et al., 2013) improves the accuracy of the Stanford Parser by using Compositional Vector Grammars. Their model combines classic PCFG grammars with word embeddings to account for the semantic relationships between words in the text that is to be parsed and words that have occurred in the training data. For example, the sentence He ate spaghetti with a spoon can be structurally distinguished from He ate spaghetti with meatballs even if in the training phase the model has only had access to [eat [spaghetti] [with a fork]], by grasping the similarity between the words spoon and fork. This phenomenon of incorrect PP-attachment is the single most frequent source of anomalies in our output. For example, the syntactic ambiguity in the Longman definition of basement: a room or area in a building that is under the level of the ground, which has the constituent structure in Figure 11, is incorrectly assigned the structure in Figure 12, resulting in the erroneous semantic representation in Figure 13. Most such ambiguities are easily resolved by humans based on world knowledge (in this case e.g. that buildings with some underground rooms are more common than buildings that are entirely under the ground, if the latter can be called buildings at all), but it is unclear whether such inferencing is within the capabilities even of parsers using word embeddings. Fig. 11: Constituent structure of a room or area in a building that is under the level of the ground. Fig. 12: Incorrect parse tree for a room or area in a building that is under the level of the ground. Fig. 13: Incorrect definition graph for basement. 3.4. Postprocessing dependencies Some of the typical issues of the graphs constructed by the process described in Section 3.2 can be resolved by postprocessing the dependency triplets in the parser’s output before passing them to dep_to_4lang. Currently the dependency_processor module handles two configurations: coordination (Section 3.4.1) and copular sentences (Section 3.4.2). 3.4.1 Coordination One frequent class of parser errors related to PP-attachment (cf. Section 3.3.1) involves constituents modifying a coordinated phrase which are analyzed as modifying only one of the coordinated elements. E.g. in the Longman entry casualty: someone who is hurt or killed in an accident or war, the parser fails to detect that the PP in an accident or war modifies the constituent hurt or killed, not just killed. Determining which of two possible parse trees is the correct one is of course difficult – once again, casualty may as well mean ‘someone who is killed in an accident or war or someone who is hurt (in any way)’, and that such a misunderstanding is unlikely in real life is a result of inference mechanisms well beyond what we are able to model. Our simple attempt to improve the quality of the graphs built is to process all pairs of words between which a coordinating dependency holds (e.g. conj_and, conj_or, etc.) and copy all edges from each node to the other.
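This edge-copying step can be sketched as follows; the implementation below is a toy version over hypothetical (relation, w1, w2) triplets, not the actual dependency_processor code.

```python
# Toy sketch of the coordination postprocessing rule: for every pair of words
# joined by a coordinating dependency (conj_and, conj_or, ...), copy each
# member's other dependencies to its coordinated partner.

def copy_coordinated(triplets):
    pairs = [(w1, w2) for rel, w1, w2 in triplets if rel.startswith("conj_")]
    new = set(triplets)
    for a, b in pairs:
        for rel, w1, w2 in triplets:
            if rel.startswith("conj_"):
                continue
            if w1 == a:
                new.add((rel, b, w2))
            if w1 == b:
                new.add((rel, a, w2))
            if w2 == a:
                new.add((rel, w1, b))
            if w2 == b:
                new.add((rel, w1, a))
    return sorted(new)

# casualty: "someone who is hurt or killed in an accident or war" --
# the parser attaches "in an accident" to "killed" only:
triplets = [("conj_or", "hurt", "killed"),
            ("prep_in", "killed", "accident")]
result = copy_coordinated(triplets)
# the missing ("prep_in", "hurt", "accident") is now present
```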
While this could hardly be called a solution, as it may introduce dependencies incorrectly, in practice it has proved an improvement. In our current example this step enables us to obtain the missing dependencies and thus build the correct 4lang graph (see Figure 14). Fig. 14: Definition graph built from casualty: someone who is hurt or killed in an accident or war, with extra dependencies added by the postprocessor. 3.4.2 Copulars and prepositions Two further postprocessing steps involve copular constructions containing prepositional phrases. In simple sentences such as The wombat is under the table, the parser returns the pair of dependencies nsubj(is, wombat) and prep_under(is, table), which we use to generate prep_under(wombat, table). Similarly, when PPs are used to modify a noun, such as in the Longman definition of abbess: a woman who is in charge of a convent, for which the dependency parser returns, among others, the triplets rcmod(woman, is) and prep_in(is, charge), we let a simple rule add the triplet prep_in(woman, charge) (see Figure 15). In both cases we finish by removing the copular verb in order to simplify our final representation. Fig. 15: Postprocessing of the entry abbess: a woman who is in charge of a convent. 3.5. Evaluation We performed manual evaluation of the text_to_4lang module on a sample from the UMBC Webbase corpus (Han et al., 2013), a set of 3 billion English words based on a 2007 webcrawl performed as part of the Stanford Webbase project.
We used the GNU utility shuf to extract a random sample of 50 sentences, which we processed with text_to_4lang, then examined manually both the final output and the dependencies output by the Stanford Parser in order to gain a full understanding of each anomaly in the graphs created. The sentences in this corpus are quite long (22.1 words/sentence on average), therefore most graphs are affected by multiple issues; we shall now take stock of those that affected more than one sentence in our sample. Parser errors remain the single most frequent source of error in our final 4lang graphs: 16 sentences in our sample of 50 were assigned dependencies erroneously. 4 of these cases are related to PP-attachment (see Section 3.3.1). Parser errors are also virtually the only issue that causes incorrect edges to be added to the final graph – nearly all remaining issues will result in missing connections only. The second largest source of errors in this dataset is related to connectives between clauses that our pipeline does not currently process. Our sample contains 12 such examples, including 4 relative clauses and 4 pairs of clauses connected by connectives such as that, unless, etc. The output of our pipeline for these sentences typically consists of two graphs that are near-perfect representations of the two clauses, but are not connected to each other in any way – an example is shown in Figure 16. Fig. 16: 4lang graph built from the sentence The Manitoba Action Committee is concerned that the privatization of MTS will lead to rate increases. The dependency ccomp(concerned, lead) was not processed.
There are three more error classes worth mentioning: 5 graphs suffered from recall errors made by the Stanford Coreference Resolution system: in these cases connections of a single concept in the final graph are split among two or more nodes, since our pipeline failed to identify two words as referring to the same entity (Figure 17 shows an example). Another 5 sentences caused errors because of the appearance in the parser output of the dependency relation vmod, which holds between a noun and a reduced non-finite verbal modifier, which “is a participial or infinitive form of a verb heading a phrase (which may have some arguments, roughly like a VP). These are used to modify the meaning of an NP or another verb.” (DeMarneffe et al., 2006, p.10). This dependency is not processed by dep_to_4lang, since it may encode the relation between a verb and either its subject or object; e.g. the example sentences in the Stanford Dependency Manual, Truffles picked during the spring are tasty and Bill tried to shoot, demonstrating his incompetence, will result in the triplets vmod(truffles, picked) and vmod(shoot, demonstrating), but should be represented in 4lang by the edges pick →2truffles and shoot →0demonstrate, respectively. Most representations in our sample suffer from multiple errors. While a quantitative analysis of the quality of these representations is currently not possible, our manual inspection tells us that 16 of the 50 graphs in our sample are either perfect representations of the input sentence (in 4 cases) or are affected by a single minor error only and remain high-quality representations. Fig. 17: 4lang graph built from the sentence My wife and I have used Western Union very successfully for almost two years to send money to her family in Ukraine. Nodes with dashed edges should have been unified based on coreference resolution. 4. Building definition graphs By using the text_to_4lang module to process entries in monolingual dictionaries written for humans we can attempt to build definition graphs like those in 4lang for practically every word. This section presents the dict_to_4lang module, which extends the text_to_4lang pipeline with parsers for several major dictionaries (an overview of these is given in Section 4.1) as well as some preprocessing steps specific to the genre of dictionary definitions – these are presented in Section 4.2. Finally, Section 4.4 points out several remaining issues with definition graphs produced by the dict_to_4lang pipeline. Applications of dict_to_4lang, both existing and planned, shall be described in Section 5. The entire pipeline is available as part of the 4lang library. 4.1. Dictionaries of English We process three large dictionaries of English; custom parsers have been built for each and are distributed as part of the 4lang module. The Longman Dictionary of Contemporary English (Bullon, 2003) contains ca. 42,000 English headwords and its definitions are constrained to a small vocabulary, the Longman Defining Vocabulary (LDV, (Boguraev & Briscoe, 1989)). The longman_parser tool processes the XML-formatted data and extracts for each headword a list of its senses, including for each the plain-text definition, the part-of-speech tag, and the full form of the word being defined, if present: e.g. definitions of acronyms will contain the phrase that is abbreviated by the headword. No component of 4lang currently makes use of this last field (AAA will not be replaced by American Automobile Association), but this may change in the future.
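The fields extracted per headword can be pictured with a toy record; the field names and the example definition below are invented for illustration, and the real longman_parser output and XML schema differ.

```python
# Invented example of the information longman_parser extracts per headword:
# a list of senses, each with a plain-text definition, a POS tag, and
# (when present) the full form of an abbreviated headword.

entry = {
    "hw": "AAA",
    "senses": [
        {
            "definition": "an American organization for car owners",  # invented
            "pos": "noun",
            "full_form": "American Automobile Association",
        },
    ],
}

def first_definition(entry):
    """dict_to_4lang processes only the first sense of each entry."""
    senses = entry.get("senses", [])
    return senses[0]["definition"] if senses else None
```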
The Collins-COBUILD dictionary (Sinclair, 1987) contains over 84,500 headwords and its definitions use a vocabulary that is considerably larger than that of LDOCE, including a large technical vocabulary (e.g. adularia: a white or colourless glassy variety of orthoclase in the form of prismatic crystals), rare words (affricare: to rub against), and multiple orthographic forms (adsuki bean: variant spelling of adzuki bean). Since many definitions are simply pointers to other headwords, the average entry in Collins is much shorter than in LDOCE. However, given the technical nature of many entries, the vocabulary used by definitions exhibits a much larger variety: while Longman definitions – for the greatest part limited to the LDV – contain fewer than 9,000 English lemmas (not including named entities, numbers, etc.), Collins definitions use over 38,000 (figures obtained using the hunmorph analyzer and the morphdb.en database). Our third source of English definitions, the English Wiktionary at http://en.wiktionary.org, is the most comprehensive database, containing over 128,000 headwords and available via public data dumps that are updated weekly. Since wiktionaries are available for many languages using similar – although not standardized – data formats, Wiktionary has long been a resource for various NLP tasks, among them an effort to extend the 4lang dictionary to 40 languages (Ács et al., 2013). While for most languages datasets such as Longman and Collins may not be publicly available, wiktionaries currently contain over 100,000 entries for each of nearly 40 languages, and over 10,000 entries for a total of 76 languages. 4.2. Parsing definitions 4.2.1 Preprocessing Before passing dictionary entries to the parser, we match them against some simple patterns that are then deleted or changed to simplify the phrase or sentence without loss of information. A structure typical of dictionary definitions is the noun phrase with very generic meaning, e.g. something, one, a person, etc.
For example, LDOCE defines buffer as someone or something that protects one thing or person from being harmed by another. The frequency of such structures makes it worthwhile to perform a simple preprocessing step: phrases such as someone who, something that, etc. are removed from definitions in order to simplify them, thus reducing the chance of error in later steps. The above definition of buffer, for example, can be reduced to protects from being harmed, which can then be parsed to construct the definition graph protect ←1FROM →2harm. 4.2.2 Constraining the parser Since virtually all dictionary definitions of nouns are single noun phrases, we constrain the parser to only allow such analyses for the definitions of all noun headwords. This fixes many incorrect parses, for example when the defining noun phrase could also be parsed as a complete sentence, as in Figure 18. Fig. 18: Incorrect parse tree from the Stanford Parser for the definition of wavelength: the size of a radio wave used to broadcast a radio signal. 4.2.3 Building definition graphs The output of the – possibly constrained – parsing process is passed to the dep_to_4lang module introduced in Section 3. The ROOT dependency in each parse, which was ignored in the general case, is now used to identify the head of the definition, which is a hypernym of the word being defined. This allows us to connect, via a 0-edge, the node of the concept being defined to the graph built from its definition. We can perform this step safely because the vast majority of definitions contain a hypernym of the headword as their root element – exceptions will be discussed in Section 4.4.2. 4.3.
Expanding definition graphs The 4lang dictionary contains by design all words of the Longman Defining Vocabulary (LDV, (Boguraev & Briscoe, 1989)). This way, if we use dict_to_4lang to define each headword in LDOCE as a graph over nodes corresponding to words in its dictionary definition, these graphs will only contain concepts that are defined in the hand-written 4lang dictionary. To take advantage of this, we implement an expansion step in 4lang, which adds the definition of each concept to a 4lang graph by simply adjoining each definition graph to G at the node corresponding to the concept being defined. This can be stated formally as follows: Definition 1. Given the set of all concepts C, a 4lang graph G with concept nodes V(G) = {c1, c2, …, cn} ⊆ C, a set of definition graphs D, and a lexicon function L: C → D such that ∀c ∈ C: c ∈ V(L(c)), we define the expansion of G as G* = G ∪ ⋃ci∈V(G) L(ci). Hand-written definitions in the 4lang dictionary may also contain pointers to arguments of the definiendum. For example, the concept stand is defined as upright ←0 =AGT ←1 ON →2 feet, indicating that it is the agent of stand that is →0upright, etc. While detecting the thematic role of a verb’s arguments can be difficult, we handle the majority of cases correctly using a simple step after expansion: all edges containing =AGT (=PAT) nodes are moved to the machine(s) with a 1-edge (2-edge) pointing to it from the concept being defined. This allows us to create the graph in Figure 19 based on the above definition of stand. Fig. 19: Expanded graph for A man stands in the door. Nodes of the unexpanded graph are shown in grey.
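Definition 1 can be sketched with a toy representation: a 4lang graph as a set of (source, label, target) edge tuples. The lexicon and all concept data below are hypothetical, and the sketch ignores the =AGT/=PAT argument-moving step.

```python
# Toy sketch of Definition 1: expansion adjoins each node's definition graph,
# which, with graphs represented as edge sets, is just a set union.

def expand(graph, lexicon):
    """Return G* = G united with the definition graph of every node of G."""
    nodes = {w for src, _, tgt in graph for w in (src, tgt)}
    expanded = set(graph)
    for concept in nodes:
        expanded |= lexicon.get(concept, set())
    return expanded

# hypothetical one-entry lexicon: bird -0-> vertebrate, bird <-1- HAS -2-> wing
lexicon = {
    "bird": {("bird", 0, "vertebrate"),
             ("HAS", 1, "bird"),
             ("HAS", 2, "wing")},
}
g = {("penguin", 0, "bird")}
g_star = expand(g, lexicon)
```

Because adjoining happens at the shared concept node, the original edges of G are preserved and the definition edges simply attach to the existing node.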
Expansion will affect all nodes of graphs built from LDOCE; when processing generic English text using text_to_4lang we may choose to limit expansion to manually built 4lang definitions, or we can turn to dictionaries built using dict_to_4lang, allowing ourselves to add definitions to nearly all nodes. 4lang modules can be configured to select the approach most suitable for any given application. 4.4. Issues and evaluation In this section we shall describe sources of errors in our pipeline besides those caused by incorrect parser output (see Section 3.3.1). We shall also present the results of manual error analysis conducted on a small sample of graphs, in an effort both to determine the average accuracy of our output graphs and to identify the key error sources. 4.4.1 Error analysis To perform manual evaluation of the dict_to_4lang pipeline we randomly selected 50 headwords from the Longman Dictionary. In one round of evaluation we grouped the 50 definition graphs by quality, disregarding the process that created them. We found that 31 graphs were high-quality representations: 19 perfectly represented all facts present in the dictionary entry (see e.g. Figure 20) and another 12 were mostly accurate, with only minor details missing or an incorrect relation present in addition to the correct ones. Of the remaining 19 graphs, 9 still encoded several true relationships; the last 10 were essentially useless. Our sample is too small to conclude that 62% of the graphs we build are of acceptable quality, but these results are nevertheless promising. Our second round of manual inspection was directed at the entire process of building the 50 graphs and aimed to identify the source of errors.
Out of the 31 graphs that had errors at all, 8 were clearly a result of parser errors (discussed in Section 3.3.1), another 8 contained non-compositional structures that in the future may be handled by constructions (see Section 6.5.1), and 3 were connected to non-standard definitions (see Section 4.4.2). All remaining errors were caused by one-of-a-kind bugs in the pipeline, e.g. preprocessing issues, the occasional overgeneration of relations by the postprocessing of coordinated structures (see Section 3.4.1), etc. Fig. 20: Graph constructed from the definition of Zen: a kind of Buddhism from Japan that emphasizes meditation. 4.4.2 Non-standard definitions Our method for building 4lang definitions can be successful in the great majority of cases because dictionary definitions – or at least their first sentences, which are all we make use of – are rarely complex sentences; in most cases they are single phrases describing the concept denoted by the headword – a typical example would be the definition of koala: an Australian animal like a small grey bear with no tail that climbs trees and eats leaves. It is these kinds of simple definitions that are prevalent in the dictionaries we process and that are handled quite accurately by both the Stanford Parser and our mapping from dependencies to 4lang relations.
In some cases, however, definitions use full sentences to explain the meaning of a word in a more straightforward and comprehensible way, for example:

playback: the playback of a tape that you have recorded is when you play it on a machine in order to watch or listen to it
indigenous: indigenous people or things have always been in the place where they are, rather than being brought there from somewhere else
ramshackle: a ramshackle building or vehicle is in bad condition and in need of repair

These sentences will result in a higher number of dependency relations, and consequently a denser definition graph, often with erroneous edges. In the special case when the Stanford Parser’s output does not contain the ROOT relation, i.e. the parser failed to identify any of the words as the root of the sentence, we skip the entry entirely – this affects 0.76% of LDOCE entries and 0.90% of entries in en.wiktionary. 4.4.3 Word senses As discussed in Section 2.2, the 4lang theory assigns only one definition to each word form, i.e. it does not permit multiple word senses; all usage of a word must be derived from a single concept graph. Explanatory dictionaries like the ones listed in Section 4.1 provide several definitions for each word, of which we always process the first one. This decision is somewhat arbitrary, but produces good results in practice; the first definition typically describes the most common sense of the word, as in the case of tooth:

1. one of the hard white objects in your mouth that you use to bite and eat food
2. one of the sharp or pointed parts that sticks out from the edge of a comb or saw

We cannot expect to construct from this entry a generic definition such as sharp, one_of_many. Instead, to capture at a later stage that objects other than those in your mouth could be instances of tooth, we must turn to the principle that any link in a 4lang definition can be overridden (see Section 2.2).
Not only are we unable to predict the particular subset of links in the definition of tooth that will be shared across various uses of the word tooth, we shouldn’t make any such predictions: it is no more than an accident that teeth turned out to be metaphors for small, sharp objects lined up next to one another and not for e.g. small, white, cube-shaped objects. While in most cases the various senses defined for a word are metaphoric uses of the first, there remain words whose first definition is not generic enough to accommodate all others even if we assume powerful inferencing capabilities. Consider e.g. the definitions of shower from LDOCE below:

1. a piece of equipment that you stand under to wash your whole body
2. an act of washing your body while standing under a shower
3. a short period of rain or snow
4. a lot of small, light things falling or going through the air together
5. a party at which presents are given to a woman who is going to get married or have a baby
6. a group of stupid or lazy people
7. to wash your whole body while standing under a shower
8. to give someone a lot of things
9. to scatter a lot of things onto a person or place, or to be scattered in this way

A 4lang definition generic enough that one could derive at least the majority of these senses would be most similar to definition #4: showers are occurrences of many things falling, typically through the air. Understanding the word shower in the context of e.g. baby showers (#5) would remain a difficult task, requiring among other things the understanding that fall may refer to an object changing place not only physically but also in terms of ownership. In the above LDOCE entry, however, since we use the first definition to build the 4lang graph, we lose any chance of recovering any of the meanings #3-6 and #8-9.
The lexicographic principle that keeps sense #2 and sense #7 separate simply does not apply in 4lang, which does not distinguish meanings that differ in part of speech alone: the verb and the nomen actionis are simply one and the same. We further note that many of the distinctions made here would be made by overt suffixes in other languages; e.g. the Hungarian equivalents of #1 and #2 are zuhany and zuhanyozik, respectively. 5. Semantic similarity This section summarizes two successful applications of the dict_to_4lang system. A tool for measuring the similarity of English sentence pairs, introduced in (Recski & Ács, 2015), is presented in Section 5.1, while Section 5.2 documents the more recent wordsim system for measuring similarity of word pairs, which we evaluate on the popular benchmark SimLex-999, achieving significant improvement over the current state of the art (see also (Recski et al., 2016)). 5.1. Sentence similarity This section reviews a set of systems that participated in the 2015 SemEval task of measuring semantic similarity of sentence pairs, using concept graphs built by dict_to_4lang to measure the semantic similarity between words. We briefly review the STS task, then present the system architecture and our measure of word similarity based on 4lang representations. This measure is combined with word pair features derived from various word embeddings, lexical resources like WordNet, and the surface forms of words, to produce a competitive algorithm for measuring sentence similarity. 5.1.1 The STS task The Semantic Textual Similarity (STS) track of SemEval conferences requires participating systems to measure the degree of semantic similarity between pairs of sentences. Datasets used in recent years were taken from a variety of sources (news headlines, image captions, answers to questions posted in online forums, answers given by students in classroom tests, etc.).
Gold annotations were obtained by crowdsourcing (using Amazon Mechanical Turk): annotators were required to grade sentence pairs on a scale from 0 to 5, and inter-annotator agreement was calculated to ensure the high quality of annotations. 5.1.2 System architecture Our framework for measuring the semantic similarity of sentence pairs is a reimplementation of the system presented in (Han et al., 2013), which was among the top scorers in all STS tasks since 2013 (Kashyap et al., 2014; Han et al., 2015). Their architecture, Align and Penalize, involves computing an alignment score between two sentences based on some measure of word similarity. Our system extends this architecture in several ways, among them by defining a measure of semantic similarity between 4lang graphs and using it as an additional source of word similarity in several configurations. The core idea behind the Align and Penalize architecture is, given two sentences S1 and S2 and some measure of word similarity, to align each word of one sentence with some word of the other sentence so that the total similarity of word pairs is maximized. The mapping need not be one-to-one and is calculated independently for words of S1 (aligning them with words from S2) and words of S2 (aligning them with words from S1). The score of an alignment is the sum of the similarities of each word pair, normalized by sentence length; the final score assigned to a pair of sentences is the average of the alignment scores of the two directions. Multiple components are used to measure word similarity, and their output is combined using supervised learning methods. For out-of-vocabulary (OOV) words, i.e. those that are not covered by the component used for measuring word similarity, the system falls back on string similarity: the Dice and Jaccard similarities (Dice, 1945; Jaccard, 1912) over the sets of character n-grams in each word for n=1,2,3,4.
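The Align and Penalize scoring scheme described above can be sketched in a few lines. This is an illustrative reconstruction, not the actual code of (Han et al., 2013) or of our system; the function names are ours, and `word_sim` stands in for any word-similarity component (including the n-gram fallback shown here).

```python
# Hedged sketch of the Align-and-Penalize alignment score. All names here
# are illustrative; `word_sim` is any word-similarity measure.

def ngram_sim(w1, w2, n=2):
    """Jaccard similarity over character n-grams, as used for OOV pairs."""
    g1 = {w1[i:i + n] for i in range(len(w1) - n + 1)}
    g2 = {w2[i:i + n] for i in range(len(w2) - n + 1)}
    return len(g1 & g2) / len(g1 | g2) if g1 | g2 else 0.0

def align_score(s1, s2, word_sim):
    """Align each word of s1 with its most similar word in s2; return the
    sum of similarities normalized by sentence length."""
    if not s1:
        return 0.0
    total = sum(max(word_sim(w1, w2) for w2 in s2) for w1 in s1)
    return total / len(s1)

def sentence_sim(s1, s2, word_sim):
    """Final score: the average of the two directed alignment scores."""
    return (align_score(s1, s2, word_sim) + align_score(s2, s1, word_sim)) / 2
```

Note that the mapping is recomputed in each direction, so a single word of the shorter sentence may be aligned with several words of the longer one.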
5.1.3 Word similarity in 4lang The 4lang-similarity of two words is the similarity between the 4lang graphs defining them. The exact definition is based on the intuition that similar concepts will overlap in the elementary configurations they take part in: they might share a 0-neighbor, e.g. train →0 vehicle ←0 car, or they might be on the same path of 1- and 2-edges, e.g. park ←1 IN →2 town and street ←1 IN →2 town. Predicates of a concept are defined as the set of elementary configurations it takes part in: for example, based on the definition graph in Figure 3, the predicates of the concept bird (P(bird)) are {vertebrate; (HAS, feather); (HAS, wing); (MAKE, egg)}. Predicates can also be inherited via paths of 0-edges, so that (HAS, wing) is considered a predicate of all concepts for which →0 bird holds. By default, the similarity of two concepts is the Jaccard similarity of the sets of predicates of each concept:

S(w1, w2) = J(P(w1), P(w2)) = |P(w1) ∩ P(w2)| / |P(w1) ∪ P(w2)|

If the same metric computed over the sets of all nodes in the two definition graphs is larger, it is used instead; this is meant to account for small degrees of similarity such as that between casualty and army, whose definitions do not share any predicates but have the single common node war, causing their similarity to be greater than zero (see Figure 21). Fig. 21 Definitions of casualty (built from LDOCE) and army (defined in 4lang). Our submissions achieved state-of-the-art results on the 2015 STS task. One of the three systems, embedding, did not make use of 4lang, but used a word embedding built from the first 1 billion words of the English Wikipedia. Our second submission, machine, used the 4lang-based word similarity, while the hybrid submission combined the output of the first two systems. Results are presented in Table 5; our top system ranked 11th among 78 systems in 2015.
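The word-similarity measure of Section 5.1.3 can be sketched as follows. This is an illustration of the definition above, not the 4lang library's API; predicates are modelled as plain Python sets, and the bird/bat example data are ours (bat's predicate set is invented for the example).

```python
# Illustrative sketch of the 4lang word-similarity measure: Jaccard over
# predicate sets, with a fallback to Jaccard over all definition-graph nodes.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def fourlang_sim(pred1, pred2, nodes1, nodes2):
    """Take the larger of the predicate-set and node-set Jaccard scores, so
    that e.g. casualty/army, which share only the node `war` and no
    predicates, still receive a nonzero similarity."""
    return max(jaccard(pred1, pred2), jaccard(nodes1, nodes2))

# Predicates of bird from Figure 3, and a hypothetical bat for comparison:
p_bird = {"vertebrate", ("HAS", "feather"), ("HAS", "wing"), ("MAKE", "egg")}
p_bat = {"mammal", ("HAS", "wing"), ("MAKE", "egg")}
```

With these toy sets, bird and bat share two of five distinct predicates, giving a similarity of 0.4.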
Table 5 Performance of our team's systems on STS 2015.

                  embedding  machine  hybrid
answers-forums    0.704      0.698    0.723
answers-students  0.700      0.746    0.751
belief            0.733      0.736    0.747
headlines         0.769      0.805    0.804
images            0.804      0.841    0.844
mean Pearson      0.748      0.777    0.784

5.2. Word Similarity The experiments described in Section 5.1 provided many insights about the potential of 4lang representations to model the semantic relatedness of concepts. This section describes more recent efforts at measuring the semantic similarity of word pairs, resulting in the hybrid wordsim system. The word similarity task has been a standard method for evaluating distributional models of semantics, with some models trained explicitly for this task. The wordsim system implements supervised learning over features from multiple models (including both word embeddings and 4lang representations). Models were evaluated on the standard SimLex-999 dataset13; we introduce the dataset and summarize previous results in Section 5.3. Section 5.4 lists the features defined by wordsim over pairs of 4lang definition graphs, and results are presented in Section 5.5. The wordsim library is available under an MIT license from http://www.github.com/recski/wordsim; the contents of this section are presented in greater detail in (Recski et al., 2016). 5.3.
Previous work (Hill et al., 2015) recently proposed the SimLex-999 dataset as a benchmark for systems measuring word similarity. They argue that earlier gold standards measure association, not similarity, of word pairs; e.g. the words cup and coffee receive a high score from annotators in the widely used wordsim353 data (Finkelstein et al., 2002). Hill et al. note that "[a]ssociation and similarity are neither mutually exclusive nor independent" (2015, p.668). Instead of being given a definition of the above distinction, annotators of the SimLex dataset were simply shown a small set of examples and counter-examples. Since its publication in 2015, dozens of models have used the SimLex dataset for evaluation; some of these are listed on the SimLex webpage14. Systems for measuring word similarity are compared on the SimLex dataset by measuring the Spearman correlation between the scores assigned to word pairs by each system and the average of the scores given by human annotators. Word embeddings are evaluated by several authors by treating the cosine similarity of a pair of word vectors as the similarity score assigned by that embedding to the pair of words. (Hill et al., 2015) report a correlation of 0.41 for an embedding trained on Wikipedia using word2vec (Mikolov et al., 2013); (Schwartz et al., 2015) achieve a score of 0.56 using a combination of a standard word2vec-based embedding and the SP model, which encodes the cooccurrence of words in symmetric patterns such as X and Y or X as well as Y. (Banjade et al., 2015) document a set of experiments on the contribution of various models to the task of measuring word similarity. Half a dozen distributional models are combined with simple WordNet-based features indicating whether word pairs are synonymous or antonymous, and with the word similarity algorithm of (Han et al., 2013), which we briefly introduced in Section 5.1.2, and which itself uses WordNet-based features for boosting.
By generating features using each of these resources and evaluating ML models trained on 11 different subsets of the 10 feature classes, (Banjade et al., 2015) conclude that top performance is achieved when all of them are included. This system achieved a Spearman correlation of 0.64, a considerable improvement over the performance of any individual model. The highest score on SimLex that we are aware of (other than that of our own system) is achieved using the Paragram embedding (Wieting et al., 2015), a set of vectors obtained by training pre-existing embeddings on word pairs from the Paraphrase Database (Ganitkevitch et al., 2013). Their top correlation of 0.69 is measured when using a 300-dimension embedding created from GloVe vectors trained on 840 billion tokens. Hyperparameters of this embedding were tuned for maximum performance on SimLex; another version, tuned for the WS-353 dataset, achieves a correlation of 0.67. 5.4. 4lang-based features Based on insights gained from developing a 4lang-based similarity measure for the 2015 STS system (see Section 5.1 for details), we have defined multiple features over pairs of 4lang graphs which we predicted would correlate with word similarity. In defining these features we rely on the definition of predicates introduced in Section 5.1.3. Two real-valued features correspond to the main components of our earlier, rule-based measure: the Jaccard similarities of the sets of predicates and nodes in the definition graphs. Additionally, we introduce three binary features: the links_contain feature is true iff either concept is contained in a predicate of the other, nodes_contain holds iff either concept is included in the other's definition graph, and 0_connected is true iff the two nodes are connected by a path of 0-edges in either definition graph. All 4lang-based features are listed in Table 6. Table 6 4lang similarity features.
feature          definition
links_jaccard    J(P(w1), P(w2))
nodes_jaccard    J(N(w1), N(w2))
links_contain    1 if w1 ∈ P(w2) or w2 ∈ P(w1), 0 otherwise
nodes_contain    1 if w1 ∈ N(w2) or w2 ∈ N(w1), 0 otherwise
0_connected      1 if w1 and w2 are on a path of 0-edges, 0 otherwise

Since these features are not sensitive to the 4lang nodes LACK, representing negation (dumb →0 intelligent →0 LACK), and BEFORE, which indicates that something was only true in the past (forget →0 know →0 BEFORE), pairs of antonyms in SimLex were regularly assigned high similarity scores. A further binary feature, is_antonym, was therefore implemented, true iff one word is within the scope of, i.e. 0-connected to, an instance of either LACK or BEFORE in the other word's definition graph. A system trained on 4lang-based features only achieves a Pearson correlation of 0.38 on the SimLex data, which is competitive with some word embeddings but significantly below the 0.58-0.68 range of the state-of-the-art systems cited in Section 5.3. After measuring the individual contribution of each type of 4lang feature to the performance of purely vector-based configurations, only two features, 0_connected and is_antonym, were kept.
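The features of Table 6, plus is_antonym, can be sketched as a single feature-extraction function. This is an illustration, not the wordsim implementation: the `P`/`N` mappings and the `zero_connected`/`in_neg_scope` predicates are assumed to be supplied by the graph machinery, and the toy train/car data below are ours.

```python
# Hedged sketch of the 4lang feature set over a word pair (Table 6 plus
# is_antonym). P maps a word to its predicate set, N to the node set of its
# definition graph; zero_connected and in_neg_scope stand in for graph tests.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def fourlang_features(w1, w2, P, N, zero_connected, in_neg_scope):
    return {
        "links_jaccard": jaccard(P[w1], P[w2]),
        "nodes_jaccard": jaccard(N[w1], N[w2]),
        "links_contain": int(w1 in P[w2] or w2 in P[w1]),
        "nodes_contain": int(w1 in N[w2] or w2 in N[w1]),
        "0_connected": int(zero_connected(w1, w2)),
        # true iff either word is 0-connected to LACK/BEFORE in the other's graph
        "is_antonym": int(in_neg_scope(w1, w2) or in_neg_scope(w2, w1)),
    }

# Toy example: train and car share the predicate `vehicle`.
P = {"train": {"vehicle"}, "car": {"vehicle"}}
N = {"train": {"train", "vehicle"}, "car": {"car", "vehicle"}}
features = fourlang_features("train", "car", P, N,
                             zero_connected=lambda a, b: False,
                             in_neg_scope=lambda a, b: False)
```

Such a feature dictionary per word pair is what the supervised learner of wordsim would consume alongside embedding-based features.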
Adding these two features to the vector-based system brought the correlation to 0.75, and a model using both 4lang and WordNet features achieved the top score of 0.76. 5.5. Results Table 7 presents correlation figures for the major configurations of wordsim. Features extracted from pairs of graphs built by dict_to_4lang improve the top system by a significant margin, narrowing considerably the gap between other configurations and the 0.78 average performance of annotators when measured against the average of all other annotators' scores (Hill et al., 2015). This improvement also appears to be more significant than that achieved via a set of WordNet features encoding basic lexical relations such as synonymy and hypernymy. Table 7 Performance of major configurations on SimLex.

System                      Spearman's ρ
embeddings                  0.72
embeddings+wordnet          0.73
embeddings+4lang            0.75
embeddings+wordnet+4lang    0.76

6. Outlook We have presented a system for building 4lang-style concept definitions for practically all words of English, and for using them in creating 4lang representations of the meaning of any utterance. We have also reviewed a pair of experiments in measuring semantic similarity by combining features derived from 4lang representations with standard distributional models of meaning. This section outlines our future plans for using 4lang representations to solve some of the most challenging tasks in computational semantics.
We shall briefly discuss the tasks of measuring sentence similarity and entailment (Section 6.1), question answering (Section 6.2), and semantics-based parsing (Section 6.3), arguing that each of these should be approached via the single generic task of determining the likelihood of some 4lang representation based on models trained on other 4lang graphs relevant to the task at hand (the context). Some preliminary ideas for such a component are presented in Section 6.4. Finally, Section 6.5 discusses ways to exploit existing sources of both linguistic and extra-linguistic knowledge in the 4lang system by converting them to 4lang representations. 6.1. Sentence similarity and entailment In Sections 5.1 and 5.2 we introduced measures of semantic similarity between words based on their 4lang definitions, which helped achieve state-of-the-art performance on the tasks of measuring word and sentence similarity. Most top STS systems reduce the task of measuring textual similarity to that of word similarity, and lexical resources such as WordNet and surface features such as character-based similarity play an important role in most approaches; our current systems are no exception. We believe that the task of directly quantifying the similarity of two meaning representations amounts to detecting entailment between parts of such representations. The nature of the similarity scale (e.g. what it means for two sentences to be 70% similar) is unclear, but it can be assumed that (i) if two sentences S1 and S2 are perfectly similar (i.e. mean exactly the same thing), then each of them must entail the other, and (ii) if S1 and S2 are similar to some extent then there must exist some substructures of the meanings of S1 and S2 such that these substructures are perfectly similar, i.e. entail each other.
The connection between the STS and RTE tasks has recently been made by (Vo & Popescu, 2016), who present a corpus annotated for both semantic relatedness and entailment, measure the correlation between the two sets of scores, and propose a joint architecture for performing the two tasks simultaneously. The nature of these substructures is less obvious. A straightforward approach is to consider subgraphs of 4lang representations and assume that the similarity of two representations is connected to the intersection of the graphs (i.e. the intersection of the sets of edges over the intersection of the sets of nodes). For example, the sentences John walks and John runs, when interpreted in 4lang and properly expanded, will map to graphs that share the subgraph John ⇌10 move ←1 INSTRUMENT →2 foot. Other common configurations between graphs can also warrant similarity, e.g. John walks with a stick and John fights with a stick both map to John ⇌10 X ←1 INSTRUMENT →2 stick for some X. If our notion of similarity could refer to shared subgraphs only, no connection could be made between John and stick, and these sentences could not be judged more similar to each other than to virtually any sentence about John or about a stick being an instrument. Thus it appears that such common templates, i.e. graphs with some unspecified nodes, must play a role in determining the similarity of two 4lang graphs. The number of such templates matching a given graph grows exponentially with the number of nodes, but we can expect the relevant templates to be of limited size, and a search for common templates in two graphs seems feasible15. If similarity can be defined in terms of common substructures of 4lang graphs, a definition of entailment can follow that takes into account the substructures in one graph that are also present in the other.
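The idea of templates with unspecified nodes can be made concrete with a toy matcher. This is our illustration, not the 4lang implementation: graphs are encoded as sets of (source, label, target) triples, which flattens the ⇌10 notation, and template variables are written "?x".

```python
# Toy sketch of template matching over 4lang-style edge sets. A graph is a
# set of (source, label, target) triples; nodes starting with "?" are
# template variables. Brute-force, for illustration only.

from itertools import permutations

def matches(template, graph):
    """True iff some assignment of graph nodes to template variables turns
    every template edge into an edge of the graph."""
    tvars = sorted({n for e in template for n in (e[0], e[2])
                    if n.startswith("?")})
    nodes = {n for e in graph for n in (e[0], e[2])}
    for assign in permutations(nodes, len(tvars)):
        sub = dict(zip(tvars, assign))
        s = lambda n: sub.get(n, n)
        if all((s(a), lab, s(b)) in graph for a, lab, b in template):
            return True
    return False

# John walks with a stick / John fights with a stick, flattened to triples:
walks_stick = {("walk", "1", "John"),
               ("INSTRUMENT", "1", "walk"),
               ("INSTRUMENT", "2", "stick")}
fights_stick = {("fight", "1", "John"),
                ("INSTRUMENT", "1", "fight"),
                ("INSTRUMENT", "2", "stick")}
# the shared template: John ⇌10 X ←1 INSTRUMENT →2 stick
shared_template = {("?x", "1", "John"),
                   ("INSTRUMENT", "1", "?x"),
                   ("INSTRUMENT", "2", "stick")}
```

The brute-force search over node assignments is exponential in the number of variables, which is why limiting templates to a few nodes, as argued above, matters.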
Simply put, John walks entails John moves because the representation of the latter, John ⇌10 move, is contained in that of the former, but entailment does not hold the other way round, because many edges of John walks are left uncovered by John moves, e.g. those in move ←1 INSTRUMENT →2 foot. Since this asymmetric relationship between graphs – the ratio of templates in one that are present in the other – is also gradual in nature, it is more intuitive to think of it as the extent to which one utterance supports the other (the term entailment is typically used as a strictly binary concept). While John moves may not entail John walks, it nevertheless supports it to a greater extent than e.g. John sings. How similarity and support between 4lang graphs should be measured exactly cannot be worked out without considerable experimenting (we are trying to approximate human judgment, as in the case of the STS task in Section 5.1); what we have argued here is that 4lang representations are powerful and expressive enough that the semantic relatedness of utterances can be measured through them effectively. 6.2. Question Answering In the previous section we discussed the task of measuring the extent to which one utterance supports another – a relationship that differs from entailment in being gradual. A workable measure of support can take part in question answering: it can be used to rank answer candidates in order to find those that are supported to the highest degree by a given context. There remains the task of finding candidates that are relevant answers to the question asked. The text_to_4lang pipeline offers no special treatment for questions. A wh-question such as Who won the 2014 World Cup is handled by all components in the same way as an indicative, creating e.g. the edges who ←1 win →2 cup. Yes-no questions are simply not detected as such: Did Germany win the 2014 World Cup and Germany won the 2014 World Cup will map to the same 4lang graph.
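The asymmetric support relation, and its proposed use for ranking answer candidates, can be sketched as follows. This is a hedged illustration under our own simplifications: graphs are flat sets of (source, label, target) triples, support is approximated as an edge-coverage ratio rather than a ratio over templates, and all names are ours.

```python
# Toy sketch of "support": the fraction of one graph's edges covered by
# another, and candidate ranking by support from a context graph.

def support(hypothesis, evidence):
    """Extent to which `evidence` supports `hypothesis` (edge coverage)."""
    if not hypothesis:
        return 1.0
    return len(hypothesis & evidence) / len(hypothesis)

def rank_candidates(candidates, context):
    """Order candidate answer graphs by how strongly the context supports them."""
    return sorted(candidates, key=lambda g: support(g, context), reverse=True)

# John walks (expanded) vs. John moves, as in the entailment example above:
john_walks = {("move", "1", "John"),
              ("INSTRUMENT", "1", "move"),
              ("INSTRUMENT", "2", "foot")}
john_moves = {("move", "1", "John")}
```

Here support(john_moves, john_walks) is 1 (entailment), while support(john_walks, john_moves) is only 1/3: John moves partially supports John walks, matching the gradual notion argued for above.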
In the future we plan to experiment with simple methods for finding candidates: e.g. searching for wh-questions allows us to identify the template X ←1 win →2 cup(…) and match it against graphs already in the context; we shall discuss how such a context might be modeled in Section 6.4. 6.3. Parsing in 4lang For the purposes of the 4lang modules and applications presented in this paper, we relegate syntactic analysis to dependency parsers. In Section 3.3.1 we saw examples of errors introduced by the parsing component, and in the sections on evaluation we observed that they are in fact the single largest source of errors in most of our applications. Our long-term plans for the 4lang library include an integrated module for semantics-assisted parsing. Since most of these plans are unimplemented (with the exception of some early experiments documented in (Nemeskey et al., 2013)), here we shall only provide a summary of our basic ideas. Since generic parsing remains a challenging task in natural language processing, many NLP applications rely on the output of chunkers for high-accuracy syntactic information about a sentence. Chunkers typically identify the boundaries of phrases at the lowest level of the constituent structure, e.g. in the sentence A 61-year old furniture salesman was pushed down the shaft of a freight elevator they would identify the noun phrases [A 61-year old furniture salesman], [the shaft], and [freight elevator]. Since chunking can be performed with high accuracy across languages (Kudo & Matsumoto, 2001; Recski & Varga, 2010), and some of our past experiments suggest that the internal syntactic structure of chunks can also be detected with high accuracy (Recski, 2014), our first goal for 4lang is to detect phrase-internal semantic relations directly. The aim of parsing with 4lang is to make the process sensitive to (lexical) semantics.
Currently the phrase blue giraffe would be mapped to the graph giraffe →0 blue on the basis of the dependency relation amod(giraffe, blue), warranted by a particular fragment of the parse tree, something along the lines of [NP [A blue] [N giraffe]], which has been constructed with little or no regard to the semantics of blue or giraffe. The architecture we propose would still make use of the constituent structure of phrases, but it would create a connection between blue giraffe and giraffe →0 blue by means of a construction that pairs the rewrite rule NP → A N with the operation that adds the 0-edge between the concepts corresponding to the words blue and giraffe16. Since many dependency parsers, among them the Stanford Parser used by dict_to_4lang, derive their analyses from parse trees using template matching, it seems reasonable to assume that a direct mapping between syntactic patterns and 4lang configurations can also be implemented straightforwardly. The task of ranking competing parse trees can then be supplemented by some module that ranks 4lang representations by likelihood; what likelihood means and how such a module could be designed is discussed in Section 6.4. Thus, the problem of resolving ambiguities such as the issue of PP-attachment discussed in Section 3.3.1, e.g. to parse the sentence He ate spaghetti with meatballs, becomes no more difficult than predicting that eat →2 meatball is significantly more likely than eat ←1 INSTRUMENT →2 meatball. If we plan to make such predictions based on statistics over previously seen 4lang representations, our approach can be seen as the semantic counterpart of data-oriented parsing (Bod, 2008), a theory that estimates the likelihood of syntactic parses based on the likelihood of their substructures, learned from structures in some training data. 6.4.
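The idea of pairing syntactic patterns with graph operations can be sketched as a small rule table. This is an illustrative sketch only: the real constructions operate over Eilenberg machines rather than triples, and the rule set below (amod/nsubj/dobj) is a minimal stand-in, not the actual dict_to_4lang mapping.

```python
# Hedged sketch of constructions mapping dependency patterns to 4lang edges.
# deps: (relation, head, dependent) triples; RULES maps a relation to a
# function producing edge triples. All names are illustrative.

def apply_constructions(deps, rules):
    edges = set()
    for rel, head, dep in deps:
        if rel in rules:
            edges |= rules[rel](head, dep)
    return edges

RULES = {
    "amod": lambda head, dep: {(head, "0", dep)},   # blue giraffe -> giraffe ->0 blue
    "nsubj": lambda head, dep: {(head, "1", dep)},  # verb ->1 subject
    "dobj": lambda head, dep: {(head, "2", dep)},   # verb ->2 object
}

edges = apply_constructions([("amod", "giraffe", "blue"),
                             ("nsubj", "eat", "John")], RULES)
```

A semantics-assisted parser would then score the competing edge sets produced by alternative parses, rather than trusting a single tree.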
Likelihood of 4lang representations We proposed the notion of support, the extent to which parts of one utterance entail parts of another, in Section 6.1, and we also indicated in Section 6.2 that we require a model of context that allows us to measure the extent to which the context supports some utterance. Finally, in Section 6.3, we argued that a method for ranking 4lang (sub)graphs by the extent to which the context supports them could be used to improve the quality of syntactic parsing and thereby reduce errors in the entire text_to_4lang pipeline. We shall refer to this measure as the likelihood of some 4lang graph (given some context). This section presents some early ideas for the design of a future 4lang module that models context and measures likelihood. Given a system capable of comparing the likelihoods of competing semantic representations, we will have a chance of successfully addressing more complex tasks in artificial intelligence, such as the Winograd Schema Challenge (Levesque et al., 2011). In Section 6.1 we introduced 4lang templates – sets of concepts and paths of edges between them – as the structures shared by 4lang graphs that are semantically related. Templates are more general structures than subgraphs: two graphs may share many templates over a set of nodes in spite of having only few shared edges; a previous example was the pair of sentences John walks with a stick and John fights with a stick, sharing the template John ⇌10 X ←1 INSTRUMENT →2 stick. Our initial approach is to think of the likelihood of some graph as some product of the likelihoods of matching templates, given a model of the context. We believe that both the likelihood of templates in some context and the way they can be combined to obtain the likelihood of an utterance should be learned from the set of 4lang graphs associated with the context. E.g.
if we are to establish the likelihood of the utterance Germany won the 2014 World Cup and the context is a set of 4lang graphs obtained by processing a set of newspaper articles on sports using text_to_4lang, our answer should be based on (i) the frequency of templates in the target 4lang graph, as observed in the set of context graphs, and (ii) our knowledge of how important each template is, e.g. based on their overall frequency in the context or among all occurrences over their sets of nodes17. In theory there is an enormous number of templates to consider over some graph (doubly exponential in the number of nodes), but the search space can be effectively reduced in a fashion similar to the way standard language modeling reduces the space of all possible word sequences to that of trigrams. If e.g. we consider templates of no more than 4 nodes, and we use expansion to reduce all graphs to some form of 'plain English' with a vocabulary no greater than 10^5 ((Kornai et al., 2015) has shown that an even greater reduction is possible: by iterative expansion 4lang representations can be reduced to 129 primitives, possibly fewer), then the number of node sets will remain in the 10^15 range, and while the total number of theoretically possible 4lang graphs over 4 nodes is as high as 2^(6·C(4,2)) ≈ 10^12, we cannot expect to observe more than a fraction of them: the present 4lang architecture in itself determines a much smaller variety. Note that templates likely to occur in data are also mostly meaningful: e.g. templates over the graph for Germany won the 2014 World Cup are representations for states-of-affairs such as 'Germany won a 2014 something' (Germany ←1 win →2 X →0 2014), 'somebody won a world cup' (X ←1 win →2 cup →0 world), or 'Germany did something to a world something' (Germany ←1 X →2 Y →0 world) – our proposed parameters are the likelihoods of each of these states-of-affairs based on what we have learned from previous experience.
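The analogy with n-gram language modeling can be made concrete with a toy model. This is a sketch under strong simplifying assumptions that are ours, not the paper's: "templates" are reduced to single edges (real templates would span up to ~4 nodes), likelihoods are combined by a smoothed product of relative frequencies, and the tiny "sports context" corpus is invented for the example.

```python
# Toy sketch of scoring a 4lang graph by the context frequency of its
# templates, in the spirit of trigram language modeling. Templates are
# simplified to single (source, label, target) edges.

from collections import Counter
import math

def train(context_graphs):
    """Count template (here: edge) occurrences in the context graphs."""
    counts = Counter(e for g in context_graphs for e in g)
    return counts, sum(counts.values())

def log_likelihood(graph, counts, total, alpha=1.0):
    """Add-alpha-smoothed log-product of per-template relative frequencies."""
    vocab = len(counts) + 1
    return sum(math.log((counts[e] + alpha) / (total + alpha * vocab))
               for e in graph)

# Invented mini-context of sports graphs:
corpus = [{("win", "1", "Germany"), ("win", "2", "cup")},
          {("win", "2", "cup"), ("lose", "1", "Brazil")}]
counts, total = train(corpus)
```

With this context, a graph containing the frequently observed edge win →2 cup scores higher than one containing an unseen edge, which is exactly the ranking behaviour the parsing and QA applications above would need.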
What we outlined here are merely directions for further investigation – the exact architecture and the method of learning (including the reduction of the parameter space) need to be determined by experiments, as does the question of how far such an approach can scale across many domains, genres, and large amounts of data. Our purpose was once again to argue for the expressiveness of 4lang representations, and to indicate our plans for future research in computational semantics. 6.5. External sources In this final section we present simple examples of using external databases of linguistic and extra-linguistic knowledge to build or extend 4lang representations automatically. Just as dict_to_4lang is a tool for acquiring knowledge about the meaning of words, similar systems could be built for learning grammar (Section 6.5.1) or facts about the world (Section 6.5.2). 6.5.1 Constructions As discussed in Section 6.3, in the future we plan to map text to 4lang representations using constructions, which are essentially pairs of patterns mapping classes of surface forms to classes of 4lang graphs. Such constructions need not be hand-coded; they may be created on a large scale from existing linguistic ontologies. One example is the PropBank database (Palmer et al., 2005) – also a key component of the AMR semantic representation (Banarescu et al., 2013) – which contains argument lists of English verbs along with the semantic roles each argument takes. The example entry in Figure 22 establishes that the mandatory roles associated with arguments of the verb agree are those of agreer and proposition, and that their functions are those of prototypical agent (PAG) and prototypical patient (PPT), respectively. This information can be represented as a 4lang construction stating that the concepts accessible from agree via 1- and 2-edges should have 0-edges leading to the concepts agreer and proposition.
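The operation just described, adding 0-edges from a verb's 1- and 2-neighbours to the PropBank role concepts, can be sketched as follows. This is our illustration: the roleset is given as a plain dictionary rather than the actual PropBank XML, and the function names are hypothetical.

```python
# Hedged sketch of extending a 4lang definition with PropBank-style roles.
# graph: a set of (source, label, target) edges; roleset maps an argument
# index (the 1-/2-edge label) to a role concept.

def roles_to_edges(verb, roleset, graph):
    """For each (verb, i, x) edge with i in the roleset, add (x, '0', role)."""
    new_edges = set()
    for v, label, node in graph:
        if v == verb and label.isdigit() and int(label) in roleset:
            new_edges.add((node, "0", roleset[int(label)]))
    return graph | new_edges

# agree with placeholder 1- and 2-neighbours X and Y, as in Figure 23:
extended = roles_to_edges("agree", {1: "agreer", 2: "proposition"},
                          {("agree", "1", "X"), ("agree", "2", "Y")})
```

In the example, X and Y each receive a 0-edge to their role concept, mirroring the grey nodes added in Figure 23.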
This construction could then be used to extend the 4lang definition of agree (see Figure 23). The large-scale extension of 4lang data based on this external source will require a carefully selected set of high-precision patterns: a method must be devised to decide for each pair of PropBank frameset and 4lang definition whether an extension of the latter is warranted. Fig. 22 Part of the PropBank frameset for agree.18 Fig. 23 Extending the 4lang definition of agree (new nodes are shown in grey). 6.5.2 World knowledge Even the simplest forms of reasoning will require some model of world knowledge, and 4lang representations are capable of representing facts taken from publicly available knowledge bases such as WikiData (successor to the widely used but discontinued Freebase (Bollacker et al., 2008)). Such datasets contain triplets of the form predicate(argument1, argument2), such as author(George_Orwell, 1984). author is defined in Longman as someone who has written a book, which dict_to_4lang uses to build the definition graph in Figure 24. If we are ready to make the assumption that the first and second arguments of the WikiData predicate author correspond to the 1- and 2-neighbours of the only binary relation in this definition (write), we can combine the fact author(George_Orwell, 1984) with the definition of author to obtain the graph in Figure 25. Fig. 24 4lang definition of author. Fig. 25 4lang graph inferred from author(George_Orwell, 1984).
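The author example can be sketched as a small instantiation step. This is a hedged illustration under our own encoding: the definition graph is a set of triples in which the placeholder nodes "=1" and "=2" mark the argument slots of the single binary relation, an assumption of this sketch rather than the 4lang format.

```python
# Toy sketch of combining a knowledge-base triplet with a 4lang definition,
# as in author(George_Orwell, 1984). Placeholders "=1"/"=2" mark the slots
# filled by the predicate's first and second arguments.

def instantiate(fact, definition):
    """fact: (predicate, arg1, arg2); definition: edge set with =1/=2 slots."""
    _pred, arg1, arg2 = fact
    sub = {"=1": arg1, "=2": arg2}
    return {(sub.get(a, a), lab, sub.get(b, b)) for a, lab, b in definition}

# author: 'someone who has written a book' -> write(=1, =2), =2 is a book
author_def = {("write", "1", "=1"), ("write", "2", "=2"), ("=2", "0", "book")}
graph = instantiate(("author", "George_Orwell", "1984"), author_def)
```

The result states that George_Orwell wrote 1984 and that 1984 is a book, as in Figure 25.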
A system for building 4lang graphs from WikiData automatically will require a high-precision method for matching WikiData relations with arguments of 4lang definitions, as we did in the case of author above. Simple heuristics like the one used in this example will have to be evaluated and only those with reasonable precision selected. Such a curated set of patterns can then be applied to any subset of WikiData to convert large amounts of factual information to the 4lang format and efficiently combine it with 4lang's knowledge of linguistic semantics. Footnotes 1 https://github.com/kornai/4lang/blob/master/4lang. 2 Since the text_to_4lang pipeline presented in Section 3 assigns 4lang graphs to raw text based on the output of dependency parsers that treat uniformly the relationship between a subject and verb irrespective of whether the verb is transitive or not, the 4lang graphs we build will include a 1-edge between all verbs and their subjects. We do not consider this a shortcoming: for the purposes of semantic analysis we do not see the practicality of a distinction between transitive and intransitive verbs – we only recognize the difference between the likelihood (based on data) of some verb taking a certain number of arguments. 3 All example definitions, unless otherwise indicated, are taken from the Longman Dictionary of Contemporary English (Bullon, 2003). 4 Note that we do not provide a proper definition of true homonyms – particular applications of the 4lang system for representing word meaning can and should make their own decisions on where to draw the line based on their inferencing capabilities. Allowing for polysemy when the need arises is essential not only because of words like trunk, it is also more practical to maintain multiple definitions for polysemous words when such definitions are readily available (see Section 4.4.3 for more discussion). 5 For a possible typology of such semantic exploitations, see Chapter 8 of (Hanks, 2013).
6 Note that such an inference must access some form of world knowledge in addition to the definition of each concept: the definition of ship will contain ←1ON →2water (or similar), but to infer that this makes it incompatible with the earth in the definition of plow, one must also be aware that water and earth cancel each other out in the context of where a vehicle runs.

7 This is also reflected in The Urban Dictionary's definition of semantics: The study of discussing the meaning/interpretation of words or groups of words within a certain context; usually in order to win some form of argument (http://www.urbandictionary.com).

8 http://nlp.stanford.edu/software/lex-parser.shtml.

9 The word wizarding should have been mapped to the concept wizard.

10 http://dbpubs.stanford.edu:8091/∼testbed/doc2/WebBase/.

11 The command-line interface of the Stanford Parser does not support adding constraints on parse trees, but the Java API does; we implemented a small wrapper in Jython that allowed us to access the classes and functions necessary to enforce this constraint.

12 The 50 words in our sample, selected randomly using GNU shuf, were the following: aircraft, arbour, armful, characteristic, clothesline, contact, contrived, costermonger, cycling, cypress, dandy, efface, excited, fedora, forester, frustrate, gazette, grenade, houseboy, incandescent, invalid, khaki, kohl, lecture, lizard, might, multiplication, nightie, okey-doke, outdid, overwork, popularity, preceding, Presbyterian, punch-drunk, reputed, residency, retaliation, rock-solid, sandpaper, scant, sewing, slurp, transference, T-shirt, underwrite, vivace, well-fed, whatsit, Zen.

13 http://www.cl.cam.ac.uk/∼fh295/simlex.html.

14 http://www.cl.cam.ac.uk/∼fh295/simlex.html.
15 The 4lang theory of representing meaning using networks of Eilenberg machines – of which our graphs are simplifications – will have the machines walk and fight inherit all properties of all machines to which they have pointers on their 0th partition; in other words, they will end up with all properties of concepts that are accessible through a path of IS_A relationships, and will probably share at least some very generic properties such as voluntary action. The machine-equivalent of templates could then be networks of machines, each with any arbitrary set of properties.

16 As mentioned in Section 2.1, the directed graphs used throughout this paper are simplifications of our formalism; the constructions in 4lang actually map surface patterns to operations over Eilenberg machines, in this case one that places a pointer to a blue machine on the 0th partition of a giraffe machine.

17 At this point we must note that likelihood is not (directly related to) truth; in fact none of our previous discussions leading up to this notion makes reference to truth. Neither do we suggest that calculating likelihood can take the place of inference – a context may entail or contradict an utterance regardless of how likely the latter is; our notion is rather motivated by the various applications discussed in this section.

18 https://github.com/propbank/propbank-frames/blob/master/frames/agree.xml.

References

Ács, J., Pajkossy, K., & Kornai, A. (2013). Building basic vocabulary across 40 languages. In Proceedings of the Sixth Workshop on Building and Using Comparable Corpora (pp. 52–58). Sofia, Bulgaria: Association for Computational Linguistics.

Banarescu, L., Bonial, C., Cai, S., Georgescu, M., Griffitt, K., Hermjakob, U., … Schneider, N. (2013). Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse (pp. 178–186). Sofia, Bulgaria: Association for Computational Linguistics.

Banjade, R., Maharjan, N., Niraula, N. B., Rus, V., & Gautam, D. (2015). Lemon and tea are not similar: Measuring word-to-word similarity by combining different methods. In Gelbukh, A. (Ed.), International Conference on Intelligent Text Processing and Computational Linguistics (pp. 335–346). Springer.

Bod, R. (2008). The data-oriented parsing approach: theory and application. Springer.

Boguraev, B. K., & Briscoe, E. J. (1989). Computational Lexicography for Natural Language Processing. Longman.

Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (pp. 1247–1250).

Bullon, S. (2003). Longman Dictionary of Contemporary English 4. Longman.

de Marneffe, M.-C., MacCartney, W., & Manning, C. (2006). Generating typed dependency parses from phrase structure parses. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) (Vol. 6, pp. 449–454). Genoa, Italy.

de Marneffe, M.-C., & Manning, C. D. (2008). Stanford typed dependencies manual [Computer software manual]. Retrieved from http://nlp.stanford.edu/software/dependencies_manual.pdf (Revised for the Stanford Parser v. 3.5.1 in February 2015).

Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302.

Eilenberg, S. (1974). Automata, languages, and machines (Vol. A). Academic Press.

Fillmore, C. J. (1977). Scenes-and-frames semantics. In Zampolli, A. (Ed.), Linguistic structures processing (pp. 55–88). North Holland.

Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., … Ruppin, E. (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1), 116–131.

Ganitkevitch, J., Van Durme, B., & Callison-Burch, C. (2013). PPDB: The Paraphrase Database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013) (pp. 758–764). Atlanta, Georgia: Association for Computational Linguistics.

Han, L., Kashyap, A., Finin, T., Mayfield, J., & Weese, J. (2013). UMBC_EBIQUITY-CORE: Semantic textual similarity systems. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (pp. 44–52). Atlanta, Georgia, USA: Association for Computational Linguistics.

Han, L., Martineau, J., Cheng, D., & Thomas, C. (2015). Samsung: Align-and-Differentiate Approach to Semantic Textual Similarity. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (pp. 172–177). Denver, Colorado: Association for Computational Linguistics.

Hanks, P. (2013). Lexical analysis: Norms and exploitations. MIT Press.

Hill, F., Reichart, R., & Korhonen, A. (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41(4), 665–695.

Hobbs, J. R. (1990). Literature and cognition (No. 21). Center for the Study of Language and Information (CSLI).

Jaccard, P. (1912). The distribution of the flora in the alpine zone. New Phytologist, 11(2), 37–50.

Kashyap, A., Han, L., Yus, R., Sleeman, J., Satyapanich, T., Gandhi, S., & Finin, T. (2014). Meerkat Mafia: Multilingual and Cross-Level Semantic Textual Similarity Systems. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) (pp. 416–423). Dublin, Ireland: Association for Computational Linguistics and Dublin City University.

Katz, J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39, 170–210.

Klein, D., & Manning, C. D. (2003). Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (pp. 423–430). Sapporo, Japan: Association for Computational Linguistics.

Kornai, A. (2010). The algebra of lexical semantics. In Ebert, C., Jäger, G., & Michaelis, J. (Eds.), Proceedings of the 11th Mathematics of Language Workshop (pp. 174–199). Springer.

Kornai, A. (2012). Eliminating ditransitives. In de Groote, P., & Nederhof, M.-J. (Eds.), Revised and Selected Papers from the 15th and 16th Formal Grammar Conferences (pp. 243–261). Springer.

Kornai, A., Ács, J., Makrai, M., Nemeskey, D. M., Pajkossy, K., & Recski, G. (2015). Competence in lexical semantics. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics (*SEM 2015) (pp. 165–175). Denver, Colorado: Association for Computational Linguistics.

Kornai, A., & Makrai, M. (2013). A 4lang fogalmi szótár. In Tanács, A., & Vincze, V. (Eds.), IX. Magyar Számítógépes Nyelvészeti Konferencia (pp. 62–70).

Kudo, T., & Matsumoto, Y. (2001). Chunking with support vector machines. In Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2001) (pp. 1–8). Association for Computational Linguistics.

Levesque, H. J., Davis, E., & Morgenstern, L. (2011). The Winograd schema challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning (Vol. 46, p. 47).

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Bengio, Y., & LeCun, Y. (Eds.), Proceedings of ICLR 2013.

Nemeskey, D., Recski, G., Makrai, M., Zséder, A., & Kornai, A. (2013). Spreading activation in language understanding. In Proceedings of the 9th International Conference on Computer Science and Information Technologies (CSIT 2013) (pp. 140–143). Yerevan, Armenia: Springer.

Palmer, M., Gildea, D., & Kingsbury, P. (2005). The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.

Quillian, M. R. (1969). The teachable language comprehender. Communications of the ACM, 12, 459–476.

Recski, G. (2014). Hungarian noun phrase extraction using rule-based and hybrid methods. Acta Cybernetica, 21, 461–479.

Recski, G., & Ács, J. (2015). MathLingBudapest: Concept networks for semantic similarity. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (pp. 543–547). Denver, Colorado: Association for Computational Linguistics.

Recski, G., Iklódi, E., Pajkossy, K., & Kornai, A. (2016). Measuring semantic similarity of words using concept networks. In Proceedings of the 1st Workshop on Representation Learning for NLP (pp. 193–200). Berlin, Germany: Association for Computational Linguistics.

Recski, G., & Varga, D. (2010). A Hungarian NP Chunker. The Odd Yearbook. ELTE SEAS Undergraduate Papers in Linguistics, 8, 87–93.

Richards, I. (1937). The philosophy of rhetoric. Oxford University Press.

Ruhl, C. (1989). On monosemy: a study in linguistic semantics. State University of New York Press.

Schwartz, R., Reichart, R., & Rappoport, A. (2015). Symmetric pattern based word embeddings for improved word similarity prediction. In Proceedings of the 19th Conference on Computational Natural Language Learning (CoNLL 2015) (pp. 258–267). Beijing, China: Association for Computational Linguistics.

Sinclair, J. M. (1987). Looking up: an account of the COBUILD project in lexical computing. Collins ELT.

Socher, R., Bauer, J., Manning, C. D., & Ng, A. Y. (2013). Parsing with compositional vector grammars. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013) (pp. 455–465). Sofia, Bulgaria: Association for Computational Linguistics.

Trón, V., Gyepesi, G., Halácsy, P., Kornai, A., Németh, L., & Varga, D. (2005). Hunmorph: Open source word analysis. In Proceedings of the ACL Workshop on Software (pp. 77–85). Ann Arbor, Michigan: Association for Computational Linguistics.

Vo, N. P. A., & Popescu, O. (2016). Corpora for learning the mutual relationship between semantic relatedness and textual entailment. In Calzolari, N., et al. (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris, France: European Language Resources Association (ELRA).

Wieting, J., Bansal, M., Gimpel, K., Livescu, K., & Roth, D. (2015). From paraphrase database to compositional paraphrase model and back. TACL, 3, 345–358.

© 2017 Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).

Journal: International Journal of Lexicography, Oxford University Press

Published: Sep 1, 2018
