Functional querying in graph databases

Vietnam J Comput Sci (2018) 5:95–105
https://doi.org/10.1007/s40595-017-0104-6

REGULAR PAPER

Jaroslav Pokorný (pokorny@ksi.mff.cuni.cz)
MFF UK, Malostranské nám. 25, 118 00 Prague, Czech Republic

Received: 12 July 2017 / Accepted: 16 October 2017 / Published online: 10 November 2017
© The Author(s) 2017. This article is an open access publication

Abstract The paper focuses on functional querying in graph databases. We consider the labelled property graph model and also mention the graph model behind XML databases. Attention is devoted to functional modelling of graph databases at both the conceptual and the data level. The notions of graph conceptual schema and graph database schema are considered. The notion of a typed attribute is used as the basic structure on both the conceptual and the database level. As a formal approach to declarative graph database querying, a version of typed lambda calculus is used. This approach allows the use of the logic necessary for querying, of arithmetic, and of aggregation functions. Another advantage is the ability to deal with relations and graphs in one integrated environment.

Keywords Graph database · Querying graph database · Graph database schema · Graph conceptual schema · Functional graph database schema · Functional graph database · Typed lambda calculus · Language of terms · XML graph

1 Introduction

Graph databases focus on efficient storing and querying of highly connected data. They are a powerful tool for graph-like queries, e.g., computing the shortest path between two nodes in the graph. They reach excellent performance for local reads by traversing the graph and can use various data models for graphs and their data extensions.

Graph databases are usually considered NoSQL databases (e.g., [18]). One rather popular definition of a graph database (GDB), also called a graph-oriented database, says that it is a database that uses graph theory to store, map and query relationships. That is, the distinguishing characteristics of the domain include:

• relationship-rich data,
• relationships are first-class citizens in graph databases.

Despite the fact that there are various approaches to GDB implementation, native graph processing based on so-called index-free adjacency is the most efficient means of processing data in a graph, because connected nodes use physical "pointers" to neighbour nodes in the database. Of course, there are other approaches based on an extension of SQL. For example, graph features introduced in SQL Server 2017 (https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-architecture, retrieved on 10.9.2017) allow users to create node or edge tables. Graph extensions are fully integrated in the SQL Server engine.

A GDB can contain a single (big) graph or a collection of graphs. The former includes, e.g., graphs of social platforms such as Facebook, Twitter, LinkedIn, or the Web graph; the latter is especially used in scientific domains such as bioinformatics and chemistry, or for human interaction patterns, temporal road networks, etc. Graph search occurs in other application scenarios, like recommender systems, complex software plagiarism detection, and traffic route planning. In line with similar concepts in other database technologies, we will talk about graph data management systems (GDBMS) and graph database systems (GDBS).

An important part of GDB technology is querying graphs. There is always an intimate relationship between database modelling and querying. Most graph query languages directly use a structure of directed graphs or property graphs. Now, the best-known declarative query language over property graphs is Cypher (http://neo4j.com/developer/cypher-query-language/, retrieved on 10.9.2017) of the GDBMS Neo4j [15]. Cypher was the first pattern-matching query language to target the property graph data model. Cypher commands partially use SQL syntax and are targeted at ad hoc queries over the graph data.

Yet other approaches are possible, e.g., a functional approach. In the early 1980s, there was the functional language DAPLEX [16]. The language only allowed nested applications of functions. The functional map was applied in context-oriented semantics of multivalued functions. A number of significant works using a functional approach to data management are contained in the book [4]. In the current era of GDBMSs, we can mention Gremlin, a functional graph query language developed by Apache TinkerPop (http://tinkerpop.apache.org/, retrieved on 10.9.2017), which allows expressing complex graph traversals and mutation operations over property graphs. Traversal operators/functions are chained together to form path-like expressions. Gremlin is supported by many GDBMSs (e.g., Titan, http://titan.thinkaurelius.com/, retrieved on 10.9.2017).

Here, we will use a functional approach in which a database graph is represented by so-called attributes, i.e., typed partial functions. For this approach we use the HIT Database Model, see, e.g., [7], as a functional alternative to the E-R model. Then a typed lambda calculus, i.e., the language of lambda terms (LT), can be used as a data manipulation language.

Due to the strong typing, LT can deal with various data structures in a natural way as with functions. Since sets (relations) are modelled as their characteristic functions, we gain a tool for common manipulation of relations and graph data. In consequence, the query results can include new graphs, relations or nested relations as well. In practice, attempts to combine, e.g., Neo4j and Oracle DBMS already exist [17]. In this polyglot environment it is possible to synchronize a portion of the data from the relational database to Neo4j or even to synchronize all data between Oracle and Neo4j. Using a high-level formal apparatus like a variant of the LT language as a background for combining these two database technologies can be beneficial for practice.

We will see that this functional approach reflects the graph structure of a GDB and, moreover, provides powerful possibilities for dealing with properties, i.e., with the GDB content. Attribute descriptions simultaneously provide a conceptual view of the data in the GDB.

In the early 2000s we used the functional approach in the context of the XML language (https://www.w3.org/TR/2008/REC-xml-20081126/, retrieved on 10.9.2017) [8]. In a slightly simplified view, XML data are trees. One can view XML data as functions from elements to PCDATA or to more complex data structures composed from additional elements and strings. A variant of the approach appropriate for querying XML data, called XML-λ, was described in [8]. Thus, it would also be possible to integrate querying over relational and XML data. Now it is possible in SQL, e.g., to work with the XML data type in a table column (SQL/XML), but a unified query language for such polyglot databases is still missing.

The goal of the paper is to present the above-mentioned approaches and discuss their power and usability. We applied the functional approach to property graphs in the work [14]. The present paper is an extension of [14].

The rest of the paper is organized as follows. Section 2 introduces a graph data model based on (labelled) property graphs. In addition, we will consider XML graphs as a special case. Section 3 describes modelling and querying GDBs, both on the conceptual and the database level. XML graphs are also considered. The notions of graph conceptual schema and graph database schema are introduced, including some integrity constraints (ICs). Section 4 briefly introduces a functional approach to GDB modelling based on typed attributes. A version of typed lambda calculus appropriate for GDB querying is introduced in Sect. 5. Details of querying GDBs with functions are explained in examples, including functional querying of XML data. Section 6 gives the conclusions.

2 Graph data model

In general, traditional database technologies are always based on a database model. In the case of GDBs, such a model uses a kind of graph.

2.1 Labelled property graphs

Here we will use a (labelled) property graph model whose basic constructs include:

• entities (nodes),
• properties (attributes),
• labels (types),
• relationships (edges) having a direction, start node, and end node,
• identifiers.

Entities and relationships can have any number of properties; nodes and edges can be tagged with labels. Multiple edges connecting two nodes are allowed. Both nodes and edges are defined by a unique identifier. Properties are expressed in the key:value style. In graph-theoretic terms we also talk about labelled and directed attributed multigraphs in this case. These graphs are used both for a GDB and for its database schema (if any). An example of a GDB is in Fig. 1.

Fig. 1 Example of a GDB. Nodes: Teacher {T_ID: ZI21, T_Name: Uli, Birth year: 1982}; Language {Name: English}; Language {Name: German, Textbook: German for beginners}; Town {Town_name: Berlin, Population: 50}. Edges: Teaches {Day: Mon, Hour: 6, Room: S1}; Teaches {Day: Thu, Hour: 4, Room: S2}; Is_born_in {Date: 23.4.1950}; Lives_in {From: 30.1.1978}.

<?xml version="1.0" encoding="UTF-8"?>
<teachers>
  <teacher>
    <T_ID>ZI21</T_ID>
    <T_name>Uli</T_name>
    <birth_year>1982</birth_year>
    <teaches>
      <day>Thu</day>
      <hour>4</hour>
      <room>S2</room>
      <language>
        <name>English</name>
      </language>
    </teaches>
    <teaches>
      <day>Thu</day>
      <hour>6</hour>
      <room>S1</room>
      <language>
        <name>German</name>
        <textbook>German for beginners</textbook>
      </language>
    </teaches>
    <lives_in>
      <from>30.1.1978</from>
      <town>
        <town_name>Berlin</town_name>
        <population>50</population>
      </town>
    </lives_in>
    <is_born_in>
      <date>23.4.1950</date>
      <town>
        <town_name>Berlin</town_name>
        <population>50</population>
      </town>
    </is_born_in>
  </teacher>
</teachers>

Fig. 2 XML document

2.2 XML graphs

In a slightly restricted version, XML documents can also be represented by labelled, finite trees. More generally, XML documents can have a more general graph structure, due to ID references, but we will not consider this case. Nodes of these data structures are objects; labels on edges are XML tags (element names or attribute names). We will not even consider XML attributes here. Leaves contain atomic values, i.e., texts. Figure 2 presents an example of an XML document based on the GDB sample in Fig. 1.

XML data can be represented as trees in several ways. One possibility is used in Fig. 3 for the XML document in Fig. 2. Compared to property graphs, these XML trees use only nodes of two types and labelled edges.

Fig. 3 Tree representation of the XML document in Fig. 2 (a part). Edge labels: Teachers, Teacher, T_ID, T_Name, Birth_year, Teaches, Day, Hour, Room, Language, Name; leaf values shown: Uli, ZI21, Th, S2, English.

XML data can contain multiple occurrences of one element. To include them in a database, we introduce a set E of abstract elements. An empty abstract element ε is also in E. Any function is undefined on ε. The set E serves as a reservoir for the construction of XML elements and for their unique identification. Abstract elements can be implemented as inner OIDs in an XML database. The content of an abstract element will be either a string from PCDATA, in the simplest case, or a sequence of abstract subelements, or empty. Neglecting the ordering of subelements, the sequence can be replaced by a set. For example, <town_name>Berlin</town_name> is an instance of the town_name element object. The town_name element object is a (partial) function from E to PCDATA. A more complex town element type can be conceived as a set of functions from E to the Cartesian product E × E (abstract town names and populations). The second component of a town couple can be ε. Due to the tree representation, the town Berlin, for example, will be represented twice in such an XML tree. Similarly, if several teachers teach German, the representation of the German language would be repeated in the XML document. This observation reflects the fact that XML data is textual, even for numeric data. Moreover, no conceptual view is supposed for this data.

In [8] we considered only partially ordered XML documents. In paper [9], we proposed an ordered model of XML data, i.e., a modification of its typing apparatus, which preserves the ordering of subelements.

3 Modelling and querying graph databases

Current commercial GDBMSs need further improvements to meet the traditional definitions of conceptual and database schema known, e.g., from the relational database world. The graph database model is usually not presented explicitly, but is hidden in the constructs of a data definition language (DDL) which is at disposal in the given GDBMS. These languages also make it possible to specify some simple ICs. Conceptual modelling of graph databases is not used at all. An exception is the GRAD database model [6], which, although schema-less, uses conceptual constructs occurring in the E-R conceptual model and some powerful ICs. Both a graph conceptual schema and a graph database schema can provide an effective communication medium between users of any GDB. They can also significantly help GDB designers.

Fig. 4 Graph conceptual schema (entity types: Language, Teacher, Person, Town, Street; relationship labels: Is_taught_by, Teaches, Is_a, Is_born_in, Is_birthplace_of, Has, Is_in)

In Sect. 3.1 we propose graph data modelling based on property graphs and introduce a conceptual level for this purpose. Section 3.2 repeats some basics of XML modelling with the help of the DTD language.
To complete graph modelling issues, we mention ICs in GDBs in Sect. 3.3. Section 3.4 introduces some principles of graph querying. (For DTD, see https://www.w3.org/TR/xml11/, retrieved on 10.9.2017.)

3.1 Modelling graph data based on property graphs

In [12] we proposed a binary E-R model as a variant for graph conceptual modelling considering strong entity types, weak entity types, relationship types, attributes, identification keys, partial identification keys, ISA-hierarchies, and min-max ICs. The notation is based on the Oracle Designer CASE [3]. For min-max ICs, the graph conceptual schema in Fig. 4 uses the well-known notation with dotted lines and crow's feet for the start node and the end node of some edges. The perpendicular line denotes the identification and existence dependency of weak entity types. Subtyping (ISA-hierarchies) is simply expressed by an arrow to the entity supertype. Relationship types are expressed by couples of mutually "inverse" labels, e.g., (Is_taught_by, Teaches).

Figures 4 and 5 give examples of a graph conceptual schema and a graph database schema (without attributes/properties), respectively. Graphical min-max ICs (see Fig. 4) can be expressed equivalently by expressions (E_1:(a, b), E_2:(c, d)), where a, c ∈ {0, 1}, b, d ∈ {1, n}, and n means "any number greater than 1". For example, (Teacher:(1, 1), Town:(0, n)) can be associated with the relationship type (Is_born_in, Is_birthplace_of). Expressions (E_1:(a, n), E_2:(c, n)) correspond to cardinalities m : n in an alternative notation.

Fig. 5 Graph database schema without properties (node labels: Language, Teacher, Person, Town, Street; edge labels: Teaches, Is_a, Is_born_in, Has)

A correct graph conceptual schema may be mapped into an equivalent (or nearly equivalent) graph database schema with the straightforward mapping algorithm [12], but with a weaker notion of a database schema, i.e., some inherent ICs from the conceptual level have to be neglected to satisfy the usual notation of labelled property graphs. Consequently, we can propose several different graph database schemas from one graph conceptual schema. For example, the edges Teaches and Is_born_in provide only partial information w.r.t. the associated source conceptual schema. The inverted arrow Is_taught_by could be used as well. Due to the loss of the inherent ICs occurring at the conceptual level, we should put some explicit ICs into the GDB schema, e.g., that "A teacher can teach more languages" and "A teacher is born in exactly one town".

As usual, only single-valued attributes of the form key:domain are considered here, where domains include elementary descriptive types like String, Number, Date, etc. Then, the identification key of Teacher could be #Person_ID. On the database schema level, the identification key of Street would be {Town_name, Street_name}. Details of the mapping of graph conceptual schemas to graph database schemas can be found in [12]. Figure 6 presents the graph database schema associated with the GDB in Fig. 1. Obviously, the schema is again a labelled and directed attributed multigraph.

Fig. 6 Graph database schema with properties. Nodes: Teacher {T_ID:String, T_Name:String, Birth_year:Number}; Town {Town_name:String, Population:Number}; Language {Name:String, Textbook:String}. Edges: Lives_in {From:Date}; Is_born_in {Date:Date}; Teaches {Day:Date, Hour:Number, Room:String}.

The values of some properties can be unknown or undefined in a GDB. This resembles NULL values in SQL databases. For example, Textbook:NULL could be considered. In GDBs, as well as in NoSQL databases generally, such properties are not explicitly represented. In Fig. 1, the Language node for English is such a case.

3.2 Modelling graph data based on XML format

The simplest way to model XML data on the database schema level is given by the DTD language. A richer collection of modelling tools is offered by the XML Schema language (https://www.w3.org/XML/Schema, retrieved on 10.9.2017). The DTD subset considered in our examples uses only element declarations. Figure 7 contains a DTD describing the XML data in Fig. 2.

<!DOCTYPE teachers [
<!ELEMENT teachers (teacher*)>
<!ELEMENT teacher (T_ID,T_name,birth_year,teaches*,lives_in,is_born_in)>
<!ELEMENT teaches (day,hour,room,language)>
<!ELEMENT language (name,textbook?)>
<!ELEMENT lives_in (from,town)>
<!ELEMENT is_born_in (date,town)>
<!ELEMENT town (town_name,population?)>
<!ELEMENT town_name (#PCDATA)>
<!ELEMENT population (#PCDATA)>
<!ELEMENT birth_year (#PCDATA)>
]>

Fig. 7 DTD for the XML document in Fig. 2

3.3 Integrity constraints in graph databases

Due to the graph structure of data in a GDB, associated explicit ICs can also have a graph form. A very simple IC is a functional dependency (FD) or a conditional functional dependency (CFD) introduced in [12]. For example, "Each teacher is born in one town" and "Teachers born later than in 1994 teach at most one language" are examples of an FD and a CFD, respectively. A usable approach is also offered by the above-mentioned GRAD database model. It makes it possible to express some semantic restrictions over the graph data using graph patterns. A graph pattern P is a predicate on the graph topology (specifying conditions on the structural properties of the graph) and on the properties (specifying conditions on their values) of the graph elements.

At least the inherent ICs coming from a graph conceptual schema should be considered as explicit ICs on the graph database level, i.e., using a DDL for their formulation. In [13], we have focused on the graph database Neo4j and its possibilities to express a database schema and/or ICs. We have extended these possibilities through new constructs in the Neo4j DDL, including their prototype implementation and experiments.

On the other hand, NoSQL databases often do not require the notion of a graph database schema at all. Strict application of schemas is sometimes considered disadvantageous by those who develop applications for dynamic domains, e.g., domains working with user-generated content, where the data structures change very often [1]. Consequently, many GDBMSs are schema-less, including Neo4j. OrientDB (http://www.orientechnologies.com/, retrieved on 10.9.2017) even distinguishes three roles of a graph database schema: schema-full, schema-less, and schema-hybrid. Even when schema-less, some GDBMSs support the specification of some types of ICs. Neo4j uses the language Cypher for this purpose.

3.4 Graph querying

In this section, we focus on the basics of graph querying (for more details see [11]). Its simplest type uses the index-free adjacency. In practice, basic queries like k-hop queries are the most frequent. Looking for a node, looking for its neighbours (1-hop query), scanning edges in several hops, retrieval of property values, etc., belong to this category.

More complex queries are subgraph and supergraph queries. They belong to traditional queries based on exact matching. Other typical queries include breadth-first/depth-first search, path and shortest path finding, least-cost path finding, finding cliques or dense subgraphs, finding strongly connected components, etc. Very useful are regular path queries (RPQ). RPQs have the form:

RPQ(x, y) := (x, R, y)

where R is a regular expression over the vocabulary S of edge labels. Regular expressions are constructed as follows:

R ::= s | R.R | R|R | R* | R? | (R)

where s is a label from S. RPQs provide couples of nodes connected by a path conforming to R. With the closure of RPQs under conjunction and existential quantification we obtain conjunctive RPQs.

For example, Cypher, working with Neo4j databases, lacks some fundamental graph querying functionalities, namely RPQs and graph construction. In [2] an interesting newer approach is offered by the language G-Path. G-Path is an RPQ language working on graphs, which supports almost all useful regular expression operators. PGQL [19] is based on the paradigm of graph pattern matching, closely follows the syntactic structures of SQL, and provides RPQs with conditions on labels and properties.

4 Database modelling with functions

Our approach to graph modelling and querying is based on conceptual structures rather than database ones. We start with conceptual modelling based on the notion of attribute viewed as an empirical typed function that is described by an expression of a natural language [7]. A lot of papers are devoted to this approach, developed and studied mainly in the 1990s (see, e.g., [10]). The type apparatus allows a number of variants. For our purposes, we can work with functional types and tuple types built in a hierarchical way.

4.1 Types

A hierarchy of types is constructed as follows. We assume the existence of some (elementary) types S_1,…,S_k (k ≥ 1). They constitute a base B. More complex types are constructed in the following way.

If S, R_1,…,R_n (n ≥ 1) are types, then

(i) (S:R_1,…,R_n) is a (functional) type,
(ii) (R_1,…,R_n) is a (tuple) type.

The set of types T over B is the least set containing the types from B and those given by (i)-(ii). When the S_i in B are interpreted as non-empty sets, then (S:R_1,…,R_n) denotes the set of all (total or partial) functions from R_1 × ··· × R_n into S, and (R_1,…,R_n) denotes the Cartesian product R_1 × ··· × R_n.

In conceptual modelling, each base B consists of descriptive and entity types. Descriptive types (String, Number, etc.) serve as domains of properties. Their elements are constants like 'Prague', 'John', '201400', etc. The elementary type Bool = {TRUE, FALSE} is also in B. The type Bool allows typing some objects as sets and relations. They are modelled as unary and n-ary characteristic functions, respectively. The concept of the set then becomes redundant.

The fact that X is an object of type R ∈ T can be written as X/R, or "X is the R-object". For each typed object o, the function type returns type(o) ∈ T of o. Logical connectives, quantifiers and predicates are also typed functions: e.g., and/(Bool:Bool,Bool), the R-identity =_R is a (Bool:R,R)-object, and the universal R-quantifier Π and the existential R-quantifier Σ are (Bool:(Bool:R))-objects. The R-singularizer I/(R:(Bool:R)) denotes the function whose value is the only member of an R-singleton; in all other cases the application of I is undefined. The arithmetic operations +, -, *, / are examples of (Number:Number,Number)-objects. We can also type functions of functions, etc.

4.2 Attributes

Object structures usable in building a database can be described by some expressions of a natural language. For example, "the language taught by a teacher at a school" (abbr. LTS) is a (Language:Teacher,School)-object, i.e., a (partial) function f: Teacher × School → Language, where Teacher, School, and Language are appropriate elementary types. Such functions are called attributes in [7].

More formally, attributes are functions of type ((S:T):W), where W is the logical space (possible worlds), T contains time moments, and S is a type from the set of types of Sect. 4.1. M_w denotes the application of the attribute M to w/W, and M_wt denotes the application of M_w to the time moment t. In our approach to conceptual modelling, we can omit the parameters w and t in type(M). In the case of the LTS attribute, we suppose only possible worlds where teachers teach at most one language in each school. However, this may not be true for other possible worlds. For GDBs we can conceive elementary entity types as sets of node IDs. For better readability, John, Prague, Frank can simply be such IDs.

Attributes can be constructed according to their type in a more complicated way. For example, "the classes in a school" could be considered as an attribute CS of type ((Bool:(Bool:Student)):School), i.e., the classes contain sets of students and CS returns a set of classes (of students) for a given school. Obviously, Student is an elementary type.

We can also consider other functions that need no possible world. For example, aggregate functions like COUNT/(Number:(Bool:S)) and SUM/(Number:(Bool:Number)) (we suppose here that SUM is defined on sets only) and arithmetic operations provide such functions. These functions have the same behaviour in all possible worlds and time moments. Consequently, we can distinguish between two categories of functions: empirical (e.g., attributes) and analytical. The former are conceived as partial functions from the logical space. The ranges of these functions are again functions. Analytical functions are of type S, where S does not depend on W and T.

The notion of attribute applied in GDBs could be restricted to attributes of types (R:S), ((Bool:R):S), or (Bool:R,S), where R and S are entity types. This strategy simply covers binary functional types, binary multivalued functional types, and binary relationships described as binary characteristic functions. The last option corresponds to m : n relationship types. For modelling directed graphs the first two types are sufficient, because m : n relationship types can be expressed by two "inverse" binary multivalued functional types. Here we will always consider one of them (see, e.g., Teaches below). Thus, a graph database schema can reflect reality only partially, similarly to non-functional graph data modelling.

Now we add properties. Properties describing entity types can be of types (S_1,…,S_m:R), where the S_i are descriptive elementary types and R is an entity type. So we deal with functional properties. Similarly, we can express properties of edges. They are of types (S_1,…,S_m,R_1:R_2) and ((Bool:S_1,…,S_m,R_1):R_2) for binary functional and binary multivalued functional types, respectively.

Then a functional database schema describing the GDB in Fig. 1 can look as follows:

La/((Name, Textbook):Language)
Te/((T_ID, T_Name, Birth_year):Teacher)
Tw/((Town_name, Population):Town)
Teaches/((Bool:Day, Hour, Room, Language):Teacher)
Is_Born_in/((Date, Town):Teacher)
Lives_in/((From, Town):Teacher)

where La, Te, Tw, Teaches, Is_Born_in and Lives_in are variables. The use of variables corresponds to a "database view" of the modelled world. Since we do not consider temporal functional databases here, it is not necessary to use time explicitly.

We remark, however, that our functional GDBs with such schemas can contain isolated nodes with at least one property. IDs of edges are not necessary, because edges are not explicitly considered.

Thus, a functional graph database schema is a set of variables of types from T restricted to attribute types. A functional graph database is any evaluation of these variables. Thus, certain variables serve for referencing the associated database. For convenience, we denote the variables from the functional graph database schema by names resembling the entity and relationship types in the associated traditional GDB schema, or at least their abbreviations.

5 Manipulating functions

A manipulating language for functions is traditionally a typed lambda calculus. Our version of the typed lambda calculus directly supports manipulating objects typed by T introduced in Sect. 4.1. We will suppose a collection Func of constants, each having a fixed type, and denumerably many variables of each type. Then the language of (lambda) terms LT is defined as follows.

Let the types R, S, R_1,…,R_n (n ≥ 1) be elements of T.

(1) Every variable of type R is a term of type R.
(2) Every constant (a member of Func) of type R is a term of type R.
(3) If M is a term of type (S:R_1,…,R_n), and N_1,…,N_n are terms of types R_1,…,R_n, respectively, then M(N_1,…,N_n) is a term of type S. /application/
(4) If x_1,…,x_n are different variables of the respective types R_1,…,R_n and M is a term of type S, then λx_1,…,x_n (M) is a term of type (S:R_1,…,R_n). /lambda abstraction/
(5) If N_1,…,N_n are terms of types R_1,…,R_n, respectively, then (N_1,…,N_n) is a term of type (R_1,…,R_n). /tuple/
(6) If M is a term of type (R_1,…,R_n), then M[1],…,M[n] are terms of the respective types R_1,…,R_n. /components/

5.1 From the LT language to a more user-friendly notation

Instead of the position notation we can use the more readable dot notation for components. Consider the Prague typed object (entity, node). Then instead of Tw(Prague)[2], where Prague/Town, we can write Tw(Prague).Population. The real effect of this convention is that we access the Population property independently of its position in the tuple described by the associated functional database schema. In the case that the population is not present among the properties of the town Prague, the function will be undefined.

Terms are interpreted in a standard way by means of an interpretation assigning to each function symbol from Func an object of the same type, and by a semantic mapping [ ] from LT into all functions given by T. Func influences the expressive power of LT. It can contain the usual arithmetic and aggregation functions, etc. Of special importance is the identity of type (Bool:(Bool:S),(Bool:S)), usable for a comparison of two sets of S-objects. We will denote the type (Bool:(Bool:S),(Bool:S)) as 2sets for better readability. Consider Teacher-objects John and
Then with in Section 4.3, the resulted lambda expression Language Language Teacher Number Number λl ∃d , h , r , Teaches(John)(d , h , r , l ) = 1 1 1 1 1 1 2sets 1 1 λt , n (n Language Language λl ∃d , h , r , Teaches(Frank)(d , h , r , l ) Teacher Language 2 2 2 2 2 2 2 2 = COUNT (λ l(Teaches(t )(l )))) Language we can test whether John and Frank teach the same set of denotes the query “Give a set of couples associating with languages. Similarly to domain relational calculus, we can each teacher the number of languages teaching by him/her”. simplify this expression by omitting some existential quan- Obviously, the query could be also reformulated as λ t (λ n tifiers and associated variables. Then the resulted expression (…)), or more conventionally λ t onlyone n (…). can look as To consider only teachers born in Prague the lambda Language Language expression might look like λl Teaches(John)(l ) = 2sets 1 1 Language Language Teacher Number Number λl Teaches(Frank)(l ) λ t , n (n 2 2 Teacher Language = COUNT (λ l(Teaches(t )(l )) Language A further simplification could be made by omitting lambda Teacher and Is − born_in(t ).Town_name = Prague )) abstraction supposing that information about objects con- sidered is in the Bool(Language)-identity. Then we can Remark 1 In our conceptual framework we could conceive write each query as an attribute, i.e., a lambda expression depen- dent on possible worlds and time moments. In practice, we Teaches(John) = Teaches(Frank) 2sets omit λw (λ t …in its head. Clearly, the resulted lambda expressions can express more complex graph structures than In other words, an application is evaluated as the applica- k-hop queries, i.e., they can contribute to constructions of tion of an associated function to given arguments, a lambda new graphs from the original GDB. abstraction “constructs” a new function. In the conventional approach a valuation δ is used. Supposing this mapping we In querying GDBs, RPQs are of user importance. 
We will can assign objects to every variable occurring in a term. consider expressions For example, CS(Oxford_House)(’AM_training’) is TRUE, if there is the class AM_training in the Oxford_house λ x , yReg(x , y) school (’AM_training’ is a constant of type (Bool: Student)). It is true while ’AM_training’ contains all stu- where Reg is an expression simulating concatenation and dents of the class AM_training. closure. There are two styles how to construct Reg. First, In accordance with the semantics of the quantifiers consider the attribute and the singularizer, we can write simply ∀x (M ) instead of (λ x (M )). Similarly, ∃x (M ) replaces (λ x (M )). Friends_Of _Friend/((Bool : Person) : Person), Finally, we write I (λ x (M )) shortly as onlyone x (M ) and read “the only x such as M”. Certainly, M/Bool. abbreviated as FOF. FOF( p )(p ) = TRUE expresses the fact i j From the database point of view, we have at disposal a that there is an edge between p and p in the associated GDB. i j powerful declarative language for formulating schema trans- Then formation, ICs, etc., even querying GDBs, as we shall see in Sect. 5.1. Section 5.2 shows that a similar technique can be FOF (p , p ) used in context of XML trees. 1 k will denote the expression FOF( p )(p ) and … and 5.2 Querying graph data functionally 1 2 FOF( p ,)( p ),for a k (k >1), and k−1 k The LT language can be used as a theoretical tool for building λ x , yFOF (x , y) a functional database language. A query in such language is expressed by a term of LT, e.g., provides a set of couples ( p , p ), where there is a directed 1 k path from p to p along edges of FOF. λ t, n(n =COUNT (λ l (∃d, h, rTeaches(t )(d,h,r, l)))) 1 k Language Now we will consider a single-valued function, e.g., with d, h, r, and l of types Teacher, Day, Hour, Room, Language, respectively. Manager_of /(Person : Person), 123 Vietnam J Comput Sci (2018) 5:95–105 103 abbreviated as MO. 
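The set comparison, the COUNT query and the $\text{FOF}^k$ path expression above can be mimicked over a toy valuation. The following is a minimal sketch in Python rather than LT: the dictionaries `TEACHES` and `FOF`, their contents, and all helper names are assumptions made for illustration, and only the attribute names come from the text. Sets stand in for characteristic functions, so $=_{\text{2sets}}$ becomes ordinary set equality.

```python
# Hypothetical toy valuation of two attributes (illustrative data only).
# Teaches: Teacher -> set of (Day, Hour, Room, Language) tuples.
TEACHES = {
    "John":  {("Mon", 6, "S1", "German"), ("Thu", 4, "S2", "English")},
    "Frank": {("Tue", 2, "S3", "English"), ("Fri", 1, "S1", "German")},
    "Uli":   {("Thu", 4, "S2", "English")},
}
# FOF: Person -> set of Persons (directed edges a->b, b->c, c->a).
FOF = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": set()}


def languages(teacher):
    """Mimics: lambda l (exists d, h, r Teaches(teacher)(d, h, r, l))."""
    return {lang for (_day, _hour, _room, lang) in TEACHES.get(teacher, set())}


def same_languages(t1, t2):
    """Mimics: Teaches(t1) =_2sets Teaches(t2), restricted to languages."""
    return languages(t1) == languages(t2)


def language_counts():
    """Mimics: lambda t, n (n = COUNT(lambda l (Teaches(t)(l))))."""
    return {(t, len(languages(t))) for t in TEACHES}


def fof_k(k):
    """Mimics: lambda x, y FOF^k(x, y), i.e. couples joined by k-1 FOF edges."""
    if k < 2:
        raise ValueError("FOF^k is defined in the text only for k > 1")
    # k = 2: couples joined by a single edge.
    pairs = {(p, q) for p, targets in FOF.items() for q in targets}
    # Each further step extends every path by one more FOF edge.
    for _ in range(k - 2):
        pairs = {(p, r) for (p, q) in pairs for r in FOF.get(q, set())}
    return pairs


if __name__ == "__main__":
    print(same_languages("John", "Frank"))
    print(sorted(language_counts()))
    print(sorted(fof_k(3)))
```

The sketch evaluates each term eagerly over in-memory sets; it says nothing about how a GDBMS would plan or index such queries.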
$\text{MO}^l(p)$, where p/Person, will denote the expression MO(…(MO(p))…) with l applications of MO, for an l ≥ 1, and

$$\lambda x, y\ (\text{MO}^l(x) = y)$$

provides a set of couples $(p_1, p_k)$, where there is a directed path from $p_1$ to $p_k$ along edges MO.

A composition of different functions representing relationships and properties looks, e.g., as follows. The following term expresses a YES/NO query "Is John born in Berlin?"

$$\text{Tw}(\text{Is\_born}(\text{John}).\text{Town}).\text{Town\_name} = \text{'Berlin'}$$

One of the most fundamental problems in graph processing is pattern matching. Specifically, a pattern match query searches over a GDB to look for the existence of a pattern graph in the GDB. For example, triangle counting is used heavily in social network analysis. It is easy to formulate terms providing sets of triples $(p_1, p_2, p_3)$ such that there is a path $p_1 \to p_2 \to p_3 \to p_1$ in the GDB. Clearly, we have to suppose oriented labelled edges, i.e., attribute names, e.g., FOF. Finding only structural triangles, regardless of edge names, would require expanding the typing of variables and moving to the second-order typed lambda calculus.

We recall that LT is not computationally complete. However, its computational power can be increased by adding new built-in functions into Func. In other words, LT is extendible with various mathematical functions, including logical operators.

5.3 Querying XML trees functionally

First, we extend the type system defined in Sect. 4.1 by the union type $(T_1 + T_2)$ denoting the union of sets $T_1$ and $T_2$.

Now we introduce the type system $T_{reg}$. Let B = {PCDATA, BOOL, NAME}. The type system $T_{reg}$ over B is recursively defined as follows.

The type system $T_{reg}$ describes regular expressions over character data, similarly to what is allowed in DTDs. Suppose that PCDATA is interpreted as a set of character data (strings). Then tag:PCDATA denotes the set of character data labelled by tag. The type tag: denotes the empty labelled character object. $(T_1 \cup T_2)$ denotes the set of objects of type $T_1 \cup T_2$. T* is an abbreviation for (BOOL:T). In this case, (BOOL:T)-objects include ∅, which simulates the empty set. Similarly, T+ denotes the set of (BOOL:T)-objects except ∅, and T? the set of objects of type T ∪ NIL.

Now we will define (XML) element types, following the established type system $T_{reg}$ over B. In order to distinguish tags used in $T_{reg}$ from element tags, we will suppose for each tag used in the elementary types tag:T or tag: the existence of the TAG name denoting an associated element type (this distinction is only formal and can be made in any other way in practice). The same holds for any tag ∈ NAME. Thus, NAME contains both tags and TAGs.

Let $T_{reg}$ over B be the type system and E be a set of abstract elements. Then the type system $T_E$ induced by $T_{reg}$ (or $T_E$ if $T_{reg}$ is understood) containing the regular element expressions is given by the following grammar:

Elementary element types and regular element expressions $\text{TAG}:(E_1, \ldots, E_n)$ are called element types.

The functional semantics of the element types associates with TAG:PCDATA the set of all functions from E to tag:PCDATA. For a non-elementary element type T, the semantics of TAG:T is also functional, but the functions are more complex.

Then an XML-database schema, $S_{XML}$, is a set of variables of types from $T_E$. Given an XML-database schema $S_{XML}$, an XML-database is any valuation of these variables. Thus, certain variables serve for referencing the associated database. For convenience, we denote the variables from $S_{XML}$ by the same names as TAGs from $T_E$, e.g., BOOK, AUTHOR, etc. For example, a number of types associated with the DTD in Fig. 7 can look as follows:

TEACHERS : (TEACHER*)
TOWN : (TOWN_NAME, POPULATION?)
TOWN_NAME : PCDATA

To manipulate XML data from an XML-database based on $T_E$, it is only necessary to extend the original LT language by tagged terms K:M, where K/NAME. If M/T, then K:M/(T:E). The resulting language version, called XML-λ in [8], makes it possible to simulate much of the XPath language (https://www.w3.org/TR/xpath/, retrieved on 10.9.2017) as well as first-order logic, aggregation functions, arithmetic, and user-defined functions. Obviously, a more user-friendly notation can be used. For example,

POPULATION(TOWN(IS_BORN_IN(TEACHER(e))))

can be rewritten as

e.TEACHER.IS_BORN_IN.TOWN.POPULATION

When we use a path in several logical conditions, it is possible to write the common prefix only once in XML-λ and to put the conditions in parentheses. For example,

λx(TEACHER(IS_BORN_IN.TOWN.TOWN_NAME = LIVES_IN.TOWN.TOWN_NAME and T_NAME = x))

denotes the query "Give a set of names of teachers who live in the same town where they were born".

6 Conclusions

The objective of this paper was to provide an alternative approach to GDB querying based on a functional approach. Compared to other graph query languages, the functional language LT designed here is based on the notion of graph conceptual schema using the notion of attribute. The approach is based on the idea that the conceptual view can be directly conceived as a database view. More technically, only an appropriate database implementation of conceptual structures is necessary. Query languages based on this approach are usable in environments where a GDB is searched for collecting and aggregating information from nodes and relationships rather than for extraction of only structural patterns.

We have seen that not only labelled property graphs but also XML trees provide an application environment for the functional approach. Only the type system is different.

Finally, a few words about the usability of functional querying in GDBs. All the techniques associated with a GDBMS and supported in any graph search engine should fulfil the so-called FAE rule [5]. The FAE rule says that the quality of search engines includes three key factors: Friendliness, Accuracy and Efficiency, i.e., a good search engine must provide the users with a friendly query interface and highly accurate answers in a fast way. Friendliness of our functional language is still missing. This is the main challenge for future work.

Acknowledgements This work was supported by the Charles University project Q48.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Angles, R.: A comparison of current graph database models. In: IEEE 28th Int. Conference on Data Engineering Workshops, pp. 171–177 (2012)
2. Bai, Y., Wang, Ch., Ning, Y., Wu, H., Wang, H.: G-path: flexible path pattern query on large graphs. WWW (Companion Volume), pp. 333–336 (2013)
3. Barker, R.: Case*Method: Entity Relationship Modeling. Addison-Wesley Publ. Comp., Boston (1990)
4. Gray, P.M.D., Kerschberg, L., King, P.J.H., Poulovassilis, A.: The Functional Approach to Data Management. Springer, Berlin (2004)
5. Ma, S., Li, J., Hu, Ch., Lin, X., Huai, J.: Big graph search: challenges and techniques. Front. Comput. Sci. 10(3), 387–398 (2016)
6. Ghrab, A., Romero, O., Skhiri, S., Vaisman, A., Zimányi, E.: GRAD: On Graph Database Modeling. Cornell University Library (2016). arXiv:1602.00503
7. Pokorný, J.: A function: unifying mechanism for entity-oriented database models. In: Batini, C. (ed.) Entity-Relationship Approach, pp. 165–181. Elsevier Science Publishers B.V, North-Holland (1989)
8. Pokorný, J.: XML functionally. In: Desai, B.C., Kiyoki, Y., Toyama, M. (eds.) Proc. of IDEAS2000, pp. 266–274. IEEE Comp. Society (2000)
9. Pokorný, J.: XML querying with functions. In: Kiyoki, Y. et al. (eds.) Proc. of the IASTED Int. Conf. Information Systems and Databases, pp. 204–209. Acta Press (2002)
10. Pokorný, J.: Database semantics in heterogeneous environment. In: Jeffery, K.G., Král, J., Bartošek, M. (eds.) Proc. of 23rd Seminar SOFSEM'96: Theory and Practice of Informatics, pp. 125–142. Springer-Verlag (1996)
11. Pokorný, J.: Graph databases: their power and limitations. In: Saeed, K., Homenda, W. (eds.) Proc. of 14th Int. Conf. on Computer Information Systems and Industrial Management Applications (CISIM), LNCS 9339, pp. 58–69. Springer, Berlin (2015)
12. Pokorný, J.: Conceptual and database modelling of graph databases. In: Desai, B. (ed.) Proc. of IDEAS'16, pp. 370–377. ACM (2016)
13. Pokorný, J., Valenta, M., Kovačič, J.: Integrity constraints in graph databases. In: Proc. of the 8th International Conference on Ambient Systems, Networks and Technologies (ANT 2017), 7th Int. Symp. on Frontiers in Ambient and Mobile Systems (FAMS 2017), pp. 975–981. Elsevier Science, Procedia Computer Science (2017). https://doi.org/10.1016/j.procs.2017.05.456
14. Pokorný, J.: Functional Querying in Graph Databases. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds.) Proc. of 9th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2017), Part I, LNCS 10191, pp. 291–301. Springer (2017)
15. Robinson, I., Webber, J., Eifrém, E.: Graph Databases. O'Reilly Media (2013)
16. Shipman, D.W.: The functional data model and the data language DAPLEX. ACM Trans. Database Syst. (TODS) 6(1), 140–173 (1981)
17. Stanek, G., Kolmar, S.: How Neo4j co-exists with Oracle RDBMS. White paper, Neo4j (2016)
18. Tiwari, S.: Professional NoSQL. Wiley/Wrox (2011)
19. van Rest, O., Hong, S., Kim, J., Meng, X., Chafi, H.: PGQL: a property graph query language. In: Proc. of the 4th Int. Workshop on GRADES, Redwood Shores, CA (2016)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Vietnam Journal of Computer Science
Springer Journals

Functional querying in graph databases

Publisher
Springer Berlin Heidelberg
Copyright
Copyright © 2017 by The Author(s)
Subject
Computer Science; Information Systems and Communication Service; Artificial Intelligence (incl. Robotics); Computer Applications; e-Commerce/e-business; Computer Systems Organization and Communication Networks; Computational Intelligence
ISSN
2196-8888
eISSN
2196-8896
D.O.I.
10.1007/s40595-017-0104-6

Abstract

We can also consider other functions that need no possi- 1 n 1 n In the conceptual modelling, each base B consists of ble world. For example, aggregate functions like COUNT / descriptive and entity types. Descriptive types (String, ((Number:(Bool:S)), SUM/(Number:(Bool: Number, etc.) serve for domains of properties. Their ele- Number)) and arithmetic operations provide such func- ments are constants like ‘Prague‘, ‘John‘, ‘201400‘, etc. The elementary type Bool = {TRUE, FALSE} is also in B.The We suppose here, that SUM is defined on sets only. 123 Vietnam J Comput Sci (2018) 5:95–105 101 tions. These functions have the same behaviour in all possible and relationship types in the associated traditional GDB worlds and time moments. Consequently, we can distin- schema or at least their abbreviations. guish between two categories of functions: empirical (e.g. attributes) and analytical. The former are conceived as par- tial functions from the logical space. Range of these functions 5 Manipulating functions are again functions. Analytical functions are of type S, where S does not depend on W and T. A manipulating language for functions is traditionally a typed The notion of attribute applied in GDBs could be restricted lambda calculus. Our version of the typed lambda calculus on attributes of types(R:S), ((Bool:R):S), or directly supports manipulating objects typed by T introduced (Bool:R,S), where R and S are entity types. This strategy in Sect. 4.1. We will suppose a collection Func of constants, simply covers binary functional types, binary multivalued each having a fixed type, and denumerable many variables of functional types, and binary relationships described as binary each type. Then the language of (lambda) terms LT is defined characteristic functions. The last option corresponds to m : n as follows: relationship types. For modelling directed graphs the first two Let types R, S, R ,…,R (n ≥ 1) are elements of T. 
1 n types are sufficient, because m : n relationships types can be expressed by two „inverse“ binary multivalued functional (1) Every variable of type R is a term of type R. types. Here we will consider always one of them (see, e.g., (2) Every constant (a member of Func) of type R is a term Teaches below). Thus, a graph database schema can reflect a of type R. reality only partially, similarly to non-functional graph data (3) If M is a term of type (S : R ,..., R ), and N ,..., N 1 n 1 n modelling. are terms of types R ,..., R , respectively, then 1 n Now we add properties. Properties describing entity M (N ,..., N ) is a term of type S./application/ 1 n types can be of types (S ,…,S :R), where S are descrip- (4) If x ,..., x are different variables of the respec- 1 m i 1 n tive elementary types and R is an entity type. So we tive types R ,..., R and M is a term of type S, 1 n deal with functional properties. Similarly, we can express then λx ,..., x (M ) is a term of type (S : R ,..., R ) 1 n 1 n properties of edges. They are of types (S ,…,S , R : /lambda abstraction/ 1 m 1 R ) and ((Bool : S ,…,S , R ) : R ) for binary func- (5) If N ,..., N are terms of typesR ,..., R , respectively, 2 1 m 1 2 1 n 1 n tional and binary multivalued functional types, respec- then tively. (N ,..., N ) is a term of type (R ,..., R )./tuple/ 1 n 1 n Then a functional database schema describing GDB in (6) If M is a term of type (R ,..., R ), then 1 n Fig. 1 can look as: M [1],…,M[n] are terms of respective types R ,..., R . 1 n La/((Name, Textbook):Language) /components/ Te/((T_ID, T_Name, Birth_year):Teacher) Tw/((Town_name, Population):Town) 5.1 From the LT language to a more user-friendly Teaches/((Bool:Day, Hour, Room, notation Language):Teacher) Is_Born_in/((Date, Town):Teacher) Instead of the position notation we can use more read- Lives_in/((From,Town):Teacher) able dot notation for components. 
Consider the Prague where La, Te, Tw, Teaches, Is_Born_in and Lives_in are typed object (entity, node). Then instead of Tw(Prague) [2], variables. The use of variables correspond to a “database where Prague/Town, we can write Tw(Prague). view” of the modelled world. Since we do not consider tem- Population. The real effect of this convention is that we poral functional databases here, it is not necessary to use time approach Population property independently on its position explicitly. in the tuple described by the associated functional database We remark, however, that our functional GDBs with such schema. In the case that the population is not present between schemas can contain isolated nodes with at least one prop- properties of the town Prague, the function will be undefined. erty. IDs of edges are not necessary, because edges are not Terms are interpreted in a standard way by means of an explicitly considered. interpretation assigning to each function symbol from Func Thus, a functional graph database schema is a set of an object of the same type, and by a semantic mapping variables of types from T restricted to attribute types. A func- [ ] from LT into all functions given by T. Func influ- tional graph database is any evaluation of these variables. ences the expressive power of LT. It can contain usual Thus, certain variables serve for referencing the associated arithmetic and aggregation functions, etc. Of a special database. For convenience, we denote the variables from the importance is R-identity = usable for (Bool:(Bool:S),(Bool:S)) functional graph database schema by names remaining entity a comparison of two sets of S-objects. We will denote the (Bool:(Bool:S),(Bool:S))type as 2sets for 123 102 Vietnam J Comput Sci (2018) 5:95–105 better readability. Consider Teacher-objects John and Using the syntactic abbreviation similar to that one used Frank. 
Then with in Section 4.3, the resulted lambda expression Language Language Teacher Number Number λl ∃d , h , r , Teaches(John)(d , h , r , l ) = 1 1 1 1 1 1 2sets 1 1 λt , n (n Language Language λl ∃d , h , r , Teaches(Frank)(d , h , r , l ) Teacher Language 2 2 2 2 2 2 2 2 = COUNT (λ l(Teaches(t )(l )))) Language we can test whether John and Frank teach the same set of denotes the query “Give a set of couples associating with languages. Similarly to domain relational calculus, we can each teacher the number of languages teaching by him/her”. simplify this expression by omitting some existential quan- Obviously, the query could be also reformulated as λ t (λ n tifiers and associated variables. Then the resulted expression (…)), or more conventionally λ t onlyone n (…). can look as To consider only teachers born in Prague the lambda Language Language expression might look like λl Teaches(John)(l ) = 2sets 1 1 Language Language Teacher Number Number λl Teaches(Frank)(l ) λ t , n (n 2 2 Teacher Language = COUNT (λ l(Teaches(t )(l )) Language A further simplification could be made by omitting lambda Teacher and Is − born_in(t ).Town_name = Prague )) abstraction supposing that information about objects con- sidered is in the Bool(Language)-identity. Then we can Remark 1 In our conceptual framework we could conceive write each query as an attribute, i.e., a lambda expression depen- dent on possible worlds and time moments. In practice, we Teaches(John) = Teaches(Frank) 2sets omit λw (λ t …in its head. Clearly, the resulted lambda expressions can express more complex graph structures than In other words, an application is evaluated as the applica- k-hop queries, i.e., they can contribute to constructions of tion of an associated function to given arguments, a lambda new graphs from the original GDB. abstraction “constructs” a new function. In the conventional approach a valuation δ is used. Supposing this mapping we In querying GDBs, RPQs are of user importance. 
We will can assign objects to every variable occurring in a term. consider expressions For example, CS(Oxford_House)(’AM_training’) is TRUE, if there is the class AM_training in the Oxford_house λ x , yReg(x , y) school (’AM_training’ is a constant of type (Bool: Student)). It is true while ’AM_training’ contains all stu- where Reg is an expression simulating concatenation and dents of the class AM_training. closure. There are two styles how to construct Reg. First, In accordance with the semantics of the quantifiers consider the attribute and the singularizer, we can write simply ∀x (M ) instead of (λ x (M )). Similarly, ∃x (M ) replaces (λ x (M )). Friends_Of _Friend/((Bool : Person) : Person), Finally, we write I (λ x (M )) shortly as onlyone x (M ) and read “the only x such as M”. Certainly, M/Bool. abbreviated as FOF. FOF( p )(p ) = TRUE expresses the fact i j From the database point of view, we have at disposal a that there is an edge between p and p in the associated GDB. i j powerful declarative language for formulating schema trans- Then formation, ICs, etc., even querying GDBs, as we shall see in Sect. 5.1. Section 5.2 shows that a similar technique can be FOF (p , p ) used in context of XML trees. 1 k will denote the expression FOF( p )(p ) and … and 5.2 Querying graph data functionally 1 2 FOF( p ,)( p ),for a k (k >1), and k−1 k The LT language can be used as a theoretical tool for building λ x , yFOF (x , y) a functional database language. A query in such language is expressed by a term of LT, e.g., provides a set of couples ( p , p ), where there is a directed 1 k path from p to p along edges of FOF. λ t, n(n =COUNT (λ l (∃d, h, rTeaches(t )(d,h,r, l)))) 1 k Language Now we will consider a single-valued function, e.g., with d, h, r, and l of types Teacher, Day, Hour, Room, Language, respectively. Manager_of /(Person : Person), 123 Vietnam J Comput Sci (2018) 5:95–105 103 abbreviated as MO. 
MO (p), where p/Person, will denote is an abbreviation for (BOOL:T). In this case, (BOOL:T)- the expression MO(…(MO( p)…)) with l applications of MO, objects include ∅ which simulates the empty set. Similarly, for a l ≥1, and T + denotes the set of (BOOL:T)-objects except ∅, and T? the set of objects of type T ∪ NIL. λx , y (MO (x ) = y) Now we will define (XML) element types, following the established type system T over B. In order to distinguish reg provides a set of couples ( p , p ), where there is a directed tags used in T from element tags we will suppose for each 1 k reg path from p to p along edges MO. tag used in the elementary types tag:T or tag: the existence 1 k A composition of different functions representing rela- of the TAG name denoting an associate element type .The tionships and properties looks, e.g., as follows. The fol- same holds for any tag ∈ NAME. Thus, NAME contains both lowing term expresses a YES/NO query “Is John born in tags and TAGs. Berlin?” Let T over B be the type system and E be a set of reg abstract elements. Then the type system T induced by T E reg Tw(Is_born(John).Town).Town_name = Berlin (or T ifT is understood) containing the regular element E reg expressions is given by the following grammar: One of the most fundamental problems in graph process- ing is pattern matching. Specifically, a pattern match query searches over a GDB to look for the existence of a pattern graph in a GDB. For example, triangle counting is used heavily in social network analysis. It is easy to formulate terms providing sets of triples (p , p , p ) and there is a 1 2 3 path p → p → p → p in GDB. Clearly, 1 2 3 1 Elementary element types and regular element expres- we have to suppose oriented labelled edges, i.e., attribute sions TAG:(E ,..., E ) are called element types. 1 n names, e.g., FOF. 
Finding only structural triangles, regard- The functional semantics of the element types asso- less of edge names, would require to expand the typing of ciate with TAG:PCDATA the set of all functions from E to variables and to move to the second-order typed lambda cal- tag:PCDATA. For a non-elementary element type T, the culus. semantics of TAG:T are also functional, but the functions We remind that the LT is not computationally complete. are more complex. But it makes possible to increase its computational power by Then an XML-database schema, S , is a set of variables XML adding new built-in functions into Func. In other words, LT of types from T . Given an XML-database schema S ,an E XML is extendible with various mathematical functions, including XML-database is any valuation of these variables. Thus, cer- logical operators. tain variables serve for referencing the associated database. For convenience, we denote the variables from S by the XML 5.3 Querying XML trees functionally same names as TAGs from T ,e.g. BOOK, AUTHOR, etc. For example, a number of types associated with DTD in Fig. 7 First, we extend the type system defined in Sect. 4.1 by the can look as follows: union type (T + T ) denoting union of sets T and T . 1 2 1 2 Now we introduce the type system T .Let B = {PCDATA, reg TEACHERS : (TEACHER∗) BOOL, NAME}.The type system T over B is recursively reg TOWN : (TOWN_NAME, POPULATION?) defined as follows. TOWN_NAME : PCDATA To manipulate XML data from XML-database based on T , it is necessary only to extend the original LT language by tagged terms K:M, where K/NAME.If M/T, then K:M/(T:E). The resulted language version called XML-λ in [8] enables to simulate much of the XPath language as well as the 1st order logic, aggregation functions, arithmetic, and user The type system T describes regular expressions over reg defined functions. Obviously, a more user-friendly notation character data, similarly as it is allowed in DTDs. 
Suppose that PCDATA is interpreted as a set of character data (strings). Then tag:PCDATA denotes the set of character data labelled This distinguishing is only formal and can be done in any other way by tag. The type tag: denotes the empty labelled character in practice. object. (T ∪T ) denotes the set of objects of type T ∪T . T* https://www.w3.org/TR/xpath/ (retrieved on 10.9.2017). 1 2 1 2 123 104 Vietnam J Comput Sci (2018) 5:95–105 can be used. For example, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. POPULATION(TOWN(IS_BORN_IN(TEACHER(e)))) can be rewritten as e.TEACHER.IS_BORN_IN.TOWN.POPULATION References When we use a path in more logical conditions, it is possible 1. Angels, R.: A comparison of current graph database models. In: to write the common prefix only once in XML-λ and to put IEEE 28th Int. Conference on Data Engineering Workshops, pp. conditions in parentheses. For example, 171–177 (2012) 2. Bai, Y., Wang, Ch., Ning, Y., Wu, H., Wang, H.: G-path: flexible λx(TEACHER(IS_BORN_IN.TOWN.TOWN_NAME = path pattern query on large graphs, pp. 333–336. WWW (Compan- ion Volume) (2013) LIVES_IN.TOWN.TOWN_NAMEandT_NAME = x)) 3. Barker, R.: Case*Method: Entity Relationship Modeling. Addison- Wesley Publ. Comp., Boston (1990) denotes the query “Give a set of teachers names who live in 4. Gray, P.M.D., Kerschberg, L., King, P.J.H., Poulovassilis, A.: The Functional Approach to Data Management. Springer, Berlin (2004) the same town where they were born”. 5. Ma, S., Li, J., Hu, Ch., Lin, X., Huai, J.: Big graph search: chal- lenges and techniques. Front. Comput. Sci. 10(3), 387–398 (2016) 6. Ghrab, A., Romero, O., Skhiri, S., Vaisman, A., Zimányi, E.. 6 Conclusions GRAD: On Graph Database Modeling. Cornel University Library (2016). arXiv:1602.00503 7. 
Pokorný, J.: A function: unifying mechanism for entity-oriented The objective of this paper was to provide an alternative database models. In: Batini, C. (ed.) Entity-Relationship Approach, approach to GDB querying based on a functional approach. pp. 165–181. Elsevier Science Publishers B.V, North-Holland Comparing to other graph query languages, the functional (1989) 8. Pokorny, J.: XML functionally. In: Desai, B.C., Kiyoki, Y., Toyama, language LT designed here is based on the notion of M. (eds.) Proc. of IDEAS2000, pp. 266-274. IEEE Comp. Society graph conceptual schema using the notion of attribute. The (2000) approach is based on the idea that the conceptual view can 9. Pokorný, J.: XML querying with functions. In: Kiyoki, Y. et al be directly conceived as a database view. More technically, (eds.) Proc. of the IASTED Int. Conf. Information Systems and Databases, pp. 204–209. Acta Press (2002) only an appropriate database implementation of concep- 10. Pokorný, J.: Database semantics in heterogeneous environment. In: tual structures is necessary. Query languages based on this Jeffery, K.G.. Král, J., Bartošek M. (eds.) Proc. of 23rd Seminar approach are usable in environments where GDB is searched SOFSEM◦96: Theory and Practice of Informatics, pp. 125–142. Springer-Verlag (1996) for collecting and aggregating information from nodes and 11. Pokorný, J.: Graph databases: their power and limitations. In: relationships rather than extractions of only structural pat- Saeed, K. and Homenda, W. (eds.) Proc. of 14th Int. Conf. terns. on Computer Information Systems and Industrial Management We have seen that not only labelled property graphs but Applications (CISIM), LNCS 9339, pp. 58–69. Springer, Berlin (2015) also XML trees provide application environment for the func- 12. Pokorný, J.: Conceptual and database modelling of graph tional approach. Only the type system is different. databases. In: Desai, B. (ed.) Proc. of IDEAS’ 16, pp. 370–377. 
Finally, a few words about usability of functional query- ACM (2016) ing in GDBs. All the techniques associated with GDBMS and 13. Pokorný, J., Valenta, M., Kovaci ˇ c, ˇ J.: Integrity constraints in graph databases. In: Proc. of the 8th International Conference on Ambient supported in any graph search engine should fulfil so called Systems, Networks and Technologies (ANT 2017), 7th Int. Symp. FAE rule [5]. The FAE rule says that the quality of search on Frontiers in Ambient and Mobile Systems (FAMS 2017), pp. engines includes three key factors: Friendliness, Accuracy 975–981. Elsevier Science, Procedia Computer Science (2017). and Efficiency, i.e., that a good search engine must provide https://doi.org/10.1016/j.procs.2017.05.456 the users with a friendly query interface and highly accurate 14. Pokorný, J.: Functional Querying in Graph Databases. In: Nguyen N., Tojo S., Nguyen L., Trawinski ´ B. (eds.) Proc. of 9th Asian Con- answers in a fast way. A friendliness of our functional lan- ference on Intelligent Information and Database Systems (ACIIDS guage is still missing till now. This is the main challenge for 2017), pp. 291–301, Part I, LNCS 10191. Springer (2017) future work. 15. Robinson, I., Webber, J., Eifrém, E.: Graph Databases. O’Reilly Media (2013) Acknowledgements This work was supported by the Charles Univer- 16. Shipman, D.W.: The functional data model and the data languages sity project Q48. DAPLEX. ACM Trans. Database Syst. (TODS) 6(1), 140–173 (1981) Open Access This article is distributed under the terms of the Creative 17. Stanek, G., Kolmar, S.: How Neo4j co-exists with Oracle RDBMS. White paper, Neo4j (2016) Commons Attribution 4.0 International License (http://creativecomm ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, 18. Tivari, S.: Professional NoSQL. Wiley/Wrox (2011) 123 Vietnam J Comput Sci (2018) 5:95–105 105 Publisher’s Note Springer Nature remains neutral with regard to juris- 19. 
van Rest, O., Hong, S., Kim, J., Meng, X., Chafi, H.: PGQL: a dictional claims in published maps and institutional affiliations. property graph query language. In: Proc. of the 4th Int. Workshop on GRADES, Redwood Shores, CA (2016)
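
Supplementary sketch for Sects. 5.1 and 5.2. The comparison of the language sets of two teachers and the COUNT query can be mimicked in ordinary code by treating the schema variables Teaches, Is_Born_in and Tw as finite partial functions. The Python sketch below is only an illustration under stated assumptions: the data values, the dictionary encoding, and the helper names `languages_of`, `teach_same_languages` and `language_counts` are hypothetical and do not come from the paper.

```python
# Hypothetical valuation of three schema variables from Sect. 4.2.
# Node IDs are plain strings; every data value below is invented.
TEACHES = {  # Teaches/((Bool:Day, Hour, Room, Language):Teacher)
    "John": {("Mon", 9, "R1", "English"), ("Tue", 10, "R2", "Czech")},
    "Frank": {("Wed", 8, "R3", "Czech"), ("Fri", 11, "R1", "English")},
    "Eve": {("Mon", 13, "R2", "German")},
}
IS_BORN_IN = {  # Is_Born_in/((Date, Town):Teacher)
    "John": ("1970-01-01", "Prague"),
    "Frank": ("1980-05-05", "Brno"),
    "Eve": ("1990-02-02", "Prague"),
}
TW = {  # Tw/((Town_name, Population):Town); Population may be missing
    "Prague": {"Town_name": "Prague", "Population": 1300000},
    "Brno": {"Town_name": "Brno"},
}


def languages_of(teacher):
    """Set denoted by: lambda l (exists d, h, r: Teaches(teacher)(d, h, r, l))."""
    return frozenset(l for (_d, _h, _r, l) in TEACHES.get(teacher, set()))


def teach_same_languages(t1, t2):
    """Counterpart of Teaches(t1) =_2sets Teaches(t2), restricted to languages."""
    return languages_of(t1) == languages_of(t2)


def language_counts(born_in=None):
    """Couples (t, n) with n = COUNT of languages taught by t.

    If born_in is given, keep only teachers t for which
    Tw(Is_Born_in(t).Town).Town_name equals born_in.
    """
    result = set()
    for t in TEACHES:
        if born_in is not None:
            birth = IS_BORN_IN.get(t)  # partial function: may be undefined
            if birth is None:
                continue
            _date, town = birth
            if TW.get(town, {}).get("Town_name") != born_in:
                continue
        result.add((t, len(languages_of(t))))
    return result
```

Calling `language_counts()` evaluates the unrestricted COUNT query, while `language_counts(born_in="Prague")` corresponds to the variant restricted to teachers born in Prague. Undefined attribute values are modelled here by missing dictionary keys, which is one possible reading of partial functions, not the only one.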

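Supplementary sketch for the path expressions of Sect. 5.2. The terms FOF^k, MO^l and the triangle pattern can be evaluated naively over small finite relations. The Python code below is a minimal sketch under assumptions: the graph data and the function names `fof_k`, `mo_l` and `triangles` are hypothetical, and the evaluation strategy (repeated relational composition) is chosen for brevity rather than efficiency.

```python
# Hypothetical valuations; all node IDs and edges are invented.
FOF = {"p1": {"p2"}, "p2": {"p3"}, "p3": {"p1", "p4"}, "p4": set()}
MO = {"p1": "p2", "p2": "p3"}  # Manager_of as a single-valued partial function


def fof_k(k):
    """Couples (p_1, p_k) with FOF(p_1)(p_2) and ... and FOF(p_{k-1})(p_k)."""
    if k < 2:
        raise ValueError("k must be greater than 1")
    pairs = {(x, x) for x in FOF}
    for _ in range(k - 1):  # a path over k nodes has k - 1 edges
        pairs = {(x, z) for (x, y) in pairs for z in FOF.get(y, set())}
    return pairs


def mo_l(l):
    """Couples (x, y) with MO^l(x) = y, i.e. l nested applications of MO."""
    if l < 1:
        raise ValueError("l must be at least 1")
    result = set()
    for x in MO:
        y = x
        for _ in range(l):
            y = MO.get(y)
            if y is None:  # MO is undefined here, so MO^l(x) is undefined
                break
        if y is not None:
            result.add((x, y))
    return result


def triangles():
    """Triples (p1, p2, p3) with a path p1 -> p2 -> p3 -> p1 along FOF edges."""
    return {
        (a, b, c)
        for a in FOF
        for b in FOF[a]
        for c in FOF.get(b, set())
        if a in FOF.get(c, set())
    }
```

Each directed triangle is reported once per starting node, so a single cycle over three nodes yields three triples; deduplicating them would need an extra normalisation step that the sketch leaves out.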
Journal

Vietnam Journal of Computer Science, Springer Journals

Published: Nov 10, 2017
