A novel framework for delivering static search capabilities to large textual corpora directly on the Web domain: an implementation for Migne’s Patrologia Graeca

Publisher: Emerald Publishing
Copyright: © Emerald Publishing Limited
ISSN: 1744-0084
DOI: 10.1108/ijwis-10-2020-0062

Abstract

Purpose: This study aims to provide a system capable of static searching across a large number of unstructured texts directly on the Web domain while keeping costs to a minimum. The proposed framework is applied to the unstructured texts of Migne’s Patrologia Graeca (PG) collection, which serves as an implementation example of the method.

Design/methodology/approach: The unstructured texts of PG are automatically transformed into a read-only "not only Structured Query Language" (NoSQL) database whose structure is identical to that of a representational state transfer (REST) access point interface. The transformation makes it possible to execute queries and retrieve ranked results based on a specialized application of the extended Boolean model.

Findings: Using a purpose-built, Web-browser-based search tool, the user can quickly locate ranked, relevant fragments of text and navigate back and forth between them. The user can search by the initial part of words and can ignore the diacritics of the Greek language. The performance of the search system is compared across different versions of the hypertext transfer protocol (HTTP), for various network latencies and different modes of network connection. Queries over HTTP/2 perform by far the best, compared with any of the HTTP/1.1 modes.

Originality/value: The system is not limited to the case study of PG and has a generic application in the field of humanities. The system can be extended with semantic enrichment by taking synonyms and topics into account, if they are available. The system’s main advantage is that it is entirely static, which implies important features such as simplicity, efficiency, fast response, portability, security and scalability.
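
The abstract does not give implementation details, so the following is only an illustrative sketch of the three ideas it names: diacritic-insensitive matching, prefix lookup against a static REST-like file layout, and extended Boolean ranking. The function names, the `index/.../postings.json` path layout and the choice of the standard p-norm formulation of the extended Boolean model are all assumptions made for illustration. They are not taken from the paper, which describes its own "specialized application" of the model.

```python
import unicodedata


def normalize(word: str) -> str:
    """Lowercase a word and strip combining diacritics (Greek accents, breathings)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold final sigma into the medial form so both spellings compare equal.
    return stripped.replace("ς", "σ")


def static_path(prefix: str, root: str = "index") -> str:
    """Map a word prefix to a hypothetical static file path, one directory per letter.

    A plain static file server could then answer a prefix query with a single GET.
    """
    return "/".join([root, *normalize(prefix)]) + "/postings.json"


def p_norm_or(weights: list[float], p: float = 2.0) -> float:
    """Standard p-norm OR score for term weights in [0, 1]."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)


def p_norm_and(weights: list[float], p: float = 2.0) -> float:
    """Standard p-norm AND score for term weights in [0, 1]."""
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)


def rank(fragments: dict[str, list[float]], p: float = 2.0) -> list[str]:
    """Order fragment ids by descending AND score of their per-term weights."""
    return sorted(fragments, key=lambda fid: p_norm_and(fragments[fid], p), reverse=True)
```

With this layout, a query for the prefix `Λόγ` would resolve to a single static resource (`index/λ/ο/γ/postings.json`), so no server-side code runs at query time. Because a multi-term query needs one such resource per term, many small requests are issued together, which is the kind of workload where the request multiplexing of HTTP/2 would be expected to help.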

Journal: International Journal of Web Information Systems (Emerald Publishing)

Published: Jul 27, 2021

Keywords: Performance of Web applications; Web search and information extraction; Web mining; Patrologia Graeca; Static search engine; NoSQL; RDBMS; REST API; Http-2; Http-1.1
