Exploring arXiv usage habits among Slovenian scientists
Metelko, Zala; Maver, Jasna
2023 Journal of Documentation
doi: 10.1108/jd-07-2022-0162
This study investigates how important the preprint archive arXiv is for Slovenian scientists, whether its importance differs between scientific disciplines, and what reputation arXiv has among Slovenian scientists. We are also interested in the advantages and disadvantages scientists see in using arXiv.
Design/methodology/approach: A voluntary sample of active researchers from the scientific fields covered by arXiv was used. Data were collected over 21 days in September 2021 using a 40-question online survey. In addition to descriptive statistics, nonparametric statistical methods such as Pearson's chi-squared test for independence, the Kruskal-Wallis H-test and the Mann-Whitney U-test were applied to the collected data.
Findings: Among Slovenian scientists there is a wide variety of arXiv users. The authors note differences among scientific disciplines. Physicists and astronomers are the most engaged, followed by mathematicians. Researchers in computer science, electrical engineering and systems science seem to have recognized the benefits of the archive but are still hesitant to use it. Researchers from the other scientific fields participated in the survey to a lesser extent, suggesting that arXiv is less popular in these fields. For Slovenian scientists, the main advantages of arXiv are faster access to knowledge, open access, greater impact of scientists' work and the fact that publishing in the archive is free of charge. A disadvantage of using the archive is the frustration caused by the difficulty of assessing the credibility of articles.
Research limitations/implications: A voluntary sample was used, which attracted a larger number of researchers but carries a higher risk of sampling bias.
Practical implications: The results are useful for international comparisons, and they also provide a basis and recommendations for institutional and national policies on evaluating researchers and their performance.
Originality/value: The results provide valuable insights into Slovenian scientists' arXiv usage habits and their reasons for using or not using arXiv. No comparable study has been conducted in Slovenia.
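Pearson's chi-squared test for independence, named in the methodology above, compares the observed counts in a contingency table with the counts expected if the two variables were independent. The following is a minimal Python sketch of the statistic, not the authors' analysis code. The table (three hypothetical disciplines crossed with arXiv use yes/no) and all of its counts are invented purely for illustration.

```python
def chi_squared_independence(table):
    """Pearson's chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts: rows are disciplines, columns are (uses arXiv, does not use arXiv).
counts = [
    [30, 10],  # e.g. physics and astronomy
    [20, 20],  # e.g. mathematics
    [10, 30],  # e.g. computer science
]
statistic, dof = chi_squared_independence(counts)
print(f"chi2 = {statistic:.2f}, dof = {dof}")
```

With these invented counts every expected cell is 20, so the statistic is 20.0 with 2 degrees of freedom, well above the 5% critical value of about 5.99. In practice, `scipy.stats.chi2_contingency` computes this statistic together with a p-value (applying Yates' continuity correction by default only to 2×2 tables), and `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` implement the other two tests named above.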
Website quality evaluation: a model for developing comprehensive assessment instruments based on key quality factors
Morales-Vargas, Alejandro; Pedraza-Jimenez, Rafael; Codina, Lluís
2023 Journal of Documentation
doi: 10.1108/jd-11-2022-0246
The field of website quality evaluation attracts the interest of a range of disciplines, each bringing its own particular perspective to bear. This study aims to identify the main characteristics (methods, techniques and tools) of the evaluation instruments described in this literature, with a specific focus on the factors analysed; based on these, a multipurpose model is proposed for developing new comprehensive instruments.
Design/methodology/approach: Following a systematic bibliographic review, 305 publications on website quality are examined: the field's leading authors, their disciplines of origin and the sectors to which the assessed websites belong are identified, and the methods they employ are characterised.
Findings: Evaluations of website quality tend to be conducted with one of three primary focuses: strategic, functional or experiential. Expert analysis predominates over user studies, and most of the instruments examined classify the characteristics to be evaluated (for example, usability and content) into factors that operate at different levels, although there is little agreement on the names used to refer to them.
Originality/value: Based on the factors detected in the 50 most cited works, a model is developed that classifies these factors into 13 dimensions and more than 120 general parameters. The resulting model provides a comprehensive evaluation framework and constitutes an initial step towards a shared conceptualization of the discipline of website quality.
Is dc:subject enough? A landscape on iconography and iconology statements of knowledge graphs in the semantic web
Baroncini, Sofia; Sartini, Bruno; Van Erp, Marieke; Tomasi, Francesca; Gangemi, Aldo
2023 Journal of Documentation
doi: 10.1108/jd-09-2022-0207
In the last few years, the size of Linked Open Data (LOD) describing artworks, in general or domain-specific Knowledge Graphs (KGs), has been gradually increasing. This provides (art-)historians and Cultural Heritage professionals with a wealth of information to explore. Specifically, structured data about iconographical and iconological (icon) aspects, i.e. information about the subjects, concepts and meanings of artworks, are extremely valuable for state-of-the-art computational tools, e.g. content recognition through computer vision. Nevertheless, a data quality evaluation for art domains, fundamental for data reuse, is still missing. The purpose of this study is to fill this gap with an overview of art-historical data quality in current KGs, with a focus on icon aspects.
Design/methodology/approach: This study's analyses are based on established KG evaluation methodologies, adapted to the domain by addressing requirements from art historians' theories. The authors first select several KGs according to Semantic Web principles. Then, the authors evaluate (1) the suitability of their structures for describing icon information, through quantitative and qualitative assessment, and (2) their content, qualitatively assessed in terms of correctness and completeness.
Findings: This study's results reveal several issues in the current expression of icon information in KGs. The content evaluation shows that these domain-specific statements are generally correct but often incomplete. The incompleteness is confirmed by the structure evaluation, which shows that the KG schemas are unsuitable for describing icon information at the required granularity.
Originality/value: The main contribution of this work is an overview of the actual landscape of icon information expressed in LOD. It is therefore valuable to cultural institutions, providing them with a first domain-specific data quality evaluation. Since this study's results suggest that the selected domain information is underrepresented in Semantic Web datasets, the authors highlight the need to create and foster such information to give LOD a more thorough art-historical dimension.
Optical character recognition quality affects subjective user perception of historical newspaper clippings
Kettunen, Kimmo; Keskustalo, Heikki; Kumpulainen, Sanna; Pääkkönen, Tuula; Rautiainen, Juha
2023 Journal of Documentation
doi: 10.1108/jd-01-2023-0002
This study aims to identify how users perceive texts produced by optical character recognition (OCR) of differing quality. The purpose of this paper is to study the effect of OCR quality on users' subjective perception through an interactive information retrieval task with a collection from one digitized historical Finnish newspaper.
Design/methodology/approach: This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar 1869–1918, which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different quality OCR. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences between the otherwise identical articles.
Findings: The main result of the study is that improved OCR quality positively affects users' subjective perception of historical newspaper articles: higher relevance scores are given to better-quality texts.
Originality/value: To the best of the authors' knowledge, this simulated interactive work task experiment is the first to show empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.
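In the design above, users grade the top 10 results on a 0–3 relevance scale, and each article exists in two OCR-quality versions. A minimal Python sketch of how such judgments could be aggregated per OCR version follows. The version labels and the judgments are hypothetical, and a simple mean is only one possible summary; it is not the study's own analysis.

```python
from collections import defaultdict
from statistics import mean


def mean_relevance_by_version(judgments):
    """Average graded relevance (0-3) for each OCR version label."""
    scores = defaultdict(list)
    for version, score in judgments:
        if not 0 <= score <= 3:
            raise ValueError(f"relevance score outside the 0-3 scale: {score}")
        scores[version].append(score)
    return {version: mean(values) for version, values in scores.items()}


# Hypothetical (ocr_version, relevance_score) pairs, one per judged result.
judgments = [
    ("old_ocr", 1), ("new_ocr", 2),
    ("old_ocr", 2), ("new_ocr", 3),
    ("old_ocr", 0), ("new_ocr", 2),
]
averages = mean_relevance_by_version(judgments)
print(averages)
```

Because a graded relevance scale is ordinal, a rank-based comparison of the two versions may be more defensible than comparing means. The sketch only shows the grouping step.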
Open access books through open data sources: assessing prevalence, providers, and preservation
Laakso, Mikael
2023 Journal of Documentation
doi: 10.1108/jd-02-2023-0016
Science policy and practice for open access (OA) books is a rapidly evolving area in the scholarly domain. However, much remains unknown, including how many OA books there are and to what degree they are covered by preservation services. The purpose of this study is to help fill this knowledge gap in order to advance both research and practice in the domain of OA books.
Design/methodology/approach: This study used open bibliometric data sources to aggregate a harmonized dataset of metadata records for OA books (data sources: the Directory of Open Access Books, OpenAIRE, OpenAlex, SciELO Books, The Lens, and WorldCat). This dataset was then cross-matched, based on unique identifiers and book titles, against openly available content listings of trusted preservation services (data sources: Cariniana Network, CLOCKSS, Global LOCKSS Network, and Portico). The web domains of the OA books were determined by querying the web addresses or digital object identifiers provided in the metadata of the bibliometric database entries.
Findings: In total, 396,995 unique records were identified from the OA book bibliometric sources, of which 19% were found to be included in at least one of the preservation services. The results give reason for concern about the long tail of OA books distributed across thousands of different web domains, because these include volatile cloud storage, and some no longer contained the files at all.
Research limitations/implications: Data quality issues, varying definitions of OA across services and inconsistent implementation of unique identifiers were identified as key challenges. The study includes recommendations for publishers, libraries, data providers and preservation services for improving monitoring and practices for OA book preservation.
Originality/value: This study provides methodological and empirical findings for advancing the practices of OA book publishing, preservation and research.
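The cross-matching step described above pairs OA book records with preservation-service listings by unique identifier and by book title. The following is a minimal Python sketch that rests on several assumptions. Records are plain dicts with hypothetical `ids` and `title` keys. Only DOI prefixes are normalized. Titles are compared after accent, punctuation and whitespace normalization. The study's actual matching rules are not described here, so this is an illustration rather than a reconstruction.

```python
import re
import unicodedata


def normalize_title(title):
    """Casefold, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def normalize_id(identifier):
    """Lower-case an identifier and drop a leading DOI prefix (hypothetical rule set)."""
    identifier = identifier.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if identifier.startswith(prefix):
            return identifier[len(prefix):]
    return identifier


def preservation_coverage(books, preserved):
    """Share of book records matching at least one preserved record by ID or title."""
    preserved_ids = {normalize_id(i) for rec in preserved for i in rec.get("ids", [])}
    preserved_ids.discard("")  # never match on an empty identifier
    preserved_titles = {normalize_title(rec.get("title", "")) for rec in preserved}
    preserved_titles.discard("")  # never match on an empty title

    matched = 0
    for book in books:
        book_ids = {normalize_id(i) for i in book.get("ids", [])}
        title = normalize_title(book.get("title", ""))
        if book_ids & preserved_ids or (title and title in preserved_titles):
            matched += 1
    return matched / len(books) if books else 0.0


# Hypothetical records.
books = [
    {"ids": ["doi:10.1234/ABC"], "title": "Open Science"},
    {"ids": [], "title": "Café Culture: A History"},
    {"ids": ["10.9999/xyz"], "title": "An Unpreserved Book"},
    {"ids": [], "title": "Another Title"},
]
preserved = [
    {"ids": ["https://doi.org/10.1234/abc"], "title": "Different Title"},
    {"ids": [], "title": "cafe culture a history"},
]
print(preservation_coverage(books, preserved))
```

Identifiers are tried before titles because title-only matching can produce false positives, for example different editions or generic titles. This is one way inconsistent identifier implementation, noted above as a key challenge, can affect coverage estimates.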
Revisiting the notion of the public library as a meeting place: challenges to the mission of promoting democracy in times of political turmoil
Carlsson, Hanna; Hanell, Fredrik; Engström, Lisa
2023 Journal of Documentation
doi: 10.1108/jd-03-2023-0061
This article explores how public librarians understand and perform the democratic mission of public libraries in times of political and social turbulence, and critically discusses the idea of public libraries as meeting places.
Design/methodology/approach: Five group interviews conducted with public librarians in southern Sweden are analyzed using a typology of four perspectives on democracy.
Findings: Two perspectives on democracy are commonly represented: social-liberal democracy, which focuses on libraries as promoters of equality, and deliberative democracy, which focuses on the library as a place for rational deliberation. Two professional dilemmas in particular present challenges to librarians: how to handle undemocratic voices and how to be a library for all.
Originality/value: The analysis points to a need to rethink the idea of the meeting place and offers a rare example of an empirically based argument for the benefits of plural agonistics for analyzing and strengthening the democratic role of public libraries.
An analysis of citing and referencing habits across all scholarly disciplines: approaches and trends in bibliographic referencing and citing practices
Santos, Erika Alves dos; Peroni, Silvio; Mucheroni, Marcos Luiz
2023 Journal of Documentation
doi: 10.1108/jd-10-2022-0234
In this study, the authors aim to identify possible current causes of citing and referencing errors in the scholarly literature and to determine whether anything has changed since the snapshot provided by Sweetland in his 1989 paper.
Design/methodology/approach: The authors analysed reference elements, i.e. bibliographic references, mentions, quotations and their in-text reference pointers, from 729 articles published in 147 journals across the 27 subject areas.
Findings: The analysis showed that bibliographic errors have been perpetuated for decades and that their possible causes have increased, despite the encouraged use of technological aids, namely reference managers.
Originality/value: As far as the authors know, the study is the best available recent analysis of errors in referencing and citing practices in the literature since Sweetland (1989).
Digitizing and parsing semi-structured historical administrative documents from the G.I. Bill mortgage guarantee program
Lafia, Sara; Bleckley, David A.; Alexander, J. Trent
2023 Journal of Documentation
doi: 10.1108/jd-03-2023-0055
Many libraries and archives maintain collections of research documents, such as administrative records, whose paper-based formats limit access to in-person use. Digitization transforms paper-based collections into more accessible and analyzable formats. As collections are digitized, there is an opportunity to incorporate deep learning techniques, such as Document Image Analysis (DIA), into workflows to increase the usability of information extracted from archival documents. This paper describes the authors' approach using digital scanning, optical character recognition (OCR) and deep learning to create a digital archive of administrative records related to the mortgage guarantee program of the Servicemen's Readjustment Act of 1944, also known as the G.I. Bill.
Design/methodology/approach: The authors used a collection of 25,744 semi-structured paper-based records from the administration of G.I. Bill mortgages from 1946 to 1954 to develop a digitization and processing workflow. These records include the name and city of the mortgagor, the amount of the mortgage, the location of the Reconstruction Finance Corporation agent, one or more identification numbers and the name and location of the bank handling the loan. The authors extracted structured information from these scanned historical records in order to create a tabular data file and link the records to other authoritative individual-level data sources.
Findings: The authors compared the flexible character accuracy of five OCR methods. They then compared the character error rate (CER) of three text extraction approaches: regular expressions, DIA and named entity recognition (NER). The highest-quality structured text output was obtained using DIA with the Layout Parser toolkit, followed by post-processing with regular expressions. Through this project, the authors demonstrate how DIA can improve the digitization of administrative records to automatically produce a structured data resource for researchers and the public.
Originality/value: The authors' workflow is readily transferable to other archival digitization projects. Through the use of digital scanning, OCR and DIA processes, the authors created the first digital microdata file of administrative records related to the G.I. Bill mortgage guarantee program that is available to researchers and the general public. These records offer research insights into the lives of veterans who benefited from the loans, the impacts on the communities built with the loans and the institutions that implemented them.
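Character error rate (CER), used above to compare the text extraction approaches, is conventionally defined as the character-level edit (Levenshtein) distance between an extracted string and its ground-truth transcription, divided by the length of the ground truth: CER = (S + D + I) / N, where S, D and I are character substitutions, deletions and insertions and N is the number of reference characters. A minimal pure-Python sketch follows. The example strings are hypothetical, and neither the authors' exact evaluation setup nor the related flexible character accuracy measure is reproduced here.

```python
def levenshtein(a, b):
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance from reference to hypothesis, normalized by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


reference = "Chicago"   # hypothetical ground-truth field value
hypothesis = "Chicag0"  # hypothetical OCR output with one substituted character
print(f"CER = {character_error_rate(reference, hypothesis):.3f}")
```

Because insertions count as errors, CER can exceed 1 when the output is much longer than the reference. Whether whitespace or case is normalized before scoring also changes the result, so CER values are only comparable under the same preprocessing.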