Bias in Reinforcement Learning: A Review in Healthcare Applications
Smith, Benjamin; Khojandi, Anahita; Vasudevan, Rama
doi: 10.1145/3609502
Reinforcement learning (RL) can assist in medical decision making using patient data collected in electronic health record (EHR) systems. RL, a type of machine learning, can use these data to develop treatment policies. However, RL models are typically trained using imperfect retrospective EHR data. Therefore, if care is not taken in training, RL policies can propagate existing bias in healthcare. Literature that considers and addresses the issues of bias and fairness in sequential decision making is reviewed. The major themes to mitigate bias that emerge relate to (1) data management; (2) algorithmic design; and (3) clinical understanding of the resulting policies.
Service Caching and Computation Reuse Strategies at the Edge: A Survey
Barrios, Carlos; Kumar, Mohan
doi: 10.1145/3609504
With the proliferation of connected devices including smartphones, novel network connectivity and management methods are needed to meet user Quality of Experience (QoE) and computational demands of contemporary applications. Service caching and computation reuse techniques are being employed to alleviate challenges due to scalability, interoperability, and mobility, as well as to reduce application latency by enabling caching at the edge. This survey provides a taxonomy for service caching and computation reuse and describes the current state of the research and its challenges. This is the first survey that provides a comprehensive analysis and suggests future research directions on this topic.
Document Image Quality Assessment: A Survey
Alaei, Alireza; Bui, Vinh; Doermann, David; Pal, Umapada
doi: 10.1145/3606692
The rapid emergence of new portable capturing technologies has significantly increased the number and diversity of document images acquired for business and personal applications. The performance of document image processing systems and applications depends directly on the quality of the document images captured. Therefore, estimating the document's image quality is an essential step in the early stages of the document analysis pipeline. This article surveys research on Document Image Quality Assessment (DIQA). We first provide a detailed analysis of both subjective and objective DIQA methods. Subjective methods, including ratings and pair-wise comparison-based approaches, are based on human opinions. Objective methods are based on quantitative measurements, including document modeling and human perception-based methods. Second, we summarize the types and sources of document degradations and techniques used to model degradations. In addition, we thoroughly review two standard measures to characterize document image quality: Optical Character Recognition (OCR)-based and objective human perception-based. Finally, we outline open challenges regarding developing DIQA methods and provide insightful discussion and future research directions for this problem. This survey will become an essential resource for the document analysis research community and serve as a basis for future research.
Extensions of Fuzzy Cognitive Maps: A Systematic Review
Schuerkamp, Ryan; Giabbanelli, Philippe J.
doi: 10.1145/3610771
Fuzzy Cognitive Maps (FCMs) are widely used to simulate complex systems. However, they cannot handle nonlinear relationships or time delays/lags, nor can they fully represent uncertain information, which prompted the development of extended FCMs. The most recent review covered extensions up to 2010. We searched for extensions from 2011 to March 2023 and assessed their motivations, features, operationalizations, use cases, reproducibility, and evaluation to support modelers in reusing existing solutions. We reviewed 26 extensions and found a paucity of extensions addressing multiple limitations. None of the extensions provided code, which hinders modelers in reusing existing extensions and suggests directions for future work.
I/O Access Patterns in HPC Applications: A 360-Degree Survey
Bez, Jean Luca; Byna, Suren; Ibrahim, Shadi
doi: 10.1145/3611007
The high-performance computing I/O stack has been complex due to multiple software layers, the inter-dependencies among these layers, and the different performance tuning options for each layer. In this complex stack, the definition of an “I/O access pattern” has been reappropriated to describe what an application is doing to write or read data from the perspective of different layers of the stack, often comprising a different set of features. It has become common to have to redefine what is meant when discussing a pattern in every new study, as no assumption can be made. This survey aims to propose a baseline taxonomy, harnessing the I/O community’s knowledge over the past 20 years. This definition can serve as a common ground for high-performance computing I/O researchers and developers to apply known I/O tuning strategies and design new strategies for improving I/O performance. We seek to summarize and bring a consensus to the multiple ways to describe a pattern based on common features already used by the community over the years.
Deep Learning for Zero-day Malware Detection and Classification: A Survey
Deldar, Fatemeh; Abadi, Mahdi
doi: 10.1145/3605775
Zero-day malware is malware that has never been seen before or is so new that no anti-malware software can catch it. This novelty and the lack of existing mitigation strategies make zero-day malware challenging to detect and defend against. In recent years, deep learning has become the dominant branch of machine learning in various research fields, including malware detection. Considering the significant threat of zero-day malware to cybersecurity and business continuity, it is necessary to identify deep learning techniques that can be effective in detecting or classifying such malware. So far, however, no such comprehensive review has been conducted. In this article, we study deep learning techniques in terms of their ability to detect or classify zero-day malware. Based on our findings, we propose a taxonomy and divide zero-day-resistant deep malware detection and classification techniques into four main categories: unsupervised, semi-supervised, few-shot, and adversarial-resistant. We compare the techniques in each category in terms of various factors, including deep learning architecture, feature encoding, platform, detection or classification functionality, and whether the authors have performed a zero-day evaluation. We also provide a summary view of the reviewed papers and discuss their main characteristics and challenges.
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Besta, Maciej; Gerstenberger, Robert; Peter, Emanuel; Fischer, Marc; Podstawski, Michał; Barthels, Claude; Alonso, Gustavo; Hoefler, Torsten
doi: 10.1145/3604932
Numerous irregular graph datasets, for example social networks or web graphs, may contain even trillions of edges. Often, their structure changes over time and they have domain-specific rich data associated with vertices and edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving, and rich datasets. Due to the sheer size and irregularity of such datasets, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing fundamental categories of these systems (e.g., document stores, tuple stores, native graph database systems, or object-oriented systems), the associated graph models (e.g., Resource Description Framework or Labeled Property Graph), data organization techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects of data distribution and query execution (e.g., support for sharding and Atomicity, Consistency, Isolation, Durability). Fifty-one graph database systems are presented and compared, including Neo4j, OrientDB, and Virtuoso. We outline graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms). Finally, we outline future research and engineering challenges related to graph databases.
The Business Impact of Inner Source and How to Quantify It
Buchner, Stefan; Riehle, Dirk
doi: 10.1145/3611648
Inner source software development is the application of open source practices to firm-internal software development. Practitioner reports have shown that inner source can increase flexibility and reduce costs. Despite the potential benefits of inner source, there has been little research on its impact on businesses and their processes. To address this gap, we conducted a systematic literature review that identified which business processes are affected by inner source development, particularly within the accounting and management domain. Our review revealed the need for new dedicated community building processes within companies. In addition, we examined computational tools and techniques that can be used to measure inner source development. We found that existing tools and techniques are not sufficiently suited to managing inner source processes. Based on this, we propose research topics for future work on quantifying inner source.
A Survey of Malware Analysis Using Community Detection Algorithms
Amira, Abdelouahab; Derhab, Abdelouahid; Karbab, Elmouatez Billah; Nouali, Omar
doi: 10.1145/3610223
In recent years, we have witnessed an overwhelming and fast proliferation of different types of malware targeting organizations and individuals, which considerably increased the time required to detect malware. The malware developers make this issue worse by spreading many variants of the same malware [13]. To deal with this issue, graph theory techniques, and particularly community detection algorithms, can be leveraged to achieve bulk detection of malware families and variants to identify malicious communities instead of focusing on the detection of an individual instance of malware, which could significantly reduce the detection time. In this article, we review the state-of-the-art malware analysis solutions that employ community detection algorithms and provide a taxonomy that classifies the solutions with respect to five facets: analysis task, community detection approach, target platform, analysis type, and source of features. We present the solutions with respect to the analysis task, which covers malware detection, malware classification, cyber-threat infrastructure detection, and feature selection. The findings of this survey indicate that there is still room for contributions to further improve the state of the art and address research gaps. Finally, we discuss the advantages and the limitations of the solutions, identify open issues, and provide future research directions.