Data Readiness for AI: A 360-Degree Survey
Hiniduma, Kaveen; Byna, Suren; Bez, Jean Luca
doi: 10.1145/3722214; pmid: N/A
Artificial Intelligence (AI) applications critically depend on data. Poor-quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Evaluation of data readiness is a crucial step in improving the quality and appropriateness of data usage for AI. R&D efforts have been spent on improving data quality. However, standardized metrics for evaluating data readiness for use in AI training are still evolving. In this study, we perform a comprehensive survey of metrics used to verify data readiness for AI training. This survey examines more than 140 papers published by ACM Digital Library, IEEE Xplore, journals such as Nature, Springer, and Science Direct, and online articles published by prominent AI experts. This survey aims to propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this taxonomy will lead to new standards for DRAI metrics that would be used for enhancing the quality, accuracy, and fairness of AI training and inference.
Overview and Challenges of Distributed Decision Making in Resource Contested and Dynamic Environments
Szabo, Claudia; Baker, Robin; Pearce, Glen; Teffera, Eyoel; Perry, Anthony
doi: 10.1145/3719001; pmid: N/A
Understanding the advantages and disadvantages of distributed decision making approaches as they are developed for and deployed in contested and dynamic environments is critical to ensure that recent advancements are used in practice to their maximum potential. In this survey, we focus on the use of decision making algorithms from two perspectives, namely, context and situational awareness (CSA) and decision making based on findings from CSA. We introduce taxonomies of required characteristics and analyse how they are met by existing approaches. Our analysis finds that evaluation of decision making approaches needs to mature to consider critical attributes such as the network bandwidth used, fault tolerance, and robustness, among others. The broad majority of experimental analyses focused on showing that the approach works, typically in a small-scale scenario, while attributes such as runtime, network bandwidth, and size, weight, and power were critically overlooked. None of the approaches consider large action spaces or sparse rewards. We discuss trade-offs and challenges of existing work and highlight research opportunities.
Testbeds and Evaluation Frameworks for Anomaly Detection within Built Environments: A Systematic Review
Alosaimi, Mohammed; Rana, Omer; Perera, Charith
doi: 10.1145/3722213; pmid: N/A
The Internet of Things (IoT) has revolutionized built environments by enabling seamless data exchange among devices such as sensors, actuators, and computers. However, IoT devices often lack robust security mechanisms, making them vulnerable to cyberattacks, privacy breaches, and operational anomalies caused by environmental factors or device faults. While anomaly detection techniques are critical for securing IoT systems, the role of testbeds in evaluating these techniques has been largely overlooked. This systematic review addresses this gap by treating testbeds as first-class entities essential for the standardized evaluation and validation of anomaly detection methods in built environments. We analyze testbed characteristics, including infrastructure configurations, device selection, user-interaction models, and methods for anomaly generation. We also examine evaluation frameworks, highlighting key metrics and integrating emerging technologies such as edge computing and 5G networks into testbed design. By providing a structured and comprehensive approach to testbed development and evaluation, this paper offers valuable guidance to researchers and practitioners in enhancing the reliability and effectiveness of anomaly detection systems. Our findings contribute to the development of more secure, adaptable, and scalable IoT systems, ultimately improving the security, resilience, and efficiency of built environments.
A Primer on Pretrained Multilingual Language Models
Doddapaneni, Sumanth; Ramesh, Gowtham; Khapra, Mitesh; Kunchukuttan, Anoop; Kumar, Pratyush
doi: 10.1145/3727339; pmid: N/A
Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, and the like, have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, there has emerged a large body of work in (i) building bigger MLLMs covering a large number of languages, (ii) creating exhaustive benchmarks covering a wider variety of tasks and languages for evaluating MLLMs, (iii) analysing the performance of MLLMs on monolingual, zero-shot cross-lingual and bilingual tasks, (iv) understanding the universal language patterns (if any) learnt by MLLMs, and (v) augmenting the (often) limited capacity of MLLMs to improve their performance on seen or even unseen languages. In this survey, we review the existing literature covering the above broad areas of research pertaining to MLLMs. Based on our survey, we recommend some promising directions of future research.
A Survey of mmWave Backscatter: Applications, Platforms, and Technologies
Sun, Yimiao; He, Yuan; Zou, Yang; Gu, Jiaming; Yang, Xiaolei; Zhang, Jia; Mao, Ziheng
doi: 10.1145/3723004; pmid: N/A
As a key enabling technology of the Internet of Things (IoT) and 5G communication networks, millimeter wave (mmWave) backscatter has undergone noteworthy advancements and brought significant improvement to prevailing sensing and communication systems. The past few years have witnessed growing efforts in innovating mmWave backscatter transmitters (e.g., tags and metasurfaces) and the corresponding techniques, which provide efficient information embedding and fine-grained signal manipulation for mmWave backscatter technologies. These efforts have enabled a variety of appealing applications, such as long-range localization, roadside-to-vehicle communication, coverage optimization, and large-scale identification. In this article, we carry out a comprehensive survey to systematically summarize the works related to the topic of mmWave backscatter. First, we introduce the scope of this survey and provide a taxonomy to distinguish two categories of mmWave backscatter research based on the operating principle of the backscatter transmitter: modulation-based and relay-based. Next, existing works in each category are grouped and introduced in detail, with their common applications, platforms, and technologies, respectively. Finally, we elaborate on potential directions and discuss related surveys in this area.
Vehicle Trajectory Data Processing, Analytics, and Applications: A Survey
Liu, Chenxi; Xiao, Zhu; Long, Wangchen; Li, Tong; Jiang, Hongbo; Li, Keqin
doi: 10.1145/3715902; pmid: N/A
Vehicles traveling through cities generate extensive vehicle trajectory data collected by scalable sensors, providing excellent opportunities to address urban challenges such as traffic congestion and public safety. In this survey, we systematically review vehicle trajectory collection, preprocessing, analytics, and applications. First, we focus on the standard techniques for vehicle trajectory collection and corresponding datasets. Next, we introduce representative approaches for the latest advances in vehicle trajectory processing. We further discuss individual travel behavior and collective mobility analytics using vehicle trajectory data. Since private cars constitute the majority of urban vehicles and form the basis for many recent research findings, we emphasize analytics based on private car trajectory data. We then compile vehicle trajectory-boosted applications from the perspective of computing vehicle trajectory. Finally, we go through unresolved problems with vehicle trajectory data and outline potential future research directions.
Secure Robotics: Navigating Challenges at the Nexus of Safety, Trust, and Cybersecurity in Cyber-Physical Systems
Haskard, Adam; Herath, Damith
doi: 10.1145/3723050; pmid: N/A
The growing pervasiveness of robotic and embodied artificial intelligence systems in daily life and within cyber-physical environments highlights a complex web of challenges at the intersection of robotic safety, human-to-robot trust, and cybersecurity. This article explores these challenges by emphasising the crucial role of security in establishing and maintaining trust between humans and robots, which is integral to successfully adopting and operating these systems in human environments. Safety considerations include mitigating the risks of physical harm and environmental damage due to robotic malfunctions or cyberattacks, particularly in autonomous robots requiring high built-in safety measures. From a cybersecurity perspective, these systems face unique challenges due to their complex, interconnected software and hardware components that necessitate robust protection against data breaches to ensure secure data communication. Additionally, the dynamic interaction of these systems with the physical environment adds a layer of complexity, which makes the safety, security, and reliability of these interactions a vital component of the overall security strategy. This article reviews these areas within the cyber-physical systems paradigm by focusing on engineering fail-safe mechanisms, the importance of trust and ethical responsibility in human-robot interactions, and the need for resilient cybersecurity measures. At this nexus, a table of crossover challenges illustrates the intricacy of integrating safety, trust, and security in robotic systems. This article introduces “secure robotics” as a new paradigm to address these collective challenges with a novel model to provide a structured methodology for evaluating and enhancing robotic system performance that symbolises the convergence of theoretical constructs with empirical analysis. 
By defining secure robotics, this article establishes a framework for advancing robotics in the cyber-physical era in alignment with current technological trends while anticipating future developments. This framework positions secure robotics as a key contributor to the evolution of cyber-physical systems.
Machine Learning for Identifying Risk in Financial Statements: A Survey
Zavitsanos, Elias; Spyropoulou, Eirini; Giannakopoulos, George; Paliouras, Georgios
doi: 10.1145/3723157; pmid: N/A
The work herein reviews the scientific literature on Machine Learning approaches for financial risk assessment using financial reports. We identify two prominent use cases that constitute fundamental risk factors for a company, namely misstatement detection and financial distress prediction. We further categorize the related work along four dimensions that can help highlight the peculiarities and challenges of the domain. Specifically, we group the related work based on (a) the input features used by each method, (b) the sources providing the labels of the data, (c) the evaluation approaches used to confirm the validity of the methods, and (d) the machine learning methods themselves. This categorization facilitates a technical overview of risk detection methods, revealing common patterns, methodologies, significant challenges, and opportunities for further research in the field.
The Federation Strikes Back: A Survey of Federated Learning Privacy Attacks, Defenses, Applications, and Policy Landscape
Zhao, Joshua; Bagchi, Saurabh; Avestimehr, Salman; Chan, Kevin; Chaterji, Somali; Dimitriadis, Dimitris; Li, Jiacheng; Li, Ninghui; Nourian, Arash; Roth, Holger
doi: 10.1145/3724113; pmid: N/A
Deep learning has shown incredible potential across a wide array of tasks, and this growth has been accompanied by an insatiable appetite for data. However, a large amount of the data needed for enabling deep learning is stored on personal devices, and recent concerns about privacy have further highlighted challenges for accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology that enables collaborative training of machine learning models without the need to send the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving only holds if the updates cannot be “reverse engineered” to infer information about the private training data. It has been shown under a wide variety of settings that this privacy premise does not hold. In this article, we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which the privacy of an FL client can be broken. We further dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL and conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of the data from its participants.
A Generic Taxonomy for Steganography Methods
Wendzel, Steffen; Caviglione, Luca; Mazurczyk, Wojciech; Mileva, Aleksandra; Dittmann, Jana; Krätzer, Christian; Lamshöft, Kevin; Vielhauer, Claus; Hartmann, Laura; Keller, Jörg; Neubert, Tom; Zillien, Sebastian
doi: 10.1145/3729165; pmid: N/A
A unified understanding of terms is essential for every scientific discipline: steganography is no exception. Because the field is divided into several domains (e.g., network and text steganography), it is crucial to provide a unified terminology as well as a taxonomy that is not limited to a few applications or areas. A prime attempt toward a unified understanding of terms was conducted in 2015 with the introduction of a pattern-based taxonomy for network steganography. In 2021, the first work toward a pattern-based taxonomy for all domains of steganography was proposed. However, this initial attempt still faced several shortcomings, e.g., remaining inconsistencies and a lack of patterns for several steganography domains. As the consortium who published the previous studies on steganography patterns, we present the first comprehensive pattern-based taxonomy tailored to fit all known domains of steganography, including smaller and emerging areas, such as filesystem, IoT/CPS, and AI/ML steganography. To make our contribution more effective and promote the use of the taxonomy to advance research, we also provide a unified description method together with a thorough tutorial on its utilization.