Ane San Martin and Johan Kildal

ABSTRACT

It is difficult to estimate the boundaries of the hazard zones generated around autonomous machines and robots when navigating a space shared with them. We investigated the use of multimodal (auditory and/or visual) mixed-reality (MR) displays to warn users about invading such hazard zones and to help them return to safety. Two single-modality auditory and visual displays were designed, which were subjectively comparable as generic hazard displays. An experimental user study was then conducted to compare the designed single-modality displays, as well as an audio-visual display that combined both. When the display included auditory information, users returned sooner to safety, although this had only a small effect on performance when carrying out an independent navigation task. Additional nuanced differences are reported and discussed in relation to display design characteristics, as well as in relation to the limitations of the implementations that are possible with current MR head-mounted display devices.

RESEARCH HIGHLIGHTS

• Design and evaluation of novel auditory and visual mixed-reality hazard warning displays for safe pedestrian navigation of a space.
• Selection of a pair of subjectively equivalent single-modality interactive hazard warning displays.
• Comparable performance observed when navigating with any of the selected displays (individually and in combination). Returning to safety was more efficient if auditory feedback was provided.
• Subjective experiences and possible additional differences identified and discussed in the context of rendering technologies.

1 INTRODUCTION

Industrial production environments are pioneering the adoption of robots, with growth in shipped units currently estimated at 13% per year [Teulieres et al., 2019]. Together with the more prevalent traditional industrial robots, growth is also fueled by fenceless robots, which include collaborative robots, service robots and automated guided vehicles. This means that spaces in production environments are increasingly shared between humans and robots, in scenarios that range from coexistence to collaboration on the same tasks.

The main requirement for robots in shared spaces is that they are safe [Aaltonen & Salmi, 2019, Kildal et al., 2018, Probst et al., 2015]. Current cobots are, in fact, designed and deployed to be intrinsically safe, with techniques such as collision detection and reaction [Haddadin et al., 2008]. A consequence of this is that they may move more slowly or stop altogether when they detect nearby human presence. Also, if they detect that they have unexpectedly entered into physical contact with a person or object, the forces that they exert are rapidly reduced to levels that cannot harm a person [Behrens et al., 2015].

While safety is the absolute design priority, safe behavior that involves slowing down or even halting activity hinders productivity. This is undesirable, and human workers would rather not trigger such safety measures. Beyond pragmatic productivity considerations, and despite the acknowledgment that robots are safe in most situations [Malm et al., 2019], workers who enter the area of influence of a robot may believe that a residual potential danger always exists because of, e.g. malfunction of safety mechanisms or risks created by objects (such as tools) that the robot may be manipulating.
Adding to this, studies show [Lasota & Shah, 2015] that implementing safety simply by means of collision prevention or avoidance results in a poor sense of safety for humans. The uncertainty caused by such poor situation awareness can lead to sustained levels of stress that can become chronic over time. Such chronic stress can become a potentially serious health and safety hazard for workers [Oken et al., 2015].

In this context, the goal of this research is to investigate the use of visual and/or auditory mixed reality (MR) head-mounted displays (HMDs) to present information about hazard zones around fenceless robots, including collaborative robots, that operators should avoid entering while navigating a space shared with them. We specifically want to understand the relative roles that auditory and visual hazard information displays can play in improving situation awareness and the subjective perception of safety for users who navigate such environments. Together with this, we want to understand which aspects of the user experience (UX) are affected by different display designs, and how.

Improving situation awareness when working with fenceless robots has been identified as key to supporting the operator's subjective sense of safety and, as a result, to promoting trust and fostering a better UX [Bolstad et al., 2006, Lohse, 2009, Onal et al., 2013]. Also, according to the literature, making hazard zones around a robot perceivable promotes an overall sense of safety [Karwowski et al., 1988, Or et al., 2009]. As proposed, better situation awareness regarding hazard zones is also likely to help preserve the efficiency of automation, by avoiding production halts due to unintended intrusions that trigger slow-down mechanisms in the robot.

The availability of multimodal MR-HMD devices (such as Microsoft's HoloLens) provides an opportunity to investigate strategies for the presentation of safety-related multimodal information in scenarios involving, e.g. the navigation on foot of a space that is shared with robots. The research work reported in this paper aims to provide knowledge for the design of such MR hazard awareness displays. Specifically, the goal of this research was to gain insight into the relative contributions that the visual and auditory modalities can make to a hazard awareness display, both objectively (as navigation aids) and subjectively (through the UX they elicit). For that, and by combining design workshop and evaluative research as design methods [Hanington & Martin, 2012], we first designed a pair of hazard awareness displays: one visual and one auditory. The two single-modality displays were designed to be subjectively comparable, in relation to a range of perceptual and functional criteria, as detailed in Section 3.1 below. The fundamentals of the first iteration of that design process were reported as work in progress in San Martín & Kildal [2019], and they are further described in this paper. The two display designs resulting from that first design iteration were then revised and further developed by applying objective design criteria, in order to obtain a pair of single-modality displays that were more closely equivalent to each other and that could thus offer a fairer comparison with one another. However, they did not attain full equivalence in the information they conveyed.
Using the display designs obtained in the previous steps, we conducted a three-condition experimental user study comparing three configurations of auditory and visual hazard awareness information: (i) a visual-only display, (ii) an auditory-only display and (iii) an audio–visual display. The comparison was done in the context of a gamified navigation task (the navigation of a space filled with stationary hazard zones that participants had to avoid invading while navigating the space on foot to solve an independent task). Objective performance and subjective UX-related metrics were analyzed in the comparison.

The rest of this paper is structured as follows. First, a review of relevant related work is provided. Following that, the user-centered process of designing the single-modality displays is described, followed by an account of the process of selecting a pair of displays that could provide a fair comparison in the final study. The paper then reports the full experimental modality-comparison user study that was conducted using the pair of displays selected in the previous steps. An in-depth analysis of the data obtained is then provided, and conclusions are drawn and discussed.

2 RELEVANT RELATED WORK

This section reviews three key areas of prior work in the research domains of augmented reality (AR) and MR. One area is the support that these technologies offer for improved situation awareness and, consequently, better human–robot interaction (HRI). The second area is the use of these technologies specifically for the display of hazard awareness and warning information. A third area reviews design guidelines for multimodal audio–visual information presentation and awareness displays, irrespective of the rendering technology used. Conceptually, AR and MR overlap heavily with each other, with AR tending to be more limited than MR in how the user and the real environment can interact with virtual content (see Appendix A in Kaplan et al. [2020] for a compendium of definitions from the literature). While we consider that the present work belongs to the domain of MR, we refer to work from the literature in the terms (MR or AR) chosen by their authors.

2.1 Supporting HRI with AR/MR

Robot legibility (i.e. being able to read intentions from the robot's actions) has been identified as a key aspect in improving situation awareness and supporting seamless HRI in industrial workplaces. Studies show that enabling operators to predict or interpret robot movements and behaviors through good legibility can lead to a better UX while also increasing task effectiveness [Dragan et al., 2013, Lichtenthäler & Kirsch, 2016]. Studies also show that transparency of intentions and access to information about the interaction support improved task completion, a sense of trust and worker acceptance of novel technologies for collaboration with the robot [Hancock et al., 2011, Hoff & Bashir, 2015, Maurtua et al., 2017]. AR was used to enhance operator trust in a study by Palmarini et al. [2018], which designed an AR interface to communicate the robot's status, progress and intention in collaborative tasks by overlaying a virtual animation on the real environment.
More generally, AR/MR technologies have been investigated as ways of supporting human–robot collaborative tasks in industrial environments, by displaying instructions, providing visual overviews of the assembly process, updating the operator on production status changes and supporting staff training processes [Makris et al., 2016, Matsas & Vosniakos, 2017]. AR devices have also been proposed as aids for workers in a variety of sectors utilizing robots, such as the automotive industry [Doshi et al., 2017], assembly industries [Evans et al., 2017], logistics industries [Reif & Günthner, 2009], shipyards [Blanco-Novoa et al., 2018] and construction [Li et al., 2018]. A comprehensive recent review of the use of AR in robotics [Makhataeva & Varol, 2020] shows that the development of enabling AR/VR display technologies, and the applications developed making use of them, focus on the creation of virtual content that is almost exclusively visual.

2.2 Display of hazard awareness information

Safety being the primary concern for the deployment of unfenced robots [Aaltonen & Salmi, 2019, Kildal et al., 2018, Probst et al., 2015], examples of safety-related human–robot collaboration (HRC) applications using AR/MR are not uncommon. Vision is again the primary channel addressed in most cases, and the color coding employed to rate the level of danger is in line with international standards on color coding, such as that defined in ISO 17724:2003 (Graphical Symbols – Vocabulary) and employed in ISO 22324:2015 (Societal Security – Emergency Management – Guidelines for Color-Coded Alerts). This coding of severity ranges from green (safe), through yellow (caution), to red (danger). The intermediate caution levels can be subdivided into further intermediate hues of orange and brown. Concrete examples include the projection-based sensor systems that have been proposed to indicate hazard zones around moving robots, utilizing dynamic light barriers coupled with vibrating pressure mats [Vogel et al., 2016, 2011]. In a more immersive way, AR equipment has been used to create colored aura patterns on a robot, which can be seen over the robot from a greater distance [Makhataeva et al., 2019, Michalos et al., 2016]. In Matsas et al. [2018], a proactive technique for collaboration is proposed in which a similar aura-related visual augmentation is presented. The authors also include spatial audio cues to enhance awareness, but only by reproducing some of the noises that the moving robot would make (an ecological approach), rather than by designing synthetic sonifications of events (equivalent in their synthetic nature to aura-based visual representations), as is done in the work presented in this paper.

The HRC literature provides only a few examples in which audio and vision are used together for situation awareness, and in particular for safety-related applications. However, ergonomics studies on the optimal design of warnings suggest that increasing the number of modalities used in a display can improve user reaction times [Selcon et al., 1995], suggesting that efforts should be made to explore that direction. More broadly, other collaborative contexts have demonstrated the benefits of including spatial auditory and visual cues in MR remote collaboration [Yang et al., 2020].
In other application domains, much research on the use of AR for hazard awareness and warning display has focused on AR for car drivers, with some studies supporting the combined use of audio and visual modalities for hazard awareness (e.g. proximity to other vehicles or to hazardous objects outside the driver's field of view (FoV)) and safe navigation [Vogel et al., 2016].

2.3 Role of multimodality in awareness displays

Multimodal interaction has been studied for the presentation of information that can enhance user awareness in a variety of application scenarios, such as driver assistance (awareness of other vehicles and persons). In Houtenbos et al. [2017], a visual and an auditory display were employed to inform about the speed and direction of cars approaching an intersection. While the combined audio–visual display proved to be helpful, participants thought that the auditory signals were more useful than the visual ones. Also relevant is the work by Gutwin et al. [2011] on workspace awareness in distributed groupware. With an audio–visual dynamically synthesized display, they reported the benefit of auditory over visual display information for providing awareness information outside the FoV. Other research combining visual and auditory cues for hazard warning [Chan & Ng, 2009, Haas & Van Erp, 2014] found that adding auditory warning information made visual warnings more effective.

Multimodal warnings have also been investigated combining three display modalities: visual, auditory and tactile. One driver assistance study [Politis et al., 2014] found that non-visual signals were more effective in visually demanding situations and that faster reactions were achieved with bimodal and trimodal warnings. In another driver assistance study [Murata et al., 2013] that used warning signals in different combinations of the same three modalities, it was found that reaction times were shortest with tactile-only and with audio-tactile warnings. The literature notes that one of the strengths of multimodal displays is that they facilitate access to the conveyed information, redundantly across sensory channels, for persons with sensory disabilities (Colley et al. [2019] provide a discussion in the context of autonomous road traffic).

3 DESIGNING SINGLE-MODALITY DISPLAYS FOR COMPARISON

This section describes the process that we followed to design two single-modality hazard awareness displays (one visual and one auditory) that could later be compared (each alone and both in combination) in the context of a task that required safe pedestrian navigation. The process was as follows. A set of modality-independent design requirements for a hazard awareness display was defined as the starting point. A design workshop was then conducted, which produced three display patterns, each of which was instantiated in an auditory and a visual display, resulting in three auditory and three visual display designs. An evaluative user study was then conducted, in which a small group of participants experienced and interacted with all six displays, rated them according to several criteria and paired them based on subjective overall similarity. The result of this user study was the selection of two display designs that met the requirements for use as hazard awareness displays and that could be compared across modalities. The essentials of the design steps mentioned so far were reported as work in progress in San Martín & Kildal [2019].
The selected designs were then subjected to further analysis and refinement, based on the results of the preliminary display selection study and on other considerations regarding the comparability of the selected pair. In this way, the process culminated in the selection of a visual and an auditory display that satisfied the design requirements and that were perceived to address their respective sensory modalities in a similar way. A detailed description of this design process is provided in the following sub-sections, and limitations of the designs obtained are discussed in Section 7.

3.1 Assumptions and requirements

A simplified configuration of a generic hazard zone was assumed as a premise for the design study. This means that real robots (or other physical entities) were not used to generate hazard zones; nor were such robots or agents represented virtually. Instead, a generic and highly schematic representation of the origin of a hazard was provided in the form of a vertical pole (∅0.1 m), representing the central axis of the hazard zone. The axis was rendered as a visual hologram stationary in space (see Fig. 1). It was further assumed that the level of severity of the hazard decayed with the radial distance to its central axis, up to a radial distance of 1.3 m (∅2.6 m) that marked the external boundary of the hazard zone (see the code sketch below). This model of a hazard zone could be reproduced multiple times in various stationary locations in space, to create an environment populated by hazard zones for users to navigate on foot while avoiding entering any of them.

To drive the hazard-awareness display design process, a set of modality-independent display design requirements was defined:

• to perceive the information displayed as a hazard warning;
• to inform about when the hazard zone was entered and left;
• to inform about the relative location (orientation and distance) of the origin of the hazard with respect to the user;
• to keep virtual information clutter (auditory or visual) low, avoiding interference with the perception of reality;
• to minimize the introduction of additional new cognitive demands.

Figure 1. A user inside a hazard zone. The boundaries of the zone are invisible to the user, but they are shown as a dotted circle in the figure. The origin of the hazard (its central vertical axis) is seen by the user as a stationary hologram in the shape of a cylindrical column.

An additional requirement was added, which did not refer to a specific functional aspect but was intended for the designers to keep in perspective when designing everything else:

• to provide a good UX.

3.2 Design of displays

We wanted to design an auditory display and a visual display, both satisfying the design requirements. To keep this ambition realistic, given the radically different nature of the visual and auditory displays that can be implemented with a HoloLens HMD device, our aim was that the displays satisfied the design requirements to the extent that they were perceived as comparable.
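As a concrete illustration of the hazard-zone model assumed in Section 3.1, the following is a minimal sketch in Python (not the study's actual Unity implementation, which is not published here). The linear decay profile is an assumption; the paper states only that severity decays with radial distance up to the 1.3-m boundary.

```python
import math

HAZARD_RADIUS = 1.3  # outer boundary of the hazard zone, in meters (Section 3.1)

def radial_distance(user_xy, axis_xy):
    """Horizontal distance from the user to the hazard's central vertical axis."""
    return math.hypot(user_xy[0] - axis_xy[0], user_xy[1] - axis_xy[1])

def hazard_severity(user_xy, axis_xy):
    """Severity in [0, 1]: 0 at or beyond the boundary, 1 at the central axis.
    A linear decay is assumed here; the exact profile is not specified in the paper."""
    d = radial_distance(user_xy, axis_xy)
    return max(0.0, 1.0 - d / HAZARD_RADIUS)

def inside_hazard_zone(user_xy, axis_xy):
    """True while the user is within the zone boundary (displays are active only then)."""
    return radial_distance(user_xy, axis_xy) < HAZARD_RADIUS
```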
To achieve this comparability, we conducted a design workshop with a small group of three interaction designers, as described in detail in San Martín & Kildal [2019]. As a result of their discussions and collaboration, the designers in the workshop described and documented (supported by graphical sketching [Sturdee & Lindley, 2019]) three core display patterns, expressed independently of the modality of the display:

• Boolean pattern, where a constant stimulus was presented when inside the hazard zone, independently of the distance to the source of the hazard;
• Progressive pattern, where the stimulus gradually increased in salience with proximity to the source of the hazard;
• Stepped Progressive pattern, where the stimulus gradually increased in salience as the user approached the origin of the hazard and additionally underwent an abrupt increase (step) in salience when crossing a threshold proximity distance to the core of the hazard.

A characteristic common to all three display patterns was that they were only perceived once the user had entered the boundaries of the hazard zone. No indication of the size and outside boundaries of the hazard zone could be perceived from outside it, unlike in the examples from the literature mentioned earlier.

Instances of auditory and visual displays with each of these three patterns were also proposed in the workshop. This yielded six display designs (three per modality) that could be implemented on a HoloLens device. The implementations of each design in each modality are represented graphically in Fig. 2. Details of each display design are discussed below.

Figure 2. Auditory and visual display designs, modality-specific instances of the three display patterns obtained in the design workshop. Horizontal axes (leftwards) show distance to hazard. Visual displays (left column): color cast rendered inside the hazard zone; green cast rendered for a duration of 1 s, only when exiting the zone. Auditory displays (right column): sound volume is the same in all cases; for Progressive and Stepped Progressive, the vertical axis indicates the frequency of pulsation of the sound.

3.2.1 Auditory displays

The goal was to design auditory information displays, rather than alarms, that could grab attention and convey a sufficient sense of urgency for the user to react, without eliciting unnecessary stress, and providing information to diagnose the source of the warning [Edworthy & Hellier, 2006]. In the auditory display implementations for this study, all auditory stimuli were based on a 440 Hz pure tone: a single pitch was used, rather than a range of frequencies, to convey a more moderate sense of urgency [Edworthy et al., 1991]; also, chords or harmonically complex sounds were avoided because, being more resistant to masking [Edworthy & Hellier, 2006], they could interfere with normal auditory communication during navigation, such as natural speech.
Amplitude was the same for all sounds. The chosen amplitude was subjectively judged by the designers as similar in salience to the most saturated red color that the HoloLens device could render, under average indoor office illumination. However, no psychophysical calibration user study was conducted to confirm the accuracy of this equivalence [Dragan et al., 2013]. Amplitude of sound is a good conveyor of proximity information [Bach et al., 2009], but making use of a range of amplitudes would either jeopardize the perceivability of the sound at the outer boundary of the hazard zone or make it excessively salient near the core of the zone. For these reasons, amplitude of sound was not used as a design parameter. Instead, tempo was used as the parameter to map urgency [Giang & Burns, 2012] in the progressive display designs.

With these considerations in mind, the auditory instances of the three display patterns were created as follows (see the graphical representations in the right column of Fig. 2; a code sketch of the two progressive mappings is given after this sub-section).

• The Auditory Boolean display produced a continuous tone for as long as the user remained inside the hazard zone. An alternative might have been to use pulsation with a constant tempo. However, the designers thought that a constant tone was a cleaner implementation of a display with two distinct states: on or off.
• The Auditory Progressive display produced discrete grains of sound, each consisting of one period of a 440 Hz tone. The temporal spacing between grains became linearly shorter as the user approached the center of the hazard. At the extreme points, tone grains were produced at a frequency of 2 Hz at the periphery of the hazard zone and of 10 Hz at its center. According to studies, a tempo of 2 Hz (120 bpm) is preferred in music and rather neutral in terms of the stress or sense of danger it conveys. With a range of tempos that could increase up to a maximum of 10 Hz at the center, the goal was to convey a sense of danger that increases rapidly [Schäfer et al., 2015].
• The Auditory Stepped Progressive display differed from Auditory Progressive only in the profile of increase of the grain production frequency, which was linear within the peripheral half (2 to 3.3 Hz), then jumped abruptly to 6.6 Hz, and continued increasing linearly until it reached a pulsation frequency of 10 Hz at the center of the hazard zone.

In the implementation, the auditory displays were based on the spatial audio capabilities of the device, which make use of head-related transfer functions for easy spatial localization of virtual sound sources. This characteristic makes it possible for the user to locate the origin of the alert sound in the surrounding space (both in angle and in distance relative to the user's position).

Figure 3. Sequence of HoloLens image captures, as seen by a user exiting (walking backwards) the hazard zone created by the virtual pole in view. The sequence shows stages in the process of exiting the hazard zone, starting from the red range well inside the hazard zone (left image), going through the peripheral yellow range (middle image) and finally out of the hazard zone, where the green confirmation of safety is seen for 1 s. In the three captures, a real picture can be seen on the wall in the background.
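The two distance-to-tempo mappings just described can be sketched as follows (in Python rather than the study's Unity implementation; linear interpolation within each segment is assumed, consistent with the description and Fig. 2).

```python
HAZARD_RADIUS = 1.3  # meters, outer boundary of the hazard zone

def progressive_pulse_hz(d):
    """Auditory Progressive: grain rate rises linearly from 2 Hz at the
    boundary (d = HAZARD_RADIUS) to 10 Hz at the center (d = 0)."""
    d = min(max(d, 0.0), HAZARD_RADIUS)
    proximity = 1.0 - d / HAZARD_RADIUS  # 0 at boundary, 1 at center
    return 2.0 + proximity * (10.0 - 2.0)

def stepped_progressive_pulse_hz(d):
    """Auditory Stepped Progressive: 2 -> 3.3 Hz across the outer half,
    an abrupt jump to 6.6 Hz at the halfway threshold, then 6.6 -> 10 Hz
    across the inner half."""
    d = min(max(d, 0.0), HAZARD_RADIUS)
    half = HAZARD_RADIUS / 2.0
    if d > half:  # outer (peripheral) half of the zone
        proximity = (HAZARD_RADIUS - d) / half  # 0 at boundary, 1 at threshold
        return 2.0 + proximity * (3.3 - 2.0)
    proximity = (half - d) / half               # 0 at threshold, 1 at center
    return 6.6 + proximity * (10.0 - 6.6)
```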
3.2.2 Visual displays

In the visual display implementations, the stimulus perceived by the user was a color cast over the whole scene rendered by the HMD device. In every case, color casts were created using spheres of light, centered at the approximate height of an adult's eyes. This height could be adjusted if the user's height differed significantly from the default. The three display patterns were instantiated as follows for the visual modality (see the graphical representations in the left column of Fig. 2).

• In the Visual Boolean design, the scene turned see-through red for as long as the user remained anywhere inside the hazard zone.
• With the Visual Progressive design, the red cast gained saturation as the distance to the center of the hazard zone decreased, up to a maximum intensity equal to that used in the Boolean design.
• With the Visual Stepped Progressive design, there were two distinct ranges of color cast: yellow in the half region further from the center, and red in the inner region.

In all three visual displays, leaving the hazard zone caused the scene to be cast in see-through green for 1 s, with any color cast disappearing after that period. Still images captured from HoloLens in Fig. 3 convey an impression of the visual experience with the Visual Stepped Progressive display.

Given the see-through nature of the holograms rendered in HoloLens, the extent to which color cast intensity can be used is limited, since the cast soon becomes washed out when its intensity is lowered. The red color used in the Visual Progressive display was rendered convincingly across a range of intensities. In contrast, the yellow color used in the Visual Stepped Progressive display could not render a perceivable gradient of intensities. For this reason, it was decided that the Visual Stepped Progressive display would consist of successive layers of constant-intensity color casts. This resulted in an implementation of the stepped progressive pattern that differed between the visual and auditory modalities.

3.2.3 Equalizing spatial awareness

Human vision and audition differ distinctly in that, while audition permits the listener to estimate the position and distance of the source of a sound, vision only provides similar information within, at most, the field of view in front of the viewer. This difference is inherent to the human senses, and consequently it shapes the potential that each modality display can offer outright. In current HMD devices, however, this difference is further increased. While spatial sound in HoloLens allows human hearing to use its localization capability fully, the FoV offered by this device is far narrower than the FoV of human vision. This difference between audio and vision through HoloLens potentially increases the advantage that the auditory display may have over the visual display when both are compared on the same task, in the context of the application domain studied here.
To partially compensate for this difference, we developed a visual swiveling-arrow cue to locate the origin of the hazard when it is outside the FoV offered by the visual HMD (Fig. 4). The design process of this visual cue is reported in San Martín & Kildal [2019].

Figure 4. Graphic representation of the visual cue to locate the origin of the hazard when it is outside the FoV offered by the visual HMD. (a) The origin of the hazard (represented as a blue pole). The red cast indicates that the user is inside the hazard zone generated by that pole. (b) View from inside the same hazard zone when the source of the hazard is not visible in the FoV of the HMD device. In that case, an arrow cue swivels by an angle α, meaning that the user should turn their head by that same angle in the direction of the arrow (to the right in this case) to be able to see the source of the hazard within the FoV of the device. As soon as the pole (origin of the hazard) appears in view, the arrow disappears. The angle α maps the rotation of the person's head around their vertical axis with respect to the column hologram causing the hazard. As examples, for the values α = 90°, α = 180° and α = 270°, the hazard is located to the right of, behind, and to the left of the user seeing the cue.
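The geometry of this cue can be sketched as follows. This is a hypothetical reconstruction: the paper gives only the α examples in Fig. 4, so the coordinate and sign conventions here are assumptions, as is the roughly 30° horizontal FoV of the first-generation HoloLens.

```python
import math

def arrow_angle_deg(head_yaw_deg, user_xy, hazard_xy):
    """Clockwise angle (0-360) the user would have to turn to face the hazard.
    With the conventions assumed here, 90 means the hazard is to the right,
    180 behind and 270 to the left, matching the alpha examples in Fig. 4."""
    bearing = math.degrees(math.atan2(hazard_xy[0] - user_xy[0],
                                      hazard_xy[1] - user_xy[1]))
    return (bearing - head_yaw_deg) % 360.0

def arrow_visible(alpha_deg, fov_deg=30.0):
    """Show the arrow only while the hazard axis lies outside the device FoV.
    A ~30-degree horizontal FoV (first-generation HoloLens) is assumed."""
    half_fov = fov_deg / 2.0
    return not (alpha_deg < half_fov or alpha_deg > 360.0 - half_fov)
```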
3.3 Display selection study

The six displays obtained from the design workshop were first implemented using Unity, to be rendered on a HoloLens device. Then, an evaluative user study was conducted with a group of six participants. Since the study and the display selection process are described in detail in San Martín & Kildal [2019], this section provides a summary and presents the findings. The goal was to select, out of the six displays obtained from the design workshop, the pair of hazard awareness displays (one visual and one auditory) that best matched each other from the perspective of the design requirements. Although the six displays had been designed in pairs (i.e. visual and auditory instantiations of three design patterns), differences within each pair were substantial. Thus, the study made no a priori assumptions about best-matching pairs, and it allowed participants to propose pairwise matches considering all possible combinations. In other words, we wanted to identify two displays (visual and auditory) that were subjectively comparable as hazard awareness displays, knowing that they would not be closely equivalent.

Six blue holographic vertical poles (like the one described in Fig. 1) were created and distributed across a room in stationary positions, forming two rows of three holograms each (see the layout in Fig. 5). Each pole represented the central axis of a hazard zone and triggered the response of one of the six display designs. After inspecting all hazard response designs (refer to San Martín & Kildal, 2019, for details), participants decided, in a post-task questionnaire, which design rated highest within each modality in the following five categories:

• pleasantness,
• capacity to capture attention,
• capacity to provide a sense of safety,
• capacity to convey a sense of distance to the origin of the hazard and
• overall preference.

Figure 5. Top-view layout of the 10 × 10 m indoor space where the display selection study was conducted. Six stationary vertical holographic poles (∅0.1 m, shown as solid small circles) were visible, arranged in two rows. The larger dotted circles (∅2.6 m) around the poles are the boundaries of each hazard zone, and they were not visible. The rows on the left and right implemented the three visual display designs and the three auditory display designs, respectively. Within each row, the displays were arranged in no particular pattern.

Results from the post-task questionnaire are summarized in Fig. 6 (visual displays) and Fig. 7 (auditory displays).

Figure 6. Graph showing the number of times a visual display design was selected as the best or the worst visual display in each category.

Figure 7. Graph showing the number of times an auditory display design was selected as the best or the worst auditory display in each category.

Each participant was also asked to pair the displays from one modality with the displays from the other modality, producing best-matching pairs according to the participant's subjective judgment. Table 1 shows the two pairs of displays that were most frequently proposed as the best matching.

Table 1. Pairs of displays most frequently proposed as the best matching.

Visual | Auditory | Proposed as good match by
Stepped Progressive | Stepped Progressive | 4/6 participants
Stepped Progressive | Progressive | 4/6 participants
The same visual display was included in both proposed pairs, so it was clear that the visual display for the modality comparison in the following study would be Visual Stepped Progressive. Inspecting the results in Fig. 6, this visual display was, by far, the most preferred visual hazard awareness display. It was also rated far better than the other visual displays for providing a sense of distance to the source of the hazard. The only possible contender would have been Visual Boolean, for being better at capturing attention. It was also rated as more pleasant but, its overall preference rating being so low, we confirmed our selection of Visual Stepped Progressive. Among other things, we learned from our interviews that this visual display was preferred because the change in color indicated progression toward the hazard more clearly than a change in intensity, and some participants mentioned that additional intermediate colors might improve the design.

As for the auditory display, two alternatives had been proposed, each with the same number of supporters: Auditory Stepped Progressive and Auditory Progressive. From the post-condition questionnaire responses (see again Fig. 7), we saw that Auditory Progressive was the most preferred auditory display design, and it was the clear choice among auditory displays in terms of providing a good indication of distance to the origin. It also performed better than its contender in subjective scores for capturing attention. As a result, we selected Auditory Progressive as the auditory display design to be included in the modality comparison study that followed.

In conclusion, the pair formed by the Visual Stepped Progressive and the Auditory Progressive hazard awareness displays was identified as the pair of displays in the two modalities that were perceived as most closely comparable across the various design dimensions, while also being the preferred designs. As a result, it was decided that the modality comparison study in the next section would make use of those two displays.

3.4 Implementation of revised designs

We revised and refined the HoloLens implementations of the displays selected for the user study. The main modification was introduced in the design of the Visual Stepped Progressive display and consisted of implementing an additional intermediate color, orange, for the middle range of the hazard zone (one additional step in the progression of the stepped progressive design). The rationale for this was, primarily, the user feedback obtained from the display selection study, which indicated the desire to see additional steps grading the progression toward the core of the hazard. Based on that, we decided to add one more color, orange, between yellow and red. Only one color was added, to avoid blurring the step effect of abrupt color transitions that the participants had favored when selecting this visual design. The resulting final display designs for the modality comparison study are represented graphically in Fig. 8.
Notice that the auditory display saw no modification from the original design that had been selected. These final display designs were implemented to be rendered on a first-generation HoloLens device. With the final design, each of the unimodal hazard awareness displays behaved as illustrated in Fig. 9. Refer also to Sections 3.2.1 and 3.2.2 for further details about the behavior of each selected display (Visual Stepped Progressive and Auditory Progressive), except for the extra color step introduced for the visual display.

Figure 8. Design of the modified Visual Stepped Progressive display (left) and the Auditory Progressive display (right) compared in the user study. The vertical bar represents the central axis of the hazard.

Figure 9. Single-modality hazard awareness displays used in the modality comparison study. A user wearing a HoloLens device walks toward the source of a hazard (seen by the user as a stationary hologram in the shape of a vertical pole). When entering the hazard zone around the pole (the red dotted circle, invisible to the user), the user is made aware of the hazard through a three-step Visual Stepped Progressive display (left) or through an Auditory Progressive display (right).
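For concreteness, a minimal sketch of the final three-step visual mapping follows. Equal-width color bands are an assumption (the paper states only that orange was added for the middle range); the 1-s exit flash is as given in Section 3.2.2.

```python
HAZARD_RADIUS = 1.3  # meters
EXIT_FLASH_S = 1.0   # green confirmation cast shown for 1 s after exiting

def color_cast(d):
    """Final three-step visual display: yellow in the outer band, orange in
    the middle band, red in the inner band. Equal-width bands are an
    assumption; the paper does not give the exact band boundaries."""
    if d >= HAZARD_RADIUS:
        return None            # no cast outside the hazard zone
    band = HAZARD_RADIUS / 3.0
    if d > 2 * band:
        return "yellow"
    if d > band:
        return "orange"
    return "red"

def exit_cast(time_since_exit_s):
    """Green safety confirmation, visible for one second after leaving the zone."""
    return "green" if time_since_exit_s < EXIT_FLASH_S else None
```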
4 MODALITY COMPARISON USER STUDY

The final designs of the Visual Stepped Progressive and Auditory Progressive hazard awareness displays were compared in a user study, as described in this section. It should be stressed once more that our objective with this study was not to obtain the best possible design of a hazard awareness display. More realistically, our goal was to learn how different modality-specific design features may influence user performance and experience. For that, we chose to collect data from a limited-size cohort of participants across a range of quantitative and qualitative metrics.

4.1 Experimental design

A three-condition, within-subjects, repeated-measures study was designed to compare the performance and the UX provided by the different display designs rendered on a HoloLens device. The three conditions included in the study, compared while carrying out a pedestrian navigation task, were (i) the auditory condition (using only the auditory hazard awareness display), (ii) the visual condition (using only the visual hazard awareness display) and (iii) the audio–visual condition (with both the auditory and the visual hazard awareness displays active at the same time).

4.2 Experimental task

The study was conducted in a square, empty indoor space, 10 m on each side, with bare walls painted in a light color. One of the walls was a large glass window that looked over a hillside; no direct sunlight came in. The room was in a quiet part of the building, with no audible background noise that could disturb the participants' concentration.

The experimental task involved navigating the indoor space to complete a gamified challenge while avoiding invading the hazard zones that populated it. When glancing across the room through the HMD, the central holographic axes of eight hazard zones could be seen, distributed throughout the navigation space. The outside boundaries of the hazard zones were not visible (or otherwise perceivable) in any way when looking from outside them (see Fig. 10).

Figure 10. Still capture of the view from within one of the mazes used in the evaluation, where three virtual poles are seen through HoloLens. A yellow ball and the blue basket used in the study can also be seen on the floor.

We created three maze designs (shown in Fig. 11), each with eight hazard zones (∅2.6-m cylindrical volumes centered on each vertical pole). The layouts of the mazes were horizontally and vertically symmetrical. Walking between neighboring hazard zones without invading any of them was possible but challenging. Since the outside limits of the hazard zones were not visible from further out, the intention was that participants would enter the hazard zones accidentally, and rather frequently, when carrying out a navigation task across the space. Such accidental intrusions would trigger the stimuli from the awareness display or displays that were active in each condition.

The navigation task consisted of playing an MR game that involved collecting, one by one, six physical rubber balls (∅5 cm, three yellow and three blue) from the floor and placing each in the basket of the same color. The task had to be completed as quickly as possible, without entering any of the hazard zones; if a hazard zone was entered, it had to be exited as promptly as possible. The task always started with the participant standing next to the yellow basket. The participant then collected the blue balls and put them one by one in the blue basket, thus finishing the task near the blue basket. The task stopped when the participant clicked the 'end' option on a holographic panel menu located next to the blue basket. The physical balls and baskets could be seen on the floor with the naked eye. Through the HoloLens device, the rubber balls and the baskets could be seen in relation to the central axes of the hazard zones. The relative positions of balls and baskets required that a participant walk along lengthy, meandering routes to complete the task without invading any hazard zone. The balls were distributed evenly across the floor and in similar ways in the three mazes (one ball of each color was always placed in the half of the space containing the basket of the same color, and two in the opposite half).

Figure 11. Maze designs.

4.3 Participants

We recruited 12 participants for the study (6 female and 6 male, aged 18–30 years).
Of the 12 participants, 7 declared some previous experience with AR technologies: 5 had played Pokémon Go and 2 had used decoration apps. The following additional prior experiences were reported by one participant each: playing AR videogames, using AR advertisements, building a box to be used with a smartphone and taking a module on AR during their Bachelor's degree studies. In addition, 6 of the 7 participants with some AR experience, plus another 4 participants (10 in total), declared having some prior experience with VR. Most of them (8 of the 10) mentioned experience with simulators; 2 had used VR glasses and 1 had used glasses that create a VR environment with a mobile phone. The participants were interns and postgraduate students working in technology and engineering fields, and none of them had been involved in the display selection study. An internal call for voluntary participation was advertised for participants to opt in. No monetary or in-kind reward was offered in exchange for participation. Gender quotas were applied to ensure participation in equal numbers. Once the quotas were filled, further volunteers were added to a waiting list. Participants were informed of their right to withdraw from the study at any point, and of the anonymity of their participation.

4.4 Procedure and data collection

Each participant completed the game in the three conditions (visual, auditory and audio–visual), each condition triggering the corresponding modality (or modalities) of hazard warning response from the selected display designs. The order of the conditions was counterbalanced between participants. For each condition, one of the three maze configurations was allocated randomly, so that routes could not be learned. Each session started with the filling out of a demographic questionnaire, followed by a training session that was split into two parts.

• The first part served for habituation to the HoloLens device and for acquiring the basic knowledge needed to use it in the study. During this part, the participant simply walked and looked around wearing the device. A holographic menu panel could be seen through the device, in which they could click buttons with the default HoloLens finger gesture, as a simple way of interacting with the augmented world.
• The goal of the second part was to understand the awareness display in the three conditions. For this part, three hazard zones were created, each with its visible holographic central vertical axis and a surrounding hazard zone with the same dimensions as in the main experimental task. Entering the hazard zone of each cylinder triggered the warning information of one of the conditions: visual, auditory or audio–visual. Several balls were placed on the floor in random places for the participant to collect, so that the displays could also be experienced in a context resembling that of the game in the experimental task.

During the experiment, participants filled out a post-task questionnaire after each condition (after playing the game and collecting all six balls in that condition).
The questionnaire consisted of a Single Ease Question (SEQ) score [Sauro, 2012] (a 7-point Likert scale rating how easy it was to carry out the experimental task in that condition, from very difficult to very easy) and a raw NASA-TLX questionnaire (RTLX), extended with the category irritability, often included in auditory display research [Haas & Edworthy, 1996]. At the end of the experimental session, a semi-structured interview was conducted, which sought to obtain further insight from the participants regarding their experiences in playing the game (completing the experimental task). Additional questions in the interview were motivated by the answers that each participant had provided in the questionnaires. The interviews were transcribed and analyzed following the affinity diagramming methodology described in Lucero [2015].

5 RESULTS AND DATA ANALYSIS

The quantitative and qualitative data collected are presented and analyzed in this section. We combine the findings from both types of data to obtain a picture of the results that is as complete and nuanced as possible. Specifically, we choose not to make use of null-hypothesis significance testing, in favor of effect sizes (ESs), estimation techniques using 95% confidence intervals and an accumulation of evidence that allows readers to extract their own critical conclusions, without constraining discussions to dichotomous assessments of significance. This is increasingly accepted as the preferred approach to statistics in human–computer interaction (HCI) [Cumming, 2014, Dragicevic, 2016]. When presenting quantitative results, we report mean differences as unstandardized ESs [Dragicevic, 2020].
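As a minimal sketch of this estimation approach, the following computes a condition mean with a 95% confidence interval. The paper does not state how its intervals were computed; a t-interval over the n = 12 participants is assumed here, and the sample values are hypothetical, not the study's data.

```python
import statistics
from scipy import stats  # assumes SciPy is available

def mean_ci95(samples):
    """Mean and t-based 95% confidence interval for one condition's samples.
    A t-interval is an assumption; the paper does not specify its CI method."""
    n = len(samples)
    m = statistics.mean(samples)
    sem = statistics.stdev(samples) / n ** 0.5   # standard error of the mean
    t_crit = stats.t.ppf(0.975, df=n - 1)        # two-sided 95% critical value
    return m, (m - t_crit * sem, m + t_crit * sem)

# Hypothetical completion times in seconds (illustration only, not study data)
times = [139.2, 151.0, 98.7, 160.3, 200.1, 120.5,
         135.8, 149.9, 110.2, 175.4, 142.6, 130.0]
m, (lo, hi) = mean_ci95(times)
print(f"mean = {m:.2f} s, 95% CI [{lo:.2f}, {hi:.2f}]")
```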
5.1 Quantitative results

The responses to the SEQ after each condition (post-task questionnaire) are summarized in Fig. 12, with higher values indicating that the task was easier to complete. On average, task ease was rated as moderate in all conditions, with only small differences between average values and mostly overlapping confidence intervals. The average ease rating in the visual condition (3.83, 95% CI [2.76, 4.91]) was slightly lower than the ratings in the audio–visual condition (4.67, 95% CI [3.72, 5.62]) and the audio-only condition (4.58, 95% CI [4.01, 5.16]). Amid the uncertainty of these ESs, the data showed some chance that the task might be more difficult to complete when only visual feedback was present.

Figure 12. Responses to the SEQ post-task questionnaire, rating how easy it was to execute the task in each condition. The graph shows average values per condition. Error bars represent 95% confidence intervals.

The following two figures summarize the results from the observed quantitative metrics. The data from time-related metrics are presented graphically in Fig. 13. The graph on the left shows that the average time to complete the task (to play the game) was shortest in the audio-only condition (139.89 s, 95% CI [99.49 s, 180.28 s]) and longest in the visual-only condition (158.48 s, 95% CI [126.24 s, 190.72 s]), which was only slightly longer on average than the audio–visual condition (150.45 s, 95% CI [111.25 s, 189.64 s]). Differences between conditions were thus small and uncertainty was high, so no clear effect of condition on completion time can be claimed.

These results raise the question: how much of the total task execution time did participants spend inside hazard zones? The data shown in the middle and right graphs of Fig. 13 respond to that question. The middle graph shows that the average total time spent inside hazard zones during the execution of the task (cumulative time over all intrusions into hazard zones during one task execution) was longest in the visual-only condition, amounting to 55.27 s on average (95% CI [36.68 s, 73.86 s]), over a third of the total task completion time in that condition. This condition also showed the highest average percentage of the total execution time spent inside hazard zones (Fig. 13, right graph): 34.48% in the visual condition (95% CI [26.16%, 42.81%]). This percentage was slightly lower in the audio condition (31.37%, 95% CI [22.56%, 40.18%]), and lowest in the audio–visual condition (25.13%, 95% CI [16.19%, 34.01%]), with about a quarter of the navigation time spent inside hazard zones. Comparison of confidence intervals suggests some chance that users remain inside hazard zones longer if they receive only visual information. However, the uncertainty around this possible effect is high.

Figure 13. Task completion time, the total time spent inside hazard zones and the relative proportion of one to the other. Error bars represent 95% confidence intervals.

Figure 14. Total number of intrusions into hazard zones during a task execution (left), and the mean duration of an intrusion, in seconds (right). The graph shows average values per condition. Error bars represent 95% confidence intervals.

The graphs in Fig. 14 add to the data just presented by zooming into single intrusions into hazard zones. The graph on the left shows that the number of times a participant entered and then exited a hazard zone during the execution of the task was independent of the hazard warning display. In fact, the average number of intrusions into hazard zones throughout a complete game play turned out to be almost exactly 23 in all three conditions: 23.67 with the auditory display (95% CI [17.99, 29.34]), 22.75 with the visual display (95% CI [17.87, 27.63]) and 22.5 with the combined audio–visual display (95% CI [15.35, 29.65]). This coincidence is not surprising, because the hazard displays in this study were activated only after entering a hazard zone, and they offered no guidance beforehand. In contrast, the graph on the right shows that, once inside a hazard zone, the type of hazard display may have had an influence on how long it took a participant to return to safety.
In fact, the results from the study show that, on average, it took participants longest to return to safety when the display was only visual (2.37 s, CI95% [1.89 s, 2.84 s]). If the display included auditory feedback, time to safety was clearly lower on average: with the auditory display alone, the average time spent in a hazard zone was 1.76 s (CI95% [1.39 s, 2.12 s]), and with both displays present, the average time was reduced only slightly further (1.55 s, CI95% [1.11 s, 1.98 s]). This suggests that, although there might be some benefit in having both displays, the main difference can probably be attributed to the contribution made by the auditory display. While some degree of uncertainty remains around these ESs (there is some overlap between all pairs of confidence intervals), this is the metric with the largest ES observed in the study.
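As an illustration of how the four metrics in Figs 13 and 14 relate to one another, the sketch below derives all of them from a hypothetical time-stamped log of hazard-zone enter/exit events for one task execution. The event format and function are illustrative assumptions, not the logging code used in the study.

```python
def intrusion_metrics(events, task_duration_s):
    """Derive the reported metrics from a time-ordered log of
    ('enter', t) / ('exit', t) hazard-zone events, one task execution."""
    durations = []
    t_enter = None
    for kind, t in events:
        if kind == 'enter':
            t_enter = t
        elif kind == 'exit' and t_enter is not None:
            durations.append(t - t_enter)
            t_enter = None
    total_inside = sum(durations)
    return {
        'num_intrusions': len(durations),                      # Fig. 14, left
        'total_inside_s': total_inside,                        # Fig. 13, middle
        'pct_inside': 100.0 * total_inside / task_duration_s,  # Fig. 13, right
        'mean_intrusion_s': (total_inside / len(durations)     # Fig. 14, right
                             if durations else 0.0),
    }

# Hypothetical log: three brief intrusions during a 150 s task execution.
log = [('enter', 12.0), ('exit', 14.1), ('enter', 40.3), ('exit', 43.0),
       ('enter', 90.5), ('exit', 92.2)]
print(intrusion_metrics(log, task_duration_s=150.0))
```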
Figure 15. Results from the extended RTLX questionnaire, presented by condition (A: audio; V: visual; AV: audio–visual). Notice that the irritability category was added as an extension and was not used to compute the RTLX index. The graph shows average values per condition. Error bars represent 95% confidence intervals. Numerical values of means and confidence intervals are shown in Table 2.

The next step in this analysis was to understand the mental processes of the participants and to learn about their subjective experiences during the study. An entry point for this is the quantitative analysis of the results collected through the extended RTLX questionnaire. As described in the previous section, this is the raw NASA task load index questionnaire extended with an additional category, irritability, which was not used to compute the RTLX index. The collected data are summarized graphically in Fig. 15. For better legibility of the discussion in the main text, the numerical data (mean values and limits of the 95% confidence intervals) are collected in Table 2. As can be seen in Fig. 15 (by inspecting the small differences between means and the large overlaps between confidence intervals), neither the RTLX index itself, nor any of the categories that make up the index, nor the irritability category showed any differences between conditions that can be remarked on.

5.2 Qualitative results

Interpreting the quantitative observations presented above requires a thorough and systematic analysis of the comments and opinions of the participants in the study. As described above, qualitative data were collected by conducting semi-structured interviews with the participants after they had completed all the conditions. All data from the interviews were then analyzed through affinity diagramming. The facilitator conducted the interviews by focusing the discussion on the following themes: pleasantness, task difficulty, change of color (in the visual display), change of frequency (in the auditory display), constant presentation of stimuli, direction and distance of the origin of the hazard relative to the participant, and perception of danger.

During the interviews, the facilitator invited the participants to identify positive and negative aspects of each theme with reference to the different display versions. All interviews were transcribed, and individual comments were collected on Post-it notes that were then grouped in an affinity diagram. Through this process, three main subjective experience categories were identified within which all obtained comments could be classified: pleasantness, task difficulty and danger. Given the diversity of information related to danger, that third category was subdivided into three subcategories: perception of danger, distance to the origin of the danger and direction of the origin of the danger relative to the user. This structure of subjective experience categories is represented graphically in Fig. 16. Within each category or subcategory, identifiers of participants providing relevant comments are clustered by condition and by whether the comments conveyed positive or negative aspects of the corresponding display in that category. The following discussion aims to gain insight into each category of the structure, based on the participant responses and comments in each cluster.

Table 2. Numerical data corresponding to the extended RTLX questionnaire results summarized in Fig. 15 (irritability was not used to compute the RTLX index). Cells show mean [95% confidence interval].

                   Auditory              Visual                A&V
Mental demand      11.00 [7.89, 14.11]   11.42 [7.87, 14.96]   10.92 [7.83, 14.00]
Physical demand     8.33 [4.72, 11.95]    8.83 [4.97, 12.70]    8.33 [4.80, 11.87]
Temporal demand    10.42 [7.49, 13.34]   10.92 [8.06, 13.78]    9.67 [6.78, 12.56]
Effort             11.83 [9.03, 14.63]   12.33 [9.42, 15.24]   11.59 [8.99, 14.17]
Performance         9.50 [7.43, 11.57]    9.58 [7.08, 12.09]    9.50 [7.27, 11.73]
Frustration         8.42 [5.51, 11.33]    9.25 [6.08, 12.42]    8.42 [5.44, 11.39]
Irritability        7.33 [4.54, 10.13]    6.25 [3.37, 9.13]     6.42 [3.48, 9.35]
RTLX                9.92 [7.91, 11.92]   10.39 [7.88, 12.90]    9.74 [7.55, 11.92]
Figure 16. Structure of qualitative results, as revealed through affinity diagramming. Participant identifiers (Ux) are clustered in the corresponding categories. Examples of comments that motivated the clustering are provided in the main text.

5.2.1 Pleasantness

In the visual condition, half of the participants (6/12) expressed negative opinions regarding pleasantness, versus three participants who expressed positive opinions. Two participants (U9, U10) found this the least pleasant of the conditions. Others (U5, U6) mentioned, as a source of unpleasantness, the constant light and tinted scene for as long as the participant was in the hazard zone. Two participants (U10, U11) drew a contrast with audio on this point: they found it unpleasant to have the colored light constantly and everywhere in their FoV, as opposed to the audio condition, in which sound was spatially located in a specific direction (the direction of the source of danger). For U1, this produced frustration and could lead to fatigue. Of the three participants who voiced positive opinions about the visual display in relation to pleasantness, two (U1, U2) felt it produced less irritability than the other conditions, leading to a better overall experience. In the audio condition, there was a balance between the numbers of positive and negative comments regarding pleasantness. U1 commented that the sound was startling, which could generate irritability, while U2 experienced frustration. However, others commented that the audio display was pleasant (U9) and that, due to the directionality of spatial sound, the user could easily detect the direction of the danger (U10, U11). In the audio–visual condition, participants made only negative comments regarding pleasantness. Two participants (U1, U4) felt that too much information was given at the same time, while another participant said that audio–visual warnings were unsuitable for use in applications where users need to concentrate for long periods of time (U11).

5.2.2 Task difficulty

Participants felt that task difficulty varied depending on the condition. Most participants (10/12) spoke negatively about the audio–visual condition, where eight participants (U1, U3, U4, U6, U7, U8, U9, U10) said that too much information was presented, which was distracting. However, U6 felt that this only occurred at the beginning of the task. U10 said that this issue could slow down task completion. Distraction might be worsened by the presence of the arrow in the visual interface (U5), which was intended to indicate the origin of the hazard. It was also suggested that too much information could reduce user concentration (U11).
The visual-only condition also received more negative than positive comments regarding task difficulty. U5 felt that the light was distracting, rather than an indicator of danger. Others felt this condition made the task more difficult (U1, U12). However, two participants (U3, U8) said this only occurred at the beginning of the task. In contrast with the majority of other participants' opinions, U12 stated that the visual condition generated the least mental demand, which made it easier to perform the task. Very few participants spoke about task difficulty in the audio condition. U12 said that mental demand was higher in this condition than in the others. However, U11 felt the opposite, saying audio improved concentration on the task.

5.2.3 Danger

Participant comments regarding danger led to the identification of three subcategories: recognizing the direction (relative to the user) of the origin of the danger, judging the distance to the center of the danger zone and the overall level of danger perceived. Each of these factors is discussed separately.

Direction of danger: The need to identify the direction of the danger zone was mentioned only for the visual and audio conditions, where the visual condition received more negative comments (7/12 participants commented negatively) and the audio condition more positive ones (7/12 participants commented positively). Considering the visual condition, six participants (U3, U4, U5, U6, U7, U9) commented that the colored light did not indicate the direction of the danger. Three participants (U6, U7, U11) pointed out that the arrow symbol used to indicate the direction of danger was less effective than the spatial sound presented in the audio condition. In contrast, five participants (U2, U7, U8, U11, U12) said the spatial sound in the audio condition made it easy to detect the direction of the danger. This helped users to avoid danger zones without needing to look directly at them, making task completion easier (U5) and danger detection easier than with the arrow used in the visual condition (U6, U7). However, three participants (U2, U10, U12) stated that the audio did not indicate the direction of danger accurately, and U6 said it was difficult to interpret the spatial sound.

Distance to danger: Participants commented only on the visual and audio conditions, not on the combined audio–visual condition. The participants (4/12) who spoke positively about this theme in the visual condition (U3, U6, U8, U10) explained that, thanks to the different color spheres, it was easy to judge the distance to the center of the danger zone. U10 commented that a visual signal was necessary to correctly indicate the range of the danger zone, with audio being of little use in this context. However, two participants felt differently, commenting that the visual condition did not enable them to judge the distance to the center of the danger zone (U9, U11). U9 found it easier to judge distance in the audio condition. Five of the 12 participants made negative comments about distance perception in the audio condition. Four of them (U3, U6, U7, U10) indicated difficulty in discerning the frequency variation, which could lead to judging the distance to the danger incorrectly. Two participants (U8, U10) said that it was easier to react to the changing colors in the visual condition than to the frequency variation in audio. However, three users had positive opinions about audio, with two (U11, U12) finding it easy to detect the frequency variation, and U9 stating that it was easier to react to the frequency variation than to the color change.
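The color-sphere and frequency-variation cues that participants discuss above can both be viewed as parametric mappings from the user's depth inside the hazard zone to a display parameter. The sketch below illustrates that general idea with made-up values (linear interpolation, a yellow-to-red tint, a 1–8 Hz pulse-rate range); the actual parameter choices of the displays are those defined in Section 3, not these.

```python
def hazard_depth(dist_to_core_m, zone_radius_m):
    """Normalized depth into the zone: 0.0 at the boundary, 1.0 at the core."""
    return max(0.0, min(1.0, 1.0 - dist_to_core_m / zone_radius_m))

def visual_tint(depth):
    """Scene tint interpolated from yellow (boundary) toward red (core)."""
    return (1.0, 1.0 - depth, 0.0)  # (R, G, B); illustrative endpoints

def audio_pulse_rate_hz(depth, rate_min=1.0, rate_max=8.0):
    """Warning-sound pulse rate rises as the user approaches the core."""
    return rate_min + depth * (rate_max - rate_min)

# A user 0.8 m from the core of a 2 m zone is 60% of the way in.
d = hazard_depth(dist_to_core_m=0.8, zone_radius_m=2.0)
print(visual_tint(d), audio_pulse_rate_hz(d))
```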
Danger level perception: In the visual condition, most of the comments regarding danger perception were negative (seven negative comments versus two positive ones were recorded). Some participants (U4, U9, U12) felt that the visual condition did not adequately transmit a feeling of danger, as users became accustomed to the light and thus less aware of the danger (U8, U12), which U1 said also worsened reaction time. Two participants (U4, U5) found the light distracting. Others said the light caused slower reaction times (U3, U8, U9, U12). However, U10 differed, saying that visual information caught the attention of users better and thus enabled faster reaction to the danger. The audio condition, in contrast, received mainly positive comments on danger perception (11 of the 12 participants commented positively about it). Half of the participants (U1, U3, U4, U6, U7, U9) explicitly said that audio led to faster reaction times, and some (U4, U6, U9) said audio was the most effective method of communicating a feeling of danger. Two participants (U5, U12) said that the audio feedback was best at grabbing user attention. U5 commented that sound alone is sufficient to transmit an awareness of danger, and others felt that audio was a better method for warning about danger than visual feedback (U8, U9, U12). U12 said that the difference between the visual and the audio conditions was that users could become accustomed to light stimuli but that it was impossible for this to happen with audio stimuli. However (as the only negative comment about audio for danger perception), U10 felt that it was also possible to become accustomed to sound and thus ignore it, eventually leading to slower reaction times. Finally, regarding the audio–visual condition, only two participants made comments, and both were positive. One participant (U12) felt that the audio–visual display was the best method to transmit a feeling of danger, and another (U3) stated that it enabled faster reaction to the danger.

6 DISCUSSION

Having laid out all the quantitative and qualitative results obtained from the user study, we can now draw a set of conclusions. A first general conclusion (reached by others, e.g. Houtenbos et al., 2017, for audio–visual warning displays) is that the three display configurations were similarly usable as hazard warning displays, which supports the notion that the selection of visual and auditory display designs (reported in Section 3) was appropriate for the comparison user study: the difficulty of the experimental task was rated as medium in the three conditions (see SEQ questionnaire results, Fig. 12); the total time to complete the task remained mostly between 2 and 3 minutes in all cases (see left graph in Fig. 13); and the task load index and its constituent components were rated very similarly across conditions (see Fig. 15). Inspecting the results more closely, however, subtle (and not so subtle) patterns of differences between displays emerged from the quantitative data, many of which are also reflected in the qualitative participant comments, opinions and preferences. Quantitatively, the most clearly observed difference between conditions was that each single intrusion into a hazard zone lasted longest when navigating with the visual-only display (see Fig. 14, right), although some uncertainty about this finding remained. This is the result from the study with the most practical significance.
How urgently a user must abandon a hazard zone will depend on the application scenario and on the actual level of danger in which users may find themselves. The importance of leaving the hazard zone promptly will also depend on the disturbance the user causes to a system or process by remaining in the hazard zone, resulting in the activation of safety mechanisms. Taking these aspects into account, this finding suggests to designers that an auditory warning display is likely to lead users to vacate a hazard zone sooner than a visual-only display. Although with greater uncertainty, an indication of a possible disadvantage of receiving visual-only feedback was also observed for the whole-task completion time metric (Fig. 13). With even greater uncertainty, results from the SEQ questionnaire (Fig. 12) may tentatively suggest slightly increased difficulty in completing the task with visual feedback alone. A relative disadvantage of visual-only warning displays, in relation to other modalities or combinations of modalities in multimodal displays, is in line with what is reported in the literature. For instance, a visual-only design was found to be inferior to an auditory-only or an audio–visual design in terms of effectiveness and participant preference [Gutwin et al., 2011, Houtenbos et al., 2017]. However, further research is required to confirm that this correlation across metrics exists, given the uncertainty observed around these ESs. Regarding the other quantitative data from the study, no condition effect was observed for any of the categories measured with the extended RTLX questionnaire. Turning to the qualitative data, possible explanations for the effects observed were articulated by the participants. The analysis of this qualitative feedback is crystallized in the categorization structure of Fig. 16 and elaborated in Section 5.2. The pattern in the quantitative results discussed above indicates that the presence of auditory feedback in the display made it more effective at helping the user leave the hazard zone and return to safety swiftly and successfully. Relevant participant insights were clustered in the danger perception category in Fig. 16. In that category, large clusters of positive comments for the auditory display and negative ones for the visual display were collected, with comparatively few instances of the opposite sign in each case. The comments in these clusters (which are based on first-hand experience acquired during participation in the study) state that auditory stimuli were more effective at grabbing participant attention and boosting reaction than visual stimuli. A similar reaction-boosting capacity of auditory vs. visual stimuli was reported in prior studies, e.g. in Chan & Ng [2009] and in Haas & Van Erp [2014]. In addition, participants believed that habituation to the stimuli in the visual display was likely, but much less so to the stimuli in the auditory display; such habituation would further lower the attention-grabbing and urgency-conveying capacity of the visual display. This advantage of the auditory display over the visual display (good at grabbing attention and reducing reaction time, with good resilience to habituation) could be part of the explanation for the pattern observed in the quantitative results, according to which it took longer for participants to leave the hazard area with the visual-only display (i.e. in the absence of auditory feedback in the display).
The qualitative data revealed a possible explanation for this, related to identifying the direction from which the hazard warning was emerging (i.e. the direction, relative to the user, of the pole activating the hazard warning, away from which the user had to move). The fact that auditory warning signals are more effective than visual signals at attracting attention to spatial locations has been documented in the literature [Haas & Van Erp, 2014, Spence & Driver, 2017]. As discussed in relation to the danger direction category in Fig. 16, participants found that spatial audio was an effective way to recognize the direction of the triggering hazard. This was in contrast to the visual display, in which the narrow FoV of the HMD device required decoding the indications given by the arrow-shaped directional artifact (the study in Politis et al., 2014, also supports this notion). This blindness outside the central region of the visual field, and the mental burden of decoding the behavior of a visual artifact, likely contributed to participants lingering inside the hazard zone for longer. Considering also the comments from the danger distance category in Fig. 16, color transitions in the visual display were regarded as effective for judging relative distance to the core of the hazard, but deriving direction from evolving color gradients required engaging in trial-and-error displacements of the body until the direction of maximum gradient was found. Altogether, according to participant comments, the mechanisms provided by the visual display were less intuitive, less ecologically valid and cognitively more demanding. The qualitative data revealed some additional key aspects with the potential to affect the UX obtained using the hazard warning displays. Regarding the auditory display, a key aspect that may affect UX negatively is its potential to be found irritating [Haas & Edworthy, 1996]. Several participants mentioned this, although the irritability scale failed to capture it. This, together with the resulting cluttering of the auditory scene, could lead to fatigue over time, as also suggested by some participants. A few participants simply said that they did not like audio, which might be due to personal preference. This is a trade-off between effective warning (auditory warnings cannot be voluntarily ignored or easily habituated to) and good tolerability over extended periods of time (cumulative irritation must be avoided). In the case of the visual display, an aspect that can affect UX negatively is the widely acknowledged narrow FoV of current AR HMD devices, and in particular of the first-generation HoloLens device used in this study. AR with a narrow FoV leads to poor ecological validity of information presented through the visual channel, because it deviates significantly from how humans make use of their visual FoV to explore their environment, including the role peripheral vision plays during navigation by viewing and recognizing objects in the surroundings without looking at them directly [Yamamoto & Philbeck, 2013]. The compensatory dynamic visual cue that was introduced needs to be learned, and it contributes to cluttering the visual scene, all of which can be detrimental to the UX obtained [Vi et al., 2019]. User comments presented in Section 5 include negative feedback about the visual display related to this.
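To illustrate why a narrow FoV forces a compensatory artifact, the following sketch computes the hazard's bearing relative to the user's gaze and flags when it falls outside the horizontal FoV, at which point an arrow-like cue must stand in for direct sight. The geometry and the half-FoV value are illustrative assumptions, not parameters taken from the study's implementation.

```python
import math

def relative_bearing_deg(user_xz, user_yaw_deg, hazard_xz):
    """Signed horizontal angle from the user's gaze direction to the hazard:
    0 = straight ahead, positive = to the right, wrapped to [-180, 180)."""
    dx = hazard_xz[0] - user_xz[0]
    dz = hazard_xz[1] - user_xz[1]
    world_deg = math.degrees(math.atan2(dx, dz))  # angle from the +z axis
    return (world_deg - user_yaw_deg + 180.0) % 360.0 - 180.0

def needs_arrow_cue(bearing_deg, half_fov_deg=17.0):
    """True when the hazard lies outside the HMD's horizontal FoV, so a
    compensatory artifact (the arrow) must convey direction instead.
    Spatial audio needs no such fallback: it is rendered from the hazard's
    direction regardless of where the user is looking."""
    return abs(bearing_deg) > half_fov_deg

b = relative_bearing_deg(user_xz=(0.0, 0.0), user_yaw_deg=0.0,
                         hazard_xz=(2.0, 1.0))
print(b, needs_arrow_cue(b))  # ~63.4 degrees: outside a narrow FoV
```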
7 LIMITATIONS

The study just presented is limited by its small sample size (12 participants). Such a sample size is not uncommon in the formative HCI literature. However, it limits the conclusions that can be drawn from the data the study is able to provide (in the case of this paper, reflected in the large uncertainty surrounding all ESs). A larger sample size would have reduced this uncertainty and revealed findings with more clarity. An additional limitation is the extent to which the design space for auditory and visual awareness displays was explored and represented in the display options produced and in the designs selected for comparison. Several alternative design options not included in the selected designs are mentioned in Section 3. Although justification for the design decisions has been provided, some of the alternative options mentioned would probably have made the display patterns implemented for the different modalities more closely comparable. A final limitation is imposed by the state of the art of the technology used for the study. The color cast intensity gradient could not be fully exploited for parametric information display design because of the poorly distinguishable range of intensities possible with some colors. More significantly, the narrow FoV of the device hindered the potential of vision as a sensory channel, and additional visual information display artifacts had to be created to partially compensate for this.

8 CONCLUSIONS AND NEXT STEPS

The aim of this work was to compare visual and auditory displays rendered on a HoloLens multimodal (auditory and visual) head-mounted MR device (as representative of the current state of the art in this type of device). Motivated by the increasing presence of fenceless collaborative robots in work environments, we were interested in using such displays to warn users about the hazard zones that robots generate around themselves, in a scenario of pedestrian navigation of a space shared between humans and robots. However, to provide a setup with relevance in other domains, a generic static hazard representation and experimental task were used in this study. The study compared the use of a visual-only hazard awareness display, an auditory-only display and an audio–visual display (the combination of the other two), in the context of a pedestrian navigation game that participants played as the experimental task. The quantitative and qualitative data obtained suggest that there are only small differences between conditions regarding performance in executing the navigation task while also keeping outside hazard zones. In that sense, the displays designed (through the process described in the first part of the paper) all passed the scrutiny of usability and usefulness with comparable scores. However, a subtle but consistent pattern appeared in the data, which suggests that user performance and effectiveness at keeping the person safe might have been slightly superior when auditory feedback was provided (on its own or in combination with the visual display), likely because of its attention-grabbing effectiveness and the clear directionality of the spatial sound implementation used. In a further analysis of the results, the inspection of qualitative data identified aspects of the displays (intrinsic to the differences between sensory channels and display technologies) with the potential to negatively affect the UX obtained from using the displays.
Regarding the auditory display, a likely negative influence on UX is the fact that, over time, human users tend to find auditory alarms and alerts irritating, among other reasons because they cannot be ignored or easily habituated to. This is desirable for the warning display to remain effective, but undesirable over time for aspects like worker satisfaction. On the part of the visual display, the narrow FoV of current HMD devices is likely to degrade UX, given the resulting difficulty of maintaining awareness of objects in the near surroundings, which lie in the regions of peripheral vision. In addition, the ease with which users can get used to visual cues can result in the display not fulfilling its main function, and this in itself would lead to a poor UX. The next steps of our research will focus on addressing some of the limitations of this study and improving the display designs based on the main findings. For instance, in order to reduce irritability, a revised design of the audio–visual display might convey only visual stimuli in the outer layer of the hazard zone, with audio near the core for increased urgency and possibly also to signal crossing the external boundary of the hazard zone. As for the visual FoV, HMD devices with wider FoVs will be used; VR environments will also be considered to evaluate designs with fewer FoV restrictions. An additional expected benefit of using next-generation HMD XR devices, or devices based on video see-through technology (instead of optical see-through), is that it may be possible to implement additional visual display designs that make optimal use of display parameters such as color cast intensity (overcoming one of the limitations of this study). Another important aspect for research with the next display implementations is the possibility for users to adjust and customize the setting of each information-conveying parameter to their needs and preferences (e.g. color range and saturation levels, and settings for the auditory signal patterns such as loudness gradients, frequency of pulsation, pulse selection and others). Expanding the design space further, the cited related literature has shown that the tactile modality may play an important role as an alternative to audio in boosting the attention-grabbing and urgency-conveying capacity of a visual display. Importantly, tactile displays can provide access to warning information for persons with sensory disabilities and for contexts with changing environmental conditions (e.g. noisy contexts). For this reason, a tactile display will be considered and studied as part of trimodal designs, and as a display component that can help implement design-for-all principles [Stephanidis, 2000]. An important final requirement to be observed in follow-up research is the design of studies with larger user sample sizes. This will offer higher power for the identification of ESs and thus yield results with lower uncertainty than those reported in the present study. Studies will also be longer, with tasks that are closer to real scenarios and applications. Regarding the application domain, which is very generic in this study, it will be made explicitly relevant to navigation among robots. To achieve this, a navigation environment will be created in which real robots (both stationary and mobile) share space with human users. The goal will be to obtain findings that have relevance and applicability in the intended application area.
Funding

The Centre for the Development of Industrial Technology (CDTI). 5R-Red Cervera de Tecnologías robóticas en fabricación inteligente (CER-20211007).

References

Aaltonen, I. and Salmi, T. (2019) Experiences and expectations of collaborative robots in industry and academia: Barriers and development needs. Procedia Manuf., 38, 1151–1158.
Bach, D. R., Neuhoff, J. G., Perrig, W. and Seifritz, E. (2009) Looming sounds as warning signals: The function of motion cues. Int. J. Psychophysiol., 74, 28–33.
Behrens, R., Saenz, J., Vogel, C. and Elkmann, N. (2015) Upcoming technologies and fundamentals for safeguarding all forms of human–robot collaboration. In 8th Int. Conf. Safety of Industrial Automated Systems (SIAS 2015), pp. 18–20. Deutsche Gesetzliche Unfallversicherung (DGUV), Königswinter, Germany.
Blanco-Novoa, O., Fernández-Caramés, T. M., Fraga-Lamas, P. and Vilar-Montesinos, M. A. (2018) A practical evaluation of commercial industrial augmented reality systems in an industry 4.0 shipyard. IEEE Access, 6, 8201–8218.
Bolstad, C., Costello, A. and Endsley, M. (2006) Bad situation awareness designs: What went wrong and why. In Proc. 16th World Congress of the Int. Ergonomics Association. Elsevier Science & Technology Books.
Chan, A. H. and Ng, A. W. (2009) Perceptions of implied hazard for visual and auditory alerting signals. Safety Sci., 47, 346–352.
Colley, M., Walch, M., Gugenheimer, J. and Rukzio, E. (2019) Including people with impairments from the start: External communication of autonomous vehicles. In Proc. 11th Int. Conf. Automotive User Interfaces and Interactive Vehicular Applications: Adjunct Proceedings, pp. 307–314. ACM.
Cumming, G. (2014) The new statistics: Why and how. Psych. Sci., 25, 7–29.
Doshi, A., Smith, R. T., Thomas, B. H. and Bouras, C. (2017) Use of projector based augmented reality to improve manual spot-welding precision and accuracy for automotive manufacturing. Int. J. Adv. Manuf. Technol., 89, 1279–1293.
Dragan, A. D., Lee, K. C. and Srinivasa, S. S. (2013) Legibility and predictability of robot motion. In Proc. 8th ACM/IEEE Int. Conf. Human–Robot Interaction, pp. 301–308. IEEE Press.
Dragicevic, P. (2016) Fair statistical communication in HCI. In Modern Statistical Methods for HCI, pp. 291–330. Springer.
Dragicevic, P. (2020) A mean difference is an effect size. PhD Thesis, Inria Saclay Ile de France.
Edworthy, J. and Hellier, E. (2006) Alarms and human behaviour: Implications for medical alarms. Brit. J. Anaesthesia, 97, 12–17.
Edworthy, J., Loxley, S. and Dennis, I. (1991) Improving auditory warning design: Relationship between warning sound parameters and perceived urgency. Human Factors, 33, 205–231.
Evans, G., Miller, J., Pena, M. I., MacAllister, A. and Winer, E. (2017) Evaluating the Microsoft HoloLens through an augmented reality assembly application. In Degraded Environments: Sensing, Processing, and Display 2017, vol. 10197, p. 101970V. International Society for Optics and Photonics.
Giang, W. and Burns, C. M. (2012) Sonification discriminability and perceived urgency. In Proc. Human Factors and Ergonomics Society Annual Meeting, vol. 56, pp. 1298–1302. SAGE Publications, Los Angeles, CA.
Gutwin, C., Schneider, O., Xiao, R. and Brewster, S. (2011) Chalk sounds: The effects of dynamic synthesized audio on workspace awareness in distributed groupware. In Proc. ACM 2011 Conf. Computer Supported Cooperative Work, pp. 85–94. ACM.
Haas, E. C. and Edworthy, J. (1996) Designing urgency into auditory warnings using pitch, speed and loudness. Comput. Control Eng. J., 7, 193–198.
Haas, E. C. and Van Erp, J. B. (2014) Multimodal warnings to enhance risk communication and safety. Safety Sci., 61, 29–35.
Haddadin, S., Albu-Schaffer, A., De Luca, A. and Hirzinger, G. (2008) Collision detection and reaction: A contribution to safe physical human–robot interaction. In 2008 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, pp. 3356–3363. IEEE.
Hancock, P. A., Billings, D. R., Schaefer, K. E., Chen, J. Y., De Visser, E. J. and Parasuraman, R. (2011) A meta-analysis of factors affecting trust in human–robot interaction. Human Factors, 53, 517–527.
Hanington, B. and Martin, B. (2012) Universal Methods of Design: 100 Ways to Research Complex Problems, Develop Innovative Ideas, and Design Effective Solutions. Rockport Publishers.
Hoff, K. A. and Bashir, M. (2015) Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors, 57, 407–434.
Houtenbos, M., de Winter, J. C., Hale, A., Wieringa, P. and Hagenzieker, M. (2017) Concurrent audio-visual feedback for supporting drivers at intersections: A study using two linked driving simulators. Appl. Ergon., 60, 30–42.
Kaplan, A. D., Cruit, J., Endsley, M., Beers, S. M., Sawyer, B. D. and Hancock, P. (2020) The effects of virtual reality, augmented reality, and mixed reality as training enhancement methods: A meta-analysis. Human Factors, 63, 706–726.
Karwowski, W., Rahimi, M., Nash, D. L. and Parsaei, H. R. (1988) Perception of safety zone around an industrial robot. In Proc. Human Factors Society Annual Meeting, vol. 32, pp. 948–952. SAGE Publications, Los Angeles, CA.
Kildal, J., Tellaeche, A., Fernández, I. and Maurtua, I. (2018) Potential users' key concerns and expectations for the adoption of cobots. Procedia CIRP, 72, 21–26.
Lasota, P. A. and Shah, J. A. (2015) Analyzing the effects of human-aware motion planning on close-proximity human–robot collaboration. Human Factors, 57, 21–33.
Li, X., Yi, W., Chi, H.-L., Wang, X. and Chan, A. P. (2018) A critical review of virtual and augmented reality (VR/AR) applications in construction safety. Automat. Construct., 86, 150–162.
Lichtenthäler, C. and Kirsch, A. (2016) Legibility of robot behavior: A literature review. Working paper or preprint.
Lohse, M. (2009) The role of expectations in HRI. In New Frontiers in Human–Robot Interaction. Citeseer.
Lucero, A. (2015) Using affinity diagrams to evaluate interactive prototypes. In IFIP Conf. Human–Computer Interaction, pp. 231–248. Springer.
Makhataeva, Z. and Varol, H. A. (2020) Augmented reality for robotics: A review. Robotics, 9, 21.
Makhataeva, Z., Zhakatayev, A. and Varol, H. A. (2019) Safety aura visualization for variable impedance actuated robots. In 2019 IEEE/SICE Int. Symposium on System Integration (SII), pp. 805–810. IEEE.
Makris, S., Karagiannis, P., Koukas, S. and Matthaiakis, A.-S. (2016) Augmented reality system for operator support in human–robot collaborative assembly. CIRP Ann., 65, 61–64.
Malm, T., Salmi, T., Marstio, I. and Aaltonen, I. (2019) Are collaborative robots safe? Open Engineering, Automation in Finland 2019 Special Issue.
Matsas, E. and Vosniakos, G.-C. (2017) Design of a virtual reality training system for human–robot collaboration in manufacturing tasks. Int. J. Interact. Design Manuf., 11, 139–153.
Matsas, E., Vosniakos, G.-C. and Batras, D. (2018) Prototyping proactive and adaptive techniques for human–robot collaboration in manufacturing using virtual reality. Robot. Comput.-Integr. Manuf., 50, 168–180.
Maurtua, I., Ibarguren, A., Kildal, J., Susperregi, L. and Sierra, B. (2017) Human–robot collaboration in industrial applications: Safety, interaction and trust. Int. J. Adv. Robot. Syst., 14, 1729881417716010.
Michalos, G., Karagiannis, P., Makris, S., Tokçalar, Ö. and Chryssolouris, G. (2016) Augmented reality (AR) applications for supporting human–robot interactive cooperation. Procedia CIRP, 41, 370–375.
Murata, A., Kanbayashi, M. and Hayami, T. (2013) Effectiveness of automotive warning system presented with multiple sensory modalities. In Int. Conf. Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, pp. 88–97. Springer.
Oken, B. S., Chamine, I. and Wakeland, W. (2015) A systems approach to stress, stressors and resilience in humans. Behav. Brain Res., 282, 144–154.
Onal, E., Craddock, C., Endsley, M. and Chapman, A. (2013) From theory to practice: How designing for situation awareness can transform confusing, overloaded shovel operator interfaces, reduce costs, and increase safety. In Proc. Int. Symposium on Automation and Robotics in Construction (ISARC), vol. 30, p. 1. IAARC Publications.
Or, C. K., Duffy, V. G. and Cheung, C. C. (2009) Perception of safe robot idle time in virtual reality and real industrial environments. Int. J. Indust. Ergon., 39, 807–812.
Palmarini, R., Fernandez del Amo, I., Bertolino, G., Dini, G., Erkoyuncu, J. A., Roy, R. and Farnsworth, M. (2018) Designing an AR interface to improve trust in human–robots collaboration. Procedia CIRP, 70, 350–355.
Politis, I., Brewster, S. A. and Pollick, F. (2014) Evaluating multimodal driver displays under varying situational urgency. In Proc. SIGCHI Conf. Human Factors in Computing Systems, pp. 4067–4076. ACM.
Probst, L., Frideres, L., Pedersen, B. and Caputi, C. (2015) Service Innovation for Smart Industry: Human–Robot Collaboration. European Commission, Luxembourg.
Reif, R. and Günthner, W. A. (2009) Pick-by-vision: Augmented reality supported order picking. Vis. Comput., 25, 461–467.
San Martín, A. and Kildal, J. (2019) Audio-visual AR to improve awareness of hazard zones around robots. In Extended Abstracts of the 2019 CHI Conf. Human Factors in Computing Systems, p. LBW2213. ACM.
Sauro, J. (2012) 10 things to know about the single ease question (SEQ). MeasuringU. https://measuringu.com/seq10/.
Schäfer, T., Huron, D., Shanahan, D. and Sedlmeier, P. (2015) The sounds of safety: Stress and danger in music perception. Front. Psychol., 6, 1140.
Selcon, S., Taylor, R. and McKenna, F. (1995) Integrating multiple information sources: Using redundancy in the design of warnings. Ergonomics, 38, 2362–2370.
Spence, C. and Driver, J. (2017) Audiovisual links in attention: Implications for interface design. In Engineering Psychology and Cognitive Ergonomics, pp. 185–192. Routledge.
Stephanidis, C. (2000) User Interfaces for All: Concepts, Methods, and Tools. CRC Press.
Sturdee, M. and Lindley, J. (2019) Sketching & drawing as future inquiry in HCI. In Proc. Halfway to the Future Symposium 2019, pp. 1–10. ACM.
Teulieres, M., Tilley, J., Bolz, L., Ludwig-Dehm, P. M. and Wägner, S. (2019) Growth Dynamics in the Industrial Robotics Market. Technical report, McKinsey & Company.
Vi, S., da Silva, T. S. and Maurer, F. (2019) User experience guidelines for designing HMD extended reality applications. In IFIP Conf. Human–Computer Interaction, pp. 319–341. Springer.
Vogel, C., Fritzsche, M. and Elkmann, N. (2016) Safe human–robot cooperation with high-payload robots in industrial applications. In 2016 11th ACM/IEEE Int. Conf. Human–Robot Interaction (HRI), pp. 529–530. IEEE.
Vogel, C., Poggendorf, M., Walter, C. and Elkmann, N. (2011) Towards safe physical human–robot collaboration: A projection-based safety system. In 2011 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, pp. 3355–3360. IEEE.
Yamamoto, N. and Philbeck, J. W. (2013) Peripheral vision benefits spatial learning by guiding eye movements. Memory Cogn., 41, 109–121.
Yang, J., Sasikumar, P., Bai, H., Barde, A., Sörös, G. and Billinghurst, M. (2020) The effects of spatial auditory and visual cues on mixed reality remote collaboration. J. Multimodal User Interfaces, 14, 337–352.

© The Author(s) 2021. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.