
The communicative advantage: how kinematic signaling supports semantic comprehension

Correspondence: James P. Trujillo, james.trujillo@mpi.nl. Donders Institute for Brain, Cognition and Behaviour, Radboud University, Montessorilaan 3, B.01.25, 6525GR Nijmegen, The Netherlands; Centre for Language Studies, Radboud University, Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, Wundtlaan 1, 6525XD Nijmegen, The Netherlands.

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s00426-019-01198-y) contains supplementary material, which is available to authorized users.

Humans are unique in their ability to communicate information through representational gestures which visually simulate an action (e.g., moving hands as if opening a jar). Previous research indicates that the intention to communicate modulates the kinematics (e.g., velocity, size) of such gestures. Whether and how this modulation influences addressees' comprehension of gestures has not been investigated. Here we ask whether communicative kinematic modulation enhances semantic comprehension (i.e., identification) of gestures. We additionally investigate whether any comprehension advantage is due to enhanced early identification or late identification. Participants (n = 20) watched videos of representational gestures produced in a more-communicative (n = 60) or less-communicative (n = 60) context and performed a forced-choice recognition task. We tested the isolated role of kinematics by removing visibility of the actors' faces in Experiment I, and by reducing the stimuli to stick-light figures in Experiment II. Three video lengths were used to disentangle early identification from late identification. Accuracy and response time quantified main effects. Kinematic modulation was tested for correlations with task performance. We found higher gesture identification performance for more- compared to less-communicative gestures. However, early identification was only enhanced within a full visual context, while late identification occurred even when viewing isolated kinematics. Additionally, temporally segmented acts with more post-stroke holds were associated with higher accuracy. Our results demonstrate that communicative signaling, interacting with other visual cues, generally supports gesture identification, while kinematic modulation specifically enhances late identification in the absence of other cues. These results provide insights into processes of mutual understanding, as well as into the creation of artificial communicative agents.

Introduction

Human communication is multimodal, utilizing various signals to convey meaning and interact with others. Indeed, humans may be uniquely adapted for knowledge transfer, with the ability to signal the intention to interact as well as to manifest the knowledge that s/he wishes to communicate (Csibra & Gergely, 2006). This communicative signaling system is powerful in that the signals are dynamically adapted for the context in which they are used. For example, representational gestures (Kendon, 2004; McNeill, 1994) show systematic modulations dependent upon the communicative or social context in which they occur (Campisi & Özyürek, 2013; Galati & Galati, 2015; Gerwing & Bavelas, 2004; Holler & Beattie, 2005). Although these gestures are an important aspect of human communication, it is currently unclear how the addressee benefits from this communicative modulation. The current study aims to investigate for the first time whether and how kinematic signaling enhances identification of representational gestures.

There is growing evidence that adults modulate their action and gesture kinematics when communicating with other adults, depending on the communicative context. For example, adults adapt to addressees' knowledge by producing gestures that are larger (Bavelas, Gerwing, Sutton, & Prevost, 2008; Campisi & Özyürek, 2013), more complex (Gerwing & Bavelas, 2004; Holler & Beattie, 2005), and higher in space (Hilliard & Cook, 2016) when conveying novel information. Instrumental actions intended to teach show similar kinematic modulation, including spatial (McEllin, Knoblich, & Sebanz, 2018; Vesper & Richardson, 2014) and temporal (McEllin et al., 2018) exaggeration. Evidence from our own lab corroborates these findings of spatial and temporal modulation in the production of both actions and gestures. In our recent work, we quantified the spatial and temporal modulation of actions and pantomime gestures (used without speech) in a more- relative to a less-communicative context (Trujillo, Simanova, Bekkering, & Özyürek, 2018). We showed that spatial and temporal features of actions and pantomime gestures are adapted to the communicative context in which they are produced.

A computational account by Pezzulo, Donnarumma, and Dindo (2013) suggests that modulation makes meaningful acts communicative by disambiguating the relevant information, effectively making the intended movement goal clear to the observer. This framework focuses on actions, but could be extended to gestures. One recent experimental study directly assessed how kinematic modulation affects gesture comprehension. By combining computationally based robotic production of gestures with validation through human comprehension experiments, Holladay, Dragan, and Srinivasa (2014) showed that spatial exaggeration of kinematics allows observers to more easily recognize the target of pointing gestures. Similarly, Gielniak and Thomaz (2012) showed that when robot co-speech gestures are kinematically exaggerated, the content of an interaction with that robot is better remembered. Another study used an action-based leader–follower task to show that task leaders not only systematically modulate task-relevant kinematic parameters, but that these modulations are linked to better performance of the followers (Vesper, Schmitz, & Knoblich, 2017).

These previous studies suggest that the kinematic modulation of communicative movements (e.g., actions and gestures) serves to clarify relevant information for the addressee. However, it remains unclear whether this also holds for more complex human movements, such as pantomime gestures. This question is important for our understanding of human communication given that complex representations form an important part of the communicative message (Kelly, Ozyurek, & Maris, 2010; Özyürek, 2014).

The mechanism by which kinematic modulation might support semantic comprehension, or identification, of complex movements remains unclear. Several studies suggest disambiguation of the ongoing act as the mechanism, either through temporal segmentation of relevant parts (Blokpoel et al., 2012; Brand, Baldwin, & Ashburn, 2002) or through spatial exaggeration of relevant features (Brand et al., 2002). In the case of disambiguation, the "semantic core" (Kendon, 1986), or meaningful part of the movement, is made easier to understand as it unfolds. However, there is also evidence suggesting that early kinematic cues provide sufficient information to inform accurate prediction of whole actions before they are seen in their entirety (Cavallo, Koul, Ansuini, Capozzi, & Becchio, 2016; Manera, Becchio, Cavallo, Sartori, & Castiello, 2011). One study, for example, used videos of a person walking, and at a pause in the video participants were asked whether the actress in the video would continue to walk or start to crawl. The authors showed that whole-body kinematics could support predictions about the outcome of an ongoing action (Stapel, Hunnius, & Bekkering, 2012). However, another study showed videos of a person reaching out and grasping a bottle, asked the participants to predict the next sequence in the action (e.g., to drink, to move, to offer), and found that they were unable to use such early cues for accurate identification in this more complex, open-ended situation (Naish, Reader, Houston-Price, Bremner, & Holmes, 2013). Furthermore, identification of pantomime gestures has previously been reported to be quite low when no contextual (i.e., object) information is provided (Osiurak, Jarry, Baltenneck, Boudin, & Le Gall, 2012). Given these inconsistencies in the literature, an open question remains: are early kinematic cues sufficient to inform early representational gesture identification, or does kinematic modulation primarily aid gesture identification as the movements unfold (i.e., late identification)?

Finally, to understand how kinematic modulation might support gesture identification, it is important to consider other factors that might influence the semantic comprehension of an observer. In a natural environment, movements such as gestures are accompanied by additional communicative signals, such as facial expression and eye-gaze, and/or finger kinematics relevant in the execution of the gestures. Humans are particularly sensitive to the presence of human faces, which naturally draw attention (Cerf, Harel, Einhäuser, & Koch, 2007; Hershler & Hochstein, 2005; Theeuwes & Van der Stigchel, 2006). This effect is most prominent in the presence of mutual gaze (Farroni, Csibra, Simion, & Johnson, 2002; Holler et al., 2015), but also occurs with averted gaze compared to non-face objects (Hershler & Hochstein, 2005). Hand-shape information can also provide clues as to the object one is manipulating (Ansuini et al., 2016), and more generally the kinematics of the hand and fingers together provide early cues to upcoming actions (Becchio, Koul, Ansuini, Bertone, & Cavallo, 2018; Cavallo et al., 2016), which together may allow the act to be more easily identified. To understand the role of kinematic modulation in communication, the complexity of the visual scene must therefore also be taken into account.

In sum, previous studies show kinematic modulation occurring as a communicative cue in actions and gestures. While research suggests that this modulation serves to enhance comprehension, this has not been assessed directly in terms of semantic comprehension of complex movements, such as representational gestures. Furthermore, it is currently unclear whether improved comprehension would be driven by early action identification or by late identification of semantics, and which kinematic features provide this advantage.

The current study addresses these questions. In two experiments, naïve participants perform a recognition task on naturalistic pantomime gestures recorded in our previous study (Trujillo, Simanova et al., 2018). In the first experiment, they see the original videos with the face of the actor either visible or blurred, to control for eye-gaze effects. In the second experiment, the same videos are reduced to stick-light figures, reconstructed from Kinect motion-tracking data. The stick-figure videos allow us to test the contribution of specific kinematic features, because only the movements are visible, but not the face or hand shape. In both experiments, we additionally manipulate video length to test whether any communicative benefit is driven more by early identification (resulting in differences only in the initial fragment) or by late identification (resulting in differences in the medium and full fragments). Experiment II provides an additional exploratory test of the contribution of specific kinematic features to gesture identification.

We hypothesize that kinematic modulation serves to enhance semantic legibility. As early kinematic information is less reliable for open-ended action prediction (Naish et al., 2013) and pantomime gestures may generally be difficult to identify without context (Osiurak et al., 2012), we expect better recognition scores for the communicative gestures in the medium and full fragments compared to initial fragments. We furthermore predict that performance will correlate with stronger kinematic modulation. Additionally, we expect performance to be lower overall with stick-light figures compared to the full videos, due to decreased visual information, but with a similar pattern (i.e., better performance in medium and full fragments compared to initial). For our exploratory test, we expect that exaggeration of both spatial and temporal kinematic features will contribute to better gesture identification.

Experiment I: Full visual context

Our first experiment, with actual videos of the gestures, was designed to test (1) whether kinematic modulations lead to improved semantic comprehension in an addressee, (2) whether any advantage is better explained by early or by late identification of the gestures, and (3) whether the effect is altered by removing a salient part of the visual context, the actor's face.

Methods

Participants

Twenty participants were included in this study (mean age = 28; 16 female), recruited from Radboud University. Participants were selected on the criteria of being aged 18–35, right-handed and fluent in the Dutch language, with no history of psychiatric disorders or communication impairments. The procedure was approved by a local ethics committee and informed consent was obtained from all individual participants in this study.

Materials

Each participant performed the recognition task with 60 videos of pantomimes that differed in their context (more or less communicative), video duration (short, medium and full), and face visibility (face visible vs. blurred). A detailed description of the video recording, selection and manipulation follows below.

Video recording procedure

Stimuli were derived from a previous experiment (Trujillo, Simanova et al., 2018). In this previous experiment, participants (henceforth, actors) were filmed while seated at a table, with a camera hanging in front of the table. Motion-tracking data were acquired using a Microsoft Kinect system hanging slightly to the left of the camera. Each actor performed a set of 31 gestures, either in a more-communicative or a less-communicative setting (described below). Gestures consisted of simple object-directed acts, such as cutting paper with scissors or pouring water into a cup. Target objects were placed on the table (e.g., scissors and a sheet of paper for the item 'cut the paper with the scissors'), but actors were instructed to perform as if they were acting on the objects, without actually touching them. For each item, actors began with their hands placed on designated starting points on the table (marked with tape). After placing the target object(s) on the table, the experimenter moved out of view of the participant and the camera, and recorded instructions were played. Immediately following the instructions, a bell sound was played, which indicated that the participant could begin the pantomime. Once the act was completed, actors returned their hands to the indicated starting points, which elicited another bell sound, and waited for the next item. For this study, videos began at the first bell sound and ended at the second bell sound. In the more-communicative context, we introduced a confederate who sat in an adjacent room and was said to be watching through the video camera and learning the gestures from the participant. In this way, an implied communicative context was created. In the less-communicative context, the same confederate was said to be learning the experimental setup. The less-communicative context was, therefore, exactly matched, including the presence of an observer, and differed only in that there was no implied interaction. Despite the subtle task manipulation, our previous study (Trujillo, Simanova et al., 2018) showed robust differences in kinematics between the gestures produced in the more-communicative vs. the less-communicative context.

Kinematic feature quantification

For the current study, we used the same kinematic features that were quantified in our earlier study (Trujillo, Simanova et al., 2018). We used a toolkit for markerless automatic analysis of kinematic features, developed earlier in our group (Trujillo, Vaitonyte, Simanova, & Özyürek, 2018). The following briefly describes the feature quantification procedure: all features were measured within the time frame between the beginning and the ending bell sound. Motion-tracking data from the Kinect provided measures for our kinematic features, and all raw motion-tracking data were smoothed using the Savitzky–Golay filter with a span of 15 and degree of 5. As described in our previous work (Trujillo, Simanova et al., 2018), this smoothing protocol was used as it brought the Kinect data closely in line with simultaneously recorded optical motion-tracking data in a separate pilot session. The following features were calculated from the smoothed data. Distance was calculated as the total distance traveled by both hands in 3D space over the course of the item. Vertical amplitude was calculated on the basis of the highest space used by either hand in relation to the body. Peak velocity was calculated as the greatest velocity achieved with the right (dominant) hand. Hold time was calculated as the total time, in seconds, counting as a hold; holds were defined as an event in which both hands and arms are still for at least 0.3 s. Submovements were calculated as the number of individual ballistic movements made, per hand, throughout the item. To account for the inherent differences in the kinematics of the various items performed, z scores were calculated for each feature/item combination across all actors, including both conditions. This standardized score represents the modulation of that feature, as it quantifies how much greater or smaller the feature was when compared to the average of that feature across all of the actors. (Addressee-directed) eye-gaze was coded in ELAN as the proportion of the total duration of the video in which the participant is looking directly into the camera. For a more detailed description of these quantifications, see Trujillo, Simanova et al. (2018). Also note that the kinematic features calculated using this protocol are in line with the same features manually annotated from the video recordings (Trujillo, Vaitonyte et al., 2018). This supports our assumption that the features calculated from the motion-tracking data represent qualities that are visible in the videos.
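To make the pipeline concrete, the sketch below re-implements these feature definitions in R, the language the paper cites for its statistical analyses. The data layout (a `frames` data frame with per-hand coordinate columns), the stillness cutoff for detecting holds, and the velocity threshold for counting submovements are illustrative assumptions; only the filter span (15), polynomial degree (5), and the 0.3 s hold criterion come from the text. The authors' own toolkit (Trujillo, Vaitonyte et al., 2018) remains the authoritative implementation.

```r
# Minimal sketch of the kinematic feature pipeline described above.
# Assumed input: `frames`, a data frame with one row per video frame
# (30 fps) and columns rx, ry, rz (right hand) and lx, ly, lz (left
# hand), in metres. Thresholds marked "assumed" are not from the paper.
library(signal)  # provides sgolayfilt()

fps <- 30
smooth <- function(v) sgolayfilt(v, p = 5, n = 15)  # degree 5, span 15
frames[] <- lapply(frames, smooth)

# Frame-to-frame displacement of each hand in 3D space
step_r <- sqrt(diff(frames$rx)^2 + diff(frames$ry)^2 + diff(frames$rz)^2)
step_l <- sqrt(diff(frames$lx)^2 + diff(frames$ly)^2 + diff(frames$lz)^2)

distance      <- sum(step_r) + sum(step_l)        # total path, both hands
peak_velocity <- max(step_r * fps)                # right (dominant) hand, m/s
vertical_amp  <- max(pmax(frames$ry, frames$ly))  # highest hand position

# Hold time: both hands still for at least 0.3 s
still <- (step_r < 0.005) & (step_l < 0.005)      # 5 mm/frame cutoff, assumed
runs  <- rle(still)
hold_time <- sum(runs$lengths[runs$values & runs$lengths >= 0.3 * fps]) / fps

# Submovements: ballistic velocity peaks of one hand (0.1 m/s assumed)
v <- step_r * fps
submovements <- sum(diff(sign(diff(v))) == -2 & v[2:(length(v) - 1)] > 0.1)

# Modulation score: z score of a feature within each item, across all
# actors and both conditions (`df` has one row per video)
df$peak_velocity_z <- ave(df$peak_velocity, df$item,
                          FUN = function(x) (x - mean(x)) / sd(x))
```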
Inclusion and randomization

Our stimulus set included 120 of the 2480 videos recorded in our previous study (Trujillo, Simanova et al., 2018). Our selection procedure (see Appendix 1) ensured that the stimulus set for the present experiment included an equal number of more- and less-communicative videos. Each of the 31 gesture items from the original set was included a minimum of three and a maximum of four times across the entire selection, performed by different actors, while ensuring that each item appeared at least once in the more-communicative context and once in the less-communicative context. Three videos from each actor in the previous study were included. Appendix 2 provides the full list of gesture items. Supplementary Figure 1 illustrates the range of kinematics, gaze, and video durations included across the two groups in the current study with respect to the original dataset from Trujillo, Simanova et al. (2018). We ensured that the stimulus set for the present study matched the original dataset in terms of context-specific differences in kinematics and eye-gaze, so that the current stimulus set is a representative sample of the data shown in Trujillo, Simanova et al. (2018). These results are provided in Appendix 1.

Video segmentation

To test whether kinematic modulation primarily influences early or late identification (question 2), we divided the videos into segments of different length. Based on the previous literature (Kendon, 1986; Kita, van Gijn, & van der Hulst, 1998), we defined the segments as follows. Wait covered the approximately 500 ms after the bell was played, but before the participant started to move. Reach to grasp covered the time during which the participant reached towards, and subsequently grasped, the target object; in the case of multiple objects, this segment ended after both objects were grasped. Prepare captured any movements that were unrelated to the initial reach to grasp but not part of the main semantic aspect of the pantomime. Main movement covered any movements directly related to the semantic core of the item. Auxiliary captured any additional movements not directly related to the semantic core. Return object captured the movement of the hands back to the object's starting position, depicting the object being replaced in its original location. Retract covered the movement of the hands back to the indicated starting position of the hands, until the end of the video. Note that the "prepare" and "auxiliary" segments were optional, and were only coded when such movements were present. All other segments were present in all videos. Phases were delineated based on this segmentation. Phase 0 covered the "wait" segment. Phase 1 covered "reach to grasp" and "prepare". Phase 2 covered the "main movement" and "auxiliary". Phase 3 covered "return object" and "retract". See Table 1 and Fig. 1 for examples of how these phases map onto specific parts of the movement.

Table 1 Movement phase examples

Open jar
Phase 1. Reach-to-grasp: right hand extends to jar; left hand grasps lid. Prepare: right hand lifts jar.
Phase 2. Main movement: twisting hands to depict unscrewing the lid. Auxiliary: hands moved apart to show separating the lid from the jar.
Phase 3. Return object: hands return to object starting positions. Retract: hands returned to indicated starting position.

Cut paper
Phase 1. Reach-to-grasp: right hand extends to scissors, left hand to paper. Prepare: both hands lifted, configured to start cutting.
Phase 2. Main movement: cutting motion depicted with right hand. Auxiliary: hands spread apart to show that the cutting is complete.
Phase 3. Return object: hands return to object starting positions. Retract: hands returned to indicated starting position.

After defining the segments for each video, we also divided the videos into three lengths, referred to as initial fragments (M = 3.27 ± 1.52 s), medium fragments (M = 4.62 ± 2.19 s), and full videos (M = 5.59 ± 2.53 s). Initial fragments consisted of only phase 0 and phase 1, medium fragments consisted of phases 0–2, and full videos contained all of the phases. An overview of these segments and phases can be seen in Fig. 1. We performed ANOVAs on each of the fragment lengths to ensure that video durations of the same fragment length did not differ significantly across cells (see Supplementary Table 1 for statistics). Initial fragments thus provided only initial hand-shape and arm/hand/finger configuration information, medium fragments provided all relevant semantic information, and full videos provided additional eye-gaze information (when present) and additional time for processing.

Fig. 1 Overview of video segmentation and phases. Along the top, representative still frames are shown throughout one video (item: "open jar"). The individual blue blocks indicate individual segments. Below this, the phase division is depicted (color figure online).

Blurring

In all videos, a Gaussian blur was applied to the object, which was otherwise visible in the video. This ensured that the object could not be used to infer the action. To determine whether the face in general, and gaze direction in particular, has an effect on pantomime recognition, we also applied a Gaussian blur to the face in half of the videos. Blurring the faces in this way allowed us to manipulate the amount of available visual information, providing a first test of how kinematic modulation affects gesture identification in a less complete visual context (question 3). This was balanced so that each actor had at least one video with a visible face and one with a blurred face.

Task

Before beginning the experiment, participants received a brief description of the task to inform them of the nature of the stimuli. This ensured that participants knew to expect incomplete videos in some trials. Participants were seated in front of a 24″ BenQ XL2420Z monitor with a standard keyboard for responses. Stimuli were presented at a frame rate of 29 frames per second, with a display size of 1280 × 720. During the experiment, participants first saw a fixation cross for a period of 1000 ms with a jitter of 250 ms. One of the item videos was then displayed on the screen, after which the question appeared: "What was the action being depicted?" Two possible answers were presented on the screen, one on the left and one on the right. Answers consisted of one verb and one noun that captured the action (e.g., the correct answer to the item "pour the water into the cup" was "pour water"). Correct answers were randomly assigned to one of the two sides. The second option was always one of the possible answers from the total set; therefore, all options were presented equally often as the correct answer and as the wrong (distractor) option. Participants responded with the 0 (left option) or 1 (right option) keys on the keyboard. Accuracy and response time (RT) were recorded for each video.

Table 2 Overview of analysis cells for Experiment I (mean video duration in s; ten videos per cell)

Fragment length   More-comm., face visible   More-comm., face blurred   Less-comm., face visible   Less-comm., face blurred
Initial           4.49                       5.03                       4.50                       4.03
Medium            4.72                       4.43                       4.34                       4.57
Full              4.73                       4.34                       4.29                       4.61

Analysis

Main effects analyses: communicative context, fragment length, and visual context

Both RT and accuracy of the identification judgments were calculated for each of 12 cells (Table 2): fragment length (initial fragment vs. medium fragment vs. full video) × face (blurred vs. visible) × context (more-communicative vs. less-communicative). This allowed us to test (1) whether more-communicative gestures were identified faster or with higher accuracy (main effect of context), (2) whether performance was higher only in initial fragments (providing evidence for early identification) or only in medium fragments (providing evidence for late identification), and (3) whether face visibility impacted performance, which informs us whether there is an effect of visual information availability on identification performance. Separate repeated-measures analyses of variance (RM-ANOVA) were run for accuracy and RT to test for the presence of main and interaction effects. We used Mauchly's test of sphericity on each factor and interaction in our model and applied the Greenhouse–Geisser correction where appropriate.
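The text does not name the software used for these RM-ANOVAs; since the regression analyses later in the paper are run in R, a minimal R sketch of the same 3 × 2 × 2 within-subject design is given below for orientation. The data frame `d` and its column names are assumptions, not the authors' variables. Base aov() does not report Mauchly's test; packages such as afex additionally supply the sphericity test and Greenhouse–Geisser correction mentioned in the text.

```r
# Hedged sketch of the accuracy RM-ANOVA (context x face x fragment).
# Assumed input: `d`, long format, one row per participant x cell, with
# columns participant, context, face, fragment, accuracy.
d$participant <- factor(d$participant)

fit <- aov(accuracy ~ context * face * fragment +
             Error(participant / (context * face * fragment)),
           data = d)
summary(fit)

# Planned comparison: context effect within the initial fragments
# (assumes rows are sorted by participant within each context)
init <- subset(d, fragment == "initial")
with(init, t.test(accuracy[context == "more"],
                  accuracy[context == "less"], paired = TRUE))
```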
Results: Experiment I

We used RM-ANOVA to test for significant main effects of communicative context, fragment length, and face visibility on performance. In terms of accuracy, results of the fragment length × face visibility × communicative context RM-ANOVA showed a significant main effect of communicative context, F(1,19) = 2.912, p = 0.029, as well as a main effect of fragment length, F(2,38) = 53.583, p < 0.001, but no main effect of face visibility, F(1,19) = 0.050, p = 0.825. Planned comparisons revealed higher accuracy in the more-communicative context for initial fragments (more-communicative mean = 87.13%, less-communicative mean = 81.17%; t(18) = 3.025, p = 0.007), but there was no difference between contexts in the medium fragments (more-communicative mean = 97.37%, less-communicative mean = 96.49%; t(18) = 0.785, p = 0.443) or full videos (more-communicative mean = 97.37%, less-communicative mean = 97.22%; t(18) = 0.128, p = 0.899). In sum, performance was higher overall for more-communicative compared to less-communicative videos, with specifically more-communicative initial fragments showing higher performance than less-communicative initial fragments. Accuracy, regardless of communicative context, was additionally higher in medium and full fragments compared to initial fragments. See Fig. 2a for an overview of these results.

In terms of RT, results of the fragment length × face × context RM-ANOVA revealed a significant main effect of communicative context, F(1,19) = 5.699, p = 0.028, and of fragment length, F(2,38) = 192.489, p < 0.001, but not of face visibility, F(1,19) = 3.725, p = 0.069. Planned contrasts revealed faster RT in more-communicative compared to less-communicative initial fragments (more-communicative mean = 1.446 s; less-communicative mean = 1.583 s), t(19) = 3.824, p = 0.001, but faster RT for less- compared to more-communicative medium fragments (more-communicative mean = 1.094 s; less-communicative mean = 1.029 s), t(19) = 3.479, p = 0.003, and no difference between more- and less-communicative full videos (more-communicative mean = 1.094 s; less-communicative mean = 1.129 s), t(19) = 1.237, p = 0.231. We also found faster RT for medium fragments (M = 1.093 s) compared to initial fragments (M = 1.630 s), t(19) = 12.538, p < 0.001, as well as for medium fragments compared to full videos (M = 1.142 s), t(19) = 2.326, p = 0.031. In sum, RT was similar in both the more- and less-communicative contexts, but faster responses were seen in medium fragments compared to initial and full fragments. See Fig. 2b for an overview of these results.

Fig. 2 Overview of semantic judgment performance over context and fragment length, combined across face visibility. Bean plots depict the distribution (kernel density estimation) of the data. The dotted lines indicate the overall performance mean, the larger solid bars indicate the mean per video length and communicative context, shorter bars indicate mean values per participant, and the filled curve depicts the overall distribution of scores. Panel a shows mean accuracy and panel b shows mean RT (in seconds) across the three video lengths, depicted along the x-axis; blue (left) plots depict the less-communicative context and green (right) plots the more-communicative context (color figure online).

Discussion: Experiment I

In our first experiment, we sought to determine how communicative modulation affects identification of pantomime gesture semantics. We found that pantomime gestures produced in a more-communicative context were better recognized than those produced in a less-communicative context. Specifically, more-communicative initial fragments were recognized more accurately and faster than less-communicative initial fragments.

The higher accuracy in recognizing more- compared to less-communicative initial fragments suggests that at least some of the relevant information is available even in the earliest stages of the act, and that communicative modulation enhances this information. Since face visibility did not contribute significantly to performance, we suggest that the improved comprehension may come from fine-grained kinematic cues, such as hand shape and finger kinematics. As objects are known to have specific action and hand-shape affordances (Grèzes & Decety, 2002; Tucker & Ellis, 2001), hand shape can provide clues as to the object being grasped, and thus also the upcoming action (Ansuini et al., 2016; van Elk, van Schie, & Bekkering, 2014). These results are therefore in line with the early prediction results described for action chains (Becchio, Manera, Sartori, Cavallo, & Castiello, 2012; Cavallo et al., 2016). Our results may also be explained by immediate comprehension. In other words, the visual information provided by the shape and configuration of the hands may be sufficiently clear to activate the semantic representation of the action without any prediction of the upcoming movements. Although we cannot determine the exact cognitive mechanism, we can conclude that communicative modulation supports comprehension through early action identification.

We found no evidence for higher accuracy in more- compared to less-communicative medium fragments, nor for full videos. It seems that the overall accuracy in medium and full fragments does not allow a difference to be found between the contexts. In both more- and less-communicative medium fragments, accuracy was above 96%, suggesting that ceiling-level performance may already have been reached. This indicates that even if communicative modulation supports late identification, general task difficulty was not high enough in our task to allow us to find any difference. Surprisingly, faster RT was found for less- compared to more-communicative medium fragments. This unexpected result may reflect a trade-off between kinematic modulation, which is thought to be informative, and direct eye-gaze, which serves a communicative function but may not lead to faster responses. Along this line, Holler and colleagues (2012) argue that direct eye-gaze leads to a feeling of being addressed, which in turn forces the addressee to split their attention between the eyes and hands of the speaker. If this interpretation is correct, we would expect that although responses are faster for the less-communicative videos, accuracy should still be higher in the more-communicative videos. To draw any conclusions about how communicative modulation affects late identification, we suggest that it is necessary to increase task difficulty.

In sum, our results show that communicatively produced gestures are more easily recognized than less-communicatively produced gestures, and that this effect is explained by early action identification. This result is in line with research on child-directed actions (Brand et al., 2002), as well as with more recent developments regarding early action identification based on kinematic cues (Ansuini, Cavallo, Bertone, & Becchio, 2014; Cavallo et al., 2016).

Experiment II: Isolated kinematic context

Although the first experiment shows evidence for a supporting role of kinematic modulation in the semantic comprehension of gestures, it remains unclear whether the effect remains when only gross kinematics are observed and facial information, including attentional cueing to the hands, and finger kinematics, including hand shape, are completely removed. Removing this additional visual contextual information helps to disentangle the effects of gross (i.e., posture and hands) kinematic modulation from other (potentially communicative) visual information. For example, while extensive research has looked at the early phase of action identification from hand and finger kinematics (Ansuini et al., 2016; Becchio et al., 2018; Cavallo et al., 2016), the higher-level dynamics of the hands and arms, which we call gross kinematics, have not been well studied. This is particularly relevant as these high-level kinematic features are similar to the qualities described in gesture research. Thus, in Experiment II we replicate Experiment I, but reduce the stimuli to a visually simplistic scene consisting of only lines representing the limbs of the actor's body. If kinematic modulation is driving the communicative advantage seen in our first experiment, we can expect the same effect pattern as in Experiment I. If other features of the visible scene, such as finger kinematics, provided the necessary cues for semantic comprehension, then the effect on early identification should no longer be present. Because the visual information is highly restricted, we expect task difficulty to increase.

In this way, we are able to determine whether kinematic modulation supports early action identification in the absence of other early cues such as hand shape, and whether it supports ongoing semantic disambiguation when gesture recognition is more difficult. Overall, this experiment builds on our findings from Experiment I by providing a specific test of how kinematic modulation affects semantic comprehension when isolated from other contextual information. Additionally, it tests which specific kinematic features contribute to supporting semantic comprehension.

Methods: Experiment II

Participants

Twenty participants were included in this study (mean age = 24; 16 female), recruited from Radboud University. Participants were selected on the criteria of being aged 18–35, right-handed, fluent in the Dutch language, without any history of psychiatric impairments or communication disorders, and not having participated in the previous experiment. The procedure was approved by a local ethics committee and informed consent was obtained from all individual participants in this study.

Materials

We used the same video materials as in Experiment I, but this time the videos were reduced to stick-light figures. Motion-tracking data were used to reconstruct the movements of the upper-body joints (Trujillo, Vaitonyte et al., 2018). Videos consisted of these reconstructions, using the x, y, z coordinates of these joints acquired at 30 frames per second (see Fig. 3 for an illustration of the joints utilized). Note that no joints pertaining to the fingers were visually represented, which ensured that hand shape was not a feature that could be identified by an observer. These points were depicted with lines drawn between the individual points to create a stick-light figure representing the participant's kinematic skeleton. Skeletons were centered in space on the screen, with the viewing angle adjusted to reflect an azimuth of 20° and an elevation of 45° in reference to the center of the skeleton.

Fig. 3 Illustration of materials used for Experiment II. a Diagram of the joints represented in the videos of Experiment II: 1. top of head, 2. bottom of head, 3. top of spine, 4. middle of spine, 5. lower spine, 6. shoulder, 7. elbow, 8. wrist, 9. center of hand. Note that numbers 6–9 are present for both the left and right arms. b Still frames from an actual stimulus video, depicting the visual information made available to the participants, underneath the corresponding actual video frames (not shown to participants) for comparison.
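For readers who want to visualize what observers saw, the stick-light rendering reduces to rotating the joint coordinates to the stated viewing angle and drawing line segments between connected joints. The R sketch below is illustrative only: the `joints` matrix, the bone list, and the exact rotation convention for the 20° azimuth and 45° elevation are assumptions, not the authors' rendering code.

```r
# Sketch: render one frame of a stick-light figure from 3D joint data.
# Assumed input: `joints`, an n x 3 matrix of (x, y, z) positions
# centred on the skeleton; `bones`, pairs of joint indices to connect.
render_frame <- function(joints, bones, azimuth = 20, elevation = 45) {
  az <- azimuth * pi / 180
  el <- elevation * pi / 180
  Ry <- matrix(c(cos(az), 0, sin(az),       # rotate about vertical axis
                 0,       1, 0,
                 -sin(az), 0, cos(az)), 3, 3, byrow = TRUE)
  Rx <- matrix(c(1, 0, 0,                   # then tilt for elevation
                 0, cos(el), -sin(el),
                 0, sin(el), cos(el)), 3, 3, byrow = TRUE)
  p <- joints %*% t(Rx %*% Ry)              # orthographic projection
  plot(p[, 1], p[, 2], pch = 19, asp = 1, axes = FALSE,
       xlab = "", ylab = "")
  segments(p[bones[, 1], 1], p[bones[, 1], 2],
           p[bones[, 2], 1], p[bones[, 2], 2], lwd = 2)
}

# Bone list matching the joint numbering in Fig. 3 (one arm shown;
# the left arm repeats the shoulder-to-hand chain with its own indices)
bones <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5),
               c(3, 6), c(6, 7), c(7, 8), c(8, 9))
```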
Analysis

Main effects analyses: communicative context, fragment length, and visual context

To determine if there was an overall effect of communicative context on accuracy or RT, and to again test for evidence of either the early identification or the late identification hypothesis, we used two separate 3 (fragment length) × 2 (context) repeated-measures ANOVAs. When appropriate, t tests were used to determine where these differences occurred across the three video lengths. When a non-normal distribution was detected, results are reported after a Greenhouse–Geisser correction.

Table 3 Overview of analysis cells for Experiment II (mean video duration in s; ten videos per cell)

Fragment length   More-communicative   Less-communicative
Initial           4.22                 4.24
Medium            4.68                 4.73
Full              4.59                 4.51

Feature level regression analysis: exploratory test of kinematic modulation values

Given that Experiment II aims to test the specific contribution of kinematic modulation to semantic comprehension, we additionally performed an exploratory linear mixed effects analysis using the kinematic modulation values that characterize the stimulus videos. This was done to assess the relation between specific kinematic features and semantic judgment performance. Kinematic modulation values were available from our previous study, where these stimulus videos were created (Trujillo, Simanova et al., 2018), and were meant to quantify kinematic features in the semantic core of the action. We therefore chose to perform this additional analysis in Experiment II as a follow-up assessment of the significant difference between more- and less-communicative medium fragments (Table 3).

We performed linear regression analyses between the set of kinematic features and RT, and a logistic regression between the set of kinematic features and accuracy. Regression analyses were performed on the medium fragments, as this is where a statistically significant difference was found between more- and less-communicative videos. Statistical analyses utilized mixed effects models implemented in the R statistical program (R Core Team, 2014) using the lme4 package (Bates, Mächler, Bolker, & Walker, 2014). p values were estimated using the Satterthwaite approximation for denominator degrees of freedom, as implemented in the lmerTest package (Kuznetsova, 2016). Our regression models first factored out video duration and subsequently tested the three main components of kinematic modulation that have been identified in previous research: range of motion (Bavelas et al., 2008; Hilliard & Cook, 2016), here quantified as vertical space utilized; velocity of movements; and punctuality (Brand et al., 2002), here quantified as the number of submovements and the amount of holds between them. Kinematic features were defined as main effects, while a random intercept was added for participant. For a detailed description of how the model was defined, see Appendix 3.

To reduce the risk of Type I error, we used the Simple Interactive Statistical Analysis tool (http://www.quantitativeskills.com/sisa/calculations/bonfer.htm) to calculate an adjusted alpha threshold based on the mean correlation between all of the tested features (regardless of whether they are in the final model or not), as well as on the number of tests (i.e., the number of variables remaining in the final mixed model). Our six variables (duration, vertical amplitude, peak velocity, submovements, hold time) showed an average correlation of 0.154, leading to a corrected threshold of p = 0.019.
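The excerpt does not spell out the formula behind the SISA adjustment, so the snippet below shows one widely used mean-correlation correction in the same spirit, the Dubey/Armitage–Parmar approximation. It is an illustration of the idea, not necessarily the exact computation SISA performs, and so it need not reproduce the reported threshold of p = 0.019 exactly.

```r
# Mean-correlation-adjusted alpha, Dubey/Armitage-Parmar style.
# Shown for illustration; the SISA tool's exact formula may differ.
adjusted_alpha <- function(alpha, n_tests, mean_r) {
  m_eff <- n_tests^(1 - mean_r)   # effective number of independent tests
  1 - (1 - alpha)^(1 / m_eff)     # Sidak-style per-test threshold
}
adjusted_alpha(0.05, n_tests = 6, mean_r = 0.154)
```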
Results: Experiment II

Main effects analyses: communicative context and fragment length

Our first RM-ANOVA tested whether accuracy was affected by the communicative context or by the fragment length of the videos. We found a significant main effect of communicative context on accuracy, F(1,19) = 5.108, p = 0.036, as well as a main effect of fragment length, F(2,38) = 10.962, p < 0.001. Planned comparisons revealed no difference between the accuracy of more-communicative and less-communicative initial fragments (more-communicative mean = 59.58%, less-communicative mean = 56.76%), t(19) = − 0.646, p = 0.526, or full videos (more-communicative mean = 64.87%, less-communicative mean = 62.76%), t(19) = 0.492, p = 0.628. We found significantly higher accuracy in more-communicative medium fragments (M = 75.69%) compared to less-communicative medium fragments (M = 66.11%), t(19) = 2.99, p = 0.007. We found no fragment length by communicative context interaction, F(2,36) = 0.659, p = 0.523.

Our second RM-ANOVA tested whether RT was affected by communicative context or fragment length. We found a significant main effect of fragment length on RT, F(2,38) = 7.263, p = 0.003, but no main effect of communicative context, F(1,19) = 2.12, p = 0.162. We additionally found a video length × context interaction, F(2,38) = 3.87, p = 0.031. Planned comparisons revealed significantly faster RT in medium fragments (M = 1.817 s) compared to initial fragments (M = 1.953 s), t(19) = 3.982, p = 0.001, but no difference between medium fragments and full videos (M = 1.872 s), t(19) = 1.339, p = 0.196. See Fig. 4 for an overview of these results. In sum, communicative context did not affect RT, but responses were faster in medium compared to initial fragments.

Fig. 4 Overview of semantic judgment performance over context and fragment length in Experiment II. Bean plots depict the distribution (kernel density estimation) of the data. The dotted lines indicate the overall performance mean, the largest solid bars indicate the group mean per video length and context, and shorter bars indicate individual participant means. Panel a shows mean accuracy and panel b shows mean RT (in seconds) across the three video lengths, depicted along the x-axis; blue (left) plots depict the less-communicative context and green (right) plots the more-communicative context (color figure online).

Feature level regression analysis: exploratory test of kinematic modulation values

To test which specific kinematic features, if any, affected accuracy, we used mixed models to assess whether accuracy on each video could be explained by the kinematic features of that video. We found kinematic modulation of punctuality (hold time and submovements) to explain performance accuracy better than the null model, χ²(5) = 16.064, p < 0.001. Specifically, increased hold time was associated with higher accuracy (b = 0.377, z = 3.962, p < 0.001), although submovements were not (z = − 0.085, p = 0.932). We found no correlation between duration and accuracy (z = − 1.151, p = 0.249) in our kinematic model. Response time was not significantly explained by any of the kinematic feature sets. Duration, as assessed in the null model, was also not related to response time (t = − 1.768, p = 0.077). In sum, kinematic modulation of hold time was specifically related to higher performance accuracy.

Discussion: Experiment II

Experiment II was designed to test the isolated contribution of kinematics to semantic comprehension and to further differentiate between early and late identification. We found that more-communicative videos were still recognized with overall higher accuracy than less-communicative videos, even in the absence of contextual cues such as hand shape, finger kinematics, or the actor's face.

Higher accuracy in recognizing more-communicative compared to less-communicative medium fragments suggests that the advantage given by kinematic modulation predominantly affects identification of the pantomime after it has unfolded. The unfolding of the final phase of the pantomime may provide enough extra time for the overall act to be processed completely and the pantomime to be recognized accurately regardless of modulation. This finding is therefore in line with the hypothesis that kinematic modulation mainly contributes to ongoing semantic disambiguation. We further explored the contribution of specific kinematic features to semantic comprehension in the absence of further visual context such as hand shape or facial cues. We found that temporal kinematic modulation (i.e., increasing segmentation of the act) was an important factor influencing semantic comprehension. Specifically, increasing hold time positively impacted accuracy. Our results suggest that although the effect may be subtle in production, this feature plays an important role in clarifying semantic content through the temporal unfolding of the gesture.

General discussion

This study aimed to determine the role of kinematic modulation in the semantic comprehension of (pantomime) gestures. First, we asked whether kinematic modulation influences semantic comprehension of gestures and found that more-communicatively produced gestures are recognized better than less-communicatively produced gestures (Experiments I and II). Second, by utilizing different video fragment lengths, we tested the underlying mechanism of this communicative advantage. We found evidence for enhanced early identification when observers were provided with a more complete visual scene, including the hand shape (Experiment I), but enhanced late identification when they were provided with only gross kinematics (Experiment II). Finally, we show in Experiment II that increased post-stroke hold time has the strongest effect on the communicative gesture comprehension advantage.

When provided with a wealth of visual cues, as in Experiment I, participants gained a communicative advantage even in the early stages of movement. This finding fits nicely with the idea that the end goal of an action, or perhaps the upcoming movements themselves, can be predicted by utilizing early kinematics together with visual contextual information (Cavallo et al., 2016; Iacoboni et al., 2005; Stapel et al., 2012). Our results from Experiment II suggest that kinematic modulation of gross hand movements alone is not sufficient for this effect, as the advantage disappeared when the visual stimulus was degraded. It should be noted that we cannot conclude that kinematic information is insufficient, but rather that the gross hand kinematics that are typically used to assess gestures are insufficient. This is particularly relevant given the evidence that hand and finger kinematics inform early manual action identification (Becchio et al., 2018; Cavallo et al., 2016; Manera et al., 2011). We therefore conclude that both kinematic and non-kinematic cues play a role in early gesture recognition, while modulated arm and hand kinematics provide cues to identify the act as it unfolds, even in the absence of other visual cues.

Our conclusion regarding the role of temporal modulation, and more specifically increased hold time, in supporting semantic comprehension matches well with the factor "punctuality", as defined by Brand et al. (2002) in their study of child-directed action. Punctuality refers to movement segments with clear beginning and end points, allowing the individual movements to be clear to an observer (Blokpoel et al., 2012). Exaggerating the velocity changes between movements and increasing hold time (Vesper et al., 2017) can make the final body configuration more salient by allowing the addressee a longer viewing time of this configuration.

Our findings have several important implications. By combining naturalistic motion-tracking production data with a semantic judgment task in naïve observers, our study provides new insights and support for models of effective human–machine interaction. Specifically, our results expand on and contrast with the robotics literature that demonstrates spatial modulation as a method of defining more legible acts (Dragan, Lee, & Srinivasa, 2013; Dragan & Srinivasa, 2014; Holladay et al., 2014). Our findings suggest that while spatial modulation may be effective for single-movement gestures such as pointing, temporal modulation plays a larger role in this clarification effect for more complex acts.

We additionally build on studies of gesture comprehension, showing the importance of kinematic cues in successful semantic uptake and bringing new insights to previous findings. For instance, our findings provide a mechanistic understanding of larger scale, qualitative features, such as informativeness (Campisi & Özyürek, 2013). Differences in the informativeness of complex gestures may be understood by looking at the underlying kinematic differences and how these relate to the comprehension of such gestures. As an example, gestures are understood through the individual movements that comprise them, rather than through static hand configurations (Kendon, 2004; McNeill, 1994). Increasing the number of clearly defined movements consequently increases the amount of visual information available to an observer, which could lead to the perception of increased informativeness.

Our work has further implications for clinical practice, where it can be applied to areas such as communication disorders. Research has shown that people with aphasia use gestures, including pantomimes, to supplement the semantic content of their speech (DeBeer et al., 2015; Rose, Mok, & Sekine, 2017). Knowledge of which features contribute to semantically recognizable gestures could therefore be applied to developing therapies for more effective pantomime use and understanding.

Summary

Our study is the first to systematically test and provide a partial account of how the kinematic modulation that arises from a more-communicative context can support efficient identification of a manual act. We found that communicatively produced acts are more easily understood early on due to kinematic and non-kinematic cues. While comprehension depends on how much of the visual scene is available, communicative kinematic modulation alone leads to improved recognition of pantomime gestures even in a highly reduced visual scene. In particular, temporal kinematic modulation leads to improved late identification of the act in the absence of other cues.

Acknowledgements The authors are grateful to Ksenija Slivac for her contribution to stimulus preparation and data collection in Experiment I, as well as to Muqing Li for her contribution to data collection and analyses in Experiment II. We additionally thank Louis ten Bosch for his insights and discussions regarding methodology. This research was supported by the NWO Language in Interaction Gravitation Grant.

Funding Funding was provided by Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Grant no. 2014.WP4.PhD.RUN.014).

Compliance with ethical standards

Conflict of interest The authors declare no conflict of interest in this study.

Ethics statement All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the study.

Informed consent Informed consent was obtained from all individual participants included in the study.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 1: Item selection procedure

To provide a representative sampling of each of the two groups, all individual items from all subjects included in the previous study were ranked according to eye-gaze and overall kinematic modulation (i.e., the z scores derived from the kinematic features described in the Kinematic feature quantification section). The two groups were ordered such that items with high values for addressee-directed eye-gaze and kinematic modulation were ranked higher than those with low values. This placed all items on a continuum that ranked their communicativeness. This was done due to the observation that, owing to the subtle manipulation of context in Experiment I of Trujillo, Simanova et al. (2018), there was considerable overlap of kinematic modulation in the middle of the spectrum (i.e., some actors in the more-communicative context showed modulation more similar to that of the less-communicative context, and vice versa). We chose to include items which represented a range of eye-gaze and kinematic features representative of their respective communicative contexts. This method allowed a clearer separation of the contexts, while our further selection procedure (described below) ensured that items were included across a wide range of this ranked continuum.

After creating the ranked continuum of items, inclusion moved from the highest to the lowest ranked items. Each of the 31 items, as described in Appendix 2, was included a minimum of three and a maximum of four times across the entire selection, performed by different actors, while ensuring that each item also appeared at least once in the more-communicative context and once in the less-communicative context. Three videos from each actor in the previous study were included. This ensured an even representation of the data on which we previously reported. Supplementary Figure 1 illustrates the range of kinematics, gaze, and video durations included across the two groups in the current study with respect to the original dataset.

We ensured that the current stimulus set was representative of the original data by repeating the same mixed model analyses described in Trujillo, Simanova et al. (2018). In line with the original dataset, we found significantly higher values in communicative compared to non-communicative videos for vertical amplitude (communicative = 0.160 ± 0.99; non-communicative = − 0.449 ± 0.809; χ²(4) = 12.263, p < 0.001), submovements (communicative = 0.161 ± 0.789; non-communicative = − 0.661 ± 0.585; χ²(4) = 32.821, p < 0.001), peak velocity (communicative = 0.181 ± 1.08; non-communicative = − 0.683 ± 0.649; χ²(4) = 23.965, p = 0.001), and direct eye-gaze (communicative = 0.235 ± 0.220; non-communicative = 0.013 ± 0.041; χ²(4) = 44.703, p < 0.001). Also in line with the original data, we found a less robust, but still significant, difference in hold time (communicative = 0.107 ± 1.159; non-communicative = − 0.448 ± 0.892; χ²(4) = 7.917, p = 0.005). Finally, duration was also longer in communicative (M = 7.237 ± 1.754) compared to non-communicative (M = 6.132 ± 1.235) videos.
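The ranking step can be summarized in a few lines of R. The sketch below assumes one row per recorded video, with the modulation z scores and the gaze proportion as columns; the equal weighting of gaze and the kinematic features is an assumption, as the text does not state how the two were combined.

```r
# Sketch of the communicativeness ranking used for item selection.
# Assumed input: `videos`, one row per recorded video, with the
# modulation z scores and addressee-directed gaze proportion.
score_cols <- c("gaze", "vertical_amp_z", "peak_velocity_z",
                "submovements_z", "hold_time_z")
videos$comm_score <- rowMeans(videos[, score_cols])  # equal weights assumed
ranked <- videos[order(-videos$comm_score), ]
# Selection then walks down `ranked`, keeping videos until each of the
# 31 items appears 3-4 times and both contexts are represented.
```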
Appendix 2: List of items from Trujillo, Simanova et al. (2018)

The table provides the original Dutch response options that participants saw, alongside the English translation.

Original (Dutch) | English
appel verplaatsen | Move apple
banaan pellen | Peel banana
blokken stapelen | Stack blocks
brood snijden | Cut bread
citroen uitpersen | Squeeze lemon
dobbelstenen gooien | Roll dice
haar borstelen | Brush hair
hoed opdoen | Put on hat
kaarten schudden | Shuffle cards
kurk verwijderen | Remove cork
naam schrijven | Write name
papier afvegen | Brush off paper
papier knippen | Cut paper
papier kreukelen | Crumple paper
papier meten | Measure paper
papieren nieten | Staple papers
papier scheuren | Tear paper
papier stempelen | Stamp paper
papier vouwen | Fold paper
pendop opdoen | Put on pen cap
pendop verwijderen | Remove pen cap
potje openmaken | Open jar
ring aandoen | Put on ring
slot openmaken | Open lock
spijkers slaan | Hammer nails
tafel schrobben | Scrub desk
tekening wissen | Erase drawing
thee roeren | Stir tea
theezakje dompelen | Steep tea
water gieten | Pour water
zonnebril opdoen | Put on sunglasses

Appendix 3: Mixed effects modeling procedure

The order in which the predictor variables were entered into the mixed effects model was determined based on the a priori hypothesized contribution of the three components: range of motion has been found to be increased in adult–child interactions (Brand et al., 2002; Fukuyama et al., 2015); peak velocity was found to be increased in a communicative context in at least one study (Trujillo, Simanova et al., 2018); punctuality was previously not found to be changed in child–adult interactions by Brand et al. (2002), but was found to be increased in a communicative context by Trujillo, Simanova et al. (2018).

As more-communicative videos were, on average, longer than less-communicative videos, we included video duration (ms) in our regression models. This allowed us to test the contribution of kinematic features after taking into account total duration, ensuring that any effect of kinematics is not explained by duration alone. We report the video duration correlation from the best-fit model if this model is a better fit to the data than the null model. If the null model is a better fit, then we report the video duration correlation from the null model. Duration was fitted before the kinematic variables in order to ensure that any significant contribution of kinematic modulation to the model fit was over and above that of duration. In other words, our models were set up to specifically test the contribution of kinematic modulation after taking into account video duration and inter-individual differences.

Typically, when utilizing mixed effects models the researcher must first find the model that is the best fit for the data before making inferences on the model parameters. The best-fit model was determined by first defining a 'null' model that included only duration as a fixed effect and participant as a random intercept. We used a series of log-likelihood ratio tests to determine if each kinematic feature term (described above: range of motion, velocity, punctuality) contributed significantly to the model fit. For example, if a comparison between a model that includes peak velocity and a model that does not include this effect term yields a non-significant result, then we do not include this kinematic feature in the model. If the comparison yields a significant result, we keep this kinematic feature and compare this model with a new model that contains the next non-tested kinematic feature. In a step-wise fashion we thus test the contribution of each of the kinematic features. We report effects from the final, best-fit model, if it is still a better fit than the null model.
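The following R sketch illustrates the stepwise log-likelihood ratio procedure described above, using lme4; the data frame trial_data and its column names are hypothetical, and the sketch is a schematic of the model-comparison logic rather than the original analysis script.

```r
library(lme4)

# Hypothetical trial-level data: one row per participant x video, with
# acc (0/1), duration, z-scored kinematic predictors, and participant id.
null_model <- glmer(acc ~ duration + (1 | participant),
                    data = trial_data, family = binomial)

# Candidate kinematic terms, entered in the a priori order described above:
# range of motion, velocity, then punctuality (submovements, hold time).
terms <- c("z_vert_amp", "z_peak_vel", "z_submove", "z_hold")

current <- null_model
for (t in terms) {
  candidate <- update(current, as.formula(paste(". ~ . +", t)))
  lrt <- anova(current, candidate)        # log-likelihood ratio test
  if (lrt$`Pr(>Chisq)`[2] < 0.05) {
    current <- candidate                  # keep the feature if it improves fit
  }
}

# Report the final model only if it still beats the null model.
anova(null_model, current)
summary(current)
```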
References

Ansuini, C., Cavallo, A., Bertone, C., & Becchio, C. (2014). The visible face of intention: Why kinematics matters. Frontiers in Psychology, 5, 815. https://doi.org/10.3389/fpsyg.2014.00815.
Ansuini, C., Cavallo, A., Koul, A., D'Ausilio, A., Taverna, L., & Becchio, C. (2016). Grasping others' movements: Rapid discrimination of object size from observed hand movements. Journal of Experimental Psychology: Human Perception and Performance, 42(7), 918–929. https://doi.org/10.1037/xhp0000169.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01.
Bavelas, J., Gerwing, J., Sutton, C., & Prevost, D. (2008). Gesturing on the telephone: Independent effects of dialogue and visibility. Journal of Memory and Language, 58(2), 495–520. https://doi.org/10.1016/j.jml.2007.02.004.
Becchio, C., Koul, A., Ansuini, C., Bertone, C., & Cavallo, A. (2018). Seeing mental states: An experimental strategy for measuring the observability of other minds. Physics of Life Reviews. https://doi.org/10.1016/j.plrev.2017.10.002.
Becchio, C., Manera, V., Sartori, L., Cavallo, A., & Castiello, U. (2012). Grasping intentions: From thought experiments to empirical evidence. Frontiers in Human Neuroscience, 6, 1–6. https://doi.org/10.3389/fnhum.2012.00117.
Blokpoel, M., van Kesteren, M., Stolk, A., Haselager, P., Toni, I., & van Rooij, I. (2012). Recipient design in human communication: Simple heuristics or perspective taking? Frontiers in Human Neuroscience, 6, 253. https://doi.org/10.3389/fnhum.2012.00253.
Brand, R. J., Baldwin, D. A., & Ashburn, L. A. (2002). Evidence for 'motionese': Modifications in mothers' infant-directed action. Developmental Science, 5(1), 72–83. https://doi.org/10.1111/1467-7687.00211.
Campisi, E., & Özyürek, A. (2013). Iconicity as a communicative strategy: Recipient design in multimodal demonstrations for adults and children. Journal of Pragmatics, 47(1), 14–27. https://doi.org/10.1016/j.pragma.2012.12.007.
Cavallo, A., Koul, A., Ansuini, C., Capozzi, F., & Becchio, C. (2016). Decoding intentions from movement kinematics. Scientific Reports, 6, 37036. https://doi.org/10.1038/srep37036.
Cerf, M., Harel, J., Einhäuser, W., & Koch, C. (2007). Predicting human gaze using low-level saliency combined with face detection. NIPS 2007.
Csibra, G., & Gergely, G. (2006). Social learning and social cognition: The case for pedagogy. Processes of Change in Brain and Cognitive Development, 21, 249–274.
DeBeer, C., Carragher, M., van Nispen, K., de Ruiter, J., Hogrefe, K., & Rose, M. (2015). Which gesture types make a difference? Interpretation of semantic content communicated by PWA via different gesture types. GESPIN, 4, 89–93.
Dragan, A. D., Lee, K. C. T., & Srinivasa, S. S. (2013). Legibility and predictability of robot motion. In 2013 8th ACM/IEEE International Conference on Human–Robot Interaction (HRI) (pp. 301–308). Tokyo, Japan: IEEE. https://doi.org/10.1109/HRI.2013.6483603.
Dragan, A., & Srinivasa, S. (2014). Integrating human observer inferences into robot motion planning. Autonomous Robots, 37(4), 351–368. https://doi.org/10.1007/s10514-014-9408-x.
Farroni, T., Csibra, G., Simion, F., & Johnson, M. H. (2002). Eye contact detection in humans from birth. Proceedings of the National Academy of Sciences of the United States of America, 99(14), 9602–9605. https://doi.org/10.1073/pnas.152159999.
Fukuyama, H., Qin, S., Kanakogi, Y., Nagai, Y., Asada, M., & Myowa-Yamakoshi, M. (2015). Infant's action skill dynamically modulates parental action demonstration in the dyadic interaction. Developmental Science, 18(6), 1006–1013. https://doi.org/10.1111/desc.12270.
Galati, A., & Galati, A. (2015). Speakers adapt gestures to addressees' knowledge: Implications for models of co-speech gesture. Language, Cognition and Neuroscience, 29(4), 435–451. https://doi.org/10.1080/01690965.2013.796397.
Gerwing, J., & Bavelas, J. (2004). Linguistic influences on gesture's form. Gesture, 4(2), 157–195. https://doi.org/10.1075/gest.4.2.04ger.
Gielniak, M. J., & Thomaz, A. L. (2012). Enhancing interaction through exaggerated motion synthesis. In Proceedings of the Seventh Annual ACM/IEEE International Conference on Human–Robot Interaction (HRI '12) (p. 375). New York: ACM Press. https://doi.org/10.1145/2157689.2157813.
Grèzes, J., & Decety, J. (2002). Does visual perception of object afford action? Evidence from a neuroimaging study. Neuropsychologia, 40(2), 212–222. https://doi.org/10.1016/S0028-3932(01)00089-6.
Hershler, O., & Hochstein, S. (2005). At first sight: A high-level pop out effect for faces. Vision Research, 45(13), 1707–1724. https://doi.org/10.1016/j.visres.2004.12.021.
Hilliard, C., & Cook, S. W. (2016). Bridging gaps in common ground: Speakers design their gestures for their listeners. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(1), 91–103. https://doi.org/10.1037/xlm0000154.
Holladay, R. M., Dragan, A. D., & Srinivasa, S. S. (2014). Legible robot pointing. In The 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2014) (pp. 217–223).
Holler, J., & Beattie, G. (2005). Gesture use in social interaction: How speakers' gestures can reflect listeners' thinking. In 2nd Conference of the International Society for Gesture Studies (ISGS): Interacting Bodies (pp. 1–12).
Holler, J., Kelly, S., Hagoort, P., & Özyürek, A. (2012). When gestures catch the eye: The influence of gaze direction on co-speech gesture comprehension in triadic communication. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (pp. 467–472). Austin, TX: Cognitive Science Society.
Holler, J., Kokal, I., Toni, I., Hagoort, P., Kelly, S. D., & Ozyurek, A. (2015). Eye'm talking to you: Speakers' gaze direction modulates co-speech gesture processing in the right MTG. Social Cognitive and Affective Neuroscience, 10(2), 255–261. https://doi.org/10.1093/scan/nsu047.
Iacoboni, M., Molnar-Szakacs, I., Gallese, V., Buccino, G., Mazziotta, J. C., & Rizzolatti, G. (2005). Grasping the intentions of others with one's own mirror neuron system. PLoS Biology, 3(3), e79. https://doi.org/10.1371/journal.pbio.0030079.
Kelly, S. D., Ozyurek, A., & Maris, E. (2010). Two sides of the same coin: Speech and gesture mutually interact to enhance comprehension. Psychological Science, 21(2), 260–267. https://doi.org/10.1177/0956797609357327.
Kendon, A. (1986). Current issues in the study of gesture. In J.-L. Nespoulous, P. Perron, & A. R. Lecours (Eds.), The biological foundations of gestures: Motor and semiotic aspects (1st ed., pp. 23–47). London: Psychology Press.
Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge: Cambridge University Press.
Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In Lecture Notes in Computer Science (Vol. 1371, pp. 23–35). Berlin: Springer. https://doi.org/10.1007/BFb0052986.
Kuznetsova, A. (2016). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1. https://doi.org/10.18637/jss.v082.i13.
Manera, V., Becchio, C., Cavallo, A., Sartori, L., & Castiello, U. (2011). Cooperation or competition? Discriminating between social intentions by observing prehensile movements. Experimental Brain Research, 211(3–4), 547–556. https://doi.org/10.1007/s00221-011-2649-4.
McEllin, L., Knoblich, G., & Sebanz, N. (2018). Distinct kinematic markers of demonstration and joint action coordination? Evidence from virtual xylophone playing. Journal of Experimental Psychology: Human Perception and Performance. https://doi.org/10.1037/xhp0000505.
McNeill, D. (1994). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
Naish, K. R., Reader, A. T., Houston-Price, C., Bremner, A. J., & Holmes, N. P. (2013). To eat or not to eat? Kinematics and muscle activity of reach-to-grasp movements are influenced by the action goal, but observers do not detect these differences. Experimental Brain Research, 225(2), 261–275. https://doi.org/10.1007/s00221-012-3367-2.
Osiurak, F., Jarry, C., Baltenneck, N., Boudin, B., & Le Gall, D. (2012). Make a gesture and I will tell you what you are miming. Pantomime recognition in healthy subjects. Cortex, 48(5), 584–592. https://doi.org/10.1016/j.cortex.2011.01.007.
Özyürek, A. (2014). Hearing and seeing meaning in speech and gesture: Insights from brain and behaviour. Philosophical Transactions of the Royal Society B, 369, 20130296. https://doi.org/10.1098/rstb.2013.0296.
Pezzulo, G., Donnarumma, F., & Dindo, H. (2013). Human sensorimotor communication: A theory of signaling in online social interactions. PLoS ONE, 8(11), e79876. https://doi.org/10.1371/journal.pone.0079876.
R Core Team (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.
Rose, M. L., Mok, Z., & Sekine, K. (2017). Communicative effectiveness of pantomime gesture in people with aphasia. International Journal of Language and Communication Disorders, 52(2), 227–237. https://doi.org/10.1111/1460-6984.12268.
Stapel, J. C., Hunnius, S., & Bekkering, H. (2012). Online prediction of others' actions: The contribution of the target object, action context and movement kinematics. Psychological Research, 76(4), 434–445. https://doi.org/10.1007/s00426-012-0423-2.
Theeuwes, J., & Van der Stigchel, S. (2006). Faces capture attention: Evidence from inhibition of return. Visual Cognition, 13(6), 657–665. https://doi.org/10.1080/13506280500410949.
Trujillo, J. P., Simanova, I., Bekkering, H., & Özyürek, A. (2018a). Communicative intent modulates production and comprehension of actions and gestures: A Kinect study. Cognition, 180, 38–51. https://doi.org/10.1016/j.cognition.2018.04.003.
Trujillo, J. P., Vaitonyte, J., Simanova, I., & Özyürek, A. (2018b). Toward the markerless and automatic analysis of kinematic features: A toolkit for gesture and movement research. Behavior Research Methods. https://doi.org/10.3758/s13428-018-1086-8.
Tucker, M., & Ellis, R. (2001). The potentiation of grasp types during visual object categorization. Visual Cognition, 8(6), 769–800. https://doi.org/10.1080/13506280042000144.
van Elk, M., van Schie, H., & Bekkering, H. (2014). Action semantics: A unifying conceptual framework for the selective use of multimodal and modality-specific object knowledge. Physics of Life Reviews, 11(2), 220–250. https://doi.org/10.1016/j.plrev.2013.11.005.
Vesper, C., & Richardson, M. J. (2014). Strategic communication and behavioral coupling in asymmetric joint action. Experimental Brain Research, 232(9), 2945–2956. https://doi.org/10.1007/s00221-014-3982-1.
Vesper, C., Schmitz, L., & Knoblich, G. (2017). Modulating action duration to establish nonconventional communication. Journal of Experimental Psychology: General, 146(12), 1722–1737. https://doi.org/10.1037/xge0000379.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
For example, adults adapt to addressees' knowledge by producing gestures that are larger (Bavelas, Gerwing, Sutton, & Prevost, 2008; Campisi & Özyürek, 2013), more complex (Gerwing & Bavelas, 2004; Holler & Beattie, 2005), and higher in space (Hilliard & Cook, 2016) when conveying novel information. Instrumental actions intended to teach show similar kinematic modulation, including spatial (McEllin, Knoblich, & Sebanz, 2018; Vesper & Richardson, 2014) and temporal (McEllin et al., 2018) exaggeration. Evidence from our own lab corroborates these findings of spatial and temporal modulation in the production of both actions and gestures. In our recent work, we quantified the spatial and temporal modulation of actions and pantomime gestures (used without speech) in a more- relative to a less-communicative context (Trujillo, Simanova, Bekkering, & Özyürek, 2018). We showed that spatial and temporal features of actions and pantomime gestures are adapted to the communicative context in which they are produced.

A computational account by Pezzulo, Donnarumma, and Dindo (2013) suggests that modulation makes meaningful acts communicative by disambiguating the relevant information, effectively making the intended movement goal clear to the observer. This framework focuses on actions, but could be extended to gestures. One recent experimental study directly assessed how kinematic modulation affects gesture comprehension. By combining computationally based robotic production of gestures with validation through human comprehension experiments, Holladay, Dragan, and Srinivasa (2014) showed that spatial exaggeration of kinematics allows observers to more easily recognize the target of pointing gestures. Similarly, Gielniak and Thomaz (2012) showed that when robot co-speech gestures are kinematically exaggerated, the content of an interaction with that robot is better remembered. Another study used an action-based leader–follower task to show that task leaders not only systematically modulate task-relevant kinematic parameters, but these modulations are linked to better performance of the followers (Vesper, Schmitz, & Knoblich, 2017).

These previous studies suggest that the kinematic modulation of communicative movements (e.g., actions and gestures) serves to clarify relevant information for the addressee. However, it remains unclear whether this also holds for more complex human movements, such as pantomime gestures. This question is important for our understanding of human communication given that complex representations form an important part of the communicative message (Kelly, Ozyurek, & Maris, 2010; Özyürek, 2014).

The mechanism by which kinematic modulation might support semantic comprehension, or identification, of complex movements remains unclear. Several studies suggest disambiguation of the ongoing act, either through temporal segmentation of relevant parts (Blokpoel et al., 2012; Brand, Baldwin, & Ashburn, 2002) or spatial exaggeration of relevant features (Brand et al., 2002), as the mechanism. In the case of disambiguation, the "semantic core" (Kendon, 1986), or meaningful part of the movement, is made easier to understand as it unfolds. However, there is also evidence suggesting that early kinematic cues provide sufficient information to inform accurate prediction of whole actions before they are seen in their entirety (Cavallo, Koul, Ansuini, Capozzi, & Becchio, 2016; Manera, Becchio, Cavallo, Sartori, & Castiello, 2011). One study, for example, used videos of a person walking, and at a pause in the video participants were asked whether the actress in the video would continue to walk or start to crawl. The authors showed that whole-body kinematics could support predictions about the outcome of an ongoing action (Stapel, Hunnius, & Bekkering, 2012). However, another study showed videos of a person reaching out and grasping a bottle, asked the participants to predict the next sequence in the action (e.g., to drink, to move, to offer), and found that they were unable to use such early cues for accurate identification in this more complex, open-ended situation (Naish, Reader, Houston-Price, Bremner, & Holmes, 2013). Furthermore, identification of pantomime gestures has previously been reported to be quite low when no contextual (i.e., object) information is provided (Osiurak, Jarry, Baltenneck, Boudin, & Le Gall, 2012). Given these inconsistencies in the literature, an open question remains: are early kinematic cues sufficient to inform early representational gesture identification, or does kinematic modulation primarily aid gesture identification as the movements unfold (i.e., late identification)?

Finally, to understand how kinematic modulation might support gesture identification, it is important to consider other factors that might influence the semantic comprehension of an observer. In a natural environment, movements such as gestures are accompanied by additional communicative signals, such as facial expression and eye-gaze, and/or finger kinematics relevant in the execution of the gestures. Humans are particularly sensitive to the presence of human faces, which naturally draw attention (Cerf, Harel, Einhäuser, & Koch, 2007; Hershler & Hochstein, 2005; Theeuwes & Van der Stigchel, 2006). This effect is most prominent in the presence of mutual gaze (Farroni, Csibra, Simion, & Johnson, 2002; Holler et al., 2015), but also occurs in averted gaze compared to non-face objects (Hershler & Hochstein, 2005). Hand-shape information can also provide clues as to the object one is manipulating (Ansuini et al., 2016), and more generally the kinematics of the hand and fingers together provide early cues to upcoming actions (Becchio, Koul, Ansuini, Bertone, & Cavallo, 2018; Cavallo et al., 2016), which together may allow the act to be more easily identified. To understand the role of kinematic modulation in communication, the complexity of the visual scene must also be taken into account.

In sum, previous studies show kinematic modulation occurring as a communicative cue in actions and gestures. While research suggests that this modulation serves to enhance comprehension, this has not been assessed directly in terms of semantic comprehension of complex movements, such as representational gestures. Furthermore, it is currently unclear if improved comprehension would be driven by early action identification or by late identification of semantics, and which kinematic features provide this advantage. The current study addresses these questions.
In two experiments, naïve participants perform a recognition task of naturalistic pantomime gestures recorded in our previous study (Trujillo, Simanova et al., 2018). In the first experiment, they see the original videos with the face of the actor either visible or blurred, to control for eye-gaze effects. In the second experiment, the same videos are reduced to stick-light figures, reconstructed from Kinect motion-tracking data. The stick-figure videos allow us to test the contribution of specific kinematic features, because only the movements are visible, but not the face or hand shape. In both experiments, we additionally manipulate video length to test whether any communicative benefit is driven more by early identification (resulting in differences only in the initial fragment) or late identification (resulting in differences in the medium and full fragments). Experiment II provides an additional exploratory test of the contribution of specific kinematic features to gesture identification.

We hypothesize that kinematic modulation serves to enhance semantic legibility. As early kinematic information is less reliable for open-ended action prediction (Naish et al., 2013) and pantomime gestures may generally be difficult to identify without context (Osiurak et al., 2012), we expect better recognition scores for the communicative gestures in the medium and full fragments compared to initial fragments. We furthermore predict that performance will correlate with stronger kinematic modulation. Additionally, we expect performance to be lower overall with stick-light figures compared to the full videos, due to decreased visual information, but with a similar pattern (i.e., better performance in medium and full fragments compared to initial). For our exploratory test, we expect that exaggeration of both spatial and temporal kinematic features will contribute to better gesture identification.

Experiment I: Full visual context

Our first experiment, with actual videos of the gestures, was designed to test (1) whether kinematic modulations lead to improved semantic comprehension in an addressee, (2) whether the advantage is better explained by early identification or late identification of the gestures, and (3) whether the effect is altered by removing a salient part of the visual context, the actor's face.

Methods

Participants

Twenty participants were included in this study (mean age = 28; 16 female), recruited from Radboud University. Participants were selected on the criteria of being aged 18–35, right-handed and fluent in the Dutch language, with no history of psychiatric disorders or communication impairments. The procedure was approved by a local ethics committee and informed consent was obtained from all individual participants in this study.

Materials

Each participant performed the recognition task with 60 videos of pantomimes that differed in their context (more or less communicative), video duration (short, medium and full), and face visibility (face visible vs. blurred). A detailed description of the video recordings, selection and manipulation follows below.

Video recording procedure Stimuli were derived from a previous experiment (Trujillo, Simanova et al., 2018). In this previous experiment, participants (henceforth, actors) were filmed while seated at a table, with a camera hanging in front of the table. Motion-tracking data were acquired using a Microsoft Kinect system hanging slightly to the left of the camera. Each actor performed a set of 31 gestures, either in a more-communicative or a less-communicative setting (described below). Gestures consisted of simple object-directed acts, such as cutting paper with scissors or pouring water into a cup. Target objects were placed on the table (e.g., scissors and a sheet of paper for the item 'cut the paper with the scissors'), but actors were instructed to perform as if they were acting on the objects, without actually touching them. For each item, actors began with their hands placed on designated starting points on the table (marked with tape). After placing the target object(s) on the table, the experimenter moved out of view of the participant and the camera, and recorded instructions were played. Immediately following the instructions, a bell sound was played, which indicated that the participant could begin with the pantomime. Once the act was completed, actors returned their hands to the indicated starting points, which elicited another bell sound, and waited for the next item. For this study, videos began at the first bell sound and ended at the second bell sound. In the more-communicative context we introduced a confederate who sat in an adjacent room and was said to be watching through the video camera and learning the gestures from the participant. In this way, an implied communicative context was created. In the less-communicative context, the same confederate was said to be learning the experimental setup. The less-communicative context was, therefore, exactly matched, including the presence of an observer, but differed only in that there was no implied interaction. Despite the subtle task manipulation, our previous study (Trujillo, Simanova et al., 2018) showed robust differences in kinematics between the gestures produced in the more-communicative vs. the less-communicative context.

Kinematic feature quantification For the current study, we used the same kinematic features that were quantified in our earlier study (Trujillo, Simanova et al., 2018). We used a toolkit for markerless automatic analysis of kinematic features, developed earlier in our group (Trujillo, Vaitonyte, Simanova, & Özyürek, 2018). The following briefly describes the feature quantification procedure: all features were measured within the time frame between the beginning and the ending bell sound. Motion-tracking data from the Kinect provided measures for our kinematic features, and all raw motion-tracking data were smoothed using the Savitzky–Golay filter with a span of 15 and degree of 5. As described in our previous work (Trujillo, Simanova et al., 2018), this smoothing protocol was used as it brought the Kinect data closely in line with simultaneously recorded optical motion-tracking data in a separate pilot session. The following features were calculated from the smoothed data: Distance was calculated as the total distance traveled by both hands in 3D space over the course of the item. Vertical amplitude was calculated on the basis of the highest space used by either hand in relation to the body. Peak velocity was calculated as the greatest velocity achieved with the right (dominant) hand. Hold time was calculated as the total time, in seconds, counting as a hold. Holds were defined as an event in which both hands and arms are still for at least 0.3 s. Submovements were calculated as the number of individual ballistic movements made, per hand, throughout the item. To account for the inherent differences in the kinematics of the various items performed, z scores were calculated for each feature/item combination across all actors, including both conditions. This standardized score represents the modulation of that feature, as it quantifies how much greater or smaller the feature was when compared to the average of that feature across all of the actors. (Addressee-directed) eye-gaze was coded in ELAN as the proportion of the total duration of the video in which the participant is looking directly into the camera. For a more detailed description of these quantifications, see Trujillo, Simanova et al. (2018). Also note that the kinematic features calculated using this protocol are in line with the same features manually annotated from the video recordings (Trujillo, Vaitonyte et al., 2018). This supports our assumption that the features calculated from the motion-tracking data represent qualities that are visible in the videos.
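Two details of this procedure are easy to misread, so a brief R sketch follows: the Savitzky–Golay settings and the per-item z-scoring. The sgolayfilt() function is from the signal package; the data frame and column names are hypothetical stand-ins rather than the toolkit's actual code.

```r
library(signal)  # provides sgolayfilt()

# Smooth a raw Kinect coordinate trace: polynomial degree 5,
# window span of 15 frames, matching the settings reported above.
smooth_track <- function(x) sgolayfilt(x, p = 5, n = 15)

# 'features' is a hypothetical data frame with one row per actor x item,
# holding raw feature values (e.g., peak_velocity) from both conditions.
# Modulation = z score of a feature within each item, across all actors.
z_by_item <- function(df, feature) {
  ave(df[[feature]], df$item,
      FUN = function(v) (v - mean(v)) / sd(v))
}

features$z_peak_velocity <- z_by_item(features, "peak_velocity")
features$z_hold_time     <- z_by_item(features, "hold_time")
```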
Inclusion and randomization Our stimulus set included 120 videos (of the 2480) recorded in our previous study (Trujillo, Simanova et al., 2018). Our selection procedure (see Appendix 1) ensured that our stimulus set in the present experiment included an equal number of more- and less-communicative videos. Each of the 31 gesture items from the original set was included a minimum of three times and a maximum of four times across the entire selection, performed by different actors, while ensuring that each item also appeared at least once in the more-communicative context and once in the less-communicative context. Three videos from each actor in the previous study were included. Appendix 2 provides the full list of gesture items. Supplementary Figure 1 illustrates the range of kinematics, gaze, and video durations included across the two groups in the current study with respect to the original dataset from Trujillo, Simanova et al. (2018). We ensured that the stimulus set for the present study matched the original dataset in terms of context-specific differences in the kinematics and eye-gaze, ensuring that the current stimulus set is a representative sample of the data shown in Trujillo, Simanova et al. (2018). These results are provided in Appendix 1.

Video segmentation

To test whether kinematic modulation primarily influences early or late identification (question 2), we divided the videos into segments of different length. Based on the previous literature (Kendon, 1986; Kita, van Gijn, & van der Hulst, 1998), we defined the segments as follows. Wait covered the approximately 500 ms after the bell was played, but before the participant started to move. Reach to grasp covered the time during which the participant reached towards, and subsequently grasped, the target object; in the case of multiple objects, this segment ended after both objects were grasped. Prepare captured any movements unrelated to the initial reach to grasp but not part of the main semantic aspect of the pantomime. Main movement covered any movements directly related to the semantic core of the item. Auxiliary captured any additional movements not directly related to the semantic core. Return object captured the movement of the hands back to the object's starting position, depicting the object being replaced to its original location. Retract covered the movement of the hands back to the indicated starting position of the hands, until the end of the video. Note that the "prepare" and "auxiliary" segments were optional, and only coded when such movements were present. All other segments were present in all videos. Phases were delineated based on this segmentation. Phase 0 covered the "wait" segment. Phase 1 covered "reach to grasp" and "prepare". Phase 2 covered "main movement" and "auxiliary". Phase 3 covered "return object" and "retract". See Table 1 and Fig. 1 for examples of how these phases map onto specific parts of the movement.

Table 1 Movement phase examples

Item | Reach-to-grasp (phase 1) | Prepare (phase 1) | Main movement (phase 2) | Auxiliary (phase 2) | Return object (phase 3) | Retract (phase 3)
Open jar | Right hand extends to jar; left hand grasps lid | Right hand lifts jar | Twisting hands to depict unscrewing the lid | Hands moved apart to show separating lid from jar | Hands return to object starting positions | Hands returned to indicated starting position
Cut paper | Right hand extends to scissors, left hand to paper | Both hands lifted, configured to start cutting | Cutting motion depicted with right hand | Hands spread apart to show that the cutting is complete | Hands return to object starting positions | Hands returned to indicated starting position

After defining the segments for each video, we also divided the videos into three lengths, referred to as initial fragments (M = 3.27 ± 1.52 s), medium fragments (M = 4.62 ± 2.19 s), and full videos (M = 5.59 ± 2.53 s). Initial fragments consisted of only phase 0 and phase 1, medium fragments consisted of phases 0–2, and full videos contained all of the phases. An overview of these segments and phases can be seen in Fig. 1. We performed ANOVAs on each of the fragment lengths to ensure video durations of the same fragment length did not differ significantly across cells (see Supplementary Table 1 for statistics). This resulted in initial fragments only providing initial hand-shape and arm/hand/finger configuration information, medium fragments providing all relevant semantic information, and full videos providing additional eye-gaze (when present) and additional time for processing the information.

Fig. 1 Overview of video segmentation and phases. Along the top, representative still frames are shown throughout one video (item: "open jar"). The individual blue blocks indicate individual segments. Below this, phase division is depicted (color figure online)

Blurring In all videos, a Gaussian blur was applied to the object, which was otherwise visible in the video. This ensured that the object could not be used to infer the action. To determine whether the face in general, and in particular the gaze direction, has an effect on pantomime recognition, we applied a Gaussian blur to the face in half of the videos. Blurring the faces in this way allowed us to manipulate the amount of available visual information, providing a first test of how kinematic modulation affects gesture identification in a less complete visual context (question 3). This was balanced so that each actor had at least one video with a visible face and one with a blurred face.
Task

Before beginning the experiment, participants received a brief description of the task to inform them of the nature of the stimuli. This ensured that the participants knew to expect incomplete videos in some trials. Participants were seated in front of a 24″ BenQ XL2420Z monitor with a standard keyboard for responses. Stimuli were presented at a frame rate of 29 frames per second, with a display size of 1280 × 720. During the experiment, participants would first see a fixation cross for a period of 1000 ms with a jitter of 250 ms. One of the item videos was then displayed on the screen, after which the question appeared: "What was the action being depicted?" Two possible answers were presented on the screen, one on the left and one on the right. Answers consisted of one verb and one noun that captured the action (e.g., the correct answer to the item "pour the water into the cup" was "pour water"). Correct answers were randomly assigned to one of the two sides. The second option was always one of the possible answers from the total set. Therefore, all options were presented equally often as the correct answer and as the wrong (distractor) option. Participants could respond with the 0 (left option) or 1 (right option) keys on the keyboard. Accuracy and response time (RT) were recorded for each video.

Table 2 Overview of analysis cells for Experiment I (mean video duration in seconds; there are ten videos in each of the cells)

Fragment length | More-communicative, face visible | More-communicative, face blurred | Less-communicative, face visible | Less-communicative, face blurred
Initial fragment | 4.49 | 5.03 | 4.50 | 4.03
Medium fragment | 4.72 | 4.43 | 4.34 | 4.57
Full fragment | 4.73 | 4.34 | 4.29 | 4.61

Analysis

Main effects analyses: communicative context, fragment length, and visual context Both RT and accuracy of identification judgments were calculated for each of 12 cells (Table 2): fragment length (initial fragment vs. medium fragment vs. full video) × face (blurred vs. visible) × context (more-communicative vs. less-communicative), in order to test (1) whether more-communicative gestures were identified faster or with higher accuracy (main effect of context), (2) whether performance was higher in only initial fragments (providing evidence for early identification) or only in medium fragments (providing evidence for late identification), and (3) whether face visibility impacted performance, which informs us whether there is an effect of visual information availability on identification performance. Separate repeated-measures analyses of variance (RM-ANOVA) were run for accuracy and RT to test for the presence of main and interaction effects. We used Mauchly's test of sphericity on each factor and interaction in our model and applied the Greenhouse–Geisser correction where appropriate.
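A minimal R sketch of this analysis is given below, using the afex package as one convenient interface for repeated-measures ANOVA with Greenhouse–Geisser correction; afex is our assumption for illustration (the paper does not state which ANOVA implementation was used), and the data frame and column names are hypothetical.

```r
library(afex)

# Hypothetical long-format data: one row per participant x cell,
# with columns id, fragment (initial/medium/full), face (visible/blurred),
# context (more/less), and the cell-mean accuracy.
fit <- aov_ez(
  id     = "id",
  dv     = "accuracy",
  data   = cell_means,
  within = c("fragment", "face", "context")
)

# afex applies the Greenhouse-Geisser correction to within-subject effects
# by default; Mauchly sphericity tests appear in the summary output.
summary(fit)
```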
Results: Experiment I

We used RM-ANOVA to test for a significant main effect of communicative context, fragment length, or face visibility on performance. In terms of accuracy, results of the fragment length × face visibility × communicative context RM-ANOVA showed a significant main effect of communicative context, F(1,19) = 2.912, p = 0.029, as well as a main effect of fragment length, F(2,38) = 53.583, p < 0.001, but no main effect of face visibility, F(1,19) = 0.050, p = 0.825. Planned comparisons revealed higher accuracy in the more-communicative context for initial fragments (more-communicative mean = 87.13%, less-communicative mean = 81.17%; t(18) = 3.025, p = 0.007), but there was no difference between contexts in the medium fragments (more-communicative mean = 97.37%, less-communicative mean = 96.49%; t(18) = 0.785, p = 0.443) or full videos (more-communicative mean = 97.37%, less-communicative mean = 97.22%; t(18) = 0.128, p = 0.899). In sum, performance was higher overall on more-communicative compared to less-communicative videos, with specifically more-communicative initial fragments showing higher performance than less-communicative initial fragments. Accuracy, regardless of communicative context, was additionally higher in medium and full fragments compared to initial. See Fig. 2a for an overview of these results.

In terms of RT, results of the fragment length × face × context RM-ANOVA revealed a significant main effect of communicative context, F(1,19) = 5.699, p = 0.028, and of fragment length, F(2,38) = 192.489, p < 0.001, but not of face visibility, F(1,19) = 3.725, p = 0.069. Planned contrasts revealed faster RT in more-communicative compared to less-communicative initial fragments (more-communicative mean = 1.446 s; less-communicative mean = 1.583 s), t(19) = 3.824, p = 0.001, but faster RT for less- compared to more-communicative medium fragments (more-communicative mean = 1.094 s; less-communicative mean = 1.029 s), t(19) = 3.479, p = 0.003, and no difference between more- and less-communicative full videos (more-communicative mean = 1.094 s; less-communicative mean = 1.129 s), t(19) = 1.237, p = 0.231. We also found faster RT for medium fragments (M = 1.093 s) compared to initial fragments (M = 1.630 s), t(19) = 12.538, p < 0.001, as well as for medium fragments compared to full videos (M = 1.142 s), t(19) = 2.326, p = 0.031. In sum, RT was similar in both the more- and less-communicative contexts, but faster responses were seen in medium fragments compared to initial and full fragments. See Fig. 2b for an overview of these results.

Fig. 2 Overview of semantic judgment performance over context and fragment length, combined for face visibility. Bean plots depict the distribution (kernel density estimation) of the data. The dotted lines indicate the overall performance mean, the larger solid bars indicate the mean per video length and communicative context, shorter bars indicate mean values per participant, and the filled curve depicts the overall distribution of scores. Panel a shows mean accuracy across the three video lengths; panel b shows RT across the three video lengths. In all panels, fragment length is depicted along the x-axis and the y-axis shows mean performance (in panel a, mean accuracy; in panel b, mean RT in seconds), while blue (left) plots depict the less-communicative context and green (right) plots the more-communicative context (color figure online)
Discussion: Experiment I

In our first experiment, we sought to determine how communicative modulation affects identification of pantomime gesture semantics. We found that pantomime gestures produced in a more-communicative context were better recognized when compared to those produced in a less-communicative context. Specifically, more-communicative initial fragments were recognized more accurately and faster than less-communicative initial fragments.

The higher accuracy in recognizing more- compared to less-communicative initial fragments suggests that at least some of the relevant information is available even in the earliest stages of the act, and that communicative modulation enhances this information. Since face visibility did not contribute significantly to better performance, we suggest that improved comprehension may come from fine-grained kinematic cues, such as hand shape and finger kinematics. As objects are known to have specific action and hand-shape affordances (Grèzes & Decety, 2002; Tucker & Ellis, 2001), hand shape can also provide clues as to the object being grasped, and thus also the upcoming action (Ansuini et al., 2016; van Elk, van Schie, & Bekkering, 2014). These results are therefore in line with the early prediction results described for action chains (Becchio, Manera, Sartori, Cavallo, & Castiello, 2012; Cavallo et al., 2016). Our results may also be explained by immediate comprehension. In other words, the visual information provided by the shape and configuration of the hands may be sufficiently clear to activate the semantic representation of the action without any prediction of the upcoming movements. Although we cannot determine the exact cognitive mechanism, we can conclude that communicative modulation supports comprehension through early action identification.

We found no evidence for higher accuracy in more- compared to less-communicative medium fragments, nor for full videos. It seems that the overall accuracy in medium and full fragments does not allow a difference to be found between the contexts. In both more- and less-communicative medium fragments, accuracy was above 96%, suggesting that ceiling-level performance may have already been reached. This indicates that even if communicative modulation supports late identification, general task difficulty was not high enough
Addition- would expect that although responses are faster for the less- ally, it will test which specific kinematic features contribute communicative videos, accuracy should still be higher in the to supporting semantic comprehension. more-communicative videos. To draw any conclusions about how communicative modulation affects late identification, Methods: Experiment II we suggest that it is necessary to increase task difficulty. In sum, our results show that communicatively produced Participants gestures are more easily recognized than less communica- tive gestures, and that this effect is explained by early action Twenty participants were included in this study (mean identification. This result is in line with the research on age = 24; 16 female), recruited from the Radboud Univer- child-directed actions (Brand et al., 2002), as well as the sity. Participants were selected on the criteria of being aged more recent developments regarding early action identifica- 18–35, right-handed, fluent in the Dutch language, without tion based on kinematic cues (Ansuini, Cavallo, Bertone, & any history of psychiatric impairments or communication Becchio, 2014; Cavallo et al., 2016). disorders, and not having participated in the previous experi- ment. The procedure was approved by a local ethics commit- tee and informed consent was obtained from all individual Experiment II: Isolated kinematic context participants in this study. Although this first experiment shows evidence for a support- Materials ing role of kinematic modulation in semantic comprehension of gestures, it remains unclear whether the effect remains We used same video materials as in the Experiment I, but when only gross kinematics are observed, and facial, includ- this time the videos were reduced to stick-light-figures. ing attentional cueing to the hands, and finger kinematics, Motion-tracking data were used to reconstruct the move- including hand shape, are completely removed. Removing ments of the upper-body joints (Trujillo, Vaitonyte et al., additional visual contextual information would therefore 2018). Videos consisted of these reconstructions, using x, help to disentangle the effects of gross (i.e., posture and y, z coordinates acquired at 30 frames per second of these hands) kinematic modulation from other (potentially com- joints (see Fig. 3 for an illustration of the joints utilized). municative) visual information. For example, while exten- Note that no joints pertaining to the fingers were visually sive research has looked at the early phase of action iden- represented. This ensured that hand shape was not a feature tification from hand and finger kinematics (Ansuini et al., that could be identified by an observer. These points were 2016; Becchio et al., 2018; Cavallo et al., 2016), the higher depicted with lines drawn between the individual points to level dynamics of the hands and arms, which we call gross create a light stick figure, representing the participants’ kin- kinematics, have not been well studied. This is particularly ematic skeleton. Skeletons were centered in space on the relevant as these high level kinematic features are similar to screen, with the viewing angle adjusted to reflect an azimuth the qualities described in gesture research. Thus, in Experi- of 20° and an elevation of 45° in reference to the center of ment II we replicate Experiment I, but reduce the stimuli to the skeleton. present a visually simplistic scene consisting of only lines representing the limbs of the actor’s body. 
If kinematic Analysis modulation is driving the communicative advantage seen in our first experiment, we can expect the same effect pat - Main effects analyses: communicative context, fragment tern as seen in Experiment I. If other features of the visible length, and  visual context To determine if there was an scene, such as finger kinematics, provided the necessary overall effect of communicative context on accuracy or RT, cues for semantic comprehension then the effect on early and to again test for evidence of either the early identifi- identification should no longer be present. Due to the visual cation or late identification hypothesis, we used two sepa- 1 3 Psychological Research Fig. 3 Illustration of materials used for Experiment II. a Diagram of 6–9 are present for both the left and right arms. b Still frames from joints represented in the videos of Experiment II: 1. top of head, 2. an actual stimulus video, depicting the visual information made avail- bottom of head, 3. top of spine, 4. middle of spine, 5. lower spine, able to the participants, underneath the corresponding actual video 6. shoulder, 7. elbow, 8. wrist, 9. center of hand. Note that numbers frames (not shown to participants) for comparison Table 3 Overview of analysis cells for Experiment II between the set of kinematic features and accuracy. Regres- sion analyses were performed on the medium fragments, as Context this is where a statistically significant difference was found Fragment length More-communicative Less-communicative between more- and less-communicative videos. Statistical Initial fragment Initial fragment analyses utilized mixed effects models implemented in the Mean = 4.22 s Mean = 4.24 s R statistical program (R Core Team, 2014) using the lme4 More-communicative Less-communicative package (Bates, Mächler, Bolker, & Walker, 2014). p val- Medium fragment Medium fragment ues were estimated using the Satterthwaite approximation Mean = 4.68 s Mean = 4.73 s for denominator degrees of freedom, as implemented in the More-communicative Less-communicative Full fragment Full fragment lmerTest package (Kuznetsova, 2016). Our regression mod- Mean = 4.59 s Mean = 4.51 s els first factored out video duration and subsequently tested the three main components of kinematic modulation that There are ten videos in each of the cells have been identified in previous research: range of motion (Bavelas et al., 2008; Hilliard & Cook, 2016) (here quan- rate 3 (fragment length) × 2 (context) one-way ANOVAs. tified as vertical space utilized), velocity of movements, and punctuality (Brand et al., 2002) (here quantified as the When appropriate, independent samples t tests were used to determine where these differences occurred across the number of submovements and the amount of holds between them. Kinematic features were defined as main effects, while three video lengths. When a non-normal distribution was detected, results are reported after a Greenhouse–Geisser a random intercept was added for participant. For a detailed description of how the model was defined, see Appendix  3. correction. 
To reduce the risk of Type I error, we used the Simple Interactive Statistical Analysis tool (http://www.quantitativeskills.com/sisa/calculations/bonfer.htm) to calculate an adjusted alpha threshold based on the mean correlation between all of the tested features (regardless of whether they are in the final model or not), as well as the number of tests (i.e., the number of variables remaining in the final mixed model). Our tested variables (duration, vertical amplitude, peak velocity, submovements, hold time) showed an average correlation of 0.154, leading to a corrected threshold of p = 0.019.

Results: Experiment II

Main effects analyses: communicative context, fragment length

Our first RM-ANOVA tested whether accuracy was affected by the communicative context or the fragment length of the videos. We found a significant main effect of communicative context on accuracy, F(1,19) = 5.108, p = 0.036, as well as a main effect of fragment length, F(2,38) = 10.962, p < 0.001. Planned comparisons revealed no difference between the accuracy of more-communicative and less-communicative initial fragments (more-communicative mean = 59.58%, less-communicative mean = 56.76%), t(19) = −0.646, p = 0.526, or full videos (more-communicative mean = 64.87%, less-communicative mean = 62.76%), t(19) = 0.492, p = 0.628. We found significantly higher accuracy in more-communicative medium fragments (M = 75.69%) compared to less-communicative medium fragments (M = 66.11%), t(19) = 2.99, p = 0.007. We found no fragment length by communicative context interaction, F(2,36) = 0.659, p = 0.523.

Our second RM-ANOVA tested whether RT was affected by communicative context or fragment length.
Results: Experiment II

Main effects analyses: communicative context, fragment length

Our first RM-ANOVA tested whether accuracy was affected by the communicative context or the fragment length of the videos. We found a significant main effect of communicative context on accuracy, F(1,19) = 5.108, p = 0.036, as well as a main effect of fragment length, F(2,38) = 10.962, p < 0.001. Planned comparisons revealed no difference between the accuracy of more-communicative and less-communicative initial fragments (more-communicative mean = 59.58%, less-communicative mean = 56.76%), t(19) = −0.646, p = 0.526, or full videos (more-communicative mean = 64.87%, less-communicative mean = 62.76%), t(19) = 0.492, p = 0.628. We found significantly higher accuracy in more-communicative medium fragments (M = 75.69%) compared to less-communicative medium fragments (M = 66.11%), t(19) = 2.99, p = 0.007. We found no fragment length by communicative context interaction, F(2,36) = 0.659, p = 0.523.

Our second RM-ANOVA tested whether RT was affected by communicative context or fragment length. We found a significant main effect of fragment length on RT, F(2,38) = 7.263, p = 0.003, but no main effect of communicative context, F(1,19) = 2.12, p = 0.162. We additionally found a video length × context interaction, F(2,38) = 3.87, p = 0.031. Planned comparisons revealed significantly faster RT in medium fragments (M = 1.817 s) compared to initial fragments (M = 1.953 s), t(19) = 3.982, p = 0.001, but no difference between medium fragments and full videos (M = 1.872 s), t(19) = 1.339, p = 0.196. See Fig. 4 for an overview of these results. In sum, communicative context did not affect RT, but responses were faster in medium compared to initial fragments.

Fig. 4 Overview of semantic judgment performance over context and fragment length in Experiment II. Bean plots depict the distribution (kernel density estimation) of the data. The dotted lines indicate the overall performance mean, the largest solid bars indicate the group mean per video length and context, and shorter bars indicate individual participant means. Panel a shows mean accuracy across the three video lengths; panel b shows RT across the three video lengths. In both panels, fragment length is depicted along the x-axis and the y-axis shows mean performance (in panel a, mean accuracy; in panel b, mean RT in seconds), while blue (left) plots depict the less-communicative context and green (right) plots the more-communicative context (color figure online)

Feature level regression analysis: exploratory test of kinematic modulation values

To test which specific kinematic features, if any, affected accuracy, we used mixed models to assess whether accuracy on each video could be explained by the kinematic features of that video. We found kinematic modulation of punctuality (hold time and submovements) to explain performance accuracy better than the null model, χ²(5) = 16.064, p < 0.001. Specifically, increased hold time was associated with higher accuracy (b = 0.377, z = 3.962, p < 0.001), although submovements were not (z = −0.085, p = 0.932). We found no correlation between duration and accuracy (z = −1.151, p = 0.249) in our kinematic model. Response time was not significantly explained by any of the kinematic feature sets. Duration, as assessed in the null model, was also not related to response time (t = −1.768, p = 0.077). In sum, kinematic modulation of hold time was specifically related to higher performance accuracy.
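For illustration, the comparison against the null model reported above corresponds to a log-likelihood ratio test between nested mixed models. A minimal sketch, with hypothetical data and column names, might be:

```r
## Minimal sketch: test the punctuality terms against a duration-only
## null model via a log-likelihood ratio (chi-square) test.
library(lme4)

m_null  <- glmer(correct ~ duration + (1 | participant),
                 data = medium, family = binomial)
m_punct <- update(m_null, . ~ . + hold_time + submovements)
anova(m_null, m_punct)  # reports the chi-square statistic and p value
```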
Discussion: Experiment II

Experiment II was designed to test the isolated contribution of kinematics to semantic comprehension and to further differentiate between early identification and late identification. We found that more-communicative videos were still recognized with overall higher accuracy than less-communicative videos, even in the absence of contextual cues such as handshape, finger kinematics, or the actor's face.

Higher accuracy in recognizing more-communicative compared to less-communicative medium fragments suggests that the advantage given by kinematic modulation predominantly affects identification of the pantomime after it has unfolded. The unfolding of the final phase of the pantomime may provide enough extra time for the overall act to be processed completely and the pantomime to be recognized accurately regardless of modulation. This finding is therefore in line with the hypothesis that kinematic modulation mainly contributes to ongoing semantic disambiguation.

We further explored the contribution of specific kinematic features to semantic comprehension in the absence of further visual context such as hand shape or facial cues. We found that temporal kinematic modulation (i.e., increasing segmentation of the act) was an important factor influencing semantic comprehension. Specifically, increasing hold time positively impacted accuracy. Our results suggest that although the effect may be subtle in production, this feature plays an important role in clarifying semantic content through the temporal unfolding of the gesture.

Our conclusion regarding the role of temporal modulation, and more specifically the increased hold time, as supporting semantic comprehension matches well with the factor 'punctuality', as defined by Brand et al. (2002) in their study of child-directed action. Punctuality of actions refers to movement segments with clear beginning and end points, allowing the individual movements to be clear to an observer (Blokpoel et al., 2012). Exaggerating the velocity changes between movements and increasing hold time (Vesper et al., 2017) can make the final body configuration more salient by allowing the addressee a longer viewing time of this configuration.

General discussion

This study aimed to determine the role of kinematic modulation in the semantic comprehension of (pantomime) gestures. First, we asked whether kinematic modulation influences semantic comprehension of gestures and found that more-communicatively produced gestures are recognized better than less-communicatively produced gestures (Experiments I and II). Second, by utilizing different video fragment lengths, we tested the underlying mechanism of this communicative advantage. We found evidence for enhanced early identification when participants were provided with a more complete visual scene, including the hand shape (Experiment I), but enhanced late identification when they were provided with only gross kinematics (Experiment II). Finally, we show in Experiment II that increased post-stroke hold time has the strongest effect on the communicative gesture comprehension advantage.

When provided with a wealth of visual cues, as in Experiment I, participants gained a communicative advantage even in the early stages of movement. This finding fits nicely with the idea that the end goal of an action, or perhaps the upcoming movements themselves, can be predicted by utilizing early kinematics together with visual contextual information (Cavallo et al., 2016; Iacoboni et al., 2005; Stapel et al., 2012). Our results from Experiment II suggest that kinematic modulation of gross hand movements alone is not sufficient for this effect, as the advantage was removed when the visual stimulus was degraded. It should be noted that we cannot conclude that kinematic information is insufficient, but rather that the gross hand kinematics that are typically used to assess gestures are insufficient. This is particularly relevant given the evidence that hand and finger kinematics inform early manual action identification (Becchio et al., 2018; Cavallo et al., 2016; Manera et al., 2011). We, therefore, conclude that both kinematic and non-kinematic cues play a role in early gesture recognition, while modulated arm and hand kinematics provide cues to identify the act as it unfolds, even in the absence of other visual cues.

Our findings have several important implications. By combining naturalistic motion-tracking production data with a semantic judgment task in naïve observers, our study provides new insights and support for models of effective human–machine interaction. Specifically, our results expand and contrast with the robotics literature that demonstrates spatial modulation as a method of defining more legible acts (Dragan, Lee, & Srinivasa, 2013; Dragan & Srinivasa, 2014; Holladay et al., 2014). Our findings suggest that while spatial modulation may be effective for single-movement gestures such as pointing, temporal modulation plays a larger role in this clarification effect for more complex acts.

We additionally build on studies of gesture comprehension, showing the importance of kinematic cues in successful semantic uptake and bringing new insights into previous findings. For instance, our findings provide a mechanistic understanding of larger scale, qualitative features, such as informativeness (Campisi & Özyürek, 2013). Differences in the informativeness of complex gestures may be understood by looking at the underlying kinematic differences and how these relate to the comprehension of such gestures. As an example, gestures are understood through the individual movements that comprise them, rather than static hand configurations (Kendon, 2004; McNeill, 1994).
Increasing ing early kinematics together with visual contextual infor- the number of clearly defined movements consequently mation (Cavallo et al., 2016; Iacoboni et al., 2005; Stapel increases the amount of visual information available to an et  al., 2012). Our results from the Experiment II suggest observer, which could lead to the perception of increased that kinematic modulation of gross hand movements alone informativeness. is not sufficient for this effect as when the visual stimulus Our work has further implications for clinical practice, was degraded this advantage was removed. It should be where it can be applied to areas such as communication dis- noted that we cannot conclude that kinematic information orders. Research has shown that people with aphasia use is insufficient, but rather that the gross hand kinematics that gestures, including pantomimes, to supplement the semantic are typically used to assess gestures are insufficient. This content of their speech (DeBeer et al., 2015; Rose, Mok, is particularly relevant given the evidence that hand and & Sekine, 2017). Knowledge of which features contribute finger kinematics inform early manual action identification to semantically recognizable gestures could, therefore, be (Becchio et al., 2018; Cavallo et al., 2016; Manera et al., applied to developing therapies for more effective panto- 2011). We, therefore, conclude that both kinematic and mime use and understanding. 1 3 Psychological Research for addressee-directed eye-gaze and kinematic modula- Summary tion were ranked higher than those with low values. This placed all items on a continuum that ranked their commu- Our study is the first to systematically test and provide a nicativeness. This was done due to the observation that, partial account of how the kinematic modulation that arises due to the subtle manipulation of context in Experiment from a more-communicative context can support efficient I of Trujillo, Simanova et al. (2018), there was consider- identification of a manual act. We found that communica- able overlap of kinematic modulation in the middle of the tively produced acts are more easily understood early on spectrum (i.e., some actors in the more-communicative due to kinematic and non-kinematic cues. While compre- context showed modulation more similar to those of the hension is dependent on how much of the visual scene is less-communicative context, and vice versa). We chose available, communicative kinematic modulation alone leads to include items which represented a range of eye-gaze to improved recognition of pantomime gestures even in a and kinematic features representative of their respective highly reduced visual scene. Particularly, temporal kine- communicative context. This method allowed a more clear matic modulation leads to improved late identification of separation of the contexts, while our further selection pro- the act in the absence of other cues. cedure (described below) ensured that items were included Acknowledgements The authors are grateful to Ksenija Slivac for her across a wide range of this ranked continuum. contribution to stimulus preparation and data collection in Experiment After creating the ranked continuum of items, inclu- I, as well as Muqing Li for her contribution to data collection and sion moved from highest to lowest ranked items. Each of analyses in Experiment II. We additionally thank Louis ten Bosch for the 31 items, as described in Appendix  2, was included his insights and discussions regarding methodology. 
This research was supported by the NWO Language in Interaction Gravitation Grant.

Funding Funding was provided by Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Grant no. 2014.WP4.PhD.RUN.014).

Compliance with ethical standards

Conflict of interest The authors declare no conflict of interest in this study.

Ethics statement All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent Informed consent was obtained from all individual participants included in the study.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 1: Item selection procedure

To provide a representative sampling of each of the two groups, all individual items from all subjects included in the previous study were ranked according to eye-gaze and overall kinematic modulation (i.e., z scores derived from the kinematic features described above). The two groups were ordered such that items with high values for addressee-directed eye-gaze and kinematic modulation were ranked higher than those with low values. This placed all items on a continuum that ranked their communicativeness. This was done because, given the subtle manipulation of context in Experiment I of Trujillo, Simanova et al. (2018), there was considerable overlap of kinematic modulation in the middle of the spectrum (i.e., some actors in the more-communicative context showed modulation more similar to that of the less-communicative context, and vice versa). We chose to include items which represented a range of eye-gaze and kinematic features representative of their respective communicative context. This method allowed a clearer separation of the contexts, while our further selection procedure (described below) ensured that items were included across a wide range of this ranked continuum.

After creating the ranked continuum of items, inclusion moved from highest to lowest ranked items. Each of the 31 items, as described in Appendix 2, was included a minimum of three times and a maximum of four times across the entire selection, performed by different actors, while ensuring that each item also appeared at least once in the more-communicative context and once in the less-communicative context. Three videos from each actor in the previous study were included. This ensured an even representation of the data on which we previously reported. Supplementary Figure 1 illustrates the range of kinematics, gaze, and video durations included across the two groups in the current study with respect to the original dataset.

We ensured that the current stimulus set was representative of the original data by repeating the same mixed model analyses described in Trujillo, Simanova et al. (2018). In line with the original dataset, we found significantly higher values in communicative compared to non-communicative videos for vertical amplitude (communicative = 0.160 ± 0.99; non-communicative = −0.449 ± 0.809; χ²(4) = 12.263, p < 0.001), submovements (communicative = 0.161 ± 0.789; non-communicative = −0.661 ± 0.585; χ²(4) = 32.821, p < 0.001), peak velocity (communicative = 0.181 ± 1.08; non-communicative = −0.683 ± 0.649; χ²(4) = 23.965, p = 0.001), and direct eye-gaze (communicative = 0.235 ± 0.220; non-communicative = 0.013 ± 0.041; χ²(4) = 44.703, p < 0.001). Also in line with the original data, we found a less robust, but still significant, difference in hold time (communicative = 0.107 ± 1.159; non-communicative = −0.448 ± 0.892; χ²(4) = 7.917, p = 0.005). Finally, duration was also longer in communicative (M = 7.237 ± 1.754) compared to non-communicative (M = 6.132 ± 1.235) videos.
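To make the ranking step above concrete, a minimal sketch follows. The data frame `items` and its columns are hypothetical, and the equal-weight combination of the gaze and kinematic z scores is our assumption for illustration only; it is not necessarily the weighting the authors used.

```r
## Minimal sketch: rank stimulus videos on a communicativeness continuum
## built from addressee-directed eye-gaze and overall kinematic modulation.
kin_cols <- c("vertical_amplitude", "peak_velocity",
              "submovements", "hold_time")

items$kin_z  <- rowMeans(scale(items[, kin_cols]))   # overall kinematic modulation
items$gaze_z <- as.numeric(scale(items$direct_gaze)) # addressee-directed gaze

## Assumed equal-weight composite; higher = more communicative
items$communicativeness <- items$gaze_z + items$kin_z
ranked <- items[order(-items$communicativeness), ]   # highest to lowest
head(ranked)
```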
Appendix 2: List of items from Trujillo, Simanova et al. (2018)

The table provides the original Dutch response options that participants saw, alongside the English translation.

Original (Dutch)        English
appel verplaatsen       Move apple
banaan pellen           Peel banana
blokken stapelen        Stack blocks
brood snijden           Cut bread
citroen uitpersen       Squeeze lemon
dobbelstenen gooien     Roll dice
haar borstelen          Brush hair
hoed opdoen             Put on hat
kaarten schudden        Shuffle cards
kurk verwijderen        Remove cork
naam schrijven          Write name
papier afvegen          Brush off paper
papier knippen          Cut paper
papier kreukelen        Crumple paper
papier meten            Measure paper
papieren nieten         Staple papers
papier scheuren         Tear paper
papier stempelen        Stamp paper
papier vouwen           Fold paper
pendop opdoen           Put on pen cap
pendop verwijderen      Remove pen cap
potje openmaken         Open jar
ring aandoen            Put on ring
slot openmaken          Open lock
spijkers slaan          Hammer nails
tafel schrobben         Scrub desk
tekening wissen         Erase drawing
thee roeren             Stir tea
theezakje dompelen      Steep tea
water gieten            Pour water
zonnebril opdoen        Put on sunglasses

Appendix 3: Mixed effects modeling procedure

The order in which the predictor variables were entered into the mixed effects model was determined based on the a priori hypothesized contribution of the three components: range of motion has been found to be increased in adult–child interactions (Brand et al., 2002; Fukuyama et al., 2015); peak velocity was found to be increased in a communicative context in at least one study (Trujillo, Simanova et al., 2018); punctuality was previously not found to be changed in child–adult interactions by Brand et al. (2002), but was found to be increased in a communicative context by Trujillo, Simanova et al. (2018).

As more-communicative videos were, on average, longer than less-communicative videos, we included video duration (ms) in our regression models. This allowed us to test the contribution of kinematic features after taking into account total duration, ensuring that any effect of kinematics is not explained by duration alone. We report the video duration correlation from the best-fit model if this model is a better fit to the data than the null model. If the null model is a better fit, then we report the video duration correlation from the null model. Duration was fitted before the kinematic variables in order to ensure that any significant contribution of kinematic modulation to the model fit was over and above that of duration. In other words, our models were set up to specifically test the contribution of kinematic modulation after taking into account video duration and inter-individual differences.

Typically, when utilizing mixed effects models, the researcher must first find the model that best fits the data before making inferences on the model parameters. The best-fit model was determined by first defining a 'null' model that included only duration as a fixed effect and participant as a random intercept. We used a series of log-likelihood ratio tests to determine if each kinematic feature term (described above: range of motion, velocity, punctuality) contributed significantly to the model fit. For example, if a comparison between a model that includes peak velocity and a model that does not include this effect term yields a non-significant result, then we do not include this kinematic feature in the model. If the comparison yields a significant result, we keep this kinematic feature and compare this model with a new model that contains the next untested kinematic feature. In a step-wise fashion we thus test the contribution of each of the kinematic features. We report effects from the final, best-fit model, if it is still a better fit than the null model.
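The step-wise procedure just described can be sketched as follows. This is an illustrative reconstruction under stated assumptions (hypothetical data frame and column names; punctuality represented by its two quantified terms), not the authors' code.

```r
## Minimal sketch: start from the duration-only 'null' model and keep each
## kinematic component only if a log-likelihood ratio test shows that it
## significantly improves the fit.
library(lme4)

best <- glmer(correct ~ duration + (1 | participant),
              data = medium, family = binomial)  # 'null' model

## Candidate terms in the a priori order: range of motion, velocity,
## punctuality (submovements + hold time)
for (term in c("vertical_amplitude", "peak_velocity",
               "submovements + hold_time")) {
  candidate <- update(best, as.formula(paste(". ~ . +", term)))
  if (anova(best, candidate)[2, "Pr(>Chisq)"] < 0.05) {
    best <- candidate  # keep the term and continue from the richer model
  }
}
summary(best)  # final, best-fit model
```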
References

Ansuini, C., Cavallo, A., Bertone, C., & Becchio, C. (2014). The visible face of intention: Why kinematics matters. Frontiers in Psychology, 5, 815. https://doi.org/10.3389/fpsyg.2014.00815.

Ansuini, C., Cavallo, A., Koul, A., D'Ausilio, A., Taverna, L., & Becchio, C. (2016). Grasping others' movements: Rapid discrimination of object size from observed hand movements. Journal of Experimental Psychology: Human Perception and Performance, 42(7), 918–929. https://doi.org/10.1037/xhp0000169.

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01.

Bavelas, J., Gerwing, J., Sutton, C., & Prevost, D. (2008). Gesturing on the telephone: Independent effects of dialogue and visibility. Journal of Memory and Language, 58(2), 495–520. https://doi.org/10.1016/j.jml.2007.02.004.

Becchio, C., Koul, A., Ansuini, C., Bertone, C., & Cavallo, A. (2018). Seeing mental states: An experimental strategy for measuring the observability of other minds. Physics of Life Reviews. https://doi.org/10.1016/j.plrev.2017.10.002.

Becchio, C., Manera, V., Sartori, L., Cavallo, A., & Castiello, U. (2012). Grasping intentions: From thought experiments to empirical evidence. Frontiers in Human Neuroscience, 6(May), 1–6. https://doi.org/10.3389/fnhum.2012.00117.

Blokpoel, M., van Kesteren, M., Stolk, A., Haselager, P., Toni, I., & van Rooij, I. (2012). Recipient design in human communication: Simple heuristics or perspective taking? Frontiers in Human Neuroscience, 6, 253. https://doi.org/10.3389/fnhum.2012.00253.

Brand, R. J., Baldwin, D. A., & Ashburn, L. A. (2002). Evidence for 'motionese': Modifications in mothers' infant-directed action. Developmental Science, 5(1), 72–83. https://doi.org/10.1111/1467-7687.00211.

Campisi, E., & Özyürek, A. (2013). Iconicity as a communicative strategy: Recipient design in multimodal demonstrations for adults and children. Journal of Pragmatics, 47(1), 14–27. https://doi.org/10.1016/j.pragma.2012.12.007.

Cavallo, A., Koul, A., Ansuini, C., Capozzi, F., & Becchio, C. (2016). Decoding intentions from movement kinematics. Scientific Reports, 6(November), 37036. https://doi.org/10.1038/srep37036.

Cerf, M., Harel, J., Einhäuser, W., & Koch, C. (2007). Predicting human gaze using low-level saliency combined with face detection. NIPS 2007. https://doi.org/10.1016/j.visres.2015.04.007.

Csibra, G., & Gergely, G. (2006). Social learning and social cognition: The case for pedagogy. Processes of Change in Brain and Cognitive Development, 21, 249–274.

DeBeer, C., Carragher, M., van Nispen, K., de Ruiter, J., Hogrefe, K., & Rose, M. (2015). Which gesture types make a difference? Interpretation of semantic content communicated by PWA via different gesture types. GESPIN, 4, 89–93.

Dragan, A. D., Lee, K. C. T., & Srinivasa, S. S. (2013). Legibility and predictability of robot motion. In 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI) (pp. 301–308). Tokyo, Japan: IEEE. https://doi.org/10.1109/HRI.2013.6483603.

Dragan, A., & Srinivasa, S. (2014). Integrating human observer inferences into robot motion planning. Autonomous Robots, 37(4), 351–368. https://doi.org/10.1007/s10514-014-9408-x.

Farroni, T., Csibra, G., Simion, F., & Johnson, M. H. (2002). Eye contact detection in humans from birth. Proceedings of the National Academy of Sciences of the United States of America, 99(14), 9602–9605. https://doi.org/10.1073/pnas.152159999.

Fukuyama, H., Qin, S., Kanakogi, Y., Nagai, Y., Asada, M., & Myowa-Yamakoshi, M. (2015). Infant's action skill dynamically modulates parental action demonstration in the dyadic interaction. Developmental Science, 18(6), 1006–1013. https://doi.org/10.1111/desc.12270.

Galati, A., & Galati, A. (2015). Speakers adapt gestures to addressees' knowledge: Implications for models of co-speech gesture. Language, Cognition and Neuroscience, 29(4), 435–451. https://doi.org/10.1080/01690965.2013.796397.

Gerwing, J., & Bavelas, J. (2004). Linguistic influences on gesture's form. Gesture, 4(2), 157–195. https://doi.org/10.1075/gest.4.2.04ger.

Gielniak, M. J., & Thomaz, A. L. (2012). Enhancing interaction through exaggerated motion synthesis. In Proceedings of the Seventh Annual ACM/IEEE International Conference on Human–Robot Interaction (HRI '12) (p. 375). New York: ACM Press. https://doi.org/10.1145/2157689.2157813.

Grèzes, J., & Decety, J. (2002). Does visual perception of object afford action? Evidence from a neuroimaging study. Neuropsychologia, 40(2), 212–222. https://doi.org/10.1016/S0028-3932(01)00089-6.

Hershler, O., & Hochstein, S. (2005). At first sight: A high-level pop out effect for faces. Vision Research, 45(13), 1707–1724. https://doi.org/10.1016/j.visres.2004.12.021.

Hilliard, C., & Cook, S. W. (2016). Bridging gaps in common ground: Speakers design their gestures for their listeners. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(1), 91–103. https://doi.org/10.1037/xlm0000154.

Holladay, R. M., Dragan, A. D., & Srinivasa, S. S. (2014). Legible robot pointing. In The 23rd IEEE International Symposium on Robot and Human Interactive Communication, 2014 RO-MAN (pp. 217–223).

Holler, J., & Beattie, G. (2005). Gesture use in social interaction: How speakers' gestures can reflect listeners' thinking. In 2nd Conference of the International Society for Gesture Studies (ISGS): Interacting Bodies (pp. 1–12).

Holler, J., Kelly, S., Hagoort, P., & Özyürek, A. (2012). When gestures catch the eye: The influence of gaze direction on co-speech gesture comprehension in triadic communication. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (pp. 467–472). Austin, TX: Cognitive Science Society.

Holler, J., Kokal, I., Toni, I., Hagoort, P., Kelly, S. D., & Ozyurek, A. (2015). Eye'm talking to you: Speakers' gaze direction modulates co-speech gesture processing in the right MTG. Social Cognitive and Affective Neuroscience, 10(2), 255–261. https://doi.org/10.1093/scan/nsu047.

Iacoboni, M., Molnar-Szakacs, I., Gallese, V., Buccino, G., Mazziotta, J. C., & Rizzolatti, G. (2005). Grasping the intentions of others with one's own mirror neuron system. PLoS Biology, 3(3), e79. https://doi.org/10.1371/journal.pbio.0030079.

Kelly, S. D., Ozyurek, A., & Maris, E. (2010). Two sides of the same coin: Speech and gesture mutually interact to enhance comprehension. Psychological Science, 21(2), 260–267. https://doi.org/10.1177/0956797609357327.

Kendon, A. (1986). Current issues in the study of gesture. In J.-L. Nespoulous, P. Perron, A. R. Lecours, & T. S. Circle (Eds.), The biological foundations of gestures: Motor and semiotic aspects (1st ed., pp. 23–47). London: Psychology Press.

Kendon, A. (2004). Gesture: Visible actions as utterance. Cambridge: Cambridge University Press.

Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In Lecture Notes in Computer Science (Vol. 1371, pp. 23–35). Berlin: Springer. https://doi.org/10.1007/BFb0052986.

Kuznetsova, A. (2016). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1. https://doi.org/10.18637/jss.v082.i13.

Manera, V., Becchio, C., Cavallo, A., Sartori, L., & Castiello, U. (2011). Cooperation or competition? Discriminating between social intentions by observing prehensile movements. Experimental Brain Research, 211(3–4), 547–556. https://doi.org/10.1007/s00221-011-2649-4.

McEllin, L., Knoblich, G., & Sebanz, N. (2018). Distinct kinematic markers of demonstration and joint action coordination? Evidence from virtual xylophone playing. Journal of Experimental Psychology: Human Perception and Performance. https://doi.org/10.1037/xhp0000505.

McNeill, D. (1994). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press. https://doi.org/10.2307/1576015.

Naish, K. R., Reader, A. T., Houston-Price, C., Bremner, A. J., & Holmes, N. P. (2013). To eat or not to eat? Kinematics and muscle activity of reach-to-grasp movements are influenced by the action goal, but observers do not detect these differences. Experimental Brain Research, 225(2), 261–275. https://doi.org/10.1007/s00221-012-3367-2.

Osiurak, F., Jarry, C., Baltenneck, N., Boudin, B., & Le Gall, D. (2012). Make a gesture and I will tell you what you are miming. Pantomime recognition in healthy subjects. Cortex, 48(5), 584–592. https://doi.org/10.1016/j.cortex.2011.01.007.

Özyürek, A. (2014). Hearing and seeing meaning in speech and gesture: Insights from brain and behaviour. Philosophical Transactions of the Royal Society B, 369, 20130296. https://doi.org/10.1098/rstb.2013.0296.

Pezzulo, G., Donnarumma, F., & Dindo, H. (2013). Human sensorimotor communication: A theory of signaling in online social interactions. PLoS ONE, 8(11), e79876. https://doi.org/10.1371/journal.pone.0079876.

R Core Team (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.

Rose, M. L., Mok, Z., & Sekine, K. (2017). Communicative effectiveness of pantomime gesture in people with aphasia. International Journal of Language and Communication Disorders, 52(2), 227–237. https://doi.org/10.1111/1460-6984.12268.

Stapel, J. C., Hunnius, S., & Bekkering, H. (2012). Online prediction of others' actions: The contribution of the target object, action context and movement kinematics. Psychological Research, 76(4), 434–445. https://doi.org/10.1007/s00426-012-0423-2.

Theeuwes, J., & Van der Stigchel, S. (2006). Faces capture attention: Evidence from inhibition of return. Visual Cognition, 13(6), 657–665. https://doi.org/10.1080/13506280500410949.

Trujillo, J. P., Simanova, I., Bekkering, H., & Özyürek, A. (2018a). Communicative intent modulates production and comprehension of actions and gestures: A Kinect study. Cognition, 180, 38–51. https://doi.org/10.1016/j.cognition.2018.04.003.

Trujillo, J. P., Vaitonyte, J., Simanova, I., & Özyürek, A. (2018b). Toward the markerless and automatic analysis of kinematic features: A toolkit for gesture and movement research. Behavior Research Methods. https://doi.org/10.3758/s13428-018-1086-8.

Tucker, M., & Ellis, R. (2001). The potentiation of grasp types during visual object categorization. Visual Cognition, 8(6), 769–800. https://doi.org/10.1080/13506280042000144.

van Elk, M., van Schie, H., & Bekkering, H. (2014). Action semantics: A unifying conceptual framework for the selective use of multimodal and modality-specific object knowledge. Physics of Life Reviews, 11(2), 220–250. https://doi.org/10.1016/j.plrev.2013.11.005.

Vesper, C., & Richardson, M. J. (2014). Strategic communication and behavioral coupling in asymmetric joint action. Experimental Brain Research, 232(9), 2945–2956. https://doi.org/10.1007/s00221-014-3982-1.

Vesper, C., Schmitz, L., & Knoblich, G. (2017). Modulating action duration to establish nonconventional communication. Journal of Experimental Psychology: General, 146(12), 1722–1737. https://doi.org/10.1037/xge0000379.supp.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
