

GestuRe and ACtion Exemplar (GRACE) video database: stimuli for research on manners of human locomotion and iconic gestures

Behav Res (2018) 50:1270–1284. DOI 10.3758/s13428-017-0942-2

Suzanne Aussems (s.aussems@warwick.ac.uk), Department of Psychology, University of Warwick, CV4 7AL Coventry, UK
Natasha Kwok (natasha@plaiconsulting.com), P.L.A.I. Behaviour Consulting, Hong Kong, P.O. Box 11010, General Post Office, Hong Kong
Sotaro Kita (s.kita@warwick.ac.uk), Department of Psychology, University of Warwick, CV4 7AL Coventry, UK

Published online: 15 September 2017. © The Author(s) 2017. This article is an open access publication.

Electronic supplementary material: The online version of this article (https://doi.org/10.3758/s13428-017-0942-2) contains supplementary material, which is available to authorized users.

Abstract

Human locomotion is a fundamental class of events, and manners of locomotion (e.g., how the limbs are used to achieve a change of location) are commonly encoded in language and gesture. To our knowledge, there is no openly accessible database containing normed human locomotion stimuli. Therefore, we introduce the GestuRe and ACtion Exemplar (GRACE) video database, which contains 676 videos of actors performing novel manners of human locomotion (i.e., moving from one location to another in an unusual manner) and videos of a female actor producing iconic gestures that represent these actions. The usefulness of the database was demonstrated across four norming experiments. First, our database contains clear matches and mismatches between iconic gesture videos and action videos. Second, the male actors and female actors whose action videos matched the gestures in the best possible way perform the same actions in very similar manners and different actions in highly distinct manners. Third, all the actions in the database are distinct from each other. Fourth, adult native English speakers were unable to describe the 26 different actions concisely, indicating that the actions are unusual. This normed stimuli set is useful for experimental psychologists working in the language, gesture, visual perception, categorization, memory, and other related domains.

Keywords: Action exemplars · Iconic gestures · Human locomotion manners · Video database · Stimuli set

Introduction

Human locomotion (e.g., movement of the human limbs to change location) is a topic widely studied in the field of experimental psychology. For instance, expressions of human locomotion have been studied in spoken language (e.g., Malt et al. 2008; Slobin et al. 2014; Malt et al. 2014), written language (e.g., Slobin 2004, 2006), sign language (e.g., Supalla 2009; Slobin & Hoiting 1994), and gesture (e.g., Özyürek & Kita 1999; Kita & Özyürek 2003; Özçalışkan et al. 2016). Also, in many word learning experiments, researchers teach children verbs for novel manners of human locomotion (e.g., Mumford 2014; Mumford & Kita 2014; Imai et al. 2008; Scott & Fisher 2012). In memory experiments, locomotion stimuli are often used to study visual memory of agents and their actions (e.g., Wood 2008). In categorization experiments, human locomotion is used to study, inter alia, how children perceptually categorize manners of locomotion (e.g., Salkind 2003; Salkind et al. 2005; Pulverman et al. 2006).

Particularly in studies on verb learning, human locomotion stimuli are often used along with iconic gestures.
Iconic gestures (McNeill, 1992) represent actions, motions, or attributes associated with people, animals, or objects (e.g., wiggling the index and middle fingers to represent a person walking; tracing a shape). Researchers have investigated whether novel verb meanings are shaped by iconic gestures that are shown when the verb is taught (e.g., Spencer et al. 2009; Goodrich & Hudson Kam 2009; Mumford 2014; Mumford & Kita 2014).

Developing human locomotion stimuli can be very laborious. Nevertheless, most researchers develop such stimuli solely for the purpose of their own research. As a consequence, there is no openly accessible video database containing manners of human locomotion and iconic gestures that represent these manners.

Current Research

Contents of the GRACE Video Database

We developed and normed the GestuRe and ACtion Exemplar (GRACE) video database, which includes 676 videos of 26 actors (13 males, 13 females) performing 26 novel manners of human locomotion (i.e., moving from one location to another in an unusual manner), and 26 videos of a female actor who produces iconic gestures that represent these manners. Figure 1 presents three examples of the gestures and the corresponding manners of locomotion (in the upper right corner of each panel). The gesturing hands represent the actor's feet (panel A), the actor's legs (panel B), and the actor's whole body (panel C).

The GRACE video database is openly available from the Warwick Research Archive Portal at http://wrap.warwick.ac.uk/78493. Along with the 702 video files, we have made available the raw data from our norming studies and the Python scripts that we used to process the data. We also included a manual that contains guidelines on how to use the GRACE video database.

Fig. 1 Three panels (A, B, and C) with cropped stills of videos in which a female actor gestures iconically to represent the manners of human locomotion performed by actors in the upper right corners of the panels. In the actual norming study, the action video and the gesture video had the same size and were presented side-by-side. Gestures and actions are included in separate video files in the database. From left to right the panels show the following gesture videos: "00F scurrying.mp4", "00F mermaiding.mp4", and "00F twisting.mp4", and action videos: "01F scurrying.mp4", "09F mermaiding.mp4", "01M twisting.mp4".

Norming the GRACE Video Database

In this section, we identify and motivate four essential requirements for the type of stimuli in the GRACE video database. These requirements guided the design of our norming studies to assure its usefulness for experimental psychologists. The GRACE video database is particularly useful for researchers who need unusual human locomotion stimuli to study language and gesture, memory, and categorization. Below, we discuss the implications of each norming study in the context of these research areas.

First, the GRACE video database includes videos that were normed for the degree of match between action pairs and matching and mismatching iconic gestures. Many experiments in developmental psychology use two-way forced-choice tasks. In such tasks, pairing actions that would appear as two choices is important. The design of our first norming experiment is motivated by this future use. Also, pairing actions made data collection for this study more manageable; if we had not paired the actions, participants would have had to rate a very large number of action-gesture combinations that make "mismatches". Action pairs with matching and mismatching gestures could be used in experiments with a two-way forced-choice task in which one of the actions is congruent with the gesture, but the other is incongruent. This is useful for research on word learning with the help of iconic gestures (e.g., Mumford & Kita 2014; Mumford 2014; Özçalışkan et al. 2014; Goodrich & Hudson Kam 2009), the intake of information conveyed by gesture and speech (e.g., McNeill et al. 1994; Cassell et al. 1999; Özyürek et al. 2007), and memory recall for sentences with the help of gesture (e.g., Feyereisen 2006; Madan & Singhal 2012).
Furthermore, these stimuli are useful for studies on processing gesture-speech combinations, in which researchers often manipulate the semantic relations between the two channels (i.e., gesture and speech match, mismatch, or complement each other) (e.g., McNeill et al. 1994; Cassell et al. 1999; Özyürek et al. 2007; Spencer et al. 2009). Thus, the first norming study tested matches and mismatches between iconic gestures and manners of human locomotion in all 676 action videos. We then ran an algorithm over the norming scores to identify the best possible matches between iconic gestures and actions performed by male actors and female actors, separately. This led to a one-to-one assignment of male actors and female actors to action pairs. Action videos of the selected actors were used in the next norming study.

Second, GRACE contains videos that were normed for the similarity of the same actions within action pairs performed by male actors and female actors, and the (dis)similarity of the different actions within action pairs performed by male actors and female actors. Researchers who introduce an actor change in their experimental task (e.g., to test actor memory or verb generalization) often do this by changing between male actors and female actors, as they have naturally distinct appearances (e.g., Mumford 2014). For instance, word learning studies that take an exemplar-based approach could use videos that show different actors performing the same actions and the same actors performing different actions (e.g., Maguire et al. 2002; Maguire et al. 2008; Scott & Fisher 2012). Videos that show different actors moving in the same manner could also be useful for creating generalization tasks to test people's understanding of locomotion verbs (e.g., Imai et al. 2008), and recognition tasks and change-detection tasks to test their memory of actors (e.g., Imai et al. 2005; Wood 2008). In all these tasks it is important that the manner of human locomotion is similar across the actor change. Thus, the second norming study tested how similarly male actors and female actors perform the same actions within action pairs, and how distinctly each male actor and female actor performs the two different actions within action pairs. All actions that are included in the database were normed in this study, but participants rated only the videos of the male actors and female actors who were assigned to an action pair because their performance matched the corresponding gestures very well in the first norming study.

Third, GRACE includes 26 actions which were normed for how distinct they are compared to every other action in the database. In this norming study, we let go of the notion of action pairs to obtain a measure of distinctiveness for all the actions in the database. There are three advantages to this approach. First, norming the distinctiveness between all 26 actions is useful for studies on the ways in which people can categorize various semantic components of motion verbs, such as figure (e.g., the man, the woman; Pulverman et al. 2006) and manner (e.g., Salkind 2003; Salkind et al. 2005). Second, such norms are useful for studies on infants' ability to discriminate manners of motion (e.g., Pulverman et al. 2008; Pulverman et al. 2013), which use change-detection tasks with more than two options (e.g., four actions presented to participants, one in each quadrant). Third, the manners of locomotion that are shown to one participant need to be highly distinctive from each other to avoid confusion in any given task. For example, if a participant is taught a novel label for a locomotion manner in a word learning task, then this manner should be distinct from all manners that are subsequently labeled, to avoid a bias in test performance. Therefore, the third norming study tested the similarity between all combinations of actions to obtain a measure of distinctiveness for each action in the database. In this norming study, human raters were presented with a subset of the videos from the database, in which each video showed one of the 26 actions performed by either a male or a female actor.
Finally, the 26 actions in the GRACE video database were normed for how accurately and concisely they can be described by adult native English speakers. We asked whether the English language contains existing single-word or multi-word labels for the actions, which we used as a measure of how unusual the actions are. It is important that the stimuli are unusual, to ensure that a given task performance occurs as a function of an experimental manipulation and not as a consequence of participants being familiar with the stimuli prior to the task. This is important for language research: if a participant already knows a label for an action that is labeled in a word learning task, then this may cause a bias in test performance. It is also important for memory research: if people commonly perform these actions in real life, then this may cause a bias in test performance. Therefore, the fourth experiment assessed how accurately and concisely each action can be described by adult native speakers of English. Participants described the 26 actions in the database based on the same set of videos as in the third norming study.

General Methods for Developing the GRACE Video Database

The GRACE video database originated in work by Mumford and Kita (2014) and Mumford (2014), who developed 14 unusual manners of human locomotion and iconic gestures representing these manners. GRACE includes these 14 manners and 12 additional manners of human locomotion and corresponding iconic gestures, resulting in a total of 26 manners and gestures.

Action Videos

We recruited 13 male actors between 22 and 40 years old (M = 27.00, SD = 4.98) and 13 female actors between 20 and 42 years old (M = 27.08, SD = 6.36). The national origin of the actors varied from British, Czech, Japanese, Polish, Dutch, Indian, Irish, German, Canadian, Nigerian, Mauritian, Bulgarian, Pakistani, Singaporean, Malaysian to Chinese. All
actors were educated to the university degree level.

Actors participated in individual recording sessions. They were instructed to keep their arms and hands by their side when performing the actions, because we needed the hand gestures of the female actor to unambiguously represent the actors' feet, leg, and body movements. Actors were also required to carry out each action as an ongoing motion without any breaks.

Prior to recording each action, actors watched an example video of a model. The videos of the model were not included in the database, so that all actors shared the same reference point when performing the actions. Subsequently, the actors were required to move across the length of a scene in the same manner as the model. The starting point and the ending point were marked on the floor just outside the camera view. Each action was recorded at least twice from a distance of approximately 4.5 meters. If actors struggled with one of the actions, the researcher showed them their last recorded video and practised the movement with them repeatedly until they were ready to record again. Every recording session lasted approximately 1 hour. Informed written consent was obtained at the end of each recording session.

Gesture Videos

Hand gestures of a female actor were recorded from a distance of approximately 1.5 meters. This actor watched the video recordings of the model performing an action prior to recording the gesture that was designed to match this action. Gestures were designed by the researchers based on the definition of iconic gestures by McNeill (1992), so that the form of each gesture resembled the referent action.

Specifically, all gestures iconically represented the body part that was most prominent for each movement (i.e., feet, legs, or whole body), its dynamic shape, and the rate at which the movement was carried out. Gestures representing the whole body were performed with the right hand.
Gestures representing the legs were performed with both hands, where the right hand represented the right leg and the left hand represented the left leg. Gestures representing the feet were performed with the fingers, where the right-hand fingers matched the right foot and the left-hand fingers matched the left foot.

Apparatus

Videos were recorded using a Canon Legria HFR56 camera with autofocus in a room with controlled light settings. Recordings were muted, cut, optimized for HTML, and converted to MP4 files of 640 × 480 pixels using avconv on Linux. The total size of the GRACE video database is 185 megabytes.

Experiment 1

The first experiment tested the degree of match (and mismatch) between iconic gestures and manners of human locomotion. During the development of the database, 26 iconic gestures were created, one matching each action. A mismatch between iconic gestures and actions was set up in the following way. Every action was paired up with another action from the set to create 13 action pairs (see Table 1). We then showed participants each action with a matching iconic gesture, but also with the iconic gesture that was created for the other action in the action pair as a mismatching iconic gesture. Participants rated these matches and mismatches on a seven-point scale.

We predicted that match ratings for matching iconic gestures and actions would be higher than match ratings for mismatching iconic gestures and actions. Additionally, we predicted that matches would be rated higher than the neutral score on a seven-point scale and that mismatches would be rated lower than the neutral score.

Method

Participants

We recruited 301 individuals (183 males, 117 females) from the university's online participant pool. Eight participants were excluded from further analyses because they indicated that the videos did not display or run smoothly. The final participant sample included 293 individuals (179 females, 113 males) between 18 and 67 years old (M = 22.19, SD = 6.66). The majority of participants reported English as their native language (58.7%), followed by Asian languages (23.2%), and other Indo-European languages (18.1%). Participants automatically entered a lottery for an Amazon voucher upon completing the task.

Materials

We used videos of 26 manners of locomotion carried out by 26 actors (676 videos in total), and 26 videos of a female actor producing iconic gestures. Actions were organized in pairs (see Table 1) so that matches and mismatches between iconic gestures and actions could be created. Figure 2 shows the matches and mismatches between iconic gestures and actions for action pair 1. For instance, participants were shown bowing with a bowing gesture (Panel A), bowing with a skating gesture (Panel B), skating with a skating gesture (Panel C), and skating with a bowing gesture (Panel D).

Fig. 2 Four panels (A, B, C, and D) with cropped stills of videos in which a female actor gestures iconically to represent the actions of pair 1, as performed by a male actor in the upper right corners of the panels. Panel A shows a bowing gesture with a bowing movement (match), Panel B shows a bowing gesture with a skating movement (mismatch), Panel C shows a skating gesture with a skating movement (match), and Panel D shows a skating gesture with a bowing movement (mismatch). Gesture videos are "00F bowing.mp4" (Panels A and B) and "00F skating.mp4" (Panels C and D). Action videos are "06M bowing" (Panels A and D) and "06M skating" (Panels B and C).

Table 1 Twenty-six manners of human locomotion organized in action pairs

Pair | Action a | Action b
1. | bowing | skating
2. | wobbling | marching
3. | mermaiding | overstepping
4. | creeping | crisscrossing
5. | turning | hopscotching
6. | swinging | skipping
7. | jumping | crossing
8. | dropping | folding
9. | twisting | stomping
10. | trotting | hopping
11. | flicking | dragging
12. | grapevining | shuffling
13. | groining | scurrying

Note: In the published table, each action is accompanied by a still frame taken from the videos of the male actor whose video file names start with "08M ". Short-hand action labels are used to refer to the manners of locomotion and follow the underscore in the file names of the database (e.g., "08M bowing.mp4", "08M skating.mp4").

We created 26 batches of videos to keep the length of the experiment reasonable. Each video batch contained videos of the 26 actions, but performed by different actors, to ensure that all 676 action videos appeared in one of the batches. Each action video was combined with a matching and a mismatching gesture video within a batch, which resulted in 52 trials. Each action video–gesture video combination was rated by on average 23 participants (range = 18 to 28).
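The paper does not spell out how actors were rotated over batches, so the following is a minimal sketch under our own assumption: a cyclic rotation in which batch b pairs action j with actor (b + j) mod 26, which gives every batch all 26 actions by 26 different actors and covers each of the 676 action videos exactly once. The actor IDs and action labels below are hypothetical placeholders, not the database file names.

```python
# A sketch of one possible batch rotation scheme (our assumption, not
# the authors' published code).
N = 26
actors = [f"{i:02d}{'M' if i % 2 else 'F'}" for i in range(1, N + 1)]  # hypothetical IDs
actions = [f"action_{j:02d}" for j in range(N)]                        # hypothetical labels

# Batch b shows action j performed by actor (b + j) mod N.
batches = [[(actions[j], actors[(b + j) % N]) for j in range(N)]
           for b in range(N)]

# Sanity check: all 26 x 26 = 676 action-actor combinations are covered
# exactly once across the 26 batches.
covered = {pair for batch in batches for pair in batch}
assert len(covered) == N * N
```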
Procedure

The experiment was set up in a web-based environment. Participants signed a digital consent form and were asked for demographic information. The instruction page showed participants a still frame of a gesture video and a still frame of an action video from the model as an example of a very good match.

Participants were then shown two videos side-by-side, which started playing on loop automatically when a trial started. Participants were instructed to rate the match between the hand gesture of the female actor (left video) and the manner in which an actor moved (right video) on a seven-point scale, where 1 indicated a very bad match, 4 indicated neither a good nor a bad match, and 7 indicated a very good match. Participants were randomly assigned to one of the 26 batches and trials were displayed in a random order for each participant. After they had seen all the trials, they were asked if all the videos ran smoothly, and if not, what type of problems had occurred.

Data Analysis

Using the irr package in the R software for statistical analyses (R Development Core Team, 2011), we computed Kendall's W (also known as Kendall's coefficient of concordance) to assess agreement between participants who rated the same video batch. Kendall's W is a non-parametric test statistic that takes into account the number of raters and the fact that the videos were rated on an ordinal scale. Its coefficient ranges from 0 (no agreement) to 1 (complete agreement). We used non-parametric tests to analyze the ratings for matches and mismatches between iconic gestures and actions, because these ratings were not normally distributed. The R script containing the basic code for all analyses reported in this paper is uploaded as supplementary material.
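The authors computed Kendall's W with R's irr package; for readers working in Python, a minimal sketch of the coefficient might look like the following (it omits the tie correction that irr's kendall() applies, and the example ratings are made up):

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings):
    """Kendall's coefficient of concordance, without tie correction.

    ratings: (m, n) array; m raters each rate the same n videos.
    Returns W in [0, 1]: 0 = no agreement, 1 = complete agreement.
    """
    ratings = np.asarray(ratings, dtype=float)
    m, n = ratings.shape
    # Convert each rater's scores to ranks (ties get average ranks).
    ranks = np.apply_along_axis(rankdata, 1, ratings)
    # Sum of ranks per video, across raters.
    rank_sums = ranks.sum(axis=0)
    # S = sum of squared deviations of rank sums from their mean.
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Example: three raters, four videos, similar orderings -> W near 1.
print(kendalls_w([[1, 2, 3, 4],
                  [1, 3, 2, 4],
                  [1, 2, 3, 4]]))  # ~0.91
```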
The Hungarian Algorithm

We split the data based on the gender of the actors, because our aim was to identify the best possible match between iconic gestures and action pairs carried out by male actors and by female actors. The matrix containing average ratings for female actors was subjected to the Hungarian algorithm (Kuhn & Yaw, 1955; Kuhn, 1956) to find the most profitable assignment (here, the best overall match between gestures and actions) of 13 female actors to 13 action pairs (each actor can be assigned to only one action pair). In order to achieve a one-to-one assignment, the matrix has to have the same number of rows and columns. The same procedure was carried out for the matrix containing average ratings for the 13 male actors.

The Hungarian method (Kuhn & Yaw, 1955; Kuhn, 1956) finds an optimal assignment for a given n × n matrix in the following way. Suppose we have n action pairs to which we want to assign n actors on a one-to-one basis. The average ratings are the profit of assigning each actor to each action pair. We wish to find an optimal assignment which maximizes the total profit. Let P_{i,j} be the profit of assigning the ith actor to the jth action pair. We define the profit matrix to be the n × n matrix

$$
P = \begin{bmatrix}
P_{1,1} & P_{1,2} & \cdots & P_{1,n} \\
P_{2,1} & P_{2,2} & \cdots & P_{2,n} \\
\vdots  & \vdots  & \ddots & \vdots  \\
P_{n,1} & P_{n,2} & \cdots & P_{n,n}
\end{bmatrix}. \tag{1}
$$

An assignment is a set of n entry positions in the matrix, no two of which lie in the same column or row. The sum of the n entries of an assignment is its profit. An assignment with the highest profit is called an optimal assignment. We implemented this algorithm in Python using the Munkres package. Our Python scripts are available from the Warwick Research Archive Portal at http://wrap.warwick.ac.uk/78493.
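The paper's own implementation used the Munkres package; as an illustration, SciPy's linear_sum_assignment solves the same assignment problem. The 3 × 3 profit matrix below is made up for the example (the real matrices are 13 × 13 average match ratings):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical profit matrix: rows = actors, columns = action pairs,
# entries = average match ratings on the 1-7 scale (made-up values).
profit = np.array([[5.9, 4.2, 3.1],
                   [4.8, 6.1, 2.7],
                   [3.3, 5.0, 5.6]])

# linear_sum_assignment solves the same problem as the Hungarian/Munkres
# algorithm; maximize=True turns cost minimization into profit maximization.
rows, cols = linear_sum_assignment(profit, maximize=True)

for actor, pair in zip(rows, cols):
    print(f"actor {actor} -> action pair {pair} (rating {profit[actor, pair]})")
print("total profit:", profit[rows, cols].sum())
```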
Results and Discussion

Inter-Rater Reliability

Kendall's W averaged over all 26 video batches was .72 (SD = 0.07) and ranged between .54 and .81. This coefficient was statistically significant for all batches (p < .001), indicating that participants were applying the same standards when rating the stimuli.

General Findings

Figure 3 displays the average ratings for the degree of match between iconic gestures and actions. Black dots represent average ratings for matches between iconic gestures and actions, and grey dots represent average ratings for mismatches between iconic gestures and actions. The 95% confidence intervals for both match and mismatch ratings are generally very narrow, indicating strong agreement among the participants.

Fig. 3 Average ratings for the degree of match between matching and mismatching iconic gestures and actions, organized by action pair. Error bars represent 95% confidence intervals of the means. Rating scores are averaged across all actors and represent the degree of match between iconic gestures and actions on a scale of 1 ("very bad match") to 7 ("very good match"). The dotted line indicates the neutral score of 4 on the seven-point scale.

We asked whether ratings differed between match and mismatch combinations of iconic gestures and actions. Ratings for matches and mismatches between iconic gestures and actions were averaged across all action pairs for each participant. A Wilcoxon rank sum test demonstrated that the median of average match ratings (Mdn = 5.92) was significantly higher than the median of average mismatch ratings (Mdn = 1.77), W = 316.5, p < .001, 95% CI of the difference [−4.12, −3.88].

Furthermore, we compared the averaged ratings for matches and mismatches across action pairs against the neutral score on our seven-point scale. A Wilcoxon signed rank test indicated that the median of average match ratings was significantly higher than the neutral score of 4, W = 42638, p < .001, 95% CI of the median [5.77, 5.92]. In contrast, the median of average mismatch ratings was significantly lower than the neutral score of 4, W = 137, p < .001, 95% CI of the median [1.75, 1.92]. Thus, matching iconic gestures and actions were rated as good matches and mismatching iconic gestures and actions were rated as bad matches.

The 95% confidence intervals of the means in Fig. 3 demonstrate that there is some variability between action pairs. When we compared the median of the averaged match and mismatch ratings for every action pair against the neutral score of 4, Wilcoxon signed rank tests revealed that matches and mismatches for all action pairs differed significantly from the neutral score (p < .001 for all comparisons).

Assigning Actors to Action Pairs

The Hungarian algorithm optimally assigned 13 female actors to 13 action pairs, and did the same for 13 male actors. The algorithm used "profit" matrices for actors and action pairs, created in the following way (one matrix for female actors, and another one for male actors). For each action performed by each actor, 10–14 participants rated the match between the action and its matching gesture. The ratings were averaged across participants, and then the two average ratings for the actions that comprise an action pair were averaged again to create a "profit" for the action pair and actor.

For females, the algorithm selected the female actor with the highest match rating for an action pair eight times, the female with the second highest match rating for an action pair four times, and the female with the fourth highest match rating for an action pair one time. As the 13 females were assigned to 13 action pairs, the highest possible profit that could have been achieved was 91 (13 × 7). The algorithm assigned female actors to action pairs with a total profit of 80.63 (88.6% of 91), with the lowest average match rating for an assigned actor being 5.56 out of 7 (see Fig. 6 in Appendix).

For males, the algorithm selected the male actor with the highest match rating for an action pair six times, the male with the second highest match rating for an action pair two times, the male with the third highest match rating two times, the male with the fourth highest match rating two times, and the male with the fifth highest match rating one time. The algorithm assigned male actors to action pairs with a total profit of 81.02 (89.0% of 91), with the lowest average match rating for an assigned actor being 5.64 out of 7 (see Fig. 7 in Appendix).

Experiment 1 provided norming scores for all the videos in the GRACE video database. With these ratings we evaluated the match and mismatch between iconic gestures and actions within action pairs. Moreover, the Hungarian algorithm over these ratings optimally assigned male actors and female actors to action pairs, to maximize the overall degree of match between gestures and action pairs. These assignments will be used in the subsequent experiments.
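The reported tests were run in R (wilcox.test); a rough Python equivalent is sketched below on simulated per-participant averages, not the real data. The sample size and distribution parameters are made up for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Made-up per-participant averages on the 1-7 scale for matching and
# mismatching gesture-action combinations.
match = np.clip(rng.normal(5.9, 0.5, 100), 1, 7)
mismatch = np.clip(rng.normal(1.8, 0.5, 100), 1, 7)

# Two-sample rank test (R's wilcox.test with two samples corresponds to
# the Mann-Whitney U test in SciPy): do match and mismatch ratings differ?
print(stats.mannwhitneyu(match, mismatch, alternative="two-sided"))

# One-sample signed rank tests against the neutral scale midpoint of 4.
print(stats.wilcoxon(match - 4, alternative="greater"))
print(stats.wilcoxon(mismatch - 4, alternative="less"))
```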
Experiment 2

The second experiment tested whether the male actors and female actors who were assigned to an action pair based on Experiment 1 perform the same actions in similar manners and the two different actions in distinct manners. Participants rated the similarity between two action videos on a seven-point scale. These videos showed either the same actor performing two different actions, or two different actors (male vs. female) performing the same action.

We predicted that two actors performing the same action would be rated more similar than the same actor performing two different actions. Additionally, we predicted that two actors performing the same action would be rated more similar than the neutral score on a seven-point scale, and that the same actor performing a different action would be rated less similar than the neutral score.

Method

Participants

We recruited 42 individuals (19 males, 22 females, and 1 would rather not say) from the university's online participant pool. Two participants were excluded from further analyses because they indicated that the videos did not display or run smoothly. The final participant sample included 40 individuals (20 females, 19 males, and 1 would rather not say) between 18 and 57 years old (M = 24.30, SD = 8.25). The majority of participants reported English as their native language (67.5%), followed by other Indo-European languages (22.5%), and Asian languages (10.0%). Participants automatically entered a lottery for an Amazon voucher upon completing the task.

Materials

We used videos of the male actors and female actors who were assigned to the action pairs based on Experiment 1. Trials included either two videos of the same actor (male or female) performing the two different actions in a pair, or two videos of two different actors performing the same action in a pair (action a or action b). Thus, for each action pair we created four trials, resulting in a total of 52 trials (13 action pairs × 2 actor genders × 2 same or different action).

Counterbalancing

The left–right position of the action videos on each trial was counterbalanced across participants using two different versions of the experiment.

Procedure

The procedure of this online experiment was similar to Experiment 1. The instruction page showed two videos of the same action performed by a male actor and a female actor (who were not included in the database) as a "very similar" example. The instructions stated that participants should not proceed if they were unable to view the videos properly.

During the main task, participants saw two videos side-by-side and rated the similarity between the two movements on a seven-point scale, where 1 indicated very dissimilar, 4 indicated neither similar nor dissimilar, and 7 indicated very similar. Both videos started playing on loop automatically when a trial commenced. Participants were randomly assigned to an experiment version and trials were displayed in a random order for each participant. After they had seen all the trials, they were asked if all the videos ran smoothly, and if not, what type of problems had occurred.

Data Analysis

The data were analyzed in the same way as in Experiment 1.

Results and Discussion

Inter-Rater Reliability

A statistically significant Kendall's W of .77 (p < .001) was computed for the similarity ratings, indicating that participants reached agreement when rating the stimuli.

General Findings

Figure 4 displays the average similarity ratings for the same and different actions within each action pair, carried out by the male actors and female actors who were assigned to these action pairs based on Experiment 1. The 95% confidence intervals of the means for both the same and different actions are generally very narrow, indicating that participants reached agreement.

Fig. 4 Average similarity ratings for actions within each action pair. Error bars represent 95% confidence intervals of the means. For each participant, ratings were averaged across the male actor and the female actor who were assigned to an action pair based on Experiment 1, separately for the same and different actions within each action pair. Rating scores represent the similarity between two actions, on a scale of 1 ("very dissimilar") to 7 ("very similar"). The dotted line indicates the neutral score of 4 on the seven-point scale.

We asked whether ratings differed between different actors performing the same action and the same actor performing two different actions. Ratings for the same actors performing two different actions and for two different actors performing the same actions were averaged across action pairs for each participant. A Wilcoxon rank sum test demonstrated that the median of average ratings was significantly higher for two different actors performing the same action (Mdn = 6.62) than for the same actor performing a different action (Mdn = 1.48), W = 1.5, p < .001, 95% CI of the difference [−5.19, −4.73].

We also predicted that two different actors performing the same action would be rated more similar than the neutral score of 4, and that the same actor performing a different action would be rated less similar than the neutral score of 4. Wilcoxon signed rank tests confirmed these predictions (different actors performing the same action, W = 817, p < .001, 95% CI of the median [6.38, 6.65]; the same actor performing a different action, W = 820, p < .001, 95% CI of the median [1.40, 1.77]).
The 95% confidence intervals of the means in Fig. 4 show that there is some variability between action pairs. When we compared the median of averaged ratings for every action pair (for the same actor performing two different actions and for two different actors performing the same actions) against the neutral score of 4, Wilcoxon signed rank tests revealed that ratings for all action pairs differed significantly from the neutral score (p < .001 for all comparisons). Overall, Experiment 2 thus shows that the male actors and female actors who were assigned to an action pair based on Experiment 1 perform the same actions in similar manners and different actions in distinct manners.

Experiment 3

The third experiment tested how distinct the 26 actions are from every other action in the set. We used a subset of the video database, which included videos of the 26 actions carried out by the male or female actors who were assigned to an action pair based on Experiment 1. Participants rated the similarity between every combination of two action videos on a seven-point scale.

Method

Participants

We recruited 225 individuals (88 males, 137 females) through the university's online participant pool. Three participants were excluded from further analyses because they indicated that the videos did not display or run smoothly. The final sample included 222 individuals (87 males, 135 females) between 18 and 73 years old (M = 24.04, SD = 8.69). The majority of participants reported English as their native language (55.9%), followed by other Indo-European languages (22.5%), and Asian languages (21.6%). Participants automatically entered a lottery for an Amazon voucher upon completing the task.

Materials

We used a set of 26 videos showing the 13 action pairs. For each action pair, we randomly determined whether each action was performed by the male or the female actor who was assigned to that pair based on Experiment 1. If the male actor was selected for one action of the action pair, then the female actor was automatically selected for the other action of the action pair, and vice versa. Thus, 13 videos showed a male actor and 13 videos showed a female actor.

All possible combinations of two different action videos (26 × 25) were then divided over 26 video batches to keep the length of the experiment reasonable. We made sure that every action video appeared in each batch. Across batches, each action video thus appeared with every other action video.

Procedure

The same procedure as in Experiment 2 was used. Participants were presented with two action videos side-by-side, and rated the similarity between the actions on a seven-point scale, where 1 indicated very dissimilar, 4 indicated neither similar nor dissimilar, and 7 indicated very similar. Participants were randomly assigned to a video batch and trials were randomly displayed for each participant. After they had seen all the trials, they were asked if all the videos ran smoothly, and if not, what type of problems had occurred.

Participants were allowed to rate multiple video batches, because each batch presented participants with new combinations of action videos. We recorded 260 responses from the 222 individuals. Every combination of two action videos was rated by on average 20 participants (range = 19 to 22).

Data Analysis

Inter-rater reliability was calculated in the same way as in Experiments 1 and 2. A similarity matrix was created by averaging the ratings over every combination of two actions.
Results and Discussion

Inter-Rater Reliability

Kendall's W averaged over all 25 video batches was .52 (SD = 0.12) and ranged between .27 and .68. This coefficient was statistically significant for all batches (p < .001), indicating that participants were applying the same standards when rating the stimuli.

General Findings

Table 2 shows similarity ratings for every combination of two actions. The average score was 2.56 (SD = 1.71) and ranged between 1.10 (SD = 0.30) for the combination of actions 5a and 1a, and 6.63 (SD = 0.60) for actions 7b and 13a.

Table 2 Similarity rating matrix with averages (above the diagonal of ■ marks) and standard deviations (below the diagonal) for every combination of two action videos

     1a   1b   2a   2b   3a   3b   4a   4b   5a   5b   6a   6b   7a   7b   8a   8b   9a   9b   10a  10b  11a  11b  12a  12b  13a  13b
1a   ■    2.38 4.23 1.43 1.57 2.00 2.68 1.48 1.10 1.95 1.38 2.23 1.95 1.86 3.67 1.74 2.23 1.89 1.43 1.62 1.48 2.19 1.26 3.95 2.19 2.24
1b   1.28 ■    2.29 1.95 2.09 3.33 4.86 1.81 2.16 3.24 2.59 2.10 3.90 3.00 4.89 3.05 2.11 3.18 2.52 1.57 2.19 5.53 1.76 3.81 4.10 1.95
2a   1.57 1.35 ■    1.24 1.71 1.38 1.95 1.38 1.48 2.33 1.57 2.58 1.62 1.58 2.38 1.52 4.11 1.90 1.43 1.48 1.62 1.57 2.36 2.43 1.84 2.90
2b   0.68 1.07 0.44 ■    1.38 5.14 3.33 1.62 1.38 2.05 4.05 4.00 2.58 3.43 1.81 4.14 1.24 3.67 2.86 1.57 4.84 2.76 1.29 2.32 3.62 1.68
3a   0.87 1.27 1.27 0.67 ■    1.19 1.43 3.33 4.27 2.84 1.16 2.14 2.14 1.43 2.00 1.57 3.38 2.16 1.90 5.42 1.67 1.29 3.19 1.62 1.67 2.29
3b   1.38 1.91 0.74 1.78 0.40 ■    4.43 1.53 1.42 2.00 4.52 3.48 2.90 4.05 2.05 6.10 1.24 4.81 3.32 1.57 4.57 3.48 1.38 2.63 4.05 1.95
4a   1.80 1.74 1.09 1.85 0.68 1.33 ■    2.05 1.81 2.57 2.48 3.41 3.33 4.41 5.43 4.00 1.19 3.90 3.10 1.67 3.16 4.95 1.68 3.57 3.95 2.67
4b   1.03 1.21 0.80 1.24 1.80 1.02 1.40 ■    3.32 2.68 1.57 2.29 4.14 2.48 1.58 1.48 1.67 1.52 1.95 3.91 1.67 2.14 5.81 1.37 2.00 2.05
5a   0.30 1.21 0.75 1.12 1.32 0.84 1.21 1.83 ■    2.82 1.57 1.62 2.19 1.43 1.79 1.38 3.76 1.29 1.81 4.05 1.43 1.71 2.14 1.24 1.76 1.62
5b   1.20 1.45 1.46 1.18 1.64 1.23 1.57 1.67 1.53 ■    1.81 4.24 3.71 2.43 2.67 2.33 1.43 2.67 3.90 2.32 3.10 1.86 1.52 2.19 1.71 3.26
6a   0.74 1.50 1.03 1.40 0.50 1.50 1.44 0.75 1.40 1.03 ■    2.24 1.90 3.76 1.73 3.84 1.38 2.58 1.90 1.42 2.76 3.27 1.67 1.81 4.33 1.62
6b   1.34 1.45 1.46 1.82 1.46 1.60 1.79 1.42 0.97 1.73 1.34 ■    4.29 2.47 2.67 3.05 1.52 2.81 3.33 2.14 5.29 2.89 1.33 2.95 1.90 2.19
7a   1.32 1.84 0.86 1.54 1.15 1.61 1.32 1.28 1.03 1.78 1.22 1.62 ■    3.10 2.71 2.19 1.33 3.71 3.91 2.05 4.16 3.29 2.42 3.23 2.73 3.16
7b   1.15 1.41 1.17 1.29 0.75 1.78 1.76 1.86 0.93 1.50 1.87 1.74 1.55 ■    3.29 4.33 1.48 4.95 3.58 1.48 2.68 3.29 2.36 2.52 6.63 1.86
8a   1.98 1.79 1.24 0.98 1.20 1.28 1.54 0.69 1.40 1.59 0.98 1.71 1.62 1.79 ■    2.62 1.33 3.05 2.38 1.91 1.90 3.84 1.90 3.43 2.90 2.71
8b   1.37 1.53 0.75 1.71 1.21 1.22 1.63 0.81 0.67 1.20 1.86 1.68 1.50 1.49 1.53 ■    1.68 4.53 3.62 1.48 3.71 3.77 1.52 2.33 4.19 2.19
9a   1.69 1.33 1.94 0.54 1.80 0.70 0.68 1.28 1.97 0.87 0.80 0.90 0.73 0.81 0.66 0.89 ■    1.59 1.71 2.10 1.76 1.32 2.43 1.76 1.62 3.43
9b   1.45 1.79 1.04 1.46 1.46 1.72 1.37 0.75 0.72 1.53 1.71 1.47 1.45 1.69 1.72 1.74 0.91 ■    5.38 1.76 2.62 3.23 1.71 2.86 5.00 2.81
10a  0.68 1.40 1.16 1.61 1.04 1.38 1.41 1.40 1.17 1.81 1.22 1.91 1.93 1.68 1.16 1.72 0.90 0.86 ■    1.38 4.86 2.10 1.90 1.95 3.42 4.76
10b  1.16 1.21 1.03 1.47 1.43 1.36 0.97 1.66 1.89 1.63 0.69 1.35 1.28 1.03 1.27 0.93 1.14 1.49 0.59 ■    1.48 2.00 2.71 1.62 1.29 1.89
11a  0.98 1.29 1.28 1.50 0.97 1.12 1.50 1.15 0.81 1.73 1.45 1.15 2.11 1.52 1.04 1.82 1.41 1.36 1.64 0.81 ■    2.43 1.53 2.81 2.50 2.33
11b  1.33 1.68 1.03 1.64 0.56 1.54 1.53 1.39 1.19 0.85 1.75 1.49 1.65 1.62 1.84 1.74 0.95 1.41 1.34 1.45 1.63 ■    1.38 3.90 2.95 1.86
12a  0.65 1.22 1.73 0.64 1.54 0.59 1.06 1.17 1.31 0.93 1.06 0.58 1.46 1.68 1.14 1.03 1.29 1.10 1.34 1.49 1.02 0.59 ■    2.52 2.36 2.19
12b  1.63 1.57 1.50 1.04 0.74 1.61 1.47 0.68 0.54 1.40 1.17 1.43 1.80 1.17 1.89 1.53 0.89 1.71 1.31 0.97 1.63 1.64 1.17 ■    2.47 2.77
13a  1.08 1.61 1.21 1.66 1.32 1.60 1.86 1.10 1.30 0.96 1.53 1.14 1.42 0.60 1.76 1.94 1.02 1.64 1.71 0.78 1.26 1.60 1.68 1.02 ■    1.52
13b  1.37 1.20 1.73 0.95 1.35 1.05 1.32 1.43 1.32 1.79 1.02 1.21 1.80 0.91 1.27 1.40 1.63 1.54 1.58 1.20 1.39 1.20 1.44 1.34 0.75 ■

Note: Columns and rows are labeled with the numbers of the compared actions. N varies between 19 and 22 per cell. Average ratings are given above the diagonal of ■ marks. Ratings represent the similarity between two actions, on a scale of 1 ("very dissimilar") to 7 ("very similar"). Standard deviations are given below the diagonal. See Table 1 for a more detailed description of what the action numbers (i.e., "1a", "1b", etc.) refer to.

The distinctiveness of each action can be assessed by averaging the similarity ratings between a given action and the other 25 actions: the smaller this average is, the more distinct the action is. According to this metric, actions 9a (M = 1.94, SD = 1.40), 5a (M = 2.02, SD = 1.48), and 2a (M = 2.03, SD = 1.43) appear to be most distinct.

Figure 5 shows that most combinations of actions (80.6%) were rated, on average, on the left side of the seven-point scale (i.e., the area to the left of the first dotted line), indicating that most actions are distinct from each other. The area between the two dotted lines covers the combinations of actions that participants rated neutrally (12.9%). Very few combinations of actions (6.5%) were rated, on average, on the right end of the seven-point scale (i.e., the area to the right of the second dotted line), indicating that only some of the actions are similar to each other.

Fig. 5 Frequency of average similarity ratings for all combinations of (different) actions in the database. N = 325 combinations of two different actions. Ratings represent the similarity between two actions, on a scale of 1 ("very dissimilar") to 7 ("very similar"). The dotted lines mark the neutral score of 4 on the seven-point scale.
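As a sketch, the distinctiveness metric can be computed directly from the matrix of mean ratings. The 4 × 4 example below uses the Table 2 averages for actions 1a, 1b, 2a, and 2b only; the full computation would use the complete 26 × 26 matrix:

```python
import numpy as np

# Mean similarity ratings (1-7 scale) for a subset of four actions,
# taken from Table 2; the diagonal (an action with itself) is left empty.
labels = ["1a", "1b", "2a", "2b"]
sim = np.array([[np.nan, 2.38, 4.23, 1.43],
                [2.38, np.nan, 2.29, 1.95],
                [4.23, 2.29, np.nan, 1.24],
                [1.43, 1.95, 1.24, np.nan]])

# Distinctiveness = mean similarity of an action to all other actions;
# smaller means mean more distinct actions.
distinctiveness = np.nanmean(sim, axis=1)
for label, d in sorted(zip(labels, distinctiveness), key=lambda t: t[1]):
    print(f"{label}: mean similarity to others = {d:.2f}")
```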
Experiment 4

The fourth experiment assessed how accurately and concisely adult native English speakers can describe the actions in our database. This can also be used as a proxy measure for how unusual adult native English speakers find each action. If our participants find the actions unusual, then they should not converge on single-word or multi-word labels for the actions.

Method

Participants

We recruited 28 native English speakers (10 males, 18 females) from the university's online participant pool. One participant was excluded from further analyses because the videos did not display or run smoothly. Three participants were excluded because they reported their first language to be something other than English. The final participant sample included 24 individuals (8 males, 16 females) between 18 and 48 years old (M = 22.92, SD = 6.43). Participants automatically entered a lottery for an Amazon voucher upon completing the task.

Materials

Experiment 4 used the same videos as Experiment 3.

Procedure

The experiment was set up in a web-based environment. Participants signed a digital consent form and were asked for demographic information. Prior to the main task, participants were shown a video of the model moving across the length of a scene. The instructions stated that every following video would also show an actor moving across the length of a scene, and that they had to describe the actor's manner of movement as concisely and accurately as possible. Participants were instructed to type an "X" to skip a trial in case they could not come up with a description for the movement. Participants were also asked not to proceed if they were unable to view the video on the instruction page properly.

During the main task, a video started playing on loop automatically in the center of the screen on each trial. Participants were required to answer the question "Please describe the actor's manner of movement as concise and accurate as possible:" using a text box below the video. Participants also rated the difficulty of coming up with a description on a seven-point scale, where 1 indicated very difficult, 4 indicated neither difficult nor easy, and 7 indicated very easy. Trials were displayed in a random order for each participant, until participants had seen all actions. After they had completed all trials, they were asked if all the videos ran smoothly, and if not, what type of problems had occurred.

Data Analysis

Verbatim responses were spell-checked and converted to lowercase letters. The length of each description was measured by counting the number of words separated by a blank space. Punctuation (e.g., hyphens) did not count towards the number of words in a description.

We then annotated the content words in the descriptions using a Cambridge English dictionary. Nouns, main verbs, adjectives, and adverbs are content words, which usually refer to some object, action, or characteristic of an event. Verbs, adjectives, and nouns that have the same root (e.g., rotate, rotating, and rotation) were coded as the same response using the root of the word (i.e., rotate). Annotations could contain the same root more than once, but only unique roots counted towards the total number of content words in a description. For instance, one participant described action 11a with "jump forward and alternate your legs with each jump like a scissor movement", using the word "jump" first as a verb and then as a noun. These two words have the same root and therefore added a count of only one to the total count of content words for that description. Auxiliary verbs, pronouns, articles, and prepositions are grammatical words and were therefore not coded. Annotations were checked by an independent researcher.

We used two key statistics to evaluate the conciseness of the descriptions: the average number of unique roots per description and the number of descriptions that contained a single root. We computed the percentage of participants who mentioned the same root for each action to measure agreement among participants. Subsequently, these roots were ranked based on how many of the participants used them in their description, and the three most used roots were reported for each action. Difficulty ratings were averaged over each action.
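For illustration, a toy version of the word and unique-root counts might look like the following. The ROOTS lookup table stands in for the authors' manual, dictionary-based root coding and is purely hypothetical, as is the small set of grammatical words:

```python
import re

# Hypothetical stand-in for the manual root coding: map inflected forms
# to their roots. The authors annotated roots by hand with a dictionary.
ROOTS = {"jumping": "jump", "jumps": "jump", "jump": "jump",
         "legs": "leg", "leg": "leg", "forward": "forward",
         "alternate": "alternate", "scissor": "scissor"}
GRAMMATICAL = {"and", "your", "with", "each", "like", "a", "the"}

def describe_stats(description: str):
    """Return (word count, set of unique content-word roots) for one response."""
    words = re.findall(r"[a-z']+", description.lower())  # keep alphabetic tokens
    n_words = len(words)
    # Map each content word to its root; duplicates collapse in the set.
    roots = {ROOTS.get(w, w) for w in words if w not in GRAMMATICAL}
    return n_words, roots

resp = "jump forward and alternate your legs with each jump like a scissor movement"
n_words, roots = describe_stats(resp)
print(n_words, sorted(roots))  # "jump" counts once despite two uses
```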
Results and Discussion

Table 3 shows that participants provided quite lengthy descriptions for the actions (mean number of words per description: M = 6.99, SD = 5.71), ranging between 4.50 words for action 9b and 9.80 words for action 8b. On average, 4.68 (SD = 3.02) roots were annotated per description, ranging between 3.50 for actions 9b and 10b and 6.15 for action 8b.

Participants generally approached the task by describing the actions using main verbs, and modified these verbs using adjectives, adverbs, directional phrases, and nouns that specified the part of the body that was most involved in the movement. For instance, one participant described action 10b as "turn sideways and simply jump sideways whilst keeping your feet together" and another participant described action 4a as "walking forward crouching slightly with knees bent".

Table 3 Descriptive statistics of written descriptions for all 26 actions in the GRACE database

No. | Action | X (%) | No. words M (SD) | No. roots M (SD) | Root 1 (%) | Root 2 (%) | Root 3 (%) | No. single roots (%) | Difficulty M (SD)
1a. | bowing | 0 (0.0) | 8.04 (6.22) | 5.29 (3.51) | bend (66.7) | walk (58.3) | forward (54.2) | 1 (4.2) | 4.58 (1.38)
1b. | skating | 2 (8.3) | 6.32 (4.24) | 4.18 (2.30) | zigzag (25.0) | slide (25.0) | forward (25.0) | 1 (4.2) | 4.08 (1.79)
2a. | wobbling | 1 (4.2) | 8.43 (5.18) | 5.48 (2.69) | body (62.5) | rotate (41.7) | upper (37.5) | 1 (4.2) | 3.67 (1.71)
2b. | marching | 0 (0.0) | 5.92 (3.60) | 4.29 (2.18) | leg (66.7) | forward (41.7) | march (29.7) | 3 (12.5) | 5.29 (1.46)
3a. | mermaiding | 0 (0.0) | 7.71 (6.80) | 4.96 (3.63) | side (79.2) | jump (58.3) | together (33.3) | 0 (0.0) | 5.04 (1.40)
3b. | overstepping | 0 (0.0) | 6.92 (4.60) | 4.96 (2.72) | leg (54.2) | step (33.3) | forward (33.3) | 1 (4.2) | 4.33 (1.37)
4a. | creeping | 0 (0.0) | 5.29 (6.83) | 3.58 (3.24) | walk (29.2) | forward (29.2) | slow (25.0) | 5 (20.8) | 4.50 (1.96)
4b. | crisscrossing | 0 (0.0) | 7.29 (5.22) | 4.79 (2.64) | side (75.0) | cross (62.5) | leg (62.5) | 1 (4.2) | 3.92 (1.69)
5a. | turning | 1 (4.2) | 9.39 (8.54) | 5.91 (3.99) | jump (79.2) | side (58.3) | turn (29.2) | 0 (0.0) | 3.96 (1.33)
5b. | hopscotching | 2 (8.3) | 7.27 (8.02) | 4.64 (3.98) | hopscotch (50.0) | leg (33.3) | jump (29.2) | 4 (16.7) | 4.04 (1.78)
6a. | swinging | 1 (4.2) | 8.78 (5.38) | 5.30 (2.67) | leg (70.8) | circle (50.0) | walk (33.3) | 1 (4.2) | 3.71 (1.52)
6b. | skipping | 5 (20.8) | 8.21 (8.49) | 5.00 (4.18) | forward (45.8) | leg (37.5) | move (25.0) | 2 (8.3) | 2.67 (1.74)
7a. | jumping | 0 (0.0) | 7.04 (4.25) | 4.71 (2.46) | jump (58.3) | forward (45.8) | leg (41.7) | 1 (4.2) | 3.96 (1.46)
7b. | crossing | 2 (8.3) | 8.09 (4.68) | 5.41 (2.74) | cross (50.0) | leg (45.8) | walk (37.5) | 0 (0.0) | 3.54 (1.35)
8a. | dropping | 1 (4.2) | 6.61 (6.52) | 4.83 (3.77) | walk (58.3) | squat (33.3) | crouch (29.2) | 2 (8.3) | 4.00 (1.47)
8b. | folding | 4 (16.7) | 9.80 (7.21) | 6.15 (3.17) | leg (45.8) | walk (41.7) | step (33.3) | 0 (0.0) | 2.92 (1.79)
9a. | twisting | 2 (8.3) | 5.77 (6.64) | 4.27 (2.81) | rotate (33.3) | degree (25.0) | side (20.8) | 2 (8.3) | 3.76 (1.69)
9b. | stomping | 0 (0.0) | 4.50 (3.20) | 3.50 (2.02) | knee (50.0) | high (41.6) | stomp (33.3) | 5 (20.8) | 5.42 (1.32)
10a. | trotting | 1 (4.2) | 5.43 (3.64) | 4.04 (1.89) | knee (62.5) | high (54.2) | forward (37.5) | 2 (8.3) | 2.67 (1.88)
10b. | hopping | 0 (0.0) | 4.79 (3.75) | 3.50 (2.32) | side (87.5) | jump (58.3) | together (37.5) | 1 (4.2) | 6.08 (1.14)
11a. | flicking | 2 (8.3) | 5.73 (4.00) | 4.00 (2.12) | leg (41.7) | forward (33.3) | quick (20.8) | 1 (4.2) | 3.50 (1.62)
11b. | dragging | 1 (4.2) | 7.17 (5.65) | 5.17 (3.28) | forward (50.0) | leg (41.7) | step (37.5) | 2 (8.3) | 4.42 (1.74)
12a. | grapevining | 1 (4.2) | 6.48 (5.88) | 4.48 (2.98) | cross (62.5) | side (58.3) | leg (50.0) | 2 (8.3) | 4.04 (1.23)
12b. | shuffling | 0 (0.0) | 5.83 (4.39) | 3.71 (2.31) | walk (50.0) | forward (37.5) | lift (20.8) | 3 (12.5) | 5.42 (1.56)
13a. | groining | 3 (12.5) | 9.48 (6.30) | 6.14 (3.45) | leg (54.2) | walk (29.2) | knee (29.2) | 0 (0.0) | 2.96 (1.68)
13b. | scurrying | 0 (0.0) | 5.38 (4.53) | 3.92 (3.03) | tiptoe (58.3) | step (33.3) | fast (25.0) | 3 (12.5) | 5.36 (1.06)
Note: N = 24. Columns left to right describe the action number (No.), short-hand action label (Action), number (%) of participants who were not able to describe the action (X), length of a description (No. words), number of content words in a description (No. roots), the three most used content words for each action (Root 1–3), number (%) of descriptions that contained a single content word (No. single roots), and the rated difficulty of coming up with a description (Difficulty, 1 = very difficult and 7 = very easy).

For the next analyses, we excluded 30 responses that stated an "X", because this indicated that participants could not come up with a description. Participants who attempted to describe the actions used only a single content word in 7.4% of the cases. In most responses, participants thus used more than one content word to describe the actions.

Participants rated the difficulty of coming up with a description on average with 4.22 (SD = 1.74), suggesting that the task was neither difficult nor easy. Participants found actions 6b and 10a (M = 2.67) most difficult to describe and action 10b (M = 6.08) easiest to describe.

Finally, we correlated the length of the descriptions (i.e., the number of words) and the number of roots per description with the difficulty ratings for the actions. Participants who found it more difficult to describe the actions provided longer descriptions, r(595) = .10, p = .017, and used more content words, r(595) = .12, p = .003.
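A sketch of these correlations using SciPy, on simulated per-description values with the same degrees of freedom (N − 2 = 595); the generated numbers are made up and will not reproduce the reported coefficients:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Made-up per-description values: difficulty rating (1 = very difficult,
# 7 = very easy), description length in words, and content-word root count.
difficulty = rng.integers(1, 8, size=597).astype(float)
n_words = np.maximum(1, 7 + 0.5 * difficulty + rng.normal(0, 5, 597)).round()
n_roots = np.maximum(1, 5 + 0.3 * difficulty + rng.normal(0, 3, 597)).round()

# Pearson correlations, reported with df = N - 2 as in the text.
for name, y in [("words", n_words), ("roots", n_roots)]:
    r, p = stats.pearsonr(difficulty, y)
    print(f"difficulty vs. {name}: r({len(y) - 2}) = {r:.2f}, p = {p:.3f}")
```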
General Discussion

We developed the GestuRe and ACtion Exemplar (GRACE) video database, which is publicly available from the Warwick Research Archive Portal at http://wrap.warwick.ac.uk/78493. The GRACE video database contains 676 videos of 26 novel manners of human locomotion performed by 13 male actors and 13 female actors (i.e., actors moving from one location to another in an unusual manner), and videos of a female actor producing iconic gestures that represent these manners.

Our first norming study demonstrates that GRACE contains gesture and action videos that can be combined to create clear matches and mismatches between iconic gestures and manners of human locomotion. Based on the findings of this first norming study, we assigned two actors (one male and one female) to each pair of actions to maximize the match between the iconic gestures and actions. Our second norming study shows that the male actors and female actors who were assigned to an action pair perform the same actions in very similar manners and the different actions in highly distinct manners. Our third norming study indicates that the majority of actions are, in fact, highly distinct from all other actions in the database. Our fourth norming study demonstrates that adult native English speakers do not converge on accurate and concise linguistic expressions for the actions in the database, indicating that these manners of human locomotion are unusual.

This database is useful for experimental psychologists working on action and gesture in areas such as language processing, vocabulary development, visual perception, categorization, and memory. By making our video database publicly available to the research community, we set out to inspire researchers to norm our videos for their own studies. We invite these researchers to share such norms with us and other researchers, so that we can upload them along with the GRACE video database through the Warwick Research Archive Portal.

Acknowledgements We would like to express our gratitude to all the actors in the videos, without whom it would not have been possible to develop the video database. A special thanks goes to Lukasz Walasek and Neil Stewart for helping us run our experiments in a web-based environment.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix

Fig. 6 Profit matrix for action–gesture matches for females. Stimuli were rated on a seven-point scale, where 1 indicated a very bad match, 4 indicated neither a good nor a bad match, and 7 indicated a very good match. Ratings were averaged over each action pair and actor combination. Column numbers correspond to action pairs and row numbers correspond to the female actors in the database. Grey rectangles indicate the ratings that were selected by the Hungarian algorithm, which maximized the total profit of a one-to-one assignment of female actors to action pairs.
References

Cassell, J., McNeill, D., & McCullough, K. (1999). Speech–gesture mismatches: Evidence for one underlying representation of linguistic and nonlinguistic information. Pragmatics and Cognition, 7(1), 1–34.

Feyereisen, P. (2006). Further investigations on the mnemonic effect of gestures: Their meaning matters. European Journal of Cognitive Psychology, 18(2), 185–205. https://doi.org/10.1080/

Goodrich, W., & Hudson Kam, C. (2009). Co-speech gesture as input in novel verb learning. Developmental Science, 12(1), 81–87. https://doi.org/10.1111/j.1467-7687.2008.00735.x

Imai, M., Haryu, E., & Okada, H. (2005). Mapping novel nouns and verbs onto dynamic action events: Are verb meanings easier to learn than noun meanings for Japanese children? Child Development, 76(2), 340–355. https://doi.org/10.1111/j.1467-8624.2005.00849.x

Imai, M., Kita, S., Nagumo, M., & Okada, H. (2008). Sound symbolism facilitates early verb learning. Cognition, 109(1), 54–65. https://doi.org/10.1016/j.cognition.2008.07.015

Kita, S., & Özyürek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48(1), 16–32. https://doi.org/10.1016/S0749-596X(02)00505-3

Kuhn, H. (1956). Variants of the Hungarian method for assignment problems. Naval Research Logistics Quarterly, 3, 253–258.

Kuhn, H., & Yaw, B. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2), 83–97.

Madan, C., & Singhal, A. (2012). Using actions to enhance memory: Effects of enactment, gestures, and exercise on human memory. Frontiers in Psychology, 3, 507. https://doi.org/10.3389/fpsyg.

Maguire, M., Hennon, E., Hirsh-Pasek, K., Golinkoff, R., Slutzky, C., & Sootsman, J. (2002). Mapping words to actions and events: How do 18-month-olds learn a verb? In B. Skarabela, S. Fish, & A. Do (Eds.), Proceedings of the 27th Annual Boston University Conference on Language (pp. 371–382). Somerville: Cascadilla Press.

Maguire, M., Hirsh-Pasek, K., Golinkoff, R., & Brandone, A. (2008). Focusing on the relation: Fewer exemplars facilitate children's initial verb learning and extension. Developmental Science, 11(4), 628–634. https://doi.org/10.1111/j.1467-7687.2008.00707.x

Malt, B., Ameel, E., Imai, M., Gennari, S., Saji, N., & Majid, A. (2014). Human locomotion in languages: Constraints on moving and meaning. Journal of Memory and Language, 74, 103–127.

Malt, B., Gennari, S., Ameel, E., Tsuda, N., & Majid, A. (2008). Talking about walking: Biomechanics and the language of locomotion. Psychological Science, 19(3), 232–240. https://doi.org/10.1111/j.1467-9280.2008.02074.x

McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.

McNeill, D., Cassell, J., & McCullough, K. (1994). Communicative effects of speech-mismatched gestures. Research on Language and Social Interaction, 27(3), 223–237. https://doi.org/10.1207/s15327973rlsi2703_4

Mumford, K. (2014). The relationship between vocabulary and gesture development in early childhood and infancy (PhD dissertation). University of Birmingham.

Mumford, K., & Kita, S. (2014). Children use gesture to interpret novel verb meanings. Child Development, 85(3), 1181–1189. https://doi.org/10.1111/cdev.12188
Özçalışkan, Ş., Gentner, D., & Goldin-Meadow, S. (2014). Do iconic gestures pave the way for children's early verbs? Applied Psycholinguistics, 35(6), 1–20. https://doi.org/10.1017/S0142716412000720

Özçalışkan, Ş., Lucero, C., & Goldin-Meadow, S. (2016). Does language shape silent gestures? Cognition, 148(1), 10–18. https://doi.org/10.1016/j.cognition.2015.12.001

Özyürek, A., & Kita, S. (1999). Expressing manner and path in English and Turkish: Differences in speech, gesture, and conceptualization. In M. Hahn & S. Stoness (Eds.), Proceedings of the 21st Annual Conference of the Cognitive Science Society (pp. 507–512). London: Erlbaum.

Özyürek, A., Willems, R., Kita, S., & Hagoort, P. (2007). On-line integration of semantic information from speech and gesture: Insights from event-related brain potentials. Journal of Cognitive Neuroscience, 19(4), 605–616. https://doi.org/10.1162/jocn.2007.19.4.

Pulverman, R., Golinkoff, R., Hirsh-Pasek, K., & Sootsman Buresh, J. (2008). Infants discriminate manners and paths in non-linguistic dynamic events. Cognition, 108(3), 825–830. https://doi.org/10.1016/j.cognition.2008.04.009

Pulverman, R., Hirsh-Pasek, K., Golinkoff, R., Pruden, S., & Salkind, S. (2006). Conceptual foundations for verb learning: Celebrating the event. In K. Hirsh-Pasek & R. Golinkoff (Eds.), Action meets word: How children learn verbs. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195170009.003.0006

Pulverman, R., Song, L., Golinkoff, R. M., & Hirsh-Pasek, K. (2013). Preverbal infants' attention to manner and path: Foundations for learning relational terms. Child Development, 84(1), 241–252. https://doi.org/10.1111/cdev.12030

R Development Core Team (2011). R: A language and environment for statistical computing. Vienna, Austria. Retrieved from http://www.R-project.org (ISBN 3-900051-07-0).

Salkind, S. (2003). Do you see what I see? Paper presented at the 4th Annual University of Delaware Linguistics and Cognitive Science Graduate Student Conference, Newark, DE.

Salkind, S., Golinkoff, R., & Brandone, A. (2005). Infants' attention to novel actions in relation to the conflation patterns of motion verbs. In R. Golinkoff & K. Hirsh-Pasek (Eds.), Action packed for language: Prelinguistic foundations for learning relational terms. Atlanta: Society for Research in Child Development Biennial Meeting.

Scott, R., & Fisher, C. (2012). 2.5-year-olds use cross-situational consistency to learn verbs under referential uncertainty. Cognition, 122(2), 163–180. https://doi.org/10.1016/j.cognition.2011.10.010

Slobin, D. (2004). The many ways to search for a frog: Linguistic typology and the expression of motion events. In S. Strömqvist & L. Verhoeven (Eds.), Relating events in narrative: Vol. 2. Typological and contextual perspectives (pp. 219–257). Mahwah: Lawrence Erlbaum Associates.

Slobin, D. (2006). What makes manner of motion salient? Explorations in linguistic typology, discourse, and cognition. In M. Hickmann & S. Robert (Eds.), Space in languages: Linguistic systems and cognitive categories (pp. 59–81). Amsterdam: John Benjamins.

Slobin, D., & Hoiting, N. (1994). Reference to movement in spoken and signed languages: Typological considerations. In Proceedings of the 20th Annual Meeting of the Berkeley Linguistics Society (pp. 487–503). Berkeley: Berkeley Linguistics Society.

Slobin, D., Ibarretxe-Antuñano, I., Kopecka, A., & Majid, A. (2014). Manners of human gait: A crosslinguistic event-naming study. Cognitive Linguistics, 25(4), 701–741. https://doi.org/10.1515/cog-2014-0061

Spencer, D., McDevitt, T., & Esch, M. (2009). Brief training with co-speech gesture lends a hand to word learning in a foreign language. Language and Cognitive Processes, 24(2), 313–334. https://doi.org/10.1080/01690960802365567

Supalla, T. (2009). In S. Fisher & P. Siple (Eds.), Theoretical issues in sign language research: Linguistics (pp. 127–152). Chicago: University of Chicago Press.

Wood, J. (2008). Visual memory for agents and their actions. Cognition, 108(2), 522–532. https://doi.org/10.1016/j.cognition.2008.02.012
Behav Res (2018) 50:1270–1284 DOI 10.3758/s13428-017-0942-2 GestuRe and ACtion Exemplar (GRACE) video database: stimuli for research on manners of human locomotion and iconic gestures 1 2 1 Suzanne Aussems · Natasha Kwok · Sotaro Kita Published online: 15 September 2017 © The Author(s) 2017. This article is an open access publication Abstract Human locomotion is a fundamental class of Third, all the actions in the database are distinct from each events, and manners of locomotion (e.g., how the limbs other. Fourth, adult native English speakers were unable to are used to achieve a change of location) are commonly describe the 26 different actions concisely, indicating that encoded in language and gesture. To our knowledge, there the actions are unusual. This normed stimuli set is useful for is no openly accessible database containing normed human experimental psychologists working in the language, ges- locomotion stimuli. Therefore, we introduce the GestuRe ture, visual perception, categorization, memory, and other and ACtion Exemplar (GRACE) video database, which related domains. contains 676 videos of actors performing novel manners of human locomotion (i.e., moving from one location to Keywords Action exemplars · Iconic gestures · Human another in an unusual manner) and videos of a female locomotion manners · Video database · Stimuli set actor producing iconic gestures that represent these actions. The usefulness of the database was demonstrated across four norming experiments. First, our database contains clear Introduction matches and mismatches between iconic gesture videos and action videos. Second, the male actors and female Human locomotion (e.g., movement of the human limbs actors whose action videos matched the gestures in the to change location) is a topic widely studied in the field best possible way, perform the same actions in very simi- of experimental psychology. For instance, expressions of lar manners and different actions in highly distinct manners. human locomotion have been studied in spoken language (e.g., Malt et al. 2008; Slobin et al. 2014;Maltetal. 2014), written language (e.g., Slobin 2004, 2006), sign language (e.g., Supalla 2009; Slobin & Hoiting 1994), and gesture Electronic supplementary material The online version of this ¨ ¨ (e.g., Ozyurek ¨ 1999; Kita 2003; Ozc ¸alıs ¸kan 2016). Also, in article (https://doi.org/10.3758/s13428-017-0942-2) contains sup- plementary material, which is available to authorized users. many word learning experiments, researchers teach children verbs for novel manners of human locomotion (e.g., Mum- ford 2014; Mumford & Kita 2014;Imaietal. 2008;Scott & Suzanne Aussems s.aussems@warwick.ac.uk Fisher 2012). In memory experiments, locomotion stimuli are often used to study visual memory of agents and their Natasha Kwok natasha@plaiconsulting.com actions (e.g., Wood 2012). In categorization experiments, human locomotion is used to study, inter alia, how chil- Sotaro Kita dren perceptually categorize manners of locomotion (e.g., s.kita@warwick.ac.uk Salkind et al. 2003; Salkind et al. 2005; Pulverman et al. Department of Psychology, University of Warwick, 2006). CV4 7AL Coventry, UK Particularly in studies on verb learning, human locomo- tion stimuli are often used along with iconic gestures. Iconic P.L.A.I. Behaviour Consulting, Hong Kong, P.O. 
Box 11010, General Post Office, Hong Kong gestures (McNeill, 1992) represent actions, motions or Behav Res (2018) 50:1270–1284 1271 attributes associated with people, animals, or objects (e.g., Norming the GRACE Video Database wiggling the index and middle fingers to represent a per- son walking; tracing a shape). Researchers have investigated In this section, we identify and motivate four essential whether novel verb meanings are shaped by iconic gestures requirements for the type of stimuli in the GRACE video that are shown when the verb is taught (e.g., Spencer et database. These requirements guided the design of our al. 2009; Goodrich & Hudson Kam 2009; Mumford 2014; norming studies to assure its usefulness for experimental Mumford & Kita 2014). psychologists. The GRACE video database is particularly Developing human locomotion stimuli can be very labo- useful for researchers who need unusual human locomo- rious. Nevertheless, most researchers develop such stimuli tion stimuli to study language and gesture, memory, and solely for the purpose of their own research. As a conse- categorization. Below, we discuss the implications of each quence, there is no openly accessible video database con- norming study in the context of these research areas. taining manners of human locomotion and iconic gestures First, the GRACE video database includes videos that that represent these manners. were normed for the degree of match between action pairs and matching and mismatching iconic gestures. Many experiments in developmental psychology use two-way Current Research forced choice tasks. In such tasks, pairing actions that would appear as two choices is important. The design of our first Contents of the GRACE Video Database norming experiment is motivated by this future use. Also, pairing actions made data collection for this study more We developed and normed the GestuRe and Action Exem- manageable; if we did not pair, participants would have plar (GRACE) video database, which includes 676 videos to rate a large number of action-gesture combinations that of 26 actors (13 males, 13 females) performing 26 novel make “mismatches”. Action pairs with matching and mis- manners of human locomotion (i.e. moving from one loca- matching gestures could be used in experiments with a tion to another in an unusual manner), and 26 videos of a two-way forced choice task in which one of the actions is female actor who produces iconic gestures that represent congruent with gesture, but the other is incongruent. This is these manners. Figure 1 presents three examples of the ges- useful for research on word learning with the help of iconic tures and the corresponding manners of locomotion (in the gestures (e.g., Mumford & Kita 2014; Mumford 2014; upper right corner of each panel). The gesturing hands rep- Ozc ¸alıs ¸kan et al. 2014; Goodrich & Hudson Kam 2009), the resent the actor’s feet (panel A), the actor’s legs (panel B), intake of information conveyed by gesture and speech (e.g., and the actor’s whole body (panel C). McNeill et al. 1994; Cassell 1999; Ozyurek ¨ et al. 2007), and The GRACE video database is openly available from the memory recall for sentences with the help of gesture (e.g., Warwick Research Archive Portal at nAlong with the 702 Feyereisen 2006; Madan & Singhal 2012). 
Furthermore, video files, we have made the raw data from our norm- these stimuli are useful for studies on processing gesture- ing studies available and the Python scripts that we used to speech combinations, in which researchers often manipulate process the data. We also included a manual that contains the semantic relations between the two channels (i.e., ges- guidelines on how to use the GRACE video database. ture and speech match, mismatch, or complement each Fig. 1 Three panels (A, B, and C) with cropped stills of videos in Gestures and actions are included in separate video files in which a female actor gestures iconically to represent the manners the database. From left to right the panels show the follow- of human locomotion performed by actors in the upper right cor- ing gesture videos: “00F scurrying.mp4”, “00F mermaiding.mp4”, ners of the panels. In the actual norming study, the action video and and “00F twisting.mp4”, and action videos: “01F scurrying.mp4”, the gesture video had the same size and were presented side-by-side. “09F mermaiding.mp4”, “01M twisting.mp4” 1272 Behav Res (2018) 50:1270–1284 other) (e.g., McNeill et al. 1994; Cassell et al. 1999;Ozyurek ¨ al. 2013), which use change-detection tasks with more than et al. 2007; Spencer et al. 2009). Thus, the first norm- two options (e.g., four actions presented to participants on ing study tested matches and mismatches between iconic each quadrant). Third, the manners of locomotion that are gestures and manners of human locomotion in all the 676 shown to one participant need to be highly distinctive from action videos. We then ran an algorithm over the norming each other to avoid confusion in any given task. For exam- scores to identify the best possible matches between iconic ple, if a participant is taught a novel label for a locomotion gestures and actions performed by male actors and female manner in a word learning task, then this manner should be actors, separately. This led to a one-to-one assignment of distinct from all manners that are subsequently labeled to male actors and female actors to action pairs. Action videos avoid a bias in test performance. Therefore, the third norm- of the selected actors were used in the next norming study. ing study tested the similarity between all combinations of Second, GRACE contains videos that were normed for actions to obtain a measure of distinctiveness for each action the similarity of the same actions within action pairs in the database. In this norming study, human raters were performed by male actors and female actors and the presented with a subset of the videos from the database, in (dis)similarity of the different actions within action pairs which each video showed one of the 26 actions performed performed by male actors and female actors. Researchers by either a male or female actor. who introduce an actor-change in their experimental task Finally, the 26 actions in the GRACE video database (e.g., to test actor memory or verb generalization) often do were normed for how accurately and concisely they can this by changing between male actors and female actors, be described by adult native English speakers. We asked as they have naturally distinct appearances (e.g., Mum- whether the English language contains existing single-word ford 2014). For instance, word learning studies that take or multi-word labels for the actions, which we used as a an exemplar-based approach could use videos that show measure of how unusual the actions are. 
It is important that different actors performing the same actions and the same the stimuli are unusual to ensure that a given task perfor- actors performing different actions (e.g., Maguire et al. mance occurs as a function of an experimental manipulation 2002; Maguire et al. 2008; Scott & Fisher 2012). Videos that and not as a consequence of participants being familiar with show different actors moving in the same manner could also the stimuli prior to the task. This is important for language be useful for creating generalization tasks to test people’s research: if a participant already knows a label for an action understanding of locomotion verbs (e.g., Imai et al. 2008), action that is labeled in a word learning task, then this and recognition tasks and change-detection tasks to test their may cause a bias in test performance. It is also important memory of actors (e.g., Imai et al. 2005; Wood 2008). In all for memory research: if people commonly perform these these tasks it is important that the manner of human loco- actions in real life, then this may cause a bias in test per- motion is similar across the actor-change. Thus, the second formance. Therefore, the fourth experiment assessed how norming study tested how similar male actors and female accurately and concisely each action can be described by actors perform the same actions within action pairs, and adult native speakers of English. Participants described the how distinct each male actor and female actor performs the 26 actions in the database based on the same set of videos two different actions within action pairs. All actions that are as in the third norming study. included in the database were normed in this study, but par- ticipants rated only the videos of male actors and female General Methods for Developing the GRACE Video actors who were assigned to an action pair because their per- Database formance matched corresponding gestures very well in the first norming study. The GRACE video database originated in work by Mum- Third, GRACE includes 26 actions which were normed ford and Kita (2014) and Mumford (2014), who developed for how distinct they are compared to every other action 14 unusual manners of human locomotion and iconic ges- in the database. In this norming study, we let go of the tures representing these manners. GRACE includes these 14 notion of action pairs to obtain a measure of distinc- manners and 12 additional manners of human locomotion tiveness for all the actions in the database. There are and corresponding iconic gestures, resulting in a total of 26 three advantages of using this approach. First, norming the manners and gestures. distinctiveness between all 26 actions is useful for studies on the ways in which people can categorize various seman- Action Videos tic components of motion verbs such as figure (e.g., the man, the woman, Pulverman et al.2006) and manner (e.g., We recruited 13 male actors between 22–40 years old (M = Salkind 2003; Salkind et al. 2005). Second, such norms are 27.00, SD = 4.98) and 13 female actors between 20–42 useful for studies on infants’ ability to discriminate man- years old (M = 27.08, SD = 6.36). The national origin of the ners of motion (e.g., Pulverman et al. 2008; Pulverman et actors varied from British, Czech, Japanese, Polish, Dutch, Behav Res (2018) 50:1270–1284 1273 Indian, Irish, German, Canadian, Nigerian, Mauritian, Bul- Linux. The total size of the GRACE video database is 185 garian, Pakistani, Singaporean, Malaysian to Chinese. All mega-bytes. 
actors were educated to the university degree level. Actors participated in individual recording sessions. They were instructed to keep their arms and hands by their Experiment 1 side when performing the actions, because we needed the hand gestures of the female actor to unambiguously repre- The first experiment tested the degree of match (and mis- sent the actors’ feet, leg, and body movements. Actors were match) between iconic gestures and manners of human also required to carry out each action as an ongoing motion locomotion. During the development of the database, 26 without any breaks. iconic gestures were created that matched each action. A Prior to recording each action, actors watched an exam- mismatch between iconic gestures and actions was set up in ple video of a model. The videos of the model were not the following way. Every action was paired up with another included in the database so that all actors shared the same action from the set to create 13 action pairs (see Table 1). We reference point when performing the actions. Subsequently, then showed participants each action with a matching iconic the actors were required to move across the length of a gesture, but also with the iconic gesture that was created for scene in the same manner as the model. The starting point the other action in the action pair as a mismatching iconic and the ending point were marked on the floor just outside gesture. Participants rated these matches and mismatches on the camera view. Each action was recorded at least twice a seven-point scale. from a distance of approximately 4.5 meters. If actors strug- We predicted that match ratings for matching iconic ges- gled with one of the actions, the researcher showed them tures and actions would be higher than match ratings for their last recorded video and practised the movement with mismatching iconic gestures and actions. Additionally, we them repeatedly until they were ready to record again. Every predicted that matches would be rated higher than the neu- recording session lasted approximately 1 hour. Informed tral score on a seven-point scale and that mismatches would written consent was obtained at the end of each recording be rated lower than the neutral score. session. Method Gesture Videos Participants Hand gestures of a female actor were recorded from a dis- tance of approximately 1.5 meters. This actor watched the We recruited 301 individuals (183 males, 117 females) from video recordings of the model performing an action prior to the university’s online participant pool. Eight participants recording the gesture that was designed to match this action. were excluded from further analyses because they indicated Gestures were designed by the researchers based on the def- that the videos did not display, or run smoothly. The final inition of iconic gestures by McNeill (1992) so that the form participant sample included 293 individuals (179 females, of gesture resembled the referent action. 113 males) between 18–67 years old (M = 22.19, SD = Specifically, all gestures iconically represented the body 6.66). The majority of participants reported English as part that was most prominent for each movement (i.e., feet, their native language (58.7%), followed by Asian languages legs, or whole body), its dynamic shape, and the rate at (23.2%), and other Indo-European languages (18.1%). Par- which the movement was carried out. Gestures representing ticipants automatically entered a lottery for an Amazon the whole body were performed with the right hand. 
Ges- voucher upon completing the task. tures representing the legs were performed by both hands, where the right hand represented the right leg and the left Materials hand represented the left leg. Gestures representing the feet were performed with the fingers, where the right hand fin- We used videos of 26 manners of locomotion carried out by gers matched the right foot and the left hand fingers matched 26 actors (676 videos in total), and 26 videos of a female the left foot. actor producing iconic gestures. Actions were organized in pairs (see Table 1) so that matches and mismatches between Apparatus iconic gestures and actions could be created. Figure 2 shows the matches and mismatches between iconic gestures and Videos were recorded using a Canon Legria HFR56 cam- actions for action pair 1. For instance, participants were shown era with autofocus in a room with controlled light settings. bowing with a bowing gesture (Panel A), bowing with a Recordings were muted, cut, optimized for HTML, and con- skating gesture (Panel B), skating with a skating gesture verted to MP4 files of 640 × 480 pixels using avconv on (Panel C), and skating with a bowing gesture (Panel D). 1274 Behav Res (2018) 50:1270–1284 Table 1 Twenty-six manners of human locomotion organized in action pairs Pair Action a Still frame Action b Still frame 1. bowing skating 2. wobbling marching 3. mermaiding overstepping 4. creeping crisscrossing 5. turning hopscotching 6. swinging skipping 7. jumping crossing 8. dropping folding 9. twisting stomping 10. trotting hopping 11. flicking dragging 12. grapevining shuffling 13. groining scurrying Still frames are taken from the videos of the male actor whose videos file names start with “08M ”. Short-hand action labels are used to refer to the manners of locomotion and follow after the underscore in the file names of the database (e.g., “08M bowing.mp4”, “08M skating.mp4”) We created 26 batches of videos to keep the length of the side-by-side, which started playing on loop automatically experiment reasonable. Each video batch contained videos when a trial started. Participants were instructed to rate the of the 26 actions, but performed by different actors to ensure match between the hand gesture of the female actor (left that all 676 action videos appeared in one of the batches. video) and the manner in which an actor moved (right video) on a seven-point scale, where 1 indicated a very bad match, Each action video was combined with a matching and mis- 4 indicated neither a good nor a bad match, and 7 indicated matching gesture video within a batch, which resulted in a very good match. Participants were randomly assigned to 52 trials. Each action video–gesture video combination was one of the 26 batches and trials were randomly displayed for rated by on average 23 participants (range = 18 to 28). each participant. After they had seen all the trials, they were asked if all the videos ran smoothly, and if not, what type of Procedure problems had occurred. The experiment was set up in a web-based environment. Data Analysis Participants signed a digital consent form and were asked for demographic information. The instruction page showed participants a still frame of a gesture video and a still frame Using the irr package in the R software for statistical analyses (R Development Core Team, 2011), we computed Kendall’s of an action video from the model as an example of a very good match. 
Participants were then shown two videos W (also known as Kendall’s coefficient of concordance) to Behav Res (2018) 50:1270–1284 1275 can be assigned to only one action pair). In order to achieve a one-to-one assignment the matrix has to have the same number of rows and columns. The same procedure was carried out for the matrix containing average ratings for 13 male actors. The Hungarian method (Kuhn & Yaw, 1955; Kuhn, 1956) finds an optimal assignment for a given n · n matrix in the following way. Suppose we have n action pairs to which we want to assign n actors on a one-to-one basis. The average ratings are the profit of assigning each actor to each action pair. We wish to find an optimal assignment which maximizes the total profit. Let P be the profit of assigning an ith actor to the j th i,j action pair. We define the profit matrix to be the n·n matrix: ⎡ ⎤ P P ··· P 1,1 1,2 1,n ⎢ ⎥ P P ··· P 2,1 2,2 2,n ⎢ ⎥ P = ⎢ ⎥ . (1) . . . . . . ⎣ ⎦ . . . P P ··· P n,1 n,2 n,n An assignment is a set of n entry positions in the matrix, none of which lie in the same column or row. The sum of the n entries of an assignment is its profit. An assignment with the highest profit is called an optimal assignment. We imple- mented this algorithm in Python using the Munkres package. Fig. 2 Four panels (A, B, C, and D) with cropped stills of videos in Our Python scripts are available from the Warwick Research which a female actor gestures iconically to represent the actions of pair Archive Portal at http://wrap.warwick.ac.uk/78493. 1, as performed by a male actor in the upper right corners of the panels. Panels A shows a bowing gesture with a bowing movement (match), Results and Discussion Panel B shows a bowing gesture with a skating movement (mismatch), Panel C shows a skating gesture with a skating movement (match), Inter-Rater Reliability and Panel D shows a skating gesture with a bowing movement (mis- match). Gesture videos are “00F bowing.mp4” (Panel A and B) and “00F skating.mp4” (Panel C and D). Action videos are “06M bowing” Kendall’s W averaged over all 26 video batches was .72 (Panel A and D) and “06M skating” (Panel B and C) (SD = 0.07) and ranged between .54 and .81. This coeffi- cient was statistically significant for all batches (p < .001), assess agreement between participants who rated the same indicating that participants were applying the same stan- video batch. Kendall’s W is a non-parametric test statistic dards when rating the stimuli. that takes into account the number of raters and the fact that the videos were rated on an ordinal scale. Its coefficient General Findings ranges from 0 (no agreement) to 1 (complete agreement). We used non-parametric tests to analyze the ratings for Figure 3 displays the average ratings for the degree of match matches and mismatches between iconic gestures and actions, between iconic gestures and actions. Black dots represent because these ratings were not normally distributed. The R average ratings for matches between iconic gestures and script containing the basic code for all analyses reported in grey dots represent average ratings for mismatches between this paper is uploaded as supplementary material. iconic gestures and actions. The 95% confidence intervals for both match and mismatch ratings are generally very The Hungarian Algorithm narrow, indicating strong agreement among the participants. 
We asked whether ratings differed between match and We split the data based on the gender of the actors, because mismatch combinations of iconic gestures and actions. Rat- our aim is to identify the best possible match between iconic ings for matches and mismatches between iconic gestures gestures and action pairs carried out by male actors and and actions were averaged across all action pairs for each by female actors. The matrix containing average ratings participant. A Wilcoxon rank sum test demonstrated that for female actors was subjected to the Hungarian algorithm the median of average match ratings (Mdn = 5.92) was significantly higher than the median of average mismatch (Kuhn and Yaw, 1955; Kuhn, 1956) to find the most prof- ratings (Mdn = 1.77), W = 316.5, p < .001, 95% CI of the itable (here best overall match between gestures and actions) difference [−4.12, −3.88]. assignment of 13 female actors to 13 action pairs (each actor 1276 Behav Res (2018) 50:1270–1284 Fig. 3 Average ratings for the degree of match between matching match between iconic gestures and actions on a scale of 1 (“very bad and mismatching iconic gestures and actions, organized by action match”) to 7 (“very good match”). The dotted line indicates the neutral pair. Error bars represent 95% confidence intervals of the means. score of 4 on the seven-point scale Rating scores are averaged across all actors and represent the degree of Furthermore, we compared the averaged ratings for pair four times, and the female with the fourth highest match matches and mismatches across action pairs against the rating for an action pair one time. As the 13 females were neutral score on our seven-point scale. A Wilcoxon signed assigned to 13 action pairs, the highest possible profit that rank test indicated that the median of average match rat- could have been achieved was 91 (13 × 7). The algorithm ings was significantly higher than a neutral score of 4, W = assigned female actors to action pairs with a total profit of 42638, p < .001, 95% CI of the median [5.77, 5.92]. In 80.63 (88.6% of 91), with the lowest average match rating for contrast, the median of average mismatch ratings was signif- an assigned actor being 5.56 out of 7 (see Fig. 6 in Appendix). icantly lower than a neutral score of 4, W = 137, p < .001, For males, the algorithm selected the male actor with 95% CI of the median [1.75, 1.92]. Thus, matching iconic the highest match rating for an action pair six times, the gestures and actions were rated as good matches and mismat- male with the second highest match rating for an action ching iconic gestures and actions were rated as bad matches. pair two times, the male with the third highest match rat- The 95% confidence intervals of the means in Fig. 3 ing two times, the male with the fourth highest match rating clearly demonstrate that there is some variability between two times, and the male with the fifth highest match rating action pairs. When we compared the median of the averaged one time. The algorithm assigned male actors to action pairs match and mismatch ratings for every action pair against a with a total profit of 81.02 (89.0% of 91), with the lowest neutral score of 4, Wilcoxon signed rank tests revealed that average match rating for an assigned actor being 5.64 out of matches and mismatches for all action pairs differed signifi- 7(seeFig. 7 in Appendix). cantly from the neutral score (p < .001 for all comparisons). Experiment 1 provided norming scores for all the videos in the GRACE videos database. 
With these ratings we eval- Assigning Actors to Action Pairs uated the match and mismatch between iconic gestures and actions within action pairs. Moreover, the Hungarian algo- The Hungarian Algorithm optimally assigned 13 female rithm over these ratings optimally assigned male actors actors to 13 action pairs, and did the same for 13 male and female actors to action pairs, to maximize the overall actors. The Algorithm used “profit” matrices for actors and degree of match between gestures and action pairs. These action pairs, created in the following way (one matrix for assignments will be used in subsequent experiments. female actors, and another one for male actors). For each action performed by each actor, 10–14 participants rated the match between each action and a matching gesture. The ratings Experiment 2 were averaged across participants, and then the two average ratings for actions that comprise an action pair were averaged The second experiment tested whether the male actors and again to create a “profit” for the action pair and actor. female actors who were assigned to an action pair based on For females, the algorithm selected the female actor with Experiment 1 perform the same actions in similar manners the highest match rating for an action pair eight times, the and the two different actions in distinct manners. Partic- female with the second highest match rating for an action ipants rated the similarity between two action videos on Behav Res (2018) 50:1270–1284 1277 a seven-point scale. These videos showed either the same During the main task, participants saw two videos side- actor performing two different actions, or two different by-side and rated the similarity between two movements actors (male vs. female) performing the same action. on a seven-point scale, where 1 indicated very dissimilar, We predicted that two actors performing the same action 4 indicated neither similar nor dissimilar, and 7 indicated would be rated more similar than the same actor perform- very similar. Both videos started playing on loop automati- ing two different actions. Additionally, we predicted that cally when a trial commenced. Participants were randomly two actors performing the same action would be rated more assigned to an experiment version and trials were displayed similar than the neutral score on a seven-point scale and the in a random order for each participant. After they had seen same actor performing a different action would be rated less all the trials, they were asked if all the videos ran smoothly, similar than the neutral score. and if not, what type of problems had occurred. Method Data Analysis Participants The data were analyzed in the same way as in Experiment 1. We recruited 42 individuals (19 males, 22 females, and 1 Results and Discussion would rather not say) from the university’s online partic- ipant pool. Two participants were excluded from further Inter-Rater Reliability analyses because they indicated that the videos did not dis- play, or run smoothly. The final participant sample included A statistically significant Kendall’s W of .77 (p < .001) 40 individuals (20 females, 19 males, and 1 would rather was computed for the similarity ratings, indicating that not say) between 18–57 years old (M = 24.30, SD = 8.25). participants reached agreement when rating the stimuli. The majority of participants reported English as their native language (67.5%), followed by other Indo-European lan- General Findings guages (22.5%), and Asian languages (10.0%). 
Participants automatically entered a lottery for an Amazon voucher upon Figure 4 displays the average similarity ratings for the same completing the task. and different actions within each action pair, carried out by the male actors and female actors who were assigned Materials to these action pairs based on Experiment 1. The 95% confidence intervals of the means for both the same and We used videos of male actors and female actors, who different actions are generally very narrow, indicating that were assigned to the action pairs based on Experiment 1. participants reached agreement. Trials included either two videos of the same actor (male or We asked whether ratings differ between different actors female) performing the two different actions in a pair, or two performing the same action and the same actors performing videos of two different actors performing the same actions a different action. Ratings for the same actors performing in a pair (action a or action b). Thus, for each action pair we two different actions and two different actors performing created four trials, resulting in a total of 52 trials (13 action the same actions were averaged across action pairs for each pairs × 2 actor gender × 2 same or different action). participant. A Wilcoxon rank sum test demonstrated that the median of average ratings was significantly higher for Counterbalancing two different actors performing the same action (Mdn = 6.62) than for the same actors performing a different action The left–right position of the action videos on each trial (Mdn = 1.48), W = 1.5, p < .001, 95% CI of the difference was counterbalanced across participants using two different [−5.19, −4.73]. versions of the experiment. We also predicted that two different actors performing the same action would be rated more similar than a neutral Procedure score of 4 and that the same actors performing a different action would be rated less similar than a neutral score of The procedure of this online experiment was similar to 4. Wilcoxon signed rank tests confirmed these predictions Experiment 1. The instruction page showed two videos of the same action performed by a male actor and a female actor (different actors performing the same action W = 817, p < (who were not included in the database) as a “very similar” .001, 95% CI of the median [6.38, 6.65]; the same actors example. The instructions stated that participants should not performing a different action, W = 820, p < .001, 95% CI proceed if they were unable to view the videos properly. of the median [1.40, 1.77]). 1278 Behav Res (2018) 50:1270–1284 Fig. 4 Average similarity ratings for actions within each action pair. separately for the same and different actions within each action pair. Error bars represent 95% confidence intervals of the means. For each Rating scores represent the similarity between two actions, on a scale participant, ratings were averaged across the male actor and the female of 1 (“very dissimilar”) to 7 (“very similar”). The dotted line indicates actor who were assigned to an action pair based on Experiment 1, the neutral score of 4 on the seven-point scale The 95% confidence intervals of the means in Fig. 4 indicated that the videos did not display, or run smoothly. evidently show that there appears to be some variability The final sample included 222 individuals (87 males, 135 between action pairs. When we compared the median of females) between 18–73 years old (M = 24.04, SD = 8.69). 
averaged ratings for every action pair (for the same actor The majority of participants reported English as their native performing two different actions and two different actors language (55.9%), followed by other Indo-European lan- performing the same actions) against a neutral score of guages (22.5%), and Asian languages (21.6%). Participants 4, Wilcoxon signed rank tests revealed that ratings for all automatically entered a lottery for an Amazon voucher upon action pairs differed significantly from the neutral score completing the task. (p < .001 for all comparisons). Overall, Experiment 2 thus shows that male actors and female actors, who were Materials assigned to an action pair based on Experiment 1, perform the same actions in similar manners and different actions in We used a set of 26 videos showing the 13 action pairs. distinct manners. For each action pair, we randomly determined whether each action was performed by the male or female actor that was assigned to that pair based on Experiment 1.Ifthe male Experiment 3 actor was selected for one action of the action pair, then the female actor was automatically selected for the other action The third experiment tested how distinct the 26 actions are of the action pair, and vice versa. Thus, 13 videos showed a from every other action in the set. We used a subset of the male actor and 13 videos showed a female actor. video database, which included videos of the 26 actions car- All possible combinations of two different action videos ried out by the male or female actors who were assigned to (26 × 25) were then divided over 26 video batches to keep an action pair based on Experiment 1. Participants rated the the length of the experiment reasonable. We made sure that similarity between every combination of two action videos every action video appeared in each batch. Across batches on a seven-point scale. each action video thus appeared with every other action video. Method Procedure Participants The same procedure as Experiment 2 was used. Partici- pants were presented with two action videos side-by-side, We recruited 225 individuals (88 males, 137 females) and rated the similarity between the actions on a seven- through the university’s online participant pool. Three par- point scale, where 1 indicated very dissimilar, 4 indicated ticipants were excluded from further analyses because they neither similar nor dissimilar, and 7 indicated very similar. Behav Res (2018) 50:1270–1284 1279 Participants were randomly assigned to a video batch and Results and Discussion trials were randomly displayed for each participant. After they had seen all the trials, they were asked if all the Inter-Rater Reliability videos ran smoothly, and if not, what type of problems had occurred. Kendall’s W averaged over all 25 video batches was .52 Participants were allowed to rate multiple video batches, (SD = 0.12) and ranged between .27 and .68. This coef- because each batch presented participants with new com- ficient was statistically significant for all batches (p < binations of action videos. We recorded 260 responses .001), indicating that participants were applying the same from 222 individuals. Every combination of two action standards when rating the stimuli. videos was rated by on average 20 participants (range = 19 to 22). General Findings Data Analysis Table 2 shows similarity ratings for every combination of two actions. 
The average score was 2.56 (SD = 1.71) and Inter-rater reliability was calculated in the same way ranged between 1.10 (SD = 0.30) for the combination of as in Experiment 1-2. A similarity matrix was created action 5a and 1a and 6.63 (SD = 0.60) for action 7b and 13a. by averaging the ratings over every combination of two The distinctiveness of each action can be assessed by actions. averaging similarity ratings between a given action and Table 2 Similarity rating matrix with averages (above the diagonal line of black squares) and standard deviations (below the diagonal line of black squares) for every combination of two action videos 1a 1b 2a 2b 3a 3b 4a 4b 5a 5b 6a 6b 7a 7b 8a 8b 9a 9b 10a 10b 11a 11b 12a 12b 13a 13b 1a  2.38 4.23 1.43 1.57 2.00 2.68 1.48 1.10 1.95 1.38 2.23 1.95 1.86 3.67 1.74 2.23 1.89 1.43 1.62 1.48 2.19 1.26 3.95 2.19 2.24 1b 1.28  2.29 1.95 2.09 3.33 4.86 1.81 2.16 3.24 2.59 2.10 3.90 3.00 4.89 3.05 2.11 3.18 2.52 1.57 2.19 5.53 1.76 3.81 4.10 1.95 2a 1.57 1.35  1.24 1.71 1.38 1.95 1.38 1.48 2.33 1.57 2.58 1.62 1.58 2.38 1.52 4.11 1.90 1.43 1.48 1.62 1.57 2.36 2.43 1.84 2.90 2b 0.68 1.07 0.44  1.38 5.14 3.33 1.62 1.38 2.05 4.05 4.00 2.58 3.43 1.81 4.14 1.24 3.67 2.86 1.57 4.84 2.76 1.29 2.32 3.62 1.68 3a 0.87 1.27 1.27 0.67  1.19 1.43 3.33 4.27 2.84 1.16 2.14 2.14 1.43 2.00 1.57 3.38 2.16 1.90 5.42 1.67 1.29 3.19 1.62 1.67 2.29 3b 1.38 1.91 0.74 1.78 0.40  4.43 1.53 1.42 2.00 4.52 3.48 2.90 4.05 2.05 6.10 1.24 4.81 3.32 1.57 4.57 3.48 1.38 2.63 4.05 1.95 4a 1.80 1.74 1.09 1.85 0.68 1.33  2.05 1.81 2.57 2.48 3.41 3.33 4.41 5.43 4.00 1.19 3.90 3.10 1.67 3.16 4.95 1.68 3.57 3.95 2.67 4b 1.03 1.21 0.80 1.24 1.80 1.02 1.40  3.32 2.68 1.57 2.29 4.14 2.48 1.58 1.48 1.67 1.52 1.95 3.91 1.67 2.14 5.81 1.37 2.00 2.05 5a 0.30 1.21 0.75 1.12 1.32 0.84 1.21 1.83  2.82 1.57 1.62 2.19 1.43 1.79 1.38 3.76 1.29 1.81 4.05 1.43 1.71 2.14 1.24 1.76 1.62 5b 1.20 1.45 1.46 1.18 1.64 1.23 1.57 1.67 1.53  1.81 4.24 3.71 2.43 2.67 2.33 1.43 2.67 3.90 2.32 3.10 1.86 1.52 2.19 1.71 3.26 6a 0.74 1.50 1.03 1.40 0.50 1.50 1.44 0.75 1.40 1.03  2.24 1.90 3.76 1.73 3.84 1.38 2.58 1.90 1.42 2.76 3.27 1.67 1.81 4.33 1.62 6b 1.34 1.45 1.46 1.82 1.46 1.60 1.79 1.42 0.97 1.73 1.34  4.29 2.47 2.67 3.05 1.52 2.81 3.33 2.14 5.29 2.89 1.33 2.95 1.90 2.19 7a 1.32 1.84 0.86 1.54 1.15 1.61 1.32 1.28 1.03 1.78 1.22 1.62  3.10 2.71 2.19 1.33 3.71 3.91 2.05 4.16 3.29 2.42 3.23 2.73 3.16 7b 1.15 1.41 1.17 1.29 0.75 1.78 1.76 1.86 0.93 1.50 1.87 1.74 1.55  3.29 4.33 1.48 4.95 3.58 1.48 2.68 3.29 2.36 2.52 6.63 1.86 8a 1.98 1.79 1.24 0.98 1.20 1.28 1.54 0.69 1.40 1.59 0.98 1.71 1.62 1.79  2.62 1.33 3.05 2.38 1.91 1.90 3.84 1.90 3.43 2.90 2.71 8b 1.37 1.53 0.75 1.71 1.21 1.22 1.63 0.81 0.67 1.20 1.86 1.68 1.50 1.49 1.53  1.68 4.53 3.62 1.48 3.71 3.77 1.52 2.33 4.19 2.19 9a 1.69 1.33 1.94 0.54 1.80 0.70 0.68 1.28 1.97 0.87 0.80 0.90 0.73 0.81 0.66 0.89  1.59 1.71 2.10 1.76 1.32 2.43 1.76 1.62 3.43 9b 1.45 1.79 1.04 1.46 1.46 1.72 1.37 0.75 0.72 1.53 1.71 1.47 1.45 1.69 1.72 1.74 0.91  5.38 1.76 2.62 3.23 1.71 2.86 5.00 2.81 10a 0.68 1.40 1.16 1.61 1.04 1.38 1.41 1.40 1.17 1.81 1.22 1.91 1.93 1.68 1.16 1.72 0.90 0.86  1.38 4.86 2.10 1.90 1.95 3.42 4.76 10b 1.16 1.21 1.03 1.47 1.43 1.36 0.97 1.66 1.89 1.63 0.69 1.35 1.28 1.03 1.27 0.93 1.14 1.49 0.59  1.48 2.00 2.71 1.62 1.29 1.89 11a 0.98 1.29 1.28 1.50 0.97 1.12 1.50 1.15 0.81 1.73 1.45 1.15 2.11 1.52 1.04 1.82 1.41 1.36 1.64 0.81  2.43 1.53 2.81 2.50 2.33 11b 1.33 1.68 1.03 1.64 0.56 1.54 1.53 1.39 1.19 0.85 1.75 1.49 1.65 1.62 1.84 1.74 0.95 1.41 1.34 1.45 1.63  1.38 
3.90 2.95 1.86 12a 0.65 1.22 1.73 0.64 1.54 0.59 1.06 1.17 1.31 0.93 1.06 0.58 1.46 1.68 1.14 1.03 1.29 1.10 1.34 1.49 1.02 0.59  2.52 2.36 2.19 12b 1.63 1.57 1.50 1.04 0.74 1.61 1.47 0.68 0.54 1.40 1.17 1.43 1.80 1.17 1.89 1.53 0.89 1.71 1.31 0.97 1.63 1.64 1.17  2.47 2.77 13a 1.08 1.61 1.21 1.66 1.32 1.60 1.86 1.10 1.30 0.96 1.53 1.14 1.42 0.60 1.76 1.94 1.02 1.64 1.71 0.78 1.26 1.60 1.68 1.02  1.52 13b 1.37 1.20 1.73 0.95 1.35 1.05 1.32 1.43 1.32 1.79 1.02 1.21 1.80 0.91 1.27 1.40 1.63 1.54 1.58 1.20 1.39 1.20 1.44 1.34 0.75 Columns and rows are labeled with numbers of the compared actions. N varies between 19 to 22 per cell. Average ratings are given above the diagonal line of black squares. Ratings represent the similarity between two actions, on a scale of 1 (“very dissimilar”) to 7 (“very similar”). Standard deviations are given below the diagonal line of black squares. See Table 1 for a more detailed description of what the action numbers (i.e., “1a”, “1b”, etc.) refer to 1280 Behav Res (2018) 50:1270–1284 the other 25 actions: the smaller this average is, the more videos did not display, or run smoothly. Three participants distinct the action is. According to this metric, action 9a were excluded because they reported their first language to (M = 1.94, SD = 1.40), 5a (M = 2.02, SD = 1.48), and 2a be something other than English. The final participant sam- (M = 2.03, SD = 1.43) appear to be most distinct. ple included 24 individuals (8 males, 16 males) between Figure 5 shows that most combinations of actions 18–48 years old (M = 22.92, SD = 6.43). Participants auto- (80.6%) were rated–on average–on the left side of the seven- matically entered a lottery for an Amazon voucher upon point scale (i.e. the area left side of the first dotted line), completing the task. indicating that most actions are distinct from each other. The area between the two dotted lines covers the combinations Materials of actions that participants rated neutrally (12.9%). Very few combinations of actions (6.5%) were rated–on average–on Experiment 4 used the same videos as Experiment 3. the right end of the seven-point scale (i.e. the area on the right side of the second dotted line), indicating that only Procedure some of the actions are similar to each other. The experiment was set up in a web-based environment. Participants signed a digital consent form and were asked Experiment 4 for demographic information. Prior to the main task, par- ticipants were shown a video of the model moving across The fourth experiment assessed how accurately and con- the length of a scene. The instructions stated that every fol- cisely adult native English speakers can describe the actions lowing video would also show an actor moving across the in our database. This can also be used as the proxy mea- length of a scene, and that they had to describe the actor’s sure for how unusual adult native English speakers find each manner of movement as concise and accurate as possible. action. If our participants find the actions unusual, then they Participants were instructed to type an “X” to skip a trial in should not converge on single-word or multi-word labels for case they could not come up with a description for the move- the actions. ment. Participants were also asked not proceed if they were unable to view the video on the instruction page properly. Method During the main task, a video started playing on loop automatically in the center of the screen on each trial. 
Partic- Participants ipants were required to answer the question “Please describe the actor’s manner of movement as concise and accurate We recruited 28 native English speakers (10 males, 18 as possible:” using a text box below the video. Participants females) from the university’s online participant pool. One also rated the difficulty of coming up with a description on participant was excluded from further analyses because the a seven-point scale, where 1 indicated very difficult, 4 indi- cated neither difficult nor easy, and 7 indicated very easy. Trials were randomly displayed for each participant, until participants had seen all actions. After they had completed all trials, they were asked if all the videos ran smoothly, and if not, what type of problems had occurred. Data Analysis Verbatim responses were spell-checked and converted to lowercase letters. The length of the descriptions was mea- sured by counting the number of words separated by a blank space. Any punctuation (e.g., hyphens) did not count towards the number of words in a description. We then annotated the content words in the descrip- tions using a Cambridge English dictionary. Nouns, main verbs, adjectives and adverbs are content words, which usu- Fig. 5 Frequency of average similarity ratings for all combinations ally refer to some object, action, or characteristic of an of (different) actions in the database. N = 325 combinations of two event. Verbs, adjectives, and nouns (i.e., rotate, rotating, different actions. Ratings represent the similarity between two actions, and rotation) that have the same root were coded as the on a scale of 1 (“very dissimilar”) to 7 (“very similar”). The dotted same responses using the root of the word (i.e. rotate). lines mark the neutral score of 4 on the seven-point scale Behav Res (2018) 50:1270–1284 1281 Annotations could contain the same root more than once, for each action. Difficulty ratings were averaged over each but only unique roots counted towards the total number of action. content words in a description. For instance, one partici- pant described action 11a with “jump forward and alternate Results and Discussion your legs with each jump like a scissor movemenn, using the word “jump” first as verb and then as a noun. These Table 3 shows that participants provided quite lengthy two words have the same root and therefore only added descriptions for the actions (mean number of words per a count of one to the total count of content words per description: M = 6.99, SD = 5.71), ranging between 4.50 description. Auxiliary verbs, pronouns, articles, and prepo- words for action 9b and 9.80 words for action 8b. On sitions are grammatical words and were therefore not coded. average, 4.68 (SD = 3.02) roots were annotated for the Annotations were checked by an independent researcher. descriptions, ranging between 3.50 for action 9b and 10b We used two key statistics to evaluate the conciseness and 6.15 for action 8b. of the descriptions: the average number of unique roots per Participants generally approached the task by describ- description and the number of descriptions that contained a ing the actions using main verbs and modified these verbs single root. We computed the percentage of participants that using adjectives, adverbs, directional phrases, and nouns mentioned the same root for each action to measure agree- that specified the part of the body that was most involved ment among participants. Subsequently, these roots were in the movement. 
For instance, one participant described ranked based on how many of the participants used them in action 10b as “turn sideways and simply jump sideways their description and the three most used roots were reported whilst keeping your feet together” and another participant Table 3 Descriptive statistics of written descriptions for all 26 actions in the GRACE database No. Action X No. words No. roots Root 1 Root 2 Root 3 No. single Difficulty (%) M (SD) M (SD) (%) (%) (%) roots (%) M (SD) 1a. bowing 0 (0.0) 8.04 (6.22) 5.29 (3.51) bend (66.7) walk (58.3) forward (54.2) 1 (4.2) 4.58 (1.38) 1b. skating 2 (8.3) 6.32 (4.24) 4.18 (2.30) zigzag (25.0) slide (25.0) forward (25.0) 1 (4.2) 4.08 (1.79) 2a. wobbling 1 (4.2) 8.43 (5.18) 5.48 (2.69) body (62.5) rotate (41.7) upper (37.5) 1 (4.2) 3.67 (1.71) 2b. marching 0 (0.0) 5.92 (3.60) 4.29 (2.18) leg (66.7) forward (41.7) march (29.7) 3 (12.5) 5.29 (1.46) 3a. mermaiding 0 (0.0) 7.71 (6.80) 4.96 (3.63) side (79.2) jump (58.3) together (33.3) 0 (0.0) 5.04 (1.40) 3b. overstepping 0 (0.0) 6.92 (4.60) 4.96 (2.72) leg (54.2) step (33.3) forward (33.3) 1 (4.2) 4.33 (1.37) 4a. creeping 0 (0.0) 5.29 (6.83) 3.58 (3.24) walk (29.2) forward (29.2) slow (25.0) 5 (20.8) 4.50 (1.96) 4b. crisscrossing 0 (0.0) 7.29 (5.22) 4.79 (2.64) side (75.0) cross (62.5) leg (62.5) 1 (4.2) 3.92 (1.69) 5a. turning 1 (4.2) 9.39 (8.54) 5.91 (3.99) jump (79.2) side (58.3) turn (29.2) 0 (0.0) 3.96 (1.33) 5b. hopscotching 2 (8.3) 7.27 (8.02) 4.64 (3.98) hopscotch (50.0) leg (33.3) jump (29.2) 4 (16.7) 4.04 (1.78) 6a. swinging 1 (4.2) 8.78 (5.38) 5.30 (2.67) leg (70.8) circle (50.0) walk (33.3) 1 (4.2) 3.71 (1.52) 6b. skipping 5 (20.8) 8.21 (8.49) 5.00 (4.18) forward (45.8) leg (37.5) move (25.0) 2 (8.3) 2.67 (1.74) 7a. jumping 0 (0.0) 7.04 (4.25) 4.71 (2.46) jump (58.3) forward (45.8) leg (41.7) 1 (4.2) 3.96 (1.46) 7b. crossing 2 (8.3) 8.09 (4.68) 5.41 (2.74) cross (50.0) leg (45.8) walk (37.5) 0 (0.0) 3.54 (1.35) 8a. dropping 1 (4.2) 6.61 (6.52) 4.83 (3.77) walk (58.3) squat (33.3) crouch (29.2) 2 (8.3) 4.00 (1.47) 8b. folding 4 (16.7) 9.80 (7.21) 6.15 (3.17) leg (45.8) walk (41.7) step (33.3) 0 (0.0) 2.92 (1.79) 9a. twisting 2 (8.3) 5.77 (6.64) 4.27 (2.81) rotate (33.3) degree (25.0) side (20.8) 2 (8.3) 3.76 (1.69) 9b. stomping 0 (0.0) 4.50 (3.20) 3.50 (2.02) knee (50.0) high (41.6) stomp (33.3) 5 (20.8) 5.42 (1.32) 10a. trotting 1 (4.2) 5.43 (3.64) 4.04 (1.89) knee (62.5) high (54.2) forward (37.5) 2 (8.3) 2.67 (1.88) 10b. hopping 0 (0.0) 4.79 (3.75) 3.50 (2.32) side (87.5) jump (58.3) together (37.5) 1 (4.2) 6.08 (1.14) 11a. flicking 2 (8.3) 5.73 (4.00) 4.00 (2.12) leg (41.7) forward (33.3) quick (20.8) 1 (4.2) 3.50 (1.62) 11b. dragging 1 (4.2) 7.17 (5.65) 5.17 (3.28) forward (50.0) leg (41.7) step (37.5) 2 (8.3) 4.42 (1.74) 12a. grapevining 1 (4.2) 6.48 (5.88) 4.48 (2.98) cross (62.5) side (58.3) leg (50.0) 2 (8.3) 4.04 (1.23) 12b. shuffling 0 (0.0) 5.83 (4.39) 3.71 (2.31) walk (50.0) forward (37.5) lift (20.8) 3 (12.5) 5.42 (1.56) 13a. groining 3 (12.5) 9.48 (6.30) 6.14 (3.45) leg (54.2) walk (29.2) knee (29.2) 0 (0.0) 2.96 (1.68) 13b. scurrying 0 (0.0) 5.38 (4.53) 3.92 (3.03) tiptoe (58.3) step (33.3) fast (25.0) 3 (12.5) 5.36 (1.06) N = 24. Columns left to right describe the action number (No.), short-hand action label (Action), number (%) of participants who were not able to describe the action (X), length of a description (No. words), number of content words in a description (No. 
For the next analyses, we excluded 30 responses that stated an "X", because this indicated that participants could not come up with a description. Participants who attempted to describe the actions used only a single content word in 7.4% of the cases. In most responses, participants thus used more than one content word to describe the actions.

Participants rated the difficulty of coming up with a description on average with 4.22 (SD = 1.74), suggesting that the task was neither difficult nor easy. Participants found actions 6b and 10a (both M = 2.67) most difficult to describe and action 10b (M = 6.08) easiest to describe.

Finally, we correlated the length of the descriptions (i.e., the number of words) and the number of roots per description with the difficulty ratings for the actions. Participants who found it more difficult to describe the actions provided longer descriptions, r(595) = .10, p = .017, and used more content words, r(595) = .12, p = .003.
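This final analysis is a pair of Pearson correlations between per-response measures. A minimal sketch, assuming SciPy and hypothetical toy vectors in place of the usable responses (the reported r(595) values imply 597 paired observations; the original analyses were presumably run in R, which the reference list cites):

```python
from scipy.stats import pearsonr

# Hypothetical per-response vectors (one entry per usable description).
n_words    = [7, 4, 12, 6, 9, 5, 11]   # description length in words
n_roots    = [5, 3,  8, 4, 6, 3,  7]   # unique content-word roots
difficulty = [4, 6,  2, 5, 3, 6,  2]   # 1 = very difficult, 7 = very easy

# Correlate each conciseness measure with the difficulty ratings.
r_len, p_len = pearsonr(n_words, difficulty)
r_root, p_root = pearsonr(n_roots, difficulty)
print(f"words vs. difficulty: r = {r_len:.2f}, p = {p_len:.3f}")
print(f"roots vs. difficulty: r = {r_root:.2f}, p = {p_root:.3f}")
```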
General Discussion

We developed the GestuRe and ACtion Exemplar (GRACE) video database, which is publicly available from the Warwick Research Archive Portal at http://wrap.warwick.ac.uk/78493. The GRACE video database contains 676 videos of 26 novel manners of human locomotion (i.e., actors moving from one location to another in an unusual manner) performed by 13 male actors and 13 female actors, as well as videos of a female actor producing iconic gestures that represent these manners.

Our first norming study demonstrates that GRACE contains gesture and action videos that can be combined to create clear matches and mismatches between iconic gestures and manners of human locomotion. Based on the findings of this first norming study, we assigned two actors (one male and one female) to each pair of actions to maximize the match between the iconic gestures and actions. Our second norming study shows that the male and female actors who were assigned to an action pair perform the same actions in very similar manners and the different actions in highly distinct manners. Our third norming study indicates that the majority of actions are highly distinct from all other actions in the database. Our fourth norming study demonstrates that adult native English speakers do not converge on accurate and concise linguistic expressions for the actions in the database, indicating that these manners of human locomotion are unusual.

This database is useful for experimental psychologists working on action and gesture in areas such as language processing, vocabulary development, visual perception, categorization, and memory. By making our video database publicly available to the research community, we set out to inspire researchers to norm our videos for their own studies. We invite these researchers to share their norms with us and other researchers, so that we can upload them along with the GRACE video database through the Warwick Research Archive Portal.

Acknowledgements We would like to express our gratitude to all the actors in the videos, without whom it would not have been possible to develop the video database. A special thanks goes to Lukasz Walasek and Neil Stewart for helping us run our experiments in a web-based environment.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix

Fig. 6 Profit matrix for action-gesture matches for females. Stimuli were rated on a seven-point scale, where 1 indicated a very bad match, 4 indicated neither a good nor a bad match, and 7 indicated a very good match. Ratings were averaged over each action pair and actor combination. Column numbers correspond to action pairs and row numbers correspond to the female actors in the database. Grey rectangles indicate the ratings that were selected by the Hungarian algorithm, which maximized the total profit of a one-to-one assignment of female actors to action pairs.

Fig. 7 Profit matrix for action-gesture matches for males. Stimuli were rated on a seven-point scale, where 1 indicated a very bad match, 4 indicated neither a good nor a bad match, and 7 indicated a very good match. Ratings were averaged over each action pair and actor combination. Column numbers correspond to action pairs and row numbers correspond to the male actors in the database. Grey rectangles indicate the scores that were selected by the Hungarian algorithm, which maximized the total profit of a one-to-one assignment of male actors to action pairs.
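Figures 6 and 7 describe the actor-to-action-pair assignment as a profit-maximizing one-to-one matching found with the Hungarian algorithm (Kuhn, 1955). The sketch below illustrates that selection step; it is not the authors' code, and it assumes SciPy's linear_sum_assignment (which solves the same assignment problem) plus a hypothetical 3 x 3 profit matrix in place of the 13 x 13 matrices shown in the figures.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical profit matrix: profit[i, j] is the mean gesture-match rating
# for actor i performing action pair j.
profit = np.array([
    [5.2, 3.1, 4.4],
    [2.9, 6.0, 3.3],
    [4.1, 2.7, 5.8],
])

# One-to-one assignment of actors to action pairs maximizing total profit;
# maximize=True avoids having to negate the ratings by hand.
actors, pairs = linear_sum_assignment(profit, maximize=True)
for i, j in zip(actors, pairs):
    print(f"actor {i + 1} -> action pair {j + 1} (rating {profit[i, j]:.1f})")
print(f"total profit: {profit[actors, pairs].sum():.1f}")
```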
References

Cassell, J., McNeill, D., & McCullough, K. (1999). Speech-gesture mismatches: Evidence for one underlying representation of linguistic and nonlinguistic information. Pragmatics and Cognition, 7(1), 1-34.

Feyereisen, P. (2006). Further investigations on the mnemonic effect of gestures: Their meaning matters. European Journal of Cognitive Psychology, 18(2), 185-205.

Goodrich, W., & Hudson Kam, C. (2009). Co-speech gesture as input in novel verb learning. Developmental Science, 12(1), 81-87. https://doi.org/10.1111/j.1467-7687.2008.00735.x

Imai, M., Haryu, E., & Okada, H. (2005). Mapping novel nouns and verbs onto dynamic action events: Are verb meanings easier to learn than noun meanings for Japanese children? Child Development, 76(2), 340-355. https://doi.org/10.1111/j.1467-8624.2005.00849.x

Imai, M., Kita, S., Nagumo, M., & Okada, H. (2008). Sound symbolism facilitates early verb learning. Cognition, 109(1), 54-65. https://doi.org/10.1016/j.cognition.2008.07.015

Kita, S., & Özyürek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal? Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48(1), 16-32. https://doi.org/10.1016/S0749-596X(02)00505-3

Kuhn, H. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2), 83-97.

Kuhn, H. (1956). Variants of the Hungarian method for assignment problems. Naval Research Logistics Quarterly, 3, 253-258.

Madan, C., & Singhal, A. (2012). Using actions to enhance memory: Effects of enactment, gestures, and exercise on human memory. Frontiers in Psychology, 3, 507. https://doi.org/10.3389/fpsyg.2012.00507

Maguire, M., Hennon, E., Hirsh-Pasek, K., Golinkoff, R., Slutzky, C., & Sootsman, J. (2002). Mapping words to actions and events: How do 18-month-olds learn a verb? In B. Skarabela, S. Fish, & A. Do (Eds.), Proceedings of the 27th Annual Boston University Conference on Language Development (pp. 371-382). Somerville, MA: Cascadilla Press.

Maguire, M., Hirsh-Pasek, K., Golinkoff, R., & Brandone, A. (2008). Focusing on the relation: Fewer exemplars facilitate children's initial verb learning and extension. Developmental Science, 11(4), 628-634. https://doi.org/10.1111/j.1467-7687.2008.00707.x

Malt, B., Ameel, E., Imai, M., Gennari, S., Saji, N., & Majid, A. (2014). Human locomotion in languages: Constraints on moving and meaning. Journal of Memory and Language, 74, 103-127.

Malt, B., Gennari, S., Ameel, E., Tsuda, N., & Majid, A. (2008). Talking about walking: Biomechanics and the language of locomotion. Psychological Science, 19(3), 232-240. https://doi.org/10.1111/j.1467-9280.2008.02074.x

McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.

McNeill, D., Cassell, J., & McCullough, K. (1994). Communicative effects of speech-mismatched gestures. Research on Language and Social Interaction, 27(3), 223-237. https://doi.org/10.1207/s15327973rlsi2703_4

Mumford, K. (2014). The relationship between vocabulary and gesture development in early childhood and infancy (PhD dissertation). University of Birmingham.

Mumford, K., & Kita, S. (2014). Children use gesture to interpret novel verb meanings. Child Development, 85(3), 1181-1189. https://doi.org/10.1111/cdev.12188

Özçalışkan, Ş., Gentner, D., & Goldin-Meadow, S. (2014). Do iconic gestures pave the way for children's early verbs? Applied Psycholinguistics, 35(6), 1-20. https://doi.org/10.1017/S0142716412000720

Özçalışkan, Ş., Lucero, C., & Goldin-Meadow, S. (2016). Does language shape silent gestures? Cognition, 148(1), 10-18. https://doi.org/10.1016/j.cognition.2015.12.001

Özyürek, A., & Kita, S. (1999). Expressing manner and path in English and Turkish: Differences in speech, gesture, and conceptualization. In M. Hahn & S. Stoness (Eds.), Proceedings of the 21st Annual Conference of the Cognitive Science Society (pp. 507-512). London: Erlbaum.

Özyürek, A., Willems, R., Kita, S., & Hagoort, P. (2007). On-line integration of semantic information from speech and gesture: Insights from event-related brain potentials. Journal of Cognitive Neuroscience, 19(4), 605-616. https://doi.org/10.1162/jocn.2007.19.4.605

Pulverman, R., Golinkoff, R., Hirsh-Pasek, K., & Sootsman Buresh, J. (2008). Infants discriminate manners and paths in non-linguistic dynamic events. Cognition, 108(3), 825-830. https://doi.org/10.1016/j.cognition.2008.04.009

Pulverman, R., Hirsh-Pasek, K., Golinkoff, R., Pruden, S., & Salkind, S. (2006). Conceptual foundations for verb learning: Celebrating the event. In K. Hirsh-Pasek & R. Golinkoff (Eds.), Action meets word: How children learn verbs. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195170009.003.0006

Pulverman, R., Song, L., Golinkoff, R. M., & Hirsh-Pasek, K. (2013). Preverbal infants' attention to manner and path: Foundations for learning relational terms. Child Development, 84(1), 241-252. https://doi.org/10.1111/cdev.12030
R Development Core Team (2011). R: A language and environment for statistical computing. Vienna, Austria. Retrieved from http://www.R-project.org (ISBN 3-900051-07-0).

Salkind, S. (2003). Do you see what I see? Paper presented at the 4th Annual University of Delaware Linguistics and Cognitive Science Graduate Student Conference, Newark, DE.

Salkind, S., Golinkoff, R., & Brandone, A. (2005). Infants' attention to novel actions in relation to the conflation patterns of motion verbs. In R. Golinkoff & K. Hirsh-Pasek (Eds.), Action packed for language: Prelinguistic foundations for learning relational terms. Atlanta: Society for Research in Child Development Biennial Meeting.

Scott, R., & Fisher, C. (2012). 2.5-year-olds use cross-situational consistency to learn verbs under referential uncertainty. Cognition, 122(2), 163-180. https://doi.org/10.1016/j.cognition.2011.10.010

Slobin, D. (2004). The many ways to search for a frog: Linguistic typology and the expression of motion events. In S. Strömqvist & L. Verhoeven (Eds.), Relating events in narrative: Vol. 2. Typological and contextual perspectives (pp. 219-257). Mahwah, NJ: Lawrence Erlbaum Associates.

Slobin, D. (2006). What makes manner of motion salient? Explorations in linguistic typology, discourse, and cognition. In M. Hickmann & S. Robert (Eds.), Space in languages: Linguistic systems and cognitive categories (pp. 59-81). Amsterdam: John Benjamins.

Slobin, D., & Hoiting, N. (1994). Reference to movement in spoken and signed languages: Typological considerations. In Proceedings of the 20th Annual Meeting of the Berkeley Linguistics Society (pp. 487-503). Berkeley: Berkeley Linguistics Society.

Slobin, D., Ibarretxe-Antuñano, I., Kopecka, A., & Majid, A. (2014). Manners of human gait: A crosslinguistic event-naming study. Cognitive Linguistics, 25(4), 701-741. https://doi.org/10.1515/cog-2014-0061

Spencer, D., McDevitt, T., & Esch, M. (2009). Brief training with co-speech gesture lends a hand to word learning in a foreign language. Language and Cognitive Processes, 24(2), 313-334. https://doi.org/10.1080/01690960802365567

Supalla, T. (2009). In S. Fischer & P. Siple (Eds.), Theoretical issues in sign language research: Linguistics (pp. 127-152). Chicago: University of Chicago Press.

Wood, J. (2008). Visual memory for agents and their actions. Cognition, 108(2), 522-532. https://doi.org/10.1016/j.cognition.2008.02.012
