‘Can I Trust the Spoken Dialogue System Because It Uses the Same Words as I Do?’—Influence of Lexically Aligned Spoken Dialogue Systems on Trustworthiness and User Satisfaction

Abstract

One of many ways in which spoken dialogue systems (SDS) are becoming more and more flexible is in their choice of words (e.g. alignment to the user’s vocabulary). We examined how users perceive such adaptive and non-adaptive SDS regarding trustworthiness and usability. In Experiment 1, 130 participants read out questions to an SDS that either did or did not show lexical alignment in its replies. They perceived higher cognitive demand when the SDS did not employ alignment. In Experiment 2, 135 participants listened to a conversation between a human and the same SDS in an online study. They judged the aligned SDS to have more integrity and to be more likeable. Implications for the design of SDS are discussed.

RESEARCH HIGHLIGHTS

We compared users’ and listeners’ perceptions of spoken dialogue systems (SDS).
An SDS’s lexical alignment affects users’ cognitive demand.
An SDS with lexical alignment appears more likeable to listeners.
An SDS with lexical alignment is ascribed higher integrity by listeners.

1. INTRODUCTION

Having a personal virtual assistant organizing your everyday tasks is no longer science fiction. Facebook has recently presented an assistant called ‘M’ that can be talked to in natural language (Hempel, 2015). On your friend’s birthday, for example, M can offer to order a cake for you, make a reservation in your friend’s favorite restaurant, and even make suggestions regarding a suitable present. Google has also developed a system that is able to conduct very natural spoken conversations. Extracting information autonomously, it can, for instance, deal with IT problems almost like a human consultant (Vinyals and Le, 2015). Furthermore, Amazon offers a personal assistant, called ‘Alexa’, for various purposes ranging from retrieving information to smart home operation (Manjoo, 2016). Trust is a major issue in this type of interaction (Cowan et al., 2015).
Users of systems like ‘Siri’ from Apple mention trust as a criterion that determines the range of tasks they ask the system to perform (Luger and Sellen, 2016). Trust becomes especially important when the capability of the system is opaque to users. People tend to be unsure whether the system will perform tasks properly (Luger and Sellen, 2016). When assessing trustworthiness, users rely on language, because certain characteristics of (spoken) language can promote trust (Jucks et al., 2016). These include self-disclosing information, addressing emotions, and being empathic. When people talk to each other in dialogue situations, they tend to adopt a range of linguistic features used by their interlocutor (Branigan et al., 2010; Brennan and Clark, 1996). One of these is adaptation on the level of word choice, termed lexical alignment (Branigan et al., 2011). This means that a speaker employs words that have already been used by the conversational partner. Lexical alignment plays an important role not only in human–human but also in human–computer interaction (Bell et al., 2003; Cowan and Branigan, 2015; Gustafson et al., 1997; Linnemann and Jucks, 2016). It has been a primary focus of research on spoken dialogue systems (SDS), because it not only contributes decisively to the success of communication (Nenkova et al., 2008) but can also inform the general body of alignment research (Branigan et al., 2011). However, the use of lexical alignment also reflects social functions, like politeness (Branigan et al., 2010; Jucks et al., 2014; Torrey et al., 2006) and affiliation (Ireland and Pennebaker, 2010). Because SDS are likely to be perceived as social actors (Sundar and Nass, 2000), these social functions of lexical alignment should affect communication with SDS and, in particular, assessment of their trustworthiness.
Asking whether and which SDS are perceived as trustworthy is of major practical importance in light of their widespread and growing implementation (López-Cózar et al., 2014; Luger and Sellen, 2016). In this context, lexical alignment could prove to be one key component among the linguistic features employed by SDS. In the following introductory sections, we first introduce lexical alignment and then present its role in human–computer interaction, especially in interaction with SDS. We then examine the role of trust in the use of SDS. Finally, we present the rationale of the present study and derive hypotheses.

1.1. Lexical alignment

People tend to converge linguistically when in dialogue. This has frequently been termed alignment (Branigan and Pearson, 2006; Pickering and Garrod, 2004), convergence (Giles et al., 1991), linguistic entrainment (Garrod and Anderson, 1987) or use of linguistic precedents (Barr and Keysar, 2002). The approaches differ mainly in how far they assume that the conversational partner is taken into account and in the assumed degree of automation or intention. So far, no account satisfactorily integrates these approaches (Foltz et al., 2015). However, two approaches are mainly considered in research (Cowan et al., 2015). On the one hand, alignment can be conceptualized as a rather automatic process: In their Interactive Alignment Model, Pickering and Garrod (2004) postulate that alignment is based on automatic priming mechanisms. According to the model, the tendency to align on one linguistic level (the lexical level, for instance) is enhanced through alignment on another level (the syntactic level, for instance). According to Pickering and Garrod (2004), successful alignment ensures successful communication.
On the other hand, several studies suggest that lexical alignment represents a partner-specific process, which results in audience design (Brennan and Clark, 1996; Krauss and Fussell, 1991; Metzing and Brennan, 2003). The process by which interlocutors settle on shared references is termed grounding; the resulting shared knowledge base is referred to as common ground (Clark and Brennan, 1991; Clark, 1996). The approaches described are not strictly separated from each other. Alignment can contain elements of both approaches, depending on the context (Branigan et al., 2010). Lexical alignment refers to the employment of words that have already been used by the conversational partner. The existence of lexical alignment has been shown in numerous empirical studies (Branigan et al., 2010, 2011; Brennan and Clark, 1996). In the following section, we describe the role of lexical alignment in spoken human–computer interaction.

1.2. Lexical Alignment in Spoken Human–Computer Interaction

Research has shown that people commonly show lexical alignment toward computers (Branigan et al., 2010), and even the mere belief that one is communicating with a computer leads to a greater amount of lexical alignment (Branigan et al., 2010, 2011). This seems to be because people aim for communicative success (Branigan et al., 2011; Linnemann and Jucks, 2016). Indeed, research confirms that lexical alignment actually does enhance the success of communication with SDS (Koulouri et al., 2016; Levitan et al., 2011). In addition, word choice can give information about the relationship between conversational partners.
According to communication accommodation theory, deliberately adopting or not adopting an interaction partner’s words can express convergence with or divergence from either the partner or the topic (Giles et al., 1991; see also Danet, 1980; Van der Wege, 2009): Convergence can be motivated by the attempt to enhance the communicational effectiveness, for instance by facilitating comprehension. Divergence can express the wish to emphasize differences to the linguistic style of a conversational partner. Ireland and Pennebaker (2010) have shown that matching language style can serve as a marker for the social quality of the relationship. Furthermore, the adoption of words is perceived as polite (Branigan et al., 2010; Torrey et al., 2006) and can lead to positive feelings (Bradac et al., 1988; Branigan et al., 2010; Van Baaren et al., 2003). The resulting likeability, in turn, has been shown to exert a positive influence on the alignment toward an embodied conversational agent; that is, a graphical computer figure that possesses conversational capabilities (Pickard et al., 2014). People can be both users of SDS and observers of others interacting with an SDS. The latter case occurs, for example, for navigation systems operated by one person in a car with further occupants, or with personal assistants like ‘Alexa’ from Amazon, which is intended to be used by all members of a household. Furthermore, advertisements show other people interacting with an SDS. This may be the first encounter for potential customers before directly experiencing the interaction with an SDS. Research has shown that the degree of involvement is an important aspect in communication: Wilkes-Gibbs and Clark (1992), for example, investigated beliefs about shared information. Branigan et al. (2007) showed that addressees possess a greater likelihood to repeat the speaker’s grammatical form than bystanders. 
Garrod and Pickering (2009) interpret this result as showing that addressees prepare to give an answer and therefore predict the speaker’s grammatical form. Thus, lexical alignment plays an important role both for speakers of and listeners to utterances. For speakers, alignment can be an instrument to ensure communicative success (Levitan et al., 2011; Linnemann and Jucks, 2016) and to mark their position toward the interlocutor (Van der Wege, 2009). The first aspect is especially important in communication with SDS, because the natural variability of word choice is relatively high (vocabulary problem; Furnas et al., 1987). Therefore, the implementation of mechanisms that lead people to adopt words facilitates communication and reduces problems in understanding (Cowan and Branigan, 2015; Nenkova et al., 2008; Tomko and Rosenfeld, 2004). For listeners, the implementation of lexical alignment by the SDS can be beneficial as well. Lopes et al. (2011) reported that it has positive effects on system performance and estimated dialogue success. Furthermore, some participants reported that the system sounded more natural the more it aligned.

1.3. SDS and trust

McKnight (2005) has stated that ‘trust in technology is built the same way as trust in people’ (p. 330). Evidence suggests that people treat computers as social actors, not considering that they are the product of programmers (Sundar and Nass, 2000; see also Dybkjær and Bernsen, 2000). Hence, people ascribe human-like social qualities to computer systems. This does not even require a highly evolved language: A basic language capability showing the system’s capability can produce the perception of the computer as a human-like being (De Angeli et al., 1999). The effect of ascribing human-like qualities remains even when participants know they are communicating with a computer (Holtgraves et al., 2007).
However, there seems to be a relationship between the degree of anthropomorphism and trustworthiness: The more human-like a computer appeared, the higher participants rated its competence and trustworthiness (Gong, 2008). In communication with an SDS, especially the presence of voice and spoken language leads to the involvement of trust (Mitchell et al., 2011). Indeed, the voice alone conveys characteristics that influence the assessment of trustworthiness. For instance, male voices are rated as trustworthier (Lee et al., 2000), and so are voices with cues that match a participant’s personality (Nass and Lee, 2001). Existing SDS already possess a relatively high degree of anthropomorphism (Edlund et al., 2008; Mavridis, 2015). The growing capabilities of SDS are possible due to their permanent connection to the Internet (Mavridis, 2015) and the continuous gain in information from their users. Users perceive that they are communicating with SDS as social actors (Nass and Brave, 2005; Sundar and Nass, 2000) and are not perceiving that they are transferring their information to a distal server. Therefore, we argue that their trust behavior consists in talking to the SDS and self-disclosing personal data and questions to it (Jucks et al., 2016; Tseng and Fogg, 1999). Because of the perception of SDS as social actors and, in turn, the activation of social categories, we further assert that SDS can be conceptualized as receivers of trust. In the vast majority of conceptualizations, trust incorporates the willingness to depend on somebody else (McKnight and Chervany, 2001) and therefore the willingness to be vulnerable (Mayer and Davis, 1999). In their integrative model of trust, Mayer et al. (1995) list ability (competence), benevolence and integrity, forming the so-called ABI model, as antecedents of trustworthiness. Ability addresses the competence to attain the aspired results. 
Benevolence refers to the trustee’s motivation and good will toward the trustor, and integrity refers to the trustee’s honesty. All these qualities can be ascribed to elaborated SDS (Paek and Pieraccini, 2008). Trust represents an important precondition for spoken human–computer interaction because it enables interaction and enhances the vivid exchange of information (Maddux et al., 2007). Even when users know that privacy is low, they can compensate for this through high trust (Joinson et al., 2010). Thus, trust is an essential aspect in communication with SDS.

1.4. Rationale

We have outlined above that lexical alignment occurs in the interaction with SDS and that it plays an important role for communication from the point of view of both speakers and listeners. Its importance stems from its contribution to conversational success as well as the transmission of social aspects such as affiliation. Because highly capable systems possess a high degree of human-likeness, users can easily perceive them as social actors (Holtgraves et al., 2007; Nass and Brave, 2005; Nass and Lee, 2001; see also Von der Pütten et al., 2010). This, in turn, may promote trusting behavior with the SDS as the trustee. In the present study, we used two experiments to investigate the influence of lexical alignment on the perception of the trustworthiness of an SDS and the satisfaction with the conversation. In Experiment 1, we employed an SDS and investigated how observers assessed its trustworthiness and their satisfaction with the conversation. In Experiment 2, we examined how users experience lexical alignment shown by an SDS they talked to themselves. That is, in Experiment 2, users had direct experience of lexical alignment in response to their questions (Holtgraves and Han, 2007).

2. EXPERIMENT 1: OBSERVER STUDY

In the first experiment, we investigated the effects of lexical alignment shown by the SDS on how observers perceived the system’s trustworthiness and the communication in general.
Listeners have been shown to play an important role for alignment (Clark and Krych, 2004; Krauss and Fussell, 1991; Krauss, 1987). The degree of personal involvement turned out to be an important aspect of communication (Garrod and Pickering, 2009; Wilkes-Gibbs and Clark, 1992). Being an observer of a conversation, especially of a recorded conversation, does not include getting involved in this conversation. Not having to be prepared to join the conversation might direct observers’ attention to conversational aspects other than those a speaker would focus on. For Experiment 1, we formulated the following hypotheses.

Based on the findings of Gong (2008) and Mitchell et al. (2011), we hypothesized:

Hypothesis 1: When the SDS shows lexical alignment, users will ascribe higher trustworthiness compared to when the system shows no lexical alignment.

Lexical alignment can positively influence the communicative success (Branigan et al., 2011; Linnemann and Jucks, 2016) in the communication with SDS (Koulouri et al., 2016; Levitan et al., 2011). Therefore:

Hypothesis 2: When the SDS shows lexical alignment, satisfaction with the conversation will be rated higher compared to when the system shows no lexical alignment.

Alignment has a positive influence on the quality of a relationship (Ireland and Pennebaker, 2010). Therefore:

Hypothesis 3: When the SDS shows lexical alignment, the quality of the relationship will be rated higher compared to when the system shows no lexical alignment.

Participants listened to the conversation between a student and the SDS. The SDS either showed or did not show lexical alignment toward the student’s utterances. Afterwards, participants assessed the perceived trustworthiness of the system and their satisfaction with the conversation. Additionally, we presented three answers from each condition to the participants and asked them to edit these.
The idea behind taking this exploratory approach was to assess the linguistic preferences participants would show when they were free to change the proposals. The experiment was conducted in an online setting.

2.1. Methods and materials

2.1.1. Participants

Participants were recruited using students’ newsletters and via posters in the university building. They received 5 € vouchers from Amazon or the equivalent in course credits. We conducted an analysis of power in advance (assuming 1 − β = .80; Cohen’s f = .25) and determined that we needed a sample size of 128. A total of 135 students attending a large university in Germany participated in the experiment; 83 of them were female and 52 male. Their mean age was 23.02 years (SD = 3.60). On average, they were in their fourth semester (M = 4.31, SD = 2.53). All participants reported that German was their first language. The largest group (30.4%) were training to be teachers, 14.1% were majoring in psychology, and the others came from various disciplines. They reported using a computer for an average of 21.92 (SD = 14.92) hours a week and having a high level of computer knowledge (M = 3.50, SD = 0.66 on a 4-point scale ranging from 1 [beginner] to 4 [expert]). They reported using SDS for an average of 4.06 (SD = 0.95) hours per week, and using them for quick access to information (28.9%), multiple purposes (11.9%), navigation (2.2%) and weather forecasts (1.5%).

2.1.2. Design

Participants listened to the conversation between a student and an SDS regarding issues facing first-year students at the university. For this purpose, we created a mock SDS, which we called ‘ACURI’ (Advanced Conversational Utterances Research Interface). We embedded the recorded conversation between the student and ACURI in an online survey (EFS Survey®). We varied whether the SDS showed lexical alignment to the student or not. Participants were randomly assigned to conditions.
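The a priori power analysis reported in Section 2.1.1 can be roughly reproduced with a normal approximation. The following is a minimal sketch, not the procedure actually used (the source does not name the software). It assumes two equally sized groups, so that Cohen's d = 2f, and a two-sided alpha of .05; the function name `approx_total_n` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_total_n(f: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate total sample size for a two-group comparison.

    Assumption: two equally sized groups, so Cohen's d = 2 * f.
    Uses the normal approximation instead of the noncentral distribution.
    """
    d = 2 * f
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided criterion
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n_per_group = ceil(2 * ((z_alpha + z_power) / d) ** 2)
    return 2 * n_per_group


total_n = approx_total_n(0.25)  # 126 under the normal approximation
```

Under these assumptions the approximation gives 126. An exact calculation based on the noncentral distribution gives a slightly larger value, which is in line with the sample size of 128 reported above.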
The interval between utterances was held constant at 1 s between the student’s question and the system’s answer and 2 s between the system’s answer and the student’s next question. The study was conducted as an online experiment. Instructions included the false information that ACURI is a system that could be applied to various topics, and that it had been made available to first-year students in order to help guide them through their first days at university (see https://sites.google.com/site/sdsalignment). We chose this topic area because of our student sample. Due to the design, participants could not use their own words. However, we aimed at a topic that is related to the participants’ life to ensure a certain ecological validity. Moreover, providing information is a central task of common SDS. We told participants that we were interested in how people interact with this kind of system. After receiving their instructions, participants listened to the recorded conversation. They then completed a questionnaire including SASSI, QRI, Trustworthiness Scale (adapted from Mayer and Davis, 1999), NEO-PI-R and TA-EG (description and alpha values: see below). Finally, we presented three questions and answers from the dialogue and gave participants the opportunity to edit them. At the end of the study, participants were thanked, fully debriefed and rewarded.

2.1.3. Materials

The study was conducted in German. The conversation between a student and ACURI was recorded in advance. A female voice from the MacBook Pro Voice-Over system was used to read out ACURI’s answers; the student’s utterances were recorded with a microphone. The student spoke ten utterances, which did not vary between conditions. The topics were relevant issues for first-year students such as counseling, accommodation, and study regulations (see https://sites.google.com/site/sdsalignment). We varied whether ACURI showed lexical alignment to the student or did not show lexical alignment.
The word length of the utterances was approximately equal in each condition (see Appendix A). Furthermore, the utterances contained the same number of major and minor clauses.

2.2. Dependent variables

2.2.1. Trustworthiness scale

We assessed trustworthiness by adapting a scale from Mayer and Davis (1999) that was constructed initially for use in the context of organizational relationships (Mayer et al., 1995). These authors defined trust as the willingness to be vulnerable to another party’s actions. Their model conceptualizes the antecedents of trust as ability, benevolence, and integrity. Thus, the perception of the trustee’s trustworthiness depends on the assessment of these factors. In this definition, the trustee has to be identifiable and perceived to act with volition. Based on findings presented above in the section on trust and SDS, we assume that our mock SDS meets this requirement. The scale has been applied successfully in other contexts (e.g. online communication: Thon and Jucks, 2014). Internal consistencies were α = 0.84 for ability (five items), α = 0.83 for benevolence (five items) and α = 0.71 for integrity (six items).

2.2.2. Satisfaction with the conversation

We assessed satisfaction with the conversation with the Subjective Assessment of Speech System Interfaces (SASSI, Hone and Graham, 2000, 2001). SASSI contains six subscales: Response Accuracy, Likeability, Cognitive Demand, Annoyance, Habitability and Speed. Response Accuracy covers the user’s judgment of the appropriateness of the answers. Likeability refers to friendliness and usefulness. Cognitive Demand comprises the user’s own perceived effort. Annoyance describes repetitiveness, boredom and frustration. Habitability refers to the user’s understanding of the system’s and her own reaction. Speed means the reaction time of the system. Measures were assessed on 7-point scales ranging from 1 (I do not agree at all) to 7 (I totally agree).
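The internal consistencies reported for the scales in this section can be illustrated with a short computation. The sketch below assumes that the reported α values are Cronbach's α, which is the usual reading but is not stated explicitly in the source. The function name `cronbach_alpha` and the example scores are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item; position i in every list
    belongs to participant i. Sample variances (n - 1) are used throughout.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # sum score per participant
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings of five participants on three items (not study data).
example_items = [
    [4, 5, 3, 6, 2],
    [5, 5, 4, 6, 3],
    [4, 6, 3, 7, 2],
]
alpha = cronbach_alpha(example_items)  # approximately 0.96
```

With only a few items, as in the two-item Speed subscale, α tends to be lower for the same inter-item correlation, which is worth keeping in mind when comparing subscales.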
Internal consistencies for the SASSI subscales were α = 0.72 for Response Accuracy (nine items), α = 0.81 for Likeability (nine items), α = 0.670 for Cognitive Demand (four items), α = 0.71 for Annoyance (five items), α = 0.722 for Habitability (five items) and α = 0.71 for Speed (two items).

2.2.3. Quality of relationship inventory

We assessed the relationship with the SDS by adapting the Quality of Relationship Inventory (QRI, Pierce et al., 1997; Verhofstadt et al., 2006; validated for German samples by Reiner et al., 2012) for application to an SDS. The inventory includes the scales support (Is support available from the relationship?), conflict (Does the relationship cause conflicts and ambivalent feelings?) and depth (How close and important is the relationship?). It is widely used for assessing relationships with significant others, and was therefore designed originally for use in human–human relationships. However, research has shown that people perceive and treat computational agents as social actors even when clearly informed that their partner is a computer (Holtgraves et al., 2007; Sundar and Nass, 2000). Therefore, we adapted the items to assess the quality of the relationship in human–computer interaction. Internal consistencies for the QRI subscales were α = 0.54 for the subscale support (five items), α = 0.84 for depth (six items) and α = 0.87 for conflict (12 items).

2.2.4. Text editing

Participants were given the opportunity to revise three of the answers given by the SDS in the audio file they had listened to before. These three answers addressed the topics start of studies, study regulation and residence. The initial answer on start of studies contained 41 words in both conditions, the answer on study regulation contained 13 words in both conditions, and the answer on residence contained 49 words in the condition with alignment and 45 words in the condition without alignment.
Participants could freely edit the text in a text box in order to improve it. Changes were saved when participants had finished their input. Following a procedure developed by Jucks et al. (2007), we compared documents (using Microsoft Word to detect changes), which included using Faigley and Witte’s (1981) taxonomy of revision changes as a basis for developing a coding scheme. First, we ascertained whether a change in the text had been made or not. If yes, we examined which kind of change had been performed: addition, deletion, replacement or change of the position of words or phrases in the text. Furthermore, we examined whether participants had changed particular words, which would make a difference regarding lexical alignment, that is, whether the change led to stronger or weaker lexical alignment or had no effect on it. Then we assessed the scope; that is, whether the change affected a word, phrase, or whole sentence. Finally, we judged whether the change influenced the meaning of the text. The coding was conducted by an instructed person. In case of uncertainty, the respective case was discussed with another person familiar with the coding scheme’s criteria.

2.3. Control measures

2.3.1. NEO-PI-R

We controlled for the participants’ individual disposition to trust with eight items from the German version of NEO-PI-R (Ostendorf and Angleitner, 2004, following Costa and McCrae, 1992). Within the NEO-PI-R, propensity to trust is operationalized as a dimension of agreeableness. In this extracted form, the scale has been used before by Thon and Jucks (2014). The internal consistency was α = 0.85.

2.3.2. TA-EG

We controlled for participants’ affinity with technology with the German TA-EG questionnaire (Karrer et al., 2009). This contains the subscales enthusiasm, competence, positive consequences and negative consequences.
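The first steps of the revision coding described in Section 2.2.4 (was a change made, and was it an addition, deletion or replacement) can be illustrated with a word-level comparison. This is only a sketch under stated assumptions: the study itself compared documents in Microsoft Word and used a human coder, whereas the example below uses Python's `difflib`. The function name `classify_changes` and the example sentences are hypothetical. Changes of position are not detected as such here; a moved phrase shows up as one deletion plus one addition.

```python
from difflib import SequenceMatcher


def classify_changes(original: str, revised: str) -> dict:
    """Count word-level additions, deletions and replacements.

    Texts are split on whitespace and compared as word sequences.
    """
    old_words, new_words = original.split(), revised.split()
    counts = {"addition": 0, "deletion": 0, "replacement": 0}
    labels = {"insert": "addition", "delete": "deletion", "replace": "replacement"}
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, _i1, _i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":  # every non-equal opcode is one contiguous change
            counts[labels[tag]] += 1
    return counts


# Hypothetical example texts (not taken from the study materials).
changes = classify_changes("the office is open daily", "the student office is open")
# changes == {"addition": 1, "deletion": 1, "replacement": 0}
```

Judgments about scope, meaning and the effect on lexical alignment would still need a human coder or further rules beyond this sketch.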
Internal consistencies were α = 0.89 for the subscale enthusiasm (five items), α = 0.60 for competence (four items), α = 0.66 for positive consequences (five items), and α = 0.78 for negative consequences (five items).

2.4. Results

2.4.1. Preliminary analyses

There were no differences between conditions for sex, age and computer knowledge (comparison of distributions with Mann–Whitney U test). Furthermore, all TA-EG and NEO-PI-R subscales were equally distributed over conditions. An alpha level of 0.05 was used for all statistical tests. We used a Kolmogorov–Smirnov goodness-of-fit test to test whether the data were normally distributed. Because the data for the scales below were not normally distributed (P < 0.05), we employed a Mann–Whitney U test to compare groups. Values for Trustworthiness Scale, SASSI and Quality of Relationship Inventory are reported one-tailed because of our uni-directional hypotheses. For the exploratory analyses regarding text revision, all reported values are two-tailed.

2.4.2. Trustworthiness scale

Participants ascribed more integrity to the SDS in the condition with lexical alignment (M = 3.370, SD = 0.647) than in the condition without lexical alignment (M = 3.002, SD = 0.631), U = 1441, P < 0.001, z = 3.665. There were no differences for the subscales ability (U = 1905, P = 0.05, z = 1.611, ns) and benevolence (U = 2111.5, P = 0.24, z = 0.692, ns) (all one-tailed, Mann–Whitney U test).

2.4.3. SASSI

Participants rated the SDS’s response accuracy (U = 1787.5, P = 0.012, z = 2.125; with alignment: M = 4.833, SD = 0.571, without alignment: M = 4.701, SD = 0.471), likeability (U = 1870.5, P = 0.034, z = 1.757; with alignment: M = 3.725, SD = 0.653, without alignment: M = 3.580, SD = 0.587) and speed (U = 2672.5, P = 0.043, z = 1.854; with alignment: M = 3.889, SD = 0.795, without alignment: M = 4.069, SD = 0.738) higher in the condition with lexical alignment than in the condition without lexical alignment. No differences were found for the subscales cognitive demand (U = 2505, P = 0.15, z = 1.051, ns), annoyance (U = 2585, P = 0.08, z = 1.403, ns) and habitability (U = 2291, P = 0.46, z = 0.102, ns) (all one-tailed, Mann–Whitney U test).

2.4.4. Quality of relationship inventory

Analyses (all one-tailed, Mann–Whitney U test) revealed no significant differences on the Quality of Relationship Inventory subscales support (U = 2063.5, P = 0.18, z = 0.909, ns), depth (U = 2369.5, P = 0.33, z = 0.449, ns) and conflict (U = 2213, P = 0.40, z = 0.244, ns).

2.4.5. Text revision

All results in this paragraph are reported two-tailed. The number of words in the edited answers differed between conditions for the answer on study regulations (13 words initially): There were more words in the condition without alignment (M = 20.90, SD = 11.23) than in the condition with alignment (M = 16.25, SD = 5.42), U = 2823, P = 0.013, z = 2.472. For the answer on residence (initially 49 words in the condition with alignment and 45 in the condition without alignment), we studentized the variable because of the different initial numbers of words in the conditions. This resulted in a significant difference between conditions (U = 1700.5, P = 0.012, z = 2.526), with a higher number of words in the condition with alignment (M = 0.12, SD = 0.86) than in the condition without alignment (M = −0.09, SD = 1.10).
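The group comparisons in Section 2.4 report a Mann–Whitney U statistic together with a z value and a P value. The following minimal sketch shows how these three quantities relate, assuming the normal approximation without tie correction; statistical packages usually add a tie correction, so their z values can differ slightly. The function name `mann_whitney_u` and the example data are hypothetical and are not study data.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U for sample x, z, one-tailed p) via the normal approximation.

    Tied values receive midranks; no tie correction is applied to the
    standard deviation of U.
    """
    pooled = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of the 1-based ranks i+1..j
        i = j
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_one_tailed = 1 - NormalDist().cdf(abs(z))
    return u1, z, p_one_tailed


# Hypothetical ratings from two small groups (not study data).
u, z, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
# u == 0.0; z is about -1.96; the one-tailed p is about 0.025
```

A two-tailed P value, as used for the exploratory text revision analyses, is twice the one-tailed value under this approximation.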
There were no differences in word number between conditions in the answer regarding start of study (U = 2161.5, P = 0.627, z = 0.487, ns). Regarding the type of changes made (sum over all three answers), there were no differences between conditions (text revision: U = 1711.5, P = 0.779, z = 0.281, ns; addition: U = 1645, P = 0.157, z = 1.416, ns; deletion: U = 1753.5, P = 0.405, z = 0.833, ns; replacement: U = 1692.5, P = 0.248, z = 1.155, ns; change in alignment: U = 1594, P = 0.100, z = 1.643, ns; influence on meaning: U = 1576.5, P = 0.113, z = 1.587, ns; range of change: U = 1795.5, P = 0.538, z = 0.616, ns).

2.5. Discussion

Experiment 1 revealed the following results: When the SDS showed lexical alignment, participants ascribed more integrity to the system compared to the condition without lexical alignment. This result does not fully support Hypothesis 1, which predicted that participants would ascribe more trustworthiness to the SDS using lexical alignment; however, it is in line with the predicted direction of effects. Integrity as an antecedent of trust in the integrative model of trust (Mayer et al., 1995) refers to the trustee’s honesty and upstanding nature in behavior. Thus, this antecedent refers to a personality aspect. Personality has been shown to be ascribed to computers in general (Sundar and Nass, 2000) and to SDS specifically (Nass and Brave, 2005). Besides the content of information, the only way to retrieve information about the SDS is via its voice and verbal expression, which represents an expression of its personality (Lee et al., 2000; Nass and Lee, 2001). Therefore, integrity is the antecedent that is most likely to be affected by the usage of lexical alignment. Concerning the design of SDS, this can be especially important when the usage is perceived to bear a particular risk. There were no significant differences between the conditions regarding ability and benevolence.
Ability and benevolence are antecedents that are important to guarantee concrete success of an inquiry. Participants, however, had no genuine interest in the answers to their questions; they acted according to experimental instructions. This aspect should be considered in further studies that allow for participants’ own requests. Furthermore, participants rated the SDS’s response accuracy, its likeability, and its speed as higher in the condition with alignment than in the condition without alignment. This partly supports Hypothesis 2, which predicted higher satisfaction with the conversation in the alignment condition. The results regarding perceived higher response accuracy and speed might be due to the word overlap and the general impression of smoother communication, because lexical alignment can enhance this impression (Koulouri et al., 2016). The higher perceived likeability of the SDS when it employed lexical alignment is in line with prior findings (Bradac et al., 1988; Branigan et al., 2010; Ireland and Pennebaker, 2010; Van Baaren et al., 2003). Romero et al. (2015), for instance, found that linguistic style matching increased perceived trustworthiness from the point of view of third-party observers, which is congruent with the present findings. Thus, the usage of lexical alignment enhances the likeability of the SDS in the observers’ perception. Pickard et al. (2014) found that likeability can in turn have a positive influence on alignment toward a computer. Therefore, the employment of alignment in an SDS can not only increase its perceived likeability but also lead users to employ alignment themselves. This is especially important for systems that possess limited capabilities (Cowan and Branigan, 2015; see also Brennan, 1998), because it reduces the high lexical variability that people show (Furnas et al., 1987).
In sum, observers’ satisfaction with the conversation was promoted by the SDS’s employment of lexical alignment. There were no significant differences between the conditions regarding the quality of the relationship, so Hypothesis 3 cannot be confirmed. It seems likely that the adopted inventory for the quality of the relationship (Verhofstadt et al., 2006) was not well suited to the examined settings. It is probably more applicable to long-term relationships, whereas the present settings included quite short encounters. Future research will have to determine whether it can usefully be applied to long-term human–computer relationships. Concerning text revision, we found more words in the condition without alignment than in the condition with alignment for one answer (regarding where to find information on study regulations), but the opposite pattern of results occurred for another answer (regarding how to find residence), and no differences for the third answer. Thus, these effects might have occurred by chance. At least, these results do not allow any conclusions. There were no further significant differences for the text revisions. Perhaps participants were unaware of how lexical features affect them and were accordingly hardly able to change the texts according to their preferences.

3. EXPERIMENT 2: LANGUAGE PRODUCTION

Listening to a conversation is a different situation than communicating (Garrod and Pickering, 2009; Wilkes-Gibbs and Clark, 1992). The situation in Experiment 1 provides no information about how an SDS is perceived by users in direct communication. Therefore, the research question in the second experiment was how users experience spoken human–computer interaction when they take part in it themselves. Addressing this question enabled us to analyze how users assess SDS as active interlocutors (Holtgraves and Han, 2007). We wanted to know whether lexical alignment leads to a higher amount of trust toward the SDS in direct interaction.
Participants asked an SDS a set of questions. The SDS either showed lexical alignment in its answers or no lexical alignment. We told participants that we wanted them to talk to the SDS in order to gather data that would improve its performance. For the mock SDS, we employed the same recorded utterances as in Experiment 1. Furthermore, the set of questions asked by participants in Experiment 2 was the same as that asked by the student in Experiment 1. We derived the following hypotheses. Based on the findings of Sundar and Nass (2000), Holtgraves (2007) and Gong (2008), we hypothesized:

Hypothesis 1: When the SDS shows lexical alignment, users will ascribe higher trustworthiness compared to when the system shows no lexical alignment.

Furthermore, lexical alignment has been found to positively influence communicative success (Cowan and Branigan, 2015; Nenkova et al., 2008). Therefore:

Hypothesis 2: When the SDS shows lexical alignment, satisfaction with the conversation will be rated higher compared to when the system shows no lexical alignment.

Alignment has a positive influence on the quality of a relationship (Ireland and Pennebaker, 2010). Therefore:

Hypothesis 3: When the SDS shows lexical alignment, the quality of the relationship will be rated higher compared to when the system shows no lexical alignment.

3.1. Methods and materials

3.1.1. Participants

Participants were recruited via handouts and posters in the university building. They received 10 € or the equivalent in course credits for their participation. Participants who had taken part in Experiment 1 were not allowed to participate. We conducted a power analysis in advance (assuming 1 − β = 0.80; Cohen’s f = 0.25) and determined that we would need a sample size of 128. A total of 130 students at a large university in Germany participated in the experiment; 99 of them were female (76%) and 31 male (24%). Their mean age was 23.31 years (SD = 4.17).
On average, they were in their fourth semester (M = 4.41, SD = 2.489). Whereas 127 reported German to be their first language, three reported having another first language and having spoken German for an average of 11 years (SD = 6.08). More than one-half of the participants (56.2%) were majoring in psychology, 20% were training as teachers, and the others were studying various disciplines. They reported using a computer for an average of 19.23 (SD = 11.93) hours a week and having a high level of computer knowledge (M = 3.36, SD = 0.56 on a 4-point scale ranging from 1 [beginner] to 4 [expert]). They used SDS for an average of 4.41 (SD = 0.67) hours per week. When asked what they used SDS for, 16.2% reported multiple purposes; 10%, to gain information quickly; and 6.2%, for fun. They also reported various other purposes such as navigation (3.8%), sending messages (2.3%) or setting a timer (2.3%).

3.1.2. Procedure

Participants arrived at the lab individually. They were greeted by the experimenter and received written instructions. They gave informed consent to the study and to having their speech recorded. They were instructed that we were testing a new SDS for future use in university settings. The system was named ‘ACURI’ (Advanced Conversational Utterances Research Interface); it included the same utterances as in Experiment 1. No real SDS was used. Instead, we simulated a system by employing prerecorded computer-voice utterances embedded in a computer interface. A trained assistant in another room operated the interface on a computer. The computer was connected to a telephone system. When the assistant clicked on an utterance, it was played on an iPhone® 6 in the participants’ room. The assistant was also able to hear participants’ input and could then answer with the respective prerecorded utterance in time. In this way, the system was simulated for the user. The setting can be described as a Wizard-of-Oz scenario (Branigan et al., 2011).
The study was conducted in German. Participants were asked to read out a set of ten questions and statements related to university life, for example, ‘I still have nowhere to live while studying. Can I get a room in the hall of residence?’ Each question or statement was printed on a separate card. The experimenter gave the whole pack of cards to participants. They were then instructed on how to pick up the phone. The experimenter informed participants that she would wait outside the room in the hallway until they finished their conversation. Then the phone rang and the experimenter left the room. Participants read the first utterance and the assistant reacted with the corresponding prerecorded audio file. Participants then read the next card and continued card by card. Depending on the condition, the assistant employed an interface containing answers to the questions (e.g. ‘I still have no residence. Can I get a room in the hall of residence?’) either with lexical alignment (e.g. ‘You can get a room in the hall of residence if you apply for it and are a bit lucky’) or without lexical alignment (e.g. ‘Suitable application forms are provided by the student union; if you are a bit lucky you can get a room’). Assignment to conditions was randomized. When participants indicated that they had finished, the experimenter came back and handed out a questionnaire. Afterwards, participants were thanked, rewarded and fully debriefed.

3.1.3. Materials

Ten utterances were prepared and printed on cards to be read out loud by participants. They addressed relevant issues for first-year students such as counseling, accommodation, and study regulations (see https://sites.google.com/site/sdsalignment). The word length of the utterances was approximately equal in each condition (see Appendix A). Furthermore, the utterances contained the same number of main and subordinate clauses. The SDS was simulated using prerecorded computer-voice utterances.
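The difference between the two answer variants quoted in Section 3.1.2 can be made concrete with a simple word-overlap count. The following Python sketch is only an illustration and not the procedure used to construct the materials: it works on the English translations given above (the study itself was run in German), and the function names `word_types` and `reuse_share` as well as the overlap measure itself are assumptions introduced here.

```python
import re


def word_types(text):
    """Lower-case a text and return its set of alphabetic word types."""
    return set(re.findall(r"[a-z]+", text.lower()))


def reuse_share(question, answer):
    """Share of the question's word types that reappear in the answer.

    Hypothetical overlap measure for illustration only; it ignores
    inflection, synonyms and word order.
    """
    q_types = word_types(question)
    if not q_types:
        return 0.0
    return len(q_types & word_types(answer)) / len(q_types)


# English translations of the example utterances from Section 3.1.2.
question = "I still have no residence. Can I get a room in the hall of residence?"
aligned = ("You can get a room in the hall of residence "
           "if you apply for it and are a bit lucky")
non_aligned = ("Suitable application forms are provided by the student union; "
               "if you are a bit lucky you can get a room")

print(round(reuse_share(question, aligned), 2))      # 0.69
print(round(reuse_share(question, non_aligned), 2))  # 0.38
```

In this translated example, the aligned answer reuses 9 of the 13 word types in the question and the non-aligned answer reuses 5. The exact shares would differ for the German originals.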
A female voice from the MacBook Pro Voice-Over system was used to read out the system’s answers. The audio files were then embedded in an interface that allowed an assistant to quickly play each file. We created two interfaces, one for each condition. Utterances were transmitted by a telephone training system (TK-MP3 Bluetooth). Each conversation was recorded. The participants used an iPhone® 6 for the conversation.

3.2. Dependent measures

For a detailed description of the scales, see the respective section in Experiment 1. We employed the same scales in both experiments. Therefore, we report only the internal consistencies here for each instrument employed.

3.2.1. Trustworthiness scale

The internal consistencies of the adapted trustworthiness scale from Mayer and Davis (1999) were satisfactory with α = 0.78 for the subscale ability (five items), α = 0.84 for benevolence (five items) and α = 0.71 for integrity (six items).

3.2.2. Satisfaction with the conversation

Internal consistencies of the Subjective Assessment of Speech System Interfaces (SASSI) were α = 0.71 for the subscale Response Accuracy (nine items), α = 0.82 for Likeability (nine items), α = 0.75 for Cognitive Demand (four items), α = 0.71 for Annoyance (five items), α = 0.65 for Habitability (five items) and α = 0.73 for Speed (two items).

3.2.3. Quality of relationship inventory

Internal consistencies of the Quality of Relationship Inventory (QRI) were α = 0.66 for the subscale support (five items), α = 0.79 for depth (six items) and α = 0.73 for conflict (12 items).

3.3. Control measures

3.3.1. NEO-PI-R

The internal consistency of the eight items from the NEO-PI-R was α = 0.80.

3.3.2. TA-EG

Internal consistencies of the TA-EG were α = 0.86 for the subscale enthusiasm (five items), α = 0.69 for competence (four items), α = 0.52 for positive consequences (five items), and α = 0.74 for negative consequences (five items).

3.4. Results

3.4.1. Preliminary analyses

There were no differences between conditions for sex, age and computer knowledge (comparison of distributions with Mann–Whitney U test). Furthermore, all subscales of TA-EG and NEO-PI-R were equally distributed over conditions. An alpha level of 0.05 was used for all statistical tests. We used a Kolmogorov–Smirnov goodness-of-fit test to test whether the data were normally distributed. Values for Trustworthiness Scale, SASSI and Quality of Relationship Inventory are reported one-tailed because of our uni-directional hypotheses.

3.4.2. Trustworthiness scale

The data were not normally distributed. A Mann–Whitney U test (one-tailed) revealed no differences between conditions for the subscales ability (U = 1794, P = 0.267, z = 0.618, ns), benevolence (U = 1987, P = 0.498, z = 0.005, ns) and integrity (U = 1995.5, P = 0.461, z = 0.099, ns).

3.4.3. SASSI

In an analysis of variance, we entered the normally distributed subscales response accuracy, likeability and habitability as dependent variables and whether or not the SDS showed lexical alignment as independent variable. There was no difference between conditions for the subscales response accuracy, F(1, 124) = 0.593, P = 0.443, ns, likeability, F(1, 124) = 0.504, P = 0.479, ns and habitability, F(1, 124) = 0.087, P = 0.768, ns (all one-tailed). Because the other SASSI scales were not normally distributed, we analyzed them with a Mann–Whitney U test (all reported P values are one-tailed). This revealed an effect for cognitive demand (U = 1663.5, P = 0.032, z = 1.850; mean rank with alignment = 58.18, mean rank without alignment = 70.40). Hence, the perceived cognitive demand was higher in the condition without alignment. However, there was no effect for speed (U = 1970, P = 0.296, z = 0.618, ns) or annoyance (U = 2012.5, P = 0.375, z = 0.320, ns).

3.4.4. Quality of relationship inventory

A Mann–Whitney U test (one-tailed) revealed no differences between the conditions for the subscales support (U = 1891.5, P = 0.332, z = 0.435, ns), conflict (U = 1665, P = 0.07, z = 1.476, ns), and depth (U = 1800.5, P = 0.189, z = 0.883, ns).

3.5. Discussion

Experiment 2 revealed no differences between the SDS with and without lexical alignment regarding trustworthiness and quality of the relationship. There was an effect on the perceived cognitive demand, that is, users perceived the conversation as more cognitively demanding when the SDS did not employ lexical alignment. However, there were no effects on other aspects of the satisfaction with the conversation, namely response accuracy, likeability, habitability, speed and annoyance. The effect that lexical alignment decreases the perceived cognitive demand is a central argument for the implementation of lexical alignment in the linguistic behavior of SDS. Processing one’s own speech markers represents a reduction of environmental complexity (Gallois et al., 2005; Giles et al., 1979). Keysar (2007) argues that it requires cognitive effort to consider the perspective of someone else and that it would burden working memory capacities. This could apply to listeners as well: In this case, showing alignment to users’ words would have a relieving effect regarding cognitive burden, which is in line with the perceptions reported by participants. Pickering and Garrod (2004) argue that a representation, originally constructed for understanding, could be reused for production and vice versa; that is, their account rests on the assumption of parity of representations. Alignment on one level influences the alignment on other levels because of the interconnection of representations. Therefore, the responses in the non-alignment condition may have been more difficult to process because of a lack of activation in the cognitive structures for language processing.
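For readers who want to retrace how a Mann–Whitney U statistic, its z score and a one-tailed P value relate to each other, the following Python sketch may help. It is not the analysis script used in the study. It is a minimal illustration that assumes the usual normal approximation without tie correction (statistics packages typically apply one), and the function names are introduced here for illustration only. Applied to the cognitive-demand effect from Section 3.4.3 (z = 1.850), the conversion reproduces the reported one-tailed P of 0.032.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    positions = {}
    for pos, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def mann_whitney_u(x, y):
    """Return (U for group x, z score) using the normal approximation.

    Assumption: no tie correction is applied to the standard deviation,
    so z will differ slightly from software output when ties are present.
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u_x, (u_x - mean_u) / sd_u


def one_tailed_p(z):
    """Upper-tail probability of the standard normal distribution for |z|."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))


# Cognitive demand, Experiment 2: reported z = 1.850, one-tailed P = 0.032.
print(round(one_tailed_p(1.850), 3))  # 0.032
```

The raw ratings are not part of this article, so `mann_whitney_u` can only be tried on made-up data; for example, `mann_whitney_u([1, 2, 3], [4, 5, 6])` yields U = 0 for the first group because every one of its values lies below every value in the second group.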
There were no differences between the conditions with and without lexical alignment regarding trustworthiness and quality of relationship. This might have to do with the novelty of the situation for the participants: They were confronted with an SDS with which they had no prior experience and with which they were asked to interact without prior demonstration. Thus, they could not fall back on a mental model and might have had difficulty fully processing the interaction (Lin et al., 2010; Roßnagel, 2004). Furthermore, the SDS confronted the participants with numerous subtopics, which may have increased the difficulty of participants’ tasks (Holtgraves and Han, 2007). Moreover, the fact that the utterances were not chosen by the participants might have hindered them from building a relationship. In this conversational situation, participants might not have felt personally involved, with the consequence that neither the trustworthiness of the SDS nor the quality of the relationship to it played a role.

4. GENERAL DISCUSSION

In the present study, we used two experiments to examine one and the same simulated SDS either employing or not employing lexical alignment: In Experiment 1, participants listened to a recorded conversation between the system and a student. In Experiment 2, participants communicated with the system themselves, reading out preformulated utterances. We examined the influence of the SDS’s lexical alignment on the assessment of its trustworthiness, the satisfaction with the conversation, and the quality of the relationship. The fact that the same SDS and the same utterances were used allows us to make comparisons between these different settings. First, regarding perceived trustworthiness, participants ascribed more integrity to the SDS when it employed lexical alignment in Experiment 1; however, no differences were found in Experiment 2.
As considered before, this difference in the pattern of results might stem from the different cognitive requirements. In Experiment 2, participants had to concentrate on the utterances they were reading out. In Experiment 1, however, participants were able to concentrate more intensely on the SDS utterances, and this could have enabled them to notice more differences in their assessment. The divergent instructions might have reinforced this pattern. Concerning integrity, this antecedent of trustworthiness refers directly to the honesty and accordingly to the personality of the interlocutor. Given that voice constitutes an important expression of personality (Nass and Brave, 2005) and personality can be ascribed to a computer (Sundar and Nass, 2000), integrity is most likely to be affected by the use of lexical alignment compared to the other antecedents. Lexical alignment has a positive influence on the perceived integrity of the SDS. This should be taken into account especially when designing SDSs whose use contains a perceived risk such as the disclosure of personal data or the fulfillment of an important task. Regarding satisfaction with the conversation, Experiment 2 revealed that cognitive demand was perceived to be higher in the condition without alignment. In Experiment 1, participants rated the SDS’s response accuracy, its likeability, and its speed as being higher in the condition with alignment. Overall, this is in line with the hypothesis that the SDS’s employment of lexical alignment leads to a greater amount of satisfaction with the conversation. However, some aspects of satisfaction differed significantly between the two scenarios: When participants had to talk themselves, lexical alignment attenuated the perceived cognitive demand (see Section 3.5). Following the Interactive Alignment Model (Pickering and Garrod, 2004), representations of concepts are linked to each other. 
Therefore, alignment on one linguistic level has an impact on the cognitive processing on other levels. When participants had already used the same words as the SDS, this likely relieved participants’ cognitive burden and, in turn, reduced the cognitive demand imposed on them by processing the SDS’s utterances. Furthermore, this is in line with the Communication Accommodation Theory, which states that when people process their own speech markers, this leads to a reduced environmental complexity (Gallois et al., 2005; Giles et al., 1979). At the same time, it is cognitively demanding to consider the perspective of another person (Keysar, 2007). When participants were observers, lexical alignment enhanced the SDS’s perceived response accuracy and speed as well as its likeability. The impression of higher response accuracy could be attributed to the word overlap; this might also have influenced the perception of higher speed. Higher ascription of likeability is in line with prior research (Bradac et al., 1988; Branigan et al., 2010; Van Baaren et al., 2003). The great variety of words that people choose from when speaking (Furnas et al., 1987) is reduced when they are led to adopt words that they have heard before. This has an influence on the ease of communication (Cowan and Branigan, 2015; Nenkova et al., 2008; Tomko and Rosenfeld, 2004). Both experiments revealed no differences between conditions in terms of the quality of the relationship. To our knowledge, the employed scale has not been used to assess human–computer interaction before, because it was developed originally for human–human relationships. Even with adaptations, it might not be suitable for the present context. However, the examined human–computer interaction was only a short-term interaction and a relationship over a longer period might well have yielded different results (Lin et al., 2010; Roßnagel, 2004).
In sum, the present results lead to the recommendation to implement lexical alignment in SDS. This could not only be beneficial to guide users’ wordings and increase the predictability of their input, but could also enhance the acceptance of SDS and willingness to use them because it might help users to experience smooth communication (Cowan and Branigan, 2015; Koulouri et al., 2016; Levitan et al., 2011).

4.1. Limitations and suggestions for future research

In Experiment 2, participants could not use their own words, but had to read preformulated sentences. This allowed for experimental control at the expense of ecological validity. Experiment 1 examined the observer’s perspective. Future research should therefore include the opportunity for participants to employ their own words and examine the influence of lexical alignment on that setting. Moreover, research should also examine relationships with SDS that last for a longer period of time such as personal assistants used for multiple purposes. In long-term interactions, users get to know the linguistic features of an SDS, and lexical alignment could result in different effects compared to one-time interactions. In particular, systems that are capable of increasing their degree of lexical alignment could positively influence how their users assess them. Furthermore, participants of both experiments constituted a rather homogeneous group of young students, mainly of psychology and teacher training. Thus, they only represent a narrow group of potential SDS users. The results of this study are likely limited to this potential user group. Future studies should focus on less educated persons, older persons and persons with lower technical affinity because they probably have other needs regarding SDS. To date, lexical alignment constitutes an important topic in human–computer interaction because it enhances the usability of an SDS that has restricted capacity.
Increasing capacities and computing power will bring us closer to the concept of pragmatic or conceptual alignment (Stolk et al., 2016).

4.2. CONCLUSION

The employment of lexical alignment in an SDS led persons who spoke given utterances to that SDS to report a lower amount of cognitive demand. Furthermore, in persons who listened to the SDS communicating with someone else, lexical alignment led to a higher perceived response accuracy, speed and likeability. In the latter setting, persons also ascribed a higher amount of integrity to the system when it employed lexical alignment. Therefore, we recommend including lexical alignment in the design of SDS. In general, the fast technological development of SDS raises new sets of questions regarding the employment of lexical alignment in the relationship between humans and computers. To date, both the adaptation from the system to the user and the adaptation from the user to the system seem to be beneficial. However, with increasing competence of SDS on the one hand and people’s growing experience in the use of these systems on the other hand, the question of who will ultimately adapt to whom remains open. Furthermore, social aspects are likely to become even more important. In this vein, the perception of lexical alignment or the lack of alignment may be judged not only in regard to comprehensibility, but may lead to judgments about the system’s ‘personality’. Thus, the increasing complexity of SDS capabilities is likely to be accompanied by an increasing complexity of their social status and relationship to humans.

5. SUPPLEMENTARY MATERIAL

Supplementary data is available at Interacting with Computers online.

FUNDING

Deutsche Forschungsgemeinschaft (German Research Foundation) within the framework of Research Training Group GRK 1712: Trust and Communication in a Digitized World.
The Deutsche Forschungsgemeinschaft had no involvement in study design, data collection, analysis and interpretation and the decision to submit the article for publication. ACKNOWLEDGEMENTS We thank Christina Hanna, Jens Riehemann, Daniel Ruholl and Bianca Siemering for their help in implementing the experimental setting, data collection and processing. We thank Jonathan Harrow for language editing. Footnotes 1 Reversed scale, therefore, higher values represent an impression of slowness. REFERENCES Barr , D.J. and Keysar , B. ( 2002 ) Anchoring comprehension in linguistic precedents . J. Mem. Lang. , 46 , 391 – 418 . doi:10.1006/jmla.2001.2815 . Google Scholar CrossRef Search ADS Bell , L. , Gustafson , J. and Heldner , M. ( 2003 ) Prosodic adaption in human–computer interaction . Proc. ICPhS Barc. , 15 , 2453 – 2456 . Bradac , J.J. , Mulac , A. and House , A. ( 1988 ) Lexical diversity and magnitude of convergent versus divergent style shifting: perceptual and evaluative consequences . Lang. Commun. , 8 , 213 – 228 . Google Scholar CrossRef Search ADS Branigan , H. and Pearson , J. ( 2006 ) Alignment in Human-Computer Interaction. In Fischer , K. (ed.) , How People Talk to Computers, Robots, and Other Artificial Communication Partners . pp. 140 – 156 . HWK , Delmenhorst, Germany . Branigan , H.P. , Pickering , M.J. , Pearson , J. and McLean , J.F. ( 2010 ) Linguistic alignment between people and computers . J. Pragmat. , 42 , 2355 – 2368 . doi:10.1016/j.pragma.2009.12.012 . Google Scholar CrossRef Search ADS Branigan , H.P. , Pickering , M.J. , Pearson , J. , McLean , J.F. and Brown , A. ( 2011 ) The role of beliefs in lexical alignment: evidence from dialogs with humans and computers . Cognition , 121 , 41 – 57 . doi:10.1016/j.cognition.2011.05.011 . Google Scholar CrossRef Search ADS PubMed Branigan , H.P. , Pickering , M.J. , McLean , J.F. and Cleland , A.A. ( 2007 ) Syntactic alignment and participant role in dialogue . Cognition , 104 , 163 – 197 . 
doi:10.1016/j.cognition.2006.05.006 . Google Scholar CrossRef Search ADS PubMed Brennan , S.E. ( 1998 ) The grounding problem in conversations with and through computers. In Fussell , S.R. and Kreuz , R.J. (eds) , Social and Cognitive Psychological Approaches to Interpersonal Communication . pp. 201 – 225 . Lawrence Erlbaum , Hillsdale, NJ . Brennan , S.E. and Clark , H.H. ( 1996 ) Conceptual pacts and lexical choice in conversation . J. Exp. Psychol. Learn. Mem. Cogn. , 22 , 1482 – 1493 . doi:10.1037/0278-7393.22.6.1482 . Google Scholar CrossRef Search ADS PubMed Clark , H.H. ( 1996 ) Using Language . Cambridge University Press , Cambridge, UK. Google Scholar CrossRef Search ADS Clark , H.H. and Brennan , S.E. ( 1991 ) Grounding in communication . Perspect. Soc. Shared Cogn. , 13 , 127 – 149 . doi:10.1037/10096-006 . Google Scholar CrossRef Search ADS Clark , H.H. and Krych , M.A. ( 2004 ) Speaking while monitoring addressees for understanding . J. Mem. Lang. , 50 , 62 – 81 . doi:10.1016/j.jml.2003.08.004 . Google Scholar CrossRef Search ADS Costa , P.T. , Jr. and McCrae , R.R. ( 1992 ) Revised NEO Personality lnventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Professional manual . Psychological Assessment Resources , Odessa, FL . Cowan , B.R. and Branigan , H.P. ( 2015 ) Does voice anthropomorphism affect lexical alignment in speech-based human–computer dialogue? In Proceedings of INTERSPEECH 2015 . pp. 155 – 159 . International Speech Communication Association , San José, CA, USA. Cowan , B.R. , Branigan , H.P. , Obregón , M. , Bugis , E. and Beale , R. ( 2015 ) Voice anthropomorphism, interlocutor modelling and alignment effects on syntactic choices in human−computer dialogue . Int. J. Hum. Comput. Stud. , 83 , 27 – 42 . doi:10.1016/j.ijhcs.2015.05.008 . Google Scholar CrossRef Search ADS Danet , B. ( 1980 ) ‘Baby’or ‘fetus’?: Language and the construction of reality in a manslaughter trial . Semiotica , 32 , 187 – 220 . 
Google Scholar CrossRef Search ADS De Angeli , A. , Gerbino , W. , Nodari , E. and Petrelli , D. ( 1999 ) From tools to friends: where is the borderline? In Proceedings of the UM’99 Workshop on Attitude, Personality and Emotions in User-Adapted Interaction . pp. 1 – 10 . Springer , Berlin, Germany . Dybkjær , L. and Bernsen , N.O. ( 2000 ) Usability issues in spoken dialogue systems . Nat. Lang. Eng. , 6 , 243 – 271 . Google Scholar CrossRef Search ADS Edlund , J. , Gustafson , J. , Heldner , M. and Hjalmarsson , A. ( 2008 ) Towards human-like dialogue systems . Speech Commun. , 50 , 630 – 645 . doi:10.1016/j.specom.2008.04.002 . Google Scholar CrossRef Search ADS Faigley , L. and Witte , S. ( 1981 ) Analyzing revision . Coll. Composit. Commun. , 32 , 400 – 414 . doi:10.2307/356602 . Google Scholar CrossRef Search ADS Foltz , A. , Gaspers , J. , Thiele , K. , Stenneken , P. and Cimiano , P. ( 2015 ) Lexical alignment in triadic communication . Front. Psychol. , 6 , 127 doi:10.3389/fpsyg.2015.00127 . Google Scholar CrossRef Search ADS PubMed Furnas , G.W. , Landauer , T.K. , Gomez , L.M. and Dumais , S.T. ( 1987 ) The vocabulary problem in human–system communication . Commun. ACM , 30 , 964 – 971 . doi:10.1145/32206.32212 . Google Scholar CrossRef Search ADS Gallois , C. , Ogay , T.T. and Giles , H. ( 2005 ) Communication accommodation theory: a look back and a look ahead. In Gudykunst , W. (ed.) , Theorizing About Intercultural Communication . pp. 121 – 148 . Sage , Thousand Oaks, CA . Garrod , S. and Anderson , A. ( 1987 ) Saying what you mean in dialogue: a study in conceptual and semantic co-ordination . Cognition , 27 , 181 – 218 . doi:10.1016/0010-0277(87)90018-7 . Google Scholar CrossRef Search ADS PubMed Garrod , S. and Pickering , M.J. ( 2009 ) Joint action, interactive alignment, and dialog . Top. Cogn. Sci. , 1 , 292 – 304 . doi:10.1111/j.1756-8765.2009.01020.x . Google Scholar CrossRef Search ADS PubMed Giles , H. , Scherer , K.R. and Taylor , D.M. 
( 1979 ) Speech Markers in Social Interaction. In Scherer , K.R. and Giles , H. (eds) , Social Markers in Speech . pp. 343 – 381 . Cambridge University Press , Cambridge . Giles , H. , Coupland , N. and Coupland , J. ( 1991 ) Accommodation theory: communication, context, and consequence. In Giles , H. , Coupland , J. and Coupland , N. (eds) , Contexts of Accommodation . pp. 1 – 68 . Cambridge University Press , New York, NY . Google Scholar CrossRef Search ADS Gong , L. ( 2008 ) How social is social responses to computers? The function of the degree of anthropomorphism in computer representations . Comput. Human. Behav. , 24 , 1494 – 1509 . doi:10.1016/j.chb.2007.05.007 . Google Scholar CrossRef Search ADS Gustafson , J. , Larsson , A. , Carlson , R. and Hellman , K. ( 1997 ). How do system questions influence lexical choices in user answers? Paper presented at the Eurospeech 1997, Rhodos, Greece. Hempel , J. ( 2015 , August 26). Facebook launches M, its bold answer to Siri and Cortana. Retrieved May 16, 2016. http://www.wired.com/2015/08/facebook-launches-m-new-kind-virtual-assistant/ Holtgraves , T. and Han , T.L. ( 2007 ) A procedure for studying online conversational processing using a chat bot . Behav. Res. Methods , 39 , 156 – 163 . Google Scholar CrossRef Search ADS PubMed Holtgraves , T. , Ross , S. , Weywadt , C. and Han , T.L. ( 2007 ) Perceiving artificial social agents . Comput. Human. Behav. , 23 , 2163 – 2174 . doi:10.1016/j.chb.2006.02.017 . Google Scholar CrossRef Search ADS Hone , K.S. and Graham , R. ( 2000 ) Towards a tool for the subjective assessment of speech system interfaces (SASSI) . Nat. Lang. Eng. , 6 , 287 – 303 . Google Scholar CrossRef Search ADS Hone , K.S. and Graham , R. ( 2001 ). Subjective assessment of speech–system interface usability. Proc. 7th Eur. Conf. on Speech Communication and Technology (EUROSPEECH 2001– Scandinavia) (pp. 2083–2086), Aalborg, Denmark. Ireland , M.E. and Pennebaker , J.W. 
(2010) Language style matching in writing: synchrony in essays, correspondence, and poetry. J. Pers. Soc. Psychol., 99, 549–571. doi:10.1037/a0020386.
Joinson, A.N., Reips, U.-D., Buchanan, T. and Paine Schofield, C.B. (2010) Privacy, trust, and self-disclosure online. Hum. Comput. Int., 25, 1–24. doi:10.1080/07370020903586662.
Jucks, R., Linnemann, G.A., Thon, F.M. and Zimmermann, M. (2016) Trust the words: insights into the role of language in trust building in a digitalized world. In Blöbaum, B. (ed.), Trust and Communication in a Digitized World. pp. 225–237. Springer International Publishing, Cham, Switzerland. doi:10.1007/978-3-319-28059-2.
Jucks, R., Päuler, L. and Brummernhenrich, B. (2014) ‘I need to be explicit: You’re wrong’: impact of face threats on social evaluations in online instructional communication. Int. Comput., 28, 73–84. doi:10.1093/iwc/iwu032.
Jucks, R., Schulte-Löbbert, P. and Bromme, R. (2007) Supporting experts’ written knowledge communication through reflective prompts on the use of specialist concepts. Z. Psychosom. J. Psychol., 215, 237–247. doi:10.1027/0044-3409.215.4.237.
Karrer, K., Glaser, C., Clemens, C. and Bruder, C. (2009) Technikaffinität erfassen—der Fragebogen TA-EG [Assessing affinity with technology: the TA-EG questionnaire]. Der Mensch im Mittelpunkt technischer Systeme, 8, 196–201.
Keysar, B. (2007) Communication and miscommunication: the role of egocentric processes. Intercult. Pragmatics, 4, 71–84. doi:10.1515/IP.2007.004.
Krauss, R.M. (1987) The role of the listener: addressee influences on message formulation. J. Lang. Soc. Psychol., 6, 81–98. doi:10.1177/0261927X8700600201.
Krauss, R.M.
and Fussell, S.R. (1991) Perspective-taking in communication: representations of others’ knowledge in reference. Soc. Cogn., 9, 2–24.
Koulouri, T., Lauria, S. and Macredie, R.D. (2016) Do (and say) as I say: linguistic adaptation in human–computer dialogs. Hum. Comput. Interact., 31, 59–95. doi:10.1080/07370024.2014.934180.
Lee, E.J., Nass, C. and Brave, S. (2000) Can computer-generated speech have gender? An experimental test of gender stereotypes. Proceedings of CHI EA ’00 Extended Abstracts on Human Factors in Computing Systems (pp. 289–290). New York, NY. doi:10.1145/633292.633461.
Levitan, R., Gravano, A. and Hirschberg, J. (2011) Entrainment in speech preceding backchannels. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers-Volume 2. pp. 113–117. Association for Computational Linguistics, Stroudsburg, PA.
Lin, S., Keysar, B. and Epley, N. (2010) Reflexively mindblind: using theory of mind to interpret behavior requires effortful attention. J. Exp. Soc. Psychol., 46, 551–556. doi:10.1016/j.jesp.2009.12.019.
Linnemann, G.A. and Jucks, R. (2016) As in the question, so in the answer? Language style of human and machine speakers affects interlocutors’ convergence on wordings. J. Lang. Soc. Psychol., doi:10.1177/0261927X15625444.
Lopes, J., Eskenazi, M. and Trancoso, I. (2011) Towards choosing better primes for spoken dialog systems. In Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop. pp. 306–311. IEEE, Hawaii.
López-Cózar, R., Callejas, Z., Griol, D. and Quesada, J.F. (2014) Review of spoken dialogue systems. Loquens, 1, e012. doi:10.3989/loquens.2014.012.
Luger, E. and Sellen, A.
(2016) Like having a really bad PA: the gulf between user expectation and experience of conversational agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. pp. 5286–5297. ACM, San José, CA, USA.
Maddux, W.W., Mullen, E. and Galinsky, A.D. (2007) Chameleons bake bigger pies and take bigger pieces: strategic behavioral mimicry facilitates negotiation outcomes. J. Exp. Soc. Psychol., 44, 461–468. doi:10.1016/j.jesp.2007.02.003.
Manjoo, F. (2016, March 09) The echo from Amazon brims with groundbreaking promise. http://www.nytimes.com/2016/03/10/technology/the-echo-from-amazon-brims-with-groundbreaking-promise.html?_r=0
Mavridis, N. (2015) A review of verbal and non-verbal human–robot interactive communication. Rob. Auton. Syst., 63, 22–35. doi:10.1016/j.robot.2014.09.031.
Mayer, R.C. and Davis, J.H. (1999) The effect of the performance appraisal system on trust for management: a field quasi-experiment. J. Appl. Psychol., 84, 123. doi:10.1037/0021-9010.84.1.123.
Mayer, R.C., Davis, J.H. and Schoorman, F.D. (1995) An integrative model of organizational trust. Acad. Manage. Rev., 20, 709–734. doi:10.2307/258792.
McKnight, D.H. (2005) Trust in information technology. In Davis, G.B. (ed.), The Blackwell Encyclopedia of Management. Vol. 7 Management Information Systems. pp. 329–331. Blackwell, Malden, MA.
McKnight, D.H. and Chervany, N.L. (2001) Trust and distrust definitions: one bite at a time. In Trust in Cyber-societies. pp. 27–54. Springer, Berlin, Germany. doi:10.1007/3-540-45547-7-3.
Metzing, C. and Brennan, S.E. (2003) When conceptual pacts are broken: partner-specific effects on the comprehension of referring expressions. J. Mem. Lang., 49, 201–213.
doi:10.1016/S0749-596X(03)00028-7.
Mitchell, W.J., Ho, C.C., Patel, H. and MacDorman, K.F. (2011) Does social desirability bias favor humans? Explicit–implicit evaluations of synthesized speech support a new HCI model of impression management. Comput. Hum. Behav., 27, 402–412. doi:10.1016/j.chb.2010.09.002.
Nass, C.I. and Brave, S. (2005) Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. MIT Press, Cambridge.
Nass, C. and Lee, K.M. (2001) Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. J. Exp. Psychol. Appl., 7, 171. doi:10.1037//1076-898X.7.3.171.
Nenkova, A., Gravano, A. and Hirschberg, J. (2008) High frequency word entrainment in spoken dialogue. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers. pp. 169–172. Association for Computational Linguistics, Stroudsburg, PA.
Ostendorf, F. and Angleitner, A. (2004) NEO-PI-R-NEO Persönlichkeitsinventar nach Costa und McCrae—Revidierte Fassung (PSYNDEX Tests Review) [Costa and McCrae’s Revised NEO Personality Inventory]. Hogrefe, Göttingen, Germany.
Paek, T. and Pieraccini, R. (2008) Automating spoken dialogue management design using machine learning: an industry perspective. Speech Commun., 50, 716–729. doi:10.1016/j.specom.2008.03.010.
Pickard, M.D., Burgoon, J.K. and Derrick, D.C. (2014) Toward an objective linguistic-based measure of perceived embodied conversational agent power and likeability. Int. J. Hum. Comput. Interact., 30, 495–516. doi:10.1080/10447318.2014.888504.
Pickering, M.J. and Garrod, S.
(2004) Toward a mechanistic psychology of dialogue. Behav. Brain Sci., 27, 169–226. doi:10.1017/S0140525X04000056.
Pierce, G.R., Sarason, I.G., Sarason, B.R., Solky-Butzel, J.A. and Nagle, L.C. (1997) Assessing the quality of personal relationships. J. Soc. Pers. Relat., 14, 339–356. doi:10.1177/0265407597143004.
Reiner, I., Beutel, M., Skaletz, C., Brähler, E. and Stöbel-Richter, Y. (2012) Validating the German version of the Quality of Relationship Inventory: confirming the three-factor structure and report of psychometric properties. PLoS One, 7, e37380. doi:10.1371/journal.pone.0037380.
Roßnagel, C.S. (2004) Lost in thought: cognitive load and the processing of addressees’ feedback in verbal communication. Exp. Psychol., 51, 191–200. doi:10.1027/1618-3169.51.3.191.
Romero, D.M., Swaab, R.I., Uzzi, B. and Galinsky, A.D. (2015) Mimicry is presidential: linguistic style matching in presidential debates and improved polling numbers. Pers. Soc. Psychol. Bull., 41, 1311–1319. doi:10.1177/0146167215591168.
Stolk, A., Verhagen, L. and Toni, I. (2016) Conceptual alignment: how brains achieve mutual understanding. Trends Cogn. Sci., 20, 180–191. doi:10.1016/j.tics.2015.11.007.
Sundar, S.S. and Nass, C. (2000) Source orientation in human–computer interaction: programmer, networker, or independent social actor. Commun. Res., 27, 683–703. doi:10.1177/009365000027006001.
Thon, F.M. and Jucks, R. (2014) Regulating privacy in interpersonal online communication: the role of self-disclosure. Stud. Commun. Sci., 14, 3–11. doi:10.1016/j.scoms.2014.03.012.
Tomko, S. and Rosenfeld, R. (2004)
Shaping spoken input in user-initiative systems. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004. Boston, MA.
Torrey, C., Powers, A., Marge, M., Fussell, S.R. and Kiesler, S. (2006) Effects of adaptive robot dialogue on information exchange and social relations. Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human–Robot Interaction (pp. 126–133). New York, NY: ACM.
Tseng, S. and Fogg, B.J. (1999) Credibility and computing technology. Commun. ACM, 42, 39–44. doi:10.1145/301353.301402.
Van Baaren, R.B., Holland, R.W., Steenaert, B. and van Knippenberg, A. (2003) Mimicry for money: behavioral consequences of imitation. J. Exp. Soc. Psychol., 39, 393–398.
Van der Wege, M.M. (2009) Lexical entrainment and lexical differentiation in reference phrase choice. J. Mem. Lang., 60, 448–463.
Verhofstadt, L.L., Buysse, A., Rosseel, Y. and Peene, O.J. (2006) Confirming the three-factor structure of the quality of relationships inventory within couples. Psychol. Assess., 18, 15–21. doi:10.1037/1040-3590.18.1.15.
Vinyals, O. and Le, Q. (2015) A neural conversational model. arXiv preprint arXiv:1506.05869.
Von der Pütten, A.M., Krämer, N.C., Gratch, J. and Kang, S.H. (2010) ‘It doesn’t matter what you are!’ Explaining social effects of agents and avatars. Comput. Human. Behav., 26, 1641–1650. doi:10.1016/j.chb.2010.06.012.
Wilkes-Gibbs, D. and Clark, H.H. (1992) Coordinating beliefs in conversation. J. Mem. Lang., 31, 183–194. doi:10.1016/0749-596X(92)90010-U.

Author notes
Editorial Board Member: Dr Maria Wolters
© The Author(s) 2018.
Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).
Interacting with Computers, Oxford University Press

‘Can I Trust the Spoken Dialogue System Because It Uses the Same Words as I Do?’—Influence of Lexically Aligned Spoken Dialogue Systems on Trustworthiness and User Satisfaction

Publisher
Oxford University Press
ISSN
0953-5438
eISSN
1873-7951
D.O.I.
10.1093/iwc/iwy005

Abstract

One of many ways in which spoken dialogue systems (SDS) are becoming more and more flexible is in their choice of words (e.g. alignment to the user’s vocabulary). We examined how users perceive such adaptive and non-adaptive SDS regarding trustworthiness and usability. In Experiment 1, 130 participants read out questions to an SDS that either did or did not align lexically in its replies. They perceived higher cognitive demand when the SDS did not employ alignment. In Experiment 2, 135 participants listened to a conversation between a human and the same SDS in an online study. They judged the aligned SDS to have more integrity and to be more likeable. Implications for the design of SDS are discussed.

RESEARCH HIGHLIGHTS
We compared users’ and listeners’ perception of spoken dialogue systems (SDS).
SDS’s lexical alignment affects users’ cognitive demand.
SDS’s lexical alignment appears more likeable to listeners.
SDS with lexical alignment is ascribed higher integrity by listeners.

1. INTRODUCTION
Having a personal virtual assistant organizing your everyday tasks is no longer science fiction. Facebook has recently presented an assistant called ‘M’ that can be talked to in natural language (Hempel, 2015). On your friend’s birthday, for example, M can offer to order a cake for you, make a reservation in your friend’s favorite restaurant, and even make suggestions regarding a suitable present. Google has also developed a system that is able to conduct very natural spoken conversations. Extracting information autonomously, it can, for instance, deal with IT problems almost like a human consultant (Vinyals and Le, 2015). Furthermore, Amazon offers a personal assistant, called ‘Alexa’, for various purposes ranging from retrieving information to smart home operation (Manjoo, 2016). Trust is a major issue in this type of interaction (Cowan et al., 2015).
Users of systems like Apple’s ‘Siri’ mention trust as a criterion that determines the range of tasks they ask the system to perform (Luger and Sellen, 2016). Trust becomes especially important when the capability of the system is opaque to users: people tend to be unsure whether the system will perform tasks properly (Luger and Sellen, 2016). When assessing trustworthiness, users rely on language, because certain characteristics of (spoken) language can promote trust (Jucks et al., 2016). These include self-disclosing information, addressing emotions, and being empathic.

When people talk to each other in dialogue situations, they tend to adopt a range of linguistic features used by their interlocutor (Branigan et al., 2010; Brennan and Clark, 1996). One of these is adaptation on the level of word choice, termed lexical alignment (Branigan et al., 2011). This means that a speaker employs words that have already been used by the conversational partner. Lexical alignment plays an important role not only in human–human but also in human–computer interaction (Bell et al., 2003; Cowan and Branigan, 2015; Gustafson et al., 1997; Linnemann and Jucks, 2016). It has been a primary focus of research on spoken dialogue systems (SDS), because it not only contributes decisively to the success of communication (Nenkova et al., 2008) but can also inform the general body of alignment research (Branigan et al., 2011). However, the use of lexical alignment also reflects social functions, like politeness (Branigan et al., 2010; Jucks et al., 2014; Torrey et al., 2006) and affiliation (Ireland and Pennebaker, 2010). Because SDS are likely to be perceived as social actors (Sundar and Nass, 2000), these social functions of lexical alignment should affect communication with SDS and, in particular, assessment of their trustworthiness.
Asking whether and which SDS are perceived as trustworthy is of major practical importance in light of their widespread and growing implementation (López-Cózar et al., 2014; Luger and Sellen, 2016). In this context, lexical alignment could prove to be one key component among the linguistic features employed by SDS. In the following introductory sections, we first introduce lexical alignment and then present its role in human–computer interaction, especially in interaction with SDS. We then examine the role of trust in the use of SDS. Finally, we present the rationale of the present study and derive hypotheses.

1.1. Lexical alignment
People tend to converge linguistically when in dialogue. This has frequently been termed alignment (Branigan and Pearson, 2006; Pickering and Garrod, 2004), convergence (Giles et al., 1991), linguistic entrainment (Garrod and Anderson, 1987) or use of linguistic precedents (Barr and Keysar, 2002). The approaches differ mainly in how far the conversational partner is assumed to be taken into account and in the assumed degree of automaticity or intention. As yet, no account satisfactorily integrates these approaches (Foltz et al., 2015). However, two approaches are mainly considered in research (Cowan et al., 2015). On the one hand, alignment can be conceptualized as a rather automatic process: in their Interactive Alignment Model, Pickering and Garrod (2004) postulate that alignment is based on automatic priming mechanisms. According to the model, the tendency to align on one linguistic level (the lexical level, for instance) is enhanced through alignment on another level (the syntactic level, for instance). Following Pickering and Garrod (2004), successful alignment ensures successful communication.
On the other hand, several studies suggest that lexical alignment represents a partner-specific process, which results in audience design (Brennan and Clark, 1996; Krauss and Fussell, 1991; Metzing and Brennan, 2003). The process by which interlocutors settle on shared references is termed grounding; the resulting shared knowledge base is referred to as common ground (Clark and Brennan, 1991; Clark, 1996). The approaches described are not strictly separate from each other. Alignment can contain elements of both approaches, depending on the context (Branigan et al., 2010). Lexical alignment refers to the employment of words that have already been used by the conversational partner. The existence of lexical alignment has been shown in numerous empirical studies (Branigan et al., 2010, 2011; Brennan and Clark, 1996). In the following section, we describe the role of lexical alignment in spoken human–computer interaction.

1.2. Lexical alignment in spoken human–computer interaction
Research has shown that people commonly show lexical alignment toward computers (Branigan et al., 2010), and even the mere belief that one is communicating with a computer leads to a greater amount of lexical alignment (Branigan et al., 2010, 2011). This seems to be because people aim for communicative success (Branigan et al., 2011; Linnemann and Jucks, 2016). Indeed, research confirms that lexical alignment actually does enhance the success of communication with SDS (Koulouri et al., 2016; Levitan et al., 2011). In addition, word choice can give information about the relationship between conversational partners.
According to communication accommodation theory, deliberately adopting or not adopting an interaction partner’s words can express convergence with or divergence from either the partner or the topic (Giles et al., 1991; see also Danet, 1980; Van der Wege, 2009). Convergence can be motivated by the attempt to enhance communicative effectiveness, for instance by facilitating comprehension. Divergence can express the wish to emphasize differences from the linguistic style of a conversational partner. Ireland and Pennebaker (2010) have shown that matching language style can serve as a marker for the social quality of the relationship. Furthermore, the adoption of words is perceived as polite (Branigan et al., 2010; Torrey et al., 2006) and can lead to positive feelings (Bradac et al., 1988; Branigan et al., 2010; Van Baaren et al., 2003). The resulting likeability, in turn, has been shown to exert a positive influence on the alignment toward an embodied conversational agent, that is, a graphical computer figure that possesses conversational capabilities (Pickard et al., 2014).

People can be both users of SDS and observers of others interacting with an SDS. The latter case occurs, for example, with navigation systems operated by one person in a car with further occupants, or with personal assistants like Amazon’s ‘Alexa’, which is intended to be used by all members of a household. Furthermore, advertisements show other people interacting with an SDS. This may be the first encounter for potential customers before they directly experience interaction with an SDS. Research has shown that the degree of involvement is an important aspect of communication: Wilkes-Gibbs and Clark (1992), for example, investigated beliefs about shared information. Branigan et al. (2007) showed that addressees are more likely than bystanders to repeat the speaker’s grammatical form.
Garrod and Pickering (2009) interpret this result to mean that addressees prepare to give an answer and therefore predict the speaker’s grammatical form. Thus, lexical alignment plays an important role both for speakers and for listeners. For speakers, alignment can be an instrument to ensure communicative success (Levitan et al., 2011; Linnemann and Jucks, 2016) and to mark their position toward the interlocutor (Van der Wege, 2009). The first aspect is especially important in communication with SDS, because the natural variability of word choice is relatively high (the vocabulary problem; Furnas et al., 1987). Therefore, the implementation of mechanisms that lead people to adopt words facilitates communication and reduces problems in understanding (Cowan and Branigan, 2015; Nenkova et al., 2008; Tomko and Rosenfeld, 2004). For listeners, the implementation of lexical alignment by the SDS can be beneficial as well. Lopes et al. (2011) reported that it has positive effects on system performance and estimated dialogue success. Furthermore, some participants reported that the system sounded more natural the more it aligned.

1.3. SDS and trust
McKnight (2005) has stated that ‘trust in technology is built the same way as trust in people’ (p. 330). Evidence suggests that people treat computers as social actors, not considering that they are the product of programmers (Sundar and Nass, 2000; see also Dybkjær and Bernsen, 2000). Hence, people ascribe human-like social qualities to computer systems. This does not even require highly evolved language: a basic language capability that demonstrates the system’s competence can produce the perception of the computer as a human-like being (De Angeli et al., 1999). The effect of ascribing human-like qualities remains even when participants know they are communicating with a computer (Holtgraves et al., 2007).
However, there seems to be a relationship between the degree of anthropomorphism and trustworthiness: the more human-like a computer appeared, the higher participants rated its competence and trustworthiness (Gong, 2008). In communication with an SDS, the presence of voice and spoken language in particular brings trust into play (Mitchell et al., 2011). Indeed, the voice alone conveys characteristics that influence the assessment of trustworthiness. For instance, male voices are rated as more trustworthy (Lee et al., 2000), and so are voices with cues that match a participant’s personality (Nass and Lee, 2001). Existing SDS already possess a relatively high degree of anthropomorphism (Edlund et al., 2008; Mavridis, 2015). The growing capabilities of SDS are possible due to their permanent connection to the Internet (Mavridis, 2015) and the continuous gain in information from their users. Users perceive themselves to be communicating with SDS as social actors (Nass and Brave, 2005; Sundar and Nass, 2000) and do not perceive that they are transferring their information to a distal server. Therefore, we argue that their trust behavior consists in talking to the SDS and disclosing personal data and questions to it (Jucks et al., 2016; Tseng and Fogg, 1999). Because of the perception of SDS as social actors and, in turn, the activation of social categories, we further assert that SDS can be conceptualized as receivers of trust. In the vast majority of conceptualizations, trust incorporates the willingness to depend on somebody else (McKnight and Chervany, 2001) and therefore the willingness to be vulnerable (Mayer and Davis, 1999). In their integrative model of trust, Mayer et al. (1995) list ability (competence), benevolence and integrity, forming the so-called ABI model, as antecedents of trustworthiness. Ability addresses the competence to attain the desired results.
Benevolence refers to the trustee’s motivation and good will toward the trustor, and integrity refers to the trustee’s honesty. All these qualities can be ascribed to elaborate SDS (Paek and Pieraccini, 2008). Trust represents an important precondition for spoken human–computer interaction because it enables interaction and enhances the vivid exchange of information (Maddux et al., 2007). Even when users know that privacy is low, they can compensate for this through high trust (Joinson et al., 2010). Thus, trust is an essential aspect of communication with SDS.

1.4. Rationale
We have outlined above that lexical alignment occurs in the interaction with SDS and that it plays an important role for communication from the point of view of both speakers and listeners. Its importance stems from its contribution to conversational success as well as the transmission of social aspects such as affiliation. Because highly capable systems possess a high degree of human-likeness, users can easily perceive them as social actors (Holtgraves et al., 2007; Nass and Brave, 2005; Nass and Lee, 2001; see also Von der Pütten et al., 2010). This, in turn, may promote trusting behavior with the SDS as the trustee. In the present study, we used two experiments to investigate the influence of lexical alignment on the perception of the trustworthiness of an SDS and the satisfaction with the conversation. In Experiment 1, we employed an SDS and investigated how observers assessed its trustworthiness and their satisfaction with the conversation. In Experiment 2, we examined how users experience lexical alignment shown by an SDS they talked to themselves. That is, in Experiment 2, users had direct experience of lexical alignment in response to their questions (Holtgraves and Han, 2007).

2. EXPERIMENT 1: OBSERVER STUDY
In the first experiment, we investigated the effects of lexical alignment shown by the SDS on how observers perceived the system’s trustworthiness and the communication in general.
Listeners have been shown to play an important role in alignment (Clark and Krych, 2004; Krauss and Fussell, 1991; Krauss, 1987). The degree of personal involvement has turned out to be an important aspect of communication (Garrod and Pickering, 2009; Wilkes-Gibbs and Clark, 1992). Being an observer of a conversation, especially of a recorded conversation, does not include getting involved in this conversation. Not having to be prepared to join the conversation might lead observers to attend to conversational aspects other than those a speaker would focus on. For Experiment 1, we formulated the following hypotheses.

Based on the findings of Gong (2008) and Mitchell et al. (2011), we hypothesized:
Hypothesis 1: When the SDS shows lexical alignment, users will ascribe higher trustworthiness compared to when the system shows no lexical alignment.

Lexical alignment can positively influence communicative success (Branigan et al., 2011; Linnemann and Jucks, 2016) in communication with SDS (Koulouri et al., 2016; Levitan et al., 2011). Therefore:
Hypothesis 2: When the SDS shows lexical alignment, satisfaction with the conversation will be rated higher compared to when the system shows no lexical alignment.

Alignment has a positive influence on the quality of a relationship (Ireland and Pennebaker, 2010). Therefore:
Hypothesis 3: When the SDS shows lexical alignment, the quality of the relationship will be rated higher compared to when the system shows no lexical alignment.

Participants listened to the conversation between a student and the SDS. The SDS either showed or did not show lexical alignment toward the student’s utterances. Afterwards, participants assessed the perceived trustworthiness of the system and their satisfaction with the conversation. Additionally, we presented three answers from each condition to the participants and asked them to edit these.
The idea behind this exploratory approach was to assess the linguistic preferences participants would show when they were free to change the proposals. The experiment was conducted in an online setting.

2.1. Methods and materials
2.1.1. Participants
Participants were recruited via student newsletters and posters in the university building. They received a 5 € Amazon voucher or the equivalent in course credits. We conducted an a priori power analysis (assuming 1 − β = .80; Cohen’s f = .25), which indicated a required sample size of 128. A total of 135 students attending a large university in Germany participated in the experiment; 83 of them were female and 52 male. Their mean age was 23.02 years (SD = 3.60). On average, they were in their fourth semester (M = 4.31, SD = 2.53). All participants reported that German was their first language. The largest group (30.4%) were training to be teachers, 14.1% were majoring in psychology, and the others came from various disciplines. They reported using a computer for an average of 21.92 (SD = 14.92) hours a week and having a high level of computer knowledge (M = 3.50, SD = 0.66 on a 4-point scale ranging from 1 [beginner] to 4 [expert]). They reported using SDS for an average of 4.06 (SD = 0.95) hours per week, and using them for quick access to information (28.9%), multiple purposes (11.9%), navigation (2.2%) and weather forecasts (1.5%).

2.1.2. Design
Participants listened to the conversation between a student and an SDS regarding issues facing freshmen at the university. For this purpose, we created a mock SDS, which we called ‘ACURI’ (Advanced Conversational Utterances Research Interface). We embedded the recorded conversation between the student and ACURI in an online survey (EFS Survey®). We varied whether the SDS showed lexical alignment to the student or not. Participants were randomly assigned to conditions.
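The a priori power analysis reported under Participants can be approximated with a short calculation. The sketch below is not the authors’ procedure. It assumes a two-sided test at α = .05 (the text does not state α), two equally sized groups (alignment versus no alignment), for which Cohen’s f = .25 corresponds to d = 2f = .50, and a normal approximation in place of the exact noncentral distribution. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_sample(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    comparison of two independent group means with effect size d."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value, two-sided test
    z_power = z.inv_cdf(power)            # quantile for the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


f = 0.25            # Cohen's f as reported in the text
d = 2 * f           # equivalent Cohen's d for two equal groups (assumption)
n = n_per_group_two_sample(d)   # alpha = .05 is an assumption
total = 2 * n
print(n, total)
```

Under these assumptions the approximation gives 63 participants per group, 126 in total. This is slightly below the reported 128, as would be expected when the normal approximation replaces an exact calculation.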
The interval between utterances was held constant at 1 s between the student’s question and the system’s answer and 2 s between the system’s answer and the student’s next question. The study was conducted as an online experiment. Instructions included the false information that ACURI is a system that could be applied to various topics, and that it had been made available to first-year students to help guide them through their first days at university (see https://sites.google.com/site/sdsalignment). We chose this topic area because of our student sample. Due to the design, participants could not use their own words. However, we aimed for a topic related to the participants’ lives to ensure a certain ecological validity. Moreover, providing information is a central task of common SDS. We told participants that we were interested in how people interact with this kind of system. After receiving their instructions, participants listened to the recorded conversation. They then completed a questionnaire including the SASSI, the QRI, the Trustworthiness Scale (adapted from Mayer and Davis, 1999), the NEO-PI-R and the TA-EG (for descriptions and alpha values, see below). Finally, we presented three questions and answers from the dialogue and gave participants the opportunity to edit them. At the end of the study, participants were thanked, fully debriefed and rewarded.

2.1.3. Materials
The study was conducted in German. The conversation between a student and ACURI was recorded in advance. A female voice from the MacBook Pro VoiceOver system was used to read out ACURI’s answers; the student’s utterances were recorded with a microphone. The student spoke ten utterances, which did not vary between conditions. The topics were issues relevant to first-year students, such as counseling, accommodation, and study regulations (see https://sites.google.com/site/sdsalignment). We varied whether or not ACURI showed lexical alignment to the student.
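To make the manipulation concrete, the lexical alignment of an answer can be quantified as the share of the question’s content words that the answer reuses. The following minimal sketch is an illustration only and not the procedure used to construct the ACURI materials. The function names, the stopword list and the English example utterances are hypothetical.

```python
import re


def content_words(utterance, stopwords):
    """Lowercase the utterance, extract runs of letters, drop stopwords."""
    tokens = re.findall(r"[^\W\d_]+", utterance.lower())
    return {t for t in tokens if t not in stopwords}


def lexical_overlap(question, answer, stopwords=frozenset()):
    """Share of the question's content words that reappear in the answer."""
    q_words = content_words(question, stopwords)
    a_words = content_words(answer, stopwords)
    if not q_words:
        return 0.0
    return len(q_words & a_words) / len(q_words)


# Hypothetical stopword list and utterances, not taken from the study
STOPWORDS = frozenset({"where", "can", "i", "a", "an", "the", "you", "at"})
question = "Where can I find a flat?"
aligned = "You can find a flat at the housing office."
non_aligned = "You can find an apartment at the housing office."

print(lexical_overlap(question, aligned, STOPWORDS))
print(lexical_overlap(question, non_aligned, STOPWORDS))
```

In this toy case the aligned answer reuses both content words of the question (‘find’, ‘flat’), whereas the non-aligned answer swaps ‘flat’ for a synonym and so reuses only one.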
The word length of the utterances was approximately equal in each condition (see Appendix A). Furthermore, the utterances contained the same number of major and minor clauses. 2.2. Dependent variables 2.2.1. Trustworthiness scale We assessed trustworthiness by adapting a scale from Mayer and Davis (1999) that was constructed initially for use in the context of organizational relationships (Mayer et al., 1995). These authors defined trust as the willingness to be vulnerable to another party’s actions. Their model conceptualizes the antecedents of trust as ability, benevolence, and integrity. Thus, the perception of the trustee’s trustworthiness depends on the assessment of these factors. In this definition, the trustee has to be identifiable and perceived to act with volition. Based on findings presented above in the section on trust and SDS, we assume that our mock SDS meets this requirement. The scale has been applied successfully in other contexts (e.g. online communication: Thon and Jucks, 2014). Internal consistencies were α = 0.84 for ability (five items), α = 0.83 for benevolence (five items) and α = 0.71 for integrity (six items). 2.2.2. Satisfaction with the conversation We assessed satisfaction with the conversation with the Subjective Assessment of Speech System Interfaces (SASSI, Hone and Graham, 2000, 2001). SASSI contains six subscales: Response Accuracy, Likeability, Cognitive Demand, Annoyance, Habitability and Speed. Response Accuracy covers the user’s judgment of the appropriateness of the answers. Likeability refers to friendliness and usefulness. Cognitive Demand captures the effort users perceive themselves to invest. Annoyance describes repetitiveness, boredom and frustration. Habitability refers to the user’s understanding of the system’s reactions and of her own. Speed refers to the reaction time of the system. Measures were assessed on 7-point scales ranging from 1 (I do not agree at all) to 7 (I totally agree). 
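The internal consistencies reported for the scales in this section are α coefficients. As a minimal sketch, assuming these are Cronbach’s α (the text does not name the coefficient explicitly), the statistic can be computed from a respondent-by-item matrix of ratings as shown below. The function name and the toy ratings are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_variances = sum(variance(column) for column in zip(*scores))
    # Sample variance of the respondents' sum scores.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical ratings: five respondents, three items of one subscale.
toy_ratings = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]]
print(round(cronbach_alpha(toy_ratings), 2))
```

Values close to 1 indicate that the items of a subscale vary together across respondents; the subscale values reported here range from moderate to good by that reading.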
Internal consistencies for the SASSI subscales were α = 0.72 for Response Accuracy (nine items), α = 0.81 for Likeability (nine items), α = 0.670 for Cognitive Demand (four items), α = 0.71 for Annoyance (five items), α = 0.722 for Habitability (five items) and α = 0.71 for Speed (two items). 2.2.3. Quality of relationship inventory We assessed the relationship with the SDS by adapting the Quality of Relationship Inventory (QRI, Pierce et al., 1997; Verhofstadt et al., 2006; validated for German Samples by Reiner et al., 2012) for application to an SDS. The inventory includes the scales support (Is support available from the relationship?), conflict (Does the relationship cause conflicts and ambivalent feelings?) and depth (How close and important is the relationship?). It is widely used for assessing relationships with significant others, and therefore designed originally for use in human–human relationships. However, research has shown that people perceive and treat computational agents as social actors even when clearly informed that their partner is a computer (Holtgraves et al., 2007; Sundar and Nass, 2000). Therefore, we adapted the items to assess the quality of the relationship in human–computer interaction. Internal consistencies for the QRI subscales were α = 0.54 for the subscale support (five items), α = 0.84 for depth (six items) and α = 0.87 for conflict (12 items). 2.2.4. Text editing Participants were given the opportunity to revise three of the answers given by the SDS in the audio file they had listened to before. These three answers addressed the topics start of studies, study regulation and residence. The initial answer on start of studies contained 41 words in both conditions, the answer on study regulation contained 13 words in both conditions, and the answer on residence contained 49 words in the condition with alignment and 45 words in the condition without alignment. 
Participants could freely edit the text in a text box in order to improve it. Changes were saved when participants had finished their input. Following a procedure developed by Jucks et al. (2007), we compared documents (using Microsoft Word to detect changes), which included using Faigley and Witte’s (1981) taxonomy of revision changes as a basis for developing a coding scheme. First, we ascertained whether a change in the text had been made or not. If yes, we examined which kind of change had been performed: addition, deletion, replacement or change of the position of words or phrases in the text. Furthermore, we examined whether participants had changed particular words, which would make a difference regarding lexical alignment, that is, whether the change led to stronger or weaker lexical alignment or had no effect on it. Then we assessed the scope; that is, whether the change affected a word, phrase, or whole sentence. Finally, we judged whether the change influenced the meaning of the text. The coding was conducted by an instructed person. In case of uncertainty, the respective case was discussed with another person familiar with the coding scheme’s criteria. 2.3. Control measures 2.3.1. NEO-PI-R We controlled for the participants’ individual disposition to trust with eight items from the German version of NEO-PI-R (Ostendorf and Angleitner, 2004, following Costa and McCrae, 1992). Within the NEO-PI-R, propensity to trust is operationalized as a dimension of agreeableness. In this extracted form, the scale has been used before by Thon and Jucks (2014). The internal consistency was α = 0.85. 2.3.2. TA-EG We controlled for participants’ affinity with technology with the German TA-EG questionnaire (Karrer et al., 2009). This contains the subscales enthusiasm, competence, positive consequences and negative consequences. 
Internal consistencies were α = 0.89 for the subscale enthusiasm (five items), α = 0.60 for competence (four items), α = 0.66 for positive consequences (five items), and α = 0.78 for negative consequences (five items). 2.4. Results 2.4.1. Preliminary analyses There were no differences between conditions for sex, age and computer knowledge (comparison of distributions with Mann–Whitney U test). Furthermore, all TA-EG and NEO-PI-R subscales were equally distributed over conditions. An alpha level of 0.05 was used for all statistical tests. We used a Kolmogorov–Smirnov goodness-of-fit test to test whether the data was normally distributed. Because the data for the scales below were not normally distributed (P < 0.05), we employed a Mann–Whitney U test to compare groups. Values for Trustworthiness Scale, SASSI and Quality of Relationship Inventory are reported one-tailed because of our uni-directional hypotheses. For the exploratory analyses regarding text revision, all reported values are two-tailed. 2.4.2. Trustworthiness scale Participants ascribed more integrity to the SDS in the condition with lexical alignment (M = 3.370, SD = 0.647) than in the condition without lexical alignment (M = 3.002, SD = 0.631), U = 1441, P < 0.001, z = 3.665. There were no differences for the subscales ability (U = 1905, P = 0.05, z = 1.611, ns) and benevolence (U = 2111.5, P = 0.24, z = 0.692, ns) (all one-tailed, Mann–Whitney U test). 2.4.3. 
SASSI Participants rated the SDS’s response accuracy (U = 1787.5, P = 0.012, z = 2.125; with alignment: M = 4.833, SD = 0.571, without alignment: M = 4.701, SD = 0.471), likeability (U = 1870.5, P = 0.034, z = 1.757; with alignment: M = 3.725, SD = 0.653, without alignment: M = 3.580, SD = 0.587) and speed (U = 2672.5, P = 0.043, z = 1.854; with alignment: M = 3.889, SD = 0.795, without alignment: M = 4.069, SD = 0.738) higher in the condition with lexical alignment than in the condition without lexical alignment. No differences were found for the subscales cognitive demand (U = 2505, P = 0.15, z = 1.051, ns), annoyance (U = 2585, P = 0.08, z = 1.403, ns) and habitability (U = 2291, P = 0.46, z = 0.102, ns) (all one-tailed, Mann–Whitney U test). 2.4.4. Quality of relationship inventory Analyses (all one-tailed, Mann–Whitney U test) revealed no significant differences on the Quality of Relationship Inventory subscales support (U = 2063.5, P = 0.18, z = 0.909, ns), depth (U = 2369.5, P = 0.33, z = 0.449, ns) and conflict (U = 2213, P = 0.40, z = 0.244, ns). 2.4.5. Text revision All results in this paragraph are reported two-tailed. The number of words in the edited answers differed between conditions for the answer on study regulations (13 words initially): There were more words in the condition without alignment (M = 20.90, SD = 11.23) than in the condition with alignment (M = 16.25, SD = 5.42), U = 2823, P = 0.013, z = 2.472. For the answer on residence (initially 49 words in the condition with alignment and 45 in the condition without alignment), we studentized the variable because of the different initial numbers of words in the conditions. This resulted in a significant difference between conditions (U = 1700.5, P = 0.012, z = 2.526), with a higher number of words in the condition with alignment (M = 0.12, SD = 0.86) than in the condition without alignment (M = −0.09, SD = 1.10). 
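The group comparisons in this section report U together with a z value and a P value. A minimal sketch of how these quantities relate is given below. It uses the normal approximation without tie correction and reports the smaller of the two U statistics, both of which are assumptions, because the statistical software used by the authors is not stated. The function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U, z, one-tailed P, two-tailed P) for two independent samples."""
    combined = sorted((value, group) for group, values in enumerate((x, y))
                      for value in values)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    # Normal approximation of U under the null hypothesis (no tie correction).
    z = abs(u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p_two_tailed = 2 * (1 - NormalDist().cdf(z))
    # Halving is only appropriate when the effect lies in the predicted direction.
    return u, z, p_two_tailed / 2, p_two_tailed


# Hypothetical ratings for two small groups.
u, z, p_one, p_two = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(z, 3), round(p_one, 3), round(p_two, 3))
```

The one-tailed value is half the two-tailed value, which is why the directional hypotheses and the exploratory text-revision analyses are reported with different tails.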
There were no differences in word number between conditions in the answer regarding start of study (U = 2161.5, P = 0.627, z = 0.487, ns). Regarding the type of changes made (sum over all three answers), there were no differences between conditions (text revision: U = 1711.5, P = 1.079, z = 0.281, ns; addition: U = 1645, P = 0.157, z = 1.416, ns; deletion: U = 1753.5, p = 0.405, z = 0.833, ns; replacement: U = 1692.5, P = 0.248, z = 1.155, ns; change in alignment: U = 1594, P = 0.100, z = 1.643, ns; influence on meaning: U = 1576.5, P = 0.113, z = 1.587, ns; range of change: U = 1795.5, P = 0.538, z = 0.616, ns). 2.5. Discussion Experiment 1 revealed the following results: When the SDS showed lexical alignment, participants ascribed more integrity to the system compared to the condition without lexical alignment. This result does not fully support Hypothesis 1 predicting that participants would ascribe more trustworthiness to the SDS using lexical alignment; however, it is in line with the predicted direction of effects. Integrity as an antecedent of trust in the integrative model of trust (Mayer et al., 1995) refers to the trustee’s honesty and upstanding nature in behavior. Thus, this antecedent refers to a personality aspect. Personality has been shown to be ascribed to computers in general (Sundar and Nass, 2000) and to SDS specifically (Nass and Brave, 2005). Besides the content of information, the only way to retrieve information about the SDS is via its voice and verbal expression, which represents an expression of its personality (Lee et al., 2000; Nass and Lee, 2001). Therefore, integrity is the antecedent that is most likely to be affected by the usage of lexical alignment. Concerning the design of SDS, this can be especially important when the usage is perceived to bear a particular risk. There were no significant differences between the conditions regarding ability and benevolence. 
Ability and benevolence are antecedents that are important to guarantee concrete success of an inquiry. Participants, however, had no genuine interest in the answers to the questions; they acted according to experimental instructions. This aspect should be considered in further studies that allow for participants’ own requests. Furthermore, participants rated the SDS’s response accuracy, its likeability, and its speed as higher in the condition with alignment than in the condition without alignment. This is broadly consistent with Hypothesis 2, which predicted higher satisfaction with the conversation in the alignment condition. The results regarding perceived higher response accuracy and speed might be due to the word overlap and the general impression of a smoother communication, because lexical alignment can enhance this impression (Koulouri et al., 2016). The higher perceived likeability of the SDS when it employed lexical alignment is in line with prior findings (Bradac et al., 1988; Branigan et al., 2010; Ireland and Pennebaker, 2010; Van Baaren et al., 2003). Romero et al. (2015), for instance, found that linguistic style matching increased the perceived trustworthiness from the point of view of third-party observers, which is consistent with the present findings. Thus, the usage of lexical alignment enhances the likeability of the SDS in the observers’ perception. Pickard et al. (2014) have found that likeability can in turn have a positive influence on alignment toward a computer. Therefore, the employment of alignment in an SDS can not only increase its perceived likeability, but also lead users to employ alignment themselves. This is especially important for systems that possess limited capabilities (Cowan and Branigan, 2015; see also Brennan, 1998), because it reduces the high lexical variability that people show (Furnas et al., 1987). 
In sum, observers’ satisfaction with the conversation was promoted by the SDS’s employment of lexical alignment. There were no significant differences between the conditions regarding the quality of the relationship, so Hypothesis 3 cannot be confirmed. It seems likely that the adopted inventory for the quality of the relationship (Verhofstadt et al., 2006) was not well-suited for the examined settings. It is probably better suited to long-term relationships, whereas the present settings included quite short encounters. Future research has to determine whether it can be usefully applied to long-term human–computer relationships. Concerning text revision, we found more words in the condition without alignment than in the condition with alignment for one answer (regarding where to find information on study regulations), but the opposite pattern of results occurred for another answer (regarding how to find residence), and no differences for the remaining answer. Thus, these effects might have occurred by chance. At the least, these results do not allow any conclusions. There were no further significant differences for the text revisions. Perhaps participants were unaware of how lexical features affect them and were accordingly hardly able to change the texts according to their preferences. 3. EXPERIMENT 2: LANGUAGE PRODUCTION Listening to a conversation is a different situation than communicating (Garrod and Pickering, 2009; Wilkes-Gibbs and Clark, 1992). The situation in Experiment 1 does not show how an SDS is perceived by users in direct communication. Therefore, the research question in the second experiment was how users experience spoken human–computer interaction when taking part in it themselves. Addressing this question enabled us to analyze how users assess SDS as active interlocutors (Holtgraves and Han, 2007). We wanted to know whether lexical alignment leads to a higher amount of trust toward the SDS in direct interaction. 
Participants put a set of questions to an SDS. The SDS either showed lexical alignment in its answers or no lexical alignment. We told participants that we wanted them to talk to the SDS in order to gather data that would improve its performance. Regarding the mock SDS, we employed the same recorded utterances as in Experiment 1. Furthermore, the set of questions asked by participants in Experiment 2 was the same as that asked by the student in Experiment 1. We derived the following hypotheses. Based on the findings of Sundar and Nass (2000), Holtgraves (2007) and Gong (2008), we hypothesized: Hypothesis 1: When the SDS shows lexical alignment, users will ascribe higher trustworthiness compared to when the system shows no lexical alignment. Furthermore, lexical alignment has been found to positively influence communicative success (Cowan and Branigan, 2015; Nenkova et al., 2008). Therefore: Hypothesis 2: When the SDS shows lexical alignment, satisfaction with the conversation will be rated higher compared to when the system shows no lexical alignment. Alignment has a positive influence on the quality of a relationship (Ireland and Pennebaker, 2010). Therefore: Hypothesis 3: When the SDS shows lexical alignment, the quality of the relationship will be rated higher compared to when the system shows no lexical alignment. 3.1. Methods and materials 3.1.1. Participants Participants were recruited via handouts and posters in the university building. They received 10 € or the equivalent in course credits for their participation. Participants who had taken part in Experiment 1 were not allowed to participate. We conducted a power analysis in advance (assuming 1 − β = 0.80; Cohen’s f = 0.25) and determined that we would need a sample size of 128. A total of 130 students at a large university in Germany participated in the experiment; 99 of them were female (76%) and 31 male (24%). Their mean age was 23.31 years (SD = 4.17). 
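As a rough plausibility check on the a priori power analysis, the sketch below approximates the required total sample size for a comparison of two groups. It assumes a two-sided test at α = 0.05 (the alpha level stated in the Results sections) and uses the conversion d = 2f, which holds for two groups. The normal approximation with a small-sample correction term stands in for whatever software the authors used, which the text does not specify; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def total_n_two_groups(f, alpha=0.05, power=0.80):
    """Approximate total N for a two-group comparison, given Cohen's f."""
    d = 2 * f  # Cohen's d for two equally sized groups
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Normal approximation plus a correction term for using t instead of z.
    n_per_group = 2 * ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 4
    return 2 * ceil(n_per_group)


print(total_n_two_groups(0.25))  # prints 128
```

Under these assumptions the approximation reproduces the reported requirement of 128 participants, that is, 64 per condition.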
On average, they were in their fourth semester (M = 4.41, SD = 2.489). Whereas 127 reported German to be their first language, three reported having another first language and having spoken German for an average of 11 years (SD = 6.08). More than one-half of the participants (56.2%) were majoring in psychology, 20 percent were training as teachers, and the others were studying various disciplines. They reported using a computer for an average of 19.23 (SD = 11.93) hours a week and having a high level of computer knowledge (M = 3.36, SD = 0.56 on a 4-point scale ranging from 1 [beginner] to 4 [expert]). They used SDS for an average of 4.41 (SD = 0.67) hours per week. When asked what they used SDS for, 16.2% reported multiple purposes; 10 percent, to gain information quickly; and 6.2%, for fun. They also reported various other purposes such as navigation (3.8%), sending messages (2.3%) or setting a timer (2.3%). 3.1.2. Procedure Participants arrived at the lab individually. They were greeted by the experimenter and received written instructions. They gave informed consent to the study and to having their speech recorded. They were instructed that we were testing a new SDS for future use in university settings. The system was named ‘ACURI’ (Advanced Conversational Utterances Research Interface); it included the same utterances as in Experiment 1. No real SDS was used. Instead, we simulated a system by employing prerecorded computer-voice utterances embedded in a computer interface. A trained assistant in another room operated the interface on a computer. The computer was connected to a telephone system. When the assistant clicked on an utterance, it was played on an iPhone® 6 in the participants’ room. The assistant was also able to hear participants’ input and could then answer with the respective prerecorded utterance in time. In this way, the system was simulated by a human operator. The setting can be described as a Wizard-of-Oz scenario (Branigan et al., 2011). 
The study was conducted in German. Participants were asked to read out a set of ten questions and statements related to university life, for example, I still have nowhere to live while studying. Can I get a room in the hall of residence? Each question or statement was printed on a separate card. The experimenter gave the whole pack of cards to participants. They were then instructed on how to pick up the phone. The experimenter informed participants that she would wait outside the room in the hallway until they finished their conversation. Then the phone rang and the experimenter left the room. Participants read the first utterance and the assistant reacted with the corresponding prerecorded audio file. Participants then read the next card and continued card by card. Depending on the condition, the assistant employed an interface containing answers to the questions (e.g. I still have no residence. Can I get a room in the hall of residence?) either with lexical alignment (e.g. You can get a room in the hall of residence if you apply for it and are a bit lucky) or without lexical alignment (e.g. Suitable application forms are provided by the student union; if you are a bit lucky you can get a room). Assignment to conditions was randomized. When participants indicated that they had finished, the experimenter came back and handed out a questionnaire. Afterwards, participants were thanked, rewarded and fully debriefed. 3.1.3. Materials Ten utterances were prepared and printed on cards to be read out loud by participants. They addressed relevant issues for first-year students such as counseling, accommodation, and study regulations (see https://sites.google.com/site/sdsalignment). The word length of the utterances was approximately equal in each condition (see Appendix A). Furthermore, the utterances provided the same amount of major and minor clauses. The SDS was simulated using prerecorded computer-voice utterances. 
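To make the manipulation concrete, the following sketch quantifies lexical overlap for the example question and answers quoted in the Procedure section. The overlap measure (the share of distinct answer words that also occur in the question) is an illustrative assumption and not the authors’ operationalization. The English sentences are the translated examples, whereas the study materials were in German; the function and variable names are hypothetical.

```python
import re


def lexical_overlap(question, answer):
    """Share of distinct words in the answer that also occur in the question."""
    question_words = set(re.findall(r"[a-z]+", question.lower()))
    answer_words = set(re.findall(r"[a-z]+", answer.lower()))
    return len(answer_words & question_words) / len(answer_words)


question = "I still have no residence. Can I get a room in the hall of residence?"
aligned = ("You can get a room in the hall of residence "
           "if you apply for it and are a bit lucky")
not_aligned = ("Suitable application forms are provided by the student union; "
               "if you are a bit lucky you can get a room")

print(round(lexical_overlap(question, aligned), 2))
print(round(lexical_overlap(question, not_aligned), 2))
```

For these translated examples, the aligned answer reuses a larger share of the question’s wording than the non-aligned answer, which is the contrast the two conditions are meant to create.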
A female voice from the MacBook Pro Voice-Over system was used to read out the system’s answers. The audio files were then embedded in an interface that allowed an assistant to quickly play each file. We created two interfaces, one for each condition. Utterances were transmitted by a telephone training system (TK-MP3 Bluetooth). Each conversation was recorded. The participants used an iPhone® 6 for the conversation. 3.2. Dependent measures For a detailed description of the scales, see the respective section in Experiment 1. We employed the same scales in both experiments. Therefore, we report only the internal consistencies here for each instrument employed. 3.2.1. Trustworthiness scale The internal consistencies of the adapted trustworthiness scale from Mayer and Davis (1999) were satisfactory with α = 0.78 for the subscale ability (five items), α = 0.84 for benevolence (five items) and α = 0.71 for integrity (six items). 3.2.2. Satisfaction with the conversation Internal consistencies of the Subjective Assessment of Speech System Interfaces (SASSI) were α = 0.71 for the subscale Response Accuracy (nine items), α = 0.82 for Likeability (nine items), α = 0.75 for Cognitive Demand (four items), α = 0.71 for Annoyance (five items), α = 0.65 for Habitability (five items) and α = 0.73 for Speed (two items). 3.2.3. Quality of relationship inventory Internal consistencies of the Quality of Relationship Inventory (QRI) were α = 0.66 for the subscale support (five items), α = 0.79 for depth (six items) and α = 0.73 for conflict (12 items). 3.3. Control measures 3.3.1. NEO-PI-R The internal consistency of the eight items from the NEO-PI-R was α = 0.80. 3.3.2. TA-EG Internal consistencies of the TA-EG were α = 0.86 for the subscale enthusiasm (five items), α = 0.69 for competence (four items), α = 0.52 for positive consequences (five items), and α = 0.74 for negative consequences (five items). 3.4. Results 3.4.1. 
Preliminary analyses There were no differences between conditions for sex, age and computer knowledge (comparison of distributions with Mann–Whitney U test). Furthermore, all subscales of TA-EG and NEO-PI-R were equally distributed over conditions. An alpha level of 0.05 was used for all statistical tests. We used a Kolmogorov–Smirnov goodness-of-fit test to test whether the data were normally distributed. Values for Trustworthiness Scale, SASSI and Quality of Relationship Inventory are reported one-tailed because of our uni-directional hypotheses. 3.4.2. Trustworthiness scale The data were not normally distributed. A Mann–Whitney U test (one-tailed) revealed no differences between conditions for the subscales ability (U = 1794, P = 0.267, z = 0.618, ns), benevolence (U = 1987, P = 0.498, z = 0.005, ns) and integrity (U = 1995.5, P = 0.461, z = 0.099, ns). 3.4.3. SASSI In an analysis of variance, we entered the normally distributed subscales response accuracy, likeability and habitability as dependent variables and whether or not the SDS showed lexical alignment as independent variable. There was no difference between conditions for the subscales response accuracy, F(1, 124) = 0.593, P = 0.443, ns, likeability, F(1, 124) = 0.504, P = 0.479, ns and habitability, F(1, 124) = 0.087, P = 0.768, ns (all one-tailed). Because the other SASSI scales were not normally distributed, we analyzed them with a Mann–Whitney U test (all reported P values are one-tailed). This revealed an effect for cognitive demand (U = 1663.5, P = 0.032, z = 1.850; mean rank with alignment = 58.18, mean rank without alignment = 70.40). Hence, the perceived cognitive demand was higher in the condition without alignment. However, there was no effect for speed (U = 1970, P = 0.296, z = 0.618, ns) or annoyance (U = 2012.5, P = 0.375, z = 0.320, ns). 3.4.4. 
Quality of relationship inventory A Mann–Whitney U test (one-tailed) revealed no differences between the conditions for the subscales support (U = 1891.5, P = 0.332, z = 0.435, ns), conflict (U = 1665, P = 0.07, z = 1.476, ns), and depth (U = 1800.5, P = 0.189, z = 0.883, ns). 3.5. Discussion The results of Experiment 2 revealed no differences between the SDS using lexical alignment and the SDS not using lexical alignment regarding trustworthiness and quality of the relationship. There was an effect on the perceived cognitive demand, that is, users perceived the conversation as more cognitively demanding when the SDS did not employ lexical alignment. However, there were no effects on other aspects of the satisfaction with the conversation, namely response accuracy, likeability, habitability, speed and annoyance. The finding that lexical alignment decreases the perceived cognitive demand is a central argument for implementing lexical alignment in the linguistic behavior of SDS. Processing one’s own speech markers represents a reduction of environmental complexity (Gallois et al., 2005; Giles et al., 1979). Keysar (2007) argues that it requires cognitive effort to consider the perspective of someone else and that it would burden working memory capacities. This could apply to listeners as well: In this case, showing alignment to users’ words would relieve cognitive burden, which is in line with the perceptions reported by participants. Pickering and Garrod (2004) argue that a representation, originally constructed for understanding, could be reused for production and vice versa, that is, their account rests on the assumption of parity of representations. Alignment on one level influences the alignment on other levels because of the interconnection of representations. Therefore, the responses in the non-alignment condition may have been more difficult to process because of a lack of activation in the cognitive structures for language processing. 
There were no differences between the conditions with and without lexical alignment regarding trustworthiness and quality of relationship. This might have to do with the novelty of the situation for the participants: They were confronted with an SDS with which they had no prior experience and with which they were asked to interact without prior demonstration. Thus, they could not fall back on a mental model and might have had difficulty fully processing the interaction (Lin et al., 2010; Roßnagel, 2004). Furthermore, the SDS confronted the participants with numerous subtopics, which may have increased the difficulty of participants’ tasks (Holtgraves and Han, 2007). Moreover, the fact that the utterances were not chosen by the participants might have hindered them from building a relationship. In this conversational situation, participants might not have felt personally involved, so that neither the trustworthiness of the SDS nor the quality of the relationship to the SDS played a role. 4. GENERAL DISCUSSION In the present study, we used two experiments to examine one and the same simulated SDS either employing or not employing lexical alignment: In Experiment 1, participants listened to a recorded conversation between the system and a student. In Experiment 2, participants communicated with the system themselves, reading out preformulated utterances. We examined the influence of the SDS’s lexical alignment on the assessment of its trustworthiness, the satisfaction with the conversation, and the quality of the relationship. The fact that the same SDS and the same utterances were used allows us to make comparisons between these different settings. First, regarding perceived trustworthiness, participants ascribed more integrity to the SDS when it employed lexical alignment in Experiment 1; however, no differences were found in Experiment 2. 
As considered before, this difference in the pattern of results might stem from the different cognitive requirements. In Experiment 2, participants had to concentrate on the utterances they were reading out. In Experiment 1, however, participants were able to concentrate more intensely on the SDS utterances, and this could have enabled them to notice more differences in their assessment. The divergent instructions might have reinforced this pattern. Concerning integrity, this antecedent of trustworthiness refers directly to the honesty and accordingly to the personality of the interlocutor. Given that voice constitutes an important expression of personality (Nass and Brave, 2005) and personality can be ascribed to a computer (Sundar and Nass, 2000), integrity is most likely to be affected by the use of lexical alignment compared to the other antecedents. Lexical alignment has a positive influence on the perceived integrity of the SDS. This should be taken into account especially when designing SDSs whose use contains a perceived risk such as the disclosure of personal data or the fulfillment of an important task. Regarding satisfaction with the conversation, Experiment 2 revealed that cognitive demand was perceived to be higher in the condition without alignment. In Experiment 1, participants rated the SDS’s response accuracy, its likeability, and its speed as being higher in the condition with alignment. Overall, this is in line with the hypothesis that the SDS’s employment of lexical alignment leads to a greater amount of satisfaction with the conversation. However, some aspects of satisfaction differed significantly between the two scenarios: When participants had to talk themselves, lexical alignment attenuated the perceived cognitive demand (see Section 3.5). Following the Interactive Alignment Model (Pickering and Garrod, 2004), representations of concepts are linked to each other. 
Therefore, alignment on one linguistic level has an impact on the cognitive processing on other levels. When participants had already used the same words as the SDS, this likely relieved participants’ cognitive burden, and, in turn, the cognitive demand posed on them by processing the SDS’s utterance. Furthermore, this is in line with the Communication Accommodation Theory, which states that when people process their own speech markers, this leads to a reduced environmental complexity (Gallois et al., 2005; Giles et al., 1979). At the same time, it is cognitively demanding to consider the perspective of another person (Keysar, 2007). When participants were observers, lexical alignment enhanced the perceived SDS’s response accuracy and speed as well as its likeability. The impression of higher response accuracy could be attributed to the word overlap; this might also have influenced the perception of higher speed. Higher ascription of likeability is in line with prior research (Bradac et al., 1988; Branigan et al., 2010; Van Baaren et al., 2003). The phenomenon that people choose from a great variety of words (Furnas et al., 1987) when speaking is reduced when people are led to adapt words that they have heard before. This has an influence on the ease of communication (Cowan and Branigan, 2015; Nenkova et al., 2008; Tomko and Rosenfeld, 2004). Both experiments revealed no differences between conditions in terms of the quality of the relationship. To our knowledge, the employed scale has not been used to assess human–computer interaction before, because it was developed originally for human–human relationships. Even with adaptations, it might not be suitable for the present context. However, the examined human–computer interaction was only a short-term interaction and a relationship over a longer period might well have yielded different results (Lin et al., 2010; Roßnagel, 2004). 
In sum, the present results lead to the recommendation to implement lexical alignment in SDS. This could not only help to guide users’ wordings and increase the predictability of their input, but could also enhance the acceptance of SDS and the willingness to use them, because it might help users to experience smooth communication (Cowan and Branigan, 2015; Koulouri et al., 2016; Levitan et al., 2011).

4.1. Limitations and suggestions for future research

In Experiment 1, participants could not use their own words, but had to read preformulated sentences. This allowed for experimental control at the expense of ecological validity. Experiment 2 examined the observer’s perspective. Future research should therefore include the opportunity for participants to employ their own words and examine the influence of lexical alignment in that setting. Moreover, research should also examine relationships with SDS that last for a longer period of time, such as personal assistants used for multiple purposes. In long-term interactions, users get to know the linguistic features of an SDS, and lexical alignment could result in different effects compared to one-time interactions. In particular, systems that are capable of strengthening their amount of lexical alignment could positively influence how their users assess them.

Furthermore, participants of both experiments constituted a rather homogeneous group of young students, mainly of psychology and teacher training. Thus, they only represent a narrow group of potential SDS users. The results of this study are likely limited to this potential user group. Future studies should focus on less educated persons, older persons and persons with lower technical affinity because they probably have other needs regarding SDS.

To date, lexical alignment constitutes an important topic in human–computer interaction because it enhances the usability of an SDS that has restricted capacity.
Increasing capacities and computing power will bring us closer to the concept of pragmatic or conceptual alignment (Stolk et al., 2016).

5. CONCLUSION

The employment of lexical alignment in an SDS led persons who spoke given utterances to that SDS to report a lower amount of cognitive demand. Furthermore, in persons who listened to the SDS communicating with someone else, lexical alignment led to a higher perceived response accuracy, speed and likeability. In the latter setting, persons also ascribed a higher amount of integrity to the system when it employed lexical alignment. Therefore, we recommend including lexical alignment in the design of SDS.

In general, the fast technological development of SDS opens up new sets of questions regarding the employment of lexical alignment in the relationship between humans and computers. To date, both the adaptation from the system to the user and the adaptation from the user to the system seem to be beneficial. However, with increasing competence of SDS on the one hand and people’s growing experience in the use of these systems on the other hand, the question of who will ultimately adapt to whom remains open. Furthermore, social aspects are likely to become even more important. In this vein, lexical alignment or its absence may be judged not only with regard to comprehensibility, but may also lead to judgments about the system’s ‘personality’. Thus, SDS’s increasing complexity of capabilities is likely to be accompanied by an increasing complexity of its social status and relationship to humans.

SUPPLEMENTARY MATERIAL

Supplementary data is available at Interacting with Computers online.

FUNDING

Deutsche Forschungsgemeinschaft (German Research Foundation) within the framework of Research Training Group GRK 1712: Trust and Communication in a Digitized World.
The Deutsche Forschungsgemeinschaft had no involvement in study design, data collection, analysis and interpretation and the decision to submit the article for publication.

ACKNOWLEDGEMENTS

We thank Christina Hanna, Jens Riehemann, Daniel Ruholl and Bianca Siemering for their help in implementing the experimental setting, data collection and processing. We thank Jonathan Harrow for language editing.

Footnotes

1 Reversed scale, therefore, higher values represent an impression of slowness.

REFERENCES

Barr, D.J. and Keysar, B. (2002) Anchoring comprehension in linguistic precedents. J. Mem. Lang., 46, 391–418. doi:10.1006/jmla.2001.2815.
Bell, L., Gustafson, J. and Heldner, M. (2003) Prosodic adaption in human–computer interaction. Proc. ICPhS Barc., 15, 2453–2456.
Bradac, J.J., Mulac, A. and House, A. (1988) Lexical diversity and magnitude of convergent versus divergent style shifting: perceptual and evaluative consequences. Lang. Commun., 8, 213–228.
Branigan, H. and Pearson, J. (2006) Alignment in Human-Computer Interaction. In Fischer, K. (ed.), How People Talk to Computers, Robots, and Other Artificial Communication Partners. pp. 140–156. HWK, Delmenhorst, Germany.
Branigan, H.P., Pickering, M.J., Pearson, J. and McLean, J.F. (2010) Linguistic alignment between people and computers. J. Pragmat., 42, 2355–2368. doi:10.1016/j.pragma.2009.12.012.
Branigan, H.P., Pickering, M.J., Pearson, J., McLean, J.F. and Brown, A. (2011) The role of beliefs in lexical alignment: evidence from dialogs with humans and computers. Cognition, 121, 41–57. doi:10.1016/j.cognition.2011.05.011.
Branigan, H.P., Pickering, M.J., McLean, J.F. and Cleland, A.A. (2007) Syntactic alignment and participant role in dialogue. Cognition, 104, 163–197. doi:10.1016/j.cognition.2006.05.006.
Brennan, S.E. (1998) The grounding problem in conversations with and through computers. In Fussell, S.R. and Kreuz, R.J. (eds), Social and Cognitive Psychological Approaches to Interpersonal Communication. pp. 201–225. Lawrence Erlbaum, Hillsdale, NJ.
Brennan, S.E. and Clark, H.H. (1996) Conceptual pacts and lexical choice in conversation. J. Exp. Psychol. Learn. Mem. Cogn., 22, 1482–1493. doi:10.1037/0278-7393.22.6.1482.
Clark, H.H. (1996) Using Language. Cambridge University Press, Cambridge, UK.
Clark, H.H. and Brennan, S.E. (1991) Grounding in communication. Perspect. Soc. Shared Cogn., 13, 127–149. doi:10.1037/10096-006.
Clark, H.H. and Krych, M.A. (2004) Speaking while monitoring addressees for understanding. J. Mem. Lang., 50, 62–81. doi:10.1016/j.jml.2003.08.004.
Costa, P.T., Jr. and McCrae, R.R. (1992) Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) Professional manual. Psychological Assessment Resources, Odessa, FL.
Cowan, B.R. and Branigan, H.P. (2015) Does voice anthropomorphism affect lexical alignment in speech-based human–computer dialogue? In Proceedings of INTERSPEECH 2015. pp. 155–159. International Speech Communication Association, San José, CA, USA.
Cowan, B.R., Branigan, H.P., Obregón, M., Bugis, E. and Beale, R. (2015) Voice anthropomorphism, interlocutor modelling and alignment effects on syntactic choices in human–computer dialogue. Int. J. Hum. Comput. Stud., 83, 27–42. doi:10.1016/j.ijhcs.2015.05.008.
Danet, B. (1980) ‘Baby’ or ‘fetus’?: Language and the construction of reality in a manslaughter trial. Semiotica, 32, 187–220.
De Angeli, A., Gerbino, W., Nodari, E. and Petrelli, D. (1999) From tools to friends: where is the borderline? In Proceedings of the UM’99 Workshop on Attitude, Personality and Emotions in User-Adapted Interaction. pp. 1–10. Springer, Berlin, Germany.
Dybkjær, L. and Bernsen, N.O. (2000) Usability issues in spoken dialogue systems. Nat. Lang. Eng., 6, 243–271.
Edlund, J., Gustafson, J., Heldner, M. and Hjalmarsson, A. (2008) Towards human-like dialogue systems. Speech Commun., 50, 630–645. doi:10.1016/j.specom.2008.04.002.
Faigley, L. and Witte, S. (1981) Analyzing revision. Coll. Composit. Commun., 32, 400–414. doi:10.2307/356602.
Foltz, A., Gaspers, J., Thiele, K., Stenneken, P. and Cimiano, P. (2015) Lexical alignment in triadic communication. Front. Psychol., 6, 127. doi:10.3389/fpsyg.2015.00127.
Furnas, G.W., Landauer, T.K., Gomez, L.M. and Dumais, S.T. (1987) The vocabulary problem in human–system communication. Commun. ACM, 30, 964–971. doi:10.1145/32206.32212.
Gallois, C., Ogay, T.T. and Giles, H. (2005) Communication accommodation theory: a look back and a look ahead. In Gudykunst, W. (ed.), Theorizing About Intercultural Communication. pp. 121–148. Sage, Thousand Oaks, CA.
Garrod, S. and Anderson, A. (1987) Saying what you mean in dialogue: a study in conceptual and semantic co-ordination. Cognition, 27, 181–218. doi:10.1016/0010-0277(87)90018-7.
Garrod, S. and Pickering, M.J. (2009) Joint action, interactive alignment, and dialog. Top. Cogn. Sci., 1, 292–304. doi:10.1111/j.1756-8765.2009.01020.x.
Giles, H., Scherer, K.R. and Taylor, D.M. (1979) Speech Markers in Social Interaction. In Scherer, K.R. and Giles, H. (eds), Social Markers in Speech. pp. 343–381. Cambridge University Press, Cambridge.
Giles, H., Coupland, N. and Coupland, J. (1991) Accommodation theory: communication, context, and consequence. In Giles, H., Coupland, J. and Coupland, N. (eds), Contexts of Accommodation. pp. 1–68. Cambridge University Press, New York, NY.
Gong, L. (2008) How social is social responses to computers? The function of the degree of anthropomorphism in computer representations. Comput. Human. Behav., 24, 1494–1509. doi:10.1016/j.chb.2007.05.007.
Gustafson, J., Larsson, A., Carlson, R. and Hellman, K. (1997). How do system questions influence lexical choices in user answers? Paper presented at the Eurospeech 1997, Rhodos, Greece.
Hempel, J. (2015, August 26). Facebook launches M, its bold answer to Siri and Cortana. Retrieved May 16, 2016. http://www.wired.com/2015/08/facebook-launches-m-new-kind-virtual-assistant/
Holtgraves, T. and Han, T.L. (2007) A procedure for studying online conversational processing using a chat bot. Behav. Res. Methods, 39, 156–163.
Holtgraves, T., Ross, S., Weywadt, C. and Han, T.L. (2007) Perceiving artificial social agents. Comput. Human. Behav., 23, 2163–2174. doi:10.1016/j.chb.2006.02.017.
Hone, K.S. and Graham, R. (2000) Towards a tool for the subjective assessment of speech system interfaces (SASSI). Nat. Lang. Eng., 6, 287–303.
Hone, K.S. and Graham, R. (2001). Subjective assessment of speech–system interface usability. Proc. 7th Eur. Conf. on Speech Communication and Technology (EUROSPEECH 2001–Scandinavia) (pp. 2083–2086), Aalborg, Denmark.
Ireland, M.E. and Pennebaker, J.W. (2010) Language style matching in writing: synchrony in essays, correspondence, and poetry. J. Pers. Soc. Psychol., 99, 549–571. doi:10.1037/a0020386.
Joinson, A.N., Reips, U.-D., Buchanan, T. and Paine Schofield, C.B. (2010) Privacy, trust, and self-disclosure online. Hum. Comput. Int., 25, 1–24. doi:10.1080/07370020903586662.
Jucks, R., Linnemann, G.A., Thon, F.M. and Zimmermann, M. (2016) Trust the words: insights into the role of language in trust building in a digitalized world. In Blöbaum, B. (ed.), Trust and Communication in a Digitized World. pp. 225–237. Springer International Publishing, Cham, Switzerland. doi:10.1007/978-3-319-28059-2.
Jucks, R., Päuler, L. and Brummernhenrich, B. (2014) ‘I need to be explicit: You’re wrong’: impact of face threats on social evaluations in online instructional communication. Int. Comput., 28, 73–84. doi:10.1093/iwc/iwu032.
Jucks, R., Schulte-Löbbert, P. and Bromme, R. (2007) Supporting experts’ written knowledge communication through reflective prompts on the use of specialist concepts. Z. Psychosom. J. Psychol., 215, 237–247. doi:10.1027/0044-3409.215.4.237.
Karrer, K., Glaser, C., Clemens, C. and Bruder, C. (2009) Technikaffinität erfassen—der Fragebogen TA-EG [Assessing affinity with technology: the TA-EG questionnaire]. Der Mensch im Mittelpunkt technischer Systeme, 8, 196–201.
Keysar, B. (2007) Communication and miscommunication: the role of egocentric processes. Intercult. Pragmatics, 4, 71–84. doi:10.1515/IP.2007.004.
Krauss, R.M. (1987) The role of the listener: addressee influences on message formulation. J. Lang. Soc. Psychol., 6, 81–98. doi:10.1177/0261927X8700600201.
Krauss, R.M. and Fussell, S.R. (1991) Perspective-taking in communication: representations of others’ knowledge in reference. Soc. Cogn., 9, 2–24.
Koulouri, T., Lauria, S. and Macredie, R.D. (2016) Do (and say) as I say: linguistic adaptation in human–computer dialogs. Hum. Comput. Interact., 31, 59–95. doi:10.1080/07370024.2014.934180.
Lee, E.J., Nass, C. and Brave, S. (2000). Can computer-generated speech have gender? An experimental test of gender stereotypes. Proceeding CHI EA ’00 Extended Abstracts on Human Factors in Computing Systems (pp. 289–290). New York, NY. doi:10.1145/633292.633461.
Levitan, R., Gravano, A. and Hirschberg, J. (2011) Entrainment in speech preceding backchannels. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers-Volume 2. pp. 113–117. Association for Computational Linguistics, Stroudsburg, PA.
Lin, S., Keysar, B. and Epley, N. (2010) Reflexively mindblind: using theory of mind to interpret behavior requires effortful attention. J. Exp. Soc. Psychol., 46, 551–556. doi:10.1016/j.jesp.2009.12.019.
Linnemann, G.A. and Jucks, R. (2016) As in the question, so in the answer? Language style of human and machine speakers affects interlocutors’ convergence on wordings. J. Lang. Soc. Psychol., doi:10.1177/0261927X15625444.
Lopes, J., Eskenazi, M. and Trancoso, I. (2011) Towards choosing better primes for spoken dialog systems. In Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop. pp. 306–311. IEEE, Hawaii.
López-Cózar, R., Callejas, Z., Griol, D. and Quesada, J.F. (2014) Review of spoken dialogue systems. Loquens, 1, e012. doi:10.3989/loquens.2014.012.
Luger, E. and Sellen, A. (2016) Like having a really bad PA: the gulf between user expectation and experience of conversational agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. pp. 5286–5297. ACM, San José, CA, USA.
Maddux, W.W., Mullen, E. and Galinsky, A.D. (2007) Chameleons bake bigger pies and take bigger pieces: strategic behavioral mimicry facilitates negotiation outcomes. J. Exp. Soc. Psychol., 44, 461–468. doi:10.1016/j.jesp.2007.02.003.
Manjoo, F. (2016, March 09). The echo from Amazon brims with groundbreaking promise. http://www.nytimes.com/2016/03/10/technology/the-echo-from-amazon-brims-with-groundbreaking-promise.html?_r=0
Mavridis, N. (2015) A review of verbal and non-verbal human–robot interactive communication. Rob. Auton. Syst., 63, 22–35. doi:10.1016/j.robot.2014.09.031.
Mayer, R.C. and Davis, J.H. (1999) The effect of the performance appraisal system on trust for management: a field quasi-experiment. J. Appl. Psychol., 84, 123. doi:10.1037/0021-9010.84.1.123.
Mayer, R.C., Davis, J.H. and Schoorman, F.D. (1995) An integrative model of organizational trust. Acad. Manage. Rev., 20, 709–734. doi:10.2307/258792.
McKnight, D.H. (2005) Trust in information technology. In Davis, G.B. (ed.), The Blackwell Encyclopedia of Management. Vol. 7 Management Information Systems. pp. 329–331. Blackwell, Malden, MA.
McKnight, D.H. and Chervany, N.L. (2001) Trust and distrust definitions: one bite at a time. In Trust in Cyber-societies. pp. 27–54. Springer, Berlin, Germany. doi:10.1007/3-540-45547-7-3.
Metzing, C. and Brennan, S.E. (2003) When conceptual pacts are broken: partner-specific effects on the comprehension of referring expressions. J. Mem. Lang., 49, 201–213. doi:10.1016/S0749-596X(03)00028-7.
Mitchell, W.J., Ho, C.C., Patel, H. and MacDorman, K.F. (2011) Does social desirability bias favor humans? Explicit–implicit evaluations of synthesized speech support a new HCI model of impression management. Comput. Hum. Behav., 27, 402–412. doi:10.1016/j.chb.2010.09.002.
Nass, C.I. and Brave, S. (2005) Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. MIT Press, Cambridge.
Nass, C. and Lee, K.M. (2001) Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. J. Exp. Psychol. Appl., 7, 171. doi:10.1037//1076-898X.7.3.171.
Nenkova, A., Gravano, A. and Hirschberg, J. (2008) High frequency word entrainment in spoken dialogue. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers. pp. 169–172. Association for Computational Linguistics, Stroudsburg, PA.
Ostendorf, F. and Angleitner, A. (2004) NEO-PI-R-NEO Persönlichkeitsinventar nach Costa und McCrae—Revidierte Fassung (PSYNDEX Tests Review) [Costa and McCrae’s Revised NEO Personality Inventory]. Hogrefe, Göttingen, Germany.
Paek, T. and Pieraccini, R. (2008) Automating spoken dialogue management design using machine learning: an industry perspective. Speech Commun., 50, 716–729. doi:10.1016/j.specom.2008.03.010.
Pickard, M.D., Burgoon, J.K. and Derrick, D.C. (2014) Toward an objective linguistic-based measure of perceived embodied conversational agent power and likeability. Int. J. Hum. Comput. Interact., 30, 495–516. doi:10.1080/10447318.2014.888504.
Pickering, M.J. and Garrod, S. (2004) Toward a mechanistic psychology of dialogue. Behav. Brain Sci., 27, 169–226. doi:10.1017/S0140525X04000056.
Pierce, G.R., Sarason, I.G., Sarason, B.R., Solky-Butzel, J.A. and Nagle, L.C. (1997) Assessing the quality of personal relationships. J. Soc. Pers. Relat., 14, 339–356. doi:10.1177/0265407597143004.
Reiner, I., Beutel, M., Skaletz, C., Brähler, E. and Stöbel-Richter, Y. (2012) Validating the German version of the Quality of Relationship Inventory: confirming the three-factor structure and report of psychometric properties. PLoS One, 7, e37380. doi:10.1371/journal.pone.0037380.
Roßnagel, C.S. (2004) Lost in thought: cognitive load and the processing of addressees’ feedback in verbal communication. Exp. Psychol., 51, 191–200. doi:10.1027/1618-3169.51.3.191.
Romero, D.M., Swaab, R.I., Uzzi, B. and Galinsky, A.D. (2015) Mimicry is presidential: linguistic style matching in presidential debates and improved polling numbers. Pers. Soc. Psychol. Bull., 41, 1311–1319. doi:10.1177/0146167215591168.
Stolk, A., Verhagen, L. and Toni, I. (2016) Conceptual alignment: how brains achieve mutual understanding. Trends Cogn. Sci., 20, 180–191. doi:10.1016/j.tics.2015.11.007.
Sundar, S.S. and Nass, C. (2000) Source orientation in human–computer interaction: programmer, networker, or independent social actor. Commun. Res., 27, 683–703. doi:10.1177/009365000027006001.
Thon, F.M. and Jucks, R. (2014) Regulating privacy in interpersonal online communication: the role of self-disclosure. Stud. Commun. Sci., 14, 3–11. doi:10.1016/j.scoms.2014.03.012.
Tomko, S. and Rosenfeld, R. (2004). Shaping spoken input in user-initiative systems. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004. Boston, MA.
Torrey, C., Powers, A., Marge, M., Fussell, S.R. and Kiesler, S. (2006). Effects of adaptive robot dialogue on information exchange and social relations. Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human–Robot Interaction (pp. 126–133). New York, NY: ACM.
Tseng, S. and Fogg, B.J. (1999) Credibility and computing technology. Commun. ACM, 42, 39–44. doi:10.1145/301353.301402.
Van Baaren, R.B., Holland, R.W., Steenaert, B. and van Knippenberg, A. (2003) Mimicry for money: behavioral consequences of imitation. J. Exp. Soc. Psychol., 39, 393–398.
Van der Wege, M.M. (2009) Lexical entrainment and lexical differentiation in reference phrase choice. J. Mem. Lang., 60, 448–463.
Verhofstadt, L.L., Buysse, A., Rosseel, Y. and Peene, O.J. (2006) Confirming the three-factor structure of the quality of relationships inventory within couples. Psychol. Assess., 18, 15–21. doi:10.1037/1040-3590.18.1.15.
Vinyals, O. and Le, Q. (2015). A neural conversational model. arXiv preprint arXiv:1506.05869.
Von der Pütten, A.M., Krämer, N.C., Gratch, J. and Kang, S.H. (2010) ‘It doesn’t matter what you are!’ Explaining social effects of agents and avatars. Comput. Human. Behav., 26, 1641–1650. doi:10.1016/j.chb.2010.06.012.
Wilkes-Gibbs, D. and Clark, H.H. (1992) Coordinating beliefs in conversation. J. Mem. Lang., 31, 183–194. doi:10.1016/0749-596X(92)90010-U.

Author notes

Editorial Board Member: Dr Maria Wolters

© The Author(s) 2018.
Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Interacting with Computers, Oxford University Press

Published: Mar 6, 2018
