Estimating a latent-class user model for travel recommender systems

Theo Arentze, Astrid Kemperman, Petr Aksenov

Inf Technol Tourism (2018) 19:61–82
https://doi.org/10.1007/s40558-018-0105-z
Original Research

Received: 30 May 2017 / Revised: 21 December 2017 / Accepted: 17 January 2018 / Published online: 2 February 2018
© The Author(s) 2018. This article is an open access publication

Theo Arentze, t.a.arentze@tue.nl
Urban Systems and Real Estate Group, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, The Netherlands

Abstract In determining the selection of sites to visit on a trip tourists have to trade-off attraction values against routing and time-use characteristics of points of interest (POIs). For recommending optimal personalized travel plans an accurate assessment of how users make these trade-offs is important. In this paper we report the results of a study conducted to estimate a user model for travel recommender systems. The proposed model is part of c-Space, a tour-recommender system for tourists on a city trip which uses the LATUS algorithm to find personalized optimal tours. The model takes into account a multi-attribute utility function of POIs as well as dynamic needs of persons on a trip. A stated choice experiment is designed where the current need is manipulated as a context variable and activity choice alternatives are varied. A random sample of 316 individuals participated in the on-line survey. A latent-class analysis shows that significant differences exist between tourists in terms of how they make the trade-offs between the factors and respond to needs. The estimation results provide the parameters of a multi-class user model that can be used for travel recommender systems.

Keywords Travel recommender systems · User model · City trip · Stated choice experiment · Latent class model
1 Introduction

With the advancement of information and communication technologies (ICT), the development and use of recommender systems that offer tourists personalized advice and recommendations on which activities to conduct at a destination has received increasing attention (e.g., Buhalis 1998; Buhalis and Law 2008; Mackay and Vogt 2012; Steen Jacobsen and Munar 2012). A typical user of a travel recommender system is a tourist who is interested in exploring a city and wants to make a tour of it (e.g., Yang and Hwang 2013; Borras et al. 2014). Such a tour comprises a scheduled list of attractions (museums, heritage sites, shops, parks or other points of specific interest) as well as the trips needed to travel from one point to the other (e.g., Gretzel et al. 2004; Gavalas et al. 2014). Travel Recommender Systems (TRSs) help to overcome the information load tourists may experience when they search for options, by providing users with selected items that match their personal preferences (Braunhofer et al. 2015). For this, a critical element of TRSs is the ability to acquire the relevant information about preferences and needs of the user and to identify the POIs that match his or her interests. A number of alternative methods have been proposed to tackle this problem. These can be classified as collaborative filtering (matching a user to other users that have similar interests and preferences), content-based filtering (matching based on attributes of POIs) and knowledge-based methods (e.g., case-based reasoning). An overview of techniques in this area can be found in Hanani et al. (2001) and Adomavicius and Tuzhilin (2005). Across these approaches, the user preferences to be predicted are often formulated as ratings assigned to items (POIs) that reflect how much one likes the product or service.
For determining an optimal tour, however, users have to trade off their interests in certain POIs against other considerations such as travel costs (the time and effort it takes to reach the location), fee or entrance costs, and the preferred allocation of time across activities. Furthermore, individuals' preferences may depend on needs that change depending on previous activities. Such dynamic needs give rise to saturation effects and variety seeking (Arentze and Timmermans 2009). If multiple activities have to be combined on a trip, the way a user makes trade-offs between these considerations determines overall preferences for selections of POIs. Thus, in the context of tour planning, the selection of POIs is a multi-criteria decision problem. In addition, individual travelers may differ in the weights they assign to these components in determining their preference.

Although the multi-criteria nature of preferences for tours is widely acknowledged in advanced trip planners for ordinary travel (Kerkman et al. 2012), it has received limited attention in user models of TRSs. In the present study, we present a method to estimate tourists' preferences taking into account the various factors involved in city trip planning. In this method, the preference value or utility for including a certain POI in a tour is modeled as a function of attributes of POIs. The utility function is estimated using a stated choice experiment administered in a survey. The estimated utility function defines a user model that allows a TRS to compose an optimal tour given personal information about specific interests of an individual user. We design a stated choice experiment that allows the estimation of the relevant parameters and present the results of an application involving a large sample of individuals from a national on-line panel. Individual tourists may differ in terms of the way they make the trade-offs.
To account for heterogeneity among individuals and identify the extent to which preferences may differ, we estimate a latent-class model. The method we propose is developed in the context of the c-Space TRS for city trips (Aksenov et al. 2014, 2016). A special characteristic of the c-Space system is that it takes dynamic needs into account by using an advanced algorithm to find personalized optimal tours called LATUS (Arentze 2015). In the context of the c-Space system, the estimates are used to define an initial user profile that can be adapted if more information about a user's preferences becomes available. The recommender system and the LATUS algorithm have been described in earlier work as referenced above. In this study, we briefly explain the system and present the proposed method to estimate user preference profiles. The results of this study also provide substantive insights into tourists' preferences for visiting POIs in city tours.

The rest of the paper is structured as follows. In the next section we review the existing approaches in the field of TRSs with respect to user modeling. Then, in Sect. 3, we briefly describe the c-Space system and the LATUS algorithm to offer a system concept for the user model. In Sect. 4, we describe the stated choice experiment and survey method. In Sect. 5, we present the results of the survey and the estimation of the latent-class model. Finally, we conclude the paper with a discussion of major conclusions and directions for future research.

2 Related work

The core component of TRSs is a (filtering) algorithm to select from an exhaustive database the POIs that match a user's preferences. Collaborative filtering is a much used technique in TRSs. In this technique, personal background or history information about a user is used to identify users with similar characteristics whose preferences are known. Preferences are typically represented in the form of ratings assigned to POIs.
The average rating assigned by similar previous users is used as a best estimate of the preferences of the user the system is interacting with. The definition of similarity is a critical component in this process. If ratings of the user are already known from previous interactions with the system, similarity can be measured based on matching ratings. If such history information is not available, similarity may be defined based on known demographic data of users such as age, gender and education. An alternative to collaborative filtering is content-based filtering. In a content-based approach, items are recommended that have the same attributes as the items that the user has liked before (Neidhardt et al. 2015). A generally acknowledged problem with the filtering methods is the so-called cold-start problem. This problem occurs when requests come from new users who have not yet submitted any ratings or concern new items which have not been evaluated before (the first-rater problem) (Fonte et al. 2013). Knowledge-based systems have been proposed where preferences are derived based on reasoning about user requirements that go beyond a simple matching of ratings. A well-known example of a knowledge-based technique is case-based reasoning (Fonte et al. 2013).

The new user problem has also received attention in so-called Context Aware Recommender Systems (CARS). These systems emphasize that users' preferences are dependent on contextual conditions and, hence, that recommendations should be context dependent. In tourism choice, weather conditions (sunny or rainy, etc.), travel party (alone or traveling with others) and travel mode (e.g., transport mode) are influential contextual conditions. Braunhofer and Ricci (2017) report the results of a survey conducted to identify important context factors and estimate the influence of these factors on rating predictions in the context of TRSs.
Also, the role of emotion and personality traits has received attention as context factors in CARS. In a survey conducted to elicit tourists' preferences, Neidhardt et al. (2015) use a picture-based approach to address preferences on an emotional level. Braunhofer et al. (2015) show that personality traits of the Big-5 model provide useful information for generating context-aware recommendations. They argue that personality trait data are relatively easy to collect and especially useful for ranking the recommendations in case of new users.

TRSs have gone further than recommending POIs in isolation. Recommendation of complete packages is relevant for tourists who want to plan a tour combining visits to several POIs on the same trip, e.g., a day-tour in a city. Many systems have considered this extended problem of recommending routes (for a review see Wörndl and Hefele 2016). In planning a route, preferences related to interests in POIs need to be combined with other characteristics of POIs such as estimated visit times, travel distance and costs (fee or entrance). As Wörndl and Hefele (2016) state: "the process of generating a path from a start to an end point with interesting POIs along the way can be split up into two subtasks. First, potential candidate places have to be determined and scored, and then a path finding algorithm need to generate the best route consisting of a subset of these places." An example is the image-based system MoreTourism (Linaza et al. 2011). This system first elicits a user's preferences and next recommends the POIs that have the highest utility and an optimal route taking into account estimated visit times, opening and closing times, and costs.

In this study, we consider TRSs that have the objective to recommend complete tours. Finding an optimal tour requires that POI rating scores are traded off against travel time, entrance costs and time-use characteristics of POIs.
The purpose of the present study is to empirically assess the way individuals make these trade-offs. We model individuals' preferences for POIs in the context of a tour as a multi-attribute utility function and estimate the utility weights in the framework of a discrete choice model. The influence of context conditions is taken into account to allow context-aware recommendation. Stated preference data from a representative sample of individuals are collected in an on-line survey. Using a latent-class model, the estimation of preference parameters and the clustering of individuals regarding the preferences they display are performed simultaneously. Thus, the estimation results also indicate the extent to which preferences differ between individuals. In the next section, we will first briefly introduce the c-Space TRS.

3 The c-Space system

To formulate the multi-attribute utility function, the c-Space TRS (Aksenov et al. 2016) is the point of departure. c-Space generates personalized tours taking into account a user's personal thematic interests in particular POIs (architecture, cathedrals, museums, etc.) as well as the weights he or she assigns to a set of basic leisure needs (relaxation, entertainment, new experiences, socializing, etc.). c-Space has been developed as a smartphone application wherein the recommendation functionality is integrated as a REST service (Simoes et al. 2015). Thematic interests and needs as well as time budget and travel constraints are retrieved in a dialogue with the user on the smartphone. Figure 1 shows an example of a dialogue. The resulting user profile is input to the LATUS algorithm together with utility weights of attributes of POIs. The recommended tour, including travel plans to reach the various locations, is displayed on a map of the city (Fig. 2). In c-Space, location and attribute data about the available POIs in the city of interest are stored in a database.
The attribute data stored include general information, such as opening hours and ticket costs (entrance fee), as well as information specifically collected for the c-Space system. The specific information includes the recommended duration of a visit to the POI (in hour units), attraction value (popularity) and theme (subject). The specific information is provided by experts from the local tourist agency. A special part of the POI data consists of parameters, one for each need, that indicate the extent to which visiting the POI matches needs on a zero–one scale (zero indicating no match and one a complete match). These parameters are determined based on rule-based knowledge of the types of activities involved in a POI (e.g., a museum can satisfy a need for new experiences to a large extent and a need for physical exercise to a small extent; a botanic garden matches a need to be outdoors to a large extent and a need for entertainment to a small extent, etc.). The degree of match also determines the size of the impact the POI has on the need (e.g., a museum reduces the need for new experiences to a large extent).

Fig. 1 Example of a c-Space user dialogue for setting weights of needs

Fig. 2 Example of a c-Space display of a route plan

The POI database and the personal profile of the user provide the information for determining utility scores of POIs. The utility score of a POI is determined as a function of the match of the POI with the interests of the person, the attraction value of the POI, the match of the POI with current needs, the travel required (geographical distance) and the monetary costs involved in visiting the POI. A POI matches the interests if the theme of the POI corresponds with a theme the user has indicated to be of interest to him or her. The match with current needs is determined based on the POI-need-matching parameters, the weights the user assigns to the different needs and an assessment of the current size of each need.
The sum across weighted need impacts determines the utility score regarding the match with needs (see Arentze 2015 for details). Due to the dynamic nature of needs, the utility function is dynamic. That is to say, the utility of a POI depends on the other POIs included in the tour because of the impacts activities have on the dynamic needs of the traveler (e.g., a museum will be less attractive if current POIs in the program have already reduced the need for new experiences). Therefore, each time a POI is added to the evolving program, the state of the needs is updated before a next POI is considered. The LATUS algorithm is designed to determine the optimal selection of POIs taking into account these interactions between POIs on the overall utility.

LATUS starts with an empty program and successively adds POIs selected from a list of optional POIs until the time budget is fully used or no utility can be added anymore. The problem of finding the optimal tour is split in two parts: (1) determining the program by selecting POIs and (2) determining the sequence in which the POIs are visited in the tour and the travel routes between POIs. The optimum sequence is defined as the sequence that minimizes the overall travel costs and is found by means of a heuristic method. To find the selection that maximizes the utility of the tour, LATUS also uses a heuristic method. The method is schematically shown in Fig. 3. In this method, the best POI to add is identified as the POI that meets a time-use requirement and maximizes the added utility. The time-use requirement is defined as a threshold level of the utility per unit time, taking into account the time to reach the location and the (normal) visiting duration. The threshold level is a parameter set by the system that should reflect the time budget. The more time available the lower the threshold can be set and, vice versa, the tighter the budget the higher the threshold needs to be.
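The selection step described above can be illustrated with a minimal Python sketch. It is not the LATUS implementation: the `POI` fields, the additive utility, the multiplicative need update and the fixed travel time per POI are all simplifying assumptions made for this example, and the threshold is passed in as a given value.

```python
from dataclasses import dataclass, field


@dataclass
class POI:
    name: str
    base_utility: float   # static part (interest, attraction, costs); hypothetical scale
    visit_hours: float    # recommended visit duration
    travel_hours: float   # time to reach the POI, held fixed here for simplicity
    need_match: dict = field(default_factory=dict)  # need -> match on a 0..1 scale


def poi_utility(poi, needs, need_weights):
    """Static utility plus the weighted match with the current size of each need."""
    need_part = sum(need_weights.get(n, 0.0) * needs.get(n, 0.0) * m
                    for n, m in poi.need_match.items())
    return poi.base_utility + need_part


def select_program(pois, needs, need_weights, budget_hours, threshold):
    """Greedily add the POI with the highest utility that meets the time-use threshold."""
    needs = dict(needs)            # work on a copy; needs change as POIs are added
    remaining = list(pois)
    program, time_left = [], budget_hours
    while remaining:
        best, best_u = None, 0.0   # only POIs that add positive utility qualify
        for poi in remaining:
            duration = poi.visit_hours + poi.travel_hours
            if duration > time_left:
                continue
            u = poi_utility(poi, needs, need_weights)
            if u / duration >= threshold and u > best_u:
                best, best_u = poi, u
        if best is None:
            break                  # nothing fits the budget or meets the threshold
        program.append(best.name)
        time_left -= best.visit_hours + best.travel_hours
        remaining.remove(best)
        # Assumed update rule: a POI reduces each need in proportion to its match.
        for n, m in best.need_match.items():
            needs[n] = needs.get(n, 0.0) * (1.0 - m)
    return program, time_left
```

With two similar museums and a park, the second museum loses utility once the first has reduced the need for new experiences, so the park is added instead. This is the saturation effect that a static rating would miss.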
Since the proper level of the threshold cannot be computed analytically, LATUS uses a trial-and-error method to find the proper threshold level in a pre-processing step. Starting with a best-guess initial value, it increases the threshold when the resulting selection exceeds the budget and lowers the value when time is left in the budget. This heuristic appears to be very effective in finding the optimal (highest utility) tours (Arentze and Timmermans 2009; Arentze et al. 2010).

Fig. 3 Schematic representation of the LATUS algorithm for selecting POIs [flowchart: while time is left, a POI is selected from the POI list given the user's needs and preferences; if a POI is selected, it is added to the program of the tour and the state is updated; otherwise the procedure stops]

The model estimation in this study provides the utility weights of the user profile. The intended contribution of the present study is to show how utility weights for this class of TRSs can be estimated and segmented. In the sections that follow, we describe the design of the choice experiment, the survey and the results of the latent-class model analysis.

4 Methodology

In this section we describe the methodology used in the present study. The core elements of the methodology are a stated choice experiment used to collect data about preferences of tourists on city trips and a latent-class model to estimate preference parameters. Before explaining these elements we will first discuss the underlying behavioral assumptions.

4.1 Behavioral assumptions

Although our point of departure is the c-Space TRS, our purpose is to derive a user model that is relevant more broadly for TRSs that are focused on recommending tours. Therefore, in this section we highlight the theoretical considerations that have led to the model specification used in c-Space.

4.1.1 Attributes of POIs

The proposed user model assumes that a tourist's preferences for selecting POIs in the context of a city tour depend on a number of attributes.
First, the general attraction value of the point of interest is relevant, that is, the extent to which the point of interest is special, worth a special trip, or even the primary reason to visit the city (e.g., Ashworth and Page 2011; Yeh and Cheng 2015). For example, many travel guide books use some kind of rating system to distinguish a top attraction from attractions of less importance (e.g., classification according to the Michelin stars: * of interest, ** worth a detour, *** worth the trip). Second, the extent to which the point of interest matches a person's personal interest in particular objects/themes is a consideration. For example, some people may be fascinated by cathedrals whereas others find them boring. Third, options may vary in terms of the extent to which the activity matches a current emotional or motivational state given the activities a person has already conducted on the same (city) trip (e.g., Lin et al. 2014; Ma et al. 2013). For example, if all activities conducted so far have been indoors, the person may prefer to conduct the next activity in the open air. Fourth, accessibility and cost considerations may play a role: options may differ in terms of the effort (e.g., travel time) it takes to travel to the location or the fee one needs to pay to visit a site (e.g., Armbrecht 2014; Lew and McKercher 2006; Wynen 2013).

4.1.2 Dynamic needs

The choice of an activity generally involves a trade-off between these considerations. By their nature, attraction value, personal interest, effort and costs are static attributes, as the evaluation of these attributes does not depend on a momentary state of the person. In contrast, the extent to which visiting the POI meets the current needs of the tourist is inherently time-dependent. Although mood (and emotion) is also a relevant dimension in this regard (e.g., Wang et al. 2012), we focus here on basic needs.
We adopt a classification of basic leisure needs that emerged in the empirical study by Nijland et al. (2010). Based on an analysis of motivations underlying leisure activities, the authors identified six need dimensions: new experiences/information; entertainment; relaxation; being in open air/green environment; physical exercise; and social contact. Individuals may differ in terms of how strongly these needs are felt or valued. Some may develop a need for entertainment more quickly while others may be more sensitive to new experiences, and so on. Such differences may be related to a personal trait (e.g., thrill seeking) (Schneider and Vogt 2012) but may also be affected by the nature of the primary activity (the job or occupation) of the person in daily life. For example, a person who has a hectic job in daily life may be inclined to seek relaxation in leisure activities instead of new experiences or socializing.

4.2 Design of the choice experiment

To estimate tourists' preference parameters regarding activity choice during a city trip, we use the technique of the stated choice experiment (also known as conjoint analysis) (e.g., Hensher et al. 2015). In this technique, individuals are presented with a choice task where they are asked to indicate their preference among a set of choice options (a choice set). The choice options are hypothetical and described in terms of a set of attributes. The attributes and the values each attribute can take are pre-defined as part of the experimental design. Across choice tasks, the attributes are varied based on a statistical design so that the separate (utility) effects of the attributes can be identified through statistical analysis of the obtained choice data. In the experiment we constructed, respondents are asked to imagine the following hypothetical situation:

Imagine that you are going to make a city trip to a city you do not know yet. It is a safe, not too crowded and well accessible city.
You are traveling together with a person (e.g., partner, adult–child, friend, other family member) who has the same interests as you have. There is much to see and to do in the city that is worthwhile and for sure you will not have enough time if you would want to see and do all. Furthermore, it is good weather for visiting the city.

Next, choice tasks are presented to respondents where the context setting for the trip and the choice alternatives are varied simultaneously. The context setting for the trip is varied in terms of the following attributes:

– Total duration of the city trip (one afternoon, 1 day, 2 days).
– The time moment of doing the activity in the context of the trip (first activity, in-between activity, last activity).
– The size and nature of the current need (size: strong and very strong; nature: new experience, entertainment, relaxation, exercise, open air/green environment, socializing, no specific need).

An activity always involves visiting a particular POI. The manipulation of needs (the last item) is a key element of this experiment. To avoid needless complexity, it is assumed that a need exists on only one dimension at a time (combinations of needs are not considered). To include a null measurement, the absence of a need is included as a possible level as well; hence this variable has seven levels. The size of the need (if any) has two possible levels: strong and very strong. Literally, the need condition is formulated as:

At this moment you have [size] need for [dimension]

The choice alternatives are optional POIs; they are varied on the following attributes:

– Attraction value of the POI (one star, two stars, three stars).
– Extent to which the POI meets the person's interests (very low, average, very much).
– Extent to which the POI fulfils the person's current need (very low, average, very much).
– The costs of visiting the POI (free, 5 € pp, 10 € pp).
– Travel time to reach the POI from the person's current location (on the route, 10 min walking, 20 min walking).

As noted, the number of stars is a frequently used labeling system for attraction value in tourist guides and is therefore used here. We use separate designs for varying the contexts and the choice alternatives. For the context we use a design in nine profiles. The nine profiles are a fraction of a full factorial design of 3² × 2 profiles. The fraction of nine profiles allows estimation of all main effects independently of all first-order interaction effects between attributes. Next, we combine the nine profiles with the seven needs (including 'no specific need'), resulting in 63 different contexts. To design the activity alternatives, we use a design in 27 profiles. The 27 profiles are a fraction of a full-factorial design consisting of 3⁵ profiles. Just as in the case of the design for contexts, this fraction allows the estimation of main effects of attributes independently of all first-order interaction effects. Each respondent is presented with nine choice tasks, generated by randomly selecting nine context profiles; for each context, a choice set of three randomly selected POI profiles is presented. Given the specific context setting, the respondent is asked which POI he/she would prefer, or to select the base alternative, which is taking a break (not doing any specific activity at the moment).

4.3 Latent class model

A latent class model is used to segment the respondents regarding their city trip activity preferences (e.g., Swait 1994; Boxall and Adamowicz 2002; Greene and Hensher 2002). In the estimation, respondents are simultaneously grouped into segments (or latent classes) and separate parameters are estimated for each of these segments. In our study, we assume that individuals derive some utility from choosing a specific POI during their city trip.
This utility can vary between different POIs based on the attributes describing the context and the POI itself. For the usual multinomial logit model (MNL), the utility for individual i for POI j on choice occasion t can be written as:

$$U_{ijt} = \beta' X_{ijt} + \varepsilon_{ijt},$$

where $X_{ijt}$ expresses all attributes (defining context and POI) with relative weights (parameters $\beta$) to be estimated, and $\varepsilon_{ijt}$ is an error term representing unobserved heterogeneity in utilities. This equation assumes that the parameters are the same for all individuals. However, we assume that there exist S different homogeneous latent classes (segments) in the sample. Given that an individual belongs to latent class s (s = 1, …, S), the utility for individual i belonging to class s for activity j on choice occasion t is defined as:

$$U_{ijt|s} = \beta_s' X_{ijt} + \varepsilon_{ijt|s},$$

where $\beta_s$ is the parameter vector for latent class s. The probabilities of choice can be derived from the utility function, resulting in the latent class multinomial model (LCM). For each latent class, the probability that individual i chooses POI j at choice occasion t is:

$$P(y_{it} = j \mid \text{segment} = s) = \frac{\exp(\beta_s' X_{ijt})}{\sum_{k=1}^{J} \exp(\beta_s' X_{ikt})},$$

where J is the number of alternatives in the choice set. For each individual i, the probability of belonging to latent class s can be obtained by:

$$P(\text{segment} = s) = \frac{\exp(\theta_s' Z_i)}{\sum_{r=1}^{S} \exp(\theta_r' Z_i)},$$

where $Z_i$ is an optional set of observable characteristics invariant of the individual choice situation. If no such characteristics are included, the class-specific probabilities are a set of fixed constants that sum to one. Each individual is assigned to the latent class with the highest probability. The latent class parameters can be estimated using maximum likelihood estimation (see Greene 2001 for details). The likelihood ratio test statistic, $G^2 = -2[LL(0) - LL(B)]$, is used to test whether the estimated choice model, with log-likelihood LL(B), significantly improves on the null model, with log-likelihood LL(0). McFadden's rho-square, $\rho^2 = 1 - LL(B)/LL(0)$, indicates the goodness of fit of the estimated choice model.
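To make the formulas above concrete, the following Python sketch evaluates them with NumPy for a single individual. It is an illustration under stated assumptions, not the estimation procedure used in the study: the function names and array shapes are chosen for this example, and the posterior class probabilities are obtained with Bayes' rule from the observed choices, which is a common way to assign individuals to classes but is not spelled out in the text.

```python
import numpy as np


def softmax(v):
    """Logit transformation, shifted by the maximum for numerical stability."""
    e = np.exp(v - np.max(v))
    return e / e.sum()


def choice_probs(beta_s, X):
    """P(y = j | segment = s) for one choice set; X has shape (J, K), beta_s shape (K,)."""
    return softmax(X @ beta_s)


def class_probs(theta, z):
    """P(segment = s); theta has shape (S, M), z shape (M,)."""
    return softmax(theta @ z)


def individual_likelihood(betas, theta, z, choice_sets, choices):
    """Mixture likelihood of one individual's choices and posterior class probabilities."""
    prior = class_probs(theta, z)
    # Probability of the observed choice sequence, conditional on each class.
    seq = np.array([
        np.prod([choice_probs(b, X)[y] for X, y in zip(choice_sets, choices)])
        for b in betas
    ])
    joint = prior * seq
    return joint.sum(), joint / joint.sum()


def fit_statistics(ll_model, ll_null):
    """Likelihood ratio statistic G^2 and McFadden's rho-square."""
    g2 = -2.0 * (ll_null - ll_model)
    rho2 = 1.0 - ll_model / ll_null
    return g2, rho2
```

Summing the logarithm of the first return value of `individual_likelihood` over all respondents gives the sample log-likelihood that a maximum likelihood routine would maximize over the class-specific parameters.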
To select the optimal number of segments, the minimum Akaike Information Criterion, $AIC = -2[LL(B) - P]$, where P is the number of estimated parameters, is used (e.g., Kamakura and Russell 1989; Gupta and Chintagunta 1994).

5 Results

In this section we describe the data collection in terms of the survey and the sample, and the results of the estimation of the latent-class model.

5.1 Survey and sample

The choice experiment was implemented in an on-line questionnaire. Apart from the choice experiment, the questionnaire also includes questions to record relevant background variables of the persons. In addition to the usual socio-demographic variables (gender, age, household type, education level, income level, work status), this includes a rating of the felt importance of each of the six basic leisure needs for the benefits the person seeks in a city trip. For these judgements a seven-point rating scale is used. In addition, the nature of the occupation (job, if any) is queried. Respondents indicate the nature of their occupation based on a classification consisting of nine profession types. This set-up allows us to relate pursued needs in leisure time to characteristics of the work activity (job type).

Invitations to participate in the survey were sent to a random sample of a large existing national panel which should be representative of the Dutch population. Only respondents who had made at least one city trip in the last 2 years could proceed with the questionnaire. A city trip is defined as a visit to a city in leisure time with the aim to explore the city. A city trip lasts minimally 4 hours and does not include more than three nights. By this filter, we made sure that the relevant segment of the population was selected. In total 316 persons completed the survey. Table 1 shows the distribution of the sample for some key socio-demographic characteristics. The distributions are fairly representative of the (Dutch) population.
The last row shows the distribution of the respondents across the nine profession types distinguished. Administrative, Commercial and Specialist professions are the largest categories and have shares in the range of 16–20%. Crafts & industry, Transport, Services and Education are smaller, with shares ranging from 5–10%. Agricultural is very small, with a share of merely 0.6%.

Table 1 Sample characteristics

Variables         Levels                                     % in sample   % Dutch population*
Gender            Male                                       50.3          49.5
                  Female                                     49.7          50.5
Age               0 ≤ 24 years                               11.7          29.0
                  15 ≤ 44 years                              45.6          25.1
                  45 ≤ 64 years                              34.2          28.1
                  65+ years                                  8.5           17.8
Household type    Single                                     20.9          37.4
                  Couple                                     43.4          29.0
                  Family with children                       35.8          33.6
Education level   Low                                        12.7          31.3
                  Medium                                     44            38.7
                  High                                       43.4          28.5
Income level      Unknown                                    18.4          15.5
                  Low                                        15.5          47.2
                  Medium                                     47.8          37.3
                  High                                       18.4          37.3
Work status       Not                                        26.6          29.8
                  Part-time                                  27.5          23.9
                  Full-time                                  45.9          46.3
Work type         Crafts & industry/transport/agricultural   11.1          5.8
                  Administrative                             19.3          17.8
                  Commercial                                 14.6          15.8
                  Health services                            23.1          18.2
                  Services/education                         15.8          22.3
                  Specialist                                 16.1          20.1

*(CBS 2017)

5.2 The latent-class model

The model specification we use allows us to estimate main effects of all (three-level) attributes of POI choice alternatives, including attraction value, match with personal interests, match with current needs, the costs of the activity and travel time. Effect coding was used throughout, where the highest level is taken as the base. Effect coding means that each three-level variable is coded by two effect variables: the effect-one variable is coded as [1, 0, −1] and the effect-two variable as [0, 1, −1] for the [low, mid, high] level of the original attribute variable. Furthermore, the model enables the estimation of two-way interactions between all the context variables and all POI choice attributes.
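The effect coding described above can be written out as a short Python sketch. The level labels follow the text; the function names and the coefficient values in the usage note are hypothetical and serve only to show a general property of effect coding: the part-worths of the three levels sum to zero, so the utility of the base level equals the negative sum of the two estimated coefficients.

```python
import numpy as np

# Effect codes as stated in the text: effect-one = [1, 0, -1] and
# effect-two = [0, 1, -1] for the [low, mid, high] levels.
EFFECT_CODES = {"low": (1, 0), "mid": (0, 1), "high": (-1, -1)}


def effect_code(levels):
    """Map a sequence of level labels to an (n, 2) matrix of effect variables."""
    return np.array([EFFECT_CODES[level] for level in levels], dtype=float)


def part_worths(b1, b2):
    """Utility contribution of each level implied by the two estimated coefficients."""
    labels = ["low", "mid", "high"]
    values = effect_code(labels) @ np.array([b1, b2])
    return dict(zip(labels, values))
```

For example, with hypothetical coefficients `b1 = 0.4` and `b2 = 0.1`, `part_worths(0.4, 0.1)` gives 0.4 for the low level, 0.1 for the mid level and −0.5 for the base level.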
Of specific interest is the interaction between the nature of the current need (a context variable) and match with current need (a POI attribute). On that level, interaction effects indicate to what extent individuals differentiate between need dimensions. In pre-processing steps, the specification of the latent-class model (number of classes and selection of interactions) was optimized to arrive at a parsimonious model. A three-class model appeared to be optimal. See the Appendix for the details.

Table 2 shows the detailed estimation results for the three-segments model and base model (no segmentation) respectively. Estimation results of the base model represent average behavior across all segments. On this level, the results indicate that all attributes are strongly significant. The difference between utility values of the lowest and highest level indicates the relative importance of the attribute under concern in the choice of activity. Using that criterion, match with personal interests has the largest effect and, hence, is the most important attribute. Attraction value, match needs and activity costs have approximately equal values which are larger than the value of travel time and smaller than the value of match interests. A three-stars attraction is approximately equivalent to a cost of 10 €, suggesting that tourists are willing to pay 10 € pp for a top attraction. They are willing to pay around the same amount for attractions that match their current needs and they are willing to pay more for attractions that match their personal interests. Thus, the results confirm the idea that current needs play a significant role in the preference for an activity. Turning next to context-interaction effects, we see no significant effects of the nature of the current need on match need. This suggests that on average across segments tourists assign approximately an equal weight to needs. Furthermore, the size of the need does not have a significant interaction effect with the match-need attribute in the overall model. This is unexpected as one would expect a stronger impact when the need is very strong as opposed to just strong.

Table 2 Results of the latent-class model estimation (t-statistics in parentheses)

Parameter                    1-segment model    Segment 1         Segment 2         Segment 3
Constant                     1.606 (18.68)      0.949 (6.32)      1.099 (3.74)      2.155 (10.87)
Activity
Attraction value
  3 stars                    0.383 (11.21)      0.677 (8.46)      -0.188 (-1.16)    0.469 (8.09)
  2 stars                    0.088 (2.42)       0.094 (1.24)      0.116 (0.72)      0.112 (1.92)
  1 star                     -0.471             -0.771            0.072             -0.581
Match interest
  Very much                  0.621 (17.53)      1.493 (14.53)     0.736 (4.48)      0.180 (2.90)
  Average                    0.074 (2.08)       0.235 (3.24)      0.116 (0.81)      0.013 (0.23)
  Very low                   -0.695             -1.728            -0.852            -0.193
Match needs
  Very much                  0.454 (4.83)       1.391 (6.90)      0.280 (0.68)      0.054 (0.35)
  Average                    0.068 (0.79)       0.069 (0.46)      0.171 (0.56)      0.057 (0.40)
  Very low                   -0.522             -1.460            -0.451            -0.111
Activity costs
  No cost                    0.419 (11.24)      0.750 (8.77)      0.175 (8.36)      0.109 (1.70)
  5 € pp                     0.039 (1.05)       0.025 (0.32)      0.226 (1.47)      0.033 (0.55)
  10 € pp                    -0.458             -0.775            -0.401            -0.142
Travel time
  On the route               0.254 (7.01)       0.309 (4.38)      0.304 (1.94)      0.223 (3.64)
  10 min walking             0.036 (0.91)       0.136 (1.54)      0.132 (0.82)      0.079 (1.27)
  20 min walking             -0.290             -0.445            -0.436            -0.302
Context-activity
Need new experiences
  Match needs very much      0.130 (0.97)       0.042 (0.16)      -1.530 (-2.26)    0.409 (1.80)
  Match needs average        0.004 (0.03)       0.226 (0.91)      1.346 (2.38)      -0.447 (-1.95)
Need entertainment
  Match needs very much      0.211 (1.59)       0.552 (1.94)      -0.509 (-0.96)    0.147 (0.66)
  Match needs average        -0.062 (-0.51)     0.309 (1.22)      -0.422 (-0.95)    -0.124 (-0.58)
Need being outdoor*
  Match needs very much      0.085 (0.80)       0.066 (0.31)      -0.074 (-0.16)    0.110 (0.63)
  Match needs average        0.002 (0.02)       0.356 (1.93)      -0.137 (-0.35)    -0.146 (-0.88)
Current need very strong
  Match needs very much      0.045 (1.13)       -0.037 (-0.45)    0.005 (0.03)      0.187 (2.80)
  Match needs average        0.007 (0.18)       0.109 (1.43)      -0.201 (-1.18)    -0.065 (-0.99)
Segment probabilities                           0.483 (12.81)     0.131 (5.37)      0.387 (10.66)
*Being outdoor = need for relaxation, exercise and being in the open air-green environment

We next turn to the model with segmentation (Table 2). A first observation is that on the level of segments several interaction effects with the current need now are significant. Segments differ in terms of which need is considered most important. Furthermore, we see striking differences on the level of main effects of POI attributes and the constant (value of no activity). Considering the patterns of main and interaction effects the segments can be characterized as follows. Segment-1 individuals assign high values to all attributes: attraction value, match personal interest, match needs, activity costs and travel time.
Furthermore, these individuals consider entertainment as a particularly important need as well as being outdoors. However, when the POI matches this need to a large extent the utility of the POI becomes smaller. This is counterintuitive. A possible explanation is that this quality of the POI indicates a situation of a natural environment which they do not prefer in the context of a city trip.

Segment-2 individuals are more selective in terms of the attributes they take into account. For these persons only a match with personal interests is relevant; they are insensitive to attraction value. Apart from personal interests they care about costs and to a lesser extent also travel time. Especially, free entrance (no costs) has a big appealing effect on these tourists. A match with a current need is relevant for this group only if the need concerns new experiences. However, a strong match has a negative effect on the utility of the activity. An explanation might be that this group dislikes the type of POI that strongly addresses new experiences so that only POIs that moderately match the need are appealing to them.

Respondents belonging to segment 3 are also rather selective in terms of the attributes they consider important. They consider personal interest important, but to a much lesser extent than in the other segments. Typical for this segment are the high importance assigned to attraction value and the indifference to costs. They are sensitive to a match with current needs only when the match is strong as opposed to moderate. Furthermore, they assign an above average weight to new experiences. Lastly, this segment is characterized by a high value of the constant indicating that visiting a POI must meet high demands before it is preferred over doing nothing (having a break).

In sum, the three classes emerging from this analysis differ in various respects from each other.
The first class consists of tourists who seek to get the maximum experience out of available options for visiting POIs in a city trip; they evaluate POIs thoroughly on all aspects. Needs play a role but there is no differentiation with respect to the nature of the need. The second class consists of tourists who choose POIs primarily based on personal interests taking into account costs and effort. Match with a current need is not taken into account except when the need concerns new experiences. The third and last class consists of tourists who impose high demands on what a POI has to offer, paying attention to attraction value, personal interests and need for new experiences. This class is insensitive to costs. Furthermore, a strong match with a particular need is not always considered as positive. The likely explanation for this is that meeting a particular need may correlate with certain qualities of a POI that the person finds unattractive in the context of a city trip.

The classes are not equal in size. In the sample, the shares are 48.3% (segment 1), 13.1% (segment 2) and 38.7% (segment 3). Table 3 shows the composition of the segments in terms of some key personal background variables. The segments differ significantly on age, education level and work type (profession) at the 5% level, and on gender at the 10% level (p = 0.085).

5.3 Incorporating the results in a TRS

The model estimation results can be integrated in the user model of a TRS to take into account users’ preferences regarding travel and time-use as well as non-travel characteristics of POIs. The latent class estimation showed that considerable variation exists in how individuals trade off attributes of POIs. We emphasize that the classes that emerged from this analysis do not necessarily identify general groups of tourists. The differences in preferences may also be related to current circumstances or motivational states (e.g., mood). The classes found do give an indication of the range of variation.
In a TRS, this range can be taken into account by identifying the best fitting class for the trip under concern through a dialogue with the user at the moment of planning a trip. A possible way of doing this is to present short descriptions of the profiles to the user and ask him or her to indicate which description would best fit his or her own profile for the trip. Although none of the standard profiles may fit an individual perfectly, it is expected that the segmentation at least will improve the assessment of the true preferences.

Table 3 Relationships between socio-demographics and segment membership

Variables        Levels                               Segment 1 (%)  Segment 2 (%)  Segment 3 (%)  Total (%)  χ² (p value)
Gender           Male                                 47.7           39.0           57.4           50.3       4.939 (0.085)
                 Female                               52.3           61.0           42.6           49.7
Age              0–24 years                           13.7           14.6           8.2            11.7       12.716 (0.048)
                 15–44 years                          51.0           34.1           42.6           45.6
                 45–64 years                          31.4           39.0           36.1           34.2
                 65+ years                            3.9            12.2           13.1           8.5
Household type   Single                               23.5           17.1           18.9           20.9       7.633 (0.106)
                 Couple                               48.4           39.0           38.5           43.4
                 Family with children                 28.1           43.9           42.6           35.8
Education level  Low                                  6.5            17.1           18.9           12.7       12.307 (0.015)
                 Medium                               44.4           51.2           41.0           44.0
                 High                                 49.0           31.7           40.2           43.4
Income level     Unknown                              16.3           22.0           19.7           18.4       7.873 (0.248)
                 Low                                  19.6           9.8            12.3           15.5
                 Medium                               46.4           58.5           45.9           47.8
                 High                                 17.6           9.8            22.1           18.4
Work status      Not                                  25.5           34.1           25.4           26.6       4.741 (0.315)
                 Part-time                            30.1           31.7           23.0           27.5
                 Full-time                            44.4           34.1           51.6           45.9
Work type        Crafts/industry/transport/agricultural  4.6         22.0           15.6           11.1       32.244 (0.000)
                 Administrative                       26.1           9.8            13.9           19.3
                 Commercial                           10.5           17.1           18.9           14.6
                 Health services                      22.2           29.3           22.1           23.1
                 Services/education                   14.4           12.2           18.9           15.8
                 Specialist                           22.2           9.8            10.7           16.1

Such a multi-class model would be an advanced feature of a TRS. Even without segmentation, the integration of the preference estimates (i.e., the one-segment solution) already would involve a significant refinement of the user model compared to existing systems.
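As an illustration of how the one-segment estimates could enter a user model, the following Python sketch sums the main-effect part-worths of Table 2 for a POI and converts the utilities into choice probabilities. It is a sketch under stated assumptions, not the c-Space implementation: it assumes a multinomial logit form in which the constant is the utility of the null alternative (no activity), it ignores the context-interaction terms, and the dictionary keys and function names are hypothetical.

```python
import math

# Main-effect part-worths of the one-segment model (Table 2).
PART_WORTHS = {
    "attraction": {"3 stars": 0.383, "2 stars": 0.088, "1 star": -0.471},
    "interest": {"very much": 0.621, "average": 0.074, "very low": -0.695},
    "needs": {"very much": 0.454, "average": 0.068, "very low": -0.522},
    "cost": {"no cost": 0.419, "5 euro": 0.039, "10 euro": -0.458},
    "travel": {"on the route": 0.254, "10 min": 0.036, "20 min": -0.290},
}
# Assumption: the constant is read as the utility of doing nothing.
NULL_UTILITY = 1.606


def poi_utility(poi):
    """Sum the part-worths of the attribute levels that describe one POI."""
    return sum(PART_WORTHS[attr][level] for attr, level in poi.items())


def choice_probabilities(pois):
    """Logit probabilities for each POI, with the null alternative last."""
    utilities = [poi_utility(p) for p in pois] + [NULL_UTILITY]
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


best_poi = {
    "attraction": "3 stars", "interest": "very much", "needs": "very much",
    "cost": "no cost", "travel": "on the route",
}
average_poi = {
    "attraction": "2 stars", "interest": "average", "needs": "average",
    "cost": "5 euro", "travel": "10 min",
}
probs = choice_probabilities([best_poi, average_poi])
print(poi_utility(best_poi), poi_utility(average_poi), probs)
```

Under these assumptions only a POI that scores well on several attributes exceeds the utility of the null alternative, which is in line with the interpretation of the constant given above.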
To demonstrate this, the model estimation results were implemented in the c-Space recommender system. The single-segment solution was implemented, as the current version of c-Space does not include a method to assess the specific preference profile of a user on this level. The estimation results needed to be processed further before they could be used. For the discrete choice experiment the POI attributes were discretized; the stated choice experiment used three levels for each attribute. For continuous variables such as travel time and entrance costs, a TRS needs a continuous function. A continuous function was derived by interpolation and extrapolation of the point estimates.

For a first qualitative evaluation of the system, an application was developed for Trento, a popular city-trip destination for tourists in Italy. Thirty-five individuals were approached in a street survey and volunteered to use the system to plan and implement their trip. After having made the tour they filled out a small survey about their experiences. The responses confirmed the usefulness and added value of the system. Users reported that the content suggested for their trip was indeed of interest to them (83%) and that they were not able to find such content using other means (91%). Compared to other recommender systems, which typically recommend popular tours, the tours suggested were found to be more in line with their interests. The prototype and these test results provide evidence that refinement of the user model in the way proposed in this study is feasible and potentially can improve the quality of tour recommendation.

6 Conclusions and discussion

For recommending optimal personalized tours it is important to know how individuals trade off preferences for particular POIs against routing, costs and time-use characteristics.
In this study, we described the c-Space tour-recommender system and considered the empirical estimation of the utility weights tourists assign to these factors using a stated choice experiment and state-of-the-art choice analysis techniques (latent-class model). A random sample from a large national panel participated in the survey. The analysis revealed the influence of the motivational state of a tourist on preferences for activities. It also revealed that the way trade-offs are made and the response to current needs differ significantly between individuals. The latent-class analysis indicated that three segments can be identified.

The results of this study can be used to improve user models that are currently used in travel recommender systems. Current models typically assume a process where the selection of POIs and determining a route along the locations of the POIs are performed in separate steps. The multi-attribute utility function estimated in the present study allows the TRS to take the travel and time-use implications of visiting particular POIs into account already in the step of selecting POIs. Thus, using this utility function the selection of POIs that maximizes a utility value on the level of a tour can be identified. As we demonstrated by an application of the c-Space TRS, the estimated values can be incorporated in a user profile together with information about personal interests (themes) and needs of a user. A first evaluation demonstrated the efficacy of the approach.

Several problems remain for future research. First, the current c-Space system does not support a process for adapting the user model to a user. Extending the system to handle a multi-class user model is an objective of further development. Second, the estimation results were based on a sample from the Dutch population. It would be interesting to replicate the study in other countries to see whether similar segments emerge.
Third, our study took into account only a limited number of potentially relevant contextual conditions. For advanced Context-Aware TRS (CARS) the set of conditional factors needs to be expanded in order to obtain more refined estimates of utility weights in specific cases. Fourth, tourists’ activities are often conducted by individuals in a group and preferences for selecting certain POIs and activities are the result of a group decision process. Our user model did not account for this social aspect. In order to derive a suitable user model for TRSs that do take group preferences into account (so-called Social TRS), the discrete choice analysis needs to be expanded in future research.

Acknowledgements The research leading to these results has received funding from the European Community’s Seventh Framework Program (FP7/2007-2013) under the Grant Agreement number 611040. The author is solely responsible for the information reported in this paper. It does not represent the opinion of the Community. The Community is not responsible for any use that might be made of the information contained in this paper. We furthermore would like to acknowledge Bruno Simoes of Graphitech for his support in the evaluation study.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix: optimization of the latent-class model specification

Before applying a latent class estimation, the specification of the base model was optimized considering parsimony. Potentially, there are many possible interaction variables that can be considered.
To arrive at a parsimonious model, the significance of all interaction variables was tested in a stepwise manner, starting with including all interaction variables in the model and next removing in a stepwise manner the interaction variables that are insignificant. Recall that context variables consist of duration of the city trip, time moment of the activity in the trip and size and nature of the current need. It appeared that none of the interactions concerning duration and time moment are significant and therefore these interaction variables were dropped from the final model. Given the purpose of the present study, all two-way interactions concerning the nature and size of the current need were kept in the final base model so that this factor could be included in the search for significant segments. The needs being in open air, relaxation, physical exercise and social contact were merged into a single category (labeled being outdoors) to increase the parsimony of the model further, as little differentiation between these needs emerged. Hence, in the final model nature of the need has three levels: New experience, Entertainment and Being outdoors.

Table 4 Statistics for the latent class models

Number of   Number of        Log likelihood at    Log likelihood evaluated   Rho square            Akaike information
segments    parameters (P)   convergence (LLB)    at 0 (LL0)                 (1 - LL(B)/LL(0))     criterion (AIC)
1           19               -2942.09             -3942.62                   0.254                 5922.2
2           39               -2804.89             -3942.62                   0.289                 5687.8
3           59               -2746.04             -3942.62                   0.304                 5610.1
4           79               -2708.57             -3942.62                   0.313                 5575.1
Sample size is 2844 choices from 316 respondents (N)

The latent class estimation was run for several settings of the number of classes to find the optimal number of segments. Table 4 shows goodness-of-fit statistics for the estimated models where the number of classes is varied from one to four classes. According to the AIC index, the 4-segments model is the best possible model on this data.
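The goodness-of-fit statistics of Table 4 can be recomputed from the log likelihoods and parameter counts with a few lines of Python. This is only a checking sketch; the variable and function names are hypothetical. It uses AIC = -2(LL(B) - P) and rho square = 1 - LL(B)/LL(0), and it reproduces the AIC column of the table to one decimal, while the recomputed rho-square values agree with the table to within about 0.001.

```python
# Log likelihood at zero and, per number of segments, (P, LL at convergence),
# all taken from Table 4.
LL0 = -3942.62
MODELS = {
    1: (19, -2942.09),
    2: (39, -2804.89),
    3: (59, -2746.04),
    4: (79, -2708.57),
}


def aic(ll, n_params):
    """Akaike Information Criterion in the form used in the paper."""
    return -2.0 * (ll - n_params)


def rho_square(ll, ll0):
    """Likelihood ratio index relative to the zero model."""
    return 1.0 - ll / ll0


stats = {
    k: (round(aic(ll, p), 1), round(rho_square(ll, LL0), 3))
    for k, (p, ll) in MODELS.items()
}
best = min(stats, key=lambda k: stats[k][0])  # segment count with minimum AIC

for k, (aic_value, rho) in stats.items():
    print(k, aic_value, rho)
print("minimum AIC at", best, "segments")
```

The drop in AIC is largest between one and two segments and becomes progressively smaller for each additional segment.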
It should be noted, however, that the improvement of the index going from a 3-segments to a 4-segments model is modest. In terms of interpretation of the estimation results, the 3-segments model appears to be more useful than the 4-segments model. In the latter model, the segmentation has become increasingly sensitive to differences regarding a somewhat trivial factor (namely, the constant representing the utility of the null alternative). For these reasons, we selected the 3-segments model as the best model for the analysis purpose.

References

Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17:734–749
Aksenov P, Kemperman A, Arentze T (2016) A personalized recommender system for tourists on city trips: concepts and implementation. International Conference on Smart Digital Futures, KES International, 15–17 June, Tenerife, Spain
Aksenov P, Kemperman ADAM, Arentze TA (2014) Toward personalized and dynamic cultural routing: a three-level approach. Procedia Environ Sci 22:257–269
Arentze TA (2015) LATUS: a dynamic model for leisure activity-travel utility simulation. Paper prepared for presentation at the 94th Transportation Research Board Annual Meeting, January 2015, Washington, D.C.
Arentze TA, Timmermans HJP (2009) A need-based model of multi-day, multi-person activity generation. Transp Res Part B Methodol 43(2):251–265
Arentze TA, Ettema D, Timmermans HJP (2010) Incorporating time and income constraints in dynamic agent-based models of activity generation and time use: approach and illustration. Transp Res C 18:71–83
Armbrecht J (2014) Use value of cultural experiences: a comparison of contingent valuation and travel cost. Tour Manag 42:141–148
Ashworth GJ, Page SJ (2011) Urban tourism research: recent progress and current paradoxes. Tour Manag 32(1):1–15
Borras J, Moreno A, Valls A (2014) Intelligent tourism recommender systems: a survey. Expert Syst Appl 41:7370–7389
Boxall PC, Adamowicz WL (2002) Understanding heterogeneous preferences in random utility models: a latent class approach. Environ Resour Econ 23:421–446
Braunhofer M, Ricci F (2017) Selective contextual information acquisition in travel recommender systems. Inform Technol Tour 17:5–29
Braunhofer M, Elahi M, Ricci F (2015) User personality and the new user problem in a context-aware point of interest recommender system. In: Tussyadiah I, Inversini A (eds) Information and communication technologies in tourism. Springer, Switzerland, pp 537–549
Buhalis D (1998) Strategic use of information technologies in the tourism industry. Tour Manag 19(5):409–421
Buhalis D, Law R (2008) Progress in information technology and tourism management: 20 years on and 10 years after the internet: the state of eTourism research. Tour Manag 29:609–623
CBS (2017) StatLine, electronic databank of Statistics Netherlands, http://statline.cbs.nl/statweb/?LA=en. Accessed 30 Mar 2017
Fonte FAM, López MR, Burguillo JC, Peleteiro A, Martínez AB (2013) A tagging recommender service for mobile terminals. In: Cantoni L, Xiang Z (eds) Information and communication technologies in tourism. Springer-Verlag, Berlin, pp 424–435
Gavalas D, Konstantopoulos C, Mastakas K, Pantziou G (2014) Mobile recommender systems in tourism. J Netw Comput Appl 39:319–333
Greene WH (2001) Fixed and random effects in nonlinear models. Working Paper EC-01-01, Stern School of Business, Department of Economics
Greene WH, Hensher DA (2002) A latent class model for discrete choice analysis: contrast with mixed logit. Working Paper ITS-WP-02-08, Institute of Transport Studies, The University of Sydney, Australia
Gretzel U, Mitsche N, Hwang YH, Fesenmaier DR (2004) Tell me who you are and I will tell you where to go: use of travel personalities in destination recommendation systems. Inform Technol Tour 7:3–12
Gupta S, Chintagunta PK (1994) On using demographic variables to determine segment membership in logit mixture models. J Mark Res 31:128–136
Hanani U, Shapira B, Shoval P (2001) Information filtering: overview of issues, research and systems. User Model User-Adapt Interact 11:203–259
Hensher DA, Rose JM, Greene WH (2015) Applied choice analysis, 2nd edn. Cambridge University Press, Cambridge, UK (ISBN: 9781107465923)
Kamakura W, Russell G (1989) A probabilistic choice model for market segmentation and elasticity structure. J Mark Res 26:379–390
Kerkman K, Arentze T, Borgers A, Kemperman A (2012) Car drivers compliance with route advice and willingness to choose socially desirable routes. Transport Res Rec 1:102–109
Lew A, McKercher B (2006) Modeling tourist movements. A local destination analysis. Ann Tour Res 33(2):403–423
Lin Y, Kerstetter D, Nawijn J, Mitas O (2014) Changes in emotions and their interactions with personality in a vacation context. Tour Manag 40:416–424
Linaza MT, Agirregoikoa A, Garcia A, Torres JI, Aranburu K (2011) Image-based travel recommender system for small tourist destinations. In: Law R et al (eds) Information and communication technologies in tourism. Springer-Verlag, Wien, pp 1–11
Ma J, Gao J, Scott N, Ding P (2013) Customer delight from theme park experiences. The antecedents of delight based cognitive appraisal theory. Ann Tour Res 42:359–381
Mackay K, Vogt C (2012) Information technology in everyday and vacation contexts. Ann Tour Res 39(3):1380–1401
Neidhardt J, Seyfang L, Schuster R, Werthner H (2015) A picture-based approach to recommender systems. Inform Technol Tour 15:49–69
Nijland L, Arentze T, Timmermans H (2010) Eliciting the needs that underlie activity-travel patterns and their covariance structure: results of multimethod analyses. J Transp Res Rec 2157:54–62
Schneider OP, Vogt CA (2012) Applying the 3M model of personality and motivation to adventure travelers. J Travel Res 51:704–716
Simoes B, Aksenov P, Santos P, Arentze T (2015) C-space: fostering new creative paradigms based on recording and sharing “casual” videos through the internet. Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference
Steen Jacobsen JK, Munar AM (2012) Tourist information search and destination choice in a digital age. Tour Manag Perspect 1(1):39–47
Swait J (1994) A structural equation model of latent segmentation and product choice for cross-sectional revealed preference data. J Retail Consum Serv 1(2):77–89
Wang D, Park S, Fesenmaier DR (2012) The role of smartphones in mediating the touristic experience. J Travel Res 51(4):371–387
Wörndl W, Hefele A (2016) Generating paths through discovered places-of-interests for city trip planning. In: Inversini A, Schegg R (eds) Information and communication technologies in tourism. Springer, Heidelberg, pp 441–453
Wynen J (2013) Explaining travel distance during same-day visits. Tour Manag 36:133–140
Yang WS, Hwang SY (2013) iTravel: a recommender system in mobile peer-to-peer environment. J Syst Softw 86:12–20
Yeh DY, Cheng CH (2015) Recommendation system for popular tourist attractions in Taiwan using Delphi panel and repertory grid techniques. Tour Manag 46:164–176

Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s)
Subject
Business and Management; IT in Business
ISSN
1098-3058
eISSN
1943-4294
D.O.I.
10.1007/s40558-018-0105-z
1 Introduction

With the advancement of information and communication technologies (ICT) the development and use of recommender systems that can offer tourists personalized advice and recommendation on which activities to conduct at a destination has received increasing attention (e.g., Buhalis 1998; Buhalis and Law 2008; Mackay and Vogt 2012; Steen Jacobsen and Munar 2012). A typical user of a travel recommender system is a tourist who is interested in exploring a city and wants to make a tour around it (e.g., Yang and Hwang 2013; Borras et al. 2014). Such a tour comprises a scheduled list of attractions (museums, heritage sites, shops, parks or other points of specific interest) as well as the trips needed to travel from one point to the other (e.g., Gretzel et al. 2004; Gavalas et al. 2014). Travel Recommender Systems (TRSs) help to overcome the information overload tourists may experience when they search for options, by providing users selected items that match their personal preferences (Braunhofer et al. 2015). For this, a critical element of TRSs is the ability to acquire the relevant information about preferences and needs of the user and identify the POIs that match his or her interests.

A number of alternative methods have been proposed to tackle this problem. These can be classified as collaborative filtering (matching a user to other users that have similar interests and preferences), content-based filtering (matching based on attributes of POIs) and knowledge-based methods (e.g., case-based reasoning). An overview of techniques in this area can be found in Hanani et al. (2001) and Adomavicius and Tuzhilin (2005). Across these approaches users’ preferences to be predicted are often formulated as ratings assigned to items (POIs) that reflect how much one likes the product or service.
For determining an optimal tour, however, users have to trade off their interests in certain POIs against other considerations such as travel costs (time and effort it takes to reach the location), fee or entrance costs, and preferred allocation of time across activities. Furthermore, individuals’ preferences may depend on needs that change depending on previous activities. Such dynamic needs give rise to saturation effects and variety seeking (Arentze and Timmermans 2009). If multiple activities have to be combined on a trip, the way a user makes trade-offs between these considerations determines overall preferences for selections of POIs. Thus, in the context of tour planning, the selection of POIs is a multi-criteria decision problem. Hereby, individual travelers may differ in the weights they assign to these components in determining their preference.

Although the multi-criteria nature of preferences for tours is widely acknowledged in advanced trip planners for ordinary travel (Kerkman et al. 2012), it has received limited attention in user models of TRSs. In the present study, we present a method to estimate tourists’ preferences taking into account the various factors involved in city trip planning. In this method, the preference value or utility for including a certain POI in a tour is modeled as a function of attributes of POIs. The utility function is estimated using a stated choice experiment administered in a survey. The estimated utility function defines a user model that allows a TRS to compose an optimal tour given personal information about specific interests of an individual user. We design a stated choice experiment that allows the estimation of the relevant parameters and present the results of an application involving a large sample of individuals from a national on-line panel. Individual tourists may differ in terms of the way they make the trade-offs.
To account for heterogeneity among individuals and identify the extent to which preferences may differ, we estimate a latent-class model.

The method we propose is developed in the context of the c-Space TRS for city trips (Aksenov et al. 2014, 2016). A special characteristic of the c-Space system is that it takes dynamic needs into account by using an advanced algorithm to find personalized optimal tours called LATUS (Arentze 2015). In the context of the c-Space system, the estimates are used to define an initial user profile that can be adapted if more information about a user’s preferences becomes available. The recommender system and the LATUS algorithm have been described in earlier work as referenced above. In this study, we briefly explain the system and present the proposed method to estimate user preference profiles. The results of this study also provide substantive insights into tourists’ preferences for visiting POIs in city tours.

The rest of the paper is structured as follows. First, in the next section we will review the existing approaches in the field of TRS with respect to user modeling. Then, in Sect. 3, we briefly describe the c-Space system and LATUS algorithm to offer a system concept for the user model. Then, in Sect. 4, we describe the stated choice experiment and survey method. In Sect. 5, we present the results of the survey and estimation of the latent-class model. Finally, we conclude the paper with a discussion of major conclusions and directions for future research.

2 Related work

The core component of TRSs is a (filtering) algorithm to select from an exhaustive database the POIs that match a user’s preferences. Collaborative filtering is a widely used technique in TRSs. In this technique, personal background or history information about a user is used to identify users with similar characteristics of whom the preferences are known. Preferences are typically represented in the form of ratings assigned to POIs.
The average rating assigned by similar previous users is used as a best estimate of the preferences of the user the system is interacting with. The definition of similarity is a critical component in this process. If ratings of the user are already known from previous interactions with the system, similarity can be measured based on matching ratings. If such history information is not available, similarity may be defined based on known demographic data of users such as age, gender and education. An alternative to collaborative filtering is content-based filtering. In a content-based approach, items are recommended that have the same attributes as the items that the user has liked before (Neidhardt et al. 2015).

A generally acknowledged problem with the filtering methods is the so-called cold-start problem. This problem occurs when requests come from new users who have not yet submitted any ratings or concern new items which have not been evaluated before (the first-rater problem) (Fonte et al. 2013). Knowledge-based systems have been proposed where preferences are derived based on reasoning about user requirements that go beyond a simple matching of ratings. A well-known example of a knowledge-based technique is case-based reasoning (Fonte et al. 2013).

The new user problem has also received attention in so-called Context Aware Recommender Systems (CARS). These systems emphasize that users’ preferences are dependent on contextual conditions and, hence, that recommendations should be context dependent. In tourism choice, weather conditions (sunny or rainy, etc.), travel party (alone or traveling with others) and travel mode (e.g., transport mode) are influential contextual conditions. Braunhofer and Ricci (2017) report the results of a survey conducted to identify important context factors and estimate the influence of these factors on rating predictions in the context of TRSs.
Also, the role of emotion and personality traits has received attention as context factors in CARS. In a survey conducted to elicit tourists’ preferences, Neidhardt et al. (2015) use a picture-based approach to address preferences on an emotional level. Braunhofer et al. (2015) show that personality traits of the Big-5 model provide useful information for generating context-aware recommendations. They argue that personality trait data are relatively easy to collect and especially useful for ranking the recommendations in case of new users.

TRSs have gone further than recommending POIs in isolation. Recommendation of complete packages is relevant for tourists who want to plan a tour combining visits to several POIs on the same trip, e.g., a day-tour in a city. Many systems have considered this extended problem of recommending routes (for a review see Wörndl and Hefele 2016). In planning a route, preferences related to interests in POIs need to be combined with other characteristics of POIs such as estimated visit times, travel distance and costs (fee or entrance). As Wörndl and Hefele (2016) state: “the process of generating a path from a start to an end point with interesting POIs along the way can be split up into two subtasks. First, potential candidate places have to be determined and scored, and then a path finding algorithm need to generate the best route consisting of a subset of these places.” An example is the image-based system MoreTourism (Linaza et al. 2011). This system first elicits a user’s preferences and next recommends the POIs that have the highest utility and an optimal route taking into account estimated visit times, opening and closing times, and costs.

In this study, we consider TRSs that have the objective to recommend complete tours. Finding an optimal tour requires that POI rating scores are traded off against travel time, entrance costs and time-use characteristics of POIs.
The purpose of the present study is to empirically assess the way individuals make these trade-offs. We model individuals’ preferences for POIs in the context of a tour as a multi-attribute utility function and estimate the utility weights in the framework of a discrete choice model. The influence of context conditions is taken into account to allow context-aware recommendation. Stated preference data from a representative sample of individuals are collected in an on-line survey. Using a latent-class model, the estimation of preference parameters and the clustering of individuals regarding the preferences they display are performed simultaneously. Thus, the estimation results also indicate the extent to which preferences differ between individuals. In the next section, we will first briefly introduce the c-Space TRS.

3 The c-Space system

To formulate the multi-attribute utility function, the c-Space TRS (Aksenov et al. 2016) is the point of departure. c-Space generates personalized tours taking into account a user’s personal thematic interests in particular POIs (architecture, cathedrals, museums, etc.) as well as the weights he or she assigns to a set of basic leisure needs (relaxation, entertainment, new experiences, socializing, etc.). c-Space has been developed as a smartphone application wherein the recommendation functionality is integrated as a REST service (Simoes et al. 2015). Thematic interests and needs as well as time budget and travel constraints are retrieved in a dialogue with the user on the smartphone. Figure 1 shows an example of a dialogue. The resulting user profile is input to the LATUS algorithm together with utility weights of attributes of POIs. The recommended tour, including travel plans to reach the various locations, is displayed on a map of the city (Fig. 2).

In c-Space, location and attribute data about the available POIs in the city of interest are stored in a database.
The attribute data stored include general information, such as opening hours and ticket costs (entrance fee), as well as information specifically collected for the c-Space system. The specific information includes the recommended duration of a visit to the POI (in hour units), attraction value (popularity) and theme (subject). The specific information is provided by experts from the local tourist agency. A special part of the POI data consists of parameters, one for each need, that indicate the extent to which visiting the POI matches needs on a zero–one scale (zero indicating no match and one a complete match). These parameters are determined based on rule-based knowledge of the types of activities involved in a POI (e.g., a museum can satisfy a need for new experiences to a large extent and a need for physical exercise to a small extent; a botanic garden matches a need to be outdoors to a large extent and a need for entertainment to a small extent, etc.). The degree of match also determines the size of the impact the POI has on the need (e.g., a museum reduces the need for new experiences to a large extent).

Fig. 1 Example of a c-Space user dialogue for setting weights of needs

Fig. 2 Example of a c-Space display of a route plan

The POI database and personal profile of the user provide the information for determining utility scores of POIs. The utility score of a POI is determined as a function of the match of the POI with the interests of the person, the attraction value of the POI, the match of the POI with current needs, the travel required (geographical distance) and the monetary costs involved in visiting the POI. A POI matches the interests if the theme of the POI corresponds with a theme the user has indicated to be of interest to him or her. Match with current needs is determined based on the POI-need-matching parameters, the weights the user assigns to the different needs and an assessment of the current size of each need.
The sum across weighted need impacts determines the utility score regarding the match with needs (see Arentze 2015 for details).

Due to the dynamic nature of needs, the utility function is dynamic. That is to say, the utility of a POI is dependent on other POIs included in the tour due to the impacts activities have on dynamic needs of the traveler (e.g., a museum will be less attractive if current POIs in the program have already reduced the need for new experiences). Therefore, each time a POI is added to the evolving program, the state of the needs is updated before a next POI is considered. The LATUS algorithm is designed to determine the optimal selection of POIs taking into account these interactions between POIs on the overall utility. LATUS starts with an empty program and successively adds POIs selected from a list of optional POIs until the time budget is fully used or no utility can be added anymore. The problem of finding the optimal tour is split in two parts: (1) determining the program by selecting POIs and (2) determining the sequence in which the POIs are visited in the tour and the travel routes between POIs. The optimum sequence is defined as the sequence that minimizes the overall travel costs and is found by means of a heuristic method.

To find the selection that maximizes the utility of the tour, LATUS uses a heuristic method. The method is schematically shown in Fig. 3. In this method, the best POI to add is identified as the POI that meets a time-use requirement and maximizes the added utility. The time-use requirement is defined as a threshold level of the utility per unit time taking into account the time to reach the location and the (normal) visiting duration. The threshold level is a parameter set by the system that should reflect the time budget. The more time available, the lower the threshold can be set and, vice versa, the tighter the budget, the higher the threshold needs to be.
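The selection step described above can be illustrated with a minimal sketch. It is not the LATUS implementation: the class `POI`, the functions `need_utility` and `select_program`, the multiplicative need-update rule, and the use of a fixed travel time per POI (instead of a travel time that depends on the current location) are all assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class POI:
    name: str
    base_utility: float   # static part (interests, attraction value, costs); hypothetical
    duration_h: float     # recommended visit duration in hours
    travel_h: float       # time to reach the POI in hours (fixed here for simplicity)
    need_match: dict      # need name -> degree of match on a zero-one scale


def need_utility(poi, need_state, need_weights):
    """Sum across weighted need impacts for one POI."""
    return sum(need_weights.get(n, 0.0) * need_state.get(n, 0.0) * m
               for n, m in poi.need_match.items())


def select_program(pois, need_state, need_weights, time_budget_h, threshold):
    """Greedily add the POI with the highest added utility that meets the
    utility-per-unit-time threshold, updating needs after each addition."""
    needs = dict(need_state)          # work on a copy of the need state
    candidates = list(pois)
    program, time_left = [], time_budget_h
    while candidates:
        best, best_u = None, 0.0      # a POI must add positive utility
        for poi in candidates:
            t = poi.travel_h + poi.duration_h
            if t <= 0 or t > time_left:
                continue
            u = poi.base_utility + need_utility(poi, needs, need_weights)
            if u / t < threshold:     # time-use requirement
                continue
            if u > best_u:
                best, best_u = poi, u
        if best is None:
            break
        program.append(best.name)
        time_left -= best.travel_h + best.duration_h
        candidates.remove(best)
        # Assumed update rule: a need shrinks in proportion to the match.
        for n, m in best.need_match.items():
            needs[n] = needs.get(n, 0.0) * (1.0 - m)
    return program
```

With two museums that both strongly match a need for new experiences and one park that matches a need for open air, this sketch picks one museum and then the park, because the first museum visit reduces the remaining need for new experiences.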
Since the proper level of the threshold cannot be computed analytically, LATUS uses a trial-and-error method to find the proper threshold level in a pre-processing step. Starting with a best-guess initial value, it increases the threshold when the resulting selection exceeds the budget and lowers the value when time is left in the budget. This heuristic appears to be very powerful in finding the optimal (highest utility) tours (Arentze and Timmermans 2009; Arentze et al. 2010).

[Fig. 3 Schematic representation of the LATUS algorithm for selecting POIs. Flowchart elements: Has time left?; Select POI (inputs: user’s needs and preferences, POI list); POI selected?; Add to program of tour; Update; Stop]

The model estimation in this study provides the utility weights of the user profile. The intended contribution of the present study is to show how utility weights for this class of TRSs can be estimated and segmented. In the sections that follow, we describe the design of the choice experiment, the survey and the results of the latent-class-model analysis.

4 Methodology

In this section we describe the methodology used in the present study. The core elements of the methodology are a stated choice experiment used to collect data about preferences of tourists on city trips and a latent-class model to estimate preference parameters. Before explaining these elements we will first discuss the underlying behavioral assumptions.

4.1 Behavioral assumptions

Although our point of departure is the c-Space TRS, our purpose is to derive a user model that is relevant more broadly for TRSs that are focused on recommending tours. Therefore, in this section we highlight the theoretical considerations that have led to the model specification used in c-Space.

4.1.1 Attributes of POIs

The proposed user model assumes that a tourist’s preferences for selecting POIs in the context of a city tour depend on a number of attributes.
First, the general attraction value of the point of interest is relevant, that is, the extent to which the point of interest is special, worth a special trip, or even the primary reason to visit the city (e.g., Ashworth and Page 2011; Yeh and Cheng 2015). For example, in many travel guide books some kind of rating system is used to distinguish a top attraction from attractions of less importance (e.g., classification according to the Michelin stars: * of interest, ** worth a detour, *** worth the trip). Second, the extent to which the point of interest matches a person’s personal interest in particular objects/themes is a consideration. For example, some people may be fascinated by cathedrals whereas others find them boring. Third, options may vary in terms of the extent to which the activity matches a current emotional or motivational state given the activities a person has already conducted on the same (city) trip (e.g., Lin et al. 2014; Ma et al. 2013). For example, if all activities conducted so far have been indoors, the person may prefer to conduct the next activity in the open air. Fourth, accessibility and cost considerations may play a role: options may differ in terms of the effort (e.g., travel time) it takes to travel to the location or the fee one needs to pay to visit a site (e.g., Armbrecht 2014; Lew and McKercher 2006; Wynen 2013).

4.1.2 Dynamic needs

The choice of an activity generally involves a trade-off between these considerations. By their nature, attraction value, personal interest, effort and costs are static attributes, as the evaluation of these attributes does not depend on a momentary state of the person. In contrast, the extent to which visiting the POI meets the current needs of the tourist is inherently time-dependent. Although mood (and emotion) is also a relevant dimension in this regard (e.g., Wang et al. 2012), we focus here on basic needs.
We adopt a classification of basic leisure needs that emerged in the empirical study by Nijland et al. (2010). Based on an analysis of motivations underlying leisure activities, the authors identified six need dimensions: new experiences/information; entertainment; relaxation; being in open air/green environment; physical exercise; and social contact. Individuals may differ in terms of how strongly these needs are felt or valued. Some may develop a need for entertainment more quickly while others may be more sensitive to new experiences, and so on. Such differences may be related to a personal trait (e.g., thrill seeking) (Schneider and Vogt 2012) but may also be affected by the nature of the primary activity (the job or occupation) of the person in daily life. For example, a person who has a hectic job in daily life may be inclined to seek relaxation in leisure activities instead of new experiences or socializing.

4.2 Design of the choice experiment

To estimate tourists’ preference parameters regarding activity choice during a city trip, we use the technique of stated choice experiment (also known as conjoint analysis) (e.g., Hensher et al. 2015). In this technique, individuals are presented with a choice task where they are asked to indicate their preference among a set of choice options (a choice set). The choice options are hypothetical and described in terms of a set of attributes. The attributes and the values each attribute can take are pre-defined as part of the experimental design. Across choice tasks, the attributes are varied based on a statistical design so that the separate (utility) effects of the attributes can be identified through statistical analysis of the obtained choice data. In the experiment we constructed, respondents are asked to imagine the following hypothetical situation:

Imagine that you are going to make a city trip to a city you do not know yet. It is a safe, not too crowded and well accessible city.
You are traveling together with a person (e.g., partner, adult child, friend, other family member) who has the same interests as you have. There is much to see and to do in the city that is worthwhile and for sure you will not have enough time if you would want to see and do all. Furthermore, it is good weather for visiting the city.

Next, choice tasks are presented to respondents where the context setting for the trip and choice alternatives are varied simultaneously. The context setting for the trip is varied in terms of the following attributes:

– Total duration of the city trip (one afternoon, 1 day, 2 days).
– The time moment of doing the activity in the context of the trip (first activity, in-between activity, last activity).
– The size and nature of the current need (size: strong and very strong; nature: new experience, entertainment, relaxation, exercise, open air/green environment, socializing, no specific need).

An activity consistently involves visiting a particular POI. The manipulation of needs (the last item) is a key element of this experiment. To avoid needless complexity, it is assumed that a need exists on only one dimension at a time (combinations of needs are not considered). To include a null measurement, the absence of a need is included as a possible level as well; hence this variable has seven levels. The size of the need (if any) has two possible levels: strong and very strong. Literally, the need condition is formulated as:

At this moment you have [size] need for [dimension]

The choice alternatives are optional POIs; they are varied on the following attributes:

– Attraction value of the POI (one star, two stars, three stars).
– Extent to which the POI meets the person’s interests (very low, average, very much).
– Extent to which the POI fulfils the person’s current need (very low, average, very much).
– The costs of visiting the POI (free, 5 € pp, 10 € pp).
– Travel time to reach the POI from the person’s current location (on the route, 10 min walking, 20 min walking).

As noted, the number of stars is an often-used labeling system to indicate attraction value in tourist guides and, therefore, is used here.

We use separate designs for varying the contexts and choice alternatives. For the context we use a design of nine profiles. The nine profiles are a fraction of a full factorial design of 3² × 2 profiles. The fraction of nine profiles allows estimation of all main effects independently of all first-order interaction effects between attributes. Secondly, we combine the nine profiles with the seven needs (including ‘no specific need’), resulting in 63 different contexts. To design the activity alternatives, we use a design of 27 profiles. The 27 profiles are a fraction of a full-factorial design consisting of 3⁵ profiles. Just as in the case of the design for contexts, this fraction allows the estimation of main effects of attributes independently of all first-order interaction effects. Each respondent is presented with nine choice tasks, generated by randomly selecting nine context profiles; for each context, a choice set of three randomly selected POI profiles is presented. The respondent is asked, given the specific context setting, which POI he/she would prefer, or to select the base alternative, which is taking a break (not doing any specific activity at the moment).

4.3 Latent class model

A latent class model is used to segment the respondents regarding their city trip activity preferences (e.g., Swait 1994; Boxall and Adamowicz 2002; Greene and Hensher 2002). In the estimation, respondents are simultaneously grouped into segments (or latent classes) and separate parameters are estimated for each of these segments. In our study, we assume that individuals derive some utility from choosing a specific POI during their city trip.
This utility can vary between different POIs based on the attributes describing the context and the POI itself. For the usual multinomial logit model (MNL), the utility for individual i for POI j on choice occasion t can be written as: U ¼ b X þ e ; ijt ijt ijt where X expresses all attributes (defining context and POI) with relative weights ijt (parameters b ) to be estimated. e is an error term representing unobserved ijt heterogeneity in utilities. This equation assumes that the parameters are the same for all individuals. However, we assume that there exist S different homogeneous latent classes (segments) in the sample. Given that an individual belongs to latent class s (s = 1, …, S), the utility for individual i belonging to class s for activity j on choice occasion t is defined as: U ¼ b X þ e ; ijt ijt ijt where b is a parameter vector for each latent class s. The probabilities of choice can be derived from the utility function, resulting in the latent class multinomial model (LCM). For each latent class, the probability that individual i chooses POI j at choice occasion t is: expðb X Þ ijt PyðÞ ¼ jjsegment ¼ s¼ : it 0 expðb X Þ ijt j¼1 s For each individual i the probability of belonging to latent class s can be obtained by: expðh Z Þ Pðsegment ¼ sÞ¼ ; expðh Z Þ s¼1 s where Z is an optional set of observable characteristics invariant of the individual choice situation. If no such characteristics are included, the class specific proba- bilities are a set of fixed constants that sum to one. Each individual is assigned to the latent class with the highest probability. The latent class parameters can be estimated using maximum likelihood estimation (see Greene 2001 for details). The likelihood ratio test statistic [G2 =- 2(LL(0) - LL(B))] is used to test whether the estimated choice model LL(B) significantly improves the null model LL(0). McFadden’s Rho square (q = 1 - LL(B)/LL(0)) indicates the goodness of fit of the estimated choice model. 123 72 T. 
To select the optimal number of segments, the minimum Akaike Information Criterion [AIC = -2(LL(B) - P), where P is the number of parameters] is used (e.g., Kamakura and Russell 1989; Gupta and Chintagunta 1994).

5 Results

In this section we describe the data collection in terms of the survey and the sample, and the results of the estimation of the latent-class model.

5.1 Survey and sample

The choice experiment was implemented in an on-line questionnaire. Apart from the choice experiment, the questionnaire also includes questions to record relevant background variables of the persons. In addition to the usual socio-demographic variables (gender, age, household type, education level, income level, work status), this includes a rating of the felt importance of each of the six basic leisure needs for the benefits the person seeks in a city trip. For these judgements a seven-point rating scale is used. In addition, the nature of the occupation (job, if any) is queried. Respondents indicate the nature of their occupation based on a classification consisting of nine profession types. This set-up allows us to relate pursued needs in leisure time to characteristics of the work activity (job type).

Invitations to participate in the survey were sent to a random sample of a large existing national panel which should be representative of the Dutch population. Only respondents who had made at least one city trip in the last 2 years could proceed with the questionnaire. A city trip is defined as a visit to a city in leisure time with the aim to explore the city. A city trip lasts at least 4 hours and does not include more than three nights. By this filter, we made sure that the relevant segment of the population was selected.

In total 316 persons completed the survey. Table 1 shows the distribution of the sample for some key socio-demographic characteristics. The distributions are fairly representative of the (Dutch) population.
The last row shows the distribution of the respondents across the nine profession types distinguished. Administrative, Commercial and Specialists professions are the largest categories and have shares in the range of 16–20%. Crafts & industry, Transport, Services and Education are smaller with shares ranging from 5–10%. Agricultural is very small with a share of merely 0.6%.

Table 1 Sample characteristics

Variables        Levels                                    % in sample   % Dutch population*
Gender           Male                                      50.3          49.5
                 Female                                    49.7          50.5
Age              0 ≤ 24 years                              11.7          29.0
                 15 ≤ 44 years                             45.6          25.1
                 45 ≤ 64 years                             34.2          28.1
                 65+ years                                 8.5           17.8
Household type   Single                                    20.9          37.4
                 Couple                                    43.4          29.0
                 Family with children                      35.8          33.6
Education level  Low                                       12.7          31.3
                 Medium                                    44.0          38.7
                 High                                      43.4          28.5
Income level     Unknown                                   18.4          15.5
                 Low                                       15.5          47.2
                 Medium                                    47.8          37.3
                 High                                      18.4          37.3
Work status      Not                                       26.6          29.8
                 Part-time                                 27.5          23.9
                 Full-time                                 45.9          46.3
Work type        Crafts & industry/transport/agricultural  11.1          5.8
                 Administrative                            19.3          17.8
                 Commercial                                14.6          15.8
                 Health services                           23.1          18.2
                 Services/education                        15.8          22.3
                 Specialist                                16.1          20.1

*(CBS 2017)

5.2 The latent-class model

The model specification we use allows us to estimate main effects of all (three-level) attributes of POI choice alternatives, including attraction value, match with personal interests, match with current needs, the costs of the activity and travel time. Consistently, effect coding was used where the highest level is taken as the base. Effect coding means that each three-level variable is coded by two effect-variables: the effect-one variable is coded as [1, 0, -1] and the effect-two variable as [0, 1, -1] for the [low, mid, high] level of the original attribute variable. Furthermore, the model enables the estimation of two-way interactions between all the context variables and all POI choice attributes.
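The effect coding scheme can be written out in a few lines. This is only an illustrative sketch with hypothetical function names; level indices 0, 1 and 2 stand for the first, second and base level of a three-level attribute. A consequence of this coding is that the part-worth of the base level equals minus the sum of the two estimated coefficients, so the three part-worths of an attribute sum to zero. For instance, the two attraction-value coefficients of the one-segment model in Table 2 (0.383 and 0.088) imply -0.471 for the remaining level, which is the value listed there without a t-statistic.

```python
def effect_code(level_index):
    """Return (effect_one, effect_two) for level index 0, 1 or 2 (2 = base level)."""
    return [(1, 0), (0, 1), (-1, -1)][level_index]


def level_utility(level_index, b1, b2):
    """Part-worth of a level given the two estimated effect coefficients."""
    e1, e2 = effect_code(level_index)
    return b1 * e1 + b2 * e2


# Part-worths for the three levels, using two coefficients reported in Table 2.
part_worths = [level_utility(k, 0.383, 0.088) for k in range(3)]
```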
Of specific interest is the interaction between the nature of the current need (a context variable) and match with current need (a POI attribute). On that level, interaction effects indicate to what extent individuals differentiate between need dimensions. In pre-processing steps, the specification of the latent-class model (number of classes and selection of interactions) was optimized to arrive at a parsimonious model. A three-class model appeared to be optimal. See the Appendix for the details.

Table 2 shows the detailed estimation results for the three-segment model and the base model (no segmentation), respectively. Estimation results of the base model represent average behavior across all segments. On this level, the results indicate that all attributes are strongly significant. The difference between utility values of the lowest and highest level indicates the relative importance of the attribute concerned in the choice of activity. Using that criterion, match with personal interests has the largest effect and, hence, is the most important attribute. Attraction value, match needs and activity costs have approximately equal values which are larger than the value of travel time and smaller than the value of match interests. A three-star attraction is approximately equivalent to 10 € costs, suggesting that tourists are willing to pay 10 € pp for a top attraction. They are willing to pay around the same amount for attractions that match their current needs and they are willing to pay more for attractions that match their personal interests. Thus, the results confirm the idea that current needs play a significant role in the preference for an activity. Turning next to context-interaction effects, we see no significant effects of the nature of the current need on match need. This suggests that, on average across segments, tourists assign approximately an equal weight to needs. Furthermore, the size of the need does not have a significant interaction effect with the match-need attribute in the overall model. This is unexpected, as one would expect a stronger impact when the need is very strong as opposed to just strong.

Table 2 Results of the latent-class model estimation: parameter (t-statistic)

Parameter                    1-segment model   Segment 1        Segment 2         Segment 3
Constant                     1.606 (18.68)     0.949 (6.32)     1.099 (3.74)      2.155 (10.87)
Activity
 Attraction value
  3 stars                    0.383 (11.21)     0.677 (8.46)     -0.188 (-1.16)    0.469 (8.09)
  2 stars                    0.088 (2.42)      0.094 (1.24)     0.116 (0.72)      0.112 (1.92)
  1 star                     -0.471            -0.771           0.072             -0.581
 Match interest
  Very much                  0.621 (17.53)     1.493 (14.53)    0.736 (4.48)      0.180 (2.90)
  Average                    0.074 (2.08)      0.235 (3.24)     0.116 (0.81)      0.013 (0.23)
  Very low                   -0.695            -1.728           -0.852            -0.193
 Match needs
  Very much                  0.454 (4.83)      1.391 (6.90)     0.280 (0.68)      0.054 (0.35)
  Average                    0.068 (0.79)      0.069 (0.46)     0.171 (0.56)      0.057 (0.40)
  Very low                   -0.522            -1.460           -0.451            -0.111
 Activity costs
  No cost                    0.419 (11.24)     0.750 (8.77)     0.175 (8.36)      0.109 (1.70)
  5 € pp                     0.039 (1.05)      0.025 (0.32)     0.226 (1.47)      0.033 (0.55)
  10 € pp                    -0.458            -0.775           -0.401            -0.142
 Travel time
  On the route               0.254 (7.01)      0.309 (4.38)     0.304 (1.94)      0.223 (3.64)
  10 min walking             0.036 (0.91)      0.136 (1.54)     0.132 (0.82)      0.079 (1.27)
  20 min walking             -0.290            -0.445           -0.436            -0.302
Context-activity
 Need new experiences
  Match needs very much      0.130 (0.97)      0.042 (0.16)     -1.530 (-2.26)    0.409 (1.80)
  Match needs average        0.004 (0.03)      0.226 (0.91)     1.346 (2.38)      -0.447 (-1.95)
 Need entertainment
  Match needs very much      0.211 (1.59)      0.552 (1.94)     -0.509 (-0.96)    0.147 (0.66)
  Match needs average        -0.062 (-0.51)    0.309 (1.22)     -0.422 (-0.95)    -0.124 (-0.58)
 Need being outdoor*
  Match needs very much      0.085 (0.80)      0.066 (0.31)     -0.074 (-0.16)    0.110 (0.63)
  Match needs average        0.002 (0.02)      0.356 (1.93)     -0.137 (-0.35)    -0.146 (-0.88)
 Current need very strong
  Match needs very much      0.045 (1.13)      -0.037 (-0.45)   0.005 (0.03)      0.187 (2.80)
  Match needs average        0.007 (0.18)      0.109 (1.43)     -0.201 (-1.18)    -0.065 (-0.99)
Segment probabilities (t-statistic)            0.483 (12.81)    0.131 (5.37)      0.387 (10.66)

t-statistics are given in parentheses
*Being outdoor = need for relaxation, exercise and being in the open air/green environment

We next turn to the model with segmentation (Table 2). A first observation is that, on the level of segments, several interaction effects with the current need are now significant. Segments differ in terms of which need is considered most important. Furthermore, we see striking differences on the level of main effects of POI attributes and the constant (value of no activity). Considering the patterns of main and interaction effects, the segments can be characterized as follows.

Segment-1 individuals assign high values to all attributes: attraction value, match personal interest, match needs, activity costs and travel time.
Furthermore, these individuals consider entertainment as a particularly important need as well as being outdoors. However, when the POI matches this need to a large extent, the utility of the POI becomes smaller. This is counterintuitive. A possible explanation is that this quality of the POI indicates a situation of a natural environment, which they do not prefer in the context of a city trip.

Segment-2 individuals are more selective in terms of the attributes they take into account. For these persons only a match with personal interests is relevant; they are insensitive to attraction value. Apart from personal interests they care about costs and, to a lesser extent, also travel time. In particular, free entrance (no costs) has a strong appeal to these tourists. A match with a current need is relevant for this group only if the need concerns new experiences. However, a strong match has a negative effect on the utility of the activity. An explanation might be that this group dislikes the type of POI that strongly addresses new experiences, so that only POIs that moderately match the need are appealing to them.

Respondents belonging to segment 3 are also rather selective in terms of the attributes they consider important. They consider personal interest important, but to a much lesser extent than in the other segments. Typical for this segment are the high importance assigned to attraction value and the indifference to costs. They are sensitive to a match with current needs only when the match is strong as opposed to moderate. Furthermore, they assign an above-average weight to new experiences. Lastly, this segment is characterized by a high value of the constant, indicating that visiting a POI must meet high demands before it is preferred over doing nothing (having a break).

In sum, the three classes emerging from this analysis differ in various respects from each other.
The first class consists of tourists who seek to get the maximum experience out of the available options for visiting POIs in a city trip; they evaluate POIs thoroughly on all aspects. Needs play a role but there is no differentiation with respect to the nature of the need. The second class consists of tourists who choose POIs primarily based on personal interests, taking into account costs and effort. Match with a current need is not taken into account except when the need concerns new experiences. The third and last class consists of tourists who impose high demands on what a POI has to offer, paying attention to attraction value, personal interests and the need for new experiences. This class is insensitive to costs. Furthermore, a strong match with a particular need is not always considered positive. The likely explanation for this is that meeting a particular need may correlate with certain qualities of a POI that the person finds unattractive in the context of a city trip.

The classes are not equal in size. In the sample, the shares are 48.3% (segment 1), 13.1% (segment 2) and 38.7% (segment 3). Table 3 shows the composition of the segments in terms of some key personal background variables. The segments differ significantly on age, education level and work type (p < 0.05); the difference on gender is significant only at the 10% level (p = 0.085).

5.3 Incorporating the results in a TRS

The model estimation results can be integrated in the user model of a TRS to take into account users’ preferences regarding travel and time-use as well as non-travel characteristics of POIs. The latent class estimation showed that considerable variation exists in how individuals trade off attributes of POIs. We emphasize that the classes that emerged from this analysis do not necessarily identify general groups of tourists. The differences in preferences may also be related to current circumstances or motivational states (e.g., mood). The classes found do, however, give an indication of the range of variation.
In a TRS, this range can be taken into account by identifying the best fitting class for the trip under concern through a dialogue with the user at the moment of planning a trip. A possible way of doing this is to present short descriptions of the profiles to the user and ask him or her to indicate which description would best fit his or her own profile for the trip.

Table 3 Relationships between socio-demographics and segment membership

Variables                                 Segment 1  Segment 2  Segment 3  Total    Chi-square
                                          (%)        (%)        (%)        (%)      (p value)
Gender                                                                              4.939 (0.085)
  Male                                    47.7       39.0       57.4       50.3
  Female                                  52.3       61.0       42.6       49.1
Age                                                                                 12.716 (0.048)
  0 ≤ 24 years                            13.7       14.6       8.2        11.7
  15 ≤ 44 years                           51.0       34.1       42.6       45.6
  45 ≤ 64 years                           31.4       39.0       36.1       34.2
  65+ years                               3.9        12.2       13.1       8.5
Household type                                                                      7.633 (0.106)
  Single                                  23.5       17.1       18.9       20.9
  Couple                                  48.4       39.0       38.5       43.4
  Family with children                    28.1       43.9       42.6       35.8
Education level                                                                     12.307 (0.015)
  Low                                     6.5        17.1       18.9       12.7
  Medium                                  44.4       51.2       41.0       44.0
  High                                    49.0       31.7       40.2       13.4
Income level                                                                        7.873 (0.248)
  Unknown                                 16.3       22.0       19.7       18.4
  Low                                     19.6       9.8        12.3       15.5
  Medium                                  46.4       58.5       45.9       47.8
  High                                    17.6       9.8        22.1       18.4
Work status                                                                         4.741 (0.315)
  Not                                     25.5       34.1       25.4       26.6
  Part-time                               30.1       31.7       23.0       27.5
  Full-time                               44.4       34.1       51.6       45.9
Work type                                                                           32.244 (0.000)
  Crafts/industry/transport/agricultural  4.6        22.0       15.6       11.1
  Administrative                          26.1       9.8        13.9       19.3
  Commercial                              10.5       17.1       18.9       14.6
  Health services                         22.2       29.3       22.1       23.1
  Services/education                      14.4       12.2       18.9       15.8
  Specialist                              22.2       9.8        10.7       16.1

Although none of the standard profiles may fit an individual perfectly, it is expected that the segmentation at least will improve the assessment of the true preferences. Such a multi-class model would be an advanced feature of a TRS. Even without segmentation, the integration of the preference estimates (i.e., the one-segment solution) already would involve a significant refinement of the user model compared to existing systems.
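As a rough illustration of how such a multi-class user model might be applied, the sketch below computes multinomial logit choice probabilities per class and mixes them with the segment shares reported above (0.483, 0.131, 0.387). The utility weights, the attribute coding, the example POIs and all function names are hypothetical placeholders chosen only to echo the qualitative class descriptions; they are not the estimates of Table 2. Using the share-weighted mixture when no profile has been selected is likewise an assumption of this sketch, not something the study prescribes.

```python
import math

# Segment shares as reported in the text (they sum to 1.001 because of rounding).
SEGMENT_SHARES = {"segment_1": 0.483, "segment_2": 0.131, "segment_3": 0.387}

# HYPOTHETICAL utility weights, for illustration only (not the paper's estimates).
# "no_activity" is the constant: the utility of doing nothing (having a break).
HYPOTHETICAL_WEIGHTS = {
    "segment_1": {"no_activity": 0.0, "attraction": 0.8, "interest": 1.0,
                  "need": 0.6, "cost": -0.7, "travel_time": -0.5},
    "segment_2": {"no_activity": 0.0, "attraction": 0.0, "interest": 1.2,
                  "need": 0.0, "cost": -0.9, "travel_time": -0.3},
    "segment_3": {"no_activity": 1.0, "attraction": 1.1, "interest": 0.4,
                  "need": 0.5, "cost": 0.0, "travel_time": -0.2},
}

# Two made-up POIs described on the same (hypothetical) attribute scales.
EXAMPLE_POIS = [
    {"attraction": 1.0, "interest": 1.0, "need": 0.0, "cost": 1.0, "travel_time": 0.5},
    {"attraction": 0.0, "interest": 0.0, "need": 1.0, "cost": 0.0, "travel_time": 1.0},
]


def class_choice_probabilities(weights, pois):
    """Multinomial logit over the POIs plus a final 'no activity' alternative."""
    utilities = [sum(weights[name] * value for name, value in poi.items())
                 for poi in pois]
    utilities.append(weights["no_activity"])
    peak = max(utilities)  # subtract the maximum for numerical stability
    exp_utilities = [math.exp(u - peak) for u in utilities]
    total = sum(exp_utilities)
    return [e / total for e in exp_utilities]


def mixture_choice_probabilities(pois, shares=SEGMENT_SHARES,
                                 weights=HYPOTHETICAL_WEIGHTS):
    """Share-weighted average of the class-specific choice probabilities."""
    share_total = sum(shares.values())  # renormalise the rounded shares
    mixed = [0.0] * (len(pois) + 1)
    for segment, share in shares.items():
        probabilities = class_choice_probabilities(weights[segment], pois)
        for index, probability in enumerate(probabilities):
            mixed[index] += (share / share_total) * probability
    return mixed


# Profile chosen by the user in the dialogue (here: segment 2).
selected = class_choice_probabilities(HYPOTHETICAL_WEIGHTS["segment_2"], EXAMPLE_POIS)
# No profile known: fall back to the mixture over all classes.
fallback = mixture_choice_probabilities(EXAMPLE_POIS)
```

Each returned list holds one probability per POI, followed by the probability of the "no activity" alternative. Replacing the placeholder weights with estimated class-specific parameters would be the natural next step.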
To demonstrate this refinement, the model estimation results were implemented in the c-Space recommender system. The single-segment solution was implemented, as the current version of c-Space does not include a method to assess the specific preference profile of a user at this level. The estimation results needed further processing before they could be used. For the discrete choice analysis the POI attributes were discretized; the stated choice experiment used three levels for each attribute. For continuous variables such as travel time and entrance costs, a TRS needs a continuous function. A continuous function was derived by interpolation and extrapolation of the point estimates.

For a first qualitative evaluation of the system, an application was developed for Trento, a popular city-trip destination for tourists in Italy. Thirty-five individuals were approached in a street survey and volunteered to use the system to plan and implement their trip. After having made the tour they filled out a short survey about their experiences. The responses confirmed the usefulness and added value of the system. Users reported that the content suggested for their trip was indeed of interest to them (83%) and that they were not able to find such content using other means (91%). Compared to other recommender systems, which typically recommend popular tours, the suggested tours were found to be more in line with their interests. The prototype and these test results provide evidence that refinement of the user model in the way proposed in this study is feasible and can potentially improve the quality of tour recommendation.

6 Conclusions and discussion

For recommending optimal personalized tours it is important to know how individuals trade off preferences for particular POIs against routing, costs and time-use characteristics.
In this study, we described the c-Space tour-recommender system and considered the empirical estimation of the utility weights tourists assign to these factors, using a stated choice experiment and state-of-the-art choice analysis techniques (latent-class model). A random sample from a large national panel participated in the survey. The analysis revealed the influence of the motivational state of a tourist on preferences for activities. It also revealed that the way trade-offs are made and the response to current needs differ significantly between individuals. The latent-class analysis indicated that three segments can be identified.

The results of this study can be used to improve user models that are currently used in travel recommender systems. Current models typically assume a process where the selection of POIs and the determination of a route along the locations of the POIs are performed in separate steps. The multi-attribute utility function estimated in the present study allows the TRS to take the travel and time-use implications of visiting particular POIs into account already in the step of selecting POIs. Thus, using this utility function, the selection of POIs that maximizes utility at the level of a tour can be identified. As we demonstrated by an application of the c-Space TRS, the estimated values can be incorporated in a user profile together with information about personal interests (themes) and needs of a user. A first qualitative evaluation indicated the efficacy of the approach.

Several problems remain for future research. First, the current c-Space system does not support a process for adapting the user model to a user. Extending the system to handle a multi-class user model is an objective of further development. Second, the estimation results were based on a sample from the Dutch population. It would be interesting to replicate the study in other countries to see whether similar segments emerge.
Third, our study took into account only a limited number of potentially relevant contextual conditions. For advanced Context-Aware TRS (CARS) the set of conditional factors needs to be expanded in order to obtain more refined estimates of utility weights in specific cases. Fourth, tourists’ activities are often conducted by individuals in a group, and preferences for selecting certain POIs and activities are the result of a group decision process. Our user model did not account for this social aspect. In order to derive a suitable user model for TRSs that do take group preferences into account (so-called Social TRS), the discrete choice analysis needs to be expanded in future research.

Acknowledgements The research leading to these results has received funding from the European Community’s Seventh Framework Program (FP7/2007-2013) under the Grant Agreement number 611040. The author is solely responsible for the information reported in this paper. It does not represent the opinion of the Community. The Community is not responsible for any use that might be made of the information contained in this paper. We furthermore would like to acknowledge Bruno Simoes of Graphitech for his support in the evaluation study.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix: optimization of the latent-class model specification

Before applying a latent class estimation, the specification of the base model was optimized with parsimony in mind. Potentially, there are many interaction variables that can be considered.
To arrive at a parsimonious model, the significance of all interaction variables was tested in a stepwise manner, starting with all interaction variables included in the model and then removing, step by step, the interaction variables that were insignificant. Recall that the context variables consist of the duration of the city trip, the time moment of the activity in the trip, and the size and nature of the current need. It appeared that none of the interactions concerning duration and time moment were significant, and therefore these interaction variables were dropped from the final model. Given the purpose of the present study, all two-way interactions concerning the nature and size of the current need were kept in the final base model so that this factor could be included in the search for significant segments. The needs being in open air, relaxation, physical exercise and social contact were merged into a single category (labeled being outdoors) to increase the parsimony of the model further, as little differentiation between these needs emerged. Hence, in the final model the nature of the need has three levels: New experience, Entertainment and Being outdoors.

The latent class estimation was run for several settings of the number of classes to find the optimal number of segments. Table 4 shows goodness-of-fit statistics for the estimated models, where the number of classes is varied from one to four.

Table 4 Statistics for the latent class models

Number of  Number of       Log likelihood at  Log likelihood        Rho square     Akaike information
segments   parameters (P)  convergence (LLB)  evaluated at 0 (LL0)  (1 - LLB/LL0)  criterion (AIC)
1          19              -2942.09           -3942.62              0.254          5922.2
2          39              -2804.89           -3942.62              0.289          5687.8
3          59              -2746.04           -3942.62              0.304          5610.1
4          79              -2708.57           -3942.62              0.313          5575.1

Sample size is 2844 choices from 316 respondents (N)

According to the AIC index, the 4-segments model is the best model for these data.
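The fit statistics in Table 4 can be recomputed from the reported log likelihoods and parameter counts. The short sketch below does so using the standard definitions AIC = -2 LL + 2P and rho square = 1 - LLB/LL0; the variable and function names are illustrative only, and the results agree with the table to within rounding.

```python
# Values taken from Table 4: segments -> (number of parameters P, LLB).
LL0 = -3942.62  # log likelihood evaluated at zero (same for all models)
MODELS = {
    1: (19, -2942.09),
    2: (39, -2804.89),
    3: (59, -2746.04),
    4: (79, -2708.57),
}


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + 2.0 * n_params


def rho_square(log_likelihood, null_log_likelihood=LL0):
    """Likelihood ratio index relative to the model with all parameters at zero."""
    return 1.0 - log_likelihood / null_log_likelihood


STATS = {
    segments: {"aic": aic(ll, p), "rho_square": rho_square(ll)}
    for segments, (p, ll) in MODELS.items()
}

# Improvement in AIC obtained by adding one more segment.
AIC_GAIN = {s: STATS[s - 1]["aic"] - STATS[s]["aic"] for s in (2, 3, 4)}

for segments, values in STATS.items():
    print(segments, round(values["aic"], 1), round(values["rho_square"], 3))
```

With these figures the AIC drops by roughly 234, 78 and 35 points for the second, third and fourth segment respectively, so each additional segment buys less improvement in fit.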
It is noticed, however, that the improvement of the index going from a 3-segments to a 4-segments model is modest. In terms of interpretation of the estimation results, the 3-segments model appears to be more useful than the 4-segments model. In the latter model, the segmentation has become increasingly sensitive to differences regarding a somewhat trivial factor (namely, the constant representing the utility of the null alternative). For these reasons, we selected the 3-segments model as the best model for the analysis purpose.

Journal

Information Technology & Tourism, Springer Journals

Published: Feb 2, 2018
