In this paper, we describe the world’s largest gait database with real-life carried objects (COs), which has been made publicly available for research purposes, and its application to the performance evaluation of vision-based gait recognition. Whereas existing databases for gait recognition include at most 4,007 subjects, we constructed an extremely large-scale gait database that includes 62,528 subjects, with an equal distribution of males and females, and ages ranging from 2 to 95 years. Moreover, whereas existing gait databases consider a few predefined CO positions on a subject’s body, we constructed a database that contains unconstrained variations of COs carried in unconstrained positions. Additionally, gait samples were manually classified into seven carrying status (CS) labels. The extremely large-scale gait database enabled us to evaluate recognition performance under cooperative and uncooperative settings, the impact of the training data size, the recognition difficulty level of the CS labels, and the possibility of classifying the CS labels. In particular, the latter two performance evaluations have not been investigated in previous gait recognition studies.

Keywords: Gait database, Extremely large scale, Carried object, Carried object detection and classification, Performance evaluation

*Correspondence: email@example.com
The Institute of Scientific and Industrial Research, Osaka University, Osaka 567-0047, Japan
Full list of author information is available at the end of the article

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Uddin et al. IPSJ Transactions on Computer Vision and Applications (2018) 10:5

1 Introduction

Gait refers to the walking style of an individual, and can be used as a behavioral biometric. Compared with traditional biometric features, such as DNA, a fingerprint, face, and iris, gait has many unique advantages. The key advantage is that gait can be used to recognize an individual at a distance from a camera without his/her cooperation, even for a relatively low-resolution image sequence and low frame rate. Therefore, gait has the potential to be applied in many applications, such as access control, surveillance, forensics, and criminal investigations from footage from CCTV cameras installed in a public or private space [4, 16, 19]. Recently, gait has been used as a forensic feature, and there has already been a conviction that has resulted from gait analysis.

However, gait recognition has to overcome some practical issues because of circumstances defined as covariates, such as view, clothing, shoes, carried object (CO), environmental context, aging, or mental condition [30, 40]. These covariates should be fully studied for further progress and the development of a practical and robust gait recognition algorithm. To overcome these issues, a common gait database that considers the above covariates is essential. Among the aforementioned covariates, CO is one of the most important because people often need to carry objects in their daily lives, such as a handbag or briefcase on the way to work, or multiple bags after shopping.

There are some existing gait databases in the research community that consider COs. However, they contain a limited number of subjects and few predefined COs, and they lack information about the positions and types of COs. For example, CASIA gait dataset B is composed of 124 subjects and considers a bag as a CO, where the bag was selected by a subject from a predefined set containing a knapsack, satchel, and handbag. Similarly, the USF dataset is composed of 122 subjects and considers a briefcase as a CO; thus, there are at most two options for a CO available, that is, with or without a briefcase. Recently, a large dataset, the OU-ISIR Gait Database, Large Population Dataset with Bag, β version, which contains 2,070 subjects with various COs, was introduced; however, it does not include detailed information about COs.

With the growing data science trend, we always need a large-scale dataset to efficiently solve a problem. Recently, many sophisticated machine learning techniques, such as deep learning (DL), have been developed, and they require a large number of training samples because more data are more important than a better algorithm. However, few large-scale databases are available for gait recognition, for example, the OU-ISIR Gait Database, Large Population Dataset and Large Population Dataset with Bag, β version, which consider 4,007 and 2,070 subjects, respectively. Although these datasets for gait recognition seem to be sufficient for a conventional machine learning algorithm (e.g., without DL), they are not sufficiently large to efficiently conduct a study using a DL-based approach.

In this study, we first propose an extremely large population gait database with a large variation of CO covariates that will encourage the gait recognition community to deeply research this practical covariate. Second, we provide performance evaluations for gait recognition by employing existing state-of-the-art appearance-based gait representation. The contributions of this paper are summarized as follows:

1. The proposed database is the largest gait database in the world and is constructed from 62,528 subjects with an equal distribution of males and females, and a wide range of ages. It is more than 15 times the size of the existing largest dataset for gait recognition.
2. In the proposed database, there is no constraint on type, quantity, and position of the CO. We considered any real-life COs that are used in daily life (e.g., handbag, vanity bag, book, notepad, and umbrella) or when traveling (e.g., backpack, luggage, and travel bag). Additionally, the typical position labels of the COs are manually annotated. It would be beneficial to analyze the classification and gait recognition difficulty with respect to these typical position labels.
3. We provide a set of evaluation experiments with benchmark results using state-of-the-art gait recognition algorithms. In particular, experiments related to COs have not been investigated in previous gait recognition studies.

2 Related work

2.1 Existing gait recognition databases

In this section, we briefly describe the existing major databases for gait recognition, which are summarized in Table 1.

The USF dataset is one of the most widely used gait datasets and was captured outdoors under different walking conditions. It is composed of 122 subjects and considers a briefcase as a CO, and as a result, at most two options for samples (i.e., with or without a CO) are available.

The Soton small dataset considers only three types of bags (i.e., handbag, barrel bag, and rucksack) as COs and the subject carries these bags in four ways. Because this dataset contains a larger variation of CO covariates than that of the USF dataset, it can be used for exploratory CO covariate analysis for gait recognition.

The TUM-IITKGP dataset contains unique covariates, such as dynamic and static occlusion. Later, the TUM-GAID dataset was constructed; it is the first multi-signal gait dataset, containing audio signals, RGB images, and depth images captured with a Microsoft Kinect.

CASIA dataset B is constructed from 124 subjects with and without a CO, and before capturing the sequences with a CO, each subject chose a bag that he/she liked from a set consisting of a knapsack, satchel, and handbag. As a result, there are at most four options of samples available regarding COs (no bag, knapsack, satchel, and handbag).

CASIA dataset C considers only a backpack as a CO, and data were captured from 153 subjects using a thermal infrared camera designed for the study of night gait recognition.

OU-ISIR, LP with Bag, β version, is composed of 2,070 subjects and considers unconstrained types and positions of COs. However, information about the status of COs, such as position, change of position within a gait period, and quantity of COs, is unavailable.

To summarize, the aforementioned datasets are unsuitable not only for studying CO covariates but also for taking advantage of modern machine learning (e.g., DL) approaches. Compared with existing databases, the proposed database contains unconstrained variations of COs and the largest number of subjects, which is approximately 200 times larger than the largest existing gait database with COs, that is, TUM-GAID, and 15 times larger than that without COs for gait recognition, that is, OU-ISIR, LP.

We note that there exists a larger gait database that consists of 63,846 subjects. However, this database is only used for age estimation and is not usable for gait recognition because only a single gait energy image (GEI) feature is available for each subject.

Table 1 Existing major gait recognition databases
Database | #Subjects | Types of CO | #Possible options for CO positions | Gender balance (male:female)
Soton dataset, small | 12 | Handbag, barrel bag, rucksack | Four | N/A
USF dataset | 122 | Briefcase | One | 4:1
CASIA dataset, B | 124 | Knapsack, satchel, handbag | Three | 3:1
CASIA dataset, C | 153 | Bag | One | 6:1
CMU Mobo dataset | 25 | Ball | One | 12:1
TUM-IITKGP | 35 | Backpack | One | N/A
TUM-GAID | 305 | Backpack | One | 3:2
OU-ISIR, LP | 4,007 | N/A | N/A | 1:1
OU-ISIR, LP with Bag, β version | 2,070 | Unconstrained | Unconstrained | N/A
Proposed | 62,528 | Unconstrained | Unconstrained | 1:1

2.2 Gait recognition approaches

In gait recognition, the appearance-based approach is dominant, and GEI is the most prevalent and frequently used feature. Furthermore, some modified GEIs have been introduced for robust gait recognition against CO and clothing variation covariates, such as Gait Entropy Image (GEnI), which is computed by calculating the Shannon entropy for every pixel of the GEI; Masked GEI (MGEI), for which gait energies are masked out when gait entropy is smaller than a certain threshold; Gabor GEI; and transformed GEI with a Gabor filter.

Appearance-based features, however, often suffer from large intra-subject appearance changes because of covariates. To gain more robustness, the most popular approach is to incorporate spatial metric learning-based approaches, such as linear discriminant analysis (LDA) and a ranking support vector machine (RankSVM). Additionally, as a DL-based approach, a convolutional neural network (CNN) [8, 32, 38] is also used for robust gait recognition. Therefore, in this study, we consider metric learning-based approaches with a GEI feature to evaluate the performance of the proposed database.

3 OU-ISIR large population gait database with carried objects

3.1 Capture system

The proposed database was constructed from gait images automatically collected by a gait collecting system called Gait Collector. The gait data were collected in conjunction with an experience-based demonstration of video-based gait analysis at a science museum (Miraikan), and informed consent for research use was obtained electronically. An overview of the capture system is illustrated in Fig. 1. The camera was set at a distance of approximately 8 m from the straight walking course and a height of approximately 5 m. The image resolution and frame rate were 1280 × 980 pixels and 25 fps, respectively. The green background panels and carpet were arranged along the walking course for clear silhouette extraction. The camera continuously captured video during the museum opening hours, photo-electronic sensors were used for detecting a subject walking past, and a sequence of a target subject was extracted from the entire video stream.

Each subject was asked to walk straight three times at his/her preferred speed. First, the subject walked to the other side of the course with his/her COs and then placed these items into a CO storage box. Subsequently, he/she walked twice more without COs in the same direction and then picked up the COs and left the walking course. As a result, we obtained three sequences for each subject. The first sequence, with COs (or without COs if he/she did not have any), is called the A1 sequence, and the second and third sequences without COs are called the A2 and A3 sequences, respectively.

Fig. 1 Illustration of the data collection system

3.2 Gait feature generation

To obtain a GEI feature, we performed the following four steps: (1) A silhouette image sequence of a subject was extracted using a chroma-key technique (i.e., removal of the green background area using HSV color space). (2) Then, registration and size normalization of the silhouette images were performed. First, the subject’s silhouette images were localized by detecting the top, bottom, and horizontal center (i.e., median) positions. Then, a moving-average filter was applied to these positions. Finally, the sizes of the subject’s silhouette images were normalized according to the average positions so that his/her height was 128 pixels. Furthermore, the aspect ratio of each region was maintained, and as a result, we generated the subject’s silhouette images of 88 × 128 pixels. (3) A gait period was determined using normalized autocorrelation of the subject’s silhouette image sequence along the temporal axis. (4) A GEI was constructed by averaging the subject’s silhouette image sequence over a gait period. If several gait periods were detected from one walking image sequence, then we chose a GEI that was nearest to the center of the walking course.

3.3 Annotation of the carrying status

Because we did not constrain the subject in terms of the type of CO, or where and how it was carried, it could be carried in a variety of positions and orientations. Thus, it was difficult to categorize the position exactly. For simplicity, we first divided the area in which the COs could be carried into four regions with respect to the human body: side bottom, side middle, front, and back, as shown in Fig. 2. However, some subjects did not carry a CO, some carried multiple COs in multiple regions, and others changed a CO’s position within a GEI gait period.

For each GEI, every fourth frame within a gait period was manually checked to annotate the carrying status (CS). As a result, a total of seven distinct labels for the CS were annotated in our proposed database. A summary of the denotation of the CS labels is shown in Table 2 and some examples of CS labels are shown in Fig. 3. Note that, because only the samples for the A1 sequence may have contained COs, the annotation process was only applied to the A1 sequence for each subject.

Fig. 2 Four approximating regions for a person in which a CO is being carried

Table 2 Carrying status label
CS label | Explanation
NoCO | No carried object
SbCO | CO(s) being carried in the side bottom region
SmCO | CO(s) being carried in the side middle region
FrCO | CO(s) being carried in the front region
BaCO | CO(s) being carried in the back region
MuCO | COs being carried in multiple regions
CpCO | CO(s) with position being changed from one region to another within a gait period

Fig. 3 Examples of CS labels: a sample RGB image within a gait period with COs (circled in yellow) in their A1 sequence; b corresponding GEI feature; and c GEI feature of the same subject without a CO in another captured sequence (A2 or A3), for reference

3.4 Database statistics

Because of the good design of the system, the world’s largest database for gait recognition with COs, composed of 62,528 subjects with ages ranging from 2 to 95 years, was constructed. Detailed distributions of the subjects’ genders by age group are shown in Fig. 4. The gender distribution was well-balanced for males and females for each age group, which is a desirable property for the comparison of gait recognition performance between genders.

Improper GEIs were excluded manually from the final database if a subject stopped walking for a while at the center of the walking course, changed walking direction before the end of the walking course, continued to carry COs in the A2 and A3 sequences, or exited from the capture system after finishing the first sequence, A1. As a result, each subject had at most three sequences. We therefore obtained a database for publication that included 60,450 subjects for the A1 sequence, and 58,859 and 58,709 subjects for the A2 and A3 sequences, respectively.

The distributions of the CS labels are shown in Fig. 5. Most subjects carried multiple COs in multiple regions (i.e., with MuCO). Subjects carried COs in the front region (i.e., with FrCO) and in the back region (i.e., with BaCO) about equally often, and a comparable share carried no CO at all (i.e., with NoCO). Moreover, few subjects changed their CO positions from one region to another (i.e., with CpCO); similarly, few subjects carried COs in the side middle region. Meanwhile, the number of subjects who carried COs in the side bottom region (i.e., with SbCO) was approximately twice the number of those who carried COs in the side middle region (i.e., with SmCO).

Fig. 4 Distribution of genders by age group

Fig. 5 Distribution of the CS label

4 Performance evaluation

4.1 Overview

These experiments were designed to address a variety of challenges for COs and provided benchmark results for a competitive performance comparison of various algorithms. Specifically, we considered two sets of popular experiments for gait recognition: cooperative and uncooperative settings and impact of the number of training subjects. Additionally, we designed two more sets of original experimental settings to study the impact of COs: difficulty level of the CS labels and classification of the CS labels. To the best of our knowledge, they have not been investigated before.

4.2 Evaluation criteria

We evaluated the accuracy of gait recognition in two modes: identification and verification. For identification, we used the cumulative matching curve (CMC). For verification, we used the receiver operating characteristic curve with z-normalization (z-ROC), which indicates the trade-off between the false rejection rate (FRR) of genuine samples and the false acceptance rate (FAR) of impostor samples with varying thresholds. Additionally, more specific measures for each evaluation mode were used to evaluate performance: Rank-1 and Rank-5 for identification, and the equal error rate with z-normalization (z-EER), FRR at 1% FAR with z-normalization (z-FRR1%), and area under curve with z-normalization (z-AUC) for verification.

Additionally, we used the correct classification rate (CCR) to evaluate accuracy for the classification of the CS label experiment.

4.3 Benchmarks

There are various state-of-the-art appearance-based methods available for gait recognition in the literature, as mentioned in Subsection 2.2. We selected seven benchmark methods from the wide variety of appearance-based gait recognition methods to validate the proposed database, which are summarized as follows:

The first benchmark used the direct matching method, which is a non-training-based approach that calculates the dissimilarity using the L2 distance between two GEIs. The method is denoted by DM in this paper.

The second benchmark used LDA, which is widely exploited in gait recognition [14, 18]. Specifically, we first applied principal component analysis (PCA) to an unfolded GEI feature vector to reduce its dimensions, and subsequently applied LDA to obtain a metric to recognize an unknown sample. The benchmark is denoted by PCA_LDA in the experiment discussions.

The third benchmark used the gait energy response function (GERF), which transforms GEI into a better discriminative feature. Then, a Gabor filter was applied to the transformed GEI, and LDA was subsequently applied, followed by PCA. The benchmark is denoted by GERF in the experiment discussions.

A support vector machine (SVM) is a widely used method for multi-class classification. Therefore, we used SVM in a benchmark, with a third-degree polynomial kernel for the classification of the CS labels. The benchmark is denoted by mSVM in the experiment discussions.

RankSVM is a well-known extension of an SVM that is used for gait recognition in the literature [23, 24, 26]. Therefore, we used RankSVM in a metric learning-based benchmark. In the training phase, we set the positive and negative feature vectors as the absolute differences between genuine and impostor pairs of GEIs, respectively. Considering the computational cost and memory, we randomly selected nine impostor pairs for each genuine pair. The benchmark is denoted by RSVM in the experiment discussions.

GEINet is based on a simple CNN network architecture for gait recognition, in which one input GEI feature is fed into the network, and the soft-max value from the output of the final layer (fc4), in which the number of nodes is equal to the number of training subjects, is regarded as the probability that the input matches a corresponding subject. The benchmark is denoted by GEINet in the experiment discussions.

Siamese is also based on CNN network architecture, in which two input GEI features are used to train the two parallel CNN networks with shared parameters for gait recognition [33, 41]. The output of the final layer (fc4) is regarded as a feature vector for each input. A contrastive loss was used for the genuine pair, whereas a so-called hinge loss was used for the impostor pair. Note that, for training the network, similar to RSVM, we set nine impostor pairs against a genuine pair. The benchmark is denoted by SIAME in the experiment discussions.

4.4 Cooperative and uncooperative settings

In this section, the impacts of the cooperative and uncooperative settings on recognition accuracy are investigated. The implicit assumption for the cooperative setting is that the covariate condition is consistent for all samples in a gallery set. However, it is difficult to collect such data in a real scenario because of the uncooperative and non-intrusive traits of gait biometrics. Therefore, in addition to the cooperative setting, a more natural uncooperative setting was used in which the covariate condition was inconsistent in the gallery set.

For the settings, we prepared a subject list that included 58,199 subjects who had a sample in the A1 sequence and a sample in either the A2 or A3 sequences for each subject. Then, the subject list was divided randomly by subject id into two sets: a training set (29,097 subjects) and test set (29,102 subjects) equally for each CS label. Then, the test set was divided into two subsets: gallery set and probe set. For the cooperative setting, we used samples from the A2 or A3 sequences (i.e., without COs) in the gallery and the sample from the A1 sequence was used as a probe. In the uncooperative setting, by contrast, samples of each subject were randomly separated into a gallery set and probe set so that the gallery contained a mix of samples from the A1 and the A2 or A3 sequences. The training sets for the cooperative and uncooperative settings were prepared in the same manner to reflect the corresponding test sets.

The results for CMC and z-ROC are shown in Fig. 6, and Rank-1, Rank-5, z-FRR1%, z-EER, and z-AUC are shown in Table 3. From these results, the recognition accuracy for the cooperative setting is better than that of the uncooperative setting for most of the benchmarks.

Among the benchmark methods, the non-training-based approach DM achieved the worst performance. Because DM did not apply a technique against the covariate, it was directly affected by the spatial displacement of the corresponding body parts in GEIs caused by the CS difference. By contrast, the accuracy of the training-based approaches was better than that of DM because the dissimilarity metrics were optimized with the training dataset.

Regarding the LDA-based metric learning benchmarks, both PCA_LDA and GERF worked reasonably well and their performances were very similar. However, GERF was slightly better for the uncooperative setting, whereas PCA_LDA was slightly better for the cooperative setting, as shown in Fig. 6 and Table 3. We believe that LDA performed better recognition for both benchmarks by reducing intra-subject appearance variation while increasing inter-subject variations. Furthermore, in GERF, before applying LDA and PCA, a pre-processing technique was performed on GEI, for example, transforming a pixel value for a better discriminative feature. This transformation in GERF was not effective for the cooperative setting; however, it worked well for the uncooperative setting. As a result, the performance of GERF was better for the uncooperative setting.

Regarding RSVM, it is reported in the literature that RankSVM works better in an identification scenario [24, 36] because it focuses more on the relative distance between two classes and considers the probe-dependent rank statistics. However, it did not work well in our setting. We believe the cause of this weak performance was that, as mentioned in Section 4.3, we could only set the number of impostor pairs at nine against a genuine pair, and hence, RankSVM could not effectively maximize the inter-subject variation. This is one of the important disadvantages of the RankSVM method for an extremely large training dataset.

Fig. 6 CMC and ROC curves for cooperative and uncooperative settings. Legend marks are common in all graphs. a CMC curves (identification rate versus rank). b ROC curves with z-normalization (FRR versus FAR)
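To make the evaluation pipeline more concrete, the following is a minimal NumPy sketch of three pieces described above: the GEI averaging of Section 3.2 (step 4), the L2 direct matching used by the DM benchmark in Section 4.3, and the rank-k identification rate that underlies the CMC in Section 4.2. It is not the authors' implementation. The function names are hypothetical, the silhouettes are assumed to be already size-normalized and cropped to one gait period, and the dense distance computation is only practical for toy-sized galleries.

```python
import numpy as np


def gait_energy_image(silhouettes):
    """Average a (T, H, W) stack of binary silhouettes over one gait period."""
    frames = np.asarray(silhouettes, dtype=np.float64)
    return frames.mean(axis=0)


def l2_distance_matrix(probes, gallery):
    """Pairwise L2 distances between GEIs, shape (n_probe, n_gallery).

    Assumption: every GEI has the same size; each one is unfolded
    into a vector before the distance is computed.
    """
    p = np.asarray(probes, dtype=np.float64)
    g = np.asarray(gallery, dtype=np.float64)
    p = p.reshape(p.shape[0], -1)
    g = g.reshape(g.shape[0], -1)
    diff = p[:, None, :] - g[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def rank_k_rate(dist, probe_ids, gallery_ids, k=1):
    """Fraction of probes whose subject appears among the k nearest gallery GEIs."""
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    nearest = np.argsort(dist, axis=1)[:, :k]  # k smallest distances per probe
    hits = (gallery_ids[nearest] == probe_ids[:, None]).any(axis=1)
    return float(hits.mean())
```

Evaluating `rank_k_rate` for k = 1, 2, ... yields the points of a CMC curve; Rank-1 and Rank-5 in Table 3 correspond to k = 1 and k = 5. The z-normalized verification measures are deliberately left out of this sketch because the normalization details are not spelled out in this section.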
Table 3 Rank-1/5 [%], z-FRR1% [%], z-EER [%], and z-AUC [%] for cooperative (Coop) and uncooperative (Uncoop) settings
Benchmark | Rank-1 Coop | Rank-1 Uncoop | Rank-5 Coop | Rank-5 Uncoop | z-FRR1% Coop | z-FRR1% Uncoop | z-EER Coop | z-EER Uncoop | z-AUC Coop | z-AUC Uncoop
DM | 17.7 | 15.9 | 23.4 | 20.5 | 56.3 | 68.0 | 18.5 | 29.9 | 10.1 | 23.2
PCA_LDA | 40.8 | 31.4 | 53.0 | 41.3 | 21.2 | 34.3 | 7.4 | 14.4 | 2.4 | 8.0
GERF | 38.5 | 31.2 | 50.9 | 42.2 | 30.6 | 34.6 | 8.0 | 11.4 | 2.7 | 5.1
RSVM | 24.7 | 18.3 | 35.6 | 27.6 | 34.1 | 43.9 | 9.6 | 14.7 | 3.5 | 8.2
GEINet | 22.3 | 18.5 | 32.5 | 26.9 | 34.8 | 43.3 | 11.3 | 14.7 | 4.5 | 7.1
SIAME | 49.8 | 50.3 | 69.7 | 70.5 | 5.7 | 5.4 | 2.5 | 2.4 | 0.3 | 0.3
(Bold and italic bold fonts in the original table indicate the best and second-best benchmarks, respectively.)

Regarding CNN-based benchmarks, although GEINet did not work well, SIAME achieved the best results with a large margin compared with other benchmarks. We believe the cause of the weak performance for GEINet was that the parameters of the one-input CNN architecture were trained so as to maximize the soft-max of the output layer (fc4) node for the same subject’s input GEIs. Therefore, it emphasized minimizing only intra-subject appearance variation. However, only two sample GEIs for each subject were used in these experiments, which was not sufficient to train good parameters. By contrast, the two-input CNN architecture Siamese in SIAME was trained so that it minimized the intra-subject variation and maximized the inter-subject variation between GEIs. Furthermore, there was almost no accuracy difference between the cooperative and uncooperative settings for SIAME. We believe that the deep neural network structure of Siamese was sufficiently powerful to manage CO covariates given a very large training dataset.

4.5 Difficulty level of the CS labels

The purpose of this experiment was to analyze the difficulty level of the CS labels based on recognition performance. To analyze the difficulty level, we used the same protocol as the cooperative setting, except the probe set was divided into seven subsets according to the CS label, whereas the gallery set was unchanged for a fair comparison.

The results for the Rank-1 identification rate and z-EERs are shown in Fig. 7. NoCO and CpCO were the easiest and most difficult labels, respectively, whereas the remaining labels (i.e., SbCO, SmCO, FrCO, BaCO, and MuCO) were approximately at the middle difficulty level. We discuss the evaluation results by considering the static shape and dynamic motion of the gait feature.

NoCO was the best label for any benchmark, and this is reasonable because there was no CO between the gallery and probe of the same subject and, as a result, shape and motion were stable.

Regarding the middle-level difficulty labels, the motion and shapes deviated by different amounts. For example, for SbCO and SmCO, subjects frequently carried small and lightweight COs, which were occluded by the subject’s body very often, as shown in Fig. 3. Therefore, the COs did not have much of an impact on the shape. For the case of BaCO, subjects typically carried a large CO, such as a backpack that was secured by two straps that fitted over the shoulders, and thus the position of the CO was fixed and stable within a gait period. However, the large CO heavily affected the shape and posture, as shown in Fig. 3. Similarly for MuCO, subjects typically carried a large backpack-type CO together with other types of COs that were carried in other regions. Although the CO position of the back region was constant, other CO positions were random; thus, GEI samples for MuCO were largely affected not only by shape but also by motion. As a result, the recognition performance of this label was worse than that of BaCO. Regarding FrCO, the subjects typically carried a lightweight object by hand in the front region. Unlike BaCO, the CO position was not stable, and typically both hands were required to hold the CO in the front region. Therefore, the GEI samples of FrCO were affected slightly by shape and fairly affected by motion.

Regarding CpCO, the CO position was random in any region within a gait period because of the randomly changing position from one region to another. Therefore, GEI samples for CpCO were severely affected by the motion feature, in addition to shape. As a result, CpCO was the most difficult label.

Fig. 7 Rank-1 identification rate and z-EERs for the difficulty level of CS labels. a Rank-1. b z-EER

4.6 Impact of the number of training subjects

It is well-known that the performance of a machine learning-based method, particularly modern machine learning, depends on a variety of training samples. In a specific scenario, such as our case, this variety can be expressed by the number of subjects. In this section, the impact of the number of training subjects on recognition performance is investigated.

In the experiment, we chose the cooperative setting of Section 4.4 and selected the best benchmark, that is, the CNN-based benchmark SIAME. Then, we prepared training sets of 100, 200, 500, 1,000, 2,000, 5,000, and 10,000 subjects selected randomly from the entire training set (29,097), and the test set was unchanged.

The results for Rank-1 identification rate and z-EERs are shown in Fig. 8. The accuracy was better for a larger number of training subjects. For example, z-EER reduced by approximately 13% when the number of training subjects increased from 100 to 29,097, whereas the rank-1 identification rate increased by approximately 44%.

The above results clearly demonstrate the importance of the number of training subjects, and a large database is essential.

Fig. 8 Relationship between the number of training subjects and recognition accuracy for SIAME

4.7 Classification of the CS labels

In previous sections, we presented our evaluations of subject recognition based on gait, and in this section, we evaluate a different recognition problem, that is, the classification of the CS labels based on the gait feature. There could be numerous applications, such as the detection of suspicious events (e.g., incursion into a bag-prohibited area) and locating a person with a backpack. However, there is no standard gait-based CO database with available labeling information about the position and type of CO. Thus, most existing work in the gait recognition literature detects a CO using the gait feature [5, 17, 35]; however, they only classify with or without a CO. We believe that to overcome such a limitation, our proposed database can be used as a benchmark database for the detection and classification of CO positions because of the available labeling information of the CO.

To evaluate the performance of the classification of the CS labels, we divided the number of subjects for each label into a training set and test set equally. Because the number of training subjects for each label was not the same, we equalized the number of training subjects for all labels by considering the smallest number of training subjects for a label, that is, for CpCO (1,300 subjects). For each subject, only the GEI of the A1 sequence was used. Then, we trained the training-based benchmarks using the equalized training set. For testing, each sample of a CS label was matched against all the available samples of the training set. To predict the CS label for each test sample, majority voting was used for mSVM and the mean distance to a class was used for all other benchmarks.

The CCR results of all CS labels for each benchmark are shown in Fig. 9. The confusion matrices for the best and second-best benchmarks, which were SIAME and mSVM, respectively, are shown in Table 4 for all labels as an average accuracy. The classification accuracy for each label was quite different and depended on the benchmark.

Fig. 9 CCRs of the CS labels

Regarding the performance of benchmark methods, SIAME and mSVM worked consistently for each label, as shown in Fig. 9. For SIAME, as already mentioned in Section 4.4, the Siamese network was trained by minimizing the distance between intra-labels and maximizing the distance between inter-labels. Even though mSVM used a shallow machine learning approach (i.e., SVM), it worked well. We believe the cause is that multi-class SVM constructed multiple binary classifiers (e.g., K(K − 1)/2

Table 4 Confusion matrix for the classification of the CS labels

5 Conclusion and future work

In this paper, we presented a gait database that consisted of an extremely large population with unconstrained types and positions of COs, and presented a performance evaluation for vision-based gait recognition methods. This database had the following advantages over existing gait databases: (1) the number of subjects was 62,528, which was more than 15 times greater than the largest existing database for gait recognition; and (2) the CO positions were manually annotated, and gait samples were classified into seven distinct CS labels.

Together with the database, we also conducted four performance evaluation experiments. The results provided several insights, such as estimating the difficulty level among annotated CS labels based on recognition performance and the classification accuracy for CS labels.
Further analysis of gait recognition performance and the classification of the CS labels using our database is still needed. For example, we can evaluate the perfor- classifiers for K classes), one for each pair of classes, and mance using more sophisticated and powerful DL-based finally identified a class based on majority voting. By con- approaches to gait recognition, which typically require an trast, the remaining benchmark had a similar tendency extremely large number of training samples but achieve to the cooperative and uncooperative settings, such as state-of-the-art performance. PCA_LDA, and GERF achieved nearly equal accuracy. As for the classification accuracy of each label, NoCO Endnote and BaCO worked well because there was no CO in The proposed database is an extension of the large- NoCO, and the shape and position of the CO were sta- scale dataset that was introduced in . ble in BaCO. For SIAME, the CCRs were 76.8 and 78.9% for NoCO and BaCO, respectively, as shown in Table 4. Acknowledgements We thank Maxine Garcia, PhD, from Edanz Group (www.edanzediting.com/ac), For the case of SbCO and FrCO, the position and shape for editing a draft of this manuscript. of the COs were fairly distinguished for other labels, and therefore, the classification accuracy of these labels was Funding This work was partly supported by JSPS Grants-in-Aid for Scientific Research reasonable and nearly equal. However, SbCO was slightly (A) JP15H01693, “R&D Program for Implementation of Anti-Crime and confused with NoCO because of the shape similarity Anti-Terrorism Technologies for a Safe and Secure Society”; Strategic Funds for with respect to the upper part of the GEIs. As a result, the Promotion of Science and Technology of the Ministry of Education, Culture, Sports, Science and Technology, the Japanese Government; and the sometimes samples of SbCO were misclassified as NoCO JST CREST “Behavior Understanding based on Intention-Gait Model” project. (see Table 4). 
For the cases of SmCO, MuCO, and CpCO, the GEI fea- Availability of data and materials The database and evaluation protocol settings is available at http://www.am. tures were not stable, and as a result, sometimes samples sanken.osaka-u.ac.jp/BiometricDB/GaitLPBag.html. of these labels were misclassified as other labels. Because of the occlusion of COs with the subject’s body for SmCO, Authors’ contributions MZU evaluated all the experiments and wrote the initial draft of the the GEI feature was confused with that of SbCO, NoCO, manuscript. MZU and TTN analyzed and discussed the evaluated accuracy. and BaCO, depending on the part of the COs that was TTN and YM revised the manuscript. NT generated the GEI features. XL occluded, as shown in Fig. 3,andthus,samples were participated to evaluate one benchmark. YM and DM designed the study. YY supervised the work and provided technical support. All authors read and misclassified as SbCO, NoCO, and BaCO (see Table 4). approved the manuscript. Similarly, for the case of MuCO, it was confused with BaCO, because, as already discussed in Section 4.5,sub- Competing interests The authors declare that they have no competing interests. jects typically carried, for example, a backpack in the back region together with a small object in other regions in Publisher’s Note MuCo, as shown in Fig 3. Additionally, for the case of Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. CpCO, subjects usually changed the CO’s position from one region to another region through the front using the Author details hands. Therefore, the GEI feature of CpCO was slightly 1 The Institute of Scientific and Industrial Research, Osaka University, Osaka confused with that of FrCO. 567-0047, Japan. The Institute for Datability Science, Osaka University, Osaka Uddin et al. IPSJ Transactions on Computer Vision and Applications (2018) 10:5 Page 11 of 11 567-0047, Japan. 
3 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.

Received: 20 February 2018 Accepted: 2 April 2018

References
1. Bashir K, Xiang T, Gong S (2009) Gait recognition using gait entropy image. In: Proc. of the 3rd Int. Conf. on Imaging for Crime Detection and Prevention. IET, London. pp 1–6
2. Bashir K, Xiang T, Gong S (2010) Gait recognition without subject cooperation. Pattern Recognit Lett 31(13):2052–2060
3. Bouchrika I, Nixon M (2008) Exploratory factor analysis of gait recognition. In: Proc. of the 8th IEEE Int. Conf. on Automatic Face and Gesture Recognition. IEEE, Amsterdam. pp 1–6
4. Bouchrika I, Goffredo M, Carter J, Nixon M (2011) On using gait in forensic biometrics. J Forensic Sci 56(4):882–88
5. Brian DeCann AR (2010) Gait curves for human recognition, backpack detection, and silhouette correction in a nighttime environment. vol 7667. SPIE, Orlando
6. Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27
7. Chapelle O, Keerthi SS (2010) Efficient algorithms for ranking with SVMs. Inf Retr 13(3):201–215
8. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, vol 1. IEEE, San Diego. pp 539–546
9. Domingos P (2012) A few useful things to know about machine learning. Commun ACM 55(10):78–87
10. Gross R, Shi J (2001) The CMU Motion of Body (MoBo) Database. Tech. rep., Carnegie Mellon University
11. Han J, Bhanu B (2006) Individual recognition using gait energy image. IEEE Trans Pattern Anal Mach Intell 28(2):316–322
12. Hofmann M, Sural S, Rigoll G (2011) Gait recognition in the presence of occlusion: a new dataset and baseline algorithms. In: Proc. of the Int. Conf. on Computer Graphics, Visualization and Computer Vision, Plzen. pp 99–104
13. Hofmann M, Geiger J, Bachmann S, Schuller B, Rigoll G (2014) The TUM Gait from Audio, Image and Depth (GAID) Database: multimodal recognition of subjects and traits. J Vis Comun Image Represent 25(1):195–206
14. Hongye X, Zhuoya H (2015) Gait recognition based on gait energy image and linear discriminant analysis. In: Proc. of the IEEE Int. Conf. on Signal Processing, Communications and Computing (ICSPCC). IEEE, Ningbo. pp 1–4
15. Iwama H, Okumura M, Makihara Y, Yagi Y (2012) The OU-ISIR Gait Database comprising the large population dataset and performance evaluation of gait recognition. IEEE Trans. Inf Forensics Secur 7(5):1511–1521
16. Iwama H, Muramatsu D, Makihara Y, Yagi Y (2013) Gait verification system for criminal investigation. IPSJ Trans Comput Vis Appl 5:163–175
17. Lee M, Roan M, Smith B, Lockhart TE (2009) Gait analysis to classify external load conditions using discriminant analysis. Hum Mov Sci 28(2):226–235
18. Li X, Makihara Y, Xu C, Muramatsu D, Yagi Y, Ren M (2016) Gait energy response function for clothing-invariant gait recognition. In: Proc. of the 13th Asian Conf. on Computer Vision (ACCV 2016). Springer, Taipei. pp 257–272
19. Lynnerup N, Larsen PK (2014) Gait as evidence. IET Biometrics 3(2):47–54
20. Makihara Y, Mannami H, Yagi Y (2010) Gait analysis of gender and age using a large-scale multi-view gait database. In: Proc. of the 10th Asian Conf. on Computer Vision. Springer, Queenstown. pp 975–986
21. Makihara Y, Kimura T, Okura F, Mitsugami I, Niwa M, Aoki C, Suzuki A, Muramatsu D, Yagi Y (2016) Gait collector: an automatic gait data collection system in conjunction with an experience-based long-run exhibition. In: Proc. of the 8th IAPR Int. Conf. on Biometrics (ICB 2016). IEEE, Halmstad. pp 1–8
22. Makihara Y, Suzuki A, Muramatsu D, Li X, Yagi Y (2017) Joint intensity and spatial metric learning for robust gait recognition. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. IEEE, Honolulu. pp 6786–6796
23. Martín-Félez R, Xiang T (2012) Gait recognition by ranking. In: Fitzgibbon AW, Lazebnik S, Perona P, Sato Y, Schmid C (eds). ECCV (1), Lecture Notes in Computer Science, vol 7572. Springer, Berlin. pp 328–341
24. Martín-Félez R, Xiang T (2014) Uncooperative gait recognition by learning to rank. Pattern Recognit 47(12):3793–3806
25. Mori A, Makihara Y, Yagi Y (2010) Gait recognition using period-based phase synchronization for low frame-rate videos. In: Proc. of the 20th Int. Conf. on Pattern Recognition. IEEE, Istanbul. pp 2194–2197
26. Muramatsu D, Shiraishi A, Makihara Y, Uddin M, Yagi Y (2015) Gait-based person recognition using arbitrary view transformation model. IEEE Trans Image Process 24(1):140–154
27. Nixon M, Carter J, Shutler J, Grant M (2001) Experimental plan for automatic gait recognition. Tech. rep., Southampton
28. Nixon MS, Tan TN, Chellappa R (2005) Human identification based on gait. Int. Series on Biometrics. Springer-Verlag, Boston
29. Otsu N (1982) Optimal linear and nonlinear solutions for least-square discriminant feature extraction. In: Proc. of the 6th Int. Conf. on Pattern Recognition. IEEE, Munich. pp 557–560
30. Sarkar S, Phillips J, Liu Z, Vega I, Ther PG, Bowyer K (2005) The HumanID gait challenge problem: data sets, performance, and analysis. IEEE Trans Pattern Recog Mach Intell 27(2):162–177
31. Schultz C (2006) Digital keying methods. University of Bremen Center for Computing Technologies. Tzi 4(2):3
32. Shiraga K, Makihara Y, Muramatsu D, Echigo T, Yagi Y (2016) GEINet: View-invariant gait recognition using a convolutional neural network. In: Proc. of the 8th IAPR Int. Conf. on Biometrics (ICB 2016). IEEE, Halmstad. pp 1–8
33. Takemura N, Makihara Y, Muramatsu D, Echigo T, Yagi Y (2017) On input/output architectures for convolutional neural network-based cross-view gait recognition. IEEE Trans Circ Syst Video Technol 28(99):1–1
34. Tan D, Huang K, Yu S, Tan T (2006) Efficient night gait recognition based on template matching. In: Proc. of the 18th Int. Conf. on Pattern Recognition, vol 3. IEEE, Hong Kong. pp 1000–1003
35. Tao D, Li X, Wu X, Maybank S (2006) Human carrying status in visual surveillance. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, vol 2. IEEE, New York. pp 1670–1677
36. Uddin M, Muramatsu D, Kimura T, Makihara Y, Yagi Y (2017) MultiQ: Single sensor-based multi-quality multi-modal large-scale biometric score database and its performance evaluation. IPSJ Trans Comput Vis Appl 9(18):1–25
37. UK Court (2008) How biometrics could change security. http://news.bbc.co.uk/2/hi/programmes/click_online/7702065.stm
38. Wu Z, Huang Y, Wang L, Wang X, Tan T (2017) A comprehensive study on cross-view gait based human identification with deep CNNs. IEEE Trans Pattern Anal Mach Intell 39(2):209–226
39. Xu C, Makihara Y, Ogi G, Li X, Yagi Y, Lu J (2017) The OU-ISIR Gait Database comprising the large population dataset with Age and performance evaluation of age estimation. IPSJ Trans Comput Vis Appl 9(1):24
40. Yu S, Tan D, Tan T (2006) A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In: Proc. of the 18th Int. Conf. on Pattern Recognition, vol 4. IEEE, Hong Kong. pp 441–444
41. Zhang C, Liu W, Ma H, Fu H (2016) Siamese neural network based gait recognition for human identification. In: Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Shanghai. pp 2832–2836
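As a supplementary illustration of the one-versus-one voting discussed in Section 4.7, the following minimal Python sketch shows how K(K − 1)/2 pairwise decisions can be combined by majority voting, and how a correct classification rate (CCR) can be computed from the predictions. It is not the authors' mSVM implementation: the scalar toy feature, the nearest-prototype rule standing in for each trained binary SVM, the tie-breaking rule, and all function names are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations

# The seven carrying-status (CS) labels named in the paper.
CS_LABELS = ["NoCO", "FrCO", "BaCO", "SbCO", "SmCO", "MuCO", "CpCO"]


def num_pairwise_classifiers(k):
    """Number of binary classifiers in a one-versus-one scheme: K(K - 1)/2."""
    return k * (k - 1) // 2


def make_prototype_decider(prototypes):
    """Hypothetical stand-in for a trained binary SVM.

    For a pair of labels, it returns the label whose prototype value is
    closer to the (scalar, toy) sample. A real system would call a trained
    binary classifier here instead.
    """
    def decide(sample, label_a, label_b):
        dist_a = abs(sample - prototypes[label_a])
        dist_b = abs(sample - prototypes[label_b])
        return label_a if dist_a <= dist_b else label_b
    return decide


def predict_by_majority_vote(sample, labels, decide):
    """Run every pairwise decision and return the label with the most votes."""
    votes = Counter(decide(sample, a, b) for a, b in combinations(labels, 2))
    top = max(votes.values())
    # Assumption: ties are resolved in favor of the label listed first.
    return next(label for label in labels if votes[label] == top)


def correct_classification_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


if __name__ == "__main__":
    # Toy prototypes: label i sits at position i on a line (illustrative only).
    prototypes = {label: float(i) for i, label in enumerate(CS_LABELS)}
    decide = make_prototype_decider(prototypes)
    print(num_pairwise_classifiers(len(CS_LABELS)))          # 21 for K = 7
    print(predict_by_majority_vote(2.2, CS_LABELS, decide))  # BaCO
```

With seven CS labels, the scheme evaluates 21 pairwise decisions per test sample. Voting over pairs keeps each binary problem small, which is one plausible reading of why a shallow classifier can remain competitive in this setting.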
IPSJ Transactions on Computer Vision and Applications – Springer Journals
Published: May 30, 2018