Age-Group Estimation Using Feature and Decision Level Fusion

Abstract

Age-group estimation is a technique of classifying an individual’s face image as belonging to a particular range of ages. The stochastic nature of ageing among individuals makes age-group estimation based on face images a challenging task. Faces from different age-groups may have similar attributes, making facial age-group estimation even harder. Facial ageing is manifested as change in shape in lower age-groups and variations in texture in adult to old age-groups. As one ages, wrinkles start appearing in regions like the forehead, eye-corners, mouth region and cheek bone areas, among others. The general appearance of the face also varies across one’s lifetime from childhood to old age. Approaching age-group estimation as a multi-class classification problem, we adopt Linear Discriminant Analysis (LDA) for representation of global face appearance, landmark points and distance ratios between fiducial landmarks for shape representation, Local Binary Patterns (LBP) for local region texture description and Gabor filters for local region wrinkle representation. We fuse global shape and appearance features, and local wrinkle and texture features for each facial component, before applying an ensemble of machine learning tools comprising Artificial Neural Network (ANN) and Support Vector Machines (SVM) for age-group estimation. Using a majority voting scheme, we fuse decisions from global and component-based matchers to get the final age-group label. This approach was evaluated on four age-groups derived from the Face and Gesture Network (FG-NET) ageing dataset. Experimental results show that feature and decision fusion improve age-group estimation accuracies.

1. INTRODUCTION

Human faces provide prior information about one’s gender, identity, ethnicity, perceived age and mood, among others.
Alley [1] asserts that attributes derived from human facial appearance, such as mood and perceived age, significantly impact interpersonal behaviour. Automatic age estimation can be applied in a number of ways, such as preventing vending machines from dispensing alcohol and cigarettes to minors. Age estimation has attracted significant research in recent years [2]. Facial ageing has received substantial research attention, with increasing interest in age invariant face recognition, age estimation and face verification across age, among other areas [3–7]. Ageing is a stochastic, inevitable and irreversible process that causes variation in facial shape and texture. Ageing introduces significant change in facial shape in the formative years, and relatively large texture variations with only minor change in shape in older age groups [8, 9]. Shape variations in younger age groups are caused by craniofacial growth. Craniofacial studies have shown that human faces change from circular to oval as one ages [10]. These changes lead to variations in the position of fiducial landmarks [11]. During the craniofacial development stage, the forehead slants back, freeing space on the cranium. Eyes, ears, mouth and nose expand to cover the interstitial space created. The chin becomes protrusive as the cheeks extend. Facial skin remains moderately unchanged compared to shape. More literature on craniofacial development is found in [12]. As one ages, facial blemishes like wrinkles, freckles and age-spots appear. Underneath the skin, melanin producing cells are damaged due to exposure to the sun’s ultraviolet (UV) rays. Freckles and age-spots appear due to over-production of melanin. Consequently, light reflecting collagen not only decreases but also becomes non-uniformly distributed, making facial skin tone non-uniform [13]. The parts most adversely affected by sunlight are the upper cheek, nose, nose-bridge and forehead. Consequently, we selected these areas for skin wrinkle and texture analysis.
The most visible variations from adulthood to old age are skin variations exhibited as texture change. There is still minimal facial shape variation in these age groups. Biologically, as the skin grows old, collagen underneath the skin is lost [9]. Loss of collagen and the effect of gravity make the skin darker, thinner, leathery and less elastic. Facial spots and wrinkles appear gradually. The framework of bones beneath the skin may also start deteriorating, leading to accelerated development of wrinkles and variations in skin texture. More details about face ageing in adulthood are found in [12]. These variations in shape and texture across age could be modelled and used to automatically estimate someone’s age. Feature extraction in age estimation systems is critical since it affects the accuracy of the system. Age estimation systems use local, global or hybrid facial features. Given the stochastic nature of ageing, combining both local and global texture and shape features could result in better age and age-group classification results. The rest of this paper is organized as follows. Section 2 presents related work. Section 3 presents materials and methodology. Section 4 presents experiments and results. Section 5 presents discussion and recommendations.

2. RELATED WORK

Global, local and hybrid features have been previously used in age and age-group estimation. Ramanathan et al. [3] present a recent survey of automated age estimation techniques. Kwon and Lobo [14] estimated age-group based on anthropometry and density of wrinkles. They separated adults from babies using distance ratios between frontal face landmarks on a small dataset of 47 images. They also extracted wrinkle features from specific regions using an Active Contour Model. Young adults were differentiated from senior adults using these wrinkle indices. Baby group classification accuracy was lower than 68%, but the overall performance of their experiments was not reported.
Furthermore, the ratios used were mainly from baby faces. Horng et al. [15] used geometric features and a Sobel filter for texture analysis to classify face images into four groups. They used Sobel edge magnitude to extract and analyse wrinkles and skin variance. They achieved an accuracy of 81.6% on subjectively labelled age-groups. Dehshibi and Bastanfard [11] used distance ratios between landmarks to classify human faces into various age groups. Using a back propagation neural network with these ratios as inputs, face images were classified into four age-groups of below 15, 16–30, 31–50 and above 50. Using a private dataset, they reported 86% accuracy. Distance ratios alone are not suitable for capturing age variations since they are affected by pose, translation and rotation. Thukral et al. [16] used geometric features and decision fusion for age-group estimation. They achieved 70% accuracy for the 0–15, 15–30 and above 30 age-groups. The age-groups used are wide, making the technique hardly useful in age estimation applications. Farkas et al. [17] used 10 anthropometric measurements of the face to classify individuals into various ethnic groups. They analyzed these measurements and identified the ones that contribute significantly to diversity in facial shape across ethnic groups. They also found that horizontal and vertical measurements differ across ethnic groups. Tiwari et al. [18] developed a morphology-based face recognition technique using Euclidean distance measurements between fiducial facial landmarks. Using morphological features with a back propagation neural network, they reported a higher recognition rate than Principal Component Analysis (PCA) [19] with a back propagation neural network. This technique recognized faces, but it did not account for the ageing factor, since these distances vary as one ages.
This signifies that distances between facial landmarks differ at different ages, especially in young age-groups, and could therefore be used in age estimation. Gunay and Nabiyev [20] used spatial Local Binary Pattern (LBP) [21] histograms to classify faces into six age-groups. Using nearest neighbour classifiers, they achieved an accuracy of 80% on age-groups 10±5, 20±5, 30±5, 40±5, 50±5 and 60±5. In [22], Gunay and Nabiyev trained three Support Vector Machine (SVM) models for age-group estimation using Active Appearance Model (AAM) [23], LBP and Gabor filter [24] features. They fused decisions from these classifiers to obtain the final decision. Although they reported 90% accuracy on subsequent age estimation, the overall performance of age-group estimation was not reported. Hajizadeh and Ebrahimnezhad [25] represented facial features using Histogram of Oriented Gradients (HOG) [26]. Using a probabilistic neural network (PNN) to classify HOG features extracted from several regions, they achieved 87% accuracy in classifying face images into four groups. Liu et al. [27] build a region of certainty (ROC) to link uncertainty-driven shape features with particular surface features. Two shape features are first designed to determine face certainty and classify it. Thereafter, an SVM is trained on gradient orientation pyramid (GOP) [28] features for age-group classification. Testing this method on three age-groups, 95% accuracy was reported. They further used GOP in [29] with Analysis of Variance (ANOVA) for feature selection to classify faces into age-groups, using a linear SVM [30] to model features from the eyes, nose and mouth regions. They achieved an accuracy of 91% on four age groups on the FG-NET dataset and 82% on the MORPH dataset. It was also found that the overall performance of age estimation decreases as the number of age-groups increases. This is because the number of images in each age-group reduces drastically as the number of groups increases. Lanitis et al.
[31] adopted AAM to represent a face image as a vector of combined shape and texture parameters. They defined ageing as a linear, cubic or quadratic function. For automatic age estimation, they further evaluated a quadratic function, nearest neighbour and Artificial Neural Network (ANN) in [32]. They found that hierarchical age estimation achieves better results with quadratic function and ANN classifiers. Although AAMs have been extensively used, they do not extract texture information. This problem is avoided by using hybrid feature extraction techniques that combine both shape and texture features for age and age-group estimation. Adopting Ageing Pattern Subspace (AGES), Geng et al. [33] proposed automatic age estimation using the appearance of face images. AGES compensates for missing ages by learning a subspace representation of one’s images when modelling a series of a subject’s ageing faces. To estimate age, a test image is positioned at every possible position in the ageing pattern to find the point that can best reconstruct it. The ageing subspace that minimizes reconstruction error determines the age of the test image. Fu and Huang [34] used age separated face images to model a low-dimensional manifold. Age was estimated by linear and quadratic regression analysis of feature vectors derived from the respective low-dimensional manifold. The same approach of manifold learning was used by Guo et al. in [35]. They extracted face ageing features using an age learning manifold scheme and performed learning and age prediction using a locally adjusted regressor. Their approach reported better performance than Support Vector Regression (SVR) and SVM. A craniofacial ageing model that combines psychophysical and anthropometric evidence was proposed by Ramanathan et al. [36]. The model was used to simulate the perceived age of a subject across age for improving the accuracy of face recognition. Choi et al.
[13] proposed an age estimation approach using hierarchical classifiers with local and global facial features. Using Gabor filters for wrinkle extraction and LBP for skin feature extraction, they classified face images into age groups with SVM. This approach is error prone because it depends on only a single classifier. Wrong age-group classification leads to wrong age estimation. For accurate age estimation, age-group classification must be robust, and this can be achieved by use of an ensemble of classifiers. Chao et al. [37] determined the association between age labels and facial features by combining distance metric learning and dimensionality reduction. They used label-sensitive learning with K-Nearest Neighbour (KNN) and SVR for age estimation. Recently, Sai et al. [38] used LBP, Gabor and Biologically Inspired Features for face representation. They used Extreme Learning Machines (ELM) [39] for age-group estimation. Their approach achieved an accuracy of about 70%. Using LBP and a bank of Gabor filters, Wang et al. [40] classified images into four age-groups. They used SVM, Error Correcting Output Codes (ECOC) and AdaBoost for age-group estimation. These studies mostly use a single feature and a single classifier for age-group estimation. Intuitively, when a human eye focuses on a facial image, it extracts different features like texture, wrinkle, shape and general appearance. Extracting different features from a face image and using an ensemble of classifiers could increase age-group and age estimation accuracies. This study sought to find out how feature and decision fusion improve age-group estimation accuracies.

3. MATERIALS AND METHODS

3.1. Image preprocessing

The image is first smoothed using a 2D Gaussian spatial filter

$$F(a,b)=\frac{1}{2\pi\sigma^{2}}\,e^{-\frac{a^{2}+b^{2}}{2\sigma^{2}}} \tag{1}$$

where a=3 and b=3 are the width and the height of the Gaussian kernel, respectively, and σ is the smoothing degree. Values of a and b were determined empirically.
The value of σ is derived from the size b of the filter as

$$\sigma=0.3\times\big((b-1)\times 0.5-1\big)+0.8=0.8 \tag{2}$$

Given an image $I(x,y)$, an enhanced image $I_G(x,y)$ of size $m\times n$ is found by performing a convolution of $F(x,y)$ with $I(x,y)$ as

$$I_G(x,y)=\sum_{a=0}^{m-1}\sum_{b=0}^{n-1}F(a,b)\,I(x-a,y-b) \tag{3}$$

The smoothed image of $I(x,y)$ is $I_G(x,y)$. Image preprocessing helps in reducing noise and enhances facial features like edges, wrinkle lines and texture features. Images to be processed are expected to be frontal face images. The images are converted to grey-scale single channel images before being processed.

3.2. Facial component detection

The facial components used for age-group estimation in this study are Left-Eye (LE), Right-Eye (RE), Nose (NO), Mouth (MO), Nose-bridge (NB), Left-Cheek (LC), Right-Cheek (RC) and Forehead (FH). The face region is first detected from a given image using a Haar-cascade classifier [41], then cropped and resized to 100×100 pixels. Figure 1 shows the components used in this study and how they are annotated.

Figure 1. Components used for age-group estimation.

The components are detected within the cropped face using the landmark localization technique proposed in [42]. In this technique, eyes are detected using Viola and Jones [43] Haar-cascade classifiers due to their better detection accuracies. Spatial and geometric information of the eyes is used to locate and crop the forehead, cheeks and nose-bridge. Nose and mouth are detected using Haar-cascade classifiers [44, 45]. Eyes, nose-bridge and cheeks are cropped and resized to 50×50 pixels while the forehead is cropped and resized to 70×30 pixels. The nose is cropped and resized to 30×60 pixels while the mouth is cropped and resized to 60×30 pixels. Component images were resized so that respective component images are of the same size for efficient extraction of appearance features using LDA.
Component image sizes after cropping were determined by the aspect ratio of the respective components.

3.3. Feature extraction

Facial shape and appearance are extracted from the whole face image. The image is then sub-divided into eight components as shown in Fig. 1. As proposed by Suo et al. [46], in order to improve age-group estimation, global features (shape and appearance) should be used together with local features (texture and wrinkles). Wrinkle features are extracted using Gabor filters and texture features using LBP. Figure 2 shows 32 Gabor filters, while Fig. 3 shows the results of applying these filters to the RE region shown in Fig. 1.

Figure 2. Gabor filters from eight orientations and four scales. Each row represents a scale and each column represents an orientation. 0≤θ<π and π/4≤γ≤π.

Figure 3. Thirty-two responses after applying the filters in Fig. 2 on RE in Fig. 1.

Originally introduced by Dennis Gabor in 1946 [24], Gabor filters have been extensively used for wrinkle, edge and texture feature extraction due to their capability of determining the orientation and magnitude of wrinkles [13]. The Gabor filter has been regarded as the best texture descriptor in object recognition, segmentation, tracking of motion and image registration [47]. Since wrinkles appear as edge-like components with high frequency, the Gabor edge analysis technique has been commonly used for wrinkle feature extraction. The Sobel filter [15, 48], Hough transform [49] and active contours [14] are among the most commonly used texture edge descriptors. However, edges in a face image also include features such as beards, moustaches, hair and shadows.
To reduce the effect of this noise, Choi et al. [13] propose that the predominant orientation of wrinkles be considered in wrinkle feature extraction. The 2D spatial domain Gabor function is defined as

$$g(x,y)=\frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac{1}{2}\left(\frac{x^{2}}{\sigma_x^{2}}+\frac{y^{2}}{\sigma_y^{2}}\right)+2\pi jWx\right] \tag{4}$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of the distribution along the x and y axes, respectively, and W is the sinusoidal radial frequency. The Fourier transform of the function in equation (4) is expressed as

$$G(u,v)=\exp\left\{-\frac{1}{2}\left[\frac{(u-W)^{2}}{\sigma_u^{2}}+\frac{v^{2}}{\sigma_v^{2}}\right]\right\} \tag{5}$$

where $\sigma_u=\frac{1}{2\pi\sigma_x}$ and $\sigma_v=\frac{1}{2\pi\sigma_y}$. A Gabor filter bank is obtained by rotations and dilations of $g(x,y)$. The general equation for creating a Gabor filter bank can be expressed as

$$g_b(x,y)=a^{-m}g(\bar{x},\bar{y}) \tag{6}$$

where $\bar{x}=x\cos\theta+y\sin\theta$ and $\bar{y}=-x\sin\theta+y\cos\theta$, with $\theta=\theta_k=\frac{\pi(k-1)}{n}$, $k=1,2,3,\ldots,n$, where n is the number of orientations used and $a^{-m}$ is the filter scale for $m=0,1,2,\ldots,S$ for S scales. We refer readers to [47, 50] for more details on Gabor filters.

Wrinkle features were extracted from the components in Fig. 1. A set of Gabor responses at a given sampling point constitutes a wrinkle feature. For each component, p wrinkle features are extracted, where p>0 represents the size of the Gabor filter bank. The mean μ and standard deviation σ of the Gabor filter response magnitude are used to determine wrinkle features. These p wrinkle features from a component are concatenated to form the wrinkle feature vector of that particular component.

Apart from wrinkle analysis, texture features were also extracted and analyzed. Unlike wrinkles, which appear at particular points on a face, facial texture appears randomly on every facial part. LBP is a texture description technique that can detect microstructure patterns like spots, edges, lines and flat areas on the skin [21]. LBP is used to describe texture for face recognition, gender classification, age estimation, face detection, and face and facial component tracking. Figure 4 shows a sample 3×3 LBP operation.

Figure 4. LBP operation with P=8, R=1. (a) Sample image region, (b) thresholding and (c) resultant LBP code.

LBP assigns a code to each pixel in an image by comparing it to its neighbours within a particular radius. The LBP code is created by

$$LBP_{P,R}(x_c,y_c)=\sum_{n=0}^{P-1}2^{n}\,\tau(g_n-g_c) \tag{7}$$

where P is the number of neighbouring pixels, R is the distance of a neighbouring pixel from the centre pixel, $g_c$ is the grey value of the centre pixel, and $g_n$ for $n=0,1,2,\ldots,P-1$ is the grey value of a neighbouring pixel on a circular symmetric neighbourhood of distance R>0. The thresholding function τ that generates a binary bit for a particular pixel is defined as

$$\tau(x)=\begin{cases}1 & \text{if } x\ge 0\\ 0 & \text{if } x<0\end{cases} \tag{8}$$

Concatenating all 8 bits gives a binary number. The resulting binary number is converted to a decimal and assigned to the centre pixel as its LBP code. Ojala et al. [21] further categorized LBP codes as uniform and non-uniform patterns. An LBP pattern with at most two bitwise transitions from 0 to 1 or 1 to 0 is categorized as a uniform pattern. For instance, 00000000, 00010000 and 11011111 are uniform patterns while 01010000, 11100101 and 10101001 are non-uniform patterns. Ojala et al. [51] found that when using eight neighbours and radius 1, 90% of all patterns are uniform patterns. The original LBP operator was limited in capturing dominant features with large scale structures. The operator was later extended to capture texture features with neighbourhoods of different radii [51]. The neighbourhood is defined by a set of sampling pixels distributed evenly along the circumference of a circle centred at the pixel to be labelled. Sampling points that do not fall exactly on pixels are bilinearly interpolated, which allows any radius and any number of sampling pixels. Uniform patterns may represent microstructures such as a line, spot, edge or flat area. Figure 5 shows the microstructure pattern representation while Fig. 6 shows LBP codes for all uniform patterns in the $LBP_{8,1}$ neighbourhood. In order to extract rotation invariant features using LBP, the generated LBP code is circularly rotated until its minimum value is obtained [52]. The extended LBP operator could capture more texture features in an image, but it still could not preserve spatial information about these features. Ahonen et al. [53] proposed a technique of dividing a face image into n cells. Histograms are generated for each cell and then concatenated into a single spatial histogram. The spatial histogram preserves both spatial and texture description of an image.

Figure 5. Microstructure pattern LBP code, P=8, R=1.

Figure 6. Example of LBP codes that represent uniform patterns, P=8, R=1.

Image texture features are finally represented by a histogram of LBP codes. The LBP histogram contains a detailed texture descriptor for all structures on the face image like spots, lines, edges and flat areas.

AAM [23] are used to represent shape and appearance of faces. AAM statistically represents face appearance and shape using PCA [19]. AAM face representation does not capture enough texture and wrinkle information. We used the 68 landmark points in the Face and Gesture Recognition Network (FG-NET) ageing database and ratios of distances between fiducial landmarks to represent global facial shape. Figure 7 shows a labelled image and the 68 landmark points.

Figure 7. Landmark points used in facial shape. (a) Labelled face and (b) 2D-landmark points.

These points can be determined by use of an appropriate 2D landmarking algorithm like the one proposed in [54].
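The LBP operator of equations (7) and (8), the uniform-pattern test and the rotation-invariant mapping described above can be sketched in a few lines of plain Python. This is only an illustration for the P=8, R=1 case on a 3×3 patch: the function names and the clockwise neighbour ordering starting at the top-left corner are assumptions made here, not details given in the paper.

```python
def lbp_code(patch):
    """LBP code of the centre pixel of a 3x3 patch (P=8, R=1)."""
    gc = patch[1][1]
    # Neighbour order: clockwise from the top-left corner (an assumption).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for n, (r, c) in enumerate(offsets):
        if patch[r][c] - gc >= 0:   # thresholding function of equation (8)
            code += 1 << n          # weight 2^n of equation (7)
    return code


def is_uniform(code, bits=8):
    """True if the circular bit pattern has at most two 0/1 transitions."""
    transitions = 0
    for n in range(bits):
        a = (code >> n) & 1
        b = (code >> ((n + 1) % bits)) & 1
        transitions += a != b
    return transitions <= 2


def rotation_invariant(code, bits=8):
    """Circularly rotate the code and keep the minimum value."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask for k in range(bits))


# Hypothetical 3x3 grey-level patch, used only to exercise the functions.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)
print(code, is_uniform(code), rotation_invariant(code))
```

The uniform and non-uniform example patterns quoted in the text can be checked directly with `is_uniform(0b00010000)` and `is_uniform(0b01010000)`.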
Global face appearance features are extracted by Linear Discriminant Analysis (LDA) [19, 55]. LDA is a feature extraction technique that searches for the features that best discriminate between classes. Given a set of independent features, LDA creates a linear combination of these features such that the largest mean differences between classes are achieved. The detected face is resized to 100×100 pixels. The image is then flattened into a 1×10 000 vector. We use LDA for appearance feature extraction. We first project the 1×10 000 face vector to a PCA subspace to reduce the dimensionality of the input image data from 10 000 to $N-c$, where N is the number of samples and c is the number of age-groups to be estimated. This is done before applying LDA to ensure that $S_w$ does not become singular, as proposed in [56, 57]. We perform LDA on the face images projected onto the PCA space. For each image in all ages, we define the within-class scatter matrix

$$S_w=\sum_{j=1}^{c}\sum_{i=1}^{N_j}\left(x_i^{j}-\mu_j\right)\left(x_i^{j}-\mu_j\right)^{T} \tag{9}$$

where $x_i^{j}$ is the ith image of age j, $\mu_j$ is the mean of age j, c is the number of ages to be estimated and $N_j$ is the number of images in age j. We define the between-class scatter matrix as

$$S_b=\sum_{j=1}^{c}\left(\mu_j-\mu\right)\left(\mu_j-\mu\right)^{T} \tag{10}$$

where μ is the mean of all ages. The main objective of LDA is to maximize the between-class scatter while minimizing the within-class scatter. One way of doing this is maximizing the ratio $\frac{\det|S_b|}{\det|S_w|}$. Given that $S_w$ is non-singular, it has been proven [55] that this ratio is maximized when the column vectors of the projection matrix are the eigenvectors of $S_w^{-1}S_b$. The maximum rank of $S_w$ is $N-c$ with N samples and c classes. This, therefore, requires at least $N=t+c$ samples to guarantee that $S_w$ does not become singular, where t is the dimensionality of the input data. The number of samples N is almost always smaller than t, making the scatter matrix $S_w$ singular. To solve this problem, [56, 57] propose projecting the input data to a PCA subspace, to reduce dimensionality to $N-c$ or less, before applying LDA.
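As a worked illustration of the dimensionality bookkeeping above, the following sketch tracks the sizes involved when PCA is applied before LDA. The training-set size N = 800 and the symbols $W_{\mathrm{pca}}$ and $W_{\mathrm{lda}}$ are assumptions introduced only for this example; t = 10 000 and c = 4 follow the text. The bound on the rank of $S_b$ holds because the c vectors $\mu_j-\mu$ are linearly dependent.

```latex
\begin{aligned}
t &= 10\,000,\quad N = 800,\quad c = 4
  &&\Rightarrow\quad N - c = 796 \ll t,\\
W_{\mathrm{pca}} &\in \mathbb{R}^{t\times(N-c)}
  &&\Rightarrow\quad S_w \in \mathbb{R}^{796\times 796}\ \text{can have full rank } N-c,\\
\operatorname{rank}(S_b) &\le c-1 = 3
  &&\Rightarrow\quad W_{\mathrm{lda}} \in \mathbb{R}^{(N-c)\times(c-1)},\\
y &= W_{\mathrm{lda}}^{T}\,W_{\mathrm{pca}}^{T}\,x,
  &&\phantom{\Rightarrow}\quad x\in\mathbb{R}^{t},\ y\in\mathbb{R}^{c-1}.
\end{aligned}
```

Under these assumptions, each 10 000-dimensional face vector would be reduced to at most three discriminant coordinates, and without the PCA step $S_w$ would be a 10 000×10 000 matrix of rank at most 796, hence singular.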
PCA and LDA are widely used appearance feature extraction methods in pattern recognition [58]. Consequently, we adopt LDA for extraction of global face appearance features for age-group estimation. We project the input data to a PCA subspace, as recommended, before applying LDA. Thereafter, we combine the LDA and PCA eigenvectors using the GEMM [59] operation and use the combined eigenvectors to project each of the face data. We take these projections as the appearance feature vector for age estimation.

3.4. Feature fusion

Fusion is a technique of combining multiple sources of information in a biometric system [60]. Fusion in multi-biometric systems can be categorized into two broad categories: (1) fusion prior to matching, and (2) fusion after matching. Fusion prior to matching is further divided into two: sensor level fusion and feature level fusion. Sensor level fusion is used when multiple sources are used to represent samples of the same biometric trait. Feature level fusion combines features extracted from multiple biometric sources or features extracted by different algorithms. Fusion after matching includes matching score level fusion, rank level fusion and decision level fusion. In decision level fusion, outputs from different matchers are combined using some rule to generate a final decision. The rules could be AND rules, OR rules and majority voting, among others [61]. We use feature level and decision level fusion. As reported by Ross et al. [62], feature level fusion results in a richer feature representation of the raw data, leading to better classification. We, therefore, adopt feature level fusion before classification, and a majority vote scheme is used for decision level fusion to combine decisions from multiple matchers for the final decision. We extract global shape features $f_1=(s_1,s_2,\ldots,s_n)$, a global appearance feature vector $f_2=(a_1,a_2,\ldots,a_x)$, local wrinkle features $f_3=(w_1,w_2,\ldots,w_p)$ and local texture features $f_4=(t_1,t_2,\ldots,t_m)$.
The values n>0, x>0, p>0 and m>0 represent the dimensionality of each respective feature vector. Feature dimensionality is normalized using LDA. We perform z-score normalization of these features before fusion. The normalization is done by

$$f_{\mathrm{norm}_i}=\frac{f_i-\mu_i}{\sigma_i} \tag{11}$$

where $f_i$ is a facial ageing feature, $\mu_i$ and $\sigma_i$ are the mean and the standard deviation of this feature, and $f_{\mathrm{norm}_i}$ is the normalized value of feature i. We thereafter fused feature $f_1$ with feature $f_2$ and feature $f_3$ with feature $f_4$ by concatenating them as

$$f_g=f_1\oplus f_2,\qquad f_l=f_3\oplus f_4 \tag{12}$$

where $f_g$ and $f_l$ represent the fused global feature vector and the fused local feature vector, respectively. Feature fusion often leads to high-dimensional feature vectors, leading to the ‘curse of dimensionality’ problem. Thus, the dimensionality of each of the fused feature vectors is reduced by PCA. We perform decision fusion using a majority voting scheme. Given T classifiers, each having a single vote, the predicted class is the one with T votes, the one with $\lfloor T/2\rfloor+1$ votes, or the one with a simple majority of votes [63]. The final decision is: select class $\omega_x$ if

$$\sum_{t=1}^{T}d_{t,x}=\max_{j=1}^{c}\sum_{t=1}^{T}d_{t,j} \tag{13}$$

The assumptions in this technique are that T is odd, that for any instance x the probability that each classifier will give the correct class label is p, and that p is independent for each classifier [64]. We use the simple majority voting rule to get the final output of the ensemble.

3.5. Overview of proposed method

Given a face image, global shape features $S_i^{j}$ and global appearance features $A_i^{j}$ for subject i at age-group j are extracted. $A_i^{j}$ and $S_i^{j}$ are fused to get a single global feature vector $f_G$. $f_G$ is used to train a classifier for age-group estimation based on the global feature vector. The face image is divided into R>1 components. For each component $R_r$ for $r=1,2,3,\ldots,R$, where R is the number of components, the local wrinkle feature $W_{i,r}^{j}$ and local texture feature $T_{i,r}^{j}$ of subject i at age-group j are extracted.
$W_{i,r}^{j}$ and $T_{i,r}^{j}$ are fused into a single local feature vector $f_{L_r}$ for component r. A classifier is trained on each $f_{L_r}$. A total of $T=R+1$ classifiers are trained. Each of these classifiers returns an age-group label for its respective feature vector. Figure 8 shows a block diagram of the proposed method. SVM [13, 35, 65], LDA [66], Nearest Neighbour [20], ANN [15, 67] and Extreme Learning Machines (ELM) [38] have been used separately in previous studies for age estimation. Each classifier has its own inherent limitations. We propose a technique that uses an ensemble of back propagation ANN and SVM with a Radial Basis Function (RBF) kernel. A final age-group label is determined by decision level fusion. Decisions from all the classifiers are fused using the majority voting rule, whereby the estimated age-group is the one with the most votes.

Figure 8. Block diagram of the proposed method.

3.6. Experiments

The experiment was performed on the publicly available FG-NET ageing dataset [2, 68]. The dataset consists of 1002 high-resolution colour or grey-scale face images of 82 multi-race subjects. The images in this dataset have large variations of lighting, pose and expression. Some images have adverse conditions because they were scanned. Age-group estimation performance is measured by the cumulative score (CS) [33] at error level 0, which is defined as

$$CS(x)=\frac{n_x}{N_x}\times 100\% \tag{14}$$

where $n_x$ is the number of test images correctly recognized as belonging to age-group x and $N_x$ is the total number of test images in age-group x. We split our dataset into two sets: a training set and a testing set. We further split the training set into k equal subsets. We perform k-fold cross-validation by training classifiers on k−1 subsets and testing them on the remaining subset. We perform cross-validation to choose the best γ and C RBF kernel parameters and the number of layers and nodes in each layer for the ANN.
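The decision fusion rule of equation (13) and the cumulative score of equation (14) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example votes and the tie-breaking rule (smallest label wins) are assumptions, since the paper does not state how ties are resolved. With the eight facial components used here, T = R + 1 = 9 classifiers vote.

```python
from collections import Counter


def majority_vote(labels):
    """Return the age-group label that received the most votes."""
    counts = Counter(labels)
    best = max(counts.values())
    # Tie-break by the smallest label: an assumption, not from the paper.
    return min(label for label, n in counts.items() if n == best)


def cumulative_score(true_labels, predicted_labels, group):
    """CS(x) at error level 0: percentage of test images of `group`
    that were assigned to `group`."""
    pairs = list(zip(true_labels, predicted_labels))
    total = sum(1 for t, _ in pairs if t == group)
    correct = sum(1 for t, p in pairs if t == group and p == group)
    return 100.0 * correct / total if total else float("nan")


# Hypothetical votes from T = 9 classifiers (8 components + 1 global).
votes = [3, 3, 4, 3, 2, 3, 4, 3, 3]
print(majority_vote(votes))  # prints 3

# Hypothetical ground truth and fused predictions for four test images.
truth = [1, 1, 2, 2]
fused = [1, 2, 2, 2]
print(cumulative_score(truth, fused, 1))  # prints 50.0
```

Because each classifier casts exactly one vote, counting labels is equivalent to summing the indicator decisions $d_{t,j}$ over t and taking the class with the largest sum.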
We try different pairs of (C,γ) on the kernel SVM and different numbers of hidden layers and nodes in the ANN. We trained an ANN with four layers, with 3 neurons in the input layer, 40 neurons in each of the second and third layers and 4 neurons in the output layer. We used the sigmoid activation function for our neuron processing. For SVM, we used parameters γ=0.12 and C=100. Each classifier is tested on each of the components and the one that performs better is chosen for a specific component.

4. RESULTS AND DISCUSSION

This section presents results of the experiments and performance evaluation of the proposed age-group estimation method. We used four age groups since age-group estimation accuracy reduces drastically with an increase in the number of groups, due to the reduction in the number of images per group [29]. The age grouping used is almost similar to the grouping in [27, 29]. In this study, the age-groups are formulated as shown in Table 1.

Table 1. Grouping of images used in this study.

Age-group    Number of images    % Distribution
0–6          274                 27.3
7–14         271                 27.0
15–24        249                 24.9
25–69        208                 20.8
Total        1002                100

Age-group classification was performed on individual fused feature vectors to determine their performance. Experimental results for age-group estimation based on global shape and appearance features are shown in Table 2. We observed that age-group estimation accuracy is higher in Age-groups 1 and 4. During lower age-groups, the shape of the face changes significantly while appearance remains relatively constant.
Shape features play a major role in classifying images in these age groups.

Table 2. Cumulative scores of age estimation using holistic shape and appearance features (Shape+Appearance).

| Age-group | SVM (%) | ANN (%) |
|-----------|---------|---------|
| 1         | 78.2    | 94.5    |
| 2         | 75.1    | 75.8    |
| 3         | 77.4    | 80.6    |
| 4         | 80.3    | 82.7    |

In adulthood, facial shape remains relatively constant while appearance changes significantly. In these age-groups, appearance features are vital in discriminating between age-groups. Fusing these two features contributes to better accuracy across all age groups.

Table 3 shows the cumulative scores for age-group estimation based on fused forehead (FH) texture and wrinkle features. FH Gabor and LBP features discriminate between Age-groups 1 and 2 with accuracies in the range of 45–65%. As one grows into Age-group 3, the forehead texture becomes distinct from that of Age-groups 1 and 2. Accuracies improve to around 80% in Age-group 4. This could be attributed to the introduction of heavy wrinkle lines on the forehead in Age-group 4. The performance of these features could also be adversely affected by hair styles that cover part of the forehead.

Table 3. Cumulative scores of age estimation using forehead texture and wrinkle features (Texture+Wrinkle).

| Age-group | SVM (%) | ANN (%) |
|-----------|---------|---------|
| 1         | 45.2    | 51.5    |
| 2         | 64.3    | 53.7    |
| 3         | 72.4    | 74.2    |
| 4         | 78.5    | 82.6    |

As shown in Table 4, LE and RE LBP and Gabor features perform quite well in age-group estimation. Age-group discrimination using LE texture and wrinkle features achieved accuracy of 50–70% in Age-groups 1 and 2. This could be attributed to minimal texture variations in these age groups. Wrinkles are also rarely found around the eye regions in these age groups. Accuracy of between 70% and 80% was achieved in Age-groups 3 and 4. The accuracy improvement in the last two age groups could be attributed to significant texture variations around the eye region and the introduction of wrinkles as one grows.

Table 4. Cumulative scores of age estimation using eyes texture and wrinkle features (Texture+Wrinkle).

| Age-group | LE SVM (%) | LE ANN (%) | RE SVM (%) | RE ANN (%) |
|-----------|------------|------------|------------|------------|
| 1         | 54.2       | 52.3       | 55.7       | 53.1       |
| 2         | 66.0       | 64.4       | 65.8       | 65.5       |
| 3         | 73.1       | 75.7       | 74.2       | 76.7       |
| 4         | 76.3       | 79.6       | 76.8       | 78.4       |

As shown in Table 5, LBP and Gabor features from LC and RC performed slightly worse than eye features in Age-groups 1 and 2. As we move to Age-groups 3 and 4, age spots, wrinkles and general texture variations start appearing in the cheek bone areas. This leads to an improvement in the performance of LC and RC features from about 48–63% in Age-groups 1 and 2 to the range of 75–84% in Age-groups 3 and 4.

Table 5. Cumulative scores of age estimation using cheeks texture and wrinkle features (Texture+Wrinkle).

| Age-group | LC SVM (%) | LC ANN (%) | RC SVM (%) | RC ANN (%) |
|-----------|------------|------------|------------|------------|
| 1         | 48.4       | 53.2       | 50.3       | 53.7       |
| 2         | 62.5       | 60.0       | 63.1       | 61.0       |
| 3         | 78.1       | 76.8       | 79.2       | 76.2       |
| 4         | 81.4       | 83.2       | 80.8       | 81.6       |

As shown in Table 6, performance on Age-group 4 was the highest, with an accuracy of 75–84% on NB texture and wrinkle features. This could be attributed to the introduction of heavy wrinkle lines and significant texture variations around NB in adulthood and old age. In old age, freckles start appearing around NB due to over-exposure to UV rays from the sun. Accuracies of LBP and Gabor features from the NB region in Age-groups 1 and 2 were lower, in the range of 50–60%. This is due to relatively constant skin texture and the lack of wrinkle features in this region. In Age-group 3, accuracies improve to the range of 60–70%. Similarities between subjects in Age-group 3 and those in Age-groups 2 and 4 could have contributed to this lower accuracy. We observed that the NB region provides rich texture and wrinkle information for distinguishing Age-group 3 from Age-group 4. Wrinkle and texture features from the NO region performed slightly better than features from the NB region for Age-groups 1 and 2.

Table 6. Cumulative scores of age estimation using nose-bridge (NB) and nose (NO) texture and wrinkle features (Texture+Wrinkle).

| Age-group | NB SVM (%) | NB ANN (%) | NO SVM (%) | NO ANN (%) |
|-----------|------------|------------|------------|------------|
| 1         | 55.4       | 53.2       | 57.2       | 65.8       |
| 2         | 57.8       | 58.1       | 72.1       | 69.6       |
| 3         | 61.5       | 69.0       | 75.5       | 73.3       |
| 4         | 75.1       | 83.6       | 78.2       | 80.8       |

For the NO region, Age-group 4 had the highest accuracy of 78–81%, with Age-groups 2 and 3 having accuracies of about 70–75%. Similar to other regions, Age-group 1 had the lowest accuracy, between 57% and 66%. Performance in Age-groups 3 and 4 could have been affected by partial occlusion, nose shadow and the presence or absence of a moustache.

Table 7 shows the performance of texture and wrinkle features from the MO region. Age-groups 1 and 2 had accuracies of between 67% and 71%.

Table 7. Cumulative scores of age estimation using mouth texture and wrinkle features (Texture+Wrinkle).

| Age-group | SVM (%) | ANN (%) |
|-----------|---------|---------|
| 1         | 67.2    | 69.3    |
| 2         | 70.5    | 68.8    |
| 3         | 59.5    | 58.7    |
| 4         | 62.4    | 63.2    |

Performance of features from MO was the lowest in Age-groups 3 and 4, with an accuracy of 58–64%. This could be attributed to facial expressions like smiling and laughing that strongly influence LBP and Gabor features extracted from the MO region.
Another factor that could have affected this accuracy is the presence of moustaches and beards around the mouth.

We combined the two classifiers to make an ensemble of nine classifiers: one for each facial component and one for the global fused feature. Using the majority voting strategy, we fused decisions from these classifiers in order to get a single ensemble output for each test sample. Figure 9 shows the overall performance of feature and decision level fusion in age-group estimation.

Figure 9. Cumulative score for age-group estimation based on feature and decision level fusion of global shape and appearance and local wrinkle and texture features.

Using a hybrid of global and local facial features with feature fusion and decision fusion significantly improves age-group estimation. Age-group 1 had the highest overall accuracy, 97.8%. Age-group 2 recorded an overall performance of 88.6%, while Age-group 3 recorded 89.5%. Age-group 4 had a performance of 93.7%. The general performance of this approach is 92.4%, which is calculated as

$$\text{accuracy} = \sum_{i=1}^{4} \frac{n_i}{N} \times a_i \qquad (15)$$

where $n_i$ is the number of images in Age-group $i$, $N$ is the total number of images (1002) and $a_i$ is the accuracy of Age-group $i$. The high performance of Age-group 4 could be attributed to the significant contribution of appearance, texture and wrinkle features. In this age-group, appearance, texture and wrinkles vary significantly compared with subjects in lower age-groups. The lower performance of Age-group 2 could be attributed to the similarity of subjects in this age-group to subjects in Age-groups 1 and 3.

4.1. Comparison with previous methods

It is challenging to compare age-group estimation approaches because different approaches use different age groupings.
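The majority voting rule and the weighted average of Equation (15) from Section 4 can be checked with a short Python sketch. The function names `majority_vote` and `weighted_accuracy` are hypothetical, the nine votes are invented, and the tie-breaking rule (lowest label wins) is an assumption, since the text does not say how ties among the nine classifiers are resolved. The group sizes come from Table 1 and the per-group accuracies are those reported in Section 4.

```python
from collections import Counter


def majority_vote(labels):
    """Return the age-group label with the most votes.

    Ties go to the lowest label; this rule is an assumption, as the
    source text does not specify how ties are resolved.
    """
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def weighted_accuracy(group_sizes, group_accuracies):
    """Equation (15): sum over groups of (n_i / N) * a_i."""
    total = sum(group_sizes)
    return sum(n / total * a for n, a in zip(group_sizes, group_accuracies))


# Nine invented votes: one global matcher plus eight component matchers.
votes = [3, 4, 4, 3, 4, 2, 4, 4, 3]
print(majority_vote(votes))  # label 4 receives five of nine votes -> 4

sizes = [274, 271, 249, 208]            # images per age-group (Table 1)
accuracies = [97.8, 88.6, 89.5, 93.7]   # per-group accuracy in percent
print(round(weighted_accuracy(sizes, accuracies), 1))  # 92.4
```

The second result reproduces the reported overall figure: (274 × 97.8 + 271 × 88.6 + 249 × 89.5 + 208 × 93.7) / 1002 ≈ 92.4.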
We compare the age-group estimation performance in this paper with that of previous approaches that use similar age-groups. As shown in Fig. 9, fusing shape, appearance, texture and wrinkle features improved performance in middle age groups from the 40–60% reported in [38] to the range 80–90%. Adult age-group performance also improved to 93.7%, compared with 87.5% reported in [40]. We achieved a performance of 89.5% in the teen age-group (15–24 years), compared with the teen age-group (12–21 years) performance of 87% reported in [40]. Age-group 1 (0–6 years) performance improved to 97.8%, compared with the 92.1% baby group (0–3 years) accuracy reported in [27]. Our approach achieved 93.7% accuracy in the adult age group (25–69 years), compared with 90.1% reported in [27] for age-group 20–59 years. We achieved a performance of 89.5% in Age-group 3 (15–24 years), compared with 75% for age range 16–30 years reported in [25]. Table 8 summarizes the comparison of the overall performance of this approach with previous methods in the literature.

Table 8. Comparison of age-group estimation performance with previous studies.

| Method | Age grouping | Accuracy (%) |
|--------|--------------|--------------|
| 2D-Landmark Points, Regression Vector Machines, Partial Least Squares, Nearest Neighbour, Naive Bayes, Fisher Linear Discriminant [16] | 0–15, 15–30, 30+ | 70.0 |
| LBP, Minimum Distance, Nearest Neighbour [20] | 10±5, 20±5, 30±5, 40±5, 50±5, 60±5 | 80.0 |
| Anthropometric Distance Ratios, Canny, Artificial Neural Network [11] | 0–15, 16–30, 31–50, 51+ | 86.6 |
| HOG, Probabilistic Neural Network [25] | 0–15, 16–30, 31–50, 50+ | 87.0 |
| 2D-Landmark Points, GOP, ANOVA, SVM (GAS) [27] | 0–3, 4–19, 20–59 | 91.4 |
| Gradient of Oriented Pyramids, SVM [29] | 0–5, 6–12, 13–21, 22–69 | 91.3 |
| Proposed: Anthropometric Distance Ratios, LBP, 2D-Landmark Points, Gabor, LDA, SVM, Artificial Neural Network | 0–6, 7–14, 15–24, 25–69 | 92.4 |

As shown in Table 8, our approach has improved overall age-group estimation accuracy from the 91.4% achieved in [27] using Gradient of Oriented Pyramids (GOP), ANOVA and SVM (GAS) and the 91.3% reported in [29] to 92.4%. This shows that using both global and local features with feature fusion and decision fusion improves the performance of age-group estimation.

5. CONCLUSION

An age-group estimation method using feature fusion, decision fusion and an ensemble of classifiers is proposed.
Global shape features are extracted using landmark points and distance ratios between fiducial points. LDA is used to extract global appearance features. After z-score normalization of each feature vector, global shape and appearance features are fused into one feature vector for training one of the classifiers. The face image is then divided into eight facial components. Component-based texture and wrinkle features are extracted by LBP and Gabor filters, respectively. For each component, these features are z-score normalized and then fused to make one feature vector. LDA is used to normalize the dimensionality of features before z-score normalization and fusion. Fused features are used to train an ensemble made of ANN and SVM classifiers for age-group estimation. A total of nine models are trained. Decision level fusion is adopted by use of a majority voting scheme for the final age-group label. Experimental results on the widely used FG-NET ageing dataset show that the proposed approach achieves better accuracy in age-group estimation than previous methods. To improve exact age estimation, in-depth inquiry is necessary into performing age-group estimation with narrow uniform age groups (0–4, 5–9, 10–14, ..., 96–100). Further inquiry may be done to verify the robustness of the proposed method across various face ageing datasets. Further investigation is required to make age-group and age estimation robust to various dynamics like facial expression, cosmetics, illumination and differences in perceived ageing between males and females.

REFERENCES

1 Alley, R. (1998) Social and Applied Aspects of Perceiving Faces. Lawrence Erlbaum Associates, Inc.
2 Panis, G., Lanitis, A., Tsapatsoulis, N. and Cootes, T.F. (2016) Overview of research on facial ageing using the FG-NET ageing database. IET Biometrics, 5, 37–46.
3 Ramanathan, N., Chellappa, R. and Biswas, S. (2009) Computational methods for modeling facial aging: a survey. J. Vis. Lang. Comput., 20, 131–144.
4 Raval, M.J. and Shankar, P. (2015) Age invariant face recognition using artificial neural network. Int. J. Adv. Eng. Res. Dev., 2, 121–128.
5 Sonu, A., Sushil, K. and Sanjay, K. (2014) A novel idea for age invariant face recognition. Int. J. Innov. Res. Sci. Eng. Technol., 3, 15618–15624.
6 Jyothi, S.N. and Indiramma, M. (2013) Stable local feature based age invariant face recognition. Int. J. Appl. Innov. Eng. Manage., 2, 366–371.
7 Jinli, S., Xilin, C., Shiguang, S., Wen, G. and Qionghai, D. (2012) A concatenational graph evolution aging model. IEEE Trans. Pattern Anal. Mach. Intell., 34, 2083–2096.
8 Park, U., Tong, Y. and Jain, A.K. (2010) Age invariant face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 32, 947–954.
9 Fu, Y., Guo, G. and Huang, T. (2010) Age synthesis and estimation via faces: a survey. IEEE Trans. Pattern Anal. Mach. Intell., 32, 1955–1976.
10 Ramanathan, N. and Chellappa, R. (2005) Face Verification Across Age Progression. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 462–469. IEEE.
11 Dehshibi, M.M. and Bastanfard, A. (2010) A new algorithm for age recognition from facial images. Signal Processing, 90, 2431–2444.
12 Albert, A.M., Ricanek, K. and Patterson, E. (2007) A review of the literature on the aging adult skull and face: implications for forensic science research and applications. Forensic Sci. Int., 172, 1–9.
13 Choi, S.E., Lee, Y.J., Lee, S.J., Park, K.R. and Kim, J. (2011) Age estimation using hierarchical classifier based on global and local features. Pattern Recognit., 44, 1262–1281.
14 Kwon, Y. and Lobo, N. (1999) Age classification from facial images. Comput. Vis. Image Underst., 74, 1–21.
15 Horng, W.B., Lee, C.P. and Chen, C.W. (2001) Classification of age groups based on facial features. Tamkang J. Sci. Eng., 4, 183–192.
16 Thukral, P., Mitra, K. and Chellappa, R. (2012) A Hierarchical Approach for Human Age Estimation. In Proc. IEEE Int. Conf. Acoustic, Speech and Signal Processing, pp. 1529–32.
17 Farkas, L.G., Katic, M.J. and Forrest, C.R. (2005) International anthropometric study of facial morphology in various ethnic groups/races. J. Craniofacial Surg., 16, 615–646.
18 Tiwari, R., Shukla, A., Prakash, C., Sharma, D., Kumar, R. and Sharma, S. (2009) Face Recognition Using Morphological Methods. In Proc. IEEE Int. Advance Computing Conference. IEEE.
19 Fukunaga, K. (1990) Introduction to Statistical Pattern Recognition (2nd edn). Academic Press.
20 Gunay, A. and Nabiyev, V.V. (2008) Automatic Age Classification with LBP. In Proc. 23rd Int. Sympos. Computer and Information Sciences, pp. 1–4.
21 Ojala, T., Pietikainen, M. and Harwood, D. (1996) A comparative study of texture measures with classification based on featured distribution. Pattern Recognit., 29, 51–59.
22 Gunay, A. and Nabiyev, V.V. (2015) Facial age estimation based on decision level fusion of AMM, LBP and Gabor features. Int. J. Adv. Comput. Sci. Appl., 6, 19–26.
23 Cootes, T.F., Edwards, G.J. and Taylor, C.J. (2001) Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell., 23, 681–685.
24 Gabor, D. (1946) Theory of communication. J. Inst. Electr. Eng., 93, 429–457.
25 Hajizadeh, M.A. and Ebrahimnezhad, H. (2011) Classification of Age Groups from Facial Image Using Histogram of Oriented Gradient. In Proc. 7th Iranian Machine Vision and Image Processing Conference, pp. 1–5.
26 Dalal, N. and Triggs, B. (2005) Histograms of Oriented Gradients for Human Detection. In Proc. Conf. Computer Vision and Pattern Recognition, pp. 886–893. IEEE.
27 Liu, K.H., Yan, S. and Kuo, J.C.C. (2014) Age Group Classification via Structured Fusion of Uncertainty-Driven Shape Features and Selected Surface Features. In Proc. IEEE Winter Conf. Applications of Computer Vision (WACV), pp. 445–452. IEEE.
28 Ling, H., Soatto, S., Ramanathan, N. and Jacobs, D. (2010) Face verification across age progression using discriminative methods. IEEE Trans. Inf. Forensics Secur., 5, 82–91.
29 Liu, K.H., Yan, S. and Kuo, J.C.C. (2015) Age estimation via grouping and decision fusion. IEEE Trans. Inf. Forensics Secur., 10, 2408–2423.
30 Chang, C.C. and Lin, C.J. (2011) Libsvm: a library for support vector machines. ACM Trans. Intell. Syst. Technol., 2, 27:1–27:27.
31 Lanitis, A., Taylor, J. and Cootes, T.F. (2002) Toward automatic simulation of aging effects on face images. IEEE Trans. Pattern Anal. Mach. Intell., 24, 442–455.
32 Lanitis, A., Draganova, C. and Christodoulou, C. (2004) Comparing different classifiers for automatic age estimation. IEEE Trans. Man Syst. Cybernetics, 34, 621–628.
33 Geng, X., Zhou, Z. and Smith-Miles, K. (2007) Automatic age estimation based on facial aging patterns. IEEE Trans. Pattern Anal. Mach. Intell., 29, 2234–2240.
34 Fu, Y. and Huang, T.S. (2008) Human age estimation with regression on discriminative aging manifold. IEEE Trans. Multimed., 10, 578–584.
35 Guo, G., Fu, Y., Dyer, C. and Huang, T. (2008) Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Trans. Image Process., 17, 1178–1188.
36 Ramanathan, N. and Chellappa, R. (2006) Modeling Age Progression in Young Faces. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 384–394. IEEE.
37 Chao, W., Liu, J. and Ding, J. (2013) Facial age estimation based on label-sensitive learning and age-oriented regression. Pattern Recognit., 46, 628–641.
38 Sai, P., Wang, J. and Teoh, E. (2015) Facial age range estimation with extreme learning machines. Neurocomputing, 149, 364–372.
39 Huang, G.B., Zhu, Q.Y. and Siew, C.K. (2006) Extreme learning machines: theory and applications. Neurocomputing, 70, 489–501.
40 Wang, J., Yau, W. and Wang, H.L. (2009) Age Categorization via ECOC with Fused Gabor and LBP Features. In Proc. IEEE Applications of Computer Vision (WACV), pp. 313–318. IEEE.
41 Lienhart, R. and Maydt, J. (2002) An extended set of Haar-like features for rapid object detection. IEEE ICIP, 1, 900–903.
42 Angulu, R., Tapamo, J.R. and Aderemi, O.A. (2017) Landmark Localization Approach for Facial Computing. In Proc. IEEE Conf. ICT and Society. IEEE.
43 Viola, P. and Jones, M. (2001) Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proc. IEEE Int. Conf. Computer Vision and Pattern (CVPR), pp. 8–14. IEEE.
44 Castrillon-Santana, M., Deniz-Suarez, O., Hernandez, T.M. and Guerra, A.C. (2007) Real-time detection of multiple faces at different resolutions in video streams. J. Vis. Commun. Image Representation, 18, 130–140.
45 Castrillon-Santana, M., Deniz-Suarez, O., Anton-Canalis, L. and Lorenzo-Navarro, J. (2008) Face and Facial Feature Detection Evaluation. In Third Int. Conf. Computer Vision Theory and Applications.
46 Suo, J., Wu, T., Zhu, S., Shan, S., Chen, X. and Gao, W. (2008) Design Sparse Features for Age Estimation Using Hierarchical Face Model. In Proc. IEEE Conf. Automatic Face and Gesture Recognition. IEEE.
47 Manjunath, B.S. and Ma, W.Y. (1996) Texture features for browsing, retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell., 18, 837–842.
48 Takimoto, H., Mitsukura, Y., Fukumi, M. and Akamatsu, N. (2008) Robust gender and age estimation under varying facial poses. Electron. Commun. Jpn., 91, 32–40.
49 Hayashi, J., Yasumoto, M., Ito, H. and Koshimizu, H. (2001) A Method for Estimating and Modeling Age and Gender Using Facial Image Processing. In Proc. 7th Int. Conf. Virtual Systems and Multimedia.
50 Jung, H.G. and Kim, J. (2010) Constructing a pedestrian recognition system with a public open database, without the necessity of re-training: an experimental study. Pattern Anal. Appl., 13, 223–233.
51 Ojala, T., Pietikainen, M. and Maenpaa, T. (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell., 24, 971–987.
52 Maenpaa, T. and Pietikainen, M. (2005) Texture Analysis with Local Binary Patterns: Handbook of Pattern Recognition and Computer Vision. World Scientific.
53 Ahonen, T., Hadid, A. and Pietikainen, M. (2004) Face Recognition with Local Binary Patterns. In Euro. Conf. Computer Vision, pp. 469–481.
54 Duta, N., Jain, A.K. and Dubuisson-Jolly, M.P. (2001) Automatic construction of 2d shape model. IEEE Trans. Pattern Anal. Mach. Intell., 23, 433–446.
55 Fisher, R.A. (1938) The statistical utilization of multiple measurements. Ann. Eugen., 8, 376–386.
56 Belhumeur, P.N., Hespanha, J.P. and Kriegman, D.J. (1997) Eigenfaces vs Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell., 19, 711–720.
57 Swets, D.L. and Weng, J.J. (1996) Using discriminant eigenfeatures for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell., 18, 71–86.
58 Martinez, A.M. and Kak, A.C. (2001) PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell., 23, 228–233.
59 Dongarra, J.J., Croz, J.D., Hammarling, S. and Duff, I.S. (1990) A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw., 16, 1–17.
60 Ross, A. and Jain, A. (2003) Information fusion in biometrics. Pattern Recognit. Lett., 24, 2115–2125.
61 Eskandari, M., Onsen, T. and Hasan, D. (2013) A new approach for face-iris multimodal biometric recognition using score fusion. Int. J. Pattern Recognit. Artif. Intell., 27, 1–15.
62 Ross, A. and Jain, A.K. (2003) Information fusion in biometrics. Pattern Recognit. Lett., 24, 2115–2125.
63 Kittler, J.V., Hatef, M., Duin, R.P.W. and Matas, J. (1998) On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell., 20, 226–239.
64 Mangai, U.G., Samanta, S., Das, S. and Chowdhury, P.R. (2010) A survey of decision fusion and feature fusion strategies for pattern classification. IETE Tech. Rev., 27, 293–307.
65 Lian, H.C. and Lu, B.L. (2005) Age Estimation Using a Min–Max Modular Support Vector Machine. In Proc. 12th Int. Conf. Neural Information Processing, pp. 83–88.
66 Gao, F. and Ai, H. (2009) Face Age Classification on Consumer Images with Gabor Feature and Fuzzy LDA Method: Lecture Notes in Computer Science. In Proc. 3rd Int. Conf. Advances in Biometrics, pp. 132–141.
67 Hewahi, N., Olwan, A., Tubeel, N., El-Asar, S. and Abu-Sultan, Z. (2010) Age estimation based on neural networks using face features. J. Emerg. Trends Comput. Inf. Sci., 1, 61–67.
68 FG-NET (2002) Face and Gesture Recognition Working Group. http://www-prima.inrialpes.fr/FGnet/.

Footnotes

1 Generalized Matrix Multiplication (GEMM) in Basic Linear Algebra Subprograms (BLAS) Level 3, http://www.netlib.org/blas/

Author notes

Handling editor: Fionn Murtagh

© The British Computer Society 2018. All rights reserved. For permissions, please email: journals.permissions@oup.com

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

The Computer Journal, Oxford University Press

Publisher
Oxford University Press
Copyright
© The British Computer Society 2018. All rights reserved. For permissions, please email: journals.permissions@oup.com
ISSN
0010-4620
eISSN
1460-2067
DOI
10.1093/comjnl/bxy050
Abstract

Abstract Age-group estimation is a technique of classifying an individual’s face image as belonging to a particular range of ages. Stochastic nature of ageing among individuals makes age-group estimation based on face images a challenging task. Faces from different age-groups may have similar attributes making facial age-group estimation even harder. Facial ageing is manifested as change in shape in lower age-groups and variations in texture in adult to old age-groups. As one ages, wrinkles start appearing in regions like forehead, eye-corners, mouth region and cheek bone areas among others. The general appearance of the face also varies across one’s lifetime from childhood to old age. Approaching age-group estimation as a multi-class classification problem, we adopt Linear Discriminant Analysis (LDA) for representation of global face appearance, landmark points and distance ratios between fiducial landmarks for shape representation, Local Binary Patterns (LBP) for local region texture description and Gabor filters for local region wrinkle representation. We fuse global shape and appearance features; local wrinkle and texture features for facial component before applying an ensemble of machine learning tools that comprise of Artificial Neural Network (ANN) and Support Vector Machines (SVM) for age-group estimation. Using majority voting scheme, we fuse decisions from global and component-based matchers to get final age-group label. This approach was experimented on four age-groups derived from Face and Gesture Network (FG-NET) ageing dataset. Experimental results show that feature and decision fusion improve age-group estimation accuracies. 1. INTRODUCTION Human faces provide prior information about one’s gender, identity, ethnicity, perceived age, mood among others. Alley [1] asserts that attributes derived from human facial appearance like mood, perceived age significantly impact interpersonal behaviour. 
Automatic age estimation can be applied in a number of ways like preventing vendor machines from dispensing alcohol and cigarettes to minors. Age estimation has attracted significant research in recent years [2]. Facial ageing has received substantial research attention with increasing attention in age invariant face recognition, age estimation and face verification across age among other areas [3–7]. Ageing is a stochastic inevitable and irreversible process that causes variation in facial shape and texture. Ageing introduces significant change in facial shape in formative years and relatively large texture variations with still minor change in shape in older age groups [8, 9]. Shape variations in younger age groups are caused by craniofacial growth. Craniofacial studies have shown that human faces change from circular to oval as one ages [10]. These changes lead to variations in the position of fiducial landmarks [11]. During craniofacial development stage, forehead slants back emancipating space on the cranium. Eyes, ears, mouth and nose expand to cover interstitial space created. The chin becomes protrusive as cheeks extend. Facial skin remains moderately unchanged compared to shape. More literature on craniofacial development is found in [12]. As one ages, facial blemishes like wrinkles, freckles and age-spots appear. Underneath the skin, melanin producing cells are damaged due to exposure to sun’s ultraviolet (UV) rays. Freckles and age-spots appear due to over-production of melanin. Consequently, light reflecting collagen not only decreases but also becomes non-uniformly distributed making facial skin tone non-uniform [13]. Parts adversely affected by sunlight are upper cheek, nose, nose-bridge and forehead. Consequently, we selected these areas for skin wrinkle and texture analysis. The most visible variations in adulthood to old age are skin variations exhibited in texture change. There is still minimal facial shape variation in these age groups. 
Biologically, as the skin grows old, collagen underneath the skin is lost [9]. Loss of collagen and the effect of gravity make the skin darker, thinner, leathery and less elastic. Facial spots and wrinkles appear gradually. The framework of bones beneath the skin may also start deteriorating, leading to accelerated development of wrinkles and variations in skin texture. More details about face ageing in adulthood are found in [12].

These variations in shape and texture across age could be modelled and used to automatically estimate someone’s age. Feature extraction in age estimation systems is critical since it affects the accuracy of the system. Age estimation systems use local, global or hybrid facial features. Given the stochastic nature of ageing, combining both local and global texture and shape features could result in better age and age-group classification results.

The rest of this paper is organized as follows. Section 2 presents related work. Section 3 presents materials and methodology. Section 4 presents experiments and results. Section 5 presents discussion and recommendations.

2. RELATED WORK

Global, local and hybrid features have been previously used in age and age-group estimation. Ramanathan et al. [3] present a recent survey of automated age estimation techniques. Kwon and Lobo [14] estimated age-group based on anthropometry and density of wrinkles. They separated adults from babies using distance ratios between frontal face landmarks on a small dataset of 47 images. They also extracted wrinkle features from specific regions using an Active Contour Model. Young adults were differentiated from senior adults using these wrinkle indices. Baby group classification accuracy was lower than 68%, but the overall performance of their experiments was not reported. Furthermore, the ratios used were mainly from baby faces. Horng et al. [15] used geometric features and a Sobel filter for texture analysis to classify face images into four groups.
They used Sobel edge magnitude to extract and analyse wrinkles and skin variance. They achieved an accuracy of 81.6% on subjectively labelled age-groups. Dehshibi and Bastanfard [11] used distance ratios between landmarks to classify human faces into various age groups. Using a back propagation neural network with these ratios as inputs, face images were classified into four age-groups of below 15, 16–30, 31–50 and above 50. Using a private dataset, they reported 86% accuracy. Distance ratios alone are not suitable for capturing age variations since they are affected by pose, translation and rotation. Thukral et al. [16] used geometric features and decision fusion for age-group estimation. They achieved 70% accuracy for the 0–15, 15–30 and above 30 age-groups. The age-groups used are wide, making the technique hardly useful in age estimation applications. Farkas et al. [17] used 10 anthropometric measurements of the face to classify individuals into various ethnic groups. They analyzed these measurements and identified the ones that contribute significantly to diversity in facial shape in different ethnic groups. They also found that horizontal and vertical measurements differ across ethnic groups. Tiwari et al. [18] developed a morphology-based face recognition technique using Euclidean distance measurements between fiducial facial landmarks. Using morphological features with a back propagation neural network, they reported a higher recognition rate than that of Principal Component Analysis (PCA) [19] with a back propagation neural network. This technique recognized faces but did not take the ageing factor into account, even though these distances vary as one ages. This signifies that distances between facial landmarks differ at different ages, especially in young age-groups, and, therefore, they could be used in age estimation. Gunay and Nabiyev [20] used spatial Local Binary Pattern (LBP) [21] histograms to classify faces into six age-groups.
Using nearest neighbour classifiers, they achieved an accuracy of 80% on age-groups 10±5, 20±5, 30±5, 40±5, 50±5 and 60±5. In [22], Gunay and Nabiyev trained three Support Vector Machine (SVM) models for age-group estimation using Active Appearance Model (AAM) [23], LBP and Gabor filter [24] features. They fused decisions from these classifiers to obtain the final decision. Although they reported 90% accuracy on subsequent age estimation, the overall performance of age-group estimation was not reported. Hajizadeh and Ebrahimnezhad [25] represented facial features using Histogram of Oriented Gradients (HOG) [26]. Using a probabilistic neural network (PNN) to classify HOG features extracted from several regions, they achieved 87% accuracy in classifying face images into four groups. Liu et al. [27] built a region of certainty (ROC) to link uncertainty-driven shape features with particular surface features. Two shape features are first designed to determine face certainty and classify it. Thereafter, an SVM is trained on gradient orientation pyramid (GOP) [28] features for age-group classification. Testing this method on three age-groups, 95% accuracy was reported. They further used GOP in [29] with Analysis of Variance (ANOVA) for feature selection to classify faces into age-groups, using a linear SVM [30] to model features from the eyes, nose and mouth regions. They achieved an accuracy of 91% on four age groups on the FG-NET dataset and 82% on the MORPH dataset. It was also found that the overall performance of age estimation decreases as the number of age-groups increases. This is because the number of images in each age-group reduces drastically as the number of groups increases. Lanitis et al. [31] adopted AAM to represent a face image as a vector of combined shape and texture parameters. They defined ageing as a linear, cubic or quadratic function. For automatic age estimation, they further evaluated a quadratic function, nearest neighbour and Artificial Neural Network (ANN) in [32].
They found that hierarchical age estimation achieves better results with quadratic function and ANN classifiers. Although AAMs have been extensively used, they do not extract texture information. This problem is avoided by using hybrid feature extraction techniques that combine both shape and texture features for age and age-group estimation. Adopting the Ageing Pattern Subspace (AGES), Geng et al. [33] proposed automatic age estimation using the appearance of face images. AGES compensates for missing ages by learning a subspace representation of one’s images when modelling a series of a subject’s ageing faces. To estimate age, the test image is positioned at every possible position in the ageing pattern to find the point that can best reconstruct it. The ageing subspace that minimizes the reconstruction error determines the age of the test image. Fu and Huang [34] used age-separated face images to model a low-dimensional manifold. Age was estimated by linear and quadratic regression analysis of feature vectors derived from the respective low-dimensional manifold. The same approach of manifold learning was used by Guo et al. in [35]. They extracted face ageing features using an age manifold learning scheme and performed learning and age prediction using a locally adjusted regressor. Their approach reported better performance than Support Vector Regression (SVR) and SVM. A craniofacial ageing model that combines psychophysical and anthropometric evidence was proposed by Ramanathan et al. [36]. The model was used to simulate the perceived age of a subject across age for improving the accuracy of face recognition. Choi et al. [13] proposed an age estimation approach using hierarchical classifiers with local and global facial features. Using Gabor filters for wrinkle extraction and LBP for skin feature extraction, they classified face images into age groups with SVM. This approach is error prone because it depends on only a single classifier. Wrong age-group classification leads to wrong age estimation.
For accurate age estimation, age-group classification must be robust, and this can be achieved by use of an ensemble of classifiers. Chao et al. [37] determined the association between age labels and facial features by combining distance metric learning and dimensionality reduction. They used label-sensitive learning with K-Nearest Neighbour (KNN) and SVR for age estimation. Recently, Sai et al. [38] used LBP, Gabor and Biologically Inspired Features for face representation. They used Extreme Learning Machines (ELM) [39] for age-group estimation. Their approach achieved an accuracy of about 70%. Using LBP and a bank of Gabor filters, Wang et al. [40] classified images into four age-groups. They used SVM, Error Correcting Output Codes (ECOC) and AdaBoost for age-group estimation.

These studies mostly use a single feature and a single classifier for age-group estimation. Intuitively, when a human eye focuses on a facial image, it extracts different features like texture, wrinkle, shape and general appearance. Extracting different features from a face image and using an ensemble of classifiers could increase age-group and age estimation accuracies. This study sought to find out how feature and decision fusion improve age-group estimation accuracies.

3. MATERIALS AND METHODS

3.1. Image preprocessing

The image is first smoothed using a 2D Gaussian spatial filter

$$F(a,b)=\frac{1}{2\pi\sigma^{2}}\,e^{-\frac{a^{2}+b^{2}}{2\sigma^{2}}} \tag{1}$$

where $a=3$ and $b=3$ are the width and the height of the Gaussian kernel, respectively, and $\sigma$ is the smoothing degree. Values of $a$ and $b$ were determined empirically. The value of $\sigma$ is derived from the size $b$ of the filter as

$$\sigma=0.3\times\big((b-1)\times 0.5-1\big)+0.8=0.8 \tag{2}$$

Given an image $I(x,y)$, an enhanced image $I_G(x,y)$ of size $m\times n$ is found by performing a convolution of $F(x,y)$ with $I(x,y)$ as

$$I_G(x,y)=\sum_{a=0}^{m-1}\sum_{b=0}^{n-1}F(a,b)\,I(x-a,y-b) \tag{3}$$

The smoothed image of $I(x,y)$ is $I_G(x,y)$. Image preprocessing helps in reducing noise and enhances facial features like edges, wrinkle lines and texture features.
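The smoothing step in Equations (1) to (3) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: the kernel is sampled on a grid centred on zero, it is normalised to sum to one, and border pixels are handled by clamping coordinates, none of which is specified in the text. The function names are hypothetical.

```python
import math

def gaussian_sigma(size):
    """Equation (2): smoothing degree derived from the kernel size."""
    return 0.3 * ((size - 1) * 0.5 - 1) + 0.8

def gaussian_kernel(size=3):
    """Sample Equation (1) on a size x size grid centred on zero, then normalise."""
    sigma = gaussian_sigma(size)
    half = size // 2
    kernel = [[math.exp(-(a * a + b * b) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
               for b in range(-half, half + 1)]
              for a in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    # Normalising keeps the overall image brightness unchanged (assumption).
    return [[value / total for value in row] for row in kernel]

def smooth(image, size=3):
    """Equation (3): convolve a grey-scale image (list of rows) with the kernel.

    Coordinates that fall outside the image are clamped to the border (assumption).
    """
    kernel = gaussian_kernel(size)
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            acc = 0.0
            for a in range(-half, half + 1):
                for b in range(-half, half + 1):
                    xx = min(max(x - a, 0), rows - 1)
                    yy = min(max(y - b, 0), cols - 1)
                    acc += kernel[a + half][b + half] * image[xx][yy]
            out[x][y] = acc
    return out
```

For a 3×3 kernel, `gaussian_sigma(3)` evaluates to 0.8, matching Equation (2).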
Images to be processed are expected to be frontal face images. The images are converted to grey-scale single-channel images before being processed.

3.2. Facial component detection

The facial components used for age-group estimation in this study are Left-Eye (LE), Right-Eye (RE), Nose (NO), Mouth (MO), Nose-bridge (NB), Left-Cheek (LC), Right-Cheek (RC) and Forehead (FH). The face region is first detected from a given image using a Haar-cascade classifier [41], then cropped and resized to 100×100 pixels. Figure 1 shows the components used in this study and how they are annotated.

Figure 1. Components used for age-group estimation.

The components are detected within the cropped face using the landmark localization technique proposed in [42]. In this technique, eyes are detected using Viola and Jones [43] Haar-cascade classifiers due to their better detection accuracies. Spatial and geometric information of the eyes is used to locate and crop the forehead, cheeks and nose-bridge. Nose and mouth are detected using Haar-cascade classifiers [44, 45]. Eyes, nose-bridge and cheeks are cropped and resized to 50×50 pixels, while the forehead is cropped and resized to 70×30 pixels. The nose is cropped and resized to 30×60 pixels, while the mouth is cropped and resized to 60×30 pixels. Component images were resized so that all images of a given component are of the same size, for efficient extraction of appearance features using LDA. Component image sizes after cropping were determined by the aspect ratio of the respective components.

3.3. Feature extraction

Facial shape and appearance are extracted from the whole face image. The image is then sub-divided into eight components as shown in Fig. 1. As proposed by Suo et al. [46], in order to improve age-group estimation, global features (shape and appearance) should be used together with local features (texture and wrinkles).
Wrinkle features are extracted using Gabor filters and texture features using LBP. Figure 2 shows 32 Gabor filters, while Fig. 3 shows the results of applying these filters to the RE region shown in Fig. 1.

Figure 2. Gabor filters from eight orientations and four scales. Each row represents a scale and each column represents an orientation. $0\le\theta<\pi$ and $\pi/4\le\gamma\le\pi$.

Figure 3. Thirty-two responses after applying the filters in Fig. 2 on RE in Fig. 1.

Originally introduced by Dennis Gabor in 1946 [24], Gabor filters have been extensively used for wrinkle, edge and texture feature extraction due to their capability of determining the orientation and magnitude of wrinkles [13]. The Gabor filter has been regarded as the best texture descriptor in object recognition, segmentation, tracking of motion and image registration [47]. Since wrinkles appear as edge-like components with high frequency, Gabor edge analysis has been commonly used for wrinkle feature extraction. The Sobel filter [15, 48], Hough transform [49] and active contours [14] are among the most commonly used texture edge descriptors. However, edges in a face image also include features such as beards, moustaches, hair and shadows. To reduce the effect of this noise, Choi et al. [13] propose that the predominant orientation of wrinkles be considered in wrinkle feature extraction. The 2D spatial-domain Gabor function is defined as

$$g(x,y)=\frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac{1}{2}\left(\frac{x^{2}}{\sigma_x^{2}}+\frac{y^{2}}{\sigma_y^{2}}\right)+2\pi jWx\right] \tag{4}$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of the distribution along the $x$ and $y$ axes, respectively, and $W$ is the sinusoidal radial frequency.
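As a rough illustration of Equation (4), the sketch below evaluates the complex Gabor function on a small grid. The grid size, the values of $\sigma_x$, $\sigma_y$ and $W$, and the function names are hypothetical choices made for the example, and the optional rotation uses the usual rotated-coordinate convention for filter banks rather than anything specific to the authors' code.

```python
import cmath
import math

def gabor(x, y, sigma_x, sigma_y, W):
    """Evaluate the complex 2D Gabor function of Equation (4) at the point (x, y)."""
    envelope = math.exp(-0.5 * (x * x / sigma_x ** 2 + y * y / sigma_y ** 2))
    carrier = cmath.exp(2j * math.pi * W * x)   # complex sinusoid along x
    return envelope * carrier / (2.0 * math.pi * sigma_x * sigma_y)

def gabor_kernel(size, sigma_x, sigma_y, W, theta=0.0):
    """Sample the Gabor function on a size x size grid, rotated by theta radians."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the sampling coordinates to orient the filter.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            row.append(gabor(xr, yr, sigma_x, sigma_y, W))
        kernel.append(row)
    return kernel

# Hypothetical parameters: 5x5 kernel, sigma_x = sigma_y = 2, W = 0.25 cycles per pixel.
kernel = gabor_kernel(5, 2.0, 2.0, 0.25)
```

The magnitude of each entry follows the Gaussian envelope, so it peaks at the centre of the kernel, while the real and imaginary parts carry the oscillation of frequency $W$.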
The Fourier transform of the function in Equation (4) is expressed as

$$G(u,v)=\exp\left\{-\frac{1}{2}\left[\frac{(u-W)^{2}}{\sigma_u^{2}}+\frac{v^{2}}{\sigma_v^{2}}\right]\right\} \tag{5}$$

where $\sigma_u=\frac{1}{2\pi\sigma_x}$ and $\sigma_v=\frac{1}{2\pi\sigma_y}$. A Gabor filter bank is obtained by rotations and dilations of $g(x,y)$. The general equation for creating a Gabor filter bank can be expressed as

$$g_b(x,y)=a^{-m}g(\bar{x},\bar{y}) \tag{6}$$

where $\bar{x}=x\cos\theta+y\sin\theta$ and $\bar{y}=-x\sin\theta+y\cos\theta$, with $\theta_k=\frac{\pi(k-1)}{n}$, $k=1,2,3,\ldots,n$, where $n$ is the number of orientations used and $a^{-m}$ is the filter scale for $m=0,1,2,\ldots,S$ for $S$ scales. We refer readers to [47, 50] for more details on Gabor filters.

Wrinkle features were extracted from the components in Fig. 1. A set of Gabor responses at a given sampling point constitutes a wrinkle feature. For each component, $p$ wrinkle features are extracted, where $p>0$ represents the size of the Gabor filter bank. The mean $\mu$ and standard deviation $\sigma$ of the Gabor filter response magnitude are used to determine wrinkle features. These $p$ wrinkle features from a component are concatenated to form the wrinkle feature vector of that particular component.

Apart from wrinkle analysis, texture features were also extracted and analyzed. Unlike wrinkles, which appear at particular points on a face, facial texture appears randomly on every facial part. LBP is a texture description technique that can detect microstructure patterns like spots, edges, lines and flat areas on the skin [21]. LBP is used to describe texture for face recognition, gender classification, age estimation, face detection, and face and facial component tracking. Figure 4 shows a sample 3×3 LBP operation.

Figure 4. LBP operation with $P=8$, $R=1$. (a) Sample image region, (b) thresholding and (c) resultant LBP code.

LBP assigns a code to each pixel in an image by comparing it to its neighbours within a particular radius.
The LBP code is created by

$$LBP_{P,R}(x_c,y_c)=\sum_{n=0}^{N-1}2^{n}\,\tau(g_n-g_c) \tag{7}$$

where $N$ is the number of neighbouring pixels (denoted $P$ in the subscript), $R$ is the distance of a neighbouring pixel from the centre pixel, $g_c$ is the grey value of the centre pixel, $g_n$ for $n=0,1,2,\ldots,N-1$ corresponds to the grey value of a neighbouring pixel on a circularly symmetric neighbourhood of distance $R>0$, and the thresholding function $\tau$ that generates a binary bit for a particular pixel is defined as

$$\tau(x)=\begin{cases}1 & \text{if } x\ge 0\\ 0 & \text{if } x<0\end{cases} \tag{8}$$

Concatenating all 8 bits gives a binary number. The resulting binary number is converted to a decimal and assigned to the centre pixel as its LBP code. Ojala et al. [21] further categorized LBP codes as uniform and non-uniform patterns. An LBP pattern with at most two bitwise transitions from 0 to 1 or 1 to 0 is categorized as a uniform pattern. For instance, 00000000, 00010000 and 11011111 are uniform patterns, while 01010000, 11100101 and 10101001 are non-uniform patterns. Ojala et al. [51] found that when using eight neighbours and radius 1, 90% of all patterns are uniform patterns. The original LBP operator was limited in capturing dominant features with large-scale structures. The operator was later extended to capture texture features with neighbourhoods of different radii [51]. A set of sampling pixels distributed evenly along the circumference of a circle centred at the pixel to be labelled defines the neighbourhood. Bilinear interpolation of points that do not fall exactly on pixels is done to allow any radius and any number of sampling pixels. Uniform patterns may represent microstructures such as a line, spot, edge or flat area. Figure 5 shows microstructure pattern representation, while Fig. 6 shows LBP codes for all uniform patterns in the $LBP_{8,1}$ neighbourhood. In order to extract rotation-invariant features using LBP, the generated LBP code is circularly rotated until its minimum value is obtained [52]. The extended LBP operator could capture more texture features in an image, but it still could not preserve spatial information about these features.
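The LBP operations described above (Equations (7) and (8), the uniform-pattern test and the rotation-invariant code) can be sketched for the basic $P=8$, $R=1$ case as follows. The neighbour ordering (clockwise from the top-left pixel) and the function names are assumptions made for illustration; the text does not fix an ordering.

```python
def lbp_code(patch):
    """LBP code of the centre pixel of a 3x3 patch (P=8, R=1), as in Equation (7)."""
    centre = patch[1][1]
    # Neighbours visited clockwise starting at the top-left pixel (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for n, (r, c) in enumerate(offsets):
        if patch[r][c] >= centre:   # thresholding function of Equation (8)
            code += 1 << n          # weight 2^n
    return code

def transitions(code, bits=8):
    """Number of circular 0/1 transitions in the binary pattern."""
    rotated = (code >> 1) | ((code & 1) << (bits - 1))   # rotate right by one bit
    return bin(code ^ rotated).count("1")

def is_uniform(code, bits=8):
    """A pattern is uniform when it has at most two circular transitions."""
    return transitions(code, bits) <= 2

def rotation_invariant(code, bits=8):
    """Smallest value among all circular rotations of the pattern."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask for k in range(bits))
```

With these helpers, the example patterns in the text classify as stated: `is_uniform(0b00010000)` is true, while `is_uniform(0b01010000)` is false.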
Ahonen et al. [53] proposed a technique of dividing a face image into $n$ cells. Histograms are generated for each cell and then concatenated into a single spatial histogram. The spatial histogram preserves both the spatial and texture description of an image.

Figure 5. Microstructure pattern LBP code, $P=8$, $R=1$.

Figure 6. Example of LBP codes that represent uniform patterns, $P=8$, $R=1$.

Image texture features are finally represented by a histogram of LBP codes. The LBP histogram contains a detailed texture descriptor for all structures on the face image, like spots, lines, edges and flat areas.

AAMs [23] are used to represent the shape and appearance of faces. AAM statistically represents face appearance and shape using PCA [19]. AAM face representation does not capture enough texture and wrinkle information. We used the 68 landmark points in the Face and Gesture Recognition Network (FG-NET) ageing database and ratios of distances between fiducial landmarks to represent global facial shape. Figure 7 shows a labelled image and the 68 landmark points.

Figure 7. Landmark points used in facial shape. (a) Labelled face and (b) 2D-landmark points.

These points can be determined by use of an appropriate 2D landmarking algorithm like the one proposed in [54]. Global face appearance features are extracted by Linear Discriminant Analysis (LDA) [19, 55]. LDA is a feature extraction technique that searches for features that best discriminate between classes. Given a set of independent features, LDA creates a linear combination of these features such that the largest mean differences between classes are achieved. The detected face is resized to 100×100 pixels.
The image is then flattened into a 1×10 000 vector. We use LDA for appearance feature extraction. We first project the 1×10 000 face vector to a PCA subspace to reduce the dimensionality of the input image data from 10 000 to $N-c$, where $N$ is the number of samples and $c$ is the number of age-groups to be estimated. This is done before applying LDA to ensure that $S_w$ does not become singular, as proposed in [56, 57]. We perform LDA on face images projected on the PCA space. For each image in all ages, we define the within-class scatter matrix

$$S_w=\sum_{j=1}^{c}\sum_{i=1}^{N_j}\left(x_i^{j}-\mu_j\right)\left(x_i^{j}-\mu_j\right)^{T} \tag{9}$$

where $x_i^{j}$ is the $i$th image of age $j$, $\mu_j$ is the mean of age $j$, $c$ is the number of ages to be estimated and $N_j$ is the number of images in age $j$. We define the between-class scatter matrix as

$$S_b=\sum_{j=1}^{c}\left(\mu_j-\mu\right)\left(\mu_j-\mu\right)^{T} \tag{10}$$

where $\mu$ is the mean of all ages. The main objective of LDA is to maximize the between-class scatter while minimizing the within-class scatter. One way of doing this is maximizing the ratio $\det|S_b|/\det|S_w|$. Given that $S_w$ is non-singular, it has been proven [55] that this ratio is maximized when the column vectors of the projection matrix are the eigenvectors of $S_w^{-1}S_b$. The maximum rank of $S_w$ is $N-c$ with $N$ samples and $c$ classes. This, therefore, requires $N=t+c$ samples to guarantee that $S_w$ does not become singular, where $t$ is the dimensionality of the input data. The number of samples $N$ is almost always smaller than $t$, making the scatter matrix $S_w$ singular. To solve this problem, [56, 57] propose projecting the input data to a PCA subspace, to reduce dimensionality to $N-c$ or less, before applying LDA. PCA and LDA are widely used appearance feature extraction methods in pattern recognition [58]. Consequently, we adopt LDA for extraction of global face appearance features for age-group estimation. We project the input data to a PCA subspace as recommended before applying LDA. Thereafter, we combine the LDA and PCA eigenvectors using the GEMM [59] operation and use the combined eigenvectors to project each of the face data.
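The PCA-then-LDA procedure described above can be sketched with NumPy as follows. This is a minimal sketch under assumptions, not the authors' implementation: PCA is computed by singular value decomposition, the scatter matrices follow Equations (9) and (10) as written, and the function name and return values are hypothetical.

```python
import numpy as np

def pca_lda_projection(X, y):
    """Return (W, mean) where W maps centred t-dimensional rows of X to LDA features.

    X: (N x t) array of flattened faces, y: length-N array of age-group labels.
    """
    classes = np.unique(y)
    N, c = X.shape[0], len(classes)
    mean = X.mean(axis=0)
    Xc = X - mean

    # PCA via SVD; keep at most N - c components so that S_w is non-singular.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[: N - c].T                 # t x (N - c)
    Z = Xc @ W_pca                        # data in the PCA subspace

    # Scatter matrices of Equations (9) and (10), computed in the PCA subspace.
    k = Z.shape[1]
    Sw = np.zeros((k, k))
    Sb = np.zeros((k, k))
    mu = Z.mean(axis=0)
    for label in classes:
        Zj = Z[y == label]
        mu_j = Zj.mean(axis=0)
        Sw += (Zj - mu_j).T @ (Zj - mu_j)
        Sb += np.outer(mu_j - mu, mu_j - mu)

    # Eigenvectors of S_w^{-1} S_b with the largest eigenvalues (at most c - 1 are non-zero).
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-eigvals.real)[: c - 1]
    W_lda = eigvecs[:, order].real        # (N - c) x (c - 1)

    # Combine both projections with a single matrix product (a GEMM-style operation).
    return W_pca @ W_lda, mean
```

A face vector `x` would then be mapped to its appearance features with `(x - mean) @ W`; with four age-groups this yields at most three discriminant features per image.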
We take these projections as the appearance feature vector for age estimation.

3.4. Feature fusion

Fusion is a technique of combining multiple sources of information in a biometric system [60]. Fusion in multi-biometric systems can be categorized into two broad categories: (1) fusion prior to matching, and (2) fusion after matching. Fusion prior to matching is further divided into two: sensor level fusion and feature level fusion. Sensor level fusion is used when multiple sources are used to represent samples of the same biometric trait. Feature level fusion combines features extracted from multiple biometric sources or features extracted by different algorithms. Fusion after matching includes matching score level fusion, rank level fusion and decision level fusion. Decision level fusion is when outputs from different matchers are combined using some rule to generate a final decision. The rules could be AND rules, OR rules and majority voting, among others [61]. We use feature level and decision level fusion. As reported by Ross et al. [62], feature level fusion results in a richer feature representation of the raw data, leading to better classification. We, therefore, adopt feature level fusion before classification, and a majority vote scheme is used for decision level fusion to combine decisions from multiple matchers for the final decision. We extract global shape features $f_1=(s_1,s_2,\ldots,s_n)$, a global appearance feature vector $f_2=(a_1,a_2,\ldots,a_x)$, local wrinkle features $f_3=(w_1,w_2,\ldots,w_p)$ and local texture features $f_4=(t_1,t_2,\ldots,t_m)$. The values $n>0$, $x>0$, $p>0$ and $m>0$ represent the dimensionality of each respective feature vector. Feature dimensionality is normalized using LDA. We perform z-score normalization of these features before fusion. The normalization is done by

$$f_{norm}^{i}=\frac{f_i-\mu_i}{\sigma_i} \tag{11}$$

where $f_i$ is a facial ageing feature, $\mu_i$ and $\sigma_i$ are the mean and the standard deviation of this feature, and $f_{norm}^{i}$ is the normalized value of feature $i$.
We thereafter fused feature $f_1$ with feature $f_2$, and feature $f_3$ with feature $f_4$, by concatenating them as

$$f_g=f_1\oplus f_2,\qquad f_l=f_3\oplus f_4 \tag{12}$$

where $f_g$ and $f_l$ represent the fused global feature vector and the fused local feature vector, respectively. Feature fusion often leads to high-dimensional feature vectors and hence to the ‘curse of dimensionality’ problem. Thus, the dimensionality of each of the fused feature vectors is reduced by PCA. We perform decision fusion using a majority voting scheme. Given $T$ classifiers, each having a single vote, the predicted class is the one that receives all $T$ votes, the one that receives at least $\lfloor T/2\rfloor+1$ votes, or the one that receives the most votes (simple majority vote) [63]. The final decision is: select class $\omega_x$ if

$$\sum_{t=1}^{T}d_{t,x}=\max_{j=1,\ldots,c}\sum_{t=1}^{T}d_{t,j} \tag{13}$$

where $d_{t,j}$ is 1 if classifier $t$ votes for class $j$ and 0 otherwise. The assumptions in this technique are that $T$ is odd, that for any instance $x$ each classifier gives the correct class label with probability $p$, and that the classifiers’ outputs are independent [64]. We use the simple majority voting rule to get the final output of the ensemble.

3.5. Overview of proposed method

Given a face image, global shape features $S_i^{j}$ and global appearance features $A_i^{j}$ for subject $i$ at age-group $j$ are extracted. $A_i^{j}$ and $S_i^{j}$ are fused to get a single global feature vector $f_G$. $f_G$ is used to train a classifier for age-group estimation based on the global feature vector. The face image is divided into $R>1$ components. For each component $R_r$, $r=1,2,3,\ldots,R$, where $R$ is the number of components, the local wrinkle feature $W_{i,r}^{j}$ and local texture feature $T_{i,r}^{j}$ of subject $i$ at age-group $j$ are extracted. $W_{i,r}^{j}$ and $T_{i,r}^{j}$ are fused into a single local feature vector $f_{L_r}$ for component $r$. A classifier is trained on each $f_{L_r}$. A total of $T=R+1$ classifiers are trained. Each of these classifiers returns an age-group label for its respective feature vector. Figure 8 shows a block diagram of the proposed method.
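The two fusion steps, concatenation as in Equation (12) and voting as in Equation (13), can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the example votes are made up, and breaking ties in favour of the label seen first is an assumption that the text does not specify.

```python
from collections import Counter

def fuse(*feature_vectors):
    """Feature-level fusion of Equation (12): concatenate the given vectors."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused

def majority_vote(labels):
    """Decision-level fusion of Equation (13): return the label with the most votes.

    Ties are broken in favour of the label that was seen first (assumption).
    """
    counts = Counter(labels)
    return counts.most_common(1)[0][0]

# Hypothetical example: T = R + 1 = 9 classifiers (one global, eight components),
# each voting for an age-group label between 1 and 4.
votes = [4, 4, 3, 4, 4, 3, 4, 2, 4]
final_label = majority_vote(votes)
```

In this made-up example six of the nine classifiers vote for age-group 4, so `final_label` is 4.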
SVM [13, 35, 65], LDA [66], Nearest Neighbour [20], ANN [15, 67] and Extreme Learning Machines (ELM) [38] have been used separately in previous studies for age estimation. Each classifier has its own inherent limitations. We propose a technique that uses an ensemble of back propagation ANN and SVM with a Radial Basis Function (RBF) kernel. A final age-group label is determined by decision level fusion. Decisions from all the classifiers are fused using the majority voting rule, whereby the estimated age-group is the one with the most votes.

Figure 8. Block diagram of the proposed method.

3.6. Experiments

The experiment was performed on the publicly available FG-NET ageing dataset [2, 68]. The dataset consists of 1002 high-resolution colour or grey-scale face images of 82 multi-race subjects. The images in this dataset have large variations in lighting, pose and expression. Some images have adverse conditions because they were scanned. Age-group estimation performance is measured by the cumulative score (CS) [33] at error level 0, which is defined as

$$CS(x)=\frac{n_x}{N_x}\times 100\% \tag{14}$$

where $n_x$ is the number of test images correctly recognized as belonging to age-group $x$ and $N_x$ is the total number of test images in age-group $x$. We split our dataset into two sets: a training set and a testing set. We further split the training set into $k$ equal subsets. We perform $k$-fold cross-validation by training classifiers on $k-1$ subsets and testing them on the remaining subset. We perform cross-validation to choose the best $\gamma$ and $C$ RBF kernel parameters and the number of layers and nodes in each layer for the ANN. We try different pairs of $(C,\gamma)$ on the kernel SVM and different numbers of hidden layers and nodes in the ANN. We trained the ANN with four layers, with 3 neurons in the input layer, 40 neurons in each of the second and third layers and 4 neurons in the output layer. We used the sigmoid activation function for our neuron processing.
For SVM, we used parameters $\gamma=0.12$ and $C=100$. Each classifier is tested on each of the components and the one that performs better is chosen for a specific component.

4. RESULTS AND DISCUSSION

This section presents results of the experiments and performance evaluation of the proposed age-group estimation method. We used four age groups since age-group estimation accuracy reduces drastically with an increase in the number of groups, due to the reduction in the number of images per group [29]. The age grouping used is almost similar to the grouping in [27, 29]. In this study, the age-groups are formulated as shown in Table 1.

Table 1. Grouping of images used in this study.

Age-group    Number of images    % Distribution
0–6          274                 27.3
7–14         271                 27.0
15–24        249                 24.9
25–69        208                 20.8
Total        1002                100

Age-group classification was performed on individual fused feature vectors to determine their performance. Experimental results for age-group estimation based on global shape and appearance features are shown in Table 2. We observed that age-group estimation accuracy is higher in Age-groups 1 and 4. In the lower age-groups, the shape of the face changes significantly while appearance remains relatively constant. Shape features play a major role in classifying images in these age groups.

Table 2. Cumulative scores of age estimation using holistic shape and appearance features.
Age-group    Shape+Appearance
             SVM (%)    ANN (%)
1            78.2       94.5
2            75.1       75.8
3            77.4       80.6
4            80.3       82.7

In adulthood, facial shape remains relatively constant while there is significant change in appearance. In these age-groups, appearance features are vital in discriminating between age-groups. Fusing these two features contributes to better accuracy across all age groups. Table 3 shows the cumulative scores for age-group estimation based on fused forehead texture and wrinkle features. FH Gabor and LBP features can discriminate between Age-groups 1 and 2 with accuracies in the range of 45–60%. As one grows into Age-group 3, the forehead texture becomes distinct from that of Age-groups 1 and 2. Accuracies improve to around 80% in Age-group 4. This could be attributed to the introduction of heavy wrinkle lines on the forehead in Age-group 4. The performance of these features could also be adversely affected by hair styles that cover part of the forehead.

Table 3. Cumulative scores of age estimation using forehead texture and wrinkle features.

Age-group    Texture+Wrinkle
             SVM (%)    ANN (%)
1            45.2       51.5
2            64.3       53.7
3            72.4       74.2
4            78.5       82.6

As shown in Table 4, LE and RE LBP and Gabor features perform quite well in age-group estimation.
Age-group discrimination using LE texture and wrinkle features achieved an accuracy of 50–70% in Age-groups 1 and 2. This could be attributed to minimal texture variations in these age groups. Wrinkles are also rarely found around the eye regions in these age groups. An accuracy of between 70% and 80% was achieved in Age-groups 3 and 4. The accuracy improvement in the last two age groups could be attributed to significant texture variations around the eye region and the introduction of wrinkles as one grows.

Table 4. Cumulative scores of age estimation using eyes texture and wrinkle features.

Age-group    Texture+Wrinkle
             LE                     RE
             SVM (%)    ANN (%)     SVM (%)    ANN (%)
1            54.2       52.3        55.7       53.1
2            66.0       64.4        65.8       65.5
3            73.1       75.7        74.2       76.7
4            76.3       79.6        76.8       78.4

As shown in Table 5, LBP and Gabor features from LC and RC performed slightly worse than eye features in Age-groups 1 and 2. As we move to Age-groups 3 and 4, age spots, wrinkles and general texture variations start appearing in the cheek bone areas. This leads to an improvement in the performance of LC and RC features from 50–63% in Age-groups 1 and 2 to the range of 75–84% in Age-groups 3 and 4.

Table 5. Cumulative scores of age estimation using cheeks texture and wrinkle features.
Age-group    Texture+Wrinkle
             LC                     RC
             SVM (%)    ANN (%)     SVM (%)    ANN (%)
1            48.4       53.2        50.3       53.7
2            62.5       60.0        63.1       61.0
3            78.1       76.8        79.2       76.2
4            81.4       83.2        80.8       81.6

As shown in Table 6, performance on Age-group 4 was the highest, with an accuracy of 75–84% on NB texture and wrinkle features. This could be attributed to the introduction of heavy wrinkle lines and significant texture variations around NB in adulthood and old age. In old age, freckles start appearing around NB due to over-exposure to UV rays from the sun. Accuracies of LBP and Gabor features from the NB region in Age-groups 1 and 2 were relatively low, in the range of 50–60%. This is due to relatively constant skin texture and a lack of wrinkle features in this region. In Age-group 3, accuracies improve to the range of 60–70%. Similarities between subjects in Age-group 3 and those in Age-groups 2 and 4 could have contributed to this lower accuracy. We observed that the NB region provides rich texture and wrinkle information for distinguishing Age-group 3 from Age-group 4. Wrinkle and texture features from the NO region performed slightly better than features from the NB region for Age-groups 1 and 2.

Table 6. Cumulative scores of age estimation using NB and NO texture and wrinkle features.
Age-group    Texture+Wrinkle
             NB                     NO
             SVM (%)    ANN (%)     SVM (%)    ANN (%)
1            55.4       53.2        57.2       65.8
2            57.8       58.1        72.1       69.6
3            61.5       69.0        75.5       73.3
4            75.1       83.6        78.2       80.8

For the NO region, Age-group 4 had the highest accuracy of 78–81%, with Age-groups 2 and 3 having accuracies of between 70% and 75%. Similar to other regions, Age-group 1 had the lowest accuracy, between 55% and 60%. Performance in Age-groups 3 and 4 could have been affected by partial occlusion, nose shadow and the presence or absence of a mustache.

Table 7 shows the performance of texture and wrinkle features from the MO region. Age-groups 1 and 2 had accuracies of between 67% and 71%.

Table 7. Cumulative scores of age estimation using mouth texture and wrinkle features.

Age-group    Texture+Wrinkle
             SVM (%)    ANN (%)
1            67.2       69.3
2            70.5       68.8
3            59.5       58.7
4            62.4       63.2

Performance of features from MO was the lowest in Age-groups 3 and 4, with an accuracy of 58–64%. This could be attributed to facial expressions such as smiling and laughing, which strongly influence LBP and Gabor features extracted from the MO region.
Another factor that could have affected this accuracy is the presence of mustaches and beards around the mouth.

We combined the two classifiers to make an ensemble of nine classifiers: one for each landmark and one for the global fused feature. Using a majority voting strategy, we fused the decisions from these classifiers to get a single ensemble output for each test sample. Figure 9 shows the overall performance of feature and decision level fusion in age-group estimation.

Figure 9. Cumulative score for age-group estimation based on feature and decision level fusion of global shape and appearance and local wrinkle and texture features.

Using a hybrid of global and local facial features with feature fusion and decision fusion significantly improves age-group estimation. Age-group 1 had the highest overall accuracy of 97.8%. Age-group 2 recorded an overall performance of 88.6%, while Age-group 3 achieved 89.5%. Age-group 4 had a performance of 93.7%. The general performance of this approach is 92.4%, which is calculated as

$$\text{accuracy} = \sum_{i=1}^{4} \frac{n_i}{N} \times a_i \qquad (15)$$

where $n_i$ is the number of images in Age-group $i$, $N$ is the total number of images (1002) and $a_i$ is the accuracy of Age-group $i$. Performance of Age-group 4 could be attributed to a significant contribution from appearance, texture and wrinkle features. In this age-group, appearance, texture and wrinkles vary significantly as compared with subjects in lower age-groups. Lower performance of Age-group 2 could be attributed to the similarity of subjects in this age-group with subjects in Age-groups 1 and 3.

4.1. Comparison with previous methods

It is somewhat challenging to compare age-group estimation approaches because the age-groupings differ between approaches.
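As a brief aside, the decision-level fusion and the overall accuracy of Eq. (15) described in Section 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tie-breaking rule (lowest label wins), the nine example votes and the per-group image counts are all hypothetical. Only the four per-group accuracies and the total of 1002 images come from the text.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Fuse classifier decisions by majority voting.

    Ties are broken in favour of the smallest label; this rule is an
    assumption, since the text does not state how ties are handled.
    """
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def weighted_accuracy(group_sizes: Sequence[int],
                      group_accuracies: Sequence[float]) -> float:
    """Eq. (15): sum over groups of (n_i / N) * a_i, with N = sum of n_i."""
    if len(group_sizes) != len(group_accuracies):
        raise ValueError("one accuracy is required per age-group")
    total = sum(group_sizes)
    return sum(n / total * a for n, a in zip(group_sizes, group_accuracies))


# Hypothetical age-group decisions (labels 1..4) of the nine classifiers
# for a single test image.
votes = [3, 3, 2, 3, 4, 3, 2, 3, 3]
fused_label = majority_vote(votes)

# Per-group accuracies as reported in the text (Age-groups 1..4).
accuracies = [97.8, 88.6, 89.5, 93.7]
# Placeholder group sizes summing to 1002; NOT the real FG-NET split.
sizes = [300, 300, 200, 202]
overall = weighted_accuracy(sizes, accuracies)

print(f"fused label: {fused_label}")
print(f"weighted accuracy with placeholder sizes: {overall:.2f}%")
```

Because the group sizes above are placeholders, the printed accuracy is not expected to reproduce the reported 92.4%; substituting the actual per-group image counts would be needed for that.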
We compare the age-group estimation performance in this paper with that of roughly similar age-groups in previous approaches. As shown in Fig. 9, fusing shape, appearance, texture and wrinkle features improved performance in middle age-groups from the 40–60% reported in [38] to the range of 80–90%. Adult age-group performance was also improved to 93.7%, as compared with 87.5% reported in [40]. We achieved a performance of 89.5% in the teen age-group (15–24 years), as compared with the teen age-group (12–21 years) performance of 87% reported in [40]. Age-group 1 (0–6 years) performance improved to 97.8%, as compared with the 92.1% baby group (0–3 years) accuracy reported in [27]. Our approach achieved 93.7% accuracy in the adult age-group (25–69 years), compared with 90.1% reported in [27] for the age-group 20–59 years. We achieved a performance of 89.5% in Age-group 3 (15–24 years), as compared with 75% for the age range 16–30 years reported in [25]. Table 8 summarizes the comparison of the overall performance of this approach with previous methods in the literature.

Table 8. Comparison of age-group estimation performance with previous studies.
Method                                                          Age grouping                          Accuracy (%)
2D-Landmark Points, Regression Vector Machines, Partial Least   0–15, 15–30, 30+                      70.0
  Squares, Nearest Neighbour, Naive Bayes, Fisher Linear
  Discriminant [16]
LBP, Minimum Distance, Nearest Neighbour [20]                   10±5, 20±5, 30±5, 40±5, 50±5, 60±5    80.0
Anthropometric Distance Ratios, Canny, Artificial Neural        0–15, 16–30, 31–50, 51+               86.6
  Network [11]
HOG, Probabilistic Neural Network [25]                          0–15, 16–30, 31–50, 50+               87.0
2D-Landmark Points, GOP, ANOVA, SVM (GAS) [27]                  0–3, 4–19, 20–59                      91.4
Gradient of Oriented Pyramids, SVM [29]                         0–5, 6–12, 13–21, 22–69               91.3
Proposed: Anthropometric Distance Ratios, LBP, 2D-Landmark      0–6, 7–14, 15–24, 25–69               92.4
  Points, Gabor, LDA, SVM, Artificial Neural Network
As shown in Table 8, our approach has improved overall age-group estimation accuracy to 92.4%, from the 91.4% achieved in [27] using gradient of oriented pyramids (GOP), ANOVA and SVM (GAS) and the 91.3% reported in [29]. This shows that using both global and local features with feature fusion and decision fusion improves the performance of age-group estimation.

5. CONCLUSION

An age-group estimation method using feature fusion, decision fusion and an ensemble of classifiers is proposed.
Global shape features are extracted using landmark points and distance ratios between fiducial points. LDA is used to extract global appearance features. After z-score normalization of each feature vector, global shape and appearance features are fused into one feature vector for training one of the classifiers. The face image is then divided into eight facial components. Component-based texture and wrinkle features are extracted by LBP and Gabor filters, respectively. For each component, these features are z-score normalized and then fused into one feature vector. LDA is used to normalize the dimensionality of the features before z-score normalization and fusion. Fused features are used to train an ensemble made of ANN and SVM classifiers for age-group estimation. A total of nine models are trained. Decision level fusion is performed with a majority voting scheme to obtain the final age-group label. Experimental results on the widely used FG-NET ageing dataset show that the proposed approach achieves better age-group estimation accuracy than previous methods.

To improve exact age estimation, in-depth inquiry is necessary into age-group estimation with narrow uniform age-groups (0–4, 5–9, 10–14, ..., 96–100). Further inquiry may be done to verify the robustness of the proposed method across various face ageing datasets. Further investigation is also required to make age-group and age estimation robust to dynamics such as facial expression, cosmetics, illumination and differences in perceived ageing between males and females.

REFERENCES

1 Alley, R. (1998) Social and Applied Aspects of Perceiving Faces. Lawrence Erlbaum Associates, Inc.
2 Panis, G., Lanitis, A., Tsapatsoulis, N. and Cootes, T.F. (2016) Overview of research on facial ageing using the FG-NET ageing database. IET Biometrics, 5, 37–46.
3 Ramanathan, N., Chellappa, R. and Biswas, S. (2009) Computational methods for modeling facial aging: a survey. J. Vis. Lang.
Comput., 20, 131–144.
4 Raval, M.J. and Shankar, P. (2015) Age invariant face recognition using artificial neural network. Int. J. Adv. Eng. Res. Dev., 2, 121–128.
5 Sonu, A., Sushil, K. and Sanjay, K. (2014) A novel idea for age invariant face recognition. Int. J. Innov. Res. Sci. Eng. Technol., 3, 15618–15624.
6 Jyothi, S.N. and Indiramma, M. (2013) Stable local feature based age invariant face recognition. Int. J. Appl. Innov. Eng. Manage., 2, 366–371.
7 Jinli, S., Xilin, C., Shiguang, S., Wen, G. and Qionghai, D. (2012) A concatenational graph evolution aging model. IEEE Trans. Pattern Anal. Mach. Intell., 34, 2083–2096.
8 Park, U., Tong, Y. and Jain, A.K. (2010) Age invariant face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 32, 947–954.
9 Fu, Y., Guo, G. and Huang, T. (2010) Age synthesis and estimation via faces: a survey. IEEE Trans. Pattern Anal. Mach. Intell., 32, 1955–1976.
10 Ramanathan, N. and Chellappa, R. (2005) Face Verification Across Age Progression. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 462–469. IEEE.
11 Dehshibi, M.M. and Bastanfard, A. (2010) A new algorithm for age recognition from facial images. Signal Processing, 90, 2431–2444.
12 Albert, A.M., Ricanek, K. and Patterson, E. (2007) A review of the literature on the aging adult skull and face: implications for forensic science research and applications. Forensic Sci. Int., 172, 1–9.
13 Choi, S.E., Lee, Y.J., Lee, S.J., Park, K.R. and Kim, J. (2011) Age estimation using hierarchical classifier based on global and local features. Pattern Recognit., 44, 1262–1281.
14 Kwon, Y. and Lobo, N. (1999) Age classification from facial images. Comput. Vis. Image Underst., 74, 1–21.
15 Horng, W.B., Lee, C.P. and Chen, C.W. (2001) Classification of age groups based on facial features. Tamkang J. Sci. Eng., 4, 183–192.
16 Thukral, P., Mitra, K. and Chellappa, R. (2012) A Hierarchical Approach for Human Age Estimation. In Proc. IEEE Int. Conf. Acoustic, Speech and Signal Processing, pp. 1529–32.
17 Farkas, L.G., Katic, M.J. and Forrest, C.R. (2005) International anthropometric study of facial morphology in various ethnic groups/races. J. Craniofacial Surg., 16, 615–646.
18 Tiwari, R., Shukla, A., Prakash, C., Sharma, D., Kumar, R. and Sharma, S. (2009) Face Recognition Using Morphological Methods. In Proc. IEEE Int. Advance Computing Conference. IEEE.
19 Fukunaga, K. (1990) Introduction to Statistical Pattern Recognition (2nd edn). Academic Press.
20 Gunay, A. and Nabiyev, V.V. (2008) Automatic Age Classification with LBP. In Proc. 23rd Int. Sympos. Computer and Information Sciences, pp. 1–4.
21 Ojala, T., Pietikainen, M. and Harwood, D. (1996) A comparative study of texture measures with classification based on featured distribution. Pattern Recognit., 29, 51–59.
22 Gunay, A. and Nabiyev, V.V. (2015) Facial age estimation based on decision level fusion of AAM, LBP and Gabor features. Int. J. Adv. Comput. Sci. Appl., 6, 19–26.
23 Cootes, T.F., Edwards, G.J. and Taylor, C.J. (2001) Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell., 23, 681–685.
24 Gabor, D. (1946) Theory of communication. J. Inst. Electr. Eng., 93, 429–457.
25 Hajizadeh, M.A. and Ebrahimnezhad, H.
(2011) Classification of Age Groups from Facial Image Using Histogram of Oriented Gradient. In Proc. 7th Iranian Machine Vision and Image Processing Conference, pp. 1–5.
26 Dalal, N. and Triggs, B. (2005) Histograms of Oriented Gradients for Human Detection. In Proc. Conf. Computer Vision and Pattern Recognition, pp. 886–893. IEEE.
27 Liu, K.H., Yan, S. and Kuo, J.C.C. (2014) Age Group Classification via Structured Fusion of Uncertainty-Driven Shape Features and Selected Surface Features. In Proc. IEEE Winter Conf. Applications of Computer Vision (WACV), pp. 445–452. IEEE.
28 Ling, H., Soatto, S., Ramanathan, N. and Jacobs, D. (2010) Face verification across age progression using discriminative methods. IEEE Trans. Inf. Forensics Secur., 5, 82–91.
29 Liu, K.H., Yan, S. and Kuo, J.C.C. (2015) Age estimation via grouping and decision fusion. IEEE Trans. Inf. Forensics Secur., 10, 2408–2423.
30 Chang, C.C. and Lin, C.J. (2011) Libsvm: a library for support vector machines. ACM Trans. Intell. Syst. Technol., 2, 27:1–27:27.
31 Lanitis, A., Taylor, J. and Cootes, T.F. (2002) Toward automatic simulation of aging effects on face images. IEEE Trans. Pattern Anal. Mach. Intell., 24, 442–455.
32 Lanitis, A., Draganova, C. and Christodoulou, C. (2004) Comparing different classifiers for automatic age estimation. IEEE Trans. Syst. Man Cybern., 34, 621–628.
33 Geng, X., Zhou, Z. and Smith-Miles, K. (2007) Automatic age estimation based on facial aging patterns. IEEE Trans. Pattern Anal. Mach. Intell., 29, 2234–2240.
34 Fu, Y. and Huang, T.S. (2008) Human age estimation with regression on discriminative aging manifold. IEEE Trans. Multimed., 10, 578–584.
35 Guo, G., Fu, Y., Dyer, C. and Huang, T. (2008) Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Trans. Image Process., 17, 1178–1188.
36 Ramanathan, N. and Chellappa, R. (2006) Modeling Age Progression in Young Faces. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 384–394. IEEE.
37 Chao, W., Liu, J. and Ding, J. (2013) Facial age estimation based on label-sensitive learning and age-oriented regression. Pattern Recognit., 46, 628–641.
38 Sai, P., Wang, J. and Teoh, E. (2015) Facial age range estimation with extreme learning machines. Neurocomputing, 149, 364–372.
39 Huang, G.B., Zhu, Q.Y. and Siew, C.K. (2006) Extreme learning machines: theory and applications. Neurocomputing, 70, 489–501.
40 Wang, J., Yau, W. and Wang, H.L. (2009) Age Categorization via ECOC with Fused Gabor and LBP Features. In Proc. IEEE Applications of Computer Vision (WACV), pp. 313–318. IEEE.
41 Lienhart, R. and Maydt, J. (2002) An extended set of Haar-like features for rapid object detection. IEEE ICIP, 1, 900–903.
42 Angulu, R., Tapamo, J.R. and Aderemi, O.A. (2017) Landmark Localization Approach for Facial Computing. In Proc. IEEE Conf. ICT and Society. IEEE.
43 Viola, P. and Jones, M. (2001) Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proc. IEEE Int. Conf. Computer Vision and Pattern (CVPR), pp. 8–14. IEEE.
44 Castrillon-Santana, M., Deniz-Suarez, O., Hernandez, T.M. and Guerra, A.C. (2007) Real-time detection of multiple faces at different resolutions in video streams. J. Vis. Commun. Image Representation, 18, 130–140.
45 Castrillon-Santana, M., Deniz-Suarez, O., Anton-Canalis, L.
and Lorenzo-Navarro, J. (2008) Face and Facial Feature Detection Evaluation. In Third Int. Conf. Computer Vision Theory and Applications.
46 Suo, J., Wu, T., Zhu, S., Shan, S., Chen, X. and Gao, W. (2008) Design Sparse Features for Age Estimation Using Hierarchical Face Model. In Proc. IEEE Conf. Automatic Face and Gesture Recognition. IEEE.
47 Manjunath, B.S. and Ma, W.Y. (1996) Texture features for browsing, retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell., 18, 837–842.
48 Takimoto, H., Mitsukura, Y., Fukumi, M. and Akamatsu, N. (2008) Robust gender and age estimation under varying facial poses. Electron. Commun. Jpn., 91, 32–40.
49 Hayashi, J., Yasumoto, M., Ito, H. and Koshimizu, H. (2001) A Method for Estimating and Modeling Age and Gender Using Facial Image Processing. In Proc. 7th Int. Conf. Virtual Systems and Multimedia.
50 Jung, H.G. and Kim, J. (2010) Constructing a pedestrian recognition system with a public open database, without the necessity of re-training: an experimental study. Pattern Anal. Appl., 13, 223–233.
51 Ojala, T., Pietikainen, M. and Maenpaa, T. (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell., 24, 971–987.
52 Maenpaa, T. and Pietikainen, M. (2005) Texture Analysis with Local Binary Patterns: Handbook of Pattern Recognition and Computer Vision. World Scientific.
53 Ahonen, T., Hadid, A. and Pietikainen, M. (2004) Face Recognition with Local Binary Patterns. In Euro. Conf. Computer Vision, pp. 469–481.
54 Duta, N., Jain, A.K. and Dubuisson-Jolly, M.P. (2001) Automatic construction of 2d shape model. IEEE Trans. Pattern Anal. Mach. Intell., 23, 433–446.
55 Fisher, R.A. (1938) The statistical utilization of multiple measurements. Ann. Eugen., 8, 376–386.
56 Belhumeur, P.N., Hespanha, J.P. and Kriegman, D.J. (1997) Eigenfaces vs Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell., 19, 711–720.
57 Swets, D.L. and Weng, J.J. (1996) Using discriminant eigenfeatures for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell., 18, 71–86.
58 Martinez, A.M. and Kak, A.C. (2001) PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell., 23, 228–233.
59 Dongarra, J.J., Croz, J.D., Hammarling, S. and Duff, I.S. (1990) A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw., 16, 1–17.
60 Ross, A. and Jain, A. (2003) Information fusion in biometrics. Pattern Recognit. Lett., 24, 2115–2125.
61 Eskandari, M., Onsen, T. and Hasan, D. (2013) A new approach for face-iris multimodal biometric recognition using score fusion. Int. J. Pattern Recognit. Artif. Intell., 27, 1–15.
62 Ross, A. and Jain, A.K. (2003) Information fusion in biometrics. Pattern Recognit. Lett., 24, 2115–2125.
63 Kittler, J.V., Hatef, M., Duin, R.P.W. and Matas, J. (1998) On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell., 20, 226–239.
64 Mangai, U.G., Samanta, S., Das, S. and Chowdhury, P.R. (2010) A survey of decision fusion and feature fusion strategies for pattern classification. IETE Tech. Rev., 27, 293–307.
65 Lian, H.C. and Lu, B.L.
(2005) Age Estimation Using a Min–Max Modular Support Vector Machine. In Proc. 12th Int. Conf. Neural Information Processing, pp. 83–88.
66 Gao, F. and Ai, H. (2009) Face Age Classification on Consumer Images with Gabor Feature and Fuzzy LDA Method: Lecture Notes in Computer Science. In Proc. 3rd Int. Conf. Advances in Biometrics, pp. 132–141.
67 Hewahi, N., Olwan, A., Tubeel, N., El-Asar, S. and Abu-Sultan, Z. (2010) Age estimation based on neural networks using face features. J. Emerg. Trends Comput. Inf. Sci., 1, 61–67.
68 FG-NET (2002). Face and Gesture Recognition Working Group. http://www-prima.inrialpes.fr/FGnet/.

Footnotes
1 Generalized Matrix Multiplication (GEMM) in Basic Linear Algebra Subprograms (BLAS) Level 3, http://www.netlib.org/blas/

Author notes
Handling editor: Fionn Murtagh

© The British Computer Society 2018. All rights reserved. For permissions, please email: journals.permissions@oup.com

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Journal

The Computer Journal, Oxford University Press

Published: Mar 1, 2019

References