Discriminative Correlation Filter for Long-Time Tracking

Tao Song

The Computer Journal, DOI: 10.1093/comjnl/bxz049

Abstract

Object tracking is an important step in building intelligent video monitoring systems that protect people's lives and property. Although visual tracking has made great progress in recent years in terms of speed and accuracy, real-time high-precision tracking algorithms are still rare. Discriminative correlation filters offer excellent tracking speed, but they have deficiencies in handling fast motion, which prevents long-term stable tracking. The long-time tracking discriminative correlation filter (LT-DCF) is proposed to address these deficiencies. We use larger detection image blocks and smaller filters to increase the proportion of real samples and thereby counter the boundary effects caused by fast motion, and we combine histogram of oriented gradients (HOG) features with scale-invariant feature transform (SIFT) key point detection to overcome the obstacles caused by scale variations. A detector with deep feature flow is then incorporated into the tracker to verify key frames and improve tracking accuracy. The method achieves more than 75% distance precision and a 70% overlap success rate on the VOT2015 and VOT2016 datasets, and the stable tracking length reaches 6895 frames.

1. INTRODUCTION

Visual object tracking methods are generally divided into two categories: generative models and discriminative models. Generative models build an appearance model of the target and predict its position by searching the next frame for the region most similar to that model. The biggest difference between the two categories is that discriminative models rely on machine learning: background information is used during training so that the classifier can focus on distinguishing foreground from background. However, machine-learning-based methods tend to track slowly and require a large number of training samples in advance, whereas intelligent video surveillance systems are expected to detect and track in real time. Therefore, we adopt the generative approach.

Object tracking faces several major difficulties, e.g. appearance deformation, illumination changes, rapid motion and motion blur, similar background interference, out-of-plane rotation, in-plane rotation, scale changes, occlusion and the target leaving the field of view. To cope with these difficulties, we incorporate scale adaptation and boundary-effect handling into the discriminative correlation filter with channel and spatial reliability (CSR-DCF) [1–4].

A correlation filter (CF) tracker that does not use deep learning is also called a discriminative correlation filter. The biggest advantage of a CF is its high speed. However, a CF cannot handle rapid deformation and rapid movement well, because it relies on template matching. This type of algorithm represents the target with a rectangular or elliptical template, and the motion of the target is described by the coordinate transformation of the template; different transformation parameters correspond to different image areas. The template-update strategy is to take the matched image area as the new template every few frames. During tracking, however, the target can gradually move out of the template while background objects gradually move into it, resulting in loss of tracking. This phenomenon is called template drift.
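To make the correlation-filter localization step concrete, the following minimal sketch (our illustration under standard assumptions, not code from the paper) correlates a learned filter with a grayscale search patch in the Fourier domain; the peak of the response map gives the target displacement. Zero-padding a filter that is smaller than the search patch mirrors the larger-block/smaller-filter idea discussed later.

```python
import numpy as np

def cf_response(patch, filt):
    """Correlate a grayscale search patch with a learned filter in the
    Fourier domain and return the dense response map."""
    F = np.fft.fft2(patch)
    H = np.fft.fft2(filt, s=patch.shape)   # zero-pad the smaller filter to the patch size
    return np.real(np.fft.ifft2(np.conj(H) * F))

def locate(patch, filt):
    """Return the (dy, dx) displacement of the response peak, unwrapping
    shifts larger than half the patch (circular correlation)."""
    r = cf_response(patch, filt)
    dy, dx = np.unravel_index(np.argmax(r), r.shape)
    dy = dy - r.shape[0] if dy > r.shape[0] // 2 else dy
    dx = dx - r.shape[1] if dx > r.shape[1] // 2 else dx
    return dy, dx
```

Because the whole search window is evaluated with a single FFT-based correlation, this localization step is what gives CF trackers their characteristic speed.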
Tracking failures under fast motion are mainly caused by the boundary effect: the erroneous samples it generates leave the classifier with insufficient discriminative power. Our main idea is to use larger detection image blocks and smaller filters to increase the proportion of real samples [5]. The scale-invariant feature transform (SIFT) is a local feature that can locate the target accurately; therefore, to deal with scale changes, we introduce SIFT key point detection into our algorithm. To achieve long-term stable tracking, we also add a detector with deep feature flow to the improved CSR-DCF algorithm.

In summary, this paper makes the following three contributions:
(i) We use larger detection image blocks and smaller filters to increase the proportion of real samples, solving the target-loss problem caused by fast motion of the target.
(ii) We combine SIFT key point detection with the discriminative correlation filter, solving the tracking failures caused by scale variation.
(iii) We combine a detector with deep feature flow with the improved CSR-DCF tracker, making tracking more accurate while meeting real-time requirements.

2. RELATED WORK

Correlation filtering was applied to object tracking as early as 2010 [2] and received widespread attention for its remarkable frame rate (frames per second, fps). However, it still has many shortcomings in the face of rapid movement. Object detection is also a topic of intense current interest and, owing to the rapid development of machine learning, has achieved tremendous success.

2.1. Proposed tracking methods

The discriminative correlation filter (DCF) excels in visual tracking. However, existing DCFs learn filters separately from feature extraction and update these filters by moving-average operations with empirical weights. These restrictions raise two issues: whether DCFs and feature representations can be modeled end-to-end, and whether DCFs can be updated in other effective ways. To address these two issues, Song et al. proposed convolutional residual learning for visual tracking (CREST) [6, 7]. They reformulated DCFs as a one-layer convolutional neural network that directly generates the response map as the spatial correlation between two consecutive frames. In addition, they applied residual learning to capture object appearance changes with reference to spatiotemporal frames. Galoogahi et al. [8] proposed a background-aware CF based on hand-crafted features that can efficiently model how both the foreground and the background of the object vary over time. A new CF for real-time visual tracking was also proposed: unlike prior CF trackers, in which negative examples are limited to circularly shifted patches, this tracker is trained on real negative examples densely extracted from the background. In addition, an efficient alternating direction method of multipliers (ADMM) was proposed for learning filters over multi-channel features. Shou et al. [9, 10] designed a novel Convolutional-De-Convolutional (CDC) network that places CDC filters on top of three-dimensional ConvNets, which have been shown to be effective for abstracting action semantics but reduce the temporal length of the input data.
In pixel-level semantic segmentation, deconvolution has proved to be an effective upsampling method, for both images and video, for producing output of the same resolution as the input. In the temporal localization problem, the temporal length of the output should match that of the input video. The proposed CDC filter performs the required temporal upsampling and spatial downsampling simultaneously to predict actions at frame-level granularity. To make tracking algorithms real-time with high accuracy, Fan et al. [11–13] studied the problem from a new perspective and presented a novel parallel tracking and verifying (PTAV) framework. The PTAV framework consists of two components, a tracker T and a verifier V, working in parallel on two separate threads. The tracker T aims to provide super-real-time tracking inference; the verifier V, in contrast, checks the tracking results and corrects T when needed. To avoid heavy computation, T maintains a buffer of tracking information (e.g. intermediate states) for recent frames to facilitate fast tracing back when needed. Yeo et al. [14] proposed a novel tracking-by-segmentation framework using an absorbing Markov chain (AMC) on superpixel segmentation, where the estimated target segmentation is propagated to subsequent frames in a recursive manner. To obtain the target segmentation in the current frame, they first construct a graph for the AMC using the superpixels of the previous and current frames, where each vertex corresponds to a superpixel and the weight of each edge is given by scores learned with support vector regression. Once the graph is constructed, the target segmentation is obtained from the absorption time of each superpixel in the AMC, and the final tracking result is given by identifying the connected components of superpixels corresponding to the target.

2.2. Proposed object-detection methods

Temporal information in videos is vital for object detection. To fully utilize temporal information, state-of-the-art methods are based on spatio-temporal tubelets. Kang et al. [15] proposed a framework for object detection in videos that consists of a novel tubelet proposal network to efficiently generate spatio-temporal proposals and a long short-term memory (LSTM) network that incorporates temporal information from the tubelet proposals to achieve high object-detection accuracy in videos. To determine whether an object is in motion, irrespective of camera motion, Tokmakov et al. [16] proposed a motion pattern network (MP-Net) that takes optical flow as input and outputs a per-pixel score for moving objects. This encoder-decoder-style architecture first learns a coarse representation of the optical flow field and then refines it iteratively to produce motion labels at the original high resolution. Lea et al. [17] proposed temporal convolutional networks (TCNs) for action segmentation and detection. These networks are faster to train than competing LSTM-based recurrent neural networks. The encoder-decoder TCN (ED-TCN) uses only a hierarchy of temporal convolutions, pooling and upsampling, yet can efficiently capture long-range temporal patterns; it has a relatively small number of layers, but each layer contains a set of long convolutional filters. The dilated TCN uses dilated convolutions instead of pooling and upsampling and adds skip connections between layers.

3. LONG-TERM DETECTION TRACKER BASED ON CF
Lukezic et al. [1] introduced the concepts of channel and spatial reliability into DCF tracking and provided a novel learning algorithm for their efficient and seamless integration into filter updating and tracking. However, this method still loses the target in practical applications. To solve this problem, we propose in this paper a long-term detection tracker, the long-time tracking DCF (LT-DCF). To realize this method, the CSR-DCF method is improved and a detector mechanism is added to increase tracking accuracy without reducing tracking speed.

3.1. Improved CSR-DCF

The CSR-DCF is a discriminative CF with channel and spatial reliability. The spatial reliability map adapts the filter to the part of the object suitable for tracking, which overcomes the problems caused by circular shifts when the search range is enlarged and removes the limitations associated with the rectangular shape assumption. The second novelty of the CSR-DCF lies in the channel reliability, which is estimated from properties of the constrained least-squares solution; the channel reliability scores are used to weight the per-channel filter responses during localization. An experimental comparison with recent state-of-the-art boundary-constraint methods shows significant advantages for this approach.

First, we construct a spatial reliability map that computes the probability that each pixel belongs to the target from foreground/background color models using Bayes' rule. The prior probability is determined by the ratio of the sizes of the regions from which the foreground and background histograms are extracted. The spatial reliability map is constructed as follows:
(i) select the training patch around the tracked object boundary;
(ii) use the spatial prior as the unary term of a Markov random field optimization;
(iii) compute the object log-likelihood from the foreground/background color models;
(iv) calculate the posterior probability under Markov random field regularization;
(v) mask the training patch with the final binary reliability map.

Scale change is a basic and common problem in tracking. If the target shrinks, the filter learns a significant amount of background information; if the target grows, the filter discriminates only on local target texture. In both cases unexpected results may occur, leading to drift and tracking failure. The CSR-DCF method classifies using HOG features, and the result is not satisfactory when dealing with scale variations. SIFT key point detection copes well with scale variations, so we add a SIFT key point detection mechanism to the CSR-DCF algorithm. In the detection stage, correlation filtering is weak at detecting rapidly moving targets and easily produces boundary effects. Scale changes and boundary effects are the two main reasons why the CSR-DCF method cannot maintain long-term tracking. We therefore use larger detection image blocks and smaller filters to increase the proportion of real samples and cope with the boundary effects of fast motion.
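As an illustration of how SIFT key points can provide a scale estimate between the stored target template and the current search region, the following sketch uses OpenCV (assuming version 4.4 or later for cv2.SIFT_create). The ratio-test threshold, the distance-spread heuristic for the scale factor and the function names are our own assumptions rather than the paper's exact formulation; the min_matches check echoes the paper's observation that more than 3 matched points are needed for stable tracking.

```python
import cv2
import numpy as np

def sift_scale_change(template, search, min_matches=4):
    """Estimate the relative scale of the target between a stored template and
    the current search region from matched SIFT key points.
    Returns None when too few reliable matches are found."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(template, None)
    kp2, des2 = sift.detectAndCompute(search, None)
    if des1 is None or des2 is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        # Lowe's ratio test keeps only distinctive matches
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    if len(good) < min_matches:
        return None
    p1 = np.float32([kp1[m.queryIdx].pt for m in good])
    p2 = np.float32([kp2[m.trainIdx].pt for m in good])
    # spread of matched points around their centroid approximates target size
    d1 = np.linalg.norm(p1 - p1.mean(axis=0), axis=1).mean()
    d2 = np.linalg.norm(p2 - p2.mean(axis=0), axis=1).mean()
    return d2 / max(d1, 1e-6)
```

A returned factor above 1 suggests the target has grown relative to the template, so the filter window and the template can be rescaled accordingly.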
3.2. Detector with deep feature flow

Single-frame detection on video is difficult to run in real time because of its large computational cost. Zhu et al. [18] therefore proposed the deep feature flow algorithm, which runs the computationally expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to the other frames through a flow field. The algorithm is significantly accelerated because optical flow computation is relatively fast. The feed-forward convolutional network N is decomposed into two consecutive sub-networks. The first, Nfeat (the feature network), is a fully convolutional network that outputs intermediate feature maps. The second, Ntask (the task network), has a task-specific structure and performs recognition on the feature maps.

3.3. Discriminative correlation filter for long-time tracking

Although we hope the proposed tracker can achieve long-term correct tracking, the improved CSR-DCF alone cannot fully realize this. When the improved CSR-DCF method is used by itself and similar objects are present, the tracker may choose the wrong target box and then keep updating its training samples with the wrong target until the model is beyond recognition. Therefore, we attach a detector to the ordinary tracker: the detector is invoked during tracking and corrects the tracker when an error occurs. The tracker and the detector are handled separately. The detector only processes key frames and passes information to the following frames through the optical flow field. When the detector finds a tracking problem, the message is immediately passed to the tracker and the tracker re-tracks. Figure 1 shows the system framework of the LT-DCF.

FIGURE 1. The system framework of the LT-DCF. The red dotted lines mark the individual frames, a check mark indicates a frame verified by the detector, a cross mark indicates a frame not passed by the verifier, and the red box on the right indicates a tracking error.
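The following sketch summarizes this tracking-plus-verification control loop. It is only an outline of the scheme described above: the tracker and detector objects, their init/update/reinit/detect interfaces and the 0.5-IoU agreement threshold are placeholders we introduce for illustration; the key-frame interval of 10 matches the setting reported in Section 4.2.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[0] + a[2], b[0] + b[2]), min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def track_sequence(frames, tracker, detector, interval=10, agree_iou=0.5):
    """Control loop for tracking with periodic verification. `tracker` and
    `detector` are placeholder objects: the tracker exposes init/update/reinit,
    and the detector exposes detect(frame) returning a box or None."""
    box = tracker.init(frames[0])
    results = [box]
    for i, frame in enumerate(frames[1:], start=1):
        box = tracker.update(frame)          # correlation-filter step on every frame
        if i % interval == 0:                # key frame: run the detector
            det = detector.detect(frame)
            if det is not None and iou(box, det) < agree_iou:
                box = det                    # detection overrides a drifted tracker
                tracker.reinit(frame, box)   # re-initialise the filter on the detection
        results.append(box)
    return results
```

Keeping the expensive detector off the per-frame critical path in this way is what preserves the real-time frame rates reported in the experiments.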
4. EXPERIMENTAL ANALYSIS

Here, we present a comprehensive experimental evaluation of the proposed LT-DCF tracker. The datasets are described in Section 4.1, the implementation details are discussed in Section 4.2, the experimental results of the LT-DCF are provided in Section 4.3, the LT-DCF method is compared with other methods in Section 4.4 and Section 4.5 contains a detailed experimental analysis.

4.1. Experimental data

The VOT2015 benchmark [19] includes the results of 63 state-of-the-art trackers evaluated on 60 challenging video sequences. The VOT2015 dataset was selected from a pool of roughly 300 sequences using an advanced sequence-selection methodology, making it a highly challenging sequence set. The basic VOT measures are the number of failures during tracking (robustness) and the average overlap during successful tracking (accuracy), while the main VOT2015 measure is the expected average overlap on short-term sequences [20], which can be regarded as an unbiased estimate of the expected average overlap. The VOT2016 dataset [21] contains the 60 sequences of VOT2015 with improved annotations, and VOT2016 evaluated a set of 70 trackers. This dataset is diverse, and the best-performing trackers come from various families, such as CF methods, deep convolutional networks and different detection-based methods. In recent years, great progress has also been made in sharing code and datasets. The OTB100 benchmark contains the results of 29 trackers evaluated on 100 sequences under a no-reset evaluation protocol. Tracking quality is measured by precision and success plots. The success plot shows the proportion of frames in which the overlap between the predicted and ground-truth bounding boxes exceeds a threshold, evaluated over all threshold values. To reduce clutter in the graphs, we show only the results of the top-performing recent baselines.

4.2. Implementation details and parameters

To test the performance of our algorithm, it was implemented in MATLAB 2016a. Video sequences (provided by the VOT and OTB data platforms) were tested on a computer with an Intel® Core™ i5-7300HQ 2.50 GHz CPU and 8 GB of memory. Standard HOG [22] and Colornames features are used in the correlation filter, and HSV (hue, saturation, value) foreground/background color histograms with 16 bins per color channel are used in reliability-map estimation with the parameter αmin = 0.05. All parameters are set to values commonly used in the literature: the histogram adaptation rate is set to ηc = 0.04, the correlation filter adaptation rate to η = 0.02 and the regularization parameter to λ = 0.01. The augmented Lagrangian optimization parameters are set to μ0 = 5 and β = 3. By observing the experimental results, we found that when the number of matching points is greater than 3, the SIFT feature can track the target stably and accurately. Balancing accuracy and speed, we set the time interval to 10 in the experiments, i.e. the detector selects key frames at a fixed interval of 10 frames.

4.3. Experimental results

On the VOT2015 dataset, the LT-DCF achieved a 77.5% distance precision rate (DPR) and a 70.3% overlap success rate (OVR) at a frame rate of 15.4 fps. On the VOT2016 dataset, the LT-DCF achieved a DPR of 79.6% and an OVR of 74.8% at 16.8 fps. On the OTB dataset, although the LT-DCF only achieved a DPR of 72.6% and an OVR of 69.1%, the frame rate was 18.2 fps. Figures 2 and 3 show the results of several experiments on VOT2015 and VOT2016, respectively. Table 1 lists the DPR, OVR and frames per second (fps) obtained by the LT-DCF method on each dataset.

FIGURE 2. LT-DCF tracking results on VOT2015.

FIGURE 3. LT-DCF tracking results on VOT2016.

Table 1. Experimental results of the LT-DCF method on the VOT2015, VOT2016 and OTB datasets.
Dataset    DPR (%)   OVR (%)   FPS
VOT2015    77.5      70.3      15.4
VOT2016    79.6      74.8      16.8
OTB        72.6      69.1      18.2

The experiments show that the LT-DCF algorithm tracks well when dealing with scale variations and fast motion. Figure 4 shows the tracking results in scenes with fast motion, and Figure 5 shows the tracking results of the LT-DCF algorithm in scenes with scale variations. The number in the upper left corner is the current frame of the video sequence.

FIGURE 4. The tracking results of LT-DCF in scenes of fast motion.

FIGURE 5. The tracking results of LT-DCF in scenes of scale variations.
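The DPR and OVR values reported above follow the standard center-error and overlap definitions used by the OTB/VOT tool chains; the sketch below shows one common way of computing them, with the conventional 20-pixel and 0.5-IoU thresholds as assumed defaults rather than values stated in the paper.

```python
import numpy as np

def dpr_ovr(pred, gt, dist_thresh=20.0, iou_thresh=0.5):
    """Distance precision rate (DPR): fraction of frames whose predicted centre
    lies within `dist_thresh` pixels of the ground-truth centre.
    Overlap success rate (OVR): fraction of frames whose IoU with the ground
    truth reaches `iou_thresh`. Boxes are per-frame arrays of (x, y, w, h)."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    # centre-distance precision
    dist = np.linalg.norm((pred[:, :2] + pred[:, 2:] / 2) -
                          (gt[:, :2] + gt[:, 2:] / 2), axis=1)
    # intersection-over-union per frame
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    ious = inter / np.maximum(union, 1e-9)
    return (dist <= dist_thresh).mean(), (ious >= iou_thresh).mean()
```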
4.4. Experimental comparison

The algorithms chosen for the spatial robustness evaluation (SRE) comparison are ASLA [23], DFT [24], MTT [25], MIL [26], DLT [27], Struck [28], LCT [29], HCF [30] and TLD [31]. They are state-of-the-art trackers with a wide range of applicability. ASLA, MTT, LCT and DLT adopt particle filters; DFT adopts local optimal search; MIL, Struck and TLD adopt sliding windows; and HCF adopts convolutional neural networks. The tracking results of the different algorithms are represented by different colors and marker shapes. Figure 6 shows an experimental comparison of the various algorithms under different conditions.

FIGURE 6. (a) Experimental results of the LT-DCF algorithm in occluded scenes. (b) Experimental results of the LT-DCF algorithm in scenes with scale variations. (c) Experimental results of the LT-DCF algorithm in scenes with fast motion. (d) Experimental results of the LT-DCF algorithm in scenes with illumination variations.

The OTB100 measurements show that the LT-DCF algorithm performs significantly better than the other methods. LT-DCF ranks first when there are occlusions or scale variations in the scene, but it is not as good as Struck in dealing with fast motion and is inferior to the Struck and ASLA algorithms in dealing with illumination variations. Overall, the results show that our algorithm achieves the desired effect.

4.5. Experimental analysis

We also applied the proposed method to other real-world projects. After verification, the proposed method can track stably for a long time (the longest video tested had 6895 frames). Our long-term tracking discriminative filter can handle deformation and rotation to a large extent. Even in the event of a short drift, the detector can sense the drift and then detect the correct target for subsequent tracking. In the detector, different verification intervals affect accuracy and efficiency. Smaller time intervals mean more frequent validation, which requires more computation and reduces efficiency. Conversely, a larger time interval costs less computation but may lose the target when it changes rapidly. If the tracker loses the target object, it may absorb a large amount of background into its appearance model until the next validation; even if the verifier then repositions the target and provides a correct detection, the tracker may still lose the target because its appearance model has changed significantly.

5. CONCLUSIONS

In this article, we proposed a method for long-term, real-time object tracking. To achieve long-term stable tracking, we combine the tracker with a detector; to achieve real-time performance, we use the DCF method and a detector with deep feature flow. We have also effectively improved the robustness of the tracking system in coping with scale variations and fast motion.
Experiments were conducted on the VOT2015 and VOT2016 datasets and on other projects, with good results. When the environment is not highly volatile and there are few objects similar to the target in the background, the method can effectively achieve long-term stable tracking. In general, our algorithm is more robust than the compared algorithms in most occluded scenes and under target deformation, so it provides more accurate tracking results. In addition, LT-DCF runs fast and can meet real-time requirements. A remaining drawback is that LT-DCF is not as effective as the Struck and ASLA algorithms in dealing with illumination variation. Since Gabor wavelets are insensitive to illumination changes, we will combine our algorithm with Gabor wavelets in future work.

REFERENCES

1 Lukezic, A., Vojir, T., Zajc, L., Matas, J. and Kristan, M. (2017) Discriminative Correlation Filter with Channel and Spatial Reliability. 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, pp. 4847–4856. IEEE.
2 Bolme, D.S., Beveridge, J.R., Draper, B.A. and Lui, Y.M. (2010) Visual Object Tracking Using Adaptive Correlation Filters. 2010 IEEE Computer Society Conf. Computer Vision and Pattern Recognition, San Francisco, CA, USA, June 13–18, pp. 2544–2550. IEEE.
3 Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2010) Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, Boston, USA.
4 Danelljan, M., Häger, G., Khan, F.S. and Felsberg, M. (2014) Accurate Scale Estimation for Robust Visual Tracking. British Machine Vision Conf., Nottingham, September 1–5, pp. 1–11. BMVA Press.
5 Galoogahi, H.K., Sim, T. and Lucey, S. (2015) Correlation Filters with Limited Boundaries. 2015 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 7–12, pp. 4630–4638. IEEE.
6 Song, Y., Ma, C., Gong, L., Zhang, J. and Lau, R. (2017) CREST: Convolutional Residual Learning for Visual Tracking. 2017 IEEE Int. Conf. Computer Vision (ICCV), Venice, Italy, October 22–29, pp. 2574–2583. IEEE.
7 He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep Residual Learning for Image Recognition. 2016 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 27–30, pp. 770–778. IEEE.
8 Galoogahi, H.K., Fagg, A. and Lucey, S. (2017) Learning Background-Aware Correlation Filters for Visual Tracking. 2017 IEEE Int. Conf. Computer Vision (ICCV), Venice, Italy, October 22–29, pp. 1144–1152. IEEE.
9 Shou, Z., Chan, J. and Zareian, A. (2017) CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos. 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, pp. 1417–1426. IEEE.
10 Shou, Z., Wang, D. and Chang, S.F. (2016) Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs. 2016 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 27–30, pp. 1049–1058. IEEE.
11 Fan, H. and Ling, H. (2017) Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking. 2017 IEEE Int. Conf. Computer Vision (ICCV), Venice, Italy, October 22–29, pp. 5487–5495. IEEE.
12 Danelljan, M., Hager, G., Khan, F.S. and Felsberg, M. (2016) Discriminative scale space tracking. IEEE Trans. Pattern Anal. Mach. Intell., 10, 1561–1575.
13 Galoogahi, H.K., Sim, T. and Lucey, S. (2013) Multi-Channel Correlation Filters. IEEE Int. Conf. Computer Vision (ICCV), pp. 3072–3079.
14 Yeo, D., Son, J., Han, B. and Han, J.H. (2017) Superpixel-Based Tracking-by-Segmentation Using Markov Chains. 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, pp. 511–520. IEEE.
15 Kang, K., Li, H., Xiao, T., Ouyang, W., Yan, J., Liu, X. and Wang, X. (2017) Object Detection in Videos with Tubelet Proposal Networks. 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, pp. 889–897. IEEE.
16 Tokmakov, P., Alahari, K. and Schmid, C. (2017) Learning Motion Patterns in Videos. 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, pp. 531–53. IEEE.
17 Lea, C., Flynn, M.D., Vidal, R., Reiter, A. and Hager, G.D. (2017) Temporal Convolutional Networks for Action Segmentation and Detection. 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, pp. 1003–1012. IEEE.
18 Zhu, X., Xiong, Y., Dai, J., Yuan, L. and Wei, Y. (2017) Deep Feature Flow for Video Recognition. 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26, pp. 4141–4150. IEEE.
19 Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Cehovin, L., Fernandez, G., Vojir, T. and Hager, G. (2015) The Visual Object Tracking VOT2015 Challenge Results. 2015 IEEE Int. Conf. Computer Vision Workshop (ICCVW), Santiago, Chile, December 7–13, pp. 564–586. IEEE.
20 Kristan, M., Matas, J., Leonardis, A., Vojíř, T., Pflugfelder, R., Fernández, G., Nebehay, G., Porikli, F. and Čehovin, L. (2016) A novel performance evaluation methodology for single-target trackers. IEEE Trans. Pattern Anal. Mach. Intell., 38, 2137–2155.
21 Kristan, M. et al. (2016) The Visual Object Tracking VOT2016 Challenge Results. IEEE Int. Conf. Computer Vision Workshops, pp. 777–823.
22 Danelljan, M., Hager, G., Khan, F.S. and Felsberg, M. (2015) Learning Spatially Regularized Correlation Filters for Visual Tracking. 2015 IEEE Int. Conf. Computer Vision (ICCV), Santiago, Chile, December 7–13, pp. 4310–4318. IEEE.
23 Jia, X., Lu, H. and Yang, M.H. (2012) Visual Tracking via Adaptive Structural Local Sparse Appearance Model. 2012 IEEE Conf. Computer Vision and Pattern Recognition, Providence, RI, USA, June 16–21, pp. 1822–1829. IEEE.
24 Learned-Miller, E. and Sevilla-Lara, L. (2012) Distribution Fields for Tracking. 2012 IEEE Conf. Computer Vision and Pattern Recognition, Providence, RI, USA, June, pp. 1910–1917. IEEE.
25 Zhang, T., Ghanem, B., Liu, S. et al. (2012) Robust Visual Tracking via Multi-Task Sparse Learning. 2012 IEEE Conf. Computer Vision and Pattern Recognition, Providence, RI, USA, June 16–21, pp. 2042–2049. IEEE.
26 Babenko, B., Yang, M.H. and Belongie, S. (2009) Visual Tracking with Online Multiple Instance Learning. IEEE Conf. Computer Vision and Pattern Recognition, pp. 983–990.
27 Wang, N. and Yeung, D.Y. (2013) Learning a Deep Compact Image Representation for Visual Tracking. Int. Conf. Neural Information Processing Systems, pp. 809–817.
28 Hare, S., Golodetz, S., Saffari, A., Vineet, V., Cheng, M.M., Hicks, S.L. and Torr, P.H. (2016) Struck: structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell., 38, 2096–2109.
29 Ma, C., Yang, X., Zhang, C. and Yang, M.H. (2015) Long-Term Correlation Tracking. 2015 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 7–12, pp. 5388–5396. IEEE.
30 Ma, C., Huang, J.B., Yang, X. and Yang, M.H. (2015) Hierarchical Convolutional Features for Visual Tracking. 2015 IEEE Int. Conf. Computer Vision (ICCV), Santiago, Chile, December 7–13, pp. 3074–3082. IEEE.
31 Han, K.H. and Kim, J.H. (2004) Quantum-inspired evolutionary algorithms with a new termination criterion. IEEE Trans. Evol. Comput., 8, 156–169.

© The British Computer Society 2019. All rights reserved.