Digital video stabilization based on adaptive camera trajectory smoothing

Digital video stabilization based on adaptive camera trajectory smoothing The development of multimedia equipments has allowed a significant growth in the production of videos through professional and amateur cameras, smartphones, and other mobile devices. Examples of applications involving video processing and analysis include surveillance and security, telemedicine, entertainment, teaching, and robotics. Video stabilization refers to the set of techniques required to detect and correct glitches or instabilities caused during the video acquisition process due to vibrations and undesired motion when handling the camera. In this work, we propose and evaluate a novel approach to video stabilization based on an adaptive Gaussian filter to smooth the camera trajectories. Experiments conducted on several video sequences demonstrate the effectiveness of the method, which generates videos with adequate trade-off between stabilization rate and amount of frame pixels. Our results were compared to YouTube’s state-of-the-art method, achieving competitive results. Keywords: Video stabilization, Interest points, Trajectory smoothing, Video processing 1 Introduction corrected when it should not. Recent approaches have The availability of new digital technologies [1–8]and the used optimizations [23, 24] to control the local smoothing reduction of equipment costs have facilitated the genera- intensity. tion of large volumes of videos in high resolutions. Several As a main contribution, this work presents and eval- devices have allowed the acquisition and editing of videos, uates a novel technique for video stabilization based on such as digital cameras, smartphones, and other mobile an adaptive Gaussian filter to smooth the camera trajec- devices. tories. Experiments demonstrate the effectiveness of the A large number of applications involve the use of digital method, which generates videos with proper stabiliza- videos such as telemedicine, advertising, entertainment, tion rate while maintaining a reasonable amount of frame robotics, teaching, autonomous vehicles, surveillance, and pixels. security. Due to the large amount of video that are cap- The proposed method can be seen as an alternative tured, stored, and transmitted, it is fundamental to inves- to optimization approaches recently developed in the lit- tigate and develop efficient multimedia processing and erature [23, 24] with a lower computational cost. The analysis techniques for indexing, browsing, and retrieving results are compared to different versions of Gaussian video content [9–11]. filter, Kalman filter, and the video stabilization method Video stabilization [12–21]aimstocorrect camera employed in YouTube [24], which is considered a state-of- motion oscillations that occur in the acquisition process, the-art approach. particularly when the cameras are mobile and handled by This paper is organized as follows. Some relevant con- amateurs. cepts and related work are briefly described in Section 2. Several low-pass filters have been employed in the sta- The proposed method for video stabilization is detailed bilization process [20, 22]. However, their straightforward in Section 3. Experimental results are presented and dis- application using fixed intensity along all the videos is cussed in Section 4. Finally, some final remarks and not suitable, since the camera motion may be unduly directions for future work are included in Section 5. *Correspondence: helio@ic.unicamp.br Institute of Computing, University of Campinas, Av. Albert Einstein 1251, Campinas 13083-852, Brazil © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 2 of 11 2 Background Many methods have employed different motion fil- Different categories of stabilization approaches [25–32] tering mechanisms, such as motion vector integration have been developed to improve the quality of videos, [39], Kalman filter [42, 43], particle filter [44], and reg- which can be broadly classified as mechanical stabiliza- ularization [37]. Such mechanisms aim to remove high- tion, optical stabilization, and digital stabilization. frequency instability from camera motion [36]. Other Mechanical stabilization typically uses sensors to detect approaches have focused on improving the quality of the camera shifts and compensate for undesired motion. A videos, often lost in the stabilization process. The most common way is to use gyroscopes to detect motion and commonly used techniques include inpainting to fill miss- send signals to motors connected to small wheels so that ing frame parts [22, 41, 45], deconvolution to improve the the camera can move in the opposite direction of motion. video focus [22, 41], and weighting of stabilization metrics The camera is usually positioned on a tripod. Despite the and video quality aspects [36, 45]. efficiency usually obtained with this type of system, there Recent improvements in 2D methods have made them are disadvantages in relation to the resources required, comparable to 3D methods in terms of quality. For such as device weight and battery consumption. instance, the use of an L1-norm optimization can gener- Optical stabilization [33] is widely used in photographic ate a camera path that follows cinematographic rules in cameras and consists of a mechanism to compensate for order to consider separately constant, linear, and parabolic the angular and translational motion of the cameras, sta- motion [24]. A mesh-based model, in which multiple tra- bilizing the image before it is recorded on the sensor. jectories are calculated at different locations of the video, A mechanism for optical stabilization introduces a gyro- proved to be efficient in dealing with parallax without the scope to measure velocity differences at distinct instants use of 3D methods [23]. A semi-automatic 2D method, in order to distinguish between normal and undesired which requires assistance from the user, is proposed to motion. Other systems employ a set of lenses and sensors adjust problematic frames [46]. to detect angle and speed of motion for video stabilization. On the other hand, 3D methods typically construct a Digital stabilization of videos is implemented without three-dimensional model of the scene through structure- the use of special devices. In general, undesired camera from-motion (SFM) techniques for smoothing motion motion is estimated by comparing consecutive frames and [47], providing superior quality stabilization but at a applying a transform to the video sequence to compen- higher computational cost [27, 47]. Since they usually sate for motion. These techniques are typically slower have serious problems in handling large objects in the when compared to optical techniques; however, they can foreground [36], 2D methods are in general preferred in achieve adequate results in terms of quality and speed, practice [36]. depending on the algorithms used. Although 3D methods can generate good results in Methods found in the literature for digitally stabiliz- static scenes using image-based rendering techniques ing videos are usually classified into two-dimensional (2D) [34, 48], they usually do not handle dynamic scenes or three-dimensional (3D) categories. Sequences of 2D correctly, causing motion blur [49]. Thus, the con- transformations are employed in the first category to cept of content preservation was introduced, restrict- represent camera motion and stabilize the videos. Low- ing each output frame to be generated from a single pass filters can be used to smooth the transformations, input frame [49]. Other approaches address this prob- reducing the influence of high frequency of the camera lem through a geometric approximation by abdicating to [20, 22]. In the second category, camera trajectories are be robust with respect to the parallax [35]. Other diffi- reconstructed from 3D transformations [34, 35], such as culties found in 3D methods appear in amateur videos, scaling, translation, and rotation. such as lack of parallax, zoom, and use of complemen- Approaches that use 2D transformations focus on con- tary metal oxide semiconductor (CMOS) sensors, among tributing to specific steps in their stabilization process others [47]. [36]. By considering the estimation of camera motion, Although not common, 3D methods can fill missing 2D methods can be further subdivided into two cate- parts of a frame by using information from several other gories [32]: (i) intensity-based approaches [37, 38], which frames [48]. More recently, 2D and 3D methods have been directly use the texture of the images as motion vector, extended to deal with stereoscopic videos [29, 50]. Hybrid and (ii) keypoint-based approaches [39, 40], which locate approaches have emerged to obtain the efficiency and a set of corresponding points in adjacent frames. Since robustness of 2D methods in addition to the high qual- keypoint-based approaches have a lower computational ity of 3D methods. Some of them are based on concepts cost, they are most commonly used [22]. Techniques such such as trajectories subspace [47] and epipolar transfer as the extraction of regions of interest can be used in this [51]. A hybrid method for dealing with discrete depth vari- step, in order to avoid cutting certain objects or regions ations present in short-distance videos was described by that are supposed to be important to the observer [41]. Liu et al. [31]. Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 3 of 11 3 Adaptive video stabilization method and those that describe the movement of objects. In This section presents the proposed video stabilization the application of this method, the value of the residual method based on an adaptive Gaussian smoothing of the threshold parameter, which determines the maximum camera trajectories. Figure 1 illustrates the main stages error for a match to be considered as inlier, is calcu- of the methodology, which are described in the following lated for each pair of frames. Algorithm 1 presents the subsections. calculation to determine the final similarity matrix. 3.1 Keypoint detection and matching The process starts with the detection and description of Algorithm 1 Similarity matrix computation keypoints in the video frames. In this step, we used the 1: procedure FINALMATRIX speeded up robust features (SURF) method [52]. After 2: Generate the similarity matrix M by considering extracting the keypoints between two adjacent frames, all matches. their correspondence is performed using the brute-force 3: Let MSE(M) be the mean square error of matrix method with cross-checking, where the Euclidean dis- M. tance between the feature vectors for each pair of points 4: Apply the RANSAC considering the MSE(M) as x ∈ f and x ∈ f is calculated for two adjacent frames residual threshold value. i t t+1 f and f .Thus, x corresponds to x if and only if x is the 5: Generate the similarity matrix M considering t t+1 i i only the inliers obtained previously. closest point to x and x the closest to x . j j 6: Let MSE(M ) be the mean square error of matrix Figures 2 and 3 show the detection of keypoints in a M . frame and the correspondence between the points of two 7: Apply the RANSAC considering the MSE(M ) as adjacent superimposed frames, respectively. residual threshold value. 8: Generate the similarity matrix M considering 3.2 Motion estimation final only the inliers obtained for the second execution of After determining the matches between the keypoints, RANSAC. it is necessary to estimate the motion performed by the camera. For this, we estimate the similarity matrix, that is, the matrix that transforms the set of points in aframe f to the set of points in a frame f .Since In cases of pairs of frames with spatially variant motion, t t+1 we consider the matrix of similarity, the parameters the correct matches also tend to have certain variation. of the matrix transformation take into account camera Thus, the residual threshold is calculated so that its value shifts (translation), distortion (scaling), and undesirable is low enough to eliminate undesired matches and high motion (rotation) for the construction of a stabilization enough such that the correct matches are maintained. model. In the process of digital video stabilization, oscillations 3.3 Trajectory construction of the camera that occurred at the time of recording After estimating the final similarity matrices for each pair of adjacent frames of the video, a trajectory is calculated must be compensated. The similarity matrix should take for each of the factors of the similarity matrix. In this into account only the correspondences that are, in fact, work, we consider a vertical translation factor, a horizon- between two equivalent points. In addition, it should not tal translation factor, a rotation factor, and a scaling factor. consider the movement of objects present in the scene. Each factor f of the matrix is decomposed, and the trajec- The random sample consensus (RANSAC) method is tory of each of them is calculated in order to accumulate applied to estimate a similarity matrix that considers only itspreviousvalues, expressed as inliers in order to disregard the incorrect correspondences Fig. 1 Main steps of the proposed digital video stabilization method Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 4 of 11 Gaussian filter modifies the input through a convolution by considering a Gaussian function in a window of size M. Thus, this function is used as impulse response in the Gaussian filter and can be defined as (x−μ) 2σ G(x) = ae (2) where a is a constant considered as 1 so that G(x) has values between 0 and 1. The constant μ is the expected value, considered as 0, whereas σ represents the variance. The parameter M indicates the number of points of the output window, whose value is expressed as M = −1(3) Fig. 2 Detection of keypoints between adjacent frames where n is the total number of frames in the video. Since different instants of the video will have a distinct f f f t = t +  (1) amount of oscillations, this work applies a Gaussian filter i i−1 i adaptively in order to remove only the undesired camera where t is the value of a given trajectory in the i-th motion. position and  is the value of the factor f for the i-th sim- i The smoothing of an intense motion may result in ilarity matrix previously estimated. The trajectories are videos with a low amount of pixels. Moreover, this type of then smoothed. The equations presented in the remain- motion is typically a desired camera motion, which should der of the text are always applied to the trajectories of each not be smoothed. Therefore, the parameter σ is computed factor separately. Thus, the factor index f will be omitted in such a way that it has smaller values in these regions. in order to not overload the notation. Thus, the trajectory will be smoothed by considering a distinct value for σ at each point i. To determine the value 3.4 Trajectory smoothing of σ , a sliding window of size twice as large as the frame Assuming that only the camera motion is present in the rate measure is applied, so that the window information similarity matrices, the calculated trajectory refers to the lasts for two video seconds. The ratio r is expressed as path made by the camera during the video recording. To obtain a stabilized video, it is necessary to remove μ r = 1 − (4) the oscillations from this path, keeping only the desired max_value motion. where the max_value corresponds to either width in the Since the Gaussian filter is a linear low-pass filter, it horizontal translation trajectory or height in the vertical attenuates the high frequencies present in a signal. The translation trajectory. In this work, we consider θ = as the angle (in radians) in the rotation trajectory. Thus, the motion will be considered large based mainly on the video resolution. Value μ is calculated in such a way to give higher weights to points closer to i,where μ is expressed as G(|j − i|, σ ) μ j j∈W , j=i μ = (5) G(|j − i|, σ ) j∈W where j is the index of each point in the window of i, whereas G() is a Gaussian function with σ calculated as σ = FPS(1 − CV) (6) where FPS is the video frames per second, and CV is the coefficient of variation of the absolute values of the trajectory that are inside the window. As the value Fig. 3 Matching of keypoints between adjacent frames of CV is between 0 and 1, its final value is limited Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 5 of 11 to 0.9 in order for σ not to have null values. There- smoothed trajectory (indexed by k)for each σ previously μ i fore, σ makes the actual size of the window adap- calculated. The final smoothed trajectory corresponds to tive, such that the higher the variation of motion inside the concatenation of points for each of the generated tra- the window, the higher the weight given to the central jectories, and the k-th trajectory contributes with its k-th points. point. Thus, an adaptive smoothed path is obtained. This The coefficient of variation can be expressed as process is applied only to the translation and rotation paths. std(∀t | i ∈ W ) i i Figure 4 shows the trajectory generated by considering CV = (7) avg(∀t | i ∈ W ) i i the horizontal translational factor (blue) and the obtained smoothing (green) using the Gaussian filter with σ = 40 where W is in the same window as in Eq. 5 and t i i and the adaptive version proposed in this work. It is pos- the trajectory value. Therefore, the coefficient of vari- sible to observe that the smoothing is applied at different ation corresponds to the standard deviation std to the degrees along the trajectory. average avg. Assuming that r ranges between 0 and 1, a linear trans- 3.5 Motion compensation and frame cropping formation is applied to obtain a proper interval to the After applying the Gaussian filter, it is necessary to Gaussian filter. This transformation is given as recalculate the value of each factor for each similarity σ − σ max min matrix. In order to do that, the similarity matrix value σ = (r − r ) + σ (8) i i min min r − r max min of a given factor is calculated by the difference between each point of its smoothed trajectory and its predeces- where σ and σ are the minimum and maximum min max sor. With the similarity matrices of each pair of frames values of the new interval (after linear transformation), updated, the similarity matrix is applied to the first respectively. In this work, these values are defined as 0.5 frame of the pair to take it to the coordinates of the and 40, respectively. Values r and r are the min- min max second. imum and maximum values of the old interval (before Applying the geometric transformation in the frame linear transformation). In this work, value r is always max causes information to be lost in certain pixels of the frame set to 1. To control whether a motion is desired or not, a boundary. Figure 5 presents a transformed frame, where value in the interval between 0 and 1 is set to r .The min it is possible to observe the loss of information at the bor- same r is used as a lower limit to r , before applying the min i ders. They are then cropped so that no frames in the stabi- linear transformation. lized video hold pixels without information. To determine An exponential transformation is then applied to σ val- the frame boundaries, each similarity matrix is applied to ues to amplify their magnitude. After calculating σ for the original coordinates of the four vertices, thus generat- each point of the trajectory, its values are lightly smoothed ing the transformed coordinates for the respective frame. by a Gaussian filter with σ = 5, chosen empirically. Finally, the innermost coordinates of all frames are con- This is done to avoid abrupt changes in the value of σ sidered final. Figure 6, extracted from [22], illustrates the along the trajectory. Finally, the Gaussian filter is applied n cropping process applied to the transformed frame. times (once for each point in the trajectory), generating a ab Fig. 4 Smoothing of camera motion trajectories. a Gaussian filter with σ = 40. b Adaptive Gaussian filter Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 6 of 11 video sequence to the stabilized sequence, since frames after transformation will tend to be more similar. The interframe transformation fidelity (ITF) can be used to evaluate the final stabilization of the method, expressed as N −1 ITF = PSNR(k) (10) N − 1 k=1 where N is the number of frames in the videos. Typically, the stabilized sequence has a higher ITF value than the original sequence. Due to the loss of information in the application of the similarity matrix in the frames, it is important to evalu- Fig. 5 Frame after application of geometric transformation ate and compare such rate among different stabilization methods. Forthis, we report thepercentageofpixelsheld by the stabilized video in comparison to the original video, 3.6 Evaluation metrics expressed as The peak signal-to-noise ratio (PSNR) is used to evalu- ate the overall difference between two frames of the video, W H s s Rate of preserved pixels = 100 (11) expressed as WH WHL max PSNR(f , f ) =10 log (9) t t+1 where W and H correspond to the width and height of the W H [ f (x, y)−f (x, y)] t t+1 frames in the original video and W and H correspond to s s x=1 y=1 the width and height of the frames in the video generated by the stabilization process, respectively. where f and f are two consecutive frames of the t t+1 video, W and H are the width and height of each frame, 4 Results and discussion respectively, and L is the maximum value intensity max This section describes the results of experiments con- of the image. The PSNR metric is expressed in decibel ducted on a set of input videos. Fourteen videos with (dB), a unit originally defined to measure sound intensity oscillations were submitted to the stabilization process on a logarithmic scale. Typical PSNR values range from and evaluated, where eleven of them are available from the 20 to 40. The PSNR value should increase from the initial Fig. 6 Frame after boundary cropping Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 7 of 11 Table 1 Video sequences used in our experiments No. Video Source Resolution (pixels) FPS 1 gleicher1 GaTech VideoStab 640 × 360 30 2 gleicher2 GaTech VideoStab 640 × 360 30 3 gleicher3 GaTech VideoStab 640 × 360 30 4 gleicher4 GaTech VideoStab 640 × 360 30 5 greyson_chance GaTech VideoStab 640 × 360 30 6 hippo nghiaho.com/uploads/hippo.mp4 480 × 360 30 7 lf_juggle GaTech VideoStab 480 × 360 25 8 new_gleicher GaTech VideoStab 480 × 270 30 9 sam_1 GaTech VideoStab 640 × 360 30 10 sam and cocoa youtu.be/627MqC6E5Yo 540 × 360 30 11 sany0025 GaTech VideoStab 640 × 360 30 12 shake_pgh_1 GaTech VideoStab 640 × 360 30 13 shaky_car MatLab 320 × 240 30 14 yuna_long GaTech VideoStab 640 × 360 30 GaTech VideoStab [24] database and the three others col- for each trajectory according to the size of the trajectory lected separately. Table 1 presents the videos used in the range with respect to the size of the video frame. Higher experiments and their sources. values of σ are assigned to paths with smaller intervals; In the experiments performed, we compared the val- we then denominate this version as semi-adaptive. The uesofthe ITFmetricaswellasthe amount of pixels locally adaptive version of the Gaussian filter proposed in held for different versions of the trajectory smoothing. In this work is presented. A version using the Kalman filter the first version, we used the Gaussian filter considering is also shown. In addition, the videos were submitted to σ = 40. In another version, the Gaussian filter is used in a the YouTube stabilization method [24]inorder to com- slightly more adaptive way, choosing different values of σ pare its results against ours. The metric is calculated for Table 2 Comparison between Gaussian filter and Kalman filter No. of videos Original Gaussian filter Kalman filter σ = 40 ITF ITF Hold pixels (%) ITF Hold pixels (%) 1 18.793 27.738 69.276 25.888 71.000 2 20.390 29.331 71.750 27.201 74.771 3 16.186 22.559 72.972 22.122 73.003 4 19.965 33.380 48.958 26.298 54.903 5 23.277 28.660 2.540 25.991 4.958 6 19.681 29.804 67.891 25.576 73.507 7 24.109 28.510 60.495 28.063 57.167 8 17.881 25.448 70.648 24.081 72.287 9 19.248 23.251 25.797 21.426 33.818 10 12.972 18.453 17.519 16.680 27.204 11 21.487 26.826 43.599 25.704 52.875 12 15.081 0 0 20.219 2.686 13 23.841 30.621 70.312 28.200 71.875 14 18.065 20.265 7.448 20.902 7.642 Average 19.355 24.631 44.953 24.167 48.406 Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 8 of 11 Table 3 Comparison between semi-adaptive Gaussian filter and adaptive Gaussian filter No. of videos Original Semi-adaptive Gaussian filter Locally adaptive Gaussian filter ITF ITF Hold pixels (%) ITF Hold pixels (%) 1 18.793 27.620 70.745 27.455 74.500 2 20.390 29.331 71.750 28.914 75.781 3 16.186 22.559 72.972 22.090 76.056 4 19.965 33.380 48.958 27.931 62.465 5 23.277 27.814 8.312 27.360 53.385 6 19.681 29.804 67.891 29.077 70.838 7 24.109 28.510 60.495 28.876 73.667 8 17.881 25.448 70.648 25.182 73.284 9 19.248 21.845 35.750 21.435 57.139 10 12.972 17.465 27.907 16.381 70.296 11 21.487 26.826 43.559 25.659 57.260 12 15.081 19.827 16.611 17.895 59.847 13 23.841 30.621 70.312 29.987 71.719 14 18.065 19.759 39.045 19.773 54.146 Average 19.355 25.772 50.353 24.858 66.455 the video sequence before and after the stabilization pro- camera motion, which is erroneously considered as oscil- cess. Table 2 shows the results obtained with the Kalman lations by the Gaussian filter, if the value of σ used is filter and the Gaussian filter with σ = 40. high enough. However, smaller values may not remove the From Table 2, we can observe a certain superiority in oscillations from the videos efficiently, since each video the use of the Gaussian filter, which achieves a higher has oscillations of different proportions. ITF value for all videos with basically the same amount In ordertoimprove thequality of thestabilization of pixels kept for most videos. Videos #5, #9, #10, #12, for these cases, Table 3 presents the results obtained and #14 keep a lower amount of pixels compared to with the version of the semi-adaptive Gaussian filter, the other videos. This is due to the presence of desired where trajectories with greater difference between the Table 4 Comparison between adaptive Gaussian filter and YouTube method [24] No. of videos Original Locally adaptive Gaussian filter YouTube [24]Holdpixels 1 18.793 27.455 27.890 Superior 2 20.390 28.914 28.604 Superior 3 16.186 22.090 23.030 Comparable 4 19.965 27.931 33.711 Superior 5 23.277 27.360 27.599 Inferior 6 19.681 29.077 29.390 Superior 7 24.109 28.876 29.252 Comparable 8 17.881 25.182 25.908 Superior 9 19.248 21.435 20.922 Inferior 10 12.972 16.381 20.495 Superior 11 21.487 25.659 26.672 Comparable 12 15.081 17.895 19.283 Comparable 13 23.841 29.987 28.845 Comparable 14 18.065 19.773 20.128 Inferior Average 19.355 24.858 25.837 – Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 9 of 11 Fig. 7 Video #1. Amount of pixels hold through our method is superior than the state-of-the-art approach. a Adaptive Gaussian filter; b YouTube [24] minimum and maximum values will have a lower value comparable (when both methods hold basically the same for σ.Weused σ = 40 for trajectories with inter- amount of pixels). Figures 7, 8,and 9 illustrate the analysis vals smaller than 80% of the respective frame size, performed. whereas σ = 20 otherwise. For the locally adap- In Fig. 7, it is possible to observe that more infor- tive version proposed in this work, we experimen- mation is maintained on the top, left, and right sides tally set r as 0.4, whose results are reported in of the video obtained with our method. The difference min Table 3. is not considerably large, and the advantage or dis- The semi-adaptive version maintains more pixels in the advantage obtained follows these proportions in most videos in which the original Gaussian filter had prob- videos. lems, since σ = 20 was applied to them. However, the In Fig. 8, there is less information maintained on the top amount of pixels held in the frames is lower than in the and bottom sides in the use of the adaptive Gaussian filter. other videos. This is because, in many cases, σ = 20 is On the other hand, there is a larger amount of information still a very high value. On the other hand, smaller values held on the left and right sides. of σ can ignore the oscillations that are present in other Figure 9 illustrates a situation where our method main- instance of the video, thus generating videos not stabi- tains less pixels. Lower amount of information is held on lized enough and consequently with a lower ITF value. the every sides with our method. Therefore, as can be seen in Table 3, the locally adap- From Table 4, we can observe a certain parity for tive version, whose smoothing intensity is changed along both methods in terms of ITF metric, with a slight the trajectory, obtained ITF values comparable to the advantage of the YouTube method [24], while the main- original and semi-adaptive version, maintaining consider- tained pixels are in general comparable and, when lower, ably more pixels. do not differ much. This demonstrates that the pro- Table 4 presents a comparison of the results between posed method is competitive with one of the meth- ourmethodand YouTubeapproach [24]. The percentage ods considered as current state-of-the-art, despite the of pixels held was not reported since the YouTube method simplicity of our method. Notwithstanding, the method resizes the stabilized videos to their original size. Thus, a still needs to be further extended to deal with some qualitative analysis is done through the first frame of each adverse situations, such as the treatment of non- video, whose results are classified into three categories: rigid oscillations in the video #10, the rolling shut- superior (when our method maintains more pixels), infe- ter in the video #12, and the parallax effect, among rior (when the YouTube method holds more pixels), and others. Fig. 8 Video #3. Amount of pixels hold through our method is comparable to the state-of-the-art approach. a Adaptive Gaussian filter; b YouTube [24] Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 10 of 11 Fig. 9 Video #12. Amount of pixels hold through our method is inferior than the state-of-the-art approach. a Adaptive Gaussian filter; b YouTube [24] 5Conclusions Received: 28 November 2017 Accepted: 15 May 2018 In this work, we presented a technique for video stabi- lization based on an adaptive Gaussian filter to smooth References the camera trajectory in order to remove undesired oscil- 1. C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, et al., Efficient parallel lations. The proposed filter assigns distinct values to σ framework for HEVC motion estimation on many-core processors. IEEE Trans. Circ. Syst. Video Technol. 24(12), 2077–2089 (2014) along the camera trajectory by considering that the inten- 2. MF Alcantara, TP Moreira, H Pedrini, Real-time action recognition using a sity of the oscillations changes throughout the video and multilayer descriptor with variable size. J. Electron. Imaging. 25(1), that a very high value of σ can result in a video with a 013020.1?013020.9 (2016) 3. C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, et al., A highly parallel framework low amount of pixels, while smaller values generate less for HEVC coding unit partitioning tree decision on many-core processors. stabilized videos. IEEE Sig. Process. Lett. 21(5), 573–576 (2014) The results obtained in the experiments were compared 4. MF Alcantara, H Pedrini, Y Cao, Human action classification based on silhouette indexed interest points for multiple domains. Int. J. Image with different versions for the smoothing of the trajec- Graph. 17(3), 1750018_1–1750018_27 (1750) tory: Kalman filter, Gaussian filter with σ = 40, and a 5. C Yan, H Xie, D Yang, J Yin, Y Zhang, Q Dai, Supervised hash coding with semi-adaptive Gaussian filter. The approaches achieved deep neural network for environment perception of intelligent vehicles. comparable values for the ITF metric while maintaining a IEEE Trans. Intell. Transp. Syst. 19(1), 284–295 (2018) 6. MF Alcantara, TP Moreira, H Pedrini, F Fl´rez-Revuelta, Action identification significantly higher amount of pixels. using a descriptor with autonomous fragments in a multilevel prediction A comparison was performed with the stabilization scheme. Signal Image Video Process. 11(2), 325–332 (2017) method used in YouTube, where the results were compet- 7. C Yan, H Xie, S Liu, J Yin, Y Zhang, Q Dai, Effective Uyghur language text detection in complex background images for traffic prompt itive. As directions for future work, we intend to extend identification. IEEE Trans. Intell. Transp. Syst. 19(1), 220–229 (2018) our method to deal with some adverse situations, such as 8. BS Torres, H Pedrini, Detection of complex video events through visual non-rigid oscillations and effect of parallax. rhythm. Vis. Comput. 34(2), 145–165 (2018) 9. MVM Cirne, H Pedrini, in Progress in Pattern Recognition, Image Analysis, Abbreviations Computer Vision, and Applications. A video summarization method based 2D: Two-dimensional; 3D: Three-dimensional; CMOS: Complementary metal on spectral clustering (Springer, 2013), pp. 479–486 oxide semiconductor; CV: Coefficient of variation; dB: Decibel; ITF: Interframe 10. MVM Cirne, H Pedrini, in Progress in Pattern Recognition, Image Analysis, transformation fidelity; MSE: Mean square error; PSNR: Peak signal-to-noise Computer Vision, and Applications. Summarization of videos by image ratio; RANSAC: Random sample consensus; SFM: Structure-from-motion quality assessment (Springer, 2014), pp. 901–908 11. TS Huang, Image Sequence Analysis. vol. 5. (Springer Science & Business Acknowledgements Media, Heidelberg, 2013) The authors would like to thank the editors and anonymous reviewers for their 12. AA Amanatiadis, I Andreadis, Digital image stabilization by independent valuable comments. component analysis. IEEE Trans. Instrum. Meas. 59(7), 1755–1763 (2010) 13. JY Chang, WF Hu, MH Cheng, BS Chang, Digital image translational and Funding rotational motion stabilization using optical flow technique. IEEE Trans. The authors are thankful to FAPESP (grants #2014/12236-1 and Consum. Electron. 48(1), 108–115 (2002) #2017/12646-3) and CNPq (grant #305169/2015-7) for their financial support. 14. S Ertürk, Real-time digital image stabilization using Kalman filters. Real-time Imaging. 8(4), 317–328 (2002) Availability of data and materials 15. R Jia, H Zhang, L Wang, J Li, in International Conference on Artificial Data are publicly available. Intelligence and Computational Intelligence. Digital image stabilization Authors’ contributions based on phase correlation. vol. 3 (IEEE, 2009), pp. 485–489 16. SJ Ko, SH Lee, KH Lee, Digital image stabilizing algorithms based on HP and MRS contributed equally to this work. Both authors carried out the in- bit-plane matching. IEEE Trans. Consum. Electron. 44(3), 617–622 (1998) depth analysis of the experimental results and checked the correctness of the 17. S Kumar, H Azartash, M Biswas, T Nguyen, Real-time affine global motion evaluation. Both authors took part in the writing and proof reading of the final estimation using phase correlation and its application for digital image version of the paper. Both authors read and approved the final manuscript. stabilization. IEEE Trans. Image Process. 20(12), 3406–3418 (2011) Competing interests 18. CT Lin, CT Hong, CT Yang, Real-time digital image stabilization system The authors declare that they have no competing interests. using modified proportional integrated controller. IEEE Trans. Circ. Syst. Video Technol. 19(3), 427–431 (2009) Publisher’s Note 19. L Marcenaro, G Vernazza, CS Regazzoni, in International Conference on Springer Nature remains neutral with regard to jurisdictional claims in Image Processing. Image stabilization algorithms for video-surveillance published maps and institutional affiliations. applications. vol. 1 (IEEE, 2001), pp. 349–352 Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 11 of 11 20. C Morimoto, R Chellappa, in 13th International Conference on Pattern 45. ML Gleicher, F Liu, Re-cinematography: improving the camerawork of Recognition. Fast electronic digital image stabilization. vol. 3 (IEEE, 1996), casual video. ACM Trans. Multimed. Comput. Commun. Appl. 5(1), 2 (2008) pp. 284–288 46. J Bai, A Agarwala, M Agrawala, R Ramamoorthi, Wiley Online 21. YG Ryu, MJ Chung, Robust online digital image stabilization based on Library.User-assisted video stabilization. Comput. Graph. Forum. 33(4), point-feature trajectory without accumulative global motion estimation. 61–70 (2014) IEEE Signal Proc. Lett. 19(4), 223–226 (2012) 47. F Liu, M Gleicher, J Wang, H Jin, A Agarwala, Subspace video stabilization. 22. Y Matsushita, E Ofek, W Ge, X Tang, HY Shum, Full-frame video ACM Trans. Graph. 30(1), 4 (2011) stabilization with motion inpainting. IEEE Trans. Pattern Anal. Mach. Intell. 48. P Bhat, CL Zitnick, N Snavely, A Agarwala, M Agrawala, M Cohen, et al,in 28(7), 1150–1163 (2006) 18th Eurographics Conference on Rendering Techniques. Eurographics 23. S Liu, L Yuan, P Tan, J Sun, Bundled camera paths for video stabilization. Association. Using photographs to enhance videos of a static scene, ACM Trans. Graph. 32(4), 78 (2013) (2007), pp. 327–338 24. M Grundmann, V Kwatra, I Essa, in IEEE Conference on Computer Vision and 49. F Liu, M Gleicher, H Jin, A Agarwala, Content-preserving warps for 3D Pattern Recognition. Auto-directed video stabilization with robust L1 video stabilization. ACM Trans. Graph. 28(3), 44 (2009) optimal camera paths (IEEE, 2011), pp. 225–232 50. FLiu,YNiu, HJin,in IEEE International Conference on Computer Vision.Joint 25. C Jia, BL Evans, Online motion smoothing for video stabilization via subspace stabilization for stereoscopic video, (2013), pp. 73–80 constrained multiple-model estimation. EURASIP J. Image Video Proc. 51. A Goldstein, R Fattal, Video stabilization using epipolar geometry. ACM 2017(1), 25 (2017) Trans. Graph. 31(5), 1–10 (2012) 26. S Liu, M Li, S Zhu, B Zeng, CodingFlow: enable video coding for video 52. H Bay, A Ess, T Tuytelaars, L Van Gool, Speeded-up robust features (SURF). stabilization. IEEE Trans. Image Proc. 26(7), 3291–3302 (2017) Comput. Vis. Image Underst. 110(3), 346–359 (2008) 27. Z Zhao, X Ma, in IEEE International Conference on Image Processing. Video stabilization based on local trajectories and robust mesh transformation (IEEE, 2016), pp. 4092–4096 28. N Bhowmik, V Gouet-Brunet, L Wei, G Bloch, in International Conference on Multimedia Modeling. Adaptive and optimal combination of local features for image retrieval (Springer, Cham, 2017), pp. 76–88 29. HGuo,SLiu, SZhu,BZeng, in IEEE International Conference on Image Processing. Joint bundled camera paths for stereoscopic video stabilization (IEEE, 2016), pp. 1071–1075 30. Q Zheng, M Yang, A video stabilization method based on inter-frame image matching score. Glob. J. Comput. Sci. Technol. 17(1), 41–46 (2017) 31. S Liu, B Xu, C Deng, S Zhu, B Zeng, M Gabbouj, A hybrid approach for near-range video stabilization. IEEE Trans. Circ. Syst. Video Technol. 27(9), 1922–1933 (2016) 32. BH Chen, A Kopylov, SC Huang, O Seredin, R Karpov, SY Kuo, et al., Improved global motion estimation via motion vector clustering for video stabilization. Eng. Appl. Artif. Intell. 54, 39–48 (2016) 33. B Cardani, Optical Image Stabilization for Digital Cameras. IEEE Control. Syst. 26(2), 21–22 (2006) 34. C Buehler, M Bosse, L McMillan, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Non-metric image-based rendering for video stabilization. vol. 2 (IEEE, 2001), pp. II–609 35. G Zhang, W Hua, X Qin, Y Shao, H Bao, Video Stabilization based on a 3D Perspective Camera Model. Vis. Comput. 25(11), 997–1008 (2009) 36. KY Lee, YY Chuang, BY Chen, M Ouhyoung, in IEEE 12th International Conference on Computer Vision. Video stabilization using robust feature trajectories (IEEE, 2009), pp. 1397–1404 37. HC Chang, SH Lai, KR Lu, in IEEE International Conference on Multimedia and Expo. A robust and efficient video stabilization algorithm. vol. 1 (IEEE, 2004), pp. 29–32 38. G Puglisi, S Battiato, A robust image alignment algorithm for video stabilization purposes. IEEE Trans. Circ. Syst. Video Technol. 21(10), 1390–1400 (2011) 39. S Battiato, G Gallo, G Puglisi, S Scellato, in 14th International Conference on Image Analysis and Processing. SIFT features tracking for video stabilization (IEEE, 2007), pp. 825–830 40. Y Shen, P Guturu, T Damarla, BP Buckles, KR Namuduri, Video stabilization using principal component analysis and scale invariant feature transform in particle filter framework. IEEE Trans. Consum. Electron. 55(3), 1714–1721 (2009) 41. BY Chen, KY Lee, WT Huang, JS Lin, Wiley Online Library. Capturing intention-based full-frame video stabilization. Comput. Graphics Forum. 27(7), 1805–1814 (2008) 42. S Ertürk, Image sequence stabilisation based on Kalman filtering of frame positions. Electron. Lett. 37(20), 1 (2001) 43. A Litvin, J Konrad, WC Karl, in Electronic Imaging.International Society for Optics and Photonics. Probabilistic video stabilization using Kalman filtering and mosaicing (SPIE-IS&T, 2003), pp. 663–674 44. J Yang, D Schonfeld, C Chen, M Mohamed, in International Conference on Image Processing. Online video stabilization based on particle filters (IEEE, 2006), pp. 1545–1548 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png EURASIP Journal on Image and Video Processing Springer Journals

Digital video stabilization based on adaptive camera trajectory smoothing

Free
11 pages

Loading next page...
 
/lp/springer_journal/digital-video-stabilization-based-on-adaptive-camera-trajectory-WGVqlwpyn0
Publisher
Springer International Publishing
Copyright
Copyright © 2018 by The Author(s)
Subject
Engineering; Signal,Image and Speech Processing; Image Processing and Computer Vision; Biometrics; Pattern Recognition
eISSN
1687-5281
D.O.I.
10.1186/s13640-018-0277-7
Publisher site
See Article on Publisher Site

Abstract

The development of multimedia equipments has allowed a significant growth in the production of videos through professional and amateur cameras, smartphones, and other mobile devices. Examples of applications involving video processing and analysis include surveillance and security, telemedicine, entertainment, teaching, and robotics. Video stabilization refers to the set of techniques required to detect and correct glitches or instabilities caused during the video acquisition process due to vibrations and undesired motion when handling the camera. In this work, we propose and evaluate a novel approach to video stabilization based on an adaptive Gaussian filter to smooth the camera trajectories. Experiments conducted on several video sequences demonstrate the effectiveness of the method, which generates videos with adequate trade-off between stabilization rate and amount of frame pixels. Our results were compared to YouTube’s state-of-the-art method, achieving competitive results. Keywords: Video stabilization, Interest points, Trajectory smoothing, Video processing 1 Introduction corrected when it should not. Recent approaches have The availability of new digital technologies [1–8]and the used optimizations [23, 24] to control the local smoothing reduction of equipment costs have facilitated the genera- intensity. tion of large volumes of videos in high resolutions. Several As a main contribution, this work presents and eval- devices have allowed the acquisition and editing of videos, uates a novel technique for video stabilization based on such as digital cameras, smartphones, and other mobile an adaptive Gaussian filter to smooth the camera trajec- devices. tories. Experiments demonstrate the effectiveness of the A large number of applications involve the use of digital method, which generates videos with proper stabiliza- videos such as telemedicine, advertising, entertainment, tion rate while maintaining a reasonable amount of frame robotics, teaching, autonomous vehicles, surveillance, and pixels. security. Due to the large amount of video that are cap- The proposed method can be seen as an alternative tured, stored, and transmitted, it is fundamental to inves- to optimization approaches recently developed in the lit- tigate and develop efficient multimedia processing and erature [23, 24] with a lower computational cost. The analysis techniques for indexing, browsing, and retrieving results are compared to different versions of Gaussian video content [9–11]. filter, Kalman filter, and the video stabilization method Video stabilization [12–21]aimstocorrect camera employed in YouTube [24], which is considered a state-of- motion oscillations that occur in the acquisition process, the-art approach. particularly when the cameras are mobile and handled by This paper is organized as follows. Some relevant con- amateurs. cepts and related work are briefly described in Section 2. Several low-pass filters have been employed in the sta- The proposed method for video stabilization is detailed bilization process [20, 22]. However, their straightforward in Section 3. Experimental results are presented and dis- application using fixed intensity along all the videos is cussed in Section 4. Finally, some final remarks and not suitable, since the camera motion may be unduly directions for future work are included in Section 5. *Correspondence: helio@ic.unicamp.br Institute of Computing, University of Campinas, Av. Albert Einstein 1251, Campinas 13083-852, Brazil © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 2 of 11 2 Background Many methods have employed different motion fil- Different categories of stabilization approaches [25–32] tering mechanisms, such as motion vector integration have been developed to improve the quality of videos, [39], Kalman filter [42, 43], particle filter [44], and reg- which can be broadly classified as mechanical stabiliza- ularization [37]. Such mechanisms aim to remove high- tion, optical stabilization, and digital stabilization. frequency instability from camera motion [36]. Other Mechanical stabilization typically uses sensors to detect approaches have focused on improving the quality of the camera shifts and compensate for undesired motion. A videos, often lost in the stabilization process. The most common way is to use gyroscopes to detect motion and commonly used techniques include inpainting to fill miss- send signals to motors connected to small wheels so that ing frame parts [22, 41, 45], deconvolution to improve the the camera can move in the opposite direction of motion. video focus [22, 41], and weighting of stabilization metrics The camera is usually positioned on a tripod. Despite the and video quality aspects [36, 45]. efficiency usually obtained with this type of system, there Recent improvements in 2D methods have made them are disadvantages in relation to the resources required, comparable to 3D methods in terms of quality. For such as device weight and battery consumption. instance, the use of an L1-norm optimization can gener- Optical stabilization [33] is widely used in photographic ate a camera path that follows cinematographic rules in cameras and consists of a mechanism to compensate for order to consider separately constant, linear, and parabolic the angular and translational motion of the cameras, sta- motion [24]. A mesh-based model, in which multiple tra- bilizing the image before it is recorded on the sensor. jectories are calculated at different locations of the video, A mechanism for optical stabilization introduces a gyro- proved to be efficient in dealing with parallax without the scope to measure velocity differences at distinct instants use of 3D methods [23]. A semi-automatic 2D method, in order to distinguish between normal and undesired which requires assistance from the user, is proposed to motion. Other systems employ a set of lenses and sensors adjust problematic frames [46]. to detect angle and speed of motion for video stabilization. On the other hand, 3D methods typically construct a Digital stabilization of videos is implemented without three-dimensional model of the scene through structure- the use of special devices. In general, undesired camera from-motion (SFM) techniques for smoothing motion motion is estimated by comparing consecutive frames and [47], providing superior quality stabilization but at a applying a transform to the video sequence to compen- higher computational cost [27, 47]. Since they usually sate for motion. These techniques are typically slower have serious problems in handling large objects in the when compared to optical techniques; however, they can foreground [36], 2D methods are in general preferred in achieve adequate results in terms of quality and speed, practice [36]. depending on the algorithms used. Although 3D methods can generate good results in Methods found in the literature for digitally stabiliz- static scenes using image-based rendering techniques ing videos are usually classified into two-dimensional (2D) [34, 48], they usually do not handle dynamic scenes or three-dimensional (3D) categories. Sequences of 2D correctly, causing motion blur [49]. Thus, the con- transformations are employed in the first category to cept of content preservation was introduced, restrict- represent camera motion and stabilize the videos. Low- ing each output frame to be generated from a single pass filters can be used to smooth the transformations, input frame [49]. Other approaches address this prob- reducing the influence of high frequency of the camera lem through a geometric approximation by abdicating to [20, 22]. In the second category, camera trajectories are be robust with respect to the parallax [35]. Other diffi- reconstructed from 3D transformations [34, 35], such as culties found in 3D methods appear in amateur videos, scaling, translation, and rotation. such as lack of parallax, zoom, and use of complemen- Approaches that use 2D transformations focus on con- tary metal oxide semiconductor (CMOS) sensors, among tributing to specific steps in their stabilization process others [47]. [36]. By considering the estimation of camera motion, Although not common, 3D methods can fill missing 2D methods can be further subdivided into two cate- parts of a frame by using information from several other gories [32]: (i) intensity-based approaches [37, 38], which frames [48]. More recently, 2D and 3D methods have been directly use the texture of the images as motion vector, extended to deal with stereoscopic videos [29, 50]. Hybrid and (ii) keypoint-based approaches [39, 40], which locate approaches have emerged to obtain the efficiency and a set of corresponding points in adjacent frames. Since robustness of 2D methods in addition to the high qual- keypoint-based approaches have a lower computational ity of 3D methods. Some of them are based on concepts cost, they are most commonly used [22]. Techniques such such as trajectories subspace [47] and epipolar transfer as the extraction of regions of interest can be used in this [51]. A hybrid method for dealing with discrete depth vari- step, in order to avoid cutting certain objects or regions ations present in short-distance videos was described by that are supposed to be important to the observer [41]. Liu et al. [31]. Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 3 of 11 3 Adaptive video stabilization method and those that describe the movement of objects. In This section presents the proposed video stabilization the application of this method, the value of the residual method based on an adaptive Gaussian smoothing of the threshold parameter, which determines the maximum camera trajectories. Figure 1 illustrates the main stages error for a match to be considered as inlier, is calcu- of the methodology, which are described in the following lated for each pair of frames. Algorithm 1 presents the subsections. calculation to determine the final similarity matrix. 3.1 Keypoint detection and matching The process starts with the detection and description of Algorithm 1 Similarity matrix computation keypoints in the video frames. In this step, we used the 1: procedure FINALMATRIX speeded up robust features (SURF) method [52]. After 2: Generate the similarity matrix M by considering extracting the keypoints between two adjacent frames, all matches. their correspondence is performed using the brute-force 3: Let MSE(M) be the mean square error of matrix method with cross-checking, where the Euclidean dis- M. tance between the feature vectors for each pair of points 4: Apply the RANSAC considering the MSE(M) as x ∈ f and x ∈ f is calculated for two adjacent frames residual threshold value. i t t+1 f and f .Thus, x corresponds to x if and only if x is the 5: Generate the similarity matrix M considering t t+1 i i only the inliers obtained previously. closest point to x and x the closest to x . j j 6: Let MSE(M ) be the mean square error of matrix Figures 2 and 3 show the detection of keypoints in a M . frame and the correspondence between the points of two 7: Apply the RANSAC considering the MSE(M ) as adjacent superimposed frames, respectively. residual threshold value. 8: Generate the similarity matrix M considering 3.2 Motion estimation final only the inliers obtained for the second execution of After determining the matches between the keypoints, RANSAC. it is necessary to estimate the motion performed by the camera. For this, we estimate the similarity matrix, that is, the matrix that transforms the set of points in aframe f to the set of points in a frame f .Since In cases of pairs of frames with spatially variant motion, t t+1 we consider the matrix of similarity, the parameters the correct matches also tend to have certain variation. of the matrix transformation take into account camera Thus, the residual threshold is calculated so that its value shifts (translation), distortion (scaling), and undesirable is low enough to eliminate undesired matches and high motion (rotation) for the construction of a stabilization enough such that the correct matches are maintained. model. In the process of digital video stabilization, oscillations 3.3 Trajectory construction of the camera that occurred at the time of recording After estimating the final similarity matrices for each pair of adjacent frames of the video, a trajectory is calculated must be compensated. The similarity matrix should take for each of the factors of the similarity matrix. In this into account only the correspondences that are, in fact, work, we consider a vertical translation factor, a horizon- between two equivalent points. In addition, it should not tal translation factor, a rotation factor, and a scaling factor. consider the movement of objects present in the scene. Each factor f of the matrix is decomposed, and the trajec- The random sample consensus (RANSAC) method is tory of each of them is calculated in order to accumulate applied to estimate a similarity matrix that considers only itspreviousvalues, expressed as inliers in order to disregard the incorrect correspondences Fig. 1 Main steps of the proposed digital video stabilization method Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 4 of 11 Gaussian filter modifies the input through a convolution by considering a Gaussian function in a window of size M. Thus, this function is used as impulse response in the Gaussian filter and can be defined as (x−μ) 2σ G(x) = ae (2) where a is a constant considered as 1 so that G(x) has values between 0 and 1. The constant μ is the expected value, considered as 0, whereas σ represents the variance. The parameter M indicates the number of points of the output window, whose value is expressed as M = −1(3) Fig. 2 Detection of keypoints between adjacent frames where n is the total number of frames in the video. Since different instants of the video will have a distinct f f f t = t +  (1) amount of oscillations, this work applies a Gaussian filter i i−1 i adaptively in order to remove only the undesired camera where t is the value of a given trajectory in the i-th motion. position and  is the value of the factor f for the i-th sim- i The smoothing of an intense motion may result in ilarity matrix previously estimated. The trajectories are videos with a low amount of pixels. Moreover, this type of then smoothed. The equations presented in the remain- motion is typically a desired camera motion, which should der of the text are always applied to the trajectories of each not be smoothed. Therefore, the parameter σ is computed factor separately. Thus, the factor index f will be omitted in such a way that it has smaller values in these regions. in order to not overload the notation. Thus, the trajectory will be smoothed by considering a distinct value for σ at each point i. To determine the value 3.4 Trajectory smoothing of σ , a sliding window of size twice as large as the frame Assuming that only the camera motion is present in the rate measure is applied, so that the window information similarity matrices, the calculated trajectory refers to the lasts for two video seconds. The ratio r is expressed as path made by the camera during the video recording. To obtain a stabilized video, it is necessary to remove μ r = 1 − (4) the oscillations from this path, keeping only the desired max_value motion. where the max_value corresponds to either width in the Since the Gaussian filter is a linear low-pass filter, it horizontal translation trajectory or height in the vertical attenuates the high frequencies present in a signal. The translation trajectory. In this work, we consider θ = as the angle (in radians) in the rotation trajectory. Thus, the motion will be considered large based mainly on the video resolution. Value μ is calculated in such a way to give higher weights to points closer to i,where μ is expressed as G(|j − i|, σ ) μ j j∈W , j=i μ = (5) G(|j − i|, σ ) j∈W where j is the index of each point in the window of i, whereas G() is a Gaussian function with σ calculated as σ = FPS(1 − CV) (6) where FPS is the video frames per second, and CV is the coefficient of variation of the absolute values of the trajectory that are inside the window. As the value Fig. 3 Matching of keypoints between adjacent frames of CV is between 0 and 1, its final value is limited Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 5 of 11 to 0.9 in order for σ not to have null values. There- smoothed trajectory (indexed by k)for each σ previously μ i fore, σ makes the actual size of the window adap- calculated. The final smoothed trajectory corresponds to tive, such that the higher the variation of motion inside the concatenation of points for each of the generated tra- the window, the higher the weight given to the central jectories, and the k-th trajectory contributes with its k-th points. point. Thus, an adaptive smoothed path is obtained. This The coefficient of variation can be expressed as process is applied only to the translation and rotation paths. std(∀t | i ∈ W ) i i Figure 4 shows the trajectory generated by considering CV = (7) avg(∀t | i ∈ W ) i i the horizontal translational factor (blue) and the obtained smoothing (green) using the Gaussian filter with σ = 40 where W is in the same window as in Eq. 5 and t i i and the adaptive version proposed in this work. It is pos- the trajectory value. Therefore, the coefficient of vari- sible to observe that the smoothing is applied at different ation corresponds to the standard deviation std to the degrees along the trajectory. average avg. Assuming that r ranges between 0 and 1, a linear trans- 3.5 Motion compensation and frame cropping formation is applied to obtain a proper interval to the After applying the Gaussian filter, it is necessary to Gaussian filter. This transformation is given as recalculate the value of each factor for each similarity σ − σ max min matrix. In order to do that, the similarity matrix value σ = (r − r ) + σ (8) i i min min r − r max min of a given factor is calculated by the difference between each point of its smoothed trajectory and its predeces- where σ and σ are the minimum and maximum min max sor. With the similarity matrices of each pair of frames values of the new interval (after linear transformation), updated, the similarity matrix is applied to the first respectively. In this work, these values are defined as 0.5 frame of the pair to take it to the coordinates of the and 40, respectively. Values r and r are the min- min max second. imum and maximum values of the old interval (before Applying the geometric transformation in the frame linear transformation). In this work, value r is always max causes information to be lost in certain pixels of the frame set to 1. To control whether a motion is desired or not, a boundary. Figure 5 presents a transformed frame, where value in the interval between 0 and 1 is set to r .The min it is possible to observe the loss of information at the bor- same r is used as a lower limit to r , before applying the min i ders. They are then cropped so that no frames in the stabi- linear transformation. lized video hold pixels without information. To determine An exponential transformation is then applied to σ val- the frame boundaries, each similarity matrix is applied to ues to amplify their magnitude. After calculating σ for the original coordinates of the four vertices, thus generat- each point of the trajectory, its values are lightly smoothed ing the transformed coordinates for the respective frame. by a Gaussian filter with σ = 5, chosen empirically. Finally, the innermost coordinates of all frames are con- This is done to avoid abrupt changes in the value of σ sidered final. Figure 6, extracted from [22], illustrates the along the trajectory. Finally, the Gaussian filter is applied n cropping process applied to the transformed frame. times (once for each point in the trajectory), generating a ab Fig. 4 Smoothing of camera motion trajectories. a Gaussian filter with σ = 40. b Adaptive Gaussian filter Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 6 of 11 video sequence to the stabilized sequence, since frames after transformation will tend to be more similar. The interframe transformation fidelity (ITF) can be used to evaluate the final stabilization of the method, expressed as N −1 ITF = PSNR(k) (10) N − 1 k=1 where N is the number of frames in the videos. Typically, the stabilized sequence has a higher ITF value than the original sequence. Due to the loss of information in the application of the similarity matrix in the frames, it is important to evalu- Fig. 5 Frame after application of geometric transformation ate and compare such rate among different stabilization methods. Forthis, we report thepercentageofpixelsheld by the stabilized video in comparison to the original video, 3.6 Evaluation metrics expressed as The peak signal-to-noise ratio (PSNR) is used to evalu- ate the overall difference between two frames of the video, W H s s Rate of preserved pixels = 100 (11) expressed as WH WHL max PSNR(f , f ) =10 log (9) t t+1 where W and H correspond to the width and height of the W H [ f (x, y)−f (x, y)] t t+1 frames in the original video and W and H correspond to s s x=1 y=1 the width and height of the frames in the video generated by the stabilization process, respectively. where f and f are two consecutive frames of the t t+1 video, W and H are the width and height of each frame, 4 Results and discussion respectively, and L is the maximum value intensity max This section describes the results of experiments con- of the image. The PSNR metric is expressed in decibel ducted on a set of input videos. Fourteen videos with (dB), a unit originally defined to measure sound intensity oscillations were submitted to the stabilization process on a logarithmic scale. Typical PSNR values range from and evaluated, where eleven of them are available from the 20 to 40. The PSNR value should increase from the initial Fig. 6 Frame after boundary cropping Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 7 of 11 Table 1 Video sequences used in our experiments No. Video Source Resolution (pixels) FPS 1 gleicher1 GaTech VideoStab 640 × 360 30 2 gleicher2 GaTech VideoStab 640 × 360 30 3 gleicher3 GaTech VideoStab 640 × 360 30 4 gleicher4 GaTech VideoStab 640 × 360 30 5 greyson_chance GaTech VideoStab 640 × 360 30 6 hippo nghiaho.com/uploads/hippo.mp4 480 × 360 30 7 lf_juggle GaTech VideoStab 480 × 360 25 8 new_gleicher GaTech VideoStab 480 × 270 30 9 sam_1 GaTech VideoStab 640 × 360 30 10 sam and cocoa youtu.be/627MqC6E5Yo 540 × 360 30 11 sany0025 GaTech VideoStab 640 × 360 30 12 shake_pgh_1 GaTech VideoStab 640 × 360 30 13 shaky_car MatLab 320 × 240 30 14 yuna_long GaTech VideoStab 640 × 360 30 GaTech VideoStab [24] database and the three others col- for each trajectory according to the size of the trajectory lected separately. Table 1 presents the videos used in the range with respect to the size of the video frame. Higher experiments and their sources. values of σ are assigned to paths with smaller intervals; In the experiments performed, we compared the val- we then denominate this version as semi-adaptive. The uesofthe ITFmetricaswellasthe amount of pixels locally adaptive version of the Gaussian filter proposed in held for different versions of the trajectory smoothing. In this work is presented. A version using the Kalman filter the first version, we used the Gaussian filter considering is also shown. In addition, the videos were submitted to σ = 40. In another version, the Gaussian filter is used in a the YouTube stabilization method [24]inorder to com- slightly more adaptive way, choosing different values of σ pare its results against ours. The metric is calculated for Table 2 Comparison between Gaussian filter and Kalman filter No. of videos Original Gaussian filter Kalman filter σ = 40 ITF ITF Hold pixels (%) ITF Hold pixels (%) 1 18.793 27.738 69.276 25.888 71.000 2 20.390 29.331 71.750 27.201 74.771 3 16.186 22.559 72.972 22.122 73.003 4 19.965 33.380 48.958 26.298 54.903 5 23.277 28.660 2.540 25.991 4.958 6 19.681 29.804 67.891 25.576 73.507 7 24.109 28.510 60.495 28.063 57.167 8 17.881 25.448 70.648 24.081 72.287 9 19.248 23.251 25.797 21.426 33.818 10 12.972 18.453 17.519 16.680 27.204 11 21.487 26.826 43.599 25.704 52.875 12 15.081 0 0 20.219 2.686 13 23.841 30.621 70.312 28.200 71.875 14 18.065 20.265 7.448 20.902 7.642 Average 19.355 24.631 44.953 24.167 48.406 Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 8 of 11 Table 3 Comparison between semi-adaptive Gaussian filter and adaptive Gaussian filter No. of videos Original Semi-adaptive Gaussian filter Locally adaptive Gaussian filter ITF ITF Hold pixels (%) ITF Hold pixels (%) 1 18.793 27.620 70.745 27.455 74.500 2 20.390 29.331 71.750 28.914 75.781 3 16.186 22.559 72.972 22.090 76.056 4 19.965 33.380 48.958 27.931 62.465 5 23.277 27.814 8.312 27.360 53.385 6 19.681 29.804 67.891 29.077 70.838 7 24.109 28.510 60.495 28.876 73.667 8 17.881 25.448 70.648 25.182 73.284 9 19.248 21.845 35.750 21.435 57.139 10 12.972 17.465 27.907 16.381 70.296 11 21.487 26.826 43.559 25.659 57.260 12 15.081 19.827 16.611 17.895 59.847 13 23.841 30.621 70.312 29.987 71.719 14 18.065 19.759 39.045 19.773 54.146 Average 19.355 25.772 50.353 24.858 66.455 the video sequence before and after the stabilization pro- camera motion, which is erroneously considered as oscil- cess. Table 2 shows the results obtained with the Kalman lations by the Gaussian filter, if the value of σ used is filter and the Gaussian filter with σ = 40. high enough. However, smaller values may not remove the From Table 2, we can observe a certain superiority in oscillations from the videos efficiently, since each video the use of the Gaussian filter, which achieves a higher has oscillations of different proportions. ITF value for all videos with basically the same amount In ordertoimprove thequality of thestabilization of pixels kept for most videos. Videos #5, #9, #10, #12, for these cases, Table 3 presents the results obtained and #14 keep a lower amount of pixels compared to with the version of the semi-adaptive Gaussian filter, the other videos. This is due to the presence of desired where trajectories with greater difference between the Table 4 Comparison between adaptive Gaussian filter and YouTube method [24] No. of videos Original Locally adaptive Gaussian filter YouTube [24]Holdpixels 1 18.793 27.455 27.890 Superior 2 20.390 28.914 28.604 Superior 3 16.186 22.090 23.030 Comparable 4 19.965 27.931 33.711 Superior 5 23.277 27.360 27.599 Inferior 6 19.681 29.077 29.390 Superior 7 24.109 28.876 29.252 Comparable 8 17.881 25.182 25.908 Superior 9 19.248 21.435 20.922 Inferior 10 12.972 16.381 20.495 Superior 11 21.487 25.659 26.672 Comparable 12 15.081 17.895 19.283 Comparable 13 23.841 29.987 28.845 Comparable 14 18.065 19.773 20.128 Inferior Average 19.355 24.858 25.837 – Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 9 of 11 Fig. 7 Video #1. Amount of pixels hold through our method is superior than the state-of-the-art approach. a Adaptive Gaussian filter; b YouTube [24] minimum and maximum values will have a lower value comparable (when both methods hold basically the same for σ.Weused σ = 40 for trajectories with inter- amount of pixels). Figures 7, 8,and 9 illustrate the analysis vals smaller than 80% of the respective frame size, performed. whereas σ = 20 otherwise. For the locally adap- In Fig. 7, it is possible to observe that more infor- tive version proposed in this work, we experimen- mation is maintained on the top, left, and right sides tally set r as 0.4, whose results are reported in of the video obtained with our method. The difference min Table 3. is not considerably large, and the advantage or dis- The semi-adaptive version maintains more pixels in the advantage obtained follows these proportions in most videos in which the original Gaussian filter had prob- videos. lems, since σ = 20 was applied to them. However, the In Fig. 8, there is less information maintained on the top amount of pixels held in the frames is lower than in the and bottom sides in the use of the adaptive Gaussian filter. other videos. This is because, in many cases, σ = 20 is On the other hand, there is a larger amount of information still a very high value. On the other hand, smaller values held on the left and right sides. of σ can ignore the oscillations that are present in other Figure 9 illustrates a situation where our method main- instance of the video, thus generating videos not stabi- tains less pixels. Lower amount of information is held on lized enough and consequently with a lower ITF value. the every sides with our method. Therefore, as can be seen in Table 3, the locally adap- From Table 4, we can observe a certain parity for tive version, whose smoothing intensity is changed along both methods in terms of ITF metric, with a slight the trajectory, obtained ITF values comparable to the advantage of the YouTube method [24], while the main- original and semi-adaptive version, maintaining consider- tained pixels are in general comparable and, when lower, ably more pixels. do not differ much. This demonstrates that the pro- Table 4 presents a comparison of the results between posed method is competitive with one of the meth- ourmethodand YouTubeapproach [24]. The percentage ods considered as current state-of-the-art, despite the of pixels held was not reported since the YouTube method simplicity of our method. Notwithstanding, the method resizes the stabilized videos to their original size. Thus, a still needs to be further extended to deal with some qualitative analysis is done through the first frame of each adverse situations, such as the treatment of non- video, whose results are classified into three categories: rigid oscillations in the video #10, the rolling shut- superior (when our method maintains more pixels), infe- ter in the video #12, and the parallax effect, among rior (when the YouTube method holds more pixels), and others. Fig. 8 Video #3. Amount of pixels hold through our method is comparable to the state-of-the-art approach. a Adaptive Gaussian filter; b YouTube [24] Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 10 of 11 Fig. 9 Video #12. Amount of pixels hold through our method is inferior than the state-of-the-art approach. a Adaptive Gaussian filter; b YouTube [24] 5Conclusions Received: 28 November 2017 Accepted: 15 May 2018 In this work, we presented a technique for video stabi- lization based on an adaptive Gaussian filter to smooth References the camera trajectory in order to remove undesired oscil- 1. C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, et al., Efficient parallel lations. The proposed filter assigns distinct values to σ framework for HEVC motion estimation on many-core processors. IEEE Trans. Circ. Syst. Video Technol. 24(12), 2077–2089 (2014) along the camera trajectory by considering that the inten- 2. MF Alcantara, TP Moreira, H Pedrini, Real-time action recognition using a sity of the oscillations changes throughout the video and multilayer descriptor with variable size. J. Electron. Imaging. 25(1), that a very high value of σ can result in a video with a 013020.1?013020.9 (2016) 3. C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, et al., A highly parallel framework low amount of pixels, while smaller values generate less for HEVC coding unit partitioning tree decision on many-core processors. stabilized videos. IEEE Sig. Process. Lett. 21(5), 573–576 (2014) The results obtained in the experiments were compared 4. MF Alcantara, H Pedrini, Y Cao, Human action classification based on silhouette indexed interest points for multiple domains. Int. J. Image with different versions for the smoothing of the trajec- Graph. 17(3), 1750018_1–1750018_27 (1750) tory: Kalman filter, Gaussian filter with σ = 40, and a 5. C Yan, H Xie, D Yang, J Yin, Y Zhang, Q Dai, Supervised hash coding with semi-adaptive Gaussian filter. The approaches achieved deep neural network for environment perception of intelligent vehicles. comparable values for the ITF metric while maintaining a IEEE Trans. Intell. Transp. Syst. 19(1), 284–295 (2018) 6. MF Alcantara, TP Moreira, H Pedrini, F Fl´rez-Revuelta, Action identification significantly higher amount of pixels. using a descriptor with autonomous fragments in a multilevel prediction A comparison was performed with the stabilization scheme. Signal Image Video Process. 11(2), 325–332 (2017) method used in YouTube, where the results were compet- 7. C Yan, H Xie, S Liu, J Yin, Y Zhang, Q Dai, Effective Uyghur language text detection in complex background images for traffic prompt itive. As directions for future work, we intend to extend identification. IEEE Trans. Intell. Transp. Syst. 19(1), 220–229 (2018) our method to deal with some adverse situations, such as 8. BS Torres, H Pedrini, Detection of complex video events through visual non-rigid oscillations and effect of parallax. rhythm. Vis. Comput. 34(2), 145–165 (2018) 9. MVM Cirne, H Pedrini, in Progress in Pattern Recognition, Image Analysis, Abbreviations Computer Vision, and Applications. A video summarization method based 2D: Two-dimensional; 3D: Three-dimensional; CMOS: Complementary metal on spectral clustering (Springer, 2013), pp. 479–486 oxide semiconductor; CV: Coefficient of variation; dB: Decibel; ITF: Interframe 10. MVM Cirne, H Pedrini, in Progress in Pattern Recognition, Image Analysis, transformation fidelity; MSE: Mean square error; PSNR: Peak signal-to-noise Computer Vision, and Applications. Summarization of videos by image ratio; RANSAC: Random sample consensus; SFM: Structure-from-motion quality assessment (Springer, 2014), pp. 901–908 11. TS Huang, Image Sequence Analysis. vol. 5. (Springer Science & Business Acknowledgements Media, Heidelberg, 2013) The authors would like to thank the editors and anonymous reviewers for their 12. AA Amanatiadis, I Andreadis, Digital image stabilization by independent valuable comments. component analysis. IEEE Trans. Instrum. Meas. 59(7), 1755–1763 (2010) 13. JY Chang, WF Hu, MH Cheng, BS Chang, Digital image translational and Funding rotational motion stabilization using optical flow technique. IEEE Trans. The authors are thankful to FAPESP (grants #2014/12236-1 and Consum. Electron. 48(1), 108–115 (2002) #2017/12646-3) and CNPq (grant #305169/2015-7) for their financial support. 14. S Ertürk, Real-time digital image stabilization using Kalman filters. Real-time Imaging. 8(4), 317–328 (2002) Availability of data and materials 15. R Jia, H Zhang, L Wang, J Li, in International Conference on Artificial Data are publicly available. Intelligence and Computational Intelligence. Digital image stabilization Authors’ contributions based on phase correlation. vol. 3 (IEEE, 2009), pp. 485–489 16. SJ Ko, SH Lee, KH Lee, Digital image stabilizing algorithms based on HP and MRS contributed equally to this work. Both authors carried out the in- bit-plane matching. IEEE Trans. Consum. Electron. 44(3), 617–622 (1998) depth analysis of the experimental results and checked the correctness of the 17. S Kumar, H Azartash, M Biswas, T Nguyen, Real-time affine global motion evaluation. Both authors took part in the writing and proof reading of the final estimation using phase correlation and its application for digital image version of the paper. Both authors read and approved the final manuscript. stabilization. IEEE Trans. Image Process. 20(12), 3406–3418 (2011) Competing interests 18. CT Lin, CT Hong, CT Yang, Real-time digital image stabilization system The authors declare that they have no competing interests. using modified proportional integrated controller. IEEE Trans. Circ. Syst. Video Technol. 19(3), 427–431 (2009) Publisher’s Note 19. L Marcenaro, G Vernazza, CS Regazzoni, in International Conference on Springer Nature remains neutral with regard to jurisdictional claims in Image Processing. Image stabilization algorithms for video-surveillance published maps and institutional affiliations. applications. vol. 1 (IEEE, 2001), pp. 349–352 Souza and Pedrini EURASIP Journal on Image and Video Processing (2018) 2018:37 Page 11 of 11 20. C Morimoto, R Chellappa, in 13th International Conference on Pattern 45. ML Gleicher, F Liu, Re-cinematography: improving the camerawork of Recognition. Fast electronic digital image stabilization. vol. 3 (IEEE, 1996), casual video. ACM Trans. Multimed. Comput. Commun. Appl. 5(1), 2 (2008) pp. 284–288 46. J Bai, A Agarwala, M Agrawala, R Ramamoorthi, Wiley Online 21. YG Ryu, MJ Chung, Robust online digital image stabilization based on Library.User-assisted video stabilization. Comput. Graph. Forum. 33(4), point-feature trajectory without accumulative global motion estimation. 61–70 (2014) IEEE Signal Proc. Lett. 19(4), 223–226 (2012) 47. F Liu, M Gleicher, J Wang, H Jin, A Agarwala, Subspace video stabilization. 22. Y Matsushita, E Ofek, W Ge, X Tang, HY Shum, Full-frame video ACM Trans. Graph. 30(1), 4 (2011) stabilization with motion inpainting. IEEE Trans. Pattern Anal. Mach. Intell. 48. P Bhat, CL Zitnick, N Snavely, A Agarwala, M Agrawala, M Cohen, et al,in 28(7), 1150–1163 (2006) 18th Eurographics Conference on Rendering Techniques. Eurographics 23. S Liu, L Yuan, P Tan, J Sun, Bundled camera paths for video stabilization. Association. Using photographs to enhance videos of a static scene, ACM Trans. Graph. 32(4), 78 (2013) (2007), pp. 327–338 24. M Grundmann, V Kwatra, I Essa, in IEEE Conference on Computer Vision and 49. F Liu, M Gleicher, H Jin, A Agarwala, Content-preserving warps for 3D Pattern Recognition. Auto-directed video stabilization with robust L1 video stabilization. ACM Trans. Graph. 28(3), 44 (2009) optimal camera paths (IEEE, 2011), pp. 225–232 50. FLiu,YNiu, HJin,in IEEE International Conference on Computer Vision.Joint 25. C Jia, BL Evans, Online motion smoothing for video stabilization via subspace stabilization for stereoscopic video, (2013), pp. 73–80 constrained multiple-model estimation. EURASIP J. Image Video Proc. 51. A Goldstein, R Fattal, Video stabilization using epipolar geometry. ACM 2017(1), 25 (2017) Trans. Graph. 31(5), 1–10 (2012) 26. S Liu, M Li, S Zhu, B Zeng, CodingFlow: enable video coding for video 52. H Bay, A Ess, T Tuytelaars, L Van Gool, Speeded-up robust features (SURF). stabilization. IEEE Trans. Image Proc. 26(7), 3291–3302 (2017) Comput. Vis. Image Underst. 110(3), 346–359 (2008) 27. Z Zhao, X Ma, in IEEE International Conference on Image Processing. Video stabilization based on local trajectories and robust mesh transformation (IEEE, 2016), pp. 4092–4096 28. N Bhowmik, V Gouet-Brunet, L Wei, G Bloch, in International Conference on Multimedia Modeling. Adaptive and optimal combination of local features for image retrieval (Springer, Cham, 2017), pp. 76–88 29. HGuo,SLiu, SZhu,BZeng, in IEEE International Conference on Image Processing. Joint bundled camera paths for stereoscopic video stabilization (IEEE, 2016), pp. 1071–1075 30. Q Zheng, M Yang, A video stabilization method based on inter-frame image matching score. Glob. J. Comput. Sci. Technol. 17(1), 41–46 (2017) 31. S Liu, B Xu, C Deng, S Zhu, B Zeng, M Gabbouj, A hybrid approach for near-range video stabilization. IEEE Trans. Circ. Syst. Video Technol. 27(9), 1922–1933 (2016) 32. BH Chen, A Kopylov, SC Huang, O Seredin, R Karpov, SY Kuo, et al., Improved global motion estimation via motion vector clustering for video stabilization. Eng. Appl. Artif. Intell. 54, 39–48 (2016) 33. B Cardani, Optical Image Stabilization for Digital Cameras. IEEE Control. Syst. 26(2), 21–22 (2006) 34. C Buehler, M Bosse, L McMillan, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Non-metric image-based rendering for video stabilization. vol. 2 (IEEE, 2001), pp. II–609 35. G Zhang, W Hua, X Qin, Y Shao, H Bao, Video Stabilization based on a 3D Perspective Camera Model. Vis. Comput. 25(11), 997–1008 (2009) 36. KY Lee, YY Chuang, BY Chen, M Ouhyoung, in IEEE 12th International Conference on Computer Vision. Video stabilization using robust feature trajectories (IEEE, 2009), pp. 1397–1404 37. HC Chang, SH Lai, KR Lu, in IEEE International Conference on Multimedia and Expo. A robust and efficient video stabilization algorithm. vol. 1 (IEEE, 2004), pp. 29–32 38. G Puglisi, S Battiato, A robust image alignment algorithm for video stabilization purposes. IEEE Trans. Circ. Syst. Video Technol. 21(10), 1390–1400 (2011) 39. S Battiato, G Gallo, G Puglisi, S Scellato, in 14th International Conference on Image Analysis and Processing. SIFT features tracking for video stabilization (IEEE, 2007), pp. 825–830 40. Y Shen, P Guturu, T Damarla, BP Buckles, KR Namuduri, Video stabilization using principal component analysis and scale invariant feature transform in particle filter framework. IEEE Trans. Consum. Electron. 55(3), 1714–1721 (2009) 41. BY Chen, KY Lee, WT Huang, JS Lin, Wiley Online Library. Capturing intention-based full-frame video stabilization. Comput. Graphics Forum. 27(7), 1805–1814 (2008) 42. S Ertürk, Image sequence stabilisation based on Kalman filtering of frame positions. Electron. Lett. 37(20), 1 (2001) 43. A Litvin, J Konrad, WC Karl, in Electronic Imaging.International Society for Optics and Photonics. Probabilistic video stabilization using Kalman filtering and mosaicing (SPIE-IS&T, 2003), pp. 663–674 44. J Yang, D Schonfeld, C Chen, M Mohamed, in International Conference on Image Processing. Online video stabilization based on particle filters (IEEE, 2006), pp. 1545–1548

Journal

EURASIP Journal on Image and Video ProcessingSpringer Journals

Published: May 30, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off