Is depth information and optical flow helpful for visual control?

Johannes Hansen and Marc Ebner

Abstract: The human visual system was shaped through natural evolution. We have used artificial evolution to investigate whether depth information and optical flow are helpful for visual control. Our experiments were carried out in simulation. The task was controlling a simulated racing car. We have used The Open Racing Car Simulator for our experiments. Genetic programming was used to evolve visual algorithms that transform input images (color, optical flow, or depth information) into control commands for a simulated racing car. Significantly better solutions were found when color, depth, and optical flow were available as input together than when color, depth, or optical flow was available alone.

Keywords: depth map; genetic programming; optical flow; visual control.

*Corresponding author: Marc Ebner, Institut für Mathematik und Informatik, Ernst-Moritz-Arndt-Universität Greifswald, Walther-Rathenau-Straße 47, 17487 Greifswald, Germany, Tel.: +49-3834-86-4646, Fax: +49-3834-86-4640, E-mail: marc.ebner@uni-greifswald.de
Johannes Hansen: Institut für Mathematik und Informatik, Ernst-Moritz-Arndt-Universität Greifswald, Walther-Rathenau-Straße 47, 17487 Greifswald, Germany

Introduction

With this contribution, we investigate whether depth and motion provide an evolutionary advantage compared to color alone. As a test environment, we have used The Open Racing Car Simulator (TORCS) [1]. Simulated evolution [2] was used to evolve control algorithms for the racing car. These algorithms use screen grabs from the racing car simulator. The screen grabs are processed using elementary computer vision operators. The output of these algorithms controls the steering wheel as well as the gas/brakes of the car [3]. OpenCV [4, 5], an open source library for computer vision, was used for image processing. Genetic programming [6-8] was used to evolve the visual algorithms. We will see that significantly better solutions are found if color, depth, and optical flow are all available.

This article is structured as follows. The next section gives a brief introduction to the visual system. It is followed by a description of the racing car simulator, an explanation of how data from the simulator is used to compute optical flow, and a brief introduction to genetic programming. Related work on visual control using genetic programming is discussed, as well as how genetic programming is used to evolve visual control algorithms. Finally, the study's results and conclusions are given.

The visual system

The human visual system was shaped through natural evolution [9, 10]. Visual processing starts with light entering the eye. This light is measured by two different types of receptors inside the retina [11, 12]: rods and cones. The rods mediate vision when little light is available; they have a much higher sensitivity than the cones. Cones are in charge of color vision. Three different types of cones exist, which absorb light mainly in the red, green, and blue parts of the visible spectrum. Some preprocessing occurs inside the retina. Information flows from the retinal receptors to the retinal ganglion cells. This information exits the eye at the blind spot, passes through the lateral geniculate nucleus, and then reaches the primary visual cortex, or area V1. Area V1 is highly structured [13]. Cells within ocular dominance segments respond primarily to stimuli from one eye or the other. V1 contains columns with orientation-sensitive cells. Blobs with color- or lightness-sensitive cells also exist. Visual information is analyzed with respect to different aspects using a retinotopic map. Indeed, the entire visual cortex is highly structured. Color, shape, and motion appear to be processed by separate visual areas [14, 15]. Color is a product of the brain. It is processed in visual area V4. Shape is processed in V3, and motion is processed in V5. A dedicated area for face and object recognition also exists. It is also interesting that color and motion are not perceived synchronously. Moutoussis and Zeki [16] demonstrated that color is perceived earlier than motion. The brain appears to bind visual attributes that are perceived together.

TORCS

TORCS [1] is an open-source three-dimensional (3D) racing car simulation. A sample screenshot is shown in Figure 1. We have used this simulator as a test environment to evaluate whether depth information and optical flow are helpful for visual control. The TORCS project is currently maintained by Bernhard Wymann; its original creators were Eric Espié and Christophe Guionneau. The simulator provides several different racing tracks. A player can choose among different cars when playing the game. Several different opponents are available to race against. A split-screen mode is also available, and up to four human players are supported. Supported controls are joystick, mouse, and keyboard; some steering wheels are also supported. The game features realistic 3D graphics, lighting, smoke, and skid marks. Game physics include simulation of collisions, tire and wheel properties, and a simple damage model. It even includes a simple aerodynamic model with slip-streaming and ground effects.

TORCS has been used in several scientific competitions [17, 18]. For these competitions, participants develop artificial intelligence methods that drive the racing car along its track. Usually, a client-server architecture is used, and competition participants develop a client that sends its control commands to the TORCS server. We have modified TORCS slightly to use it for our experiments. We extract screen grabs from the graphics buffer. A dense depth map is extracted from the so-called Z-buffer of the graphics library [19], i.e. both color and depth are readily available from the graphics context. In addition to color and depth, we also provide optical flow to the visual control algorithms. How we compute optical flow is described in the next section.
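As an aside on the depth extraction: raw Z-buffer values are nonlinearly encoded, so they must be linearized before use as distances. The following is a minimal sketch, assuming a standard OpenGL perspective projection; the function name and the near/far parameters are illustrative, not part of the paper.

```python
import numpy as np

def zbuffer_to_depth(z_buf, near, far):
    """Convert raw Z-buffer values in [0, 1] to eye-space depth.

    Assumes the standard OpenGL perspective projection with clip
    planes `near` and `far`; z_buf is the buffer as read via
    glReadPixels(..., GL_DEPTH_COMPONENT, GL_FLOAT, ...).
    """
    z_ndc = 2.0 * z_buf - 1.0  # window depth -> normalized device coordinates
    return 2.0 * near * far / (far + near - z_ndc * (far - near))
```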
Computing optical flow

Optical flow describes the pattern of motion of objects that are seen on the screen. This information is very helpful for visual control. We would like to estimate a dense flow field, i.e. optical flow for each image pixel. Let v(x, y, t) = (v_x, v_y) be the optical flow estimated for image pixel (x, y) in an image taken at time t. Then the image content in the vicinity of pixel (x, y) will be found at position (x + v_x Δt, y + v_y Δt) in an image taken at time t + Δt, provided that the object moves with constant velocity across the screen. Several different methods have been developed to compute optical flow from visual input [20-23]. In recent years, the accuracy of these methods has improved considerably. Many methods for computing optical flow are based on partial derivatives. However, block-based methods are also used. Block-based methods search an area around a given pixel in a subsequent image to determine image motion.

Estimating optical flow from two subsequent images is an expensive image operation. Therefore, we have used the depth map and the known motion of the race car to compute optical flow. The depth map is readily available from the graphics library; it is a by-product of rendering the scene. The depth map contains, for each image pixel, the distance from the object to the camera along the Z-axis. The motion of the race car is available directly from the racing car simulator.

Figure 1: The Open Racing Car Simulator.

Let d(x, y) = d_{x,y} be the depth map of the current image shown on the screen. We assume that all screen coordinates are specified relative to the center of the screen. Let f be the focal length of the camera. The location of an object of the scene that is shown at image pixel (x, y) has coordinates (X_S, Y_S, Z_S) inside the camera coordinate system centered on the car driver:

\[ \begin{pmatrix} X_S \\ Y_S \\ Z_S \end{pmatrix} = \frac{d_{x,y}}{f} \begin{pmatrix} x \\ y \\ f \end{pmatrix} \tag{1} \]

The coordinates (X_S, Y_S, Z_S) are relative to the viewer sitting inside the car, i.e. these are eye coordinates. Let R be the inverse 3×3 rotation matrix that describes the rotatory motion of the racing car from one time step of the simulation to the next. Let D be the inverse vector that describes the translatory motion of the racing car from one time step of the simulation to the next. Hence, after the racing car has moved, the point (X_S, Y_S, Z_S) will have moved to a location (X'_S, Y'_S, Z'_S) relative to the eye of the driver:

\[ \begin{pmatrix} X'_S \\ Y'_S \\ Z'_S \end{pmatrix} = R \begin{pmatrix} X_S \\ Y_S \\ Z_S \end{pmatrix} + D \tag{2} \]

The coordinates (X'_S, Y'_S, Z'_S) can be projected onto the screen using the known focal length of the camera. Let (x', y') be the screen coordinates of (X'_S, Y'_S, Z'_S); then we have

\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = f \begin{pmatrix} X'_S / Z'_S \\ Y'_S / Z'_S \end{pmatrix} \tag{3} \]

Optical flow can then be computed by subtracting the screen coordinates before the racing car has moved from the screen coordinates after the car has moved:

\[ \begin{pmatrix} v_x \\ v_y \end{pmatrix} = \begin{pmatrix} x' \\ y' \end{pmatrix} - \begin{pmatrix} x \\ y \end{pmatrix} \tag{4} \]

Because the depth map and the known motion of the racing car are exact, the computed optical flow is also exact. Figure 2 shows the computed optical flow for a sample image. It would be possible to compute optical flow directly from the input images using an algorithm based on partial derivatives or using block-based methods. However, this would take considerably more computing resources and would also have the disadvantage that the estimated optical flow would not be 100% correct for all image pixels.

Figure 2: Computing optical flow from game engine data. (A) Input image, (B) depth map, and (C) optical flow computed using the depth map and the known ego-motion of the car.

Next, we will describe genetic programming, which we have used to evolve visual control algorithms.
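Before turning to genetic programming, note that Eqs. (1)-(4) translate directly into array code. A minimal NumPy sketch, assuming the depth map, the inverse rotation R, the inverse translation D, and the focal length f are given; the function name and the pixel-centering convention are illustrative.

```python
import numpy as np

def flow_from_depth(depth, R, D, f):
    """Dense optical flow from a depth map and known ego-motion.

    depth : (H, W) distance along the Z-axis per pixel
    R     : (3, 3) inverse rotation between two simulation steps
    D     : (3,)   inverse translation between two simulation steps
    f     : focal length in pixels
    Screen coordinates are relative to the image center, as in the text.
    """
    H, W = depth.shape
    y, x = np.mgrid[0:H, 0:W].astype(np.float64)
    x -= W / 2.0  # coordinates relative to the screen center
    y -= H / 2.0

    # Eq. (1): back-project every pixel into camera coordinates.
    P = np.stack([depth * x / f, depth * y / f, depth], axis=-1)

    # Eq. (2): apply the inverse rigid motion of the car.
    P2 = P @ R.T + D

    # Eq. (3): project the moved points back onto the screen.
    x2 = f * P2[..., 0] / P2[..., 2]
    y2 = f * P2[..., 1] / P2[..., 2]

    # Eq. (4): flow is the new screen position minus the old one.
    return x2 - x, y2 - y
```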
Genetic programming

Genetic programming [6-8] is an evolutionary algorithm. Evolutionary algorithms use simulated evolution to solve optimization problems. Such algorithms work with a population of individuals. Each individual represents a possible solution to the optimization problem. Darwinian selection is used to select above-average individuals to create an offspring population, i.e. a new generation of individuals. In genetic programming, individuals are represented as trees. The fitness of an individual describes how well this individual solves the given problem. The main operators of an evolutionary algorithm are selection, reproduction, and variation. Above-average individuals are selected to create offspring. For our experiments, we have used four genetic operators: reproduction, mutation, ephemeral random constant (ERC) mutation, and crossover. Each genetic operator is applied with a certain probability p_rep, p_mut, p_ERCmut, and p_cross, respectively. These four probabilities sum to 1.

Individuals of the first generation are created using so-called ramped half-and-half initialization [6]. To create an offspring, one genetic operator is randomly selected (using the four probabilities). The reproduction operator simply creates a copy of the genetic material of the individual, i.e. the tree. Mutation, ERC mutation, and crossover create offspring that are similar but not identical to their parents. Depending on the type of operator, one or two parents are selected from the population. Typically, tournament selection is used to select parents. For tournament selection, n_T individuals are selected with uniform probability from the population. These n_T individuals form a tournament. The individual with the highest fitness is the winner of the tournament and becomes a parent. This parent then creates offspring using one of the genetic operators.

The genetic operators are illustrated in Figure 3. Figure 3A shows the reproduction operator, which is applied with probability p_rep. Figure 3B shows the mutation operator, which is applied with probability p_mut. Figure 3C shows the ERC mutation operator, which is applied with probability p_ERCmut. Figure 3D shows the crossover operator, which is applied with probability p_cross.

Figure 3: Genetic operators. (A) Reproduction, (B) mutation, (C) ERC mutation, and (D) crossover.

If the mutation operator is applied, a copy of the individual is created. Next, a node of the tree is randomly selected. Internal nodes are selected with probability 0.9, whereas external nodes are selected with probability 0.1. Finally, the selected node is replaced with a randomly generated sub-tree. The method used to create this sub-tree is the same method used to create the individuals of the first generation. For ERC mutation, a node is selected, and all ERCs located within the sub-tree below this node are mutated. We originally intended to use Gaussian mutation (as in an evolution strategy [24]) to slightly alter each ERC value, i.e. v := v·e^{0.01z}, where v is the original value of the constant and z is a normally distributed random value with mean 0 and standard deviation 1. However, we actually used v := v·0.01z, which pulls the ERC toward zero and may also change the sign of the constant. For crossover, two individuals are selected. Next, two nodes are randomly selected (one in each tree). Again, internal nodes are selected with probability 0.9, whereas external nodes are selected with probability 0.1. The two sub-trees rooted at the selected nodes are then exchanged between the two individuals. Trees are limited to a maximum depth of 17.

Whenever an offspring is created, it is inserted into the next generation of individuals. The process of selecting a genetic operator and creating offspring continues until the next generation is filled. Usually, an evolutionary run is terminated after a certain number of generations have been created. The individual with the highest fitness found during all these generations is the solution that solves our problem best. For our experiments, we have used the Evolutionary Computation Library ECJ developed by Luke [25]. Development of ECJ started in 1998; it is a mature library for evolutionary computation.
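To make the selection scheme concrete, here is a minimal sketch of tournament selection and the operator roulette, using the operator probabilities reported in the experiments section below (p_rep = 0.1, p_mut = 0.4, p_ERCmut = 0.3, p_cross = 0.2). The representation of individuals is left abstract; all names are illustrative.

```python
import random

# Operator probabilities as used in our experiments; they sum to 1.
OPERATORS = [("reproduction", 0.1), ("mutation", 0.4),
             ("erc_mutation", 0.3), ("crossover", 0.2)]

def choose_operator(rng=random):
    """Pick one genetic operator according to the four probabilities."""
    r, acc = rng.random(), 0.0
    for name, p in OPERATORS:
        acc += p
        if r < acc:
            return name
    return OPERATORS[-1][0]  # guard against floating point rounding

def tournament(population, fitness, n_t=7, rng=random):
    """Tournament selection: draw n_t individuals uniformly at random;
    the fittest one wins the tournament and becomes a parent."""
    return max(rng.sample(population, n_t), key=fitness)
```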
Visual control using genetic programming

Our racing car is controlled through visual input alone. We only use the images obtained from screen grabs and the optical flow. Data from the game engine is not used to control the car; it is only used to compute optical flow as described above.

Genetic programming has been used by Winkeler and Manjunath [26] for object detection. Johnson et al. [27] used it to evolve visual routines. Ebner and Tiede [28] have previously evolved controllers for TORCS using genetic programming. However, for that work, input was taken directly from the game engine and not from screen grabs. Koutník et al. [29] have used an evolutionary algorithm to evolve compressed encodings of a recurrent neural network to control a racing car in TORCS. Tanev and Shimohara [30, 31] have used a genetic algorithm to evolve parameters that control an actual scale model of a car using visual input from an overhead camera. Other researchers have used Atari video games for training game players [32]. Hausknecht et al. [33, 34] evaluated neuro-evolutionary methods for general game playing of Atari video games. They found that HyperNEAT was the only neuro-evolution algorithm able to play based on raw-pixel input from the games. Mnih et al. [35, 36] created a deep neural network that was trained using reinforcement learning; it was able to achieve a level comparable to human players. Deep learning in combination with Monte-Carlo tree search planning was used by Guo et al. [37]. Parker and Bryant [38, 39] evolved controllers for Quake II that used only visual input.

Materials and methods

We are using strongly typed genetic programming [40] to evolve two trees. The first tree is used to control the steering wheel. The second tree is used to control the velocity of the racing car. The terminal symbols are shown in Table 1. We work with two return types: float and image. The only terminal symbol returning a floating point value is an ERC. An ERC is a random floating point value from the range [0, 1]. Once a node with an ERC is created, it stays constant throughout the life of the node. However, it may be modified by the ERC mutation operator.

Table 1: Terminal symbols.

Name          Return type   Description
ERC           Float         ERC in the range [0, 1]
imageR        Image         Input image (red channel)
imageG        Image         Input image (green channel)
imageB        Image         Input image (blue channel)
imageC        Image         Input image (cyan channel)
imageM        Image         Input image (magenta channel)
imageY        Image         Input image (yellow channel)
imageGray     Image         Input image (gray channel, average RGB)
depthMap      Image         Input image (depth map)
opticalFlowX  Image         Optical flow (horizontal component)
opticalFlowY  Image         Optical flow (vertical component)

The remaining terminal symbols provide access to visual information obtained via screen grabs from the game engine. Each screen grab is scaled down to one third of its original size. All pixel values are transformed to the range [0, 1]. All terminal symbols returning image data provide single-band images: red channel (imageR), green channel (imageG), blue channel (imageB), cyan channel (imageC), magenta channel (imageM), yellow channel (imageY), and gray channel (imageGray). The depth map of this input image is available through the terminal symbol depthMap. Optical flow is computed using the depth map and the known ego-motion of the car, which is available from the game engine.
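The single-band terminals can be derived from one screen grab as follows. A sketch using OpenCV/NumPy; treating cyan, magenta, and yellow as the complements of red, green, and blue is our assumption, since the paper does not spell out that conversion.

```python
import cv2
import numpy as np

def terminal_images(screen_bgr):
    """Build the single-band terminal images from one BGR screen grab:
    downscale to one third of its size and map pixel values to [0, 1]."""
    small = cv2.resize(screen_bgr, None, fx=1 / 3, fy=1 / 3)
    rgb = small[..., ::-1].astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return {
        "imageR": r, "imageG": g, "imageB": b,
        "imageC": 1.0 - r,               # cyan as complement of red (assumption)
        "imageM": 1.0 - g,               # magenta as complement of green (assumption)
        "imageY": 1.0 - b,               # yellow as complement of blue (assumption)
        "imageGray": (r + g + b) / 3.0,  # average of RGB, as in Table 1
    }
```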
As optical flow is a two-dimensional vector, the x-component of this vector is made available through the terminal opticalFlowX, and the y-component is made available through the terminal opticalFlowY. All image data is downscaled to one third of the size of the original image, and pixel values are scaled to the range [0, 1].

The set of elementary functions is shown in Tables 2 and 3. Table 2 shows elementary functions that return a floating point value. Table 3 shows elementary functions that return an entire image. We have used standard arithmetic functions such as addition and multiplication as well as the computation of minimum and maximum. We have also included functions that search for maximum and minimum values inside the image. These functions return either the x or the y coordinate of the position where the extremum was found.

Table 2: Elementary functions returning a floating point value. The return value of the node is o.

Name                          Description
abs(float v)                  Absolute value, o = |v|
round(float v)                Round function, o = round(v)
floor(float v)                Floor function, o = ⌊v⌋
ceil(float v)                 Ceil function, o = ⌈v⌉
neg(float v)                  Negate input, o = −v
sqrt(float v)                 Square root, o = √v
minLocX(Image c)              x-coordinate (range [0, 1]) of the minimum of c(x, y)
minLocY(Image c)              y-coordinate (range [0, 1]) of the minimum of c(x, y)
maxLocX(Image c)              x-coordinate (range [0, 1]) of the maximum of c(x, y)
maxLocY(Image c)              y-coordinate (range [0, 1]) of the maximum of c(x, y)
avg(Image c)                  Average value of all pixels of c
min(float a, float b)         Minimum value, o = (a < b) ? a : b
max(float a, float b)         Maximum value, o = (a > b) ? a : b
q-quantile(Image c, float q)  q-quantile of the image
add(float a, float b)         Addition, o = a + b
mult(float a, float b)        Multiplication, o = a·b

Table 3: Elementary functions returning an entire image. The return value is o(x, y) for each pixel (x, y) of the output image.

Name                             Description
abs(Image c)                     Absolute value, o(x, y) = |c(x, y)|
sqrt(Image c)                    Square root, o(x, y) = √c(x, y)
min(Image c)                     Minimum value, o(x, y) = min_{x,y} c(x, y)
max(Image c)                     Maximum value, o(x, y) = max_{x,y} c(x, y)
add(Image a, Image b)            Addition, o(x, y) = a(x, y) + b(x, y)
constImage(float v)              Constant image, o(x, y) = v
invert(Image c)                  Inverted image, o(x, y) = max − c(x, y), where max is the maximum value of all image pixels
gauss(float v, Image c)          Gaussian filter with kernel e^{−x²/(2σ²)}, where σ = 0.3(|v| − 1) + 0.8
median(float v, Image c)         Median filter of size 2⌊|v|⌋ + 1
binary(Image c, float v)         Binary threshold, o(x, y) = (c(x, y) > v) ? 1 : 0
clamp(Image c, float v)          Clamp value, o(x, y) = (c(x, y) > v) ? v : c(x, y)
thresholdPass(Image c, float v)  Threshold, o(x, y) = (c(x, y) > v) ? c(x, y) : 0
thresholdZero(Image c, float v)  Threshold, o(x, y) = (c(x, y) > v) ? 0 : c(x, y)
avgThreshold(Image c)            Average threshold, o(x, y) = (c(x, y) > a) ? 1 : 0, where a is the average of all pixels of c
localThreshold(Image c)          Local average threshold, o(x, y) = (c(x, y) > a(x, y)) ? 1 : 0, where a(x, y) is obtained by convolving the input image with a Gaussian kernel e^{−x²/(2σ²)} with standard deviation σ = 1.1
extractNAME(float s, float t)    Extracts a rectangular area from the current input image (as specified by NAME). Ten variants of this elementary function exist, with NAME ∈ {R, G, B, C, M, Y, Gray, DepthMap, OpticalFlowX, OpticalFlowY}. The parameters (s, t) specify the upper left corner of the area; its width and height are one third of the input image.
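A few of the functions from Table 3, written out to show how cheap they are. A sketch assuming OpenCV/NumPy arrays with values in [0, 1]; the mapping of the extract parameters (s, t) to pixel offsets is our assumption.

```python
import cv2
import numpy as np

def avg_threshold(c):
    """avgThreshold: 1 where a pixel exceeds the mean of the image."""
    return (c > c.mean()).astype(np.float64)

def local_threshold(c, sigma=1.1):
    """localThreshold: compare each pixel against a local average
    obtained with a Gaussian of standard deviation 1.1."""
    a = cv2.GaussianBlur(c, (0, 0), sigma)
    return (c > a).astype(np.float64)

def extract(c, s, t):
    """extractNAME: window of one third of the image size whose
    upper left corner is parameterized by (s, t) in [0, 1]."""
    h, w = c.shape
    bh, bw = h // 3, w // 3
    x0 = int(s * (w - bw))  # (s, t) -> pixel offsets (assumption)
    y0 = int(t * (h - bh))
    return c[y0:y0 + bh, x0:x0 + bw]
```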
A Gaussian filter is also available. If we apply a Gaussian filter to a gray-scale input image and then a function that locates the maximum, we can locate the brightest point in the image. The function extractNAME extracts smaller regions from the input image. The size of the region is one third of the image. The location of the region can be specified through the parameters of the function. All bands (see terminal symbols) are available for this extraction operation.

This extraction function is very useful for controlling the steering wheel of the car, as illustrated in Figure 4. Suppose we extract one region of the depth map from the left-hand side of the image and another region of the depth map from the right-hand side of the image. If we apply the average function, which computes the average depth within these two areas, then we can compare both average depths to control the steering wheel. This simple control algorithm will drive the car in the direction where more space is available; a code sketch is given below.

Figure 4: Extracting sub-regions from the visual input is helpful for visual control. (A) Sample tree which evaluates information from the depth map, (B) input image, and (C) depth map with regions.
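The sample tree of Figure 4, written out as code. A self-contained sketch: the window placement and the sign convention (positive output steers toward the side with more room) are our illustrative choices.

```python
import numpy as np

def steer_from_depth(depth_map):
    """Figure 4 as code: add(avg(extract_left), neg(avg(extract_right))).

    Compares the average depth inside a window on the left-hand side
    with one on the right-hand side; a positive value steers toward
    the side with more free space.
    """
    h, w = depth_map.shape
    bh, bw = h // 3, w // 3
    left = depth_map[bh:2 * bh, 0:bw]       # window on the left
    right = depth_map[bh:2 * bh, w - bw:w]  # window on the right
    return float(left.mean() - right.mean())
```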
We have carried out four sets of experiments to evaluate whether depth information is helpful in controlling the racing car. For all experiments, we used the same basic set of elementary functions but varied the input information that was made available to the evolved individuals. For experiment A, only color information was available. For experiment B, only the depth map was available. For experiment C, only optical flow was provided. Finally, for experiment D, all of the visual information (color, depth, and optical flow) was provided. The terminal symbols and elementary functions for the four experiments are shown in Table 4. Note that some of the functions listed under "All" accept both floating point values and images as input.

Table 4: Terminal symbols and elementary functions used for our experiments.

Experiment   Terminal symbols and elementary functions
A            imageR, imageG, imageB, imageC, imageM, imageY, imageGray, extract{R,G,B,C,M,Y,Gray}
B            imageDepthMap, extractDepthMap
C            opticalFlowX, opticalFlowY, extractOpticalFlowX, extractOpticalFlowY
D            imageR, imageG, imageB, imageC, imageM, imageY, imageGray, extract{R,G,B,C,M,Y,Gray}, imageDepthMap, extractDepthMap, opticalFlowX, opticalFlowY, extractOpticalFlowX, extractOpticalFlowY
All          ERC, abs, round, floor, ceil, neg, sqrt, minLocX, minLocY, maxLocX, maxLocY, avg, min, max, q-quantile, add, mult, constImage, invert, gauss, median, binary, clamp, thresholdPass, thresholdZero, avgThreshold, localThreshold

The track that we have used for all of our experiments is shown in Figure 5. The red arrow indicates the direction in which the race starts. A path taken by an evolved individual is shown overlaid on this track (green line). The end of this path is marked with a green cross. At this point, the evolved driver lost control of its car and crashed into the border of the track.

Figure 5: Track used for our experiments. (A) Track and (B) path taken by an evolved individual (green line). Driving direction (red arrow).

The task is to evolve visual controllers that drive the car along the track. Therefore, fitness is computed by considering the distance traveled along the track. In addition, the damage attained is also used for the fitness computation. Individuals that stay away from the border of the track and manage to avoid damage receive higher fitness values.

Let d be the distance traveled along the track (in meters). Let a be the amount of damage attained (in the range [0, 1], where 1 is a completely damaged car). Then the fitness f_i of individual i is given as

\[ f_i = \begin{cases} -50 & \text{if the controller is disqualified} \\ \max\{d(1-a)^2,\, 0\} & \text{not disqualified, only single-sided steering} \\ \max\{2d(1-a)^2,\, 0\} & \text{not disqualified, steering toward left and right} \end{cases} \tag{5} \]

A controller is disqualified if it (a) drives along the track in the wrong direction, (b) does not use the steering wheel, i.e. the first tree returns a constant value, or (c) does not use the gas pedal, i.e. the second tree returns a constant value. If the controller is disqualified, it is penalized with a fitness value of −50. Otherwise, it receives a fitness of d·(1−a)². This fitness value is doubled if the controller turns the steering wheel to the left as well as to the right.
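Equation (5) in code form, a direct sketch; the boolean flags stand for the disqualification test and the both-ways steering test described above.

```python
def fitness(d, a, disqualified, steers_both_ways):
    """Eq. (5): d = distance traveled in meters, a = damage in [0, 1]."""
    if disqualified:
        return -50.0
    base = d * (1.0 - a) ** 2          # d*(1-a)^2
    if steers_both_ways:
        base *= 2.0                    # doubled for left-and-right steering
    return max(base, 0.0)
```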
For each experiment, 10 runs with different initializations of the random number generator were performed. For each run, a population of 200 individuals was evolved for 99 generations. Individuals were selected using tournament selection with n_T = 7. Crossover was applied with probability p_cross = 0.2, mutation with probability p_mut = 0.4, ERC mutation with probability p_ERCmut = 0.3, and reproduction with probability p_rep = 0.1. Ramped half-and-half initialization was used to initialize the individuals of the first generation, with depth ranging from 2 to 6.

Results

Figure 6 shows the best fitness values obtained for all four experiments. By far the highest fitness values were reached when only depth information was available. It is clear (as we have described above) that this type of information is helpful in controlling the car. Optical flow also seemed to be helpful. Indeed, it is well known that bees use optical flow to achieve centering behavior [41, 42]. This ability (comparison of lateral optical flow) has also been used for visual control in robotics [43, 44].

Figure 6: Best fitness values obtained for all four experiments (A: color; B: depth; C: optical flow; D: color, depth, and optical flow). Ten runs were conducted for each experiment. Depth information seems to provide an evolutionary advantage. For some of the runs, it produced exceptionally high-fitness individuals. Optical flow also seems to provide an evolutionary advantage.

Figure 7 shows the average best fitness for all four experiments. Using only color information resulted, on average, in lower best fitness after 99 generations compared to using depth information, optical flow, or all three. Average best fitness at generation 99 is summarized in Table 5. This table also shows the overall best fitness obtained in all of the 40 runs (10 per experiment) that we have carried out.

Figure 7: Average best fitness for all four experiments.

Table 5: Experimental results for all four experiments. Overall best fitness is the best fitness obtained over all 10 runs for each experiment. Average best fitness is taken at generation 99.

Experiment   Overall best fitness   Average best fitness
A            312.8                  168.4
B            1978.6                 636.2
C            1129.7                 341.2
D            875.0                  345.4

Table 6: Comparison of average best fitness in generation 99 using the Mann-Whitney U test.

Hypothesis   p-Value
             0.30
             0.65
             0.02

Conclusions

We have used genetic programming to evolve visual control algorithms for a simulated racing car. Each individual consists of two trees: the first tree is used to control the steering wheel, while the second tree is used to control the acceleration of the car. Several elementary visual operations such as Gaussian smoothing, addition, multiplication, thresholding, or locating a maximum or minimum response are also provided. These are all elementary operations that can be performed easily by a network of spiking neurons. In our experiments, we found that significantly better results in driving a racing car along its track are obtained when color, depth, and optical flow are provided together.
Bio-Algorithms and Med-Systems – de Gruyter
Published: Mar 1, 2016