TY - JOUR
AU - Ma, Fei
AB - Abstract Vehicles involved in traffic accidents generally experience divergent vehicle motion, which causes severe damage. This paper presents a self-learning drift-control method for stabilizing a vehicle's yaw motions after a high-speed rear-end collision. The struck vehicle generally experiences substantial drifting and/or spinning after the collision, which is beyond the handling limit and difficult to control. Drift control of the struck vehicle along the original lane was investigated. The rear-end collision was treated as a set of impact forces, and the three-dimensional non-linear dynamic responses of the vehicle were considered in the drift control. A multi-layer perceptron neural network was trained as a deterministic control policy using the actor-critic reinforcement learning framework. The control policy was iteratively updated, starting from a randomly parameterized policy. The results show that the self-learning controller gained the ability to eliminate unstable vehicle motion after data-driven training of about 60,000 iterations. The controlled struck vehicle was also able to drift back to its original lane in a variety of rear-end collision scenarios, which could significantly reduce the risk of a second collision in traffic.
1. Introduction
Vehicle stability/safety control beyond the handling limit remains challenging for traditional control theory due to its high degree of non-linearity and uncertainty [1]. Vehicles involved in high-speed traffic accidents generally belong to this category, often experiencing divergent vehicle motion and potentially causing severe threats to traffic safety. Well-known vehicle stability-enhancement systems, such as electronic stability control (ESC) and anti-lock braking systems (ABS), are not capable of dealing with significant drifting/spinning of the vehicle [2–4]. 
However, recent advances in vehicle dynamics near the handling limit, machine learning and automated vehicles have the potential to contribute to improvements in drift-control technology [5, 6]. Most of the existing studies in this area still draw on traditional vehicular drift-control methods. Zhou et al. [7] proposed a hierarchical model predictive control (MPC) strategy for the struck vehicle in a rear-end collision. The magnitude and position of the collision impact in the yaw plane were estimated, incorporating the measurements from motion sensors and empirical collision coefficients [8]. This hierarchical strategy was used to generate differential braking pressures for an optimal longitudinal tyre force allocation, taking into account the simplified vehicle dynamics and control constraints. However, all vehicle states were assumed available for drift control, including the road friction coefficient and the tyre forces, which is hardly practical for the implementation of this strategy in real vehicles. In addition, this study was only able to attenuate the unstable yaw rate and tyre-slip angles to relatively small values in the stable region. The human driver was expected to take over vehicle control after the hierarchical drift-control system. A further study by Kim and Peng [9] was carried out in order to regulate both the vehicle heading angle and lateral deviation via control of differential braking and the steering wheel angle, also using an MPC method. It was shown that a yaw-plane model of the unstable vehicle could be maintained along the original lane after a high-speed rear-end collision. Zhang et al. [10] also proposed a vehicle drift-control method. Drift control was decomposed into path planning and optimal path-tracking control. This algorithm was successfully implemented on a scaled vehicle for a cornering manoeuvre. A number of studies have also attempted to use reinforcement learning (RL) methods for drift control near the handling limit. 
Cai et al. [11] performed drift control using a data-driven RL method to track a known path in the game-engine-based CARLA simulator. The steering wheel angle and throttle control policy were approximated by a fully connected neural network during the RL training. Deterministic and discrete control policies were compared for different tracking paths. Cutler and How [12] achieved autonomous circular drift of a scaled car prototype using an RL method. The RL training was performed through interacting with a bicycle vehicle model, and the measured on-board data was also used for successful transfer of the trained control policy. The scaled car was controlled to maintain a relatively high yaw rate and lateral velocity, which enabled the vehicle to drift around a circular path. The above-mentioned studies generally perform the drift-control policy-learning using data obtained during trial-and-error interaction with a simulated environment, employing RL methods such as deep Q-networks [13], soft actor-critic [14] and deep deterministic policy gradients (DDPG) [15, 16]. These methods avoid the need for an analytical environment model, which is preferable especially if the environment dynamics under consideration are too complex to be analytically derived. Artificial neural networks, as a popular tool for deep-learning algorithms, are the functions most frequently used to approximate the target control policy. In this study, a data-driven RL framework was employed for vehicle drift control, initiating from unstable yaw-plane motions caused by high-speed rear-end collisions. A specific RL environment was developed using the CarSim simulation platform (Section 2). A deterministic control policy considering the response constraints of the vehicle wire actuators was then trained using a DDPG method (Section 3). The self-learning drift-control policy and the simulation results are analysed in Section 4. 2. 
Problem definition and environment development
In order to reliably predict the multidirectional non-linear vehicle motions after a high-speed rear-end collision and develop a specific RL interaction environment, the well-established CarSim platform, shown in Fig. 1, was used in this study. The vehicle models offered by this simulator can include three-dimensional non-linearities in steering, suspension, driving and braking subsystems as well as tyre–road contacts. Various data sets are used to describe the linear/piecewise linear/non-linear properties of each subsystem, and different interpolation/extrapolation methods are available. These data sets are generally well verified using laboratory or field experimental data, which enables reliable analysis of both conventional and extreme scenarios. The rich extensional application interface permits the customization of external force/moment, vehicle control of different subsystems and the measurement uncertainties of the on-board sensors.
Fig. 1: Rear-end collision simulation environment for RL training
As shown in Fig. 1, the rear-end collision environment developed for RL training in this study consisted of a five-lane highway, a striking vehicle, a struck vehicle and the surrounding vehicles. The collision between the striking vehicle and the struck vehicle was considered as longitudinal and lateral impact forces, which were exerted on the vehicle sprung mass [7]. A collision on the left side of the struck vehicle without proper vehicle control is demonstrated in Fig. 1: as can be seen, the struck vehicle drifts and spins after the rear-end collision impact. The motions of the struck vehicle in Fig. 1 are dominated by the longitudinal collision force, and the struck vehicle therefore drifts forward and spins clockwise. 
The unstable struck vehicle and its tyre skid track during drifting and spinning are also illustrated. The normal tyre loads of the struck vehicle, indicated by arrows beside each tyre, can also be seen, and considerable load transfers are noted. The unstable struck vehicle and surrounding vehicles are thus in a dangerous state. For different rear-end collision scenarios, custom settings can be adjusted in this RL environment, which include: (i) number/type of lanes and surface type of the road, (ii) type, initial position and initial speed of each vehicle, and (iii) impact duration, magnitude and position of the rear-end collision. Since high-speed collisions are more critical, a highway scenario was chosen for this study (Fig. 1). It is also worth pointing out that the vehicles involved in the collision can be investigated individually or as a connected vehicle network using the developed RL environment. Table 1 lists the main settings of the rear-end collision RL environment used in this study. The simulation involved a B-class car driving at 80 km/h in the centre of lane 3 before the collision impact. The rear-end collision was treated as a set of half-sine impact forces with a duration of 0.15 s, referring to [7]. Random initializations were configured for the collision side and the magnitude of the collision impact. The collision position and longitudinal impact magnitude were randomly selected from the set of {left-side, middle, right-side} and in the range of [30, 50] kN. The struck vehicle generally experienced lateral sliding and spinning after collision from the left and right sides. Secondary collisions between the struck vehicle and the surrounding vehicles in the traffic flow or the road kerb/fence were not considered.
Table 1: Main settings of the rear-end collision RL environment
Setting                                 Value
Struck vehicle type                     B-class car
Struck vehicle initial speed (V0)       80 km/h
Struck vehicle original lane            Lane 3
Surrounding vehicles                    None
Road kerb                               None
Road-adhesion coefficient               0.5 (wet asphalt)
Collision position                      Left, middle, right
Collision duration                      0.15 s
Longitudinal impact magnitude (Fx)      30–50 kN
Lateral impact magnitude (Fy)           0 kN
The selected B-class car, as shown in Fig. 1, was equipped with independent front and rear suspension, a front-axle electric drive system, a hydraulic braking system and a power-steering system. This vehicle was conventional and could be conveniently re-equipped with wire steering/braking/driving systems, which was necessary for automated drift control after the rear-end collision. 
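The collision settings of Table 1 can be illustrated with a minimal Python sketch. It is not the CarSim interface used in the study: the function names `sample_collision` and `half_sine_force` are hypothetical, the uniform distribution over the 30–50 kN range is an assumption (the text only states that the magnitude was randomly selected in that range), and the impact magnitude is read here as the peak of the half-sine pulse.

```python
import math
import random

# Values taken from Table 1.
COLLISION_POSITIONS = ("left", "middle", "right")
IMPACT_DURATION_S = 0.15
FX_RANGE_KN = (30.0, 50.0)


def sample_collision(rng):
    """Draw one collision scenario: (position, peak longitudinal force in kN).

    Assumption: the magnitude is drawn uniformly from the stated range.
    """
    position = rng.choice(COLLISION_POSITIONS)
    peak_fx_kn = rng.uniform(*FX_RANGE_KN)
    return position, peak_fx_kn


def half_sine_force(t, peak_kn, duration=IMPACT_DURATION_S):
    """Half-sine impact force (kN) at time t (s); zero outside [0, duration].

    Assumption: peak_kn is the maximum of the pulse, reached at duration / 2.
    """
    if t < 0.0 or t > duration:
        return 0.0
    return peak_kn * math.sin(math.pi * t / duration)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the sketch repeatable
    position, peak = sample_collision(rng)
    print(position, round(peak, 1), round(half_sine_force(0.075, peak), 1))
```

Drawing a new scenario at the start of every training iteration would mirror the random initialization described above; how the force is actually applied to the sprung mass is handled by the simulator and is not modelled here.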
Further parameters of the B-class struck vehicle are summarized in Table 2; these were obtained partly from the CarSim platform and partly from the literature. The 205/45 R17 tyre was selected for this vehicle for its sporty and adequate handling performance.
Table 2: Struck vehicle parameters
Parameter               Value
Sprung mass             1020 kg
  – Roll inertia        308.6 kg·m²
  – Pitch inertia       1020 kg·m²
  – Yaw inertia         1020 kg·m²
Wheelbase               2.33 m
Track width             1.48 m
Suspension type         Independent
Drive type              Front electric axle
Braking type            Hydraulic
Tyre specification      205/45 R17
The vehicle dynamics responses after a rear-end collision are generally within the non-linear region; simplified and linearized vehicle models were thus not suitable for training the drift-control policy in this study. The non-linear characteristics of the above-mentioned struck vehicle, namely the steering system, braking system and tyre forces/moment, are illustrated in Figs 2–4, respectively. The relationships between the front wheel-steering angles and the steering-wheel angle were defined by the steering kinematics structure. 
Within a steering-wheel angle range of ±100° (turning to the left is positive), the wheel-steering angles were nearly identical and could be treated as a linear range. With the increase of the steering-wheel angle during a steering manoeuvre, the inner wheel angle was greater than that of the outer wheel, and the difference between the wheel-steering angles tended to increase with a greater steering-wheel angle. At the maximum steering-wheel angle of 540°, the kinematic wheel-steering angles were approximately 42° and 30°, respectively, for the inner and outer wheels.
Fig. 2: Relationship between steering-wheel angle and wheel-steering angles
Fig. 3: Relationship between hydraulic braking pressure and braking torque
Fig. 4: Non-linear tyre characteristics (205/45 R17): (a) longitudinal force; (b) lateral force; and (c) aligning moment
For the hydraulic braking system, the front and rear axles used different actuating laws due to the different axle loads, as seen in Fig. 3. The relationship between the front-axle braking pressure and the delivered braking torque was linear, at a rate of 100 Nm/MPa. A piecewise linear relationship was selected for the rear axle. A lower braking rate applied when the braking pressure was greater than 1.5 MPa. This was intended to prevent lock-up of the rear wheels during normal scenarios to enhance the yaw stability. The maximum braking pressure was set to 5 MPa. 
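The pressure-to-torque laws described above can be sketched in a few lines of Python. The front-axle rate (100 Nm/MPa), the rear-axle break point (1.5 MPa) and the pressure limit (5 MPa) are taken from the text; the two rear-axle rates are not stated there, so they are left as required arguments, and the values in the usage example are purely hypothetical. The function names are likewise illustrative.

```python
MAX_PRESSURE_MPA = 5.0          # saturation stated in the text
FRONT_GAIN_NM_PER_MPA = 100.0   # front-axle rate stated in the text
REAR_KNEE_MPA = 1.5             # rear-axle break point stated in the text


def clamp_pressure(p_mpa):
    """Limit a commanded braking pressure to the range [0, 5] MPa."""
    return min(max(p_mpa, 0.0), MAX_PRESSURE_MPA)


def front_brake_torque(p_mpa):
    """Linear front-axle law: torque (Nm) from pressure (MPa)."""
    return FRONT_GAIN_NM_PER_MPA * clamp_pressure(p_mpa)


def rear_brake_torque(p_mpa, gain_low, gain_high):
    """Piecewise-linear rear-axle law.

    gain_low (Nm/MPa) applies up to the break point and gain_high above it.
    Both rates are hypothetical inputs; the text only says that the rate
    is lower above 1.5 MPa, i.e. gain_high < gain_low.
    """
    p = clamp_pressure(p_mpa)
    if p <= REAR_KNEE_MPA:
        return gain_low * p
    return gain_low * REAR_KNEE_MPA + gain_high * (p - REAR_KNEE_MPA)


if __name__ == "__main__":
    print(front_brake_torque(2.0))
    # Hypothetical rear-axle rates, chosen only to exercise the function.
    print(rear_brake_torque(3.0, gain_low=60.0, gain_high=30.0))
```

Keeping the torque continuous at the break point is a design choice of this sketch; it matches the usual meaning of a piecewise linear map but should be checked against Fig. 3.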
The non-linearity in the hydraulic pressure delivery was not considered, while the dynamics of the actuators were considered using a time constant of 0.06 s. The ABS of the struck vehicle was disabled for this study, since the braking pressure was regulated by the synthesized drift-control policy. The tyre non-linearities were mainly in the generation of longitudinal force, lateral force and aligning moment, as illustrated in Fig. 4a–c, respectively. The tyre forces and moment were obtained via a look-up table on the CarSim platform, based on the corresponding tyre slips and normal tyre load. Variations in normal tyre load during vehicle drifting and spinning contributed to the non-linearities as well. Increased normal tyre loads generally led to greater tyre forces/moment. The longitudinal tyre force increased with the longitudinal slip ratio until it reached the maximum value near a slip ratio of 0.15. Further increases in the tyre slip ratio led to a decreasing longitudinal tyre force. A similar relationship between the lateral tyre force and the lateral slip angle can be seen in Fig. 4b. The maximum lateral tyre forces occurred at a slip angle of approximately 8°. As shown in Fig. 4c, the aligning moment of the tyre reached its maximum value near a slip angle of 1.5°. The aligning moment then decreased with further increases in the slip angle, and changed direction at a slip angle of about 8°. As for the combined longitudinal and lateral tyre slip scenario—which generally applies during conventional and extreme vehicle operation—the equivalent tyre slips were calculated based on the normalized total slip [17]. 
Assuming symmetric tyre characteristics, the total tyre slip, |${\sigma _{total}}$|⁠, can be expressed as: $$\begin{eqnarray} {\sigma _{total}} &=& {\sqrt {\sigma _x^2 + \sigma _y^2} }\nonumber\\{\sigma _x} &=& {\frac{\kappa }{{1 + \kappa }}}\nonumber\\{\sigma _y} &=& {\frac{{\tan \,\,\alpha }}{{1 + \kappa }}} \end{eqnarray}$$(1) where |$\kappa $| is the pure longitudinal slip ratio and |$\alpha $| is the pure lateral slip angle. The normalized total tyre slip, |${\bar{\sigma }_{total}}$|⁠, is then calculated as: $$\begin{eqnarray} {{\bar{\sigma }}_{total}} &=& {\sqrt {\bar{\sigma }_x^2 + \bar{\sigma }_y^2} }\nonumber\\{{\bar{\sigma }}_x} &=& {\frac{{{\sigma _x}}}{{\sigma _x^*}}}\nonumber\\{{\bar{\sigma }}_y} &=& {\frac{{{\sigma _y}}}{{\sigma _y^*}}} \end{eqnarray}$$(2) where |$\sigma _x^*$| and |$\sigma _y^*$| are obtained from the tyre slips regarding the maximum longitudinal and lateral tyre forces, as indicated in Fig. 4. Applying the combined slip theory, the equivalent longitudinal tyre slip ratio, |${\kappa _e}$|⁠, and equivalent lateral slip angle, |${\alpha _e}$|⁠, for the combined slip scenarios are then obtained as: $$\begin{eqnarray} {\kappa _e} &=& {\frac{{{{\bar{\sigma }}_{total}}\sigma _x^*sign\left( {{\sigma _x}} \right)}}{{1 + {{\bar{\sigma }}_{total}}\sigma _x^*sign\left( {{\sigma _x}} \right)}}}\nonumber\\{\alpha _e} &=& {{{\tan }^{ - 1}}\left( {{{\bar{\sigma }}_{total}}\sigma _y^*sign\left( {{\sigma _y}} \right)} \right)} \end{eqnarray}$$(3) 3. Reinforcement learning method In this study, the post-collision safety of only the struck vehicle, as shown in Fig. 1, was considered; the objective was to minimize its lateral deviation and heading angle deviation from the original lane. The selected states of the struck vehicle were the measurable variables, such as wheel-spin rates, centre of gravity (CG) and the three-dimensional inertial states. 
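As a numerical illustration of Equations (1)–(3), the following Python sketch evaluates the total slip and the equivalent slips exactly as the equations are written above. The function names are hypothetical. In the usage example, the peak-force slips are derived from the peak locations quoted earlier for Fig. 4 (a slip ratio of 0.15 and a slip angle of about 8°); converting those peaks to |$\sigma _x^*$| and |$\sigma _y^*$| through Equation (1) is an assumption of this sketch, not a statement of how the tyre look-up tables were processed.

```python
import math


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def theoretical_slips(kappa, alpha):
    """Equation (1): sigma_x, sigma_y and sigma_total (alpha in radians)."""
    sigma_x = kappa / (1.0 + kappa)
    sigma_y = math.tan(alpha) / (1.0 + kappa)
    return sigma_x, sigma_y, math.hypot(sigma_x, sigma_y)


def equivalent_slips(kappa, alpha, sigma_x_star, sigma_y_star):
    """Equations (2) and (3): equivalent slip ratio and slip angle (rad)."""
    sigma_x, sigma_y, _ = theoretical_slips(kappa, alpha)
    # Equation (2): normalized total slip.
    norm_total = math.hypot(sigma_x / sigma_x_star, sigma_y / sigma_y_star)
    # Equation (3), implemented term by term as written in the text.
    scaled_x = norm_total * sigma_x_star * sign(sigma_x)
    kappa_e = scaled_x / (1.0 + scaled_x)
    alpha_e = math.atan(norm_total * sigma_y_star * sign(sigma_y))
    return kappa_e, alpha_e


if __name__ == "__main__":
    # Assumed peak-force slips, derived from the peaks quoted for Fig. 4.
    sx_star = 0.15 / 1.15
    sy_star = math.tan(math.radians(8.0))
    print(equivalent_slips(0.05, math.radians(3.0), sx_star, sy_star))
```

For pure lateral slip (|$\kappa = 0$|) the sketch returns the original slip angle, which is a quick sanity check on the normalization.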
The selected control actions were the steering-wheel angle, the front-axle electric driving torque and the braking pressures of the rear wheels. Since the vehicle's on-board wire actuators are usually constrained by a certain response rate and saturation threshold, a deterministic control policy that generated continuous action commands was preferred. The DDPG method was thus employed, incorporating a deep R technique [16]. During training, a fully connected actor neural network and critic neural network were initiated in order to approximate the optimal control policy and the true value function. The parameters of the actor network were perturbed using the parameter noise method [18], as shown in Fig. 5, for enhanced exploration within the whole state space. The generated control actions, including the steering wheel angle, driving torque and braking pressures, were then imported to the rear-end collision RL environment. The simulated data, including the vehicle states and reward signals, was obtained and stored in a data buffer. As has already been shown [16], the DDPG algorithm could converge and the near-optimal control policy could be synthesized. The stored data was sampled for the update of the critic and actor networks, as described below.
Fig. 5: RL training flowchart with DDPG [16] and parameter noise exploration [18]
The RL method was chosen to maximize the return |${G_t}$| of each training iteration [6], which can be expressed as: $$\begin{eqnarray} {G_t} = \mathop \sum \limits_{i = t}^{T - 1} {\gamma ^{i - t}}{r_i} \end{eqnarray}$$(4) where |$T$| is the terminal time of an iteration, which can also be infinite for continuous tasks; |$t$| is the current time; |$i$| indicates the step number; |$\gamma $| is the discount rate; and |$r$| is the immediate reward signal as a function of the selected vehicle states, |$s$|, and control actions, |$a$|, which is designed as: $$\begin{eqnarray} r = - \left| {{s_w}} \right| - \left| {{a_w}} \right| + b \end{eqnarray}$$(5) where |${s_w}$| and |${a_w}$| are the weighted vehicle states and weighted control actions, respectively, and |$b$| is a positive bonus reward signal when the controlled struck vehicle approaches the original lane. The vehicle states for drift control of the struck vehicle after the rear-end collision were selected according to the measurability of on-board sensors, as: $$\begin{eqnarray} s = \left[ {{\omega _1};{\omega _2};{\omega _3};{\omega _4};{a_x};{a_y};{a_z};{r_x};{r_y};{r_z};{v_x};{v_y};{p_x};{p_y};\varphi } \right]\nonumber\\ \end{eqnarray}$$(6) where |${\omega _1}$|, |${\omega _2}$|, |${\omega _3}$| and |${\omega _4}$| are the spin rates of the four wheels; |${a_x}$|, |${a_y}$| and |${a_z}$| are the longitudinal, lateral and vertical accelerations of the sprung mass; |${r_x}$|, |${r_y}$| and |${r_z}$| are the roll rate, pitch rate and yaw rate of the sprung mass; |${v_x}$| and |${v_y}$| are the longitudinal and lateral velocities of the vehicle; |${p_x}$| and |${p_y}$| are the longitudinal and lateral positions of the vehicle; and |$\varphi $| is the vehicle heading angle with respect to the original lane. 
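The return of Equation (4) and the reward of Equation (5) can be made concrete with a short Python sketch. The function names are hypothetical, and two assumptions are made because the text does not specify them: |$\left| \cdot \right|$| is read as the sum of absolute values of the already weighted components, and the bonus |$b$| is passed in by the caller rather than derived from the distance to the original lane.

```python
def discounted_return(rewards, gamma, t=0):
    """Equation (4): G_t = sum_{i=t}^{T-1} gamma**(i - t) * r_i.

    rewards holds r_0 ... r_{T-1} for one finite iteration, so T = len(rewards).
    """
    g = 0.0
    # Accumulate backwards so each step costs one multiply and one add.
    for r in reversed(rewards[t:]):
        g = r + gamma * g
    return g


def reward(weighted_states, weighted_actions, bonus=0.0):
    """Equation (5): r = -|s_w| - |a_w| + b.

    Assumption: |.| is the L1 norm of the weighted components.
    """
    state_cost = sum(abs(x) for x in weighted_states)
    action_cost = sum(abs(x) for x in weighted_actions)
    return -state_cost - action_cost + bonus


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    print(reward([1.0, -2.0], [0.5], bonus=1.0))
```

Because both cost terms enter with a negative sign, smaller state deviations and smaller actions raise the return, while the bonus is the only term that can make it positive.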
The selected control actions in this study were: $$\begin{eqnarray} a = \left[ {\delta ;\, {T_f};\,{P_l};{P_r}} \right] \end{eqnarray}$$(7) where |$\delta $| is the steering-wheel angle, |${T_f}$| is the driving torque of the front axle, and |${P_l}$| and |${P_r}$| are the braking pressures of the left and right wheel, respectively, of the rear axle. Using the DDPG algorithm [16], a deterministic control policy, |$a\ = \ \pi ( s )$|⁠, was obtained. As illustrated in Fig. 5, an actor neural network |$\pi ( { s |\theta } )$| with weighting parameters |$\theta $| and a critic neural network |$Q( { {s,\ a} |w} )$| with weighting parameters |$w$| were first initiated, in order to approximate the control policy and the corresponding expected return |${\mathbb{E}_\pi }\{ {{G_t}|s,\ a} \}$|⁠, respectively. For the sake of a more stable training process, a target actor neural network and a target critic neural network were also initialized with weighting parameters |$\theta ^{\prime}$| and |$w^{\prime}$|⁠, respectively. All four neural networks were fully connected networks. In order to iteratively update these neural networks until convergence in a near-optimal control policy, control actions generated by |$\pi ( { s |\theta } )$| with perturbed weighting parameters [18] were used to explore the above-developed RL environment. The transitions of |$( {{s_t},\ {a_t},\ {r_t},\ {s_{t + 1}}} )$| were obtained through simulation of the RL environment and then stored within a large-size data buffer. A mini-batch of |$N$| transitions were uniformly sampled from the data buffer [19], and the critic neural network weightings, |$w$|⁠, were updated by optimizing the following loss function: $$\begin{eqnarray} L\left( w \right) &=& {\frac{1}{N}\mathop \sum \limits_i {{\left( {{y_i} - Q\left( {\left. {{s_i},{a_i}} \right|w} \right)} \right)}^2}}\nonumber\\{y_i} &=& {{r_i} + \gamma Q\left( {\left. {{s_{i + 1}},\pi \left( {\left. 
{{s_{i + 1}}} \right|\theta ^{\prime}} \right)} \right|w^{\prime}} \right)} \end{eqnarray}$$(8) where |${y_i}$| is the target value calculated using the self-consistency relationship of the action-value function [20]. The sampled policy gradient used to update the actor neural network was subsequently approximated by applying the chain rule as: $$\begin{eqnarray} {\nabla _\theta } \approx \frac{1}{N}\mathop \sum \limits_i {\left. {{\nabla _a}Q\left( {\left. {s,a} \right|w} \right)} \right|_{{s_i},\pi \left( {{s_i}} \right)}}{\left. {{\nabla _\theta }\pi \left( {\left. s \right|\theta } \right)} \right|_{{s_i}}} \end{eqnarray}$$(9) After the updating of the actor and critic neural networks, the target actor and critic neural networks were also updated, as: $$\begin{eqnarray} \theta ^{\prime} &\leftarrow& {\tau \theta + \left( {1 - \tau } \right)\theta ^{\prime}}\nonumber\\w^{\prime} &\leftarrow& {\tau w + \left( {1 - \tau } \right)w^{\prime}} \end{eqnarray}$$(10) where |$\tau $| is a hyperparameter tuned to prevent overestimation of these neural networks and maintain training stability.
4. Results and discussion
After 60,000 training iterations, the returns as expressed in Equation (4) were smoothed and plotted, as shown in Fig. 6. It can be seen that the return grew from approximately −1150 and saturated at about 100, which implies the gradual evolution and convergence of the drift-control policy. The return became positive after approximately 25,000 iterations, which can be attributed to the designed bonus reward in Equation (5). It should be noted, however, that the return still fluctuated near the saturation state. This was most probably due to the random initialization of the collision force for each iteration. It also suggests that the synthesized control policy was able to stabilize the struck vehicle under different initial conditions of the rear-end collision in the developed RL environment.
Fig. 6: Convergence of iteration returns during training
The stabilized vehicle yaw-plane motions, in terms of the CG trajectory and the heading angle after a left-side rear-end collision, are demonstrated in Fig. 7a and b, respectively. The collision impact magnitude was 40 kN (the average of the selected training range), and other collision scenarios exhibited a similar trend with the synthesized drift control. The divergent vehicle motions without any active control intervention are also compared in the figures. The simulations were performed for 10 s, which was the same as the duration of a training iteration.
Fig. 7: Comparison of divergent and stabilized vehicle motions after a left-side rear-end collision: (a) vehicle CG trajectory; and (b) heading angle
It can be seen from Fig. 7a that the controlled struck vehicle drifted to the neighbouring lane and then back to the original lane. The vehicle drifted forward about 100 m and stopped near the centre of the original lane with a deviation of about 0.5 m. The uncontrolled vehicle, on the other hand, was unable to drift back to the original lane, and the lateral deviation could be more than 10 m, which risked causing a second collision. Moreover, the stabilized heading angle was nearly along the original lane after a spin of about 360°, which reduced the effect on the high-speed traffic flow. Focusing on one tyre of the controlled struck vehicle, the relationship of longitudinal and lateral tyre forces during the drift-control manoeuvre is illustrated in Fig. 8. 
Near-elliptical contours of the tyre forces can be noticed, which can be attributed to the combined tyre-slip model, as expressed in Equation (3), and the effect of normal tyre load, as shown in Fig. 4. The combined tyre-slip model employed constrained the available longitudinal and lateral forces, due to the limitation of tyre–road adhesion conditions. An idealized elliptical relationship should be obtained according to conventional tyre mechanics, neglecting the effect of normal tyre-load variation. Considerable tyre-load transfer was experienced during the drifting/spinning of the struck vehicle, however, as shown in Fig. 7. The variation in normal tyre force contributed to variations in the maximum longitudinal and lateral tyre forces and thus the transitions of different elliptical relationships, as illustrated in Fig. 8.
Fig. 8: Relationship of longitudinal and lateral tyre forces during vehicle drift control
Fig. 9 further illustrates the drift-control actions of the struck vehicle. The control policy was activated after the initial collision at 0.15 s and remained activated until the simulation terminated. The steering-wheel angle and driving torque of the front axle are shown in Fig. 9a, and the braking pressures of the two rear wheels are shown in Fig. 9b. It can be seen that the control actions respected the actuators’ physical constraints, in terms of the feasible response rate and magnitude. For the conventional front-drive electric vehicle examined in this study, the front-axle steering-wheel angle and driving torque were constrained to ±500° and ±300 Nm, respectively, and the rear-axle braking pressures were limited to 5 MPa.
Fig. 9: Synthesized control policy for a left-side rear-end collision: (a) front-axle steering-wheel angle and driving torque; and (b) rear-axle wheel-braking pressures
The drift-control actions after the rear-end collision generally approached the actuators’ constraint threshold, which may be due to the large magnitude of the initial unstable yaw motions. This differed from the effects of traditional ESC or ABS systems [2–4]. The high-scale motions were then stabilized through alternative steering and driving, as well as the suitable differential braking generated by the synthesized controller. The pressure differences were more obvious at around 2 s, when the yaw motions continued to increase, and at around 7 s, when the vehicle tended to stop. The braking pressures were mostly saturated, which can be attributed to the need for velocity reduction, as expressed in Equation (5). When the struck vehicle was almost at a complete stop, as seen at 9–10 s in Fig. 9, the control actions generated by the synthesized policy tended to be zero. This is as expected, since smaller actions contribute to a greater iteration return during RL training.
5. Conclusions
Unstable yaw motions of a struck vehicle after a high-speed rear-end collision were stabilized using a self-learning drift-control policy. The struck vehicle experienced drifting and spinning, and stopped along its original lane. The control policy gradually gained this ability through a data-driven RL process. The drifting and spinning of the struck vehicle were eventually eliminated via alternative steering and electric driving, as well as differential braking control. 
The control actions generally approached the actuators’ constraint threshold, due to the large magnitude of the motions. The synthesized drift-control policy was effective for different rear-end collision scenarios, irrespective of the considered collision position and impact magnitude. The data-driven method in this study, however, neglected conventional vehicle dynamics principles, which would increase the time needed for controller synthesis. The obtained controller is usually not explainable, especially using deep neural networks, although it may be valuable for complex driving tasks. The integration of explicit vehicle model will thus need to be addressed in further studies. Acknowledgements This work is supported by International Science & Technology Cooperation Program of China (Grant No. 2019YFE0100200) and the National Natural Science Foundation of China (Grant No. 51905483). This paper is also partially supported by Toyota. Conflict of interest statement None declared. References 1. Li SE , Zheng Y, Li K et al. . Dynamical modeling and distributed control of connected and automated vehicles: challenges and opportunities . IEEE Intell Transp Syst Mag . 2017 ; 9 : 46 – 58 . Google Scholar Crossref Search ADS WorldCat 2. Ferguson SA . The effectiveness of electronic stability control in reducing real-world crashes: a literature review . Traffic Inj Prev . 2007 ; 8 : 329 – 38 . Google Scholar Crossref Search ADS PubMed WorldCat 3. Høye A . The effects of electronic stability control (ESC) on crashes: an update . Accid Anal Prev . 2011 ; 43 : 1148 – 59 . Google Scholar Crossref Search ADS PubMed WorldCat 4. Liebemann E , Meder K, Schuh J et al. . Safety and performance enhancement: the Bosch electronic stability control (ESP) . Technical paper . SAE International 2004 . 5. Li SE , Chen H, Li R et al. . Predictive lateral control to stabilise highly automated vehicles at tire-road friction limits . Veh Syst Dyn . 2020 ; 58 : 1 – 19 . 
6. Guan Y, Li SE, Duan J et al. Direct and indirect reinforcement learning. arXiv:1912.10600 [cs.LG], 2019.
7. Zhou J, Lu J, Peng H. Vehicle stabilization in response to exogenous impulsive disturbances to the vehicle body. Int J Veh Auton Syst. 2010; 8: 242–62.
8. Brach RM, Goldsmith W. Mechanical Impact Dynamics: Rigid Body Collisions. American Society of Mechanical Engineers Digital Collection, 1991.
9. Kim B, Peng H. Vehicle stability control of heading angle and lateral deviation to mitigate secondary collisions. In: 11th International Symposium on Advanced Vehicle Control, Seoul, South Korea, 2012, 1–6.
10. Zhang F, Gonzales J, Li SE et al. Drift control for cornering maneuver of autonomous vehicles. Mechatronics. 2018; 54: 167–74.
11. Cai P, Mei X, Tai L et al. High-speed autonomous drifting with deep reinforcement learning. arXiv:2001.01377 [cs.RO], 2020.
12. Cutler M, How JP. Autonomous drifting using simulation-aided reinforcement learning. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 2016, 5442–8.
13. Mnih V, Kavukcuoglu K, Silver D et al. Human-level control through deep reinforcement learning. Nature. 2015; 518: 529–33.
14. Haarnoja T, Zhou A, Abbeel P et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv:1801.01290 [cs.LG], 2018.
15. Silver D, Lever G, Heess N et al. Deterministic policy gradient algorithms. In: Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014, 387–95.
16. Lillicrap TP, Hunt JJ, Pritzel A et al. Continuous control with deep reinforcement learning. arXiv:1509.02971 [cs.LG], 2015.
17. Bakker E, Pacejka HB, Lidner L. A new tire model with an application in vehicle dynamics studies. In: Autotechnologies Conference and Exposition, Monte Carlo, Monaco, 1989, 101–113.
18. Plappert M, Houthooft R, Dhariwal P et al. Parameter space noise for exploration. arXiv:1706.01905 [cs.LG], 2017.
19. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 [cs.LG], 2015.
20. Li SE. Reinforcement learning and control. Lecture notes. Tsinghua University, 2019.

© The Author(s) 2020. Published by Oxford University Press on behalf of Central South University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

TI - Self-learning drift control of automated vehicles beyond handling limit after rear-end collision JF - Transportation Safety and Open Environment DO - 10.1093/tse/tdaa009 DA - 2020-06-01 UR - https://www.deepdyve.com/lp/oxford-university-press/self-learning-drift-control-of-automated-vehicles-beyond-handling-WBeElyrrD0 SP - 97 EP - 105 VL - 2 IS - 2 DP - DeepDyve ER -