1. Introduction

The development of electronic trading systems and computing technology has made stock trading more efficient since the end of the 20th century. The stock market has experienced tremendous expansion and generated a large amount of stock trading data and information. Therefore, how to grasp the operating mechanism of stock price movements has become a hot topic for researchers. However, the price trend in the stock market is a complex nonlinear dynamic system [1]. The time series of stock prices is influenced by the internal micro environment of enterprises, such as company performance and growth potential. In addition, stock prices are also influenced by external macroeconomic factors, such as changes in Gross Domestic Product (GDP), market interest rates, and media opinion [2]. The fluctuation of stock prices is often described as a stochastic process [3]. Econometric models are used to describe stock behavior, such as traditional time series prediction methods, exponential smoothing, and autoregressive integrated moving average (ARIMA) models [4, 5]. In recent years, artificial intelligence technology has been widely used to solve complex nonlinear time series stock prediction problems [6]. Machine learning and deep learning are the main effective methods for predicting stock prices. By constructing neural networks, these models can simulate the human brain to analyze, learn, and interpret data such as images, sound, and text. A common approach is to treat stock price prediction as a regression problem over time series. The common regression metrics used to measure prediction performance include mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the goodness of fit (R2) of the model [7–10]. Although the methods proposed in many studies improve the performance of LSTM models in many respects, there are still some potential limitations and sources of error or bias. (1) Computational complexity. The combination with optimization algorithms such as adaptive genetic algorithms may increase the computational complexity, resulting in longer training times. (2) Model generalization ability. Although hyperparameters are optimized, the generalization ability of the model under different market conditions (such as bull and bear markets) may be limited, requiring more validation and testing. (3) The influence of randomness. Metaheuristic algorithms are stochastic and need to be run multiple times to ensure the stability of the results. (4) Data quality and quantification strategies. Data quality and preprocessing methods, which handle the noise and outliers of stock prices, have an important impact on prediction results. Particle Swarm Optimization (PSO) is a swarm intelligence algorithm that has unique advantages due to its high efficiency. In this study, a novel PSO-LSTM stock price prediction model is proposed that leverages PSO to optimize LSTM hyperparameters. The decision to employ PSO for hyperparameter optimization stems from its efficacy in addressing complex optimization problems, particularly in the high-dimensional spaces encountered in deep learning models such as LSTMs. PSO excels in searching for global optima and converges relatively quickly compared to other optimization techniques, thus efficiently exploring hyperparameter combinations and saving computational resources.
Moreover, the non-convex nature of hyperparameter optimization in LSTMs necessitates robust algorithms like PSO, capable of navigating complex, non-linear search spaces. Additionally, PSO's ease of implementation and tuning makes it accessible for researchers and practitioners seeking effective hyperparameter optimization strategies. By leveraging PSO, the predictive performance of the LSTM-based stock price prediction model is enhanced while addressing the challenges posed by high-dimensional optimization tasks. In addition, stock prices are high-noise, high-dimensional time series, so data preprocessing methods and feature selection strategies are also important for improving the performance of stock prediction models. Therefore, the model proposed in this study consists of three parts: (1) Data preprocessing, which includes the wavelet transform (WT) and correlation analysis. WT is applied to denoise the time series, and correlation analysis is used to select the input features. (2) PSO is used to optimize the number of training iterations and the number of hidden neurons in the LSTM neural network. (3) The hyperparameters of the optimal solution obtained from PSO optimization are used as inputs of the LSTM model to predict stock prices. By comparing LSTM models with different numbers of hidden layers, the PSO-LSTM configuration with the best performance is selected as the output. To analyze and compare the predicted results, RMSE, MAE, MAPE, and R2 are applied to evaluate the regression models. According to the efficient market hypothesis (EMH), markets differ in their degree of development and efficiency. Selecting indices from different levels of market development helps establish the robustness of the algorithm, as market conditions may affect the effectiveness of stock forecasting. Therefore, six stock indices are used to test the performance of the model. These indices include the Dow Jones Industrial Average (DJIA) index of the New York Stock Exchange, the Standard & Poor's 500 (S&P 500) index, the Nikkei 225 index of Tokyo, the Hang Seng Index of the Hong Kong market, the CSI300 index of the Chinese Mainland stock market, and the Nifty50 index of India. Among these six stock indices, the DJIA and S&P 500 represent the most developed and efficient markets. The Hang Seng Index and Nikkei 225 represent the middle stage between efficient and inefficient markets. The CSI300 and Nifty50 represent developing markets. The main contributions and novelty of this article are as follows. First, a comprehensive experimental analysis utilizing data from six global stock indices across markets of varying development levels facilitates a thorough comparative study of model performance. Second, a novel PSO-LSTM stock price prediction model is introduced that leverages particle swarm optimization to optimize LSTM hyperparameters. Finally, the influence of diverse retrospective periods (50 days, 20 days, and 7 days) on model performance is investigated, providing insights into the model's efficacy across different prediction cycles for practical applications. The remainder of this article is organized as follows. Section 2 reviews related work. Section 3 introduces the methodology applied in this study. Section 4 presents the details of the experimental design. Section 5 illustrates the experimental results. Section 6 summarizes the research.

2. Related work

Machine Learning (ML) has emerged as a transformative tool across various scientific and industrial domains.
In the realm of environmental science, ML has been applied to landslide susceptibility mapping (LSM) in the Three Gorges Reservoir area. Utilizing Automated Machine Learning (AutoML), this approach simplifies the modeling process for non-experts by automating the selection and tuning of models, and its implementation has shown a significant performance improvement. ML's utility is also evident in geotechnical engineering, particularly in predicting reservoir landslide displacements [11]. An Earthworm Optimization Algorithm-optimized Support Vector Regression (EOA-SVR) model has been developed, surpassing traditional metaheuristic models in stability and performance. This method provides a reliable tool for medium- and long-term landslide early warning systems, which is crucial for disaster management and mitigation. As for post-disaster analysis, ML techniques, combined with multitemporal remote sensing, were used to monitor and analyze the evolution of landslide activity over a decade [12]. The use of algorithms like Random Forest facilitated the accurate mapping of landslides, revealing a gradual decrease in occurrences over time, despite intermittent spikes due to monsoonal rains. The study of long-term hydrological changes in the pan-Arctic region illustrates ML's role in environmental monitoring [13]. By employing algorithms like the Extreme Gradient Boosting Tree (XGBoost), researchers have reconstructed historical water levels of pan-Arctic lakes, integrating these findings with climatic and hydrological data. This application not only enhances the understanding of hydrological dynamics influenced by climate change but also provides a methodological blueprint for similar environmental studies. These instances underline ML's broad applicability and potential in providing innovative solutions to complex problems across various disciplines, reinforcing its role as a cornerstone technology in modern science and engineering [14]. Deep learning (DL) is a subfield of ML. The LSTM neural network proposed by Hochreiter and Schmidhuber has shown excellent performance in time series prediction [15]. Compared with the Convolutional Neural Network (CNN), the LSTM neural network uses a gate structure to selectively learn useful hidden information from a large amount of complex historical data. Therefore, LSTM can better capture the patterns and trends of time series [16]. Reference [17] used LSTM to predict the returns on investment portfolios of the S&P 500 index; the experimental results show that LSTM performs better than Random Forest (RF), Deep Neural Network (DNN), and Logistic Regression (LR). Reference [18] used LSTM to predict the opening prices of Google and NKE stocks, and the results demonstrated that the LSTM model had good predictive ability. Reference [19] established an LSTM prediction model, using historical prices and technical analysis indicators as input variables to predict future trends in stock prices; LSTM was compared with other machine learning methods, and the experimental results showed that LSTM has excellent performance. Reference [20] compared LSTM with Support Vector Regression (SVR) using 9 common technical indicators to predict the stock prices of 5 US-listed international companies. The results showed that LSTM had better average prediction accuracy than SVR. The accuracy and convergence of LSTM highly depend on the combination of hyperparameters, such as the number of hidden layers and the number of neurons in each layer.
However, the network topology constructed from these hyperparameters is difficult to adjust manually one by one, which makes it hard to guarantee a suitable and optimal network structure for practical applications. Metaheuristic algorithms are optimization techniques that can be used to optimize the hyperparameters of machine learning models [21, 22]. Compared with traditional optimization methods, which may fall into local minima and fail to find the optimal solution, metaheuristic algorithms can find good hyperparameters even under complex data distributions and huge search spaces [23–25]. PSO [26] is a swarm intelligence algorithm that has unique advantages among metaheuristic algorithms due to its high efficiency. The principle of the PSO algorithm is to simulate the process of birds searching for food, dynamically adjusting their positions based on individual and group extrema. PSO searches for the best solution to a given problem using a group of candidate solutions, called particles, in the search space. The iterative process of the algorithm has only a small number of parameters that need to be adjusted, such as particle velocity, particle quantity, and particle position. Based on the best position of each particle so far and the best position of the population so far, the position and velocity of the particles are updated during each iteration. As particles move towards the best positions found so far, unpromising regions of the search space are automatically avoided. This iterative optimization helps particles avoid local minima and converge to the global optimal solution. Moreover, the adaptability of the PSO algorithm enables the LSTM to determine optimal parameters based on data features quickly and accurately, which also helps reduce the computational cost of tuning the model. The combination of the adaptive PSO algorithm and self-learning iterative optimization of the key parameters of the LSTM model avoids manual parameter adjustment and improves the efficiency and reliability of the model [27, 28]. Many types of LSTM (and other RNNs) have been tuned by metaheuristic algorithms for various purposes. Parkinson's disease diagnosis is a challenging task due to the absence of reliable tests. Cuk et al. explored the potential of LSTM neural networks combined with attention mechanisms and proposed an optimized crayfish algorithm to detect Parkinson's disease accurately [29]. The method achieved promising results with an accuracy of 87.42% using dual-task walking test data. Quick process shift detection is vital for modern smart manufacturing. Yang et al. proposed single and stacked LSTM models optimized with metaheuristic optimizers to detect shifts in high-dimensional manufacturing processes [30]. The CSOS_S_LSTM model achieved the best results with a shorter out-of-control run length, improving response time by 38.77% on average. Bacanin et al. proposed the use of a long short-term memory (LSTM) deep learning model for cloud load time-series forecasting [31]. It utilized LSTM with attention layers and a modified particle swarm optimization (PSO) algorithm, and variational mode decomposition (VMD) was used for data preprocessing. The proposed methodology outperformed other techniques in terms of performance metrics, and SHAP analysis was used to assess feature importance. The methodology has potential for assisting cloud providers in resource allocation and provisioning decision-making processes. Francisco J.
employed the Automated Machine Learning (AutoML) process for feature selection, model creation, and hyperparameter optimization to develop a machine learning model for Google stock price forecasting [32]. The AutoML process selected features from 11 technical indicators. Hyperparameters were optimized using PSO, achieving accurate results with errors ranging from 1E-2 to 9E-4. The CNN-LSTM network outperformed the standalone LSTM model. Pedroza-Castro et al. addressed the need for efficient computational resource management in distributed cloud-based services by proposing a methodology for forecasting cloud resource load using RNNs with attention layers [33]. The models were optimized through hyperparameter tuning using a modified PSO metaheuristic and incorporated variational mode decomposition for handling non-stationary data sequences. The study demonstrated the potential of the proposed method in accurately forecasting cloud load; the results outperformed state-of-the-art algorithms and provided valuable insights for cloud providers. To enhance the security of intrusion detection systems, Donkol introduced an enhanced LSTM technique integrated with RNN (ELSTM-RNN) [34]. The proposed system employed PSO to select effective features and an enhanced LSTM for classification. It addressed the challenges faced by existing methods and achieved better performance in detecting intrusions within network communications. The system's efficiency was validated through extensive testing on multiple datasets, demonstrating improved classification accuracy and faster training times compared with existing methods. In the latest research, many studies have applied the PSO algorithm to the field of time series prediction. The combination of PSO and LSTM models has been used to predict the trend of stock changes based on quantitative and textual information [35]. Empirical results showed that the model was superior to the BP neural network and LSTM network models. Reference [36] validated the effectiveness and applicability of the PSO-LSTM model on stock prices. Reference [37] validated the effectiveness of the PSO-LSTM model on the sales of five types of fishing gear in an online store and two publicly available datasets. Therefore, the choice of the proposed PSO-LSTM hybrid model for stock price prediction is deliberate and multifaceted. PSO's compatibility with the LSTM architecture, coupled with its proven track record and efficiency in optimization tasks, makes it a suitable choice for stock price prediction. Furthermore, PSO aligns well with the requirements and characteristics of stock price prediction due to its ability to handle the complex and dynamic nature of stock market data. The inherent swarm intelligence of PSO allows it to navigate the intricate parameter space of LSTM, facilitating the discovery of optimal solutions amidst the inherent noise and non-linearity of financial markets. Additionally, the adaptability of PSO enables it to continuously refine its search strategy, ensuring robust performance in the face of evolving market conditions. The choice of PSO is also informed by the No Free Lunch theorem [38], which underscores that, while other optimization algorithms may excel in certain contexts, no single optimizer is best for all problems; within this study's framework, PSO offers the most suitable optimization approach for the stock price prediction task.

3. Methodology

3.1 Long short-term memory neural networks

LSTM has been widely applied to predicting stock prices in recent years. The traditional RNN is prone to the problem of vanishing gradients.
By introducing forget gates and memory units, LSTM can solve this problem. Due to the presence of memory units, the flow of information is achieved through a mechanism called the cell state. LSTM can selectively retain or forget information based on its importance, which achieves dynamic learning of data patterns and improves prediction accuracy. LSTM can also overcome the long-term dependency problem in recurrent networks, as gradients can flow over a longer period of time due to the introduction of self-recurrent paths. In addition, improvements to the original LSTM model have also enhanced the reliability of the network [39]. This improvement stems from the structure of the LSTM unit. LSTM includes a unit specifically designed for long-term storage of information, and it consists of an input gate, an output gate, and a forget gate for precise control of the data flow. During training, the LSTM neural network adopts the backpropagation through time (BPTT) algorithm [40], which further improves the performance of the model. The internal structure of the LSTM computing unit is shown in Fig 1.

Fig 1. LSTM unit. https://doi.org/10.1371/journal.pone.0310296.g001

The gate control operations consist of a sigmoid activation function and a dot product operation. The forget gate, input gate, and output gate are the three gates used by LSTM to protect and control the state of a single computing unit. The forget gate determines the information that needs to be discarded, which can be expressed as Eq (1):

$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f)$ (1)

where $w_f$ represents the connection weight of the previous output, $h_{t-1}$ is the previous output, $x_t$ is the current input, $b_f$ is the bias vector, and $\sigma$ is the activation function. The input gate determines the information that needs to be updated. It is obtained by multiplying the two vectors created by the input gate layer and the tanh layer, as expressed in Eqs (2) and (3):

$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i)$ (2)
$\tilde{C}_t = \tanh(w_C \cdot [h_{t-1}, x_t] + b_C)$ (3)

The cell state is then updated with the information from the input gate and the forget gate by Eq (4), and the sigmoid activation function determines the output part by Eq (5):

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ (4)
$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o)$ (5)

Finally, the cell state is normalized to [–1, 1] through the tanh layer and multiplied element-wise with the output gate, giving the final output of the cell by Eq (6):

$h_t = o_t \odot \tanh(C_t)$ (6)

The LSTM neural network can be obtained by connecting LSTM units in a directed graph structure. Fig 2 shows a typical network structure.

Fig 2. Typical LSTM neural network. https://doi.org/10.1371/journal.pone.0310296.g002

The training of an LSTM model usually includes two steps. First, the output value of the LSTM units is calculated through forward propagation. Then back propagation is used to calculate the error and the weight gradients, which are used by a gradient descent algorithm to update the weights. Common neural network optimizers include stochastic gradient descent (SGD), the adaptive gradient algorithm (AdaGrad), and adaptive moment estimation (Adam). The SGD algorithm maintains a single learning rate during execution. Adam is used in this article; it simultaneously computes first-order and second-order moment estimates, which gives each parameter an independent adaptive learning rate and thereby reduces overfitting to a certain extent [41].
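To make the gate computations concrete, the following is a minimal NumPy sketch of a single LSTM step implementing Eqs (1)–(6); the weight layout and variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs (1)-(6). W and b are dicts of
    gate weights/biases; each W[g] has shape (hidden, hidden + inputs)
    and acts on the concatenated vector [h_{t-1}, x_t] (illustrative)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate, Eq (1)
    i = sigmoid(W["i"] @ z + b["i"])         # input gate, Eq (2)
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate state, Eq (3)
    c_t = f * c_prev + i * c_tilde           # cell state update, Eq (4)
    o = sigmoid(W["o"] @ z + b["o"])         # output gate, Eq (5)
    h_t = o * np.tanh(c_t)                   # hidden output, Eq (6)
    return h_t, c_t
```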
In the specific experimental process, hyperparameters such as the batch size, the number of hidden-layer neurons, and the number of hidden layers of the LSTM need to be set by the experimental designer.

3.2 Particle swarm optimization (PSO) algorithm

The PSO algorithm is used to optimize the hyperparameters of the LSTM neural network in this study. The core of the PSO algorithm is cooperation and information sharing among the particles in the population, and the optimal solution is obtained through iteration. Suppose there are N particles in a D-dimensional search space. The position and velocity of each particle are random at initialization. The current optimal extremum for each particle is the best solution found by that individual so far (particle best, pbest), denoted $P_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$; the global extremum (global best, gbest) is denoted $P_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$. All particles in the swarm update their velocities and positions according to Eqs (7) and (8) until the optimal solution is found [39]:

$v_{id}^{k+1} = w v_{id}^{k} + c_1 r_1 (p_{id} - x_{id}^{k}) + c_2 r_2 (p_{gd} - x_{id}^{k})$ (7)
$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$ (8)

The symbol k represents the iteration number. The spatial position of the ith particle is $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ and its velocity is $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. The symbol w is the inertia factor used to adjust the search range of the solution space; it represents the tendency of particles to maintain their historical velocity. The symbols $c_1$ and $c_2$ are acceleration constants used to adjust the maximum learning step size. The symbols $r_1$ and $r_2$ are uniform random numbers in the range [0, 1] used to increase the randomness of the search. In addition to the inertia of the particles, Eqs (7) and (8) also capture two tendencies: the tendency of a particle to approach its own historical best position, and the tendency to approach the historical best position of the population or neighborhood. Subsequently, the fitness value of each particle is evaluated, and the personal and global best positions are updated based on Eqs (9) and (10) by comparing the current fitness of each particle with the fitness of the global best position (gbest) [39]. If the number of iterations reaches its maximum value, the extremum of the particle swarm is taken as the optimal solution; otherwise, the swarm continues to iterate the above process until the optimal solution is found:

$P_i^{k+1} = \begin{cases} X_i^{k+1}, & \text{if } \mathrm{fit}(X_i^{k+1}) < \mathrm{fit}(P_i^{k}) \\ P_i^{k}, & \text{otherwise} \end{cases}$ (9)
$P_g^{k+1} = \arg\min_{P_i} \mathrm{fit}(P_i^{k+1})$ (10)

Algorithm 1 PSO Algorithm
procedure PSO
  for each particle i
    Initialize velocity Vi and position Xi for particle i
    Evaluate particle i and set Pbesti = Xi
  end for
  Gbest = min{Pbesti}
  while not stop
    for i = 1 to N
      Update the velocity and position of particle i
      Evaluate particle i
      if fit(Xi) < fit(Pbesti) then Pbesti = Xi
      if fit(Pbesti) < fit(Gbest) then Gbest = Pbesti
    end for
  end while
  print Gbest
end procedure
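As a concrete illustration of Algorithm 1, here is a minimal, self-contained Python sketch of PSO minimizing a generic fitness function. The default parameter values follow the settings reported in Section 5.1; everything else (function and variable names) is an illustrative assumption rather than the paper's exact implementation.

```python
import numpy as np

def pso(fitness, dim, n_particles=20, n_iter=50,
        w=0.8, c1=1.5, c2=1.5, x_bounds=(0, 300), v_bounds=(-2, 2)):
    """Minimize `fitness` with PSO (Eqs 7-10); bounds apply per dimension."""
    rng = np.random.default_rng()
    x = rng.uniform(*x_bounds, size=(n_particles, dim))   # random positions
    v = rng.uniform(*v_bounds, size=(n_particles, dim))   # random velocities
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_fit.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq (7)
        v = np.clip(v, *v_bounds)
        x = np.clip(x + v, *x_bounds)                              # Eq (8)
        fit = np.array([fitness(p) for p in x])
        improved = fit < pbest_fit                                 # Eq (9)
        pbest[improved], pbest_fit[improved] = x[improved], fit[improved]
        gbest = pbest[pbest_fit.argmin()].copy()                   # Eq (10)
    return gbest, pbest_fit.min()
```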
3.3 PSO-LSTM stock price prediction model

The number of optimization iterations and the number of neurons in the hidden layers are used as the particle dimensions. The flowchart of the PSO-LSTM stock price prediction model proposed in this article is shown in Fig 3.

Fig 3. The flowchart of the PSO-LSTM stock price prediction model. https://doi.org/10.1371/journal.pone.0310296.g003

Three LSTM neural networks, with one, two, and three hidden layers, are constructed, and the number of iterations and the numbers of neurons in the different hidden layers are the optimization targets of the model. The experiment preprocesses six stock index datasets, and each dataset is divided into training and testing sets. The root mean square error (RMSE) of the prediction result is used as the fitness value. The pbest and gbest are updated according to Algorithm 1, and the optimal solution is selected as the output. The hyperparameters of the optimal solution are then input into the LSTM model to predict stock prices, and the results of PSO-LSTM stock prediction models with different numbers of hidden layers are compared based on the evaluation indicators.
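To show how the pieces connect, below is a hedged sketch of the fitness function implied by Fig 3: each particle encodes (epochs, neurons per hidden layer), an LSTM is trained with those values, and the RMSE on the held-out data is returned. The Keras layer names are real APIs, but the data variables and the helper itself are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

def make_fitness(X_train, y_train, X_val, y_val, n_hidden_layers=1):
    """Fitness for PSO: particle = (epochs, units_1, ..., units_L).
    Trains an LSTM with those hyperparameters and returns RMSE."""
    def fitness(particle):
        epochs = max(1, int(particle[0]))
        units = [max(1, int(u)) for u in particle[1:1 + n_hidden_layers]]
        model = keras.Sequential()
        for j, u in enumerate(units):
            kwargs = {"input_shape": X_train.shape[1:]} if j == 0 else {}
            model.add(keras.layers.LSTM(
                u, return_sequences=(j < len(units) - 1), **kwargs))
        model.add(keras.layers.Dense(1))          # predicted closing price
        model.compile(optimizer="adam", loss="mse")
        model.fit(X_train, y_train, epochs=epochs, batch_size=64, verbose=0)
        pred = model.predict(X_val, verbose=0).ravel()
        return float(np.sqrt(np.mean((y_val - pred) ** 2)))  # RMSE, Eq (12)
    return fitness
```

Passing this fitness to the pso sketch above would give the end-to-end loop that Fig 3 describes.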
4. Research data and experiment

The six stock indices used to test the performance of the model include the Dow Jones Industrial Average (DJIA) index of the New York Stock Exchange, the Standard & Poor's 500 (S&P 500) index, the Nikkei 225 index of Tokyo, the Hang Seng Index of the Hong Kong market, the CSI300 index of the Chinese Mainland stock market, and the Nifty50 index of India. The data are from the WIND database (http://www.wind.com.cn) provided by Shanghai Wind Information Co., Ltd, the CSMAR database (http://www.gtarsc.com) provided by Shenzhen GTA Education Tech. Ltd., and the global financial portal Investing.com. The time span was from 2008/07/02 to 2016/09/30.

4.1 Input variables

Stock prices are influenced by both macro and micro environments. Therefore, three sets of variables are selected as input variables, as shown in Table 1. The first set of input variables is the historical trading data, which include the open, high, low, and close prices and the trading volume [42]. These raw prices represent fundamental trading information; the details are described as No. 1-5 in Table 1. The second set of input variables consists of 12 technical indicators that capture the moving trends of the stock price [43]; the details are described as No. 6-15 in Table 1. The final set of inputs is the macroeconomic indicators, which include the exchange rate and the interest rate [44].

Table 1. Considered features and their types. https://doi.org/10.1371/journal.pone.0310296.t001

4.2 Correlation analysis

From the perspective of stock trading, the closing price is an important factor in formulating trading strategies. To avoid multicollinearity, the Pearson correlation coefficients between the closing price and the other features are calculated.
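A minimal sketch of this screening step, assuming the numeric features sit in a pandas DataFrame with a "Close" column (the function and variable names are illustrative):

```python
import pandas as pd

def screen_features(df: pd.DataFrame, target: str = "Close",
                    threshold: float = 0.95) -> list[str]:
    """Keep features whose |Pearson r| with the closing price does not
    exceed the threshold, reducing multicollinearity among the inputs."""
    corr = df.corr(method="pearson")[target].drop(target)
    return [col for col in corr.index if abs(corr[col]) <= threshold]
```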
Table 2 shows the Pearson correlation coefficients of DJIA as a sample. SPSS analysis shows that features with correlation coefficients above 0.95 have a significant impact on price fluctuations (** correlation is significant at the 0.01 level, 2-tailed). Therefore, features with higher correlation coefficients are removed.

Table 2. Pearson correlation coefficient of DJIA. https://doi.org/10.1371/journal.pone.0310296.t002

To further demonstrate the relevance, the correlation heatmap of DJIA is illustrated in Fig 4.

Fig 4. DJIA correlation heatmap. https://doi.org/10.1371/journal.pone.0310296.g004

4.3 Data pre-processing

4.3.1 Data denoising. Due to the complexity and high noise of the stock market, data denoising is a necessary means of improving the performance of LSTM models. The wavelet transform (WT) has the ability to analyze the frequency components of financial time series over time, so it can process highly non-stationary financial time series data [45]. The Haar function is used as the wavelet function in this study; it not only decomposes the time series into the time domain and the frequency domain, but also has the advantage of short computation time [46]. The Continuous Wavelet Transform (CWT) extracts features of a time series in the time and scale dimensions, but its coefficients contain a large amount of redundant information and require further dimensionality reduction. Therefore, the Discrete Wavelet Transform (DWT) has gradually become the more common method; it extracts features more effectively by decomposing the time series into sets of orthogonal components. Fig 5 shows the closing price of the S&P500 index for each trading day from July 1, 2008 to October 1, 2016. Fig 5(A) shows the original historical closing prices without noise reduction, which are unstable and noisy. The PyWavelets (pywt) library in Python was used for the DWT. Fig 5(B) shows the closing price of the S&P500 index after three-level wavelet decomposition, with a more stable sequence and less noise.

Fig 5. Comparison of the closing price of the S&P500 index before denoising (a) and after denoising (b). https://doi.org/10.1371/journal.pone.0310296.g005

4.3.2 Data normalization. Three sets of input variables with different dimensions are used in the experiments. Therefore, it is necessary to scale the data into the output range of the activation function. MinMaxScaler [47] in scikit-learn is used to map differently scaled features into the range [–1, 1] based on Eq (11). Because min-max normalization precisely preserves all relationships in the data, it avoids introducing bias [48]:

$x_{norm} = \frac{2(x - x_{min})}{x_{max} - x_{min}} - 1$ (11)

where $x_{norm}$ is the converted value, $x_{max}$ is the maximum value of the sample, and $x_{min}$ is the minimum value of the sample.
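The two preprocessing steps can be sketched as follows, assuming a 1-D closing-price array. pywt and scikit-learn provide the cited tools; the soft-threshold rule used here is a common denoising heuristic and an assumption, since the paper does not state its thresholding choice.

```python
import numpy as np
import pywt
from sklearn.preprocessing import MinMaxScaler

def wavelet_denoise(prices: np.ndarray, wavelet: str = "haar",
                    level: int = 3) -> np.ndarray:
    """Three-level Haar DWT denoising: shrink the detail coefficients
    with a universal soft threshold, then reconstruct the series."""
    coeffs = pywt.wavedec(prices, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise estimate
    thresh = sigma * np.sqrt(2 * np.log(len(prices)))     # universal threshold
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(prices)]

def scale_features(X: np.ndarray) -> np.ndarray:
    """Min-max scale each feature column into [-1, 1], Eq (11)."""
    return MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
```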
4.4 Experimental environment and evaluation indicators

The deep learning framework Keras with a TensorFlow back end [49] is used to construct the prediction models. Tables 3–5 provide the details of the experimental environment.

Table 3. Experimental environment of hardware. https://doi.org/10.1371/journal.pone.0310296.t003
Table 4. Experimental environment of software. https://doi.org/10.1371/journal.pone.0310296.t004
Table 5. The parameters of the LSTM training model. https://doi.org/10.1371/journal.pone.0310296.t005

Six stock indices were used to train and test the predictive models in this experiment. The closing price is used as the predicted value, and the deviation between the predicted and true values is used to measure predictive performance. The evaluation indicators used in this study, shown in Eqs (12)–(15), are the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and goodness of fit (R2) of the model [7–10], where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the actual values, and n is the number of samples. RMSE is defined as

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$ (12)

MAE is given as

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert$ (13)

MAPE is defined as

$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert$ (14)

R2 is defined as

$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$ (15)

The RMSE, MAE, and MAPE measure the deviation between the actual and predicted values: the smaller the value, the closer the predicted values are to the actual values. R2 measures the degree of model fit: the closer it is to 1, the better the model fits the actual prices.
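These four indicators can be computed directly, as in the short NumPy sketch below (variable names chosen for illustration):

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """RMSE, MAE, MAPE (%), and R^2 as defined in Eqs (12)-(15)."""
    err = y_true - y_pred
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / y_true))),
        "R2": float(1.0 - np.sum(err ** 2) / ss_tot),
    }
```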
5. Experimental results of the PSO-LSTM forecasting model

5.1 Experiments of the PSO-LSTM model (parameters manually set)

Firstly, the results of the PSO-optimized LSTM model with one, two, and three hidden layers are evaluated in this experiment. The parameters for initializing PSO are based on Eqs (7) and (8). The swarm size is set to 20. The ranges of the number of hidden neurons and the number of training iterations are both set to [0, 300], so the particle positions range over [0, 300]. The velocity range of the particles is set to [–2, 2]. The maximum number of iterations for PSO is set to 50. The inertia weight w is a major parameter: the larger w is, the stronger the global search ability and the weaker the local search ability. w is set to 0.8 in the experiment. The acceleration constants c1 and c2 are both set to 1.5, indicating that individual and population experience are weighted equally. The first 80% of the data are used as the training set and the last 20% as the testing set. For each experiment, 10 runs are conducted and the average of the results is used for evaluation. In Section 5.1, the impact of manually setting the number of neurons on the experimental results is discussed; because there are too many possible combinations, a representative set of experiments is selected to present the results. In Sections 5.1 and 5.2, a 50-day retrospective period is used to validate the model.
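Under these settings, the experimental loop can be sketched as follows; the run_experiment wrapper below is illustrative and reuses the pso and make_fitness sketches from Section 3, mirroring the 80/20 chronological split and the averaging over 10 runs described above.

```python
import numpy as np

def run_experiment(X, y, n_hidden_layers=1, n_runs=10):
    """Chronological 80/20 split, PSO over (epochs, units per layer),
    averaged over n_runs repetitions; returns the mean RMSE."""
    split = int(0.8 * len(X))                    # first 80% for training
    X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]
    fitness = make_fitness(X_tr, y_tr, X_te, y_te, n_hidden_layers)
    scores = []
    for _ in range(n_runs):                      # 10 repetitions, averaged
        best, rmse = pso(fitness, dim=1 + n_hidden_layers,
                         n_particles=20, n_iter=50, w=0.8, c1=1.5, c2=1.5)
        scores.append(rmse)
    return float(np.mean(scores))
```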
5.1.1 Impact of increasing the number of hidden neurons on experimental results. The prediction errors for different numbers of hidden neurons, with one hidden layer and the training rounds set to 100, are shown in Table 6. It can be seen that increasing the number of hidden neurons has no significant effect on the experimental results.

Table 6. The prediction errors of different hidden neurons (hidden layers = 1, epochs = 100). https://doi.org/10.1371/journal.pone.0310296.t006

5.1.2 The impact of different training rounds on experimental results. Next, the impact of different numbers of training rounds on the LSTM model is considered. The prediction errors for different training rounds, with the number of hidden neurons set to 200 and one hidden layer, are shown in Table 7. As the number of training rounds increases, the model can better extract features from the data set, which reduces the prediction error of the model. (In the tables, the first number of hidden neurons is the number of neurons in the first hidden layer of the LSTM model, the second is the number in the second hidden layer, and so on.) However, if the number of training rounds increases beyond the appropriate range, the trained model may overfit.

Table 7. The prediction errors of different training epochs (hidden neurons = 200, hidden layers = 1). https://doi.org/10.1371/journal.pone.0310296.t007

5.1.3 The impact of different numbers of hidden layers on experimental results. Finally, the effect of using different numbers of hidden layers is considered. It can be seen from Table 8 that increasing the number of LSTM layers affects the results. Adding LSTM layers helps improve the ability of the neural network to extract data features. However, for different datasets, the optimal model parameters can only be obtained by experimenting with an appropriate number of layers.

Table 8. The prediction errors of using different numbers of hidden layers (hidden neurons = 200, epochs = 200). https://doi.org/10.1371/journal.pone.0310296.t008

5.1.4 PSO-LSTM model with optimal parameters. The prediction errors of the LSTM model with the optimal parameters are summarized in Table 9 for the six stock indices representing markets at different development levels. The comparisons between the true and predicted values on the experimental datasets using the LSTM and PSO-LSTM models are shown in Fig 6.

Fig 6. The line charts of real and predicted close price in 6 indices: (a) DJIA, (b) S&P500, (c) HangSeng, (d) Nikkei225, (e) CSI300, and (f) Nifty 50. https://doi.org/10.1371/journal.pone.0310296.g006

Table 9. The prediction errors of the optimal parameters. https://doi.org/10.1371/journal.pone.0310296.t009
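For reference, a hedged sketch of how the one-, two-, and three-hidden-layer LSTM variants compared above can be constructed in Keras (the layer and optimizer names are real Keras APIs; the builder itself is illustrative):

```python
from tensorflow import keras

def build_lstm(input_shape, units_per_layer=(200,), learning_rate=1e-3):
    """Stacked LSTM with one entry in `units_per_layer` per hidden
    layer, e.g. (200,), (200, 200) or (200, 200, 200)."""
    model = keras.Sequential(
        [keras.layers.InputLayer(input_shape=input_shape)])
    for j, units in enumerate(units_per_layer):
        # intermediate layers must emit full sequences for stacking
        model.add(keras.layers.LSTM(
            units, return_sequences=(j < len(units_per_layer) - 1)))
    model.add(keras.layers.Dense(1))             # next-day closing price
    model.compile(optimizer=keras.optimizers.Adam(learning_rate), loss="mse")
    return model
```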
5.2 Experiments of the PSO-LSTM model (PSO automatically optimizes parameters)

In Section 5.1, experiments with manually set parameters were conducted to determine an appropriate neural network model for prediction. In Section 5.2, the hyperparameters of the LSTM neural network are automatically optimized by PSO, again using a 50-day retrospective period to validate the model. The accuracy of the neural network before and after optimization is compared. The data from the six stock indices demonstrate the degree of fit between the predicted values of the PSO-LSTM model and the actual values. The prediction errors and optimal parameters of the LSTM and PSO-LSTM models are listed in Table 10. The changes in fitness (MSE) of the PSO-LSTM model during the evolution process are shown in Fig 7. As the number of iterations of the PSO algorithm increases, the loss gradually decreases. The PSO-LSTM model performs better than the LSTM model for almost all network depths and experimental datasets. The hyperparameter alternatives of the PSO-LSTM model are listed in Table 10; PSO searches this space to find the best hyperparameters. The settings of the algorithm parameter values are shown in Table 11.

Fig 7. Loss change in iteration of 6 stock indices. https://doi.org/10.1371/journal.pone.0310296.g007
Table 10. Hyperparameter alternatives. https://doi.org/10.1371/journal.pone.0310296.t010
Table 11. PSO parameter values. https://doi.org/10.1371/journal.pone.0310296.t011

As mentioned above, PSO is used to optimize the LSTM's parameters. To validate its effectiveness, a comparison of LSTM parameters and errors before and after using the PSO algorithm is given in Table 12. Furthermore, Fig 8 shows box plots of the performance scores of the PSO-LSTM model over 10 replications, and Table 13 lists the performance scores of the PSO-LSTM models on the test data.

Fig 8. Box plots of the performance scores of PSO-LSTM model with 10 replications. https://doi.org/10.1371/journal.pone.0310296.g008
Table 12. Comparison of LSTM parameters and errors before and after using PSO algorithm. https://doi.org/10.1371/journal.pone.0310296.t012
Table 13. The performance scores of the PSO-LSTM models in the test data. https://doi.org/10.1371/journal.pone.0310296.t013

5.3 Experiments with other well-known forecasting methods

5.3.1 Comparison of machine learning models. There are many other classic algorithms for time series prediction. In this section, the PSO-LSTM model proposed in this article is compared with several popular methods adopted by researchers in recent years. In addition, the performance of the model under different prediction cycles can be evaluated more comprehensively by conducting experiments with retrospective periods of different lengths. In Section 5.3, a 50-day retrospective period is used to validate the model; in Section 5.4, 20-day and 7-day lookback periods are used. XGBoost combines multiple weak learners (decision trees) into a strong learner (a boosted tree) through weighted results, and its parallelized implementation makes it one of the most efficient and high-performance algorithms in the engineering field [50]. XGBoost is often combined with deep learning models [48] or intelligent optimization algorithms [51, 52] for time series prediction. RF is an ensemble classifier based on Bagging proposed by Breiman [53]. RF uses the best classification result from all decision trees as the final result and has been widely used in the field of stock prediction [54]. The K-Nearest Neighbor (KNN) algorithm is an accurate classification method that classifies samples according to their nearest neighbors in feature space [55]. KNN has been successfully applied in financial time series forecasting [56].
The Support Vector Machine (SVM) is a machine learning algorithm [57]. SVM is effective for regression and classification problems with multivariate features, which makes it applicable to multivariate stock price prediction. The performance of support vector machines depends on the choice of kernel function, so in practical problems it is very important to choose a suitable kernel function when building an SVM on real data. Support vector machines have been widely used in time series prediction [58–60]. The multi-layer perceptron (MLP) is a feedforward artificial neural network model. The MLP neural network simplifies the structure of biological neurons to obtain the basic structure of a neural network and is widely used for time series prediction [61, 62]. The bidirectional long short-term memory (Bi-LSTM) model is an LSTM-based method that processes the input sequence in two directions, forward and backward. It is a special variant of the recurrent neural network that retains the advantages of LSTM in processing long-term correlated sequences while compensating for LSTM's inability to use future contextual information for prediction. This method has achieved good results in interactive prediction [63–65]. For SVM, different kernel functions were tested and the best results were obtained with the RBF kernel; therefore, the gamma parameter of the RBF kernel needed to be tuned, and the optimal performance was obtained with gamma = 0.01. For MLP, there is no fixed rule for selecting the number of hidden layers and the optimal number of neurons per hidden layer; to compare the performance of MLP and PSO-LSTM, the number of neurons, number of hidden layers, learning rate, number of training rounds, and other parameters of MLP were configured to be similar to those of PSO-LSTM. For KNN, the best value of k was found to be 5 through testing, and the best values for the different datasets are listed in the table. In all experiments, the first 80% of the data were used as the training set and the last 20% as the testing set. This process was repeated 10 times, and the average values over all runs are recorded as the final results. This technique is used to estimate the accuracy of prediction models in practical applications and to avoid overfitting problems. From the experimental results on the six datasets listed in Table 14, it can be seen that PSO-LSTM achieved the best performance on the HangSeng, Nikkei225, and Nifty50 datasets and ranks in the top three in predictive performance on the DJIA, S&P500, and CSI300 datasets. Compared with the seven other machine learning models, PSO-LSTM achieved excellent predictive performance.

Table 14. Predicting errors of 50 days look back of different methods. https://doi.org/10.1371/journal.pone.0310296.t014

Compared with the other machine learning algorithms, the p-value of the sign test for the algorithm proposed in this article is 0.0078125, and the p-value of the Wilcoxon signed-rank test is 0.005411. The difference between the proposed PSO-LSTM algorithm and the other traditional methods is significant at the 0.05 level (95% confidence), demonstrating the significant advantage of the proposed algorithm.
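These paired significance tests can be reproduced with scipy, as in the hedged sketch below; the error arrays are placeholders for the per-dataset results, and the helper name is illustrative.

```python
import numpy as np
from scipy import stats

def compare_models(errors_pso_lstm: np.ndarray, errors_baseline: np.ndarray):
    """Paired significance tests over matched experiments: an exact
    sign test via the binomial distribution, plus the Wilcoxon
    signed-rank test."""
    diff = errors_baseline - errors_pso_lstm
    wins = int(np.sum(diff > 0))        # cases where PSO-LSTM has lower error
    n = int(np.sum(diff != 0))          # ties are discarded in a sign test
    sign_p = stats.binomtest(wins, n, p=0.5).pvalue
    wilcoxon_p = stats.wilcoxon(errors_pso_lstm, errors_baseline).pvalue
    return sign_p, wilcoxon_p
```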
5.1 Experiments of PSO-LSTM model (Parameters manually set)

Firstly, the results of the PSO-optimized LSTM model with one, two, and three hidden layers are evaluated in this experiment. The parameters for initializing PSO are based on Eqs (7) and (8). The swarm size is set to 20. The search ranges for the number of hidden neurons and the number of training iterations are both set to [0, 300], so particle positions lie in [0, 300]. The particle velocity range is set to [–2, 2]. The maximum number of PSO iterations is set to 50. The inertia weight w is a key parameter: the larger w is, the stronger the global search ability and the weaker the local search ability. w is set to 0.8 in the experiment. The acceleration constants c1 and c2 are both set to 1.5, meaning that the individual and population terms are weighted equally. The first 80% of the data are used as the training set and the last 20% as the testing set. For each experiment, 10 runs are conducted and the average of the results is reported. In Section 5.1.1, the impact of manually setting the number of neurons on the experimental results is discussed. Because there are too many possible combinations, a representative set of experiments is selected to present the results. In Sections 5.1 and 5.2, a 50-day retrospective period is used to validate the model.
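For illustration, the particle update used in this configuration can be sketched in Python. This is a minimal sketch assuming Eqs (7) and (8) take the canonical PSO form; the fitness function below is a placeholder standing in for the LSTM validation error, not the authors' implementation.

import numpy as np

# Illustrative sketch only: standard PSO position/velocity updates with the
# parameter values stated above (w = 0.8, c1 = c2 = 1.5, swarm size 20).
# Positions encode [number of hidden neurons, number of training iterations].
rng = np.random.default_rng(0)
n_particles, n_dims, n_iters = 20, 2, 50
pos = rng.uniform(0, 300, (n_particles, n_dims))    # position range [0, 300]
vel = rng.uniform(-2, 2, (n_particles, n_dims))     # velocity range [-2, 2]
pbest, pbest_fit = pos.copy(), np.full(n_particles, np.inf)
gbest, gbest_fit = pos[0].copy(), np.inf

def fitness(p):
    # Placeholder objective; in the actual model this would be the
    # validation MSE of an LSTM trained with the hyperparameters in p.
    return np.sum((p - 150.0) ** 2)

w, c1, c2 = 0.8, 1.5, 1.5
for _ in range(n_iters):
    for i in range(n_particles):
        f = fitness(pos[i])
        if f < pbest_fit[i]:
            pbest_fit[i], pbest[i] = f, pos[i].copy()
        if f < gbest_fit:
            gbest_fit, gbest = f, pos[i].copy()
    r1 = rng.random((n_particles, n_dims))
    r2 = rng.random((n_particles, n_dims))
    vel = np.clip(w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos), -2, 2)
    pos = np.clip(pos + vel, 0, 300)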
5.1.1 Impact of increasing the number of hidden neurons on experimental results.

The prediction errors for different numbers of hidden neurons, when one hidden layer is used and the number of training epochs is set to 100, are shown in Table 6. It can be seen that increasing the number of hidden neurons has no significant effect on the experimental results.

Table 6. The prediction errors of different hidden neurons (hidden layers = 1, epochs = 100). https://doi.org/10.1371/journal.pone.0310296.t006

5.1.2 The impact of different training rounds on experimental results.

Next, the impact of different numbers of training rounds in the LSTM model is considered. The prediction errors for different training rounds, when the numbers of hidden neurons and hidden layers are set to 200 and 1 respectively, are shown in Table 7. As the number of training rounds increases, the model extracts features from the data set more effectively, which reduces the prediction error. (In the tables, the first number of hidden neurons denotes the neurons in the first hidden layer of the LSTM model, the second number denotes those in the second hidden layer, and so on.) However, if the number of training rounds increases beyond the appropriate range, the trained model may overfit.

Table 7. The prediction errors of different training epochs (hidden neurons = 200, hidden layers = 1). https://doi.org/10.1371/journal.pone.0310296.t007

5.1.3 The impact of different numbers of hidden layers on experimental results.

Finally, the effect of using different numbers of hidden layers is considered. It can be seen from Table 8 that increasing the number of LSTM layers affects the results. Adding LSTM layers helps improve the ability of the neural network to extract data features. However, for different datasets, the optimal model parameters can only be obtained by experimenting with an appropriate number of layers.

Table 8. The prediction errors of using different numbers of hidden layers (hidden neurons = 200, epochs = 200). https://doi.org/10.1371/journal.pone.0310296.t008

5.1.4 PSO-LSTM model with optimal parameters.

The prediction errors of the optimal parameters of the LSTM model are summarized in Table 9 for the six stock indices representing markets at different development levels. The comparisons between the true and predicted values of the experimental datasets using the LSTM and PSO-LSTM models are shown in Fig 6.

Fig 6. The line charts of real and predicted close price in 6 indices: (a) DJIA, (b) S&P500, (c) HangSeng, (d) Nikkei225, (e) CSI300, and (f) Nifty 50. https://doi.org/10.1371/journal.pone.0310296.g006

Table 9. The prediction errors of the optimal parameters. https://doi.org/10.1371/journal.pone.0310296.t009
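To make the manual configurations above concrete, the LSTM variants compared in Tables 6–8 can be sketched as follows, assuming a Keras-style API; the helper name build_lstm and the dummy data are illustrative, not the authors' code.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(n_layers=1, n_neurons=200, lookback=50, n_features=1):
    # Stack LSTM layers; all but the last return full sequences so that
    # deeper layers still receive a time dimension.
    model = keras.Sequential()
    model.add(keras.Input(shape=(lookback, n_features)))
    for i in range(n_layers):
        model.add(layers.LSTM(n_neurons, return_sequences=(i < n_layers - 1)))
    model.add(layers.Dense(1))  # regression head: next-day closing price
    model.compile(optimizer="adam", loss="mse")
    return model

# Example: the configuration of Table 7 (one layer, 200 neurons), trained
# for a chosen number of epochs on dummy data shaped like the real inputs.
X = np.random.rand(500, 50, 1).astype("float32")
y = np.random.rand(500, 1).astype("float32")
model = build_lstm(n_layers=1, n_neurons=200)
model.fit(X, y, epochs=100, batch_size=32, verbose=0)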
5.2 Experiments of PSO-LSTM model (PSO automatically optimizes parameters)

In Section 5.1, experiments with manually set parameters were conducted to determine an appropriate neural network model for prediction. In Section 5.2, the hyperparameters of the LSTM neural network are optimized automatically by PSO, again using a 50-day retrospective period to validate the model. The accuracy of the neural network before and after optimization is compared. The results on the six stock indices demonstrate the degree of fit between the predicted values of the PSO-LSTM model and the actual values.

The prediction errors and optimal parameters of the LSTM and PSO-LSTM models are listed in Table 10. The changes in fitness (MSE) of the PSO-LSTM model during the evolution process are shown in Fig 7. As the number of PSO iterations increases, the loss gradually decreases. The PSO-LSTM model performs better than the LSTM model for almost all network depths and experimental datasets. The candidate hyperparameter values of the PSO-LSTM model are listed in Table 10; PSO searches this space to find the best hyperparameter combination. The settings of the algorithm parameter values are shown in Table 11.
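For concreteness, the fitness evaluation that drives the curves in Fig 7 can be outlined as follows. This sketch assumes the build_lstm helper from the earlier sketch and uses validation MSE as the particle's fitness; it is illustrative rather than the exact implementation.

from sklearn.metrics import mean_squared_error

def lstm_fitness(particle, X_train, y_train, X_val, y_val):
    # Decode the particle: round continuous positions to usable integers.
    n_neurons = max(1, int(round(particle[0])))
    n_epochs = max(1, int(round(particle[1])))
    model = build_lstm(n_layers=1, n_neurons=n_neurons)
    model.fit(X_train, y_train, epochs=n_epochs, batch_size=32, verbose=0)
    # Lower MSE means better fitness; PSO minimizes this value (cf. Fig 7).
    return mean_squared_error(y_val, model.predict(X_val, verbose=0))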
Fig 7. Loss change in iteration of 6 stock indices. https://doi.org/10.1371/journal.pone.0310296.g007

Table 10. Hyperparameter alternatives. https://doi.org/10.1371/journal.pone.0310296.t010

Table 11. PSO parameter values. https://doi.org/10.1371/journal.pone.0310296.t011

As mentioned above, PSO is used to optimize the LSTM's hyperparameters. To validate its effectiveness, a comparison of LSTM parameters and errors before and after applying the PSO algorithm is presented in Table 12. Furthermore, Fig 8 shows box plots of the performance scores of the PSO-LSTM model over 10 replications. Table 13 reports the performance scores of the PSO-LSTM models on the test data.

Fig 8. Box plots of the performance scores of PSO-LSTM model with 10 replications. https://doi.org/10.1371/journal.pone.0310296.g008

Table 12. Comparison of LSTM parameters and errors before and after using PSO algorithm. https://doi.org/10.1371/journal.pone.0310296.t012

Table 13. The performance scores of the PSO-LSTM models in the test data. https://doi.org/10.1371/journal.pone.0310296.t013

5.3 Experiments of other well-known forecasting methods

5.3.1 Comparison of machine learning models.

There are many other classic algorithms for time series prediction. In this section, the PSO-LSTM model proposed in this article is compared with several popular methods adopted by researchers in recent years. In addition, conducting experiments with retrospective periods of different lengths allows the model's performance under different prediction cycles to be evaluated more comprehensively. In Section 5.3, a 50-day retrospective period is used to validate the model; in Section 5.4, 20-day and 7-day lookback periods are used.

XGBoost combines multiple weak learners (decision trees) into a strong learner (a boosted tree) by weighting their results and supports parallel computation, making it one of the most efficient and high-performing algorithms in the engineering field [50]. XGBoost is often combined with deep learning models [48] or intelligent optimization algorithms [51, 52] for time series prediction. RF is an ensemble classifier based on Bagging, proposed by Breiman [53]. RF aggregates the predictions of all its decision trees (by voting or averaging) to produce the final result and has been widely used in the field of stock prediction [54]. The K-Nearest Neighbor (KNN) algorithm is a classification method that assigns an output based on the closest data points in the feature space [55]. KNN has been successfully applied in financial time series forecasting [56]. Support Vector Machine (SVM) is a kernel-based machine learning algorithm [57]. SVM is effective for regression and classification problems with multivariate features and is therefore also applicable to multivariate stock price prediction. The performance of an SVM depends on the choice of kernel function, so in practical problems it is important to select a kernel suited to the data; support vector machines have been widely used in time series prediction [58–60]. The multi-layer perceptron (MLP) is a feedforward artificial neural network model. The MLP abstracts the structure of biological neurons into the basic building block of the network and is widely used for time series prediction [61, 62]. The bidirectional long short-term memory (Bi-LSTM) model processes the input sequence in both the forward and backward directions. It is a special variant of the recurrent neural network that retains LSTM's strength in modeling long-term dependencies while compensating for LSTM's inability to use future context for prediction. This method has achieved good results in prediction tasks [63–65].
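For reference, these baselines can be instantiated as follows, assuming scikit-learn and the xgboost package; the hyperparameter values mirror the settings reported below (k = 5 for KNN, RBF kernel with gamma = 0.01 for SVM) and are otherwise illustrative defaults, not the authors' exact configuration.

from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

baselines = {
    "XGBoost": XGBRegressor(n_estimators=100),
    "RF": RandomForestRegressor(n_estimators=100),
    "KNN": KNeighborsRegressor(n_neighbors=5),      # best k reported below
    "SVM": SVR(kernel="rbf", gamma=0.01),           # optimal gamma reported below
    "MLP": MLPRegressor(hidden_layer_sizes=(200,), max_iter=200),
}
# Each baseline is fit on flattened lookback windows, e.g.:
# model.fit(X_train.reshape(len(X_train), -1), y_train.ravel())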
For SVM, different kernel functions were tested and the best results were obtained with the RBF kernel; accordingly, the gamma parameter of the RBF kernel needs to be tuned, and the best performance was obtained at gamma = 0.01. For MLP, there is no fixed rule for selecting the number of hidden layers or the optimal number of neurons per layer; to enable a fair comparison with PSO-LSTM, the number of neurons, number of hidden layers, learning rate, number of training rounds, and other MLP parameters were configured to match the PSO-LSTM settings. For KNN, the best value of k was found to be 5 through testing, and the best values for the different datasets are listed in the table.

In all experiments, the first 80% of the data were used as the training set and the last 20% as the testing set. This process was repeated 10 times, and the average over all runs is reported as the final result. This procedure estimates the accuracy of the prediction models in practical applications and guards against overfitting.

From the experimental results on the six datasets listed in Table 14, PSO-LSTM achieved the best performance on the HangSeng, Nikkei225, and Nifty50 datasets, and ranks in the top three in predictive performance on the DJIA, S&P500, and CSI300 datasets. Compared with the seven other machine learning models, PSO-LSTM achieved excellent predictive performance.

Table 14. Predicting errors of 50 days look back of different methods. https://doi.org/10.1371/journal.pone.0310296.t014

Compared with the other machine learning algorithms, the sign test for the algorithm proposed in this article yields a p-value of 0.0078125, and the Wilcoxon signed-rank test yields a p-value of 0.005411. The difference between the proposed PSO-LSTM algorithm and the other traditional methods is therefore statistically significant at the 0.05 level (95% confidence), demonstrating the advantage of the proposed algorithm.

5.3.2 Comparison of state-of-the-art models.

The comparative experimental results between PSO-LSTM and other state-of-the-art (SOTA) models are shown in Table 15. The comparative experiments cover two situations: using different datasets, and using the same dataset over different time periods. The main contribution of this article is to propose a model for optimizing the structure of the neural network.

Table 15. Comparison of SOTA models. https://doi.org/10.1371/journal.pone.0310296.t015
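The significance analysis in Section 5.3.1 can be reproduced along the following lines, assuming scipy; the error arrays here are hypothetical placeholders standing in for the paired per-method errors underlying Table 14.

import numpy as np
from scipy.stats import binomtest, wilcoxon

# Hypothetical paired errors: one entry per comparison against a baseline.
pso_errors = np.array([0.011, 0.013, 0.010, 0.012, 0.014, 0.011, 0.013, 0.012])
other_errors = np.array([0.015, 0.018, 0.014, 0.016, 0.019, 0.015, 0.017, 0.016])

# Sign test: count how often PSO-LSTM beats the competing method.
wins = int(np.sum(pso_errors < other_errors))
sign_p = binomtest(wins, n=len(pso_errors), p=0.5).pvalue

# Wilcoxon signed-rank test on the paired error differences.
wil_p = wilcoxon(pso_errors, other_errors).pvalue
print(f"sign test p = {sign_p:.6f}, Wilcoxon p = {wil_p:.6f}")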
5.4 Experiments on different lookback periods (20- and 7-day lookback periods)

In Sections 5.1 and 5.2, the experiments were conducted with a 50-day retrospective period. In Section 5.4, performance under different lookback periods is investigated further. The change at the current time point is predicted by analyzing the lookback-period data of the S&P500 dataset. It can be seen that there is no significant difference among predictions using the 50-day (S&P500 results in Table 14, Section 5.3), 20-day (Table 17), and 7-day (Table 16) retrospective periods.

Table 16. Predicting errors of seven days look back in S&P500. https://doi.org/10.1371/journal.pone.0310296.t016

Table 17. Predicting errors of twenty days look back in S&P500. https://doi.org/10.1371/journal.pone.0310296.t017

As shown in the tables, the proposed model consistently outperforms the alternative machine learning models across the various retrospective periods (7, 20, and 50 days), with at least a 25% improvement in prediction accuracy. These findings underscore the robustness and applicability of the PSO-LSTM model, particularly in high-frequency trading scenarios. The lookback windows themselves are built as sliding windows over the closing-price series, as sketched below.
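A minimal sketch of this windowing step follows, assuming a one-dimensional array of closing prices; the helper make_windows and the synthetic series are hypothetical illustrations.

import numpy as np

def make_windows(prices, lookback):
    # Build (lookback -> next-day) supervised pairs from a 1-D price series.
    X = np.stack([prices[i:i + lookback] for i in range(len(prices) - lookback)])
    y = prices[lookback:]
    return X[..., np.newaxis], y  # add a feature axis for the LSTM input

prices = np.cumsum(np.random.randn(1000)) + 100.0  # synthetic price path
for lb in (50, 20, 7):
    X, y = make_windows(prices, lb)
    print(lb, X.shape, y.shape)  # e.g., 50 (950, 50, 1) (950,)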
6. Conclusion

The proposed PSO-LSTM model offers a novel approach to the intricate challenge of stock price prediction. Although LSTM neural networks have gained prominence for their adeptness in handling financial time series data, their efficacy in practical scenarios is hampered by the intricate process of parameter optimization. This study introduces a hybrid model that integrates LSTM with PSO to mitigate this limitation and enhance predictive accuracy. The PSO-LSTM model's adaptability is particularly noteworthy: it efficiently determines optimal parameters by leveraging the problem-solving capabilities of the PSO algorithm, and by swiftly identifying parameter combinations aligned with the data's characteristics, it reduces computational overhead and bolsters predictive performance. Empirical evaluations underscore the efficacy of the proposed approach.

Manual exploration of key parameters, including the number of hidden neurons and layers, indicates their nuanced impact on model performance. Furthermore, automated optimization via PSO demonstrates tangible improvements in predictive accuracy and lower prediction errors. Comparative analysis with traditional LSTM models and a spectrum of machine learning algorithms reaffirms the superiority of the PSO-LSTM framework. Notably, it emerges as a top performer across diverse datasets, underscoring its robust predictive capability. Moreover, the model's resilience is evidenced through retrospective validation across varying timeframes, with consistent predictive accuracy maintained over 50-, 20-, and 7-day intervals. In essence, the proposed PSO-LSTM model represents a significant advance in stock market prediction methodology, offering a potent combination of LSTM's temporal modeling prowess and PSO's optimization finesse.

The reasons the model produces better results are as follows. Firstly, LSTM can capture the long- and short-term dependencies in financial time series, allowing it to learn and predict complex patterns and trends in the data. Secondly, the PSO algorithm is used to optimize the parameters of the LSTM model: PSO maintains a good balance between global and local search, avoiding the problem of falling into local optima, and can find parameter combinations close to the global optimum in a short time. These advantages enable the PSO-LSTM model to be trained with an optimal parameter configuration, improving predictive performance.

During the training process, regularization techniques such as dropout and L2 regularization are employed to prevent overfitting, the situation in which a model performs well on training data but poorly on test data. Regularization preserves the generalization ability of the model, making its predictions more accurate on unseen data.
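As an illustration, such a regularized configuration can be sketched with a Keras-style layer stack; the dropout rate and L2 penalty here are placeholders, not the values used in the study.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Illustrative LSTM with dropout and an L2 weight penalty to curb overfitting.
model = keras.Sequential([
    keras.Input(shape=(50, 1)),
    layers.LSTM(200, kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.2),  # randomly silences 20% of units during training
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")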
In addition, data normalization, denoising, and feature selection are conducted before model training. These steps ensure the quality of the input data and enable the model to learn and capture the important information in it, improving prediction performance. Finally, a multi-level evaluation of the model's performance is conducted, and the model is also tested under different market conditions, verifying its adaptability and robustness in different scenarios.

However, the solution proposed in this article still leaves room for improvement. The proposed algorithm treats stock prediction as a single-objective optimization problem aimed at maximizing accuracy, whereas practical applications often also require the model to be retrained efficiently in response to market changes. In addition, the neural network operates as a "black box" that sacrifices the interpretability of the original features; an automatic and effective feature extraction process that reduces the dimensionality of the feature space and maps transformed features into a new low-dimensional space is a promising direction. In future work, data completeness will be enhanced by incorporating natural language analysis of financial news. The latest models, such as GANs and pre-trained models, are also worth exploring for stock price prediction. Additionally, the LSTM neural network will be refined by exploring additional parameters, and alternative evolutionary algorithms for hyperparameter optimization will be investigated. Future values of stocks can then be predicted with low error, and a trading system built on the proposed model can give successful buy/sell/hold suggestions.

Supporting information

S1 File. https://doi.org/10.1371/journal.pone.0310296.s001 (ZIP)

TI - Enhancing stock index prediction: A hybrid LSTM-PSO model for improved forecasting accuracy
JF - PLoS ONE
DO - 10.1371/journal.pone.0310296
DA - 2025-01-14
UR - https://www.deepdyve.com/lp/public-library-of-science-plos-journal/enhancing-stock-index-prediction-a-hybrid-lstm-pso-model-for-improved-SE2SjiTMvI
SP - e0310296
VL - 20
IS - 1
DP - DeepDyve
ER -