# A Random Forest Test For Jumps in Stock Markets Using R

In the previous article we looked at how one can use Neural Networks to detect jumps present in returns of a particular stock. In this blog post, we build on the thinking established in the previous article and use a Random Forest to detect jumps present in stock market returns.

I have built an interactive web application that lets the user select the share they want to test for jumps and displays the results of the jump test. Feel free to play around with the web app; any feedback on it will be highly appreciated. Note that the web app may take time to load because the Random Forest is trained first.

A Random Forest is an ensemble of Decision Trees. Like a Decision Tree, a Random Forest can be used for both classification and regression. Individual Decision Trees tend to be weak learners because they tend to over-fit the training data set. That is, Decision Trees do not generalize well.

The Random Forest combines weak learners (the Decision Trees) into a strong learner by aggregating the results from each of the Decision Trees. A Random Forest grows many classification trees. To classify a new observation, one passes the observation down each of the trees in the forest. Each individual tree gives a classification of the observation; in effect, the tree "votes" for that class. The forest then chooses the classification having the most votes (over all the trees in the forest).
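The voting step can be illustrated with a small base R sketch. The votes below are made up for illustration and do not come from a fitted forest; `majority_vote` is a hypothetical helper name.

```r
# Hypothetical votes from five trees (columns) for three observations (rows).
votes <- matrix(
  c("jump",    "jump",    "no_jump", "jump", "no_jump",
    "no_jump", "no_jump", "no_jump", "jump", "no_jump",
    "jump",    "jump",    "jump",    "jump", "no_jump"),
  nrow = 3, byrow = TRUE
)

# Majority vote: the class with the most votes among the trees wins.
majority_vote <- function(tree_votes) {
  counts <- table(tree_votes)
  names(counts)[which.max(counts)]
}

# Apply the vote row by row, giving one class per observation.
forest_prediction <- apply(votes, 1, majority_vote)
print(forest_prediction)
```

With an odd number of trees and two classes there can be no ties; with an even number a tie-breaking rule would be needed, which this sketch does not handle beyond taking the first class.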

Each tree in the Random Forest is grown as follows:

• If the sample size is N, then draw N observations with replacement (a bootstrap sample) from this sample. The resulting data is the data that will be used to grow the tree.
• Assume that there are P input variables (features), and fix a number p<<P. At each node of the tree, p of the P variables are selected at random, and the best split on these p variables (e.g. using information gain) is used to split the node.
• All the individual trees are grown to the largest extent possible. There is no pruning when growing a Random Forest.
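The two sources of randomness described above (the bootstrap sample and the random subset of variables at a node) can be sketched in base R. The data, the feature names `f1` to `f8`, and the choice p = floor(sqrt(P)) are assumptions made for illustration only.

```r
set.seed(1)

# Placeholder data: N observations of P made-up features.
N <- 100
P <- 8
features <- matrix(rnorm(N * P), nrow = N,
                   dimnames = list(NULL, paste0("f", 1:P)))

# Step 1: bootstrap sample, i.e. N row indices drawn with replacement.
boot_idx  <- sample(N, size = N, replace = TRUE)
boot_data <- features[boot_idx, , drop = FALSE]

# Step 2: at a given node, pick p of the P features at random
# (without replacement); only these are searched for the best split.
p <- floor(sqrt(P))
candidate_features <- sample(colnames(features), size = p)

print(dim(boot_data))
print(candidate_features)
```

Step 2 is repeated independently at every node, so different nodes of the same tree generally consider different candidate features.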

In this blog post we will focus on using a Random Forest to decide whether or not a set of returns from a share contains jumps. We will make use of the "randomForest" package in R.
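As a rough sketch of how a classification forest might be fitted with that package, the example below uses two placeholder features and a made-up labelling rule; it is not the data or the settings used for the results in this post, and it assumes the randomForest package is installed.

```r
library(randomForest)

set.seed(123)

# Placeholder training data: two made-up features and a two-class label.
n <- 200
train_x <- data.frame(f1 = rnorm(n), f2 = rnorm(n))
train_y <- factor(ifelse(train_x$f1 + train_x$f2 > 0, "jump", "no_jump"))

# Grow 100 classification trees; mtry is the number of variables
# tried at each split (an illustrative choice here).
fit <- randomForest(x = train_x, y = train_y, ntree = 100, mtry = 1)

# Classify new observations by majority vote across the trees.
new_x <- data.frame(f1 = c(2, -2), f2 = c(2, -2))
pred  <- predict(fit, newdata = new_x)
print(pred)
```

Because the response is a factor, the package fits a classification forest rather than a regression forest.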

# Feature Selection For the Random Forest

As in the previous article, the features/inputs that we will use to grow the Random Forest are: the mean and the second central moment, skewness, kurtosis, and the fifth, sixth, seventh and eighth central moments. All of the moments used are sample moments. Let $X$ be a series of $n$ log returns. The moment inputs are then given by:

$m_1= \frac{\sum X}{n}$

$m_2= \frac{\sum (X-m_1)^2}{n}$

$skewness= \frac{\frac{\sum (X-m_1)^3}{n}}{m_2^{3/2}}$

$kurtosis= \frac{\frac{\sum (X-m_1)^4}{n}}{m_2^{2}}$

$m_5= \frac{\sum (X-m_1)^5}{n}$

$m_6= \frac{\sum (X-m_1)^6}{n}$

$m_7= \frac{\sum (X-m_1)^7}{n}$

$m_8= \frac{\sum (X-m_1)^8}{n}.$
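A minimal base R sketch of these eight features follows, taking the k-th central sample moment to be $\frac{1}{n}\sum (X-m_1)^k$. The function name `moment_features` is hypothetical and the input series is a toy example, not market data.

```r
# Compute the eight moment-based features for a series of log returns x.
moment_features <- function(x) {
  n  <- length(x)
  m1 <- sum(x) / n                               # sample mean
  central <- function(k) sum((x - m1)^k) / n     # k-th central sample moment
  m2 <- central(2)
  c(m1 = m1,
    m2 = m2,
    skewness = central(3) / m2^(3 / 2),
    kurtosis = central(4) / m2^2,
    m5 = central(5),
    m6 = central(6),
    m7 = central(7),
    m8 = central(8))
}

# Toy series: symmetric around zero, so the odd moments vanish.
example_features <- moment_features(c(-1, 0, 1))
print(example_features)
```

Each simulated or observed return series would be reduced to one such vector of eight numbers, which then forms one row of the input to the Random Forest.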

The reasons for choosing these moments as features are given in the article on how to use Neural Networks to test for jumps.

The Random Forest was built using the randomForest package in R. An example of a tree in the Random Forest is displayed below:

# Simulation Study For The Random Forest

In order to assess the performance of the Random Forest in detecting jumps, we performed a simulation study. The simulation study involved simulating returns from a GBM (geometric Brownian motion) process (which has no jumps) and returns from a Merton Jump diffusion process (which has jumps). Thus the results of our test will only be reliable if the returns observed in the market are generated by one of these two models. This is admittedly restrictive.
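A simplified sketch of how daily log returns from the two models could be simulated in base R is given below. The function names and all parameter values are hypothetical, jump sizes are assumed to be normally distributed in log space, and no drift compensation for the jumps is applied.

```r
# GBM log returns over a step dt: (mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z.
simulate_gbm_returns <- function(n, mu, sigma, dt = 1 / 252) {
  (mu - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * rnorm(n)
}

# Merton jump diffusion log returns: the GBM part plus a compound Poisson part.
simulate_merton_returns <- function(n, mu, sigma, lambda,
                                    jump_mean, jump_sd, dt = 1 / 252) {
  diffusion <- simulate_gbm_returns(n, mu, sigma, dt)
  n_jumps   <- rpois(n, lambda * dt)   # number of jumps in each step
  # The sum of k independent N(jump_mean, jump_sd^2) jumps is
  # N(k * jump_mean, k * jump_sd^2); it is exactly zero when k = 0.
  jumps <- n_jumps * jump_mean + sqrt(n_jumps) * jump_sd * rnorm(n)
  diffusion + jumps
}

set.seed(42)
gbm_returns    <- simulate_gbm_returns(5000, mu = 0.08, sigma = 0.2)
merton_returns <- simulate_merton_returns(5000, mu = 0.08, sigma = 0.2,
                                          lambda = 25, jump_mean = -0.02,
                                          jump_sd = 0.05)
```

Repeating this for many parameter combinations and labelling each simulated series by the model that produced it gives a labelled training set for the classifier.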

Based on the results of the simulation, we worked out the Probability of ACTUAL detection (that is, the probability of the test being able to detect jumps in a series that has jumps) and the probability of FALSE detection (the test incorrectly detecting jumps in a series of returns that doesn't have jumps) of the Random Forest Jump Test. The simulation was conducted at a daily frequency, using different combinations of the GBM and Jump diffusion model parameters. A more rigorous comparison would have been to compare the Random Forest and Neural Network tests at different frequencies, and for large and small jumps.
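The two detection probabilities can be estimated from a set of labelled test series as in the sketch below. The labels and predictions are made up for illustration and are not the results of the simulation study.

```r
# Hypothetical test set: 1 = series has jumps, 0 = series has no jumps.
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 1, 0)

# Probability of ACTUAL detection: share of jump series flagged as having jumps.
actual_detection <- mean(predicted[actual == 1] == 1)

# Probability of FALSE detection: share of no-jump series flagged as having jumps.
false_detection <- mean(predicted[actual == 0] == 1)

print(c(actual = actual_detection, false = false_detection))
```

In this toy example three of the four jump series are detected and one of the four no-jump series is wrongly flagged.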

We found that the predictive accuracy of the Random Forest test (using 5 000 observations) was as follows:

• Probability of ACTUAL detection : 100.0% (This means that the jump test model is 100% accurate assuming that the returns are generated either by a GBM, or by a Merton Jump diffusion model)
• Probability of FALSE detection : 0.0%

This suggests that a Random Forest approach to detecting jumps is better than the Neural Network approach discussed in the previous article. An ensemble of the Random Forest and the Neural Network would likely produce even better results (for a smaller training data set).

The web application can be used to see which shares (on your local stock exchange) have jumps according to the formulation in this blog post. Note that the Random Forest in the web app is trained with only 500 observations to avoid it being too slow to load.

This section has the R code used for this blog post.