How I use stock correlation to predict stock price

Matthew Leung
3 min read · Sep 5, 2020

Last time, I used historical prices to predict the future price with a Conv1D/LSTM model, but the outcome was not very good. Although the Mean Absolute Error (MAE) was small, it is not a good indicator. The gap between 1.4 and 1.0 is the same size as the gap between 0.2 and -0.2, so both count equally toward the MAE. In terms of investment, though, the former is only a difference in the size of a gain, while the latter is a loss predicted as a gain, which is a far bigger problem. So I used the accuracy of the rise/drop prediction as the indicator instead, and found that the model was only about 50% accurate, which means it is no better than tossing a coin.
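To make the difference between the two indicators concrete, here is a minimal sketch in plain Python. The function names and the two one-element examples are illustrative assumptions, not code from the original model; the numbers are the ones used in the paragraph above, with the first value of each pair treated as the actual change and the second as the prediction.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, ignoring its direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def direction_accuracy(actual, predicted):
    """Fraction of cases where the predicted change has the same sign
    (rise vs. drop) as the actual change."""
    hits = sum((a > 0) == (p > 0) for a, p in zip(actual, predicted))
    return hits / len(actual)


# Case A: a gain of 1.0 predicted as a gain of 1.4 (right direction).
actual_a, predicted_a = [1.0], [1.4]
# Case B: a loss of 0.2 predicted as a gain of 0.2 (wrong direction).
actual_b, predicted_b = [-0.2], [0.2]

mae_a = mean_absolute_error(actual_a, predicted_a)
mae_b = mean_absolute_error(actual_b, predicted_b)
acc_a = direction_accuracy(actual_a, predicted_a)
acc_b = direction_accuracy(actual_b, predicted_b)

print(f"MAE:      A={mae_a:.2f}  B={mae_b:.2f}")
print(f"Accuracy: A={acc_a:.0%}  B={acc_b:.0%}")
```

Both cases have the same MAE, but only the direction accuracy separates the harmless error from the costly one.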

I think the future stock price does not necessarily depend on its own past values, so I changed my approach to use other factors. Some stocks may affect the price of other stocks, so I use the historical prices of one stock to predict the future price of another. Specifically, I use 10 days of historical prices of one stock to predict, by regression, the 5-day average of the future price of another stock.
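The setup described above can be sketched as follows. This is only an illustration under stated assumptions: the post does not say which regression model was used, so ordinary linear least squares (via NumPy) stands in for it, the helper names `make_windows`, `fit_linear`, and `predict_linear` are hypothetical, and the two price series are synthetic random walks rather than real quotes.

```python
import numpy as np


def make_windows(source, target, lookback=10, horizon=5):
    """Pair each `lookback`-day window of the source stock with the mean of
    the target stock over the following `horizon` days."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(source) - lookback - horizon + 1
    X = np.array([source[i:i + lookback] for i in range(n)])
    y = np.array([target[i + lookback:i + lookback + horizon].mean()
                  for i in range(n)])
    return X, y


def fit_linear(X, y):
    """Least-squares fit of y on X plus an intercept column (an assumed model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


# Synthetic data only: two random walks of 100 trading days.
rng = np.random.default_rng(0)
source = 100 + np.cumsum(rng.normal(0, 1, 100))
target = 50 + np.cumsum(rng.normal(0, 1, 100))

X, y = make_windows(source, target)
coef = fit_linear(X, y)
pred = predict_linear(coef, X)
print(X.shape, y.shape, pred.shape)
```

Each row of `X` holds 10 consecutive prices of the first stock, and the matching entry of `y` is the average of the second stock over the next 5 days.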

But how do I pick the right pair of correlated stocks? Thanks to automation, I can use brute force by programming it in Python. I looped over the NASDAQ-100 stocks and, for each possible pair, ran a regression over 2 years of data, then computed the validation RMSE and the rise/drop accuracy. From there I can pick the pair with the highest accuracy and the pair with the lowest RMSE.
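A rough sketch of such a brute-force scan is below. Several details are assumptions, because the post does not specify them: linear least squares is used as the regression, the last 20% of the windows serve as the validation set, and a "rise" means the 5-day future average is above the target stock's last known price. The tickers `AAA`, `BBB`, `CCC` and their prices are synthetic placeholders.

```python
from itertools import permutations

import numpy as np

LOOKBACK, HORIZON = 10, 5


def evaluate_pair(source, target, train_frac=0.8):
    """Return (validation RMSE, rise/drop accuracy) for one ordered pair."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(source) - LOOKBACK - HORIZON + 1
    X = np.array([source[i:i + LOOKBACK] for i in range(n)])
    y = np.array([target[i + LOOKBACK:i + LOOKBACK + HORIZON].mean()
                  for i in range(n)])
    # Target's last known price at the end of each window (assumed baseline).
    last = target[LOOKBACK - 1:LOOKBACK - 1 + n]

    A = np.column_stack([X, np.ones(n)])
    split = int(n * train_frac)  # chronological split, no shuffling
    coef, *_ = np.linalg.lstsq(A[:split], y[:split], rcond=None)
    pred = A[split:] @ coef

    rmse = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))
    predicted_rise = pred > last[split:]
    actual_rise = y[split:] > last[split:]
    accuracy = float(np.mean(predicted_rise == actual_rise))
    return rmse, accuracy


def scan_pairs(prices):
    """Evaluate every ordered (source, target) pair of tickers."""
    results = []
    for src, tgt in permutations(prices, 2):
        rmse, acc = evaluate_pair(prices[src], prices[tgt])
        results.append((src, tgt, rmse, acc))
    return results


# Synthetic data only: about 2 years (500 trading days) for 3 fake tickers.
rng = np.random.default_rng(42)
prices = {name: 100 + np.cumsum(rng.normal(0, 1, 500))
          for name in ("AAA", "BBB", "CCC")}

results = scan_pairs(prices)
best_acc = max(results, key=lambda r: r[3])
best_rmse = min(results, key=lambda r: r[2])
print("highest accuracy:", best_acc)
print("lowest RMSE:     ", best_rmse)
```

With about 100 tickers this means roughly 10,000 ordered pairs. When that many pairs are tried, some will score well purely by chance, so the best pairs are worth re-checking on data the scan never saw.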

The outcome: ZM/BIIB is the most accurate pair (about 70%), and DOCU/TSLA has the lowest RMSE (<0.03 on normalized values).

ZM (Zoom) is a popular video conferencing provider, and BIIB (Biogen) is a biotech company doing drug R&D. I think both companies benefited from the virus outbreak.

I also inspected other high-accuracy stock pairs: ZM/LBTYK and ZM/BKNG. LBTYK is a broadband company. The more people use video conferencing, the more bandwidth they need, which benefits the broadband company.

BKNG is an online travel company. The travel industry was severely impacted by the virus outbreak, so when the travel company recovers its business depends on when the epidemic is brought under control. Perhaps ZM and BKNG are negatively correlated.
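One way to check a suspected negative correlation is to compute the Pearson correlation of the daily returns of the two stocks. The sketch below is an assumption about how that could be done, not part of the original analysis; `return_correlation` is a hypothetical helper, and the two series are synthetic and built to move in opposite directions.

```python
import numpy as np


def return_correlation(prices_a, prices_b):
    """Pearson correlation between the daily returns of two price series."""
    a = np.asarray(prices_a, dtype=float)
    b = np.asarray(prices_b, dtype=float)
    returns_a = np.diff(a) / a[:-1]
    returns_b = np.diff(b) / b[:-1]
    return float(np.corrcoef(returns_a, returns_b)[0, 1])


# Synthetic data only: one series rises whenever the other falls.
rng = np.random.default_rng(1)
shock = rng.normal(0, 0.01, 250)
series_up = 100 * np.cumprod(1 + shock)
series_down = 100 * np.cumprod(1 - shock)

corr = return_correlation(series_up, series_down)
print(f"correlation of daily returns: {corr:.2f}")
```

A value near -1 means the two stocks tend to move in opposite directions, a value near +1 means they move together, and a value near 0 means no linear relation.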

For the DOCU/TSLA pair, I don’t understand how an electronic signature company is related to a motor company, but when I googled them, I found a discussion thread saying that Tesla will send all future documents to the customer's e-mail address through DocuSign. Maybe that is why: more electronic documents being sent indicates more people using Tesla? Or is it that using less paper and using less fuel are both good for the environment? I'm not sure; the relation needs more research.

Sounds interesting? The model does not just find related pairs; it also predicts the future price, so it can be used for both qualitative and quantitative analysis. For example, below is a comparison of the prediction (red line) and the actual value (green line). They are close, although the prediction has some spikes/outliers. It would be better if it were smoothed with a moving average.
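The moving-average smoothing suggested above could look like the following sketch. The window size and the sample predictions are made-up values for illustration; `moving_average` is a hypothetical helper built on NumPy's `convolve`.

```python
import numpy as np


def moving_average(values, window=5):
    """Simple moving average; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


# Made-up predictions with one obvious spike at index 2.
predictions = [1.0, 1.1, 3.0, 1.2, 1.1, 1.0, 1.2]
smoothed = moving_average(predictions, window=3)
print(smoothed)
```

The spike is spread over neighbouring points instead of standing out on its own. The trade-off is that the smoothed line reacts more slowly to real changes, and more so with a larger window.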
