Aggregating multiple types of complex data in stock market prediction: A model-independent framework
The growing volume, and especially the growing variety, of data in the financial domain provides unprecedented opportunities to understand the stock market more comprehensively and to predict prices more accurately than before. However, this variety also challenges classical statistical approaches, since such models may be restricted to a single type of data. To aggregate information from different sources and give existing models type-free capability, we establish a framework for stock market prediction in scenarios with mixed data, including scalar data, compositional data (pie-like), and functional data (curve-like). The framework is model-independent: it serves as an interface to multiple types of data and can be combined with various prediction models. Its effectiveness is demonstrated through numerical simulations. For price prediction, we use the framework to incorporate trading volume (scalar data), intraday return series (functional data), and investors' emotions from social media (compositional data) to forecast whether the market goes up or down at the next day's opening. The framework's strong explanatory power is further demonstrated. Specifically, intraday returns are found to affect the following opening prices differently in bearish and bullish markets. Moreover, investors' "fear" becomes indicative not at the beginning of a bearish market but in the subsequent period. The framework would help extend existing prediction models easily to scenarios with multiple types of data and shed light on a more systemic understanding of the stock market.
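The abstract does not state how each data type is converted into model input, so the following is only an illustrative sketch of the general idea of an "interface" over mixed data, not the authors' method. It assumes a centered log-ratio transform for compositional data and principal-component scores for discretized curves; these choices, the function names (`clr`, `curve_scores`, `build_features`), and the synthetic data are all hypothetical.

```python
import numpy as np

def clr(parts, eps=1e-9):
    """Centered log-ratio transform for compositional rows (assumed choice)."""
    parts = np.asarray(parts, dtype=float) + eps          # guard against log(0)
    parts = parts / parts.sum(axis=1, keepdims=True)      # re-close each row to sum 1
    logs = np.log(parts)
    return logs - logs.mean(axis=1, keepdims=True)

def curve_scores(curves, n_components=2):
    """Project discretized curves onto their leading principal components."""
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered curves.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def build_features(scalars, compositions, curves, n_components=2):
    """Stack all three data types into one matrix usable by any model."""
    scalars = np.asarray(scalars, dtype=float).reshape(len(scalars), -1)
    return np.hstack([scalars, clr(compositions), curve_scores(curves, n_components)])

# Synthetic stand-ins for the three inputs named in the abstract.
rng = np.random.default_rng(0)
n_days = 50
volume = rng.lognormal(size=n_days)                  # scalar: trading volume
emotions = rng.dirichlet(np.ones(4), size=n_days)    # compositional: emotion shares
returns = rng.normal(size=(n_days, 48))              # functional: intraday return curve

X = build_features(np.log(volume), emotions, returns)
```

The resulting matrix `X` has one row per day and can be passed to any classifier that predicts the direction of the next day's opening, which is the sense in which such an interface leaves the downstream model unconstrained.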