A Study of Machine Learning Models in Predicting the Intention of Adolescents to Smoke Cigarettes
The use of electronic cigarettes (e-cigarettes) is increasing among adolescents. This is problematic because consuming nicotine at an early age can harm the developing adolescent brain and overall health. In addition, e-cigarette use may lead to the use of conventional cigarettes, which carries more severe consequences. Much prior research on e-cigarettes and cigarettes has focused on identifying and analyzing the causes of smoking using conventional statistics. However, there is little research on prediction models for e-cigarette and cigarette use, even though such models are more directly applicable to anti-smoking campaigns. In this paper, we study prediction models that estimate an individual's intention to smoke cigarettes, whether or not that individual uses e-cigarettes, so that they can be informed early of the risk of progressing to cigarette smoking. To construct the prediction models, we apply five machine learning (ML) algorithms and evaluate their accuracy in predicting the intention to smoke cigarettes among never smokers, using data from the 2018 National Youth Tobacco Survey (NYTS). In our investigation, the Gradient Boosting Classifier achieves the highest accuracy of the five models. Using this best-performing model, we also built a public website where users can enter their information to obtain a prediction of their intention to smoke cigarettes.
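The model comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. Everything here is a placeholder rather than the authors' setup: the data are synthetic (not the NYTS responses), the abstract names only the Gradient Boosting Classifier, so the other two candidates are hypothetical stand-ins, and the 75/25 split and hyperparameters are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for survey features and a binary "intends to smoke" label.
# This is NOT the 2018 NYTS data.
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=6, random_state=0
)

# Hold out a stratified test set so accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical candidate set: only gradient boosting is named in the abstract.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each candidate on the training split and score it on the held-out split.
accuracies = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Report the candidates from most to least accurate.
best_name = max(accuracies, key=accuracies.get)
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
print("best:", best_name)
```

Which model ranks first on synthetic data says nothing about the survey data. The sketch only shows the shape of a held-out accuracy comparison, and a single split can be replaced with cross-validation when a more stable estimate is needed.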