Disease Identification From Unstructured User Input
The growing number of Internet users has rapidly popularized online searching for health-related advice. Nowadays, when facing a health problem, people tend to "go online" first instead of consulting a health professional. With the proliferation of online symptom checker sites and health forums, it is easy to learn about a health condition from a given set of symptoms. Although existing symptom checkers provide an instant sense of disease diagnosis, these question-answering and selection-based systems lack interactivity. Online health forum sites can also be underwhelming because of their time-intensive nature and reliability issues. In this scenario, this paper proposes a web-based automated disease identification framework that takes unstructured textual data, such as health forum posts, as input and outputs a ranking of probable diseases based on symptom-disease correlation, considering all important factors. To identify probable diseases, the proposed framework combines a two-phase state-of-the-art text classification system, built on lexicographic and semantic features, with a similarity measurement module backed by a disease knowledge base. We evaluate this framework while varying the number of feature components, and the results suggest that effective feature engineering yields significant gains in accuracy and reliability over baseline systems while maintaining increased user interactivity.
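To make the final ranking step concrete, the following is a minimal sketch of how symptoms found in a free-text post might be scored against a disease knowledge base. It is not the paper's method: the naive phrase matching stands in for the two-phase classifier, Jaccard overlap is an assumed similarity measure, and the knowledge base entries and function names are hypothetical.

```python
import re

# Hypothetical toy knowledge base: disease -> known symptoms (illustrative only).
KNOWLEDGE_BASE = {
    "influenza": {"fever", "cough", "sore throat", "fatigue"},
    "migraine": {"headache", "nausea", "light sensitivity"},
    "common cold": {"cough", "sore throat", "runny nose"},
}


def extract_symptoms(post, vocabulary):
    """Naive phrase matching; an assumed stand-in for a trained text classifier."""
    text = re.sub(r"\s+", " ", post.lower())
    return {s for s in vocabulary if re.search(r"\b" + re.escape(s) + r"\b", text)}


def rank_diseases(post, knowledge_base=KNOWLEDGE_BASE):
    """Rank diseases by Jaccard overlap between found and known symptoms."""
    vocabulary = set().union(*knowledge_base.values())
    found = extract_symptoms(post, vocabulary)
    scores = []
    for disease, symptoms in knowledge_base.items():
        union = found | symptoms
        score = len(found & symptoms) / len(union) if union else 0.0
        scores.append((disease, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


ranking = rank_diseases(
    "I've had a fever and a bad cough for three days, plus a sore throat."
)
```

Here `ranking` is a list of `(disease, score)` pairs ordered from most to least likely under this toy measure. A real system would replace both the extraction step and the scoring function with the components the paper describes.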