Exploring the Potential of Large Language models in Traditional Korean Medicine: A Foundation Model Approach to Culturally-Adapted Healthcare
Introduction: Traditional Korean medicine (TKM) emphasizes individualized diagnosis and treatment, making AI modeling difficult due to limited data and implicit processes. GPT-3.5 and GPT-4, large language models, have shown impressive medical knowledge despite lacking medicine-specific training. This study aimed to assess the capabilities of GPT-3.5 and GPT-4 for TKM using the Korean National Licensing Examination for Korean Medicine Doctors. Methods: GPT-3.5 (February 2023) and GPT-4 (March 2023) models answered 340 questions from the 2022 examination across 12 subjects. Each question was independently evaluated five times in an initialized session. Results: GPT-3.5 and GPT-4 achieved 42.06 performance. There were significant differences in accuracy by subjects, with 83.75 (2). Both models showed high accuracy in recall-based and diagnosis-based questions but struggled with intervention-based ones. The accuracy for questions that require TKM-specialized knowledge was relatively lower than the accuracy for questions that do not GPT-4 showed high accuracy for table-based questions, and both models demonstrated consistent responses. A positive correlation between consistency and accuracy was observed. Conclusion: Models in this study showed near-passing performance in decision-making for TKM without domain-specific training. However, limits were also observed that were believed to be caused by culturally-biased learning. Our study suggests that foundation models have potential in culturally-adapted medicine, specifically TKM, for clinical assistance, medical education, and medical research.
READ FULL TEXT