Chinese Spelling Check with Nearest Neighbors
Chinese Spelling Check (CSC) aims to detect and correct erroneous tokens in Chinese text and has a wide range of applications. In this paper, we introduce InfoKNN-CSC, which extends the standard CSC model by linearly interpolating it with a k-nearest neighbors (kNN) model. In keeping with the characteristics of the task, the phonetic, graphic, and contextual information (info) of tokens and contexts is incorporated into the design of the kNN query and key. After retrieval, to match candidates more accurately, we also rerank them based on the overlap between the n-gram values and the inputs. Experiments on the SIGHAN benchmarks demonstrate that the proposed model achieves state-of-the-art performance with substantial improvements over existing work.
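The linear interpolation described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption for illustration: the function names `knn_distribution` and `interpolate`, the weight `lam`, the `temperature` parameter, and the choice of a softmax over negative distances (a common convention in kNN-augmented models, not necessarily the one used by InfoKNN-CSC).

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, temperature=1.0):
    """Turn retrieved (token, distance) pairs into a probability distribution.

    Assumption: closer neighbors get more weight via a softmax over
    negative distances; neighbors sharing a token pool their weight.
    """
    if not neighbors:
        return {}
    scores = [-dist / temperature for _, dist in neighbors]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = defaultdict(float)
    for (token, _), weight in zip(neighbors, weights):
        probs[token] += weight / total
    return dict(probs)


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the two distributions: lam * p_kNN + (1 - lam) * p_model.

    Falls back to the model distribution when nothing was retrieved.
    """
    if not knn_probs:
        return dict(model_probs)
    tokens = set(model_probs) | set(knn_probs)
    return {
        t: lam * knn_probs.get(t, 0.0) + (1 - lam) * model_probs.get(t, 0.0)
        for t in tokens
    }


# Hypothetical example: the CSC model slightly prefers "在",
# but two of the three retrieved neighbors vote for "再".
model_probs = {"在": 0.6, "再": 0.4}
neighbors = [("再", 0.0), ("再", 0.0), ("在", 0.0)]
mixed = interpolate(model_probs, knn_distribution(neighbors), lam=0.5)
best = max(mixed, key=mixed.get)
```

In this toy case the retrieved neighbors shift the final prediction from the model's first choice to the token favored by the datastore. In practice `lam` would be tuned on held-out data, and the distances would come from queries and keys that encode phonetic, graphic, and contextual information.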