FONDUE: A Framework for Node Disambiguation Using Network Embeddings
Real-world data often presents itself in the form of a network. Examples include social networks, citation networks, biological networks, and knowledge graphs. In their simplest form, networks represent real-life entities (e.g. people, papers, proteins, concepts) as nodes, and describe them in terms of their relations with other entities by means of edges between these nodes. This can be valuable for a range of purposes from the study of information diffusion to bibliographic analysis, bioinformatics research, and question-answering. The quality of networks is often problematic though, affecting downstream tasks. This paper focuses on the common problem where a node in the network in fact corresponds to multiple real-life entities. In particular, we introduce FONDUE, an algorithm based on network embedding for node disambiguation. Given a network, FONDUE identifies nodes that correspond to multiple entities, for subsequent splitting. Extensive experiments on twelve benchmark datasets demonstrate that FONDUE is substantially and uniformly more accurate for ambiguous node identification compared to the existing state-of-the-art, at a comparable computational cost, while less optimal for determining the best way to split ambiguous nodes.
READ FULL TEXT