Siamese Networks for Large-Scale Author Identification
Authorship attribution is the process of identifying the author of a text. Classification-based approaches work well for small numbers of candidate authors, but only similarity-based methods are applicable for larger numbers of authors or for authors beyond the training set. While deep learning methods have been applied to classification-based approaches, current similarity-based methods only embody static notions of similarity. Siamese networks have been used to develop learned notions of similarity in one-shot image tasks, and also for tasks of semantic relatedness in NLP. We examine their application to the stylistic task of authorship attribution, and show that they can substantially outperform both classification- and existing similarity-based approaches on datasets with large numbers of authors.
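To make the idea of a learned similarity concrete, the following is a minimal sketch of a Siamese comparison, not the architecture evaluated in the paper. Everything in it is an assumption made for illustration: the hashed character-trigram features, the single tanh layer, the Euclidean distance, the contrastive loss, and all names (`featurize`, `encode`, `distance`, `contrastive_loss`, `attribute`). The defining property it shows is that both texts pass through the same encoder with the same weights, so the distance between their embeddings can be trained rather than fixed in advance.

```python
import math
import random

DIM_IN = 64   # size of the hashed trigram feature vector (illustrative choice)
DIM_OUT = 8   # size of the learned embedding (illustrative choice)


def featurize(text):
    """Hash character trigrams into a fixed-length count vector."""
    vec = [0.0] * DIM_IN
    for i in range(len(text) - 2):
        # Deterministic polynomial hash, so results are reproducible across runs.
        h = sum(ord(c) * (31 ** k) for k, c in enumerate(text[i:i + 3]))
        vec[h % DIM_IN] += 1.0
    return vec


def make_weights(seed=0):
    """Randomly initialised encoder weights; training would update these."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 0.1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]


def encode(text, weights):
    """Shared encoder: the same weights are applied to every input text."""
    x = featurize(text)
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def distance(text_a, text_b, weights):
    """Euclidean distance between the two embeddings (the Siamese output)."""
    ea, eb = encode(text_a, weights), encode(text_b, weights)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ea, eb)))


def contrastive_loss(d, same_author, margin=1.0):
    """Pull same-author pairs together; push other pairs beyond the margin."""
    if same_author:
        return d ** 2
    return max(0.0, margin - d) ** 2


def attribute(query, known_texts, weights):
    """Assign the query to the candidate author whose sample is closest."""
    return min(known_texts, key=lambda a: distance(query, known_texts[a], weights))
```

Because attribution reduces to a nearest-neighbour lookup over embeddings, a new candidate author can be added by supplying a writing sample without retraining a classifier. This is the property the abstract attributes to similarity-based methods. In this sketch the weights are left untrained, so the distances carry no stylistic meaning until a training loop minimises `contrastive_loss` over labelled pairs.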