Instance Smoothed Contrastive Learning for Unsupervised Sentence Embedding

05/12/2023
by Hongliang He, et al.

Contrastive learning-based methods, such as unsup-SimCSE, have achieved state-of-the-art (SOTA) performance in learning unsupervised sentence embeddings. However, in previous studies, each embedding used for contrastive learning is derived from a single sentence instance; we call these instance-level embeddings. In other words, each embedding is treated as a unique class of its own, which may hurt generalization performance. In this study, we propose IS-CSE (instance smoothing contrastive sentence embedding) to smooth the boundaries of embeddings in the feature space. Specifically, we retrieve embeddings from a dynamic memory buffer according to semantic similarity to form a positive embedding group. The embeddings in the group are then aggregated by a self-attention operation to produce a smoothed instance embedding for further analysis. We evaluate our method on standard semantic textual similarity (STS) tasks and achieve averages of 78.30 and 79.42 on RoBERTa-base and RoBERTa-large respectively, a 2.05 improvement compared to unsup-SimCSE.
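The retrieve-then-aggregate step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `smooth_embedding` and the parameters `k` (group size) and `tau` (temperature) are assumptions, and the attention is simplified to a similarity-based softmax over the retrieved group.

```python
import numpy as np

def smooth_embedding(query, buffer, k=3, tau=0.05):
    """Hypothetical sketch of instance smoothing: retrieve the k most
    similar embeddings from a memory buffer, then aggregate them with
    softmax attention weights derived from cosine similarity."""
    # Normalize so that dot products equal cosine similarities
    q = query / np.linalg.norm(query)
    b = buffer / np.linalg.norm(buffer, axis=1, keepdims=True)
    sims = b @ q                       # cosine similarity to each buffered embedding
    group = np.argsort(sims)[-k:]      # indices of the k nearest neighbors (positive group)
    # Attention weights over the retrieved group (softmax of scaled similarities)
    w = np.exp(sims[group] / tau)
    w /= w.sum()
    # Smoothed instance embedding: attention-weighted sum of the group
    return w @ b[group]

rng = np.random.default_rng(0)
memory_buffer = rng.normal(size=(100, 8))  # toy dynamic memory buffer of past embeddings
query = rng.normal(size=8)
smoothed = smooth_embedding(query, memory_buffer)
print(smoothed.shape)  # (8,)
```

In practice the smoothed embedding would serve as the positive in the contrastive objective, so that each anchor is pulled toward a neighborhood of semantically similar instances rather than a single point.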
