# SimCSE: Simple Contrastive Learning of Sentence Embeddings

This repository contains the code and pre-trained models for our paper SimCSE: Simple Contrastive Learning of Sentence Embeddings.

## News

* 8/31: Our paper has been accepted to EMNLP! Please check out our updated paper (with updated numbers and baselines).
* 5/12: We updated our unsupervised models with new hyperparameters and better performance.
* 5/10: We released our sentence embedding tool and demo code.
* 4/20: We released our model checkpoints and evaluation code.

## Overview

We propose a simple contrastive learning framework that works with both unlabeled and labeled data. Unsupervised SimCSE simply takes an input sentence and predicts itself in a contrastive learning framework, with only standard dropout used as noise. Our supervised SimCSE incorporates annotated pairs from NLI datasets into contrastive learning by using entailment pairs as positives and contradiction pairs as hard negatives. The following figure is an illustration of our models.

## Getting Started

We provide an easy-to-use sentence embedding tool based on our SimCSE model (see our Wiki for detailed usage). To use the tool, first install the simcse package from PyPI; a brief usage sketch appears at the end of this README.

We also support faiss, an efficient similarity search library. Just install the package following the instructions here, and simcse will automatically use faiss for efficient search.

WARNING: We have found that faiss does not fully support Nvidia AMPERE GPUs (3090 and A100). In that case, you should switch to other GPUs or install the CPU version of the faiss package.

We also provide an easy-to-build demo website to show how SimCSE can be used in sentence retrieval. The code is based on DensePhrases' repo and demo (many thanks to the authors of DensePhrases).

## Model List

Our released models are listed as follows. You can import these models by using the simcse package or using HuggingFace's Transformers.

* princeton-nlp/unsup-simcse-bert-base-uncased
* princeton-nlp/unsup-simcse-bert-large-uncased
* princeton-nlp/sup-simcse-bert-base-uncased
* princeton-nlp/sup-simcse-bert-large-uncased

Note that the results are slightly better than what we reported in the current version of the paper after adopting a new set of hyperparameters (for hyperparameters, see the training section).

Naming rules: unsup and sup represent "unsupervised" (trained on the Wikipedia corpus) and "supervised" (trained on NLI datasets), respectively.

## Use SimCSE with HuggingFace

Besides using our provided sentence embedding tool, you can also easily import our models with HuggingFace's transformers:

```python
import torch
from scipy.spatial.distance import cosine
from transformers import AutoModel, AutoTokenizer

# Import our models. The package will take care of downloading the models automatically
tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-bert-base-uncased")
model = AutoModel.from_pretrained("princeton-nlp/sup-simcse-bert-base-uncased")
```
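The snippet above only loads the tokenizer and model. The following is a minimal sketch of how the embeddings can then be compared, assuming the model's pooler output is used as the sentence embedding (other pooling strategies are possible); the example texts are illustrative:

```python
# Tokenize a few input texts
texts = [
    "There's a kid on a skateboard.",
    "A kid is skateboarding.",
    "A kid is inside the house.",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Compute sentence embeddings (assumption: the pooler output is taken
# as the sentence embedding)
with torch.no_grad():
    embeddings = model(**inputs, return_dict=True).pooler_output

# Cosine similarity = 1 - cosine distance; values lie in [-1, 1],
# and higher means more similar
print("similar pair:    %.3f" % (1 - cosine(embeddings[0], embeddings[1])))
print("dissimilar pair: %.3f" % (1 - cosine(embeddings[0], embeddings[2])))
```

The first pair (two paraphrases about skateboarding) should score noticeably higher than the second, unrelated pair.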
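As mentioned in the Getting Started section, the simcse package wraps these models in a higher-level tool. The sketch below assumes a SimCSE class exposing encode, similarity, build_index, and search methods; if your installed version differs, see our Wiki for the exact API:

```python
# pip install simcse
from simcse import SimCSE

# Assumption: the package exposes a SimCSE class that takes a model name
# and handles downloading and encoding
model = SimCSE("princeton-nlp/sup-simcse-bert-base-uncased")

# Encode a sentence into an embedding
embedding = model.encode("A woman is reading.")

# Compute pairwise similarities between two groups of sentences
sentences_a = ["A woman is reading.", "A man is playing a guitar."]
sentences_b = ["He plays guitar.", "A woman is making a photo."]
similarities = model.similarity(sentences_a, sentences_b)

# Build an index and retrieve similar sentences; if faiss is installed,
# simcse uses it automatically for efficient search
model.build_index(sentences_a)
results = model.search("He plays guitar.")
print(results)
```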