How to cluster similar sentences using BERT


Problem Description

For ElMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.
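A minimal sketch of this averaging-plus-KMeans approach, using toy two-dimensional vectors in place of real Word2Vec/FastText/ELMo lookups (the vocabulary and all vector values are illustrative, not from any trained model):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy word vectors standing in for a trained Word2Vec/FastText model
# (hypothetical values chosen so the two topics are clearly separated)
word_vectors = {
    "dog": np.array([1.0, 0.0]),
    "cat": np.array([0.9, 0.1]),
    "barks": np.array([0.8, 0.2]),
    "stock": np.array([0.0, 1.0]),
    "market": np.array([0.1, 0.9]),
    "rises": np.array([0.2, 0.8]),
}

def sentence_embedding(sentence):
    # Average the vectors of all in-vocabulary tokens in the sentence
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0)

sentences = ["dog barks", "cat barks", "stock market rises", "market rises"]
X = np.stack([sentence_embedding(s) for s in sentences])

# Group the averaged sentence vectors with K-means
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

With a real model, the dictionary lookup would be replaced by the embedding model's own token-to-vector mapping; the averaging and clustering steps stay the same.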

A good example of the implementation can be seen in this short article: http://ai.intelligentonlinetools.com/ml/text-clustering-word-embedding-machine-learning/

I would like to do the same thing using BERT (using the BERT Python package from Hugging Face); however, I am rather unfamiliar with how to extract the raw word/sentence vectors in order to input them into a clustering algorithm. I know that BERT can output sentence representations - so how would I actually extract the raw vectors from a sentence?

Any information would be helpful.

Answer

You can use Sentence Transformers to generate the sentence embeddings. These embeddings are much more meaningful than the ones obtained from bert-as-service, as the models have been fine-tuned so that semantically similar sentences get a higher similarity score. If the number of sentences to be clustered is in the millions or more, you can use a FAISS-based clustering algorithm, since vanilla K-means-style clustering takes quadratic time.
