SpaCy的相似度如何计算? [英] How is SpaCy's similarity computed?

查看:765
本文介绍了SpaCy的相似度如何计算?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

初学者NLP问题在这里:

Beginner NLP Question here:

哇,空间太好了!它的tfidf模型可能更易于预处理,但是只有一行代码(token.vector)的w2v? -太棒了!

Wow spaCy is great! Its tfidf model could be easier to preprocess, but w2v with only one line of code (token.vector)?! - Awesome!

在他的关于spaCy的10行教程 andrazhribernik向我们展示了可以在令牌,已发送消息,单词块和文档上运行的.similarity方法.

In his 10 line tutorial on spaCy andrazhribernik show's us the .similarity method that can be run on tokens, sents, word chunks, and docs.

nlp = spacy.load('en')doc = nlp(raw_text)之后 我们可以在令牌和大块之间进行相似度查询. 但是,此.similarity方法在幕后计算的是什么?

After nlp = spacy.load('en') and doc = nlp(raw_text) we can do .similarity queries between tokens and chunks. However, what is being calculated behind the scenes in this .similarity method?

SpaCy已经具有非常简单的.vector,它可以根据GloVe模型的训练来计算w2v向量(.tfidf.fasttext方法有多酷?).

SpaCy already has the incredibly simple .vector, which computes the w2v vector as trained from the GloVe model (how cool would a .tfidf or .fasttext method be?).

模型相似性模型是否只是在计算这两个w2v-GloVe-矢量之间的余弦相似度,还是做其他事情?具体细节在文档中不清楚;任何帮助表示赞赏!

Is the model similarity model simply computing the cosine similarity between these two w2v-GloVe-vectors or doing something else? The specifics aren't clear in the documentation; any help appreciated!

推荐答案

假设您所引用的方法是令牌相似性之一,则可以在源代码

Assuming that the method you are referring to is the token similarity one, you can find the function in the sourcecode here. As you can see it computes the cosine similarity between the vectors.

如教程中所述:

单词嵌入是矢量或其他形式的数字映射中单词的表示,并由此扩展了整个语言的语料库.这样一来,就可以对单词进行数字化处理,将单词相似度表示为单词嵌入映射的维度上的空间差异.

A word embedding is a representation of a word, and by extension a whole language corpus, in a vector or other form of numerical mapping. This allows words to be treated numerically with word similarity represented as spatial difference in the dimensions of the word embedding mapping.

因此向量距离可以与单词相似度相关.

So the vector distance can be related to the word similarity.

这篇关于SpaCy的相似度如何计算?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆