spaCy similarity method does not work correctly

Problem description

I am doing simple natural language processing with spaCy. I want to filter out words by measuring the similarity between them.

I wrote and ran the following simple code, taken from the spaCy documentation, but the result does not match what the documentation shows.

import spacy
nlp = spacy.load('en_core_web_lg')
tokens = nlp('dog cat banana')

for token1 in tokens:
    for token2 in tokens:
        sim = token1.similarity(token2)
        print("{:>6s}, {:>6s}: {}".format(token1.text, token2.text, sim))

The output of the code is as follows.

   dog,    dog: 1.0
   dog,    cat: 2.307269867164827e-21
   dog, banana: 0.0
   cat,    dog: 2.307269867164827e-21
   cat,    cat: 1.0
   cat, banana: -0.04468117654323578
banana,    dog: -7.828739256116838e+17
banana,    cat: -8.242222286053048e+17
banana, banana: 1.0

In particular, the similarity between "dog" and "cat" should be about 0.8, but instead it is an extremely small value.

In addition, the similarity between "dog" and "banana" is 0.0, but the similarity between "banana" and "dog" is -7.828739256116838e+17.
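As a point of reference, spaCy's default similarity is a cosine similarity over word vectors, so a correct result should be symmetric and lie between -1 and 1. The following is only a minimal pure-Python sketch of that property; the three-dimensional vectors are made-up toy values, not real embeddings, and the zero-vector convention is an assumption of this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, for illustration only.
vectors = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

for name_a, vec_a in vectors.items():
    for name_b, vec_b in vectors.items():
        sim = cosine_similarity(vec_a, vec_b)
        print("{:>6s}, {:>6s}: {}".format(name_a, name_b, sim))
```

Swapping the two arguments does not change the result, and no pair can produce a magnitude such as 1e+17, which suggests the values above come from broken or missing vector data rather than from the words themselves.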

I don't know how to fix this.

Please help me.

Recommended answer

First, install the large English model (or all models).

python3 -m spacy.en.download all

Next, try the sample code from the documentation, loading the model with:

nlp = spacy.load('en_core_web_md')

If that doesn't work, try loading the following instead:

nlp = spacy.load('en')

After making the above changes, the result matches the documentation.

python3 /tmp/c.py
   dog,    dog: 1.000000078333395
   dog,    cat: 0.8016855098942641
   dog, banana: 0.2432764518408807
   cat,    dog: 0.8016855098942641
   cat,    cat: 1.0000001375986456
   cat, banana: 0.2815436412709355
banana,    dog: 0.2432764518408807
banana,    cat: 0.2815436412709355
banana, banana: 1.000000107068369
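To tell a healthy result from a broken one without reading every number, the pairwise scores can be checked for range and symmetry. The sketch below is a hypothetical helper, not part of spaCy: `check_similarity_table` and its `tol` parameter are names invented here, and the scores are copied from the two outputs shown in this post.

```python
def check_similarity_table(table, tol=1e-6):
    """Return a list of problems found in a {(word_a, word_b): score} table."""
    problems = []
    for (a, b), score in table.items():
        # A cosine similarity should lie in [-1, 1], up to float rounding.
        if not (-1.0 - tol <= score <= 1.0 + tol):
            problems.append("{}/{}: {} is outside [-1, 1]".format(a, b, score))
        # The score should not depend on the order of the two words.
        reverse = table.get((b, a))
        if reverse is not None and abs(score - reverse) > tol:
            problems.append("{}/{}: {} differs from {}/{}: {}".format(
                a, b, score, b, a, reverse))
    return problems


# Scores copied from the question's output.
broken = {
    ("dog", "cat"): 2.307269867164827e-21,
    ("cat", "dog"): 2.307269867164827e-21,
    ("dog", "banana"): 0.0,
    ("banana", "dog"): -7.828739256116838e+17,
    ("cat", "banana"): -0.04468117654323578,
    ("banana", "cat"): -8.242222286053048e+17,
}

# Scores copied from the answer's output.
fixed = {
    ("dog", "dog"): 1.000000078333395,
    ("dog", "cat"): 0.8016855098942641,
    ("cat", "dog"): 0.8016855098942641,
    ("dog", "banana"): 0.2432764518408807,
    ("banana", "dog"): 0.2432764518408807,
    ("cat", "banana"): 0.2815436412709355,
    ("banana", "cat"): 0.2815436412709355,
}

for problem in check_similarity_table(broken):
    print("broken:", problem)
print("fixed:", check_similarity_table(fixed))
```

The tolerance is there because self-similarity can come out marginally above 1.0 through floating-point rounding, as in the output above. Note that this check would not flag the near-zero dog/cat score in the question, since that value is in range and symmetric; it only catches the out-of-range and asymmetric entries.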
