How do I calculate the shortest path (geodesic) distance between two adjectives in WordNet using Python NLTK?


Question

The semantic similarity between two synsets in WordNet is easy to compute with several built-in similarity measures, such as:

synset1.path_similarity(synset2), path similarity

synset1.lch_similarity(synset2), Leacock-Chodorow similarity

synset1.wup_similarity(synset2), Wu-Palmer similarity


However, all of these measures exploit WordNet's taxonomic (hypernym/hyponym) relations, which exist only for nouns and verbs. Adjectives and adverbs are instead related via synonymy, antonymy and pertainyms. How can one measure the distance (number of hops) between two adjectives?

I tried path_similarity(), but as expected, it returns None:

from nltk.corpus import wordnet as wn
x = wn.synset('good.a.01')
y = wn.synset('bad.a.01')


print(wn.path_similarity(x,y))

If there is any way to compute the distance between one adjective and another, I would greatly appreciate a pointer to it.

Recommended answer

There's no easy way to get the similarity between words that are not nouns or verbs.

As noted, similarity between nouns or between verbs is easy to compute:

>>> from nltk.corpus import wordnet as wn
>>> dog = wn.synset('dog.n.1')
>>> cat = wn.synset('cat.n.1')
>>> car = wn.synset('car.n.1')
>>> wn.path_similarity(dog, cat)
0.2
>>> wn.path_similarity(dog, car)
0.07692307692307693
>>> wn.wup_similarity(dog, cat)
0.8571428571428571
>>> wn.wup_similarity(dog, car)
0.4
>>> wn.lch_similarity(dog, car)
1.072636802264849
>>> wn.lch_similarity(dog, cat)
2.0281482472922856
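
Since the question asks for a number of hops, it may help to recall that NLTK defines path_similarity as 1 / (d + 1), where d is the shortest path distance in the hypernym taxonomy. A minimal sketch that inverts this, where the helper name hops_from_path_similarity is made up for illustration:

```python
def hops_from_path_similarity(similarity):
    """Invert similarity = 1 / (hops + 1) to recover the hop count.

    Hypothetical helper, not part of NLTK. Returns None when the
    similarity itself is None (no connecting path was found).
    """
    if similarity is None:
        return None
    return round(1.0 / similarity - 1.0)


# Values taken from the session above.
print(hops_from_path_similarity(0.2))                  # dog vs. cat: 4
print(hops_from_path_similarity(0.07692307692307693))  # dog vs. car: 12
```

For synsets in the taxonomy, synset1.shortest_path_distance(synset2) gives the hop count directly.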

For adjectives it's hard, so you would need to build your own similarity measure. The easiest way is to use a vector space model, in which every word is represented by a vector of floating-point numbers, e.g.

>>> import numpy as np
>>> blue = np.array([0.2, 0.2, 0.3])
>>> red = np.array([0.1, 0.2, 0.3])
>>> pink = np.array([0.1001, 0.221, 0.321])
>>> car = np.array([0.6, 0.9, 0.5])
>>> def cosine(x,y):
...     return np.dot(x,y) / (np.linalg.norm(x) * np.linalg.norm(y))
... 
>>> cosine(pink, red)
0.99971271929384864
>>> cosine(pink, blue)
0.96756147991512709
>>> cosine(blue, red)
0.97230558532824662
>>> cosine(blue, car)
0.91589118863996888
>>> cosine(red, car)
0.87469454283170045
>>> cosine(pink, car)
0.87482313596223782

To train vectors like pink = np.array([0.1001, 0.221, 0.321]) from real text, try searching for:

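  • Latent semantic indexing / latent semantic analysis
  • Bag of words
  • Vector space model semantics
  • Word2Vec, Doc2Vec, Wiki2Vec
  • Neural networks
  • Cosine similarity natural language semantics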
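
To show where such vectors might come from, here is a minimal bag-of-words sketch that counts, for each word, the other words appearing in the same sentence. The four-sentence corpus is made up for illustration, and the sparse Counter vectors stand in for the NumPy arrays used above. Real systems train on far more text and usually reduce the dimensionality afterwards.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences):
    """Map each word to a Counter of the words it co-occurs with."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = words[:i] + words[i + 1:]  # every other word in the sentence
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(x, y):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(x[k] * y[k] for k in x.keys() & y.keys())
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


# Toy corpus (an assumption for illustration only).
corpus = [
    "the red car is fast",
    "the pink car is fast",
    "the blue sky is wide",
    "the blue sea is wide",
]
vectors = context_vectors(corpus)

# "red" and "pink" share all their contexts; "blue" shares only some.
print(cosine(vectors["red"], vectors["pink"]) > cosine(vectors["red"], vectors["blue"]))  # True
```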

You can also try some off-the-shelf software and libraries, such as:

  • Gensim https://radimrehurek.com/gensim/
  • http://webcache.googleusercontent.com/search?q=cache:u5y4He592qgJ:takelab.fer.hr/sts/+&cd=2&hl=en&ct=clnk&gl=sg

Other than vector space models, you can try a graph-based model that puts words into a graph and uses something like PageRank to walk around the graph and produce a similarity measure.
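
The graph idea can also be applied to WordNet's own adjective links to get the hop count the question asks for. The following is a rough sketch, not an established NLTK feature. It runs a breadth-first search over whatever edges a neighbors function supplies. Which relations count as edges is an assumption: wordnet_adjective_neighbors below uses "similar to", "also see" and lemma-level antonym links, and both function names are made up for illustration. The demonstration uses a toy graph so it runs without the WordNet corpus.

```python
from collections import deque


def hop_distance(start, goal, neighbors, max_hops=10):
    """Breadth-first search: edges on a shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def wordnet_adjective_neighbors(synset):
    """Assumed edge set for an NLTK adjective synset (a design choice)."""
    linked = set(synset.similar_tos()) | set(synset.also_sees())
    for lemma in synset.lemmas():
        # Antonymy is a relation between lemmas, so map back to synsets.
        linked.update(antonym.synset() for antonym in lemma.antonyms())
    return linked


# Toy graph with made-up edges (not real WordNet data).
toy = {
    "good": {"bad", "fine"},
    "bad": {"good", "awful"},
    "fine": {"good"},
    "awful": {"bad"},
    "blue": set(),
}


def toy_neighbors(word):
    return toy.get(word, ())


print(hop_distance("fine", "awful", toy_neighbors))  # 3
print(hop_distance("good", "blue", toy_neighbors))   # None
```

With NLTK and the WordNet corpus installed, the same search could be tried as hop_distance(wn.synset('good.a.01'), wn.synset('bad.a.01'), wordnet_adjective_neighbors). The result depends entirely on which relations are treated as edges.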

See also:

  • Compare similarity of terms/expressions using NLTK?
  • check if two words are related to each other
  • How to determine semantic hierarchies / relations in using NLTK?
  • Is there an algorithm that tells the semantic similarity of two phrases
  • Semantic Relatedness Algorithms - python
