Determining Hypernym or Hyponym using WordNet in NLTK

Question

I want to check for a hypernym/hyponym relation between two words (given by the user). Either of them could be a hypernym of the other, or there could be no hypernym relation between the two. Can I use path_similarity for this? I am trying it like this; please suggest a better method if there is one. I also want to know whether it is better to check this with a SPARQL query.

 from nltk.corpus import wordnet as wn

 first = wn.synset('automobile.n.01')
 second = wn.synset('car.n.01')
 first.path_similarity(second)

Recommended answer

Firstly, there is a difference between a word and a synset (concept) in WordNet.

Here we see that one word can have multiple meanings (i.e. it links to multiple concepts):

>>> from nltk.corpus import wordnet as wn
>>> car = 'car'
>>> auto = 'automobile'
>>> wn.synsets(auto)
[Synset('car.n.01'), Synset('automobile.v.01')]
>>> wn.synsets(car)
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]

In this case, 'automobile' and 'car' can both refer to the same Synset('car.n.01'); if so, there is no hyponym/hypernym relationship between them.
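The "same synset" case can be checked directly by intersecting the synsets of the two words. This is only a minimal sketch: `shared_synsets` and its `lookup` parameter are names I made up for illustration, and `lookup` exists only so the function can be tried without the WordNet corpus. By default it falls back to `wn.synsets`.

```python
def shared_synsets(word_a, word_b, lookup=None):
    """Return the synsets that both words can refer to.

    `lookup` is a hypothetical hook: any callable mapping a word to an
    iterable of hashable synsets. When omitted, NLTK's wn.synsets is used.
    """
    if lookup is None:
        # Imported lazily so the function can be used with a custom lookup
        # even when NLTK or the WordNet corpus is not installed.
        from nltk.corpus import wordnet as wn
        lookup = wn.synsets
    return set(lookup(word_a)) & set(lookup(word_b))
```

Going by the listings above, `shared_synsets('car', 'automobile')` should contain Synset('car.n.01'). A non-empty result means the two words are synonyms in at least one sense, so for that sense there is no hypernym/hyponym direction to speak of.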

There's also the notion of a lemma, which would just complicate things, so we'll skip that for now.

Let's say you are comparing synsets rather than words. Then you can simply find all hyponyms of one synset and check whether the other synset occurs among them.

If you're comparing plain words, see "How to get all the hyponyms of a word/synset in python nltk and wordnet?"

The below shows how to compare synsets. For the sake of example, I'll use 'fruit' and 'apple', which makes more sense than 'automobile' and 'car', since those two share a single noun synset, Synset('car.n.01').

>>> from nltk.corpus import wordnet as wn
>>>
>>> fruit = 'fruit'
>>> wn.synsets(fruit)
[Synset('fruit.n.01'), Synset('yield.n.03'), Synset('fruit.n.03'), Synset('fruit.v.01'), Synset('fruit.v.02')]
>>> wn.synsets(fruit)[0].definition()
u'the ripened reproductive body of a seed plant'
>>> fruit = wn.synsets(fruit)[0]
>>> 
>>> apple = 'apple'
>>> wn.synsets(apple)
[Synset('apple.n.01'), Synset('apple.n.02')]
>>> wn.synsets(apple)[0].definition()
u'fruit with red or yellow or green skin and sweet to tart crisp whitish flesh'
>>> apple = wn.synsets(apple)[0]
>>>

Below, we see that apple is not in fruit's direct hyponyms:

>>> fruit.hyponyms()
[Synset('accessory_fruit.n.01'), Synset('achene.n.01'), Synset('acorn.n.01'), Synset('aggregate_fruit.n.01'), Synset('berry.n.02'), Synset('buckthorn_berry.n.01'), Synset('buffalo_nut.n.01'), Synset('chokecherry.n.01'), Synset('cubeb.n.01'), Synset('drupe.n.01'), Synset('ear.n.05'), Synset('edible_fruit.n.01'), Synset('fruitlet.n.01'), Synset('gourd.n.02'), Synset('hagberry.n.01'), Synset('hip.n.05'), Synset('juniper_berry.n.01'), Synset('marasca.n.01'), Synset('may_apple.n.01'), Synset('olive.n.01'), Synset('pod.n.02'), Synset('pome.n.01'), Synset('prairie_gourd.n.01'), Synset('pyxidium.n.01'), Synset('quandong.n.02'), Synset('rowanberry.n.01'), Synset('schizocarp.n.01'), Synset('seed.n.01'), Synset('wild_cherry.n.01')]
>>> 
>>> apple in fruit.hyponyms()
False

So we have to walk down through all the hyponyms transitively and see whether apple is among them:

>>> hypofruits = set(fruit.closure(lambda s: s.hyponyms()))
>>> apple in hypofruits
True

There you have it! For the sake of completeness:

>>> hyperapple = set(apple.closure(lambda s: s.hypernyms()))
>>> fruit in hyperapple
True
>>> hypoapple = set(apple.closure(lambda s: s.hyponyms()))
>>> fruit in hypoapple
False
>>> hyperfruit = set(fruit.closure(lambda s: s.hypernyms()))
>>> apple in hyperfruit
False
