Is wordnet path similarity commutative?

Question

I am using the wordnet API from nltk. When I compare one synset with another I get None, but when I compare them the other way around I get a float value.

Shouldn't they give the same value? Is there an explanation, or is this a bug in wordnet?

Example:

wn.synset('car.n.01').path_similarity(wn.synset('automobile.v.01')) # None
wn.synset('automobile.v.01').path_similarity(wn.synset('car.n.01')) # 0.06666666666666667

Solution

Technically, without the dummy root, the car and automobile synsets have no link to each other:

>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
>>> print(x.shortest_path_distance(y))
None
>>> print(y.shortest_path_distance(x))
None

Now, let's look at the dummy root issue closely. Firstly, there is a neat function in NLTK that says whether a synset needs a dummy root:

>>> x._needs_root()
False
>>> y._needs_root()
True

Next, when you look at the path_similarity code (http://nltk.googlecode.com/svn-/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.path_similarity), you can see:

def path_similarity(self, other, verbose=False, simulate_root=True):
  distance = self.shortest_path_distance(other, 
               simulate_root=simulate_root and self._needs_root())

  if distance is None or distance < 0:
    return None
  return 1.0 / (distance + 1)
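To see why consulting only self._needs_root() makes the result depend on argument order, here is a minimal toy sketch. It does not use NLTK or real WordNet data: the PARENT chains, the NEEDS_ROOT table and the node names are made up for illustration, and the helper functions only imitate the shape of the NLTK methods quoted above.

```python
# Toy hypernym chains (child -> parent). NOT real WordNet data.
PARENT = {
    "car": "vehicle", "vehicle": "entity",   # noun chain with its own root
    "automobile_v": "travel_v",              # verb chain, no shared root
}
# Hypothetical stand-in for Synset._needs_root().
NEEDS_ROOT = {"car": False, "automobile_v": True}
ROOT = "*ROOT*"


def hypernym_distances(node, simulate_root=False):
    """Map `node` and each of its ancestors to its distance from `node`."""
    dist, d = {}, 0
    while node is not None:
        dist[node] = d
        node, d = PARENT.get(node), d + 1
    if simulate_root:
        # Assumption: the fake root sits one step past the farthest ancestor.
        dist[ROOT] = max(dist.values()) + 1
    return dist


def shortest_path_distance(a, b, simulate_root=False):
    """Shortest path through any common ancestor, or None if there is none."""
    da = hypernym_distances(a, simulate_root)
    db = hypernym_distances(b, simulate_root)
    common = set(da) & set(db)
    return min((da[n] + db[n] for n in common), default=None)


def path_similarity(a, b, simulate_root=True):
    # Same shape as the quoted snippet: only the caller's side is consulted.
    distance = shortest_path_distance(a, b, simulate_root and NEEDS_ROOT[a])
    if distance is None or distance < 0:
        return None
    return 1.0 / (distance + 1)


print(path_similarity("car", "automobile_v"))   # None
print(path_similarity("automobile_v", "car"))   # 1/6, about 0.1667
```

In the first call the flag is False, so neither chain gets a fake root and no common ancestor exists. In the second call both chains get *ROOT*, which gives a path of length 2 + 3 = 5.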

So for the automobile synset, the argument simulate_root and self._needs_root() evaluates to True when you call y.path_similarity(x) with the default simulate_root=True. When you call x.path_similarity(y), it evaluates to False, since x._needs_root() is False:

>>> True and y._needs_root()
True
>>> True and x._needs_root()
False

path_similarity() passes this flag down to shortest_path_distance() (https://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.shortest_path_distance) and then to hypernym_distances(), which collects each synset's hypernyms together with their distances. Without simulate_root=True, the automobile synset shares no hypernym with car, and vice versa:

>>> y.hypernym_distances(simulate_root=True)
set([(Synset('automobile.v.01'), 0), (Synset('*ROOT*'), 2), (Synset('travel.v.01'), 1)])
>>> y.hypernym_distances()
set([(Synset('automobile.v.01'), 0), (Synset('travel.v.01'), 1)])
>>> x.hypernym_distances()
set([(Synset('object.n.01'), 8), (Synset('self-propelled_vehicle.n.01'), 2), (Synset('whole.n.02'), 8), (Synset('artifact.n.01'), 7), (Synset('physical_entity.n.01'), 10), (Synset('entity.n.01'), 11), (Synset('object.n.01'), 9), (Synset('instrumentality.n.03'), 5), (Synset('motor_vehicle.n.01'), 1), (Synset('vehicle.n.01'), 4), (Synset('entity.n.01'), 10), (Synset('physical_entity.n.01'), 9), (Synset('whole.n.02'), 7), (Synset('conveyance.n.03'), 5), (Synset('wheeled_vehicle.n.01'), 3), (Synset('artifact.n.01'), 6), (Synset('car.n.01'), 0), (Synset('container.n.01'), 4), (Synset('instrumentality.n.03'), 6)])
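The value 0.06666666666666667 from the question can be reproduced from these distances. The sketch below is plain arithmetic, not an NLTK call. It rests on one assumption: the fake *ROOT* is placed one step beyond a synset's farthest hypernym, as it appears to be for y above (travel.v.01 at 1, *ROOT* at 2). The distance from x to *ROOT* is not shown in the output, so 12 is inferred, not observed.

```python
# Distances read off the hypernym_distances() outputs above.
y_to_root = 2                 # (Synset('*ROOT*'), 2) for automobile.v.01
x_farthest = 11               # (Synset('entity.n.01'), 11) for car.n.01
x_to_root = x_farthest + 1    # assumption: *ROOT* is one step past the farthest hypernym

distance = y_to_root + x_to_root        # path from y up to *ROOT* and down to x
similarity = 1.0 / (distance + 1)       # same formula as path_similarity()

print(distance, similarity)             # 14 0.06666666666666667
```

The result matching the reported float suggests that y.path_similarity(x) is indeed measuring a path through the dummy root.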

So theoretically, the right path_similarity is 0 / None, but because of the simulate_root=simulate_root and self._needs_root() argument, nltk.corpus.wordnet.path_similarity() in NLTK's API is not commutative.

BUT the code is also not wrong/bugged: any distance computed by going through the root will be consistently far, since the position of the dummy *ROOT* never changes. So the best practice for calculating path_similarity is to compute both directions and combine them:

>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
>>> sims = [wn.path_similarity(x, y), wn.path_similarity(y, x)]

# When you always want a numeric value (never None): going through
# the *ROOT* will always get you some sort of distance
# from synset x to synset y.
# On Python 3, None cannot be compared with a float, so filter it out.
>>> max((s for s in sims if s is not None), default=None)

# When you can allow None in the synset similarity comparison
>>> None if None in sims else min(sims)
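If this comes up often, the two-direction comparison can be wrapped in a small helper. The following is only a sketch: symmetric_similarity and its prefer_value flag are hypothetical names, not part of NLTK, and fake_sim is a stub standing in for wn.path_similarity so that the example runs without the WordNet corpus.

```python
def symmetric_similarity(sim, a, b, prefer_value=True):
    """Combine sim(a, b) and sim(b, a) into one order-independent result.

    `sim` is any callable that returns a float or None.
    prefer_value=True  -> largest non-None score (None only if both are None).
    prefer_value=False -> None if either direction is None, else the smaller score.
    """
    scores = [sim(a, b), sim(b, a)]
    if prefer_value:
        return max((s for s in scores if s is not None), default=None)
    return None if None in scores else min(scores)


def fake_sim(a, b):
    """Stub that mimics the asymmetry from the question (hypothetical values)."""
    return 1.0 / 15 if (a, b) == ("automobile", "car") else None


# The result no longer depends on the argument order.
print(symmetric_similarity(fake_sim, "car", "automobile"))
print(symmetric_similarity(fake_sim, "automobile", "car"))
print(symmetric_similarity(fake_sim, "car", "automobile", prefer_value=False))
```

With NLTK installed, the same helper could be called as symmetric_similarity(wn.path_similarity, x, y).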
