How to determine semantic hierarchies / relations using NLTK?

Question

I want to use NLTK and WordNet to determine the semantic relation between two words. For example, if I enter "employee" and "waiter", it should return something showing that employee is more general than waiter. For "employee" and "worker", it should return that they are equal. Does anyone know how to do that?

Recommended answer

First, you have to tackle the problem of mapping words to lemmas and then to synsets, i.e. how do you identify a synset from a word?

word => lemma => lemma.pos.sense => synset    
Waiters => waiter => 'waiter.n.01' => wn.Synset('waiter.n.01')
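
The word-to-synset step can be sketched with NLTK's WordNet reader. This is a minimal sketch, assuming the WordNet corpus has been installed (for example via `nltk.download('wordnet')`). The helper name `candidate_synsets` is hypothetical, and choosing the right sense among the candidates (word sense disambiguation) is deliberately left out.

```python
def candidate_synsets(word, pos="n"):
    """Map a surface word to its lemma and to all WordNet synsets of that lemma.

    `candidate_synsets` is a hypothetical helper name, not part of NLTK.
    """
    # Imported inside the function so the definition itself does not need NLTK.
    from nltk.corpus import wordnet as wn

    lowered = word.lower()
    # morphy() returns the WordNet base form, or None if it finds none.
    lemma = wn.morphy(lowered, pos) or lowered
    # synsets() returns every sense of the lemma for the given part of speech.
    return lemma, wn.synsets(lemma, pos=pos)
```

Calling `candidate_synsets("Waiters")` should give the lemma `waiter` together with a list of synsets that includes `waiter.n.01`. Printing each candidate's `name()` and `definition()` is a simple way to pick the intended sense by hand.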

So let's say you have already dealt with the above problem and arrived at the most appropriate representation of waiter. You can then go on to compare synsets. Note that a word can have many synsets.

from nltk.corpus import wordnet as wn

# Look up each synset by its lemma.pos.sense name.
waiter = wn.synset('waiter.n.01')
employee = wn.synset('employee.n.01')

# Collect the lemma names of every direct and indirect hyponym of each synset.
all_hyponyms_of_waiter = {w.replace("_", " ") for s in waiter.closure(lambda s: s.hyponyms()) for w in s.lemma_names()}
all_hyponyms_of_employee = {w.replace("_", " ") for s in employee.closure(lambda s: s.hyponyms()) for w in s.lemma_names()}

if 'waiter' in all_hyponyms_of_employee:
    print('employee more general than waiter')
elif 'employee' in all_hyponyms_of_waiter:
    print('waiter more general than employee')
else:
    print("WordNet does not place employee and waiter in the same hypernym tree")
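
The question also asks for an "equal" result, which the hyponym-closure check above does not report. One possible way to cover it is to compare the synsets themselves and walk the hypernym links upward. The following is a sketch, not the answer's own method: `is_more_general`, `relation` and the `Node` class are hypothetical names, the toy hierarchy is made up for illustration, and the functions only assume objects that are hashable and expose a `hypernyms()` method, as NLTK `Synset` objects do.

```python
def is_more_general(a, b):
    """Return True if `a` is a direct or indirect hypernym of `b`."""
    seen = set()
    stack = list(b.hypernyms())
    while stack:
        node = stack.pop()
        if node == a:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(node.hypernyms())
    return False


def relation(a, b):
    """Classify the hierarchy relation between two synset-like objects."""
    if a == b:
        return "equal"
    if is_more_general(a, b):
        return "first is more general"
    if is_more_general(b, a):
        return "second is more general"
    return "unrelated"


class Node:
    """Hypothetical stand-in for a synset so the sketch runs without WordNet."""

    def __init__(self, name, *parents):
        self.name = name
        self._parents = list(parents)

    def hypernyms(self):
        return self._parents


# Made-up toy hierarchy: person > employee > waiter, person > customer.
person = Node("person")
employee = Node("employee", person)
waiter = Node("waiter", employee)
customer = Node("customer", person)

print(relation(employee, waiter))
print(relation(waiter, employee))
print(relation(waiter, waiter))
print(relation(waiter, customer))
```

With NLTK, the same two functions should accept `wn.synset('employee.n.01')` and `wn.synset('waiter.n.01')` directly in place of the `Node` objects. Two different words count as "equal" here only when they have been resolved to the same synset.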
