Getting the WordNet domain name for a given word

Question

I know WordNet has a domain hierarchy, e.g. sport->football.

1) Is it possible to list all words related to, for example, the 'sport->football' sub-domain?

  Expected response: goalkeeper, forward, penalty, ball, field, stadium, referee, and so on.

2) Is it possible to get the domain name for a given word, e.g. 'goalkeeper'?

 I need something like [sport->football; sport->hockey] or [football; hockey] or just 'football'.

This is for a document classification task.

Answer

WordNet has a hypernym/hyponym hierarchy, but that is not what you want here, as you can see when you look up goalkeeper:

from nltk.corpus import wordnet

s = wordnet.synsets('goalkeeper')[0]
print(s.hypernym_paths())

One of the returned paths is:

[Synset('entity.n.01'),
Synset('physical_entity.n.01'),
Synset('causal_agent.n.01'),
Synset('person.n.01'),
Synset('contestant.n.01'),
Synset('athlete.n.01'),
Synset('soccer_player.n.01'),
Synset('goalkeeper.n.01')]

There are two methods, usage_domains() and topic_domains(), but they return an empty list for most words:

s = wordnet.synsets('football')[0]
print(s.topic_domains())  # []
print(s.usage_domains())  # []

The WordNet Domains project, however, could be what you are looking for. It offers a text file that maps Princeton WordNet 2.0 synsets to their corresponding domains. You have to register your email address to get access to the data. Then you can read in the file that corresponds to your WordNet version (they offer 2.0 and 3.2), for example with the anydbm module (called dbm in Python 3). Note that NLTK ships WordNet 3.0, so synset offsets taken from NLTK only line up with a mapping built for WordNet 3.0.

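import dbm  # Python 3 name of Python 2's anydbm

dbdomains = dbm.open('dbdomains', 'c')
with open('wn-domains-2.0-20050210') as fh:
    for line in fh:
        # each line is "<offset>-<pos>\t<domain(s)>"; drop the 2-char "-<pos>" suffix
        offset, domain = line.rstrip('\n').split('\t')
        dbdomains[offset[:-2]] = domain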
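The loader above keys the database on the bare offset and drops the part-of-speech suffix. WordNet offsets are byte positions in separate per-POS data files, so a noun and a verb can share the same number; keeping the full `offset-pos` key avoids those collisions. Below is a minimal plain-Python sketch under two assumptions: each line is `<offset>-<pos>`, a tab, then the domain labels, and a synset with several domains lists them separated by spaces. `parse_domains` is a hypothetical helper name, and the sample lines are made up, not real WordNet Domains data.

```python
def parse_domains(lines):
    """Map full 'offset-pos' keys to a list of domain labels."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, labels = line.partition('\t')
        # Assumption: multiple domains on one line are space-separated.
        mapping[key] = labels.split()
    return mapping


# Hypothetical sample lines in the assumed format (not real data).
# The first two share an offset but differ in POS, so both survive.
sample = [
    "00000001-n\tsport\n",
    "00000001-v\tlinguistics\n",
    "00000002-n\tfootball sport\n",
]
domains = parse_domains(sample)
print(domains["00000001-n"], domains["00000001-v"])  # ['sport'] ['linguistics']
print(domains["00000002-n"])  # ['football', 'sport']
```

With the real file you would pass the open file handle, e.g. `parse_domains(fh)` inside a `with open(...) as fh:` block. A plain dict is enough if the mapping fits in memory; the dbm file used in the answer has the advantage of persisting between sessions.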

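You can then use the offset of a synset (the offset() method in NLTK 3) to look up its domain. The keys in the file are offsets zero-padded to eight digits, so you may have to pad the offset, for example by adding a zero at the beginning: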

key = '%08d' % wordnet.synsets('travel_guidebook')[0].offset()  # zero-pad to 8 digits
dbdomains.get(key)  # NLTK 3: offset() is a method; dbm returns bytes
# b'linguistics'
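The lookup above answers question 2 (word to domain). For question 1 (all words in a domain such as football), one option is to invert the mapping so each domain label points to its synset keys. This is a minimal sketch, assuming the file has already been loaded into a dict of the form `{'01234567-n': ['football', ...]}`; `synset_key` and `invert` are hypothetical helpers, and the offsets and labels in the example are made up.

```python
from collections import defaultdict


def synset_key(offset, pos):
    """Format an integer offset and a POS letter as an 8-digit 'offset-pos' key."""
    return '%08d-%s' % (offset, pos)


def invert(mapping):
    """Turn {key: [domains]} into {domain: sorted list of keys}."""
    by_domain = defaultdict(list)
    for key, labels in mapping.items():
        for label in labels:
            by_domain[label].append(key)
    return {label: sorted(keys) for label, keys in by_domain.items()}


# Hypothetical mapping in the shape produced by parsing the domains file.
mapping = {
    synset_key(1234567, 'n'): ['football'],
    synset_key(7654321, 'n'): ['football', 'sport'],
    synset_key(42, 'v'): ['sport'],
}
by_domain = invert(mapping)
print(by_domain['football'])  # ['01234567-n', '07654321-n']
print(by_domain['sport'])     # ['00000042-v', '07654321-n']
```

Each key could then be turned back into words with NLTK, e.g. `wordnet.synset_from_pos_and_offset(pos, offset).lemma_names()`, provided the offsets come from the same WordNet version as the mapping file.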
