Get WordNet's domain name for the specified word

Question

I know WordNet has a domain hierarchy, e.g. sport->football.

1) Is it possible to list all words related to, for example, the 'sport->football' sub-domain?

  Response: goalkeeper, forward, penalty, ball, field, stadium, referee and so on.

2) Is it possible to get the domain name for a given word, e.g. 'goalkeeper'?

 Need something like [sport->football; sport->hockey] or [football;hockey] or just 'football'.

This is for a document classification task.

Recommended answer

WordNet has a hypernym/hyponym hierarchy, but that is not what you want here, as you can see when you look up goalkeeper:

from nltk.corpus import wordnet
s = wordnet.synsets('goalkeeper')[0]
s.hypernym_paths()

One of the results is:

[Synset('entity.n.01'),
Synset('physical_entity.n.01'),
Synset('causal_agent.n.01'),
Synset('person.n.01'),
Synset('contestant.n.01'),
Synset('athlete.n.01'),
Synset('soccer_player.n.01'),
Synset('goalkeeper.n.01')]

There are two methods called usage_domains() and topic_domains(), but they return an empty list for most words:

s = wordnet.synsets('football')[0]
s.topic_domains()
>>> []
s.usage_domains()
>>> []

The WordNet Domains project, however, could be what you are looking for. It offers a text file that maps Princeton WordNet 2.0 synsets to their corresponding domains. You have to register your email address to get access to the data. Then you can read in the file that corresponds to your WordNet version (they offer 2.0 and 3.2), for example with the anydbm module (named dbm in Python 3):

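import anydbm  # Python 2 module; it is named dbm in Python 3
fh = open('wn-domains-2.0-20050210', 'r')
dbdomains = anydbm.open('dbdomains', 'c')
for line in fh:
    # Each line holds a synset key and its domains, separated by a tab.
    offset, domain = line.split('\t')
    # Drop the last two characters of the key to keep only the offset.
    dbdomains[offset[:-2]] = domain
fh.close()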
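For readers on Python 3, the same parsing step can be sketched with a plain dictionary instead of a database file. This is a minimal sketch, not the answer's own code: it assumes each line has the form `<offset>-<pos>`, a tab, then one or more space-separated domain labels. The function name `load_domains` and the sample lines (including their offsets) are hypothetical and only there to make the example self-contained.

```python
def load_domains(lines):
    # Parse lines of the assumed form "<offset>-<pos><TAB><domain> [<domain> ...]"
    # into a dict mapping the full key to a list of domain labels.
    domains = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, labels = line.partition("\t")
        domains[key] = labels.split()
    return domains


# Made-up sample lines for illustration; real data comes from the mapping file.
sample = [
    "00001740-n\tfactotum\n",
    "09000001-n\tfootball sport\n",
]
domain_map = load_domains(sample)
print(domain_map["09000001-n"])
```

Keeping the part-of-speech suffix in the key, rather than cutting it off, avoids collisions if two synsets with different parts of speech happen to share an offset. Stripping each line also removes the trailing newline from the stored value.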

You can then use the offset attribute of a synset (a method, offset(), in NLTK 3) to find out its domain. The integer offset drops leading zeros, so you may have to add a zero at the beginning:

dbdomains.get('0' + str(wordnet.synsets('travel_guidebook')[0].offset))
>>> 'linguistics\n'
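The zero-padding step, and the reverse lookup asked about in question 1, might be sketched as follows. This assumes keys of the form `<offset>-<pos>` with the offset padded to eight digits. The helper names `synset_key` and `invert` are hypothetical, and so are the entries in the sample mapping.

```python
from collections import defaultdict


def synset_key(offset, pos):
    # Format an integer offset and a POS tag as a zero-padded key,
    # e.g. 1740 and "n" become "00001740-n".
    return "{:08d}-{}".format(offset, pos)


def invert(domain_map):
    # Build a reverse index: domain label -> sorted list of synset keys.
    index = defaultdict(list)
    for key, labels in domain_map.items():
        for label in labels:
            index[label].append(key)
    return {label: sorted(keys) for label, keys in index.items()}


# Made-up entries for illustration only; real keys come from the mapping file.
domain_map = {
    "09000001-n": ["football"],
    "09000002-n": ["football", "hockey"],
    "00001740-n": ["factotum"],
}

print(synset_key(1740, "n"))
print(domain_map.get(synset_key(9000002, "n")))
print(invert(domain_map)["football"])
```

With NLTK 3, the offset and part of speech would come from `synset.offset()` and `synset.pos()`, and the keys returned by the reverse index could be mapped back to lemma names through the corresponding synsets. The offsets only line up if the WordNet version loaded by NLTK matches the version the mapping file was built for.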
