NLTK "OMW" WordNet with the Arabic language

Question

I'm using the Open Multilingual Wordnet (OMW) through python/nltk, specifically for the Arabic language. All the functions work fine for English, yet none of them seems to work when I pass the 'arb' tag. The only thing that works is extracting the lemma_names from a given Arabic synset.

The code below works fine with u'arb'; the output is a list of Arabic lemmas:

from nltk.corpus import wordnet as wn

for synset in wn.synsets(u'عام', lang='arb'):
    for lemma in synset.lemma_names(u'arb'):
        print(lemma)

When I apply the same logic to synsets, definitions, examples, or hypernyms, I get an error that says:

TypeError: hyponyms() takes exactly 1 argument (2 given)

(if I supply the 'arb' flag) or

KeyError: u'arb'

This is one of the snippets that stops working if I write synset.hyponyms(u'arb'):

for synset in wn.synsets(u'عام', lang='arb'):
    for hypo in synset.hyponyms():  # prints the hyponyms in English, not Arabic
        print(hypo)

Does this mean that I can't use wn.all_synsets and the other built-in functions to extract all the Arabic synsets, hypernyms, etc.?

Recommended answer

NLTK's Open Multilingual Wordnet has English names for all the synsets, because it is a multilingual database centered on the original English WordNet. Synsets model meanings, so they are language-independent and cannot be requested in a specific language, which is why hyponyms() accepts no language argument. Each synset is, however, linked to lemmas in the languages covered by the OMW. Once you have some synsets (the originals, their hyponyms, etc.), just ask for the Arabic lemmas again:

>>> for synset in wn.synsets(u'عام',lang=('arb')):
...     for hypo in synset.hyponyms():
...         for lemma in hypo.lemmas("arb"):
...             print(lemma)
... 
Lemma('waft.v.01.إِنْبعث')
Lemma('waft.v.01.انبعث')
Lemma('waft.v.01.إنبعث_كالرائحة_العطرة')
Lemma('waft.v.01.إِنْدفع')
Lemma('waft.v.01.إِنْطلق')
Lemma('waft.v.01.انطلق')
Lemma('waft.v.01.حمل_بخفة')
Lemma('waft.v.01.دفع')
Lemma('calendar_year.n.01.سنة_شمْسِيّة')
Lemma('calendar_year.n.01.سنة_مدنِيّة')
Lemma('fiscal_year.n.01.سنة_ضرِيبِيّة')
Lemma('fiscal_year.n.01.سنة_مالِيّة')

In other words, the lemmas are multilingual, the synsets are not.
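The same idea can be wrapped in small helpers. The sketch below is only an illustration, not part of NLTK: related_lemma_names, synsets_with_lemmas and demo are hypothetical names, and the helpers assume nothing more than synset objects exposing name(), lemma_names(lang) and a relation method such as hyponyms(). The demo function additionally assumes that NLTK is installed and that the wordnet and OMW corpora have been downloaded.

```python
def related_lemma_names(synset, relation="hyponyms", lang="arb"):
    """Map each related synset's name to its lemma names in `lang`.

    The relation method (hyponyms, hypernyms, ...) is called without a
    language argument, because relations hold between synsets.
    """
    result = {}
    for other in getattr(synset, relation)():
        names = other.lemma_names(lang)
        if names:  # skip synsets that have no lemmas in this language
            result[other.name()] = names
    return result


def synsets_with_lemmas(synsets, lang="arb"):
    """Yield (synset, lemma names) for each synset that has lemmas in `lang`."""
    for synset in synsets:
        names = synset.lemma_names(lang)
        if names:
            yield synset, names


def demo():
    # Assumption: NLTK is installed and the 'wordnet' corpus plus the OMW
    # data ('omw-1.4' on recent NLTK releases, 'omw' on older ones) are present.
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets("عام", lang="arb"):
        print(synset.name(), related_lemma_names(synset, "hyponyms"))
        print(synset.name(), related_lemma_names(synset, "hypernyms"))

    # Walk all noun synsets and keep only those that carry Arabic lemmas.
    for synset, names in synsets_with_lemmas(wn.all_synsets("n")):
        print(synset.name(), names)
        break  # show only the first match
```

Calling demo() walks the hyponyms and hypernyms of each synset found for عام and prints their Arabic lemma names. The last loop suggests one way to approach the wn.all_synsets part of the question: iterate over every synset and keep those for which lemma_names('arb') is non-empty.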
