Using Arabic WordNet for synonyms in Python?
Question
I am trying to get the synonyms of Arabic words in a sentence.
If the word is in English, it works perfectly and the results are displayed in Arabic. I was wondering if it's possible to get the synonyms of an Arabic word directly, without writing it in English first.
I tried the following, but it didn't work. Also, I would prefer words without tashkeel (diacritics): انتظار instead of اِنْتِظار.
from nltk.corpus import wordnet as omw
jan = omw.synsets('انتظار ')[0]
print(jan)
print(jan.lemma_names(lang='arb'))
Answer
The WordNet used in NLTK doesn't support Arabic. If you are looking for Arabic WordNet (AWN), that is a completely different thing.
For Arabic WordNet, download:
- http://nlp.lsi.upc.edu/awn/get_bd.php
- http://nlp.lsi.upc.edu/awn/AWNDatabaseManagement.py.gz
After decompressing the script (e.g. gunzip AWNDatabaseManagement.py.gz), run it with:
$ python AWNDatabaseManagement.py -i upc_db.xml
To do something like wn.synset('إنتظار'), Arabic WordNet provides the function wn.get_synsets_from_word(word), but it returns offsets. It also accepts words only as they are vocalized in the database. For example, you should use جَمِيل for جميل:
>>> wn.get_synsets_from_word(u"جَمِيل")
[(u'a', u'300218842')]
300218842 is the offset of the synset of جميل.
I checked the word إنتظار, and it seems it doesn't exist in AWN.
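Since AWN only matches the vocalized spelling stored in the database, while the question asks for input without tashkeel, one possible workaround is to normalize both sides: strip the diacritics from each vocalized headword once, index the headwords by their bare form, and look up the user's bare word in that index. This is a minimal Python 3 sketch under stated assumptions: it assumes you already have a list of vocalized headwords taken from upc_db.xml (extracting them is not shown), and `get_synsets` stands in for `wn.get_synsets_from_word`. The helper names (`strip_tashkeel`, `build_index`, `synsets_for_plain_word`, `fake_get_synsets`) are hypothetical, and the diacritic range (U+064B to U+0652 plus U+0670) and the optional alef folding are my own choices, not part of AWN.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Arabic tashkeel: U+064B (fathatan) .. U+0652 (sukun), plus U+0670 (superscript alef).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}
# Optional folding of alef variants (آ أ إ) to bare alef (ا); an assumption, not AWN behaviour.
ALEF_VARIANTS = {"\u0622": "\u0627", "\u0623": "\u0627", "\u0625": "\u0627"}


def strip_tashkeel(word: str, fold_alef: bool = False) -> str:
    """Return `word` without diacritics (optionally with alef variants folded)."""
    chars = []
    for ch in word:
        if ch in TASHKEEL:
            continue  # drop the diacritic mark
        if fold_alef:
            ch = ALEF_VARIANTS.get(ch, ch)
        chars.append(ch)
    return "".join(chars)


def build_index(vocalized_forms: Iterable[str], fold_alef: bool = False) -> Dict[str, List[str]]:
    """Map each bare spelling to every vocalized headword that reduces to it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for form in vocalized_forms:
        index[strip_tashkeel(form, fold_alef)].append(form)
    return dict(index)


def synsets_for_plain_word(
    word: str,
    index: Dict[str, List[str]],
    get_synsets: Callable[[str], List[Tuple[str, str]]],
    fold_alef: bool = False,
) -> List[Tuple[str, str, str]]:
    """Look up an unvocalized word; return (vocalized form, pos, offset) triples."""
    # .strip() also removes stray whitespace such as the trailing space in 'انتظار '.
    key = strip_tashkeel(word.strip(), fold_alef)
    results = []
    for form in index.get(key, []):
        # Assumes (pos, offset) pairs, as in the sample output [(u'a', u'300218842')].
        for pos, offset in get_synsets(form):
            results.append((form, pos, offset))
    return results


# Hypothetical stand-in for wn.get_synsets_from_word, returning the sample output above.
def fake_get_synsets(word: str) -> List[Tuple[str, str]]:
    return [("a", "300218842")] if word == "جَمِيل" else []


# Hypothetical headword list; in practice it would be read from upc_db.xml.
index = build_index(["جَمِيل"])
print(synsets_for_plain_word("جميل", index, fake_get_synsets))
```

With the real database you would pass `wn.get_synsets_from_word` as `get_synsets` (the AWN script itself may need Python 2, in which case the type hints would have to be dropped). Build the index and query it with the same `fold_alef` setting; folding would let a bare انتظار match a headword spelled with إ, although, as noted above, إنتظار does not appear to be in AWN at all.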
More details about using AWN to get synonyms are available here.