Import Arabic WordNet in Python
Question
I need to process Arabic words in Python, and I need to link Arabic WordNet with Python so that I can call methods such as:
wn.synset('جميل')
I found Multilingual Lexicons: AWN - ArabicWN:
http://www.talp.upc.edu/index.php/technology/resources/multilingual-lexicons-and-machine-translation-resources/multilingual-lexicons/72-awn
and I tried to run "A set of basic python functions for accessing the database":
http://nlp.lsi.upc.edu/awn/AWNDatabaseManagement.py.gz
But when I run the code (AWNDatabaseManagement.py), this error occurs:
processing file E:/usuaris/horacio/arabicWN/AWNdatabase/upc_db.xml
file E:/usuaris/horacio/arabicWN/AWNdatabase/upc_db.xml not correct
Traceback (most recent call last):
File "/Users/s/Desktop/arab", line 403, in <module>
wn.compute_index_w()
NameError: global name 'wn' is not defined
Any ideas?
Recommended answer
AWNDatabaseManagement.py should be given the argument -i, whose value is the path to the Arabic WordNet database. If the argument is not specified, the script falls back to the default path E:/usuaris/horacio/arabicWN/AWNdatabase/upc_db.xml, which does not exist on your machine.
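The option-with-default behaviour described above can be sketched with the standard getopt module. This is only an illustration of the pattern, not the script's actual code: the function name resolve_db_path and the constant DEFAULT_DB are hypothetical, and the sketch is written for Python 3.

```python
import getopt

# Default path copied from the error message; it only exists on the author's machine.
DEFAULT_DB = "E:/usuaris/horacio/arabicWN/AWNdatabase/upc_db.xml"


def resolve_db_path(argv, default=DEFAULT_DB):
    """Return the value passed with -i, or the default when -i is absent."""
    # "i:" tells getopt that -i takes a value
    pairs, _rest = getopt.getopt(argv, "i:")
    opts = {flag.lstrip("-"): value for flag, value in pairs}
    return opts.get("i", default)
```

Calling resolve_db_path(sys.argv[1:]) without -i returns the Windows default, which is why the script reports the file as "not correct" on any other machine.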
To resolve this, download the XML database of Arabic WordNet, upc_db.xml. I suggest placing it in the same folder as the script AWNDatabaseManagement.py. Then run:
$ python AWNDatabaseManagement.py -i upc_db.xml
This is what I got after running it, with no errors:
processing file upc_db.xml
<open file 'upc_db.xml', mode 'r' at 0xb74689c0>
Alternatively, you can change line 320 from
opts['i']='E:/usuaris/horacio/arabicWN/AWNdatabase/upc_db.xml'
to
opts['i']='upc_db.xml'
and then run the script without -i.
You can then load it with:
>> from AWNDatabaseManagement import wn
If it fails, check that the XML resource is in the right path.
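A quick way to rule out a path or file problem before importing the module is to test the XML file directly. The following is a minimal sketch using only the standard library; the helper name check_xml_resource is hypothetical and is not part of AWNDatabaseManagement.py.

```python
import os
import xml.etree.ElementTree as ET


def check_xml_resource(path):
    """Return 'missing', 'malformed', or 'ok' for the file at path."""
    if not os.path.isfile(path):
        return "missing"
    try:
        # iterparse streams the file, so a large database is not held in memory
        for _event, element in ET.iterparse(path):
            element.clear()
    except ET.ParseError:
        return "malformed"
    return "ok"
```

For example, check_xml_resource("upc_db.xml") returning "missing" means the script is being run from a folder that does not contain the database.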
Now, to get something like wn.synset('جميل'): Arabic WordNet has a function wn.get_synsets_from_word(word), but it returns offsets. It also accepts words only as they are vocalized in the database. For example, you should use جَمِيل, not جميل:
>> wn.get_synsets_from_word(u"جَمِيل")
[(u'a', u'300218842')]
300218842 is the offset of the synset of جميل. I suggest using the following method instead. List the words with:
>> for word,ids in sorted(wn.get_words(False)):
.. print word, ids
You will get results like this:
جَمِيعَة [u'jamiyEap_1']
جَمِيل [u'jamiyl_1']
جَمِيْعَة [u'jamiyoEap_1']
جَمَّدَ [u'jam~ada_2', u'jam~ada_1']
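Because lookups only work with the vocalized spelling, it can help to search the word list by its unvocalized form. The sketch below strips Arabic diacritics with the standard unicodedata module and filters a list of (word, ids) pairs. It assumes the entries have the shape suggested by the loop over wn.get_words(False); the helper names are hypothetical, and the code is written for Python 3 (under the Python 2 that the script targets, the strings would have to be unicode objects).

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks (Arabic short vowels, shadda, sukun) from text."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def find_vocalized(query, entries):
    """Return the (word, ids) pairs whose unvocalized form equals query.

    entries is assumed to be an iterable of (word, ids) pairs.
    """
    bare = strip_diacritics(query)
    return [(word, ids) for word, ids in entries
            if strip_diacritics(word) == bare]
```

With the database loaded, find_vocalized(u"جميل", wn.get_words(False)) would be expected to return the vocalized entry together with its ids, which can then be passed to the calls shown in this answer.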
Choose your word and pick one of its ids. IDs are written in Buckwalter romanization. Multiple ids mean the word has different meanings. Describe the chosen word with:
>> wn._words["jamiyl_1"].describe()
wordid jamiyl_1
value جَمِيل
synsets [u'jamiyl_a1AR']
forms [(u'root', u'\u062c\u0645\u0644')]
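Since the ids are Buckwalter romanizations, an id can be turned back into Arabic script to confirm that it is the word you meant. The sketch below uses the standard Buckwalter table. It assumes that AWN ids follow that table and that the trailing _<n> is a sense number; both are assumptions based on the ids shown here, and the function name word_id_to_arabic is hypothetical.

```python
# Standard Buckwalter table; AWN ids may deviate from it for some symbols (assumption).
BUCKWALTER = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624", "<": "\u0625",
    "}": "\u0626", "A": "\u0627", "b": "\u0628", "p": "\u0629", "t": "\u062a",
    "v": "\u062b", "j": "\u062c", "H": "\u062d", "x": "\u062e", "d": "\u062f",
    "*": "\u0630", "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638", "E": "\u0639",
    "g": "\u063a", "_": "\u0640", "f": "\u0641", "q": "\u0642", "k": "\u0643",
    "l": "\u0644", "m": "\u0645", "n": "\u0646", "h": "\u0647", "w": "\u0648",
    "Y": "\u0649", "y": "\u064a", "F": "\u064b", "N": "\u064c", "K": "\u064d",
    "a": "\u064e", "u": "\u064f", "i": "\u0650", "~": "\u0651", "o": "\u0652",
    "`": "\u0670", "{": "\u0671",
}


def word_id_to_arabic(word_id):
    """Convert an id such as 'jamiyl_1' to Arabic script, ignoring the suffix."""
    # Drop the trailing '_<n>' (assumed to be a sense number)
    stem = word_id.rsplit("_", 1)[0]
    # Characters outside the table are passed through unchanged
    return "".join(BUCKWALTER.get(ch, ch) for ch in stem)
```

Under these assumptions, word_id_to_arabic("jamiyl_1") yields the same vocalized string that describe() prints as value.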
Now you have the synsets list. For more information about a synset, use:
>> wn._items["jamiyl_a1AR"].describe()
itemid jamiyl_a1AR
offset 300218842
name جَمِيل
type synset
pos a
input links [[u'be_in_state', u'jamaAl_n1AR'], [u'near_antonym', u'qabiyH_a1AR']]
output links [[u'near_antonym', u'qabiyH_a1AR']]