Import Arabic WordNet in Python

Question

I need to process Arabic words using Python, and I need to link Arabic WordNet with Python so that I can call methods such as:

wn.synset('جميل')

I found the Multilingual Lexicons resource AWN (ArabicWN):

http://www.talp.upc.edu/index.php/technology/resources/multilingual-lexicons-and-machine-translation-resources/multilingual-lexicons/72-awn

and I tried to run the set of basic Python functions for accessing the database:

http://nlp.lsi.upc.edu/awn/AWNDatabaseManagement.py.gz

But when I run the code (AWNDatabaseManagement.py), this error occurs:

processing file  E:/usuaris/horacio/arabicWN/AWNdatabase/upc_db.xml
file  E:/usuaris/horacio/arabicWN/AWNdatabase/upc_db.xml  not correct

Traceback (most recent call last):
  File "/Users/s/Desktop/arab", line 403, in <module>
    wn.compute_index_w()
NameError: global name 'wn' is not defined 

Any ideas?

Recommended answer

AWNDatabaseManagement.py expects the argument -i, whose value is the path to the Arabic WordNet database. If the argument is not specified, the script falls back to the default path E:/usuaris/horacio/arabicWN/AWNdatabase/upc_db.xml. That path most likely does not exist on your machine, which is presumably why the file is reported as "not correct" and wn is never defined.

To resolve that, download the XML database of Arabic WordNet, upc_db.xml. I suggest placing it in the same folder as the script AWNDatabaseManagement.py. Then run:

$ python AWNDatabaseManagement.py -i upc_db.xml

This is what I got after running it, with no errors:

processing file  upc_db.xml
<open file 'upc_db.xml', mode 'r' at 0xb74689c0>

You can also change line 320 from

opts['i']='E:/usuaris/horacio/arabicWN/AWNdatabase/upc_db.xml'

to

opts['i']='upc_db.xml'

and then run the script without -i.

You can then load it:

>> from AWNDatabaseManagement import wn

If it fails, check that the XML resource is in the right path.
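One way to check the path before importing is to confirm that the file exists and parses as XML. The sketch below uses only the standard library; the helper name check_xml_resource is hypothetical and is not part of AWNDatabaseManagement.py, and the code is written for Python 3 even though the script itself appears to target Python 2.

```python
import os
import xml.etree.ElementTree as ET


def check_xml_resource(path):
    """Return a short status string for the XML file at `path`.

    Hypothetical helper, not part of AWNDatabaseManagement.py.
    """
    if not os.path.isfile(path):
        # Show the absolute path so a wrong working directory is obvious.
        return "missing: " + os.path.abspath(path)
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as err:
        return "not well-formed: %s" % err
    return "ok: root element <%s>" % root.tag


if __name__ == "__main__":
    print(check_xml_resource("upc_db.xml"))
```

If this reports the file as missing, run it from the folder that contains upc_db.xml, or pass the full path to -i.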

Now, to get something like wn.synset('جميل'): Arabic WordNet has a function wn.get_synsets_from_word(word), but it returns offsets. It also accepts words only as they are vocalized in the database. For example, you should use جَمِيل, not جميل:

>> wn.get_synsets_from_word(u"جَمِيل")
[(u'a', u'300218842')]

300218842 is the offset of the synset of جميل. I suggest using the following method instead. List the words with:

 >> for word,ids  in sorted(wn.get_words(False)):
 ..     print word, ids 

You will get results like this:

 جَمِيعَة [u'jamiyEap_1']
 جَمِيل [u'jamiyl_1']
 جَمِيْعَة [u'jamiyoEap_1']
 جَمَّدَ [u'jam~ada_2', u'jam~ada_1']
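Because lookups only accept vocalized forms, it can help to index the listing by its unvocalized spelling, so that a plain word such as جميل leads to the vocalized entries stored in the database. This is a minimal sketch under two assumptions: the pairs have the (word, ids) shape shown in the loop above, and removing Unicode combining marks (category Mn) is an adequate way to drop the vocalization. The function names are hypothetical, and the code is written for Python 3.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word):
    """Drop combining marks (fatha, kasra, shadda, sukun, ...) from a word."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def build_unvocalized_index(word_id_pairs):
    """Map each unvocalized spelling to its vocalized (word, ids) entries."""
    index = defaultdict(list)
    for word, ids in word_id_pairs:
        index[strip_diacritics(word)].append((word, list(ids)))
    return dict(index)


# Sample pairs copied from the listing above; with the real database
# loaded, the input would be wn.get_words(False) instead.
sample = [
    ("جَمِيل", ["jamiyl_1"]),
    ("جَمِيعَة", ["jamiyEap_1"]),
]
index = build_unvocalized_index(sample)
for vocalized, ids in index.get("جميل", []):
    print(vocalized, ids)
```

Several vocalized words can share one unvocalized spelling, which is why each key maps to a list rather than to a single entry.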

Choose your word and pick one of its ids. IDs are written in Buckwalter romanization. Multiple ids mean the word has several meanings. Describe the chosen word with:

>> wn._words["jamiyl_1"].describe()
wordid  jamiyl_1
value  جَمِيل
synsets  [u'jamiyl_a1AR']
forms  [(u'root', u'\u062c\u0645\u0644')]
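The wordid above looks like the Buckwalter form of the vocalized value followed by a sense suffix (_1). To see how a vocalized word maps to that romanization, here is a sketch of the standard Buckwalter table built from Unicode code point ranges. The function to_buckwalter is hypothetical and not part of the AWN script, and whether AWN derives every id exactly this way is an assumption. The code is written for Python 3.

```python
def _build_buckwalter_table():
    """Build a code point -> Latin mapping for standard Buckwalter."""
    table = {}
    # U+0621..U+063A: hamza forms, alif, and the letters ba through ghayn.
    for offset, latin in enumerate("'|>&<}AbptvjHxd*rzs$SDTZEg"):
        table[0x0621 + offset] = latin
    # U+0640..U+0652: tatweel, fa through ya, then tanwin, short vowels,
    # shadda (~) and sukun (o).
    for offset, latin in enumerate("_fqklmnhwYyFNKaui~o"):
        table[0x0640 + offset] = latin
    table[0x0670] = "`"  # dagger alif
    table[0x0671] = "{"  # alif wasla
    return table


BUCKWALTER = _build_buckwalter_table()


def to_buckwalter(text):
    """Transliterate Arabic text; unmapped characters pass through unchanged."""
    return text.translate(BUCKWALTER)


if __name__ == "__main__":
    # The vocalized word from the example, written with escapes for clarity.
    print(to_buckwalter("\u062c\u064e\u0645\u0650\u064a\u0644"))
```

Appending a sense suffix such as _1 to the result gives a candidate key for wn._words, but the listing from wn.get_words(False) remains the reliable source of valid ids.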

Now you have the list of synsets. For more information about a synset, use:

>> wn._items["jamiyl_a1AR"].describe()
itemid  jamiyl_a1AR
offset  300218842
name  جَمِيل
type  synset
pos  a
input links  [[u'be_in_state', u'jamaAl_n1AR'], [u'near_antonym', u'qabiyH_a1AR']]
output links  [[u'near_antonym', u'qabiyH_a1AR']]
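The links are printed as [relation, target] pairs, so grouping them by relation makes it easy to pull out, for example, the near antonyms of a synset. The sketch below works on plain lists copied from the output above. How to read the links off the item object is not shown here, so that part is left out, and the helper name group_links is hypothetical. The code is written for Python 3.

```python
from collections import defaultdict


def group_links(links):
    """Group [relation, target] pairs into {relation: [targets]}."""
    grouped = defaultdict(list)
    for relation, target in links:
        grouped[relation].append(target)
    return dict(grouped)


# Values copied from the describe() output above.
input_links = [["be_in_state", "jamaAl_n1AR"], ["near_antonym", "qabiyH_a1AR"]]
output_links = [["near_antonym", "qabiyH_a1AR"]]

print(group_links(input_links))
print(group_links(output_links).get("near_antonym", []))
```

Each target is itself an item id, so it can be passed to wn._items[...] and described in the same way.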
