Python ISRIStemmer for Arabic text
Question
I am running the following code in IDLE (Python). I want to enter an Arabic string and get its stem, but it doesn't work:
>>> from nltk.stem.isri import ISRIStemmer
>>> st = ISRIStemmer()
>>> w = 'حركات'
>>> join = w.decode('Windows-1256')
>>> print st.stem(join).encode('Windows-1256').decode('utf-8')
The result is the same text as in w, 'حركات', which is not the stem.
But when I do the following:
>>> print st.stem(u'اعلاميون')
it succeeds and returns the stem, which is 'علم'.
Why doesn't passing a variable to the stem() function return the stem?
Recommended answer
OK, I solved the problem myself using the following:
w = 'حركات'
st.stem(w.decode('utf-8'))
It correctly gives the root, which is 'حرك'.
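A likely explanation, assuming the byte string held UTF-8 bytes (which the working decode('utf-8') suggests): decoding those bytes as Windows-1256 produces mojibake rather than Arabic letters, so the stemmer has nothing it can recognize. Re-encoding the untouched mojibake as Windows-1256 and decoding it as UTF-8 then restores the original word, which would match the unchanged output in the question. The sketch below reproduces only the encoding side of this in Python 3, using the standard library; the variable names are illustrative and no stemmer is involved.

```python
# -*- coding: utf-8 -*-
# Illustration of the codec mismatch only; the UTF-8 assumption is ours,
# not something stated in the question.
word = "حركات"

# Bytes as a UTF-8 console or source file would hold them (assumption).
raw = word.encode("utf-8")

# Wrong codec: every UTF-8 byte becomes a separate Windows-1256 character.
# "cp1256" is Python's codec name for Windows-1256.
wrong = raw.decode("cp1256")

# Matching codec: the original five-letter word comes back.
right = raw.decode("utf-8")

# Undoing the wrong decode restores the original bytes, so printing the
# result looks like the input was never changed.
round_trip = wrong.encode("cp1256").decode("utf-8")

print(len(word), len(wrong))
print(right == word, round_trip == word)
```

In Python 3 a plain string literal is already Unicode, so no decode step should be needed before calling st.stem(w); decoding only applies when the input arrives as bytes.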