Python ISRIStemmer for Arabic text

Views: 30 Published: 2021/11/17 1:18:52 Tags: python utf-8 arabic stemming

Problem description

I am running the following code in IDLE (Python). I want to enter an Arabic string and get its stem, but it doesn't work:

>>> from nltk.stem.isri import ISRIStemmer
>>> st = ISRIStemmer()
>>> w= 'حركات'
>>> join = w.decode('Windows-1256')
>>> print st.stem(join).encode('Windows-1256').decode('utf-8')

Running it returns the same text that is in w, 'حركات', which is not the stem.

But when I do the following:

>>> print st.stem(u'اعلاميون')

the call succeeds and returns the stem, 'علم'.

Why does passing some words to the stem() function not return the stem?

Solution

OK, I solved the problem myself using the following:

w = 'حركات' 
st.stem(w.decode('utf-8'))

This correctly gives the root, "حرك".
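
A likely explanation, sketched here in Python 3 rather than the Python 2 of the original session: the plain 'حركات' literal was a byte string, and decoding it as Windows-1256 turned it into mojibake that the stemmer could not match against any Arabic pattern. Encoding back to Windows-1256 and decoding as UTF-8 then reversed the damage, so the unchanged word was printed. The sketch assumes the bytes were UTF-8, which fits the fact that decode('utf-8') fixed it. The NLTK call at the end is optional and skipped if NLTK is not installed.

```python
# Python 3 sketch; the original session above is Python 2.
word = 'حركات'

# Assumption: the Python 2 byte string held UTF-8 bytes, which is
# consistent with decode('utf-8') being the fix that worked.
raw = word.encode('utf-8')

# Decoding with the wrong codec yields mojibake, not the intended word.
wrong = raw.decode('windows-1256')

# Re-encoding and decoding as UTF-8 undoes the damage, which is why the
# unstemmed output still printed as the original word.
round_trip = wrong.encode('windows-1256').decode('utf-8')

# Decoding with the matching codec recovers the real Arabic text.
right = raw.decode('utf-8')

print(wrong == word)       # False
print(round_trip == word)  # True
print(right == word)       # True

try:
    from nltk.stem.isri import ISRIStemmer
    # In Python 3, str is already Unicode, so no decode step is needed.
    print(ISRIStemmer().stem(word))
except ImportError:
    pass  # NLTK not installed; the encoding demonstration above still runs.
```

The u'اعلاميون' call in the question worked for the same reason: the u prefix made the literal a Unicode string, so the stemmer received real Arabic characters.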
