NLTK available languages for stopwords
Question
I'm wondering where I can find the full list of supported languages (and their keys) for the NLTK stopwords.
I found a list at https://pypi.org/project/stop-words/, but it does not contain the key for each language. So it is not clear whether you can retrieve the list by simply calling stopwords.words("Bulgarian"). In fact, that throws an error.
I checked the NLTK site and found 4 documents matching "stopwords", but none of them describes this: https://www.nltk.org/search.html?q=stopwords&check_keywords=yes&area=default
Nothing is said about it in their book either: http://www.nltk.org/book/ch02.html#stopwords_index_term
So, do you know where I can find the list of keys?
Recommended answer
First check whether you have downloaded the NLTK packages. If not, you can download them as follows:
import nltk
# With no arguments this opens the interactive downloader; select the "stopwords" corpus there.
nltk.download()
After this, you can find the stopword language files in the path below.
C:/Users/username/AppData/Roaming/nltk_data/corpora/stopwords
It supports 21 languages (I installed nltk a few days ago, so this number should be up to date). You can pass a file name from that folder as the parameter in
nltk.corpus.stopwords.words('language')
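Since the keys are simply the file names in that folder, they can also be listed programmatically. The following is a minimal sketch, not an official recipe: `languages_from_directory` and `languages_from_nltk` are hypothetical helper names introduced here for illustration, and the folder path you pass in is whatever your own installation uses. The first helper only reads the directory; the second asks NLTK through the corpus reader's `fileids()` method and assumes nltk is installed and the stopwords corpus has been downloaded.

```python
from pathlib import Path


def languages_from_directory(stopwords_dir):
    """Return the language keys found in an nltk_data/corpora/stopwords folder."""
    directory = Path(stopwords_dir)
    # Assumption: each language is one plain file named after the language
    # (for example "english"). Skip subdirectories, README and hidden files.
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and not p.name.startswith(("README", "."))
    )


def languages_from_nltk():
    """Ask NLTK for the keys (needs nltk and the downloaded stopwords corpus)."""
    # Imported lazily so the directory helper above works without nltk installed.
    from nltk.corpus import stopwords

    return stopwords.fileids()
```

Either result can be passed straight to `stopwords.words(...)`. The keys are the file names exactly as they appear on disk, so if the files are stored in lowercase, a capitalized name will not match, and a language with no file in the folder is not available at all.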