NLTK available languages for stopwords

Question

I'm wondering where I can find the full list of supported languages (and their keys) for the NLTK stopwords corpus.

I found a list at https://pypi.org/project/stop-words/, but it does not give the key for each language. So it is not clear whether you can retrieve a list simply with stopwords.words("Bulgarian"). In fact, that call throws an error.

I searched the NLTK site and found 4 documents matching "stopwords", but none of them describes this. https://www.nltk.org/search.html?q=stopwords&check_keywords=yes&area=default

Their book says nothing about it either: http://www.nltk.org/book/ch02.html#stopwords_index_term

So, do you know where I can find the list of keys?

Recommended answer

First, check whether you have downloaded the NLTK data packages.
If not, you can download them using:

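import nltk
nltk.download()  # opens the interactive NLTK Downloader; nltk.download('stopwords') fetches only the stopwords corpus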

After this, you can find the stopword language files in the following path (on Windows):

C:/Users/username/AppData/Roaming/nltk_data/corpora/stopwords
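Because each language is simply one file in that folder, the keys can also be read straight from the file system. The following is a minimal sketch in plain Python, assuming the default Windows location shown above (adjust the path for your own nltk_data directory). It lists the file names and skips the README and hidden dot-files, which, as far as I know, mirrors the file filter NLTK itself applies to this corpus. `stopword_languages` is a hypothetical helper name, not part of NLTK.

```python
import re
from pathlib import Path

# Skip the README and hidden dot-files; every other plain file is one language.
# (Assumed to match the pattern NLTK uses when loading the stopwords corpus.)
_LANGUAGE_FILE = re.compile(r"(?!README|\.).*")


def stopword_languages(stopwords_dir):
    """Hypothetical helper: return the sorted language keys in a stopwords folder."""
    return sorted(
        p.name
        for p in Path(stopwords_dir).iterdir()
        if p.is_file() and _LANGUAGE_FILE.fullmatch(p.name)
    )


if __name__ == "__main__":
    # Assumption: default Windows location; change this for other systems.
    default_dir = Path.home() / "AppData" / "Roaming" / "nltk_data" / "corpora" / "stopwords"
    if default_dir.is_dir():
        print(stopword_languages(default_dir))
```

Each printed name is a key you can pass to stopwords.words().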

It supports 21 languages (I installed NLTK a few days ago, so this number should be up to date). You can pass a file name as the parameter to

nltk.corpus.stopwords.words('language')  # the file name, e.g. 'english'
