Adding words to scikit-learn's CountVectorizer's stop list

Views: 64 | Published: 2021/7/16 19:50:56 | Tags: python scikit-learn stop-words

Question

Scikit-learn's CountVectorizer class lets you pass the string 'english' to the stop_words argument. I want to add some words to this predefined list. Can anyone tell me how to do this?

Recommended answer

According to the source code for sklearn.feature_extraction.text, the full list of ENGLISH_STOP_WORDS (actually a frozenset, from stop_words) is exposed through __all__. So if you want to use that list plus some more items, you could do something like:

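# The text module exposes ENGLISH_STOP_WORDS.
from sklearn.feature_extraction import text

# Extend the built-in English stop words with your own terms.
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)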

(where my_additional_stop_words is any sequence of strings) and use the result as the stop_words argument. This input to CountVectorizer.__init__ is parsed by _check_stop_list, which will pass the new frozenset straight through.
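
To make the answer concrete, here is a minimal end-to-end sketch. The extra words ("spam", "eggs") and the two sample documents are made up for illustration and are not part of the original answer. The sketch also converts the combined set to a list before passing it in: this is an assumption made for compatibility, since newer scikit-learn releases appear to validate stop_words and expect a list, while a list should work in older releases as well.

```python
from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical extra stop words, chosen only for this illustration.
my_additional_stop_words = ["spam", "eggs"]

# frozenset.union() accepts any iterable and returns a new frozenset;
# the built-in ENGLISH_STOP_WORDS set itself is left unchanged.
stop_words = text.ENGLISH_STOP_WORDS.union(my_additional_stop_words)

# Assumption: pass a list rather than the frozenset, since recent
# scikit-learn versions may reject non-list collections for stop_words.
vectorizer = CountVectorizer(stop_words=list(stop_words))

# Made-up sample documents.
docs = [
    "the spam and the eggs are on the table",
    "a table with ham",
]
counts = vectorizer.fit_transform(docs)

# Only words outside the combined stop list remain in the vocabulary.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
```

Because union() returns a new set, the same built-in list can be reused to build several different custom stop lists without the lists interfering with one another.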
