Add/remove stop words with spaCy
Question
What is the best way to add or remove stop words with spaCy? I am using the token.is_stop attribute and would like to make some custom changes to the stop-word set. I looked through the documentation but could not find anything about stop words. Thanks!
Recommended answer
You can edit the is_stop flag on individual vocabulary entries before processing your text, like this (see this post):
>>> import spacy
>>> nlp = spacy.load("en")
>>> nlp.vocab["the"].is_stop = False
>>> nlp.vocab["definitelynotastopword"].is_stop = True
>>> sentence = nlp("the word is definitelynotastopword")
>>> sentence[0].is_stop
False
>>> sentence[3].is_stop
True
Note: This seems to work for spaCy versions <= v1.8. For newer versions, see other answers.
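For newer releases, the same idea can be sketched as follows. This is a minimal sketch, not a definitive recipe: it assumes spaCy v2 or v3 is installed and uses spacy.blank("en") so that no model download is needed. The helper name set_stop_words is hypothetical and exists only for illustration.

```python
import spacy

# Assumption: spaCy v2 or v3. spacy.blank("en") creates an English
# pipeline with only a tokenizer, so no trained model is required.
nlp = spacy.blank("en")


def set_stop_words(nlp, add=(), remove=()):
    """Hypothetical helper: flip the is_stop flag on vocabulary entries."""
    for word in add:
        nlp.vocab[word].is_stop = True
    for word in remove:
        nlp.vocab[word].is_stop = False


set_stop_words(nlp, add=["definitelynotastopword"], remove=["the"])

doc = nlp("the word is definitelynotastopword")
for token in doc:
    print(token.text, token.is_stop)
```

The flag is stored per vocabulary entry, so differently cased forms such as "The" may need to be updated separately.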