Add/remove stop words with spaCy

Question

What is the best way to add/remove stop words with spaCy? I am using the token.is_stop attribute and would like to make some custom changes to the stop-word set. I looked through the documentation but could not find anything about stop words. Thanks!

Answer

You can edit them before processing your text like this (see this post):

>>> import spacy
>>> nlp = spacy.load("en")
>>> nlp.vocab["the"].is_stop = False
>>> nlp.vocab["definitelynotastopword"].is_stop = True
>>> sentence = nlp("the word is definitelynotastopword")
>>> sentence[0].is_stop
False
>>> sentence[3].is_stop
True

Note: This seems to work in spaCy v1.8 and earlier. For newer versions, see the other answers.
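For newer releases, a commonly used approach (this sketch assumes spaCy v3 and may also work on v2; it has not been checked against every version) is to update both the language defaults' stop-word set and the flag on the lexeme itself. It uses `spacy.blank("en")` so that no trained model has to be downloaded. `update_stop_words` is a hypothetical helper written for illustration, not part of spaCy's API.

```python
import spacy

# Assumes spaCy v3: spacy.blank("en") builds an English pipeline with only a
# tokenizer, so no trained model (e.g. en_core_web_sm) needs to be installed.
nlp = spacy.blank("en")


def update_stop_words(nlp, add=(), remove=()):
    """Hypothetical helper: add/remove stop words for an existing pipeline."""
    for word in add:
        # Keep the language defaults in sync so lexemes created later agree.
        nlp.Defaults.stop_words.add(word)
        # Set the flag on the lexeme that the tokenizer will reuse.
        nlp.vocab[word].is_stop = True
    for word in remove:
        nlp.Defaults.stop_words.discard(word)
        nlp.vocab[word].is_stop = False


update_stop_words(nlp, add=["definitelynotastopword"], remove=["the"])

doc = nlp("the word is definitelynotastopword")
print([(token.text, token.is_stop) for token in doc])
```

Be aware that `nlp.Defaults.stop_words` is a class-level set shared by every English pipeline in the same Python process, so editing it affects all of them; setting only the lexeme flag keeps the change local to this pipeline's vocabulary.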
