Extract non-content English language words string - python
Question
I am working on a Python script in which I want to remove common English words like "the", "an", "and", "for" and many more from a string. Currently what I have done is build a local list of all such words, and I just call remove()
to remove them from the string. But I would like a more Pythonic way to achieve this. I have read about nltk and wordnet but am totally clueless about whether that is what I should use and how to use it.
EDIT
Well, I don't understand why this was marked as duplicate, as my question does not in any way mean that I knew about stop words and just wanted to know how to use them. The question was about what I could use in my scenario, and the answer to that was stop words, but when I posted this question I didn't know anything about stop words.
Answer
I have found that what I was looking for is this:
from nltk.corpus import stopwords
my_stop_words = stopwords.words('english')
Now I can remove or replace the words in my list/string wherever I find a match in my_stop_words, which is a list.
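The filtering step can be sketched as below. To keep the sketch runnable without NLTK installed, the stop-word set here is a hardcoded subset for illustration; in practice it would come from stopwords.words('english') as shown above, and the sample sentence is made up:

```python
# In practice:
#   from nltk.corpus import stopwords
#   stop = set(stopwords.words('english'))
# Hardcoded subset here so the sketch runs without the NLTK corpus.
stop = {"the", "an", "and", "for", "over"}

sentence = "the quick brown fox jumps over the lazy dog"

# Keep only the words that are not in the stop-word set
content_words = [w for w in sentence.lower().split() if w not in stop]
print(" ".join(content_words))  # quick brown fox jumps lazy dog
```

Converting the stop-word list to a set makes each membership test O(1), which matters when filtering large texts.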
For this to work I had to install NLTK for Python and then, using its downloader, download the stopwords package.
It also contains many other packages which can be used in different NLP situations, like words, brown, wordnet, etc.