Extract non-content English words from a string - Python


Question

I am working on a Python script in which I want to remove common English words such as "the", "an", "and", "for", and many more from a string. Currently I keep a local list of all such words and just call remove() to strip them from the string. I would like a more Pythonic way to achieve this. I have read about nltk and wordnet, but I have no idea whether that is what I should use or how to use it.

Edit

Well, I don't understand why this was marked as a duplicate. My question does not in any way imply that I already knew about stop words and only wanted to know how to use them. The question is about what I can use in my scenario, and the answer to that was stop words, but when I posted this question I didn't know anything about stop words.

Recommended answer

I found that what I was looking for is this:

from nltk.corpus import stopwords
my_stop_words = stopwords.words('english')

Now I can remove or replace the words in my list/string wherever I find a match in my_stop_words, which is a list.
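As a rough illustration of that filtering step, here is a minimal sketch. The helper name remove_stop_words and the short sample_stop_words list are made up for this example and are not part of NLTK; with NLTK available, the list returned by stopwords.words('english') could be passed in instead. The sketch assumes that splitting on runs of letters and apostrophes is good enough, so punctuation is dropped.

```python
import re


def remove_stop_words(text, stop_words):
    """Return text with every word that appears in stop_words dropped.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    # A set makes each membership test O(1) instead of scanning a list.
    stop_set = {word.lower() for word in stop_words}
    # Simple tokenization: runs of letters and apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", text)
    kept = [word for word in words if word.lower() not in stop_set]
    return " ".join(kept)


# Small stand-in list (an assumption for this example). With NLTK you
# could pass stopwords.words('english') here instead.
sample_stop_words = ["the", "an", "and", "for"]

print(remove_stop_words("The cat and the dog waited for an hour", sample_stop_words))
# prints: cat dog waited hour
```

Converting the list to a set first is a design choice that matters mostly for long texts, since the stop word list is checked once per word.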

For this to work I had to install NLTK for Python and then use its downloader to download the stopwords package.
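The download step can also be scripted. The sketch below assumes NLTK is already installed (for example with pip) and that the machine has network access on first run. The function name load_english_stop_words is hypothetical; nltk.download and stopwords.words are the NLTK calls it relies on.

```python
def load_english_stop_words():
    """Return NLTK's English stop word list, fetching the corpus if needed.

    Hypothetical helper for illustration.
    """
    # Imported inside the function so that this module can still be loaded
    # on a machine where NLTK is not installed.
    import nltk
    from nltk.corpus import stopwords

    try:
        return stopwords.words("english")
    except LookupError:
        # The corpus is not on disk yet: download it once, then retry.
        nltk.download("stopwords")
        return stopwords.words("english")
```

Calling load_english_stop_words() the first time may trigger the download; later calls read the corpus from the local NLTK data directory.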

The downloader also offers many other packages that can be used in different NLP situations, such as words, brown, wordnet, etc.
