How to remove quotes after removing stopwords from nltk?

Views: 122 | Published: 2021/6/7 20:44:03 | Tags: python-2.7 nltk stop-words


Problem description

I captured headers from newspapers and removed the stopwords from them, but after removing the stopwords each word comes out wrapped in single quotes. I don't want these quotes, so I tried the code below:

from nltk.corpus import stopwords
blog_posts=[]
stop = stopwords.words('english')+[
    '.',
    ',',
    '--',
    '\'s',
    '?',
    ')',
    '(',
    ':',
    '\'',
    '\'re',
    '"',
    '-',
    '}',
    '{',
    u'—',
    'a', 'able', 'about', 'above', 'according', 'accordingly', 'across', 'actually', 'after', 'afterwards', 'again', 'against', 'all', 'allow', 'allows', 'almost', 'alone', 'along', 'already', 'also', 'although', 'always', 'am', 'among', 'amongst', 'an', 'and', 'another', 'any', 'anybody',
]
file=open("resources/ch05-webpages/newspapers/TOI2232014.csv","r+")
t=[i for i in file.read().split() if i not in stop]
blog_posts.append((t,))
print blog_posts

The output of this code is:

[(['"\'Duplicates\'', 'BJP,', 'Jaswant', 'Singh', 'says"', '"Flight'],)]

But I want output like this:

 [([Duplicates,BJP,Jaswant,Singh,says,Flight])]

So what can I do to get this output?

Recommended answer

Yahoo, I finally found the answer to this question.

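# Remove quotes and other unwanted characters from every token that is not a stopword
t=[i.replace("\'","").replace("?","").replace(":","").replace("\"","").replace("#","").strip() 
  for i in file.read().split() if i not in stop]
#blog_posts.append((t,))
# Join the cleaned tokens into one string, so the words are no longer printed as separately quoted list items
p=' '.join(t)
blog_posts.append((p,))
print blog_posts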
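
One thing to watch in the snippet above: the stopword test runs on the raw token, before the quotes are removed, so a token such as `"the` would not match `the` in the stop list. A minimal sketch of the alternative order (strip first, then filter) follows. It is written for Python 3, whereas the question uses Python 2 print statements. The names `STOPWORDS`, `clean_tokens`, and `sample` are hypothetical, and the small stopword set is only a stand-in for `stopwords.words('english')` so the example runs without the NLTK corpus download. The sample text is rebuilt from the output shown in the question.

```python
import string

# Hypothetical stand-in for stopwords.words('english') plus the extra entries
# from the question; swap in the real list when NLTK data is available.
STOPWORDS = {"a", "an", "and", "the", "of", "in"}


def clean_tokens(text, stopwords=STOPWORDS):
    """Split on whitespace, strip surrounding punctuation, then drop stopwords."""
    cleaned = []
    for raw in text.split():
        # str.strip only removes characters at both ends of the token,
        # so punctuation inside a word (e.g. "don't") is left alone.
        token = raw.strip(string.punctuation)
        # Skip tokens that were pure punctuation, and compare in lower case
        # so that "The" is filtered as well as "the".
        if token and token.lower() not in stopwords:
            cleaned.append(token)
    return cleaned


# Sample text reconstructed from the tokens printed in the question.
sample = '"\'Duplicates\' BJP, Jaswant Singh says" "Flight'
tokens = clean_tokens(sample)

print(tokens)             # ['Duplicates', 'BJP', 'Jaswant', 'Singh', 'says', 'Flight']
print(", ".join(tokens))  # Duplicates, BJP, Jaswant, Singh, says, Flight
```

The first `print` still shows quotes around each word because that is how Python displays strings inside a list; the characters are not part of the data. Joining the tokens, as the answer does with `' '.join(t)`, gives the quote-free text.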
