Count Words in Python

Views: 121 Published: 2020/5/2 6:26:06 Tags: python list scikit-learn


Problem description

I have a list of strings in Python.

list = [ "Sentence1. Sentence2...", "Sentence1. Sentence2...",...]

I want to remove stop words and count the occurrences of each word across all the strings combined. Is there a simple way to do it?

I am currently thinking of using CountVectorizer() from scikit-learn and then iterating over each word and combining the results.

Recommended answer

If you don't mind installing a new Python library, I suggest you use gensim. The first tutorial does exactly what you ask:

# remove common words and tokenize
# (documents is the list of strings from the question)
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]
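The combined count the question asks for can be read straight off the tokenized texts. Below is a minimal sketch using only the standard library; the sample `documents` and the name `word_counts` are made up for illustration and are not part of the gensim tutorial.

```python
from collections import Counter

# Hypothetical sample input; substitute your own list of strings.
documents = [
    "The cat sat in the hat.",
    "A cat and a dog went to the park.",
]

# Same stop-word filter as above.
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# Count every remaining token across all documents combined.
word_counts = Counter(word for text in texts for word in text)
print(word_counts.most_common(3))
```

Note that a plain split() leaves punctuation attached to tokens (for example "hat."), so real text would likely need a proper tokenizer before counting.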

You will then need to create the dictionary for your corpus of documents and create the bag-of-words.

from gensim import corpora

dictionary = corpora.Dictionary(texts)
dictionary.save('/tmp/deerwester.dict')  # store the dictionary, for future reference
print(dictionary)
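To make the bag-of-words idea concrete, here is a plain-Python sketch: give each distinct token an integer id, then represent each document as (id, count) pairs. This illustrates the concept only and is not gensim's implementation; `token2id` is an ordinary dict built by hand, and `to_bow` and the sample `texts` are hypothetical.

```python
from collections import Counter

# Hypothetical tokenized documents (stop words already removed).
texts = [["cat", "sat", "hat"], ["cat", "dog", "park", "dog"]]

# Assign each distinct token an integer id in order of first appearance.
token2id = {}
for text in texts:
    for token in text:
        token2id.setdefault(token, len(token2id))

def to_bow(text):
    """Return a sorted list of (token_id, count) pairs for one document."""
    counts = Counter(token2id[token] for token in text)
    return sorted(counts.items())

corpus = [to_bow(text) for text in texts]
print(corpus)
```

In gensim itself, the corresponding step would be calling dictionary.doc2bow(text) on each tokenized document.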

You can then weight the result using tf-idf or a similar scheme, and running LDA afterward is quite easy.
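As a rough illustration of what tf-idf weighting does to raw counts, the sketch below uses one common formulation, weight = tf * log2(N / df). That formula is an assumption chosen for illustration; a library's defaults (including any normalization) may differ, and the names `doc_freq` and `tfidf` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tokenized documents (stop words already removed).
texts = [["cat", "sat", "hat"], ["cat", "dog", "park", "dog"]]

n_docs = len(texts)

# Document frequency: in how many documents each token appears.
doc_freq = Counter(token for text in texts for token in set(text))

def tfidf(text):
    """Map each token in one document to tf * log2(N / df)."""
    term_freq = Counter(text)
    return {token: tf * math.log2(n_docs / doc_freq[token])
            for token, tf in term_freq.items()}

weights = [tfidf(text) for text in texts]
print(weights[1])
```

A word that appears in every document (here "cat") gets weight zero, while words concentrated in one document are weighted up.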

See tutorial 1 for details.
