Count the number of occurrences of each word in a text - Python

Question

I know that I can check whether a word occurs in a text or a list like this:

if word in text:
    print('success')

What I want to do is read each word in a text and keep a running count of how many times it is found (a simple counter task). The problem is that I do not know how to recognize words that have already been read. In short: how do I count the number of occurrences of each word?

I have thought of saving the words in an array (or even a multidimensional array holding each word together with the number of times it appears, or two parallel arrays) and adding 1 every time a word that is already in that array appears again.

So, when I read a word, could I check whether it has not been seen yet with something similar to this:

if word not in wordsInText:
    print('success')

Recommended answer

Now that we have established what you are trying to achieve, I can give you an answer. The first thing you need to do is convert the text into a list of words. While the split method might look like a good solution, it causes problems in the actual counting whenever a word is directly followed by a full stop, a comma, or any other punctuation character, because the punctuation stays attached to the word. A good solution to this problem is NLTK. Assume that your text is stored in a variable called text. The code you are looking for would look something like this:

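from itertools import chain
from collections import Counter
from nltk.tokenize import sent_tokenize, word_tokenize

# Note: these tokenizers require NLTK's Punkt tokenizer data to be installed.
text = "This is an example text. Let us use two sentences, so that it is more logical."
# Split the text into sentences, tokenize each sentence, and flatten the result into one list.
wordlist = list(chain.from_iterable(word_tokenize(s) for s in sent_tokenize(text)))
# Counter maps each token to its number of occurrences; punctuation marks count as tokens too.
print(Counter(wordlist))
# Counter({'.': 2, 'is': 2, 'us': 1, 'more': 1, ',': 1, 'sentences': 1, 'so': 1, 'This': 1, 'an': 1, 'two': 1, 'it': 1, 'example': 1, 'text': 1, 'logical': 1, 'Let': 1, 'that': 1, 'use': 1})
# (the order in which the entries are displayed may differ)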
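
If installing NLTK is not an option, the manual approach described in the question can be sketched with the standard library alone. This is only an illustration under a few assumptions that are not part of the answer above: a "word" is taken to be a run of letters, digits, or apostrophes, the text is lowercased so that "This" and "this" count as the same word, and the names words and counts are made up for this example. A dictionary plays the role of the "array of words and counts", and the "if word not in ..." test from the question decides whether a word has been seen before.

```python
import re
from collections import Counter

text = "This is an example text. Let us use two sentences, so that it is more logical."

# Assumption: a word is a run of letters, digits, or apostrophes.
# Lowercasing is also an assumption; drop .lower() to count case-sensitively.
words = re.findall(r"[A-Za-z0-9']+", text.lower())

# Manual counting, following the idea in the question:
# the dictionary maps each word to the number of times it has been seen.
counts = {}
for word in words:
    if word not in counts:
        # First time this word is seen: start its counter.
        counts[word] = 0
    counts[word] += 1

# Counter performs the same bookkeeping in a single call.
assert counts == dict(Counter(words))

print(counts["is"])                # 2
print(text.split().count("text"))  # 0, because split() yields "text." with the full stop attached
```

The last line shows the problem with a plain split: the token is "text." rather than "text", so the word is never matched. Unlike the NLTK version, this sketch discards punctuation instead of counting it as separate tokens.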
