To find the frequency of every word in a text file in Python

Question

I want to find the frequency of every word in my text file so that I can find the most frequently occurring words. Can someone please tell me which command to use for that?

import nltk
text1 = "hello he heloo hello hi "  # example text
fdist1 = nltk.FreqDist(text1)

I used the code above, but the problem is that it does not give word frequencies; instead it shows the frequency of every character. I would also like to know how to read the text from a text file.

Recommended answer

I ran your example and saw the same result. For it to work properly, you have to split the string into words first. If you pass the string itself, FreqDist iterates over it character by character and counts each character, which is what you were seeing. Splitting on whitespace gives the count of each word rather than each character.

import nltk

text1 = 'hello he heloo hello hi '
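# split() with no argument splits on any whitespace and drops the
# empty string that split(' ') would produce from the trailing space
text1 = text1.split()
fdist1 = nltk.FreqDist(text1)
print(fdist1.most_common(50))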
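
The per-character counts come from the fact that the counter iterates over whatever it is given, and iterating over a string yields single characters. In NLTK 3, `FreqDist` is a subclass of `collections.Counter`, so the minimal sketch below uses the standard-library `Counter` to show the same behavior without requiring NLTK. The variable names are illustrative only and are not part of the original answer.

```python
from collections import Counter

text = "hello he heloo hello hi "

# Iterating over a string yields single characters,
# so this counts characters (including spaces).
char_counts = Counter(text)

# split() with no argument splits on any run of whitespace
# and drops empty strings, so this counts words.
word_counts = Counter(text.split())

print(char_counts.most_common(3))
print(word_counts.most_common(3))
```

The same list of words can be passed to `nltk.FreqDist` if you want NLTK's extra methods on top of the counts.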

If you want to read the text from a file and get the word counts, you can do it as follows. Suppose input.txt contains:

hello he heloo hello hi
my username is heinst
your username is frooty

Python code:

import nltk

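# Read the whole file into one string
with open("input.txt", "r") as myfile:
    data = myfile.read()

# split() with no argument splits on any whitespace, including newlines,
# and does not produce empty-string tokens
data = data.split()
fdist1 = nltk.FreqDist(data)
print(fdist1.most_common(50))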
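
Splitting on whitespace alone treats `Hello`, `hello` and `hello,` as three different words. If that matters for your file, one possible refinement is sketched below: lowercase the text and pull out words with a regular expression before counting. This goes beyond the original answer; the function name `word_frequencies` is hypothetical, and the sketch assumes the file is UTF-8 encoded and that "word" means a run of letters, digits or underscores. It uses only the standard library.

```python
import re
from collections import Counter


def word_frequencies(path):
    """Count words in a text file, ignoring case and punctuation.

    Hypothetical helper for illustration; assumes UTF-8 input.
    """
    with open(path, "r", encoding="utf-8") as f:
        text = f.read().lower()
    # \w+ matches runs of letters, digits and underscores,
    # so surrounding punctuation is left out of each word.
    return Counter(re.findall(r"\w+", text))
```

With the input.txt shown above, `word_frequencies("input.txt").most_common(10)` would list the ten most frequent words together with their counts.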
