Word frequency count based on two words using python

Problem description

There are many resources online that show how to do a word count for single words, like this and this and this and others...
But I was not able to find a concrete example of counting the frequency of two-word pairs.

I have a csv file that has some strings in it.

FileList = "I love TV show makes me happy, I love also comedy show makes me feel like flying"

So I want the output to be like:

wordscount = {"I love": 2, "show makes": 2, "makes me": 2}

Of course, I will have to strip all the commas, question marks and other punctuation: {!, , ", ', ?, ., (,), [, ], ^, %, #, @, &, *, -, _, ;, /, \, |, }

I will also remove some stop words, which I found here, just to get more concrete data from the text.

How can I achieve this result using Python?

Thanks!

Solution

>>> from collections import Counter
>>> import re
>>> 
>>> sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
>>> words = re.findall(r'\w+', sentence)
>>> two_words = [' '.join(ws) for ws in zip(words, words[1:])]
>>> wordscount = {w:f for w, f in Counter(two_words).most_common() if f > 1}
>>> wordscount
{'show makes': 2, 'makes me': 2, 'I love': 2}
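
The question also mentions removing stop words, which the session above does not cover. The following is only a sketch of one way to fold that in, not part of the original answer: the function name `count_word_pairs`, its parameters, and the tiny `STOP_WORDS` set are all hypothetical, and the sketch assumes that lowercasing the text is acceptable.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; substitute your own.
STOP_WORDS = {"also", "me"}

def count_word_pairs(text, stop_words=frozenset(), min_count=2):
    """Count adjacent word pairs in text, ignoring case and stop words."""
    # \w+ matches runs of letters, digits and underscores, so other
    # punctuation is dropped (underscores are kept).
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in stop_words]
    # Pair each remaining word with its successor and count the pairs.
    pairs = Counter(" ".join(pair) for pair in zip(words, words[1:]))
    # Keep only the pairs seen at least min_count times.
    return {pair: n for pair, n in pairs.items() if n >= min_count}

sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
print(count_word_pairs(sentence))
# {'i love': 2, 'show makes': 2, 'makes me': 2}
print(count_word_pairs(sentence, stop_words=STOP_WORDS))
# {'i love': 2, 'show makes': 2}
```

Note that stop words are filtered before pairing, so words that were separated only by a stop word become adjacent. If that is not wanted, build the pairs first and then discard any pair that contains a stop word.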
