Word frequency count based on two words using Python
Problem description
There are many resources online that show how to do a word count for single words, like this and this and this and others...
But I was not able to find a concrete example of counting the frequency of two-word pairs. I have a csv file that has some strings in it.
FileList = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
So I want the output to be like:
wordscount = {"I love": 2, "show makes": 2, "makes me": 2}
Of course I will have to strip all the commas, question marks and other punctuation:
{!, , ", ', ?, ., (,), [, ], ^, %, #, @, &, *, -, _, ;, /, \, |, }
I will also remove some stop words which I found here just to get more concrete data from the text.
How can I achieve this result using Python?
Thanks!
Solution
>>> from collections import Counter
>>> import re
>>>
>>> sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
>>> words = re.findall(r'\w+', sentence)
>>> two_words = [' '.join(ws) for ws in zip(words, words[1:])]
>>> wordscount = {w: f for w, f in Counter(two_words).most_common() if f > 1}
>>> wordscount
{'show makes': 2, 'makes me': 2, 'I love': 2}
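The question also mentions removing stop words, which the session above does not cover. Here is a minimal sketch of the same zip-based pairing as a reusable function. The function name `count_word_pairs` and the parameters `stop_words` and `min_count` are hypothetical, chosen only for illustration. It assumes stop words are dropped before pairing, so words on either side of a removed stop word count as adjacent. Note that `\w+` keeps underscores, so `_` would need separate handling if it must be stripped too.

```python
import re
from collections import Counter


def count_word_pairs(text, stop_words=frozenset(), min_count=2):
    """Count adjacent word pairs in text (hypothetical helper, not from the answer)."""
    # \w+ matches runs of letters, digits and underscores, so commas,
    # question marks and similar punctuation never end up in a word.
    words = [w for w in re.findall(r"\w+", text) if w.lower() not in stop_words]
    # zip(words, words[1:]) yields each word together with its successor.
    pairs = Counter(" ".join(pair) for pair in zip(words, words[1:]))
    # Keep only pairs that occur at least min_count times.
    return {pair: n for pair, n in pairs.items() if n >= min_count}


sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
print(count_word_pairs(sentence))
# Assumed stop-word set, purely for demonstration.
print(count_word_pairs(sentence, stop_words={"also", "me"}))
```

With "me" treated as a stop word, the pair "makes me" disappears from the result, while "I love" and "show makes" still occur twice. For the csv case, the same function could be called on each string read from the file.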