Count the 100 most frequent words from sentences in a Pandas DataFrame

Question

I have text reviews in one column of a Pandas DataFrame, and I want to find the N most frequent words, with their frequency counts, across the whole column (NOT within a single cell). One approach is to count the words with a Counter by iterating through each row. Is there a better alternative?
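For comparison, the row-by-row Counter approach described in the question might look like the sketch below. The sample DataFrame and its `text` column name are hypothetical stand-ins for the real review data.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data standing in for the review column.
df = pd.DataFrame({"text": [
    "a heartening tale of small victories",
    "no sophomore slump for director sam mendes",
    "a tale of small things",
]})

# Row-by-row approach: update a single Counter with the words of each review.
counts = Counter()
for review in df["text"]:
    counts.update(review.split())

# Ask for the 100 most frequent words (fewer are returned if there are fewer).
top = counts.most_common(100)
print(top[:3])
```

Calling `update` on each row yields the same counts as joining the whole column into one string and splitting it once; the join version simply replaces the explicit loop with a single call.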

Representative data:

0    a heartening tale of small victories and endu
1    no sophomore slump for director sam mendes  w
2    if you are an actor who can relate to the sea
3    it's this memory-as-identity obviation that g
4    boyd's screenplay ( co-written with guardian

Recommended answer

from collections import Counter
# Join every review into one string, split it on whitespace, and count each token
Counter(" ".join(df["text"]).split()).most_common(100)

I'm pretty sure this will give you what you want (you might have to remove some non-words from the Counter before calling most_common).
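One way to handle the non-words mentioned above is to tokenize with a regular expression instead of a plain `split()`. The sketch below assumes a "word" is a lowercase run of letters, optionally joined by apostrophes or hyphens, so `boyd's` and `co-written` stay whole while a lone `(` is dropped. The pattern, the lowercasing, the helper names `top_words` and `top_words_pandas`, and the sample data are all hypothetical choices, not part of the original answer. The second function swaps in pandas' own `Series.str.findall`, `explode` and `value_counts` in place of `Counter`.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical sample data; real reviews contain punctuation such as "(".
df = pd.DataFrame({"text": [
    "boyd's screenplay ( co-written with guardian",
    "it's this memory-as-identity obviation",
    "No sophomore slump for director Sam Mendes -- with Sam",
]})

# Assumption: a word is letters, optionally joined by apostrophes or hyphens.
WORD_PATTERN = r"[a-z]+(?:['-][a-z]+)*"
WORD_RE = re.compile(WORD_PATTERN)


def top_words(texts: pd.Series, n: int = 100) -> list[tuple[str, int]]:
    """Lowercase each text, keep only word-like tokens, return the n most common."""
    counts = Counter()
    for text in texts.dropna():  # skip missing reviews instead of failing
        counts.update(WORD_RE.findall(text.lower()))
    return counts.most_common(n)


def top_words_pandas(texts: pd.Series, n: int = 100) -> pd.Series:
    """Same idea using pandas string methods instead of an explicit loop."""
    words = texts.dropna().str.lower().str.findall(WORD_PATTERN)
    # explode() puts one token per row; value_counts() sorts by frequency.
    return words.explode().value_counts().head(n)


print(top_words(df["text"], n=5))
print(top_words_pandas(df["text"], n=5))
```

Tokens such as `(` and `--` never match the pattern, so nothing has to be deleted from the counts afterwards. A different definition of "word" (for example, splitting hyphenated terms) only requires changing the pattern.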