Counting the Frequency of words in a pandas data frame

Posted: 2020/5/18 1:11:33    Tags: python pandas nltk


Question

I have a table like the one below:

      URN                   Firm_Name
0  104472               R.X. Yah & Co
1  104873        Big Building Society
2  109986          St James's Society
3  114058  The Kensington Society Ltd
4  113438      MMV Oil Associates Ltd

I want to count the frequency of all the words within the Firm_Name column, to get an output that lists each word together with the number of times it occurs.

I tried the following code:

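import pandas as pd
import nltk
data = pd.read_csv(r"X:\Firm_Data.csv")  # raw string so the backslash in the path is kept literally
top_N = 20
# FreqDist counts each full Firm_Name value as one sample, not its individual words
word_dist = nltk.FreqDist(data['Firm_Name'])
print('All frequencies')
print('=' * 60)
rslt = pd.DataFrame(word_dist.most_common(top_N), columns=['Word', 'Frequency'])

print(rslt)
print('=' * 60)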

However, this code does not produce a count for each unique word.
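Why this happens: nltk.FreqDist counts the items of whatever iterable it is given, and iterating a pandas Series yields whole cell values, so every complete firm name becomes one sample with a count of 1. The sketch below reproduces this with collections.Counter (FreqDist is a Counter subclass, so the counting behaviour is the same); the five sample rows are typed in by hand as an assumption instead of being read from the CSV.

```python
from collections import Counter

import pandas as pd

# Sample rows from the table above, built inline instead of via read_csv
df = pd.DataFrame({
    "URN": [104472, 104873, 109986, 114058, 113438],
    "Firm_Name": [
        "R.X. Yah & Co",
        "Big Building Society",
        "St James's Society",
        "The Kensington Society Ltd",
        "MMV Oil Associates Ltd",
    ],
})

# What the original code effectively counts: one sample per full name
name_counts = Counter(df["Firm_Name"])
print(name_counts.most_common(2))

# What is wanted: split every name on whitespace and count the words
word_counts = Counter(word for name in df["Firm_Name"] for word in name.split())
print(word_counts.most_common(2))
```

Passing the same flattened word generator to nltk.FreqDist instead of Counter should give the same counts.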

Recommended answer

IIUC (if I understand correctly), split each name into words and use value_counts():

In [3361]: df.Firm_Name.str.split(expand=True).stack().value_counts()
Out[3361]:
Society       3
Ltd           2
James's       1
R.X.          1
Yah           1
Associates    1
St            1
Kensington    1
MMV           1
Big           1
&             1
The           1
Co            1
Oil           1
Building      1
dtype: int64
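A related route, not part of the original answer and offered only as a sketch: str.split() without expand=True returns one list of words per row, and Series.explode() (available since pandas 0.25) gives each list element its own row, so there is no None padding to stack away before calling value_counts(). The sample data is again typed in by hand.

```python
import pandas as pd

# Sample names from the question's table
df = pd.DataFrame({"Firm_Name": [
    "R.X. Yah & Co",
    "Big Building Society",
    "St James's Society",
    "The Kensington Society Ltd",
    "MMV Oil Associates Ltd",
]})

# One list of words per row, e.g. ['Big', 'Building', 'Society']
words = df["Firm_Name"].str.split()

# explode() repeats the row index once per word; value_counts() tallies the words
counts = words.explode().value_counts()
print(counts.head(2))
```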

Or,

import numpy as np
pd.Series(np.concatenate([x.split() for x in df.Firm_Name])).value_counts()

Or,

pd.Series(' '.join(df.Firm_Name).split()).value_counts()

For the top N, for example 3:

In [3379]: pd.Series(' '.join(df.Firm_Name).split()).value_counts()[:3]
Out[3379]:
Society    3
Ltd        2
James's    1
dtype: int64
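To get the same two-column layout the question builds from most_common() (columns Word and Frequency), the counts Series can be turned into a DataFrame. A minimal sketch: top_N mirrors the question's variable, and the inline sample data stands in for the CSV file.

```python
import pandas as pd

# Sample names from the question's table
df = pd.DataFrame({"Firm_Name": [
    "R.X. Yah & Co",
    "Big Building Society",
    "St James's Society",
    "The Kensington Society Ltd",
    "MMV Oil Associates Ltd",
]})
top_N = 3

rslt = (
    pd.Series(" ".join(df["Firm_Name"]).split())
    .value_counts()                 # sorted by count, highest first
    .head(top_N)                    # same rows as [:top_N]
    .rename_axis("Word")            # the index holds the words
    .reset_index(name="Frequency")  # counts become the Frequency column
)
print(rslt)
```

Words that tie on count (such as the many words seen once) may appear in a different order between pandas versions, so only the top rows with distinct counts are stable.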

Details

In [3380]: df
Out[3380]:
      URN                   Firm_Name
0  104472               R.X. Yah & Co
1  104873        Big Building Society
2  109986          St James's Society
3  114058  The Kensington Society Ltd
4  113438      MMV Oil Associates Ltd
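To reproduce the session, the DataFrame above can be built directly and the three one-liners from the answer compared. A sketch with the data typed in by hand; because words that tie on count can come out in a different order, the results are compared as dictionaries rather than position by position.

```python
import numpy as np
import pandas as pd

# The DataFrame shown above, typed in by hand
df = pd.DataFrame({
    "URN": [104472, 104873, 109986, 114058, 113438],
    "Firm_Name": [
        "R.X. Yah & Co",
        "Big Building Society",
        "St James's Society",
        "The Kensington Society Ltd",
        "MMV Oil Associates Ltd",
    ],
})

# 1) split into padded columns, stack into one column, count
via_stack = df.Firm_Name.str.split(expand=True).stack().value_counts()
# 2) split each name and concatenate the word lists with NumPy
via_concat = pd.Series(np.concatenate([x.split() for x in df.Firm_Name])).value_counts()
# 3) join all names into one string and split it once
via_join = pd.Series(" ".join(df.Firm_Name).split()).value_counts()

# Dict comparison ignores the order of tied words
print(via_stack.to_dict() == via_concat.to_dict() == via_join.to_dict())
```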
