How to count specific words in a pandas Series?

Question

I am trying to count how often certain keywords occur in a column of a pandas DataFrame, like this:

import pandas as pd

df = pd.read_csv('amazon_baby.csv')
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

The selected_words have to be counted in the Series df['review'].

I tried:

def word_counter(sent):
    a = {}
    for word in selected_words:
        a[word] = sent.count(word)
    return a

and then:

df['totalwords'] = df.review.str.split()
df['word_count'] = df.totalwords.apply(word_counter)

----------------------------------------------------------------------------
----> 1 df['word_count'] = df.totalwords.apply(word_counter)

c:\users\admin\appdata\local\programs\python\python36\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-51-cd11c5eb1f40> in word_counter(sent)
  2     a={}
  3     for word in selected_words:
----> 4         a[word] = sent.count(word)
  5     return a

AttributeError: 'float' object has no attribute 'count'

Can someone help? I am guessing it is because some faulty value in the Series is not a string.
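That guess is consistent with the traceback. A minimal sketch of the likely cause, assuming some reviews in the CSV are empty: pandas reads an empty cell as NaN, which is a float, and .str.split() passes it through unchanged, so word_counter later receives a float instead of a list. The two-row Series below is a hypothetical stand-in for df['review'], not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['review']; the second review is missing.
reviews = pd.Series(["great product, love it", np.nan])

tokens = reviews.str.split()

# The first row becomes a list of words; the missing row stays NaN.
first = tokens.iloc[0]
missing = tokens.iloc[1]

print(first)
print(isinstance(missing, float))   # NaN is a float
print(hasattr(missing, "count"))    # floats have no .count method
```

Running df['review'].isna().sum() on the real data would show how many rows are affected.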

Some people have tried to help, but the issue is that the individual cells in the DataFrame contain whole sentences.

I need to extract a count of the selected words, preferably as a dictionary, and store it in a new column of the same DataFrame, aligned with the corresponding rows.

The data is in CSV format.

Recommended answer

Suppose your DataFrame looks like this:

df=pd.DataFrame({'A': ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate','great', 'fantastic', 'amazing', 'love', 'horrible']})
print(df)
    A
0   awesome
1   great
2   fantastic
3   amazing
4   love
5   horrible
6   bad
7   terrible
8   awful
9   wow
10  hate
11  great
12  fantastic
13  amazing
14  love
15  horrible

selected_words=['awesome','great','fantastic']

df.loc[df['A'].isin(selected_words),'A'].value_counts()
[out]
great        2
fantastic    2
awesome      1
Name: A, dtype: int64
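The isin approach above counts cells that are exactly equal to a selected word. The question describes cells that hold whole sentences, some of them possibly missing, so here is a minimal sketch for that case. It is not part of the original answer. The three-row DataFrame is a hypothetical stand-in for amazon_baby.csv, and the choices to lowercase the text and to match whole words only are assumptions.

```python
import re
from collections import Counter

import numpy as np
import pandas as pd

selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love',
                  'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

# Hypothetical stand-in for the real file; the second review is missing.
df = pd.DataFrame({'review': ["Great toy, my son loves it. Great price too!",
                              np.nan,
                              "Awful quality. I hate it, just awful."]})

def word_counter(text):
    """Return {word: count} for every selected word in one review."""
    # Missing reviews arrive as NaN (a float); treat them as empty text.
    if not isinstance(text, str):
        text = ''
    # Lowercase and split into word tokens so punctuation does not block a match.
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    return {word: tokens[word] for word in selected_words}

# One dictionary per row, stored next to the review it came from.
df['word_count'] = df['review'].apply(word_counter)
print(df['word_count'].iloc[2])
```

Because this matches whole words, "loves" is not counted as "love". If substring matches are wanted instead, text.lower().count(word) inside the dictionary comprehension would do that, at the cost of also counting "bad" inside words such as "badge".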
