When is it appropriate to use df.value_counts() vs df.groupby('...').count()?

Question

I've heard that in Pandas there are often multiple ways to do the same thing, but I was wondering:

If I'm trying to group data by the values in a specific column and count the number of items with each value, when does it make sense to use df.groupby('colA').count(), and when does it make sense to use df['colA'].value_counts()?

Recommended answer

There is a difference. For value_counts, the result is sorted by frequency:

The resulting object will be in descending order so that the first element is the most frequently-occurring element.

count does not do this. It sorts the output by the index, which is created from the column passed to groupby('col').

df.groupby('colA').count() 

aggregates all columns of df with the count function, so for each group it counts the values in every other column, excluding NaNs.

So if you need the count for only one column, use:

df.groupby('colA')['colA'].count() 

Example:

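import numpy as np
import pandas as pd

# colA is listed first so the column order matches the printed output below
df = pd.DataFrame({'colA':['c','c','b','a',np.nan,'b','b'],
                   'colB':list('abcdefg'),
                   'colC':[1,3,5,7,np.nan,np.nan,4],
                   'colD':[np.nan,3,6,9,2,4,np.nan]})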

print (df)
  colA colB  colC  colD
0    c    a   1.0   NaN
1    c    b   3.0   3.0
2    b    c   5.0   6.0
3    a    d   7.0   9.0
4  NaN    e   NaN   2.0
5    b    f   NaN   4.0
6    b    g   4.0   NaN

print (df['colA'].value_counts())
b    3
c    2
a    1
Name: colA, dtype: int64

print (df.groupby('colA').count())
      colB  colC  colD
colA                  
a        1     1     1
b        3     2     2
c        2     2     1

print (df.groupby('colA')['colA'].count())
colA
a    1
b    3
c    2
Name: colA, dtype: int64
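
To make the comparison concrete, here is a minimal sketch on a smaller frame that reuses the colA and colC values from the example. The variable names (by_frequency, by_key, same_counts, colc_counts) are illustrative and not part of the original answer. It checks that the two single-column approaches give the same numbers in a different order, and that count() skips NaNs in the column being counted.

```python
import numpy as np
import pandas as pd

# Smaller frame reusing the colA / colC values from the example above
df = pd.DataFrame({'colA': ['c', 'c', 'b', 'a', np.nan, 'b', 'b'],
                   'colC': [1, 3, 5, 7, np.nan, np.nan, 4]})

# value_counts: ordered by frequency, most common value first
by_frequency = df['colA'].value_counts()

# groupby + count on a single column: ordered by the group key
by_key = df.groupby('colA')['colA'].count()

# Same numbers, different order: sorting value_counts by index lines them up
same_counts = by_frequency.sort_index().tolist() == by_key.tolist()

# count() skips NaN in the counted column, so group 'b' has 2 colC values
# even though 'b' appears 3 times in colA
colc_counts = df.groupby('colA')['colC'].count()

print(by_frequency.index.tolist())  # ['b', 'c', 'a']
print(by_key.index.tolist())        # ['a', 'b', 'c']
print(same_counts)                  # True
print(colc_counts['b'])             # 2
```

In short, value_counts is the more direct choice when you want the frequencies of one column ranked from most to least common, while groupby(...).count() is useful when you want per-group non-NaN counts for other columns.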
