How to count the occurrences of each unique value in pandas


Question

I have a large pandas DataFrame and would like to count the occurrences of each unique value in it. I tried the following, but it takes too much time and memory. How can I do it in a pythonic way?

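import numpy as np
import pandas as pd

# packets is the DataFrame in question; this loop walks it row by row (slow)
pack = []
for index, row in packets.iterrows():
    pack.extend(pd.Series(row).dropna().values.tolist())

unique, count = np.unique(pack, return_counts=True)
counts = np.asarray((unique, count))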

Solution

It seems like you want to compute value counts across all columns. You can flatten the DataFrame's values into a single Series, drop NaNs, and call value_counts. Here's a sample -

df

     a    b
0  1.0  NaN
1  1.0  NaN
2  3.0  3.0
3  NaN  4.0
4  5.0  NaN
5  NaN  4.0
6  NaN  5.0

pd.Series(df.values.ravel()).dropna().value_counts()

5.0    2
4.0    2
3.0    2
1.0    2
dtype: int64
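A self-contained version of the sample above, rebuilding df from the printed table so the snippet runs on its own. One detail worth knowing: Series.value_counts already excludes NaN by default (dropna=True), so the explicit .dropna() is harmless but not strictly required.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame shown above
df = pd.DataFrame({
    "a": [1.0, 1.0, 3.0, np.nan, 5.0, np.nan, np.nan],
    "b": [np.nan, np.nan, 3.0, 4.0, np.nan, 4.0, 5.0],
})

# Flatten every column into one 1-D array, wrap it in a Series, then count.
# value_counts() skips NaN by default; dropna() just makes that explicit.
counts = pd.Series(df.values.ravel()).dropna().value_counts()
print(counts)
```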

Another approach uses np.unique -

u, c = np.unique(pd.Series(df.values.ravel()).dropna().values, return_counts=True)
pd.Series(c, index=u)

1.0    2
3.0    2
4.0    2
5.0    2
dtype: int64
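If every column is numeric, the np.unique variant can skip the intermediate pandas Series and drop NaN with a boolean mask instead. This is a sketch under that assumption: np.isnan raises a TypeError on object (e.g. string) columns, and without the mask np.unique would report NaN as one of the values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 1.0, 3.0, np.nan, 5.0, np.nan, np.nan],
    "b": [np.nan, np.nan, 3.0, 4.0, np.nan, 4.0, 5.0],
})

# Assumes all columns are numeric so they can be cast to a float array
arr = df.to_numpy(dtype=float).ravel()
arr = arr[~np.isnan(arr)]  # keep only non-NaN entries

# np.unique returns the unique values sorted ascending, plus their counts
u, c = np.unique(arr, return_counts=True)
result = pd.Series(c, index=u)
print(result)
```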

Note that the first method sorts results in descending order of counts, while the second returns them sorted by value in ascending order, because np.unique returns its unique values sorted.
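If you want the value_counts result in the same order as the np.unique one, calling sort_index() on it reorders the entries by value, ascending. A minimal sketch on the same sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 1.0, 3.0, np.nan, 5.0, np.nan, np.nan],
    "b": [np.nan, np.nan, 3.0, 4.0, np.nan, 4.0, 5.0],
})

flat = pd.Series(df.values.ravel()).dropna()

# First method: ordered by count, descending
by_count = flat.value_counts()

# Second method: ordered by value, ascending
u, c = np.unique(flat.values, return_counts=True)
by_value = pd.Series(c, index=u)

# Sorting the first result by its index reproduces the second one's order
aligned = by_count.sort_index()
print(aligned)
```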
