Pandas Count Unique occurrences by Month

Problem description

I have some monthly data that I'm trying to summarize using Pandas and I need to count the number of unique entries that occur each month. Here's some sample code that shows what I'm trying to do:

import pandas as pd

mnths = ['JAN','FEB','MAR','APR']
custs = ['A','B','C',]

testFrame = pd.DataFrame(index=custs, columns=mnths)
# Assign with .loc[row, column]; chained indexing such as
# testFrame['JAN']['A'] = ... may write to a temporary copy instead.
testFrame.loc['A', 'JAN'] = 'purchased Prod'
testFrame.loc['B', 'JAN'] = 'No Data'
testFrame.loc['C', 'JAN'] = 'Purchased Competitor'
testFrame.loc['A', 'FEB'] = 'purchased Prod'
testFrame.loc['B', 'FEB'] = 'purchased Prod'
testFrame.loc['C', 'FEB'] = 'purchased Prod'
testFrame.loc['A', 'MAR'] = 'No Data'
testFrame.loc['B', 'MAR'] = 'No Data'
testFrame.loc['C', 'MAR'] = 'Purchased Competitor'
testFrame.loc['A', 'APR'] = 'Purchased Competitor'
testFrame.loc['B', 'APR'] = 'purchased Prod'
testFrame.loc['C', 'APR'] = 'Purchased Competitor'

uniqueValues = pd.Series(testFrame.values.ravel()).unique()

#CODE TO GET COUNT OF ENTRIES IN testFrame BY UNIQUE VALUE

Desired Output:

                      JAN  FEB  MAR  APR
purchased Prod          ?    ?    ?    ?
Purchased Competitor    ?    ?    ?    ?
No Data                 ?    ?    ?    ?

I can get the unique values and create a new dataframe with the correct axes/columns.

I started with these two questions: "Pandas: Counting unique values in a dataframe" and "Find unique values in a Pandas dataframe, irrespective of row or column location", but I still can't quite get the output into the format I need. I'm not quite sure how to apply the df.groupby syntax or the df.apply syntax to what I'm working with.

Solution

The fillna(0) call is optional; without it, a value that never occurs in a given month shows up as NaN instead of 0.

In [40]: testFrame.apply(pd.Series.value_counts).fillna(0)
Out[40]: 
                      JAN  FEB  MAR  APR
No Data                 1    0    2    0
Purchased Competitor    1    0    1    2
purchased Prod          1    3    0    1
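For readers who want to run the answer end to end, here is a minimal self-contained sketch. It rebuilds the question's frame from a dict of columns (a shortcut that is not in the original post; the names `test_frame`, `months`, `customers` and `counts` are illustrative), and it adds `.astype(int)` on the assumption that integer counts are wanted, since filling NaN can otherwise leave float columns.

```python
import pandas as pd

months = ['JAN', 'FEB', 'MAR', 'APR']
customers = ['A', 'B', 'C']

# Rebuild the question's frame: one list of statuses per month,
# ordered by customer A, B, C.
test_frame = pd.DataFrame(
    {
        'JAN': ['purchased Prod', 'No Data', 'Purchased Competitor'],
        'FEB': ['purchased Prod', 'purchased Prod', 'purchased Prod'],
        'MAR': ['No Data', 'No Data', 'Purchased Competitor'],
        'APR': ['Purchased Competitor', 'purchased Prod', 'Purchased Competitor'],
    },
    index=customers,
    columns=months,
)

# apply() passes each column (month) to value_counts; a status that does
# not occur in a month comes back as NaN, so fill with 0 and cast to int.
counts = test_frame.apply(pd.Series.value_counts).fillna(0).astype(int)
print(counts)
```

Individual cells can then be read with `counts.loc['purchased Prod', 'FEB']`.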

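Here is a neat apply trick: create a function that prints whatever is passed in (you could even drop into a debugger there), and it becomes easy to see what is happening. Note that the first column may be printed twice, as in the output below; older pandas versions call the function on the first column an extra time.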

In [20]: def f(x):
   ....:     print(x)
   ....:     return x.value_counts()
   ....: 

In [21]: testFrame.apply(f)
A          purchased Prod
B                 No Data
C    Purchased Competitor
Name: JAN, dtype: object
A          purchased Prod
B                 No Data
C    Purchased Competitor
Name: JAN, dtype: object
A    purchased Prod
B    purchased Prod
C    purchased Prod
Name: FEB, dtype: object
A                 No Data
B                 No Data
C    Purchased Competitor
Name: MAR, dtype: object
A    Purchased Competitor
B          purchased Prod
C    Purchased Competitor
Name: APR, dtype: object
Out[21]: 
                      JAN  FEB  MAR  APR
No Data                 1  NaN    2  NaN
Purchased Competitor    1  NaN    1    2
purchased Prod          1    3  NaN    1

[3 rows x 4 columns]

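So apply runs a value_counts like the one below on each column and then concatenates the results, aligned on the correct labels. (The example below happens to use the first row, customer A, but the operation is the same.)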

In [22]: testFrame.iloc[0].value_counts()
Out[22]: 
purchased Prod          2
Purchased Competitor    1
No Data                 1
dtype: int64
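The "count each column, then concatenate" description can be checked by hand. The sketch below is only an illustration of that idea, not the internals of `apply`: it calls `value_counts` on each column and joins the pieces with `pd.concat(..., axis=1)`. The names `test_frame`, `per_month`, `manual_counts` and `apply_counts` are illustrative.

```python
import pandas as pd

# Same data as in the question, built from a dict of columns.
test_frame = pd.DataFrame(
    {
        'JAN': ['purchased Prod', 'No Data', 'Purchased Competitor'],
        'FEB': ['purchased Prod', 'purchased Prod', 'purchased Prod'],
        'MAR': ['No Data', 'No Data', 'Purchased Competitor'],
        'APR': ['Purchased Competitor', 'purchased Prod', 'Purchased Competitor'],
    },
    index=['A', 'B', 'C'],
)

# Step 1: one value_counts Series per month, keyed by the month name.
per_month = {month: test_frame[month].value_counts() for month in test_frame.columns}

# Step 2: concatenate side by side; the dict keys become the column labels
# and rows are aligned on the status strings (missing pairs become NaN).
manual_counts = pd.concat(per_month, axis=1).fillna(0).astype(int)

# The one-liner from the answer, for comparison.
apply_counts = test_frame.apply(pd.Series.value_counts).fillna(0).astype(int)

# Compare as nested dicts so that row order does not matter.
print(manual_counts.to_dict() == apply_counts.to_dict())
```

Comparing through `to_dict()` is a deliberate choice here: the two results may list the statuses in a different row order, which a positional comparison would report as a mismatch.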
