How to groupby multiple columns with count unique value in Python Pandas


Problem description

I have a DataFrame df_data:

CustID    MatchID    LocationID   isMajor  #Major is 1 and Minor is 0
  1        11111       324         0  
  1        11111       324         0
  1        11111       324         0
  1        22222       490         0
  1        33333       675         1
  2        44444       888         0

I have a function with parameters like this:

def compute_something(list_minor=None, list_major=None):
    pass  # placeholder body

Explanation of the parameters: for CustID = 1 the parameters should be list_minor = [3, 1] (order is not important) and list_major = [1], because LocationID = 324 occurs 3 times and LocationID = 490 occurs 1 time (324 and 490 both have isMajor = 0, so their counts go into the same list). Similarly, CustID = 2 has list_minor = [1] and list_major = [] (if a customer has no major or minor data, I should pass []).

Here is my code:

import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0]
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])
df_parameter = pd.DataFrame()

# LocationID is assumed here: the sample data has no LeagueID column
df_parameter['parameters'] = df_data.groupby(['CustID', 'MatchID', 'isMajor'])['LocationID'].nunique()

But the result of df_parameter['parameters'] is wrong:

                                    parameters
 CustID     MatchID    isMajor
   1         11111        0             1   #should be 3
             22222        0             1
             33333        1             1
   2         44444        0             1

Can I get the parameters I explained above with groupby and pass them to the function?

Recommended answer

How about:

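# size() counts rows per (CustID, isMajor, MatchID); the second groupby
# collects those counts into one set per (CustID, isMajor)
(df_data.groupby(['CustID', 'isMajor', 'MatchID']).size()
   .groupby(level=[0, 1]).agg(set)
   .unstack('isMajor')
)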

Output:

isMajor       0    1
CustID              
1        {1, 3}  {1}
2           {1}  NaN
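
For context, the attempt in the question returned 1 for every row because nunique() counts distinct values inside each group, whereas size() counts rows. A minimal sketch of the difference, rebuilding the sample data from the question (the names `distinct` and `rows` are only illustrative, and the column is assumed to be called isMajor as in the table):

```python
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])

grouped = df_data.groupby(['CustID', 'MatchID', 'isMajor'])

# nunique() counts distinct LocationID values per group (one per group here)
distinct = grouped['LocationID'].nunique()

# size() counts how many rows fall into each group
rows = grouped.size()

print(distinct.loc[(1, 11111, 0)], rows.loc[(1, 11111, 0)])
```

With this data the first group has a single distinct LocationID but three rows, which is the value the question expected.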


Update: try this groupby instead:

# value_counts() counts rows per MatchID; tolist() turns the counts into a list
(df_data.groupby(['CustID', 'isMajor'])['MatchID']
   .apply(lambda x: x.value_counts().tolist())
   .unstack('isMajor')
)
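
To actually pass these lists to the function, one option is to loop over the grouped result and substitute [] where a customer has no rows for a category. This is only a sketch: the body of `compute_something` below is a hypothetical stand-in that echoes its arguments, and `params` and `results` are illustrative names that do not appear in the question.

```python
import pandas as pd

def compute_something(list_minor=None, list_major=None):
    # Hypothetical stand-in body: simply return the arguments
    return list_minor, list_major

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])

# One list of per-MatchID row counts for each (CustID, isMajor) pair
counts = (df_data.groupby(['CustID', 'isMajor'])['MatchID']
          .apply(lambda x: x.value_counts().tolist()))

# Start every customer with empty lists so a missing category stays []
params = {}
for (cust_id, is_major), values in counts.items():
    entry = params.setdefault(cust_id, {0: [], 1: []})
    entry[is_major] = values

results = {
    cust_id: compute_something(list_minor=p[0], list_major=p[1])
    for cust_id, p in params.items()
}
print(results)
```

Building the dictionary before unstacking avoids having to convert the NaN cells of the wide table back into empty lists.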

Also, groupby with two keys can be slow. In that case, you can concatenate the keys into a single string key and group by that:

# Build a single string key such as '1_0' from CustID and isMajor
keys = df_data['CustID'].astype(str) + '_' + df_data['isMajor'].astype(str)

(df_data.groupby(keys)['MatchID']
   .apply(lambda x: x.value_counts().tolist())
)
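
The result of the concatenated-key version is indexed by strings such as '1_0', so it cannot be unstacked by isMajor directly. If the wide layout is still wanted, one possible approach is to split the keys back into a two-level index afterwards. This is a sketch under the assumption that both columns hold integers; `wide` is an illustrative name.

```python
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])

keys = df_data['CustID'].astype(str) + '_' + df_data['isMajor'].astype(str)
counts = df_data.groupby(keys)['MatchID'].apply(lambda x: x.value_counts().tolist())

# Split each 'CustID_isMajor' string back into a pair of integers
counts.index = pd.MultiIndex.from_tuples(
    [tuple(int(part) for part in key.split('_')) for key in counts.index],
    names=['CustID', 'isMajor'],
)

# Same wide layout as the two-key groupby: one column per isMajor value
wide = counts.unstack('isMajor')
print(wide)
```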
