How to group by multiple columns and count unique values in Python Pandas
Question
I have a DataFrame df_data:
CustID  MatchID  LocationID  isMajor   # Major is 1 and Minor is 0
1       11111    324         0
1       11111    324         0
1       11111    324         0
1       22222    490         0
1       33333    675         1
2       44444    888         0
I have a function with parameters like this:
def compute_something(list_minor=None, list_major=None):
    pass  # body omitted in the question
Explanation of the parameters: for CustID = 1, the parameters should be list_minor = [3, 1] (order does not matter) and list_major = [1], because this customer appears 3 times at LocationID = 324 and 1 time at LocationID = 490. Both 324 and 490 have isMajor = 0, so their counts belong in the same list. Similarly, CustID = 2 has list_minor = [1] and list_major = []. If a customer has no major or minor data, I should pass [] for that parameter.
Here is my program:
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0]
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'IsMajor'])
df_parameter = pd.DataFrame()
# Count distinct LocationID values per (CustID, MatchID, IsMajor) group
df_parameter['parameters'] = df_data.groupby(['CustID', 'MatchID', 'IsMajor'])['LocationID'].nunique()
But the result of df_parameter['parameters'] is wrong:
                        parameters
CustID MatchID IsMajor
1      11111   0                 1   # should be 3
       22222   0                 1
       33333   1                 1
2      44444   0                 1
Can I get the parameters I explained above with groupby and pass them to the function?
Recommended answer
How about:
(df.groupby(['CustID','isMajor', 'MatchID']).size()
.groupby(level=[0,1]).agg(set)
.unstack('isMajor')
)
Output:
isMajor       0    1
CustID
1        {1, 3}  {1}
2           {1}  NaN
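As a side note on why the question's attempt returned 1 for every group: nunique counts distinct values inside each group, while size counts rows. The following is a minimal sketch of that difference, assuming the sample data from the question. The names distinct_locations and rows_per_group are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data from the question (column named isMajor, as in the answer's code).
df = pd.DataFrame(
    [
        [1, 11111, 324, 0],
        [1, 11111, 324, 0],
        [1, 11111, 324, 0],
        [1, 22222, 490, 0],
        [1, 33333, 675, 1],
        [2, 44444, 888, 0],
    ],
    columns=["CustID", "MatchID", "LocationID", "isMajor"],
)

keys = ["CustID", "MatchID", "isMajor"]

# Number of distinct LocationID values in each group.
distinct_locations = df.groupby(keys)["LocationID"].nunique()

# Number of rows in each group.
rows_per_group = df.groupby(keys).size()

print(distinct_locations)
print(rows_per_group)
```

For the group (1, 11111, 0) the first Series holds 1, because only LocationID 324 occurs, and the second holds 3, because there are three rows.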
Update: try this groupby instead:
(df.groupby(['CustID', 'isMajor'])['MatchID']
   # list the occurrence count of each MatchID within the group
   .apply(lambda x: x.value_counts().tolist())
   .unstack('isMajor')
)
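To pass the result to the function, missing combinations need to become [] rather than NaN. The following is one possible sketch, not the answerer's code. It assumes the sample data from the question, counts LocationID as in the question's explanation (the answer counts MatchID, which gives the same numbers for this data), and uses a stand-in compute_something that only echoes its arguments. The names counts and results are illustrative.

```python
import pandas as pd

def compute_something(list_minor=None, list_major=None):
    # Hypothetical stand-in for the question's function: echoes its inputs.
    return list_minor, list_major

df = pd.DataFrame(
    [
        [1, 11111, 324, 0],
        [1, 11111, 324, 0],
        [1, 11111, 324, 0],
        [1, 22222, 490, 0],
        [1, 33333, 675, 1],
        [2, 44444, 888, 0],
    ],
    columns=["CustID", "MatchID", "LocationID", "isMajor"],
)

# Map each (CustID, isMajor) pair to the list of per-location row counts.
counts = (
    df.groupby(["CustID", "isMajor"])["LocationID"]
    .apply(lambda s: s.value_counts().tolist())
    .to_dict()
)

results = {}
for cust_id in df["CustID"].unique():
    # dict.get supplies [] when a customer has no minor or no major rows.
    list_minor = counts.get((cust_id, 0), [])
    list_major = counts.get((cust_id, 1), [])
    results[cust_id] = compute_something(list_minor, list_major)

print(results)
```

Converting to a plain dict avoids handling NaN cells after unstack, since a missing key simply falls back to the empty list.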
Also, groupby with two keys can be slow. In that case, you can concatenate the keys into a single key and group by that:
# build a single string key such as '1_0' from the two grouping columns
keys = df['CustID'].astype(str) + '_' + df['isMajor'].astype(str)
(df.groupby(keys)['MatchID']
   .apply(lambda x: x.value_counts().tolist())
)
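With a concatenated key the result is indexed by strings such as '1_0', so the two original keys have to be recovered before unstacking. The sketch below shows one way to do that, assuming the sample data from the question and integer-valued keys. The names counts and table are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 11111, 324, 0],
        [1, 11111, 324, 0],
        [1, 11111, 324, 0],
        [1, 22222, 490, 0],
        [1, 33333, 675, 1],
        [2, 44444, 888, 0],
    ],
    columns=["CustID", "MatchID", "LocationID", "isMajor"],
)

# Single string key per row, e.g. '1_0'.
keys = df["CustID"].astype(str) + "_" + df["isMajor"].astype(str)
counts = df.groupby(keys)["MatchID"].apply(lambda s: s.value_counts().tolist())

# Split each string key back into (CustID, isMajor) and rebuild a two-level index.
counts.index = pd.MultiIndex.from_tuples(
    [tuple(int(part) for part in key.split("_")) for key in counts.index],
    names=["CustID", "isMajor"],
)

# One row per customer, one column per isMajor value; absent pairs become NaN.
table = counts.unstack("isMajor")
print(table)
```

Whether the concatenated key is actually faster depends on the data, so it is worth timing both variants before adopting it.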