Pandas: groupby column A and make lists of tuples from other columns?
Question
I would like to aggregate user transactions into lists in pandas, but I can't figure out how to make a list composed of more than one field. For example,
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})
which looks like
   amount  time  user
0   10.99    20     1
1    4.99    10     1
2    2.99    11     2
3    1.99    18     2
4   10.99    15     3
If I do
print(df.groupby('user')['time'].apply(list))
I get
user
1 [20, 10]
2 [11, 18]
3 [15]
But if I do this
df.groupby('user')[['time', 'amount']].apply(list)
I get
user
1 [time, amount]
2 [time, amount]
3 [time, amount]
Thanks to an answer below, I learned I can do this
df.groupby('user').agg(lambda x: x.tolist())
to get
             amount      time
user
1     [10.99, 4.99]  [20, 10]
2      [2.99, 1.99]  [11, 18]
3           [10.99]      [15]
But I want the times and amounts sorted in the same order, so that I can go through each user's transactions in order.
I was looking for a way to produce this:
amount-time-tuple
user
1 [(20, 10.99), (10, 4.99)]
2 [(11, 2.99), (18, 1.99)]
3 [(15, 10.99)]
But maybe there is a way to do the sort without "tupling" the two columns?
Recommended answer
Calling apply(list) on each group's DataFrame iterates over its column labels, not its values, which is why every row comes back as [time, amount]. I think you are looking for
df.groupby('user')[['time', 'amount']].apply(lambda x: x.values.tolist())
user
1    [[20.0, 10.99], [10.0, 4.99]]
2     [[11.0, 2.99], [18.0, 1.99]]
3                  [[15.0, 10.99]]
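The question also asks for tuples and for each user's transactions in time order. The following is a minimal sketch of one way to get both, reusing the question's sample data. The names `ordered`, `tuples` and `lists` are only illustrative. It assumes that sorting the whole frame before grouping is acceptable, and relies on groupby keeping the existing row order within each group.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

# Sort once by user and time; groupby preserves row order inside each group,
# so every group is then already in time order.
ordered = df.sort_values(['user', 'time'])

# Option 1: pair the two columns row by row into (time, amount) tuples.
tuples = ordered.groupby('user')[['time', 'amount']].apply(
    lambda g: list(zip(g['time'], g['amount'])))

# Option 2: no tupling; both list columns share the same time order.
lists = ordered.groupby('user')[['time', 'amount']].agg(list)

print(tuples)
print(lists)
```

Option 2 answers the "without tupling" part: because the sort happens before grouping, the time list and the amount list for a user line up position by position.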