Can pandas groupby aggregate into a list, rather than sum, mean, etc?
Question
I've had success using the groupby function to sum or average a given variable by groups, but is there a way to aggregate into a list of values rather than get a single result? (And would this still be called aggregation?)
I am not entirely sure this is the approach I should be taking anyway, so below is an example of the transformation I'd like to make, with toy data.
That is, if the data look something like this:
A   B   C
1  10  22
1  12  20
1  11   8
1  10  10
2  11  13
2  12  10
3  14   0
What I am trying to end up with is something like the following. I am not sure whether this can be done by aggregating into lists with groupby, and I am rather lost as to where to go from here.
Hypothetical output:
A   B   C  New1  New2  New3  New4  New5  New6
1  10  22    12    20    11     8    10    10
2  11  13    12    10
3  14   0
Perhaps I should be pursuing pivots instead? The order in which the data are put into columns does not matter: all columns B through New6 in this example are equivalent. All suggestions/corrections are much appreciated.
Recommended answer
My solution is a bit longer than you might expect, and I'm sure it could be shortened, but:
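import pandas as pd

# df is the toy DataFrame with columns A, B and C
# stack B and C into a single column within each group of A
g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"])))
k = g.reset_index()
# sequential helper column, then a row number within each A
k["i"] = k.index
k["rn"] = k.groupby("A")["i"].rank()
# current pandas spells these keywords index= and columns= (formerly rows= and cols=)
k.pivot_table(index="A", columns="rn", values=0)
# output
# rn   1   2    3    4    5    6
# A
# 1   10  12   11   22   20    8
# 2   10  11   10   13  NaN  NaN
# 3   14  10  NaN  NaN  NaN  NaN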
A bit of explanation. The first line, g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"]))), groups df by A and then stacks columns B and C into a single column:
A
1  0    10
   1    12
   2    11
   0    22
   1    20
   2     8
2  3    10
   4    11
   3    10
   4    13
3  5    14
   5    10
Then k = g.reset_index() creates a sequential index; the result is:
    A  level_1   0
0   1        0  10
1   1        1  12
2   1        2  11
3   1        0  22
4   1        1  20
5   1        2   8
6   2        3  10
7   2        4  11
8   2        3  10
9   2        4  13
10  3        5  14
11  3        5  10
Now I want to move this index into a column (I'd like to hear how to make a sequential column without resetting the index), with k["i"] = k.index:
    A  level_1   0   i
0   1        0  10   0
1   1        1  12   1
2   1        2  11   2
3   1        0  22   3
4   1        1  20   4
5   1        2   8   5
6   2        3  10   6
7   2        4  11   7
8   2        3  10   8
9   2        4  13   9
10  3        5  14  10
11  3        5  10  11
Now, k["rn"] = k.groupby("A")["i"].rank() adds a row number within each A (like row_number() over (partition by A order by i) in SQL):
    A  level_1   0   i  rn
0   1        0  10   0   1
1   1        1  12   1   2
2   1        2  11   2   3
3   1        0  22   3   4
4   1        1  20   4   5
5   1        2   8   5   6
6   2        3  10   6   1
7   2        4  11   7   2
8   2        3  10   8   3
9   2        4  13   9   4
10  3        5  14  10   1
11  3        5  10  11   2
And finally, just pivot with k.pivot_table(index="A", columns="rn", values=0):
rn   1   2    3    4    5    6
A
1   10  12   11   22   20    8
2   10  11   10   13  NaN  NaN
3   14  10  NaN  NaN  NaN  NaN
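As for the literal question in the title, a group can be collected into a Python list by passing list as the aggregation function. The sketch below is an alternative route to the same wide table, not the answerer's method. It assumes the same toy frame inferred from the printed output, and the names long_df, lists and wide are illustrative.

```python
import pandas as pd

# Toy frame inferred from the printed intermediate output (an assumption).
df = pd.DataFrame(
    {
        "A": [1, 1, 1, 2, 2, 3],
        "B": [10, 12, 11, 10, 11, 14],
        "C": [22, 20, 8, 10, 13, 10],
    }
)

# Stack B and C into one "value" column: all B rows first, then all C rows.
long_df = df.melt(id_vars="A", value_vars=["B", "C"])

# Collect each group's values into a single Python list.
lists = long_df.groupby("A")["value"].agg(list)

# Spread the lists into columns; shorter lists are padded with NaN.
wide = pd.DataFrame(lists.tolist(), index=lists.index)
print(lists)
print(wide)
```

Whether to stop at lists or go on to wide depends on what comes next: a column of lists is compact but awkward for further numeric work, while the padded frame matches the layout asked for in the question.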