Can pandas groupby aggregate into a list, rather than sum, mean, etc?


Question

I've had success using the groupby function to sum or average a given variable by groups, but is there a way to aggregate into a list of values, rather than to get a single result? (And would this still be called aggregation?)

I am not entirely sure this is the approach I should be taking anyhow, so below is an example of the transformation I'd like to make, with toy data.

That is, if the data look something like this:

    A    B    C  
    1    10   22
    1    12   20
    1    11   8
    1    10   10
    2    11   13
    2    12   10 
    3    14   0

What I am trying to end up with is something like the following. I am not totally sure whether this can be done through groupby aggregating into lists, and am rather lost as to where to go from here.

Hypothetical output:

    A    B    C   New1  New2  New3  New4  New5  New6
    1    10   22  12    20    11    8     10    10
    2    11   13  12    10
    3    14   0

Perhaps I should be pursuing pivots instead? The order in which the data are put into columns does not matter: all columns B through New6 in this example are equivalent. All suggestions/corrections are much appreciated.

Recommended answer

My solution is a bit longer than you might expect, and I'm sure it could be shortened, but:

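import pandas as pd

# df is the toy DataFrame with columns A, B and C
# Stack B and C into one long Series per group of A
g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"])))
k = g.reset_index()
k["i"] = k.index  # sequential row number
k["rn"] = k.groupby("A")["i"].rank()  # row number within each A
k.pivot_table(index="A", columns="rn", values=0)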

# output
# rn   1   2   3   4   5   6
# A                         
# 1   10  12  11  22  20   8
# 2   10  11  10  13 NaN NaN
# 3   14  10 NaN NaN NaN NaN

A bit of explanation. The first line, g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"]))), groups df by A and then stacks columns B and C into a single column:

A   
1  0    10
   1    12
   2    11
   0    22
   1    20
   2     8
2  3    10
   4    11
   3    10
   4    13
3  5    14
   5    10

Then k = g.reset_index() creates a sequential index. The result is:

    A  level_1   0
0   1        0  10
1   1        1  12
2   1        2  11
3   1        0  22
4   1        1  20
5   1        2   8
6   2        3  10
7   2        4  11
8   2        3  10
9   2        4  13
10  3        5  14
11  3        5  10

Now I want to move this index into a column (I'd like to hear how I can make a sequential column without resetting the index), k["i"] = k.index:

    A  level_1   0   i
0   1        0  10   0
1   1        1  12   1
2   1        2  11   2
3   1        0  22   3
4   1        1  20   4
5   1        2   8   5
6   2        3  10   6
7   2        4  11   7
8   2        3  10   8
9   2        4  13   9
10  3        5  14  10
11  3        5  10  11

Now, k["rn"] = k.groupby("A")["i"].rank() adds a row number within each A (like row_number() over(partition by A order by i) in SQL):

    A  level_1   0   i  rn
0   1        0  10   0   1
1   1        1  12   1   2
2   1        2  11   2   3
3   1        0  22   3   4
4   1        1  20   4   5
5   1        2   8   5   6
6   2        3  10   6   1
7   2        4  11   7   2
8   2        3  10   8   3
9   2        4  13   9   4
10  3        5  14  10   1
11  3        5  10  11   2
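
On the parenthetical question above about building a sequential column without a helper index: one option in recent pandas versions is groupby(...).cumcount(), which numbers the rows of each group from 0 in their current order. The sketch below is an alternative, not what this answer used. The frame k is rebuilt by hand from the table above so the block runs on its own, and the column name "value" (standing in for the column labelled 0) is a hypothetical choice.

```python
import pandas as pd

# k rebuilt by hand from the table above; "value" stands in for column 0.
k = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "value": [10, 12, 11, 22, 20, 8, 10, 11, 10, 13, 14, 10],
})

# cumcount() numbers rows within each group starting at 0; add 1 for 1-based.
k["rn"] = k.groupby("A").cumcount() + 1

# Each (A, rn) pair is unique, so a plain pivot works without aggregation.
wide = k.pivot(index="A", columns="rn", values="value")
print(wide)
```

This keeps rn as an integer column, so the pivoted column labels stay 1 to 6 rather than becoming floats.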

And finally, just pivot with k.pivot_table(index="A", columns="rn", values=0):

rn   1   2   3   4   5   6
A                         
1   10  12  11  22  20   8
2   10  11  10  13 NaN NaN
3   14  10 NaN NaN NaN NaN
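
To come back to the question's title: in recent pandas versions, the values of each group can be collected into a Python list by passing list to agg. The following sketch is an alternative under that assumption, not the method used in this answer. It uses the question's toy data, and the names long_df, lists, and wide are made up for illustration.

```python
import pandas as pd

# Toy data from the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2, 3],
    "B": [10, 12, 11, 10, 11, 12, 14],
    "C": [22, 20, 8, 10, 13, 10, 0],
})

# Melt B and C into a single "value" column (all B rows first, then all C rows).
long_df = df.melt(id_vars="A", value_vars=["B", "C"])

# Collect the values of each group of A into one Python list.
lists = long_df.groupby("A")["value"].agg(list)

# Expand the ragged lists into columns; shorter rows are padded with NaN.
wide = pd.DataFrame(lists.tolist(), index=lists.index)
print(lists)
print(wide)
```

The values within each row come out as all of B followed by all of C, which should be acceptable here because the question states that column order does not matter.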
