Grouping a dataframe by X columns


Question

I have a dataframe and I'd like to apply a function to every 2 columns (or every 3; the group size varies).

For example, with the following DataFrame, I'd like to apply the mean function to columns 0-1, 2-3, 4-5, ..., 28-29:

d = pd.DataFrame((np.random.randn(360)).reshape(12,30))

           0         1 ...        17        18        19 ...        29
0   0.590293 -2.794911 ...  0.772830 -1.389820 -1.696832 ...  0.615549 
1   0.115954  2.179996 ... -0.764384 -0.610713 -0.289050 ... -1.130803 
2   0.209405  0.381398 ... -0.317797  0.261590  2.502581 ...  1.750126 
3   2.828746  0.831299 ... -0.679128 -1.255643  0.245522 ... -0.612011 
4   0.625284  1.141448 ...  0.391047 -1.262303 -0.094523 ... -3.643543 
5   0.493923  1.601924 ... -0.935102 -2.416869  0.112278 ... -0.001863 
6  -1.213347  0.396682 ...  0.671210  0.122041 -1.469256 ...  1.825214 
7   0.026695 -0.482887 ...  0.020123  1.151533 -0.440114 ... -1.407276 
8   0.235436  0.763454 ... -0.446333 -0.322420  1.067925 ... -0.622363 
9   0.668812  0.537556 ...  0.471777 -0.119756  0.098581 ...  0.007390 
10 -1.112536 -2.378293 ...  1.047705 -0.812025  0.771080 ... -0.403167 
11 -0.709457 -1.598942 ... -0.568418 -2.095332 -1.970319 ...  1.687536 

Recommended answer

groupby can work on axis=1 as well, and can accept a sequence of group labels. If your column labels form a convenient integer range, as in your example, it's trivial:

>>> df = pd.DataFrame((np.random.randn(6*6)).reshape(6,6))
>>> df
          0         1         2         3         4         5
0  1.705550 -0.757193 -0.636333  2.097570 -1.064751  0.450812
1  0.575623 -0.385987  0.105516  0.820795 -0.464069  0.728609
2  0.776840 -0.173348  0.878534  0.995937  0.094515  0.098853
3  0.326854  1.297625  2.232534  1.004719 -0.440271  1.548430
4  0.483211 -1.182175 -0.012520 -1.766317 -0.895284 -0.695300
5  0.523011 -1.653557  1.022042  1.201774 -1.118465  1.400537
>>> df.groupby(df.columns//2, axis=1).mean()
          0         1         2
0  0.474179  0.730618 -0.306970
1  0.094818  0.463155  0.132270
2  0.301746  0.937235  0.096684
3  0.812239  1.618627  0.554080
4 -0.349482 -0.889419 -0.795292
5 -0.565273  1.111908  0.141036

(This works because df.columns//2 gives Int64Index([0, 0, 1, 1, 2, 2], dtype='int64').)
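To make the label trick concrete, here is a minimal sketch of what floor division does to integer column labels for group sizes of 2 and 3. The frame contents and the names labels_2 and labels_3 are illustrative only, not part of the original answer.

```python
import numpy as np
import pandas as pd

# A small frame with the default integer column labels 0..5.
df = pd.DataFrame(np.arange(36, dtype=float).reshape(6, 6))

# Floor-dividing the labels by the group size yields one group label per column.
labels_2 = df.columns // 2   # columns 0-1 share a label, then 2-3, then 4-5
labels_3 = df.columns // 3   # columns 0-2 share a label, then 3-5

print(list(labels_2))
print(list(labels_3))
```

Changing the divisor is all it takes to switch from pairs to triples of columns, which covers the "2 or 3, it's variable" part of the question.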

Even if we're not so fortunate (say the column labels are strings), we can still build the group labels ourselves from the column positions:

>>> df.groupby(np.arange(df.columns.size)//2, axis=1).mean()
          0         1         2
0  0.474179  0.730618 -0.306970
1  0.094818  0.463155  0.132270
2  0.301746  0.937235  0.096684
3  0.812239  1.618627  0.554080
4 -0.349482 -0.889419 -0.795292
5 -0.565273  1.111908  0.141036
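The positional approach can be wrapped in a small helper. This is a sketch, not the answerer's code: mean_every_n_columns is a hypothetical name, and it groups the transposed frame rather than passing axis=1, on the assumption that you are running a newer pandas release where the axis argument of groupby is deprecated (since 2.1, as far as I'm aware).

```python
import numpy as np
import pandas as pd

def mean_every_n_columns(frame, n):
    """Average each run of n adjacent columns (hypothetical helper)."""
    # Positional labels: 0 for the first n columns, 1 for the next n, and so on.
    labels = np.arange(frame.shape[1]) // n
    # Group the rows of the transposed frame, then transpose back.
    return frame.T.groupby(labels).mean().T

# String column labels, so df.columns // 2 would not work here.
df = pd.DataFrame(np.arange(12, dtype=float).reshape(2, 6), columns=list("abcdef"))

pairs = mean_every_n_columns(df, 2)    # means of (a, b), (c, d), (e, f)
triples = mean_every_n_columns(df, 3)  # means of (a, b, c), (d, e, f)
print(pairs)
print(triples)
```

The result columns are the group numbers 0, 1, 2, ... rather than the original labels. If the column count is not a multiple of n, the last group is simply smaller.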
