"Flattening" output of group.nth in Pandas


Problem description

My indexing skills are not quite up to par, and I'm struggling with this problem.

I have the following setup:

import pandas as pd
import numpy as np

index = pd.bdate_range('2012-1-1', periods=250)
df1 = pd.DataFrame(np.random.rand(250,4), index=index, columns=[1, 2, 3, 4])
df2 = pd.DataFrame(np.random.rand(250,4), index=index, columns=[1, 2, 3, 4])
df = pd.concat({'A': df1, 'B': df2}, axis=1)

group = df.groupby([lambda x: x.year, lambda x: x.month])

The maximum number of business days within my groups (i.e., (year, month) combinations) is 23:

In [257]: group.size().max()
Out[257]: 23

For the first business day (index n=0) of every month, I can get statistics as follows:

In [258]: group.nth(0).describe()
Out[258]: 
               A                                           B             \
               1          2          3          4          1          2   
count  12.000000  12.000000  12.000000  12.000000  12.000000  12.000000   
mean    0.541559   0.491684   0.354012   0.448284   0.353839   0.408020   
std     0.367662   0.242924   0.254447   0.248426   0.228194   0.220511   
min     0.021792   0.110715   0.067677   0.074719   0.097227   0.116947   
25%     0.144712   0.368966   0.144415   0.209418   0.189507   0.260863   
50%     0.646160   0.439860   0.233370   0.472696   0.214474   0.370281   
75%     0.865417   0.614928   0.587038   0.710450   0.529376   0.602299   
max     0.963938   0.912865   0.766722   0.750037   0.778580   0.776627   


               3          4  
count  12.000000  12.000000  
mean    0.434197   0.588980  
std     0.301113   0.287869  
min     0.004253   0.064859  
25%     0.262517   0.357484  
50%     0.350605   0.653136  
75%     0.676960   0.775588  
max     0.991661   0.990118  

What I would like to do is run group.nth(n).describe() for n in range(23) and save the results in this format:

                 count      mean       std   min   25%   50%   75%   max
(col2, n, col1)    281 -0.004093  0.140578 -1.64 -0.04 -0.00  0.04  0.58

This should cover all combinations of (col2, n, col1), where col2 is the lower-level column name (1 through 4), n is in range(23), and col1 is the upper-level column name ('A' or 'B').

Any help would be greatly appreciated. I'll learn a lot about how to do these kinds of manipulations. I got some of the way there with:

group.nth(0).describe().stack().T.stack()

But I make a hash of it when I iterate n from 0 through 22.

Thanks.

Recommended answer

You're very close. You just need to rebuild each frame's index as an explicit list of tuples so that n sits in the middle. Then, with the list of DataFrames, you can pass it straight to concat.

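group = df.groupby([lambda x: x.year, lambda x: x.month])
dataframes = []
for n in range(23):
    # Transpose so each (outer, inner) column pair becomes a row of statistics
    frame = group.nth(n).describe().T
    # Rebuild the index as (inner, n, outer) tuples, i.e. (col2, n, col1)
    frame.index = [(inner, n, outer) for outer, inner in frame.index]
    dataframes.append(frame)
# Stack the 23 per-n tables on top of each other
final_df = pd.concat(dataframes)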
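
As a possible variation on the loop above, the same table can be built by letting concat add n as an index level and then reordering the levels. This is only a sketch under some assumptions: the seeded random generator, the level names "n", "col1" and "col2", and the final sort_index() call are illustrative choices that do not appear in the original question or answer.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question; the fixed seed is an assumption
# added here only so that the example is reproducible.
rng = np.random.default_rng(0)
index = pd.bdate_range("2012-01-01", periods=250)
df1 = pd.DataFrame(rng.random((250, 4)), index=index, columns=[1, 2, 3, 4])
df2 = pd.DataFrame(rng.random((250, 4)), index=index, columns=[1, 2, 3, 4])
df = pd.concat({"A": df1, "B": df2}, axis=1)

group = df.groupby([lambda x: x.year, lambda x: x.month])

# One transposed describe() table per n, keyed by n.
max_days = int(group.size().max())
frames = {n: group.nth(n).describe().T for n in range(max_days)}

# Passing a dict to concat prepends its keys as a new outermost index level,
# giving rows indexed by (n, upper column, lower column).
stacked = pd.concat(frames)
stacked.index.names = ["n", "col1", "col2"]  # hypothetical level names

# Move n into the middle: (col2, n, col1), then sort for easier lookups.
final_df = stacked.reorder_levels(["col2", "n", "col1"]).sort_index()

# Statistics for lower column 1, first business day, upper column 'A'.
print(final_df.loc[(1, 0, "A")])
```

Naming the levels makes later selections such as final_df.xs(0, level="n") a little easier to read, at the cost of one extra line compared with building the tuples by hand.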
