"Flattening" output of group.nth in Pandas
Problem description
My indexing skills are not quite up to par, and I'm struggling with this problem.
I have the following setup:
import pandas as pd
import numpy as np
index = pd.bdate_range('2012-1-1', periods=250)
df1 = pd.DataFrame(np.random.rand(250,4), index=index, columns=[1, 2, 3, 4])
df2 = pd.DataFrame(np.random.rand(250,4), index=index, columns=[1, 2, 3, 4])
df = pd.concat({'A': df1, 'B': df2}, axis=1)
group = df.groupby([lambda x: x.year, lambda x: x.month])
I see that the maximum number of business days within my groups (i.e., (year, month) combinations) is 23:
In [257]: group.size().max()
Out[257]: 23
For the first business day (index n=0) of every month, I can get statistics as follows:
In [258]: group.nth(0).describe()
Out[258]:
               A                                           B             \
               1          2          3          4          1          2
count  12.000000  12.000000  12.000000  12.000000  12.000000  12.000000
mean    0.541559   0.491684   0.354012   0.448284   0.353839   0.408020
std     0.367662   0.242924   0.254447   0.248426   0.228194   0.220511
min     0.021792   0.110715   0.067677   0.074719   0.097227   0.116947
25%     0.144712   0.368966   0.144415   0.209418   0.189507   0.260863
50%     0.646160   0.439860   0.233370   0.472696   0.214474   0.370281
75%     0.865417   0.614928   0.587038   0.710450   0.529376   0.602299
max     0.963938   0.912865   0.766722   0.750037   0.778580   0.776627

               3          4
count  12.000000  12.000000
mean    0.434197   0.588980
std     0.301113   0.287869
min     0.004253   0.064859
25%     0.262517   0.357484
50%     0.350605   0.653136
75%     0.676960   0.775588
max     0.991661   0.990118
What I would like to do is run group.nth(n).describe() for n in range(23), and save the results in this format:
count mean std min 25% 50% 75% max
(col2, n, col1) 281 -0.004093 0.140578 -1.64 -0.04 -0.00 0.04 0.58
This should cover all combinations of (col2, n, col1), where col2 is the lower column name (1 through 4), n is in range(23), and col1 is the upper column name ('A' or 'B').
Any help would be greatly appreciated; I'll learn a lot about how to do these kinds of manipulations. I got part of the way there with:
group.nth(0).describe().stack().T.stack()
But I make a hash of it when I iterate n through 22.
Thanks.
Recommended answer
You're very close. You just need to rebuild each frame's index as an explicit list of tuples, so that n goes in the middle. Then, with the list of dataframes, you can just use concat straight up.
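group = df.groupby([lambda x: x.year, lambda x: x.month])
dataframes = []
for n in range(23):
    # Transpose so each row is one (outer, inner) column pair and the statistics are columns
    frame = group.nth(n).describe().T
    # Rebuild the index as (inner, n, outer) tuples, putting n in the middle
    frame.index = [(inner, n, outer) for outer, inner in frame.index]
    dataframes.append(frame)
final_df = pd.concat(dataframes)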
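As a variation on the loop above, here is a minimal sketch that lets concat add the n level through its keys argument and then reorders the levels, instead of rebuilding the tuples by hand. It is not part of the original answer. The names max_days, frames, stacked and flat, and the level names 'col2', 'n' and 'col1', are chosen here purely for illustration, and grouping by df.index.year and df.index.month is assumed to be equivalent to the two lambdas in the question.

```python
import numpy as np
import pandas as pd

# Same setup as in the question.
index = pd.bdate_range('2012-1-1', periods=250)
df1 = pd.DataFrame(np.random.rand(250, 4), index=index, columns=[1, 2, 3, 4])
df2 = pd.DataFrame(np.random.rand(250, 4), index=index, columns=[1, 2, 3, 4])
df = pd.concat({'A': df1, 'B': df2}, axis=1)

# Assumption: grouping by the index's year/month arrays matches the lambdas.
group = df.groupby([df.index.year, df.index.month])
max_days = int(group.size().max())  # 23 for this date range

# One transposed describe() table per n; rows are indexed by (col1, col2).
frames = [group.nth(n).describe().T for n in range(max_days)]

# keys= prepends n as a new outermost level, giving (n, col1, col2).
stacked = pd.concat(frames, keys=list(range(max_days)))

# Move the levels into (col2, n, col1) order, then sort and name them.
flat = stacked.reorder_levels([2, 0, 1]).sort_index()
flat.index.names = ['col2', 'n', 'col1']

# Statistics for lower column 1, first business day, upper column 'A'.
print(flat.loc[(1, 0, 'A')])

# Only months that have a 23rd business day contribute at n=22.
print(flat.xs(22, level='n')['count'])
```

Note that count is not constant across n: months with fewer business days simply contribute no row for the larger values of n, so the statistics for late-month positions are computed over fewer observations.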