How to downsample a pandas dataframe
Question
I am trying to downsample a pandas DataFrame in order to reduce its granularity. For example, I want to reduce this DataFrame:
1 2 3 4
2 4 3 3
2 2 1 3
3 1 3 2
to this (downsampling to a 2x2 DataFrame by taking the mean of each 2x2 block):
2.25 3.25
2 2.25
Is there a built-in or efficient way to do this, or do I have to write it myself?
Thanks.
Recommended answer
One option is to use groupby twice. Once for the index:
In [11]: df.groupby(lambda x: x // 2).mean()
Out[11]:
0 1 2 3
0 1.5 3.0 3 3.5
1 2.5 1.5 2 2.5
and once for the columns:
In [12]: df.groupby(lambda x: x // 2).mean().groupby(lambda y: y // 2, axis=1).mean()
Out[12]:
0 1
0 2.25 3.25
1 2.00 2.25
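For readers who want to reproduce this outside an IPython session, here is a minimal sketch of the same two-step groupby on the 4x4 example from the question. It makes two assumptions that are not part of the original answer: the function name `downsample_2x2` is made up for illustration, and the column step transposes the frame instead of passing `axis=1`, because recent pandas releases deprecate that argument to `groupby`. Integer floor division (`//`) is used so that labels 0, 1 map to group 0 and labels 2, 3 map to group 1.

```python
import pandas as pd

# The 4x4 example from the question, with default integer labels 0..3.
df = pd.DataFrame([[1, 2, 3, 4],
                   [2, 4, 3, 3],
                   [2, 2, 1, 3],
                   [3, 1, 3, 2]])


def downsample_2x2(frame):
    """Average each 2x2 block; assumes integer row and column labels."""
    # Step 1: average pairs of rows (labels 0,1 -> 0 and 2,3 -> 1).
    rows = frame.groupby(lambda i: i // 2).mean()
    # Step 2: average pairs of columns by transposing, grouping, and
    # transposing back (stands in for groupby(..., axis=1)).
    return rows.T.groupby(lambda j: j // 2).mean().T


res = downsample_2x2(df)
print(res)
```

The result should match the 2x2 frame shown in Out[12].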
Note: A solution that calculates the mean only once might be preferable. One option is to stack, groupby, mean, and unstack, but at the moment this is a little fiddly.
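The stack, groupby, mean, unstack route mentioned in the note could look roughly like the sketch below. This is one possible reading of that idea, not code from the original answer. The variable names (`stacked`, `row_block`, `col_block`) are illustrative, and integer row and column labels are assumed.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4],
                   [2, 4, 3, 3],
                   [2, 2, 1, 3],
                   [3, 1, 3, 2]])

# Stack into a Series indexed by (row label, column label).
stacked = df.stack()

# Map each row and column label to its 2x2 block number.
row_block = stacked.index.get_level_values(0).to_numpy() // 2
col_block = stacked.index.get_level_values(1).to_numpy() // 2

# One mean per (row block, column block) pair, then pivot the column
# blocks back out into columns.
res = stacked.groupby([row_block, col_block]).mean().unstack()
print(res)
```

On this data the single-pass mean gives the same numbers as the two-step version, because every block holds the same number of non-missing values. With missing values or uneven blocks, a mean of means would generally differ from the mean over the whole block.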
This seems to be much faster than Viktor's solution:
In [21]: df = pd.DataFrame(np.random.randn(100, 100))
In [22]: %timeit df.groupby(lambda x: x // 2).mean().groupby(lambda y: y // 2, axis=1).mean()
1000 loops, best of 3: 1.64 ms per loop
In [23]: %timeit viktor()
1 loops, best of 3: 822 ms per loop
In fact, Viktor's solution crashes my (underpowered) laptop for larger DataFrames:
In [31]: df = pd.DataFrame(np.random.randn(1000, 1000))
In [32]: %timeit df.groupby(lambda x: x // 2).mean().groupby(lambda y: y // 2, axis=1).mean()
10 loops, best of 3: 42.9 ms per loop
In [33]: %timeit viktor()
# crashes
As Viktor points out, this doesn't work with a non-integer index. If you need that, you could store the original labels in temporary variables and put them back afterwards:
df_index, df_cols, df.index, df.columns = df.index, df.columns, np.arange(len(df.index)), np.arange(len(df.columns))
res = df.groupby(...
res.index, res.columns = df_index[::2], df_cols[::2]
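A variant of the same idea that leaves `df` untouched is to group by position instead of by label, passing an array of block numbers as the grouping key. The sketch below is an alternative to the label swap above, not the answer's own code. The string labels and the names `row_key` and `col_key` are made up for illustration, and the transpose again stands in for `axis=1`.

```python
import numpy as np
import pandas as pd

# Same values as the question, but with non-integer labels.
df = pd.DataFrame([[1, 2, 3, 4],
                   [2, 4, 3, 3],
                   [2, 2, 1, 3],
                   [3, 1, 3, 2]],
                  index=list("abcd"), columns=list("wxyz"))

# Block number for each row / column position: [0, 0, 1, 1].
row_key = np.arange(len(df.index)) // 2
col_key = np.arange(len(df.columns)) // 2

# Group rows by position, then columns by position via a transpose.
res = df.groupby(row_key).mean().T.groupby(col_key).mean().T

# Label each block with the first original label it covers.
res.index, res.columns = df.index[::2], df.columns[::2]
print(res)
```

Taking every second label with `[::2]` follows the answer's own convention. Which label should represent a block is a design choice.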