How to downsample a pandas dataframe

Views: 115 | Published: 2017/3/26 2:28:49 | Tags: python, pandas, dataframe, downsampling


Question

I am trying to downsample a pandas dataframe in order to reduce its granularity. For example, I want to reduce this dataframe:

to this (downsampling to obtain a 2x2 dataframe using the mean):

2.25  3.25
2     2.25

Is there a built-in or efficient way to do this, or do I have to write it myself?

Thanks

Recommended answer

One option is to use groupby twice. Once for the index:

In [11]: df.groupby(lambda x: x // 2).mean()  # floor division pairs rows 0-1 and 2-3
Out[11]:
     0    1  2    3
0  1.5  3.0  3  3.5
1  2.5  1.5  2  2.5

and once for the columns:

In [12]: df.groupby(lambda x: x // 2).mean().groupby(lambda y: y // 2, axis=1).mean()
Out[12]:
      0     1
0  2.25  3.25
1  2.00  2.25
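The two-step groupby above can be sketched as a self-contained script. The original input dataframe is not shown in this copy, so the 4x4 values below are hypothetical, chosen only so that they reproduce the outputs above. The sketch also transposes instead of passing `axis=1`, since recent pandas releases deprecate `axis=1` in `groupby` as far as I know; treat that substitution as my assumption, not part of the original answer.

```python
import pandas as pd

# Hypothetical 4x4 input (the original dataframe is not available);
# the values are picked to reproduce the intermediate and final outputs.
df = pd.DataFrame([[1, 2, 3, 3],
                   [2, 4, 3, 4],
                   [2, 2, 2, 3],
                   [3, 1, 2, 2]])

# Step 1: average each pair of rows (labels 0,1 -> group 0; 2,3 -> group 1).
rows = df.groupby(lambda i: i // 2).mean()

# Step 2: average each pair of columns. Transposing lets us group on the
# index again, which avoids groupby(..., axis=1).
result = rows.T.groupby(lambda j: j // 2).mean().T

print(rows)
print(result)
```

Note that `//` (floor division) matters here: under Python 3, `x / 2` yields 0.0, 0.5, 1.0, 1.5, so every row would land in its own group.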

Note: A solution that calculates the mean only once might be preferable. One option is to stack, groupby, mean, and unstack, but at the moment this is a little fiddly.
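For what it's worth, here is one way the stack, groupby, mean, unstack idea might look. This is only a sketch under my own assumptions: the sample data is hypothetical (the same illustrative 4x4 values as elsewhere, not the asker's data), and the block keys are built from the stacked index levels with floor division.

```python
import pandas as pd

# Hypothetical 4x4 input with integer row and column labels.
df = pd.DataFrame([[1, 2, 3, 3],
                   [2, 4, 3, 4],
                   [2, 2, 2, 3],
                   [3, 1, 2, 2]])

# Stack into a Series indexed by (row label, column label).
stacked = df.stack()

# Map every cell to the 2x2 block it belongs to.
row_block = stacked.index.get_level_values(0).to_numpy() // 2
col_block = stacked.index.get_level_values(1).to_numpy() // 2

# One mean per block, then pivot the column-block level back into columns.
result = stacked.groupby([row_block, col_block]).mean().unstack()

print(result)
```

Because each block is averaged in a single pass, this also gives the true block mean when blocks are ragged or contain missing values, where a mean of means would weight cells unevenly.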

This seems to be much faster than Viktor's solution:

In [21]: df = pd.DataFrame(np.random.randn(100, 100))

In [22]: %timeit df.groupby(lambda x: x // 2).mean().groupby(lambda y: y // 2, axis=1).mean()
1000 loops, best of 3: 1.64 ms per loop

In [23]: %timeit viktor()
1 loops, best of 3: 822 ms per loop

In fact, Viktor's solution crashes my (underpowered) laptop for larger DataFrames:

In [31]: df = pd.DataFrame(np.random.randn(1000, 1000))

In [32]: %timeit df.groupby(lambda x: x // 2).mean().groupby(lambda y: y // 2, axis=1).mean()
10 loops, best of 3: 42.9 ms per loop

In [33]: %timeit viktor()  # crashes

As Viktor points out, this doesn't work with a non-integer index. If you need that, you can store the original labels in temporary variables and put them back afterwards:

df_index, df_cols, df.index, df.columns = df.index, df.columns, np.arange(len(df.index)), np.arange(len(df.columns))
res = df.groupby(...
res.index, res.columns = df_index[::2], df_cols[::2]
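A variation on the same idea, as a rough sketch: instead of overwriting the labels on `df` and restoring them later, you can pass positional group keys (arrays built with `np.arange`) straight to `groupby`, and only relabel the result. The string labels and values below are hypothetical, and the transpose stands in for `axis=1`; neither comes from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with non-integer row and column labels.
df = pd.DataFrame(np.arange(16.0).reshape(4, 4),
                  index=list("abcd"), columns=list("wxyz"))

# Positional keys: positions 0,1 -> block 0; positions 2,3 -> block 1.
row_keys = np.arange(len(df.index)) // 2
col_keys = np.arange(len(df.columns)) // 2

# Group rows by position, then columns by position (via transpose).
res = df.groupby(row_keys).mean().T.groupby(col_keys).mean().T

# Label each block with the first original label it covers.
res.index, res.columns = df.index[::2], df.columns[::2]

print(res)
```

This leaves `df` itself untouched, which avoids having to remember to put the original index and columns back.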
