Pandas' equivalent of resample for an integer index


Question

I'm looking for a pandas equivalent of the resample method for a dataframe whose index isn't a DatetimeIndex but an array of integers, or maybe even floats.

I know that in some cases (this one, for example) the resample method can easily be replaced by a reindex and interpolation, but in other cases (I think) it can't.

For example, if I have

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 2))
withdates = df.set_index(pd.date_range('2012-01-01', periods=10))
withdates.resample('5D').std()

which gives me

                   0         1
2012-01-01  1.184582  0.492113
2012-01-06  0.533134  0.982562

but I can't produce the same result with df and resample. So I'm looking for something that would work as

 df.resample(5, np.std)

that would give me

          0         1
0  1.184582  0.492113
5  0.533134  0.982562

Does such a method exist? The only way I was able to do this was by manually splitting df into smaller dataframes, applying np.std to each, and then concatenating everything back together, which I find pretty slow and not smart at all.

Cheers

Recommended answer

Setup

import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B'])

You need to create the labels to group by yourself. I'd use:

(df.index.to_series() / 5).astype(int)

This gives you a series of values like [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, ...], one label per block of five rows. Then use this series as the key in a groupby.
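As a quick check of the labels, here is a minimal sketch on a 20-row frame with the default 0..19 index. It uses floor division (`// 5`) rather than `/ 5` followed by `astype(int)`; for a non-negative integer index the two should give the same labels. The variable name `labels` is illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B'])

# Floor-divide the index by the bin width to get one label per row.
labels = df.index.to_series() // 5
print(labels.tolist())
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]

# Grouping by the labels aggregates each block of 5 rows into one row.
print(df.groupby(labels).std().shape)
# (4, 2)
```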

You'll also need to specify the index for the new dataframe. I'd use:

df.index[4::5]

This takes the current index starting at the 5th position (hence the 4) and every 5th position after that, giving [4, 9, 14, 19]. I could have used df.index[::5] to label each group by its starting position instead, but I went with ending positions.

# assign as variable because I'm going to use it more than once.
s = (df.index.to_series() / 5).astype(int)

df.groupby(s).std().set_index(s.index[4::5])

It looks like:

           A         B
4   0.198019  0.320451
9   0.329750  0.408232
14  0.293297  0.223991
19  0.095633  0.376390
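The approach above assumes the index is exactly 0, 1, 2, ... so that positions and labels coincide. For an integer index that starts elsewhere, the same idea can be sketched by binning on the distance from the smallest index value. `resample_by_width` below is a hypothetical helper written for illustration, not a pandas function, and it labels each bin by its left edge rather than its end position.

```python
import numpy as np
import pandas as pd

def resample_by_width(df, width, func="std"):
    """Aggregate rows into bins of `width` index units (hypothetical helper)."""
    idx = df.index.to_series()
    # Bin number of each row, measured from the smallest index value.
    start = idx.min()
    bins = ((idx - start) // width).astype(int)
    out = df.groupby(bins).agg(func)
    # Relabel each bin by its left edge, expressed in index units.
    out.index = start + out.index * width
    return out

# Illustrative data: an integer index running from 100 to 119.
df = pd.DataFrame(np.random.rand(20, 2), columns=["A", "B"],
                  index=np.arange(100, 120))
down = resample_by_width(df, 5)
print(down.index.tolist())
# [100, 105, 110, 115]
```

Unlike resample on a DatetimeIndex, groupby only returns bins that contain at least one row, so gaps in the index produce missing bins rather than NaN rows. The same arithmetic should also work for a float index, though values sitting exactly on a bin edge can be affected by floating-point rounding.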

Other considerations

This is the equivalent of downsampling. We haven't addressed upsampling.

To go back from what we've produced to a dataframe indexed by something more frequent, we can use reindex like so:

# assign what we've done above to df_down
df_down = df.groupby(s).std().set_index(s.index[4::5])

df_up = df_down.reindex(range(20)).bfill()

It looks like:

           A         B
0   0.198019  0.320451
1   0.198019  0.320451
2   0.198019  0.320451
3   0.198019  0.320451
4   0.198019  0.320451
5   0.329750  0.408232
6   0.329750  0.408232
7   0.329750  0.408232
8   0.329750  0.408232
9   0.329750  0.408232
10  0.293297  0.223991
11  0.293297  0.223991
12  0.293297  0.223991
13  0.293297  0.223991
14  0.293297  0.223991
15  0.095633  0.376390
16  0.095633  0.376390
17  0.095633  0.376390
18  0.095633  0.376390
19  0.095633  0.376390

We could also reindex by something else, such as range(0, 20, 2), to upsample to the even integer indices.
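One thing to watch when the new index does not contain every old label: reindexing first and back-filling afterwards discards the old labels that are absent from the new index (here 9 and 19), so their values never reach the result. A minimal sketch of the difference, using a small stand-in frame with made-up values in place of df_down, and the `method="bfill"` argument of reindex, which looks up the next label in the original index:

```python
import pandas as pd

# Stand-in for the downsampled frame, labelled by bin end positions.
df_down = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]}, index=[4, 9, 14, 19])

even = range(0, 20, 2)

# Reindex first, then bfill: labels 9 and 19 are dropped by the reindex,
# so their values are lost and the tail of the result stays NaN.
lossy = df_down.reindex(even).bfill()

# Filling as part of the reindex uses the next label of the original index.
filled = df_down.reindex(even, method="bfill")

print(lossy["A"].tolist())
# [1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 3.0, nan, nan]
print(filled["A"].tolist())
# [1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0]
```

For range(20), as in the answer, every old label is kept, so the two forms agree.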
