Resampling in pandas


Question

I asked this question on another thread (Link), but I got an incomplete answer and no one has followed up, so I am posting a modified version here. Briefly, I want to resample the following data:

Timestamp  L_x    L_y    L_a  R_x    R_y    R_a
2403950    621.3  461.3  313  623.3  461.8  260
2403954    622.5  461.3  312  623.3  462.6  260
2403958    623.1  461.5  311  623.4  464    261
2403962    623.6  461.7  310  623.7  465.4  261
2403966    623.8  461.5  309  623.9  466.1  261
2403970    620.9  461.4  309  623.8  465.9  259
2403974    621.7  461.1  308  623    464.8  258
2403978    622.1  461.1  308  621.9  463.9  256
2403982    622.5  461.5  308  621    463.4  255
2403986    622.4  462.1  307  620.7  463.3  254

The table goes on like that. All timestamps are in milliseconds, and I want to resample the data into 100-millisecond bins ('100L'):

df = df.resample('100L')

The resulting table is:

Timestamp  L_x    L_y    L_a  R_x    R_y    R_a
2403900    621.3  461.3  313  623.3  461.8  260
2404000    622.5  461.3  312  623.3  462.6  260
2404100    623.1  461.5  311  623.4  464    261
2404200    623.6  461.7  310  623.7  465.4  261
2404300    623.8  461.5  309  623.9  466.1  261

But that is not the result I want. The first timestamp in the original table is 2403950, so the first bin should cover 2403950 to 2404050, but instead it covers 2403900 to 2404000. I want something like the following:

Timestamp  L_x  L_y  L_a  R_x  R_y  R_a
2403950    ...  ...  ...  ...  ...  ...
2404050    ...  ...  ...  ...  ...  ...
2404150    ...  ...  ...  ...  ...  ...
2404250    ...  ...  ...  ...  ...  ...
2404350    ...  ...  ...  ...  ...  ...

The remaining columns are the means of the values in the original table. Someone suggested that I calculate the offset, which in my case is 50 milliseconds, and do the following:

df.resample('100L', loffset='50L')

But loffset only moves the labels 50 milliseconds forward; it does not change the mean values. For the first bin, for instance, it still averages the values from 2403900 to 2404000 instead of those from 2403950 to 2404050.

Thanks for your help.

Recommended answer

You're looking for the base kwarg. From the documentation:

base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for '5min' frequency, base could range from 0 through 4. Defaults to 0.
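To make the "origin" idea concrete, here is a minimal sketch in plain Python of how a bin's left edge follows from the bin width and the origin. The function name `bin_start` and its parameters are hypothetical and only illustrate the arithmetic; this is not how pandas implements resampling internally.

```python
def bin_start(timestamp_ms: int, width_ms: int, origin_ms: int = 0) -> int:
    """Return the left edge of the bin that contains timestamp_ms.

    Bins are half-open intervals [edge, edge + width_ms), and the edges
    sit at origin_ms + k * width_ms for integer k.
    """
    return timestamp_ms - (timestamp_ms - origin_ms) % width_ms


# With the default origin of 0, the first sample falls in a bin starting at 2403900.
default_edge = bin_start(2403950, 100)

# Shifting the origin by 50 ms moves the edges themselves, so the same
# sample now opens a bin that starts at 2403950.
shifted_edge = bin_start(2403950, 100, origin_ms=50)

# The last sample shown in the question lands in that same shifted bin.
last_edge = bin_start(2403986, 100, origin_ms=50)
```

Shifting the origin changes which rows are averaged together, whereas loffset in the question only relabels bins whose contents are already fixed.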


In your case it looks like you want:

df.resample('100L', base=50)

Note: calling resample on an index that is not a DatetimeIndex, PeriodIndex, or TimedeltaIndex raises an error in recent pandas, so you should convert the index to a DatetimeIndex before doing this.
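Putting the note into practice, the sketch below converts the integer millisecond timestamps to a DatetimeIndex and then resamples. It assumes pandas 1.1 or later, where the `base` keyword is deprecated (and removed in 2.0) in favor of `offset`, so it uses `offset="50ms"` as the stand-in for `base=50`. It also writes the frequency as "100ms" rather than "100L", since newer releases prefer the "ms" alias. Treating the timestamps as milliseconds since the Unix epoch is an assumption made only to obtain a DatetimeIndex, and only two of the question's columns are included for brevity.

```python
import pandas as pd

# The rows shown in the question (L_x and R_a only, to keep the example short).
df = pd.DataFrame({
    "Timestamp": [2403950, 2403954, 2403958, 2403962, 2403966,
                  2403970, 2403974, 2403978, 2403982, 2403986],
    "L_x": [621.3, 622.5, 623.1, 623.6, 623.8, 620.9, 621.7, 622.1, 622.5, 622.4],
    "R_a": [260, 260, 261, 261, 261, 259, 258, 256, 255, 254],
})

# Assumption: interpret the integers as milliseconds since the Unix epoch.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="ms")
df = df.set_index("Timestamp")

# Default bin edges fall on multiples of 100 ms, so the first bin starts at 2403900.
default_binned = df.resample("100ms").mean()

# offset="50ms" shifts the bin edges, so the first bin starts at 2403950.
binned = df.resample("100ms", offset="50ms").mean()

# Convert the bin labels back to integer milliseconds for comparison with the input.
labels_ms = (binned.index - pd.Timestamp(0)) // pd.Timedelta(milliseconds=1)
print(binned)
print(list(labels_ms))
```

All ten rows shown fall between 2403950 and 2404050, so `binned` has a single row here. With the full data set, each later row would cover the next 100 ms window starting at 2404050, 2404150, and so on.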
