Resampling in pandas
Question
I asked a question in another thread (Link) but got an incomplete answer, and no one has replied since, so I am posting a modified question. Briefly, I want to resample the following data:
Timestamp  L_x    L_y    L_a  R_x    R_y    R_a
2403950    621.3  461.3  313  623.3  461.8  260
2403954    622.5  461.3  312  623.3  462.6  260
2403958    623.1  461.5  311  623.4  464    261
2403962    623.6  461.7  310  623.7  465.4  261
2403966    623.8  461.5  309  623.9  466.1  261
2403970    620.9  461.4  309  623.8  465.9  259
2403974    621.7  461.1  308  623    464.8  258
2403978    622.1  461.1  308  621.9  463.9  256
2403982    622.5  461.5  308  621    463.4  255
2403986    622.4  462.1  307  620.7  463.3  254
The table continues like that. All timestamps are in milliseconds, and I want to resample the data into 100-millisecond bins ('100L'):
df = df.resample('100L')
The resulting table is:
Timestamp L_x L_y L_a R_x R_y R_a
2403900 621.3 461.3 313 623.3 461.8 260
2404000 622.5 461.3 312 623.3 462.6 260
2404100 623.1 461.5 311 623.4 464 261
2404200 623.6 461.7 310 623.7 465.4 261
2404300 623.8 461.5 309 623.9 466.1 261
But that is not the result I want, because the first timestamp in the original table is 2403950. The first bin should therefore cover 2403950 to 2404050, but instead it covers 2403900 to 2404000. I want something like the following:
Timestamp L_x L_y L_a R_x R_y R_a
2403950 ... ... ... ... ... ...
2404050 ... ... ... ... ... ...
2404150 ... ... ... ... ... ...
2404250 ... ... ... ... ... ...
2404350 ... ... ... ... ... ...
The remaining columns should hold the means of the corresponding values in the original table.
To do that, someone suggested that I calculate the offset, which in my case is 50 milliseconds, and do the following:
df.resample('100L', loffset='50L')
The offset only moves the labels 50 milliseconds forward; it does not change the mean values. For the first bin, for instance, it still computes the mean of the values from 2403900 to 2404000 instead of 2403950 to 2404050.
Thanks for your help.
Recommended answer
You're looking for the base kwarg.
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
In your case it looks like you want:
df.resample('100L', base=50)
Note: resample without a DatetimeIndex/PeriodIndex/TimedeltaIndex raises an error in recent pandas, so you should convert to DatetimeIndex before doing this.
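For readers on newer pandas, here is a minimal sketch of the same idea. As far as I know, `base` and `loffset` were deprecated in pandas 1.1 and removed in 2.0, with `offset` and `origin` as the replacements, so the sketch assumes pandas >= 1.1. It uses only the `Timestamp`, `L_x` and `R_a` columns from the question, converts the millisecond integers to a DatetimeIndex, and calls `.mean()` explicitly, since recent versions return a Resampler object rather than the averaged frame.

```python
import pandas as pd

# Rows from the question (a subset of the columns); timestamps are in milliseconds.
df = pd.DataFrame(
    {
        "Timestamp": [2403950, 2403954, 2403958, 2403962, 2403966,
                      2403970, 2403974, 2403978, 2403982, 2403986],
        "L_x": [621.3, 622.5, 623.1, 623.6, 623.8,
                620.9, 621.7, 622.1, 622.5, 622.4],
        "R_a": [260, 260, 261, 261, 261, 259, 258, 256, 255, 254],
    }
)

# resample needs a datetime-like index, so interpret the integers as
# milliseconds since the epoch and move them into the index.
df.index = pd.to_datetime(df.pop("Timestamp"), unit="ms")

# Default 100 ms bins: edges fall on multiples of 100 ms.
default_bins = df.resample("100ms").mean()

# Shift the bin edges by 50 ms (the role that base=50 played in older pandas).
shifted_bins = df.resample("100ms", offset="50ms").mean()

# Alternative: anchor the bins at the first timestamp in the data.
start_bins = df.resample("100ms", origin="start").mean()

print(default_bins.index[0], shifted_bins.index[0], start_bins.index[0])
```

With `offset="50ms"` the bin edges themselves move, so the means are computed over 2403950 to 2404050 rather than only the labels being shifted. `origin="start"` gives the same bins here without having to work out the 50 ms offset by hand.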