How to resample using forward fill in Python
Question
My DataFrame df3 looks something like this:
Id Timestamp Data Group_Id
0 1 2018-01-01 00:00:05.523 125.5 101
1 2 2018-01-01 00:00:05.757 125.0 101
2 3 2018-01-02 00:00:09.507 127.0 52
3 4 2018-01-02 00:00:13.743 126.5 52
4 5 2018-01-03 00:00:15.407 125.5 50
...
11 11 2018-01-01 00:00:07.523 125.5 120
12 12 2018-01-01 00:00:08.757 125.0 120
13 13 2018-01-04 00:00:14.507 127.0 300
14 14 2018-01-04 00:00:15.743 126.5 300
15 15 2018-01-05 00:00:19.407 125.5 350
I want to resample to one-second intervals using ffill so that it looks like this:
Id Timestamp Data Group_Id
0 1 2018-01-01 00:00:06.000 125.00 101
1 2 2018-01-01 00:00:07.000 125.00 101
2 3 2018-01-01 00:00:08.000 125.00 101
3 4 2018-01-02 00:00:09.000 125.00 52
4 5 2018-01-02 00:00:10.000 127.00 52
...
My code:
def resample(df):
    indexing = df[['Timestamp', 'Data']]
    indexing['Timestamp'] = pd.to_datetime(indexing['Timestamp'])
    indexing = indexing.set_index('Timestamp')
    indexing1 = indexing.resample('1S', fill_method='ffill')
    # indexing1 = indexing1.resample('D')
    return indexing1

indexing = resample(df3)
But this raises an error:
ValueError: cannot reindex a non-unique index with a method or limit
I don't quite understand what this error means. In a similar question, @jezrael suggested using drop_duplicates with groupby. I am not sure what this does to the data, as there seem to be no duplicates in my data. Can someone explain this, please? Thanks.
Recommended answer
This error is caused by the following two rows:
Id Timestamp Data Group_Id
0 1 2018-01-01 00:00:05.523 125.5 101
1 2 2018-01-01 00:00:05.757 125.0 101
When both of these timestamps are rounded to the nearest second, they both become 2018-01-01 00:00:06, so the index is no longer unique and pandas doesn't know which Data value to pick because it has two to choose from. Instead, use an aggregation function such as last (mean, max, or min may also be suitable) to select one value per second. Then you can apply the forward fill.
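To make the collision concrete, here is a minimal sketch that uses only the two rows quoted above. The variable names (ts, data, has_duplicates, last_value, mean_value) are illustrative and not part of the original question. It shows that rounding produces a duplicated index label and that an aggregation reduces it to one value per second.

```python
import pandas as pd

# The two rows from the question that fall within the same second.
ts = pd.to_datetime(["2018-01-01 00:00:05.523", "2018-01-01 00:00:05.757"])
data = pd.Series([125.5, 125.0], index=ts.round("s"), name="Data")

# After rounding, both rows share a single index label.
has_duplicates = bool(data.index.duplicated().any())

# Grouping by the index label reduces the two values to one per second.
last_value = data.groupby(level=0).last()
mean_value = data.groupby(level=0).mean()
```

Which aggregation is appropriate depends on what the data represents; last keeps the most recent reading, while mean blends the readings that fall in the same second.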
Example:
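from io import StringIO
import pandas as pd

# Columns are separated by two or more spaces so that the single space
# inside each timestamp is preserved.
df = pd.read_table(StringIO("""Id  Timestamp  Data  Group_Id
0  1  2018-01-01 00:00:05.523  125.5  101
1  2  2018-01-01 00:00:05.757  125.0  101
2  3  2018-01-02 00:00:09.507  127.0  52
3  4  2018-01-02 00:00:13.743  126.5  52
4  5  2018-01-03 00:00:15.407  125.5  50"""), sep=r'\s\s+', engine='python')

# Round to the nearest second; the first two rows now share a timestamp.
df['Timestamp'] = pd.to_datetime(df['Timestamp']).dt.round('s')
df = df.set_index('Timestamp')
# Keep the last value in each one-second bin, then forward-fill the gaps.
df = df.resample('1s').last().ffill()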
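The desired output in the question appears to keep each Group_Id separate, whereas the example above fills across the whole frame. If that reading is right (it is an assumption, not something the answer states), one possible sketch is to resample within each group. It uses four rows from the question, and per_group is an illustrative name.

```python
import pandas as pd

# Four rows taken from the question: two in group 101, two in group 52.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2018-01-01 00:00:05.523",
        "2018-01-01 00:00:05.757",
        "2018-01-02 00:00:09.507",
        "2018-01-02 00:00:13.743",
    ]),
    "Data": [125.5, 125.0, 127.0, 126.5],
    "Group_Id": [101, 101, 52, 52],
})
df["Timestamp"] = df["Timestamp"].dt.round("s")

# Resample each group separately, keep the last value per second,
# then forward-fill only within the same group.
per_group = (
    df.set_index("Timestamp")
      .groupby("Group_Id")["Data"]
      .resample("1s")
      .last()
      .groupby(level="Group_Id")
      .ffill()
)
```

The result is a Series indexed by (Group_Id, Timestamp), so values from one group are never carried forward into another, and no rows are generated for the gaps between groups.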