How to resample using forward fill in Python

Question

My DataFrame df3 looks something like this:

    Id           Timestamp         Data    Group_Id    
0    1     2018-01-01 00:00:05.523 125.5   101 
1    2     2018-01-01 00:00:05.757 125.0   101 
2    3     2018-01-02 00:00:09.507 127.0   52  
3    4     2018-01-02 00:00:13.743 126.5   52  
4    5     2018-01-03 00:00:15.407 125.5   50
                    ...

11   11    2018-01-01 00:00:07.523 125.5   120 
12   12    2018-01-01 00:00:08.757 125.0   120 
13   13    2018-01-04 00:00:14.507 127.0   300  
14   14    2018-01-04 00:00:15.743 126.5   300  
15   15    2018-01-05 00:00:19.407 125.5   350

I want to resample to one-second intervals using forward fill (ffill) so that it looks like this:

    Id           Timestamp         Data    Group_Id    
0    1     2018-01-01 00:00:06.000 125.00    101 
1    2     2018-01-01 00:00:07.000 125.00    101 
2    3     2018-01-01 00:00:08.000 125.00    101 
3    4     2018-01-02 00:00:09.000 125.00     52 
4    5     2018-01-02 00:00:10.000 127.00     52 

                    ...

My code:

import pandas as pd

def resample(df):
    indexing = df[['Timestamp','Data']]
    indexing['Timestamp']=pd.to_datetime(indexing['Timestamp'])
    indexing =indexing.set_index('Timestamp')
    indexing1= indexing.resample('1S',fill_method='ffill')
    # indexing1 = indexing1.resample('D')
    return indexing1
indexing = resample(df3)

But I get this error:

ValueError: cannot reindex a non-unique index with a method or limit

I don't quite understand what this error means. In a similar question, @jezrael suggested using drop_duplicates together with groupby. I'm not sure what this does to the data, since there don't seem to be any duplicates in my data. Can someone explain this, please? Thanks.

Recommended answer

This error is caused by rows like the following:

    Id           Timestamp         Data    Group_Id    
0    1     2018-01-01 00:00:05.523 125.5   101 
1    2     2018-01-01 00:00:05.757 125.0   101 

When you round these two timestamps to the nearest second, they both become 2018-01-01 00:00:06, and pandas doesn't know which Data value to pick because it has two to choose from. Instead, use an aggregation function such as last (though mean, max or min may also be suitable) to select one value per second. Then you can apply the forward fill.
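To make the choice of aggregation concrete, here is a small sketch that uses only the two conflicting rows shown above, rounded with dt.round('s'). The variable names (binned, choices) are just for illustration.

```python
import pandas as pd

# The two conflicting rows from the question
df = pd.DataFrame(
    {
        "Timestamp": ["2018-01-01 00:00:05.523", "2018-01-01 00:00:05.757"],
        "Data": [125.5, 125.0],
    }
)
# Round to the nearest second: both rows become 2018-01-01 00:00:06
df["Timestamp"] = pd.to_datetime(df["Timestamp"]).dt.round("s")
binned = df.set_index("Timestamp")["Data"].resample("1s")

# Each aggregation reduces the two values in the shared second to one value
choices = {
    "last": binned.last().iloc[0],
    "mean": binned.mean().iloc[0],
    "max": binned.max().iloc[0],
    "min": binned.min().iloc[0],
}
print(choices)
```

With last, the value for 00:00:06 is 125.0, which matches the 125.00 in the expected output from the question; mean would give 125.25 instead. Which aggregation is appropriate depends on what the Data column represents.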

Example:

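from io import StringIO
import pandas as pd

# Sample rows from the question; columns are separated by two or more spaces
data = """Id  Timestamp                Data   Group_Id
1   2018-01-01 00:00:05.523  125.5  101
2   2018-01-01 00:00:05.757  125.0  101
3   2018-01-02 00:00:09.507  127.0  52
4   2018-01-02 00:00:13.743  126.5  52
5   2018-01-03 00:00:15.407  125.5  50"""
df = pd.read_csv(StringIO(data), sep=r'\s{2,}', engine='python')

# Round to the nearest second: Ids 1 and 2 both become 2018-01-01 00:00:06
df['Timestamp'] = pd.to_datetime(df['Timestamp']).dt.round('s')
df = df.set_index('Timestamp')

# Keep the last value in each one-second bin, then forward-fill the empty bins
df = df.resample('1s').last().ffill()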
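To see what the error message itself means, here is a minimal sketch on hypothetical data (not the full df3). It assumes that, in the older pandas versions that accepted fill_method, a forward-filling upsample reindexes the data onto the new one-second grid with method='ffill', and that kind of reindex requires every timestamp in the index to be unique. The duplicates involved are duplicate timestamps, not duplicate rows, which is why drop_duplicates with its default settings (comparing all columns) may find nothing. The exact error text differs between pandas versions, and newer versions no longer accept fill_method in resample; there the equivalent spelling is .resample('1s').ffill().

```python
import pandas as pd

# Hypothetical series: two readings share the same (already rounded) timestamp
s = pd.Series(
    [125.5, 125.0, 127.0],
    index=pd.to_datetime(
        ["2018-01-01 00:00:06", "2018-01-01 00:00:06", "2018-01-01 00:00:09"]
    ),
)
# One-second grid to forward-fill onto
target = pd.date_range("2018-01-01 00:00:06", "2018-01-01 00:00:10", freq="1s")

# Reindexing with a fill method needs a unique index, so this raises ValueError
# (the message wording depends on the pandas version)
try:
    s.reindex(target, method="ffill")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Collapse duplicate timestamps first (keep the last reading), then forward-fill
deduped = s.groupby(level=0).last()
filled = deduped.reindex(target, method="ffill")
print(filled)
```

groupby(level=0).last() keeps one value per timestamp, which is roughly the effect of the drop_duplicates/groupby suggestion mentioned in the question; resample('1s').last() in the answer does the same thing while also building the one-second grid.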
