Pandas - Using the merge_asof function on the index


Question

The code is:

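import datetime

import numpy as np
import pandas as pd

# pd.datetime was only an alias of datetime.datetime and has been removed from pandas
dateparse = lambda x: datetime.datetime.strptime(x, '%d %m %Y %H %M')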
vento = pd.read_csv('dados_tpm.txt', header=0, delim_whitespace= True, parse_dates = [['Dia', 'Mes', 'Ano', 'Hora','Minuto']], index_col = False, date_parser = dateparse)
vento1 = vento.rename(columns={'Dia_Mes_Ano_Hora_Minuto': 'Data'})
vento0 = vento1.set_index('Data')
vento_time = pd.DataFrame({'Data':pd.date_range(start='2016-07-12 18:00:00',end='2017-02-28 21:00:00',freq='3H')})
vento_time0 = vento_time.set_index('Data')
vento_2 = pd.merge_asof(vento_time0,vento0, on='Index', tolerance=pd.Timedelta("5 minutes")).fillna('NAN')
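The parsing step above relies on date_parser and on the nested parse_dates=[[...]] form (plus delim_whitespace), which recent pandas versions deprecate. A minimal sketch of the same step with pd.to_datetime, assuming the whitespace-separated layout implied by the read_csv call; the inline sample rows are hypothetical stand-ins for dados_tpm.txt:

```python
import io

import pandas as pd

# Hypothetical sample in the layout the question's read_csv call implies:
# whitespace-separated, with the date split over Dia/Mes/Ano/Hora/Minuto.
raw = """Dia Mes Ano Hora Minuto Vel Dir
12 7 2016 16 17 9.8 13.8
12 7 2016 16 18 10.9 1.8
12 7 2016 16 19 10.0 11.1
"""

vento = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# pd.to_datetime can assemble timestamps from columns named year/month/day/hour/minute
parts = vento[["Ano", "Mes", "Dia", "Hora", "Minuto"]].rename(
    columns={"Ano": "year", "Mes": "month", "Dia": "day", "Hora": "hour", "Minuto": "minute"}
)
vento["Data"] = pd.to_datetime(parts)

# Drop the date parts and keep Data as a sorted column
vento1 = vento.drop(columns=["Dia", "Mes", "Ano", "Hora", "Minuto"]).sort_values("Data")
print(vento1)
```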

vento0 looks like:

Index               Vel Dir
2016-07-12 16:17:00 9.8  13.8
2016-07-12 16:18:00 10.9 1.8
2016-07-12 16:19:00 10.0 11.1
2016-07-12 16:20:00 11.0 11.0
...                 ...  ...
...                 ...  ...
2017-02-28 22:34:00 9.2  13.7

vento_time0 looks like:

Index
2016-07-12 18:00:00
2016-07-12 21:00:00
2016-07-13 00:00:00
2016-07-13 03:00:00
...        ...
...        ...
2017-02-28 21:00:00

My data has a one-minute interval and is not regularised. The objective is to put it on a 3-hour interval, taking for each point the closest data within a range of five minutes. But when merge_asof is used, this error appears: KeyError: 'Index'. I also tried Data, the actual name of the index, but got the same error. The expected output is:

Index                 Vel  Dir
2016-07-12 18:00:00   8.0  55
2016-07-12 21:00:00   16.0 67
2016-07-13 00:00:00   NAN  NAN
2016-07-13 03:00:00   19.0 83
...        ...
...        ...
2017-02-28 21:00:00   NAN  NAN

Can anyone help? Is there a way to use the merge_asof function on the index?

Recommended answer

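Do something like this: keep Data as a column and use .sort_values(by='Data') instead of .set_index. merge_asof looks the on key up by name (neither frame has a column called Index, hence KeyError: 'Index'), and both frames must be sorted by that key: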

vento0 = vento1.sort_values(by = 'Data')
vento_time0 = vento_time.sort_values(by = 'Data')
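The sorting is not optional: merge_asof refuses unsorted keys. A small sketch with hypothetical data (three out-of-order readings and a two-point grid) showing the error on unsorted input and the merge succeeding once both frames are sorted:

```python
import pandas as pd

# Hypothetical minute readings, deliberately out of time order
vento1 = pd.DataFrame({
    "Data": pd.to_datetime(["2016-07-12 18:02", "2016-07-12 17:58", "2016-07-12 21:01"]),
    "Vel": [8.0, 7.5, 16.0],
    "Dir": [55.0, 50.0, 67.0],
})
# Hypothetical 3-hour grid (two points) with Data kept as a column
vento_time = pd.DataFrame({"Data": pd.to_datetime(["2016-07-12 18:00", "2016-07-12 21:00"])})

raised = False
try:
    pd.merge_asof(vento_time, vento1, on="Data")
except ValueError as err:
    # merge_asof rejects keys that are not sorted
    raised = True
    print("unsorted input rejected:", err)

vento0 = vento1.sort_values(by="Data")
vento_time0 = vento_time.sort_values(by="Data")
result = pd.merge_asof(vento_time0, vento0, on="Data")
print(result)
```

With the default direction='backward' and no tolerance, the 21:00 grid point picks up the 18:02 reading, three hours stale, which is why the tolerance argument matters.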

After that, this should work:

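vento_2 = pd.merge_asof(vento_time0, vento0, on='Data',
                        # closest row before or after; the default 'backward' only looks earlier
                        direction='nearest',
                        tolerance=pd.Timedelta("5 minutes")).fillna('NAN')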
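merge_asof's default direction is 'backward', so each 3-hour point only considers readings at or before it; a reading two minutes after the grid time is ignored. A small sketch with hypothetical data comparing the default against direction='nearest', both with the five-minute tolerance from the question:

```python
import pandas as pd

# Hypothetical readings around two 3-hour grid points
vento0 = pd.DataFrame({
    "Data": pd.to_datetime(["2016-07-12 17:50", "2016-07-12 18:02", "2016-07-12 20:40"]),
    "Vel": [7.5, 8.0, 16.0],
    "Dir": [50.0, 55.0, 67.0],
})
vento_time0 = pd.DataFrame({"Data": pd.to_datetime(["2016-07-12 18:00", "2016-07-12 21:00"])})
tol = pd.Timedelta("5 minutes")

# Default direction='backward': only rows at or before each grid time are candidates
backward = pd.merge_asof(vento_time0, vento0, on="Data", tolerance=tol)
# direction='nearest': the closest row on either side, still within 5 minutes
nearest = pd.merge_asof(vento_time0, vento0, on="Data", tolerance=tol, direction="nearest")
print(backward)
print(nearest)
```

Here the backward merge finds nothing within five minutes for either point, while the nearest merge picks the 18:02 reading for 18:00 and still leaves 21:00 empty.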

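Make sure your 'NAN' strings become real "not a number" values again. DataFrame.convert_objects(convert_numeric=True) has since been removed from pandas, so convert the value columns with pd.to_numeric instead:

for col in ['Vel', 'Dir']:
    # 'NAN' strings (and anything else non-numeric) become NaN; numbers become floats
    vento_2[col] = pd.to_numeric(vento_2[col], errors='coerce')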
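Filling with the string 'NAN' is what makes the conversion necessary: it turns the value columns into object dtype. A small sketch with a hypothetical merged frame, showing the dtype change and the pd.to_numeric round-trip:

```python
import numpy as np
import pandas as pd

# Hypothetical merge result: one matched grid point, one without a match
merged = pd.DataFrame({
    "Data": pd.to_datetime(["2016-07-12 18:00", "2016-07-13 00:00"]),
    "Vel": [8.0, np.nan],
    "Dir": [55.0, np.nan],
})

filled = merged.fillna("NAN")  # the string forces Vel/Dir to object dtype

# Replace each value column with a numeric version; 'NAN' is coerced to NaN
restored = filled.assign(
    Vel=pd.to_numeric(filled["Vel"], errors="coerce"),
    Dir=pd.to_numeric(filled["Dir"], errors="coerce"),
)
print(filled.dtypes)
print(restored.dtypes)
```

If no textual placeholder is needed, skipping .fillna('NAN') avoids the round-trip entirely, since merge_asof already leaves NaN where nothing falls within the tolerance.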

Once merge_asof has run and the 'NAN' values are converted, you can set the index:

vento_2.set_index(['Data'], inplace=True)
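To answer the literal question as well: merge_asof also accepts left_index=True and right_index=True, which use each frame's index as the key, so the question's set_index version can work without moving Data back into a column (both indexes still have to be sorted). A minimal sketch with hypothetical data; direction='nearest' is added to match "closest data within five minutes":

```python
import pandas as pd

# Hypothetical minute data indexed by timestamp, like the question's vento0
vento0 = pd.DataFrame(
    {"Vel": [7.5, 8.0, 16.0], "Dir": [50.0, 55.0, 67.0]},
    index=pd.DatetimeIndex(["2016-07-12 17:50", "2016-07-12 18:02", "2016-07-12 20:40"], name="Data"),
)
# 3-hour grid as an index with no columns, like the question's vento_time0
vento_time0 = pd.DataFrame(
    index=pd.DatetimeIndex(["2016-07-12 18:00", "2016-07-12 21:00"], name="Data")
)

vento_2 = pd.merge_asof(
    vento_time0.sort_index(),
    vento0.sort_index(),
    left_index=True,   # use each frame's index as the asof key
    right_index=True,
    direction="nearest",
    tolerance=pd.Timedelta("5 minutes"),
)
print(vento_2)
```

The result keeps the grid timestamps of the left frame as its index, so no set_index call is needed afterwards.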

