Pandas Downsampling Issue


Question

I have a CSV file with two columns, a timestamp and a value that is either 0 or 1, like so:

17/08/2012 07:47:16 0
17/08/2012 07:54:31 1
17/08/2012 08:02:31 0
17/08/2012 09:22:33 0
17/08/2012 09:58:05 0
17/08/2012 12:26:59 1
17/08/2012 20:56:00 0
18/08/2012 10:04:06 0
18/08/2012 10:42:52 0
20/08/2012 07:22:02 0
20/08/2012 07:54:28 0
20/08/2012 08:01:58 0
20/08/2012 08:16:31 1
20/08/2012 08:26:38 0
20/08/2012 08:55:19 1
20/08/2012 09:00:09 0 
20/08/2012 09:26:11 0
20/08/2012 09:50:10 0
20/08/2012 10:33:37 0
20/08/2012 10:39:13 0
20/08/2012 10:39:35 1
20/08/2012 11:15:07 1
20/08/2012 11:19:15 0
20/08/2012 11:21:01 0

I load this file into a DataFrame, raw_data, and then set the index to the parsed timestamps:

ts_data = raw_data.set_index(pd.to_datetime(raw_data.when_created, dayfirst=True))

I then try to downsample the data to daily totals using:

daily_conversions = ts_data.resample('D', how='sum')
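For readers on a recent pandas: the how= keyword used above was removed in later releases, and the aggregation is now chained onto the resampler instead. The following is a minimal sketch of the same two steps under that assumption. The flag column name converted is hypothetical (the question does not name it), and only a few rows from the sample are used.

```python
import pandas as pd

# A few rows from the question; "converted" is a hypothetical column name.
raw_data = pd.DataFrame({
    "when_created": [
        "17/08/2012 07:47:16",
        "17/08/2012 07:54:31",
        "17/08/2012 12:26:59",
        "18/08/2012 10:04:06",
    ],
    "converted": [0, 1, 1, 0],
})

# Same indexing step as in the question: parse day-first dates into the index.
ts_data = raw_data.set_index(pd.to_datetime(raw_data["when_created"], dayfirst=True))

# Modern spelling of resample('D', how='sum'): chain the aggregation.
daily_conversions = ts_data["converted"].resample("D").sum()
print(daily_conversions)
```

Here the 17th sums to 2 and the 18th to 0.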

This works for every day (the data covers more than seven months; only a subset is shown here) except one, for which I get this output:

2012-08-20 NaN

As you can see from the data, this does not make sense. Interestingly, if I downsample at a higher frequency such as 'h', I get correct results for that specific day: null values for the hours that have no rows, 0 for the hours whose rows are all 0, and the correct sum for the hours that contain rows equal to 1. Any ideas, please?
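The hourly behaviour described here can be sketched with the rows from 17/08/2012. This assumes a recent pandas, where summing an empty bin returns 0 by default, so min_count=1 is passed to get the null values the question describes for hours with no rows.

```python
import pandas as pd

# All rows from 17/08/2012 in the sample data.
stamps = [
    "17/08/2012 07:47:16",
    "17/08/2012 07:54:31",
    "17/08/2012 08:02:31",
    "17/08/2012 09:22:33",
    "17/08/2012 09:58:05",
    "17/08/2012 12:26:59",
    "17/08/2012 20:56:00",
]
flags = [0, 1, 0, 0, 0, 1, 0]
ts = pd.Series(flags, index=pd.to_datetime(stamps, dayfirst=True))

# min_count=1: a bin needs at least one row, otherwise the sum is NaN.
hourly = ts.resample("h").sum(min_count=1)
print(hourly)
```

The 07:00 and 12:00 bins sum to 1, the 08:00 and 09:00 bins sum to 0, and hours with no rows (10:00 and 11:00, for instance) come out as NaN.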

Recommended answer

After a helpful comment above, I realised what was wrong: it is just a matter of labelling. The date that should return NaN is actually the 19th, but the default setting is label='right', so the result was showing under the 20th. When I add label='left' it works fine. Thanks.
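The effect of the label argument can be reproduced with a handful of rows from the question. This is a sketch rather than the asker's exact session: it assumes a recent pandas, where label='left' is the documented default for daily bins, so label='right' is passed explicitly to recreate the symptom, and min_count=1 is used so that the empty day shows as NaN instead of 0.

```python
import pandas as pd

# A few rows from the question; note there are no rows on 19/08/2012.
stamps = [
    "17/08/2012 07:54:31",
    "17/08/2012 12:26:59",
    "18/08/2012 10:04:06",
    "20/08/2012 08:16:31",
    "20/08/2012 08:55:19",
]
flags = [1, 1, 0, 1, 1]
ts = pd.Series(flags, index=pd.to_datetime(stamps, dayfirst=True))

# Each daily bin is labelled with its start date.
left = ts.resample("D", label="left").sum(min_count=1)

# Same bins, but each one is labelled with its end date (the next day).
right = ts.resample("D", label="right").sum(min_count=1)

print(left)   # NaN appears under 2012-08-19, the day with no rows
print(right)  # the same NaN now appears under 2012-08-20
```

The sums are identical in both cases; only the date attached to each bin moves by one day, which is why the empty 19th looked like a broken 20th.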
