Calculating daily average from irregular time series using pandas


Problem description


I am trying to obtain daily averages from an irregular time series from a csv-file.

The data in the csv-file start at 13:00 on 20 September 2013 and run till 10:57 on 14 January 2014:

Time                    Values
20/09/2013 13:00        5.133540
20/09/2013 13:01        5.144993
20/09/2013 13:02        5.158208
20/09/2013 13:03        5.170542
20/09/2013 13:04        5.167899
20/09/2013 13:25        5.168780
20/09/2013 13:26        5.179351
...

I import them with:

import pandas as pd
data = pd.read_csv('<file name>', parse_dates={'Timestamp': ['Time']}, index_col='Timestamp')

This results in

                           Values
Timestamp                          
2013-09-20 13:00:00        5.133540
2013-09-20 13:01:00        5.144993
2013-09-20 13:02:00        5.158208
2013-09-20 13:03:00        5.170542
2013-09-20 13:04:00        5.167899
2013-09-20 13:25:00        5.168780
2013-09-20 13:26:00        5.179351
...

And then I do

dataDailyAv = data.resample('D', how = 'mean')

This results in

                  Values
Timestamp                 
2013-01-10        8.623744
2013-01-11             NaN
2013-01-12             NaN
2013-01-13             NaN
2013-01-14             NaN
...

In other words, the result contains dates that do not appear in the original data, and for some of these dates (e.g. 10 January 2013), there even appears a value.

Any ideas about what is going wrong?

Thanks.

Edit: apparently something goes wrong with the parsing of the date: 01/10/2013 is interpreted as 10 January 2013 instead of 1 October 2013. This can be solved by editing the date format in the csv-file, but is there a way to specify the date format in read_csv?

Solution

You want dayfirst=True, one of the many tweaks listed in the read_csv docs.
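
As a minimal sketch of how that might look: the snippet below reads a small in-memory sample instead of the real file. The sample rows, the comma delimiter, and the variable names are assumptions made for illustration; only the `Time` and `Values` column names and the day/month/year layout come from the question. It also uses `.resample('D').mean()`, since the `how=` keyword from the question was deprecated in later pandas releases.

```python
import io

import pandas as pd

# Hypothetical sample in the same day/month/year layout as the question's file.
# The comma delimiter is an assumption; adjust `sep` for the real file.
csv_text = """Time,Values
20/09/2013 13:00,5.133540
20/09/2013 13:01,5.144993
01/10/2013 09:00,8.500000
01/10/2013 09:30,8.700000
"""

data = pd.read_csv(
    io.StringIO(csv_text),   # replace with the path to the real csv file
    parse_dates=["Time"],    # parse this column as datetimes
    dayfirst=True,           # read 01/10/2013 as 1 October, not 10 January
    index_col="Time",
)

# Daily averages; days without any samples appear as NaN.
daily_mean = data.resample("D").mean()

# Optionally keep only the days that actually have data.
daily_mean_observed = daily_mean.dropna()

print(daily_mean_observed)
```

With `dayfirst=True` the index runs from 20 September 2013 onward, so no January 2013 dates should appear. The NaN rows that remain after resampling are simply calendar days with no measurements, and `dropna()` removes them if they are not wanted.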
