使用大 pandas 计算不规则时间序列的每日平均值 [英] Calculating daily average from irregular time series using pandas
问题描述
我试图从csv文件中获取每日平均值。
csv文件中的数据从20点13:00开始2013年9月,2014年1月14日至10:57:
时间值
20/09/2013 13: 00 5.133540
20/09/2013 13:01 5.144993
20/09/2013 13:02 5.158208
20/09/2013 13:03 5.170542
20/09/2013 13:04 5.167899 20/09/2013 13:25 5.168780
20/09/2013 13:26 5.179351
...
我用以下形式导入它们:
import pandas as pd
= pd.read_csv('< file name>',parse_dates = {'Timestamp':'Time']},index_col ='Timestamp')
这会导致
值
Timestamp
2013- 09-20 13:00:00 5.133540
2013-09-20 13:01:00 5.144993
2013-09-20 13:02:00 5.158208
2013-09-20 13: 03:00 5.170542
2013-09-20 13:04:00 5.167899
2013-09-20 13:25:00 5.168780
2013-09-20 13:26:00 5.179351
...
然后我做
dataDailyAv = data.resample('D',how ='mean')
这会导致
值
Timestamp
2013-01-10 8.623744
2013-01-11 NaN
2013-01-12 NaN
2013-01-13 NaN
2013-01-14 NaN
...
换句话说,结果包含未出现在原始数据中的日期, (例如
感谢您对我们的支持! 。
编辑:显然日期解析有问题:01/10/2013解释为2013年1月10日,而不是2013年10月1日。这可以解决
解决方案您可以通过编辑csv文件中的日期格式来指定日期格式<
dayfirst = True .io.parsers.read_csv.htmlrel =nofollow> read_csv docs 。 I am trying to obtain daily averages from an irregular time series from a csv-file.
The data in the csv-file start at 13:00 on 20 September 2013 and run till 10:57 on 14 January 2014:
Time Values 20/09/2013 13:00 5.133540 20/09/2013 13:01 5.144993 20/09/2013 13:02 5.158208 20/09/2013 13:03 5.170542 20/09/2013 13:04 5.167899 20/09/2013 13:25 5.168780 20/09/2013 13:26 5.179351 ...
I import them with:
import pandas as pd data = pd.read_csv('<file name>', parse_dates={'Timestamp':'Time']},index_col='Timestamp')
This results in
Values Timestamp 2013-09-20 13:00:00 5.133540 2013-09-20 13:01:00 5.144993 2013-09-20 13:02:00 5.158208 2013-09-20 13:03:00 5.170542 2013-09-20 13:04:00 5.167899 2013-09-20 13:25:00 5.168780 2013-09-20 13:26:00 5.179351 ...
And then I do
dataDailyAv = data.resample('D', how = 'mean')
This results in
Values Timestamp 2013-01-10 8.623744 2013-01-11 NaN 2013-01-12 NaN 2013-01-13 NaN 2013-01-14 NaN ...
In other words, the result contains dates that do not appear in the original data, and for some of these dates (e.g. 10 January 2013), there even appears a value.
Any ideas about what is going wrong?
Thanks.
Edit: apparently something goes wrong with the parsing of the date: 01/10/2013 is interpreted as 10 January 2013 instead of 1 October 2013. This can be solved by editing the date format in the csv-file, but is there a way to specify the date format in read_csv?
解决方案You want
dayfirst=True
, one of the many tweaks listed in the read_csv docs.这篇关于使用大 pandas 计算不规则时间序列的每日平均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!