How do I sum time series data by day in Python? resample.sum() has no effect

Problem description

I am new to Python. How do I sum data based on date and plot the result?

I have a Series object with data like:

2017-11-03 07:30:00      NaN
2017-11-03 09:18:00      NaN
2017-11-03 10:00:00      NaN
2017-11-03 11:08:00      NaN
2017-11-03 14:39:00      NaN
2017-11-03 14:53:00      NaN
2017-11-03 15:00:00      NaN
2017-11-03 16:00:00      NaN
2017-11-03 17:03:00      NaN
2017-11-03 17:42:00    800.0
2017-11-04 07:27:00    600.0
2017-11-04 10:10:00      NaN
2017-11-04 11:48:00      NaN
2017-11-04 12:58:00    500.0
2017-11-04 13:40:00      NaN
2017-11-04 15:15:00      NaN
2017-11-04 16:21:00      NaN
2017-11-04 17:37:00    500.0
2017-11-04 21:37:00      NaN
2017-11-05 03:00:00      NaN
2017-11-05 06:30:00      NaN
2017-11-05 07:19:00      NaN
2017-11-05 08:31:00    200.0
2017-11-05 09:31:00    500.0
2017-11-05 12:03:00      NaN
2017-11-05 12:25:00    200.0
2017-11-05 13:11:00    500.0
2017-11-05 16:31:00      NaN
2017-11-05 19:00:00    500.0
2017-11-06 08:08:00      NaN

I have the following code:

# load packages
import pandas as pd
import matplotlib.pyplot as plt

# import painkiller data
df = pd.read_csv('/Users/user/Documents/health/PainOverTime.csv',delimiter=',')

# plot bar graph of date and painkiller amount
times = pd.to_datetime(df.loc[:,'Time'])

ts = pd.Series(df.loc[:,'acetaminophen'].values, index = times,
               name = 'Painkiller over Time')
ts.plot()

This gives me the following line(?) graph:

It's a start; now I want to sum the doses by date. However, this code fails to effect any change: The resulting plot is the same. What is wrong?

ts.resample('D',closed='left', label='right').sum()
ts.plot()

I have also tried ts.resample('D').sum(), ts.resample('1d').sum(), ts.resample('1D').sum(), but there is no change in the plot.

Is .resample even the correct function? I understand resampling to be sampling from the data, e.g. randomly taking one point per day, whereas I want to sum each day's values.

Namely, I'm hoping for some result (based on the above data) like:

2017-11-03 800
2017-11-04 1600
2017-11-05 1900
2017-11-06 NaN

Solution

This answer helped me see that I needed to assign the result to a new object (if that's the right terminology), because ts.resample('D').sum() returns a new Series and leaves ts unchanged:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/Users/user/Documents/health/PainOverTime.csv',delimiter=',')
# plot bar graph of date and painkiller amount
times = pd.to_datetime(df.loc[:,'Time'])

# raw plot of data
ts = pd.Series(df.loc[:,'acetaminophen'].values, index = times,
               name = 'Painkiller over Time')
fig1 = ts.plot()

# combine data by day
test2 = ts.resample('D').sum()
fig2 = test2.plot()

That produces the following plots:
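The fix can also be checked without the CSV file. The following is a minimal sketch, assuming a small hand-typed subset of the rows from the question stands in for PainOverTime.csv; the names daily and daily_nan are made up for illustration. It shows that resample('D').sum() returns a new Series, and that passing min_count=1 gives NaN rather than 0 for a day with no recorded doses, which matches the result the question hoped for.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: a subset of the rows shown in the question.
index = pd.to_datetime([
    "2017-11-03 17:03:00", "2017-11-03 17:42:00",
    "2017-11-04 07:27:00", "2017-11-04 12:58:00", "2017-11-04 17:37:00",
    "2017-11-05 08:31:00", "2017-11-05 09:31:00", "2017-11-05 12:25:00",
    "2017-11-05 13:11:00", "2017-11-05 19:00:00",
    "2017-11-06 08:08:00",
])
values = [np.nan, 800.0,
          600.0, 500.0, 500.0,
          200.0, 500.0, 200.0, 500.0, 500.0,
          np.nan]
ts = pd.Series(values, index=index, name="Painkiller over Time")

# resample(...).sum() builds a new Series; ts keeps its 11 original rows,
# so the result has to be assigned to a name to be used later.
daily = ts.resample("D").sum()

# By default a day with only NaN values sums to 0.0.
# min_count=1 requires at least one valid value, otherwise the day is NaN.
daily_nan = ts.resample("D").sum(min_count=1)

print(len(ts), len(daily))
print(daily_nan)
```

Plotting daily (or daily_nan) instead of ts then shows one point per day.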

Is this method not better than the 'groupby' function?
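On the groupby question, a rough comparison may help. The sketch below assumes a hand-made Series with no rows at all on 2017-11-05 (this gap is invented for illustration and is not in the original data). Both approaches give the same daily sums; the visible difference is that resample also emits a row for the empty day, while grouping on the normalized index only returns days that actually occur.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a deliberate gap: nothing recorded on 2017-11-05.
index = pd.to_datetime([
    "2017-11-03 17:42:00",
    "2017-11-04 07:27:00",
    "2017-11-04 12:58:00",
    "2017-11-06 08:08:00",
])
ts = pd.Series([800.0, 600.0, 500.0, np.nan], index=index)

# resample builds a regular daily grid, including the empty day.
by_resample = ts.resample("D").sum()

# groupby on the index truncated to midnight only sees days present in the data.
by_groupby = ts.groupby(ts.index.normalize()).sum()

print(by_resample)
print(by_groupby)
```

Which one is preferable depends on whether days without any rows should appear in the result.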

Now how do I make a scatter or bar plot instead of this line plot...?
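One possible way to get a bar or marker-only plot is through the kind and style arguments of Series.plot. This is a minimal sketch, assuming the daily totals have already been computed; the Series daily below is typed in by hand from the expected result in the question, and the axis label is a guess at what the values represent.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed daily totals (assumed to be the output of ts.resample('D').sum()).
daily = pd.Series(
    [800.0, 1600.0, 1900.0],
    index=pd.to_datetime(["2017-11-03", "2017-11-04", "2017-11-05"]),
    name="Painkiller over Time",
)

# Bar plot: one bar per day.
fig_bar, ax_bar = plt.subplots()
daily.plot(kind="bar", ax=ax_bar)
# Bar plots treat the index as categories, so shorten the date labels by hand.
ax_bar.set_xticklabels(daily.index.strftime("%Y-%m-%d"), rotation=45)
ax_bar.set_ylabel("acetaminophen (daily total)")  # assumed label
fig_bar.tight_layout()

# Marker-only plot: style="o" draws points without connecting lines.
fig_pts, ax_pts = plt.subplots()
daily.plot(style="o", ax=ax_pts)
ax_pts.set_ylabel("acetaminophen (daily total)")  # assumed label
fig_pts.tight_layout()
```

In an interactive session, dropping the matplotlib.use("Agg") line and calling plt.show() displays both figures.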
