Using matplotlib/pandas/python, I cannot visualize data as values per 30 minutes and per day

Problem description


I am analysing a CSV file with Matplotlib/Python.

This is the CSV file. https://github.com/camenergydatalab/EnergyDataSimulationChallenge/blob/master/challenge2/data/total_watt.csv

After importing the CSV file, I successfully plotted a graph and visualised energy consumption per 30 minutes with the following code. (Thank you guys! See: Using Matplotlib, visualize CSV data)

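import csv
import datetime

import numpy as np
from matplotlib import pylab as plt
from matplotlib import style

style.use('ggplot')

filename = 'total_watt.csv'
date = []
number = []

# Read the two-column CSV: timestamp, consumption value.
# (On Python 3 the csv module expects a text-mode file opened with newline=''.)
with open(filename, newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in csvreader:
        if len(row) == 2:
            date.append(row[0])
            number.append(row[1])

# Convert the value strings to floats so they plot on a numeric axis
number = np.array(number, dtype=float)

# Parse the timestamp strings into datetime objects
for ii in range(len(date)):
    date[ii] = datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')

plt.plot(date, number)

plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')

plt.show()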

But the thing is, I cannot visualize the energy consumption per day...

------------Edited (Thank you Florian!!)------------

I installed pandas and added a code for pandas to my code.

Now my code looks like the following:

from matplotlib import style
from matplotlib import pylab as plt
import numpy as np
import pandas as pd

style.use('ggplot')

filename='total_watt.csv'
date=[]
number=[]

import csv
with open(filename, 'rb') as csvfile:

    df = pd.read_csv('total_watt.csv', parse_dates=[0], index_col=[0])
    df.resample('1D', how='sum')



for row in df:
        if len(row) == 2 :
            date.append(row[0])
            number.append(row[1])

number=np.array(number)

import datetime
for ii in range(len(date)):
    date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')

plt.plot(date,number)

plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')

plt.show()

When I ran this code, I got no error, but nothing is drawn in my graph. How can I solve this?

Solution

Using pandas and the resample function could make your life easier.
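As background on why the edited code in the question draws nothing, here is a minimal sketch. It assumes the CSV has no header row and two columns (timestamp, value); the three rows and the column names `timestamp` and `value` are made up for illustration. It shows two things: iterating over a DataFrame yields column labels rather than rows, so the `len(row) == 2` test never collects anything, and `resample` returns a new object that has to be assigned.

```python
import io

import pandas as pd

# Made-up rows in an assumed headerless "timestamp,value" layout.
csv_text = (
    "2011-04-18 13:22:00,100.0\n"
    "2011-04-18 13:52:00,200.0\n"
    "2011-04-19 00:22:00,50.0\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,                   # assumption: the file has no header row
    names=["timestamp", "value"],  # hypothetical column names
    parse_dates=[0],
    index_col=0,
)

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [item for item in df]
print(labels)  # ['value']

# resample(...) does not modify df in place; keep the returned result.
daily = df.resample("1D").sum()

# The index already holds parsed timestamps, so no strptime loop is needed.
dates = daily.index.to_pydatetime()
numbers = daily["value"].to_numpy()
```

With `dates` and `numbers` built this way, the original `plt.plot(dates, numbers)` call has data to draw.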

Data

import io
import pandas as pd
content = '''timestamp  value
2011-04-18 16:52:00     152.684299188514
2011-04-18 17:22:00     327.579073188405
2011-04-18 17:52:00     156.826945856169
2011-04-18 18:22:00     330.202764488018
2011-04-18 18:52:00     1118.60404324133
2011-04-18 19:22:00     243.972250782998
2011-04-18 19:52:00     852.88815851216
2011-04-18 20:22:00     491.859992982456
2011-04-18 20:52:00     466.738983617709
2011-04-18 21:22:00     659.670303375527
2011-04-18 21:52:00     576.304871428571
2011-04-18 22:22:00     2497.20620579196
2011-04-18 22:52:00     2790.20392088608
2011-04-18 23:22:00     1092.20906629318
2011-04-18 23:52:00     825.994417375886
2011-04-19 00:22:00     2397.16672089666
2011-04-19 00:52:00     1411.66659265233
2011-04-19 01:22:00     2379.18391111111
2011-04-19 01:52:00     841.224212511672
2011-04-19 02:22:00     471.520308479532
2011-04-19 02:52:00     1189.78122544232
2011-04-19 03:22:00     343.7574197609
2011-04-19 03:52:00     336.486834795322
2011-04-19 04:22:00     541.401434220355
2011-04-19 04:52:00     316.106452883263
2011-04-19 05:22:00     502.502274561404
2011-04-19 05:52:00     314.832323976608
'''

df = pd.read_table(io.StringIO(content), sep=r'\s{2,}', parse_dates=[0], index_col=[0], engine='python')

Using resample function

See documentation here : http://pandas-docs.github.io/pandas-docs-travis/

per 30 min

df = df.resample('30min').sum()  # the how='sum' keyword was removed in pandas 1.0
Out[496]: 
                           value
timestamp                       
2011-04-18 16:30:00   152.684299
2011-04-18 17:00:00   327.579073
2011-04-18 17:30:00   156.826946
2011-04-18 18:00:00   330.202764
2011-04-18 18:30:00  1118.604043
2011-04-18 19:00:00   243.972251
2011-04-18 19:30:00   852.888159
2011-04-18 20:00:00   491.859993
2011-04-18 20:30:00   466.738984
2011-04-18 21:00:00   659.670303
2011-04-18 21:30:00   576.304871
2011-04-18 22:00:00  2497.206206
2011-04-18 22:30:00  2790.203921
2011-04-18 23:00:00  1092.209066
2011-04-18 23:30:00   825.994417
2011-04-19 00:00:00  2397.166721
2011-04-19 00:30:00  1411.666593
2011-04-19 01:00:00  2379.183911
2011-04-19 01:30:00   841.224213
2011-04-19 02:00:00   471.520308
2011-04-19 02:30:00  1189.781225
2011-04-19 03:00:00   343.757420
2011-04-19 03:30:00   336.486835
2011-04-19 04:00:00   541.401434
2011-04-19 04:30:00   316.106453
2011-04-19 05:00:00   502.502275
2011-04-19 05:30:00   314.832324
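
The output above labels the 16:52 reading as 16:30 because, with default settings, each 30-minute bin is labelled by its start time. A small sketch of that behaviour, using two rounded values from the sample data and assuming a reasonably recent pandas:

```python
import pandas as pd

# Two readings taken from the sample data (values rounded for brevity).
idx = pd.to_datetime(["2011-04-18 16:52:00", "2011-04-18 17:22:00"])
series = pd.Series([152.68, 327.58], index=idx)

# Each timestamp falls into the 30-minute bin that starts at or before it.
resampled = series.resample("30min").sum()
print(resampled.index[0])  # 2011-04-18 16:30:00

# With the defaults, the bin labels match flooring each timestamp to 30 minutes.
floored = idx.floor("30min")
```

If labelling by the end of each bin is preferred, `resample` accepts `label` and `closed` arguments for that purpose.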

Per day

df = df.resample('1D').sum()  # the how='sum' keyword was removed in pandas 1.0
Out[497]: 
                   value
timestamp               
2011-04-18  12582.945297
2011-04-19  11045.629711

Plot
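
The original plotting code is not shown here, so the following is only a sketch of one way to draw both views. The function name `plot_consumption` is hypothetical, and it assumes a frame with a DatetimeIndex and a single `value` column, like the `df` built in the Data step before any resampling.

```python
import pandas as pd
from matplotlib import pyplot as plt


def plot_consumption(df):
    """Plot 30-minute and daily sums of a frame with a DatetimeIndex and a 'value' column."""
    half_hourly = df["value"].resample("30min").sum()
    daily = df["value"].resample("1D").sum()

    fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(8, 6))

    # Top panel: one point per 30-minute bin.
    ax_top.plot(half_hourly.index, half_hourly.to_numpy())
    ax_top.set_title("Per 30 minutes")

    # Bottom panel: one point per calendar day.
    ax_bottom.plot(daily.index, daily.to_numpy(), marker="o")
    ax_bottom.set_title("Per day")

    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig, half_hourly, daily
```

Calling `fig, half_hourly, daily = plot_consumption(df)` and then `plt.show()` displays both panels. Resampling from the raw frame each time avoids overwriting `df` with an intermediate result.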

Hope it helps!
