Plotting a graph using arrays


Problem description


I have a set of data that I want to plot in a graph. I have a list of timestamps that I want to group per hour, and then I want to see the number of points per hour in a line graph (over one day; I have data for multiple days, and I want one graph per day).

I have the number of points per hour and the hours in which they occur. I cannot get a line to show in my graph, and I think I am missing a simple solution. I have posted a picture as well so you can see the output. What is the next step to take to get the line to show?

I have the following code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import csv
from datetime import timedelta
import datetime as dt
 
data= pd.read_csv('test2.csv', header=0, index_col=None, parse_dates=True, sep=';', usecols=[0,1])
df=pd.DataFrame(data, columns=['Date', 'Time'])
df['DateTime'] = df['Date'] + df['Time']

#for date in df['DateTime']:


def RemoveMilliSeconds(x):
    return x[:-5]

df['Time'] = df['Time'].apply(RemoveMilliSeconds)

df['DateTime'] = df['Date'] + df['Time']
df['DateTime'] = pd.to_datetime(df['DateTime'], format="%Y:%m:%d %H:%M:%S")
df['TimeDelta'] = df.groupby('Date')['DateTime'].apply(lambda x: x.diff())

#print(df['TimeDelta'] / np.timedelta64(1, 'h'))
df['HourOfDay'] = df['DateTime'].dt.hour
df['Day'] = df['DateTime'].dt.day

grouped_df = df.groupby('Day')

for key, item in grouped_df:
    print(grouped_df.get_group(key)['HourOfDay'].value_counts(), "\n\n")


res=[]
for i in df['DateTime'].dt.hour:
    if i not in res:
        res.append(i)
print("enkele lijst:" + str(res))
#range = (0,24)
#bins = 2
#plt.hist(df['DateTime'].dt.hour, bins, range)

x=np.array([res])

y=np.array([df['HourOfDay'].value_counts()])
plt.plot(x,y)
plt.show()

#times = pd.DatetimeIndex(df.Time)
#grouped = df.groupby([times.hour])

The picture that shows the output

My sample data:

Date;Time
2020:02:13 ;12:39:02:913 
2020:02:13 ;12:39:42:915 
2020:02:13 ;13:06:20:718 
2020:02:13 ;13:18:25:988 
2020:02:13 ;13:34:02:835 
2020:02:13 ;13:46:35:793 
2020:02:13 ;13:59:10:659 
2020:02:13 ;14:14:33:571 
2020:02:13 ;14:25:36:381 
2020:02:13 ;14:35:38:342 
2020:02:13 ;14:46:04:006 
2020:02:13 ;14:56:57:346 
2020:02:13 ;15:07:39:752 
2020:02:13 ;15:19:44:868 
2020:02:13 ;15:32:31:438 
2020:02:13 ;15:44:44:928 
2020:02:13 ;15:56:54:453 
2020:02:13 ;16:08:21:023 
2020:02:13 ;16:19:17:620 
2020:02:13 ;16:29:56:944 
2020:02:13 ;16:40:11:132 
2020:02:13 ;16:49:12:113 
2020:02:13 ;16:57:26:652 
2020:02:13 ;16:57:26:652 
2020:02:13 ;17:04:22:092 
2020:02:17 ;08:58:08:562 
2020:02:17 ;08:58:42:545 

Solution

You did not prepare your x-y data in a way that matplotlib can understand their relationship.
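
One way to see why no line appears: `np.array([res])` wraps the list in a second list, so the array has shape (1, n) instead of (n,), and matplotlib treats every column of a 2D array as a separate data set. The following is a minimal sketch with made-up values (the names `hours` and `counts` are placeholders, not part of the original code) and the non-interactive Agg backend, so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values standing in for res and the hourly counts
hours = [12, 13, 14]
counts = [2, 5, 5]

# Wrapping the list in another list adds a dimension: shape (1, 3), not (3,)
x_2d = np.array([hours])
y_2d = np.array([counts])

fig, (ax_a, ax_b) = plt.subplots(1, 2)

# 2D input: each column becomes its own line with a single point,
# and a single point without a marker is invisible
lines_2d = ax_a.plot(x_2d, y_2d)

# 1D input: one line connecting all points
lines_1d = ax_b.plot(np.array(hours), np.array(counts))

print(x_2d.shape, len(lines_2d))
print(np.array(hours).shape, len(lines_1d))
```

Passing the plain lists, or `np.array(res)` without the extra brackets, avoids the additional dimension.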

The easy "answer" would be to plot res and df['HourOfDay'].value_counts() directly against each other:

#.....
#range = (0,24)
#bins = 2
#plt.hist(df['DateTime'].dt.hour, bins, range)

plt.plot(res, df['HourOfDay'].value_counts())
plt.show()

But the sample output shows you the problem:

matplotlib does not order the x-values for you (that would misrepresent the data in a different context). So, we have to do this before plotting:

#.....
#range = (0,24)
#bins = 2
#plt.hist(df['DateTime'].dt.hour, bins, range)

xy=np.stack((res, df['HourOfDay'].value_counts()))
xy = xy[:, np.argsort(xy[0,:])]
plt.plot(*xy)
plt.show()

Now, the x-values are in the correct order, and the y-values have been sorted with them in the combined xy array that we created for this purpose:

Obviously, it would be better to prepare res and df['HourOfDay'].value_counts() directly, so we don't have to create a combined array to sort them together. Since you did not provide an explanation of what your code is supposed to do, we can only fix the problem after the fact - you should structure the code differently, so that this problem does not occur in the first place. But only you can do this (or people who understand the intention of your code - I don't).
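
A possible way to prepare both arrays from one source is sketched below. It relies on the fact that `value_counts()` returns a Series whose index holds the hours and whose values are ordered by count (largest first), not by first appearance, so pairing it position by position with a separately built list such as res can attach counts to the wrong hours. Sorting the Series by its index keeps every count attached to its hour. The Series `hour_of_day` is a made-up stand-in for `df['HourOfDay']`:

```python
import pandas as pd

# Hypothetical hour values standing in for df['HourOfDay']
hour_of_day = pd.Series([12, 12, 13, 13, 13, 16, 16, 16, 16, 17, 8, 8])

counts = hour_of_day.value_counts()  # ordered by count, largest first
per_hour = counts.sort_index()       # ordered by hour, counts stay attached

x = per_hour.index.to_numpy()  # hours in ascending order
y = per_hour.to_numpy()        # matching counts

print(x, y)
# plt.plot(x, y) would then draw a single line in hour order
```

With this approach the separate res list and the combined xy array are no longer needed.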

I also suggest spending some time with the instructive matplotlib tutorials - this time is not wasted.

Update
It seems you are trying to create a subplot for each day and count the number of entries per hour. I would approach it like this (though I am sure some pandas experts have better ways of doing this):

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
 
#read your data and create datetime index
df= pd.read_csv('test1.txt', sep=";") 
df.index = pd.to_datetime(df["Date"]+df["Time"].str[:-5], format="%Y:%m:%d %H:%M:%S")

#group by date and hour, count entries
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]
maxcount = dfcounts.Count.max()

#group by date for plotting
dfplot = dfcounts.groupby(dfcounts.Date)

#plot each day into its own subplot
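fig, axs = plt.subplots(dfplot.ngroups, figsize=(6,8), squeeze=False)

for i, groupdate in enumerate(dfplot.groups):
    #squeeze=False keeps axs two-dimensional, so this also works if the data contain only one day
    ax=axs[i, 0]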
    #the marker is not really necessary but has been added in case there is just one entry per day
    ax.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="blue", marker="o")
    ax.set_title(str(groupdate))
    ax.set_xlim(0, 24)
    ax.set_ylim(0, maxcount * 1.1)
    ax.xaxis.set_ticks(np.arange(0, 25, 2))

plt.tight_layout()
plt.show()

Sample output:

Update 2
To plot them into individual figures, you can modify the loop:

#...
dfplot = dfcounts.groupby(dfcounts.Date)

for groupdate in dfplot.groups:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
    fig.suptitle("Date:"+str(groupdate), fontsize=16)

    #scaled for comparability among graphs
    ax1.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="blue", marker="o")
    ax1.set_xlim(0, 24)
    ax1.xaxis.set_ticks(np.arange(0, 25, 2))
    ax1.set_ylim(0, maxcount * 1.1)
    ax1.set_title("comparable version")

    #scaled to maximize visibility per day
    ax2.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="red", marker="x")
    ax2.set_xlim(0, 24)
    ax2.xaxis.set_ticks(np.arange(0, 25, 2))
    ax2.set_title("expanded version")
    
    plt.tight_layout()
    #save optionally 
    #plt.savefig("MyDataForDay"+str(groupdate)+".eps")

print("All figures generated")
plt.show()

Sample output for one of the days:

created with the following test data:

Date;Time
2020:02:13 ;12:39:02:913 
2020:02:13 ;12:39:42:915 
2020:02:13 ;13:06:20:718 
2020:02:13 ;13:18:25:988 
2020:02:13 ;13:34:02:835 
2020:02:13 ;13:46:35:793 
2020:02:13 ;13:59:10:659 
2020:02:13 ;14:14:33:571 
2020:02:13 ;14:25:36:381 
2020:02:13 ;14:35:38:342 
2020:02:13 ;14:46:04:006 
2020:02:13 ;14:56:57:346 
2020:02:13 ;15:07:39:752 
2020:02:13 ;15:19:44:868 
2020:02:13 ;15:32:31:438 
2020:02:13 ;15:44:44:928 
2020:02:13 ;15:56:54:453 
2020:02:13 ;16:08:21:023 
2020:02:13 ;16:19:17:620 
2020:02:13 ;16:29:56:944 
2020:02:13 ;16:40:11:132 
2020:02:13 ;16:49:12:113 
2020:02:13 ;16:57:26:652 
2020:02:13 ;16:57:26:652 
2020:02:13 ;17:04:22:092 
2020:02:17 ;08:58:08:562 
2020:02:17 ;08:58:42:545 
2020:02:17 ;15:19:44:868 
2020:02:17 ;17:32:31:438 
2020:02:17 ;17:44:44:928 
2020:02:17 ;17:56:54:453 
2020:02:17 ;18:08:21:023 
2020:03:19 ;06:19:17:620 
2020:03:19 ;06:29:56:944 
2020:03:19 ;06:40:11:132 
2020:03:19 ;14:49:12:113 
2020:03:19 ;16:57:26:652 
2020:03:19 ;16:57:26:652 
2020:03:19 ;17:04:22:092 
2020:03:19 ;18:58:08:562 
2020:03:19 ;18:58:42:545 
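
To try the grouping step without creating a file, a few of these rows can be read from a string. This is only a sketch: the `sample` string is a shortened excerpt, and, as a variation on the code above, the date and time fields are stripped of surrounding whitespace and the millisecond part is split off at the last colon, so the parsing does not depend on the exact number of trailing characters.

```python
import io
import pandas as pd

# Shortened excerpt in the same layout as the test data
sample = """Date;Time
2020:02:13 ;12:39:02:913
2020:02:13 ;12:39:42:915
2020:02:13 ;13:06:20:718
2020:02:17 ;08:58:08:562
2020:02:17 ;08:58:42:545
"""

df = pd.read_csv(io.StringIO(sample), sep=";")

# Remove stray whitespace, then cut off the millisecond part after the last ':'
date = df["Date"].str.strip()
time = df["Time"].str.strip().str.rsplit(":", n=1).str[0]
df.index = pd.to_datetime(date + " " + time, format="%Y:%m:%d %H:%M:%S")

# Group by date and hour and count the entries, as in the answer's code
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]

print(dfcounts)
```

The resulting dfcounts has one row per date and hour and can be passed to either plotting loop shown above.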
