Modify datetime axis of a dataframe stacked histogram + Matplotlib DateFormatter issues with UTC datetime


Problem description

I have a dataframe that contains a time column and a sentiment column. The time values have the format YYYY-MM-DD HH:MM:SS.

I want to plot a stacked histogram of the sentiment counts with 5-minute bars.

The code below works, but the x-axis is too busy. I want to label only the 30-minute intervals on the x-axis while still drawing one bar for every 5 minutes.

Can you please help me achieve this?

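import matplotlib.pyplot as plt
import pandas as pd

# df is assumed to be loaded already, with 'time' and 'Sentiment' columns
df['time'] = pd.to_datetime(df['time'])

# Count rows per 5-minute bin and sentiment, then draw stacked bars
df.groupby([df.time.dt.floor('5min'), 'Sentiment']).size().unstack().plot(kind='bar', stacked=True)

plt.show()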

EDIT #1

I think the following code is a step in the right direction, but mdates.DateFormatter does not seem to return the proper dates. Link to a data sample: https://pastebin.pl/view/52b65e7b

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("testfile.csv", nrows=999)
# Parse the offset-aware timestamps as UTC, then convert to US/Eastern
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S%z', utc=True)
df['time'] = df['time'].dt.tz_convert('US/Eastern')

df.groupby([df.time.dt.floor('5min'), 'Sentiment']).size().unstack().plot(kind='bar', stacked=True)

plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M:%S'))
plt.gca().xaxis.set_major_locator(mdates.MinuteLocator(interval=30))
plt.gcf().autofmt_xdate()
plt.show()

EDIT #2

In my dataframe, I have another column named 'close' that I want to display as a line on the same axis. How do I overlay a line for df['close'] on this graph?

Solution

If you want to thin out the tick labels on the x-axis of a grouped bar chart, the easiest approach is to build the labels yourself and leave blank every label you do not want to show. Compared with your code, the time zone information is removed because it makes the x-axis labels longer. The resample function is used to bin the data every 5 minutes. If your data differs from the sample, adjust the code accordingly.

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('testfile.csv', sep=',', nrows=999)
df['time'] = pd.to_datetime(df['time'])
df.set_index('time', inplace=True)
# Drop the time zone so the tick labels stay short
df.index = df.index.tz_localize(None)

# Count rows per sentiment in 5-minute bins; rows are bins, columns are sentiments
df_ts = df.groupby(['Sentiment']).resample('5min').size().unstack().T
ax = df_ts.plot(kind='bar', stacked=True, figsize=(14, 9))

# Label only the bins that start on the hour or on the half hour
labels = [str(x) if x.minute in (0, 30) else '' for x in df_ts.index]

ax.set_xticklabels(labels, rotation=45)
plt.gcf().autofmt_xdate()
plt.show()
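
The labels are rebuilt by hand above because pandas draws bar plots at integer positions 0, 1, 2, ... rather than at the datetimes themselves, so mdates.DateFormatter and mdates.MinuteLocator read those positions as date numbers and print the wrong dates. As a rough sketch of an alternative, the bars can be drawn with Matplotlib's own ax.bar on real datetimes, which lets the locator and formatter from EDIT #1 work, and the 'close' column from EDIT #2 can be overlaid on a secondary y-axis. Everything specific here is an assumption for illustration: the data is synthetic (a stand-in for testfile.csv with columns time, Sentiment and close), the names counts and close_5min are made up, and averaging close per 5-minute bin is only one possible way to aggregate it.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for testfile.csv (assumed columns: time, Sentiment, close).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "time": pd.date_range("2021-01-04 09:30:00", periods=n, freq="20s"),
    "Sentiment": rng.choice(["negative", "neutral", "positive"], size=n),
    "close": 100 + rng.normal(0, 0.05, size=n).cumsum(),
})

# Count rows per 5-minute bin and sentiment; missing combinations become 0.
counts = (
    df.groupby([df["time"].dt.floor("5min"), "Sentiment"])
      .size()
      .unstack(fill_value=0)
)

fig, ax = plt.subplots(figsize=(14, 9))

# Draw the stacked bars at real datetime positions.
# Matplotlib date axes are measured in days, so 4 minutes = 4 / (24 * 60).
x = counts.index.to_pydatetime()
width = 4 / (24 * 60)
bottom = np.zeros(len(counts))
for sentiment in counts.columns:
    heights = counts[sentiment].to_numpy()
    ax.bar(x, heights, width=width, bottom=bottom, align="edge", label=sentiment)
    bottom += heights

# One tick every 30 minutes (on the hour and half hour), bars every 5 minutes.
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=[0, 30]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
ax.set_ylabel("count")
ax.legend(loc="upper left")

# Overlay the 5-minute mean of 'close' on a secondary y-axis (shared x-axis).
close_5min = df.set_index("time")["close"].resample("5min").mean()
ax2 = ax.twinx()
ax2.plot(close_5min.index.to_pydatetime(), close_5min.to_numpy(),
         color="black", marker="o")
ax2.set_ylabel("close")

fig.autofmt_xdate()
# Call plt.show() or fig.savefig(...) here to view the figure.
```

A secondary axis is used because the counts and the prices are on very different scales. If they are comparable in your data, plotting the line on ax directly keeps everything on a single y-axis.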
