Heatmap from columns in pandas dataframe


Question

I am trying to generate a heatmap from a pandas dataframe, broken down by day and hour of the day (x-axis: days, y-axis: hours). The result should look something like this:

The data source is a table in Postgres:

   id    |       created_at       
---------+------------------------
 2558145 | 2017-03-02 11:31:15+01
 2558146 | 2017-03-02 11:31:46+01
 2558147 | 2017-03-02 11:32:28+01
 2558148 | 2017-03-02 11:32:57+01
....

Here is my code to regroup the data by hour:

import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.dates import date2num
from sqlalchemy import create_engine

# Jupyter/IPython magic; omit this line in a plain Python script
%matplotlib inline

engine = create_engine('postgresql://postgres:postgres@localhost:5432/bla')

df = pd.read_sql_query("""
SELECT created_at, 1 as print
FROM foo
WHERE created_at > '2017-02-01'
AND created_at < '2017-03-01'""", con=engine)

df['created_at'] = pd.to_datetime(df['created_at'])
df.index = df['created_at']

df = df.resample('H')['print'].sum()
df.fillna(0, inplace=True)

print(df.head())

created_at
2017-02-01 07:00:00+00:00      1.0
2017-02-01 08:00:00+00:00    152.0
2017-02-01 09:00:00+00:00    101.0
2017-02-01 10:00:00+00:00     92.0
2017-02-01 11:00:00+00:00    184.0
Freq: H, Name: print, dtype: float64

The result looks fine, but I cannot figure out how to plot this dataframe.

Recommended answer

A heatmap is a two-dimensional plot that maps each (x, y) pair to a value. This means that the input to the heatmap must be a 2D array.

Here you would want the columns of the array to denote days and the rows to denote hours. As a first step, the days and hours need to be in two separate columns of the dataframe. One could then reshape those columns into a 2D array, which would require knowing how many days and hours there are. It would also require that there is actually an entry for every day/hour pair.
Without this restriction, we can instead use a pivot_table to aggregate the values into a table. This is shown in the following solution.

import pandas as pd
import numpy as np; np.random.seed(0)
import seaborn as sns  # seaborn.apionly was removed in later seaborn releases
import matplotlib.pyplot as plt

# create dataframe with datetime as index and aggregated (frequency) values
date = pd.date_range('2017-02-23', periods=10*12, freq='2h')
freq = np.random.poisson(lam=2, size=(len(date)))
df = pd.DataFrame({"freq":freq}, index=date)

# add a column hours and days
df["hours"] = df.index.hour
df["days"] = df.index.map(lambda x: x.strftime('%b-%d'))     
# create pivot table, days will be columns, hours will be rows
piv = pd.pivot_table(df, values="freq",index=["hours"], columns=["days"], fill_value=0)
#plot pivot table as heatmap using seaborn
ax = sns.heatmap(piv, square=True)
plt.setp( ax.xaxis.get_majorticklabels(), rotation=90 )
plt.tight_layout()
plt.show()
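
The reshape route mentioned earlier can be sketched as well. This is only an illustration under a strong assumption: the series has exactly one value for every hour of every day, with no gaps. The names `counts` and `grid` and the synthetic values are made up for the example and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value per hour for 3 complete days (no missing hours)
idx = pd.date_range("2017-02-01", periods=3 * 24, freq="h")
counts = pd.Series(np.arange(len(idx)), index=idx)

n_days = len(counts) // 24

# reshape gives one row per day; transpose so rows are hours and columns are days
grid = counts.to_numpy().reshape(n_days, 24).T

print(grid.shape)
```

The resulting array can be passed to a heatmap function in the same way as the pivot table. If any hour is missing, the reshape either fails or silently shifts values into the wrong cell, which is why the pivot_table approach is the safer default.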

For plotting you may also use matplotlib's imshow, as follows:

fig, ax = plt.subplots()
im = ax.imshow(piv, cmap="Greens")
fig.colorbar(im, ax=ax)

ax.set_xticks(range(len(piv.columns)))
ax.set_yticks(range(len(piv.index)))
ax.set_xticklabels(piv.columns, rotation=90)
ax.set_yticklabels(piv.index)
ax.set_xlabel("Days")
ax.set_ylabel("Hours")

plt.tight_layout()
plt.show()
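
To connect this back to the question, the same pivot can be applied to an hourly Series like the one produced by the resample step. The sketch below uses synthetic stand-in data, since the real table is not available; the names `hourly`, `frame` and `piv_q` are hypothetical, and `aggfunc="sum"` is a choice made here for count data rather than something the original answer specifies.

```python
import numpy as np
import pandas as pd

# Stand-in for the resampled hourly Series from the question (made-up values)
idx = pd.date_range("2017-02-01 07:00", "2017-02-03 23:00", freq="h", tz="UTC")
rng = np.random.default_rng(0)
hourly = pd.Series(
    rng.poisson(lam=100, size=len(idx)).astype(float), index=idx, name="print"
)

# Split the timestamp into an hour column and a day column
frame = hourly.to_frame()
frame["hour"] = frame.index.hour
frame["day"] = frame.index.strftime("%Y-%m-%d")  # ISO dates sort chronologically

# Hours as rows, days as columns; day/hour pairs with no data become 0
piv_q = frame.pivot_table(
    values="print", index="hour", columns="day", aggfunc="sum", fill_value=0
)

print(piv_q.shape)
```

`piv_q` can then be handed to `sns.heatmap` or `ax.imshow` exactly like `piv` above. Two small points: ISO-formatted day labels keep the columns in date order across month boundaries, whereas labels such as `'%b-%d'` sort alphabetically; and if the hours should reflect local time rather than UTC, the index could first be converted with `tz_convert`, using whichever time zone applies to your data.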
