Line-based heatmap or 2D line histogram


Question

I have a synthetic dataset of 1000 noisy polynomials of various orders and sin/cos curves, which I can plot as lines using Python's seaborn.

Since many of my lines overlap, I'd like to plot some sort of heatmap or histogram of my line graphs. I've tried iterating over the columns and aggregating the counts to feed seaborn's heatmap, but with many lines this takes quite a while.
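For comparison, the per-column counting described above can be done in one vectorized call. This is only a rough sketch, not the asker's code: it assumes all lines share the same x coordinates, the function name `line_density` and the sine test data are made up for illustration, and it counts sampled points per cell rather than rasterizing the segments between them.

```python
import numpy as np

def line_density(samples, x, bins=(50, 40)):
    """Count how many sampled points fall into each (x, y) cell.

    samples: array of shape (n_points, n_lines), one column per line.
    x: array of shape (n_points,), the shared x coordinates.
    """
    samples = np.asarray(samples, dtype=float)
    # Repeat x once per line so it pairs up with the column-major flattened y values.
    xs = np.tile(np.asarray(x, dtype=float), samples.shape[1])
    ys = samples.ravel(order="F")
    counts, x_edges, y_edges = np.histogram2d(xs, ys, bins=bins)
    # Transpose so rows correspond to y and columns to x, as imshow expects.
    return counts.T, x_edges, y_edges

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200)
# Hypothetical stand-in data: 1000 noisy sine curves, one per column.
lines = np.sin(x)[:, None] + rng.normal(scale=0.2, size=(x.size, 1000))
density, x_edges, y_edges = line_density(lines, x)
```

The resulting array can be passed to `plt.imshow(density, origin="lower", aspect="auto")`. Because only the sample points are counted, steep segments leave gaps unless the x sampling is dense.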

The next best thing that comes close to what I want is a hexbin graph (with seaborn's jointplot).

But it's a compromise between runtime and granularity (the graph shown has a gridsize of 750). I couldn't find any other graph type for my problem, and I also don't know exactly what it might be called.

I've also tried setting the line alpha to 0.2. This produces a graph similar to what I want, but it's less precise (if more than 5 lines overlap at the same point, I already have zero transparency left). It also lacks the typical coloration of heatmaps.
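As a rough check on how quickly alpha stacking saturates: assuming standard "over" alpha blending (an assumption about the renderer, not something measured here), n overlapping strokes of opacity a combine to 1 - (1 - a)^n. The helper name `stacked_opacity` is made up for this sketch.

```python
def stacked_opacity(alpha, n_lines):
    """Combined opacity of n_lines overlapping strokes, each drawn with `alpha`."""
    return 1.0 - (1.0 - alpha) ** n_lines

# Print the combined opacity for a few overlap counts at alpha = 0.2.
for n in (1, 5, 10, 20):
    print(n, round(stacked_opacity(0.2, n), 3))
```

Under that assumption, five overlapping lines at alpha 0.2 already reach roughly two thirds opacity, and each further line adds less, so dense regions quickly become hard to tell apart by eye.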

(Search terms that proved moot: heatmap, 2D line histogram, line histogram, density plots...)

Does anybody know of a package that plots this more efficiently and at higher quality, or how to do it with the popular Python plotting libraries (e.g. matplotlib, seaborn, bokeh)? I'm fine with any package, though.

Recommended answer

It took me a while, but I finally solved this using Datashader. In a notebook, the plots can be embedded into interactive Bokeh plots, which look really nice.

Anyhow, here is the code for static images, in case someone else needs something similar:

# coding: utf-8
import time

import numpy as np
from numpy.polynomial import polynomial
import pandas as pd

import matplotlib.pyplot as plt
import datashader as ds
import datashader.transfer_functions as tf


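# Newer matplotlib releases (3.6+) name this style "seaborn-v0_8-whitegrid"; older ones use "seaborn-whitegrid".
plt.style.use("seaborn-v0_8-whitegrid")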

def create_data():
    # ...

# Each column is one data sample
df = create_data()

# Following will append a nan-row and reshape the dataframe into two columns, with each sample stacked on top of each other
#   THIS IS CRUCIAL TO OPTIMIZE SPEED: https://github.com/bokeh/datashader/issues/286

# Append a row of NaN values (DataFrame.append was removed in pandas 2.0, so use pd.concat)
nan_row = pd.DataFrame([[np.nan] * len(df.columns)], columns=df.columns, index=[np.nan])
df = pd.concat([df, nan_row])

# Reshape: flatten column by column (Fortran order) and repeat the index once per column
x, y = df.shape
arr = df.to_numpy().reshape((x * y, 1), order='F')  # as_matrix() was removed in pandas 1.0
df_reshaped = pd.DataFrame(arr, columns=list('y'), index=np.tile(df.index.values, y))
df_reshaped = df_reshaped.reset_index()
df_reshaped.columns = ['x', 'y']

# Plotting parameters
# The index now ends with the NaN separator, so ignore NaN when taking the range
x_range = (np.nanmin(df.index.values), np.nanmax(df.index.values))
y_range = (df.min().min(), df.max().max())
w = 1000
h = 750
dpi = 150
cvs = ds.Canvas(x_range=x_range, y_range=y_range, plot_height=h, plot_width=w)

# Aggregate data
t0 = time.time()
aggs = cvs.line(df_reshaped, 'x', 'y', ds.count())
print("Time to aggregate line data: {}".format(time.time()-t0))

# One colored plot
t1 = time.time()
stacked_img = tf.Image(tf.shade(aggs, cmap=["darkblue", "darkblue"]))
print("Time to create stacked image: {}".format(time.time() - t1))

# Save
f0 = plt.figure(figsize=(w / dpi, h / dpi), dpi=dpi)
ax0 = f0.add_subplot(111)
ax0.imshow(stacked_img.to_pil())
ax0.grid(False)
f0.savefig("stacked.png", bbox_inches="tight", dpi=dpi)

# Heat map - shade() uses histogram equalization by default (how='eq_hist'); other options exist.
t2 = time.time()
heatmap_img = tf.Image(tf.shade(aggs, cmap=plt.cm.Spectral_r))
print("Time to create stacked image: {}".format(time.time() - t2))

# Save
f1 = plt.figure(figsize=(w / dpi, h / dpi), dpi=dpi)
ax1 = f1.add_subplot(111)
ax1.imshow(heatmap_img.to_pil())
ax1.grid(False)
f1.savefig("heatmap.png", bbox_inches="tight", dpi=dpi)

With the following run times (in seconds). The second "stacked image" line is the heatmap timing, because the script prints the same label for both:

Time to aggregate line data: 0.7710442543029785
Time to create stacked image: 0.06000351905822754
Time to create stacked image: 0.05600309371948242
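To make the reshape step in the script easier to follow, here is a toy version of it. It is a sketch on made-up data (two three-point lines named "a" and "b"), and it builds the NaN row with NumPy instead of pandas; the idea, as I understand it from the linked issue, is that Datashader's line rendering treats a NaN as a break, so all samples can be passed as one long x/y table.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two lines (columns "a" and "b") sampled at x = 0, 1, 2.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}, index=[0.0, 1.0, 2.0])

# Add one NaN row at the bottom so every column ends with a separator.
values = np.vstack([df.to_numpy(), np.full((1, df.shape[1]), np.nan)])
x_index = np.append(df.index.to_numpy(), np.nan)

# Flatten column by column (Fortran order) and repeat the x values once per column.
stacked = pd.DataFrame({
    "x": np.tile(x_index, df.shape[1]),
    "y": values.ravel(order="F"),
})
print(stacked)
```

Each line's points end up in consecutive rows, followed by one NaN row, which is the layout that `cvs.line(df_reshaped, 'x', 'y', ds.count())` receives in the script.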

Resulting plots: stacked.png and heatmap.png (images not preserved here).
