Visualization of scatter plots with overlapping points in matplotlib

Question

I have to represent about 30,000 points in a scatter plot in matplotlib. These points belong to two different classes, so I want to depict them with different colors.

I succeeded in doing so, but there is an issue. The points overlap in many regions, and the class that I draw last is rendered on top of the other one, hiding it. Furthermore, a scatter plot cannot show how many points lie in each region. I have also tried to make a 2D histogram with histogram2d and imshow, but it is difficult to show the points belonging to both classes in a clear way.

Can you suggest a way to make clear both the distribution of the classes and the concentration of the points?

To be clearer, this is the link to my data file, in the format "x,y,class".

Recommended answer

One approach is to plot the data as a scatter plot with a low alpha, so you can see the individual points as well as a rough measure of density. (The downside is that this approach can only show a limited range of overlap: the maximum density it can distinguish is about 1/alpha.)
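The 1/alpha limit can be made concrete with a small calculation. This is only a sketch, assuming the renderer blends identical markers with standard "over" alpha compositing, in which case n stacked markers of opacity alpha have a combined opacity of 1 - (1 - alpha)^n. The helper name `stacked_opacity` is made up for this illustration.

```python
import numpy as np

def stacked_opacity(alpha, n):
    """Combined opacity of n identical overlapping markers.

    Assumes standard "over" compositing: each marker lets a fraction
    (1 - alpha) of what lies beneath it show through.
    """
    return 1.0 - (1.0 - alpha) ** np.asarray(n, dtype=float)

alpha = 0.03
counts = np.array([1, 10, 33, 100, 300])  # 33 is roughly 1/alpha
for n, opacity in zip(counts, stacked_opacity(alpha, counts)):
    print(f"{n:4d} overlapping points -> opacity {opacity:.2f}")
```

Under that assumption the opacity climbs quickly up to roughly 1/alpha overlaps and then flattens out, so denser regions become hard to tell apart.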

As you can imagine, because only a limited range of overlap can be expressed, there is a tradeoff between the visibility of the individual points and how well the amount of overlap is conveyed (which also depends on the size of the markers, the plot, etc.).

Here is an example:

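import numpy as np
import matplotlib.pyplot as plt

# Draw N correlated 2D Gaussian samples as stand-in data.
N = 10000
mean = [0, 0]
cov = [[2, 1], [1, 2]]  # a covariance matrix must be symmetric
x, y = np.random.multivariate_normal(mean, cov, N).T

# With a low alpha, overlapping markers add up to darker regions.
plt.scatter(x, y, s=70, alpha=0.03)
plt.ylim((-5, 5))
plt.xlim((-5, 5))
plt.show()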
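Since the question involves two classes where the one drawn last hides the other, the low-alpha idea might be combined with a shuffled drawing order, so that neither class is systematically on top. The following is a sketch under assumptions, not part of the original answer: the function name `plot_two_classes`, the colors, and the synthetic data are made up, and the class labels are assumed to be the integers 0 and 1.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors

def plot_two_classes(x, y, labels, alpha=0.05, size=20, seed=0):
    """Scatter two classes (labels 0 and 1) with low alpha, in random order."""
    x, y, labels = np.asarray(x), np.asarray(y), np.asarray(labels)
    # Shuffle so that points of both classes are interleaved when drawn.
    order = np.random.default_rng(seed).permutation(len(x))
    # One RGBA row per class; index by label to get one color per point.
    palette = mcolors.to_rgba_array(["tab:blue", "tab:red"])
    point_colors = palette[labels[order]]
    fig, ax = plt.subplots()
    ax.scatter(x[order], y[order], c=point_colors, s=size,
               alpha=alpha, linewidths=0)
    return fig, ax

# Synthetic stand-in for the "x,y,class" file: two overlapping blobs.
rng = np.random.default_rng(1)
n = 15000
x = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(1.0, 1.0, n)])
y = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(0.5, 1.0, n)])
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

fig, ax = plot_two_classes(x, y, labels)
```

Calling plt.show() afterwards displays the figure. Regions where the two classes mix should then appear as a blend of the two colors rather than as the color of whichever class was drawn last.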

(I'm assuming here that you meant 30e3 points, not 30e6. For 30e6, I think some type of averaged density plot would be necessary.)
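For the much larger case, one possible form of such a density plot is a 2D histogram per class, shown side by side on a shared color scale. This is only a sketch of one option, using the histogram2d and imshow functions mentioned in the question. The function name `density_panels`, the bin count, the extent, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def density_panels(x, y, labels, bins=60, extent=(-5, 5, -5, 5)):
    """Draw one 2D-histogram panel per class on a shared color scale."""
    x, y, labels = np.asarray(x), np.asarray(y), np.asarray(labels)
    xy_range = [[extent[0], extent[1]], [extent[2], extent[3]]]
    classes = np.unique(labels)
    counts = [
        np.histogram2d(x[labels == c], y[labels == c],
                       bins=bins, range=xy_range)[0]
        for c in classes
    ]
    vmax = max(h.max() for h in counts)  # same scale for every panel
    fig, axes = plt.subplots(1, len(classes), sharex=True, sharey=True,
                             squeeze=False)
    for ax, c, h in zip(axes[0], classes, counts):
        # histogram2d puts x along axis 0, so transpose for imshow.
        im = ax.imshow(h.T, origin="lower", extent=extent,
                       vmin=0, vmax=vmax, cmap="viridis")
        ax.set_title(f"class {c}")
    fig.colorbar(im, ax=axes.ravel().tolist(), label="points per bin")
    return fig, counts

# Synthetic stand-in data: two overlapping Gaussian blobs.
rng = np.random.default_rng(2)
n = 100000
x = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(1.0, 1.0, n)])
y = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(0.5, 1.0, n)])
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

fig, counts = density_panels(x, y, labels)
```

Because the work is done by binning, the drawing cost depends on the number of bins rather than the number of points. The price is that individual points are no longer visible.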
