Plot probability heatmap/hexbin with different sized bins

Problem description

This is related to another question: Plot weighted frequency matrix.

I have this graphic (produced by the code below in R):

# Set the number of bets per trial and the number of trials
numbet <- 36
numtri <- 1000
#Fill a matrix where the rows are the cumulative bets and the columns are the trials
xcum <- matrix(NA, nrow=numbet, ncol=numtri)
for (i in 1:numtri) {
x <- sample(c(0,1), numbet, prob=c(5/6,1/6), replace = TRUE)
xcum[,i] <- cumsum(x)/(1:numbet)
}
#Plot the trials as transparent lines so you can see the build up
matplot(xcum, type="l", xlab="Number of Trials", ylab="Relative Frequency", main="", col=rgb(0.01, 0.01, 0.01, 0.02), las=1)

I very much like the way this plot builds up and shows the more frequent paths as darker than the rarer ones (but it is not clear enough for a print presentation). What I would like to do is produce some kind of hexbin or heatmap for the numbers. Thinking about it, it seems that the plot will have to incorporate different sized bins (see my back-of-the-envelope sketch):

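My question, then, is: if I simulate a million runs with the code above, how can I display them as a heatmap or hexbin plot with different sized bins, as shown in the sketch?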

To clarify: I do not want to rely on transparency to show how rarely a trial passes through a part of the plot. Instead I would like to denote rarity with heat, showing a common pathway as hot (red) and a rare pathway as cold (blue). Also, I do not think the bins should all be the same size, because at the first trial there are only two places where the path can be, whereas at the last there are many more. That is why I chose a varying bin scale. Essentially I am counting the number of times a path passes through each cell (2 cells in column 1, 3 in column 2, etc.) and then colouring the cell based on how many times it has been passed through.
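
The counting described above can be sketched in R as follows. This is a minimal sketch, not the asker's code: it assumes a win probability of 1/6 as in the first snippet, draws the bets with `rbinom()` instead of `sample()` (an equivalent Bernoulli draw), and identifies the cell a path occupies after bet k by its cumulative number of wins j, so column k has k + 1 cells.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
numbet <- 36
numtri <- 1000
# One row per trial, one column per bet; 1 = win (probability 1/6), 0 = loss
wins <- matrix(rbinom(numtri * numbet, size = 1, prob = 1/6), nrow = numtri)
# Cumulative wins along each path: apply() returns bets x trials, so transpose back
cumwins <- t(apply(wins, 1, cumsum))
# counts[j + 1, k] = number of paths with exactly j wins after bet k (j = 0..k);
# the relative frequency plotted in the question is then j / k
counts <- sapply(seq_len(numbet), function(k) tabulate(cumwins[, k] + 1, nbins = numbet + 1))
# Share of paths passing through each cell, ready to be mapped to a blue-to-red scale
share <- counts / numtri
counts[1:3, 1:3]
```

Each column of `counts` sums to the number of trials, and only the first k + 1 entries of column k can be non-zero, which is exactly the "2 cells in column 1, 3 in column 2" structure.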

UPDATE: I already had a plot similar to @Andrie's, but I am not sure it is much clearer than the top plot. What I do not like is the discontinuous nature of this graph (which is why I want some kind of heatmap). Because the first column has only two possible values, I think there should not be a huge visual gap between them, and likewise for the later columns. That is why I envisaged the different sized bins. I still feel that a binned version would show a large number of samples better.
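
One way to get such gap-free bins, sketched under the assumption that every possible value j/k at bet k gets its own band, is to split each column into k + 1 equal-height bands that together tile [0, 1]. The helper name `bin_edges()` is hypothetical and not from the question or the answer.

```r
# Hypothetical helper (not from the question): gap-free bins for column k.
bin_edges <- function(k) {
  j <- 0:k
  data.frame(bet = k,
             value = j / k,                # the relative frequency the bin stands for
             ymin = j / (k + 1),           # bands of height 1 / (k + 1) ...
             ymax = (j + 1) / (k + 1))     # ... that tile [0, 1] with no gaps
}
edges <- do.call(rbind, lapply(1:4, bin_edges))
edges[edges$bet == 1, ]
```

With this layout the two possible values at the first bet each fill half of the column, so there is no large visual gap between them.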

Update: This website outlines a procedure to plot a heatmap:

To create a density (heatmap) plot version of this we have to effectively enumerate the occurrence of these points at each discrete location in the image. This is done by setting up a grid and counting the number of times a point coordinate "falls" into each of the individual pixel "bins" at every location in that grid.
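
The grid-counting procedure described there can be sketched in base R like this. The example points, grid breaks and seed are arbitrary assumptions chosen for illustration; `findInterval()` assigns each coordinate to a grid cell and `table()` counts the points per cell.

```r
set.seed(42)  # arbitrary example data: 500 points in [0, 10] x [0, 1]
x <- runif(500, 0, 10)
y <- runif(500, 0, 1)
xbreaks <- seq(0, 10, by = 1)     # 10 equal-width columns
ybreaks <- seq(0, 1, by = 0.1)    # 10 equal-height rows
# findInterval() returns the index of the grid cell each coordinate falls into
xi <- findInterval(x, xbreaks, rightmost.closed = TRUE)
yi <- findInterval(y, ybreaks, rightmost.closed = TRUE)
# Count points per cell; factor levels keep empty cells as zeros
grid_counts <- table(factor(xi, levels = 1:10), factor(yi, levels = 1:10))
# Draw the counts as a heatmap, cold (blue) to hot (red); breaks act as cell edges
image(xbreaks, ybreaks, unclass(grid_counts),
      col = colorRampPalette(c("blue", "red"))(20), xlab = "x", ylab = "y")
```

The grid here is uniform; different sized bins only require different break vectors (or, as in the question, a different set of breaks per column).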

Perhaps some of the information on that website can be combined with what we have already?

Update: I combined some of what Andrie wrote with some of this question to arrive at the following, which is quite close to what I was imagining:

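numbet <- 20
numtri <- 100
prob <- 1/6   # probability of a win on each bet
# Fill a matrix: one row per trial, first column = trial id, then one column per bet
xcum <- matrix(NA, nrow=numtri, ncol=numbet+1)
for (i in 1:numtri) {
  x <- sample(c(0,1), numbet, prob=c(1-prob, prob), replace = TRUE)
  # Running relative frequency of wins after each bet
  xcum[i, ] <- c(i, cumsum(x)/(1:numbet))
}
colnames(xcum) <- c("trial", paste("bet", 1:numbet, sep=""))

# Reshape to long format: one row per (trial, bet) with the outcome at that bet
mxcum <- reshape(data.frame(xcum), varying=1+1:numbet,
  idvar="trial", v.names="outcome", direction="long", timevar="bet")

# From the other question: 2D kernel density estimate, drawn as filled contours
library(MASS)
dens <- kde2d(mxcum$bet, mxcum$outcome)
filled.contour(dens)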

I don't quite understand what's going on, but this seems to be more like what I wanted to produce (obviously without the different sized bins).
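
For reference, here is roughly what the `kde2d()` step does, sketched on a freshly simulated data set (the seed and the long-format construction are assumptions, not the asker's code). `MASS::kde2d()` places a bivariate normal kernel on every point and evaluates the summed density on an `n` x `n` grid (default `n = 25`) spanning the data range; it returns a list with components `x`, `y` and `z` that `filled.contour()` or `image()` can draw directly.

```r
library(MASS)  # kde2d() lives in MASS
set.seed(1)    # arbitrary seed
numbet <- 20
numtri <- 100
# Long format: one row per (trial, bet) with the running relative frequency of wins
sim <- do.call(rbind, lapply(seq_len(numtri), function(i) {
  x <- rbinom(numbet, size = 1, prob = 1/6)
  data.frame(trial = i, bet = seq_len(numbet), outcome = cumsum(x) / seq_len(numbet))
}))
# Smooth the points with a bivariate normal kernel and evaluate the estimated
# density on a 25 x 25 grid covering range(bet) x range(outcome)
dens <- kde2d(sim$bet, sim$outcome, n = 25)
str(dens)            # list: x (grid), y (grid), z (25 x 25 density values)
filled.contour(dens)
```

Because the density is smoothed rather than counted, the result is a continuous heatmap without explicit bins, which is why the different sized bins do not appear.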

Update: This is similar to the other plots here. It is not quite right:

library(hexbin)
plot(hexbin(x=mxcum$bet, y=mxcum$outcome))

Last try. As above:

image(dens)  # dens is the kde2d() result from above

This is pretty good. I would just like it to look like my hand-drawn sketch.

Recommended answer

Edit

I think the following solution does what you ask for.

(Note that this is slow, especially the reshape step.)

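numbet <- 32
numtri <- 1e5
prob <- 5/6   # probability of a loss (0) on each bet
# Fill a matrix: one row per trial, first column = trial id, then one column per bet
xcum <- matrix(NA, nrow=numtri, ncol=numbet+1)
for (i in 1:numtri) {
  x <- sample(c(0,1), numbet, prob=c(prob, 1-prob), replace = TRUE)
  # Running relative frequency of wins after each bet
  xcum[i, ] <- c(i, cumsum(x)/(1:numbet))
}
colnames(xcum) <- c("trial", paste("bet", 1:numbet, sep=""))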

mxcum <- reshape(data.frame(xcum), varying=1+1:numbet, 
  idvar="trial", v.names="outcome", direction="long", timevar="bet")


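library(plyr)
# Count how many trials reach each observed outcome at each bet (column V1)
mxcum2 <- ddply(mxcum, .(bet, outcome), nrow)
# Within each bet, stack the observed outcomes as equal-height bands tiling [0, 1]
# and use the share of trials in each band as the fill value
mxcum3 <- ddply(mxcum2, .(bet), summarize,
                ymin=c(0, head(seq_along(V1)/length(V1), -1)),
                ymax=seq_along(V1)/length(V1),
                fill=(V1/sum(V1)))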
head(mxcum3)

library(ggplot2)

# Common paths are drawn hot (red), rare paths cold (blue)
p <- ggplot(mxcum3, aes(xmin=bet-0.5, xmax=bet+0.5, ymin=ymin, ymax=ymax)) +
    geom_rect(aes(fill=fill), colour="grey80") +
    scale_fill_gradient("Share of trials", labels=scales::percent, low="blue", high="red") +
    scale_y_continuous(labels=scales::percent) +
    xlab("Bet")

print(p)
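
Because the reshape step is slow, one possible alternative is to skip the long format entirely and tabulate the running win counts column by column in base R. This is a sketch under assumptions, not the answer's code: it draws wins with `rbinom()` (probability 1/6, matching `prob=5/6` for a loss above), bins only the outcomes actually observed at each bet (as the `ddply()` version does), and draws the rectangles with base `rect()` instead of ggplot2 so it has no package dependencies.

```r
set.seed(123)  # arbitrary seed
numbet <- 32
numtri <- 1e5
# Column k holds the bet-k outcome (1 = win) of every trial
wins <- matrix(rbinom(numtri * numbet, size = 1, prob = 1/6), nrow = numtri)
# Running win count per trial, built column by column (fast: only numbet steps)
cumwins <- wins
for (k in 2:numbet) cumwins[, k] <- cumwins[, k - 1] + wins[, k]

bins <- do.call(rbind, lapply(seq_len(numbet), function(k) {
  n <- tabulate(cumwins[, k] + 1, nbins = k + 1)  # paths with j = 0..k wins
  keep <- n > 0                                   # only outcomes actually observed
  m <- sum(keep)
  data.frame(bet = k,
             outcome = (0:k)[keep] / k,            # relative frequency j / k
             ymin = (seq_len(m) - 1) / m,          # equal-height bands tiling [0, 1]
             ymax = seq_len(m) / m,
             fill = n[keep] / numtri)              # share of trials in the cell
}))

# Colour by share: rare = blue (cold), common = red (hot)
pal <- colorRampPalette(c("blue", "red"))(100)
idx <- 1 + floor(99 * bins$fill / max(bins$fill))
plot(NA, xlim = c(0.5, numbet + 0.5), ylim = c(0, 1),
     xlab = "Bet", ylab = "Position within column")
rect(bins$bet - 0.5, bins$ymin, bins$bet + 0.5, bins$ymax,
     col = pal[idx], border = "grey80")
```

As in the ggplot2 version, the y-axis shows position within a column rather than the outcome value itself; the outcome for each cell is kept in `bins$outcome` if it needs to be labelled.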
