Setting hex bins in ggplot2 to same size
I'm trying to make a hexbin representation of data in several categories. The problem is, facetting these bins seems to make all of them different sizes.
set.seed(1) #Create data
bindata <- data.frame(x=rnorm(100), y=rnorm(100))
fac_probs <- dnorm(seq(-3, 3, length.out=26))
fac_probs <- fac_probs/sum(fac_probs)
bindata$factor <- sample(letters, 100, replace=TRUE, prob=fac_probs)
library(ggplot2) #Actual plotting
library(hexbin)
ggplot(bindata, aes(x=x, y=y)) +
geom_hex() +
facet_wrap(~factor)
Is it possible to set something to make all these bins physically the same size?
As Julius says, the problem is that hexGrob doesn't get the information about the bin sizes, and guesses it from the differences it finds within the facet.
Obviously, it would make sense to hand dx and dy to a hexGrob: not having the width and height of a hexagon is like specifying a circle by center without giving the radius.
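To illustrate why guessing the size from within-facet differences goes wrong, here is a minimal base R sketch. The helper min_gap is hypothetical: it only mimics the idea of taking the smallest gap between distinct center coordinates and is not ggplot2's actual code. The coordinates are made up for illustration.

```r
# Hypothetical helper: smallest gap between distinct values, a stand-in for
# a resolution-style guess (an assumption, not ggplot2's implementation).
min_gap <- function(v) {
  u <- sort(unique(v))
  if (length(u) < 2) return(NA_real_)
  min(diff(u))
}

# x centers of occupied cells; both facets share the same underlying grid
facet_a <- c(0, 0.5, 1, 1.5)  # neighbouring cells are occupied
facet_b <- c(0, 1.5)          # only two distant cells are occupied

gap_a <- min_gap(facet_a)  # recovers the true grid spacing
gap_b <- min_gap(facet_b)  # overestimates it, so hexagons would be drawn too large
```

The two facets sit on the same grid, yet the guessed spacing differs, which is the kind of mismatch that shows up as differently sized hexagons.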
Workaround:
The resolution strategy works if the facet contains two adjacent hexagons that differ in both x and y. So, as a workaround, I'll manually construct a data.frame containing the x and y center coordinates of the cells, the factor for facetting, and the counts:
In addition to the libraries specified in the question, I'll need
library (reshape2)
and also bindata$factor actually needs to be a factor:
bindata$factor <- as.factor (bindata$factor)
Now, calculate the basic hexagon grid
h <- hexbin (bindata, xbins = 5, IDs = TRUE,
xbnds = range (bindata$x),
ybnds = range (bindata$y))
Next, we need to calculate the counts depending on bindata$factor
counts <- hexTapply (h, bindata$factor, table)
counts <- t (simplify2array (counts))
counts <- melt (counts)
colnames (counts) <- c ("ID", "factor", "counts")
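For readers who want to see the shape of the resulting long table without running hexbin, the following base R sketch builds an equivalent cell-by-factor count table from made-up data. It is an alternative illustration, not the method used above: cell_id and grp are hypothetical stand-ins for the cell IDs and bindata$factor, and as.data.frame on a table replaces the melt step.

```r
# Made-up stand-ins: a cell ID and a facet level for each of 6 observations
cell_id <- c(3, 3, 5, 7, 7, 7)
grp     <- factor(c("a", "b", "a", "a", "a", "b"))

# Cross-tabulate and convert to long format: one row per (cell, level) pair,
# including the pairs that have a count of 0
counts_long <- as.data.frame(table(ID = cell_id, factor = grp),
                             responseName = "counts")

# Note: ID comes back as a factor here; convert it before merging on numeric IDs
counts_long$ID <- as.numeric(as.character(counts_long$ID))
```

As in the reshape above, every cell appears once per factor level, which is why zero-count rows show up later.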
As we have the cell IDs, we can merge this data.frame with the proper coordinates:
hexdf <- data.frame (hcell2xy (h), ID = h@cell)
hexdf <- merge (counts, hexdf)
Here's what the data.frame looks like:
> head (hexdf)
ID factor counts x y
1 3 e 0 -0.3681728 -1.914359
2 3 s 0 -0.3681728 -1.914359
3 3 y 0 -0.3681728 -1.914359
4 3 r 0 -0.3681728 -1.914359
5 3 p 0 -0.3681728 -1.914359
6 3 o 0 -0.3681728 -1.914359
Plotting this with ggplot (using the command below) yields the correct bin sizes, but the figure looks a bit weird: 0-count hexagons are drawn, but only where some other facet has this bin populated. To suppress the drawing, we can set the counts there to NA and make the na.value completely transparent (it defaults to grey50):
hexdf$counts [hexdf$counts == 0] <- NA
ggplot(hexdf, aes(x=x, y=y, fill = counts)) +
geom_hex(stat="identity") +
facet_wrap(~factor) +
coord_equal () +
scale_fill_continuous (low = "grey80", high = "#000040", na.value = "#00000000")
yields the figure at the top of the post.
This strategy works as long as the binwidths are correct without facetting. If the binwidths are set very small, the resolution may still yield too large dx and dy. In that case, we can supply hexGrob with two adjacent bins (but differing in both x and y) with NA counts for each facet:
dummy <- hgridcent (xbins = 5,
xbnds = range (bindata$x),
ybnds = range (bindata$y),
shape = 1)
dummy <- data.frame (ID = 0,
factor = rep (levels (bindata$factor), each = 2),
counts = NA,
x = rep (dummy$x [1] + c (0, dummy$dx/2),
nlevels (bindata$factor)),
y = rep (dummy$y [1] + c (0, dummy$dy ),
nlevels (bindata$factor)))
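The offsets c(0, dx/2) and c(0, dy) used for the dummy rows can be understood from the geometry of a hexagonal grid. Below is a minimal base R sketch under the assumption that odd rows are shifted right by half a column width. The function hex_centers is hypothetical and only illustrates the layout; it is not part of hexbin.

```r
# Hypothetical helper (not from hexbin): centers of a small hexagonal grid
# in which every odd row is shifted right by half a column width.
hex_centers <- function(n_rows, n_cols, dx, dy, x0 = 0, y0 = 0) {
  grid <- expand.grid(col = seq_len(n_cols) - 1, row = seq_len(n_rows) - 1)
  data.frame(
    row = grid$row,
    col = grid$col,
    x = x0 + dx * (grid$col + ifelse(grid$row %% 2 == 1, 0.5, 0)),
    y = y0 + dy * grid$row
  )
}

centers <- hex_centers(n_rows = 3, n_cols = 3, dx = 1, dy = 0.8)

# The first cell and its "diagonal" neighbour in the next row
pair <- centers[centers$col == 0 & centers$row %in% 0:1, ]
x_step <- diff(pair$x)  # half the column spacing, dx / 2
y_step <- diff(pair$y)  # the row spacing, dy
```

Two such cells differ in both x and y, so a gap-based guess of the hexagon size has what it needs in each direction.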
An additional advantage of this approach is that we can delete all the rows with 0 counts already in counts, in this case reducing the size of hexdf by roughly 3/4 (122 rows instead of 520):
counts <- counts [counts$counts > 0 ,]
hexdf <- data.frame (hcell2xy (h), ID = h@cell)
hexdf <- merge (counts, hexdf)
hexdf <- rbind (hexdf, dummy)
The plot looks exactly the same as above, but you can see the difference by making na.value not fully transparent.
More about the problem
The problem is not unique to facetting but always occurs if too few bins are occupied, so that no "diagonally" adjacent bins are populated.
Here's a series of more minimal data sets that show the problem:
First, I trace hexBin so that I get all center coordinates of the same hexagonal grid that ggplot2:::hexBin uses, as well as the object returned by hexbin:
trace (ggplot2:::hexBin, exit = quote ({trace.grid <<- as.data.frame (hgridcent (xbins = xbins, xbnds = xbnds, ybnds = ybnds, shape = ybins/xbins) [1:2]); trace.h <<- hb}))
Set up a very small data set:
df <- data.frame (x = 3 : 1, y = 1 : 3)
And plot:
p <- ggplot(df, aes(x=x, y=y)) + geom_hex(binwidth=c(1, 1)) +
coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) +
geom_point (data = df, col = "red") # data pts
str (trace.h)
Formal class 'hexbin' [package "hexbin"] with 16 slots
..@ cell : int [1:3] 3 5 7
..@ count : int [1:3] 1 1 1
..@ xcm : num [1:3] 3 2 1
..@ ycm : num [1:3] 1 2 3
..@ xbins : num 2
..@ shape : num 1
..@ xbnds : num [1:2] 1 3
..@ ybnds : num [1:2] 1 3
..@ dimen : num [1:2] 4 3
..@ n : int 3
..@ ncells: int 3
..@ call : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
..@ xlab : chr "x"
..@ ylab : chr "y"
..@ cID : NULL
..@ cAtt : int(0)
I repeat the plot, leaving out data point 2:
p <- ggplot(df [-2,], aes(x=x, y=y)) + geom_hex(binwidth=c(1, 1)) + coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p
p + geom_point (data = trace.grid, size = 4) + geom_point (data = df, col = "red")
str (trace.h)
Formal class 'hexbin' [package "hexbin"] with 16 slots
..@ cell : int [1:2] 3 7
..@ count : int [1:2] 1 1
..@ xcm : num [1:2] 3 1
..@ ycm : num [1:2] 1 3
..@ xbins : num 2
..@ shape : num 1
..@ xbnds : num [1:2] 1 3
..@ ybnds : num [1:2] 1 3
..@ dimen : num [1:2] 4 3
..@ n : int 2
..@ ncells: int 2
..@ call : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
..@ xlab : chr "x"
..@ ylab : chr "y"
..@ cID : NULL
..@ cAtt : int(0)
Note that the results from hexbin are on the same grid (cell numbers did not change, just cell 5 is not populated any more and thus not listed), and grid dimensions and ranges did not change. But the plotted hexagons did change dramatically.
Also notice that hgridcent forgets to return the center coordinates of the first cell (lower left).
Though it gets populated:
df <- data.frame (x = 1 : 3, y = 1 : 3)
p <- ggplot(df, aes(x=x, y=y)) + geom_hex(binwidth=c(0.5, 0.8)) +
coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) +
geom_point (data = df, col = "red") + # data pts
geom_point (data = as.data.frame (hcell2xy (trace.h)), shape = 1, size = 6)
Here, the rendering of the hexagons cannot possibly be correct - they do not belong to one hexagonal grid.