Setting hex bins in ggplot2 to same size

Problem description

    I'm trying to make a hexbin representation of data in several categories. The problem is, facetting these bins seems to make all of them different sizes.

    set.seed(1) #Create data
    bindata <- data.frame(x=rnorm(100), y=rnorm(100))
    fac_probs <- dnorm(seq(-3, 3, length.out=26))
    fac_probs <- fac_probs/sum(fac_probs)
    bindata$factor <- sample(letters, 100, replace=TRUE, prob=fac_probs)
    
    library(ggplot2) #Actual plotting
    library(hexbin)
    
    ggplot(bindata, aes(x=x, y=y)) +
      geom_hex() +
      facet_wrap(~factor)
    

    Is it possible to set something to make all these bins physically the same size?

    Solution

    As Julius says, the problem is that hexGrob doesn't get the information about the bin sizes, and guesses it from the differences it finds within the facet.

    Obviously, it would make sense to hand dx and dy to a hexGrob -- not having the width and height of a hexagon is like specifying a circle by center without giving the radius.
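To make the guessing problem concrete, here is a minimal base R sketch. The helper `guess_width` is hypothetical: it is a simplified stand-in for a resolution-style heuristic (smallest gap between distinct center coordinates), not ggplot2's actual code, and the center values are made up for illustration.

```r
# Hypothetical helper: guess a bin width as the smallest gap between
# distinct center coordinates (a simplified resolution-style heuristic).
guess_width <- function(centers) {
  u <- sort(unique(centers))
  if (length(u) < 2) return(NA_real_)  # a single center carries no size information
  min(diff(u))
}

# x coordinates of hexagon centers on one grid with horizontal spacing 1;
# every other row is shifted by half a spacing.
full_facet   <- c(0, 1, 2, 0.5, 1.5)  # many cells populated
sparse_facet <- c(0, 2)               # only two cells of the same row populated

guess_width(full_facet)    # 0.5
guess_width(sparse_facet)  # 2
```

Both facets lie on the same grid, yet the guess differs by a factor of four, which is why the hexagons end up drawn at different sizes.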

    Workaround:

    The resolution strategy works if the facet contains two adjacent hexagons that differ in both x and y. So, as a workaround, I'll manually construct a data.frame containing the x and y center coordinates of the cells, the factor for facetting, and the counts:

    In addition to the libraries specified in the question, I'll need

    library (reshape2)
    

    and also bindata$factor actually needs to be a factor:

    bindata$factor <- as.factor (bindata$factor)
    

    Now, calculate the basic hexagon grid

    h <- hexbin (bindata, xbins = 5, IDs = TRUE, 
                 xbnds = range (bindata$x), 
                 ybnds = range (bindata$y))
    

    Next, we need to calculate the counts depending on bindata$factor

    counts <- hexTapply (h, bindata$factor, table)
    counts <- t (simplify2array (counts))
    counts <- melt (counts)
    colnames (counts)  <- c ("ID", "factor", "counts")
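For readers who want to see the shape of this long table without hexbin or reshape2, the following base R sketch builds an equivalent one from toy data. The vectors `cell_id` and `facet` are invented stand-ins for the cell IDs and `bindata$factor`; this is an illustration of the result's layout, not the method used above.

```r
# Toy stand-ins: the cell each observation fell into, and its facet level.
cell_id <- c(3, 3, 5, 7, 7, 7)
facet   <- factor(c("a", "b", "a", "a", "a", "b"))

# Cross-tabulate, then flatten to one row per (cell, facet) pair.
counts <- as.data.frame(table(ID = cell_id, factor = facet),
                        responseName = "counts",
                        stringsAsFactors = FALSE)
counts$ID <- as.integer(counts$ID)  # table() stores the IDs as character labels

counts
```

Note that pairs with no observations (here cell 5 in facet "b") still get a row with a count of 0, which is the same effect that later has to be handled when plotting.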
    

    As we have the cell IDs, we can merge this data.frame with the proper coordinates:

    hexdf <- data.frame (hcell2xy (h),  ID = h@cell)
    hexdf <- merge (counts, hexdf)
    

    Here's what the data.frame looks like:

    > head (hexdf)
      ID factor counts          x         y
    1  3      e      0 -0.3681728 -1.914359
    2  3      s      0 -0.3681728 -1.914359
    3  3      y      0 -0.3681728 -1.914359
    4  3      r      0 -0.3681728 -1.914359
    5  3      p      0 -0.3681728 -1.914359
    6  3      o      0 -0.3681728 -1.914359
    

    Plotting this with ggplot (use the command below) yields the correct bin sizes, but the figure looks a bit odd: hexagons with a count of 0 are drawn, but only where some other facet has this bin populated. To suppress the drawing, we can set the counts there to NA and make the na.value completely transparent (it defaults to grey50):

    hexdf$counts [hexdf$counts == 0] <- NA
    
    ggplot(hexdf, aes(x=x, y=y, fill = counts)) +
      geom_hex(stat="identity") +
      facet_wrap(~factor) +
      coord_equal () +
      scale_fill_continuous (low = "grey80", high = "#000040", na.value = "#00000000")
    

    yields the figure at the top of the post.

    This strategy works as long as the binwidths are correct without facetting. If the binwidths are set very small, the resolution may still yield too large dx and dy. In that case, we can supply hexGrob with two adjacent bins (but differing in both x and y) with NA counts for each facet.

    dummy <- hgridcent (xbins = 5, 
                        xbnds = range (bindata$x),  
                        ybnds = range (bindata$y),  
                        shape = 1)
    
    dummy <- data.frame (ID = 0,
                         factor = rep (levels (bindata$factor), each = 2),
                         counts = NA,
                         x = rep (dummy$x [1] + c (0, dummy$dx/2), 
                                  nlevels (bindata$factor)),
                         y = rep (dummy$y [1] + c (0, dummy$dy  ), 
                                  nlevels (bindata$factor)))
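Why two dummy cells per facet are enough can be sketched in base R. The spacings `dx` and `dy` below are hypothetical values, and `smallest_gap` is a simplified stand-in for a resolution-style guess, not ggplot2's actual implementation.

```r
# Hypothetical grid spacings, chosen only for illustration.
dx <- 1.0
dy <- 0.8

# Smallest gap between distinct values (simplified resolution-style guess).
smallest_gap <- function(v) min(diff(sort(unique(v))))

# A sparse facet: two populated cells that are far apart on the grid.
sparse <- data.frame(x = c(0, 3 * dx), y = c(0, 4 * dy))

# Two diagonally adjacent dummy cells, offset by (dx / 2, dy).
dummy <- data.frame(x = c(0, dx / 2), y = c(0, dy))

padded <- rbind(sparse, dummy)

c(smallest_gap(sparse$x), smallest_gap(sparse$y))  # 3 * dx and 4 * dy
c(smallest_gap(padded$x), smallest_gap(padded$y))  # dx / 2 and dy
```

With the dummy pair present, the smallest gaps no longer depend on which real cells happen to be populated in a facet.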
    

    An additional advantage of this approach is that we can delete all the rows with 0 counts already in counts, in this case reducing the size of hexdf by roughly 3/4 (122 rows instead of 520):

    counts <- counts [counts$counts > 0 ,]
    hexdf <- data.frame (hcell2xy (h),  ID = h@cell)
    hexdf <- merge (counts, hexdf)
    hexdf <- rbind (hexdf, dummy)
    

    The plot looks exactly the same as above, but you can visualize the difference with na.value not being fully transparent.


    More about the problem

    The problem is not unique to facetting but occurs always if too few bins are occupied, so that no "diagonally" adjacent bins are populated.

    Here's a series of more minimal data that shows the problem:

    First, I trace hexBin so that I get all center coordinates of the same hexagonal grid that ggplot2:::hexBin uses, as well as the object returned by hexbin:

    trace (ggplot2:::hexBin, exit = quote ({trace.grid <<- as.data.frame (hgridcent (xbins = xbins, xbnds = xbnds, ybnds = ybnds, shape = ybins/xbins) [1:2]); trace.h <<- hb}))
    

    Set up a very small data set:

    df <- data.frame (x = 3 : 1, y = 1 : 3)
    

    And plot:

    p <- ggplot(df, aes(x=x, y=y)) +  geom_hex(binwidth=c(1, 1)) +          
         coord_fixed (xlim = c (0, 4), ylim = c (0,4))
    
    p # needed for the tracing to occur
    p + geom_point (data = trace.grid, size = 4) + 
        geom_point (data = df, col = "red") # data pts
    
    str (trace.h)
    
    Formal class 'hexbin' [package "hexbin"] with 16 slots
      ..@ cell  : int [1:3] 3 5 7
      ..@ count : int [1:3] 1 1 1
      ..@ xcm   : num [1:3] 3 2 1
      ..@ ycm   : num [1:3] 1 2 3
      ..@ xbins : num 2
      ..@ shape : num 1
      ..@ xbnds : num [1:2] 1 3
      ..@ ybnds : num [1:2] 1 3
      ..@ dimen : num [1:2] 4 3
      ..@ n     : int 3
      ..@ ncells: int 3
      ..@ call  : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
      ..@ xlab  : chr "x"
      ..@ ylab  : chr "y"
      ..@ cID   : NULL
      ..@ cAtt  : int(0) 
    

    I repeat the plot, leaving out data point 2:

    p <- ggplot(df [-2,], aes(x=x, y=y)) +  geom_hex(binwidth=c(1, 1)) +          coord_fixed (xlim = c (0, 4), ylim = c (0,4))
    p
    p + geom_point (data = trace.grid, size = 4) + geom_point (data = df, col = "red")
    str (trace.h)
    
    Formal class 'hexbin' [package "hexbin"] with 16 slots
      ..@ cell  : int [1:2] 3 7
      ..@ count : int [1:2] 1 1
      ..@ xcm   : num [1:2] 3 1
      ..@ ycm   : num [1:2] 1 3
      ..@ xbins : num 2
      ..@ shape : num 1
      ..@ xbnds : num [1:2] 1 3
      ..@ ybnds : num [1:2] 1 3
      ..@ dimen : num [1:2] 4 3
      ..@ n     : int 2
      ..@ ncells: int 2
      ..@ call  : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
      ..@ xlab  : chr "x"
      ..@ ylab  : chr "y"
      ..@ cID   : NULL
      ..@ cAtt  : int(0) 
    

    • Note that the results from hexbin are on the same grid (the cell numbers did not change; cell 5 is simply no longer populated and thus not listed), and the grid dimensions and ranges did not change. But the plotted hexagons did change dramatically.

    • Also notice that hgridcent forgets to return the center coordinates of the first cell (lower left), even though that cell gets populated:

    df <- data.frame (x = 1 : 3, y = 1 : 3)
    
    p <- ggplot(df, aes(x=x, y=y)) +  geom_hex(binwidth=c(0.5, 0.8)) +          
         coord_fixed (xlim = c (0, 4), ylim = c (0,4))
    
    p # needed for the tracing to occur
    p + geom_point (data = trace.grid, size = 4) + 
        geom_point (data = df, col = "red") + # data pts
        geom_point (data = as.data.frame (hcell2xy (trace.h)), shape = 1, size = 6)
    

    Here, the rendering of the hexagons cannot possibly be correct - they do not belong to one hexagonal grid.
