How to keep consistent axes scaling in a grid of ggplot2 plots with different canvas size


Problem description




EDIT: description clarified; code example and plots added.

I have a data set with locations of several animals.

I created a grid of location scatter plots, one per animal. Because the x and y values are distances, I want x and y on the same scale within each plot (so distances are not distorted) and across plots (so that different plots can be compared at the same scale).

Faceting is a natural choice for this, and it works with coord_fixed(). However, it becomes more complicated when the data contain outliers (which could be errors). I modified the example from @Mark Peterson's great answer to add some outlier points.

library(ggplot2)

set.seed(8675309)
df <-
  data.frame(
    x = runif(40, 1, 20)
    , y = runif(40, 100, 140)
    , ind = sample(LETTERS[1:4], 40, TRUE)
  )
# add some outliers to stretch the plot
outliers <- data.frame(x = c(-100, 30, 60,-50),
                       y = c(20, 200, -100, 500),
                       ind = LETTERS[1:4])
df <- rbind(df, outliers)

ggplot(df , aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()

This is what we got.

1. Facet plot with coord_fixed(): consistent scales, aligned axes

This plot satisfies both the scale-ratio requirement and the scale-consistency requirement. It also has all axes aligned, i.e. xlim and ylim are the same in every panel. This is useful because it shows the positions of the animals relative to each other.

I also want to check the pattern in each plot and compare the patterns. While keeping the facet plot for relative position, I want to add another plot that has consistent scales but axes that are not aligned. If you draw each plot individually, ggplot2 chooses xlim and ylim to just cover the data, without the alignment requirement. So I just need to draw each plot and arrange them with gridExtra or cowplot.

Then, to deal with the outliers, our plan is to add a zoom button that zooms in on all plots (the plots will be in a Shiny app).

We decided to center every plot on its centroid. Although this wastes more space, with every plot centered correctly, zooming in on all of them will show the bulk of the points in each plot, and the plots remain comparable in scale.
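The zoom step described above can be sketched as a small helper that shrinks a pair of axis limits around their center. This is only an illustration under assumptions: `zoom_limits` and its `zoom` argument are hypothetical names, not part of ggplot2 or Shiny, and the real app may handle zooming differently.

```r
# Hypothetical helper: shrink (zoom > 1) or widen (zoom < 1) axis limits
# around their midpoint, keeping the center of the plot fixed.
zoom_limits <- function(lim, zoom = 2) {
  stopifnot(length(lim) == 2, zoom > 0)
  center <- mean(lim)
  half_width <- diff(lim) / (2 * zoom)
  c(center - half_width, center + half_width)
}

# Zooming in by a factor of 2 halves the visible span around the center
zoomed <- zoom_limits(c(-10, 30), zoom = 2)
```

Applying the same `zoom` value to the xlim and ylim of every plot keeps the scales comparable across plots, provided the limits had equal widths and heights before zooming.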

I have a function that adjusts each plot to its median center, somewhat similar to @Mark Peterson's code.

I know the median center is not well defined for 2D points, but it's good enough for my needs. Because I need to adjust each plot individually, I cannot use facets anymore.

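library(gridExtra)  # provides grid.arrange()

# given a vector, get limits that are symmetric around its median
expand_1D_center <- function(vec){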
  center <- median(vec)
  new_diff <- max(center - min(vec), 
                  max(vec) - center)
  return(c(new_min = center - new_diff, 
           new_max = center + new_diff))
}
# given x y vectors, get new x y lim to make centroid center
expand_2D_center <- function(x_vec, y_vec){
  return(list(xlim = expand_1D_center(x_vec),
              ylim = expand_1D_center(y_vec)))
}
# plot each with center adjusted
id_vector <- sort(unique(df$ind))
g_list <- vector("list", length = length(id_vector))
for (i in seq_along(id_vector)) {
  data_i <- df[df$ind == id_vector[i], ]
  new_lim <- expand_2D_center(data_i$x, data_i$y)
  g_list[[i]] <- ggplot(data = data_i, aes(x, y)) +
    geom_point() +
    coord_fixed(xlim = new_lim$xlim, ylim = new_lim$ylim) 
}
grid.arrange(grobs = g_list, ncol = 2, respect=TRUE)

2. Center-adjusted plots: the x/y scale is correct within each plot, but not consistent across plots.

I hope this is clearer now. My first post didn't state the problem clearly: I was focused on the current problem and left out the history that is needed to explain our requirements.

@Mark Peterson's answer seems to solve this problem; I'll read the code further to verify.

Thanks!

EDIT: to give some context, I added the plots from the real data here:

The overview plot with all gulls in one plot. Note that some outliers stretch the plot.

This is the facet plot, which is useful for having everything aligned.

These are the individual plots, each with the correct scale but not aligned across plots.

This one has each plot centered on its centroid. I plan to zoom in on all of them at the same time. The only problem is that the scales are not consistent across plots.

EDIT: I tried @Mark Peterson's code on my data. It cropped some points, but the plots are consistent. The cropping is probably because my data have much bigger values, so the original padding is not big enough.

Mark uses the maximum x range across all plots for each plot, so every plot has the same range. My code tried to fit every plot to its own pattern, but placing those plots inside a grid with consistent scales would require shrinking the plot with the biggest canvas or padding the smallest plot. Setting the range of every plot to the same size has a similar effect and is much simpler to implement.
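One way to combine the two ideas (median centering from the question, a shared range from the answer) is sketched below in base R. This is an assumption-laden illustration rather than code from either post: `half_span` and `common_centered_limits` are hypothetical helpers, and the toy data frame only mimics the `x`, `y`, `ind` columns used above.

```r
# Hypothetical helper: largest distance from the median to either extreme
half_span <- function(vec) {
  center <- median(vec)
  max(center - min(vec), max(vec) - center)
}

# Hypothetical helper: median-centered limits that share one width and one
# height across all individuals, so no individual's points fall outside.
common_centered_limits <- function(df, pad = 0) {
  groups <- split(df, df$ind, drop = TRUE)
  # Largest half-span over all individuals, computed per axis
  half_x <- max(vapply(groups, function(d) half_span(d$x), numeric(1))) + pad
  half_y <- max(vapply(groups, function(d) half_span(d$y), numeric(1))) + pad
  lapply(groups, function(d) {
    list(xlim = median(d$x) + c(-1, 1) * half_x,
         ylim = median(d$y) + c(-1, 1) * half_y)
  })
}

# Toy data, for illustration only
toy <- data.frame(
  x = c(0, 1, 2, 10, 20, 40),
  y = c(0, 5, 10, 100, 101, 102),
  ind = c("A", "A", "A", "B", "B", "B")
)
lims <- common_centered_limits(toy)
```

Each element of `lims` could then be passed to `coord_fixed(xlim = ..., ylim = ...)` for the matching individual. Because every panel spans the same number of data units, panels of equal size in the grid end up with the same scale.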

Solution

Alright, I think I have gotten my best guess at what you are asking, though I agree with @MrFlick that explicitly sharing data would be a huge help with that.

If you had simple data with all of your animals on the same basic grid, I am guessing you wouldn't be asking (at least not the way you are). That is, given these data:

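library(ggplot2)

set.seed(8675309)
df <-
  data.frame(
    x = runif(40, 1, 20)
    , y = runif(40, 100, 140)
    , ind = sample(LETTERS[1:4], 40, TRUE)
    # keep ind as a factor: as.numeric(ind) and levels() below rely on it,
    # and R >= 4.0 no longer converts strings to factors by default
    , stringsAsFactors = TRUE
  )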

This straightforward facet_wrap works:

ggplot(df , aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()

to give this:

But, you said that facet_wrap solutions wouldn't work. So, I am guessing that you have data where each animal is in a different grid, like this (note, using dplyr here and much more below):

library(dplyr)

# shift each individual onto its own grid (ind must be a factor here)
modDF <-
  df %>%
  mutate(x = x + as.numeric(ind)*10
         , y = y + as.numeric(ind)*20)

And that means that the above code (using modDF instead of df)

ggplot(modDF, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()

gives this:

which has a ton of wasted space and doesn't look great. So, I think you are asking how to handle data like these. For that, I think what you need to do is calculate the largest range (in each axis) and then generate that range centered on the data for each individual. For that, I am relying heavily on dplyr to group_by individual and calculate the minimum and maximum x/y locations. Then, I calculate a number of additional columns to calculate the midpoint of the data for each individual, the size of the range, and then where the range should extend to be set to the largest width/height needed and be centered on that individual's data. Note that I am also padding these a little bit so that I can set expand = FALSE when I implement the ranges.

getRanges <-
  modDF %>%
  group_by(ind) %>%
  summarise(
    minx = min(x)
    , maxx = max(x)
    , miny = min(y)
    , maxy = max(y)
  ) %>%
  mutate(
    # Find mid points for range setting
    midx = (maxx + minx)/2
    , midy = (maxy + miny)/2
    # Find size of all ranges
    , xrange = maxx - minx
    , yrange = maxy - miny
    # Set X lims the size of the biggest range, centered at the middle
    , xstart = midx - max(xrange)/2 - 0.5
    , xend = midx + max(xrange)/2 + 0.5
    # Set Y lims the size of the biggest range, centered at the middle
    , ystart = midy - max(yrange)/2 - 0.5
    , yend = midy + max(yrange)/2 + 0.5
    )

gives

     ind     minx     maxx     miny     maxy     midx     midy   xrange   yrange   xstart     xend   ystart     yend
  <fctr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1      A 14.91873 29.53871 120.0743 157.6944 22.22872 138.8844 14.61997 37.62010 14.17717 30.28027 119.5743 158.1944
2      B 22.50432 37.27647 153.5654 179.0589 29.89039 166.3122 14.77215 25.49352 21.83884 37.94195 147.0021 185.6222
3      C 32.15187 47.08845 165.9829 195.0261 39.62016 180.5045 14.93658 29.04320 31.56861 47.67171 161.1945 199.8146
4      D 44.49392 59.59702 192.7243 214.5523 52.04547 203.6383 15.10310 21.82806 43.99392 60.09702 184.3283 222.9484
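If the coordinates are much larger than in this example, a fixed pad of 0.5 may be too small relative to the ranges, and points at the panel edges can be visually clipped. A possible variant, sketched here in base R under assumptions, pads by a fraction of the largest range instead. `shared_ranges` and `pad_frac` are hypothetical names, and the toy data are for illustration only.

```r
# Hypothetical base-R variant of the range computation: every individual
# gets limits of the same size, centered on the midpoint of its own data,
# padded by a fraction of the largest range rather than a fixed 0.5.
shared_ranges <- function(df, pad_frac = 0.05) {
  groups <- split(df, df$ind, drop = TRUE)
  stats <- do.call(rbind, lapply(names(groups), function(id) {
    d <- groups[[id]]
    data.frame(
      ind = id,
      midx = (max(d$x) + min(d$x)) / 2,
      midy = (max(d$y) + min(d$y)) / 2,
      xrange = max(d$x) - min(d$x),
      yrange = max(d$y) - min(d$y)
    )
  }))
  # Half of the padded largest range, per axis
  half_x <- max(stats$xrange) * (1 + pad_frac) / 2
  half_y <- max(stats$yrange) * (1 + pad_frac) / 2
  stats$xstart <- stats$midx - half_x
  stats$xend <- stats$midx + half_x
  stats$ystart <- stats$midy - half_y
  stats$yend <- stats$midy + half_y
  stats
}

# Toy data with large coordinate values, for illustration only
toy <- data.frame(
  x = c(1000, 1400, 5000, 5100),
  y = c(200, 900, 3000, 3200),
  ind = c("A", "A", "B", "B")
)
ranges <- shared_ranges(toy)
```

The resulting `xstart`, `xend`, `ystart` and `yend` columns play the same role as the columns of the same names in the table above, so they could be used the same way when setting the limits with `expand = FALSE`.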

Then, I loop through each individual, generating the plot needed and setting the range to what was calculated for that individual. (You could use ggtitle instead of facet_wrap but I like the strip effect from facet_wrap.)

sepPlots <- lapply(levels(modDF$ind), function(thisInd){
  thisRange <-
    filter(getRanges, ind == thisInd)

  modDF %>%
    filter(ind == thisInd) %>%
    ggplot(aes(x = x, y = y)) +
    geom_point() +
    coord_fixed(
      xlim = c(thisRange$xstart, thisRange$xend)
      , ylim = c(thisRange$ystart, thisRange$yend)
      , expand = FALSE
    ) +
    # ggtitle(thisInd)
    facet_wrap(~ind)
})

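Then, I use plot_grid from cowplot to arrange the plots together. Note that loading cowplot sets a theme, so I am resetting the theme because I am not a huge fan of the one from cowplot. (Since cowplot 1.0, loading the package no longer changes the default theme, so the reset is harmless but unnecessary there.)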

library(cowplot)
theme_set(theme_gray())

plot_grid(plotlist = sepPlots)

gives:

From there, you can play around with scales and axis labels as you see fit.
