ggplot2 identical scales (non-continuous) on both sides



Goal

Use ggplot2 (latest version) to produce a graph that duplicates the x- or y-axis on both sides of the plot, where the scale is not continuous.

Minimal Reprex

# Example data
dat1 <- tibble::tibble(x = c(rep("a", 50), rep("b", 50)), 
                       y = runif(100))

# Standard boxplot
p1 <- ggplot2::ggplot(dat1) +
    ggplot2::geom_boxplot(ggplot2::aes(x = x, y = y))

When the scale is continuous, this is easy to do with an identity transformation (clearly one-to-one).

# This works
p1 + ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(~ .))

However, when the scale is not continuous, this doesn't work, as other scale_* functions don't have a sec.axis argument (which makes sense).

# This doesn't work
p1 + ggplot2::scale_x_discrete(sec.axis = ggplot2::sec_axis(~ .))

Error in discrete_scale(c("x", "xmin", "xmax", "xend"), "position_d",  : 
  unused argument (sec.axis = <environment>)

I also tried using the position argument in the scale_* functions, but this doesn't work either.

# This doesn't work either
p1 + ggplot2::scale_x_discrete(position = c("top", "bottom"))

Error in match.arg(position, c("left", "right", "top", "bottom")) : 
  'arg' must be of length 1

Edit

For clarity, I was hoping to duplicate the x- or y-axis where the scale is anything, not just discrete (a factor variable). I just used a discrete variable in the minimal reprex for simplicity.

For example, this issue arises in a context where the non-continuous scale is datetime or time format.

Solution

Duplicating (and modifying) discrete axis in ggplot2

You can adapt that answer by just putting the same labels on both sides. As for "you can convert anything non-continuous to a factor, but that's even more inelegant!" from your comment above: a factor is exactly what a non-continuous axis is, so I'm not sure why that would be a problem for you.

TL;DR: Use as.numeric(...) for your categorical aesthetic and manually supply the labels from the original data, using scale_*_continuous(..., sec.axis = sec_axis(~ ., ...)).
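A minimal sketch of the TL;DR for the discrete case, assuming data shaped like dat1 from the question. The names lv and p are introduced here for illustration only, and the x column is built as a factor so that as.numeric() returns the level index.

```r
library(ggplot2)

set.seed(1)
dat1 <- data.frame(x = factor(rep(c("a", "b"), each = 50)),
                   y = runif(100))

# Level names of the factor, used as axis labels on both sides
lv <- levels(dat1$x)

# Plot the factor as its integer codes; group = x keeps one box per level
p <- ggplot(dat1, aes(x = as.numeric(x), y = y, group = x)) +
  geom_boxplot() +
  scale_x_continuous(name = "x",
                     breaks = seq_along(lv),  # one break per level
                     labels = lv,             # original level names
                     sec.axis = dup_axis())   # copy the axis to the top
p
```

dup_axis() derives its breaks and labels from the primary axis by default, so the level names only have to be supplied once.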


Edited to update:

I happened to look back through this thread and saw that it was asked about dates and times. That means the question is worded incorrectly: dates and times are continuous, not discrete. Discrete scales are factors. Dates and times are ordered continuous scales. Under the hood, they're just the number of days or seconds since "1970-01-01".
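A quick base R check of that claim, using the same timestamps that appear in the tables in this answer (UTC is assumed for the datetime):

```r
d  <- as.Date("2017-08-01")
dt <- as.POSIXct("2017-08-01 23:58:00", tz = "UTC")

as.numeric(d)   # 17379: days since 1970-01-01
as.numeric(dt)  # 1501631880: seconds since 1970-01-01 00:00:00 UTC

# The conversion is reversible, which is what the labels trick relies on
as.Date(17379, origin = "1970-01-01")
```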

scale_x_date will indeed throw an error if you try to pass a sec.axis argument, even if it's dup_axis. To work around this, you convert your dates/times to a number, and then fool your scales using labels. While this requires a bit of fiddling, it's not too complicated.

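library(ggplot2)
library(lubridate)
library(dplyr)

# Eleven consecutive days, plus the same dates as plain numbers
# (tibble() replaces the deprecated data_frame())
df <- tibble(tm = ymd("2017-08-01") + 0:10,
             y = cumsum(rnorm(length(tm)))) %>% 
  mutate(tm_num = as.numeric(tm)) 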

df

# A tibble: 11 x 3
           tm          y tm_num
       <date>      <dbl>  <dbl>
 1 2017-08-01 -2.0948146  17379
 2 2017-08-02 -2.6020691  17380
 3 2017-08-03 -3.8940781  17381
 4 2017-08-04 -2.7807154  17382
 5 2017-08-05 -2.9451685  17383
 6 2017-08-06 -3.3355426  17384
 7 2017-08-07 -1.9664428  17385
 8 2017-08-08 -0.8501699  17386
 9 2017-08-09 -1.7481911  17387
10 2017-08-10 -1.3203246  17388
11 2017-08-11 -2.5487692  17389

I just made a simple vector of 11 days (0 to 10) added to "2017-08-01". If you run as.numeric on that, you get the number of days since the beginning of the Unix epoch (see ?lubridate::as_date).

df %>% 
  ggplot(aes(tm_num, y)) + geom_line() +
  scale_x_continuous(sec.axis = dup_axis(),
                     breaks = function(limits) {
                       seq(floor(limits[1]), ceiling(limits[2]), 
                           by = as.numeric(as_date(days(2))))
                       },
                     labels = function(breaks) {as_date(breaks)})

When you plot tm_num against y, it's treated just like normal numbers, and you can use scale_x_continuous(sec.axis = dup_axis(), ...). Then you have to figure out how many breaks you want and how to label them.

The breaks = argument is a function that takes the limits of the data and calculates nice-looking breaks. First you round the limits to make sure you get integers (dates don't work well with non-integers). Then you generate a sequence with your desired step (the days(2)). You could use weeks(1) or months(3) or whatever; check out ?lubridate::days. Under the hood, days(x) generates a number of seconds (86400 per day, 604800 per week, etc.), as_date converts that into a number of days since the Unix epoch, and as.numeric converts it back to an integer.

The labels = argument is a function that takes the sequence of integers we just generated and converts those back to displayable dates.
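To see what those two functions do outside of a plot, here is a base R sketch. The helper names day_breaks and day_labels are hypothetical, the step is given as a plain number of days rather than through days(2), and the limits are made-up values of the kind ggplot2 passes in after expanding the data range.

```r
# Hypothetical helpers, named here for illustration only
day_breaks <- function(limits, step = 2) {
  # Round outward to whole days, then step through the range
  seq(floor(limits[1]), ceiling(limits[2]), by = step)
}

day_labels <- function(breaks) {
  # Turn day counts back into Date objects for display
  as.Date(breaks, origin = "1970-01-01")
}

b <- day_breaks(c(17378.5, 17389.5))
b                      # 17378 17380 17382 17384 17386 17388 17390
format(day_labels(b))  # first label is "2017-07-31"
```

Both helpers can be passed as breaks = and labels = to scale_x_continuous() in the same way as the anonymous functions in the plot code.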

This also works with times instead of dates. While dates are integer days, times are integer seconds (either since the Unix epoch, for datetimes, or since midnight, for times).

Let's say you had some observations that were on the scale of minutes, not days.

The code would be similar, with a few tweaks:

# One observation per minute, straddling midnight
# (tibble() replaces the deprecated data_frame())
df <- tibble(tm = ymd_hms("2017-08-01 23:58:00") + 60*0:10,
             y = cumsum(rnorm(length(tm)))) %>% 
  mutate(tm_num = as.numeric(tm)) 

df

# A tibble: 11 x 3
                    tm        y     tm_num
                <dttm>    <dbl>      <dbl>
 1 2017-08-01 23:58:00 1.375275 1501631880
 2 2017-08-01 23:59:00 2.373565 1501631940
 3 2017-08-02 00:00:00 3.650167 1501632000
 4 2017-08-02 00:01:00 2.578420 1501632060
 5 2017-08-02 00:02:00 5.155688 1501632120
 6 2017-08-02 00:03:00 4.022228 1501632180
 7 2017-08-02 00:04:00 4.776145 1501632240
 8 2017-08-02 00:05:00 4.917420 1501632300
 9 2017-08-02 00:06:00 4.513710 1501632360
10 2017-08-02 00:07:00 4.134294 1501632420
11 2017-08-02 00:08:00 3.142898 1501632480

df %>% 
  ggplot(aes(tm_num, y)) + geom_line() +
  scale_x_continuous(sec.axis = dup_axis(),
                     breaks = function(limits) {
                       seq(floor(limits[1] / 60) * 60, ceiling(limits[2] / 60) * 60, 
                           by = as.numeric(as_datetime(minutes(2))))
                       },
                     labels = function(breaks) {
                       stamp("Jan 1,\n0:00:00", orders = "md hms")(as_datetime(breaks))
                       })

Here I updated the dummy data to span 11 minutes from just before midnight to just after midnight. In breaks = I modified it to make sure I got an integer number of minutes to create breaks on, changed as_date to as_datetime, and used minutes(2) to make a break every two minutes. In labels = I added a functional stamp(...)(...), which creates a nice format to display.
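The minute-level rounding can be checked in isolation with base R. This is a sketch only: minute_breaks and minute_labels are hypothetical names, the step is written directly as 120 seconds, and the labels use a plain format() call in UTC instead of stamp().

```r
# Hypothetical helpers, named here for illustration only
minute_breaks <- function(limits, step_sec = 120) {
  # Snap both limits outward to whole minutes, then step every 2 minutes
  seq(floor(limits[1] / 60) * 60, ceiling(limits[2] / 60) * 60, by = step_sec)
}

minute_labels <- function(breaks) {
  # Seconds since the Unix epoch back to a clock time
  format(as.POSIXct(breaks, origin = "1970-01-01", tz = "UTC"), "%H:%M")
}

b <- minute_breaks(c(1501631880, 1501632480))
minute_labels(b)
# "23:58" "00:00" "00:02" "00:04" "00:06" "00:08"
```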

Finally, just times.

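# One observation per millisecond, stored as a lubridate Period
# (tibble() replaces the deprecated data_frame())
df <- tibble(tm = milliseconds(1234567 + 0:10),
             y = cumsum(rnorm(length(tm)))) %>% 
  mutate(tm_num = as.numeric(tm)) 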

df

# A tibble: 11 x 3
             tm          y   tm_num
   <S4: Period>      <dbl>    <dbl>
 1    1234.567S  0.2136745 1234.567
 2    1234.568S -0.6376908 1234.568
 3    1234.569S -1.1080997 1234.569
 4     1234.57S -0.4219645 1234.570
 5    1234.571S -2.7579118 1234.571
 6    1234.572S -1.6626674 1234.572
 7    1234.573S -3.2298175 1234.573
 8    1234.574S -3.2078864 1234.574
 9    1234.575S -3.3982454 1234.575
10    1234.576S -2.1051759 1234.576
11    1234.577S -1.9163266 1234.577

df %>% 
  ggplot(aes(tm_num, y)) + geom_line() +
  scale_x_continuous(sec.axis = dup_axis(),
                     breaks = function(limits) {
                       seq(limits[1], limits[2], 
                           by = as.numeric(milliseconds(3)))
                       },
                     labels = function(breaks) {format((as_datetime(breaks)),
                                                       format = "%H:%M:%OS3")})

Here we've got an observation every millisecond, 11 in total, starting at t = 20min34.567sec. So in breaks = we dispense with any rounding, since we don't want integers now. Then we use breaks every milliseconds(3). Then labels = needs a format that accepts decimal seconds: the "%OS3" means 3 decimal digits for the seconds place (it can accept up to 6, see ?strptime).
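A small base R sketch of the "%OS3" formatting on its own, with UTC assumed. Note that ?strptime describes %OSn output as truncated rather than rounded, so the last printed digit can be one lower than expected when the value is not exactly representable as a double.

```r
# Four timestamps one millisecond apart, as seconds since midnight
t_num <- 1234.567 + 0:3 / 1000

# Treat them as seconds since the epoch and print with millisecond precision
lab <- format(as.POSIXct(t_num, origin = "1970-01-01", tz = "UTC"),
              "%H:%M:%OS3")
lab
```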

Is all of this worth it? Probably not, unless you really really want a duplicated time axis. I'll probably post this as an issue on the ggplot2 GitHub, because dup_axis should "just work" with datetimes.
