In ggplot restrict y to be >0 in LOESS

Question

Here is my code:

#data
sites <- 
  structure(list(site = c(928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L), 
                 date = c(13493L, 13534L, 13566L, 13611L, 13723L,
                          13752L, 13804L, 13837L, 13927L, 14028L,
                          14082L, 14122L, 14150L, 14182L, 14199L,
                          16198L, 16279L, 16607L, 16945L, 17545L,
                          17650L, 17743L, 17868L, 17941L, 18017L, 18092L),
                 y = c(7L, 7L, 17L, 18L, 17L, 17L, 10L, 3L, 17L, 24L, 
                       11L, 5L, 5L, 3L, 5L, 14L, 2L, 9L, 9L, 4L, 7L,
                       6L, 1L, 0L, 5L, 0L)), 
            .Names = c("site", "date", "y"),
            class = "data.frame", row.names = c(NA, -26L))

#convert to date
x<-as.Date(sites$date, origin="1960-01-01") 

#plot smooth, line goes below zero!
qplot(data=sites, x, y, main="Site 349") 
(p <- qplot(data = sites, x, y, xlab = "", ylab = ""))
(p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5))

Some of the LOESS lines and confidence intervals go below zero. I would like to restrict the graphics to 0 and positive numbers, because negative values do not make sense.

How can I do this?

Recommended answer

I second Matt Parker's suggestion that you need to change the fitting procedure. One option that often works for positive-only data is to do the fit on the log scale and then exponentiate to get results on the original scale. This guarantees that only positive values are produced.

Generating random data that has some of these issues:

 d <- data.frame(x=0:100)
 d$y <- exp(rnorm(nrow(d), mean=-d$x/40, sd=0.8))
 qplot(x,y,data=d) + stat_smooth() 

Now we can use ggplot's transformation capabilities to log-transform the y-values, but display the results on an exponential scale (which corresponds to the original one):

qplot(x, y, data=d) + stat_smooth() + scale_y_log10() + coord_trans(y = scales::exp_trans(10))

You can see examples like this on the coord_trans help page. If you don't like the y-axis, you can manipulate the breaks and labels.
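The same log-then-exponentiate idea can be sketched without ggplot2, calling base R's `loess()` directly. This is a minimal illustration, not a reproduction of what `stat_smooth()` does internally: it assumes strictly positive y (as in the simulated `d` above, regenerated here with an arbitrary seed) and uses a rough normal-quantile band of ±1.96 standard errors.

```r
# Minimal sketch: loess on log(y), then back-transform with exp().
# Assumption: y > 0 everywhere (log(0) would be -Inf).
set.seed(42)  # arbitrary seed, only for reproducibility
d <- data.frame(x = 0:100)
d$y <- exp(rnorm(nrow(d), mean = -d$x / 40, sd = 0.8))

# Fit the smooth on the log scale, where values are unbounded
fit_log <- loess(log(y) ~ x, data = d)
pred <- predict(fit_log, newdata = data.frame(x = d$x), se = TRUE)

# Back-transform the curve and an approximate 95% band; exp() is always > 0
d$smooth <- exp(pred$fit)
d$lower  <- exp(pred$fit - 1.96 * pred$se.fit)
d$upper  <- exp(pred$fit + 1.96 * pred$se.fit)

summary(d[, c("smooth", "lower", "upper")])
```

Because the band is built on the log scale and then exponentiated, it is asymmetric on the original scale, and exp() of the smoothed log values tracks something closer to a median of y than a mean.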

Edit in response to the question update

There have been some changes in ggplot2 since the question was originally asked, and the original answer did not handle zeros.

Option 1

The main idea of the solution is the same: find a transformation that maps the range of possible values to (-Inf, Inf), do the loess smooth there, and then back-transform the result. The log transformation would be ideal if there were no zeros. I don't think a suitable function exists once 0 is included, but a transformation that often works in practice is log(1+x). It is built into the scales package as log1p_trans(), but we also need the inverse transformation, exp(x)-1.

library(scales)
#create exp(x)-1 transformation, the inverse of log(1+x)
expm1_trans <- function() trans_new("expm1", "expm1", "log1p")

qplot(x, y, data=sites) + stat_smooth(method="loess") +
  scale_y_continuous(trans=log1p_trans()) +
  coord_trans(ytrans=expm1_trans())
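A base-R sketch of Option 1 on the question's own data, useful for checking what the transformation pair does. Assumptions: the raw `date` day counts are used directly as a numeric x (skipping the Date conversion), and the default loess span is kept, as in the `stat_smooth(method="loess")` call. It also exposes a caveat: `expm1()` maps any real number into (-1, Inf), so if the smooth dips below 0 on the log1p scale near the zero counts, the back-transformed value lands between -1 and 0. The guarantee is a lower bound of -1, not 0.

```r
# Question's data: day counts (x) and counts (y, including zeros)
days <- c(13493, 13534, 13566, 13611, 13723, 13752, 13804, 13837, 13927,
          14028, 14082, 14122, 14150, 14182, 14199, 16198, 16279, 16607,
          16945, 17545, 17650, 17743, 17868, 17941, 18017, 18092)
y <- c(7, 7, 17, 18, 17, 17, 10, 3, 17, 24, 11, 5, 5, 3, 5, 14, 2, 9, 9,
       4, 7, 6, 1, 0, 5, 0)

# log1p(0) = 0, so zeros stay finite; expm1() undoes log1p()
stopifnot(isTRUE(all.equal(expm1(log1p(y)), y)))

# Smooth on the log1p scale, then back-transform to the count scale
fit <- loess(log1p(y) ~ days)
smooth_counts <- expm1(predict(fit))

range(smooth_counts)
```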

Option 2

The second option extends the suggestion in the comments on Matt Parker's answer: use a regression method that respects the integer nature of the outcomes. For counts, that means Poisson regression, allowing for overdispersion just in case (quasi-Poisson). You can't do loess this way, but you can fit a spline. Adjust the degrees of freedom (the second argument to ns()) to control the smoothness.

library(splines)
qplot(x, y, data=sites) + stat_smooth(method="glm", formula = y ~ ns(x, 3),
                                      method.args = list(family = "quasipoisson"))
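The quasi-Poisson spline fit can also be run directly with `glm()`, which makes explicit why it stays positive: with the log link, predictions and standard errors are computed on the log scale and then exponentiated. A sketch on the question's data, assuming raw day counts as x and a rough ±2 standard-error band (an illustrative choice, not necessarily the interval `stat_smooth()` draws):

```r
library(splines)  # ns(): natural cubic spline basis, shipped with R

dat <- data.frame(
  days = c(13493, 13534, 13566, 13611, 13723, 13752, 13804, 13837, 13927,
           14028, 14082, 14122, 14150, 14182, 14199, 16198, 16279, 16607,
           16945, 17545, 17650, 17743, 17868, 17941, 18017, 18092),
  y = c(7, 7, 17, 18, 17, 17, 10, 3, 17, 24, 11, 5, 5, 3, 5, 14, 2, 9, 9,
        4, 7, 6, 1, 0, 5, 0)
)

# Overdispersed Poisson regression on a natural spline with 3 df
fit <- glm(y ~ ns(days, 3), family = quasipoisson, data = dat)

# Predict on the link (log) scale over a grid of days
grid <- data.frame(days = seq(min(dat$days), max(dat$days), length.out = 200))
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Exponentiate the fit and a rough +/- 2 SE band: all values are > 0
grid$fit   <- exp(pred$fit)
grid$lower <- exp(pred$fit - 2 * pred$se.fit)
grid$upper <- exp(pred$fit + 2 * pred$se.fit)

head(grid)
```

Raising the df argument of ns() (here 3) makes the curve more flexible; lowering it makes it smoother.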

The two options give quite similar results, which is a good thing.
