How to deal with zero in log plot

Question

I have data that I would like to plot in a line graph with a log scale on the y-axis using ggplot2. Unfortunately, some of my values go all the way down to zero. The data represents the relative occurrence of a feature as a function of some parameters. A value of zero occurs when that feature is not observed in a sample, which means that it occurs very seldom, or indeed never. These zero values cause a problem in the log plot.

The following code illustrates the problem on a simplified data set. In reality the data set consists of more points, so the curve looks smoother, and there are also more values of the parameter p.

library(ggplot2)

dat <- data.frame(x=rep(c(0, 1, 2, 3), 2),
                  y=c(1e0, 1e-1, 1e-4, 0,
                      1e-1, 1e-3, 0, 0),
                  p=c(rep('a', 4), rep('b', 4)))
qplot(data=dat, x=x, y=y, colour=p, log="y", geom=c("line", "point"))

Given the data above, we would expect two lines: the first should have three finite points on a log plot, the second only two.

However, as you can see, this produces a very misleading plot. It looks like the blue and red lines are both converging to a value between 1e-4 and 1e-3. The reason is that log(0) gives -Inf, which ggplot just puts on the lower axis.
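The -Inf can be seen directly in base R, without any plotting. A minimal sketch using the y values of series 'a' from the example above (the variable names are only for illustration):

```r
# y values of series 'a' from the example data
y <- c(1e0, 1e-1, 1e-4, 0)

# A log10 axis applies this transformation to the data
log_y <- log10(y)

# Only the non-zero values give a finite result; log10(0) is -Inf
finite <- is.finite(log_y)

print(log_y)
print(finite)
```

Only three of the four values are finite after the transformation, which is why three real points are expected for this series.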

What's the best way to deal with this in R with ggplot2? By best I mean efficient and idiomatic R (I'm fairly new to R).

The plot should indicate that these curves go down to "very small" after x=2 (red), or x=1 (blue), respectively. Ideally, with a vertical line downwards from the last finite point. What I mean by that is demonstrated in the following.

Here I'll describe what I've come up with. However, given that I'm fairly new to R, I suspect that there might be a much better way.

library(ggplot2)
library(scales)

dat <- data.frame(x=rep(c(0, 1, 2, 3), 2),
                  y=c(1e0, 1e-1, 1e-4, 0,
                      1e-1, 1e-3, 0, 0),
                  p=c(rep('a', 4), rep('b', 4)))

Same data as above.

Now I go through each unique value of the parameter p, find the x coordinate of the last finite point, and assign it to the x coordinates of all points where y is zero. This produces a vertical line.

for (p in unique(dat$p)) {
    dat$x[dat$p == p & dat$y == 0] <- dat$x[head(which(dat$p == p & dat$y == 0), 1) - 1]
}
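The same reassignment can probably be written without an explicit loop. The following is only a sketch of one base R alternative using ave(); it assumes that x increases within each group, so that the largest x with a non-zero y is also the last finite point, and that every group has at least one non-zero y. The name last_x is introduced here for illustration.

```r
dat <- data.frame(x = rep(c(0, 1, 2, 3), 2),
                  y = c(1e0, 1e-1, 1e-4, 0,
                        1e-1, 1e-3, 0, 0),
                  p = c(rep('a', 4), rep('b', 4)))

# Per group p: the largest x among rows with non-zero y
# (assumption: x is increasing within each group)
last_x <- ave(ifelse(dat$y > 0, dat$x, NA), dat$p,
              FUN = function(v) max(v, na.rm = TRUE))

# Move every zero-valued point to the x of the last finite point
dat$x <- ifelse(dat$y == 0, last_x, dat$x)

print(dat)
```

For the example data this gives the same x column as the loop: 0, 1, 2, 2 for 'a' and 0, 1, 1, 1 for 'b'.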

At this point, the plot looks as follows.

The vertical lines are there. However, there are also points. These are misleading as they indicate that there was an actual data point there, which is not true.

To remove the points, I duplicate the y data (which seems wasteful) into a new column, yp, and replace the zeros with NA. Then I use yp as the y aesthetic for geom_point.

dat$yp <- dat$y
dat$yp[dat$y == 0] <- NA

ggplot(dat, aes(x=x, y=y, colour=p)) +
    geom_line() +
    geom_point(aes(y=yp)) +
    scale_y_continuous(trans=log10_trans(),
                       breaks = trans_breaks("log10", function(x) 10^x),
                       labels = trans_format("log10", math_format(10^.x)))

Here I've used ggplot instead of qplot so that I can give different aesthetics to geom_line and geom_point.
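One way to avoid duplicating the y column might be to give geom_point its own data argument containing only the non-zero rows. This is a sketch, not a tested recommendation; it assumes ggplot2 is installed, and the names pts and plt are introduced here for illustration. It only addresses the points, so the x adjustment for the vertical lines would still have to be applied beforehand.

```r
library(ggplot2)

dat <- data.frame(x = rep(c(0, 1, 2, 3), 2),
                  y = c(1e0, 1e-1, 1e-4, 0,
                        1e-1, 1e-3, 0, 0),
                  p = c(rep('a', 4), rep('b', 4)))

# Rows that correspond to actual observations (non-zero y)
pts <- subset(dat, y > 0)

# Lines use the full data; points use only the non-zero rows
plt <- ggplot(dat, aes(x = x, y = y, colour = p)) +
    geom_line() +
    geom_point(data = pts) +
    scale_y_log10()

print(plt)
```

Because geom_point inherits the aesthetics from ggplot(), only the data argument has to change.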

Finally, the plot looks like this.

What is the right way to do this?

Recommended answer

If you're using ggplot, you can use scales::pseudo_log_trans() as your transformation object. Unlike a plain log transformation, it is defined at zero (0 maps to 0), so zero values no longer turn into -Inf.

From the documentation ( https://scales.r-lib.org/reference/pseudo_log_trans.html ):

A transformation mapping numbers to a signed logarithmic scale with a smooth transition to linear scale around 0.

pseudo_log_trans(sigma = 1, base = exp(1))
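To see what this transformation does to the example values, it can be written out in base R. As far as I understand the scales implementation, the forward transform is asinh(x / (2 * sigma)) / log(base); the helper name pseudo_log below is hypothetical and only mirrors that formula.

```r
# Hypothetical helper mirroring the pseudo-log formula (assumption:
# scales uses asinh(x / (2 * sigma)) / log(base) as the forward transform)
pseudo_log <- function(x, sigma = 1, base = exp(1)) {
    asinh(x / (2 * sigma)) / log(base)
}

y <- c(1e0, 1e-1, 1e-4, 0)

# With the default sigma = 1, all values below 1 sit in the linear region
default_scale <- pseudo_log(y)

# A sigma below the smallest non-zero value keeps the data log-like
small_sigma <- pseudo_log(y, sigma = 1e-5, base = 10)

print(default_scale)
print(small_sigma)
```

Zero maps to zero in both cases, but only the small sigma spreads the values 1, 1e-1 and 1e-4 out roughly the way a log10 axis would.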

For example, my scale expression looks like this:

+ scale_fill_gradient(name = "n occurrences", trans="pseudo_log")

Unconfirmed, but you probably need to include the scales library:

require("scales")
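Applied to the y-axis of the data in the question, this could look roughly as follows. This is a sketch under assumptions: sigma = 1e-5 and the explicit breaks are my own choices for this data set (sigma should be smaller than the smallest non-zero value), not something given in the answer, and the names tr and plt are illustrative.

```r
library(ggplot2)
library(scales)

dat <- data.frame(x = rep(c(0, 1, 2, 3), 2),
                  y = c(1e0, 1e-1, 1e-4, 0,
                        1e-1, 1e-3, 0, 0),
                  p = c(rep('a', 4), rep('b', 4)))

# Pseudo-log transformation; sigma chosen below the smallest non-zero y
tr <- pseudo_log_trans(sigma = 1e-5, base = 10)

plt <- ggplot(dat, aes(x = x, y = y, colour = p)) +
    geom_line() +
    geom_point() +
    scale_y_continuous(trans = tr,
                       breaks = c(0, 1e-4, 1e-3, 1e-2, 1e-1, 1))

print(plt)
```

With this scale the zeros are drawn at a real position labelled 0 below the 1e-4 tick, instead of being placed on the lower axis.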
