How to deal with zero in log plot
Question
I have data that I would like to plot as a line graph with a log scale on the y-axis using ggplot2. Unfortunately, some of my values go all the way down to zero. The data represents relative occurrences of a feature as a function of some parameters. The value zero occurs when that feature is not observed in a sample, which means that it occurs very seldom, or indeed never. These zero values cause a problem in the log plot.
The following code illustrates the problem on a simplified data set. In reality the data set consists of more points, so the curve looks smoother, and there are also more values for the parameter p.
library(ggplot2)
dat <- data.frame(x=rep(c(0, 1, 2, 3), 2),
                  y=c(1e0, 1e-1, 1e-4, 0,
                      1e-1, 1e-3, 0, 0),
                  p=c(rep('a', 4), rep('b', 4)))
qplot(data=dat, x=x, y=y, colour=p, log="y", geom=c("line", "point"))
Given the data above, we would expect two lines: the first should have three finite points on a log plot, the second only two.
However, as you can see, this produces a very misleading plot. It looks like the blue and red lines both converge to a value between 1e-4 and 1e-3. The reason is that log(0) gives -Inf, which ggplot just puts on the lower axis.
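The effect can be checked in plain R without any plotting. This is a minimal sketch using the y values of series 'a' from the data above; the variable names are only illustrative.

```r
# y values of series 'a' from the example data
y <- c(1e0, 1e-1, 1e-4, 0)

# log10() of zero is -Inf, so the last element is not finite
log_y <- log10(y)
print(log_y)

# is.finite() flags which points can actually be placed on a log axis
finite <- is.finite(log_y)
print(finite)
```

Only the points flagged TRUE have a real position on a log axis; the remaining one is what ends up drawn at the lower edge of the panel.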
What's the best way to deal with this in R with ggplot2? By best I mean in terms of efficiency, and being idiomatic R (I'm fairly new to R).
The plot should indicate that these curves go down to "very small" after x=2 (red) or x=1 (blue), respectively, ideally with a vertical line downwards from the last finite point. What I mean by that is demonstrated in the following.
Here I'll describe what I've come up with. However, given that I'm fairly new to R, I suspect that there might be a much better way.
library(ggplot2)
library(scales)
dat <- data.frame(x=rep(c(0, 1, 2, 3), 2),
                  y=c(1e0, 1e-1, 1e-4, 0,
                      1e-1, 1e-3, 0, 0),
                  p=c(rep('a', 4), rep('b', 4)))
Same data as above.
Now I go through each unique parameter p, find the x coordinate of the last finite point, and assign it to the x coordinates of all points where y is zero. This produces a vertical line.
for (p in unique(dat$p)) {
  # The row just before the first zero of this series holds the last finite point
  dat$x[dat$p == p & dat$y == 0] <- dat$x[head(which(dat$p == p & dat$y == 0), 1) - 1]
}
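The loop above relies on the rows being sorted by x within each p and on no series starting with a zero. A loop-free variant in base R might look like the sketch below. It assumes that "last finite point" can be read as the largest x with a non-zero y in each group, and it writes the result to a new column, x_drop (a hypothetical name), rather than overwriting x.

```r
dat <- data.frame(x = rep(c(0, 1, 2, 3), 2),
                  y = c(1e0, 1e-1, 1e-4, 0,
                        1e-1, 1e-3, 0, 0),
                  p = c(rep("a", 4), rep("b", 4)))

# Per group p: the largest x whose y is non-zero.
# Rows with y == 0 contribute -Inf so they never win the max.
last_x <- ave(ifelse(dat$y > 0, dat$x, -Inf), dat$p, FUN = max)

# Move only the zero rows onto that x; finite rows keep their own x
dat$x_drop <- ifelse(dat$y == 0, last_x, dat$x)
print(dat)
```

If zeros can appear between finite values, this reading differs from the loop, which uses the row directly before the first zero.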
At this point, the plot looks as follows.
The vertical lines are there. However, there are also points. These are misleading, as they suggest that there was an actual data point there, which is not true.
To remove the points I duplicate the y data (which seems wasteful) into a new column, yp, and replace zero by NA. Then I use yp as the y aesthetic for geom_point.
dat$yp <- dat$y
dat$yp[dat$y == 0] <- NA  # zeros become NA so that geom_point skips them
ggplot(dat, aes(x=x, y=y, colour=p)) +
  geom_line() +
  geom_point(aes(y=yp)) +  # refer to the column by name, not dat$yp
  scale_y_continuous(trans=log10_trans(),
                     breaks = trans_breaks("log10", function(x) 10^x),
                     labels = trans_format("log10", math_format(10^.x)))
Here I've used ggplot instead of qplot so that I can give different aesthetics to geom_line and geom_point.
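One way to avoid the extra column might be to give geom_point its own data argument holding only the non-zero rows. This is a sketch, not necessarily the canonical approach; pts and plt are illustrative names, and scale_y_log10() is used as a shorter stand-in for the log10 scale set up above.

```r
library(ggplot2)

dat <- data.frame(x = rep(c(0, 1, 2, 3), 2),
                  y = c(1e0, 1e-1, 1e-4, 0,
                        1e-1, 1e-3, 0, 0),
                  p = c(rep("a", 4), rep("b", 4)))

# Only rows with an observed (non-zero) value get a point marker
pts <- subset(dat, y > 0)

plt <- ggplot(dat, aes(x = x, y = y, colour = p)) +
  geom_line() +              # lines still use every row, including zeros
  geom_point(data = pts) +   # points inherit the aesthetics, but use pts
  scale_y_log10()
print(plt)
```

The line layer still sees the zeros, so ggplot2 may warn that the transformation introduced infinite values.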
Finally, the plot looks like this.
What is the proper way to do this?
Recommended answer
If you're using ggplot, you can use scales::pseudo_log_trans() as your transformation object. With it, a value of zero is transformed to 0 instead of -Inf.
From the documentation ( https://scales.r-lib.org/reference/pseudo_log_trans.html ):
A transformation mapping numbers to a signed logarithmic scale with a smooth transition to linear scale around 0.
pseudo_log_trans(sigma = 1, base = exp(1))
For example, my scale expression looks like this:
+ scale_fill_gradient(name = "n occurrences", trans="pseudo_log")
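Applied to the y axis of the question's data, this could look roughly like the sketch below. The value sigma = 1e-5 and the explicit breaks are assumptions chosen for this data set: with the default sigma = 1, values far below 1 would fall in the near-linear region and be squeezed together. The trans argument matches the usage in this thread; newer ggplot2 releases also accept transform for the same purpose.

```r
library(ggplot2)
library(scales)

dat <- data.frame(x = rep(c(0, 1, 2, 3), 2),
                  y = c(1e0, 1e-1, 1e-4, 0,
                        1e-1, 1e-3, 0, 0),
                  p = c(rep("a", 4), rep("b", 4)))

# Assumption: sigma is set below the smallest non-zero value (1e-4),
# so the observed range is still spread out roughly logarithmically.
pl <- pseudo_log_trans(sigma = 1e-5, base = 10)

# Zero maps to a finite position, unlike log10(0)
zero_pos <- pl$transform(0)
print(zero_pos)

plt <- ggplot(dat, aes(x = x, y = y, colour = p)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(trans = pl,
                     breaks = c(0, 1e-4, 1e-3, 1e-2, 1e-1, 1))
print(plt)
```

Zero now has a real position on the axis, so the curves drop to a labelled 0 instead of to the panel edge. The choice of sigma controls how far that 0 sits below the smallest non-zero value.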
Unconfirmed, but you probably need to load the scales library:
require("scales")