Plotting cumulative counts in ggplot2
There are some posts about plotting cumulative densities in ggplot. I'm currently using the accepted answer to "Easier way to plot the cumulative frequency distribution in ggplot?" for plotting my cumulative counts. But this solution requires pre-calculating the values.
Here I'm looking for a pure ggplot solution. Let's show what I have so far:
x <- data.frame(A=replicate(200,sample(c("a","b","c"),1)),X=rnorm(200))
ggplot's stat_ecdf
I can use ggplot's stat_ecdf, but it only plots cumulative densities:
ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y..),stat="ecdf")
I'd like to do something like the following, but it doesn't work:
ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y.. * ..count..),stat="ecdf")
cumsum and stat_bin
I found an idea about using cumsum and stat_bin:
ggplot(x,aes(x=X,color=A)) + stat_bin(aes(y=cumsum(..count..)),geom="step")
But as you can see, the next color doesn't start at y=0, but where the last color ended.
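The behaviour described above is consistent with cumsum being applied once to the per-bin counts of all groups stacked together, rather than separately per group. A minimal base R sketch of that effect, using made-up counts (the vectors counts and group are hypothetical, not taken from the plot):

```r
# Hypothetical per-bin counts for two groups, stacked one after the other
counts <- c(3, 5, 2,   4, 1, 6)
group  <- c(1, 1, 1,   2, 2, 2)

# A plain cumsum keeps running across the group boundary
running_all <- cumsum(counts)
print(running_all)
# 3 8 10 14 15 21

# ave() applies cumsum within each group, so the total restarts
running_by_group <- ave(counts, group, FUN = cumsum)
print(running_by_group)
# 3 8 10 4 5 11
```

Inside stat_bin the analogous expression would presumably be ave(..count.., ..group.., FUN = cumsum); that is an assumption and is not tested here.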
What I ask for
What I'd like to have, from best to worst:
- Ideally, a simple fix for the non-working call:
  ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y.. * ..count..),stat="ecdf")
- A more complicated way to use stat_ecdf with counts.
- As a last resort, the cumsum approach, since it gives worse (binned) results.
This does not directly solve the problem with the grouping of lines, but it is a workaround.
You can add three calls to stat_bin(), each with your data subset to one level of A.
ggplot(x,aes(x=X,color=A)) +
stat_bin(data=subset(x,A=="a"),aes(y=cumsum(..count..)),geom="step")+
stat_bin(data=subset(x,A=="b"),aes(y=cumsum(..count..)),geom="step")+
stat_bin(data=subset(x,A=="c"),aes(y=cumsum(..count..)),geom="step")
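The three stat_bin() calls above can also be generated in a loop, so the code does not have to change when A has a different number of levels. This is only a sketch: it assumes ggplot2 3.3.0 or later for after_stat() (older versions would use ..count.. as above), and the names layers and p are introduced here for illustration.

```r
library(ggplot2)

set.seed(123)
x <- data.frame(A = replicate(200, sample(c("a", "b", "c"), 1)), X = rnorm(200))

# One stat_bin() layer per level of A. Each layer only sees the rows of a
# single level, so cumsum() starts from zero in every layer.
layers <- lapply(split(x, x$A), function(d) {
  stat_bin(data = d, aes(y = cumsum(after_stat(count))), geom = "step", bins = 30)
})

# Adding a list of layers to a plot adds each layer in turn
p <- ggplot(x, aes(x = X, color = A)) + unname(layers)
```

Printing p draws the plot. The result is still binned, so it has the same limitation as the manual version.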
UPDATE - solution using geom_step()
Another possibility is to multiply the values of ..y.. by the number of observations in each level. So far, the only way I have found to get these numbers is to precalculate them before plotting and add them to the original data frame. I named this column len. Then, inside aes() in geom_step(), declare the variable with len=len and define the y values as y=..y.. * len.
set.seed(123)
x <- data.frame(A=replicate(200,sample(c("a","b","c"),1)),X=rnorm(200))
library(plyr)
df <- ddply(x,.(A),transform,len=length(X))
ggplot(df,aes(x=X,color=A)) + geom_step(aes(len=len,y=..y.. * len),stat="ecdf")
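The reasoning behind y=..y.. * len can be checked without plotting: an ECDF value is the fraction of observations less than or equal to a point, so multiplying it by the group size gives the cumulative count. A minimal base R sketch follows; the names n, scaled, counts and df_base are introduced here for illustration, and ave() is used as a possible base R substitute for the ddply() call.

```r
set.seed(123)
x <- data.frame(A = replicate(200, sample(c("a", "b", "c"), 1)), X = rnorm(200))

# ECDF of one group, evaluated at its own observations and scaled by group size
xa <- x$X[x$A == "a"]
n <- length(xa)
scaled <- ecdf(xa)(xa) * n

# Direct cumulative count: number of observations <= each value
counts <- as.numeric(sapply(xa, function(v) sum(xa <= v)))

# Equal up to floating point error
print(isTRUE(all.equal(scaled, counts)))

# Per-group sizes added as a column without plyr
df_base <- x
df_base$len <- ave(df_base$X, df_base$A, FUN = length)
```

df_base should carry the same len column as df above, although the row order may differ because ddply() returns rows grouped by A.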