Plotting cumulative counts in ggplot2


Question

There are some posts about plotting cumulative densities in ggplot. I'm currently using the accepted answer from "Easier way to plot the cumulative frequency distribution in ggplot?" to plot my cumulative counts, but that solution requires pre-calculating the values before plotting.

Here I'm looking for a pure ggplot solution. This is what I have so far:

x <- data.frame(A=replicate(200,sample(c("a","b","c"),1)),X=rnorm(200))

ggplot's stat_ecdf

I can use ggplot's stat_ecdf, but it only plots cumulative densities (proportions between 0 and 1), not counts:

ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y..),stat="ecdf")
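One way to check what stat_ecdf() actually computes is to inspect the built layer with ggplot2's layer_data(). The sketch below is illustrative and not from the original post: it assumes ggplot2 is installed and adds set.seed() only to make the example reproducible. It shows that the highest y value in every color group is 1, i.e. the ECDF is scaled to proportions rather than counts.

```r
library(ggplot2)

set.seed(123)  # assumption: fixed seed so the example is reproducible
x <- data.frame(A = replicate(200, sample(c("a", "b", "c"), 1)), X = rnorm(200))

# stat_ecdf() draws one step line per color group
p <- ggplot(x, aes(x = X, color = A)) + stat_ecdf(geom = "step")

# layer_data() returns the layer's data after the stat has been computed
ld <- layer_data(p)

# Every group tops out at 1: the y values are proportions, not counts
print(tapply(ld$y, ld$group, max))
```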

I'd like to do something like the following, but it doesn't work:

ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y.. * ..count..),stat="ecdf")

cumsum and stat_bin

I found a suggestion to use cumsum with stat_bin:

ggplot(x,aes(x=X,color=A)) + stat_bin(aes(y=cumsum(..count..)),geom="step")

But, as you can see, the line for each subsequent color doesn't start at y=0; it starts where the previous color ended.
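The offset arises because cumsum() in that call runs over the binned counts of all color groups together, so each group continues from the previous group's total. A commonly used fix, not part of the original question, is to take the running sum within each group by applying base R's ave() to the computed count and group variables. A sketch, assuming ggplot2 3.3.0 or later (which provides after_stat()); bins = 30 is an arbitrary choice:

```r
library(ggplot2)

set.seed(123)  # assumption: fixed seed for reproducibility
x <- data.frame(A = replicate(200, sample(c("a", "b", "c"), 1)), X = rnorm(200))

# ave(count, group, FUN = cumsum) restarts the running total for each group
# instead of continuing from the previous group's total
p <- ggplot(x, aes(x = X, color = A)) +
  stat_bin(aes(y = after_stat(ave(count, group, FUN = cumsum))),
           geom = "step", bins = 30)

ld <- layer_data(p)

# The final cumulative count of each group equals its number of observations
print(tapply(ld$y, ld$group, max))
print(table(x$A))
```

This still bins the data, so it only addresses option 3 in the list below, but it removes the need for one layer per level.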

What I ask for

What I'd like, from best to worst:

  1. Ideally, a simple fix for this non-working call:

    ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y.. * ..count..),stat="ecdf")
    

  2. A more complicated way to use stat_ecdf with counts.

  3. Last resort would be to use the cumsum approach, since it gives worse (binned) results.

Solution

This does not directly solve the problem of grouping the lines, but it works as a workaround.

You can add three calls to stat_bin(), each subsetting the data to one level of A.

ggplot(x,aes(x=X,color=A)) +
  stat_bin(data=subset(x,A=="a"),aes(y=cumsum(..count..)),geom="step")+
  stat_bin(data=subset(x,A=="b"),aes(y=cumsum(..count..)),geom="step")+
  stat_bin(data=subset(x,A=="c"),aes(y=cumsum(..count..)),geom="step")
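If A has more than a handful of levels, the hard-coded calls can be generated instead. The sketch below is an illustration, not the answerer's code: it builds one stat_bin() layer per level with split() and lapply() and adds the resulting list of layers to the plot, which ggplot2 accepts. It assumes ggplot2 3.3.0 or later for after_stat(); bins = 30 is arbitrary.

```r
library(ggplot2)

set.seed(123)  # assumption: fixed seed for reproducibility
x <- data.frame(A = replicate(200, sample(c("a", "b", "c"), 1)), X = rnorm(200))

# One cumulative-count layer per level of A; each layer only sees its own subset,
# so cumsum() never runs across levels
layers <- lapply(split(x, x$A), function(d) {
  stat_bin(data = d, aes(y = after_stat(cumsum(count))), geom = "step", bins = 30)
})

p <- ggplot(x, aes(x = X, color = A)) + layers

# Final height of each layer = number of observations in that level
print(sapply(seq_along(p$layers), function(i) max(layer_data(p, i)$y)))
```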

UPDATE - solution using geom_step()

Another possibility is to multiply the values of ..y.. by the number of observations in each level. The only way I found to get these counts at plotting time is to precalculate them and add them to the original data frame as a column, which I named len. Then, inside aes() in geom_step(), map len=len so that the column is passed through to the stat, and define the y values as y=..y.. * len.

set.seed(123)
x <- data.frame(A=replicate(200,sample(c("a","b","c"),1)),X=rnorm(200))
library(plyr)
df <- ddply(x,.(A),transform,len=length(X))
ggplot(df,aes(x=X,color=A)) + geom_step(aes(len=len,y=..y.. * len),stat="ecdf") 
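The same per-level counting can be done without plyr. A sketch, not from the original answer: after sorting by X, base R's ave() computes a running count within each level of A, and geom_step() draws it directly, so no stat output or ..y.. rescaling is involved. Like the update above, it precalculates values before plotting; the column name cum_n is a hypothetical choice for illustration.

```r
library(ggplot2)

set.seed(123)  # assumption: fixed seed for reproducibility
x <- data.frame(A = replicate(200, sample(c("a", "b", "c"), 1)), X = rnorm(200))

# Sort by X so that the running count within each level follows increasing X
x <- x[order(x$X), ]

# cum_n (hypothetical column name): 1, 2, 3, ... within each level of A
x$cum_n <- ave(x$X, x$A, FUN = seq_along)

# Step line of the cumulative count per level; print(p) draws it
p <- ggplot(x, aes(x = X, y = cum_n, color = A)) + geom_step()
```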
