Easier way to plot the cumulative frequency distribution in ggplot?


Question




I'm looking for an easier way to draw the cumulative distribution line in ggplot.

I have some data whose histogram I can immediately display with

qplot(mydata, binwidth = 1)

I found a way to do it at http://www.r-tutor.com/elementary-statistics/quantitative-data/cumulative-frequency-graph but it involves several steps, which is time-consuming when exploring data.

Is there a more straightforward way to do it in ggplot, similar to how trend lines and confidence intervals can be added by specifying options?

Solution

There is a built-in ecdf() function in R which should make things easier. Here's some sample code, utilizing plyr:

library(plyr)
library(ggplot2)
data(iris)

## ECDF over all species
# plyr's summarize() evaluates columns in order and overwrites a reused name
# immediately, so compute ecdf before Sepal.Length is reduced to unique values.
iris.all <- summarize(iris,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)),
                      Sepal.Length = unique(Sepal.Length))

ggplot(iris.all, aes(Sepal.Length, ecdf)) + geom_step()

## ECDF within species
iris.species <- ddply(iris, .(Species), summarize,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)),
                      Sepal.Length = unique(Sepal.Length))

ggplot(iris.species, aes(Sepal.Length, ecdf, color = Species)) + geom_step()
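
If your ggplot2 version provides stat_ecdf(), the precomputation can probably be skipped altogether. The following is only a minimal sketch under that assumption; it works on the raw iris data frame, and the names p_all and p_species are just illustrative.

```r
library(ggplot2)

# ECDF over all species: stat_ecdf() computes the step function from the raw column
p_all <- ggplot(iris, aes(Sepal.Length)) +
  stat_ecdf(geom = "step")

# ECDF within species: mapping colour to Species yields one ECDF per group
p_species <- ggplot(iris, aes(Sepal.Length, colour = Species)) +
  stat_ecdf(geom = "step")

# Printing a plot object (e.g. print(p_all)) draws it
```

This gives proportions on the y axis rather than counts, so it matches the ECDF plots rather than a cumulative frequency plot.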

Edit: I just realized that you want the cumulative frequency. You can get that by multiplying the ecdf value by the total number of observations:

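# ecdf comes first so that both ecdf() and length() still see the full column;
# plyr's summarize() overwrites Sepal.Length as soon as it is reassigned.
iris.all <- summarize(iris,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)) * length(Sepal.Length),
                      Sepal.Length = unique(Sepal.Length))

iris.species <- ddply(iris, .(Species), summarize,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)) * length(Sepal.Length),
                      Sepal.Length = unique(Sepal.Length))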
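
As a sanity check on the "ECDF times number of observations" idea, here is a small base-R sketch that compares it with a running total of the frequency table. The variable names (x, vals, cum_ecdf, cum_table) are only illustrative, and no packages are assumed beyond base R.

```r
x <- iris$Sepal.Length
vals <- sort(unique(x))

# Cumulative frequency via the ECDF: proportion of values <= each point,
# scaled by the total number of observations
cum_ecdf <- ecdf(x)(vals) * length(x)

# The same counts obtained directly: running total of the frequency table
# (table() orders numeric values ascending, matching vals)
cum_table <- as.vector(cumsum(table(x)))

# The two should agree up to floating-point tolerance
same <- isTRUE(all.equal(cum_ecdf, cum_table))
```

If the two vectors agree, the scaled ECDF can be read as a cumulative count at each distinct value.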
