Easier way to plot the cumulative frequency distribution in ggplot?
Problem description
I'm looking for an easier way to draw the cumulative distribution line in ggplot.
I have some data whose histogram I can immediately display with
qplot (mydata, binwidth=1);
I found a way to do it at http://www.r-tutor.com/elementary-statistics/quantitative-data/cumulative-frequency-graph, but it involves several steps, which is time-consuming when exploring data.
Is there a more straightforward way to do this in ggplot, similar to how trend lines and confidence intervals can be added just by specifying options?
There is a built-in ecdf() function in R which should make things easier. Here's some sample code, utilizing plyr:
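library(plyr)
library(ggplot2)
data(iris)
## Ecdf over all species
## summarize() evaluates its arguments in order, so compute ecdf first,
## before Sepal.Length is replaced by its unique values
iris.all <- summarize(iris, ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)),
                      Sepal.Length = unique(Sepal.Length))
ggplot(iris.all, aes(Sepal.Length, ecdf)) + geom_step()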
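The expression ecdf(Sepal.Length)(unique(Sepal.Length)) works because ecdf() returns a step function rather than a vector; the second pair of parentheses evaluates that function at each distinct value. A minimal sketch on a small made-up vector x:

```r
# ecdf() (stats package, attached by default) returns a step function
x <- c(3, 1, 2, 2, 5)
Fn <- ecdf(x)

# Evaluate the empirical CDF at each distinct value, as the plyr code does
xs <- unique(x)   # first-occurrence order: 3 1 2 5 (not sorted)
print(Fn(xs))     # share of observations <= each value: 0.8 0.2 0.6 1.0
```

Because unique() keeps first-occurrence order, the values are not sorted; geom_step() orders points by x before drawing the steps, so the plots are unaffected.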
# Ecdf within species (ecdf again computed before Sepal.Length is overwritten)
iris.species <- ddply(iris, .(Species), summarize,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)),
                      Sepal.Length = unique(Sepal.Length))
ggplot(iris.species, aes(Sepal.Length, ecdf, color = Species)) + geom_step()
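Newer ggplot2 releases also provide stat_ecdf(), which computes the empirical CDF inside the plot layer, closer to the option-style layers the question mentions. This is a different technique from the plyr code; a minimal sketch, assuming a ggplot2 version that includes stat_ecdf():

```r
library(ggplot2)

# ECDF over all species: stat_ecdf() computes the proportions itself
p_all <- ggplot(iris, aes(Sepal.Length)) +
  stat_ecdf(geom = "step")

# ECDF within species: the colour mapping splits the data into groups
p_species <- ggplot(iris, aes(Sepal.Length, colour = Species)) +
  stat_ecdf(geom = "step")

print(p_all)
print(p_species)
```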
Edit: I just realized that you want the cumulative frequency. You can get it by multiplying the ecdf value by the total number of observations; the same ggplot() calls then plot the counts:
## ecdf is computed first, so length(Sepal.Length) still counts all observations
iris.all <- summarize(iris,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)) * length(Sepal.Length),
                      Sepal.Length = unique(Sepal.Length))
iris.species <- ddply(iris, .(Species), summarize,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)) * length(Sepal.Length),
                      Sepal.Length = unique(Sepal.Length))
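Scaling by the number of observations works because the ECDF at a value v is the share of observations less than or equal to v, so multiplying by n gives the count of such observations, i.e. the cumulative frequency. A minimal sketch that checks this against a running total of counts from table() (the variable names here are illustrative only):

```r
x <- iris$Sepal.Length
n <- length(x)

# Cumulative frequency via the ecdf, evaluated at the sorted distinct values
vals <- sort(unique(x))
cum_from_ecdf <- ecdf(x)(vals) * n

# Cumulative frequency via a running sum of the counts per distinct value
# (table() orders a numeric vector's values ascending)
cum_from_table <- cumsum(table(x))

# Both give the number of observations <= each distinct value
print(all.equal(cum_from_ecdf, as.vector(cum_from_table)))
```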