R/ggplot Cumulative Sum in Histogram

Views: 425 | Published: 2018/4/24 20:57:57 | Tags: r, ggplot2

Problem description

I have a dataset with user IDs and the number of objects each user created. I drew a histogram with ggplot and now I'm trying to add the cumulative sum of the x-values as a line. The aim is to see how much each bin contributes to the total. I tried the following:
```r
ggplot(data = userStats, aes(x = Num_Tours)) +
  geom_histogram(binwidth = 0.2) +
  scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
  geom_line(aes(x = Num_Tours, y = cumsum(Num_Tours) / sum(Num_Tours) * 3500), color = "red") +
  scale_y_continuous(name = 'Number of users',
                     sec.axis = sec_axis(~ . / 3500, name = "Cumulative percentage of routes [%]"))
```
This does not work because I don't include any bins, so the plot is wrong: [plot not preserved]

and
```r
ggplot(data = userStats, aes(x = Num_Tours)) +
  geom_histogram(binwidth = 0.2) +
  scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
  stat_bin(aes(y = cumsum(..count..)), binwidth = 0.2, geom = "line", color = "red") +
  scale_y_continuous(name = 'Number of users',
                     sec.axis = sec_axis(~ . / 3500, name = "Cumulative percentage of routes [%]"))
```
Resulting in this: [plot not preserved]

Here the cumulative sum is taken over the counts. What I want is the cumulative sum of count × bin value, normalized so that it can be displayed in the same plot. What I am trying to do is something like this: [example plot not preserved]

I would appreciate any input! Thanks

Edit: As test data, this should work:
```r
userID <- c(1:100)
Num_Tours <- sample(1:100, 100)
userStats <- data.frame(userID, Num_Tours)
userStats$cumulative <- cumsum(userStats$Num_Tours / sum(userStats$Num_Tours))
```
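With test data like this, the per-bin quantity described above (total tours per bin, i.e. count × value, then the running share of the grand total) can also be computed numerically, which is handy for checking whatever line ends up in the plot. The following is a minimal base-R sketch, assuming bins 0.2 wide on the log10 scale (as with `binwidth = 0.2` under `scale_x_log10()`). `binwidth`, `centre`, `tours_per_bin`, `cum_share` and `result` are illustrative names. Rounding to the nearest multiple of 0.2 is meant to approximate ggplot2's default bin placement; it is not guaranteed to reproduce ggplot2's binning exactly.

```r
# Hypothetical check: total tours per log10 bin and their cumulative share.
set.seed(111)  # added for reproducibility; not part of the original test data
userID <- c(1:100)
Num_Tours <- sample(1:100, 100)
userStats <- data.frame(userID, Num_Tours)

binwidth <- 0.2  # in log10 units, as with geom_histogram(binwidth = 0.2) + scale_x_log10()

# Assign each user to the nearest bin centre on the log10 scale
# (assumption: bins are centred on multiples of binwidth)
centre <- round(log10(userStats$Num_Tours) / binwidth) * binwidth

# Sum of Num_Tours in each non-empty bin = count x value; bins come out in ascending order
tours_per_bin <- tapply(userStats$Num_Tours, centre, sum)

# Running share of all tours, from the smallest bin to the largest
cum_share <- cumsum(tours_per_bin) / sum(tours_per_bin)

result <- data.frame(bin_centre = 10^as.numeric(names(tours_per_bin)),
                     tours      = as.numeric(tours_per_bin),
                     cum_share  = as.numeric(cum_share))
print(result)
```

The last `cum_share` value is 1 by construction. Empty bins simply do not appear, which leaves the cumulative curve flat over those ranges.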

Solution
Here is an illustrative example that could be helpful for you.
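```r
library(ggplot2)

set.seed(111)
userID <- c(1:100)
Num_Tours <- sample(1:100, 100, replace = TRUE)
userStats <- data.frame(userID, Num_Tours)

# Sort the x data so the cumulative sum runs from the smallest to the largest value
userStats$Num_Tours <- sort(userStats$Num_Tours)
# Running share (0 to 1) of all tours contributed by users up to each value
userStats$cumulative <- cumsum(userStats$Num_Tours / sum(userStats$Num_Tours))

# Manually fix the maximum of the primary y-axis; the 0-1 cumulative line is
# stretched to this height and the secondary axis undoes the scaling
ymax <- 40
ggplot(data = userStats, aes(x = Num_Tours)) +
  geom_histogram(binwidth = 0.2, col = "white") +
  scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
  geom_line(aes(x = Num_Tours, y = cumulative * ymax), col = "red", lwd = 1) +
  # Divide by ymax to get back the 0-1 share, then multiply by 100 for percent
  scale_y_continuous(name = 'Number of users',
                     sec.axis = sec_axis(~ . / ymax * 100,
                                         name = "Cumulative percentage of routes [%]"))
```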
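The solution above accumulates over individual users after sorting. If you want the line computed per histogram bin, as the question asks (count × bin value), one option is to let `stat_bin()` do the binning and give it a `weight` aesthetic. With `weight = Num_Tours`, each bin's `count` becomes the sum of `Num_Tours` in that bin, the exact counterpart of count × bin value. This is a minimal sketch, assuming ggplot2 3.3.0 or later for `after_stat()`. The random data and `ymax = 40` are carried over from the solution for illustration only.

```r
library(ggplot2)

set.seed(111)
userStats <- data.frame(userID = 1:100,
                        Num_Tours = sample(1:100, 100, replace = TRUE))

ymax <- 40  # height the 0-1 cumulative share is stretched to, as in the solution

p <- ggplot(userStats, aes(x = Num_Tours)) +
  geom_histogram(binwidth = 0.2, col = "white") +
  # weight = Num_Tours turns each bin's count into the total tours in that bin,
  # so cumsum() runs over bins (in ascending x order), not over users
  stat_bin(aes(weight = Num_Tours,
               y = after_stat(cumsum(count) / sum(count) * ymax)),
           binwidth = 0.2, geom = "line", col = "red") +
  scale_x_log10(name = "Number of planned tours",
                breaks = c(1, 5, 10, 50, 100, 200)) +
  scale_y_continuous(name = "Number of users",
                     sec.axis = sec_axis(~ . / ymax * 100,
                                         name = "Cumulative percentage of routes [%]"))
p  # draw the plot
```

Because the line's points sit at the bin centres, it steps exactly where the histogram bars change. No pre-sorting or extra `cumulative` column is needed.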
