Stacked Area Histogram in R


Question


I ran a Pig job on a Hadoop cluster that crunched a bunch of data down into something R can handle to do a cohort analysis. I have the following script, and as of the second to last line I have the data in the format:

> names(data)
[1] "VisitWeek" "ThingAge"    "MyMetric"




VisitWeek is a Date. ThingAge and MyMetric are integers.

The data looks like this:

2010-02-07     49  12345


The script I have so far is:

# Load ggplot2 for charting 
library(ggplot2);

# Our file has headers - column names
data = read.table('weekly_cohorts.tsv',header=TRUE,sep="\t");

# Print the names
names(data)

# Convert to dates
data$VisitWeek = as.Date(data$VisitWeek)
data$ThingCreation = as.Date(data$ThingCreation)

# Fill in the age column
data$ThingAge = as.integer(data$VisitWeek - data$ThingCreation)

# Filter data to thing ages lt 10 weeks (70 days) + a sanity check for gt 0, and drop the creation week column
data = subset(data, data$ThingAge <= 70, c("VisitWeek","ThingAge","MyMetric"))
data = subset(data, data$ThingAge >= 0)

print(ggplot(data, aes(x=VisitWeek, y=MyMetric, fill=ThingAge)) + geom_area())


This last line does not work. I've tried lots of variations, bars, histograms, but as usual R docs defeat me.


I want it to show a standard Excel style stacked area chart - one time series for each ThingAge stacked across the weeks in the x axis, with the date on the y axis. An example of this kind of chart is here: http://upload.wikimedia.org/wikipedia/commons/a/a1/Mk_Zuwanderer.png


I've read the docs here: http://had.co.nz/ggplot2/geom_area.html and http://had.co.nz/ggplot2/geom_histogram.html and this blog http://chartsgraphs.wordpress.com/2008/10/05/r-lattice-plot-beats-excel-stacked-area-trend-chart/ but I can't quite make it work for me.

How can I achieve this?

Recommended answer

library(ggplot2)
set.seed(134)
df <- data.frame(
    VisitWeek = rep(as.Date(seq(Sys.time(),length.out=5, by="1 day")),3),
    ThingAge = rep(1:3, each=5),
    MyMetric = sample(100, 15))

ggplot(df, aes(x=VisitWeek, y=MyMetric)) + 
    geom_area(aes(fill=factor(ThingAge)))


gives me the image below. I suspect your problem lies in correctly specifying the fill mapping for the area plot: fill=factor(ThingAge)
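
To connect this back to the question's columns, here is a minimal sketch of the same fix applied to data shaped like the asker's. The `cohorts` data frame and the `AgeGroup` column are hypothetical stand-ins invented for illustration (the real input would come from `weekly_cohorts.tsv`); only `VisitWeek`, `ThingAge`, and `MyMetric` come from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: 4 weekly visits x 3 ages (in days)
weeks <- seq(as.Date("2010-02-07"), by = "1 week", length.out = 4)
cohorts <- data.frame(
  VisitWeek = rep(weeks, times = 3),
  ThingAge  = rep(c(0L, 7L, 14L), each = 4),
  MyMetric  = seq(10, 120, by = 10)
)

# Convert the integer age to a factor so ggplot2 treats each age as its own
# discrete series instead of one continuous fill scale
cohorts$AgeGroup <- factor(cohorts$ThingAge)

# geom_area stacks the series by default (position = "stack")
p <- ggplot(cohorts, aes(x = VisitWeek, y = MyMetric, fill = AgeGroup)) +
  geom_area()
print(p)
```

With an integer `ThingAge` mapped directly to `fill`, ggplot2 has no discrete variable to split the data into separate areas, which is likely why the original call did not produce the stacked chart.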
