ggplot2: yearmon scale and geom_bar

Problem description

More than a solution, I'd like to understand why something that should be quite easy actually is not.

[I am borrowing part of the code from a different post that touched on the issue but ended up with a solution I didn't like.]

library(ggplot2)
library(xts)     # attaches zoo, which provides as.yearmon() and scale_x_yearmon()
library(dplyr)
library(scales)

csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"

tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.yearmon(tmp$dt)
tmp$status <- as.factor(tmp$status)

### Not good. Why?
ggplot(tmp, aes(x = yearmon, fill = status)) + 
  geom_bar() + 
  scale_x_yearmon()

### Almost good but long-winded and ticks not great
chartData <- tmp %>%
  group_by(yearmon, status) %>%
  summarise(count = n()) %>%
  as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) + 
  geom_col() + 
  scale_x_yearmon()

The first plot is all wrong; the second is almost perfect (the ticks on the x axis are not great, but I can live with that). Isn't geom_bar() supposed to do the counting that I have to do manually for the second chart?

[First chart]

[Second chart]

My question is: why is the first chart so poor? There is a warning that is meant to hint at the cause ("position_stack requires non-overlapping x intervals"), but I really fail to understand it. Thanks.

My own answer

This is what I learned (thanks so much to all of you!):

  • Even though scale_#_yearmon and scale_#_date exist, ggplot unfortunately still treats those object types as continuous numbers. That makes geom_bar unusable here.
  • geom_histogram might do the trick, but you lose control over relevant parts of the aesthetics.
  • Bottom line: you need to group and count before you chart.
  • If you plan to use ggplot2, I am not sure xts or lubridate are really that useful for what I was trying to achieve. I suspect they will be perfect for any case that is continuous date-wise.

All in all, I ended up with this, which does exactly what I am after (notice that there is no need for xts or lubridate):

library(ggplot2)
library(dplyr)
library(scales)

csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"

tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.Date(format(tmp$dt, "%Y-%m-01"))
tmp$status <- as.factor(tmp$status)

### GOOD
chartData <- tmp %>%
  group_by(yearmon, status) %>%
  summarise(count = n()) %>%
  as.data.frame()

ggplot(chartData, aes(x = yearmon, y = count, fill = status)) + 
  geom_col() + 
  scale_x_date(labels = date_format("%h-%y"),
               breaks = seq(from = min(chartData$yearmon), 
                            to = max(chartData$yearmon), by = "month"))
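
The group-and-count step can also be sketched in base R, without dplyr. This is only an illustrative alternative and not part of the original answer; it reuses the sample data from the question, builds the same `yearmon` and `count` columns, and relies on `aggregate()` summing a column of ones within each month/status group.

```r
# Sample data from the question
csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"

tmp <- read.csv(text = csvData)
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.Date(format(tmp$dt, "%Y-%m-01"))  # truncate each date to the first of its month
tmp$status <- as.factor(tmp$status)

# One row per observation, so summing a column of ones counts the rows in each group
tmp$count <- 1L
chartData <- aggregate(count ~ yearmon + status, data = tmp, FUN = sum)
chartData
```

The resulting data frame has one row per month and status, with the columns `yearmon`, `status`, and `count`, so it should be usable with the same ggplot() call shown above.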

[Final output]

Recommended answer

The first plot is messed up basically because ggplot2 does not know exactly what a yearmon is. As you can see here, internally it is just a number with labels.

> as.numeric(tmp$yearmon)
[1] 2015.917 2015.917 2015.917 2015.833 2015.750 2015.917 2016.417 2016.333 2016.167 2015.917
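
Those numbers follow the encoding that zoo documents for yearmon: year plus (month - 1)/12. As a rough illustration, the same values can be reproduced in base R without zoo; the variable names below are only for this sketch.

```r
# Three of the dates from the question
dt <- as.Date(c("2015-12-03", "2015-10-17", "2016-06-30"))

year  <- as.integer(format(dt, "%Y"))
month <- as.integer(format(dt, "%m"))

# yearmon-style encoding: year + (month - 1) / 12
ym_num <- year + (month - 1) / 12
round(ym_num, 3)
# 2015.917 2015.750 2016.417
```

So December 2015 sits at 2015 + 11/12, which is why ggplot2 sees a continuous axis rather than discrete month categories.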

So when you plot without aggregating first, the bars are spread out. You need to use geom_histogram() with an appropriate binwidth, like this:

ggplot(tmp, aes(x = yearmon, fill = status)) + 
  geom_histogram(binwidth = 1/12) + 
  scale_x_yearmon()

The binwidth of 1/12 corresponds to the 12 months in each year.
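
To see why a bin width of 1/12 lines up with calendar months, here is a small base R sketch. It assumes the yearmon encoding of year + (month - 1)/12 and checks that consecutive months, including the step across a year boundary, are always 1/12 apart. The variable names are illustrative only.

```r
# Four consecutive months spanning a year boundary
month_starts <- seq(as.Date("2015-10-01"), by = "month", length.out = 4)

# yearmon-style numeric encoding: year + (month - 1) / 12
ym_num <- as.integer(format(month_starts, "%Y")) +
  (as.integer(format(month_starts, "%m")) - 1) / 12

# Spacing between neighbouring months
diff(ym_num)
all.equal(diff(ym_num), rep(1 / 12, 3))
```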

For a plot after aggregation, as @ed_sans suggests, I also prefer lubridate, since I know better how to change the ticks and modify the axis labels with it.

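library(lubridate)  # provides floor_date() and the months() period helper used below

chartData <- tmp %>%
  mutate(ym = floor_date(dt, "month")) %>%  # truncate each date to the first of its month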
  group_by(ym, status) %>%
  summarise(count = n()) %>%
  as.data.frame()

ggplot(chartData, aes(x = ym, y = count, fill = status)) + 
  geom_col() + 
  scale_x_date(labels = date_format("%Y-%m"),
               breaks = as.Date("2015-09-01") + 
                 months(seq(0, 10, by = 2)))
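
If lubridate is not loaded, the same every-other-month breaks can probably be generated with base R's seq() method for dates. A minimal sketch, where `brks` is just an illustrative name:

```r
# Six breaks, two months apart, starting at 2015-09-01
brks <- seq(as.Date("2015-09-01"), by = "2 months", length.out = 6)
format(brks)
```

The vector could then be passed as `breaks = brks` to scale_x_date().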
