ggplot2: yearmon scale and geom_bar
Problem description
More than a solution, I'd like to understand why something that should be quite easy actually is not.
[I am borrowing part of the code from a different post that touched on this issue but ended up with a solution I didn't like.]
library(ggplot2)
library(xts)
library(dplyr)
library(scales)
csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"
tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.yearmon(tmp$dt)
tmp$status <- as.factor(tmp$status)
### Not good. Why?
ggplot(tmp, aes(x = yearmon, fill = status)) +
  geom_bar() +
  scale_x_yearmon()
### Almost good but long-winded and ticks not great
chartData <- tmp %>%
  group_by(yearmon, status) %>%
  summarise(count = n()) %>%
  as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) +
  geom_col() +
  scale_x_yearmon()
The first plot is all wrong; the second is almost perfect (the ticks on the x-axis are not great, but I can live with that). Isn't geom_bar() supposed to do the counting that I have to do manually for the second chart?
[Figure: first chart]
[Figure: second chart]
My question is: why is the first chart so poor? There is a warning that is meant to hint at something ("position_stack requires non-overlapping x intervals"), but I really fail to understand it. Thanks.
My personal answer
This is what I learned (thanks so much to all of you!):
- Even though there is a scale_#_yearmon or scale_#_date, ggplot unfortunately treats those object types as continuous numbers. That makes geom_bar unusable.
- geom_histogram might do the trick, but you lose control over relevant parts of the aesthetics.
- Bottom line: you need to group/sum before you chart.
- I am not sure that xts or lubridate are really that useful (if you plan to use ggplot2) for what I was trying to achieve. I suspect they will be perfect for any continuous, date-wise case.
All in all, I ended up with this, which does exactly what I am after (notice that there is no need for xts or lubridate):
library(ggplot2)
library(dplyr)
library(scales)
csvData <- "dt,status
2015-12-03,1
2015-12-05,1
2015-12-05,0
2015-11-24,1
2015-10-17,0
2015-12-18,0
2016-06-30,0
2016-05-21,1
2016-03-31,0
2015-12-31,0"
tmp <- read.csv(textConnection(csvData))
tmp$dt <- as.Date(tmp$dt)
tmp$yearmon <- as.Date(format(tmp$dt, "%Y-%m-01"))
tmp$status <- as.factor(tmp$status)
### GOOD
chartData <- tmp %>%
  group_by(yearmon, status) %>%
  summarise(count = n()) %>%
  as.data.frame()
ggplot(chartData, aes(x = yearmon, y = count, fill = status)) +
  geom_col() +
  scale_x_date(labels = date_format("%h-%y"),
               breaks = seq(from = min(chartData$yearmon),
                            to = max(chartData$yearmon), by = "month"))
[Figure: final output]
Recommended answer
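The group-and-count step above uses dplyr. As a rough sketch only (not part of the original post), the same aggregation can be done in base R with aggregate(). The helper column `count`, filled with ones and then summed, is an assumption introduced here for illustration.

```r
# Same sample data as in the question
tmp <- data.frame(
  dt = as.Date(c("2015-12-03", "2015-12-05", "2015-12-05", "2015-11-24",
                 "2015-10-17", "2015-12-18", "2016-06-30", "2016-05-21",
                 "2016-03-31", "2015-12-31")),
  status = factor(c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0))
)

# Collapse every date to the first day of its month, as in the post
tmp$yearmon <- as.Date(format(tmp$dt, "%Y-%m-01"))

# Illustrative helper column: one per row, summed per (month, status) group
tmp$count <- 1L
chartData <- aggregate(count ~ yearmon + status, data = tmp, FUN = sum)
print(chartData)
```

The resulting data frame has one row per month/status combination that occurs in the data, so it can be passed to geom_col() in the same way as the dplyr version.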
The reason the first plot is messed up is basically that ggplot2 does not know exactly what a yearmon is. As you can see here, internally it is just a num with labels.
> as.numeric(tmp$yearmon)
[1] 2015.917 2015.917 2015.917 2015.833 2015.750 2015.917 2016.417 2016.333 2016.167 2015.917
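To see where those numbers come from, here is a small base R sketch that rebuilds them without zoo. It assumes zoo's documented encoding for yearmon, namely the year plus (month - 1)/12; the variable names `lt` and `ym_num` are illustrative.

```r
# Three of the dates from the question's data
dt <- as.Date(c("2015-12-03", "2015-10-17", "2016-06-30"))
lt <- as.POSIXlt(dt)

# POSIXlt stores years since 1900 and zero-based months,
# so this is year + (month - 1) / 12
ym_num <- (lt$year + 1900) + lt$mon / 12
print(round(ym_num, 3))
```

Under this encoding, consecutive months sit 1/12 apart on a continuous axis.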
So when you plot without aggregating first, the bars are spread out. You need to assign an appropriate binwidth using geom_histogram(), like this:
ggplot(tmp, aes(x = yearmon, fill = status)) +
  geom_histogram(binwidth = 1/12) +
  scale_x_yearmon()
The binwidth of 1/12 corresponds to one month, since there are 12 months in a year.
For a plot after aggregation, as @ed_sans suggests, I also prefer lubridate, as I know better how to change the ticks and modify the axis labels with it.
library(lubridate)  # provides floor_date() and months()
chartData <- tmp %>%
  mutate(ym = floor_date(dt, "month")) %>%
  group_by(ym, status) %>%
  summarise(count = n()) %>%
  as.data.frame()
ggplot(chartData, aes(x = ym, y = count, fill = status)) +
  geom_col() +
  scale_x_date(labels = date_format("%Y-%m"),
               breaks = as.Date("2015-09-01") +
                 months(seq(0, 10, by = 2)))
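The break positions in the code above rely on lubridate's months(). If you would rather not load lubridate just for that, base R's seq() can generate the same every-other-month dates. This is a minimal sketch, and the name `breaks_2m` is illustrative.

```r
# Six breaks, two months apart, starting on 2015-09-01
breaks_2m <- seq(as.Date("2015-09-01"), by = "2 months", length.out = 6)
print(breaks_2m)
```

The resulting vector could then be passed as the breaks argument of scale_x_date().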