Out-of-order dates in ggplot2

Question

I typically know how to order my dates in ggplot, but something is different about this data and I'm hoping someone can clarify it for me.

Consider:

ggplot(tmp3)+
geom_boxplot(aes(x=simdte,y=r2))+
facet_wrap(~simyr, scales='free_x')+
theme(axis.text.x=element_text(angle=45,hjust=1))

The dates are in alphanumeric order, but now I want to format the x-axis labels, so I tried:

ggplot(tmp3)+
geom_boxplot(aes(x=reorder(strftime(strptime(simdte,'%Y%m%d'),'%b-%d'),as.numeric(simdte)),y=r2))+
facet_wrap(~simyr, scales='free_x')+
theme(axis.text.x=element_text(angle=45,hjust=1))

But notice that all the dates are in order except Jun-08 in 2015.

I also tried:

tmp3 <- tmp3 %>%
    mutate(plotsimdte = factor(
        strftime(strptime(simdte, '%Y%m%d'), '%b-%d'),
        levels = strftime(strptime(unique(simdte), '%Y%m%d'), '%b-%d')[order(unique(simdte))]
    ))

and plotting with x=plotsimdte, but it made no difference. I get a warning about duplicated levels when I create this factor, which is confusing since I'm only using unique values.

Finally, I tried:

ggplot(tmp3)+
geom_boxplot(aes(x=as.Date(simdte,'%Y%m%d'),y=r2, group=simdte))+
scale_x_date(date_labels ='%b-%d')+
facet_wrap(~simyr, scales='free_x')+
theme(axis.text.x=element_text(angle=45,hjust=1))

but I'd like to keep the dates discrete, because their importance is as an identifier rather than as a distribution through time.

Any advice would be appreciated. Thanks.

A small subset of the data:

Updated dput output with as.data.frame:

> dput(as.data.frame(tmp3))
structure(list(mdldte = c("20130525", "20140407", "20140413", 
"20150608", "20130525", "20150608", "20140420", "20130429", "20130608", 
"20130608", "20140323", "20140413", "20150325", "20150608", "20140511", 
"20130601", "20150608", "20130608", "20140420", "20150305", "20150415", 
"20130608", "20140531", "20150608", "20140531", "20150608", "20130403", 
"20130503", "20150415", "20140407", "20150608", "20140323", "20130525", 
"20140420", "20130403", "20130403", "20130608", "20150501", "20150608", 
"20130429", "20160607", "20140527", "20140420", "20140531", "20140502", 
"20150325", "20140428", "20160620", "20160620", "20130403", "20160527", 
"20150415", "20140413", "20160607", "20140413", "20150608", "20160613", 
"20150608", "20140407", "20150501", "20140323", "20160607", "20140531", 
"20150305", "20150409", "20140428", "20130503", "20130525", "20140428", 
"20140407", "20130503", "20130525", "20130403", "20150305", "20150217", 
"20150501", "20130608", "20150305", "20150217", "20130608", "20140511", 
"20160527", "20140502", "20150415"), simdte = c("20130403", "20130403", 
"20130403", "20130429", "20130429", "20130429", "20130503", "20130503", 
"20130503", "20130525", "20130525", "20130525", "20130601", "20130601", 
"20130601", "20130608", "20130608", "20130608", "20140323", "20140323", 
"20140323", "20140407", "20140407", "20140407", "20140413", "20140413", 
"20140413", "20140420", "20140420", "20140420", "20140428", "20140428", 
"20140428", "20140502", "20140502", "20140502", "20140511", "20140511", 
"20140511", "20140517", "20140517", "20140517", "20140527", "20140527", 
"20140527", "20140531", "20140531", "20140531", "20150217", "20150217", 
"20150217", "20150305", "20150305", "20150305", "20150325", "20150325", 
"20150325", "20150409", "20150409", "20150409", "20150415", "20150415", 
"20150415", "20150427", "20150427", "20150427", "20150501", "20150501", 
"20150501", "20150608", "20150608", "20150608", "20160527", "20160527", 
"20160527", "20160607", "20160607", "20160607", "20160613", "20160613", 
"20160613", "20160620", "20160620", "20160620"), r2 = c(0.862283742909527, 
0.813142444594872, 0.700946018367384, 0.474388980021752, 0.826648311592866, 
0.794283339648572, 0.79687922855493, 0.808984929407683, 0.781751354268809, 
0.535951689307516, 0.68524477567256, 0.716321630808227, 0.373141090466726, 
0.723850452026657, 0.408972539926536, 0.29346057127035, 0.319261073048776, 
0.319535158994707, 0.872351278607699, 0.871652058666136, 0.509872096326808, 
0.398605136979609, 0.420745998256184, 0.596082529689281, 0.793035779455997, 
0.661212720614186, 0.736581215438551, 0.89337362408349, 0.900773593767951, 
0.916946297262156, 0.700865150846107, 0.839501961957186, 0.863684601286204, 
0.819367869015135, 0.765192251153536, 0.590744027549224, 0.720092636591613, 
0.732237645665246, 0.701898569000057, 0.505310296599101, 0.756344530560126, 
0.522404606955389, 0.631453896947287, 0.732767696833121, 0.669168785479052, 
0.340080390313005, 0.397681954572616, 0.708286400101956, 0.551718623201008, 
0.62217661847446, 0.160935876745664, 0.79407487647674, 0.729924604817696, 
0.716024523586796, 0.526169199415047, 0.702098331814224, 0.748626603557805, 
0.432690018453805, 0.710646849035047, 0.526049259906931, 0.811336120223548, 
0.679819505156441, 0.591396577448379, 0.656686513355743, 0.698313842140892, 
0.718604690738853, 0.768070041705958, 0.453336001102217, 0.544446423520199, 
0.583336140040845, 0.172961846412558, 0.298155303932666, 0.731010397306203, 
0.582517045429492, 0.521708072638302, 0.610885761462162, 0.543494236386099, 
0.630580819311437, 0.642714888852003, 0.736302041771047, 0.736086951074143, 
0.444437396681972, 0.445336147280364, 0.43829690520584), simyr = c("2013", 
"2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013", 
"2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013", 
"2013", "2014", "2014", "2014", "2014", "2014", "2014", "2014", 
"2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", 
"2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", 
"2014", "2014", "2014", "2014", "2014", "2014", "2014", "2015", 
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2015", 
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2015", 
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2016", 
"2016", "2016", "2016", "2016", "2016", "2016", "2016", "2016", 
"2016", "2016", "2016"), mdlpreds = structure(c(4L, 2L, 3L, 1L, 
3L, 2L, 4L, 2L, 3L, 3L, 4L, 2L, 1L, 2L, 3L, 1L, 3L, 3L, 4L, 4L, 
1L, 1L, 1L, 3L, 2L, 3L, 3L, 4L, 4L, 4L, 2L, 3L, 4L, 2L, 4L, 1L, 
3L, 3L, 3L, 3L, 2L, 1L, 4L, 2L, 4L, 3L, 1L, 4L, 4L, 4L, 3L, 4L, 
2L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 2L, 3L, 3L, 4L, 4L, 3L, 2L, 1L, 
3L, 2L, 3L, 1L, 2L, 1L, 3L, 1L, 1L, 3L, 2L, 2L, 2L, 1L, 1L, 1L
), .Label = c("phv", "phvfsca", "phvaso", "phvasofsca"), class = "factor")), class = "data.frame", .Names = c("mdldte", 
"simdte", "r2", "simyr", "mdlpreds"), row.names = c(NA, -84L))

Recommended answer

The issue is that your dates are currently being interpreted as character data, so R treats them as categories rather than as points in time, and the ordering gets shuffled. What you really want is for them to be treated as genuine Date objects, and then to let ggplot's higher-level functions handle the ordering and labeling accordingly.

Convert the date data to Date type:

tmp3$newdate <- as.Date(strptime(tmp3$simdte, '%Y%m%d'))
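As a minimal sketch of the same conversion on a few `simdte` values copied from the question: `as.Date` also accepts a format string directly, so the intermediate `strptime` call is optional. The names `simdte_sample` and `newdate_sample` are made up for illustration and are not part of the original data frame.

```r
# A few simdte values copied from the question's data (illustrative subset).
simdte_sample <- c("20130403", "20130608", "20150608")

# as.Date() parses character input directly when given a format string.
newdate_sample <- as.Date(simdte_sample, format = "%Y%m%d")

class(newdate_sample)            # "Date"
format(newdate_sample, "%b-%d")  # month-day text, e.g. for axis labels (locale-dependent)
```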

Specify the new dates as the x-values (no need to select only the unique values), and use scale_x_date to create pretty labels. Note that this also spaces the data points correctly across time, instead of using even spacing for each "level" of the date data.

plot.new <- ggplot(tmp3)+
    geom_point(aes(x= newdate, y=r2))+
    scale_x_date(date_labels = '%b-%d') +
    facet_wrap(~simyr, scales='free_x')+
    theme(axis.text.x=element_text(angle=45,hjust=1))
print(plot.new)

In the future, it's useful to be aware of the str function, which quickly tells you the type of each data column (the same information is available from the Environment panel in RStudio):

str(tmp3)

'data.frame':   28 obs. of  7 variables:
 $ mdldte  : chr  "20150305" "20140531" "20160620" "20150305" ...
 $ simdte  : chr  "20130403" "20130429" "20130503" "20130525" ...
 $ r2      : num  0.542 0.485 0.54 0.4 0.594 ...
 $ simyr   : chr  "2013" "2013" "2013" "2013" ...
 $ mdlyr   : chr  "2015" "2014" "2016" "2015" ...
 $ mdlpreds: Factor w/ 4 levels "phv","phvfsca",..: 1 1 1 1 4 1 4 2 3 4 ...
 $ newdate : Date, format: "2013-04-03" "2013-04-29" "2013-05-03" "2013-05-25" ...

As you can see, your original "simdte" column is being stored as character data. R (and ggplot) will treat every distinct value of such data as a separate level or category. Conversely, Date data are fundamentally numerical. R will treat them as continuous, which makes it easier to plot them accurately on a timeline or axis. It also makes it easier to separate the underlying data from the format of any plotting labels.
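The difference between the two representations can be sketched in base R. This is only an illustration using two `simdte` values from the question; the variable name `d` is hypothetical.

```r
d <- as.Date(c("20130403", "20130429"), format = "%Y%m%d")

# A Date is stored as the number of days since 1970-01-01 ...
as.numeric(as.Date("1970-01-02"))   # 1

# ... so the difference between two dates is a meaningful distance in days.
as.numeric(d[2] - d[1])             # 26

# The same values kept as character strings are only labels: converting them
# to a factor yields one level per distinct string, with no notion of distance.
nlevels(factor(c("20130403", "20130429", "20130429")))   # 2
```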

If instead we wanted each date to act as a category (instead of having the date data act as a numerical distance), the solution is actually simpler. Strange things can happen when the values fed into a ggplot aesthetic are transformed so that the number of distinct values changes, which I suspect is the root cause of your misordering problem.
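One likely mechanism, sketched here on four `simdte` values from the question: formatting with '%b-%d' drops the year, so 20130608 and 20150608 end up with the same label. That would explain the duplicated-levels warning (recent versions of R, as far as I know, raise an error there instead), and because `reorder` averages its second argument within each label by default, the shared label lands between the two years. The names `simdte_sample`, `md_label`, and `reordered` are made up for illustration.

```r
# Four simdte values from the question's data, two of them sharing month and day.
simdte_sample <- c("20130601", "20130608", "20150501", "20150608")

# Month-day labels drop the year, so the two June 8 dates collapse to one label.
md_label <- format(as.Date(simdte_sample, format = "%Y%m%d"), "%b-%d")
anyDuplicated(md_label) > 0   # TRUE: the labels are no longer unique

# reorder() sorts levels by the mean of the second argument within each label.
# The shared June 8 label averages a 2013 and a 2015 value, landing near
# 20140608, which sorts before every other 2015 date.
reordered <- reorder(md_label, as.numeric(simdte_sample))
levels(reordered)             # the June 8 label is placed ahead of the May 1 label
```

In a plot faceted by year, this would put Jun-08 first in the 2015 panel while leaving 2013 looking correct, which matches the symptom described in the question.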

The key is to rely on ggplot's built-in labeling functions. Once again, the main call to ggplot is fed the raw data, and scale_x_discrete handles the creation of pretty labels:

plot.new <- ggplot(tmp3)+
    geom_boxplot(aes(x=simdte,y=r2))+
    facet_wrap(~simyr, scales='free_x')+
    scale_x_discrete(labels = function(x) strftime(strptime(x, '%Y%m%d'), '%b-%d'))+
    theme(axis.text.x=element_text(angle=45,hjust=1))
print(plot.new)
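The labeling function can also be pulled out and checked on its own, which may make the approach easier to reason about. This is a sketch only: `label_md` and `breaks` are hypothetical names, and `format(as.Date(...))` is used here in place of the `strftime(strptime(...))` pair from the plotting code.

```r
# Hypothetical helper: turn "YYYYMMDD" strings into month-day axis labels.
label_md <- function(x) format(as.Date(x, format = "%Y%m%d"), "%b-%d")

# The underlying values stay unique and sort chronologically as plain strings ...
breaks <- sort(c("20150608", "20130608", "20150501"))
breaks             # "20130608" "20150501" "20150608"

# ... while the labels are computed only for display, so repeats are harmless.
label_md(breaks)
```

Passing `labels = label_md` to `scale_x_discrete` should behave the same as the anonymous function in the plotting code, since the ordering still comes from the full `simdte` strings.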
