Histogram of events grouped by month and day


Question

I am trying to make a histogram (or other plot) of the number of occurrences of each event in a data set spanning multiple years, grouped by month and day rather than by full date. Basically, I want a year-long x-axis starting on 1 March that shows how many times each date occurs, with the bars shaded by a categorical value. Below are the first 20 entries in the data set:

goose

Index   DateLost    DateLost1   Nested
1   2/5/1988    1988-02-05  N
2   5/20/1988   1988-05-20  N
3   1/31/1985   1985-01-31  N
4   9/6/1997    1997-09-06  Y
5   9/24/1996   1996-09-24  N
6   9/27/1996   1996-09-27  N
7   9/15/1997   1997-09-15  Y
8   1/18/1989   1989-01-18  Y
9   1/12/1985   1985-01-12  Y
10  2/12/1988   1988-02-12  N
11  1/12/1985   1985-01-12  Y
12  10/26/1986  1986-10-26  N
13  9/15/1988   1988-09-15  Y
14  12/30/1986  1986-12-30  N
15  1/19/1991   1991-01-19  N
16  1/7/1992    1992-01-07  N
17  10/9/1999   1999-10-09  N
18  10/20/1990  1990-10-20  N
19  10/25/2001  2001-10-25  N
20  9/23/1996   1996-09-23  Y

I have tried grouping with strftime, zoo, and lubridate, but then the plots don't recognize the time sequence or let me adjust the starting value. I have tried numerous approaches with plot() and ggplot2, but either the grouped data don't plot correctly or I can't get the data grouped at all. My best plot so far comes from this code:

ggplot(goose, aes(x = DateLost1, fill = Nested)) +
  stat_bin(binwidth = 100, position = "identity") +
  scale_x_date("Date")

This gives me a nice plot, but across all years rather than a single year. I have also played with the code from a previous answer, "Understanding dates and plotting a histogram with ggplot2 in R", but I am having trouble choosing a start date. Any help would be greatly appreciated. Let me know if I can provide the example data in an easier-to-use format.

Recommended answer

Let's read in your data:

goose <- read.table(header = TRUE, text = "Index   DateLost    DateLost1   Nested
1   2/5/1988    1988-02-05  N
2   5/20/1988   1988-05-20  N
3   1/31/1985   1985-01-31  N
4   9/6/1997    1997-09-06  Y
5   9/24/1996   1996-09-24  N
6   9/27/1996   1996-09-27  N
7   9/15/1997   1997-09-15  Y
8   1/18/1989   1989-01-18  Y
9   1/12/1985   1985-01-12  Y
10  2/12/1988   1988-02-12  N
11  1/12/1985   1985-01-12  Y
12  10/26/1986  1986-10-26  N
13  9/15/1988   1988-09-15  Y
14  12/30/1986  1986-12-30  N
15  1/19/1991   1991-01-19  N
16  1/7/1992    1992-01-07  N
17  10/9/1999   1999-10-09  N
18  10/20/1990  1990-10-20  N
19  10/25/2001  2001-10-25  N
20  9/23/1996   1996-09-23  Y")

Now we can convert the DateLost column to POSIXct format:

goose$DateLost1 <- as.POSIXct(goose$DateLost,
                              format = "%m/%d/%Y", 
                              tz = "GMT")
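
Before going further, it can be worth checking that every date string matched the format. The following is a minimal sketch on made-up values (the vector `raw` is illustrative, not part of the goose data); it uses `as.Date` because the data carry no time of day, which is a substitution for the `as.POSIXct` call above rather than the answer's own code.

```r
# Illustrative values in the same month/day/year layout as DateLost
raw <- c("2/5/1988", "9/24/1996", "13/1/1990")  # last one has an invalid month

parsed <- as.Date(raw, format = "%m/%d/%Y")

# Entries that do not match the format come back as NA instead of raising an error
raw[is.na(parsed)]
```

Running the same `is.na()` check on `goose$DateLost1` after the conversion shows whether any rows were silently turned into missing values.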


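Next we need to work out which season each loss falls in, relative to March 1, and how many days after that March 1 it occurred. Don't try to do this inside ggplot(). It takes some mucking about: find the day of the year of March 1 in each date's calendar year, decide whether the loss falls before or after it, and then count the days since the most recent March 1. (The column name DOTYAfterMarch31Lost below says March 31, but the code measures from March 1, the start date asked for in the question.)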

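# Day of year of March 1 in each loss year (60, or 61 in leap years)
goose$DOTYMarch1 = as.numeric(format(as.POSIXct(paste0("3/1/",format(goose$DateLost1,"%Y")),
                                                format = "%m/%d/%Y",
                                                tz = "GMT"),
                              "%j"))
# Day of year of each loss date
goose$DOTYLost = as.numeric(format(goose$DateLost1,
                             "%j"))
# Year whose March 1 starts the season containing the loss date;
# >= so that a loss on March 1 itself counts as day 0 of the new season
goose$YLost = as.numeric(format(goose$DateLost1,"%Y")) + (as.numeric(goose$DOTYLost>=goose$DOTYMarch1) -1)
# Days elapsed since that March 1; units are fixed to days so the result
# does not depend on difftime's automatic choice of units
goose$DOTYAfterMarch31Lost = as.numeric(difftime(goose$DateLost1,
                                                 as.POSIXct(paste0("3/1/",goose$YLost),
                                                            format = "%m/%d/%Y",
                                                            tz = "GMT"),
                                                 units = "days"))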
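
The same offset can also be computed more compactly with base R `Date` arithmetic. The following is a minimal sketch rather than the answer's original code: the helper name `days_since_march1` is hypothetical, and it assumes the input is either a `Date`, a "YYYY-MM-DD" string, or a POSIXct value in GMT/UTC like the one created above.

```r
# Hypothetical helper (not part of the original answer): days elapsed since
# the most recent March 1, so that March 1 itself maps to 0.
days_since_march1 <- function(d) {
  d <- as.Date(d)
  yr <- as.integer(format(d, "%Y"))
  mon <- as.integer(format(d, "%m"))
  # January and February belong to the season that began the previous March 1
  start_yr <- ifelse(mon < 3L, yr - 1L, yr)
  # Subtracting two Date objects always yields a difference in days
  as.numeric(d - as.Date(paste0(start_yr, "-03-01")))
}

days_since_march1(c("1988-03-01", "1988-02-05", "1996-09-24"))
# → 0 341 207
```

Because the season starts on March 1, a leap day always falls at the very end of the season, so every date other than February 29 gets the same offset in leap and non-leap years.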

Then we can plot it. Your code was pretty much perfect already.

require(ggplot2)

p <- ggplot(goose, 
            aes(x=DOTYAfterMarch31Lost,
                fill=Nested))+ 
  stat_bin(binwidth=1,
           position="identity")
print(p)
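
Since the x-axis is now a day count, its ticks are plain numbers. One possible way to show month names instead is to place the breaks at the offset of the first day of each month. This is a sketch under stated assumptions, not part of the original answer: `month_starts`, `month_breaks`, `month_labels`, and the `toy` data frame are illustrative names, and the reference season 2001-2002 is arbitrary (the offsets are the same for any season that starts on March 1).

```r
# First day of each month from March through the following February,
# taken from an arbitrary reference season
month_starts <- seq(as.Date("2001-03-01"), by = "month", length.out = 12)
# Offsets in days from March 1; these become the axis breaks
month_breaks <- as.numeric(month_starts - month_starts[1])
# English month abbreviations from the built-in month.abb constant
month_labels <- month.abb[as.integer(format(month_starts, "%m"))]

# Toy data standing in for goose (values are illustrative only)
toy <- data.frame(DOTYAfterMarch31Lost = c(207, 210, 198, 341),
                  Nested = c("N", "N", "Y", "N"))

# The plot is only drawn when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = DOTYAfterMarch31Lost, fill = Nested)) +
    stat_bin(binwidth = 1, position = "identity") +
    scale_x_continuous(breaks = month_breaks, labels = month_labels) +
    coord_cartesian(xlim = c(0, 366))
  print(p)
}
```

`coord_cartesian()` is used for the axis range here instead of `limits` in the scale, so that bars near the edges are zoomed rather than dropped.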

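This produces a histogram of losses by number of days since March 1, shaded by Nested (figure not included here).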
