将年-月字符串列转换为季度分类 [英] Convert year-month string column into quarterly bins

查看:164
本文介绍了将年-月字符串列转换为季度分类的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在使用大型物候数据集,其中在给定的月份中有多处树木观察。我想将这些观察结果分配到三个月的群集或垃圾箱中。我当前正在使用以下代码:

I am currently working with a large phenology data set, where there are multiple observations of trees for a given month. I want to assign these observations into three month clusters or bins. I am currently using the following code:

Cluster.GN <- ifelse(Master.feed.parts.gn$yr.mo=="2007.1", 1,
              ifelse(Master.feed.parts.gn$yr.mo=="2007.11", 1,....     
              ifelse(Master.feed.parts.gn$yr.mo=="2014.05", 17, NA)

此代码有效,但是它因为有超过50个月的时间,这非常麻烦。我很难找到其他解决方案,因为这种分类不是基于观察次数(因为每个月最多可以进行4000观察),并且它不是按时间顺序排列的,因为有些

This code works, but it is very cumbersome as there are over 50 months. I have had trouble finding another solution because this "binning" is not based on number of observations (as within each month there can be up to 4000 observations) and it is not chronological, as some months are missing. Any help you can provide would be highly appreciated.

更新一:我在R中使用了剪切功能,尝试将中断设置为17因为那是我应该拥有的三个月的垃圾箱,但是当我使用table(Cluster.GN)时,它表明只有奇数的垃圾箱具有观察值(对不起,但是我不知道如何在此处上传表格)。> Cluster.GN<-cut(Mas ter.feed.parts.gn $ yr.mo,breaks = 17,c( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),include.lowest = TRUE)

UPDATE I: I used the "cut" function in R. I tried setting the breaks to 17, as that is how many three month bins I should have. But when I use table(Cluster.GN) it shows that only the odd numbered "bins" have observations (sorry but I can't figure out how to get the table uploaded here). >Cluster.GN <- cut(Master.feed.parts.gn$yr.mo, breaks= 17, c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17"), include.lowest=TRUE)

推荐答案

更新:这个答案是一个快速的技巧,我没有检查 zoo 库。有关正确的方法,请参见 G Grothendieck使用 zoo :: as.yearqtr()的答案

UPDATE: this answer was a quick hack, I didn't check zoo library. For the right way to do it, please see G Grothendieck's answer using zoo::as.yearqtr()

所有您需要做的就是转换 yr .mo 字段从年月字符串(例如 2007.11 )转换为1..17范围内的整数,每个季度进行分箱(即将1..3个月放入第一个容器,将4..6个月放入第二个容器等)。 (除非您的数据稀疏,否则我不知道8年(2007..2014)* 4个季度= 32个存储区如何减少到只有17个存储区。但是无论如何...)

All you need to do is convert the yr.mo field from a year-month string (e.g. 2007.11) into an integer in the range 1..17, binning on every quarter (i.e. months 1..3 into first bin, 4..6 into second bin etc.). (I don't see how 8 years (2007..2014) * 4 quarters = 32 bins reduces to only 17 bins, unless your data is sparse. But anyway...)

不需要笨拙的梯形梯。

为了获得更高的性能,请使用 stringi 库, stri_split_fixed()

And for higher performance, use stringi library, stri_split_fixed()

sample_wr <- function(...) sample(..., replace=T)

# Generate sample data (you're supposed to provide this to code, to make your issue reproducible)
set.seed(123)
N <- 20
df <- data.frame(yr.mo =
          paste(sample_wr(2007:2014, N), sample_wr(1:12, N), sep='.') )
# [1] "2009.11" "2013.9"  "2010.8"  "2014.12" "2014.8"  "2007.9"  "2011.7" 
# [8] "2014.8"  "2011.4"  "2010.2"  "2014.12" "2010.11" "2012.9"  "2011.10"
#[15] "2007.1"  "2014.6"  "2008.10" "2007.3"  "2009.4"  "2014.3" 

yearmonth_to_integer <- function(xx) {
    yy_mm <- as.integer(unlist(strsplit(xx, '.', fixed=T)))
    return( (yy_mm[1] - 2006) + (yy_mm[2] %/% 3) )
}

Cluster.GN <- sapply(x, yearmonth_to_integer)

# 2009.11  2013.9  2010.8 2014.12  2014.8  2007.9  2011.7 
#    6      10       6      12      10       4       7 
# 2014.8  2011.4  2010.2 2014.12 2010.11  2012.9 2011.10 
#   10       6       4      12       7       9       8 
# 2007.1  2014.6 2008.10  2007.3  2009.4  2014.3 
#    1      10       5       2       4       9 

为了获得更高的性能,请使用dplyr或data.table库:

and for higher performance, use dplyr or data.table library:

require(dplyr)

# something like the following, currently doesn't work,
# you have to handle two intermediate columns from yy_mm
# You get to fix this :)

df %>% mutate(yy_mm = as.integer(unlist(strsplit(yr.mo, '.', fixed=T))),
              quarter = yy_mm[1]-2006 + yy_mm[2] %/% 3 )

这篇关于将年-月字符串列转换为季度分类的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆