Split data by year


Problem description

I have data like this:

ID    ATTRIBUTE        START          END
 1            A   01-01-2000   15-03-2010
 1            B   05-11-2001   06-02-2002
 2            B   01-02-2002   08-05-2008
 2            B   01-06-2008   01-07-2008

I now want to count the number of different IDs having a certain attribute per year.

A result could look like this:

YEAR    count(A)    count(B)
2000          1           0
2001          1           1
2002          1           2
2003          1           1
2004          1           1
2005          1           1
2006          1           1
2007          1           1
2008          1           1
2009          1           0
2010          1           0

I think the second step, counting the occurrences, is probably easy.

But how would I split my data into years?

Thank you in advance!

Solution

Here is an approach using a few of Hadley's packages.

library(lubridate); library(reshape2); library(plyr)

# extract years from start and end dates after converting them to date
dfr2 = transform(dfr, START = year(dmy(START)), END = year(dmy(END)))

# for every row, construct a sequence of years from start to end
dfr2 = adply(dfr2, 1, transform, YEAR = START:END)

# create pivot table of year vs. attribute with number of unique values of ID
dcast(dfr2, YEAR ~ ATTRIBUTE, function(x) length(unique(x)), value.var = 'ID')
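
For readers who want to try the idea without installing any packages, here is a minimal base R sketch of the same three steps (extract years, expand each row into one row per year, count distinct IDs). It is an illustration, not the answer's method: the helper `year_of` and the names `long` and `counts` are made up for this example, and the data frame `dfr` is simply the sample data from the question typed in by hand.

```r
# Sample data from the question; the name `dfr` matches the code above.
dfr <- data.frame(
  ID = c(1, 1, 2, 2),
  ATTRIBUTE = c("A", "B", "B", "B"),
  START = c("01-01-2000", "05-11-2001", "01-02-2002", "01-06-2008"),
  END = c("15-03-2010", "06-02-2002", "08-05-2008", "01-07-2008"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse day-month-year strings and return the year.
year_of <- function(x) {
  as.integer(format(as.Date(x, format = "%d-%m-%Y"), "%Y"))
}
start_year <- year_of(dfr$START)
end_year <- year_of(dfr$END)

# Expand every original row into one row per year it covers.
n_years <- end_year - start_year + 1L
long <- data.frame(
  ID = rep(dfr$ID, n_years),
  ATTRIBUTE = rep(dfr$ATTRIBUTE, n_years),
  YEAR = unlist(Map(seq, start_year, end_year))
)

# Drop repeated (ID, ATTRIBUTE, YEAR) rows so each ID is counted once,
# then cross-tabulate years against attributes.
counts <- table(unique(long)[, c("YEAR", "ATTRIBUTE")])
print(counts)
```

The `unique(long)` call matters for ID 2, whose two B intervals both touch 2008; without it that ID would be counted twice for that year.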

EDIT: If the original data.frame is large, then adply might take a lot of time. A useful alternative in such cases is the data.table package. Here is how we can replace the adply call using data.table.

require(data.table)
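# group by START and END as well, so that each group holds exactly one interval
dfr2 = data.table(dfr2)[, list(YEAR = START:END), by = list(ID, ATTRIBUTE, START, END)]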
