Split data by year
Problem description
I have data like this:
ID ATTRIBUTE START END
1 A 01-01-2000 15-03-2010
1 B 05-11-2001 06-02-2002
2 B 01-02-2002 08-05-2008
2 B 01-06-2008 01-07-2008
I now want to count the number of different IDs having a certain attribute per year.
A result could look like this:
YEAR count(A) count(B)
2000 1 0
2001 1 1
2002 1 2
2003 1 1
2004 1 1
2005 1 1
2006 1 1
2007 1 1
2008 1 1
2009 1 0
2010 1 0
The second step, counting the occurrences, is probably easy.
But how would I split my data into years?
Thank you in advance!
Here is an approach using a few of Hadley's packages.
library(lubridate); library(reshape2); library(plyr)
# extract years from start and end dates after converting them to date
dfr2 = transform(dfr, START = year(dmy(START)), END = year(dmy(END)))
# for every row, construct a sequence of years from start to end
dfr2 = adply(dfr2, 1, transform, YEAR = START:END)
# create pivot table of year vs. attribute with number of unique values of ID
dcast(dfr2, YEAR ~ ATTRIBUTE, fun.aggregate = function(x) length(unique(x)), value.var = 'ID')
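The same two steps (expand each row into the years it covers, then count distinct IDs per year and attribute) can also be sketched in base R, which may help when the packages above are not installed. This is only an illustrative sketch and not part of the original answer: `dfr` is rebuilt here from the table in the question, and the names `long` and `counts` are chosen for illustration.

```r
# Example data from the question (dates are day-month-year strings)
dfr <- data.frame(
  ID = c(1, 1, 2, 2),
  ATTRIBUTE = c("A", "B", "B", "B"),
  START = c("01-01-2000", "05-11-2001", "01-02-2002", "01-06-2008"),
  END = c("15-03-2010", "06-02-2002", "08-05-2008", "01-07-2008"),
  stringsAsFactors = FALSE
)

# Extract the calendar year from each date
start_year <- as.integer(format(as.Date(dfr$START, format = "%d-%m-%Y"), "%Y"))
end_year <- as.integer(format(as.Date(dfr$END, format = "%d-%m-%Y"), "%Y"))

# For every input row, build the sequence of years it covers
years <- Map(seq, start_year, end_year)
n <- lengths(years)

# Long format: one row per (input row, covered year)
long <- data.frame(
  ID = rep(dfr$ID, n),
  ATTRIBUTE = rep(dfr$ATTRIBUTE, n),
  YEAR = unlist(years)
)

# Drop repeated (ID, ATTRIBUTE, YEAR) rows so each ID counts once per cell,
# then cross-tabulate year against attribute
long <- unique(long)
counts <- table(YEAR = long$YEAR, ATTRIBUTE = long$ATTRIBUTE)
print(counts)
```

Calling `unique()` before `table()` is what turns a row count into a count of distinct IDs: ID 2 has two B rows touching 2008, but contributes only once to that cell.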
EDIT: If the original data.frame is large, then adply might take a lot of time. A useful alternative in such cases is the data.table package. Here is how we can replace the adply call using data.table.
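library(data.table)
# expand one row at a time: an (ID, ATTRIBUTE) group can hold several rows, and START:END needs scalars
dt = data.table(dfr2)
dfr2 = dt[, list(ID, ATTRIBUTE, YEAR = START:END), by = seq_len(nrow(dt))]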
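If speed matters and an extra dependency is unwanted, the row-wise expansion can also be written without any per-row loop in base R. The sketch below is an assumption-laden illustration, not something from the original answer: `expand_years` is a hypothetical helper, and it assumes that the start and end years are whole numbers with every end year at or after its start year.

```r
# Hypothetical helper: expand year ranges into one row per covered year.
# Assumes end_year >= start_year for every element.
expand_years <- function(start_year, end_year) {
  n <- end_year - start_year + 1L          # number of years covered by each row
  row <- rep(seq_along(n), n)              # index of the originating input row
  year <- start_year[row] + sequence(n) - 1L  # offsets 0..n-1 added to each start
  data.frame(ROW = row, YEAR = year)
}

# Year ranges of the four rows in the question
start_year <- c(2000L, 2001L, 2002L, 2008L)
end_year <- c(2010L, 2002L, 2008L, 2008L)

expanded <- expand_years(start_year, end_year)
print(head(expanded))
```

The `ROW` column can then be used to look up `ID` and `ATTRIBUTE` from the original data before counting.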