Split data by year

Views: 127 | Published: 2017/3/26 1:21:50 | Tags: r, split, dataframe

Problem description

I have data like this:

ID    ATTRIBUTE        START          END
 1            A   01-01-2000   15-03-2010
 1            B   05-11-2001   06-02-2002
 2            B   01-02-2002   08-05-2008
 2            B   01-06-2008   01-07-2008

I now want to count the number of different IDs having a certain attribute per year.

A result could look like this:

YEAR    count(A)    count(B)
2000          1           0
2001          1           1
2002          1           2
2003          1           1
2004          1           1
2005          1           1
2006          1           1
2007          1           1
2008          1           1
2009          1           0
2010          1           0

The second step, counting the occurrences, is probably easy.

But how would I split my data into years?

Thank you in advance!

Solution

Here is an approach using a few of Hadley's packages.

library(lubridate); library(reshape2); library(plyr)

# extract years from start and end dates after converting them to date
dfr2 = transform(dfr, START = year(dmy(START)), END = year(dmy(END)))

# for every row, construct a sequence of years from start to end
dfr2 = adply(dfr2, 1, transform, YEAR = START:END)

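# create pivot table of year vs. attribute with number of unique values of ID
# (current reshape2 names this argument value.var; value_var was the older spelling)
dcast(dfr2, YEAR ~ ATTRIBUTE, fun.aggregate = function(x) length(unique(x)), value.var = 'ID')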
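
For readers who want to check the logic without installing any packages, the same three steps (extract years, expand each row into one row per year, count distinct IDs per year and attribute) can be sketched in base R. This is only an illustration built on the sample data from the question, not the answer's own code; the names year_of, long and counts are made up for the example.

```r
# Sample data from the question (dates are day-month-year strings).
dfr <- data.frame(
  ID        = c(1, 1, 2, 2),
  ATTRIBUTE = c("A", "B", "B", "B"),
  START     = c("01-01-2000", "05-11-2001", "01-02-2002", "01-06-2008"),
  END       = c("15-03-2010", "06-02-2002", "08-05-2008", "01-07-2008"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: extract the calendar year from a dd-mm-yyyy string.
year_of <- function(x) as.integer(format(as.Date(x, format = "%d-%m-%Y"), "%Y"))
start_year <- year_of(dfr$START)
end_year   <- year_of(dfr$END)

# One output row per (input row, year covered by that row).
long <- do.call(rbind, lapply(seq_len(nrow(dfr)), function(i) {
  data.frame(ID = dfr$ID[i], ATTRIBUTE = dfr$ATTRIBUTE[i],
             YEAR = start_year[i]:end_year[i])
}))

# Drop repeated (ID, ATTRIBUTE, YEAR) rows so each ID counts once per year,
# then cross-tabulate year against attribute.
counts <- table(unique(long)[, c("YEAR", "ATTRIBUTE")])
print(counts)
```

The unique() call matters for ID 2, whose two B rows both cover 2008: without it that ID would be counted twice for that year.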

EDIT: If the original data.frame is large, adply can take a long time. A useful alternative in such cases is the data.table package. Here is how to replace the adply call using data.table.

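library(data.table)

# a group can hold several rows (e.g. ID 2 / B), so build one sequence per row
dfr2 = data.table(dfr2)[, list(YEAR = unlist(Map(seq, START, END))), by = list(ID, ATTRIBUTE)]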
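
If adding a dependency is not an option, the row expansion can also be vectorised in base R with rep() and sequence(), which avoids calling a function once per row. The sketch below is an alternative offered for illustration, not part of the original answer. It assumes START and END already hold integer years and that END is never smaller than START; the names spells, n_years, row_idx and expanded are hypothetical.

```r
# Hypothetical input: one row per spell, START and END already reduced to years.
spells <- data.frame(
  ID        = c(1, 1, 2, 2),
  ATTRIBUTE = c("A", "B", "B", "B"),
  START     = c(2000L, 2001L, 2002L, 2008L),
  END       = c(2010L, 2002L, 2008L, 2008L),
  stringsAsFactors = FALSE
)

# Number of years covered by each row (assumes END >= START).
n_years <- spells$END - spells$START + 1L

# Repeat each row index once per year that the row covers.
row_idx <- rep(seq_len(nrow(spells)), times = n_years)

expanded <- data.frame(
  ID        = spells$ID[row_idx],
  ATTRIBUTE = spells$ATTRIBUTE[row_idx],
  # sequence(n_years) restarts at 1 for every row, so this counts up from START
  YEAR      = spells$START[row_idx] + sequence(n_years) - 1L
)
```

The resulting expanded table has the same shape as the output of the adply step, so it can be passed to the same counting step afterwards.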
