Expanding a sequence in a data frame


Question

I have a data frame containing U.S. Presidents with name, starting year in office, and ending year in office. Here is a sample:

name           from  to
Bill Clinton   1993 2001
George W. Bush 2001 2009
Barack Obama   2009 2012

Here is the output from dput:

> dput(tail(presidents,3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
"from", "to"), row.names = 42:44, class = "data.frame")

I want to create a data frame with two columns (name and year) that has one row for each year a president was in office. Here is an example:

name           year
Bill Clinton   1993
Bill Clinton   1994
Bill Clinton   1995
...
George W. Bush 2009
Barack Obama   2009
Barack Obama   2010
Barack Obama   2011
Barack Obama   2012

I know that I can use data.frame(name="Bill Clinton", year=seq(1993,2001)) to expand things for a single president, but I can't figure out how to iterate for each president.

How do I do this? I feel that I should know this, but I'm drawing a blank.

Update 1

OK, I've tried both solutions, and I'm getting an error:

foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1

Solution

You can use the plyr package:

library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
#              name year
# 1    Barack Obama 2009
# 2    Barack Obama 2010
# 3    Barack Obama 2011
# 4    Barack Obama 2012
# 5    Bill Clinton 1993
# 6    Bill Clinton 1994
# [...]
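If you would rather not depend on plyr, here is a base R sketch of the same expansion. It applies the single-president data.frame(name=..., year=seq(...)) call from the question to every row and stacks the pieces with rbind. It rebuilds the three-row presidents sample from the dput output; the names pieces and expanded are just for illustration.

```r
# Three-row sample rebuilt from the dput() output in the question
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012),
  stringsAsFactors = FALSE
)

# For each row, build a small data frame with one row per year in office
# (the single name is recycled to match the length of the year sequence)
pieces <- lapply(seq_len(nrow(presidents)), function(i) {
  data.frame(
    name = presidents$name[i],
    year = seq(presidents$from[i], presidents$to[i]),
    stringsAsFactors = FALSE
  )
})

# Stack all of the per-row pieces into one data frame
expanded <- do.call(rbind, pieces)

head(expanded)
```

Unlike grouping by name with ddply, this expands each row on its own, so a president listed twice (as in Update 1) keeps both terms. Rows come out in input order rather than sorted by name.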

If it is important that the data be sorted by year, you can use the arrange function:

df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, year)
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# 3    Bill Clinton 1995
# [...]
# 21   Barack Obama 2011
# 22   Barack Obama 2012
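In base R, the same reordering can be done with order(), which returns the row positions that sort a column; rows tied on year keep their existing relative order. This is a minimal sketch that uses a hand-typed subset of the name-sorted ddply output shown above; the names df and sorted are illustrative.

```r
# A few rows in the name-sorted order that ddply returns (taken from the output above)
df <- data.frame(
  name = c("Barack Obama", "Barack Obama", "Bill Clinton", "Bill Clinton"),
  year = c(2009, 2010, 1993, 1994),
  stringsAsFactors = FALSE
)

# order(df$year) gives the row indices that put the years in ascending order;
# indexing df with them reorders whole rows
sorted <- df[order(df$year), ]

# Reset the row names so they read 1, 2, 3, ... after reordering
rownames(sorted) <- NULL
sorted
```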

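Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use adply, which accounts for presidents with non-consecutive terms. ddply groups rows by name, so Grover Cleveland's two terms land in the same group and seq() receives from and to vectors of length 2, which triggers the error. adply(foo, 1, ...) instead processes one row at a time: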

adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]
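A vectorised base R alternative, sketched here on the foo data from Update 1, avoids calling a function per row. rep() repeats each name once per year of its term, and Map(seq, from, to) builds the matching year ranges, one term at a time. This is a swapped-in technique, not the adply approach above, and the helper names n_years and expanded are hypothetical.

```r
# foo from Update 1: Grover Cleveland appears twice, with non-consecutive terms
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897),
  stringsAsFactors = FALSE
)

# Number of years each row (term) spans, counting both endpoints
n_years <- foo$to - foo$from + 1

# Repeat each name once per year of its own term, and build the matching
# year sequence term by term; unlist() flattens the list of ranges
expanded <- data.frame(
  name = rep(foo$name, n_years),
  year = unlist(Map(seq, foo$from, foo$to)),
  stringsAsFactors = FALSE
)

expanded
```

Grover Cleveland's two terms come out as 1885 to 1889 and 1893 to 1897, with no years filled in between, because each term is expanded from its own row.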

