Expanding a sequence in a data frame
Question
I have a data frame of U.S. presidents with each president's name, the year they took office, and the year they left office. Here is a sample:
name from to
Bill Clinton 1993 2001
George W. Bush 2001 2009
Barack Obama 2009 2012
Here is the output from dput:
> dput(tail(presidents,3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
"from", "to"), row.names = 42:44, class = "data.frame")
I want to create a data frame with two columns (name and year) that has one row for each year a president was in office. Here is an example:
name year
Bill Clinton 1993
Bill Clinton 1994
Bill Clinton 1995
...
George W. Bush 2009
Barack Obama 2009
Barack Obama 2010
Barack Obama 2011
Barack Obama 2012
I know that I can use data.frame(name="Bill Clinton", year=seq(1993,2001)) to expand the term of a single president, but I can't figure out how to iterate over every president.
How do I do this? I feel that I should know this, but I'm drawing a blank.
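One way to do the per-president iteration without any extra package is to work row by row with base R: repeat each name once per year in office, and concatenate the per-row `seq(from, to)` results. This is a minimal sketch, assuming the three-row sample from the dput() output above; the names `n_years` and `expanded` are illustrative only.

```r
# Sample data rebuilt from the dput() output in the question
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012),
  stringsAsFactors = FALSE
)

# Years in office per row, counting both the first and the last year
n_years <- presidents$to - presidents$from + 1

expanded <- data.frame(
  # Repeat each name once for every year of that row's term
  name = rep(presidents$name, times = n_years),
  # Map() applies seq(from, to) to each row; unlist() flattens the results
  year = unlist(Map(seq, presidents$from, presidents$to)),
  stringsAsFactors = FALSE
)

head(expanded, 3)
```

Because each row is expanded on its own, a president who appears in two separate rows keeps two separate runs of years.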
Update 1
OK, I've tried both solutions, and I'm getting an error:
foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1
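The error arises because grouping by "name" puts both Grover Cleveland rows into the same group, so inside that group `from` and `to` are length-2 vectors, while `seq()` accepts only a single start and end value. A minimal base-R reproduction, assuming the `foo` data from this update (the helper names `cleveland` and `msg` are illustrative):

```r
# Data from Update 1: Grover Cleveland served two non-consecutive terms
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897),
  stringsAsFactors = FALSE
)

# Roughly what the "Grover Cleveland" group looks like when splitting by name
cleveland <- foo[foo$name == "Grover Cleveland", ]
length(cleveland$from)

# Calling seq() with a length-2 start value fails, just as inside ddply()
msg <- tryCatch(seq(cleveland$from, cleveland$to),
                error = function(e) conditionMessage(e))
print(msg)
```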
Answer
You can use the plyr package:
library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
# name year
# 1 Barack Obama 2009
# 2 Barack Obama 2010
# 3 Barack Obama 2011
# 4 Barack Obama 2012
# 5 Bill Clinton 1993
# 6 Bill Clinton 1994
# [...]
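The ddply() call follows a split-apply-combine pattern: split the rows by name, run summarise on each piece, and stack the results. For readers without plyr, a rough base-R equivalent is sketched below, assuming the same three-row sample; it is an illustration of the pattern, not how plyr is implemented internally, and like the ddply() call it assumes each name appears in only one row.

```r
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012),
  stringsAsFactors = FALSE
)

# Split: one data frame per distinct name (groups come out sorted by name)
pieces <- split(presidents, presidents$name)

# Apply: expand each piece into one row per year; d$name is recycled
expanded <- lapply(pieces, function(d) {
  data.frame(name = d$name, year = seq(d$from, d$to), stringsAsFactors = FALSE)
})

# Combine: stack the pieces and drop the list-derived row names
result <- do.call(rbind, expanded)
rownames(result) <- NULL
head(result)
```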
If the data must be sorted by year, you can use the arrange function:
df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, year)
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# 3 Bill Clinton 1995
# [...]
# 21 Barack Obama 2011
# 22 Barack Obama 2012
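The same reordering can be done in base R with `order()`, which returns the row indices that sort the data by year. In this sketch, `df` is rebuilt by hand to mirror the ddply() output above (Obama, then Clinton, then Bush); tied years such as 2001 and 2009 keep whatever order the rows already had, which matches how `order()` resolves ties.

```r
# Hand-built stand-in for the ddply() result: rows grouped by name
df <- data.frame(
  name = c(rep("Barack Obama", 4), rep("Bill Clinton", 9), rep("George W. Bush", 9)),
  year = c(2009:2012, 1993:2001, 2001:2009),
  stringsAsFactors = FALSE
)

# Reorder rows by year; unresolved ties stay in their original order
sorted <- df[order(df$year), ]
rownames(sorted) <- NULL
tail(sorted, 2)
```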
Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use adply, which splits the data frame by row (margin 1) instead of by name, to account for presidents with non-consecutive terms:
adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]
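To see why splitting by row avoids the error, here is a base-R sketch of the same idea: treat every row (every term) as its own group, expand it, then stack the results. It assumes the `foo` data from Update 1; `per_term` and `result` are illustrative names, and this is an analogue of the adply() call rather than plyr's own code.

```r
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897),
  stringsAsFactors = FALSE
)

# One group per row, so the two Cleveland terms are never merged
per_term <- lapply(seq_len(nrow(foo)), function(i) {
  data.frame(name = foo$name[i], year = seq(foo$from[i], foo$to[i]),
             stringsAsFactors = FALSE)
})

# Stack the per-term data frames into a single name/year table
result <- do.call(rbind, per_term)
result
```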