Expanding a sequence in a data frame

Views: 72 Published: 2017/3/25 22:42:15 Tags: r dataframe


Problem description

I have a data frame containing U.S. Presidents with name, starting year in office, and ending year in office. Here is a sample:

name           from  to
Bill Clinton   1993 2001
George W. Bush 2001 2009
Barack Obama   2009 2012

Here is the output from dput:

> dput(tail(presidents,3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
"from", "to"), row.names = 42:44, class = "data.frame")

I want to create a data frame with two columns (name and year) that has one row for each year a president was in office. Here is an example:

name           year
Bill Clinton   1993
Bill Clinton   1994
Bill Clinton   1995
...
George W. Bush 2009
Barack Obama   2009
Barack Obama   2010
Barack Obama   2011
Barack Obama   2012

I know that I can use data.frame(name="Bill Clinton", year=seq(1993,2001)) to expand things for a single president, but I can't figure out how to iterate for each president.

How do I do this? I feel that I should know this, but I'm drawing a blank.

Update 1

OK, I've tried both solutions, and I'm getting an error:

foo <- structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1

Solution

You can use the plyr package:

library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
#              name year
# 1    Barack Obama 2009
# 2    Barack Obama 2010
# 3    Barack Obama 2011
# 4    Barack Obama 2012
# 5    Bill Clinton 1993
# 6    Bill Clinton 1994
# [...]

If it is important that the data be sorted by year, you can use plyr's arrange function:

df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, year)
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# 3    Bill Clinton 1995
# [...]
# 21   Barack Obama 2011
# 22   Barack Obama 2012
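
The same reordering can also be sketched in base R with order(), in case plyr is not available. This is an illustrative alternative rather than part of the original answer; the df below is a hand-built stand-in for the ddply() result, assuming the same name and year columns.

```r
# Hand-built stand-in for the ddply() result above (rows grouped by name).
df <- data.frame(
  name = rep(c("Barack Obama", "Bill Clinton", "George W. Bush"),
             times = c(4, 9, 9)),
  year = c(2009:2012, 1993:2001, 2001:2009),
  stringsAsFactors = FALSE
)

# order() returns the row permutation that sorts by year; indexing the
# data frame with it reorders the rows.
sorted <- df[order(df$year), ]
rownames(sorted) <- NULL  # renumber the rows 1..n after reordering
```

Note that 2001 and 2009 each appear for two presidents, so sorting on year alone does not by itself decide which of the two comes first.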

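Edit 1: Following @edgester's "Update 1": grouping by name puts both of Grover Cleveland's terms into a single group, so seq() receives from and to vectors of length 2 and fails. A more appropriate approach is to use adply, which processes the data frame row by row, to account for presidents with non-consecutive terms: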

adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]
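
For comparison, the same row-by-row expansion can be sketched in base R without plyr. This is an illustrative alternative, not part of the original answer: the helper name expand_years is hypothetical, and it assumes the data frame has columns named name, from and to.

```r
# Data from "Update 1": Grover Cleveland served two non-consecutive terms.
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one year sequence per row, then repeat each
# name once for every year in that row's sequence.
expand_years <- function(df) {
  years <- Map(seq, df$from, df$to)  # list with one vector of years per row
  data.frame(
    name = rep(df$name, lengths(years)),
    year = unlist(years),
    stringsAsFactors = FALSE
  )
}

result <- expand_years(foo)
```

Because each row is expanded on its own, the two Cleveland terms stay separate and seq() always receives a single from and a single to.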
