扩展由“来自”定义的范围和“至”列 [英] Expand ranges defined by "from" and "to" columns

查看:58
本文介绍了扩展由“来自”定义的范围和“至”列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据框,其中包含美国总统的姓名 ,即他们任职开始和结束的年份,( from 列)。这是一个示例:

I have a data frame containing "name" of U.S. Presidents, the years when they start and end in office, ("from" and "to" columns). Here is a sample:

name           from  to
Bill Clinton   1993 2001
George W. Bush 2001 2009
Barack Obama   2009 2012

...以及的输出dput

dput(tail(presidents, 3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
"from", "to"), row.names = 42:44, class = "data.frame")

我想创建具有两列的数据框(名称 ),每年总统上任的行数。因此,我需要创建一个常规序列,每年从 。这是我的期望值:

I want to create data frame with two columns ("name" and "year"), with a row for each year that a president was in office. Thus, I need to create a regular sequence with each year from "from", to "to". Here's my expected out:

name           year
Bill Clinton   1993
Bill Clinton   1994
...
Bill Clinton   2000
Bill Clinton   2001
George W. Bush 2001
George W. Bush 2002
... 
George W. Bush 2008
George W. Bush 2009
Barack Obama   2009
Barack Obama   2010
Barack Obama   2011
Barack Obama   2012

我知道我可以使用 data.frame(name = Bill Clinton,year = seq(1993,2001))可以扩展单个总统的职位,但我不知道如何为每个总统迭代。

I know that I can use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand things for a single president, but I can't figure out how to iterate for each president.

我该怎么做?我觉得我应该知道这一点,但我在画一个空白。

How do I do this? I feel that I should know this, but I'm drawing a blank.

好,我已经尝试了两种解决方案,但都遇到错误:

OK, I've tried both solutions, and I'm getting an error:

foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1


推荐答案

您可以使用 plyr 软件包:

library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
#              name year
# 1    Barack Obama 2009
# 2    Barack Obama 2010
# 3    Barack Obama 2011
# 4    Barack Obama 2012
# 5    Bill Clinton 1993
# 6    Bill Clinton 1994
# [...]

a如果重要的是按年份对数据进行排序,则可以使用 arrange 函数:

and if it is important that the data be sorted by year, you can use the arrange function:

df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, df$year)
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# 3    Bill Clinton 1995
# [...]
# 21   Barack Obama 2011
# 22   Barack Obama 2012

编辑1:继@edgester的 Update 1之后,一种更合适的方法是使用 adply 来解释具有非连续性条款的总裁:

Edit 1: Following's @edgester's "Update 1", a more appropriate approach is to use adply to account for presidents with non-consecutive terms:

adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]

这篇关于扩展由“来自”定义的范围和“至”列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆