Expand ranges defined by "from" and "to" columns
Question
I have a data frame containing the "name" of each U.S. President and the years when they started and ended their time in office (the "from" and "to" columns). Here is a sample:
name from to
Bill Clinton 1993 2001
George W. Bush 2001 2009
Barack Obama 2009 2012
...and the output of dput:
dput(tail(presidents, 3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
"from", "to"), row.names = 42:44, class = "data.frame")
I want to create a data frame with two columns ("name" and "year"), with a row for each year that a president was in office. Thus, I need to create a regular sequence with each year from "from" to "to". Here's my expected output:
name year
Bill Clinton 1993
Bill Clinton 1994
...
Bill Clinton 2000
Bill Clinton 2001
George W. Bush 2001
George W. Bush 2002
...
George W. Bush 2008
George W. Bush 2009
Barack Obama 2009
Barack Obama 2010
Barack Obama 2011
Barack Obama 2012
I know that I can use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand things for a single president, but I can't figure out how to iterate over each president.
How do I do this? I feel that I should know this, but I'm drawing a blank.
OK, I've tried both solutions, and I'm getting an error:
foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1
Recommended answer
You can use the plyr package:
library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
# name year
# 1 Barack Obama 2009
# 2 Barack Obama 2010
# 3 Barack Obama 2011
# 4 Barack Obama 2012
# 5 Bill Clinton 1993
# 6 Bill Clinton 1994
# [...]
And if it is important that the data be sorted by year, you can use the arrange function:
df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, year)
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# 3 Bill Clinton 1995
# [...]
# 21 Barack Obama 2011
# 22 Barack Obama 2012
Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use adply, which works row by row and so accounts for presidents with non-consecutive terms:
adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]
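For comparison, here is a minimal base R sketch of the same row-wise idea without plyr. It is not part of the original answer, and the names years and expanded are made up for illustration. It rebuilds foo from the question, creates one year sequence per row (rather than per name), and repeats each name to match:

```r
# Sample data from the question: Grover Cleveland appears in two separate rows.
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897),
  stringsAsFactors = FALSE
)

# One year sequence per row, so seq() always gets a single 'from' and 'to'.
# (Illustrative variable name, not from the original post.)
years <- Map(seq, foo$from, foo$to)

# Repeat each name once per year in its row's sequence, then flatten the years.
expanded <- data.frame(
  name = rep(foo$name, lengths(years)),
  year = unlist(years),
  stringsAsFactors = FALSE
)

head(expanded, 3)
```

Because the sequences are built per row, a name that occurs more than once never hands seq() a vector of several start years, which is what triggered the "'from' must be of length 1" error when grouping by name.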