How to create conditional dummies "before the event" with dplyr in R?
Problem description
I'm trying to create a conditional dummy (X) with the rule:
set X = 1 if Y = 1 in the last two years before the NA (only count it once!).
To give an example: this is a sample from my data:
year country Y
1990 Bahamas 1
1991 Bahamas NA
1992 Bahamas NA
1993 Bahamas 0
1994 Bahamas 1
1995 Bahamas 1
1996 Bahamas NA
1997 Bahamas 1
1998 Bahamas NA
1999 Bahamas 1
2000 Bahamas NA
2001 Bahamas 1
2002 Bahamas 1
2003 Bahamas 0
2004 Bahamas NA
2005 Bahamas 0
2006 Bahamas 0
2007 Bahamas 1
2008 Bahamas NA
2009 Bahamas 1
2010 Bahamas 1
2011 Bahamas 1
This is what the X dummy should look like:
year country Y X1
1990 Bahamas 1 1
1991 Bahamas NA 0
1992 Bahamas NA 0
1993 Bahamas 0 0
1994 Bahamas 1 1
1995 Bahamas 1 0
1996 Bahamas NA 0
1997 Bahamas 1 1
1998 Bahamas NA 0
1999 Bahamas 1 1
2000 Bahamas NA 0
2001 Bahamas 1 1
2002 Bahamas 1 0
2003 Bahamas 0 0
2004 Bahamas NA 0
2005 Bahamas 0 0
2006 Bahamas 0 0
2007 Bahamas 1 1
2008 Bahamas NA 0
2009 Bahamas 1 0
2010 Bahamas 1 0
2011 Bahamas 1 0
This is a bit too complicated for me. I've been reading about dplyr, which seems to be a relevant package here. My reading has so far taken me to this code:
df %>% mutate(X=ifelse(Y >0) & lag(Y,2,))
I get the error:
argument "yes" is missing, with no default
Please tell me what I am doing wrong here. Should I put the "ifelse" before the "lag" as well?
Thanks.
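About the error in the question: in ifelse(Y >0) the closing parenthesis comes right after the test, so ifelse() never receives its yes and no arguments, which is what "argument "yes" is missing" reports. lag() belongs inside the test, not outside ifelse(). The sketch below only shows the call shape on a made-up vector (toy and the lag(Y, 2) condition are illustrative assumptions); it does not implement the "before the NA" rule, which the answer handles.

```r
library(dplyr)

# Made-up vector, only to illustrate the syntax
toy <- tibble(Y = c(1, NA, 1, 1, 0))

out <- toy %>%
  mutate(
    # ifelse(test, yes, no): the test, the value for TRUE and the value for
    # FALSE all go inside the same pair of parentheses.
    # lag(Y, 2) is the value two rows earlier (NA for the first two rows).
    X = ifelse(Y > 0 & lag(Y, 2) > 0, 1, 0)
  )

out$X
# [1] NA NA  1 NA  0
```

Rows where Y or lag(Y, 2) is NA come out as NA rather than 0; wrapping the test in coalesce(..., FALSE) would turn those into 0.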
Recommended answer
library(dplyr)
dat <- readr::read_table(
"year country Y
1990 Bahamas 1
1991 Bahamas NA
1992 Bahamas NA
1993 Bahamas 0
1994 Bahamas 1
1995 Bahamas 1
1996 Bahamas NA
1997 Bahamas 1
1998 Bahamas NA
1999 Bahamas 1
2000 Bahamas NA
2001 Bahamas 1
2002 Bahamas 1
2003 Bahamas 0
2004 Bahamas NA
2005 Bahamas 0
2006 Bahamas 0
2007 Bahamas 1
2008 Bahamas NA
2009 Bahamas 1
2010 Bahamas 1
2011 Bahamas 1
")
expected_output <- readr::read_table(
"year country Y X1
1990 Bahamas 1 1
1991 Bahamas NA 0
1992 Bahamas NA 0
1993 Bahamas 0 0
1994 Bahamas 1 1
1995 Bahamas 1 0
1996 Bahamas NA 0
1997 Bahamas 1 1
1998 Bahamas NA 0
1999 Bahamas 1 1
2000 Bahamas NA 0
2001 Bahamas 1 1
2002 Bahamas 1 0
2003 Bahamas 0 0
2004 Bahamas NA 0
2005 Bahamas 0 0
2006 Bahamas 0 0
2007 Bahamas 1 1
2008 Bahamas NA 0
2009 Bahamas 1 0
2010 Bahamas 1 0
2011 Bahamas 1 0
")
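Identify the groups ending with NA (a new group starts on the row right after each NA), find the position of the first 1 in the Y column of each group, and create the X1 column with 1s at the found positions. Only groups that contain an NA and have a 1 among their last three rows (the NA row plus the two years before it) are marked: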
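res <-
  dat %>%
  group_by(country) %>%
  # a new group starts on the row after each NA, so each group ends with its NA
  group_by(grp = cumsum(is.na(lag(Y))), .add = TRUE) %>%
  # position of the first 1 in the group; 0 or NA if the group has no NA
  # or no 1 among its last three rows (the NA row and the two years before it)
  mutate(first_year_at_1 = match(1, Y) * any(is.na(Y)) * any(tail(Y, 3) == 1L),
         # put a single 1 at that position; x[0] and x[NA] change nothing
         X1 = {x <- integer(length(Y)) ; x[first_year_at_1] <- 1L ; x}) %>%
  ungroup()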
all.equal(select(res, -grp, -first_year_at_1), expected_output)
# [1] TRUE
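To see what each step of the pipeline does, the grouping and the per-group rule can be run on the bare Y vector from the question. The helper mark_first_one() is hypothetical, written only for this illustration; its body is the same expression the answer uses inside mutate().

```r
library(dplyr)

# Step 1: how the answer forms groups. A new group starts on the row right
# after each NA, so every group that contains an NA ends with it.
Y <- c(1, NA, NA, 0, 1, 1, NA, 1, NA, 1, NA, 1, 1, 0, NA, 0, 0, 1, NA, 1, 1, 1)
grp <- cumsum(is.na(lag(Y)))
grp
# [1] 1 1 2 3 3 3 3 4 4 5 5 6 6 6 6 7 7 7 7 8 8 8

# Step 2: the per-group rule, pulled out into a hypothetical helper
mark_first_one <- function(y) {
  # index of the first 1, or 0/NA when the group does not qualify
  pos <- match(1, y) * any(is.na(y)) * any(tail(y, 3) == 1L)
  x <- integer(length(y))
  x[pos] <- 1L  # x[0] and x[NA] leave x unchanged
  x
}

# Applying it to every group reproduces the X1 column of the expected output
X1 <- unlist(lapply(split(Y, grp), mark_first_one), use.names = FALSE)
X1
# [1] 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0
```

Note that match(1, y) searches the whole group, not only its last rows, so for a group like c(1, 0, 1, NA) the helper (and the answer) marks the first row rather than the 1 just before the NA.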
(Note: if there are different countries in the real dataset, you might want to group by country first to avoid undesirable effects at the junction of countries. I edited my answer accordingly.)
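A small made-up two-country example shows the kind of junction effect the note refers to. The data and the flag() wrapper are hypothetical; flag() repeats the answer's pipeline without the initial group_by(country), so it can be called on grouped or ungrouped input. It uses .add, the argument name in dplyr 1.0.0 and later.

```r
library(dplyr)

# Hypothetical data: country A ends without an NA, country B ends with one
two <- tibble(
  country = c("A", "A", "B", "B"),
  Y       = c(1, 1, 0, NA)
)

# The answer's pipeline minus group_by(country); returns the X1 column
flag <- function(df) {
  df %>%
    group_by(grp = cumsum(is.na(lag(Y))), .add = TRUE) %>%
    mutate(first_year_at_1 = match(1, Y) * any(is.na(Y)) * any(tail(Y, 3) == 1L),
           X1 = {x <- integer(length(Y)); x[first_year_at_1] <- 1L; x}) %>%
    ungroup() %>%
    pull(X1)
}

# Without country grouping, A's rows fall into the same group as B's NA,
# so A's first year gets flagged
flag(two)
# [1] 1 0 0 0

# Grouping by country first (as in the answer) keeps the countries apart
flag(group_by(two, country))
# [1] 0 0 0 0
```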