How to create conditional dummies "before the event" with dplyr in R?

Question

I'm trying to create a conditional dummy (X) with the rule

set X = 1 if Y = 1 in the last two years before the NA (count only once!).

To give an example, this is a sample from my data:

year    country Y
1990    Bahamas 1
1991    Bahamas NA
1992    Bahamas NA
1993    Bahamas 0
1994    Bahamas 1
1995    Bahamas 1
1996    Bahamas NA
1997    Bahamas 1
1998    Bahamas NA
1999    Bahamas 1
2000    Bahamas NA
2001    Bahamas 1
2002    Bahamas 1
2003    Bahamas 0
2004    Bahamas NA
2005    Bahamas 0
2006    Bahamas 0
2007    Bahamas 1
2008    Bahamas NA
2009    Bahamas 1
2010    Bahamas 1
2011    Bahamas 1

This is how the X dummy should look:

year    country Y   X1
1990    Bahamas 1   1
1991    Bahamas NA  0
1992    Bahamas NA  0
1993    Bahamas 0   0
1994    Bahamas 1   1
1995    Bahamas 1   0
1996    Bahamas NA  0
1997    Bahamas 1   1
1998    Bahamas NA  0
1999    Bahamas 1   1
2000    Bahamas NA  0
2001    Bahamas 1   1
2002    Bahamas 1   0
2003    Bahamas 0   0
2004    Bahamas NA  0
2005    Bahamas 0   0
2006    Bahamas 0   0
2007    Bahamas 1   1
2008    Bahamas NA  0
2009    Bahamas 1   0
2010    Bahamas 1   0
2011    Bahamas 1   0

This is a bit too complicated for me. I've been reading about dplyr, which seems to be a relevant package here. My reading has so far taken me to this code:

df %>% mutate(X=ifelse(Y >0) & lag(Y,2,))

I get the error:

argument "yes" is missing, with no default

Please tell me what I am doing wrong here. Should I put the "ifelse" before the "lag" as well?

Thanks.

Recommended answer

library(dplyr)

dat <- readr::read_table(
"year    country Y
1990    Bahamas 1
1991    Bahamas NA
1992    Bahamas NA
1993    Bahamas 0
1994    Bahamas 1
1995    Bahamas 1
1996    Bahamas NA
1997    Bahamas 1
1998    Bahamas NA
1999    Bahamas 1
2000    Bahamas NA
2001    Bahamas 1
2002    Bahamas 1
2003    Bahamas 0
2004    Bahamas NA
2005    Bahamas 0
2006    Bahamas 0
2007    Bahamas 1
2008    Bahamas NA
2009    Bahamas 1
2010    Bahamas 1
2011    Bahamas 1
")

expected_output <- readr::read_table(
"year    country Y   X1
1990    Bahamas 1   1
1991    Bahamas NA  0
1992    Bahamas NA  0
1993    Bahamas 0   0
1994    Bahamas 1   1
1995    Bahamas 1   0
1996    Bahamas NA  0
1997    Bahamas 1   1
1998    Bahamas NA  0
1999    Bahamas 1   1
2000    Bahamas NA  0
2001    Bahamas 1   1
2002    Bahamas 1   0
2003    Bahamas 0   0
2004    Bahamas NA  0
2005    Bahamas 0   0
2006    Bahamas 0   0
2007    Bahamas 1   1
2008    Bahamas NA  0
2009    Bahamas 1   0
2010    Bahamas 1   0
2011    Bahamas 1   0
")

Identify the groups ending with NA, find the position of the first 1 in the Y column of each group, and create the X1 column with 1s at the positions found:

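# Start a new group after every NA (within each country), so each group
# runs up to and includes its closing NA.
res <-
  dat %>% 
  group_by(country) %>% 
  group_by(grp = cumsum(is.na(lag(Y))), .add = TRUE) %>%  # `.add` replaces `add` in dplyr >= 1.0.0
  # Position of the first 1, kept only when the group contains an NA
  # and has a 1 among its last three rows.
  mutate(first_year_at_1 = match(1, Y) * any(is.na(Y)) * any(tail(Y, 3) == 1L), 
         X1 = {x <- integer(length(Y)) ; x[first_year_at_1] <- 1L ; x}) %>% 
  ungroup()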

all.equal(select(res, -grp, -first_year_at_1), expected_output)

# [1] TRUE
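
To see how the grouping key is built, here is a minimal sketch on a short vector. The vector `y` is made up for this illustration and is not the question's data. `lag()` shifts the values down by one row, so `is.na(lag(y))` is TRUE on the first row and on every row that follows an NA; `cumsum()` turns those flags into a running group id, so each group ends at an NA. The last part shows why a trailing group without an NA gets no 1: the position is multiplied by FALSE, and assigning at index 0 changes nothing.

```r
library(dplyr)

# Illustrative vector (an assumption, not the question's data)
y <- c(0, 1, 1, NA, 1, NA, 1)

# TRUE on the first row and on each row right after an NA
starts <- is.na(lag(y))

# Running count of group starts: one id per run that ends with an NA
grp <- cumsum(starts)
grp
# [1] 1 1 1 1 2 2 3

# Last group (no NA): the first-1 position is multiplied by FALSE, giving 0
g <- y[grp == 3]
pos <- match(1, g) * any(is.na(g))
x <- integer(length(g))
x[pos] <- 1L   # assigning at index 0 leaves x unchanged
x
# [1] 0
```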

(Note: if the real dataset contains several countries, you will want to group by country first, to avoid unwanted effects where one country's rows end and the next one's begin. I edited my answer accordingly.)
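
As for the error in the question: `ifelse()` takes three arguments, `test`, `yes` and `no`, and the call `ifelse(Y > 0)` supplies only the first, which is why R reports that `yes` is missing. The sketch below shows a well-formed call on an illustrative vector (made up for this example). It only demonstrates the syntax and does not by itself implement the "before the NA" rule.

```r
# Illustrative vector (an assumption, not the question's data)
y <- c(1, NA, 0, 1, 1)

# All three arguments supplied: test, yes, no.
# !is.na(y) keeps NA rows from producing NA in the result.
x <- ifelse(!is.na(y) & y > 0, 1L, 0L)
x
# [1] 1 0 0 1 1
```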
