How to create a "conditional" variable in R?

Question

I want to create a conditional dummy variable. Assume that I have a dataset that looks something like this:

Subject Year    X   X1
   A    1990    1   0
   A    1991    1   0
   A    1992    2   0
   A    1993    3   0
   A    1994    4   0
   A    1995    4   1
   B    1990    0   0
   B    1991    1   0
   B    1992    1   0
   B    1993    2   0
   B    1994    3   0
   C    1990    1   0
   C    1991    2   0
   C    1992    3   1
   C    1993    3   0
   D    1990    1   0
   D    1991    2   0
   D    1992    3   0
   D    1993    4   1
   D    1994    5   0
   E    1990    1   0
   E    1991    1   0
   E    1992    2   1
   E    1993    3   0

Let's call this conditional variable Q1to3_noX1. Another variable of interest is Q1to3.

The Q1to3 variable is also a dummy variable: for each Subject it is 1 when X has reached the value 3, and 0 otherwise. If X reaches 4 or more, Q1to3 should be 0. X is a cumulative variable (0, 1, 2, 3, 4, ...). In other words, Q1to3 is 1 if the subject's maximum X value is 3.

I created this variable using: data$Q1to3 <- ave(data$X, data$Subject, FUN = function(x) if (max(x) == 3) 1 else 0) (thanks to @Zelazny7).
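
As a minimal sketch of what that call does, the snippet below runs ave on a small made-up vector. The toy values and the names x, g and flag are illustrative only and do not come from the dataset. ave applies the function within each group and recycles the single result across that group's rows, so the output has the same length as the input.

```r
# Toy data (hypothetical): group "a" peaks at 3, group "b" peaks at 4
x <- c(1, 2, 3, 1, 3, 4)
g <- c("a", "a", "a", "b", "b", "b")

# One value per group, broadcast back to every row of that group
flag <- ave(x, g, FUN = function(v) if (max(v) == 3) 1 else 0)
flag
```

Here every row of group "a" gets 1 and every row of group "b" gets 0, which is the same per-Subject behaviour as the Q1to3 column.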

The Q1to3_noX1 variable is very similar to Q1to3, but in contrast to Q1to3 it is conditional on the X1 variable. To be more precise, if X1 = 1 in the following 5 years (counting from the first year of Q1to3), Q1to3_noX1 should be 0. In other words, Q1to3_noX1 should be 1 if a) the maximum X value is 3 and b) X1 = 0 in the following 5 years; otherwise it should be 0.

I understand from this question that I should use the rle function. However, I haven't been able to apply it in this particular case. Do you have any suggestions?

The desired outcome should look like this:

Subject Year    X   X1  Q1to3   Q1to3_noX1
   A    1990    1   0   0          0
   A    1991    1   0   0          0
   A    1992    2   0   0          0
   A    1993    3   0   0          0
   A    1994    4   0   0          0
   A    1995    4   1   0          0
   B    1990    0   0   1          0
   B    1991    1   0   1          1
   B    1992    1   0   1          1
   B    1993    2   0   1          1
   B    1994    3   0   1          1
   C    1990    1   0   1          0
   C    1991    2   0   1          0
   C    1992    3   1   1          0
   C    1993    3   0   1          0
   D    1990    1   0   0          0
   D    1991    2   0   0          0
   D    1992    3   0   0          0
   D    1993    4   1   0          0
   D    1994    5   0   0          0
   E    1990    1   0   1          0
   E    1991    1   0   1          0
   E    1992    2   1   1          0
   E    1993    3   0   1          0

Reproducible example:

    > dput(data)
structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 
5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), 
    Year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1990L, 
    1991L, 1992L, 1993L, 1994L, 1990L, 1991L, 1992L, 1993L, 1990L, 
    1991L, 1992L, 1993L, 1994L, 1990L, 1991L, 1992L, 1993L), 
    X = c(1L, 1L, 2L, 3L, 4L, 4L, 0L, 1L, 1L, 2L, 3L, 1L, 2L, 
    3L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 1L, 2L, 3L), X1 = c(0L, 0L, 
    0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
    0L, 1L, 0L, 0L, 0L, 1L, 0L), Q1to3 = c(0L, 0L, 0L, 0L, 0L, 
    0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 
    1L, 1L, 1L, 1L), Q1to3_noX1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L)), .Names = c("Subject", "Year", "X", "X1", "Q1to3", 
"Q1to3_noX1"), class = "data.frame", row.names = c(NA, -24L))


Recommended answer

Here's another example using base R. I'm not 100% sure I understand the exact details of the question, but this pattern should solve your problem.

ave is great for broadcasting a summarized vector back to the original dimensions of the data. But if you look at the function body of ave, it is just using split under the hood. We can do the same and create multiple columns per chunk instead of just one:

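# df holds the input columns of the question's data.frame (called `data` there)
df <- data[, c("Subject", "Year", "X", "X1")]

# split the data.frame by subject
s <- split(df, df$Subject)

## calculate both columns at once per subject
both <- lapply(s, function(chunk) {
  Q1to3 <- if (max(chunk$X) == 3) 1 else 0
  # && because the condition inside if() is a single logical value
  Q1to3_noX1 <- if (Q1to3 == 1 && all(chunk$X1 == 0)) 1 else 0
  data.frame(Q1to3, Q1to3_noX1)
})

## cbind them back together and unsplit
out <- unsplit(Map(cbind, s, both), df$Subject)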
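
To see how this pattern compares with the desired outcome in the question, the following self-contained sketch rebuilds the example data, applies the same split / lapply / unsplit steps, and lists the rows where the result differs from the desired Q1to3_noX1 column. The compact data.frame() construction and the names dat, want and diff_rows are my own additions. The values are copied from the dput output in the question.

```r
# Rebuild the question's input columns (values copied from the dput output)
dat <- data.frame(
  Subject = rep(c("A", "B", "C", "D", "E"), times = c(6, 5, 4, 5, 4)),
  Year    = c(1990:1995, 1990:1994, 1990:1993, 1990:1994, 1990:1993),
  X       = c(1, 1, 2, 3, 4, 4,  0, 1, 1, 2, 3,  1, 2, 3, 3,
              1, 2, 3, 4, 5,  1, 1, 2, 3),
  X1      = c(0, 0, 0, 0, 0, 1,  0, 0, 0, 0, 0,  0, 0, 1, 0,
              0, 0, 0, 1, 0,  0, 0, 1, 0)
)

# Desired Q1to3_noX1 column, as listed in the question
want <- c(rep(0, 6), 0, 1, 1, 1, 1, rep(0, 13))

# Same pattern as above: one summary row per subject, recycled by cbind
s <- split(dat, dat$Subject)
both <- lapply(s, function(chunk) {
  Q1to3 <- if (max(chunk$X) == 3) 1 else 0
  Q1to3_noX1 <- if (Q1to3 == 1 && all(chunk$X1 == 0)) 1 else 0
  data.frame(Q1to3, Q1to3_noX1)
})
out <- unsplit(Map(cbind, s, both), dat$Subject)

# Rows where the computed flag differs from the desired one
diff_rows <- out[out$Q1to3_noX1 != want, c("Subject", "Year")]
diff_rows
```

Under these assumptions the only differing row is subject B in 1990, where the desired table shows 0 while the per-subject rule gives 1. Whether that row is intentional is not clear from the question. Note also that this rule checks X1 over all of a subject's years and does not implement the five-year window literally.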
