How to generate a "range" variable in R?
I have a dataset that looks something like this:
    Subject Year X
    A       1990 1
    A       1991 1
    A       1992 2
    A       1993 3
    A       1994 4
    A       1995 4
    B       1990 0
    B       1991 1
    B       1992 1
    B       1993 2
    C       1991 1
    C       1992 2
    C       1993 3
    C       1994 3
    D       1991 1
    D       1992 2
    D       1993 3
    D       1994 4
    D       1995 5
    D       1996 5
    D       1997 6
I want to generate a binary (0/1) variable (let's call it A) that indicates, for each Subject, whether the X variable has reached 3 (or stayed within 1-3). If X has reached 4 or more for a Subject, A should not capture it, so A should be 0 for that Subject.
It should look like this:
    Subject Year X A
    A       1990 1 0
    A       1991 1 0
    A       1992 2 0
    A       1993 3 0
    A       1994 4 0
    A       1995 4 0
    B       1990 0 0
    B       1991 1 0
    B       1992 1 0
    B       1993 2 0
    C       1991 1 1
    C       1992 2 1
    C       1993 3 1
    C       1994 3 1
    D       1991 1 0
    D       1992 2 0
    D       1993 3 0
    D       1994 4 0
    D       1995 5 0
    D       1996 5 0
    D       1997 6 0
I tried the following:
    mydata$A <- as.numeric(mydata$X %in% 1:3)
but it doesn't account for X continuing past 3 within the same Subject.
A reproducible sample:
    > dput(mydata)
    structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
    .Label = c("A", "B", "C", "D"), class = "factor"),
    Year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1990L, 1991L,
    1992L, 1993L, 1991L, 1992L, 1993L, 1994L, 1991L, 1992L, 1993L,
    1994L, 1995L, 1996L, 1997L),
    X = c(1L, 1L, 2L, 3L, 4L, 4L, 0L, 1L, 1L, 2L, 1L, 2L, 3L, 3L,
    1L, 2L, 3L, 4L, 5L, 5L, 6L)),
    .Names = c("Subject", "Year", "X"), class = "data.frame",
    row.names = c(NA, -21L))
All suggestions are welcome – thanks!
Solution
Here's a base R one-liner using ave:
    df$A <- ave(df$X, df$Subject, FUN = function(x) if (max(x) == 3) 1 else 0)
    > df
       Subject Year X A
    1        A 1990 1 0
    2        A 1991 1 0
    3        A 1992 2 0
    4        A 1993 3 0
    5        A 1994 4 0
    6        A 1995 4 0
    7        B 1990 0 0
    8        B 1991 1 0
    9        B 1992 1 0
    10       B 1993 2 0
    11       C 1991 1 1
    12       C 1992 2 1
    13       C 1993 3 1
    14       C 1994 3 1
    15       D 1991 1 0
    16       D 1992 2 0
    17       D 1993 3 0
    18       D 1994 4 0
    19       D 1995 5 0
    20       D 1996 5 0
    21       D 1997 6 0
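The same idea can be spelled out in two steps, which may make it easier to see why the row-wise %in% attempt from the question falls short. This is only a sketch, not part of the original answer: the names row_wise, threshold and group_max are made up for illustration, and the data frame is rebuilt by hand with the same values as the dput output.

```r
# Rebuild the sample data from the question (same values as the dput output).
df <- data.frame(
  Subject = factor(rep(c("A", "B", "C", "D"), times = c(6, 4, 4, 7))),
  Year    = c(1990:1995, 1990:1993, 1991:1994, 1991:1997),
  X       = c(1L, 1L, 2L, 3L, 4L, 4L,
              0L, 1L, 1L, 2L,
              1L, 2L, 3L, 3L,
              1L, 2L, 3L, 4L, 5L, 5L, 6L)
)

# Row-wise attempt from the question: flags every row whose X is in 1..3,
# regardless of what that subject reaches in later years.
row_wise <- as.numeric(df$X %in% 1:3)

# Group-wise version: compute each subject's maximum, let ave() repeat it on
# every row of that subject, then compare against the target value.
threshold <- 3L  # hypothetical name for the value X must peak at
group_max <- ave(df$X, df$Subject, FUN = max)
df$A <- as.integer(group_max == threshold)

# One value of A per subject, as a quick check.
tapply(df$A, df$Subject, max)
```

With the sample data, only subject C ends up with A equal to 1, which matches the output shown in the answer. Splitting the group maximum out into its own vector is a design choice: it also makes it simple to change the condition later, for example to a range check on group_max.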