Create a matrix of dummy variables from my data frame; use `NA` for missing values
Question
I have data indexed by year, with the same sequence of years repeated several times. I want the output to have one column per year. The goal is to create a separate dummy variable for each year. For example, the output column for 2000 must be 1 whenever the row belongs to 2000 and has a non-NA observation in the main data, and 0 otherwise. Moreover, rows whose observation is NA must stay NA. Please see a small sample of the input data below:
df:
2000 NA
2001 NA
2002 -1.3
2000 1.1
2001 0
2002 NA
2000 -3
2001 3
2002 4.1
The output should be:
df1:
2000 2001 2002
NA NA NA
NA NA NA
0 0 1
1 0 0
0 1 0
NA NA NA
1 0 0
0 1 0
0 0 1
I would prefer to get this output with a `for` loop, if possible. Otherwise, any simpler approach would be appreciated.
Answer
No loop is needed. We can use `model.matrix`:
## your data variable and NA index
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
na_id <- is.na(x)
## code your year variable as a factor
year <- factor(rep(2000:2002, 3))
## original model matrix; drop intercept to disable contrast
X <- model.matrix(~ year - 1)
# year2000 year2001 year2002
#1 1 0 0
#2 0 1 0
#3 0 0 1
#4 1 0 0
#5 0 1 0
#6 0 0 1
#7 1 0 0
#8 0 1 0
#9 0 0 1
## put NA in every column of the rows where `x` is NA (the length-9 logical index is recycled across all 3 columns)
X[na_id] <- NA
# year2000 year2001 year2002
#1 NA NA NA
#2 NA NA NA
#3 0 0 1
#4 1 0 0
#5 0 1 0
#6 NA NA NA
#7 1 0 0
#8 0 1 0
#9 0 0 1
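For comparison, the `for` loop the question asked for can build the same matrix. This is a minimal sketch, assuming a plain numeric year vector `yr` (a name introduced here for illustration) alongside the `x` values from above. It loops once per distinct year, filling one dummy column at a time, and then blanks out the rows whose observation is missing.

```r
# Same sample data as above; `yr` is an assumed plain numeric year vector.
x  <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
yr <- rep(2000:2002, 3)

years <- sort(unique(yr))

# Pre-allocate: one row per observation, one column per distinct year.
D <- matrix(0, nrow = length(x), ncol = length(years),
            dimnames = list(NULL, as.character(years)))

# Fill one dummy column per year: 1 if the row belongs to that year, else 0.
for (j in seq_along(years)) {
  D[, j] <- as.numeric(yr == years[j])
}

# Rows where the observation is missing become NA in every column.
D[is.na(x), ] <- NA
D
```

The loop runs once per year rather than once per row, so it stays short; `model.matrix` does the same work in a single call.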
Matrix `X` carries some extra attributes (`assign` and `contrasts`). You can drop them if you want:
attr(X, "assign") <- attr(X, "contrasts") <- NULL
You can also rename the columns of this matrix, for example:
colnames(X) <- 2000:2002
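Since the question starts from a data frame, the steps above can be wrapped into one function. This is a sketch only: `year_dummies` is a hypothetical helper, and it assumes the data frame has columns named `year` and `value` (the sample shows no column names). It also assumes the year column has no missing values, because `model.matrix()` drops rows with a missing year under the default `na.action`, which would misalign the NA step.

```r
# Hypothetical helper; assumes columns named `year` and `value`, and no NA years
# (model.matrix() would silently drop rows with a missing year).
year_dummies <- function(df, year_col = "year", value_col = "value") {
  year <- factor(df[[year_col]])
  X <- model.matrix(~ year - 1)              # one 0/1 column per year, no intercept
  X[is.na(df[[value_col]]), ] <- NA          # missing value -> NA in every column
  attr(X, "assign") <- attr(X, "contrasts") <- NULL
  colnames(X) <- levels(year)                # "year2000" -> "2000", etc.
  X
}

df <- data.frame(year  = rep(2000:2002, 3),
                 value = c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1))
df1 <- year_dummies(df)
df1
```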