Create a matrix of dummy variables from my data frame; use `NA` for missing values


Question

I have data indexed by year, with the sequence of years repeated several times. I want the output to have one column per year. The goal is to create a separate dummy variable for each year. For example, the output column for year 2000 must have the value "1" in every row where the main data has a non-NA observation for year 2000, and "0" otherwise. Moreover, NA must remain NA. Below is a small sample of the input data:

df:
2000    NA
2001    NA
2002   -1.3
2000    1.1
2001    0
2002    NA
2000   -3
2001    3
2002    4.1

The output should be:

df1:
2000    2001    2002
 NA      NA      NA
 NA      NA      NA
 0       0       1
 1       0       0
 0       1       0
 NA      NA      NA
 1       0       0
 0       1       0
 0       0       1

I would prefer to obtain this output with a "for loop", if possible. Otherwise, any simpler approach would be appreciated.

Recommended answer

No loop is needed. We can use model.matrix:

## your data variable and NA index
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
na_id <- is.na(x)

## code your year variable as a factor
year <- factor(rep(2000:2002, 3))

## original model matrix; dropping the intercept (`- 1`) gives one indicator column per year
X <- model.matrix(~ year - 1)

#  year2000 year2001 year2002
#1        1        0        0
#2        0        1        0
#3        0        0        1
#4        1        0        0
#5        0        1        0
#6        0        0        1
#7        1        0        0
#8        0        1        0
#9        0        0        1

## put NA where `x` is NA (the length-9 logical index is recycled across all three columns)
X[na_id] <- NA

#  year2000 year2001 year2002
#1       NA       NA       NA
#2       NA       NA       NA
#3        0        0        1
#4        1        0        0
#5        0        1        0
#6       NA       NA       NA
#7        1        0        0
#8        0        1        0
#9        0        0        1
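
The recycling step can be easy to miss, so here is a small sketch that compares it with explicit row indexing. R stores a matrix column by column, so a logical index of length 9 applied to a 9 x 3 matrix is reused once per column and ends up marking the same rows in every column. The names `X1` and `X2` are only illustrative and do not appear in the answer.

```r
## same sample data as in the answer
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
na_id <- is.na(x)
year <- factor(rep(2000:2002, 3))

X1 <- model.matrix(~ year - 1)
X2 <- X1

## vector-style indexing: the 9 logicals are recycled over all 27 entries
X1[na_id] <- NA

## row-style indexing: whole rows are selected, no recycling involved
X2[na_id, ] <- NA

## both approaches should give the same matrix
identical(X1, X2)
```

The row-indexed form `X[na_id, ] <- NA` may be preferable when readability matters, since it states directly that entire rows are being blanked out.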

Matrix X will have some attributes. You can drop them if you want:

attr(X, "assign") <- attr(X, "contrasts") <- NULL
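
To see which attributes are involved, the sketch below lists the attribute names before and after dropping them. It rebuilds `X` from scratch so that it runs on its own; `before` and `after` are illustrative names, not part of the answer.

```r
## rebuild the model matrix from the answer
year <- factor(rep(2000:2002, 3))
X <- model.matrix(~ year - 1)

## attribute names as returned by model.matrix
before <- names(attributes(X))

## drop the two model-related attributes, one at a time
attr(X, "assign") <- NULL
attr(X, "contrasts") <- NULL

## what remains is what an ordinary named matrix carries
after <- names(attributes(X))

## the attributes that were removed
setdiff(before, after)
```

Dropping these attributes is optional; they do not affect the 0/1 values and mostly matter if the matrix is printed or compared with a plain matrix.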

You can also rename the columns of this matrix to something else, for example:

colnames(X) <- 2000:2002
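
Since the question asked for a for loop where possible, here is a minimal sketch of that alternative. It is not the method used in the answer above, only one way the same result could be built by hand; the names `yrs` and `D` are hypothetical.

```r
## same sample data as in the answer
x <- c(NA, NA, -1.3, 1.1, 0, NA, -3, 3, 4.1)
year <- rep(2000:2002, 3)

## one column per distinct year, pre-filled with 0
yrs <- sort(unique(year))
D <- matrix(0, nrow = length(x), ncol = length(yrs),
            dimnames = list(NULL, as.character(yrs)))

## mark the rows that belong to each year
for (j in seq_along(yrs)) {
  D[year == yrs[j], j] <- 1
}

## rows where `x` is missing become NA in every column
D[is.na(x), ] <- NA

D
```

The loop makes each step explicit, while `model.matrix` does the same indicator coding in a single call and is likely the more convenient choice when there are many years.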
