Change missing/non-missing values to binary (0/1)

Question

My dataset is:

df=data.frame(x=c(1,4,6,NA,7,NA,9,10,4,NA),
          y=c(10,12,NA,NA,14,18,20,15,12,17),
          z=c(225,198,NA,NA,NA,130,NA,200,NA,99))
df
    x  y   z
1   1 10 225
2   4 12 198
3   6 NA  NA
4  NA NA  NA
5   7 14  NA
6  NA 18 130
7   9 20  NA
8  10 15 200
9   4 12  NA
10 NA 17  99

I want to change the dataset into a binary dataset as follows:

Observed (non-NA) values -> 1

Missing (NA) values -> 0

   x y z
1  1 1 1
2  1 1 1
3  1 0 0
4  0 0 0
5  1 1 0
6  0 1 1
7  1 1 0
8  1 1 1
9  1 1 0
10 0 1 1

How can I do this in R? My attempt was ifelse(df=NA, 0, 1).
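For reference, here is a minimal sketch of why that attempt fails and what a corrected ifelse() call could look like (the names eq_na, m and res are illustrative only). Inside a function call, df=NA is read as a named argument rather than a comparison, and even df == NA would not help, because any comparison with NA returns NA. is.na() is the proper test:

```r
# Example data from the question
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99))

# Any comparison with NA yields NA, never TRUE/FALSE
eq_na <- df$x == NA
print(eq_na)  # every element is NA

# is.na() is a real logical test; on a data frame it returns a logical matrix
m <- ifelse(is.na(df), 0, 1)

# ifelse() keeps the matrix shape of its test, so convert back to a data frame
res <- as.data.frame(m)
print(res)
```

Note that this ifelse() route produces doubles; the answer below gets integers directly.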

Recommended answer

You can just use !is.na, like this:

# df[] <- as.numeric(!is.na(df))  # <- Original answer
df[] <- as.integer(!is.na(df))    # <- Thanks @docendodiscimus
df
#    x y z
# 1  1 1 1
# 2  1 1 1
# 3  1 0 0
# 4  0 0 0
# 5  1 1 0
# 6  0 1 1
# 7  1 1 0
# 8  1 1 1
# 9  1 1 0
# 10 0 1 1
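The df[] <- part is what keeps the result a data frame. is.na(df) returns a logical matrix, as.integer() flattens it into a plain integer vector (column by column), and assigning into df[] refills the existing columns instead of replacing df itself. A small sketch to check this (na_mat, flat and df2 are illustrative names):

```r
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99))

na_mat <- is.na(df)           # logical matrix, 10 rows x 3 columns
flat <- as.integer(!na_mat)   # integer vector of length 30; the dim attribute is dropped

df2 <- df
df2[] <- flat                 # refills x, y, z in column order; df2 stays a data.frame

# By contrast, df2 <- flat would replace the data frame with a bare vector
str(df2)
```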

If efficiency is a concern, you can try the "data.table" package:

library(data.table)
as.data.table(df)[, lapply(.SD, function(x) as.numeric(!is.na(x)))]
#     x y z
#  1: 1 1 1
#  2: 1 1 1
#  3: 1 0 0
#  4: 0 0 0
#  5: 1 1 0
#  6: 0 1 1
#  7: 1 1 0
#  8: 1 1 1
#  9: 1 1 0
# 10: 0 1 1

Or overwrite the columns by reference with := (the trailing [] prints the result):

as.data.table(df)[, (names(df)) := lapply(.SD, function(x) as.numeric(!is.na(x)))][]
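Both data.table calls work column by column: lapply(.SD, ...) runs the function on each column. If you would rather avoid the extra package, the same column-wise pattern is available in base R. A minimal sketch, not part of the original answer (to_binary is a hypothetical helper name):

```r
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99))

# Hypothetical helper: observed -> 1L, NA -> 0L for a single column
to_binary <- function(col) as.integer(!is.na(col))

bin <- df
# lapply() returns a list of columns; assigning it to bin[] keeps the data.frame structure
bin[] <- lapply(bin, to_binary)
print(bin)
```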

Update

If anyone is interested in further benchmarks, you can check out this Gist.

Benchmark summary:

  • If it's sheer speed you're after, go for a "data.table" approach.
  • If you want efficient code in base R, as.integer and + are virtually neck-and-neck, so I think you know where my recommendation would lie.
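Assuming the + in the summary means unary plus (the Gist itself is not reproduced here), it converts TRUE/FALSE to 1L/0L just like as.integer(), but keeps the matrix dimensions and column names. A sketch comparing the two on the question's data, plus a rough timing on a larger, randomly generated data frame (big is an illustrative name; timings vary by machine, so no numbers are claimed):

```r
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99))

via_int  <- as.integer(!is.na(df))  # integer vector, dim dropped
via_plus <- +(!is.na(df))           # integer matrix, dim and column names kept

# Same values once the matrix is flattened
print(identical(via_int, as.vector(via_plus)))

# The matrix form converts straight back to a 0/1 data frame
bin_plus <- as.data.frame(via_plus)
print(bin_plus)

# Rough timing sketch: 100,000 rows x 10 columns, about half NA
set.seed(1)
big <- as.data.frame(matrix(sample(c(1, NA), 1e6, replace = TRUE), ncol = 10))
print(system.time(for (k in 1:20) as.integer(!is.na(big))))
print(system.time(for (k in 1:20) +(!is.na(big))))
```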
