Change missing/non-missing values to binary (0/1)
Question
My dataset is:
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99))
df
#     x  y   z
# 1   1 10 225
# 2   4 12 198
# 3   6 NA  NA
# 4  NA NA  NA
# 5   7 14  NA
# 6  NA 18 130
# 7   9 20  NA
# 8  10 15 200
# 9   4 12  NA
# 10 NA 17  99
I want to change the dataset to a binary dataset as follows:
- observed (non-NA) values -> 1
- missing (NA) values -> 0
x y z
1 1 1 1
2 1 1 1
3 1 0 0
4 0 0 0
5 1 1 0
6 0 1 1
7 1 1 0
8 1 1 1
9 1 1 0
10 0 1 1
How can I do this in R? My attempt so far is ifelse(df=NA, 0, 1).
Recommended answer
You can just use !is.na, like this:
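# !is.na(df) gives a logical matrix; as.integer() turns TRUE/FALSE into 1/0,
# and assigning into df[] keeps df a data.frame with its original column names.
# df[] <- as.numeric(!is.na(df)) # <- Original answer
df[] <- as.integer(!is.na(df)) # <- Thanks @docendodiscimus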
df
# x y z
# 1 1 1 1
# 2 1 1 1
# 3 1 0 0
# 4 0 0 0
# 5 1 1 0
# 6 0 1 1
# 7 1 1 0
# 8 1 1 1
# 9 1 1 0
# 10 0 1 1
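The attempt in the question, ifelse(df=NA, 0, 1), fails for two reasons: inside a function call, df=NA is parsed as a named argument (and ifelse() has no argument called df, so R reports an unused argument), and even df == NA would not help, because any comparison with NA returns NA rather than TRUE or FALSE. A minimal sketch of a corrected ifelse() version, assuming the same df as in the question, uses is.na() as the test:

```r
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99))

# Comparing with NA never yields TRUE/FALSE, only NA
all(is.na(df == NA))   # TRUE

# is.na(df) returns a logical matrix; ifelse() keeps its dimensions,
# so the result is a 10 x 3 numeric matrix of 0/1 values,
# converted back to a data.frame here
bin <- as.data.frame(ifelse(is.na(df), 0, 1))
bin
```

This produces the same 0/1 table, but ifelse() returns doubles and goes through an intermediate matrix, so the !is.na() approach in the answer is the more direct option.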
If efficiency is a concern, you can try the "data.table" package:
library(data.table)
as.data.table(df)[, lapply(.SD, function(x) as.numeric(!is.na(x)))]
# x y z
# 1: 1 1 1
# 2: 1 1 1
# 3: 1 0 0
# 4: 0 0 0
# 5: 1 1 0
# 6: 0 1 1
# 7: 1 1 0
# 8: 1 1 1
# 9: 1 1 0
# 10: 0 1 1
Or replace the columns by reference with :=
as.data.table(df)[, (names(df)) := lapply(.SD, function(x) as.numeric(!is.na(x)))][]
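Note that as.data.table(df) creates a copy, so the := version above modifies and prints that copy while df itself stays unchanged. A minimal sketch that keeps the result, assuming the data.table package is installed, converts df once and then updates each column by reference with data.table's set() in a loop. This is a common alternative to := for column-by-column updates, not what the answer itself used; the names dt and col are illustrative.

```r
library(data.table)

df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99))

dt <- as.data.table(df)  # one explicit copy; df is left untouched
for (col in names(dt)) {
  # With i = NULL, set() replaces the whole column by reference (no copy
  # of dt), so the column type can change from double to integer
  set(dt, j = col, value = as.integer(!is.na(dt[[col]])))
}
dt
```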
Update
If anyone is interested in further benchmarks, you can check out this Gist.
Benchmark summary:
- If it's sheer speed you're after, go for a "data.table" approach.
- If you want efficient code in base R, as.integer and + are virtually neck and neck, so I think you know where my recommendation would lie.
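The two base R variants in the summary can be compared with a short sketch like the one below. It assumes that "+" refers to unary plus applied to the logical matrix, +(!is.na(df)), which coerces TRUE/FALSE to integer 1/0 just as as.integer() does but keeps the matrix shape. The data frame big is hypothetical test data, not the Gist's setup, so any timings system.time() prints are illustrative only.

```r
set.seed(1)
# Hypothetical test data: 1e5 rows x 3 columns with roughly 20% NAs
n <- 1e5
big <- data.frame(x = runif(n), y = runif(n), z = runif(n))
big[] <- lapply(big, function(col) replace(col, sample(n, n / 5), NA))

a <- big
b <- big
print(system.time(a[] <- as.integer(!is.na(big))))  # as.integer() variant
print(system.time(b[] <- +(!is.na(big))))           # unary + variant

# Both variants produce the same 0/1 values
all(as.matrix(a) == as.matrix(b))
```

Because both variants do the same coercion, the choice between them mostly comes down to readability; as.integer() states the intent more explicitly.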