R: Replace multiple values in multiple columns of dataframes with NA

Problem description

I am trying to achieve something similar to this question, but with multiple values that must be replaced by NA, and in a large dataset.

df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = rep(1:9), var2 = rep(3:5, each = 3))

This generates the following data frame:

df
  name foo var1 var2
1    a   1    1    3
2    a   2    2    3
3    a   3    3    3
4    b   4    4    4
5    b   5    5    4
6    b   6    6    4
7    c   7    7    5
8    c   8    8    5
9    c   9    9    5

I would like to replace all occurrences of, say, 3 and 4 by NA, but only in the columns that start with "var".

I know that I can use a combination of [] operators to achieve the result I want:

# Note: POSIX classes such as [:alnum:] must sit inside a bracket expression
df[,grep("^var[[:alnum:]]?",colnames(df))][
        df[,grep("^var[[:alnum:]]?",colnames(df))] == 3 |
        df[,grep("^var[[:alnum:]]?",colnames(df))] == 4
   ] <- NA

df
  name foo var1 var2
1    a   1    1   NA
2    a   2    2   NA
3    a   3   NA   NA
4    b   4   NA   NA
5    b   5    5   NA
6    b   6    6   NA
7    c   7    7    5
8    c   8    8    5
9    c   9    9    5

Now my questions are the following:

  1. Is there a way to do this efficiently, given that my actual dataset has about 100,000 rows and 400 of its 500 variables start with "var"? It seems (subjectively) slow on my computer when I use the double brackets technique.
  2. How would I approach the problem if, instead of 2 values (3 and 4) to be replaced by NA, I had a long list of, say, 100 various values? Is there a way to specify multiple values without having to write a clumsy series of conditions separated by the | operator?

Recommended answer

You can also do this using replace:

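# Logical index of the columns whose names start with "var"
sel <- grepl("^var", names(df))
# In each selected column, set every value found in 3:4 to NA
df[sel] <- lapply(df[sel], function(x) replace(x, x %in% 3:4, NA))
df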

#  name foo var1 var2
#1    a   1    1   NA
#2    a   2    2   NA
#3    a   3   NA   NA
#4    b   4   NA   NA
#5    b   5    5   NA
#6    b   6    6   NA
#7    c   7    7    5
#8    c   8    8    5
#9    c   9    9    5
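
For the second question (a long list of values), the same pattern should carry over unchanged, since %in% accepts a lookup vector of any length. A minimal sketch, assuming a hypothetical vector named bad_values (not part of the original post) and a pattern anchored with ^ so that only columns starting with "var" are selected:

```r
# Example data from the question
df <- data.frame(name = rep(letters[1:3], each = 3), foo = 1:9,
                 var1 = 1:9, var2 = rep(3:5, each = 3))

# Hypothetical list of values to blank out; it could just as well hold 100 values
bad_values <- c(3, 4, 8)

# Logical index of the columns whose names start with "var"
sel <- grepl("^var", names(df))

# Replace every listed value with NA, one column at a time
df[sel] <- lapply(df[sel], function(x) replace(x, x %in% bad_values, NA))
df
```

Because the values live in an ordinary vector, they can be read from a file or built programmatically instead of being chained together with |.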

Some quick benchmarking using a million row sample of data suggests this is quicker than the other answers.
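
The benchmark itself is not shown in the answer. One way to run a comparable check is sketched below; the synthetic data, its size, and the helper names by_brackets and by_replace are all assumptions made for illustration, and timings will differ by machine and R version.

```r
set.seed(1)
n <- 1e5  # assumed number of rows; increase to stress the comparison

# Synthetic data: one id column plus three columns starting with "var"
big <- data.frame(id   = seq_len(n),
                  var1 = sample(1:9, n, replace = TRUE),
                  var2 = sample(1:9, n, replace = TRUE),
                  var3 = sample(1:9, n, replace = TRUE))
sel <- grepl("^var", names(big))

# Approach from the question: logical matrix built with == and |
by_brackets <- function(d) {
  d[sel][d[sel] == 3 | d[sel] == 4] <- NA
  d
}

# Approach from the answer: lapply + replace + %in%
by_replace <- function(d) {
  d[sel] <- lapply(d[sel], function(x) replace(x, x %in% 3:4, NA))
  d
}

# Time each approach once; repeat or average for steadier numbers
print(system.time(res_brackets <- by_brackets(big)))
print(system.time(res_replace  <- by_replace(big)))

# Sanity check: both approaches should produce the same data frame
stopifnot(isTRUE(all.equal(res_brackets, res_replace)))
```

A single system.time call is a coarse measure, so small differences between runs should not be over-interpreted.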
