R: Replace multiple values in multiple columns of dataframes with NA

Question

I am trying to achieve something similar to this question, but with multiple values that must be replaced by NA, and in a large dataset.

df <- data.frame(name = rep(letters[1:3], each = 3), foo=rep(1:9),var1 = rep(1:9), var2 = rep(3:5, each = 3))

This produces the following data frame:

df
  name foo var1 var2
1    a   1    1    3
2    a   2    2    3
3    a   3    3    3
4    b   4    4    4
5    b   5    5    4
6    b   6    6    4
7    c   7    7    5
8    c   8    8    5
9    c   9    9    5

I would like to replace all occurrences of, say, 3 and 4 with NA, but only in the columns whose names start with "var".

I know that I can use a combination of [] operators to achieve the result I want:

# POSIX classes must sit inside a bracket expression: [[:alnum:]], not [:alnum:]
df[,grep("^var[[:alnum:]]?",colnames(df))][
        df[,grep("^var[[:alnum:]]?",colnames(df))] == 3 |
        df[,grep("^var[[:alnum:]]?",colnames(df))] == 4
   ] <- NA

df
  name foo var1 var2
1    a   1    1   NA
2    a   2    2   NA
3    a   3   NA   NA
4    b   4   NA   NA
5    b   5    5   NA
6    b   6    6   NA
7    c   7    7    5
8    c   8    8    5
9    c   9    9    5

Now my questions are the following:

  1. Is there an efficient way to do this, given that my actual dataset has about 100,000 rows, and 400 out of 500 variables start with "var"? The double-bracket technique seems (subjectively) slow on my computer.
  2. How would I approach the problem if, instead of 2 values (3 and 4) to be replaced by NA, I had a long list of, say, 100 different values? Is there a way to specify multiple values without having to write a clumsy series of conditions separated by the | operator?


Recommended answer

You can also do this with replace():

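# Select only the columns whose names start with "var"
sel <- grepl("^var",names(df))
# In each selected column, set values found in 3:4 to NA
df[sel] <- lapply(df[sel], function(x) replace(x,x %in% 3:4, NA) )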
df

#  name foo var1 var2
#1    a   1    1   NA
#2    a   2    2   NA
#3    a   3   NA   NA
#4    b   4   NA   NA
#5    b   5    5   NA
#6    b   6    6   NA
#7    c   7    7    5
#8    c   8    8    5
#9    c   9    9    5
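
The same pattern extends to the second question, because %in% accepts a vector of any length, so no chain of | conditions is needed. Below is a minimal sketch under a few assumptions: the helper name replace_with_na, its parameter names, and the sample vector na_values are hypothetical and are not part of the original answer.

```r
# Hypothetical helper: set every value found in `na_values` to NA,
# but only in columns whose names match `pattern`.
replace_with_na <- function(data, na_values, pattern = "^var") {
  sel <- grepl(pattern, names(data))
  data[sel] <- lapply(data[sel], function(x) replace(x, x %in% na_values, NA))
  data
}

# Same example data as in the question
df <- data.frame(name = rep(letters[1:3], each = 3), foo = 1:9,
                 var1 = 1:9, var2 = rep(3:5, each = 3))

# Illustrative values only; a vector of 100 values is handled
# in exactly the same way as a vector of 2 or 3.
na_values <- c(3, 4, 9)
result <- replace_with_na(df, na_values)
result
```

Only the columns matching the pattern are touched, so foo keeps its 3, 4 and 9 even though they appear in na_values.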

Some quick benchmarking on a million-row sample of data suggests this is quicker than the other answers.
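
The benchmark itself is not shown in the answer. A rough way to check the claim on your own machine might look like the sketch below, which uses base R's system.time. The synthetic data, its size (100,000 rows and 20 "var" columns, chosen so the run stays short) and all variable names are assumptions for illustration; timings will vary by machine and data shape.

```r
set.seed(1)
n_rows <- 1e5   # assumed size, smaller than the million rows mentioned above
n_vars <- 20    # assumed number of "var" columns

# Synthetic integer data with column names var1 ... var20, plus one other column
big <- as.data.frame(matrix(sample(1:9, n_rows * n_vars, replace = TRUE),
                            nrow = n_rows))
names(big) <- paste0("var", seq_len(n_vars))
big$name <- sample(letters, n_rows, replace = TRUE)

na_values <- 3:4
sel <- grepl("^var", names(big))

# Approach from the question: logical-matrix indexing on the selected columns
by_brackets <- big
time_brackets <- system.time({
  by_brackets[sel][by_brackets[sel] == 3 | by_brackets[sel] == 4] <- NA
})

# Approach from the answer: lapply() over columns with replace() and %in%
by_replace <- big
time_replace <- system.time({
  by_replace[sel] <- lapply(by_replace[sel],
                            function(x) replace(x, x %in% na_values, NA))
})

# Elapsed seconds for each approach
print(c(brackets = time_brackets[["elapsed"]],
        replace  = time_replace[["elapsed"]]))

# Sanity check: both approaches should produce the same data frame
stopifnot(identical(by_brackets, by_replace))
```

For more stable numbers, repeat each timing several times and compare the medians rather than relying on a single run.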
