If values in a range of columns aren't present in another column, replace with NA


Problem description

I have a dataset that includes some non-referenced data that I would like to replace with NA. In the following example, if a value in columns rep1 to rep4 does not match one of the values in the ID column, I would like to replace it with NA. In this case, the values x, y, and z aren't listed in the ID column, so they should be replaced.

This is somewhat similar to a question I asked earlier: If data present, replace with data from another column based on row ID

I think the solution will be similar to the one given for the previous question, but I don't know how to alter the second portion, ~ value[match(., ID)], so that it returns NA for values that aren't listed in the ID column.

df %>% mutate_at(vars(rep1:rep4), ~ value[match(., ID)])

ID  rep1  rep2  rep3  rep4  
a                           
b   a                       
c   a     b                 
d   a     b     c           
e   a     b     c     d     
f                           
g   x                       
h                           
i                           
j   y     z                 
k   z                       
l                           
m                           

The result should look like this:

ID  rep1  rep2  rep3  rep4  
a                           
b   a                       
c   a     b                 
d   a     b     c           
e   a     b     c     d     
f                           
g   NA                      
h                           
i                           
j   NA    NA                    
k   NA                      
l                           
m                           

Here is the data, produced with dput():

structure(list(ID = structure(1:13, .Label = c("a", "b", "c", 
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m"), class = "factor"), 
    rep1 = structure(c(1L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 4L, 
    5L, 1L, 1L), .Label = c("", "a", "x", "y", "z"), class = "factor"), 
    rep2 = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 
    1L, 1L, 1L), .Label = c("", "b", "z"), class = "factor"), 
    rep3 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L), .Label = c("", "c"), class = "factor"), rep4 = structure(c(1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", 
    "d"), class = "factor")), class = "data.frame", row.names = c(NA, -13L))

Recommended answer

Use replace() to set every value that is neither listed in ID nor an empty string to NA:

library(dplyr)
# Keep values that appear in ID or are empty strings; set everything else to NA
df %>%
  mutate_at(vars(rep1:rep4), ~replace(., which(!(. %in% ID | . == "")), NA))

   ID rep1 rep2 rep3 rep4
1   a                    
2   b    a               
3   c    a    b          
4   d    a    b    c     
5   e    a    b    c    d
6   f                    
7   g <NA>               
8   h                    
9   i                    
10  j <NA> <NA>          
11  k <NA>               
12  l                    
13  m 
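
For readers who prefer to avoid a dplyr dependency, the same idea can be sketched in base R. This is an alternative to the answer above, not part of it. It assumes the columns are stored as plain character vectors (the original data uses factors), and the helper name rep_cols is made up for illustration.

```r
# Rebuild the example data as character columns (assumption: the original uses factors)
df <- data.frame(
  ID   = letters[1:13],
  rep1 = c("", "a", "a", "a", "a", "", "x", "", "", "y", "z", "", ""),
  rep2 = c("", "", "b", "b", "b", "", "", "", "", "z", "", "", ""),
  rep3 = c(rep("", 3), "c", "c", rep("", 8)),
  rep4 = c(rep("", 4), "d", rep("", 8)),
  stringsAsFactors = FALSE
)

# Hypothetical helper name: the columns to be checked against ID
rep_cols <- c("rep1", "rep2", "rep3", "rep4")

df[rep_cols] <- lapply(df[rep_cols], function(col) {
  # A value is kept if it appears in ID or is an empty string
  keep <- col %in% df$ID | col == ""
  replace(col, !keep, NA)
})

print(df)
```

The logical vector keep plays the same role as the condition inside which() in the dplyr version. replace() accepts a logical index directly, so which() is optional here.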
