Find uniqueness in a data frame with rows containing NA?

Problem description

I have a data frame like the one below and want to find its unique rows. The data contains NA values. If all non-NA values in a row with NA are the same as in another row (like rows 1, 2, 5), I want to drop that row; if they are not the same (like rows 2 and 4), I want to keep it as a unique row. For example, rows 1, 2 and 6 agree in all values except the NAs, and because an NA could stand for a value such as 1 or 3, I want to remove the rows with NA and keep only row 2. Likewise, the non-NA values in row 6 (2 and 3) are the same as in rows c2 and c5, and the NAs in c6 could take the same values as in c2 and c5, so this row is not unique.

Also, @Sotos's solution helped a lot, but in the last step, after removing the NAs and building a pattern for each row, it produces the same pattern (23) for c8 and c6 and removes both of them. But c8 is actually unique.

Data:

      a1  a2   a3   a4
c1    2    1    3   NA
c2    2    1    3    3
c3    2    1    4    3
c4    2    2    3   NA
c5    2    1    3    3
c6    2    NA   3   NA
c7    2    NA   0   NA
c8    2    3   NA   NA

I want this output:

     a1    a2  a3   a4
c2    2    1    3    3
c3    2    1    4    3
c4    2    2    3   NA
c7    2    NA   0   NA
c8    2    3   NA   NA

Recommended answer

library(stringr) 
df <- unique(df)
#paste rows omitting NAs
df$new <- apply(df, 1, function(i) paste(na.omit(i), collapse = ''))
#use str_detect to determine whether each pattern is found more than once
df$new2 <- rowSums(sapply(df$new, function(i) str_detect(i, df$new)))
new_df <- subset(df, df$new2 == 1)
new_df <- new_df[, !names(new_df) %in% c('new', 'new2')]
new_df
#   V2 V3 V4 V5
#2  2  1  3  3
#3  2  1  4  3
#4  2  2  3 NA

Testing the code with the additional row, as per your comment:

new_df
#   a1 a2 a3 a4
#c2  2  1  3  3
#c3  2  1  4  3
#c4  2  2  3 NA
#c7  2 NA  0 NA
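
Note that this test uses the seven-row data shown under Data, which does not include c8. As a rough illustration of the collision described in the question, the base R sketch below rebuilds the eight-row data from the question and pastes each row's non-NA values together, as the answer does. The names `df8` and `keys` are made up for this example.

```r
# Eight-row data re-typed from the question (c1..c8).
df8 <- data.frame(
  a1 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  a2 = c(1L, 1L, 1L, 2L, 1L, NA, NA, 3L),
  a3 = c(3L, 3L, 4L, 3L, 3L, 3L, 0L, NA),
  a4 = c(NA, 3L, 3L, NA, 3L, NA, NA, NA),
  row.names = paste0("c", 1:8)
)

# Paste the non-NA values of each row into one string, ignoring
# which column each value came from.
keys <- apply(as.matrix(df8), 1, function(r) paste(na.omit(r), collapse = ""))

print(keys[c("c6", "c8")])
# c6 is (2, NA, 3, NA) and c8 is (2, 3, NA, NA); both collapse to "23".
```

Because the pasted key drops column positions, c8 is indistinguishable from c6, and "23" is also a substring of c4's key "223", so a substring count greater than 1 removes c8 even though no other row has a2 = 3.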

Data

dput(df)
structure(list(a1 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), a2 = c(1L, 
1L, 1L, 2L, 1L, NA, NA), a3 = c(3L, 3L, 4L, 3L, 3L, 3L, 0L), 
    a4 = c(NA, 3L, 3L, NA, 3L, NA, NA)), .Names = c("a1", "a2", 
"a3", "a4"), class = "data.frame", row.names = c("c1", "c2", 
"c3", "c4", "c5", "c6", "c7"))
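
The pasted-string approach compares values without regard to column position. One possible position-aware alternative, offered only as a sketch and not as the answerer's method, is to drop a row when some other row has the same value in every column where the first row is not NA. It assumes the eight-row data from the question and that "NA can be any value" means exactly this kind of column-wise coverage. The helpers `is_covered_by` and `drop_covered_rows` are hypothetical names introduced here.

```r
# Eight-row data re-typed from the question (c1..c8).
df <- data.frame(
  a1 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  a2 = c(1L, 1L, 1L, 2L, 1L, NA, NA, 3L),
  a3 = c(3L, 3L, 4L, 3L, 3L, 3L, 0L, NA),
  a4 = c(NA, 3L, 3L, NA, 3L, NA, NA, NA),
  row.names = paste0("c", 1:8)
)

# Hypothetical helper: TRUE if every non-NA value of x is matched by a
# non-NA, equal value of y in the same column.
is_covered_by <- function(x, y) {
  known <- !is.na(x)
  all(!is.na(y[known]) & y[known] == x[known])
}

# Hypothetical helper: remove exact duplicates, then remove every row
# that is covered by some other remaining row.
drop_covered_rows <- function(d) {
  d <- unique(d)
  m <- as.matrix(d)
  n <- nrow(m)
  keep <- vapply(seq_len(n), function(i) {
    others <- setdiff(seq_len(n), i)
    !any(vapply(others, function(j) is_covered_by(m[i, ], m[j, ]), logical(1)))
  }, logical(1))
  d[keep, , drop = FALSE]
}

result <- drop_covered_rows(df)
print(result)
```

On this data the sketch keeps c2, c3, c4, c7 and c8, which matches the output requested in the question. It compares every pair of rows, so it may be slow on large data frames.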
