R unique columns or rows incomparables with NA

Question

Does anyone know whether the incomparables argument of unique() or duplicated() has ever been implemented beyond incomparables=FALSE?

Maybe I don't understand how it is supposed to work...

Anyway, I'm looking for a slick way to keep only the unique columns (or rows), where a column counts as a duplicate of another if the two are identical apart from extra NAs. I can brute-force it using cor(), for example, but for tens of thousands of columns this is intractable.

Here's an example. Sorry if it's a little messy, but I think it illustrates the point. Make a matrix z:

z <- matrix(sample(c(1:3, NA), 100, replace=TRUE), 10, 10)
colnames(z) <- paste("c", 1:10, sep="")
rownames(z) <- paste("r",1:10, sep="")

Let's add a couple of duplicate columns with extra NAs, and shuffle the columns (so they aren't always at the end):

c3.1 <- z[, 3]
c3.1[sample(1:10, 3)] <- NA
c8.1 <- z[, 8]
c8.1[sample(1:10, 5)] <- NA

z <- cbind(z, c3.1, c8.1)
z <- z[, sample(1:ncol(z))]

So I could sort the columns by the number of missing values; then it would seem that duplicated() or unique() should work, but neither will ignore the missing values:

n.missing <- colSums(is.na(z))     # number of NAs in each column
z.sorted <- z[, order(n.missing)]  # columns with the fewest NAs first

z.sorted[,!duplicated(z.sorted,MARGIN=2)]
unique(z.sorted,MARGIN=2)
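
The failure can be reproduced on a tiny case. Below is a minimal sketch with a hypothetical two-column matrix m (not from the original post): column b equals column a except for one extra NA, yet duplicated() does not flag it, because NA is compared like any other value.

```r
# Hypothetical toy matrix: column b equals column a except for one extra NA
m <- cbind(a = c(1, 2, 3), b = c(1, NA, 3))

# duplicated() compares whole columns and treats NA as an ordinary value,
# so b is not reported as a duplicate of a
dup <- duplicated(m, MARGIN = 2)
print(dup)

# Yet wherever both columns are observed, they agree
print(all(m[, "a"] == m[, "b"], na.rm = TRUE))
```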

I figured this is what the incomparables argument was specifically for, but it doesn't appear to be implemented yet:

z.sorted[,!duplicated(z.sorted,MARGIN=2,incomparables=NA)]
unique(z.sorted,MARGIN=2,incomparables=NA)
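
To see what the two calls above actually do, here is a minimal sketch on a hypothetical toy matrix m that captures the condition instead of stopping. In the R versions I am aware of, the matrix methods raise an error for any incomparables value other than FALSE; the exact message wording may differ between versions.

```r
# Hypothetical toy matrix with some NAs
m <- cbind(a = c(1, 2, NA), b = c(1, NA, NA))

# Catch the error from the matrix method of unique() so we can inspect it
res <- tryCatch(unique(m, MARGIN = 2, incomparables = NA),
                error = function(e) e)
print(conditionMessage(res))

# duplicated() on a matrix is expected to behave the same way
res2 <- tryCatch(duplicated(m, MARGIN = 2, incomparables = NA),
                 error = function(e) e)
print(conditionMessage(res2))
```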

I know I will likely find a less elegant solution soon enough; I guess I'm more asking why this hasn't been implemented yet, or whether I'm just using it wrong. I seem to run into this quite often, yet I searched for quite a while without finding an answer. Any thoughts?

Answer

As you suspect, incomparables != FALSE is not yet implemented for the data.frame and matrix methods of unique. It is implemented in the default method, which is used for vectors without a dim attribute. E.g.:

unique(c(1, 2, 2, 3, 3, 3, NA, NA, NA), incomparables=2)
## [1]  1  2  2  3 NA

unique(c(1, 2, 2, 3, 3, 3, NA, NA, NA), incomparables=NA)
## [1]  1  2  3 NA NA NA

Take a look at the source of unique.matrix versus unique.default (just type the function names into the console and press Enter, or press F2 in RStudio to open the source in a new pane).
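
The same comparison can also be done programmatically. This is a minimal sketch that fetches both S3 methods with getS3method() and greps their deparsed bodies for the lines mentioning incomparables; it assumes, as in the R versions I know of, that the matrix method guards the argument with an internal .NotYetUsed() call.

```r
# Fetch the S3 methods behind unique()
um <- getS3method("unique", "matrix")
ud <- getS3method("unique", "default")

# Deparse the function bodies to text, one element per source line
um_src <- deparse(body(um))
ud_src <- deparse(body(ud))

# Show only the lines that deal with incomparables
print(grep("incomparables", um_src, value = TRUE))
print(grep("incomparables", ud_src, value = TRUE))
```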

In your case, you could use outer to create a matrix indicating whether particular pairs of rows/columns are the same or not, disregarding NAs.

same <- outer(seq_len(ncol(z)), seq_len(ncol(z)), 
              Vectorize(function(x, y) all(z[, x]==z[, y], na.rm=TRUE)))

same

##        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
##  [1,]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [2,] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
##  [3,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
##  [4,] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [5,] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [6,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [7,] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
##  [8,] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
## [10,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [12,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Then, if you want to keep only those columns that are the same (ignoring NAs) as, e.g., the second column (which is column c8.1 for me; see the bottom of this post for the full z matrix I used), you can do:

z[, same[2, ]] # or, equivalently, z[, same[, 2]]

##     c8.1 c8
## r1     2  2
## r2     1  1
## r3    NA  3
## r4    NA  1
## r5     3  3
## r6    NA  1
## r7     2  2
## r8    NA  1
## r9     3  3
## r10   NA  1

To reduce the matrix to a set of unique columns (ignoring NAs), keeping from each group of matching columns the one with the fewest NAs, you can then do:

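n.na <- colSums(is.na(z))  # NAs per column
z[, unique(sapply(seq_len(ncol(z)), function(j) {  # for each column j...
  x <- which(same[, j]); x[which.min(n.na[x])] }))]  # ...keep its fewest-NA match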

##      c7 c8 c3 c1 c6 c10 c2 c9 c4
##  r1   2  2  1  2  1   1  1  2 NA
##  r2   3  1  3  1  3  NA  1  2  2
##  r3   2  3  2  3  1  NA  2  1 NA
##  r4   2  1  1  2  2   1  3 NA  2
##  r5  NA  3  2  1  3   2 NA NA  3
##  r6   2  1  2  2  1   1  2  1 NA
##  r7   2  2  2  2 NA   3  1  2  2
##  r8  NA  1  1  3  2  NA  1 NA  1
##  r9   1  3  3  2 NA   2  1 NA  2
## r10  NA  1  1 NA  1   1  1  2  3

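Since the question mentions tens of thousands of columns, the outer() + Vectorize() step may be the bottleneck, because it calls an R closure once per pair of columns. Below is a sketch of a swapped-in technique (not part of the answer above): it builds the same "identical ignoring NA" matrix from matrix products, counting for each pair of columns the rows observed in both and the rows where both hold the same value. It assumes z is a numeric matrix whose non-NA entries take only a few distinct values (as with 1:3 in the example); the helper names same_ignoring_na and unique_cols_ignoring_na are hypothetical.

```r
# HYPOTHETICAL helpers (not from the original answer). Assumption: z is a
# numeric matrix whose non-NA entries take only a few distinct values.

# Pairwise "identical ignoring NA" matrix for the columns of z, built from
# matrix products instead of outer() + Vectorize().
same_ignoring_na <- function(z) {
  observed <- (!is.na(z)) + 0            # 1 where a cell is observed, 0 where NA
  overlap <- crossprod(observed)         # [i, j]: rows observed in both i and j
  agree <- overlap * 0                   # same shape and dimnames, all zeros
  for (v in unique(z[!is.na(z)])) {      # one indicator matrix per distinct value
    iv <- ((!is.na(z)) & (z == v)) + 0   # 1 where the cell equals v
    agree <- agree + crossprod(iv)       # [i, j]: rows where both equal v
  }
  agree == overlap                       # TRUE when no shared observed row differs
}

# Same selection rule as the answer: for each column, keep its matching
# column with the fewest NAs, then drop repeated picks.
unique_cols_ignoring_na <- function(z) {
  same <- same_ignoring_na(z)
  n.na <- colSums(is.na(z))
  keep <- unique(vapply(seq_len(ncol(z)), function(j) {
    x <- which(same[, j])
    x[which.min(n.na[x])]
  }, integer(1)))
  z[, keep, drop = FALSE]
}

# Usage on a tiny hypothetical matrix: y equals x apart from one NA
a <- cbind(x = c(1, 2, 3), y = c(1, NA, 3), w = c(2, 2, 2))
print(unique_cols_ignoring_na(a))
```

The p-by-p matrices still grow quadratically with the number of columns, so very wide inputs may need to be processed in blocks. For rows, the same sketch can be applied to the transpose, e.g. t(unique_cols_ignoring_na(t(z))).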

For reference, here is the z I'm using:

    c7 c8.1 c3 c1 c5 c10 c8 c6 c2 c3.1 c9 c4
r1   2    2  1  2  1   1  2  1  1    1  2 NA
r2   3    1  3  1  3  NA  1  3  1    3  2  2
r3   2   NA  2  3  1  NA  3  1  2    2  1 NA
r4   2   NA  1  2 NA   1  1  2  3   NA NA  2
r5  NA    3  2  1  3   2  3  3 NA    2 NA  3
r6   2   NA  2  2  1   1  1  1  2    2  1 NA
r7   2    2  2  2  1   3  2 NA  1    2  2  2
r8  NA   NA  1  3 NA  NA  1  2  1   NA NA  1
r9   1    3  3  2  1   2  3 NA  1   NA NA  2
r10 NA   NA  1 NA NA   1  1  1  1    1  2  3