Turn a count matrix into a binary existence matrix

Problem description

I have a data frame that stores how many fruits of each kind different people have, like below:

    apple  banana  orange
Tim     3       0       2
Tom     0       1       1
Bob     1       2       2

Again, the numbers are the counts of fruits. How can I turn it into an existence matrix? That is, if a person has a given fruit, no matter how many, I record 1; if not, I record 0. Like below:

    apple  banana  orange
Tim     1       0       1
Tom     0       1       1
Bob     1       1       1

Solution

Here's your data.frame:

x <- structure(list(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 
1L, 2L)), .Names = c("apple", "banana", "orange"), class = "data.frame", row.names = c("Tim", 
"Tom", "Bob"))

And your matrix:

as.matrix((x > 0) + 0)
#     apple banana orange
# Tim     1      0      1
# Tom     0      1      1
# Bob     1      1      1
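As a quick sanity check, the following minimal sketch rebuilds the same data with data.frame() rather than structure(). It then types in the expected matrix from the question by hand and compares it with the one-liner. The names `expected` and `result` are illustrative. Comparing a data frame with `>` already returns a logical matrix, so the outer as.matrix() is not strictly needed.

```r
# Fruit counts from the question (same values as the structure() call above)
x <- data.frame(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 1L, 2L),
                row.names = c("Tim", "Tom", "Bob"))

# Expected presence/absence matrix, typed in from the question
expected <- matrix(c(1, 0, 1,
                     0, 1, 1,
                     1, 1, 1),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("Tim", "Tom", "Bob"),
                                   c("apple", "banana", "orange")))

is.matrix(x > 0)                  # TRUE: comparing a data frame gives a matrix
result <- as.matrix((x > 0) + 0)  # the one-liner from the answer
all.equal(result, expected)       # TRUE, including row and column names
```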

Update

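I had no idea that a quick pre-bedtime post would generate any discussion (http://stackoverflow.com/questions/14526429/turn-a-count-matrix-into-a-binary-existence-matrix/14526637#comment20256941_14526637), but the discussion itself is quite interesting, so I wanted to summarize it here: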

My instinct was simply to rely on the fact that, underneath, TRUE and FALSE in R are the numbers 1 and 0. If you check for equivalence (not a great practice), for example 1 == TRUE or 0 == FALSE, you'll get TRUE. My shortcut (which turns out to be slower than the correct, or at least more conceptually correct, way) was just to add 0 to my TRUEs and FALSEs, since R coerces logical vectors to numeric in arithmetic.
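The coercion that the "+ 0" shortcut relies on can be seen on plain logical vectors. The following is a small illustration using only base R. The `0L` variant is an extra not discussed in the thread.

```r
# Comparing numbers with logicals works because TRUE/FALSE coerce to 1/0
1 == TRUE                          # TRUE
0 == FALSE                         # TRUE

# The "+ 0" shortcut: arithmetic coerces logical to double
TRUE + 0                           # 1
typeof(TRUE + 0)                   # "double"
c(TRUE, FALSE, TRUE) + 0           # 1 0 1

# Adding 0L instead keeps integer storage
typeof(c(TRUE, FALSE) + 0L)        # "integer"

# The explicit conversion gives the same values
as.numeric(c(TRUE, FALSE, TRUE))   # 1 0 1
```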

The correct, or at least more appropriate, way would be to convert the output using as.numeric (I think that's what @JoshO'Brien intended to write). But, unfortunately, as.numeric strips the dimension attributes of its input, so you need to convert the resulting vector back into a matrix. As it turns out, that is still faster than adding 0 as I did in my answer.
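Here is a minimal sketch of the dimension loss on the fruit data, rebuilt with data.frame(). It shows two ways to get a numeric matrix back. The first is the matrix() rebuild used later in the X3 benchmark. The second, `storage.mode<-`, is a base R alternative that was not part of the original discussion and was not benchmarked here. The variable names `lgl`, `v`, `m1` and `m2` are illustrative.

```r
x <- data.frame(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 1L, 2L),
                row.names = c("Tim", "Tom", "Bob"))
lgl <- as.matrix(x) > 0          # logical 3 x 3 matrix with dimnames

v <- as.numeric(lgl)             # strips all attributes, including dim
is.null(dim(v))                  # TRUE: v is now a plain length-9 vector

# Option 1: rebuild the matrix (as in the X3 benchmark); dimnames must be
# passed explicitly or they are lost
m1 <- matrix(v, nrow = nrow(lgl), dimnames = dimnames(lgl))

# Option 2: change the storage mode in place; dim and dimnames are kept
m2 <- lgl
storage.mode(m2) <- "double"

identical(m1, m2)                # TRUE
```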

Having read the comments and criticisms, I thought I would add one more option: using apply to loop over the columns with the as.numeric approach. That is slower than manually re-creating the matrix, but slightly faster than adding 0 to the logical comparison. The benchmark below also includes ifelse for comparison, which is by far the slowest.
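On the small fruit data from the question, the apply() variant looks roughly like this. This is a sketch and `by_col` is an illustrative name. It also hints at why the comparisons after the benchmark use check.attributes = FALSE: as.numeric() drops the element names, so the row names are not carried into the result.

```r
# Fruit counts from the question, rebuilt with data.frame()
x <- data.frame(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 1L, 2L),
                row.names = c("Tim", "Tom", "Bob"))

# apply() converts the data frame to a matrix, then calls the function once
# per column (MARGIN = 2); each call returns a 0/1 numeric vector
by_col <- apply(x, 2, function(y) as.numeric(y > 0))

dim(by_col)                  # 3 3
colnames(by_col)             # "apple" "banana" "orange"
rownames(by_col)             # NULL: as.numeric() dropped the names Tim, Tom, Bob

# Same 0/1 values as the "+ 0" version
all(by_col == (x > 0) + 0)   # TRUE
```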

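# A 1001 x 10000 data frame of counts: each column is a permutation of 0:1000
x <- data.frame(replicate(1e4, sample(0:1e3)))
library(rbenchmark)
benchmark(X1 = {
            # comparison gives a logical matrix; adding 0 coerces it to numeric
            x1 <- as.matrix((x > 0) + 0)
          },
          X2 = {
            # column by column with as.numeric(); apply() returns a matrix
            x2 <- apply(x, 2, function(y) as.numeric(y > 0))
          },
          X3 = {
            # as.numeric() drops dim, so rebuild the matrix afterwards
            x3 <- as.numeric(as.matrix(x) > 0)
            x3 <- matrix(x3, nrow = nrow(x))
          },
          X4 = {
            # ifelse() on the logical matrix returned by x > 0
            x4 <- ifelse(x > 0, 1, 0)
          },
          columns = c("test", "replications", "elapsed",
                      "relative", "user.self"))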
#   test replications elapsed relative user.self
# 1   X1          100 116.618    1.985   110.711
# 2   X2          100 105.026    1.788    94.070
# 3   X3          100  58.750    1.000    46.007
# 4   X4          100 382.410    6.509   311.567

all.equal(x1, x2, check.attributes=FALSE)
# [1] TRUE
all.equal(x1, x3, check.attributes=FALSE)
# [1] TRUE
all.equal(x1, x4, check.attributes=FALSE)
# [1] TRUE

Thanks for the discussion y'all!
