Turn a count matrix into a binary existence matrix
Problem description
I have a data frame that stores how many of each kind of fruit different people have, like this:
    apple banana orange
Tim     3      0      2
Tom     0      1      1
Bob     1      2      2
Again, the numbers are counts of fruit. How can I turn this into an existence matrix, meaning that if a person has a given fruit, no matter how many, I record 1, and otherwise I record 0? Like this:
    apple banana orange
Tim     1      0      1
Tom     0      1      1
Bob     1      1      1
Here's your data.frame:
x <- structure(list(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L,
1L, 2L)), .Names = c("apple", "banana", "orange"), class = "data.frame", row.names = c("Tim",
"Tom", "Bob"))
And your matrix:
as.matrix((x > 0) + 0)
    apple banana orange
Tim     1      0      1
Tom     0      1      1
Bob     1      1      1
Update
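I had no idea that a quick pre-bedtime post would generate any discussion (http://stackoverflow.com/questions/14526429/turn-a-count-matrix-into-a-binary-existence-matrix/14526637#comment20256941_14526637), but the discussion itself is quite interesting, so I wanted to summarize it here: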
My instinct was simply to use the fact that, underneath, TRUE and FALSE in R are the numbers 1 and 0. If you check for equivalence (not a great way to do it), such as 1 == TRUE or 0 == FALSE, you'll get TRUE. My shortcut (which turns out to take more time than the correct, or at least more conceptually correct, way) was to just add 0 to my TRUEs and FALSEs, since I know that R coerces logical vectors to numeric.
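To make the coercion concrete, here is a minimal sketch. The names flags, counts and presence are made up for illustration and are not part of the original answer.

```r
# Logical values are coerced to numbers in arithmetic: TRUE -> 1, FALSE -> 0.
flags <- c(TRUE, FALSE, TRUE)
flags + 0          # numeric vector: 1 0 1
class(flags + 0)   # "numeric"

# The same coercion applies element-wise to a logical matrix,
# and the dimensions and dimnames survive the arithmetic.
counts <- matrix(c(3L, 0L, 1L, 0L, 1L, 2L), nrow = 3,
                 dimnames = list(c("Tim", "Tom", "Bob"), c("apple", "banana")))
presence <- (counts > 0) + 0
presence
```

Because the arithmetic keeps the matrix attributes, this route needs no extra step to restore the shape.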
The correct, or at least more appropriate, way would be to convert the output using as.numeric (I think that's what @JoshO'Brien intended to write). Unfortunately, that removes the dimension attributes of the input, so you need to convert the resulting vector back to a matrix, which, as it turns out, is still faster than adding 0 as I did in my answer.
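A small sketch of that behaviour on the fruit data. Passing dimnames when rebuilding the matrix is my own addition, on the assumption that the row and column labels should be kept; the benchmark in this answer rebuilds the matrix without them.

```r
x <- data.frame(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 1L, 2L),
                row.names = c("Tim", "Tom", "Bob"))
m <- as.matrix(x)

flat <- as.numeric(m > 0)   # dim and dimnames are dropped here
is.null(dim(flat))          # TRUE: flat is a plain vector of length 9

# Rebuild the matrix, reusing the shape and labels of the original.
# Both as.numeric() and matrix() work in column-major order, so cells line up.
presence <- matrix(flat, nrow = nrow(m), dimnames = dimnames(m))
presence
```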
Having read the comments and criticisms, I thought I would add one more option: using apply to loop through the columns and apply the as.numeric approach to each. That is slower than manually re-creating the matrix, but slightly faster than adding 0 to the logical comparison.
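For reference, the apply variant on the small fruit data might look like the sketch below. Restoring the row names afterwards is an assumption about what is wanted, not something the original answer does.

```r
x <- data.frame(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 1L, 2L),
                row.names = c("Tim", "Tom", "Bob"))

# apply() converts the data frame to a matrix and calls the function once per column.
presence <- apply(x, 2, function(column) as.numeric(column > 0))

# Column names come through, but as.numeric() strips the names of each column
# vector, so the result has no row names until they are put back.
rn_before <- rownames(presence)
rownames(presence) <- rownames(x)
presence
```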
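# Test data: 1001 rows x 10000 columns; each column is a permutation of 0:1000
x <- data.frame(replicate(1e4, sample(0:1e3)))
library(rbenchmark)
benchmark(
  X1 = {  # add 0 to the logical comparison
    x1 <- as.matrix((x > 0) + 0)
  },
  X2 = {  # as.numeric on each column via apply
    x2 <- apply(x, 2, function(y) as.numeric(y > 0))
  },
  X3 = {  # as.numeric on the whole matrix, then rebuild its shape
    x3 <- as.numeric(as.matrix(x) > 0)
    x3 <- matrix(x3, nrow = 1001)
  },
  X4 = {  # ifelse
    x4 <- ifelse(x > 0, 1, 0)
  },
  columns = c("test", "replications", "elapsed",
              "relative", "user.self"))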
# test replications elapsed relative user.self
# 1 X1 100 116.618 1.985 110.711
# 2 X2 100 105.026 1.788 94.070
# 3 X3 100 58.750 1.000 46.007
# 4 X4 100 382.410 6.509 311.567
all.equal(x1, x2, check.attributes=FALSE)
# [1] TRUE
all.equal(x1, x3, check.attributes=FALSE)
# [1] TRUE
all.equal(x1, x4, check.attributes=FALSE)
# [1] TRUE
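The check.attributes = FALSE argument matters because the approaches differ in which labels they keep. A minimal sketch on the small fruit data (the names with_names and without_names are made up for illustration):

```r
x <- data.frame(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 1L, 2L),
                row.names = c("Tim", "Tom", "Bob"))

with_names    <- as.matrix((x > 0) + 0)                                 # keeps dimnames
without_names <- matrix(as.numeric(as.matrix(x) > 0), nrow = nrow(x))   # no dimnames

# With attributes compared, all.equal() reports the dimnames mismatch.
strict <- isTRUE(all.equal(with_names, without_names))
# With attributes ignored, only the values are compared.
loose <- isTRUE(all.equal(with_names, without_names, check.attributes = FALSE))
c(strict = strict, loose = loose)
```

So a TRUE from the checks above says the values agree, not that the row and column labels do.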
Thanks for the discussion y'all!