How can I code this indicator matrix without using a for loop in R
Problem description
I have a vector of factors given by a sequence of numbers. These factors also appear in two separate data sets, called test_set and train_set. The following code finds where each factor in a data set matches the vector of factors and puts a 1 in the corresponding position of a matrix. Multiplying this matrix, compound_test, by test_set$Compound should give you compare_comp.
compare_comp <- rbind(dcm, cmp1)[, 1]
compound_test <- matrix(0, nrow(test_set), length(compare_comp)) # test indicator matrix
compound_train <- matrix(0, nrow(train_set), length(compare_comp))
for (i in seq_along(compare_comp)) {
  compound_test[which(compare_comp[i] == test_set$Compound), i] <- 1
  compound_train[which(compare_comp[i] == train_set$Compound), i] <- 1
}
The loop does this for both the train set and the test set; compare_comp is the vector of factors.
Is there a function in R that lets me create the same thing without a for loop? I have tried model.matrix(~Compound,data=test_set) without much luck.
Recommended answer
You may not be able to avoid iteration completely, since you are comparing each element of the compare_comp vector to the full Compound vector in both test_set and train_set. You can, however, make the assignment more compact with the apply family of functions.
Specifically, sapply returns a logical matrix (TRUE, FALSE) that we assign, position by position, into the initialized matrices, where TRUE converts to 1 and FALSE to 0.
# SAPPLY AFTER MATRIX INITIALIZATION
compound_test2 <- matrix(0, nrow(test_set), length(compare_comp))
compound_train2 <- matrix(0, nrow(train_set), length(compare_comp))
compound_test2[] <- sapply(compare_comp, function(x) x == test_set$Compound)
compound_train2[] <- sapply(compare_comp, function(x) x == train_set$Compound)
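To make the coercion step concrete, here is a minimal sketch on toy data. The values in compare_comp and test_set are made up for illustration and are not from the question; hits and m are hypothetical names.

```r
# Toy data (hypothetical values, for illustration only)
compare_comp <- c(10, 20, 30)
test_set <- data.frame(Compound = c(20, 10, 30, 20))

# sapply returns a logical matrix: one row per observation, one column per compound
hits <- sapply(compare_comp, function(x) x == test_set$Compound)

# Assigning through m[] keeps the dimensions and numeric storage of m,
# so TRUE/FALSE are stored as 1/0
m <- matrix(0, nrow(test_set), length(compare_comp))
m[] <- hits
m
```

Each row of m should contain a single 1, in the column of the compound that the row matches.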
Alternatively, the less commonly used and lesser-known vapply (similar to sapply, but the output type must be declared) returns an equivalent matrix directly as numeric type. The declared template must have the same length as each result of the function, which here is the number of rows in the data set.
# VAPPLY WITHOUT MATRIX INITIALIZATION
# FUN.VALUE is a template for one result: one value per row of the data set
compound_test3 <- vapply(compare_comp, function(x) x == test_set$Compound,
                         numeric(nrow(test_set)))
compound_train3 <- vapply(compare_comp, function(x) x == train_set$Compound,
                          numeric(nrow(train_set)))
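Because vapply checks every result against its FUN.VALUE template, the template needs one element per row of the data set, not one per compound. The following sketch, on made-up data where the two counts differ, shows the resulting dimensions and what happens with a template of the wrong length. The names ind and bad are hypothetical.

```r
# Toy data (hypothetical): 3 compounds, 5 observations
compare_comp <- c(10, 20, 30)
test_set <- data.frame(Compound = c(20, 10, 30, 20, 10))

# Each call of the function returns 5 values, so the template has length 5
ind <- vapply(compare_comp, function(x) x == test_set$Compound,
              numeric(nrow(test_set)))
dim(ind)           # rows = nrow(test_set), columns = length(compare_comp)
storage.mode(ind)  # logical results are promoted to the template's type

# A template of the wrong length raises an error instead of being recycled
bad <- try(vapply(compare_comp, function(x) x == test_set$Compound,
                  numeric(length(compare_comp))), silent = TRUE)
inherits(bad, "try-error")
```

If compare_comp were a character vector, vapply would also use it for column names by default; passing USE.NAMES = FALSE avoids that when an unnamed matrix is wanted.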
Testing with random data confirms that both versions are identical to your looped version (compound_test1 and compound_train1 are the matrices built by the loop):
identical(compound_test1, compound_test2)
identical(compound_train1, compound_train2)
# [1] TRUE
# [1] TRUE
identical(compound_test1, compound_test3)
identical(compound_train1, compound_train3)
# [1] TRUE
# [1] TRUE
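A check like the one above can be reproduced with a small self-contained sketch. The random data here is an assumption made for illustration, not the data used in the original test, and the outer() variant at the end is an additional alternative that is not part of the answer above.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical stand-ins for the objects in the question
compare_comp <- 1:8
test_set <- data.frame(Compound = sample(compare_comp, 20, replace = TRUE))

# Loop version from the question
compound_test1 <- matrix(0, nrow(test_set), length(compare_comp))
for (i in seq_along(compare_comp)) {
  compound_test1[which(compare_comp[i] == test_set$Compound), i] <- 1
}

# sapply version: logical matrix assigned into an initialized numeric matrix
compound_test2 <- matrix(0, nrow(test_set), length(compare_comp))
compound_test2[] <- sapply(compare_comp, function(x) x == test_set$Compound)

# vapply version: numeric template with one value per row
compound_test3 <- vapply(compare_comp, function(x) x == test_set$Compound,
                         numeric(nrow(test_set)))

# Alternative (not from the answer): outer() compares every pair at once,
# and adding 0 converts the logical matrix to numeric
compound_test4 <- outer(test_set$Compound, compare_comp, "==") + 0

identical(compound_test1, compound_test2)
identical(compound_test1, compound_test3)
identical(compound_test1, compound_test4)
```

The same comparison applies to train_set by swapping in its Compound column.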