Randomly selecting values from an existing matrix after adding a vector (in R)
Problem description
Thank you so much for your help in advance!
I am trying to modify an existing matrix such that, when a new row is added to the matrix, values are removed from the preexisting rows.
For example, I have the matrix:
[,1] [,2] [,3] [,4]
1 1 0 0
0 1 0 0
1 0 1 0
0 0 1 1
I want to add another vector, I.vec, which contains two 1s (I.vec=c(0,1,1,0)).
This is easy enough to do. I just rbind it to the matrix.
Now, for every column where I.vec is equal to 1, I want to randomly select a value from the other rows and make it zero.
Ideally, this would end up with a matrix like:
[,1] [,2] [,3] [,4]
1 0 0 0
0 1 0 0
1 0 0 0
0 0 1 1
0 1 1 0
But each time I run the iteration, I want it to randomly sample again.
This is what I have tried:
mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I<-rbind(mat1,I.vec)
mat.I.r<-mat.I
d1<-mat.I[,which(mat.I[5,]==1)]
mat.I.r[sample(which(d1[1:4]==1),1),which(mat.I[5,]==1)]<-0
But this only deletes one of the two values I would like to delete. I have also tried variations on subsetting the matrix, but I have not been successful.
Thank you again!
Recommended answer
There is a little bit of ambiguity in the description from the OP, so two solutions are suggested:
I'll just alter the original function (see below). The change is to the line defining rows. I now have (there was a bug in the original; the version below is revised to deal with it):
rows <- sapply(seq_along(cols),
               function(x, mat, cols) {
                   ones <- which(mat[, cols[x]] == 1L)
                   out <- if(length(ones) == 1L) {
                       ones
                   } else {
                       sample(ones, 1)
                   }
                   out
               }, mat = mat, cols = cols)
Basically, what this does is, for each column in which we need to swap a 1 to a 0, work out which rows of the column contain 1s and sample one of these.
Edit: We have to handle the case where there is only a single 1 in a column. If we just sample from a length 1 vector holding the number n, R's sample() will treat it as if we wanted to sample from the set seq_len(n), not from the length 1 set containing n. We handle this now with an if, else statement.
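The sample() behaviour described above is easy to reproduce. The following is a minimal sketch, not part of the original answer; the helper name pick_one is hypothetical, and it follows the sample.int() pattern that ?sample suggests for this situation:

```r
## sample() given a single positive number n draws from 1:n, not from {n}
set.seed(1)
draws <- replicate(200, sample(4, 1))
table(draws)   # draws can be any of 1, 2, 3, 4

## hypothetical helper: always indexes into x, so a length-1 x returns x itself
pick_one <- function(x) x[sample.int(length(x), 1)]

pick_one(4)          # always 4
pick_one(c(3, 4))    # either 3 or 4
```

A helper like this could replace the if, else branch, at the cost of one more function to define.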
We have to do this individually for each column so we get the correct rows. I suppose we could do some nice manipulation to avoid repeated calls to which() and sample(), but how escapes me at the moment, because we do have to handle the case where there is only one 1 in the column. Here's the finished function (updated to handle the length 1 sample bug in the original):
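foo <- function(mat, vec) {
    nr <- nrow(mat)
    nc <- ncol(mat)
    ## columns in which a 1 must be swapped to a 0
    cols <- which(vec == 1L)
    ## for each such column, pick one row that currently holds a 1
    rows <- sapply(seq_along(cols),
                   function(x, mat, cols) {
                       ones <- which(mat[, cols[x]] == 1L)
                       ## guard against sample() treating a single number n as 1:n
                       out <- if(length(ones) == 1L) {
                           ones
                       } else {
                           sample(ones, 1)
                       }
                       out
                   }, mat = mat, cols = cols)
    ## linear (column-major) index of each chosen row, column pair
    ind <- (nr*(cols-1)) + rows
    mat[ind] <- 0
    mat <- rbind(mat, vec)
    rownames(mat) <- NULL
    mat
}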
And here it is in action:
> set.seed(2)
> foo(mat1, ivec)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 1 1 0
and it works when there is only one 1 in a column we want to do a swap in:
> foo(mat1, c(0,0,1,1))
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 0
[5,] 0 0 1 1
Original Answer: Assuming any value in a relevant column can be set to zero
Here is a vectorised answer, where we treat the matrix as a vector when doing the replacement. Using the example data:
mat1 <- matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1), byrow = TRUE, nrow = 4)
ivec <- c(0,1,1,0)
## Set a seed to make reproducible
set.seed(2)
## number of rows and columns of our matrix
nr <- nrow(mat1)
nc <- ncol(mat1)
## which of ivec are 1L
cols <- which(ivec == 1L)
## sample length(cols) row indices, with replacement
## so same row can be drawn more than once
rows <- sample(seq_len(nr), length(cols), replace = TRUE)
## Compute the linear index of each row/column combination,
## as if we treated mat1 as a vector (column-major order)
ind <- (nr*(cols-1)) + rows
## ind should be of length length(cols)
## copy for illustration
mat2 <- mat1
## replace the indices we want with 0, note sub-setting as a vector
mat2[ind] <- 0
## bind on ivec
mat2 <- rbind(mat2, ivec)
This gives us
> mat2
[,1] [,2] [,3] [,4]
1 0 0 0
0 1 0 0
1 0 0 0
0 0 1 1
ivec 0 1 1 0
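As a cross-check on the index arithmetic, the same cells can be addressed with a two-column matrix of (row, column) pairs, which R also accepts as a matrix index. This is a small sketch rather than part of the original answer; the row indices are fixed by hand here (an assumption for illustration) instead of being sampled:

```r
mat1 <- matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1), byrow = TRUE, nrow = 4)
rows <- c(1, 3)   # example row indices, fixed rather than sampled
cols <- c(2, 3)   # columns where ivec is 1

## linear (column-major) index, as computed in the answer
ind <- (nrow(mat1) * (cols - 1)) + rows

a <- mat1
a[ind] <- 0                  # vector-style indexing
b <- mat1
b[cbind(rows, cols)] <- 0    # (row, column) pair indexing

identical(a, b)   # TRUE
```

Both forms zero the same cells; the cbind() form avoids computing the offsets by hand.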
If I were doing this more than once or twice, I'd wrap this in a function:
foo <- function(mat, vec) {
    nr <- nrow(mat)
    nc <- ncol(mat)
    cols <- which(vec == 1L)
    rows <- sample(seq_len(nr), length(cols), replace = TRUE)
    ind <- (nr*(cols-1)) + rows
    mat[ind] <- 0
    mat <- rbind(mat, vec)
    rownames(mat) <- NULL
    mat
}
which gives:
> foo(mat1, ivec)
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 1 1 0
If you wanted to do this for multiple ivecs, growing mat1 each time, then you probably don't want to do that in a loop, as growing objects is slow (it involves copies etc). But you could just modify the definition of ind to include the extra n rows you bind on for the n ivecs.
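One way to read this suggestion is to allocate the final matrix once and fill it in place. The sketch below is an assumption-laden illustration, not the answer's own code: the function name foo_many and the list argument vecs are hypothetical, it follows the variant that only swaps an existing 1 to a 0, and it silently skips a column that currently holds no 1s.

```r
## hypothetical helper: vecs is a list of 0/1 vectors to bind on, in order
foo_many <- function(mat, vecs) {
  nr0 <- nrow(mat)
  ## allocate all rows up front instead of calling rbind() repeatedly
  out <- matrix(0, nrow = nr0 + length(vecs), ncol = ncol(mat))
  out[seq_len(nr0), ] <- mat
  for (i in seq_along(vecs)) {
    filled <- nr0 + i - 1                    # number of rows filled so far
    for (j in which(vecs[[i]] == 1)) {
      ones <- which(out[seq_len(filled), j] == 1)
      if (length(ones) > 0) {                # assumption: skip columns with no 1s
        out[ones[sample.int(length(ones), 1)], j] <- 0
      }
    }
    out[filled + 1, ] <- vecs[[i]]           # write the new row in place
  }
  out
}

mat1 <- matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1), byrow = TRUE, nrow = 4)
set.seed(2)
res <- foo_many(mat1, list(c(0,1,1,0), c(1,0,0,1)))
dim(res)   # 6 4
```

Indexing with sample.int(length(ones), 1) also sidesteps the length 1 sample() issue without a separate if, else branch.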