Getting rows of a matrix which coincide with a series of vectors, without using apply
Question
My question is somewhat related to a previous question of mine.
Suppose I have one matrix and 4 vectors (these can be treated as another matrix, since the order of the vectors matters), and I want to get the row numbers that coincide with each vector, in order. I would like the solution to avoid repeating vectors and be as efficient as possible, since the problem is large scale.
Example:
set.seed(1)
M = matrix(rpois(50,5),5,10)
v1 = c(3, 2, 7, 7, 4, 4, 7, 4, 5, 6)
v2= c(8, 6, 4, 4, 3, 8, 3, 6, 5, 6)
v3= c(4, 8, 3, 5, 9, 4, 5, 6, 7 ,7)
v4= c(4, 9, 3, 6, 3, 1, 5, 7,6, 1)
Vmat = cbind(v1,v2,v3,v4)
M
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4 8 3 5 9 4 5 6 7 7
[2,] 4 9 3 6 3 1 5 7 6 1
[3,] 5 6 6 11 6 4 5 2 7 5
[4,] 8 6 4 4 3 8 3 6 5 6
[5,] 3 2 7 7 4 4 7 4 5 6
Vmat
v1 v2 v3 v4
[1,] 3 8 4 4
[2,] 2 6 8 9
[3,] 7 4 3 3
[4,] 7 4 5 6
[5,] 4 3 9 3
[6,] 4 8 4 1
[7,] 7 3 5 5
[8,] 4 6 6 7
[9,] 5 5 7 6
[10,] 6 6 7 1
The output should be:
5 4 1 2
Recommended answer
Similar to @user295691's answer, we merge, but now using the which=TRUE option of a data.table join, which returns the matching row numbers of M instead of the joined rows:
set.seed(1)
matdata <- create_data(1e6,20,1e5) # using @user295691's example data
library(data.table)
M = as.data.table(matdata$M)
V = as.data.table(matdata$V)
r <- M[V, on=names(V), which=TRUE]
To verify that it is correct:
V[1,]
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
# 1: 7 5 3 2 5 6 3 3 5 5 3 2 4 9 4 4 3 6 4 3
M[r[1],]
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
# 1: 7 5 3 2 5 6 3 3 5 5 3 2 4 9 4 4 3 6 4 3
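As a small sanity check, the same join can be sketched on the example from the question. Here M is typed in from the printed matrix rather than regenerated with rpois, and the names M_dt, V_dt and r_small are introduced only for this illustration. It assumes the data.table package is installed.

```r
library(data.table)

# The 5 x 10 matrix printed in the question, entered row by row
M <- rbind(
  c(4, 8, 3,  5, 9, 4, 5, 6, 7, 7),
  c(4, 9, 3,  6, 3, 1, 5, 7, 6, 1),
  c(5, 6, 6, 11, 6, 4, 5, 2, 7, 5),
  c(8, 6, 4,  4, 3, 8, 3, 6, 5, 6),
  c(3, 2, 7,  7, 4, 4, 7, 4, 5, 6)
)

# The four query vectors, stored as columns as in the question
Vmat <- cbind(
  v1 = c(3, 2, 7, 7, 4, 4, 7, 4, 5, 6),
  v2 = c(8, 6, 4, 4, 3, 8, 3, 6, 5, 6),
  v3 = c(4, 8, 3, 5, 9, 4, 5, 6, 7, 7),
  v4 = c(4, 9, 3, 6, 3, 1, 5, 7, 6, 1)
)

# One row per vector, so both tables get the default column names V1..V10
M_dt <- as.data.table(M)
V_dt <- as.data.table(t(Vmat))

# Join on all columns; which = TRUE returns row numbers of M_dt,
# in the order of the rows of V_dt
r_small <- M_dt[V_dt, on = names(V_dt), which = TRUE]
print(r_small)
# [1] 5 4 1 2
```

Every row of M is unique here, so each vector yields exactly one row number. If M could contain duplicated rows, the join would return every matching row; data.table's mult = "first" argument is intended for keeping only the first match per vector, which would mirror what match() does.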
Benchmarks
OP's example data (from a deleted answer):
set.seed(1)
NM = 1e6
NV = 1e5
Ncols = 20
MM = matrix(rpois(NM*Ncols,Ncols),NM,Ncols)
rows=sample(NM,NV,replace = FALSE)
Vmat=t(MM[rows,])
# converted to data.frames, because why not?
M = as.data.frame(MM)
V = as.data.frame(t(Vmat))
# converted to data.tables
M2 = setDT(copy(M))
V2 = setDT(copy(V))
Functions to test:
match_strings <- function(){
m = do.call(function(...) paste(...,sep="_"), M)
v = do.call(function(...) paste(...,sep="_"), V)
match(v,m)
}
merge_df <- function(){ # from @user295691's answer
M$mid = seq(nrow(M))
V$vid = seq(nrow(V))
with(merge(M,V), mid[order(vid)])
}
merge_dt <- function(){
M2[V2, on=names(V2), which=TRUE]
}
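For intuition about what match_strings does, here is a minimal base-R sketch on the small example from the question. The helper row_keys and the variable names are made up for this illustration, and M is typed in from the printed matrix rather than regenerated.

```r
# The 5 x 10 matrix printed in the question, entered row by row
M <- rbind(
  c(4, 8, 3,  5, 9, 4, 5, 6, 7, 7),
  c(4, 9, 3,  6, 3, 1, 5, 7, 6, 1),
  c(5, 6, 6, 11, 6, 4, 5, 2, 7, 5),
  c(8, 6, 4,  4, 3, 8, 3, 6, 5, 6),
  c(3, 2, 7,  7, 4, 4, 7, 4, 5, 6)
)

# The four query vectors, stored as columns as in the question
Vmat <- cbind(
  v1 = c(3, 2, 7, 7, 4, 4, 7, 4, 5, 6),
  v2 = c(8, 6, 4, 4, 3, 8, 3, 6, 5, 6),
  v3 = c(4, 8, 3, 5, 9, 4, 5, 6, 7, 7),
  v4 = c(4, 9, 3, 6, 3, 1, 5, 7, 6, 1)
)

# Hypothetical helper: collapse each row into one string key, e.g. "4_8_3_..."
row_keys <- function(x) do.call(paste, c(as.data.frame(x), sep = "_"))

m_keys <- row_keys(M)        # one key per row of M
v_keys <- row_keys(t(Vmat))  # one key per query vector

# match() gives, for each vector, the first row of M with the same key
r_small <- match(v_keys, m_keys)
print(r_small)
# [1] 5 4 1 2
```

The separator keeps adjacent values from running together (without it, 1 followed by 11 and 11 followed by 1 would both paste to "111"). Building one string per row is also where most of the time goes in this approach on large inputs.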
Results:
system.time({r_strings = match_strings()})
# user system elapsed
# 10.40 0.06 10.49
system.time({r_merge_df = merge_df()})
# user system elapsed
# 14.71 0.10 14.84
system.time({r_merge_dt = merge_dt()})
# user system elapsed
# 0.39 0.00 0.40
identical(r_strings,r_merge_df) # TRUE
identical(r_strings,r_merge_dt) # TRUE