Efficient R code for finding indices associated with unique values in vector


Problem description

Suppose I have a vector vec <- c("D","B","B","C","C").

My objective is to end up with a list of length length(unique(vec)), where element i of this list is a vector of the indices at which unique(vec)[i] occurs in vec.

For example, this list for vec would return:

exampleList <- list()
exampleList[[1]] <- c(1) #Since "D" is the first element
exampleList[[2]] <- c(2,3) #Since "B" is the 2nd/3rd element.
exampleList[[3]] <- c(4,5) #Since "C" is the 4th/5th element.

I tried the following approach but it's too slow. My example is large so I need faster code:

vec <- c("D","B","B","C","C")
uniques <- unique(vec)
exampleList <- lapply(seq_along(uniques), function(i) {
    which(vec==uniques[i])
})
exampleList

Solution

Update: The behaviour DT[, list(list(.)), by=.] sometimes resulted in wrong results in R version >= 3.1.0. This is now fixed in commit #1280 in the current development version of data.table v1.9.3. From NEWS:

  • DT[, list(list(.)), by=.] returns correct results in R >=3.1.0 as well. The bug was due to recent (welcoming) changes in R v3.1.0 where list(.) does not result in a copy. Closes #481.


Using data.table is about 15x faster than tapply:

library(data.table)

vec <- c("D","B","B","C","C")

dt = as.data.table(vec)[, list(list(.I)), by = vec]
dt
#   vec  V1
#1:   D   1
#2:   B 2,3
#3:   C 4,5

# to get it in the desired format
# (perhaps in the future data.table's setnames will work for lists instead)
setattr(dt$V1, 'names', dt$vec)
dt$V1
#$D
#[1] 1
#
#$B
#[1] 2 3
#
#$C
#[1] 4 5
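
For readers who cannot install data.table, the same named list can be sketched in base R. This is not part of the original answer and no timing is claimed for it; the name index_list is made up for illustration. The idea is to split the positions seq_along(vec) by the values of vec, using a factor whose levels follow unique(vec) so the groups come out in order of first appearance:

```r
# Base R sketch (an assumption-labelled alternative, not the answer's method).
vec <- c("D", "B", "B", "C", "C")

# factor(..., levels = unique(vec)) keeps groups in order of first appearance;
# splitting by the raw character vector would order them alphabetically.
index_list <- split(seq_along(vec), factor(vec, levels = unique(vec)))

index_list$B  # positions of "B" in vec: 2 3
```

The result is an ordinary named list, so index_list[["C"]] or index_list[[3]] both return the positions of "C".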

Speed tests:

vec = sample(letters, 1e7, replace = TRUE)

system.time(tapply(seq_along(vec), vec, identity)[unique(vec)])
#   user  system elapsed 
#   7.92    0.35    8.50 

system.time({dt = as.data.table(vec)[, list(list(.I)), by = vec]; setattr(dt$V1, 'names', dt$vec); dt$V1})
#   user  system elapsed 
#   0.39    0.09    0.49 
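
Before trusting a timing comparison, it may be worth confirming on a smaller input that a grouped approach returns the same indices as the straightforward which() loop from the question. The following is a minimal sketch using base R only; the names small_vec, reference, grouped and same, the seed, and the sample size are all arbitrary choices for illustration, and timings will differ by machine:

```r
# Sanity-check sketch: base R only, on a much smaller input than the 1e7 benchmark.
set.seed(42)                                    # arbitrary seed, for reproducibility
small_vec <- sample(letters, 1e4, replace = TRUE)
uniques <- unique(small_vec)

# Reference result: one full scan of the vector per unique value,
# as in the question's approach.
reference <- lapply(uniques, function(u) which(small_vec == u))
names(reference) <- uniques

# Grouped result via tapply, as in the first timing above.
grouped <- tapply(seq_along(small_vec), small_vec, identity)[uniques]

# tapply returns a list-array rather than a plain list,
# so compare the two results element by element.
same <- all(vapply(uniques,
                   function(u) identical(grouped[[u]], reference[[u]]),
                   logical(1)))
same  # TRUE when both approaches agree on every unique value
```

The same element-by-element comparison could be pointed at the data.table result (dt$V1 after setattr) if that package is available.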
