我可以在R中使用列表作为散列吗？如果是这样，为什么这么慢？ [英] Can I use a list as a hash in R? If so, why is it so slow?

查看：78 发布时间：2018/6/1 15:20:04 r perl hash

本文介绍了我可以在R中使用列表作为散列吗？如果是这样，为什么这么慢？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在使用R之前，我使用了相当多的Perl。在Perl中，我经常使用散列，并且查找散列在Perl中通常被认为是快速的。

例如，下面的代码将填充高达10000的散列键/值对，其中键是随机字母，值是随机整数。然后，它在该散列中进行10000次随机查找。

 ＃！/ usr / bin / perl -w 
 use严格; 
 
 my @letters =（'a'..'z'）; 
 
打印@letters。 \\\
; 
 my％testHash; （my $ i = 0; $ i <10000; $ i ++）{
 my $ r1 = int（rand（26））; 
 
; 
 my $ r2 = int（rand（26））; 
 my $ r3 = int（rand（26））; 
 my $ key = $信函[$ r1]。 $字母[$ r2]。 $信[$ R 3]。 
 my $ value = int（rand（1000））; 
 $ testHash {$ key} = $ value; 
} 
 
 my @keyArray = keys（％testHash）; 
 my $ keyLen =标量@keyArray; 
 $ b $ for（my $ j = 0; $ j <10000; $ j ++）{
 my $ key = $ keyArray [int（rand（$ keyLen））]]; 
 my $ lookupValue = $ testHash {$ key}; 
打印钥匙。 $键。 查找$ lookupValue \\\
; 
}

现在越来越多，我想要在类似哈希的数据结构R.以下是等效的R代码：

  testHash<  -  list（）
 
 for （我在1：10000）{
 key.tmp = paste（字母[floor（26 * runif（3））]，sep =）
 key<  -  capture.output（cat key.tmp，sep =））
 value<  -  floor（1000 * runif（1））
 testHash [[key]]  -  value 
 
 
 keyArray<  -  attributes（testHash）$ names 
 keyLen = length（keyArray）; 
 
 for（j in 1：10000）{
 key<  -  keyArray [floor（keyLen * runif（1））] 
 lookupValue = testHash [[key]] 
 print（paste（key，key，Lookup，lookupValue））
}

这段代码似乎在做同样的事情。然而，Perl的速度要快得多：

 >时间./perlHashTest.pl 
真实0m4.346s 
用户** 0m0.110s ** 
 sys 0m0.100s

与R：

  time R CMD BATCH RHashTest.R 
 
 real 0m8.210s 
 user ** 0m7.630s ** 
 sys 0m0.200s

什么解释了这种差异？ R列表中的查找不好吗？

增加到100,000个列表长度和100,000个查找只会夸大这个差异？对于R中的散列数据结构，有没有比本地列表（）更好的选择？

解决方案

具有命名元素的列表不被散列。哈希查找是O（1），因为在插入期间，使用散列函数将密钥转换为整数，然后将值放入空间散列（键）％num_spots 数组 num_spots long（这是一个大的简化，避免了处理冲突的复杂性）。查找密钥只需要对密钥进行散列查找值的位置（与O（n）数组查找相比，它是O（1））。 R列表使用O（n）的名称查找。

正如Dirk所说，使用散列包。与此相关的一个巨大局限是它使用环境（被哈希）和重写 [方法来模拟哈希表。但是一个环境不能包含其他环境，所以你不能使用散列函数进行嵌套散列。

后来我开始在C / R中实现一个可以嵌套的纯哈希表数据结构，但是当我在其他工作环境中工作时，的东西。尽管如此： - ）

Before using R, I used quite a bit of Perl. In Perl, I would often use hashes, and lookups of hashes are generally regarded as fast in Perl.

For example, the following code will populate a hash with up to 10000 key/value pairs, where the keys are random letters and the values are random integers. Then, it does 10000 random lookups in that hash.

#!/usr/bin/perl -w
use strict;

my @letters = ('a'..'z');

print @letters . "\n";
my %testHash;

for(my $i = 0; $i < 10000; $i++) {
    my $r1 = int(rand(26));
    my $r2 = int(rand(26));
    my $r3 = int(rand(26));
    my $key = $letters[$r1] . $letters[$r2] . $letters[$r3];
    my $value = int(rand(1000));
    $testHash{$key} = $value;
}

my @keyArray = keys(%testHash);
my $keyLen = scalar @keyArray;

for(my $j = 0; $j < 10000; $j++) {
    my $key = $keyArray[int(rand($keyLen))];
    my $lookupValue = $testHash{$key};
    print "key " .  $key . " Lookup $lookupValue \n";
}

Now that increasingly, I am wanting to have a hash-like data structure in R. The following is the equivalent R code:

testHash <- list()

for(i in 1:10000) {
  key.tmp = paste(letters[floor(26*runif(3))], sep="")
  key <- capture.output(cat(key.tmp, sep=""))
  value <- floor(1000*runif(1))
  testHash[[key]] <- value
}

keyArray <- attributes(testHash)$names
keyLen = length(keyArray);

for(j in 1:10000) {
  key <- keyArray[floor(keyLen*runif(1))]
  lookupValue = testHash[[key]]
  print(paste("key", key, "Lookup", lookupValue))
}

The code seem to be doing equivalent things. However, the Perl one is much faster:

>time ./perlHashTest.pl
real    0m4.346s
user    **0m0.110s**
sys 0m0.100s

Comparing to R:

time R CMD BATCH RHashTest.R

real    0m8.210s
user    **0m7.630s**
sys 0m0.200s

What explains the discrepancy? Are lookups in R lists just not good?

Increasing to 100,000 list length and 100,000 lookups only exaggerates the discrepancy? Is there a better alternative for the hash data structure in R than the native list()?

解决方案

The underlying reason is that R lists with named elements are not hashed. Hash lookups are O(1), because during insert the key is converted to an integer using a hash function, and then the value put in the space hash(key) % num_spots of an array num_spots long (this is a big simplification and avoids the complexity of dealing with collisions). Lookups of the key just require hashing the key to find the value's position (which is O(1), versus a O(n) array lookup). R lists use name lookups which are O(n).

As Dirk says, use the hash package. A huge limitation with this is that it uses environments (which are hashed) and overriding of [ methods to mimic hash tables. But an environment cannot contain another environment, so you cannot have nested hashes with the hash function.

A while back I worked on implementing a pure hash table data structure in C/R that could be nested, but it went on my project back burner while I worked on other things. It would be nice to have though :-)

这篇关于我可以在R中使用列表作为散列吗？如果是这样，为什么这么慢？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

我可以在R中使用列表作为散列吗？如果是这样，为什么这么慢？ [英] Can I use a list as a hash in R? If so, why is it so slow?

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

我可以在R中使用列表作为散列吗？如果是这样，为什么这么慢？ [英] Can I use a list as a hash in R? If so, why is it so slow?

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭