Using large hash tables in R

Question

I'm trying to use package hash, which I understand is the most commonly adopted implementation (other than directly using environments).

If I try to create and store hashes larger than ~20MB, I start getting protect(): protection stack overflow errors.

pryr::object_size(hash::hash(1:120000, 1:120000))  # * (see end of post)
#> 21.5 MB
h <- hash::hash(1:120000, 1:120000)
#> Error: protect(): protection stack overflow

If I run the h <- ... command once, the error appears only once. If I run it twice, I get an infinite loop of errors in the console, which freezes RStudio and forces me to restart it from the Task Manager.

From multiple other SO questions, I understand this means I'm creating more pointers than R can protect. This makes sense to me, since hashes are actually just environments (which themselves are just hash tables), so I assume R needs to keep track of each value in the hash table as a separate pointer.

The common solution I've seen for the protect() error is to launch with rstudio.exe --max-ppsize=500000 (which I assume propagates that option to R itself), but it doesn't help in this case: the error remains. This is somewhat surprising, since the hash in the example above has only 120,000 keys/pointers, far fewer than the given ppsize of 500,000.

So, how can I use large hashes in R? I'm assuming changing to pure environments won't help, since hash is really just a wrapper around environments.

* For the record, the hash::hash() call above creates a hash with non-syntactic names, but that's irrelevant: my real case has simple character keys and integer values and shows the same behavior.

Recommended answer

This is a bug in RStudio, not a limitation in R. The bug happens when it tries to examine the h object for display in the environment pane. The bug is on their issue list as https://github.com/rstudio/rstudio/issues/5546 .
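
If the problem lies in RStudio's environment pane rather than in R, one way to check is to build a table of the same size in a plain R session (for example, a script run with Rscript) instead of inside RStudio. The following is only a minimal sketch: it uses base R environments rather than the hash package, the variable names are illustrative, and whether a given RStudio version still shows the error is something to verify locally.

```r
# Minimal sketch (base R only): a 120,000-entry hash table built on an environment.
# Assumption: run in a plain R session, outside RStudio.
n <- 120000L
keys <- as.character(seq_len(n))   # simple character keys
h <- new.env(hash = TRUE, size = n)

# Insert each key/value pair into the environment
for (i in seq_len(n)) {
  assign(keys[i], i, envir = h)
}

n_keys <- length(h)                                      # number of stored keys
value_42 <- h[["42"]]                                    # look up a value by key
has_last <- exists("120000", envir = h, inherits = FALSE) # membership test
```

Since hash::hash objects wrap environments, as the question notes, the same size of table should be constructible with the hash package in such a session, though that is not tested here.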
