Memory-efficient way of passing large objects in R


Question

I have a function that needs to access a variable in its parent environment (the scope from which the function is called). The variable is large in terms of memory, so I would prefer not to pass it by value to the function being called. Is there a standard way of doing this other than declaring the variable in the global scope? For example:

g <- function(a, b) {
    # do stuff
}

f <- function(x) {
    y <- 3 #but in my program y is very large
    g(x, y)
}

I would like to access y in g(). So something like this:

g <- function (a) { a+y }

f <- function(x) {
    y <- 3 #but in my program y is very large
    g(x)
}

Is this possible?

Thanks

Solution

There is no advantage to "declaring the variable in the global scope" and it may not even be possible in R depending on what you mean by that. You certainly could use the second form. The action that causes duplicate or even triplicate copies of an object is assignment. You will need to describe in more detail what you are trying to illustrate by the code: y <- 3. That would not normally be needed inside a function that merely accessed an object named "y" that was located in an enclosing frame.
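A minimal sketch of the second form, under one assumption that is not in the question: g is defined inside f. With g defined at top level, as in the question, R's lexical scoping would look for y in the global environment rather than in f's frame. Defining g inside f makes f's frame the enclosing environment of g, so y is found without being passed.

```r
f <- function(x) {
    y <- 3  # stands in for the large object
    # g is created inside f, so f's frame is g's enclosing environment
    g <- function(a) {
        a + y  # y is looked up in f's frame; it is not passed as an argument
    }
    g(x)
}

result <- f(2)
print(result)
```

Passing y as an argument, as in the first form, does not by itself copy it either; R copies an object only when one of the references to it is modified, which fits the point above that assignment is what triggers copies.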

Storing variables in a declared environment will sometimes improve efficiency of access, but my understanding is that the gain is in speed, because a hash table is used. One accesses items in an environment in the same manner as one accesses list elements:

> evn <- new.env()
> evn$a <- rnorm(100000)
> ls(evn)
[1] "a"
> length(evn$a)
[1] 100000
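The session above can be extended into a small sketch of handing the environment itself to a function. The helper names mean_of_a and add_flag are hypothetical, chosen only for illustration. Environments have reference semantics, so the function receives the same environment rather than a copy, and a change made inside the function is visible to the caller.

```r
# Hypothetical helper: reads the large vector from the environment it is given
mean_of_a <- function(env) {
    mean(env$a)
}

# Hypothetical helper: modifies the environment in place
add_flag <- function(env) {
    env$checked <- TRUE
    invisible(NULL)
}

evn <- new.env()
evn$a <- rnorm(100000)  # stands in for the large object

m <- mean_of_a(evn)     # evn is passed by reference; evn$a is not duplicated
add_flag(evn)           # the caller's environment now contains "checked"
print(ls(evn))
```

The list-style accessor evn[["a"]] works as well as evn$a.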

The BigMemory project may offer facilities for this: http://www.bigmemory.org/ . It and Lumley's biglm may help with the large dataset mentioned in the comments.
