R vector size limit: "long vectors (argument 5) are not supported in .C"


Problem description

I have a very large matrix that I'm trying to run through glmnet on a server with plenty of memory. It works fine, even on very large data sets, up to a certain size, after which I get the following error:

Error in elnet(x, ...) : long vectors (argument 5) are not supported in .C

If I understand correctly, this is caused by a limitation in R, which cannot have any vector with a length greater than INT_MAX. Is that correct? Are there any available solutions that don't require a complete rewrite of glmnet? Do any of the alternative R interpreters (Riposte, etc.) address this limitation?

Thanks!

Recommended answer

Since version 3.0.0, R supports long vectors, that is, vectors with more than INT_MAX elements. A long vector is indexed by a double. A long vector can be the base of a matrix or of an array with more than two dimensions, as long as each dimension is small enough to be indexable by an integer. Long vectors cannot be passed to native code via .C or .Fortran. The error message you are getting appears because a long vector is being passed via .C.
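To illustrate why a matrix can need a long vector even though each of its dimensions fits in an integer, here is a minimal C sketch. It assumes a 32-bit `int` (so `INT_MAX` is 2147483647) and column-major storage, as R uses; the function names are hypothetical and are not part of R or glmnet.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of elements in an nrow x ncol matrix, computed in 64 bits so the
   product cannot overflow even when it exceeds INT_MAX. */
int64_t matrix_length(int nrow, int ncol) {
    assert(nrow >= 0 && ncol >= 0);
    return (int64_t)nrow * (int64_t)ncol;
}

/* Returns 1 if the backing vector would hold more than INT_MAX elements,
   i.e. it would have to be a long vector; 0 otherwise. */
int needs_long_vector(int nrow, int ncol) {
    return matrix_length(nrow, ncol) > (int64_t)INT_MAX;
}

/* Zero-based offset of element (i, j) in column-major storage.
   i and j each fit in an int, but the offset needs 64 bits. */
int64_t col_major_offset(int i, int j, int nrow) {
    assert(i >= 0 && j >= 0 && i < nrow);
    return (int64_t)j * (int64_t)nrow + (int64_t)i;
}
```

For example, a 50000 x 50000 matrix has 2.5 billion elements: both dimensions are ordinary integers, yet the last element sits at an offset that no 32-bit index can express.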

Long vectors can be passed via .Call. So, as long as the native code of glmnet supports long vectors (64-bit indexes), or can be modified or recompiled to support them, one would only have to modify the interface between R and the native code of glmnet. You can do this manually in C, and there is also a new package named dotCall64 for this task. Part of modifying the interface is deciding when to copy arguments: .C and .Fortran copy their arguments defensively, but you don't want to do this unnecessarily with large data structures.
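The copy decision can be sketched in plain C, independent of the R API. This is only an illustration under assumptions: the function names are hypothetical, and a real .Call interface would operate on R objects rather than raw pointers. An argument that is only read can be used in place, while an argument that would be modified gets a private copy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Read-only inputs: const pointers are used in place, so no copy is made,
   however large the data is. The length is 64-bit. */
double dot_product(const double *x, const double *y, int64_t n) {
    assert(x != NULL && y != NULL && n >= 0);
    double total = 0.0;
    for (int64_t i = 0; i < n; ++i) {
        total += x[i] * y[i];
    }
    return total;
}

/* Modified data: the result goes into a freshly allocated buffer so the
   caller's input stays intact. Returns NULL if allocation fails; the
   caller owns the result and must free() it. */
double *scaled_copy(const double *x, int64_t n, double factor) {
    assert(x != NULL && n > 0);
    double *out = malloc((size_t)n * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    for (int64_t i = 0; i < n; ++i) {
        out[i] = x[i] * factor;
    }
    return out;
}
```

With a matrix of more than INT_MAX doubles (over 16 GB), every avoided copy saves at least that much memory, which is why the read-only path matters here.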

I think the difficulty of changing the native code of glmnet to support 64-bit indexes depends on the actual code (which I have only looked at but never worked with). It is easy to switch all integers in Fortran code (whether explicitly or implicitly 32-bit) to 64-bit. The trouble comes when some integers have to stay 32-bit, and this will happen, for example, for integer vectors passed to or from R code, because R uses 32-bit integers (even in long vectors). Such integer vectors are passed in glmnet. How hard the modification is then depends on how clean the original Fortran code is (for example, whether it uses separate integer variables for indexing and for accessing the values of integer arrays).
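The separation described above can be sketched in C, with the caveat that glmnet's native code is Fortran and this is not taken from it: the function and parameter names are hypothetical. Values that come from R stay 32-bit, while loop counters and offsets into the large array are 64-bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum selected rows of one column of a column-major double matrix.
   rows holds 32-bit, zero-based row indices (the width R uses for integer
   data). Offsets into x are 64-bit because nrow * ncol may exceed INT_MAX. */
double sum_selected(const double *x, int32_t nrow, int32_t col,
                    const int32_t *rows, int64_t n_rows) {
    assert(x != NULL && rows != NULL && nrow > 0 && col >= 0);
    /* 64-bit offset of the first element of the requested column */
    int64_t base = (int64_t)col * (int64_t)nrow;
    double total = 0.0;
    for (int64_t k = 0; k < n_rows; ++k) {   /* 64-bit loop counter */
        int32_t r = rows[k];                 /* 32-bit value read from the array */
        assert(r >= 0 && r < nrow);
        total += x[base + (int64_t)r];       /* widen before adding to the offset */
    }
    return total;
}
```

Code that already keeps index variables apart from integer data in this way can have its index type widened mechanically; code that reuses one integer variable for both roles needs each use inspected by hand.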

Experimental implementations of subsets of R, like Riposte, will not help.
