fread protection stack overflow error


Question

I'm using fread in data.table (1.8.8, R 3.0.1) in an attempt to read very large files.

The file in question has 313 rows and ~6.6 million columns of numeric data, and is around 12 GB. The machine runs CentOS 6.4 with 512 GB of RAM.

When I try to read in the file:

g=fread('final.results',header=T,sep=' ')
'header' changed by user from 'auto' to TRUE
Error: protect(): protection stack overflow

I tried starting R with --max-ppsize 500000, which is the maximum, but got the same error.

I also tried setting the stack size to unlimited via

ulimit -s unlimited

Virtual memory was already set to unlimited.

Am I being unrealistic with a file of this size? Did I miss something fairly obvious?

Recommended answer

Now fixed in v1.8.9 on R-Forge.



  • An unintended 50,000 column limit has been removed in fread. Thanks to mpmorley for reporting. Test added.

The reason was that I had this in the fread.c source:

// *********************************************************************
// Allocate columns for known nrow
// *********************************************************************
ans=PROTECT(allocVector(VECSXP,ncol));
protecti++;
setAttrib(ans,R_NamesSymbol,names);
for (i=0; i<ncol; i++) {
    thistype  = TypeSxp[ type[i] ];
    thiscol = PROTECT(allocVector(thistype,nrow));   // ** HERE **
    protecti++;
    if (type[i]==SXP_INT64)
        setAttrib(thiscol, R_ClassSymbol, ScalarString(mkChar("integer64")));
    SET_TRUELENGTH(thiscol, nrow);
    SET_VECTOR_ELT(ans,i,thiscol);
}



According to R-exts section 5.9.1, that PROTECT inside the loop isn't needed:


In some cases it is necessary to keep better track of whether protection is really needed. Be particularly aware of situations where a large number of objects are generated. The pointer protection stack has a fixed size (default 10,000) and can become full. It is not a good idea then to just PROTECT everything in sight and UNPROTECT several thousand objects at the end. It will almost invariably be possible to either assign the objects as part of another object (which automatically protects them) or unprotect them immediately after use.

So that PROTECT is now removed and all is well. (It seems that the pointer protection stack limit has been increased to 50,000 since that text was written; Defn.h contains #define R_PPSSIZE 50000L.) I've checked all other PROTECTs in the data.table C source for anything similar, and found and fixed one in assign.c too (when adding more than 50,000 columns by reference). There were no others.
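To make the failure mode concrete, here is a minimal toy model in plain C. It does not use R's headers and is not R's or data.table's actual code: the ProtectStack type and the functions toy_protect, toy_unprotect, protect_every_column and protect_list_only are hypothetical names invented for this illustration. It only counts slots on a fixed-size stack, comparing the pattern in the loop above (one slot per column) with the pattern recommended by R-exts (protect the list once and let it keep its columns reachable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a fixed-size pointer protection stack.
   Hypothetical model for illustration only; it just counts slots. */
typedef struct {
    size_t capacity;  /* maximum number of protected pointers */
    size_t used;      /* slots currently occupied */
} ProtectStack;

/* Take one slot. Returns false where the real stack would report
   "protection stack overflow". */
bool toy_protect(ProtectStack *s) {
    if (s->used >= s->capacity) return false;
    s->used++;
    return true;
}

/* Release the n most recently taken slots. */
void toy_unprotect(ProtectStack *s, size_t n) {
    assert(n <= s->used);
    s->used -= n;
}

/* Pattern of the original loop: protect the result list, then protect
   every column as well. Returns true only if every slot fits. */
bool protect_every_column(size_t capacity, size_t ncol) {
    ProtectStack s = { capacity, 0 };
    if (!toy_protect(&s)) return false;      /* the result list */
    for (size_t i = 0; i < ncol; i++) {
        if (!toy_protect(&s)) return false;  /* one slot per column */
    }
    toy_unprotect(&s, s.used);
    return true;
}

/* Pattern after the fix: only the result list takes a slot. Each column
   is stored in the list right after allocation, so it needs no slot of
   its own and the column count no longer matters. */
bool protect_list_only(size_t capacity, size_t ncol) {
    ProtectStack s = { capacity, 0 };
    (void)ncol;                              /* columns use no slots */
    if (!toy_protect(&s)) return false;      /* the result list */
    toy_unprotect(&s, 1);
    return true;
}
```

Under this model, protect_every_column(50000, 6600000) and protect_every_column(500000, 6600000) both fail, which is consistent with --max-ppsize 500000 not helping for a file with ~6.6 million columns, while protect_list_only succeeds for any column count. The real stack also holds entries from other code, so the exact threshold in practice may differ slightly from this count.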

Thanks for reporting!
