xlConnect R 使用 JVM 内存 [英] xlConnect R use of JVM memory

查看:30
本文介绍了xlConnect R 使用 JVM 内存的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 R 中使用 XLConnect(Mirai Solutions)时遇到了 JVM 内存问题.

I'm running into a problem with JVM memory using XLConnect (Mirai Solutions) in R.

使用 loadWorkbookreadWorksheetFromFile 将数据加载到 R 中就好了,但是较大的数据(大约 3MB 的数据帧)在导出期间被写入 JVM 时卡住了导出函数(writeNamedRegionwriteWorksheetToFile 等),R 停止响应.

Data loads into R just fine using loadWorkbook or readWorksheetFromFile, but larger data (data frames about 3MB) get stuck while being written to the JVM during export with any of the export functions (writeNamedRegion, writeWorksheetToFile, etc.), and R stops responding.

我已经使用 options(java.parameters = "-Xmx1500m") 重置了 java 参数,这增加了我能够导出到 Excel 的数据框的大小,但 R 仍然减慢大约 1MB 并且不会在大约 3MB 处工作.

I've reset the java parameters using options(java.parameters = "-Xmx1500m"), and this increased the size of the data frames I was able to export to Excel, but R still slows around 1MB and won't work around 3MB.

我在 64 位 Windows 7 系统上使用 32 位 Office 软件和 32 位 Java,在一台 8GB RAM 的机器上.与 JVM 中大约 750 MB 的空闲内存相比,3MB 似乎并不大,据说在导出开始时存在(使用 xlcMemoryReport 检查).

I'm on a 64-bit Windows 7 system with 32-bit Office software and 32-bit Java on a machine with 8GB RAM. 3MB doesn't seem very big compared to the ~750 MB free memory in the JVM that is supposedly there at the beginning of export (checked with xlcMemoryReport).

想法?

推荐答案

鉴于您的参考值为 3MB,我的结论是您正在尝试使用维度为 10 列 x 40k 行(或可比;此类数据帧的对象大小约为 3.2MB).

Given your reference value of 3MB I'm concluding you are trying to write a data.frame with numeric variables of dimension 10 columns x 40k rows (or comparable; the object.size of such a data.frame results in approx. 3.2MB).

根据您是尝试编写 xls (BIFF8) 还是 xlsx (OOXML) 文件,内存要求可能大不相同.原因是 xlsx 文档实际上是压缩的 XML 文件,而 Apache POI(XLConnect 使用的底层 Java API)使用 xmlbeans 来操作这些文件 - 这可能会占用大量内存.另一方面,BIFF8 是一种二进制数据格式,需要较少的内存.

Depending on if you are trying to write xls (BIFF8) or xlsx (OOXML) files, memory requirements can be quite different. Reason being that xlsx documents are actually compressed XML files and Apache POI (which is the underlying Java API that is used by XLConnect) uses xmlbeans to manipulate those - this can be quite memory intense. BIFF8 on the other hand is a binary data format and requires less memory.

您应该能够将前面提到的维度的 data.frame 写入 xlsx 文档中,并且最大.堆大小为 1024m,即以下对我来说效果很好:

You should be able to write a data.frame of before mentioned dimensions to an xlsx document with a max. heap size of 1024m, i.e. the following worked fine for me:

options(java.parameters = "-Xmx1024m") # required BEFORE any JVM is initialized in R
require(XLConnect)
tmp = as.data.frame(matrix(rnorm(4e5), ncol = 10))
writeWorksheetToFile(tmp, file = "test.xlsx", sheet = "test")

...使用 R 2.15.1 32 位 RStudio、XLConnect 0.2-0 和 JRE 1.6.0_25(在 32 位 Windows XP 上运行,4GB 内存).

... using R 2.15.1 32-bit with RStudio, XLConnect 0.2-0 and JRE 1.6.0_25 (running on 32-bit Windows XP with 4GB of RAM).

对于那些有兴趣更深入地讨论 Apache POI 方面的内存使用情况的人,有以下讨论:http://apache-poi.1045710.n5.nabble.com/HSSF-and-XSSF-memory-usage-some-数字-td4312784.html

For those interested in a more in-depth discussion of memory usage on the Apache POI side there is the following discussion: http://apache-poi.1045710.n5.nabble.com/HSSF-and-XSSF-memory-usage-some-numbers-td4312784.html

这篇关于xlConnect R 使用 JVM 内存的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆