biglm predict unable to allocate a vector of size xx.x MB

Question

I have this code:

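library(biglm)
library(ff)

# Training data is read into a disk-backed ffdf; test data into an in-memory data.frame
myData <- read.csv.ffdf(file = "myFile.csv")
testData <- read.csv(file = "test.csv")

# Regress 'dependent' on all other columns
form <- dependent ~ .
model <- biglm(form, data = myData)

# This is the call that runs out of memory
predictedData <- predict(model, newdata = testData)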

The model is created without problems, but when I make the prediction, it runs out of memory:

unable to allocate a vector of size xx.x MB

Any hints? Or how can I use ff to reserve memory for the predictedData variable?

Recommended answer

I have not used the biglm package before. Based on what you said, you ran out of memory when calling predict, and your new dataset has nearly 7,000,000 rows.

To resolve the memory issue, prediction must be done chunk-wise. For example, you iteratively predict 20,000 rows at a time. I am not sure whether predict.bigglm can do chunk-wise prediction.
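
As a rough sketch of what chunk-wise prediction could look like: the helper below, predict_in_chunks, is a hypothetical name (it is not part of biglm or ff) and simply calls the generic predict on consecutive blocks of rows. It is demonstrated here with base R's lm on synthetic data; whether it works unchanged with a biglm model is an assumption I have not verified.

```r
# Hypothetical helper: predict in fixed-size row blocks so that only one
# block's model matrix has to exist in memory at a time.
predict_in_chunks <- function(model, newdata, chunk_size = 20000L, ...) {
  n <- nrow(newdata)
  if (n == 0L) return(numeric(0))
  out <- numeric(n)                       # preallocate the result once
  for (s in seq(1L, n, by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1L, n)  # row indices of the current block
    block <- newdata[idx, , drop = FALSE]
    out[idx] <- as.numeric(predict(model, newdata = block, ...))
  }
  out
}

# Toy demonstration with lm(); the data are synthetic.
set.seed(1)
train <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
train$dependent <- 1 + 2 * train$x1 - train$x2 + rnorm(1000)
test <- data.frame(x1 = rnorm(5500), x2 = rnorm(5500))

fit <- lm(dependent ~ ., data = train)
pred_chunked <- predict_in_chunks(fit, test, chunk_size = 1000L)
```

The chunk size trades memory for loop overhead. If even the result vector is too large for RAM, each block could be written to disk instead of being stored in out.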

Why not have a look at the mgcv package? Its bam function can fit linear models, generalized linear models, generalized additive models, and so on, for large data sets. Similar to biglm, it performs chunk-wise matrix factorization when fitting the model. In addition, predict.bam supports chunk-wise prediction, which is really useful in your case. It can also do parallel model fitting and prediction, backed by the parallel package [use the cluster argument of bam(); see the examples under ?bam and ?predict.bam].

Just do library(mgcv), and check ?bam and ?predict.bam.
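
A minimal sketch of that route, assuming a purely parametric model and synthetic data (the column names x1 and x2 are made up for illustration). It uses the chunk.size argument of bam and the block.size argument of predict.bam; the values chosen here are arbitrary.

```r
library(mgcv)

# Synthetic stand-ins for the training and test sets
set.seed(1)
train <- data.frame(x1 = rnorm(5000), x2 = rnorm(5000))
train$dependent <- 1 + 2 * train$x1 - train$x2 + rnorm(5000)
test <- data.frame(x1 = rnorm(20000), x2 = rnorm(20000))

# No smooth terms, so this is an ordinary linear model;
# the model matrix is processed chunk.size rows at a time during fitting
fit <- bam(dependent ~ x1 + x2, data = train, chunk.size = 1000)

# predict() dispatches to predict.bam, which handles newdata
# in blocks of block.size rows
pred <- predict(fit, newdata = test, block.size = 5000)
```

The formula is written out explicitly here rather than as dependent ~ ., since I have not checked how bam handles the dot shorthand.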

Remark

Do not use the nthreads argument for parallelism. It is not useful for parametric regression.
