left_join: Error: cannot allocate vector of size "small" Mb

Question

I am working with fairly large data frames; the most extreme one has about 300,000 rows and 1,500 variables. Because of that, I sometimes get this error when working on them:

Error: cannot allocate vector of size x.x Gb

Usually this means I have to split my code into smaller steps, or change my approach altogether.

At the moment I am doing several selections and left_joins that look something like this:

library(dplyr)

# Subset the main data frame
df2 <- select(df1, matchcode, x1, x2, x3)
# Join variables from a third data frame
df2 <- df2 %>%
  left_join(select(df3, matchcode, y1, y2, y3), by = "matchcode")

The selection step works fine. The odd thing is that left_join now fails with errors in which the amount that cannot be allocated is very small:

Error: cannot allocate vector of size 2.6 Mb
Error: cannot allocate vector of size 4.0 Mb
Error: cannot allocate vector of size 2.6 Mb

Are there other issues I am not aware of that could cause these errors, or is there a fault in my code?

Recommended answer

Since posting this question I have done some research. At first I thought the errors were related to the number (and size) of objects in my workspace, but that was not the case.

The most important answer to my own question (please feel free to elaborate on this) is that the size of the vector that cannot be allocated does not necessarily say much about how much memory the operation needs overall. The message reports only the single allocation that failed, not the memory already in use when it was attempted.
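To see why a small failed allocation can still signal a large memory problem, it can help to compare the size of one object with what R is already holding. The following is a minimal sketch using only base R; the vector `x` is a hypothetical stand-in, not data from the question.

```r
# One million doubles occupy roughly 8 MB (8 bytes each plus a small header)
x <- numeric(1e6)
size_bytes <- as.numeric(object.size(x))
print(object.size(x), units = "Mb")

# gc() returns a matrix summarising the memory R currently uses; this total,
# not the size named in the error message, determines whether the next
# allocation can still succeed
mem <- gc()
print(mem)
```

If the "used" figures are already close to the available RAM, even a request of a few Mb can fail.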

It turned out that one of the errors was caused by a many-to-many join on two huge datasets, which produced this error:

Error: cannot allocate vector of size 140.4 Mb
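A many-to-many join returns, for every key, the number of matching rows on the left multiplied by the number on the right, so the result can be far larger than either input. The sketch below estimates the row count of a left join before running it, using base R only; the toy data frames `left` and `right` and the helper `count_keys` are hypothetical and stand in for the real data.

```r
# Toy stand-ins for the real data frames (hypothetical values)
left  <- data.frame(matchcode = c("a", "a", "b", "c"), x1 = 1:4)
right <- data.frame(matchcode = c("a", "a", "a", "b"), y1 = 1:4)

# Hypothetical helper: occurrences of each key as a named numeric vector
count_keys <- function(keys) {
  counts <- table(keys)
  setNames(as.numeric(counts), names(counts))
}

n_left  <- count_keys(left$matchcode)
n_right <- count_keys(right$matchcode)

# Right-hand count for every left key; keys missing on the right give NA
right_n <- unname(n_right[names(n_left)])
# A left join keeps unmatched left rows exactly once
right_n[is.na(right_n)] <- 1

# Expected rows: sum over keys of (left count) * (right count)
expected_rows <- sum(n_left * right_n)
print(expected_rows)  # 8

# Keys duplicated on both sides are what make the join many-to-many
many_to_many_keys <- names(n_left)[n_left > 1 & right_n > 1]
print(many_to_many_keys)  # "a"
```

If `expected_rows` is much larger than `nrow(left)`, the join is multiplying rows and is a likely cause of the memory error.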

The other joins were one-to-many, and those failed on much smaller allocations (see the errors in the original post). I was able to join these data frames by using a data.table solution instead:

library(data.table)
# Convert in place so merge() dispatches to the data.table method
setDT(df1); setDT(df2)
df1 <- merge(df1, df2, by = "matchcode", all.x = TRUE, allow.cartesian = TRUE)

For the many-to-many join, I collapsed one of the datasets so that the join became one-to-many. I hope this helps.
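The answer does not say how the dataset was collapsed. As one possible illustration, the sketch below aggregates the right-hand table to one row per key with base R's `aggregate()` before joining; the toy data and the choice of `mean` as the summary function are assumptions, and the right summary depends on what the variables mean.

```r
# Toy stand-ins for the real data frames (hypothetical values)
left  <- data.frame(matchcode = c("a", "a", "b", "c"), x1 = 1:4)
right <- data.frame(matchcode = c("a", "a", "a", "b"), y1 = c(1, 2, 3, 4))

# Collapse to one row per key (assumption: the mean is a sensible summary)
right_collapsed <- aggregate(y1 ~ matchcode, data = right, FUN = mean)

# Each key now appears at most once on the right-hand side
stopifnot(anyDuplicated(right_collapsed$matchcode) == 0)

# The left join can no longer multiply rows
joined <- merge(left, right_collapsed, by = "matchcode", all.x = TRUE)
print(nrow(joined))  # 4, the same as nrow(left)
```

Once the keys on one side are unique, the result has exactly as many rows as the left table, which keeps memory use predictable.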
