Merging and appending ffdf dataframes


Question

I am trying to create an ffdf data frame by merging and appending two existing ffdf data frames. The ffdfs have different numbers of columns and different numbers of rows. I know that merge() performs only inner and left outer joins, while ffdfappend() does not allow appending if the columns are not identical. Does anyone have a workaround for this, either a function like smartbind() from the gtools package or something else?

Of course, converting back with as.data.frame() and using smartbind() is not an option because of the size of the ffdfs.

Any help would be greatly appreciated.

As suggested, here is a reproducible example:

require(ff)
require(ffbase)

df1 <- data.frame(A=1:10, B=LETTERS[1:10], C=rnorm(10), G=1 )
df2 <- data.frame(A=11:20, D=rnorm(10), E=letters[1:10], G=1 )
ffdf1 <- as.ffdf(df1) 
ffdf2 <- as.ffdf(df2)

The desired result should look something like this (produced on the data.frames; if I knew how to produce it on the ffdfs, I would not be asking the question):

require(gtools)
dfcombined <- smartbind(df1, df2)
dfcombined
      A    B          C G          D    E
1:1   1    A  1.1556719 1         NA <NA>
1:2   2    B  0.3279260 1         NA <NA>
1:3   3    C  0.4067643 1         NA <NA>
1:4   4    D -0.9144717 1         NA <NA>
1:5   5    E -0.1138263 1         NA <NA>
1:6   6    F  0.8227560 1         NA <NA>
1:7   7    G  0.3394098 1         NA <NA>
1:8   8    H  1.4498439 1         NA <NA>
1:9   9    I -1.3202419 1         NA <NA>
1:10 10    J  0.2099266 1         NA <NA>
2:1  11 <NA>         NA 1 -1.5802636    a
2:2  12 <NA>         NA 1  1.2925790    b
2:3  13 <NA>         NA 1  1.3477483    c
2:4  14 <NA>         NA 1 -1.6760211    d
2:5  15 <NA>         NA 1  0.1456295    e
2:6  16 <NA>         NA 1  0.4726867    f
2:7  17 <NA>         NA 1 -1.5209117    g
2:8  18 <NA>         NA 1  0.3407136    h
2:9  19 <NA>         NA 1  1.3582868    i
2:10 20 <NA>         NA 1 -1.5083929    j

I hope this makes it clearer what I am trying to achieve.

Recommended answer

If you are looking for something like rbind.fill, but for ffdf objects, maybe this is what you need. It worked for me without memory issues on the test example Jan prepared.

require(ff)
require(ffbase)
smartffdfbind <- function(..., clone=TRUE){
  x <- list(...)
  # Union of all column names, in order of first appearance
  columns <- lapply(x, FUN=function(x) colnames(x))
  columns <- do.call(c, columns)
  columns <- unique(columns)
  # Pad each ffdf with an all-NA ff column for every column it lacks
  for(element in seq_along(x)){
    missingcolumns <- setdiff(columns, colnames(x[[element]]))
    for(missingcolumn in missingcolumns){
      x[[element]][[missingcolumn]] <- ff(NA, vmode = "logical", length = nrow(x[[element]]))
    }
  }
  # Start from the first ffdf; clone it so the input is not modified by the appends
  if(clone){
    result <- clone(x[[1]][columns])
  }else{
    result <- x[[1]][columns]
  }
  # Append the remaining ffdfs, with columns in the same order
  for (l in tail(x, -1)) {
    result <- ffdfappend(result[columns], l[columns], recode=TRUE)
  }
  result
}

ffdf1 <- ffdf(a = ffrandom(1E8, rnorm), b = ffrandom(1E8, rnorm))
ffdf2 <- ffdf(b = ffrandom(1E8, rnorm), c = ffrandom(1E8, rnorm))

x <- smartffdfbind(ffdf1, ffdf2)
nrow(x)
[1] 200000000
class(x)
[1] "ffdf"
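
For intuition, the same pad-then-append idea can be sketched on ordinary data.frames using base R only. This is not the ffdf code: `fill_and_rbind` is a hypothetical helper name, and everything here lives in memory, so it is only meant for checking the expected shape of the result on the small example from the question.

```r
# Hypothetical helper (illustration only): pad each data.frame with NA
# columns for the names it lacks, then row-bind them in a common order.
fill_and_rbind <- function(...) {
  dfs <- list(...)
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- NA
    }
    d[all_cols]
  })
  do.call(rbind, padded)
}

# Small data frames from the question (strings kept as character)
df1 <- data.frame(A = 1:10, B = LETTERS[1:10], C = rnorm(10), G = 1,
                  stringsAsFactors = FALSE)
df2 <- data.frame(A = 11:20, D = rnorm(10), E = letters[1:10], G = 1,
                  stringsAsFactors = FALSE)

combined <- fill_and_rbind(df1, df2)
dim(combined)    # number of rows and columns
names(combined)  # column order of the combined result
```

If smartffdfbind() behaves as intended, calling it on the as.ffdf() versions of df1 and df2 should give an ffdf with the same number of rows and the same set of columns as `combined`.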
