Losing dataframe when using do.call

Problem description

I am trying to merge a number of data frames using rbind. If I call rbind directly, there is no problem:

> test <- rbind(x)
> is.data.frame(test)
[1] TRUE

However, if I use do.call, I run into a problem: my character columns are collapsed and the data frame is converted to a matrix.

> test <- do.call("rbind", x)
> is.data.frame(test)
[1] FALSE

Following the ?rbind documentation, I tried adding stringsAsFactors = FALSE, but the behavior did not change. My data frames look something like this:

ID  sequence    descriptor
1   aaacccttt   g12
2   actttgtgt   e34
3   tttgggctc   b12
4   ccgcgcgcg   c12
…   …       ...

The rbind output looks like the table above, but the do.call("rbind", x) output appears as follows, where the sequence values are no longer the original character strings:

ID          363 426 91
sequence    98  353 100
descriptor  g12 b12 c12
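A minimal reproduction of this symptom, assuming a toy data frame `x` built from the sample rows above in which `sequence` ended up as a factor (as it would under the pre-R 4.0.0 default `stringsAsFactors = TRUE`) while `descriptor` stayed character. Because `x` is itself a list of columns, `do.call()` hands each column to `rbind()` as a separate argument, and the default `rbind()` method replaces factors with their internal integer codes:

```r
# Toy version of the question's data (assumed): sequence is a factor, descriptor is character
x <- data.frame(
  ID = 1:4,
  sequence = factor(c("aaacccttt", "actttgtgt", "tttgggctc", "ccgcgcgcg")),
  descriptor = c("g12", "e34", "b12", "c12"),
  stringsAsFactors = FALSE
)

# Same as rbind(ID = x$ID, sequence = x$sequence, descriptor = x$descriptor)
test <- do.call("rbind", x)

is.data.frame(test)  # FALSE: the result is a character matrix
rownames(test)       # "ID" "sequence" "descriptor": one row per original column
test["sequence", ]   # "1" "2" "4" "3": factor codes, not the sequences
```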

I would like to use do.call because I am looping through a set of data frames to consolidate them with the script below. An answer offering an alternative way to merge multiple data frames inside a loop would also be helpful.

stringsAsFactors = FALSE
dfs <- as.list(ls(pattern="Data_"))
for (i in 1:length(dfs)) {
  x <- get(as.character(dfs[i]))
  AllData <- do.call("rbind", x) 
}

dfs is a list of the names of the data frames in my working environment, and I retrieve each actual data frame using get().
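In the loop above, each iteration overwrites AllData and passes a single data frame to do.call(), so even apart from the matrix problem only the last data frame would be kept. A minimal corrected loop, sketched with two hypothetical data frames Data_1 and Data_2 standing in for the real Data_* objects, appends each one to a running result instead:

```r
# Hypothetical example objects; in practice these are your existing Data_* data frames
Data_1 <- data.frame(ID = 1:2, sequence = c("aaacccttt", "actttgtgt"),
                     descriptor = c("g12", "e34"), stringsAsFactors = FALSE)
Data_2 <- data.frame(ID = 3:4, sequence = c("tttgggctc", "ccgcgcgcg"),
                     descriptor = c("b12", "c12"), stringsAsFactors = FALSE)

AllData <- NULL  # start empty; rbind() drops zero-length (NULL) arguments
for (nm in ls(pattern = "^Data_")) {
  # get() looks up each data frame by its name; rbind() appends its rows
  AllData <- rbind(AllData, get(nm))
}

is.data.frame(AllData)  # TRUE
nrow(AllData)           # 4
```

Growing a data frame inside a loop copies it on every iteration; with many data frames, collecting them in a list and calling do.call() once is usually faster.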

Thank you.

Solution

There are two different issues causing you difficulties.

  • stringsAsFactors

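You're right to look at stringsAsFactors, but you haven't set it in quite the right place. On its own line, stringsAsFactors = FALSE just creates an ordinary variable with that name; it has to be set through options() or passed as an argument.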

You have two options. You can either set it in your options, like this:

options(stringsAsFactors=FALSE)

Or pass it as an argument in the code used to create your data frames:

a <- read.table(text = "ID  sequence    descriptor
1   aaacccttt   g12
2   actttgtgt   e34
3   tttgggctc   b12
4   ccgcgcgcg   c12",
header = TRUE, stringsAsFactors = FALSE)
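Passing the argument explicitly keeps the setting local to each call that builds a data frame. This small sketch, with toy columns standing in for the real data, shows what the setting changes; note that since R 4.0.0 the default for data.frame() and read.table() is already stringsAsFactors = FALSE:

```r
seqs <- c("aaacccttt", "actttgtgt", "tttgggctc", "ccgcgcgcg")

# The same column, built once as a factor and once as plain character
as_factor <- data.frame(ID = 1:4, sequence = seqs, stringsAsFactors = TRUE)
as_chars  <- data.frame(ID = 1:4, sequence = seqs, stringsAsFactors = FALSE)

class(as_factor$sequence)  # "factor"
class(as_chars$sequence)   # "character"

# A factor stores integer codes into its sorted levels, which is what
# the matrix in the question displayed instead of the sequences
as.integer(as_factor$sequence)  # 1 2 4 3
```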

  • args= argument to do.call()

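You're also on the right track in wanting to use do.call() for this. But, as @Sacha points out, dfs needs to be a list of data.frames, not a single data.frame. A data.frame is itself a list of column vectors, so passing one directly makes do.call() hand each column to rbind() as a separate argument, and rbind() then stacks those columns as the rows of a matrix.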

# Create list of two data.frames
b <- a
dfs <- list(a, b)

# Or, if you start with a list of their names
dfs <- list("a", "b")
dfs <- lapply(dfs, get)

# Check that this works
do.call("rbind", dfs)
#   ID  sequence descriptor
# 1  1 aaacccttt        g12
# 2  2 actttgtgt        e34
# 3  3 tttgggctc        b12
# 4  4 ccgcgcgcg        c12
# 5  1 aaacccttt        g12
# 6  2 actttgtgt        e34
# 7  3 tttgggctc        b12
# 8  4 ccgcgcgcg        c12
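When the object names follow a pattern such as the Data_ prefix in the question, mget() can replace the lapply(..., get) step, since it returns a named list of the matching objects. A sketch with two hypothetical data frames Data_1 and Data_2:

```r
# Hypothetical example objects standing in for the real Data_* data frames
Data_1 <- data.frame(ID = 1:2, sequence = c("aaacccttt", "actttgtgt"),
                     descriptor = c("g12", "e34"), stringsAsFactors = FALSE)
Data_2 <- data.frame(ID = 3:4, sequence = c("tttgggctc", "ccgcgcgcg"),
                     descriptor = c("b12", "c12"), stringsAsFactors = FALSE)

# mget() fetches every object whose name matches, as a named list of data frames
dfs <- mget(ls(pattern = "^Data_"))

AllData <- do.call(rbind, dfs)
# The list names are used to build row names (e.g. "Data_1.1"); reset them to 1..n
rownames(AllData) <- NULL
```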

This also works if you have just a single data.frame, as long as it is wrapped in a (length-1) list, like this: dfs <- list(a).
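A quick check of why the wrapping matters, assuming a toy data frame `a` similar to the one read above: the data frame alone is a list with one element per column, so do.call() spreads it into one rbind() argument per column, while list(a) passes the whole data frame as a single argument.

```r
# Toy data frame standing in for `a` (assumed values)
a <- data.frame(ID = 1:2, sequence = c("aaacccttt", "actttgtgt"),
                descriptor = c("g12", "e34"), stringsAsFactors = FALSE)

is.list(a)  # TRUE: a data frame is a list of column vectors
length(a)   # 3: one element per column

# Unwrapped: the same call as binding the three columns as rows of a matrix
by_column <- do.call(rbind, a)
identical(by_column,
          rbind(ID = a$ID, sequence = a$sequence, descriptor = a$descriptor))  # TRUE

# Wrapped in a length-1 list: rbind() receives one data frame and returns one
by_frame <- do.call(rbind, list(a))
is.data.frame(by_frame)  # TRUE
nrow(by_frame)           # 2
```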
