Dataframes in a list; adding a new variable with name of dataframe

Views: 162 | Published: 2017/3/25 23:28:23 | Tags: list, r, dataframe, names, lapply

Question

I have a list of dataframes that I eventually want to merge while keeping a record of each one's original dataframe name or list index. This will let me subset, etc., across all the rows. To accomplish this I would like to add a new variable 'id' to every dataframe, containing the name/index of the dataframe it belongs to.

Edit: In my real code the dataframes are created by reading multiple files with the following code, so I don't have actual names, only those in the files.to.read vector, and I'm unsure whether they align with the dataframe order:

mylist <- llply(files.to.read, read.csv)

A few methods have been highlighted in several posts: Working-with-dataframes-in-a-list-drop-variables-add-new-ones and Using-lapply-with-changing-arguments

I have tried two similar methods, the first using the index list:

df1 <- data.frame(x=c(1:5),y=c(11:15))
df2 <- data.frame(x=c(1:5),y=c(11:15))
mylist <- list(df1,df2)

# Adds a new column 'id' with a value of 5 to every row in every dataframe.
# I WANT to change the value based on the list index.
mylist1 <- lapply(mylist, 
    function(x){
        x$id <- 5
        return (x)
    }
)
#Example of what I WANT, instead of '5'.
#> mylist1
#[[1]]
  #x  y id
#1 1 11  1
#2 2 12  1
#3 3 13  1
#4 4 14  1
#5 5 15  1
#
#[[2]]
  #x  y id
#1 1 11  2
#2 2 12  2
#3 3 13  2
#4 4 14  2
#5 5 15  2

The second attempts to pass the names() of the list.

# I WANT it to add a new column 'id' with the name of the respective dataframe
# to every row in every dataframe.
mylist2 <- lapply(names(mylist),
    function(x){
        mylist[[x]]$id <- "dataframe name here"
        return (mylist[[x]])
    }
)
#Example of what I WANT, instead of 'dataframe name here'.
# mylist2
#[[1]]
  #x  y id
#1 1 11  df1
#2 2 12  df1
#3 3 13  df1
#4 4 14  df1
#5 5 15  df1
#
#[[2]]
  #x  y id
#1 1 11  df2
#2 2 12  df2
#3 3 13  df2
#4 4 14  df2
#5 5 15  df2

But names() returns NULL here, because the list was built as list(df1, df2) and so has no names. Could I use seq_along(mylist) in the first example?

Any ideas, or a better way to handle the whole "merge with source id" problem?

Edit - Added Solution below: I've implemented a solution using Hadley's suggestion and Tommy's nudge, which looks something like this:
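
One way to get the index-based id is to iterate over seq_along(mylist) rather than over the list itself, so the function sees the position as well as the data frame. The sketch below is a minimal illustration using the df1/df2 example from the question; the helper name add_index_id is made up for this example.

```r
df1 <- data.frame(x = 1:5, y = 11:15)
df2 <- data.frame(x = 1:5, y = 11:15)
mylist <- list(df1, df2)

# list(df1, df2) stores the values only, so the list carries no names
is.null(names(mylist))

# Hypothetical helper: tag the i-th data frame with its position in the list
add_index_id <- function(i, dfs) {
  df <- dfs[[i]]
  df$id <- i
  df
}

# lapply() passes each index as the first argument and mylist as 'dfs'
mylist1 <- lapply(seq_along(mylist), add_index_id, dfs = mylist)
```

Each element of mylist1 then has an id column equal to its list position.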

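library(plyr)  # provides llply()

# full.names = FALSE returns bare file names, so reading them as-is
# assumes datafolder is the current working directory
files.to.read <- list.files(datafolder, pattern = "_D\\.csv$", full.names = FALSE)
mylist <- llply(files.to.read, read.csv)
all <- do.call("rbind", mylist)
all$id <- rep(files.to.read, sapply(mylist, nrow))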

I used the files.to.read vector as the id for each dataframe.

I also moved away from merge_recurse(), which I had been using before, because it was very slow for some reason:

all <- merge_recurse(mylist)
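
On the ordering concern raised in the question: lapply() (and, as far as I know, plyr's llply()) returns results in the same order as its input, so the i-th data frame comes from the i-th file name. Here is a minimal sketch that makes this explicit by naming the list with the file names. The scratch folder and the file names a_D.csv and b_D.csv are invented for the example, and base lapply() stands in for llply().

```r
# Set up two small CSV files in a scratch folder (illustrative data only)
datafolder <- file.path(tempdir(), "id_demo")
dir.create(datafolder, showWarnings = FALSE)
write.csv(data.frame(x = 1:2, y = 11:12),
          file.path(datafolder, "a_D.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:3, y = 13:15),
          file.path(datafolder, "b_D.csv"), row.names = FALSE)

files.to.read <- list.files(datafolder, pattern = "_D\\.csv$", full.names = FALSE)

# Read via full paths, but keep the bare file names as the list names
mylist <- lapply(file.path(datafolder, files.to.read), read.csv)
names(mylist) <- files.to.read

all <- do.call("rbind", mylist)
all$id <- rep(names(mylist), sapply(mylist, nrow))
table(all$id)
```

Building the paths with file.path() also avoids depending on the working directory being datafolder.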
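
One possible reason for the slowness: merge() performs a join on the shared columns rather than stacking rows, and, as far as I understand, merge_recurse() applies it pairwise across the list. The sketch below uses only base merge() and rbind() on the example data to show that the two operations are not equivalent; it is an illustration, not a benchmark.

```r
df1 <- data.frame(x = 1:5, y = 11:15)
df2 <- data.frame(x = 1:5, y = 11:15)

# merge() joins on all common columns (x and y here), so identical rows pair up
joined <- merge(df1, df2, all = TRUE)

# rbind() simply stacks the rows of both data frames
stacked <- rbind(df1, df2)

nrow(joined)   # 5
nrow(stacked)  # 10
```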

Thanks everyone.

Solution

Personally, I think it's easier to add the names after collapsing the list:

df1 <- data.frame(x=c(1:5),y=c(11:15))
df2 <- data.frame(x=c(1:5),y=c(11:15))
mylist <- list(df1 = df1, df2 = df2)

all <- do.call("rbind", mylist)
all$id <- rep(names(mylist), sapply(mylist, nrow))
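
If the id column is wanted on each data frame before they are combined (as in the question's second attempt), a sketch along these lines may work. It uses base R's Map() to pair each data frame with its name; the helper name add_name_id is an assumption for illustration, and the result is compared against the rep() approach.

```r
df1 <- data.frame(x = 1:5, y = 11:15)
df2 <- data.frame(x = 1:5, y = 11:15)
mylist <- list(df1 = df1, df2 = df2)

# Hypothetical helper: add the list name as an 'id' column
add_name_id <- function(df, name) {
  df$id <- name
  df
}

# Map() walks the data frames and their names in parallel
mylist2 <- Map(add_name_id, mylist, names(mylist))
all_before <- do.call("rbind", mylist2)

# Same ids as tagging after the collapse with rep()
all_after <- do.call("rbind", mylist)
all_after$id <- rep(names(mylist), sapply(mylist, nrow))
identical(all_before$id, all_after$id)
```

Tagging after the collapse needs fewer steps, while tagging before keeps the id attached if the list is processed further prior to combining.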
