Dataframes in a list; adding a new variable with name of dataframe

Problem description

I have a list of dataframes which I eventually want to merge while maintaining a record of their original dataframe name or list index. This will allow me to subset, etc., across all the rows. To accomplish this I would like to add a new variable 'id' to every dataframe, which contains the name/index of the dataframe it belongs to.

Edit: In my real code the dataframes are created by reading multiple files with the following code, so I don't have actual names, only those in the 'files.to.read' vector, and I'm unsure whether they will align with the dataframe order:

mylist <- llply(files.to.read, read.csv)

A few methods have been highlighted in several posts: Working-with-dataframes-in-a-list-drop-variables-add-new-ones and Using-lapply-with-changing-arguments

I have tried two similar methods, the first using the index list:

df1 <- data.frame(x=c(1:5),y=c(11:15))
df2 <- data.frame(x=c(1:5),y=c(11:15))
mylist <- list(df1,df2)

# Adds a new column 'id' with a value of 5 to every row in every dataframe.
# I WANT to change the value based on the list index.
mylist1 <- lapply(mylist, 
    function(x){
        x$id <- 5
        return (x)
    }
)
#Example of what I WANT, instead of '5'.
#> mylist1
#[[1]]
  #x  y id
#1 1 11  1
#2 2 12  1
#3 3 13  1
#4 4 14  1
#5 5 15  1
#
#[[2]]
  #x  y id
#1 1 11  2
#2 2 12  2
#3 3 13  2
#4 4 14  2
#5 5 15  2

The second attempts to pass the names() of the list.

# I WANT it to add a new column 'id' with the name of the respective dataframe
# to every row in every dataframe.
mylist2 <- lapply(names(mylist), 
    function(x){
        mylist[[x]]$id <- "dataframe name here"
        return (mylist[[x]])
    }
)
#Example of what I WANT, instead of 'dataframe name here'.
# mylist2
#[[1]]
  #x  y id
#1 1 11  df1
#2 2 12  df1
#3 3 13  df1
#4 4 14  df1
#5 5 15  df1
#
#[[2]]
  #x  y id
#1 1 11  df2
#2 2 12  df2
#3 3 13  df2
#4 4 14  df2
#5 5 15  df2

But the names() function doesn't seem to work on my list of dataframes; it returns NULL. Could I use seq_along(mylist) in the first example?
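
As a minimal sketch of the two points above (an illustration, not part of the original post): names() returns NULL here because list(df1, df2) creates an unnamed list, not because the elements are dataframes. Iterating over seq_along() gives the index-based id from the first example. The object names unnamed and named are hypothetical.

```r
df1 <- data.frame(x = 1:5, y = 11:15)
df2 <- data.frame(x = 1:5, y = 11:15)

# An unnamed list has no names attribute, so names() returns NULL.
unnamed <- list(df1, df2)
names(unnamed)

# Naming the elements when building the list gives names() something to return.
named <- list(df1 = df1, df2 = df2)
names(named)

# Index-based id: iterate over positions rather than over the elements.
mylist1 <- lapply(seq_along(unnamed), function(i) {
  df <- unnamed[[i]]
  df$id <- i
  df
})
```

Iterating over positions keeps both the index and the dataframe available inside the function, which lapply(mylist, ...) on its own does not.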

Any ideas, or a better way to handle the whole "merge with source id" problem?

Edit - Added solution below: I've implemented a solution using Hadley's suggestion and Tommy's nudge, which looks something like this.

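library(plyr)  # provides llply()
# Match files ending in "_D.csv"; the dot is escaped so it matches literally.
files.to.read <- list.files(datafolder, pattern="_D\\.csv$", full.names=FALSE)
# With full.names=FALSE, read.csv() resolves the names against the working directory.
mylist <- llply(files.to.read, read.csv)
all <- do.call("rbind", mylist)
# Repeat each file name once per row of the dataframe read from it.
all$id <- rep(files.to.read, sapply(mylist, nrow))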

I used the files.to.read vector as the id for each dataframe.
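
On the earlier worry about whether the file names align with the dataframe order: the following is a self-contained sketch, assuming base lapply() in place of plyr's llply() and two throwaway CSV files written to a temporary folder. Both the files and the name combined (used instead of all, to avoid shadowing the base function) are hypothetical. lapply() returns its results in input order, so element i of the list comes from file i.

```r
# Hypothetical setup: write two small CSV files into a temporary folder.
datafolder <- file.path(tempdir(), "id_demo")
dir.create(datafolder, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(datafolder, "a_D.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:3), file.path(datafolder, "b_D.csv"), row.names = FALSE)

# full.names = TRUE so read.csv() works regardless of the working directory.
files.to.read <- list.files(datafolder, pattern = "_D\\.csv$", full.names = TRUE)

# Results come back in the same order as files.to.read.
mylist <- lapply(files.to.read, read.csv)
names(mylist) <- basename(files.to.read)

combined <- do.call("rbind", mylist)
# One file name per row, repeated by the row count of each dataframe.
combined$id <- rep(names(mylist), sapply(mylist, nrow))
table(combined$id)
```

Storing the file names as the list's names keeps the label attached to each dataframe, so later steps do not depend on a separate vector staying in sync.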

I also changed from using merge_recurse(), as it was very slow for some reason:

all <- merge_recurse(mylist)

Thanks everyone.

Solution

Personally, I think it's easier to add the names after collapsing the list into a single dataframe:

df1 <- data.frame(x=c(1:5),y=c(11:15))
df2 <- data.frame(x=c(1:5),y=c(11:15))
mylist <- list(df1 = df1, df2 = df2)

all <- do.call("rbind", mylist)
all$id <- rep(names(mylist), sapply(mylist, nrow))
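
For comparison, here is a sketch of the per-dataframe route the question was originally attempting, assuming a named list as above. It is an alternative illustration rather than the answerer's method: Map() walks the dataframes and their names in parallel, so each dataframe gets its id before binding. The name combined is hypothetical.

```r
df1 <- data.frame(x = 1:5, y = 11:15)
df2 <- data.frame(x = 1:5, y = 11:15)
mylist <- list(df1 = df1, df2 = df2)

# Pair each dataframe with its name and add the name as a constant column.
mylist2 <- Map(function(df, name) {
  df$id <- name
  df
}, mylist, names(mylist))

# Bind the labelled dataframes; drop the row names that rbind derives from the list names.
combined <- do.call("rbind", mylist2)
rownames(combined) <- NULL
```

This gives the same id column as the rep() approach; the rep() version does less work per dataframe, while this one keeps each dataframe labelled even before they are combined.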
