Dataframes in a list; adding a new variable with name of dataframe
Problem description
I have a list of dataframes which I eventually want to merge while keeping a record of each row's original dataframe name or list index. This will let me subset across all the rows by source. To accomplish this, I would like to add a new variable 'id' to every dataframe, containing the name or index of the dataframe it belongs to.
Edit: In my real code the dataframe variables are created by reading multiple files with the following code, so I don't have actual names, only those in the 'files.to.read' list, and I'm unsure whether they will align with the dataframe order:
mylist <- llply(files.to.read, read.csv)
A few methods have been highlighted in several posts: Working-with-dataframes-in-a-list-drop-variables-add-new-ones and Using-lapply-with-changing-arguments
I have tried two similar methods, the first using the index list:
df1 <- data.frame(x=c(1:5),y=c(11:15))
df2 <- data.frame(x=c(1:5),y=c(11:15))
mylist <- list(df1,df2)
# Adds a new column 'id' with a value of 5 to every row in every dataframe.
# I WANT to change the value based on the list index.
mylist1 <- lapply(mylist,
function(x){
x$id <- 5
return (x)
}
)
#Example of what I WANT, instead of '5'.
#> mylist1
#[[1]]
#x y id
#1 1 11 1
#2 2 12 1
#3 3 13 1
#4 4 14 1
#5 5 15 1
#
#[[2]]
#x y id
#1 1 11 2
#2 2 12 2
#3 3 13 2
#4 4 14 2
#5 5 15 2
The second attempts to pass the names() of the list.
# I WANT it to add a new column 'id' with the name of the respective dataframe
# to every row in every dataframe.
mylist2 <- lapply(names(mylist),
function(x){
mylist[[x]]$id <- "dataframe name here"
return (mylist[[x]])
}
)
#Example of what I WANT, instead of 'dataframe name here'.
# mylist2
#[[1]]
#x y id
#1 1 11 df1
#2 2 12 df1
#3 3 13 df1
#4 4 14 df1
#5 5 15 df1
#
#[[2]]
#x y id
#1 1 11 df2
#2 2 12 df2
#3 3 13 df2
#4 4 14 df2
#5 5 15 df2
But names() doesn't help here: it returns NULL, because list(df1,df2) creates a list without names. Could I use seq_along(mylist) in the first example?
Any ideas, or a better way to handle the whole "merge with source id" task?
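On the seq_along() question: a minimal sketch of how the index-based version could look, assuming the same toy data as above. Iterating over the positions instead of the elements makes the index available inside the function. The names `i` and `df` are only illustrative.

```r
df1 <- data.frame(x = 1:5, y = 11:15)
df2 <- data.frame(x = 1:5, y = 11:15)
mylist <- list(df1, df2)  # unnamed list, so names(mylist) is NULL

# Loop over positions rather than elements so the index can be stored.
mylist1 <- lapply(seq_along(mylist), function(i) {
  df <- mylist[[i]]
  df$id <- i  # the single value is recycled to every row
  df
})
```

Each dataframe in `mylist1` then carries its own list position in the `id` column.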
Edit - Added solution below: I've implemented a solution using Hadley's suggestion and Tommy's nudge, which looks something like this:
files.to.read <- list.files(datafolder, pattern="\\_D.csv$", full.names=FALSE)
mylist <- llply(files.to.read, read.csv)
all <- do.call("rbind", mylist)
all$id <- rep(files.to.read, sapply(mylist, nrow))
I used the files.to.read vector as the id for each dataframe.
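On the worry about whether the file names line up with the dataframes: lapply() returns its results in the same order as its input, so the pairing can be made explicit by naming the list. The sketch below is self-contained and rests on some assumptions: it writes two hypothetical CSV files to a temporary folder, uses base lapply() in place of plyr's llply(), uses full.names = TRUE so the files are found regardless of the working directory, and calls the result `all_rows` to avoid masking base::all().

```r
# Hypothetical setup: two small CSV files in a temporary folder.
datafolder <- file.path(tempdir(), "id_demo")
dir.create(datafolder, showWarnings = FALSE)
write.csv(data.frame(x = 1:3, y = 11:13),
          file.path(datafolder, "a_D.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:2, y = 14:15),
          file.path(datafolder, "b_D.csv"), row.names = FALSE)

files.to.read <- list.files(datafolder, pattern = "_D\\.csv$", full.names = TRUE)

# Read in input order, then label each dataframe with its file name.
mylist <- lapply(files.to.read, read.csv)
names(mylist) <- basename(files.to.read)

# Collapse and repeat each name once per row of its dataframe.
all_rows <- do.call("rbind", mylist)
all_rows$id <- rep(names(mylist), sapply(mylist, nrow))
```

Checking `table(all_rows$id)` against the row counts of the individual files is a quick way to confirm the alignment.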
I also switched away from merge_recurse(), which I had been using before, as it was very slow for some reason:
all <- merge_recurse(mylist)
Thanks everyone.
Personally, I think it's easier to add the names after collapse:
df1 <- data.frame(x=c(1:5),y=c(11:15))
df2 <- data.frame(x=c(1:5),y=c(11:15))
mylist <- list(df1 = df1, df2 = df2)
all <- do.call("rbind", mylist)
all$id <- rep(names(mylist), sapply(mylist, nrow))
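For comparison, here is a sketch of the other order of operations, tagging each dataframe before collapsing. It assumes the same named list as above and uses base R's Map() to walk the dataframes and their names in parallel. The names `tagged` and `combined` are illustrative.

```r
df1 <- data.frame(x = 1:5, y = 11:15)
df2 <- data.frame(x = 1:5, y = 11:15)
mylist <- list(df1 = df1, df2 = df2)

# Add the id column to each dataframe using its list name.
tagged <- Map(function(df, nm) {
  df$id <- nm
  df
}, mylist, names(mylist))

combined <- do.call("rbind", tagged)
rownames(combined) <- NULL  # drop the "df1.1", "df1.2", ... row names from rbind
```

Both routes should give the same `id` column; adding it after the collapse takes fewer steps, while tagging first keeps each dataframe self-describing if the list is used before merging.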