Combine (rbind) data frames and create column with name of original data frames
Question
I have several data frames that I want to combine by row. In the resulting single data frame, I want to create a new variable identifying which data set the observation came from.
# original data frames
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))
# desired, combined data frame
df3 <- data.frame(x = c(1, 3, 5, 7), y = c(2, 4, 6, 8),
                  source = c("df1", "df1", "df2", "df2"))
# x y source
# 1 2 df1
# 3 4 df1
# 5 6 df2
# 7 8 df2
How can I achieve this? Thanks in advance!
Recommended answer
It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind, ...):
> do.call(rbind, list(df1 = df1, df2 = df2))
x y
df1.1 1 2
df1.2 3 4
df2.1 5 6
df2.2 7 8
Notice that the row names now reflect the source data.frames.
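If an actual column is wanted rather than row names, the prefix can be pulled out of the row names afterwards. This is a minimal sketch, not part of the original answer; it assumes the input data frames have default (automatic) row names and that the list names themselves do not end in a dot followed by digits.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

combined <- do.call(rbind, list(df1 = df1, df2 = df2))

# Row names look like "df1.1", "df1.2", ...; strip the trailing ".<row number>"
# to recover the list name (assumes default row names on the inputs).
combined$source <- sub("\\.[0-9]+$", "", rownames(combined))

# Reset the row names now that the information lives in a column.
rownames(combined) <- NULL
combined
```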
Another option is to make a basic function like the following:
AppendMe <- function(dfNames) {
do.call(rbind, lapply(dfNames, function(x) {
cbind(get(x), source = x)
}))
}
This function takes a character vector of the names of the data.frames that you want to "stack", as follows:
> AppendMe(c("df1", "df2"))
x y source
1 1 2 df1
2 3 4 df1
3 5 6 df2
4 7 8 df2
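Because AppendMe looks objects up by name with get(), it depends on where it is called from. A variant that takes a named list of data frames avoids that lookup. The sketch below is only an illustration; the function name stack_with_source is hypothetical and not from the original answer.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# Hypothetical helper: takes a *named list* of data frames instead of names.
stack_with_source <- function(dfs) {
  stopifnot(!is.null(names(dfs)))
  # Add a constant "source" column to each data frame, using its list name.
  tagged <- Map(function(d, nm) cbind(d, source = nm), dfs, names(dfs))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL  # drop the "df1.1"-style row names
  out
}

stacked <- stack_with_source(list(df1 = df1, df2 = df2))
stacked
```

Note that in R versions before 4.0 the source column would be created as a factor rather than a character vector.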
Update 2: Use combine from the "gdata" package
> library(gdata)
> combine(df1, df2)
x y source
1 1 2 df1
2 3 4 df1
3 5 6 df2
4 7 8 df2
Update 3: Use rbindlist from "data.table"
Another approach that can be used now is rbindlist from "data.table" with its idcol argument. With that, the approach could be:
> library(data.table)
> rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
.id x y
1: df1 1 2
2: df1 3 4
3: df2 5 6
4: df2 7 8
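One thing to watch with mget(ls(pattern = "df\\d+")) is that it collects every object in the workspace whose name matches the pattern, so a df3 created earlier would be stacked as well. A minimal sketch that passes an explicit named list instead, and names the id column "source" by giving idcol a string:

```r
library(data.table)

df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# An explicit named list keeps unrelated objects out of the result;
# idcol = "source" names the identifier column (it is placed first).
combined <- rbindlist(list(df1 = df1, df2 = df2), idcol = "source")
combined
```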
Update 4: Use map_df from "purrr"
Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.
> mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
Source: local data frame [4 x 3]
src x y
(chr) (int) (int)
1 df1 1 2
2 df1 3 4
3 df2 5 6
4 df2 7 8
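Since no per-element transformation is needed here, a related option (not mentioned in the original answer) is bind_rows from "dplyr", which accepts a named list and an .id argument directly. A minimal sketch, assuming dplyr is installed:

```r
library(dplyr)

df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# .id = "source" stores each list name in a new first column called "source".
combined <- bind_rows(list(df1 = df1, df2 = df2), .id = "source")
combined
```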