Append data sets, create new column identifying which data set it came from
Question
I remember reading about an R function that would append multiple data sets and also create a new variable identifying which data set the observation came from. I've scoured the net for the past hour and can't find what I'm looking for.
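df1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
df2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))
# FUNCTION is a placeholder for the function I'm trying to find
df3 <- FUNCTION(df1, df2)
# Desired df3:
#   x y source
#   1 2    df1
#   3 4    df1
#   5 6    df2
#   7 8    df2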
Does anyone know what FUNCTION could be? Or, am I imagining this?
Thanks in advance!
Recommended answer
It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind, ...):
> do.call(rbind, list(df1 = df1, df2 = df2))
x y
df1.1 1 2
df1.2 3 4
df2.1 5 6
df2.2 7 8
Notice that the row names now reflect the source data.frames.
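If an explicit column is preferred over row names, one way is to build it from the names of the list. This is only a minimal sketch, assuming the same df1 and df2 as in the question (recreated here so the snippet runs on its own); the names dfs, stacked and source are illustrative choices, not part of the original answer.

```r
# Recreate the question's example data (assumed to be integer columns)
df1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
df2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))

dfs <- list(df1 = df1, df2 = df2)
stacked <- do.call(rbind, dfs)

# Repeat each list name once per row of its data.frame
stacked$source <- rep(names(dfs), vapply(dfs, nrow, integer(1)))

# Drop the "df1.1"-style row names now that the column carries the source
rownames(stacked) <- NULL
stacked
```

Building the column from the list names avoids parsing the row names, which are not suffixed with a row index when a data.frame has only one row.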
Another option is to make a basic function like the following:
AppendMe <- function(dfNames) {
  # Look up each data.frame by name, tag its rows with that name, then stack them
  do.call(rbind, lapply(dfNames, function(x) {
    cbind(get(x), source = x)
  }))
}
This function takes a character vector of the names of the data.frames that you want to "stack", as follows:
> AppendMe(c("df1", "df2"))
x y source
1 1 2 df1
2 3 4 df1
3 5 6 df2
4 7 8 df2
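AppendMe relies on get() to find the objects by name in the calling environment. A variant that takes a named list instead might look like the sketch below; AppendList is a hypothetical name introduced here for illustration, and df1 and df2 are recreated from the question so the snippet is self-contained.

```r
# Recreate the question's example data (assumed to be integer columns)
df1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
df2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))

# Hypothetical helper: stack a named list of data.frames with a source column
AppendList <- function(dfs) {
  # Tag each data.frame with its list name
  tagged <- Map(function(d, nm) cbind(d, source = nm), dfs, names(dfs))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL  # drop the "df1.1"-style row names
  out
}

res <- AppendList(list(df1 = df1, df2 = df2))
res
```

Passing the list explicitly means the function does not depend on which variables happen to exist in the workspace.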
Update 2: Use combine from the "gdata" package
> library(gdata)
> combine(df1, df2)
x y source
1 1 2 df1
2 3 4 df1
3 5 6 df2
4 7 8 df2
Update 3: Use rbindlist from "data.table"
Another approach that can be used now is rbindlist from "data.table":
> rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
.id x y
1: df1 1 2
2: df1 3 4
3: df2 5 6
4: df2 7 8
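The call above collects every object whose name matches df followed by digits. If you would rather name the inputs yourself and choose the column name, a sketch along these lines should work, assuming data.table is installed; the column name "source" is just an example, and df1 and df2 are recreated from the question.

```r
library(data.table)

# Recreate the question's example data (assumed to be integer columns)
df1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
df2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))

# A named list supplies the labels; idcol sets the name of the id column
stacked <- rbindlist(list(df1 = df1, df2 = df2), idcol = "source")
stacked
```

Listing the inputs explicitly avoids picking up unrelated objects that happen to match the ls() pattern.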
Update 4: Use map_df from "purrr"
Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.
> mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
Source: local data frame [4 x 3]
src x y
(chr) (int) (int)
1 df1 1 2
2 df1 3 4
3 df2 5 6
4 df2 7 8
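As far as I know, recent purrr releases mark map_df as superseded in favor of binding rows directly. A comparable sketch with bind_rows from "dplyr" is shown below, assuming dplyr is installed; the column name "source" is an illustrative choice, and df1 and df2 are recreated from the question.

```r
library(dplyr)

# Recreate the question's example data (assumed to be integer columns)
df1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
df2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))

# The list names become the values of the .id column
stacked <- bind_rows(list(df1 = df1, df2 = df2), .id = "source")
stacked
```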