Append data sets, create new column identifying which data set it came from

Problem description

I remember reading about an R function that would append multiple data sets and also create a new variable identifying which data set the observation came from. I've scoured the net for the past hour and can't find what I'm looking for.

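df1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
df2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))
df3 <- FUNCTION(df1, df2)   # FUNCTION is the unknown function being asked about
# Desired df3:
#   x y source
#   1 2    df1
#   3 4    df1
#   5 6    df2
#   7 8    df2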

Does anyone know what FUNCTION could be? Or, am I imagining this?

Thanks in advance!

Recommended answer

It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind, ...):

> do.call(rbind, list(df1 = df1, df2 = df2))
      x y
df1.1 1 2
df1.2 3 4
df2.1 5 6
df2.2 7 8

Note that the row names now reflect the source data.frames.
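If you want the source as a proper column rather than encoded in the row names, one option is to add the column to each list element before binding. A minimal base-R sketch, assuming the data frames are already collected in a named list (`dfs` and `stacked` are illustrative names, not from the answer):

```r
# Example inputs matching the question
df1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
df2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))

# Named list: the names become the values of the new column
dfs <- list(df1 = df1, df2 = df2)

# Map() pairs each data frame with its list name and appends a "source" column
stacked <- do.call(rbind, Map(function(d, nm) {
  d$source <- nm  # plain character column holding the list name
  d
}, dfs, names(dfs)))

# Drop the "df1.1"-style row names that rbind produces for a named list
rownames(stacked) <- NULL
stacked
```

The same `do.call(rbind, ...)` call does the stacking; the only change is that each piece carries its label as data instead of relying on row names.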

Another option is to make a basic function like the following:

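AppendMe <- function(dfNames) {
  # Stack the pieces row-wise
  do.call(rbind, lapply(dfNames, function(x) {
    # get(x) fetches the data.frame by name; cbind() adds a "source" column holding that name
    cbind(get(x), source = x)
  }))
}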

This function then takes a character vector of the data.frame names that you want to "stack", as follows:

> AppendMe(c("df1", "df2"))
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2
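Because `get(x)` runs inside the anonymous function, it searches that function's environment and then the environment where `AppendMe` was defined (usually the global workspace), not the environment of whoever called `AppendMe`. Calling it from inside another function with local data frames can therefore fail. Below is a sketch of a variant that looks the names up in the caller's environment with `mget()`; the name `AppendMe2`, its `envir` argument, and the `stack_local` helper are illustrative additions, not part of the original answer:

```r
# Hypothetical variant of AppendMe that resolves names in the caller's environment
AppendMe2 <- function(dfNames, envir = parent.frame()) {
  # mget() returns a named list of the objects, looked up in `envir`
  dfs <- mget(dfNames, envir = envir)
  do.call(rbind, lapply(dfNames, function(nm) {
    d <- dfs[[nm]]
    d$source <- nm  # plain character column holding the object's name
    d
  }))
}

# Hypothetical helper: the data frames exist only inside this function
stack_local <- function() {
  a1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
  a2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))
  AppendMe2(c("a1", "a2"))
}
res <- stack_local()
res
```

Called at top level, `parent.frame()` is the global environment, so `AppendMe2(c("df1", "df2"))` behaves like the original.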

Update 2: Use combine from the "gdata" package

> library(gdata)
> combine(df1, df2)
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2

Update 3: Use rbindlist from "data.table"

Another approach that can be used now is to use rbindlist from "data.table". With that, the approach could be:

> library(data.table)
> rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
   .id x y
1: df1 1 2
2: df1 3 4
3: df2 5 6
4: df2 7 8
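`ls(pattern = ...)` picks up every matching object in the workspace and returns the names sorted alphabetically, so a stray `df10` would be included and would sort before `df2`. If you would rather control exactly which tables are stacked and in what order, you can pass an explicit named list; `idcol` also accepts a string that names the id column. A sketch, assuming the data.table package is installed (`res` is an illustrative name):

```r
library(data.table)

df1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
df2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))

# List names become the values of the id column; idcol = "source" names that column
res <- rbindlist(list(df1 = df1, df2 = df2), idcol = "source")
res
```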

Update 4: use map_df from "purrr"

Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.

> library(purrr)
> mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
Source: local data frame [4 x 3]

    src     x     y
  (chr) (int) (int)
1   df1     1     2
2   df1     3     4
3   df2     5     6
4   df2     7     8
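In purrr 1.0.0 and later, `map_df()` is superseded; the suggested replacement for row-binding a list is `list_rbind()`, whose `names_to` argument plays the role of `.id`. A sketch, assuming purrr >= 1.0.0 is installed (`res` is an illustrative name):

```r
library(purrr)

df1 <- data.frame(x = c(1L, 3L), y = c(2L, 4L))
df2 <- data.frame(x = c(5L, 7L), y = c(6L, 8L))

# names_to = "src" stores each element's list name in a new column
res <- list_rbind(list(df1 = df1, df2 = df2), names_to = "src")
res
```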
