Combine (rbind) data frames and create column with name of original data frames


Question

I have several data frames that I want to combine by row. In the resulting single data frame, I want to create a new variable identifying which data set each observation came from.

# original data frames
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# desired, combined data frame
df3 <- data.frame(x = c(1, 3, 5, 7), y = c(2, 4, 6, 8),
                  source = c("df1", "df1", "df2", "df2"))
# x y source
# 1 2    df1
# 3 4    df1
# 5 6    df2
# 7 8    df2

How can I achieve this? Thanks in advance!

Recommended answer

It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind, ...):

> do.call(rbind, list(df1 = df1, df2 = df2))
      x y
df1.1 1 2
df1.2 3 4
df2.1 5 6
df2.2 7 8

Notice that the row names now reflect the source data.frames.
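
If you want a real column rather than row names, one possible follow-up is to derive it from those row names. This is only a sketch in base R. It assumes the original data frames have default (automatic) row names, so the combined names look like df1.1, df1.2, and it assumes the list names themselves do not end in a dot followed by digits. The column name source is taken from the question.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

combined <- do.call(rbind, list(df1 = df1, df2 = df2))

# Row names are "<list name>.<row number>"; strip the trailing ".<digits>"
# to recover the list name (assumption: default row names in the inputs).
combined$source <- sub("\\.[0-9]+$", "", rownames(combined))

# Reset to plain sequential row names now that the information is in a column.
rownames(combined) <- NULL
combined
```

If the inputs carry meaningful row names of their own, the combined names will not follow this pattern and the regular expression would need adjusting.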

Another option is to make a basic function like the following:

AppendMe <- function(dfNames) {
  do.call(rbind, lapply(dfNames, function(x) {
    cbind(get(x), source = x)
  }))
}

This function then takes a character vector of the names of the data.frames that you want to "stack", as follows:

> AppendMe(c("df1", "df2"))
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2
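
Because AppendMe looks objects up by name with get(), it depends on the environment in which it is called. A possible variant, sketched here under the assumption that you can pass the data frames themselves in a named list, avoids that lookup. The function name append_named is hypothetical and not part of the original answer.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# Hypothetical helper: takes a named list of data frames and adds each
# element's name as a "source" column before stacking the rows.
append_named <- function(dfs) {
  stopifnot(is.list(dfs), !is.null(names(dfs)))
  pieces <- Map(function(d, nm) cbind(d, source = nm), dfs, names(dfs))
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "df1.1"-style row names
  out
}

result <- append_named(list(df1 = df1, df2 = df2))
result
```

Depending on your R version and the stringsAsFactors setting, the source column may come back as a factor rather than a character vector.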

Update 2: Use combine from the "gdata" package

> library(gdata)
> combine(df1, df2)
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2

Update 3: Use rbindlist from "data.table"

Another approach that can be used now is rbindlist from "data.table" with its idcol argument. With that, the approach could be:

> library(data.table)
> rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
   .id x y
1: df1 1 2
2: df1 3 4
3: df2 5 6
4: df2 7 8

Update 4: Use map_df from "purrr"

Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.

> library(purrr)
> mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
Source: local data frame [4 x 3]

    src     x     y
  (chr) (int) (int)
1   df1     1     2
2   df1     3     4
3   df2     5     6
4   df2     7     8
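
Since map_df with an identity function is only being used here to bind rows, calling bind_rows from "dplyr" directly is a possible alternative. This is not part of the original answer; the sketch assumes "dplyr" is installed and reuses the column name source from the question.

```r
library(dplyr)

df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# .id names the new column; its values come from the list names.
combined <- bind_rows(list(df1 = df1, df2 = df2), .id = "source")
combined
```

As with the other approaches, an explicit named list avoids accidentally picking up other objects that happen to match a name pattern.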
