Merging files on the basis of columns
Question

I have multiple files, each with many rows and three columns, and I need to merge them wherever the first two columns match.

File1
12 13 a
13 15 b
14 17 c
4 9 d
. . .
. . .
81 23 h
File2
12 13 e
3 10 b
14 17 c
4 9 j
. . .
. . .
1 2 k
File3
12 13 m
13 15 k
1 7 x
24 9 d
. . .
. . .
1 2 h
and so on. I want to merge them to obtain the following result:
12 13 a e m
13 15 b k
14 17 c c
4 9 d j
3 10 b
24 9 d
. . .
. . .
81 23 h
1 2 k
1 7 x
Recommended answer

The first thing that usually comes to mind with these types of problems is merge, perhaps in conjunction with Reduce(function(x, y) merge(x, y, by = "somecols", all = TRUE), yourListOfDataFrames).
However, merge is not always the most efficient function, especially since it looks like you want to "collapse" all the values to fill in the rows from left to right, which is not the default merge behavior.
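To make the difference concrete, here is a minimal base R sketch of the Reduce plus merge approach on the three small sample tables. The key columns c("V1", "V2") are taken from the sample data; the variable name merged is only illustrative.

```r
# Sample tables matching the first rows of File1, File2 and File3.
file1 <- data.frame(V1 = c(12L, 13L, 14L, 4L), V2 = c(13L, 15L, 17L, 9L),
                    V3 = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
file2 <- data.frame(V1 = c(12L, 3L, 14L, 4L), V2 = c(13L, 10L, 17L, 9L),
                    V3 = c("e", "b", "c", "j"), stringsAsFactors = FALSE)
file3 <- data.frame(V1 = c(12L, 13L, 1L, 24L), V2 = c(13L, 15L, 7L, 9L),
                    V3 = c("m", "k", "x", "d"), stringsAsFactors = FALSE)

# Full outer join of all tables on the first two columns.
merged <- Reduce(function(x, y) merge(x, y, by = c("V1", "V2"), all = TRUE),
                 list(file1, file2, file3))

# Each input keeps its own value column, so missing matches leave NA holes:
# the row for (13, 15) has "b" in V3.x, NA in V3.y and "k" in V3.
print(merged)
```

The values stay in the column of the file they came from, so they are not shifted to the left as in the desired result.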
Instead, I suggest you stack everything into one long data.frame and reshape it after you have added an index variable.
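The stack, index, reshape idea can be sketched in base R alone. This is only an illustration under the assumption that the tables have columns V1, V2 and V3; it uses ave and reshape rather than the packages discussed in this answer, and the names long and wide are hypothetical.

```r
file1 <- data.frame(V1 = c(12L, 13L, 14L, 4L), V2 = c(13L, 15L, 17L, 9L),
                    V3 = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
file2 <- data.frame(V1 = c(12L, 3L, 14L, 4L), V2 = c(13L, 10L, 17L, 9L),
                    V3 = c("e", "b", "c", "j"), stringsAsFactors = FALSE)
file3 <- data.frame(V1 = c(12L, 13L, 1L, 24L), V2 = c(13L, 15L, 7L, 9L),
                    V3 = c("m", "k", "x", "d"), stringsAsFactors = FALSE)

# 1. Stack all tables into one long data.frame.
long <- do.call(rbind, list(file1, file2, file3))

# 2. Add an index that counts occurrences within each (V1, V2) pair.
long$time <- ave(seq_len(nrow(long)), long$V1, long$V2, FUN = seq_along)

# 3. Reshape to wide: one V3.<index> column per occurrence.
wide <- reshape(long, idvar = c("V1", "V2"), timevar = "time",
                direction = "wide")
print(wide)
```

Because the index counts occurrences rather than source files, the values fill each row from left to right; unused cells are NA here rather than empty strings.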
Here are two approaches:

Option 1: "dplyr" + "tidyr"

- Use mget to put all of your data.frames into a list.
- Use rbind_all (bind_rows in current versions of "dplyr") to convert that list into a single data.frame.
- Group the data and use sequence(n()) in mutate from "dplyr" to create an index.
- Use spread from "tidyr" to transform from a "long" format to a "wide" format.
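library(dplyr)
library(tidyr)

# Stack every data.frame whose name starts with "file" followed by a digit.
# bind_rows() is the current replacement for the deprecated rbind_all().
combined <- bind_rows(mget(ls(pattern = "^file\\d")))

combined %>%
  group_by(V1, V2) %>%
  mutate(time = sequence(n())) %>%  # running index within each (V1, V2) pair
  ungroup() %>%
  spread(time, V3, fill = "")       # one column per index value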
# Source: local data frame [7 x 5]
#
#   V1 V2 1 2 3
# 1  1  7 x
# 2  3 10 b
# 3  4  9 d j
# 4 12 13 a e m
# 5 13 15 b k
# 6 14 17 c c
# 7 24  9 d
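In recent "tidyr" releases spread is superseded by pivot_wider. The following is a sketch of the same pipeline with pivot_wider, assuming dplyr and tidyr 1.0.0 or later are installed; row_number() is used here in place of sequence(n()), and the names files and wide are illustrative.

```r
library(dplyr)
library(tidyr)

file1 <- data.frame(V1 = c(12L, 13L, 14L, 4L), V2 = c(13L, 15L, 17L, 9L),
                    V3 = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
file2 <- data.frame(V1 = c(12L, 3L, 14L, 4L), V2 = c(13L, 10L, 17L, 9L),
                    V3 = c("e", "b", "c", "j"), stringsAsFactors = FALSE)
file3 <- data.frame(V1 = c(12L, 13L, 1L, 24L), V2 = c(13L, 15L, 7L, 9L),
                    V3 = c("m", "k", "x", "d"), stringsAsFactors = FALSE)

files <- list(file1, file2, file3)

wide <- bind_rows(files) %>%
  group_by(V1, V2) %>%
  mutate(time = row_number()) %>%   # index within each (V1, V2) pair
  ungroup() %>%
  pivot_wider(names_from = time, values_from = V3, values_fill = "")

print(wide)
```

Unlike spread, pivot_wider keeps the rows in order of first appearance rather than sorting them by V1 and V2; add arrange(V1, V2) if the sorted order matters.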
Option 2: "data.table"
- Use mget to put all of your data.frames into a list.
- Use rbindlist to convert that list into a single data.table.
- Use sequence(.N) to generate the index within each group.
- Use dcast.data.table to convert the "long" data.table into a "wide" one.
library(data.table)

dcast.data.table(
  rbindlist(mget(ls(pattern = "^file\\d")))[,
    time := sequence(.N), by = list(V1, V2)],
  V1 + V2 ~ time, value.var = "V3", fill = "")
#    V1 V2 1 2 3
# 1:  1  7 x
# 2:  3 10 b
# 3:  4  9 d j
# 4: 12 13 a e m
# 5: 13 15 b k
# 6: 14 17 c c
# 7: 24  9 d
Both of these answers assume we are starting with the following sample data:
file1 <- structure(
list(V1 = c(12L, 13L, 14L, 4L), V2 = c(13L, 15L, 17L, 9L),
V3 = c("a", "b", "c", "d")), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -4L))
file2 <- structure(
list(V1 = c(12L, 3L, 14L, 4L), V2 = c(13L, 10L, 17L, 9L),
V3 = c("e", "b", "c", "j")), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -4L))
file3 <- structure(
list(V1 = c(12L, 13L, 1L, 24L), V2 = c(13L, 15L, 7L, 9L),
V3 = c("m", "k", "x", "d")), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -4L))
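If the data still live in text files rather than in objects named file1, file2 and so on, the list can be built directly from disk instead of with mget. The sketch below assumes whitespace-separated files without a header; the directory and the names File1.txt and File2.txt are hypothetical and are created in a temporary folder only so the example is self-contained.

```r
# Create two small demo files in a temporary directory (hypothetical names).
tmp <- tempfile("merge_demo_")
dir.create(tmp)
writeLines(c("12 13 a", "13 15 b"), file.path(tmp, "File1.txt"))
writeLines(c("12 13 e", "3 10 b"), file.path(tmp, "File2.txt"))

# Read every matching file into a list of data.frames.
# With header = FALSE, read.table names the columns V1, V2, V3.
paths <- list.files(tmp, pattern = "^File\\d+\\.txt$", full.names = TRUE)
files <- lapply(paths, read.table, header = FALSE, stringsAsFactors = FALSE)
names(files) <- basename(paths)

str(files)
```

The resulting list can be passed to the stacking step of either option in place of mget(ls(pattern = "^file\\d")).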